Article

Low-Cost Lung Cancer Classification in WSIs Using a Foundation Model and Evolving Prototypes

by Soroush Oskouei 1,2,*, André Pedersen 3, Marit Valla 4,5, Vibeke Grotnes Dale 4,6, Sissel Gyrid Freim Wahl 6, Mats Dehli Haugum 6, Borgny Ytterhus 4, Maria Paula Ramnefjell 7,8, Lars Andreas Akslen 7,8, Gabriel Kiss 9,10 and Hanne Sorger 1,2
1 Department of Circulation and Medical Imaging, Norwegian University of Science and Technology (NTNU), NO-7491 Trondheim, Norway
2 Clinic of Medicine, Levanger Hospital, Nord-Trøndelag Health Trust, NO-7600 Levanger, Norway
3 Development, DIPS AS, NO-0191 Oslo, Norway
4 Department of Clinical and Molecular Medicine, Norwegian University of Science and Technology (NTNU), NO-7491 Trondheim, Norway
5 Clinic of Laboratory Medicine, St. Olavs Hospital, Trondheim University Hospital, NO-7030 Trondheim, Norway
6 Department of Pathology, St. Olavs Hospital, Trondheim University Hospital, NO-7030 Trondheim, Norway
7 Department of Clinical Medicine, Centre for Cancer Biomarkers CCBIO, University of Bergen, NO-5007 Bergen, Norway
8 Department of Pathology, Haukeland University Hospital, NO-5020 Bergen, Norway
9 Center for Innovation, Medical Devices, and Technology, St. Olavs Hospital, Trondheim University Hospital, NO-7030 Trondheim, Norway
10 Department of Computer Science, Norwegian University of Science and Technology (NTNU), NO-7491 Trondheim, Norway
* Author to whom correspondence should be addressed.
Algorithms 2025, 18(12), 769; https://doi.org/10.3390/a18120769
Submission received: 31 October 2025 / Revised: 28 November 2025 / Accepted: 2 December 2025 / Published: 6 December 2025
(This article belongs to the Special Issue Evolutionary and Swarm Computing for Emerging Applications)

Abstract

Whole slide imaging has transformed the field of pathology by enabling high-resolution digitization of histopathological slides. However, the large image size and variability in morphology, tissue processing, and imaging can pose challenges for robust computational analysis. For specific tasks in digital pathology, conventional feature extractors pretrained on general images may not provide features as relevant as those trained on histopathological images. To address this, foundation models pretrained on histopathological images have been developed. Yet, their large size and computational demands may limit their widespread adoption for specific tasks. To facilitate the low-cost adoption of these models, we used low-rank adaptation (LoRA) to finetune the model and developed evolving prototype-based multiple instance learning (EP-MIL). Our method’s capabilities were demonstrated by applying it to the classification of two histological subtypes of lung cancer. The results show that our approach achieves competitive performance when benchmarked against a state-of-the-art technique (CLAM), while offering improvements in efficiency. Specifically, our proposed method requires 8.3 times less training runtime than CLAM, uses less than 200 MB of memory during training, and enables 73.8 times faster inference. These efficiency gains, combined with competitive performance, suggest that combining evolving prototypes with LoRA-tuned foundation models offers a more efficient and practical approach for broader use of foundation models in resource-constrained clinical settings.

1. Introduction

The digitization of tissue slides has transformed diagnostic pathology by enabling digital workflows, automatic analysis, and image sharing. Notable advances include the integration of artificial intelligence (AI), which can improve diagnostic accuracy, lung cancer management, and therapeutic decision-making [1,2,3,4]. AI-propelled algorithms offer substantial support in diagnostic and digital pathology by enabling tumor segmentation, artifact detection, and tumor classification, which can include distinguishing tumor subtypes [1,2,5,6,7].
In lung cancer, various deep learning models have been used for the classification of histological subtypes, including adenocarcinoma (AC) and squamous cell carcinoma (SCC) [2,4,8]. Deep learning has also been applied to lung cancer whole slide images (WSIs) for a variety of other tasks, including the prediction of key biomarkers and mutations and the prediction of response to immune checkpoint inhibitors [2,4,8,9,10,11]. In non-small cell lung cancer (NSCLC), accurate histopathological subtyping is important for treatment decisions. In some cases, immunohistochemistry is needed for histopathological classification [2].
AI-driven platforms are currently being developed to offer explainable analyses of features in WSIs, providing new insights into the diagnostic and prognostic evaluation of lung cancer [12]. As opposed to immunohistochemistry and other laboratory assays, AI tools are tissue-preserving, meaning tumor tissue is left intact for molecular genetic analyses [2,13]. In digital pathology, foundation models have been trained to recognize morphological features characteristic of histological subtypes [14]. These models, combined with multiple instance learning (MIL) frameworks, handle the gigapixel size of WSIs and show the potential of domain-specific pretrained foundation models to outperform traditional ImageNet-pretrained architectures in distinguishing cancer subtypes [14].
MIL has proven particularly useful for the classification of WSIs [15]. MIL treats each slide as a bag of patch-level instances, and label predictions are made at slide level without requiring per-patch annotations [16]. Recent MIL variants, such as clustering-constrained attention multiple instance learning (CLAM), use attention mechanisms to identify the most informative patches, improving interpretability and performance in pathology tasks [17,18].
Training such complex models can be challenging. Gradient-based methods are typically used to train them, navigating the highly non-convex surface defined by the objective function over the parameter space. This non-convexity makes the methods susceptible to converging to suboptimal local minima. Furthermore, gradient calculations themselves can be computationally expensive. To overcome these problems, evolutionary algorithms (EAs) provide a population-based, gradient-free search strategy suitable for exploring complex parameter spaces, and they have been successfully applied to neural networks [19,20].
Although MIL-based methods achieve competitive classification results, the feature extraction model inherently plays a significant role. Pulmonary AC and SCC are the two most common subtypes of NSCLC, and distinguishing between them is important for diagnosis and treatment; however, this distinction can be challenging in poorly differentiated tumors [21,22].
Foundation models have demonstrated remarkable capabilities in feature extraction, offering versatile representations for downstream tasks [23]. Vision–language models such as CONCH, trained on over a million image–caption pairs, have demonstrated the benefits of integrating descriptive text for improved generalization across tasks [24]. TITAN extended this direction as a large-scale multimodal whole-slide foundation model, using hundreds of thousands of pathology reports and synthetic captions to align visual and textual features [25]. Prov-GigaPath, trained on more than 1.3 billion image tiles from 28 cancer centers, introduced a novel vision transformer (ViT)-based architecture capable of handling gigapixel slides and achieved state-of-the-art performance across diverse pathology tasks [26]. In addition to these efforts, H-optimus-0 employed one of the largest ViT backbones in computational pathology, pretrained on over 500,000 diverse WSIs, further highlighting the trend toward increasingly large-scale, multimodal, and generalizable pathology foundation models [27].
Finetuning foundation models and creating a computationally efficient and deployable method is critical for enhancing the models’ adaptability to domain-specific tasks while operating under usual computational constraints. It is also important to identify a well-performing method that minimizes both inference latency and memory footprint. This is essential for creating a computationally inexpensive and resource-friendly solution, particularly for the expanding landscape of internet of things (IoT), embedded systems, and edge computing. The dual focus on model specialization and operational efficiency is paramount for bridging the gap between state-of-the-art AI and its practical deployment in diverse, resource-constrained environments.
Recent advancements in computational pathology emphasize improving the efficiency, interpretability, and adaptability of deep learning models for medical diagnosis, particularly when dealing with limited data and complex WSIs. Hong et al. addressed the scarcity of labeled data in cervical cancer classification by proposing a lightweight LoRA ViT framework [28]. This method freezes pretrained weights and introduces low-rank matrices to drastically reduce trainable parameters, achieving superior accuracy and faster convergence compared to traditional CNNs. Focusing on the lack of transparency in WSI analysis, Rymarczyk et al. introduced ProtoMIL, a self-explainable MIL model that utilizes learnable “prototypical parts” to mimic human case-based reasoning [29]. By matching image patches to these prototypes, ProtoMIL provides fine-grained visual interpretations while maintaining state-of-the-art performance across five datasets, including breast and colon cancer. Lee et al. benchmarked four pathology-specific foundation models across 14 datasets to determine optimal adaptation strategies [30]. Their results indicate that parameter-efficient finetuning is the most effective strategy for consistent adaptation, whereas simpler linear adaptation methods outperform complex meta-learning algorithms in few-shot, data-limited scenarios.
To address these needs and bridge the gap between high-performance AI and clinical accessibility, this study pursued three primary research objectives:
(a)
To demonstrate that large-scale histopathology foundation models, specifically H-optimus-0, can be effectively finetuned on consumer-grade hardware using low-rank adaptation (LoRA), thereby eliminating the barrier of expensive enterprise-grade computing infrastructure [23].
(b)
To introduce evolving prototype-based multiple instance learning (EP-MIL), a novel, gradient-free approach designed to drastically reduce inference latency and memory footprint. This objective aims to validate EP-MIL as a superior alternative to complex attention-based models like CLAM for deployment in resource-constrained environments.
(c)
To rigorously benchmark the proposed efficient pipeline against diverse, multi-centric datasets (Biobank1, HULC, and TCGA) to ensure that the reduction in computational cost does not compromise classification accuracy or generalizability in the face of stain and scanner variability.

2. Methodology

2.1. Datasets

In this study, WSIs from three cohorts of NSCLC cases were used: a cohort from the regional research biobank in central Norway (Biobank1®), the Haukeland University Lung Cancer (HULC) cohort, and The Cancer Genome Atlas (TCGA) cohort (Figure 1).
The Biobank1 and HULC cohorts include images of 4 μm thick tissue sections stained with hematoxylin and eosin (HE) using a standard staining protocol comprising deparaffinization, rehydration, hematoxylin staining, sequential rinsing, eosin staining, dehydration, immersion in TissueClear, air drying, and coverslipping [31]. All sections were scanned at ×40 magnification using the Olympus BX61VS VS120S5 scanner at the Norwegian University of Science and Technology.
The Biobank1 cohort comprises histopathological, cytological, biomarker, and longitudinal clinical follow-up data from patients diagnosed with lung cancer in Central Norway since 2006. It includes diagnostic tumor biopsies and sections obtained from surgically resected lung cancers [32].
The HULC cohort includes 438 patients with NSCLC who underwent surgical resection at the Haukeland University Hospital in Bergen, Norway, between 1993 and 2010 [33].
A total of 229 WSIs from the Biobank1 cohort and 97 WSIs from the HULC cohort that had been reviewed for our previous study [34] were selected for a further quality check. Exclusion criteria included blurry images, scanning artifacts, histological subtypes other than AC and SCC, and insufficient amounts of tumor tissue. After the quality check, WSIs from 221 patients in the Biobank1 cohort and 86 patients in the HULC cohort were retained.
A randomly selected subset of 30 WSIs from the Biobank1 cohort was designated as a hold-out test set, whereas the HULC cohort was employed as an independent external test set. The distribution of histological subtypes across the datasets is provided in Figure 1.
To test the generalizability of the models in different staining and scanning settings, 48 WSIs of pulmonary ACs and SCCs were also randomly selected from the TCGA dataset [35].

2.2. Preprocessing

Tissue segmentation was performed to outline the tissue in each WSI using a tissue segmentation and tiling model called the “superfast segmentor”, a lightweight U-Net-like convolutional neural network designed for image segmentation. It features a three-level encoder-decoder structure in which the bottleneck uses two 3 × 3 convolutional layers to expand the channel depth from 64 to 128, and it includes skip connections, ReLU activations, and spatial resizing for concatenation and the final output [36]. Only patches containing more than 25% tissue were included. The output was a single JSON file with the tissue patch positions from all WSIs. The saved positions were then used with the cuCIM library to read the patches from disk [37].
For experiments restricted to tumor tissue and for finetuning the foundation model, an additional segmentation step was applied to filter out non-tumor regions. After the tissue patches were identified, each patch was passed through a lung tumor segmentation model [34]. This classifier processed normalized patch images and output class probabilities. Only patches containing more than 10% tumor area and more than 50% tumor likelihood were retained as tumor-positive patches. The spatial locations of these tumor patches were saved in a JSON file and read with cuCIM during the feature extraction stage [37].
Each patch was downsampled to a fixed input size (224 × 224 pixels), suitable for the ViT backbone. Pixel values were converted to floating point and normalized using zero-mean normalization. These normalization statistics were computed on the training data only and consistently applied across all experiments to ensure comparability and proper model input scaling.
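A minimal sketch of this patch preprocessing step is given below. The channel statistics shown are illustrative placeholders only; the actual values were computed on the training data as described above.

```python
import numpy as np
import cv2

# Illustrative channel statistics; the actual values were computed on the training patches.
MEAN = np.array([0.70, 0.58, 0.70], dtype=np.float32)
STD = np.array([0.21, 0.23, 0.18], dtype=np.float32)

def preprocess_patch(patch: np.ndarray) -> np.ndarray:
    """Downsample a patch to 224 x 224 pixels, convert to float, and apply zero-mean normalization."""
    patch = cv2.resize(patch, (224, 224), interpolation=cv2.INTER_AREA)
    x = patch.astype(np.float32) / 255.0
    return (x - MEAN) / STD
```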

2.3. H-Optimus Adoption

In the zero-shot setting, features were extracted from all tissue patches using the original H-optimus model in its pretrained configuration. In this setting, the model was loaded without any task-specific adaptation or finetuning. Patches were processed in batches, and for each patch, the output of the backbone was extracted and saved as a high-dimensional feature vector. No classifier head was used during feature extraction; only the feature representations were retained. This baseline served to evaluate the transferability of the H-optimus model as a feature extractor for the downstream classification task, providing a point of comparison for subsequent adaptations and simplifications.
For finetuning the H-optimus model, tumor patches were extracted from the outlined tumor regions of the training WSIs. The H-optimus model was adapted to the downstream classification task using LoRA modules, which enabled efficient parameter updates by injecting trainable low-rank matrices into the attention layers of selected transformer blocks [38]. Specifically, LoRA modules were added to the query, key, and value linear layers and other linear projection layers in the final transformer layer blocks. A new classifier head was introduced, replacing the model’s original identity head with a linear layer matching the number of target classes. The main model and the pretrained weights were loaded using the timm library, and all parameters were frozen except for the LoRA modules and the classifier head, which were updated using the Adam optimizer and cross-entropy loss [39,40].
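As a rough illustration of this adaptation scheme, the sketch below wraps a frozen linear projection with trainable low-rank matrices. The rank and scaling values are illustrative assumptions rather than the settings used in our experiments, and in practice such a wrapper would be applied to the query, key, value, and projection layers of the final transformer blocks.

```python
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen pretrained linear layer plus a trainable low-rank update (sketch)."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # keep pretrained weights frozen
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)   # the low-rank update starts at zero
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))
```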
A batch size of four, a learning rate of $1 \times 10^{-4}$, and a total of 100 epochs were used. Furthermore, model checkpointing was employed based on the minimum validation loss at the end of each epoch, ensuring retention of the best-performing model.
For model training, a data augmentation pipeline including random flips, rotations, resized crops, color jitter, and multi-lens distortion was used [34]. A weighted random sampler was used for balanced training. Images were normalized before being fed to the H-optimus model.
After each epoch, validation loss was computed, and the best-performing model, according to validation loss, was saved. This approach ensured efficient adaptation with minimal memory and computational overhead, since only a small subset of the model’s parameters was updated.

2.4. Proposed Classification Method

The proposed method optimizes a set of class-specific prototypes in the feature space using an EA. Class assignment is performed via a computationally efficient nearest-prototype rule.

2.4.1. Representation of WSIs in the MIL Framework

In this study, a MIL framework was used, where a WSI is defined as a bag of instances, each instance representing an image patch. Let a WSI bag be denoted by $B = \{x_1, x_2, \ldots, x_{N_B}\}$, where $x_i \in \mathbb{R}^D$ is the $D$-dimensional feature vector extracted from the $i$-th patch and $N_B$ is the number of patches in the bag.
In this framework, a single vector representation for each bag $B$ (each WSI) is the average of its instance feature vectors, as described in Equation (1):

$$\bar{x} = \frac{1}{N_B} \sum_{i=1}^{N_B} x_i \qquad \text{(1)}$$

This transformed the problem into a bag-level supervised learning problem, in which each WSI was represented by a single feature vector $\bar{x} \in \mathbb{R}^D$.

2.4.2. Prototype-Based Classification

In the proposed method, the classifier model is defined by $|C|$ sets of prototype vectors, where $|C|$ is the cardinality of the set $C$ of classes. For a binary classification problem with classes $c \in \{0, 1\}$, the model $M$ is composed of a set of $K$ prototypes for class 0, denoted $P_0 = \{p_{0,1}, p_{0,2}, \ldots, p_{0,K}\}$, and a set of $K$ prototypes for class 1, denoted $P_1 = \{p_{1,1}, p_{1,2}, \ldots, p_{1,K}\}$. Each prototype $p_{c,k}$ is a vector in the same $D$-dimensional feature space as the slide representations, i.e., $p_{c,k} \in \mathbb{R}^D$.
The classification of a new WSI with the single vector representation $\bar{x}$ is based on a nearest-prototype rule. This is done by computing the minimum distance from $\bar{x}$ to each set of class-specific prototypes, as described in Equation (2):

$$d(\bar{x}, P_c) = \min_{k \in \{1, \ldots, K\}} \delta(\bar{x}, p_{c,k}) \qquad \text{(2)}$$

where $\delta$ is a distance metric, for which the Euclidean distance was used, calculated as the $L_2$ norm of the difference between the two vectors.
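A minimal sketch of this nearest-prototype rule, assuming the bag has already been reduced to its mean vector, is shown below.

```python
import numpy as np

def classify_slide(x_bar: np.ndarray, prototypes: dict[int, np.ndarray]) -> int:
    """Assign the class whose nearest prototype (Euclidean distance) is closest to x_bar.

    prototypes[c] is a (K, D) array holding the K prototypes of class c;
    x_bar is the D-dimensional mean feature vector of the slide (Equation (1)).
    """
    distances = {c: float(np.linalg.norm(P - x_bar, axis=1).min())
                 for c, P in prototypes.items()}
    return min(distances, key=distances.get)
```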

2.4.3. Evolutionary Optimization of Prototypes

The challenge is to find an optimal set of prototypes $P_0, P_1$ for classes 0 and 1, respectively, that maximizes the classification performance. This optimization was implemented with an evolutionary method that includes an iteratively evolving population. Each individual $\Theta$ in the population is a candidate solution consisting of the two sets of prototypes: $\Theta = (P_0, P_1)$.
The initial population of individuals is generated by sampling from the training data. For each individual, every prototype $p_{c,k}$ for class $c$ is initialized with the mean feature vector $\bar{x}$ of a randomly selected training slide belonging to class $c$. This initialization ensures that the search starts from a reasonable region of the space and reduces the convergence time.

2.4.4. Fitness Evaluation

The fitness of each individual (i.e., each set of prototypes) is determined by the performance of the classification using that individual. The fitness function $F(\Theta)$ for an individual $\Theta$ is defined as the macro classification accuracy.

2.4.5. Evolutionary Operators and Cycle

The evolution from one generation to the next is driven by selection, crossover, and mutation operators, along with elitism.
(a)
Selection: A tournament selection rule was employed, in which a small subset of individuals is randomly chosen from the population and the individuals with the highest fitness in the subset are selected to be parents for the next generation.
(b)
Crossover: Given two parent individuals, $\Theta_1 = (P_0^{(1)}, P_1^{(1)})$ and $\Theta_2 = (P_0^{(2)}, P_1^{(2)})$, a uniform crossover operator is applied. For each prototype position $k \in \{1, \ldots, K\}$ and for each class $c \in \{0, 1\}$, the prototype vectors $p_{c,k}^{(1)}$ and $p_{c,k}^{(2)}$ are swapped between the two parents with a fixed probability. This creates two new offspring individuals that are combinations of their parents’ prototypes.
(c)
Mutation: Mutation introduces new values into the population, preventing premature convergence. Each prototype vector $p_{c,k}$ within an individual has a fixed probability of mutation, which was 0.3 in our experiments. If selected for mutation, the vector is perturbed by adding a random vector sampled from a zero-mean Gaussian distribution (Equation (3)):

$$p_{c,k}' = p_{c,k} + \mathcal{N}(0, \sigma^2 I_D) \qquad \text{(3)}$$

where $I_D$ is the $D \times D$ identity matrix and $\sigma$ is the mutation strength hyperparameter, controlling the magnitude of the perturbation.
(d)
Elitism: To ensure that the best solution found during each generation is never lost, the principle of elitism is used. The top-performing individuals from each generation are directly transferred to the next generation without modification.
The evolutionary process was executed for a predefined number of generations and the individual with the highest fitness score across all generations was selected as the set of prototypes (Figure 2).
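A condensed, illustrative sketch of this evolutionary cycle is given below. The default values mirror the hyperparameters listed in Section 2.4.6, while the tournament size and the per-position swap probability are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def evolve_prototypes(bags, labels, K=8, pop_size=7, generations=240,
                      elite=2, cx_rate=0.8, mut_rate=0.3, sigma=0.1, tourn=3):
    """Evolve class-specific prototypes with selection, crossover, mutation, and elitism (sketch).

    bags: (n_slides, D) array of mean slide vectors; labels: (n_slides,) array of class labels.
    """
    classes = np.unique(labels)

    def init_individual():
        # each prototype starts as the mean vector of a random training slide of its class
        return {c: bags[rng.choice(np.flatnonzero(labels == c), K)].copy() for c in classes}

    def fitness(ind):
        # macro accuracy of the nearest-prototype rule over the training slides
        preds = np.array([min(classes, key=lambda c: np.linalg.norm(ind[c] - x, axis=1).min())
                          for x in bags])
        return float(np.mean([np.mean(preds[labels == c] == c) for c in classes]))

    def tournament(pop, scores):
        idx = rng.choice(len(pop), size=tourn, replace=False)
        return pop[idx[np.argmax(scores[idx])]]

    pop = [init_individual() for _ in range(pop_size)]
    best, best_fit = None, -1.0
    for _ in range(generations):
        scores = np.array([fitness(ind) for ind in pop])
        order = np.argsort(scores)[::-1]
        if scores[order[0]] > best_fit:                  # keep the best individual across generations
            best, best_fit = pop[order[0]], scores[order[0]]
        nxt = [pop[i] for i in order[:elite]]            # elitism
        while len(nxt) < pop_size:
            p1, p2 = tournament(pop, scores), tournament(pop, scores)
            child = {}
            for c in classes:
                protos = p1[c].copy()
                if rng.random() < cx_rate:               # uniform crossover per prototype slot
                    swap = rng.random(K) < 0.5
                    protos[swap] = p2[c][swap]
                mutate = rng.random(K) < mut_rate        # Gaussian mutation per prototype
                protos[mutate] += rng.normal(0.0, sigma, protos[mutate].shape)
                child[c] = protos
            nxt.append(child)
        pop = nxt
    return best
```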

2.4.6. Hyperparameters

The evolutionary algorithm’s parameters were configured with eight prototypes per class and a population size of seven for 240 generations. To ensure preservation of high-performing solutions, an elitism count of two was applied. The primary genetic operators were set with a crossover rate of 0.8 and a mutation rate of 0.3, with the latter having a mutation strength of 0.1.

2.4.7. Explainability

An explainability method generating a heatmap for WSIs using our prototype-based framework was applied. This process uses the entire WSI’s bag of instances $X = \{x_1, x_2, \ldots, x_N\}$, where each instance vector represents the high-level features of its corresponding patch. The core of the explainability lies in comparing these instance embeddings with the learned set of class-specific prototypes. Let $P_0 = \{p_{0,1}, \ldots, p_{0,K_0}\}$ and $P_1 = \{p_{1,1}, \ldots, p_{1,K_1}\}$ be the sets of $K_0$ and $K_1$ prototype vectors for class 0 and class 1, respectively, which reside in the same embedding space $\mathbb{R}^D$. For each patch embedding $x_i$, the method calculates its minimum distance to the prototypes of each class. The patch-level class prediction $\hat{c}_i$ is then determined by the class with the closest prototype: $\hat{c}_i = \arg\min_{c \in \{0,1\}} d(x_i, P_c)$.
To generate the heatmap, each patch region is colored according to its predicted class $\hat{c}_i$. The opacity of the color is modulated by the patch’s proximity to the assigned prototype, calculated as an inverse function of the minimum distance $d_{i,\min} = \min(d(x_i, P_0), d(x_i, P_1))$. This ensures that patches strongly representative of a class (i.e., with a small distance to a prototype) appear more intensely, providing an instance-level visual explanation of which regions most closely resemble the learned class prototypes.
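A sketch of how the patch-level predictions and opacities could be derived is shown below; the specific inverse-distance mapping used for the opacity is an illustrative choice.

```python
import numpy as np

def patch_heatmap_values(patch_feats: np.ndarray, prototypes: dict[int, np.ndarray]):
    """Return, for each patch, its predicted class and an opacity that increases
    as the patch gets closer to its nearest prototype (sketch).

    patch_feats: (N, D) array of patch embeddings; prototypes[c]: (K_c, D) array.
    """
    class_ids = sorted(prototypes)
    # minimum distance from every patch to each class's prototype set
    dists = np.stack([np.linalg.norm(patch_feats[:, None, :] - prototypes[c][None, :, :],
                                     axis=2).min(axis=1) for c in class_ids], axis=1)
    pred = np.array(class_ids)[dists.argmin(axis=1)]
    d_min = dists.min(axis=1)
    opacity = 1.0 / (1.0 + d_min)   # one possible inverse function of the minimum distance
    return pred, opacity
```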

2.5. Comparison with the Existing Methods

To assess the robustness of the proposed approach relative to previously published work, a selection of established methods was chosen for comparison.

2.5.1. Classification with 1DCNN

For this approach, after feature extraction, slide-level classification was performed using a 1-dimensional convolutional neural network (1DCNN) [41]. To ensure consistent input dimensionality, patch-level feature vectors extracted from all WSIs were zero-padded to a fixed length. These features were flattened and fed to the classifier. The architecture of this model consists of several stacked 1D convolutional and pooling layers, followed by global average pooling and a fully connected layer, producing final slide-level class probabilities via softmax activation. The classifier was trained and validated using a split based on slide identity, ensuring patient-level separation. The network was optimized using cross-entropy loss and Adam optimizer, and the best-performing model on the validation set was retained for evaluation.
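A simplified sketch of such a 1DCNN classifier is shown below; the layer widths and kernel sizes are illustrative and do not reproduce the exact architecture of [41].

```python
import torch.nn as nn

class Slide1DCNN(nn.Module):
    """Stacked 1D convolution/pooling blocks, global average pooling, and a
    fully connected output layer over a flattened, zero-padded feature sequence (sketch)."""
    def __init__(self, n_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=7, padding=3), nn.ReLU(), nn.MaxPool1d(4),
            nn.Conv1d(16, 32, kernel_size=7, padding=3), nn.ReLU(), nn.MaxPool1d(4),
            nn.Conv1d(32, 64, kernel_size=7, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),               # global average pooling
        )
        self.classifier = nn.Linear(64, n_classes)  # logits; softmax yields slide-level probabilities

    def forward(self, x):                            # x: (batch, 1, padded_length)
        return self.classifier(self.features(x).squeeze(-1))
```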

2.5.2. CLAM

In this experiment, a CLAM model was trained on bags of feature vectors. The core of the model, a CLAM_SB network, processed each bag of features through a gated attention mechanism that learned to assign an attention score to each patch [18]. These scores were then used to create a single, weighted-average feature vector for the entire slide (the slide-level aggregation rule of attention-based pooling). The model was trained using a composite loss function that combined a standard bag-level cross-entropy loss with an instance-level loss. The instance-level loss acted as a regularizer, encouraging the model to correctly identify the most and least relevant patches by using a cosine embedding loss to pull high-attention instances toward the ground truth class centroid and push low-attention instances away from other class centroids. The training loop iterated through slides, computed this combined loss, and updated the model parameters using an Adam optimizer.

2.6. Experiments

An initial ablation study was conducted on the Biobank1 test set to determine the optimal magnification level for subsequent experiments. This ablation study, which evaluated magnifications of ×2.5, ×5, and ×10, aimed to determine the optimal balance among computational efficiency, time consumption, and model performance.
A series of experiments was conducted to systematically evaluate the pipeline under different conditions. The experimental variables included the region of interest (whole tissue versus tumor tissue only), the choice of feature extractor (original versus finetuned), the classification method (1DCNN, CLAM, and EP-MIL), and the test datasets (Biobank1, HULC, and TCGA). For each experiment, the features were extracted as described above and classified using the three classification methods. Results are reported for all configurations, enabling a direct comparison of performance across combinations of region, magnification, and model variant, as summarized in Figure 3.

2.7. Evaluation

To evaluate the performance of our classification models, two key metrics were used: micro accuracy and, as a more robust measure under class imbalance, the macro $F_1$-score.
Accuracy measures the overall correctness of the model’s predictions, as described in Equation (4):

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad \text{(4)}$$

where TP, TN, FP, and FN represent true positives, true negatives, false positives, and false negatives, respectively.
The class-specific $F_1$-score ($F_{1,c}$) is defined as the harmonic mean of precision and recall for a given class $c$, as described in Equation (5):

$$F_{1,c} = \frac{2 \cdot \mathrm{Precision}_c \cdot \mathrm{Recall}_c}{\mathrm{Precision}_c + \mathrm{Recall}_c} \qquad \text{(5)}$$

The macro $F_1$-score ($F_{1,\mathrm{macro}}$) is subsequently computed as the unweighted average of the $F_1$-scores obtained for all classes ($N_{\mathrm{classes}}$), as shown in Equation (6):

$$F_{1,\mathrm{macro}} = \frac{1}{N_{\mathrm{classes}}} \sum_{c=1}^{N_{\mathrm{classes}}} F_{1,c} \qquad \text{(6)}$$
To assess the effect of the classifier models on generalizability, a logistic regression analysis was performed using the predictions of the models on TCGA as the dependent variable. Assessments were also conducted to test the effect of the classifiers on predictions in the Biobank1 and HULC datasets. Similar evaluations were conducted to test the effect of the feature extraction models. To account for the correlation among predictions of the same WSI, clustered standard errors by sample ID were used. The analysis was performed using Python’s statsmodels library, and a significance level of 0.05 was used to determine statistical significance [42].
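A minimal sketch of this analysis is given below, assuming one row per WSI prediction with a binary correctness indicator, a categorical classifier variable, and a sample identifier; the file name and column names are hypothetical.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical table with one row per prediction:
#   correct (0/1), classifier (categorical), sample_id (WSI identifier)
df = pd.read_csv("tcga_predictions.csv")

# Logistic regression of prediction correctness on the classifier choice,
# with standard errors clustered by sample ID.
model = smf.logit("correct ~ C(classifier)", data=df)
result = model.fit(cov_type="cluster", cov_kwds={"groups": df["sample_id"]})
print(result.summary())
```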
A comparison of the three classification methods was performed during both training and inference. To ensure a fair comparison across all methods, the random seed was fixed and the exact same data split and hardware settings were used for all experiments. Computational efficiency was evaluated by measuring seven metrics: training time per epoch or generation, total inference time, mean latency per slide, training peak RAM usage, inference peak RAM usage, model size on disk, and FLOPs per single-slide forward pass. Latency was measured by timing the forward pass for a single test slide after a warm-up phase to exclude compilation overhead. Peak RAM was tracked during inference using process-level memory monitoring tools, whereas the model size was computed as the serialized checkpoint size in MB. To quantify arithmetic complexity, the number of floating-point operations (FLOPs) required for a single slide forward pass was measured.
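The sketch below illustrates how the per-slide latency and peak process RAM could be measured; psutil is an assumed dependency, and the warm-up and repeat counts are illustrative.

```python
import os
import time

import psutil
import torch

def measure_latency_and_ram(model, bag: torch.Tensor, warmup: int = 3, runs: int = 20):
    """Time the forward pass for one slide after a warm-up phase and read the
    process's resident memory (sketch of the protocol described above)."""
    proc = psutil.Process(os.getpid())
    model.eval()
    with torch.no_grad():
        for _ in range(warmup):      # warm-up: exclude one-off compilation/caching overhead
            model(bag)
        start = time.perf_counter()
        for _ in range(runs):
            model(bag)
        latency_s = (time.perf_counter() - start) / runs
    ram_mb = proc.memory_info().rss / 1e6
    return latency_s, ram_mb
```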

2.8. Computational Environment

All experiments were conducted on a workstation equipped with an Intel Xeon Gold 6230 central processing unit and a Quadro RTX 6000 graphics processing unit (GPU) with 24 GB of VRAM, running on the Ubuntu 22.04.3 operating system. The implementations were written in Python 3.10.12 and utilized PyTorch v2.7.0 deep learning framework for model training and inference [43]. Data processing and augmentation tasks were handled using cuCIM for WSI reading and preprocessing, and OpenCV and scikit-image for patch-level processing [37,44,45]. Feature extraction was performed using the H-optimus-0 model and its finetuned variant [27].
The source code of EP-MIL is made openly available at https://github.com/AICAN-Research/EP-MIL (accessed on 29 August 2025).

3. Results

The results of the initial ablation study for the selection of the magnification level indicated that ×10 magnification, while providing the highest resolution, incurred an extreme computational cost due to the heavy model architecture. It did not produce a significant difference compared to the ×5 magnification results. Conversely, the ×2.5 magnification demonstrated a significant drop in performance. Based on this trade-off analysis, the ×5 magnification was selected for all further experiments, as it provided an optimal balance of robust performance and manageable computational requirements. Table 1 summarizes the results of the main experiments performed.
Logistic regression analysis indicated that the choice of classification method had a statistically significant impact on performance in the TCGA test set (p = 0.030; confidence interval [0.096, 1.901]). The statistical significance, combined with the observed metric values, suggests that the EP-MIL model achieved superior performance on TCGA. In contrast, logistic regression analysis for the same effect in the Biobank1 and HULC datasets did not produce a statistically significant result (p = 0.112 for HULC and p = 0.089 for Biobank1), which means one cannot reject the null hypothesis that the classification method has no impact on performance for these datasets.
Logistic regression analysis showed a statistically significant effect of the feature extraction model on the classification results in the TCGA test set (p < 0.001). This result indicates that the finetuned model had a statistically significant impact on performance in the TCGA dataset. In contrast, the logistic regression analysis for the effect of the feature extraction model on the Biobank1 and HULC datasets was not statistically significant. Consequently, for these two datasets, one cannot reject the null hypothesis that the choice of feature extraction model had no impact on performance.
Figure 4 shows visualizations from the explainability method, highlighting the distance of image patches to specific prototypes.
Figure 5 shows visualizations from the explainability method, along with a higher magnification of the region, highlighting the relative distance of the image patches to specific prototypes.
Time and computational cost evaluations are presented in Table 2.

4. Discussion

In this study, we evaluated the classification of lung AC and SCC by combining a state-of-the-art histopathology foundation model, H-optimus, with three distinct downstream classification strategies: a 1DCNN model, an attention-based MIL framework (CLAM), and our proposed MIL method, EP-MIL. The novelty of our work lies in the development of our EP-MIL classifier, which is a novel approach for handling the computational challenges of WSIs. We tested these pipelines on an internal cohort (Biobank1), and two external cohorts (HULC and TCGA).
Most of the classifier and feature extractor configurations achieved similar performance in the Biobank1 and HULC datasets. However, the combination of a fine-tuned feature extractor with the EP-MIL classifier demonstrated statistically significant superior performance in the TCGA dataset. This is likely due to the wider spectrum of staining and scanning variability among the TCGA data, as opposed to the more similar preparation, staining, and scanning techniques used for the Biobank1 and HULC cohorts. The variability in staining and scanning in the TCGA data could explain the performance drop observed across all models when moving to the TCGA dataset. TCGA includes a large global collection of cancer tissues, resulting in a heterogeneous image dataset due to variations in laboratory protocols in different sites [46,47,48].
The tested models performed similarly on the Biobank1 and HULC datasets. On TCGA, on the other hand, we observed a statistically significant difference in the $F_1$-score. The extreme heterogeneity and variability of the data suggest that TCGA is a more challenging test set for assessing the generalizability of models. For any AI tool to be clinically viable, it must be robust to such variations. Our findings suggest that the EP-MIL approach, especially when combined with feature extractors adapted to the specific task via finetuning, offers a promising direction for building more resilient diagnostic models.
The relatively lower accuracy and $F_1$-scores observed in the TCGA test set can be attributed to the significant distributional shift between this external cohort and our training data. As a global collection, TCGA is a heterogeneous dataset with regard to laboratory protocols, tissue morphology, and scanning techniques. Additionally, the limited sample size of the TCGA test set (N = 48) inevitably introduces greater statistical variance and amplifies the impact of outliers on the final aggregated performance metrics.
There are two possible reasons for the relatively low and inconsistent performance difference between using whole tissue and only tumor-segmented regions. First, for this specific binary task, the tumor’s surrounding tissue in the whole-tissue patches may contain diagnostically relevant and informative features. Second, the currently used tumor segmentation model may not be accurately inclusive of the whole tumor region.
Our work highlights the trade-offs between different machine learning architectures. The 1DCNN and CLAM models represent more complex data-driven approaches that learn intricate decision boundaries. CLAM, with its attention mechanism, is designed to identify and weight the most informative instances within a bag. Although powerful, these mechanisms can potentially overfit to spurious features or patterns present in the training data, which may not generalize to new domains.
The success of attention-based MIL frameworks is well documented, achieving state-of-the-art performance by pinpointing diagnostically relevant regions within a WSI [16,17,49,50,51]. By focusing on specific areas found to be relevant for the task at hand, attention-based MIL frameworks like CLAM mimic a pathologist’s workflow and have proven valuable for WSI classification tasks [17]. We observed a significantly higher performance of the CLAM model on the HULC dataset. However, the flexibility that allows these models to focus on subtle key instances can become a vulnerability when faced with data derived from different scanners with different thicknesses and staining, which is the case in the TCGA test set. This is because attention mechanisms, trained to prioritize specific features from the training distribution, may fail to generalize effectively to novel data with different features [52].
Our proposed EP-MIL is conceptually much simpler. By aggregating all patch features into a single mean vector, it effectively smooths out instance-level noise and creates a holistic representation of the WSI. This preprocessing step, while losing granular patch-level information, may act as a strong form of regularization, preventing the model from relying on idiosyncratic patch features. The subsequent classification via nearest-prototype matching in the feature space is a nonparametric method that is less prone to learning the kind of complex, brittle decision boundaries that can fail under domain shift. Bag embedding has been shown to outperform the simpler and more direct approach of instance-level probability pooling [49,53].
In the presented study, the EP-MIL method also presented a significant reduction in computational cost and a notable decrease in RAM consumption compared to other methods tested. This is one of the factors that makes it particularly suitable for the application of this method in resource-constrained environments, such as mobile and IoT devices.
The EP-MIL method also supports the explainability of its decision-making process. By employing a simple geometrical distance-based approach for classification, the method can easily capture the distance of each patch to its respective class prototype. This allows identification of patches that are the best representatives of the classes, providing insight into the reasoning behind the classification.
The decision to aggregate all patch features into a single mean vector in EP-MIL was a strategic design choice, prioritizing robustness and computational efficiency over the explicit retention of granular spatial and heterogeneity information. This mean aggregation acts as a powerful form of regularization by effectively smoothing out instance-level noise and discouraging the model from learning complex, brittle decision boundaries that are susceptible to domain shift, particularly when dealing with highly heterogeneous external datasets like TCGA. Furthermore, while mean aggregation discards explicit spatial context, it still retains the essential characteristic features in the resulting mean vector, allowing for effective nearest-prototype classification.
The superiority of the finetuned version of the H-optimus model for feature extraction compared to the unfinetuned version, particularly in TCGA, reinforces the value of task-specific adaptation. Even with a powerful and large-scale foundation model, low-rank adaptation provides efficient means to specialize the feature representation, making it more discriminative for the target classes and more robust to irrelevant variations.
The EP-MIL method, with its use of a nearest-prototype classifier, belongs to a family of distance-based models with a long history in pattern recognition. The core challenge in such models is the optimal placement of prototypes. Digital pathology usually presents a non-convex optimization task, for which the evolutionary algorithms are particularly well-suited. EAs have been successfully employed in other medical imaging domains to optimize classifiers, perform feature selection, and tune deep learning architectures, often demonstrating the ability to navigate complex optimization landscapes where gradient-based methods might struggle [54,55,56,57,58,59,60].
The gradient-free approach of EP-MIL explores the solution space differently from gradient descent. While gradient-based methods follow a specific path down the loss surface, they can become trapped in sharp local minima, which often correspond to non-generalizable solutions. Population-based methods like evolutionary algorithms used in EP-MIL conduct a parallel search of the loss landscape with broader initializations. This would potentially provide them a better chance to discover more robust minima that correspond to solutions that generalize better.
The aggregation of patch features into a single mean vector can be viewed as projecting the high-dimensional point cloud of a WSI onto its centroid. The subsequent nearest-prototype classification in this space of centroids is geometrically interpretable: a slide is classified based on which reference class representation it most closely resembles.
A key strength of this study is its rigorous design, systematically testing multiple components of the pipeline (feature extractor, tissue selection, classifier) across three distinct datasets that represent different levels of domain shift. However, there are some study limitations. First, the sample sizes of the test sets, particularly the Biobank1 hold-out set (n = 30) and the TCGA set (n = 48), are relatively small due to the time constraints and resource limitations. This could limit the statistical power of our conclusions and may have introduced a greater variance in the performance metrics. Although our statistical analysis on the TCGA data shows a statistically significant effect of the classification model, validation on a larger test set is imperative. Second, our proposed EP-MIL model relies on a simple mean aggregation of patch features, which discards all information about the spatial distribution and heterogeneity of features within a slide. Although this appears to be a beneficial regularizer in our experiments, it may be suboptimal for tasks where a few small key local features can be the deciding factor for classification. For instance, using the whole tissue approach may perform poorly on a slide where a very small tumor region is present, as its discriminatory features could be lost among the features of the larger area of healthy tissue.
In the future, these limitations could be addressed by moving from a single holistic representation to a multi-faceted one. Instead of averaging all patch features into a single vector, one could first perform unsupervised clustering on the patch features within each WSI to identify K distinct morphological sub-populations. By averaging the features within each cluster, a WSI would be represented not as a single vector, but as a set of K centroid vectors in the feature space. This approach would allow the model to capture critical intratumoral heterogeneity—for instance, differentiating regions with high-grade tumor morphology from low-grade or distinct stromal reactions within the same slide. This richer representation would necessitate an evolution of the EP-MIL classification rule, requiring a set-based distance metric (such as the Chamfer or Hausdorff distance) to compare the set of K vectors from a WSI against the class-specific prototypes.
To improve EP-MIL, one could also explore more sophisticated aggregation methods that capture higher-order statistics of the patch feature distribution or incorporate a trainable attention mechanism that is regularized to prevent overfitting. It would also be valuable to expand this comparative analysis to more challenging classification tasks, such as distinguishing different subclasses of adenocarcinoma. Finally, a larger-scale validation study could be the next critical step.

5. Conclusions

We classified lung AC and SCC using the H-optimus foundation model and three downstream strategies: 1DCNN, CLAM, and our proposed EP-MIL. EP-MIL (with a finetuned feature extractor) outperformed the state-of-the-art attention-based method (CLAM) on the heterogeneous, out-of-distribution TCGA dataset. This superior generalization was achieved through evolutionary optimization and geometrically interpretable prototype matching rather than complex attention mechanisms. EP-MIL also showed the lowest computational cost and memory footprint. We also confirmed the efficacy of LoRA for efficient, task-specific foundation model adaptation.
Future work can focus on adopting a multi-faceted WSI representation instead of a single vector. This can involve using unsupervised clustering of patch features to identify distinct morphological sub-populations, representing the slide as a set of centroid vectors to capture intratumoral heterogeneity. Classification will then require a set-based distance metric, such as Chamfer or Hausdorff. Further improvements include exploring sophisticated aggregation methods like higher-order statistics or regularized attention mechanisms. The approach will be validated by expanding the analysis to more challenging classification tasks and conducting a larger-scale study.

Author Contributions

Conceptualization, S.O., G.K. and H.S.; Data curation, B.Y., M.V., H.S., V.G.D., S.G.F.W., M.D.H., M.P.R. and L.A.A.; Formal analysis, S.O. and A.P.; Funding acquisition, H.S. and L.A.A.; Investigation, S.O., A.P., M.V., G.K. and H.S.; Methodology, S.O., A.P. and G.K.; Project administration, H.S.; Resources, H.S. and M.V.; Software, S.O., A.P. and G.K.; Supervision, M.V., G.K. and H.S.; Validation, S.O., A.P., M.V., G.K. and H.S.; Visualization, S.O., A.P., G.K. and H.S.; Writing—original draft, S.O.; Writing—review & editing, S.O., A.P., M.V., V.G.D., S.G.F.W., M.D.H., B.Y., M.P.R., L.A.A., G.K. and H.S. All authors have read and agreed to the published version of the manuscript.

Funding

The research leading to these results received funding from The Liaison Committee for Education, Research, and Innovation in Central Norway (identifiers 2021/928 and 2022/787). This work was also supported by grants from the Research Council of Norway through its Centres of Excellence funding scheme, project number 223250 (to L.A.A.).

Institutional Review Board Statement

This study was conducted in accordance with the Declaration of Helsinki and approved by the Regional Committee for Medical and Health Sciences Research Ethics (REK) Norway (identifier 257624, date of approval 21 June 2021), the institutional Personal Protection Officer and local Data Access Committee at the Norwegian University of Science and Technology and St. Olavs hospital, Trondheim University Hospital (identifier 2021/1374, date of approval 27 May 2022).

Informed Consent Statement

Informed consent was obtained from all subjects and/or their legal guardian(s) for Biobank1 in accordance with REK 2016/1156. For subjects in the HULC cohort, exemption from consent was ethically approved by REK (2013/529). TCGA project provided a comprehensive framework for ethical data collection, but it did not operate under a single, centrally enforced policy; rather, it relied on local institutional review boards to oversee the informed consent process. The project’s policies aimed to manage and mitigate risks rather than “ensure” absolute protection, explicitly acknowledging that genomic data, even when de-identified, carries a small but inherent risk of re-identification. To balance participant privacy with the needs of scientific discovery, TCGA implemented a dual-tiered data access system. While general data is made openly available, access to the most sensitive, individual-specific genomic information is restricted, requiring a rigorous formal application and approval process for qualified researchers.

Data Availability Statement

The datasets generated and/or analyzed during the current study are not publicly available according to the ethical approval for this work. Requests for access to the datasets should be directed to hanne.sorger@ntnu.no.

Acknowledgments

The results presented here are, in part, based upon data generated by the TCGA Research Network: https://www.cancer.gov/tcga (accessed on 20 June 2025).

Conflicts of Interest

Author André Pedersen was employed by the company DIPS. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Shafi, S.; Parwani, A.V. Artificial intelligence in diagnostic pathology. Diagn. Pathol. 2023, 18, 109. [Google Scholar] [CrossRef]
  2. Davri, A.; Birbas, E.; Kanavos, T.; Ntritsos, G.; Giannakeas, N.; Tzallas, A.T.; Batistatou, A. Deep learning for lung cancer diagnosis, prognosis and prediction using histological and cytological images: A systematic review. Cancers 2023, 15, 3981. [Google Scholar] [CrossRef]
  3. Rigamonti, A.; Viatore, M.; Polidori, R.; Rahal, D.; Erreni, M.; Fumagalli, M.R.; Zanini, D.; Doni, A.; Putignano, A.R.; Bossi, P.; et al. Integrating AI-Powered Digital Pathology and Imaging Mass Cytometry Identifies Key Classifiers of Tumor Cells, Stroma, and Immune Cells in Non–Small Cell Lung Cancer. Cancer Res. 2024, 84, 1165–1177. [Google Scholar] [CrossRef]
  4. Kanan, M.; Alharbi, H.; Alotaibi, N.; Almasuood, L.; Aljoaid, S.; Alharbi, T.; Albraik, L.; Alothman, W.; Aljohani, H.; Alzahrani, A.; et al. AI-Driven Models for Diagnosing and Predicting Outcomes in Lung Cancer: A Systematic Review and Meta-Analysis. Cancers 2024, 16, 674. [Google Scholar] [CrossRef] [PubMed]
  5. Miry, A.; Tbouda, M.; Oqbani, K.; Abbaoui, S. Impact of Artificial Intelligence-Assisted Pathology on Patient Healthcare: Literature Review. In Proceedings of the 1st International e-Health Forum—IeHF, INSTICC, Melbourne, Australia, 28 October–1 November 2024; SciTePress: Setúbal, Portugal, 2024; pp. 5–7. [Google Scholar] [CrossRef]
  6. Ahmad, Z.; Rahim, S.; Zubair, M.; Abdul-Ghafar, J. Artificial intelligence (AI) in medicine, current applications and future role with special emphasis on its potential and promise in pathology: Present and future impact, obstacles including costs and acceptance among pathologists, practical and philosophical considerations. A comprehensive review. Diagn. Pathol. 2021, 16, 24. [Google Scholar] [CrossRef]
  7. Zhang, Y.; Liu, H.; Chang, C.; Yin, Y.; Wang, R. Machine learning for differentiating lung squamous cell cancer from adenocarcinoma using Clinical-Metabolic characteristics and 18F-FDG PET/CT radiomics. PLoS ONE 2024, 19, e0300170. [Google Scholar] [CrossRef] [PubMed]
  8. Ghajari, N.E.; Fathi, A. Detection and Classification of Lung Cancer in Histopathology Images Using Deep Learning. J. Comput. Secur. 2024, 11, 19–28. [Google Scholar] [CrossRef]
  9. Campanella, G.; Kumar, N.; Nanda, S.; Singi, S.; Fluder, E.; Kwan, R.; Muehlstedt, S.; Pfarr, N.; Schüffler, P.J.; Häggström, I.; et al. Real-world deployment of a fine-tuned pathology foundation model for lung cancer biomarker detection. Nat. Med. 2025, 31, 3002–3010. [Google Scholar] [CrossRef]
  10. Zheng, Y.; Gindra, R.H.; Green, E.J.; Burks, E.J.; Betke, M.; Beane, J.E.; Kolachalama, V.B. A graph-transformer for whole slide image classification. IEEE Trans. Med. Imaging 2022, 41, 3003–3015. [Google Scholar] [CrossRef] [PubMed]
  11. Rakaee, M.; Tafavvoghi, M.; Ricciuti, B.; Alessi, J.V.; Cortellini, A.; Citarella, F.; Nibid, L.; Perrone, G.; Adib, E.; Fulgenzi, C.A.; et al. Deep learning model for predicting immunotherapy response in advanced Non- Small cell lung cancer. JAMA Oncol. 2025, 11, 109–118. [Google Scholar] [CrossRef]
  12. Kludt, C.; Kludt, C.; Wang, Y.; Ahmad, W.; Bychkov, A.; Fukuoka, J.; Gaisa, N.; Kühnel, M.; Jonigk, D.; Pryalukhin, A.; et al. Next-generation lung cancer pathology: Development and validation of diagnostic and prognostic algorithms. Cell Rep. Med. 2024, 5, 101697. [Google Scholar] [CrossRef]
  13. Carrillo-Perez, F.; Morales, J.C.; Castillo-Secilla, D.; Molina-Castro, Y.; Guillén, A.; Rojas, I.; Herrera, L.J. Non-small-cell lung cancer classification via RNA-Seq and histology imaging probability fusion. BMC Bioinform. 2021, 22, 454. [Google Scholar] [CrossRef]
  14. Meseguer, P.; del Amor, R.; Colomer, A.; Naranjo, V. Foundation Models for Slide-Level Cancer Subtyping in Digital Pathology. In Proceedings of the Decision Science Alliance International Summer Conference, Valencia, Spain, 6–7 June 2024; Springer: Cham, Switzerland, 2024; pp. 190–198. [Google Scholar]
  15. Wang, J.; Mao, Y.; Guan, N.; Xue, C.J. Advances in multiple instance learning for whole slide image analysis: Techniques, challenges, and future directions. arXiv 2024, arXiv:2408.09476. [Google Scholar] [CrossRef]
  16. Campanella, G.; Hanna, M.G.; Geneslaw, L.; Miraflor, A.; Werneck Krauss Silva, V.; Busam, K.J.; Brogi, E.; Reuter, V.E.; Klimstra, D.S.; Fuchs, T.J. Clinical-grade computational pathology using weakly supervised deep learning on whole slide images. Nat. Med. 2019, 25, 1301–1309. [Google Scholar] [CrossRef]
  17. Lu, M.Y.; Williamson, D.F.; Chen, T.Y.; Chen, R.J.; Barbieri, M.; Mahmood, F. Data-efficient and weakly supervised computational pathology on whole-slide images. Nat. Biomed. Eng. 2021, 5, 555–570. [Google Scholar] [CrossRef]
  18. Ilse, M.; Tomczak, J.; Welling, M. Attention-based deep multiple instance learning. In Proceedings of the International Conference on Machine Learning, PMLR, Stockholm, Sweden, 10–15 July 2018; pp. 2127–2136. [Google Scholar]
  19. Ünal, H.T.; Başçiftçi, F. Evolutionary design of neural network architectures: A review of three decades of research. Artif. Intell. Rev. 2022, 55, 1723–1802. [Google Scholar] [CrossRef]
  20. Stanley, K.O.; Clune, J.; Lehman, J.; Miikkulainen, R. Designing neural networks through neuroevolution. Nat. Mach. Intell. 2019, 1, 24–35. [Google Scholar] [CrossRef]
  21. Langer, C.J.; Besse, B.; Gualberto, A.; Brambilla, E.; Soria, J.C. The evolving role of histology in the management of advanced non–small-cell lung cancer. J. Clin. Oncol. 2010, 28, 5311–5320. [Google Scholar] [CrossRef] [PubMed]
  22. Rekhtman, N.; Ang, D.C.; Sima, C.S.; Travis, W.D.; Moreira, A.L. Immunohistochemical algorithm for differentiation of lung adenocarcinoma and squamous cell carcinoma based on large series of whole-tissue sections with validation in small specimens. Mod. Pathol. 2011, 24, 1348–1359. [Google Scholar] [CrossRef]
  23. Bilal, M.; Raza, M.; Altherwy, Y.; Alsuhaibani, A.; Abduljabbar, A.; Almarshad, F.; Golding, P.; Rajpoot, N. Foundation models in computational pathology: A review of challenges, opportunities, and impact. arXiv 2025, arXiv:2502.08333. [Google Scholar] [CrossRef]
  24. Lu, M.Y.; Chen, B.; Williamson, D.F.; Chen, R.J.; Liang, I.; Ding, T.; Jaume, G.; Odintsov, I.; Le, L.P.; Gerber, G.; et al. A visual-language foundation model for computational pathology. Nat. Med. 2024, 30, 863–874. [Google Scholar] [CrossRef]
  25. Ding, T.; Wagner, S.J.; Song, A.H.; Chen, R.J.; Lu, M.Y.; Zhang, A.; Vaidya, A.J.; Jaume, G.; Shaban, M.; Kim, A.; et al. Multimodal whole slide foundation model for pathology. arXiv 2024, arXiv:2411.19666. [Google Scholar] [CrossRef]
  26. Xu, H.; Usuyama, N.; Bagga, J.; Zhang, S.; Rao, R.; Naumann, T.; Wong, C.; Gero, Z.; González, J.; Gu, Y.; et al. A whole-slide foundation model for digital pathology from real-world data. Nature 2024, 630, 181–188. [Google Scholar] [CrossRef]
  27. Saillard, C.; Jenatton, R.; Llinares-López, F.; Mariet, Z.; Cahané, D.; Durand, E.; Vert, J.P. H-Optimus-0. 2024. Available online: https://github.com/bioptimus/releases/tree/main/models/h-optimus/v0 (accessed on 15 June 2025).
  28. Hong, Z.; Xiong, J.; Yang, H.; Mo, Y.K. Lightweight low-rank adaptation vision transformer framework for cervical cancer detection and cervix type classification. Bioengineering 2024, 11, 468. [Google Scholar] [CrossRef]
  29. Rymarczyk, D.; Pardyl, A.; Kraus, J.; Kaczyńska, A.; Skomorowski, M.; Zieliński, B. Protomil: Multiple instance learning with prototypical parts for whole-slide image classification. In Proceedings of the Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Grenoble, France, 19–23 September 2022; Springer: Cham, Switzerland, 2022; pp. 421–436. [Google Scholar]
  30. Lee, J.; Lim, J.; Byeon, K.; Kwak, J.T. Benchmarking pathology foundation models: Adaptation strategies and scenarios. Comput. Biol. Med. 2025, 190, 110031. [Google Scholar] [CrossRef]
  31. Fischer, A.H.; Jacobson, K.A.; Rose, J.; Zeller, R. Hematoxylin and eosin staining of tissue and cell sections. Cold Spring Harb. Protoc. 2008, 2008, pdb-prot4986. [Google Scholar] [CrossRef] [PubMed]
  32. Hatlen, P.; Grønberg, B.H.; Langhammer, A.; Carlsen, S.M.; Amundsen, T. Prolonged survival in patients with lung cancer with diabetes mellitus. J. Thorac. Oncol. 2011, 6, 1810–1817. [Google Scholar] [CrossRef] [PubMed]
  33. Ramnefjell, M.; Aamelfot, C.; Helgeland, L.; Akslen, L.A. Vascular invasion is an adverse prognostic factor in resected non–small-cell lung cancer. Apmis 2017, 125, 197–206. [Google Scholar] [CrossRef]
  34. Oskouei, S.; Valla, M.; Pedersen, A.; Smistad, E.; Dale, V.G.; Høibø, M.; Wahl, S.G.F.; Haugum, M.D.; Langø, T.; Ramnefjell, M.P.; et al. Segmentation of Non-Small Cell Lung Carcinomas: Introducing DRU-Net and Multi-Lens Distortion. J. Imaging 2025, 11, 166. [Google Scholar] [CrossRef] [PubMed]
  35. The Cancer Genome Atlas Research Network; Weinstein, J.N.; Collisson, E.A.; Mills, G.B.; Shaw, K.R.; Ozenberger, B.A.; Ellrott, K.; Shmulevich, I.; Sander, C.; Stuart, J.M. The cancer genome atlas pan-cancer analysis project. Nat. Genet. 2013, 45, 1113–1120. [Google Scholar] [CrossRef]
  36. Oskouei, S. SoroushOskouei/FastPatchFinder: Superfast_Tissue_Segmentor. 2025. Available online: https://zenodo.org/records/17516344 (accessed on 3 November 2025).
  37. Lee, G.; Bae, G.; Zaitlen, B.; Kirkham, J.; Choudhury, R. cuCIM—A GPU Image I/O and Processing Library. 2021. Available online: https://zenodo.org/records/5151998 (accessed on 5 June 2025).
  38. Hu, E.J.; Shen, Y.; Wallis, P.; Allen-Zhu, Z.; Li, Y.; Wang, S.; Wang, L.; Chen, W. LoRA: Low-rank adaptation of large language models. In Proceedings of the International Conference on Learning Representations (ICLR), Virtual, 25–29 April 2022. [Google Scholar]
  39. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar] [CrossRef]
  40. Wightman, R. PyTorch Image Models. 2019. Available online: https://github.com/rwightman/pytorch-image-models (accessed on 20 June 2025).
  41. Oskouei, S.; Pedersen, A.; Valla, M.; Dale, V.G.; Wahl, S.G.F.; Haugum, M.D.; Langø, T.; Ramnefjell, M.P.; Akslen, L.A.; Kiss, G.; et al. OKEN: A Supervised Evolutionary Optimizable Dimensionality Reduction Framework for Whole Slide Image Classification. Bioengineering 2025, 12, 733. [Google Scholar] [CrossRef]
  42. Seabold, S.; Perktold, J. Statsmodels: Econometric and statistical modeling with python. SciPy 2010, 7, 92–96. [Google Scholar]
  43. Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. Pytorch: An imperative style, high-performance deep learning library. Adv. Neural Inf. Process. Syst. 2019, 32, 1–12. [Google Scholar]
  44. Bradski, G. The OpenCV Library. Dr. Dobb’s J. Softw. Tools 2000, 25, 120–123. [Google Scholar]
  45. Van der Walt, S.; Schönberger, J.L.; Nunez-Iglesias, J.; Boulogne, F.; Warner, J.D.; Yager, N.; Gouillart, E.; Yu, T. scikit-image: Image processing in Python. PeerJ 2014, 2, e453. [Google Scholar] [CrossRef] [PubMed]
  46. Clark, K.; Vendt, B.; Smith, K.; Freymann, J.; Kirby, J.; Koppel, P.; Moore, S.; Phillips, S.; Maffitt, D.; Pringle, M.; et al. The Cancer Imaging Archive (TCIA): Maintaining and Operating a Public Information Repository. J. Digit. Imaging 2013, 26, 1045–1057. [Google Scholar] [CrossRef]
  47. Albertina, B.; Watson, M.; Holback, C.; Jarosz, R.; Kirk, S.; Lee, Y.; Rieger-Christ, K.; Lemmerman, J. The Cancer Genome Atlas Lung Adenocarcinoma Collection (TCGA-LUAD); The Cancer Imaging Archive: St. Louis, MO, USA, 2016. [Google Scholar] [CrossRef]
  48. Kirk, S.; Lee, Y.; Kumar, P.; Filippini, J.; Albertina, B.; Watson, M.; Rieger-Christ, K.; Lemmerman, J. The Cancer Genome Atlas Lung Squamous Cell Carcinoma Collection (TCGA-LUSC); The Cancer Imaging Archive: St. Louis, MO, USA, 2016. [Google Scholar] [CrossRef]
  49. Shao, Z.; Bian, H.; Chen, Y.; Wang, Y.; Zhang, J.; Ji, X. Transmil: Transformer based correlated multiple instance learning for whole slide image classification. Adv. Neural Inf. Process. Syst. 2021, 34, 2136–2147. [Google Scholar]
  50. Cheng, H.; Huang, S.; Cai, L.; Xu, Y.; Wang, R.; Zhang, Y. Focus your attention: Multiple instance learning with attention modification for whole slide pathological image classification. IEEE Trans. Circuits Syst. Video Technol. 2025, 35, 5791–5804. [Google Scholar] [CrossRef]
  51. Qiu, P.; Xiao, P.; Zhu, W.; Wang, Y.; Sotiras, A. Sc-mil: Sparsely coded multiple instance learning for whole slide image classification. arXiv 2023, arXiv:2311.00048. [Google Scholar] [CrossRef]
  52. Zhang, H.; Meng, Y.; Zhao, Y.; Qiao, Y.; Yang, X.; Coupland, S.E.; Zheng, Y. Dtfd-mil: Double-tier feature distillation multiple instance learning for histopathology whole slide image classification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 21–24 June 2022; pp. 18802–18812. [Google Scholar]
  53. Wang, X.; Yan, Y.; Tang, P.; Bai, X.; Liu, W. Revisiting multiple instance neural networks. Pattern Recognit. 2018, 74, 15–24. [Google Scholar] [CrossRef]
  54. Piri, J.; Mohapatra, P.; Dey, R.; Acharya, B.; Gerogiannis, V.; Kanavos, A. Literature review on hybrid evolutionary approaches for feature selection. Algorithms 2023, 16, 167. [Google Scholar] [CrossRef]
  55. Khan, S.; Mazhar, T.; Naz, N.S.; Ahmed, F.; Shahzad, T.; Ali, A.; Khan, M.A.; Hamam, H. Advanced Feature Selection Techniques in Medical Imaging—A Systematic Literature Review. Comput. Mater. Contin. 2025, 85, 2347–2401. [Google Scholar] [CrossRef]
  56. Taha, Z.Y.; Abdullah, A.A.; Rashid, T.A. Optimizing feature selection with genetic algorithms: A review of methods and applications. Knowl. Inf. Syst. 2025, 67, 9739–9778. [Google Scholar] [CrossRef]
  57. Liu, X.; Li, J.; Zhao, J.; Cao, B.; Yan, R.; Lyu, Z. Evolutionary neural architecture search and its applications in healthcare. CMES-Comput. Model. Eng. Sci. 2024, 139, 143–185. [Google Scholar] [CrossRef]
  58. Agrawal, U.K.; Panda, N.; Das, D.; Dalai, A.K.; Ramana, B.; Mishra, A. Automated Healthcare Medical Imaging through EOA Optimized Hyperparameter in CNN. Procedia Comput. Sci. 2025, 259, 1106–1114. [Google Scholar] [CrossRef]
  59. Aswathy, S.; Devadhas, G.G.; Kumar, S. MRI brain tumor segmentation using genetic algorithm with SVM classifier. J. Electron. Commun. Eng. 2017, 22–26. [Google Scholar]
  60. Wu, Q.; Zhao, Z.; Chen, M.; Chi, X.; Zhang, B.; Wang, J.; Zhilenkov, A.A.; Chepinskiy, S.A. An RNA evolutionary algorithm based on gradient descent for function optimization. J. Comput. Des. Eng. 2024, 11, 332–357. [Google Scholar] [CrossRef]
Figure 1. This diagram illustrates the data split and distribution. Abbreviations: WSIs: whole slide images; AC: adenocarcinoma; SCC: squamous cell carcinoma; HULC: Haukeland University Lung Cancer; TCGA: The Cancer Genome Atlas.
Figure 2. Overview of the training (A) and inference (B) stages in the proposed evolving prototype-based multiple instance learning (EP-MIL) framework. Abbreviations: WSI: whole slide image.
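To make the two stages in Figure 2 concrete, the sketch below illustrates the core ideas the framework combines: slide-level classification by proximity to class prototypes, and refinement of those prototypes by an evolutionary search. This is an illustrative NumPy sketch only, not the authors' EP-MIL implementation; the (1+λ) Gaussian-mutation strategy, the prototype tensor shape, and all function names are assumptions introduced for illustration.

```python
# Illustrative sketch only: NOT the authors' exact EP-MIL implementation.
# Assumes patch features were already extracted by a (LoRA-tuned) foundation model.
import numpy as np

def slide_score(patch_feats: np.ndarray, prototypes: np.ndarray) -> np.ndarray:
    """Return one proximity score per class for a bag of patch features.

    patch_feats: (n_patches, d) embeddings for one slide.
    prototypes:  (n_classes, n_protos, d) class prototypes.
    """
    diffs = patch_feats[:, None, None, :] - prototypes[None, :, :, :]
    dists = np.linalg.norm(diffs, axis=-1)   # (n_patches, n_classes, n_protos)
    per_patch = dists.min(axis=-1)           # closest prototype per class, per patch
    return -per_patch.mean(axis=0)           # higher score = closer on average

def predict(patch_feats: np.ndarray, prototypes: np.ndarray) -> int:
    return int(np.argmax(slide_score(patch_feats, prototypes)))

def evolve_prototypes(bags, labels, prototypes, generations=50, offspring=8,
                      sigma=0.05, rng=None):
    """Generic (1+lambda) evolution of prototypes maximizing training accuracy."""
    rng = rng or np.random.default_rng(0)

    def fitness(protos):
        preds = [predict(b, protos) for b in bags]
        return float(np.mean(np.array(preds) == np.array(labels)))

    best, best_fit = prototypes, fitness(prototypes)
    for _ in range(generations):
        for _ in range(offspring):
            child = best + sigma * rng.standard_normal(best.shape)  # Gaussian mutation
            fit = fitness(child)
            if fit >= best_fit:                                     # greedy selection
                best, best_fit = child, fit
    return best, best_fit
```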
Figure 3. Overview of the experiments performed. Abbreviations: 1DCNN: 1-dimensional convolutional neural network, CLAM: clustering-constrained attention multiple-instance learning, EP-MIL: evolving prototype-based multiple instance learning.
Figure 4. Images (A–D) show sample explainability assessments. Heatmap colors represent the patch-assigned NSCLC subtypes: blue indicates relative proximity to adenocarcinoma prototypes and red indicates relative proximity to squamous cell carcinoma prototypes, with higher color intensity indicating a smaller distance to the prototype features. Images (a–d) show regions with high heatmap intensity from the same slides. (A,a,B,b) are labeled as squamous cell carcinoma in the TCGA cohort; (C,c,D,d) are adenocarcinomas from the HULC cohort. Abbreviation: NSCLC: non-small cell lung carcinoma.
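The sketch below shows one way the patch-level proximities described in Figure 4 could be rendered as a blue/red heatmap. It is illustrative only and not the authors' visualization code; the relative-proximity formula, the row-major patch grid, and the function name are assumptions.

```python
# Minimal sketch (not the authors' rendering code) of a prototype-proximity heatmap:
# each patch is colored by its closest class (blue = AC, red = SCC), with intensity
# scaled by how much closer it is to that class than to the other.
import numpy as np

def proximity_heatmap(patch_dists_ac, patch_dists_scc, grid_rows, grid_cols):
    """patch_dists_*: (n_patches,) minimum distance of each patch to AC/SCC prototypes,
    ordered row-major over the slide's patch grid (n_patches == grid_rows * grid_cols).
    Returns an (grid_rows, grid_cols, 3) RGB image in [0, 1]."""
    d_ac = np.asarray(patch_dists_ac, dtype=float)
    d_scc = np.asarray(patch_dists_scc, dtype=float)
    # relative proximity in [0, 1]: 1 means much closer to AC, 0 means much closer to SCC
    rel = (d_scc / (d_ac + d_scc + 1e-8)).reshape(grid_rows, grid_cols)
    intensity = np.abs(rel - 0.5) * 2.0           # 0 at the decision boundary, 1 at the extremes
    heat = np.zeros((grid_rows, grid_cols, 3))
    heat[..., 2] = np.where(rel >= 0.5, intensity, 0.0)   # blue channel -> adenocarcinoma
    heat[..., 0] = np.where(rel < 0.5, intensity, 0.0)    # red channel  -> squamous cell carcinoma
    return heat
```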
Figure 5. Image (A) shows a region with the explainability patches overlaid, and image (B) shows the original sample region. The outlined section in (C) shows stroma with pigmented macrophages, and (D) shows adenocarcinoma with solid and cribriform components.
Table 1. Model performance on the internal and external test datasets. The best values are indicated in bold. Abbreviations: 1DCNN: 1-dimensional convolutional neural network, CLAM: clustering-constrained attention multiple-instance learning, EP-MIL: evolving prototype-based multiple instance learning, HULC: Haukeland University Lung Cancer, TCGA: The Cancer Genome Atlas.
| Model | Biobank1 Test Accuracy | Biobank1 Test F1-score | HULC Test Accuracy | HULC Test F1-score | TCGA Test Accuracy | TCGA Test F1-score |
|---|---|---|---|---|---|---|
| 1DCNN | | | | | | |
| H-optimus—zero-shot—whole tissue | 0.766 | 0.688 | 0.825 | 0.822 | 0.583 | 0.515 |
| H-optimus—zero-shot—tumor only | 0.766 | 0.688 | 0.814 | 0.805 | 0.458 | 0.344 |
| H-optimus—finetuned—whole tissue | 0.766 | 0.688 | 0.823 | 0.823 | 0.625 | 0.624 |
| H-optimus—finetuned—tumor only | 0.766 | 0.688 | 0.848 | 0.847 | 0.625 | 0.624 |
| CLAM | | | | | | |
| H-optimus—zero-shot—whole tissue | 0.766 | 0.770 | 0.883 | 0.882 | 0.500 | 0.333 |
| H-optimus—zero-shot—tumor only | 0.766 | 0.770 | 0.802 | 0.798 | 0.520 | 0.378 |
| H-optimus—finetuned—whole tissue | 0.766 | 0.776 | 0.825 | 0.823 | 0.562 | 0.546 |
| H-optimus—finetuned—tumor only | 0.766 | 0.770 | 0.895 | 0.895 | 0.625 | 0.622 |
| EP-MIL | | | | | | |
| H-optimus—zero-shot—whole tissue | 0.733 | 0.700 | 0.488 | 0.388 | 0.500 | 0.333 |
| H-optimus—zero-shot—tumor only | 0.700 | 0.705 | 0.744 | 0.742 | 0.500 | 0.333 |
| H-optimus—finetuned—whole tissue | 0.800 | 0.806 | 0.732 | 0.727 | 0.708 | 0.695 |
| H-optimus—finetuned—tumor only | 0.666 | 0.664 | 0.802 | 0.795 | 0.666 | 0.685 |
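As a reference for how the per-cohort scores in Table 1 can be reproduced from slide-level predictions, a minimal sketch is given below. scikit-learn and macro-averaged F1 are assumptions introduced here for illustration; the paper's exact averaging scheme is not restated in this section.

```python
# Hedged sketch of computing per-cohort accuracy and F1-score from slide-level
# predictions; not necessarily the metric code used by the authors.
from sklearn.metrics import accuracy_score, f1_score

def cohort_metrics(y_true, y_pred):
    """y_true, y_pred: lists of slide labels (e.g., 0 = adenocarcinoma, 1 = squamous cell carcinoma)."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        # macro averaging weights both subtypes equally regardless of class balance (assumption)
        "f1": f1_score(y_true, y_pred, average="macro"),
    }

# Example usage for one hypothetical test cohort
print(cohort_metrics([0, 1, 1, 0, 1], [0, 1, 0, 0, 1]))
```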
Table 2. Cost evaluation of the different models. The lowest costs, representing the best values, are indicated in bold. Preprocessing and feature extraction times are not included. Identical hardware settings were applied for all of the tested methods. Abbreviations: 1DCNN: 1-dimensional convolutional neural network, CLAM: clustering-constrained attention multiple-instance learning, EP-MIL: evolving prototype-based multiple instance learning, RAM: random-access memory, FLOP: floating-point operation.
| Metric | 1DCNN | CLAM | EP-MIL |
|---|---|---|---|
| Training time per epoch or generation (s) | 6.414 | 9.198 | 1.104 |
| Training peak RAM (MB) | 1819.307 | 1547.612 | 199.961 |
| Model size on disk (MB) | 0.780 | 3.030 | 0.685 |
| Total inference time (s) | 0.708 | 0.590 | 0.008 |
| Mean latency per slide (ms) | 0.671 | 1.245 | 0.168 |
| Inference peak RAM (MB) | 1399.529 | 1572.265 | 201.833 |
| FLOPs per single-slide forward pass (ops) | 272,016,064 | 858,427,184 | 73,759 |
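A rough sketch of how per-epoch runtime and peak-RAM figures of this kind might be collected is shown below. It relies only on the Python standard library and is not the authors' measurement harness, which is not reproduced in this section; note also that tracemalloc tracks Python-level allocations only, not GPU or native memory.

```python
# Hedged sketch (assumption: pure-Python/CPU measurement) of collecting the kind of
# per-epoch runtime and peak-RAM numbers reported in Table 2.
import time
import tracemalloc

def measure_epoch(train_one_epoch):
    """train_one_epoch: a zero-argument callable running one training epoch/generation."""
    tracemalloc.start()
    start = time.perf_counter()
    train_one_epoch()
    elapsed_s = time.perf_counter() - start
    _current, peak_bytes = tracemalloc.get_traced_memory()  # peak Python allocations
    tracemalloc.stop()
    return {"epoch_time_s": elapsed_s, "peak_ram_mb": peak_bytes / 1e6}

# Usage with a hypothetical training callable:
# stats = measure_epoch(lambda: model.fit_one_epoch(train_loader))
# print(stats)
```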
