Article

Metric-Based Meta-Learning Approach for Few-Shot Classification of Brain Tumors Using Magnetic Resonance Images

Department of Computer Engineering, Dongguk University, Seoul 04620, Republic of Korea
*
Author to whom correspondence should be addressed.
Electronics 2025, 14(9), 1863; https://doi.org/10.3390/electronics14091863
Submission received: 18 March 2025 / Revised: 25 April 2025 / Accepted: 29 April 2025 / Published: 2 May 2025

Abstract

Brain tumor prediction from magnetic resonance images is an important problem, but it is difficult due to the complexity of brain structure and the variability in tumor appearance. Various machine learning (ML) and deep learning (DL) approaches have been proposed, but current models lack adaptability to new tasks and require extensive training on large datasets. To address these issues, a novel meta-learning approach is proposed that enables rapid adaptation with limited data. This paper presents a method that integrates a vision transformer, a metric-based model, and few-shot learning to enhance classification performance. The proposed method begins with preprocessing MRI images, followed by feature extraction using a vision transformer. A metric-based Siamese network enhances the model’s learning, enabling quick adaptation to unseen data and improving robustness. Furthermore, applying a few-shot learning strategy improves performance when training data are limited. The model consistently outperforms the other models developed for comparison and is also evaluated against previously proposed approaches on the same datasets using metrics including accuracy, precision, specificity, recall, and F1-score. The results demonstrate the efficacy of our methodology for brain tumor classification, which has significant implications for enhancing diagnostic accuracy and patient outcomes.

1. Introduction

A brain tumor is a mass of abnormal cells that develops within the brain [1]. Brain tumors include a diverse range of types, which can be classified as either non-cancerous (benign) or cancerous (malignant) based on the nature and progression of the tumor cells [2]. Benign tumors develop at a slow pace and stay confined to their original location without spreading to surrounding tissues. On the other hand, malignant tumors grow quickly and have the ability to invade nearby regions. Malignant brain tumors are further classified into various types, with gliomas, meningiomas, and pituitary tumors being among the most prevalent [3]. According to World Health Organization (WHO) statistics, a brain tumor is considered one of the most common cancers [4]. Therefore, accurate classification of brain tumors using magnetic resonance imaging (MRI) is essential for controlling disease progression and lowering the mortality rate associated with brain tumors [5]. Moreover, the classification of MR images is one of the most fundamental tasks in medical image analysis, helping clinical diagnosis and treatment planning. Neurologists frequently utilize MRI scans to study the anatomy of the brain. They offer axial, coronal, and sagittal images of the brain from three different planes. MRI scans depicting these three anatomical views of the human brain are illustrated in Figure 1.
MRI scans [6] are generally more effective than computed tomography (CT) scans because they avoid ionizing radiation and offer superior contrast. MRI also provides higher spatial and temporal resolution and can be used to detect small vascular anomalies and assess blood flow. Furthermore, T1-weighted (T1), T1 with gadolinium contrast (T1c), T2-weighted (T2), and Fluid-Attenuated Inversion Recovery (FLAIR) are the four most commonly used MRI sequences [7].
For example, a typical dataset contains four distinct volumes or sequences for every patient. Each MRI volume is represented as a 3D array with dimensions 240 × 240 × 155, where 240 × 240 pixels correspond to the resolution of each 2D cross-sectional slice and 155 indicates the number of slices spanning the entire brain. All four sequences are required for the classification of brain tumors since some tumor types are better seen in particular sequences: T1-weighted images are typically used for viewing necrosis, whereas FLAIR and T2-weighted sequences are best for identifying edema [8].
Gliomas are classified by the WHO into grades I to IV based on their level of malignancy. Low-grade gliomas (LGGs), categorized under grades I and II, are slow-growing tumors. Grades III and IV are categorized as high-grade gliomas (HGGs), known for their highly invasive and malignant nature. LGGs generally offer a more favorable prognosis, whereas HGGs are aggressive and more malignant. Tumor structures are typically divided into three concentric regions: the whole tumor (including all regions), the tumor core (excluding edema), and the enhancing tumor (the active region containing necrosis) [9].
In radiology, the use of artificial intelligence (AI) [10] has considerably lowered error rates, surpassing human capabilities. Machine learning (ML) [11] and deep learning (DL) [12] are subfields of AI that assist radiologists in quickly identifying and classifying tumors without the need for surgery, making them highly efficient for classification tasks [13]. Convolutional neural networks (CNNs) are a branch of deep learning widely used in medical imaging [14]. Several CNN models, such as GoogleNet [15], AlexNet [16], SqueezeNet [17], and ShuffleNet [18], are employed for classification tasks [19]. These models have demonstrated considerable identification ability in medical classification tasks, such as the classification of skin lesions [20], breast cancer classification [21], and brain tumor classification [22]. In recent years, computer-aided diagnosis (CAD) systems [23] based on deep learning have demonstrated significant potential in the diagnosis of various types of cancer [24]. However, CAD systems have historically depended on traditional machine learning techniques, which frequently had trouble generalizing well across various datasets [25]. As a result, these conventional techniques emphasize the necessity of meta-learning strategies that can enhance diagnostic precision and adjust to various clinical situations. Additionally, the limited availability of data highlights the need for advanced methods that can effectively learn from small datasets.
Meta-learning involves various methods to improve adaptability by leveraging knowledge (meta-data) gained from previous experiences with limited data. This allows for rapid adjustment to new tasks without requiring a complete retraining process [26]. Often referred to as learning to learn, this concept involves the ability of a model to improve its learning process based on past experiences, thereby optimizing its performance on new tasks [27]. The initial embedding network in meta-learning becomes critical, significantly influencing the model’s performance because each task only has a limited number of samples. The effectiveness of this embedded network is essential to many existing techniques [28].
The classical approaches in meta-learning, particularly related to conventional classification methods, are primarily categorized into three types: metric-based, model-based, and optimization-based meta-learning [29,30,31]. Among these methods, metric-based learning emerges as an effective approach. It has become a superior approach across various domains due to its ability to enhance inter-class separability and intra-class similarity. In this context, a high similarity between two images indicates that they are likely to belong to the same category. Metric-based meta-learning approaches aim to develop a distance metric that accurately measures sample similarity, making them adaptable to new learning tasks [32]. Current metric-based meta-learning techniques predominantly focus on network architectures, including Siamese networks [33], matching networks [34], prototypical networks [35], relation networks [36], and many others [37,38].
Meta-learning for medical image classification involves training a model to quickly adapt to new classification tasks using limited data [39]. During meta-training, the model is exposed to multiple small classification tasks, each derived from different MRI datasets, enabling it to extract transferable patterns. A meta-learning algorithm, such as a Siamese network or a Prototypical network, optimizes the model’s ability to generalize across tasks by updating its parameters efficiently. In meta-testing, the trained model is evaluated on unseen tumor classification tasks with minimal training samples, leveraging its learned adaptability to achieve high accuracy even with limited labeled MRI scans. This approach enhances robustness and generalization in medical image classification, particularly for rare tumor types [40].
Few-shot learning [41,42] aims to develop models that can adapt to unseen tasks with minimal training data. This approach requires the model to classify using only a limited number of samples per class in the training phase. A few-shot learning technique is used to mitigate the risk of overfitting scenarios with low data volume [43]. During testing, a few-shot learning model identifies the distinctive features of an unseen class [44,45,46,47,48].
This paper presents a method that integrates a vision transformer (ViT), a metric-based model, and few-shot learning to enhance classification performance. The MRI images are first preprocessed, followed by feature extraction using a ViT. A metric-based Siamese network improves the model’s learning ability, enabling fast adaptation to new data and improving robustness. Furthermore, incorporating a few-shot learning framework boosts performance, especially when training data are scarce.
The key contributions of this paper are summarized as follows:
  • We introduce a novel metric-based meta-learning approach that is designed for effective few-shot MR image classification tasks with advanced vision transformer techniques.
  • The research highlights that meta-learning algorithms are inherently suited for few-shot learning, enabling models to efficiently learn new tasks with minimal training data, unlike traditional machine learning methods that typically require large datasets.
  • A comprehensive comparative analysis is conducted against multiple state-of-the-art models, demonstrating the proposed framework’s effectiveness in handling data-limited scenarios and enhancing robustness in medical image analysis.
The remainder of this manuscript is organized as follows: Section 2 discusses the background and relevant literature related to medical image classification. Section 3 elaborates on the proposed method. Section 4 outlines the selected medical imaging dataset, evaluation metrics, and experimental setup. Finally, Section 5 provides the conclusion and some directions for future studies.

2. Literature Review

Meta-learning, particularly in few-shot settings, has emerged as a significant and extensively explored area of research. This section summarizes recent deep learning-based approaches for brain MRI classification, including vision transformers, which are relevant to this study.

2.1. Brain MRI Classification

Deep learning-based CNN architectures are widely used to tackle challenges in brain tumor classification [49]. Various techniques have been proposed to enhance their performance, including data augmentation [50], transfer learning [51,52], and ensemble methods [53]. Furthermore, integrating attention mechanisms and hybrid models improves tumor classification accuracy by capturing spatial and temporal features within MRI scans.
Louis et al. developed an efficient automated deep learning approach for brain tumor classification [9]. The presented method used a Figshare dataset of 233 patients, and the images were tested without any prior processing. The outputs from five refined pre-trained models, GoogleNet, AlexNet, ShuffleNet, SqueezeNet, and NASNet-Mobile were utilized in this study. These models demonstrated strong performance in classification tests and could compete with more advanced CNN architectures.
Mohanty et al. [54] proposed a deep learning-based CNN model with four convolutional layers. One of the key innovations in this approach was the feature extraction method. This method aggregated and combined features from all layers. The most prominent and clinically significant feature was the soft attention mechanism, which notably enhanced classification accuracy.
An optimized deep learning model was developed by Sharif et al. [55] for the classification of brain tumors using multimodal imaging data. During the preprocessing phase, a nine-layer CNN model was trained for contrast enhancement through a combination of hybrid division histogram equalization and an ant colony optimization method. A multi-class support vector machine (MC-SVM) was fed the outputs from both techniques after fusing them using a matrix length technique. On the BRATS 2013 [56], BRATS 2015 [57], BRATS 2017 [58], and BRATS 2018 [59] datasets, the developed technique obtained accuracies of 99.06%, 98.76%, 98.18%, and 94.6%, respectively.
Gull et al. [60] proposed a deep learning framework for multi-stage brain tumor classification utilizing a fast bounding-box approach for efficient segmentation. Furthermore, three publicly accessible datasets (Brain Tumor MRI Dataset, Figshare, and REMBRANDT) were utilized with three CNN-based models for the multi-stage classification. The dataset size of the MR images was expanded using a variety of augmentation approaches. The first deep CNN framework to be presented was Classification-1, which divided MR images into two groups: normal and pathological. The second framework, referred to as Classification-2, distinguished three categories of brain cancers: pituitary tumors, gliomas, and meningiomas. Similarly, tumor MR images were classified into four grades by the third framework, referred to as Classification-3.
A deep learning-based framework consisting of transfer learning and ensemble learning was presented by Remzan et al. [61]. The pre-trained models ResNet50 and DenseNet121 were trained on ImageNet. The dataset used to train the models included 7023 MRI images consisting of pituitary tumors, gliomas, meningiomas, and normal brain scans. Two ensemble strategies were applied. The first was the Stacking Ensemble method, which combined SVM, KNN, Extra Trees, and MLP as base learners with Logistic Regression as the meta-learner. The second was the Feature Ensemble method, which merged the two selected features to improve classification performance [61].
Earlier classification approaches have relied on conventional machine learning techniques, which often fail to generalize well, especially when dealing with small datasets. Such models tend to overfit, capturing noise instead of meaningful patterns, which limits their applicability in real-world clinical scenarios where labeled data are often scarce. Moreover, detecting tumors from MRI scans presents unique challenges, such as low contrast between normal and abnormal tissues; variations in tumor shape, size, and location; and the presence of noise and imaging artifacts that can obscure critical features [11,50]. These factors further complicate the learning process and call for more robust, adaptive approaches. This limitation highlights the importance of meta-learning strategies to enhance the overall performance of models.

2.2. Vision Transformer

Vision transformers (ViTs) have emerged as a powerful alternative to traditional convolutional neural networks (CNNs) by leveraging self-attention mechanisms to capture global dependencies across the entire image, rather than relying solely on local feature extraction like CNNs. This ability to handle large-scale datasets efficiently makes ViTs particularly effective for image classification tasks [62].
A computer-aided diagnosis method RanMerFormer was proposed by Wang et al. [63] for brain tumor classification. The backbone model was a vision transformer that had already been trained. The RanMerFormer’s head had a randomized vector functional connection, which enabled quick training. All the simulation results were taken from two publicly available benchmark datasets (Figshare and Brain Tumor MRI Dataset) for classifying brain tumors.
Goceri et al. [64] employed CNN and transformer-based models for glioma classification using histopathological images, noting that vision transformers struggle to capture detailed local features in their early layers even though their global receptive fields extract global features effectively. The results indicated that the proposed method attained an accuracy of 96.75% on The Cancer Genome Atlas (TCGA) dataset.
Krishnan et al. [65] presented a rotation-invariant ViT architecture based on deep learning to classify brain tumor MR images. The Rotation-Invariant Vision Transformer (RViT) used rotated patch embeddings to enhance brain tumor detection capabilities. This study exhibited that the use of rotating patch embeddings significantly enhanced the model’s ability to manage various orientations effectively.
An enhanced transformer model for tumor classification was developed by Srinivas et al. [66], integrating a data-efficient image transformer (DeiT) with the firefly algorithm (FA). The model was designed to effectively capture spatial correlations and structural patterns in brain MRI images using a transformer-based architecture.
Dutta et al. [67] designed a global transformer network (GT-Net) to classify multi-class brain tumor MR images. A Global Transformer Module (GTM) integrated into the backbone network served as the core component of the GT-Net. A generalized self-attention block (GSB) was integrated to improve the extraction of critical tumor lesion features while suppressing less relevant information that captures the feature interdependencies across spatial and channel dimensions.

2.3. Meta-Learning-Based Methods

Meta-learning, often referred to as learning to learn, is an emerging paradigm in machine learning that focuses on developing algorithms capable of rapidly adapting to new tasks with minimal data [68,69]. This capability is particularly valuable in medical image classification, where labeled data are often scarce due to the high costs [70]. Furthermore, meta-learning methods [71] aim to optimize models across multiple tasks rather than a single task, which distinguishes them from traditional machine learning approaches [72]. As research continues to evolve, integrating these methods with advancements in medical imaging technology holds great promise for improving diagnostic accuracy [73].
Ali et al. [74] developed a meta-learning ensemble approach for breast cancer classification using transfer learning. The proposed method was evaluated using the Ultrasound Images of BUSI dataset. The pre-trained CNN models, including Inception V3, ResNet50, and DenseNet121 were utilized to enhance feature extraction. Additionally, data augmentation techniques were applied to expand the dataset.
A metric-based meta-learning HSADML framework was proposed by Verma et al. [75] for brain tumor classification using SphereFace loss. SphereFace loss enhanced class differentiability by embedding features within a hyperspherical manifold. The model leveraged deep metric learning with SphereFace loss to ensure that samples of the same class were closely clustered, while those from different classes were distinctly separated using the Figshare dataset [76].
Chen et al. [77] presented a self-supervised learning (SSL) model for few-shot image classification. Two few-shot classification datasets, Mini ImageNet and CUB, were used. In the first stage, self-supervised learning was utilized to train a large embedding network (AmdimNet) while the second stage applied a meta-learning framework to fine-tune the model in an episodic manner.
A gradient-based MetaMed approach was employed by Singh et al. [78] for few-shot medical image classification. In this work, the Reptile algorithm was employed as the meta-learning model. To address the issue of overfitting, augmentation techniques such as CutOut, MixUp, and CutMix were applied to three datasets: Pap smear [79], BreakHis [80], and ISIC [81]. On the ISIC dataset, MetaMed outperformed classical transfer learning on both the two-way five-shot and three-way five-shot tasks.
A few-shot learning-based meta-model was presented by Sekhar et al. [82] for histopathology image classification. In this work, several meta-learning models, such as MAML, ProtoNet, DeepEMD, SimpleShot, and LaplacianShot were used for classification. Furthermore, four histopathology datasets (TCGA, NCT, LC25000, and CRC-TP) were utilized for a few-shot classification. The evaluation included five-way one-shot, five-way five-shot, and five-way ten-shot scenarios.
Jiang et al. [39] developed a multi-learner-based FSL approach that integrates meta-learning, transfer learning, and metric learning to address various medical image classification challenges. Three learners made up the developed model: a task learner, a metric learner, and an auto-encoder. A dynamic Gaussian disturbance soft label (GDSL) scheme was employed as an effective generalization technique for few-shot classification tasks. The experiments focused on three-class classification tasks using the BLOOD [83], PATH [84], and CHEST [85] datasets.
Işık et al. [86] employed a meta-learning technique to develop a few-shot learning approach for the classification of breast cancer using ultrasound images. A cross-domain meta-testing strategy was applied using the BUSI dataset along with supplementary datasets for meta-training. The presented technique used ProtoNet with 10 shots, with a ResNet50 model serving as the feature extraction backbone.

3. Research Methodology

This study proposes an approach based on a transformer and a meta-learner to classify MR images with limited data. The proposed framework begins with the input of MRI images, which undergo preprocessing before being split into training and testing sets. Few-shot learning settings are applied to handle classification with limited data. After that, a vision transformer (ViT) is utilized as a feature extractor. In the target domain, metric-based meta-learning is performed by comparing feature embeddings of support and query sets to enable accurate classification. The workflow of the proposed framework is illustrated in Figure 2.

3.1. Preprocessing

An essential step in the preprocessing phase is resizing images to a uniform dimension. All images are resized to 224 × 224 pixels using bilinear interpolation to fit the input dimensions of the vision transformer. This setting is crucial because many meta-learning models require input data to have a consistent shape. Apart from resizing, normalization is employed to modify the pixel intensity values of the image, usually scaling them to the range of 0 to 1. This adjustment is particularly crucial in medical imaging datasets such as MRI scans, where pixel intensity values can vary significantly due to differences in acquisition settings. Normalizing these intensity values helps to improve model performance by reducing variability and ensuring consistency across the dataset. Additionally, the images are converted into tensor format to ensure they are in the appropriate format for model training.
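The preprocessing steps described above can be summarized in a short TensorFlow sketch; the function name and the assumption that slices arrive as 8-bit intensity arrays are illustrative rather than taken from the original implementation.

```python
import tensorflow as tf

def preprocess_mri(image):
    """Resize an MRI slice to 224 x 224 with bilinear interpolation, scale intensities
    to [0, 1], and return a float32 tensor ready for the vision transformer."""
    image = tf.convert_to_tensor(image)
    if image.shape.rank == 2:                      # add a channel axis for grayscale slices
        image = image[..., tf.newaxis]
    image = tf.cast(image, tf.float32) / 255.0     # assumes 8-bit input intensities
    image = tf.image.resize(image, (224, 224), method="bilinear")
    return image
```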

3.2. Feature Extraction

The ViT model, originally presented in [87], adapts the transformer architecture for image analysis by processing image patches as tokens, analogous to words in natural language processing, and effectively captures both local and global image features. In this work, the ViT uses only an encoder, whereas the original transformer has both an encoder and a decoder. The input image lies in $\mathbb{R}^{H \times W \times C}$ ($H$: height, $W$: width, $C$: number of channels). It is split into $N$ patches, each of size $P \times P \times C$, where $N = HW/P^{2}$. To preserve spatial information, these patches are then flattened, linearly embedded, and enhanced with positional embeddings. A multilayer perceptron (MLP) head with an extra learnable embedding is used for classification. A transformer encoder with alternating layers of multi-head self-attention and MLP blocks receives the sequence of patch embeddings with positional embeddings (Figure 3).
In the ViT, similar to the use of the [CLS] token in BERT [88], a learnable classification embedding $I_{class}$ is concatenated with the patch embedding sequence. The mathematical foundation of the ViT is defined in Equations (1)–(6) [89]. In Equation (1), the positional embedding matrix $E_{pos} \in \mathbb{R}^{(N+1) \times D}$ consists of learnable parameters that encode spatial information. Each patch embedding $x_p^N$ is derived through a learnable linear projection using a patch embedding matrix $E \in \mathbb{R}^{D_p \times D}$, where $D_p = P^{2} \cdot C$. This matrix transforms each flattened patch (of size $P^{2} \cdot C$) into a $D$-dimensional embedding. The output of this linear projection layer yields $z_0$, which includes the classification token, patch embeddings, and positional embeddings. The positional embeddings maintain the spatial order of the input patches, ensuring that the ViT understands the layout of the image.
In the transformer encoder’s architecture [90], the Multi Self-Attention Dropping (MSAD) function prevents overfitting by randomly dropping some weights during training. During each forward pass, MSAD randomly sets certain outputs of the multi-head self-attention mechanism to zero. After that, a residual connection is applied, resulting in an output $z'_l$ at layer $l$. As seen in Equations (2) and (3), the next block similarly starts with layer normalization (LN), followed by an MLP and a residual connection, producing output $z_l$. LN is a standard operation used in transformer architectures to stabilize and accelerate training by normalizing inputs across the feature dimension. This encoder structure is depicted in Figure 3.
Each transformer block’s MLP consists of two fully connected layers activated by a Gaussian Error Linear Unit (GELU). The final latent representation $y$ of the input image $I$ is obtained by applying layer normalization to the classification token output of the last encoder layer, $z_L^0$, as shown in Equation (4). During both pre-training and fine-tuning, this final latent representation is fed to an MLP head or classification head.
$$z_0 = \left[ I_{class};\; x_p^1 E;\; x_p^2 E;\; \ldots;\; x_p^N E \right] + E_{pos}, \qquad E \in \mathbb{R}^{D_p \times D},\; D_p = P^{2} \cdot C,\; E_{pos} \in \mathbb{R}^{(N+1) \times D} \qquad (1)$$
$$z'_l = \mathrm{MSA}\left( \mathrm{LN}\left( z_{l-1} \right) \right) + z_{l-1}, \qquad l = 1, \ldots, L \qquad (2)$$
$$z_l = \mathrm{MLP}\left( \mathrm{LN}\left( z'_l \right) \right) + z'_l, \qquad l = 1, \ldots, L \qquad (3)$$
$$y = \mathrm{LN}\left( z_L^0 \right) \qquad (4)$$
Multiple self-attention heads are combined in the transformer encoder to generate the output of multi-head self-attention (MSA). Equation (5) provides a mathematical representation of the self-attention process. Here, the query ($Q$), key ($K$), and value ($V$) matrices are derived by applying matrix multiplications to the previous layer’s output $z_{l-1}$. Specifically, the query matrix is formed as $Q = z_{l-1} W_Q$, where $Q \in \mathbb{R}^{(N+1) \times D}$ and $W_Q \in \mathbb{R}^{D \times D}$. Similarly, the key and value matrices are $K = z_{l-1} W_K$ and $V = z_{l-1} W_V$, respectively. The weights $W_Q$, $W_K$, and $W_V$ are learnable parameters within the model.
Within each self-attention head $H \in \mathbb{R}^{(N+1) \times D}$, the query–key products are scaled by the square root of the dimensionality, as shown in Equation (5), which assists in mitigating problems such as vanishing gradients.
$$H = \mathrm{Attention}\left( Q, K, V \right) = \mathrm{softmax}\!\left( \frac{Q K^{T}}{\sqrt{D}} \right) V \qquad (5)$$
$$\mathrm{MSA}\left( Q, K, V \right) = \left[ H_1;\; H_2;\; \ldots;\; H_h \right] W_O \qquad (6)$$
Equation (6) illustrates that the concatenated outputs of the self-attention heads are passed through a linear layer to form the final output of multi-head self-attention (MSA), which lies in $\mathbb{R}^{(N+1) \times D}$. Here, $h$ denotes the total number of self-attention heads, and $W_O$ is a learnable transformation matrix.
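For illustration, the patch embedding of Equation (1) and one encoder layer of Equations (2)–(6) can be sketched with Keras layers as follows. The specific sizes (16 × 16 patches, 768-dimensional embeddings, 12 heads) correspond to the standard ViT-Base configuration and are assumptions rather than settings reported in this work, and the attention-dropout argument loosely stands in for the MSAD behavior described above.

```python
import tensorflow as tf
from tensorflow.keras import layers

class PatchEmbedding(layers.Layer):
    """Split the image into P x P patches, project them to D dimensions, prepend the
    learnable classification token, and add positional embeddings (Equation (1))."""
    def __init__(self, patch_size=16, embed_dim=768, num_patches=196):
        super().__init__()
        self.patch_size = patch_size
        self.proj = layers.Dense(embed_dim)                       # patch embedding matrix E
        self.cls_token = self.add_weight(name="cls", shape=(1, 1, embed_dim),
                                         initializer="zeros")     # I_class
        self.pos_embed = self.add_weight(name="pos", shape=(1, num_patches + 1, embed_dim),
                                         initializer="random_normal")  # E_pos

    def call(self, images):
        p = self.patch_size
        batch = tf.shape(images)[0]
        patches = tf.image.extract_patches(images, sizes=[1, p, p, 1], strides=[1, p, p, 1],
                                           rates=[1, 1, 1, 1], padding="VALID")
        patches = tf.reshape(patches, (batch, -1, p * p * images.shape[-1]))  # flatten patches
        x = self.proj(patches)                                    # x_p E
        cls = tf.repeat(self.cls_token, batch, axis=0)
        return tf.concat([cls, x], axis=1) + self.pos_embed       # z_0

def encoder_block(x, num_heads=12, embed_dim=768, mlp_dim=3072, drop_rate=0.1):
    """One encoder layer: pre-LN multi-head self-attention and a GELU MLP, each followed
    by a residual connection (Equations (2) and (3))."""
    h = layers.LayerNormalization()(x)
    h = layers.MultiHeadAttention(num_heads=num_heads, key_dim=embed_dim // num_heads,
                                  dropout=drop_rate)(h, h)        # MSA with attention dropout
    x = x + h                                                     # z'_l
    h = layers.LayerNormalization()(x)
    h = layers.Dense(mlp_dim, activation="gelu")(h)
    h = layers.Dense(embed_dim)(h)
    return x + h                                                  # z_l
```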

3.3. Siamese Network as Meta-Learner

The Siamese network was initially proposed in the early 1990s by Bromley and LeCun to address the problem of signature verification by treating it as an image-matching task [91]. A Siamese network can be designed with two interconnected subnetworks that leverage contrastive loss. The Siamese subnetworks consist of identical sequences of convolutional layers that share the same weights, as shown in Figure 4. These subnetworks are architecturally identical to the neural networks used in the baseline fine-tuning model for comparison. A pair of images $X_i$ and $X_j$ is input into the two-subnet Siamese network during training. The contrastive loss function [78] for the pair is calculated as follows:
$$L_c\left( X_i, X_j \right) = y \cdot \left\| f_i - f_j \right\|_2 + \left( 1 - y \right) \cdot \max\left( 0,\; m - \left\| f_i - f_j \right\|_2 \right) \qquad (7)$$
Here, $\left\| f_i - f_j \right\|_2$ denotes the Euclidean distance between the two embeddings, $m$ is the margin, $y = 0$ if $X_i$ and $X_j$ are from different classes, and $y = 1$ if they are from the same class. By optimizing this loss function, the network learns to reduce the distance between embeddings of the same class ($y = 1$) and to increase the distance between embeddings of different classes ($y = 0$) up to the defined margin $m$. Since both networks perform the same computation, weight sharing guarantees that two highly similar images are not mapped to significantly different locations in the feature space by their respective subnetworks. The network operates symmetrically, meaning that when the twin networks are presented with two distinct images, the top merging layer computes the same metric as it would if the images were swapped between the twin networks.
The Siamese network is utilized to learn similarities between pairs of feature vectors. The network calculates the L2 distance between these vectors and uses a sigmoid activation function to classify pairs as either similar or dissimilar. This approach leverages metric-based learning, aiming to map similar pairs close together in the feature space while placing dissimilar pairs farther apart. A pair of images is passed into the Siamese network for classification. One image is selected from the support set, which contains labeled examples from known classes, while the other is from the query set, which contains unlabeled images that need to be classified based on their similarity to the support examples. The Siamese network computes the distance between the feature embeddings of the two images using the L2 distance function. If the distance is small, it means the two images are similar. If the distance is large, it means the two images are dissimilar.
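A minimal sketch of this pairing scheme is shown below, assuming the shared embedding network outputs a flat feature vector for a 224 × 224 single-channel input; the builder function and layer names are illustrative.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_siamese_pair_model(embedding_model, input_shape=(224, 224, 1)):
    """Wrap a shared embedding network into a Siamese pair model: both branches reuse the
    same weights, the L2 distance between the two embeddings is computed, and a dense
    layer with a sigmoid turns the distance into a similarity score in [0, 1]."""
    input_a = layers.Input(shape=input_shape)
    input_b = layers.Input(shape=input_shape)
    feat_a = embedding_model(input_a)          # shared weights: the same model is reused
    feat_b = embedding_model(input_b)
    distance = layers.Lambda(
        lambda pair: tf.norm(pair[0] - pair[1], axis=-1, keepdims=True)
    )([feat_a, feat_b])                        # L2 distance between the two embeddings
    similarity = layers.Dense(1, activation="sigmoid")(distance)
    return Model(inputs=[input_a, input_b], outputs=similarity)
```

The resulting model can then be trained with binary cross-entropy over similar (label 1) and dissimilar (label 0) pairs, consistent with the sigmoid-based similarity output described above.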
Figure 4. Structure of Siamese network for metric-based meta-training.

3.4. Few-Shot Classification

During classification, the network follows a few-shot learning setting. This is especially useful in medical imaging, where large annotated datasets can be challenging to obtain. In few-shot learning, a support set contains a few labeled examples from each class, and the query set contains the images to be classified. For each image in the query set, the network computes the distance between the query image’s embedding and all embeddings from the support set (one or more from each class). The class with the smallest distance (highest similarity) is assigned as the predicted class for the query image.
In few-shot learning, the k -way, n -shot problem involves classifying inputs into k classes with n examples per class. For instance, in a four-way, one-shot scenario, the model receives one example from each of the four classes (a total of four examples). The meta-learning process consists of episodes, each including meta-training and meta-update steps.
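The episodic classification rule described above (assigning each query to the class of its nearest support embedding) can be sketched as follows; `embed_fn` stands for the trained feature extractor and is a placeholder name, and the nearest-neighbour rule over all support examples is the assumption for the multi-shot case.

```python
import numpy as np

def classify_queries(embed_fn, support_images, support_labels, query_images):
    """Nearest-support classification for one k-way, n-shot episode: each query image is
    assigned the label of the closest support embedding. `embed_fn` is assumed to return
    NumPy arrays of shape (num_images, D)."""
    support_feats = embed_fn(support_images)                 # (k * n, D)
    query_feats = embed_fn(query_images)                     # (num_queries, D)
    # pairwise Euclidean distances between every query and every support embedding
    dists = np.linalg.norm(query_feats[:, None, :] - support_feats[None, :, :], axis=-1)
    nearest = np.argmin(dists, axis=1)                       # closest support sample per query
    return np.asarray(support_labels)[nearest]               # predicted class labels
```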
A base learner is represented by $y = f(x; \phi)$, where $x$ is the input, $\phi$ represents the neural network weights, and $\alpha$ is the learning rate.
During meta-training, the meta-parameter $\phi$ is used to initialize the classification model $y = f(x; \phi_k)$, where $\phi_k = \phi$, for each task $k$ out of a total of $m$ tasks. Each task is generated from the meta-training dataset $D_{meta\text{-}train}$, and the task-specific parameter $\phi_k$ is independently optimized for each task using the Adam optimizer [90]. Subsequently, the meta-parameter $\phi$ is refined through a meta-update step based on the learned task-specific parameters $\phi_k$. The update rule for this procedure is defined as
$$\phi \leftarrow \phi + \alpha \, \frac{1}{m} \sum_{k=1}^{m} \left( \phi_k - \phi \right) \qquad (8)$$
Here, $D_{meta\text{-}train}$ represents the training split and $D_{meta\text{-}test}$ represents the test split. The weights acquired in Equation (8) are fine-tuned for a small number of iterations $h$ using Adam with learning rate $\alpha$; the $k$-shot task drawn from the training split $D_{meta\text{-}train}$ comprises a total of $nk$ images ($k$ images per class) for each $n$-way task [69]. This yields $\phi'$, which is then used to evaluate the model’s accuracy on the test split $D_{meta\text{-}test}$, containing $n$ images.
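As a worked illustration of the meta-update in Equation (8), the following sketch averages the task-specific parameter offsets and moves the meta-parameters toward them. The Reptile-style interpretation of the averaged update is an assumption, and the list-of-arrays representation mirrors Keras `get_weights()` output; names here are illustrative.

```python
import numpy as np

def meta_update(meta_weights, task_weights_list, alpha=0.001):
    """Meta-update of Equation (8): shift each meta-parameter toward the average of the
    task-adapted parameters, scaled by the meta-learning rate alpha. Both arguments are
    lists of NumPy arrays (one entry per layer weight)."""
    updated = []
    for i, meta_w in enumerate(meta_weights):
        # average offset of the task-specific parameters from the current meta-parameters
        avg_offset = np.mean([task_w[i] - meta_w for task_w in task_weights_list], axis=0)
        updated.append(meta_w + alpha * avg_offset)
    return updated
```

After each round of task-specific adaptation with Adam, a call such as `meta_update(model.get_weights(), [m.get_weights() for m in task_models])` (hypothetical names) would yield the refined meta-parameters to load back into the model.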
During classification, each query sample is compared against a set of reference samples. The classification decision is made based on the similarity scores computed by the Siamese network. To identify the test sample, the model employs a single training sample for each class and pairs of samples from the same class or other classes throughout the training process. For each query, the model identifies the most similar samples from each class and assigns the query to the class of the most similar samples or uses a nearest-neighbor approach for classification. The distance between the two embeddings is passed through a dense layer with a sigmoid activation, which outputs a probability between 0 and 1. If the output is close to 1, it indicates that the two images are similar and likely belong to the same class. If the output is close to 0, it indicates that the images are dissimilar and belong to different classes.

4. Experimental Results

This section outlines the experimental results and a comparative analysis of the proposed method. The developed model was tested on a publicly available MRI dataset comprising real patient brain tumor data, where it exhibited enhanced performance.

4.1. Dataset

In this research, the model was trained on a publicly available dataset of brain MRI scans. The Brain Tumor MRI Dataset combines three datasets—Figshare, Sartaj, and Br35H—which have been widely used in medical imaging and brain tumor classification [91]. Together, these datasets provide a collection of 7023 MRI images, comprising 1645 slices of meningioma tumors, 1621 slices of glioma tumors, and 1757 slices of pituitary tumors. Additionally, 2000 images depict healthy brain scans labeled as the no-tumor class, with these non-tumor images sourced specifically from the Br35H dataset. Moreover, the dataset division for each class as well as the support and query set split are presented in Table 1.
One challenge presented by the dataset is the diversity in image resolution, as it includes MRI scans with differing dimensions, some at 512 × 512 pixels, while others are sized at 256 × 256 pixels. All images are resized to a standard dimension of 224 × 224 pixels in the preprocessing stage to ensure uniformity and compatibility for model training. This resizing step is essential to prevent potential discrepancies due to varying image resolutions. Furthermore, this dataset encompasses four primary MRI modalities, which are T1-weighted, T2-weighted, T1CE (T1-weighted contrast-enhanced), and FLAIR (Fluid-Attenuated Inversion Recovery). Each modality highlights unique structural and functional aspects of the brain, which allows the model to capture comprehensive features associated with different tumor types and healthy brain tissue.
The dataset for the support set is tailored to assist a few-shot learning approach, which is effective for handling scenarios with limited labeled data. The study evaluated the model’s performance under varying levels of data scarcity by organizing the training samples into three categories: 1-shot, 5-shot, and 10-shot settings. This involved randomly selecting 1, 5, or 10 examples per class to form the support set for training. The data used as the query set were reserved solely for evaluation purposes and were completely excluded from the training process. As shown in Table 1, the query set consists of unseen data, providing a reliable basis for evaluating the model’s generalization capability. Consistent shot settings were maintained between the meta-training and meta-testing stages, as any mismatch in the number of shots could degrade performance due to inconsistency in the learning objectives. This setup enabled an effective evaluation of the model’s adaptability to limited training data. The goal is to develop a model that is accurate and capable of generalizing across various MRI modalities and tumor types. The types of brain tumors in the dataset are illustrated in Figure 5 [91].
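A sketch of how one support/query episode could be drawn under these shot settings is given below; the helper name and the uniform random sampling are assumptions, not the exact sampling procedure used in the study.

```python
import numpy as np

def sample_support_query(images, labels, n_shot, n_query, classes):
    """Randomly draw n_shot support and n_query query images per class for one episode,
    mirroring the 1-, 5-, and 10-shot settings described above."""
    rng = np.random.default_rng()
    labels = np.asarray(labels)
    support, support_y, query, query_y = [], [], [], []
    for c in classes:
        idx = rng.permutation(np.where(labels == c)[0])       # shuffle indices of class c
        support.extend(images[i] for i in idx[:n_shot])
        support_y.extend([c] * n_shot)
        query.extend(images[i] for i in idx[n_shot:n_shot + n_query])
        query_y.extend([c] * n_query)
    return support, support_y, query, query_y
```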

4.2. Model Implementation

In this study, the proposed model was implemented using Python (version 3.10.0). For training the model, TensorFlow, an open-source, high-level library, was utilized, which provides a range of powerful tools and APIs for employing neural networks and managing computational workflows.
The experiments were conducted on a Windows 64-bit system equipped with a 9th-generation Intel Core i7 processor and an NVIDIA GeForce RTX 2060 Super GPU supported by 40 GB of RAM. This configuration provided a robust environment for efficiently handling the computational demands of training and validating the model on the dataset of MRI brain images. The GPU is particularly beneficial in accelerating the meta-learning processes, allowing the model to learn from data more effectively and reducing overall training time.

4.3. Training Details

During the training phase, the model was trained for 10 epochs with a batch size of 32. Optimization was performed using the Adam optimizer with an initial learning rate set to 0.001 and a weight decay of 0.01 to promote model generalization. Binary cross-entropy loss was utilized along with a sigmoid activation function to train the models effectively from scratch. Two variations were implemented—baseline and baseline++—with normalization applied to the linear projection layer and feature norms constrained to a constant value prior to L2 normalization.
This approach standardized the feature magnitudes, facilitating consistent learning in the Siamese network framework.
For meta-training, a four-way classification strategy was employed, sampling episodes with four classes for each training iteration. Within each class, k labeled instances were used, where k represents the shot number (1, 5, or 10), to form the support and query sets for few-shot learning tasks. The model’s performance on the validation set was evaluated every 10 epochs to identify the training episodes with the highest accuracy, ensuring the optimization of the best-performing episodes during meta-training. The specific experimental details are presented in Table 2.
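The reported optimization settings translate into a configuration along the following lines; `siamese_model`, the paired training arrays, and the pair labels are placeholder names, and the use of `AdamW` assumes a TensorFlow version (2.11 or later) where decoupled weight decay is available as a built-in optimizer.

```python
import tensorflow as tf

# Optimization settings mirroring the reported setup: Adam with a 0.001 learning rate and
# 0.01 weight decay, binary cross-entropy with a sigmoid output, 10 epochs, batch size 32.
optimizer = tf.keras.optimizers.AdamW(learning_rate=0.001, weight_decay=0.01)
siamese_model.compile(optimizer=optimizer,
                      loss="binary_crossentropy",
                      metrics=["accuracy"])
history = siamese_model.fit([train_pairs_a, train_pairs_b], train_pair_labels,
                            validation_data=([val_pairs_a, val_pairs_b], val_pair_labels),
                            epochs=10, batch_size=32)
```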
The results, presented in Figure 6 and Figure 7, show training and validation accuracy as well as loss over epochs for 10-shot and 5-shot tasks. The training accuracy demonstrates a steady improvement, while validation accuracy fluctuates, indicating potential challenges in generalization. This validates the effectiveness of the experimental setup in optimizing few-shot learning for generalization to unseen data.

4.4. Evaluation Metrics

The effectiveness of image classification is measured using performance metrics including accuracy, specificity, sensitivity, precision, and F1-score. Equations (9)–(13) provide the formulas for these performance metrics.
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (9)$$
$$\mathrm{Specificity} = \frac{TN}{TN + FP} \qquad (10)$$
$$\mathrm{Sensitivity\ (Recall)} = \frac{TP}{TP + FN} \qquad (11)$$
$$\mathrm{Precision} = \frac{TP}{TP + FP} \qquad (12)$$
$$\mathrm{F1\text{-}Score} = \frac{2 \times \mathrm{Recall} \times \mathrm{Precision}}{\mathrm{Recall} + \mathrm{Precision}} \qquad (13)$$
In this context, True Positive (TP) refers to instances where the model correctly classifies an image as belonging to the positive class. True Negative (TN) describes instances where the model correctly identifies an image as not belonging to the positive class, assigning it to the negative class. A False Positive (FP) occurs when the model incorrectly classifies an image as positive when it belongs to a different class. In contrast, a False Negative (FN) happens when the model incorrectly labels an image as negative, failing to recognize it as positive. These metrics are derived from the confusion matrix and offer detailed insights into the model’s performance and error distribution across all categories.
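As a concrete illustration, the quantities above can be computed per class in a one-vs-rest manner and then averaged over the four tumor classes; the function below is a sketch under that assumption, not code from the study.

```python
import numpy as np

def one_vs_rest_metrics(y_true, y_pred, positive_class):
    """Compute the metrics of Equations (9)-(13) for one class treated as positive,
    given integer-encoded true and predicted labels."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == positive_class) & (y_true == positive_class))
    tn = np.sum((y_pred != positive_class) & (y_true != positive_class))
    fp = np.sum((y_pred == positive_class) & (y_true != positive_class))
    fn = np.sum((y_pred != positive_class) & (y_true == positive_class))
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    specificity = tn / (tn + fp)
    recall = tp / (tp + fn)                        # sensitivity
    precision = tp / (tp + fp)
    f1 = 2 * recall * precision / (recall + precision)
    return {"accuracy": accuracy, "specificity": specificity,
            "recall": recall, "precision": precision, "f1": f1}
```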
In medical image classification tasks, such as brain tumor detection, relying solely on accuracy can be misleading, particularly in cases of class imbalance. Therefore, metrics such as precision, sensitivity (recall), specificity, and F1-score are essential for evaluating the model’s diagnostic performance. These metrics offer a complete evaluation of the model’s ability to accurately detect tumors while reducing the likelihood of misclassification, which is critical for ensuring reliable outcomes in clinical settings.

4.5. Results

This section analyzes the performance of few-shot classification models utilizing meta-learning algorithms. Five evaluation metrics were used across all experiments, as these are standard measures for assessing classification performance. The model was tested on 1-shot, 5-shot, and 10-shot learning scenarios for a four-way classification problem.
The proposed ViT + Siamese network model integrates a vision transformer to extract high-level features and utilizes a Siamese structure to compute similarity between support and query samples for classification. ViT effectively captures global contextual information from brain MRI images, which is particularly beneficial in data-constrained scenarios, while the Siamese network enables effective comparison between support and query samples based on learned feature embeddings. This combination enhances both feature representation and class discrimination in few-shot learning settings.
Additional architectures evaluated in this study include the ViT alone, the ViT with a matching network, and the ViT with Model-Agnostic Meta-Learning (MAML). All models were trained and tested on the same brain MRI dataset, using consistent 1-shot, 5-shot, and 10-shot few-shot learning settings. Each architecture incorporated the vision transformer as the feature extractor. In the ViT + matching network model, pairwise distance-based classification was performed using cosine similarity between support and query embeddings. In the ViT + MAML model, the ViT was used as the base learner, and MAML’s inner-loop adaptation was applied to fine-tune the model for each task using gradient-based updates.
The comprehensive results of these models are presented in Table 3. Among them, the ViT + Siamese network framework achieved an accuracy of 60.11 ± 0.25 with 10 shots, 58.30 ± 0.09 with 5 shots, and 50.00 ± 0.05 with 1 shot, indicating its effectiveness in few-shot settings. The ViT + Siamese network demonstrated a superior performance for brain tumor classification compared to the other developed models. The complementary strengths of the ViT and the Siamese network are emphasized as key contributors to the effectiveness of the proposed framework. The ViT + matching network achieved the second-best accuracy, slightly behind the ViT + Siamese network, which consistently outperformed the others across all few-shot scenarios. The ViT + MAML model followed, showing moderate performance, while the standalone ViT model achieved the lowest accuracy. These results highlight that metric-based meta-learning approaches, such as Siamese and matching networks, are more effective in few-shot classification settings than optimization-based methods like MAML, especially in the context of brain tumor classification using MRI data.
Table 4 presents a comparison between our previously proposed deep learning approaches [22,60] and the current proposed meta-learning methodology, evaluated on the same brain MRI dataset under few-shot learning settings. The results indicate that the proposed approach outperforms conventional deep learning models, demonstrating improved performance in scenarios with limited training data. This highlights the effectiveness of integrating a meta-learning framework for brain tumor classification when only a few labeled samples are available.
The proposed model integration enhances both feature extraction and classification accuracy within a few-shot learning framework, effectively addressing the challenges posed by limited data. The developed model also demonstrates higher classification accuracy, sensitivity, and specificity, which are critical for medical diagnosis. Furthermore, the model’s generalizability suggests potential applications across other medical imaging domains. The comprehensive results underscore that the proposed model significantly advances current meta-learning approaches, establishing a new benchmark for brain tumor classification. Its effectiveness is evaluated alongside recent advancements in meta-learning [32], particularly those applied in the medical imaging domain. Although no prior studies have utilized meta-learning specifically for brain MRI classification, related research in other medical imaging tasks [39] demonstrates the growing potential of meta-learning in handling data-scarce environments.

5. Conclusions

In this work, a meta-learning strategy was proposed for brain MR image classification. The approach incorporates few-shot learning to enhance generalization with limited data, metric-based learning to optimize training, and a vision transformer (ViT) for effective feature extraction. MR images were first preprocessed, and then features were extracted using ViTs. To further enhance the learning process, a Siamese network was used as a meta-learner, which improved the model’s robustness and enabled quick adaptation to MR images. Furthermore, additional approaches, such as the ViT combined with MAML and a matching network, were also evaluated. The detailed evaluation showed that the proposed method outperformed other developed techniques, with a noticeable improvement in accuracy for 1, 5, and 10 shots, highlighting its efficacy with few annotated samples. Therefore, this work makes a valuable contribution to the emerging field of meta-learning in medical diagnostics. Our suggested method provides an effective and flexible solution in AI-driven medical diagnostics by addressing the drawbacks of current classification strategies.
For future research, focusing on integrating multimodal data, developing innovative data augmentation strategies, enhancing robustness to image variability, and addressing challenges in real-time implementation could significantly advance the field. These efforts collectively aim to improve diagnostic accuracy and efficiency by making optimal use of limited labeled data. This research aims to make a helpful contribution to the community by advancing the application of meta-learning approaches in the medical field.

Author Contributions

Conceptualization, S.G. and J.K.; Data curation, S.G.; Formal analysis, S.G.; Funding acquisition, J.K.; Methodology, S.G.; Software, S.G.; Supervision, J.K.; Visualization, S.G.; Writing—original draft, S.G.; Writing—review and editing, J.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (2021R1A2C2008414), the MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2025-RS-2020-II201789), and the Artificial Intelligence Convergence Innovation Human Resources Development (IITP-2025-RS-2023-00254592) supervised by the IITP (Institute for Information & Communications Technology Planning & Evaluation).

Institutional Review Board Statement

This study utilized publicly available, anonymized data from the Brain Tumor MRI Dataset on Kaggle. According to the Bioethics and Safety Act of South Korea, research using such data is exempt from Institutional Review Board (IRB) review.

Informed Consent Statement

Not applicable.

Data Availability Statement

We utilized a publicly available dataset to train our models. The datasets can be accessed using the following link: https://www.kaggle.com/datasets/masoudnickparvar/brain-tumor-mri-dataset (last accessed on 10 November 2024).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Chinnam, S.K.R.; Sistla, V.; Kolli, V.K.K. Multimodal attention-gated cascaded u-net model for automatic brain tumor detection and segmentation. Biomed. Signal Process. Control 2022, 78, 103907. [Google Scholar]
  2. Deeksha, K.; Deeksha, M.; Girish, A.V.; Bhat, A.S.; Lakshmi, H. Classification of brain tumor and its types using convolutional neural network. In Proceedings of the 2020 IEEE International Conference for Innovation in Technology (INOCON), Bangluru, India, 6–8 November 2020; pp. 1–6. [Google Scholar]
  3. Rasool, M.; Ismail, N.A.; Boulila, W.; Ammar, A.; Samma, H.; Yafooz, W.M.; Emara, A.-H.M. A hybrid deep learning model for brain tumour classification. Entropy 2022, 24, 799. [Google Scholar] [CrossRef] [PubMed]
  4. Wen, P.Y.; Packer, R.J. The 2021 who classification of tumors of the central nervous system: Clinical implications. Neuro-oncology 2021, 23, 1215–1217. [Google Scholar] [CrossRef]
  5. Abd El-Wahab, B.S.; Nasr, M.E.; Khamis, S.; Ashour, A.S. Btc-fcnn: Fast convolution neural network for multi-class brain tumor classification. Health Inf. Sci. Syst. 2023, 11, 3. [Google Scholar] [CrossRef]
  6. Mokri, S.; Valadbeygi, N.; Grigoryeva, V. Diagnosis of glioma, menigioma and pituitary brain tumor using mri images recognition by deep learning in python. EAI Endorsed Trans. Intell. Syst. Mach. Learn. Appl. 2024, 1, 1–9. [Google Scholar]
  7. Ciceri, T.; Casartelli, L.; Montano, F.; Conte, S.; Squarcina, L.; Bertoldo, A.; Agarwal, N.; Brambilla, P.; Peruzzo, D. Fetal brain mri atlases and datasets: A review. NeuroImage 2024, 292, 120603. [Google Scholar] [CrossRef]
  8. Roozpeykar, S.; Azizian, M.; Zamani, Z.; Farzan, M.R.; Veshnavei, H.A.; Tavoosi, N.; Toghyani, A.; Sadeghian, A.; Afzali, M. Contrast-enhanced weighted-t1 and flair sequences in mri of meningeal lesions. Am. J. Nucl. Med. Mol. Imaging 2022, 12, 63. [Google Scholar]
  9. Louis, D.N.; Perry, A.; Wesseling, P.; Brat, D.J.; Cree, I.A.; Figarella-Branger, D.; Hawkins, C.; Ng, H.; Pfister, S.M.; Reifenberger, G. The 2021 who classification of tumors of the central nervous system: A summary. Neuro-oncology 2021, 23, 1231–1251. [Google Scholar] [CrossRef]
  10. Gull, S.; Akbar, S. Artificial intelligence in brain tumor detection through mri scans: Advancements and challenges. Artif. Intell. Internet Things 2021, 1, 241–276. [Google Scholar]
  11. Bakas, S.; Reyes, M.; Jakab, A.; Bauer, S.; Rempfler, M.; Crimi, A.; Shinohara, R.T.; Berger, C.; Ha, S.M.; Rozycki, M. Identifying the best machine learning algorithms for brain tumor segmentation, progression assessment, and overall survival prediction in the brats challenge. arXiv 2018, arXiv:1811.02629. [Google Scholar]
  12. Archana, R.; Jeevaraj, P.E. Deep learning models for digital image processing: A review. Artif. Intell. Rev. 2024, 57, 11. [Google Scholar] [CrossRef]
  13. Nassar, S.E.; Yasser, I.; Amer, H.M.; Mohamed, M.A. A robust mri-based brain tumor classification via a hybrid deep learning technique. J. Supercomput. 2024, 80, 2403–2427. [Google Scholar] [CrossRef]
  14. Mohammed, F.A.; Tune, K.K.; Assefa, B.G.; Jett, M.; Muhie, S. Medical image classifications using convolutional neural networks: A survey of current methods and statistical modeling of the literature. Mach. Learn. Knowl. Extr. 2024, 6, 699–735. [Google Scholar] [CrossRef]
  15. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 1–9. [Google Scholar]
  16. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90. [Google Scholar] [CrossRef]
  17. Iandola, F.N.; Han, S.; Moskewicz, M.W.; Ashraf, K.; Dally, W.J.; Keutzer, K. Squeezenet: Alexnet-level accuracy with 50x fewer parameters and <0.5 mb model size. arXiv 2016, arXiv:1602.07360. [Google Scholar]
  18. Zhang, X.; Zhou, X.; Lin, M.; Sun, J. Shufflenet: An extremely efficient convolutional neural network for mobile devices. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 6848–6856. [Google Scholar]
  19. Chauhan, P.; Lunagaria, M.; Verma, D.K.; Vaghela, K.; Diwan, A.; Patole, S.; Mahadeva, R. Analyzing brain tumour classification techniques: A comprehensive survey. IEEE Access 2024, 12, 136389–136407. [Google Scholar] [CrossRef]
  20. Safdar, K.; Akbar, S.; Gull, S. An automated deep learning based ensemble approach for malignant melanoma detection using dermoscopy images. In Proceedings of the 2021 International Conference on Frontiers of Information Technology (FIT), Islamabad, Pakistan, 13–14 December 2021; pp. 206–211. [Google Scholar]
  21. Mahesh, T.; Vinoth Kumar, V.; Vivek, V.; Karthick Raghunath, K.; Sindhu Madhuri, G. Early predictive model for breast cancer classification using blended ensemble learning. Int. J. Syst. Assur. Eng. Manag. 2024, 15, 188–197. [Google Scholar] [CrossRef]
  22. Gull, S.; Akbar, S.; Khan, H.U. Automated detection of brain tumor through magnetic resonance images using convolutional neural network. BioMed Res. Int. 2021, 2021, 3365043. [Google Scholar] [CrossRef]
  23. Hassan, N.M.; Hamad, S.; Mahar, K. Mammogram breast cancer cad systems for mass detection and classification: A review. Multimed. Tools Appl. 2022, 81, 20043–20075. [Google Scholar] [CrossRef]
  24. Bhardawaj, F.; Jain, S. Cad system design for two-class brain tumor classification using transfer learning. Curr. Cancer Ther. Rev. 2024, 20, 223–232. [Google Scholar] [CrossRef]
  25. Işık, G.; Paçal, İ. Few-shot classification of ultrasound breast cancer images using meta-learning algorithms. Neural Comput. Appl. 2024, 36, 12047–12059. [Google Scholar] [CrossRef]
  26. Gharoun, H.; Momenifar, F.; Chen, F.; Gandomi, A.H. Meta-learning approaches for few-shot learning: A survey of recent advances. ACM Comput. Surv. 2024, 56, 1–41. [Google Scholar] [CrossRef]
  27. Zhang, C.; Cui, Q.; Ren, S. Few-shot medical image classification with maml based on dice loss. In Proceedings of the 2022 IEEE 2nd International Conference on Data Science and Computer Application (ICDSCA), Dalian, China, 28–30 October 2022; pp. 348–351. [Google Scholar]
  28. Huisman, M.; Van Rijn, J.N.; Plaat, A. A survey of deep meta-learning. Artif. Intell. Rev. 2021, 54, 4483–4541. [Google Scholar] [CrossRef]
  29. Fu, M.; Wang, X.; Wang, J.; Yi, Z. Prototype bayesian meta-learning for few-shot image classification. IEEE Trans. Neural Netw. Learn. Syst. 2024, 36, 7010–7024. [Google Scholar] [CrossRef]
  30. Finn, C.; Abbeel, P.; Levine, S. Model-agnostic meta-learning for fast adaptation of deep networks. In Proceedings of the International conference on machine learning, Sydney, Australia, 6–11 August 2017; PMLR. pp. 1126–1135. [Google Scholar]
  31. Yeung, M.; Sala, E.; Schönlieb, C.-B.; Rundo, L. Unified focal loss: Generalising dice and cross entropy-based losses to handle class imbalanced medical image segmentation. Comput. Med. Imaging Graph. 2022, 95, 102026. [Google Scholar] [CrossRef]
  32. Koch, G.; Zemel, R.; Salakhutdinov, R. Siamese neural networks for one-shot image recognition. In Proceedings of the ICML Deep Learning Workshop, Lille, France, 10 July 2015; pp. 1–30. [Google Scholar]
  33. Vinyals, O.; Blundell, C.; Lillicrap, T.; Wierstra, D. Matching networks for one shot learning. Adv. Neural Inf. Process. Syst. 2016, 19, 1–9. [Google Scholar]
  34. Snell, J.; Swersky, K.; Zemel, R. Prototypical networks for few-shot learning. Adv. Neural Inf. Process. Syst. 2017, 30, 1–11. [Google Scholar]
  35. Sung, F.; Yang, Y.; Zhang, L.; Xiang, T.; Torr, P.H.; Hospedales, T.M. Learning to compare: Relation network for few-shot learning. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 1199–1208. [Google Scholar]
  36. Yan, J.; Feng, K.; Zhao, H.; Sheng, K. Siamese-prototypical network with data augmentation pre-training for few-shot medical image classification. In Proceedings of the 2022 2nd International Conference on Frontiers of Electronics, Information and Computation Technologies (ICFEICT), Wuhan, China, 19–21 August 2022; pp. 387–391. [Google Scholar]
  37. Pal, A.; Xue, Z.; Befano, B.; Rodriguez, A.C.; Long, L.R.; Schiffman, M.; Antani, S. Deep metric learning for cervical image classification. IEEE Access 2021, 9, 53266–53275. [Google Scholar] [CrossRef]
  38. Jiang, H.; Gao, M.; Li, H.; Jin, R.; Miao, H.; Liu, J. Multi-learner based deep meta-learning for few-shot medical image classification. IEEE J. Biomed. Health Inform. 2023, 27, 17–28. [Google Scholar] [CrossRef]
  39. Ramanarayanan, S.; Palla, A.; Ram, K.; Sivaprakasam, M. Generalizing supervised deep learning mri reconstruction to multiple and unseen contrasts using meta-learning hypernetworks. Appl. Soft Comput. 2023, 146, 110633. [Google Scholar] [CrossRef]
  40. Sun, Q.; Liu, Y.; Chua, T.-S.; Schiele, B. Meta-transfer learning for few-shot learning. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 403–412. [Google Scholar]
  41. Jia, J.; Feng, X.; Yu, H. Few-shot classification via efficient meta-learning with hybrid optimization. Eng. Appl. Artif. Intell. 2024, 127, 107296. [Google Scholar] [CrossRef]
  42. Pachetti, E.; Colantonio, S. A systematic review of few-shot learning in medical imaging. Artif. Intell. Med. 2024, 156, 102949. [Google Scholar] [CrossRef] [PubMed]
  43. Bateni, P.; Barber, J.; Van de Meent, J.-W.; Wood, F. Enhancing few-shot image classification with unlabelled examples. In Proceedings of the 2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 3–8 January 2022; pp. 2796–2805. [Google Scholar]
  44. Shin, J.; Kang, Y.; Jung, S.; Choi, J. Active instance selection for few-shot classification. IEEE Access 2022, 10, 133186–133195. [Google Scholar] [CrossRef]
  45. Valero-Mas, J.J.; Gallego, A.J.; Rico-Juan, J.R. An overview of ensemble and feature learning in few-shot image classification using siamese networks. Multimed. Tools Appl. 2024, 83, 19929–19952. [Google Scholar] [CrossRef]
  46. Zeng, W.; Xiao, Z.-Y. Few-shot learning based on deep learning: A survey. Math. Biosci. Eng. 2024, 21, 679–711. [Google Scholar] [CrossRef]
  47. Liu, Q.; Tian, Y.; Zhou, T.; Lyu, K.; Xin, R.; Shang, Y.; Liu, Y.; Ren, J.; Li, J. A few-shot disease diagnosis decision making model based on meta-learning for general practice. Artif. Intell. Med. 2024, 147, 102718. [Google Scholar] [CrossRef]
  48. Al-Khuzaie, M.I.M.; Al-Jawher, W.A.M. Enhancing brain tumor classification with a novel three-dimensional convolutional neural network (3d-cnn) fusion model. J. Port Sci. Res. 2024, 7, 254–267. Available online: https://jport.co/index.php/jport/article/view/255 (accessed on 12 December 2024). [CrossRef]
  49. Pereira, S.; Pinto, A.; Alves, V.; Silva, C.A. Brain tumor segmentation using convolutional neural networks in mri images. IEEE Trans. Med. Imaging 2016, 35, 1240–1251. [Google Scholar] [CrossRef]
  50. Shamshad, N.; Sarwr, D.; Almogren, A.; Saleem, K.; Munawar, A.; Rehman, A.U.; Bharany, S. Enhancing brain tumor classification by a comprehensive study on transfer learning techniques and model efficiency using mri datasets. IEEE Access 2024, 12, 100407–100418. [Google Scholar] [CrossRef]
  51. Khaliki, M.Z.; Başarslan, M.S. Brain tumor detection from images and comparison with transfer learning methods and 3-layer cnn. Sci. Rep. 2024, 14, 2664. [Google Scholar] [CrossRef]
  52. Laroui. A hybrid machine learning method for image classification. Int. J. Comput. Digit. Syst. 2024, 15, 1–16. [Google Scholar]
  53. Mohanty, B.C.; Subudhi, P.K.; Dash, R.; Mohanty, B. Feature-enhanced deep learning technique with soft attention for mri-based brain tumor classification. Int. J. Inf. Technol. 2024, 16, 1617–1626. [Google Scholar] [CrossRef]
  54. Sharif, M.I.; Li, J.P.; Khan, M.A.; Kadry, S.; Tariq, U. M3btcnet: Multi model brain tumor classification using metaheuristic deep neural network features optimization. Neural Comput. Appl. 2024, 36, 95–110. [Google Scholar] [CrossRef]
  55. Ghaffari, M.; Sowmya, A.; Oliver, R. Automated brain tumor segmentation using multimodal brain scans: A survey based on models submitted to the brats 2012–2018 challenges. IEEE Rev. Biomed. Eng. 2020, 13, 156–168. [Google Scholar] [CrossRef]
  56. Menze, B.H.; Jakab, A.; Bauer, S.; Kalpathy-Cramer, J.; Farahani, K.; Kirby, J.; Burren, Y.; Porz, N.; Slotboom, J.; Wiest, R.; et al. The multimodal brain tumor image segmentation benchmark (brats). IEEE Trans. Med. Imaging 2015, 34, 1993–2024. [Google Scholar] [CrossRef]
  57. University of Pennsylvania. Multimodal Brain Tumor Segmentation Challenge 2017. 2017. Available online: https://www.med.upenn.edu/sbia/brats2017.html (accessed on 12 March 2025).
  58. University of Pennsylvania. Multimodal Brain Tumor Segmentation Challenge 2018. 2018. Available online: https://www.med.upenn.edu/sbia/brats2018/data.html (accessed on 10 March 2025).
  59. Gull, S.; Akbar, S.; Naqi, S.M. A deep learning approach for multi-stage classification of brain tumor through magnetic resonance images. Int. J. Syst. Technol. 2023, 33, 1745–1766. [Google Scholar] [CrossRef]
  60. Remzan, N.; Tahiry, K.; Farchi, A. Advancing brain tumor classification accuracy through deep learning: Harnessing radimagenet pre-trained convolutional neural networks, ensemble learning, and machine learning classifiers on mri brain images. Multimed. Tools Appl. 2024, 83, 82719–82747. [Google Scholar] [CrossRef]
  61. Reddy, C.K.K.; Reddy, P.A.; Janapati, H.; Assiri, B.; Shuaib, M.; Alam, S.; Sheneamer, A. A fine-tuned vision transformer based enhanced multi-class brain tumor classification using mri scan imagery. Front. Oncol. 2024, 14, 1400341. [Google Scholar] [CrossRef]
  62. Wang, J.; Lu, S.-Y.; Wang, S.-H.; Zhang, Y.-D. Ranmerformer: Randomized vision transformer with token merging for brain tumor classification. Neurocomputing 2024, 573, 127216. [Google Scholar] [CrossRef]
  63. Goceri, E. Vision transformer based classification of gliomas from histopathological images. Expert Syst. Appl. 2024, 241, 122672. [Google Scholar] [CrossRef]
  64. Krishnan, P.T.; Krishnadoss, P.; Khandelwal, M.; Gupta, D.; Nihaal, A.; Kumar, T.S. Enhancing brain tumor detection in mri with a rotation invariant vision transformer. Front. Neuroinformatics 2024, 18, 1414925. Available online: https://www.frontiersin.org/journals/neuroinformatics/articles/10.3389/fninf.2024.1414925 (accessed on 15 August 2024). [CrossRef] [PubMed]
  65. Srinivas, B.; Anilkumar, B.; Devi, N.; Aruna, V. A fine-tuned transformer model for brain tumor detection and classification. Multimed. Tools Appl. 2024, 12, 1573–7721. [Google Scholar] [CrossRef]
  66. Dutta, T.K.; Nayak, D.R.; Pachori, R.B. Gt-net: Global transformer network for multiclass brain tumor classification using mr images. Biomed. Eng. Lett. 2024, 14, 1069–1077. [Google Scholar] [CrossRef] [PubMed]
  67. Singh, K.; Malhotra, D. Iram–net model: Image residual agnostics meta-learning-based network for rare de novo glioblastoma diagnosis. Neural Comput. Appl. 2024, 36, 21465–21485. [Google Scholar] [CrossRef]
  68. Rafiei, A.; Moore, R.; Jahromi, S.; Hajati, F.; Kamaleswaran, R. Meta-learning in healthcare: A survey. SN Comput. Sci. 2024, 5, 791. [Google Scholar] [CrossRef]
  69. Lu, L.; Cui, X.; Tan, Z.; Wu, Y. Medoptnet: Meta-learning framework for few-shot medical image classification. IEEE/ACM Trans. Comput. Biol. Bioinform. 2024, 21, 725–736. [Google Scholar] [CrossRef]
  70. Tian, Y.; Zhao, X.; Huang, W. Meta-learning approaches for learning-to-learn in deep learning: A survey. Neurocomputing 2022, 494, 203–223. [Google Scholar] [CrossRef]
  71. Monteiro, J.P.; Ramos, D.; Carneiro, D.; Duarte, F.; Fernandes, J.M.; Novais, P. Meta-learning and the new challenges of machine learning. Int. J. Intell. Syst. 2021, 36, 6240–6272. [Google Scholar] [CrossRef]
  72. Vettoruzzo, A.; Bouguelia, M.R.; Vanschoren, J.; Rögnvaldsson, T.; Santosh, K. Advances and challenges in meta-learning: A technical review. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 4763–4779. [Google Scholar] [CrossRef]
  73. Ali, M.D.; Saleem, A.; Elahi, H.; Khan, M.A.; Khan, M.I.; Yaqoob, M.M.; Khattak, U.F.; Al-Rasheed, A. Breast cancer classification through meta-learning ensemble technique using convolution neural networks. Diagnostics 2023, 13, 2242. [Google Scholar] [CrossRef]
  74. Verma, A.; Singh, V.P. Hsadml: Hyper-sphere angular deep metric based learning for brain tumor classification. In Proceedings of the Satellite Workshops of ICVGIP 2021, Singapore, 27 November 2022; Springer Nature: Singapore, 2022; pp. 105–120. [Google Scholar]
  75. Singh, J. Figshare. J. Pharmacol. Pharmacother. 2011, 2, 138–139. [Google Scholar] [CrossRef] [PubMed]
  76. Chen, D.; Chen, Y.; Li, Y.; Mao, F.; He, Y.; Xue, H. Self-supervised learning for few-shot image classification. In Proceedings of the ICASSP 2021–2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, ON, Canada, 6–11 June 2021; pp. 1745–1749. [Google Scholar] [CrossRef]
  77. Singh, R.; Bharti, V.; Purohit, V.; Kumar, A.; Singh, A.K.; Singh, S.K. Metamed: Few-shot medical image classification using gradient-based meta-learning. Pattern Recognit. 2021, 120, 108111. [Google Scholar] [CrossRef]
  78. Jantzen, J.; Norup, J.; Dounias, G.; Bjerregaard, B. Pap-smear benchmark data for pattern classification. Nat. Inspired Smart Inf. Syst. (NiSIS 2005) 2005, 1–9. [Google Scholar]
  79. Spanhol, F.A.; Oliveira, L.S.; Petitjean, C.; Heutte, L. A dataset for breast cancer histopathological image classification. IEEE Trans. Biomed. Eng. 2016, 63, 1455–1462. [Google Scholar] [CrossRef]
  80. Zou, J.; Ma, X.; Zhong, C.; Zhang, Y. Dermoscopic image analysis for isic challenge 2018. arXiv 2018, arXiv:1807.08948. [Google Scholar]
  81. Sekhar, A.; Gupta, R.K.; Sethi, A. Few-shot histopathology image classification: Evaluating state-of-the-art methods and unveiling performance insights. Comput. Vis. Pattern Recognit. 2024, 2408, 13816. [Google Scholar]
  82. Acevedo, A.; Merino, A.; Alférez, S.; Molina, Á.; Boldú, L.; Rodellar, J. A dataset of microscopic peripheral blood cell images for development of automatic recognition systems. Data Brief 2020, 30, 105474. [Google Scholar] [CrossRef]
  83. Kather, J.N.; Krisam, J.; Charoentong, P.; Luedde, T.; Herpel, E.; Weis, C.-A.; Gaiser, T.; Marx, A.; Valous, N.A.; Ferber, D.; et al. Predicting survival from colorectal cancer histology slides using deep learning: A retrospective multicenter study. PLoS Med. 2019, 16, e1002730. [Google Scholar] [CrossRef]
  84. Wang, X.; Peng, Y.; Lu, L.; Lu, Z.; Bagheri, M.; Summers, R.M. Chestx-ray8: Hospital-scale chest x-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017. [Google Scholar]
  85. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
  86. Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv 2018, arXiv:1810.04805. [Google Scholar]
  87. Tummala, S.; Kadry, S.; Bukhari, S.A.; Rauf, H.T. Classification of brain tumor from magnetic resonance imaging using vision transformers ensembling. Curr. Oncol. 2022, 29, 7498–7511. [Google Scholar] [CrossRef] [PubMed]
  88. Zhou, H.; Huang, S.; Xu, Y. Inceptr: Micro-expression recognition integrating inception-cbam and vision transformer. Multimed. Syst. 2023, 29, 3863–3876. [Google Scholar] [CrossRef]
  89. Bromley, J.; Guyon, I.; LeCun, Y.; Säckinger, E.; Shah, R. Signature verification using a” siamese” time delay neural network. Adv. Neural inf. Process. Syst. 1993, 6, 737–744. [Google Scholar] [CrossRef]
  90. Kingma, D.P.J.B. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
  91. Nickparvar, M. Brain Tumor Mri Dataset. 2021. Available online: https://www.kaggle.com/datasets/masoudnickparvar/brain-tumor-mri-dataset?select=Training (accessed on 24 November 2024).
Figure 1. MR images of three types of brain tumors viewed from three distinct planes: axial, coronal, and sagittal.
Figure 2. Workflow diagram of proposed framework.
Figure 3. Vision transformer model for feature extraction of MR images. [* is used to indicate image classes].
Figure 5. Classes of brain tumors in the dataset.
Figure 6. Accuracy and loss curves for Siamese network with 10-shot learning.
Figure 7. Accuracy and loss curves for Siamese network with 5-shot learning.
Table 1. Division of MR images dataset for each tumor category.

| Dataset Classes  | Support Set | Query Set | Total Images of Each Class |
| ---------------- | ----------- | --------- | -------------------------- |
| Glioma Tumor     | 1321        | 300       | 1621                       |
| Meningioma Tumor | 1339        | 306       | 1645                       |
| Pituitary Tumor  | 1457        | 300       | 1757                       |
| No Tumor         | 1595        | 405       | 2000                       |
| Total Images     | 5712        | 1311      | 7023                       |
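The support/query split in Table 1 is what episodic few-shot training draws from: each episode samples N classes, K support images, and a batch of query images per class. The snippet below is a minimal sketch of that sampling step under the 4-way setting used in this paper; the function name and the class-indexed pool structure are illustrative assumptions, not the authors' code.

```python
import random

def sample_episode(support_pool, query_pool, n_way=4, k_shot=5, n_query=15):
    """Draw one N-way K-shot episode from class-indexed image pools.

    support_pool / query_pool: dict mapping class name -> list of image paths,
    mirroring the support/query split of Table 1 (structure assumed for illustration).
    """
    classes = random.sample(list(support_pool.keys()), n_way)
    support, query = [], []
    for label, cls in enumerate(classes):
        support += [(p, label) for p in random.sample(support_pool[cls], k_shot)]
        query += [(p, label) for p in random.sample(query_pool[cls], n_query)]
    return support, query
```

Because the dataset contains exactly four tumor categories, a 4-way episode always covers all classes; only the support and query images drawn for each class change from episode to episode.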
Table 2. Experimental hyperparameter selections.

| Parameters          | Description       |
| ------------------- | ----------------- |
| Input Image Size    | 224 × 224 pixels  |
| Patch Dimension     | 16                |
| Embedding Size      | 768               |
| Network Depth       | 12 layers         |
| Attention Heads     | 12                |
| MLP Layer Size      | 1024              |
| Batch Capacity      | 32                |
| Dropout Rate        | 0.1               |
| Total Epochs        | 10                |
| Optimization Method | Adam              |
| Regularization      | 0.01 weight decay |
| Learning Rate       | 0.001             |
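To make the Table 2 settings concrete, the sketch below wires them into a PyTorch feature extractor and optimizer. It uses torchvision's ViT-B/16, which matches the 224 × 224 input, 16 × 16 patches, 768-dimensional embeddings, 12 layers, and 12 attention heads listed above; its MLP width is 3072 rather than 1024, so this should be read as an approximate stand-in for the backbone rather than the authors' exact implementation.

```python
import torch
from torch import nn
from torchvision.models import vit_b_16

# ViT-B/16 backbone: 224x224 input, patch size 16, embed dim 768, 12 layers, 12 heads.
encoder = vit_b_16(weights=None)
encoder.heads = nn.Identity()  # strip the classification head; use the ViT as a feature extractor

# Optimizer and training values taken from Table 2.
optimizer = torch.optim.Adam(
    encoder.parameters(),
    lr=1e-3,            # learning rate 0.001
    weight_decay=0.01,  # regularization: 0.01 weight decay
)
batch_size, dropout_rate, num_epochs = 32, 0.1, 10
```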
Table 3. Comparison of results of proposed methodology with other developed methods based on meta-learning.

| Proposed Methods                       | Accuracy         | Specificity      | Sensitivity      | Precision        | F1-Score         |
| -------------------------------------- | ---------------- | ---------------- | ---------------- | ---------------- | ---------------- |
| ViT (4-way 1-shot)                     | 20.50 ± 0.12     | 52.89 ± 0.35     | 19.76 ± 0.06     | 27.43 ± 0.10     | 20.20 ± 0.27     |
| ViT (4-way 5-shot)                     | 31.50 ± 0.45     | 66.32 ± 0.40     | 31.30 ± 0.13     | 36.82 ± 0.20     | 29.63 ± 0.09     |
| ViT (4-way 10-shot)                    | 32.25 ± 0.49     | 66.18 ± 0.48     | 31.79 ± 0.25     | 32.68 ± 0.31     | 27.57 ± 0.17     |
| ViT + MAML (4-way 1-shot)              | 32.50 ± 0.29     | 67.50 ± 0.58     | 32.50 ± 0.18     | 30.78 ± 0.09     | 31.45 ± 0.30     |
| ViT + MAML (4-way 5-shot)              | 37.50 ± 0.35     | 62.50 ± 0.39     | 37.50 ± 0.12     | 38.13 ± 0.23     | 35.85 ± 0.16     |
| ViT + MAML (4-way 10-shot)             | 43.75 ± 0.50     | 69.25 ± 0.67     | 39.75 ± 0.14     | 40.01 ± 0.17     | 43.71 ± 0.13     |
| ViT + matching network (4-way 1-shot)  | 37.12 ± 0.89     | 62.33 ± 0.62     | 35.00 ± 0.15     | 32.63 ± 0.35     | 28.63 ± 0.43     |
| ViT + matching network (4-way 5-shot)  | 39.22 ± 0.60     | 68.58 ± 0.57     | 39.05 ± 0.28     | 42.09 ± 0.48     | 33.75 ± 0.56     |
| ViT + matching network (4-way 10-shot) | 45.62 ± 0.60     | 72.46 ± 0.49     | 44.38 ± 0.37     | 42.92 ± 0.52     | 39.76 ± 0.27     |
| ViT + Siamese network (4-way 1-shot)   | 50.00 ± 0.05     | 75.39 ± 0.20     | 31.03 ± 0.39     | 50.02 ± 0.27     | 39.48 ± 0.35     |
| ViT + Siamese network (4-way 5-shot)   | 58.30 ± 0.09     | **78.60 ± 0.32** | 42.01 ± 0.73     | **62.32 ± 0.16** | 50.19 ± 0.21     |
| ViT + Siamese network (4-way 10-shot)  | **60.11 ± 0.95** | 76.49 ± 0.09     | **47.73 ± 0.05** | 61.79 ± 0.15     | **53.86 ± 0.19** |
Note: Bold values indicate the highest performance scores.
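For reference, the five columns of Table 3 follow from per-class confusion-matrix counts; in a multi-class episode they are commonly macro-averaged over the four classes in a one-vs-rest fashion, and the ± terms would then reflect variation across evaluation episodes. The helper below is a small sketch under that macro-averaging assumption, not the authors' evaluation code.

```python
import numpy as np

def episode_metrics(y_true, y_pred, n_way=4):
    """Accuracy plus macro-averaged specificity, sensitivity, precision, and F1 for one episode."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    accuracy = float((y_true == y_pred).mean())
    spec, sens, prec, f1 = [], [], [], []
    for c in range(n_way):  # one-vs-rest counts for each class
        tp = int(np.sum((y_pred == c) & (y_true == c)))
        fp = int(np.sum((y_pred == c) & (y_true != c)))
        fn = int(np.sum((y_pred != c) & (y_true == c)))
        tn = int(np.sum((y_pred != c) & (y_true != c)))
        s = tp / max(tp + fn, 1)           # sensitivity (recall)
        p = tp / max(tp + fp, 1)           # precision
        spec.append(tn / max(tn + fp, 1))  # specificity
        sens.append(s)
        prec.append(p)
        f1.append(2 * p * s / max(p + s, 1e-8))
    return accuracy, np.mean(spec), np.mean(sens), np.mean(prec), np.mean(f1)
```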
Table 4. Comparative results of previous deep learning approaches and the proposed model on the same dataset.

| Existing Methods                             | Accuracy   | Specificity | Sensitivity | Precision  | F1-Score   |
| -------------------------------------------- | ---------- | ----------- | ----------- | ---------- | ---------- |
| CNN-based model [22]                         |            |             |             |            |            |
| 4-way 1-shot                                 | 39.66%     | 65.13%      | 36.00%      | 42.18%     | 38.84%     |
| 4-way 5-shot                                 | 52.76%     | 62.81%      | **52.35%**  | 54.66%     | 52.48%     |
| 4-way 10-shot                                | 55.00%     | 68.76%      | 51.53%      | 55.51%     | 53.44%     |
| Deep learning model based on CNN layers [60] |            |             |             |            |            |
| 4-way 1-shot                                 | 22.88%     | 49.00%      | 25.00%      | 05.72%     | 09.31%     |
| 4-way 5-shot                                 | 23.34%     | 57.00%      | 25.00%      | 05.84%     | 09.46%     |
| 4-way 10-shot                                | 34.63%     | 68.12%      | 31.50%      | 19.63%     | 23.38%     |
| Proposed method (ViT + Siamese network)      |            |             |             |            |            |
| 4-way 1-shot                                 | 50.00%     | 58.39%      | 31.03%      | 50.02%     | 39.48%     |
| 4-way 5-shot                                 | 58.30%     | **74.60%**  | 42.01%      | **62.32%** | 50.19%     |
| 4-way 10-shot                                | **60.11%** | 70.49%      | 47.73%      | 61.79%     | **53.86%** |
Note: Bold values indicate the highest performance scores.