Article

EMGP-Net: A Hybrid Deep Learning Architecture for Breast Cancer Gene Expression Prediction

by Oumeima Thâalbi and Moulay A. Akhloufi *
Perception, Robotics, and Intelligent Machines (PRIME), Department of Computer Science, Université de Moncton, Moncton, NB E1A 3E9, Canada
* Author to whom correspondence should be addressed.
Computers 2025, 14(7), 253; https://doi.org/10.3390/computers14070253
Submission received: 20 May 2025 / Revised: 24 June 2025 / Accepted: 25 June 2025 / Published: 26 June 2025
(This article belongs to the Special Issue AI in Its Ecosystem)

Abstract

Background: The accurate prediction of gene expression is essential in breast cancer research, but spatial transcriptomics technologies, which measure it directly, remain expensive. Recent studies have therefore combined whole-slide images with spatial transcriptomics data to predict breast cancer gene expression. To this end, we present EMGP-Net, a novel hybrid deep learning architecture that combines two state-of-the-art models, MambaVision and EfficientFormer. Method: EMGP-Net mixes features from both models and uses attention mechanisms followed by fully connected layers. It was first trained on the HER2+ dataset, which contains data from eight patients, using a leave-one-patient-out approach. To assess generalizability, we conducted external validation, training EMGP-Net on the HER2+ dataset and testing it on the STNet dataset, which contains data from 23 patients, and vice versa. We evaluated EMGP-Net’s ability to predict the expression of 250 selected genes. Results: Our model outperformed both EfficientFormer and MambaVision trained separately on the HER2+ dataset, achieving the highest PCC of 0.7903 for the PTMA gene, with the top 14 genes having PCCs greater than 0.7, including other important breast cancer biomarkers such as GNAS and B2M. External validation showed that it also outperformed models retrained with our approach. Conclusions: EMGP-Net’s results surpassed those of existing models, showing that combining advanced models is an effective strategy to improve performance in this task.

1. Introduction

Breast cancer remains one of the most common and deadliest forms of cancer, with hundreds of thousands of deaths and millions of women diagnosed each year, as of 2018 [1]. Despite important advances in early detection, the molecular complexity and heterogeneity of breast cancer pose serious challenges to accurate diagnosis. Histopathology, particularly the use of hematoxylin and eosin (H&E)-stained tissue slides, has been one of the main bases for diagnosing breast cancer. These images provide detailed visual information about tissue structure, making it easier for pathologists to determine the type, grade, and stage of cancer. Histopathological images were initially employed to determine cancer types [2], and several studies have utilized datasets with histopathological samples, such as the BreakHis dataset, the ICIAR 2018 dataset, the BreCaHAD dataset, and others. These studies have contributed to classifying tumors into different subtypes. For example, Al-Jabbar et al. [3], Obayya et al. [4], and Clement et al. [5] developed various approaches to distinguish benign from malignant classes. Other studies went further by identifying additional subclasses of breast cancer, such as those proposed by Bagchi et al. [6], Bhausaheb and Kashyap [7], and Yu et al. [8]. However, these methods primarily focused on morphology and did not capture the molecular details that could better inform treatment decisions, such as gene expression levels, which are critical to understanding tumor behavior.
Spatial transcriptomics (ST) technologies have recently emerged as powerful tools that profile gene expression while preserving the spatial context of the tissue, allowing high-resolution mapping of gene expression across tissue sections and providing valuable insights into the molecular architecture of tumors. However, the high cost of these technologies limits their large-scale use, particularly in clinical settings. This challenge has led to the exploration of alternative approaches, such as predicting gene expression directly from histopathological images, especially whole-slide images (WSIs). Past studies have shown that WSIs contain biologically rich information that can be leveraged to predict gene expression, and many methods have successfully used deep learning models to predict gene expression patterns from WSIs. ST-Net [9], which uses DenseNet-121 to predict 250 genes, was one of the first transfer learning-based approaches to make marked progress in breast cancer gene expression prediction. Afterwards, BrST-Net [10] adapted convolutional neural network (CNN) models, such as EfficientNet-b0, and incorporated an auxiliary network to predict the expression of 250 genes. GeNetFormer [11], our previous approach, integrated several transformer models, including EfficientFormer, to predict 250 genes. HisToGene [12], on the other hand, used attention mechanisms and ViT to predict the expression of 785 genes. Other models, such as SEPAL [13], Hist2ST [14], and THItoGene [15], used graph neural networks (GNNs) and showed interesting results; they not only capture spatial information from individual spots but also use neighboring spots to gain a global perspective. Similarly, PH2ST [16] and HGGEP [17] incorporate hypergraphs, allowing both local and global integration of spatial context.
In the studies reviewed, methods such as ST-Net [9], BrST-Net [10], GeNetFormer [11], EGN [18], and EGGN [19] focus primarily on local prediction, whereas HisToGene [12] adopts a global prediction approach; methods such as SEPAL [13] and THItoGene [15] combine both local and global predictions. Although transformers were initially designed for natural language processing, their ability to model global dependencies has made them highly valuable in computer vision [20]. CNNs are good at detecting local patterns but have trouble with long-range interactions, while transformers use attention mechanisms to capture relationships across the whole image [20]. This makes transformers effective at analyzing complex spatial structures in histopathological slides, where fine-grained detail and large-scale tissue architecture are both important for gene expression prediction. Recently, state-space models (SSMs) have emerged as a valuable alternative to attention mechanisms, offering competitive modeling capabilities at a lower computational cost. MambaVision [21], built on this principle, combines the efficiency of SSMs with the global awareness of attention. It delivered strong performance on large image datasets while being much faster and more memory-efficient than traditional transformers. These properties make MambaVision well suited for high-resolution tasks, such as predicting gene expression from histopathological images.
The goal of this study is to explore advanced hybrid architectures that combine MambaVision and EfficientFormer to predict breast cancer gene expression from WSIs. Building on advances in deep learning-based gene expression prediction from histopathological images, and to address the limitations of earlier state-of-the-art (SOTA) approaches, we propose a novel approach that combines these two SOTA architectures. MambaVision, one of the most recent models in the field, has demonstrated important potential in visual recognition tasks; by leveraging its advanced feature extraction capabilities, our approach aims to capture the complex patterns in histopathological images that can contribute to more accurate gene expression prediction. EfficientFormer, a transformer known for its efficiency and its performance in gene expression prediction tasks, complements MambaVision with strong spatial dependency processing capabilities. Our hybrid model, called EMGP-Net, integrates these two architectures to improve gene expression prediction from WSIs.
For this work, we used two widely used datasets: the HER2+ dataset, which consists of 36 sections from 8 patients, and the STNet dataset, which contains 68 sections from 23 patients. Both datasets have associated ST data. To ensure the generalizability of our model, we employed two validation strategies. First, we used the HER2+ dataset for internal validation. Then, for external validation, we evaluated the model in two ways: one test performed on the HER2+ dataset and another on the STNet dataset, with the model trained on the other dataset in each case.
All tests aimed to predict the expression of 250 genes, as in other methods [9,10,11]. The potential of this hybrid architecture motivated us to further explore the combination of these two SOTA models for gene expression prediction. Our contributions are as follows:
  • Proposing EMGP-Net: We propose a hybrid model combining MambaVision and EfficientFormer to predict gene expression more effectively from WSIs.
  • Performing exhaustive validation: We perform internal and external validation on the HER2+ and STNet datasets to ensure model robustness and generalizability.
  • Demonstrating benefits of hybrid deep learning: We demonstrate the benefits of combining the latest powerful SOTA models and evaluate them on medical tasks, contributing to advances in breast cancer research, particularly in gene expression prediction for diagnosis.

2. Related Work

2.1. CNN-Based Approaches

Recently, several studies have focused on gene expression prediction from WSIs with associated ST data to mitigate the high cost of ST technologies by exploiting histopathological images, which are widely available. For instance, ST-Net [9] was the first approach to combine ST data and WSIs of breast cancer tissues to predict gene expression; it is a CNN-based method built on DenseNet-121. It uses H&E tissue images from the STNet dataset, first introduced in [9], and employs cross-validation. The model achieved the highest median PCC of 0.3400 for the GNAS gene. ST-Net was trained to predict 250 genes and was tested on the external 10x Genomics dataset, as well as on The Cancer Genome Atlas (TCGA) dataset. BrST-Net [10], also a CNN-based approach, integrates a primary network and an auxiliary network (AuxNet). It was evaluated using 10 SOTA models, and the best performing was EfficientNet-b0 with AuxNet. This model was trained and tested on the STNet dataset and achieved the highest PCC of 0.6325 for the B2M gene. The TRIPLEX approach [22], designed to predict the expression of 250 genes, consists of three parts: a target encoder, a global encoder, and a neighbor encoder. The target encoder processes the specific region of interest (target spot) in the tissue using ResNet18 and outputs predictions through a predictor. The global encoder, which uses multiple layers of transformer blocks and an atypical position encoding generator (APEG), captures a broader context by analyzing the entire tissue, while the neighbor encoder focuses on the regions around the target spot, also using ResNet18, with a final fusion layer integrating attention modules. The model was validated on three internal datasets (STNet, HER2+, and cSCC) and externally validated on the Visium dataset of breast cancer patients. On the HER2+ dataset, the model achieved a mean PCC of 0.3140 for all genes and 0.4970 for highly predictive genes. On the STNet dataset, it achieved a mean PCC of 0.3520 for all genes and 0.2060 for highly predictive genes.

2.2. Transformer-Based Approaches

Other methods apply transformer-based architectures, such as the GeNetFormer framework [11], which evaluated eight advanced transformer models: EfficientFormer, FasterViT, BEiT v2, Swin Transformer v2, PyramidViT v2, MobileViT v2, MobileViT, and EfficientViT. These models were applied to predict 250 genes using the STNet dataset. The framework was trained with different image resolutions (224 × 224, 256 × 256) and loss functions (MSELoss, SL1Loss), integrating the models into a comprehensive pipeline. The highest PCC was obtained with the configuration using the MSELoss function, 224 × 224 resolution, and EfficientFormer, which accounted for 9 of the top 10 genes with the highest PCC values. HisToGene [12] focuses on modeling the spatial dependencies of gene expression using multi-head attention and a modified ViT architecture that allows the model to handle heterogeneity. HisToGene’s predictions on the HER2+ dataset achieved the highest mean R of 0.3200 for the GNAS gene, and the model was also evaluated on the human cutaneous squamous cell carcinoma (cSCC) dataset.

2.3. Hybrid Transformer and GNN Approaches

Zeng et al. [14] proposed Hist2ST, based on transformer and GNN architectures, which starts by dividing the images around each spot into patches. These patches are then processed by three components: the ConvMixer extracts internal visual features within the image patches, the transformer captures global spatial dependencies between spots, and the GNN captures the local relationships between neighboring spots. The model was trained on the HER2+ and cSCC datasets using leave-one-out cross-validation and achieved an average PCC of 0.3900 for the top gene FN1. SEPAL [13] predicts gene expression by analyzing tissue images in two stages: local learning and spatial learning. First, it processes each image patch to extract visual features and predict 256 genes. In the second stage, a graph is constructed for each patch and its neighbors, allowing the model to learn from both local features and spatial context; the GNN refines the initial predictions by incorporating spatial relationships between patches. It was trained on the STNet dataset and the 10x Genomics dataset. The THItoGene [15] architecture consists of three parts. First, the image segmentation and position embedding step divides the image into patches corresponding to the location of each spot, embedding the spatial coordinates and aggregating the data for processing. Second, feature extraction is performed by a dynamic convolution module that captures deep molecular features from the patches, with an efficient capsule network improving the model by using self-attention mechanisms to adjust the convolution kernels based on the spatial context. Finally, global modeling is performed by the ViT module, which integrates the position embeddings and image features, and a graph attention network (GAT) module learns the relationships between adjacent spots. The HER2+ dataset, with 785 selected genes, along with the cSCC dataset, was used for analysis, and leave-one-out cross-validation was applied. On the HER2+ dataset, PCC values of 0.7470, 0.7110, 0.6720, and 0.4520 were achieved for the genes FN1, SCD, IGKC, and FASN, respectively.

2.4. Graph-Based and Relational Modeling Approaches

Approaches such as ErwaNet [23] consist of two modules: the edge relational module (ERM) and the window attention module (WAM). The ERM captures local information by constructing a heterogeneous graph in which each window in the slide image is treated as a node and the relationships between windows are represented by three different types of edges: K-nearest neighbor (KN) edges, percent similarity (PS) edges, and K-nearest similarity (KS) edges. The WAM, on the other hand, captures global information using an attention mechanism that aggregates feature representations from the entire tissue slide. The model was validated on two datasets, the STNet dataset and the 10x Genomics dataset, using cross-validation, with 250 targeted genes selected for prediction. ErwaNet achieved PCC@F, PCC@S, and PCC@M values of 2.3600, 3.5200, and 3.3300 on the STNet dataset and 8.2800, 8.6900, and 8.3600 on the 10x Genomics dataset, respectively. Other approaches are based on hypergraphs. PH2ST [16] is a prompt-based framework designed to handle multi-scale histological features by integrating dual-scale hypergraphs and ViT for both global and local features; it uses a set of known ST values to guide prediction at unmeasured spots. For feature extraction, PH2ST uses UNI, a universal histology image encoder pre-trained on a large corpus of WSIs. It captures both local and neighboring spatial context through dual-scale hypergraph-based spot representations, where each spot is connected to neighboring regions via hypergraph convolution, and a cross-attention mechanism then refines these representations for the final prediction. The model was applied to the HER2+ dataset, with 785 genes, and to the cSCC datasets. The highest PCC values were obtained for the genes TMSB10, CISD3, CD74, and COL6A2 from the HER2+ dataset: 0.5095, 0.2911, 0.5603, and 0.4091, respectively. The HGGEP architecture [17] consists of a gradient enhancement module (GEM) that enhances the gradients and captures cell morphological information, a ShuffleNet V2 backbone that extracts latent features from histology images, and a convolutional block attention module (CBAM) and ViT to refine the extracted features. The model includes a hypergraph association module (HAM) that captures spatial relationships between different regions in the tissue and uses long short-term memory (LSTM) to model dependencies between features. The HER2+ dataset with 785 genes and the cSCC datasets were used. HGGEP achieved PCC values of 0.637 for GNAS, 0.564 for FASN, 0.652 for MYL12B, and 0.649 for SCD.

2.5. Exemplar-Guided Approaches

Other approaches employ exemplar guidance learning, such as EGN [18] and its enhanced version EGGN [19]. EGN uses an exemplar bridging (EB) block and ViT as a backbone. The framework first retrieves the nearest exemplars for each tissue image window and constructs a graph to model spatial relations. It then updates window features using information from the exemplars to predict genes. EGGN constructs visual similarity graphs with the exemplars, which are then processed by a GraphSAGE-based backbone. Both models are based on the idea that images with similar visual features will have similar gene expression patterns, regardless of their location within the tissue.

3. Materials and Methods

3.1. Dataset

Two datasets were used in this work:
  • HER2+ is the HER2 (human epidermal growth factor receptor 2)-positive breast cancer dataset investigated in [24]. It was collected from eight patients (A-H) and comprises a total of 36 sections. Each patient is represented by three or six replicates (sections from the same patient), stained with H&E. Each sample is in JPG format and comes with associated ST data. The dataset covers various tissue types, including invasive cancer, breast glands, immune infiltrate, cancer in situ, connective tissue, and adipose tissue.
  • STNet is the fifth edition of the human breast cancer in situ capturing transcriptomics dataset, referred to as the STNet dataset, presented in [9]. It was obtained from 23 patients and contains a total of 68 sections, with 3 sections per patient (except for 1 patient with 2 sections). The images are also stained with H&E, and each sample is in JPG format with corresponding ST data. The subtypes represented in the STNet dataset are luminal A, luminal B, triple negative, HER2 luminal, and non-luminal HER2.
Both datasets include files with the spot coordinates, the count matrices, and the gene names or symbols. Some examples of the datasets are shown in Figure 1.

3.2. Data Pre-Processing and Augmentation

Before the data were used by the model, we prepared them by applying various pre-processing techniques from our earlier work [11] to both datasets. First, we filtered out spots with a total count of less than 1000, following the same approach as in [10,11]. This resulted in a final number of 28,792 spots for the STNet dataset and 11,666 for the HER2+ dataset. We then normalized the gene expression counts, adding 1 as a pseudo-count to avoid zero issues, and applied a log1p transformation (log(1 + x)) to the normalized counts to stabilize variance, reduce skewness, and ensure consistency across studies. Patches were generated according to the coordinates of each spot and, to match the input dimensions expected by the model, were fixed at 224 × 224 × 3. For the STNet dataset, the list of genes with Ensembl identifiers (IDs) was converted to gene symbols using the HUGO Gene Nomenclature Committee (HGNC) database to ensure the use of unique gene symbols, simplifying electronic data retrieval and minimizing ambiguity. Classical data augmentation was also used to diversify the training data, helping the model generalize and avoid overfitting: we applied random horizontal flipping, random vertical flipping, and random 90-degree rotation to each image patch. These techniques encourage the model to learn features independently of the orientation of the images. During testing, we averaged predictions over the eight symmetries of each patch obtained from rotations and reflections.
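To make the pipeline concrete, the following is a minimal sketch of the pre-processing and augmentation steps described above. It assumes a hypothetical (n_spots, n_genes) count matrix and per-spot pixel coordinates; the exact normalization used in our pipeline may differ in detail.

```python
import numpy as np
import torch
from torchvision import transforms

def filter_and_normalize(counts: np.ndarray, min_total: int = 1000) -> np.ndarray:
    """Keep spots with a total count of at least `min_total`, then log-normalize.

    `counts` is a hypothetical (n_spots, n_genes) raw count matrix; dividing by
    per-spot totals before log1p is an assumption of this sketch.
    """
    kept = counts[counts.sum(axis=1) >= min_total]
    normalized = kept / kept.sum(axis=1, keepdims=True)
    return np.log1p(normalized)  # log(1 + x): pseudo-count of 1, stabilizes variance

def extract_patch(wsi: np.ndarray, x: int, y: int, size: int = 224) -> np.ndarray:
    """Crop a size-by-size RGB patch centered on the spot coordinate (x, y)."""
    half = size // 2
    return wsi[y - half:y + half, x - half:x + half, :]

class RandomRot90:
    """Rotate a CHW tensor by a random multiple of 90 degrees."""
    def __call__(self, img: torch.Tensor) -> torch.Tensor:
        k = int(torch.randint(0, 4, (1,)))
        return torch.rot90(img, k, dims=(1, 2))

# Classical augmentation: random horizontal/vertical flips and 90-degree rotations.
train_augment = transforms.Compose([
    transforms.ToTensor(),
    transforms.RandomHorizontalFlip(),
    transforms.RandomVerticalFlip(),
    RandomRot90(),
])
```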

3.3. Proposed Approach

3.3.1. Overview of the EMGP-Net Architecture

In this study, we introduce EMGP-Net, a hybrid deep learning architecture named after its components, EfficientFormer (E) and MambaVision (M), and its purpose, gene expression prediction (GP). The aim is to improve gene expression prediction from WSIs by using recent and robust SOTA models and to improve the feature representations by exploiting the capabilities of both. We followed three steps: the preparation and pre-processing of the data, the training of the model, and, finally, the evaluation. We extracted patches from the WSIs, centered on the spots according to the positions provided in the corresponding ST data, to standardize the inputs before feeding them to the model for training.
EMGP-Net uses MambaVision, an SOTA model known for its significant capacity in image tasks, especially when compared with other SOTA models [21]. In parallel, it integrates EfficientFormer, an efficient vision transformer that has shown good performance in gene expression prediction tasks and achieved the best scores among the transformer models evaluated in the GeNetFormer framework introduced in [11]. We first adapted both models to produce 1024-dimensional feature vectors by replacing their original classification heads with custom linear layers. After extracting the features from both branches, they are stacked and passed to a multi-head attention mechanism. The attention layer takes these stacked vectors as input and learns how to weigh them based on their importance, allowing the model to focus on the most important features of each branch and to summarize the information appropriately. Unlike simple concatenation, which increases dimensionality, or basic averaging without attention, this approach lets the attention mechanism shape the final combination, improving the fusion of information by capturing relevant relationships between the outputs of MambaVision and EfficientFormer. The fused features are then passed through a layer normalization (LN) layer, which helps to stabilize training and improve the generalizability of the model. We then apply the Gaussian error linear unit (GeLU) activation function, which introduces non-linearity and improves learning capacity compared to the traditional ReLU. Finally, the normalized and activated features are passed through two fully connected linear layers: the first reduces the dimensionality from 1024 to 512, and the second outputs predictions for the 250 targeted genes. The final output is a 250-dimensional vector of predicted gene expression values.
To assess the generalizability of our model, training was first conducted with the leave-one-patient-out approach on the HER2+ dataset. We then performed a first external validation on the STNet dataset with the model trained on the whole HER2+ dataset, and a second external validation on the HER2+ dataset with the model trained on the whole STNet dataset. More details are given below, and the EMGP-Net architecture is shown in Figure 2.
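To make the fusion concrete, below is a minimal PyTorch sketch of the EMGP-Net head. The two backbone arguments stand in for the pre-trained MambaVision and EfficientFormer feature extractors, each already adapted (via a custom linear head) to output 1024-dimensional vectors; pooling the two attended tokens by averaging is an assumption of this sketch, not a detail stated above.

```python
import torch
import torch.nn as nn

class EMGPNet(nn.Module):
    """Sketch of the EMGP-Net fusion head (backbones are assumed given)."""
    def __init__(self, mamba_backbone: nn.Module, efficientformer_backbone: nn.Module,
                 feat_dim: int = 1024, n_genes: int = 250, n_heads: int = 8):
        super().__init__()
        self.mamba = mamba_backbone                 # pre-trained MambaVision, adapted to 1024-d output
        self.effformer = efficientformer_backbone  # pre-trained EfficientFormer, adapted to 1024-d output
        self.attn = nn.MultiheadAttention(feat_dim, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(feat_dim)
        self.act = nn.GELU()
        self.fc1 = nn.Linear(feat_dim, 512)   # 1024 -> 512
        self.fc2 = nn.Linear(512, n_genes)    # 512 -> 250 gene predictions

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Stack the two 1024-d branch features as a length-2 "sequence" so the
        # attention layer can weigh the branches against each other.
        feats = torch.stack([self.mamba(x), self.effformer(x)], dim=1)  # (B, 2, 1024)
        fused, _ = self.attn(feats, feats, feats)
        fused = self.act(self.norm(fused.mean(dim=1)))  # pool tokens (assumption), then LN + GeLU
        return self.fc2(self.fc1(fused))                # (B, 250)
```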

3.3.2. MambaVision

MambaVision: Introduced in [21], MambaVision is a hybrid architecture combining Mamba [25] and transformers, designed to improve feature learning, and represents a new SOTA approach trained on the ImageNet dataset. MambaVision combines Mamba blocks with transformer and self-attention layers. The network is divided into four stages; the first two are based on CNN layers that integrate GeLU activation and batch normalization (BN) as follows:
$\hat{z} = \mathrm{GeLU}(\mathrm{BN}(\mathrm{Conv}_{3 \times 3}(z))), \quad z = \mathrm{BN}(\mathrm{Conv}_{3 \times 3}(\hat{z})) + z.$
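A minimal PyTorch sketch of this residual convolutional stage, following the two equations above:

```python
import torch.nn as nn

class ConvStageBlock(nn.Module):
    """Sketch of the residual CNN block in MambaVision's first two stages."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)
        self.act = nn.GELU()

    def forward(self, z):
        z_hat = self.act(self.bn1(self.conv1(z)))  # z_hat = GeLU(BN(Conv3x3(z)))
        return self.bn2(self.conv2(z_hat)) + z     # z = BN(Conv3x3(z_hat)) + z
```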
The remaining two stages use MambaVision and transformer blocks. This approach demonstrated its capacity to capture the global context and long-range spatial dependencies and to handle high-resolution images with faster training and inference than standard transformers. MambaVision provided a better accuracy-throughput balance than other SOTA models such as FasterViT and Swin Transformer v2, the two models also evaluated for gene expression prediction in [11]. It also outperformed other SOTA models such as NextViT, VMamba, FastViT, and Vim on the ImageNet dataset, achieving higher accuracy than both transformer and Mamba models. Figure 3 shows the architecture of the MambaVision block.

3.3.3. EfficientFormer

EfficientFormer: Presented in [26], EfficientFormer is designed for low-latency applications and is based on a pure transformer architecture. Like many other SOTA models, it was trained on the ImageNet dataset. Within each of the four network stages, a number of meta transformer blocks (MB) are implemented to avoid the use of MobileNet components. The following equation shows the relationship between the two components that make up the meta transformer block, the token mixer (TokenMixer) and the multi-layer perceptron (MLP):
$X_{i+1} = \mathrm{MB}_i(X_i) = \mathrm{MLP}(\mathrm{TokenMixer}(X_i)).$
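The following is a minimal sketch of one meta transformer block implementing the equation literally; the average-pooling token mixer is a stand-in, and EfficientFormer's actual blocks also include residual connections and other mixer variants.

```python
import torch.nn as nn

class MetaBlock(nn.Module):
    """Sketch of one meta transformer block: X_{i+1} = MLP(TokenMixer(X_i))."""
    def __init__(self, dim: int, mlp_ratio: int = 4):
        super().__init__()
        # Pooling along the token axis stands in for the paper's token mixers.
        self.token_mixer = nn.AvgPool1d(kernel_size=3, stride=1, padding=1)
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim * mlp_ratio), nn.GELU(), nn.Linear(dim * mlp_ratio, dim))

    def forward(self, x):  # x: (B, tokens, dim)
        mixed = self.token_mixer(x.transpose(1, 2)).transpose(1, 2)  # mix along token axis
        return self.mlp(mixed)
```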
EfficientFormer achieved the best results compared to other SOTA models [26], outperforming several ViT-based architectures, such as DeiT-Small, LeViT-256, and PoolFormer-S24, in terms of accuracy and latency on the ImageNet dataset. Furthermore, in the GeNetFormer gene expression prediction framework, EfficientFormer achieved the best scores, outperforming the seven other evaluated models: FasterViT, BEiT v2, Swin Transformer v2, PyramidViT v2, MobileViT v2, MobileViT, and EfficientViT. Figure 4 shows the architecture of the meta transformer block.

3.3.4. Multi-Head Attention Mechanism

Multi-head attention: As shown in Figure 5, instead of using a single attention function with full-size queries, keys, and values, multi-head attention runs several attention functions in parallel. Each function works with smaller projections of the queries, keys, and values; after each function processes its projection, the results are combined, allowing the model to look at the input from different perspectives at the same time. In the original transformer model, there are eight such attention functions, called heads, each looking at the input in a different way [27]:
$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h) W^O, \quad \text{where } \mathrm{head}_i = \mathrm{Attention}(Q W_i^Q, K W_i^K, V W_i^V).$
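A direct implementation of this equation, with per-head projection matrices passed in explicitly (all shapes are illustrative assumptions):

```python
import torch

def multi_head_attention(Q, K, V, Wq, Wk, Wv, Wo, h: int = 8):
    """Literal implementation of the multi-head attention equation above.

    Wq, Wk, Wv are lists of h per-head projection matrices of shape
    (d_model, d_k); Wo has shape (h * d_k, d_model).
    """
    heads = []
    for i in range(h):
        q, k, v = Q @ Wq[i], K @ Wk[i], V @ Wv[i]                # Q W_i^Q, K W_i^K, V W_i^V
        scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)  # scaled dot-product
        heads.append(torch.softmax(scores, dim=-1) @ v)          # head_i
    return torch.cat(heads, dim=-1) @ Wo                         # Concat(head_1..head_h) W^O
```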
The implementation of our proposed approach is based on the two SOTA models, MambaVision and EfficientFormer, in their versions pre-trained on the ImageNet dataset, to which we added the set of fusion layers described above to improve feature learning.

3.4. Evaluation Metrics

To evaluate the performance of the proposed EMGP-Net architecture, we used three metrics commonly applied in regression tasks. These metrics are described as follows (a computational sketch follows the list):
  • MAE (Mean Absolute Error): Measures the average of the absolute differences between the observed and predicted values:
    $MAE = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|$
  • RMSE (Root Mean Squared Error): Measures the square root of the average of the squared differences between the observed and predicted values:
    $RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}$
  • PCC (Pearson Correlation Coefficient): Measures the linear correlation between the observed and predicted values:
    $PCC = \frac{\sum_{i=1}^{n} (y_i - \bar{y})(\hat{y}_i - \bar{\hat{y}})}{\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2} \, \sqrt{\sum_{i=1}^{n} (\hat{y}_i - \bar{\hat{y}})^2}}$
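These three metrics can be computed per gene as in the following sketch:

```python
import numpy as np

def regression_metrics(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    """Compute MAE, RMSE, and PCC for one gene's observed/predicted values."""
    err = y_true - y_pred
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err ** 2))
    yc, pc = y_true - y_true.mean(), y_pred - y_pred.mean()  # centered series
    pcc = (yc @ pc) / np.sqrt((yc ** 2).sum() * (pc ** 2).sum())
    return {"MAE": mae, "RMSE": rmse, "PCC": pcc}
```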

4. Experimental Results

The EMGP-Net architecture proposed in this research was designed to predict the expression of 250 genes, selected based on their highest mean expression in the dataset. (The lists of the 250 genes from the HER2+ and STNet datasets can be found in Appendix A.) In this study, we focus on presenting the top 14 predicted genes among the 250. First, we evaluated our model using a leave-one-patient-out approach on the HER2+ dataset. To assess the model’s ability to generalize, we performed external validation in two ways: in the first test, the model was trained on the entire HER2+ dataset and tested on the entire STNet dataset; in the second, the model was trained on the entire STNet dataset and tested on the entire HER2+ dataset. Each evaluation was repeated for all 8 patients in the HER2+ dataset and all 23 patients in the STNet dataset, and we then selected the highest PCC value across all patients for each gene. Image dimensions were fixed at 224 × 224 pixels as input to the model, and we used the MSELoss function as the training objective, monitoring the MAE and RMSE metrics. The equation for the MSELoss is as follows:
$MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2.$
The training was performed on eight NVIDIA GeForce RTX 3090 GPUs with 24 GB of memory each (https://www.nvidia.com/, accessed on 20 May 2025). We selected a batch size of 768, and the entire script was implemented using the PyTorch library. The learning rate and weight decay were both set to $10^{-4}$. In the following section, we present the results of our EMGP-Net model.
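A training-loop sketch with these hyperparameters is shown below; the optimizer choice (AdamW), the epoch count, and the dataset and backbone objects are assumptions, and `EMGPNet` refers to the fusion-head sketch given earlier.

```python
import torch
from torch.utils.data import DataLoader

# Reported settings: batch size 768, learning rate and weight decay both 1e-4,
# MSELoss, PyTorch. The optimizer (AdamW), the epoch count, and the
# `train_dataset` / backbone objects are assumptions of this sketch.
model = EMGPNet(mamba_backbone, efficientformer_backbone).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=1e-4)
criterion = torch.nn.MSELoss()
loader = DataLoader(train_dataset, batch_size=768, shuffle=True)

for epoch in range(50):  # epoch count not reported; placeholder
    for patches, expression in loader:  # (B, 3, 224, 224) patches, (B, 250) targets
        optimizer.zero_grad()
        loss = criterion(model(patches.cuda()), expression.cuda())
        loss.backward()
        optimizer.step()
```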
We applied the Wilcoxon signed-rank test to evaluate whether the observed performance improvements of EMGP-Net over the other models were reliable. This non-parametric test is well suited to our study because it handles paired comparisons without assuming that the differences follow a normal distribution. The Wilcoxon signed-rank test statistic $W$ is computed as follows:
$W = \sum_{i=1}^{n} R_i \cdot \operatorname{sign}(d_i),$
where $d_i = x_i - y_i$ is the difference in PCC between EMGP-Net and the other model for gene $i$, $R_i$ is the rank of $|d_i|$ among all non-zero differences, and $\operatorname{sign}(d_i)$ indicates the direction of the difference.
The p-values resulting from this analysis are reported in the last rows of Table 1, Table 2 and Table 3. Values below the common threshold of 0.05 were considered statistically significant, indicating that EMGP-Net achieved better gene prediction scores for most of the genes tested.
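In practice, the test can be run on the paired per-gene PCC vectors, for example with SciPy; the file names below are hypothetical, and note that SciPy reports the standard rank-sum form of the statistic.

```python
import numpy as np
from scipy.stats import wilcoxon

# Hypothetical files holding one PCC per predicted gene (length-250 arrays)
# for EMGP-Net and a baseline model on the same genes.
pcc_emgp = np.load("pcc_emgp_net.npy")
pcc_base = np.load("pcc_baseline.npy")

stat, p_value = wilcoxon(pcc_emgp, pcc_base)  # paired, non-parametric
print(f"W = {stat:.1f}, p = {p_value:.4g}")   # p < 0.05 -> improvement unlikely due to chance
```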

4.1. Model Trained on the HER2+ Dataset

First, we present the results obtained using the leave-one-patient-out approach with the HER2+ dataset. Table 1 shows the top 14 genes with the highest PCC. We compared four models: EfficientFormer, MambaVision, and two versions of our proposed model, EMGP-Net, which includes an attention mechanism, and EMGP-Net-noAttn, which does not. Both EMGP-Net variants combine EfficientFormer and MambaVision along with the set of added layers described previously. EMGP-Net gave better results in most cases.
Comparing EMGP-Net-noAttn with EMGP-Net demonstrated the impact of the attention mechanism. Although they share the same backbone combination, EMGP-Net uses attention to weigh and combine features, while EMGP-Net-noAttn simply concatenates them and applies fully connected layers. EMGP-Net outperformed EMGP-Net-noAttn on nearly all genes, confirming that attention-based fusion enhances the model’s ability to focus on the most informative representations, thus improving prediction. A sketch of the concatenation-only variant is shown below.
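For comparison with the attention-based fusion sketched earlier, the concatenation-only variant can be sketched as follows; the exact sizes of its fully connected head are assumptions.

```python
import torch
import torch.nn as nn

class EMGPNetNoAttn(nn.Module):
    """Sketch of the ablation: concatenate the two 1024-d branch features and
    apply fully connected layers, with no attention-based fusion."""
    def __init__(self, mamba_backbone: nn.Module, efficientformer_backbone: nn.Module,
                 feat_dim: int = 1024, n_genes: int = 250):
        super().__init__()
        self.mamba = mamba_backbone
        self.effformer = efficientformer_backbone
        self.head = nn.Sequential(
            nn.Linear(2 * feat_dim, 512), nn.GELU(), nn.Linear(512, n_genes))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = torch.cat([self.mamba(x), self.effformer(x)], dim=1)  # (B, 2048)
        return self.head(feats)
```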

4.1.1. Comparison of Architectural Components by PCC for Top-Ranked Genes

EMGP-Net predicted 13 out of 14 genes with PCC scores greater than 0.7, higher than the corresponding scores of MambaVision, EfficientFormer, and EMGP-Net-noAttn. The top 14 genes predicted by EMGP-Net included PTMA, GNAS, B2M, HNRNPA2B1, and TPT1, with PCC values of 0.7903, 0.7843, 0.7777, 0.7532, and 0.7360, respectively. MambaVision, by contrast, achieved the highest score for only 1 of the 14 genes, B2M, with a PCC of 0.8049, and scored 0.7763, 0.7674, 0.7363, and 0.7198 for GNAS, PTMA, TPT1, and HNRNPA2B1, respectively. For EfficientFormer, all PCCs were lower than those of EMGP-Net, with values of 0.7777, 0.7746, 0.7661, 0.7266, and 0.7245 for PTMA, B2M, GNAS, HNRNPA2B1, and TPT1, respectively. Overall, EMGP-Net outperformed EfficientFormer on all 14 genes, MambaVision on 13 of the 14 genes, and EMGP-Net-noAttn on all 14 genes. These results demonstrate that combining features from both backbone models and refining them with the attention mechanism greatly improved the PCC scores.

4.1.2. Comparison of Architectural Components by PCC for Common Genes

The common genes among the selected top 14 genes evaluated in all three models included PTMA, GNAS, B2M, HNRNPA2B1, TPT1, XBP1, ACTG1, HLA-B, HLA-DRA, and ACTB. As shown in Table 4, the results indicated that EMGP-Net outperformed EMGP-Net-noAttn, MambaVision, and EfficientFormer in predicting gene expression for the majority of common genes. Specifically, EMGP-Net achieved higher PCC values than MambaVision for 7 out of the 10 common genes. The genes where MambaVision outperformed EMGP-Net were B2M, TPT1, and HLA-DRA, with MambaVision achieving higher PCC scores of 0.8049, 0.7363, and 0.7089, respectively, compared to the 0.7777, 0.7360, and 0.7056 of EMGP-Net. However, when compared to EMGP-Net-noAttn and EfficientFormer, EMGP-Net showed better performance across all genes. Figure 6 shows the visualization of the predictions of six genes with the highest PCCs.

4.2. Quantitative Analysis of the Results

Our method aims to predict the expression of 250 breast cancer-related genes, selected based on their highest mean expression in the dataset.

4.2.1. Analysis of EMGP-Net Results

To evaluate the effectiveness of our model, we categorized the per-gene PCC values into ranges. The results showed that 14 genes had PCC values between 0.7 and 0.8, while 56 genes had PCC values between 0.6 and 0.7. Additionally, 75 genes were in the range of 0.5 to 0.6, 54 genes had PCC values between 0.4 and 0.5, and 32 genes were in the range of 0.3 to 0.4. Furthermore, 15 genes had PCC values between 0.2 and 0.3, and 4 genes had values between 0.1 and 0.2. Notably, all genes were predicted with positive PCC values, indicating that our model successfully predicted gene expression for all selected genes.

4.2.2. Analysis of MambaVision Results

The MambaVision results show the following distribution: 1 gene with a PCC value higher than 0.8, 8 genes with a score between 0.7 and 0.8, and 39 genes with PCC values between 0.6 and 0.7. In addition, 65 genes were in the range of 0.5 to 0.6, 73 genes had PCC values between 0.4 and 0.5, and 39 genes were in the range of 0.3 to 0.4. Finally, 20 genes had PCC values between 0.2 and 0.3, 4 genes had PCC values between 0.1 and 0.2, and 1 gene had a PCC value between 0.0 and 0.1.

4.2.3. Analysis of EfficientFormer Results

The EfficientFormer PCC values were distributed as follows: 6 genes with PCC values between 0.7 and 0.8, and 45 genes with PCC values between 0.6 and 0.7. In addition, 68 genes were in the range of 0.5 to 0.6, 70 genes had PCC values between 0.4 and 0.5, and 35 genes were in the range of 0.3 to 0.4. Finally, 18 genes had PCC values between 0.2 and 0.3, 7 genes had values between 0.1 and 0.2, and 1 gene had a PCC value between 0.0 and 0.1.

4.2.4. Analysis of EMGP-Net-noAttn Results

The EMGP-Net-noAttn variant, which eliminated the attention mechanism and employed simple concatenation, achieved positive PCC values for all genes. Of the genes, 8 had PCC values between 0.7 and 0.8, 49 had a PCC value between 0.6 and 0.7, and 72 had a PCC value between 0.5 and 0.6. Additionally, 66 genes had a PCC that fell between 0.4 and 0.5, 32 had a value between 0.3 and 0.4, 15 had a value between 0.2 and 0.3, and 8 had a value between 0.1 and 0.2. Despite performing slightly worse than the full EMGP-Net, this model still showed a clear improvement over the individual backbone models.
EMGP-Net predicted a total of 145 genes with a PCC value greater than 0.5, outperforming EMGP-Net-noAttn, which predicted 129; MambaVision, which predicted 113; and EfficientFormer, which predicted 119. In this context, EMGP-Net had the highest number of predicted genes in the PCC intervals of 0.7 to 0.8, 0.6 to 0.7, and 0.5 to 0.6, indicating its improved performance in these ranges compared to the other models, as shown in Figure 7.
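The binning used in this analysis can be reproduced with a few lines of NumPy; the input array of per-gene PCCs is hypothetical.

```python
import numpy as np

def pcc_distribution(pcc_values: np.ndarray) -> dict:
    """Bin per-gene PCC values into the 0.1-wide ranges used above.
    `pcc_values` is a hypothetical length-250 array of per-gene PCCs."""
    edges = np.arange(0.0, 0.9, 0.1)  # bin edges 0.0, 0.1, ..., 0.8
    counts, _ = np.histogram(pcc_values, bins=edges)
    labels = [f"{a:.1f}-{a + 0.1:.1f}" for a in edges[:-1]]
    return {
        "bin_counts": dict(zip(labels, counts.tolist())),
        "genes_above_0.5": int((pcc_values > 0.5).sum()),  # e.g., 145 for EMGP-Net
    }
```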

4.3. External Validation

In this section, we discuss the external validation process, where we evaluated our model’s generalizability on a dataset different from the one used for training. This approach allowed us to evaluate the performance of the model on a new set of patients and validate its ability to make accurate predictions on different datasets. Our approach was compared to GeNetFormer [11] and ST-Net [9] after they were retrained.

4.3.1. Model Evaluation on the STNet Dataset

This part presents the results obtained when the model was trained on the entire HER2+ dataset and tested on all patients from the STNet dataset, selecting the best PCC for each gene across all patients. Table 2 shows that EMGP-Net had the best PCC values for the set of the top 14 genes predicted compared to GeNetFormer and ST-Net. The set of the top 14 genes predicted by EMGP-Net included ERBB2, ACTG1, CALR, RPL23, and GNAS, with PCC values of 0.7145, 0.7051, 0.7047, 0.6973, and 0.6962, respectively. For GeNetFormer, the genes were DDX5, ACTG1, CPB1, PTMA, and RPL23, with PCC values of 0.7069, 0.6510, 0.6384, 0.6235, and 0.6130, respectively. For ST-Net, the genes were GNAS, RPL23, PTPRF, ACTG1, and DDX5, with PCC values of 0.6708, 0.6592, 0.6503, 0.6460, and 0.6406, respectively.
We present a comparison of the performance of the EMGP-Net, GeNetFormer, and ST-Net models based on common genes. Table 5 shows the PCC values for the 9 common genes out of the top 14 genes for all three models. The results show that EMGP-Net outperformed GeNetFormer and ST-Net. We can see that EMGP-Net predicted eight out of nine common genes with higher PCC values compared to GeNetFormer and nine out of nine genes with higher PCC values compared to ST-Net. Specifically, EMGP-Net achieved the highest PCC scores for genes such as ACTG1, CALR, RPL23, GNAS, and PTPRF, with PCC values of 0.7051, 0.7047, 0.6973, 0.6962, and 0.6867, respectively, while GeNetFormer showed the best PCC in predicting only DDX5, with a PCC value of 0.7069.

4.3.2. Model Evaluation on the HER2+ Dataset

This part presents the results obtained when the model was trained on the entire STNet dataset and tested on all patients from the HER2+ dataset, selecting the best PCC for each gene across all patients. Table 3 shows that EMGP-Net had the best PCC values for the set of the top 14 genes predicted compared to GeNetFormer and ST-Net. The set of the top 14 genes predicted by EMGP-Net included ERBB2, S100A11, ATP5E, HSP90B1, and LGALS3, with PCC values of 0.7285, 0.6686, 0.6650, 0.6404, and 0.6347, respectively. For GeNetFormer, the top 14 genes included ATP5E, S100A11, ERBB2, PTPRF, and HSP90B1, with PCC values of 0.6746, 0.6434, 0.6141, 0.6115, and 0.5986, respectively. For ST-Net, the top 14 genes included ATP5E, ERBB2, S100A11, PTPRF, and LGALS3, with PCC values of 0.6719, 0.6620, 0.6374, 0.6227, and 0.5918, respectively.
We compared the performance of EMGP-Net, GeNetFormer, and ST-Net on the set of the common genes among the top 14 predicted genes. Table 6 shows the PCC values for ERBB2, S100A11, ATP5E, HSP90B1, LGALS3, PTPRF, and PSMB4. The results show that EMGP-Net outperformed both GeNetFormer and ST-Net for most of these genes. It is clear that EMGP-Net showed the best performance, especially for ERBB2, where it achieved a PCC value of 0.7285, compared to the 0.6620 of ST-Net and the 0.6141 of GeNetFormer. For S100A11, EMGP-Net again outperformed the other models, with a PCC of 0.6686, higher than the 0.6374 of ST-Net and the 0.6434 of GeNetFormer. Similarly, for HSP90B1, EMGP-Net achieved a PCC of 0.6404, higher than both the 0.5903 of ST-Net and the 0.5986 of GeNetFormer. GeNetFormer performed better on one gene, ATP5E, with a PCC value of 0.6746, and ST-Net performed better on one gene, PTPRF, with a PCC value of 0.6227.

5. Discussion

The increasing interest in gene expression prediction using WSIs combined with ST data shows the importance of improving predictive models in cancer research, especially for breast cancer research, diagnosis, and treatment, and this study contributes towards addressing the limitation posed by the high cost of ST technologies.
Several studies have proposed different architectures. He et al. [9] used DenseNet-121. BrST-Net [10] is a CNN-based framework with EfficientNet-b0 as the best-performing model. GeNetFormer [11] is a transformer-based framework, with EfficientFormer performing best among the eight transformer models compared. TRIPLEX [22] adopts another method and uses a three-encoder approach based on ResNet18, in which the global encoder also integrates transformer blocks. HisToGene [12] integrates multi-head attention and vision transformers. GNNs have been widely used; for example, Zeng et al. [14] proposed an architecture based on a GNN and transformer, as did the architecture introduced in [13], while Jia et al. [15] used a GAT to introduce THItoGene. ErwaNet [23] follows a different strategy, integrating modules such as an ERM and a WAM and edges such as K-nearest neighbor, K-nearest similarity, and percent similarity edges. Hypergraphs were also used in several studies, such as PH2ST [16], which uses ViT and a universal histology image encoder, and the HGGEP architecture [17], which includes a hypergraph association module, a gradient enhancement module, a ShuffleNet V2 backbone, a convolutional block attention module, and ViT. EGN [18] and EGGN [19] are based on exemplar guidance learning. These studies followed different strategies and training approaches: some used a leave-one-patient-out approach, others used cross-validation, and others simply divided the datasets into training, validation, and test sets. Different datasets have also been used across studies, such as the HER2+, STNet, and 10x Genomics datasets, with different numbers of targeted genes. In this context, we introduced EMGP-Net, a hybrid deep learning model that combines MambaVision and EfficientFormer, designed to predict 250 genes. By leveraging features from both models and utilizing an attention-based fusion layer, we aimed to improve the prediction of breast cancer gene expression. We employed the leave-one-patient-out approach for internal validation and cross-dataset training and testing for external validation.
Our results showed that EMGP-Net outperformed both individual models, MambaVision and EfficientFormer, in internal validation on the HER2+ dataset. It had the highest PCC scores, ranging from 0.7002 (CD24) to 0.7903 (PTMA) for the top 14 genes, compared to EfficientFormer, which generally lagged behind our model with scores ranging from 0.6834 (S100A11) to 0.7777 (PTMA). It also outperformed MambaVision on 13 out of 14 genes, with MambaVision’s top PCC score being 0.8049 (B2M). Of the 14 genes, 10 were common to all models, and we compared their PCCs: our model outperformed EfficientFormer on all of them, including PTMA, GNAS, B2M, HNRNPA2B1, and TPT1, while MambaVision outperformed EMGP-Net on only three genes, namely, B2M, TPT1, and HLA-DRA.
External validation was used to evaluate the performance of EMGP-Net on two datasets: the HER2+ and STNet, representing different breast cancer subtypes. It was trained on one dataset and tested on the other. The model achieved the highest PCC for most of the top 14 genes and showed the best external validation results compared to other retrained models from other studies.
  • When trained on the HER2+ dataset and tested on the STNet dataset, our model outperformed GeNetFormer and ST-Net on nearly all of the top 14 genes, with PCC values ranging from 0.6563 (KRT19) to 0.7145 (ERBB2), compared to GeNetFormer’s values, which ranged from 0.5250 (KRT19) to 0.7069 (DDX5), and ST-Net’s values, which ranged from 0.5749 (HLA-DRA) to 0.6708 (GNAS). On the 9 common genes among the top 14, our model outperformed ST-Net on all of them, including ACTG1, CALR, RPL23, GNAS, and PTPRF, while GeNetFormer outperformed our model on only 1 gene, DDX5.
  • When trained on the STNet dataset and tested on the HER2+ dataset, our model outperformed GeNetFormer and ST-Net on most of the top 14 genes, with PCC scores ranging from 0.5465 (GNAS) to 0.7285 (ERBB2), compared to GeNetFormer’s values, which ranged from 0.5185 (COL1A2) to 0.6746 (ATP5E), and ST-Net’s values, which ranged from 0.5287 (LASP1) to 0.6719 (ATP5E). Of the 7 common genes among the top 14, our model outperformed GeNetFormer on 6, namely, ERBB2, S100A11, HSP90B1, LGALS3, PTPRF, and PSMB4, with only ATP5E better predicted by GeNetFormer; it also outperformed ST-Net on 6, namely, ERBB2, S100A11, ATP5E, HSP90B1, LGALS3, and PSMB4, with only PTPRF better predicted by ST-Net.
When compared to other studies that applied different approaches, EMGP-Net achieved the highest PCC value of 0.7903 (PTMA) using the HER2+ dataset: THItoGene reached a top PCC value of 0.7470 (FN1), HGGEP achieved a top PCC value of 0.6520 (MYL12B), and Hist2ST achieved a top PCC value of 0.7310 (FN1) on the same dataset. Using the STNet dataset, BrST-Net reached a top PCC value of 0.6325 (B2M), while SEPAL had a top PCC value of 0.6390 (ENSG00000145824). EMGP-Net, tested on the STNet dataset as an external validation, achieved a top PCC value of 0.7145 (ERBB2).
Notably, EMGP-Net performed well in predicting the expression of genes such as PTMA, GNAS, B2M, HNRNPA2B1, TPT1, and XBP1 out of 250 targeted genes, which are important biomarkers in breast cancer prognosis. When compared to the results of MambaVision and EfficientFormer, EMGP-Net had the best PCC values for most of the genes evaluated. The model’s ability to integrate features from both MambaVision and EfficientFormer using a multi-head attention mechanism contributed to its improved performance.
Although our results are encouraging, this study has some limitations. First, the HER2+ and STNet datasets are smaller than the datasets commonly used in computer vision, which may limit how well the model transfers to other patient groups. Differences in how tissue slides are stained, the type of scanner used, and the unique characteristics of each patient could affect the model’s performance on new data. While we tested the model on both the HER2+ and STNet datasets to assess its effectiveness in various cases, larger and more diverse datasets would give more reliable results. Additionally, the datasets used in this study are specific to particular types of breast cancer, which could make it difficult for the model to perform well on other cancer types; for instance, what the model learned from the HER2+ samples may not be directly applicable to triple-negative or luminal A subtypes. We may need to create specialized versions of the model for different subtypes or develop a more general model. Another limitation is that the model is not easily interpretable from a biological point of view: even though EMGP-Net makes accurate predictions, it does not explain which features drive its decisions. In the future, techniques such as Grad-CAM could help us better understand how the model works and help clinicians trust its predictions.

6. Conclusions

In this study, we introduced EMGP-Net, a hybrid deep learning architecture that combines MambaVision and EfficientFormer to predict gene expression from breast cancer WSIs. By leveraging the capabilities of both models and incorporating a multi-head attention mechanism, EMGP-Net improved gene expression predictions and outperformed competing models in internal and external validation tests. Our results showed that EMGP-Net achieved higher PCC values than MambaVision and EfficientFormer and improved the prediction of the 250 genes, surpassing both models in most cases. External validation on the HER2+ and STNet datasets demonstrated the reliability of EMGP-Net, which outperformed both GeNetFormer and ST-Net in most cases. Despite its high performance, our study has some limitations. Future work will focus on increasing the number of predicted genes and exploring the integration of explainable AI techniques to improve interpretability; synthetic data generated by generative models may also enhance training diversity and performance. EMGP-Net provides a solid foundation for advancing gene expression prediction from histopathological images in breast cancer research.

Author Contributions

Conceptualization, O.T. and M.A.A.; methodology, O.T. and M.A.A.; validation, O.T. and M.A.A.; formal analysis, O.T. and M.A.A.; writing—original draft preparation, O.T.; writing—review and editing, M.A.A.; funding acquisition, M.A.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research was enabled in part by support provided by the Natural Sciences and Engineering Research Council of Canada (NSERC), funding reference number RGPIN-2024-05287, and by the AI in Health Research Chair at the Université de Moncton.

Institutional Review Board Statement

This research did not require Institutional Review Board (IRB) approval as it exclusively utilized publicly available data.

Informed Consent Statement

Not applicable.

Data Availability Statement

This work uses two public datasets: 1. HER2+ dataset (https://www.synapse.org/Synapse:syn52503858/files/, accessed on 20 May 2025). 2. STNet dataset (https://data.mendeley.com/datasets/29ntw7sh4r/5, accessed on 20 May 2025).

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
APEG: Atypical Position Encoding Generator
AuxNet: Auxiliary Network
BN: Batch Normalization
CBAM: Convolutional Block Attention Module
CNN: Convolutional Neural Network
EB: Exemplar Bridging
EMGP-Net: EfficientFormer (E), MambaVision (M), gene expression prediction (GP) network (Net)
EMGP-Net-noAttn: EfficientFormer (E), MambaVision (M), gene expression prediction (GP) network (Net), no attention (noAttn)
ERM: Edge Relational Module
GAT: Graph Attention Network
GeLU: Gaussian Error Linear Unit
GEM: Gradient Enhancement Module
GNN: Graph Neural Network
HAM: Hypergraph Association Module
H&E: Hematoxylin and Eosin
LN: Layer Normalization
MAE: Mean Absolute Error
PCC: Pearson Correlation Coefficient
RMSE: Root Mean Squared Error
SOTA: State Of The Art
SSM: State-Space Model
ST: Spatial Transcriptomics
ViT: Vision Transformer
WAM: Window Attention Module
WSIs: Whole-Slide Images

Appendix A. Lists of the 250 Genes Included in This Study Across the Two Datasets: HER2+ and STNet

This appendix provides complete lists of the 250 genes used in this study. Appendix A.1 presents the genes selected from the HER2+ dataset, and Appendix A.2 presents the genes selected from the STNet dataset.

Appendix A.1. List of the 250 Genes from the HER2+ Dataset

This list contains the 250 genes selected from the HER2+ dataset that were used to train and evaluate the gene expression prediction models described in this study.
Table A1. List of the 250 genes from the HER2+ dataset. The last row uses the asterisk (*) to fill empty cells and complete the seven-column layout.
PTMA, GNAS, B2M, HNRNPA2B1, TPT1, XBP1, ACTG1
HLA-B, TMSB10, DDX5, HLA-DRA, ACTB, S100A11, CD24
HSP90B1, PSMB4, COX6C, TUBA1B, EIF4G2, PRDX1, HLA-C
HLA-A, LAPTM4A, VMP1, HSP90AA1, UBC, ATP5E, CALM2
SCGB2A2, NACA, FTH1, COX7C, CALR, CCT3, FASN
PEBP1, HSPB1, PSAP, SPINT2, BEST1, PFN1, PLXNB2
ATP5B, SERF2, LGALS3, P4HB, MYH9, CRIP2, CHCHD2
ATP1A1, ERBB2, KRT19, CD74, FN1, GAPDH, HSP90AB1
HSPA8, PTPRF, FTL, LSM4, KDELR1, CFL1, VCP
MIDN, PPP1CA, SLC9A3R1, PABPC1, APOE, GRB7, RACK1
EEF2, TUBB, JTB, SH3BGRL3, TXNIP, SCD, OAZ1
LASP1, ATG10, SPDEF, SEPW1, VIM, MDK, CTSB
SEC61A1, GRINA, IDH2, UBE2M, COPS9, MMACHC, MZT2B
JUP, UBA52, PSMD8, SLC2A4RG, MLLT6, SSR2, DBI
TAPBP, CIB1, PPDPF, CST3, TSPO, CD63, COL1A1
PTBP1, AES, TAGLN2, ATP5G2, MYL6, NUCKS1, GNAI2
PLD3, GNB2, LMAN2, HM13, RALY, SNRPB, SDC1
ENO1, COPE, PHB, GRN, HLA-E, STARD10, COL1A2
A2M, ALDOA, NUPR1, LAPTM5, EIF3B, EDF1, MAPKAPK2
SERINC2, FLNA, MIEN1, SYNGR2, MUC1, COX4I1, EIF4G1
C3, PERP, H1FX, GPX4, C1QB, APOC1, DHCR24
PRSS8, COX6B1, IGLC2, KRT18, ERGIC1, GUK1, PGAP3
IGLC3, IGHG3, FAU, UQCRQ, UQCR11, ZYX, CLDN4
CD81, CD99, NDUFA3, CISD3, RRBP1, COX5B, S100A6
LGALS3BP, PCGF2, TYMP, TIMP1, NDUFB9, ATP6V0B, AP2S1
COX8A, FNBP1L, COL3A1, STARD3, PTMS, IFI27, KRT7
PFKL, CTSD, RABAC1, PSMB3, PSMD3, LMNA, H2AFJ
ARHGDIA, SPARC, EEF1D, SLC25A6, INTS1, ACTN4, IGHA1
CHPF, ELOVL1, SSR4, ATP6AP1, CYBA, TAGLN, C1QA
PRRC2A, RHOC, IGHG1, MMP14, PPP1R1B, CALML5, BSG
CLDN3, AEBP1, LY6E, TRAF4, IGKC, BGN, NBL1
FKBP2, AP000769.1, ROMO1, COL6A2, IGHM, C12orf57, MYL9
BCAP31, SCAND1, TCEB2, PFDN5, BST2, KIAA0100, NDUFB7
MUCL1, LGALS1, POSTN, TFF3, MGP, COL18A1, NDUFA11
IGFBP2, KRT81, SUPT6H, ORMDL3, S100A9, MUC6, AZGP1
S100A14, S100A8, IGHG4, ADAM15, ISG15, *, *

Appendix A.2. List of the 250 Genes from the STNet Dataset

This list contains the 250 genes selected from the STNet dataset that were used to train and evaluate the gene expression prediction models described in this study.
Table A2. List of the 250 genes from the STNet dataset. Gene names marked as N/A indicate names that were originally ambiguous. The last row uses the asterisk (*) to fill empty cells and complete the seven-column layout.
Gene NameGene NameGene NameGene NameGene NameGene NameGene Name
ERBB2ACTG1CALRRPL23GNASPSMD3PTPRF
TMSB10GAPDHTAGLN2DDX5HSPB1PTMAKRT19
P4HBPRDX1PFN1HLA-CS100A11RPL28ENSG00000203812
B2MHLA-DRACPB1NHERF1RPLP0S100A9RPL19
HLA-BC4BCALML5ACTBS100A8RPLP2TMSB4X
APOEGRINAENO1RPL35MGPTIMP1HLA-A
RPS11IGLL5PRSS8ENSG00000272196COX6CATP1A1CYBA
RPS19RPLP1RPS28RPS18JUPRPS2UBA52
TUBA1BSELENOWIFI27ELF3FTLN/ARPL13
RPL9ATP5F1EN/ARPL10CST3RPS4XRPL38
TAPBPSYNGR2RPS20CD74SERF2FASNC1QA
CLDN3N/ASPDEFRACK1UBCBCAP31PABPC1
RPS6N/AFLNARPS13H1-10SDC1EIF4G1
FTH1RPS9CRIP2RPS27AAEBP1CLUS100A6
RPL8FN1SEC61A1MYL6RPL15RPS17PPP1CA
GPX4RPS7BGNRPL13AATP6V0BBSGTPT1
A2MBST2PPDPFMYL9VIMRPS15AXBP1
COL1A1RPS14STARD10RPS12RPS3ISG15RPS15
ENSG00000169100MZT2AHSP90AB1CD81LY6EIFITM3MZT2B
EIF4A1PFDN5RPS8COX8AUBBLGALS3BPRPL23A
EEF2RPL29N/ATAGLNEVLN/ARPL3
MUC1SPARCN/AAPOC1H3-3BRPS23N/A
KRT8RPS21UQCR11TSPORPL27UQCRQGNB2
RPL34ARHGDIALAPTM5SNHG25RPL5N/AN/A
RHOCTUFMRPL35ARPL14EDF1N/ACFL1
RPL18AHLA-ESSR2FXYD3H2AJFAUAZGP1
BEST1COL1A2LMNARPL12GUK1COX4I1OAZ1
RPL37APLXNB2ELOBGAS5N/AGRNMALAT1
RPS24IGFBP2COX6B1CTSBTFF3RPL24ALDOA
RPL32RPS16PRDX2EEF1DRPL4RPL31CCND1
NDUFA13RPL7ARPL11RPL36NBEAL1EIF5APLD3
RPL27ACD63SH3BGRL3ATP6AP1PSAPZNF90TLE5
RPS29RPL7RPS25KRT18RPS5NDUFA11CTSD
NDUFB9SSR4C3RPS27N/AENSG00000279274RPL37
RPS3AENSG00000255823POLR2LIFI6ENSG00000269028RPS10RPL30
ENSG00000279483C12orf57GNAI2TFF1RPL18**
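When reproducing experiments with these panels, one practical step is restricting a spot-by-gene expression matrix to the listed genes while skipping entries that are unavailable (for example, the N/A placeholders above). Below is a minimal pandas sketch; the column names and the shortened stand-in panel are hypothetical, not a prescribed pipeline.

```python
import numpy as np
import pandas as pd

# Hypothetical spot-by-gene expression matrix (rows: spots, cols: gene symbols).
expr = pd.DataFrame(np.random.rand(4, 5),
                    columns=["ERBB2", "GNAS", "PTMA", "XIST", "B2M"])

# The full 250-gene panel would be read from Table A1/A2; short stand-in here.
panel = ["ERBB2", "GNAS", "PTMA", "B2M", "TPT1"]

# Keep only panel genes actually present (skips N/A or missing identifiers).
present = [g for g in panel if g in expr.columns]
targets = expr[present]
print(f"kept {len(present)}/{len(panel)} genes:", list(targets.columns))
```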

Figure 1. Representative whole-slide images from the HER2+ and STNet datasets: (a) Sample WSIs from the HER2+ dataset. (b) Sample WSIs from the STNet dataset.
Figure 2. Overview of the EMGP-Net architecture. The workflow comprises multiple stages: extracting patches from WSIs (the red frame shows an extracted patch and its location in the original image), extracting features using MambaVision and EfficientFormer backbones, and utilizing multi-head attention followed by activation functions (GeLU), layer normalization, and fully connected layers to output the final gene expression predictions.
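To make the fusion stage of Figure 2 concrete, the following PyTorch sketch shows one plausible way to mix the two backbones' feature vectors with multi-head attention before a fully connected regression head. The class name FusionHead, the feature widths (640 and 448), the shared width of 512, and the two-token fusion scheme are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn

class FusionHead(nn.Module):
    """Illustrative fusion head: mixes two backbone feature vectors with
    multi-head attention, then regresses 250 gene expression values."""
    def __init__(self, dim_mamba=640, dim_eff=448, dim=512, n_heads=8, n_genes=250):
        super().__init__()
        # Project each backbone's features to a shared width (assumed sizes).
        self.proj_mamba = nn.Linear(dim_mamba, dim)
        self.proj_eff = nn.Linear(dim_eff, dim)
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.head = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.GELU(),
            nn.LayerNorm(dim),
            nn.Linear(dim, n_genes),
        )

    def forward(self, f_mamba, f_eff):
        # Treat the two projected feature vectors as a 2-token sequence.
        tokens = torch.stack(
            [self.proj_mamba(f_mamba), self.proj_eff(f_eff)], dim=1)
        mixed, _ = self.attn(tokens, tokens, tokens)
        mixed = self.norm(mixed + tokens)      # residual + layer norm
        return self.head(mixed.flatten(1))     # (batch, 250) predictions

# Smoke test with random "backbone features" for a batch of 4 patches.
if __name__ == "__main__":
    head = FusionHead()
    out = head(torch.randn(4, 640), torch.randn(4, 448))
    print(out.shape)  # torch.Size([4, 250])
```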
Figure 3. Architecture of the MambaVision block. The block integrates two parallel branches. One branch applies a linear layer, a 1D convolutional layer, and a selective SSM; the other branch applies a linear layer followed by only a 1D convolution. The two outputs are then combined and passed through a final linear layer.
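Below is a minimal sketch of this two-branch mixer, assuming a toy token width and substituting an identity stand-in for the selective SSM; it traces the data flow of Figure 3 rather than reproducing the published MambaVision code.

```python
import torch
import torch.nn as nn

class MambaVisionBlockSketch(nn.Module):
    """Simplified sketch of the two-branch mixer in Figure 3. The real
    selective SSM (Mamba) is replaced by an identity stand-in here."""
    def __init__(self, dim=256, conv_kernel=3):
        super().__init__()
        half = dim // 2
        self.in_ssm = nn.Linear(dim, half)     # branch 1: linear ...
        self.conv_ssm = nn.Conv1d(half, half, conv_kernel, padding="same")
        self.ssm = nn.Identity()               # ... selective SSM placeholder
        self.in_skip = nn.Linear(dim, half)    # branch 2: linear ...
        self.conv_skip = nn.Conv1d(half, half, conv_kernel, padding="same")
        self.out = nn.Linear(dim, dim)         # final linear after concat

    def forward(self, x):                      # x: (batch, tokens, dim)
        a = self.in_ssm(x).transpose(1, 2)     # to (batch, half, tokens)
        a = self.ssm(self.conv_ssm(a)).transpose(1, 2)
        b = self.in_skip(x).transpose(1, 2)
        b = self.conv_skip(b).transpose(1, 2)
        return self.out(torch.cat([a, b], dim=-1))

if __name__ == "__main__":
    blk = MambaVisionBlockSketch()
    print(blk(torch.randn(2, 49, 256)).shape)  # torch.Size([2, 49, 256])
```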
Figure 4. Architecture of the meta transformer block, including both MB4D and MB3D variants. The MB4D variant uses pooling followed by two convolutional layers with 1 × 1 kernels, batch normalization, and a GeLU activation function. The MB3D variant instead uses a sequence of linear layers alternating with layer normalization and GeLU, and projects the input into a query (Q), key (K), and value (V) before final integration through linear transformations.
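As an illustration of the MB4D pathway, here is a minimal PyTorch sketch; the channel width of 96 and the MLP expansion ratio of 4 are assumed for demonstration, not taken from the paper's configuration.

```python
import torch
import torch.nn as nn

class MB4DSketch(nn.Module):
    """Sketch of the 4D meta transformer block (MB4D) from Figure 4:
    a pooling-based token mixer followed by a 1x1-conv MLP."""
    def __init__(self, dim=96, mlp_ratio=4):
        super().__init__()
        # Pooling token mixer; subtracting the input keeps only the "mixing".
        self.pool = nn.AvgPool2d(3, stride=1, padding=1, count_include_pad=False)
        hidden = dim * mlp_ratio
        self.mlp = nn.Sequential(
            nn.Conv2d(dim, hidden, 1), nn.BatchNorm2d(hidden), nn.GELU(),
            nn.Conv2d(hidden, dim, 1), nn.BatchNorm2d(dim),
        )

    def forward(self, x):                 # x: (batch, dim, H, W)
        x = x + (self.pool(x) - x)        # token mixing, residual form
        return x + self.mlp(x)            # channel MLP with residual

if __name__ == "__main__":
    print(MB4DSketch()(torch.randn(2, 96, 14, 14)).shape)
```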
Figure 5. Architecture of the multi-head attention mechanism. The inputs are first passed through linear layers, then processed in parallel by several attention heads (denoted h). The outputs of all heads are concatenated and passed through a final linear layer.
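The computation drawn in Figure 5 can be written compactly. The sketch below implements standard scaled dot-product multi-head attention from scratch; the random weight matrices are stand-ins, and dropout and masking are omitted.

```python
import math
import torch

def multi_head_attention(x, w_q, w_k, w_v, w_o, n_heads):
    """Minimal multi-head attention as drawn in Figure 5."""
    B, T, D = x.shape
    d_head = D // n_heads
    # Linear projections, then split the channel dim into heads.
    q, k, v = (x @ w for w in (w_q, w_k, w_v))
    q, k, v = (t.view(B, T, n_heads, d_head).transpose(1, 2) for t in (q, k, v))
    # Scaled dot-product attention, computed in parallel over heads.
    scores = (q @ k.transpose(-2, -1)) / math.sqrt(d_head)
    out = scores.softmax(dim=-1) @ v
    # Concatenate heads and apply the final linear layer.
    return out.transpose(1, 2).reshape(B, T, D) @ w_o

if __name__ == "__main__":
    D, H = 512, 8
    ws = [torch.randn(D, D) / math.sqrt(D) for _ in range(4)]
    print(multi_head_attention(torch.randn(2, 2, D), *ws, n_heads=H).shape)
```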
Figure 6. Visualization of the top 6 genes predicted by EMGP-Net using the HER2+ dataset with a leave-one-patient-out approach. Each pair of images shows the ground truth on the left and the corresponding prediction on the right for one gene. The color indicates gene expression levels as standard deviations from the mean. The corresponding PCC values are shown for each gene.
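For reference, the per-gene PCC reported alongside these visualizations can be computed as below. The toy data and the pooling of all spots into a single array are illustrative assumptions; the paper's exact aggregation across patients may differ.

```python
import numpy as np
from scipy.stats import pearsonr

def per_gene_pcc(y_true, y_pred):
    """PCC between measured and predicted expression, one value per gene.
    y_true, y_pred: arrays of shape (n_spots, n_genes)."""
    return np.array([pearsonr(y_true[:, g], y_pred[:, g])[0]
                     for g in range(y_true.shape[1])])

# Toy example: 100 spots x 250 genes of random data.
rng = np.random.default_rng(0)
truth = rng.normal(size=(100, 250))
pred = truth + rng.normal(scale=0.5, size=(100, 250))  # noisy "predictions"
pcc = per_gene_pcc(truth, pred)
top = np.argsort(pcc)[::-1][:6]
print("top-6 gene indices:", top, "PCCs:", np.round(pcc[top], 3))
```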
Figure 7. Distribution of PCC values for gene expression predictions from each model: (a) PCC distribution for EMGP-Net with attention mechanism. (b) PCC distribution for MambaVision. (c) PCC distribution for EfficientFormer. (d) PCC distribution for EMGP-Net without attention mechanism. Each histogram shows the number of genes that fell within specific PCC value intervals. The x-axis represents the PCC ranges, and the y-axis shows the number of genes in each range.
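The histograms in Figure 7 simply count genes per PCC interval. A minimal sketch of that binning, using random stand-in PCC values and an assumed bin width of 0.1:

```python
import numpy as np

# Bin per-gene PCCs the way Figure 7 does (counts per PCC interval).
rng = np.random.default_rng(2)
pcc = rng.uniform(-0.1, 0.8, size=250)          # stand-in per-gene PCCs
counts, edges = np.histogram(pcc, bins=np.arange(-0.2, 0.9, 0.1))
for lo, hi, c in zip(edges[:-1], edges[1:], counts):
    print(f"[{lo:+.1f}, {hi:+.1f}): {c} genes")
```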
Table 1. Overview of performance comparison between EMGP-Net, EMGP-Net-noAttn, EfficientFormer, and MambaVision. The highest PCC values across the different models for the top 14 predicted genes are in bold and underlined. The p-values at the bottom show the statistical differences between each model and EMGP-Net. Values < 0.05 indicate significance.
Genes | EfficientFormer | MambaVision | EMGP-Net-noAttn | EMGP-Net
Gene 1 | 0.7777 (PTMA) | 0.8049 (B2M) | 0.7791 (PTMA) | 0.7903 (PTMA)
Gene 2 | 0.7746 (B2M) | 0.7763 (GNAS) | 0.7768 (B2M) | 0.7843 (GNAS)
Gene 3 | 0.7661 (GNAS) | 0.7674 (PTMA) | 0.7700 (GNAS) | 0.7777 (B2M)
Gene 4 | 0.7266 (HNRNPA2B1) | 0.7363 (TPT1) | 0.7356 (TPT1) | 0.7532 (HNRNPA2B1)
Gene 5 | 0.7245 (TPT1) | 0.7198 (HNRNPA2B1) | 0.7331 (HNRNPA2B1) | 0.7360 (TPT1)
Gene 6 | 0.7075 (ACTG1) | 0.7089 (HLA-DRA) | 0.7271 (ACTG1) | 0.7339 (XBP1)
Gene 7 | 0.6965 (XBP1) | 0.7042 (ACTG1) | 0.7237 (XBP1) | 0.7318 (ACTG1)
Gene 8 | 0.6964 (HLA-DRA) | 0.7032 (HLA-B) | 0.7005 (HLA-B) | 0.7228 (HLA-B)
Gene 9 | 0.6938 (CD24) | 0.7010 (XBP1) | 0.6959 (HLA-DRA) | 0.7122 (TMSB10)
Gene 10 | 0.6929 (HLA-B) | 0.6921 (COX6C) | 0.6951 (ACTB) | 0.7085 (DDX5)
Gene 11 | 0.6868 (TMSB10) | 0.6832 (VMP1) | 0.6873 (TMSB10) | 0.7056 (HLA-DRA)
Gene 12 | 0.6859 (DDX5) | 0.6809 (ACTB) | 0.6826 (TUBA1B) | 0.7020 (ACTB)
Gene 13 | 0.6839 (ACTB) | 0.6789 (PSMB4) | 0.6826 (COX6C) | 0.7016 (S100A11)
Gene 14 | 0.6834 (S100A11) | 0.6780 (NACA) | 0.6799 (DDX5) | 0.7002 (CD24)
p-value | 0.0001 (<0.05) | 0.0009 (<0.05) | 0.0001 (<0.05) | N/A
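As a hedged illustration of how such p-values can be obtained, the sketch below compares two models' per-gene PCC vectors with a paired Wilcoxon signed-rank test on synthetic values; both the choice of test and the data are assumptions for demonstration, not the paper's stated procedure.

```python
import numpy as np
from scipy.stats import wilcoxon

# Hypothetical per-gene PCC vectors for two models (250 genes each).
rng = np.random.default_rng(1)
pcc_emgp = rng.uniform(0.2, 0.8, size=250)
pcc_other = pcc_emgp - rng.uniform(0.0, 0.05, size=250)  # slightly worse

# Paired test on the per-gene differences (test choice is an assumption).
stat, p = wilcoxon(pcc_emgp, pcc_other)
print(f"p-value = {p:.4g}  ->  significant at 0.05: {p < 0.05}")
```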
Table 2. Overview of EMGP-Net performance vs. ST-Net and GeNetFormer performance when trained on the HER2+ dataset and tested on the STNet dataset. The highest PCC values across the different models for the top 14 predicted genes are in bold and underlined. The p-values at the bottom show the statistical differences between each model and EMGP-Net. Values < 0.05 indicate significance.
Genes | ST-Net | GeNetFormer | EMGP-Net
Gene 1 | 0.6708 (GNAS) | 0.7069 (DDX5) | 0.7145 (ERBB2)
Gene 2 | 0.6592 (RPL23) | 0.6510 (ACTG1) | 0.7051 (ACTG1)
Gene 3 | 0.6503 (PTPRF) | 0.6384 (CPB1) | 0.7047 (CALR)
Gene 4 | 0.6460 (ACTG1) | 0.6235 (PTMA) | 0.6973 (RPL23)
Gene 5 | 0.6406 (DDX5) | 0.6130 (RPL23) | 0.6962 (GNAS)
Gene 6 | 0.6325 (PRDX1) | 0.5974 (PTPRF) | 0.6894 (PSMD3)
Gene 7 | 0.6274 (TAGLN2) | 0.5943 (GNAS) | 0.6867 (PTPRF)
Gene 8 | 0.6273 (CALR) | 0.5864 (CALR) | 0.6842 (TMSB10)
Gene 9 | 0.6235 (HSPB1) | 0.5840 (HSPB1) | 0.6835 (GAPDH)
Gene 10 | 0.6201 (PTMA) | 0.5701 (TMSB10) | 0.6814 (TAGLN2)
Gene 11 | 0.6144 (CPB1) | 0.5638 (TAGLN2) | 0.6724 (DDX5)
Gene 12 | 0.6027 (NHERF1) | 0.5344 (P4HB) | 0.6645 (HSPB1)
Gene 13 | 0.5908 (ENSG00000203812) | 0.5307 (PRDX1) | 0.6588 (PTMA)
Gene 14 | 0.5749 (HLA-DRA) | 0.5250 (KRT19) | 0.6563 (KRT19)
p-value | 0.0001 (<0.05) | 0.0001 (<0.05) | N/A
Table 3. Overview of EMGP-Net performance vs. ST-Net and GeNetFormer performance when trained on the STNet dataset and tested on the HER2+ dataset. The highest PCC values across the different models for the top 14 predicted genes are in bold and underlined. The p-values at the bottom show the statistical differences between each model and EMGP-Net. Values < 0.05 indicate significance.
Genes | ST-Net | GeNetFormer | EMGP-Net
Gene 1 | 0.6719 (ATP5E) | 0.6746 (ATP5E) | 0.7285 (ERBB2)
Gene 2 | 0.6620 (ERBB2) | 0.6434 (S100A11) | 0.6686 (S100A11)
Gene 3 | 0.6374 (S100A11) | 0.6141 (ERBB2) | 0.6650 (ATP5E)
Gene 4 | 0.6227 (PTPRF) | 0.6115 (PTPRF) | 0.6404 (HSP90B1)
Gene 5 | 0.5918 (LGALS3) | 0.5986 (HSP90B1) | 0.6347 (LGALS3)
Gene 6 | 0.5903 (HSP90B1) | 0.5967 (CST3) | 0.6262 (CD24)
Gene 7 | 0.5880 (CST3) | 0.5572 (ACTG1) | 0.6049 (PTPRF)
Gene 8 | 0.5812 (KRT19) | 0.5449 (MYH9) | 0.5927 (FN1)
Gene 9 | 0.5750 (PSMB4) | 0.5400 (PSMB4) | 0.5905 (PTMA)
Gene 10 | 0.5662 (GNAS) | 0.5393 (LGALS3) | 0.5832 (FTH1)
Gene 11 | 0.5353 (EEF2) | 0.5384 (KRT19) | 0.5763 (PSMB4)
Gene 12 | 0.5317 (ACTG1) | 0.5310 (CD24) | 0.5726 (ACTB)
Gene 13 | 0.5301 (IGLC2) | 0.5293 (FTH1) | 0.5642 (MYH9)
Gene 14 | 0.5287 (LASP1) | 0.5185 (COL1A2) | 0.5465 (GNAS)
p-value | 0.0001 (<0.05) | 0.0001 (<0.05) | N/A
Table 4. Overview of performance comparison between EMGP-Net, EMGP-Net-noAttn, EfficientFormer, and MambaVision. The highest PCC values for the common genes among the top 14 genes predicted by the different models are in bold and underlined.
Gene | EfficientFormer | MambaVision | EMGP-Net-noAttn | EMGP-Net
PTMA | 0.7777 | 0.7674 | 0.7791 | 0.7903
GNAS | 0.7661 | 0.7763 | 0.7700 | 0.7843
B2M | 0.7746 | 0.8049 | 0.7768 | 0.7777
HNRNPA2B1 | 0.7266 | 0.7198 | 0.7331 | 0.7532
TPT1 | 0.7245 | 0.7363 | 0.7356 | 0.7360
XBP1 | 0.6965 | 0.7010 | 0.7237 | 0.7339
ACTG1 | 0.7075 | 0.7042 | 0.7271 | 0.7318
HLA-B | 0.6929 | 0.7032 | 0.7005 | 0.7228
HLA-DRA | 0.6964 | 0.7089 | 0.6959 | 0.7056
ACTB | 0.6839 | 0.6809 | 0.6951 | 0.7020
Table 5. Overview of EMGP-Net performance vs. ST-Net and GeNetFormer performance when trained on the HER2+ dataset and tested on the STNet dataset. The highest PCC values for common genes among the top 14 genes predicted by the different models are in bold and underlined.
Gene | ST-Net | GeNetFormer | EMGP-Net
ACTG1 | 0.6460 | 0.6510 | 0.7051
CALR | 0.6273 | 0.5864 | 0.7047
RPL23 | 0.6592 | 0.6130 | 0.6973
GNAS | 0.6708 | 0.5943 | 0.6962
PTPRF | 0.6503 | 0.5974 | 0.6867
TAGLN2 | 0.6274 | 0.5638 | 0.6814
DDX5 | 0.6406 | 0.7069 | 0.6724
HSPB1 | 0.6235 | 0.5840 | 0.6645
PTMA | 0.6201 | 0.6235 | 0.6588
Table 6. Overview of EMGP-Net performance vs. ST-Net and GeNetFormer performance when trained on the STNet dataset and tested on the HER2+ dataset. The highest PCC values for common genes among the top 14 genes predicted by the different models are in bold and underlined.
Gene | ST-Net | GeNetFormer | EMGP-Net
ERBB2 | 0.6620 | 0.6141 | 0.7285
S100A11 | 0.6374 | 0.6434 | 0.6686
ATP5E | 0.6719 | 0.6746 | 0.6650
HSP90B1 | 0.5903 | 0.5986 | 0.6404
LGALS3 | 0.5918 | 0.5393 | 0.6347
PTPRF | 0.6227 | 0.6115 | 0.6049
PSMB4 | 0.5750 | 0.5400 | 0.5763
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
