Cancer Diagnosis through Contour Visualization of Gene Expression Leveraging Deep Learning Techniques

Prompt diagnostics and appropriate cancer therapy necessitate the use of gene expression databases. The integration of analytical methods can enhance detection precision by capturing intricate patterns and subtle connections in the data. This study proposes a diagnostic-integrated approach combining Empirical Bayes Harmonization (EBH), Jensen–Shannon Divergence (JSD), deep learning, and contour mathematics for cancer detection using gene expression data. EBH preprocesses the gene expression data, while JSD measures the distributional differences between cancerous and non-cancerous samples, providing invaluable insights into gene expression patterns. Deep learning (DL) models are employed for automatic deep feature extraction and to discern complex patterns from the data. Contour mathematics is applied to visualize decision boundaries and regions in the high-dimensional feature space. JSD imparts significant information to the deep learning model, directing it to concentrate on pertinent features associated with cancerous samples. Contour visualization elucidates the model's decision-making process, bolstering interpretability. The amalgamation of JSD, deep learning, and contour mathematics in gene expression analysis presents a promising pathway for precise cancer detection. This method taps into the prowess of deep learning for feature extraction while employing JSD to pinpoint distributional differences and contour mathematics for visual elucidation. The outcomes underscore its potential as a formidable instrument for cancer detection, furnishing crucial insights for timely diagnostics and tailor-made treatment strategies.


Introduction
Cancer is one of the leading causes of death worldwide, second only to cardiovascular disease. The World Health Organization (WHO) reported that more than nine million individuals died from cancer in 2018, with the number of new cases projected to rise to twenty-seven million annually by 2040 (World Health Organization (WHO), 2019). An accurate and prompt cancer diagnosis is thus essential for improving patient outcomes and rates of recovery and survival.
Due to the rising abundance of gene expression data from populations worldwide, there has never been a better opportunity to uncover the molecular underpinnings of cancer. However, novel approaches are required for efficient analysis and interpretation due to these data's extensive dimensionality and multifaceted nature [1].
Cancer continues to be a global health challenge, causing substantial morbidity and mortality worldwide. Early and accurate cancer detection is critical for successful treatment outcomes and predictive analytics [2], making the integration of multiple analytical techniques essential for enhancing diagnostic accuracy [3]. In this context, gene expression datasets have emerged as a valuable resource, providing insights into the molecular mechanisms underlying cancer development and progression.
The need of the hour is to explore innovative methodologies that harness the potential of gene expression data for cancer detection. Grasping the intricate patterns and associations in these high-dimensional data can greatly enhance our comprehension of cancer biology and aid in tailoring individualized treatment approaches [4]. Integrating multiple analytical techniques offers a unique opportunity to extract valuable information from gene expression datasets, paving the way for more effective and timely cancer diagnosis and intervention.
The integration of EBH [5], JSD, deep learning, and contour mathematics in the analysis of gene expression data presents a cutting-edge and innovative approach to cancer detection. By combining these powerful analytical techniques, this research holds significant potential to revolutionize cancer diagnosis and treatment strategies worldwide. JSD [6] enables the measurement of distributional differences between cancerous and non-cancerous samples, providing valuable insights into gene expression patterns associated with cancer. Deep learning models bring automatic feature extraction and pattern recognition capabilities, enabling the discovery of complex molecular relationships crucial for accurate cancer detection [7]. Additionally, contour mathematics offers a unique visualization tool, aiding in the interpretation of the model's decision boundaries and regions in the high-dimensional feature space [8]. Ultimately, this integrated approach aims to contribute to the global fight against cancer by improving early diagnosis and enabling personalized treatment, thus alleviating the burden of cancer on a global scale.
This research presents a novel and integrated approach to cancer detection, combining EBH, JSD, deep learning, and contour mathematics in the analysis of gene expression data. The objectives of this study are threefold:

•
To leverage EBH for preprocessing and JSD as an information-theoretic measure to quantify distributional differences between cancerous and non-cancerous samples based on the preprocessed data. By integrating JSD into the analysis, this research aims to gain deeper insights into gene expression patterns, enabling the identification of critical genomic signatures associated with cancer.

•
To harness the capabilities of deep learning models for automatic feature extraction and pattern recognition from gene expression data. By employing deep learning, this research seeks to uncover complex molecular relationships and identify crucial features that contribute to accurate cancer detection.

•
To utilize contour mathematics for visual interpretation of the deep learning model's decision boundaries and regions in the high-dimensional feature space. This novel visualization approach enhances the interpretability of the model, facilitating a deeper understanding of the complex interactions between genes and their relevance in cancer detection.
The model achieved an impressive 95.65% accuracy rate across 33 cancer-type cohorts. The study also incorporated heat maps to interpret gene significance in diverse cancer types, ultimately enhancing the understanding of cancer's intricate characteristics and propelling the usage of deep learning in cancer genomics. Another work [16] employed a combination of fuzzy support vector machines (SVMs), particle swarm optimization (PSO), and genetic algorithms (GAs) for improved gene-based cancer classification. This approach uses fuzzy logic and a decision tree algorithm to boost its sensitivity to training samples and to tailor a unique set of rules per cancer type. High classification accuracy was achieved across leukemia, colon, and breast cancer datasets, demonstrating the method's ability to effectively reduce data dimensionality and identify pertinent gene subsets.
The authors of [17] introduced the MCSE-enhancer model, a multi-classifier stacked ensemble, to accurately pinpoint enhancers in DNA (deoxyribonucleic acid) sequences. Leveraging both experimental techniques like ChIP-seq and computational methods, the model surpassed existing enhancer classifiers with 81.5% accuracy. This integrated approach offers a significant advancement in enhancer detection. Utilizing RNA-Seq (ribonucleic acid sequencing) data from the Mendeley repository for five cancer types, the study in [18] converted expression values to 2D images and applied DL for feature extraction and classification. Among eight tested models, the convolutional neural network (CNN) emerged as the most effective, excelling particularly with a 70-30 data split.
The authors of [19] introduced the m5C (5-methylcytosine)-pred model, which accurately identifies RNA m5C methylation sites across five species, leveraging five feature encoding techniques and optimizing with SHapley Additive exPlanations and Optuna, surpassing existing methods. The study in [20] assessed the literature on convolutional neural network applications in gene expression data analysis, highlighting a peak accuracy of 99.2% across studies. The study in [21] introduced i6mA-Caps (N6-methyladenine-CapsuleNet), a CapsuleNet-based tool for detecting DNA N6-methyladenine sites, achieving up to 96.71% accuracy across three genomes and outperforming current leading methods. Utilizing ML, the study in [22] integrated gene expression data from three SLE (systemic lupus erythematosus) datasets, achieving up to 83% classification accuracy for disease activity. Despite technical variation challenges, gene modules proved more robust than raw gene expression, maintaining around 70% accuracy. The authors of [23] evaluated the efficacy of various optimizers in deep learning for classifying five cancer types using gene expression data. AdaGrad and Adam stood out among the tested optimizers, with performance further analyzed across different learning and decay rates.
The study in [24] introduced DCGN, a novel DL approach that integrates a CNN and a BiGRU (bidirectional gated recurrent unit) to optimize cancer subtype classification from gene expression data. Addressing the challenges of limited samples and high dimensionality, DCGN outperforms seven existing methods in classifying breast and bladder cancers, showcasing its superior capability in handling sparse, high-dimensional datasets. The study in [25] introduced the DL-m6A (N6-methyladenosine) tool, based on deep learning and multiple encoding schemes, which improves the identification of m6A sites in mammals. Surpassing existing tools in performance, a dedicated web server is available for broader access.
While many of the reviewed studies utilized CNN models and other ML approaches for cancer classification using gene expression data, few have focused on integrating these models with comprehensive explainability methods for better interpretability of the model outcomes. There is also a lack of research on models that can efficiently handle complex feature extraction and visualization across different cancer types.

Dataset
From Mendeley Data [26], we selected the Microarray Gene Expression Cancer (MGEC) dataset to assess the efficacy of our approach. Comprising 14,124 features grouped into six distinct classes, this dataset provides a rare opportunity to rigorously evaluate the effectiveness of the proposed technique on specific categories of malignancy data. The MGEC dataset is widely used for cancer type prediction, and its samples come from prestigious bioinformatics laboratories at top institutions across the globe. The use of microarray data in oncology research has become crucial in recent years, especially for early cancer detection, directing treatment choices, and forecasting outcomes.
This comprehensive collection includes brain, lung, prostate, and CNS (central nervous system) embryonal cancers. Figure 1 depicts the expression heat map of all the considered cancer types. Gene expression heat maps are graphical representations that showcase the expression levels of multiple genes across various samples or conditions. These heat maps reveal distinct expression patterns when focusing on specific cancers such as lung, brain, prostate, and CNS embryonal cancers. Lung cancer heat maps might exhibit specific upregulation or downregulation of genes related to cell proliferation and smoke exposure. Brain cancer maps could highlight genes involved in neural development and signaling pathways. Genes associated with hormonal regulation and cell growth might stand out in prostate cancer. For CNS embryonal cancers, a group of high-grade malignant tumors usually found in children, genes related to embryonic development and rapid cell division might be prominently displayed. Researchers can identify commonalities and differences by comparing the expression patterns across these cancers, potentially guiding the DL methodologies to learn therapeutic strategies and understand disease mechanisms. The term "microarray" is often used in the medical sector to refer to an essential research tool that can evaluate the expression of several genes at once. Microarray profiling has become the gold standard for identifying and classifying tumor development.
We have analyzed microarray data for reliable cancer diagnostics using unique methods and developed improved techniques to analyze the results.To fully evaluate the efficacy of our suggested approach, we compare the accuracy results to those of other existing datasets; this further emphasizes the importance of the MGEC dataset in furthering oncology studies and precise diagnosis.

Methodology
The generic architecture of the proposed model is depicted in Figure 2, which comprises the primary strategy of the computation and its purpose. The diagram illustrates the Empirical Bayes Harmonization (EBH) process applied to gene expression datasets, highlighting its efficacy in addressing batch effects. Through contour visualizations, areas of heightened concentration for cancer-related gene expression signatures in n-dimensional feature space are distinctly demarcated, either by contour lines or color-coded regions. The visualization effectively contrasts the gene expression profiles of cancerous and non-cancerous samples, as measured by JSD. Furthermore, the schematic representation of the PCA-transformer showcases its three-phase structure, comprising the embedding layer, self-learning transformer, and output layer, elucidating its capability to discern intricate patterns from individual gene elements in the dataset.
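As a minimal sketch of the contour step, assuming a trained classifier is available as a probability function over the 2-D PCA plane, one can evaluate it on a dense grid and hand the result to a contour plotter such as matplotlib's contourf. The decision function below is a purely illustrative stand-in, not the paper's trained model:

```python
import numpy as np

# Hypothetical stand-in for a trained model's P(cancer | x, y) over PCA space
def decision_fn(x, y):
    return 1.0 / (1.0 + np.exp(-(1.5 * x - 0.8 * y)))  # toy logistic surface

# Dense grid spanning the 2-D feature space
xs = np.linspace(-3, 3, 200)
ys = np.linspace(-3, 3, 200)
xx, yy = np.meshgrid(xs, ys)
zz = decision_fn(xx, yy)  # class probability at every grid point

# zz can now be rendered with plt.contourf(xx, yy, zz, levels=10);
# the 0.5 level traces the model's decision boundary.
boundary = np.isclose(zz, 0.5, atol=0.01)
```

The same grid-evaluate-plot pattern works for any model exposing a probability or score over the reduced feature space.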

Data Preprocessing
Batch effect correction in gene expression data is crucial to ensure that the input to the subsequent steps is clean and consistent. Thus, Empirical Bayes Harmonization (EBH) is a novel data preprocessing procedure that combines the Empirical Bayes framework and harmonization principles to address batch effects in gene expression datasets [27,28]. EBH aims to remove technical variations while harmonizing the data, allowing for robust and integrative analyses across diverse dataset formats. Algorithm 1 presents the procedure of EBH.
In this data preparation and analysis process, we start with the gene expression data matrix D ∈ |M|^(n×p) for the primary dataset, which we divide into B_s ∈ |M|^(n×p) (biological signal matrix) and B_e ∈ |M|^(n×p) (batch-specific effect matrix). We fit a linear model to estimate the batch effects for each gene (g_i) in the primary dataset, taking into account the overall mean (µ) and the residual error (e). We then obtain additional gene expression datasets D_j from different batches and perform the same process to estimate batch-specific effects in each dataset using linear models, followed by empirical Bayes shrinkage to stabilize the variance estimates. Afterwards, we correct all datasets' gene expression data matrices to remove batch effects and harmonize the data. The resulting harmonized and batch-corrected gene expression data can be integrated for more robust analyses, such as differential expression, enabling comprehensive insights into gene expression patterns across diverse datasets, crucial for cancer detection and research.
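The estimate-shrink-subtract loop above can be sketched as follows. This is a minimal, ComBat-style approximation assuming a single additive per-batch shift per gene (location-only adjustment); the function name and the shrinkage weighting are illustrative, not the authors' exact Algorithm 1:

```python
import numpy as np

def eb_harmonize(D, batches):
    """Sketch of EB batch correction: standardise per gene, estimate per-batch
    shifts, shrink them with empirical Bayes, then subtract."""
    D = np.asarray(D, dtype=float)          # genes x samples
    batches = np.asarray(batches)
    out = np.empty_like(D)
    mu = D.mean(axis=1, keepdims=True)      # overall per-gene mean
    var = D.var(axis=1, keepdims=True) + 1e-8
    Z = (D - mu) / np.sqrt(var)             # standardised data
    for b in np.unique(batches):
        idx = batches == b
        gamma_hat = Z[:, idx].mean(axis=1)  # per-gene batch-effect estimate
        gamma_bar = gamma_hat.mean()        # EB prior mean across genes
        tau2 = gamma_hat.var() + 1e-8       # EB prior variance
        n_b = idx.sum()
        s2 = Z[:, idx].var(axis=1) + 1e-8   # within-batch variance per gene
        # Shrink each gene's estimate toward the batch-wide mean
        gamma_star = (n_b * tau2 * gamma_hat + s2 * gamma_bar) / (n_b * tau2 + s2)
        out[:, idx] = Z[:, idx] - gamma_star[:, None]
    return out * np.sqrt(var) + mu          # restore original scale
```

Shrinking the per-gene estimates toward the batch-wide mean is what stabilizes the variance estimates mentioned above, which matters when each batch contains few samples.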
Upon completing the batch effect correction and harmonization loops of Algorithm 1, the harmonized and batch-corrected gene expression data matrices B_s(g_i) and B_s^j(g_i) are utilized to extract the required features. PCA (Principal Component Analysis) is applied to extract features from the output of the batch effect correction and harmonization step.
We sort the eigenvalues in descending order and choose the top K eigenvectors, representing the principal components. These selected eigenvectors V_1, V_2, ..., V_K capture the most significant variation in the integrated gene expression data, allowing for dimensionality reduction and efficient feature extraction. The mean-centered integrated data (centered by the mean of each gene across samples, X(µ)) are ultimately projected onto the selected principal components (PCs) to obtain the feature representation.
The resulting feature matrix F_i ∈ |M|^(n×p) is projected onto a two-dimensional space using the N selected principal components.
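The eigendecomposition-based extraction described above can be sketched as follows, assuming the harmonized data are arranged samples × genes; the function name `pca_features` is illustrative:

```python
import numpy as np

def pca_features(X, K):
    """Project samples onto the top-K principal components of X (samples x genes)."""
    Xc = X - X.mean(axis=0)            # mean-centre each gene across samples
    cov = np.cov(Xc, rowvar=False)     # gene-by-gene covariance matrix
    vals, vecs = np.linalg.eigh(cov)   # eigenvalues returned in ascending order
    top = np.argsort(vals)[::-1][:K]   # indices of the K largest eigenvalues
    return Xc @ vecs[:, top]           # n x K feature representation
```

For the very high-dimensional microarray case (p ≈ 14,124 genes), a truncated SVD on the centered matrix would give the same projection more cheaply than forming the full covariance matrix.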

Jensen-Shannon Divergence (JSD)
The distributions of gene expression profiles of cancerous (Ꞓ) and non-cancerous (₵) samples are computed using JSD. JSD measures the similarity or dissimilarity between the two distributions, providing valuable information about the differences in gene expression patterns between the two groups.
Initially, for each gene gi, the probability distributions are computed as follows:

P(Ꞓi) = (∑ Ꞓi) / n,  P(₵i) = (∑ ₵i) / n

where P(Ꞓi) and P(₵i) represent the probability distributions of gene expression profiles for cancerous and non-cancerous samples, respectively, and ∑ Ꞓi and ∑ ₵i represent the counts of occurrences of gene expression values for gi in the respective sample group of size n.
To estimate the JSD between the two distributions, it is necessary to compute the average distribution ƥi of P(Ꞓi) and P(₵i) for each gi:

ƥi = ½ [P(Ꞓi) + P(₵i)]

Then, the Jensen-Shannon divergence is determined as follows:

JSD[P(Ꞓi) || P(₵i)] = ½ ϒ[P(Ꞓi) || ƥi] + ½ ϒ[P(₵i) || ƥi]

where ϒ(X||Y) denotes the Kullback-Leibler (KL) divergence, which measures how one probability distribution diverges from a second, expected probability distribution.
This provides a symmetric measure of the similarity between the two probability distributions, P(Ꞓi) and P(₵i). The result is a value ranging from 0 to 1 (using a base-2 logarithm), where 0 indicates that the two distributions are identical, and 1 indicates that the two distributions do not overlap at all.
The KL divergence ϒ is not symmetric by nature: ϒ[P(Ꞓi)||P(₵i)] generally differs from ϒ[P(₵i)||P(Ꞓi)]. The JSD resolves this by comparing each of P(Ꞓi) and P(₵i) against the average distribution ƥi rather than directly against each other, so the measured divergence between the two distributions is balanced and unbiased. Incorporating the KL divergence in this symmetrized form yields a mathematically rigorous and well-founded approach to comparing probability distributions in genomics research, ensuring precise measurement of distributional differences and enhancing the reliability of the research outcomes.
In the context of gene expression profiles, JSD effectively distinguishes between cancerous and non-cancerous samples, with lower values indicating similarity and higher values highlighting significant differences.
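The per-gene JSD computation above can be sketched as follows. A base-2 logarithm is used so the result lies in [0, 1], and zero-probability bins are skipped by the usual 0·log 0 = 0 convention:

```python
import numpy as np

def kl_div(p, q):
    """Kullback-Leibler divergence (base 2), skipping zero-probability bins."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))

def jsd(p, q):
    """Jensen-Shannon divergence between two discrete distributions."""
    p = np.asarray(p, dtype=float); p = p / p.sum()
    q = np.asarray(q, dtype=float); q = q / q.sum()
    m = 0.5 * (p + q)                       # average distribution
    return 0.5 * kl_div(p, m) + 0.5 * kl_div(q, m)
```

Identical distributions give 0, fully disjoint distributions give 1, and intermediate overlap yields values in between, matching the interpretation above.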

Intelligent Computation
To learn complex patterns and relationships from each feature matrix F_i ∈ |M|^(n×p), we applied a DL transformer-based process, namely the PCA-transformer [29-31]. The model comprises three phases: an embedding layer, a self-learning transformer, and an output layer.
Embedding Process (Ẽ): The extracted F_i ∈ |M|^(n×p) is passed through an embedding layer to convert the numerical values into dense embeddings. This layer allows the model to learn meaningful representations of the PCs. Thus, the output of the embedding layer is obtained by matrix multiplication and is expressed as Ẽ = F_i · E, where E is the embedding matrix with the chosen embedding dimensions.

of P(
Initially, for each g , the probability distribution is computed as follows: where (Ꞓ ) and (₵ ) represent the probability distribution of gene express files for cancerous and non-cancerous samples, respectively.∑ Ꞓ and ∑ ₵ repre count of occurrences of gene expression values for g in the respective sample gr To estimate the JSD between the two distributions, it is necessary to com average distribution ƥ of (Ꞓ ) and (₵ ) for each g .
Then, the Jensen-Shannon divergence is determined as follows: where ϒ(X||Y)↦ϒ[P(Ꞓi)||P(₵i)] denotes the Kullback-Leibler (ϒ) divergence measures how one probability distribution diverges from a second, expected pro distribution.
This provides a symmetric measure of the similarity between the two probab tributions, P(Ꞓi) and P(₵i).The result is a value ranging from 0 to 1, where 0 indic the two distributions are identical, and 1 indicates that the two distributions do n lap at all.
The estimation of ϒ, as described in the provided method, involves compa probability distributions, P(Ꞓi) and P(₵i), with an expected distribution ƥ  .KL div measures how P(Ꞓi) and P(₵i) deviate from the expected distribution ƥ  .Specific method uses the symmetric form of ϒ, denoted as ϒ[P(Ꞓi)||P(₵i)], which ensures divergence between P(Ꞓi) and P(₵i) is balanced and unbiased.This symmetric form acknowledges that ϒ is not symmetric by nature, thereby providing an accurate a prehensive evaluation of the dissimilarity between the two probability distribut corporating this symmetric ϒ into the Jensen-Shannon divergence calculation strates a mathematically rigorous and well-founded approach to comparing pro distributions in genomics research.This nuanced understanding and applicat highlight the method's technical robustness, ensuring precise measurement of tional differences and enhancing the reliability of the research outcomes.
In the context of gene expression profiles, JSD effectively distinguishes betw cerous and non-cancerous samples, with lower values indicating similarity and values highlighting significant differences.

Intelligent Computation
To learn complex patterns and relationships from each of the ( ) ∈  , we a DL transformer-based process, namely PCA-transformer [29][30][31].The model com three phases: embedding layer, self-learning transformer, and output layer.
Embedding Process (Ẽ): The extracted  ∈ || × will be passed through an ding layer to convert the numerical values into dense embeddings.This layer al model to learn meaningful representations of the PCs.Thus, the output of the em layer is obtained by matrix multiplication and is expressed as where E is the embedding matrix with embedding dimensions.
) and P( Initially, for each g , the probability distribution is computed as follows: where (Ꞓ ) and (₵ ) represent the probability distribution of gene expressio files for cancerous and non-cancerous samples, respectively.∑ Ꞓ and ∑ ₵ repres count of occurrences of gene expression values for g in the respective sample gro To estimate the JSD between the two distributions, it is necessary to comp average distribution ƥ of (Ꞓ ) and (₵ ) for each g .
This provides a symmetric measure of the similarity between the two probabil tributions, P(Ꞓi) and P(₵i).The result is a value ranging from 0 to 1, where 0 indica the two distributions are identical, and 1 indicates that the two distributions do no lap at all.
The estimation of ϒ, as described in the provided method, involves compari probability distributions, P(Ꞓi) and P(₵i), with an expected distribution ƥ  .KL dive measures how P(Ꞓi) and P(₵i) deviate from the expected distribution ƥ  .Specifica method uses the symmetric form of ϒ, denoted as ϒ[P(Ꞓi)||P(₵i)], which ensures t divergence between P(Ꞓi) and P(₵i) is balanced and unbiased.This symmetric form acknowledges that ϒ is not symmetric by nature, thereby providing an accurate an prehensive evaluation of the dissimilarity between the two probability distributio corporating this symmetric ϒ into the Jensen-Shannon divergence calculation d strates a mathematically rigorous and well-founded approach to comparing prob distributions in genomics research.This nuanced understanding and applicatio highlight the method's technical robustness, ensuring precise measurement of d tional differences and enhancing the reliability of the research outcomes.
In the context of gene expression profiles, JSD effectively distinguishes betwe cerous and non-cancerous samples, with lower values indicating similarity and values highlighting significant differences.

Intelligent Computation
To learn complex patterns and relationships from each of the ( ) ∈  , we ap DL transformer-based process, namely PCA-transformer [29][30][31].The model comp three phases: embedding layer, self-learning transformer, and output layer.
Embedding Process (Ẽ): The extracted  ∈ || × will be passed through an e ding layer to convert the numerical values into dense embeddings.This layer allo model to learn meaningful representations of the PCs.Thus, the output of the emb layer is obtained by matrix multiplication and is expressed as where E is the embedding matrix with embedding dimensions.
) for each g i .
where (Ꞓ ) and (₵ ) represent the probability distribution of gene expression profiles for cancerous and non-cancerous samples, respectively.∑ Ꞓ and ∑ ₵ represent the count of occurrences of gene expression values for g in the respective sample group (n).
To estimate the JSD between the two distributions, it is necessary to compute the average distribution ƥ of (Ꞓ ) and (₵ ) for each g .
Then, the Jensen-Shannon divergence is determined as follows: where ϒ(X||Y)↦ϒ[P(Ꞓi)||P(₵i)] denotes the Kullback-Leibler (ϒ) divergence, which measures how one probability distribution diverges from a second, expected probability distribution.
This provides a symmetric measure of the similarity between the two probability distributions, P(Ꞓi) and P(₵i).The result is a value ranging from 0 to 1, where 0 indicates that the two distributions are identical, and 1 indicates that the two distributions do not overlap at all.
The estimation of ϒ, as described in the provided method, involves comparing two probability distributions, P(Ꞓi) and P(₵i), with an expected distribution ƥ  .KL divergence measures how P(Ꞓi) and P(₵i) deviate from the expected distribution ƥ  .Specifically, the method uses the symmetric form of ϒ, denoted as ϒ[P(Ꞓi)||P(₵i)], which ensures that the divergence between P(Ꞓi) and P(₵i) is balanced and unbiased.This symmetric formulation acknowledges that ϒ is not symmetric by nature, thereby providing an accurate and comprehensive evaluation of the dissimilarity between the two probability distributions.Incorporating this symmetric ϒ into the Jensen-Shannon divergence calculation demonstrates a mathematically rigorous and well-founded approach to comparing probability distributions in genomics research.This nuanced understanding and application of ϒ highlight the method's technical robustness, ensuring precise measurement of distributional differences and enhancing the reliability of the research outcomes.
In the context of gene expression profiles, JSD effectively distinguishes between cancerous and non-cancerous samples, with lower values indicating similarity and higher values highlighting significant differences.

Intelligent Computation
To learn complex patterns and relationships from each of the extracted components, we applied a DL transformer-based process, namely the PCA-transformer [29-31]. The model comprises three phases: an embedding layer, a self-learning transformer, and an output layer.
Embedding Process (Ẽ): The extracted components are passed through an embedding layer that converts the numerical values into dense embeddings. This layer allows the model to learn meaningful representations of the PCs. Thus, the output of the embedding layer is obtained by matrix multiplication and is expressed as Ẽ = PC · E, where E is the embedding matrix with the chosen embedding dimension.
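The embedding step amounts to a single matrix product; the sketch below shows this with illustrative shapes (the sample count, number of PCs, and embedding dimension are assumptions, not values from the paper).

```python
# Sketch of the embedding phase: numerical PC values are mapped to dense
# vectors by multiplying with an embedding matrix E (learned in practice;
# random here for illustration).
import numpy as np

rng = np.random.default_rng(1)
n_samples, n_pcs, d_embed = 4, 16, 32

PC = rng.normal(size=(n_samples, n_pcs))   # extracted principal components
E = rng.normal(size=(n_pcs, d_embed))      # embedding matrix
embeddings = PC @ E                        # dense embeddings, shape (4, 32)
```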
Self-Supervised Transformer: Let TL be the number of transformer layers in the model. Each transformer layer consists of self-attention and feedforward neural network sub-layers.
For each layer l (l = 1, …, TL), the self-attention mechanism computes the attention weights Aw and the attention output Ow. Let Il−1 be the input embeddings for the (l−1)th layer, and Ol be the output embeddings for the lth layer. Thus, the self-attention computation maps Il−1 to Ol through Aw and Ow, followed by the feedforward sub-layer.
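A single-head self-attention step of the kind described here can be sketched as follows. The query/key/value projections and the √d scaling follow the standard transformer formulation and are assumptions of this sketch, not details taken from the paper.

```python
# Illustrative single-head self-attention for one layer: attention weights
# Aw over the layer inputs I_{l-1}, and attention output Ow = Aw · V.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # stabilized softmax
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(2)
seq_len, d = 16, 32
I_prev = rng.normal(size=(seq_len, d))               # input embeddings I_{l-1}

# learned projections in practice; random matrices here for illustration
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
Q, K, V = I_prev @ Wq, I_prev @ Wk, I_prev @ Wv

Aw = softmax(Q @ K.T / np.sqrt(d))                   # attention weights
Ow = Aw @ V                                          # attention output
```

Each row of `Aw` is a probability distribution over positions, so every output row of `Ow` is a weighted combination of the value vectors; a feedforward sub-layer would then produce Ol.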
This provides a symmetric measure of the similarity between the two probability distributions, P(Ꞓi) and P(₵i).The result is a value ranging from 0 to 1, where 0 indicates that the two distributions are identical, and 1 indicates that the two distributions do not overlap at all.
The estimation of ϒ, as described in the provided method, involves comparing two probability distributions, P(Ꞓi) and P(₵i), with an expected distribution ƥ  .KL divergence measures how P(Ꞓi) and P(₵i) deviate from the expected distribution ƥ  .Specifically, the method uses the symmetric form of ϒ, denoted as ϒ[P(Ꞓi)||P(₵i)], which ensures that the divergence between P(Ꞓi) and P(₵i) is balanced and unbiased.This symmetric formulation acknowledges that ϒ is not symmetric by nature, thereby providing an accurate and comprehensive evaluation of the dissimilarity between the two probability distributions.Incorporating this symmetric ϒ into the Jensen-Shannon divergence calculation demonstrates a mathematically rigorous and well-founded approach to comparing probability distributions in genomics research.This nuanced understanding and application of ϒ highlight the method's technical robustness, ensuring precise measurement of distributional differences and enhancing the reliability of the research outcomes.
In the context of gene expression profiles, JSD effectively distinguishes between cancerous and non-cancerous samples, with lower values indicating similarity and higher values highlighting significant differences.

Intelligent Computation
To learn complex patterns and relationships from each of the ( ) ∈  , we applied a DL transformer-based process, namely PCA-transformer [29][30][31].The model comprises of three phases: embedding layer, self-learning transformer, and output layer.
Embedding Process (Ẽ): The extracted  ∈ || × will be passed through an embedding layer to convert the numerical values into dense embeddings.This layer allows the model to learn meaningful representations of the PCs.Thus, the output of the embedding layer is obtained by matrix multiplication and is expressed as where E is the embedding matrix with embedding dimensions.Self-Supervised Transformer: Let TL be the number of transformer layers in the model.Each transformer layer consists of self-attention and feedforward neural network sub-layers.
For each TL(L → 1 to l), the self-attention mechanism computes the attention weights Aw and the attention output, Ow.Let I l−1 be the input embeddings for the (l−1) th layer, and O l be the output embeddings for the l th layer.Thus, the self-attention computation is expressed as This provides a symmetric measure of the similarity between the two probability distributions, P( Diagnostics 2023, 13, x FOR PEER REVIEW 10 of 20 Initially, for each g , the probability distribution is computed as follows: where (Ꞓ ) and (₵ ) represent the probability distribution of gene expression profiles for cancerous and non-cancerous samples, respectively.∑ Ꞓ and ∑ ₵ represent the count of occurrences of gene expression values for g in the respective sample group (n).
To estimate the JSD between the two distributions, it is necessary to compute the average distribution ƥ of (Ꞓ ) and (₵ ) for each g .
Then, the Jensen-Shannon divergence is determined as follows: where ϒ(X||Y)↦ϒ[P(Ꞓi)||P(₵i)] denotes the Kullback-Leibler (ϒ) divergence, which measures how one probability distribution diverges from a second, expected probability distribution.
This provides a symmetric measure of the similarity between the two probability distributions, P(Ꞓi) and P(₵i).The result is a value ranging from 0 to 1, where 0 indicates that the two distributions are identical, and 1 indicates that the two distributions do not overlap at all.
The estimation of ϒ, as described in the provided method, involves comparing two probability distributions, P(Ꞓi) and P(₵i), with an expected distribution ƥ  .KL divergence measures how P(Ꞓi) and P(₵i) deviate from the expected distribution ƥ  .Specifically, the method uses the symmetric form of ϒ, denoted as ϒ[P(Ꞓi)||P(₵i)], which ensures that the divergence between P(Ꞓi) and P(₵i) is balanced and unbiased.This symmetric formulation acknowledges that ϒ is not symmetric by nature, thereby providing an accurate and comprehensive evaluation of the dissimilarity between the two probability distributions.Incorporating this symmetric ϒ into the Jensen-Shannon divergence calculation demonstrates a mathematically rigorous and well-founded approach to comparing probability distributions in genomics research.This nuanced understanding and application of ϒ highlight the method's technical robustness, ensuring precise measurement of distributional differences and enhancing the reliability of the research outcomes.
In the context of gene expression profiles, JSD effectively distinguishes between cancerous and non-cancerous samples, with lower values indicating similarity and higher values highlighting significant differences.

Intelligent Computation
To learn complex patterns and relationships from each of the ( ) ∈  , we applied a DL transformer-based process, namely PCA-transformer [29 -31].The model comprises of three phases: embedding layer, self-learning transformer, and output layer.
Embedding Process (Ẽ): The extracted  ∈ || × will be passed through an embedding layer to convert the numerical values into dense embeddings.This layer allows the Then, the Jensen-Shannon divergence is determined as follows: where ϒ(X||Y)↦ϒ[P(Ꞓi)||P(₵i)] denotes the Kullback-Leibler (ϒ) divergence, which measures how one probability distribution diverges from a second, expected probability distribution.
This provides a symmetric measure of the similarity between the two probability distributions, P(Ꞓi) and P(₵i).The result is a value ranging from 0 to 1, where 0 indicates that the two distributions are identical, and 1 indicates that the two distributions do not overlap at all.
The estimation of ϒ, as described in the provided method, involves comparing two probability distributions, P(Ꞓi) and P(₵i), with an expected distribution ƥ  .KL divergence measures how P(Ꞓi) and P(₵i) deviate from the expected distribution ƥ  .Specifically, the method uses the symmetric form of ϒ, denoted as ϒ[P(Ꞓi)||P(₵i)], which ensures that the divergence between P(Ꞓi) and P(₵i) is balanced and unbiased.This symmetric formulation acknowledges that ϒ is not symmetric by nature, thereby providing an accurate and comprehensive evaluation of the dissimilarity between the two probability distributions.Incorporating this symmetric ϒ into the Jensen-Shannon divergence calculation demonstrates a mathematically rigorous and well-founded approach to comparing probability distributions in genomics research.This nuanced understanding and application of ϒ highlight the method's technical robustness, ensuring precise measurement of distributional differences and enhancing the reliability of the research outcomes.
In the context of gene expression profiles, JSD effectively distinguishes between cancerous and non-cancerous samples, with lower values indicating similarity and higher values highlighting significant differences.

Intelligent Computation
To learn complex patterns and relationships from each of the ( ) ∈  , we applied a DL transformer-based process, namely PCA-transformer [29][30][31].The model comprises of three phases: embedding layer, self-learning transformer, and output layer.
Embedding Process (Ẽ): The extracted  ∈ || × will be passed through an embedding layer to convert the numerical values into dense embeddings.This layer allows the model to learn meaningful representations of the PCs.Thus, the output of the embedding layer is obtained by matrix multiplication and is expressed as where E is the embedding matrix with embedding dimensions.Self-Supervised Transformer: Let TL be the number of transformer layers in the model.Each transformer layer consists of self-attention and feedforward neural network sub-layers.
For each TL(L → 1 to l), the self-attention mechanism computes the attention weights Aw and the attention output, Ow.Let I l−1 be the input embeddings for the (l−1) th layer, and O l be the output embeddings for the l th layer.Thus, the self-attention computation is ex- ).The result is a value ranging from 0 to 1, where 0 indicates that the two distributions are identical, and 1 indicates that the two distributions do not overlap at all.
The estimation of γ, as described in the provided method, involves comparing two probability distributions, P( Diagnostics 2023, 13, x FOR PEER REVIEW Initially, for each g , the probability distribution is computed as follows: where (Ꞓ ) and (₵ ) represent the probability distribution of gene express files for cancerous and non-cancerous samples, respectively.∑ Ꞓ and ∑ ₵ repr count of occurrences of gene expression values for g in the respective sample g To estimate the JSD between the two distributions, it is necessary to com average distribution ƥ of (Ꞓ ) and (₵ ) for each g .
Then, the Jensen-Shannon divergence is determined as follows: where ϒ(X||Y)↦ϒ[P(Ꞓi)||P(₵i)] denotes the Kullback-Leibler (ϒ) divergence measures how one probability distribution diverges from a second, expected pr distribution.

ϒ(𝑿||𝒀) = 𝑿(𝒙) 𝐥𝐨𝐠 𝟐 𝑿(𝒙) 𝒀(𝒙)
This provides a symmetric measure of the similarity between the two probab tributions, P(Ꞓi) and P(₵i).The result is a value ranging from 0 to 1, where 0 indic the two distributions are identical, and 1 indicates that the two distributions do n lap at all.
The estimation of ϒ, as described in the provided method, involves compa probability distributions, P(Ꞓi) and P(₵i), with an expected distribution ƥ  .KL div measures how P(Ꞓi) and P(₵i) deviate from the expected distribution ƥ  .Specific method uses the symmetric form of ϒ, denoted as ϒ[P(Ꞓi)||P(₵i)], which ensures divergence between P(Ꞓi) and P(₵i) is balanced and unbiased.This symmetric form acknowledges that ϒ is not symmetric by nature, thereby providing an accurate a prehensive evaluation of the dissimilarity between the two probability distribut corporating this symmetric ϒ into the Jensen-Shannon divergence calculation strates a mathematically rigorous and well-founded approach to comparing pr distributions in genomics research.This nuanced understanding and applicat highlight the method's technical robustness, ensuring precise measurement of tional differences and enhancing the reliability of the research outcomes.
In the context of gene expression profiles, JSD effectively distinguishes betw cerous and non-cancerous samples, with lower values indicating similarity an values highlighting significant differences.

Intelligent Computation
To learn complex patterns and relationships from each of the ( ) ∈  , we a Then, the Jensen-Shannon divergence is determined as follows: where ϒ(X||Y)↦ϒ[P(Ꞓi)||P(₵i)] denotes the Kullback-Leibler (ϒ) divergence, which measures how one probability distribution diverges from a second, expected probability distribution.
This provides a symmetric measure of the similarity between the two probability distributions, P(Ꞓi) and P(₵i).The result is a value ranging from 0 to 1, where 0 indicates that the two distributions are identical, and 1 indicates that the two distributions do not overlap at all.
The estimation of ϒ, as described in the provided method, involves comparing two probability distributions, P(Ꞓi) and P(₵i), with an expected distribution ƥ  .KL divergence measures how P(Ꞓi) and P(₵i) deviate from the expected distribution ƥ  .Specifically, the method uses the symmetric form of ϒ, denoted as ϒ[P(Ꞓi)||P(₵i)], which ensures that the divergence between P(Ꞓi) and P(₵i) is balanced and unbiased.This symmetric formulation acknowledges that ϒ is not symmetric by nature, thereby providing an accurate and comprehensive evaluation of the dissimilarity between the two probability distributions.Incorporating this symmetric ϒ into the Jensen-Shannon divergence calculation demonstrates a mathematically rigorous and well-founded approach to comparing probability distributions in genomics research.This nuanced understanding and application of ϒ highlight the method's technical robustness, ensuring precise measurement of distributional differences and enhancing the reliability of the research outcomes.
In the context of gene expression profiles, JSD effectively distinguishes between cancerous and non-cancerous samples, with lower values indicating similarity and higher values highlighting significant differences.

Intelligent Computation
To learn complex patterns and relationships from each of the ( ) ∈  , we applied a DL transformer-based process, namely PCA-transformer [29][30][31].The model comprises of three phases: embedding layer, self-learning transformer, and output layer.
Embedding Process (Ẽ): The extracted  ∈ || × will be passed through an embedding layer to convert the numerical values into dense embeddings.This layer allows the model to learn meaningful representations of the PCs.Thus, the output of the embedding layer is obtained by matrix multiplication and is expressed as where E is the embedding matrix with embedding dimensions.Self-Supervised Transformer: Let TL be the number of transformer layers in the model.Each transformer layer consists of self-attention and feedforward neural network ), with an expected distribution where (Ꞓ ) and (₵ ) represent the probability distribution of gene expression profiles for cancerous and non-cancerous samples, respectively.∑ Ꞓ and ∑ ₵ represent the count of occurrences of gene expression values for g in the respective sample group (n).
To estimate the JSD between the two distributions, it is necessary to compute the average distribution ƥ of (Ꞓ ) and (₵ ) for each g .
Then, the Jensen-Shannon divergence is determined as follows: where ϒ(X||Y)↦ϒ[P(Ꞓi)||P(₵i)] denotes the Kullback-Leibler (ϒ) divergence, which measures how one probability distribution diverges from a second, expected probability distribution.
This provides a symmetric measure of the similarity between the two probability distributions, P(Ꞓi) and P(₵i).The result is a value ranging from 0 to 1, where 0 indicates that the two distributions are identical, and 1 indicates that the two distributions do not overlap at all.
The estimation of ϒ, as described in the provided method, involves comparing two probability distributions, P(Ꞓi) and P(₵i), with an expected distribution ƥ  .KL divergence measures how P(Ꞓi) and P(₵i) deviate from the expected distribution ƥ  .Specifically, the method uses the symmetric form of ϒ, denoted as ϒ[P(Ꞓi)||P(₵i)], which ensures that the divergence between P(Ꞓi) and P(₵i) is balanced and unbiased.This symmetric formulation acknowledges that ϒ is not symmetric by nature, thereby providing an accurate and comprehensive evaluation of the dissimilarity between the two probability distributions.Incorporating this symmetric ϒ into the Jensen-Shannon divergence calculation demonstrates a mathematically rigorous and well-founded approach to comparing probability distributions in genomics research.This nuanced understanding and application of ϒ highlight the method's technical robustness, ensuring precise measurement of distributional differences and enhancing the reliability of the research outcomes.
In the context of gene expression profiles, JSD effectively distinguishes between cancerous and non-cancerous samples, with lower values indicating similarity and higher values highlighting significant differences.

Intelligent Computation
To learn complex patterns and relationships from each of the ( ) ∈  , we applied a DL transformer-based process, namely PCA-transformer [29][30][31].The model comprises of three phases: embedding layer, self-learning transformer, and output layer.
Embedding Process (Ẽ): The extracted  ∈ || × will be passed through an embedding layer to convert the numerical values into dense embeddings.This layer allows the model to learn meaningful representations of the PCs.Thus, the output of the embedding layer is obtained by matrix multiplication and is expressed as where E is the embedding matrix with embedding dimensions.
. KL divergence measures how P( Diagnostics 2023, 13, x FOR PEER REVIEW 10 of 20 Initially, for each g , the probability distribution is computed as follows: where (Ꞓ ) and (₵ ) represent the probability distribution of gene expression profiles for cancerous and non-cancerous samples, respectively.∑ Ꞓ and ∑ ₵ represent the count of occurrences of gene expression values for g in the respective sample group (n).
To estimate the JSD between the two distributions, it is necessary to compute the average distribution ƥ of (Ꞓ ) and (₵ ) for each g .
Then, the Jensen-Shannon divergence is determined as follows: where ϒ(X||Y)↦ϒ[P(Ꞓi)||P(₵i)] denotes the Kullback-Leibler (ϒ) divergence, which measures how one probability distribution diverges from a second, expected probability distribution.
This provides a symmetric measure of the similarity between the two probability distributions, P(Ꞓi) and P(₵i).The result is a value ranging from 0 to 1, where 0 indicates that the two distributions are identical, and 1 indicates that the two distributions do not overlap at all.
The estimation of ϒ, as described in the provided method, involves comparing two probability distributions, P(Ꞓi) and P(₵i), with an expected distribution ƥ  .KL divergence measures how P(Ꞓi) and P(₵i) deviate from the expected distribution ƥ  .Specifically, the method uses the symmetric form of ϒ, denoted as ϒ[P(Ꞓi)||P(₵i)], which ensures that the divergence between P(Ꞓi) and P(₵i) is balanced and unbiased.This symmetric formulation acknowledges that ϒ is not symmetric by nature, thereby providing an accurate and comprehensive evaluation of the dissimilarity between the two probability distributions.Incorporating this symmetric ϒ into the Jensen-Shannon divergence calculation demonstrates a mathematically rigorous and well-founded approach to comparing probability distributions in genomics research.This nuanced understanding and application of ϒ highlight the method's technical robustness, ensuring precise measurement of distributional differences and enhancing the reliability of the research outcomes.
In the context of gene expression profiles, JSD effectively distinguishes between cancerous and non-cancerous samples, with lower values indicating similarity and higher values highlighting significant differences.Then, the Jensen-Shannon divergence is determined as follows:
This provides a symmetric measure of the similarity between the two probability distributions, P(Ꞓi) and P(₵i).The result is a value ranging from 0 to 1, where 0 indicates that the two distributions are identical, and 1 indicates that the two distributions do not overlap at all.
The estimation of ϒ, as described in the provided method, involves comparing two probability distributions, P(Ꞓi) and P(₵i), with an expected distribution ƥ  .KL divergence measures how P(Ꞓi) and P(₵i) deviate from the expected distribution ƥ  .Specifically, the method uses the symmetric form of ϒ, denoted as ϒ[P(Ꞓi)||P(₵i)], which ensures that the divergence between P(Ꞓi) and P(₵i) is balanced and unbiased.This symmetric formulation acknowledges that ϒ is not symmetric by nature, thereby providing an accurate and comprehensive evaluation of the dissimilarity between the two probability distributions.Incorporating this symmetric ϒ into the Jensen-Shannon divergence calculation demonstrates a mathematically rigorous and well-founded approach to comparing probability distributions in genomics research.This nuanced understanding and application of ϒ highlight the method's technical robustness, ensuring precise measurement of distributional differences and enhancing the reliability of the research outcomes.
In the context of gene expression profiles, JSD effectively distinguishes between cancerous and non-cancerous samples, with lower values indicating similarity and higher values highlighting significant differences.

Intelligent Computation
To learn complex patterns and relationships from each of the ( ) ∈  , we applied a DL transformer-based process, namely PCA-transformer [29][30][31].The model comprises of three phases: embedding layer, self-learning transformer, and output layer.
Embedding Process (Ẽ): The extracted  ∈ || × will be passed through an embedding layer to convert the numerical values into dense embeddings.This layer allows the model to learn meaningful representations of the PCs.Thus, the output of the embedding layer is obtained by matrix multiplication and is expressed as where E is the embedding matrix with embedding dimensions.Self-Supervised Transformer: Let TL be the number of transformer layers in the ) deviate from the expected distribution where (Ꞓ ) and (₵ ) represent the probability distribution of gene expression profiles for cancerous and non-cancerous samples, respectively.∑ Ꞓ and ∑ ₵ represent the count of occurrences of gene expression values for g in the respective sample group (n).
To estimate the JSD between the two distributions, it is necessary to compute the average distribution ƥ of (Ꞓ ) and (₵ ) for each g .
Then, the Jensen-Shannon divergence is determined as follows: where ϒ(X||Y)↦ϒ[P(Ꞓi)||P(₵i)] denotes the Kullback-Leibler (ϒ) divergence, which measures how one probability distribution diverges from a second, expected probability distribution.
This provides a symmetric measure of the similarity between the two probability distributions, P(Ꞓi) and P(₵i).The result is a value ranging from 0 to 1, where 0 indicates that the two distributions are identical, and 1 indicates that the two distributions do not overlap at all.
The estimation of ϒ, as described in the provided method, involves comparing two probability distributions, P(Ꞓi) and P(₵i), with an expected distribution ƥ  .KL divergence measures how P(Ꞓi) and P(₵i) deviate from the expected distribution ƥ  .Specifically, the method uses the symmetric form of ϒ, denoted as ϒ[P(Ꞓi)||P(₵i)], which ensures that the divergence between P(Ꞓi) and P(₵i) is balanced and unbiased.This symmetric formulation acknowledges that ϒ is not symmetric by nature, thereby providing an accurate and comprehensive evaluation of the dissimilarity between the two probability distributions.Incorporating this symmetric ϒ into the Jensen-Shannon divergence calculation demonstrates a mathematically rigorous and well-founded approach to comparing probability distributions in genomics research.This nuanced understanding and application of ϒ highlight the method's technical robustness, ensuring precise measurement of distributional differences and enhancing the reliability of the research outcomes.
In the context of gene expression profiles, JSD effectively distinguishes between cancerous and non-cancerous samples, with lower values indicating similarity and higher values highlighting significant differences.
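The computation above can be sketched numerically as follows (a minimal NumPy illustration; the toy three-bin histograms stand in for the per-gene distributions, and base-2 logarithms keep the result in [0, 1]):

```python
import numpy as np

def kl_divergence(x, y):
    # Kullback-Leibler divergence with log base 2; terms with x = 0 contribute 0
    mask = x > 0
    return float(np.sum(x[mask] * np.log2(x[mask] / y[mask])))

def jsd(p_cancer, p_normal):
    # Jensen-Shannon divergence: average each distribution's KL against the mixture
    m = 0.5 * (p_cancer + p_normal)
    return 0.5 * kl_divergence(p_cancer, m) + 0.5 * kl_divergence(p_normal, m)

# Toy expression-value histograms for one gene (normalized counts)
p = np.array([0.70, 0.20, 0.10])   # cancerous samples
q = np.array([0.10, 0.30, 0.60])   # non-cancerous samples

d = jsd(p, q)   # strictly between 0 (identical) and 1 (disjoint support)
```

Identical inputs give exactly 0, and the measure is symmetric in its two arguments, matching the interpretation in the text.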

Intelligent Computation
To learn complex patterns and relationships from each of the (g_i) ∈ D_I, we applied a DL transformer-based process, namely the PCA-transformer [29][30][31]. The model comprises three phases: an embedding layer, a self-learning transformer, and an output layer.
Embedding Process (Ẽ): The extracted F_i ∈ M^(n×p) is passed through an embedding layer that converts the numerical values into dense embeddings, allowing the model to learn meaningful representations of the PCs. The output of the embedding layer is obtained by matrix multiplication and is expressed as

Ẽ_i = F_i × E

where E is the embedding matrix with embedding dimensions.
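The embedding step reduces to a single matrix product; a toy NumPy sketch follows (the dimensions n, p, and the embedding width d are illustrative, and E is randomly initialized here rather than learned):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, d = 4, 3, 8                 # samples, selected PCs, embedding width
F = rng.normal(size=(n, p))       # extracted PC matrix F_i
E = rng.normal(size=(p, d))       # embedding matrix E (a learnable parameter in training)
embedded = F @ E                  # dense embeddings via matrix multiplication
```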
Self-Supervised Transformer: Let T_L be the number of transformer layers in the model. Each transformer layer consists of self-attention and feedforward neural network sub-layers.
For each layer l (l = 1 to T_L), the self-attention mechanism computes the attention weights A_w and the attention output O_w. Let I_(l−1) be the input embeddings for the (l−1)th layer, and O_l the output embeddings for the lth layer. Thus, the self-attention computation is expressed as

O_w = softmax[ (I_(l−1) W_q)(I_(l−1) W_k)ᵀ / √d_k ] × (I_(l−1) W_v)    (9)

where W_q, W_k, and W_v are learnable weight matrices for the query, key, and value projections, respectively, and d_k is the dimension of the keys.
Feedforward Neural Network (FFNN): The FFNN consists of two linear transformations with a ReLU activation function in between [32]. Thus, the outcome is computed as

O_F^l = ReLU(O_w W_1^l + e_1^l) W_2^l + e_2^l

where W^l and e^l are learnable weight matrices and bias terms, respectively. The output of the transformer layer is obtained by applying a residual connection and layer normalization:

Z_l = norm(O_F^l + I_(l−1))
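The single-head self-attention computation can be sketched in NumPy as follows (the weight matrices are randomly initialized stand-ins for the learnable W_q, W_k, W_v, and the dimensions are toy values):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(I, Wq, Wk, Wv):
    # Project the layer input I into queries, keys, and values
    Q, K, V = I @ Wq, I @ Wk, I @ Wv
    dk = K.shape[-1]
    Aw = softmax(Q @ K.T / np.sqrt(dk))   # attention weights A_w (rows sum to 1)
    Ow = Aw @ V                           # attention output O_w
    return Aw, Ow

rng = np.random.default_rng(0)
n, d = 5, 8                               # token positions and model width
I = rng.normal(size=(n, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
Aw, Ow = self_attention(I, Wq, Wk, Wv)
```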
After the self-supervised transformer [33] encoder, the output embeddings Z_l are used for downstream tasks, such as cancer detection, using a supervised learning approach. Let T_y be the target label for the cancer detection task, with dimensions (n × c), where c indicates the number of cancer types. Similarly, W_Ty and e_Ty are the weight matrix and bias term for the downstream classification task, respectively. Thus, the final predictive (վ) analytics is determined as

վ = softmax(Z_l × W_Ty + e_Ty)    (12)

The self-supervised loss encourages the model to learn meaningful representations from the extracted PCs. Depending on the chosen self-supervised task, this can be a contrastive or a reconstruction loss; let L_ss denote the self-supervised loss term. We use a supervised loss, such as cross-entropy loss, to train the model on labeled data for the downstream task; let L_s denote the supervised loss term. Thus, the overall loss (⅄) is a combination of L_ss and L_s, weighted by their respective hyperparameters λ_Lss and λ_Ls:

⅄ = λ_Lss × L_ss + λ_Ls × L_s    (13)
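The combined objective can be sketched as follows (a minimal NumPy illustration; the probability table, labels, and λ weights are toy values, and the self-supervised term is stubbed with a constant rather than an actual contrastive or reconstruction loss):

```python
import numpy as np

def cross_entropy(probs, labels):
    # Supervised loss L_s over predicted class probabilities (softmax outputs)
    return float(-np.mean(np.log(probs[np.arange(len(labels)), labels])))

def overall_loss(l_ss, l_s, lam_ss, lam_s):
    # Weighted combination of the self-supervised and supervised terms
    return lam_ss * l_ss + lam_s * l_s

probs = np.array([[0.9, 0.1],    # softmax outputs for 2 samples, 2 classes
                  [0.2, 0.8]])
labels = np.array([0, 1])
l_s = cross_entropy(probs, labels)
total = overall_loss(l_ss=0.5, l_s=l_s, lam_ss=0.3, lam_s=0.7)
```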
Density Estimation: For each point on the grid, compute the density of the n samples in D_I that correspond to the cancer signature region in the reduced feature space. For this, we utilized KDE (kernel density estimation), which is stated as

KDE(a, b) = (1/n) ∑ᵢ f_k((a, b) − F_i)

where (a, b) denotes a point on the grid, (F_1, F_2, ⋯, F_n) are the values of the selected PCs for each sample, and f_k is the kernel function. Figure 3 represents the visualization of the decision boundaries and decision regions of the DL model. This visualization can aid in understanding how the model separates cancerous and non-cancerous samples in the high-dimensional feature space.

The attention mechanism is pivotal in the described DL process, particularly in the self-learning transformer component. This architecture uses attention to capture complex patterns and relationships within the high-dimensional input data represented by the selected principal components (PCs) after PCA dimensional reduction. As defined in Equations (11) and (12), the attention mechanism enables the model to focus on specific parts of the input embeddings and learn the relevant features crucial for downstream tasks, such as cancer detection. By computing attention weights and output embeddings iteratively through self-attention, the model can effectively capture intricate relationships among the input features. This process is vital for understanding how the transformed data, in the form of PCs, are leveraged by the DL model, ensuring that the model can discern meaningful patterns even in the reduced feature space.
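A minimal sketch of the grid-density computation, assuming a 2D reduced space and an isotropic Gaussian kernel with bandwidth h (the kernel choice, bandwidth, and sample coordinates are illustrative assumptions, not the paper's settings):

```python
import numpy as np

def gaussian_kernel(d2):
    # Isotropic 2D Gaussian kernel evaluated on squared distances
    return np.exp(-0.5 * d2) / (2.0 * np.pi)

def kde_grid(points, grid_a, grid_b, h=1.0):
    # Density at each grid point (a, b) from the n selected-PC coordinates F_i
    A, B = np.meshgrid(grid_a, grid_b)
    density = np.zeros_like(A)
    for fa, fb in points:
        d2 = ((A - fa) ** 2 + (B - fb) ** 2) / h ** 2
        density += gaussian_kernel(d2)
    return density / (len(points) * h ** 2)

pts = [(0.0, 0.0), (0.1, -0.1), (2.0, 2.0)]   # toy PC coordinates
grid = np.linspace(-3, 3, 61)
dens = kde_grid(pts, grid, grid, h=0.5)
```

The resulting density surface is what the contour plots visualize: higher values mark regions where samples cluster in the reduced feature space.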
Regarding the transition from the original image data to the final X_Train and y_train datasets, the description provides a clear pathway. First, PCA dimensional reduction is applied to the original high-dimensional data, retaining only the top 'K' principal components that capture the most significant variation. These selected PCs form the basis of the subsequent DL feature extraction process. As part of the self-learning transformer, the attention mechanism ensures that the model effectively learns from these PCs, even though the dimensionality has been reduced. The model is trained using labeled data (the target labels for the cancer detection task), represented as y_train, while the input features, X_Train, comprise the transformed data obtained after the self-learning transformer's processing.

Empirical Layout
The deep learning model was developed in an experimental setting on an Intel Core i7-13620H CPU clocked at 1.8 GHz. Ubuntu 20.04 LTS was chosen as the operating system because it offers a reliable and well-supported environment for ML projects. Python 3.7 or later is the primary programming language, with PyTorch 1.7.1 as the deep learning framework of choice. The environment includes popular data processing and scientific computing libraries, including NumPy, Pandas, and Scikit-Learn. CUDA and cuDNN versions compatible with the installed PyTorch release were used to run computations on NVIDIA GPUs.
Table 1 represents the hyperparameters configured in the proposed deep learning model for training purposes.


Outcome Analysis
The results of the proposed model are comparatively assessed with some relevant existing approaches discussed in Section 2. A few key performance metrics are included in this section for precise analysis.
In the research context, a contour visualization [33] would use contour lines (or color-coded regions) to indicate areas in an n-dimensional feature space where cancer-related gene expression signatures are more concentrated. The contour lines (or regions) represent areas with a high concentration of cancer-related signatures. In a color-coded contour plot, warmer colors like red or orange represent high-concentration regions, while cooler colors like blue or green represent low-concentration regions. Figure 3 presents a series of six contour plots labeled from sample (a) to (f), which depict the spatial distribution of cancer-related gene expression in a reduced feature space. As we traverse from sample (a) to (f), there is a noticeable gradation in the intensity of cancer-related gene signatures. Sample (b), for instance, displays minimal areas of heightened gene expression, signifying a scant presence of cancer-associated markers.
In contrast, sample (c) unveils expansive zones of intensified expression, signaling a robust concentration of cancer-specific signatures. Such visualizations furnish a tangible representation of the gene expression landscape by mapping the gene expression onto a 2D plane using contour lines. This provides an intuitive understanding of the data's structure and offers critical insights into the relative abundance and clustering of specific cancer-related genetic markers within the compressed feature domain.
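The density surface underlying such contour plots can be sketched as follows. This is a toy NumPy example with hypothetical 2D points standing in for samples in the reduced feature space; the resulting Z grid is what one would hand to a contour-plotting routine such as matplotlib's `contourf(X, Y, Z)`.

```python
import numpy as np

def density_grid(points, grid_size=80, bandwidth=0.4):
    """Gaussian-kernel density of 2D points on a regular grid.
    High values of Z mark regions where points (here, samples carrying a
    cancer-related signature) are concentrated."""
    xs = np.linspace(points[:, 0].min() - 1, points[:, 0].max() + 1, grid_size)
    ys = np.linspace(points[:, 1].min() - 1, points[:, 1].max() + 1, grid_size)
    X, Y = np.meshgrid(xs, ys)
    Z = np.zeros_like(X)
    for px, py in points:                      # accumulate one kernel per point
        Z += np.exp(-((X - px) ** 2 + (Y - py) ** 2) / (2 * bandwidth ** 2))
    return X, Y, Z / len(points)

# Toy 2D feature space: a tight cluster of "signature-positive" samples near (1, 1).
rng = np.random.default_rng(2)
pts = rng.normal(loc=[1.0, 1.0], scale=0.3, size=(150, 2))
X, Y, Z = density_grid(pts)
# Plotting (outside this sketch): plt.contourf(X, Y, Z, cmap="RdYlBu_r")
```

A warm-to-cool colormap then reproduces the convention described above: the density peak at the cluster renders in red/orange, the sparse periphery in blue/green.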

From Figure 4, the accuracy rates of 96.5% for lung cancer, 94.5% for brain cancer, 93.5% for prostate cancer, and 95.5% for CNS embryonal cancer are all relatively high, suggesting that the integrated approach is successful in detecting these types of cancer using gene expression data. Firstly, the EBS preprocessing of the gene expression data may have helped to address issues of variability and noise, enhancing the quality and reliability of the gene expression measurements and making them more amenable to further analysis. Secondly, the use of JSD to measure the distributional differences between cancerous and non-cancerous samples provides a robust way to identify crucial genomic signatures associated with each type of cancer. This information-theoretic measure quantifies differences in gene expression patterns, potentially aiding in capturing unique disease signatures.
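A minimal sketch of this per-gene JSD screening is given below (the data, binning choices, and sample sizes are illustrative, not the paper's): the expression values of one gene are binned separately for cancerous and non-cancerous samples, and the divergence between the two histograms scores how informative the gene is.

```python
import numpy as np

def jsd(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two discrete distributions.
    Base-2 logarithms bound the result to [0, 1]; it is symmetric in p and q."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)                      # mixture distribution
    kl = lambda a, b: float(np.sum(a * np.log2(a / b)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Toy expression values for one gene: cancerous samples shifted upward.
rng = np.random.default_rng(1)
cancer = rng.normal(2.0, 1.0, size=200)
normal = rng.normal(0.0, 1.0, size=200)

# Shared bins so both histograms describe the same support.
bins = np.histogram_bin_edges(np.concatenate([cancer, normal]), bins=20)
p, _ = np.histogram(cancer, bins=bins)
q, _ = np.histogram(normal, bins=bins)
score = jsd(p, q)                          # higher score -> more discriminative gene
```

Repeating this over all genes and ranking by the score is one way such distributional screening can prioritize candidate signature genes before feature extraction.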
Thirdly, implementing a DL transformer-based process allows for the automatic extraction of deep features and the ability to learn complex patterns from the data. Deep learning has proven to be very effective in tasks involving high-dimensional data, such as gene expression profiles, and can potentially uncover complex molecular relationships and identify critical features for cancer detection. Lastly, the use of contour mathematics for visualization provides an intuitive way to understand the decision boundaries and regions in the high-dimensional feature space. It enhances the interpretability of the model, providing a visual representation of where cancer-related signatures are more concentrated in the feature space. Therefore, considering the complexity and high dimensionality of gene expression data, achieving accuracy rates of over 93% for all types of cancer studied is a strong endorsement of the proposed integrated approach.
The impressive accuracy rates achieved for detecting various types of cancer underscore the effectiveness of the integrated approach. This is due to the inclusion of the EBH technique in preprocessing, which ensures the removal of extraneous noise and variance from the gene expression data, leading to consistent and reliable measurements. This foundation is crucial, as cleaner data often correlate with enhanced predictive performance. The JSD also introduces a rigorous mathematical framework to differentiate between cancerous and non-cancerous gene expression profiles, ensuring that the most pivotal genomic markers are emphasized. DL, especially transformer architectures, delves into the intricacies of high-dimensional data, autonomously pinpointing and deciphering multi-layered patterns that might elude traditional methods. When visualized using contour mathematics, these patterns furnish a lucid, graphical delineation of the decision-making process, revealing zones of high cancer signature concentration and providing clinicians and researchers with actionable insights into the underlying molecular dynamics.
Performances of a few recent and relevant methodologies (MLP, DEGnext, BPSO-DT+CNN) are compared with the outcomes of the proposed model. Figure 5 showcases the performance of different models on the same dataset as evaluated by the Area Under the Receiver Operating Characteristic (AUC-ROC) curve measure [32]. This measure illustrates the ability of a model to differentiate between classes, in this case different types of cancers, with a value of 1 indicating perfect classification and a value of 0.5 equivalent to random guessing.
The Multilayer Perceptron (MLP) model delivered an AUC-ROC score of 0.8901. This suggests that the MLP model has a good ability to distinguish between cancer types, showing its effectiveness in this classification task. Next, the DEGnext model achieved an AUC-ROC of 0.9021. This impressive score demonstrates an excellent classification performance, superior to the MLP model, indicating that the DEGnext model has a slightly higher capacity to distinguish between the classes. The BPSO-DT+CNN model achieved a remarkable AUC-ROC of 0.9133. This score shows that the BPSO-DT+CNN model could differentiate between cancer types with even greater accuracy than both the MLP and DEGnext models. The proposed model, however, achieved an outstanding AUC-ROC score of 0.9411, the highest among all the evaluated models. This exceptional performance shows that the proposed model not only outperformed the other models but also has a high discriminative power, making it highly efficient and accurate in classifying different types of cancer.
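For reference, AUC-ROC can be computed directly from its rank-statistic definition: the probability that a randomly chosen positive sample is scored above a randomly chosen negative one. The sketch below is a generic implementation, not the paper's evaluation code.

```python
import numpy as np

def auc_roc(y_true, scores):
    """AUC-ROC via the Mann-Whitney U statistic.
    Counts, over all positive/negative pairs, how often the positive
    outranks the negative (ties count as half a win)."""
    y_true, scores = np.asarray(y_true), np.asarray(scores)
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

# Tiny example: 3 of the 4 positive/negative pairs are ranked correctly.
auc = auc_roc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])   # 0.75
```

This pairwise formulation makes the scale in the figure concrete: a score of 0.5 corresponds to random ranking and 1.0 to a model that ranks every positive above every negative.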
The insights drawn from Figure 5 have significant implications for cancer classification using gene expression data. The progressive increase in AUC-ROC scores from MLP to the proposed model underscores the continuous advancements in machine learning and data processing techniques tailored for genomic data. The dominant outcome of the proposed model, with its peak AUC-ROC score of 0.9411, suggests that the integration of advanced techniques, possibly coupled with superior feature engineering or extraction methodologies, can provide unparalleled precision in distinguishing between different cancer types. This precision is invaluable in clinical settings, as it can guide diagnosis, treatment decisions, and prognostic evaluations. Furthermore, the evident gap between traditional models like MLP and the proposed model emphasizes the need for continuous research and adaptation in the rapidly evolving domain of genomic data analysis. In essence, the insights highlight the paramount importance of leveraging cutting-edge techniques to achieve optimal accuracy, ultimately benefiting patient care and advancing our understanding of cancer biology.
Figure 6 shows the performance of different models on a precision-recall curve, a widely used metric in machine learning to evaluate model performance, especially in the case of imbalanced datasets. A higher area under the precision-recall curve (AUC-PR) indicates a more accurate model. Starting with the MLP (Multilayer Perceptron), it has an AUC-PR score of 0.8234. This shows a reasonably good performance in balancing both precision and recall, thus making it a reliable model for predicting cancer classes from gene expressions. The DEGnext model further improves the precision-recall trade-off, scoring 0.8911 on the AUC-PR. This means it can correctly identify more true positives while minimizing the false positives, hence being more precise and trustworthy for the same task. The BPSO-DT+CNN model, with an AUC-PR of 0.8709, also exhibits a strong performance. Despite its slightly lower score than DEGnext, it still has a commendable ability to classify the cancer types correctly while minimizing errors, making it a potential choice for such diagnostic tasks. Finally, the proposed model outperforms all the previous models with an impressive AUC-PR score of 0.9123. This clearly indicates its superior ability to maintain high precision and recall simultaneously, thus making it the most reliable model among the four for this specific task. It effectively minimizes prediction errors, offering significant promise for practical applications in diagnosing cancer types using gene expression data.
The insights from Figure 6 carry significant implications for the realm of cancer diagnosis using gene expression data. The varying scores among the models highlight the importance of choosing a suitable algorithm, especially in scenarios with imbalanced datasets. While MLP offers a foundational approach, advanced models like DEGnext and BPSO-DT+CNN demonstrate the potential of specialized algorithms to enhance diagnostic accuracy. Most importantly, the superior performance of the proposed model underscores the value of continuous research and innovation. For healthcare professionals and researchers, this suggests that leveraging the most advanced and tailored models can lead to more precise diagnoses, potentially improving patient outcomes and guiding targeted therapeutic interventions. In essence, the right choice of model can significantly impact the accuracy and reliability of cancer type predictions, driving better clinical decisions.
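The area under a precision-recall curve is commonly summarized by average precision, i.e., the precision evaluated at the rank of each true positive, averaged over all positives. The sketch below is a generic implementation of that summary, not the paper's evaluation code.

```python
import numpy as np

def average_precision(y_true, scores):
    """Average precision over a ranked list (a standard AUC-PR summary).
    Sorts by descending score, takes the precision at each position where
    a true positive occurs, and averages those precisions."""
    order = np.argsort(-np.asarray(scores, dtype=float))
    y = np.asarray(y_true)[order]
    hits = np.cumsum(y)                      # true positives seen so far
    ranks = np.arange(1, len(y) + 1)
    return float(np.sum((hits / ranks) * y) / y.sum())

# Tiny example: positives at ranks 1 and 3 give precisions 1 and 2/3.
ap = average_precision([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])   # 5/6
```

Unlike AUC-ROC, this measure ignores true negatives entirely, which is why it is the preferred summary when the classes are imbalanced, as noted above.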
The loss values in Figure 7 indicate how well each model's predictions align with the actual data. A lower loss value implies better model performance, as the predictions closely match the actual data.
Figure 7 shows that various models exhibit different loss trajectories when applied to gene expression data for cancer classification. The MLP model, possibly hovering around a loss value of approximately 0.25, showcases its competence, though there is evident room for refinement. DEGnext, with an inferred loss value nearing 0.15, outperforms MLP, highlighting its superior alignment with the actual dataset values. The BPSO-DT+CNN model, potentially registering a loss close to 0.18, while commendable, lags slightly behind DEGnext. Most notably, the proposed model, estimated at an impressive loss value of around 0.10, underscores its unmatched predictive prowess, making it the standout performer in this comparative analysis.
The MLP model has a moderate loss, which suggests that while it is performing adequately, there may be room for improvement in aligning its predictions more accurately with the actual data. The DEGnext model demonstrates an improvement over the MLP model, indicating that its predictions are more in sync with the real data. The BPSO-DT+CNN model has a slightly higher loss, which suggests that, although it is providing valuable predictions, there is potential to reduce this error margin and bring it more in line with the actual data. Finally, the proposed model shows the lowest loss. This indicates superior performance over the other models, as it aligns more closely with the actual data, making it the most accurate model in this selection. This is a very encouraging result, suggesting that the proposed model could be a highly effective tool for predicting future data based on the patterns it has learned from the training data. The result lays a solid foundation for applying and improving this model in future work.
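For context, classification losses of this kind are typically computed as a cross-entropy between the predicted probabilities and the true labels. The binary form is sketched below as a minimal illustration; the paper does not state which loss function it trains with, so this is an assumption about the general metric, not its training code.

```python
import numpy as np

def bce_loss(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy between labels and predicted probabilities.
    Probabilities are clipped away from 0 and 1 to avoid log(0); lower
    values mean predictions that sit closer to the actual labels."""
    y_true = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(y_prob, dtype=float), eps, 1 - eps)
    return float(-np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))

# A maximally uncertain prediction for a positive label costs ln(2) ~ 0.693;
# confident correct predictions drive the loss toward the low values in Figure 7.
uncertain = bce_loss([1], [0.5])
confident = bce_loss([1, 0], [0.9, 0.1])
```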
The practical implementation of our interdisciplinary approach in real-world clinical settings necessitates a thorough evaluation of its feasibility within the constraints of current healthcare infrastructure. It is imperative to assess its integration with existing diagnostic tools, the training required for medical professionals, and its cost-effectiveness. Moreover, regulatory considerations are pivotal, as any novel diagnostic modality must conform to stringent safety, accuracy, and reproducibility standards. As future research, we plan to delve into pilot studies within clinical environments to understand these dynamics while liaising with regulatory bodies to ensure that the method meets the benchmarks for clinical adoption.

Conclusions
Based on a comprehensive exploration and evaluation of various methods, the diagnostic-integrated approach combining Empirical Bayes Harmonization (EBS), Jensen-Shannon Divergence (JSD), deep learning, and contour mathematics proves to be highly effective for cancer detection utilizing gene expression data. The EBS preprocessing optimizes the data quality, setting a solid foundation for accurate diagnostics. JSD plays a pivotal role in distinguishing between cancerous and non-cancerous samples. The deep learning model's prowess in extracting intricate features and mastering sophisticated patterns from the data is paramount in achieving commendable accuracy. Moreover, contour mathematics offers a robust tool for visualizing decision boundaries in the intricate, high-dimensional feature space.

Figure 1. Gene expressions of various cancer types. (a) Lung cancer gene expression; (b) Brain cancer gene expression; (c) CNS embryonal cancer gene expression; (d) Prostate cancer gene expression.


Figure 2. Architecture of the proposed model.


Figure 3 represents the visualization of the decision boundaries and decision regions of the DL model. This visualization can aid in understanding how the model separates cancerous and non-cancerous samples in the high-dimensional feature space.

Figure 3. Sample contour visualization representing the presence of cancer signatures (samples (a)-(f)). High-expression regions indicate areas in the reduced feature space where cancer-related signatures are more concentrated.


Figure 4. Confusion matrix of various cancer types.


Table 1. Vital parameters of the model.
