Article

PriorCCI: Interpretable Deep Learning Framework for Identifying Key Ligand–Receptor Interactions Between Specific Cell Types from Single-Cell Transcriptomes

by Hanbyeol Kim 1, Eunyoung Choi 1, Yujeong Shim 1 and Joonha Kwon 1,2,*
1 Bioinformatics Branch, National Cancer Center, Goyang 10408, Republic of Korea
2 Department of Public Health & AI, Graduate School of Cancer Science and Policy, National Cancer Center, Goyang 10408, Republic of Korea
* Author to whom correspondence should be addressed.
Int. J. Mol. Sci. 2025, 26(15), 7110; https://doi.org/10.3390/ijms26157110
Submission received: 18 June 2025 / Revised: 16 July 2025 / Accepted: 22 July 2025 / Published: 23 July 2025
(This article belongs to the Special Issue New Insights in Translational Bioinformatics: Second Edition)

Abstract

Understanding the interactions between specific cell types within tissue environments is essential for elucidating key biological processes, such as immune responses, cancer progression, inflammation, and development, in both physiological and pathological studies. The predominant methods for analyzing cell–cell interactions (CCI) rely primarily on statistical inference using mapping or network-based techniques. However, these approaches often struggle to prioritize meaningful interactions owing to the high sparsity and heterogeneity inherent in single-cell RNA sequencing (scRNA-seq) data, where small but biologically important differences can be easily overlooked. To overcome these limitations, we developed PriorCCI, a deep-learning framework that leverages a convolutional neural network (CNN) alongside Grad-CAM++, an explainable artificial intelligence algorithm. This study aims to provide a scalable, interpretable, and biologically meaningful framework for systematically identifying and prioritizing key ligand–receptor interactions between defined cell-type pairs from single-cell RNA-seq data, particularly in complex environments such as tumors. PriorCCI effectively prioritizes interactions between cancer and other cell types within the tumor microenvironment and accurately identifies biologically significant interactions related to angiogenesis. By providing a visual interpretation of gene-pair contributions, our approach enables robust inference of gene–gene interactions across distinct cell types from scRNA-seq data.

1. Introduction

Single-cell RNA sequencing (scRNA-seq) has revolutionized the study of cellular interactions by enabling high-resolution analysis at the cellular level. It allows researchers to examine co-expression patterns between cell types and gain insights into the functional roles of specific gene–gene interactions [1,2,3]. By comparing expression profiles across defined cell type groups, scRNA-seq data reveal how cells communicate and coordinate functions within tissue environments, contributing to our understanding of tissue organization and disease mechanisms [4,5].
A wide range of computational tools has been developed to infer and analyze cell–cell communication using single-cell transcriptomic data [6]. These tools adopt diverse algorithms, statistical frameworks, and assumptions to reconstruct intercellular interactions, predict signaling pathways, and identify key mediators of communication [7]. Many studies utilize ligand–receptor expression patterns to infer potential interactions, offering complementary perspectives that, when integrated, provide a more comprehensive understanding of signaling networks in complex biological systems [3,8].
Among these tools, CellChat is a well-established platform that leverages ligand–receptor interaction databases and scRNA-seq data to infer signaling activities and estimate both the strength and directionality of communication pathways [7,9,10]. CellPhoneDB, another widely used resource, combines a curated ligand–receptor database with statistical testing to predict significant associations between cell types [11]. NicheNet takes a receptor-centric approach, incorporating regulatory networks of receptor-mediated pathways and transcription factors to predict how ligands from one cell population influence gene expression in another [12]. ICELLNET employs a unique strategy of integrating protein–protein interaction data and signaling pathway information to reconstruct cell–cell interaction (CCI) networks, emphasizing intracellular cascades triggered by intercellular communication [13]. Together, these tools represent a rich and diverse methodological landscape, each using statistical, integrative, or knowledge-based strategies to decode complex cell–cell signaling patterns from scRNA-seq data [1,9,14]. Despite their sophistication, current methods often struggle to detect subtle yet biologically important interactions. This limitation is particularly critical in cancer research using tumor-derived scRNA-seq data, in which small differences in gene expression may be overlooked by methods that favor strong or global statistical trends.
Deep learning offers a promising way to address these problems. For instance, DeepCCI is a recently developed deep-learning-based tool for predicting CCIs from scRNA-seq data. It combines an autoencoder–graph convolutional network-based cell clustering module with an interaction prediction module that integrates residual neural networks and graph convolutional networks [15]. However, DeepCCI offers limited model interpretability, and its dependence on differentially expressed genes may introduce statistical bias, limiting its ability to selectively prioritize interactions specific to certain cell types.
To overcome these limitations, we developed PriorCCI, a deep learning framework designed to prioritize CCIs between specific pairs of cell types. The purpose of this study is to establish a scalable and interpretable deep learning approach that systematically identifies and prioritizes key ligand–receptor interactions between cancer cells and other cell types within the tumor microenvironment (TME) using scRNA-seq data. PriorCCI uses a CNN to integrate gene expression patterns from two cell-type groups and employs Grad-CAM++ [16] to visually interpret the trained CNN and quantify the contribution of each ligand–receptor pair for each pair of groups. We applied this method to two independent datasets of lung and colorectal cancers sourced from the Cancer Cell Atlas (CCA) [17,18]. Our analysis identified ligand–receptor pairs uniquely enriched in interactions between tumors and endothelial cells, which may serve as potential targets for angiogenesis-related studies. This approach enables the systematic, cell-type-specific prioritization of CCIs and provides a scalable framework with broad applications in both basic research and translational medicine.

2. Results

2.1. Input Data Preparation for PriorCCI and Presentation

To investigate the CCIs within the TME, we first constructed two-channel input samples suitable for deep learning analysis. For each pair of cell types (A and B), we randomly selected 100 representative cells and organized their gene expression values into two expression matrices of size 100 × 6296. The 6296 genes corresponded to 3148 ligand–receptor pairs, with group A sorted by ligand–receptor order and group B sorted by reverse (receptor–ligand) order. This mirrored structure allows the resulting data to be interpreted as two-channel image-like inputs with shapes of (100, 6296, 2), preserving the directional nature of the interactions. For each of the 21 pairwise cell type combinations (classes), 1000 samples were generated through random sampling, resulting in 21,000 input samples used to train the PriorCCI model (Figure 1).
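The input construction described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name, its arguments, and in particular the interleaved column layout (ligand, receptor, ligand, receptor, ... for group A and the reverse for group B) are assumptions made for the example.

```python
import numpy as np

def build_two_channel_sample(expr_a, expr_b, ligand_idx, receptor_idx,
                             n_cells=100, rng=None):
    """Build one (n_cells, 2 * n_pairs, 2) sample (hypothetical helper).

    expr_a, expr_b: (cells, genes) expression matrices for cell types A and B.
    ligand_idx, receptor_idx: gene-column indices, one entry per ligand-receptor pair.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Randomly pick n_cells cells from each cell type, without replacement.
    rows_a = rng.choice(expr_a.shape[0], size=n_cells, replace=False)
    rows_b = rng.choice(expr_b.shape[0], size=n_cells, replace=False)
    # Assumed layout: A columns run ligand, receptor, ...; B columns run receptor, ligand, ...
    order_a = np.stack([ligand_idx, receptor_idx], axis=1).ravel()
    order_b = np.stack([receptor_idx, ligand_idx], axis=1).ravel()
    chan_a = expr_a[rows_a][:, order_a]
    chan_b = expr_b[rows_b][:, order_b]
    # Stack the two cell types as channels of an image-like array.
    return np.stack([chan_a, chan_b], axis=-1)
```

With 3148 ligand–receptor pairs and `n_cells=100`, the returned array has shape (100, 6296, 2); calling the function 1000 times per cell-type pair would yield the 21,000 samples described in the text.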
To determine class-specific gene importance, we applied the Grad-CAM++ algorithm to the trained CNN model. For each class, the top 25% of the ligand–receptor pairs were selected based on the normalized contribution scores. To evaluate the reproducibility of the model outputs, training was repeated 10 times independently, and the similarity between Grad-CAM++ importance scores across models was assessed using cosine similarity and Spearman correlation.
This analysis was performed using scRNA-seq datasets from two epithelial-origin cancers, non-small cell lung cancer (NSCLC) and colorectal cancer (CRC). Both datasets included cancerous and adjacent normal tissues curated in an integrated atlas format. After preprocessing, 36,601 genes were analyzed in 482,351 NSCLC and 702,657 CRC cells. As described in the Materials and Methods section, cell types were annotated and visualized using UMAP, showing a clear separation among seven major groups: tumor, endothelial, fibroblast, myeloid, T/NK, B, and epithelial cells. All possible pairwise combinations (7C2 = 21) were considered in the analysis (Figure 2).
To reduce the computational burden and correct for cell type imbalance, we employed geometric sketching using endothelial cell counts (the lowest among cell types: NSCLC, 9990; CRC, 23,742) as the baseline. This resulted in balanced subsets consisting of 104,109 cells for NSCLC and 247,011 for CRC (Table 1). Representative cells from each type were evenly sampled to construct the input data for all pairwise interaction classes.

2.2. Performance Evaluation of the CNN Model

To quantitatively evaluate the classification performance, CNN models were trained for 21 cell-pair classes using the generated two-channel inputs. For each of the 10 training runs (initialized with different random seeds), the data were split into training, validation, and test sets in a 64:16:20 ratio. The model performance was assessed using accuracy, precision, recall, F1 score, and area under the ROC curve (AUC).
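As a worked example of the 64:16:20 ratio, holding out 20% of the samples for testing and then 20% of the remainder for validation gives 0.8 × 0.2 = 0.16 for validation and 0.64 for training. The sketch below assumes a plain shuffled split; whether the original split was stratified by class is not stated, and the function name is hypothetical.

```python
import numpy as np

def split_indices(n_samples, seed, test_frac=0.20, val_frac_of_rest=0.20):
    """Shuffle sample indices and split them into train/validation/test sets."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)
    n_test = int(round(n_samples * test_frac))
    n_val = int(round((n_samples - n_test) * val_frac_of_rest))
    test = perm[:n_test]
    val = perm[n_test:n_test + n_val]
    train = perm[n_test + n_val:]
    return train, val, test

train, val, test = split_indices(21_000, seed=0)
print(len(train), len(val), len(test))  # 13440 3360 4200
```

Changing `seed` for each of the 10 runs gives a different partition per run, which is one way to realize the seed-dependent repetition described above.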
As shown in Figure 3, both NSCLC and CRC datasets demonstrated steady decreases in training and validation loss and a gradual increase in validation accuracy, reaching ≥ 0.95 and indicating stable convergence without overfitting. These results demonstrate that the model consistently converges during training and achieves strong generalization performance.
As summarized in Table 2, the model accuracy for NSCLC ranged from 0.986 to 0.997, with other performance metrics (precision, recall, and F1 score) exceeding 0.98. For CRC, the accuracy ranged from 0.929 to 0.981, with other metrics exceeding 0.94. Both datasets exhibited low variability across runs and consistently strong performance. The average loss values were also low (0.0289 for NSCLC and 0.154 for CRC), highlighting the robustness and generalization capability of the PriorCCI framework. Notably, the macro-averaged AUC across all classes remained near 1.0 (range: 0.999–1.0), confirming the model’s excellent discriminative power.
The confusion matrices (Figure 3) showed high prediction accuracy across all classes. The ROC curves revealed a strong separation between classes, and standard deviations of the AUC across 10 runs remained below 0.02, further supporting the reproducibility of model performance. These results indicate that PriorCCI effectively learns CCI patterns from single-cell expression data and is capable of reliably distinguishing cell-type pairs within a complex TME.

2.3. Model Consistency Across Training Runs

To evaluate the reproducibility and reliability of PriorCCI, Grad-CAM++ importance results were compared across 10 independently trained models. Figure 4 shows the similarity metrics (cosine and Spearman correlations) calculated for the prioritized gene pairs of each model. The mean and standard deviation values for these similarities, excluding self-comparisons, are summarized in Table 3.
Strong consistency in gene rankings was observed among high-interaction cell type pairs, such as Tumor–Endothelial and Tumor–Myeloid, suggesting that PriorCCI consistently captures biologically meaningful interaction patterns. Conversely, the Tumor–Epithelial pair, whose cells share the same tissue origin, showed lower inter-model similarity across the 10 models, especially in CRC, as evidenced by the reduced cosine and Spearman correlation values shown in Figure 4 and Table 3. This may reflect weaker or less distinguishable interaction signatures for this cell-type pair, likely due to the biological similarity between the two cell types. These findings highlight the model's ability to reflect biological variability and to adaptively distinguish context-specific interactions across different cancer types.

2.4. Functional Validation of Prioritized Gene Pairs Using GSEA

To validate the biological relevance of the genes prioritized by the PriorCCI model, gene set enrichment analysis (GSEA) was performed using the corresponding gene sets [19]. In the NSCLC dataset, interactions between Tumor–Endothelial and Tumor–Myeloid, which are known to play major roles in the TME, were analyzed. In the CRC dataset, Tumor–Fibroblast and Tumor–Myeloid interactions were examined. The enrichment results for these interactions are illustrated in Figure 5, which presents the associated biological pathways identified through GSEA.
In NSCLC, the Tumor–Endothelial interaction showed significant enrichment of angiogenesis-related pathways, including ‘Angiogenesis’, ‘Vascular endothelial cell response’, and ‘Blood vessel morphogenesis’. These results demonstrate that the gene pairs prioritized by PriorCCI are closely related to angiogenesis-associated biological processes. In the Tumor–Myeloid interaction, enrichment was observed in immune-related pathways such as ‘IL-6/JAK/STAT3 Signaling’, ‘Cytokine–cytokine receptor interaction’, ‘Inflammatory Response’, and ‘Chemokine binding’. This indicates that the prioritized gene pairs are involved in the regulation of tumor immune responses.
In CRC, Tumor–Fibroblast interactions showed enrichment in ‘Epithelial Mesenchymal Transition (EMT)’, ‘ECM–receptor interaction’, ‘Collagen receptor activity’, and ‘Extracellular matrix organization’, all of which are representative of cancer-associated fibroblast activation pathways. Tumor–Myeloid interactions were enriched in pathways such as ‘Neutrophil extracellular trap formation’, ‘Macrophage activation’, and ‘Positive regulation of myeloid leukocyte-mediated immunity’.
These findings confirm that the interaction gene pairs derived from PriorCCI reflect key biological characteristics of the TME, rather than being the result of mere mathematical optimization.

2.5. Comparison with Existing CCI Analysis Tools on Gene Priorities

To validate the gene pair prioritization, PriorCCI was compared with five established CCI inference tools: CellPhoneDB, ICELLNET, CellChat, NicheNet, and DeepCCI. While most tools rely on statistical or expression-based inferences, DeepCCI uses a deep-learning-based approach. Comparisons focused on the Tumor–Endothelial cell pair in NSCLC and CRC, where PriorCCI showed strong intra-model consistency. The top 25% of gene pairs ranked by Grad-CAM++ (averaged over 10 model runs) intersected with the predictions from each tool. Gene pairs were further filtered using expressing cell fraction (ECF); only gene pairs in which each gene showed the maximal ECF within the relevant cell type were retained. In NSCLC, this process identified 83 validated interacting gene pairs, many of which were not predicted using existing methods.
Figure 6 shows the overlap between PriorCCI and the other tools. Gene pairs on the x-axis are ranked by their Grad-CAM++ importance scores, whereas the y-axis lists each tool. Tools such as CellPhoneDB and ICELLNET detected more overlapping pairs, whereas CellChat and NicheNet detected fewer. Notably, PriorCCI's top-ranked pairs were concentrated among biologically meaningful interactions, whereas traditional tools yielded a broader, less specific distribution.

2.6. Single-Cell Expression of Tumor-Endothelial Gene Pairs

To further validate the specificity of the prioritized gene pairs, ECF heat maps were generated for the top 30 gene pairs in the Tumor–Endothelial CCI (Figure 7a). This analysis highlighted gene pairs with highly distinct expression patterns between tumor and endothelial cells. For comparison, ECF distributions were plotted for the top 10 gene pairs from CellPhoneDB, ICELLNET, CellChat, NicheNet, and DeepCCI.
Notably, some gene pairs frequently prioritized by existing tools, such as APP–CD74, exhibited uniformly high expression across normal immune cell types, suggesting poor specificity for Tumor–Endothelial interactions (Table 4). This underscores the limitations of the previous methods for identifying context-specific CCI.
Figure 7b presents the aligned heat maps of the final prioritized gene pairs, showing the expression specificity for tumor and endothelial cells. Genes strongly expressed in tumor cells were predominantly oncogenes, whereas endothelial-expressed genes were enriched in angiogenesis-related pathways, demonstrating biological plausibility.
Specifically, ITGB3–VWF and ITGAV–VWF were consistently identified as high-priority pairs in the NSCLC dataset, exhibiting strong Grad-CAM++ importance and high ECF values. These gene pairs participate in angiogenic signaling and contribute to tumor growth within the TME [20].

3. Discussion

In this study, we propose PriorCCI, a deep-learning-based framework designed to prioritize cell-type-specific CCIs. The framework combines a CNN with Grad-CAM++, enabling the detection of subtle differences in gene expression between cell types, which are often overlooked by conventional statistical methods. We validated PriorCCI using datasets from patients with both NSCLC and CRC. Despite distinct differences in cellular composition and gene expression between these tumor types, PriorCCI demonstrated consistently high classification performance, achieving an average macro-AUC of ≥0.999. These results indicated the strong generalizability of the model across diverse TMEs.
In particular, key CCIs identified in NSCLC, especially the interactions between the tumor and endothelial cells, are strongly associated with angiogenesis-related signaling pathways. These ligand–receptor pairs may serve as promising candidates for the therapeutic targeting of tumor progression. Accordingly, PriorCCI not only enables precise prioritization of intercellular interactions but also facilitates biological interpretation and informs therapeutic strategy development.
Recent lung cancer studies have highlighted the heterogeneity of vascular subtypes within the TME, with growing interest in the role of tumor endothelial cells (TECs) [21]. TECs actively participate in tumor vascular remodeling, suppress immune cell infiltration, and impede drug delivery, thereby contributing to immune evasion and therapeutic resistance [22]. Given these roles, interactions between tumor cells and TECs are considered high-priority targets for therapeutic interventions.
A key strength of PriorCCI is its ability to provide explanations. By incorporating Grad-CAM++, our framework provides intuitive visual interpretations of why certain gene pairs are prioritized, thus improving transparency in deep-learning-based inference. The reproducibility of the top-ranked gene pairs across multiple runs further supported the robustness of the model. To enhance biological credibility, we applied ECF filtering to ensure that the prioritized genes were broadly expressed within the relevant cell populations. While conventional tools often rank gene pairs such as APP–CD74 highly, these genes are broadly expressed across tumor, endothelial, and immune cells [23], making them suboptimal targets. PriorCCI mitigates nonspecific predictions by integrating Grad-CAM++ prioritization with ECF-based filtering.
Although PriorCCI was validated using only lung and colon cancer datasets, its framework is applicable to a broad range of tissue and disease contexts, provided that appropriate scRNA-seq input formats are available. Its potential applications include autoimmune diseases, infectious diseases, and developmental biology.
However, this study has several limitations. First, the model relies solely on transcriptomic data; therefore, protein-level validation is necessary to confirm the functional relevance of the prioritized gene pairs. Second, rare cell populations present in very small numbers pose statistical challenges owing to insufficient sample sizes, which limits interaction inference. This constraint is shared by conventional tools. Future studies should explore strategies to enhance the analysis of rare cell types while maintaining their biological validity. Third, the current CNN architecture uses simple ligand–receptor ordering to structure its input matrix. Incorporating functional similarity into the matrix organization, for example via interaction networks or clustering, may improve CNN learning efficiency and enhance the precision of CCI detection.
In conclusion, PriorCCI is a robust and interpretable framework for prioritizing and interpreting CCIs in scRNA-seq data. Given its interpretability and scalability, PriorCCI has potential for integration into translational pipelines aimed at therapeutic target discovery, such as for immune-oncology or anti-angiogenic drug development. Its reliable performance and broad utility make it a valuable tool for cancer research as well as for exploring complex physiological and pathological processes.

4. Materials and Methods

4.1. Data Preprocessing and Sampling of Representative Cells for Each Cell Type

Datasets for NSCLC and CRC were obtained from CCA and developed through our prior collaborative research. Cell-type annotations were conducted using a scanpy-based preprocessing pipeline [24] in combination with manual curation of marker genes following automatic annotation by SingleR (v.1.4.0) [25] and CellTypist (v.1.6.3) [26].
Tumor cells were identified based on copy number variation (CNV) scores calculated using InferCNVpy (v.0.4.2, https://github.com/icbi-lab/infercnvpy, accessed on 1 August 2023) [27]. Cells with a CNV z-score ≥ 1 compared to normal cells within each cluster were classified as tumor cells.
As illustrated in Figure 1, once cell types were pre-annotated in the scRNA-seq data, subsampling was performed using geometric sketching [28] to reduce the computational burden. This approach utilizes the coordinate positions and metadata from each cell-type cluster. To construct the input data for CCI classification, we extracted 100 representative cells per cluster based on gene expression and organized their profiles by ligand–receptor pairs. For each pairwise combination of cell type clusters, we generated expression matrices (100 cells per cluster) and performed 1000 random samplings per combination to comprehensively capture the overall co-expression patterns.

4.2. CNN in PriorCCI

To construct a comprehensive list of biologically relevant ligands and receptors, we integrated and curated data from CellPhoneDB and ICELLNET, resulting in 3148 ligand–receptor pairs composed of 755 receptor and 893 ligand genes.
As shown in Figure 1, the input for multiclass classification with the CNN is structured as follows. The final matrix consisted of 6296 columns, including 3148 ligand–receptor pairs for the two cell types and 3148 pairs in the opposite direction, for 100 cells each, giving an input of shape (100, 6296, 2). The CNN comprises four convolution steps, the first three of which reduce the expression values across cells to a single value. The first convolution layer operates on each ligand–receptor pair from the two cell types. The second convolution layer reduces the values of every 10 cells to a single value, and the third convolution layer repeats this process. The four convolutions use 8, 16, 16, and 32 filters, with kernel strides of (1,1), (10,1), (10,1), and (1,4), respectively.
For a standard 2D convolution operation, let the input feature map be denoted as $X \in \mathbb{R}^{H \times W \times C}$, where $H$, $W$, and $C$ represent the height, width, and number of input channels, respectively. A set of learnable convolutional kernels (filters) is denoted as $W \in \mathbb{R}^{K_H \times K_W \times C \times C'}$, where $K_H$ and $K_W$ are the kernel height and width, and $C'$ is the number of output channels. Each output feature map $Y^{(k)}$ is obtained by computing the weighted sum of the local region of the input, followed by the addition of a learnable bias term $b^{(k)}$ for the $k$-th output channel. This is expressed as follows:
$$Y_{i,j}^{(k)} = \sum_{m=0}^{K_H-1} \sum_{n=0}^{K_W-1} \sum_{c=0}^{C-1} X_{i+m,\,j+n}^{(c)} \cdot W_{m,n}^{(c,k)} + b^{(k)}$$
Here, $i$ and $j$ index the spatial position of the output, and $k \in [1, C']$ indexes the output channel.
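To make the convolution formula concrete, the following NumPy sketch evaluates it directly with explicit loops over output positions. It is illustrative only: there is no padding, and the `stride` argument is an addition beyond the stride-1 formula above, included because the text describes strided layers.

```python
import numpy as np

def conv2d_valid(x, w, b, stride=(1, 1)):
    """Direct 2D convolution without padding.

    x: (H, W, C) input; w: (KH, KW, C, C_out) kernels; b: (C_out,) biases.
    Returns an array of shape (H_out, W_out, C_out).
    """
    kh, kw, _, c_out = w.shape
    sh, sw = stride
    h_out = (x.shape[0] - kh) // sh + 1
    w_out = (x.shape[1] - kw) // sw + 1
    y = np.empty((h_out, w_out, c_out))
    for i in range(h_out):
        for j in range(w_out):
            # Local region of the input covered by the kernel at output position (i, j).
            patch = x[i * sh:i * sh + kh, j * sw:j * sw + kw, :]
            # Sum over m, n, c of X * W for every output channel k, then add the bias.
            y[i, j, :] = np.tensordot(patch, w, axes=([0, 1, 2], [0, 1, 2])) + b
    return y
```

In practice a deep-learning framework performs this computation in vectorized form; the loops are kept here so that each index in the formula is visible.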
In all four convolutions, to mitigate the gradient vanishing problem, the activation function used to impart nonlinearity was a Rectified Linear Unit (ReLU) that passed negative numbers as 0 and positive numbers unchanged [29].
MaxPooling was performed before and after the fourth convolution in order to further compress the convolution results, reduce the spatial dimensionality, and ensure that only important features remained. In addition, a batch normalization process was incorporated to normalize the mean and variance on a mini-batch basis to mitigate the problems of exploding or vanishing gradients. GlobalAveragePooling2D calculates the mean value of the entire feature map for each channel, thereby entirely removing spatial information and leaving only the vector information to be sent to the classifier. This was followed by a dense ReLU and two dropout processes that randomly removed 40% of the neurons to prevent overfitting. To obtain class probabilities, the final layer applies a softmax activation function using the following equation:
$$\mathrm{Softmax}(z_i) = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}, \quad \text{for } i = 1, 2, \ldots, K$$
Here, $z_i$ is the logit corresponding to the $i$-th class, and $K$ is the total number of classes. The softmax function converts the raw outputs into a probability distribution across classes, ensuring that the outputs sum up to one.
The model uses Sparse Categorical Cross-Entropy as the loss function for training multiclass classification. The loss of a single sample is defined as follows:
$$L_{CE} = -\log\left(\hat{y}_c\right)$$
where $\hat{y}_c$ denotes the predicted probability of class $c$ obtained from the softmax output layer.
$$\hat{y}_i = \frac{e^{z_i}}{\sum_{j=1}^{C} e^{z_j}}, \quad \text{for } i = 1, 2, \ldots, C$$
where $z_i$ is the logit of the pre-softmax activation for class $i$ and $C$ is the total number of classes.
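The softmax and loss definitions above can be checked numerically with a short NumPy sketch. The function names are illustrative, and subtracting the row maximum before exponentiating is a standard numerical-stability step that leaves the result unchanged.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax over the last axis."""
    z = np.asarray(logits, dtype=float)
    shifted = z - z.max(axis=-1, keepdims=True)  # avoids overflow in exp
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)

def sparse_categorical_cross_entropy(logits, labels):
    """Per-sample loss -log(y_hat_c), where c is the integer class label."""
    probs = softmax(logits)
    labels = np.asarray(labels)
    picked = probs[np.arange(len(labels)), labels]
    return -np.log(picked)
```

For example, with 21 classes and all logits equal, every class receives probability 1/21, so the loss for any label is log(21), roughly 3.04. This is the value an untrained, uninformative classifier would produce.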
Optimization was performed using the Adam optimizer [30], with learning rate decay applied via plateau detection. Adam adapts the learning rate for each parameter based on the first and second moments of the gradient. The update rule for parameter $\theta$ at time step $t$ is:
$$m_t = \beta_1 m_{t-1} + (1 - \beta_1)\,\nabla_\theta L_t$$
$$\upsilon_t = \beta_2 \upsilon_{t-1} + (1 - \beta_2)\,(\nabla_\theta L_t)^2$$
$$\hat{m}_t = \frac{m_t}{1 - \beta_1^{t}}, \quad \hat{\upsilon}_t = \frac{\upsilon_t}{1 - \beta_2^{t}}$$
$$\theta_t = \theta_{t-1} - \eta \cdot \frac{\hat{m}_t}{\sqrt{\hat{\upsilon}_t} + \epsilon}$$
where $\eta$ is the learning rate (set to $1 \times 10^{-4}$ in our experiments), and the default hyperparameters are set as follows: $\beta_1 = 0.9$, $\beta_2 = 0.9999$, and $\epsilon = 10^{-8}$.
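A single Adam update following the four equations above can be written in a few lines. This sketch uses the hyperparameter values quoted in the text as defaults; the function name is illustrative, and the plateau-based learning rate decay is not included.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-4, beta1=0.9, beta2=0.9999, eps=1e-8):
    """One Adam update; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2     # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

theta = np.array([1.0, -2.0])
grad = np.array([0.5, -3.0])
theta, m, v = adam_step(theta, grad, np.zeros(2), np.zeros(2), t=1)
```

At the first step the bias-corrected moments reduce to the gradient and its square, so each parameter moves by approximately the learning rate in the direction opposite to its gradient, regardless of the gradient's magnitude.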
The models were evaluated using accuracy, precision, recall, F1 score, and area under the ROC curve (AUC) as the primary metrics. For multiclass classification, the AUC was computed using the one-vs.-rest strategy and macro-averaged across all classes. In addition, to assess robustness, training was repeated 10 times with different random seeds, and the stability and consistency of the resulting 10 models were evaluated.
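The one-vs.-rest, macro-averaged AUC can be illustrated with a small NumPy sketch. It computes each binary AUC as the probability that a positive sample scores higher than a negative one (ties count as one half), which is equivalent to the area under the ROC curve. The function names are illustrative, and integer labels are assumed to match the probability columns.

```python
import numpy as np

def binary_auc(scores, is_pos):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = scores[is_pos]
    neg = scores[~is_pos]
    diff = pos[:, None] - neg[None, :]
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())

def macro_ovr_auc(probs, labels):
    """Treat each class in turn as positive and average the per-class AUCs."""
    labels = np.asarray(labels)
    classes = np.unique(labels)
    return float(np.mean([binary_auc(probs[:, c], labels == c) for c in classes]))
```

The pairwise comparison uses memory proportional to the number of positive times negative samples, so it suits small illustrations; a rank-based implementation is preferable for large test sets.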

4.3. Similarity Calculation Within Models

To compare gene importance across different models, we computed cosine similarity:
$$\mathrm{cosine\_similarity}(A, B) = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2}\,\sqrt{\sum_{i=1}^{n} B_i^2}}$$
We also used Spearman rank correlation to assess monotonic relationships:
$$\rho = 1 - \frac{6 \sum d_i^2}{n\,(n^2 - 1)}$$
where $d_i$ is the rank difference between corresponding elements.
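Both similarity measures are straightforward to compute for two vectors of gene-pair importance scores. The sketch below follows the two formulas directly; note that the rank-difference form of Spearman's correlation is exact only when there are no tied values, which is assumed here.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two importance vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def spearman_rho(a, b):
    """Spearman correlation via rank differences (assumes no ties)."""
    rank_a = np.argsort(np.argsort(a))  # rank of each element, starting at 0
    rank_b = np.argsort(np.argsort(b))
    d = rank_a - rank_b
    n = len(d)
    return float(1 - 6 * np.sum(d ** 2) / (n * (n ** 2 - 1)))
```

Cosine similarity is sensitive to the magnitudes of the scores, whereas Spearman correlation depends only on their ordering, so the two measures capture complementary aspects of agreement between models.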

4.4. Visual Interpretation with Grad-CAM++ in PriorCCI

After CNN training, Grad-CAM++ serves as the central method in PriorCCI for visually interpreting the priorities of gene pairs. The procedure for determining the priority of major gene combinations can be summarized in six steps:
  1. Model and class definition: The first step is to define the model and class. Let $f: X \to \mathbb{R}^{C}$ be the trained CNN model, where $X \in \mathbb{R}^{H \times W \times D}$ is the input (e.g., ligand–receptor pixel image) and $C$ is the number of output classes. We denote the output logit (before softmax) for class $c$ as $y^{c} = f_{c}(X)$.
  2. Grad-CAM++ computation: In the second step, an importance map is calculated based on Grad-CAM++. Let $A^{k} \in \mathbb{R}^{H' \times W'}$ be the $k$-th feature map at the last convolutional layer. The importance weight $\alpha_k^{c}$ for class $c$ is computed via Grad-CAM++ as:
$$\alpha_k^{c} = \sum_{i,j} \frac{\partial^{2} y^{c}}{\left(\partial A_{ij}^{k}\right)^{2}} \cdot \mathrm{ReLU}\left(\frac{\partial y^{c}}{\partial A_{ij}^{k}}\right)$$
Then, the Grad-CAM++ heatmap $L_{\text{Grad-CAM++}}^{c}$ is:
$$L_{\text{Grad-CAM++}}^{c} = \mathrm{ReLU}\left(\sum_{k} \alpha_k^{c} A^{k}\right)$$
This heatmap is normalized for visualization:
$$\tilde{L^{c}} = \frac{L^{c} - \min L^{c}}{\max L^{c} - \min L^{c}}$$
  3. Classwise average of CAMs: The third step is the calculation of the class-specific average of the CAM. Given $N$ samples from class $c$, the classwise mean of the CAM is:
$$\bar{L^{c}} = \frac{1}{N} \sum_{n=1}^{N} \tilde{L_n^{c}}$$
  4. Extraction of ligand–receptor importance: In the fourth step, the importance of each ligand–receptor pair is extracted. Given the predefined ligand–receptor index $\{(l_i, r_i)\}_{i=1}^{G}$, the CAM weight for pair $i$ is:
$$w_i^{c} = \bar{L^{c}}(l_i, r_i)$$
  5. Statistical analysis: The fifth step is the statistical analysis of gene pairs with the top 5% importance values. Let $w_i^{j}$ be the weight of gene pair $i$ in model run $j$ (total $M$ runs). Filtering the top 5% per model, we define
$$\mu_i = \frac{1}{M_i} \sum_{j=1}^{M_i} w_i^{j}, \quad \sigma_i^{2} = \frac{1}{M_i} \sum_{j=1}^{M_i} \left(w_i^{j} - \mu_i\right)^{2}$$
where $M_i$ is the number of models in which pair $i$ is in the top 5%. We then define the coefficient of variation ($\mathrm{CV}_i$) and median ($\mathrm{Med}_i$) values.
  6. Final ranking: The final step is to collect the set $\left(\mu_i, \sigma_i^{2}, \mathrm{CV}_i, \mathrm{Med}_i, M_i\right)$ for each pair and arrange the pairs in descending order of $M_i$ or $\mu_i$.
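The normalization, classwise averaging, and cross-model statistics in the steps above can be sketched in NumPy as follows. The sketch starts from heatmaps and per-pair weights that are assumed to be already computed, so the gradient computation and the mapping from heatmap positions to pairs are not shown. The function names are hypothetical, the CV is taken as the standard ratio of standard deviation to mean, and sorting by $M_i$ first and $\mu_i$ second is one possible reading of the final ranking.

```python
import numpy as np

def normalize_cam(cam):
    """Min-max normalize one heatmap to [0, 1]; a constant map becomes all zeros."""
    cam = np.asarray(cam, dtype=float)
    lo, hi = cam.min(), cam.max()
    return (cam - lo) / (hi - lo) if hi > lo else np.zeros_like(cam)

def classwise_mean_cam(cams):
    """Average the normalized heatmaps of the N samples of one class."""
    return np.mean([normalize_cam(c) for c in cams], axis=0)

def rank_pairs(weights, top_frac=0.05):
    """weights[j, i] is the weight of pair i in model run j, shape (M, G).

    Returns tuples (pair, M_i, mean, variance, CV, median), sorted by
    descending M_i and then descending mean.
    """
    n_models, n_pairs = weights.shape
    k = max(1, int(np.ceil(top_frac * n_pairs)))
    # Mark, for each model, the k pairs with the largest weights.
    top_idx = np.argsort(-weights, axis=1)[:, :k]
    in_top = np.zeros(weights.shape, dtype=bool)
    np.put_along_axis(in_top, top_idx, True, axis=1)
    rows = []
    for i in range(n_pairs):
        vals = weights[in_top[:, i], i]  # weights from models where pair i is in the top set
        if vals.size == 0:
            continue
        mu = vals.mean()
        var = vals.var()  # population variance, dividing by M_i
        cv = np.sqrt(var) / mu if mu != 0 else np.nan
        rows.append((i, int(vals.size), float(mu), float(var), float(cv), float(np.median(vals))))
    rows.sort(key=lambda r: (-r[1], -r[2]))
    return rows
```

Ranking by $M_i$ first favors pairs that recur in the top set across many runs, which matches the emphasis on reproducibility; ranking by $\mu_i$ alone would instead favor pairs with large average weight even if they appear in few runs.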

4.5. Gene Filtering with ECF

While Grad-CAM++ highlights the gene pair importance between cell types, further validation of the gene expression patterns is necessary. In scRNA-seq data, it has been noted that the proportion of cells expressing a gene is often more informative than averaged expression levels [18]. Therefore, for all ligand–receptor pairs prioritized by Grad-CAM++, we assessed the ECF in the relevant cell type. Only gene pairs with ECF values in the upper quantiles (top 25%) of the distribution were retained, allowing for a more biologically meaningful interpretation of key interactions.
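A minimal sketch of ECF-based filtering is given below. It assumes that a cell "expresses" a gene when its value exceeds zero and that a pair is retained when both the ligand ECF and the receptor ECF fall in the top 25% of their respective distributions. Both choices, along with the function names, are assumptions made for illustration, as the exact rule is not spelled out in the text.

```python
import numpy as np

def expressing_cell_fraction(expr, threshold=0.0):
    """Fraction of cells expressing each gene; expr has shape (cells, genes)."""
    return (np.asarray(expr) > threshold).mean(axis=0)

def filter_pairs_by_ecf(ecf_ligand, ecf_receptor, quantile=0.75):
    """Indices of pairs whose ligand and receptor ECFs are both at or above the cutoff.

    ecf_ligand[i]: ECF of pair i's ligand in the ligand-expressing cell type.
    ecf_receptor[i]: ECF of pair i's receptor in the receptor-expressing cell type.
    """
    ecf_ligand = np.asarray(ecf_ligand, dtype=float)
    ecf_receptor = np.asarray(ecf_receptor, dtype=float)
    lig_cut = np.quantile(ecf_ligand, quantile)
    rec_cut = np.quantile(ecf_receptor, quantile)
    return np.flatnonzero((ecf_ligand >= lig_cut) & (ecf_receptor >= rec_cut))
```

Because ECF counts expressing cells rather than averaging expression, a gene with high expression in only a handful of cells receives a low value, which is the behavior motivating its use here.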

Author Contributions

Conceptualization, H.K. and J.K.; methodology, H.K., Y.S. and J.K.; software, H.K. and J.K.; validation, H.K. and E.C.; formal analysis, H.K. and J.K.; investigation, H.K., E.C. and J.K.; resources, J.K.; data curation, H.K., E.C. and Y.S.; writing—original draft preparation, H.K. and J.K.; writing—review and editing, J.K.; visualization, H.K. and J.K.; supervision, J.K.; project administration, J.K.; funding acquisition, J.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by grants from the National Cancer Center, Republic of Korea (grant no. NCC-2410650) and the National Research Foundation of Korea (NRF) funded by the Korean government (MSIT) (Grant No. RS-2024-00352797).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The NSCLC and CRC datasets were obtained from CCA [17,18] and are available on Zenodo at https://zenodo.org/records/10651059. The source code for PriorCCI is available in our GitHub repository, https://github.com/nccpai/PriorCCI.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Almet, A.A.; Cang, Z.; Jin, S.; Nie, Q. The Landscape of Cell–Cell Communication through Single-Cell Transcriptomics. Curr. Opin. Syst. Biol. 2021, 26, 12–23.
  2. Kolodziejczyk, A.A.; Kim, J.K.; Svensson, V.; Marioni, J.C.; Teichmann, S.A. The Technology and Biology of Single-Cell RNA Sequencing. Mol. Cell 2015, 58, 610–620.
  3. Natri, H.M.; Del Azodi, C.B.; Peter, L.; Taylor, C.J.; Chugh, S.; Kendle, R.; Chung, M.I.I.; Flaherty, D.K.; Matlock, B.K.; Calvi, C.L.; et al. Cell-Type-Specific and Disease-Associated Expression Quantitative Trait Loci in the Human Lung. Nat. Genet. 2024, 56, 595–604.
  4. Armingol, E.; Officer, A.; Harismendy, O.; Lewis, N.E. Deciphering Cell–Cell Interactions and Communication from Gene Expression. Nat. Rev. Genet. 2021, 22, 71–88.
  5. Eidi, Z.; Khorasani, N.; Sadeghi, M. Correspondence between Signaling and Developmental Patterns by Competing Cells: A Computational Perspective. bioRxiv 2023.
  6. Dimitrov, D.; Türei, D.; Garrido-Rodriguez, M.; Burmedi, P.L.; Nagai, J.S.; Boys, C.; Ramirez Flores, R.O.; Kim, H.; Szalai, B.; Costa, I.G.; et al. Comparison of Methods and Resources for Cell-Cell Communication Inference from Single-Cell RNA-Seq Data. Nat. Commun. 2022, 13, 3224.
  7. Jin, S.; Plikus, M.V.; Nie, Q. CellChat for Systematic Analysis of Cell–Cell Communication from Single-Cell Transcriptomics. Nat. Protoc. 2025, 20, 180–219.
  8. Wilk, A.J.; Shalek, A.K.; Holmes, S.; Blish, C.A. Comparative Analysis of Cell–Cell Communication at Single-Cell Resolution. Nat. Biotechnol. 2024, 42, 470–483.
  9. Liu, Z.; Sun, D.; Wang, C. Evaluation of Cell-Cell Interaction Methods by Integrating Single-Cell RNA Sequencing Data with Spatial Information. Genome Biol. 2022, 23, 218.
  10. Comes, M.C.; Casti, P.; Mencattini, A.; Di Giuseppe, D.; Mermet-Meillon, F.; De Ninno, A.; Parrini, M.C.; Businaro, L.; Di Natale, C.; Martinelli, E. The Influence of Spatial and Temporal Resolutions on the Analysis of Cell-Cell Interaction: A Systematic Study for Time-Lapse Microscopy Applications. Sci. Rep. 2019, 9, 6789.
  11. Efremova, M.; Vento-Tormo, M.; Teichmann, S.A.; Vento-Tormo, R. CellPhoneDB: Inferring Cell–Cell Communication from Combined Expression of Multi-Subunit Ligand–Receptor Complexes. Nat. Protoc. 2020, 15, 1484–1506.
  12. Browaeys, R.; Saelens, W.; Saeys, Y. NicheNet: Modeling Intercellular Communication by Linking Ligands to Target Genes. Nat. Methods 2020, 17, 159–162.
  13. Noël, F.; Massenet-Regad, L.; Carmi-Levy, I.; Cappuccio, A.; Grandclaudon, M.; Trichot, C.; Kieffer, Y.; Mechta-Grigoriou, F.; Soumelis, V. Dissection of Intercellular Communication Using the Transcriptome-Based Framework ICELLNET. Nat. Commun. 2021, 12, 1089.
  14. Forcato, M.; Romano, O.; Bicciato, S. Computational Methods for the Integrative Analysis of Single-Cell Data. Brief. Bioinform. 2021, 22, bbaa042.
  15. Yang, W.; Wang, P.; Luo, M.; Cai, Y.; Xu, C.; Xue, G.; Jin, X.; Cheng, R.; Que, J.; Pang, F.; et al. DeepCCI: A Deep Learning Framework for Identifying Cell–Cell Interactions from Single-Cell RNA Sequencing Data. Bioinformatics 2023, 39, btad596.
  16. Chattopadhay, A.; Sarkar, A.; Howlader, P.; Balasubramanian, V.N. Grad-CAM++: Generalized Gradient-Based Visual Explanations for Deep Convolutional Networks. In Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Tahoe, NV, USA, 12–15 March 2018; pp. 839–847.
  17. Kang, J.; Lee, J.H.; Cha, H.; An, J.; Kwon, J.; Lee, S.; Kim, S.; Baykan, M.Y.; Kim, S.Y.; An, D.; et al. Systematic Dissection of Tumor-Normal Single-Cell Ecosystems across a Thousand Tumors of 30 Cancer Types. Nat. Commun. 2024, 15, 4067.
  18. Kwon, J.; Kang, J.; Jo, A.; Seo, K.; An, D.; Baykan, M.Y.; Lee, J.H.; Kim, N.; Eum, H.H.; Hwang, S.; et al. Single-Cell Mapping of Combinatorial Target Antigens for CAR Switches Using Logic Gates. Nat. Biotechnol. 2023, 41, 1593–1605.
  19. Subramanian, A.; Tamayo, P.; Mootha, V.K.; Mukherjee, S.; Ebert, B.L.; Gillette, M.A.; Paulovich, A.; Pomeroy, S.L.; Golub, T.R.; Lander, E.S.; et al. Gene Set Enrichment Analysis: A Knowledge-Based Approach for Interpreting Genome-Wide Expression Profiles. Proc. Natl. Acad. Sci. USA 2005, 102, 15545–15550.
  20. Somanath, P.R.; Malinin, N.L.; Byzova, T.V. Cooperation between Integrin αvβ3 and VEGFR2 in Angiogenesis. Angiogenesis 2009, 12, 177–185.
  21. Ribatti, D. The Crossroad between Tumor and Endothelial Cells. Clin. Exp. Med. 2024, 24, 227.
  22. Leone, P.; Malerba, E.; Susca, N.; Favoino, E.; Perosa, F.; Brunori, G.; Prete, M.; Racanelli, V. Endothelial Cells in Tumor Microenvironment: Insights and Perspectives. Front. Immunol. 2024, 15, 1367875.
  23. David, K.; Friedlander, G.; Pellegrino, B.; Radomir, L.; Lewinsky, H.; Leng, L.; Bucala, R.; Becker-Herman, S.; Shachar, I. CD74 as a Regulator of Transcription in Normal B Cells. Cell Rep. 2022, 41, 111572.
  24. Wolf, F.A.; Angerer, P.; Theis, F.J. SCANPY: Large-Scale Single-Cell Gene Expression Data Analysis. Genome Biol. 2018, 19, 15.
  25. Aran, D.; Looney, A.P.; Liu, L.; Wu, E.; Fong, V.; Hsu, A.; Chak, S.; Naikawadi, R.P.; Wolters, P.J.; Abate, A.R.; et al. Reference-Based Analysis of Lung Single-Cell Sequencing Reveals a Transitional Profibrotic Macrophage. Nat. Immunol. 2019, 20, 163–172.
  26. Domínguez Conde, C.; Xu, C.; Jarvis, L.B.; Rainbow, D.B.; Wells, S.B.; Gomes, T.; Howlett, S.K.; Suchanek, O.; Polanski, K.; King, H.W.; et al. Cross-Tissue Immune Cell Analysis Reveals Tissue-Specific Features in Humans. Science 2022, 376, eabl5197.
  27. Available online: https://github.com/broadinstitute/inferCNV/wiki (accessed on 1 August 2023).
  28. Hie, B.; Cho, H.; DeMeo, B.; Bryson, B.; Berger, B. Geometric Sketching Compactly Summarizes the Single-Cell Transcriptomic Landscape. Cell Syst. 2019, 8, 483–493.e7.
  29. Agarap, A.F. Deep Learning Using Rectified Linear Units (ReLU). arXiv 2018, arXiv:1803.08375.
  30. Kingma, D.P. Adam: A Method for Stochastic Optimization. arXiv 2014, arXiv:1412.6980.
Figure 1. Overview of the PriorCCI framework. Each interaction class is defined by specific cell type combinations identified via scRNA-seq data clustering. Two-channel expression matrices are constructed from ligand–receptor gene pairs across sampled cell profiles. The CNN model applies two steps of convolutions and Grad-CAM++ to prioritize biologically relevant ligand–receptor interactions, followed by gene filtering.
Figure 2. Summary of all datasets used for PriorCCI. (a) UMAP representation with clusters corresponding to the major cell types in the NSCLC and CRC datasets. NSCLC, non-small cell lung cancer; CRC, colorectal cancer. (b) Bar and pie plots showing the cell counts per cell type and the proportions by data source, respectively. NSCLC is on the left, and CRC is on the right.
Figure 3. Performance of PriorCCI. The learning curves for loss and accuracy according to epochs, using the training and validation sets, are shown on the left. The right side displays ROC curves and the associated macro-average AUC, as well as the confusion matrix. The upper plots are for NSCLC, and the lower plots are for CRC.
Figure 4. Similarity across 10 models. Heatmaps show the similarity of ligand–receptor importance values across models for each cell-type combination involving tumor cells. Similarity was computed using both cosine and Spearman methods; the results shown here are based on Spearman correlation.
Figure 5. Functional validation of PriorCCI-prioritized gene pairs through enrichment analysis. (a) Bar plot showing significantly enriched terms associated with gene pairs from Tumor–Endothelial interactions in NSCLC. (b) Enriched pathways for Tumor–Myeloid interactions in NSCLC. (c) Functional terms for Tumor–Fibroblast interactions in CRC. (d) Immune-related enrichment in Tumor–Myeloid interactions in CRC. Each bar indicates an enriched term from Gene Ontology (GO Biological Process and GO Molecular Function), KEGG pathway, or MSigDB Hallmark. Color legend corresponds to the source of each term.
Figure 6. Distribution of gene-pair rankings assigned by each method relative to prioritization in PriorCCI. The horizontal strip plot compares the ranks of ligand–receptor pairs predicted by each CCI analysis method for the Tumor–Endothelial cell interaction. Each dot represents a gene pair detected by one of the tools. The x-axis indicates the Grad-CAM++-based rank in PriorCCI, with higher-priority pairs positioned on the left (top 25%).
Figure 7. ECF status of ligand–receptor pair candidates specific to the Tumor–Endothelial cell interaction. (a) Candidate gene–gene pairs from PriorCCI (upper) and other methods (lower) are shown by their ECF (%) in endothelial cells (y-axis) and tumor cells (x-axis). (b) ECF values of the PriorCCI candidates by cell type. The genes from tumor cells (top) and endothelial cells (bottom) are compared across the other cell types after filtering out genes with an ECF under 0.1% in all cell types.
Table 1. Adjusted cell counts for each cell type after geometric sketching for subsampling.

| Cell Type | NSCLC (Not Applied) | NSCLC (Applied) | CRC (Not Applied) | CRC (Applied) |
|---|---|---|---|---|
| T/NK | 198,927 | 18,594 | 252,232 | 43,586 |
| Tumor | 131,662 | 17,856 | 196,589 | 42,991 |
| B | 73,508 | 17,729 | 100,116 | 42,142 |
| Myeloid | 40,660 | 15,502 | 69,275 | 36,787 |
| Epithelial | 15,427 | 12,734 | 34,874 | 32,111 |
| Fibroblast/Pericyte | 12,177 | 11,704 | 25,829 | 25,652 |
| Endothelial | 9,990 | 9,990 | 23,742 | 23,742 |
| Total | 482,351 | 104,109 | 702,657 | 247,011 |

T/NK: T/Natural Killer cells.
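Table 2. Performance of 10 CNN models for each cancer type.

| Model | NSCLC Loss | NSCLC Accuracy | NSCLC Precision | NSCLC Recall | NSCLC F1 Score | NSCLC Macro AUC | CRC Loss | CRC Accuracy | CRC Precision | CRC Recall | CRC F1 Score | CRC Macro AUC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| v1 | 0.021 | 0.993 | 0.994 | 0.993 | 0.993 | 1.000 | 0.141 | 0.954 | 0.961 | 0.954 | 0.953 | 0.999 |
| v2 | 0.043 | 0.983 | 0.985 | 0.983 | 0.983 | 1.000 | 0.200 | 0.942 | 0.945 | 0.942 | 0.942 | 0.999 |
| v3 | 0.018 | 0.997 | 0.997 | 0.997 | 0.997 | 0.999 | 0.075 | 0.977 | 0.979 | 0.977 | 0.977 | 0.999 |
| v4 | 0.021 | 0.994 | 0.994 | 0.994 | 0.994 | 1.000 | 0.198 | 0.949 | 0.952 | 0.949 | 0.948 | 0.999 |
| v5 | 0.029 | 0.989 | 0.990 | 0.989 | 0.989 | 1.000 | 0.177 | 0.943 | 0.945 | 0.943 | 0.943 | 0.999 |
| v6 | 0.043 | 0.986 | 0.987 | 0.986 | 0.986 | 1.000 | 0.191 | 0.944 | 0.946 | 0.944 | 0.944 | 0.999 |
| v7 | 0.027 | 0.991 | 0.991 | 0.991 | 0.991 | 0.999 | 0.157 | 0.949 | 0.950 | 0.949 | 0.949 | 0.999 |
| v8 | 0.032 | 0.988 | 0.989 | 0.988 | 0.988 | 1.000 | 0.171 | 0.929 | 0.942 | 0.929 | 0.927 | 0.999 |
| v9 | 0.026 | 0.992 | 0.992 | 0.992 | 0.992 | 0.999 | 0.168 | 0.955 | 0.957 | 0.955 | 0.955 | 0.999 |
| v10 | 0.028 | 0.992 | 0.993 | 0.992 | 0.992 | 1.000 | 0.064 | 0.981 | 0.981 | 0.981 | 0.980 | 1.000 |
| Avg. | 0.029 | 0.991 | 0.991 | 0.991 | 0.991 | 1.000 | 0.154 | 0.952 | 0.956 | 0.952 | 0.952 | 0.999 |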
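Table 3. Summary of model-to-model similarities across CCI classes based on the importance scores by Grad-CAM++.

| Class No. | NSCLC Cosine Mean | NSCLC Cosine SD | NSCLC Spearman Mean | NSCLC Spearman SD | CRC Cosine Mean | CRC Cosine SD | CRC Spearman Mean | CRC Spearman SD |
|---|---|---|---|---|---|---|---|---|
| 0 | 0.867 | 0.049 | 0.652 | 0.090 | 0.884 | 0.047 | 0.632 | 0.136 |
| 1 | 0.902 | 0.030 | 0.672 | 0.100 | 0.854 | 0.035 | 0.603 | 0.113 |
| 2 | 0.899 | 0.058 | 0.830 | 0.062 | 0.904 | 0.027 | 0.822 | 0.043 |
| 3 | 0.970 | 0.011 | 0.902 | 0.038 | 0.907 | 0.022 | 0.731 | 0.088 |
| 4 | 0.883 | 0.042 | 0.700 | 0.104 | 0.930 | 0.025 | 0.848 | 0.060 |
| 5 | 0.934 | 0.024 | 0.756 | 0.080 | 0.841 | 0.053 | 0.474 | 0.172 |
| 6 | 0.895 | 0.044 | 0.867 | 0.039 | 0.835 | 0.109 | 0.628 | 0.149 |
| 7 | 0.959 | 0.016 | 0.944 | 0.019 | 0.766 | 0.086 | 0.321 | 0.223 |
| 8 | 0.943 | 0.022 | 0.915 | 0.035 | 0.845 | 0.057 | 0.658 | 0.106 |
| 9 | 0.957 | 0.014 | 0.904 | 0.036 | 0.940 | 0.015 | 0.914 | 0.030 |
| 10 | 0.717 | 0.113 | 0.418 | 0.140 | 0.861 | 0.064 | 0.734 | 0.074 |
| 11 | 0.822 | 0.077 | 0.671 | 0.098 | 0.892 | 0.047 | 0.804 | 0.056 |
| 12 | 0.894 | 0.039 | 0.624 | 0.114 | 0.857 | 0.036 | 0.662 | 0.100 |
| 13 | 0.819 | 0.058 | 0.557 | 0.130 | 0.825 | 0.086 | 0.657 | 0.157 |
| 14 | 0.861 | 0.035 | 0.655 | 0.068 | 0.902 | 0.026 | 0.757 | 0.085 |
| 15 | 0.883 | 0.044 | 0.662 | 0.093 | 0.819 | 0.085 | 0.565 | 0.172 |
| 16 | 0.958 | 0.011 | 0.884 | 0.047 | 0.811 | 0.049 | 0.668 | 0.100 |
| 17 | 0.779 | 0.095 | 0.580 | 0.099 | 0.845 | 0.061 | 0.606 | 0.120 |
| 18 | 0.924 | 0.025 | 0.737 | 0.098 | 0.919 | 0.026 | 0.834 | 0.058 |
| 19 | 0.922 | 0.022 | 0.751 | 0.069 | 0.824 | 0.072 | 0.587 | 0.116 |
| 20 | 0.867 | 0.049 | 0.652 | 0.090 | 0.818 | 0.122 | 0.561 | 0.117 |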
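The per-class means and SDs in Table 3 summarize pairwise comparisons between models. One way such a summary could be computed is sketched below using only the Python standard library. The function names, the averaging over all unordered model pairs, the use of the sample standard deviation, and the toy score vectors are assumptions for this example; the exact procedure behind the reported values may differ.

```python
import itertools
import math
import statistics


def cosine(u, v):
    """Cosine similarity between two importance-score vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def ranks(x):
    """1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(x)), key=x.__getitem__)
    r = [0.0] * len(x)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and x[order[j + 1]] == x[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def pearson(u, v):
    mu, mv = statistics.fmean(u), statistics.fmean(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u)
    sv = sum((b - mv) ** 2 for b in v)
    return cov / math.sqrt(su * sv)


def spearman(u, v):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(u), ranks(v))


def summarize(model_scores, sim):
    """Mean and sample SD of a similarity over all unordered model pairs."""
    vals = [sim(a, b) for a, b in itertools.combinations(model_scores, 2)]
    return statistics.fmean(vals), statistics.stdev(vals)


# Toy importance scores: three models, four ligand-receptor pairs each.
models = [
    [0.9, 0.5, 0.1, 0.3],
    [0.8, 0.6, 0.2, 0.1],
    [0.7, 0.4, 0.3, 0.2],
]
cos_mean, cos_sd = summarize(models, cosine)
rho_mean, rho_sd = summarize(models, spearman)
print(round(cos_mean, 3), round(cos_sd, 3), round(rho_mean, 3), round(rho_sd, 3))
```

Cosine similarity is sensitive to the magnitude of the scores, whereas Spearman correlation depends only on their ordering, which is consistent with the Spearman values in Table 3 being generally lower than the cosine values.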
Table 4. Cell type-specific ECF (%) of gene candidates from other CCI tool results.

| Cell Type | APP | CD74 | FLT1 | TNFRSF21 | TNFSF10 | TNFRSF10B |
|---|---|---|---|---|---|---|
| T/NK | 29.4 | 95.9 | 7.9 | 10.4 | 24.1 | 14.2 |
| Tumor | 3.6 | 74.1 | 0.4 | 0.3 | 12.2 | 3.1 |
| B | 62.8 | 81.8 | 1.1 | 32.2 | 39.9 | 25.9 |
| Myeloid | 77.8 | 93.1 | 46.6 | 2.7 | 68.6 | 13.7 |
| Epithelial | 6.9 | 99.1 | 0.5 | 0.7 | 9.1 | 3.8 |
| Fibroblast/Pericyte | 57.7 | 60.0 | 1.2 | 12.7 | 20.2 | 9.7 |
| Endothelial | 45.6 | 87.5 | 0.7 | 13.9 | 30.8 | 17.4 |

T/NK: T/Natural Killer cells.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Kim, H.; Choi, E.; Shim, Y.; Kwon, J. PriorCCI: Interpretable Deep Learning Framework for Identifying Key Ligand–Receptor Interactions Between Specific Cell Types from Single-Cell Transcriptomes. Int. J. Mol. Sci. 2025, 26, 7110. https://doi.org/10.3390/ijms26157110
