Biology-Informed Matrix Factorization: An AI-Driven Framework for Enhanced Drug Repositioning

Wang, Yangyang; Wang, Yaping; Hu, Ya; Wang, Jihan

doi:10.3390/biology14050549

by Yangyang Wang ^1,†, Yaping Wang ^2,†, Ya Hu ^3 and Jihan Wang ^2,*

^1 School of Physics and Electronic Information, Yan’an University, Yan’an 716000, China
^2 Yan’an Medical College, Yan’an University, Yan’an 716000, China
^3 Department of Medical College, Hunan Polytechnic of Environment and Biology, Hengyang 421000, China
^* Author to whom correspondence should be addressed.
^† These authors contributed equally to this work.

Biology 2025, 14(5), 549; https://doi.org/10.3390/biology14050549

Submission received: 18 April 2025 / Revised: 8 May 2025 / Accepted: 14 May 2025 / Published: 15 May 2025

(This article belongs to the Special Issue Artificial Intelligence Research for Complex Biological Systems)

Simple Summary

Finding new medical uses for existing drugs can help patients faster and at a lower cost compared to developing entirely new drugs. This process, called drug repositioning, uses the information we already know about approved drugs to see if they might treat other diseases. In this study, we created a computer model that combines drug and disease similarities based on biological knowledge, making predictions about which drugs might be effective for diseases they are not currently used to treat. Our method uses a mathematical technique called matrix factorization, enhanced by biological information, to improve accuracy and reliability. We tested the model using well-known data and showed that it works better than existing approaches. We also looked at several real drugs and found new disease connections for each one, many of which were supported by trusted medical databases. This approach helps researchers discover new treatment options faster and more efficiently, which could benefit public health and reduce costs in drug development.

Abstract

Advances in artificial intelligence (AI) and intelligent computing have significantly accelerated drug discovery by enabling accurate modeling of complex biomedical relationships. Among these efforts, drug repositioning—identifying novel therapeutic uses for approved or investigational drugs—offers a cost-effective and time-efficient alternative to de novo drug development. While non-negative matrix factorization (NMF) has been widely adopted for uncovering latent drug–disease associations, conventional implementations often neglect the biological context that underpins these relationships. In this work, we propose a novel NMF-based drug repositioning model that incorporates biological context (NMFIBC), which integrates drug and disease similarity networks through graph-regularized optimization to enhance predictive performance. This design enhances both the robustness and interpretability of association prediction. Extensive benchmarking on multiple gold-standard datasets demonstrates that NMFIBC outperforms existing methods across a range of metrics, including AUC, precision, and F1-score. Moreover, case studies involving clinically relevant drugs validate the biological plausibility of the predicted associations using public databases such as DrugBank, CTD, and KEGG. The proposed framework provides a powerful, context-aware AI strategy for discovering actionable insights in drug repositioning research.

Keywords:

artificial intelligence; drug repositioning; non-negative matrix factorization; biological context; graph regularization; drug–disease association prediction

1. Introduction

Drug repositioning, also known as drug repurposing, is the process of identifying new therapeutic uses for existing drugs beyond their original medical indications [1,2]. This strategy leverages established pharmacokinetic, pharmacodynamic, and toxicity profiles to accelerate development, reduce costs, and lower the risk of failure in clinical trials. Repositioning can involve both approved drugs and those discontinued for reasons unrelated to safety. The pharmaceutical industry continues to face substantial challenges in traditional drug discovery, including high costs, extended timelines, and high attrition rates [2]. Drug repositioning offers a practical alternative by reassessing existing compounds for new indications [3]. Compared to de novo drug development, which may take over a decade and cost upwards of $2 billion, repositioning significantly reduces both development time and financial burden. Moreover, as repositioned drugs have already undergone preclinical and clinical evaluations, they carry a lower risk of safety-related failure [4,5].

In recent years, a variety of computational approaches have been proposed to facilitate drug repositioning, including signature matching, network-based models, and matrix decomposition-based techniques [4]. Signature matching compares gene expression profiles of diseases and drugs to identify therapeutic candidates that may reverse disease phenotypes. Network-based methods explore biological interaction networks (e.g., protein–protein interaction, gene regulation) to uncover latent links between drugs and diseases. Matrix decomposition methods—such as non-negative matrix factorization (NMF) [6,7,8] and singular value decomposition (SVD) [9,10]—aim to extract low-rank representations from drug–disease association matrices, revealing latent associations that are often imperceptible in the high-dimensional space [11].

Among the above approaches, NMF has attracted significant attention due to its scalability, interpretability, and effectiveness in identifying hidden structures. For example, Zhang et al. proposed feature-derived graph regularized matrix factorization (FGRMF) [12], which predicts drug side effects by incorporating drug similarity features. The similarity-constrained matrix factorization method (SCMFDD) [13] extended this approach by adding regularization terms for both drug and disease similarity. Sadeghi et al. introduced NMFDR [6], a network-based NMF variant for predicting new drug–disease associations. Other recent developments include bounded nuclear norm regularization (BNNR) [14], which mitigates cold start problems, and hybrid models that combine matrix factorization with semantic diffusion frameworks [15]. However, existing NMF-based methods often neglect the biological context underlying drug–disease associations, focusing primarily on numerical decompositions without integrating domain-specific knowledge. This omission may limit both prediction accuracy and the interpretability of results.

To address these limitations, we propose a novel non-negative matrix factorization-based drug repositioning model that incorporates biological context (NMFIBC). Our framework embeds biological similarity networks—capturing functional or semantic relationships among drugs and diseases—into a graph-regularized NMF optimization scheme. This design enhances predictive accuracy and ensures that inferred associations are not only statistically robust but also biologically meaningful.

2. Materials and Methods

2.1. Datasets

We employed two gold-standard datasets (Cdataset [16] and Fdataset [17]) to evaluate the performance of the proposed method in predicting novel drug–disease associations. These datasets vary in size, containing different numbers of drugs, diseases, and known associations, as detailed in Table 1. Each dataset includes precomputed drug similarity matrices, disease similarity matrices, and a binary association matrix that encodes known drug–disease interactions.

2.2. Standard NMF

Suppose we have a set of $n$ drugs and $m$ diseases, denoted as $D = \{d_{1}, d_{2}, ..., d_{n}\}$ and $S = \{s_{1}, s_{2}, ..., s_{m}\}$, respectively. The known indications of these drugs, that is, the drug–disease associations, are naturally represented by an $n \times m$ matrix $A$. As illustrated in Figure 1A, the binary value “0” or “1” of entry $A_{i, j}$ indicates the absence or presence of a known association between drug $d_{i}$ and disease $s_{j}$. Matrix factorization plays a pivotal role in various applications, particularly in recommendation systems, data analysis, and machine learning. Its primary function is to decompose a large, complex matrix into a product of two or more smaller, lower-rank matrices, as shown in Figure 1B. The matrix $A$ can be decomposed into two low-rank matrices $X$ and $Y$, where $A \approx X^{T} Y$, $X \in R^{k \times n}$, $Y \in R^{k \times m}$, and $rank(X) = rank(Y)$. Generally, the choice of the latent dimension $k$ may affect the accuracy of the final drug repositioning predictions. From Figure 1, it is evident that the drug repurposing problem can be addressed through matrix factorization. This approach leverages the inherent structure of the data to decompose the complex relationships between drugs and their potential indications into more manageable components, thereby facilitating the discovery of new uses for existing drugs.
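As a small illustration of the notation above, the following sketch builds a binary association matrix from a list of known pairs and checks that the factor shapes line up. The pair list, the sizes, and the helper name `build_association_matrix` are made up for illustration and do not come from Cdataset or Fdataset.

```python
import numpy as np


def build_association_matrix(pairs, n_drugs, n_diseases):
    """Return a binary n_drugs x n_diseases matrix with 1 at each known pair."""
    A = np.zeros((n_drugs, n_diseases), dtype=float)
    for drug_idx, disease_idx in pairs:
        A[drug_idx, disease_idx] = 1.0
    return A


# Hypothetical toy data: 4 drugs, 3 diseases, 5 known associations.
known_pairs = [(0, 0), (0, 2), (1, 1), (2, 1), (3, 0)]
A = build_association_matrix(known_pairs, n_drugs=4, n_diseases=3)

k = 2  # latent dimension, chosen arbitrarily for the example
rng = np.random.default_rng(0)
X = rng.random((k, A.shape[0]))  # k x n drug factors
Y = rng.random((k, A.shape[1]))  # k x m disease factors
A_approx = X.T @ Y               # n x m, same shape as A
```

With random factors the product is of course not a good approximation yet; the point is only that `X.T @ Y` has the same shape as `A`.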

NMF is a valuable tool in data science due to its ability to provide insights into the composition of non-negative datasets and its interpretability of the underlying structure. The workflow of NMF is shown in Figure 1B. It decomposes a non-negative data matrix $A$ into two non-negative matrices $X$ and $Y$ whose product approximates the original matrix, as formulated in Equation (1):

\min_{X, Y} \| A - X^{T} Y \|_{F}^{2}, \quad s.t. \; X \geq 0, Y \geq 0

(1)

where $\| \cdot \|_{F}$ is the Frobenius norm, $X \in R^{k \times n}$ and $Y \in R^{k \times m}$ ($k \ll \min \{m, n\}$) are the latent matrices, the columns $X_{i}$ and $Y_{j}$ can be regarded as drug-specific and disease-specific latent feature vectors, and $A_{i j} \approx X_{i}^{T} Y_{j}$. There are several ways to obtain $X$ and $Y$; the most popular is the multiplicative update algorithm of Lee and Seung [18].
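To make Equation (1) concrete, here is a minimal NumPy sketch of Lee and Seung's multiplicative updates, written for the $A \approx X^{T} Y$ convention used in this section. The function names, the toy matrix, the iteration count, and the small constant `eps` (added to avoid division by zero) are assumptions for illustration, not part of the original study.

```python
import numpy as np


def nmf_multiplicative(A, k, n_iter=200, eps=1e-10, seed=0):
    """Factorize A (n x m, non-negative) as X.T @ Y with X (k x n), Y (k x m)."""
    rng = np.random.default_rng(seed)
    n, m = A.shape
    X = rng.random((k, n)) + eps
    Y = rng.random((k, m)) + eps
    for _ in range(n_iter):
        # Multiplicative updates for the Frobenius objective; every factor
        # is non-negative, so X and Y stay non-negative.
        X *= (Y @ A.T) / (Y @ Y.T @ X + eps)
        Y *= (X @ A) / (X @ X.T @ Y + eps)
    return X, Y


def frobenius_loss(A, X, Y):
    """Squared Frobenius reconstruction error of Equation (1)."""
    return float(np.linalg.norm(A - X.T @ Y, "fro") ** 2)


# Hypothetical toy association matrix (4 drugs x 3 diseases).
A_toy = np.array([[1, 0, 1],
                  [0, 1, 0],
                  [0, 1, 0],
                  [1, 0, 0]], dtype=float)

X0, Y0 = nmf_multiplicative(A_toy, k=2, n_iter=0)   # random initialization only
X1, Y1 = nmf_multiplicative(A_toy, k=2, n_iter=200)
loss_before = frobenius_loss(A_toy, X0, Y0)
loss_after = frobenius_loss(A_toy, X1, Y1)
```

Setting `n_iter=0` returns the random initialization, which gives a baseline against which the fitted factors can be compared.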

2.3. Proposed Model: NMFIBC

The standard NMF can sometimes be sensitive to noise, overfit the data, or fail to capture the underlying structure effectively. Tikhonov regularization can stabilize the NMF algorithm and encourage smoothness of $X$ and $Y$ [19,20], especially when dealing with high-dimensional data or when the data are not well represented by a low-rank approximation. The NMF model with Tikhonov regularization has the following objective function:

\begin{array}{l} \min \| A - X^{T} Y \|_{F}^{2} + μ (\| X \|_{F}^{2} + \| Y \|_{F}^{2}) \\ = \sum_{i j} (A_{i j} - x_{i}^{T} y_{j})^{2} + μ (\sum_{i} \| x_{i} \|^{2} + \sum_{j} \| y_{j} \|^{2}) \\ s.t. \; X \geq 0, Y \geq 0 \end{array}

(2)

where $μ$ is a free regularization parameter; a second parameter, $λ$, weights the similarity terms in Equation (3). By adjusting the regularization parameters, one can control the trade-off between fitting the data and maintaining desired properties of the factors (such as smoothness), allowing more flexibility in model selection. Graph regularization can incorporate prior knowledge about the structure of the data, helping to preserve important relationships and patterns [21,22]. To incorporate such prior information on drugs and diseases, we developed a non-negative matrix factorization-based drug repositioning method that incorporates biological context (NMFIBC). It combines several data types, namely drug similarities $S_{d}$, disease similarities $S_{e}$, and known drug–disease associations, into a multifaceted network structure, as shown in Figure 2.

This model factorizes matrix $A$ into a pair of low-rank matrices that capture the latent attributes of drugs and diseases. It further imposes similarity constraints on both drugs and diseases in the latent space. The candidates identified by NMFIBC may provide a basis for further analysis and subsequent experimental verification. Based on these similarity constraints and Equation (2), the objective function of NMFIBC is formulated as Equation (3):

\begin{array}{l} L = \min \| A - X^{T} Y \|_{F}^{2} + μ (\| X \|_{F}^{2} + \| Y \|_{F}^{2}) + λ (\| X^{T} X - S_{d} \|_{F}^{2} + \| Y^{T} Y - S_{e} \|_{F}^{2}) \\ = \sum_{i j} (A_{i j} - x_{i}^{T} y_{j})^{2} + μ (\sum_{i} \| x_{i} \|^{2} + \sum_{j} \| y_{j} \|^{2}) \\ + λ (\sum_{i j} (x_{i}^{T} x_{j} - (S_{d})_{i j})^{2} + \sum_{i j} (y_{i}^{T} y_{j} - (S_{e})_{i j})^{2}) \\ s.t. \; X \geq 0, Y \geq 0 \end{array}

(3)
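The three parts of Equation (3) can be evaluated separately, which is handy when monitoring convergence or tuning $μ$ and $λ$. The sketch below is an illustrative NumPy transcription of the matrix form of the objective; the function name `nmfibc_objective` and the toy inputs are assumptions, not taken from the authors' MATLAB code.

```python
import numpy as np


def nmfibc_objective(A, X, Y, S_d, S_e, mu, lam):
    """Evaluate the objective of Equation (3) for given factors.

    Shapes: A is n x m, X is k x n, Y is k x m, S_d is n x n, S_e is m x m.
    """
    # Reconstruction error ||A - X^T Y||_F^2
    fit = np.linalg.norm(A - X.T @ Y, "fro") ** 2
    # Tikhonov term mu * (||X||_F^2 + ||Y||_F^2)
    tikhonov = mu * (np.linalg.norm(X, "fro") ** 2
                     + np.linalg.norm(Y, "fro") ** 2)
    # Similarity term lam * (||X^T X - S_d||_F^2 + ||Y^T Y - S_e||_F^2)
    similarity = lam * (np.linalg.norm(X.T @ X - S_d, "fro") ** 2
                        + np.linalg.norm(Y.T @ Y - S_e, "fro") ** 2)
    return float(fit + tikhonov + similarity)


# Toy check: factors that reproduce A and both similarity matrices exactly.
X_toy = np.eye(2)                                   # k = 2, n = 2
Y_toy = np.array([[1.0, 0.0, 1.0],
                  [0.0, 1.0, 0.0]])                 # k = 2, m = 3
A_toy = X_toy.T @ Y_toy
S_d_toy = X_toy.T @ X_toy
S_e_toy = Y_toy.T @ Y_toy

value_no_penalty = nmfibc_objective(A_toy, X_toy, Y_toy, S_d_toy, S_e_toy,
                                    mu=0.0, lam=1.0)
value_with_penalty = nmfibc_objective(A_toy, X_toy, Y_toy, S_d_toy, S_e_toy,
                                      mu=1.0, lam=1.0)
```

In this constructed case the fit and similarity terms vanish, so only the Tikhonov term contributes once $μ > 0$.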

2.4. Optimization Algorithm

In this section, we derive gradient-based multiplicative update rules for the two latent feature matrices, $X$ and $Y$, from the objective function in Equation (3). To solve the optimization problem of Equation (3), we introduce Lagrange multipliers, $Φ = [φ_{i k}]$ and $Ψ = [ψ_{j k}]$, to enforce the constraints $X \geq 0$ and $Y \geq 0$ [23]. The objective function can then be written in trace form as Equation (4):

\begin{array}{l} L_{f} = Tr (A A^{T} - A Y^{T} X - X^{T} Y A^{T} + X^{T} Y Y^{T} X) + μ Tr (X X^{T} + Y Y^{T}) \\ + λ Tr (S_{d} S_{d}^{T} - S_{d} X^{T} X - X^{T} X S_{d} + X^{T} X X^{T} X) \\ + λ Tr (S_{e} S_{e}^{T} - S_{e} Y^{T} Y - Y^{T} Y S_{e} + Y^{T} Y Y^{T} Y) \\ + Tr (Φ X^{T}) + Tr (Ψ Y^{T}) \end{array}

(4)

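The partial derivatives of $L_{f}$ with respect to $X$ and $Y$ are given in Equations (5) and (6), written up to a common factor of 2 that can be absorbed into the multipliers $Φ$ and $Ψ$, and using the symmetry of $S_{d}$ and $S_{e}$:

\frac{\partial L_{f}}{\partial X} = - Y A^{T} + Y Y^{T} X + μ X + 2 λ (- X S_{d} + X X^{T} X) + Φ

(5)

\frac{\partial L_{f}}{\partial Y} = - X A + X X^{T} Y + μ Y + 2 λ (- Y S_{e} + Y Y^{T} Y) + Ψ

(6)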

By using the Karush–Kuhn–Tucker (KKT) conditions [24], $φ_{i k} X_{i k} = 0$ and $ψ_{j k} Y_{j k} = 0$, and setting the derivatives in Equations (5) and (6) to zero, we obtain Equations (7) and (8):

{(- Y A^{T} + Y Y^{T} X)}_{i k} X_{i k} + {(μ X)}_{i k} X_{i k} + {(2 λ (- X S_{d} + X X^{T} X))}_{i k} X_{i k} = 0

(7)

{(- X A + X X^{T} Y)}_{j k} Y_{j k} + {(μ Y)}_{j k} Y_{j k} + {(2 λ (- Y S_{e} + Y Y^{T} Y))}_{j k} Y_{j k} = 0

(8)

Thus, the updating rules for $X$ and $Y$ can be obtained, as shown in Equations (9) and (10):

X_{i k} \leftarrow X_{i k} \frac{{(Y A^{T} + 2 λ X S_{d})}_{i k}}{{(Y Y^{T} X + μ X + 2 λ X X^{T} X)}_{i k}}

(9)

Y_{j k} \leftarrow Y_{j k} \frac{{(X A + 2 λ Y S_{e})}_{j k}}{{(X X^{T} Y + μ Y + 2 λ Y Y^{T} Y)}_{j k}}

(10)
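The step from the KKT conditions to the multiplicative rules can be made explicit. The following is a sketch of the standard argument for $X$ (the case for $Y$ is symmetric), assuming that $A$, $S_{d}$, and the current iterates are entrywise non-negative so that both grouped terms are non-negative:

```latex
% Move the terms of Eq. (7) that carry a minus sign to the right-hand side:
\begin{aligned}
{(Y Y^{T} X + \mu X + 2 \lambda X X^{T} X)}_{i k} \, X_{i k}
  &= {(Y A^{T} + 2 \lambda X S_{d})}_{i k} \, X_{i k} \\
% Reading this equality as a fixed-point condition gives the rule of Eq. (9):
X_{i k}
  &\leftarrow X_{i k} \,
  \frac{{(Y A^{T} + 2 \lambda X S_{d})}_{i k}}
       {{(Y Y^{T} X + \mu X + 2 \lambda X X^{T} X)}_{i k}}
\end{aligned}
```

Because every factor in the ratio is non-negative, an update of this form keeps $X$ non-negative whenever it is initialized with non-negative entries, and any fixed point with $X_{i k} > 0$ satisfies Equation (7).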

Upon adequate convergence of $X$ and $Y$, or upon reaching a predefined number of iterations, the predictive matrix $\hat{A}$ is reconstructed as the product of the obtained $X$ and $Y$, $\hat{A} = X^{T} Y$.

Algorithm 1 summarizes the procedure of NMFIBC for drug–disease association prediction.

Algorithm 1: Optimization algorithm for NMFIBC
	Input: Drug similarity matrix, $S_{d} \in R^{n \times n}$; disease similarity matrix, $S_{e} \in R^{m \times m}$;
	drug–disease association matrix, $A \in R^{n \times m}$; latent dimension of the feature space, $k < \min \{m, n\}$;
	regularization parameters, $μ > 0$, $λ > 0$; maximum iterations $M$.
	Output: The prediction matrix $\hat{A}$.
	Initialize $X \in R^{k \times n}$ and $Y \in R^{k \times m}$ with non-negative values;
	$times \leftarrow 0$;
	while $times < M$ do
		update $X$ by using Equation (9);
		update $Y$ by using Equation (10);
		$times \leftarrow times + 1$;
	end
	$\hat{A} \leftarrow X^{T} Y$;
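Algorithm 1 translates almost line by line into NumPy. The sketch below is one possible transcription of Equations (9) and (10), not the authors' MATLAB implementation: the function name `nmfibc`, the default values of `mu`, `lam`, and `max_iter`, the random initialization, and the small constant `eps` in the denominators are all assumptions made for illustration, as are the toy inputs.

```python
import numpy as np


def nmfibc(A, S_d, S_e, k, mu=0.1, lam=0.1, max_iter=200, eps=1e-10, seed=0):
    """Run the multiplicative updates of Eqs. (9) and (10).

    A: n x m binary associations; S_d: n x n drug similarity;
    S_e: m x m disease similarity. Returns (A_hat, X, Y) with
    X of shape k x n, Y of shape k x m, and A_hat = X.T @ Y.
    """
    rng = np.random.default_rng(seed)
    n, m = A.shape
    X = rng.random((k, n)) + eps   # non-negative initialization (assumed)
    Y = rng.random((k, m)) + eps
    for _ in range(max_iter):
        # Eq. (9): update drug factors
        X *= (Y @ A.T + 2 * lam * X @ S_d) / (
            Y @ Y.T @ X + mu * X + 2 * lam * X @ X.T @ X + eps)
        # Eq. (10): update disease factors
        Y *= (X @ A + 2 * lam * Y @ S_e) / (
            X @ X.T @ Y + mu * Y + 2 * lam * Y @ Y.T @ Y + eps)
    return X.T @ Y, X, Y


# Hypothetical toy inputs: 4 drugs, 3 diseases, identity similarities.
A_toy = np.array([[1, 0, 1],
                  [0, 1, 0],
                  [0, 1, 0],
                  [1, 0, 0]], dtype=float)
A_hat, X_fit, Y_fit = nmfibc(A_toy, np.eye(4), np.eye(3), k=2, max_iter=50)
```

A fixed iteration budget mirrors the `while times < M` loop in Algorithm 1; a convergence test on the change in $X$ and $Y$ could be added, but the stopping tolerance would be a further assumption.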

2.5. Evaluation Metrics

In this section, we describe the evaluation of five algorithms, the proposed NMFIBC and four baselines (IMCMDA [25], NCPMDA [26], RLSMDA [27], and SIMCLDA [28]), on two datasets, Cdataset and Fdataset. The metrics of interest include Area Under the Curve (AUC), Area Under the Precision-Recall Curve (AUPR), Accuracy (Acc), Sensitivity or Recall (Sen), Specificity (Spe), Precision (Pre), and F1 Score (F1). The threshold-based metrics are defined in Equations (11)–(16), where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively. Together, these metrics provide a holistic view of the classification capabilities of each algorithm. All the experiments were conducted using MATLAB 2023b on Windows 10, running on an Intel(R) Core(TM) i5-12400F at 2.50 GHz.

SN = \frac{TP}{TP + FN}

(11)

SP = \frac{TN}{TN + FP}

(12)

Acc = \frac{TP + TN}{TP + TN + FP + FN}

(13)

precision = \frac{TP}{TP + FP}

(14)

recall = \frac{TP}{TP + FN}

(15)

F1 = \frac{2 \times precision \times recall}{precision + recall}

(16)
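Equations (11)–(16) can be computed directly from the four confusion-matrix counts. The sketch below is illustrative: the function name, the example counts, and the convention of returning 0.0 when a denominator is zero are assumptions, not details reported in the study.

```python
def _ratio(numerator, denominator):
    """Divide, returning 0.0 for an empty denominator (assumed convention)."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(tp, tn, fp, fn):
    """Compute the threshold-based metrics of Eqs. (11)-(16)."""
    sn = _ratio(tp, tp + fn)                      # Eq. (11), sensitivity
    sp = _ratio(tn, tn + fp)                      # Eq. (12), specificity
    acc = _ratio(tp + tn, tp + tn + fp + fn)      # Eq. (13), accuracy
    precision = _ratio(tp, tp + fp)               # Eq. (14)
    recall = _ratio(tp, tp + fn)                  # Eq. (15), same as SN
    f1 = _ratio(2 * precision * recall, precision + recall)  # Eq. (16)
    return {"SN": sn, "SP": sp, "Acc": acc,
            "precision": precision, "recall": recall, "F1": f1}


# Made-up counts for illustration only.
metrics = classification_metrics(tp=8, tn=85, fp=2, fn=5)
```

Note that Equations (11) and (15) define the same quantity, so `SN` and `recall` always coincide.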

3. Results

3.1. Performance Evaluation and Metric Analysis

To assess the predictive capability of the proposed NMFIBC model, we conducted a comparative analysis against four state-of-the-art algorithms (IMCMDA, NCPMDA, RLSMDA, and SIMCLDA) on two benchmark datasets: Cdataset and Fdataset. The evaluation metrics include AUC, accuracy, sensitivity, specificity, precision, and F1 score. As summarized in Table 2 and Table 3, NMFIBC achieves the highest AUC scores on both datasets (0.921 on Cdataset and 0.894 on Fdataset), outperforming all competing methods.

Beyond AUC, NMFIBC also demonstrates superior performance in accuracy (0.993 on Cdataset and 0.990 on Fdataset), along with consistently high precision and F1 scores. Notably, the model maintains a balanced trade-off between sensitivity and specificity, with specificity values exceeding 0.99, indicating robustness in distinguishing both positive and negative associations. In contrast, other models such as IMCMDA and RLSMDA show moderate accuracy but comparatively lower recall and F1 scores, suggesting potential overfitting or class imbalance issues. Figure 3 presents a visual comparison of AUC and AUPR scores across the five models, while Figure 4 illustrates their F1 scores. These results suggest that incorporating biological similarity networks through graph-regularized optimization enables NMFIBC to generalize more effectively, especially when dealing with sparse and noisy biomedical data.
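For readers less familiar with AUC, it can be read as the probability that a randomly chosen positive pair receives a higher score than a randomly chosen negative pair. The sketch below computes it from that definition by pairwise comparison, counting ties as one half. It is an illustration with made-up labels and scores and makes no claim about how the AUC values in Table 2 and Table 3 were computed; the function name `auc_from_scores` is hypothetical.

```python
import numpy as np


def auc_from_scores(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    labels = np.asarray(labels).astype(bool)
    scores = np.asarray(scores, dtype=float)
    pos = scores[labels]
    neg = scores[~labels]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("AUC needs at least one positive and one negative.")
    # Compare every positive score with every negative score.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (pos.size * neg.size))


# Made-up example: two positives, two negatives.
auc_example = auc_from_scores([1, 1, 0, 0], [0.9, 0.2, 0.8, 0.1])
```

The pairwise form needs memory proportional to the number of positive-negative pairs, so for full-size association matrices a rank-based formulation would be preferable.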

3.2. Case Study Overview

To further assess the practical utility of NMFIBC, we conducted case studies on five representative drugs: Levodopa, Doxorubicin, Amantadine, Flecainide, and Tacrolimus. These drugs were selected due to their diverse clinical applications and well-documented pharmacological profiles. For each drug, we predicted the top five disease candidates that had no known associations in the original dataset. These predictions were ranked by their model-generated confidence scores and validated using biomedical databases such as DrugBank, Comparative Toxicogenomics Database (CTD), and Kyoto Encyclopedia of Genes and Genomes (KEGG). A summary of these case studies is provided in Table 4 (based on Cdataset) and Table 5 (based on Fdataset), which include the following: the original known associations (if any), the top five predicted diseases with scores, and supporting evidence from public databases or literature. These predictions serve as compelling demonstrations of how NMFIBC can uncover novel drug indications. Further interpretation and biological relevance of these results are provided in the Discussion Section.
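The candidate selection described above, ranking the diseases without a known association by predicted score and keeping the top five, can be sketched as follows. The function name `top_candidates`, the toy score matrix, and the indices are hypothetical; real use would map the column indices back to disease identifiers.

```python
import numpy as np


def top_candidates(scores, A, drug_index, top_k=5):
    """Return up to top_k (disease_index, score) pairs for one drug,
    skipping diseases already associated with it in A."""
    row = np.asarray(scores, dtype=float)[drug_index].copy()
    known = np.asarray(A)[drug_index] > 0
    row[known] = -np.inf                      # mask known associations
    order = np.argsort(-row, kind="stable")   # highest score first
    n_unknown = int((~known).sum())
    keep = order[:min(top_k, n_unknown)]
    return [(int(j), float(row[j])) for j in keep]


# Made-up scores for one drug over four diseases; disease 0 is already known.
scores_toy = np.array([[0.9, 0.2, 0.7, 0.4]])
A_known = np.array([[1, 0, 0, 0]])
candidates = top_candidates(scores_toy, A_known, drug_index=0, top_k=2)
```

Masking the known associations before sorting ensures that the returned list contains only pairs that were unlabeled in the training matrix.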

4. Discussion

The increasing complexity of biological systems and the explosion of high-dimensional biomedical data have placed AI at the forefront of modern drug discovery and systems biology. In this context, the integration of AI algorithms with domain-specific biological knowledge has become essential for deriving meaningful insights from heterogeneous biomedical datasets. Our proposed model, a nonnegative matrix factorization-based drug repositioning framework that incorporates biological context (NMFIBC), is an example of such an approach. By embedding biological similarity networks into the matrix factorization process, NMFIBC addresses the limitations of traditional models and enhances the discovery of novel, biologically plausible drug–disease associations.

In the case studies, NMFIBC was used to predict novel therapeutic applications for existing drugs. The known associations in each benchmark dataset served as training data, and all unknown drug–disease pairs were treated as candidate associations. After computing predictive scores for all candidate pairs with NMFIBC, we ranked the candidate diseases for each drug by score. To check these predictions, we selected Levodopa, Doxorubicin, Amantadine, Flecainide, and Tacrolimus as representative examples, examined the top five candidate diseases predicted for each, and recorded the supporting evidence. The predicted associations were checked against public databases, including DrugBank [42], CTD [43], and KEGG [44]. The predictions and supporting evidence are summarized in Table 4 and Table 5.

Levodopa, a well-established treatment for Parkinson’s disease, is highlighted in both datasets. In the original Fdataset, the relationship between Levodopa and Parkinson’s disease (168600) was not explicitly documented. However, our model recovered this clinically established drug–disease link, which supports its predictive power. On the Fdataset, the model also predicts additional candidate diseases such as insensitivity to pain with hyperplastic myelinopathy (147530) and restless leg syndrome (102300), with scores indicating the strength of these predictions. On the Cdataset, the model again associates Levodopa with Parkinson’s disease and proposes Alzheimer’s disease (605055) as a candidate for further investigation.

Doxorubicin, primarily used in cancer treatment, has existing links with diseases such as small cell cancer of the lung (182280) and breast cancer (114480) in the Fdataset. The predictions extend to other cancer types, indicating potential broad-spectrum activity against various malignancies. On the Cdataset, the predictions also center on cancer, with lymphoblastic leukemia (247640) and testicular germ cell tumor (273300) as new candidates.

Amantadine, traditionally used for influenza and Parkinson’s disease, has an existing association with multiple sclerosis (126200) in the Cdataset. On the Fdataset, the model predicts restless legs syndrome (102300) and Alzheimer’s disease (104300) as new candidates, suggesting a potential neuroprotective role for Amantadine beyond its current uses.

Flecainide, an antiarrhythmic medication, has an existing association with atrial fibrillation (607554) in the Cdataset and is predicted to have potential against dermatitis (603165) and allergic rhinitis (607154) on the Fdataset. This suggests that Flecainide may have immunomodulatory properties that could be harnessed for non-cardiac conditions.

Tacrolimus, an immunosuppressive drug, is associated with asthma (208550) and dermatitis (603165) on the Fdataset, hinting at its potential in treating autoimmune and inflammatory conditions. The Cdataset results also point to dermatitis (605805) and asthma (600807), further supporting its broad application in immunological disorders.

These case studies serve not only as illustrative examples of the model’s predictive capabilities but also as practical validations that demonstrate its biological relevance in real-world settings. While the core strength of NMFIBC lies in its systematic integration of biological similarity networks to enhance drug–disease association prediction, the case-level findings provide complementary insight into how these computational predictions translate into meaningful hypotheses for drug repurposing. Collectively, these findings underscore the potential of intelligent, context-aware modeling frameworks to advance our understanding of pharmacological mechanisms and to support hypothesis-driven biomedical research. The integration of rigorous benchmarking with interpretable, biologically grounded predictions makes this approach especially valuable in navigating the complexity of modern therapeutic discovery.

5. Conclusions

These case studies serve not only as illustrative examples of the model’s predictive capabilities but also as practical validations that demonstrate its biological relevance in real-world settings. While the core strength of NMFIBC lies in its systematic integration of biological similarity networks to enhance drug–disease association prediction, the case-level findings provide complementary insight into how these computational predictions can generate meaningful and testable hypotheses for drug repurposing.

Collectively, these findings underscore the potential of intelligent, context-aware modeling frameworks to advance our understanding of pharmacological mechanisms and support hypothesis-driven biomedical research. By combining rigorous performance benchmarking with biologically grounded interpretability, the NMFIBC framework contributes to bridging the gap between computational predictions and translational applications in therapeutic discovery.

Author Contributions

Conceptualization, Y.W. (Yangyang Wang), Y.W. (Yaping Wang) and J.W.; methodology, Y.W. (Yaping Wang) and Y.H.; validation, Y.W. (Yangyang Wang) and J.W., investigation, Y.W. (Yangyang Wang) and Y.W. (Yaping Wang); writing—original draft preparation, Y.W. (Yangyang Wang) and Y.W. (Yaping Wang); writing—review and editing, J.W.; funding acquisition, J.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Open Funds for Shaanxi Provincial Key Laboratory of Infection and Immune Diseases (No. 2023-KFMS-1); Research Project of Yan’an University (No. YAU202512552).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Jourdan, J.-P.; Bureau, R.; Rochais, C.; Dallemagne, P. Drug repositioning: A brief overview. J. Pharm. Pharmacol. 2020, 72, 1145–1151. [Google Scholar] [CrossRef] [PubMed]
Hua, Y.; Dai, X.; Xu, Y.; Xing, G.; Liu, H.; Lu, T.; Chen, Y.; Zhang, Y. Drug repositioning: Progress and challenges in drug discovery for various diseases. Eur. J. Med. Chem. 2022, 234, 114239. [Google Scholar] [CrossRef]
Low, Z.Y.; Farouk, I.A.; Lal, S.K. Drug repositioning: New approaches and future prospects for life-debilitating diseases and the COVID-19 pandemic outbreak. Viruses 2020, 12, 1058. [Google Scholar] [CrossRef] [PubMed]
Jarada, T.N.; Rokne, J.G.; Alhajj, R. A review of computational drug repositioning: Strategies, approaches, opportunities, challenges, and directions. J. Cheminform. 2020, 12, 46. [Google Scholar] [CrossRef]
Yu, J.-L.; Dai, Q.-Q.; Li, G.-B. Deep learning in target prediction and drug repositioning: Recent advances and challenges. Drug Discov. Today 2022, 27, 1796–1814. [Google Scholar] [CrossRef] [PubMed]
Sadeghi, S.; Lu, J.; Ngom, A. A network-based drug repurposing method via non-negative matrix factorization. Bioinformatics 2022, 38, 1369–1377. [Google Scholar] [CrossRef]
Lee, D.D.; Seung, H.S. Learning the parts of objects by non-negative matrix factorization. Nature 1999, 401, 788–791. [Google Scholar] [CrossRef]
Tang, Y.; Li, G.; Wu, Y.; Yi, D. Identifying potential miRNA-disease associations based on an improved manifold learning framework. IEEE Access 2020, 8, 33263–33275. [Google Scholar] [CrossRef]
Wang, Y.Y.; Cui, C.; Qi, L.; Yan, H.; Zhao, X.M. DrPOCS: Drug Repositioning Based on Projection onto Convex Sets. IEEE/ACM Trans. Comput. Biol. Bioinform. 2019, 16, 154–162. [Google Scholar] [CrossRef]
Liu, J.; Zuo, Z.; Wu, G. Link prediction only with interaction data and its application on drug repositioning. IEEE Trans. NanoBioscience 2020, 19, 547–555. [Google Scholar] [CrossRef]
Luo, J.; Ding, P.; Liang, C.; Cao, B.; Chen, X. Collective prediction of disease-associated miRNAs based on transduction learning. IEEE/ACM Trans. Comput. Biol. Bioinform. 2016, 14, 1468–1475. [Google Scholar] [CrossRef]
Zhang, W.; Liu, X.; Chen, Y.; Wu, W.; Wang, W.; Li, X. Feature-derived graph regularized matrix factorization for predicting drug side effects. Neurocomputing 2018, 287, 154–162. [Google Scholar] [CrossRef]
Zhang, W.; Yue, X.; Lin, W.; Wu, W.; Liu, R.; Huang, F.; Liu, F. Predicting drug-disease associations by using similarity constrained matrix factorization. BMC Bioinform. 2018, 19, 233. [Google Scholar] [CrossRef]
Yang, M.; Luo, H.; Li, Y.; Wang, J. Drug repositioning based on bounded nuclear norm regularization. Bioinformatics 2019, 35, i455–i463. [Google Scholar] [CrossRef] [PubMed]
Timilsina, M.; Tandan, M.; d’Aquin, M.; Yang, H. Discovering Links Between Side Effects and Drugs Using a Diffusion Based Method. Sci. Rep. 2019, 9, 10436. [Google Scholar] [CrossRef]
Luo, H.; Wang, J.; Li, M.; Luo, J.; Peng, X.; Wu, F.X.; Pan, Y. Drug repositioning based on comprehensive similarity measures and Bi-Random walk algorithm. Bioinformatics 2016, 32, 2664–2671. [Google Scholar] [CrossRef] [PubMed]
Gottlieb, A.; Stein, G.Y.; Ruppin, E.; Sharan, R. PREDICT: A method for inferring novel drug indications with application to personalized medicine. Mol. Syst. Biol. 2011, 7, 496. [Google Scholar] [CrossRef]
Lee, D.; Seung, H.S. Algorithms for non-negative matrix factorization. In Proceedings of the Advances in Neural Information Processing Systems 13 (NIPS 2000), Denver, CO, USA, 1 January 2000. [Google Scholar]
Guan, N.; Tao, D.; Luo, Z.; Yuan, B. Manifold regularized discriminative nonnegative matrix factorization with fast gradient descent. IEEE Trans. Image Process. 2011, 20, 2030–2048. [Google Scholar] [CrossRef]
Yuan, L.; Zhu, L.; Guo, W.-L.; Zhou, X.; Zhang, Y.; Huang, Z.; Huang, D.-S. Nonconvex penalty based low-rank representation and sparse regression for eQTL mapping. IEEE/ACM Trans. Comput. Biol. Bioinfor-Matics 2016, 14, 1154–1164. [Google Scholar] [CrossRef]
Cai, D.; He, X.; Han, J.; Huang, T.S. Graph regularized nonnegative matrix factorization for data representation. IEEE Trans. Pattern Anal. Mach. Intell. 2010, 33, 1548–1560.
Li, X.; Cui, G.; Dong, Y. Graph regularized non-negative low-rank matrix factorization for image clustering. IEEE Trans. Cybern. 2016, 47, 3840–3853.
Xiao, Q.; Luo, J.; Liang, C.; Cai, J.; Ding, P. A graph regularized non-negative matrix factorization method for identifying microRNA-disease associations. Bioinformatics 2017, 34, 239–248.
Facchinei, F.; Kanzow, C.; Sagratella, S. Solving quasi-variational inequalities via their KKT conditions. Math. Program. 2014, 144, 369–412.
Chen, X.; Wang, L.; Qu, J.; Guan, N.N.; Li, J.Q. Predicting miRNA-disease association based on inductive matrix completion. Bioinformatics 2018, 34, 4256–4265.
Gu, C.; Liao, B.; Li, X.; Li, K. Network Consistency Projection for Human miRNA-Disease Associations Inference. Sci. Rep. 2016, 6, 36054.
Chen, X.; Yan, G.Y. Semi-supervised learning for potential human microRNA-disease associations inference. Sci. Rep. 2014, 4, 5501.
Lu, C.; Yang, M.; Luo, F.; Wu, F.X.; Li, M.; Pan, Y.; Li, Y.; Wang, J. Prediction of lncRNA-disease associations based on inductive matrix completion. Bioinformatics 2018, 34, 3357–3364.
Newson, J.M.; Santos, C.D.; Walters, B.L.; Todd, B.R. The Case of Flecainide Toxicity: What to Look for and How to Treat. J. Emerg. Med. 2020, 59, e43–e47.
Araujo, G.; Nascimento, M.; Arruda, P.; Moraes, L.F.; Carvalho, M.D.A.; Medeiros, D.; Sarinho, E. The importance of tacrolimus in the treatment of allergic keratoconjunctivitis. World Allergy Organ. J. 2015, 8 (Suppl. S1), A234.
Taniguchi, H.; Tokui, K.; Iwata, Y.; Abo, H.; Izumi, S. A case of severe bronchial asthma controlled with tacrolimus. J. Allergy 2011, 2011, 479129.
Gupta, A.; Dai, Y.; Vethanayagam, R.R.; Hebert, M.F.; Thummel, K.E.; Unadkat, J.D.; Ross, D.D.; Mao, Q. Cyclosporin A, tacrolimus and sirolimus are potent inhibitors of the human breast cancer resistance protein (ABCG2) and reverse resistance to mitoxantrone and topotecan. Cancer Chemother. Pharmacol. 2006, 58, 374–383.
Nakata, S.; Kakimoto, K.; Numa, K.; Kinoshita, N.; Kawasaki, Y.; Tatsumi, Y.; Tawa, H.; Koshiba, R.; Hirata, Y.; Ota, K.; et al. Risk Factors for Nephrotoxicity due to Tacrolimus Therapy for Ulcerative Colitis. Digestion 2022, 103, 339–346.
Molloy, S.; McKeith, I.G.; O’Brien, J.T.; Burn, D.J. The role of levodopa in the management of dementia with Lewy bodies. J. Neurol. Neurosurg. Psychiatry 2005, 76, 1200–1203.
Vaculova, A.; Kaminskyy, V.; Jalalvand, E.; Surova, O.; Zhivotovsky, B. Doxorubicin and etoposide sensitize small cell lung carcinoma cells expressing caspase-8 to TRAIL. Mol. Cancer 2010, 9, 87.
Manchun, S.; Dass, C.R.; Cheewatanakornkool, K.; Sriamornsak, P. Enhanced anti-tumor effect of pH-responsive dextrin nanogels delivering doxorubicin on colorectal cancer. Carbohydr. Polym. 2015, 126, 222–230.
Habas, K.; Anderson, D.; Brinkworth, M.H. Germ cell responses to doxorubicin exposure in vitro. Toxicol. Lett. 2017, 265, 70–76.
Dai, S.; Ye, Z.; Wang, F.; Yan, F.; Wang, L.; Fang, J.; Wang, Z.; Fu, Z. Doxorubicin-loaded poly(ε-caprolactone)-Pluronic micelle for targeted therapy of esophageal cancer. J. Cell. Biochem. 2018, 119, 9017–9027.
Evidente, V.G.H.; Adler, C.H.; Caviness, J.N.; Hentz, J.G.; Gwinn-Hardy, K. Amantadine is beneficial in restless legs syndrome. Mov. Disord. 2000, 15, 324–327.
Andrikopoulos, G.K.; Pastromas, S.; Tzeis, S. Flecainide: Current status and perspectives in arrhythmia management. World J. Cardiol. 2015, 7, 76–85.
Canzanello, V.J.; Textor, S.C.; Taler, S.J.; Schwartz, L.L.; Porayko, M.K.; Wiesner, R.H.; Krom, R.A. Late hypertension after liver transplantation: A comparison of cyclosporine and tacrolimus (FK 506). Liver Transplant. Surg. 1998, 4, 328–334.
Wishart, D.S.; Feunang, Y.D.; Guo, A.C.; Lo, E.J.; Marcu, A.; Grant, J.R.; Sajed, T.; Johnson, D.; Li, C.; Sayeeda, Z.; et al. DrugBank 5.0: A major update to the DrugBank database for 2018. Nucleic Acids Res. 2018, 46, D1074–D1082.
Davis, A.P.; Wiegers, T.C.; Johnson, R.J.; Sciaky, D.; Wiegers, J.; Mattingly, C.J. Comparative Toxicogenomics Database (CTD): Update 2023. Nucleic Acids Res. 2023, 51, D1257–D1262.
Kanehisa, M.; Goto, S.; Sato, Y.; Kawashima, M.; Furumichi, M.; Tanabe, M. Data, information, knowledge and principle: Back to metabolism in KEGG. Nucleic Acids Res. 2014, 42, D199–D205.

Figure 1. Overview of the standard non-negative matrix factorization (NMF) workflow for drug–disease association prediction. (A) Binary drug–disease association matrix representing known interactions. (B) Decomposition of the matrix into two lower-dimensional matrices capturing latent drug and disease features for use in predictive modeling.
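The decomposition in panel (B) can be illustrated with a minimal sketch. It uses the classic multiplicative update rules for the squared Frobenius loss; this is one common way to fit a standard NMF and is not claimed to be the optimizer used in the paper. The toy matrix, the function names (`nmf`, `matmul`, `frob_error`), and the choice of rank 2 are all hypothetical, and plain Python lists are used only to keep the example dependency-free.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]

def frob_error(x, u, v):
    """Squared Frobenius norm of X - U V."""
    approx = matmul(u, v)
    return sum((xij - aij) ** 2
               for xr, ar in zip(x, approx) for xij, aij in zip(xr, ar))

def nmf(x, rank, n_iter=200, seed=0, eps=1e-9):
    """Factorize X (m x n) into non-negative U (m x rank) and V (rank x n)."""
    rng = random.Random(seed)
    m, n = len(x), len(x[0])
    u = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(m)]
    v = [[rng.random() + 0.1 for _ in range(n)] for _ in range(rank)]
    history = [frob_error(x, u, v)]
    for _ in range(n_iter):
        # U <- U * (X V^T) / (U V V^T), element-wise
        vt = transpose(v)
        num, den = matmul(x, vt), matmul(u, matmul(v, vt))
        u = [[u[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(m)]
        # V <- V * (U^T X) / (U^T U V), element-wise
        ut = transpose(u)
        num, den = matmul(ut, x), matmul(matmul(ut, u), v)
        v = [[v[k][j] * num[k][j] / (den[k][j] + eps) for j in range(n)]
             for k in range(rank)]
        history.append(frob_error(x, u, v))
    return u, v, history

# Hypothetical binary association matrix: rows are diseases, columns are drugs.
X = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
]
U, V, history = nmf(X, rank=2)
scores = matmul(U, V)  # reconstructed matrix; entries serve as association scores
```

Entries of `scores` at positions where `X` is zero are the quantities a repositioning model would rank; `history` records the reconstruction error per iteration and can be inspected to confirm that the fit improves.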

Figure 2. Workflow of the proposed NMFIBC framework integrating biological similarity networks. The model incorporates drug and disease similarity information into the matrix factorization process via graph-regularized constraints, enhancing prediction accuracy and biological relevance.
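To make "graph-regularized constraints" concrete, the sketch below uses the generic form from graph-regularized NMF (Cai et al., listed in the references): a penalty Tr(FᵀLF) with the unnormalized Laplacian L = D − W built from a similarity matrix W. This is an illustration of that generic form only; the exact NMFIBC objective is the one defined in Section 2.3. The similarity values and the latent vectors `smooth` and `rough` are hypothetical.

```python
def laplacian(w):
    """Unnormalized graph Laplacian L = D - W for a symmetric similarity matrix W."""
    n = len(w)
    return [[(sum(w[i]) if i == j else 0.0) - w[i][j] for j in range(n)]
            for i in range(n)]

def trace_penalty(f, lap):
    """Tr(F^T L F), where row i of F is the latent vector of item i."""
    n, k = len(f), len(f[0])
    return sum(f[i][c] * lap[i][j] * f[j][c]
               for i in range(n) for j in range(n) for c in range(k))

def pairwise_penalty(f, w):
    """0.5 * sum_ij W_ij * ||f_i - f_j||^2, the same quantity written pairwise."""
    n = len(f)
    return 0.5 * sum(w[i][j] * sum((a - b) ** 2 for a, b in zip(f[i], f[j]))
                     for i in range(n) for j in range(n))

# Hypothetical similarity among three drugs: drugs 0 and 1 are similar, drug 2 is not.
W = [[0.0, 0.9, 0.1],
     [0.9, 0.0, 0.1],
     [0.1, 0.1, 0.0]]
L = laplacian(W)

smooth = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]  # similar drugs share similar features
rough = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]   # similar drugs get dissimilar features

print(trace_penalty(smooth, L), trace_penalty(rough, L))
```

The penalty is smaller for `smooth` than for `rough`, which is the mechanism by which a similarity network pulls the latent features of similar drugs (or diseases) toward each other during factorization.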

Figure 3. AUC and AUPR scores of the five models (NMFIBC, IMCMDA, NCPMDA, RLSMDA, and SIMCLDA) evaluated on Cdataset and Fdataset. (A,B) AUC for Cdataset and Fdataset, respectively. (C,D) AUPR for Cdataset and Fdataset, respectively.

Figure 4. F1 scores of the five models on Cdataset and Fdataset.

Table 1. Dimensions of the drug similarity, disease similarity, and known drug–disease association matrices in the benchmark datasets.

Dataset	Drug Similarity Matrix	Disease Similarity Matrix	Known Association Matrix (Diseases × Drugs)
Cdataset	663 × 663	409 × 409	409 × 663
Fdataset	593 × 593	313 × 313	313 × 593

Table 2. Performance comparison of competing methods on Cdataset across multiple evaluation metrics.

Method	AUC	AUPR	Acc	Sen (Recall)	Spe	Pre	F1
NMFIBC	0.921	0.566	0.993	0.504	0.997	0.633	0.561
IMCMDA	0.647	0.041	0.979	0.129	0.988	0.090	0.106
NCPMDA	0.665	0.315	0.989	0.333	0.995	0.395	0.362
RLSMDA	0.758	0.048	0.979	0.149	0.987	0.096	0.116
SIMCLDA	0.866	0.057	0.954	0.392	0.959	0.083	0.136

Table 3. Performance comparison of competing methods on Fdataset across multiple evaluation metrics.

Method	AUC	AUPR	Acc	Sen (Recall)	Spe	Pre	F1
NMFIBC	0.894	0.443	0.990	0.430	0.996	0.501	0.463
IMCMDA	0.634	0.037	0.982	0.100	0.992	0.111	0.105
NCPMDA	0.644	0.165	0.981	0.343	0.988	0.224	0.271
RLSMDA	0.738	0.048	0.982	0.121	0.991	0.121	0.121
SIMCLDA	0.856	0.054	0.941	0.408	0.946	0.074	0.125
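As a quick consistency check on Tables 2 and 3, the F1 column can be recomputed from the Pre and Sen (Recall) columns, since F1 is the harmonic mean of precision and recall. The sketch below copies the reported values from the two tables; the helper names `f1` and `max_discrepancy` are illustrative.

```python
# (precision, recall, reported F1) copied from Tables 2 and 3.
REPORTED = {
    "Cdataset": {
        "NMFIBC": (0.633, 0.504, 0.561),
        "IMCMDA": (0.090, 0.129, 0.106),
        "NCPMDA": (0.395, 0.333, 0.362),
        "RLSMDA": (0.096, 0.149, 0.116),
        "SIMCLDA": (0.083, 0.392, 0.136),
    },
    "Fdataset": {
        "NMFIBC": (0.501, 0.430, 0.463),
        "IMCMDA": (0.111, 0.100, 0.105),
        "NCPMDA": (0.224, 0.343, 0.271),
        "RLSMDA": (0.121, 0.121, 0.121),
        "SIMCLDA": (0.074, 0.408, 0.125),
    },
}

def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

def max_discrepancy(table):
    """Largest absolute gap between recomputed and reported F1 in one table."""
    return max(abs(f1(p, r) - reported) for p, r, reported in table.values())

for dataset, table in REPORTED.items():
    print(dataset, round(max_discrepancy(table), 4))
```

Every recomputed value agrees with the reported F1 to within about 0.001, a gap of the size expected when precision, recall, and F1 are each rounded to three decimals.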

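Table 4. Top five predicted diseases for selected drugs based on Cdataset and their validation using public databases. Identifiers in parentheses are DrugBank IDs for drugs and OMIM IDs for diseases. Evidence: DB, DrugBank; KEGG, Kyoto Encyclopedia of Genes and Genomes; CTD, Comparative Toxicogenomics Database; bracketed numbers are reference citations.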

Drugs	Diseases (Existing Relations in Original Matrix)	Top Five Predicted Candidate Diseases (No Relation in Original Matrix)	Weight	Evidence
Levodopa (DB01235)	Paralysis agitans (168100); Parkinson disease (168600); Parkinson disease 2 (600116); Parkinson disease 7 (606324); Parkinson disease 15 (260300)	Dementia (125320)	0.761	DB/KEGG
		Alzheimer disease 9 (608907)	0.571	DB/KEGG
		Alzheimer disease (605055)	0.568	DB/KEGG
		Alzheimer disease 2 (104310)	0.560	DB/KEGG
		Alzheimer disease 5 (602096)	0.536	DB/KEGG
Doxorubicin (DB00997)	Mismatch repair cancer syndrome 1 (276300); Breast cancer (114480); Lymphoblastic leukemia (247640); Leukemia (601626); Lymphoma (236000)	Renal cell carcinoma (144700)	0.734	DB/KEGG
		Testicular germ cell tumor (273300)	0.692	DB
		Small cell cancer of the lung (182280)	0.654	DB
		Leukemia (246470)	0.651	KEGG
		Dohle bodies and leukemia (223350)	0.649	KEGG
Amantadine (DB00915)	Paralysis agitans (168100); Multiple sclerosis (126200); Popliteal pterygium syndrome (119500)	Parkinson’s disease 7 (606324)	0.337	DB/KEGG/CTD
		Parkinson’s disease 15 (260300)	0.325	DB/KEGG/CTD
		Schizophrenia (181500)	0.322	DB/KEGG
		Parkinson’s disease (168600)	0.318	DB/KEGG/CTD
		Parkinson’s disease 2 (600116)	0.318	DB/KEGG/CTD
Flecainide (DB01195)	Atrial fibrillation (607554)	Hypertension (608622)	0.688	[29]
		Renal failure (161900)	0.672	[29]
		Insensitivity to pain with hyperplastic myelinopathy (147530)	0.520	Unknown
		Raynaud disease (179600)	0.413	Unknown
		Atrial fibrillation (608583)	0.404	DB/KEGG/CTD
Tacrolimus (DB00864)	Dermatitis (603165); Dermatitis (605805); Dermatitis (605804); Dermatitis (605844)	Allergic rhinitis (607154)	0.625	[30]
		Asthma (208550)	0.462	[31]
		Asthma (600807)	0.438	[31]
		Breast cancer (114480)	0.424	[32]
		Renal failure (161900)	0.396	[33]

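Table 5. Top five predicted diseases for selected drugs based on Fdataset and their validation using public databases. Identifiers in parentheses are DrugBank IDs for drugs and OMIM IDs for diseases. Evidence: DB, DrugBank; KEGG, Kyoto Encyclopedia of Genes and Genomes; CTD, Comparative Toxicogenomics Database; bracketed numbers are reference citations.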

Drugs	Diseases (Existing Relations in Original Matrix)	Top Five Predicted Candidate Diseases (No Relation in Original Matrix)	Weight	Evidence
Levodopa (DB01235)	Paralysis agitans (168100); Restless legs syndrome (102300)	Parkinson’s disease (168600)	0.548	DB/KEGG/CTD
		Insensitivity to pain with hyperplastic myelinopathy (147530)	0.531	Unknown
		Dementia (125320)	0.451	DB/KEGG, [34]
		Renal failure (161900)	0.422	Unknown
		Attention deficit hyperactivity disorder (143465)	0.382	Unknown
Doxorubicin (DB00997)	Myeloma (254500); Breast cancer (114480); Neuroblastoma (256700); Leukemia (601626); Lymphoma (236000)	Small cell cancer of the lung (182280)	0.577	[35]
		Colorectal cancer (114500)	0.573	[36]
		Testicular germ cell tumor (273300)	0.530	[37]
		Kaposi sarcoma (148000)	0.518	DB/KEGG
		Esophageal cancer (133239)	0.513	[38]
Amantadine (DB00915)	Paralysis agitans (168100); Multiple sclerosis (126200); Popliteal pterygium syndrome (119500)	Dementia (125320)	0.365	DB/KEGG/CTD
		Parkinson’s disease (168600)	0.363	DB/KEGG/CTD
		Restless legs syndrome (102300)	0.295	[39]
		Alzheimer’s disease (104300)	0.227	DB/KEGG/CTD
		Alzheimer disease (605055)	0.216	DB/KEGG/CTD
Flecainide (DB01195)	Atrial fibrillation (607554)	Hypertension (608622)	0.597	[29]
		Renal failure (161900)	0.560	[29]
		Atrial fibrillation (608583)	0.524	DB/CTD, [29]
		Insensitivity to pain with hyperplastic myelinopathy (147530)	0.463	Unknown
		Stroke (601367)	0.335	[40]
Tacrolimus (DB00864)	Dermatitis (603165)	Renal failure (161900)	0.582	[33]
		Hypertension (608622)	0.490	[41]
		Asthma (208550)	0.381	[31]
		Insensitivity to pain with hyperplastic myelinopathy (147530)	0.376	Unknown
		Hypoparathyroidism (146255)	0.374	Unknown
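The candidate lists in Tables 4 and 5 follow a simple selection rule: for a given drug, rank all diseases by predicted weight and keep the top five that have no association in the original matrix. A minimal sketch of that rule is shown below; the function name `top_k_candidates`, the toy matrices, and the tie-breaking by disease index are assumptions made for illustration, not details taken from the paper.

```python
def top_k_candidates(scores, known, drug, k=5):
    """Rank diseases for one drug by predicted score, skipping known associations.

    scores and known are indexed as [disease][drug], following the
    diseases x drugs layout of the association matrices in Table 1.
    Returns a list of (disease_index, score) pairs, best first.
    """
    candidates = [(scores[d][drug], d)
                  for d in range(len(scores)) if not known[d][drug]]
    # Sort by descending score; ties are broken by the lower disease index.
    candidates.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(d, s) for s, d in candidates[:k]]

# Hypothetical toy data: 4 diseases x 2 drugs.
known = [[1, 0], [0, 0], [0, 1], [0, 0]]
scores = [[0.95, 0.10], [0.76, 0.20], [0.57, 0.90], [0.12, 0.40]]

print(top_k_candidates(scores, known, drug=0, k=2))  # [(1, 0.76), (2, 0.57)]
```

Disease 0 has the highest score for drug 0 but is excluded because the association is already known, which mirrors the "No Relation in Original Matrix" condition in the table headers.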

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

