- Article
Identifying Metabolite–Disease Associations via Messaging in Hypergraphs
- Fuheng Xiao
- Yihao Ran
- Zhanchao Li
Background: Traditional machine-learning approaches struggle to integrate diverse biological information when predicting metabolite–disease relationships. The intricate connections among metabolites, diseases, proteins, and Gene Ontology (GO) annotations are a substantial obstacle for conventional pairwise graph representations, which are inadequate for modeling such complex multi-way interactions.
Methods: A hypergraph-based framework (DHG-LGB) was developed to capture this complexity by treating diseases as hyperedges. In this architecture, each hyperedge links multiple vertices, including metabolites, proteins, and GO annotations, which gives a richer representation of the biological networks underlying metabolite–disease relationships. Metabolite–disease relationships were encoded as low-dimensional vectors through hypergraph neural network (HGNN) operations that incorporate Laplacian smoothing and message propagation. LightGBM (LGB) was then used to build a model for identifying potential metabolite–disease associations.
Results: Under 5-fold cross-validation, DHG-LGB achieved 98.87% accuracy, 91.77% sensitivity, 99.58% specificity, 95.60% precision, a Matthews correlation coefficient (MCC) of 0.9305, an area under the receiver operating characteristic curve (AUC) of 0.9983, and an area under the precision-recall curve (AUPRC) of 0.9860. The framework maintained strong performance across positive-to-negative ratios from 1:1 to 1:10, with AUC values consistently exceeding 0.9954 and AUPRC values above 0.9820, indicating strong robustness and generalization capability. Comparative evaluations showed that DHG-LGB outperformed existing methods.
Conclusions: The DHG-LGB framework models biological interactions more comprehensively than conventional approaches and substantially improves predictive accuracy for metabolite–disease relationships. It is expected to be a valuable computational tool for biomarker identification and precision medicine initiatives.
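The hypergraph construction and smoothing step described in the Methods can be illustrated with a minimal sketch. It assumes the commonly used HGNN propagation operator Dv^(-1/2) H W De^(-1) H^T Dv^(-1/2), where H is the vertex-by-hyperedge incidence matrix, and it omits the learnable weights and nonlinearity of a full HGNN layer. The function name `hgnn_propagate`, the toy vertices, and the feature values are hypothetical and are not taken from DHG-LGB itself; plain Python is used only to keep the example dependency-free.

```python
import math

def hgnn_propagate(H, X, edge_weights=None):
    """One smoothing step: Dv^(-1/2) H W De^(-1) H^T Dv^(-1/2) X.

    H: incidence matrix as nested lists, H[i][e] = 1 if vertex i is in hyperedge e.
    X: vertex features as nested lists, one row per vertex.
    edge_weights: optional hyperedge weights (assumed to be 1.0 if omitted).
    """
    n_nodes, n_edges, dim = len(H), len(H[0]), len(X[0])
    w = edge_weights or [1.0] * n_edges

    # Vertex degree: weighted number of hyperedges containing the vertex.
    dv = [sum(H[i][e] * w[e] for e in range(n_edges)) for i in range(n_nodes)]
    # Hyperedge degree: number of vertices in the hyperedge.
    de = [sum(H[i][e] for i in range(n_nodes)) for e in range(n_edges)]

    # Scale vertex features by Dv^(-1/2); isolated vertices map to zero.
    Xs = [[x / math.sqrt(dv[i]) if dv[i] > 0 else 0.0 for x in X[i]]
          for i in range(n_nodes)]

    # Vertex -> hyperedge: weighted mean of member features (W De^(-1) H^T).
    E = [[w[e] / de[e] * sum(H[i][e] * Xs[i][k] for i in range(n_nodes))
          if de[e] > 0 else 0.0
          for k in range(dim)]
         for e in range(n_edges)]

    # Hyperedge -> vertex: sum incoming messages, then rescale (Dv^(-1/2) H).
    return [[sum(H[i][e] * E[e][k] for e in range(n_edges)) / math.sqrt(dv[i])
             if dv[i] > 0 else 0.0
             for k in range(dim)]
            for i in range(n_nodes)]

# Toy hypergraph (hypothetical): rows are vertices, columns are disease hyperedges.
# disease_1 links metabolite_1, protein_1, go_term_1; disease_2 links both metabolites.
H = [
    [1, 1],  # metabolite_1
    [0, 1],  # metabolite_2
    [1, 0],  # protein_1
    [1, 0],  # go_term_1
]
X = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.2, 0.8]]  # made-up vertex features
smoothed = hgnn_propagate(H, X)
```

In a pipeline of the kind the abstract outlines, smoothed vertex representations such as these would be combined into metabolite–disease feature vectors and passed to a gradient-boosting classifier; how DHG-LGB does this in detail is not specified in the abstract.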
9 February 2026







