1. Introduction
Genome-scale metabolic models (GEMs) can reveal the association between genotype and phenotype through gene–protein–reaction (GPR) rules [1,2,3,4]. They integrate all known genes, metabolites, and metabolic reactions of an organism into a mathematical framework and are reconstructed from genome sequence data [5,6]. GEMs are widely used for biological phenotype prediction [7,8], metabolic engineering [9,10], and biomedicine [11]. In recent years, many automated GEM reconstruction tools have emerged, which accelerate reconstruction and reduce its experimental cost compared with traditional manual reconstruction [12]. These automated tools, such as ModelSEED [13], CarveMe [14], and Gapseq [15], generally reconstruct GEMs from genome sequence data and relevant metabolic reaction databases. However, draft GEMs reconstructed by these tools commonly contain numerous network gaps caused by incomplete genome annotations or underground reactions [16,17]. Therefore, it is essential to identify and fill these network gaps in GEMs.
A series of optimization-based tools has been proposed for gap-filling. GrowMatch [18] resolves growth prediction inconsistencies by identifying the minimal set of restrictions that need to be imposed on the model. However, this method depends heavily on phenotypic data and is difficult to apply to non-model organisms. FASTGAPFILL [19] identifies a minimal set of reactions from biochemical reaction databases to restore metabolic flux in dead-end reactions. However, this method pursues mathematically optimal solutions and may ignore biological feasibility. Meneco [20] formulates gap-filling as an answer set programming (ASP) problem, which allows it to identify missing reactions and fill gaps in draft GEMs even at high degradation rates. However, this method tends to prioritize the shortest reaction paths, which may lack biological feasibility.
With the development of deep learning, deep learning-based methods are increasingly applied to gap-filling of GEMs [21,22]. These methods typically represent the metabolic network as a graph or hypergraph and cast the prediction of missing reactions as hyperedge prediction, which is combined with a candidate reaction pool to achieve gap-filling. CHESHIRE [23] employs Chebyshev spectral convolutional networks to capture topological features of metabolic networks, predicts missing reactions, and selects suitable candidate reactions from a universal reaction pool. HGNNP [24] is a hyperedge prediction strategy that employs a two-stage hypergraph convolution to capture higher-order interactions between metabolites and reactions. DSHCNet [25] further constructs homogeneous and heterogeneous graphs to identify substrates and products in GEMs based on Chebyshev spectral convolutional networks and selects appropriate reactions for gap-filling. Multi-HGNN [26] further uses a pre-trained model to extract features from molecular graphs of metabolites and integrates directed graphs with hypergraphs to predict missing reactions in GEMs. However, the prediction performance of these deep learning-based topological approaches still needs to be improved.
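To make the graph and hypergraph views of a metabolic network concrete, the following minimal sketch builds a metabolite-by-reaction incidence matrix, in which each reaction is a hyperedge over its metabolites, and derives the pairwise edges between metabolites that share a reaction. The toy reactions, metabolite names, and helper functions are illustrative assumptions and are not taken from any of the cited tools.

```python
# Toy reactions (hypothetical): each reaction maps to the set of metabolites it involves.
reactions = {
    "R1": {"glc", "atp", "g6p", "adp"},
    "R2": {"g6p", "f6p"},
    "R3": {"f6p", "atp", "fbp", "adp"},
}

metabolites = sorted(set().union(*reactions.values()))
reaction_ids = sorted(reactions)


def incidence_matrix(metabolites, reaction_ids, reactions):
    """Rows are metabolites, columns are reactions (hyperedges); 1 marks membership."""
    return [[1 if m in reactions[r] else 0 for r in reaction_ids] for m in metabolites]


def cooccurrence_edges(reactions):
    """Undirected edges between metabolites that appear together in at least one reaction."""
    edges = set()
    for members in reactions.values():
        ordered = sorted(members)
        for i, u in enumerate(ordered):
            for v in ordered[i + 1:]:
                edges.add((u, v))
    return edges


H = incidence_matrix(metabolites, reaction_ids, reactions)  # hypergraph view
edges = cooccurrence_edges(reactions)                       # pairwise graph view
```

A candidate reaction from a reaction pool is then simply one more column of the incidence matrix, and predicting whether it belongs to the network is a hyperedge prediction problem.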
In this study, we propose a novel topology-based approach, named GHCN-SE (Graph and Hypergraph Convolution Networks–Squeeze and Excitation), to predict candidate reactions and fill gaps in GEMs. GHCN-SE consists of three modules. In the feature extraction and fusion module, we take draft GEMs as input and use a graph convolutional network and a hypergraph convolutional network in parallel to extract, respectively, the associations of metabolites in the same reaction and the higher-order interactions of metabolites within reactions; the learnable initial metabolite embeddings are updated during training. In the feature enhancement module, we employ a squeeze-and-excitation network to enhance the metabolite features after the topological features are fused. In the output module, GHCN-SE uses a multi-layer perceptron to yield confidence scores for candidate reactions. To evaluate the performance of GHCN-SE, we selected 108 high-quality GEMs from the BiGG database [27] for training and testing. We conducted 5-fold cross-validation and compared GHCN-SE with state-of-the-art deep learning-based methods in terms of reaction prediction performance and reaction recovery performance. The reaction prediction results show that GHCN-SE achieves the best performance metrics among the compared methods. We further analyzed these results by network scale and biological category of the GEMs. The reaction recovery results show that GHCN-SE identifies the reactions within metabolic networks from real candidate reactions more effectively than the compared methods. Our ablation study shows the effectiveness of the graph convolutional network, hypergraph convolutional network, and squeeze-and-excitation network in GHCN-SE for candidate reaction prediction. Finally, a visualization study was conducted to interpret the effectiveness of the feature extraction and enhancement.
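As a rough illustration of the feature extraction and fusion idea, the sketch below applies one graph convolution and one hypergraph convolution to the same metabolite embeddings and concatenates the results. It assumes the widely used symmetric-normalized graph convolution and an HGNN-style hypergraph normalization with unit hyperedge weights; the exact layer definitions, dimensions, and fusion operator used in GHCN-SE are not given in this section, so the concatenation and all sizes here are assumptions.

```python
import numpy as np


def gcn_layer(A, X, W):
    """One graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    A_hat = A + np.eye(A.shape[0])                 # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ X @ W, 0.0)


def hypergraph_layer(H, X, W):
    """One hypergraph convolution with unit hyperedge weights:
    ReLU(Dv^-1/2 H De^-1 H^T Dv^-1/2 X W).
    Assumes every metabolite is in at least one reaction and no reaction is empty."""
    dv_inv_sqrt = 1.0 / np.sqrt(H.sum(axis=1))     # metabolite (node) degrees
    de_inv = 1.0 / H.sum(axis=0)                   # reaction (hyperedge) degrees
    H_scaled = H * dv_inv_sqrt[:, None]
    P = (H_scaled * de_inv[None, :]) @ H_scaled.T
    return np.maximum(P @ X @ W, 0.0)


rng = np.random.default_rng(0)
# Toy incidence matrix: 4 metabolites (rows) x 2 reactions (columns).
H = np.array([[1, 0], [1, 1], [0, 1], [0, 1]], dtype=float)
# Pairwise graph: metabolites are linked if they share a reaction.
A = ((H @ H.T) > 0).astype(float)
np.fill_diagonal(A, 0.0)

X = rng.normal(size=(4, 8))      # stands in for learnable initial embeddings
W_g = rng.normal(size=(8, 16))   # graph-branch weights (random here, trained in practice)
W_h = rng.normal(size=(8, 16))   # hypergraph-branch weights

# Fusion by concatenation is an assumption made for this sketch.
fused = np.concatenate([gcn_layer(A, X, W_g), hypergraph_layer(H, X, W_h)], axis=1)
```

In a trained model, `X`, `W_g`, and `W_h` would be parameters optimized by backpropagation rather than fixed random arrays.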
4. Discussion
Genome-scale metabolic models (GEMs) integrate all known genes, metabolites, and metabolic reactions into mathematical frameworks and can reveal the association between genotype and phenotype. Gap-filling is important for reconstructing high-quality GEMs. Many optimization-based algorithms rely heavily on phenotypic data and are difficult to apply to large-scale GEMs. Moreover, the prediction performance of existing deep learning-based topological approaches still needs to be improved.
In this study, we proposed a novel topology-based approach, GHCN-SE, for predicting candidate reactions in GEMs. In the feature extraction and fusion module, GHCN-SE employs a graph convolutional network and a hypergraph convolutional network in parallel to extract topological features from GEMs: the graph convolutional network captures associations of metabolites in the same reaction, and the hypergraph convolutional network captures higher-order interactions of metabolites within reactions. GHCN-SE uses learnable embeddings as the initial metabolite embeddings, which are updated continuously during training. In the feature enhancement module, a squeeze-and-excitation network enhances the fused metabolite features. In the output module, the node features are aggregated and a multi-layer perceptron outputs the confidence scores of candidate reactions.
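The feature enhancement and output steps can be sketched as follows. This is a minimal illustration, assuming that the squeeze step averages each feature channel over all metabolites, that the excitation step is a two-layer bottleneck with a sigmoid gate, and that a candidate reaction is scored by mean-pooling the features of its metabolites before a small perceptron. The pooling choice, the reduction ratio, and the function names are assumptions, not the configuration reported for GHCN-SE.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def squeeze_excite(X, W1, W2):
    """Channel-wise gating of metabolite features X (metabolites x channels)."""
    z = X.mean(axis=0)                           # squeeze: one descriptor per channel
    s = sigmoid(np.maximum(z @ W1, 0.0) @ W2)    # excitation: gates between 0 and 1
    return X * s, s                              # rescale every channel by its gate


def score_reaction(X, members, W_a, W_b):
    """Mean-pool the features of the member metabolites, then apply a two-layer perceptron."""
    pooled = X[members].mean(axis=0)
    hidden = np.maximum(pooled @ W_a, 0.0)
    return float(sigmoid(hidden @ W_b))          # confidence score


rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))       # 5 metabolites, 8 fused feature channels (toy sizes)
W1 = rng.normal(size=(8, 2))      # bottleneck with reduction ratio 4 (assumed)
W2 = rng.normal(size=(2, 8))
X_enh, gates = squeeze_excite(X, W1, W2)

# Candidate reaction involving metabolites 0, 2, and 3 (hypothetical indices).
score = score_reaction(X_enh, [0, 2, 3], rng.normal(size=(8, 4)), rng.normal(size=4))
```

The gate vector `gates` has one entry per feature channel, so inspecting it shows which channels the network amplifies or suppresses.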
To evaluate the reaction prediction and recovery performance of GHCN-SE, we conducted 5-fold cross-validation on 108 high-quality BiGG GEMs. GHCN-SE achieves the best performance among the compared state-of-the-art methods across all evaluation metrics. Specifically, GHCN-SE achieves average AUPRC, recall, F1 score, accuracy, and precision of 0.915, 0.808, 0.816, 0.818, and 0.827, respectively, outperforming the second-best model by 0.5%, 7%, 15.4%, 2.9%, and 0.6%. GHCN-SE also achieves the best performance in the reaction recovery evaluation: the average recovery rates for Top 25, Top 50, Top 100, and Top N are 0.160, 0.158, 0.152, and 0.132, outperforming the second-best model by 22.1%, 25.3%, 35.7%, and 28.2%, respectively. The ablation study further demonstrates the contributions of the graph convolutional network, hypergraph convolutional network, and squeeze-and-excitation network in GHCN-SE. Furthermore, the visualization results indicate that GHCN-SE represents and enhances metabolite features effectively.
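For readers who want to see how such metrics are computed from confidence scores, the sketch below derives precision, recall, F1 score, and accuracy at a decision threshold, together with a top-k hit rate. The threshold of 0.5 and the reading of the recovery rate as the fraction of true reactions among the k highest-scoring candidates are assumptions made for illustration; the precise definitions used in the evaluation are not restated in this section. The scores and labels are invented toy values.

```python
def classification_metrics(scores, labels, threshold=0.5):
    """Precision, recall, F1, and accuracy for binary labels at a score threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    tn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(labels)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


def top_k_hit_rate(scores, labels, k):
    """Fraction of the k highest-scoring candidates that are true reactions (assumed definition)."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sum(labels[i] for i in ranked) / k


# Toy confidence scores and ground-truth labels (1 = reaction belongs to the network).
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 0, 1, 1, 0, 0]

metrics = classification_metrics(scores, labels)
hit_rate_top2 = top_k_hit_rate(scores, labels, k=2)
```

Threshold-based metrics depend on the chosen cutoff, whereas ranking-based measures such as the top-k hit rate depend only on the order of the scores.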
GHCN-SE outperforms state-of-the-art topology-based approaches in both the reaction prediction and reaction recovery experiments. Notably, GHCN-SE also surpasses an existing related approach that integrates biochemical features of metabolites. These results demonstrate the potential of GHCN-SE for gap-filling of draft GEMs.
Despite the improvement achieved by GHCN-SE in the reaction prediction and recovery evaluations, the results also reveal limitations of the current model. As a supervised learning framework that relies on network topology, GHCN-SE is inevitably limited by the scale and reconstruction accuracy of the draft GEMs. In future research, biochemical information about metabolites and reactions, such as metabolite structures and enzyme information, could be incorporated, which is expected to broaden the applicability of the model to draft GEMs and constraint-based GEMs. Future research may also explore methods for the targeted restoration of connections between two specific metabolites.
Author Contributions
Conceptualization, K.W.; methodology, J.Q. and K.W.; software, J.Q.; validation, J.Q. and K.W.; formal analysis, J.Q. and K.W.; investigation, J.Q. and K.W.; resources, K.W.; data curation, J.Q.; writing—original draft preparation, J.Q. and K.W.; writing—review and editing, J.Q. and K.W.; visualization, J.Q. and K.W.; supervision, K.W.; project administration, K.W.; funding acquisition, K.W. All authors have read and agreed to the published version of the manuscript.
Funding
This research was supported by the National Key Research and Development Program of China (2024YFF1106400), and the National Natural Science Foundation of China (62373166).
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Acknowledgments
The authors thank the reviewers and editors for their work.
Conflicts of Interest
The authors declare no conflicts of interest.
References
- Oyserman, B.O.; Cordovez, V.; Flores, S.S.; Leite, M.F.A.; Nijveen, H.; Medema, M.H.; Raaijmakers, J.M. Extracting the GEMs: Genotype, environment, and microbiome interactions shaping host phenotypes. Front. Microbiol. 2021, 11, 574053.
- Nielsen, J. Systems biology of metabolism. Annu. Rev. Biochem. 2017, 86, 245–275.
- O’Brien, E.J.; Monk, J.M.; Palsson, B.O. Using genome-scale models to predict biological capabilities. Cell 2015, 161, 971–987.
- Oberhardt, M.A.; Palsson, B.Ø.; Papin, J.A. Applications of genome-scale metabolic reconstructions. Mol. Syst. Biol. 2009, 5, 320.
- Gu, C.; Kim, G.B.; Kim, W.J.; Kim, H.U.; Lee, S.Y. Current status and applications of genome-scale metabolic models. Genome Biol. 2019, 20, 121.
- Zhang, C.; Hua, Q. Applications of genome-scale metabolic models in biotechnology and systems medicine. Front. Physiol. 2016, 6, 413.
- Orth, J.D.; Thiele, I.; Palsson, B.Ø. What is flux balance analysis? Nat. Biotechnol. 2010, 28, 245–248.
- Harcombe, W.R.; Delaney, N.F.; Leiby, N.; Klitgord, N.; Marx, C.J. The ability of flux balance analysis to predict evolution of central metabolism scales with the initial distance to the optimum. PLoS Comput. Biol. 2013, 9, e1003091.
- Dong, Y.; Chen, Z. Systems metabolic engineering of Corynebacterium glutamicum for efficient L-tryptophan production. Synth. Syst. Biotechnol. 2025, 10, 511–522.
- Yim, H.; Haselbeck, R.; Niu, W.; Pujol-Baxley, C.; Burgard, A.; Boldt, J.; Khandurina, J.; Trawick, J.D.; Osterhout, R.E.; Stephen, R.; et al. Metabolic engineering of Escherichia coli for direct production of 1,4-butanediol. Nat. Chem. Biol. 2011, 7, 445–452.
- Oberhardt, M.A.; Yizhak, K.; Ruppin, E. Metabolically re-modeling the drug pipeline. Curr. Opin. Pharmacol. 2013, 13, 778–785.
- Gong, Z.; Chen, J.; Jiao, X.; Gong, H.; Pan, D.; Liu, L.; Zhang, Y.; Tan, T. Genome-scale metabolic network models for industrial microorganisms metabolic engineering: Current advances and future prospects. Biotechnol. Adv. 2024, 72, 108319.
- Seaver, S.M.D.; Liu, F.; Zhang, Q.; Jeffryes, J.; Faria, J.P.; Edirisinghe, J.N.; Mundy, M.; Chia, N.; Noor, E.; Beber, M.E.; et al. The ModelSEED Biochemistry Database for the integration of metabolic annotations and the reconstruction, comparison and analysis of metabolic models for plants, fungi and microbes. Nucleic Acids Res. 2021, 49, D575–D588.
- Machado, D.; Andrejev, S.; Tramontano, M.; Patil, K.R. Fast automated reconstruction of genome-scale metabolic models for microbial species and communities. Nucleic Acids Res. 2018, 46, 7542–7553.
- Zimmermann, J.; Kaleta, C.; Waschina, S. Gapseq: Informed prediction of bacterial metabolic pathways and reconstruction of accurate metabolic models. Genome Biol. 2021, 22, 81.
- Pan, S.; Reed, J.L. Advances in gap-filling genome-scale metabolic models and model-driven experiments lead to novel metabolic discoveries. Curr. Opin. Biotechnol. 2018, 51, 103–108.
- Li, F. Filling gaps in metabolism using hypothetical reactions. Proc. Natl. Acad. Sci. USA 2022, 119, e2217400119.
- Kumar, V.S.; Maranas, C.D. GrowMatch: An automated method for reconciling in silico/in vivo growth predictions. PLoS Comput. Biol. 2009, 5, e1000308.
- Thiele, I.; Vlassis, N.; Fleming, R.M.T. FASTGAPFILL: Efficient gap filling in metabolic networks. Bioinformatics 2014, 30, 2529–2531.
- Prigent, S.; Frioux, C.; Dittami, S.M.; Thiele, S.; Larhlimi, A.; Collet, G.; Gutknecht, F.; Got, J.; Eveillard, D.; Bourdon, J.; et al. Meneco, a topology-based gap-filling tool applicable to degraded genome-wide metabolic networks. PLoS Comput. Biol. 2017, 13, e1005276.
- Klamt, S.; Haus, U.U.; Theis, F. Hypergraphs and cellular networks. PLoS Comput. Biol. 2009, 5, e1000385.
- Chen, C.; Liu, Y.Y. A survey on hyperlink prediction. IEEE Trans. Neural Netw. Learn. Syst. 2023, 35, 15034–15050.
- Chen, C.; Liao, C.; Liu, Y.Y. Teasing out missing reactions in genome-scale metabolic networks through hypergraph learning. Nat. Commun. 2023, 14, 2375.
- Gao, Y.; Feng, Y.; Ji, S.; Ji, R.R. HGNN+: General hypergraph neural networks. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 3181–3199.
- Huang, W.; Yang, F.; Zhang, Q.; Liu, J. A dual-scale fused hypergraph convolution-based hyperedge prediction model for predicting missing reactions in genome-scale metabolic networks. Briefings Bioinform. 2024, 25, bbae383.
- Huang, Y.; Liang, X.; Lin, T.; Liu, J. Multi-modal hypergraph neural networks for predicting missing reactions in metabolic networks. Inf. Sci. 2025, 704, 121960.
- King, Z.A.; Lu, J.; Dräger, A.; Miller, P.; Federowicz, S.; Lerman, J.A.; Ebrahim, A.; Palsson, B.O.; Lewis, N.E. BiGG Models: A platform for integrating, standardizing and sharing genome-scale models. Nucleic Acids Res. 2016, 44, D515–D522.
- Kipf, T.N.; Welling, M. Semi-Supervised Classification with Graph Convolutional Networks. In Proceedings of the International Conference on Learning Representations (ICLR), Toulon, France, 24–26 April 2017; OpenReview: Toulon, France, 2017; pp. 1–14.
- Chien, E.; Pan, C.; Peng, J.H.; Milenkovic, O. You are allset: A multiset function framework for hypergraph neural networks. arXiv 2021, arXiv:2106.13264.
- Hu, J.; Shen, L.; Sun, G. Squeeze-and-Excitation Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018; IEEE: New York, NY, USA, 2018; pp. 7132–7141.
- Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. In Proceedings of the International Conference on Learning Representations (ICLR), San Diego, CA, USA, 7–9 May 2015; OpenReview: San Diego, CA, USA, 2015; Volume 5, pp. 1–10.
- Maaten, L.; Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605.
Figure 1. Overall architecture of the proposed GHCN-SE.
Figure 2. Raincloud plots and average values of reaction prediction performance on 108 BiGG GEMs for GHCN-SE and three baseline models, in terms of (A) AUPRC, (B) recall, (C) F1 score, (D) accuracy, and (E) precision.
Figure 3. Bar chart of the average values and distributions of reaction prediction performance of GHCN-SE under different network scales, with error bars representing the standard deviation.
Figure 4. Bar chart of the average values and distributions of reaction prediction performance of GHCN-SE classified by species, with error bars representing the standard deviation.
Figure 5. Raincloud plots and average values of reaction recovery performance on 108 BiGG GEMs for GHCN-SE and three baseline models when selecting the (A) Top 25, (B) Top 50, (C) Top 100, and (D) Top N reactions.
Figure 6. Bar chart of the average values and distributions of prediction performance of GHCN-SE and three variant approaches on 108 BiGG GEMs, with error bars representing the standard deviation.
Figure 7. t-SNE visualization of the distributions of the initial embeddings and the enhanced embeddings before and after training for (A) iML1515 and (B) Recon3D. The orange point denotes hexanoate, the blue point denotes oxidized glutathione, the yellow point denotes itaconate, the green point denotes cob(II)alamin, and the purple point denotes D-sorbitol.
Figure 8. Visualization of feature enhancement using the squeeze-and-excitation network for (A) iML1515 and (B) Recon3D.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.