Nuclear Magnetic Resonance and Artificial Intelligence

Kuhn, Stefan; de Jesus, Rômulo Pereira; Borges, Ricardo Moreira

doi:10.3390/encyclopedia4040102


by Stefan Kuhn ¹,*, Rômulo Pereira de Jesus ² and Ricardo Moreira Borges ²

¹ Institute of Computer Science, University of Tartu, 51009 Tartu, Estonia
² Instituto de Pesquisas de Produtos Naturais Walter Mors, Universidade Federal do Rio de Janeiro (UFRJ), Rio de Janeiro 21941-901, Brazil
* Author to whom correspondence should be addressed.

Encyclopedia 2024, 4(4), 1568-1580; https://doi.org/10.3390/encyclopedia4040102

Submission received: 25 August 2024 / Revised: 1 October 2024 / Accepted: 16 October 2024 / Published: 18 October 2024

(This article belongs to the Collection Nuclear Magnetic Resonance Techniques)


Abstract

This review explores the current applications of artificial intelligence (AI) in nuclear magnetic resonance (NMR) spectroscopy, with a particular emphasis on small molecule chemistry. Applications of AI techniques, especially machine learning (ML) and deep learning (DL) in the areas of shift prediction, spectral simulations, spectral processing, structure elucidation, mixture analysis, and metabolomics, are demonstrated. The review also shows where progress is limited.

Keywords:

nuclear magnetic resonance; NMR; artificial intelligence; spectrum prediction; metabolomics

1. Introduction

NMR spectroscopy is indispensable for the identification and structural elucidation of small compounds (e.g., metabolites). The resonance frequencies of each spin (mainly ¹³C and ¹H), also known as chemical shifts, together with their multiplicities, provide a unique fingerprint for different chemical environments within a molecule. By analyzing these shifts and peak profiles, analysts can deduce the connectivity, arrangement, and electronic environments of atoms in a molecule, making NMR an essential tool for determining the precise structure of organic and inorganic compounds. NMR spectroscopy also plays a crucial role in studying molecular dynamics and interactions. The technique can probe how molecules behave in different environments, offering insights into conformational changes, reaction mechanisms, and molecular interactions. This makes NMR invaluable not only for structural chemistry but also for understanding complex biological systems and metabolic pathways. In metabolomics, NMR is used to identify and quantify metabolites in biological samples, providing a comprehensive overview of metabolic processes [1]. By analyzing the NMR spectra of biofluids, tissues, or cells, researchers can gain insights into the metabolic state of an organism, detect biomarkers for diseases, and study the effects of drugs and other interventions [2].

The integration of artificial intelligence (AI) into NMR spectroscopy is changing the field by improving the accuracy, efficiency, and scope of analyses. This review outlines this progress. Most techniques mentioned here can be classified as machine learning (ML), so we mostly use the terms AI and ML interchangeably. Deep learning (DL) is a subset of ML comprising newer techniques. We do not delve into the definition of artificial intelligence, but include everything that helps to replace human expertise and input. The aim of these techniques is to transform data handling and interpretation, enabling more complex and larger-scale studies to be carried out automatically. This review focuses on advancements in small molecule chemistry and the potential for future developments. Apart from the references given in the text, we point the reader to two recent special issues on the topic [3,4] and the reviews [5,6,7,8] for more material and references. As this is an overview paper, we do not claim to cover the extensive literature exhaustively.

2. Databases and Data Standards

Large datasets are essential for training robust models capable of making accurate predictions. High-quality data that cover a broad range of chemical structures and experimental conditions enhance the model’s ability to generalize and perform well on new simulations. This is demonstrated, for example, in [9]. Correct data handling and storage practices are critical to achieving better accuracy in machine learning applications. In practice, data collections are not easily available for many problems in chemistry [10]. Apart from the simple lack of resources, available data are often difficult to access. A more formal definition of good practice is the FAIR (findable, accessible, interoperable, and reusable) data principles [11], which ensure that data are well-organized, easily accessible, and reusable by other researchers. Efficient data management systems that implement these principles can facilitate the integration of diverse datasets, enabling comprehensive analyses and more accurate predictions.

For NMR, the number of available resources has recently increased. For assigned NMR shifts, sdbs (https://sdbs.db.aist.go.jp/ (All websites in this paragraph accessed on 17 October 2024)) and nmrshiftdb2 (http://nmrshiftdb.org/), which recently celebrated its 20th anniversary [12], still seem to be the largest resources. A newly available resource is NP-MRD (https://np-mrd.org/), which holds experimental and (mostly) calculated NMR data for nearly 41,000 natural products. A bulk download is possible, but we did not manage to actually extract the data from it, mainly due to inconsistencies in the format and between files. NAPROC-13 (http://c13.usal.es) holds about 6000 natural products. nmrXiv (https://nmrxiv.org/) is part of the NFDI (Nationale Forschungsdateninfrastruktur) initiative by DFG (Deutsche Forschungsgemeinschaft) from Germany (https://www.nfdi.de/?lang=en) and is part of a landscape of research data management tools. Currently, the data in nmrXiv is very limited. Workflow4Metabolomics [13] is a platform for metabolomics workflows including NMR. It enables the saving and publication of analyses, going beyond pure analytical data.

Data standards are another important topic for machine learning, since only data that can be processed automatically are useful as training data. For example, the difficulties involved in reading the NP-MRD data [14] show the importance of data formats. In order to have fully available data, a record should contain raw data and extracted data (peaks), a full metadata record, and a complete set of spectra for a sample. The sample could be a compound, but also any other measurable substance. An NMReDATA record can hold all that information, raw data from the vendor, and annotated structure spectra in an NMReDATA file [15]. Other formats include JCAMP-DX [16] and CMLSpect [17], which hold extracted spectral data. nmrML [18], on the other hand, is focused on raw data. A consensus on which (meta)data should be reported would be important for collecting comparable data. Unfortunately, this does not exist so far. As part of NFDI4Chem, Minimum Information Metadata Standards for Chemical Investigations (MIChIs) have been developed, including NMR [19].

An alternative to using data collections is the use of artificial or computer-generated data. This includes spectra calculated using density functional theory (DFT), but also other tools or methods, depending on their suitability for the specific purpose. In any case, care needs to be taken to avoid the introduction of systematic errors from insufficiencies in the data generation.

3. Chemical Shift Prediction

One of the oldest applications of AI in conjunction with NMR is in shift prediction for compounds. This is typically used to compare measured shifts to the predicted shifts of potential structures of unknown compounds. Such comparisons can be carried out manually as part of a structure elucidation process, but can also be used during computer-aided structure elucidation (CASE) processes (see Section 6 for details). A recent example is [20], in which an optimization process guided by the similarity of the generated structures’ predicted spectra with the measured spectra was used to find good hits. Clearly, the prediction quality is a crucial factor for such applications. Quality here means primarily the closeness of the predicted shifts to the real shifts of these compounds, if they could be measured.
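
The comparison of measured and predicted shifts described above can be sketched as follows. This is a minimal illustration and not the method of [20]: the candidate names and all shift values are invented, and pairing shifts by sorted order is a simplifying assumption that ignores assignment ambiguities.

```python
from typing import Dict, List, Tuple


def mean_absolute_error(measured: List[float], predicted: List[float]) -> float:
    """MAE between two shift lists, paired by sorted order (an assumption)."""
    if len(measured) != len(predicted):
        raise ValueError("shift lists must have the same length")
    pairs = zip(sorted(measured), sorted(predicted))
    return sum(abs(m - p) for m, p in pairs) / len(measured)


def rank_candidates(
    measured: List[float], candidates: Dict[str, List[float]]
) -> List[Tuple[str, float]]:
    """Return (candidate name, MAE) pairs, best match first."""
    scored = [
        (name, mean_absolute_error(measured, shifts))
        for name, shifts in candidates.items()
    ]
    return sorted(scored, key=lambda item: item[1])


# Hypothetical 13C shifts in ppm; every value here is made up for illustration.
measured_shifts = [170.5, 60.4, 21.0, 14.2]
candidate_predictions = {
    "candidate_A": [171.0, 60.1, 20.8, 14.3],
    "candidate_B": [175.2, 52.0, 29.5, 9.8],
}
ranking = rank_candidates(measured_shifts, candidate_predictions)
```

In this toy case the candidate whose predicted shifts lie closest to the measured ones is ranked first; a real workflow would also have to handle missing or overlapping peaks.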

Therefore, a competition has been ongoing to improve prediction methods. In [21], a long term comparison was carried out between prediction quality for ¹H and ¹³C NMR. We continue this comparison in Table 1, which presents work undertaken since the publication of [21]. Figure 1 shows the results, including those from [21], in a single chart. This only contains results for small molecule solution NMR. We have also made an online version of this table and chart, which will be updated in the future. This is available at the following link: https://nmrshiftdb2.sourceforge.io/predictionhistory/history.html (accessed on 17 October 2024).

As before, the comparison is not a precise like-for-like comparison. In particular, the underlying datasets and the evaluation methods are not necessarily the same. See [21] for a thorough discussion. We still believe this is a valuable comparison.

Methods for NMR prediction can be divided into ab initio and data-driven methods. Ab initio methods, including DFT, do not use any shift data, but calculate shifts directly from quantum mechanical models. They are known for their accuracy, but are still very demanding in terms of computing power and are slow when used with common hardware. Data-driven methods, on the other hand, rely on collections of known shifts and try to use these to infer values for other compounds. These methods, including machine-learning approaches, can be very fast with standard hardware, but struggle to reach the accuracy of ab initio methods, which themselves are not AI methods, but can serve as a reference. Data-driven methods can be divided into three broad groups, namely increment-based, HOSE (Hierarchically Ordered Spherical Environment) code, and machine learning methods. Machine learning methods can be either older techniques or more recent deep learning methods. For graph structures like molecules, graph convolutional networks and message passing are fundamental techniques. For details, see [21].

Looking at the figures for ¹³C predictions, it can be seen that recent machine learning predictions are close to what is possible using DFT. It should be noted that the prediction in 8/2002 achieving 1.0 ppm was for a single compound. Recent results using deep learning methods have achieved errors below 1 ppm. For ¹H, ref. [22] achieved the best result reported so far, with a mean absolute error (MAE) of 0.1 ppm. This result relies on a single-solvent dataset, which had not been available before. Since solvents matter significantly for ¹H predictions, it is not possible to tell how much of the improvement is due to the new technique and how much to the new dataset. A very good result was also reported by [23], using a fragment-based approach. This is contrary to the recent trend towards graph neural network (GNN) approaches. In [21], it was concluded that, since the least squares linear regression line for ¹H was almost horizontal, real progress was not visible. Since the regression line is now clearly falling, this assessment has to be revised: a clear trend of improved results is now also visible for ¹H prediction.

All the above mentioned papers use a training set of a certain size (normally as big as possible). Judging a model on this alone was shown in [9,24] to give an incomplete picture. They suggested looking at the results of the model with differently sized training sets to get a full picture of the capabilities of a model. Another issue with shift prediction is the use of three-dimensional coordinates, which were used, e.g., in [25]. The generation of appropriate conformations is a problem on its own, considered in [26] using clustering techniques. In [27], AI methods were used to select the best conformer. The conformers were generated using a force field technique, and the prediction was undertaken using DFT, so the AI application was not for prediction directly, but for conformer selection.
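
The idea of judging a model at several training set sizes can be sketched with a toy example. This is only an illustration of the evaluation scheme, not the procedure of [9,24]: the data are synthetic, the one-variable linear model stands in for a real shift predictor, and all function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Model = Tuple[float, float]  # slope and intercept of a toy one-variable model


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Model:
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mae(model: Model, xs: Sequence[float], ys: Sequence[float]) -> float:
    """Mean absolute error of the model on the given points."""
    slope, intercept = model
    return sum(abs(slope * x + intercept - y) for x, y in zip(xs, ys)) / len(xs)


def learning_curve(xs, ys, sizes, test_fraction=0.2, seed=0) -> List[Tuple[int, float]]:
    """Train on nested subsets of growing size; test on one fixed held-out set."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, int(len(xs) * test_fraction))
    test, train = indices[:n_test], indices[n_test:]
    test_x, test_y = [xs[i] for i in test], [ys[i] for i in test]
    curve = []
    for size in sizes:
        subset = train[:size]
        model = fit_line([xs[i] for i in subset], [ys[i] for i in subset])
        curve.append((len(subset), mae(model, test_x, test_y)))
    return curve


# Synthetic data: a noisy linear relation standing in for descriptor -> shift.
rng = random.Random(1)
descriptor = [rng.uniform(0.0, 10.0) for _ in range(500)]
shift = [12.0 * d + 20.0 + rng.gauss(0.0, 2.0) for d in descriptor]
curve = learning_curve(descriptor, shift, sizes=[5, 20, 100, 400])
```

Each entry of `curve` pairs a training set size with the test error at that size, so models can be compared over the whole range rather than at a single point.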

Table 1. Historically achieved MAEs for various methods and datasets. This is a continuation of Table 1 in [21]. *** Root mean squared error instead of mean absolute error.

Date	¹H MAE (ppm)	¹³C MAE (ppm)	Method	Training Dataset	Test Dataset	Ref.
9/2023	0.209	2.18	FullSSPrUCe (GNN)	nmrshiftdb2	nmrshiftdb2	[28]
11/2023	0.168	2.938	ComENet	GlycoNMR 80%	GlycoNMR 10%	[29]
11/2023	0.145	2.550	DimeNet++	GlycoNMR 80%	GlycoNMR 10%	[29]
11/2023	0.140	2.492	SchNet	GlycoNMR 80%	GlycoNMR 10%	[29]
11/2023	0.146	3.044	SphereNet	GlycoNMR 80%	GlycoNMR 10%	[29]
12/2023	0.138	1.79	fragment-based	COLMAR	768 COLMAR Metabolites	[23]
3/2024	0.210	2.228	GNN	nmrshiftdb2 subset	HMDB and CH-NMR-NP	[30]
5/2024	0.10	-	GNN	PROSPRE 3755 compounds	PROSPRE 272 compounds	[22]
5/2024	-	0.7	DFT	-	132 shifts	[31]
6/2024	0.185	0.944	DFT+3D GNN	nmrshiftdb2 80%	nmrshiftdb2 20%	[32]
6/2024	-	0.9 ***	GNN	2026 organic molecules	171 benzenic structures	[33]
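
For reference, the two error measures that appear in Table 1 can be written out as follows. The notation is introduced here and is not taken from the cited papers: N is the number of evaluated shifts, and the superscripts pred and exp mark predicted and experimental shifts.

```latex
\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|\delta_i^{\mathrm{pred}} - \delta_i^{\mathrm{exp}}\right|,
\qquad
\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\delta_i^{\mathrm{pred}} - \delta_i^{\mathrm{exp}}\right)^{2}},
\qquad
\mathrm{MAE} \le \mathrm{RMSE}.
```

Because the RMSE is never smaller than the MAE on the same data, the entry marked *** can be read as an upper bound on the corresponding MAE.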

4. Spectral Simulations

In addition to shift values, an NMR spectrum is characterized by other parameters, most prominently coupling constants, which determine the shape of the signals. These can be predicted as well, using AI techniques. ${}^{1}J_{CH}$ couplings were predicted in [28] with a precision of 0.679 Hz and ${}^{1}J_{CC}$ of 0.194 Hz. For ${}^{1}J_{CH}$, ref. [25] reported 0.870 Hz. Ab initio-calculated training data were used in all cases, since there was no big enough dataset of assigned coupling constants available. Based on predicted shifts and coupling constants, ref. [28] provided simulated spectra, fully based on machine learning.

Another option is the simulation of spectral data, including chemical shifts and coupling constants, using high-level spin dynamics. Spinach [34] is a robust computational tool for simulating NMR experiments, as it is adept at handling complex spin systems through advanced mathematical frameworks. Central to Spinach is its ability to model the behaviour of spins—nuclei or electrons possessing a magnetic moment—under various magnetic fields. These spins interact among themselves and with external magnetic fields, a phenomenon mathematically encapsulated by the Hamiltonian. The Hamiltonian provides a comprehensive representation of the total energy of the spin system, accounting for Zeeman interactions, J-coupling, and dipole–dipole interactions. The spin system’s state is described by the density matrix, which includes information about the populations and quantum coherences of the spins. The temporal evolution of this state is governed by the Liouville–von Neumann equation. Its practical workflow includes defining the spin system by specifying the spins, their interactions, and the external magnetic field; constructing the Hamiltonian to form a comprehensive energy landscape; and setting the initial state of the system. Subsequent steps involve simulating the state evolution, applying radio frequency pulses, and detecting the resultant NMR signals. Spinach leverages these mathematical techniques to predict and analyze NMR spectra, facilitating a deeper understanding of complex spin dynamics and enabling the design of more effective NMR experiments. Spinach also uses deep learning techniques, e.g., to interpret double electron–electron resonance (DEER) data [35] or to assign protein NMR signals [36].
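
The quantities named above can be written compactly for the simplest case. The following is the standard textbook form for two coupled spin-1/2 nuclei in an isotropic liquid, given as an orientation aid and not taken from the Spinach documentation. It assumes units with ħ = 1, keeps only the Zeeman and J-coupling terms, and omits dipole–dipole interactions and relaxation. The symbols ω₁, ω₂ (offset frequencies), J (coupling constant in Hz), and s(t) (detected signal) are introduced here.

```latex
\frac{\partial \rho(t)}{\partial t} = -\,i\,\bigl[H, \rho(t)\bigr],
\qquad
\rho(t) = e^{-iHt}\,\rho(0)\,e^{+iHt} \quad \text{(time-independent } H\text{)}

H = \omega_1 I_{1z} + \omega_2 I_{2z} + 2\pi J\,\mathbf{I}_1 \cdot \mathbf{I}_2,
\qquad
s(t) = \mathrm{Tr}\bigl[(I_{1+} + I_{2+})\,\rho(t)\bigr]
```

The first line is the Liouville–von Neumann equation and its solution for a constant Hamiltonian; the second gives the Hamiltonian and the signal whose Fourier transform yields the spectrum.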

Using full spectrum simulations, it might be possible to identify fragments and substructures of compounds [37], simulate the chemical shifts and the J-coupling values, and generate the simulated spectra to be used as training datasets. These simulated spectra can then be used for machine learning algorithms, enhancing their ability to interpret complex NMR data and automate the identification process. By leveraging both experimental and simulated NMR data, we can achieve a deeper understanding of molecular structures and dynamics, pushing the boundaries of chemical and biochemical research.

5. Spectral Processing and Peak Picking

Peak picking, a critical step in automated NMR data analysis, has been enhanced by AI techniques. AI algorithms can automatically identify and quantify peaks in complex spectra, improving the accuracy and efficiency of the process [38]. DEEP picker [38] works on two-dimensional (2D) spectra, using a convolutional neural network (CNN). For one-dimensional (1D) spectra, ref. [39] provided an automatic deconvolution method. It uses a combination of deep learning techniques, including CNNs and long short-term memory (LSTM) networks, and is trained on synthetic data.

An alternative to peak picking frequently used in metabolomics is binning, where the intensities of a spectrum are integrated over certain areas ("bins"). Those bins can either be of fixed width, or use so-called "intelligent" binning methods. To our knowledge, these do not apply AI or machine learning methods, but use statistical evaluations (e.g., [40] or [41]).
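
Fixed-width binning as described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function name, the default bin width of 0.04 ppm, and the synthetic spectrum are chosen here, and summing the points in a bin is used as a simple stand-in for integration on an evenly spaced axis.

```python
import math
from typing import List, Sequence, Tuple


def bin_spectrum(
    ppm: Sequence[float], intensity: Sequence[float], width: float = 0.04
) -> Tuple[List[float], List[float]]:
    """Sum intensities into equal-width bins; returns (bin centres, bin sums)."""
    if len(ppm) != len(intensity) or len(ppm) == 0:
        raise ValueError("ppm and intensity must be non-empty and equally long")
    low, high = min(ppm), max(ppm)
    n_bins = max(1, math.ceil((high - low) / width))
    sums = [0.0] * n_bins
    for position, value in zip(ppm, intensity):
        # Clamp so that the point at the upper edge falls into the last bin.
        index = min(int((position - low) / width), n_bins - 1)
        sums[index] += value
    centres = [low + (i + 0.5) * width for i in range(n_bins)]
    return centres, sums


# Synthetic spectrum: 0 to 10 ppm in 0.01 ppm steps with constant intensity 1.
axis = [i / 100 for i in range(1001)]
signal = [1.0] * len(axis)
centres, binned = bin_spectrum(axis, signal, width=0.5)
```

The total intensity is preserved by construction, which is a useful sanity check when the binned values are later used as features.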

In [42], an algorithm to correct shift-uncertainties to better match known metabolic profiles was presented. This used a genetic algorithm combined with a least squares optimization.

6. (Pure Compound) Structure Elucidation

Computer-aided structure elucidation (CASE) systems represent a significant advancement in the field of chemical analysis by automating and enhancing the process of molecular structure determination. These systems leverage advanced algorithms, extensive spectral databases, and artificial intelligence to interpret NMR, MS, IR, and other types of spectral data, reducing the manual effort and potential for human error. CASE systems can automatically identify and label peaks in spectral data, simulate spectra from proposed structures for comparison with experimental data, and generate possible molecular structures based on spectral features. By interpreting 2D NMR experiments such as COSY, HSQC, and HMBC, these systems can deduce atom connectivity and build more complete and accurate molecular structures [43,44].

For the core process of structure finding, CASE systems have historically relied on optimization and heuristics. Optimization techniques are traditionally considered part of the research field of artificial intelligence. The core challenge is to find a structure that optimally matches the constraints provided, including the measured NMR spectra. Even if restricted by, e.g., a molecular formula, testing all possible candidates is not feasible; the number of structures to test can be reduced by using heuristics like simulated annealing (e.g., SENECA [45]), genetic algorithms (e.g., GENIUS [46]), or swarm intelligence (e.g., [47]).

Key examples of CASE software include Structure Elucidator (ACD/Labs), Mnova (Mestrelab), and open-source tools like Magma [48] and Sherlock [20]. These tools offer capabilities ranging from automated data processing and peak picking to structure generation and ranking, using probabilistic scoring methods such as DP4. The integration of AI techniques, such as machine learning and deep learning, can further enhance the ability of CASE systems to handle complex datasets and predict accurate structures. An example of this is the DeepSAT system [49]. As opposed to the directed generation of compounds to find one that matches the measured spectra best, DeepSAT predicts a fingerprint and a chemical class from the spectra using machine learning techniques and searches a database for the best fit to those predicted properties. In [49], good results using this technique were reported. It should be noted that the method is restricted by the database to search in, but these do not have to be NMR databases, so a much larger range of databases can be used than with a direct NMR search.
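
The database search step of such a fingerprint-based approach can be sketched as follows. The review does not describe which fingerprint or similarity measure DeepSAT uses, so the Tanimoto similarity on sets of bit positions is an assumption made here for illustration, and the database entries are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def search_database(
    predicted: FrozenSet[int], database: Dict[str, FrozenSet[int]], top_n: int = 3
) -> List[Tuple[str, float]]:
    """Rank database entries by similarity to the predicted fingerprint."""
    scored = [(name, tanimoto(predicted, bits)) for name, bits in database.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top_n]


# Hypothetical data: the "predicted" fingerprint would come from an ML model.
predicted_bits = frozenset({1, 4, 7, 9})
structure_db = {
    "compound_X": frozenset({1, 4, 7, 9, 12}),
    "compound_Y": frozenset({2, 4, 8}),
    "compound_Z": frozenset({1, 4, 7, 9}),
}
hits = search_database(predicted_bits, structure_db, top_n=2)
```

Because only structures are needed to compute the database fingerprints, the searched collection does not have to contain any NMR data, which matches the point made above about the wider range of usable databases.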

Despite the challenges of data quality and the interpretation of complex structures, CASE systems significantly improve the efficiency, accuracy, and consistency of structure elucidation, making advanced analytical techniques accessible to a wider range of researchers and professionals in chemistry.

Bai, M. et al. (2020) [50] determined the structures of four new alkaloids from the stems of Picrasma quassioides. The structures were elucidated through a combination of computer-aided structure elucidation software (ACD/Structure Elucidator v.18), 1D and 2D NMR data, and NMR chemical shift calculations based on the GIAO (gauge-independent atomic orbital) method. ACD/Structure Elucidator yielded connectivity information for the structural formula, and the simulated NMR data aided the chemical shift assignment. By comparing the experimental chemical shift values with the simulated values, the authors refined the proposed molecular structures. The GIAO simulation clarified ambiguities in the structures, especially in cases where the spectra presented overlaps or difficult-to-interpret peaks.

Natural products containing epoxide groups were reviewed using a hybrid parametric/DFT approach called DU8+, which enabled the evaluation of more than 20 structures, including achicretin 2, guaianolide A, artanomalide B, and chloroklotzchin. CASE was shown to be crucial for overcoming difficulties in interpreting NMR data, especially in molecules with complex structural features, such as the overlap of spin–spin coupling constants and the interpretation of NOE effects. This study, by Kutateladze et al., exemplified how advanced computational techniques could significantly improve the accuracy of structure assignment in natural products, contributing to the advancement of knowledge in organic chemistry and pharmacognosy [51].

Elyashberg and Argyropoulos (2020) [52] demonstrated the application of computer-assisted structure elucidation (CASE) in determining complex natural product structures. They highlighted that modern CASE systems, such as ACD/SE, utilized NMR data to elucidate molecular structures based on molecular formulas obtained from high-resolution MS and 1D and 2D NMR spectra, including COSY, HSQC, and HMBC. The authors also noted the adoption of new experimental approaches, such as long-range correlation experiments and pure-shift methods, which enhanced the capability for structural elucidation. Caffeine, quercetin, and salicylic acid are examples of substances successfully elucidated using these computational tools. The integration of computational chemistry methods and machine learning algorithms is anticipated to further advance the accuracy and reliability of structural elucidation, solidifying the role of CASE as an indispensable tool in contemporary chemical research.

Liu et al. (2017) [53] focused on elucidating the structure of cryptospirolepine, emphasizing the importance of the CASE approach in correcting previously mischaracterized structures. The original structure, proposed in 1993 based on NMR spectroscopy data, was later found to be incorrect. Utilizing advanced MicroCryoProbe technology and 1,1-HD-ADEQUATE experiments, the authors revised the structure in 2015 and proposed possible candidate structures by comparing experimental chemical shift data with theoretical values calculated using DFT. The comparison between the original and revised structures revealed a significantly stronger correlation between the experimental and theoretical data for the correct structure, as indicated by a much lower Q value (0.122 for the revised structure, compared to 0.245 for the original). This difference underscored the effectiveness of using residual dipolar couplings (RDC) and residual chemical shift anisotropy (RCSA) in structural validation, showing that the new approach not only corrected the structure but also provided a robust tool to avoid characterization errors in future studies. These findings highlighted the relevance of combining computational and experimental methods in elucidating complex molecular structures, particularly when crystallization is not feasible, as often occurs with natural compounds.

Howarth et al. (2023) [54] introduced a significant innovation in the structural elucidation of organic compounds with the development of the DP4-AI system. This automated system processes and interprets data from carbon and hydrogen nuclear magnetic resonance (NMR) spectroscopy, enabling the rapid and efficient analysis of complex molecules, such as 2,3-dihydroxy-1,4-naphthoquinone, and 1,2-dihydroxy-3,4-naphthoquinone. DP4-AI employs advanced algorithms for NMR peak selection and matches chemical shifts calculated via density functional theory (DFT) with experimental data, resulting in precise and reliable spectral assignments. With a 60-fold increase in processing speed compared to traditional methods, DP4-AI significantly reduces analysis time and the manual workload for chemists, who previously spent hours on data interpretation. This system optimizes the workflow in structural elucidation, minimizing the need for manual intervention and making the process more efficient and accessible for high-throughput analyses. DP4-AI’s capabilities are particularly valuable in research areas where speed and accuracy are critical, such as drug development and natural product characterization. The results obtained with DP4-AI underscore its potential to revolutionize organic chemistry, offering a robust tool that combines automation and computational intelligence to address the challenges of modern structural elucidation [54].

Transformers are a class of AI models that have recently been exploited for structure elucidation. An encoder–decoder architecture was used in [55]. In this approach, spectra are transformed to text (most prominently shift values) and structures are encoded as SMILES (simplified molecular-input line-entry system). The encoder–decoder network then learns how to transform those two representations into each other. For this, successive layers learn the context of the individual tokens in the text representations. In [55], accuracies for structure recall of 50% up to 100% (for very small molecules) were reported. An accuracy of 69.6% was reported in [56] for molecules of up to 19 heavy atoms. This model predicted substructures from the spectrum and assembled them into a solution, using a transformer model for the last step and another neural network for the first.
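
How spectra and structures can both be turned into token sequences is sketched below. The exact text format used in [55] is not given in this review, so the layout of the spectrum tokens is an assumption, and the SMILES tokenizer is a deliberately simplified regular expression rather than a full SMILES grammar.

```python
import re
from typing import List

# Simplified SMILES tokenizer: bracket atoms, Br/Cl, two-digit ring labels,
# single letters, digits, and common bond/branch symbols. Two-letter elements
# outside brackets other than Br and Cl are not handled.
_SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|[A-Za-z]|\d|[=#\-+()/\\.@:~*$]")


def tokenize_smiles(smiles: str) -> List[str]:
    """Split a SMILES string into tokens; reject characters not covered above."""
    tokens = _SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognised characters in SMILES: {smiles!r}")
    return tokens


def spectrum_to_tokens(nucleus: str, shifts_ppm: List[float]) -> List[str]:
    """Encode a peak list as text tokens: nucleus label, then sorted shifts."""
    ordered = sorted(shifts_ppm, reverse=True)
    return [nucleus] + [f"{shift:.1f}" for shift in ordered]


# Illustrative input/output pair; the shift values are approximate examples.
source_tokens = spectrum_to_tokens("13C", [20.9, 170.5, 60.44, 14.2])
target_tokens = tokenize_smiles("CC(=O)OCC")
```

A sequence-to-sequence model would be trained on many such pairs, with `source_tokens` as input and `target_tokens` as the sequence to be generated.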

Specifically in metabolomics, ref. [57] illustrated the use of quantum chemistry methods to identify metabolites in complex samples, addressing the challenge of characterizing the vast chemical diversity present in the metabolome. They explained how quantum calculations can be used to generate chemical properties and reference spectra for metabolites like caffeine, salicylic acid, and flavonoids, which are common in metabolomics studies. The research underscored the importance of developing in silico libraries for molecular identification, which can help overcome the limitation of the unavailability of authentic chemical standards for many compounds. They highlighted the need for free energy corrections and the inclusion of entropic contributions in the analysis of reaction mechanisms, which enhances the accuracy of predicting fragmentation patterns in mass spectrometry (MS) experiments. This approach not only aids in compound identification but also opens new avenues for designing reactions and catalysts, significantly advancing the field of metabolomics.

7. Mixtures and Metabolomics

NMR spectra can be recorded from mixtures in solution as well as from single compounds. In the case of mixtures, the peaks of the individual compounds show up in one spectrum as the sum of their intensities, which reflect their respective concentrations. Since it is not clear which peaks belong together to compose a single structure, the potential search space becomes even bigger than for single compounds. Dereplication strategies are still possible, implemented for example in MixONat [58]. COLMAR [59] uses a combination of HSQC and TOCSY spectra and queries them against a database. COLMAR NMR has emerged as a powerful analytical tool in metabolomics, natural products research, food science, environmental monitoring, and clinical diagnostics. Its application in metabolomics enables the broad profiling of biofluids such as urine [60], serum [61], and others [62,63], facilitating the identification and quantification of metabolites that reflect physiological and pathological states, including those associated with cancer, diabetes, and neurological disorders. When users reproduce the sample requirements, the method is robust and provides detailed spectral information with the capacity to unravel complex mixtures, offering valuable insights across scientific disciplines. Clustering methods which make use of shared shifts of peaks in 2D spectra were demonstrated in [64,65] as a tool named NMRfilter. Clustering was also used in [66], but required a separation, albeit a simple one compared to a full separation. NMRfilter is still underused by the community, but its application was shown in refs. [67,68]. SCORE-metabolite-ID [69] correlates NMR with MS data over the third dimension of the time course of a chromatographic fractionation. Similarly, DAFdiscovery [70] applies the concept of statistical heterospectroscopy (SHY) to correlate data from NMR, MS, and biological activity (or any combination of these) to pinpoint compounds of interest, mainly directed to natural products discovery.
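
The additive picture of a mixture spectrum described at the start of this section can be illustrated with synthetic data. This is a sketch under simplifying assumptions: Lorentzian lines of unit height, no multiplets, and invented peak positions and concentrations for two hypothetical compounds.

```python
from typing import List, Sequence


def lorentzian(x: float, centre: float, width: float = 0.02) -> float:
    """Lorentzian line of unit height; `width` is the full width at half maximum."""
    half = width / 2.0
    return half**2 / ((x - centre) ** 2 + half**2)


def compound_spectrum(axis: Sequence[float], peaks: Sequence[float]) -> List[float]:
    """Spectrum of one compound as a sum of unit-height lines at `peaks` (ppm)."""
    return [sum(lorentzian(x, p) for p in peaks) for x in axis]


def mixture_spectrum(
    component_spectra: Sequence[Sequence[float]], concentrations: Sequence[float]
) -> List[float]:
    """Point-wise sum of component spectra weighted by concentration."""
    if len(component_spectra) != len(concentrations):
        raise ValueError("one concentration per component is required")
    n_points = len(component_spectra[0])
    return [
        sum(c * spec[i] for spec, c in zip(component_spectra, concentrations))
        for i in range(n_points)
    ]


# Hypothetical two-component mixture on a 0-10 ppm axis with 0.01 ppm steps.
axis = [i / 100 for i in range(1001)]
spectrum_a = compound_spectrum(axis, [1.25, 3.60])
spectrum_b = compound_spectrum(axis, [2.10, 7.30])
mixture = mixture_spectrum([spectrum_a, spectrum_b], [2.0, 0.5])
```

In the resulting `mixture`, peak heights follow the concentrations, but nothing in the summed spectrum itself says which peaks belong to the same compound, which is the grouping problem the tools above address.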

With the emergence of AI technologies, more advanced analyses have become possible. In [37,71], a convolutional neural network was used to predict the chemical class of sub-fragments directly from 2D spectra. This demonstrates the power of deep learning technologies, since the network extracts a typical pattern from the training set, finds those in the new cases, performs fuzzy matches where needed, and performs the classification. This works for mixtures as well as for single compounds. Similar techniques (e.g., fingerprint prediction from spectra) were used in [49]. SMART-Miner [72] uses a CNN as well for the identification of components in mixtures. As opposed to using processed data, ref. [73] used raw data and identified functional groups by a combination of peak sampling and a recurrent neural network (RNN).

In metabolomics, both targeted and untargeted approaches have benefited from AI for statistical evaluation and for data interpretation. For a recent review of AI in metabolomics generally, not only in NMR, see [74]. Metabolomics stands for the comprehensive study of metabolites, which are small molecules produced or consumed in metabolic processes within a biological system, such as a cell, tissue, organ, or organism. It is used to understand the metabolic profiles and pathways involved in various physiological states, diseases, or responses to external factors like drugs or environmental changes by creating prediction models. These models are generally built using statistical regression methods, such as logistic regression (see [75]). However, as machine learning models began to outperform statistical models in other fields, they were increasingly applied to this task as well. Recently, ref. [76] used a neural network (called a metabolomic state model) to predict multidisease risks. This is an application of untargeted metabolomics, where the complete metabolome is measured. Peak picking and binning, described in Section 5, are typically used for further evaluation. This process is referred to as metabolomic fingerprinting. In [77], machine learning was used to evaluate features from LC-MS (liquid chromatography–mass spectrometry) and NMR spectra in order to diagnose renal cell carcinoma. This can potentially replace a costly combination of imaging and invasive techniques.

In the food sector, metabolic fingerprinting is used to determine various parameters of food, including origin, freshness, species (e.g., for meat), or processing steps. This was traditionally done using statistical methods, and has been applied to a variety of food (for example [78,79,80,81]). Recent studies have claimed that using AI techniques can distinguish samples which purely statistical methods cannot deal with, e.g., the origin of fish from closely related water bodies [82]. NMR is routinely used for fingerprinting of plasma and urine samples for human metabolomics studies. Metabolomics-based methods have immense potential for early disease diagnosis and monitoring therapeutic responses [83,84]. The identification and quantification of metabolites through NMR spectroscopy and mass spectrometry has enabled the discovery of novel biomarkers with the potential for early disease diagnosis and personalized treatment [85]. An open question when discussing AI applications is the explainability and trustworthiness of results. In [86], an AI application was demonstrated which performed quantification of a limited set of analytes and showed which regions of 1D spectra had been used for this.

A very important area in metabolomic profiling is lipoprotein profiling, pioneered by [87], which traditionally relies on deconvolution and curve fitting. Improved results using machine learning techniques, specifically random forest and logistic regression, were shown in [88].
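The traditional curve-fitting step can be sketched as a linear least-squares problem: if the positions and widths of the component signals are taken as known, the measured envelope is a weighted sum of reference lineshapes and only the amplitudes need to be estimated. The example below is a toy illustration, not the procedure of [87]; the three Lorentzian components, their positions and widths, and the noise level are invented for the demonstration.

```python
import numpy as np

def lorentzian(x, center, fwhm):
    """Unit-height Lorentzian lineshape."""
    hwhm = fwhm / 2
    return hwhm**2 / ((x - center) ** 2 + hwhm**2)

def fit_amplitudes(x, y, centers, fwhms):
    """Least-squares amplitudes for fixed-position, fixed-width components."""
    basis = np.column_stack([lorentzian(x, c, w) for c, w in zip(centers, fwhms)])
    amplitudes, *_ = np.linalg.lstsq(basis, y, rcond=None)
    return amplitudes, basis

# Synthetic overlapping envelope (all values are hypothetical).
rng = np.random.default_rng(0)
x = np.linspace(0.6, 1.0, 400)
centers = [0.78, 0.82, 0.86]
fwhms = [0.04, 0.04, 0.04]
true_amplitudes = np.array([1.0, 0.5, 2.0])
clean = sum(a * lorentzian(x, c, w) for a, c, w in zip(true_amplitudes, centers, fwhms))
measured = clean + rng.normal(0.0, 0.01, x.size)

estimated, basis = fit_amplitudes(x, measured, centers, fwhms)
residual = measured - basis @ estimated
print("estimated amplitudes:", np.round(estimated, 3))
print("residual RMS:", float(np.sqrt(np.mean(residual**2))))
```

Plain least squares is used here for brevity. It can return negative amplitudes for noisy or poorly separated components, so a practical deconvolution would usually add constraints such as non-negativity, or fit positions and widths as well.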

In contrast to untargeted metabolomics, targeted metabolomics measures a defined set of analytes. It is generally agreed that only a fraction of the relevant metabolites is currently known. A total of 150,000 metabolites is considered a lower limit, whereas the Human Metabolome Database (HMDB) lists about 42,000 [89]. In [90], a pipeline of deep learning techniques was used to identify metabolites in human plasma samples. This study reported the successful identification of known metabolites, but could not address the “metabolic dark matter”. For clinical and other purposes, a targeted analysis can identify conditions which an untargeted analysis may not show, as reported, for example, in [91].

8. Conclusions

This review highlights the significant role of artificial intelligence in enhancing nuclear magnetic resonance spectroscopy and metabolomics. NMR is still essential for identifying and understanding small molecules, and AI techniques, particularly machine learning, have improved the accuracy and efficiency of data analysis. By integrating AI, increasingly complex datasets can be handled. This is aided by the improved prediction of chemical shift data, and the more efficient simulation of spectra.

The adoption of FAIR data principles and advanced data management systems can further support these advancements. At the moment, data collection and accessibility remain an issue. By combining experimental and simulated data, AI-driven approaches are pushing the boundaries of NMR and metabolomics, enabling more precise compound identification and deeper insights into biological systems.

Author Contributions

Conceptualization, S.K. and R.M.B.; methodology, S.K. and R.M.B.; investigation, S.K., R.P.d.J. and R.M.B.; writing—original draft preparation, S.K., R.P.d.J. and R.M.B.; writing—review and editing, S.K., R.P.d.J. and R.M.B. All authors have read and agreed to the published version of the manuscript.

Funding

R.M.B. acknowledges the Fundação Carlos Chagas Filho de Amparo à Pesquisa do Estado do Rio de Janeiro (FAPERJ) for the grant 210.489/2019 APQ-1 and the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) for the grant 304501/2021-2. R.P.d.J. acknowledges the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES).

Data Availability Statement

No new data were created for this research.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:

AI	Artificial intelligence
CNN	Convolutional neural network
DL	Deep learning
ML	Machine learning
MRI	Magnetic resonance imaging
MRS	In vivo magnetic resonance spectroscopy
NMR	Nuclear magnetic resonance
RNN	Recurrent neural network

References

1. Borges, R.M.; Ferreira, G.A.; Campos, M.M.; Teixeira, A.M.; Costa, F.D.N.; das Chagas, F.O.; Colonna, M. NMR as a tool for compound identification in mixtures. Phytochem. Anal. 2023, 34, 385–392.
2. Wishart, D.S.; Cheng, L.L.; Copie, V.; Edison, A.S.; Eghbalnia, H.R.; Hoch, J.C.; Gouveia, G.J.; Pathmasiri, W.; Powers, R.; Schock, T.B.; et al. NMR and Metabolomics-A Roadmap for the Future. Metabolites 2022, 12, 678.
3. Journal of Magnetic Resonance. Special Issue: Artificial Intelligence in NMR, EPR, and MRI; Elsevier: Amsterdam, The Netherlands, 2022; Available online: https://www.sciencedirect.com/special-issue/106L0B084H8 (accessed on 17 October 2024).
4. Magnetic Resonance in Chemistry. Special Issue: Applications of Machine Learning and Artificial Intelligence in NMR; Wiley: Hoboken, NJ, USA, 2022; Volume 60.
5. Lu, X.Y.; Wu, H.P.; Ma, H.; Li, H.; Li, J.; Liu, Y.T.; Pan, Z.Y.; Xie, Y.; Wang, L.; Ren, B.; et al. Deep Learning-Assisted Spectrum–Structure Correlation: State-of-the-Art and Perspectives. Anal. Chem. 2024, 96, 7959–7975.
6. Shukla, V.K.; Heller, G.T.; Hansen, D.F. Biomolecular NMR spectroscopy in the era of artificial intelligence. Structure 2023, 31, 1360–1374.
7. Karamanos, T.K.; Matthews, S. Biomolecular NMR in the AI-assisted structural biology era: Old tricks and new opportunities. Biochim. Biophys. Acta (BBA)-Proteins Proteom. 2024, 1872, 140949.
8. Cortés, I.; Cuadrado, C.; Hernández Daranas, A.; Sarotti, A.M. Machine learning in computational NMR-aided structural elucidation. Front. Nat. Prod. 2023, 2, 1122426.
9. Kuhn, S.; Borges, R.M.; Venturini, F.; Sansotera, M. Dataset Size and Machine Learning-Open NMR Databases as a Case Study. In Proceedings of the 2022 IEEE 46th Annual Computers, Software, and Applications Conference (COMPSAC), Online, 27 June–1 July 2022.
10. For chemists, the AI revolution has yet to happen. Nature 2023, 617, 438.
11. Wilkinson, M.D.; Dumontier, M.; Aalbersberg, I.J.; Appleton, G.; Axton, M.; Baak, A.; Blomberg, N.; Boiten, J.W.; da Silva Santos, L.B.; Bourne, P.E.; et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 2016, 3, 160018.
12. Kuhn, S.; Kolshorn, H.; Steinbeck, C.; Schlörer, N. Twenty years of nmrshiftdb2: A case study of an open database for analytical chemistry. Magn. Reson. Chem. 2024, 62, 74–83.
13. Guitton, Y.; Tremblay-Franco, M.; Le Corguillé, G.; Martin, J.F.; Pétéra, M.; Roger-Mele, P.; Delabrière, A.; Goulitquer, S.; Monsoor, M.; Duperier, C.; et al. Create, run, share, publish, and reference your LC–MS, FIA–MS, GC–MS, and NMR data analysis workflows with the Workflow4Metabolomics 3.0 Galaxy online infrastructure for metabolomics. Int. J. Biochem. Cell Biol. 2017, 93, 89–101.
14. NP-MRD: The Natural Products Magnetic Resonance Database. Available online: https://pubpeer.com/publications/C08A991740F8D3D70C95F7CDE904C8 (accessed on 10 August 2024).
15. Kuhn, S.; Wieske, L.H.E.; Trevorrow, P.; Schober, D.; Schlörer, N.E.; Nuzillard, J.M.; Kessler, P.; Junker, J.; Herráez, A.; Farès, C.; et al. NMReDATA: Tools and applications. Magn. Reson. Chem. 2021, 59, 792–803.
16. Davies, A.N.; Lampen, P. JCAMP-DX for NMR. Appl. Spectrosc. 1993, 47, 1093–1099.
17. Kuhn, S.; Helmus, T.; Lancashire, R.J.; Murray-Rust, P.; Rzepa, H.S.; Steinbeck, C.; Willighagen, E.L. Chemical Markup, XML, and the World Wide Web. 7. CMLSpect, an XML Vocabulary for Spectral Data. J. Chem. Inf. Model. 2007, 47, 2015–2034.
18. Schober, D.; Jacob, D.; Wilson, M.; Cruz, J.A.; Marcu, A.; Grant, J.R.; Moing, A.; Deborde, C.; de Figueiredo, L.F.; Haug, K.; et al. nmrML: A Community Supported Open Data Standard for the Description, Storage, and Exchange of NMR Data. Anal. Chem. 2018, 90, 649–656.
19. Rayya, N. A New NMR MI Standard–Feedback Welcome. Available online: https://www.nfdi4chem.de/a-new-nmr-mi-standard-from-nfdi4chem/ (accessed on 10 August 2024).
20. Wenk, M.; Nuzillard, J.M.; Steinbeck, C. Sherlock—A Free and Open-Source System for the Computer-Assisted Structure Elucidation of Organic Compounds from NMR Data. Molecules 2023, 28, 1448.
21. Jonas, E.; Kuhn, S.; Schlörer, N. Prediction of chemical shift in NMR: A review. Magn. Reson. Chem. MRC 2022, 60, 1021–1031.
22. Sajed, T.; Sayeeda, Z.; Lee, B.L.; Berjanskii, M.; Wang, F.; Gautam, V.; Wishart, D.S. Accurate Prediction of 1H NMR Chemical Shifts of Small Molecules Using Machine Learning. Metabolites 2024, 14, 290.
23. Rigel, N.; Li, D.W.; Brüschweiler, R. COLMARppm: A Web Server Tool for the Accurate and Rapid Prediction of 1H and 13C NMR Chemical Shifts of Organic Molecules and Metabolites. Anal. Chem. 2024, 96, 701–709.
24. Rull, H.; Fischer, M.; Kuhn, S. NMR shift prediction from small data quantities. J. Cheminform. 2023, 15, 114.
25. Gerrard, W.; Bratholm, L.A.; Packer, M.J.; Mulholland, A.J.; Glowacki, D.R.; Butts, C.P. IMPRESSION—Prediction of NMR parameters for 3-dimensional chemical structures using machine learning with near quantum chemical accuracy. Chem. Sci. 2020, 11, 508–515.
26. Tanemura, K.A.; Das, S.; Merz, K.M., Jr. AutoGraph: Autonomous Graph-Based Clustering of Small-Molecule Conformations. J. Chem. Inf. Model. 2021, 61, 1647–1656.
27. Das, S.; Edison, A.S.; Merz, K.M.J. Metabolite Structure Assignment Using In Silico NMR Techniques. Anal. Chem. 2020, 92, 10412–10419.
28. Williams, J.; Jonas, E. Rapid prediction of full spin systems using uncertainty-aware machine learning. Chem. Sci. 2023, 14, 10902–10913.
29. Chen, Z.; Badman, R.P.; Foley, L.; Woods, R.; Hong, P. GlycoNMR: Dataset and benchmarks for NMR chemical shift prediction of carbohydrates with graph neural networks. arXiv 2023, arXiv:cs.LG/2311.17134.
30. Li, Y.; Xu, H.; Hong, P. AI-enabled prediction of NMR spectroscopy: Deducing 2-D NMR of carbohydrate. arXiv 2024, arXiv:cs.LG/2403.11353.
31. Ramos, S.A.; Mueller, L.J.; Beran, G.J.O. The interplay of density functional selection and crystal structure for accurate NMR chemical shift predictions. Faraday Discuss. 2024. advance article.
32. Han, C.; Zhang, D.; Xia, S.; Zhang, Y. Accurate Prediction of NMR Chemical Shifts: Integrating DFT Calculations with Three-Dimensional Graph Neural Networks. J. Chem. Theory Comput. 2024, 20, 5250–5258.
33. Duprat, F.; Ploix, J.L.; Dreyfus, G. Can Graph Machines Accurately Estimate ¹³C NMR Chemical Shifts of Benzenic Compounds? Prepr. Mol. 2024, 29, 3137.
34. Hogben, H.; Krzystyniak, M.; Charnock, G.; Hore, P.; Kuprov, I. Spinach—A software library for simulation of spin dynamics in large spin systems. J. Magn. Reson. 2011, 208, 179–194.
35. Worswick, S.G.; Spencer, J.A.; Jeschke, G.; Kuprov, I. Deep neural network processing of DEER data. Sci. Adv. 2018, 4, eaat5218.
36. Protein NMR Assignment with AI. Available online: https://spindynamics.org/wiki/index.php?title=Protein_NMR_Assignment_with_AI (accessed on 3 July 2024).
37. Kuhn, S.; Cobas, C.; Barba, A.; Colreavy-Donnelly, S.; Caraffini, F.; Borges, R.M. Direct deduction of chemical class from NMR spectra. J. Magn. Reson. 2023, 348, 107381.
38. Li, D.W.; Hansen, A.L.; Yuan, C.; Bruschweiler-Li, L.; Bruschweiler, R. DEEP picker is a deep neural network for accurate deconvolution of complex two-dimensional NMR spectra. Nat. Commun. 2021, 12, 5229.
39. Schmid, N.; Bruderer, S.; Paruzzo, F.; Fischetti, G.; Toscano, G.; Graf, D.; Fey, M.; Henrici, A.; Ziebart, V.; Heitmann, B.; et al. Deconvolution of 1D NMR spectra: A deep learning-based approach. J. Magn. Reson. 2023, 347, 107357.
40. De Meyer, T.; Sinnaeve, D.; Van Gasse, B.; Tsiporkova, E.; Rietzschel, E.R.; De Buyzere, M.L.; Gillebert, T.C.; Bekaert, S.; Martins, J.C.; Van Criekinge, W. NMR-Based Characterization of Metabolic Alterations in Hypertension Using an Adaptive, Intelligent Binning Algorithm. Anal. Chem. 2008, 80, 3783–3790.
41. Anderson, P.E.; Reo, N.V.; DelRaso, N.J.; Doom, T.E.; Raymer, M.L. Gaussian binning: A new kernel-based method for processing NMR spectroscopic data for metabolomics. Metabolomics 2008, 4, 261–272.
42. Schleif, F.M.; Riemer, T.; Börner, U.; Schnapka-Hille, L.; Cross, M. Genetic algorithm for shift-uncertainty correction in 1-D NMR-based metabolite identifications and quantifications. Bioinformatics 2010, 27, 524–533.
43. Burns, D.C.; Mazzola, E.P.; Reynolds, W.F. The role of computer-assisted structure elucidation (CASE) programs in the structure elucidation of complex natural products. Nat. Prod. Rep. 2019, 36, 919–933.
44. Milanowski, D.J.; Oku, N.; Cartner, L.K.; Bokesch, H.R.; Williamson, R.T.; Saurí, J.; Liu, Y.; Blinov, K.A.; Ding, Y.; Li, X.C.; et al. Unequivocal determination of caulamidines A and B: Application and validation of new tools in the structure elucidation tool box. Chem. Sci. 2018, 9, 307–314.
45. Steinbeck, C. SENECA: A Platform-Independent, Distributed, and Parallel System for Computer-Assisted Structure Elucidation in Organic Chemistry. J. Chem. Inf. Comput. Sci. 2001, 41, 1500–1507.
46. Meiler, J.; Will, M. Genius: A Genetic Algorithm for Automated Structure Elucidation from 13C NMR Spectra. J. Am. Chem. Soc. 2002, 124, 1868–1870.
47. Farrelly, C.; Kell, D.B.; Knowles, J. Molecular Structure Elucidation Using Ant Colony Optimization: A Preliminary Study. In International Conference on Ant Colony Optimization and Swarm Intelligence; Dorigo, M., Birattari, M., Blum, C., Clerc, M., Stützle, T., Winfield, A.F.T., Eds.; Springer: Berlin/Heidelberg, Germany, 2008; pp. 120–131.
48. Pritišanac, I.; Degiacomi, M.T.; Alderson, T.R.; Carneiro, M.G.; AB, E.; Siegal, G.; Baldwin, A.J. Automatic Assignment of Methyl-NMR Spectra of Supramolecular Machines Using Graph Theory. J. Am. Chem. Soc. 2017, 139, 9523–9533.
49. Kim, H.W.; Zhang, C.; Reher, R.; Wang, M.; Alexander, K.L.; Nothias, L.F.; Han, Y.K.; Shin, H.; Lee, K.Y.; Lee, K.H.; et al. DeepSAT: Learning Molecular Structures from Nuclear Magnetic Resonance Data. J. Cheminform. 2023, 15, 71.
50. Bai, M.; Zhao, W.Y.; Zhang, Y.J.; Zhang, Y.Y.; Huang, X.X.; Song, S.J. The identification of alkaloids from the stems of Picrasma quassioides via computer-assisted structure elucidation and quantum chemical calculations. J. Asian Nat. Prod. Res. 2021, 23, 217–227.
51. Kutateladze, A.G.; Kuznetsov, D.M.; Beloglazkina, A.A.; Holt, T. Addressing the Challenges of Structure Elucidation in Natural Products Possessing the Oxirane Moiety. J. Org. Chem. 2018, 83, 8341–8352.
52. Elyashberg, M.; Argyropoulos, D. Computer Assisted Structure Elucidation (CASE): Current and future perspectives. Magn. Reson. Chem. 2021, 59, 669–690.
53. Liu, Y.; Saurí, J.; Mevers, E.; Peczuh, M.W.; Hiemstra, H.; Clardy, J.; Martin, G.E.; Williamson, R.T. Unequivocal determination of complex molecular structures using anisotropic NMR measurements. Science 2017, 356, eaam5349.
54. Howarth, A.; Ermanis, K.; Goodman, J.M. DP4-AI automated NMR data analysis: Straight from spectrometer to structure. Chem. Sci. 2020, 11, 4351–4359.
55. Alberts, M.; Zipoli, F.; Vaucher, A.C. Learning the Language of NMR: Structure Elucidation from NMR spectra using Transformer Models. ChemRxiv 2023.
56. Hu, F.; Chen, M.S.; Rotskoff, G.M.; Kanan, M.W.; Markland, T.E. Accurate and efficient structure elucidation from routine one-dimensional NMR spectra using multitask machine learning. arXiv 2024, arXiv:2408.08284.
57. Borges, R.M.; Colby, S.M.; Das, S.; Edison, A.S.; Fiehn, O.; Kind, T.; Lee, J.; Merrill, A.T.; Merz, K.M.J.; Metz, T.O.; et al. Quantum Chemistry Calculations for Metabolomics. Chem. Rev. 2021, 121, 5633–5670.
58. Bruguière, A.; Derbré, S.; Dietsch, J.; Leguy, J.; Rahier, V.; Pottier, Q.; Bréard, D.; Suor-Cherer, S.; Viault, G.; Le Ray, A.M.; et al. MixONat, a Software for the Dereplication of Mixtures Based on 13C NMR Spectroscopy. Anal. Chem. 2020, 92, 8793–8801.
59. Bingol, K.; Li, D.W.; Zhang, B.; Brüschweiler, R. Comprehensive Metabolite Identification Strategy Using Multiple Two-Dimensional NMR Spectra of a Complex Mixture Implemented in the COLMARm Web Server. Anal. Chem. 2016, 88, 12411–12418.
60. Liu, H.; Tayyari, F.; Edison, A.S.; Su, Z.; Gu, L. NMR-based metabolomics reveals urinary metabolome modifications in female Sprague–Dawley rats by cranberry procyanidins. J. Nutr. Biochem. 2016, 34, 136–145.
61. Borchert, A.J.; Gouveia, G.J.; Edison, A.S.; Downs, D.M.; Pupo, M.T. Proton Nuclear Magnetic Resonance Metabolomics Corroborates Serine Hydroxymethyltransferase as the Primary Target of 2-Aminoacrylate in a ridA Mutant of Salmonella enterica. mSystems 2020, 5, 10–1128.
62. Maughon, T.S.; Shen, X.; Huang, D.; Michael, A.O.A.; Shockey, W.A.; Andrews, S.H.; McRae, J.M.; Platt, M.O.; Fernández, F.M.; Edison, A.S.; et al. Metabolomics and cytokine profiling of mesenchymal stromal cells identify markers predictive of T-cell suppression. Cytotherapy 2022, 24, 137–148.
63. DeRatt, B.N.; Ralat, M.A.; Lysne, V.; Tayyari, F.; Dhar, I.; Edison, A.S.; Garrett, T.J.; Øivind, M.; Ueland, P.M.; Nygård, O.K.; et al. Metabolomic Evaluation of the Consequences of Plasma Cystathionine Elevation in Adults with Stable Angina Pectoris. J. Nutr. 2017, 147, 1658–1668.
64. Bakiri, A.; Hubert, J.; Reynaud, R.; Lambert, C.; Martinez, A.; Renault, J.H.; Nuzillard, J.M. Reconstruction of HMBC Correlation Networks: A Novel NMR-Based Contribution to Metabolite Mixture Analysis. J. Chem. Inf. Model. 2018, 58, 262–270.
65. Kuhn, S.; Colreavy-Donnelly, S.; Santana de Souza, J.; Borges, R.M. An integrated approach for mixture analysis using MS and NMR techniques. Faraday Discuss. 2019, 218, 339–353.
66. Hubert, J.; Nuzillard, J.M.; Purson, S.; Hamzaoui, M.; Borie, N.; Reynaud, R.; Renault, J.H. Identification of Natural Metabolites in Mixture: A Pattern Recognition Strategy Based on 13C NMR. Anal. Chem. 2014, 86, 2955–2962.
67. Kuhn, S.; Colreavy-Donnelly, S.; de Andrade Silva Quaresma, L.E.; de Andrade Silva Quaresma, E.; Borges, R.M. Applying NMR compound identification using NMRfilter to match predicted to experimental data. Metabolomics 2020, 16, 123.
68. de Souza Wuillda, A.C.J.; das Neves Costa, F.; Garrett, R.; dos Santos de Carvalho, M.; Borges, R.M. High-speed countercurrent chromatography with offline detection by electrospray mass spectrometry and nuclear magnetic resonance detection as a tool to resolve complex mixtures: A practical approach using leaf extract. Phytochem. Anal. 2024, 35, 40–52.
69. Watermann, S.; Bode, M.C.; Hackl, T. Identification of metabolites from complex mixtures by 3D correlation of 1H NMR, MS and LC data using the SCORE-metabolite-ID approach. Sci. Rep. 2023, 13, 15834.
70. Borges, R.M.; das Neves Costa, F.; Chagas, F.O.; Teixeira, A.M.; Yoon, J.; Weiss, M.B.; Crnkovic, C.M.; Pilon, A.C.; Garrido, B.C.; Betancur, L.A.; et al. Data Fusion-based Discovery (DAFdiscovery) pipeline to aid compound annotation and bioactive compound discovery across diverse spectral data. Phytochem. Anal. 2023, 34, 48–55.
71. Kuhn, S.; Tumer, E.; Colreavy-Donnelly, S.; Moreira Borges, R. A pilot study for fragment identification using 2D NMR and deep learning. Magn. Reson. Chem. 2022, 60, 1052–1060.
72. Kim, H.W.; Zhang, C.; Cottrell, G.W.; Gerwick, W.H. SMART-Miner: A convolutional neural network-based metabolite identification from 1H-13C HSQC spectra. Magn. Reson. Chem. 2022, 60, 1070–1075.
73. Li, C.; Cong, Y.; Deng, W. Identifying molecular functional groups of organic compounds by deep learning of NMR data. Magn. Reson. Chem. 2022, 60, 1061–1069.
74. Chi, J.; Shu, J.; Li, M.; Mudappathi, R.; Jin, Y.; Lewis, F.; Boon, A.; Qin, X.; Liu, L.; Gu, H. Artificial intelligence in metabolomics: A current review. TrAC Trends Anal. Chem. 2024, 178, 117852.
75. Debik, J.; Sangermani, M.; Wang, F.; Madssen, T.S.; Giskeødegård, G.F. Multivariate analysis of NMR-based metabolomic data. NMR Biomed. 2022, 35, e4638.
76. Buergel, T.; Steinfeldt, J.; Ruyoga, G.; Pietzner, M.; Bizzarri, D.; Vojinovic, D.; Upmeier zu Belzen, J.; Loock, L.; Kittner, P.; Christmann, L.; et al. Metabolomic profiles predict individual multidisease outcomes. Nat. Med. 2022, 28, 2309–2320.
77. Bifarin, O.O.; Gaul, D.A.; Sah, S.; Arnold, R.S.; Ogan, K.; Master, V.A.; Roberts, D.L.; Bergquist, S.H.; Petros, J.A.; Fernández, F.M.; et al. Machine Learning-Enabled Renal Cell Carcinoma Status Prediction Using Multiplatform Urine-Based Metabolomics. J. Proteome Res. 2021, 20, 3629–3641.
78. Sundekilde, U.K.; Eggers, N.; Bertram, H.C. NMR-Based Metabolomics of Food. In NMR-Based Metabolomics: Methods and Protocols; Gowda, G.A.N., Raftery, D., Eds.; Springer: New York, NY, USA, 2019; pp. 335–344.
79. Ebrahimi, P.; Viereck, N.; Bro, R.; Engelsen, S.B. Chemometric Analysis of NMR Spectra. In Modern Magnetic Resonance; Webb, G.A., Ed.; Springer International Publishing: Cham, Switzerland, 2017; pp. 1–20.
80. Cui, C.; Xu, Y.; Jin, G.; Zong, J.; Peng, C.; Cai, H.; Hou, R. Machine learning applications for identify the geographical origin, variety and processing of black tea using 1H NMR chemical fingerprinting. Food Control 2023, 148, 109686.
81. Saeed, M.; Kim, J.S.; Kim, S.Y.; Ryu, J.E.; Ko, J.; Zaidi, S.F.A.; Seo, J.A.; Kim, Y.S.; Lee, D.Y.; Choi, H.K. Differentiation of Geographical Origin of White and Brown Rice Samples Using NMR Spectroscopy Coupled with Machine Learning Techniques. Metabolites 2022, 12, 1012.
82. Kuhn, S.; Reitel, K.; Homapour, E.; Kork, K.; Vaino, V.; Arula, T.; Bernotas, P.; Reile, I. Discriminating the origin of fish from closely related water bodies by combining NMR spectroscopy with statistical analysis and machine learning. Ecol. Inform. 2024, 83, 102753.
83. Khakimov, B.; Mobaraki, N.; Trimigno, A.; Aru, V.; Engelsen, S.B. Signature Mapping (SigMa): An efficient approach for processing complex human urine 1H NMR metabolomics data. Anal. Chim. Acta 2020, 1108, 142–151.
84. Georgiopoulou, P.D.; Chasapi, S.A.; Christopoulou, I.; Varvarigou, A.; Spyroulias, G.A. Untargeted 1H-NMR Urine Metabolomic Analysis of Preterm Infants with Neonatal Sepsis. Appl. Sci. 2022, 12, 1932.
85. Da Silva, L.; Godejohann, M.; Martin, F.P.J.; Collino, S.; Bürkle, A.; Moreno-Villanueva, M.; Bernhardt, J.; Toussaint, O.; Grubeck-Loebenstein, B.; Gonos, E.S.; et al. High-Resolution Quantitative Metabolome Analysis of Urine by Automated Flow Injection NMR. Anal. Chem. 2013, 85, 5801–5809.
86. Hayden, J.; Aaryani, T.S. Explainable AI to Facilitate Understanding of Neural Network-Based Metabolite Profiling Using NMR Spectroscopy. Metabolites 2024, 14, 332.
87. Otvos, J.D.; Jeyarajah, E.J.; Bennett, D.W. Quantification of plasma lipoproteins by proton nuclear magnetic resonance spectroscopy. Clin. Chem. 1991, 37, 377–386.
88. Daiana, I.; Dídac, L.; Cèlia, R.B.; Natalia, A.; Núria, P.; Roberto, S.; Ana, G.L.; Núria, A.; Josefa, G.; Lluí, M. The Lipoprotein Profile Evaluated by 1H-NMR Improves the Performance of Genetic Testing in Familial Hypercholesterolemia. J. Clin. Endocrinol. Metab. 2024, 109, e2090–e2099.
89. Markley, J.L.; Brüschweiler, R.; Edison, A.S.; Eghbalnia, H.R.; Powers, R.; Raftery, D.; Wishart, D.S. The future of NMR-based metabolomics. Curr. Opin. Biotechnol. 2017, 43, 34–40.
90. Wang, W.; Ma, L.H.; Maletic-Savatic, M.; Liu, Z. NMRQNet: A deep learning approach for automatic identification and quantification of metabolites using Nuclear Magnetic Resonance (NMR) in human plasma samples. bioRxiv 2023.
91. Embade, N.; Cannet, C.; Diercks, T.; Gil-Redondo, R.; Bruzzone, C.; Ansó, S.; Echevarría, L.R.; Ayucar, M.M.M.; Collazos, L.; Lodoso, B.; et al. NMR-based newborn urine screening for optimized detection of inherited errors of metabolism. Sci. Rep. 2019, 9, 13067.

Figure 1. Selected prediction results for ¹H and ¹³C NMR predictions from 1990 until June 2024, ordered by publication time. The light lines represent the least squares linear regression. The dashed light lines represent the regression for data up to February 2021 from [21]. The squares indicate ab initio calculations, the triangles represent deep learning, and the pentagons represent combined DFT and DL methods. See the text for an explanation and caveats to consider. Data up to February 2021 are from Table 1 in [21].

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
