Review

Enzyme Databases in the Era of Omics and Artificial Intelligence

by Uroš Prešern and Marko Goličnik *
Institute of Biochemistry and Molecular Genetics, Faculty of Medicine, University of Ljubljana, Vrazov trg 2, 1000 Ljubljana, Slovenia
* Author to whom correspondence should be addressed.
Int. J. Mol. Sci. 2023, 24(23), 16918; https://doi.org/10.3390/ijms242316918
Submission received: 3 November 2023 / Revised: 24 November 2023 / Accepted: 26 November 2023 / Published: 29 November 2023
(This article belongs to the Special Issue Molecular Advances in Enzyme Kinetics)

Abstract

Enzyme research is important for the development of various scientific fields such as medicine and biotechnology. Enzyme databases facilitate this research by providing a wide range of information relevant to research planning and data analysis. Over the years, various databases that cover different aspects of enzyme biology (e.g., kinetic parameters, enzyme occurrence, and reaction mechanisms) have been developed. Most of the databases are curated manually, which improves reliability of the information; however, such curation cannot keep pace with the exponential growth in published data. Lack of data standardization is another obstacle for data extraction and analysis. Improving machine readability of databases is especially important in the light of recent advances in deep learning algorithms that require big training datasets. This review provides information regarding the current state of enzyme databases, especially in relation to the ever-increasing amount of generated research data and recent advancements in artificial intelligence algorithms. Furthermore, it describes several enzyme databases, providing the reader with necessary information for their use.

1. Introduction

Enzymes represent a large and diverse group of biomolecules, catalyzing various chemical reactions in all living organisms. Although most enzymes are proteins, RNA molecules with catalytic functions (so-called ribozymes) exist as well [1]. Furthermore, synthetic deoxyribozymes have been designed [2], whereas natural deoxyribozymes remain to be discovered. Typically, enzymes represent 20–30% of the whole proteome of an organism; e.g., 22% of all human proteins possess some degree of catalytic function, whereas this proportion is higher for Saccharomyces cerevisiae (27%) and Escherichia coli (38%) (based on data from the protein database UniProtKB) [3]. Enzyme research spans various fields, from basic to applied life sciences, such as the medical [4], pharmaceutical [5], agricultural [6], and biotechnological fields [7]. Information regarding the structure, function, distribution, and molecular and kinetic properties of enzymes is therefore essential to understand how enzymes work. This knowledge can be applied to the development of new drugs, diagnostic tools, green chemistry technologies, and food processing.

1.1. Enzyme Classification

The rapid rise in the number of newly discovered enzymes, which began in the previous century, resulted in the need for systematic classification and nomenclature of enzymes based on their function and properties. For this purpose, the International Commission on Enzymes was established more than 50 years ago. The function of this initial commission is performed today by the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology (NC-IUBMB). The International Commission introduced an enzyme classification system based on Enzyme Commission (EC) numbers associated with systematic and recommended enzyme names [8]. A four-level EC number is assigned to each enzyme based on its function, rather than its structure (e.g., EC 1.1.1.1 designates alcohol dehydrogenase). Consequently, all enzymes catalyzing the same reaction have the same EC number, even though their protein sequences differ and may not even be evolutionarily connected. The first digit of the EC number designates the class of an enzyme. There are seven classes: oxidoreductases, transferases, hydrolases, lyases, isomerases, ligases, and translocases, with translocases added only recently, in 2018. The second and third digits of the EC number correspond to the subclass and sub-subclass, whereas the fourth digit represents a serial number. Currently, 6710 active EC numbers exist (with an additional 1346 EC numbers that were either deleted or transferred to a different number). Every year, approximately 100 new EC numbers are introduced (Figure 1a).
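The four-level structure of an EC number described above can be illustrated with a short Python sketch. The names ECNumber, EC_CLASSES, and parse_ec are hypothetical and chosen only for this illustration; the sketch accepts complete four-level numbers and deliberately rejects partial or preliminary ones.

```python
from dataclasses import dataclass

# Enzyme classes as listed in the text, indexed by the first EC digit.
EC_CLASSES = {
    1: "oxidoreductases",
    2: "transferases",
    3: "hydrolases",
    4: "lyases",
    5: "isomerases",
    6: "ligases",
    7: "translocases",
}


@dataclass(frozen=True)
class ECNumber:
    """Hypothetical container for the four levels of an EC number."""
    enzyme_class: int
    subclass: int
    sub_subclass: int
    serial: int

    @property
    def class_name(self) -> str:
        return EC_CLASSES[self.enzyme_class]


def parse_ec(text: str) -> ECNumber:
    """Parse a complete four-level EC number such as 'EC 1.1.1.1'."""
    cleaned = text.strip()
    if cleaned.upper().startswith("EC"):
        cleaned = cleaned[2:].strip()
    parts = cleaned.split(".")
    # Partial numbers (e.g. '1.1.1.-') are rejected in this sketch.
    if len(parts) != 4 or not all(p.isascii() and p.isdigit() for p in parts):
        raise ValueError(f"not a complete four-level EC number: {text!r}")
    first, second, third, fourth = (int(p) for p in parts)
    if first not in EC_CLASSES:
        raise ValueError(f"unknown enzyme class {first} in {text!r}")
    return ECNumber(first, second, third, fourth)


ec = parse_ec("EC 1.1.1.1")  # alcohol dehydrogenase, as in the text
print(ec.class_name)
```

Because the number encodes function rather than sequence, two unrelated proteins parsed this way can compare equal, which is the intended behavior of the classification.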

1.2. Enzyme Databases

In addition to the need for enzyme classification, a demand for systematic storage of an increasing amount of enzyme data (Figure 1b,c) arose as well. Enzyme databases serve as centralized repositories of information on enzymes, offering a wealth of data that can aid in various aspects of scientific research: gathering information, designing experiments, and performing comparative analyses, functional annotation, data integration, predictions, and modeling. Throughout the years, enzyme databases that cover different aspects of enzyme properties and functions have been established. The first and only overview of enzyme databases was published by Schomburg et al. in 2010 [9]. The functionalities of enzyme databases have improved since then and many new databases have been introduced. The aim of this article is therefore to present the updated state of the enzyme database field.
Enzyme databases can be classified as general databases, which cover all varieties of enzymes (Table 1), and specialized databases, which focus only on enzymes from a specific class or organism (Table S1). Certain enzyme databases provide only one type of information (e.g., enzyme nomenclature, kinetic parameters, reaction mechanisms, or involvement in metabolic pathways), whereas others are more comprehensive and gather diverse types of data. Additionally, enzyme data can be obtained from databases that are not specifically focused on enzymes per se but still contain relevant enzyme information (e.g., UniProt, Protein Data Bank (PDB), and Reactome). Corresponding data entries from different databases are usually cross-referenced, a fact which enables easier navigation among them. Secondary information resources such as Enzyme Portal offer a concise overview of enzyme data by integrating publicly available information from primary enzyme databases on one site [10].

2. The Quest to Standardize Data Reporting

The rise of the omics era has caused a huge increase in the amount of data gathered per experiment; however, incorporating all these data into databases has proven to be a challenging task. Most enzyme databases are manually curated, which is a time- and resource-consuming task. Research groups that manage such databases often have limited funding and therefore have trouble keeping databases up to date. Another obstacle during curation is encountering missing, incomplete, or ambiguous published data [39]. Exact information regarding experimental conditions is often lacking, which is problematic for data reproduction and analysis because enzyme properties and activity highly depend on parameters such as temperature, substrate/enzyme concentrations, and buffer composition. Analysis of published data has shown that 11%, 45%, 11%, and 22% of papers do not report data regarding temperature, enzyme concentration, substrate concentration, or buffer counterions, respectively [40]. Additionally, the lack of proper nomenclature and identifiers (e.g., EC numbers are provided in fewer than half of published papers) leads to ambiguous identification of used enzymes and chemical compounds [39]. Data relevant for inclusion in databases are often scattered throughout the text, graphs, and figures, which prolongs the time needed for data curation [41]. Additionally, extracting data from graphs and figures does not always enable an accurate determination of parameters if the exact numerical value is not stated in the text.
Various efforts have been made to improve the quality and standardization of reported data. The development of ontologies and controlled vocabularies helps with problems regarding ambiguous data; however, implementing their use is hard to achieve in practice [39]. The initiative Standards for Reporting Enzyme Data (STRENDA) has been established to provide guidelines regarding which information should be included when reporting biocatalytic reactions and their results [42]. To date, more than 55 international biochemistry journals have adopted the STRENDA guidelines. To ensure datasets are complete and valid before submission, the STRENDA Database (STRENDA DB) was developed [17]. Data entered into the STRENDA DB are automatically checked for compliance with STRENDA guidelines, notifying the relevant researcher if required information is missing. When an article is published, its dataset, stored in the STRENDA DB, becomes public and is assigned a digital object identifier (DOI) for easier tracking. Such deposition of experimental data not only ensures completeness of information but also simplifies the integration of these data into various enzyme databases, reducing the need for manual curation. To date, only 40 articles have submitted their data to the STRENDA DB since it was founded in 2017, signifying that the submission of experimental data has not yet become the norm.
EnzymeML represents another attempt to standardize data reporting and implement FAIR (findable, accessible, interoperable, reusable) principles of data management in the field of enzyme research [43]. EnzymeML is an extensible markup language-based data exchange format that enables the transfer of data among experimental platforms, modelling tools, and databases [44,45]. It comprises experimental data and metadata with implemented STRENDA guidelines. Original data (e.g., the time course of substrate concentrations) are also included, enabling other researchers to reanalyze experimental data. EnzymeML files can be uploaded to various repositories, such as Dataverse, and can be linked to the original research paper. Thus, data can be directly uploaded to enzyme databases, eliminating the need for manual curation.
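The kind of automated completeness check described above for STRENDA DB can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names in REQUIRED_FIELDS are hypothetical and are taken from the reporting gaps named in this section, whereas the actual STRENDA guidelines define a longer and more detailed list.

```python
from typing import List

# Hypothetical required fields (assumption): the experimental conditions
# that this section names as frequently missing from published papers.
REQUIRED_FIELDS = (
    "temperature",
    "enzyme_concentration",
    "substrate_concentration",
    "buffer",
)


def missing_fields(record: dict) -> List[str]:
    """Return the required fields that are absent or empty in a submission."""
    missing = []
    for name in REQUIRED_FIELDS:
        value = record.get(name)
        if value is None or (isinstance(value, str) and not value.strip()):
            missing.append(name)
    return missing


# Illustrative submission with made-up values; two fields are not reported.
submission = {
    "temperature": "25 °C",
    "enzyme_concentration": "",
    "substrate_concentration": "1 mM",
}
print(missing_fields(submission))
```

A check of this form lets the submitting researcher be notified before publication, rather than leaving the gap for a curator to discover afterwards.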

3. The Future of Enzyme Databases in the Light of Recent Artificial Intelligence Developments

In recent years, scientific fields have been affected by the quick progress in artificial intelligence, and enzyme research is no exception. Although various prediction tools have already been present for several decades, deep-learning-based prediction models have considerably higher accuracy and precision. Deep learning algorithms for the prediction of enzyme function [46,47,48,49], structure [50,51], catalytic residues [52,53], and kinetic properties [54,55,56], as well as algorithms for the de novo design of enzymes [57,58], have already been developed.
Enzyme databases are important for the development of such algorithms because they contain useful information for the generation of training datasets. However, for databases to be used for this purpose, they must be machine-readable. While humans rely on context when selecting and interpreting data, the ability of computers to do the same is limited. Therefore, data must be organized in a systematic and predictable structure with a clear schema definition that describes how the data in a database are organized and how different parts relate to each other. Data should be presented in a consistent format with a standardized vocabulary to make it easier for machines to process information (a fact which underlines the usefulness of ontologies once again). It is also important that the data are accessible so that they can be easily imported into a data analysis tool [43]. Currently, most enzyme databases would benefit from improvements of their machine-readability, as they often do not present data in a sufficiently systematic way [18]. To this end, the standardization of data reporting (mentioned in the previous section) would help as well. Additionally, adjusting the architecture of databases to facilitate automatic data retrieval and analysis is necessary.
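To make the notion of a clear schema definition more concrete, the following Python sketch shows one possible machine-readable record for a single kinetic measurement. The class KineticRecord, its field names, and the example values are hypothetical and do not correspond to the schema of any particular database; the point is only that every value sits in a named, typed field with an explicit unit.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class KineticRecord:
    """Hypothetical schema for one kinetic parameter measurement."""
    ec_number: str
    organism: str
    parameter: str                      # e.g. "KM" or "kcat"
    value: float
    unit: str                           # stored explicitly, never implied
    temperature_celsius: Optional[float] = None
    ph: Optional[float] = None


def to_json(record: KineticRecord) -> str:
    """Serialize a record with a stable key order."""
    return json.dumps(asdict(record), sort_keys=True)


def from_json(text: str) -> KineticRecord:
    """Rebuild a record from its JSON form."""
    return KineticRecord(**json.loads(text))


# Made-up example values, for illustration only.
record = KineticRecord(
    ec_number="1.1.1.1",
    organism="Saccharomyces cerevisiae",
    parameter="KM",
    value=0.5,
    unit="mM",
    temperature_celsius=25.0,
    ph=7.5,
)
print(to_json(record))
```

Storing experimental conditions in dedicated fields, rather than in free-text comments, is what allows such records to be filtered and imported into analysis tools without human interpretation.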
Apart from being used as a source of training datasets, enzyme databases can also be used for the deposition of predicted parameters generated by deep learning algorithms. A significant number of enzyme researchers lack the appropriate knowledge to run deep-learning-based software; therefore, such databases provide wider access to predicted data. Additionally, the acquisition of data is faster because the algorithm is run in advance [20].
The year 2023 has been marked by breakthroughs in the development of large language models (LLMs) such as GPT-4, BARD, and PaLM 2 [59]. LLMs are deep learning algorithms with a remarkable ability to understand and generate text-based content. Therefore, there is great potential for the implementation of LLMs in the process of data extraction and curation in order to reduce the need for manual curation. This would vastly accelerate the incorporation of new information into databases and, thus, alleviate the problem of keeping databases up to date. Greater coverage of available experimental information would also improve subsequent data analysis and provide even larger training datasets for artificial intelligence prediction algorithms. In addition to data curation, LLMs could also be used for data analysis. Their ability to process complex queries could enhance search capabilities of databases in order to provide the user with the desired data. The usefulness of LLMs as prediction tools has also already been demonstrated [60].

4. Overview of General Enzyme Databases

The following section provides an overview of selected general enzyme databases. Data content, general architecture and properties, and available tools are presented for each database. In addition to the most widely used enzyme databases, GotEnzymes and TopEnzyme, which contain data generated by deep learning algorithms, are presented. Specialized databases are listed in Table S1 in the Supplementary Material.

4.1. Enzyme Nomenclature Databases: ExplorEnz, IntEnz, and ExPASy ENZYME

ExplorEnz (https://www.enzyme-database.org/, accessed on 1 November 2023) is a primary source for IUBMB enzymes [12]. It contains basic data for all classified enzymes: IUBMB classification, accepted and systematic names as well as other names, catalyzed reaction, links to other databases, and references to the literature. It also provides guidelines and recommendations for enzyme naming. An input form enables researchers to report enzymes not yet classified or request changes to existing data.
Two additional enzyme classification databases are ExPASy ENZYME (https://enzyme.expasy.org/, accessed on 1 November 2023) [13] and IntEnz, Integrated relational Enzyme Database (https://www.ebi.ac.uk/intenz/, accessed on 1 November 2023) [14]. In comparison to ExplorEnz, they also provide enzyme sequence information by referencing corresponding data entries in UniProt. IntEnz additionally connects cofactor data to the Chemical Entities of Biological Interest database [61], thus improving standardization of its vocabulary.

4.2. BRENDA

BRENDA, Braunschweig Enzyme Database (www.brenda-enzymes.org, accessed on 1 November 2023), is a comprehensive enzyme information system [62]. Established in 1987, BRENDA started as a database containing manually curated enzyme-specific data extracted from the literature. However, other functionalities such as visualization tools, prediction algorithms, text mining methods, and integration from external sources were added throughout the years. It has been a part of the ELIXIR Core Data Resource since 2018 [63].
The database provides information on ~8400 enzyme entries (EC numbers). For enzymes not yet classified by NC-IUBMB, a preliminary BRENDA-supplied EC number is given. Each data entry is connected to its respective literature reference, source organism, and UniProt protein sequence ID, if available. Data entries are stored in ~50 categories covering enzyme classification and nomenclature, enzyme–ligand interactions, functional and kinetic parameters, molecular properties, stability, enzyme structure, organism-related information, isolation and preparation methods, application, related diseases, and references. Listed enzyme reactions include naturally occurring and synthetic substrates. Information regarding cofactors, inhibitors, and activating compounds is also included. All compounds that interact with specific enzymes are stored in the associated ‘BRENDA ligand database’ in which ligand structures are provided with their names and synonyms.
Importantly, BRENDA contains data on a wide spectrum of kinetic parameters (KM, kcat, Ki, IC50, specific activity, temperature, and pH optimum/range). Numerical values of parameters are provided in standard units. Essential information describing experimental conditions used for determining parameters is described in the commentary section. This section is not fully standardized, which hinders automatic extraction and analysis of experimental conditions.
Each enzyme is presented alongside information regarding the organism source. In cases of multicellular organisms, the tissue type from which an enzyme originates is further specified. The terms used to describe the location and tissue type are based on the BRENDA Tissue Ontology [64]. To describe the subcellular localization, Gene Ontology terms are used [65,66]. Protein sequences and structures are retrieved from UniProt and PDB, respectively. Structures with active and binding sites can be visualized via an integrated NGL viewer [15,67].
The user can access information by performing text-based queries with quick, full-text, or advanced searches. The sequence search is useful for enzymes with known protein sequences. The Taxonomy Tree Explorer, EC Explorer, and Ontology Explorer can be used to search for enzymes in the taxonomic tree, hierarchical tree, and various biochemical ontologies, respectively. Enzyme ligands can be searched for with structure-based queries by drawing a chemical structure [63]. A substructure, isomer, or similarity search can be performed.
Text mining procedures complement manual annotation of data to provide complete overviews of the available literature. The data are derived from the titles and abstracts of all enzyme-related articles accessible within the PubMed database and are stored in the accessory repositories FRENDA, AMENDA, DRENDA, and KENDA. Data entries produced by text mining are classified into four categories based on their reliability level. FRENDA (Full Reference Enzyme Data) contains links to literature references providing information on enzyme occurrence in living organisms [68,69]. AMENDA (Automatic Mining of Enzyme Data) is a subset of FRENDA containing its most reliable data and additional information on the tissue source and subcellular localization of an enzyme [68,69]. While performing queries in BRENDA, one can choose whether to include data from FRENDA/AMENDA. DRENDA (Disease-Related Enzyme Information Database) contains information on enzyme–disease relations [70]. In addition to providing a reference link, these relations are classified into four categories: ‘Causal Interaction’, ‘Ongoing Research’, ‘Diagnostic Usage’, and ‘Therapeutic Application’. KENDA (Kinetic Enzyme Data) contains kinetic parameter data extracted from PubMed [70].
BRENDA employs visualization tools to offer clearer and easier access to enzyme data. A summary page of each enzyme class and organism (animal kingdom not yet included) contains a word cloud, offering a quick overview of the key aspects of the relevant research area [63,71]. The distribution of functional parameters can be visualized for a selected organism or enzyme class; however, this visualization is not interactive and does not have options to further specify a database query [67]. The BRENDA Genome Explorer provides a connection between genomic and enzymatic data. It displays enzyme genes in a selected genome and enables comparisons among genomic environments of an enzyme gene in different organisms [68]. BRENDA pathway maps visually display metabolic pathways with their associated enzymes and ligands [72]. Furthermore, users can upload their own transcriptomic, proteomic, or metabolomic data to analyze them in their metabolic context [15].
Prediction tools integrated in BRENDA enable additional annotation of enzyme data. EnzymeDetector provides genome-wide enzyme function annotation, which is performed by combining data from the main annotation databases with the results of prediction algorithms that rely on sequence similarity analysis and sequence pattern searches [73]. Confidence scores are calculated based on the agreement level of different sources, thereby improving reliability and decreasing the error rate compared to using a single annotation source. Furthermore, implementation of TransMembrane Hidden Markov Model [74] and Target-P 1.1 [75] enables the prediction of the presence of transmembrane helices and subcellular localization of enzymes, respectively. The performances of these algorithms have been surpassed by deep-learning-based models [76,77], which have not yet been included in the BRENDA database.
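The idea of deriving a confidence score from the agreement among annotation sources can be illustrated with a minimal Python sketch. It uses a simple unweighted vote, which is an assumption made for illustration and is not the scoring scheme actually used by EnzymeDetector; the function name consensus_annotation and the source labels are hypothetical.

```python
from collections import Counter
from typing import Dict, Optional, Tuple


def consensus_annotation(
    predictions: Dict[str, Optional[str]]
) -> Tuple[Optional[str], float]:
    """Return the most frequently predicted EC number and the fraction
    of all consulted sources that support it (unweighted vote)."""
    votes = Counter(ec for ec in predictions.values() if ec is not None)
    if not votes:
        return None, 0.0
    best_ec, count = votes.most_common(1)[0]
    return best_ec, count / len(predictions)


# Hypothetical predictions for one gene; None means "no annotation".
predictions = {
    "source_a": "1.1.1.1",
    "source_b": "1.1.1.1",
    "source_c": "1.1.1.2",
    "source_d": None,
}
print(consensus_annotation(predictions))
```

A weighted variant, in which each source contributes according to its known reliability, would follow the same structure with per-source weights in place of the plain counts.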
BRENDA represents one of the most comprehensive databases for biochemical reactions, although it remains incomplete. The BKMS-react module (https://bkms.brenda-enzymes.org, accessed on 1 November 2023) combines the BRENDA database with three other main reaction databases: KEGG, MetaCyc, and SABIO-RK [29,63]. This provides a nonredundant combined list of biochemical reactions, enabling the creation of more accurate metabolic models.
All manually annotated BRENDA data can be downloaded as a single text file. Enzyme kinetic data can be exported in a Systems Biology Markup Language format as well [67]. Computer-based data acquisition is possible via the SOAP-based web service [69].

4.3. SABIO-RK

SABIO-RK, System for the Analysis of Biochemical Pathways—Reaction Kinetics (https://sabiork.h-its.org, accessed on 1 November 2023), is a manually curated database containing data on biochemical reactions and their kinetics [78,79]. Besides metabolic reactions, a small fraction of data consist of cellular signaling and transport reactions. Although the overall number of kinetic parameters in SABIO-RK is lower than that in BRENDA (e.g., ~51,000 KM and ~28,000 kcat values vs. ~177,000 KM and ~87,000 kcat values, respectively), SABIO-RK provides more structured and standardized data with separately stored experimental conditions and kinetic rate laws. This makes the extraction and analysis of kinetic data much easier compared to BRENDA, in which all experimental parameters are stored together in a commentary section in a non-standardized fashion with often incomplete information. Additionally, SABIO-RK enables more interactive visualization of kinetic data, facilitating searches and analyses [16].
In addition to manual data extraction from published literature by curators, data from laboratory experiments can also be submitted directly by users. Uploading EnzymeML documents is possible, an option which simplifies submission of one’s own data [44]. At first, the selection of publications for data extraction was performed non-specifically by keyword searches in PubMed [79]. Nowadays, publications are selected based on SABIO-RK collaboration projects and user requests [16]. Consequently, certain organisms and enzyme classes are more represented than others.
Each data entry contains kinetic data of a single reaction performed under specific experimental conditions. If a publication contains data for multiple reactions, or the same reaction is performed under different experimental conditions, these data are stored in several distinct data entries. A data entry consists of a description of the biochemical reaction (substrates, products, enzyme class, organism taxonomy, and tissue and cellular location), kinetic data, and additional annotation data [80]. The type of kinetic model and associated equation are listed (e.g., Michaelis–Menten rate law, competitive inhibition, and Hill equation). The provided kinetic parameters include substrate and product concentrations and determined kinetic constants (e.g., Km, kcat, Vmax, KI, and Hill coefficient). Experimental conditions include pH, temperature, and buffer composition. Enzyme variants (wildtype, mutant, or recombinant) are specified as well. Any association of reactions with metabolic pathways and additional enzyme information are obtained from KEGG and UniProt. Standardization of the vocabulary used to enter the data is maintained using various ontologies (i.e., Chemical Entities of Biological Interest, BRENDA Tissue Ontology, Gene Ontology, Systems Biology Ontology, and NCBI taxonomy).
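For readers less familiar with the kinetic models named above, the following Python sketch writes out the standard textbook forms of the Michaelis–Menten rate law, competitive inhibition, and the Hill equation. The function and argument names are chosen for this illustration and are not taken from SABIO-RK; all concentrations and constants are assumed to be given in consistent units.

```python
def michaelis_menten(s: float, vmax: float, km: float) -> float:
    """Initial rate v = Vmax * [S] / (Km + [S])."""
    return vmax * s / (km + s)


def competitive_inhibition(
    s: float, i: float, vmax: float, km: float, ki: float
) -> float:
    """Competitive inhibitor raises the apparent Km by a factor (1 + [I]/Ki)."""
    return vmax * s / (km * (1.0 + i / ki) + s)


def hill(s: float, vmax: float, k_half: float, n: float) -> float:
    """Hill equation; n is the Hill coefficient, k_half the half-saturation
    concentration. For n = 1 it reduces to the Michaelis-Menten form."""
    return vmax * s**n / (k_half**n + s**n)


# At [S] = Km the Michaelis-Menten rate is half of Vmax.
print(michaelis_menten(s=2.0, vmax=10.0, km=2.0))
```

Storing the rate law together with its parameters, as described for SABIO-RK entries, makes it possible to recompute rates under the reported conditions instead of relying on isolated constants.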
The database information can be queried with free text or advanced searches, and the results can be displayed as a list of individual data entries or grouped together based on corresponding reactions. SABIO-RK contains a tool for the interactive visualization of search results, which are presented in the form of heat maps, parallel coordinates, and scatter plots [16]. This provides a quick overview of the results and promotes a better understanding of the database content and connections among individual parameters. Kinetic parameter-based graphs enable the exploration of the kinetic data space and identification of outliers. By selecting data in the graphs, searches can be further refined, thus providing an alternative to ordinary text queries.
SABIO-RK data can be downloaded in a Systems Biology Markup Language format. Computer-based data acquisition is possible via RESTful web services.

4.4. Reaction Mechanism Databases: M-CSA and EzCatDB

M-CSA, Mechanism and Catalytic Site Atlas (https://www.ebi.ac.uk/thornton-srv/m-csa, accessed on 1 November 2023), is a manually curated database of enzyme reaction mechanisms and active sites [33]. It was built in 2017 by merging two separate databases: CSA (Catalytic Site Atlas) [81] and MACiE (Mechanism, Annotation and Classification in Enzymes) [82]. M-CSA provides information (position and role) on the catalytic residues, cofactors, and metal ions that participate in the catalysis of a reaction as well as the annotation of overall reactions and their individual steps (if the mechanism is known). Reaction mechanisms are presented using curly arrow notation, showing how electrons are transferred during each step of the reaction. A data entry in the database represents a unique reaction mechanism, meaning that two enzymes with identical mechanisms are not submitted as two separate entries (unless they are the result of convergent evolution). For each mechanism, a representative enzyme is chosen. Information regarding catalytic residue positions in the enzyme sequence and 3D structure is obtained from UniProt and PDB, respectively. Homologues of a representative enzyme are determined using PHMMER [83], and an analysis of catalytic residue conservation between homologues is provided.
Browsing and searching the data in M-CSA is possible in a text or graphical manner. Currently, the database contains 1003 entries, of which 734 contain detailed mechanistic descriptions. When M-CSA was set up in 2017, the existing data from CSA and MACiE were updated; however, inclusion of new data has been advancing slowly (58 new entries in the last 6 years).
EzCatDB, Enzyme Catalytic Mechanism Database (https://ezcatdb.cbrc.pj.aist.go.jp/EzCatDB, accessed on 1 November 2023), is a manually curated database that hierarchically classifies catalytic mechanisms based on four levels: basic reactions, ligand groups, catalytic mechanisms, and active sites [34]. Otherwise, it contains similar types of information as M-CSA, although it is slightly less comprehensive (880 entries).

4.5. MetaCyc

MetaCyc (https://metacyc.org/, accessed on 1 November 2023) is a manually curated database of metabolic pathways and enzymes [36]. It relies solely on experimental data and does not include computationally predicted pathways. MetaCyc contains information regarding metabolic pathways, reactions, compounds, enzymes, and their genes, with each having its own description page. However, all the information is interrelated, a fact which enables effective analysis of different relationships. Currently, it contains ~3100 pathways with ~18,500 reactions.
MetaCyc aims to represent true biological pathways without combining data from different organisms. If different variants of a pathway exist, they are recorded separately. Pathway data provide a list of reactions, background information on participating metabolites, relationships with respect to other pathways, taxonomic distribution, and references to relevant published literature. Information regarding taxonomic distribution of pathways is not comprehensive, and only organisms with well-studied pathways are included [84].
Enzyme data include molecular properties, kinetic parameters, tissue type, subcellular location, substrate specificity, and available experimental evidence. The name of the enzyme gene with a link to the external database containing sequence information is also provided.
The database can be searched via text-based queries (quick or advanced searches) or browsed using ontologies. BLAST searches using protein or nucleotide sequence are possible as well. MetaCyc includes a variety of specialized tools for visualization and analysis of metabolic networks [85,86,87]. Additionally, tools for the analysis of one’s own omics data are provided [88].

4.6. KEGG

KEGG, Kyoto Encyclopedia of Genes and Genomes (https://www.genome.jp/kegg/, accessed on 1 November 2023), is a composite database resource consisting of 16 manually curated databases that cover systems, genomic, chemical, and health information [35]. The KEGG Pathway database contains manually drawn reference maps, which reflect a union of experimentally determined biological pathways from multiple organisms (in contrast to MetaCyc, which provides only true biological pathways). Additionally, organism-specific pathways are generated computationally from genomic data using the KEGG Orthology system [89]. The KEGG Pathway database is linked with reaction and enzyme databases. In addition to NC-IUBMB classified reactions, additional reactions from the KEGG metabolic pathways are present. Currently, KEGG contains ~12,000 reactions. The KEGG Enzyme database contains information on enzyme substrates and products, links to reactions and pathways that the enzyme participates in, a list of genes that code for that enzyme in different organisms, and references to the literature. KEGG enables text-based queries, searching via BLAST, and interactive browsing of metabolic maps and genomes. KEGG also contains tools for analyses of omics data and functional characterization of genome sequences [90]; however, its array of tools is not as wide as that of MetaCyc [91].

4.7. Reactome

The Reactome Knowledgebase (https://reactome.org/, accessed on 1 November 2023) offers manually curated and peer-reviewed data on a wide range of biological processes [38]. In contrast to KEGG and MetaCyc, Reactome provides manual annotation only for human data. However, the semi-automatic identification of orthologous reactions enables the extension of the data to 15 non-human species.
The basic unit of Reactome is a reaction, which is defined as any event that converts inputs into outputs. Reactions in Reactome are therefore not limited to classical biochemical reactions, but also include ligand binding, complex formation and dissociation, conformational changes, etc. Physical entities that represent inputs and outputs of reactions can be of different types: e.g., small molecules, proteins, other macromolecules, and their complexes. Catalyzed reactions are associated with corresponding enzymes. Regulators (if present) are also linked to the reactions, together with information about their mode of action. Additionally, the reactions include information about the subcellular location and experimental evidence. Together with entities (inputs, outputs, enzymes, regulators), reactions are extensively cross-referenced with other databases and ontologies [92].
Series of reactions chained together by common inputs or outputs form pathways. Reactome currently contains ~2600 pathways and ~15,000 reactions involved in various cellular processes such as signal transduction, cell cycle, motility, and immune response. The annotation of pathogenic genomic variants, genomic data from infectious pathogens, and molecular mechanisms of drug action enables the modeling of pathways in various diseases [93,94].
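The reaction-centered data model described above can be sketched in a few lines of Python. This is an illustrative simplification, not Reactome's actual schema: the class Reaction, its field names, the function build_pathway, and the example entities are all assumptions. The sketch also chains reactions only when an output of one reaction is an input of the next, which is a narrower rule than the general "common inputs or outputs" criterion.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Reaction:
    """Any event that converts inputs into outputs (hypothetical minimal model)."""
    name: str
    inputs: FrozenSet[str]
    outputs: FrozenSet[str]
    catalyst: str = ""      # enzyme, if the event is catalyzed
    compartment: str = ""   # subcellular location


def build_pathway(start: Reaction, pool: List[Reaction]) -> List[Reaction]:
    """Greedily chain reactions: the next reaction must consume an output of the last one."""
    pathway = [start]
    remaining = [r for r in pool if r is not start]
    while True:
        last = pathway[-1]
        nxt = next((r for r in remaining if last.outputs & r.inputs), None)
        if nxt is None:
            return pathway
        pathway.append(nxt)
        remaining.remove(nxt)


# Example events: a binding step, a catalyzed step, and a dissociation step.
binding = Reaction("binding", frozenset({"ligand", "receptor"}),
                   frozenset({"ligand:receptor"}), compartment="plasma membrane")
phosphorylation = Reaction("phosphorylation",
                           frozenset({"ligand:receptor", "ATP"}),
                           frozenset({"p-ligand:receptor", "ADP"}),
                           catalyst="kinase X", compartment="plasma membrane")
dissociation = Reaction("dissociation", frozenset({"p-ligand:receptor"}),
                        frozenset({"ligand", "p-receptor"}))

pathway = build_pathway(binding, [dissociation, phosphorylation, binding])
print([r.name for r in pathway])
```

Note that only the second event has a catalyst, which reflects the point that reactions in this model are not restricted to classical enzyme-catalyzed conversions.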
The Reactome database can be accessed with Pathway Browser, which enables the data to be searched via a graphical user interface. A full-text search is also possible. In addition, Reactome offers a range of tools for the analysis of experimental data. The protocols by Rothfels et al. provide instructions on how to use them [95].

4.8. GotEnzymes

GotEnzymes (https://metabolicatlas.org/gotenzymes, accessed on 1 November 2023) is a newly established database containing enzyme parameter predictions made by artificial intelligence [20]. It enables the retrieval of predicted parameters without the need for complete reproduction of a prediction pipeline (which can be resource-consuming). Such data can be used for statistical analysis and implementation into genome-scale metabolic models. The database currently contains predicted turnover numbers for 25.7 million enzyme–compound pairs from 8099 different organisms. The input data (protein sequences, compound structures, and EC numbers of reactions connecting enzymes with compounds) were extracted from KEGG, whereas predictions were made by the deep learning algorithm DLKcat [56]. Future GotEnzymes releases are planned to include predictions for other kinetic parameters and annotations from databases other than KEGG.
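As an illustration of the kind of statistical analysis that precomputed predictions allow, the following Python sketch summarizes turnover numbers per EC number. It assumes that the predictions have already been retrieved and are held in memory as simple records. The record layout, the field names, the organism labels, and the kcat values are hypothetical placeholders rather than actual GotEnzymes content or its export format.

```python
import math
from collections import defaultdict
from statistics import median
from typing import Dict, List

# Hypothetical records: one predicted turnover number (in 1/s) per enzyme-compound pair.
records = [
    {"ec": "1.1.1.1", "organism": "org_a", "kcat_per_s": 10.0},
    {"ec": "1.1.1.1", "organism": "org_b", "kcat_per_s": 1000.0},
    {"ec": "1.1.1.1", "organism": "org_c", "kcat_per_s": 100.0},
    {"ec": "2.7.1.1", "organism": "org_a", "kcat_per_s": 1.0},
]


def median_log_kcat(rows: List[dict]) -> Dict[str, float]:
    """Group records by EC number and return the median log10(kcat) of each group."""
    grouped: Dict[str, List[float]] = defaultdict(list)
    for row in rows:
        grouped[row["ec"]].append(math.log10(row["kcat_per_s"]))
    return {ec: median(values) for ec, values in grouped.items()}


summary = median_log_kcat(records)
for ec, value in sorted(summary.items()):
    print(f"EC {ec}: median log10(kcat) = {value:.2f}")
```

Working on a logarithmic scale is a design choice made here because turnover numbers typically span several orders of magnitude; other summaries could be substituted without changing the structure of the sketch.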

4.9. TopEnzyme

TopEnzyme (https://cpclab.uni-duesseldorf.de/topenzyme, accessed on 1 November 2023) is a database of enzyme structure models containing more than 200,000 sequences [23]. Enzyme models are created with TopModel [51] and are linked to PDB, the SWISS-MODEL repository [96], and the AlphaFold Protein Structure Database [22]. Protein sequences used to generate models are extracted from the UniProtKB/Swiss-Prot database, thus covering 60% of all EC numbers. Although the AlphaFold Protein Structure Database provides models of unreviewed sequences in UniProtKB/TrEMBL as well, these are not included in TopEnzyme. Each data entry contains a UniProt accession number and links to models and existing PDB structures. A confidence score for each model is provided. Additional information regarding enzyme name, function, EC number, organism source, and links to external databases is provided as well. Some structures also contain information regarding active and binding site residues.

5. Conclusions

Databases play an important role in enzyme research. To optimize the research process, constant improvements and upgrades of databases are necessary. Currently, one of the main focuses should be data standardization, which should be accomplished at the level of both data reporting in research articles and database curation. Although human-readability was the main goal of databases in the past, machine-readability has now become equally important, as it enables the analysis of large amounts of data. Databases should also find a way to include data obtained from artificial intelligence prediction algorithms, which could close the gaps where experimental data are missing.

Supplementary Materials

Author Contributions

Conceptualization, U.P. and M.G.; writing—original draft preparation, U.P.; writing—review and editing, M.G.; visualization, U.P.; supervision, M.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Slovenian Research and Innovation Agency (research program P1-170).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

No new data were created or analyzed in this study.

Acknowledgments

We thank Eva Lasic for reviewing a draft of this manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Strobel, S.A.; Cochrane, J.C. RNA catalysis: Ribozymes, ribosomes, and riboswitches. Curr. Opin. Chem. Biol. 2007, 11, 636–643.
  2. Silverman, S.K. Catalytic DNA: Scope, Applications, and Biochemistry of Deoxyribozymes. Trends Biochem. Sci. 2016, 41, 595–609.
  3. Consortium, T.U. UniProt: The Universal Protein Knowledgebase in 2023. Nucleic Acids Res. 2023, 51, D523–D531.
  4. Geronikaki, A.; Eleutheriou, P.T. Enzymes and Enzyme Inhibitors—Applications in Medicine and Diagnosis. Int. J. Mol. Sci. 2023, 24, 5245.
  5. Meghwanshi, G.K.; Kaur, N.; Verma, S.; Dabi, N.K.; Vashishtha, A.; Charan, P.D.; Purohit, P.; Bhandari, H.S.; Bhojak, N.; Kumar, R. Enzymes for pharmaceutical and therapeutic applications. Biotechnol. Appl. Biochem. 2020, 67, 586–601.
  6. Mir Khan, U.; Selamoglu, Z. Use of Enzymes in Dairy Industry: A Review of Current Progress. Arch. Razi Inst. 2020, 75, 131–136.
  7. Wu, S.; Snajdrova, R.; Moore, J.C.; Baldenius, K.; Bornscheuer, U.T. Biocatalysis: Enzymatic Synthesis for Industrial Applications. Angew. Chem. Int. Ed. 2021, 60, 88–119.
  8. McDonald, A.G.; Tipton, K.F. Enzyme nomenclature and classification: The state of the art. FEBS J. 2023, 290, 2214–2231.
  9. Schomburg, D.; Schomburg, I. Enzyme Databases. In Data Mining Techniques for the Life Sciences; Carugo, O., Eisenhaber, F., Eds.; Humana Press: Totowa, NJ, USA, 2010; pp. 113–128.
  10. Alcántara, R.; Onwubiko, J.; Cao, H.; de Matos, P.; Cham, J.A.; Jacobsen, J.; Holliday, G.L.; Fischer, J.D.; Rahman, S.A.; Jassal, B.; et al. The EBI enzyme portal. Nucleic Acids Res. 2013, 41, D773–D780.
  11. Ma, L.; Zou, D.; Liu, L.; Shireen, H.; Abbasi, A.A.; Bateman, A.; Xiao, J.; Zhao, W.; Bao, Y.; Zhang, Z. Database Commons: A Catalog of Worldwide Biological Databases. Genom. Proteom. Bioinform. 2022; in press.
  12. McDonald, A.G.; Boyce, S.; Tipton, K.F. ExplorEnz: The primary source of the IUBMB enzyme list. Nucleic Acids Res. 2009, 37, D593–D597.
  13. Bairoch, A. The ENZYME database in 2000. Nucleic Acids Res. 2000, 28, 304–305.
  14. Fleischmann, A.; Darsow, M.; Degtyarenko, K.; Fleischmann, W.; Boyce, S.; Axelsen, K.B.; Bairoch, A.; Schomburg, D.; Tipton, K.F.; Apweiler, R. IntEnz, the integrated relational enzyme database. Nucleic Acids Res. 2004, 32, D434–D437.
  15. Chang, A.; Jeske, L.; Ulbrich, S.; Hofmann, J.; Koblitz, J.; Schomburg, I.; Neumann-Schaal, M.; Jahn, D.; Schomburg, D. BRENDA, the ELIXIR core data resource in 2021: New developments and updates. Nucleic Acids Res. 2021, 49, D498–D508.
  16. Dudaš, D.; Wittig, U.; Rey, M.; Weidemann, A.; Müller, W. Improved insights into the SABIO-RK database via visualization. Database 2023, 2023, baad011.
  17. Swainston, N.; Baici, A.; Bakker, B.M.; Cornish-Bowden, A.; Fitzpatrick, P.F.; Halling, P.; Leyh, T.S.; O’Donovan, C.; Raushel, F.M.; Reschel, U.; et al. STRENDA DB: Enabling the validation and sharing of enzyme kinetics data. FEBS J. 2018, 285, 2193–2204.
  18. Yan, B.; Ran, X.; Gollu, A.; Cheng, Z.; Zhou, X.; Chen, Y.; Yang, Z.J. IntEnzyDB: An Integrated Structure–Kinetics Enzymology Database. J. Chem. Inf. Model. 2022, 62, 5841–5848.
  19. Wang, X.; Zhang, X.; Peng, C.; Shi, Y.; Li, H.; Xu, Z.; Zhu, W. D3DistalMutation: A Database to Explore the Effect of Distal Mutations on Enzyme Activity. J. Chem. Inf. Model. 2021, 61, 2499–2508.
  20. Li, F.; Chen, Y.; Anton, M.; Nielsen, J. GotEnzymes: An extensive database of enzyme parameter predictions. Nucleic Acids Res. 2023, 51, D583–D586.
  21. Berman, H.M.; Westbrook, J.; Feng, Z.; Gilliland, G.; Bhat, T.N.; Weissig, H.; Shindyalov, I.N.; Bourne, P.E. The Protein Data Bank. Nucleic Acids Res. 2000, 28, 235–242.
  22. Varadi, M.; Anyango, S.; Deshpande, M.; Nair, S.; Natassia, C.; Yordanova, G.; Yuan, D.; Stroe, O.; Wood, G.; Laydon, A.; et al. AlphaFold Protein Structure Database: Massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic Acids Res. 2022, 50, D439–D444.
  23. van der Weg, K.J.; Gohlke, H. TopEnzyme: A framework and database for structural coverage of the functional enzyme space. Bioinformatics 2023, 39, btad116.
  24. Qi, G.; Hayward, S. Database of ligand-induced domain movements in enzymes. BMC Struct. Biol. 2009, 9, 13.
  25. Fischer, J.D.; Holliday, G.L.; Thornton, J.M. The CoFactor database: Organic cofactors in enzyme catalysis. Bioinformatics 2010, 26, 2496–2497.
  26. Lingė, D.; Gedgaudas, M.; Merkys, A.; Petrauskas, V.; Vaitkus, A.; Grybauskas, A.; Paketurytė, V.; Zubrienė, A.; Zakšauskas, A.; Mickevičiūtė, A.; et al. PLBD: Protein–ligand binding database of thermodynamic and kinetic intrinsic parameters. Database 2023, 2023, baad040.
  27. Sillitoe, I.; Furnham, N. FunTree: Advances in a resource for exploring and contextualising protein function evolution. Nucleic Acids Res. 2016, 44, D317–D323.
  28. Hadadi, N.; Hafner, J.; Shajkofci, A.; Zisaki, A.; Hatzimanikatis, V. ATLAS of Biochemistry: A Repository of All Possible Biochemical Reactions for Synthetic Biology and Metabolic Engineering Studies. ACS Synth. Biol. 2016, 5, 1155–1166.
  29. Lang, M.; Stelzer, M.; Schomburg, D. BKM-react, an integrated biochemical reaction database. BMC Biochem. 2011, 12, 42.
  30. Sun, D.; Cheng, X.; Tian, Y.; Ding, S.; Zhang, D.; Cai, P.; Hu, Q. EnzyMine: A comprehensive database for enzyme function annotation with enzymatic reaction chemical feature. Database 2020, baaa065.
  31. Bansal, P.; Morgat, A.; Axelsen, K.B.; Muthukrishnan, V.; Coudert, E.; Aimo, L.; Hyka-Nouspikel, N.; Gasteiger, E.; Kerhornou, A.; Neto, T.B.; et al. Rhea, the reaction knowledgebase in 2022. Nucleic Acids Res. 2022, 50, D693–D700.
  32. McDonald, A.G.; Tipton, K.F.; Boyce, S. Tracing metabolic pathways from enzyme data. Biochim. Biophys. Acta Proteins Proteom. 2009, 1794, 1364–1371.
  33. Ribeiro, A.J.M.; Holliday, G.L.; Furnham, N.; Tyzack, J.D.; Ferris, K.; Thornton, J.M. Mechanism and Catalytic Site Atlas (M-CSA): A database of enzyme reaction mechanisms and active sites. Nucleic Acids Res. 2018, 46, D618–D623.
  34. Nagano, N.; Nakayama, N.; Ikeda, K.; Fukuie, M.; Yokota, K.; Doi, T.; Kato, T.; Tomii, K. EzCatDB: The enzyme reaction database, 2015 update. Nucleic Acids Res. 2015, 43, D453–D458.
  35. Kanehisa, M.; Furumichi, M.; Sato, Y.; Kawashima, M.; Ishiguro-Watanabe, M. KEGG for taxonomy-based analysis of pathways and genomes. Nucleic Acids Res. 2023, 51, D587–D592.
  36. Caspi, R.; Billington, R.; Keseler, I.M.; Kothari, A.; Krummenacker, M.; Midford, P.E.; Ong, W.K.; Paley, S.; Subhraveti, P.; Karp, P.D. The MetaCyc database of metabolic pathways and enzymes—A 2019 update. Nucleic Acids Res. 2020, 48, D445–D453.
  37. Wishart, D.S.; Li, C.; Marcu, A.; Badran, H.; Pon, A.; Budinski, Z.; Patron, J.; Lipton, D.; Cao, X.; Oler, E.; et al. PathBank: A comprehensive pathway database for model organisms. Nucleic Acids Res. 2020, 48, D470–D478.
  38. Milacic, M.; Beavers, D.; Conley, P.; Gong, C.; Gillespie, M.; Griss, J.; Haw, R.; Jassal, B.; Matthews, L.; May, B.; et al. The Reactome Pathway Knowledgebase 2024. Nucleic Acids Res. 2023; in press.
  39. Wittig, U.; Rey, M.; Kania, R.; Bittkowski, M.; Shi, L.; Golebiewski, M.; Weidemann, A.; Müller, W.; Rojas, I. Challenges for an enzymatic reaction kinetics database. FEBS J. 2014, 281, 572–582.
  40. Halling, P.; Fitzpatrick, P.F.; Raushel, F.M.; Rohwer, J.; Schnell, S.; Wittig, U.; Wohlgemuth, R.; Kettner, C. An empirical analysis of enzyme function reporting for experimental reproducibility: Missing/incomplete information in published papers. Biophys. Chem. 2018, 242, 22–27.
  41. Wittig, U.; Kania, R.; Bittkowski, M.; Wetsch, E.; Shi, L.; Jong, L.; Golebiewski, M.; Rey, M.; Weidemann, A.; Rojas, I.; et al. Data extraction for the reaction kinetics database SABIO-RK. Perspect. Sci. 2014, 1, 33–40.
  42. Gardossi, L.; Poulsen, P.B.; Ballesteros, A.; Hult, K.; Svedas, V.K.; Vasić-Racki, D.; Carrea, G.; Magnusson, A.; Schmid, A.; Wohlgemuth, R.; et al. Guidelines for reporting of biocatalytic reactions. Trends Biotechnol. 2010, 28, 171–180.
  43. Wilkinson, M.D.; Dumontier, M.; Aalbersberg, I.J.J.; Appleton, G.; Axton, M.; Baak, A.; Blomberg, N.; Boiten, J.W.; da Silva Santos, L.B.; Bourne, P.E.; et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci. Data 2016, 3, 160018.
  44. Range, J.; Halupczok, C.; Lohmann, J.; Swainston, N.; Kettner, C.; Bergmann, F.T.; Weidemann, A.; Wittig, U.; Schnell, S.; Pleiss, J. EnzymeML—A data exchange format for biocatalysis and enzymology. FEBS J. 2022, 289, 5864–5874.
  45. Lauterbach, S.; Dienhart, H.; Range, J.; Malzacher, S.; Spöring, J.D.; Rother, D.; Pinto, M.F.; Martins, P.; Lagerman, C.E.; Bommarius, A.S.; et al. EnzymeML: Seamless data flow and modeling of enzymatic data. Nat. Methods 2023, 20, 400–402.
  46. Ryu, J.Y.; Kim, H.U.; Lee, S.Y. Deep learning enables high-quality and high-throughput prediction of enzyme commission numbers. Proc. Natl. Acad. Sci. USA 2019, 116, 13996–14001.
  47. Li, Y.; Wang, S.; Umarov, R.; Xie, B.; Fan, M.; Li, L.; Gao, X. DEEPre: Sequence-based enzyme EC number prediction by deep learning. Bioinformatics 2018, 34, 760–769.
  48. Dalkiran, A.; Rifaioglu, A.S.; Martin, M.J.; Cetin-Atalay, R.; Atalay, V.; Doğan, T. ECPred: A tool for the prediction of the enzymatic functions of protein sequences based on the EC nomenclature. BMC Bioinform. 2018, 19, 334.
  49. Zou, Z.; Tian, S.; Gao, X.; Li, Y. mlDEEPre: Multi-functional enzyme function prediction with hierarchical multi-label deep learning. Front. Genet. 2019, 9, 714.
  50. Jumper, J.; Evans, R.; Pritzel, A.; Green, T.; Figurnov, M.; Ronneberger, O.; Tunyasuvunakool, K.; Bates, R.; Žídek, A.; Potapenko, A.; et al. Highly accurate protein structure prediction with AlphaFold. Nature 2021, 596, 583–589.
  51. Mulnaes, D.; Porta, N.; Clemens, R.; Apanasenko, I.; Reiners, J.; Gremer, L.; Neudecker, P.; Smits, S.H.J.; Gohlke, H. TopModel: Template-Based Protein Structure Prediction at Low Sequence Identity Using Top-Down Consensus and Deep Neural Networks. J. Chem. Theory Comput. 2020, 16, 1953–1967.
  52. Torng, W.; Altman, R.B. High precision protein functional site detection using 3D convolutional neural networks. Bioinformatics 2019, 35, 1503–1512.
  53. Song, J.; Li, F.; Takemoto, K.; Haffari, G.; Akutsu, T.; Chou, K.-C.; Webb, G.I. PREvaIL, an integrative approach for inferring catalytic residues using sequence, structural, and network features in a machine-learning framework. J. Theor. Biol. 2018, 443, 125–137.
  54. Kroll, A.; Rousset, Y.; Hu, X.-P.; Liebrand, N.A.; Lercher, M.J. Turnover number predictions for kinetically uncharacterized enzymes using machine and deep learning. Nat. Commun. 2023, 14, 4139.
  55. Kroll, A.; Engqvist, M.K.M.; Heckmann, D.; Lercher, M.J. Deep learning allows genome-scale prediction of Michaelis constants from structural features. PLoS Biol. 2021, 19, e3001402.
  56. Li, F.; Yuan, L.; Lu, H.; Li, G.; Chen, Y.; Engqvist, M.K.M.; Kerkhoven, E.J.; Nielsen, J. Deep learning-based kcat prediction enables improved enzyme-constrained model reconstruction. Nat. Catal. 2022, 5, 662–672.
  57. Yeh, A.H.-W.; Norn, C.; Kipnis, Y.; Tischer, D.; Pellock, S.J.; Evans, D.; Ma, P.; Lee, G.R.; Zhang, J.Z.; Anishchenko, I.; et al. De novo design of luciferases using deep learning. Nature 2023, 614, 774–780.
  58. Wang, J.; Lisanza, S.; Juergens, D.; Tischer, D.; Watson, J.L.; Castro, K.M.; Ragotte, R.; Saragovi, A.; Milles, L.F.; Baek, M.; et al. Scaffolding protein functional sites using deep learning. Science 2022, 377, 387–394.
  59. Thapa, S.; Adhikari, S. ChatGPT, Bard, and Large Language Models for Biomedical Research: Opportunities and Pitfalls. Ann. Biomed. Eng. 2023, 51, 2647–2651.
  60. Qian, C.; Tang, H.; Yang, Z.-J.; Liang, H.; Liu, Y. Can Large Language Models Empower Molecular Property Prediction? arXiv 2023, arXiv:2307.07443.
  61. Hastings, J.; Owen, G.; Dekker, A.; Ennis, M.; Kale, N.; Muthukrishnan, V.; Turner, S.; Swainston, N.; Mendes, P.; Steinbeck, C. ChEBI in 2016: Improved services and an expanding collection of metabolites. Nucleic Acids Res. 2016, 44, D1214–D1219.
  62. Schomburg, I.; Hofmann, O.; Baensch, C.; Chang, A.; Schomburg, D. Enzyme data and metabolic information: BRENDA, a resource for research in biology, biochemistry, and medicine. Gene Funct. Dis. 2000, 1, 109–118.
  63. Jeske, L.; Placzek, S.; Schomburg, I.; Chang, A.; Schomburg, D. BRENDA in 2019: A European ELIXIR core data resource. Nucleic Acids Res. 2019, 47, D542–D549.
  64. Gremse, M.; Chang, A.; Schomburg, I.; Grote, A.; Scheer, M.; Ebeling, C.; Schomburg, D. The BRENDA Tissue Ontology (BTO): The first all-integrating ontology of all organisms for enzyme sources. Nucleic Acids Res. 2011, 39, D507–D513.
  65. Ashburner, M.; Ball, C.A.; Blake, J.A.; Botstein, D.; Butler, H.; Cherry, J.M.; Davis, A.P.; Dolinski, K.; Dwight, S.S.; Eppig, J.T.; et al. Gene Ontology: Tool for the unification of biology. Nat. Genet. 2000, 25, 25–29.
  66. Consortium, T.G.O.; Aleksander, S.A.; Balhoff, J.; Carbon, S.; Cherry, J.M.; Drabkin, H.J.; Ebert, D.; Feuermann, M.; et al. The Gene Ontology knowledgebase in 2023. Genetics 2023, 224, iyad031.
  67. Scheer, M.; Grote, A.; Chang, A.; Schomburg, I.; Munaretto, C.; Rother, M.; Söhngen, C.; Stelzer, M.; Thiele, J.; Schomburg, D. BRENDA, the enzyme information system in 2011. Nucleic Acids Res. 2011, 39, D670–D676.
  68. Barthelmes, J.; Ebeling, C.; Chang, A.; Schomburg, I.; Schomburg, D. BRENDA, AMENDA and FRENDA: The enzyme information system in 2007. Nucleic Acids Res. 2007, 35, D511–D514.
  69. Chang, A.; Scheer, M.; Grote, A.; Schomburg, I.; Schomburg, D. BRENDA, AMENDA and FRENDA the enzyme information system: New content and tools in 2009. Nucleic Acids Res. 2009, 37, D588–D592.
  70. Schomburg, I.; Chang, A.; Placzek, S.; Söhngen, C.; Rother, M.; Lang, M.; Munaretto, C.; Ulas, S.; Stelzer, M.; Grote, A.; et al. BRENDA in 2013: Integrated reactions, kinetic data, enzyme function data, improved disease classification: New options and contents in BRENDA. Nucleic Acids Res. 2013, 41, D764–D772.
  71. Chang, A.; Schomburg, I.; Placzek, S.; Jeske, L.; Ulbrich, M.; Xiao, M.; Sensen, C.W.; Schomburg, D. BRENDA in 2015: Exciting developments in its 25th year of existence. Nucleic Acids Res. 2015, 43, D439–D446.
  72. Placzek, S.; Schomburg, I.; Chang, A.; Jeske, L.; Ulbrich, M.; Tillack, J.; Schomburg, D. BRENDA in 2017: New perspectives and new tools in BRENDA. Nucleic Acids Res. 2017, 45, D380–D388.
  73. Quester, S.; Schomburg, D. EnzymeDetector: An integrated enzyme function prediction tool and database. BMC Bioinform. 2011, 12, 376.
  74. Sonnhammer, E.L.; von Heijne, G.; Krogh, A. A hidden Markov model for predicting transmembrane helices in protein sequences. Proc. Int. Conf. Intell. Syst. Mol. Biol. 1998, 6, 175–182.
  75. Emanuelsson, O.; Nielsen, H.; Brunak, S.; von Heijne, G. Predicting Subcellular Localization of Proteins Based on their N-terminal Amino Acid Sequence. J. Mol. Biol. 2000, 300, 1005–1016.
  76. Hallgren, J.; Tsirigos, K.; Pedersen, M.D.; Almagro Armenteros, J.J.; Marcatili, P.; Nielsen, H.; Krogh, A.; Winther, O. DeepTMHMM predicts alpha and beta transmembrane proteins using deep neural networks. BioRxiv 2022.
  77. Armenteros, J.J.A.; Salvatore, M.; Emanuelsson, O.; Winther, O.; von Heijne, G.; Elofsson, A.; Nielsen, H. Detecting sequence signals in targeting peptides using deep learning. Life Sci. Alliance 2019, 2, e201900429.
  78. Wittig, U.; Rey, M.; Weidemann, A.; Kania, R.; Müller, W. SABIO-RK: An updated resource for manually curated biochemical reaction kinetics. Nucleic Acids Res. 2018, 46, D656–D660.
  79. Krebs, O.; Golebiewski, M.; Kania, R.; Mir, S.; Saric, J.; Weidemann, A.; Wittig, U.; Rojas, I. SABIO-RK: A data warehouse for biochemical reactions and their kinetics. J. Integr. Bioinform. 2007, 4, 22–30.
  80. Wittig, U.; Golebiewski, M.; Kania, R.; Krebs, O.; Mir, S.; Weidemann, A.; Anstein, S.; Saric, J.; Rojas, I. SABIO-RK: Integration and curation of reaction kinetics data. In Proceedings of the Lecture Notes in Computer Science (including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 4075 LNBI, Hinxton, UK, 20–22 July 2006; pp. 94–103.
  81. Furnham, N.; Holliday, G.L.; de Beer, T.A.P.; Jacobsen, J.O.B.; Pearson, W.R.; Thornton, J.M. The Catalytic Site Atlas 2.0: Cataloging catalytic sites and residues identified in enzymes. Nucleic Acids Res. 2014, 42, D485–D489.
  82. Holliday, G.L.; Bartlett, G.J.; Almonacid, D.E.; O’Boyle, N.M.; Murray-Rust, P.; Thornton, J.M.; Mitchell, J.B.O. MACiE: A database of enzyme reaction mechanisms. Bioinformatics 2005, 21, 4315–4316.
  83. Potter, S.C.; Luciani, A.; Eddy, S.R.; Park, Y.; Lopez, R.; Finn, R.D. HMMER web server: 2018 update. Nucleic Acids Res. 2018, 46, W200–W204.
  84. Karp, P.D.; Riley, M.; Paley, S.M.; Pellegrini-Toole, A. The MetaCyc Database. Nucleic Acids Res. 2002, 30, 59–61.
  85. Karp, P.D.; Midford, P.E.; Billington, R.; Kothari, A.; Krummenacker, M.; Latendresse, M.; Ong, W.K.; Subhraveti, P.; Caspi, R.; Fulcher, C.; et al. Pathway Tools version 23.0 update: Software for pathway/genome informatics and systems biology. Brief. Bioinform. 2021, 22, 109–126.
  86. Paley, S.; Karp, P.D. The BioCyc Metabolic Network Explorer. BMC Bioinform. 2021, 22, 208.
  87. Krummenacker, M.; Latendresse, M.; Karp, P.D. Metabolic route computation in organism communities. Microbiome 2019, 7, 89.
  88. Paley, S.; Parker, K.; Spaulding, A.; Tomb, J.-F.; O’Maille, P.; Karp, P.D. The Omics Dashboard for interactive exploration of gene-expression data. Nucleic Acids Res. 2017, 45, 12113–12124.
  89. Kanehisa, M. Enzyme Annotation and Metabolic Reconstruction Using KEGG. In Protein Function Prediction: Methods and Protocols; Kihara, D., Ed.; Springer: New York, NY, USA, 2017; pp. 135–145.
  90. Kanehisa, M.; Sato, Y.; Morishima, K. BlastKOALA and GhostKOALA: KEGG Tools for Functional Characterization of Genome and Metagenome Sequences. J. Mol. Biol. 2016, 428, 726–731.
  91. Altman, T.; Travers, M.; Kothari, A.; Caspi, R.; Karp, P.D. A systematic comparison of the MetaCyc and KEGG pathway databases. BMC Bioinform. 2013, 14, 112.
  92. Vastrik, I.; D’Eustachio, P.; Schmidt, E.; Gopinath, G.; Croft, D.; de Bono, B.; Gillespie, M.; Jassal, B.; Lewis, S.; Matthews, L.; et al. Reactome: A knowledge base of biologic pathways and processes. Genome Biol. 2007, 8, R39.
  93. Jassal, B.; Matthews, L.; Viteri, G.; Gong, C.; Lorente, P.; Fabregat, A.; Sidiropoulos, K.; Cook, J.; Gillespie, M.; Haw, R.; et al. The reactome pathway knowledgebase. Nucleic Acids Res. 2020, 48, D498–D503.
  94. Croft, D.; Mundo, A.F.; Haw, R.; Milacic, M.; Weiser, J.; Wu, G.; Caudy, M.; Garapati, P.; Gillespie, M.; Kamdar, M.R.; et al. The Reactome pathway knowledgebase. Nucleic Acids Res. 2014, 42, D472–D477.
  95. Rothfels, K.; Milacic, M.; Matthews, L.; Haw, R.; Sevilla, C.; Gillespie, M.; Stephan, R.; Gong, C.; Ragueneau, E.; May, B.; et al. Using the Reactome Database. Curr. Protoc. 2023, 3, e722.
  96. Bienert, S.; Waterhouse, A.; de Beer, T.A.P.; Tauriello, G.; Studer, G.; Bordoli, L.; Schwede, T. The SWISS-MODEL Repository—New features and functionality. Nucleic Acids Res. 2017, 45, D313–D319.
  97. Ehlting, J.; Sauveplane, V.; Olry, A.; Ginglinger, J.-F.; Provart, N.J.; Werck-Reichhart, D. An extensive (co-)expression analysis tool for the cytochrome P450 superfamily in Arabidopsis thaliana. BMC Plant Biol. 2008, 8, 47.
  98. Zhang, Y.; Pan, X.; Shi, T.; Gu, Z.; Yang, Z.; Liu, M.; Xu, Y.; Yang, Y.; Ren, L.; Song, X.; et al. P450Rdb: A manually curated database of reactions catalyzed by cytochrome P450 enzymes. J. Adv. Res. 2023; online ahead of print.
  99. Wang, H.; Wang, Q.; Liu, Y.; Liao, X.; Chu, H.; Chang, H.; Cao, Y.; Li, Z.; Zhang, T.; Cheng, J.; et al. PCPD: Plant cytochrome P450 database and web-based tools for structural construction and ligand docking. Synth. Syst. Biotechnol. 2021, 6, 102–109.
  100. Li-Beisson, Y.; Shorrosh, B.; Beisson, F.; Andersson, M.X.; Arondel, V.; Bates, P.D.; Baud, S.; Bird, D.; Debono, A.; Durrett, T.P.; et al. Acyl-Lipid Metabolism. Arab. Book 2013, 11, e0161.
  101. Corcoran, C.C.; Grady, C.R.; Pisitkun, T.; Parulekar, J.; Knepper, M.A. From 20th century metabolic wall charts to 21st century systems biology: Database of mammalian metabolic enzymes. Am. J. Physiol. Renal Physiol. 2016, 312, F533–F542.
  102. Sajed, T.; Marcu, A.; Ramirez, M.; Pon, A.; Guo, A.C.; Knox, C.; Wilson, M.; Grant, J.R.; Djoumbou, Y.; Wishart, D.S. ECMDB 2.0: A richer resource for understanding the biochemistry of E. coli. Nucleic Acids Res. 2016, 44, D495–D501.
  103. Wishart, D.S.; Guo, A.; Oler, E.; Wang, F.; Anjum, A.; Peters, H.; Dizon, R.; Sayeeda, Z.; Tian, S.; Lee, B.L.; et al. HMDB 5.0: The Human Metabolome Database for 2022. Nucleic Acids Res. 2022, 50, D622–D631.
  104. Le Boulch, M.; Déhais, P.; Combes, S.; Pascal, G. The MACADAM database: A MetAboliC pAthways DAtabase for Microbial taxonomic groups for mining potential metabolic capacities of archaeal and bacterial taxonomic groups. Database 2019, 2019, baz049.
  105. Huang, W.; Brewer, L.K.; Jones, J.W.; Nguyen, A.T.; Marcu, A.; Wishart, D.S.; Oglesby-Sherrouse, A.G.; Kane, M.A.; Wilks, A. PAMDB: A comprehensive Pseudomonas aeruginosa metabolome database. Nucleic Acids Res. 2018, 46, D575–D580.
  106. Hawkins, C.; Ginzburg, D.; Zhao, K.; Dwyer, W.; Xue, B.; Xu, A.; Rice, S.; Cole, B.; Paley, S.; Karp, P.; et al. Plant Metabolic Network 15: A resource of genome-wide metabolism databases for 126 plants and algae. J. Integr. Plant Biol. 2021, 63, 1888–1905.
  107. Shameer, S.; Logan-Klumpler, F.J.; Vinson, F.; Cottret, L.; Merlet, B.; Achcar, F.; Boshart, M.; Berriman, M.; Breitling, R.; Bringaud, F.; et al. TrypanoCyc: A community-led biochemical pathways database for Trypanosoma brucei. Nucleic Acids Res. 2015, 43, D637–D644.
  108. Ramirez-Gaona, M.; Marcu, A.; Pon, A.; Guo, A.C.; Sajed, T.; Wishart, N.A.; Karu, N.; Djoumbou Feunang, Y.; Arndt, D.; Wishart, D.S. YMDB 2.0: A significantly expanded version of the yeast metabolome database. Nucleic Acids Res. 2017, 45, D440–D445.
  109. Drula, E.; Garron, M.-L.; Dogan, S.; Lombard, V.; Henrissat, B.; Terrapon, N. The carbohydrate-active enzyme database: Functions and literature. Nucleic Acids Res. 2022, 50, D571–D577.
  110. Egorova, K.S.; Smirnova, N.S.; Toukach, P.V. CSDB_GT, a curated glycosyltransferase database with close-to-full coverage on three most studied nonanimal species. Glycobiology 2021, 31, 524–529.
  111. Ausland, C.; Zheng, J.; Yi, H.; Yang, B.; Li, T.; Feng, X.; Zheng, B.; Yin, Y. dbCAN-PUL: A database of experimentally characterized CAZyme gene clusters and their substrates. Nucleic Acids Res. 2021, 49, D523–D528.
  112. Zheng, J.; Hu, B.; Zhang, X.; Ge, Q.; Yan, Y.; Akresi, J.; Piyush, V.; Huang, L.; Yin, Y. dbCAN-seq update: CAZyme gene clusters and substrates in microbiomes. Nucleic Acids Res. 2023, 51, D557–D563.
  113. d’Acierno, A.; Scafuri, B.; Facchiano, A.; Marabotti, A. The evolution of a Web resource: The Galactosemia Proteins Database 2.0. Hum. Mutat. 2018, 39, 52–60.
  114. Srivastava, J.; Sunthar, P.; Balaji, P.V. Monosaccharide biosynthesis pathways database. Glycobiology 2021, 31, 1636–1644.
  115. Ekstrom, A.; Taujale, R.; McGinn, N.; Yin, Y. PlantCAZyme: A database for plant carbohydrate-active enzymes. Database 2014, 2014, bau079.
  116. Adler, B.A.; Trinidad, M.I.; Bellieny-Rabelo, D.; Zhang, E.; Karp, H.M.; Skopintsev, P.; Thornton, B.W.; Weissman, R.F.; Yoon, P.H.; Chen, L.; et al. CasPEDIA Database: A functional classification system for class 2 CRISPR-Cas enzymes. Nucleic Acids Res. 2023; in press. gkad890.
  117. Tang, Z.; Chen, S.; Chen, A.; He, B.; Zhou, Y.; Chai, G.; Guo, F.; Huang, J. CasPDB: An integrated and annotated database for Cas proteins from bacteria and archaea. Database 2019, 2019, baz093.
  118. Ponce-Salvatierra, A.; Boccaletto, P.; Bujnicki, J.M. DNAmoreDB, a database of DNAzymes. Nucleic Acids Res. 2021, 49, D76–D81.
  119. Huang, Z.; Jiang, H.; Liu, X.; Chen, Y.; Wong, J.; Wang, Q.; Huang, W.; Shi, T.; Zhang, J. HEMD: An Integrated Tool of Human Epigenetic Enzymes and Chemical Modulators for Therapeutics. PLoS ONE 2012, 7, e39917.
  120. Taylor, G.K.; Petrucci, L.H.; Lambert, A.R.; Baxter, S.K.; Jarjour, J.; Stoddard, B.L. LAHEDES: The LAGLIDADG homing endonuclease database and engineering server. Nucleic Acids Res. 2012, 40, W110–W116.
  121. Boccaletto, P.; Stefaniak, F.; Ray, A.; Cappannini, A.; Mukherjee, S.; Purta, E.; Kurkowska, M.; Shirvanizadeh, N.; Destefanis, E.; Groza, P.; et al. MODOMICS: A database of RNA modification pathways. 2021 update. Nucleic Acids Res. 2022, 50, D231–D235.
  122. Roberts, R.J.; Vincze, T.; Posfai, J.; Macelis, D. REBASE: A database for DNA restriction and modification: Enzymes, genes and genomes. Nucleic Acids Res. 2023, 51, D629–D630.
  123. Milanowska, K.; Krwawicz, J.; Papaj, G.; Kosiński, J.; Poleszak, K.; Lesiak, J.; Osińska, E.; Rother, K.; Bujnicki, J.M. REPAIRtoire—A database of DNA repair pathways. Nucleic Acids Res. 2011, 39, D788–D792.
  124. Deng, J.; Shi, Y.; Peng, X.; He, Y.; Chen, X.; Li, M.; Lin, X.; Liao, W.; Huang, Y.; Jiang, T.; et al. Ribocentre: A database of ribozymes. Nucleic Acids Res. 2023, 51, D262–D268.
  125. Nie, F.; Tang, Q.; Liu, Y.; Qin, H.; Liu, S.; Wu, M.; Feng, P.; Chen, W. RNAME: A comprehensive database of RNA modification enzymes. Comput. Struct. Biotechnol. J. 2022, 20, 6244–6249.
  126. Podlevsky, J.D.; Bley, C.J.; Omana, R.V.; Qi, X.; Chen, J.J.-L. The Telomerase Database. Nucleic Acids Res. 2008, 36, D339–D343.
  127. Wild, A.R.; Hogg, P.W.; Flibotte, S.; Nasseri, G.G.; Hollman, R.B.; Abazari, D.; Haas, K.; Bamji, S.X. Exploring the expression patterns of palmitoylating and de-palmitoylating enzymes in the mouse brain using the curated RNA-seq database BrainPalmSeq. eLife 2022, 11, e75804.
  128. Wild, A.R.; Hogg, P.W.; Flibotte, S.; Kochhar, S.; Hollman, R.B.; Haas, K.; Bamji, S.X. CellPalmSeq: A curated RNAseq database of palmitoylating and de-palmitoylating enzyme expression in human cell types and laboratory cell lines. Front. Physiol. 2023, 14, 1110550. [Google Scholar] [CrossRef]
  129. Grinshpon, R.D.; Williford, A.; Titus-McQuillan, J.; Clay Clark, A. The CaspBase: A curated database for evolutionary biochemical studies of caspase functional divergence and ancestral sequence inference. Protein Sci. 2018, 27, 1857–1870. [Google Scholar] [CrossRef]
  130. Damle, N.P.; Köhn, M. The human DEPhOsphorylation Database DEPOD: 2019 update. Database 2019, 2019, baz133. [Google Scholar] [CrossRef]
  131. Xue, Z.; Chen, J.-X.; Zhao, Y.; Medvar, B.; Knepper, M.A. Data integration in physiology using Bayes’ rule and minimum Bayes’ factors: Deubiquitylating enzymes in the renal collecting duct. Physiol. Genom. 2016, 49, 151–159. [Google Scholar] [CrossRef]
  132. Huang, H.; Arighi, C.N.; Ross, K.E.; Ren, J.; Li, G.; Chen, S.-C.; Wang, Q.; Cowart, J.; Vijay-Shanker, K.; Wu, C.H. iPTMnet: An integrated resource for protein post-translational modification network discovery. Nucleic Acids Res. 2018, 46, D542–D550. [Google Scholar] [CrossRef]
  133. Zhou, J.; Xu, Y.; Lin, S.; Guo, Y.; Deng, W.; Zhang, Y.; Guo, A.; Xue, Y. iUUCD 2.0: An update with rich annotations for ubiquitin and ubiquitin-like conjugations. Nucleic Acids Res. 2018, 46, D447–D453. [Google Scholar] [CrossRef]
  134. Manning, G.; Whyte, D.B.; Martinez, R.; Hunter, T.; Sudarsanam, S. The Protein Kinase Complement of the Human Genome. Science 2002, 298, 1912–1934. [Google Scholar] [CrossRef]
  135. Rawlings, N.D.; Barrett, A.J.; Thomas, P.D.; Huang, X.; Bateman, A.; Finn, R.D. The MEROPS database of proteolytic enzymes, their substrates and inhibitors in 2017 and a comparison with peptidases in the PANTHER database. Nucleic Acids Res. 2018, 46, D624–D632. [Google Scholar] [CrossRef]
  136. Garavelli, J.S. The RESID Database of Protein Modifications as a resource and annotation tool. Proteomics 2004, 4, 1527–1533. [Google Scholar] [CrossRef]
  137. Scietti, L.; Campioni, M.; Forneris, F. SiMPLOD, a Structure-Integrated Database of Collagen Lysyl Hydroxylase (LH/PLOD) Enzyme Variants. J. Bone Miner. Res. 2019, 34, 1376–1382. [Google Scholar] [CrossRef]
  138. Li, Z.; Chen, S.; Jhong, J.-H.; Pang, Y.; Huang, K.-Y.; Li, S.; Lee, T.-Y. UbiNet 2.0: A verified, classified, annotated and updated database of E3 ubiquitin ligase–substrate interactions. Database 2021, 2021, baab010. [Google Scholar] [CrossRef]
  139. Jorge, P.; Alves, D.; Pereira, M.O. Catalysing the way towards antimicrobial effectiveness: A systematic analysis and a new online resource for antimicrobial–enzyme combinations against Pseudomonas aeruginosa and Staphylococcus aureus. Int. J. Antimicrob. Agents 2019, 53, 598–605. [Google Scholar] [CrossRef]
  140. Vivek, K.; Diene, S.M.; Adrien, E.; Justine, D.; Olivier, C.; Laurent, T.; Jean-Marc, R.; Didier, R.; Pierre, P. An Integrative Database of β-Lactamase Enzymes: Sequences, Structures, Functions, and Phylogenetic Trees. Antimicrob. Agents Chemother. 2019, 63, e02319-18. [Google Scholar] [CrossRef]
  141. Naas, T.; Oueslati, S.; Bonnin, R.A.; Dabos, M.L.; Zavala, A.; Dortet, L.; Retailleau, P.; Iorga, B.I. Beta-lactamase database (BLDB)—Structure and function. J. Enzyme Inhib. Med. Chem. 2017, 32, 917–919. [Google Scholar] [CrossRef]
  142. Li, F.; Yin, J.; Lu, M.; Mou, M.; Li, Z.; Zeng, Z.; Tan, Y.; Wang, S.; Chu, X.; Dai, H.; et al. DrugMAP: Molecular atlas and pharma-information of all drugs. Nucleic Acids Res. 2023, 51, D1288–D1299. [Google Scholar] [CrossRef]
  143. Yin, J.; Li, F.; Zhou, Y.; Mou, M.; Lu, Y.; Chen, K.; Xue, J.; Luo, Y.; Fu, J.; He, X.; et al. INTEDE: Interactome of drug-metabolizing enzymes. Nucleic Acids Res. 2021, 49, D1233–D1243. [Google Scholar] [CrossRef]
  144. Zhou, J.; Ouyang, J.; Gao, Z.; Qin, H.; Jun, W.; Shi, T. MagMD: Database summarizing the metabolic action of gut microbiota to drugs. Comput. Struct. Biotechnol. J. 2022, 20, 6427–6430. [Google Scholar] [CrossRef]
  145. Gao, J.; Ellis, L.B.M.; Wackett, L.P. The University of Minnesota Biocatalysis/Biodegradation Database: Improving public access. Nucleic Acids Res. 2010, 38, D488–D491. [Google Scholar] [CrossRef]
  146. Rojas-Vargas, J.; Castelán-Sánchez, H.G.; Pardo-López, L. HADEG: A curated hydrocarbon aerobic degradation enzymes and genes database. Comput. Biol. Chem. 2023, 107, 107966. [Google Scholar] [CrossRef]
  147. Arora, P.K.; Kumar, M.; Chauhan, A.; Raghava, G.P.S.; Jain, R.K. OxDBase: A database of oxygenases involved in biodegradation. BMC Res. Notes 2009, 2, 67. [Google Scholar] [CrossRef]
  148. Buchholz, P.C.F.; Feuerriegel, G.; Zhang, H.; Perez-Garcia, P.; Nover, L.-L.; Chow, J.; Streit, W.R.; Pleiss, J. Plastics degradation by hydrolytic enzymes: The plastics-active enzymes database—PAZy. Proteins 2022, 90, 1443–1456. [Google Scholar] [CrossRef]
  149. Gambarini, V.; Pantos, O.; Kingsbury, J.M.; Weaver, L.; Handley, K.M.; Lear, G. PlasticDB: A database of microorganisms and proteins linked to plastic biodegradation. Database 2022, 2022, baac008. [Google Scholar] [CrossRef]
  150. Gan, Z.; Zhang, H. PMBD: A Comprehensive Plastics Microbial Biodegradation Database. Database 2019, 2019, baz119. [Google Scholar] [CrossRef]
  151. Chakraborty, J.; Jana, T.; Saha, S.; Dutta, T.K. Ring-Hydroxylating Oxygenase database: A database of bacterial aromatic ring-hydroxylating oxygenases in the management of bioremediation and biocatalysis of aromatic compounds. Environ. Microbiol. Rep. 2014, 6, 519–523. [Google Scholar] [CrossRef]
  152. Deckers, M.; Van Braeckel, J.; Vanneste, K.; Deforce, D.; Fraiture, M.-A.; Roosens N h., c. Food Enzyme Database (FEDA): A web application gathering information about food enzyme preparations available on the European market. Database 2021, 2021, baab060. [Google Scholar] [CrossRef]
  153. Mariano, D.; Pantuza, N.; Santos, L.H.; Rocha, R.E.O.; de Lima, L.H.F.; Bleicher, L.; de Melo-Minardi, R.C. Glutantβase: A database for improving the rational design of glucose-tolerant β-glucosidases. BMC Mol. Cell Biol. 2020, 21, 50. [Google Scholar] [CrossRef]
  154. Wu, H.; Huang, J.; Lu, H.; Li, G.; Huang, Q. GMEnzy: A Genetically Modified Enzybiotic Database. PLoS ONE 2014, 9, e103687. [Google Scholar] [CrossRef]
  155. Sunny, J.S.; Nisha, K.; Natarajan, A.; Saleena, L.M. IND-enzymes: A repository for hydrolytic enzymes derived from thermophilic and psychrophilic bacterial species with potential industrial usage. Extremophiles 2021, 25, 319–325. [Google Scholar] [CrossRef]
  156. Sharma, V.K.; Kumar, N.; Prakash, T.; Taylor, T.D. MetaBioME: A database to explore commercially useful enzymes in metagenomic datasets. Nucleic Acids Res. 2010, 38, D468–D472. [Google Scholar] [CrossRef]
  157. Wang, C.Y.; Chang, P.M.; Ary, M.L.; Allen, B.D.; Chica, R.A.; Mayo, S.L.; Olafson, B.D. ProtaBank: A repository for protein design and engineering data. Protein Sci. 2018, 27, 1113–1124. [Google Scholar] [CrossRef]
  158. Finnigan, W.; Lubberink, M.; Hepworth, L.J.; Citoler, J.; Mattey, A.P.; Ford, G.J.; Sangster, J.; Cosgrove, S.C.; da Costa, B.Z.; Heath, R.S.; et al. RetroBioCat Database: A Platform for Collaborative Curation and Automated Meta-Analysis of Biocatalysis Data. ACS Catal. 2023, 13, 11771–11780. [Google Scholar] [CrossRef]
  159. Duigou, T.; du Lac, M.; Carbonell, P.; Faulon, J.-L. RetroRules: A database of reaction rules for engineering biology. Nucleic Acids Res. 2019, 47, D1229–D1235. [Google Scholar] [CrossRef]
  160. Percudani, R.; Peracchi, A. The B6 database: A tool for the description and classification of vitamin B6-dependent enzymatic activities and of the corresponding protein families. BMC Bioinform. 2009, 10, 273. [Google Scholar] [CrossRef]
  161. Buchholz, P.C.F.; Vogel, C.; Reusch, W.; Pohl, M.; Rother, D.; Spieß, A.C.; Pleiss, J. BioCatNet: A Database System for the Integration of Enzyme Sequences and Biocatalytic Experiments. ChemBioChem 2016, 17, 2093–2098. [Google Scholar] [CrossRef]
  162. Tao, X.B.; LaFrance, S.; Xing, Y.; Nava, A.A.; Martin, H.G.; Keasling, J.D.; Backman, T.W.H. ClusterCAD 2.0: An updated computational platform for chimeric type I polyketide synthase and nonribosomal peptide synthetase design. Nucleic Acids Res. 2023, 51, D532–D538. [Google Scholar] [CrossRef]
  163. Bretaudeau, A.; Coste, F.; Humily, F.; Garczarek, L.; Le Corguillé, G.; Six, C.; Ratin, M.; Collin, O.; Schluchter, W.M.; Partensky, F. CyanoLyase: A database of phycobilin lyase sequences, motifs and functions. Nucleic Acids Res. 2013, 41, D396–D401. [Google Scholar] [CrossRef]
  164. Lenfant, N.; Hotelier, T.; Velluet, E.; Bourne, Y.; Marchot, P.; Chatonnet, A. ESTHER, the database of the α/β-hydrolase fold superfamily of proteins: Tools to explore diversity of functions. Nucleic Acids Res. 2013, 41, D423–D429. [Google Scholar] [CrossRef]
  165. Amata, E.; Marrazzo, A.; Dichiara, M.; Modica, M.N.; Salerno, L.; Prezzavento, O.; Nastasi, G.; Rescifina, A.; Romeo, G.; Pittalà, V. Heme Oxygenase Database (HemeOxDB) and QSAR Analysis of Isoform 1 Inhibitors. ChemMedChem 2017, 12, 1873–1881. [Google Scholar] [CrossRef]
  166. Yu, J.-L.; Wu, S.; Zhou, C.; Dai, Q.-Q.; Schofield, C.J.; Li, G.-B. MeDBA: The Metalloenzyme Data Bank and Analysis platform. Nucleic Acids Res. 2023, 51, D593–D602. [Google Scholar] [CrossRef]
  167. Kishore, S.; Khosla, C. Genomic mining and diversity of assembly line polyketide synthases. Open Biol. 2023, 13, 230096. [Google Scholar] [CrossRef]
  168. Gunera, J.; Kindinger, F.; Li, S.-M.; Kolb, P. PrenDB, a Substrate Prediction Database to Enable Biocatalytic Use of Prenyltransferases. J. Biol. Chem. 2017, 292, 4003–4021. [Google Scholar] [CrossRef]
  169. Velez Rueda, A.J.; Palopoli, N.; Zacarías, M.; Sommese, L.M.; Parisi, G. ProtMiscuity: A database of promiscuous proteins. Database 2019, 2019, baz103. [Google Scholar] [CrossRef]
  170. Oberg, N.; Precord, T.W.; Mitchell, D.A.; Gerlt, J.A. RadicalSAM.org: A Resource to Interpret Sequence-Function Space and Discover New Radical SAM Enzyme Chemistry. ACS Bio Med Chem Au 2022, 2, 22–35. [Google Scholar] [CrossRef]
  171. Akhter, S.; Kaur, H.; Agrawal, P.; Raghava, G.P.S. RareLSD: A manually curated database of lysosomal enzymes associated with rare diseases. Database 2019, 2019, baz112. [Google Scholar] [CrossRef]
  172. Savelli, B.; Li, Q.; Webber, M.; Jemmat, A.M.; Robitaille, A.; Zamocky, M.; Mathé, C.; Dunand, C. RedoxiBase: A database for ROS homeostasis regulated proteins. Redox Biol. 2019, 26, 101247. [Google Scholar] [CrossRef]
  173. Stam, M.; Lelièvre, P.; Hoebeke, M.; Corre, E.; Barbeyron, T.; Michel, G. SulfAtlas, the sulfatase database: State of the art and new developments. Nucleic Acids Res. 2023, 51, D647–D653. [Google Scholar] [CrossRef]
  174. Chen, N.; Zhang, R.; Zeng, T.; Zhang, X.; Wu, R. Developing TeroENZ and TeroMAP modules for the terpenome research platform TeroKit. Database 2023, 2023, baad020. [Google Scholar] [CrossRef]
  175. Caswell, B.T.; de Carvalho, C.C.; Nguyen, H.; Roy, M.; Nguyen, T.; Cantu, D.C. Thioesterase enzyme families: Functions, structures, and mechanisms. Protein Sci. 2022, 31, 652–676. [Google Scholar] [CrossRef] [PubMed]
  176. Miettinen, K.; Iñigo, S.; Kreft, L.; Pollier, J.; De Bo, C.; Botzki, A.; Coppens, F.; Bak, S.; Goossens, A. The TriForC database: A comprehensive up-to-date resource of plant triterpene biosynthesis. Nucleic Acids Res. 2018, 46, D586–D594. [Google Scholar] [CrossRef]
Figure 1. Growth of enzyme-related data in the last two decades. (a) The number of classified active entries (EC numbers) in the enzyme list. (b) The number of publications in PubMed that contain the terms enzyme or metabolism. (c) The number of protein sequences with annotated catalytic function in the UniProtKB database.
Table 1. General enzyme databases and their properties. We retrieved the databases by searching PubMed for articles that contain the terms ‘database’, ‘repository’, or ‘resource’ in their title and the term ‘enzym*’ in their title or abstract. In addition, the Database Commons catalog [11] was searched for databases containing the keyword ‘enzym*’. Each recovered database was assessed with respect to whether it could be described as a general enzyme database and whether it was freely accessible and active. Only the databases that fulfil these criteria were included in the table.
| Database Type | Database | Scope of Database | Data Source | Curation |
| --- | --- | --- | --- | --- |
| Enzyme nomenclature | ExplorEnz | IUBMB classification [12] | IUBMB enzyme list | Manual |
| Enzyme nomenclature | ExPASy ENZYME | IUBMB classification with references to UniProt entries [13] | IUBMB enzyme list | Manual |
| Enzyme nomenclature | IntEnz | IUBMB classification with references to UniProt and ChEBI entries [14] | IUBMB enzyme list | Manual |
| Kinetics | BRENDA | Function and kinetic parameters, enzyme–ligand interactions, organism-related information, isolation methods [15] | Experimental (implementation of some prediction tools) | Manual and automated * |
| Kinetics | SABIO-RK | Kinetic parameters with experimental conditions [16] | Experimental | Manual (option of data submission by experimenters) |
| Kinetics | STRENDA-DB | Standardized kinetic data [17] | Experimental | Submission of data by experimenters |
| Kinetics | IntEnzyDB | A comparison of kinetic parameters between wildtype and mutant enzymes [18] | Integration from multiple databases | Automated |
| Kinetics | D3DistalMutation | Effects of mutations on enzyme activity [19] | Integration from multiple databases | Automated (mostly) |
| Kinetics | GotEnzymes | Kinetic parameters predicted with a computer algorithm [20] | Predicted | Automated |
| Structure | UniProt | Protein sequence and functional information [3] | Experimental and predicted | Manual and automated |
| Structure | PDB ** | Experimentally verified protein structures [21] | Experimental | Manual |
| Structure | AlphaFold DB ** | Protein structures predicted with a computer algorithm [22] | Predicted | Automated |
| Structure | TopEnzyme | Enzyme structures predicted with a computer algorithm [23] | Predicted | Automated |
| Structure | Ligand-induced domain movements in enzymes | Data on movements of enzyme domains upon ligand binding [24] | Experimental | Manual and automated |
| Structure | CoFactor | Data on organic enzyme cofactors [25] | Experimental | Manual and automated |
| Structure | Natural Ligand DataBase | Structural data on enzyme–ligand interactions [26] | Experimental and predicted | Automated |
| Phylogeny | FunTree | Sequence, structural, and phylogenetic data on enzymes and other proteins [27] | Integration from multiple databases | Automated |
| Reactions (general) | ATLAS of Biochemistry | A database of all theoretical biochemical reactions [28] | Experimental and predicted | Automated |
| Reactions (general) | BKMS-react | List of biochemical reactions from BRENDA, KEGG, MetaCyc, and SABIO-RK [29] | Integration from multiple databases | Automated |
| Reactions (general) | EnzyMine | Mining of enzymatic reactions linked to sequence and structural annotations [30] | Integration from multiple databases | Manual |
| Reactions (general) | Rhea | A resource of biochemical reactions [31] | IUBMB enzyme list | Manual |
| Reactions (general) | Reaction explorer | Biochemical reactions derived from IUBMB enzyme list [32] | IUBMB enzyme list | Manual |
| Reaction mechanism | M-CSA | Information on position and role of catalytic residues and annotated step-by-step reaction mechanisms [33] | Experimental | Manual |
| Reaction mechanism | EzCatDB | A hierarchical classification of catalytic mechanisms [34] | Experimental | Manual |
| Metabolic pathways | KEGG | Information about metabolic pathways, reactions, metabolites, enzymes, and genes [35] | Experimental | Manual |
| Metabolic pathways | MetaCyc | Information about metabolic pathways, reactions, metabolites, enzymes, and genes [36] | Experimental | Manual |
| Metabolic pathways | PathBank | A metabolic pathway resource for model organisms [37] | Experimental | Manual |
| Metabolic pathways | Reactome | Information about biological pathways in human and model organisms [38] | Experimental | Manual and automated |
| Secondary information resource | Enzyme Portal | Integration of publicly available enzyme information [10] | Integration from multiple databases | Automated |
* The BRENDA main repository is manually curated, while its accessory repositories are curated automatically. ** A general protein database covering proteins with or without catalytic function.
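The PubMed search described in the caption of Table 1 could be reproduced programmatically. The following is a minimal sketch, not the authors' actual procedure: it assumes the search maps onto the standard PubMed field tags `[Title]` and `[Title/Abstract]` and onto the public NCBI E-utilities `esearch` endpoint. The helper names `build_pubmed_term` and `build_esearch_url` are hypothetical and introduced here only for illustration. The code only builds the query and the request URL and sends no network request.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities search endpoint (assumed here as the access route;
# the article does not state how the PubMed search was executed).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_term(title_terms, topic_term):
    """Combine title keywords with OR, then require a title/abstract keyword."""
    title_part = " OR ".join(f"{term}[Title]" for term in title_terms)
    return f"({title_part}) AND {topic_term}[Title/Abstract]"


def build_esearch_url(term, retmax=0):
    """Build an esearch URL; retmax=0 asks for the hit count without IDs."""
    params = {"db": "pubmed", "term": term, "retmode": "json", "retmax": retmax}
    return f"{ESEARCH_URL}?{urlencode(params)}"


# Keywords taken from the Table 1 caption.
term = build_pubmed_term(["database", "repository", "resource"], "enzym*")
url = build_esearch_url(term)

print(term)
print(url)
```

The resulting URL can be opened in a browser or fetched with any HTTP client. Each hit would still need the manual assessment described in the caption (general scope, free access, active status), which this sketch does not attempt to automate.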