
A Reaction Database for Small Molecule Pharmaceutical Processes Integrated with Process Information

Emmanouil Papadakis, Amata Anantpinijwatna, John M. Woodley and Rafiqul Gani
Department of Chemical and Biochemical Engineering, Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark
Faculty of Engineering, King Mongkut’s Institute of Technology Ladkrabang, 10520 Bangkok, Thailand
Author to whom correspondence should be addressed.
Processes 2017, 5(4), 58;
Submission received: 12 September 2017 / Revised: 28 September 2017 / Accepted: 2 October 2017 / Published: 12 October 2017


This article describes the development of a reaction database with the objective to collect data for multiphase reactions involved in small molecule pharmaceutical processes with a search engine to retrieve necessary data in investigations of reaction-separation schemes, such as the role of organic solvents in reaction performance improvement. The focus of this reaction database is to provide a data rich environment with process information available to assist during the early stage synthesis of pharmaceutical products. The database is structured in terms of reaction classification of reaction types; compounds participating in the reaction; use of organic solvents and their function; information for single step and multistep reactions; target products; reaction conditions and reaction data. Information for reactor scale-up together with information for the separation and other relevant information for each reaction and reference are also available in the database. Additionally, the retrieved information obtained from the database can be evaluated in terms of sustainability using well-known “green” metrics published in the scientific literature. The application of the database is illustrated through the synthesis of ibuprofen, for which data on different reaction pathways have been retrieved from the database and compared using “green” chemistry metrics.

1. Introduction

Organic chemistry has an important role to play in the development of synthetic routes for new drugs during early stage process development. To pursue synthesis at a high level, access to chemical information is needed, which can be provided by using knowledge databases, experience, literature review and/or computer-aided tools [1,2]. The retrieved data are used for similarity search, reaction data retrieval, synthesis route planning, drug discovery and development, and prediction of physicochemical properties [3]. Methods, algorithms and tools to systematize data collection and the retrieval of chemical information, and to assist in solving many problems related to the synthesis of molecules in organic chemistry, have been developed since the 1970s. The methods and tools for reaction synthesis are based on retrieving chemical information organized in chemical reaction databases, where data for individual reactions and structural information for the different components involved in the reaction are stored.
Computer-aided tools have been developed to solve problems related to “synthesis” and “retrosynthesis.” The focus of these tools is to generate a number of possible chemical synthesis paths through possible precursors (a synthesis tree) to achieve the synthesis of a given target compound. In retrosynthesis, the process of generating the possible pathways starts from the given target compound and, by going backwards, the reactions necessary to synthesize the target compound are identified. In addition, the reactions to produce the reactants of the identified reactions are generated. The process is repeated until commercially available reactants are identified. These approaches are based on heuristics and logical rules and all of them rely on knowledge databases [4,5,6,7,8]. Recently, computer-aided tools that are based on algorithmic approaches have been developed, such as The Route Designer [9], which automatically extracts rules that capture the essence of the reactions in the chemical reaction database [10]. The tool ICSYNTH utilizes a graph-based approach with available data from the literature to generate the reaction rules [9]. Many other computer-aided methods and tools for reaction synthesis have already been developed with different characteristics, for example, tools to perform combinatorial searches, to screen generated alternatives based on information retrieved from knowledge databases and to perform extensive reaction assessment calculations [11,12,13,14,15].
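The backward search described above can be sketched in a few lines. This is only an illustration of the general idea (recursive expansion of a target until commercially available reactants are reached) and not the algorithm of any of the cited tools; the rule base, the compound names and the function name `retrosynthesis` are hypothetical placeholders.

```python
from typing import Dict, List, Set

# Hypothetical rule base: product -> alternative sets of precursors.
# All compound names are placeholders, not data from the article.
RULES: Dict[str, List[List[str]]] = {
    "target": [["intermediate_A", "reagent_X"], ["intermediate_B"]],
    "intermediate_A": [["feedstock_1", "feedstock_2"]],
    "intermediate_B": [["feedstock_3"]],
}
# Hypothetical set of commercially available compounds (search stops here).
COMMERCIAL: Set[str] = {"reagent_X", "feedstock_1", "feedstock_2", "feedstock_3"}


def retrosynthesis(compound: str, depth: int = 5) -> List[List[str]]:
    """Return every route as an ordered list of 'product <= precursors' steps."""
    if compound in COMMERCIAL:
        return [[]]  # nothing left to synthesize
    if depth == 0 or compound not in RULES:
        return []  # dead end: no known reaction or search too deep
    routes: List[List[str]] = []
    for precursors in RULES[compound]:
        step = f"{compound} <= {' + '.join(precursors)}"
        partial = [[step]]
        for precursor in precursors:
            sub_routes = retrosynthesis(precursor, depth - 1)
            # Combine each partial route with each way of making this precursor.
            partial = [r + s for r in partial for s in sub_routes]
        routes.extend(partial)
    return routes


for route in retrosynthesis("target"):
    print(route)
```

Each returned route lists the reactions from the target back to purchasable starting materials; a real tool would replace the dictionary lookup with rules extracted from a reaction database.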
Searching for reactions and retrieving the relevant information is a complex problem because it involves searching for chemical structures (complete or partial), transformation information (reaction centers), descriptions of the reactions (reaction type, general comments) and numerical data such as experimental reaction data (including conversion, yield, selectivity, reaction conditions etc.). Reaction databases that help to organize, store and retrieve data continue to be developed (Houben-Weyl [16] and Theilheimer [17]), but more recently, the field of reaction databases has evolved further and databases (see Table 1) such as CASREACT [18], ChemReact [17] and REAXYS (previously Beilstein plus Reactions) [19] have been established, while reaction databases such as ChemInform [20] have become well-known.

1.1. General Databases

In these types of database, the information included is focused on organic reactions and synthetic methods in general. The CASREACT reaction database [18] covers reactions dating from 1840; more than 74.9 million reactions have been added and it is updated daily. The information is related to organic synthesis including organometallics, total synthesis of natural products and biocatalytic (biotransformation) reactions. This database can be used to find different ways to produce the same product (single-step or multi-step reactions), applications of a particular catalyst and various ways to carry out specific functional group transformations. The REAXYS reaction database [19], based on data from Elsevier’s chemistry databases (CrossFire Beilstein, CrossFire Gmelin and Patent Chemistry Database), includes data for more than 40.7 million reactions, dating from 1771 to the present. It includes a large number of compounds (organic, inorganic and organometallic) and experimental reaction details (yield, solvents etc.). It is searchable for reactions, substances, formulas, and data such as physico-chemical properties and spectra. Additionally, the REAXYS database can be used for synthesis route planning. The Current Chemical Reaction (CCR) database [21] includes over one million organic reactions together with reaction diagrams, critical conditions and bibliographic data. The Reference Library of Synthetic Methodology (RefLib) covers reaction data from 1946 to 1992. The database contains information from different sources and the latest version has a comprehensive heterocyclic chemistry database [17].
The ChemReact reaction database [17] is a closed database that covers the period from 1974 to 1998 and includes over 3.5 million reactions. It is searchable by reaction type and provides information for the reaction transformation classified by type of reaction and relevant data (bibliographic, spectra and yield). Chemogenesis is a web-book [22] dealing with chemical reactions and chemical reactivity. It examines the rich science between the periodic table and the established disciplines of inorganic and organic chemistry. The Organic Synthesis database [23] includes more than 6000 organic reactions and is searchable by the reaction type or the structure of the compounds; it provides information for single and multi-step organic reactions together with reaction components, conditions and description. The Chemical Synthesis reaction database [24] enables the user to find reactions related to reagents or target products and it also provides the necessary details of the reagents. The Synthetic Pages reaction database [25] covers 292 reactions and provides information for the optimized reaction procedure. It is searchable by reaction type and/or the structure of the reagent or the target product. The Chemical Thesaurus reaction database [26] contains 4000 reactions classified as organic, inorganic, organometallic, transition metal and biochemical.
The WebReaction reaction database [27] covers over 400,000 reactions; it can be searched by defining the structure of the reactant and the product, and it performs searches based on reaction similarity with a focus on the reaction center. The Science of Synthesis database (previously Houben-Weyl) [16] covers information for organic and organometallic reactions with detailed experimental procedures, methodology evaluation and discussion of the field. Finally, the SPRESI reaction database [28] contains 4.6 million reactions and it enables searching of structures, references and reactions.
The Synthetic Reaction Updates database (previously Methods in Organic Synthesis) lists many organic reactions (in graphical form) and is searchable by reaction type [29].

1.2. Specialized Databases

These databases are specialized in one class of reaction type. The ChemInform reaction database [20] includes more than 2 million reactions, including organic, enzymatic and microbial reactions. The available data can be used for the application of new reagents and catalysts, as well as for the preparation of natural and pharmaceutical products. Other aspects that are covered by the ChemInform database include synthetic procedures, enantio- and diastereoselective syntheses and new protection/de-protection procedures. The Biotage Pathfinder reaction database [30] is specialized in verified methods of microwave synthesis.
The e-EROS (Encyclopedia of Reagents for Organic Synthesis) [31] focuses on the reagents and catalysts used in organic chemistry for synthesis. The FlowReact Search [32] covers a range of over 2000 flow chemistry reactions adapted from publications on pharmaceutical, fine chemical and biotech companies. The Protecting Groups reaction database [33] provides information for protection, de-protection and trans-protection methods, stability, lability, and reaction conditions, and includes up-to-date information. Recently, a reaction library focused on generic reactions (88 reactions, ~20,000 reactants) with high reliability and reasonable yield has been developed by Masek et al. [34]. The objective of this library is to provide information on synthetically feasible design ideas for de novo drug design.
Representing chemical reactions in a structured way is a complex task. The reaction information contained in a database needs to fulfill several criteria and needs to be categorized with respect to its searchable reaction information. The criteria that a reaction database should fulfill are [17]:
  (i) Each reaction is an individual record in the database (detailed and graphical). The reaction must be retrievable from the database as a detailed record (reagents, products, stoichiometry etc.). It can also be extracted as a graphical representation where the reaction scheme is shown. In many databases, the reaction is represented in a graphical form.
  (ii) Structural information for the target product as well as the substrates.
  (iii) Reaction centers. The reaction center of a reaction is the collection of atoms and bonds that are changed during the reaction [3].
  (iv) Reaction components must be searchable. This covers information for the components involved in the reaction such as reagents, catalysts, solvents etc.
  (v) Multistep reactions. In the case of multistep reactions, all reactions (individual and whole pathway) must be searchable.
  (vi) Reaction conditions. Conditions such as pH, temperature, pressure etc. should be searchable by exact value and by a suitable range of values.
  (vii) Reaction classification. The type of reaction (e.g., esterification) should be searchable.
  (viii) Post-processing of the database contents. Export of the retrieved reaction data to other tools (e.g., MS Excel).
Many reaction databases have been developed over time—some of them have a large number of reactions available and others a smaller number, and some of the databases cover the whole range of the organic and/or inorganic reactions. There are also reaction databases that cover more specialized reactions such as solid reactions, flow reactions etc. It can also be seen that most of the databases cover the most important criteria as defined by Zass [17], such as the need for individual reaction records (criterion i, in Table 1). In Table 1, existing reaction databases are listed and have been classified based on the different presented criteria. The numbers of reactions, as well as online sources, have also been listed.
The main objective of this article is to assist pharmaceutical process development in the early stages of synthesis route selection and development, by providing enhanced process understanding. To achieve this task, a data-rich environment where knowledge can be collected, stored and retrieved is a requirement. A database that covers reactions taking place in pharmaceutical processes, containing information connected to the criteria listed by Zass [17] and additionally covering process information, has been developed to create an environment where process knowledge is available. Connecting individual reactions to criteria such as scalability, cost, expected yield, reaction steps, ease of separation and safety, and to parameters such as reaction conditions, experimental data and models, can improve process understanding and decision making during synthesis route selection. In addition to constraints of high product quality and process economics, a pharmaceutical process needs to fulfill the criteria for environmental issues. In particular, for pharmaceutical processes, the environmental sustainability evaluation must be performed during the early stage of process development [36], before approval by the regulatory bodies, as re-approval of the process can be very expensive [37]. Constable et al. [38] have reviewed “green” metrics proposed in the literature, and these metrics are used to increase the awareness of waste sources generated by the reaction and to identify opportunities for further improvement. The reviewed “green” metrics are listed in Table 2, where for each metric an explanation and the equation to quantify the specific metric are given.
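Table 2 is not reproduced in this section. As a rough illustration of how such metrics are evaluated, the sketch below uses the definitions commonly quoted in the green chemistry literature for atom economy, E-factor, mass intensity and reaction mass efficiency; the exact forms in Table 2 may differ in detail (for example, in how water or recycled solvent is counted), and the function names are hypothetical.

```python
from typing import Sequence


def atom_economy(product_mw: float, reactant_mws: Sequence[float]) -> float:
    """Atom economy (%): molar mass of the desired product divided by the
    summed molar masses of all stoichiometric reactants (each already
    multiplied by its stoichiometric coefficient)."""
    return 100.0 * product_mw / sum(reactant_mws)


def mass_intensity(total_input_mass: float, product_mass: float) -> float:
    """Mass intensity (kg/kg): total mass fed to the step per mass of product."""
    return total_input_mass / product_mass


def e_factor(total_input_mass: float, product_mass: float) -> float:
    """E-factor (kg/kg): mass of waste per mass of product, taking waste as
    everything fed to the step that does not end up as product."""
    return (total_input_mass - product_mass) / product_mass


def reaction_mass_efficiency(product_mass: float,
                             reactant_masses: Sequence[float]) -> float:
    """Reaction mass efficiency (%): product mass over total reactant mass."""
    return 100.0 * product_mass / sum(reactant_masses)


# Illustrative check with molar masses (g/mol) from standard atomic weights.
# Assumption: a three-step route from isobutylbenzene with acetic anhydride,
# hydrogen and carbon monoxide as the other stoichiometric reactants; the
# acylating agent is not specified in this section of the article.
ibuprofen_mw = 206.28
reactants = [134.22, 102.09, 2.016, 28.01]
print(round(atom_economy(ibuprofen_mw, reactants), 1))
```

With the same mass basis for both, the E-factor equals the mass intensity minus one, so either can be reported once the total input mass and the product mass are known.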
This information, in combination with other knowledge databases and computer-aided synthesis design (CASD) tools developed earlier, provides an opportunity for an integrated approach to the solution of problems related to synthesis route selection and improvement, taking into account important process considerations such as the development time to establish the synthesis route, product quality, cost of manufacture that are often linked to “green” chemistry metrics and the final approval of regulatory agencies [1]. This process related information is not available in the reaction databases listed in Table 1, but is needed for plant-wide design, process-operation simulation and optimization in studies related to sustainability and the economics of processes producing active pharmaceutical ingredients [39,40,41].
In this article, the developed reaction database is presented with a specific focus on reactions (including multiple reactions) taking place in pharmaceutical processes and on connecting them with process information. The reactions in this database have been categorized according to the reaction type, the target product to be produced (when single-step or multistep reactions are considered), the reaction product and the effect of the solvent use on the reacting system. Reaction conditions (temperature, pressure etc.), reaction components (reagents, catalysts etc.), reaction data (conversion, selectivity, etc.), scaling information and, finally, batch or continuous processing are included in the developed database. For each reaction entry, a description of the process exists and the references are provided. A more detailed description of the database development and structure follows later in this article.
This reaction type database, more specifically, aims to:
  • Identify reactions that are used to produce different types of products (Active Pharmaceutical Ingredients (API), Intermediates).
  • Identify reactions to be utilized, for a given compound availability.
  • Investigate the function of different type of solvents in single/multiphase reactive systems.
  • Facilitate the choice of the reaction conditions.
  • Evaluate the reaction pathway in terms of yield, cost and sustainability metrics.
  • Facilitate the reactor design from available experimental data and kinetic models.
In addition, with the process information that is included in the database and has been mentioned in points 1–6 above, the database fulfills most of the criteria defined by Zass [17] (see Table 3). Table 3 provides a comparison of the available databases with respect to the criteria given by Zass [17]. It can be noted that most of the available databases provide information for individual reactions (criterion i) and molecular structure information on reactants and products (criterion ii). However, the remaining criteria are covered only in some databases (see Table 3).

2. Reaction Database

The data required to populate a reaction database that satisfies the abovementioned objectives have been acquired from numerous published articles and patents. The collected knowledge from these sources has been structured in the database according to a developed ontology (knowledge representation) and stored for easy data retrieval and re-use in different likely applications. The database consists of classes, sub-classes, instances and objects. A class is a representation for a conceptual grouping of similar terms; it describes concepts in the domain, and classes are the focus of most ontologies. A class can have sub-classes that represent concepts that are more specific than the superclass [42]. A simplified flow-diagram, which serves as a guide to the reaction database in terms of the knowledge representation system, the classes and instances of data, the information on the available data, and where information can be found in the article, is shown in Figure 1.

2.1. Knowledge Representation

For the development of the reaction database, classes have been used to represent the main knowledge categories such as the reaction type, the reaction, phases involved, how the phases are created, solvent use, solvent function, type of solvent, reaction conditions, available data and finally operation mode (listed in Table 4 and shown in Figure 2). The first knowledge class consists of different reaction types that are commonly found in pharmaceutical processes (e.g., hydrogenation). These reaction types are called the instances of the class. The second class in the knowledge representation system (or data) is the reaction, which is divided into four sub-classes: the reactants, the reaction products, the target product and the reaction information (see Figure 3). The instances of the first three sub-classes of the second class are classified in terms of the name of the compound, the type of the compound and the molecular structure, while the fourth sub-class summarizes information for the specific reaction. This type of information is important to identify the structural changes of the compounds during the reaction. The fourth class of data consists of instances describing the phases involved in the specific reaction. It is important to note that this class connects the reaction information with the reaction performance class, which will be described later, and it has an important role in the database since, in this way, the advantages of using a multiphase or a single-phase system can be identified. The next two classes of the database consist of instances describing the solvent function, in case an organic solvent has been used in the reactive system (for example, the solvent function is “creates a second phase and removes the reaction product”), and the type and name of the organic solvent used. The last three classes of the data consist of instances describing the reaction performance under certain conditions.
The reaction conditions class consists of instances that relate to the reaction variables, such as reaction temperature, stoichiometric amount, catalyst (type and amount), pH, pressure and the need to use acid or base. The data class consists of four sub-classes: reaction data, dynamic data, kinetic model and scale. The instances of the reaction data sub-class are information related to reaction time (or residence time), conversion, selectivity, reaction yield and overall process yield (usually after isolation and purification). The instances of the dynamic data sub-class are sets of experimental data that can be used to fit or to develop a kinetic model. The next sub-class describes the availability of kinetic models that can be used either directly, or after fitting to the experimental data, for reaction optimization studies. The last sub-class of the data provides important information on the scale at which the reaction has been performed. Finally, the last class of the data is the operation mode; instances of this class are the different operational modes, such as batch reaction or flow reaction.
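The class and sub-class layout described above can be pictured as nested record types. The following sketch is one possible reading of that description and not the authors' implementation; all type and field names, and the placeholder values in the example entry, are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Compound:
    name: str            # name of the compound (N)
    compound_type: str   # type of the compound (T), e.g. "alcohol"
    structure: str       # molecular structure; storing it as text is an assumption


@dataclass
class Reaction:
    """'Reaction' class with its four sub-classes."""
    reactants: List[Compound]
    products: List[Compound]
    target_product: Optional[Compound] = None
    info: str = ""       # free-text reaction information


@dataclass
class Data:
    """'Data' class: reaction data, dynamic data, kinetic model and scale."""
    reaction_data: Dict[str, float] = field(default_factory=dict)
    dynamic_data: List[Tuple[float, float]] = field(default_factory=list)
    kinetic_model: Optional[str] = None
    scale: Optional[str] = None


@dataclass
class ReactionEntry:
    reaction_type: str
    reaction: Reaction
    phases: List[str]
    solvent_function: Optional[str]
    solvent: Optional[str]
    conditions: Dict[str, float]
    data: Data
    operation_mode: str


# Hypothetical entry; every value below is a placeholder.
entry = ReactionEntry(
    reaction_type="hydrogenation",
    reaction=Reaction(
        reactants=[Compound("ketone_K", "ketone", "<structure>")],
        products=[Compound("alcohol_A", "alcohol", "<structure>")],
        target_product=Compound("API_1", "API", "<structure>"),
    ),
    phases=["gas", "liquid", "solid"],
    solvent_function=None,
    solvent=None,
    conditions={"temperature_C": 30.0, "pressure_bar": 10.0},
    data=Data(reaction_data={"conversion": 0.95}, scale="laboratory"),
    operation_mode="batch",
)
print(entry.reaction.target_product.name, entry.operation_mode)
```

Keeping the phases, solvent and data fields on the same entry as the reaction is what allows a search on one class to return the related instances of the others.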

2.2. Database Structure

Table 4 lists the classes of the data in the first column, the second column relates the classes to the instances that an individual class contains and in the third column, the instances are listed for different classes. The structure of the database is visually shown in Figure 2.
In Figure 3, the sub-classes and the values of each instance in the “Reaction” class are illustrated. For example, each reaction has reactants, as well as reaction products, and can be used to eventually produce a target product (in the case of multi-step reactions). Each of the sub-classes takes values such as the name of the compound (N), the type of the compound (T, for example, alcohol) and the molecular structure of the compound. The reaction info sub-class takes text values that can be used to give useful insights into the reaction.

3. Statistics of the Reaction Database

To determine the range of applications and the capability of the reaction database, the statistics of the stored data within the database are needed. The statistics are given in terms of number of reactions, reaction types, list of APIs, reactions where the use of solvent improves the reaction performance and available kinetic models.

3.1. General Numbers

In this section, the general statistics of this database are given, for example, the total number of reactions, the total number of APIs, the number of intermediates, reactions that require solvent, multiphase reactions, experimental data and the type of reaction operation (batch or continuous; technology, e.g., microwave technology). The general characteristics of the reaction type database are listed in Table 5.

3.2. Reaction Types

The different reaction types included in the database are listed in Table 6, together with the number of reactions, the catalyst need, the phases (usually) involved and the solvent function if a solvent is used.

3.3. Active Pharmaceutical Ingredients (APIs)

In Table 7, the list of the available APIs (or the final drug) in the database is given. The database includes at least one pathway for each API listed in Table 7. In some cases, more than two completely different published reaction pathways (for example, for Ibuprofen) exist, which are also listed in the database. Finally, in some cases efforts have been focused on improving a certain reaction within the reaction path that has also been included in the knowledge database.

3.4. Reaction with Improved Reaction Performance When Solvent Is Used

Reaction improvements in terms of reaction time, reaction volume, yield, conversion and/or selectivity and post-processing improvement in the separation and purification steps related to solvent use are considered in database development. The functions of solvent and the possible process improvements are listed below and summarized in Table 8:
  • Reaction medium.
  • Separation of the main product in order to shift the equilibrium reaction towards the product side in order to increase the yield and/or reduce the separation steps required.
  • Separation of an inhibitory product to increase the productivity of the reaction.
  • Controlled release of substrate, which might improve the process safety in the case of hazardous compounds or increase selectivity towards the desired product.
  • Reaction volume reduction.
  • Dissolves reactants to increase the reaction rate and/or to avoid process complications when the reaction involves compounds in solid phase at the reaction conditions.
In Table 9 below, different reactive systems where solvent has been added in order to improve the reaction performance are listed. Table 9 has been classified based on the reaction type and the main product—it also gives the reaction phases, the solvent function and the reaction improvement.

3.5. Kinetic Models Available

Table 10 lists the kinetic model availability (found through literature search) and their inclusion in the reaction database kinetic model library. Some of the available kinetic models in the literature have been analyzed, validated against experimental data and, if found acceptable, then used for reaction optimization in order to establish the design space. In other cases, a model has been taken directly from the reported reference. For example, the model reported by Thakar et al. [57] for the second (hydrogenation) step of ibuprofen synthesis has been successfully used, without any modification of the kinetic parameters, to fit the dynamic experimental data published by Cho et al. [58].

4. Reaction Database Application

The reaction database has multiple features that can assist in the creation of a data-rich environment in the early stage pharmaceutical process-product development. The knowledge stored in the database is searchable by forward or backward search options. As is illustrated in Figure 2, data can be retrieved for the specific search and the retrieved data is used for reaction improvement studies in subsequent calculation-analysis.

4.1. Reaction Data

Process improvements are usually related to resources such as development cost and time. Establishing the reactions, the experimental procedure and the reaction conditions might require significant resources during the initial reaction screening that is needed to identify the reaction pathway leading to the desired type of products (e.g., chiral alcohols). However, an information-based system that can provide information for reaction identification, reaction conditions and experimental procedures can considerably reduce the time and cost of the initial screening process. The data-rich environment can also provide solutions for reaction improvements related to mass and heat transfer through the use of new technologies, such as flow reactions using, for example, new microwave technologies.
The use of experimental data (dynamic or end-points) can assist the improvement of the reaction system, as the effect of reaction variable changes can be understood and quantified. Moreover, experimental data can be used to develop or to fit kinetic models that capture the behavior of the system under different conditions. These kinetic models can be used for validation studies, for optimization studies to identify improved reaction conditions, and to evaluate different operation scenarios and/or different reactor designs and networks.
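As a minimal sketch of what fitting a kinetic model to dynamic data involves, the example below estimates the rate constant of a first-order decay, C(t) = C0·exp(−k·t), by linear least squares on ln C versus t. The first-order form is an assumption chosen for simplicity and is not one of the models in the database; the concentration values are synthetic, generated from the model itself, and the function name is hypothetical.

```python
import math
from typing import Sequence, Tuple


def fit_first_order(times: Sequence[float],
                    concentrations: Sequence[float]) -> Tuple[float, float]:
    """Fit ln C = ln C0 - k*t by ordinary least squares; return (k, C0)."""
    log_c = [math.log(c) for c in concentrations]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(log_c) / n
    s_tt = sum((t - t_mean) ** 2 for t in times)
    s_ty = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, log_c))
    slope = s_ty / s_tt
    intercept = y_mean - slope * t_mean
    return -slope, math.exp(intercept)


# Synthetic "dynamic data": generated from k = 0.5 1/h and C0 = 1.0 mol/L,
# purely to show that the fit recovers the parameters. Not experimental data.
times_h = [0.0, 1.0, 2.0, 3.0, 4.0]
conc = [1.0 * math.exp(-0.5 * t) for t in times_h]

k, c0 = fit_first_order(times_h, conc)
conversion_at_6h = 1.0 - math.exp(-k * 6.0)  # model prediction outside the data
print(round(k, 3), round(c0, 3), round(conversion_at_6h, 3))
```

Once the parameters are fitted, the same expression can be evaluated at other times or embedded in a reactor model, which is the step that makes optimization and scenario evaluation possible.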

4.2. Organic Solvents

Another class of process improvement is related to the solvent role during the synthesis step. There are cases where solvent use might enhance the reaction performance. Solvents might have different roles such as creating a second phase to remove an inhibitory product and shift the reaction equilibrium towards the product side, or simply it can create the second phase to remove the product in order to facilitate the following separation procedure. The solvent can also be used as a carrier for the controlled release of the substrate in the reaction mixture, which can minimize the amount of by-products produced when the concentration of substrate is high. The solvent can also have a role as the medium of the reaction and broaden the reaction conditions in order to improve reaction performance or satisfy other process concerns such as process safety. For example, if a reaction takes place at very low temperatures (<−25 °C), the solvent should be liquid at this condition and have the ability to dissolve the reactants, products and catalyst [70].

4.3. Search Options

The search options of the database in terms of both the retrieved data and the use of that data for a defined process are given below.
  • Search for reaction types
    Different reaction types can be searched in the reaction database. The retrieved results provide information for the reaction (reactants, product and target product), the solvent role and how it improves the reaction, reaction conditions (e.g., temperature range, acid/base, different catalysts), quantitative data (e.g., conversion, concentration vs. time) and, finally, applicability information such as scale or batch/continuous mode. The results can be used as a similarity check and to identify reaction conditions, solvents and possibilities for improvement (e.g., equipment, production mode, technology) for quick reaction optimization.
  • Search for main products (such as APIs or intermediates or type of products like chiral alcohols)
    By searching for main products or types of products, the reactions that are used to synthesize this type of compound can be retrieved. The results are used to identify different ways for synthesis and to evaluate them in terms of reaction performance, cost, scalability and sustainability.
  • Search for reactants
    The results obtained by searching for reactants are used to identify ways of utilizing them further, in case they have been used or produced during a reaction.
  • Multiphase reactions
    Multiphase and single-phase reactions where the solvent use has improved the reaction performance can be searched. The retrieved results are used to identify the role of the multiphase system (for example, the solvent creates a second phase to remove an inhibitory by-product) and to quantify the improvement in reaction performance (for example, increased conversion).
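The search options above can be sketched as simple filters over reaction records. The record layout, the compound names and the function names are hypothetical, and reading "forward" as a search from a reactant and "backward" as a search from a product is an assumption about how the two directions are meant.

```python
from typing import Dict, List

# Hypothetical records; all names and values are placeholders.
RECORDS: List[Dict] = [
    {
        "reaction_type": "hydrogenation",
        "reactants": ["ketone_K"],
        "products": ["alcohol_A"],
        "target_product": "API_1",
        "phases": ["gas", "liquid", "solid"],
        "solvent_function": None,
    },
    {
        "reaction_type": "esterification",
        "reactants": ["alcohol_A", "acid_B"],
        "products": ["ester_E"],
        "target_product": "API_1",
        "phases": ["liquid", "liquid"],
        "solvent_function": "creates a second phase and removes the reaction product",
    },
]


def by_reaction_type(records: List[Dict], reaction_type: str) -> List[Dict]:
    """Search for reaction types."""
    return [r for r in records if r["reaction_type"] == reaction_type]


def backward(records: List[Dict], product: str) -> List[Dict]:
    """Search from a main product: reactions that make it or lead to it."""
    return [r for r in records
            if product in r["products"] or r["target_product"] == product]


def forward(records: List[Dict], reactant: str) -> List[Dict]:
    """Search from a reactant: reactions that consume it."""
    return [r for r in records if reactant in r["reactants"]]


def multiphase_with_solvent(records: List[Dict]) -> List[Dict]:
    """Multiphase reactions in which a solvent function has been recorded."""
    return [r for r in records
            if len(r["phases"]) > 1 and r["solvent_function"]]


print([r["reaction_type"] for r in backward(RECORDS, "API_1")])
print([r["reaction_type"] for r in forward(RECORDS, "alcohol_A")])
```

Chaining a backward search on a target with forward searches on its intermediates is one way to assemble the multi-step pathways that the database stores.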
To summarize, the information retrieved from the reaction database can be used to:
  • Identify reaction pathways, reaction types, reactants, catalysts, solvents and base/acid.
  • Optimize reaction conditions.
  • Investigate the solvent role in process improvement.
  • Optimize the process development of the identified reactions in terms of cost, yield and time.
  • Improve the overall process performance in terms of separation process, overall yield, sustainability, safety, scalability, controllability and utilized mass.
  • Improve reactor design and evaluate different reactor designs.
  • Establish operation procedure for the reactors.
  • Assist in plant-wide design, simulation, and techno-economic optimization.
  • Enhance process understanding.

5. Application Example: Ibuprofen Synthesis and Evaluation

5.1. Problem Definition

To illustrate the applicability of the database, the synthesis of ibuprofen is selected as an example. The objectives of this example are:
  • To retrieve data relevant to the reaction pathways of ibuprofen.
  • To collect data related to the individual reactions.
  • To evaluate the alternatives based on green metrics.
    Database Search: “Main Product = ‘Ibuprofen’”.
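The search above can be sketched in a few lines of code. This is a minimal illustration and not the database's actual implementation: the field names (`main_product`, `pathway`, `step`, `reaction_type`, `reference`) and the `search` helper are hypothetical, and the records only restate the reaction types and references reported for the three pathways in Section 5.1.1.

```python
# Illustrative records only: the field names are hypothetical and the entries
# restate the reaction types and references given in Section 5.1.1.
PATHWAYS = {
    1: (["Friedel-Crafts acylation", "Hydrogenation", "Carbonylation"], "[71]"),
    2: (["Friedel-Crafts acylation", "1,2-Aryl migration", "Saponification"], "[73]"),
    3: (["Friedel-Crafts acylation", "1,2-Aryl migration", "Saponification"], "[74]"),
}

RECORDS = [
    {"main_product": "Ibuprofen", "pathway": pathway, "step": step,
     "reaction_type": reaction_type, "reference": reference}
    for pathway, (reaction_types, reference) in PATHWAYS.items()
    for step, reaction_type in enumerate(reaction_types, start=1)
]


def search(records, **criteria):
    """Return the records whose fields equal every criterion (case-insensitive)."""
    def matches(record):
        return all(
            str(record.get(field, "")).lower() == str(wanted).lower()
            for field, wanted in criteria.items()
        )
    return [record for record in records if matches(record)]


# Equivalent of the query: Main Product = 'Ibuprofen'
hits = search(RECORDS, main_product="ibuprofen")
pathways_found = sorted({record["pathway"] for record in hits})
print(len(hits), "reactions found in pathways", pathways_found)
```

The same helper accepts other fields, for example `search(RECORDS, reaction_type="hydrogenation")`, which mirrors the search by reaction type described in Section 4.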

5.1.1. Database Results

The main product sub-class is found in the “Reaction” class; from there, upstream information (reactants, reaction types) and downstream information (solvents, reaction conditions, data, etc.) are retrieved. The retrieved information is shown in Figure 4, which contains three screenshots for the purpose of illustration. Screenshot-1 connects the main product (ibuprofen) to the reaction type data; screenshot-2 connects the main product to the specific reaction information (temperature, pressure, solvent use, etc.); screenshot-3 connects the main product to modelling details (for example, kinetic model). The information is also given in the text as follows:
  • Summary of the findings (reaction pathways, reaction types, operation mode, available data and reference).
  • For each reaction pathway, each reaction is analyzed in terms of:
    • Reactants, products, by-products, acids/bases, solvents, catalysts.
    • The reaction conditions for each reaction are then presented.
    • Finally, the reaction data are presented.
The retrieved information is used for the evaluation of different pathways to produce ibuprofen using the green chemistry metrics.
The database search gives three different reaction pathways. Pathway 1, proposed by Elango et al. [71], consists of three batch reaction steps: a Friedel–Crafts acylation, a hydrogenation and, finally, a carbonylation. The first step has been improved by Lindley et al. [72] using a continuous counter-flow reaction-separation system, which enables the recovery and recycle of the solvent and the unreacted reactants. The hydrogenation and carbonylation steps each take place in a fed-batch reactor. Pathway 2, proposed by Snead et al. [73], also consists of three reaction steps (a Friedel–Crafts acylation, a 1,2-aryl migration and a saponification), all of which take place in a continuous flow reactor. Finally, pathway 3, proposed by Bogdan et al. [74], consists of the same three reaction steps as pathway 2, although the intermediates and reactants are different. Table 11 gives a summary of the reaction pathways retrieved from the database.
The details for reaction pathways 1 and 3 are given in the supplementary material (see Sections A.1 and A.2 for pathways 1 and 3 respectively) while the retrieved data for reaction pathway 2 are given and analyzed in the text below.

5.1.2. Pathway 2: Ibuprofen Synthesis

The individual reaction details for the reaction pathway proposed by Snead et al. [73] are presented in Table 12, where the reaction is given in terms of reactants and reaction product for each step and the overall reaction pathway is illustrated in Figure 5. The stoichiometric amounts of the reactants, the solvents, the catalyst, acid/base and by-products are also given in Table 12 for the three reaction steps involved in this pathway.
The reaction conditions in terms of temperature, pressure, residence time, catalyst amount and solvent amount are listed in Table 13 for all the reaction steps.
The retrieved experimental data are given in Table 14 in terms of conversion, selectivity, overall reaction yield, experimental data and model availability.

5.1.3. Reaction Pathways Evaluation through the “Green” Metrics

A simple evaluation based on green metrics [38] has been performed and the results are illustrated in Figure 6. For this analysis, pathway 1 (the BHC pathway) with and without recycling of HF and IBB, pathway 2 proposed by Snead et al. [73] and pathway 3 proposed by Bogdan et al. [74] have been considered. The effective mass yield, which is the ratio of the mass of product (kg) to the total mass of non-benign reactants, has been evaluated first. As shown in Figure 6a, step 1 of the BHC synthesis requires larger amounts of non-benign reactants than pathways 2 and 3, whereas reaction steps 2 and 3 require much smaller amounts. Another metric that has been evaluated is the mass intensity (MI), which is the total mass required for the reaction per kg of product. Figure 6b shows that the first reaction steps of pathways 2 and 3 require less reactant mass than the BHC pathway when recycling is not considered. However, when recycling is considered, the MI is lower for the BHC pathway than for the other two pathways, where recycling is not possible. In addition, pathway 2 requires far less reactant mass than pathway 3.
The E-factor, which is the waste generated per kg of product, has been evaluated for all four cases (Figure 6c). The first step of the BHC pathway has been found to be the main contributor to the E-factor: although step 1 produces only a small amount of waste during the reaction, the large E-factor is caused by the large amounts of solvent and reactant required. When the solvent and the reactant are recycled back into the reactor, the E-factor decreases dramatically and its small remaining value is due to the small amount of waste and the non-recovered solvent and reactant (~1%) [72]. The other two pathways (2 and 3) have relatively high E-factor values, which means that larger amounts of waste are generated through the synthesis steps. The waste generated by pathway 2 has been found to be slightly lower than that of pathway 3. Finally, the atom efficiency has been evaluated for all the pathways and is illustrated in Figure 6d. The atom efficiency of the BHC pathway is very high, so most of the reactant atoms remain in the final product, whereas the atom efficiencies of the two newer pathways are much lower, which means that pathways 2 and 3 might generate more waste than the batch process. Note that each “green” metric should be interpreted and analyzed individually for each reaction pathway, as the metrics represent different aspects of the process (for example, waste generation and total mass used per kg of product). Therefore, an overall conclusion about the “green extent” of the reaction pathways using weighted individual metrics cannot easily be made.
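The effect of recycling on the MI and E-factor metrics can be illustrated with a short calculation. The sketch below is only an illustration under stated assumptions: the masses are hypothetical and are not taken from the database, recovered material is assumed to be returned to the reactor so that only the non-recovered share counts as fresh input, and all input mass that does not leave as product is counted as waste. The function names are likewise illustrative.

```python
def mass_intensity(total_mass_in_kg, product_kg):
    """MI: total mass used in a process step per kg of product."""
    return total_mass_in_kg / product_kg


def e_factor(total_waste_kg, product_kg):
    """E-factor: total waste per kg of product."""
    return total_waste_kg / product_kg


def step_metrics(consumed_kg, recyclable_kg, product_kg, recovered_fraction=0.0):
    """Compute MI and E-factor for one reaction step.

    Assumption: recovered material is returned to the reactor, so only the
    non-recovered share counts as fresh input; everything that enters and
    does not leave as product is counted as waste.
    """
    fresh_input_kg = consumed_kg + recyclable_kg * (1.0 - recovered_fraction)
    waste_kg = fresh_input_kg - product_kg
    return {
        "MI": mass_intensity(fresh_input_kg, product_kg),
        "E-factor": e_factor(waste_kg, product_kg),
    }


# Hypothetical masses for a single step, chosen only for illustration.
without_recycle = step_metrics(consumed_kg=1.5, recyclable_kg=20.0, product_kg=1.0)
with_recycle = step_metrics(consumed_kg=1.5, recyclable_kg=20.0, product_kg=1.0,
                            recovered_fraction=0.99)

for label, metrics in (("no recycle", without_recycle), ("99% recycle", with_recycle)):
    print(f"{label}: MI = {metrics['MI']:.2f}, E-factor = {metrics['E-factor']:.2f}")
```

Under this simplified accounting the two metrics differ by exactly one (E-factor = MI − 1), which is why a step dominated by solvent and excess reactant shows a large value for both metrics until the recycle is taken into account.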

6. Conclusions

In this article, a reaction database has been developed to assist pharmaceutical process development during the early stages of synthesis route selection and process-product development by providing enhanced process understanding. A data-rich environment is proposed for this task, where knowledge can be collected, stored and retrieved. The focus of this database is on pharmaceutical processes and the multiphase reactions taking place within them. The reactions in this database are represented in terms of reaction type, target product (when single-step or multistep reactions are considered), reaction product and the effect of solvent use in the reacting system. The information contained in the database includes reaction conditions (temperature, pressure, etc.), reaction components (reagents, catalysts, etc.), reaction data (conversion, selectivity, dynamic data sets and kinetic models), scaling information and, finally, batch or continuous processing. For each reaction entry, a description of the process is provided together with literature references.
Reaction data collection, together with the development of an appropriate knowledge representation system, is a crucial and very challenging task. Verification of the consistency of the data is also necessary, but consistency tests are not yet available, except for some phase equilibrium data.
The application of the database has been highlighted by retrieving data for the synthesis of ibuprofen and using the retrieved data to evaluate the identified reaction pathways with “green” metrics. This reaction database can be used to provide important information during the development of pharmaceutical processes at the early stages of process design. The reaction database covers chemical and biochemical reactions, and the future aim is to extend it in terms of reactions and pathways to cover a wider range of reaction systems and products. Many of the multiphase or single-phase reactions in the database have been improved through the use of solvents. The solvents are either organic solvents or ionic solvents and, in some cases, the extra phase is created by a resin, especially for biochemical processes.

Supplementary Materials

Reaction data and reaction pathways for ibuprofen synthesis. The following are available online at

Author Contributions

This research was carried out in collaboration between all authors. Emmanouil Papadakis and Amata Anantpinijwatna collected the reaction data. Emmanouil Papadakis designed the database, performed the analysis and drafted the manuscript, which is based on his PhD thesis. John M. Woodley and Rafiqul Gani supervised the research work and revised the manuscript.

Conflicts of Interest

The authors declare no conflict of interest.


  1. Caron, S.; Thomson, N.M. Pharmaceutical process chemistry: Evolution of a contemporary data-rich laboratory environment. J. Org. Chem. 2015, 80, 2943–2958.
  2. Gasteiger, J. Chemoinformatics: Achievements and challenges, a personal view. Molecules 2016, 21, 151.
  3. Warr, W.A. A short review of chemical reaction database systems, computer-aided synthesis design, reaction prediction and synthetic feasibility. Mol. Inform. 2014, 33, 469–476.
  4. Corey, E.J. General methods of synthetic analysis. Strategic bond disconnections for bridged polycyclic structures. J. Am. Chem. Soc. 1975, 97, 6116–6124.
  5. Bersohn, M.; Esack, A. Computers and organic synthesis. Chem. Rev. 1976, 76, 269–282.
  6. Corey, E.J.; Jorgensen, E.J. Computer-Assisted Synthetic Analysis. Synthetic Strategies Based on Appendages and the Use of Reconnective Transforms. J. Am. Chem. Soc. 1976, 98, 189–203.
  7. Agarwal, K.K.; Larsen, T.D.L.; Gelernter, H.L. Application of chemical transforms in synchem2, a computer program for organic synthesis route discovery. Comput. Chem. 1978, 2, 75–84.
  8. Wipke, T.W.; Ouchi, G.I.; Krishnan, S. Simulation and Evaluation of Chemical Synthesis—SECS: An Application of Artificial Intelligence Techniques. Artif. Intell. 1978, 11, 173–193.
  9. Ravitz, O. Data-driven computer aided synthesis design. Drug Discov. Today Technol. 2013, 10, 443–449.
  10. Bogevig, A.; Federsel, H.J.; Huerta, F.; Hutchings, M.G.; Kraut, H.; Langer, T.; Low, P.; Oppawsky, C.; Rein, T.; Saller, H. Route design in the 21st century: The IC SYNTH software tool as an idea generator for synthesis prediction. Org. Process Res. Dev. 2015, 19, 357–368.
  11. Salatin, T.D.; Jorgensen, W.L. Computer-assisted mechanistic evaluation of organic reactions. 1. Overview. J. Org. Chem. 1980, 45, 2043–2051.
  12. Chen, J.H.; Baldi, P. No electron left behind: A rule-based expert system to predict chemical reactions and reaction mechanisms. J. Chem. Inf. Model. 2009, 49, 2034–2043.
  13. Kayala, M.A.; Baldi, P. A Machine Learning Approach to Predict Chemical Reactions. Adv. Neural Inf. Process. Syst. 2011, 747–755.
  14. Kayala, M.A.; Azencott, C.A.; Chen, J.H.; Baldi, P. Learning to predict chemical reactions. J. Chem. Inf. Model. 2011, 51, 2209–2222.
  15. Gothard, C.M.; Soh, S.; Gothard, N.A.; Kowalczyk, B.; Wei, Y.; Baytekin, B.; Grzybowski, B.A. Rewiring chemistry: Algorithmic discovery and experimental validation of one-pot reactions in the network of organic chemistry. Angew. Chem. Int. Ed. 2012, 51, 7922–7927.
  16. Science of Synthesis. Thieme Chemistry. Available online: (accessed on 7 July 2016).
  17. Zass, E. Databases of Chemical Reactions. In Handbook of Chemoinformatics: From Data to Knowledge in 4 Volumes; Gasteiger, J., Ed.; Wiley-VCH Verlag GmbH: Weinheim, Germany, 2003; pp. 667–699.
  18. Reactions-CASREACT. Available online: (accessed on 1 July 2016).
  19. Reaxys. Available online: (accessed on 7 July 2016).
  20. ChemInform. Available online: (accessed on 7 July 2016).
  21. Current Chemical Reaction. Available online: (accessed on 7 July 2016).
  22. The Chemogenesis web book. Available online: (accessed on 7 July 2016).
  23. Organic Synthesis. Available online: (accessed on 7 July 2016).
  24. Reaction Database. Available online: (accessed on 7 July 2016).
  25. Synthetic Pages. Available online: (accessed on 7 July 2016).
  26. The Chemical Thesaurus. Available online: (accessed on 7 July 2016).
  27. Webreactions. Available online: (accessed on 7 July 2016).
  28. SPRECI InfoChem. Available online: (accessed on 7 July 2016).
  29. Synthetic Reaction. Available online: (accessed on 7 July 2016).
  30. Biotage PathFinder. Available online: (accessed on 7 July 2016).
  31. e-EROS. Available online: (accessed on 7 July 2016).
  32. FlowReact. Available online: (accessed on 7 July 2016).
  33. Protecting Groups. Available online: (accessed on 7 July 2016).
  34. Masek, B.B.; Baker, D.S.; Dorfman, R.J.; DuBrucq, K.; Francis, V.C.; Nagy, S.; Richey, B.L.; Soltanshahi, F. Multistep Reaction Based De Novo Drug Design: Generating synthetically feasible design ideas. J. Chem. Inf. Model. 2016, 56, 605–620.
  35. Databases M. Theilheimer. Available online: (accessed on 7 July 2016).
  36. Butters, M.; Catterick, D.; Craig, A.; Curzons, A.; Dale, D.; Gillmore, D.; Green, S.P.; Marziano, I.; Sherlock, J.-P.; White, W. Critical assessment of pharmaceutical processes - A rationale for changing the synthetic route. Chem. Rev. 2006, 106, 3002–3027.
  37. Cervera-Padrell, A.E.; Skovby, T.; Kiil, S.; Gani, R.; Gernaey, K.V. Active pharmaceutical ingredient (API) production involving continuous processes--a process system engineering (PSE)-assisted design framework. Eur. J. Pharm. Biopharm. 2012, 82, 437–456.
  38. Constable, D.J.C.; Curzons, A.D.; Cunningham, V.L. Metrics to “green” chemistry-which are the best? Green Chem. 2002, 4, 521–527.
  39. Jolliffe, H.G.; Gerogiorgis, D.I. Plantwide design and economic evaluation of two continuous pharmaceutical manufacturing (CPM) cases: Ibuprofen and artemisinin. Comput. Chem. Eng. 2016, 91, 269–288.
  40. Schaber, S.D.; Gerogiorgis, D.I.; Ramachandran, R.; Evans, J.M.B.; Barton, P.I.; Trout, B.L. Economic Analysis of Integrated Continuous and Batch Pharmaceutical Manufacturing: A Case Study. Ind. Eng. Chem. Res. 2011, 50, 10083–10092.
  41. Jolliffe, H.G.; Gerogiorgis, D.I. Technoeconomic optimisation and comparative environmental impact evaluation of continuous crystallisation and antisolvent selection for artemisinin recovery. Comput. Chem. Eng. 2016, 91, 269–288.
  42. Singh, R.; Gernaey, K.V.; Gani, R. An ontological knowledge-based system for the selection of process monitoring and analysis tools. Comput. Chem. Eng. 2010, 34, 1137–1154.
  43. Pfruender, H.; Amidjojo, M.; Kragl, U.; Weuster-Botz, D. Efficient whole-cell biotransformation in a biphasic ionic liquid/water system. Angew. Chem. Int. Ed. 2004, 43, 4529–4531.
  44. Kopach, M.E.; Murray, M.M.; Braden, T.M.; Kobierski, M.E.; Williams, O.L. Improved Synthesis of 1-(Azidomethyl)-3,5-bis-(trifluoromethyl)benzene: Development of Batch and Microflow Azide Processes. Org. Process Res. Dev. 2009, 13, 152–160.
  45. Li, P.; Buchwald, S.L. Continuous-flow synthesis of 3,3-disubstituted oxindoles by a palladium-catalyzed α-arylation/alkylation sequence. Angew. Chem. Int. Ed. Engl. 2011, 50, 6396–6400.
  46. Domier, R.C.; Moore, J.N.; Shaughnessy, K.H.; Hartman, R.L. Kinetic Analysis of Aqueous-Phase Pd-Catalyzed, Cu-Free Direct Arylation of Terminal Alkynes Using a Hydrophilic Ligand. Org. Process. Res. Dev. 2013, 17, 1262–1271.
  47. Xin, J.Y.; Li, S.B.; Xu, Y.; Wang, L.L. Enzymatic resolution of (S)-(+)-naproxen in a trapped aqueous-organic solvent biphase continuous reactor. Biotechnol. Bioeng. 2000, 68, 78–83.
  48. Wang, Z.; Wang, L.; Xu, J.H.; Bao, D.; Qi, H. Enzymatic hydrolysis of penicillin G to 6-aminopenicillanic acid in cloud point system with discrete countercurrent experiment. Enzyme Microb. Technol. 2007, 41, 121–126.
  49. Shin, J.S.; Kim, B.G. Transaminase-catalyzed asymmetric synthesis of L-2-aminobutyric acid from achiral reactants. Biotechnol. Lett. 2009, 31, 1595–1599.
  50. Tufvesson, P.; Lima-Ramos, J.; Jensen, J.S.; Al-Haque, N.; Neto, W.; Woodley, J.M. Process considerations for the asymmetric synthesis of chiral amines using transaminases. Biotechnol. Bioeng. 2011, 108, 1479–1493.
  51. Houng, J.Y.; Tseng, J.C.; Hsu, H.F.; Wu, J.Y. Kinetic investigation on asymmetric bioreduction of ethyl 4-chloro acetoacetate catalyzed by baker’s yeast in an organic solvent-water biphasic system. Korean J. Chem. Eng. 2008, 25, 1427–1433.
  52. Houng, J.Y.; Liau, J.S. Applying slow-release biocatalysis to the asymmetric reduction of ethyl 4-chloroacetoacetate. Biotechnol. Lett. 2003, 25, 17–21.
  53. Papadogianakis, G.; Maat, L.; Sheldon, R.A. Catalytic Conversions in Water. Part 5: Carbonylation of 1-(4-Isobutylphenyl)ethanol to Ibuprofen Catalysed by Water-Soluble Palladium-Phosphine Complexes in a Two-Phase System. J. Chem. Technol. Biotechnol. 1997, 70, 83–91.
  54. Chaudhari, R.V.; Mills, P.L. Multiphase catalysis and reaction engineering for emerging pharmaceutical processes. Chem. Eng. Sci. 2004, 59, 5337–5344.
  55. Savile, C.K.; Janey, J.M.; Mundorff, E.C.; Moore, J.C.; Tam, S.; Jarvis, W.R.; Colbeck, J.C.; Krebber, A.; Fleitz, F.J.; Brands, J.; et al. Biocatalytic asymmetric synthesis of chiral amines from ketones applied to sitagliptin manufacture. Science 2010, 329, 305–309.
  56. Dunn, P.J. The importance of green chemistry in process research and development. Chem. Soc. Rev. 2012, 41, 1452–1461.
  57. Thakar, N.; Berger, R.J.; Kapteijn, F.; Moulijn, J.A. Modelling kinetics and deactivation for the selective hydrogenation of an aromatic ketone over Pd/SiO2. Chem. Eng. Sci. 2007, 62, 5322–5329.
  58. Cho, H.-B.; Lee, B.U.; Ryu, C.-H.; Nakayama, T.; Park, Y.-H. Selective hydrogenation of 4-isobutylacetophenone over a sodium-promoted Pd/C catalyst. Korean J. Chem. Eng. 2013, 30, 306–313.
  59. Cervera-Padrell, A.E. Moving from Batch towards Continuous Organic-Chemical Pharmaceutical Production. Ph.D. Thesis, Technical University of Denmark, Kgs. Lyngby, Denmark, 2011.
  60. Bhatia, S.; Long, W.S.; Kamaruddin, A.H. Enzymatic membrane reactor for the kinetic resolution of racemic ibuprofen ester: Modeling and experimental studies. Chem. Eng. Sci. 2004, 59, 5061–5068.
  61. Shankar, S.; Agarwal, M.; Chaurasia, S.P. Study of reaction parameters and kinetics of esterification of lauric acid with butanol by immobilized Candida antarctica lipase. Indian J. Biochem. Biophys. 2013, 50, 570–576.
  62. Shin, J.S.; Kim, B.G. Substrate inhibition mode of ω-transaminase from Vibrio fluvialis JS17 is dependent on the chirality of substrate. Biotechnol. Bioeng. 2002, 77, 832–837.
  63. Al-Haque, N.; Santacoloma, P.A.; Neto, W.; Tufvesson, P.; Gani, R.; Woodley, J.M. A robust methodology for kinetic model parameter estimation for biocatalytic reactions. Biotechnol. Prog. 2012, 28, 1186–1196.
  64. Diender, M.B.; Straathof, A.J.J.; van der Does, T.; Ras, C.; Heijnen, J.J. Equilibrium Modeling of Extractive Enzymatic Hydrolysis of Penicillin G with Concomitant 6-Aminopenicillanic Acid Crystallization. Biotechnol. Bioeng. 2002, 78, 395–402.
  65. Den Hollander, J.L.; Zomerdijk, M.; Straathof, A.J.J.; Van Der Wielen, L.A.M. Continuous enzymatic penicillin G hydrolysis in countercurrent water-butyl acetate biphasic systems. Chem. Eng. Sci. 2002, 57, 1591–1598.
  66. Jayasree, S.; Seayad, A.; Chaudhari, R.V. Novel palladium(II) complex containing a chelating anionic N-O ligand: efficient carbonylation catalyst. Org. Lett. 2000, 2, 203–206.
  67. Seayad, A.; Seayad, J.; Mills, P.L.; Chaudhari, R.V. Kinetic Modeling of Carbonylation of 1-(4-Isobutylphenyl)ethanol Using a Homogeneous PdCl2(PPh3)2/TsOH/LiCl Catalyst System. Ind. Eng. Chem. Res. 2003, 42, 2496–2506.
  68. Seayad, A.; Kelkar, A.A.; Chaudhari, R.V. Carbonylation of p-isobutyl phenylethanol to ibuprofen using palladium catalyst: activity and selectivity studies. Stud. Surf. Sci. Catal. 1998, 113, 883–889.
  69. Rajashekharam, M.V.; Chaudhari, R.V. Kinetics of Hydrogenation of p-isobutyl acetophenone using a supported Ni catalyst in a slurry reactor. Chem. Eng. Sci. 1996, 51, 1663–1672.
  70. Watson, D.; Dowdy, E.D.; Depue, J.S.; Kotnis, A.S.; Leung, S.; Reilly, B.C.O. Development of a Safe and Scalable Oxidation Process for the Preparation of 6-Hydroxybuspirone: Application of In-Line Monitoring for Process Ruggedness and Product Quality. Org. Process Res. Dev. 2004, 8, 616–623.
  71. Elango, V.; Murphy, M.; Smith, B.L.; Davenport, K.G.; Mott, G.N.; Zey, E.G.; Moss, G.L. Method for Producing Ibuprofen. U.S. Patent 4,981,995 A, 1 January 1991.
  72. Lindley, D.D.; Curtis, T.A.; Ryan, T.R.; de la Garza, E.M.; Hilton, C.B.; Kenesson, T.M. Process for the Production of 4-Isobutylacetophenone. U.S. Patent 5,068,448 A, 26 November 1991.
  73. Snead, D.R.; Jamison, T.F. A three-minute synthesis and purification of ibuprofen: Pushing the limits of continuous-flow processing. Angew. Chem. Int. Ed. 2015, 54, 983–987.
  74. Bogdan, A.R.; Poe, S.L.; Kubis, D.C.; Broadwater, S.J.; McQuade, D.T. The continuous-flow synthesis of Ibuprofen. Angew. Chem. Int. Ed. 2009, 8547–8550.
Figure 1. Simplified flow-diagram highlighting the contents of the reaction database. Figure 2 and Figure 3 provide details of the knowledge representation system, Table 4 provides information on the classification of the data and Table 5, Table 6, Table 7, Table 8, Table 9 and Table 10 provide information on the available data.
Figure 2. Knowledge representation system of the reaction database.
Figure 3. Sub-classes and instance/individuals for the reaction class of the database.
Figure 4. Reaction database search results for “Main Product = Ibuprofen”.
Figure 5. Reaction pathway proposed by Snead et al. for the continuous flow synthesis of ibuprofen.
Figure 6. “Green” metrics evaluation for the reaction pathways found in the reaction database: (a) effective mass yield (EM) metric, (b) mass intensity (MI) metric, (c) E-Factor metric and (d) atom efficiency metric.
Table 1. Database review. All the databases have been summarized with respect to the number of reactions and the focus of the database.
Database | Number of Reactions | Criteria [17] | Reference
CASREACT | >74.9 million (1840–present) | i, iv, v, vi | [18]
REAXYS (previously CrossFire Beilstein) | 40.7 million (1771–present) | i, ii, iv, vii | [19]
Theilheimer | >72200 (1946–1980) | i, v, vi, vii | [35]
ChemInform RX | >2 million (since 1990–present) | i, iv, vi | [20]
Current chemical reactions | 1,083,758 (1840–present) | i, vi | [21]
Methods in organic synthesis | 33,000 (1999–2014) | i, vii | [29]
Reference library of synthetic methodology | 209.800 (1946–2001) | i | [17]
ChemReact | 3.5 million reactions (1974–1998) | i, vii | [17]
Chemogenesis | - | ii, iii | [22]
Organic synthesis | >6000 (1921–present) | i, ii, v, vi, vii | [23]
Reaction Database-Chemical Synthesis | - | i, ii | [24]
Synthetic Pages | 292 | i, ii, vi, vii | [25]
The chemical thesaurus | 4000 | i, ii | [26]
WebReactions | >400,000 | i, ii, iii | [27]
Biotage Pathfinder (reaction assisted with microwave technology) | >1000 | i, vi, vii, viii | [30]
e-EROS Encyclopedia of Reagents for Organic Synthesis | >70,000 (4000 *) | i, ii | [31]
FlowReact Search | >2000 | i (reaction in flow) | [32]
Protecting groups | - | i | [33]
Science of Synthesis (previously Houben-Weyl) | 240,000 (early 1800s–present) | i, ii, iii | [16]
SPRESI | 4.6 million | i, ii, iii | [28]
Table 2. List of metrics that have been proposed for “green” chemistry (reviewed by Constable et al. [38]).
Metric | Explanation | Equation
Effective mass yield (EM) | The percentage of the mass of product over the overall mass of non-benign compounds used during the synthesis. | $\mathrm{EM}\,(\%) = \dfrac{\text{Mass of products (kg)}}{\text{Mass of non-benign reagents (kg)}} \times 100\%$
E-factor | The mass of total waste produced for a given amount of product. | $E\text{-factor} = \dfrac{\text{Total waste (kg)}}{\text{kg product}}$
Atom economy | How much of the reactants remain in the product. | $\text{Atom economy}\,(\%) = \dfrac{MW_{P}}{\sum MW_{A,B,D,F,G,I}} \times 100$
Mass intensity (MI) | Total mass used to produce the product. | $\mathrm{MI} = \dfrac{\text{Total mass used in a process or process step (kg)}}{\text{Mass of product (kg)}}$
Carbon efficiency | Percentage of carbon of the reactants that remains in the final product. | $\text{Carbon efficiency}\,(\%) = \dfrac{\text{Amount of carbon in product}}{\text{Total carbon present in reactants}} \times 100$
Reaction mass efficiency (RME) | Mass of reactants remaining in the product. | $\mathrm{RME}\,(\%) = \dfrac{\text{Mass of product (kg)}}{\text{Mass of reactants (kg)}} \times 100$
Where A, B, D, F, G, I: reactants; P: product; MW: molecular weight.
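As a worked illustration of the atom economy and carbon efficiency definitions in Table 2, the sketch below evaluates both for an overall ibuprofen stoichiometry. The stoichiometry used (isobutylbenzene + acetic anhydride + H2 + CO giving ibuprofen, with HF treated as a recycled catalyst/solvent and therefore excluded) is an assumption based on the commonly quoted description of the BHC route, not data retrieved from the database; the function names and the rounded atomic masses are likewise illustrative.

```python
# Rounded standard atomic masses in g/mol.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999}


def molecular_weight(composition):
    """Molecular weight from an element -> atom-count mapping."""
    return sum(ATOMIC_MASS[element] * count for element, count in composition.items())


def atom_economy(product, reactants):
    """Atom economy (%) = MW(P) / sum of MW(reactants) x 100."""
    total = sum(molecular_weight(reactant) for reactant in reactants)
    return 100.0 * molecular_weight(product) / total


def carbon_efficiency(product, reactants):
    """Carbon efficiency (%) for stoichiometric amounts of all reactants."""
    total_carbon = sum(reactant.get("C", 0) for reactant in reactants)
    return 100.0 * product.get("C", 0) / total_carbon


IBUPROFEN = {"C": 13, "H": 18, "O": 2}
# Assumed overall stoichiometry (one mole of each reactant per mole of product).
REACTANTS = [
    {"C": 10, "H": 14},        # isobutylbenzene
    {"C": 4, "H": 6, "O": 3},  # acetic anhydride
    {"H": 2},                  # hydrogen
    {"C": 1, "O": 1},          # carbon monoxide
]

ae = atom_economy(IBUPROFEN, REACTANTS)
ce = carbon_efficiency(IBUPROFEN, REACTANTS)
print(f"Atom economy: {ae:.1f}%, carbon efficiency: {ce:.1f}%")
```

Both values are stoichiometric upper bounds: they ignore yield, solvents and excess reagents, which is why they are complemented by mass-based metrics such as MI and the E-factor.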
Table 3. List of available databases and the criteria [17] they fulfill. Criterion: (i) individual records of reactions, (ii) chemical structure information, (iii) reaction centers, (iv) searchable reaction components, (v) multistep reactions, (vi) reaction conditions, (vii) reaction classification, (viii) post-processing information.
Criterion | CASREACT | REAXYS | Theilheimer | ChemInform RX | Current Chemical reactions | Synthetic Reaction Updates | Reference Library of Synthetic methodology | ChemReact | Chemogenesis | Organic Synthesis | Reaction Database-Chemical Synthesis | Synthetic Pages | The Chemical Thesaurus | Webreactions | Biotage Pathfinder | e-EROS Encyclopedia of Reagents for Organic Synthesis | FlowReact Search | Protecting Groups | Science of Synthesis | SPRESI | This Work
Table 4. Main classes of the reaction type database and the instances.
Main Classes | Relation with Instances | Instances
Reaction Type, T | T = [T1, T2, ..., Ti, …, Tn] | Ti: reaction type in the knowledge base (i.e., acylation etc.)
Reaction, R | R = [R1, R2, ..., Ri, …, Rn] | Ri: reaction of the ith reaction type; for each reaction information about the reactants and reaction products are provided as well as information for the target product and process (for example: 1st step for production of an API)
Phases involved, P | P = [P1, P2, ..., Pi, …, Pn] | Pi: phase of the ith reaction (i.e., organic-aqueous, organic-gas etc.)
How phases are created, C | C = [C1, C2, ..., Ci, …, Cn] | Ci: (i.e., solvent etc.)
Solvent function, F | F = [F1, F2, ..., Fi, …, Fn] | Fi: (i.e., phase creation, carrier etc.)
Solvent type, ST | ST = [ST1, ST2, ..., STi, …, STn] | STi: (i.e., ether, alcohol etc.)
Solvent, S | S = [S1, S2, ..., Si, …, Sn] | Si: Solvents in ith reaction
Reaction condition, RC | RC = [RC1, RC2, ..., RCi, …, RCn] | RCi (i.e., Temperature, composition, cat, pH etc.)
Data, D | D = [D1, D2, ..., Di, …, Dn] | Di (reaction data: conversion, selectivity, reaction time, and dynamic data: concentration vs. time, scale information and kinetic models etc.)
Operation Mode, OP | OP = [OP1, OP2, ..., OPi, …, OPn] | OPi: batch, continuous, fed batch
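The main classes of Table 4 could be mirrored in code as a simple record type. The following is a minimal sketch under stated assumptions, not the ontology actually used for the database: the class names `ReactionEntry` and `OperationMode`, the field names and the `is_multiphase` helper are hypothetical, and the two example entries only restate phase and solvent-function combinations listed for carbonylation and saponification in Table 6.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Tuple


class OperationMode(Enum):
    """Operation modes listed for class OP."""
    BATCH = "batch"
    CONTINUOUS = "continuous"
    FED_BATCH = "fed batch"


@dataclass
class ReactionEntry:
    """One database entry; comments give the corresponding main class."""
    reaction_type: str                               # T
    reaction: str                                    # R
    phases: Tuple[str, ...]                          # P
    phase_creation: str                              # C
    solvent_functions: Tuple[str, ...]               # F
    solvent_type: str = "unspecified"                # ST
    solvents: Tuple[str, ...] = ()                   # S
    conditions: Dict[str, str] = field(default_factory=dict)  # RC
    data: Dict[str, float] = field(default_factory=dict)      # D
    operation_mode: OperationMode = OperationMode.BATCH        # OP

    @property
    def is_multiphase(self) -> bool:
        # An entry counts as multiphase when more than one phase is involved.
        return len(self.phases) > 1


# Illustrative entries; solvents, conditions and data are left empty on purpose.
carbonylation = ReactionEntry(
    reaction_type="Carbonylation",
    reaction="carbonylation step (illustrative)",
    phases=("liquid (org.)", "liquid (aq.)", "gas"),
    phase_creation="solvent",
    solvent_functions=("creates phase", "carrier for catalyst"),
)
saponification = ReactionEntry(
    reaction_type="Saponification",
    reaction="saponification step (illustrative)",
    phases=("liquid (org.)",),
    phase_creation="none",
    solvent_functions=("dissolves reactants",),
)

multiphase = [e.reaction_type for e in (carbonylation, saponification) if e.is_multiphase]
print(multiphase)
```

Keeping each main class as a separate field makes the searches described in Section 4 (by reaction type, solvent function or number of phases) simple attribute filters.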
Table 5. Summary of the information included in the database.
Category | Number
Total number of reactions | 285
Types of reactions | 44
Number of multiphase reactions | 88
Number of reactions with solvents | 226
Solvent | Dissolve, phase creation, substrate/catalyst carrier, compound extraction
Number of APIs (with total synthesis pathway) | 21
Number of building blocks (type of compounds) | 19
Number of experimental data | 275 (conversion, selectivity, reaction yield, conditions), 32 (dynamic data), 11 (kinetic models)
Number of production mode data | 96 (in flow), 203 (in batch)
Number of application examples | 14 (chemicals), 16 (fine chemicals), 251 (pharmaceuticals)
Table 6. Reaction types included in database, phases involved and function of the used solvent.
Reaction Type | Catalyst | Phases | Solvent Function
1. Alkylation | Yes | Liquid (org.) | Dissolves reactants
 | | Liquid (org.)–Liquid (aq.) | Creates second phase
2. Hydrogenation | Yes | Liquid (org.)–Gas | Dissolves reactants
3. Epoxidation | | Liquid (org.)–Liquid (aq.) | Creates second phase; reactant/catalyst carrier
4. Carbonylation | Yes | Liquid (org.)–Liquid (aq.)–Gas | Creates phase; carrier for catalyst
 | Yes | Liquid (org.)–Gas | Dissolves reactants
5. Hydroformylation | ? | | Creates second phase; catalyst carrier
6. Enzymatic reduction | Yes | Liquid (org.)–Liquid (aq.) | Reactant carrier; creates second phase
7. Arylation | Yes | Liquid (org.)–Liquid (aq.) | Creates second phase
 | Yes | Liquid (org.) | Dissolves reactants
8. Oxidation | Yes | Liquid (org.)–Gas | Dissolves reactants
9. Transamination | Yes | Liquid (org.)–Liquid (aq.) | Creates second phase; product removal
10. Saponification | No | Liquid (org.) | Dissolves reactants
11. Amidation | Yes/No | Liquid (org.)–Liquid (aq.) | Creates second phase; removes product
12. Amination | Yes | Liquid (org.) | Dissolves reactants
13. Esterification | Yes | Liquid (org.) | Solvent free
 | | Liquid (org.) | Dissolves reactant
14. Hydrolysis | Yes | Liquid (org.)–Liquid (aq.) | Creates second phase; removes product; dissolves reactants
15. Aminolysis | Yes | Liquid (org.) | Dissolves reactants
16. Condensation | No | Liquid (org.) | Dissolves reactant
17. Deprotection | No | Liquid (org.) | Dissolves reactant
18. Protection | Yes | Liquid (org.) | Dissolves reactant
19. Dehydration | Yes | Liquid (org.)–Liquid (aq.) | Catalyst carrier; creates second phase; product removal
20. Cyclization | No | Liquid (org.)–Liquid (aq.) | Dissolves reactant; product separation
21. Lithiation | No | Liquid (org.) | Dissolves reactants
Note: aq.: aqueous and org.: organic.
Table 7. List of APIs and final drugs (*) for which the complete reaction pathway and its reactions are provided in the database.
1. 6-Aminopenicillanic acid | 12. Tramadol
2. Zuclopenthixol | 13. Artemisinin
3. 6-Hydroxybuspirone | 14. Saxagliptin
4. Aliskiren hemifumarate | 15. Atazanavir
5. Ibuprofen | 16. PDE5 inhibitor *
6. Meclinertant * | 17. Axitinib
7. Rufinamide | 18. Olanzapine *
8. Ciprofloxacin | 19. Amitriptyline
9. Naproxen | 20. Tamoxifen
10. OZ439 * (antimalarial drug candidate) | 21. Vildagliptin
11. Efavirenz * |
Table 8. Solvent functions in reaction and their possible improvements.
Solvent Function | Possible Improvements (columns: Productivity, Process safety, Separation steps, Waste reduction)
Reaction medium | - -
Product removal (phase creation) | -
Substrate carrier (phase creation) | -
Catalyst carrier (phase creation) |
Table 9. List of reactions in which the solvent has a specific function that leads directly to improved reaction performance.
Reaction Type | Main Product | Phases | Solvent Function | Improvement
Amidation [1] | PDE5 inhibitor | Liquid (org.)–Solid | Product separation (product not soluble in solvent) | Direct product separation
Enzymatic reduction [43] | Chiral alcohols | Liquid (aq.)–Ionic liquid | Substrate carrier | Increased productivity (82–92% yield)
 | | Liquid (aq.) | - | Productivity (42–46% yield)
 | | Liquid (aq.)–Organic solvent | Substrate carrier | Productivity (0% yield)
Alkylation [44] | Alyl azides | Liquid (org.; DMSO) | Dissolves reactants | High productivity (94% yield) but high waste generation
 | | Liquid (aq.)–Liquid (org.; DMSO) | Dissolves reactants | High productivity (94% yield) and lower waste generation
 | | Liquid (aq.)–Liquid (org.; isopropyl acetate) | Dissolves reactants | High productivity (96.5% yield) and lower waste generation
 | | Liquid (aq.)–Liquid (org.; isooctane) | Dissolves reactants | High productivity (91.4% yield) and lower waste generation
Arylation [45] | 3,3-Disubstituted oxindoles | Liquid (aq.)–Liquid (org.; THF or toluene) | Dissolves reactants | Increased reaction rate, leading to complete conversion and high yields compared to single-phase systems
Arylation [46] | Arylation of alkynes | Liquid (aq.)–Liquid (org.) | Catalyst dissolved in aq. phase | Catalyst recovery while maintaining high yields
Hydrolysis [47] | Naproxen | Liquid (aq.)–Liquid (org.; hexane, isooctane or toluene) | Product removal (in organic phase) | Increased yield; increased enzyme stability
Hydrolysis [48] | 6-Aminopenicillanic acid | Liquid (aq.)–Liquid (org.; butyl acetate) | Product removal in the organic phase | Increased productivity (product removal shifts the reaction equilibrium towards the product)
Transamination [49] | L-2-Aminobutyric acid | Liquid (aq.)–Liquid (org.) | By-product inhibits the enzyme; it is removed in the organic phase | Increased conversion (96%)
 | | Liquid (aq.) | - | Conversion (~40%)
Transamination [50] | Chiral amines | Liquid (aq.)–Resin | Product removal | Equilibrium shifts towards product side
Enzymatic reduction [51,52] | S-4-Chloro-3-hydroxybutyric acid ethyl ester | Liquid (aq.)–Liquid (org.) | Substrate controlled release | Increased reaction productivity
Carbonylation [53,54] | Ibuprofen | Gas–Liquid (org.)–Liquid (aq.) | Dissolves catalyst (aq.) | Less waste generated, same productivity, slightly lower reaction rates, fewer separation steps
 | | Gas–Liquid (org.; MEK)–Liquid (aq.) | Dissolves catalyst (aq.); dissolves reactants (org.) | Increased reaction rates
Transamination [55,56] | Sitagliptin | Liquid (aq.); DMSO used as co-solvent | DMSO dissolves amine donor | Increased productivity; enantiomeric selectivity and less waste generated
Note: aq.: aqueous and org.: organic.
Table 10. Kinetic models availability; * indicates those that are included in the kinetic model library.
Kinetic Models | Number | Reference
Enzymatic reduction | 3 | [51,60]
Table 11. Summary of the data retrieved from the database.
Pathway | Reaction Steps | Database Entries | Operation | Reference
1 | 1.1 Friedel-Crafts acylation | 67 and 74 | Batch (67), continuous (74) | Elango et al. [71]; Lindley et al. [72]
 | 1.2 Hydrogenation | 68–71 and 92 | Batch | Elango et al. [71]
 | 1.3 Carbonylation | 9–13 | Batch | Elango et al. [71]
2 | 3.1 Friedel-Crafts | 45 | Continuous | Snead et al. [73]
 | 3.2 1,2-Aryl migration | 46 | Continuous | Snead et al. [73]
 | 3.3 Saponification | 44 | Continuous | Snead et al. [73]
3 | 2.1 Friedel-Crafts | 73 | Continuous | Bogdan et al. [74]
 | 2.2 1,2-Aryl migration | 72 | Continuous | Bogdan et al. [74]
 | 2.3 Saponification | 44 | Continuous | Bogdan et al. [74]
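The retrieved pathways in Table 11 can be held as a small mapping from pathway to steps and database entries, which makes it easy to check whether a whole pathway can be run in one operation mode. The sketch below is only an illustration: the names `PATHWAYS` and `entries_for_mode` are hypothetical, and entry ranges such as 68–71 and 9–13 are assumed to be inclusive.

```python
from typing import Dict, List, Optional, Tuple

# (step name, {database entry: operation mode}), transcribed from Table 11.
Step = Tuple[str, Dict[int, str]]

PATHWAYS: Dict[int, List[Step]] = {
    1: [
        ("Friedel-Crafts acylation", {67: "batch", 74: "continuous"}),
        # Assumption: "68-71 and 92" means entries 68, 69, 70, 71 and 92.
        ("Hydrogenation", {e: "batch" for e in (68, 69, 70, 71, 92)}),
        # Assumption: "9-13" means entries 9 to 13 inclusive.
        ("Carbonylation", {e: "batch" for e in range(9, 14)}),
    ],
    2: [
        ("Friedel-Crafts", {45: "continuous"}),
        ("1,2-aryl migration", {46: "continuous"}),
        ("Saponification", {44: "continuous"}),
    ],
    3: [
        ("Friedel-Crafts", {73: "continuous"}),
        ("1,2-aryl migration", {72: "continuous"}),
        ("Saponification", {44: "continuous"}),
    ],
}


def entries_for_mode(pathway: List[Step], mode: str) -> Optional[List[int]]:
    """Pick the lowest-numbered entry per step that runs in `mode`.

    Returns None when at least one step has no entry in that mode.
    """
    chosen = []
    for _name, entries in pathway:
        matches = sorted(e for e, m in entries.items() if m == mode)
        if not matches:
            return None
        chosen.append(matches[0])
    return chosen


fully_continuous = [
    p for p in sorted(PATHWAYS)
    if entries_for_mode(PATHWAYS[p], "continuous") is not None
]
print(fully_continuous)  # pathways with a continuous entry for every step
```

With the data as transcribed, only pathways 2 and 3 have a continuous entry for every step, since the hydrogenation and carbonylation steps of pathway 1 are recorded in batch mode only.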
Table 12. Retrieved reaction information from the database.
Reaction Information | Reaction Step 1: Friedel-Crafts (Flow) | Reaction Step 2: Aryl Migration (Flow) | Reaction Step 3: Saponification (Flow)
Reaction | Isobutylbenzene + propionyl chloride → 4-isobutylpropiophenone + HCl | C13H18O (4-isobutylpropiophenone) + C4H10O3 (trimethyl orthoformate) → C14H20O2 (methyl 2-(4-isobutylphenyl)propanoate) + C3H8O2 (dimethoxymethane) | C14H20O2 (methyl 2-(4-isobutylphenyl)propanoate) + C2H6OS (2-mercaptoethanol) + NaOH → C13H17NaO2 (ibuprofen sodium salt) + CH3OH (MeOH)
Composition (Reactant A:Reactant B, in mole eq.) | 1:1.17 | 1:8 | 1:8
Solvent | Water | DMF/1-propanol | MeOH/H2O
Acid/Base | HCl | - | -
By-products | - | - | -
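As an example of how retrieved reaction information can feed a "green" metric, the sketch below computes the atom economy of reaction step 1 from its stoichiometry. It assumes that atom economy (molar mass of the desired product divided by the summed molar masses of the stoichiometric reactants) is among the metrics of interest. The molecular formulas of isobutylbenzene (C10H14) and propionyl chloride (C3H5ClO) are not given in the table and are supplied here; the helper names are hypothetical.

```python
import re

# Standard atomic masses in g/mol, rounded; only the elements needed here.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999, "Cl": 35.45}


def molar_mass(formula: str) -> float:
    """Molar mass of a simple formula such as 'C13H18O' (no brackets)."""
    tokens = re.findall(r"([A-Z][a-z]?)(\d*)", formula)
    return sum(ATOMIC_MASS[element] * int(count or 1) for element, count in tokens)


def atom_economy(reactants: list, product: str) -> float:
    """Fraction of the reactant mass that ends up in the desired product."""
    return molar_mass(product) / sum(molar_mass(r) for r in reactants)


# Step 1: isobutylbenzene + propionyl chloride -> 4-isobutylpropiophenone + HCl
ae_step1 = atom_economy(["C10H14", "C3H5ClO"], "C13H18O")
print(f"Atom economy, step 1: {100 * ae_step1:.1f} %")
```

The remaining mass leaves as the HCl co-product. Atom economy ignores excess reagents, catalyst and solvent, so metrics such as the E-factor would additionally need the composition and amounts reported for each step.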
Table 13. Reaction Conditions for the three reactive steps.
Reaction Conditions | Reaction Step 1: Friedel-Crafts (Flow) | Reaction Step 2: Aryl Migration (Flow) | Reaction Step 3: Saponification (Flow)
Temperature | 87 °C | 90 °C | 90 °C
Pressure | 17 atm | 14 atm | 14 atm
Residence time | 1.25 min | 1 min | 1 min
Catalyst amount | 1.11 eq. AlCl3 | 3 eq. ICl | -
Solvent amount | - | 0.25 eq. DMF/0.71 eq. n-propanol | MeOH/H2O (1:3 v/v)
Table 14. Available experimental data as retrieved from the database.
Type of Data | Reaction Step 1: Friedel-Crafts (Flow) | Reaction Step 2: Aryl Migration (Flow) | Reaction Step 3: Saponification (Flow)
Selectivity (main product; by-product) | - | - | -
Reaction yield | - | - | -
Experimental | Steady-state data for different residence times | Steady-state data for different residence times | Steady-state data for different residence times

Share and Cite

MDPI and ACS Style

Papadakis, E.; Anantpinijwatna, A.; Woodley, J.M.; Gani, R. A Reaction Database for Small Molecule Pharmaceutical Processes Integrated with Process Information. Processes 2017, 5, 58.

