Solvents are frequently used as diluents in chemical production processes and in products such as paint, cleaning agents, glue, and ink. The global solvent market is about 20 million tons a year [1]. The paints, cleaning, and pharmaceutical industries represent the main sectors, with more than 60% of the total consumption of solvents [2]. In 2017, paints and coatings (46%) and the pharmaceutical industry (9%) together accounted for 55% of solvent usage in Europe [3]. Most classical solvents are nonrenewable, nonbiodegradable, flammable, or toxic and therefore pose various problems in handling, recycling, and waste treatment [4].
The European regulation concerning the “Registration, Evaluation, Authorisation and Restriction of Chemicals (REACH)” was adopted to improve the protection of human health and the environment from the risks that can be posed by chemicals [5]. Solvents such as cyclohexane, benzene, toluene, chloroform, and dichloromethane can be found in the REACH restriction list [6]. Chemicals that are candidates for future bans or restrictions are listed in the “candidate list of substances of very high concern (SVHC) for authorization” [7]. The candidate list for SVHC contains some solvents including nitrobenzene, N,N-dimethylformamide, furan, formamide, N-methylpyrrolidone, 2-(m)ethoxyethanol, trichloroethylene, 1,2,3-trichloropropane, and 2,4-dinitrotoluene.
ChemSec is an independent nonprofit organization that advocates for the substitution of toxic chemicals with safer alternatives [8]. Among other initiatives, ChemSec developed the “SIN-list” (Substitute It Now-list). The SIN-list consists of chemicals that ChemSec has identified as fulfilling the criteria for SVHC as defined by the EU chemicals regulation REACH [9]. Currently, the SIN-list contains 991 chemicals, which makes it a stricter list than the SVHC list. Recently (November 2019), ChemSec’s SIN-list was updated with 47 chemicals [10], including a handful of solvents (e.g., 1,1,1-trichloroethane, trifluoroacetic acid, 1,4-dioxane, acetaldehyde, and 1,2-dihydroxybenzene).
This legislative pressure is one of the main drivers steering industry towards greener alternatives for toxic solvents [11]. Furthermore, the costs of disposing of toxic solvents or recycling them by distillation can have a significant impact on production costs. In the pharmaceutical industry, solvents typically account for 56% of the materials used to manufacture an active pharmaceutical ingredient (API) [12]. In a typical alkyd resin formulation, solvent content ranges between 15% and 55% (for heavy duty applications) by mass [13]. In addition, it has been estimated that solvents contribute 50% of the post-treatment greenhouse gas emissions of pharmaceutical manufacturing [17].
Additionally, increased consumer awareness can be a driver for manufacturers to look for alternatives in their consumer products [11]. The LIFE AskREACH project [18] recently surveyed more than 14,000 citizens in 14 European countries on their awareness of SVHCs in consumer products. The survey shows that Europeans are highly concerned about the presence of problematic chemicals in products. In 9 (out of 14) countries, 70% of the respondents were highly or extremely concerned about the possibility that everyday products may contain problematic substances that can be harmful to human health and the environment. At the same time, the study also reveals that EU citizens are not well informed [19]. According to REACH article 33 [20], upon request, consumers have the right to receive information from suppliers about the presence of any SVHC in a product, its subassemblies, or its packaging above a threshold of 0.1% (weight/weight). The AskREACH project encourages consumers to make use of their “SVHC right to know” through the use of an app [21]. Initiatives like the AskREACH project, the ECHA ‘Chemicals in our life’ webpage [22], ChemSec, and other national and local consumer organizations can remind companies of their corporate responsibility and drive manufacturers towards more benign solutions.
In the past, the pharmaceutical industry has already put great effort into the substitution of toxic or hazardous solvents. These efforts focused mainly on safety, health, and environmental impact, with limited consideration of the physical properties of the solvents in question. Several pharmaceutical companies developed their own substitution guides in the form of a table that lists preferred solvents and solvents that are to be avoided. The CHEM21 consortium made a considerable effort to review [23] the GSK [17], AstraZeneca [26], ACS GCI-PR [28], Sanofi [29], Rowan University [30], and ETH Zürich [31] solvent guides. Eventually, this resulted in a consensus across these guides and in the development of a solvent guide [33] that is widely supported, at least in the pharmaceutical industry. Although the pharmaceutical industry has put the most effort into solvent awareness, it should be noted that the major share of solvent use lies in the paints and coatings industry [3].
Byrne et al. [34] published an extensive review of previously developed solvent guides. Similarities and discrepancies between existing solvent guides [17] are discussed, and the authors conclude that “there is no need for more general-purpose solvent selection guides of the familiar format because they are no longer providing any significant advancement in this field.” Moreover, “the solvent selection guide format has reached its potential.” However, we believe that there is a need for a data-driven, automated solvent selection/substitution guide. Such a guide would allow nonexpert users, who are not familiar with the field of solvent selection, to look for greener solvents and create more sustainable products and processes. Moreover, the software we introduce is based on artificial intelligence (AI) algorithms, resulting in a self-learning, extensible application for the future.
State-of-the-art AI-supported quantitative structure–property relationship (QSPR) modeling makes use of deep neural networks (DNNs). Among others, AI-supported QSPRs have been used to predict octanol–water partition coefficients [36], solvation free energies [37], gas chromatographic retention indices [38], and critical properties [39]. Often, SMILES strings are used as the input for the DNN model [36]. Such an approach proves useful in the screening and development of green solvents with respect to unconventional and novel compounds. Interestingly, the methodology can be inverted (iQSPR) to generate new structures based on a set of molecular properties [40]. Deep neural network supported QSPR modeling exploits the concept of automatic feature extraction. Based on the input vector (SMILES or other), dominant structural features of the given compound are extracted by the neural network [37]. This technique requires a complex deep learning architecture. In contrast, the software presented in this publication makes use of a dataset of experimental physical properties; thus, the features are known beforehand. Consequently, a complex deep learning architecture would be unwarranted for such use. Instead, a “shallow” neural network is used in this work (see Section 3.4.1).
Sustainable solvent alternatives are available, but the search for the best solution is often time consuming and labor intensive and requires specific knowledge. Moreover, nonexpert scientists often first draw up a list of potential solvents based on their own experience (“What did I use in the past?”) and from a pragmatic point of view (“What do we have in stock?”) and then make a choice based on (unsystematic) trial and error. In the past decade, considerable efforts were made, especially in the pharmaceutical industry. However, for SMEs and small formulating companies where chemistry is not the core business and an R&D culture is lacking, the search for a new and more sustainable solvent is far from straightforward. As Jessop concluded in his 2011 publication, “Academic research in the area of green solvents is currently not focused on the applications that make the greatest contribution to the environmental impact of solvents” [42].
To enable a more efficient, objective, and purposeful selection of solvents, a user-friendly software tool, SUSSOL (Sustainable Solvents Selection and Substitution Software), was developed using AI. The aim of SUSSOL is to support companies in the search for sustainable and viable alternatives for nonbenign solvents currently used in their products and processes. Our software tries to bridge the gap between academic research and applicability in industry. Because the tool is flexible, companies can use it according to their own needs.
Previously, two software-based solvent guides have been published. Both use the physical properties of solvents to attain a data-driven and objective solvent selection. Scientists from AstraZeneca [27] developed a solvent selection tool based on a dataset of 272 solvents, characterized by 30 physical properties. This solvent dataset is visualized in a three-dimensional map and can be explored by the user. The tool allows the user to interactively select solvents based on the principal component analysis (PCA) of the solvents’ physical properties. Solvents that are close to each other on the map have similar physical and chemical properties, whereas solvents at a distance are significantly different [43]. In addition to the PCA scores, other data including the physical properties, functional groups, and environmental data have been included to aid in the rational selection of solvents. The tool has since been validated by the ACS GCI Pharmaceutical Roundtable and is publicly available on the ACS website [43]. The tool offers excellent insight; however, as a web-based tool it is somewhat slow. It is clearly focused on the pharmaceutical industry and engineering. Most importantly, it does not offer the user the flexibility to add new solvents or additional (meta) data.
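To illustrate the idea of a PCA-based solvent map, the following minimal sketch projects a handful of solvents onto two principal components and looks up the nearest neighbors of a query solvent. It is not the AstraZeneca implementation: the five solvents, the three properties, and the rounded property values are placeholders chosen for illustration, and the helper names `pca_scores` and `nearest` are hypothetical.

```python
import numpy as np

# Illustrative mini-dataset (assumption): rows are solvents, columns are
# physical properties. Values are rounded approximations, not the dataset
# used by any published tool.
solvents = ["water", "ethanol", "acetone", "toluene", "n-hexane"]
# columns: boiling point (deg C), relative permittivity, density (g/mL)
X = np.array([
    [100.0, 80.0, 1.00],
    [78.0, 25.0, 0.79],
    [56.0, 21.0, 0.79],
    [111.0, 2.4, 0.87],
    [69.0, 1.9, 0.66],
])


def pca_scores(data, n_components=2):
    """Return PCA scores and explained-variance ratios of autoscaled data."""
    # Autoscale so that properties with large numeric ranges do not dominate.
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    # PCA via singular value decomposition of the scaled matrix.
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s ** 2 / np.sum(s ** 2)
    return scores, explained[:n_components]


scores, explained = pca_scores(X)


def nearest(name, k=2):
    """Return the k solvents closest to `name` on the PCA map."""
    i = solvents.index(name)
    dist = np.linalg.norm(scores - scores[i], axis=1)
    order = [j for j in np.argsort(dist) if j != i]
    return [solvents[j] for j in order[:k]]


print(nearest("toluene"))
```

With a realistic dataset, many more properties would enter the matrix, and the number of retained components would be chosen from the explained variance.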
The second software-based guide was published by Tobiszewski et al. [44]. The authors use Ward’s hierarchical clustering analysis [45] to analyze a set of 151 solvents. The similarity between the solvents in the multidimensional space can be determined by the Euclidean distance between the solvents in the dataset. Three clusters (groups) of solvents with similar properties are determined. Tobiszewski et al. describe a group of “rather nonpolar and volatile compounds,” a group of “nonpolar and sparingly volatile solvents,” and a cluster with “polar solvents.” Within each cluster, the solvents are ranked by means of a multiple-criteria decision analysis technique (TOPSIS) [46]. Each solvent is assigned a calculated score between 1 and 0, where a score of 1 represents the ideal solution and a score of 0 the nonideal solution. The scores are benchmarked against the scores from the Pfizer [35], GCI-PR [28], GlaxoSmithKline [17], AstraZeneca [26], Sanofi [29], and the CHEM21 survey [23] solvent guides. The authors conclude that the ranking of solvents within each cluster generally agrees with other solvent selection guides [44]. However, the ranking of solvents with TOPSIS suffers from a lack of data. Hence, the authors split their ranking into different levels of confidence, which does not benefit the usability of the solvent guide. The guide is more of an academic exercise than a ready-to-use tool.
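The TOPSIS scoring step can be sketched as follows. This is a generic textbook formulation (vector normalization, weighted distances to an ideal and an anti-ideal point), not the code or the criteria used by Tobiszewski et al.; the three solvents "A", "B", and "C", their hazard and cost scores, and the equal weights are hypothetical.

```python
import math


def topsis(matrix, weights, benefit):
    """Return one closeness score in [0, 1] per alternative.

    matrix  : rows are alternatives (solvents), columns are criteria
    weights : one weight per criterion
    benefit : True where larger is better, False where smaller is better
    Assumes no all-zero column and at least two distinct alternatives.
    """
    n_cols = len(matrix[0])
    # 1. Vector-normalize each criterion column and apply the weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_cols)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n_cols)] for row in matrix]
    # 2. Build the ideal and anti-ideal points criterion by criterion.
    cols = list(zip(*v))
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(cols)]
    anti = [min(c) if benefit[j] else max(c) for j, c in enumerate(cols)]
    # 3. Relative closeness: 1 at the ideal point, 0 at the anti-ideal point.
    scores = []
    for row in v:
        d_plus = math.dist(row, ideal)
        d_minus = math.dist(row, anti)
        scores.append(d_minus / (d_plus + d_minus))
    return scores


# Hypothetical criteria: toxicity score, flammability score, relative cost.
# All three are "cost" criteria, so lower values are better.
names = ["A", "B", "C"]
table = [
    [1.0, 1.0, 1.0],  # best on every criterion
    [5.0, 5.0, 5.0],  # worst on every criterion
    [3.0, 2.0, 4.0],  # intermediate
]
scores = topsis(table, weights=[1.0, 1.0, 1.0], benefit=[False, False, False])
ranking = sorted(zip(names, scores), key=lambda item: item[1], reverse=True)
print(ranking)
```

In this toy table, solvent A coincides with the ideal point and solvent B with the anti-ideal point, so they receive the extreme scores 1 and 0, and C falls in between. Missing criterion values, the problem noted above, would have to be handled before such a scoring step.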
Byrne et al. [34] state that this approach has reiterated that certain types of solvent have inherently undesirable characteristics and therefore, solvent selection on a direct ‘like-for-like’ substitution basis is restrictive. They also conclude that “Relying only on the existing catalogue of largely conventional solvents, it is not possible to have a green solvent substitute readily available for every application.”
Academic efforts to develop benign and biobased solvents should continue, but these new solvents should find their way to SMEs and small nonresearch-intensive companies. An SME with a small product portfolio has neither the means nor the knowledge to start the search for a better solvent. Software-aided tools could be of great help, and our tool attempts to enable this search for better solvents. Companies that are not aware of the relevant physical solvent properties characterizing their products and processes will find a great deal of help in the described software.
We envision a solvent selection/substitution tool as an interactive, data-centered catalogue of both conventional and neoteric solvents that guides the user towards the best possible alternative. This process should be effortless and transparent. Providing flexibility to the users to work with their own dataset, add new solvents, and add company-specific or confidential data will certainly facilitate the use of this tool in an industrial environment. Furthermore, producers and distributors of green and neoteric solvents can use the tool as a benchmark to promote their solvents as an alternative for conventional nonbenign solvents.
The presented SUSSOL software uses a solvent dataset in the form of a .csv file. The software [48] can be operated in two modes, the selection mode (Section 3.3, Solvent Selection) and the substitution mode (Section 3.4, Solvent Substitution). The solvent selection mode consists of a Multidimensional Scaling (MDS) plot of all solvents in the dataset. In the solvent substitution mode, solvents are clustered into groups (clusters) of similar solvents, based on their physical properties. For this cluster analysis, the Self-Organizing Map (SOM) of Kohonen [49] is used. The clusters are arranged on a two-dimensional grid where the distance between two clusters is a measure of similarity. The user specifies the solvent to be replaced. After cluster analysis, a “candidate list” of similar solvents is generated. After selecting the most sustainable candidates, laboratory testing can take place.
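The substitution workflow can be illustrated with a deliberately small self-organizing map. The sketch below is not the SUSSOL implementation: the grid size, the linearly decaying learning rate and neighborhood width, the toy property table, and the rule that a candidate must map to the same or an adjacent grid node are all assumptions made for illustration, and the function names are hypothetical.

```python
import numpy as np


def best_matching_unit(weights, x):
    """Grid position (row, col) of the node whose weight vector is closest to x."""
    dist = np.linalg.norm(weights - x, axis=-1)
    idx = np.unravel_index(np.argmin(dist), dist.shape)
    return int(idx[0]), int(idx[1])


def train_som(data, grid=(3, 3), epochs=200, lr0=0.5, sigma0=1.5, seed=0):
    """Train a small Kohonen map; returns weights of shape (rows, cols, n_features)."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    weights = rng.random((rows, cols, data.shape[1]))
    # Grid coordinates of every node, shape (rows, cols, 2).
    coords = np.stack(
        np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1
    )
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                  # decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 1e-3     # shrinking neighborhood width
        for x in data[rng.permutation(len(data))]:
            bmu = best_matching_unit(weights, x)
            grid_dist2 = np.sum((coords - np.array(bmu)) ** 2, axis=-1)
            h = np.exp(-grid_dist2 / (2.0 * sigma ** 2))
            # Pull the winning node and its grid neighbors towards the sample.
            weights += lr * h[..., None] * (x - weights)
    return weights


def candidate_list(weights, data, names, target, max_grid_dist=1.0):
    """Solvents mapped to a node within max_grid_dist of the target's node."""
    nodes = {n: best_matching_unit(weights, x) for n, x in zip(names, data)}
    t = nodes[target]
    return [
        n for n in names
        if n != target
        and np.hypot(nodes[n][0] - t[0], nodes[n][1] - t[1]) <= max_grid_dist
    ]


# Toy property table (assumption): rounded, illustrative values only.
names = ["water", "ethanol", "acetone", "toluene", "n-hexane"]
raw = np.array([
    [100.0, 80.0, 1.00],
    [78.0, 25.0, 0.79],
    [56.0, 21.0, 0.79],
    [111.0, 2.4, 0.87],
    [69.0, 1.9, 0.66],
])
# Min-max scale each property to [0, 1] so all properties weigh equally.
data = (raw - raw.min(axis=0)) / (raw.max(axis=0) - raw.min(axis=0))

som = train_som(data)
candidates = candidate_list(som, data, names, target="toluene")
print(candidates)
```

The resulting list depends on the grid size, the random seed, and the training schedule, so it should be read as a starting point for the sustainability screening and laboratory testing described above rather than as a final answer.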
The software and its underlying principles and algorithms are described in detail in Section 3, Materials and Methods. We recommend reading this section first before proceeding with Section 2, Results and Discussion.