1. Introduction
A plasticizer is a substance or material that increases the flexibility, processability, or distensibility of plastics by lowering their glass transition temperature (Tg) [1,2,3]. Plasticizers also affect many other properties of plastics, such as crystallization, melting and gelation temperatures, interaction with water, fire resistance, gas permeability, and degradation rate [4]. Plasticizers are the most common plastic additive, with worldwide production of around 6.4 million tons per year over the last decade. The global plasticizer market was valued at US$93.76 billion in 2019 and is projected to reach US$111.38 billion by 2023 [5,6]. Phthalates, the most important category of PVC plasticizers, are the most widely used plasticizers in the world. However, their use has raised concern and controversy because of their migration into materials in contact with them (notably in medical and childcare articles) and their bioaccumulation in the environment [7,8]. Finding new plasticizers that can meet the demands of the many applications of plastic products therefore remains an important goal. The traditional route to new plasticizers is experimental exploration, which involves synthesis, property analysis, and measurement. Although experimental exploration is intuitive and accurate, it is inefficient. It also places high demands on equipment, laboratory conditions, and researchers' expertise, and it is sensitive to external conditions, slow, and costly, all of which make it difficult to accelerate plasticizer development. More efficient research methods are therefore needed to shorten the plasticizer development cycle.
In the past decade, machine learning (ML) has become a powerful tool for accelerating materials development, and the number of ML publications on chemicals and materials is growing exponentially [9,10,11]. In particular, the release of several polymer databases has laid the data foundation for ML research on polymer materials. A representative database is NanoMine [12], which provides an extensible data representation of the composition, properties, and microstructure of polymer nanocomposites. Another database that can be used for polymer design is PoLyInfo, which collects information such as polymer name, chemical structure, sample processing method, measurement conditions, properties, monomers used, and polymerization method [13]. ML methods can extract knowledge from existing data, yield insights, and produce reliable results, especially for classification and regression on high-dimensional data, and can therefore support materials research and development. In polymer science in particular, the emerging field of polymer informatics aims to accelerate property prediction (and design) by applying ML methods to reliable data [14]. Stephen Wu et al. systematically reviewed the potential and challenges of recent polymer informatics [15]. Some recent polymer ML studies have focused on properties such as Tg and atomization temperature [16]. Chiho Kim et al. established a polymer informatics platform that uses ML to link key polymer features to properties and can predict various important polymer properties on demand [16]. Ghanshyam Pilania et al. built an ML-based Tg prediction model for polyhydroxyalkanoate (PHA) homopolymers and copolymers [17]. Yun Zhang et al. performed similar work, using Gaussian process regression to build a Tg prediction model for polymers [18]. Regarding data set sensitivity, Anurag Jha et al. explored how data set uncertainty affects ML predictions of polymer Tg [19]. Beyond physical properties, a typical ML application to polymer function is predicting the performance of polymer filtration membranes (polyvinylidene fluoride, polyethersulfone, and polysulfone membranes) [20]. Wang et al. presented a deep learning approach that combined convolutional neural networks with multi-task learning to build quantitative correlations between the microstructures and property values of nanostructured polymers [21]. However, to the best of our knowledge, ML has not yet been applied to evaluating and predicting plasticizer performance, even though plasticizers are an important means of tuning polymer properties.
Different plasticizers produce different plasticizing effects because of differences in the strength of plasticizer-polymer and plasticizer-plasticizer interactions [22]. Plasticizers generally contain two structural components: a polar part and a non-polar part. The polar part must be able to bind reversibly to the polymer to soften it, while the non-polar part controls the interaction between polymer chains [1]. Chandola et al. proposed a more accurate interpretive model of plasticization that related properties (specific volume, viscosity, etc.) to variables (molecular weight, terminal group content, etc.), allowing it to predict the behavior of 25 PVC plasticizers [23]. Besides physical property parameters such as Tg and atomization temperature, an important index of the overall performance of a plasticizer is the “substitution factor” (SF) [22,24,25,26]. SF is defined as the amount of a given plasticizer required to achieve the same plasticizing effect (i.e., the same hardness) as a reference plasticizer such as DOP, divided by the amount of the reference required:

$$\mathrm{SF} = \frac{\mathrm{PHR}_{\mathrm{plasticizer}}}{\mathrm{PHR}_{\mathrm{DOP}}}$$
where PHR (parts per hundred resin) is the parts by weight of plasticizer per 100 parts of resin required to produce plasticized PVC of a particular hardness on a given durometer scale [27]. The SF was found to remain consistent over plasticizer levels from 20 to 90 phr, and it usually increases with the molecular weight of the plasticizer [1,28]. The SFs of a large number of commercial plasticizers have been determined to evaluate and tune additive performance, always with DOP as the reference [29]. The SF is also important for estimating the cost of producing an article of a specified hardness [30]. Accurate and efficient determination of plasticizer SFs therefore supports not only the evaluation of plasticizer properties but also the assessment of their economic feasibility.
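As a worked illustration of the SF definition, the Python sketch below estimates the loading each plasticizer needs to reach a target hardness by linear interpolation between measured points, then takes the ratio against DOP. The hardness readings are hypothetical placeholders, linear interpolation between data points is an assumption (not a procedure stated in this work), and the cost helper simply multiplies SF by the unit-price ratio, ignoring density and other formulation effects.

```python
"""Sketch: substitution factor (SF) from hardness-vs-loading data.

All numbers below are hypothetical placeholders, not measured values.
"""


def phr_for_hardness(curve, target):
    """Linearly interpolate the loading (phr) that gives `target` hardness.

    `curve` is a list of (phr, hardness) pairs; hardness is assumed to
    change monotonically with plasticizer loading.
    """
    pts = sorted(curve)
    for (p0, h0), (p1, h1) in zip(pts, pts[1:]):
        # Check whether the target lies between these two measured points.
        if min(h0, h1) <= target <= max(h0, h1):
            if h0 == h1:
                return p0
            return p0 + (target - h0) * (p1 - p0) / (h1 - h0)
    raise ValueError("target hardness outside the measured range")


def substitution_factor(candidate_curve, reference_curve, target):
    """SF = PHR(candidate) / PHR(reference) at equal hardness."""
    return (phr_for_hardness(candidate_curve, target)
            / phr_for_hardness(reference_curve, target))


def relative_plasticizer_cost(sf, price_candidate, price_reference):
    """Plasticizer cost ratio at equal hardness: SF x unit-price ratio.

    Simplification: ignores density and any other formulation changes.
    """
    return sf * price_candidate / price_reference


# Hypothetical durometer readings as (phr, hardness) pairs.
dop = [(30, 90.0), (50, 80.0), (70, 70.0)]
candidate = [(30, 95.0), (50, 87.0), (70, 79.0)]

sf = substitution_factor(candidate, dop, target=80.0)
print(f"SF = {sf:.3f}")
```

With these made-up curves, DOP reaches hardness 80 at 50 phr and the candidate at 67.5 phr, so SF = 1.35: about 35% more of the candidate is needed for the same hardness.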
Determining the SF of a plasticizer experimentally is slow and costly, which is a serious drawback when screening large numbers of potential plasticizers. At the same time, existing SF data have not been fully exploited. Because most plasticizers are esters, we draw on established quantitative structure-property relationship (QSPR) methods and experience for esters [31,32] to build a predictive model relating SF to plasticizer molecular structure. In this work, traditional and improved genetic algorithms (GA), as well as a grid search algorithm, were combined with ML methods such as support vector machines (SVM), random forests (RF), and partial least squares (PLS) to screen the important molecular descriptors of plasticizer molecules and to model the relationship between SF and these descriptors. The results showed that an SVM model built on descriptors screened by an improved GA and further reduced in dimension by principal component analysis (PCA) gave good predictions. The combination of grid search and SVM also performed well, although less well than this optimal model. The screened descriptors were also analyzed to interpret the molecular features related to plasticizer SF, providing theoretical support for the design of new plasticizer molecules, as outlined in the next section.
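The descriptor-screening step can be pictured with a minimal genetic-algorithm sketch in Python. Each individual is a 0/1 mask over the descriptor columns, and the fitness function is left pluggable; in this work it would be a model score from SVM, RF, or PLS, which is not reproduced here, so a hypothetical toy fitness stands in. The conclusions describe the improved GA (GA-REC) as using a “variable mutation probability” without giving the schedule in this section, so the linear decay from `p_mut_start` to `p_mut_end` is an assumption; setting the two equal recovers a fixed-rate GA. Tournament selection, one-point crossover, and elitism are likewise generic choices, not details taken from the paper.

```python
import random


def ga_select(n_features, fitness, pop_size=30, generations=40,
              p_cross=0.8, p_mut_start=0.10, p_mut_end=0.01, seed=0):
    """Select descriptors with a genetic algorithm over 0/1 masks.

    fitness(mask) -> float, higher is better. In a QSPR setting it could be
    the cross-validated R^2 of a regressor trained only on the descriptors
    where mask[i] == 1. Requires n_features >= 2 and pop_size >= 2.
    """
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(pop_size)]
    best, best_fit = None, float("-inf")

    for g in range(generations):
        scored = [(fitness(ind), ind) for ind in pop]
        for fit, ind in scored:
            if fit > best_fit:
                best_fit, best = fit, ind[:]
        if g == generations - 1:
            break  # last population scored; no further breeding needed

        # Assumed "variable mutation probability": linear decay per generation.
        p_mut = p_mut_start + (p_mut_end - p_mut_start) * g / max(generations - 1, 1)

        def tournament():
            # Binary tournament: the fitter of two random individuals wins.
            a, b = rng.sample(scored, 2)
            return (a if a[0] >= b[0] else b)[1]

        children = [best[:]]  # elitism: carry the best mask forward
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            if rng.random() < p_cross:
                cut = rng.randint(1, n_features - 1)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation with the current mutation probability.
            child = [1 - bit if rng.random() < p_mut else bit for bit in child]
            children.append(child)
        pop = children

    return best, best_fit


# Toy fitness (hypothetical): descriptors 0, 3 and 7 are "informative";
# each one selected scores +1 and every other selected descriptor costs 0.1.
INFORMATIVE = {0, 3, 7}


def toy_fitness(mask):
    hits = sum(mask[i] for i in INFORMATIVE)
    return hits - 0.1 * (sum(mask) - hits)


mask, score = ga_select(10, toy_fitness)
print([i for i, bit in enumerate(mask) if bit], round(score, 2))
```

In a real QSPR setting, `fitness` would train a regressor on the selected descriptor columns and return, for example, its cross-validated R²; the selected columns could then be passed on to PCA and SVR (in scikit-learn, typically a `Pipeline` of `PCA` and `SVR`).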
4. Conclusions
As an important index of the performance and economy of plasticizers, the substitution factor has attracted attention from industry. However, to our knowledge, research on the relationship between SF and plasticizer molecular structure, and especially models built with machine learning algorithms, has not been reported. Using reported SF data, this work applied a genetic algorithm and a grid search algorithm to screen the molecular descriptors of different plasticizer molecules and to establish models relating the key descriptors to SF. A genetic algorithm with a “variable mutation probability” (GA-REC) was also developed to screen the key molecular descriptors most highly correlated with SF, and an SF prediction model was then built on these selected descriptors. Taken together, the results indicate that the GA-REC + PCA + SVR model is the most suitable for this system. Its R² reached 0.9181 on the test set and 0.9192 in cross-validation, with a perfect fit on the training set, indicating good generalization ability. The improved genetic algorithm substantially increased prediction accuracy across the different regression models: the coefficient of determination (R²) on the test set and in cross-validation was at least 0.15 higher than with the unimproved genetic algorithm. The selected descriptors also cover fairly complete molecular information, such as rings, heteroatoms, and local branching, which indirectly supports the scientific validity of the model. The descriptors further reveal the importance of molecular branching in plasticizer action, a conclusion consistent with the basic principles of plasticization. The model was also applied to predict the SF of several biobased plasticizers; for dimethyl furan-2,5-dicarboxylate and isosorbide dioctanoate, the predictions were in general agreement with the available experimental values. As the first study, to our knowledge, to relate plasticizer SF to plasticizer molecular structure, this work compared the effectiveness of major machine learning approaches suited to small data sets and provides a basis for subsequent modeling of plasticizer performance and evaluation systems.
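The test-set and cross-validated R² values reported above can be made concrete with a short, dependency-free Python sketch. The one-variable least-squares model is only a stand-in for the GA-REC + PCA + SVR model, the data are hypothetical, and the cross-validation protocol (5 folds, shuffled, R² computed on the pooled out-of-fold predictions) is an assumption, since the fold count and averaging scheme are not stated in this section.

```python
import random


def r2_score(y_true, y_pred):
    """Coefficient of determination, R^2 = 1 - SS_res / SS_tot.

    Undefined (division by zero) if all y_true values are equal.
    """
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_line(xs, ys):
    """Least-squares fit of y = a*x + b; a stand-in for the PCA + SVR model."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = my - a * mx
    return lambda x: a * x + b


def kfold_r2(xs, ys, k=5, seed=0):
    """k-fold cross-validation: R^2 of the pooled out-of-fold predictions."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    preds = [0.0] * len(xs)
    for f in range(k):
        held_out = set(idx[f::k])  # every k-th shuffled index forms one fold
        train = [i for i in idx if i not in held_out]
        model = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in held_out:
            preds[i] = model(xs[i])
    return r2_score(ys, preds)


# Hypothetical one-descriptor data with small alternating noise.
xs = [float(i) for i in range(20)]
ys = [0.5 * x + 1.0 + (0.3 if i % 2 else -0.3) for i, x in enumerate(xs)]

# Hold out every 4th point as a test set; train on the rest.
test_idx = [i for i in range(len(xs)) if i % 4 == 0]
train_idx = [i for i in range(len(xs)) if i % 4 != 0]
model = fit_line([xs[i] for i in train_idx], [ys[i] for i in train_idx])
test_r2 = r2_score([ys[i] for i in test_idx], [model(xs[i]) for i in test_idx])
print(f"test R^2 = {test_r2:.4f}, 5-fold CV R^2 = {kfold_r2(xs, ys):.4f}")
```

Reporting both numbers, as the conclusions do, guards against a single lucky train/test split; a perfect training-set fit on its own says little about generalization.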