AI-Driven Insight into Polycarbonate Synthesis from CO2: Database Construction and Beyond
Abstract
1. Introduction
2. Dataset
2.1. Description
2.2. Data Exploration
- The sample-to-descriptor ratio is unfavorable: the dataset has a very large number of features (1177) but very few samples (201).
- A substantial share of the values is missing (≈12% of the dataset).
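The two issues above can be quantified directly. The following is a minimal sketch: the counts 201 and 1177 come from the text, while the toy matrix and the helper name `missing_fraction` are hypothetical and serve only as an illustration.

```python
import math

# Counts reported in the text.
n_samples, n_features = 201, 1177

# Sample-to-descriptor ratio: values well below 1 mean more descriptors than samples.
ratio = n_samples / n_features
print(f"samples per descriptor: {ratio:.3f}")


def missing_fraction(rows):
    """Share of entries that are missing (None or NaN) in a list of rows."""
    total = sum(len(row) for row in rows)
    missing = sum(
        1
        for row in rows
        for value in row
        if value is None or (isinstance(value, float) and math.isnan(value))
    )
    return missing / total


# Hypothetical toy matrix (not taken from the paper's dataset).
toy = [
    [1.0, None, 3.0, 4.0],
    [2.0, 5.0, float("nan"), 1.0],
]
print(f"missing share in toy data: {missing_fraction(toy):.2%}")
```

With roughly 0.17 samples per descriptor, reducing the number of descriptors before fitting any model is a natural step.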
3. Proposed Methodology
3.1. Data Preprocessing
Algorithm 1: Missing value imputation algorithm.
3.2. Dimensionality Reduction
- PCA is one of the most commonly used dimensionality reduction techniques. It finds the orthogonal axes (principal components) along which the variance of the data is maximized [43]. These principal components capture the directions of maximum variance (related to learning capabilities) in the data. PCA then projects the original data onto these principal components, reducing the dimensionality while preserving the most significant variation in the data. Its main characteristics are (1) its linearity, as PCA works well when the relationships between features are linear; (2) its computational efficiency and wide applicability; and (3) its assumption that the data are centered around the origin.
- PCA-Kernel is an extension of PCA that allows non-linear dimensionality reduction [44]. It applies the kernel trick, as Support Vector Machines do [45], to map the data implicitly into a higher-dimensional space, where they become linearly separable. PCA (as explained above) is then performed in this higher-dimensional space, followed by projecting the data back to the original space.
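The two techniques above can be sketched with NumPy alone. This is an illustrative sketch rather than the authors' implementation: the function names `pca` and `rbf_kernel_pca`, the `gamma` value, and the random data are assumptions, and the kernel variant only returns the projections of the training samples.

```python
import numpy as np


def pca(X, n_components):
    """Linear PCA: center the data, then project onto the top right-singular vectors."""
    Xc = X - X.mean(axis=0)                      # PCA assumes centered data
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    explained_var = s**2 / (len(X) - 1)          # variance along each component
    return Xc @ vt[:n_components].T, explained_var[:n_components]


def rbf_kernel_pca(X, n_components, gamma=1.0):
    """Kernel PCA with an RBF kernel, computed from the centered kernel matrix."""
    sq_norms = np.sum(X**2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    K = np.exp(-gamma * np.maximum(sq_dists, 0.0))
    n = len(X)
    one_n = np.full((n, n), 1.0 / n)
    Kc = K - one_n @ K - K @ one_n + one_n @ K @ one_n   # center in feature space
    eigvals, eigvecs = np.linalg.eigh(Kc)                # eigenvalues in ascending order
    idx = np.argsort(eigvals)[::-1][:n_components]       # keep the largest ones
    # Projection of the training samples onto each kernel principal component.
    return eigvecs[:, idx] * np.sqrt(np.maximum(eigvals[idx], 0.0))


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))                    # hypothetical data, not the paper's
Z_lin, var = pca(X, n_components=3)
Z_rbf = rbf_kernel_pca(X, n_components=3, gamma=0.1)
print(Z_lin.shape, Z_rbf.shape)
```

In both cases the number of retained components is a free parameter; here it is fixed to 3 purely for the example.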
3.3. Feature Selection
- Random Forest assigns feature importance scores based on how much each feature decreases the impurity in the individual trees of the ensemble forest. Features with higher importance scores are considered more significant and are thus retained for subsequent modeling tasks [46].
- XGBoost, an implementation of gradient boosting, works by sequentially adding decision trees to an ensemble model, with each subsequent tree trained (by employing gradient descent techniques) to correct the errors of the previous ones [47].
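The sequential error-correcting idea behind boosting, and its use for ranking features, can be illustrated from scratch. The sketch below is not XGBoost itself (which adds regularization and second-order gradient information); it boosts one-split trees on the residuals of a squared-error loss and accumulates the error reduction per feature as an importance score. The function names, the learning rate, and the synthetic data are all assumptions.

```python
import numpy as np


def fit_stump(X, r):
    """Find the single-split regression tree that best fits the residuals r."""
    base_sse = float(np.sum((r - r.mean()) ** 2))
    best = None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:        # thresholds that leave both sides non-empty
            left = X[:, j] <= t
            lv, rv = r[left].mean(), r[~left].mean()
            sse = float(np.sum((r[left] - lv) ** 2) + np.sum((r[~left] - rv) ** 2))
            if best is None or sse < best[0]:
                best = (sse, j, t, lv, rv)
    sse, j, t, lv, rv = best
    return j, t, lv, rv, base_sse - sse          # last value: error reduction (gain)


def boost(X, y, n_rounds=50, lr=0.1):
    """Gradient boosting with stumps for squared error; returns fit and importances."""
    pred = np.full(len(y), y.mean())
    importance = np.zeros(X.shape[1])
    for _ in range(n_rounds):
        residual = y - pred                      # negative gradient of 0.5 * squared error
        j, t, lv, rv, gain = fit_stump(X, residual)
        pred = pred + lr * np.where(X[:, j] <= t, lv, rv)
        importance[j] += gain
    return pred, importance / importance.sum()


rng = np.random.default_rng(0)
X = rng.normal(size=(80, 5))                     # hypothetical descriptors
y = 3.0 * X[:, 2] + 0.1 * rng.normal(size=80)    # only feature 2 carries signal
pred, importance = boost(X, y)
selected = np.argsort(importance)[::-1][:2]      # keep the two top-ranked features
print(importance.round(3), selected)
```

Keeping only the top-ranked features mirrors the selection step described above; the cut-off of two features is arbitrary here.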
3.4. Regression Techniques
- Linear Regression (LR) [43] assumes a linear relationship between the descriptors and the target variable. It fits the straight line (a hyperplane when there are several descriptors) that minimizes the sum of squared residuals.
- Support Vector Regression (SVR) [48] uses kernel functions to map the original descriptor space to a higher-dimensional one, which enables samples to exhibit linear separability. It employs the Maximum Margin criterion.
- Random Forest (RF) [46] is based on ensemble learning. By combining the predictions of multiple decision trees, RF reduces overfitting and generalizes well to unseen data, improving predictive performance and robustness. It is less sensitive to noisy data and outliers compared to individual decision trees.
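As a concrete instance of the least-squares criterion used by Linear Regression, the following sketch fits a linear model with NumPy. The data, the true coefficients, and the variable names are hypothetical; the target is noiseless so that the fit can be checked exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))                      # hypothetical descriptors
true_w, true_b = np.array([2.0, -1.0, 0.5]), 4.0
y = X @ true_w + true_b                           # noiseless linear target

# Append a column of ones so the intercept is estimated with the weights.
A = np.hstack([X, np.ones((len(X), 1))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)      # minimizes ||A @ coef - y||^2
w, b = coef[:-1], coef[-1]
residual_ss = float(np.sum((A @ coef - y) ** 2))  # sum of squared residuals
print(w, b, residual_ss)
```

With noisy or non-linear targets the residual sum of squares no longer vanishes, which is where the kernel-based and ensemble models above become relevant.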
4. Experiments Results
4.1. Experiments Description
4.2. Results
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Data Availability Statement
Conflicts of Interest
References
- Leung, D.Y.; Caramanna, G.; Maroto-Valer, M.M. An overview of current status of carbon dioxide capture and storage technologies. Renew. Sustain. Energy Rev. 2014, 39, 426–443.
- IPCC—Intergovernmental Panel on Climate Change. Climate Change 2021—The Physical Science Basis: Working Group I Contribution to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change; Cambridge University Press: Cambridge, UK, 2023.
- Grignard, B.; Gennen, S.; Jérôme, C.; Kleij, A.W.; Detrembleur, C. Advances in the use of CO2 as a renewable feedstock for the synthesis of polymers. Chem. Soc. Rev. 2019, 48, 4466–4514.
- Dowell, N.M.; Fennell, P.S.; Shah, N.; Maitland, G.C. The role of CO2 capture and utilization in mitigating climate change. Nat. Clim. Chang. 2017, 7, 243–249.
- Inoue, S.; Koinuma, H.; Tsuruta, T. Copolymerization of carbon dioxide and epoxide with organometallic compounds. Die Makromol. Chem. 1969, 130, 210–220.
- Liu, Y.; Lu, X.B. Current Challenges and Perspectives in CO2-Based Polymers. Macromolecules 2023, 56, 1759–1777.
- Aresta, M.; Angelini, A. The Carbon Dioxide Molecule and the Effects of Its Interaction with Electrophiles and Nucleophiles. In Carbon Dioxide and Organometallics; Springer International Publishing: Cham, Switzerland, 2016; pp. 1–38.
- Wang, Y.; Darensbourg, D.J. Carbon dioxide-based functional polycarbonates: Metal catalyzed copolymerization of CO2 and epoxides. Coord. Chem. Rev. 2018, 372, 85–100.
- Andrea, K.A.; Kerton, F.M. Iron-catalyzed reactions of CO2 and epoxides to yield cyclic and polycarbonates. Polym. J. 2021, 53, 29–46.
- Bhat, G.A.; Darensbourg, D.J. Progress in the catalytic reactions of CO2 and epoxides to selectively provide cyclic or polymeric carbonates. Green Chem. 2022, 24, 5007–5034.
- Huang, J.; Worch, J.C.; Dove, A.P.; Coulembier, O. Update and Challenges in Carbon Dioxide-Based Polycarbonate Synthesis. ChemSusChem 2020, 13, 469–487.
- Kozak, C.M.; Ambrose, K.; Anderson, T.S. Copolymerization of carbon dioxide and epoxides by metal coordination complexes. Coord. Chem. Rev. 2018, 376, 565–587.
- Scharfenberg, M.; Hilf, J.; Frey, H. Functional Polycarbonates from Carbon Dioxide and Tailored Epoxide Monomers: Degradable Materials and Their Application Potential. Adv. Funct. Mater. 2018, 28, 1704302.
- Liu, X.M.; Lu, G.Q.; Yan, Z.F.; Beltramini, J. Recent Advances in Catalysts for Methanol Synthesis via Hydrogenation of CO and CO2. Ind. Eng. Chem. Res. 2003, 42, 6518–6530.
- Chen, G.; Shen, Z.; Iyer, A.; Ghumman, U.F.; Tang, S.; Bi, J.; Chen, W.; Li, Y. Machine-Learning-Assisted De Novo Design of Organic Molecules and Polymers: Opportunities and Challenges. Polymers 2020, 12, 163.
- Morgan, D.; Jacobs, R. Opportunities and Challenges for Machine Learning in Materials Science. Annu. Rev. Mater. Res. 2020, 50, 71–103.
- Batra, R.; Song, L.; Ramprasad, R. Emerging materials intelligence ecosystems propelled by machine learning. Nat. Rev. Mater. 2021, 6, 655–678.
- Pilania, G. Machine learning in materials science: From explainable predictions to autonomous design. Comput. Mater. Sci. 2021, 193, 110360.
- Kadulkar, S.; Sherman, Z.M.; Ganesan, V.; Truskett, T.M. Machine Learning-Assisted Design of Material Properties. Annu. Rev. Chem. Biomol. Eng. 2022, 13, 235–254.
- Otsuka, S.; Kuwajima, I.; Hosoya, J.; Xu, Y.; Yamazaki, M. PoLyInfo: Polymer Database for Polymeric Materials Design. In Proceedings of the 2011 International Conference on Emerging Intelligent Data and Web Technologies, Tirana, Albania, 7–9 September 2011; pp. 22–29.
- Curteanu, S.; Leon, F.; Mircea-Vicoveanu, A.M.; Logofătu, D. Regression Methods Based on Nearest Neighbors with Adaptive Distance Metrics Applied to a Polymerization Process. Mathematics 2021, 9, 547.
- Ghiba, L.; Drăgoi, E.N.; Curteanu, S. Neural network-based hybrid models developed for free radical polymerization of styrene. Polym. Eng. Sci. 2021, 61, 716–730.
- Fiosina, J.; Sievers, P.; Drache, M.; Beuermann, S. AI-Based Forecasting of Polymer Properties for High-Temperature Butyl Acrylate Polymerizations. ACS Polym. Au 2024, 4, 438–448.
- Aida, T.; Ishikawa, M.; Inoue, S. Alternating copolymerization of carbon dioxide and epoxide catalyzed by the aluminum porphyrin-quaternary organic salt or -triphenylphosphine system. Synthesis of polycarbonate with well-controlled molecular weight. Macromolecules 1986, 19, 8–13.
- Kobayashi, M.; Inoue, S.; Tsuruta, T. Copolymerization of carbon dioxide and epoxide by the dialkylzinc–carboxylic acid system. J. Polym. Sci. Polym. Chem. Ed. 1973, 11, 2383–2385.
- Ye, S.; Wang, S.; Lin, L.; Xiao, M.; Meng, Y. CO2 derived biodegradable polycarbonates: Synthesis, modification and applications. Adv. Ind. Eng. Polym. Res. 2019, 2, 143–160.
- Super, M.; Berluche, E.; Costello, C.; Beckman, E. Copolymerization of 1,2-Epoxycyclohexane and Carbon Dioxide Using Carbon Dioxide as Both Reactant and Solvent. Macromolecules 1997, 30, 368–372.
- Qin, Z.; Thomas, C.M.; Lee, S.; Coates, G.W. Cobalt-Based Complexes for the Copolymerization of Propylene Oxide and CO2: Active and Selective Catalysts for Polycarbonate Synthesis. Angew. Chem. Int. Ed. 2003, 42, 5484–5487.
- Mang, S.; Cooper, A.I.; Colclough, M.E.; Chauhan, N.; Holmes, A.B. Copolymerization of CO2 and 1,2-Cyclohexene Oxide Using a CO2-Soluble Chromium Porphyrin Catalyst. Macromolecules 2000, 33, 303–308.
- Kim, I.; Yi, M.J.; Byun, S.H.; Park, D.W.; Kim, B.U.; Ha, C.S. Biodegradable Polycarbonate Synthesis by Copolymerization of Carbon Dioxide with Epoxides Using a Heterogeneous Zinc Complex. Macromol. Symp. 2005, 224, 181–192.
- Kim, J.G.; Cowman, C.D.; LaPointe, A.M.; Wiesner, U.; Coates, G.W. Tailored Living Block Copolymerization: Multiblock Poly(cyclohexene carbonate)s with Sequence Control. Macromolecules 2011, 44, 1110–1113.
- Jansen, J.C.; Addink, R.; te Nijenhuis, K.; Mijs, W.J. Synthesis and characterization of novel side-chain liquid crystalline polycarbonates, 4. Synthesis of side-chain liquid crystalline polycarbonates with mesogenic groups having tails of different lengths. Macromol. Chem. Phys. 1999, 200, 1407–1420.
- Hu, Y.; Wei, Z.; Frey, A.; Kubis, C.; Ren, C.Y.; Spannenberg, A.; Jiao, H.; Werner, T. Catalytic, Kinetic, and Mechanistic Insights into the Fixation of CO2 with Epoxides Catalyzed by Phenol-Functionalized Phosphonium Salts. ChemSusChem 2021, 14, 363–372.
- Gu, L.; Qin, Y.; Gao, Y.; Wang, X.; Wang, F. Hydrophilic CO2-based biodegradable polycarbonates: Synthesis and rapid thermo-responsive behavior. J. Polym. Sci. Part A Polym. Chem. 2013, 51, 2834–2840.
- Eberhardt, R.; Allmendinger, M.; Rieger, B. DMAP/Cr(III) Catalyst Ratio: The Decisive Factor for Poly(propylene carbonate) Formation in the Coupling of CO2 and Propylene Oxide. Macromol. Rapid Commun. 2003, 24, 194–196.
- Coates, G.W.; Moore, D.R. Discrete Metal-Based Catalysts for the Copolymerization of CO2 and Epoxides: Discovery, Reactivity, Optimization, and Mechanism. Angew. Chem. Int. Ed. 2004, 43, 6618–6639.
- Moore, D.R.; Cheng, M.; Lobkovsky, E.B.; Coates, G.W. Electronic and Steric Effects on Catalysts for CO2/Epoxide Polymerization: Subtle Modifications Resulting in Superior Activities. Angew. Chem. Int. Ed. 2002, 41, 2599–2602.
- Byrne, C.M.; Allen, S.D.; Lobkovsky, E.B.; Coates, G.W. Alternating Copolymerization of Limonene Oxide and Carbon Dioxide. J. Am. Chem. Soc. 2004, 126, 11404–11405.
- O’Boyle, N.M. Towards a Universal SMILES representation—A standard method to generate canonical SMILES based on the InChI. J. Cheminformatics 2012, 4, 22.
- Moriwaki, H.; Tian, Y.S.; Kawashita, N.; Takagi, T. Mordred: A molecular descriptor calculator. J. Cheminformatics 2018, 10, 4.
- dos Santos, V.H.J.M.; Pontin, D.; Rambo, R.S.; Seferin, M. The Application of Quantitative Structure–Property Relationship Modeling and Exploratory Analysis to Screen Catalysts for the Synthesis of Oleochemical Carbonates from CO2 and Bio-Based Epoxides. J. Am. Oil Chem. Soc. 2020, 97, 817–837.
- Cover, T.; Hart, P. Nearest neighbor pattern classification. IEEE Trans. Inf. Theory 1967, 13, 21–27.
- Pearson, K. LIII. On lines and planes of closest fit to systems of points in space. Lond. Edinb. Dublin Philos. Mag. J. Sci. 1901, 2, 559–572.
- Schölkopf, B.; Smola, A.; Müller, K.R. Kernel principal component analysis. In International Conference on Artificial Neural Networks; Springer: Berlin/Heidelberg, Germany, 1997; pp. 583–588.
- Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297.
- Ho, T.K. Random decision forests. In Proceedings of the 3rd International Conference on Document Analysis and Recognition, Montreal, QC, Canada, 14–16 August 1995; Volume 1, pp. 278–282.
- Chen, T.; Guestrin, C. Xgboost: A scalable tree boosting system. In Proceedings of the 22nd ACM Sigkdd International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
- Drucker, H.; Burges, C.J.; Kaufman, L.; Smola, A.; Vapnik, V. Support vector regression machines. Adv. Neural Inf. Process. Syst. 1996, 9, 779–784.
Model | Name | Possible Values |
---|---|---|
SVR | kernel | [poly, rbf] |
SVR | C | (0.1, 1.0, 0.3) |
SVR | epsilon | (0.0, 1.0, 0.1) |
RF | max_depth | (1, 20, 2) |
RF | min_samples_split | (2, 20, 2) |
RF | min_samples_leaf | (1, 20, 2) |
RF | max_features | [sqrt, log2, None] + (0.1, 0.9, 0.1) |
RF | n_estimators | [10, 50, 100, 150, 200] |
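The search spaces listed for SVR and RF can be expanded into explicit candidate grids. The sketch below makes two assumptions that the source does not confirm: a three-element tuple is read as (start, stop, step), and the stop value is included. The helper name `expand` is hypothetical, and only the SVR space is expanded here.

```python
from itertools import product


def expand(spec):
    """Expand a (start, stop, step) tuple into a list; lists are returned unchanged.

    Assumption: the stop value is part of the range.
    """
    if isinstance(spec, list):
        return spec
    start, stop, step = spec
    n = int((stop - start) / step + 1e-9) + 1    # number of grid points, stop included
    return [round(start + i * step, 10) for i in range(n)]


# SVR search space as listed in the table.
svr_space = {
    "kernel": ["poly", "rbf"],
    "C": (0.1, 1.0, 0.3),
    "epsilon": (0.0, 1.0, 0.1),
}
svr_grid = {name: expand(spec) for name, spec in svr_space.items()}
svr_combinations = list(product(*svr_grid.values()))
print(svr_grid["C"], len(svr_combinations))
```

Under these assumptions each combination corresponds to one candidate model; if the stop value is in fact excluded, the grids shrink accordingly.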
Configuration | Mn (kg mol⁻¹), LR | Mn (kg mol⁻¹), SVR | Mn (kg mol⁻¹), RF | Mw/Mn, LR | Mw/Mn, SVR | Mw/Mn, RF | Conversion, LR | Conversion, SVR | Conversion, RF |
---|---|---|---|---|---|---|---|---|---|
RAW | −72.130 | 0.102 | 0.777 | −0.317 | 0.834 | 0.292 | 0.886 | ||
PCA_poly_sam | 0.095 | 0.750 | 0.200 | 0.793 | 0.146 | 0.909 | |||
PCA_poly_sam_RF | 0.095 | 0.152 | 0.441 | 0.068 | 0.385 | 0.723 | 0.270 | 0.137 | 0.768 |
PCA_poly_sam_XGB | 0.016 | 0.049 | 0.224 | 0.070 | 0.221 | 0.743 | 0.112 | 0.349 | 0.829 |
PCA_poly_var | 0.095 | 0.719 | 0.200 | 0.845 | 0.146 | 0.930 | |||
PCA_poly_var_RF | 0.095 | 0.152 | 0.444 | 0.076 | 0.393 | 0.709 | 0.178 | 0.113 | 0.716 |
PCA_poly_var_XGB | 0.016 | 0.049 | 0.209 | 0.070 | 0.221 | 0.751 | 0.112 | 0.349 | 0.829 |
PCA_rbf_sam | 0.289 | 0.718 | 0.704 | 0.834 | 0.404 | 0.901 | |||
PCA_rbf_sam_RF | 0.197 | 0.221 | 0.537 | 0.456 | 0.604 | 0.816 | 0.620 | 0.387 | 0.866 |
PCA_rbf_sam_XGB | 0.003 | 0.005 | 0.424 | 0.192 | 0.548 | 0.846 | 0.417 | 0.385 | 0.865 |
PCA_rbf_var | 0.289 | 0.778 | 0.704 | 0.843 | 0.404 | 0.849 | |||
PCA_rbf_var_RF | 0.198 | 0.222 | 0.556 | 0.459 | 0.601 | 0.861 | 0.581 | 0.385 | 0.868 |
PCA_rbf_var_XGB | 0.003 | 0.005 | 0.481 | 0.192 | 0.548 | 0.850 | 0.417 | 0.385 | 0.937 |
PCA_sam | 0.224 | 0.782 | 0.461 | 0.779 | 0.305 | 0.891 | |||
PCA_sam_RF | 0.067 | 0.239 | 0.639 | 0.183 | 0.411 | 0.750 | 0.492 | 0.308 | 0.875 |
PCA_sam_XGB | 0.067 | 0.239 | 0.594 | 0.156 | 0.373 | 0.725 | 0.305 | 0.394 | 0.899 |
PCA_var | 0.208 | 0.211 | 0.784 | 0.437 | 0.391 | 0.824 | 0.608 | 0.308 | 0.844 |
PCA_var_RF | 0.183 | 0.226 | 0.794 | 0.253 | 0.422 | 0.740 | 0.573 | 0.317 | 0.848 |
PCA_var_XGB | 0.105 | 0.187 | 0.711 | 0.437 | 0.391 | 0.793 | 0.608 | 0.308 | 0.859 |
RF | 0.266 | 0.103 | 0.605 | 0.470 | 0.451 | 0.777 | 0.213 | 0.398 | 0.744 |
XGB | 0.191 | 0.186 | 0.191 | 0.150 | 0.234 | 0.730 | 0.235 | 0.316 | 0.768 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Martinez, A.D.; Navajas-Guerrero, A.; Bediaga-Bañeres, H.; Sánchez-Bodón, J.; Ortiz, P.; Vilas-Vilela, J.L.; Moreno-Benitez, I.; Gil-Lopez, S. AI-Driven Insight into Polycarbonate Synthesis from CO2: Database Construction and Beyond. Polymers 2024, 16, 2936. https://doi.org/10.3390/polym16202936