Expectation-Maximization Model for Substitution of Missing Values Characterizing Greenness of Organic Solvents
Abstract
1. Introduction
2. Materials and Methods
2.1. Dataset
2.2. E-M Model
2.3. Dataset Preparation
3. Results and Discussion
3.1. Basic Statistics
3.2. Predictions with Bayesian Model
3.3. Application of E-M Algorithm
4. Conclusions
Supplementary Materials
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
Sample Availability: Samples of the compounds are not available from the authors.
Variable | Mean | Std. Dev. | Variance | Minimum | Maximum | N | N Missing |
---|---|---|---|---|---|---|---|
Melting point (°C) | −43.901 | 48.728 | 2374.453 | −140 | 49.52 | 152 | 3 |
Boiling point (°C) | 142.385 | 68.626 | 4709.488 | 20 | 323 | 155 | 0 |
Density (g cm⁻³) | 0.952 | 0.214 | 0.046 | 0.62 | 1.68 | 155 | 0 |
Water solubility (mg dm⁻³) | 116,796.63 | 244,339.68 | 5.97 × 10¹⁰ | 0.000927 | 1,000,000 | 155 | 0 |
Vapor pressure (Pa) | 11,901.626 | 28,631.781 | 8.2 × 10⁸ | 0 | 241,900 | 155 | 0 |
Henry law constant (Pa m³ mol⁻¹) | 60,736.714 | 267,946.02 | 7.18 × 10¹⁰ | 8.03 × 10⁻⁶ | 2,219,017 | 153 | 2 |
log KOW | 2.229 | 2.352 | 5.531 | −2.32 | 8.73 | 155 | 0 |
log KOA | 4.434 | 1.999 | 3.995 | 1.451 | 12.101 | 152 | 3 |
Oral LD50 (mg kg⁻¹) | 3667.383 | 4658.48 | 21,701,436 | 5 | 31,500 | 120 | 35 |
Inhalation LC50 (ppm) | 10,532.284 | 18,252.957 | 3.33 × 10⁸ | 34 | 123,000 | 109 | 46 |
Fish LC50 (mg dm⁻³) | 970.096 | 2813.093 | 7,913,490 | 0.1 | 16,700 | 98 | 57 |
BOD t1/2 (days) | 55.360 | 127.192 | 16,178 | 1 | 800 | 93 | 62 |
log BCF | 1.154 | 1.016 | 1.032 | −1.63 | 4.7 | 151 | 4 |
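
The columns of the table above (mean, standard deviation, variance, minimum, maximum, N and N missing) can be reproduced for a single variable with a few lines of NumPy. The sketch below is illustrative only: the function name `describe_with_missing` is hypothetical, missing values are assumed to be encoded as NaN, and the sample (n − 1) estimator is an assumption, since the table does not state which estimator was used.

```python
import numpy as np


def describe_with_missing(values):
    """Summarize one variable, ignoring NaN entries (assumed missing-value code)."""
    x = np.asarray(values, dtype=float)
    is_missing = np.isnan(x)
    observed = x[~is_missing]
    return {
        "mean": float(observed.mean()),
        # ddof=1 gives the sample estimator; this choice is an assumption.
        "std": float(observed.std(ddof=1)),
        "variance": float(observed.var(ddof=1)),
        "min": float(observed.min()),
        "max": float(observed.max()),
        "n": int(observed.size),
        "n_missing": int(is_missing.sum()),
    }


# Toy input, not taken from the dataset.
summary = describe_with_missing([1.0, 2.0, np.nan, 3.0])
print(summary)
```

Whatever estimator is used, the variance column should equal the square of the standard deviation column, which gives a quick consistency check on reported values.
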
Variable | Comp. 1 | Comp. 2 | Comp. 3 |
---|---|---|---|
Melting point | −0.4448 | −0.1465 | −0.0451 |
Boiling point | −0.4963 | −0.0883 | 0.0598 |
Density | −0.0607 | −0.0709 | −0.4854 |
Water solubility | 0.1150 | −0.3744 | 0.0708 |
Vapor pressure | 0.3478 | 0.0295 | −0.0888 |
log Henry law const | 0.1247 | 0.5103 | 0.0289 |
log KOW | −0.2462 | 0.4340 | 0.1635 |
log KOA | −0.4352 | −0.1795 | 0.1165 |
log Oral LD50 | −0.0528 | 0.0933 | 0.5678 |
log Inhalation LC50 | 0.2287 | 0.0600 | 0.4662 |
log fish LC50 | 0.1673 | −0.2642 | 0.2638 |
log BOD t1/2 | 0.0848 | 0.2915 | −0.3085 |
log BCF | −0.2492 | 0.4202 | 0.0174 |
No. | Solvent | CAS Number | Oral LD50 (mg kg⁻¹) | Inhalation LC50 (ppm) | Fish LC50 (mg dm⁻³) | BOD t1/2 (days) | log BCF |
---|---|---|---|---|---|---|---|
1 | Cyclopentane | 287-92-3 | 11,400 | 57,377 | 100 | 10.6 | 1.61 |
2 | Octane | 111-65-9 | 7930 | 25,260 | 100 | 13.7 | 3.289 |
3 | Nonane | 111-84-2 | 218 | 3200 | 6.5 | 16.4 | 2.651 |
4 | Decane | 124-18-5 | 5000 | 1369 | 500 | 40.0 | 2.158 |
5 | Tridecane | 629-50-5 | 5000 | 41 | 0.9 | 18.2 | 2.979 |
6 | Tetradecane | 629-59-4 | 15,000 | 5001 | 1000 | 38.7 | 3.036 |
7 | Pentadecane | 629-62-9 | 5000 | 5001 | 100.1 | 39.6 | 2.34 |
8 | 1-pentene | 109-67-1 | 3197 | 21,800 | 90.7 | 17.0 | 1.349 |
9 | 1-hexene | 646-04-8 | 10,000 | 32,000 | 5.6 | 10.7 | 1.91 |
10 | 1-heptene | 592-76-7 | 5000 | 27,986 | 175 | 12.6 | 2.372 |
11 | 1-octene | 111-66-0 | 10,000 | 8500 | 6.8 | 9.3 | 2.819 |
12 | 1-nonene | 124-11-8 | 4390 | 7116 | 5.0 | 9.6 | 3.266 |
13 | Pentanol | 71-41-0 | 2200 | 6119 | 370 | 4 | 0.463 |
14 | oleic alcohol | 143-28-2 | 9604 | 13,049 | 46.7 | 9.0 | 2.623 |
15 | 1,3-di-iso-propoxy-2-propanol | 13021-54-0 | 1267 | 2725 | 33.5 | 1.6 | 0.5 |
16 | 1,3-dimethoxypropan-2-ol | | 1393 | 3794 | 104.7 | 2.5 | 0.5 |
17 | 1,3-di-n-butoxy-2-propanol | | 1130 | 885 | 69.4 | 2.1 | 0.603 |
18 | 1-ethoxy-3-iso-propoxy-2-propanol | | 1256 | 1889 | 377.7 | 4.3 | 0.5 |
19 | 1-methoxy-3-(propan-2-yloxy)propan-2-ol | | 1498 | 2945 | 160.8 | 2.1 | 0.5 |
20 | 1-n-butoxy-3-ethoxy-2-propanol | | 2220 | 2347 | 232.2 | 1.8 | 0.5 |
21 | 1-n-butoxy-3-iso-propoxy-2-propanol | | 3047 | 4273 | 188.0 | 1.5 | 0.168 |
22 | 1-n-butoxy-3-methoxy-2-propanol | | 1883 | 2582 | 197.6 | 2.1 | 0.5 |
23 | 1-tert-butoxy-3-ethoxy-2-propanol | | 2568 | 4601 | 97.1 | 1.3 | 0.5 |
24 | 1-tert-butoxy-3-methoxy-2-propanol | | 1477 | 3305 | 35.3 | 1.4 | 0.5 |
25 | 3-butoxypropane-1,2-diol | | 3875 | 2818 | 203.4 | 1.5 | 0.5 |
26 | 3-ethoxypropane-1,2-diol | | 2538 | 2663 | 186.2 | 1.9 | 0.5 |
27 | 3-methoxypropane-1,2-diol | | 2081 | 1985 | 272.4 | 2.6 | 0.5 |
28 | 3-n-butoxy-1-tert-butoxy-2-propanol | | 5660 | 5167 | 51.7 | 1.4 | 0.517 |
29 | Isopropylidene glycerol | 100-79-8 | 7000 | 167,197 | 16,700 | 1.3 | 0.125 |
30 | Methoxycyclopentane | 5614-37-9 | 1500 | 5250 | 34.9 | 6.6 | 0.721 |
31 | Benzyl ethyl ether | 539-30-0 | 2428 | 2625 | 38.6 | 6.6 | 1.374 |
32 | 1,2,3-trimethoxypropane | | 1305 | 2815 | 135.8 | 5.3 | 0.5 |
33 | 1,2,3-tri-n-butoxypropane | | 4390 | 5001 | 261.2 | 4.5 | 2.276 |
34 | 2-methylfuran | | 1965 | 9352 | 94.3 | 16.0 | 0.725 |
35 | 2-methyltetrahydrofuran | | 4500 | 24,083 | 319.6 | 6.4 | 0.343 |
36 | 3-n-butoxy-1-tert-butoxy-2-methoxypropane | | 2392 | 1656 | 95.4 | 2.9 | 1.094 |
37 | Isosorbide dimethyl ether | | 1545 | 18,269 | 213.8 | 4.9 | 0.5 |
38 | Dioxolane | 646-06-0 | 2833 | 37,363 | 31.0 | 6.1 | 0.149 |
39 | Benzaldehyde | 100-52-7 | 1300 | 1304 | 1.07 | 10 | 1.1 |
40 | gamma-valerolactone | 108-29-2 | 2800 | 1186 | 756.6 | 7.8 | 0.5 |
41 | Dihydrolevoglucosenone | | 2021 | 2916 | 59.3 | 4.4 | 0.5 |
42 | 1,8-cineole | 470-82-6 | 2480 | 1000 | 102 | 26.4 | 1.41 |
43 | 3-carene | 13466-78-9 | 4800 | 8800 | 17.9 | 28 | 2.673 |
44 | Neryl acetate | 141-12-8 | 4550 | 5001 | 41.7 | 8.7 | 2.365 |
45 | Propionic acid | 79-19-4 | 3500 | 5422 | 51 | 1 | 0 |
46 | Ethyl formate | | 1850 | 9800 | 276.6 | 15 | 0.5 |
47 | Butyl levulinate | 2052-15-5 | 5000 | 5001 | 26.3 | 3.3 | 0.278 |
48 | Ethyl levulinate | 539-88-8 | 5000 | 4735 | 121.3 | 3.3 | 0.5 |
49 | Glycerol triacetate | 102-76-1 | 3000 | 5001 | 72.5 | 2.2 | 0.5 |
50 | Methyl caprylate | 111-11-5 | 10,800 | 9987 | 95 | 7.0 | 1.856 |
51 | Methyl lactate | 27871-49-4 | 5000 | 1350 | 828.6 | 11.8 | 0.5 |
52 | Methyl levulinate | 624-45-3 | 2051 | 2888 | 92.7 | 3.4 | 0.5 |
53 | Methyl linoleate | 112-63-0 | 3977 | 5001 | 4.5 | 20.4 | 3.051 |
54 | Isopropyl myristate | 110-27-0 | 8348 | 11,207 | 8.4 | 10.7 | 3.07 |
55 | Methyl oleate | 112-62-9 | 2000 | 5001 | 6.1 | 18.9 | 2.694 |
56 | Methyl palmitate | 112-39-0 | 4786 | 5001 | 1.8 | 9.4 | 2.789 |
57 | Isopropyl palmitate | 142-91-6 | 17,781 | 45,414 | 50.3 | 13.0 | 1.725 |
58 | Methyl stearate | 112-61-8 | 5237 | 5001 | 2.8 | 10.4 | 1.46 |
59 | Tributyl 2-acetylcitrate | 77-90-7 | 31,500 | 226,174 | 60 | 14 | 1.6 |
60 | Benzyl benzoate | 120-51-4 | 1700 | 665 | 6.2 | 5.3 | 2.357 |
61 | cis-1,2-dichloroethene | 156-59-2 | 1393 | 13,700 | 54.2 | 180 | 1.18 |
62 | 1,1-dichloroethane | 75-34-3 | 725 | 13,000 | 100.0 | 154 | 1.24 |
63 | 1,1,1,2-tetrachloroethane | 630-20-6 | 670 | 2100 | 20 | 134.0 | 1.559 |
64 | 1-chloropropane | 540-54-5 | 2000 | 14,034 | 117.8 | 30 | 0.763 |
65 | 1-chlorobutane | 109-69-3 | 2670 | 11,879 | 101.2 | 18.2 | 1.333 |
66 | 1-chloropentane | 543-59-9 | 3379 | 11,804 | 27.8 | 10.5 | 1.402 |
67 | Dimethyl sulphide | 75-18-3 | 535 | 5156 | 87.1 | 10.7 | 0.561 |
68 | Dimethyl sulfoxide | 67-68-5 | 2758 | 4291 | 36.9 | 1.6 | 0.349 |
69 | Diethylamine | 109-89-7 | 540 | 4000 | 218.5 | 5.0 | 0.21 |
70 | 2-pyrrolidone | 616-45-5 | 2030 | 1083 | 152.5 | 2.5 | 0.5 |
Variable | Mean | Mean Error |
---|---|---|
Melting point (°C) | −43.1378 | −0.0056 |
Boiling point (°C) | 142.3852 | −0.4245 |
Density (g cm⁻³) | 0.9521 | −0.0005 |
Water solubility (mg dm⁻³) | 116,796.6328 | −279.7044 |
Vapor pressure (Pa) | 11,901.6258 | 403.5914 |
Henry law constant (Pa m³ mol⁻¹) | 2.4677 | 0.0955 |
log KOW | 2.2285 | 0.0125 |
log KOA | 4.4644 | −0.0269 |
Oral LD50 (mg kg⁻¹) | 7.6122 | −0.0093 |
Inhalation LC50 (ppm) | 8.2734 | 0.0014 |
Fish LC50 (mg dm⁻³) | 4.2248 | −0.0069 |
BOD t1/2 (days) | 2.2431 | 0.0065 |
log BCF | 1.1450 | 0.0076 |
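
As a rough illustration of the kind of procedure named in the title, the following sketch substitutes missing entries with an E-M iteration under a multivariate normal assumption. It is not the authors' implementation: the function name `em_impute`, the NaN encoding of missing values, the small ridge term, the stopping rule on the mean vector, and the toy data are all assumptions made for this example, and it expects at least two variables.

```python
import numpy as np


def em_impute(X, n_iter=100, tol=1e-6):
    """E-M imputation under a multivariate normal model (illustrative sketch)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    missing = np.isnan(X)

    # Start from column means and the covariance of the mean-filled data.
    mu = np.nanmean(X, axis=0)
    filled = np.where(missing, mu, X)
    sigma = np.atleast_2d(np.cov(filled, rowvar=False, bias=True))
    sigma = sigma + 1e-6 * np.eye(p)  # ridge term: an assumption for stability

    for _ in range(n_iter):
        new_filled = X.copy()
        correction = np.zeros((p, p))
        # E-step: conditional mean and covariance of the missing part per row.
        for i in range(n):
            m = missing[i]
            if not m.any():
                continue
            o = ~m
            if not o.any():
                new_filled[i] = mu
                correction += sigma
                continue
            s_oo = sigma[np.ix_(o, o)]
            s_mo = sigma[np.ix_(m, o)]
            coef = np.linalg.solve(s_oo, s_mo.T).T  # equals s_mo @ inv(s_oo)
            new_filled[i, m] = mu[m] + coef @ (X[i, o] - mu[o])
            correction[np.ix_(m, m)] += sigma[np.ix_(m, m)] - coef @ s_mo.T
        # M-step: update mean and covariance from the expected statistics.
        new_mu = new_filled.mean(axis=0)
        centered = new_filled - new_mu
        new_sigma = (centered.T @ centered + correction) / n
        converged = np.max(np.abs(new_mu - mu)) < tol
        mu, sigma, filled = new_mu, new_sigma, new_filled
        if converged:
            break
    return filled, mu, sigma


# Toy usage with synthetic, strongly correlated data (not the solvent dataset).
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = 2.0 * x1 + 0.1 * rng.normal(size=200)
data = np.column_stack([x1, x2])
data[rng.choice(200, size=40, replace=False), 1] = np.nan
completed, mean_est, cov_est = em_impute(data)
print(np.isnan(completed).sum())
```

The observed entries are left untouched; only the NaN positions are replaced by their conditional expectations given the observed variables in the same row. Skewed variables would typically be log-transformed before such a model is fitted, which is an assumption here rather than a statement about the study.
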
© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Łuczyńska, G.; Pena-Pereira, F.; Tobiszewski, M.; Namieśnik, J. Expectation-Maximization Model for Substitution of Missing Values Characterizing Greenness of Organic Solvents. Molecules 2018, 23, 1292. https://doi.org/10.3390/molecules23061292