Predictive Modeling of Critical Temperatures in Superconducting Materials
Abstract
1. Introduction
2. Results and Discussion
2.1. Data Pre-Processing
2.2. Model Development
2.3. Optimization of the Best Models
2.4. Interpretation of Optimized Model and Potential Real-World Applications
3. Materials and Methods
3.1. Dataset
3.2. Duplicates Removal
3.3. Attribute Selection
3.4. QSPR Modeling
4. Conclusions
Supplementary Materials
Author Contributions
Funding
Conflicts of Interest
Sample Availability
Abbreviations

| Abbreviation | Definition |
|---|---|
| AE | absolute error |
| MLR | multiple linear regression |
| NN | neural network |
| PCA | principal component analysis |
| RF | random forests |
| PLS | projections to latent structures |
| RMSE | root mean squared error |
| XGBoost | gradient boosted decision trees |
References
- Hamidieh, K. A data-driven statistical model for predicting the critical temperature of a superconductor. Comput. Mater. Sci. 2018, 154, 346–354.
- Mousavi, T.; Grovenor, C.R.M.; Speller, S.C. Structural parameters affecting superconductivity in iron chalcogenides: A review. Mater. Sci. Technol. 2014.
- Bardeen, J.; Rickayzen, G.; Tewordt, L. Theory of the Thermal Conductivity of Superconductors. Phys. Rev. 1959, 113, 982–994.
- Gallop, J.C. Introduction to Superconductivity, in: SQUIDs. Josephson Eff. Supercond. Electron. 2018.
- Schafroth, M.R. Theory of superconductivity. Phys. Rev. 1954.
- Stanev, V.; Oses, C.; Kusne, A.G.; Rodriguez, E.; Paglione, J.; Curtarolo, S.; Takeuchi, I. Machine learning modeling of superconducting critical temperature. Npj Comput. Mater. 2018.
- Kononenko, O.; Adolphsen, C.; Li, Z.; Ng, C.-K.; Rivetta, C. 3D multiphysics modeling of superconducting cavities with a massively parallel simulation suite. Phys. Rev. Accel. Beams. 2017, 20, 102001.
- Tanaka, I.; Rajan, K.; Wolverton, C. Data-centric science for materials innovation. MRS Bull. 2018, 43, 659–663.
- Liu, Y.; Zhao, T.; Ju, W.; Shi, S. Materials discovery and design using machine learning. J. Mater. 2017.
- Smith, J.S.; Isayev, O.; Roitberg, A.E. ANI-1: An extensible neural network potential with DFT accuracy at force field computational cost. Chem. Sci. 2017, 4, 3192–3203.
- Jha, D.; Ward, L.; Paul, A.; Liao, W.; Choudhary, A.; Wolverton, C.; Agrawal, A. ElemNet: Deep Learning the Chemistry of Materials From Only Elemental Composition. Sci. Rep. 2018, 8, 17593.
- Sizochenko, N.; Mikolajczyk, A.; Jagiello, K.; Puzyn, T.; Leszczynski, J.; Rasulev, B. How toxicity of nanomaterials towards different species could be simultaneously evaluated: Novel multi-nano-read-across approach. Nanoscale 2018, 10, 582–591.
- Halder, A.K.; Moura, A.S.; Cordeiro, M.N.D.S. QSAR modelling: A therapeutic patent review 2010-present. Expert Opin. Ther. Pat. 2018.
- Goh, G.B.; Hodas, N.O.; Vishnu, A. Deep learning for computational chemistry. J. Comput. Chem. 2017.
- Tropsha, A. Best practices for QSAR model development, validation, and exploitation. Mol. Inform. 2010, 29, 476–488.
- Cherkasov, A.; Muratov, E.N.; Fourches, D.; Varnek, A.; Baskin, I.I.; Cronin, M.; Dearden, J.; Gramatica, P.; Martin, Y.C.; Todeschini, R.; et al. QSAR Modeling: Where Have You Been? Where Are You Going To? J. Med. Chem. 2014, 57, 4977–5010.
- Correa-Baena, J.-P.; Hippalgaonkar, K.; van Duren, J.; Jaffer, S.; Chandrasekhar, V.R.; Stevanovic, V.; Wadia, C.; Guha, S.; Buonassisi, T. Accelerating Materials Development via Automation, Machine Learning, and High-Performance Computing. Joule 2018, 2, 1410–1420.
- Ghiringhelli, L.M.; Vybiral, J.; Levchenko, S.V.; Draxl, C.; Scheffler, M. Big Data of Materials Science: Critical Role of the Descriptor. Phys. Rev. Lett. 2015, 114, 105503.
- De Jong, M.; Chen, W.; Notestine, R.; Persson, K.; Ceder, G.; Jain, A.; Asta, M.; Gamst, A. A Statistical Learning Framework for Materials Science: Application to Elastic Moduli of k-nary Inorganic Polycrystalline Compounds. Sci. Rep. 2016, 6, 34256.
- Lehmus, K.; Karppinen, M. Application of Multivariate Data Analysis Techniques in Modeling Structure–Property Relationships of Some Superconductive Cuprates. J. Solid State Chem. 2001, 162, 1–9.
- Villars, P.; Phillips, J. Quantum structural diagrams and high-T_{c} superconductivity. Phys. Rev. B. 1988, 37, 2345–2348.
- OECD. Guidance Document on the Validation of (Quantitative) Structure-Activity Relationship [(Q)Sar] Models. Transport 2007.
- Sizochenko, N.; Jagiello, K.; Leszczynski, J.; Puzyn, T. How the “Liquid Drop” Approach Could Be Efficiently Applied for Quantitative Structure–Property Relationship Modeling of Nanofluids. J. Phys. Chem. C. 2015, 119, 25542–25547.
- Mejía-Salazar, J.R.; Perea, J.D.; Castillo, R.; Diosa, J.E.; Baca, E. Hybrid superconducting-ferromagnetic [Bi2Sr2(Ca,Y)2Cu3O10]0.99(La2/3Ba1/3MnO3)0.01 composite thick films. Materials 2019, 12, 861.
- Zhang, G.; Samuely, T.; Xu, Z.; Jochum, J.K.; Volodin, A.; Zhou, S.; May, P.W.; Onufriienko, O.; Kačmarčík, J.; Steele, J.A.; et al. Superconducting Ferromagnetic Nanodiamond. ACS Nano. 2017.
- Bache, K.; Lichman, M. UCI Machine Learning Repository. Univ. Calif. Irvine Sch. Inf. 2013.
- Xu, Y.; Hosoya, J.; Sakairi, Y.; Yamasato, H. Superconducting Material Database (SuperCon), n.d. Available online: https://supercon.nims.go.jp/index_en.html (accessed on 18 August 2020).
- Jurs, P.C. Mathematica. J. Chem. Inf. Comput. Sci. 1992.
- Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
- Liu, P.; Long, W. Current mathematical methods used in QSAR/QSPR studies. Int. J. Mol. Sci. 2009, 10, 1978.
- Friedman, J.H. Greedy function approximation: A gradient boosting machine. Ann. Stat. 2001.
- Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
- RapidMiner Studio, version (9.3); (n.d.); RapidMiner Inc.: Boston, MA, USA, 2019.





| Preprocessing | Attribute Selection | R² | RMSE | AE |
|---|---|---|---|---|
| Cleaned Dataset | n/a | 0.726 ± 0.012 | 17.664 ± 0.279 | 13.317 ± 0.194 |
| | weight by relief | 0.611 ± 0.017 | 21.038 ± 0.490 | 16.286 ± 0.374 |
| | weight by PCA | 0.606 ± 0.016 | 21.170 ± 0.453 | 16.131 ± 0.326 |
| | weight by correlation | 0.618 ± 0.011 | 20.860 ± 0.372 | 16.060 ± 0.239 |
| Correlations Removed | n/a | 0.699 ± 0.009 | 18.505 ± 0.348 | 14.185 ± 0.265 |
| | weight by relief | 0.657 ± 0.021 | 19.771 ± 0.521 | 14.957 ± 0.391 |
| | weight by PCA | 0.576 ± 0.011 | 21.957 ± 0.243 | 17.339 ± 0.243 |
| | weight by correlation | 0.610 ± 0.006 | 21.063 ± 0.236 | 16.760 ± 0.165 |
| No Outliers | n/a * | 0.734 ± 0.007 | 17.414 ± 0.251 | 13.124 ± 0.241 |
| | weight by relief | 0.607 ± 0.013 | 21.199 ± 0.349 | 16.351 ± 0.342 |
| | weight by PCA | 0.616 ± 0.012 | 20.936 ± 0.289 | 15.927 ± 0.262 |
| | weight by correlation | 0.626 ± 0.014 | 20.682 ± 0.347 | 15.882 ± 0.239 |
| Correlations Removed, No Outliers | n/a | 0.708 ± 0.016 | 18.244 ± 0.435 | 13.983 ± 0.411 |
| | weight by relief | 0.603 ± 0.017 | 21.310 ± 0.378 | 16.631 ± 0.347 |
| | weight by PCA | 0.585 ± 0.010 | 21.761 ± 0.367 | 17.163 ± 0.270 |
| | weight by correlation | 0.619 ± 0.016 | 20.867 ± 0.323 | 16.578 ± 0.293 |

| Preprocessing | Attribute Selection | R² | RMSE | AE |
|---|---|---|---|---|
| Cleaned Dataset | n/a | 0.840 ± 0.011 | 14.376 ± 0.346 | 10.515 ± 0.269 |
| | weight by relief | 0.801 ± 0.015 | 15.774 ± 0.467 | 11.489 ± 0.366 |
| | weight by PCA | 0.808 ± 0.007 | 15.576 ± 0.319 | 11.354 ± 0.143 |
| | weight by correlation | 0.803 ± 0.009 | 15.715 ± 0.315 | 11.442 ± 0.231 |
| Correlations Removed | n/a | 0.831 ± 0.012 | 14.718 ± 0.441 | 10.704 ± 0.309 |
| | weight by relief | 0.810 ± 0.011 | 15.486 ± 0.406 | 11.356 ± 0.193 |
| | weight by PCA | 0.799 ± 0.006 | 15.864 ± 0.247 | 11.441 ± 0.220 |
| | weight by correlation | 0.814 ± 0.006 | 15.337 ± 0.273 | 11.143 ± 0.173 |
| No Outliers | n/a * | 0.847 ± 0.009 | 14.132 ± 0.347 | 10.314 ± 0.260 |
| | weight by relief | 0.810 ± 0.014 | 15.473 ± 0.344 | 11.250 ± 0.226 |
| | weight by PCA | 0.812 ± 0.007 | 15.424 ± 0.291 | 11.238 ± 0.191 |
| | weight by correlation | 0.810 ± 0.012 | 15.494 ± 0.250 | 11.222 ± 0.181 |
| Correlations Removed, No Outliers | n/a | 0.839 ± 0.012 | 14.428 ± 0.428 | 10.472 ± 0.301 |
| | weight by relief | 0.817 ± 0.014 | 15.237 ± 0.349 | 11.113 ± 0.245 |
| | weight by PCA | 0.803 ± 0.015 | 15.756 ± 0.428 | 11.337 ± 0.266 |
| | weight by correlation | 0.820 ± 0.016 | 15.114 ± 0.463 | 10.969 ± 0.280 |

| Preprocessing | Attribute Selection | R² | RMSE | AE |
|---|---|---|---|---|
| Cleaned Dataset | n/a | 0.863 ± 0.010 | 12.614 ± 0.466 | 8.351 ± 0.300 |
| | weight by relief | 0.836 ± 0.005 | 13.745 ± 0.239 | 9.105 ± 0.171 |
| | weight by PCA | 0.844 ± 0.007 | 13.410 ± 0.315 | 8.815 ± 0.150 |
| | weight by correlation | 0.851 ± 0.007 | 13.119 ± 0.194 | 8.643 ± 0.166 |
| Correlations Removed | n/a | 0.855 ± 0.011 | 12.965 ± 0.490 | 8.591 ± 0.315 |
| | weight by relief | 0.830 ± 0.014 | 13.987 ± 0.470 | 9.308 ± 0.249 |
| | weight by PCA | 0.837 ± 0.011 | 13.715 ± 0.354 | 9.010 ± 0.203 |
| | weight by correlation | 0.846 ± 0.009 | 13.331 ± 0.391 | 8.788 ± 0.202 |
| No Outliers | n/a * | 0.868 ± 0.007 | 12.399 ± 0.247 | 8.180 ± 0.165 |
| | weight by relief | 0.848 ± 0.011 | 13.278 ± 0.439 | 8.748 ± 0.276 |
| | weight by PCA | 0.849 ± 0.010 | 13.224 ± 0.496 | 8.670 ± 0.313 |
| | weight by correlation | 0.856 ± 0.007 | 12.893 ± 0.251 | 8.431 ± 0.134 |
| Correlations Removed, No Outliers | n/a | 0.859 ± 0.014 | 12.790 ± 0.371 | 8.426 ± 0.177 |
| | weight by relief | 0.848 ± 0.017 | 13.266 ± 0.558 | 8.789 ± 0.277 |
| | weight by PCA | 0.843 ± 0.010 | 13.497 ± 0.415 | 8.827 ± 0.229 |
| | weight by correlation | 0.853 ± 0.015 | 13.063 ± 0.474 | 8.579 ± 0.230 |

| Preprocessing | Attribute Selection | R² | RMSE | AE |
|---|---|---|---|---|
| Cleaned Dataset | n/a | 0.837 ± 0.012 | 14.194 ± 0.696 | 9.619 ± 0.426 |
| | weight by relief | 0.746 ± 0.013 | 17.685 ± 0.603 | 12.667 ± 0.755 |
| | weight by PCA | 0.763 ± 0.012 | 16.902 ± 0.866 | 11.906 ± 1.058 |
| | weight by correlation | 0.769 ± 0.011 | 16.857 ± 1.009 | 12.028 ± 1.167 |
| Correlations Removed | n/a | 0.831 ± 0.009 | 14.637 ± 0.848 | 10.379 ± 0.999 |
| | weight by relief | 0.783 ± 0.019 | 16.496 ± 1.023 | 11.700 ± 1.117 |
| | weight by PCA | 0.766 ± 0.016 | 17.086 ± 0.942 | 12.249 ± 0.987 |
| | weight by correlation | 0.780 ± 0.012 | 16.746 ± 1.231 | 12.054 ± 1.343 |
| No Outliers | n/a * | 0.842 ± 0.007 | 14.186 ± 0.794 | 10.021 ± 1.137 |
| | weight by relief | 0.755 ± 0.013 | 17.460 ± 0.773 | 12.497 ± 1.069 |
| | weight by PCA | 0.773 ± 0.013 | 16.888 ± 0.942 | 12.287 ± 1.019 |
| | weight by correlation | 0.774 ± 0.013 | 16.805 ± 0.937 | 12.004 ± 1.007 |
| Correlations Removed, No Outliers | n/a | 0.834 ± 0.010 | 13.996 ± 0.332 | 9.369 ± 0.305 |
| | weight by relief | 0.777 ± 0.010 | 16.541 ± 0.599 | 11.817 ± 0.726 |
| | weight by PCA | 0.775 ± 0.012 | 16.858 ± 1.206 | 12.016 ± 1.566 |
| | weight by correlation | 0.793 ± 0.012 | 16.394 ± 1.532 | 11.916 ± 2.028 |

| Preprocessing | Performance | MLR | XGBoost | RF | NN |
|---|---|---|---|---|---|
| Aggregation Only 1 | R² | 0.542 ± 0.014 | 0.768 ± 0.014 | 0.825 ± 0.008 | 0.688 ± 0.013 |
| | RMSE | 0.677 ± 0.012 | 0.501 ± 0.013 | 0.421 ± 0.012 | 0.566 ± 0.018 |
| | AE | 0.535 ± 0.008 | 0.364 ± 0.011 | 0.278 ± 0.007 | 0.408 ± 0.023 |
| Aggregation Only 1, No outliers | R² | 0.530 ± 0.013 | 0.780 ± 0.012 | 0.834 ± 0.012 | 0.691 ± 0.021 |
| | RMSE | 0.673 ± 0.017 | 0.492 ± 0.012 | 0.412 ± 0.015 | 0.574 ± 0.023 |
| | AE | 0.551 ± 0.016 | 0.356 ± 0.009 | 0.270 ± 0.010 | 0.419 ± 0.024 |
| Aggregation, Merged Attributes | R² | 0.726 ± 0.011 | 0.840 ± 0.012 | 0.863 ± 0.011 | 0.836 ± 0.009 |
| | RMSE | 17.657 ± 0.421 | 14.376 ± 0.433 | 12.615 ± 0.433 | 14.224 ± 0.591 |
| | AE | 13.312 ± 0.263 | 10.490 ± 0.293 | 8.339 ± 0.261 | 9.932 ± 0.836 |
| Aggregation, Merged Attributes, No outliers | R² | 0.735 ± 0.006 | 0.846 ± 0.012 | 0.867 ± 0.012 | 0.844 ± 0.011 |
| | RMSE | 17.409 ± 0.300 | 14.126 ± 0.378 | 12.405 ± 0.524 | 13.624 ± 0.391 |
| | AE | 13.121 ± 0.293 | 10.279 ± 0.318 | 8.186 ± 0.377 | 9.469 ± 0.504 |

| Preprocessing | Attribute Selection | R² | RMSE | AE |
|---|---|---|---|---|
| Original Dataset (with Duplicates) | n/a (XGBoost model from [1]) | 0.92 | 9.5 | - |
| | n/a | 0.926 ± 0.004 | 9.344 ± 0.289 | 5.142 ± 0.147 |
| | weight by relief | 0.922 ± 0.005 | 9.544 ± 0.372 | 5.313 ± 0.160 |
| | weight by PCA | 0.922 ± 0.007 | 9.551 ± 0.357 | 5.346 ± 0.107 |
| | weight by correlation | 0.923 ± 0.007 | 9.494 ± 0.504 | 5.297 ± 0.168 |
| Cleaned Dataset | n/a | 0.923 ± 0.005 | 9.365 ± 0.329 | 5.168 ± 0.110 |
| | weight by relief | 0.914 ± 0.009 | 9.882 ± 0.518 | 5.504 ± 0.221 |
| | weight by PCA | 0.917 ± 0.009 | 9.737 ± 0.476 | 5.513 ± 0.248 |
| | weight by correlation | 0.917 ± 0.009 | 9.683 ± 0.492 | 5.510 ± 0.141 |
| Correlations Removed | n/a | 0.925 ± 0.005 | 9.265 ± 0.244 | 5.170 ± 0.190 |
| | weight by relief | 0.920 ± 0.009 | 9.557 ± 0.511 | 5.377 ± 0.256 |
| | weight by PCA | 0.918 ± 0.008 | 9.665 ± 0.442 | 5.463 ± 0.189 |
| | weight by correlation | 0.919 ± 0.009 | 9.613 ± 0.544 | 5.424 ± 0.235 |
| No Outliers | n/a * | 0.930 ± 0.012 | 8.927 ± 0.689 | 4.975 ± 0.259 |
| | weight by relief | 0.921 ± 0.007 | 9.497 ± 0.417 | 5.334 ± 0.169 |
| | weight by PCA | 0.920 ± 0.007 | 9.557 ± 0.388 | 5.408 ± 0.211 |
| | weight by correlation | 0.922 ± 0.010 | 9.444 ± 0.593 | 5.354 ± 0.285 |
| Correlations Removed, No Outliers | n/a | 0.929 ± 0.005 | 9.012 ± 0.319 | 5.030 ± 0.121 |
| | weight by relief | 0.924 ± 0.004 | 9.336 ± 0.242 | 5.296 ± 0.121 |
| | weight by PCA | 0.922 ± 0.006 | 9.413 ± 0.379 | 5.332 ± 0.196 |
| | weight by correlation | 0.921 ± 0.011 | 9.477 ± 0.659 | 5.334 ± 0.279 |
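
The performance tables above report R², RMSE and AE as a mean ± spread. A minimal sketch of how such figures can be computed is given below. It rests on several assumptions: R² is taken as the coefficient of determination (some tools report the squared Pearson correlation instead, and the tables do not say which was used), AE is taken as the mean absolute error, and the ± term is taken as the sample standard deviation over repeated evaluation folds. The helper names and the toy values are illustrative only and are not data from this study.

```python
import math
from statistics import mean, stdev


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean squared error, in the units of the target."""
    return math.sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


def absolute_error(y_true, y_pred):
    """Mean absolute error, in the units of the target."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def summarize(fold_scores):
    """Return (mean, sample standard deviation) for 'mean ± std' reporting."""
    return mean(fold_scores), stdev(fold_scores)


# Toy values for illustration only (not taken from the paper).
y_true = [10.0, 20.0, 30.0, 40.0]
y_pred = [12.0, 18.0, 33.0, 37.0]
print("R2  :", round(r2_score(y_true, y_pred), 3))
print("RMSE:", round(rmse(y_true, y_pred), 3))
print("AE  :", round(absolute_error(y_true, y_pred), 3))

# Hypothetical per-fold R2 values, summarized as mean ± std.
fold_mean, fold_std = summarize([0.92, 0.93, 0.94])
print(f"R2 over folds: {fold_mean:.3f} ± {fold_std:.3f}")
```

RMSE and AE share the units of the target, so they can be compared directly with each other but not with the unitless R².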

| Attribute 1 | Relative Importance | Scaled Importance |
|---|---|---|
| range_ThermalConductivity | 47,722,904.0 | 1.000 |
| wtd_gmean_ThermalConductivity | 10,336,861.0 | 0.217 |
| range_atomic_radius | 3,051,781.3 | 0.064 |
| range_atomic_mass | 2,503,977.0 | 0.052 |
| range_fie | 2,469,144.3 | 0.052 |
| wtd_range_fie | 1,768,628.4 | 0.037 |
| wtd_mean_atomic_mass | 1,551,901.4 | 0.033 |
| mean_Density | 1,533,498.8 | 0.032 |
| gmean_atomic_radius | 1,522,213.5 | 0.032 |
| wtd_range_atomic_radius | 1,455,983.8 | 0.031 |
| wtd_mean_Density | 890,073.5 | 0.019 |
| wtd_std_fie | 832,274.6 | 0.017 |
| wtd_mean_atomic_radius | 832,100.6 | 0.017 |
| mean_fie | 792,180.4 | 0.017 |
| range_Density | 744,477.6 | 0.016 |
| range_ElectronAffinity | 720,590.1 | 0.015 |
| gmean_ThermalConductivity | 670,280.3 | 0.014 |
| mean_atomic_mass | 664,245.5 | 0.014 |
| gmean_atomic_mass | 412,535.3 | 0.009 |
| wtd_gmean_Density | 344,775.4 | 0.007 |
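
The Scaled Importance column in the table above appears to be each relative importance divided by the largest one. The short check below uses the first four rows copied from the table and reproduces the scaled values to three decimals. The function name `scale_importances` is a hypothetical helper, and the normalization rule is inferred from the numbers rather than stated in the table.

```python
def scale_importances(relative):
    """Divide every relative importance by the largest value in the mapping."""
    top = max(relative.values())
    return {name: value / top for name, value in relative.items()}


# First four rows of the importance table (relative importance values).
relative_importance = {
    "range_ThermalConductivity": 47_722_904.0,
    "wtd_gmean_ThermalConductivity": 10_336_861.0,
    "range_atomic_radius": 3_051_781.3,
    "range_atomic_mass": 2_503_977.0,
}

scaled = scale_importances(relative_importance)
for name, value in scaled.items():
    # Prints 1.000, 0.217, 0.064 and 0.052, matching the table's scaled column.
    print(f"{name}: {value:.3f}")
```

Under this reading, the second-ranked attribute carries roughly a fifth of the weight of the top one, and every attribute from the third row down scores below 0.07.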
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Sizochenko, N.; Hofmann, M. Predictive Modeling of Critical Temperatures in Superconducting Materials. Molecules 2021, 26, 8. https://doi.org/10.3390/molecules26010008

