Article

Machine Learning Phase Classification of Thermoelectric Materials

1 Department of Physics, University of Virginia, Charlottesville, VA 22904, USA
2 Department of Material Science and Engineering, University of Virginia, Charlottesville, VA 22904, USA
* Author to whom correspondence should be addressed.
Materials 2025, 18(20), 4726; https://doi.org/10.3390/ma18204726
Submission received: 2 September 2025 / Revised: 11 October 2025 / Accepted: 13 October 2025 / Published: 15 October 2025

Abstract

In this study, we employ a Support Vector Machine (SVM) model to efficiently classify the phases of thermoelectric (TE) alloys. While ab initio calculations and experiments have explored the phases of functional TE materials, the large variety of alloys makes these explorations time-consuming and expensive. Therefore, there is a critical need for time-efficient methods to accelerate the discovery and development of new TE materials. Recently, machine learning (ML) classification models have been applied to predict material phases, including those of multi-principal element alloys. When applied to the phase classification of TE alloys, the SVM model achieves prediction accuracies ranging from 77% to 92%. Additionally, cross-validation across various TE phases is performed to demonstrate the model’s robustness in phase differentiation. This work provides a time-efficient computational approach to distinguishing TE material phases, offering insights that can aid in the evaluation and design of high-performance thermoelectric materials.

1. Introduction

Conventional energy generation loses a considerable amount of energy as waste heat [1]. Converting this waste heat into electricity would significantly improve the efficiency of energy production. Thermoelectric (TE) technology aims to achieve exactly this conversion [2,3,4,5]. In addition, thermoelectric materials have been investigated for improving battery thermal management, which can optimize the efficiency and lifespan of batteries [6,7]. A wide range of materials has been investigated for thermoelectric applications, including Half-Heusler (HH) compounds [8,9,10,11], Bi2Te3-based alloys [12,13,14], Ge, Pb, and transition metal (TM) chalcogenides [15,16,17,18,19,20,21,22,23], oxides [24,25,26], and Mg2(Si or Sb)-based alloys [27,28,29]. The phases that form in these materials can greatly affect their thermoelectric properties. In this context, phase formation refers to the development of specific crystal structures. For instance, in HH compounds, it corresponds to the formation of the XYZ face-centered cubic (FCC) structure [9]. Many ab initio calculations and experiments have investigated phase formation in thermoelectric alloys. However, with so many types of thermoelectric materials to choose from, identifying and differentiating their phases can be time-consuming and costly.
To this end, recent studies have employed machine learning (ML) models as an efficient way to identify phase formation in compositionally complex alloys (CCAs) and high-entropy alloys (HEAs). These phases include B2, face-centered cubic (FCC), body-centered cubic (BCC), hexagonal, and amorphous phases. In these studies, materials databases have been constructed from both experimental data and first-principles calculations [30,31,32,33,34,35,36,37,38]. Both elemental parameters and alloy parameters have been used as descriptors in these models. Feature selection techniques [31,32,33,39], including correlation coefficients and wrapper methods, have been applied to select relevant raw features. Furthermore, feature engineering methods, such as mathematical variations and one-hot encoding, have been employed to enhance model performance. Different kinds of classification models [30,31,32,33,34,35,36,37,38], including Support Vector Machine (SVM), random forest (RF), neural network (NN), and gradient boosting machine (GBM) models, have been developed. These models have been used to categorize various types of alloys, including multi-principal element alloys and high-entropy alloys. Some of these models have shown excellent predictive ability, achieving accuracies above 90%. Regression models have also predicted numerous thermoelectric properties, including the figure of merit ZT, the Seebeck coefficient S, and the thermal conductivity κ [40,41,42,43,44,45,46]. Some of these models have achieved a coefficient of determination, $R^2$, above 0.90. More broadly, applying machine learning to complex and diverse material systems parallels approaches in other fields, for example, the use of physical parameters to optimize traffic flow dynamics [47].
While several studies have employed classification models to investigate phase formation in complex alloys, and regression models to predict thermoelectric properties, there is a noticeable lack of research specifically focused on the phase classification of thermoelectric materials. As previously discussed, the phase of a thermoelectric material plays a critical role in determining its thermoelectric performance. Therefore, developing a time-efficient and cost-effective ML classification approach to distinguish between different phases of thermoelectric materials can provide essential insights. Such an approach would complement existing regression models by offering valuable guidance for the design and discovery of high-performance thermoelectric materials.
To address this gap, we focus on distinguishing the phases of various thermoelectric materials in this study. We construct databases to identify the phase formation of different groups of thermoelectric materials. These groups include Half-Heusler (HH) compounds, which form an FCC structure; Mg2(Si or Sb)-based alloys, which form a hexagonal structure; Bi2Te3-based alloys, which form a rhombohedral structure; transition metal (TM) chalcogenides, which generally exhibit a hexagonal structure; and (Pb, Sn, Ge) chalcogenides, which may adopt hexagonal, rhombohedral, or cubic structures. We classify oxide-based thermoelectric materials into four structural categories: hexagonal, perovskite, orthorhombic, and rhombohedral. Using a Support Vector Machine (SVM) with previously developed alloy parameters [33], we classify the various phases of thermoelectric materials. To further enhance model accuracy, we create a new set of raw features by incorporating additional elemental parameters alongside the alloy parameters. Moreover, we evaluate the model’s ability to distinguish between different phases through cross-validation across multiple thermoelectric material databases. The results demonstrate the model’s effectiveness in classifying thermoelectric phases that exhibit specific thermoelectric properties. Thus, the model developed herein provides an important foundation for understanding the structure–property relationships essential to the development of future thermoelectric materials.

2. Methods

In this work, we adopt the ML phase classification models of Qi et al. [33], with appropriate modifications, to classify thermoelectric crystal phases. The overall process is illustrated in the flowchart in Figure 1. TE materials are grouped into databases based on their material classes and phases: Half-Heusler (HH) compounds, Mg2(Si or Sb)-based alloys, Bi2Te3-based alloys, TM chalcogenides, (Pb, Sn, Ge) chalcogenides, and various oxides. Their respective crystal structures are as follows: HHs are FCC; Mg2(Si or Sb)-based alloys are hexagonal; Bi2Te3-based alloys are rhombohedral and comprise (Bi,Sb)2Te3, Bi2(Te,Se)3, and doped derivatives of Bi2Te3 [12]; TM chalcogenides are hexagonal; (Pb, Sn, Ge) chalcogenides are either rhombohedral or cubic; and the oxides adopt hexagonal, perovskite, orthorhombic, or rhombohedral structures. Each model is trained independently, as a binary classifier, on a single phase-type database, such as the HH database. In this framework, an alloy known to form the HH phase is labeled as not forming any of the other phases. As a result, each model can only predict whether a given composition will form the specific phase it was trained on. For example, a model trained on the HH database predicts whether a composition forms the HH phase, regardless of whether the same composition is predicted to form (or not form) other phases by models trained on different databases. Likewise, the hexagonal oxide model can only determine whether a given composition forms a hexagonal phase, not which hexagonal phase it forms. Random forest (RF) and Support Vector Machine (SVM) classification models are trained to categorize the phase formation of TE alloys. Only the SVM results are shown herein, because the SVM models generally achieve higher accuracy than the RF models. Model performance is evaluated using the overall accuracy, i.e., the fraction of compositions whose phase label is predicted correctly.
For detailed comparisons between different ML classification models, refer to the previous work by Qi et al. [33]. In each training run, 80% of the data are used for training and the remaining 20% for validation. This process is repeated ten times.
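The one-vs-rest labeling and the repeated 80/20 evaluation described above can be sketched as follows. This is a minimal, standard-library illustration under stated assumptions, not the authors' code: `one_vs_rest_labels`, `repeated_holdout`, and the `fit_predict(X_train, y_train, X_val)` interface are hypothetical names, and a trivial majority-class predictor stands in for the SVM so that the example runs without extra dependencies. In practice `fit_predict` would plausibly wrap scikit-learn's `sklearn.svm.SVC`, whose kernel and hyperparameters are not specified here.

```python
import random
import statistics


def one_vs_rest_labels(groups, target):
    """Label an alloy 1 if it belongs to the target phase database, else 0."""
    return [1 if g == target else 0 for g in groups]


def overall_accuracy(y_true, y_pred):
    """Fraction of compositions whose binary phase label is predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def repeated_holdout(X, y, fit_predict, n_repeats=10, train_frac=0.8, seed=0):
    """Return (mean, min, max) validation accuracy over repeated random splits.

    `fit_predict(X_train, y_train, X_val)` is a hypothetical interface: it
    trains a classifier (an SVM in the paper) and returns labels for X_val.
    """
    accs = []
    for r in range(n_repeats):
        idx = list(range(len(X)))
        random.Random(seed + r).shuffle(idx)  # a different random seed per repeat
        cut = round(train_frac * len(idx))    # 80% train, 20% validation
        train, val = idx[:cut], idx[cut:]
        preds = fit_predict([X[i] for i in train], [y[i] for i in train],
                            [X[i] for i in val])
        accs.append(overall_accuracy([y[i] for i in val], preds))
    return statistics.mean(accs), min(accs), max(accs)


def majority_class(X_train, y_train, X_val):
    """Placeholder classifier: predict the most common training label."""
    label = max((0, 1), key=y_train.count)
    return [label] * len(X_val)


# Hypothetical toy database: 15 "HH" alloys and 35 alloys from other groups,
# each with a single dummy feature.
groups = ["HH"] * 15 + ["Mg2X"] * 20 + ["oxide"] * 15
X = [[float(i)] for i in range(len(groups))]
y = one_vs_rest_labels(groups, "HH")
mean_acc, low, high = repeated_holdout(X, y, majority_class)
print(f"accuracy {mean_acc:.2f} (range {low:.2f} to {high:.2f})")
```

Keeping training and evaluation behind one callable means the split, seeding, and averaging logic stays the same whichever classifier (SVM or RF) is plugged in.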
For feature selection, we start with raw features obtained from various thermodynamic and Hume-Rothery parameters [33]. These features are the mixing entropy $\Delta S_{mix}$, the mixing enthalpy $\Delta H_{mix}$ (obtained from Miedema’s model) [48], $\Omega$, $\Phi$, $\eta$, $k_1^{cr}$, the radius mismatch $\delta$, $E_2/E_0$, the electronegativity mismatch $\Delta\chi$, and the mean valence electron concentration VEC. The definitions of these raw features are listed in Table 1. We also use elemental parameters obtained from Matminer [49] as raw features: the covalent radius, the first ionization energy, and the Mendeleev number. For a given alloy, each elemental parameter is summarized by five statistics: the minimum, maximum, weighted average, standard deviation of the mean, and range. In total, 10 alloy parameters and 15 elemental parameters (3 parameters × 5 statistics) serve as the raw features in this work.
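Several of these raw features can be computed from a composition in a few lines. The sketch below assumes that compositions are given as atomic fractions summing to one and uses commonly used forms of $\delta$ and $\Delta\chi$ (the paper's own definitions are collected in Table 1). The element names and property values are placeholders, not Matminer data, and reading the fourth statistic as a composition-weighted standard deviation is an assumption.

```python
import math

R = 8.314  # gas constant, J/(mol K)


def mixing_entropy(c):
    """-R * sum_i c_i ln(c_i), with c_i the atomic fractions (summing to 1)."""
    return -R * sum(x * math.log(x) for x in c.values() if x > 0)


def weighted_average(c, prop):
    """Composition-weighted mean of an elemental property (e.g. VEC)."""
    return sum(x * prop[el] for el, x in c.items())


def weighted_std(c, prop):
    """sqrt(sum_i c_i (p_i - p_bar)^2); applied to electronegativity this is a
    common form of the mismatch Delta chi."""
    p_bar = weighted_average(c, prop)
    return math.sqrt(sum(x * (prop[el] - p_bar) ** 2 for el, x in c.items()))


def radius_mismatch(c, radius):
    """delta = sqrt(sum_i c_i (1 - r_i / r_bar)^2), a common size-mismatch form."""
    r_bar = weighted_average(c, radius)
    return math.sqrt(sum(x * (1 - radius[el] / r_bar) ** 2 for el, x in c.items()))


def elemental_features(c, props):
    """Five statistics per elemental property: 3 properties -> 15 features."""
    feats = {}
    for name, prop in props.items():
        vals = [prop[el] for el in c]
        feats[f"{name}_min"] = min(vals)
        feats[f"{name}_max"] = max(vals)
        feats[f"{name}_weighted_average"] = weighted_average(c, prop)
        feats[f"{name}_std"] = weighted_std(c, prop)  # assumed form of the 4th statistic
        feats[f"{name}_range"] = max(vals) - min(vals)
    return feats


# Hypothetical equiatomic ternary "A-B-C" with placeholder property values.
c = {"A": 1 / 3, "B": 1 / 3, "C": 1 / 3}
vec = {"A": 4, "B": 9, "C": 5}
props = {
    "covalent_radius": {"A": 1.5, "B": 1.3, "C": 1.4},
    "first_ionization_energy": {"A": 7.0, "B": 8.0, "C": 9.0},
    "mendeleev_number": {"A": 10, "B": 60, "C": 80},
}
print(round(mixing_entropy(c), 3))         # R ln 3, about 9.134 J/(mol K)
print(round(weighted_average(c, vec), 3))  # VEC = 6.0
print(len(elemental_features(c, props)))   # 15
```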
Feature engineering is employed to improve the performance of the machine learning model [33]. First, a new set of features is constructed from each raw feature $x$ using the mathematical variations $x^2$, $x^{-1}$, $\sqrt{x}$, $\ln(x)$, and $e^x$. Then, the feature set is further expanded by combining two mathematical variations, A and B, using the arithmetic operations A+B, A-B, A/B, and AB. To filter the expanded feature set, the Pearson Correlation Coefficient (PCC) is employed: for any feature pair with |PCC| > 0.9, only one feature is kept. In this way, every strongly correlated feature pair, whether positively or negatively correlated, is reduced to a single feature. Next, logistic regression with L1 (Lasso) regularization is used to select important features and eliminate uninformative ones. This selection minimizes a total penalty that trades off reducing the prediction error against limiting the number of selected features. Finally, a sequential learning algorithm selects the best features by minimizing the average prediction error over thirty rounds of five-fold cross-validation; in each round, the top feature is selected. After these steps, only the top features are used for phase classification in this work. For alloy parameters, the top five features are $\Delta H_{mix}^2$, $VEC^2$, a combined feature of $\delta$ and $\Phi$, $\Delta\chi$, and a combined feature of $\Delta S_{mix}$ and $\eta$. With the addition of elemental parameters, the top five features are $\Delta H_{mix}^2$, a combined feature of the covalent radius (weighted average) and $\Delta S_{mix}$, $\Delta\chi^2 \times$ Mendeleev number (weighted average), $VEC^2$, and the combined feature of $\delta$ and $\Phi$.
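The expansion and correlation-filtering steps can be illustrated with a standard-library sketch. It assumes strictly positive, moderately scaled raw features (so that $x^{-1}$, $\sqrt{x}$, $\ln(x)$, and $e^x$ are all finite) and uses a greedy keep-first rule for |PCC| > 0.9, which is one reasonable way to decide which member of a correlated pair survives; the tie rule is an assumption. The feature names are hypothetical. The L1-regularized logistic regression and the sequential selection are not reproduced here; scikit-learn's `LogisticRegression(penalty="l1", solver="liblinear")` and `SequentialFeatureSelector` are plausible building blocks for those steps.

```python
import itertools
import math
import statistics

# Unary "mathematical variations" (assumes strictly positive, moderately scaled
# inputs so that 1/x, sqrt(x), ln(x) and exp(x) are all finite).
UNARY = {
    "sq": lambda v: v * v,
    "inv": lambda v: 1.0 / v,
    "sqrt": math.sqrt,
    "ln": math.log,
    "exp": math.exp,
}


def expand(raw):
    """raw: {name: column}. Returns the unary variations plus A+B, A-B, A*B
    and A/B for each unordered pair of variations."""
    unary = {f"{op}({name})": [fn(v) for v in col]
             for name, col in raw.items() for op, fn in UNARY.items()}
    out = dict(unary)
    for (na, a), (nb, b) in itertools.combinations(unary.items(), 2):
        out[f"{na}+{nb}"] = [x + y for x, y in zip(a, b)]
        out[f"{na}-{nb}"] = [x - y for x, y in zip(a, b)]
        out[f"{na}*{nb}"] = [x * y for x, y in zip(a, b)]
        if all(y != 0 for y in b):  # skip ratios that would divide by zero
            out[f"{na}/{nb}"] = [x / y for x, y in zip(a, b)]
    return out


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length columns."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def pcc_filter(features, threshold=0.9):
    """Greedily keep a feature only if |PCC| <= threshold against every kept one."""
    kept = {}
    for name, col in features.items():
        if len(set(col)) == 1:
            continue  # constant column: PCC is undefined
        if all(abs(pearson(col, k)) <= threshold for k in kept.values()):
            kept[name] = col
    return kept


# Hypothetical raw features for five alloys.
raw = {"feat_a": [1.2, 2.0, 3.1, 4.5, 5.0], "feat_b": [0.5, 0.4, 0.9, 0.7, 1.1]}
expanded = expand(raw)
filtered = pcc_filter(expanded)
print(len(expanded), len(filtered))
```

Because the filter keeps the first feature of each correlated pair, the order of the expanded features decides which member survives; the later L1 and sequential steps then prune what remains.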

3. Results and Discussion

First, we examine the accuracy of phase classification using alloy parameters as raw features. As shown in Table 2, the SVM accuracy ranges from 77% to 91% across material groups. In comparison, RF yields accuracies ranging from 72% to 87%, which justifies the selection of SVM for this study. The listed accuracy and accuracy range are, respectively, the average and the spread of the prediction accuracy over ten repeated calculations with different random seeds; the seeds affect feature selection, feature engineering, and the ML classification algorithm. The highest accuracy, 91%, is obtained for predicting the HH phase. Accuracy then drops noticeably, to 84% to 85%, for Mg2(Si or Sb)-based alloys, Bi2Te3-based alloys, and TM chalcogenides. A further drop to 81% is seen for (Pb, Sn, Ge) chalcogenides. The oxides have the lowest accuracy: 78% for the perovskite and orthorhombic phases and 77% for the hexagonal and rhombohedral phases. We also examine specific examples where the model performs well and where it is less successful. For HH compounds, for example, the model correctly predicts formation of the HH phase for TiCoSb and various doped TiCoSb alloys, such as TiFe0.1Co0.9Sb and TiFe0.5Co0.5Sb. However, it fails to predict HH phase formation for ZrNiPb0.98Bi0.02. These results show that the model predicts the intermetallic HH phase with the highest accuracy. However, when an alloy group contains more small-group (non-metallic) elements, such as Si and Te, the accuracy decreases to around 85%. Further decreases are seen for the oxides, which contain the non-metal O. A possible explanation for this decrease is that the alloy parameters are incomplete for some of these semi-metal or non-metal alloys.
While some parameters, such as the mixing entropy, are well defined for any given alloy, others, including the mixing enthalpy $\Delta H_{mix}$, are estimates that can be inaccurate. Furthermore, parameters such as the melting temperature $T_m$ can vary greatly between different i-j element pairs. Another plausible reason is that HH compounds form a well-defined FCC structure, whereas other material groups, such as the chalcogenides, can adopt multiple crystal structures, including hexagonal, rhombohedral, or cubic, which makes their phase identification more challenging. Despite these limitations, the model still predicts phase formation with a reasonable accuracy of 77% or above.
Next, we re-examine the model after adding several elemental parameters: the covalent radius, first ionization energy, and Mendeleev number. Combining elemental parameters with alloy parameters increases the prediction accuracy by 1% to 4% for all material groups, as shown in Table 3. For the HH group, the accuracy increases from 91% to 92%; for Mg2(Si or Sb)-based alloys, from 84% to 86%; for Bi2Te3-based alloys, from 85% to 86%; for TM chalcogenide alloys, from 84% to 86%; and for (Pb, Ge, or Sn) chalcogenide alloys, from 80% to 82%. For the oxides, the accuracy for the hexagonal phase increases from 77% to 80%, for the perovskite phase from 78% to 81%, for the orthorhombic phase from 78% to 80%, and for the rhombohedral phase from 77% to 81%. These increases can be attributed to the inclusion of more well-defined elemental parameters and to the incorporation of physical concepts missing from the alloy parameters. Some of these concepts, such as the covalent radius and first ionization energy, can play a key role in phase formation, so including them improves the model. For HH, the original model with well-defined alloy parameters already predicts well, so the improvement is marginal, from 91% to 92%. More notable improvements of 2% to 4% are found in the oxide group, whose alloy parameters originally relied more heavily on estimates. Thus, incorporating elemental parameters has a greater influence on the oxide group. Overall, by combining alloy and elemental parameters with feature engineering, the model achieves prediction accuracies of 80% or higher across the nine material groups.
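The gains quoted in this paragraph can be recomputed directly from the listed accuracies. The short script below uses only those numbers (no other data are assumed) and checks that the improvements span 1 to 4 percentage points overall and 2 to 4 points for the oxide phases, and that every group reaches at least 80%.

```python
# Accuracies (%) from the text: (alloy parameters only, alloy + elemental parameters)
reported = {
    "HH": (91, 92),
    "Mg2(Si or Sb)-based": (84, 86),
    "Bi2Te3-based": (85, 86),
    "TM chalcogenides": (84, 86),
    "(Pb, Ge, or Sn) chalcogenides": (80, 82),
    "oxide, hexagonal": (77, 80),
    "oxide, perovskite": (78, 81),
    "oxide, orthorhombic": (78, 80),
    "oxide, rhombohedral": (77, 81),
}

gains = {group: after - before for group, (before, after) in reported.items()}
oxide_gains = [g for name, g in gains.items() if name.startswith("oxide")]

for group, gain in gains.items():
    print(f"{group:30s} +{gain}")
print("all groups:", min(gains.values()), "to", max(gains.values()))  # 1 to 4
print("oxides:", min(oxide_gains), "to", max(oxide_gains))            # 2 to 4
print("lowest accuracy with elemental parameters:",
      min(after for _, after in reported.values()))                   # 80
```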
To examine the model’s ability to distinguish between material groups, we perform a cross-validation in which each target material group is tested using models trained on different material datasets. Table 4 shows the results. The diagonal terms are the accuracies for each targeted material group obtained with the model trained on its own dataset; they are identical to those in Table 2 because they come from the same models. The off-diagonal terms are the false positive rates obtained when a model trained on one dataset is applied to a different material group, i.e., the fraction of alloys from another material group that a given model falsely assigns to the phase it was trained on. For example, in the HH column and the Mg2(Si, Sb)-based row, the false positive rate is 0.05: when the model trained on the HH dataset is applied to alloys from the Mg2(Si, Sb)-based dataset, it falsely predicts that 5% of them form the HH phase. For the majority of these cross-validations, the false positive rate ranges from 0.01 to 0.10. The groups the models have trouble distinguishing are the TM chalcogenides and the (Pb, Ge, or Sn) chalcogenides: the false positive rate reaches 0.21 when the model trained on (Pb, Ge, or Sn) chalcogenides is applied to TM chalcogenides, and 0.28 when the model trained on TM chalcogenides is applied to (Pb, Ge, or Sn) chalcogenides. This is likely because the two groups overlap in materials phase space, as both contain chalcogen elements (S, Se, or Te). In addition, the two chalcogenide groups share similar crystal structures; as mentioned earlier, both can adopt hexagonal, rhombohedral, or cubic structures, which may further contribute to the model’s confusion between them.
For oxides, the models attempt to assign a given oxide to one of the four oxide phase types included in this study: hexagonal, perovskite, orthorhombic, or rhombohedral. Ideally, the models would accurately differentiate among all four categories. However, as shown in Table 2 and Table 3, the overall prediction accuracy for oxides ranges from 77% to 81%. As a result, there are false positives in which the models predict the formation of multiple oxide phases for a single material. Addressing this limitation will require future experimental validation and more precise ab initio calculations to obtain detailed alloy-specific parameters, which could enhance the models’ predictive accuracy. Overall, this cross-validation shows that the models are robust in identifying and distinguishing different TE phases.
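How the cross-validation table and the oxide multi-phase check could be computed from any set of trained binary models is sketched below. The function names, the toy groups "A" and "B", the oxide identifiers, and the rule-based stand-in models are all hypothetical; the sketch only reproduces the bookkeeping, with the off-diagonal entry playing the role of the 0.05 HH/Mg2(Si, Sb) example.

```python
def cross_validation_table(models, datasets, diagonal_accuracy):
    """table[target][trained]: diagonal = the model's own validation accuracy;
    off-diagonal = fraction of `target` alloys that the model trained on
    `trained` falsely assigns to its own phase (false positive rate)."""
    table = {}
    for target, alloys in datasets.items():
        row = {}
        for trained, predict in models.items():
            if trained == target:
                row[trained] = diagonal_accuracy[trained]
            else:
                row[trained] = sum(predict(a) for a in alloys) / len(alloys)
        table[target] = row
    return table


def multi_phase_conflicts(predictions):
    """predictions: {phase: {composition: 0 or 1}}. Return the compositions
    that more than one phase model predicts as positive."""
    votes = {}
    for phase, preds in predictions.items():
        for comp, label in preds.items():
            if label:
                votes.setdefault(comp, []).append(phase)
    return {comp: phases for comp, phases in votes.items() if len(phases) > 1}


# Hypothetical groups "A" and "B"; the stand-in model for A wrongly flags b0.
datasets = {"A": [f"a{i}" for i in range(10)], "B": [f"b{i}" for i in range(20)]}
models = {
    "A": lambda s: int(s.startswith("a") or s == "b0"),
    "B": lambda s: int(s.startswith("b")),
}
table = cross_validation_table(models, datasets, {"A": 0.9, "B": 0.8})
print(table["B"]["A"])  # 1 of 20 B alloys flagged as A -> 0.05

# Hypothetical binary outputs of three oxide phase models.
oxide_preds = {
    "hexagonal": {"ox1": 1, "ox2": 0, "ox3": 0},
    "perovskite": {"ox1": 1, "ox2": 1, "ox3": 0},
    "orthorhombic": {"ox1": 0, "ox2": 0, "ox3": 1},
}
print(multi_phase_conflicts(oxide_preds))  # {'ox1': ['hexagonal', 'perovskite']}
```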

4. Conclusions

We have employed a Support Vector Machine (SVM) to predict the phases of thermoelectric (TE) alloys, with the goal of identifying and distinguishing different TE phases so that specific phases can be predicted correctly. Our initial model, using only alloy parameters, achieved accuracies ranging from 77% to 91%. With the incorporation of additional elemental parameters, the accuracies improved to between 80% and 92%. To further evaluate the model’s robustness, we performed cross-validation across various TE material groups. Notably, the model achieved a low false positive rate of 0.01 when predicting whether chalcogenide or oxide alloys would incorrectly form the HH phase, and vice versa. However, the model struggled to distinguish between transition metal (TM) chalcogenides and (Pb, Ge, or Sn)-based chalcogenides, with false positive rates of 0.21 and 0.28 depending on which group’s model was applied to the other. With future experimental validation and more accurate ab initio calculations, the precision of the alloy parameters can be significantly improved, which is expected to enhance the model’s performance. Overall, this study provides an important step toward the reliable identification of phase formation in TE alloys, which serves as a critical foundation for the design and discovery of high-performance TE materials for future energy applications.

Author Contributions

Data curation and analysis, writing—original draft preparation, C.T.M.; writing—review and editing, supervision, S.J.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original data presented in the study are openly available in FigShare at https://doi.org/10.6084/m9.figshare.30041518.v1 (accessed on 1 September 2025).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. He, J.; Tritt, T.M. Advances in thermoelectric materials research: Looking back and moving forward. Science 2017, 357, eaak9997.
  2. Wei, J.; Yang, L.; Ma, Z.; Song, P.; Zhang, M.; Ma, J.; Yang, F.; Wang, X. Review of current high-ZT thermoelectric materials. J. Mater. Sci. 2020, 55, 12642–12704.
  3. Shi, X.L.; Zou, J.; Chen, Z.G. Advanced Thermoelectric Design: From Materials and Structures to Devices. Chem. Rev. 2020, 120, 7399–7515.
  4. Hasan, M.N.; Wahid, H.; Nayan, N.; Mohamed Ali, M.S. Inorganic thermoelectric materials: A review. Int. J. Energy Res. 2020, 44, 6170–6222.
  5. Mukherjee, M.; Srivastava, A.; Singh, A.K. Recent advances in designing thermoelectric materials. J. Mater. Chem. C 2022, 10, 12524–12555.
  6. Qi, W.; Lan, P.; Yang, J.; Chen, Y.; Zhang, Y.; Wang, G.; Peng, F.; Hong, J. Multi-U-Style micro-channel in liquid cooling plate for thermal management of power batteries. Appl. Therm. Eng. 2024, 256, 123984.
  7. Qi, W.; Yang, J.; Zhang, Z.; Wu, J.; Lan, P.; Xiang, S. Investigation on thermal management of cylindrical lithium-ion batteries based on interwound cooling belt structure. Energy Convers. Manag. 2025, 340, 119962.
  8. Zhu, T.; Fu, C.; Xie, H.; Liu, Y.; Zhao, X. High Efficiency Half-Heusler Thermoelectric Materials for Energy Harvesting. Adv. Energy Mater. 2015, 5, 1500588.
  9. Poon, S.J. Half Heusler compounds: Promising materials for mid-to-high temperature thermoelectric conversion. J. Phys. D Appl. Phys. 2019, 52, 493001.
  10. Rogl, G.; Rogl, P.F. Development of Thermoelectric Half-Heusler Alloys over the Past 25 Years. Crystals 2023, 13, 1152.
  11. Mitra, M.; Benton, A.; Akhanda, M.S.; Qi, J.; Zebarjadi, M.; Singh, D.J.; Poon, S.J. Conventional Half-Heusler alloys advance state-of-the-art thermoelectric properties. Mater. Today Phys. 2022, 28, 100900.
  12. Cao, T.; Shi, X.L.; Li, M.; Hu, B.; Chen, W.; Liu, W.D.; Lyu, W.; MacLeod, J.; Chen, Z.G. Advances in bismuth-telluride-based thermoelectric devices: Progress and challenges. eScience 2023, 3, 100122.
  13. Kim, S.I.; Lee, K.H.; Mun, H.A.; Kim, H.S.; Hwang, S.W.; Roh, J.W.; Yang, D.J.; Shin, W.H.; Li, X.S.; Lee, Y.H.; et al. Dense dislocation arrays embedded in grain boundaries for high-performance bulk thermoelectrics. Science 2015, 348, 109–114.
  14. Guo, F.; Sun, Y.; Qin, H.; Zhu, Y.; Ge, Z.; Liu, Z.; Cai, W.; Sui, J. BiSbTe alloy with high thermoelectric and mechanical performance for power generation. Scr. Mater. 2022, 218, 114801.
  15. Hong, M.; Li, M.; Wang, Y.; Shi, X.L.; Chen, Z.G. Advances in Versatile GeTe Thermoelectrics from Materials to Devices. Adv. Mater. 2023, 35, 2208272.
  16. Li, M.; Shi, X.L.; Chen, Z.G. Trends in GeTe Thermoelectrics: From Fundamentals to Applications. Adv. Funct. Mater. 2024, 34, 2403498.
  17. Vankayala, R.K.; Lan, T.W.; Parajuli, P.; Liu, F.; Rao, R.; Yu, S.H.; Hung, T.L.; Lee, C.H.; Yano, S.i.; Hsing, C.R.; et al. High zT and Its Origin in Sb-doped GeTe Single Crystals. Adv. Sci. 2020, 7, 2002494.
  18. Gelbstein, Y.; Dashevsky, Z.; Dariel, M. High performance n-type PbTe-based materials for thermoelectric applications. Phys. B Condens. Matter 2005, 363, 196–205.
  19. Xiao, Y.; Zhao, L.D. Charge and phonon transport in PbTe-based thermoelectric materials. NPJ Quantum Mater. 2018, 3, 55.
  20. Wang, L.; Wen, Y.; Bai, S.; Chang, C.; Li, Y.; Liu, S.; Liu, D.; Wang, S.; Zhao, Z.; Zhan, S.; et al. Realizing thermoelectric cooling and power generation in N-type PbS0.6Se0.4 via lattice plainification and interstitial doping. Nat. Commun. 2024, 15, 3782.
  21. Shi, Y.; Sturm, C.; Kleinke, H. Chalcogenides as thermoelectric materials. J. Solid State Chem. 2019, 270, 273–279.
  22. Yu, Y.; Cagnoni, M.; Cojocaru-Mirédin, O.; Wuttig, M. Chalcogenide Thermoelectrics Empowered by an Unconventional Bonding Mechanism. Adv. Funct. Mater. 2020, 30, 1904862.
  23. Wei, T.R.; Qiu, P.; Zhao, K.; Shi, X.; Chen, L. Ag2Q-Based (Q = S, Se, Te) Silver Chalcogenide Thermoelectric Materials. Adv. Mater. 2023, 35, 2110236.
  24. He, J.; Liu, Y.; Funahashi, R. Oxide thermoelectrics: The challenges, progress, and outlook. J. Mater. Res. 2011, 26, 1762–1772.
  25. Yin, Y.; Tudu, B.; Tiwari, A. Recent advances in oxide thermoelectric materials and modules. Vacuum 2017, 146, 356–374.
  26. Banerjee, R.; Chatterjee, S.; Ranjan, M.; Bhattacharya, T.; Mukherjee, S.; Jana, S.S.; Dwivedi, A.; Maiti, T. High-Entropy Perovskites: An Emergent Class of Oxide Thermoelectrics with Ultralow Thermal Conductivity. ACS Sustain. Chem. Eng. 2020, 8, 17022–17032.
  27. Zhang, F.; Chen, C.; Yao, H.; Bai, F.; Yin, L.; Li, X.; Li, S.; Xue, W.; Wang, Y.; Cao, F.; et al. High-Performance N-type Mg3Sb2 towards Thermoelectric Application near Room Temperature. Adv. Funct. Mater. 2020, 30, 1906143.
  28. Zhang, J.; Song, L.; Mamakhel, A.; Jørgensen, M.R.V.; Iversen, B.B. High-Performance Low-Cost n-Type Se-Doped Mg3Sb2-Based Zintl Compounds for Thermoelectric Application. Chem. Mater. 2017, 29, 5371–5383.
  29. Wang, L.; Zhang, W.; Back, S.Y.; Kawamoto, N.; Nguyen, D.H.; Mori, T. High-performance Mg3Sb2-based thermoelectrics with reduced structural disorder and microstructure evolution. Nat. Commun. 2024, 15, 6800.
  30. Islam, N.; Huang, W.; Zhuang, H.L. Machine learning for phase selection in multi-principal element alloys. Comput. Mater. Sci. 2018, 150, 230–235.
  31. Hart, G.L.; Mueller, T.; Toher, C.; Curtarolo, S. Machine learning for alloys. Nat. Rev. Mater. 2021, 6, 730–755.
  32. Lee, K.; Ayyasamy, M.V.; Ji, Y.; Balachandran, P.V. A comparison of explainable artificial intelligence methods in the phase classification of multi-principal element alloys. Sci. Rep. 2022, 12, 11591.
  33. Qi, J.; Hoyos, D.I.; Poon, S.J. Machine learning-based classification, interpretation, and prediction of high-entropy-alloy intermetallic phases. High Entropy Alloy. Mater. 2023, 1, 312–326.
  34. Bilińska, K.; Winiarski, M.J. Machine learning-based predictions for half-Heusler phases. Inorganics 2023, 12, 5.
  35. Chen, Z.; Shang, Y.; Liu, X.; Yang, Y. Accelerated discovery of eutectic compositionally complex alloys by generative machine learning. NPJ Comput. Mater. 2024, 10, 204.
  36. Oñate, A.; Seidou, H.; Tchoufang-Tchuindjang, J.; Tuninetti, V.; Miranda, A.; Sanhueza, J.P.; Mertens, A. New analytical parameters for B2 phase prediction as a complement to multiclass phase prediction using machine learning in multicomponent alloys: A computational approach with experimental validation. J. Alloys Compd. 2025, 1022, 179950.
  37. Beniwal, D.; Ray, P.K. FCC vs. BCC phase selection in high-entropy alloys via simplified and interpretable reduction of machine learning models. Materialia 2022, 26, 101632.
  38. Jin, T.; Park, I.; Park, T.; Park, J.; Shim, J.H. Accelerated crystal structure prediction of multi-elements random alloy using expandable features. Sci. Rep. 2021, 11, 5194.
  39. Bansal, A.; Kumar, P.; Yadav, S.; Hariharan, V.; MR, R.; Phanikumar, G. Accelerated design of high entropy alloys by integrating high throughput calculation and machine learning. J. Alloys Compd. 2023, 960, 170543.
  40. Wang, X.; Sheng, Y.; Ning, J.; Xi, J.; Xi, L.; Qiu, D.; Yang, J.; Ke, X. A Critical Review of Machine Learning Techniques on Thermoelectric Materials. J. Phys. Chem. Lett. 2023, 14, 1808–1822.
  41. Xu, Y.; Liu, X.; Wang, J. Prediction of thermoelectric-figure-of-merit based on autoencoder and light gradient boosting machine. J. Appl. Phys. 2024, 135, 074901.
  42. Vaitesswar, U.S.; Bash, D.; Huang, T.; Recatala-Gomez, J.; Deng, T.; Yang, S.W.; Wang, X.; Hippalgaonkar, K. Machine learning based feature engineering for thermoelectric materials by design. Digit. Discov. 2024, 3, 210–220.
  43. Juneja, R.; Yumnam, G.; Satsangi, S.; Singh, A.K. Coupling the high-throughput property map to machine learning for predicting lattice thermal conductivity. Chem. Mater. 2019, 31, 5145–5151.
  44. Iwasaki, Y.; Takeuchi, I.; Stanev, V.; Kusne, A.G.; Ishida, M.; Kirihara, A.; Ihara, K.; Sawada, R.; Terashima, K.; Someya, H.; et al. Machine-learning guided discovery of a new thermoelectric material. Sci. Rep. 2019, 9, 2751.
  45. Barua, N.K.; Lee, S.; Oliynyk, A.O.; Kleinke, H. Thermoelectric Material Performance (zT) Predictions with Machine Learning. ACS Appl. Mater. Interfaces 2025, 17, 1662–1673.
  46. Wang, Z.L.; Yokoyama, Y.; Onda, T.; Adachi, Y.; Chen, Z.C. Improved Thermoelectric Properties of Hot-Extruded Bi–Te–Se Bulk Materials with Cu Doping and Property Predictions via Machine Learning. Adv. Electron. Mater. 2019, 5, 1900079.
  47. Zhang, Y.; Xu, S.; Song, Y.; Qi, W.; Guo, Q.; Li, X.; Kong, L.; Chen, J. Real-Time Global Optimal Energy Management Strategy for Connected PHEVs Based on Traffic Flow Information. IEEE Trans. Intell. Transp. Syst. 2024, 25, 20032–20042.
  48. Zhang, R.; Zhang, S.; He, Z.; Jing, J.; Sheng, S. Miedema Calculator: A thermodynamic platform for predicting formation enthalpies of alloys within framework of Miedema’s Theory. Comput. Phys. Commun. 2016, 209, 58–69.
  49. Ward, L.; Dunn, A.; Faghaninia, A.; Zimmermann, N.E.; Bajaj, S.; Wang, Q.; Montoya, J.; Chen, J.; Bystrom, K.; Dylla, M.; et al. Matminer: An open source toolkit for materials data mining. Comput. Mater. Sci. 2018, 152, 60–69.
Figure 1. Flowchart of the overall method used in this work. The process begins with databases of thermoelectric materials, from which alloy parameters and elemental parameters are generated as raw features based on the alloys’ compositions. For feature engineering, mathematical variations of these features and subsequent operations on them are applied to create an expanded feature set. The expanded set is then filtered using the Pearson correlation coefficient (PCC) as a criterion: for any pair of features with an absolute PCC greater than 0.9 (|PCC| > 0.9), one feature is removed. After filtering, L1 regularization is applied to identify and select important features. Finally, a sequential learning algorithm selects the best features by minimizing the average prediction error, and these features are used to train the phase classification model.
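A minimal, standard-library sketch of the two selection criteria named in the Figure 1 caption: the |PCC| > 0.9 correlation filter and a greedy sequential feature selection that minimizes an average prediction error. Which member of a correlated pair is dropped, the forward (rather than backward) search direction, and the `error_fn` callback are assumptions for illustration, not the authors' implementation; the L1-regularization step is not reproduced here.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant feature has no defined PCC; treat as uncorrelated
    return sxy / math.sqrt(sxx * syy)


def pcc_filter(features, threshold=0.9):
    """Keep features so that no kept pair has |PCC| > threshold.

    features: dict of feature name -> values over the same alloys.
    Pairs are scanned in insertion order and the second feature of a
    correlated pair is dropped (which one to drop is an assumption).
    """
    names = list(features)
    dropped = set()
    for a, b in combinations(names, 2):
        if a in dropped or b in dropped:
            continue
        if abs(pearson(features[a], features[b])) > threshold:
            dropped.add(b)
    return [n for n in names if n not in dropped]


def forward_select(candidates, error_fn, max_features=None):
    """Greedy sequential forward selection.

    error_fn(subset) returns the average prediction error of a model trained
    on that subset (e.g. a mean cross-validation error). At each step the
    feature whose addition lowers the error most is added; the search stops
    when no addition lowers it further.
    """
    selected, best_err = [], float("inf")
    remaining = list(candidates)
    while remaining and (max_features is None or len(selected) < max_features):
        trial = {f: error_fn(selected + [f]) for f in remaining}
        f_best = min(trial, key=trial.get)
        if trial[f_best] >= best_err:
            break
        selected.append(f_best)
        best_err = trial[f_best]
        remaining.remove(f_best)
    return selected, best_err
```

In practice `error_fn` would wrap training and cross-validating the phase classifier on the chosen feature columns; a forward search needs far fewer model fits than an exhaustive search over subsets.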
Table 1. Definitions of alloy parameters used as raw features in this work.
| Definition | Comments |
|---|---|
| $\Delta S_{mix} = -R\sum_{i=1}^{N} c_i \ln(c_i)$ | $R$ = the gas constant |
| $\Delta H_{mix} = \sum_{i=1,\, i \neq j}^{N} 4\,\Delta H_{ij}^{mix}\, c_i c_j$ | $c_i$ = the atomic fraction of the $i$-th element for an $N$-element alloy; $\Delta H_{ij}^{mix}$ = the binary mixing enthalpy of the $i$-$j$ elemental pair, obtained from Miedema’s model [48] |
| $\Omega = \dfrac{T_m \Delta S_{mix}}{\lvert \Delta H_{mix} \rvert}$ | $T_m$ = alloy melting temperature |
| $\Phi = \dfrac{\Delta G_{SS}}{\lvert \Delta G_{max} \rvert}$ | $\Delta G_{SS}$ = the Gibbs free energy change for forming a fully disordered solid-solution phase; $\Delta G_{max}$ = the largest absolute Gibbs free energy for forming the strongest binary compound |
| $\eta = \dfrac{T_{ann} \Delta S_{mix}}{\lvert \Delta H_f \rvert}$ | $T_{ann}$ = annealing temperature, or, if $T_{ann}$ is unknown, $T_{ann} = 0.8\,T_m$; $\Delta H_f$ = the most negative binary mixing enthalpy for forming intermetallics |
| $k_1^{cr} = 1 - 0.4\,\dfrac{T_m \Delta S_{mix}}{\Delta H_{mix}}$, compared with $\dfrac{\Delta H_{IM}}{\Delta H_{mix}}$ | $T_m = \dfrac{\sum_{i \neq j} T_{ij} c_i c_j}{\sum_{i \neq j} c_i c_j}$, where $T_{ij}$ is the melting temperature of the $i$-$j$ elements; $\Delta H_{IM}$ = mixing enthalpy for forming intermetallics |
| $\delta = \sqrt{\sum_{i=1}^{N} c_i \left(1 - \dfrac{r_i}{\sum_{j=1}^{N} c_j r_j}\right)^2}$ | $r_i$ = the atomic radius of the $i$-th element |
| $\dfrac{E_2}{E_0} = \dfrac{\sum_{j \geq i}^{N} c_i c_j \lvert r_i + r_j - 2\bar{r} \rvert^2}{(2\bar{r})^2}$ | $\bar{r} = \sum_{i=1}^{N} c_i r_i$ = average atomic radius |
| $\Delta \chi = \sqrt{\sum_{i=1}^{N} c_i \left(\chi_i - \sum_{j=1}^{N} c_j \chi_j\right)^2}$ | $\chi_i$ = electronegativity of the $i$-th element |
| $VEC = \sum_{i=1}^{N} c_i VEC_i$ | $VEC_i$ = valence electron count of the $i$-th element |
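Several Table 1 parameters follow directly from the composition. The sketch below computes them in standard-library Python, assuming atomic fractions that sum to 1, user-supplied elemental data (radii, electronegativities, VEC), and user-supplied Miedema pair enthalpies in J/mol so that units cancel against $R$ in J/(mol K). The element labels and numbers in the test values are made up for illustration. Parameters that need Gibbs free energies of competing phases ($\Phi$, $\eta$, $k_1^{cr}$) are omitted.

```python
import math

R = 8.314  # gas constant, J/(mol K)


def weighted_mean(c, prop):
    """Composition-weighted mean: sum of c_i * prop_i."""
    return sum(c[e] * prop[e] for e in c)


def mixing_entropy(c):
    """Delta S_mix = -R * sum(c_i ln c_i) for atomic fractions c."""
    return -R * sum(ci * math.log(ci) for ci in c.values() if ci > 0)


def mixing_enthalpy(c, h_pair):
    """Delta H_mix = sum of 4 * H_ij * c_i * c_j, each unordered i-j pair once.

    h_pair maps (elem_i, elem_j) -> binary mixing enthalpy (e.g. from
    Miedema's model); pairs missing from h_pair contribute 0.
    """
    elems = list(c)
    total = 0.0
    for i, ei in enumerate(elems):
        for ej in elems[i + 1:]:
            h = h_pair.get((ei, ej), h_pair.get((ej, ei), 0.0))
            total += 4.0 * h * c[ei] * c[ej]
    return total


def omega(c, h_pair, t_melt):
    """Omega = T_m * Delta S_mix / |Delta H_mix|."""
    return t_melt * mixing_entropy(c) / abs(mixing_enthalpy(c, h_pair))


def size_mismatch(c, r):
    """delta = sqrt(sum c_i (1 - r_i / rbar)^2), with rbar = sum c_j r_j."""
    rbar = weighted_mean(c, r)
    return math.sqrt(sum(c[e] * (1.0 - r[e] / rbar) ** 2 for e in c))


def e2_over_e0(c, r):
    """E2/E0 = sum_{j>=i} c_i c_j (r_i + r_j - 2 rbar)^2 / (2 rbar)^2."""
    elems = list(c)
    rbar = weighted_mean(c, r)
    total = 0.0
    for i, ei in enumerate(elems):
        for ej in elems[i:]:  # j >= i, so the i == j terms are included
            total += c[ei] * c[ej] * (r[ei] + r[ej] - 2.0 * rbar) ** 2
    return total / (2.0 * rbar) ** 2


def electronegativity_spread(c, chi):
    """Delta chi = sqrt(sum c_i (chi_i - sum c_j chi_j)^2)."""
    chibar = weighted_mean(c, chi)
    return math.sqrt(sum(c[e] * (chi[e] - chibar) ** 2 for e in c))


def vec(c, v):
    """VEC = sum c_i VEC_i."""
    return weighted_mean(c, v)
```

For an equiatomic ternary, `mixing_entropy` returns $R \ln 3$, about 9.13 J/(mol K), which is a quick sanity check on the sign convention of $\Delta S_{mix}$.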
Table 2. Results of phase classification using alloy parameters with feature engineering. Accuracy is listed for each materials-group dataset, with the highest accuracy for Half-Heusler (HH) compounds at 91% and the lowest for hexagonal and rhombohedral oxides at 77%. The range of accuracy from ten repeated calculations is given in parentheses.
| Materials Group | Accuracy |
|---|---|
| HH | 91% (85–96%) |
| Mg2 + (Si or Sb)-based | 84% (78–90%) |
| BiTe-based | 85% (80–88%) |
| TM Chalcogenides | 84% (79–88%) |
| (Pb, Ge, or Sn) Chalcogenides | 80% (76–84%) |
| Oxides (Hexagonal) | 77% (74–84%) |
| Oxides (Perovskites) | 78% (75–82%) |
| Oxides (Orthorhombic) | 78% (73–81%) |
| Oxides (Rhombohedral) | 77% (71–84%) |
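Tables 2 and 3 report an accuracy together with its range over ten repeated calculations. A minimal sketch of such a repeated-split protocol follows, under stated assumptions: a random 80/20 train/test split per repetition, the reported central value taken as the mean, and synthetic two-feature data. The classifier is a linear SVM trained with the Pegasos stochastic subgradient method, swapped in so the example needs only the standard library; the authors' SVM kernel and hyperparameters are not given in this section, and `make_toy_data` is a hypothetical helper.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Pegasos stochastic subgradient descent for a linear soft-margin SVM.

    X: list of feature vectors; y: labels in {-1, +1}. A constant 1.0 is
    appended to every vector so the bias is learned as an ordinary weight.
    """
    rng = random.Random(seed)
    Xa = [list(x) + [1.0] for x in X]
    w = [0.0] * len(Xa[0])
    order = list(range(len(Xa)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            violated = y[i] * dot(w, Xa[i]) < 1  # is the hinge loss active?
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink (L2 penalty)
            if violated:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, Xa[i])]
    return w


def predict(w, x):
    return 1 if dot(w, list(x) + [1.0]) >= 0 else -1


def repeated_split_accuracy(X, y, n_repeats=10, test_frac=0.2, seed=0):
    """Accuracy over n_repeats random train/test splits as (mean, min, max)."""
    rng = random.Random(seed)
    n_test = max(1, round(test_frac * len(X)))
    accs = []
    for r in range(n_repeats):
        idx = list(range(len(X)))
        rng.shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        w = train_linear_svm([X[i] for i in train], [y[i] for i in train], seed=r)
        correct = sum(predict(w, X[i]) == y[i] for i in test)
        accs.append(correct / n_test)
    return sum(accs) / len(accs), min(accs), max(accs)


def make_toy_data(n=100, seed=1):
    """Hypothetical synthetic data: label = sign of the first feature."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.choice([-1, 1])
        X.append([label * rng.uniform(0.5, 2.0), rng.uniform(-1.0, 1.0)])
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    mean, lo, hi = repeated_split_accuracy(X, y)
    print(f"{mean:.0%} ({lo:.0%}–{hi:.0%})")  # same layout as Tables 2 and 3
```

The spread between the minimum and maximum reflects how sensitive the score is to which alloys land in the test split, which is why a single split can over- or understate accuracy on small datasets.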
Table 3. Results of phase classification using alloy and elemental parameters with feature engineering. Accuracy is reported for each materials-group dataset, with the highest accuracy for Half-Heusler (HH) compounds at 92% and the lowest for hexagonal and orthorhombic oxides at 80%. The range of accuracy from ten repeated calculations is given in parentheses.
| Materials Group | Accuracy |
|---|---|
| HH | 92% (85–96%) |
| Mg2 + (Si or Sb)-based | 86% (81–90%) |
| BiTe-based | 86% (81–91%) |
| TM Chalcogenides | 86% (81–90%) |
| (Pb, Ge, or Sn) Chalcogenides | 82% (77–86%) |
| Oxides (Hexagonal) | 80% (76–85%) |
| Oxides (Perovskites) | 81% (77–82%) |
| Oxides (Orthorhombic) | 80% (76–83%) |
| Oxides (Rhombohedral) | 81% (77–84%) |
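Table 3 adds elemental parameters to the alloy parameters. This section does not spell out how per-element properties become alloy-level features or which "mathematical variations" are used, so the sketch below shows one common choice under that assumption: composition-weighted statistics (mean, average deviation, minimum, maximum, range), similar in spirit to the statistics computed by matminer’s `ElementProperty` featurizer [49], followed by a hypothetical expansion with squares, square roots of absolute values, and pairwise products. Neither function is the authors’ exact recipe.

```python
import math
from itertools import combinations


def elemental_stats(c, prop, name):
    """Composition-weighted statistics of one elemental property.

    c: atomic fractions (summing to 1); prop: element -> property value.
    Returns features named like '<name>_mean', '<name>_avg_dev', ...
    """
    vals = [prop[e] for e in c]
    mean = sum(c[e] * prop[e] for e in c)
    avg_dev = sum(c[e] * abs(prop[e] - mean) for e in c)
    return {
        f"{name}_mean": mean,
        f"{name}_avg_dev": avg_dev,
        f"{name}_min": min(vals),
        f"{name}_max": max(vals),
        f"{name}_range": max(vals) - min(vals),
    }


def expand(features):
    """Hypothetical expansion: add x^2, sqrt|x|, and all pairwise products."""
    out = dict(features)
    for n, x in features.items():
        out[f"{n}^2"] = x * x
        out[f"sqrt|{n}|"] = math.sqrt(abs(x))
    for a, b in combinations(features, 2):
        out[f"{a}*{b}"] = features[a] * features[b]
    return out
```

Because the expansion grows quadratically with the number of raw features, a correlation filter and a sparse selection step are needed afterwards to keep the feature set manageable.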
Table 4. Cross-validation between materials from different datasets. As indicated by the right and down arrows, each column represents a distinct training set and each row corresponds to a specific targeted alloy group used for validation. Diagonal entries are the accuracies for targeted materials trained on their own dataset, identical to the accuracies shown in Table 2. Off-diagonal entries are the false-positive rates obtained when the model applied to the targeted materials was trained on a different dataset. Except for the TM chalcogenides and (Pb, Ge, or Sn) chalcogenides, which show false-positive rates of 0.21 and 0.28, all off-diagonal entries are 0.10 or less.
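| Targeted Alloys ↓ / Training Sets → | HH | Mg2 + (Si or Sb)-based | BiTe-based | TM Chalcogenides | (Pb, Ge, or Sn) Chalcogenides | Oxides |
|---|---|---|---|---|---|---|
| HH | 0.91 | 0.06 | 0.03 | 0.10 | 0.04 | 0.02 |
| Mg2 + (Si or Sb)-based | 0.05 | 0.84 | 0.03 | 0.08 | 0.02 | 0.02 |
| BiTe-based | 0.01 | 0.02 | 0.85 | 0.01 | 0.02 | 0.01 |
| TM Chalcogenides | 0.07 | 0.05 | 0.02 | 0.84 | 0.21 | 0.02 |
| (Pb, Ge, or Sn) Chalcogenides | 0.01 | 0.01 | 0.02 | 0.28 | 0.80 | 0.01 |
| Oxides | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.77 |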
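The pattern described in the Table 4 caption can be checked programmatically. The matrix values below are copied from Table 4 (rows: targeted alloys, columns: training sets). The helper `false_positive_rate` encodes one plausible reading of the off-diagonal entries, namely the fraction of a targeted group’s alloys that a model trained on another group labels as positive; that definition is an assumption, since this section does not state it formally.

```python
# Group order shared by rows (targeted alloys) and columns (training sets).
GROUPS = [
    "HH",
    "Mg2 + (Si or Sb)-based",
    "BiTe-based",
    "TM chalcogenides",
    "(Pb, Ge, or Sn) chalcogenides",
    "Oxides",
]

# Values from Table 4: diagonal = accuracy, off-diagonal = false-positive rate.
MATRIX = [
    [0.91, 0.06, 0.03, 0.10, 0.04, 0.02],
    [0.05, 0.84, 0.03, 0.08, 0.02, 0.02],
    [0.01, 0.02, 0.85, 0.01, 0.02, 0.01],
    [0.07, 0.05, 0.02, 0.84, 0.21, 0.02],
    [0.01, 0.01, 0.02, 0.28, 0.80, 0.01],
    [0.01, 0.01, 0.01, 0.01, 0.01, 0.77],
]


def false_positive_rate(predicted_positive):
    """Fraction of out-of-group alloys that a model labels as positive."""
    return sum(predicted_positive) / len(predicted_positive)


def flag_cross_talk(matrix, groups, threshold=0.10):
    """List (targeted group, training group, rate) for off-diagonal rates above threshold."""
    flagged = []
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if i != j and value > threshold:
                flagged.append((groups[i], groups[j], value))
    return flagged
```

With the default threshold of 0.10, only the two chalcogenide cross terms (0.21 and 0.28) are flagged, consistent with the caption; a near-symmetric pair like this suggests the two chalcogenide families share feature-space regions rather than one model being generally over-permissive.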

Share and Cite

MDPI and ACS Style

Ma, C.T.; Poon, S.J. Machine Learning Phase Classification of Thermoelectric Materials. Materials 2025, 18, 4726. https://doi.org/10.3390/ma18204726

AMA Style

Ma CT, Poon SJ. Machine Learning Phase Classification of Thermoelectric Materials. Materials. 2025; 18(20):4726. https://doi.org/10.3390/ma18204726

Chicago/Turabian Style

Ma, Chung T., and S. Joseph Poon. 2025. "Machine Learning Phase Classification of Thermoelectric Materials" Materials 18, no. 20: 4726. https://doi.org/10.3390/ma18204726

APA Style

Ma, C. T., & Poon, S. J. (2025). Machine Learning Phase Classification of Thermoelectric Materials. Materials, 18(20), 4726. https://doi.org/10.3390/ma18204726
