Formation Energy Prediction of Doped Perovskite Structures Based on Transfer Learning with Small Datasets

Yu, Yang; Deng, Mingxuan; Rui, Tianhao; Ma, Zhuangzhuang; Lu, Linyuan; Wang, Yunhao; Lan, Tianxing; Lan, Yulin; Wan, Hengcheng; Li, Yiyan; Li, Zhipeng; Zhang, Haibin

doi:10.3390/cryst15121008

by Yang Yu ¹, Mingxuan Deng ¹, Tianhao Rui ¹, Zhuangzhuang Ma ¹, Linyuan Lu ¹, Yunhao Wang ¹, Tianxing Lan ¹, Yulin Lan ¹, Hengcheng Wan ¹,²,*, Yiyan Li ¹,*, Zhipeng Li ³,* and Haibin Zhang ¹,*

¹ College of Smart Energy, Shanghai Jiao Tong University, Shanghai 200240, China

² Civil Aircraft Fire Science and Safety Engineering Key Laboratory of Sichuan Province, Civil Aviation Flight University of China, Guanghan 618300, China

³ Frontiers Science Center for Flexible Electronics, Xi’an Institute of Flexible Electronics, Northwestern Polytechnical University, Xi’an 710072, China

* Authors to whom correspondence should be addressed.

Crystals 2025, 15(12), 1008; https://doi.org/10.3390/cryst15121008 (registering DOI)

Submission received: 27 October 2025 / Revised: 10 November 2025 / Accepted: 17 November 2025 / Published: 24 November 2025

(This article belongs to the Special Issue Emerging Perovskite Materials and Applications)

Abstract

Doped perovskites are widely studied in perovskite material design. However, because data for the target materials are scarce, machine learning methods that work with small datasets are particularly important. In this study, we propose a transfer learning strategy for predicting the formation energies of doped perovskites from limited data samples. The strategy first uses an ABO₃-type perovskite dataset to develop a deep learning source model based on formation energies. The source model is then fine-tuned on the doped perovskite structure dataset to obtain a model with good transferability to the doped perovskite oxide target domain. Based on the transfer learning model, we further predict the formation energies of 12,897 A₂B′BO₆ compounds, 10,401 AA′B₂O₆ compounds, and 49,723 AA′BB′O₆ compounds. Using the tolerance factor t ∈ [0.7, 1.1], the octahedral factor μ ∈ [0.45, 0.7], and the modified tolerance factor τ ∈ [0, 4.18] for screening, we identify 3389 A₂B′BO₆, 3002 AA′B₂O₆, and 13,563 AA′BB′O₆ structures as potential stable doped perovskite candidates. Among these filtered results, 821 A₂B′BO₆, 69 AA′B₂O₆, and 6 AA′BB′O₆ compounds have been reported in the OQMD database. For each doped perovskite type, we select the candidate with the lowest predicted formation energy and perform DFT validation. This results in three newly reported stable doped perovskite materials: CaSrHfScO₆, BaSrHf₂O₆, and Ba₂HfNdO₆. The transfer learning-based perovskite material design method proposed in this study not only addresses the challenges of model training on small datasets but also significantly improves the accuracy and stability of doped perovskite material predictions. Through transfer learning, the model can leverage the data and knowledge from ABO₃-type perovskites, mitigating the problem of limited data. This strategy provides a new approach for efficient perovskite material design, enabling broader structural and performance predictions under limited-data conditions and offering a powerful tool for the development of novel functional materials.

Keywords:

solid oxide fuel cell; machine learning; transfer learning; small data; stability

1. Introduction

In recent years, perovskite materials have attracted widespread attention in research on solid oxide fuel cell (SOFC) materials owing to their superior properties such as mixed ionic–electronic conductivity (MIEC), excellent oxygen reduction reaction (ORR) catalytic activity, highly tunable chemical composition, and outstanding thermodynamic and chemical stability [1,2,3,4,5,6,7]. Considering the extreme and harsh operating conditions of SOFCs—high temperatures of 600–1000 °C, repeated redox cycling, and exposure to toxic fuel impurities—material stability is crucial; only with excellent stability can the safety and longevity of the cell be ensured [8,9]. The chemical composition of ABO₃-type perovskite structures can be tailored by hetero-ion doping at the A- and B-sites (e.g., Sr, Ni), thereby enhancing properties such as electrical conductivity, catalytic performance, and stability [10,11,12]. Therefore, the development of doped perovskite materials holds significant scientific importance and practical value.

Traditional materials discovery requires extensive experimentation, making it a time-consuming and costly process [13]. Computational methods provide an attractive alternative for exploring a wider design space. Density functional theory (DFT) offers strong predictive capability; however, its high computational cost makes exhaustive high-throughput screening over vast compositional and configurational spaces challenging [14]. In addition to conventional high-throughput DFT calculations, global structure prediction frameworks such as the USPEX code [15,16] have been successfully employed to explore complex energy landscapes, predict stable crystal structures, and discover new compounds, including perovskite-type oxides. These evolutionary-algorithm-based approaches provide an essentially unbiased and accurate search over structural and compositional space, but their computational cost remains substantial when thousands of doped compositions or large design spaces must be systematically evaluated.

With the rapid development of artificial intelligence (AI) in materials science, machine-learning-driven approaches have shown tremendous potential in predicting and designing the properties of perovskite materials [17,18]. Li et al. compared multiple machine learning algorithms for predicting the formation energies of perovskites and demonstrated that ensemble strategies based on random forests, support vector machines, and neural networks can effectively improve predictive accuracy. These models also simultaneously predicted relevant physical properties, such as cell volume and thermodynamic stability, showing good generalization performance [19]. Emery et al. developed property prediction models based on DFT data for multiple ABX₃ perovskite systems and systematically analyzed their formation energies, thermodynamic stability, and oxygen vacancy formation energies [20]. These studies highlight the promise of data-driven models as efficient surrogates for accelerating perovskite materials discovery.

Although machine learning has made significant progress in perovskite property prediction, most existing works are still limited to conventional ABO₃-type compositions. In contrast, doped perovskites—owing to their enhanced tunability and performance potential—are more relevant for practical SOFC applications, yet remain underexplored. On one hand, doped structures are severely underrepresented in existing databases and suffer from typical issues such as small sample sizes and imbalanced composition distributions. On the other hand, their higher structural and chemical complexity leads to larger discrepancies in feature distributions between simple and doped systems, increasing the difficulty of model training. As a result, traditional supervised learning models often perform poorly on such small-sample and distribution-shift scenarios, tending to overfit and exhibiting limited generalization [21]. In this context, there is a strong need for an efficient and robust framework that can exploit knowledge learned from large datasets of simple perovskites and transfer it to complex doped systems, thereby providing a fast and scalable pre-screening tool that is complementary to high-cost first-principles and evolutionary structure search methods such as USPEX.

To address these challenges, the emergence of small-sample learning strategies—particularly transfer learning (TL)—has offered new solutions for improving the prediction accuracy of complex material structures. The core concept of TL is to transfer the knowledge learned from a “source domain” with abundant data to a “target domain” with limited data, enabling good model performance even under small-data conditions [22,23,24]. In recent years, transfer learning has shown great potential in material modeling tasks. Lee and Asahi [25] applied transfer learning to crystal graph convolutional neural networks (CGCNN). By first training on a large dataset (source domain) and then transferring the learned structure–property mapping capabilities to a small dataset (target domain), they significantly improved the predictive accuracy under small-sample conditions. They also verified that the transfer strategy was effective across different crystal systems, such as metal oxides and nitrides, offering a feasible route for deploying deep graph neural networks in materials science. Moreover, Zhou et al. [26] proposed a center–environment embedding mechanism, enabling cross-structure transfer learning from spinel oxides to perovskite oxides. Despite the scarcity of data for the target system (perovskites), the model maintained low prediction errors, demonstrating its effectiveness in capturing local structural correlations across different crystal systems. It provides a promising direction for knowledge transfer between complex structures, especially for structure–property prediction under small-data scenarios.

In summary, this work proposes a transfer learning strategy for predicting the formation energy of perovskite materials across multiple structural types. ABO₃-type structures are used as the source domain, while doped structures are treated as the target domain. By constructing a unified feature representation for element and structure information, we first train a deep neural network model on the source domain, and then transfer its learned weights to the small-sample target structures. This approach significantly improves prediction performance under data-scarce conditions. The proposed method is expected to provide a reference solution for small-data modeling in material science and accelerate the screening and design of complex doped perovskite materials. As illustrated in Figure 1, the overall workflow of this study includes key steps such as data preprocessing, model training, and weight transfer, offering a clear framework for applying transfer learning in perovskite property prediction.

2. Methods

2.1. Dataset Preparation

To construct a representative multi-structure perovskite dataset, we select four structure types, the parent perovskite and three derivative structures, from the publicly available Materials Project database [27]: ABO₃, A₂B′BO₆, AA′B₂O₆, and AA′BB′O₆. A₂B′BO₆ is structurally equivalent to AB_0.5B′_0.5O₃, in which the B-site is half-substituted by B′. Similarly, AA′B₂O₆ represents a configuration in which the A-site is half occupied by a dopant cation A′, while AA′BB′O₆ incorporates dopants at both the A- and B-sites, forming a doubly substituted perovskite structure. The number of samples of each structure type is shown in Table 1.

The data selection process followed these principles:

Valid Formation Energy Labels: Each sample must include a valid formation energy label (eV/atom) to ensure that each data point has a reliable target value.
Chemical Formula Duplication Handling: When records with the same chemical formula are found, only the sample with the smallest formation energy is retained to ensure that the most stable structure is selected.
Perovskite Material Scope [28]: In this study, the A-site elements are limited to the following candidate elements: “Ba”, “Ca”, “Cd”, “Ce”, “Cs”, “K”, “La”, “Na”, “Nd”, “Pb”, “Ra”, “Rb”, “Sm”, “Sr”, “Th”, “Tl”, “U”, “Y”, “Pr”, “Zn”, “Dy”, “Gd”, “Ho”, “Sn”, “Mg”, “Er”. The B-site elements are limited to the following candidate elements: “Al”, “Co”, “Cr”, “Cu”, “Fe”, “Ga”, “Ge”, “Hf”, “Ir”, “Mg”, “Mn”, “Mo”, “Nb”, “Ni”, “Os”, “Pd”, “Pt”, “Re”, “Rh”, “Ru”, “Sc”, “Si”, “Sn”, “Ta”, “Tc”, “Ti”, “V”, “W”, “Y”, “Zn”, “Zr”, “Nd”. Additionally, the stoichiometric ratio of the sample must satisfy A:B:O = 1:1:3, ensuring that the selected materials belong to the perovskite category. The elements used in this work are shown in Figure 2.
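The three selection principles above can be sketched as a single filtering pass. This is a minimal illustration, not the authors' script: the record layout (the keys `formula`, `a_site`, `b_site`, and `formation_energy_per_atom`) and the function name `select_records` are assumptions made here, and the stoichiometry check is omitted for brevity.

```python
# Candidate element sets copied from the selection principles in the text.
ALLOWED_A = {"Ba", "Ca", "Cd", "Ce", "Cs", "K", "La", "Na", "Nd", "Pb", "Ra",
             "Rb", "Sm", "Sr", "Th", "Tl", "U", "Y", "Pr", "Zn", "Dy", "Gd",
             "Ho", "Sn", "Mg", "Er"}
ALLOWED_B = {"Al", "Co", "Cr", "Cu", "Fe", "Ga", "Ge", "Hf", "Ir", "Mg", "Mn",
             "Mo", "Nb", "Ni", "Os", "Pd", "Pt", "Re", "Rh", "Ru", "Sc", "Si",
             "Sn", "Ta", "Tc", "Ti", "V", "W", "Y", "Zn", "Zr", "Nd"}


def select_records(records):
    """Keep one record per formula: valid label, allowed elements, lowest energy.

    `records` is an iterable of dicts in a hypothetical layout with the keys
    'formula', 'a_site', 'b_site', and 'formation_energy_per_atom' (eV/atom).
    """
    best = {}
    for rec in records:
        energy = rec.get("formation_energy_per_atom")
        if energy is None:
            continue  # principle 1: a valid formation energy label is required
        if not set(rec["a_site"]) <= ALLOWED_A:
            continue  # principle 3: A-site elements must be in the candidate set
        if not set(rec["b_site"]) <= ALLOWED_B:
            continue  # principle 3: B-site elements must be in the candidate set
        formula = rec["formula"]
        kept = best.get(formula)
        if kept is None or energy < kept["formation_energy_per_atom"]:
            best[formula] = rec  # principle 2: keep the lowest-energy duplicate
    return list(best.values())
```

Keeping the best record in a dictionary keyed by formula makes the duplicate handling a single pass over the data, independent of the order in which records arrive.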

2.2. Feature Engineering

The features used in this study are primarily constructed based on elemental characteristics and crystal structure information, including the basic physicochemical properties of the A- and B-site cations and derived structural factors. All the physicochemical properties are automatically extracted using the Python material modeling toolkit pymatgen [29]. These elemental features are further used to derive structural descriptors, such as the tolerance factor. All elemental and structural features are then integrated and used as input to the neural network.

2.2.1. Elemental Property Feature Extraction

We use pymatgen to extract eight elemental property features for each of the A- and B-site cations: atomic number Z, atomic mass M, electronegativity χ, ionization energy I, ionic radius r^{ion}, Mendeleev group number G, melting point T_{m}, and boiling point T_{b}.

To ensure consistent input dimensions across different structure types, we apply the following processing methods for the A- and B-site elements: If the A-site has no dopant element, the features of the A′-site are set to be the same as those of the A-site. Similarly, if the B-site has no dopant element, the features of the B′-site are set to be the same as those of the B-site. This ensures that the elemental feature vector always has a fixed dimension of 8 × 4 = 32.

We conceptually considered alternative schemes, such as zero-padding, using a separate “vacant” token, or adding explicit binary mask features for doped/undoped sites. However, these approaches either break the physical symmetry between chemically identical sites or introduce additional hyperparameters and complexity without clear benefit for the present data size. In contrast, the adopted mirroring strategy preserves invariance with respect to identical site occupancies and yields a deterministic, easily reproducible input representation. Importantly, this choice does not by construction favor doped over undoped systems; it only ensures that any predictive advantage arises from genuine differences in elemental descriptors rather than from arbitrary encoding artifacts. We acknowledge that more sophisticated occupancy encodings (e.g., explicit site-fraction vectors) may further improve performance for larger and more diverse datasets and leave this as an interesting direction for future work.
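The mirroring strategy can be illustrated with a short sketch. The function name `site_features`, the block order (A, A′, B, B′), and the property table passed in as a plain dictionary are assumptions for illustration; in practice the eight values per element would come from pymatgen.

```python
N_PROPS = 8  # eight elemental properties per site, as described in the text


def site_features(props, a, b, a_prime=None, b_prime=None):
    """Build the fixed-length 8 x 4 = 32 elemental feature vector.

    `props` maps an element symbol to a list of N_PROPS numbers (hypothetical
    lookup table). An absent dopant site is mirrored from its host site.
    """
    a_prime = a if a_prime is None else a_prime  # undoped A-site: A' copies A
    b_prime = b if b_prime is None else b_prime  # undoped B-site: B' copies B
    vector = []
    for element in (a, a_prime, b, b_prime):  # assumed block order
        values = props[element]
        if len(values) != N_PROPS:
            raise ValueError(f"expected {N_PROPS} properties for {element}")
        vector.extend(values)
    return vector
```

With this encoding an undoped ABO₃ entry and a doped entry share the same 32-dimensional layout, so the source and target models can use identical input layers.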

2.2.2. Derived Structural Feature Calculation

We include the tolerance factor and octahedral factor, which are commonly used indicators of structural stability in perovskite systems, as part of our model features.

The tolerance factor is a key geometric parameter that measures the stability of the perovskite structure, proposed by Goldschmidt [30], and is defined as

t = \frac{r_{A} + r_{O}}{\sqrt{2} (r_{B} + r_{O})}

(1)

where r_{A}, r_{B}, and r_{O} represent the ionic radii of the A-site cation, the B-site cation, and the oxygen anion, respectively.

In multi-doped perovskite systems (such as A₂B′BO₆, AA′B₂O₆, etc.), the same lattice sites (such as the A-site or B-site) are often jointly occupied by multiple chemical elements. To accurately quantify the effect of doping on the lattice geometric parameters, this study uses the stoichiometric ratio weighted average method to calculate the equivalent ionic radius of the cations at each lattice site:

t = \frac{\bar{r}_{A} + r_{O}}{\sqrt{2} (\bar{r}_{B} + r_{O})}

(2)

For structures with n types of doped A-site elements, the effective ionic radius is defined as

\bar{r}_{A} = \sum_{i = 1}^{n} ω_{A_{i}} \cdot r_{A_{i}}

(3)

where ω_{A_{i}} is the mole fraction of the i-th element in the chemical formula, and r_{A_{i}} is the Shannon ionic radius of the corresponding element in a specific oxidation state and coordination environment.

Similarly, for structures with m types of doped B-site elements:

{\bar{r}}_{B} = \sum_{j = 1}^{m} ω_{B_{j}} \cdot r_{B_{j}}

(4)

In addition, the octahedral factor quantifies the size match between the B-site cation and the octahedral cavity formed by the oxygen anions, reflecting the degree of octahedral distortion. The further this value falls outside the ideal range (≥0.414), the lower the structural stability [31]. It is defined as

μ = \frac{\bar{r}_{B}}{r_{O}}

(5)
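Equations (2) to (5) can be evaluated with a few lines of code. The sketch below is illustrative only: the function names are hypothetical, site occupancies are passed as (mole fraction, radius) pairs, and the radii in the usage lines are commonly tabulated Shannon values (Sr²⁺ 1.44 Å and Ba²⁺ 1.61 Å in 12-fold coordination, Ti⁴⁺ 0.605 Å and O²⁻ 1.40 Å in 6-fold coordination) that should be checked against the radii actually used in a given workflow.

```python
import math

R_O = 1.40  # O2- radius in angstrom (commonly tabulated Shannon value)


def weighted_radius(occupancy):
    """Mole-fraction-weighted radius of one lattice site, Equations (3) and (4).

    `occupancy` is a list of (mole_fraction, ionic_radius) pairs for that site.
    """
    total = sum(fraction for fraction, _ in occupancy)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("site mole fractions must sum to 1")
    return sum(fraction * radius for fraction, radius in occupancy)


def tolerance_factor(a_site, b_site, r_o=R_O):
    """Goldschmidt tolerance factor with site-averaged radii, Equation (2)."""
    r_a = weighted_radius(a_site)
    r_b = weighted_radius(b_site)
    return (r_a + r_o) / (math.sqrt(2) * (r_b + r_o))


def octahedral_factor(b_site, r_o=R_O):
    """Octahedral factor with the site-averaged B radius, Equation (5)."""
    return weighted_radius(b_site) / r_o


# Undoped SrTiO3 and a half-substituted Ba/Sr A-site with Ti on the B-site.
t_undoped = tolerance_factor([(1.0, 1.44)], [(1.0, 0.605)])            # about 1.00
t_doped = tolerance_factor([(0.5, 1.61), (0.5, 1.44)], [(1.0, 0.605)])  # about 1.03
mu_ti = octahedral_factor([(1.0, 0.605)])                               # about 0.43
```

For an undoped site the weighted average reduces to the single-ion radius, so Equation (2) recovers Equation (1) as a special case.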

2.3. Transfer Learning Strategy

To address the challenges posed by small datasets, such as low prediction accuracy and limited generalization capability, this study employs a transfer learning approach. The ABO₃-type perovskite structure is selected as the source domain, and a source model is trained on the 463 samples of this category. Although the sample sizes of the target-domain structures, namely A₂B′BO₆ (449 samples), AA′B₂O₆ (138 samples), and AA′BB′O₆ (441 samples), are limited, the ABO₃ structure and its doped derivatives share strong structural similarity. Therefore, transfer learning can still effectively improve the prediction accuracy for the target structures by transferring the knowledge learned from the source domain.

First, we train a deep neural network (DNN) on the source domain data (ABO₃-type structures) using ten-fold cross-validation. The network consists of six fully connected layers: the input layer connects to a hidden layer with 512 units, followed by a 15% dropout layer to mitigate overfitting. Subsequent hidden layers decrease in size by half (i.e., 256, 128, 64, and 32 units), each employing the ReLU activation function. The final output layer produces the predicted formation energy. The model is trained using the mean squared error (MSE) loss function for 100 epochs, with a batch size of 32 and a learning rate of 0.0005. Among the ten trained models, the one that achieves the highest coefficient of determination (R²) on the validation set is selected as the source model and saved for transfer learning.
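The layer layout described above can be written down as a small framework-independent specification. This is a sketch under stated assumptions: the input width of 34 used in the example (32 elemental features plus t and μ) is a guess, since the text does not state the exact input dimension, and applying ReLU to the first hidden layer is likewise assumed. The names `layer_plan` and `parameter_count` are hypothetical.

```python
HIDDEN_WIDTHS = [512, 256, 128, 64, 32]  # hidden layer sizes given in the text
FIRST_LAYER_DROPOUT = 0.15               # 15% dropout after the first hidden layer


def layer_plan(input_dim):
    """Describe the six fully connected layers as a list of dictionaries."""
    widths = [input_dim] + HIDDEN_WIDTHS + [1]  # final layer outputs one energy
    plan = []
    last = len(widths) - 2
    for index, (n_in, n_out) in enumerate(zip(widths, widths[1:])):
        plan.append({
            "in": n_in,
            "out": n_out,
            # Assumption: ReLU on every hidden layer, none on the output layer.
            "activation": None if index == last else "relu",
            "dropout": FIRST_LAYER_DROPOUT if index == 0 else 0.0,
        })
    return plan


def parameter_count(plan):
    """Number of trainable weights and biases in the fully connected stack."""
    return sum(layer["in"] * layer["out"] + layer["out"] for layer in plan)


# Example with an assumed input width of 34 features.
plan = layer_plan(34)
n_params = parameter_count(plan)
```

Under the assumed input width the network has on the order of 2 × 10⁵ trainable parameters, far more than the few hundred training samples, which is one reason dropout and a well-chosen initialization matter in this setting.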

The pre-trained weights from the source model are then transferred to the target domain model and fine-tuned on the target dataset. During fine-tuning, no layers are frozen, and incremental transfer learning is not adopted, as preliminary experiments on our small datasets show that these strategies do not yield satisfactory results. Therefore, we employ full fine-tuning of all parameters, enabling the model to effectively transfer knowledge from the source domain to the target structures. This approach is particularly beneficial for small datasets, where models with randomly initialized weights often fail to converge effectively. By leveraging pre-trained weights from the source domain, especially when the target and source domains are closely related (e.g., ABO₃ perovskites and their doped counterparts), transfer learning accelerates convergence and enhances prediction performance, even with limited data.
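The benefit of initializing from source-domain weights and then updating all parameters can be illustrated on a deliberately tiny problem. The sketch below is a toy, not the model used in this work: it swaps the deep network for a two-parameter linear model trained by full-batch gradient descent, and the source and target data are synthetic lines chosen only so that the two domains are closely related.

```python
def mse(params, data):
    """Mean squared error of the linear model y = w * x + b on `data`."""
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def train(params, data, steps, lr=0.1):
    """Full-batch gradient descent on the MSE; every parameter is updated."""
    w, b = params
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w, b = w - lr * grad_w, b - lr * grad_b
    return (w, b)


# Synthetic, closely related domains (illustrative values only).
xs = [0.0, 0.25, 0.5, 0.75, 1.0]
source = [(x, 2.0 * x + 1.0) for x in xs]
target = [(x, 2.2 * x + 0.9) for x in xs]

pretrained = train((0.0, 0.0), source, steps=200)  # source model
warm = train(pretrained, target, steps=5)          # full fine-tuning, warm start
cold = train((0.0, 0.0), target, steps=5)          # baseline, trained from scratch

warm_loss = mse(warm, target)
cold_loss = mse(cold, target)
```

After the same small number of target-domain updates, the warm-started model sits much closer to the target optimum than the model trained from scratch, which mirrors the faster convergence described above for related source and target domains.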

3. Results and Discussion

In this section, we validate the effectiveness of transfer learning on small datasets by comparing the performance of the baseline model (i.e., the model trained directly using the target domain data) with the transfer learning model. To minimize the influence of randomness, all models were trained and evaluated using 10-fold cross-validation.

During training, the learning rate, batch size, and other hyperparameters are kept identical for both models to ensure a fair comparison. The models use the mean absolute error (MAE) as the loss function and employ the coefficient of determination (R²) and root mean square error (RMSE) as performance evaluation metrics. The performance comparison between the baseline model and the transfer learning model for the different structures is shown in Table 2, supporting the effectiveness of transfer learning in improving prediction accuracy for small datasets.
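For reference, the three quantities named above follow their standard definitions, sketched here in plain Python. The function names are chosen for illustration, and the example values are arbitrary numbers rather than data from this study.

```python
import math


def mean_absolute_error(y_true, y_pred):
    """MAE: average absolute deviation between labels and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_square_error(y_true, y_pred):
    """RMSE: square root of the average squared deviation."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Arbitrary illustrative values.
labels = [1.0, 2.0, 3.0]
predictions = [1.0, 2.0, 4.0]
mae = mean_absolute_error(labels, predictions)      # 1/3
rmse = root_mean_square_error(labels, predictions)  # sqrt(1/3)
r2 = r_squared(labels, predictions)                 # 0.5
```

Because RMSE squares the deviations before averaging, it penalizes a few large errors more heavily than MAE does, so reporting both gives a fuller picture of the error distribution.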

As shown in Figure 3, the transfer learning model outperforms the baseline model across all structure types, with particularly significant improvements for structures with smaller sample sizes. The results in Table 2 represent the average performance under 10-fold cross-validation.

As demonstrated in Table 2, for the AA′B₂O₆ structure, which has the smallest number of samples, transfer learning improves the R² value from 0.919 to 0.956, while MAE and RMSE decrease by approximately 17.6% and 24.8%, respectively. This result further confirms the effectiveness of transfer learning in small-sample scenarios.

It is worth noting that transfer learning also achieves stable performance improvements for structures such as A₂B′BO₆ and AA′BB′O₆, whose sample sizes are comparable to that of the source domain. For example, for the A₂B′BO₆ structure, R² increases from 0.909 to 0.941, MAE decreases from 0.096 to 0.079 (a reduction of about 17.7%), and RMSE drops from 0.132 to 0.108 (a reduction of about 18.2%), indicating a significant reduction in model error and enhanced fitting capability. For the AA′BB′O₆ structure, R² improves from 0.913 to 0.931, MAE decreases from 0.084 to 0.075 (a reduction of about 10.7%), and RMSE declines from 0.116 to 0.105 (a reduction of about 9.5%), demonstrating that even when the target sample size is not extremely small, the knowledge learned by the source-domain model still exhibits good transferability and generalization ability.
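The percentage reductions quoted above are relative changes with respect to the baseline error, which can be checked directly from the reported MAE and RMSE values. The helper name `relative_reduction` is chosen here for illustration.

```python
def relative_reduction(baseline, improved):
    """Relative reduction of an error metric, in percent of the baseline."""
    return (baseline - improved) / baseline * 100.0


# Baseline and transfer learning errors quoted in the text.
a2bb_mae = relative_reduction(0.096, 0.079)   # A2B'BO6 MAE, about 17.7%
a2bb_rmse = relative_reduction(0.132, 0.108)  # A2B'BO6 RMSE, about 18.2%
aabb_mae = relative_reduction(0.084, 0.075)   # AA'BB'O6 MAE, about 10.7%
aabb_rmse = relative_reduction(0.116, 0.105)  # AA'BB'O6 RMSE, about 9.5%
```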

Building upon these encouraging validation results, we apply the trained models to large-scale forward prediction tasks. Specifically, for each structure type, we select the best-performing model from the 10-fold cross-validation (i.e., the one with the highest R²) as the final prediction model.

To construct the prediction datasets, we systematically recombine the A-site and B-site elements that have appeared in the training data. The core rationale behind this strategy is that the model has already learned the mapping relationships between these elemental combinations and their corresponding formation energies. Therefore, recombining within the known elemental space not only expands the material design space but also ensures that predictions remain within the effective domain of the model’s learned chemical knowledge.
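Recombination within a known elemental space can be sketched with `itertools.combinations`. This is an illustration under assumptions: dopant pairs are treated as unordered and distinct, no charge-balance or duplicate-removal step is applied, and the exact enumeration rules behind the candidate counts reported in this work are not reproduced here. The function name and the dictionary keys are hypothetical.

```python
from itertools import combinations


def enumerate_candidates(a_elements, b_elements):
    """Enumerate doped compositions as (A-site tuple, B-site tuple) pairs."""
    a_pool = sorted(set(a_elements))
    b_pool = sorted(set(b_elements))
    candidates = {"A2B'BO6": [], "AA'B2O6": [], "AA'BB'O6": []}
    # One A-site element, two distinct B-site elements.
    for a in a_pool:
        for b_pair in combinations(b_pool, 2):
            candidates["A2B'BO6"].append(((a,), b_pair))
    # Two distinct A-site elements, one B-site element.
    for a_pair in combinations(a_pool, 2):
        for b in b_pool:
            candidates["AA'B2O6"].append((a_pair, (b,)))
    # Two distinct elements on both sites.
    for a_pair in combinations(a_pool, 2):
        for b_pair in combinations(b_pool, 2):
            candidates["AA'BB'O6"].append((a_pair, b_pair))
    return candidates


# Small example pool drawn from the candidate element lists.
pool = enumerate_candidates(["Ba", "Sr"], ["Hf", "Sc", "Ti"])
counts = {name: len(items) for name, items in pool.items()}
```

Because every candidate is built from elements already seen during training, each input feature stays within the range covered by the training data, even though the specific combination may be new.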

Using this approach, we predicted the formation energies of 12,897 A₂B′BO₆ compounds, 10,401 AA′B₂O₆ compounds, and 49,723 AA′BB′O₆ compounds. These candidate materials were subsequently filtered based on the following structural criteria: tolerance factor t ∈ [0.7, 1.1], octahedral factor μ ∈ [0.45, 0.7], and modified tolerance factor [29] τ ∈ [0, 4.18]. The modified tolerance factor is defined as

τ = \frac{r_{X}}{r_{B}} - n_{A} (n_{A} - \frac{\frac{r_{A}}{r_{B}}}{\ln (\frac{r_{A}}{r_{B}})})

(6)

In this formula, r_{A}, r_{B}, and r_{X} represent the ionic radii of the A-site cation, B-site cation, and anion, respectively, while n_{A} is taken as the weighted average oxidation state of the A-site cations. The oxidation states used in this work are determined using the oxidation-state assignment routines implemented in pymatgen and are provided, together with the corresponding scripts and input data, in the Supplementary Data (xlsx file).
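Equation (6) and the three screening windows can be combined into a short check. The sketch below is illustrative: the function names are hypothetical, and the BaZrO₃ example uses commonly tabulated Shannon radii (Ba²⁺ 1.61 Å in 12-fold coordination, Zr⁴⁺ 0.72 Å and O²⁻ 1.40 Å in 6-fold coordination) with an A-site oxidation state of +2, all of which should be verified against the values used in a given workflow.

```python
import math


def modified_tolerance_factor(r_a, r_b, r_x, n_a):
    """Modified tolerance factor tau, following Equation (6)."""
    ratio = r_a / r_b
    if ratio <= 0.0 or ratio == 1.0:
        raise ValueError("r_A / r_B must be positive and different from 1")
    return r_x / r_b - n_a * (n_a - ratio / math.log(ratio))


def passes_screen(t, mu, tau):
    """Apply the three structural windows used for candidate filtering."""
    return (0.7 <= t <= 1.1) and (0.45 <= mu <= 0.7) and (0.0 <= tau <= 4.18)


# Example: BaZrO3 with assumed radii and oxidation state.
r_a, r_b, r_x, n_a = 1.61, 0.72, 1.40, 2
tau = modified_tolerance_factor(r_a, r_b, r_x, n_a)  # about 3.50
t = (r_a + r_x) / (math.sqrt(2) * (r_b + r_x))        # about 1.00
mu = r_b / r_x                                         # about 0.51
is_candidate = passes_screen(t, mu, tau)
```

The guard on r_A / r_B reflects the logarithm in the denominator of Equation (6), which vanishes when the two radii are equal.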

After applying these filters, 3389 A₂B′BO₆, 3002 AA′B₂O₆, and 13,563 AA′BB′O₆ compounds are retained as potentially stable doped perovskite candidates. The predicted materials for each category are listed in Table 3; only the top ten are shown.

Among these filtered results, 821 A₂B′BO₆, 69 AA′B₂O₆, and 6 AA′BB′O₆ compounds are found to have matching entries in the OQMD database, indicating that they have already been independently verified as stable doped perovskite materials.

To further assess the behavior of the model in a forward-prediction scenario, we evaluate its performance on an independent test set constructed from OQMD, which is not used during training and is generated using different DFT settings from those of the Materials Project. As summarized in Table 4, the model maintains a clear correlation with the OQMD formation energies for all structure types; however, both the MAE and R² values are noticeably degraded compared with the internal cross-validation results on the MP-based dataset (Table 2). In particular, the increase in error indicates that the present model does not exhibit uniformly strong, database-independent generalization. Instead, these discrepancies are consistent with a systematic domain shift between the MP and OQMD computational protocols (e.g., different pseudopotentials, DFT+U settings, and reference energies). Therefore, the OQMD evaluation should be interpreted as an out-of-distribution stress test: it demonstrates that the proposed transfer learning framework preserves a certain level of transferability beyond its training domain, while also revealing its limitation that the predictions remain anchored to the “DFT dialect” of the Materials Project rather than representing universally calibrated formation energies.

Finally, for each type of doped perovskite, we select the compound with the lowest predicted formation energy—that is, the most thermodynamically stable candidate—for further DFT calculations. As a result, three stable doped perovskite structures are identified: CaSrHfScO₆, BaSrHf₂O₆, and Ba₂HfNdO₆.

4. Conclusions

This study proposes a transfer learning-based strategy for predicting the formation energy of perovskite materials, addressing the reduced prediction accuracy caused by small datasets. Using the ABO₃-type perovskite structure as the source domain and the doped perovskite structures as the target domain, we train a deep neural network and fine-tune its weights on the target structures, significantly improving the prediction performance for small-sample target structures. The results show that the transfer learning model outperforms the baseline model across all structure types, especially for the structure with the fewest samples (AA′B₂O₆), where transfer learning improves prediction accuracy and reduces errors most markedly.

Although this study has achieved certain results, there are still areas worth further exploration. For instance, future research could explore combining multi-source domain models with transfer learning to enhance the adaptability to complex perovskite material systems and improve model generalization ability. Furthermore, with the development of deep learning technologies, combining novel neural network architectures (such as graph neural networks) with transfer learning strategies may further improve prediction accuracy, providing more efficient tools for the design and optimization of perovskite materials.

Overall, transfer learning provides an effective approach to solving the prediction problems of perovskite materials, especially under small-dataset conditions, and has important application potential. This study offers new insights for the screening and design of perovskite materials and provides a reference path for small-sample learning issues in materials science. Furthermore, the underlying concept is not limited to perovskites: in conventional machine learning models, the network parameters are typically initialized randomly, whereas in transfer learning, the model is initialized from a chemically related source domain, which can significantly improve convergence and generalization when the target system shares similar structural motifs and descriptor space. Therefore, provided that comparable feature definitions (e.g., site-resolved elemental descriptors) and analogous structural representations can be constructed, the present framework could in principle be extended to other oxide families or non-perovskite crystal structures, offering a promising route for efficient property prediction and composition screening beyond the perovskite domain.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/cryst15121008/s1, File S1: DFT; File S2: mp_data; File S3: prediation; Table S1: oxidation.xlsx.

Author Contributions

Conceptualization, Y.Y., H.W., Y.L. (Yiyan Li) and Z.L.; methodology, Y.Y.; software, Y.Y., M.D., T.R. and Z.M.; validation, Y.Y., M.D., T.R., Z.M. and H.Z.; formal analysis, Y.Y.; investigation, Y.Y., L.L., Y.W., T.L. and Y.L. (Yulin Lan); resources, H.W., Y.L. (Yiyan Li), Z.L. and H.Z.; data curation, Y.Y., M.D. and L.L.; writing—original draft preparation, Y.Y.; writing—review and editing, Y.Y., H.W., Y.L. (Yiyan Li), Z.L. and H.Z.; visualization, Y.Y., Z.M. and L.L.; supervision, H.W., Y.L. (Yiyan Li), Z.L. and H.Z.; project administration, H.W.; funding acquisition, H.W., Y.L. (Yiyan Li), Z.L. and H.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This study was financially supported by the National Natural Science Foundation of China (U24B2025), the LingChuang Research Project of China National Nuclear Corporation, and the open research fund of Songshan Lake Materials Laboratory (2023SLABFN09).

Data Availability Statement

The original contributions presented in this study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding authors.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

Irshad, M.; Idrees, R.; Siraj, K.; Shakir, I.; Rafique, M.; Ain, Q.U.; Raza, R. Electrochemical evaluation of mixed ionic–electronic perovskite cathode LaNi₁₋ₓCoₓO₃₋δ for IT-SOFC synthesized by high-temperature decomposition. Int. J. Hydrog. Energy 2021, 46, 10448–10456.
Ji, Q.; Bi, L.; Zhang, J.; Cao, H.; Zhao, X.S. The role of oxygen vacancies of ABO₃ perovskite oxides in the oxygen reduction reaction. Energy Environ. Sci. 2020, 13, 1408–1428.
Yuan, B.; Wang, N.; Tang, C.; Meng, L.; Du, L.; Su, Q.; Aoki, Y.; Ye, S. Advances and challenges in high-performance cathodes for protonic solid oxide fuel cells and machine learning-guided perspectives. Nano Energy 2024, 122, 109306.
Cao, J.; Ji, Y.; Shao, Z. Perovskites for protonic ceramic fuel cells: A review. Energy Environ. Sci. 2022, 15, 2200–2232.
Cao, J.; Wu, S.; He, J.; Zhou, Y.; Ma, P. Research progress of high-entropy perovskite oxides in energy and environmental applications: A review. Particuology 2024, 95, 62–81.
Sikstrom, D.; Thangadurai, V. A tutorial review on solid oxide fuel cells: Fundamentals, materials, and applications. Ionics 2024.
Yahyazadeh, A. A Comprehensive Review of the Development of Perovskite Oxide Anodes for Fossil Fuel-Based Solid Oxide Fuel Cells (SOFCs): Prospects and Challenges. Physchem 2025, 5, 25.
Zarabi, S.; Asghar, M.I.; Lund, P. A review on solid oxide fuel cell durability: Latest progress and trends. Renew. Sustain. Energy Rev. 2022, 161, 112339.
Price, R.; Cassidy, M.; Grolig, J.G.; Longo, G.G.; Weissen, U.G.; Mai, A.G.; Irvine, J.T.S. Upscaling of co-impregnated La₀.₂Sr₀.₂₅Ca₀.₄₅TiO₃ anodes for solid oxide fuel cells: A progress report on a decade of academic-industrial collaboration. Adv. Energy Mater. 2021, 11, 2003951.
Pan, B.; Miao, H.; Liu, F.; Yuan, J. Optimizing La₁₋ₓSrₓFeO₃₋δ electrodes for symmetrical reversible solid oxide cells. Int. J. Hydrog. Energy 2023, 48, 11045–11057.
Bai, J.H.; Niu, L.L.; Zhu, Q.R.; Zhou, D.F.; Zhu, X.F.; Wang, N.; Yan, W.; Wang, J.; Liang, Q.; Wang, C. Ni-doped Fe-based perovskite to obtain multifunctional and highly efficient electrocatalytic active IT-SOFC electrode. Fuel 2024, 365, 131334.
Guo, R.; He, T. High-Entropy Perovskite Electrolyte for Protonic Ceramic Fuel Cells Operating below 600 °C. ACS Mater. Lett. 2022, 4, 1646–1652.
Pyzer-Knapp, E.O.; Pitera, J.W.; Staar, P.W.J.; Takeda, S.; Laino, T.; Sanders, D.P.; Sexton, J.; Smith, J.R.; Curioni, A. Accelerating materials discovery using artificial intelligence, high performance computing and robotics. npj Comput. Mater. 2022, 8, 84.
Jain, A.; Ong, S.P.; Hautier, G.; Chen, W.; Richards, W.D.; Dacek, S.; Cholia, S.; Gunter, D.; Skinner, D.; Ceder, G.; et al. Commentary: The Materials Project: A materials genome approach to accelerating materials innovation. APL Mater. 2013, 1, 011002.
Oganov, A.R.; Glass, C.W. Crystal structure prediction using ab initio evolutionary techniques: Principles and applications. J. Chem. Phys. 2006, 124, 244704.
Lyakhov, A.O.; Oganov, A.R.; Stokes, H.T.; Zhu, Q. New developments in evolutionary structure prediction algorithm USPEX. Comput. Phys. Commun. 2013, 184, 1172–1182.
Peivaste, I.; Belouettar, S.; Mercuri, F.; Fantuzzi, N.; Daouadji, A.; Dehghani, H.; Izadi, R.; Ibrahim, H.; Lengiewicz, J.; Belouettar-Mathis, M.; et al. Artificial intelligence in materials science and engineering: Current landscape, key challenges, and future trajectories. Compos. Struct. 2025, 372, 19419.
Cheetham, A.K.; Seshadri, R. Artificial Intelligence Driving Materials Discovery? A Perspective. Chem. Mater. 2024, 36, 3490–3495.
Li, R.; Deng, Q.; Tian, D.; Zhu, D.; Lin, B. Predicting Perovskite Performance with Multiple Machine-Learning Algorithms. Crystals 2021, 11, 818.
Touati, S.; Benghia, A.; Hebboul, Z.; Lefkaier, I.K.; Kanoun, M.B.; Goumri-Said, S. Machine learning models for efficient property prediction of ABX₃ materials: A high-throughput approach. ACS Omega 2024, 9, 47519–47531.
Xu, P.; Ji, X.; Li, M.; Lu, W. Small data machine learning in materials science. npj Comput. Mater. 2023, 9, 42.
Ranaweera, M.; Mahmoud, Q.H. Virtual to real-world transfer learning: A systematic review. Electronics 2021, 10, 1491.
Zhuang, F.; Qi, Z.; Duan, K.; Xi, D.; Zhu, Y.; Zhu, H.; Xiong, H.; He, Q. A comprehensive survey on transfer learning. Proc. IEEE 2021, 109, 43–76.
Schütt, K.T.; Sauceda, H.E.; Kindermans, P.J.; Tkatchenko, A.; Müller, K.R. SchNet—A deep learning architecture for molecules and materials. J. Chem. Phys. 2018, 148, 241722.
Lee, J.; Asahi, R. Transfer learning for materials informatics using crystal graph convolutional neural network. Comput. Mater. Sci. 2021, 190, 110314.
Li, Y.; Zhu, R.; Wang, Y.; Feng, L.; Liu, Y. Center–environment deep transfer machine learning across crystal structures: From spinel oxides to perovskite oxides. npj Comput. Mater. 2023, 9, 109.
Li, W.; Jacobs, R.; Morgan, D. Predicting the Thermodynamic Stability of Perovskite Oxides Using Machine Learning Models. Comput. Mater. Sci. 2018, 150, 454–463.
O’Keeffe, M.; Rohl, A.L.; Kim, J.; Choudhary, A.; Wolverton, C.; Agrawal, A. Pymatgen: A Python Materials Genomics Library. Comput. Mater. Sci. 2017, 99, 133–138.
Goldschmidt, V.M. Die Gesetze der Krystallochemie. Naturwissenschaften 1926, 14, 477–485.
Li, C.; Soh, K.C.K.; Wu, P. Formability of ABO₃ perovskites. J. Alloys Compd. 2004, 372, 40–48.
Bartel, C.J.; Sutton, C.; Goldsmith, B.R.; Ouyang, R.; Musgrave, C.B.; Ghiringhelli, L.M.; Scheffler, M. New tolerance factor to predict the stability of perovskite oxides and halides. Sci. Adv. 2019, 5, eaav0693.

Figure 1. Overall workflow diagram.

Figure 2. Constituent elements of perovskite oxides studied in this work. The periodic table highlights the elements used in this study: A-site elements are marked in red, B-site elements are shown in green, and elements that appear in both A-site and B-site positions are colored blue. All other elements not included in the dataset are displayed in gray.

Figure 3. Performance comparison of baseline and transfer learning models across three perovskite structures.

Table 1. Number of samples for each perovskite structure type.

Perovskite Structure Type	Number of Samples
ABO₃	463
A₂B′BO₆	449
AA′B₂O₆	138
AA′BB′O₆	441

Table 2. Performance comparison of the baseline and transfer learning models.

Structure	A₂B′BO₆			AA′B₂O₆			AA′BB′O₆
Metric	<MAE>	<R²>	<RMSE>	<MAE>	<R²>	<RMSE>	<MAE>	<R²>	<RMSE>
Baseline	0.096	0.909	0.132	0.085	0.919	0.129	0.084	0.913	0.116
Transfer	0.079	0.941	0.108	0.070	0.956	0.097	0.075	0.931	0.105
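
The gains in Table 2 can be expressed as relative changes. The following is a minimal sketch, not the authors' code: the numbers are copied from Table 2, while the dictionary layout and the helper name `relative_reduction` are illustrative assumptions.

```python
# Values copied from Table 2 as (baseline, transfer) pairs.
# The ASCII keys and this data layout are illustrative assumptions.
table2 = {
    "A2B'BO6": {"MAE": (0.096, 0.079), "R2": (0.909, 0.941), "RMSE": (0.132, 0.108)},
    "AA'B2O6": {"MAE": (0.085, 0.070), "R2": (0.919, 0.956), "RMSE": (0.129, 0.097)},
    "AA'BB'O6": {"MAE": (0.084, 0.075), "R2": (0.913, 0.931), "RMSE": (0.116, 0.105)},
}


def relative_reduction(baseline, transfer):
    """Percentage by which an error metric drops from baseline to transfer."""
    return 100.0 * (baseline - transfer) / baseline


summary = {}
for structure, metrics in table2.items():
    summary[structure] = {
        "MAE_reduction_pct": relative_reduction(*metrics["MAE"]),
        "RMSE_reduction_pct": relative_reduction(*metrics["RMSE"]),
        # R2 is a score, not an error, so report the absolute gain instead.
        "R2_gain": metrics["R2"][1] - metrics["R2"][0],
    }

for structure, row in summary.items():
    print(
        f"{structure}: MAE -{row['MAE_reduction_pct']:.1f}%, "
        f"RMSE -{row['RMSE_reduction_pct']:.1f}%, R2 +{row['R2_gain']:.3f}"
    )
```

Computed this way, the MAE falls by roughly 11 to 18% and the RMSE by roughly 9 to 25% across the three structures, with the largest RMSE reduction for AA′B₂O₆, the structure with the fewest samples in Table 1.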

Table 3. Predicted candidates of doped perovskites across configurations (top ten).

AA′B₂O₆	Ef (eV/atom)	A₂B′BO₆	Ef (eV/atom)	AA′BB′O₆	Ef (eV/atom)
BaCaHf₂O₆	−3.661	Ba₂HfNdO₆	−3.644	CaSrHfScO₆	−3.587
BaSrHf₂O₆	−3.648	Ba₂HfYO₆	−3.634	LaSrAlScO₆	−3.583
CaSrHf₂O₆	−3.636	Sr₂AlYO₆	−3.632	BaSrHfScO₆	−3.580
BaSrSc₂O₆	−3.633	Ca₂AlYO₆	−3.620	KNdHfScO₆	−3.579
BaCaSc₂O₆	−3.633	Ba₂TaNdO₆	−3.619	BaSrHfYO₆	−3.579
LaNaSc₂O₆	−3.633	Sr₂HfScO₆	−3.616	NaNdHfScO₆	−3.577
CaSrSc₂O₆	−3.626	Sr₂HfZrO₆	−3.615	KLaHfScO₆	−3.574
TlYSc₂O₆	−3.625	Ra₂HfNdO₆	−3.612	LaRbHfScO₆	−3.574
BaRaSc₂O₆	−3.624	Ba₂TaYO₆	−3.611	NdRbHfScO₆	−3.570
CaRaSc₂O₆	−3.621	Ba₂HfZrO₆	−3.609	CeKHfScO₆	−3.569
TlErSc₂O₆	−3.619	Ca₂AlScO₆	−3.607	CeRbHfScO₆	−3.568

Table 4. Comparison between predicted formation energies and OQMD data.

Structure	<MAE>	<RMSE>	<R²>	<MAPE (%)>
A₂B′BO₆	0.2610	0.3220	0.7181	15.07
AA′B₂O₆	0.2732	0.3495	0.6394	11.72
AA′BB′O₆	0.0621	0.0923	0.7939	1.93
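
For readers who want to reproduce a comparison like Table 4, the sketch below computes the four metrics under their standard textbook definitions. This section does not state the exact formulas the authors used, so the definitions are an assumption; the function name `regression_metrics` and the sample values are made up for illustration and are not data from the paper.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE, R2 and MAPE (%) under their standard definitions."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)          # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    # MAPE divides by the reference value, so it is undefined if any y_true is 0.
    mape = 100.0 / n * sum(abs(e / t) for e, t in zip(errors, y_true))
    return {"MAE": mae, "RMSE": rmse, "R2": r2, "MAPE": mape}


# Hypothetical formation energies in eV/atom (illustrative only).
reference = [-3.0, -2.0, -1.0, -2.0]
predicted = [-3.1, -1.9, -1.2, -2.0]
metrics = regression_metrics(reference, predicted)
print(metrics)
```

Because MAPE normalizes each error by the reference value, it depends on the magnitude of the reference energies as well as on the absolute error, which is worth keeping in mind when comparing the MAPE column with the MAE column.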

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
