- Article
Enhancing Reverse Design Ability of Functional Materials Based on Data Quality Management: Taking Biomedical Zinc Alloy as an Example
- Xujie Gong
- Xue Jiang
- Shiyu Huang
- + 4 authors
Biodegradable zinc alloys have shown great potential in the biomedical field but are limited by their poor mechanical properties. Alloying is essential for improving these properties, yet designing multicomponent zinc alloys remains challenging because of complex elemental interactions. Data-driven active learning approaches offer new strategies for zinc alloy design, but data quality issues in multi-source heterogeneous data, such as redundancy, outliers, and inconsistencies, hinder modeling accuracy and interpretability. In this work, we propose a data quality management strategy based on recursive screening that targets three key data problems: redundant data (RD), outlier data (OD), and inconsistent target data (ID). Case studies on hydrogen embrittlement, phase-change refrigeration materials, and matbench_expt_gap datasets showed that screening for RD improved the data distribution but risked precision loss in high-performance regions; screening for OD enhanced minority alloy features but risked overfitting; and screening for ID preserved high-performance data, boosting extrapolation but risking underfitting. Six multicomponent zinc alloys were designed and fabricated using these strategies. The highest tensile strength, 482 MPa, which is near state-of-the-art performance, was obtained in the alloy Zn-1.2Al-0.8Mg-0.45Li-0.3Mn (at%), designed via the ID-optimized dataset. The study revealed that, in inverse design, predictive accuracy in high-performance regions outweighs data volume or density, underscoring the value of data quality management for multi-source materials development.
15 October 2025