Peer-Review Record

An Interpretable XGBoost Framework for Predicting Oxide Glass Density

Appl. Sci. 2025, 15(15), 8680; https://doi.org/10.3390/app15158680
by Pawel Stoch
Reviewer 1: Anonymous
Reviewer 2:
Submission received: 18 July 2025 / Revised: 30 July 2025 / Accepted: 1 August 2025 / Published: 5 August 2025

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

In this paper, the authors developed ML models to predict the density of oxide glasses and investigate how different feature engineering strategies impact model performance and interpretability. Generally, this is a very interesting and important work, contributing to the field of oxide glasses. In addition, the manuscript aligns well with the research direction of Applied Sciences. However, several problems in the authors’ manuscript need to be addressed before acceptance. The following comments would be helpful to improve the quality of this manuscript.
(1)    Please give the full names of ML and SHAP when they first appear in the abstract.
(2)    If possible, please give the used sample dataset after screening in the supplementary materials.
(3)    Please list the equations of the evaluation functions of the model performance.
(4)    Please explain what measures you have taken to avoid overfitting.
(5)    Please provide more details about calculation method of feature importance in section 2.
(6)    Figures 6 and 7 are not clear and please improve them.
(7)    Please revise the font of the text in the figures to be the same as that of the main text.
(8)    More results should be added in Conclusion section. Also, I suggest the authors clarify the limitations of this study, as well as the potential direction of future work. A previous work (ANN-based structure-viscosity relationship model of multicomponent slags for production design in mineral wool) has adopted first-nearest neighbor pairs, BO, NBO, FO, and NBO/T as the structural features of oxide glass in the ML models, which may be a potential method to further improve your model in your future research.

Author Response

First, we would like to thank the reviewer for their insightful and thoughtful comments. These comments significantly improved the quality of the manuscript. All changes in the text are highlighted in red.

 

(1)    Please give the full names of ML and SHAP when they first appear in the abstract.

The full names have been given in the abstract.

 

(2)    If possible, please give the used sample dataset after screening in the supplementary materials.

The full dataset after screening has been added to the public AGH Rodbuk repository: https://doi.org/10.58032/AGH/WY0GEJ. An appropriate statement is included at the end of the manuscript (Data Availability Statement).

 

(3)    Please list the equations of the evaluation functions of the model performance.

The equations have been added to Section 2.4, Machine learning models.
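For readers without access to the revised Section 2.4: the standard regression metrics commonly used for this kind of evaluation (R², RMSE, MAE) can be sketched in a few lines. The density values below are illustrative, not taken from the study's dataset.

```python
import math

def r2_score(y_true, y_pred):
    # Coefficient of determination: 1 - SS_res / SS_tot
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def rmse(y_true, y_pred):
    # Root mean squared error, in the same units as the target (g/cm^3)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    # Mean absolute error
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

densities_true = [2.20, 2.50, 3.10, 5.90]   # g/cm^3, illustrative values only
densities_pred = [2.25, 2.45, 3.00, 6.10]
print(r2_score(densities_true, densities_pred),
      rmse(densities_true, densities_pred),
      mae(densities_true, densities_pred))
```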

 

(4)    Please explain what measures you have taken to avoid overfitting.

Several standard measures were implemented throughout the modeling pipeline to prevent overfitting and to ensure the generalizability of our results. The most fundamental step was the strict separation of the data into training and test sets. The entire dataset was partitioned using an 80/20 split: 80% of the data was used for training and hyperparameter tuning, while a completely independent 20% was held out as a test set. The final performance of each model, as reported in Figures 2-4, was evaluated exclusively on this unseen test set. This practice provides an unbiased assessment of how well the model generalizes to new, previously unobserved data. To ensure consistency and a fair comparison, a fixed random state was used for the split across all experiments.

A 5-fold cross-validation scheme within a grid search framework was employed for hyperparameter tuning. This technique ensures that the selected hyperparameters are robust and perform well across the entire training dataset, rather than being tuned to a specific, arbitrary validation subset, which significantly reduces the risk of overfitting during the model selection phase.

The machine learning algorithms chosen for this study have built-in mechanisms to control model complexity, which were optimized during hyperparameter tuning. ElasticNet is an inherently regularized linear model that combines L1 and L2 penalties to constrain the model's coefficients, preventing them from becoming too large. The XGBoost algorithm includes several key hyperparameters that directly control overfitting, such as the maximum tree depth, the learning rate, and L1/L2 regularization terms on the weights; these were carefully tuned using the cross-validated grid search to find an optimal balance between model performance and complexity. Similarly, the hyperparameters of the neural network, such as the regularization parameter, were optimized to prevent the model from becoming overly complex.

The final measure was feature selection. Before training the models, we performed a feature reduction process on the feature sets. This involved removing sparse features (present in <10% of samples) and highly correlated features (Pearson correlation > 0.95). By reducing the dimensionality and removing redundant information, we simplify the problem for the learning algorithm, making it less likely to learn spurious correlations present only in the training data.
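The feature reduction step described above can be sketched roughly as follows. The thresholds (<10% sparsity, Pearson |r| > 0.95) are those stated in the response; the function itself is an illustrative reconstruction, not the study's actual code.

```python
import numpy as np

def reduce_features(X, sparsity_threshold=0.10, corr_threshold=0.95):
    """Return indices of columns kept after (1) dropping sparse features,
    i.e. nonzero in fewer than 10% of samples, and (2) dropping the later
    member of any pair with |Pearson r| above 0.95."""
    # 1) Remove sparse features
    nonzero_frac = (X != 0).mean(axis=0)
    kept = [j for j in range(X.shape[1]) if nonzero_frac[j] >= sparsity_threshold]
    # 2) Remove highly correlated features, keeping the first of each pair
    corr = np.corrcoef(X[:, kept], rowvar=False)
    survivors = []  # positions within `kept` that pass the correlation filter
    for i in range(len(kept)):
        if all(abs(corr[i, k]) <= corr_threshold for k in survivors):
            survivors.append(i)
    return [kept[i] for i in survivors]
```

In a glass-composition matrix, a column that is all zeros for most samples (a rarely used oxide) would fall to the sparsity filter, while near-duplicate descriptors would fall to the correlation filter.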

An appropriate comment has been added to Section 2.4, Machine learning models.

 

(5)    Please provide more details about calculation method of feature importance in section 2.

A more detailed description has been added to Section 2 (last paragraph).

 

(6)    Figures 6 and 7 are not clear and please improve them.

All the figures have been improved.

 

(7)    Please revise the font of the text in the figures to be the same as that of the main text.

The fonts in all the figures have been unified.

 

(8)    More results should be added in Conclusion section. Also, I suggest the authors clarify the limitations of this study, as well as the potential direction of future work. A previous work (ANN-based structure-viscosity relationship model of multicomponent slags for production design in mineral wool) has adopted first-nearest neighbor pairs, BO, NBO, FO, and NBO/T as the structural features of oxide glass in the ML models, which may be a potential method to further improve your model in your future research.

The Conclusions section has been improved.

We thank the reviewer for pointing us to the work on ANN-based models using structural features (BO, NBO, etc.). Our current study focuses on a "composition-to-property" framework, which is powerful for high-throughput screening where only compositional data is available. Incorporating detailed structural descriptors like BO/NBO represents a shift to a "structure-to-property" model. While this is a very powerful approach for achieving high accuracy, it is often limited to smaller datasets where such detailed structural information can be obtained from simulations or spectroscopy. We agree this is a valuable direction for future, more focused studies and will add it to our outlook on future work.

The appropriate comments have been added to the Discussion and Conclusions sections.

Reviewer 2 Report

Comments and Suggestions for Authors

This manuscript presents a well-structured and scientifically robust study on the prediction of oxide glass density using machine learning (ML), particularly focusing on model interpretability. The authors employ a large and diverse dataset extracted from the SciGlass database, carefully engineer multiple feature sets of increasing physical complexity (Compositional, Atomic Types, Meredig, Magpie), and evaluate the performance of three ML models (ElasticNet, MLP, XGBoost). The XGBoost model combined with the Magpie features achieves excellent predictive performance (R² = 0.97), while the interpretability is rigorously assessed using SHAP and other feature importance methods.

The manuscript is timely, relevant, and provides an insightful contribution to the growing field of interpretable ML in materials science. In particular, the connection between the model's internal logic and fundamental physical relationships (e.g., molar mass and volume) is clearly demonstrated.

However, a few issues and clarifications should be addressed before publication.

1. There are multiple instances where the term “Atopic types” appears instead of “Atomic types” (e.g., Figures 3–7, and throughout the results and discussion). Please revise for consistency.

2. You mention that the model shows slightly increased error for high-density glasses. Did you consider techniques such as data reweighting, oversampling, or distribution-aware loss functions to improve predictions in this region?

3. Given that the model was trained exclusively on oxide glasses, how well would it perform when applied to fluoride, chalcogenide, or other non-oxide systems? Have you considered or tested any domain adaptation strategies?

4. While the SHAP analysis is very well explained, do you observe any significant discrepancies between the SHAP and permutation/gain-based importance rankings? If so, it may be worth highlighting such examples for deeper insight.

5. Is the trained model or pipeline currently integrated into any materials screening tool or computational platform for glass design? A brief mention of this (or future plans) would reinforce the practical applicability.

6. You state that the final Magpie set includes 43 descriptors after filtering. For full reproducibility, I suggest including a summary of the selected descriptors in the main text or as an appendix table, in addition to Table S4.

7. Figures 2–4 and 6–7 are informative, but may benefit from slightly larger font sizes and higher resolution for clarity in print.

Author Response

First, we would like to thank the reviewer for their insightful and thoughtful comments. These comments significantly improved the quality of the manuscript. All changes in the text are highlighted in red.

1. There are multiple instances where the term “Atopic types” appears instead of “Atomic types” (e.g., Figures 3–7, and throughout the results and discussion). Please revise for consistency.

This was a typographical error and should read “Atomic”. It has been corrected.

 

2. You mention that the model shows slightly increased error for high-density glasses. Did you consider techniques such as data reweighting, oversampling, or distribution-aware loss functions to improve predictions in this region?

We thank the reviewer for raising this important point. The observation that the model's error increases for high-density glasses is an important finding of our error analysis. For this particular study, we made a conscious decision not to implement techniques such as data reweighting or oversampling. The primary objective of this work was to systematically investigate and demonstrate the hierarchical impact of physically informed feature engineering on model performance and interpretability. To create a clear and controlled comparison, we aimed to keep the training methodology and loss functions consistent across the four tested feature sets. Introducing advanced sampling methods or custom loss functions would add another variable to the experiment, potentially confounding the central conclusion that the sophistication of the features is the dominant factor in creating an accurate and physically interpretable model. Our goal was to first establish this foundational principle with a standard, robust methodology. Reweighting samples based on their density, or using a distribution-aware loss function, would be a good strategy to specifically improve the model's precision in the underrepresented high-density region. This would be particularly critical for applications involving heavy-element glasses, such as materials for radiation shielding or high-refractive-index optics. We will consider this in future research.
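As one concrete form the reweighting mentioned above could take, samples can be weighted inversely to how populated their density bin is, so that rare high-density glasses contribute more to the training loss. This is an illustrative sketch, not part of the published pipeline; the binning scheme and names are assumptions.

```python
import numpy as np

def inverse_frequency_weights(y, n_bins=10):
    """Weight each sample inversely to the population of its target-value bin.
    Rare high-density samples get large weights; common ones get small weights."""
    counts, edges = np.histogram(y, bins=n_bins)
    counts = np.maximum(counts, 1)               # guard against empty bins
    bin_idx = np.clip(np.digitize(y, edges[1:-1]), 0, n_bins - 1)
    weights = 1.0 / counts[bin_idx]
    return weights * len(y) / weights.sum()      # normalize to mean weight 1
```

Such a weight vector could then be passed to a regressor that accepts per-sample weights (e.g. via a `sample_weight` argument at fit time).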

 

An appropriate comment on this future research direction has been added to the Discussion.

 

3. Given that the model was trained exclusively on oxide glasses, how well would it perform when applied to fluoride, chalcogenide, or other non-oxide systems? Have you considered or tested any domain adaptation strategies?

The current model, trained exclusively on oxide glasses, would perform poorly and provide unreliable predictions if directly applied to other glass families such as fluorides, chalcogenides, or metallic glasses. The reason for this is fundamental. The model's predictive power comes from learning the specific physicochemical relationships between composition and density within the context of oxide systems. The nature of chemical bonding, atomic packing efficiency, and the structural roles of constituent elements are vastly different in non-oxide networks. For example, the model's learned understanding of cation-oxygen interactions is not transferable to the cation-fluorine or cation-sulfur bonds that define fluoride and chalcogenide glasses, respectively. Applying the model to these systems would force it to extrapolate far beyond its training data, which would undoubtedly lead to high prediction errors. While domain adaptation and transfer learning are powerful and highly relevant techniques, they were considered outside the scope of this specific study. Our primary objective was to first develop a robust and interpretable baseline model for a single, well-defined, and data-rich domain (oxide glasses) to validate our feature engineering and interpretability workflow. However, this is an important direction for future research.

 

4. While the SHAP analysis is very well explained, do you observe any significant discrepancies between the SHAP and permutation/gain-based importance rankings? If so, it may be worth highlighting such examples for deeper insight.

The most prominent example of this discrepancy is the oxygen (O) fraction, which allows for a deeper understanding of how the different importance metrics should be interpreted.

In the model trained on Compositional features (Figure 6a), oxygen has the highest weight (gain) by a large margin. This occurs because oxygen is the most abundant element across nearly all samples in the dataset. Consequently, the XGBoost algorithm uses the oxygen fraction very frequently for splitting nodes within its decision trees, and the gain metric reflects this high utility in building the model's internal structure. In contrast, oxygen's permutation importance is considerably lower than that of key cations like lead (Pb), boron (B), and silicon (Si). The SHAP analysis (Figure 7a) confirms this, showing that oxygen's SHAP values are tightly clustered around zero, indicating a very low direct impact on the final density prediction. This demonstrates that the model correctly learns that, although the oxygen sublattice forms the bulk of the glass volume, it is the identity of the cations that overwhelmingly dictates the final density. Therefore, shuffling the values of a heavy element like Pb (which strongly changes the glass mass) has a strong effect on the model's predictive accuracy, whereas shuffling the oxygen fraction, which varies within a much smaller relative range, has a far weaker effect.

For the Meredig and Magpie feature sets, all importance measures are in much stronger agreement, correctly identifying mean_atomic_weight as the dominant feature. This suggests that as the features become more physically informative, the model's internal logic (gain) aligns more closely with direct predictive impact (permutation/SHAP).
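The permutation importance contrasted with gain above is model-agnostic: it measures how much a metric drops when one feature's values are shuffled. A minimal sketch, using a toy linear model and synthetic data rather than the study's XGBoost pipeline:

```python
import numpy as np

def permutation_importance(model_fn, X, y, metric_fn, n_repeats=5, seed=0):
    """Mean drop in the metric when each feature is shuffled in turn.
    Features the model truly relies on cause a large drop."""
    rng = np.random.default_rng(seed)
    baseline = metric_fn(y, model_fn(X))
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            perm = rng.permutation(X.shape[0])
            X_perm[:, j] = X_perm[perm, j]       # break the feature-target link
            drops.append(baseline - metric_fn(y, model_fn(X_perm)))
        importances[j] = np.mean(drops)
    return importances

# Toy example: the target depends only on feature 0, never on feature 1
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2))
y = 3.0 * X[:, 0]
model_fn = lambda X: 3.0 * X[:, 0]               # "perfect" model using feature 0
r2 = lambda yt, yp: 1 - ((yt - yp) ** 2).sum() / ((yt - yt.mean()) ** 2).sum()
imp = permutation_importance(model_fn, X, y, r2)
```

In this toy setup, shuffling feature 0 destroys the fit while shuffling feature 1 changes nothing, mirroring the Pb-versus-O contrast described above.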

 

The comment has been added to the Discussion.

 

5. Is the trained model or pipeline currently integrated into any materials screening tool or computational platform for glass design? A brief mention of this (or future plans) would reinforce the practical applicability.

While the model is not currently integrated into a production-level computational platform, the work presented here is the foundational step toward that goal. The validated framework and the complete source code have been made openly available to facilitate use and integration by the wider research community.

 

6. You state that the final Magpie set includes 43 descriptors after filtering. For full reproducibility, I suggest including a summary of the selected descriptors in the main text or as an appendix table, in addition to Table S4.

A short description of the selected features has been added to Table S4.

 

7. Figures 2–4 and 6–7 are informative, but may benefit from slightly larger font sizes and higher resolution for clarity in print.

All the figures have been improved.

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

The revised manuscript fully resolves all the raised questions with clarity and understanding, benefiting the readers. It qualifies for publication.
