Peer-Review Record

Two-Level Monitoring System for Preventing Academic Failure, Based on Predictive Models and SHAP Analysis

Educ. Sci. 2026, 16(6), 842; https://doi.org/10.3390/educsci16060842
by Roman V. Esin and Tatiana A. Kustitskaya *
Reviewer 1:
Reviewer 2: Anonymous
Submission received: 21 April 2026 / Revised: 21 May 2026 / Accepted: 22 May 2026 / Published: 27 May 2026
(This article belongs to the Section Higher Education)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

This work presents a two-level monitoring system for predicting at-risk students at a university. The paper addresses a practically relevant problem in higher education. However, the manuscript has significant methodological shortcomings, limited novelty in its technical approach, and insufficient experimental rigor, which must be addressed before the work can be considered for publication.

Introduction

Lines 62–77: The Introduction devotes considerable space to discussing clustering methods in educational data science (e.g., identifying engagement patterns, profiling low-engagement students) and their interpretability limitations. However, the study itself employs exclusively supervised classification methods with no clustering component. The transition from clustering literature to the authors' supervised learning approach (Lines 78 onwards) is not clearly motivated. This narrative structure may mislead readers into expecting a clustering-based or hybrid unsupervised-supervised approach. The authors should either tighten this section to more directly motivate the use of supervised tree-based models with SHAP or explicitly explain why the clustering literature is relevant to framing their supervised learning contribution.

Methods (Section 2.3)

Lines 225–226: The choice of CatBoost, XGBoost, LightGBM, and Random Forest is presented without justification. The authors should explain why these specific models were selected and why other commonly used approaches in educational data mining (e.g., neural networks, SVMs, or simpler interpretable models such as single decision trees) were excluded.

Line 227: The use of "default hyperparameters" in the initial modeling stage is stated without explanation, even though default hyperparameters can vary substantially across library versions and may not be appropriate for the specific dataset characteristics.

Lines 229–231: Logistic regression was eliminated based on a single evaluation with default hyperparameters, with an F1-score of 0.64 labeled as "poor" without any defined threshold or baseline for comparison.

Lines 238–239: The authors employ stepwise forward feature selection based on the Akaike Information Criterion (AIC). AIC is designed for parametric, likelihood-based models (e.g., linear regression, GLMs) where the number of estimable parameters (k) is well-defined. Tree-based ensemble models are non-parametric and do not have a fixed parameter count, nor do they straightforwardly yield a likelihood function suitable for AIC computation. The authors should: (a) clarify precisely how AIC was computed for each tree-based model (i.e., what was used as the likelihood and how k was defined); (b) justify why AIC was chosen over model-native feature selection methods well-suited for tree-based models, such as permutation importance, recursive feature elimination, or SHAP-based feature selection; (c) explain how feature selection and hyperparameter tuning interact within the nested cross-validation to prevent information leakage.
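
For reference, the standard definition behind this comment can be written out as follows. This is a sketch only: it assumes that a binary classifier's predicted probabilities \hat{p}_i are used to form the likelihood, which this record does not confirm for the manuscript.

```latex
\mathrm{AIC} = 2k - 2\ln\hat{L},
\qquad
\ln\hat{L} = \sum_{i=1}^{n} \bigl[\, y_i \ln \hat{p}_i + (1 - y_i)\ln(1 - \hat{p}_i) \,\bigr]
```

Here k is the number of estimated parameters and y_i is the observed binary label. The log-likelihood term can be evaluated for any probabilistic classifier, but k has no agreed definition for a tree ensemble, which is the gap that point (a) asks the authors to close.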

Lines 244–249: The custom penalty function penalizes Random Forest models for having more than 200 trees, maximum depth greater than 10, minimum samples to split below 5, etc. These thresholds appear to be arbitrary engineering choices with no theoretical or empirical justification provided. An academic paper should justify why 200 estimators is the cutoff rather than 150 or 300, why depth 10 rather than 8 or 15, and so on. Are similar penalty functions applied to CatBoost, XGBoost, and LightGBM? If so, the thresholds for all models should be reported; if not, the asymmetric treatment requires justification.
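
To make the concern concrete, a threshold-based penalty of the kind described could look like the sketch below. Only the three Random Forest thresholds come from the comment above. The function name `complexity_penalty`, the unit weight per violation, and the additive form are assumptions for illustration, not the authors' implementation.

```python
# Thresholds quoted in the reviewer comment; everything else is hypothetical.
RF_THRESHOLDS = {
    "n_estimators": ("max", 200),      # penalize more than 200 trees
    "max_depth": ("max", 10),          # penalize depth greater than 10
    "min_samples_split": ("min", 5),   # penalize fewer than 5 samples to split
}


def complexity_penalty(hyperparams, thresholds=RF_THRESHOLDS, weight=1.0):
    """Add `weight` for every hyperparameter on the wrong side of its threshold."""
    penalty = 0.0
    for name, (kind, limit) in thresholds.items():
        value = hyperparams.get(name)
        if value is None:
            # Simplification: unset values are skipped, although an unlimited
            # depth would arguably deserve a penalty as well.
            continue
        if (kind == "max" and value > limit) or (kind == "min" and value < limit):
            penalty += weight
    return penalty


small = complexity_penalty({"n_estimators": 150, "max_depth": 8, "min_samples_split": 5})
large = complexity_penalty({"n_estimators": 300, "max_depth": 15, "min_samples_split": 2})
```

Keeping the thresholds in one table per model, as in `RF_THRESHOLDS`, is one way to make them reportable and comparable across CatBoost, XGBoost, LightGBM, and Random Forest.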

Lines 252–254: Feature importance is reported using different default metrics for each package (PredictionValuesChange for CatBoost, Gain for XGBoost, Split for LightGBM, Mean Decrease in Impurity for Random Forest). These metrics measure fundamentally different quantities and are not directly comparable, making the cross-model comparison of feature rankings in Tables 2 and 3 potentially misleading. A uniform metric (e.g., permutation importance) should be used for fair cross-model comparison, or this limitation should be explicitly acknowledged when interpreting Tables 2 and 3.
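
The uniform metric suggested here can be sketched in a few lines. The example below is a plain-Python illustration of permutation importance, not the authors' code: the toy data, the `toy_model` stand-in, and the choice of accuracy as the score are all assumptions. In practice a library routine such as scikit-learn's `permutation_importance` would normally be used.

```python
import random


def accuracy(model, X, y):
    """Fraction of rows where the model's prediction matches the label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=20, seed=0):
    """Mean drop in accuracy when each feature column is shuffled.

    `model` is any callable mapping a feature row to a predicted label,
    so the same metric can be applied to every model being compared.
    """
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Toy data (hypothetical): the label depends only on feature 0.
data_rng = random.Random(42)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = [int(row[0] > 0.5) for row in X]


def toy_model(row):
    # Stand-in for a fitted classifier; it uses feature 0 only.
    return int(row[0] > 0.5)


scores = permutation_importance(toy_model, X, y)
```

Because the score is computed from predictions alone, the same function works unchanged for any of the four model families, which is what makes the resulting rankings comparable.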

Methods (Section 2.4)

Lines 278–284: The authors use RandomOverSampler to address class imbalance (17% vs. 83%) but provide insufficient detail: (a) What resampling ratio was used? Was the minority class fully balanced to 50:50? (b) Was oversampling performed only within training folds of the nested cross-validation to prevent data leakage, or was the entire training set oversampled before cross-validation? (c) The justification for rejecting SMOTE that "most predictors are non-numeric" is not substantiated. The authors should report the actual ratio of categorical to numeric features, and note that SMOTE-NC is designed to handle mixed feature types. (d) No assessment of the quality of the oversampled data is reported (e.g., effect on decision boundaries, comparison of model performance with and without oversampling).
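
Point (b) can be illustrated with a minimal sketch in which oversampling is applied only to the training part of each fold. This is a plain-Python stand-in for a random oversampler, not the authors' pipeline: the helper names, the 50:50 target ratio, the five folds, and the toy data are assumptions.

```python
import random


def random_oversample(rows, labels, rng):
    """Duplicate minority-class rows at random until all classes are equal in size."""
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def k_fold_indices(n, k, rng):
    """Shuffle indices 0..n-1 and split them into k roughly equal folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def leakage_free_folds(rows, labels, k=5, seed=0):
    """Return one (train_rows, train_labels, test_rows, test_labels) tuple per fold.

    Only the training part is oversampled; the held-out fold keeps its
    original class ratio and shares no duplicated rows with training.
    """
    rng = random.Random(seed)
    folds = k_fold_indices(len(rows), k, rng)
    splits = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        tr_rows, tr_labels = random_oversample(
            [rows[j] for j in train_idx], [labels[j] for j in train_idx], rng
        )
        te_rows = [rows[j] for j in test_idx]
        te_labels = [labels[j] for j in test_idx]
        splits.append((tr_rows, tr_labels, te_rows, te_labels))
    return splits


# Toy data (hypothetical) with the 17% / 83% split quoted in the comment.
rows = [[i] for i in range(100)]
labels = [1] * 17 + [0] * 83
splits = leakage_free_folds(rows, labels)
```

Oversampling the whole dataset before splitting would instead place copies of the same minority row on both sides of a split, which inflates the validation scores.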

Lines 265–274 (general): The Level-2 model uses different features from the Level-1 model but the same modeling pipeline. Given that the only structural difference between the two levels is the target variable and the available feature set, the authors should more clearly justify the need for a two-level architecture rather than a single model with appropriately timed features.

Results and Discussion

Lines 350–362: The variables Semester_Spring and Year_2023 rank as the top two predictors across all Level-1 models by SHAP feature importance. These are calendar indicators, not student-level characteristics. Their dominance suggests the models may be capturing temporal or cohort-specific effects (e.g., policy changes, curriculum restructuring, post-COVID adjustments) rather than generalizable risk factors. The authors should discuss how these temporal confounds affect the generalizability of both the predictive models and the derived risk profiles to future academic years.

Line 745: While the authors briefly acknowledge in Line 745 that SHAP is not a causal inference tool, the discussion in Sections 3.2, 3.4, and 4.4 repeatedly frames findings in implicitly causal terms—for example, recommending "restoring minimal engagement with the learning environment" as an intervention strategy (Line 743). SHAP identifies predictive associations, not causal mechanisms. The absence of LMS activity could be a symptom rather than a cause of dropout risk (e.g., students who have already decided to leave would naturally stop logging in). The paper should more carefully and consistently distinguish between predictive associations and actionable causal factors throughout.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

Dear authors,

The manuscript addresses a relevant topic; however, it would benefit from a stronger connection between the analytical results and their practical and educational implications.

Comments for author File: Comments.pdf

Comments on the Quality of English Language

The overall quality of the article is good, but I would suggest an additional revision of the language to improve the clarity, fluidity and consistency of the text.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

The authors have undertaken a substantial and commendable revision. The majority of concerns raised in the first round have been thoroughly addressed, and the manuscript is significantly improved in methodological rigor, transparency, and reproducibility. I appreciate the constructive engagement with the review process.

The following minor issues should be corrected before publication:

  1. Table 5 column header error (p. 11): The third column of Table 5 is labeled "CatBoost," but the first column already bears this label. Based on the response letter and context, this column should read "LightGBM." This must be corrected to avoid misinterpretation of the dropout prediction results.
  2. Cross-reference error (Lines 297–298): The text states "penalty(hyperparams) penalizes model complexity using the thresholds described in Section 2.3.1." However, there is no Section 2.3.1. Please correct the section reference.
  3. Comment 5(b) from Round 1: The authors' response did not explicitly address why the AIC-like penalty approach was preferred over model-native feature selection methods (permutation importance, recursive feature elimination, SHAP-based selection) that are well-established for tree-based ensembles. Given that the empirical results in Tables 2 and 4 demonstrate effective feature reduction with maintained performance, this is no longer a critical concern. However, a brief sentence in Section 2.2.2 acknowledging these alternatives and noting the rationale for the chosen approach would improve methodological completeness.

Overall, the manuscript now presents a well-designed, reproducible, and practically relevant framework for monitoring at-risk students. 

Author Response

We sincerely thank you for the constructive feedback, the time dedicated to evaluating our manuscript, and the positive assessment of our study. Below, we provide a point-by-point response to all comments.

Comments 1: Table 5 column header error (p. 11): The third column of Table 5 is labeled "CatBoost," but the first column already bears this label. Based on the response letter and context, this column should read "LightGBM." This must be corrected to avoid misinterpretation of the dropout prediction results.

Response 1: Thank you for pointing out this error. The header of the third column in Table 5 has been corrected to "LightGBM" as suggested.

Comments 2: The text states "penalty(hyperparams) penalizes model complexity using the thresholds described in Section 2.3.1." However, there is no Section 2.3.1. Please correct the section reference.

Response 2: We apologize for this oversight. The incorrect cross-reference has been fixed. The text now correctly references Section 2.2.1.

Comments 3: Comment 5(b) from Round 1: The authors' response did not explicitly address why the AIC-like penalty approach was preferred over model-native feature selection methods (permutation importance, recursive feature elimination, SHAP-based selection) that are well-established for tree-based ensembles. Given that the empirical results in Tables 2 and 4 demonstrate effective feature reduction with maintained performance, this is no longer a critical concern. However, a brief sentence in Section 2.2.2 acknowledging these alternatives and noting the rationale for the chosen approach would improve methodological completeness.

Response 3: We appreciate your constructive feedback on this methodological point. Following this suggestion, we have expanded the discussion in Section 2.2.2 (immediately following Table 2). We added a passage that explicitly acknowledges well-established alternatives and provides the rationale for selecting the AIC-like penalty approach in our framework. The newly added text has been highlighted in red in the revised manuscript.

Reviewer 2 Report

Comments and Suggestions for Authors

The reviewer's suggestions were considered and incorporated into the manuscript, resulting in significant improvements in writing clarity and flow.

However, Figure 2, including items (a), (b), and (c), should be revised and improved to ensure greater sharpness, visual quality, and clarity in the presentation of information.

Author Response

We sincerely appreciate your time, positive evaluation, and constructive feedback. A response to your comments is provided below.

Comments 1: The reviewer's suggestions were considered and incorporated into the manuscript, resulting in significant improvements in writing clarity and flow. However, Figure 2, including items (a), (b), and (c), should be revised and improved to ensure greater sharpness, visual quality, and clarity in the presentation of information.

Response 1: Thank you for your valuable comment. For the final publication, we have prepared all figures, including Figure 2, in high resolution, and have submitted the original high-quality files separately to the editor. Additionally, we have clarified in the revised manuscript that the conclusions regarding feature importance are explicitly based on the analysis of Figure 2, thereby improving the transparency and interpretability of the presented results. The newly added text has been highlighted in red in the revised manuscript.
