by Songbo Wang* and Jiayi He

Reviewer 1: William J. Marin-Rodriguez Reviewer 2: Anonymous Reviewer 3: Anonymous

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

Attached file

Comments for author File: Comments.pdf

Comments on the Quality of English Language

 The English could be improved to more clearly express the research.

Author Response

List of changes

Responses to reviewers’ comments

 

 

Paper Title: Evaluating and forecasting undergraduate dropouts using machine learning for domestic and international students

 

Name(s) of Author(s): Songbo Wang and Jiayi He

 

 

Dear Editor,

 

Thank you for sending the reviews for our paper “Evaluating and forecasting undergraduate dropouts using machine learning for domestic and international students” and for the comments of the three reviewers.

We have revised the paper following consideration of the reviewers’ comments, and this document details our response to each of their comments.

 

Yours sincerely,

Songbo Wang (corresponding author).

 

 

 

Reviewer #1

 

Part 1:

Section 1: Summary of the Manuscript and Overall Recommendation

This manuscript addresses the important and challenging problem of predicting undergraduate student dropout, commendably distinguishing between domestic and international student cohorts. The authors utilize a dataset from Portuguese institutions and employ machine learning models to identify key predictive factors, with a particular focus on academic performance in the first two semesters. Strengths of the work include the development of an interactive, open access web application for practical use, which is a valuable contribution toward translating research into practice, and the application of Explainable AI (XAI) techniques (SHAP, ICE, PDP) to interpret model predictions and identify key drivers of dropout risk.

We sincerely appreciate the reviewer’s recognition of our work. We are pleased that the distinction between domestic and international students, as well as the use of the Portuguese dataset and machine learning models, was well received. We also value your positive feedback on the analysis of academic performance and dropout risk, and our application of deep learning and XAI techniques to provide clear, meaningful insights. Furthermore, we are grateful that you acknowledged the practical contribution of our interactive web application. Your feedback has been instrumental in improving and refining our research. Thank you again for your valuable suggestions. [Highlighted in red.]

 

  • Core Assessment: While the research question is well-motivated and the practical goals are laudable, the manuscript in its current form suffers from several fundamental methodological and conceptual flaws that preclude its publication in a high-impact journal. The most significant issues are a persistent and inaccurate framing of the core methodology as "deep learning," and an insufficiently validated, high-risk data augmentation and modeling strategy for the international student cohort. These issues undermine the scientific framing of the paper and the reliability of a significant portion of its conclusions.

Thank you to the reviewer for recognizing the importance of our research question and for providing valuable feedback on the flaws in the manuscript. We acknowledge that there are indeed methodological and conceptual issues, particularly the incorrect application of the "deep learning" concept. As a result, we have revised the Title, Abstract, Keywords, and the entire manuscript to ensure a more accurate description of the machine learning framework, rather than "deep learning".

Additionally, in Section 4, we have provided a detailed validation of the data augmentation and modeling strategy for the international student cohort to ensure the effectiveness of the international student model.

These revisions significantly strengthen the scientific framework and methodology of the paper, making it more rigorous and reliable. Once again, we appreciate the reviewer’s valuable feedback.

 

Section 2: Major Concerns Requiring Fundamental Revision

This section details the critical issues that must be addressed for the manuscript to be reconsidered.

Thank you for outlining the critical issues. We understand their importance and have made revisions to address the key concerns, including refining the methodology and improving data validation. We believe these changes resolve the main issues and enhance the manuscript. We appreciate your feedback and look forward to resubmitting the revised version.

 

Fundamental Misclassification of the Core Methodology as "Deep Learning"

2.1) The manuscript's central claim of using "deep learning" (DL) is fundamentally incorrect and must be revised throughout the paper. The title, abstract, keywords, and main text repeatedly refer to "deep learning" and an "Automated Deep Learning (AutoDL) framework". However, the models identified as the most effective and used for the final analysis, Light Gradient Boosting Machine (LightGBM) for domestic students and CatBoost for international students, are not deep learning models.

We thank the reviewers for highlighting the mischaracterization of our methodology as “deep learning.” As correctly noted, the predictive models used in our final analysis, LightGBM for domestic students and CatBoost for international students, are gradient boosting machine learning methods rather than deep learning.

The term “deep learning” was originally adopted because CTGAN, used for data augmentation, is based on generative adversarial networks, but this was misleading since the predictive stage relied on tree-based machine learning models.

We have therefore revised the manuscript throughout: the Title, Abstract, Keywords and Section 2 now refer to “automated machine learning (AutoML)”, and the methodology section clearly distinguishes between CTGAN’s neural-network-based augmentation and the machine learning models used for prediction. These corrections improve both the accuracy and clarity of the manuscript.

 

2.2) Technical Clarification: LightGBM and CatBoost are highly optimized and powerful implementations of Gradient Boosted Decision Trees (GBDTs). GBDTs are a form of ensemble machine learning where multiple weak learners (decision trees) are trained sequentially to correct the errors of their predecessors. This is a distinct and well-established class of algorithms. Deep learning, conversely, refers specifically to artificial neural networks characterized by multiple hidden layers (i.e., deep architecture) that learn hierarchical representations of data. While GBDTs can be used in conjunction with deep learning models (for example, as a classifier operating on features extracted by a neural network), this work uses them as the primary predictive models on structured, tabular data, which is a classic machine learning application, not a deep learning one.
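To make the reviewer's distinction concrete: a GBDT trains decision trees sequentially, each one correcting its predecessors' errors, with no hidden layers involved. The sketch below (illustrative scikit-learn code on synthetic tabular data, not the manuscript's pipeline) shows the test loss falling as trees are added:

```python
# Minimal sketch (not the authors' pipeline): gradient-boosted decision trees
# on tabular data with scikit-learn, illustrating that boosting is an
# ensemble of sequential trees, not a multi-layer neural network.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

# Synthetic tabular data standing in for the student dataset.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

gbdt = GradientBoostingClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

# Each successive tree corrects its predecessors, so the held-out loss
# shrinks as the ensemble grows.
losses = [log_loss(y_te, proba) for proba in gbdt.staged_predict_proba(X_te)]
print(f"loss after 1 tree: {losses[0]:.3f}, after 100 trees: {losses[-1]:.3f}")
```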

This is a highly valuable suggestion. We recognize that the original manuscript contained ambiguous descriptions of machine learning (ML) and deep learning (DL), which failed to accurately reflect the overall research framework and therefore required improvement.

Initially, we highlighted the innovative use of the DL-based CTGAN to address the limited sample size of international students, but we overlooked the fact that the overarching research framework was essentially ML.

In response to the comments from Reviewers #1, #2, and #3, we have revised the methodological descriptions in Section 2 to explicitly clarify that the core framework of this study is ML, with DL techniques (CTGAN) incorporated specifically for data augmentation. This revision effectively avoids conceptual confusion and strengthens both the accuracy and logical rigor of the manuscript.

 

2.3) Likely Source of Error and Implications: This mischaracterization appears to stem from a misunderstanding of the PyCaret library used in the study. The authors refer to an "AutoDL" approach, but PyCaret is a low-code Automated Machine Learning (AutoML) framework. AutoML platforms automate the process of comparing a wide range of machine learning models, including GBDTs, logistic regression, random forests, and others, to find the best performer for a given task. The authors have seemingly used a function like compare_models() which identified GBDTs as the top performers, and then incorrectly labeled this entire automated workflow with the more fashionable but inaccurate "deep learning" moniker. This is not a mere semantic issue; it represents a foundational conceptual error that undermines the scientific framing and credibility of the paper. It suggests a lack of familiarity with the very techniques central to the study's contribution and could mislead readers unfamiliar with the field about the state of the art for this type of predictive problem. For a high impact journal, such a fundamental mistake signals a lack of rigor.

We thank the reviewers for their insightful comments. The original manuscript inaccurately characterized our methodology as “deep learning.” In fact, the core predictive framework is based on machine learning, with GBDTs (LightGBM and CatBoost) identified as the best-performing models through PyCaret, a low-code AutoML platform. The only deep learning component is CTGAN, which was used exclusively for data augmentation.

In the revised manuscript, we have consistently replaced AutoDL with AutoML and clarified the distinction between ML and DL in Section 2.2. These revisions remove conceptual ambiguity, improve accuracy, and enhance the rigor of the study without affecting the results or conclusions.

 

2.3.1) Required Action: The authors must systematically re-frame the entire manuscript.

  • Replace all instances of "deep learning" and "AutoDL" with accurate descriptors such as "ensemble machine learning," "gradient boosting," or "automated machine learning (AutoML)."
  • Revise the title, abstract, keywords, introduction, and methodology sections to accurately reflect that the study compares multiple machine learning models using an AutoML framework and finds that GBDT-based algorithms provide the best performance.
  • The narrative should be adjusted to correctly position the work within the machine learning literature, not the deep learning literature.

In summary, we have carefully revised the manuscript to address the critical issues raised by the reviewers regarding the mischaracterization of the methodology as "deep learning". Specifically, we have revised these aspects in detail to rectify the methodological mislabeling and align all content with the actual research design:

  1. Replaced all instances of deep learning and AutoDL with accurate terms such as machine learning (ML) and automated machine learning (AutoML).
  2. Revised the Title, Abstract, Keywords, Introduction and Section 2 to reflect the correct methodology, which focuses on comparing multiple machine learning models using an AutoML framework, with GBDT-based algorithms (LightGBM and CatBoost) identified as the top performers.
  3. In Section 2.1, clarified the distinction between machine learning for prediction and the use of deep learning (CTGAN) exclusively for data augmentation.
  4. Repositioned the manuscript within the machine learning literature, ensuring that the work is framed appropriately.

These revisions address the foundational concerns raised by the reviewers, eliminate conceptual ambiguity, and enhance the clarity and scientific rigor of the manuscript.

 

Insufficient Validation of Data Augmentation and the "Train on Synthetic, Test on Real" (TSTR) Paradigm

 

2.4) The study confronts a severe data scarcity problem for the international student cohort, with only 86 real samples available. The authors' solution, using CTGAN to generate 487 synthetic samples for training while reserving all 86 real samples for testing, is known as the "Train on Synthetic, Test on Real" (TSTR) paradigm. This approach is methodologically perilous and requires an exceptionally high standard of evidence to be considered valid, a standard this manuscript does not meet.

We thank the reviewer for this insightful comment and fully acknowledge the concern regarding the methodological risks of the TSTR paradigm. The extremely limited number of real samples (n = 86) indeed posed a significant challenge for model development, as traditional train-test splits would have resulted in even smaller training sets and unstable results.

Our motivation for applying CTGAN was to explore whether synthetic data could provide supplementary training signals under such severe scarcity. However, we recognize that synthetic data cannot fully replace real-world data, and the validity of TSTR requires careful justification. To address this concern, we have taken the following steps in the revised manuscript:

  1. In Section 4.1, we have added a comparative analysis between the "Train on Real, Test on Real" (TRTR) and TSTR paradigms to evaluate the effectiveness of TSTR. The CatBoost model for TRTR achieved an accuracy of 0.8518, lower than the 0.9004 accuracy achieved using TSTR. This suggests that TSTR can enhance the model's classification capability.
  2. In Section 4.2, we have added comparative experiments (e.g., training only on the small real dataset as well as mixed synthetic-real scenarios) to contextualize the performance of the TSTR approach.
  3. In Section 5 (Discussion), we explicitly state that the use of TSTR is an exploratory approach rather than a definitive validation strategy.
  4. In Section 6 (Conclusions), we expanded the content to emphasize the risks and limitations of relying on synthetic training data and cautioned readers against overinterpreting the results.

These revisions do not change the core findings but ensure that the claims are presented with greater caution and transparency.
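The TRTR-versus-TSTR comparison described above can be sketched as follows. This is a minimal illustration only: a per-class Gaussian sampler stands in for CTGAN (which requires a separate package), and the dataset and sample sizes are hypothetical placeholders chosen to mirror the 86-real/487-synthetic setup:

```python
# Sketch of a TRTR-vs-TSTR comparison. A per-class Gaussian sampler stands in
# for CTGAN; the data, generator, and sizes are illustrative, not the study's.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X_real, y_real = make_classification(n_samples=86, n_features=8, random_state=0)

# TRTR baseline: cross-validated accuracy on the 86 real samples only.
trtr_acc = cross_val_score(GradientBoostingClassifier(random_state=0),
                           X_real, y_real, cv=10).mean()

def sample_synthetic(X, y, n_per_class):
    """Naive generator: sample each class from a fitted multivariate Gaussian."""
    Xs, ys = [], []
    for c in np.unique(y):
        Xc = X[y == c]
        mean, cov = Xc.mean(axis=0), np.cov(Xc, rowvar=False)
        Xs.append(rng.multivariate_normal(mean, cov, size=n_per_class))
        ys.append(np.full(n_per_class, c))
    return np.vstack(Xs), np.concatenate(ys)

X_syn, y_syn = sample_synthetic(X_real, y_real, n_per_class=244)  # ~487 rows

# TSTR: train on synthetic data only, test on all real samples.
tstr_model = GradientBoostingClassifier(random_state=0).fit(X_syn, y_syn)
tstr_acc = tstr_model.score(X_real, y_real)
print(f"TRTR (10-fold CV on real): {trtr_acc:.3f}  TSTR: {tstr_acc:.3f}")
```

Comparing the two numbers side by side, as the revised Section 4 does, is what allows readers to judge whether the synthetic data helps or hurts.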

 

2.5) Critique of Synthetic Data Validation: The validation of the synthetic data is superficial and inadequate. The authors provide visual comparisons of density distributions (Figure 2) and correlation matrices (Figure 3). While these are useful as a preliminary check, they are insufficient to prove the utility and fidelity of the synthetic data. Best practices in the field demand more rigorous, quantitative validation techniques. These include:

  1. Quantitative Statistical Comparisons: Using statistical tests like the two-sample Kolmogorov-Smirnov test or divergence measures (e.g., Jensen-Shannon) to formally quantify the distributional similarity between the real and synthetic datasets for each feature.
  2. Model-based Utility Testing: A crucial test of synthetic data is its utility for a downstream machine learning task. The authors have not demonstrated that a model trained on their synthetic data can produce substantively similar results (e.g., similar feature importance rankings, similar performance metrics) to a model trained on real data.
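The first technique can be sketched in a few lines with scipy; the two samples below are illustrative stand-ins, not the study's data:

```python
# Sketch of a per-feature two-sample Kolmogorov-Smirnov test quantifying
# real-vs-synthetic distributional similarity. Arrays are illustrative.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)
real = rng.normal(loc=23.0, scale=7.0, size=86)       # e.g. an Age-like feature
synthetic = rng.normal(loc=22.9, scale=6.5, size=487)

stat, p_value = ks_2samp(real, synthetic)
# A large p-value means the test finds no evidence that the two samples
# come from different distributions.
print(f"KS statistic = {stat:.3f}, p = {p_value:.3f}")
```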

We thank the reviewer for this constructive critique and fully agree that our initial validation of the synthetic data relied too heavily on visual comparisons, which are only preliminary checks. We recognize that rigorous, quantitative validation is necessary to properly assess the fidelity and utility of synthetic data.

In the revised manuscript, we strengthened validation in two major ways:

  • Quantitative Statistical Comparison: In Section 3.2, we computed Jensen-Shannon divergence scores for each feature, and reported the results in Table 3, to formally quantify the distributional similarity between real and synthetic datasets.
  • Model-based Utility Validation: We introduced three models in total. For the international student cohort, we conducted a two-model comparison: one model was trained and tested on real data, while another was trained on synthetic data and tested on real data. By comparing the performance metrics and consistency of these two models, we provide stronger evidence of the effectiveness of CTGAN-based data augmentation. The specific modifications can be found in Figure 5, where the newly added AUC curves illustrate the comparative results more clearly.

These additions provide a more rigorous validation framework. While they do not alter the core conclusions of our study, they increase the robustness and credibility of the synthetic data evaluation.

 

2.6) Critique of the TSTR Paradigm: The TSTR approach is fraught with risk due to the inherent "domain gap" or distribution shift that almost always exists between synthetic and real data. A generative model like CTGAN, especially when trained on a minuscule dataset of only 86 samples, is highly unlikely to capture the true, complex underlying data distribution perfectly. It may fail to model complex interactions, miss rare but important subgroups, or introduce its own artifacts and biases. A model trained exclusively on this imperfect synthetic data is likely to learn spurious correlations that do not generalize to the real test set, leading to unreliable performance. The reported high test accuracy of over 0.90 for the international student model is therefore highly suspect and potentially misleading. With a test set of only 86 instances, this high accuracy could easily be an artifact of chance. More worrisomely, it could indicate that the CTGAN generator produced a simplified or biased version of the data that happens to fit the small real test set well, but would not generalize to a larger, unseen real-world population. The manuscript presents this high accuracy as a major success without any critical discussion of the profound challenges and potential for failure in the TSTR setting. This lack of critical self-assessment is a significant weakness. Consequently, all downstream conclusions for international students, including the SHAP-based feature importance rankings, rest on this unstable foundation and may be invalid.

We appreciate the reviewer’s insightful critique regarding the TSTR paradigm. We fully understand the inherent "domain gap" or distribution shift between synthetic and real data. As pointed out by the reviewer, training CTGAN on such a small dataset of only 86 samples makes it highly unlikely to perfectly capture the true, complex underlying data distribution. This can lead to the model failing to capture complex interactions, missing rare but important subgroups, or introducing artifacts and biases of its own.

Consequently, a model trained solely on this imperfect synthetic data is likely to learn spurious correlations that do not generalize well to the real test set, resulting in unreliable performance. In particular, the reported high test accuracy (over 0.90) for the international student model is highly suspect: it could easily be a result of chance, or it could suggest that the CTGAN generator has produced a simplified or biased version of the data that fits the small real test set but would not generalize to a larger or unseen real-world population.

We fully agree with the reviewer that the manuscript's lack of critical discussion and self-assessment regarding the potential risks and limitations of the TSTR paradigm is a significant weakness. In response to this critique, the revised manuscript now includes a more comprehensive comparison in Section 4.1 and Section 4.2 between the "Train on Real, Test on Real" (TRTR) and "Train on Synthetic, Test on Real" (TSTR) paradigms. The results show AUC values of 0.93 for TRTR and 0.94 for TSTR, and both curves are presented side by side in Figure 5 for direct comparison. This revision allows readers to better assess the relative strengths and weaknesses of the two approaches and highlights the limitations of relying solely on the TSTR paradigm.

We emphasize that the TSTR paradigm cannot replace real data and that its role in this study is exploratory, intended to evaluate the potential of synthetic data augmentation rather than to draw definitive conclusions. While the close alignment of TRTR and TSTR results suggests that CTGAN-generated data can somewhat approximate the real distribution, we acknowledge that this evidence is preliminary, and further validation with larger datasets will be required in future research.

 

2.6.1) Required Action: The authors must fundamentally strengthen and re-evaluate their entire approach for the international student cohort.

  1. Establish a Baseline: Before making any claims about the utility of synthetic data, a baseline performance must be established. The authors should train and evaluate a model (e.g., Logistic Regression or a GBDT) using rigorous cross-validation (given the small N, Leave-One-Out or repeated 10-fold CV is appropriate) on the 86 real samples only. This result will serve as the benchmark against which any data augmentation strategy must be compared.
  2. Rigorous Synthetic Data Validation: The authors must provide quantitative evidence of the synthetic data's quality. This should include statistical tests of distributional similarity. A "Train on Real, Test on Real" vs. "Train on Synthetic, Test on Real" comparison is essential.
  3. Justify TSTR: The manuscript must include a new section that explicitly discusses the limitations and known risks of the TSTR paradigm, citing relevant literature. The authors must provide a compelling argument for why their results should be considered reliable despite these risks.
  4. Ablation Study: To understand the impact of the synthetic data, an ablation study is needed.

The authors should show how the model's performance on the real test set changes as the proportion of synthetic data in the training set increases (e.g., 100% real, 75% real/25% synthetic,..., 100% synthetic). This would provide crucial insight into whether the synthetic data is genuinely helpful or is introducing noise. If performance degrades as more synthetic data is added, the entire approach is invalid.
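Such an ablation can be sketched as follows; this is illustrative scikit-learn code in which noisy copies of real rows stand in for CTGAN output, and all data are synthetic placeholders rather than the study's:

```python
# Sketch of the requested ablation: vary the share of synthetic rows in the
# training set and track accuracy on a held-out real test set.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=400, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# Stand-in "synthetic" data: real rows plus noise (a real study would use CTGAN).
X_syn = X_tr + rng.normal(scale=0.3, size=X_tr.shape)
y_syn = y_tr

results = {}
for frac_syn in (0.0, 0.25, 0.5, 0.75, 1.0):
    n_syn = int(frac_syn * len(X_tr))
    n_real = len(X_tr) - n_syn
    X_mix = np.vstack([X_tr[:n_real], X_syn[:n_syn]])
    y_mix = np.concatenate([y_tr[:n_real], y_syn[:n_syn]])
    model = GradientBoostingClassifier(random_state=0).fit(X_mix, y_mix)
    results[frac_syn] = model.score(X_te, y_te)  # accuracy on real test data

for frac, acc in results.items():
    print(f"{int(frac * 100):3d}% synthetic -> real-test accuracy {acc:.3f}")
```

If accuracy degrades monotonically as the synthetic fraction grows, that is direct evidence the generator is adding noise rather than signal.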

We appreciate the reviewer’s constructive suggestions and have strengthened the manuscript accordingly.

Firstly, Section 3.2 introduces Table 3, which reports distributional similarity between real and synthetic data using Jensen–Shannon Divergence (JSD), providing quantitative validation of the CTGAN-generated data.

In Section 4.1, we present the TRTR model, which achieved an accuracy of 0.8518, compared to the TSTR model, which attained a higher accuracy of 0.9004. This result demonstrates that the TSTR paradigm can improve the model’s predictive performance.

In Section 4.2, we added a baseline “Train on Real, Test on Real” (TRTR) model using the 86 real samples. Its AUC curve is shown in Figure 5(c), with an AUC value of 0.93. We also compared this directly with the “Train on Synthetic, Test on Real” (TSTR) paradigm, which achieved an AUC of 0.94. The side-by-side curves illustrate that the two approaches yield comparable performance, highlighting the practical utility of TSTR while also acknowledging its limitations.

In Section 4.3, we added a SHAP analysis.

 

Section 3: Minor Comments for Manuscript Improvement

3.1) Manuscript Presentation and Clarity

3.1.1) Broken Cross-References: The manuscript is filled with broken cross-references that appear as "Error! Reference source not found.". This occurs on lines 114, 163, 176, 223, 243, 250, 269, 274, 296, 307, 314, 326, 346, 367, 371, and 376, among others. This level of error is unprofessional and makes the paper difficult to read. All references to figures, tables, and sections must be meticulously checked and corrected.

We thank the reviewer for noting the broken cross-references. In the revised manuscript, we have carefully checked and corrected all figure, table, and section references to ensure accuracy.

 

3.1.2) Figure and Table Numbering: There are multiple figures labeled "Figure 1" (on pages 4 and 16) and other numbering inconsistencies. All figures and tables must be numbered sequentially throughout the manuscript and correctly referenced in the text.

We thank the reviewer for noting the inconsistencies in figure and table numbering. In the revised manuscript, all figures and tables have been renumbered sequentially, and the corresponding references in the text have been updated for consistency.

 

3.1.3) Prose and Flow: While the writing is generally clear, some sections could be more concise. The introduction, for example, could be streamlined to present the background more efficiently and arrive at the specific research questions and contributions more quickly.

We have revised the introduction to make it more concise and focused. Following the comments from Reviewer #2, we also clarified the ML framework and streamlined the background to better lead into the research questions.

 

3.2) Quality of Figures and Tables

3.2.1) Correlation Matrices (Figure 3): These plots, presented on pages 9 and 10, are visually dense and nearly unreadable. The use of numerical text for every cell in a large matrix is ineffective. These should be replotted as color-coded heatmaps, which would allow readers to quickly identify strong positive and negative correlations at a glance.

We thank the reviewer for this valuable suggestion. We agree that the correlation matrices in Figure 3 were visually dense and difficult to interpret due to the numerical text in every cell. In the revised manuscript, we have replotted these figures as color-coded heatmaps, which allow readers to quickly identify strong positive and negative correlations at a glance. We believe this modification significantly improves the readability and clarity of the results.
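For reference, such a heatmap can be produced directly with matplotlib; the feature names and values below are illustrative placeholders, not the manuscript's data:

```python
# Sketch of replotting a correlation matrix as a color-coded heatmap.
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend for scripted figure generation
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
data = rng.normal(size=(86, 6))            # 86 rows, 6 example features
labels = ["Age", "1st", "2nd", "Fee", "Debtor", "GDP"]
corr = np.corrcoef(data, rowvar=False)     # Pearson correlation matrix

fig, ax = plt.subplots(figsize=(5, 4))
im = ax.imshow(corr, cmap="coolwarm", vmin=-1, vmax=1)
ax.set_xticks(range(len(labels)))
ax.set_xticklabels(labels, rotation=45, ha="right")
ax.set_yticks(range(len(labels)))
ax.set_yticklabels(labels)
fig.colorbar(im, ax=ax, label="Pearson r")
fig.tight_layout()
fig.savefig("correlation_heatmap.png", dpi=200)
```

A diverging colormap anchored at zero (here "coolwarm") lets strong positive and negative correlations stand out at a glance, which is exactly the readability gain the reviewer requested.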

 

3.2.2) Proposed New Table for Synthetic Data Validation: To support the claims regarding data augmentation, a new table should be added to Section 3.2 ("Synthetic data generation"). This table is essential for providing quantitative evidence of the CTGAN's performance.

Table Caption: Comparison of key statistical properties between the real (n=86) and CTGAN-generated synthetic (n=487) datasets for international students. The final column reports the p-value from a two-sample Kolmogorov-Smirnov (K-S) test for distributional similarity.

This table would provide a direct numerical comparison of first- and second-order moments and a formal statistical test of distributional similarity, moving beyond subjective visual assessment and adding much-needed rigor.

We thank the reviewer for this valuable suggestion. Following your recommendation and similar feedback from other reviewers, we have added a new table in Section 3.2 (Table 3, shown below) to provide quantitative evidence for synthetic data validation.

Specifically, we adopted the Jensen–Shannon Divergence (JSD) rather than the Kolmogorov–Smirnov test. JSD was chosen because it provides a symmetric and bounded measure of distributional similarity (ranging from 0 to 1) without assuming a particular distribution, whereas the Kolmogorov–Smirnov test may overemphasize minor differences in small samples.

This addition offers a clearer and more reliable assessment of the CTGAN-generated data.
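For transparency, the JSD values reported in Table 3 can be computed as in the following sketch (illustrative samples, not the study's data; note that scipy's `jensenshannon` returns the JS distance, whose square is the divergence):

```python
# Sketch of a Jensen-Shannon divergence computation: histogram both samples
# on shared bins, then compare the two empirical distributions.
import numpy as np
from scipy.spatial.distance import jensenshannon

rng = np.random.default_rng(1)
real = rng.normal(23.1, 7.3, size=86)        # e.g. an Age-like feature
synthetic = rng.normal(22.9, 6.6, size=487)

# Shared bin edges so both histograms live on the same support.
bins = np.histogram_bin_edges(np.concatenate([real, synthetic]), bins=20)
p, _ = np.histogram(real, bins=bins, density=True)
q, _ = np.histogram(synthetic, bins=bins, density=True)

# scipy normalizes p and q internally; with base 2 the squared distance
# (the divergence) is bounded in [0, 1], 0 meaning identical distributions.
jsd = jensenshannon(p, q, base=2) ** 2
print(f"JSD = {jsd:.3f}")
```

The bin count is a free parameter of this estimate; a sensitivity check across bin widths would further strengthen the reported values.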

Table 3: Key distributional indicators for real dataset and CTGAN-synthetic dataset

| Data   | Abbreviation  | Real data mean | Real data std | Synthetic data mean | Synthetic data std | JSD    |
|--------|---------------|----------------|---------------|---------------------|--------------------|--------|
| Inputs | Marital       | 1.08           | 0.38          | 1.07                | 0.36               | 0.034  |
|        | Mode          | 6.74           | 4.98          | 6.73                | 4.87               | 0.238  |
|        | Order         | 1.49           | 0.92          | 1.47                | 0.77               | 0.046  |
|        | Course        | 10.11          | 4.41          | 11.44               | 4.26               | 0.067  |
|        | Attendance    | 0.95           | 0.21          | 0.95                | 0.21               | 0.008  |
|        | Qualification | 1.74           | 2.97          | 1.67                | 2.72               | 0.044  |
|        | Nationality   | 11.23          | 4.47          | 12.56               | 3.82               | 0.693  |
|        | Mother-Q      | 11.40          | 9.61          | 11.11               | 8.45               | 0.206  |
|        | Father-Q      | 11.78          | 11.46         | 10.53               | 10.48              | 0.152  |
|        | Mother-O      | 7.5            | 3.65          | 7.46                | 3.06               | 0.140  |
|        | Father-O      | 8.01           | 3.19          | 7.56                | 2.40               | 0.038  |
|        | Displaced     | 0.55           | 0.50          | 0.53                | 0.50               | 0.0002 |
|        | Need          | 0.12           | 0.11          | 0.004               | 0.064              | 0.001  |
|        | Debtor        | 0.26           | 0.44          | 0.27                | 0.44               | 0.021  |
|        | Fee           | 0.74           | 0.44          | 0.75                | 0.43               | 0.011  |
|        | Gender        | 0.24           | 0.43          | 0.27                | 0.44               | 0.004  |
|        | Scholarship   | 0.21           | 0.41          | 0.21                | 0.41               | 0.003  |
|        | Age           | 23.09          | 7.31          | 22.90               | 6.56               | 0.035  |
|        | 1st           | 4.91           | 3.43          | 5.46                | 4.04               | 0.058  |
|        | 2nd           | 4.31           | 3.04          | 4.68                | 3.17               | 0.042  |
|        | Unemployment  | 11.49          | 2.55          | 12.13               | 2.32               | 0.295  |
|        | Inflation     | 1.18           | 1.33          | 1.10                | 1.19               | 0.328  |
|        | GDP           | 0.34           | 2.37          | 0.018               | 2.41               | 0.284  |
| Output | Target        | 0.63           | 0.49          | 0.66                | 0.48               | 0.001  |

 

3.3) Strengthening the Literature Review and Discussion

3.3.1) Literature on TSTR: The literature review fails to engage with the growing body of work on the challenges of training on synthetic data. The authors should review and cite works from top machine learning conferences (e.g., NeurIPS, ICML) that discuss the domain gap, generative uncertainty, and the potential for synthetic data to produce real errors in downstream tasks.

We thank the reviewer for this suggestion. In the revised manuscript, we have cited relevant works from top journals and added content in Section 5 (Discussion) addressing the risks of CTGAN, including domain gap and generative uncertainty.

 

3.3.2) Discussion of Null Findings: The finding that objective demographic factors like age and gender have minimal predictive power is interesting. However, the discussion should be more nuanced. The authors suggest this reflects reality, but they should also consider alternative explanations. Could this be an artifact of the specific dataset? Or is it a known behavior of GBDT models, which can sometimes prioritize a few highly predictive features (like grades) at the expense of many other features with weaker, but still potentially meaningful, signals? A more critical and multi-faceted discussion is needed.

We appreciate the reviewer’s insightful comment. We agree that the limited predictive power of demographic factors deserves a more critical perspective. While this will be an important direction for our future work, we have acknowledged this limitation in the revised manuscript.

 

 

Author Response File: Author Response.docx

Reviewer 2 Report

Comments and Suggestions for Authors

After analyzing the manuscript, I have the impression that the authors focused on the technical aspect, while the ideological, substantive aspect—in the sense of "socio-economic and cultural"—was treated very superficially and cursorily. As an analyst of socio-economic phenomena and a computer scientist, I noticed that the manuscript is highly technical, yet weakly rooted in the socio-economic context. This is problematic given the topic, which touches not only on algorithms but also on educational policy, the labor market, and social and cultural transformations. Importantly, the authors seem to have completely ignored cultural and technological factors. The issue is, however, extremely complex and difficult to assess. I recommend that the authors supplement the work with a broader discussion of the context, taking into account changes in the labor market, the growing importance of vocational education, and the depreciation of the value of a diploma in some sectors. This would lend the work a more balanced and interdisciplinary character. In addition, I recommend that the authors thoroughly refine the manuscript, which currently has numerous technical/editorial deficiencies, which significantly reduce its quality and perception.

1) Reading the introduction gave me a thought. The passage stating that student withdrawals "pose a significant challenge to social stability and economic growth" (lines 9-11; 41-44) is too one-sided and fails to consider alternative interpretations of this phenomenon. In the context of labor market transformation, the increase in withdrawals from academic education can also be seen as an opportunity for the development of vocational and technical education, which in many countries (including Europe) is gaining importance and better responding to economic needs. I recommend that the authors supplement the introduction with a more balanced analysis of the effects of withdrawals. Please comment.

a) Lines 41-46 – The authors' arguments seem one-sided to me. The text suggests that every withdrawal is a loss for the individual and the economy, and ignores the growing value of vocational and technical paths, which in many industries offer stable employment and high earnings, often faster than academic studies. The section on "giving up the benefits of education" is overly simplistic and fails to take into account the changing realities of the labor market. In many sectors (e.g., IT, logistics, crafts, industry), professional work does not require an academic degree, and those who withdraw from studies often enter the labor market more quickly, gaining experience and financial independence. From this perspective, abandoning academic education does not necessarily mean individual failure or economic loss. It can even support economic development by strengthening the vocational and technical sectors. I suggest the authors address this aspect to provide a more balanced approach to the problem. Please comment.

b) Lines 47-54. I got the impression that the authors approached the problem of "withdrawal from studies" very superficially and cursorily. In my opinion, the authors treat the phenomenon of withdrawal only as a problem, ignoring the broader context of labor market transformations and social and cultural changes. Perhaps the world is changing, the labor market is changing, employers are changing, and a diploma has become devalued, resulting from the decline in the quality of education? Perhaps the labor market doesn't need graduates with certificates and diplomas, but rather specialists with skills, and the problem lies in the quality of education? When discussing the lack of a coherent definition of "withdrawal," the authors should address the changing socioeconomic and cultural context. In many sectors, the labor market increasingly values formal diplomas less and practical skills more. Therefore, withdrawing from academic studies may not so much indicate a systemic problem as it may be a rational adaptation to market realities, especially with the declining quality of education and the depreciation of a diploma. Incorporating this perspective would allow for a more balanced and multidimensional analysis of the phenomenon. Please comment.

2) Importantly, the authors seem to have completely ignored cultural and technological factors in the introduction (e.g., migration, content consumption, social media, and the myriad cultural changes that are often a consequence of social engineering, politics, and technological development). Please comment.

3) The methodology presented in the manuscript appears generally consistent and logical. In my opinion, the TSTR approach was well-chosen for the small sample of international students; reasonable model evaluation metrics were used, and XAI tools were used to interpret the results. However, the study has several significant limitations, namely:

a) the "deep learning" claimed in the title does not fully reflect reality, as the final classifiers are primarily gradient boosting algorithms, and deep learning networks were used exclusively to generate synthetic data. Please comment.

b) the test sample for international students is too small to draw strong/reliable conclusions, especially without confidence intervals. Please comment.

c) the assessment of the synthetic data quality is rather superficial and based primarily on visualizations. Furthermore, reference errors ("Error! Reference source not found") hinder a full assessment of the methodology and results. Please comment.

4) In my opinion, the discussion in the manuscript (5. Discussion) is rather one-sided and too superficial. While the authors summarize the results well and point to the high effectiveness of the TSTR approach, they lack a deeper reflection on the limitations of the study, including the small sample of international students, the risk of overfitting when using synthetic data, and the lack of confidence intervals for the metrics. Please comment.

a) Furthermore, the discussion barely touches on the socioeconomic context of the dropout phenomenon, despite the topic having strong social, economic, and cultural dimensions. A more critical, interdisciplinary analysis could significantly strengthen the paper's conclusions and make it more valuable for both researchers and practitioners. Please comment.

5) The summary (6. Conclusions), like the "Discussion of results" (5. Discussion), is primarily technical in nature and lacks critical reflection on the study's limitations. The Summary also fails to address the broader socioeconomic and cultural context of the dropout phenomenon, thus losing its fuller applied and interdisciplinary value. Please comment.

Author Response


Reviewer #2

 

GENERAL COMMENT:

After analyzing the manuscript, I have the impression that the authors focused on the technical aspect, while the ideological, substantive aspect—in the sense of "socio-economic and cultural"—was treated very superficially and cursorily. As an analyst of socio-economic phenomena and a computer scientist, I noticed that the manuscript is highly technical, yet weakly rooted in the socio-economic context. This is problematic given the topic, which touches not only on algorithms but also on educational policy, the labor market, and social and cultural transformations. Importantly, the authors seem to have completely ignored cultural and technological factors. The issue is, however, extremely complex and difficult to assess. I recommend that the authors supplement the work with a broader discussion of the context, taking into account changes in the labor market, the growing importance of vocational education, and the depreciation of the value of a diploma in some sectors. This would lend the work a more balanced and interdisciplinary character. In addition, I recommend that the authors thoroughly refine the manuscript, which currently has numerous technical/editorial deficiencies, which significantly reduce its quality and perception.

We sincerely thank the reviewer for this insightful comment. We acknowledge that the original manuscript placed stronger emphasis on the technical methodology, while the socio-economic and cultural aspects were discussed only in a relatively simplified manner.

Our intention, however, was to present a technical analytical framework that could serve as a reference tool for future socio-economic investigations. In the revised manuscript, we have carefully supplemented the discussion by incorporating broader socio-economic and cultural perspectives, including labor market transformations, the growing importance of vocational education, and the depreciation of academic diplomas in some sectors. These additions aim to provide a more balanced and interdisciplinary framing of the study.

We are grateful for the reviewer’s valuable feedback, which has helped us refine and improve the manuscript in both substance and presentation. [Highlighted in blue.]

 

SPECIFIC COMMENTS

  • Reading the introduction gave me a thought. The passage stating that student withdrawals "pose a significant challenge to social stability and economic growth" (lines 9-11; 41-44) is too one-sided and fails to consider alternative interpretations of this phenomenon. In the context of labor market transformation, the increase in withdrawals from academic education can also be seen as an opportunity for the development of vocational and technical education, which in many countries (including Europe) is gaining importance and better responding to economic needs. I recommend that the authors supplement the introduction with a more balanced analysis of the effects of withdrawals. Please comment.

We sincerely thank the reviewer for this valuable suggestion. In response, we revised Lines 10–12 (previously Lines 9–11) and Lines 42–46 (previously Lines 41–44) in the introduction to present a more balanced perspective on student dropout. In addition to highlighting its potential negative consequences, we now also acknowledge that dropout is not universally detrimental.

Specifically, we added discussion on how, in the context of labor market transformation, some students pursue vocational and technical education, which in many countries (including Europe) provides stable employment and competitive earnings more quickly than academic paths. This revision clarifies that dropout may, in some cases, represent an adaptation to labor market realities rather than a failure.

 

  • Lines 41-46 – The authors' arguments seem one-sided to me. The text suggests that every withdrawal is a loss for the individual and the economy, and ignores the growing value of vocational and technical paths, which in many industries offer stable employment and high earnings, often faster than academic studies. The section on "giving up the benefits of education" is overly simplistic and fails to take into account the changing realities of the labor market. In many sectors (e.g., IT, logistics, crafts, industry), professional work does not require an academic degree, and those who withdraw from studies often enter the labor market more quickly, gaining experience and financial independence. From this perspective, abandoning academic education does not necessarily mean individual failure or economic loss. It can even support economic development by strengthening the vocational and technical sectors. I suggest the authors address this aspect to provide a more balanced approach to the problem. Please comment.

We sincerely thank the reviewer for this insightful comment. In line with the suggestion, we have revised Section 1.1 (originally Lines 41–46) to provide a more balanced perspective. In addition to discussing the potential negative consequences of dropout, we now also highlight that dropout is not universally detrimental.

Specifically, we added discussion on how, in the context of labor market transformation, some students pursue vocational and technical education, which in many countries (including Europe) provides stable employment and competitive earnings more quickly than academic studies. This revision acknowledges that, in certain cases, dropout may represent an adaptation to labor market realities rather than a failure.

 

  • Lines 47-54. I got the impression that the authors approached the problem of "withdrawal from studies" very superficially and cursorily. In my opinion, the authors treat the phenomenon of withdrawal only as a problem, ignoring the broader context of labor market transformations and social and cultural changes. Perhaps the world is changing, the labor market is changing, employers are changing, and a diploma has become devalued, resulting from the decline in the quality of education? Perhaps the labor market doesn't need graduates with certificates and diplomas, but rather specialists with skills, and the problem lies in the quality of education? When discussing the lack of a coherent definition of "withdrawal," the authors should address the changing socioeconomic and cultural context. In many sectors, the labor market increasingly values formal diplomas less and practical skills more. Therefore, withdrawing from academic studies may not so much indicate a systemic problem as it may be a rational adaptation to market realities, especially with the declining quality of education and the depreciation of a diploma. Incorporating this perspective would allow for a more balanced and multidimensional analysis of the phenomenon. Please comment.

We sincerely thank the reviewer for this thoughtful comment. In line with the suggestion, we have revised Section 1.1 (originally Lines 47–54) to provide a more balanced and multidimensional perspective on student dropout.

In addition to recognizing dropout as a potential challenge to educational quality and social stability, we now also emphasize that dropout reflects broader social and cultural transformations, such as the depreciation of academic diplomas, the rising demand for practical skills in the labor market, and the growing role of vocational education. Accordingly, we highlight that dropout may represent not only a loss of certain educational benefits (e.g., social mobility, degree-related opportunities) but also, in some cases, a rational adaptation to labor market realities that supports quicker economic independence.

We believe this revision directly addresses the reviewer’s concern and significantly enriches the introduction.

 

  • Importantly, the authors seem to have completely ignored cultural and technological factors in the introduction (e.g., migration, content consumption, social media, and the myriad cultural changes that are often a consequence of social engineering, politics, and technological development). Please comment.

Thank you for pointing out the absence of cultural and technological factors in the introduction. Indeed, migration patterns, content consumption habits, social media influences, and the broader cultural changes driven by social engineering, politics, and technological development play a significant role in shaping students' priorities and behaviors. These factors significantly affect their academic engagement and decisions to persist or drop out.

We have introduced the study by Loureiro et al. in Section 1.2 to better capture the multidimensional nature of dropout risk. While affirmative action policies successfully increased enrollment among Black, mixed-race, Indigenous, and quilombo students, the dropout rates for these groups also rose. This indicates that cultural factors, including potential technological and societal changes, continue to exacerbate dropout risk. By incorporating this study, the article can more comprehensively consider the multifaceted influences on dropout risk. Thank you again for your valuable feedback.

 

  • The methodology presented in the manuscript appears generally consistent and logical. In my opinion, the TSTR approach was well-chosen for the small sample of international students; reasonable model evaluation metrics were used, and XAI tools were used to interpret the results. However, the study has several significant limitations, namely:

3.1) the "deep learning" claimed in the title does not fully reflect reality, as the final classifiers are primarily gradient boosting algorithms, and deep learning networks were used exclusively to generate synthetic data. Please comment.

We sincerely thank the reviewers for their valuable feedback. Since both Reviewer #1 and Reviewer #3 pointed out that the term “deep learning” was not appropriate in describing our methodology, we have carefully reconsidered this issue and realized the inaccuracy.

In the revised manuscript, we have corrected the terminology in the Title, Abstract, Keywords, Figure 1 and Section 2, and clarified that our study is based on a machine learning framework rather than deep learning. These revisions ensure that the manuscript more accurately reflects the actual methodology.

 

3.2) The test sample for international students is too small to draw strong/reliable conclusions, especially without confidence intervals. Please comment.

We thank the reviewer for this important comment. Reviewer #1 also raised a similar concern, which prompted us to reflect further and recognize the need for more substantial revisions.

Following Reviewer #1's suggestion, we have made the following changes. First, in Section 3.2, we added Table 3 to compare the effectiveness of the CTGAN-generated data, evaluating how well the synthetic data match the real data across various statistical measures. This comparison provides further insight into the quality of the synthetic data generated by the CTGAN model and highlights its potential for augmenting real-world datasets when sample sizes are limited. Second, in Sections 4.1 and 4.2, we added Figure 5(c), where we trained a model using only the real international student data and compared it with models trained on real-plus-synthetic data.

These additions ensure that the augmented models are more convincing and demonstrate the validity of CTGAN-based data augmentation. We believe these revisions significantly strengthen the methodological rigor and credibility of our findings.
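For clarity on the train-on-synthetic, test-on-real (TSTR) logic behind this comparison, the evaluation loop can be sketched as follows. This is a deliberately simplified toy example with made-up numbers and a one-feature threshold classifier, not the manuscript's actual pipeline, data, or models:

```python
import random

random.seed(0)

# Toy stand-in for the data (hypothetical values, not the Portuguese dataset):
# each record is (first-year grade, dropped_out), where students with grades
# below 10 drop out.
def make_records(n):
    records = []
    for _ in range(n):
        grade = random.gauss(12, 3)
        records.append((grade, 1 if grade < 10 else 0))
    return records

real_test = make_records(100)         # small "real" sample held out for testing
synthetic_train = make_records(1000)  # stands in for CTGAN-augmented training data

# "Train" on synthetic data only: pick the grade threshold that best separates
# the two classes in the synthetic training pool.
best_t = max(
    (g for g, _ in synthetic_train),
    key=lambda t: sum((g < t) == (y == 1) for g, y in synthetic_train),
)

# TSTR step: evaluate the synthetic-trained model on the real test set.
tstr_accuracy = sum((g < best_t) == (y == 1) for g, y in real_test) / len(real_test)
```

The same three-way comparison (real-only vs. real-plus-synthetic training, both evaluated on held-out real data) carries over directly to the gradient-boosting classifiers used in the study.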

 

3.3) The assessment of the synthetic data quality is rather superficial and based primarily on visualizations. Furthermore, reference errors ("Error! Reference source not found") hinder a full assessment of the methodology and results. Please comment.

We thank the reviewer for this important comment. Reviewer #1 also raised a similar concern, which prompted us to reflect further and recognize the need for more substantial corrections. Following Reviewer #1’s suggestion and building on our response to Comment 3.2), we have added Table 3 in Section 3.2 to ensure and validate the effectiveness of the CTGAN-generated data.

At the same time, we also acknowledge that the original manuscript contained reference errors ("Error! Reference source not found"), which indeed hindered a full assessment of the methodology and results. In the revised manuscript, we have carefully checked and corrected all cross-references to ensure accuracy and consistency, thereby improving readability and the reliability of the assessment.

 

  • In my opinion, the discussion in the manuscript (5. Discussion) is rather one-sided and too superficial. While the authors summarize the results well and point to the high effectiveness of the TSTR approach, they lack a deeper reflection on the limitations of the study, including the small sample of international students, the risk of overfitting when using synthetic data, and the lack of confidence intervals for the metrics. Please comment.

We thank the reviewer for the valuable comment. In the revised manuscript, we have made several key improvements to strengthen the methodological rigor and better address the study’s limitations, as suggested by Reviewer #1 and Reviewer #3:

  1. In Section 3.2, we added Table 3 to validate the effectiveness of CTGAN-generated data by providing a statistical assessment of the distributional similarity between real and synthetic data using Jensen–Shannon Divergence (JSD). This quantifies the quality of the synthetic data and addresses concerns about its relevance for model training.
  2. In Section 4.1 and Section 4.2, we included Figure 5(c), where we compared a model trained exclusively on real international student data with models trained on a combination of real and synthetic data. This comparison allows for a direct evaluation of the TSTR paradigm's performance, highlighting both its strengths and limitations.
  3. In Section 5, we expanded our discussion of the study's limitations, focusing on the small sample size of international students, the potential risk of overfitting when using CTGAN-generated data, and the absence of confidence intervals, which we acknowledge as a limitation in our current analysis. These revisions offer a more nuanced reflection on the challenges of the study and clarify the conditions under which the findings should be interpreted.

We believe these changes significantly improve the methodological rigor, robustness, and overall professionalism of the manuscript.
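For transparency about the JSD metric reported in Table 3, the computation can be sketched as follows. The histograms below are illustrative placeholders, not values from the study, and the manuscript's actual implementation may differ:

```python
import math

def jsd(p, q):
    """Jensen-Shannon Divergence between two discrete distributions.
    With base-2 logarithms the value lies in [0, 1]: 0 means identical
    distributions, 1 means completely disjoint support."""
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Hypothetical normalized histograms of one feature (e.g. admission grade
# binned into four ranges) for real vs. CTGAN-generated records.
real_hist      = [0.10, 0.35, 0.40, 0.15]
synthetic_hist = [0.12, 0.33, 0.38, 0.17]

score = jsd(real_hist, synthetic_hist)  # a small value indicates a close match
```

A low per-feature JSD supports the claim that the synthetic records follow the real marginal distributions, which is the property Table 3 is intended to quantify.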

 

4.1) Furthermore, the discussion barely touches on the socioeconomic context of the dropout phenomenon, despite the topic having strong social, economic, and cultural dimensions. A more critical, interdisciplinary analysis could significantly strengthen the paper's conclusions and make it more valuable for both researchers and practitioners. Please comment.

Thank you for your valuable comment. We fully acknowledge the importance of the socioeconomic context in understanding the dropout phenomenon, which indeed has significant social, economic, and cultural dimensions. While this study emphasizes academic performance as a key predictor of dropout risk, we agree that a broader, more interdisciplinary analysis is essential to fully grasp the complex factors that contribute to student attrition.

In response to your suggestion, we have revised the Discussion to include a more critical reflection on the broader socioeconomic and cultural factors that influence student retention. Specifically, we recognize that factors such as financial stress, family background, and cultural adaptation, particularly for international students, often interact with academic outcomes. These non-academic factors are crucial in understanding dropout risk, yet they are not easily captured by academic performance metrics alone.

Furthermore, we have highlighted the need for an interdisciplinary approach, integrating perspectives from sociology, economics, and psychology, to gain deeper insights into how these factors influence student dropout. By combining academic support with interventions addressing non-academic factors, universities can develop more comprehensive and effective strategies to support at-risk students. We believe these additions significantly strengthen the paper's conclusions and make it more valuable for both researchers and practitioners.

Thank you again for your thoughtful feedback.

 

5) The summary (6. Conclusions), like the "Discussion of results" (5. Discussion), is primarily technical in nature and lacks critical reflection on the study's limitations. The Summary also fails to address the broader socioeconomic and cultural context of the dropout phenomenon, thus losing its fuller applied and interdisciplinary value. Please comment.

We thank the reviewer for the valuable comment. We acknowledge that the original conclusion section was primarily technical and did not sufficiently address the study’s limitations or the broader socioeconomic and cultural context of the dropout phenomenon.

In the revised manuscript, we have expanded Section 6 (Conclusions) to include a clearer reflection on key limitations, such as the small sample size of international students, the potential risk of overfitting when using CTGAN-generated data, and the absence of confidence intervals. Furthermore, we highlight the social, economic, and cultural dimensions of dropout, linking our technical findings to these broader contexts.

We believe these revisions enhance the applied and interdisciplinary value of the study.
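As a pointer for the future work acknowledged above, per-metric confidence intervals could be obtained with a percentile bootstrap. A minimal sketch using hypothetical per-student outcomes (illustrative numbers only, not the study's results):

```python
import random

random.seed(42)

# Hypothetical outcomes on a small international test set:
# 1 = student classified correctly, 0 = misclassified.
outcomes = [1] * 31 + [0] * 7  # point-estimate accuracy: 31/38

def bootstrap_ci(values, n_boot=5000, alpha=0.05):
    """Percentile bootstrap confidence interval for the mean (here: accuracy)."""
    stats = sorted(
        sum(random.choice(values) for _ in values) / len(values)
        for _ in range(n_boot)
    )
    return stats[int(n_boot * alpha / 2)], stats[int(n_boot * (1 - alpha / 2)) - 1]

low, high = bootstrap_ci(outcomes)  # 95% interval around the point estimate
```

Reporting such an interval alongside each metric would make explicit how much uncertainty the small international sample introduces.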

 

 

 

Author Response File: Author Response.docx

Reviewer 3 Report

Comments and Suggestions for Authors

Evaluating and forecasting undergraduate dropouts using deep learning for domestic and international students
here are some questions, suggestions, and comments

topic is quite interesting.
just a concern that a small number of international students as compared to the local domestic students in the sample, why not focus on the domestic students instead

High dropout rates undermine the stability and quality... line 38 this should be cited

Academic success not only enhances... this should also be cited - line 41/42

lines 56 to 58 - this should also be cited, students often have different academic profiles and experiences compared to international...

Current state of literature - should be expanded to discuss or compare international/domestic students antecedents of dropouts

many (See Error! Reference source not found.). This should be remedied.

why these data are considered - most of them are demographics, which are easily remedied, for instance if you found out that parents' qualification is an issue, you cannot ask the students' parents to change their qualification....
(just an example) so the authors must justify the use of each of the data/indicators - or whether these indicators have been found to be significant in previous literature

machine learning seems adequate

still major issue is the comparison between domestic and local, the current results were noted to be identical - which is quite surprising and is quite controversial,
as the author noted tailored strategies for international students (e.g., language-academic integration)......
would still suggests to be modest on this and use only domestic samples

 

Author Response


Reviewer #3

 

Comments and Suggestions for Authors

Here are some questions, suggestions, and comments

Thank you for your thoughtful questions, suggestions, and comments. We greatly appreciate your feedback and have carefully considered each point. Your insights have been invaluable in strengthening the manuscript, and we believe the revisions address the concerns raised. Thank you again for your constructive input. [Highlighted in green.]

 

  • Topic is quite interesting. Just a concern that a small number of international students as compared to the local domestic students in the sample, why not focus on the domestic students instead.

Thank you for your thoughtful comment and for recognizing the interest in our topic. We appreciate your concern regarding the small number of international students in the sample.

While focusing on domestic students might offer more robust insights due to a larger dataset, one of the key contributions of our study is the application of CTGAN to address the data limitations for international students. In response to the feedback from Reviewers #1 and #2, we have included a comparison between models trained on real data only and those enhanced with CTGAN-generated data, highlighting the performance improvements and strengthening the overall credibility of our analysis. These revisions are reflected in Table 3 in Section 3.2 and Figure 5 in Section 4.2. We believe this approach adds valuable insights to the study.

 

  • High dropout rates undermine the stability and quality... line 38 this should be cited

We thank the reviewer for the comment. A supporting citation has been added at Line 38 of the revised manuscript.

 

  • Academic success not only enhances... this should also be cited - line 41/42

We thank the reviewer for this helpful suggestion. In line with Reviewer #2's comment 1.1), we have revised the text and incorporated the recommended citation. The supporting reference has been added and the corresponding sentences have been updated in Lines 41–44 of the revised manuscript.

 

  • lines 56 to 58 - this should also be cited; students often have different academic profiles and experiences compared to international...

We thank the reviewer for the suggestion. A relevant citation has been added in Lines 63-65 (originally lines 56–58) in the revised manuscript.

 

  • Current state of literature - should be expanded to discuss or compare international/domestic students’ antecedents of dropouts

We sincerely thank the reviewers for their helpful suggestions. Based on the combined comments from Reviewer #1 and Reviewer #2, we have expanded Section 1.1 to provide a broader discussion of undergraduate dropout, including both international and domestic students.

Specifically, we now emphasize that high dropout rates not only threaten the stability and quality of educational systems but also reflect deeper socio-cultural transformations, such as the depreciation of diplomas, the rising demand for practical skills in the labor market, and the growing role of vocational education. Within this context, dropout may represent both a loss of certain educational benefits (e.g., social mobility, degree-related opportunities) and, in some cases, a rational adaptation to market realities.

Furthermore, we added a discussion of the lack of a universally accepted definition of “dropout” and of how differences in data sources and calculation methods contribute to inconsistent estimates of dropout rates. This highlights the importance of examining socio-economic and cultural drivers when comparing dropout antecedents across domestic and international student populations.

These revisions, guided by the reviewers’ comments, provide a more balanced and comprehensive view of the literature and underscore the need for further research to clarify the causes and predictors of dropout in diverse contexts.

 

  • many (See Error! Reference source not found.). This should be remedied.

We thank the reviewer for pointing this out. All broken cross-references have been checked and corrected in the revised manuscript.

 

  • why these data are considered - most of them are demographics, which are easily remedied, for instance if you found out that parents qualification is an issue, you cannot ask the students' parents to change their qualification....(just an example) so the authors must justify the use of each of the data/indicators - or whether these indicators have been in previous literature found to be of significant.

We thank the reviewer for this thoughtful comment. The selection of indicators in our study follows prior literature and established research frameworks in this field, where these demographic and contextual factors have consistently been examined in relation to dropout and educational outcomes. Our intention is not to suggest that such indicators (e.g., parental qualification) can be directly changed, but rather to align with the existing body of work and ensure comparability with previous studies. Therefore, the use of these indicators is justified by their recurrent adoption in the literature, and their inclusion allows our results to be situated within a broader scholarly context.

 

  • machine learning seems adequate

We thank the reviewers for highlighting this important issue. In line with the comments from Reviewer #1 and Reviewer #2, we recognize that describing the study as deep learning was inaccurate. The work is based on a machine learning framework, with CTGAN (a DL-based method) used only for data augmentation. Referring to the entire study as DL was indeed inappropriate. In the revised manuscript, we have clarified this throughout, replacing all references to DL with ML and making corresponding corrections in the Title, Abstract, Keywords, Section 2, and Section 4 to ensure accuracy and consistency.

 

  • still major issue is the comparison between domestic and local, the current results were noted to be identical - which is quite surprising and is quite controversial, as the author noted tailored strategies for international students (e.g., language-academic integration)......would still suggests to be modest on this and use only domestic samples

We thank the reviewer for this important suggestion. We understand the concern regarding the similarity of dropout prediction results between domestic and international students. The main aim of this study was to explore the use of CTGAN to address the limited data for international students and improve dropout risk assessment. However, we acknowledge that the results are derived from the source data and may involve some uncertainty. Ensuring the accuracy of predictions will be an important focus of our future work.

Author Response File: Author Response.docx

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

review recommendations
corrections to minor methodological errors and text editing

Author Response


Reviewer #1

 

  • review recommendations: corrections to minor methodological errors and text editing

We sincerely appreciate the reviewer’s thoughtful suggestions and apologize for not catching these errors earlier.

We have now made the necessary revisions, correcting the minor methodological errors and text-formatting issues. For example, in Section 1 we have corrected an incorrect symbol and adjusted the subsequent formatting, and in Figure 1 we have changed “DL method” to “ML method”, in accordance with the reviewer’s suggestion. Thank you again for your valuable feedback. [Highlighted in red.]

 

 

 

 

Author Response File: Author Response.docx

Reviewer 2 Report

Comments and Suggestions for Authors

The revised manuscript represents some progress compared to the earlier version. The authors have largely addressed the reviewers' comments, enriching the introduction with broader socioeconomic and cultural perspectives, expanding the discussion of the study's limitations, and strengthening the methodological section of the work.

While some issues, particularly the relatively small sample of international students, which is a limitation of the study, and the level of reflection on the socioeconomic determinants of dropout, could be improved or further developed, the article already provides valuable insights for both researchers and practitioners.

Technical comments

1) In my opinion, the illustration presented in Figure 1 does not meet the graphic standards required for scientific publications. The composition resembles a presentation slide, infographic, or promotional material rather than a professional research design. Some of the text appears to be too small, making it difficult to read when reduced to publication size. I would recommend a simpler, minimalist flowchart with clearly described methodological steps, consistent with graphic conventions used in scientific journals. Please comment.

2) Is Figure 10 even necessary? (4.4 Interactive software tool for predictions; Figure 10: Web-based interactive tool setup for flexible devices). In my opinion, Figure 10 should be replaced with a short text description.

Author Response


Reviewer #2

 

GENERAL COMMENT

The revised manuscript represents some progress compared to the earlier version. The authors have largely addressed the reviewers' comments, enriching the introduction with broader socioeconomic and cultural perspectives, expanding the discussion of the study's limitations, and strengthening the methodological section of the work. While some issues, particularly the relatively small sample of international students, which is a limitation of the study, and the level of reflection on the socioeconomic determinants of dropout, could be improved or further developed, the article already provides valuable insights for both researchers and practitioners.

We sincerely appreciate the reviewer’s thoughtful feedback. We are glad to hear that the revised manuscript represents progress compared to the earlier version and that we have addressed most of the reviewer’s comments.

Regarding the issues raised, we acknowledge the limitation posed by the relatively small sample of international students and will further emphasize this point in the revised manuscript. We agree that a deeper reflection on the socioeconomic determinants of dropout would enhance the robustness of the study, and we are working on expanding this discussion in the final version.

We are grateful for the reviewer’s recognition that the article already provides valuable insights for both researchers and practitioners. We will continue to refine and improve the manuscript to ensure it meets the highest academic standards. Thank you again for your constructive feedback. [Highlighted in blue.]

 

Technical comments

  • In my opinion, the illustration presented in Figure 1 does not meet the graphic standards required for scientific publications. The composition resembles a presentation slide, infographic, or promotional material rather than a professional research design. Some of the text appears to be too small, making it difficult to read when reduced to publication size. I would recommend a simpler, minimalist flowchart with clearly described methodological steps, consistent with graphic conventions used in scientific journals. Please comment.

We appreciate the reviewer’s valuable feedback on Figure 1. We understand these concerns and have revised the figure accordingly.

In the updated version of Figure 1, we have enlarged the text describing the working mechanism of the CTGAN section, adjusted the color scheme, and added a description of the overall framework for assessing dropout risks among domestic and international students using machine learning methods.

The revised flowchart now outlines the methodological steps clearly and follows the graphic conventions used in scientific publications. Thank you for your guidance in helping us improve the quality of the figure.

 

  • Is Figure 10 even necessary? (4.4 Interactive software tool for predictions; Figure 10: Web-based interactive tool setup for flexible devices). In my opinion, Figure 10 should be replaced with a short text description.

We appreciate the reviewer’s valuable feedback. We agree that the content of Figure 10 can be conveyed effectively through a brief text description. We have therefore removed Figure 10 and added a corresponding description in Section 4.4. The revised text explains that the application’s code and datasets are hosted on GitHub, that the app is deployed via Streamlit Cloud, and that this deployment generates a unique web link for easy access to the live Streamlit app. We believe this more concise, information-rich description aligns with your suggestion. Thank you for your guidance.

 

 

 

 

Author Response File: Author Response.docx

Reviewer 3 Report

Comments and Suggestions for Authors

after going over the point by point revisions made by the author/s the paper is now better and clearer, and is ready for acceptance

Author Response


Reviewer #3

 

GENERAL COMMENT

After going over the point by point revisions made by the author/s the paper is now better and clearer, and is ready for acceptance.

We sincerely thank the reviewer for the positive feedback. We are grateful for the thorough review of our revisions and are pleased to hear that the paper is now clearer and improved. Your valuable comments and suggestions have significantly contributed to enhancing the quality of our work. We appreciate your recommendation for acceptance and look forward to the paper's publication.

Thank you once again for your time and constructive feedback.

 

Author Response File: Author Response.docx