A Structural Causal Model Ontology Approach for Knowledge Discovery in Educational Admission Databases
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
Dear authors,
Thanks for your work. Below is my observation regarding the paper:
- The paper lacks a thorough review of relevant literature and a comparison with prior work to establish the research problem.
- It is unclear how research question 3 (which predictive machine learning (ML) models generalize best on features extracted/engineered from the admission database) contributes to this research.
- The rationale for selecting only these ML models is missing.
- Overall, the paper lacks structural quality and clarity.
- The alignment between the research questions and the research findings, in terms of scientific contribution, should be presented clearly.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
This manuscript presents an interesting approach by integrating Structural Causal Models with Knowledge Discovery in Databases for analyzing educational admission systems, specifically focusing on a Nigerian polytechnic institution. While the core concept shows promise, several significant concerns need to be addressed before this work can be considered for publication.
The most critical limitation is the reliance on data from a single institution (Benue State Polytechnic) with only 12,043 records. Although the authors claim their findings are generalizable to other Nigerian polytechnic institutions, this assertion lacks empirical support. The geographical and institutional specificity severely constrains the external validity of the results. To strengthen the study's impact and credibility, data from multiple polytechnic institutions across different regions should be included to validate the proposed framework's broader applicability.
Methodological concerns also require attention. The feature engineering process, particularly the creation of the "Current_Qualification" variable from "Course_Category," lacks sufficient documentation, which affects the study's reproducibility. The presence of age outliers ranging from -4 to 14 years suggests underlying data quality issues that may indicate broader systematic problems with the dataset. These anomalies should be thoroughly investigated and documented. Additionally, the binary classification approach may oversimplify the admission process, potentially missing important nuanced admission categories that exist in real-world scenarios.
The statistical analysis presents several weaknesses. While the Conditional Independence Test validation shows statistical significance, the correlations are relatively weak and close to zero, which may indicate that the claimed causal relationships are not as robust as suggested. The machine learning evaluation lacks crucial details about cross-validation procedures and proper train/test split documentation. Furthermore, the absence of baseline comparisons or benchmarking against existing admission prediction systems makes it difficult to assess the true value of the proposed approach.
Regarding the SCM framework implementation, the Directed Acyclic Graph construction appears to rely primarily on domain expertise rather than data-driven discovery methods. While domain knowledge is valuable, the study would benefit from exploring alternative causal structures or employing causal discovery algorithms to validate the proposed relationships. The reported accuracy rates of up to 92% seem optimistic given the binary nature of the problem and dataset characteristics. More comprehensive evaluation metrics including ROC-AUC, precision-recall curves, and model calibration would provide a more balanced assessment of model performance.
Technical presentation issues also need addressing. Figure quality requires improvement, particularly Figures 5 and 5b, which are difficult to interpret. Table formatting, especially Table 1, could be enhanced for better readability. The manuscript contains several typographical errors that should be corrected. The literature review section would benefit from incorporating more recent publications in educational data mining and causal inference, and the comparison with existing ontological approaches for admission systems appears superficial.
To strengthen this work, I recommend expanding the dataset to include multiple institutions, providing detailed documentation of all preprocessing and feature engineering steps, implementing proper cross-validation strategies, exploring alternative causal structures, including comprehensive performance metrics with statistical significance tests, and discussing practical implementation challenges and ethical considerations in automated admission decisions. The integration of SCM with KDD shows promise for educational applications, but the current execution requires substantial improvement to meet publication standards.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
Dear Authors,
The manuscript addresses a timely and relevant topic using a promising interdisciplinary approach. However, in its current form, the manuscript requires major revisions to meet the scientific and editorial standards of a high-impact journal.
1. The Abstract needs substantial improvement. It should begin with a clear and concise statement of the research objective. The authors should explicitly articulate the novelty and contribution of their approach, briefly describe the data and methodological sequence (SCM-CIT-ML) and summarize key numerical results, including algorithms used and their evaluation metrics. The conclusion should be reformulated to emphasize the academic relevance and practical applicability of their findings.
2. The Introduction provides a general overview of knowledge discovery in databases (KDD) and its applications, but the first part is overly encyclopedic, and lacks focus on the specific problem of admissions databases in higher education. The research gap is not clearly formulated, and the transition toward SCM is insufficiently justified theoretically. The listed contributions are presented in isolation, without narrative integration or emphasis on their innovative character. The section would benefit from significant rewriting to:
- Reduce general background content;
- Emphasize the methodological novelty and justify the use of SCM;
- Clearly articulate the research gap;
- Improve coherence and academic precision in tone.
Additionally, the number of references is excessive, and many are tangential to the main objective. It is recommended to reduce the citations and retain only those that:
- Establish the relevance of KDD in educational contexts.
- Justify the use of SCM and ontologies.
- Support the articulation of a clear research gap.
3. Related Works. This section requires structural reorganization and greater analytical depth. Although key concepts such as ontologies and SCM are mentioned, the review is presented in a linear narrative format without clear thematic subdivisions or critical positioning of the proposed study within the literature. Comparative evaluations are vague, and many cited sources are outdated or regional. The section should be reorganized around clear subthemes, updated with recent high-impact sources (preferably Q1 journals), and include a stronger articulation of how this work advances or differs from previous studies.
4. Materials and Methods. The section is logically structured and outlines the main steps: data preprocessing, SCM modeling, CIT validation, and ML integration. However, critical implementation details are missing. For instance: What specific R functions were used for CIT? Which Python libraries were employed for data conversion? Was the data split using a hold-out or cross-validation scheme? There is also inconsistency in variable naming (e.g., modeofentry vs. mdfn), which creates confusion. A standardized table listing variable names and codes would be useful. The explanation of ontology validation lacks depth (e.g., thresholds, number of tests, interpretation of coefficients). It is unclear how the SCM was adjusted if CIT validation failed. Greater clarity and technical transparency are needed.
5. Implementation and Results. The implementation flow is logically organized. However, Table 1 is overly long and difficult to interpret. A concise summary should be provided:
- How many independence tests were confirmed?
- Which relationships failed the validation?
- What p-value threshold was applied?
ML modeling lacks methodological depth. Details about the data splitting strategy, validation methods (e.g., hold-out or cross-validation), and hyperparameter tuning are missing. The performance differences between models (e.g., KNN and SVM vs. Decision Tree) are not discussed or interpreted. The authors should address why some models performed better and what this implies in terms of model behavior or data properties. Feature importance and potential overfitting risks should also be addressed, especially for non-linear models like Random Forest or SVM. Finally, a critical reflection on the CIT validation results is necessary: was the validation flawless, or were any hypotheses rejected?
6. Conclusion and Future Work. The current conclusion is mostly descriptive and lacks a critical reflection on the study's limitations (e.g., single-institution data sources, potential class imbalance, lack of external validation). The authors should explicitly link the conclusion to the previously stated research gap, articulating the study's contribution to the literature. Practical implications should also be emphasized - for example, how causal variables can inform or improve admission decisions. The "future work" section should be made more concrete: Which XAI techniques are envisioned? What types of bias are being targeted? How will results be validated across broader datasets? A final paragraph summarizing the key contributions and potential for generalizability would strengthen the impact of the conclusion.
Best regards,
Comments on the Quality of English Language
The manuscript requires professional language editing prior to publication, preferably conducted by a fluent speaker of academic English.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
Dear authors,
Thanks for the revised version.
Comments on the Quality of English Language
NA
Author Response
Thank you for your positive feedback and for taking the time to review our revised manuscript. We appreciate your support.
Reviewer 2 Report
Comments and Suggestions for Authors
The revised manuscript still omits several key elements: the Abstract does not explicitly state that the study was “demonstrated as a proof-of-concept,” despite the authors’ commitment; no comparative benchmarking against established acceptance-prediction systems is presented; precision–recall curves and calibration plots are absent, limiting assessment of model reliability in imbalanced settings; alternative causal-discovery methods (e.g., the PC algorithm) remain unexplored; and the ethical discussion lacks consideration of broader societal and decision-automation implications. I therefore recommend further minor revision to incorporate these components, which will substantially strengthen the manuscript’s conceptual rigor, methodological depth, and ethical transparency.
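The precision–recall curves the reviewer requests are straightforward to produce for a binary admission classifier. As a hedged sketch (the function name and data are hypothetical, not taken from the manuscript), the curve points can be computed in plain Python by sweeping a decision threshold over predicted scores:

```python
def precision_recall_points(scores, labels):
    """Compute (recall, precision) pairs by sweeping a decision
    threshold over the predicted scores, highest first.

    scores: hypothetical predicted probabilities of admission.
    labels: true binary labels (1 = admitted, 0 = not admitted).
    """
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    total_pos = sum(labels)
    tp = fp = 0
    points = []
    for _score, label in ranked:
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Precision among the applicants ranked so far,
        # recall against all truly admitted applicants.
        points.append((tp / total_pos, tp / (tp + fp)))
    return points
```

Plotting these pairs gives the precision–recall curve, which is more informative than raw accuracy when the admitted and rejected classes are imbalanced, which is precisely the setting the reviewer flags.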
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
Dear Authors,
Thank you for your thoughtful and comprehensive revisions. The manuscript shows clear improvement in all major areas:
- The abstract and introduction now clearly state the research objective, gap, and methodological innovation.
- Related work is better structured and contextualized.
- Methods and implementation details have been clarified, improving reproducibility.
- Results are more transparent and critically interpreted, especially regarding CIT validation and ML performance.
- The conclusion effectively links findings to the research gap and outlines practical implications and future work.
To further strengthen the manuscript, I suggest the following minor improvements:
- Add a brief reflection on the risk of expert bias in DAG construction.
- Include a short paragraph on ethical considerations in automated admissions.
- Optionally, refer to 1–2 recent Q1 sources to reinforce the literature base.
- Consider framing the study's relevance in a broader international context.
Overall, this is a valuable and well-executed contribution.
Best regards,
Author Response
Please see the attachment.
Author Response File: Author Response.pdf