Article
Peer-Review Record

Personalized and Timely Feedback in Online Education: Enhancing Learning with Deep Learning and Large Language Models

Multimodal Technol. Interact. 2025, 9(5), 45; https://doi.org/10.3390/mti9050045
by Óscar Cuéllar 1,*, Manuel Contero 1 and Mauricio Hincapié 2
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 27 March 2025 / Revised: 4 May 2025 / Accepted: 9 May 2025 / Published: 14 May 2025

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

This paper describes an Adaptive Feedback System (AFS) that combines neural networks and LLMs with the aim of improving learning outcomes, and it reports the results of a practical experiment. The paper is well written and very interesting. However, a section explaining the AFS architecture is missing, which makes reproducing the experiment in other contexts difficult. I therefore suggest that the authors add a section describing the AFS architecture and explaining how the neural network is connected to the LLM; as it stands, it is unclear which component consumes which output. In addition, the KPIs are not measured in the experiments; they are only described at a conceptual level.

Some minor suggestions:

  • Table 1 - in the last row, the challenge is missing
  • Section 4.4 is very generic. Add some specific details and/or references to the related literature.

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 2 Report

Comments and Suggestions for Authors

The manuscript presents an ambitious and timely investigation into the integration of AI technologies—specifically recurrent neural networks and GPT-4—within an educational feedback system. The study addresses an important problem: how to offer personalized, scalable, and effective feedback in online learning environments. The writing is clear, the structure is coherent, and the experimental deployment in a real-world digital art course is impressive. However, despite these strengths, the paper has several flaws.

The most pressing issue lies in the design. The quasi-experimental approach introduces substantial internal validity challenges. The experimental and control groups differ significantly in their baseline characteristics—most notably in prior knowledge and gender composition. While the authors apply ANCOVA to control for these differences, the inference of causality is still weakened by the absence of randomization and the reliance on different cohorts taken a year apart. Furthermore, the experimental group begins with a lower level of proficiency, and this initial gap is later used to highlight the adaptive system’s “equity-promoting” effect. While this interpretation is interesting, it must be treated with caution. Without randomization or at least matched control groups, such claims remain speculative.
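
For concreteness, the following is a minimal sketch of the kind of baseline-adjusted ANCOVA referred to above, using statsmodels; the column names (post_score, pre_score, group) and the toy values are hypothetical and are not drawn from the authors' data.

```python
# Minimal ANCOVA sketch (hypothetical column names and toy data, not the authors'):
# the post-test score is modelled as a function of group membership while
# adjusting for the pre-test (baseline) score as a covariate.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "post_score": [72, 68, 80, 75, 90, 85, 70, 88],
    "pre_score":  [60, 55, 70, 65, 75, 72, 58, 74],
    "group":      ["control", "control", "control", "control",
                   "experimental", "experimental", "experimental", "experimental"],
})

# C(group) codes the cohort as a categorical factor; pre_score is the covariate.
model = smf.ols("post_score ~ C(group) + pre_score", data=df).fit()
print(model.summary())
```

Such an adjustment controls for baseline differences statistically, but it cannot substitute for randomization or matched groups when claiming causal or equity effects.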

In terms of technical depth, the paper lacks crucial detail about the AI components. The RNN model is described in general terms, but no information is provided about its architecture, hyperparameters, training-validation split, loss function, or optimization strategy. There is no report of its raw predictive performance metrics (e.g., precision, recall, ROC-AUC), and no comparison against baseline models such as logistic regression or simpler classifiers. Without this, the reader cannot assess whether the deep learning approach is justified or necessary. Similarly, GPT-4 is treated as a black-box component that “generates personalized feedback,” but there is no explanation of how prompts were constructed, how feedback was curated (if at all), or whether responses were reviewed for relevance, tone, or pedagogical value. This is especially important because the nature of GPT-4 outputs is highly dependent on prompt engineering and context quality.
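
To illustrate the kind of baseline comparison and metric reporting requested here, a minimal scikit-learn sketch follows; the feature matrix and labels are synthetic placeholders, not the authors' data, and the RNN itself is not reproduced.

```python
# Sketch of a baseline comparison for a student-success classifier
# (synthetic placeholder data; the authors' features and labels are not available here).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

baseline = LogisticRegression(max_iter=1000).fit(X_train, y_train)
pred = baseline.predict(X_test)
prob = baseline.predict_proba(X_test)[:, 1]

print("precision:", precision_score(y_test, pred))
print("recall:   ", recall_score(y_test, pred))
print("ROC-AUC:  ", roc_auc_score(y_test, prob))
# Reporting the same metrics for the RNN on an identical split would let
# readers judge whether the deep model's added complexity is justified.
```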

The concept of the “overcoming effect” is a valuable contribution and the analysis of performance transitions across program levels is one of the paper’s strengths. However, the logic of interpreting improved outcomes in the experimental group as resulting from the feedback system is undermined by the lack of transparency in how that feedback was delivered and measured. Without any qualitative data—such as student surveys or interviews—it is impossible to know how students perceived the feedback or whether it influenced their engagement intentionally or incidentally. More importantly, there is no evidence that students engaged with the feedback systematically, nor is there any behavioral trace data showing that feedback led to specific actions.

The paper also overlooks a deeper discussion of interpretability and bias. Since the system makes predictions about student success and communicates these to learners, there is a real risk of misclassification or reinforcement of inequities if these predictions are inaccurate or biased. The authors briefly mention this concern but do not offer mitigation strategies. Moreover, they make no use of explainability methods such as SHAP or LIME that could have increased the transparency of the predictive model. GPT-4 itself can also amplify cultural or linguistic biases, and without a clear content analysis of the messages sent to students, this risk is not meaningfully addressed.
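
As an illustration of the kind of explainability analysis suggested above, here is a minimal SHAP sketch on a generic tabular classifier; the model and data are placeholders, not the authors' RNN, and the same idea applies to whichever predictor drives the feedback.

```python
# Minimal SHAP sketch on a placeholder tabular model (not the authors' RNN);
# per-feature attributions show what drives each per-student prediction.
import shap
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X, y)

explainer = shap.Explainer(model, X)   # background data for the explainer
shap_values = explainer(X[:50])        # attributions for 50 hypothetical students

shap.plots.bar(shap_values)            # global view: which features drive predictions
```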

The authors have done a commendable job implementing an innovative system in a demanding, real-world setting, and the longitudinal data is a rare strength. But for the work to be publishable, they need to strengthen its empirical foundation and clarify the mechanisms at play.

To improve the paper, the authors should:

  • Provide full details of the recurrent neural network architecture, including number of layers, units, training method, and metrics beyond prediction accuracy.

  • Include a baseline comparison with simpler models to demonstrate that deep learning offers a significant benefit.

  • Explain how the GPT-4 module was prompted, what kind of input data it received, and whether any feedback messages were reviewed, moderated, or analyzed for tone and quality (a sketch of the level of prompt detail expected follows this list).

  • Collect and present student perceptions of the feedback. Even a small set of survey or interview data would provide important insight into the mechanism of the system’s impact.

  • Strengthen the discussion of interpretability, especially for the predictive model, possibly using explainable AI tools to show what influenced predictions.

  • Address ethical concerns more substantively, particularly regarding how false positives or negatives might affect students and how their data privacy is protected in practical terms.

  • Refrain from drawing strong conclusions about causality or equity effects without more robust control over group differences.
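
To make the prompting point concrete, the following is a minimal sketch of the sort of prompt construction that should be documented, using the OpenAI Python client; the field names, risk label, and wording are hypothetical and are not drawn from the manuscript.

```python
# Hypothetical prompt-construction sketch (field names and wording are
# illustrative only; the manuscript does not describe its actual prompts).
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

student = {"module": "Digital Illustration", "predicted_risk": "medium",
           "last_score": 6.5, "missed_tasks": 2}

prompt = (
    f"You are a supportive tutor. The student is enrolled in {student['module']}. "
    f"The predictive model flags their risk level as {student['predicted_risk']}; "
    f"their last score was {student['last_score']} and they missed "
    f"{student['missed_tasks']} tasks. Write brief, encouraging, actionable feedback."
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```

Documenting the prompts at roughly this level of detail, together with how outputs were reviewed or moderated, would let readers assess the pedagogical quality and consistency of the generated feedback.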

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

The paper has been improved and is ready for publication.

Reviewer 2 Report

Comments and Suggestions for Authors

The authors seem to have adequately addressed the reviewers' comments and thus the paper can now be accepted for publication.
