Article
Peer-Review Record

A Bayesian State-Space Approach to Dynamic Hierarchical Logistic Regression for Evolving Student Risk in Educational Analytics

by Moeketsi Mosia
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 11 January 2025 / Revised: 26 January 2025 / Accepted: 2 February 2025 / Published: 7 February 2025

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

see attachment.

Comments for author File: Comments.pdf

Comments on the Quality of English Language

There are also many writing errors that should be revised.

Author Response

Comment

  1. The content of the Introduction is too long; it is suggested that this part be rewritten for a more concise expression.

Response

Thank you for this feedback. In our manuscript, we elected to integrate the literature review into the Introduction rather than placing it in a separate section, which is why it appears lengthier. This approach helps synthesize the relevant background and highlight existing gaps in a single coherent narrative. While we can certainly streamline a few sentences for clarity and brevity, we also consider many of these details essential for establishing the broader context and motivating our study. Thus, our preference is to keep an integrated approach, but we will ensure that the final version is as concise as possible without losing crucial references and arguments.

Comment

  1. Please add an outline of the paper at the end of Section One, for example: “The paper is organized as follows. In Section 2, ……”

Response

Thank you; an outline of the paper has been added at the end of Section 1.

Comment

  1. The paper does not explicitly state the source of the real-world data. Please provide information on the data source, collection methods, and sample size.

Response

Thank you; a section describing the data source, collection methods, and sample size has been added.

Comment

  1. The paper primarily utilizes the AR(1) model to describe the dynamic changes in risk. Could other models, such as ARIMA, be considered and compared?

Response

Thank you; a motivation for the choice of the AR(1) model has been added.

Comment

  1. The paper includes covariates from learning management systems and demographic characteristics. Could other potential covariates, such as learning styles or motivation, be considered?

Response

Unfortunately, these data were not collected, so such covariates could not be included.

Comment

  1. One of my main concerns is how the values of the hyper-parameters in Section 2.3 were obtained; it seems that the author has set the associated values directly in lines 185, 187, 189, and 192. Why have such priors been provided directly with given hyper-parameters, and are there other choices for the priors?

Response

Thank you for the comment. The hyper-parameter values were chosen following standard Bayesian practice (see, e.g., Gelman et al., 2013), so that the priors are neither completely uninformative nor overly constraining. In particular, normal(0, 1) priors on regression coefficients have become a common default that encodes reasonable scale assumptions without dictating strict beliefs. Other choices are certainly possible; our approach simply reflects a desire for modest regularization and some domain knowledge about plausible parameter magnitudes. We also ran prior predictive checks (not shown in detail here) to confirm that these priors produce reasonable simulated outcomes (i.e., they do not force extreme or implausible predictions). If needed, one could adopt more diffuse or more strongly informative priors; our aim was to strike a balance that (1) does not overwhelm the likelihood with preconceptions, yet (2) prevents pathological parameter estimates when sample sizes are limited.
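To make the prior choice concrete, the following minimal sketch (illustrative only, with hypothetical dimensions and a simulated design matrix standing in for the actual covariates) shows the kind of prior predictive check described above for normal(0, 1) priors in a logistic regression:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical dimensions: 500 students, 4 standardized covariates.
n_students, n_covariates, n_draws = 500, 4, 1000
X = rng.normal(size=(n_students, n_covariates))  # stand-in design matrix

# Draw the intercept and regression coefficients from the weakly
# informative normal(0, 1) priors discussed above.
alpha = rng.normal(0.0, 1.0, size=n_draws)
beta = rng.normal(0.0, 1.0, size=(n_draws, n_covariates))

# Prior predictive probabilities of the at-risk outcome for every draw.
logits = alpha[:, None] + beta @ X.T          # shape (n_draws, n_students)
probs = 1.0 / (1.0 + np.exp(-logits))

# The implied outcome rates should be spread out but not pinned at 0 or 1.
prior_fail_rates = probs.mean(axis=1)
print("5th-95th percentile of prior predictive fail rate:",
      np.percentile(prior_fail_rates, [5, 95]))
```

In a check of this kind, implied fail rates that pile up near 0 or 1 would suggest mis-scaled priors, whereas a wide but plausible spread supports the weakly informative choice.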

Comment

  1. In addition to AIC and BIC, could other model evaluation metrics be considered, such as ROC curves, precision, and recall?

Response

Thank you for suggesting these additional metrics. In a Bayesian framework, especially when continuous posterior predictive distributions are available for each outcome, posterior predictive checks (PPCs) can already capture much of what ROC curves, precision, and recall aim to assess, by comparing replicated draws to observed data. That said, if the primary interest is in classification performance (e.g., predicting “fail” vs. “pass”), ROC or precision-recall curves can be computed from the posterior-predicted probabilities for each observation, thereby incorporating uncertainty across the parameter space. Rather than AIC and BIC, which are frequentist criteria, we typically rely on Bayesian criteria such as WAIC or LOO for overall model comparison, complemented by PPCs to diagnose any systematic mismatch. In short, while ROC curves and related classification metrics can certainly be informative, posterior predictive checks and Bayesian information criteria often provide a more complete picture of model fit and predictive adequacy under uncertainty.
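For readers who do want classification metrics, a sketch of how they could be computed from posterior-predicted probabilities follows (the placeholder arrays stand in for model output and observed data; this is not the manuscript's code):

```python
import numpy as np
from sklearn.metrics import precision_score, roc_auc_score

rng = np.random.default_rng(0)

# Illustrative placeholders: posterior draws of predicted fail probabilities
# (n_draws x n_students) and the observed binary outcomes (n_students,).
posterior_probs = rng.uniform(size=(2000, 300))
y_observed = rng.integers(0, 2, size=300)

# Option 1: classification metrics on the posterior mean probability.
mean_probs = posterior_probs.mean(axis=0)
auc_mean = roc_auc_score(y_observed, mean_probs)

# Option 2: propagate posterior uncertainty by computing the metric per draw.
auc_draws = np.array([roc_auc_score(y_observed, p) for p in posterior_probs])
precision_draws = np.array([
    precision_score(y_observed, (p > 0.5).astype(int), zero_division=0)
    for p in posterior_probs
])

print(f"AUC at posterior mean probabilities: {auc_mean:.3f}")
print("AUC 95% interval across draws:", np.percentile(auc_draws, [2.5, 97.5]))
print("Precision 95% interval across draws:",
      np.percentile(precision_draws, [2.5, 97.5]))
```

The per-draw option reports each metric with a credible interval rather than a single point value, which is the sense in which uncertainty is "incorporated across the parameter space" above.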

Comment

  1. There is just one real-world dataset used for model validation. It is suggested to test the model with other datasets to assess its generalization ability.

Response

Thank you for pointing this out. In our specific application domain, we currently only have access to one comprehensive real-world dataset that covers the full scope of the problem. However, we compensate for this limitation by performing extensive posterior predictive checks, including both simulation studies and in-sample diagnostic evaluations, to ensure that our model is not simply overfitting this single dataset. In principle, we would welcome additional datasets to confirm broader generalizability, but given domain constraints, these are not readily available. Going forward, we intend to apply our model to new data as it becomes accessible, but for now, the Bayesian framework, with its built-in uncertainty quantification and posterior predictive checking, provides us with a robust means of gauging how well the model is likely to generalize beyond the single dataset.

Comment

  1. There are also some grammar and typographical errors in the main text; please check carefully. For example, some of them are as follows:

Response

Thank you; the manuscript has been read carefully and the errors corrected.

Comment

The paper employs the pronoun “we” frequently; please remove it to align with standard practices in academic writing.

Response

Thank you; the use of “we” has been reduced to the extent possible, although the first-person plural is very common in academic writing in statistics and mathematics.

Comment

The detailed description of Section 2 shown in lines 121-126 is not necessary.

Response

Thanks – removed

Comment

Lines 151 and 183: the reference is missing.

Response

Done – thanks

Comment

Line 200: the word “where” should be capitalized.

Response
Thanks – Done.

Comment

Page 14: the symbols in Tables A1 and A2 should be written in Greek letter form.

Response
Thanks – Done

Comment

Page 15: Appendix B is not necessary, and there are also no figures in your paper.

Response

Done – thanks

Reviewer 2 Report

Comments and Suggestions for Authors

In short, the manuscript makes important contributions to the related Bayesian modeling literature, and it is generally well written and technically correct, warranting its eventual journal publication.

Main Points

(1)  However, the posterior predictive methodology for model checking and diagnostics, mentioned in Section 2.5 of the manuscript, needs to be made more rigorous, and needs to be described in more detail. Also Sections 4 and 5 do not describe or discuss, in detail, the results of the posterior predictive methodology for model checking.

In particular, the posterior predictive methodology for model checking and diagnostics, proposed in Section 2.5, uses the data twice: first, to estimate the posterior distribution of the model parameters conditionally on the observed dataset; and then to compare the observed data to posterior predictive distribution of the model (which is based on the posterior distribution of the model parameters, conditional on the observed dataset).
This double use of data will make the results of (posterior predictive) model fit overly optimistic in favor of model fit.
This problem can be solved using the 'sampled posterior predictive p-value' methodology, as described in the following article and the references therein. 

Gosselin F (2011) A New Calibrated Bayesian Internal Goodness-of-Fit Method: Sampled Posterior p-Values as Simple and General p-Values That Allow Double Use of the Data. PLOS ONE 6(3): e14770. https://doi.org/10.1371/journal.pone.0014770
 
(2)  Also, be sure that the (revised) manuscript provides a GitHub (or other) weblink to the software code, so that readers can reproduce all the empirical results and equations presented in the manuscript.


Details
Line 8: What do you mean by "robust posterior estimates"?
The term "robust" has multiple meanings in the statistics field.
Lines 151 and 183:  Typos "?"
Line 200 starts with a set of unnecessary blank spaces. Delete these blank spaces.

Author Response

Comment 1:

(1)  However, the posterior predictive methodology for model checking and diagnostics, mentioned in Section 2.5 of the manuscript, needs to be made more rigorous, and needs to be described in more detail. Also Sections 4 and 5 do not describe or discuss, in detail, the results of the posterior predictive methodology for model checking.

In particular, the posterior predictive methodology for model checking and diagnostics, proposed in Section 2.5, uses the data twice: first, to estimate the posterior distribution of the model parameters conditionally on the observed dataset; and then to compare the observed data to posterior predictive distribution of the model (which is based on the posterior distribution of the model parameters, conditional on the observed dataset).
This double use of data will make the results of (posterior predictive) model fit overly optimistic in favor of model fit.
This problem can be solved using the 'sampled posterior predictive p-value' methodology, as described in the following article and the references therein. 

Gosselin F (2011) A New Calibrated Bayesian Internal Goodness-of-Fit Method: Sampled Posterior p-Values as Simple and General p-Values That Allow Double Use of the Data. PLOS ONE 6(3): e14770. https://doi.org/10.1371/journal.pone.0014770

Response:
Thank you for raising these points. From a Bayesian perspective, posterior predictive checks are naturally conditioned on the observed data; the "double use" of the data is built into the theory. We treat them not as formal frequentist tests but as tools for revealing whether the model can reproduce the salient features of the dataset that informed it. In this sense, the approach is coherent: if a model fails to generate data resembling what was actually observed, it likely fails as a good explanation. The primary goal here is to check in-sample adequacy, and posterior predictive checks (combined with our simulation study) do exactly that. While Gosselin's methodology provides a more formal "calibrated" p-value framework, it remains less commonly adopted, and an extensive literature (e.g., Gelman et al.) supports the effectiveness of standard posterior predictive checks, even with their double use of the data, as long as they are interpreted diagnostically rather than as strict hypothesis tests. The simulation results further confirm that this approach is sound for our models, reinforcing that these checks can reliably flag any substantial misfit.
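As a concrete illustration of this diagnostic use, a posterior predictive check on the overall fail rate could look like the following sketch (placeholder arrays stand in for the observed outcomes and the model's replicated datasets):

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative placeholders: observed binary outcomes and replicated datasets
# drawn from the posterior predictive distribution (one row per draw).
y_obs = rng.integers(0, 2, size=400)
y_rep = rng.integers(0, 2, size=(4000, 400))

# Test statistic: overall fail rate (any discrepancy measure could be used).
T_obs = y_obs.mean()
T_rep = y_rep.mean(axis=1)

# Posterior predictive p-value, read diagnostically rather than as a formal
# hypothesis test: values very close to 0 or 1 flag systematic misfit.
ppp = (T_rep >= T_obs).mean()
print(f"Observed fail rate: {T_obs:.3f}, posterior predictive p-value: {ppp:.3f}")
```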

Comment:
Also, be sure that the (revised) manuscript provides a GitHub (or other) weblink to the software code, so that readers can reproduce all the empirical results and equations presented in the manuscript.

Response

Thank you; a link to the code repository will be added to the manuscript if the paper is accepted.

Reviewer 3 Report

Comments and Suggestions for Authors

The paper introduces a Bayesian State-Space approach for dynamic hierarchical logistic regression to model student risk over time. This approach uses partial pooling and state-space formulations to better account for individual student risk trajectories, offering interpretability and dynamic updates. Key contributions include using Bayesian methods for tracking temporal risk evolution, providing interpretable outputs, and handling sparse data effectively. The model is validated on simulated and real-world datasets and performs better than static approaches.

 

Top 4 Critical Issues:

  1. Poor Description of Datasets: The real-world datasets are not adequately described, lacking critical details about their sources, size, and characteristics. This hampers reproducibility and understanding of how representative the data is for the broader educational context.
  2. Limited Exploration of Model Assumptions: While the Bayesian framework is robust, the paper does not sufficiently validate or justify assumptions such as prior distributions or the AR(1) dynamics for real-world data. Sensitivity analysis on these assumptions appears absent.
  3. Overemphasis on Simulations: The paper relies heavily on simulation studies to demonstrate the model's efficacy, but the real-world application lacks a comprehensive comparison to baseline methods, leaving its practical utility underexplored.
  4. Missing References: These are key papers in predicting students at risk with an emphasis on interpretable results. 
    * "Early Prediction of At-Risk Students in Secondary Education: A Countrywide K-12 Learning Analytics Initiative in Uruguay"
    * "Modeling Engagement in Self-Directed Learning Systems Using Principal Component Analysis"

* "Predicting At-Risk Students in Higher Education" 

 

Author Response

Comment 1:

  1. Poor Description of Datasets: The real-world datasets are not adequately described, lacking critical details about their sources, size, and characteristics. This hampers reproducibility and understanding of how representative the data is for the broader educational context.

Response:
Thank you for highlighting the limited description of the datasets. Section 2.1.1 has been revised to provide a comprehensive overview of the data sources, including their size, characteristics, and any ethical considerations. A detailed table now summarizes key variables, sources, and limitations, ensuring greater transparency and reproducibility.

Comment 2:

2. Limited Exploration of Model Assumptions: While the Bayesian framework is robust, the paper does not sufficiently validate or justify assumptions such as prior distributions or the AR(1) dynamics for real-world data. Sensitivity analysis on these assumptions appears absent.

Response:

Thank you for highlighting the need to justify our modeling assumptions. In the revised manuscript, an expanded section now provides explicit rationales for each prior distribution, discussing their weakly informative nature and referencing standard guidelines (e.g., Gelman’s recommendations). The AR(1) formulation is also motivated by the inertia commonly observed in student outcomes, where performance or risk at one time point typically correlates with the next. This revised discussion clarifies how these choices reflect both practical experience in educational contexts and established Bayesian modeling practices.
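As a small illustration of the inertia argument, the following sketch (with assumed, purely illustrative values for the persistence and innovation parameters) simulates an AR(1) latent risk trajectory passed through a logistic link:

```python
import numpy as np

rng = np.random.default_rng(7)

# Assumed, illustrative values: persistence rho, innovation scale sigma,
# and a baseline log-odds of being at risk.
T, rho, sigma, baseline = 12, 0.8, 0.5, -1.0

eta = np.empty(T)
eta[0] = baseline + rng.normal(0.0, sigma)
for t in range(1, T):
    # Risk at time t keeps a share rho of the previous deviation from the
    # baseline, which is the inertia in student outcomes described above.
    eta[t] = baseline + rho * (eta[t - 1] - baseline) + rng.normal(0.0, sigma)

risk_prob = 1.0 / (1.0 + np.exp(-eta))  # logistic link to at-risk probability
print(np.round(risk_prob, 3))
```

With rho near 1 the trajectory changes slowly from one time point to the next, while rho near 0 makes successive risk values nearly independent, which is why a persistence parameter of this kind captures the correlation between consecutive student outcomes.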

Comment 3:

3. Overemphasis on Simulations: The paper relies heavily on simulation studies to demonstrate the model's efficacy, but the real-world application lacks a comprehensive comparison to baseline methods, leaving its practical utility underexplored.

Response:
In response to the comment about the real-world application of the model and the limited comparison with baseline methods, a Posterior Predictive Check analysis has been introduced. This additional section demonstrates that the model can reproduce observed fail rates in an actual dataset, establishing a measure of practical utility beyond the theoretical support from the simulation studies. The Posterior Predictive Check analysis provides essential evidence that the model captures real-world data patterns, and it lays the groundwork for future direct comparisons with baseline methods.

Comment:

4. Missing References: These are key papers in predicting students at risk with an emphasis on interpretable results. 
* "Early Prediction of At-Risk Students in Secondary Education: A Countrywide K-12 Learning Analytics Initiative in Uruguay"
* "Modeling Engagement in Self-Directed Learning Systems Using Principal Component Analysis" * "Predicting At-Risk Students in Higher Education" 

Response:
Thank you for highlighting these important references, which indeed offer valuable insights into student risk prediction and interpretable modeling. Both “Early Prediction of At-Risk Students in Secondary Education: A Countrywide K-12 Learning Analytics Initiative in Uruguay” and “Modeling Engagement in Self-Directed Learning Systems Using Principal Component Analysis” have been incorporated into the revised manuscript.

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

The authors have revised the paper in response to the previous comments, and I now suggest accepting the paper in this version.

Reviewer 3 Report

Comments and Suggestions for Authors

The changes have been implemented very well.
