Integrating Ensemble Learning with Item Response Theory to Improve the Interpretability of Student Learning Outcome Tracing
Abstract
1. Introduction
- (1) Compared with DeepIRT and the earlier SLO tracing work at PVAMU [2,3], this study integrates cognitive theories with ensemble deep learning to develop a novel, interpretable SLO tracing framework. DeepIRT focuses on a single IRT-based DKT model, and the SLO implementation at PVAMU likewise employs a single DKT model; this paper instead constructs multiple IRT-based interpretable DKT models for SLO tracing. Their outputs are aggregated with a bagging ensemble to improve generalizability and to give a more comprehensive representation of students’ knowledge states. In addition, visualizations depict the relationships among student ability, item difficulty, and predicted probability, which improves the interpretability of the SLO tracing process. These visualizations serve as transparent, actionable decision-support tools that help educators and administrators understand student learning patterns.
- (2) Comprehensive validation on a PVAMU dataset demonstrates that the proposed method predicts student learning outcomes effectively. The dataset spans multiple colleges, including the College of Engineering and the College of Arts and Science, and covers several departments such as Electrical and Computer Engineering, Civil Engineering, and Computer Science. Experimental results show that the proposed approach achieves the highest AUC in every reported setting and the highest accuracy and recall in most of them.
2. Related Work
3. Task Definition
4. Methods
4.1. Item Response Theory (IRT)
4.2. Ensemble Learning
4.3. Proposed Method
4.3.1. IRT-Based DKT
- 1PL-based DKT implements an interpretable model based on the one-parameter logistic (1PL) IRT model. Each item j is characterized by a single parameter, its difficulty βj. The model assumes that all items share the same discrimination power, meaning they are equally effective at distinguishing between learners of different ability levels. The probability that student i, with ability θi, answers item j correctly is given by:

  $$P_{ij} = P(r_{ij} = 1 \mid \theta_i, \beta_j) = \frac{1}{1 + e^{-(\theta_i - \beta_j)}}$$

  where
  - θi is student ability;
  - βj is item difficulty;
  - rij ∈ {0,1} indicates the response (1 = correct, 0 = incorrect);
  - Pij is the predicted probability that student i correctly answers item j.
- 2PL-based DKT employs the two-parameter logistic (2PL) IRT model to build an interpretable DKT model. In this model, each item j is characterized by a difficulty parameter βj and a discrimination parameter αj, which reflects how well the item differentiates between students of different ability levels:

  $$P_{ij} = \frac{1}{1 + e^{-\alpha_j(\theta_i - \beta_j)}}$$

- DeepIRT [22] introduces a variant of IRT with a weight λ applied to the ability parameter:

  $$P_{ij} = \frac{1}{1 + e^{-(\lambda\theta_i - \beta_j)}}$$
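The three IRT variants above differ only in how ability and difficulty are combined before the logistic function. The following is a minimal sketch of those output layers, assuming scalar inputs; the function names (`p_1pl`, `p_2pl`, `p_deepirt`) are hypothetical and chosen for illustration, and in the actual models θi and βj would be produced by the DKT network rather than passed in by hand.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function, split by sign to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def p_1pl(theta: float, beta: float) -> float:
    """1PL: probability of a correct response from ability and difficulty."""
    return sigmoid(theta - beta)


def p_2pl(theta: float, beta: float, alpha: float) -> float:
    """2PL: the item discrimination alpha scales the ability-difficulty gap."""
    return sigmoid(alpha * (theta - beta))


def p_deepirt(theta: float, beta: float, lam: float) -> float:
    """DeepIRT-style variant: the weight lam is applied to ability only."""
    return sigmoid(lam * theta - beta)


if __name__ == "__main__":
    theta, beta = 1.0, 0.0  # illustrative values, not taken from the paper
    print(p_1pl(theta, beta))
    print(p_2pl(theta, beta, alpha=2.0))
    print(p_deepirt(theta, beta, lam=3.0))
```

When ability equals difficulty, the 1PL and 2PL forms both return 0.5, and setting αj = 1 reduces 2PL to 1PL, which is a quick sanity check on any implementation.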
4.3.2. Bagging DKT
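The introduction describes aggregating the outputs of several IRT-based DKT models with a bagging ensemble. The sketch below illustrates the general bagging pattern only: each member is trained on a bootstrap resample, and member probabilities are averaged. The helper names (`bootstrap_sample`, `fit_bagging`, `bagged_probability`), the averaging rule, and the 0.5 decision threshold are assumptions made for illustration; the paper's exact aggregation rule is not reproduced here.

```python
import random
from typing import Callable, List, Sequence

Model = Callable[[object], float]  # maps an input to P(correct)


def bootstrap_sample(data: Sequence, rng: random.Random) -> List:
    """Draw len(data) items with replacement (one bagging replicate)."""
    n = len(data)
    return [data[rng.randrange(n)] for _ in range(n)]


def fit_bagging(train_fn: Callable[[List], Model], data: Sequence,
                n_models: int, seed: int = 0) -> List[Model]:
    """Train n_models members, each on its own bootstrap resample."""
    rng = random.Random(seed)
    return [train_fn(bootstrap_sample(data, rng)) for _ in range(n_models)]


def bagged_probability(models: Sequence[Model], x: object) -> float:
    """Aggregate by averaging the members' predicted probabilities."""
    probs = [m(x) for m in models]
    return sum(probs) / len(probs)


def bagged_label(models: Sequence[Model], x: object,
                 threshold: float = 0.5) -> int:
    """Turn the averaged probability into a pass (1) / fail (0) label."""
    return int(bagged_probability(models, x) >= threshold)


if __name__ == "__main__":
    # Stand-ins for trained 1PL-, 2PL-, and DeepIRT-based DKT models.
    members = [lambda x: 0.6, lambda x: 0.8, lambda x: 0.7]
    print(bagged_probability(members, None))
    print(bagged_label(members, None))
```

Averaging probabilities rather than taking a majority vote keeps a continuous output, which is what AUC is computed from.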
5. Experiment
5.1. Dataset
5.2. Experiment Setup
5.3. Evaluation Metrics
- TP (True Positive): correct predictions of passing courses.
- TN (True Negative): correct predictions of failing courses.
- FP (False Positive): incorrect predictions of passing courses.
- FN (False Negative): incorrect predictions of failing courses.
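The four counts above determine accuracy, precision, recall, and F1-score through their standard definitions. A minimal sketch follows; the function names are hypothetical, and returning 0.0 for an empty denominator is a convention chosen here, not something the paper specifies.

```python
from typing import Dict


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    denom = precision + recall
    return 2.0 * precision * recall / denom if denom else 0.0


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Compute accuracy, precision, recall, and F1 from confusion counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1_score(precision, recall),
    }


if __name__ == "__main__":
    # Illustrative counts, not taken from the paper's experiments.
    print(classification_metrics(tp=80, tn=10, fp=5, fn=5))
    # Precision and recall reported for 1PL-based DKT on University data.
    print(round(f1_score(0.8476, 0.9302), 4))
```

As a consistency check, the precision (0.8476) and recall (0.9302) reported for the 1PL-based DKT on University data give an F1-score that rounds to the reported 0.8870.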
6. Results
6.1. Performance Comparison of SLO Tracing Across Different Training and Testing Settings
6.2. Visualization of the Interpretability of the Proposed Method
6.3. Discussion
6.4. Practical Implications
6.5. Limitations
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations

| Abbreviation | Definition |
|---|---|
| ACC | Accuracy |
| AUC | Area Under the Curve |
| CEE | Civil Engineering & Environmental |
| CHE | Chemical Engineering |
| CSC | Computer Science |
| COE | College of Engineering |
| COAS | College of Arts & Science |
| DeepIRT | Deep Item Response Theory |
| DKT | Deep Knowledge Tracing |
| DKVMN | Dynamic Key Value Memory Network |
| ECE | Electrical & Computer Engineering |
| EDM | Educational Data Mining |
| FP | False Positive |
| FN | False Negative |
| HBCU | Historically Black Colleges and Universities |
| ISKT | Intention-Aware Knowledge Tracing for Learning Stage |
| IRT | Item Response Theory |
| KCDKT | Knowledge Component-integrated Deep Knowledge Tracing |
| KQN | Knowledge Query Network |
| LSTM | Long Short-Term Memory |
| KCs | Knowledge Components |
| MCE | Mechanical Engineering |
| MIRT | Multidimensional Item Response Theory |
| PL | Parameter Logistic Model (1PL |
| PVAMU | Prairie View A&M University |
| RCKT | Response Influence based Counterfactual Reasoning |
| SAKT | Self Attentive Knowledge Tracing |
| SLO | Student Learning Outcome |
| STEM | Science, Technology, Engineering, and Mathematics |
| TN | True Negative |
| TP | True Positive |
References
1. Song, X.; Li, J.; Cai, T.; Yang, S.; Yang, T.; Liu, C. A survey on deep learning-based knowledge tracing. Knowl. Based Syst. 2022, 258, 110036.
2. Kuo, M.-M.; Li, X.; Obiomon, P.; Qian, L.; Dong, X. Improving student learning outcome tracing at HBCUs using tabular generative AI and deep knowledge tracing. IEEE Access 2025, 13, 82407.
3. Kuo, M.-M.; Li, X.; Obiomon, P.; Qian, L.; Dong, X. Tracing student learning outcome at Historically Black Colleges and Universities via deep knowledge tracing. IEEE Access 2025, 13, 61340–61349.
4. Huang, C.-Q.; Huang, Q.-H.; Huang, X.; Wang, H.; Li, M.; Lin, K.-J.; Chang, Y. XKT: Towards explainable knowledge tracing model with cognitive learning theories for questions of multiple knowledge concepts. IEEE Trans. Knowl. Data Eng. 2024, 36, 7308–7325.
5. Lu, Y.; Wang, D.; Chen, P.; Meng, Q.; Yu, S. Interpreting deep learning models for knowledge tracing. Int. J. Artif. Intell. Educ. 2023, 33, 519–542.
6. Li, Q.; Yuan, X.; Liu, S.; Gao, L.; Wei, T.; Shen, X.; Sun, J. A genetic causal explainer for deep knowledge tracing. IEEE Trans. Evol. Comput. 2023, 28, 861–875.
7. Cai, L.; Choi, K.; Hansen, M.; Harrell, L. Item response theory. Annu. Rev. Stat. Its Appl. 2016, 3, 297–321.
8. Tsutsumi, E.; Nishio, T.; Ueno, M. Deep-IRT with a temporal convolutional network for reflecting students’ long-term history of ability data. In Artificial Intelligence in Education, Proceedings of the International Conference on Artificial Intelligence in Education, Recife, Brazil, 8–12 July 2024; Springer: Cham, Switzerland, 2024; pp. 250–264.
9. Gu, W.; Liu, Z.; Liu, S. Interpretable deep knowledge tracing with graph relationship information. In Proceedings of the 2023 IEEE 5th International Conference on Civil Aviation Safety and Information Technology (ICCASIT), Dali, China, 11–13 October 2023; IEEE: New York City, NY, USA, 2023; pp. 290–295.
10. Piech, C.; Bassen, J.; Huang, J.; Ganguli, S.; Sahami, M.; Guibas, L.J.; Sohl-Dickstein, J. Deep knowledge tracing. In Proceedings of the Advances in Neural Information Processing Systems 28, Montreal, QC, Canada, 7–12 December 2015.
11. Xu, F.; Chen, K.; Zhong, M.; Liu, L.; Liu, H.; Luo, X.; Zheng, L. DKVMN&MRI: A new deep knowledge tracing model based on DKVMN incorporating multi-relational information. PLoS ONE 2024, 19, e0312022.
12. Lu, Y.; Tong, L.; Cheng, Y. Advanced knowledge tracing: Incorporating process data and curricula information via an attention-based framework for accuracy and interpretability. J. Educ. Data Min. 2024, 16, 58–84.
13. Yang, Q.; Chi, J.; Chen, W.; Wu, Z.; Huang, Y.; Zhang, J. Learning intention-aware knowledge tracing for learning stage. Discov. Comput. 2025, 28, 98.
14. Chen, J.; Liu, Z.; Huang, S.; Liu, Q.; Luo, W. Improving interpretability of deep sequential knowledge tracing models with question-centric cognitive representations. In Proceedings of the 37th AAAI Conference on Artificial Intelligence, Washington, DC, USA, 7–14 February 2023; pp. 14196–14204.
15. Cui, J.; Yu, M.; Jiang, B.; Zhou, A.; Wang, J.; Zhang, W. Interpretable knowledge tracing via response influence-based counterfactual reasoning. In Proceedings of the 2024 IEEE 40th International Conference on Data Engineering (ICDE), Utrecht, The Netherlands, 16–19 April 2024; IEEE: New York City, NY, USA, 2024; pp. 1103–1116.
16. Hambleton, R.K.; Swaminathan, H.; Rogers, H.J. Fundamentals of Item Response Theory; Sage: Thousand Oaks, CA, USA, 1991; Volume 2.
17. Liu, F.; Bu, C.; Zhang, H.; Wu, L.; Yu, K.; Hu, X. FDKT: Towards an interpretable deep knowledge tracing via fuzzy reasoning. ACM Trans. Inf. Syst. 2024, 42, 139.
18. Frick, S.; Krivosija, A.; Munteanu, A. Scalable learning of item response theory models. In Proceedings of the International Conference on Artificial Intelligence and Statistics, Valencia, Spain, 2–4 May 2024; PMLR: Cambridge, MA, USA, 2024; pp. 1234–1242.
19. Dong, X.; Yu, Z.; Cao, W.; Shi, Y.; Ma, Q. A survey on ensemble learning. Front. Comput. Sci. 2020, 14, 241–258.
20. Mienye, D.; Sun, Y. A survey of ensemble learning: Concepts, algorithms, applications, and prospects. IEEE Access 2022, 10, 99129–99149.
21. Ribeiro, M.H.D.M.; dos Santos Coelho, L. Ensemble approach based on bagging, boosting and stacking for short-term prediction in agribusiness time series. Appl. Soft Comput. 2020, 86, 105837.
22. Yeung, C.-K. Deep-IRT: Make deep learning-based knowledge tracing explainable using item response theory. arXiv 2019, arXiv:1904.11738.
23. Zhang, J.; Shi, X.; King, I.; Yeung, D.-Y. Dynamic key-value memory networks for knowledge tracing. In Proceedings of the 26th International Conference on World Wide Web, WWW ’17, Perth, Australia, 3–7 April 2017; International World Wide Web Conferences Steering Committee: Geneva, Switzerland, 2017; pp. 765–774.
24. Wang, J.; Jing, X.; Yan, Z.; Fu, Y.; Pedrycz, W.; Yang, L.T. A survey on trust evaluation based on machine learning. ACM Comput. Surv. 2020, 53, 107.
25. Gervet, T.; Koedinger, K.; Schneider, J.; Mitchell, T. When is deep learning the best approach to knowledge tracing? J. Educ. Data Min. 2020, 12, 31–54.

| Training Dataset | # of Records | # of Students | # of KCs |
|---|---|---|---|
| COE | 36,026 | 2036 | 165 |
| COE + COAS | 101,529 | 6179 | 206 |
| University | 24,964 | 16,549 | 224 |

| Testing Dataset | # of Records | # of Students | # of KCs |
|---|---|---|---|
| CEE | 1043 | 130 | 57 |
| CHE | 1437 | 147 | 57 |
| CSC | 2461 | 284 | 67 |
| ECE | 2173 | 244 | 75 |
| MCE | 3337 | 387 | 83 |
| COE | 10,451 | 1182 | 125 |
| University | 79,305 | 9102 | 200 |

| Key Configuration | Setting |
|---|---|
| Batch Size | 32 |
| Number of Epochs | 50 |
| Learning Rate | 0.003 |
| Sequence Length | 200 |
| Optimizer | Adam |

Training: University Data, Testing: University Data

| Model | AUC (%) | ACC (%) | Precision | Recall | F1-Score |
|---|---|---|---|---|---|
| 1PL-based DKT | 62.79 | 80.45 | 0.8476 | 0.9302 | 0.8870 |
| 2PL-based DKT | 62.06 | 79.50 | 0.8459 | 0.9188 | 0.8808 |
| DeepIRT | 62.24 | 79.05 | 0.8462 | 0.9118 | 0.8778 |
| Proposed Method | 63.97 | 80.56 | 0.8465 | 0.9336 | 0.8879 |

Training: COE Data, Testing: COE Data

| Model | AUC (%) | ACC (%) | Precision | Recall | F1-Score |
|---|---|---|---|---|---|
| 1PL-based DKT | 65.70 | 77.75 | 0.8251 | 0.9176 | 0.8689 |
| 2PL-based DKT | 64.17 | 76.50 | 0.8215 | 0.9039 | 0.8607 |
| DeepIRT | 65.48 | 77.00 | 0.8223 | 0.9104 | 0.8641 |
| Proposed Method | 66.54 | 78.46 | 0.8264 | 0.9265 | 0.8736 |

Training: COAS Data, Testing: COE Data

| Model | AUC (%) | ACC (%) | Precision | Recall | F1-Score |
|---|---|---|---|---|---|
| 1PL-based DKT | 62.84 | 76.57 | 0.8382 | 0.8778 | 0.8575 |
| 2PL-based DKT | 62.46 | 75.18 | 0.8393 | 0.8547 | 0.8469 |
| DeepIRT | 61.13 | 74.41 | 0.8349 | 0.8496 | 0.8421 |
| Proposed Method | 63.14 | 76.34 | 0.8359 | 0.8778 | 0.8563 |

Testing CEE data

| Model | AUC (%) | ACC (%) | Precision | Recall | F1-Score |
|---|---|---|---|---|---|
| 1PL-based DKT | 55.32 | 69.99 | 0.7780 | 0.8406 | 0.8081 |
| 2PL-based DKT | 57.80 | 68.26 | 0.7807 | 0.8036 | 0.7920 |
| DeepIRT | 55.51 | 68.74 | 0.7707 | 0.8316 | 0.8000 |
| Proposed Method | 57.98 | 71.14 | 0.7773 | 0.8635 | 0.8181 |

Testing CHE data

| Model | AUC (%) | ACC (%) | Precision | Recall | F1-Score |
|---|---|---|---|---|---|
| 1PL-based DKT | 56.57 | 79.61 | 0.8770 | 0.8876 | 0.8823 |
| 2PL-based DKT | 54.77 | 71.33 | 0.8747 | 0.7785 | 0.8238 |
| DeepIRT | 56.12 | 83.23 | 0.8849 | 0.9256 | 0.9048 |
| Proposed Method | 58.41 | 84.62 | 0.8791 | 0.9523 | 0.9142 |

Testing CSC data

| Model | AUC (%) | ACC (%) | Precision | Recall | F1-Score |
|---|---|---|---|---|---|
| 1PL-based DKT | 50.21 | 81.39 | 0.8489 | 0.9485 | 0.8960 |
| 2PL-based DKT | 50.25 | 83.38 | 0.8565 | 0.9649 | 0.9075 |
| DeepIRT | 55.71 | 83.26 | 0.8512 | 0.9716 | 0.9075 |
| Proposed Method | 56.14 | 83.42 | 0.8518 | 0.9731 | 0.9084 |

Testing ECE data

| Model | AUC (%) | ACC (%) | Precision | Recall | F1-Score |
|---|---|---|---|---|---|
| 1PL-based DKT | 62.54 | 79.52 | 0.8082 | 0.9746 | 0.8837 |
| 2PL-based DKT | 60.11 | 79.11 | 0.8116 | 0.9614 | 0.8801 |
| DeepIRT | 60.86 | 78.97 | 0.8045 | 0.9729 | 0.8807 |
| Proposed Method | 64.34 | 79.34 | 0.8044 | 0.9792 | 0.8832 |

Testing MCE data

| Model | AUC (%) | ACC (%) | Precision | Recall | F1-Score |
|---|---|---|---|---|---|
| 1PL-based DKT | 61.31 | 71.32 | 0.7969 | 0.8407 | 0.8182 |
| 2PL-based DKT | 61.93 | 71.35 | 0.7909 | 0.8521 | 0.8204 |
| DeepIRT | 62.23 | 71.89 | 0.7957 | 0.8528 | 0.8233 |
| Proposed Method | 63.71 | 73.45 | 0.7974 | 0.8770 | 0.8353 |

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Onyeke, C.; Qian, L.; Obiomon, P.; Dong, X. Integrating Ensemble Learning with Item Response Theory to Improve the Interpretability of Student Learning Outcome Tracing. Appl. Sci. 2025, 15, 12594. https://doi.org/10.3390/app152312594

