MOOC Video Personalized Classification Based on Cluster Analysis and Process Mining
1. Introduction
- an approach is proposed for the personalized classification of MOOC videos by difficulty and importance for students at different knowledge levels;
- business process modeling is applied to MOOC video learning behaviors, and process mining is used to mine the video watching behaviors of students;
- an approach for measuring the difficulty and importance of MOOC videos based on a process model is proposed, by which these measures can be obtained automatically for students at different knowledge levels.
2. Related Works
3. MOOC Video Personalized Classification Framework
3.1. Student Clustering
3.2. Video Learning Behavior Process Model Mining
3.3. MOOC Video Personalized Classification
4. MOOC Video Personalized Classification
4.1. Student Clustering
Algorithm 1 Student clustering based on question answering vectors by K-means clustering.
Input: Student set S = {s1, s2, …, sn}; question answering vector set SV = {sv1, sv2, …, svn} //svi represents the question answering vector of the i-th student; number of clusters K; maximum number of iterations MT1; maximum number of times the cluster centers may remain unchanged MT2
Output: K clusters: C1, C2, …, CK
init: C1 = C2 = … = CK = {}, current number of iterations CT1 = 0, current number of times the cluster centers have remained unchanged CT2 = 0
CV = {cv1, cv2, …, cvK} = Random(SV, K) //randomly select the question answering vectors of K students as the K initial cluster centers
while (CT1 < MT1 and CT2 < MT2) //stop iterating when CT1 reaches MT1 or CT2 reaches MT2
    for each svi∈SV and cvj∈CV: //traverse question answering vectors and cluster centers
        if (svi nearest to cvj) then: //find the nearest cluster center for each student
            add si to Cj //add student si to the nearest cluster
        end if
    end for
    for each cvi∈CV, each sj∈Ci: //traverse cluster centers and the students that belong to each cluster
        cvi = avg(svj) //take the average of the question answering vectors of all students in cluster i as the new center of cluster i
    end for
    CT1 = CT1+1 //add 1 to the current number of iterations
    if (unchanged(CV)) then: //if all cluster centers are unchanged
        CT2 = CT2+1 //add 1 to the current number of times the cluster centers have remained unchanged
    end if
end while
output C1, C2, …, CK
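The steps of Algorithm 1 can be sketched in plain Python as follows. This is a minimal illustration rather than the authors' implementation: the names `cluster_students` and `squared_distance`, the `seed` parameter, and the default values of `mt1` and `mt2` are assumptions introduced here. The sketch also assumes squared Euclidean distance, that clusters are rebuilt from scratch on every pass, and that an empty cluster keeps its previous center; none of these choices is stated in the pseudocode.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cluster_students(sv, k, mt1=100, mt2=3, seed=0):
    """Cluster question answering vectors; returns (clusters, centers).

    clusters[j] holds the indices of the students assigned to center j.
    mt1 caps the number of iterations (MT1); mt2 is the number of passes
    with unchanged centers after which iteration stops (MT2).
    """
    rng = random.Random(seed)
    # Pick the vectors of K random students as the initial centers.
    centers = [list(v) for v in rng.sample(sv, k)]
    clusters = [[] for _ in range(k)]
    dim = len(sv[0])
    ct1 = ct2 = 0
    while ct1 < mt1 and ct2 < mt2:
        # Assumption: assignments are recomputed from scratch on each pass.
        clusters = [[] for _ in range(k)]
        for i, v in enumerate(sv):
            nearest = min(range(k), key=lambda j: squared_distance(v, centers[j]))
            clusters[nearest].append(i)
        new_centers = []
        for j, members in enumerate(clusters):
            if members:
                # New center = component-wise mean of the member vectors.
                new_centers.append(
                    [sum(sv[i][d] for i in members) / len(members) for d in range(dim)]
                )
            else:
                # Assumption: an empty cluster keeps its previous center.
                new_centers.append(centers[j])
        ct1 += 1
        if new_centers == centers:
            ct2 += 1
        centers = new_centers
    return clusters, centers
```

For example, calling `cluster_students(vectors, 2)` on four binary answer vectors, two mostly correct and two mostly incorrect, returns two index lists that separate the two groups.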
4.2. VLBP Model Mining
Algorithm 2 VLBP model mining based on heuristic mining.
Input: Learning sequence set LSS = {LS1, LS2, …, LSn}; threshold of the number of times of following directly Tf; threshold of dependency Td
Output: VLBP = (V, E)
init: matrix of the number of times of following directly F[][] = 0, dependency matrix D[][], set of all videos Vall = {}, set of target videos V = {}, set of order relations E = {}
for each LS∈LSS and each vi∈LS: //traverse the videos in LSS
    if (vi not belong to Vall) then:
        add vi to Vall //record the videos that appear in LSS
    end if
end for
for each LS∈LSS and each vi, vi+1∈LS: //traverse neighboring videos in each LS in LSS
    F[vi][vi+1] = F[vi][vi+1] + 1 //count the times that vi+1 directly follows vi
end for
for each vi, vj∈Vall: //traverse every pair of videos
    if (vi == vj) then:
        D[vi][vj] = F[vi][vj]/(F[vi][vj]+1) //calculate the dependency between vi and itself
    end if
    if (vi != vj) then:
        D[vi][vj] = (F[vi][vj]-F[vj][vi])/(F[vi][vj]+F[vj][vi]+1) //calculate the dependency between vi and vj
    end if
    if (F[vi][vj] >= Tf) and (D[vi][vj] >= Td) then: //both the number of times of following directly and the dependency are greater than or equal to their thresholds
        if (vi, vj not belong to V) then:
            add vi, vj to V //record the videos that meet the conditions
        end if
        add (vi, vj) to E //record the order relation between the videos
    end if
end for
output (V, E)
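Algorithm 2 can be sketched in Python as follows. This is an illustrative reading of the pseudocode, not the authors' code: the function name `mine_vlbp`, the representation of a learning sequence as a list of video identifiers, and the use of a dictionary keyed by video pairs in place of the matrices F and D are assumptions made here.

```python
from collections import defaultdict


def mine_vlbp(lss, tf, td):
    """Return (V, E) mined from a list of learning sequences.

    lss: list of learning sequences, each a list of video ids.
    tf:  threshold on the number of times of following directly (Tf).
    td:  threshold on the dependency (Td).
    """
    f = defaultdict(int)  # f[(a, b)]: number of times b directly follows a
    v_all = []            # videos in order of first appearance
    for ls in lss:
        for v in ls:
            if v not in v_all:
                v_all.append(v)
        for a, b in zip(ls, ls[1:]):
            f[(a, b)] += 1

    videos, edges = [], []
    for a in v_all:
        for b in v_all:
            ab, ba = f.get((a, b), 0), f.get((b, a), 0)
            if a == b:
                d = ab / (ab + 1)              # dependency of a video on itself
            else:
                d = (ab - ba) / (ab + ba + 1)  # dependency between two videos
            if ab >= tf and d >= td:
                for v in (a, b):
                    if v not in videos:
                        videos.append(v)
                edges.append((a, b))
    return videos, edges
```

With the hypothetical sequences `[["v1", "v2", "v3"], ["v1", "v2", "v2", "v3"], ["v1", "v3"]]`, Tf = 2 and Td = 0.5, only the relations (v1, v2) and (v2, v3) are kept: each occurs twice and has a dependency of 2/3, while the self-loop on v2 and the jump from v1 to v3 occur only once and fall below Tf.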
4.3. Video Classification Based on VLBP
4.3.1. VLBP Structures for Video Difficulty and Importance Measure
4.3.2. Video Importance and Difficulty Measure based on VLBP
4.3.3. MOOC Video Classification
5. Experiment and Evaluation
5.1. Dataset
5.2. Experimental Procedures
5.2.1. Student Clustering
5.2.2. VLBP Model Mining and Video Classification
5.3. Experiment Analysis and Verification
5.3.1. Difficulty and Importance Analysis of Videos
5.3.2. Effectiveness of Video Personalized Classification
6. Conclusions
Author Contributions
Conflicts of Interest
Appendix A. Dataset Used in the Experiments
Video Name | Whether It Appears | Self-Loop | Short-Loop | Long-Loop | Skip | Difficulty | Importance |
v1 | 1 | Self-Loop1 |  |  |  | 2 | 2 |
v2 | 1 |  | Short-Loop1 |  |  | 2 | 2 |
v3 | 1 |  | Short-Loop1 |  |  | 2 | 2 |
v4 | 1 |  |  | Long-Loop1 |  | 1 | 2 |
v5 | 1 |  |  | Long-Loop1 |  | 1 | 2 |
v6 | 1 |  | Short-Loop2 | Long-Loop1 |  | 2 | 3 |
v7 | 1 |  | Short-Loop2 | Long-Loop1 |  | 2 | 3 |
v8 | 1 |  |  | Long-Loop1 |  | 1 | 2 |
v9 | 1 |  |  | Long-Loop1 |  | 1 | 2 |
v10 | 1 |  |  |  |  | 1 | 1 |
v11 | 1 |  |  |  | Skip1 | 1 | 0 |
v12 | 1 |  |  |  |  | 1 | 1 |
v13 | 0 |  |  |  |  |  |  |
Classify by difficulty | Classification 1 (D = 1) | Classification 2 (D = 2) |
Videos | v4, v5, v8, v9, v10, v11, v12 | v1, v2, v3, v6, v7 |
Classify by importance | Classification 1 (I = 0) | Classification 2 (I = 1) | Classification 3 (I = 2) | Classification 4 (I = 3) |
Videos | v11 | v10, v12 | v1, v2, v3, v4, v5, v8, v9 | v6, v7 |
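The grouping step that turns per-video measures into the classifications above can be sketched as follows. The (difficulty, importance) pairs are copied from the example table for v1 to v12; v13 is left out because it does not appear in the model and has no measures. The function name `classify` and the dictionary layout are assumptions for illustration only, and the sketch does not show how the measures themselves are derived from the VLBP structures.

```python
from collections import defaultdict

# (difficulty, importance) for each video that appears in the example model.
measures = {
    "v1": (2, 2), "v2": (2, 2), "v3": (2, 2), "v4": (1, 2),
    "v5": (1, 2), "v6": (2, 3), "v7": (2, 3), "v8": (1, 2),
    "v9": (1, 2), "v10": (1, 1), "v11": (1, 0), "v12": (1, 1),
}


def classify(measures, index):
    """Group videos by one measure: index 0 = difficulty, 1 = importance."""
    groups = defaultdict(list)
    for video, values in measures.items():
        groups[values[index]].append(video)
    return dict(sorted(groups.items()))


by_difficulty = classify(measures, 0)
by_importance = classify(measures, 1)
```

Here `by_difficulty` maps each difficulty value to its list of videos and `by_importance` does the same for importance, reproducing the two classification rows of the example.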
Cluster | Number of Students | Correct Number of Answered Questions | Correct Rate of Answered Questions | Knowledge Level |
1 | 12 | 16.0833 | 0.5092 | High |
2 | 56 | 11.3214 | 0.4818 | Middle |
3 | 26 | 8.8077 | 0.3058 | Low |
4 | 2 | 2.5 | 0.875 | Poor |
overall mean | 96 | 11.0521 | 0.4457 |
Video Name | Whether It Appears | Self-Loop | Short-Loop | Long-Loop | Skip | D | I |
1.1 JDK installation | 1 |  | Short-Loop1 |  |  | 2 | 2 |
1.2 Path configuration | 1 |  | Short-Loop1 |  |  | 2 | 2 |
1.3 JAVA_HOME environment variable configuration | 1 |  | Short-Loop1 |  |  | 2 | 2 |
1.4 classpath environment variable configuration | 1 |  |  |  |  | 1 | 1 |
Classify by difficulty | Classification 1 (D = 1) | Classification 2 (D = 2) |
Videos | v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v17, v18, v19, v20, v21, v22, v23, v24, v25, v26, v27, v28, v29, v30, v31, v32, v33, v34, v39, v40 | v1, v2, v3, v16, v35, v36, v37, v38 |
Classify by importance | Classification 1 (I = 1) | Classification 2 (I = 2) | Classification 3 (I = 3) |
Videos | v4, v17, v19, v20, v21, v22, v23, v24, v25 | v1, v2, v3, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v18, v26, v27, v28, v29, v30, v31, v32, v33, v34, v39, v40 | v35, v36, v37, v38 |
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Zhang, F.; Liu, D.; Liu, C. MOOC Video Personalized Classification Based on Cluster Analysis and Process Mining. Sustainability 2020, 12, 3066.