Educational Data Mining: A Foundational Overview
Definition
1. Evolution of EDM
2. The Significance of EDM in Education
3. EDM Literature Reviews
4. EDM Data and Tools
- Micro-level data in education are mainly generated through interactions between students and learning platforms, such as MOOCs, simulations, and games. These data capture detailed learner actions and contexts, enabling real-time interventions such as feedback or skill adjustment. They are validated through real-time observations or retrospective coding [23,24]. Common methodologies, including Bayesian Knowledge Tracing and Performance Factors Analysis, are used to assess students’ knowledge and predict outcomes [25].
- Meso-level data are mainly derived from texts that students write on platforms such as learning management systems (LMSs) and social media. Natural Language Processing (NLP) helps examine students’ cognitive, social, behavioral, and emotional processes in these texts. Such analysis supports automated grading and feedback, improves course design, and enhances student participation, though challenges remain regarding tool reliability and contextual factors [16].
- Macro-level data are collected over longer periods and include demographics, course enrollments, and academic records. These data are primarily used for institutional decision-making, supporting early warning systems that identify at-risk students and guidance systems that recommend courses [26,27]. Macro-level data are also utilized for administrative analyses to assess curriculum effectiveness and patterns of student success or dropout [28].
- Demographic data describe students’ backgrounds (e.g., age, gender, socio-economic status, and educational history). They are generally used to uncover regularities and relationships between sociodemographic features and students’ academic success. For example, researchers may investigate how the poverty rate affects the availability of educational resources, or what role demographic factors play in student performance [29]. This information is crucial for building personalized learning experiences that highlight the connection between students and educators.
- Interaction data are generated by students’ interactions with educational technologies such as learning management systems, online courses, and educational software [30]. These data record how often, for how long, and in what manner students engage with digital learning environments. Interaction data are crucial for understanding how students actually use e-learning, which in turn helps in designing more effective instructional materials and interventions [31]. For example, analyzing LMS clickstream data can show educators which study resources matter most to students, so that content delivery can be adjusted accordingly.
- Performance data comprise grades, scores, and other assessment results that indicate how well a student performed. These data can reveal the effects of educational programs on students and show where students need the most help [23]. Performance data can be collected from traditional assessments, such as exams and quizzes, as well as from more dynamic sources such as real-time analytics from online learning platforms [31]. Interpreting student performance data requires high-quality performance measures that can support decision-making and the prediction of student achievement.
- Psychometric data measure students’ cognitive abilities, personality traits, and emotional states. This kind of data is typically gathered through questionnaires, psychological assessments, and observational studies [30]. Psychometric data form a basis for understanding the latent variables that drive learning, e.g., motivation, self-efficacy, and stress levels [29]. Combined with the other types of educational data, psychometric data allow researchers to build models of students and of the learning process.
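Bayesian Knowledge Tracing, mentioned above as a common method for micro-level data, can be illustrated with a minimal sketch of its standard update rule. The parameter values and the names `BKTParams`, `predict_correct`, `update`, and `trace` are illustrative assumptions, not taken from the cited studies; in practice the four parameters are fitted per skill from logged responses.

```python
from dataclasses import dataclass


@dataclass
class BKTParams:
    p_init: float   # P(L0): skill already known before any practice
    p_learn: float  # P(T): skill learned at a practice opportunity
    p_slip: float   # P(S): wrong answer although the skill is known
    p_guess: float  # P(G): correct answer although the skill is unknown


def predict_correct(p_known: float, params: BKTParams) -> float:
    """Probability that the next response is correct."""
    return p_known * (1 - params.p_slip) + (1 - p_known) * params.p_guess


def update(p_known: float, correct: bool, params: BKTParams) -> float:
    """Bayesian update of the mastery estimate after one observed response."""
    if correct:
        evidence = p_known * (1 - params.p_slip)
        total = evidence + (1 - p_known) * params.p_guess
    else:
        evidence = p_known * params.p_slip
        total = evidence + (1 - p_known) * (1 - params.p_guess)
    posterior = evidence / total
    # The student may also learn the skill during this opportunity.
    return posterior + (1 - posterior) * params.p_learn


def trace(responses: list[bool], params: BKTParams) -> list[float]:
    """Mastery estimate after each response in a sequence."""
    p_known = params.p_init
    history = []
    for correct in responses:
        p_known = update(p_known, correct, params)
        history.append(p_known)
    return history


# Hypothetical parameters and response sequence, for illustration only.
params = BKTParams(p_init=0.2, p_learn=0.15, p_slip=0.1, p_guess=0.25)
estimates = trace([True, True, False, True], params)
```

A tutoring system could compare each entry of `estimates` with a mastery threshold to decide when to stop offering practice on a skill; the threshold itself is a design choice that this sketch does not fix.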
5. EDM Methods
- A decision tree is a tree-like structure where each internal node represents a decision based on a feature, each branch represents an outcome of that decision, and each leaf node represents a class label. The tree is constructed by recursively splitting the data on the feature that best separates the classes at each step, usually using metrics such as the Gini index or information gain.
- The Naive Bayes classifier is based on Bayes’ Theorem and assumes that features are conditionally independent of each other given the class (hence the term “naive”). Despite this unrealistic assumption, Naive Bayes works well in practice, especially for text classification tasks like spam detection.
- Support vector machines (SVMs) aim to find the hyperplane that separates the data points of different classes with the largest margin. SVMs are particularly effective when the data are linearly separable. When the data are not linearly separable, an SVM uses the kernel trick to implicitly map the data into a higher-dimensional space where separation is possible.
- The K-Nearest Neighbors (KNN) algorithm is an instance-based learning method in which the class of a new data point is determined by the majority class among its K nearest neighbors in the feature space. It relies on distance metrics such as Euclidean distance to measure the similarity between data points.
- Random Forest is an ensemble learning method that combines multiple decision trees to improve classification accuracy. Each tree is trained on a random (bootstrap) sample of the data, typically considering only a random subset of features at each split, and the final prediction is the majority vote of the individual trees. This approach helps mitigate the overfitting problem associated with single decision trees.
- Neural networks are inspired by the biological neural networks of the human brain. They consist of layers of interconnected nodes (neurons) that process input features to make predictions. Deep learning models, which are a type of neural network with many hidden layers, have gained popularity for complex classification tasks like image recognition and Natural Language Processing.
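The split criteria named in the decision tree description above (Gini index and information gain) can be sketched as follows. The data, feature, and function names are hypothetical, and the search covers only a single numeric feature; it is an illustration of how one split is scored, not a full tree learner.

```python
from collections import Counter
from math import log2


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits; used when the criterion is information gain."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def split_gain(labels, left, right, impurity):
    """Impurity of the parent minus the size-weighted impurity of the children."""
    n = len(labels)
    weighted = (len(left) / n) * impurity(left) + (len(right) / n) * impurity(right)
    return impurity(labels) - weighted


def best_threshold(values, labels, impurity=gini):
    """Try each observed value as a 'value <= t' split and keep the best one."""
    best_t, best_gain = None, 0.0
    for t in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        gain = split_gain(labels, left, right, impurity)
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain


# Hypothetical data: weekly study hours and course outcome.
hours = [1, 2, 3, 6, 7, 8]
outcome = ["fail", "fail", "fail", "pass", "pass", "pass"]
threshold, gain = best_threshold(hours, outcome)
info_threshold, info_gain = best_threshold(hours, outcome, impurity=entropy)
```

A tree learner would repeat this search over every feature at every node and recurse into the two resulting subsets until a stopping rule is met.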
- Regression is a basic technique in supervised learning used to model the relationship between dependent and independent variables in order to predict continuous outcomes. More specifically, regression tries to predict a variable y using a function g(x), where x = [x1, …, xp]⊤ is a vector of explanatory variables. The optimal function g* is learned from the training set by minimizing the training loss. Common error measures include:
- Mean Squared Error (MSE) calculates the average of the squared differences between predicted and actual values.
- Root Mean Squared Error (RMSE) is the square root of MSE, providing error estimates in the same units as the target variable.
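The training loss and the two error measures above can be written out as follows. The notation is an assumption introduced here for illustration: n training pairs (x_i, y_i), predictions ŷ_i = g(x_i), and a generic per-example loss written as Loss.

```latex
\begin{align}
g^{*} &= \arg\min_{g} \; \frac{1}{n} \sum_{i=1}^{n} \operatorname{Loss}\bigl(y_i,\, g(x_i)\bigr) \\
\mathrm{MSE} &= \frac{1}{n} \sum_{i=1}^{n} \bigl(y_i - \hat{y}_i\bigr)^{2},
  \qquad \hat{y}_i = g(x_i) \\
\mathrm{RMSE} &= \sqrt{\mathrm{MSE}}
\end{align}
```

For example, prediction errors of 2, −1, and 3 grade points give MSE = (4 + 1 + 9)/3 ≈ 4.67 and RMSE ≈ 2.16, the latter expressed in grade points again.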
- Clustering, an unsupervised learning technique, groups similar instances without predetermined labels. Clustering methods can be broadly categorized into five types, two of which are described here. Partitioning methods, such as K-means and K-medoids, divide the data into a predefined number of clusters: K-means iteratively assigns each point to the nearest centroid and then recomputes the centroids, thereby reducing within-cluster variance, while K-medoids uses actual data points as cluster centers and is more robust to outliers. Hierarchical methods build a tree-like structure (dendrogram) of clusters through either agglomerative (bottom-up) or divisive (top-down) processes.
- Association rule mining (ARM) reveals interesting relationships between variables in large data sets [32]. It is particularly effective in analyzing large amounts of student-related data, such as academic performance and behavioral data, to uncover hidden relationships. For example, ARM can be used to identify which combinations of courses or study habits are often associated with academic success or failure [14]. This information could be used to guide the adaptation of curriculum and teaching methods.
- Sequential pattern mining (SPM) is a method that focuses on identifying and analyzing sequences of events or behaviors over time. In the context of education, SPM helps in understanding how students progress through learning activities, capturing patterns in their interactions with learning management systems or standard routes through a curriculum [26]. For example, SPM can reveal common sequences of mistakes or successful strategies in problem-solving, which can then be used to guide instructional design and provide personalized feedback [33].
- Natural Language Processing (NLP) is a subfield of artificial intelligence and computational linguistics that aims to enable machines to understand, interpret, and generate human language. It bridges the gap between human language and the formal representations that computers process.
- Social Network Analysis (SNA) examines the relationships and interactions between entities, such as students and educators, in educational settings. By mapping the flow of communication and collaboration, SNA provides insights into how students form learning communities, share knowledge, and develop a better school climate that can influence academic performance [34]. This approach is particularly useful for analyzing online discussion forums, collaborative learning environments, and other learning networks. The relationships between individuals are represented as graphs, where nodes represent individuals and edges represent the interactions between them [34].
- Data visualization can help both students and teachers. Data visualization tools can effectively present information to students in an intuitive way, helping them understand their learning progress in real time. These tools also support classroom instruction, teaching interventions, and evaluations, while enabling educators to adjust their teaching goals, methods, and management strategies to enhance decision-making [35].
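The association rule mining idea described above can be illustrated by scoring candidate rules with the usual support, confidence, and lift measures. The student records, item names, and thresholds below are invented for illustration, and the sketch only enumerates single-item rules; a full miner such as Apriori would also prune and extend larger itemsets.

```python
from itertools import combinations


def support(transactions, itemset):
    """Fraction of transactions that contain every item in the itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def pairwise_rules(transactions, min_support=0.4, min_confidence=0.7):
    """Return (antecedent, consequent, support, confidence, lift) tuples."""
    items = sorted(set().union(*transactions))
    found = []
    for a, b in combinations(items, 2):
        pair_support = support(transactions, {a, b})
        if pair_support < min_support:
            continue
        for antecedent, consequent in ((a, b), (b, a)):
            confidence = pair_support / support(transactions, {antecedent})
            lift = confidence / support(transactions, {consequent})
            if confidence >= min_confidence:
                found.append((antecedent, consequent, pair_support, confidence, lift))
    return found


# Hypothetical per-student records, each a set of observed behaviors and outcomes.
records = [
    frozenset({"attends_lab", "submits_on_time", "passes"}),
    frozenset({"attends_lab", "passes"}),
    frozenset({"submits_on_time", "passes"}),
    frozenset({"attends_lab", "submits_on_time", "passes"}),
    frozenset({"skips_lab"}),
]
rules = pairwise_rules(records)
```

A lift above 1 indicates that the antecedent and consequent co-occur more often than they would if they were independent. It does not by itself show that one causes the other.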
6. EDM Topics
- A dominant theme of EDM is pattern learning, which includes studies that analyze how students learn and behave in educational contexts. The main goal is to understand the learning process by examining patterns in the way students interact with learning systems and educational content. Much of the research has focused on discovering sub-groups of students with similar learning styles and evaluating the effectiveness of teaching methods [4,30]. The results of the research can help educational authorities in designing effective curricula and interventions tailored to the needs of students. According to Ozyurt et al., 27.22% of the studies between 2008 and 2022 have pattern learning as an objective [22].
- Recommendation systems in EDM can enhance personalized learning by providing students and educators with tailored suggestions that improve educational outcomes. These systems use algorithms to recommend courses, learning materials, and resources that align with individual student needs and preferences. By analyzing historical data, such as academic performance and behavior patterns, recommendation systems can predict future learning requirements and suggest the most relevant educational content. For instance, they help students select the most suitable courses or assignments, thereby optimizing their learning path and improving academic performance. Moreover, these systems can assist educators in developing customized learning plans for students, improving the overall quality of education by fostering more personalized and efficient learning experiences [37].
- Sentiment analysis and feedback help educational institutions to evaluate the quality of courses and programs from the student’s perspective. Within this theme, studies have focused on developing approaches to analyze textual data from sources such as open-ended survey responses, online discussions, and assignments using Natural Language Processing and machine learning [38]. Sentiment analysis provides valuable insights for educators on how to enhance student satisfaction, engagement, and motivation.
- Analysis of data from MOOCs and online learning platforms involves the use of several machine learning methods applied to online and blended learning models. The field continues to be important as virtual and hybrid education have become widespread, especially since the COVID-19 era [39]. Researchers in this area are trying to optimize the design of online courses, promote participation, and enhance the digital learning experience.
- Learning analytics uses techniques to process massive educational and classroom-level data to gain insights into learning behaviors, understand learner profiles, and predict education-related outcomes. This area has become increasingly important in recent years as technology-enhanced learning has spread and large data sets have been generated [40]. Learning analytics helps to develop ideas for personalizing learning experiences, monitoring progress, and improving program design.
- Performance prediction, in the form of final grades, exam grades, and assessment of the risk of failure, also informs targeted interventions. Advances in machine learning have enabled more accurate predictions. By analyzing characteristics such as demographics, prior academic history, measures of effort, and engagement, researchers build predictive models that can identify at-risk students [28,41]. This allows for targeted academic support and optimization of teaching methodologies.
- Finally, student clustering applies unsupervised machine learning techniques to discover groups of students with similar characteristics without predefined labels. Clustering supports the development of personalized, differentiated teaching by recognizing student diversity and grouping students accordingly.
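Student clustering of this kind is often done with K-means, the partitioning method described in the methods section. The sketch below is a minimal plain-Python version; the student features (weekly study hours, average quiz score), the choice of two clusters, and the fixed initial centroids are assumptions made so that the example is deterministic.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Index of the nearest centroid for each point."""
    return [
        min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))
        for p in points
    ]


def recompute(points, labels, centroids):
    """Move each centroid to the mean of its members (unchanged if it has none)."""
    updated = []
    for i, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == i]
        if members:
            updated.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
        else:
            updated.append(old)
    return updated


def kmeans(points, centroids, max_iter=100):
    """Alternate assignment and centroid updates until assignments stop changing."""
    labels = assign(points, centroids)
    for _ in range(max_iter):
        centroids = recompute(points, labels, centroids)
        new_labels = assign(points, centroids)
        if new_labels == labels:
            break
        labels = new_labels
    return centroids, labels


# Hypothetical students: (weekly study hours, average quiz score).
students = [(2.0, 55.0), (3.0, 60.0), (2.5, 58.0),
            (9.0, 88.0), (10.0, 92.0), (8.5, 85.0)]
centroids, labels = kmeans(students, centroids=[students[0], students[3]])
```

The two features here are on different scales, so real analyses would usually standardize them first and would choose the number of clusters and the initialization more carefully than this sketch does.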
7. EDM Basic Applications
8. Challenges
9. Future Trends
References
- Romero, C.; Romero, J.R.; Ventura, S. A Survey on Pre-Processing Educational Data. In Educational Data Mining; Springer: Cham, Switzerland, 2014; pp. 29–64.
- Romero, C.; Ventura, S. Educational Data Science in Massive Open Online Courses. WIREs Data Min. Knowl. Discov. 2017, 7, e1187.
- Bakhshinategh, B.; Zaiane, O.R.; ElAtia, S.; Ipperciel, D. Educational Data Mining Applications and Tasks: A Survey of the Last 10 Years. Educ. Inf. Technol. 2018, 23, 537–553.
- Baker, R.S.J.d.; Inventado, P.S. Educational Data Mining and Learning Analytics. In Learning Analytics: From Research to Practice; Larusson, J.A., White, B., Eds.; Springer: Berlin/Heidelberg, Germany, 2014; pp. 61–75.
- Romero, C.; Ventura, S.; Pechenizkiy, M.; Baker, R. Handbook of Educational Data Mining; CRC Press, Taylor & Francis Group: Boca Raton, FL, USA, 2010.
- Siemens, G.; Baker, R.S.J.d. Learning Analytics and Educational Data Mining: Towards Communication and Collaboration. In Proceedings of the 2nd International Conference on Learning Analytics and Knowledge, Vancouver, BC, Canada, 29 April–2 May 2012; pp. 1–3.
- Siemens, G. Learning Analytics: The Emergence of a Discipline. Am. Behav. Sci. 2013, 57, 1380–1400.
- Cerezo, R.; Lara, J.-A.; Azevedo, R.; Romero, C. Reviewing the differences between learning analytics and educational data mining: Towards educational data science. Comput. Hum. Behav. 2024, 154, 108155.
- Chan, K.I.; Lei, P.I.S.; Pang, P.C.-I. A literature review on educational data mining with secondary school data. In Proceedings of the 9th International Conference on Education and Training Technologies, Macau, China, 21–23 April 2023; ACM: New York, NY, USA, 2023; pp. 1–6.
- Baker, R.S. Big Data and Education, 2nd ed.; Teachers College, Columbia University: New York, NY, USA, 2015.
- Ray, S.; Saeed, M. Applications of Educational Data Mining and Learning Analytics Tools in Handling Big Data in Higher Education. In Applications of Big Data Analytics; Springer: Cham, Switzerland, 2018; pp. 135–160.
- Bousbia, N.; Belamri, I. Which Contribution Does EDM Provide to Computer-Based Learning Environments? In Studies in Computational Intelligence. Educational Data Mining; Springer: Cham, Switzerland, 2014; pp. 3–28.
- Papadogiannis, I.; Poulopoulos, V.; Wallace, M. A Critical Review of Data Mining for Education: What Has Been Done, What Has Been Learnt and What Remains to Be Seen. Int. J. Educ. Res. Rev. 2020, 5, 353–372.
- Choi, W.-C.; Lam, C.-T.; Mendes, A.J. A systematic literature review on performance prediction in learning programming using educational data mining. In Proceedings of the 2023 IEEE Frontiers in Education Conference (FIE), College Station, TX, USA, 18–21 October 2023; pp. 1–9.
- Romero, C.; Ventura, S. Educational data mining: A survey from 1995 to 2005. Expert Syst. Appl. 2007, 33, 135–146.
- Baker, R.S.; Yacef, K. The state of educational data mining in 2009: A review and future visions. J. Educ. Data Min. 2009, 1, 3–17.
- Papamitsiou, Z.; Economides, A.A. Learning analytics and educational data mining in practice: A systematic literature review of empirical evidence. J. Educ. Technol. Soc. 2014, 17, 49–64.
- Peña-Ayala, A. Educational data mining: A survey and a data mining-based analysis of recent works. Expert Syst. Appl. 2014, 41, 1432–1462.
- Thakar, P.; Mehta, A.; Manisha. Performance Analysis and prediction in educational Data mining: A research travelogue. Int. J. Comput. Appl. 2015, 110, 60–68.
- Sukhija, S.; Singh, S.; Riar, C.S. Isolation of starches from different tubers and study of their physicochemical, thermal, rheological and morphological characteristics. Starch-Stärke 2016, 68, 160–168.
- Del Río, C.A.; Insuasti, J.A.P. Predicting academic performance in traditional environments at higher-education institutions using data mining: A review. Ecos Acad. 2016, 4, 185–201.
- Ozyurt, O.; Ozyurt, H.; Mishra, D. Uncovering the educational data mining landscape and future perspective: A comprehensive analysis. IEEE Access 2023, 11, 120192–120208.
- Romero, C.; Ventura, S. Educational data mining and learning analytics: An updated survey. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2020, 10, e1355.
- Rodrigues, M.W.; Isotani, S.; Zárate, L.E. Educational data mining: A review of evaluation process in e-learning. Telemat. Inform. 2018, 35, 1701–1717.
- Pham Kim, C. Evaluating Student Teachers in Micro-Teaching with Analysis of Video Recording Lesson by Boris Software at Vietnam National University. Sci. Publ. Cent. Sociosphere 2017, 8, 67–74.
- Ferreira-Mello, R.; André, M.; Pinheiro, A.; Costa, E.; Romero, C. Text mining in education. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2019, 9, e1332.
- Chaturapruek, S.; Dalberg, T.; Thompson, M.E.; Giebel, S.; Harrison, M.H.; Johari, R.; Stevens, M.L.; Kizilcec, R.F. Studying undergraduate course consideration at scale. AERA Open 2021, 7, 233285842199114.
- Papadogiannis, I.; Wallace, M.; Poulopoulos, V.; Karountzou, G.; Ekonomopoulos, D. A First Ever Look into Greece’s Vast Educational Data: Interesting Findings and Policy Implications. Educ. Sci. 2021, 11, 489.
- Chen, Y.; Chang, H.-H. Psychometrics Help Learning: From Assessment to Learning. Appl. Psychol. Meas. 2018, 42, 3–4.
- Zhang, Y.; Yun, Y.; An, R.; Cui, J.; Dai, H.; Shang, X. Educational data mining techniques for student performance prediction: Method review and comparison analysis. Front. Psychol. 2021, 12, 698490.
- Romero, C.; Ventura, S. Data Mining in Education. WIREs Data Min. Knowl. Discov. 2013, 3, 12–27.
- Njiru, T. Association rule mining in educational data: Unveiling patterns for enhanced learning outcomes. Preprints 2024.
- Xu, R.; Chen, J.; Han, J.; Tan, L.; Xu, L. Towards emotion-sensitive learning cognitive state analysis of big data in education: Deep learning-based facial expression analysis using ordinal information. Computing 2020, 102, 765–780.
- Polatcan, M.; Balcı, A. Social capital wealth as a predictor of innovative climate in schools. Int. J. Contemp. Educ. Res. 2022, 6, 183–194.
- Lu, M. Research on data visualization analysis in education curriculum quality management and student development. In Proceedings of the Annual Conference on Computers, Ottawa, ON, Canada, 16–18 October 2020.
- Hansen, L.; Holanda, M.; Borges, V.R.P.; Da Silva, D. Visual analysis of educational data: A case study of introductory programming courses at the University of Brasília. In Proceedings of the 2022 IEEE Frontiers in Education Conference (FIE), Uppsala, Sweden, 8–11 October 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 1–6.
- Aucancela, M.; Briones, A.G.; Chamoso, P. Educational recommender systems: A systematic literature review. In Proceedings of the Barcelona Conference on Education 2023: Official Conference Proceedings, Barcelona, Spain, 19–23 September 2023; pp. 933–951.
- Kandhro, I.A.; Chhajro, M.A.; Kumar, K.; Lashari, H.N.; Khan, U. Student feedback sentiment analysis model using various machine learning schemes: A review. Indian J. Sci. Technol. 2019, 12, 1–9.
- Asad, R.; Altaf, S.; Ahmad, S.; Mahmoud, H.A.; Huda, S.; Iqbal, S. Machine learning-based hybrid ensemble model achieving precision education for online education amid the lockdown period of COVID-19 pandemic in Pakistan. Sustainability 2023, 15, 5431.
- Hernandez-de-Menendez, M.; Morales-Menendez, R.; Escobar, C.A.; Ramírez Mendoza, R.A. Learning analytics: State of the art. Int. J. Interact. Des. Manuf. 2022, 16, 1209–1230.
- Gadde, S.S.; Anand, D.; Sasidhar Babu, N.; Pujitha, B.V.; Sai Reethi, M.; Pradeep Ghantasala, G.S. Performance prediction of students using machine learning algorithms. In Lecture Notes in Mechanical Engineering; Applications of Computational Methods in Manufacturing and Product Design; Springer: Singapore, 2022; pp. 405–411.
- Pardo, A.; Siemens, G. Ethical and Privacy Principles for Learning Analytics. Br. J. Educ. Technol. 2014, 45, 438–450.
- Ankora, C.; Aju, D. Integrating Educational Data Mining in Augmented Reality Virtual Learning Environment. In Advances in Computing Communications and Informatics; Bentham Science Publishers: Sharjah, United Arab Emirates, 2022; pp. 1–18.
- Liu, N.; Chen, Y.; Yang, X.; Hu, Y. Do Demographic Characteristics Make Differences? Demographic Characteristics as Moderators in the Associations between Only Child Status and Cognitive/Non-cognitive Outcomes in China. Front. Psychol. 2017, 8, 423.
- Shaukat, S.M. Exploring the potential of augmented reality (AR) and virtual reality (VR) in education. Int. J. Adv. Res. Sci. Commun. Technol. 2023, 3, 52–57.
- Lampropoulos, G.; Keramopoulos, E.; Diamantaras, K.; Evangelidis, G. Augmented reality and virtual reality in education: Public perspectives, sentiments, attitudes, and discourses. Educ. Sci. 2022, 12, 798.
- Khan, A.; Ghosh, S.K. Student performance analysis and prediction in classroom learning: A review of educational data mining studies. Educ. Inf. Technol. 2021, 26, 205–240.
- Dol, S.M.; Jawandhiya, P.M. Review of EDM for Analyzing the Performance of Students in Educational Settings. In Proceedings of the 2022 6th International Conference on Computing, Communication, Control and Automation (ICCUBEA), Pune, India, 26–27 August 2022; IEEE: Piscataway, NJ, USA, 2022.
- Mavridis, A.; Symeonidis, A.L. A review of sentiment analysis applied to education. J. Educ. Technol. Soc. 2021, 24, 48–58.
- Raza, S.; Rahman, M.; Kamawal, S.; Toroghi, A.; Raval, A.; Navah, F.; Kazemeini, A. A Comprehensive Review of Recommender Systems: Transitioning from Theory to Practice. arXiv 2024, arXiv:2407.13699. Available online: https://arxiv.org/abs/2407.13699 (accessed on 29 September 2024).
- Wei, H.; Cong, W.; Wu, A.; Zhou, G. Prediction method of higher education college students’ employability based on data mining. In Learning and Analytics in Intelligent Systems; Springer: Cham, Switzerland, 2024; pp. 144–154.
- Pliuskuvienė, B.; Radvilaitė, U.; Juodagalvytė, R.; Ramanauskaitė, S.; Stefanovič, P. Educational data mining and learning analytics: Text generators usage effect on students’ grades. New Trends Comput. Sci. 2024, 2, 19–30.
- Hanumanthappa, S.; Prakash, C. Machine learning based education data mining through student session streams. Int. J. Reconfigur. Embedded Syst. 2024, 13, 383.
- Tosun, S.; Bakan Kalaycıoğlu, D. Data mining approach for prediction of academic success in open and distance education. J. Educ. Technol. Online Learn. 2024, 7, 168–176.
- Bussaman, S.; Nasa-Ngium, P.; Sararat, T.; Nuankaew, W.S.; Nuankaew, P. Influence analytics model of the general education courses toward the academic achievement of Rajabhat university students using data mining techniques. In Smart Innovation, Systems and Technologies; Springer: Cham, Switzerland, 2024; pp. 117–129.
- Shen, S. Exploration of the management mode and quality evaluation of entrepreneurship education in colleges and universities based on data mining. Trans. Comp. Educ. 2024, 6, 1.
- Chen, Z. Intelligent evaluation system for labor education quality based on data mining. In Proceedings of the 2024 IEEE 7th Eurasian Conference on Educational Innovation (ECEI), Bangkok, Thailand, 26–28 January 2024.
- Papadogiannis, I.; Wallace, M.; Poulopoulos, V.; Vassilakis, C.; Lepouras, G.; Platis, N. An Assessment of the Effectiveness of the Remedial Teaching Education Policy. Knowledge 2023, 3, 349–363.
- Papadogiannis, I.; Wallace, M.; Poulopoulos, V. Examining Pupils’ Achievement in Primary and Secondary Schools in Greece. Eur. J. Eng. Technol. Res. 2022, 2022, 10–18.
- Roski, M.; Ewerth, R.; Hoppe, A.; Nehring, A. Exploring data mining in chemistry education: Building a web-based learning platform for learning analytics. J. Chem. Educ. 2024, 101, 930–940.
- Gagnon, D.J.; Swanson, L.; Harpstead, E. Open game data: Defining a pipeline and standards for educational data mining and learning analytics with video game data. In Proceedings of the 2024 IEEE Conference on Games (CoG), Milan, Italy, 5–8 August 2024; pp. 1–8.
- Pan, J. Research on the online learning mechanism of education based on data mining. In Proceedings of the 2024 International Conference on Informatics Education and Computer Technology Applications (IECA), Beijing, China, 26–28 January 2024; pp. 38–41.
- Hajjej, F.; Ayouni, S.; Alohali, M.A.; Maddeh, M. Novel framework for autism spectrum disorder identification and tailored education with effective data mining and ensemble learning techniques. IEEE Access 2024, 12, 35448–35461.
- Chen, J. Construction of E-learning English wisdom classroom based on educational big data mining. Comput.-Aided Des. Appl. 2024, 21, 251–264.
- Yu, S.; Zhang, Z.; Kang, K.; Zhu, L.; Jiang, X. Discussion on individualized teaching strategies of international Chinese education based on data mining. In Learning and Analytics in Intelligent Systems; Springer: Cham, Switzerland, 2024; pp. 574–583.
- Zhang, A. Research and practice of E-learning education and teaching mode based on data mining technology. Comput.-Aided Des. Appl. 2024, 21, 32–44.
- Rybalchenko, A.; Abildinova, G. Personalizing the learning process through data mining in higher education. Sci. Herald Uzhhorod Univ. Phys. Ser. 2024, 56, 1580–1588.
- Zhang, L. Data mining and learning behaviour analysis of French online education data-driven teaching based on generative adversarial network improvement Apriori algorithm. Int. J. Wirel. Mobile Comput. 2024, 1, 1.
- Ji, X.; Sun, L.; Xu, X.; Lei, X. Construction and innovative exploration of personalized learning systems in the context of educational data mining. Int. J. Inform. Commun. Technol. Educ. 2024, 20, 1–14.
- Sareminia, S.; Mohammadi Dehcheshmeh, V. Developing an intelligent and sustainable model to improve E-learning satisfaction based on the learner’s personality type: Data mining approach in high education systems. Int. J. Inform. Learn. Technol. 2024, 41, 394–427.
- Chen, X.; Cao, C. Research on building community education platform based on data mining technology. In Learning and Analytics in Intelligent Systems; Springer: Cham, Switzerland, 2024; pp. 398–406.
- Wang, Y. University moral education management system using ensemble learning in data mining. In Proceedings of the 2024 International Conference on Data Science and Network Security (ICDSNS), Tiptur, India, 26–27 July 2024; pp. 1–4.
- Han, L. Prediction and analysis of students’ behavior based on data mining in educational administration. In Learning and Analytics in Intelligent Systems; Springer: Cham, Switzerland, 2024; pp. 229–238.
- Liu, W.; Qin, X.; Yang, L. High quality management of higher education based on data mining. Int. J. Bus. Intell. Data Min. 2024, 25, 424–450.
- Li, S.; Ma, B.; Meng, D. Reflections on strategies for psychological health education for college students based on data mining. Int. J. Bus. Intell. Data Min. 2024, 25, 394–408.
- Kawesha, F.; Phiri, J. Data mining and machine learning-based predictive model to support decision-making for the accreditation of learning programmes at the higher education authority. In Lecture Notes in Networks and Systems; Springer: Cham, Switzerland, 2024; pp. 351–361.
- Pan, W. Study on quality evaluation method of multimedia distance education based on Data Mining. Int. J. Contin. Eng. Educ. Lifelong Learn. 2024, 34, 194–203.
- Wang, L.; Wang, B.; Huang, H.; Zhang, X. Research on the teaching reform of Data Mining and Data Analysis based on the concept of ‘outcomes-Based Education’. High. Educ. Pract. 2024, 1, 1–6.
- Zhong, Q. Intelligent optimization of labor education curriculum based on data mining technology. In Proceedings of the 2024 IEEE 4th International Conference on Electronic Communications, Internet of Things and Big Data (ICEIB), Taipei, Taiwan, 19–21 April 2024.
Reference | Year | Findings |
---|---|---|
Romero and Ventura [15] | 2007 | |
Baker and Yacef [16] | 2009 | |
Romero and Ventura [5] | 2010 | |
Papamitsiou and Economides [17] | 2014 | |
Pena-Ayala [18] | 2014 | |
Thakar, Mehta, and Manisha [19] | 2015 | |
Sukhija et al. [20] | 2015 | |
Del Rio and Insuasti [21] | 2016 | |
Type | Percent |
---|---|
Academic performance data | 36.2% |
Behavioral interaction data | 20.3% |
Programming data | 20.3% |
Demographic data | 11.6% |
Contextual data | 10.1% |
Psychometric data | 1.4% |
Datasets | URL | Description |
---|---|---|
ASSISTments Data Set 2012–2013 | https://sites.google.com/view/assistmentsdatamining/home (accessed on 5 October 2024) | Competition data set using real-world educational data. |
Canvas Network dataset | https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/1XORAL (accessed on 5 October 2024) | Data from Canvas Network. |
DataShop | https://pslcdatashop.web.cmu.edu/index.jsp?datasets=public (accessed on 5 October 2024) | A large repository of learning interaction data. |
Educational Process Mining Dataset | https://archive.ics.uci.edu/dataset/346/educational+process+mining+epm+a+learning+analytics+data+set (accessed on 5 October 2024) | Students’ logs from activities through a logging application while learning with an educational simulator. |
HarvardX-MITx dataset | https://dataverse.harvard.edu/dataverse/mxhx (accessed on 5 October 2024) | Deidentified student-level data from the first year of HarvardX and MITx courses. |
KDD Cup 2010 Dataset | https://pslcdatashop.web.cmu.edu/KDDCup/ (accessed on 5 October 2024) | Data from an educational data mining challenge in 2010. |
Educational Data Set Prize | https://educationaldatamining.org/data-set-awards/ (accessed on 5 October 2024) | Data about courses, students, and their interactions with a Virtual Learning Environment. |
NUS Multisensor Presentation Dataset | https://scholarbank.nus.edu.sg/handle/10635/137261 (accessed on 5 October 2024) | Real-world presentations recorded in a multisensor environment. |
Open University Learning Analytics Dataset | https://analyse.kmi.open.ac.uk/open_dataset (accessed on 5 October 2024) | Data about courses, students, and their interactions with Moodle for seven selected courses. |
Student Performance Dataset | https://archive.ics.uci.edu/ml/datasets/Student+Performance (accessed on 5 October 2024) | Student achievement data from secondary education in two Portuguese schools. |
xAPI-Educational Mining Dataset | https://www.kaggle.com/aljarah/xAPI-Edu-Data (accessed on 5 October 2024) | Academic performance dataset from an e-learning system. |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Papadogiannis, I.; Wallace, M.; Karountzou, G. Educational Data Mining: A Foundational Overview. Encyclopedia 2024, 4, 1644–1664. https://doi.org/10.3390/encyclopedia4040108