Data Mining and Computational Intelligence for E-Learning and Education—3rd Edition

A special issue of Data (ISSN 2306-5729). This special issue belongs to the section "Information Systems and Data Management".

Deadline for manuscript submissions: 20 August 2025 | Viewed by 6011

Special Issue Editor


Dr. Antonio Sarasa-Cabezuelo
Guest Editor

Special Issue Information

Dear Colleagues,

In recent decades, the rise of artificial intelligence has driven its application in many fields, including education. Some applications use intelligent algorithms to analyze data from teaching and learning activities, in both face-to-face and distance-learning environments, and to extract information about the educational process. This information makes it possible to infer the reasons for students' success or failure, identify patterns of behavior and learning, and make other predictions. Other applications use intelligent algorithms to automate parts of the educational process; related topics include the development of chatbots and approaches to ethics in the use of artificial intelligence. The application of artificial intelligence to problem solving in education has thus become an area of research interest in its own right. This Special Issue will bring together works that show the latest advances in the application of artificial intelligence in the educational field, as well as works describing specific experiences and applications to particular problems.

This Special Issue will serve as a meeting point for all researchers working in these fields, whether they study theories or applications. The topics of interest include, but are not limited to, the following:

  • Generative AI for creating educational content and learning materials;
  • Machine learning and deep learning applications in education;
  • Artificial intelligence for personalized learning and adaptive education;
  • Big data and learning analytics in education;
  • Intelligent tutoring systems and virtual teaching assistants;
  • Predictive analytics and data-driven decision-making in education;
  • AI-driven content creation and curriculum development;
  • Ethics and bias in AI-powered educational tools;
  • Learning experience platforms (LXP) and smart educational environments;
  • Data privacy and security in AI-enhanced learning systems;
  • Conversational AI and chatbots for enhancing student engagement;
  • Gamification and immersive learning experiences using AI;
  • Natural language processing (NLP) in educational assessments and feedback.

Both review articles on the state of the art and experimental/theoretical articles are welcome.

Dr. Antonio Sarasa-Cabezuelo
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for the submission of manuscripts are available on the Instructions for Authors page. Data is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • E-learning
  • Machine learning
  • Artificial intelligence
  • Data analysis
  • Algorithms
  • Big data

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information is available in MDPI's Special Issue policies.


Published Papers (6 papers)


Research


10 pages, 267 KiB  
Article
Dataset on Programming Competencies Development Using Scratch and a Recommender System in a Non-WEIRD Primary School Context
by Jesennia Cárdenas-Cobo, Cristian Vidal-Silva and Nicolás Máquez
Data 2025, 10(6), 86; https://doi.org/10.3390/data10060086 - 3 Jun 2025
Viewed by 215
Abstract
The ability to program has become an essential competence for individuals in an increasingly digital world. However, access to programming education remains unequal, particularly in non-WEIRD (Western, Educated, Industrialized, Rich, and Democratic) contexts. This study presents a dataset resulting from an educational intervention designed to foster programming competencies and computational thinking skills among primary school students aged 8 to 12 years in Milagro, Ecuador. The intervention integrated Scratch, a block-based programming environment that simplifies coding by eliminating syntactic barriers, and the CARAMBA recommendation system, which provided personalized learning paths based on students’ progression and preferences. A structured educational process was implemented, including an initial diagnostic test to assess logical reasoning, guided activities in Scratch to build foundational skills, a phase of personalized practice with CARAMBA, and a final computational thinking evaluation using a validated assessment instrument. The resulting dataset encompasses diverse information: demographic data, logical reasoning test scores, computational thinking test results pre- and post-intervention, activity logs from Scratch, recommendation histories from CARAMBA, and qualitative feedback from university student tutors who supported the intervention. The dataset is anonymized, ethically collected, and made available under a CC-BY 4.0 license to encourage reuse. This resource is particularly valuable for researchers and practitioners interested in computational thinking development, educational data mining, personalized learning systems, and digital equity initiatives. It supports comparative studies between WEIRD and non-WEIRD populations, validation of adaptive learning models, and the design of inclusive programming curricula. 
Furthermore, the dataset enables the application of machine learning techniques to predict educational outcomes and optimize personalized educational strategies. By offering this dataset openly, the study contributes to filling critical gaps in educational research, promoting inclusive access to programming education, and fostering a more comprehensive understanding of how computational competencies can be developed across diverse socioeconomic and cultural contexts.

29 pages, 4066 KiB  
Article
SAPEx-D: A Comprehensive Dataset for Predictive Analytics in Personalized Education Using Machine Learning
by Muhammad Adnan Aslam, Fiza Murtaza, Muhammad Ehatisham Ul Haq, Amanullah Yasin and Numan Ali
Data 2025, 10(3), 27; https://doi.org/10.3390/data10030027 - 20 Feb 2025
Cited by 1 | Viewed by 1161
Abstract
Education is crucial for leading a productive life and obtaining necessary resources. Higher education institutions are progressively incorporating artificial intelligence into conventional teaching methods as a result of innovations in technology. As a high academic record raises a university’s ranking and increases student career chances, predicting learning success has been a central focus in education. Both performance analysis and providing high-quality instruction are challenges faced by modern schools. Maintaining high academic standards, juggling life and academics, and adjusting to technology are problems that students must overcome. In this study, we present a comprehensive dataset, SAPEx-D (Student Academic Performance Exploration), designed to predict student performance, encompassing a wide array of personal, familial, academic, and behavioral factors. Our data collection effort at Air University, Islamabad, Pakistan, involved both online and paper questionnaires completed by students across multiple departments, ensuring diverse representation. After meticulous preprocessing to remove duplicates and entries with significant missing values, we retained 494 valid responses. The dataset includes detailed attributes such as demographic information, parental education and occupation, study habits, reading frequencies, and transportation modes. To facilitate robust analysis, we encoded ordinal attributes using label encoding and nominal attributes using one-hot encoding, expanding our dataset from 38 to 88 attributes. Feature scaling was performed to standardize the range and distribution of data, using a normalization technique. Our analysis revealed that factors such as degree major, parental education, reading frequency, and scholarship type significantly influence student performance. 
The machine learning models applied to this dataset, including Gradient Boosting and Random Forest, demonstrated high accuracy and robustness, underscoring the dataset’s potential for insightful academic performance prediction. In terms of model performance, Gradient Boosting achieved an accuracy of 68.7% and an F1-score of 68% for the eight-class classification task. For the three-class classification, Random Forest outperformed other models, reaching an accuracy of 80.8% and an F1-score of 78%. These findings highlight the importance of comprehensive data in understanding and predicting academic outcomes, paving the way for more personalized and effective educational strategies.
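The preprocessing steps named in this abstract (label encoding for ordinal attributes, one-hot encoding for nominal attributes, and normalization) can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the attribute values are hypothetical, and min-max scaling is an assumption, since the abstract does not say which normalization technique was used.

```python
def label_encode(values, order):
    """Map ordinal categories to integer ranks following an explicit order."""
    rank = {category: i for i, category in enumerate(order)}
    return [rank[v] for v in values]


def one_hot_encode(values):
    """Expand a nominal attribute into one 0/1 column per distinct category."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


def min_max_scale(values):
    """Rescale numeric values to [0, 1] (assumed normalization technique).

    A constant column carries no information and is mapped to all zeros.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical attributes, for illustration only.
reading_frequency = ["never", "often", "sometimes"]        # ordinal
transport_mode = ["bus", "car", "bus"]                     # nominal
weekly_study_hours = [0, 5, 10]                            # numeric

encoded_reading = label_encode(reading_frequency, ["never", "sometimes", "often"])
transport_columns, encoded_transport = one_hot_encode(transport_mode)
scaled_hours = min_max_scale(weekly_study_hours)
```

One-hot encoding adds one column per category, which is consistent with the attribute count growing after encoding, as the abstract reports (38 to 88 attributes).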

17 pages, 662 KiB  
Article
A Bayesian State-Space Approach to Dynamic Hierarchical Logistic Regression for Evolving Student Risk in Educational Analytics
by Moeketsi Mosia
Data 2025, 10(2), 23; https://doi.org/10.3390/data10020023 - 7 Feb 2025
Viewed by 997
Abstract
Early detection of academically at-risk students is crucial for designing timely interventions that improve educational outcomes. However, many existing approaches either ignore the temporal evolution of student performance or rely on “black box” models that sacrifice interpretability. In this study, we develop a dynamic hierarchical logistic regression model in a fully Bayesian framework to address these shortcomings. Our method leverages partial pooling across students and employs a state-space formulation, allowing each student’s log-odds of failure to evolve over multiple assessments. By using Markov chain Monte Carlo for inference, we obtain robust posterior estimates and credible intervals for both population-level and individual-specific effects, while posterior predictive checks ensure model adequacy and calibration. Results from simulated and real-world datasets indicate that the proposed approach more accurately tracks fluctuations in student risk compared to static logistic regression, and it yields interpretable insights into how engagement patterns and demographic factors influence failure probability. We conclude that a Bayesian dynamic hierarchical model not only enhances prediction of at-risk students but also provides actionable feedback for instructors and administrators seeking evidence-based interventions.
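The state-space idea in this abstract can be written compactly. The following is one common way to formulate a dynamic hierarchical logistic regression, offered as a sketch only: the symbols and the random-walk and Gaussian assumptions are illustrative and are not taken from the paper.

```latex
\begin{align}
y_{i,t} \mid p_{i,t} &\sim \mathrm{Bernoulli}(p_{i,t}), \\
\operatorname{logit}(p_{i,t}) &= \alpha_{i,t} + \mathbf{x}_{i,t}^{\top}\boldsymbol{\beta}, \\
\alpha_{i,t} &= \alpha_{i,t-1} + \varepsilon_{i,t}, \qquad \varepsilon_{i,t} \sim \mathcal{N}(0, \sigma^{2}), \\
\alpha_{i,0} &\sim \mathcal{N}(\mu, \tau^{2}).
\end{align}
```

Here $y_{i,t} = 1$ if student $i$ fails assessment $t$, $\mathbf{x}_{i,t}$ holds covariates such as engagement and demographic factors, and $\boldsymbol{\beta}$ collects the population-level effects. The student-specific log-odds $\alpha_{i,t}$ evolve as a random walk, which is the state-space part, and the shared hyperparameters $\mu$ and $\tau$ give the partial pooling across students.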

14 pages, 770 KiB  
Article
Stress Factors in Higher Education: A Data Analysis Case
by Rodolfo Bojorque, Fernando Moscoso, Fernando Pesántez and Ángela Flores
Data 2025, 10(2), 22; https://doi.org/10.3390/data10020022 - 7 Feb 2025
Viewed by 1715
Abstract
This study investigates stressors in higher education, focusing on their impact on students and faculty at Universidad Politécnica Salesiana (UPS) and using eight years of comprehensive data. Employing data mining techniques, the research analyzed enrollment, retention, graduation, employability, socioeconomic status, academic performance, and faculty workload to uncover patterns affecting academic outcomes. The study found that UPS exhibits a stable educational system, maintaining consistent metrics across student success indicators. However, the COVID-19 pandemic presented unique stressors, evidenced by a paradoxical increase in student grades during heightened faculty stress levels. This anomaly suggests a potential link between academic rigor and faculty well-being during systemic disruptions. Stressors affecting students directly correlated with reduced academic performance, highlighting the importance of early detection and intervention. Conversely, faculty stress was reflected in adjustments to grading practices, raising questions about institutional pressures and faculty motivation. These findings emphasize the value of proactive data analytics in identifying stress-induced anomalies to support student success and faculty well-being. The study advocates for further research on faculty burnout, motivation, and institutional strategies to mitigate stressors, underscoring the potential of data-driven approaches to enhance the quality and sustainability of higher education ecosystems.

Other


17 pages, 6223 KiB  
Data Descriptor
A Structured Dataset for Automated Grading: From Raw Data to Processed Dataset
by Ibidapo Dare Dada, Adio T. Akinwale and Ti-Jesu Tunde-Adeleke
Data 2025, 10(6), 87; https://doi.org/10.3390/data10060087 - 6 Jun 2025
Viewed by 192
Abstract
The increasing volume of student assessments, particularly open-ended responses, presents a significant challenge for educators in ensuring grading accuracy, consistency, and efficiency. This paper presents a structured dataset designed for the development and evaluation of automated grading systems in higher education. The primary objective is to create a high-quality dataset that facilitates the development and evaluation of natural language processing (NLP) models for automated grading. The dataset comprises student responses to open-ended questions from the Management Information Systems (MIS221) and Project Management (MIS415) courses at Covenant University, collected during the 2022/2023 academic session. The responses were originally handwritten, scanned, and transcribed into Word documents. Each response is paired with corresponding scores assigned by human graders, following a detailed marking guide. To assess the dataset’s potential for automated grading applications, several machine learning and transformer-based models were tested, including TF-IDF with Linear Regression, TF-IDF with Cosine Similarity, BERT, SBERT, RoBERTa, and Longformer. The experimental results demonstrate that transformer-based models outperform traditional methods, with Longformer achieving the highest Spearman’s Correlation of 0.77 and the lowest Mean Squared Error (MSE) of 0.04, indicating a strong alignment between model predictions and human grading. The findings highlight the effectiveness of deep learning models in capturing the semantic and contextual meaning of both student responses and marking guides, making it possible to develop more scalable and reliable automated grading solutions. This dataset offers valuable insights into student performance and serves as a foundational resource for integrating educational technology into automated assessment systems. Future work will focus on enhancing grading consistency and expanding the dataset for broader academic applications. 
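Of the baselines listed in this abstract, TF-IDF with cosine similarity is the simplest to illustrate: a student response is scored by how closely its weighted term vector matches that of the marking guide. The sketch below is a minimal pure-Python version under stated assumptions. The whitespace tokenizer, the smoothed IDF formula, and the example texts are all hypothetical choices, not details from the paper.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def tfidf_vectors(documents):
    """Return one sparse {term: weight} TF-IDF vector per document."""
    tokenized = [tokenize(d) for d in documents]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF (an assumption): terms found in every document keep a non-zero weight.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({t: (c / len(tokens)) * idf[t] for t, c in counts.items()})
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical marking-guide answer and two student responses.
guide = "a database stores structured data"
on_topic = "a database stores structured data in tables"
off_topic = "a project needs a budget"

guide_vec, on_vec, off_vec = tfidf_vectors([guide, on_topic, off_topic])
on_score = cosine_similarity(guide_vec, on_vec)
off_score = cosine_similarity(guide_vec, off_vec)
```

A similarity score like this measures only word overlap, which is one plausible reason the abstract reports transformer-based models outperforming the traditional methods.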

11 pages, 1930 KiB  
Data Descriptor
Towards a Datatset of Digitalized Historical German VET and CVET Regulations
by Thomas Reiser, Jens Dörpinghaus, Petra Steiner and Michael Tiemann
Data 2024, 9(11), 128; https://doi.org/10.3390/data9110128 - 3 Nov 2024
Viewed by 1103
Abstract
The digitization of historical documents has gained particular interest in recent years in the digital humanities. The goal is to digitize historical documents by extracting and structuring text from scanned images. Here, we focus on the processing of historical German VET (vocational education and training) and CVET (continuing vocational education and training) regulations to support educational research. This dataset contains data from 1908 to the present and includes 2125 documents as PDF, 983 fully converted XML documents, and additional metadata for 7090 documents from the archive. We present an overview of the historical background and the challenges of processing different historical documents from three different federal states.
