MDPI - Publisher of Open Access Journals

31 pages, 334 KiB

Open Access Article

Enhancing Discoverability: A Metadata Framework for Empirical Research in Theses

by Giannis Vassiliou, George Tsamis, Stavroula Chatzinikolaou, Thomas Nipurakis and Nikos Papadakis

Algorithms 2025, 18(8), 490; https://doi.org/10.3390/a18080490 - 6 Aug 2025

Despite the significant volume of empirical research in student-authored academic theses, particularly in the social sciences, these works are often poorly documented and difficult to discover within institutional repositories. A key reason is the lack of metadata frameworks that balance descriptive richness with usability. General standards such as Dublin Core are too simplistic to capture critical research details, while more robust models such as the Data Documentation Initiative (DDI) are too complex for non-specialist users and were not designed for student theses. This paper presents the design and validation of a lightweight, web-based metadata framework tailored to documenting empirical research in academic theses. We are the first to adapt existing hybrid Dublin Core–DDI approaches specifically for thesis documentation, with a novel focus on cross-methodological research and non-expert usability. The model was developed through a structured analysis of actual student theses and refined to support intuitive, structured metadata entry without requiring technical expertise. The resulting system enhances the discoverability, classification, and reuse of empirical theses within institutional repositories, offering a scalable way to raise the visibility of gray literature in higher education.
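As a rough illustration of what a hybrid record could look like, the sketch below pairs a few Dublin Core-style descriptive fields with DDI-inspired methodology fields and reports which required fields are still empty. The `ThesisRecord` class, every field name, and the choice of required fields are hypothetical assumptions for illustration; they are not the authors' schema.

```python
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Optional

# Hypothetical choice of mandatory fields; the real framework may differ.
REQUIRED_FIELDS = ("title", "creator", "date", "research_design")


@dataclass
class ThesisRecord:
    """Hypothetical hybrid record: Dublin Core-style description plus DDI-style methodology."""

    # Dublin Core-style descriptive fields
    title: str = ""
    creator: str = ""
    date: str = ""
    subject: List[str] = field(default_factory=list)
    # DDI-inspired methodology fields (names are illustrative assumptions)
    research_design: str = ""          # e.g. "quantitative", "qualitative", "mixed methods"
    data_collection_method: str = ""   # e.g. "survey", "interviews"
    sample_size: Optional[int] = None
    universe: str = ""                 # population the sample was drawn from

    def missing_fields(self) -> List[str]:
        """Return the required fields that are still empty, in a fixed order."""
        return [name for name in REQUIRED_FIELDS if not getattr(self, name)]

    def to_dict(self) -> Dict[str, object]:
        """Plain-dictionary export, e.g. for serialising to JSON."""
        return asdict(self)


# Hypothetical thesis entered by a student who skipped the research-design field.
record = ThesisRecord(
    title="Social media use among first-year students",
    creator="A. Student",
    date="2024",
    subject=["sociology"],
    data_collection_method="survey",
    sample_size=212,
)
print(record.missing_fields())  # ['research_design']
```

Flagging missing methodology fields at entry time is one simple way to keep non-expert input structured without exposing the full DDI model.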

(This article belongs to the Special Issue AI-Driven Solutions for Smart Systems in Engineering, Computing, Education, and Society)

17 pages, 1707 KiB

Open Access Article

A Structural Causal Model Ontology Approach for Knowledge Discovery in Educational Admission Databases

by Bern Igoche Igoche, Olumuyiwa Matthew and Daniel Olabanji

Knowledge 2025, 5(3), 15; https://doi.org/10.3390/knowledge5030015 - 4 Aug 2025

Viewed by 77

Abstract

Educational admission systems, particularly in developing countries, often suffer from opaque decision processes, unstructured data, and limited analytic insight. This study proposes a novel methodology that integrates structural causal models (SCMs), ontological modeling, and machine learning to uncover and apply interpretable knowledge from an admission database. Using a dataset of 12,043 records from Benue State Polytechnic, Nigeria, we demonstrate the approach as a proof of concept by constructing a domain-specific SCM ontology, validating it with conditional independence testing (CIT), and extracting features for predictive modeling. Five classifiers were evaluated with stratified 10-fold cross-validation: Logistic Regression, Decision Tree, Random Forest, K-Nearest Neighbors (KNN), and Support Vector Machine (SVM). SVM and KNN achieved the highest classification accuracy (92%), with precision above 95% and recall reaching 100%. Feature importance analysis identified ‘mode of entry’ and ‘current qualification’ as key causal factors influencing admission decisions. The framework provides a reproducible pipeline that combines semantic representation with empirical validation, offering actionable insights for institutional decision-makers. Comparative benchmarking, ethical considerations, and model calibration are integrated to enhance methodological transparency. Limitations, including reliance on data from a single institution, are acknowledged, and directions for generalizability and explainable AI are proposed.
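Stratified k-fold cross-validation keeps each class's share roughly constant across folds, so every test fold resembles the full dataset. The sketch below shows one simple way to build such folds in plain Python by dealing each class's shuffled indices round-robin. It illustrates the general technique, not the authors' pipeline, and the labels are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, Iterator, List, Sequence, Tuple


def stratified_k_folds(labels: Sequence[str], k: int = 10, seed: int = 0) -> List[List[int]]:
    """Split record indices into k folds with near-equal class proportions in each fold."""
    by_class: Dict[str, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal this class's indices round-robin, starting where the previous class stopped,
        # so each fold receives floor or ceil of (class size / k) records of this class.
        for j, idx in enumerate(indices):
            folds[(offset + j) % k].append(idx)
        offset = (offset + len(indices)) % k
    return folds


def train_test_splits(folds: List[List[int]]) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists, using each fold once as the test set."""
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


labels = ["admitted"] * 60 + ["not_admitted"] * 40  # hypothetical class labels
folds = stratified_k_folds(labels, k=10)
# Each classifier would be fitted on `train` and scored on `test` inside this loop.
shares = [sum(labels[i] == "admitted" for i in test) / len(test) for _, test in train_test_splits(folds)]
print(shares)  # every test fold keeps the 60% "admitted" share
```

In practice, scikit-learn's `StratifiedKFold` provides the same behaviour; the hand-rolled version only makes the mechanics visible.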

(This article belongs to the Special Issue Knowledge Management in Learning and Education)

31 pages, 2148 KiB

Open Access Article

Supporting Reflective AI Use in Education: A Fuzzy-Explainable Model for Identifying Cognitive Risk Profiles

by Gabriel Marín Díaz

Educ. Sci. 2025, 15(7), 923; https://doi.org/10.3390/educsci15070923 - 18 Jul 2025

Viewed by 511

Abstract

Generative AI tools are becoming increasingly common in education. They make many tasks easier, but they also raise questions about how students interact with information and whether their ability to think critically might be affected. Although these tools are now part of many learning processes, we still do not fully understand how they influence cognitive behavior or digital maturity. This study proposes a model to help identify different user profiles based on how users engage with AI in educational contexts. The approach combines fuzzy clustering, the Analytic Hierarchy Process (AHP), and explainable AI techniques (SHAP and LIME). It focuses on five dimensions: how AI is used, how users verify information, the cognitive effort involved, decision-making strategies, and reflective behavior. The model was tested on data from 1273 users, revealing three main profile types, ranging from users who are highly dependent on automation to more autonomous and critical users. The classification was validated with XGBoost, achieving over 99% accuracy. The explainability analysis helped us understand which factors most influenced each profile. Overall, this framework offers practical insight for educators and institutions seeking to promote more responsible and thoughtful use of AI in learning.
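The AHP step turns pairwise judgments about criteria into weights. A common way to compute them is the row geometric-mean method, followed by Saaty's consistency ratio to check whether the judgments are coherent (a ratio below about 0.10 is the usual rule of thumb). The sketch below assumes that method; the abstract does not say which AHP variant the study used, and the comparison matrix is hypothetical.

```python
import math
from typing import List, Sequence

# Saaty's random consistency index (RI) for matrix orders 3..9.
RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def ahp_weights(matrix: Sequence[Sequence[float]]) -> List[float]:
    """Priority weights from a reciprocal pairwise-comparison matrix (row geometric means)."""
    n = len(matrix)
    geo_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def consistency_ratio(matrix: Sequence[Sequence[float]], weights: Sequence[float]) -> float:
    """Saaty's CR = CI / RI, with lambda_max estimated from A·w."""
    n = len(matrix)
    if n < 3:
        return 0.0  # 1x1 and 2x2 reciprocal matrices are always consistent
    weighted = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(weighted[i] / weights[i] for i in range(n)) / n
    ci = (lambda_max - n) / (n - 1)
    return ci / RANDOM_INDEX[n]


# Hypothetical judgments for three of the five dimensions:
# AI usage vs. verification vs. reflective behaviour.
pairwise = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]
w = ahp_weights(pairwise)            # ≈ [0.571, 0.286, 0.143]
cr = consistency_ratio(pairwise, w)  # ≈ 0.0, since this matrix is perfectly consistent
```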

(This article belongs to the Special Issue Generative AI in Education: Current Trends and Future Directions)

24 pages, 1952 KiB

Open Access Article

How China Governs Open Science: Policies, Priorities, and Structural Imbalances

by Xiaoting Chen, Abdelghani Maddi and Yanyan Wang

Publications 2025, 13(3), 30; https://doi.org/10.3390/publications13030030 - 23 Jun 2025

Viewed by 769

Abstract

This article investigates the architecture and institutional distribution of policy tools supporting open science (OS) in China. Based on a corpus of 199 policy documents comprising 25,885 policy statements, we apply AI-assisted classification to analyze how the Chinese government mobilizes different types of tools. Using Qwen-plus, a large language model developed by Alibaba Cloud and fine-tuned for OS-related content, we assign each policy statement to one of fifteen subcategories under three main types: supply-oriented, environment-oriented, and demand-oriented tools. Our findings reveal a strong dominance of supply-oriented tools (63%), especially investments in infrastructure, education, and public services. Demand-oriented tools remain marginal (11%), with little use of economic incentives or regulatory obligations. Environment-oriented tools are more balanced but still underrepresent key components such as incentive systems and legal mandates for open access. To deepen the analysis, we introduce a normalized indicator of institutional focus, which captures the relative emphasis of each policy type across administrative levels. Results show that supply-oriented tools are concentrated in top-level institutions, reflecting a top-down governance model, whereas demand-oriented tools are concentrated at lower levels, indicating limited strategic commitment. Overall, China’s OS policy mix prioritizes infrastructure over incentives, limiting systemic transformation toward a more sustainable open science ecosystem.

31 pages, 5232 KiB

Open Access Article

A Comparative Evaluation of Machine Learning Methods for Predicting Student Outcomes in Coding Courses

by Zakaria Soufiane Hafdi and Said El Kafhali

AppliedMath 2025, 5(2), 75; https://doi.org/10.3390/appliedmath5020075 - 18 Jun 2025

Viewed by 486

Abstract

Artificial intelligence (AI) has found applications across diverse sectors in recent years, significantly improving operational efficiency and user experience. Educational data mining (EDM) has emerged as a pivotal AI application for transforming educational environments by optimizing learning processes and identifying at-risk students. This study applies EDM at a Moroccan university (Hassan First University, Settat, Morocco) to enhance educational quality and improve learning. We introduce a novel “Hybrid approach” that combines students’ historical academic records with instructor-provided data on their in-class behavior to predict student performance in introductory coding courses. Using a range of machine learning (ML) algorithms, we apply multi-class classification, data augmentation, and binary classification techniques to evaluate student outcomes. Accuracy, precision, recall, and F1-score are calculated to assess classification performance. Our results highlight the robustness of the long short-term memory (LSTM) algorithm, which, along with a support vector machine (SVM), achieved the highest accuracy (94%) and an F1-score of 0.87, indicating high efficacy in predicting student success at the onset of learning to code. Furthermore, the study proposes a comprehensive framework that can be integrated into learning management systems (LMSs) to accommodate generational shifts in student populations, evolving university pedagogies, and varied teaching methodologies. This framework aims to help educational institutions adapt to changing educational dynamics while ensuring high-quality, tailored learning experiences for students.
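The four metrics named above all derive from the confusion matrix of a binary classifier. A minimal sketch using the standard definitions; the confusion counts are hypothetical and not taken from the study.

```python
from typing import Dict


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for the positive class from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts (not from the study) for 200 students, "fails the course" as positive class.
m = binary_metrics(tp=40, fp=5, fn=7, tn=148)
print({name: round(value, 3) for name, value in m.items()})
# {'accuracy': 0.94, 'precision': 0.889, 'recall': 0.851, 'f1': 0.87}
```

With these made-up counts, accuracy is 0.94 while F1 is only about 0.87, because the positive class is the minority; this is one reason to report both figures rather than accuracy alone.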

20 pages, 1565 KiB

Open Access Article

Long-Term Experiences of Basic Education in Laboratory Animal Science

by Valeria Küller and Johannes Schenkel

Animals 2025, 15(11), 1541; https://doi.org/10.3390/ani15111541 - 25 May 2025

Viewed by 482

Abstract

Adequate education in laboratory animal science, and hence attendance of the relevant courses, is a mandatory prerequisite for animal experimentation. The course content for different stakeholders is stipulated by European and national regulations. If all of this content is covered, accreditation by competent bodies is possible and recommended. Here, we present our experiences with an EU-Function A/C/D accredited course (practical training with mice and rats) and an introductory seminar for undergraduate students, both of which have been running for more than ten years. All courses were organized in-house and were highly relevant to the students and their needs, but they were also very labor-intensive. The courses were systematically (and retrospectively) evaluated, showing a high degree of satisfaction and substantial knowledge gains, and the organizer was able to readjust the courses as needed over the years. Tests demonstrated the students’ progress and highlighted parts of the lessons that were difficult to convey, such as those on legal regulations, housing and feeding, transport, GM animals, breeding, and severity classification. Dummies proved very helpful at the beginning of the training but could not fully replace training with live animals. On-site lectures were preferred over the online formats that became necessary during the pandemic. High standards in education are mandatory, and the accreditation process allows certificates to be transferred to other institutions.

(This article belongs to the Section Animal Ethics)

20 pages, 1872 KiB

Open Access Article

Diagnostic Predictors of Recovery Outcomes Following Open Reduction and Internal Fixation for Tibial Plateau Fractures: A Retrospective Study Based on the Schatzker Classification

by Carlo Biz, Carla Stecco, Samuele Perissinotto, Xiaoxiao Zhao, Raffaele Ierardi, Luca Puce, Filippo Migliorini, Nicola Luigi Bragazzi and Pietro Ruggieri

Diagnostics 2025, 15(11), 1304; https://doi.org/10.3390/diagnostics15111304 - 22 May 2025

Viewed by 683

Abstract

Background: Tibial plateau fractures (TPFs) are complex injuries often leading to long-term complications such as knee instability, limited range of motion, and osteoarthritis. Accurate diagnostic evaluations combining subjective and objective assessments are essential for identifying functional limitations, guiding rehabilitation, and improving recovery outcomes. This study examines the role of diagnostic predictors in differentiating recovery trajectories in two groups of patients treated for closed TPFs by open reduction and internal fixation (ORIF), comparing patients with less severe fractures and patients with more severe fractures (BCFs). Methods: A consecutive series of patients with a diagnosis of TPFs treated by ORIF at our institution between 2009 and 2016 were analyzed in this retrospective study. All injured patients were divided according to the Schatzker classification into two groups: mono-condylar (MCF) and bi-condylar (BCF) fracture patient groups. Diagnostic evaluations included patient-reported outcome measures (PROMs) such as KOOS, IKDC, and AKSS, alongside objective assessments of functional recovery using dynamometers, force platform tests (single-leg stance and squat jump variations), and measurements of active range of motion (AROM). Results: A total of 28 patients were included: 17 in the MCF patient group (Schatzker: 12 II; 5 III; 0 IV) and 11 in the BCF patient group (Schatzker: 6 V; 5 VI). Patients with less severe MCFs exhibited significantly better recovery outcomes, including higher KOOS (86.0 vs. 64.6, p = 0.04), IKDC (80.3 vs. 64.6, p = 0.04), and AKSS (95.3 vs. 70.5, p = 0.02) scores. They also demonstrated greater knee flexion (122.8° vs. 105.5°, p = 0.04) and faster neuromuscular recovery, as evidenced by higher rates of force development (RFD) during dynamic performance tests. Conversely, patients with more severe BCFs showed lower RFD values, indicating slower recovery and greater rehabilitation challenges. 
Conclusions: Integrating diagnostic tools such as PROMs, AROM, and neuromuscular performance tests provides valuable insights into recovery after ORIF for TPFs. Fracture severity significantly affects functional recovery: patients with MCFs show better outcomes and faster neuromuscular recovery, whereas patients with BCFs require longer rehabilitation focused on neuromuscular re-education and soft tissue recovery.

(This article belongs to the Special Issue Clinical Diagnosis and Management in Orthopaedics and Traumatology)

26 pages, 2575 KiB

Open Access Article

Comparing the Effectiveness of Machine Learning and Deep Learning Models in Student Credit Scoring: A Case Study in Vietnam

by Nguyen Thi Hong Thuy, Nguyen Thi Vinh Ha, Nguyen Nam Trung, Vu Thi Thanh Binh, Nguyen Thu Hang and Vu The Binh

Risks 2025, 13(5), 99; https://doi.org/10.3390/risks13050099 - 20 May 2025

Viewed by 1450

Abstract

In emerging markets such as Vietnam, where student borrowers often lack traditional credit histories, accurately predicting loan eligibility remains a critical yet underexplored challenge. While machine learning and deep learning techniques have shown promise in credit scoring, their comparative performance in the context of student loans has not been thoroughly investigated. This study evaluates and compares the predictive effectiveness of four supervised learning models (Random Forest, Gradient Boosting, Support Vector Machine, and a Deep Neural Network implemented with PyTorch version 2.6.0) in forecasting student credit eligibility. Primary data were collected from 1024 university students through structured surveys covering academic, financial, and personal variables. The models were trained and tested on the same dataset and evaluated using a comprehensive set of classification and regression metrics. The findings reveal that each model has distinct strengths. The deep neural network achieved the highest classification accuracy (85.55%), while Random Forest performed robustly, providing particularly balanced results across classification metrics. Gradient Boosting was effective in recall-oriented tasks, and the Support Vector Machine showed strong precision for the positive class, although its recall was lower than that of the other models. The study highlights the importance of aligning model selection with specific application goals, such as prioritizing accuracy, recall, or interpretability. It offers practical implications for financial institutions and universities developing machine learning and deep learning tools for student loan eligibility prediction. Future research should consider longitudinal data, behavioral factors, and hybrid modeling approaches to further optimize predictive performance in educational finance.

20 pages, 1064 KiB

Open Access Article

Predicting Early Employability of Vietnamese Graduates: Insights from Data-Driven Analysis Through Machine Learning Methods

by Long-Sheng Chen, Thao-Trang Huynh-Cam, Van-Canh Nguyen, Tzu-Chuen Lu and Dang-Khoa Le-Huynh

Big Data Cogn. Comput. 2025, 9(5), 134; https://doi.org/10.3390/bdcc9050134 - 19 May 2025

Viewed by 1853

Abstract

Graduate employability remains a crucial challenge for higher education institutions, especially in developing economies. This study investigates the key academic and vocational factors influencing early employment outcomes among recent graduates of a public university in Vietnam’s Mekong Delta region. By leveraging predictive analytics, the research explores how data-driven approaches can enhance career readiness strategies. The analysis employed AI-driven models, particularly classification and regression trees (CARTs), on a dataset of 610 recent graduates of this university to predict early employability. The input factors included gender, field of study, university entrance scores, and grade point average (GPA) for each of the four university years. The output factor was whether a graduate was employed within six months after graduation. Among all input factors, third-year GPA, university entrance scores, and final-year academic performance were the most significant predictors of early employment. Among the tested models, CARTs achieved the highest accuracy (93.6%), offering interpretable decision rules that can inform curriculum design and career support services. This study contributes to the intersection of artificial intelligence and vocational education by providing actionable insights for universities, policymakers, and employers, supporting the alignment of education with labor market demands and improving graduate employability outcomes.
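A CART classifier grows its tree by repeatedly choosing the split that most reduces impurity, typically Gini impurity. The sketch below shows only the core step, finding the best threshold on one numeric feature; a full tree applies it recursively across all features. The GPA values and outcomes are hypothetical, not the study's data.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple


def gini(labels: Sequence[int]) -> float:
    """Gini impurity: 1 - sum of squared class proportions (0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(x: Sequence[float], y: Sequence[int]) -> Tuple[Optional[float], float]:
    """Find the cut x <= t that minimises the size-weighted Gini impurity of the two children."""
    pairs = sorted(zip(x, y))
    n = len(pairs)
    best_t, best_score = None, gini(y)  # keep the parent node unless a split beats it
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # identical feature values cannot be separated
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_t, best_score = (pairs[i - 1][0] + pairs[i][0]) / 2, score
    return best_t, best_score


# Hypothetical third-year GPAs and employment within six months (1 = employed).
gpa_year3 = [2.1, 2.5, 2.8, 3.2, 3.5, 3.8]
employed = [0, 0, 0, 1, 1, 1]
print(best_threshold(gpa_year3, employed))  # (3.0, 0.0)
```

The resulting rule ("third-year GPA <= 3.0 predicts not employed") is the kind of interpretable decision rule the abstract refers to.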

20 pages, 988 KiB

Open Access Review

Safety and Security Considerations for Online Laboratory Management Systems

by Andrea Eugenia Pena-Molina and Maria Mercedes Larrondo-Petrie

J. Cybersecur. Priv. 2025, 5(2), 24; https://doi.org/10.3390/jcp5020024 - 13 May 2025

Viewed by 771

Abstract

The pandemic forced educators to shift abruptly to distance learning, also referred to as e-learning. Educational institutions integrated new educational tools and online platforms, and several schools, colleges, and universities began incorporating online laboratories in fields such as engineering, information technology, physics, and chemistry. Online laboratories may take the form of virtual laboratories (software-based simulations available via the Internet) or remote laboratories (online access to physical equipment). Adopting remote laboratories as a substitute for conventional hands-on labs has raised concerns about the safety and security of both the remote lab stations and the Online Laboratory Management Systems (OLMSs). Design patterns and architectures need to be developed to achieve security by design in remote laboratories, but before these can be developed, software architects and developers must understand the domain and the existing and proposed solutions. This paper presents an extensive literature review of safety and security concerns related to remote laboratories, together with an overview of the industry, national, and multinational standards and the legal requirements and regulations that need to be considered in building secure and safe OLMSs. The analysis provides a taxonomy and classification of published standards, as well as of security and safety problems and possible solutions, that can facilitate the documentation of best practices and implemented solutions to achieve security by design for remote laboratories and OLMSs.

22 pages, 4114 KiB

Open Access Article

Coupling Analysis of “Demand–Satisfaction” for Rural Public Service Facilities Based on the Kano Model with Importance–Performance Analysis: A Case Study of Gaoqing County, Zibo City

by Xinlei Wang, Jinwei Wen, Jing He, Mengying Wang, Keju Liu, Jinghua Dai, Dingqing Zhang, Dian Zhou and Yingtao Qi

Buildings 2025, 15(10), 1614; https://doi.org/10.3390/buildings15101614 - 10 May 2025

Viewed by 504

Abstract

In the context of people-oriented, high-quality development, improving the quality of rural public service facilities is an important way to promote rural revitalization. At present, however, rural public service facilities generally suffer from imbalanced resource allocation, limited service functions, and low service quality, and the mismatch between supply and demand is becoming increasingly prominent. Facilities can be allocated precisely only if the relationship between residents’ needs and their satisfaction with those facilities is well understood. In this study, Gaoqing County, Zibo City, Shandong Province, is selected as the empirical case. By applying the Kano model and Importance–Performance Analysis (IPA) to data from questionnaire surveys and expert interviews, we develop an integrated “demand–satisfaction” analytical framework and propose targeted optimization strategies for rural public service facilities. The Kano model was used to identify rural residents’ essential, desired, and attractive needs for various types of facilities. The IPA model was then used to analyze the coupling and coordination between demand and satisfaction, and the facilities were categorized accordingly. The results show that educational, healthcare, and market facilities have the highest demand, while cultural, sports, and technological facilities have relatively low demand. Based on these results, this study proposes an optimization strategy that classifies and grades public service facilities according to their “demand–satisfaction” coupling relationship. It provides a scientific basis for optimizing the layout of rural public service facilities and an effective reference for improving service quality and promoting the sustainable development of rural areas.
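IPA places each facility on an importance–performance grid and reads off a quadrant. The sketch below assumes the common variant in which the crosshairs are the mean importance and mean performance (here, satisfaction) scores, and it uses the quadrant labels from Martilla and James's original formulation. The study's own thresholds and its Kano coupling step are not reproduced, and the scores are hypothetical.

```python
from statistics import mean
from typing import Dict, Tuple


def ipa_quadrants(scores: Dict[str, Tuple[float, float]]) -> Dict[str, str]:
    """Map facility -> IPA quadrant, given facility -> (importance, performance) scores."""
    importance_cut = mean(imp for imp, _ in scores.values())
    performance_cut = mean(perf for _, perf in scores.values())
    quadrants: Dict[str, str] = {}
    for facility, (imp, perf) in scores.items():
        if imp >= importance_cut and perf < performance_cut:
            quadrants[facility] = "concentrate here"        # important but underperforming
        elif imp >= importance_cut:
            quadrants[facility] = "keep up the good work"   # important and performing well
        elif perf < performance_cut:
            quadrants[facility] = "low priority"            # less important, weaker performance
        else:
            quadrants[facility] = "possible overkill"       # less important, strong performance
    return quadrants


# Hypothetical mean survey scores on a 1-5 scale: (importance, satisfaction).
survey = {
    "healthcare": (4.6, 3.1),
    "education": (4.5, 4.2),
    "sports": (3.2, 3.0),
    "culture": (3.0, 4.0),
}
print(ipa_quadrants(survey))
```

Using the data means as crosshairs makes the quadrants relative to the surveyed facilities; a fixed scale midpoint (e.g. 3 on a 1-5 scale) is the usual alternative.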

(This article belongs to the Special Issue Renewal Design and Challenge of Urban and Rural Livable Environments in the Age of Population Shrinkage)

9 pages, 765 KiB

Open Access Article

Adaptation and Validation of the DSM 5 Youth Anxiety Scale—Part I (YAM-5-I) in Colombian Adolescents

by Yenny Salamanca-Camargo, José Antonio Muela-Martínez, Lourdes Espinosa-Fernandez and Mª del Mar Díaz-Castela

Healthcare 2025, 13(8), 900; https://doi.org/10.3390/healthcare13080900 - 14 Apr 2025

Viewed by 450

Abstract

Background/Objectives: In adolescence, anxiety disorders are the most prevalent mental disorders; they are also highly comorbid and tend to become chronic and persist into adulthood. Although several assessment measures with good psychometric properties exist, no instrument available in Colombia incorporates the new international diagnostic classifications, a gap that may hinder accurate diagnosis and subsequent care. This psychometric study aimed to adapt and validate Part I of the YAM-5 anxiety scale for adolescents. Methods: The items of the instrument were reviewed to identify terms that might be problematic in the local culture, and the instrument was administered to a sample of 536 adolescents from public and private educational institutions in the country’s five regions. Reliability was assessed with Cronbach’s alpha, construct validity with exploratory factor analysis using the principal components method, and the factor structure with confirmatory factor analysis using structural equation modeling. Results: Internal consistency was 0.93, and a structure of five correlated dimensions provided the best fit to the data. Conclusions: The examined structure shows high reliability and structural validity, and the instrument offers benefits such as its screening format, low cost, and suitability for non-clinical populations of Colombian adolescents.
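Cronbach's alpha compares the sum of item variances with the variance of respondents' total scores: α = k/(k − 1) · (1 − Σ σ²ᵢ / σ²ₜ), where k is the number of items, σ²ᵢ the variance of item i, and σ²ₜ the variance of the total scores. A minimal sketch, assuming a complete item-by-respondent score matrix with no missing answers; the responses are toy data, not the study's.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(items: Sequence[Sequence[float]]) -> float:
    """items[j][i] is respondent i's score on item j; every item needs the same respondents."""
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    n = len(items[0])
    totals = [sum(item[i] for item in items) for i in range(n)]
    item_variance_sum = sum(variance(item) for item in items)  # sample variances
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Toy data: 3 items answered by 5 respondents on a 0-3 scale.
responses = [
    [0, 1, 2, 3, 3],
    [0, 1, 1, 3, 2],
    [1, 1, 2, 3, 3],
]
print(round(cronbach_alpha(responses), 3))  # 0.964
```

Here the item variances sum to 4.0 and the total-score variance is 11.2, so α = 1.5 × (1 − 4/11.2) = 27/28. Values around 0.9, like the 0.93 reported above, are conventionally read as high internal consistency.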

(This article belongs to the Special Issue Mental Health, Innovative Therapies and Assessment in Adolescents and Young Adults and Related Contexts)

22 pages, 3190 KiB

Open Access Review

Global Research Trends, Hotspots, Impacts, and Emergence of Artificial Intelligence and Machine Learning in Health and Medicine: A 25-Year Bibliometric Analysis

by Alaa Dalky, Mahmoud Altawalbih, Farah Alshanik, Rawand A. Khasawneh, Rawan Tawalbeh, Arwa M. Al-Dekah, Ahmad Alrawashdeh, Tamara O. Quran and Mohammed ALBashtawy

Healthcare 2025, 13(8), 892; https://doi.org/10.3390/healthcare13080892 - 13 Apr 2025

Cited by 2 | Viewed by 1791

Abstract

Background/Objectives: The increasing application of artificial intelligence (AI) and machine learning (ML) in health and medicine has attracted considerable research interest in recent decades. This study aims to provide a global and historical picture of research on AI and ML in health and medicine. Methods: We searched the Scopus database and extracted articles published between 2000 and 2024. We then generated information about productivity, citations, collaboration, the most impactful research topics, emerging research topics, and author keywords using Microsoft Excel 365 and VOSviewer software (version 1.6.20). Results: We retrieved a total of 22,113 research articles, with a notable surge in research activity in recent years. Core journals were Scientific Reports and IEEE Access; core institutions included Harvard Medical School and the Ministry of Education of the People’s Republic of China; and core countries comprised the United States, China, India, the United Kingdom, and Saudi Arabia. Citation trends indicated substantial growth in the recognition of the impact of AI and ML on health and medicine. Frequent author keywords identified key research hotspots, including specific diseases such as Alzheimer’s disease, Parkinson’s disease, COVID-19, and diabetes. The author keyword analysis identified “deep learning”, “convolutional neural network”, and “classification” as dominant research themes. Conclusions: The transformative potential of AI and ML in health and medicine holds promise for improving global health outcomes.

32 pages, 3163 KiB

Open Access Article

Unveiling the Impact of Socioeconomic and Demographic Factors on Graduate Salaries: A Machine Learning Explanatory Analytical Approach Using Higher Education Statistical Agency Data

by Bassey Henshaw, Bhupesh Kumar Mishra, William Sayers and Zeeshan Pervez

Analytics 2025, 4(1), 10; https://doi.org/10.3390/analytics4010010 - 11 Mar 2025

Viewed by 1312

Abstract

Graduate salaries are a significant concern for graduates, employers, and policymakers, and they are shaped by many factors. This study investigates the determinants of graduate salaries in the UK, utilising survey data from HESA (Higher Education Statistical Agency) and integrating advanced machine learning (ML) explanatory techniques with statistical analytical methodologies. Employing multi-stage analyses alongside machine learning models such as decision trees and random forests, with explainability provided by SHAP (SHapley Additive exPlanations), this study investigates the influence of 21 socioeconomic and demographic variables on graduate salary outcomes. Key variables, including institutional reputation, age at graduation, socioeconomic classification, job qualification requirements, and domicile, emerged as critical determinants, with institutional reputation proving the most significant. Among the ML methods, the decision tree stood out, achieving the highest accuracy after rigorous optimisation, including oversampling and undersampling. SHAP highlighted the top 12 influential variables, providing actionable insights into the interplay between individual and systemic factors. Furthermore, statistical analysis using ANOVA (Analysis of Variance) confirmed the significance of these variables, revealing intricate interactions that shape graduate salary dynamics. Domain experts’ opinions were also analysed to corroborate the findings. This research makes a unique contribution by combining qualitative contextual analysis with quantitative methodologies, machine learning explainability, and domain experts’ views to address gaps in the existing identification of the factors that predict graduate salaries. The findings can also inform policy and educational interventions to reduce wage inequalities and promote equitable career opportunities.
Despite limitations, such as the UK-specific dataset and the focus on socioeconomic and demographic variables, this study lays a robust foundation for future research in predictive modelling and graduate outcomes.
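The abstract mentions ANOVA as the statistical check on which variables matter. As a minimal illustration of the underlying computation, the sketch below computes a one-way ANOVA F statistic from scratch for salary samples grouped by a single categorical variable. The groups and values are hypothetical, and the study may well have used multi-factor ANOVA (it reports interactions), so this shows only the simplest case. Turning F into a p-value needs the F distribution's upper tail (for example `scipy.stats.f.sf`, or `scipy.stats.f_oneway` for the whole test), which is left out here to keep the example standard-library only.

```python
from statistics import fmean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of numeric samples.

    F = (SS_between / (k - 1)) / (SS_within / (n - k)),
    where k is the number of groups and n the total number of observations.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = fmean([x for g in groups for x in g])
    group_means = [fmean(g) for g in groups]

    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Variation of observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


if __name__ == "__main__":
    # Hypothetical salaries (in £1,000s) for three illustrative institution tiers.
    tiers = [[24, 26, 25], [28, 30, 29], [33, 35, 34]]
    f_stat, df_b, df_w = one_way_anova(tiers)
    print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

For these toy groups the between-group sum of squares is 122 and the within-group sum is 6, giving F(2, 6) = 61.00; a large F relative to the F(2, 6) distribution indicates that the group means differ more than within-group noise would explain.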

29 pages, 4066 KiB

Open Access Article

SAPEx-D: A Comprehensive Dataset for Predictive Analytics in Personalized Education Using Machine Learning

by Muhammad Adnan Aslam, Fiza Murtaza, Muhammad Ehatisham Ul Haq, Amanullah Yasin and Numan Ali

Data 2025, 10(3), 27; https://doi.org/10.3390/data10030027 - 20 Feb 2025

Cited by 2 | Viewed by 1593

Abstract

Education is crucial for leading a productive life and obtaining necessary resources. As a result of technological innovation, higher education institutions are progressively incorporating artificial intelligence into conventional teaching methods. Because a strong academic record raises a university’s ranking and improves students’ career prospects, predicting learning success has been a central focus in education. Modern institutions face the twin challenges of analyzing performance and providing high-quality instruction, while students must maintain high academic standards, balance personal life with academics, and adapt to new technology. In this study, we present a comprehensive dataset, SAPEx-D (Student Academic Performance Exploration), designed to predict student performance and encompassing a wide array of personal, familial, academic, and behavioral factors. Our data collection effort at Air University, Islamabad, Pakistan, involved both online and paper questionnaires completed by students across multiple departments, ensuring diverse representation. After meticulous preprocessing to remove duplicates and entries with significant missing values, we retained 494 valid responses. The dataset includes detailed attributes such as demographic information, parental education and occupation, study habits, reading frequency, and mode of transportation. To facilitate robust analysis, we encoded ordinal attributes using label encoding and nominal attributes using one-hot encoding, expanding the dataset from 38 to 88 attributes. Feature scaling was then performed with a normalization technique to standardize the range of the data. Our analysis revealed that factors such as degree major, parental education, reading frequency, and scholarship type significantly influence student performance.
The machine learning models applied to this dataset, including Gradient Boosting and Random Forest, demonstrated high accuracy and robustness, underscoring the dataset’s potential for insightful academic performance prediction. Gradient Boosting achieved an accuracy of 68.7% and an F1-score of 68% on the eight-class classification task. On the three-class classification task, Random Forest outperformed the other models, reaching an accuracy of 80.8% and an F1-score of 78%. These findings highlight the importance of comprehensive data in understanding and predicting academic outcomes, paving the way for more personalized and effective educational strategies.
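The preprocessing described in the SAPEx-D abstract (label encoding for ordinal attributes, one-hot encoding for nominal ones, then normalization) is what grows the attribute count, since every nominal category becomes its own 0/1 column. The sketch below shows that idea on three hypothetical survey rows. The column names, the ordering of the reading scale, and the choice of min-max scaling as the "normalization technique" are all assumptions for illustration, not details taken from the dataset.

```python
# Hypothetical survey rows (illustrative only, not SAPEx-D data).
ROWS = [
    {"reading": "rarely", "transport": "bus", "age": 19},
    {"reading": "often", "transport": "car", "age": 23},
    {"reading": "sometimes", "transport": "walk", "age": 21},
]

# Assumed order of the ordinal reading-frequency scale (lowest to highest).
READING_ORDER = ["rarely", "sometimes", "often"]


def min_max(value, lo, hi):
    """Scale value into [0, 1]; a constant column maps to 0.0."""
    return (value - lo) / (hi - lo) if hi > lo else 0.0


def encode(rows):
    """Label-encode 'reading', one-hot encode 'transport', min-max scale 'age'."""
    categories = sorted({r["transport"] for r in rows})
    ages = [r["age"] for r in rows]
    lo, hi = min(ages), max(ages)

    encoded = []
    for r in rows:
        # Ordinal attribute: its position on the scale preserves the ordering.
        vec = {"reading": READING_ORDER.index(r["reading"])}
        # Nominal attribute: one indicator column per category, so no false order is implied.
        for c in categories:
            vec[f"transport_{c}"] = 1 if r["transport"] == c else 0
        # Numeric attribute: rescaled to a common [0, 1] range.
        vec["age"] = min_max(r["age"], lo, hi)
        encoded.append(vec)
    return encoded


if __name__ == "__main__":
    for vec in encode(ROWS):
        print(vec)
```

Here three original attributes become five columns (one label-encoded, three one-hot, one scaled), which is the same mechanism, at a much smaller scale, as the reported expansion from 38 to 88 attributes.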

(This article belongs to the Special Issue Data Mining and Computational Intelligence for E-Learning and Education—3rd Edition)
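The SAPEx-D abstract reports accuracy and F1-score for the eight-class and three-class tasks. The sketch below shows how those two metrics can be computed from true and predicted labels. The abstract does not say how per-class F1 values were averaged, so macro averaging (an unweighted mean over classes) is an assumption here; a support-weighted average would give different numbers on imbalanced data. The labels are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro averaging is an assumption)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical three-class labels (not results from the study).
    y_true = ["low", "low", "medium", "medium", "high", "high"]
    y_pred = ["low", "medium", "medium", "medium", "high", "low"]
    print(f"accuracy = {accuracy(y_true, y_pred):.3f}")
    print(f"macro F1 = {macro_f1(y_true, y_pred):.3f}")
```

On this toy example, four of six predictions are correct (accuracy 0.667), and the per-class F1 scores of 0.667 (high), 0.5 (low), and 0.8 (medium) average to a macro F1 of about 0.656, showing how accuracy and F1 can diverge when errors are spread unevenly across classes.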

Search Results (76)