Article

Personalized Course Recommendation System: A Multi-Model Machine Learning Framework for Academic Success

by Md Sajid Islam and A. S. M. Sanwar Hosen *
Department of Artificial Intelligence and Big Data, Woosong University, Daejeon 34606, Republic of Korea
* Author to whom correspondence should be addressed.
Digital 2025, 5(2), 17; https://doi.org/10.3390/digital5020017
Submission received: 21 March 2025 / Revised: 11 May 2025 / Accepted: 19 May 2025 / Published: 22 May 2025

Abstract: The increasing complexity of academic programs and student needs necessitates personalized, data-driven academic advising. Traditional heuristic-based methods often fail to optimize course selection, leading to inefficient academic planning and delayed graduations. This study introduces a hierarchical multi-model machine learning framework for personalized course recommendations, integrating five predictive models: Success Probability Model (SPM), Course Fit Score Model (CFSM), Prerequisite Fulfillment Model (PFM), Graduation Priority Model (GPM), and Recommended Load Model (RLM). These models operate independently in a local model framework, generating specialized predictions that are synthesized by a global model framework through a meta-function. The meta-function aggregates predictions to compute a final score for each course and ensures recommendations align with student success probabilities, program requirements, and workload constraints. It enforces key constraints, such as prerequisite satisfaction, workload optimization, and program-specific requirements, refining recommendations to be both academically viable and institutionally compliant. The framework demonstrated strong predictive performance, with root mean squared error values of 0.00956, 0.011713, and 0.005406 for SPM, CFSM, and RLM, respectively. Classification models for PFM and GPM also yielded high accuracy, exceeding 99%. Designed for modularity and adaptability, the framework allows for the integration of additional predictive models and fine-tuning of recommendation priorities to suit institutional needs. This scalable solution enhances academic advising efficiency by transforming granular model predictions into personalized, actionable course recommendations, supporting students in making informed academic decisions.

1. Introduction

The growing complexity of academic programs, combined with the diverse needs of students, presents a significant challenge for traditional academic advising systems. Effective course selection is critical to academic success, timely graduation, and long-term career planning. However, conventional advising methods—often reliant on static rules and manual interventions—struggle to account for each student’s unique background, goals, and constraints. These limitations can result in misaligned course choices, workload imbalances, and delayed graduations, negatively affecting both student outcomes and institutional efficiency [1,2,3]. As higher education becomes more competitive, the demand for scalable, personalized advising solutions continues to grow [4].
Recent advancements in machine learning (ML) offer promising avenues for automating and individualizing the advising process. ML models can learn from large volumes of academic data—including course histories, student performance, and institutional requirements—to generate predictive insights for course suitability, success likelihood, and academic risk [5,6]. However, many existing systems rely on single-model frameworks, which fail to account for the multi-objective nature of academic advising. These models often overlook critical academic constraints, such as prerequisite fulfillment, credit load feasibility, and graduation progression [7].
This study proposes a modular, multi-model ML framework for personalized course recommendation. The system decomposes the advising task into five predictive components—Success Probability Model (SPM), Course Fit Score Model (CFSM), Prerequisite Fulfillment Model (PFM), Graduation Priority Model (GPM), and Recommended Load Model (RLM)—each modeled independently to address a specific advising dimension. Their outputs are integrated via a meta-function to generate comprehensive, constraint-aware course recommendations that balance academic performance, feasibility, and institutional rules. The framework emphasizes domain-driven feature engineering to improve both accuracy and interpretability. Features such as success probability score and graduation priority were designed to capture nuanced academic planning needs. To support experimentation while maintaining data privacy, synthetic datasets were developed to simulate realistic academic structures and enrollment behaviors [8,9,10,11]. In doing so, this work addresses several research questions:
(1) How can a modular, multi-model ML system improve the accuracy and constraint compliance of course recommendations?
(2) What role do domain-specific features and constraint-enforcing meta-functions play in enhancing the interpretability and applicability of ML-driven advising?
(3) Can realistic synthetic academic datasets support the ethical and scalable development of personalized advising tools in the absence of real student data?
By bridging algorithmic prediction with academic planning constraints, the proposed framework supports informed, individualized, and institutionally aligned advising. It serves not only as a tool for improving student outcomes but also as a foundation for scalable, adaptive educational infrastructure.
The remainder of the paper is structured as follows: Section 2 reviews related works on ML-based course recommendation systems. Section 3 details the methodology, including model architecture, feature design, and the meta-function. Section 4 presents the experimental results, analysis, and discussion. Section 5 concludes the work.

2. Related Work

Academic advising has evolved with the integration of ML techniques, enhancing personalization, scalability, and decision-making in course recommendation systems [5,6,12]. Prior research has explored deep learning (DL) models, ensemble techniques, hybrid recommendation systems, and synthetic data applications to improve advising efficiency and address data privacy concerns. However, many existing frameworks suffer from single-model limitations, lack of multi-objective optimization [13], and poor integration of academic constraints such as workload balancing and prerequisite fulfillment [5,6,14]. This section reviews key prior works, identifying their methodologies, objectives, performance, and limitations.

2.1. Collaborative and Content-Based Filtering Models

Early ML-based recommendation systems applied collaborative filtering (CF) and content-based filtering (CBF) to improve personalization. The authors of [6] proposed a hybrid approach combining CF and CBF, which enhances recommendations in sparse data settings. However, such models often suffer from cold-start and sparsity issues and typically lack structured academic constraints like prerequisites and workload balancing.

2.2. Deep Learning Approaches

DL models have been employed to improve prediction accuracy in course recommendations. The authors of [5] introduced DORIS, which uses deep factorization machines (DeepFMs) and achieved an Area Under the Receiver Operating Characteristic Curve (AUC) of 0.969. The work in [12] proposed a personality-aware DL model that incorporates user traits for improved personalization. While these models show high performance, they generally do not account for academic-specific constraints or long-term planning strategies.

2.3. Context-Aware and Constrained Optimization Models

Structured decision-making in academic environments often involves competing objectives. The authors of [13] proposed a constrained Bayesian optimization framework for balancing objectives under real-world constraints. Though not designed for advising, its applicability in structured settings is notable. Such methods highlight the importance of context-aware and multi-objective optimization in recommendation systems.

2.4. Hybrid and Ensemble Learning Models

Hybrid systems that integrate multiple ML algorithms have demonstrated improved robustness. The authors of [14] developed a weighted ensemble model combining Random Forest (RF), Naïve Bayes, and Support Vector Machine (SVM), outperforming single-model baselines in multiple metrics. The proposed method in [15] used conformal prediction in an ensemble setting to improve model interpretability. However, these works focus primarily on prediction performance and lack structured academic constraint integration.

2.5. Feature Engineering and Interpretability

Feature engineering enhances both model accuracy and transparency. The authors of [16] emphasized that AI-driven recommendations are more useful when interpretable. The proposed method in [17] focused on feature selection to reduce redundancy and improve explainability. However, these works rarely integrate domain-specific academic variables like skill alignment, course load thresholds, or prerequisite chains.

2.6. Synthetic Data in Academic Recommender Systems

Synthetic data are increasingly used in privacy-sensitive educational ML applications. The research in [18,19] demonstrated that well-structured synthetic datasets can preserve predictive utility while maintaining confidentiality. However, prior works have not fully integrated synthetic data into multi-model recommendation systems tailored to academic advising. This study builds upon these efforts by embedding synthetic data into a scalable, constraint-aware framework.
A comparative overview of these prior works, categorized by methodology, contributions, and limitations, is presented in Table 1.
While previous ML-based advising systems have explored DL [5,12], hybrid approaches [6], constrained optimization [13], and ensemble learning [14], they often lack a unified mechanism to optimize multiple academic constraints simultaneously. These include prerequisite fulfillment, workload balancing, and graduation planning. Single-model systems often focus narrowly on prediction accuracy [5,12], sparsity resolution [6], or interpretability [15,16,17], without addressing structured academic planning holistically. In contrast, our proposed framework integrates multiple specialized models using a hierarchical meta-function to generate constraint-aware recommendations that align with institutional policies and individual student needs. Furthermore, by leveraging synthetic datasets [18,19], the framework supports scalable and privacy-compliant deployment across diverse academic contexts.

3. The Proposed Personalized Course Recommendation System

The proposed Personalized Course Recommendation System (PCRS) leverages a hierarchical multi-model ML framework to address the complexities of academic advising. The framework consists of two distinct layers: a Local Model Framework (LMF) and a Global Model Framework (GMF). The LMF consists of five predictive models—SPM, CFSM, PFM, GPM, and RLM—each trained to address a specific dimension of course selection. The GMF synthesizes the outputs of these models through a meta-function and dynamically combines their predictions to generate actionable recommendations tailored to individual student profiles. This design is inspired by the Hierarchical Mixtures of Experts (HME) model [20], which enables modular and scalable decision-making by integrating specialized local models into a unified framework. Such an approach ensures adaptability and efficiency in academic advising systems.
Figure 1 provides an overview of the system, showing its distinct stages and hierarchical structure. The framework begins with raw data collection, followed by data preprocessing and feature engineering to extract meaningful and domain-specific features. The five predictive models in the LMF operate independently, focusing on task-specific objectives such as predicting success probabilities or ensuring prerequisite fulfillment. Their outputs are then aggregated in the GMF through a meta-function that accounts for academic priorities, workload constraints, and program requirements. This end-to-end process ensures that the system captures nuanced aspects of academic needs while producing personalized, optimized recommendations. The following subsections describe each stage of the PCRS, emphasizing the technical design and implementation of the system.

3.1. Dataset Collection

The dataset used in this study was synthetically generated to emulate real-world academic environments while preserving privacy and ensuring reproducibility. It comprises 101,330 student–course interaction records, based on a simulated academic population of 10,000 students and 600 unique courses. Data generation was guided by institutional logic and curriculum structures observed in higher education, including GPA-dependent course enrollment, probabilistic grade assignment, prerequisite and corequisite relationships, and graduation planning aligned with credit accumulation and semester limits.
Four primary datasets were created to represent different facets of academic advising. The Course dataset, exemplified in Table 2, provides detailed metadata for each course, including its credit type, difficulty level, offering schedule, mode of instruction, and historical pass rates. The Student dataset, exemplified in Table 3, includes academic performance indicators such as GPA and credit completion status, as well as contextual factors like major, minor, learning preferences, class availability, and extracurricular load. The Student–Course Interaction dataset, exemplified in Table 4, links students with courses they have taken, incorporating attributes such as enrollment status, earned grades, engagement levels, and feedback scores. Additionally, the Semester dataset records time-sensitive information such as the number of semesters remaining until graduation, current semester load, and scheduling constraints, enabling models to generate recommendations consistent with academic progression and institutional timelines.
The merging of these datasets was performed using common identifiers such as student_id and course_id, resulting in a unified structure for ML tasks. While the dataset is entirely synthetic, its design emphasizes structural and behavioral realism. Diverse course types, varying academic pathways, and broad distributions of student attributes were incorporated to reduce narrow-pattern overfitting. Although explicit fairness metrics were not computed, the data generation process aimed to simulate a representative academic population with varied performance histories and learning trajectories. The dataset is available upon request for academic research and validation purposes.
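As a concrete illustration of this generation logic, the following minimal Python sketch shows how GPA-dependent synthetic records of this kind can be produced. The column names, distributions, and the GPA-to-course-level mapping are assumptions made for exposition, not the study's actual generation code.

```python
import numpy as np
import pandas as pd

# Illustrative generator sketch; names and distributions are assumed.
rng = np.random.default_rng(42)
n_students = 10_000

students = pd.DataFrame({
    "student_id": np.arange(n_students),
    "gpa": np.clip(rng.normal(3.0, 0.5, n_students), 0.0, 4.0).round(2),
    "credits_completed": rng.integers(0, 121, n_students),
    "semesters_left": rng.integers(1, 9, n_students),
})

def sample_course_level(gpa: float) -> int:
    """GPA-dependent enrollment: stronger students draw upper-level
    courses with higher probability (a crude monotone mapping)."""
    p_upper = min(0.9, max(0.1, (gpa - 2.0) / 2.0))
    return 300 if rng.random() < p_upper else 100

students["sampled_course_level"] = students["gpa"].apply(sample_course_level)
```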

3.2. Data Preparation and Feature Engineering

The final dataset (DF4) was constructed by integrating student, course, and semester-related attributes. Its derivation involved rigorous data preparation and feature engineering, as illustrated in Figure 2.
To ensure the dataset is clean, structured, and ready for modeling, several preprocessing steps are applied. The raw datasets contain duplicates, missing values, and inconsistencies, which are addressed before merging. Duplicates are removed to avoid redundant records, and missing values in categorical attributes such as minor and corequisites are replaced with “None”, while missing numerical values, including GPA and grades, are imputed using median values. The Student dataset and Course dataset are merged based on student_id and course_id to form the Student–Course Interaction dataset, which introduces additional attributes related to student performance, course engagement, and academic behavior. The Semester dataset is then integrated based on student_id, incorporating semester-specific academic progress into the final dataset.
Furthermore, to reduce the impact of extreme values, outlier removal is performed using the interquartile range (IQR) filtering method, particularly for numerical features such as grades [21], workload levels, and participation rates. Since ML models require numerical inputs, categorical variables are encoded to ensure compatibility. One-hot encoding is used for non-ordinal categorical features such as course level and mode of instruction, while ordinal encoding is applied to attributes with inherent ranking, such as workload level, where values are mapped to numerical equivalents (Low = 1, Medium = 2, High = 3). Additionally, to enhance the model’s ability to predict student success and course suitability [22], a total of twelve features are engineered. These features are categorized into input features, which serve as predictors, and output features, which act as target variables for predictive modeling.
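The preprocessing steps described above can be summarized in a short pandas sketch. The file and column names below are illustrative assumptions about the schema, not the study's exact code.

```python
import pandas as pd

# Assumed raw inputs; file names are placeholders for the four datasets.
students = pd.read_csv("students.csv").drop_duplicates()
courses = pd.read_csv("courses.csv").drop_duplicates()
interactions = pd.read_csv("interactions.csv").drop_duplicates()
semesters = pd.read_csv("semesters.csv").drop_duplicates()

# Missing values: "None" for categoricals, median for numericals.
students["minor"] = students["minor"].fillna("None")
students["gpa"] = students["gpa"].fillna(students["gpa"].median())

# Merge on the shared identifiers to build the unified dataset (DF4).
df = (interactions.merge(students, on="student_id")
                  .merge(courses, on="course_id")
                  .merge(semesters, on="student_id"))

# IQR-based outlier filtering on a numerical feature such as grade.
q1, q3 = df["grade"].quantile([0.25, 0.75])
iqr = q3 - q1
df = df[df["grade"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]

# One-hot encode non-ordinal categoricals; map ordinal workload levels.
df = pd.get_dummies(df, columns=["course_level", "mode_of_instruction"])
df["workload_level"] = df["workload_level"].map({"Low": 1, "Medium": 2, "High": 3})
```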
To serve as interpretable heuristics that balance simplicity, relevance, and compatibility with ML-based modeling, we engineered several domain-driven features: Engagement Score ($E_{score}$), Completion Ratio ($C_{ratio}$), Demand Ratio ($D_{ratio}$), Skill Tag Diversity ($S_{diversity}$), Difficulty Score ($D_{score}$), Alignment Score ($A_{score}$), Course Fit Score ($C_{fit}$), Success Probability ($P_{success}$), Graduation Priority ($G_{priority}$), Prerequisite Fulfillment ($P_{full}$), and Recommended Load ($R_{load}$). These features transform raw academic data into meaningful metrics that capture student behaviors, course characteristics, and academic priorities. The $E_{score}$ quantifies student involvement by combining satisfaction levels with course feedback, as defined in Equation (1).

$E_{score} = satisfaction\_level \times feedback\_score$ (1)

where $satisfaction\_level$ is measured on a Likert scale (e.g., 1–5) and $feedback\_score$ is a numerical value summarizing course evaluation. A higher $E_{score}$ indicates deeper student involvement, which is associated with better academic outcomes. The $C_{ratio}$ measures the efficiency of course completion by considering the time required to complete a course relative to its credit value; a lower ratio suggests better time management and course efficiency.
The $D_{ratio}$ reflects the popularity of a course by comparing the number of enrolled students to the course’s total capacity. A higher ratio indicates strong demand and suggests the course is competitive or frequently selected by students. The $S_{diversity}$ assesses the range of skills a course offers relative to its prerequisites, as derived in Equation (2).

$S_{diversity} = \dfrac{skill\_tag\_count}{prerequisite\_count + 1}$ (2)

where $skill\_tag\_count$ is the number of distinct skill tags associated with the course, and $prerequisite\_count$ is the number of required prior courses (adding 1 to the denominator avoids division by 0). This feature helps measure the breadth of a course’s curriculum.
The $D_{score}$ quantifies course complexity by combining qualitative workload information with the course’s credit value, as calculated in Equation (3).

$D_{score} = workload\_level_{mapped} \times credits$ (3)

where $workload\_level_{mapped}$ is a numerical mapping of qualitative workload descriptions (e.g., Low = 1, Medium = 2, High = 3), and $credits$ represents the course’s academic weight.
From the engineered features, five key output features that serve as the target variables for the predictive models are derived. The $C_{fit}$ evaluates how well a course aligns with a student’s academic profile by integrating several input features, as computed in Equation (4).

$C_{fit} = A_{score} + 0.5 \times S_{diversity} + 0.5 \times G_{avg}$ (4)

$A_{score} = G_{avg} \times graduation\_rate\_impact$ (5)

where $A_{score}$ is defined in Equation (5), and $graduation\_rate\_impact$ quantifies the course’s importance for graduation; $S_{diversity}$ is given by Equation (2); and Grade ($G_{avg}$) is the average of all the grades. This composite metric ensures that recommended courses match the student’s academic strengths and institutional requirements.
The $P_{success}$ estimates the likelihood of a student successfully completing a course, as defined in Equation (6).

$P_{success} = P_{adj} \times historical\_pass\_rate$ (6)

where Adjusted Popularity ($P_{adj}$) is a combination of course popularity and success rate.
The $G_{priority}$ is a binary indicator that signifies whether a course is critical for timely graduation, as formulated in Equation (7).

$G_{priority} = \begin{cases} 1, & \text{if } semester\_left < 2 \text{ and } credit\_type = 0 \\ 0, & \text{otherwise} \end{cases}$ (7)

where $semester\_left$ denotes the number of semesters remaining for the student, and the course’s mandatory status is determined by the coded variable $credit\_type$ (e.g., 0 for mandatory courses). This output helps prioritize courses that are essential for graduation.
The $P_{full}$ output is a binary flag that indicates whether a student meets all the required prerequisites for a course, as defined in Equation (8).

$P_{full} = \begin{cases} 1, & \text{if prerequisites are met} \\ 0, & \text{otherwise} \end{cases}$ (8)

where $P_{full} = 1$ ensures that only courses for which the student meets all prerequisites are recommended, and $P_{full} = 0$ indicates that the student does not fulfill the necessary prerequisites, meaning the course should not be recommended at this stage. The $R_{load}$ estimates an optimal course load by balancing course difficulty with student engagement, as defined in Equation (9).

$R_{load} = \dfrac{D_{score}}{E_{score} + 1}$ (9)

where $D_{score}$ is the Difficulty Score derived from Equation (3) and $E_{score}$ is the Engagement Score derived from Equation (1). The addition of 1 to the denominator prevents division by 0 and stabilizes the metric.
Collectively, the input features capture detailed aspects of student performance and course characteristics, while the output features—specifically $C_{fit}$, $P_{success}$, $G_{priority}$, $P_{full}$, and $R_{load}$—serve as the target variables for our predictive models. By transforming raw data into these robust and interpretable metrics, this feature engineering process lays the groundwork for accurate modeling and ultimately supports the generation of personalized, data-driven course recommendations that align with both institutional priorities and student needs. Additionally, essential numerical features are standardized using z-score normalization (StandardScaler), as computed in Equation (10).

$z = \dfrac{x - u}{s}$ (10)

where $u$ is the mean and $s$ is the standard deviation of the training samples.
The transformation ensures that numerical features have a mean of 0 and a standard deviation of 1, improving model stability. While all engineered features are designed to enhance predictive modeling, some were explored during experimentation but not explicitly used in model training. Although they do not influence the final predictions, they offer insights that could enhance future models through advanced feature selection or hybrid approaches.
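As a concrete illustration, Equations (1)–(6), (9), and (10) translate directly into vectorized pandas operations. The sketch below continues from the preprocessing example and assumes the listed column names exist in the merged dataset.

```python
from sklearn.preprocessing import StandardScaler

# Vectorized translation of the feature formulas; columns are assumed.
df["e_score"] = df["satisfaction_level"] * df["feedback_score"]                # Eq. (1)
df["s_diversity"] = df["skill_tag_count"] / (df["prerequisite_count"] + 1)     # Eq. (2)
df["d_score"] = df["workload_level"] * df["credits"]                           # Eq. (3)
df["a_score"] = df["grade_avg"] * df["graduation_rate_impact"]                 # Eq. (5)
df["c_fit"] = df["a_score"] + 0.5 * df["s_diversity"] + 0.5 * df["grade_avg"]  # Eq. (4)
df["p_success"] = df["adjusted_popularity"] * df["historical_pass_rate"]       # Eq. (6)
df["r_load"] = df["d_score"] / (df["e_score"] + 1)                             # Eq. (9)

# z-score standardization of the numerical inputs (Eq. (10)).
num_cols = ["e_score", "s_diversity", "d_score", "a_score"]
df[num_cols] = StandardScaler().fit_transform(df[num_cols])
```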

3.3. Local Model Framework

The Local Model Framework (LMF) is designed to address the diverse and multi-faceted aspects of the course recommendation process. It consists of multiple specialized models, each responsible for a specific predictive task. This modular and task-specific design ensures precision, interpretability, and scalability, allowing seamless integration into the global recommendation system. Each model leverages engineered features and advanced ML algorithms to generate outputs that collectively contribute to personalized course recommendations, supporting multi-dimensional academic decision-making.
The first component, SPM, predicts the likelihood that a student will successfully complete a course. It utilizes academic performance indicators, engagement levels, course complexity, popularity, and historical success trends to generate this estimate. These features were previously defined in Section 3.2. To model this probability, a Gradient Boosting Regressor (GBR) is employed. GBR is well-suited for this task due to its ability to handle non-linear relationships and iteratively refine predictions, making it particularly effective in modeling complex dependencies between success rates and influencing factors [23]. Its boosting mechanism minimizes prediction errors by combining weak learners in a sequential manner, leading to strong overall predictive performance, especially in datasets with mixed feature types and noisy patterns. The model outputs a probability score that represents the likelihood of a student passing a course.
Building on this prediction, CFSM evaluates how well a course aligns with a student’s academic strengths and performance history. It incorporates past academic outcomes, program alignment, course characteristics, and predicted success probability to ensure that recommended courses match a student’s abilities and graduation trajectory. GBR is used for this task as well, due to its strength in handling continuous output variables and capturing non-linear relationships between academic profile features and course characteristics. Notably, GBR’s ability to provide interpretable feature importance is especially beneficial in academic advising, where transparency and explainability are essential for both students and institutional stakeholders. Additionally, its iterative boosting mechanism improves prediction accuracy by minimizing residual errors, making it well-suited for generating nuanced alignment scores that reflect multiple academic dimensions.
Following course alignment, PFM determines whether a student meets the prerequisites for enrolling in a course. It evaluates prerequisite completion, academic readiness, course difficulty, and skill progression to ensure students fulfill all necessary requirements before enrollment. For this binary classification task, a CatBoost Classifier (CBC; version 1.2.7) is employed because it handles categorical variables—such as prerequisite tags and completion status—natively, minimizing the need for extensive preprocessing. Additionally, CBC captures complex feature interactions and reduces overfitting through ordered boosting and efficient categorical encoding, making it well-suited for structured academic datasets that combine discrete and numerical inputs [24].
Prioritization of course selection is addressed by GPM, which identifies courses essential for a student’s timely graduation. The model analyzes degree requirements, remaining semesters, credit distribution, and enrollment patterns. The graduation priority condition was previously defined in Section 3.2. A Light Gradient Boosting Machine (LightGBM; version 4.5.0) model is used for this classification task, selected for its efficiency in handling large-scale data and the complex feature interactions common in educational datasets. Its histogram-based algorithm and leaf-wise tree growth contribute to faster training speeds and reduced memory usage, making it particularly suitable for processing intricate educational data [25]. The model produces a binary output, where 1 represents a high-priority course for graduation and 0 represents a lower-priority course.
Finally, the last component, RLM, predicts the optimal academic workload for a student in an upcoming semester. It utilizes student engagement levels, workload intensity, academic priorities, and course difficulty to ensure a well-balanced and manageable semester schedule. Once again, GBR is used for this model due to its effectiveness in capturing complex relationships between course difficulty, engagement, and time efficiency. The model predicts an optimal workload value, ensuring that students receive recommendations that balance academic challenge and manageability.
Together, these modular prediction models form the foundation of the overall recommendation framework. By leveraging advanced ML algorithms such as GBR, CBC, and LightGBM, combined with carefully engineered features, this framework enhances both the accuracy and interpretability of course recommendations. These individual model outputs are then synthesized in the Global Recommendation Model, ensuring holistic and institutionally aligned academic advising.
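The division of labor across the LMF can be sketched as follows, pairing each target variable with the algorithm selected above. The feature and target column names are illustrative assumptions carried over from the earlier sketches.

```python
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from catboost import CatBoostClassifier
from lightgbm import LGBMClassifier

# One estimator per LMF task, mirroring the model choices described above.
input_feature_cols = ["e_score", "s_diversity", "d_score", "a_score"]
lmf_spec = {
    "spm":  ("p_success",  GradientBoostingRegressor()),   # regression
    "cfsm": ("c_fit",      GradientBoostingRegressor()),   # regression
    "rlm":  ("r_load",     GradientBoostingRegressor()),   # regression
    "pfm":  ("p_full",     CatBoostClassifier(verbose=0)), # binary classification
    "gpm":  ("g_priority", LGBMClassifier()),              # binary classification
}

models = {}
for name, (target, estimator) in lmf_spec.items():
    X_tr, X_te, y_tr, y_te = train_test_split(
        df[input_feature_cols], df[target], test_size=0.2, random_state=0)
    models[name] = estimator.fit(X_tr, y_tr)
```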

3.4. Global Model Framework

The Global Model Framework (GMF) synthesizes the outputs of the local models to generate actionable, personalized course recommendations. By integrating predictions from the five independent local models—SPM, CFSM, PFM, GPM, and RLM—the GMF ensures holistic recommendations that consider success likelihood, course fit, prerequisite fulfillment, graduation priorities, and optimal workload management. Operating as a higher-level decision-making layer, this framework translates granular predictions into a cohesive and tailored recommendation list.
At the core of the GMF lies the meta-function, which combines the outputs of SPM, CFSM, and GPM into a single, composite final score for each course. This score reflects the overall suitability of a course for a given student, balancing academic priorities and institutional constraints. The meta-function is defined in Equation (11).
$Final\ Score = w_1 \times predicted\_success\_probability + w_2 \times predicted\_course\_fit\_score + w_3 \times predicted\_graduation\_priority$ (11)

where $w_1$, $w_2$, and $w_3$ are weights assigned to the outputs of SPM, CFSM, and GPM, respectively. These weights are static but manually adjustable, allowing institutions to fine-tune the balance of priorities based on specific advising policies or the needs of different student cohorts.
The GMF further refines the recommendation list by enforcing several academic constraints to ensure that course recommendations are both practical and institutionally compliant. One critical constraint is Prerequisite Satisfaction, which ensures that courses for which prerequisites are not met are excluded. This filtering is guided by the binary output of PFM, which flags courses as eligible (1) or ineligible (0). Another essential constraint is Workload Optimization, leveraging the output of RLM to filter out course selections that would exceed a student’s optimal semester workload. A typical threshold of 15 credits is applied to promote academic success and student well-being.
The framework also accounts for Program-Specific Requirements, deprioritizing courses that fall outside a student’s program unless they fulfill elective or cross-departmental requirements. Additionally, Course Availability constraints ensure that courses not offered in the current semester or already completed by the student are removed from the recommendation list. Collectively, these constraints optimize the recommendations, ensuring both academic feasibility and alignment with individual student progress.
After applying these constraints, the GMF ranks the remaining courses based on their computed final scores. The top N ranked courses are presented to the student in descending order of suitability, ensuring that the most relevant and feasible options are prioritized. To enhance interpretability, the recommendations are enriched with detailed course attributes, including course name, level, category, prerequisites, semester availability, and mode of instruction. This comprehensive output empowers students to make informed, confident decisions about their academic paths.
The GMF’s design offers several key advantages. First, its modularity allows for the seamless addition of new local models. For example, models predicting financial aid eligibility or elective preferences can be incorporated without disrupting the aggregation process. This flexibility is crucial for adaptive learning environments, enabling independent components to collaborate while leveraging shared information. Second, its adaptability allows institutions to tune meta-function weights to reflect varying advising priorities. For instance, institutions may prioritize SPM for first-year students exploring foundational courses, while emphasizing GPM for final-year students nearing program completion.
The GMF bridges the gap between detailed model outputs and actionable course recommendations. By combining advanced aggregation techniques, enforcing essential constraints, and presenting enriched outputs, it ensures that students receive personalized, feasible, and academically aligned recommendations. Algorithm 1 outlines the core steps of this process, detailing how predictions from SPM, CFSM, PFM, GPM, and RLM are aggregated, constraints applied, and a ranked list of personalized course recommendations is generated.
Algorithm 1: Global meta-function for course recommendation
Input: student_id: Unique identifier of the student
   course_data: Dataset containing course information
   models: {success_model, fit_model, prerequisite_model, graduation_model, load_model}
       Pretrained local models for success probability, course fit score, prerequisite fulfillment, graduation priority, and workload optimization
   weights: {w_success, w_fit, w_priority}
       Default weights for meta-function components
   max_credits: Maximum allowable workload per semester (default = 18 credits)
Output: Ranked list of recommended courses for the student
  1: Initialize: completed_courses ← GetCompletedCourses(course_data, student_id)
       eligible_courses ← Filter(course_data, course_id NOT IN completed_courses)
  2: Predict Local Model Outputs:
       For each course in eligible_courses:
         success_probability ← Predict(success_model, course_features)
         course_fit_score ← Predict(fit_model, course_features)
         prerequisite_flag ← Predict(prerequisite_model, course_features)
         graduation_priority ← Predict(graduation_model, course_features)
         recommended_load ← Predict(load_model, course_features)
  3: Compute Meta-Function Scores:
       For each course in eligible_courses:
         final_score ← w_success × success_probability + w_fit × course_fit_score + w_priority × graduation_priority
  4: Apply Constraints:
       valid_courses ← Filter(eligible_courses,
              prerequisite_flag == 1 AND recommended_load ≤ max_credits)
  5: Rank and Select:
       ranked_courses ← Sort(valid_courses, by = final_score, descending = True)
            recommendations ← SelectTopN(ranked_courses, N = 10)
  6: Enrich Recommendations:
       For each course in recommendations:
        Add additional metadata:
          {course_name, course_level, prerequisites, credit_type, mode_of_instruction}
  7: Return: recommendations
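A compact Python rendering of Algorithm 1, under the same assumptions as the earlier sketches (trained local models stored in a `models` dictionary and illustrative column names), might look like this:

```python
def recommend(student_id, course_data, models, weights=(0.4, 0.3, 0.3),
              max_credits=18, top_n=10):
    """Hedged sketch of the global meta-function (Algorithm 1)."""
    w_success, w_fit, w_priority = weights

    # Step 1: exclude courses the student has already completed.
    mine = course_data["student_id"] == student_id
    completed = set(course_data.loc[mine & (course_data["completed"] == 1),
                                    "course_id"])
    eligible = course_data[~course_data["course_id"].isin(completed)].copy()
    feats = eligible[input_feature_cols]

    # Steps 2-3: local model predictions and the weighted meta-function score.
    eligible["final_score"] = (w_success * models["spm"].predict(feats)
                               + w_fit * models["cfsm"].predict(feats)
                               + w_priority * models["gpm"].predict(feats))
    eligible["prereq_ok"] = models["pfm"].predict(feats)
    eligible["rec_load"] = models["rlm"].predict(feats)

    # Step 4: enforce prerequisite and workload constraints.
    valid = eligible[(eligible["prereq_ok"] == 1) &
                     (eligible["rec_load"] <= max_credits)]

    # Steps 5-6: rank, truncate, and return enriched course metadata.
    cols = ["course_id", "course_name", "course_level", "prerequisites",
            "credit_type", "mode_of_instruction", "final_score"]
    return valid.sort_values("final_score", ascending=False).head(top_n)[cols]
```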

4. Performance Evaluation

The experiments were conducted on Jupyter Notebook (version 7.2.2), a local web-based interactive coding environment commonly used for ML model development and visualization, and the application was run on a system with an 8-core CPU, 32.77 GB RAM, and 8 GB VRAM. Python (version 3.12.8) was used as the programming language for implementing the model. The experimental details are outlined below.

4.1. Experiment Setup

The evaluation of the PCRS was conducted through a multi-stage experimental process designed to reflect the hierarchical structure of the system, which consists of local models and a global meta-function. This process systematically validates each predictive component independently before assessing their combined impact in the global model, ensuring robustness at both the local and global levels.
The experiments were conducted using DF4, the synthetic dataset described in Section 3.2, to ensure privacy-compliant validation in realistic academic scenarios. This dataset enabled the system to generate data-driven, personalized course recommendations across 101,330 records of student, course, and semester-specific attributes.

4.2. Results Analysis of the Local Models

This section presents the evaluation of the five local models used in the course recommendation framework: SPM, CFSM, PFM, GPM, and RLM. Each model was tested with the most relevant features based on correlation analysis and domain knowledge to determine the most effective configuration in terms of predictive accuracy, computational efficiency, and generalizability. The performance of each model was assessed using appropriate metrics, including mean absolute error (MAE), root mean squared error (RMSE), accuracy, precision, recall, F1-score, and AUC-ROC, supported by cross-validation, residual analysis, and learning curve assessments.
For SPM, multiple algorithms were tested to identify the most effective prediction method. Specifically, GBR, Random Forest Regressor (RFR), Linear Regression (LR), XGBoost (XGB), LightGBM, and CatBoost (CB) were evaluated. Among these, GBR with optimized hyperparameters significantly outperformed the other models, achieving the lowest MAE of 0.006522 and RMSE of 0.009565. Detailed comparisons are provided in Table 5 and visualized in Figure 3 and Figure 4, respectively.
To further validate SPM under varying data conditions, a robustness test was performed using noisy data, which resulted in a slight decrease in performance but maintained overall effectiveness. Specifically, GBR with noisy data achieved an MAE of 0.027954 and an RMSE of 0.040952, confirming that the model retains strong generalization capacity even when exposed to potential real-world data inconsistencies.
To strengthen this evaluation and mitigate overfitting risks, 10-fold cross-validation was conducted on the optimized GBR model. The results showed consistent performance across all folds, with a mean RMSE of 0.009703 ± 0.000314, demonstrating that the model is well-trained, stable, and effective in predicting success probability.
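The 10-fold cross-validation reported here follows a standard scikit-learn pattern. A minimal sketch is shown below, assuming the feature matrix and targets from the earlier sketches; the tuned hyperparameters are omitted where the text does not state them.

```python
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

# 10-fold CV of the SPM regressor; hyperparameters here are assumptions.
gbr = GradientBoostingRegressor()
scores = cross_val_score(gbr, df[input_feature_cols], df["p_success"],
                         cv=10, scoring="neg_root_mean_squared_error")
rmse = -scores
print(f"RMSE: {rmse.mean():.6f} +/- {rmse.std():.6f}")
```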
Residual analysis further confirms that SPM is well-calibrated, with residuals centered around 0 and normally distributed, as shown in Figure 5, indicating no systematic bias. The residuals vs. predicted success probability plot in Figure 6 displays a random scatter pattern, verifying that the model maintains stable predictive performance across different success probabilities, although slightly higher variance is observed at extreme values. The learning curve in Figure 7 supports these findings, showing low overfitting and optimal generalization, with training and validation RMSE values converging as the dataset size increases.
While the model performs consistently well, minor refinements, such as further hyperparameter tuning or feature adjustments, could help reduce variance in extreme predictions and address residual outliers. Overall, SPM demonstrates high reliability, strong predictive accuracy, and robustness, making it a key component of the course recommendation framework.
The CFSM was evaluated using LR, RFR, Support Vector Regressor (SVR), XGB, LightGBM, GBR, and CB. Among these, GBR once again demonstrated superior predictive accuracy, achieving the lowest RMSE of 0.011713 and MAE of 0.009170, outperforming all other configurations, as summarized in Table 6.
To ensure the robustness of the GBR-based CFSM, 10-fold cross-validation was performed, yielding a mean RMSE of 0.012256 ± 0.000376, indicating stable performance across different data splits. Additionally, a robustness test with noisy data resulted in an RMSE of 0.016042, confirming the model’s generalization capacity under slight perturbations.
A computational efficiency analysis (Table 7) revealed that while XGB and LightGBM achieved faster training times (1.18 s and 1.09 s, respectively) compared to GBR (67.35 s), GBR consistently provided the best accuracy and robustness, making it the preferred choice.
Residual analysis was conducted to validate prediction reliability and generalization performance. The residuals vs. predicted values plot in Figure 8 confirms homoscedasticity, with residuals randomly distributed around 0, indicating that the model does not systematically favor specific score ranges. Additionally, residual variance remains consistent across predicted values, further supporting the model’s stable performance.
The learning curve shown in Figure 9 demonstrates that as training data size increases, validation error steadily decreases, confirming that the model effectively generalizes to unseen data. While minor fluctuations in validation error are observed at larger dataset sizes, the narrowing gap between training and validation errors suggests that the model does not suffer from significant overfitting.
Based on this comprehensive evaluation—considering predictive accuracy, cross-validation stability, computational efficiency, and robustness testing—the GBR model with optimized hyperparameters (n_estimators = 200, max_depth = 6, learning_rate = 0.05) was selected as the final CFSM.
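For reference, the stated configuration maps directly onto the scikit-learn constructor:

```python
from sklearn.ensemble import GradientBoostingRegressor

# Final CFSM configuration, using the hyperparameters reported above.
cfsm = GradientBoostingRegressor(n_estimators=200, max_depth=6,
                                 learning_rate=0.05)
```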
The PFM was evaluated using multiple classification algorithms, including Decision Tree (DT), Random Forest Classifier (RFC), XGB Classifier, CBC, SVM, and Logistic Regression. Among these, CBC achieved the highest AUC-ROC score of 0.998556, slightly outperforming XGB (0.998546) and the other models, as summarized in Table 7 and visualized in Figure 10.
Cross-validation further confirmed the stability of CBC, with a 5-fold cross-validation AUC of 0.998345, demonstrating consistent performance across data splits. Additionally, a robustness test with noisy data showed that CBC maintained an accuracy of 99.35% and an AUC-ROC of 0.9979, confirming the model’s strong generalization capability under slight data perturbations. In comparison, XGB showed slightly lower robustness, with its accuracy dropping to 85.85% and AUC-ROC reducing to 0.8878 under noisy conditions.
The ROC curve shown in Figure 11 highlights the near-perfect AUC score of 0.9986 with CBC, indicating that the model effectively distinguishes between students who have met prerequisite requirements and those who have not. The confusion matrix in Figure 12 shows minimal false negatives, ensuring that students who have fulfilled prerequisites are classified accurately.
Based on predictive accuracy, cross-validation stability, and robustness testing, the CBC model was selected as the final PFM. With its high precision, superior AUC-ROC score, and resilience to noisy data, it provides reliable classification performance and is well-suited for integration into the course recommendation framework.
The GPM was evaluated using multiple classification algorithms, including Logistic Regression, RFC, Gradient Boosting Classifier (GBC), SVM, K-Nearest Neighbors (KNN), and LightGBM. Among these, LightGBM was selected as the final model, achieving perfect classification performance with 100% accuracy. Other models, such as Logistic Regression, RFC, and GBC, also achieved 100% accuracy. However, LightGBM demonstrated superior computational efficiency, faster training times, and better robustness under noisy conditions, making it the optimal choice. Although SVM and KNN performed well, their accuracy was slightly lower, with SVM achieving 99.99% accuracy and KNN scoring 99.95%. KNN, in particular, exhibited a marginal decrease in recall, indicating occasional misclassification of priority students. Despite this, all models displayed strong classification performance, with minimal variance across different training sets.
While multiple models achieved near-perfect scores, LightGBM was chosen due to its efficiency, robustness to noisy data, and ability to scale with larger datasets. Compared to other tree-based models such as RFC and GBC, LightGBM is specifically designed to optimize training speed and memory usage, making it highly efficient for large-scale classification tasks [26]. Its leaf-wise tree growth strategy allows for better feature utilization and faster convergence than traditional boosting methods, reducing computational overhead while maintaining high accuracy. Moreover, benchmarking studies have demonstrated that LightGBM consistently outperforms other ensemble models in large-scale classification tasks, particularly when speed and scalability are critical factors [27]. When tested under noisy conditions, LightGBM maintained an accuracy of 99.83%, confirming its stability and generalization capabilities. In contrast, other models, particularly XGB and KNN, showed slight performance drops under noisy conditions, reinforcing the decision to select LightGBM for its superior resilience.
To confirm the model’s reliability, validation techniques such as cross-validation, confusion matrix analysis, and learning curve assessment were applied. The confusion matrix in Figure 13 shows that LightGBM perfectly classified all 18,075 non-priority and 2191 priority students, with no false positives or negatives, ensuring unbiased predictions.
To ensure stability, 10-fold cross-validation was conducted, where the model consistently maintained near-perfect accuracy across all folds. The minimal variance observed across different training splits indicates that LightGBM is not overly sensitive to specific data distributions and retains its predictive accuracy across different subsets of data. This consistency makes it a highly dependable model for real-world implementation.
Based on classification accuracy, computational efficiency, robustness to noisy data, and stability across validation techniques, LightGBM was selected as the final GPM. Its fast training speed, low memory usage, and ability to maintain perfect classification under varying conditions make it an ideal choice for deployment in a course recommendation framework. LightGBM’s efficiency and scalability, particularly in classification tasks requiring high accuracy with low computational cost, further reinforce its selection as the final model for GPM, as supported by prior studies [25,27].
Lastly, the RLM was evaluated using multiple regression algorithms, with GBR achieving the best performance and the lowest error rates (MAE = 0.003403, RMSE = 0.005406). While CB, LightGBM, and XGB also performed well, GBR consistently outperformed them across all evaluation metrics. Traditional models such as RFR, SVR, and DT exhibited significantly higher errors, making them less suitable for workload prediction. Linear Regression performed the worst, with much higher MAE and RMSE values, as summarized in Table 8 and visualized in Figure 14.
GBR was chosen for its superior accuracy, stability, and robustness. A 5-fold cross-validation confirmed its low variance across training splits, ensuring consistency and reliability. Additionally, robustness testing under noisy data conditions demonstrated that GBR maintained lower MAE and RMSE compared to other models, highlighting its resilience in real-world applications.
A residual vs. predicted values plot shown in Figure 15 confirmed that the model’s errors were evenly distributed around 0, with no signs of systematic bias or heteroscedasticity. The learning curve shown in Figure 16 demonstrates that training and validation errors closely converged, confirming that GBR effectively generalizes without overfitting. Unlike more complex models, which might show fluctuations in validation performance, the learning curve showed smooth convergence, ensuring model stability even as training data size increases.
A 5-fold cross-validation confirmed that GBR consistently achieved the lowest error rates, with MAE and RMSE values remaining stable across all folds. This indicates that the model does not suffer from sensitivity to training split variations, making it highly dependable for real-world deployment. Given its exceptional predictive performance, stability across validation folds, and robustness to noisy data, GBR was selected as the final RLM. Its ability to maintain near-perfect accuracy across varying conditions makes it the optimal choice for workload recommendations in course selection systems. A summary of local model configurations and performance metrics is shown in Table 9.

4.3. Results Analysis of the Global Model

The GMF synthesizes predictions from the local models to generate personalized course recommendations. A meta-function ranks courses based on success probability, course fit, and graduation priority, with weights set as $w_1$ = 0.4, $w_2$ = 0.3, and $w_3$ = 0.3. Constraints such as prerequisite fulfillment and workload limits (max 18 credits per semester) ensure practical recommendations. The framework was evaluated using standard recommendation metrics, achieving a precision of 1.00, ensuring all recommended courses in the top k = 5 were relevant. Recall (1.00) confirmed all relevant courses were captured, while the F1-score (1.00) balanced both. Ranking metrics such as normalized discounted cumulative gain (NDCG) and mean reciprocal rank (MRR) (1.00) validated the prioritization of relevant courses, and a hit rate of 1.00 ensured at least one relevant course was always recommended. These metrics are widely used to assess recommender system accuracy, ranking quality, and user engagement [26].
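For clarity, the top-k metrics reported above can be computed with a few lines of code. The helper functions below are illustrative implementations, where `recommended` is the ranked list of course IDs produced by the GMF and `relevant` is the set of ground-truth relevant courses.

```python
import numpy as np

def precision_at_k(recommended, relevant, k=5):
    """Fraction of the top-k recommendations that are relevant."""
    return sum(c in relevant for c in recommended[:k]) / k

def ndcg_at_k(recommended, relevant, k=5):
    """Normalized discounted cumulative gain with binary relevance."""
    dcg = sum(1 / np.log2(i + 2)
              for i, c in enumerate(recommended[:k]) if c in relevant)
    ideal = sum(1 / np.log2(i + 2) for i in range(min(k, len(relevant))))
    return dcg / ideal if ideal else 0.0

def mrr(recommended, relevant):
    """Reciprocal rank of the first relevant recommendation."""
    for i, c in enumerate(recommended):
        if c in relevant:
            return 1 / (i + 1)
    return 0.0
```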
The adaptability of the global framework is demonstrated by its ability to recommend courses across diverse academic contexts. For instance, Table 10 presents the top five recommendations for an Economics student, featuring a mix of introductory, intermediate, and upper-division courses spanning categories such as application, theory, and research. Similarly, Table 11 showcases recommendations for a Psychology student, reflecting the system’s capacity to prioritize a variety of course types while maintaining alignment with constraints and objectives.
While the framework achieved optimal evaluation metrics, real-world validation through student feedback or advisor reviews remains necessary to confirm its effectiveness in practical settings. Furthermore, the current static weight configuration may limit adaptability to dynamic student preferences. Future enhancements will focus on introducing dynamic weight adjustments and incorporating additional features, such as elective preferences and cross-departmental courses, to expand the system’s scope and relevance. The results highlight the robustness of the global framework in generating high-quality, personalized recommendations. By integrating predictions from local models and ranking courses using a robust meta-function, the system ensures alignment with both academic and institutional priorities. The diversity and feasibility of recommendations further demonstrate the framework’s potential to enhance academic advising and streamline the course selection process.

4.4. Discussion

By decomposing the advising task into multiple predictive dimensions and integrating them through a rule-based meta-function, the system offers a practical and interpretable solution to course planning, academic progression, and institutional policy alignment. Compared to existing systems that prioritize either predictive accuracy or personalization in isolation, this framework emphasizes a holistic approach. The integration of models addressing success probability, course alignment, prerequisite eligibility, graduation urgency, and workload estimation supports multi-faceted decision-making. This separation of logic across independent models increases transparency, facilitates debugging and refinement, and enables institutions to adjust or extend components according to their specific advising policies.
The design also supports scalability and adaptability. Though developed for traditional academic environments, the modular architecture could be extended to structured learning scenarios such as online education platforms, certification programs, or corporate training paths. By redefining input features and constraint logic, the same principles can be applied to sequence skill-based content or optimize individualized learning trajectories in other domains.
While the system achieved consistently high performance across predictive tasks and global recommendation metrics, several limitations remain. The current integration strategy relies on fixed, manually assigned weights in the meta-function. Although this preserves interpretability, it may not capture individual or institutional preferences that vary over time. While this study implemented a weighted meta-function for integrating model outputs, other hybridization mechanisms—such as cascade or intersection-based integration—were not explored. Future research could investigate these alternative strategies to assess their impact on recommendation diversity, constraint satisfaction, and user relevance.
The decision to include five sub-models reflects core advising considerations observed in academic policy: graduation timing, course prerequisites, workload limits, and student–course alignment. GPM was modeled as a binary classification task due to variations in student timelines and program structures, which are not easily captured by static rules. Each model’s contribution was justified by distinct targets and strong empirical performance. Across regression tasks, RF models consistently underperformed compared to gradient boosting (GBR, XGB), likely due to GBR’s ability to capture complex interactions through sequential learning, especially in structured, low-noise synthetic environments. While all models were tuned, boosting models consistently showed lower error rates. Additionally, the use of synthetic data—though carefully structured to mirror real academic scenarios—limits the generalizability of findings. Real-world validation will be essential to confirm performance, user acceptance, and system trustworthiness. Feedback from academic advisors and students could further improve feature design and output presentation. Moreover, the current system returns a single recommendation list per student per term. Generating multiple valid course sets could offer more flexibility, particularly for students with overlapping interests or uncertain schedules.
Future work will explore the integration of student and advisor feedback, expansion of feature sets to include elective preferences, interdepartmental constraints, and financial aid considerations. Additionally, dynamic weighting strategies and alternative meta-function architectures may enhance the system’s adaptability and personalization in real advising contexts.

5. Conclusions

This study bridges the gap between traditional academic advising and modern data-driven solutions by introducing a hierarchical multi-model framework for personalized course recommendations. By integrating five predictive models through a meta-function, the system generates actionable, institutionally aligned recommendations that optimize student success, workload balance, and timely graduation. The framework’s domain-driven feature engineering enhances interpretability, while synthetic data enable privacy-compliant experimentation across diverse academic settings. Performance evaluations demonstrated high accuracy and robustness across all models. Specifically, SPM and CFSM achieved MAE scores of 0.006522 and 0.009170, respectively. PFM and GPM attained classification accuracies of 99.40% and 100%, while RLM achieved an MAE of 0.003403, indicating highly precise workload estimation. The global recommendation framework further achieved perfect scores across all ranking metrics (precision, recall, F1-score, NDCG, MRR, and hit rate—all at 1.00), confirming that the top recommended courses were consistently relevant, well-ranked, and aligned with institutional constraints. Overall, the framework offers a modular, extensible foundation for academic advising systems, balancing predictive precision with institutional realism; its constraint-aware design and strong empirical performance position it as a meaningful step toward more personalized, scalable, and policy-aligned learning support tools in higher education.

Author Contributions

Conceptualization, M.S.I. and A.S.M.S.H.; methodology, M.S.I.; software, M.S.I.; validation, M.S.I.; formal analysis, M.S.I.; investigation, A.S.M.S.H.; resources, M.S.I.; data curation, M.S.I.; writing—original draft preparation, M.S.I.; writing—review and editing, M.S.I. and A.S.M.S.H.; visualization, M.S.I. and A.S.M.S.H.; supervision, A.S.M.S.H.; project administration, A.S.M.S.H.; funding acquisition, A.S.M.S.H. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Woosong University Academic Research Fund, 2025, South Korea.

Data Availability Statement

Data are available upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Khan, M.A.Z.; Polyzou, A. Session-based Methods for Course Recommendation. J. Educ. Data Min. 2024, 16, 164–196.
2. Hu, H.; Pan, L.; Ran, Y.; Kan, M.-Y. Modeling and leveraging prerequisite context in recommendation. In Proceedings of the Workshop on Context-Aware Recommender Systems (CARS@RecSys’22), Seattle, WA, USA, 18–23 September 2022; pp. 1–13.
3. Xu, J.; Xing, T.; van der Schaar, M. Personalized course sequence recommendations. IEEE Trans. Signal Process. 2016, 64, 5340–5352.
4. Tilahun, L.A.; Sekeroglu, B. An intelligent and personalized course advising model for higher educational institutes. SN Appl. Sci. 2020, 2, 1635.
5. Ma, Y.; Ouyang, R.; Long, X.; Gao, Z.; Lai, T.; Fan, C. DORIS: Personalized course recommendation system based on deep learning. PLoS ONE 2023, 18, e0284687.
6. Ren, X.; Yang, W.; Jiang, X.; Jin, G.; Yu, Y. A deep learning framework for multimodal course recommendation based on LSTM+Attention. Sustainability 2022, 14, 2907.
7. Maphosa, M.; Doorsamy, W.; Paul, B. Improving academic advising in engineering education with machine learning using a real-world dataset. Algorithms 2024, 17, 85.
8. Vie, J.-J.; Rigaux, T.; Minn, S. Privacy-Preserving Synthetic Educational Data Generation. In Educating for a New Future: Making Sense of Technology-Enhanced Learning Adoption, Proceedings of the 17th European Conference on Technology Enhanced Learning, EC-TEL 2022, Toulouse, France, 12–16 September 2022; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2022; Volume 13450.
9. Liu, Q.; Shakya, R.; Jovanovic, J.; Khalil, M.; de la Hoz-Ruiz, J. Ensuring privacy through synthetic data generation in education. Br. J. Educ. Technol. 2025, 56, 1053–1073.
10. Pereira, M.; Kshirsagar, M.; Mukherjee, S.; Dodhia, R.; Ferres, J.L.; de Sousa, R. Assessment of differentially private synthetic data for utility and fairness in end-to-end machine learning pipelines for tabular data. PLoS ONE 2024, 19, e0297271.
11. Torfi, A.; Fox, E.A.; Reddy, C.K. Differentially Private Synthetic Medical Data Generation using Convolutional GANs. Inf. Sci. 2022, 586, 485–500.
12. Hassan, R.H.; Hassan, M.T.; Sameem, M.S.I.; Rafique, M.A. Personality-Aware Course Recommender System Using Deep Learning for Technical and Vocational Education and Training. Information 2024, 15, 803.
13. Li, D.; Zhang, F.; Liu, C.; Chen, Y. Constrained Multi-objective Bayesian Optimization through Optimistic Constraints Estimation. In Proceedings of the 28th International Conference on Artificial Intelligence and Statistics (AISTATS), PMLR, Mai Khao, Thailand, 3–5 May 2025; Volume 258.
14. San, K.K.; Win, H.H.; Chaw, K.E.E. Enhancing Hybrid Course Recommendation with Weighted Voting Ensemble Learning. J. Future Artif. Intell. Technol. 2025, 1, 338–347.
15. Gil, N.M.; Patel, D.; Reddy, C.; Ganapavarapu, G.; Vaculin, R.; Kalagnanam, J. Identifying Homogeneous and Interpretable Groups for Conformal Prediction. In Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence, Barcelona, Spain, 15–19 July 2024.
16. Iatrellis, O.; Samaras, N.; Kokkinos, K.; Panagiotakopoulos, T. Leveraging generative AI for sustainable academic advising: Enhancing educational practices through AI-driven recommendations. Sustainability 2024, 16, 7829.
17. Cheng, X. A Comprehensive Study of Feature Selection Techniques in Machine Learning Models. Insights Comput. Signals Syst. 2024, 1, 65–78.
18. Bobadilla, J.; Gutiérrez, A. Generating and Testing Synthetic Datasets for Recommender Systems to Improve Fairness in Collaborative Filtering Research. In Proceedings of the 2023 20th ACS/IEEE International Conference on Computer Systems and Applications (AICCSA), Giza, Egypt, 4–7 September 2023; pp. 1–6.
19. de Wilde, P.; Arora, P.; Buarque, F.; Chin, Y.C.; Thinyane, M.; Stinckwich, S.; Fournier-Tombs, E.; Marwala, T. Recommendations on the Use of Synthetic Data to Train AI Models; United Nations University: Tokyo, Japan, 2024.
20. Jordan, M.I.; Jacobs, R.A. Hierarchical mixtures of experts and the EM algorithm. Neural Comput. 1994, 6, 181–214.
21. Vinutha, H.P.; Poornima, B.; Sagar, B. Detection of outliers using interquartile range technique from intrusion dataset. In Information and Decision Sciences, Proceedings of the 6th International Conference on FICTA, Bhubaneswar, India, 14–16 October 2017; Springer: Singapore, 2018; pp. 511–521.
22. Heaton, J. An Empirical Analysis of Feature Engineering for Predictive Modeling. In Proceedings of the SoutheastCon 2016, Norfolk, VA, USA, 30 March–3 April 2016; pp. 1–6.
23. Aslam, M.A.; Murtaza, F.; Haq, M.E.U.; Yasin, A.; Azam, M.A. A Human-Centered Approach to Academic Performance Prediction Using Personality Factors in Educational AI. Information 2024, 15, 777.
24. Prokhorenkova, L.; Gusev, G.; Vorobev, A.; Dorogush, A.V.; Gulin, A. CatBoost: Unbiased Boosting with Categorical Features. In Proceedings of the Advances in Neural Information Processing Systems 31 (NeurIPS 2018), Montréal, QC, Canada, 3–8 December 2018.
25. Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T. LightGBM: A Highly Efficient Gradient Boosting Decision Tree. In Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017.
26. Ricci, F.; Rokach, L.; Shapira, B. Recommender Systems Handbook; Springer: New York, NY, USA, 2011.
27. Bentéjac, C.; Csörgő, A.; Martínez-Muñoz, G. A comparative analysis of gradient boosting algorithms. Artif. Intell. Rev. 2021, 54, 1937–1967.
Figure 1. An overview architecture of the proposed Personalized Course Recommendation System (PCRS).
Figure 2. Derivation of the final dataset (DF4).
Figure 3. Mean absolute error (MAE) comparison for the Success Probability Model (SPM).
Figure 4. Root mean squared error (RMSE) comparison for the Success Probability Model (SPM).
Figure 5. Residual distribution of the Success Probability Model (SPM). The red dotted line indicates zero residuals, and the blue solid line represents the kernel density estimation (KDE).
Figure 6. Residuals vs. predicted values for the Success Probability Model (SPM). The red dotted line represents the zero-residual reference line and blue dots represent individual residual values.
Figure 7. Learning curve of the Success Probability Model (SPM).
Figure 8. Residuals vs. predicted values for the Course Fit Score Model (CFSM). The red dotted line represents the zero-residual reference line and blue dots represent individual residual values.
Figure 9. Learning curve of the Course Fit Score Model (CFSM).
Figure 10. Model performance comparison for the Prerequisite Fulfillment Model (PFM).
Figure 11. Receiver Operating Characteristic (ROC) curve of the Prerequisite Fulfillment Model (PFM).
Figure 12. Confusion matrix of the Prerequisite Fulfillment Model (PFM).
Figure 13. Confusion matrix of the Graduation Priority Model (GPM).
Figure 14. Mean absolute error (MAE) comparison of the Recommended Load Model (RLM).
Figure 15. Residuals vs. predicted values for the Recommended Load Model (RLM). The red dotted line represents the zero-residual reference line and blue dots represent individual residual values.
Figure 16. Learning curve of the Recommended Load Model (RLM).
Table 1. Summary of the related works.

Ref. | Methodology | Key Contributions | Limitations
[5] | Deep Learning (DeepFM) | Improves prediction accuracy (AUC = 0.969), mitigates cold-start issues | Lacks structured academic constraints (prerequisites, workload balance, graduation planning)
[6] | Hybrid (Collaborative + Content-Based Filtering) | Enhances recommendations in sparse data settings | Requires additional domain-specific adjustments for structured academic environments
[12] | Personality-Aware Recommender (Deep Learning) | Personalized recommendations based on individual traits | Does not enforce prerequisite fulfillment or workload balancing
[13] | Constrained Multi-Objective Bayesian Optimization | Balances competing objectives under constraints using optimistic estimation and UCB | Not designed for recommendation systems
[14] | Ensemble Learning (Weighted Voting: RF, Naïve Bayes, SVM) | Improves robustness and accuracy in course recommendation (ARHR: 0.333, NDCG: 1.0) | Lacks integration of academic constraints like prerequisites, workload balance, and graduation planning
[15] | Ensemble Learning for Interpretability | Enhances interpretability in predictive models using conformal prediction | Does not address structured multi-model academic advising
[16] | AI-Driven Feature Engineering | Enhances transparency and user trust in AI-powered advising | Does not consider academic-specific constraints such as workload balance and prerequisite satisfaction
[17] | Feature Selection in ML Models | Improves explainability in AI-driven decision systems | Does not explore structured academic advising constraints
[18] | Synthetic Data for Recommender Systems | Demonstrates that synthetic datasets can match real-world data in recommendation tasks | No specific application for academic advising
[19] | Policy Research on Synthetic Data | Provides guidelines on synthetic data use for AI models in privacy-sensitive domains | Does not address multi-model integration for course advising
Table 2. Samples of the Courses dataset.

Feature Name | Data Type | Sample Entry
course_id | int | 329
department | object | Mathematics
credits | int | 3
credit_type | object | GE
prerequisites | list | [269, 254]
corequisites | list | [560]
course_level | object | Introductory
course_schedule | dict | {‘MWF’: ‘2–3 PM’}
semester_offered | list | [‘Fall’]
mode_of_instruction | object | Online
historical_pass_rate | float | 0.70
course_category | object | Theory
workload_level | object | Medium
popularity_score | float | 0.993988
graduation_rate_impact | float | 0.95
course_name | object | MATH 329
course_code | object | MATH193
prerequisites_count | int | 2
corequisites_count | int | 1
skill_tags_count | int | 2
Table 3. Samples of the Student dataset.

Feature Name | Data Type | Sample Entry
student_id | int64 | 77672505
major | object | Biological Sciences
minor | object | Mathematics
current_gpa | float64 | 3.32
completed_courses | object | [353, 379, 103, 188, 357, 111, 363, 184, 182]
completed_credits_by_type | object | {‘MR’: 36, ‘ME’: 24, ‘GR’: 13, ‘GE’: 9, ‘Special’: 10}
remaining_credits_by_type | object | {‘MR’: 31, ‘ME’: 7, ‘GR’: 2, ‘GE’: 10, ‘Special’: 12}
total_required_credits | int64 | 120
preferred_class_days | object | MWF
preferred_class_times | object | Morning
semester_availability | object | [‘Fall’, ‘Spring’]
learning_style | object | Reading/Writing
has_academic_advisor | int64 | 1
transfer_credits | int64 | 3
course_re_enrollment | int64 | 0
program_start_date | object | 15-08-2022
current_course_load | int64 | 2
career_goals_tags | object | Entrepreneurship
extracurricular_commitments | object | Full-time Job
Table 4. Samples of the Student–Course Interaction dataset.

Feature Name | Data Type | Sample Entry
student_id | int | 77672505
course_id | int | 452
enrollment_status | int | 1
grade | float | 2.70
completion_status | int | 0
feedback_score_x (Student) | float | 3.500000
satisfaction_level | int | 1
engagement_level | object | Low
grade_exam | float | 2.24
grade_project | float | 3.65
grade_participation | float | 2.48
time_to_complete | int | 11
engagement_type | object | Attendance
timestamp | datetime | 08-05-2022 04:40
year | int | 2022
Semester | object | Spring
historical_pass_rate | float | 0.78
course_category | object | Application
workload_level | object | Medium
popularity_score | float | 0.150301
graduation_rate_impact | float | 0.67
course_name | object | COMP 452
course_code | object | CS293
prerequisites_count | int | 0
corequisites_count | int | 1
skill_tags_count | int | 1
semesters_left | int | 5
current_semester | int | 5
feedback_score_y (Course) | float | 3.1
engagement_score | float | 6.5
participation_rate | float | 76.5
instructor_rating | float | 3.9
assignments_completed | int | 10
discussion_posts | int | 5
lecture_attendance | int | 12
online_resources_used | int | 2
peer_interactions | int | 15
project_score | float | 65.2
exam_attempts | int | 1
Table 5. Model performance comparison for the Success Probability Model (SPM). Note: “↓” indicates that lower values are preferred for the respective metrics (e.g., MAE and RMSE).

Model | MAE (↓) | RMSE (↓)
GBR (Optimized) | 0.006522 | 0.009565
GBR (Initial) | 0.013726 | 0.018875
LightGBM | 0.013944 | 0.019402
XGB | 0.014076 | 0.019320
CB | 0.019341 | 0.025844
RFR | 0.134149 | 0.175979
LR | 0.143416 | 0.188638
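Comparisons of this form (Tables 5, 6, and 8) follow a standard train/evaluate loop over candidate regressors. The sketch below illustrates the shape of such a loop; it is a minimal example assuming scikit-learn-style estimators, synthetic stand-in data, and default hyperparameters rather than the tuned configurations reported in the tables.

```python
# Minimal sketch of a regression model-comparison loop (MAE/RMSE).
# Stand-in data and default hyperparameters; illustrative only.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split

# Stand-in for the prepared student-course features and regression target.
X, y = make_regression(n_samples=2000, n_features=12, noise=0.01, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

models = {
    "GBR": GradientBoostingRegressor(random_state=42),
    "RFR": RandomForestRegressor(random_state=42),
    "LR": LinearRegression(),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    mae = mean_absolute_error(y_test, pred)
    rmse = np.sqrt(mean_squared_error(y_test, pred))  # RMSE = sqrt(MSE)
    print(f"{name}: MAE={mae:.6f}  RMSE={rmse:.6f}")
```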
Table 6. Model performance comparison for the Course Fit Score Model (CFSM). Note: “↓” indicates that lower values are preferred for the respective metrics (e.g., MAE and RMSE).

Model | MAE (↓) | RMSE (↓)
GBR | 0.009170 | 0.011713
XGB | 0.009811 | 0.012612
LightGBM | 0.012393 | 0.015815
CB | 0.011153 | 0.014404
SVR | 0.042138 | 0.050452
LR | 0.091535 | 0.091535
RFR | 0.122978 | 0.153813
Table 7. Model performance comparison for the Prerequisite Fulfillment Model (PFM).

Model | Accuracy | Precision | Recall | F1-Score | AUC-ROC | Cross-Validation AUC
CB | 99.40% | 100.00% | 98.31% | 99.15% | 0.998556 | 0.998345
XGB | 99.40% | 100.00% | 98.31% | 99.15% | 0.998546 | 0.998369
RFR | 99.40% | 100.00% | 98.31% | 99.15% | 0.998414 | 0.998261
DT | 99.40% | 100.00% | 98.31% | 99.15% | 0.998417 | 0.998249
SVM | 99.40% | 100.00% | 98.31% | 99.15% | 0.996821 | 0.995645
Logistic Regression | 99.40% | 100.00% | 98.31% | 99.15% | 0.997709 | 0.997448
Table 8. Model performance comparison for the Recommended Load Model (RLM). Note: “↓” indicates that lower values are preferred for the respective metrics (e.g., MAE and RMSE).

Model | MAE (↓) | RMSE (↓)
GBR | 0.003403 | 0.005406
CB | 0.005237 | 0.020630
LightGBM | 0.008566 | 0.021444
XGB | 0.008574 | 0.022642
HistGradientBoosting | 0.008716 | 0.022654
Extra Trees | 0.057097 | 0.097678
RFR | 0.060356 | 0.095996
SVR | 0.061143 | 0.069634
DT | 0.076509 | 0.114145
LR | 0.335903 | 0.435449
Table 9. Summary of local model configurations and performance metrics. Note: “↓” indicates that lower values are preferred for the respective metrics (e.g., MAE and RMSE).

Model | Selected Algorithm | MAE (↓) | RMSE (↓) | Accuracy | Precision | Recall | F1-Score | AUC-ROC
SPM | GBR | 0.006522 | 0.009565 | - | - | - | - | -
CFSM | GBR | 0.009170 | 0.011713 | - | - | - | - | -
PFM | CBC | - | - | 99.40% | 100.00% | 98.31% | 99.15% | 0.998556
GPM | LightGBM | - | - | 100.00% | 100.00% | 100.00% | 100.00% | 1.000
RLM | GBR | 0.003403 | 0.005406 | - | - | - | - | -
Table 10. Top five recommended courses for a Psychology student.

Course ID | Final Score | Course Name | Course Level | Course Category | Credit Type
458 | 1.34 | PSYC 458 | Upper-division | Theory | ME
242 | 1.32 | PSYC 242 | Upper-division | Application | GE
499 | 1.18 | PSYC 499 | Intermediate | Research | MR
103 | 1.07 | PSYC 103 | Intermediate | Research | GR
483 | 1.04 | PSYC 483 | Upper-division | Application | GR
Table 11. Top five recommended courses for an Economics student.

Course ID | Final Score | Course Name | Course Level | Course Category | Credit Type
179 | 1.83 | ECON 179 | Upper-division | Application | MR
420 | 1.33 | ECON 420 | Upper-division | Application | GR
193 | 1.33 | ECON 193 | Introductory | Research | GR
159 | 1.30 | ECON 159 | Upper-division | Theory | MR
268 | 1.06 | ECON 268 | Intermediate | Theory | Special
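As a worked illustration of the ranking metrics reported in the Conclusions (NDCG, MRR, and hit rate), the sketch below evaluates a single ranked list under binary relevance, using the Table 10 course IDs as the example. The relevant set shown is an assumption for illustration; this is not the study's exact evaluation harness.

```python
# Minimal sketch of binary-relevance ranking metrics for one student's list.
import math

def ndcg_at_k(recommended, relevant, k=5):
    """NDCG@k: DCG of the hits divided by the ideal DCG."""
    dcg = sum(1.0 / math.log2(i + 2)
              for i, c in enumerate(recommended[:k]) if c in relevant)
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(k, len(relevant))))
    return dcg / ideal if ideal > 0 else 0.0

def mrr(recommended, relevant):
    """Reciprocal rank of the first relevant course (0 if none appears)."""
    for i, c in enumerate(recommended):
        if c in relevant:
            return 1.0 / (i + 1)
    return 0.0

def hit_at_k(recommended, relevant, k=5):
    """1 if at least one relevant course appears in the top-k, else 0."""
    return 1.0 if any(c in relevant for c in recommended[:k]) else 0.0

# Example: the Table 10 recommendations, with an assumed relevant set.
recs = [458, 242, 499, 103, 483]
relevant = {458, 242, 499, 103, 483}
print(ndcg_at_k(recs, relevant), mrr(recs, relevant), hit_at_k(recs, relevant))
# 1.0 1.0 1.0 -- every recommended course is relevant and perfectly ranked
```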