Article

Prediction of Student Academic Performance Utilizing a Multi-Model Fusion Approach in the Realm of Machine Learning

1 School of Information Science and Technology, Yunnan Normal University, Kunming 650500, China
2 Key Laboratory of Education Informatization for Nationalities, Ministry of Education, Yunnan Normal University, Kunming 650500, China
3 Yunnan Key Laboratory of Smart Education, Yunnan Normal University, Kunming 650500, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(7), 3550; https://doi.org/10.3390/app15073550
Submission received: 18 January 2025 / Revised: 9 March 2025 / Accepted: 11 March 2025 / Published: 24 March 2025

Abstract

The digitization of college student management is a crucial approach for training institutions to decrease management costs while enhancing the quality of students' development. In this study, we focused on students majoring in Computer Science at a university and explored their scores in multiple undergraduate courses. Initially, we selected the students' basic and core academic courses based on the training program and identified four groups of course combinations with strong positive correlations through correlation and cluster analysis. This finding helped the university optimize the arrangement and structure of the Computer Science major's course system. Next, we organized the students' overall course performance data sequentially by semester. Multiple machine learning models were utilized to perform regression prediction of student performance and classification prediction of student performance levels. Finally, we integrated multiple machine learning models to create a practical framework for predicting student academic performance, which can be applied in digital student management. The framework can also provide effective decision support for academic early warning and guide students' development.

1. Introduction

Student academic performance prediction is a significant research area within educational big data mining. It serves the purpose of identifying potential issues in students' studies, providing data-driven support for personalized intervention, and plays an increasingly important role in higher education. The prediction of student performance serves as the foundation for academic early warning systems. Its objective is to develop a prediction model by thoroughly mining relevant student information in order to accurately forecast students' future academic achievements. These predictions can include course performance, the risk of course failure, grade point average, and dropout risk [1,2,3]. Each year in colleges and universities, a number of students receive poor grades or drop out after failing course exams, which significantly impacts the quality of higher education and social stability. Therefore, researchers have used a variety of data analysis methods to achieve effective academic alerts. Some researchers focus on exploring the correlation between students' performance and individual factors, as well as learning behavior factors. For instance, Gao [4] discovered a positive correlation between learning motivation and academic performance. Sun et al. [5] found a significant positive effect of students' learning behavior on their academic performance. Marbouti et al. [6] and Conard [7] identified classroom attendance as one of the factors influencing course performance. Wheaton et al. [8] found that sufficient sleep can reduce risk behaviors like absenteeism and improve students' academic performance. Xu et al. [9] revealed the correlation between students' online access behavior and their performance. Moreover, researchers are exploring the correlation between course scores. Philipp [10] analyzed the relationship between course pass rates and final exam pass rates using Pearson correlation analysis and found a significant positive correlation between the two, except for ninth-grade literature. Sticca et al. [11] examined the reliability of self-reported academic performance across four subject areas using a sample of 916 high school students. The results indicated a strong positive correlation between self-reported and actual grades across all subjects and across grades 9–11. Caponera et al. [12] investigated the impact of students' reading literacy on their performance in the TIMSS (Trends in International Mathematics and Science Study) mathematics and science assessments. The study confirmed that reading literacy significantly influences mathematics achievement, while its effect on science performance was less pronounced.
However, most learning performance prediction methods rely on traditional data analysis techniques or on a single machine learning model, resulting in limited accuracy for early warning. In recent years, more researchers have been utilizing prediction methods based on mathematical statistics, rules, neural networks, and sample similarity [13]. Nabil et al. [14] compared the performance of various machine learning models, including DNN, DT, and LR, by applying resampling methods such as SMOTE and random oversampling, along with multiple model validation techniques. They selected the optimal classifier through various evaluation metrics and statistical hypothesis testing. Bujang et al. [15] proposed a combined model based on SMOTE and two feature selection algorithms, which improves the performance of student-level predictions on imbalanced multi-class datasets by automatically determining the optimal sampling ratio. Feng et al. [16] improved the traditional K-means algorithm by objectively determining the number of clusters and combined it with deep learning algorithms for training on unlabeled data, proposing a novel method for predicting students' future academic achievements. Zeineddine et al. [17] leveraged AutoML to enhance the accuracy of predicting student performance using the data available upon entry to academic courses. Brahim [18] designed a predictive model for student academic performance based on statistical feature extraction, evaluating the model's performance using classifiers such as random forests and support vector machines. Qiu et al. [19] proposed a behavior-based e-learning performance prediction framework (BCEP), which improves prediction accuracy through four steps (data cleaning, behavior classification, feature fusion, and model training) and introduced the PBC model to classify e-learning behaviors, thereby enhancing both learning performance prediction effectiveness and model applicability.
It is evident that extensive research has been conducted on predicting students’ academic performance using machine learning technology. However, there is still a lack of research on the correlation of course performance, which is crucial for guiding the course arrangement in computer-related majors at universities. Moreover, due to the varying abilities of different machine learning algorithms in processing different types of data and the varying accuracy of training on different datasets, a single machine learning model may not effectively function in real-world teaching scenarios. To address these issues, this study focuses on practical application scenarios in student education management. It establishes a machine learning framework based on multi-model integration to improve the accuracy of student performance prediction for undergraduate majors. The study begins by conducting a correlation analysis of the scores in 17 specialized courses selected from the student training program. It discovers that these courses can be grouped into four collections, each exhibiting strong positive correlations. Next, multiple machine learning models are trained on this data to complete regression tasks for predicting student performance or classification tasks for predicting student performance ratings. It is observed that different machine learning models exhibit varying abilities to predict accurately for different courses or tasks. Finally, to construct a robust and reliable student performance prediction system, multiple machine learning models are integrated, and the optimal model is selected using accuracy voting. This leads to the development of a well-generalized student performance prediction system.
The main contributions of this study can be summarized as follows:
(1)
Course Correlation Analysis: The study conducts correlation and cluster analyses on 17 specialized courses in the computer major training program. The identification of four sets of highly related courses provides valuable insights for optimizing the teaching arrangement of computer curriculum systems in universities.
(2)
Evaluation of Multiple Machine Learning Algorithms: Through numerous experiments, the study validates the performance of various machine learning algorithms in predicting student performance under different tasks and datasets. The findings highlight the inconsistency in the performance of different models and emphasize the limitations of relying on a single machine learning method for accurate predictions in real-world teaching scenarios.
(3)
Multi-Model Fusion Framework: To address the limitations of a single machine learning model, this study proposes a multi-model fusion framework based on machine learning. The framework utilizes accuracy voting to select the optimal model for each dataset or task, resulting in a more reliable and comprehensive student performance prediction system. This framework offers valuable insights and guidance for university administrators in conducting academic prediction and early warning.
The paper is structured as follows: Section 1 provides the research background; Section 2 analyzes the correlation of specialized courses; Section 3 introduces the candidate machine learning models for student performance prediction; Section 4 presents the experiments and the multi-model fusion framework; Section 5 discusses the results, limitations, and applications; and Section 6 concludes the study and suggests directions for future research.
In this study, “accuracy voting” was chosen as an ensemble method because it effectively utilizes the actual performance of individual models without relying on complex weight settings. Moreover, compared to the commonly used weighted averaging method, accuracy voting provides more consistent prediction results. In contrast to the majority voting method, accuracy voting proves to be more reliable in regression tasks, better handling discrepancies in model predictions. Therefore, accuracy voting serves as an appropriate ensemble strategy in this study, better meeting our needs.

2. Materials and Methods

In this section, we employ extensive data analysis techniques to uncover the correlations among the 17 courses offered in the computer major at a specific university.

2.1. Determination of Analysis Objects and Data Preprocessing

The data acquisition and processing in this paper focus on the final examination results of the 2019 and 2020 undergraduate cohorts from a university's college of information [20]. Data preprocessing plays a crucial role in ensuring the integrity and standardization of the data, and several steps were performed. Firstly, any data samples with missing values, outliers, or repeated values were removed. Additionally, students who had dropped out, suspended their studies, or transferred to other programs were excluded from the analysis, as their final exam score data may be incomplete. The dataset also does not involve students from other disciplines or majors, so its applicability is mainly limited to academic performance in computer science. This careful selection resulted in a final dataset of 292 students' final exam scores, including student number, semester, course number, course name, and test score. Furthermore, it is recognized that different courses may vary in difficulty and scoring criteria, which could bias a direct comparative analysis of the raw course scores. To address this concern, the students' course scores were standardized, transforming them to a common scale to ensure fair comparisons across courses. The two major steps are as follows:
Step 1: Calculate the standard deviation, using the following formula:
\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (X_i - \mu)^2}
where $X_i$ represents the score of the $i$-th student, $\mu$ represents the mean score of all students, and $N$ is the total number of students.
Step 2: Standardize student performance, using the following formula:
X_{\mathrm{std}} = \frac{X_i - \mu}{\sigma}
By completing these steps, we obtained a standardized dataset of 292 computer students’ final scores.
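As a concrete illustration, the following is a minimal sketch of this cleaning and per-course z-score standardization in Python; the file name and column names are hypothetical, not the authors' actual schema.

```python
import pandas as pd

# Hypothetical long-format file: one row per (student, course) exam record.
df = pd.read_csv("final_exam_scores.csv")

# Remove records with missing, duplicated, or out-of-range values.
df = df.dropna().drop_duplicates()
df = df[(df["score"] >= 0) & (df["score"] <= 100)]

# Steps 1 and 2: per-course z-score, X_std = (X_i - mu) / sigma,
# using the population standard deviation (ddof=0) as in the formula above.
df["score_std"] = df.groupby("course_name")["score"].transform(
    lambda s: (s - s.mean()) / s.std(ddof=0)
)
```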
In addition, three-class classification is one of the prediction tasks: we used 70% of the samples for training and applied different methods to prediction, analyzing precision, recall, and F1-score. In the subsequent experiments, students were divided into three categories: fail, pass, and excellent. Although the number of samples in each category of the training dataset was not deliberately balanced, no obvious potential bias was found in the dataset.

2.2. Analytical Method and Process

In this section, we present an analytical examination of the pertinence of the diverse academic courses offered in the curriculum. By considering the temporal alignment of students’ final exam scores, the course list is structured according to the university’s training program, encompassing six semesters. The courses in the first semester encompass Linear Algebra, Introduction to Computers, and Advanced Mathematics 1. Moving on to the second semester, the curriculum comprises Discrete Mathematics, Foundations of Programming, and Advanced Mathematics 2. The third semester introduces courses such as Data Structures, Probability and Statistics, and Object-Oriented Programming. As for the fourth semester, the course load entails Operating Systems, Algorithm Analysis and Design, and Constitution Principles of Computers. Subsequently, in the fifth semester, students engage in courses including Database Principles and Applications, Compiler Principles, and Computer Networks. Finally, in the sixth and final semester, students delve into Python (version 3.11.5) Program Language Design and Web Application Development Technology.
During the analysis process, we exclude the correlation between courses in the same semester and focus solely on the impact of subsequent semester course grades. To make preliminary observations, Figure 1 illustrates the strong correlations between courses. Notable high-degree correlations include: Linear Algebra with Advanced Mathematics 2, Discrete Mathematics with Data Structures, Discrete Mathematics with Probability and Statistics, Discrete Mathematics with Compiler Principles, Data Structures with Compiler Principles, Probability and Statistics with Operating Systems, Probability and Statistics with Compiler Principles, Operating Systems with Compiler Principles, and so on.
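As an illustration, a Pearson correlation matrix over a wide score table (one row per student, one column per course) can surface such strongly correlated pairs; the file name and the 0.5 threshold are assumptions for illustration.

```python
import pandas as pd

# Hypothetical wide-format table: rows = students, columns = course scores.
scores = pd.read_csv("scores_wide.csv")
corr = scores.corr(method="pearson")

# Print each strongly correlated course pair once (threshold illustrative).
for (a, b), r in corr.stack().items():
    if a < b and r >= 0.5:
        print(f"{a} <-> {b}: r = {r:.2f}")
```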
Furthermore, this study incorporates a cluster analysis of courses to group them based on Euclidean distance calculations between different course grades. As depicted in Figure 2, when the Euclidean distance is less than 200, the courses can be classified into four clusters. The first cluster includes Web Application Development Technology, Database Principles and Applications, Computer Networks, and Python. The second cluster comprises Compiler Principles, Operating Systems, Foundations of Programming, and Discrete Mathematics. The third cluster consists of Advanced Mathematics 1, Advanced Mathematics 2, Probability and Statistics, and Data Structures. Lastly, the fourth cluster includes Constitution Principles of Computers, Algorithm Analysis and Design, Object-Oriented Programming, Linear Algebra, and Introduction to Computers.
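A minimal sketch of such a clustering with SciPy, assuming the same wide score table as in the previous sketch; the linkage method is our assumption, while the distance cut of 200 mirrors the threshold quoted for Figure 2.

```python
from scipy.cluster.hierarchy import fcluster, linkage

# Represent each course by its vector of student scores, then group
# courses by the Euclidean distance between these vectors.
course_matrix = scores.T.values
Z = linkage(course_matrix, method="average", metric="euclidean")

# Cut the dendrogram at distance 200, as in Figure 2.
labels = fcluster(Z, t=200, criterion="distance")
for course, cluster_id in zip(scores.columns, labels):
    print(cluster_id, course)
```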
Based on the course correlation heat map analysis (Figure 1) and the clustering results (Figure 2), we identified four course groups with strong relationships, analyzed and selected in accordance with the order in which the courses are offered. The course groups and their relevance are presented in Table 1, Table 2, Table 3 and Table 4 below.
The correlation and cluster analyses discussed above provide valuable insights into the relationship between courses and have multiple benefits.
(1)
Correlation Between Courses: By analyzing correlations, we can identify connections and intersections between different courses. This information helps students understand how various subjects are related and how they can build on each other. Students can leverage this knowledge to see the broader picture and make connections across different courses, enhancing their learning experience.
(2)
Continuity of Courses: Cluster analysis helps identify the continuity of courses within a semester. By organizing related courses together, students can better understand the progression of knowledge and skills. This arrangement allows for a smooth transition between topics, enabling students to build a strong foundation and delve deeper into complex concepts.
(3)
Prerequisites for Courses: Correlation analysis helps identify prerequisites for courses. By understanding the correlations between different courses, educators can establish an appropriate sequence for the curriculum. This ensures that students have the necessary pre-knowledge to understand and succeed in subsequent courses, fostering a comprehensive understanding of the subject matter.
In summary, the correlation and cluster analyses described above provide valuable guidance in designing college courses. The insights gained from these analyses help education departments and teachers optimize curriculum structure and semester planning. Moreover, students can benefit from these analyses by making informed choices about their learning paths, improving learning efficiency, and ultimately enhancing their academic performance.

3. Model Selection

In this section, a thorough analysis was performed to confirm that different machine learning models exhibit varying levels of accuracy when predicting student performance. This finding indicates that relying solely on a single model for performance prediction in real-world teaching scenarios may be imprecise and challenging. Consequently, it is crucial to consider and utilize multiple machine learning models to ensure accurate predictions and their practical applicability in educational settings.

3.1. Methodology Selection and Rationale

This study employs a variety of machine learning models to predict student academic performance, each of which has unique advantages that enable it to excel in different tasks and datasets. The linear regression model [21] is simple in structure, computationally efficient, and highly interpretable, making it suitable for small datasets. Ridge regression [22] effectively prevents overfitting through L2 regularization, improving generalization, and is particularly useful for addressing multicollinearity issues. Lasso regression uses L1 regularization for automatic feature selection [23], making it suitable for high-dimensional data and enhancing model interpretability. Decision tree regression [24] captures nonlinear relationships, is robust to outliers and missing values, and is easy to visualize. Random forests [25], by integrating multiple decision trees, improve predictive accuracy, assess feature importance, and reduce the risk of overfitting. Support vector machines [26,27] perform exceptionally well with high-dimensional and small sample data, offering strong classification performance and robustness, capable of handling both linear and nonlinear data. Gradient boosting trees [28] iteratively build decision trees, improving model performance and handling complex data and nonlinear relationships. K-nearest neighbors (KNN) [29,30] is intuitive, simple, does not require assumptions about data distribution, and is robust to noise and outliers. AdaBoost [31,32] significantly enhances classification performance by adjusting sample weights, with strong adaptability and ease of implementation. Through the multi-model fusion framework, this study fully leverages the strengths of each model, significantly improving prediction accuracy and reliability, providing effective decision support for student education management.

3.2. Regression Prediction

In this paper, several regression prediction algorithms were employed: linear regression, ridge regression, lasso regression, and decision tree regression.
(1) Linear Regression [21]: Linear regression involves finding a line (or hyperplane in a multidimensional space) that minimizes the sum of squared differences between the observed data points and the predicted values. It is advantageous for making accurate predictions with small datasets. However, it is sensitive to data noise and may encounter issues when dealing with large amounts of data or high-dimensional feature spaces. The calculation formula is provided as follows:
L = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} \left( y_i - (\beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}) \right)^2
where $y_i$ is the actual value for the $i$-th observation; $\hat{y}_i$ is the predicted value for the $i$-th observation, computed from the linear model; $x_{i1}, \dots, x_{ip}$ are the values of the $p$ predictors for the $i$-th observation; $\beta_0, \dots, \beta_p$ are the regression coefficients to be estimated; and $n$ is the number of observations. The regression coefficients $\hat{\beta}$ are obtained by minimizing this loss function.
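As an illustration, here is a minimal sketch of fitting such a model with scikit-learn, predicting a later course's score from earlier course scores and reusing the wide score table from the earlier sketches. The feature and target columns are illustrative choices; the 70/30 split mirrors the setup in Section 4.1.

```python
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Illustrative features/target: earlier course scores predict a later course.
X = scores[["Discrete Mathematics", "Foundations of Programming"]]
y = scores["Data Structures"]

# 70/30 train/test split, as in the experiments of Section 4.1.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

model = LinearRegression().fit(X_train, y_train)
print("test MSE:", mean_squared_error(y_test, model.predict(X_test)))
```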
(2) Ridge Regression [22]: Ridge regression addresses the issue of potential overfitting in linear regression by adding a regularization term to the loss function. This regularization term helps improve the model’s ability to generalize and provides some robustness to data noise. One advantage of ridge regression is that it effectively tackles overfitting. However, a potential drawback is that tuning the ridge parameters may be necessary to achieve the best prediction performance. The calculation formula is provided as follows:
L = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^{p} \beta_j^2
The regression coefficients $\hat{\beta}$ are obtained by minimizing this loss function, where $\lambda$ represents the regularization parameter.
(3) Lasso Regression [23]: Lasso regression follows a similar process to ridge regression but uses L1 regularization instead, which can reduce some coefficients to zero, effectively performing feature selection. One advantage of lasso regression is its ability to automatically select the features relevant to the target variable. It addresses the overfitting problem and improves the model's generalization ability. However, a drawback is that tuning the lasso parameter may be necessary to achieve the best prediction, which can be computationally inefficient for large-scale datasets. The calculation formula is provided as follows:
L = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^{p} |\beta_j|
where $y_i$ represents the actual value, $\hat{y}_i$ is the predicted value, $\lambda$ represents the regularization parameter, and $\beta_j$ are the coefficients to be estimated.
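Since both ridge and lasso require tuning the regularization strength, a small grid search is a natural way to pick it. A minimal sketch, reusing the illustrative train/test split from the linear regression sketch (the alpha grid is an assumption):

```python
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import GridSearchCV

param_grid = {"alpha": [0.01, 0.1, 1.0, 10.0]}  # illustrative lambda values

for name, estimator in [("ridge", Ridge()), ("lasso", Lasso())]:
    # 5-fold cross-validated search over the regularization strength.
    search = GridSearchCV(estimator, param_grid, cv=5,
                          scoring="neg_mean_squared_error")
    search.fit(X_train, y_train)
    print(name, "best alpha:", search.best_params_["alpha"])
```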
(4) Decision Tree Regression [24]: In decision tree regression, data are divided into subsets based on a series of rules, and each subset is then predicted using a simple model, typically the average. This segmentation process allows decision tree regression to handle nonlinear relationships well and to be robust to missing values and outliers. However, it is prone to overfitting during training, and noise present in the training dataset can impact the prediction results. Decision tree regression involves recursively partitioning the data and constructing leaf nodes for each subset, resulting in a tree structure. When given a new input sample, the feature values are evaluated starting from the root node and traversing down to a leaf node, whose value is then used as the prediction result. Here is a conceptual representation of the output of a regression tree for an input vector $x$: $\hat{y}(x)$ is the prediction for the input $x$; $n$ is the number of leaves in the tree; $c_m$ is the predicted value for leaf $m$, usually the mean of the target values of the training samples falling into it; $R_m$ represents the region of the input space associated with leaf $m$; and $I$ is an indicator function that outputs 1 if $x$ falls into the region $R_m$, and 0 otherwise.
\hat{y}(x) = \sum_{m=1}^{n} c_m \cdot I(x \in R_m)
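A minimal sketch with scikit-learn, again reusing the illustrative split; the depth cap is an assumption added to limit the overfitting noted above:

```python
from sklearn.tree import DecisionTreeRegressor

# Shallow tree to curb overfitting; the depth is an illustrative choice.
tree = DecisionTreeRegressor(max_depth=4, random_state=42)
tree.fit(X_train, y_train)

# Each prediction is the mean target value c_m of the leaf region R_m
# that the input sample falls into.
print(tree.predict(X_test[:5]))
```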

3.3. Classification Prediction

In this section, we address the issue of predicting student achievement using categorical grades instead of continuous values. This accounts for potential variations in the number of students in different grade categories, which could result in large errors if regression models were used directly for prediction. Furthermore, the use of discrete grade categories makes the model more interpretable and easier to handle. To tackle this categorical prediction task, we conducted experiments using five different algorithms: random forest, support vector machine, gradient boosting tree, K-nearest neighbors, and AdaBoost. These algorithms were applied to predict grades in four distinct courses: Data Structures, Compiler Principles, Algorithm Analysis and Design, and Operating Systems. Let us now provide a brief introduction to these algorithms.
(1) Random Forest [25]: Random forest is an ensemble learning method that utilizes decision trees to improve the model’s generalization capability and reduce the risk of overfitting. Each decision tree is trained independently, and the final prediction is obtained by averaging or taking the majority vote of the predictions from all the decision trees.
The calculation process of random forest is as follows: each decision tree randomly selects a subset of samples from the training data through a process called "bootstrap aggregating" or "bagging". At each split, a random subset of k features (where k is an integer less than the total number of features) is considered. The decision tree is built based on the selected features and recursively divides the data into smaller subsets. These steps are repeated until every decision tree is trained. The calculation formula is as follows:
\hat{y}_{RF} = \frac{1}{T} \sum_{t=1}^{T} \hat{y}_t
where $\hat{y}_{RF}$ is the prediction of the random forest, $T$ is the number of trees, and $\hat{y}_t$ is the prediction of the $t$-th tree.
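A minimal classification sketch, reusing the illustrative split from Section 3.2. The grade labels are a hypothetical categorical encoding of the raw scores; the cut points are assumptions, not the authors' grading thresholds.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical grade labels derived from raw scores (cut points illustrative).
y_train_labels = pd.cut(y_train, bins=[0, 60, 85, 100],
                        labels=["Fail", "Pass", "Excellent"])
y_test_labels = pd.cut(y_test, bins=[0, 60, 85, 100],
                       labels=["Fail", "Pass", "Excellent"])

rf = RandomForestClassifier(n_estimators=100, random_state=42)  # T = 100 trees
rf.fit(X_train, y_train_labels)
# For classification, the ensemble prediction is a majority vote over trees.
print("accuracy:", rf.score(X_test, y_test_labels))
```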
(2) Support Vector Machine (SVM) [26]: SVM is a supervised learning algorithm used for classification. It works by mapping the input data into a high-dimensional feature space and then finding the optimal hyperplane that separates the data points belonging to different classes. The objective of the SVM is to minimize classification errors on the training set while maximizing the margin between the hyperplane and the support vectors.
The process of SVM classification can be outlined in three steps: mapping the data, finding the hyperplane, and classifying based on distance. The SVM uses a kernel function to map the input data from the original space to a higher-dimensional feature space. Then, in the high-dimensional space, the SVM identifies the hyperplane that best separates the data points from different classes. Once the hyperplane is determined, the SVM calculates the distance between each support vector (data point that has the most influence on the location of the hyperplane) and the hyperplane. The sign of this distance determines the classification of a data point. Data points on one side of the hyperplane are assigned to one class, while those on the opposite side are assigned to the other class. The optimization problem can be expressed as follows:
\min_{W} \; \frac{1}{2} W^T W + C \sum_{i=1}^{N} \max(0, 1 - y_i (W^T X_i))
The vector $W$ represents the weights of the model. The term $\frac{1}{2} W^T W$ is a regularization term that penalizes the complexity of the model; it effectively tries to keep the weights small to avoid overfitting. The sum $\sum_{i=1}^{N} \max(0, 1 - y_i (W^T X_i))$ is the sum of the hinge loss over the $N$ samples in the dataset. Each $X_i$ is a feature vector representing the $i$-th sample, and each $y_i$ is the corresponding true label, usually +1 or −1 in binary classification tasks. $C$ is a hyperparameter that controls the trade-off between maximizing the margin and minimizing the classification error: a larger value of $C$ allows the optimization to choose a smaller-margin hyperplane if that hyperplane classifies more training points correctly. The hinge loss function $\max(0, 1 - y_i (W^T X_i))$ is piecewise linear and penalizes predictions that fall on the wrong side of the margin.
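A minimal sketch, reusing the hypothetical grade labels from the random forest sketch; the RBF kernel and the value of C are illustrative choices rather than the authors' reported settings:

```python
from sklearn.svm import SVC

# C trades margin width against hinge-loss penalties, as in the objective
# above; the kernel performs the feature-space mapping.
svm = SVC(kernel="rbf", C=1.0)
svm.fit(X_train, y_train_labels)
print("accuracy:", svm.score(X_test, y_test_labels))
```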
(3) Gradient Boosting Tree (GBT) [28]: Also known as gradient boosting machine, GBT is a machine learning method that combines decision trees to create a strong predictive model. It iteratively builds a series of decision trees by minimizing the loss function, with each tree attempting to correct the errors made by the previous one.
The calculation process of GBT can be summarized in five steps: initializing a decision tree, iterating over the training data, calculating the gradient, building a new decision tree, and updating the model. The process begins by initializing a decision tree as the base model. In each iteration, GBT calculates the gradient of the loss function with respect to the predictions made by the current model. Then, GBT calculates the gradient based on the difference between the actual target values and the predictions made by the current ensemble of trees. Next, based on the gradient, GBT constructs a new decision tree that aims to reduce the errors made by the ensemble of trees constructed so far. Finally, after generating the new decision tree, GBT combines it with the previous models to update the ensemble. The new model is created by adding the predictions of the newly generated tree to the predictions of the existing ensemble, with the objective of minimizing the overall loss function. The calculation formula is as follows:
\hat{y}_{GBT} = \sum_{t=1}^{T} \alpha_t \cdot \hat{y}_t
where $\hat{y}_{GBT}$ is the prediction result of the gradient boosting tree, $T$ is the number of trees, $\alpha_t$ is the weight of the $t$-th tree, and $\hat{y}_t$ is the prediction result of the $t$-th tree. In GBT, the weights $\alpha_t$ are usually fixed or calculated through optimization algorithms such as gradient descent, with the aim of reducing the loss function of the overall model, and are not directly based on the error rate of each weak learner.
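A minimal sketch, again reusing the hypothetical labels; the hyperparameter values are illustrative:

```python
from sklearn.ensemble import GradientBoostingClassifier

# Each of the n_estimators stages fits a new tree to the gradient of the
# loss; learning_rate plays the role of a fixed stage weight.
gbt = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1,
                                 max_depth=3)
gbt.fit(X_train, y_train_labels)
print("accuracy:", gbt.score(X_test, y_test_labels))
```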
(4) K-Nearest Neighbors (KNN) [29]: The optimization problem in KNN lies in determining the optimal value of K, which represents the number of neighbors considered for classification or regression. The choice of K can significantly impact the performance of the KNN algorithm. The prediction procedure can be expressed as follows: first, for each unknown sample, the distances to the training samples in the feature space are calculated; then, the nearest K neighbors are found; last, the categories of these neighbors (for classification problems) are voted on. The calculation formula is as follows:
\hat{y}_{KNN} = \arg\max_{y} \sum_{i=1}^{K} I(y_i = y)
where $\hat{y}_{KNN}$ is the prediction result of KNN, $K$ is the number of neighbors, and $y_i$ is the category of the $i$-th neighbor.
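Because the choice of K drives KNN's performance, a small cross-validated search is a natural way to pick it. A minimal sketch, reusing the hypothetical labels (the candidate K values are assumptions):

```python
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

for k in [3, 5, 7, 9]:  # illustrative candidate values of K
    knn = KNeighborsClassifier(n_neighbors=k)
    acc = cross_val_score(knn, X_train, y_train_labels, cv=5).mean()
    print(f"K={k}: mean CV accuracy = {acc:.3f}")
```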
(5) AdaBoost [31]: Also known as adaptive boosting, AdaBoost is an iterative algorithm that constructs a series of weak learners by progressively emphasizing the misclassified samples. These weak learners are then combined to create a more powerful or stronger learner.
The calculation process is as follows: a weak learner is initialized; the training dataset is iterated over; in each iteration, the prediction error of the current learner is calculated for each sample; the sample weights are updated according to the error; and a new weak learner is trained with the updated weights. The calculation formula is as follows:
\hat{y}_{Ada} = \sum_{t=1}^{T} \alpha_t \cdot \hat{y}_t
where $\hat{y}_{Ada}$ is the prediction result of AdaBoost, $T$ is the number of learners, $\alpha_t$ is the weight of the $t$-th learner, and $\hat{y}_t$ is the prediction result of the $t$-th learner. In AdaBoost, the weights $\alpha_t$ are calculated based on the error rate $\epsilon_t$ of each weak learner, expressed as $\alpha_t = \frac{1}{2} \ln\left(\frac{1 - \epsilon_t}{\epsilon_t}\right)$. The lower the error rate of the weak learner, the higher its weight.
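A minimal sketch over decision stumps, reusing the hypothetical labels (the `estimator` keyword assumes scikit-learn 1.2 or later; the weak learner and T are illustrative choices):

```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

# Boost depth-1 trees ("stumps"); the learner weights alpha_t are computed
# internally from each stump's error rate, as in the formula above.
ada = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),  # weak learner choice
    n_estimators=50,                                # T = 50 iterations
)
ada.fit(X_train, y_train_labels)
print("accuracy:", ada.score(X_test, y_test_labels))
```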

4. Experiments and Analysis

4.1. Regression Prediction Experiments

To assess the performance of the regression models in predicting student performance for the four chosen courses (Data Structures, Compiler Principles, Algorithm Analysis and Design, and Operating Systems), we randomly split the dataset of 292 students' achievement records into a training set comprising 70% of the data and a test set comprising the remaining 30%. The evaluation metrics for the four regression algorithms are presented below.
Based on the findings presented in Table 5, we can infer the following analytical insights.
(1) Algorithm Results Comparison: Looking at the mean square error and root mean square error metrics, lasso regression had the lowest values, suggesting the best fit. On the other hand, decision tree regression had the highest mean square error and root mean square error, indicating the worst fit. When considering relative error, lasso regression had a relatively small relative error in predicting the performance of the four courses, while the other three regression types had larger relative errors. This implies that lasso regression’s predicted results were closer to the true values compared to the other regressions.
(2) Model Selection Analysis: Based on the analysis, if the objective is to choose the most suitable regression model for predicting course performance, lasso regression can be considered. It performed well across all four courses, with low mean squared error and relative error. If the goal is to predict the performance for a specific course, it would be beneficial to select an appropriate model based on the characteristics of that specific course. For instance, the Compiler Principles course may benefit from ridge regression or lasso regression; the Operating Systems course may be better suited to linear regression or ridge regression; for the Data Structures course, lasso regression or linear regression may be useful; and the Algorithm Analysis and Design course may benefit from ridge regression or linear regression.
To calculate the accuracy of each regression algorithm, we can set a threshold of 10 points as the acceptable difference between the predicted result and the actual result. If the difference falls within this threshold, we consider the prediction to be correct, as shown in Table 6 and Figure 3:
Based on the experimental findings presented in Table 6 and Figure 3, we can derive the following analytical conclusions.
Firstly, regarding the prediction of scores for the Data Structures course, linear regression, ridge regression, and lasso regression achieved similar prediction accuracies, while decision tree regression had the lowest accuracy. For the prediction of Compiler Principles course scores, linear regression and ridge regression yielded equivalent prediction accuracies. However, lasso regression outperformed both by achieving the highest accuracy, while decision tree regression had the lowest accuracy.
Secondly, in terms of predicting Algorithm Analysis and Design course scores, linear regression, ridge regression, and lasso regression achieved similar prediction accuracies, while decision tree regression had the lowest accuracy.
Thirdly, when it comes to predicting Operating Systems course performance, linear regression and ridge regression showed comparable accuracies, while lasso regression outperformed the others with the highest accuracy. On the other hand, decision tree regression had the lowest accuracy in this case.
In conclusion, the results suggest that lasso regression performs the best overall in accurately predicting student performance in these courses, while decision tree regression tends to be the least accurate among the regression algorithms.
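For reference, a minimal sketch of the 10-point threshold accuracy used in Table 6, reusing the fitted model and split from the earlier sketches:

```python
import numpy as np

def threshold_accuracy(y_true, y_pred, threshold=10.0):
    """Fraction of predictions within `threshold` points of the true score."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean(np.abs(y_true - y_pred) <= threshold))

print("threshold accuracy:", threshold_accuracy(y_test, model.predict(X_test)))
```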

4.2. Classification Prediction Experiments

Using Data Structures course prediction as an example, we designed three different types of performance grade prediction tasks. The first task includes two categories: “Fail” and “Pass”; the second task includes three categories: “Fail”, “Pass”, and “Excellent”; the third task includes “Fail”, “Pass”, “Fair”, “Good”, and “Excellent”. Macro average refers to the average of a metric calculated independently for each class. This calculation method treats all classes equally, regardless of their support (the number of instances in each class). Weighted average is similar to the macro average, but it accounts for class imbalance by weighting the average of a metric by the number of instances in each class. Therefore, classes with more instances have a bigger impact on the metric. The focus of this paper is on overall performance rather than differences between categories, so the focus here is on overall accuracy.
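A minimal sketch of how the per-class metrics plus the macro and weighted averages reported below can be produced, reusing the hypothetical grade labels and fitted random forest from Section 3.3:

```python
from sklearn.metrics import classification_report

# The report lists per-class precision/recall/F1 together with the macro
# average (unweighted mean over classes) and the weighted average
# (weighted by each class's support).
y_pred = rf.predict(X_test)
print(classification_report(y_test_labels, y_pred, zero_division=0))
```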
Based on the above results of Table 7, the following analysis conclusions can be made:
Firstly, the models performed well on the test set, with most models achieving an average precision, recall, and F1-score above 0.9. This indicates their ability to predict both positive and negative cases accurately.
Secondly, random forest, SVM, gradient boosting, KNN, and AdaBoost achieved 95% accuracy on the test set, showcasing their strong classification abilities.
Thirdly, the models exhibited excellent prediction performance on positive cases (Pass), with a precision of 0.95, recall of 1.00, and F1-score of 0.98. However, there were four instances where the models incorrectly predicted negative cases (Fail). This could be due to the similarities between the characteristics of these samples and those of both positive and negative cases, making accurate classification challenging.
Lastly, weighted average measures indicate stable performance of all models on the test set, with a precision rate of 0.91, recall of 0.95, and F1-score of 0.93. This suggests that the models have robust generalization abilities across different sample classes.
Based on the above results of Table 8, the following analysis conclusions can be made:
Firstly, the models had poor prediction performance on the “Fail” category, with precision, recall, and F1-score of 0.00. This is likely due to the small number of samples in this category, making it difficult for the models to accurately predict.
Secondly, on the “Pass” category, all models performed relatively well, with precision between 0.75 and 0.77, recall between 0.78 and 0.89, and F1-score between 0.77 and 0.82. This indicates that the models are able to identify positive cases with a decent level of accuracy.
Thirdly, for the “Excellent” category, the models also showed relatively good prediction performance, with precision between 0.62 and 0.75, recall between 0.59 and 0.69, and F1-score between 0.64 and 0.68. However, the recall rate was slightly lower compared to the “Pass” category, suggesting that the models may struggle to correctly identify all instances of “Excellent”.
Based on the evaluation metrics provided, it seems that all models have challenges in predicting the “Fail” category. This is evident from the precision, recall, and F1-score values of 0.00 for the “Fail” category in all models. On the other hand, the models perform relatively better in predicting the “Pass” and “Excellent” categories, with precision values ranging from 0.75 to 0.77, recall values ranging from 0.62 to 0.89, and F1-score values ranging from 0.68 to 0.82 across the models.
Overall, it seems that the random forest and SVM models perform slightly better than the others in terms of precision and recall for the “Pass” and “Excellent” categories. However, it is important to consider other factors such as computational efficiency, interpretability, and the specific requirements of the prediction task when selecting the most suitable model.
Additionally, the accuracy score for all models ranges from 0.72 to 0.76, indicating that the overall accuracy of the models is relatively high.
In summary, while all models demonstrate good performance in predicting the “Pass” and “Excellent” categories, there is room for improvement in predicting the “Fail” category. It may be worth considering further model optimization or exploring alternative algorithms to improve performance in all categories.
Based on the above results of Table 9, the following analysis conclusions can be made:
Firstly, the performance of the random forest model was found to be superior in correctly categorizing samples as “Pass” and “Fair”, but less accurate in classifying samples as “Good” and “Excellent”. This suggests that the model is more effective at distinguishing between normal and slightly inferior samples but struggles with distinguishing between excellent and outstanding samples.
Secondly, the SVM model exhibited higher accuracy in correctly classifying samples as “Pass”, but lower accuracy in classifying samples as “Fair” and “Excellent”. This could potentially be attributed to the SVM model encountering challenges when dealing with nonlinear separable data.
Thirdly, the gradient boosting model demonstrated relatively strong performance across all categories, achieving the highest accuracy in classifying samples as "Pass", "Fair", and "Good". This effectiveness could be attributed to gradient boosting being an ensemble learning approach that effectively captures complex patterns within the dataset.
Fourthly, the K-neighbors model displayed good performance in correctly categorizing samples as "Pass" and "Fair", but struggled in classifying samples as "Good" and "Excellent". This limitation can be attributed to the K-neighbors model's sensitivity to the dataset's distribution and feature selection.
Fifthly, the AdaBoost model showed strong performance in correctly classifying samples as "Pass", "Fair", and "Good", but performed less accurately in classifying samples as "Excellent". This could be due to the AdaBoost model's susceptibility to overfitting and limited capability to handle complex datasets.
In summary, based on the overall accuracy, macro average, and weighted average evaluation indicators, the K-neighbors model exhibited the highest performance, followed by the SVM model, while the random forest, gradient boosting, and AdaBoost models displayed comparatively weaker performance. It is important to note that the performance of each model varied across categories, with random forest performing poorly in the "Fail" category and SVM performing better in the "Pass" category. To complement these observations, a visual comparison of model accuracy on the two-, three-, and five-category tasks is presented below, and the precision, recall, and F1-score metrics reported above give a more comprehensive picture of each model's ability to correctly identify positive and negative instances.
Accordingly, Figure 4 presents a visual comparison of the accuracy of the five machine learning algorithms on the two-, three-, and five-category prediction tasks:

4.3. Multi-Model Fusion Learning Performance Analysis

From the observations in the preceding experiments, we can conclude that different machine learning models perform differently on different performance prediction tasks; that is, no single algorithm performs best on all prediction tasks. Based on this, this section proposes a multi-model fusion framework based on machine learning, which integrates multiple machine learning algorithms and selects the optimal algorithm for each task by accuracy voting to predict student performance. We further apply this framework in student education management.
The specific prediction process is shown in Figure 5 and is divided into five steps:
Step 1: Data Preprocessing. Clean the data, such as handling missing values and outliers; normalize or standardize the data to ensure the data are on a uniform scale; analyze the relevance of course scores to ensure strongly relevant courses are selected.
Step 2: Model Selection and Training. Select the regression prediction models and the classification prediction models suitable for student achievement prediction. In this paper, linear regression, ridge regression, lasso regression, and decision tree regression were used as candidates for the regression prediction models, and random forest, support vector machine, gradient boosting tree, K-nearest neighbors, and AdaBoost were used as candidates for the classification prediction models.
Step 3: Multi-Model Fusion and Model Evaluation. Each model is tested using the test set to calculate the accuracy. By comparing the accuracy of different models, the model with the highest accuracy is selected as the model for subsequent prediction.
Step 4: Prediction and Student Classification. The students' grades are predicted using the selected or fused models. The students are then categorized based on these predicted grades, with thresholds used to classify students into distinct categories such as excellent, good, fair, pass, and fail.
Step 5: Results Output. The performance of the model can be further analyzed to provide a reference for future academic interventions. From the perspective of application results, the multi-model fusion approach can integrate the advantages of the individual algorithms, giving the fused model stronger generalization ability than any single-algorithm model.
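To make Step 3 concrete, here is a minimal sketch of the accuracy-voting selection under our reading of the framework: every candidate is fitted, scored on the held-out set, and the most accurate model is kept for the task at hand. The candidate list mirrors Section 3; the helper name and the data variables (reused from the earlier sketches) are illustrative.

```python
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

candidates = {
    "random_forest": RandomForestClassifier(random_state=42),
    "svm": SVC(),
    "gradient_boosting": GradientBoostingClassifier(random_state=42),
    "knn": KNeighborsClassifier(),
    "adaboost": AdaBoostClassifier(random_state=42),
}

def select_by_accuracy_voting(candidates, X_tr, y_tr, X_te, y_te):
    """Fit each candidate and return the name and model with top accuracy."""
    scored = {}
    for name, model in candidates.items():
        model.fit(X_tr, y_tr)
        scored[name] = model.score(X_te, y_te)
    best = max(scored, key=scored.get)
    return best, candidates[best], scored

best_name, best_model, all_scores = select_by_accuracy_voting(
    candidates, X_train, y_train_labels, X_test, y_test_labels)
print("selected model:", best_name, all_scores)
```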

5. Discussion

5.1. Comparison of Research Results

This study systematically reviews the existing literature to reveal its academic value in the field of student academic performance prediction and identifies the unique contributions of this research from a methodological innovation perspective. In terms of technical approach, unlike the single deep neural network architecture used in [14], this study innovatively constructs a multi-model fusion framework, effectively integrating the strengths of heterogeneous models through ensemble learning, significantly enhancing the stability and accuracy of the prediction system. In contrast to [15], which solely relies on SMOTE oversampling or feature selection algorithms to address data imbalance, this method combines algorithm optimization with model integration, offering a multi-dimensional solution. Addressing the limitations of clustering analysis and deep learning in [16], this study creatively merges supervised and unsupervised learning methods, establishing a hybrid architecture where regression and classification models collaborate, expanding the prediction scope. Unlike [17], which relies on AutoML for automatic optimization, this study enhances both prediction performance and result interpretability through manually designed model selection and parameter tuning mechanisms. Although [18] constructs a prediction model based on online learning behavior data, this study builds a more robust prediction system by integrating multiple classification techniques. Notably, while [19] focuses on online education, the multi-source data fusion approach validated in this study within the context of traditional university education shares methodological similarities.

5.2. Research Limitations and Future Prospects

Despite the significant contributions of this study to the field of student academic performance prediction, several limitations remain. Firstly, this study has not fully considered factors that may influence academic performance, such as student personal characteristics, teaching methods, and family background. Future research could incorporate these factors to build a more comprehensive prediction model. Secondly, although multiple machine learning models were employed, the interpretability analysis of the models is not sufficiently in-depth. Future work could explore methods to enhance the interpretability of “black box” models. Regarding course relevance analysis, while strong correlations between courses were identified, the underlying causes were not explored in detail. Future research could examine how course content and teaching methods affect academic performance. In terms of model performance evaluation, although common evaluation metrics were used, the analysis of performance across different subgroups or under specific conditions was limited. Future studies could introduce more refined evaluation methods, such as fairness assessment and robustness testing. Finally, given that this study is based on cross-sectional analysis, lacking long-term tracking, future research could adopt longitudinal studies to track the dynamic changes in students’ academic performance.

5.3. Practical Application Value of the Research

The student overall course performance prediction model proposed in this study holds significant practical value in educational management. The model’s prediction results can effectively support educational institutions and teachers in their teaching decisions. By accurately forecasting student performance, teachers can identify students facing academic difficulties and adjust teaching methods or provide personalized tutoring based on the predictions, thereby improving teaching effectiveness. Furthermore, by identifying students at risk of academic failure in advance, schools can implement targeted interventions to reduce dropout rates and mitigate the risk of declining performance. Additionally, the prediction results can assist educational administrators in optimizing the allocation of educational resources. By precisely identifying students who require additional support, resources can be allocated efficiently to ensure that they are maximally utilized to assist students in need. Moreover, this study can be applied to course adjustments, teaching strategy optimization, personalized learning recommendations, and interventions, further promoting educational equity.

5.4. Generalizability of the Research

The student course performance prediction method proposed in this study demonstrates good generalizability and can be applied to different grading systems. Different institutions may adopt varying grading standards. For example, many schools use traditional test scores as the main assessment criteria, usually divided among midterm exams, final exams, and classroom quizzes, with a student's final grade based on a weighted average of these exams. Some institutions grade on sustained student performance, with assessments including class work, midterm exams, extracurricular reading reports, and group activities. Instead of numerical scores, some schools use letter grades (e.g., A, B, C, D) or descriptive grades (e.g., excellent, good, passing, failing), and these grades may be closely related to a student's particular skill level and stage of development. The core of the model is prediction from students' historical performance. To apply it to different grading systems, adjustments in data preprocessing and feature engineering are required: for instance, some institutions may place more emphasis on project assessments or class participation than on traditional exam scores, which would necessitate adapting the input data. Furthermore, adjustments should be made according to the local education system, student demographics, and course structure to ensure the accuracy and applicability of the model.

5.5. Research Thinking

In this study, four courses were selected as prediction targets, based on their significance in academic performance and their ability to represent learning conditions across different disciplines. However, we acknowledge that using only these four courses for prediction may have certain limitations. Future research could extend the applicability of the model by incorporating prediction tasks for additional courses, further validating the generalizability and stability of the approach. Furthermore, future work could consider incorporating more student characteristics, such as attendance rate, assignment completion, and learning behaviors, as these factors may positively influence academic performance prediction, thereby enhancing the accuracy and practical application value of the model.

6. Conclusions

In this paper, we conducted an experimental analysis of the academic performance of students majoring in Computer Science at a university. We first investigated the correlation between course scores, which can provide valuable insights into course arrangements in colleges and universities. We found that the performance prediction models for different tasks or courses were unstable, indicating the need for a more robust approach. To address this issue, we proposed a framework based on multi-model fusion for predicting academic performance and applied it to student performance prediction, classification, and early warning in college settings. The results showed promising practical value. The paper suggests several directions for further research. For instance, the study only focused on the correlation between grades and did not consider other factors that could affect academic performance, such as students' characteristics and teachers' teaching methods. Future research could incorporate these additional factors to build a more comprehensive student achievement prediction system. Overall, this work contributes to the field of academic performance prediction and provides valuable insights for improving educational practices. Further research in this area is encouraged to enhance the accuracy and relevance of student performance prediction models.

Author Contributions

Software, L.Y.; formal analysis, W.Z. (Wei Zhong); data curation, J.D.; writing—original draft, W.Z. (Wei Zou). All authors have read and agreed to the published version of the manuscript.

Funding

The Yunnan International Joint R&D Center of China-Laos-Thailand Educational Digitalization (202203AP140006), National New Liberal Arts Research and Reform Practice Project (No. 2021180030), Basic Research Project of Science and Technology Department of Yunnan Province (No. 202401AT070112), Open Funding Programme of Key Laboratory of Education Informatization for Nationalities (EIN202105), Open Funding Programme of Key Laboratory of Yunnan Province Smart Education (YNSE2024C004), and Educational Project of Yunnan Provincial Philosophy and Social Sciences Planning (AC24009).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The data used to support the findings of this study are available from the corresponding author upon request.

Acknowledgments

The authors sincerely thank the team for their guidance and the reviewers for taking the time out of their busy schedules to review the paper.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Khan, A.; Ghosh, S.K. Student performance analysis and prediction in classroom learning: A review of educational data mining studies. Educ. Inf. Technol. 2021, 26, 205–240.
2. Wang, Y.; OuYang, Y.; Levkiv, M. Academic performance prediction model based on educational similarity. In Proceedings of the 2023 17th International Conference on the Experience of Designing and Application of CAD Systems (CADSM), Jaroslaw, Poland, 22–25 February 2023; Volume 1, pp. 1–4.
3. Amjad, S.; Younas, M.; Anwar, M.; Shaheen, Q.; Shiraz, M.; Gani, A. Data mining techniques to analyze the impact of social media on academic performance of high school students. Wirel. Commun. Mob. Comput. 2022, 2022, 9299115.
4. Gao, X. Characteristics of Study Motivation of Contemporary College Students and Its Impact on Academic Achievement. High. Educ. Explor. 2020, 43–47. (In Chinese)
5. Sun, R.J.; Shen, R.M.; Guan, L.S. Study on the influencing factors of college students' learning effectiveness. J. Nat. Acad. Educ. Adm. 2012, 9, 65–71.
6. Marbouti, F.; Diefes-Dux, H.A.; Madhavan, K. Models for early prediction of at-risk students in a course using standards-based grading. Comput. Educ. 2016, 103, 1–15.
7. Conard, M.A. Aptitude is not enough: How personality and behavior predict academic performance. J. Res. Personal. 2006, 40, 339–346.
8. Wheaton, A.G.; Chapman, D.P.; Croft, J.B. School start times, sleep, behavioral, health, and academic outcomes: A review of the literature. J. Sch. Health 2016, 86, 363–381.
9. Xu, X.; Wang, J.; Peng, H.; Wu, R. Prediction of academic performance associated with internet usage behaviors using machine learning algorithms. Comput. Hum. Behav. 2019, 98, 166–173.
10. Philipp, J.G. End of Course Grades and End of Course Tests in the Virtual Environment: A Study of Correlation; Liberty University: Lynchburg, VA, USA, 2014.
11. Sticca, F.; Goetz, T.; Nett, U.E.; Hubbard, K.; Haag, L. Examining the Accuracy of Students' Self-Reported Academic Grades from a Correlational and a Discrepancy Perspective: Evidence from a Longitudinal Study. PLoS ONE 2017, 12, e0187367.
12. Caponera, E.; Sestito, P.; Russo, P.M. The Influence of Reading Literacy on Mathematics and Science Achievement. J. Educ. Res. 2016, 109, 197–204.
13. Nie, X.; Ma, Y.; Qiao, H.; Guo, J.; Cui, C.; Yu, Z.; Liu, X.; Yin, Y. Survey on student academic performance prediction from the perspective of task granularity. J. Shandong Univ. (Eng. Sci.) 2022, 52, 1–14.
14. Nabil, A.; Seyam, M.; Abou-Elfetouh, A. Prediction of students' academic performance based on courses' grades using deep neural networks. IEEE Access 2021, 9, 140731–140746.
15. Bujang, S.D.A.; Selamat, A.; Ibrahim, R.; Krejcar, O.; Herrera-Viedma, E.; Fujita, H.; Ghani, N.A.M. Multiclass prediction model for student grade prediction using machine learning. IEEE Access 2021, 9, 95608–95621.
16. Feng, G.; Fan, M.; Chen, Y. Analysis and prediction of students' academic performance based on educational data mining. IEEE Access 2022, 10, 19558–19571.
17. Zeineddine, H.; Braendle, U.; Farah, A. Enhancing prediction of student success: Automated machine learning approach. Comput. Electr. Eng. 2021, 89, 106903.
18. Brahim, G.B. Predicting student performance from online engagement activities using novel statistical features. Arab. J. Sci. Eng. 2022, 47, 10225–10243.
19. Qiu, F.; Zhang, G.; Sheng, X.; Jiang, L.; Zhu, L.; Xiang, Q.; Jiang, B.; Chen, P.-K. Predicting students' performance in e-learning using learning process and behaviour data. Sci. Rep. 2022, 12, 453.
20. Mingyu, Z.; Sutong, W.; Yanzhang, W.; Dujuan, W. An interpretable prediction method for university student academic crisis warning. Complex Intell. Syst. 2022, 8, 323–336.
21. Gareth, J.; Witten, D.; Hastie, T.; Tibshirani, R.; Taylor, J. An Introduction to Statistical Learning: With Applications in Python; Springer International Publishing: Cham, Switzerland, 2023.
22. Wang, X.; Wang, X.; Ma, B.; Li, Q.; Wang, C.; Shi, Y. High-performance reversible data hiding based on ridge regression prediction algorithm. Signal Process. 2023, 204, 108818.
23. Mei, Z.; Shi, Z. On LASSO for high dimensional predictive regression. J. Econom. 2024, 242, 105809.
24. Costa, V.G.; Pedreira, C.E. Recent advances in decision trees: An updated survey. Artif. Intell. Rev. 2023, 56, 4765–4800.
25. He, S.; Wu, J.; Wang, D.; He, X. Predictive modeling of groundwater nitrate pollution and evaluating its main impact factors using random forest. Chemosphere 2022, 290, 133388.
26. Chen, Y.; Mao, Q.; Wang, B.; Duan, P.; Zhang, B.; Hong, Z. Privacy-preserving multi-class support vector machine model on medical diagnosis. IEEE J. Biomed. Health Inform. 2022, 26, 3342–3353.
27. Mohd Talib, N.I.; Abd Majid, N.A.; Sahran, S. Identification of student behavioral patterns in higher education using K-means clustering and support vector machine. Appl. Sci. 2023, 13, 3267.
28. Dombry, C.; Duchamps, J.-J. Infinitesimal gradient boosting. Stoch. Process. Their Appl. 2024, 170, 104310.
29. Uddin, S.; Haque, I.; Lu, H.; Moni, M.A.; Gide, E. Comparative performance analysis of K-nearest neighbour (KNN) algorithm and its different variants for disease prediction. Sci. Rep. 2022, 12, 6256.
30. Rincon-Flores, E.G.; Lopez-Camacho, E.; Mena, J.; Olmos, O. Teaching through learning analytics: Predicting student learning profiles in a physics course at a higher education institution. Int. J. Interact. Multimedia Artif. Intell. 2022, 7, 82–89.
31. Wu, Z.; Jing, L.; Wu, B.; Jin, L. A PCA-AdaBoost model for E-commerce customer churn prediction. Ann. Oper. Res. 2022, 208, 1–18.
32. Sevinç, E. An empowered AdaBoost algorithm implementation: A COVID-19 dataset study. Comput. Ind. Eng. 2022, 165, 107912.
Figure 1. Heat map of the course correlation coefficient matrix.
Figure 2. Course cluster dendrogram.
Figure 3. Precision comparison of the four types of regression algorithms.
Figure 4. Comparison of model accuracy in grade prediction of different courses.
Figure 5. Multi-model fusion performance analysis flow chart.
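For readers who want to reproduce the analyses behind Figures 1 and 2, the sketch below computes a Pearson correlation matrix over course-score columns and clusters courses using (1 − correlation) as a distance. The DataFrame name, the four course columns, and the average-linkage choice are assumptions for illustration.

```python
# Hedged sketch of Figure 1 (correlation heat map) and Figure 2 (dendrogram).
# `scores` and its columns are hypothetical stand-ins for the real grade data.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.spatial.distance import squareform

rng = np.random.default_rng(1)
scores = pd.DataFrame(rng.uniform(40, 100, size=(200, 4)),
                      columns=["Linear Algebra", "Discrete Mathematics",
                               "Data Structures", "Operating Systems"])

corr = scores.corr(method="pearson")          # course-by-course Pearson matrix

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(11, 4))
im = ax1.imshow(corr, vmin=-1, vmax=1, cmap="coolwarm")
ax1.set_xticks(range(len(corr)), labels=corr.columns, rotation=45, ha="right")
ax1.set_yticks(range(len(corr)), labels=corr.columns)
fig.colorbar(im, ax=ax1)
ax1.set_title("Course correlation heat map")

# Hierarchical clustering: treat (1 - r) as the distance between courses.
condensed = squareform(1.0 - corr.values, checks=False)
dendrogram(linkage(condensed, method="average"),
           labels=list(corr.columns), ax=ax2)
ax2.set_title("Course cluster dendrogram")
plt.tight_layout()
plt.show()
```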
Table 1. Correlation coefficients between Data Structures and the strongly correlated courses in its cluster.

| Course | Introduction to Computers (1st Semester) | Linear Algebra (1st Semester) | Foundations of Programming (2nd Semester) | Discrete Mathematics (2nd Semester) |
|---|---|---|---|---|
| Data Structures (3rd Semester) | 0.50 | 0.52 | 0.54 | 0.58 |
Table 2. Correlation coefficients between Algorithm Analysis and Design and the strongly correlated courses in its cluster.

| Course | Introduction to Computers (1st Semester) | Linear Algebra (1st Semester) | Discrete Mathematics (2nd Semester) | Object-Oriented Programming (3rd Semester) | Probability and Statistics (3rd Semester) | Data Structures (3rd Semester) |
|---|---|---|---|---|---|---|
| Algorithm Analysis and Design (4th Semester) | 0.40 | 0.34 | 0.36 | 0.55 | 0.53 | 0.47 |
Table 3. Correlation coefficients between Operating Systems and the strongly correlated courses in its cluster.

| Course | Linear Algebra (1st Semester) | Discrete Mathematics (2nd Semester) | Foundations of Programming (2nd Semester) | Object-Oriented Programming (3rd Semester) | Probability and Statistics (3rd Semester) | Data Structures (3rd Semester) |
|---|---|---|---|---|---|---|
| Operating Systems (4th Semester) | 0.42 | 0.53 | 0.46 | 0.48 | 0.60 | 0.53 |
Table 4. Correlation coefficients between Compiler Principles and the strongly correlated courses in its cluster.

| Course | Discrete Mathematics (2nd Semester) | Object-Oriented Programming (3rd Semester) | Probability and Statistics (3rd Semester) | Data Structures (3rd Semester) | Algorithm Analysis and Design (4th Semester) | Operating Systems (4th Semester) |
|---|---|---|---|---|---|---|
| Compiler Principles (5th Semester) | 0.58 | 0.52 | 0.66 | 0.54 | 0.47 | 0.65 |
Table 5. Comparison of evaluation indicators for the four types of regression algorithms.

| Algorithm | Course Name | Intercept | MSE | RMSE | RE |
|---|---|---|---|---|---|
| Linear Regression | Compiler Principles | −0.34 | 72.32 | 8.50 | 9.27% |
| Linear Regression | Operating Systems | 11.43 | 84.27 | 9.18 | 9.28% |
| Linear Regression | Data Structures | 9.21 | 74.85 | 8.65 | 9.77% |
| Linear Regression | Algorithm Analysis and Design | 31.38 | 53.60 | 7.32 | 7.30% |
| Ridge Regression | Compiler Principles | −0.33 | 72.32 | 8.50 | 9.27% |
| Ridge Regression | Operating Systems | 11.43 | 84.27 | 9.18 | 9.28% |
| Ridge Regression | Data Structures | 9.21 | 74.85 | 8.65 | 9.77% |
| Ridge Regression | Algorithm Analysis and Design | 31.38 | 53.60 | 7.32 | 7.30% |
| Lasso Regression | Compiler Principles | 1.00 | 71.05 | 8.43 | 9.19% |
| Lasso Regression | Operating Systems | 12.34 | 84.34 | 9.18 | 9.29% |
| Lasso Regression | Data Structures | 10.37 | 75.35 | 8.68 | 9.82% |
| Lasso Regression | Algorithm Analysis and Design | 32.08 | 52.42 | 7.24 | 7.23% |
| Decision Tree Regression | Compiler Principles | None | 110.08 | 10.49 | 11.28% |
| Decision Tree Regression | Operating Systems | None | 189.89 | 13.78 | 12.48% |
| Decision Tree Regression | Data Structures | None | 151.93 | 12.33 | 13.83% |
| Decision Tree Regression | Algorithm Analysis and Design | None | 119.07 | 10.91 | 10.41% |
Table 6. Precision comparison of the four types of regression algorithms.

| Course Name | Linear Regression | Ridge Regression | Lasso Regression | Decision Tree Regression |
|---|---|---|---|---|
| Data Structures | 0.78 | 0.78 | 0.78 | 0.68 |
| Compiler Principles | 0.78 | 0.78 | 0.79 | 0.70 |
| Algorithm Analysis and Design | 0.81 | 0.81 | 0.81 | 0.72 |
| Operating Systems | 0.75 | 0.75 | 0.77 | 0.68 |
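The regression comparison in Tables 5 and 6 can be replicated in outline as follows; we read the "Precision" column of Table 6 as an R²-style goodness-of-fit score, which is an assumption rather than something stated in this excerpt, and the data below are synthetic.

```python
# Hedged sketch of the four-regressor comparison (Tables 5 and 6) on
# synthetic data; hyperparameters (alpha, max_depth) are illustrative.
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)
X = rng.uniform(40, 100, size=(440, 6))                 # prior-course scores
y = 0.4 * X[:, 2] + 0.4 * X[:, 5] + rng.normal(0, 8, size=440)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=1)

models = {
    "Linear Regression": LinearRegression(),
    "Ridge Regression": Ridge(alpha=1.0),
    "Lasso Regression": Lasso(alpha=0.1),
    "Decision Tree Regression": DecisionTreeRegressor(max_depth=4, random_state=1),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    mse = mean_squared_error(y_te, pred)
    re = np.mean(np.abs(pred - y_te) / y_te)            # relative error (RE)
    intercept = getattr(model, "intercept_", None)      # None for trees, as in Table 5
    print(f"{name}: intercept={intercept} MSE={mse:.2f} "
          f"RMSE={np.sqrt(mse):.2f} RE={re:.2%} R2={r2_score(y_te, pred):.2f}")
```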
Table 7. Prediction accuracy comparison for the two-category task (Fail/Pass).

| Prediction Model | Class / Metric | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|---|
| Random Forest | Fail | 0.00 | 0.00 | 0.00 | 4 |
| | Pass | 0.95 | 1.00 | 0.98 | 84 |
| | Accuracy | – | – | 0.95 | 88 |
| | Macro Avg | 0.48 | 0.50 | 0.49 | 88 |
| | Weighted Avg | 0.91 | 0.95 | 0.93 | 88 |
| SVM | Fail | 0.00 | 0.00 | 0.00 | 4 |
| | Pass | 0.95 | 1.00 | 0.98 | 84 |
| | Accuracy | – | – | 0.95 | 88 |
| | Macro Avg | 0.48 | 0.50 | 0.49 | 88 |
| | Weighted Avg | 0.91 | 0.95 | 0.93 | 88 |
| Gradient Boosting | Fail | 0.00 | 0.00 | 0.00 | 4 |
| | Pass | 0.95 | 1.00 | 0.98 | 84 |
| | Accuracy | – | – | 0.95 | 88 |
| | Macro Avg | 0.48 | 0.50 | 0.49 | 88 |
| | Weighted Avg | 0.91 | 0.95 | 0.93 | 88 |
| K-Neighbors | Fail | 0.00 | 0.00 | 0.00 | 4 |
| | Pass | 0.95 | 1.00 | 0.98 | 84 |
| | Accuracy | – | – | 0.95 | 88 |
| | Macro Avg | 0.48 | 0.50 | 0.49 | 88 |
| | Weighted Avg | 0.91 | 0.95 | 0.93 | 88 |
| AdaBoost | Fail | 0.00 | 0.00 | 0.00 | 4 |
| | Pass | 0.95 | 1.00 | 0.98 | 84 |
| | Accuracy | – | – | 0.95 | 88 |
| | Macro Avg | 0.48 | 0.50 | 0.49 | 88 |
| | Weighted Avg | 0.91 | 0.95 | 0.93 | 88 |
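The row layout of Tables 7–9 matches scikit-learn's classification_report, and the numbers in Table 7 are exactly what that report produces when a model predicts "Pass" for every student. Assuming that utility was used (our reading of the tables, not a statement from the text), the pattern can be reproduced directly:

```python
# Hedged sketch: with 4 failing and 84 passing students, an all-"Pass"
# predictor reproduces every row of Table 7 -- Fail precision/recall 0.00,
# accuracy 0.95, macro avg 0.48/0.50/0.49, weighted avg 0.91/0.95/0.93.
from sklearn.metrics import classification_report

y_true = ["Pass"] * 84 + ["Fail"] * 4
y_pred = ["Pass"] * 88                    # degenerate majority-class predictor
print(classification_report(y_true, y_pred, zero_division=0))
```

That all five classifiers share these identical rows indicates that none of them identified the four failing students, a consequence of the strong class imbalance rather than of any single algorithm.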
Table 8. Prediction accuracy comparison for the three-category task (Fail/Pass/Excellent).

| Prediction Model | Class / Metric | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|---|
| Random Forest | Fail | 0.00 | 0.00 | 0.00 | 4 |
| | Pass | 0.77 | 0.89 | 0.82 | 55 |
| | Excellent | 0.75 | 0.62 | 0.68 | 29 |
| | Accuracy | – | – | 0.76 | 88 |
| | Macro Avg | 0.51 | 0.50 | 0.50 | 88 |
| | Weighted Avg | 0.73 | 0.76 | 0.74 | 88 |
| SVM | Fail | 0.00 | 0.00 | 0.00 | 4 |
| | Pass | 0.77 | 0.85 | 0.81 | 55 |
| | Excellent | 0.70 | 0.66 | 0.68 | 29 |
| | Accuracy | – | – | 0.75 | 88 |
| | Macro Avg | 0.49 | 0.50 | 0.50 | 88 |
| | Weighted Avg | 0.71 | 0.75 | 0.73 | 88 |
| Gradient Boosting | Fail | 0.00 | 0.00 | 0.00 | 4 |
| | Pass | 0.75 | 0.87 | 0.81 | 55 |
| | Excellent | 0.71 | 0.59 | 0.64 | 29 |
| | Accuracy | – | – | 0.74 | 88 |
| | Macro Avg | 0.49 | 0.49 | 0.48 | 88 |
| | Weighted Avg | 0.70 | 0.74 | 0.72 | 88 |
| K-Neighbors | Fail | 0.00 | 0.00 | 0.00 | 4 |
| | Pass | 0.76 | 0.82 | 0.79 | 55 |
| | Excellent | 0.66 | 0.66 | 0.66 | 29 |
| | Accuracy | – | – | 0.73 | 88 |
| | Macro Avg | 0.47 | 0.49 | 0.48 | 88 |
| | Weighted Avg | 0.69 | 0.73 | 0.71 | 88 |
| AdaBoost | Fail | 0.00 | 0.00 | 0.00 | 4 |
| | Pass | 0.77 | 0.78 | 0.77 | 55 |
| | Excellent | 0.62 | 0.69 | 0.66 | 29 |
| | Accuracy | – | – | 0.72 | 88 |
| | Macro Avg | 0.46 | 0.49 | 0.48 | 88 |
| | Weighted Avg | 0.69 | 0.72 | 0.70 | 88 |
Table 9. Prediction accuracy comparison for the five-category task (Fail/Pass/Fair/Good/Excellent).

| Prediction Model | Class / Metric | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|---|
| Random Forest | Fail | 0.00 | 0.00 | 0.00 | 4 |
| | Pass | 0.54 | 0.58 | 0.56 | 24 |
| | Fair | 0.38 | 0.48 | 0.43 | 31 |
| | Good | 0.50 | 0.33 | 0.40 | 21 |
| | Excellent | 0.44 | 0.50 | 0.47 | 8 |
| | Accuracy | – | – | 0.45 | 88 |
| | Macro Avg | 0.37 | 0.38 | 0.37 | 88 |
| | Weighted Avg | 0.44 | 0.45 | 0.44 | 88 |
| SVM | Fail | 0.00 | 0.00 | 0.00 | 4 |
| | Pass | 0.63 | 0.50 | 0.56 | 24 |
| | Fair | 0.43 | 0.65 | 0.52 | 31 |
| | Good | 0.50 | 0.33 | 0.40 | 21 |
| | Excellent | 0.44 | 0.50 | 0.47 | 8 |
| | Accuracy | – | – | 0.49 | 88 |
| | Macro Avg | 0.40 | 0.40 | 0.39 | 88 |
| | Weighted Avg | 0.49 | 0.49 | 0.47 | 88 |
| Gradient Boosting | Fail | 0.00 | 0.00 | 0.00 | 4 |
| | Pass | 0.58 | 0.62 | 0.60 | 24 |
| | Fair | 0.50 | 0.52 | 0.51 | 31 |
| | Good | 0.50 | 0.48 | 0.49 | 21 |
| | Excellent | 0.40 | 0.50 | 0.44 | 8 |
| | Accuracy | – | – | 0.51 | 88 |
| | Macro Avg | 0.40 | 0.42 | 0.41 | 88 |
| | Weighted Avg | 0.49 | 0.51 | 0.50 | 88 |
| K-Neighbors | Fail | 0.00 | 0.00 | 0.00 | 4 |
| | Pass | 0.53 | 0.42 | 0.47 | 24 |
| | Fair | 0.36 | 0.52 | 0.43 | 31 |
| | Good | 0.42 | 0.24 | 0.30 | 21 |
| | Excellent | 0.31 | 0.50 | 0.38 | 8 |
| | Accuracy | – | – | 0.40 | 88 |
| | Macro Avg | 0.32 | 0.33 | 0.32 | 88 |
| | Weighted Avg | 0.40 | 0.40 | 0.38 | 88 |
| AdaBoost | Fail | 0.11 | 0.25 | 0.15 | 4 |
| | Pass | 0.27 | 0.12 | 0.17 | 24 |
| | Fair | 0.39 | 0.42 | 0.41 | 31 |
| | Good | 0.40 | 0.67 | 0.50 | 21 |
| | Excellent | 0.00 | 0.00 | 0.00 | 8 |
| | Accuracy | – | – | 0.35 | 88 |
| | Macro Avg | 0.24 | 0.29 | 0.25 | 88 |
| | Weighted Avg | 0.31 | 0.35 | 0.32 | 88 |
