Article

Renewal of the Concept of Diverse Education: Possibility of Further Education Based on a Novel AI-Based RF–ISSA Model

by Enhui Li 1, Zixi Wang 1,*, Jin Liu 1,* and Jiandong Huang 2,3,4

1 College of Music and Dance, Guangzhou University, Guangzhou 510006, China
2 School of Civil and Transportation Engineering, Guangzhou University, Guangzhou 510006, China
3 Higher School of Advanced Digital Technologies, Peter the Great St. Petersburg Polytechnic University, St. Petersburg 195251, Russia
4 School of Civil Engineering and Architecture, Linyi University, Linyi 276000, China
* Authors to whom correspondence should be addressed.
Appl. Sci. 2025, 15(1), 250; https://doi.org/10.3390/app15010250
Submission received: 3 November 2024 / Revised: 16 December 2024 / Accepted: 22 December 2024 / Published: 30 December 2024

Abstract
The traditional graduate admission method evaluates students' academic performance and interview results, but it relies heavily on the subjective judgment of the evaluators and may not comprehensively and objectively assess the qualifications and potential of applicants. At present, artificial intelligence plays a key role in the reform of the education system, and its data processing capabilities have greatly reduced the workload of screening. Therefore, this study aims to optimize the graduate enrollment evaluation process by applying a new composite model, the random forest–improved sparrow search algorithm (RF–ISSA). The research used seven features, including research experience, cumulative grade point average (CGPA), letter of recommendation (LOR), statement of purpose (SOP), university rating, TOEFL score, and graduate record examination (GRE) score, and carried out the necessary data pre-processing before model construction. The experimental results show that the RMSE and R values of the composite model are 0.0543 and 0.9281, respectively; the predicted results of the model are very close to the actual data. In addition, the study found that the importance score of CGPA was significantly higher than those of the other features and that this value has the most significant impact on the outcome of the graduate admissions assessment. Overall, this study shows that combining the improved sparrow search algorithm (ISSA) with hyperparameter optimization and focusing on the most influential features can significantly improve the predictive performance and applicability of graduate admissions models, providing a more scientific decision support tool for school admissions professionals.

1. Introduction

With the continuous development of higher education, graduate studies have become a significant option for students’ future development. Many students choose to pursue further academic research after completing their undergraduate degrees. However, graduate work is not suitable for every student. Some students may excel academically but lack the necessary research capabilities. As a result, their academic achievements may suggest they are well suited for graduate studies, but in reality, their true abilities may not meet the demands of graduate work. Therefore, it is essential to introduce more evaluation criteria in the graduate admissions process to assist admissions departments in making informed decisions. This introduces numerous uncertainties, posing a significant challenge for universities and admissions departments [1,2].
Traditional admissions methods primarily rely on applicants’ academic records, letters of recommendation, and personal statements. However, these methods tend to be subjective, and relying solely on such factors may not provide a comprehensive and objective evaluation of an applicant’s overall qualifications and potential. Consequently, truly talented researchers may be overlooked. Therefore, optimizing the graduate admissions process through modern technological means in order to improve selection efficiency and accuracy has become an urgent issue to address [3].
Analyzing the profiles of prospective students is often a challenging task, as it heavily depends on various parameters. This reliance can lead to imbalances in the evaluation process due to the subjective judgments of the evaluators, ultimately affecting the final assessment results [4,5]. Therefore, it is necessary to find a method that can objectively analyze all relevant data of students, intuitively express their abilities, and thereby help admissions staff with student selection. At present, the rapid development of artificial intelligence, with its sensitive data searching and efficient processing abilities, provides a new way to solve this problem. Against this background, the graduate admission evaluation model came into being. These models leverage the data processing power and sensitivity of AI to mine and analyze historical admission data, building models that can output intuitive data-based evaluations from student profiles and related information [6,7]. The graduate admission evaluation model can not only effectively help admissions departments make rational decisions about admitting or rejecting students, but also provide valuable insights to applicants. By understanding the criteria that increase one's chances of admission, applicants can better prepare their application materials and improve their likelihood of being accepted [8,9,10,11].
Currently, many researchers are already beginning to apply artificial intelligence to the evaluation of student capabilities. These researchers make full use of AI’s ability to assist in the analysis and integration of various forms of student information, uncovering patterns that can be used to support the analysis of new data in future work. Livieris et al. [12] examined and evaluated the effectiveness of semi-supervised algorithms under two assembling methods and utilized these algorithms to predict students’ final exam grades. The research findings indicate that altering a small amount of labeled data within the semi-supervised model can significantly enhance the model’s classification accuracy. Experimental data support this conclusion, demonstrating the potential application of this method in the educational field. Alexandro David M et al. [13] developed a student dropout risk assessment model using over a hundred predictive indicators to better identify and support at-risk students, thereby reducing the likelihood of their dropping out of high school. The model provides detailed insights, which, when combined with predictive factors related to special education, allow for an in-depth analysis of students’ learning situations. This approach enables early intervention to mitigate dropout risks.
Table 1 presents examples from recent studies that integrate student data with machine learning models. These models analyze students’ behaviors, grades, and other relevant data to predict academic performance and future potential. Such predictive analysis not only aids educators in better understanding students’ learning needs but also helps in the early identification and intervention of potential learning issues, thus enhancing students’ learning efficiency and broadening their developmental pathways [14,15,16,17].
However, based on current research trends, many scholars tend to employ single algorithms when developing these machine-learning models. Although this method simplifies the model construction process, it also imposes limitations on the flexibility and adaptability of the model. The selection of machine learning model parameters is a complex and time-consuming task, one which not only poses challenges to researchers’ expertise but also impacts the model’s predictive accuracy and computational efficiency [18,19,20].
To address this challenge, this study proposes an innovative approach that optimizes the machine learning model development process by integrating multiple algorithm selection methods. The core of this approach lies in utilizing an algorithm selection mechanism to quickly identify and adapt the most suitable model parameters for the given dataset. This strategy not only significantly improves the model’s predictive accuracy but also enhances the efficiency of the model development process while maintaining performance.
Furthermore, combining algorithms can enhance the model’s generalizability, improving its adaptability to diverse data distribution and learning situations. This flexibility is crucial for addressing the ever-evolving learning needs and challenges in the field of education.
Table 1. Literature review for renewal of the concept of diverse education based on machine learning.
(1) Admission criteria and academic performance

| Author | Analysis Target | Data Set Size | Machine Learning Model | Maximum Accuracy | R² |
|---|---|---|---|---|---|
| Adekitan, AI [21] | Admission criteria and the academic performance of the student after the first academic session | 100 | KNIME model | 50.23% | - |
| | | | Orange model | 51.9% | - |
| | | | Linear regression model | - | 0.207 |
| | | | Quadratic regression model | - | 0.232 |

(2) Students' learning ability

| Author | Analysis Target | Data Set Size | Machine Learning Model | Accuracy |
|---|---|---|---|---|
| Kukkar [22] | Predicting students' pass or fail outcome in certain courses | 32,593 | Recurrent neural network + long short-term memory network + support vector machine (RNNs + LSTM + SVM) | 90.67% |
| | | | Recurrent neural network + long short-term memory network + naive Bayes (RNNs + LSTM + NB) | 86.45% |
| | | | Recurrent neural network + long short-term memory network + decision tree (RNNs + LSTM + DT) | 84.42% |
| Raheela Asif [23] | Study the performance of undergraduate students | 210 | Decision tree with Gini index | 68.27% |
| | | | Decision tree with information gain | 69.23% |
| | | | Decision tree with accuracy | 60.58% |
| | | | Rule induction with information gain | 55.77% |
| | | | 1-nearest neighbor | 74.04% |
| | | | Neural networks | 62.50% |
| | | | Random forest trees with accuracy | 62.50% |
| Jui-Long Hung [24] | Evaluates the accuracy of student learning ability | 509 | DNN | 84.79% |
| | | | RF | 85.37% |
| | | | DNN | 95.89% |
| | | | RF | 95.53% |
| Ashima Kukkar [25] | Performance of students in higher education | 32,593 | Three-layer stacked long short-term memory + random forest + gradient boosting (LSTM + RF + GB) | 91.66% |
| | | | Two-layer stacked LSTM + RF + GB | 86.54% |
| | | | One-layer LSTM + RF + GB | 75.47% |

(3) Students' comprehensive psychological quality and achievement

| Author | Analysis Target | Data Set Size | Machine Learning Model | Mean AUC |
|---|---|---|---|---|
| Radwan, AM [26] | Response of students with autism and students' failure | 3739 | Support vector machine | 0.74 |
| | | | AdaBoost | 0.73 |
| | | | Logistic regression | 0.72 |
| Liana Maria Crivei [27] | Students' course achievement | 2401 | Students' performance prediction using relational association rules (SPRAR) | 0.74 |
| | | | DT | 0.61 |
| | | | ANN | 0.67 |
| | | | SVM | 0.65 |
| | | | Original SPRAR | 0.7 |
The remainder of this paper is organized into several key sections. Section 2 provides a detailed description of the dataset, including its sources, characteristics, and significance in the model development process. Section 3 outlines the selection algorithms employed in this study, the specific machine learning models used, and the metrics utilized to quantify and evaluate the predictive outcomes. Section 4 systematically explains the experimental design and implementation process, using charts and graphs to visually present the experimental data, aiding readers in quickly understanding and analyzing the results. Section 5 offers a thorough analysis and discussion of all experimental findings, distills the research conclusions, and explores potential directions for future research and improvements.

2. Materials

The data used in this study were collected by other scholars [28]. The dataset includes seven features: research experience, CGPA, LOR, SOP, university rating, TOEFL score, and GRE score. Some of these features are directly quantitative and can objectively evaluate student-specific performance and behavioral characteristics in specific contexts, while others lack direct objective criteria. These different data types cannot be fed into the same model as-is. Therefore, certain pre-processing steps must be performed before using the data for model building, to ensure that all of the data can be represented in the same model and meet the requirements of model construction.
Specifically, for directly related characteristics, such as CGPA and GRE score, these data can often be used directly as input to the model because they provide clear quantitative indicators for assessing a student’s academic ability. As the most intuitive student data, these two parameters can provide the most direct judgment basis for the assessment staff. However, SOP, LOR, and other features are qualitative descriptions that need to be translated into a format suitable for model training through text processing and feature extraction techniques. Alternatively, they may be artificially classified to highlight the impact of this type of data on student evaluations. In addition, special attention needs to be paid when addressing university rankings and research functions, two parameters that are not directly relevant to students. This is necessary to ensure the consistency and accuracy of data in data processing and transformation.
Due to the nature of manual data entry, there may be gaps or incorrect entries, which can introduce missing values or noise into the data set. Therefore, the data set must be preprocessed before use. The pre-processing steps include data cleaning and missing-value imputation. Data cleaning removes noise and errors from the data, while missing-value imputation fills gaps in the data and improves its integrity. Through this pre-processing, the integrity of the data is guaranteed, providing sufficient quality data for the construction of the model [29,30,31].
These pre-processing steps are designed to improve data quality, reduce noise, and ensure that the model is trained and tested on reliable input data. By implementing these steps, the model can effectively utilize the information contained in the data, thereby improving its predictive performance and stability [32,33,34,35].
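As an illustration of these steps, the following sketch shows a minimal cleaning and imputation pass in Python with pandas. The column names and values are hypothetical stand-ins for the dataset described in [28], and median imputation is only one simple choice among many.

```python
import pandas as pd

# Hypothetical rows with a few of the seven features; None marks missing entries.
df = pd.DataFrame({
    "GRE Score": [337, 324, None, 322],
    "TOEFL Score": [118, 107, 104, 110],
    "CGPA": [9.65, 8.87, 8.00, None],
    "Research": [1, 1, 1, 1],
})

# Data cleaning: drop exact duplicate rows introduced by double entry.
df = df.drop_duplicates()

# Missing-value imputation: fill remaining gaps with the column median,
# a simple choice that is robust to outliers.
df = df.fillna(df.median(numeric_only=True))

print(int(df.isna().sum().sum()))  # number of remaining gaps
```

After this pass, every cell holds a numeric value, so the table can be passed to a model without further gap handling.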

3. Proposed Methodology

3.1. ISSA

The sparrow search algorithm (SSA) is a population intelligence optimization algorithm inspired mainly by the foraging and anti-predation behavior of sparrows [36,37]. During foraging, the population is divided into two roles: discoverers (finders) and entrants. The discoverers are responsible for finding food and providing foraging areas and directions for the entire population, while the entrants rely on the discoverers to obtain food. Individuals monitor the behavior of other members of the group, and some compete with their peers for food resources to increase their own predation rates. When the population is attacked by a predator, it engages in anti-predation behavior: sparrows on the periphery, which are most vulnerable, constantly adjust their positions for better protection, while sparrows in the center move closer to their neighbors to minimize their danger zone [38].
In SSA, the finder with a better fitness value obtains food first (i.e., reaches the better solution) during the search. Because the finders search for food on behalf of the entire population and provide foraging directions for all of the entrants, they have access to a larger search area than the entrants. During foraging, the entrants continually monitor the discoverers, either competing with them for food or foraging around them.
The location of the discoverer changes as the number of iterations is updated.
$$x_{i,j}^{t+1} = \begin{cases} x_{i,j}^{t} \cdot \exp\left(\dfrac{-i}{\alpha \cdot iter_{\max}}\right), & R_2 < ST \\ x_{i,j}^{t} + Q \cdot L, & R_2 \ge ST \end{cases}$$

where $t$ denotes the current iteration, $iter_{\max}$ is the maximum number of iterations, $\alpha \in (0, 1]$ is a random number, $Q$ is a random number drawn from a normal distribution, $L$ is a $1 \times d$ matrix of ones, $R_2 \in [0, 1]$ is the alarm value, and $ST \in [0.5, 1]$ is the safety threshold.
Each sparrow takes into account the optimal solution of the neighbor and, with a certain probability, moves towards the neighbor’s position. It hops randomly within a certain distance with a certain probability, centered on the current optimal solution. Additionally, each sparrow randomly perturbs its position with a certain probability to avoid falling into the local optimal solution. The current solution is evaluated according to the fitness function, and the optimal solution and the location of the optimal solution are updated. The algorithm terminates when a preset stopping condition is reached, such as reaching the maximum number of iterations or meeting certain accuracy requirements.
When the discoverers find the food, the position of the entrants is updated as follows:
$$x_{i,j}^{t+1} = \begin{cases} Q \cdot \exp\left(\dfrac{x_{worst}^{t} - x_{i,j}^{t}}{i^{2}}\right), & i > \dfrac{n}{2} \\ x_{p}^{t+1} + \left| x_{i,j}^{t} - x_{p}^{t+1} \right| \cdot A^{+} \cdot L, & i \le \dfrac{n}{2} \end{cases}$$
where $x_p$ is the optimal position currently occupied by the discoverer and $x_{worst}$ is the current global worst position. $A$ is a $1 \times d$ matrix in which each element is randomly assigned either 1 or −1, and $A^{+} = A^{T}(AA^{T})^{-1}$. When $i > n/2$, the $i$-th entrant with a low fitness value has not obtained food and must fly elsewhere to search for new food.
The alert’s location is updated below.
$$x_{i,j}^{t+1} = \begin{cases} x_{best}^{t} + \beta \cdot \left| x_{i,j}^{t} - x_{best}^{t} \right|, & f_i > f_g \\ x_{i,j}^{t} + K \cdot \dfrac{\left| x_{i,j}^{t} - x_{worst}^{t} \right|}{(f_i - f_w) + \varepsilon}, & f_i = f_g \end{cases}$$

where $x_{best}$ is the current global best position, $\beta$ is a step-size control parameter drawn from a normal distribution, $K \in [-1, 1]$ is a random number, $f_i$ is the fitness of the current sparrow, $f_g$ and $f_w$ are the current global best and worst fitness values, and $\varepsilon$ is a small constant that avoids division by zero.
SSA has a relatively strong search ability, but it easily becomes trapped in local optima because it cannot jump out of local extrema when faced with large data sets. To address this, an improved sparrow search algorithm (ISSA) is proposed in this study. Firstly, a normal migration strategy is proposed, which bases the population migration on the center-of-gravity position and realizes a normally distributed attenuation of moving energy. This improvement effectively strengthens the exploration ability of the local search, enabling the algorithm to explore the solution space more efficiently and find better solutions during the search. Secondly, the ISSA introduces a dynamic sinusoidal perturbation strategy. Through a scaling factor, this strategy satisfies the discoverer's two-sided demand for a large search step size in the early stage and fast convergence in the late stage: at the start of the search, the algorithm needs a larger step size to explore a wider solution space, while in the later stage, smaller steps are needed to converge quickly to the optimal solution. By adjusting the scaling factor, the dynamic sinusoidal perturbation strategy meets this demand, improving both the search efficiency and the precision of the algorithm. Such a process can be realized as follows:
$$x_{i,j}^{t+1} = \begin{cases} Levy(d) \cdot x_{best}^{t} + \beta \cdot \left| x_{i,j}^{t} - Levy(d) \cdot x_{best}^{t} \right|, & f_i > f_g \\ x_{i,j}^{t} + K \cdot \dfrac{\left| x_{i,j}^{t} - x_{worst}^{t} \right|}{(f_i - f_w) + \varepsilon}, & f_i = f_g \end{cases}$$

$$Levy(d) = 0.01 \cdot \dfrac{r_1 \cdot \sigma}{\left| r_2 \right|^{\frac{1}{\beta}}}$$

$$\sigma = \left( \dfrac{\Gamma(1+\beta) \cdot \sin\left(\frac{\pi \beta}{2}\right)}{\Gamma\left(\frac{1+\beta}{2}\right) \cdot \beta \cdot 2^{\frac{\beta-1}{2}}} \right)^{\frac{1}{\beta}}$$
With the improvements proposed in this study, the ISSA greatly reduces the risk of the sparrows falling into local optima while maintaining sufficient local search capability. Here, $r_1$ and $r_2$ are random numbers drawn from $[0, 1]$.
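The $Levy(d)$ perturbation can be sketched as follows. This minimal illustration follows the formulas above literally, drawing $r_1$ and $r_2$ uniformly from $[0, 1]$ as stated, with a small guard added against a zero draw.

```python
import math
import random

def levy_step(beta=1.5):
    """Levy(d) perturbation: sigma from the gamma-function expression,
    then 0.01 * r1 * sigma / |r2|^(1/beta)."""
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    r1 = random.random()              # random number in [0, 1]
    r2 = max(random.random(), 1e-12)  # guard against division by zero
    return 0.01 * r1 * sigma / r2 ** (1 / beta)
```

Multiplying the current best position by such a step, as in the modified update rule, occasionally produces large jumps, which is what lets the population escape local optima.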

3.2. RF

The random forest algorithm is a machine learning technique based on the bootstrap aggregating (bagging) ensemble learning method. In essence, it is a composite algorithm composed of multiple decision tree models [39,40]. Each decision tree acts as a classifier within the random forest model. Bootstrap random sampling is used to draw a certain amount of data from the dataset, and corresponding training and testing subsets are generated for each tree [41,42]. The outputs of all the decision trees are then aggregated to form the final prediction of the random forest model [41,43].
The calculation steps of the RF model are outlined as follows:
  • Employ the bootstrap sampling approach to randomly draw n samples from the sample collection.
  • For each decision tree, randomly select a subset of attributes from the full attribute set, and identify the optimal splitting attribute to serve as the node.
  • Repeat this process to grow multiple decision trees, which together form the random forest. The classification of data is then ascertained via mode voting, with the mode voting equation being as follows:
$$H(x) = \arg\max_{Y} \sum_{i=1}^{k} I\left(h_i(x) = Y\right)$$
where $H(x)$ denotes the final voting result of the RF model; $h_i(x)$ denotes the prediction of the $i$-th decision tree; $I(\cdot)$ is the indicator function; and $Y$ denotes the output class.
Because the outcome of a solitary decision tree is highly sensitive to its training set, the final result of the algorithm is taken as the category with the greatest vote count. Hence, multiple decision trees are incorporated into the random forest model; by increasing the number of bootstrap samples, this effectively mitigates the correlation among the individual decision trees.
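The mode-voting rule above can be sketched with Python's standard library; the class labels here are purely illustrative.

```python
from collections import Counter

def rf_vote(tree_predictions):
    """Mode voting: H(x) = argmax_Y sum_i I(h_i(x) = Y), i.e. the class
    predicted by the largest number of individual trees wins."""
    return Counter(tree_predictions).most_common(1)[0][0]

# Three of five hypothetical trees predict "admit", so the forest outputs "admit".
print(rf_vote(["admit", "reject", "admit", "admit", "reject"]))
```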

3.3. Ten-Fold Cross-Validation and Judgment Index

3.3.1. Ten-Fold Cross-Validation

Ten-fold cross-validation is a commonly used model validation technique primarily aimed at assessing the specific performance and generalization capability of machine learning models [44,45,46]. The main approach of this technique involves dividing the dataset into multiple parts, repeatedly training and validating the model on different subsets, and using the results from multiple validations to cross-check the model’s predictive ability, thus providing a more stable and reliable performance estimate. The specific process is as follows:
  • The entire data set is randomly divided into 10 subsets of similar size.
  • One subset is selected as the verification set and the other nine subsets are selected as the training set. The training set is used as the data for model construction and training. The rest of the verification set is used to check the accuracy of the model built at this time.
  • The above steps are repeated 10 times, each time selecting a different subset as the validation set to ensure that the validation set used in each experiment is unique.
  • The performance indicators of each iteration are recorded, and the values from the ten experiments are compared in order to evaluate the accuracy and generalizability of the model.
Of course, in different cases, the original data set can be divided into a different number of folds to increase the number of model tests and enhance the precision of the estimate of the model's predictive ability. Given the size of the original data set and the model chosen, the number of folds in this paper is set to ten.
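The fold-splitting procedure above can be sketched in plain Python as follows; the sample count and seed are arbitrary illustrative values.

```python
import random

def k_fold_indices(n_samples, k=10, seed=42):
    """Shuffle sample indices, then deal them into k roughly equal folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

folds = k_fold_indices(500, k=10)
for val_fold in folds:
    # Each iteration: one fold is the validation set, the rest form the
    # training set; a real run would fit and score the model here.
    train_idx = [j for f in folds if f is not val_fold for j in f]

print(len(folds), sum(len(f) for f in folds))
```

Every index lands in exactly one fold, so each data point serves as validation data exactly once across the ten iterations.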

3.3.2. Judgment Index

The RMSE and R values were employed to assess the discrepancy between the predicted and actual values and to validate the predictive accuracy of the admission evaluation models. The RMSE value can be computed using the following formula:
$$RMSE = \sqrt{\dfrac{\sum_{i=1}^{n}\left(y_i^{*} - y_i\right)^{2}}{n}}$$
where $n$ represents the number of samples, and $y_i^{*}$ and $y_i$ represent the predicted and actual evaluation values, respectively.
The formula for R can be summarized as follows:
$$R = \dfrac{\sum_{i=1}^{n}\left(y_i^{*} - \bar{y}^{*}\right)\left(y_i - \bar{y}\right)}{\sqrt{\sum_{i=1}^{n}\left(y_i^{*} - \bar{y}^{*}\right)^{2} \cdot \sum_{i=1}^{n}\left(y_i - \bar{y}\right)^{2}}}$$
where $\bar{y}^{*}$ and $\bar{y}$ denote the means of the predicted and actual values, respectively.
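Both metrics follow directly from their definitions; the following Python sketch uses small made-up prediction vectors purely for illustration.

```python
import math

def rmse(pred, actual):
    """Root mean square error between predicted and actual values."""
    n = len(pred)
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(pred, actual)) / n)

def corr_r(pred, actual):
    """Pearson correlation coefficient R between predicted and actual values."""
    mp = sum(pred) / len(pred)
    ma = sum(actual) / len(actual)
    num = sum((p - mp) * (a - ma) for p, a in zip(pred, actual))
    den = math.sqrt(sum((p - mp) ** 2 for p in pred)
                    * sum((a - ma) ** 2 for a in actual))
    return num / den

# Hypothetical chance-of-admission scores, purely for illustration.
pred = [0.90, 0.80, 0.70]
actual = [0.92, 0.78, 0.69]
print(round(rmse(pred, actual), 4), round(corr_r(pred, actual), 4))
```

A small RMSE together with an R close to 1 indicates predictions that track the actual values closely, which is the criterion applied to the models in Section 4.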

4. Results and Discussion

4.1. Data Analysis

Before constructing a predictive model, it is crucial to conduct a comprehensive pre-processing of the original data set. Raw data sets often contain incomplete or noisy information, and the inaccuracy and instability of these data can have a significant negative impact on the training effectiveness of the model [47,48,49]. Therefore, to ensure that the model can learn and predict accurately, the data must be carefully cleaned and prepared before the model-building phase [50,51,52].
To eliminate these potential disadvantages, data cleaning is particularly important. Data cleansing aims to find and correct erroneous data and noisy information that are not relevant to the model's prediction goals, thus ensuring the purity of the data set. This process is essential for reducing the negative impact of data quality issues on model training results [53,54].
In this study in particular, we focus mainly on the relevant performance data of the applicants. Such data collection is prone to errors or data gaps, which, if not properly addressed, can greatly hinder efforts to build a model that accurately predicts student recruitment. Therefore, in the data pre-processing stage, special attention should be paid to the calibration and correction of student performance data to ensure that each datum can accurately reflect the actual academic level of students [55,56].

4.2. Correlation Analysis

After the initial processing of the data, correlation analysis of all input features is needed to improve the accuracy of the model. The purpose of this step is to determine if there is an excessive correlation between the features [57,58,59]. If the correlation value between two features is too high, it indicates that the relationship between them is strong, i.e., when one feature changes, the other feature may also change. This situation can affect the accuracy of the graduate student admission prediction model, as highly correlated features can introduce redundant information into the data set, which can degrade the performance of the model [60,61,62].
By analyzing the correlation matrix (Figure 1), we found that most of the correlation values between features are below 0.5, indicating weak correlations between them. However, there is an exception where the correlation value between the university rating and SOP is 0.5, showing a moderate level of correlation. Nevertheless, this correlation value remains within an acceptable range and will not have a significant impact on model training. The results of this correlation analysis indicate that the selected input features generally meet the requirements for model training and effectively mitigate the impact of feature relationships on the prediction model’s accuracy, providing a solid foundation for constructing an accurate and reliable graduate admissions prediction model.
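A correlation check of this kind can be sketched with pandas as follows; the feature values are made-up stand-ins, so the resulting matrix will not reproduce Figure 1.

```python
import pandas as pd

# Toy data using the paper's feature names; values are illustrative only.
df = pd.DataFrame({
    "GRE Score": [337, 316, 322, 314, 330],
    "TOEFL Score": [118, 104, 110, 103, 115],
    "University Rating": [4, 3, 3, 2, 5],
    "SOP": [4.5, 3.0, 3.5, 2.0, 4.5],
    "CGPA": [9.65, 8.00, 8.67, 8.21, 9.34],
})

# Pearson correlation matrix; feature pairs with correlations well above 0.5
# would be candidates for removal to avoid redundant information.
corr = df.corr()
print(corr.round(2))
```

In practice, one would inspect the off-diagonal entries of this matrix and, if any pair exceeded the chosen threshold, drop or combine one of the two features before training.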

4.3. Hyperparameter Tuning

After determining the selected features for the model, the collected dataset is used for model construction. However, when building the graduate admissions prediction model, the initial random forest (RF) model requires the manual tuning of several hyperparameters to ensure the model’s predictive accuracy. The selection of hyperparameters requires a high level of expertise and experience from engineers. For those with less experience or data sensitivity, finding the most suitable hyperparameter values for the graduate admissions prediction problem within a short time frame can be challenging.
To enhance the model’s predictive accuracy and reliability, this study employs an improved sparrow search algorithm (ISSA) to adjust and optimize the hyperparameters of the random forest algorithm. Through optimization by the ISSA, the model can identify more suitable hyperparameters, thereby better adapting to the graduate admissions prediction task.
Figure 2 shows the changes in the RF model RMSE after multiple hyperparameter selections using the ISSA algorithm. Evidently, with the increase in the number of ISSA iterations, the RMSE of the integrated ISSA–RF model gradually nears the optimal value. After the 14th iteration, the RMSE of the combined model stabilizes. This indicates that the ISSA algorithm has a significant optimization effect on the RF model, and that the chosen hyperparameters effectively enhance the RF model’s information processing capability, thereby improving its predictive accuracy and applicability.
In summary, the ISSA algorithm demonstrates excellent performance in optimizing the RF model’s hyperparameters. By leveraging the global search capability of the ISSA algorithm and the strong predictive power of the RF model, the combined ISSA–RF model shows more precise and reliable performance in the graduate admissions prediction task.
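The overall search loop can be illustrated as follows. This is a deliberately simplified stand-in: the candidate-generation step uses plain random draws rather than the finder/entrant/alerter updates of the ISSA, and the fitness function is a synthetic placeholder for the cross-validated RMSE of an RF model, so that the sketch stays self-contained. The hyperparameter names are illustrative.

```python
import random

def evaluate_rmse(params):
    """Stand-in fitness: in the real pipeline this would train an RF with
    `params` under cross-validation and return its RMSE. A synthetic
    bowl-shaped surface keeps the sketch self-contained."""
    return ((params["n_estimators"] - 120) ** 2 / 1e4
            + (params["max_depth"] - 8) ** 2 / 1e2)

rng = random.Random(0)
best, best_rmse = None, float("inf")
# Each candidate plays the role of one "sparrow"; the ISSA would move
# candidates with the finder/entrant/alerter rules instead of fresh draws.
for _ in range(50):
    candidate = {"n_estimators": rng.randint(10, 300),
                 "max_depth": rng.randint(2, 20)}
    fitness = evaluate_rmse(candidate)
    if fitness < best_rmse:
        best, best_rmse = candidate, fitness

print(best, round(best_rmse, 4))
```

The ISSA replaces the fresh random draws with guided position updates, which is why its RMSE curve in Figure 2 converges within a small number of iterations rather than wandering.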

4.4. Model Construction

The following two figures display the simulation results of the ISSA–RF model using the collected student data for graduate admissions evaluation. Figure 3 presents the comparison between the predicted outcomes and the actual outcomes during the model training period. The green bars in the figure vividly display the discrepancies between the predicted values and the actual values. Generally speaking, the model shows high prediction accuracy throughout the training phase. Although there are some instances where the predicted values deviate significantly from the actual values, the majority of these deviations are below 0.05, indicating that the model effectively captures the relationship between student data and evaluation outcomes [63,64].
Figure 4 reveals a more pronounced trend during the testing phase. There are some larger differences, which may be attributed to noise or outliers in the collected data. However, these anomalies have a limited impact on the overall predictive performance of the model. Despite a few larger discrepancies, the model maintains high prediction accuracy, further validating its robustness in practical applications.
This robustness is not only reflected in the model’s fit to the training data but also in its ability to provide reliable predictions when dealing with unseen data. This capability is particularly crucial for graduate admissions evaluation, as it involves predicting students’ future academic potential and research abilities. It helps the admissions committee make more scientific and rational decisions, providing students with a fairer competitive opportunity.
The graphs above illustrate only the discrepancy between predicted and actual values; they lack intuitive statistics by which to express the quality of the model training results. Therefore, the data points are also presented in the form of Figure 5a, and the training results are judged by the specific values of the root mean square error (RMSE) and the correlation coefficient (R), which reflect the predictive accuracy of the model.
In the model training phase, as depicted in Figure 5a, the discrepancy between the predicted results and actual values is kept within a narrow range. Most predicted points are closely related to the central line, with only a few data points slightly deviating from the expected trajectory. This close fit is consistent with the statistical data results, with an RMSE of 0.0543 and a correlation coefficient (R) as high as 0.9281. This indicates that the model has a high predictive accuracy for the training data, accurately capturing the complex relationship between input features and output results. However, it is important to be cautious about potential overfitting phenomena. The results may be due to overfitting between the data and the model during the training phase, leading to a very high similarity between predicted and actual data. This could harm the practical application of the model.
Figure 5b shows the results of testing with the remaining 30% of the data. The gap between predicted and actual values widens, but the overall performance remains satisfactory: the testing-phase RMSE is 0.0653 and the correlation coefficient (R) is 0.8803. Although accuracy in the testing phase is slightly lower than in training, the difference lies within an acceptable range. This implies that the model not only fits the training data well but also retains high predictive accuracy and generalization power on unseen test data. This robustness further validates the effectiveness of the model, indicating that the high precision achieved during training is not a product of overfitting.
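The two metrics used throughout this section are straightforward to compute from vectors of actual and predicted admission chances. A minimal plain-Python sketch (the example vectors below are illustrative placeholders, not the paper's data):

```python
import math

def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def pearson_r(actual, predicted):
    """Pearson correlation coefficient R between two sequences."""
    n = len(actual)
    ma = sum(actual) / n
    mp = sum(predicted) / n
    cov = sum((a - ma) * (p - mp) for a, p in zip(actual, predicted))
    sa = math.sqrt(sum((a - ma) ** 2 for a in actual))
    sp = math.sqrt(sum((p - mp) ** 2 for p in predicted))
    return cov / (sa * sp)

# Placeholder values for illustration only.
y_true = [0.72, 0.65, 0.90, 0.81, 0.58]
y_pred = [0.70, 0.68, 0.88, 0.83, 0.60]
print(round(rmse(y_true, y_pred), 4), round(pearson_r(y_true, y_pred), 4))
```

A lower RMSE and an R value closer to 1 indicate a tighter fit between predictions and observations, which is exactly how Figure 5 is read.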
In summary, through the performance analysis of the model during both the training and testing phases, it is concluded that the model has significant application value in graduate student recruitment analysis. The model provides intuitive and accurate data support for the faculty team, helping them select the most suitable candidates for graduate programs from a large pool of students. This data-driven analytical approach not only improves the efficiency of the recruitment process but also enhances the scientific and objective nature of decision-making.

4.5. Ten-Fold Cross-Validation Result

Although preliminary experimental results show that the composite model performs well in graduate admissions evaluation, these results are based on a specific dataset. In practical application, the model will face a variety of new data and must therefore generalize well to handle these unknown inputs effectively. Performing 10-fold cross-validation is thus a key step in further evaluating the model's performance and generalization ability.
The 10-fold cross-validation method divides the dataset into ten parts; in each round, one part serves as the test set and the remaining nine are used for training. In this way, every data point is used for testing exactly once, which probes the robustness of the model across different data distributions and guards against artifacts of any particular split. After the ten rounds, the mean and standard deviation of the model's performance can be calculated, giving a more accurate picture of its predictive power and generalization potential.
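The fold-splitting and summary steps described above can be sketched in plain Python. The per-fold RMSE values here are hypothetical placeholders; in the real pipeline each fold would train the ISSA-tuned RF on the training indices and score it on the held-out indices:

```python
import statistics

def kfold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) index pairs for k-fold cross-validation;
    every sample lands in exactly one test fold."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    indices = list(range(n_samples))
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size

# Hypothetical per-fold RMSEs used only to illustrate the summary step.
fold_rmse = [0.062, 0.058, 0.065, 0.071, 0.060, 0.063, 0.059, 0.066, 0.061, 0.064]
print(f"mean RMSE = {statistics.mean(fold_rmse):.4f}, "
      f"std = {statistics.pstdev(fold_rmse):.4f}")
```

A small standard deviation across folds is the signal, discussed below, that the model is not overly sensitive to any particular split of the data.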
In addition, the 10-fold cross-validation results can be used to determine whether the model shows signs of overfitting or underfitting. As observed above, the model's training-stage results are highly similar to the actual data, which might indicate overfitting; cross-validation allows this to be judged directly. If the model's performance fluctuates greatly across folds, the model is likely too sensitive to the training data, i.e., it overfits and lacks the necessary generalization ability. Careful analysis of the cross-validation results therefore allows the stability and reliability of the model to be evaluated, and the model to be adjusted and optimized accordingly to improve its effectiveness in practical application.
Based on the test results provided (Figure 6), the model fits the dataset to a high degree: the results are similar across folds, with no excessive fluctuation. Even in the worst fold, the RMSE is only 0.071, well below generally acceptable thresholds. This indicates that the model not only captures the underlying pattern of the dataset but also generalizes well, and it confirms that the experimental results above are reasonable and that the model did not overfit during training. This generalizability means the model can maintain high prediction accuracy even when facing unknown data.
In addition, low RMSE values indicate little deviation between model predictions and actual observed values, which is particularly important in practical applications. When building predictive models for graduate school admissions, for example, this high level of predictive accuracy will help admissions committees more accurately assess applicants’ potential, leading to more informed and well-reasoned admissions decisions.
The results from the testing experiments indicate that the ISSA–RF model used in this study achieves high predictive accuracy and provides effective data support for graduate admissions prediction. However, the construction of the model is not straightforward and involves several key feature parameters (CGPA, GRE score, SOP, LOR, university rating, research, TOEFL score). From the data alone, it is not possible to determine the impact of each feature on the admissions prediction results. While all of these features play important roles in the model, their contributions to the final prediction results vary. Therefore, performing an importance analysis of the feature factors, using importance scores as a criterion, helps to visually express which feature factors have a greater influence on the prediction results, thereby increasing the admissions team’s focus on these features.

4.6. Importance of Input Variables

Figure 7 presents the importance of the input variables. The analysis showed that the importance score of cumulative grade point average (CGPA) was 3.3234, significantly higher than the scores of the other six features. This result underscores the critical role of CGPA as a key indicator of a student's capacity for graduate-level study. The admissions committee should therefore give priority to an applicant's CGPA in the selection process to improve the validity and accuracy of the evaluation.
The higher a feature's importance score, the more the model relies on that attribute to distinguish between outcomes during prediction. Heavy reliance on such a feature helps to mitigate the impact of sample noise and incomplete data, reducing the negative effect of defects in other variables and thus improving the model's prediction accuracy. Conversely, the relatively low importance scores for research experience and TOEFL score mean that these have a fairly limited impact on the graduate admissions prediction process.
With this in mind, admissions teams may want to consider reducing the emphasis on these two attributes in their selection criteria. Alternatively, they may explore the possibility of eliminating or replacing these features in subsequent model iterations to improve the model’s performance and ensure that the selection process is aligned with the most relevant and influential factors in graduate research.
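A ranking of this kind can be produced directly from the model's importance scores. In the sketch below, only the CGPA value (3.3234) comes from the paper; the remaining scores are hypothetical placeholders used solely to illustrate the ranking and normalization step:

```python
def rank_features(importances):
    """Sort features by importance and attach each one's share of the total."""
    total = sum(importances.values())
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, score, score / total) for name, score in ranked]

# Only the CGPA value is reported in the paper; the rest are placeholders.
scores = {"CGPA": 3.3234, "GRE score": 0.9, "TOEFL score": 0.3,
          "university rating": 0.5, "SOP": 0.4, "LOR": 0.45, "research": 0.25}
for name, score, share in rank_features(scores):
    print(f"{name:18s} {score:6.4f}  ({share:.1%})")
```

Expressing each score as a share of the total makes it easy to see how dominant a single feature such as CGPA is relative to the rest.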

5. Conclusions

The primary objective of this study is to utilize artificial intelligence models for processing and analyzing data related to graduate admissions evaluations and to establish an evaluation model based on this analysis. Currently, commonly used prediction models typically rely on a single core model, which often lacks the necessary generalization capability when faced with unseen data. Therefore, in this research, the ISSA algorithm is utilized as a supplementary means by which to optimize the hyperparameters of the RF model, to improve the machine learning model’s responsiveness to the correlation between graduate admissions outcomes and student-related data, and to further improve the model’s predictive performance. By constructing a dataset from collected student data and admission outcomes, this dataset serves as the foundational training and testing data for the model. Statistical metrics are used to evaluate the model’s strengths and weaknesses. Based on the research findings, the following conclusions can be drawn:
Employing the ISSA algorithm for hyperparameter adjustment effectively enhanced the predictive capability of the RF model. The RMSE test data clearly show that, as the number of iterations increases, the gap between predicted results and actual data gradually decreases. This indicates a high degree of fit between the ISSA algorithm and the RF model, effectively leveraging the strengths of both.
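The paper does not reproduce the ISSA update equations at this point. As a rough illustration of how a sparrow-search-style optimizer can tune RF hyperparameters, the following greatly simplified sketch minimizes a toy stand-in for validation RMSE; the objective function, bounds, and update rule are all assumptions for illustration, not the paper's ISSA:

```python
import random

def objective(params):
    # Toy stand-in for validation RMSE; a real run would train an RF with
    # these hyperparameters and return its cross-validated RMSE.
    n_estimators, max_depth = params
    return abs(n_estimators - 120) / 200 + abs(max_depth - 8) / 20

def sparrow_search(obj, bounds, pop=20, iters=50, seed=0):
    """Simplified sparrow-search-style loop: candidates drift toward the
    current best solution, with occasional random restarts to avoid
    stagnation. Illustrative only."""
    rng = random.Random(seed)

    def rand_point():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    swarm = [rand_point() for _ in range(pop)]
    best = min(swarm, key=obj)
    for _ in range(iters):
        for i, x in enumerate(swarm):
            # Move each candidate a random step toward the best, clamped to bounds.
            cand = [min(max(xi + rng.gauss(0, 0.3) * (bi - xi), lo), hi)
                    for xi, bi, (lo, hi) in zip(x, best, bounds)]
            if obj(cand) < obj(x):
                swarm[i] = cand
        if rng.random() < 0.1:  # anti-stagnation restart
            swarm[rng.randrange(pop)] = rand_point()
        best = min(swarm + [best], key=obj)
    return best, obj(best)

best, score = sparrow_search(objective, [(10, 300), (2, 20)])
print(f"n_estimators~{best[0]:.0f}, max_depth~{best[1]:.0f}, objective={score:.4f}")
```

Because the incumbent best is retained at every iteration, the objective value decreases monotonically with the iteration count, matching the RMSE convergence behavior described above.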
The ISSA–RF algorithm effectively reduces the impact of noise and outliers in the dataset. Despite the presence of some prediction values that significantly deviate from actual values in both the training and testing datasets, these anomalies have a limited impact on the model’s overall predictive accuracy. The model demonstrates high predictive precision, with a statistical parameter R reaching 0.9281.
The analysis of feature importance reveals that a student’s CGPA has the greatest impact on graduate admissions outcomes. In contrast, the importance scores for research and TOEFL score are very low. This indicates that the most crucial assessment parameter for graduate admissions is the student’s academic performance rather than extracurricular factors. Future admissions evaluations could consider modifying student feature indicators to include more specific in-school data rather than extracurricular achievements.
Overall, the study demonstrates that incorporating ISSA for hyperparameter optimization and focusing on the most influential features significantly improves the predictive performance and applicability of the graduate admissions model. However, this study still has several limitations. First, only the single ISSA algorithm was combined with the RF model; although the experimental results show that this combination achieves high accuracy for this task, this does not mean that other algorithms are less capable than ISSA. Second, the data used in this study come from a single source and may prove inadequate when faced with different problems. Future research will combine and compare a wider range of algorithms and models in search of higher accuracy, and will collect more diverse data to train a model that can be widely applied in this field.

Author Contributions

Conceptualization, E.L., Z.W. and J.L.; methodology, E.L., Z.W. and J.L.; software, J.H.; validation, E.L., Z.W. and J.L.; formal analysis, E.L. and Z.W.; investigation, E.L., Z.W. and J.L.; writing—original draft preparation, E.L., Z.W. and J.L.; writing—review and editing, E.L., Z.W., J.L. and J.H.; supervision, J.L. and J.H.; funding acquisition, J.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Ministry of Education industry-university cooperative education project (2412273220), 2024 Humanities and Social Sciences Youth Fund Project of the Ministry of Education of China (24YJC760131), 2024 Higher Education Research Project of the Guangzhou Municipal Education Bureau (2024312216), Ministry of Science and Higher Education of the Russian Federation within the framework of the state assignment No. 075-03-2022-010 dated 14 January 2022, No. 075-01568-23-04 dated 28 March 2023 (Additional agreement 075-03-2022-010/10 dated 9 November 2022, Additional agreement 075-03-2023-004/4 dated 22 May 2023), FSEG-2022-0010, Ministry of Science and Higher Education of Russian Federation (funding No FSFM-2024-0025), Natural Science Foundation of Guangdong Province (2024A1515011162), Natural Science Foundation of Shandong Province (ZR2024QE021), and Guangzhou Science and technology planning project (2024A04J3831).

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding authors.

Conflicts of Interest

The authors declare no conflict of interest.

Figure 1. Correlation analysis.
Figure 2. Hyperparameter tuning.
Figure 3. Training set for the actual and predicted admission.
Figure 4. Test set for the actual and predicted admit.
Figure 5. A comparison of the admit for actual and predicted data: (a) training dataset; (b) testing dataset.
Figure 6. Ten-fold cross-validation.
Figure 7. Importance of input variables.
