Using Decision Trees and Random Forest Algorithms to Predict and Determine Factors Contributing to First-Year University Students’ Learning Performance

Abstract: First-year students’ learning performance has received much attention in educational practice and theory. Previous works built prediction models from variables that can only be obtained during the course or as the semester progresses, through questionnaire surveys and interviews. These models cannot provide sufficiently timely support for students whose poor performance is caused by economic factors. Therefore, other variables are needed that allow prediction results to be reached earlier. This study attempts to use family background variables that can be obtained prior to the start of the semester to build learning performance prediction models of freshmen using random forest (RF), C5.0, CART, and multilayer perceptron (MLP) algorithms. A real sample of 2407 freshmen who enrolled in 12 departments of a Taiwan vocational university was employed. The experimental results showed that CART outperforms the C5.0, RF, and MLP algorithms. The most important features were mother’s occupations, department, father’s occupations, main source of living expenses, and admission status. The extracted knowledge rules are expected to serve as indicators for students’ early performance prediction so that strategic intervention can be planned before students begin the semester.


Introduction
Institutional research (IR) comprises a set of activities that support institutional planning, policy development, and decision making within higher education institutions (HEIs) [1]. In recent years, the urge to achieve excellence in research has led HEIs to become more aware of their roles in the entire educational management process and to place more strategic emphasis on developing assessment tools for monitoring and evaluating research quality [2]. In the USA and Japan, IR has been widely and successfully applied to evaluation, strategic planning, budget analysis, enrollment management, and research studies. These studies focus on income analysis, research activities, and some issues reflecting the strategic targets of HEIs. The issues studied may differ from those facing technical and vocational universities and colleges in Taiwan [3]. Thus, Taiwanese technical and vocational universities need to identify their own IR issues for their specific targets and constraints.
Students are indispensable participants in universities, and their learning performance and attitudes towards their campuses should be seriously evaluated, since these not only impact students' motivation but also affect teaching quality and shape the design and delivery of university courses [4]. In particular, students' early performance prediction is important to academic communities so that strategic intervention can be planned before students reach the final semester. If universities in general, and Taiwanese technical and main source of living expenses, student loan, tuition waiver, parents' average income, status, occupations, and education. These variables can be obtained before the start of the semester, so that predictions can be made before the freshmen start to learn, buying more time for student guidance or for investing learning resources in technological and vocational education. In sum, this paper aims to build a prediction model that can be used to predict freshmen's learning performance based on decision tree and random forest algorithms. The sample was 2407 freshmen who enrolled in 12 departments of a university in Taiwan. From the constructed model, we can determine which students are likely to succeed and which are likely to perform poorly; the university is then able to offer them the necessary assistance before they start their sophomore year. Based on the experimental results, we can highlight some factors that strongly affect first-year undergraduates' learning performance.

The Learning Performance of First-Year Students
Students' learning performance plays a vital role in universities since it affects both individual and organizational performance [17,18]; therefore, studies on factors and variables affecting students' learning performance have existed for decades and have continuously attracted an increasing number of diverse researchers. In 1975, four factors were identified as causing students' poor academic performance: (1) society, (2) school, (3) family, and (4) student [19,20]. In contrast, general factors affecting successful learning performance were highlighted in [21]. In particular, the authors in [22] reported that factors such as gender, students' ages, and students' high school scores in mathematics, English, and economics affected university students' scores, and they also concluded that students with high scores in high school performed better at university level. Additionally, the authors in [23] studied the relationship between students' matriculation exam scores and their academic performance and found that a student's admission scores positively affected their undergraduate performance.
The idea of applying data mining in the educational system attracted authors in [24] in 2007 since data mining can show discovered knowledge to educators and academic teams, and show recommendations to students. Moreover, authors in [25] used ANN for university educational systems while the authors in [19] applied ANN in a narrower field of academic performance prediction in university. Particularly, Oladokun et al. [19] utilized an ANN model to predict students' academic performance based on factors, such as ordinary level subjects' scores and subjects' combination, matriculation exam scores, age on admission, parental background, types and location of secondary schools attended, and gender. Students' learning performance was predicted based on their average point scores (APS) of Grade 12 in [8], on high school scores in [17], and on cumulative grade point average (CGPA) in fundamental subjects [18].
The predictors of first-year student success have received much attention in educational practice and theory [11], and many researchers have addressed this issue. For example, Ayala and Manzano [12] investigated whether a relationship exists between the dimensions of resilience and engagement and the academic performance of first-year university students. In 2019, Baneres et al. [13] aimed to identify at-risk students by building a predictive model using students' grades. Their model can predict at-risk students during the semester on a first-year undergraduate course in computer science. Neumann et al. [15] focused on first-year international students in undergraduate business programs at an English-medium university in Canada. They found a positive relationship between students' academic self-concept and subsequent academic achievement. Anderton [26] identified gender and the Australian Tertiary Admissions Rank as significant predictors of academic performance. After surveying 80 published articles, Zanden et al. [11] found that some predictors contributed to multiple domains of success, including students' previous academic performance, study skills, motivation, social relationships, and participation in first-year programs. These published works built prediction models from variables such as resilience, engagement, scores of quizzes and assignments, students' academic self-concept, motivation, social relationships, and participation. However, the information on these variables can only be obtained during the course or as the semester progresses, and some of it must be collected through questionnaires and interviews. This shortens the time available for universities to take remedial measures, especially for students whose poor learning performance is caused by economic factors.
In practice, obtaining this information and then making predictions based on it is too slow to prevent students from dropping out due to poor academic performance. Therefore, this study attempts to use family background variables, including department, gender, address, admission status, Aboriginal status, child of new residents, family children ranking, on-campus accommodation, main source of living expenses, student loan, tuition waiver, and parents' average income, status, occupations, and education. These variables can be obtained before the start of the semester, allowing predictions to be made before the freshmen start to learn and providing more time for student guidance or for investing in learning resources.

Decision Trees
Decision trees (DT) are widely applied for prediction and classification in the domain of machine learning [27]. DT have the advantages of simple use, easy understanding, high accuracy, and high prediction ability [28-30]. In recent years, decision trees have been successfully applied in education [6,29-38]. For example, Wang et al. [33] proposed a higher educational scholarship evaluation model based on a C4.5 decision tree, while Hamoud et al. [34] used DT to predict and analyze student behaviors. Their results indicated that students' health, social activities, interpersonal relationships, and academic performance affected learning performance. Furthermore, the authors in [27] used the DT method to conduct research on students' employment wisdom courses in order to provide solutions for training professionals and employment courses, and to resolve the contradiction between training plans and enterprise needs. A semi-automated assessment model was built using DT in [35].
There are a variety of DT algorithms, such as ID3, C4.5, C5.0 (a commercial version of C4.5), and CART (classification and regression tree). Among them, the C4.5 and CART algorithms are the most popular and have many useful applications [33]. Compared with other classification methods, such as ANN and support vector machines, a decision tree can extract readable knowledge rules, which serve as a helpful reference for university decision making [34,35]. Therefore, this study uses decision tree algorithms, including C5.0 and CART, to build DT prediction models.

Random Forests
Random forests (RF) are regarded as an effective method in machine learning since RF can mitigate the over-training problem [39,40] that decision trees may face. RF performs classification, regression, and other tasks by constructing multiple decision trees during training [41-43]. The method evaluates multiple independent DTs and determines the result through their votes. Whereas each node in a DT is split using the best among all the attributes, "each node in RF is split using the best among the subset of predictors randomly chosen at the node" [40]. RF has been widely applied to IR in universities. For example, the authors of [38] used RF to predict whether or not a student would obtain an undergraduate degree, using the learning performance of the first two semesters of courses completed in Canada. Ghosh and Janan [16] utilized 24 variables, including creating good notes, group study, adaptation to university, and self-confidence, which were obtained from a questionnaire survey. RF was then employed to predict the first-year student performance of a university in Bangladesh. From the above literature, we can establish that RF has been successfully applied to predict students' learning performance. Therefore, this study also applied RF as one of the candidate algorithms to predict learning performance and to identify the features that most strongly affect first-year students' learning performance.

Artificial Neural Networks
An artificial neural network (ANN) is a computational system that mimics the neural structures and processes of the human brain, including its biological structure, processing capacity, and learning ability. ANNs receive input data, analyze and process information, and provide output data or actions through a large number of interconnected "neurons" or nodes. ANNs are a foundation of artificial intelligence (AI) and can solve problems that are difficult or impossible for humans to carry out. However, ANNs must be trained with a large amount of data through mathematical models and/or equations because they cannot understand, think, know, and process data like the human nervous system. ANNs can be trained in two ways: supervised learning and unsupervised learning. Supervised learning teaches a machine by feeding it input data together with the correct output data, referred to as a "labelled dataset", so that the machine learns a mapping from inputs to outputs based on sample input-output pairs and can then predict the outcome for new sample data. Unsupervised learning uses machine learning algorithms that draw conclusions from an "unlabeled dataset", so structure must be determined from the input data alone.
ANNs have been applied in numerous applications with considerable success. They have been effectively and efficiently applied in the area of prediction [44,45], since an ANN can be used to predict future events based on historical data. In addition, deep learning algorithms and neural networks [46-50] have been proposed for university student performance prediction. Dharmasaroja and Kingkaew [49] used ANN to predict learning performance in medical education. In their work, they used demographics, high-school backgrounds, first-year grade-point averages, and composite scores of examinations during the course as input variables. Sivasakthi [50] utilized MLP, Naïve Bayes, and DT to predict the introductory programming performance of first-year bachelor students.
In the works of [20,39], MLP was applied to build a model for predicting student performance and achieved good results. Therefore, we use MLP as our baseline for comparison in this study.

Methodology
The experimental process of this study included five steps as shown in Figure 1.

Sample and Data Collection
This research was conducted at the end of the first semester of the academic year 2020-2021 at one technical and vocational university in Taiwan. The data for the experimental models were collected through the school register system and the school grading system. When students first enrolled in this university, they were required to fill in their personal information in an electronic form through the school register system. Then, during the learning process, every student's grades and achievements in all subjects were recorded in the school grading system. Therefore, at the time of the research, each student's registered profile included 18 personal information variables and one variable holding the average score of all the subjects taken in the first semester.

Data Pre-Processing
In the data pre-processing phase, we performed data cleaning and data normalization. In the data cleaning step, after determining the 18 input variables and the output variable (learning performance), we handled examples with missing values and processed categorical data: all examples that contained missing values were removed, and the categorical data were encoded.
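In the data normalization step, the data were normalized according to Equation (1):
$$X_{nom} = \frac{X - X_{min}}{X_{max} - X_{min}} \quad (1)$$
where $X$ is the original value, $X_{max}$ is the maximum value, $X_{min}$ is the minimum value, and $X_{nom}$ is the normalized value.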
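The pre-processing steps described above (dropping incomplete examples, encoding categorical data, and rescaling with the maximum and minimum values) can be illustrated with a minimal sketch. It assumes that Equation (1) is the usual min-max scaling; the record fields (`gender`, `income`) and the helper names are hypothetical and are not taken from the study's code.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[object]]


def drop_incomplete(rows: List[Record]) -> List[Record]:
    """Keep only records in which every field has a value."""
    return [r for r in rows if all(v is not None for v in r.values())]


def encode_category(values: List[str]) -> List[int]:
    """Map each distinct category to an integer code (alphabetical order)."""
    codes = {cat: i for i, cat in enumerate(sorted(set(values)))}
    return [codes[v] for v in values]


def min_max(values: List[float]) -> List[float]:
    """Rescale values to [0, 1] as (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical records, for illustration only.
rows = [
    {"gender": "F", "income": 30000},
    {"gender": "M", "income": None},  # removed: missing value
    {"gender": "M", "income": 50000},
    {"gender": "F", "income": 40000},
]

clean = drop_incomplete(rows)
gender_codes = encode_category([r["gender"] for r in clean])
income_norm = min_max([r["income"] for r in clean])
print(gender_codes, income_norm)
```

Integer coding is only one possible encoding; the source does not state which encoding scheme was used.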

Building Prediction Models
The experiments were run on a Windows operating system with a 3.80 GHz Intel(R) Xeon(R) E-2174G CPU and 64 GB of RAM. Four supervised learning models based on the MLP, random forest (RF), and decision tree (DT) algorithms were developed. The C5.0 and CART algorithms were used to build the DT prediction models, while the Python (version 3.7.1) programming language was used to build the RF prediction models. The experiment was carried out five times on each model. The mean values and standard deviations of the classification performance of each model were then used as the benchmark for comparing the DT and RF models. The aims of the experiments were to investigate and benchmark the models' performance in predicting freshmen's learning performance on the dataset and to select the features that most strongly affect students' learning performance.
Furthermore, there are three cases of output data in this experimental study, as follows:
• Case 1 is the original output with five classes (Excellent, Very Good, Good, Average, and Poor), used to measure the four models' general prediction performance.
• Case 2 combines the majority classes (Very Good, Good, and Average) into a single Normal class, to investigate whether or not the four models can predict the minority classes.
• Case 3 focuses only on the minority classes: Excellent and Poor.
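The three output cases can be expressed as simple relabeling and filtering of the class labels. The sketch below is illustrative only: the class names come from the text, while the sample label list and the helper name `to_case2` are hypothetical.

```python
from typing import List

MAJORITY = {"Very Good", "Good", "Average"}
MINORITY = {"Excellent", "Poor"}


def to_case2(label: str) -> str:
    """Case 2: merge the three majority classes into one 'Normal' class."""
    return "Normal" if label in MAJORITY else label


# Hypothetical labels, for illustration only.
labels: List[str] = ["Excellent", "Very Good", "Good", "Average", "Poor", "Good"]

case1 = list(labels)                          # five original classes
case2 = [to_case2(l) for l in labels]         # Excellent / Normal / Poor
case3 = [l for l in labels if l in MINORITY]  # only Excellent and Poor samples

print(case2)
print(case3)
```

Note that Case 3 removes samples rather than relabeling them, so models in that case are trained on a smaller, two-class subset.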

Decision Trees (DT)
The experimental process of the C5.0 algorithm for all three cases in this study included the following steps.
(1) Create training and testing data.
(2) Set decision tree parameters.
(3) Create an initial rule tree.
(4) Prune this tree.
(5) Process the pruned tree to improve its understandability.
(6) Pick the tree whose performance is the best among all constructed trees.
(7) Repeat steps (1)-(6) for 10 experiments.
(8) Take the mean values and standard deviation of the classification performance over the 10 experiments for benchmarking.
We used 10-fold cross validation (CV) and constructed a DT for each fold of the data set based on the C5.0 algorithm. The collected data were divided into 10 equal-sized sets, and each set was used in turn as the test set while the other 9 sets were used as the training set to build a DT. This gave 10 trees. The tree with the best performance was selected, and all attributes remaining in this tree were considered important.
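The fold-by-fold procedure described above can be sketched as follows. This is not the See5/C5.0 setup used in the study: as an assumption, scikit-learn's `DecisionTreeClassifier` (a CART-style tree, without C5.0's pruning) stands in for C5.0, and the data are synthetic, generated by `make_classification` with 18 features.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): 18 input features, binary label.
X, y = make_classification(n_samples=500, n_features=18,
                           n_informative=5, random_state=0)

kfold = KFold(n_splits=10, shuffle=True, random_state=0)
fold_scores = []
best_tree, best_score = None, -1.0

for train_idx, test_idx in kfold.split(X):
    # Train on 9 folds, evaluate on the held-out fold.
    tree = DecisionTreeClassifier(criterion="gini", random_state=0)
    tree.fit(X[train_idx], y[train_idx])
    score = tree.score(X[test_idx], y[test_idx])
    fold_scores.append(score)
    if score > best_score:
        best_tree, best_score = tree, score

# Features that appear in at least one split of the best tree
# (leaf nodes are marked with a negative feature index).
used = sorted({int(f) for f in best_tree.tree_.feature if f >= 0})

print(f"mean accuracy = {np.mean(fold_scores):.3f}, "
      f"std = {np.std(fold_scores):.3f}")
print("features used by the best tree:", used)
```

Treating every feature that survives in the best tree as important mirrors the selection criterion stated in the text; other criteria, such as impurity-based importance scores, are also possible.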
In addition to the C5.0 algorithm, after extracting the C5.0 experimental results, this study implemented the CART algorithm in Python as a second DT technique, in order to compare the prediction accuracy and the selected important features between C5.0 and CART. The experimental process of the CART algorithm for all three cases was as follows:
(1) Create training and testing data.
(3) Process the DT with training, testing, and cross validation for prediction accuracy.

Random Forest (RF)
The RF experimental process in this study consists of the following steps.
(1) Create training and testing data.
(3) Process the RF with training, testing, and cross validation for prediction accuracy.
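The RF steps listed above can be sketched as follows. The sketch assumes scikit-learn and synthetic data from `make_classification`; the 80/20 split and the random seeds are illustrative choices that the source does not specify, while the forest size of 100 trees follows the parameter setting reported for the experiments.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in data (assumption): 18 input features, binary label.
X, y = make_classification(n_samples=500, n_features=18,
                           n_informative=5, random_state=0)

# Create training and testing data (the 80/20 split is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Random forest with 100 trees.
forest = RandomForestClassifier(n_estimators=100, random_state=0)

# 10-fold cross validation on the training data, then a final fit and test.
cv_scores = cross_val_score(forest, X_train, y_train, cv=10)
forest.fit(X_train, y_train)
test_accuracy = forest.score(X_test, y_test)

# Rank features by impurity-based importance (highest first).
ranking = np.argsort(forest.feature_importances_)[::-1]
top5 = ranking[:5].tolist()

print(f"CV accuracy = {cv_scores.mean():.3f} (std {cv_scores.std():.3f})")
print(f"test accuracy = {test_accuracy:.3f}, top-5 feature indices = {top5}")
```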

Multilayer Perceptron (MLP)
MLP [39] is a multi-layer structure composed of an input layer, a hidden layer, and an output layer. The input layer receives data, the hidden layer processes the data, and the output layer is responsible for the final output of the model. The MLP experimental process in this study consists of the following steps.
(1) Set the initial weights and bias values.
(2) Input the training data and target data.
(3) Calculate the error between the network output and the target.
(4) Adjust and update the network weights.
(5) Repeat steps (3)-(4) until the end of learning or convergence.
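The training steps above correspond to a standard gradient-descent loop with backpropagation. The following NumPy sketch shows one way to realize them for a single hidden layer. The toy data, the hidden-layer size of 8, the sigmoid activations, and the squared-error loss are all assumptions made for illustration; the learning rate of 0.3 and the limit of 1000 iterations follow the settings reported for the experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (assumption): 200 samples, 4 normalized inputs, binary target.
X = rng.random((200, 4))
y = (X[:, 0] + X[:, 1] > 1.0).astype(float).reshape(-1, 1)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


# (1) Set the initial weights and biases (hidden size 8 is an assumption).
W1 = rng.normal(0.0, 0.5, (4, 8))
b1 = np.zeros((1, 8))
W2 = rng.normal(0.0, 0.5, (8, 1))
b2 = np.zeros((1, 1))

learning_rate = 0.3
n = len(X)
losses = []

for epoch in range(1000):
    # (2) Feed the training inputs forward through the network.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # (3) Error between the network output and the target.
    err = out - y
    losses.append(float(np.mean(err ** 2)))

    # (4) Backpropagate the gradient of half the mean squared error
    #     and update the weights and biases.
    d_out = err * out * (1.0 - out)
    d_h = (d_out @ W2.T) * h * (1.0 - h)
    W2 -= learning_rate * (h.T @ d_out) / n
    b2 -= learning_rate * d_out.mean(axis=0, keepdims=True)
    W1 -= learning_rate * (X.T @ d_h) / n
    b1 -= learning_rate * d_h.mean(axis=0, keepdims=True)
    # (5) The loop repeats steps (3)-(4) until the iteration limit.

print(f"initial loss = {losses[0]:.4f}, final loss = {losses[-1]:.4f}")
```

A convergence test on the change in loss could replace the fixed iteration count; the fixed count is used here because it matches the stopping condition stated in the text.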

Experimental Results
After pre-processing, the dataset was imported into the See5 software to run the C5.0 algorithm and into Jupyter to run the MLP, RF, and CART algorithms; the DT models were thus built with two different algorithms, C5.0 and CART. Every model was run 10 times in each software package with 10 different training and testing datasets, in which the students' learning performance variable was divided into three different cases (Table 1). The experimental results of the four models in each case are presented in the following sections.
Regarding parameter settings, in RF the number of trees in the forest was set to 100. For the decision trees (C5.0 and CART), the pruning confidence factor (CF) affects how the error rate is estimated and thereby the severity of pruning, which helps avoid overfitting of the model. In this study, the pruning CF was set to 25%. In MLP, the learning rate was set to 0.3, and training stopped after 1000 learning iterations.

Data Preprocessing
The learning performance prediction data set covered 4375 first-year students enrolled in 12 departments of a Taiwanese university during the first semester of the academic year 2020-2021. These departments were selected randomly. After data cleaning, only 2407 students, whose profiles contained values for all variables, were retained as the experimental sample, a usable rate of 55%. The remaining 1968 students (45%), who had missing variables in their profiles, had dropped out, and/or had been suspended, were excluded from this study.
After the relevant data sets were processed, a total of 18 factors expected to influence the learning performance of freshmen were used as input (independent) variables for the prediction model (Table 2). These factors were "Department", "Gender", "Address", "Admission status", "Aboriginal", "Child of new residents", "Family children ranking", "Parent average income per month", "On-campus accommodation", "Main source of living expenses", "Students' loan", "Tuition waiver", "Father live or not", "Father's occupations", "Father's education", "Mother live or not", "Mother's occupations", and "Mother's education". The factor "Average scores" of all the subjects' grades in the first semester, as recorded in the school grading system, was used as the output (dependent) variable for the model (Table 1). Table 2 reports the 18 selected input (independent) variables, including feature names and their descriptions. Table 1 shows the output (dependent) variable, its classification, which follows the grading system, and how the output was distributed in this study. Within the scope of this paper, the output variable represents the freshmen's average score across all subjects' grades in the first semester of the academic year 2020-2021.
Table 3 shows the results of Case 1, which uses our original data. For Case 1, the mean values (standard deviations) of overall accuracy are 51.20% (0.44%), 47.86% (0.68%), 52.61% (0.7%), and 41.67% (1.70%) for CART, C5.0, RF, and MLP, respectively. From Table 3, none of the models built by these algorithms achieves an acceptable performance. The reason may be that the output was divided into too many class labels (EX, VG, G, AVG, Poor). Therefore, we combined the majority classes (VG, G, AVG) into a new class label (Normal) for Case 2, expecting that the models would then be able to predict the minority classes. Table 3 also lists the results of Case 2.
For Case 2, the mean values (standard deviations) of overall accuracy are 87.50% (0.44%) for CART, 91.60% (0%) for C5.0, 89.62% (0%) for RF, and 89.91% (1.05%) for MLP. The prediction accuracies improved significantly. Among these four algorithms, C5.0 outperforms MLP, CART, and RF. Table 4 reports the confusion matrix of C5.0 in Case 2. It is clear that the C5.0 algorithm cannot recognize the minority classes (EX and Poor). In other words, the prediction models constructed by the C5.0 algorithm cannot identify excellent and poor students. These minority groups are usually the ones that matter to HEI management when investing teaching resources and offering special assistance. For our research purposes, this prediction model can only find normal students. Students with poor learning effectiveness who need tutoring, and gifted students who need additional teaching resources to reach higher achievements, will not be identified. Therefore, we ran a further experiment, Case 3, in which we focused only on the minority classes: Excellent and Poor.
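The gap between high overall accuracy and the failure to recognize minority classes can be illustrated with a short calculation. The class counts below are made up for illustration and are not the study's confusion matrix; the sketch only shows that a classifier predicting Normal for every student can reach high accuracy while its recall on Excellent and Poor is zero.

```python
from collections import Counter
from typing import Dict, List


def accuracy(y_true: List[str], y_pred: List[str]) -> float:
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall_per_class(y_true: List[str], y_pred: List[str]) -> Dict[str, float]:
    """For each true class, the fraction of its samples predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / totals[c] for c in totals}


# Hypothetical imbalanced sample: 90 Normal, 5 Excellent, 5 Poor.
y_true = ["Normal"] * 90 + ["Excellent"] * 5 + ["Poor"] * 5
# A degenerate model that always predicts the majority class.
y_pred = ["Normal"] * 100

acc = accuracy(y_true, y_pred)
rec = recall_per_class(y_true, y_pred)
print(acc, rec)
```

For imbalanced outputs such as Case 2, per-class recall (or the full confusion matrix) is therefore more informative than overall accuracy alone.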

Results of Case 3
In Case 3, we used only the samples labelled with one of two classes, Excellent and Poor, to build prediction models. Table 5 lists the results of Case 3. From this table, the mean values (standard deviations) of overall accuracy are 79.82% (0.91%) for CART, 74.52% (0.41%) for C5.0, 79.02% (4.43%) for RF, and 69.02% (7.28%) for MLP. To validate the differences between CART, RF, C5.0, and MLP, we performed a one-way ANOVA. The null hypothesis is "All means are equal" and the alternative hypothesis is "At least one mean is different". The significance level (α) is set to 0.05. From Table 6, we can reject the null hypothesis because the p-value (0.000) is less than 0.05. To find the best prediction models, six statistical hypotheses were tested at the 95% confidence level using two-sample t-tests. Table 7 lists the results of these hypothesis tests. The results of H1 and H2 show no significant difference between CART and RF. From H3 to H6, the p-values are all less than 0.05, so for these four hypotheses we reject the null hypothesis. This means that CART is better than C5.0 and MLP, and RF is better than C5.0 and MLP. In sum, CART is only slightly better than RF, with no significant difference between them, and both CART and RF are significantly superior to C5.0 and MLP. In this case, CART achieved the highest mean accuracy among the four algorithms.
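The one-way ANOVA used above compares the variation between the models' mean accuracies with the variation within each model's repeated runs. A minimal sketch of the F statistic follows. The per-run accuracies are invented for illustration and are not the study's results; obtaining a p-value additionally requires the F distribution, for example via `scipy.stats.f_oneway`.

```python
from statistics import mean
from typing import List, Sequence, Tuple


def one_way_anova_f(groups: Sequence[Sequence[float]]) -> Tuple[float, int, int]:
    """Return (F statistic, between-group df, within-group df)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical per-run accuracies (%) for four models, for illustration only.
runs: List[List[float]] = [
    [80.1, 79.5, 80.4],  # model A
    [79.0, 81.2, 77.5],  # model B
    [74.6, 74.2, 74.9],  # model C
    [70.3, 65.1, 72.0],  # model D
]

f_stat, df_b, df_w = one_way_anova_f(runs)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A large F value indicates that at least one model's mean differs from the others, which is why pairwise tests such as the two-sample t-tests are then needed to locate the difference.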

Results of Importance Feature Selection
In DT algorithms, the attributes at the nodes remaining in the constructed trees are considered important. Table 9 provides the top five important features extracted for the three cases in the three models. However, in Case 1 and Case 2, the extracted features can only be used to identify the majority of students. In Case 3, the discovered features can be used to predict excellent and poor students.
In Case 3, the CART algorithm had the best performance. Consequently, we used the results of CART to select important features. Figure 2 shows the ranking of Gini importance of CART for Case 3. From Table 9 and Figure 2, the top five important features are "Mother's occupations", "Department", "Father's occupations", "Main source of living expenses", and "Admission status". Table 10 summarizes all the knowledge rules extracted from the decision trees. Rules 1 to 13 can be used to predict freshman academic performance. These rules are discussed in detail in the following sections.
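Gini importance in CART-style implementations such as scikit-learn is commonly computed by summing, for each feature, the weighted decrease in Gini impurity over all splits that use that feature. The sketch below shows the impurity decrease for a single split; the label counts are hypothetical and the function names are illustrative.

```python
from collections import Counter
from typing import List


def gini(labels: List[str]) -> float:
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(parent: List[str], left: List[str], right: List[str]) -> float:
    """Impurity of the parent minus the size-weighted impurity of the children."""
    n = len(parent)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted


# Hypothetical node with 4 Excellent and 4 Poor students, split in two.
parent = ["Excellent"] * 4 + ["Poor"] * 4
left = ["Excellent"] * 3 + ["Poor"] * 1
right = ["Excellent"] * 1 + ["Poor"] * 3

print(gini(parent))                       # impurity before the split
print(gini_decrease(parent, left, right))  # impurity removed by the split
```

A feature whose splits remove more impurity, summed over the tree and weighted by node size, ranks higher in a Gini importance chart.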

Extracted Rules from Decision Trees
Rule 1 to Rule 9 are for predicting students with excellent academic performance.
• Rule 1 shows that on-the-job students are hardworking and have excellent academic performance.
• Rule 2 reports that if the main source of living expenses is family support and the mother is a housewife, who does not need to earn a living and can pay full attention to her children's education, it is not surprising that such students perform well in their studies.
• Rule 3 shows that when students of the TF2 department live in the on-campus dormitory, their academic performance is excellent. The on-campus dormitory is mainly provided for economically disadvantaged students, so living there is less expensive. Moreover, no daily commute is needed and students can make full use of the on-campus library and other learning resources, so their learning performance is naturally excellent. In the future, TF2 department students should be accommodated in the on-campus dormitory.
• Rule 4 points out that if students' living expenses come from their families and their mothers work as government employees, they will have excellent academic performance.
• Rule 5 is also specific to one department: if students of the TD5 department take out student loans, their academic performance will be very good.
• Rule 6 points out that if the father's occupation is government employee, the students' academic performance will be excellent.
• Rule 7 indicates that if the source of living expenses is scholarships and grants from inside and outside the school, students will perform very well.
• Rule 8 indicates that, for female students, if the mother is a full-time housewife, they will perform well.
• Rule 9 indicates that if the mother's occupation is educator, the student's performance will also be very good.
From the above rules, we can see that the occupation of parents can influence the academic performance of freshmen, especially when the parents are government employees or educators, who have a high education level. In addition, a mother who is a full-time housewife can devote all her energy to her child's learning, which can also contribute to outstanding learning performance. We can also see that secure financial resources, whether they come from family support or from scholarships inside and outside the school, are quite helpful for students' learning.
Rules 10 to 13 are for predicting students with extremely poor academic performance.
• Rule 10 stands in clear contrast to Rule 2: for male students, if the mother is a housewife, the academic performance will be poor. This results from the patriarchal tradition of Taiwanese society. Housewife mothers spoil their sons, which can cause this phenomenon. Therefore, it is necessary to carry out stricter learning supervision for male students before the senior years.
• Rule 11 is for the TD5 department. If students in that department do not have student loans, i.e., they have a better family background, their academic performance will be quite poor. It can be inferred that if well-off families do not set strict requirements for their children's education, the children's academic performance will be poor. In this case, the share of students who took student loans or received government financial subsidies and tuition reductions or exemptions has exceeded 50% over the years, which is consistent across Taiwanese private vocational universities. The students enrolled in TD5 also have low admission scores. Therefore, the university can provide intensive study guidance and strict schoolwork supervision for students who are not doing well financially in the departments with low admission scores.
• Rule 12 reflects the general situation of students in private vocational universities in Taiwan. If living expenses come mainly from part-time jobs, those students' academic performance will also be poor. To address this, the government has launched a program of "purchasing working hours", which pays work-study fees to economically disadvantaged students so that they can invest their time in their studies. They receive the financial support they would otherwise earn from part-time jobs, which promotes social class mobility.
• Rule 13 states that if a freshman is a transfer student, academic performance will be quite poor. Therefore, for transfer students who enter the school in the first year, the student guidance system will help them integrate into the class and establish contacts. Once these possible problems are solved, the school's remedial teaching methods can be effective.
Since most students in Taiwanese private vocational universities are economically disadvantaged, these rules are a useful reference for such universities.
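A few of the rules above can be expressed as an ordered rule list, as in the minimal sketch below. The field names (`admission_status`, `living_expense_source`, `father_occupation`, `mother_occupation`), the category labels, and the order in which the rules are checked are illustrative assumptions; the actual conditions and their precedence are those given in Table 10.

```python
from typing import Callable, Optional

# Each rule: (rule number from the text, condition on a student record, predicted class).
# Field names and category labels are hypothetical, chosen for illustration only.
Rule = tuple[int, Callable[[dict], bool], str]

RULES: list[Rule] = [
    (13, lambda s: s.get("admission_status") == "transfer", "poor"),
    (12, lambda s: s.get("living_expense_source") == "part_time_job", "poor"),
    (6, lambda s: s.get("father_occupation") == "government_employee", "excellent"),
    (7, lambda s: s.get("living_expense_source") == "scholarship", "excellent"),
    (9, lambda s: s.get("mother_occupation") == "educator", "excellent"),
]


def predict(student: dict) -> Optional[tuple[int, str]]:
    """Return (rule number, predicted class) for the first matching rule, else None."""
    for rule_id, condition, label in RULES:
        if condition(student):
            return rule_id, label
    return None


# Usage: a transfer student is flagged by Rule 13 before any other rule is checked.
print(predict({"admission_status": "transfer", "mother_occupation": "educator"}))
```

Because the first matching rule wins, the assumed ordering decides the outcome when a record satisfies both an "excellent" and a "poor" rule; a real deployment would follow the paths of the fitted tree instead.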

Discussion and Conclusions
In practice, the prediction models built in Case 3 are more meaningful than those of Case 1 and Case 2, so we focus on the results of Case 3. In this case, the mean prediction accuracy over the 10-fold experiments was nearly 79.99% for RF, 74.59% for DT with the C5.0 algorithm, 80.00% for DT with the CART algorithm, and 69.02% for MLP. CART thus outperforms the C5.0, RF, and MLP algorithms.
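The mean accuracy over 10 folds can be computed as in the following minimal sketch. It uses only the Python standard library, synthetic labels, and a majority-class baseline as a placeholder model; none of these are the study's data or algorithms, and the function names are assumptions introduced for illustration.

```python
import random
from collections import Counter


def k_fold_indices(n: int, k: int, seed: int = 0) -> list[list[int]]:
    """Shuffle indices 0..n-1 and deal them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def mean_cv_accuracy(labels: list[str], k: int = 10) -> float:
    """Mean accuracy over k folds for a majority-class baseline (placeholder model)."""
    folds = k_fold_indices(len(labels), k)
    accuracies = []
    for test_idx in folds:
        test_set = set(test_idx)
        train_labels = [y for i, y in enumerate(labels) if i not in test_set]
        # "Training": the placeholder model memorizes the most common class.
        majority = Counter(train_labels).most_common(1)[0][0]
        correct = sum(labels[i] == majority for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / k


# Synthetic labels (not the study's data): 80 "other" and 20 "poor".
labels = ["other"] * 80 + ["poor"] * 20
print(round(mean_cv_accuracy(labels), 4))
```

Replacing the placeholder with a fitted classifier and keeping the same folds for every algorithm makes the reported means directly comparable.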
For Case 3, the factors that most influenced freshmen's learning performance were "Mother's occupations", "Department", "Father's occupations", "Main source of living expenses", and "Admission status". Importantly, two factors, "Mother's occupations" and "Department", had the most significant impact on first-year students' learning performance, whereas four factors, "Father live or not", "Mother live or not", "Child of new residents", and "Aboriginal", had the least effect. The analysis results are expected to serve as a roadmap for students' early performance prediction so that strategic intervention can be planned before students reach the final semester. The prediction results and the important factors discovered can also be used as leading indicators to prevent students from dropping out due to poor learning performance.
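To illustrate how a categorical factor can be scored, the sketch below computes the Gini impurity decrease of a single binary split, the criterion conventionally used by CART. How feature importance was actually computed in this study is not restated here, so this is only an illustrative assumption, and the toy records are invented.

```python
from collections import Counter


def gini(labels: list[str]) -> float:
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def gini_decrease(values: list[str], labels: list[str], category: str) -> float:
    """Impurity decrease of the binary split `value == category` versus the rest."""
    left = [y for v, y in zip(values, labels) if v == category]
    right = [y for v, y in zip(values, labels) if v != category]
    n = len(labels)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(labels) - weighted


# Toy data (invented for illustration, not taken from the study).
mother_occupation = ["educator", "educator", "service", "service", "service", "educator"]
performance = ["excellent", "excellent", "poor", "poor", "excellent", "excellent"]
print(gini_decrease(mother_occupation, performance, "educator"))
```

A factor whose splits produce larger impurity decreases across the tree ranks higher, which is one common way to arrive at an ordering such as the one reported above.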
From the knowledge rules extracted from the decision trees, we have discovered some useful information. For predicting excellent students, the parents' occupations can determine the academic performance of freshmen, especially when the parents are government employees or teachers with higher education backgrounds. Moreover, a mother who is a housewife can also contribute to outstanding academic performance. We also found that stable financial resources, whether they come from family support or scholarships, are quite helpful for students' learning.
We also discovered rules for predicting students with extremely poor academic performance. Technological and vocational universities should focus on transfer students and on students whose living expenses come mainly from part-time jobs. Generally, their learning performance will be poor, and they require additional guidance.
In this study, we used family background variables, which can be obtained at the beginning of the freshman semester, to predict students' learning performance. The established models can predict the academic performance of freshmen as soon as they enter the school. If a student is predicted to have poor learning performance, educational teams can carry out early-warning counseling measures, such as reminding class tutors to pay more attention to that student. Where part-time jobs lead to absences and poor learning, educational teams can offer early remedial teaching resources or teaching assistants for individual tutoring. These proposed measures can effectively prevent these students from falling behind in their learning process.
For students who are predicted to have excellent academic performance, universities can focus on elite-style tutoring, such as special classes for professional and technical advancement, license examination training, entrepreneurial competitions, and other employment skills enhancement. For undergraduates who plan to enter higher education programs, universities can offer more support for foreign language skills development and entrance examinations.
In sum, this study successfully built prediction models for freshmen's academic performance using the CART, C5.0, RF, and MLP algorithms in a Taiwanese vocational university. Five important features have been identified that allow HEI management to take action in advance. As potential directions for future work, other machine learning algorithms could be applied, and more input variables could be included. Future work can also introduce techniques for handling class imbalance, such as under-sampling, over-sampling (synthetic minority oversampling technique, SMOTE), and cost adjustment methods. Furthermore, this study used an off-line training mode, which means there is time to build high-accuracy prediction models and determine the important variables based on them. Therefore, we focused on prediction accuracy without considering computational time and complexity. In future work, computational complexity and time could be considered when evaluating the constructed models.
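As a pointer for the class imbalance direction, the sketch below shows plain random over-sampling using only the Python standard library. It is the simplest member of the over-sampling family and is not SMOTE, which synthesizes new minority examples rather than duplicating existing ones. The record layout and the `performance` field name are assumptions made for illustration.

```python
import random
from collections import Counter


def random_oversample(rows: list[dict], label_key: str, seed: int = 0) -> list[dict]:
    """Duplicate randomly chosen minority-class rows until every class matches the largest."""
    rng = random.Random(seed)
    by_class: dict[str, list[dict]] = {}
    for row in rows:
        by_class.setdefault(row[label_key], []).append(row)
    target = max(len(group) for group in by_class.values())
    balanced: list[dict] = []
    for group in by_class.values():
        balanced.extend(group)
        # Sample with replacement to fill the gap to the majority-class size.
        balanced.extend(rng.choices(group, k=target - len(group)))
    return balanced


# Synthetic records (not the study's data): 8 "other" and 2 "poor".
rows = [{"performance": "other"}] * 8 + [{"performance": "poor"}] * 2
balanced = random_oversample(rows, "performance")
print(Counter(r["performance"] for r in balanced))
```

Resampling of this kind is normally applied to the training folds only, so that the held-out fold keeps the original class distribution.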