Predicting the Variables That Determine University (Re-)Entrance as a Career Development Using Support Vector Machines with Recursive Feature Elimination: The Case of South Korea

The current study seeks to identify variables that affect the career decision-making of high school graduates with respect to the choice of university (re-)entrance in South Korea, where education has great importance as a tool for self-cultivation and social prestige. For pattern recognition, we applied a support vector machine with recursive feature elimination (SVM-RFE) to big-data from a survey of Korean college candidates. Based on the SVM-RFE results, new enrollers were mostly affected by the mesosystem of interactions with parents, while re-enrollers were affected by the macrosystem of social awareness as well as by the individual system of self-estimated talent and aptitude. Beyond predicting the variables that affect high school graduates' preparation for university re-entrance, some survey questions provide information on why graduates base their university choice on interactions with their parents or acquaintances. Along with these empirical results, implications for future research are also presented.


Introduction
Although college admission is an important decision with career implications, which typically requires high-school graduates to commit to a 4-year-long course of studies before seeking careers, most Korean high-school graduates find it difficult to grasp their interests, aptitudes, and values, owing to the entrance exam-oriented academic environments. Consequently, young adults often select colleges and majors based on the guidance of their teachers and family members as well as their entrance exam grades [1]. With the increasing importance of career education, recent studies have shed light on career-related decision-making processes and associated difficulties experienced by high-school students [2][3][4][5]. However, little research exists on predicting the variables that affect college-related decision-making of high-school graduates as a first step toward choosing their career. These factors may differ depending on the number of attempts made by a prospective student to enroll in a course at a specific university. The present study, therefore, considers this dependence for the student population of South Korea (Korea, hereinafter). Specifically, we distinguish between new admissions and readmissions, as there is a need to identify differences in the factors that determine the students' decision to enroll. Thus, the study analyzes the factors that influence the decisions of readmission students as well as those that influence the decisions of students enrolling for the first time, with relevance to career planning and development.
This study mainly focuses on a comparative analysis of the factors affecting the career-planning behavior of Korean university students who are enrolling for the first time and those who are re-enrolling. Because these insights can aid in improving the design of college admission rules, and because such behavior is more appropriately understood in the context of the enrollers' interactions with their environments, following Bronfenbrenner's ecological system theory [6][7][8], we seek to determine the variables that affect college admissions and readmissions and to learn more about the relationships between these variables. Empirical research that predicts the variables affecting decisions about college admission and readmission is therefore needed in order to design and operate systematic career education that helps students establish and plan their career directions.
This study aims to identify different variables that affect the career decision-making of high school graduates about entering university depending on prior university entrance experience. Consequently, we formulate the following research questions. First, what is the difference between the ecological system variables that influence career decision-making on (re-)entering university? Second, how can we predict the variables that affect the students' preparation for university (re-)entrance?

University (Re-)Entrance as Career Development of Korean High-School Graduates
Selecting an appropriate career, occupation, or profession is one of the most important decisions that one makes in life, and many factors affect this decision [9]. This choice is a result of a lifelong process that starts long before prospective enrollers graduate from high school [10]. Career aspirations expressed by adolescents are mostly unstable and tend to change many times before adulthood [11]. These changes occur owing to social issues, family background, economic status, access to opportunity structures, and individual characteristics [11,12]. A wrong choice of career may result in anxiety, with a negative impact on life [12].
Like other East Asian countries, Korea has undergone dramatic economic and social development over the last few decades, which has prompted it to provide universal formal educational opportunities and to distribute educational resources fairly [13]. Formal education has traditionally played an essential role in Korean society. This tradition is associated with Confucian philosophy, under which individuals were evaluated through highly competitive examinations for prestigious government positions [14]. In this context, admission to the most prestigious universities in Korea is considered an individual success, which has led to a zeal for education that is often described as "education fever" in Korean society [15]. To secure a good career and a high social status, Korean students and parents face intense competition to raise their level of formal education [16]. Students in Korea who hope to enter higher education have to take the College Scholastic Ability Test (CAST) in the third year of high school. The tremendous competition in Korean society heightens students' instrumental need to obtain high scores on the CAST [13]. Students who do not obtain the desired scores often must wait another year to retake the examination or choose another university or specialization that requires a lower score. Although some Korean universities evaluate high school records, essay tests, and/or interviews to offer early admission, the CAST score is still required to decide the final offer. The prestigious four-year universities are ranked by the CAST scores of their applicants; that is, admission to university in Korea is based on one's CAST score. Thus, graduating from a prestigious university is regarded as one of the most appropriate measures of an individual's academic competence and, moreover, of the quality of his or her social capital [17]. Hence, the main motivation for learning of most Korean high school students is extrinsic: obtaining the desired CAST score and then entering a top-ranking university to improve their career and social prospects [13].
For high-school graduates, the decision to enter university is the starting point for planning and developing their own careers. Students who are able to decide on their careers set a clear course of action for their future, which indicates confidence in the choice of a major or a specific career field after graduation. In contrast, career-choice indecision is a state in which one cannot choose a career by oneself or is unsure of the career one has chosen [18]. Deciding to enter university can be a starting point in the career development of these individuals, but it is also a difficult task. High-school students can easily be indecisive because they lack proficiency and experience in career decision-making. According to Van Matre and Cooper [19], some indecisive individuals may remain undecided owing to ambivalent feelings about their choice [20]. That is, if students hesitate over which college to enter at the first stage of their career choice, they are likely to hesitate later in their vocational choices. Furthermore, if high-school graduates do not make good decisions on their own with respect to college admission, they may have difficulty making decisions in future career choices.

Career Development from an Ecological Perspective
Career planning behavior for personal life design can be more appropriately understood in the context of interaction with the environment [8,21]. The ecological model of human development provided by Bronfenbrenner is used as a fundamental conceptual framework for designing and conducting research on adolescents' career development [22]. According to Bronfenbrenner's theory of ecological systems for explaining human development [6][7][8], humans and their environments maintain mutually beneficial and reciprocal relationships through sustained interactions and exchanges. From this eco-systemic perspective, human growth, development, emotions, and physical satisfaction are fostered and supported when human-environment interactions are oriented in adaptive and positive directions. The theory can be applied when considering the various environments surrounding the life of an individual. It provides knowledge about the environment surrounding the individual, together with a comprehensive framework for viewing the various problems or situations that occur in society [6][7][8]. Bronfenbrenner [7,8] explained that, when human development is viewed in an ecological environment, each environment forms a multi-layered structure nested within the next. Furthermore, Bronfenbrenner's ecological model regards human career development as occurring in an ecological environment made up of multiple, hierarchically nested contexts [22]. The ecological contexts or systems of human (career) development conceptualized by Bronfenbrenner [8] are individual systems, microsystems, mesosystems, exosystems, macrosystems, and chronosystems, which are shown in Table 1.
Table 1. Ecological systems of human (career) development [6][7][8].

System | Examples
individual systems | gender roles, religious beliefs, and intrapsychic processes
microsystems | family, siblings, teachers, peers, and schools where each student belongs
mesosystems | interaction between parents, siblings, teachers, and peers
exosystems | extended family, community resources, school board, organization neighborhoods, mass media, and parents' work environments where each student belongs
macrosystems | social, cultural, historical influences, broad ideology, and laws and customs
chronosystems | changes in environment over time

The ecological variables at each level (individual systems, microsystems, mesosystems, exosystems, macrosystems, and chronosystems) influence the career development of adolescents and are influenced by their career development. In the present study, the variables influencing college entrance as well as career choice and decision are classified into individual systems, mesosystems, and macrosystems. The individual system variables include personal factors, such as each student's aptitude, hope, satisfaction, and entrance exam results. The mesosystem variables contain interactions between students and parents on recommending the choice of a specific university. The macrosystem variables include a major career path after graduation and university awareness and reputation as social influences.

SVM-RFE in Educational Fields
Some real-world pattern recognition applications infer knowledge from data. This knowledge is useful for making predictions about previously unseen data, and/or for developing deeper insights about the concepts that underlie these data. Presently, "big-data" refers to huge amounts of variables or features that characterize machine-learning models. A big-data sample can be represented as a vector in a well-organized form, whose components correspond to such variables. In feature pattern recognition and discrimination problems, each such well-organized vector (sample) is associated with a certain category. Machine learning algorithms, such as a support vector machine (SVM), learn the dependencies between samples and categories. The computational complexity of the learning process depends on the feature space dimensionality. However, some variables in a high-dimensional feature space may be redundant or not significant, which can undermine the success of machine learning algorithms, which are strongly affected by the quality of data. The feature selection step is used to eliminate irrelevant and redundant variables from data, making the algorithm more generalizable. The selection of strongly relevant variables might be useful for obtaining insights about the learned concept. Additional advantages of feature selection include reducing the cost of data storage, access, and computation.
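To make the vector representation concrete, the following minimal sketch one-hot encodes categorical survey answers into a feature vector and maps the class variable to the labels used by a binary classifier. The question names, answer options, and group names are hypothetical and were invented for illustration; they are not the items of the actual survey.

```python
# Hypothetical survey items and answer options, invented for illustration only.
QUESTIONS = {
    "main_reason": ["aptitude", "parent_recommendation", "exam_score"],
    "employment": ["student", "employed", "unemployed"],
}


def encode(response):
    """One-hot encode a dict of categorical answers into a flat feature vector."""
    vector = []
    for question, options in QUESTIONS.items():
        # One component per answer option: 1.0 if chosen, 0.0 otherwise.
        vector.extend(1.0 if response[question] == option else 0.0
                      for option in options)
    return vector


def label(group):
    """Map the class variable to the {-1, +1} labels of a binary classifier."""
    return 1 if group == "re-enroller" else -1
```

Each component of the resulting vector corresponds to one candidate variable, so a feature selection step can later drop individual components that turn out to be redundant or irrelevant.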
The SVM proposed by Cortes and Vapnik [23] is a learning method that finds the hyperplane with the largest margin in order to best classify the given data. SVM-based classifiers have been successfully applied to various classification and prediction problems. However, in general, the SVM has a higher misclassification rate when there are many variable patterns or feature patterns. One such error is the false negative, or type II error, in which the true state is positive but the test result is negative. To compensate for this problem, it is possible to use a pattern selection or variable selection method that removes less relevant variables or features and configures the final classifier with only highly explanatory variables, the most common being a support vector machine with recursive feature elimination (SVM-RFE).
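As a small worked illustration of the error types mentioned above, the sketch below computes the overall misclassification rate and the false negative (type II) rate from true and predicted labels. The function name and the label convention (+1 for positive, -1 for negative) are assumptions made for this example.

```python
def error_rates(y_true, y_pred):
    """Return the misclassification rate and the false negative rate.

    Labels are assumed to be +1 (positive) and -1 (negative).
    """
    pairs = list(zip(y_true, y_pred))
    # False negative: truly positive, predicted negative (type II error).
    false_neg = sum(1 for t, p in pairs if t == 1 and p == -1)
    # False positive: truly negative, predicted positive (type I error).
    false_pos = sum(1 for t, p in pairs if t == -1 and p == 1)
    positives = sum(1 for t in y_true if t == 1)
    return {
        "misclassification_rate": (false_neg + false_pos) / len(pairs),
        "false_negative_rate": false_neg / positives if positives else 0.0,
    }
```

For example, with true labels `[1, 1, 1, -1, -1]` and predictions `[1, -1, -1, -1, 1]`, three of five samples are misclassified and two of the three positives are missed.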
Following Sanz, Valim, Vegas, Oller, and Reverter [24], there are two reasons for adopting SVM-RFE. First, identifying the most relevant or correlated variables otherwise requires Minimum Redundancy and Maximum Relevance (MRMR) calculations between variables, for which the computational load is substantial. With a plain SVM it is difficult to accurately find the variables that help prediction, whereas SVM-RFE can be based on non-linear kernels rather than only linear kernels and iterates the RFE step by calculating importance values for the variables. If SVM-RFE is adopted, the most relevant variables can be identified accurately. In other words, highly related variables can be extracted without the MRMR calculation, which requires considerable computation. Second, the variables obtained in this way can be used in time-to-event analysis. Time-to-event analysis can be outlined as follows.
(1) Survival analysis (time-to-event analysis): Once a model has been obtained through survival analysis of the data, it can answer questions such as the following.
• What is the probability that a patient diagnosed with blood cancer will survive for more than 3 years?
Here, S(t) is called the survival function, and it is usually written as S(t) = 1 − F(t), where F(t) is the cumulative distribution function of the event time and t is a certain point in time. Thus, S(t) is the probability of surviving beyond a certain point in time t. Being censored refers to situations such as a loss of contact owing to a patient's refusal.
(2) Survival analysis applications: Survival analysis is most frequently used in medicine, but the same technique can be used in marketing and engineering (reliability).
• Establishing a business plan: establishing a strategy by identifying the characteristics of customers with a long remaining period without departure
As shown above, survival analysis is suitable for applications that need to be analyzed over a long period. Likewise, the results of this paper are not immediate outcomes of the analysis; they are obtained later by analyzing patterns over 1 to 3 years. Education is one of the fields in which SVM-RFE can be applied. Patterns obtained from SVM-RFE can be used in time-to-event analysis, which can help with analysis related to education [24].
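The survival function S(t) discussed above can be estimated from follow-up data that include censored observations. The sketch below uses the Kaplan-Meier estimator, a standard technique that the source does not name and that is swapped in here purely for illustration; the function names and the toy follow-up times are assumptions.

```python
def kaplan_meier(times, observed):
    """Kaplan-Meier estimate of the survival function.

    times:    follow-up time of each subject
    observed: True if the event occurred at that time, False if censored
    Returns a list of (t, S(t)) pairs at each distinct event time, where
    S(t) estimates the probability of surviving beyond t.
    """
    survival = 1.0
    curve = []
    event_times = sorted({t for t, ev in zip(times, observed) if ev})
    for t in event_times:
        # Subjects still under observation just before time t.
        at_risk = sum(1 for ti in times if ti >= t)
        # Events (not censorings) that happen exactly at time t.
        events = sum(1 for ti, ev in zip(times, observed) if ti == t and ev)
        survival *= 1.0 - events / at_risk
        curve.append((t, survival))
    return curve


def survival_at(curve, t):
    """Step-function lookup: last estimate at or before t (1.0 before any event)."""
    s = 1.0
    for event_time, value in curve:
        if event_time <= t:
            s = value
    return s
```

With hypothetical follow-up times of `[1, 2, 2, 3, 4]` years and event indicators `[True, True, False, True, False]`, calling `survival_at(kaplan_meier(times, observed), 3)` gives the estimated probability of surviving for more than 3 years.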

Methods
This study aims to identify and predict the variables that affect the career decision-making of high school graduates with respect to (re-)entering university in Korea, where formal education is regarded as a good measure of self-cultivation and social capital. To recognize patterns, we applied SVM-RFE to big-data-like survey data from Korean college candidates. If a certain pattern is selected in the senior year of high school, or a certain variable is selected while a student attends university, we can predict future outcomes regarding (re-)entering university. That is, well-selected variables improve the accuracy of future predictions, and SVM-RFE was chosen as the method of this study because it enables such variables to be selected well. The primary goal of this paper is not to classify each group with a classifier, but to select such patterns well. For factors that have been generally accepted without scientific analysis, we sought to confirm through scientific analysis that they are correct. After the analysis, the factors can be treated as variable patterns selected by the classifier, which can provide meaningful information.
Therefore, we investigated the effectiveness of the non-redundancy and relevance criteria derived by the SVM-RFE, shown in Figure 1. Experiments on big-data were performed to assess the effectiveness of these relevance criteria. Our results show that the relevance criterion is based on minimizing the variation of the squared norm of the weight vector, $\|w\|^2$, of the SVM-RFE.
The SVM feature pattern classifier is a binary classifier that learns an optimal hyper-plane as a decision function in a high-dimensional space [25,26]. Consider a training data set $\{X_k, y_k\} \in \mathbb{R}^n \times \{-1, 1\}$, where $X_k$ are the training samples and $y_k$ are the class labels. The SVM maps $X$ into a high-dimensional space using a function $\Phi$, and then computes the following decision learning function:

$$f(X) = \langle w, \Phi(X) \rangle + b \tag{1}$$

Equation (1) is learned by maximizing the distance between $\Phi(X_k)$ and the hyper-plane parameterized by $(w, b)$. For the SVM feature pattern classifier, the decision function can be obtained using the following Lagrangian form:

$$f(X) = \sum_{k} \alpha^{*}_{k} y_k K(X_k, X) + b \tag{2}$$

where $k$ ranges from 1 to the number of training samples, and $\alpha^{*}_{k}$ is the solution of the following quadratic optimization problem:

$$\max_{\alpha} \; W(\alpha) = \sum_{k} \alpha_k - \frac{1}{2} \sum_{k,l} \alpha_k \alpha_l y_k y_l \left( K(X_k, X_l) + \frac{1}{C}\,\delta_{k,l} \right) \quad \text{subject to} \quad \sum_{k} \alpha_k y_k = 0, \;\; \alpha_k \ge 0 \tag{3}$$

where $\delta_{k,l}$ is the Kronecker symbol, $C$ is the regularization parameter, and $K(X_k, X_l) = \langle \Phi(X_k), \Phi(X_l) \rangle$ is the kernel evaluated over the training-set samples.

The SVM-RFE was proposed by Guyon, Weston, Barnhill, and Vapnik [25], and independently by Rakotomamonjy [26], for selecting feature patterns that are relevant for real-world problems. The goal is to determine a highly relevant subset of size "r" among "d" variables, such that "r" < "d", which maximizes the prediction performance. The SVM-RFE begins with all the feature patterns and removes one feature pattern at a time in each loop, until "r" highly relevant feature patterns are left. At each iteration, the variable whose removal causes the smallest variation in $\|w\|^2$ is treated as irrelevant or redundant and is removed. Therefore, the ranking criterion, CR, in Figure 1 for any variable $i$ is

$$CR(i) = \left| \sum_{k,l} \alpha^{*}_{k} \alpha^{*}_{l} y_k y_l K(X_k, X_l) - \sum_{k,l} \alpha^{*(i)}_{k} \alpha^{*(i)}_{l} y_k y_l K^{(i)}(X_k, X_l) \right| \tag{4}$$

where $K^{(i)}$ is the kernel computed from the training-set samples when feature pattern $i$ is removed, and $\alpha^{*(i)}$ is the corresponding solution. Based on the ranking criterion, CR, Guyon et al. [25] and Rakotomamonjy [26] suggest removing the feature pattern that has the least effect on the weight vector, $w$. Therefore, the ranking criterion is the simplest form of the variation of $\|w\|^2$.
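The elimination loop described above can be sketched in a few lines for the special case of a linear kernel. This is a minimal illustration under stated assumptions, not the implementation used in the study. With a linear kernel, and with the multipliers held fixed when a variable is removed, $\|w\|^2 = \sum_j w_j^2$, so removing variable $i$ changes $\|w\|^2$ by exactly $w_i^2$ and the criterion CR reduces to the squared weight. The trainer is a Pegasos-style subgradient method without a bias term, swapped in so that the example needs only the standard library. The function names, hyperparameters, and toy data are all hypothetical.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Pegasos-style subgradient descent for a linear soft-margin SVM (no bias)."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    step = 0
    for _ in range(epochs):
        order = list(range(len(X)))
        rng.shuffle(order)
        for k in order:
            step += 1
            eta = 1.0 / (lam * step)
            margin = y[k] * sum(wj * xj for wj, xj in zip(w, X[k]))
            # Shrink the weights (regularization part of the subgradient).
            w = [(1.0 - eta * lam) * wj for wj in w]
            if margin < 1.0:
                # Hinge-loss part: move toward the violating sample.
                w = [wj + eta * y[k] * xj for wj, xj in zip(w, X[k])]
    return w


def svm_rfe(X, y, r):
    """Remove one variable per loop until r variables are left.

    Linear-kernel case: the ranking criterion for variable i is w_i ** 2.
    Returns (indices kept, indices removed in order of elimination).
    """
    active = list(range(len(X[0])))
    eliminated = []
    while len(active) > r:
        X_active = [[row[j] for j in active] for row in X]
        w = train_linear_svm(X_active, y)
        worst = min(range(len(active)), key=lambda j: w[j] ** 2)
        eliminated.append(active.pop(worst))
    return active, eliminated


def make_toy_data(n=200, seed=1):
    """Hypothetical data: variables 0 and 1 carry the class signal, 2 to 4 are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.choice([-1, 1])
        signal = [label + rng.gauss(0.0, 0.5) for _ in range(2)]
        noise = [rng.gauss(0.0, 1.0) for _ in range(3)]
        X.append(signal + noise)
        y.append(label)
    return X, y
```

Calling `svm_rfe(*make_toy_data(), r=2)` returns the indices of the retained variables together with the elimination order. For a non-linear kernel, the squared-weight shortcut no longer applies and the criterion has to be evaluated from the kernel matrices with and without each variable.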
Sustainability 2020, 12, x FOR PEER REVIEW 6 of 11

Evaluation Results
Removing irrelevant and redundant variables based on the SVM-RFE can improve the generalization performance of a learning algorithm with respect to real-world data. In the present work, an SVM-RFE was used instead of a simple SVM to eliminate irrelevant and redundant variables from the survey results collected by the Seoul Metropolitan Office of Education in 2013.

According to the 2013 survey results, the variable patterns identified by the SVM-RFE for new admissions and readmissions are shown in Table 2. Some students wanted to re-enroll after getting a job because they did not like their major and because the social awareness of their university was poor. For this reason, these students considered only their talent and career aptitude when re-enrolling. Conversely, new enrollers made decisions based on their CAST scores (which were lower than expected) and their parents' opinions. The reliability of the results of the current study can be checked against the survey of 2016 college candidates. Table 3 lists the survey questions newly added in 2016 that were not included in the 2013 questionnaire. Because the number of students preparing to enter universities has increased since 2013, questions were added to the 2016 survey by revising the question variables. Nevertheless, the results of this study remain significant when examined against the modified 2016 survey questions.
Table 3. Newly updated 2016 survey questions that were not included in the 2013 survey.

Variable What Is the Main Reason for Preparing for College Entrance again?
Re-admitted and newly admitted group Class variable Association with class variables mentioned in
These results shed light not only on the decision-making of new enrollers, but also on the decision-making of re-enrollers who prepare for another year of college entrance exams in Korean society. As an important part of the overall education cost in Korean society, the cost of preparing for college admission has been rising steadily, because the number of students preparing for college entrance exams has been increasing. This, in turn, increases the cost of education in Korean society, which includes not only the cost directly spent on college education, but also the cost spent on college preparation. Figure 2 shows that, according to the additional survey question, students who are already enrolled, as well as individuals who are employed or unemployed, are interested in re-enrolling.
[Figure 2: panels (a) and (b)]
Notably, the question variables, which are shown in Figure 2 and were newly included in the 2016 survey, indicate that not only repeat candidates who are already in college, but also the employed and unemployed are interested in college re-entry.
According to the variables in the questionnaire in Figure 2 and Table 4, there is a question on "recommendation from acquaintances" for unemployed individuals. This question is similar to the question on "parents' recommendation". It can be inferred that unemployed individuals in 2016 were still worried about something similar to what the group of new enrollers was worried about. Even without additional big-data analysis, it is conceivable that unemployed individuals, more than employed ones, are affected by others' or social opinions rather than by their own wishes, talent, and aptitude.
Table 4. High relevance between the variables shown in the 2013 survey big data analysis results in Table 2, and the variables shown in the updated 2016 survey.

Variable What Is the Main Reason for Preparing for College Entrance?
Both the employed and unemployed Class variable Association with class variables mentioned in

Discussion and Conclusions
The current study revealed the significant impact of ecological systems on the decisions of Korean high-school graduates about choosing and/or (re-)entering universities. In addition, using the SVM-RFE approach, the study elucidated the predictors that affect individuals' preparations for university entrance exams. The SVM-RFE approach was useful for reducing the dimensionality and complexity of the problem, and essential for preprocessing realistic big-data from the 2013 survey results about university admissions. Based on the evaluation results obtained using the SVM-RFE, we drew several conclusions about the social implications of preparations for university (re-)entrance.
First of all, new enrollers are mainly affected by the mesosystem of interactions with their parents, who recommend the choice of university and major, while re-enrollers are largely affected by macrosystem factors such as the university's social awareness as well as by the individual system of their talent and aptitude. To prevent high-school graduates from making poor university decisions, they should be given career education opportunities to identify areas in which they can see themselves being successful and enjoying themselves, rather than simply following their parents' recommendations. For students to be able to make appropriate decisions regarding college admissions based on their talent and aptitude, career education programs and other relevant support from post-secondary institutions and governments are required. Because entrance to a prestigious university is treated as a measure of career success, expenditure on private and shadow education in Korean society has increased significantly [13]. The increasing number of students who repeatedly prepare for the CAST in order to change their university has become a problem for individuals and society. Such phenomena waste individuals' time and education costs and lower the utilization of human resources in society as a whole [27]. More professional and systematic career interventions are needed that incorporate the microsystem, mesosystem, exosystem, and macrosystem levels [22,28,29]. Therefore, the current study suggests that it is extremely important to provide high-school students with a variety of systematic and long-term interventional assistance in career decision-making that integrates all levels of their ecological systems, from the individual, micro-, meso-, and exo-systems to the macro- and chrono-systems. These interventions should be tailored to the needs of each type of student (e.g., decisive type vs. indecisive type), and uniform interventions should be avoided [30]. That is, it is of great importance to design and offer career interventions that are differentiated by group.
Next, the results of this study also suggest that relevant social information should be better reflected in future questionnaires, which should be designed based on the big-data analysis results from previous research. Questions inquiring why it is inevitable to accept the parents' financial support for tuition fees, in addition to questions about the parents' and/or acquaintances' recommendations, can shed additional light on the type of social advice sought by employed and unemployed individuals who want to re-enter colleges. Korea's low fertility rate, which has been maintained for more than 20 years, has led to a decline in the school-age population, which has led to a sharp decline in the number of high school graduates and threatens the sustainability of university education [31]. Nowadays, the dropout rate of students is becoming a major problem in low-ranked Korean universities. For sustainable long-term development, the quality and competitiveness of universities' educational services will need to be improved so that students, parents, employers, social communities, and the government can be satisfied with them.
Lastly, the questionnaire items revised in 2016, which confirmed the reliability of the big-data analysis results obtained using the SVM-RFE, indicate that many individuals go back to college, whether they are already enrolled or working. This implies that Korean society has to bear tremendous social costs associated with direct and indirect college enrollment. Preparing for college readmission can be a significant economic burden, because these individuals cannot work during the preparatory period, and because it delays college graduation and reduces the length of, and income from, future work. From a social perspective, the Korean government should consider the impact of the rising cost of education owing to university re-entry preparation.
The conclusions of our research should be read in light of some limitations. The findings are difficult to generalize because the study is limited to the Korean context and Korean universities. All the variables were evaluated using simple self-report scales, which make it difficult to secure objectivity and to understand the university and career choices of some indecisive students. Future studies could integrate other variables and conduct longitudinal studies to evaluate developmental paths, and could use more refined self-report items and semi-structured in-depth interviews to more fully understand the university and career choices of indecisive students.
Funding: This research received no external funding.