Predicting Determinants of Lifelong Learning Intention Using Gradient Boosting Machine (GBM) with Grid Search

Abstract: The purpose of this study is to explore the factors that have the most decisive influence on the actual learning intention that leads to participation in adult education. To develop the predictive model, we used tree-based machine learning with longitudinal big data (2017-2020) on Korean adults. Based on the gradient boosting machine (GBM) results, among the eleven variables used, the most influential variables in predicting the likelihood of lifelong education participation were self-pay education expenses, followed by the highest level of education completed. After the grid search, not only the importance of these two variables but also the overall figures, including the false positive rate, improved. In future studies, it will be possible to improve the performance of the machine learning model by adjusting, with less computationally demanding methods, the hyper-parameters that the user can set directly.


Introduction
For a long time, a large number of studies on factors influencing adult lifelong learning intention and participation have been conducted [1]. Educational background, competency, gender, occupational status, and occupational characteristics have been explored in various ways as determinants of lifelong learning participation [2][3][4], but the results have not been consistent. The reason may be the small number of target groups analyzed in each study, the differing characteristics of those groups, or differences in the time or period of the study. It may also be because the variables included in the research models differ, or because the type of lifelong learning used as the dependent variable differs [5]. Therefore, it is necessary to re-explore the factors that have a decisive influence on actual lifelong learning participation based on large-scale data systematically collected in the form of a longitudinal study.
Recently, machine learning algorithms have been proposed for prediction in various fields and have achieved remarkable results. Owing to the rapid improvement of computer hardware and the development of machine learning libraries, research on predictive model development using machine learning techniques is being actively conducted in many fields, including education [6]. This study therefore proposes a classification-based prediction method, using the gradient boosting machine learning algorithm, to predict lifelong learning participation from the 2017-2020 longitudinal data of the Seoul Lifelong Learning Survey, Republic of Korea. The results of this study are expected to deepen the existing understanding of lifelong learning, since accurate prediction of participation in lifelong learning can serve as important basic information for promoting such participation in the future.
Unlike previous studies that focused only on the individual characteristics of learners, studies such as Darkenwald [20], Ivy [21], and Parasuraman [22] have shown that participation in lifelong learning is influenced not only by the psychological and environmental characteristics of individuals but also by the participating lifelong education institutions. According to research by Emmalou Van Tilberg [23], farmers' participation in education is shaped by economic and physical factors, such as the cost of participating in training and the distance and travel time to the training location; by institutional factors, such as the reliability of educational institutions and instructors, the educational content, the training delivery method, the appropriateness of training time, and the design and operation of the training program; and by other factors, including participants' education level, age, gender, and income. Ivy [21] presented educational institution facilities, accessibility, support environments, and staff services as factors that influence adult learners' participation in education. A recent study by Thongmak [24] on the effects of individual, institutional, and pedagogical factors on lifelong learners' motivation and intention recommends that institutions and companies expose learners to media regarding critical core skills to improve their learning intention.
Taken together, these findings reveal the importance of personal, institutional, and socio-contextual influences on individuals' lifelong learning intention and participation. Baert and colleagues [10] classified the variables affecting participation in lifelong learning into three categories, micro-level, meso-level, and macro-level, based on Bronfenbrenner's [25,26] original ecological systems theory of the microsystem, mesosystem, and macrosystem contexts of human development. Bronfenbrenner's revised theory proposes that a person's development takes place through processes of progressively more complex reciprocal interaction between an active biopsychological human organism and the persons, objects, and symbols in its immediate external environment [27]. According to Park and Cha's [28] support vector machines with recursive feature elimination (SVM-RFE) analysis, Korean high-school graduates' decisions to choose and newly enter universities were mostly affected by the mesosystem of interactions with parents, while re-enrollers were affected by the macrosystem of social awareness as well as by individual estimates of talent and aptitude at the microsystem level. To understand the factors influencing lifelong learning in the context of learners' interactions with their environments, this study likewise divided the environment surrounding learners' lifelong learning intentions into three ecological levels. As shown in Table 1, the three levels are the perception of the characteristics of the learner (micro level), the characteristics of institutional programs and learning activities (meso level), and the broader social context and its actors (macro level). These factors determine the attitude of the individual and consequently influence the development of that individual's learning intention.

Boosting-Based Machine Learning
The biggest problem that appears when applying machine learning techniques is overfitting. The boosting technique trains datasets on a tree basis, but unlike the bagging technique, in which each tree is independent of the others, boosting is vulnerable to overfitting because it trains trees by sequentially weighting the parts with large errors. The overfitting problem can be prevented by optimizing the various hyperparameters of boosting models. A representative machine learning technique based on such boosting is gradient boosted regression trees (GBRTs), the latest gradient boosting ensemble technique. However, GBRTs take a considerable amount of computation time to train, and several algorithms have been developed to address this. Among them, the light gradient boosting machine (LightGBM) algorithm is a representative model in which the training time is considerably shortened [29].
The boosting method in classification learning is one of the techniques for generating multiple classifiers by manipulating the initial sample data, similar to bagging, but the biggest difference is that it is sequential. The boosting technique adjusts the sample weights of the training data for the next classifier based on the training result of the previous classifier, and training proceeds in this way [30].
As shown in Figure 1, in the boosting method, training data and test data are first randomly split at an appropriate ratio. Then, samples are drawn from the training set using bootstrap sampling and applied to a specific learning algorithm to create a classifier. Based on the classification results of this classifier, weights are assigned to the misclassified data and to the unextracted data (not used for learning) to be used in the next round of learning. This series of processes is called a boosting round, and a final classification model is created from the models completed over a total of "n" boosting rounds [31]. The boosting algorithm plays an important role in handling the bias-variance trade-off. Unlike the bagging algorithm, which controls only the high variance in the model, boosting is considered more effective since it controls both aspects (bias and variance). Boosting is a sequential technique that works according to the principles of the ensemble and combines a set of weak learners to provide improved prediction accuracy. The model result at any moment t is weighted based on the result at the previous moment t − 1: correctly predicted results are given lower weights, and misclassified results are given higher weights. This technique is available for both classification and regression.
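To make the reweighting idea concrete, the following minimal sketch (an illustration written for this text, not the study's code) trains a few shallow scikit-learn decision trees in sequence, increasing the sample weights of observations that the previous tree misclassified; the simple weight-doubling rule is an assumption used only for illustration.

```python
# Minimal sketch of sequential boosting by sample reweighting (illustrative only).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=11, random_state=0)

n_rounds = 5
weights = np.full(len(y), 1.0 / len(y))   # start with uniform sample weights
learners = []

for t in range(n_rounds):
    stump = DecisionTreeClassifier(max_depth=2, random_state=t)
    stump.fit(X, y, sample_weight=weights)   # train on the weighted sample
    misclassified = stump.predict(X) != y
    weights[misclassified] *= 2.0             # emphasize the errors of round t (simplified rule)
    weights /= weights.sum()                  # renormalize for round t + 1
    learners.append(stump)

# A simple "strong learner": majority vote over the weak learners.
votes = np.mean([clf.predict(X) for clf in learners], axis=0)
final_pred = (votes >= 0.5).astype(int)
print("training accuracy of the combined classifier:", (final_pred == y).mean())
```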
The easiest way to understand GBM is as residual fitting. If we predict y through a very simple model A, then predict the remaining residuals through model B, and predict y through A + B, we obtain a better model than A alone. Continuing in this way, the residuals keep shrinking, and we can build a predictive model that describes the training set well. However, this method has the disadvantage that, although bias can be significantly reduced, overfitting may occur. In Figure 2, the residuals left by tree 1 are predicted by tree 2, and the residuals are gradually reduced by repeating this process. Each model (tree 1, 2, 3) is called a weak learner, and a classifier combining them is called a strong learner. A simple decision tree is often used as the weak classifier. This is also called a gradient boosting tree, and representative recently implemented libraries include LightGBM and XGBoost. XGBoost is an algorithm that addresses the shortcomings of the existing GBM algorithm. GBM's gradient descent is a method of finding the optimal parameters that minimize the loss function. Gradient boosting improves performance by focusing subsequent models on the weaknesses, revealed by the gradient, of the model learned so far. However, gradient boosting is slow and suffers from the overfitting problem mentioned above.
A common approach to parameter tuning is to tune two types of parameters: (1) tree-based parameters and (2) boosting parameters. GBM is robust enough not to overfit through tree growth alone, but for a given learning_rate, a high number of trees can lead to overfitting. Usually, a relatively high learning_rate is chosen first; in general, the default 0.1 works well, but values between 0.05 and 0.2 may also work well. After that, n_estimators (the optimal number of trees) is determined for the chosen learning_rate; a range of around 40-70 is usually suitable. Then the parameters for each tree are adjusted for the determined learning_rate and number of trees. After obtaining reasonable per-tree parameters, a more robust model is obtained by lowering the learning_rate and increasing the number of estimators proportionally [32].
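The residual-fitting view described above can be illustrated with a minimal sketch (synthetic data and scikit-learn regression trees, not the study's model): each tree is fit to the residuals left by the sum of the previous trees, so the residuals shrink round by round.

```python
# Minimal sketch of gradient boosting as residual fitting (squared-error loss).
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.2, size=500)

learning_rate = 0.1
prediction = np.full_like(y, y.mean())    # model A: a constant baseline
trees = []

for _ in range(100):                      # each round fits the current residuals
    residuals = y - prediction
    tree = DecisionTreeRegressor(max_depth=3)
    tree.fit(X, residuals)
    prediction += learning_rate * tree.predict(X)   # A + B + C + ...
    trees.append(tree)

print("mean squared error after boosting:", np.mean((y - prediction) ** 2))
```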
XGBoost is a model born to address this problem; it is able to solve real-world-scale problems using a minimal amount of resources [33]. XGBoost was introduced by Chen and Guestrin [33] for the purpose of solving overfitting problems in linear or tree-based models and improving stability and training speed on large datasets. It is an abbreviation of eXtreme Gradient Boosting, is a boosting-algorithm-based model, and flexibly supports regression, classification, ranking, and user-defined objectives [30]. Recent gradient boosting has excellent predictive performance, but it is difficult to tune the optimized model because it takes a long time to run. However, as algorithms that shorten execution time while improving the prediction performance of existing gradient boosting, such as XGBoost and LightGBM, continue to appear, XGBoost has become one of the most useful algorithms for the classification of structured data and receives the most attention among boosting algorithms. The difference is not numerically overwhelming, but it generally shows better predictive performance in classification than other machine learning methods. XGBoost is based on existing gradient boosting, since the weighting characteristic of ensemble boosting is gradient descent, but it is faster than GBM and includes mechanisms such as early stopping to prevent overfitting. XGBoost performs better than the earlier GBM but still has the disadvantage of slow learning time. LightGBM was developed to compensate for these shortcomings of XGBoost. LightGBM can handle large amounts of data, uses less memory, and is fast, but it also has a drawback: it can overfit if too little data is used. Unlike existing boosting models, including XGBoost, LightGBM splits the tree around leaf nodes; Figure 3 shows the difference between the two methods. Level-wise tree growth has to stay balanced, so the depth of the tree is reduced and additional operations are required. LightGBM continuously splits the leaf nodes in the direction that most reduces the loss, regardless of balance, so an asymmetric and deep tree is created, but the loss can be reduced compared with level-wise growth when generating the same number of leaves. Therefore, LightGBM can be efficient when large datasets must be handled within a limited time. Recently, however, the time problem tends to be solved with large sets of GPUs, so XGBoost is still widely used.
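As a brief illustration of the two libraries named above (assuming the xgboost and lightgbm packages are installed, and using synthetic stand-in data rather than the survey data), the sketch below fits both classifiers on the same task; note that LightGBM's leaf-wise growth is controlled mainly through num_leaves, whereas depth-constrained trees are controlled through max_depth.

```python
# Illustrative comparison of XGBoost and LightGBM on synthetic data (not the study's data).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier

X, y = make_classification(n_samples=5000, n_features=11, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

xgb = XGBClassifier(n_estimators=200, learning_rate=0.1, max_depth=6)      # depth-constrained trees
lgbm = LGBMClassifier(n_estimators=200, learning_rate=0.1, num_leaves=31)  # leaf-wise growth

for name, model in [("XGBoost", xgb), ("LightGBM", lgbm)]:
    model.fit(X_tr, y_tr)
    print(name, "test accuracy:", accuracy_score(y_te, model.predict(X_te)))
```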
Yağcı [34] explored hidden relationships in educational data, already known in educational data mining, and proposed a tool to predict students' academic achievement using a machine learning model. However, in Yağcı's study, nearest neighbors, support vector machines, logistic regression, Naïve Bayes, and k-nearest neighbor algorithms were utilized as prediction methods, and GBM was used to classify high academic achievement. While Yağcı's [34] study used GBM as it is, we propose a way to efficiently and effectively improve the hyperparameters of GBM.

Proposed Method
In this study, the GBM implementation in the Python scikit-learn library, which is known as the most basic form of boosting, was used, and grid search was used to tune the hyper-parameters. Accordingly, even after building a model with the most recent versions of GBM, GBRTs, LightGBM, or XGBoost, similar results are expected if hyper-parameter tuning is performed using grid search or random search.
This study shows that, even when the methods provided by the Python library are used, much better results can be obtained when the user selects a systematic method to adjust the hyperparameters. Unlike the parameters set by the model or the data, this study focuses on the hyper-parameters that the field user can set with insight. In other words, we do not aim to improve the GBM method itself; rather, we focus on hyper-parameter setting based on the user's insight, regardless of which method is used.
Data sets from 2017 to 2020 of the Seoul Lifelong Learning Survey were used, and the correlations among the features of the big data were examined in the data pre-processing stage. As shown in Table 2, 11 important characteristics were selected. As shown in Table 3, one characteristic, employment type, was used as the target for the classification model. Figure 4 shows the feature ranking according to importance, obtained through simple regression analysis. According to this ranking, academic background, self-pay tuition, and program type are the most important, and it was confirmed that variables such as gender and age, which are traditionally considered relevant, show no relation. The ranking of importance also changes after gradient boosting and the subsequent grid search. This ranking provided the basis for using the yes/no classification of self-paid learning expenses when building the gradient boosting classification model.
(Table 2 excerpt: highest level of education completed, school level including non-response, coded 1 = uneducated, 2 = elementary school, 3 = middle school, 4 = high school, 5 = university (2- or 3-year), 6 = university (4-year), 7 = graduate school (Master), 8 = graduate school (Ph.D.); DQ7 = main source of income.)
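The exact regression procedure behind Figure 4 is not reproduced here; the sketch below shows one common way to obtain such a preliminary ranking with scikit-learn, where the file name and column names are placeholders, not the actual survey file, and the univariate F-test stands in for the simple regression analysis described above.

```python
# Illustrative feature ranking via univariate tests (placeholder file and columns, numeric-coded data assumed).
import pandas as pd
from sklearn.feature_selection import f_classif

df = pd.read_csv("seoul_lifelong_learning_2017_2020.csv")   # hypothetical file name
features = df.drop(columns=["employment_type"])             # hypothetical target column name
target = df["employment_type"]

f_scores, p_values = f_classif(features, target)
ranking = pd.Series(f_scores, index=features.columns).sort_values(ascending=False)
print(ranking)   # higher score = stronger univariate association with the target
```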


Gradient Boost Machine (GBM)
The importance of the features, shown in Figure 5, was obtained from the GBM classifier. Fundamentally, good results can be obtained even if the default parameters provided by the Python scikit-learn library are used without adjustment. This reconfirmed that the level of education and self-paid learning cost, the most important features found in the basic regression analysis, remain the most important. Based on the target (① wage workers, ② non-wage workers), the confusion matrix shown in Figure 6 was also obtained. However, as mentioned in the introduction, this study aims to perform better hyperparameter tuning through grid search based on user insight.
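A minimal sketch of this baseline step is shown below (synthetic stand-in data rather than the survey file): the classifier is fit with scikit-learn's default parameters, after which the feature importances and the confusion matrix can be inspected.

```python
# Baseline GBM with scikit-learn defaults (illustrative stand-in data, not the survey data).
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

X, y = make_classification(n_samples=5000, n_features=11, random_state=42)  # stand-in for the 11 features
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

gbm = GradientBoostingClassifier(random_state=42)   # defaults: learning_rate=0.1, n_estimators=100
gbm.fit(X_train, y_train)

print("feature importances:", gbm.feature_importances_)   # a ranking of this kind underlies Figure 5
print(confusion_matrix(y_test, gbm.predict(X_test)))      # a matrix of this kind underlies Figure 6
```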

Parameter tuning follows the two stages described above: (1) the tree-based parameters and (2) the boosting parameters. Figure 7 shows the general structure of a decision tree, so we first consider parameter tuning for a general decision tree. This study uses Python's scikit-learn library; even if R is used, the specific terminology differs slightly, but the basic tuning idea is the same.

Grid Search
In this study, among the tree parameter options shown in Figure 8, 'min_samples_split': 150, 'min_samples_leaf': 40, 'max_depth': 7, and 'max_features': 'sqrt' were found to be the optimal parameter values. At this time, 'learning_rate' was 0.15 (usually 0.05-0.2) and 'n_estimators' was 50 (usually 40-70), so these were judged to be reasonable optimal parameters. After fixing the tree-related parameters, the search over the boosting parameter options shown in Figure 9 was performed, yielding 'learning_rate': 0.15, 'n_estimators': 50, and 'subsample': 0.8 as the optimal boosting parameter options. In the parameter combination of this study, 50 was obtained as the optimal 'n_estimators' for 'learning_rate': 0.15. This is a fairly reasonable value and could be used as it is. However, since this may not hold in all cases, the search was run again, lowering the learning rate by a factor of ten to 0.015 and raising the number of estimators tenfold from 50 to 500. Usually, in this way, the optimal parameters can be further improved, and then the parameters for each tree can be adjusted again. In this case, however, the order of the tuning variables must be carefully determined; the variable that has a greater influence on the outcome should be tuned first. For example, max_depth and min_samples_split have a significant impact and should be tuned first. Finally, parameter tuning as shown in Figure 10 was performed. The results show that better parameter tuning can be achieved through more diverse combinations, and this study confirmed that, even with a basic library, parameter tuning based on user insight can produce much better results.
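The staged procedure just described can be sketched with scikit-learn's GridSearchCV as follows; the grids are illustrative values built around the optima reported above, and the synthetic data stands in for the prepared training set.

```python
# Staged grid search over GBM hyper-parameters (illustrative grids and stand-in data).
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_classification(n_samples=5000, n_features=11, random_state=42)  # stand-in data
X_train, _, y_train, _ = train_test_split(X, y, test_size=0.25, random_state=42)

# Step 1: with learning_rate and n_estimators fixed, tune the tree-structure parameters (cf. Figure 8).
tree_grid = {
    "max_depth": [5, 7, 9],
    "min_samples_split": [100, 150, 200],
    "min_samples_leaf": [30, 40, 50],
    "max_features": ["sqrt"],
}
base = GradientBoostingClassifier(learning_rate=0.15, n_estimators=50, random_state=42)
tree_search = GridSearchCV(base, tree_grid, scoring="accuracy", cv=5).fit(X_train, y_train)

# Step 2: fix the best tree parameters and tune the boosting parameters (cf. Figure 9).
boost_grid = {
    "learning_rate": [0.05, 0.1, 0.15],
    "n_estimators": [40, 50, 70],
    "subsample": [0.7, 0.8, 0.9],
}
boost_search = GridSearchCV(
    GradientBoostingClassifier(random_state=42, **tree_search.best_params_),
    boost_grid, scoring="accuracy", cv=5,
).fit(X_train, y_train)

# Step 3: trade learning rate for more trees (0.15 -> 0.015, 50 -> 500) and refit (cf. Figure 10).
final_params = {
    **tree_search.best_params_,
    **{k: v for k, v in boost_search.best_params_.items()
       if k not in ("learning_rate", "n_estimators")},
}
final = GradientBoostingClassifier(
    learning_rate=0.015, n_estimators=500, random_state=42, **final_params
).fit(X_train, y_train)
```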

After the grid search, the feature importance was re-examined in Figure 11. Although the highest level of education completed and self-pay education expenses are still the most important, the difference in importance between the two has become smaller; usually, it is better for the high-priority features to have similar importance. Figure 12 shows the confusion matrix after the grid search: the number of true positives decreased slightly from 1061 to 1057, but the number of true negatives rose from 138 to 143, and in particular the number of false positives improved from 164 to 159. Table 4 contains a comparison of the train scores and test scores.
Table 5 compares overall accuracy, sensitivity, precision (positive predictive value), specificity (1 − false positive rate), and f1_score. Overall, the figures improved, and they can be improved further through more diverse combinations of hyper-parameters. In particular, the false positive rate (a negative case being judged as positive) improved significantly. A desirable classification method should be both highly sensitive and highly specific. Specificity was significantly improved by the hyperparameter tuning in this study, lowering the false positive rate (1 − specificity), which is the rate at which non-wage workers are judged to be wage workers. This suggests that tuning through grid search is a much better way than plain gradient boosting to identify the non-wage workers who could benefit. In addition, the specificity shows that the applicability of Bayes' theorem can be considered in decision-making about selection and classification. The most important criterion for judging the validity of a classification method is precision, which is the same as the posterior in Bayes' theorem. Therefore, the results of this study suggest that, for data handling similar to this study, the simpler Bayes' theorem can be used in place of the complex and time-consuming gradient boosting and grid search.
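For reference, the quantities compared in Table 5 can be computed directly from a 2×2 confusion matrix. The helper below is a generic sketch (with the positive class taken as wage workers); it also makes explicit the Bayes-theorem reading of precision as the posterior probability of an actual positive given a positive prediction.

```python
# Generic computation of Table 5 style metrics from a 2x2 confusion matrix (illustrative helper).
def classification_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    sensitivity = tp / (tp + fn)                 # recall, true positive rate
    specificity = tn / (tn + fp)                 # 1 - false positive rate
    precision = tp / (tp + fp)                   # positive predictive value
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)

    # Bayes-theorem reading: precision equals the posterior P(actual positive | predicted positive),
    # computed from sensitivity, specificity, and the prevalence of the positive class.
    prevalence = (tp + fn) / (tp + fn + fp + tn)
    posterior = (sensitivity * prevalence) / (
        sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
    )
    assert abs(posterior - precision) < 1e-9     # the two expressions coincide

    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "precision": precision, "f1": f1}
```

Expressing precision this way shows why it plays the role of a posterior probability: it combines the test's sensitivity and specificity with the base rate (prevalence) of the positive class, which is the point made about Bayes' theorem above.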

Discussion and Conclusions
In this study, the likelihood of Korean adults participating in lifelong education was predicted using GBM with grid search, a tree-based machine learning classification algorithm. The prediction was made using 11 independent variables such as program type, program satisfaction, and social participation. Among the variables used, the most influential variable in predicting the likelihood of lifelong education participation was self-pay education expenses, and the next was the highest level of education completed. Both of these main predictors, self-pay education expenses and the highest level of education completed, are limited to the micro level. In order to promote participation in lifelong learning, efforts should be made not only at the micro (individual) level but also at the meso (institutional) and macro (socio-contextual) levels from an ecological point of view. In particular, self-pay education expenses, the first predictor of lifelong education intention found through GBM with grid search, relates to individual characteristics regarding the learner's living (economic) situation. On the other hand, De Meester and colleagues [35] presented financial costs such as program fees and course materials as characteristics of the educational program at the meso level.
After the grid search, not only the importance of the two variables but also the overall figures improved; in particular, the false positive rate improved significantly. In this study, the performance of the GBM model was improved by using grid search. In general, hyperparameters must either be tuned manually using the insight of data processing specialists, or an optimization algorithm must be developed for the task. However, since the proposed method showed better results even with ready-made grid search, it shows that users can obtain better results with an automated, systematic method rather than a manual one. In addition, the results of the proposed method show that the false positive rate (1 − specificity) in the confusion matrix improved the most. The important criteria for evaluating the validity of a classification are precision and the false positive rate (1 − specificity), which correspond to the quantities in Bayes' theorem. Therefore, one contribution of the proposed method is to show that, in research fields that require evaluation of results similar to this study, Bayes' theorem can be used with less computation in place of the complex and time-consuming GBM and grid search. In future studies, it will be possible to improve the performance of the machine learning model with less computationally demanding methods. Furthermore, the results of this study can be applied to the prediction of participation in various learning forms (i.e., face-to-face learning, online learning, blended learning, and flipped learning) in K-12 and higher education, in addition to lifelong education, in the post-COVID-19 era. The results are also expected to serve as a basis for big data research in the educational field.
Funding: This work was supported by the Pukyong National University Research Fund in 2021 (CD20210841).

Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.

Data Availability Statement: The data used to support the findings of this study are included within the article.