Predicting Entrepreneurial Intention of Students: Kernel Extreme Learning Machine with Boosted Crow Search Algorithm

Abstract: College students are the group with the most entrepreneurial vitality and potential. How to cultivate their entrepreneurial and innovative ability is one of the important and urgent issues facing current social development. This paper proposes a reliable, intelligent prediction model of entrepreneurial intentions, providing theoretical support for guiding college students' positive entrepreneurial intentions. The model mainly uses an improved crow search algorithm (CSA) to optimize the kernel extreme learning machine (KELM) model with feature selection (FS), namely CSA-KELM-FS, to study entrepreneurial intention. To obtain the best-fitting model and key features, the gradient search rule, local escaping operator, and Levy flight mutation (GLL) mechanism are introduced to enhance the CSA (GLLCSA), and FS is used to extract the key features. To verify the performance of the proposed GLLCSA, it is compared with eight other state-of-the-art methods. Further, the GLLCSA-KELM-FS model and five other machine learning methods have been used to predict the entrepreneurial intentions of 842 students from the Wenzhou Vocational College in Zhejiang, China, in the past five years. The results show that the proposed model can correctly predict the students' entrepreneurial intention with an accuracy rate of 93.2% and excellent stability. According to the prediction results of the proposed model, the key factors affecting students' entrepreneurial intention are mainly the major studied, campus innovation and entrepreneurship practice experience, and positive personality. Therefore, the proposed GLLCSA-KELM-FS is expected to be an effective tool for predicting students' entrepreneurial intentions.


Introduction
As colleges are the main front of college students' entrepreneurship education, further promoting the reform of college entrepreneurship education has become an important task for the reform and development of higher education at present and in the future. The country has provided broad development space and abundant opportunities for college students to start their own businesses. However, the entrepreneurial rate of college students is not very high, and the success rate is even lower. Innovation and entrepreneurship education in colleges and universities is in full swing, but has it played its due role? What impact does college innovation and entrepreneurship education have on students' entrepreneurial intentions? The main contributions of this paper are as follows:
1. Incorporating the gradient search rule (GSR), local escaping operator (LEO), and Levy mutation mechanism (LM) into the CSA to improve its search capabilities.
2. The performance of GLLCSA is effectively verified through benchmark function experiments.
3. The GLLCSA-KELM-FS model is proposed to predict students' entrepreneurial intentions.
4. An effective prediction of students' entrepreneurial intentions is achieved, and the key features are screened out.
This paper is organized as follows. Section 2 briefly describes KELM and the CSA. Section 3 introduces GLLCSA. Section 4 presents the GLLCSA-KELM-FS model. Section 5 evaluates the performance of the proposed GLLCSA. Section 6 uses the GLLCSA-KELM-FS model to predict graduate entrepreneurial intentions. Section 7 draws conclusions and outlooks.

According to the literature, the performance of a classifier is greatly affected by its inner parameters and the features in the data. Metaheuristics are highly effective at solving this type of problem, as shown in many works [28][29][30][31][32], such as object tracking [33,34], the traveling salesman problem [35], gate resource allocation [36,37], multi-attribute decision-making [38,39], the design of power electronic circuits [40,41], fractional-order controllers [42], medical diagnoses [43,44], big data optimization problems [45], green supplier selection [46], economic emission dispatch problems [47], scheduling problems [48,49], and combination optimization problems [50]. This study proposes an enhanced crow search algorithm (CSA) [51] to simultaneously optimize the hyperparameters of the kernel extreme learning machine (KELM) and the feature space for predicting the entrepreneurial intention of college students. The CSA is a nature-inspired metaheuristic method proposed in 2016 that has resolved many complex optimization problems [51], and many researchers have attempted to boost it from different aspects [52][53][54][55][56]. For example, Adamu et al. [57] proposed a hybrid particle swarm optimization with the CSA for feature selection, and the proposed model gained an accuracy of 89.67% on 15 datasets. Aliabadi et al. [58] designed an improved CSA to optimize a hybrid renewable energy system in radial distribution networks, and the results showed significant reductions in active losses and voltage deviations. Al-Thanoon et al.
[59] adopted the CSA for big data classification, and the results showed a higher classification performance compared with other algorithms. Awadallah et al. [60] developed a cellular CSA with topological neighborhood shapes to resolve three real-world problems. Bakhshaei et al. [61] designed a boosted CSA-based tournament selection strategy, and the results showed that the optimal determination of power exchange and the incentive rate could decrease operation costs. Chaudhuri et al. [62] developed a binary CSA with a time-varying flight length to solve the feature selection problem, and the results showed that the proposed feature selection technique behaved better than other approaches. Geetha et al. [63] designed an enhanced CSA for an image forgery detection technique, and the results exhibited that the proposed classifier performs better than most algorithms. Guha et al. [64] used the CSA with chaotic mapping for fine-tuning a controller, and the efficacy of the controller in frequency regulation was validated. Gupta et al. [65] introduced a novel boosted CSA to predict Parkinson's disease with an accuracy of 100%, helping patients obtain proper treatment. Hossain et al. [66] developed a hybrid of support vector regression and the CSA to handle the multi-objective optimization of microalgae-based wastewater treatment. Ke et al. [67] proposed an enhanced CSA to deal with energy optimization problems, and the results showed that the approach can obtain proper solutions with lower calculation times. Khattab et al. [68] developed a novel crow spiral-based search algorithm to solve a formulated design problem, and the gained results confirmed the success of the filter design. Kumar et al. [69] applied a hybrid CSA with an arithmetic crossover to two real-world engineering optimization problems and gained effective results. Li et al. [70] designed an improved CSA with an extreme learning machine model to effectively forecast short-term wind power.

Kernel Extreme Learning Machine (KELM)
The extreme learning machine (ELM) [71] is a class of machine learning methods based on feed-forward neural networks. The traditional ELM has a single hidden layer and is considered to offer a fast learning speed and good generalization compared with other shallow learning systems, such as the single-layer perceptron and support vector machine (SVM). The ELM algorithm randomly generates the connection weights between the input layer and the hidden layer, as well as the thresholds of the hidden-layer neurons. Therefore, in the training process, one only needs to set the number of neurons in the hidden layer, and the unique optimal solution for the output weights can then be obtained analytically.
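As a toy illustration of this training scheme (a minimal sketch, not the paper's implementation; the layer size and the tanh activation are arbitrary choices), the random hidden layer and the analytic least-squares output weights can be written as:

```python
import numpy as np

def elm_train(X, y, n_hidden=100, rng=np.random.default_rng(0)):
    """Single-hidden-layer ELM: random input weights, analytic output weights."""
    W = rng.standard_normal((X.shape[1], n_hidden))   # random input weights (never trained)
    b = rng.standard_normal(n_hidden)                 # random hidden biases
    H = np.tanh(X @ W + b)                            # hidden-layer output matrix
    beta = np.linalg.pinv(H) @ y                      # least-squares output weights
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta
```

Because only `beta` is solved for, training reduces to one pseudo-inverse, which is what gives the ELM its speed.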
KELM was designed by adding a radial basis function (RBF) kernel that is based on ELM [72]. The selection of RBF kernels aims to map samples into a high-dimensional mapping space, further solving nonlinear problems. In addition, the RBF kernel has few parameters; only the penalty factor and kernel parameters need to be considered. However, the choice of hyperparameters has a certain impact on the effect of model fitting. Therefore, it is necessary to select an appropriate hyperparameter optimization method according to the specific problem.
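A closed-form KELM sketch under these assumptions (the hyperparameter names `C` and `gamma` follow the text; the regularized solve is the standard KELM formulation, not code from the paper):

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """Pairwise RBF kernel exp(-gamma * ||a - b||^2)."""
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * d2)

def kelm_train(X, y, C=1.0, gamma=0.1):
    """KELM output weights: beta = (I/C + Omega)^-1 y, Omega the kernel matrix."""
    Omega = rbf_kernel(X, X, gamma)
    return np.linalg.solve(np.eye(len(X)) / C + Omega, y)

def kelm_predict(Xtest, Xtrain, beta, gamma=0.1):
    return rbf_kernel(Xtest, Xtrain, gamma) @ beta
```

Only `C` (penalty factor) and `gamma` (kernel parameter) remain to be tuned, which is exactly the two-dimensional search space handed to GLLCSA later in the paper.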
Suppose there exists a d-dimensional search space containing N crows (the group size). The position of crow i at time iteration is defined by a vector x_{i,iteration} (i = 1, 2, . . . , N; iteration = 1, 2, . . . , iteration_max), where iteration_max is the maximum number of iterations. As the iterations proceed, the hiding position of crow i is represented by m_{i,iteration}, the best position it has obtained so far; in the memory of each crow, the position it considers optimal is remembered. The crow then moves through the spatial environment looking for a better location for its food.
Suppose that at iteration iter, crow j wants to visit its hiding place m_{j,iteration}, and crow i decides to follow crow j to approach this hideout. Two states may occur. State 1: crow j does not know that crow i is tracking it. As a result, crow i approaches crow j's hiding position, and the new location of crow i is obtained as follows:

x_{i,iteration+1} = x_{i,iteration} + r_i × fl_{i,iteration} × (m_{j,iteration} − x_{i,iteration}) (1)

where r_i is a random value in [0, 1] and fl_{i,iteration} is the flight length of crow i at iteration iter. State 2: crow j knows that crow i is tracking it. Therefore, to keep its food from being stolen, crow j deceives crow i by moving to another random position in the search space.
In general, state 1 and state 2 can be expressed as:

x_{i,iteration+1} = x_{i,iteration} + r_i × fl_{i,iteration} × (m_{j,iteration} − x_{i,iteration}), if r_j ≥ AP_{j,iteration}; a random position, otherwise (2)

where r_j is a random number uniformly distributed between 0 and 1, and AP_{j,iteration} represents crow j's perception probability at iteration iter.
A metaheuristic algorithm (MA) should maintain a good equilibrium between exploration and exploitation. In the CSA, the perception/awareness probability (AP) parameter controls this balance. By decreasing the value of the perception probability, the CSA tends to search near the current good solutions in the local area; therefore, a smaller AP value enhances the local search capability. Conversely, as the perception probability increases, the ability to refine locally optimal solutions decreases and is replaced by an improved ability to search the global space.
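One iteration of the CSA update described above can be sketched as follows (a minimal illustration with placeholder bounds and parameter values, not the authors' code; `M` holds the crows' memory positions m):

```python
import numpy as np

def csa_step(X, M, AP=0.1, fl=2.0, bounds=(-10.0, 10.0), rng=np.random.default_rng(1)):
    """One CSA iteration: crow i follows a random crow j's memory m_j (state 1),
    or flies to a random position if j perceives the pursuit (state 2, prob AP)."""
    N, D = X.shape
    Xnew = X.copy()
    for i in range(N):
        j = rng.integers(N)
        if rng.random() >= AP:                          # state 1: j is unaware
            Xnew[i] = X[i] + rng.random() * fl * (M[j] - X[i])
        else:                                           # state 2: j noticed -> random move
            Xnew[i] = rng.uniform(bounds[0], bounds[1], D)
    return np.clip(Xnew, bounds[0], bounds[1])
```

Raising `AP` makes state 2 (random relocation, i.e., exploration) more frequent, which is exactly the exploration/exploitation trade-off the paragraph describes.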

Gradient Search Rule and Local Escaping Operator and Levy Flight Operator (GLL)
In this part, we will introduce the GLL mechanism, which aims to achieve better performance for the CSA and prevent the CSA from sinking into the local optimum (LO). In this paper, the GSR, LEO, and LM operators are used to improve the personal position updating function of the CSA.
The GSR strategy has shown great capability in many tasks [89][90][91][92][93]. In this study, the GSR is mixed into the position update of the CSA to enhance the global search capability and give the population search strong robustness. The GSR can be expressed as:

GSR = randn × ρ_1 × (2∆x × x_n)/(x_worst − x_best + ε) (3)

where randn is a normally distributed random number, ∆x is the value of the increment, ε is a small number in [0, 0.1], and x_best and x_worst represent the best and worst solutions gained in the search process, respectively. Equation (3) updates the current solution. In addition, to better balance the overall behavior of the algorithm, the adaptive coefficient ρ_1 is introduced to modify the GSR. The purpose of the optimization algorithm is to search both locally and globally to achieve optimal results; ρ_1 is the vital variable borrowed from the GBO to achieve this purpose, and it can be expressed as:

ρ_1 = 2 × rand × α − α (4)
α = |β × sin(3π/2 + sin(3πβ/2))| (5)
β = β_min + (β_max − β_min) × (1 − (m/M)^3)^2 (6)

where β_min and β_max are equal to 0.2 and 1.2, respectively; m expresses the current iteration number; and M represents the total number of iterations. From the principle of the CSA, the GSR mechanism enhances the ability to search the global space. In Equation (3), ∆x is determined by the difference between the best solution (x_best) and a randomly selected position (x_r1), as given in Equations (7)-(9); the parameter δ is defined by Equation (9), into which a random number (rand) is mixed to improve exploration:

∆x = rand(1:N) × |step| (7)
step = ((x_best − x_r1) + δ)/2 (8)
δ = 2 × rand × (|(x_r1 + x_r2 + x_r3 + x_r4)/4 − x_n|) (9)

where rand(1:N) is an N-dimensional random vector; r1, r2, r3, and r4 (r1 ≠ r2 ≠ r3 ≠ r4 ≠ n) are unequal integers chosen from 1 to N at random; and step is the step size determined by x_best and x_r1. On the other hand, the local escaping operator (LEO) is mixed into the CSA to improve the search performance in the local space.
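The GSR computation can be sketched as below (a minimal illustration following the standard GBO-style formulas; the argument names, the pooled random solutions `r_pool`, and all default values are assumptions for the sketch, not the paper's code):

```python
import numpy as np

def gsr_term(x, x_best, x_worst, x_r1, r_pool, m, M,
             beta_min=0.2, beta_max=1.2, eps=1e-8, rng=np.random.default_rng(2)):
    """Gradient-search-rule increment as mixed into the CSA position update.
    r_pool stacks the four randomly chosen solutions x_r1..x_r4 (shape (4, D))."""
    beta = beta_min + (beta_max - beta_min) * (1 - (m / M) ** 3) ** 2
    alpha = abs(beta * np.sin(3 * np.pi / 2 + np.sin(3 * np.pi * beta / 2)))
    rho1 = 2 * rng.random() * alpha - alpha            # adaptive coefficient
    delta = 2 * rng.random() * np.abs(np.mean(r_pool, axis=0) - x)
    step = (x_best - x_r1 + delta) / 2                 # step sized by x_best and x_r1
    dx = rng.random(x.shape) * np.abs(step)            # randomized increment
    return rng.standard_normal() * rho1 * 2 * dx * x / (x_worst - x_best + eps)
```

The adaptive `rho1` shrinks as `m` approaches `M`, gradually shifting the operator from exploration toward refinement.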
The principle is to steer the moving direction of the current solution toward the best solution found so far in the local space, thus speeding up the algorithm's convergence. The LEO can be expressed as in Equation (10):

x_LEO = x_n + f_1 × (u_1 × x_best − u_2 × x_k) + f_2 × ρ_2 × (u_3 × (X2_n − X1_n) + u_2 × (x_r1 − x_r2))/2 (10)

where rand is a random number from 0 to 1, and ρ_2 is a random coefficient based on α, which gives the vector different step sizes. ρ_2 is given by:

ρ_2 = 2 × rand × α − α (11)

Finally, Equations (12) and (13) update the position of the current vector (x_n) as established by the GSR and LEO mechanisms. The vector X1_n is generated from the current vector (x_n), and by replacing x_n with the position of the best vector (x_best) in Equation (12), the new vector X2_n is generated as follows:

X1_n = x_n − GSR + rand × ρ_2 × (x_best − x_n) (12)
X2_n = x_best − GSR + rand × ρ_2 × (x_r1 − x_r2) (13)

These two update rules search in complementary directions: Equation (12) is better in the global space but worse in the local space, and Equation (13) is vice versa. Therefore, the CSA combines Equations (12) and (13) to balance the exploration and exploitation capabilities. Thus, based on the positions X1_n and X2_n, the vectors X3_n and x_n^(new) can be defined as:

X3_n = x_n − ρ_1 × (X2_n − X1_n) (14)
x_n^(new) = r_a × (r_b × X1_n + (1 − r_b) × X2_n) + (1 − r_a) × X3_n (15)

where r_a and r_b are two random numbers in [0, 1].
where f_1 is a uniform random number in [−1, 1]; f_2 is a random number drawn from the standard normal distribution; and u_1, u_2, and u_3 are random coefficients generated as in Equations (21)-(23):

u_1 = L_1 × 2 × rand + (1 − L_1) (21)
u_2 = L_1 × rand + (1 − L_1) (22)
u_3 = L_1 × rand + (1 − L_1) (23)

where rand is a random number from 0 to 1, μ_1 is a random number in [0, 1], and L_1 is a binary parameter with a value of 0 or 1: if μ_1 is less than 0.5, the value of L_1 is 1, otherwise it is 0.

To determine the solution x_k used in the LEO update (Equation (10)), the following scheme is suggested: if μ_2 < 0.5, x_k takes a newly generated random solution x_rand; otherwise, it takes x_p, a randomly selected solution of the population (p ∈ [1, 2, . . . , N]), where μ_2 is a random number in the range [0, 1] and

x_rand = X_min + rand × (X_max − X_min) (27)

This scheme can be simplified as:

x_k = L_2 × x_rand + (1 − L_2) × x_p (28)

where L_2 is a binary parameter with a value of 0 or 1: if μ_2 is less than 0.5, the value of L_2 is 1, otherwise it is 0. Moreover, the Levy mutation mechanism (LM) is adopted to update x_i, as expressed in Equation (31).
where β is an index of stability (β = 1.5 in this work). The Levy random number can be described by the following formula:

L(β) = (μ × φ)/|v|^(1/β) (32)

where μ and v are standard normal random variables, Γ is the standard Gamma function, and φ is defined as below:

φ = [Γ(1 + β) × sin(πβ/2)/(Γ((1 + β)/2) × β × 2^((β−1)/2))]^(1/β) (33)

where L(β) is the random number drawn from the Levy distribution. Because of its heavy-tailed distribution, the LM operator is likely to generate markedly different offspring; hence, it can help individuals escape local optima at little extra cost. The implementation of GLLCSA is detailed below.
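The Levy steps used by the LM operator are commonly sampled with Mantegna's algorithm, which matches the formulas above; a minimal sketch (dimension and seed are arbitrary):

```python
import numpy as np
from math import gamma, pi, sin

def levy(dim, beta=1.5, rng=np.random.default_rng(3)):
    """Mantegna's algorithm: Levy-distributed steps with stability index beta."""
    phi = (gamma(1 + beta) * sin(pi * beta / 2) /
           (gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.standard_normal(dim) * phi    # numerator mu scaled by phi
    v = rng.standard_normal(dim)          # denominator v
    return u / np.abs(v) ** (1 / beta)    # heavy-tailed step vector
```

Most steps are small, but occasional very large jumps occur, which is the heavy-tailed behavior that lets mutated individuals escape local optima.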

GLLCSA
We combined all of the above strategies; the pseudo-code of the whole GLLCSA framework is shown in Algorithm 1, and Figure 1 shows its flowchart. In the proposed GLLCSA, almost all parameters are affected by the group size, but the new tactics do not greatly increase the complexity of the original algorithm. GLLCSA adopts a gradient search strategy, which borrows the Newton method to shape the gradient descent rate and helps coordinate the contribution of each operator. According to Figure 1, the time complexity of GLLCSA relates to the following factors: initialization, fitness evaluation, individual location updates, sorting, the gradient search mechanism, the local escaping operator, and Levy flight mutation. Since the time consumed by fitness evaluation is determined by the concrete optimization problem, the computational complexity analysis centers on the other six aspects. Assuming a population of N agents, initialization costs O(N) and the sorting mechanism costs O(N × logN). The computational complexity of the individual location update is O(G × N × D), where G is the maximum number of iterations and D is the dimension, and the GSR, LEO, and LM update stages also cost O(G × N × D). Therefore, the computational complexity of GLLCSA is O(N + N × logN) + O(G × D × (2N + 1)), which is of the same order as the original CSA.

Proposed GLLCSA-KELM-FS Model
In this section, the proposed GLLCSA-KELM-FS model is introduced in detail, as shown in Figure 2. First of all, the input data needs to be preprocessed; the first is the normalization operation and feature selection to eliminate the redundant partial features. The ten-fold cross-validation was used to avoid overfitting. For the KELM technique, the input data was mapped into the hidden layer space through the RBF kernel, including the hyperparameters C and γ to be optimized and the n features. The proposed GLLCSA was adopted to optimize the two hyperparameters involved. Specifically, the dataset was divided into 10 parts, with 9 parts selected as training data and 1 part used as the test data, in turn. Further, the nine pieces of data were divided into five pieces again, and four pieces of data were selected, in turn, as training data, which were combined with GLLCSA and used to train the KELM model. We evaluated the trained KELM model with one copy as the test data and obtained an optimal KELM model. Finally, for this model, the original reserved one test data was used to evaluate the performance of the resulting KELM model.
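The outer loop of this evaluation protocol can be sketched as below (a minimal numpy-only illustration; the `fit`/`predict` callables are placeholders, and a fixed model stands in for the inner 5-fold GLLCSA hyperparameter search described in the text):

```python
import numpy as np

def kfold_indices(n, k=10, rng=np.random.default_rng(4)):
    """Shuffle indices and split them into k roughly equal folds."""
    return np.array_split(rng.permutation(n), k)

def outer_cv(X, y, fit, predict, k=10):
    """Each fold serves once as the held-out test set; the rest trains the model.
    In the paper, the inner GLLCSA search would run on the training portion here."""
    accs = []
    for fold in kfold_indices(len(y), k):
        mask = np.ones(len(y), bool)
        mask[fold] = False                      # everything outside the fold trains
        model = fit(X[mask], y[mask])
        accs.append(np.mean(predict(model, X[fold]) == y[fold]))
    return float(np.mean(accs))
```

Keeping the outer test fold untouched by the hyperparameter search is what lets the final accuracy estimate remain unbiased.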


Test Experiments of Benchmark Function Sets
In the experimental part, GLLCSA was evaluated from a variety of aspects through a set of experiments on benchmarks and practical problems. To ensure fairness, we adopted the same environment and parameter settings within each experiment. The population size and the maximum number of iterations were set to 30 and 500, respectively, and each algorithm was run independently on each function a fixed number of times to reduce the influence of randomness. Two main metrics were used to evaluate the proposed algorithm's performance: the average result (AVG) and the standard deviation (STD). In addition, the optimal result obtained on each test function is shown in bold.

Benchmark Functions
To compare the proposed algorithm and other algorithms, this experiment used 30 classical functions, including the unimodal functions, multimodal functions, hybrid functions, and composition functions. These 30 functions are all taken from CEC 2014 [94]. F1-F3 represent the unimodal functions, F4-F16 are simple multimodal functions, F17-F22 are hybrid functions, and F23-F30 are composition functions. Thirty different types of benchmarks allow a more comprehensive evaluation of the performance of the proposed algorithms.

Comparison with Classical Algorithms
In this section, we compare GLLCSA with BA [95], SCA [96], PSO, MFO, GWO, CESCA [12], OBLGWO [97], IGWO [86], CGPSO [98], CBA [99], and RCBA [100] on CEC 2014 to test its performance. Among these, the unimodal functions (F1-F3) test the local search ability, the multimodal functions (F4-F16) test the global search ability, the hybrid functions (F17-F22) test the combined global and local ability, and the composition functions (F23-F30) can be used to fully test the performance of the algorithm in all respects. Some of the parameter settings of the different algorithms are shown in Table 1. The unimodal and multimodal functions are widely used in the literature, as shown in Tables 2-4. The parameter Dim expresses the dimension of the selected objective function, the parameter Range gives the boundary of the function search space, and the parameter f_min represents the optimal value of the function. Table 1. Parameters for involved methods.

Moreover, the value of Dim is set to 30, the population size is set to 30, and the maximum number of iterations is 1000. In addition, we performed 30 independent tests for each function.

Results on 30D Functions
The proposed GLLCSA displays the lowest average value over the 30 functions, and the detailed comparison results of GLLCSA and the eleven other peers can be seen in Table 5. Moreover, the results on F2, F3, F5, F7, F8, F10, F14, F17, F18, F21, and F23-F29 bear out the ability of this method to obtain the highest-quality solutions. Compared with the other competitive algorithms, the statistical performance of GLLCSA is verified on the CEC 2014 benchmark functions. For further statistical comparison, the ranking of the 12 algorithms according to the ARV index is shown in Table 6: when searching for the minimum value of a function, GLLCSA is better than the other methods, followed by IGWO, CGPSO, OBLGWO, PSO, BA, GWO, RCBA, CBA, MFO, SCA, and CESCA. Table 7 gives the Wilcoxon signed-rank test [101], which indicates whether the differences between GLLCSA and the other algorithms are significant; as can be seen from the table, GLLCSA performs better than SCA and CESCA on all test functions. In conclusion, compared with SCA, CESCA, and the other nine algorithms, GLLCSA achieves the best results on these test functions. In addition, to display its significant advantages, Figure 3 shows the convergence curves of GLLCSA, IGWO, CGPSO, OBLGWO, PSO, BA, GWO, RCBA, CBA, MFO, SCA, and CESCA on the 30 benchmarks. As the figure shows for F2, F3, and related problems, GLLCSA improves both the convergence speed and the convergence accuracy over the original CSA. Furthermore, GLLCSA can reach the optimal solution when dealing with F5, F8, F10, F17, F18, F21, F27, F28, and F29.
Since both F12 and F13 are multimodal functions, they have many locally optimal solutions. Although GLLCSA also falls into a locally optimal solution on these functions, its convergence speed and accuracy are the best among the compared algorithms. This shows that the GLL-based CSA has a good ability to avoid LO during exploration. Therefore, GLLCSA has clear advantages over the original CSA.

Data Collection
The data for the study was derived from 842 graduates from Wenzhou Vocational and Technical College in Zhejiang, China, over five years. The dataset covers a total of ten features, including gender, political affiliation (PA), major, place of student's source (PSS), family financial situation (FFS), practical experience in innovation and entrepreneurship on campus (PEIEC), training course of innovation and entrepreneurship (TCIE), grade point average (GPA), scholarship awards (SA) and proactive personality (PP). In the program, the above ten features are marked with F1-F10, in turn. A detailed description of the dataset can be found in reference.

Condition Configuration
All experiments were carried out in the same simulation environment, based on Windows 10 and MATLAB 2016a. The experimental setting is important across the computational sciences, such as drug discovery [102,103], information retrieval services [104][105][106], network analysis [107,108], active surveillance [109], and disease prediction [110][111][112]. For the sake of fairness, this experiment used statistical methods, namely the mean and standard deviation, to analyze ten identical experiments: the average result (Avg) and standard deviation (Std) represent the average prediction result and its dispersion for each model over the ten experiments, respectively. To find the best model, we evaluated each model using 10-fold cross-validation, which has been adopted in many studies [43,44,113]. Moreover, the prediction results were analyzed using common metrics [114][115][116]: accuracy (ACC), sensitivity (Sens), specificity (Spec), and the Matthews correlation coefficient (MCC).
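The four evaluation metrics follow directly from the binary confusion matrix; as a small self-contained sketch (standard definitions, not code from the paper):

```python
import numpy as np

def clf_metrics(y_true, y_pred):
    """ACC, sensitivity, specificity, and MCC from a binary confusion matrix."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    acc = (tp + tn) / len(y_true)
    sens = tp / (tp + fn)                     # true positive rate
    spec = tn / (tn + fp)                     # true negative rate
    mcc = (tp * tn - fp * fn) / np.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return acc, sens, spec, mcc
```

MCC is the most demanding of the four here, since it only approaches 1 when both classes are predicted well, which is why the paper reports it alongside accuracy.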

Experiment Results
In this section, we performed statistical analyses comparing the proposed GLLCSA-KELM-FS model with five other models: GLLCSA-KELM, CSA-KELM, KELM, RF, and FKNN. Table 8 shows the average results of the ten experiments for GLLCSA-KELM-FS and the five other models, with the best results in bold. It can be seen that the accuracy of the GLLCSA-KELM-FS model in predicting the entrepreneurial intention of graduates is as high as 93.20%, while the accuracies of the other five models are 90.17%, 88.22%, 88.81%, 89.42%, and 87.29%, respectively. For the sensitivity metric, the result of KELM is 0.99% higher than that of GLLCSA-KELM-FS. Regarding the specificity and MCC metrics, the GLLCSA-KELM-FS model achieves the best results. Table 9 shows the standard deviations over the ten experiments for GLLCSA-KELM-FS and the five other models: the GLLCSA-KELM-FS model has the best stability on the ACC, sensitivity, and MCC metrics, while on the specificity index the RF model is the most stable. For a better description, Figure 4 shows histograms of the above Avg and Std values in detail. It can be seen that the accuracy, specificity, and MCC of the proposed GLLCSA-KELM-FS model are higher than those of the other models; in terms of stability, its specificity is only inferior to RF. The key features of the graduates' entrepreneurial intentions screened by FS in the experiment are shown in Figure 5, in which features F3, F6, and F10 occur 9, 8, and 8 times, respectively; these can be used as the key features to classify and measure students' entrepreneurial intentions.


Discussion
According to the above research results, graduates' entrepreneurial intention is affected by many factors. Among them, the major (F3), campus innovation and entrepreneurship practice experience (F6), and positive personality (F10) have the greatest impact. Usually, the majors chosen by students have the necessary connections to their subsequent entrepreneurial intentions. Of course, the practical experience of campus innovation and entrepreneurship also promotes students' entrepreneurial preferences. Students who have participated in entrepreneurial practice have significantly stronger entrepreneurial motivation than students without entrepreneurial experience. In addition, it was found that entrepreneurial talents have the characteristics of "endogenous growth", and entrepreneurial cognition and student behavior are easily affected by positive personality. There is also a moderating effect between the active personality with the growth ability in the new environment and the students' entrepreneurial intention. The stronger the active personality, the stronger the motivation for entrepreneurial behavior, and the stronger the entrepreneurial practice ability. The difficulty of this paper mainly lies in the construction of the GLLCSA-KELM-FS model. First, the design of GLLCSA was an important difficulty. Second, how to filter out the key features from complex datasets was another important difficulty. Finally, how to combine GLLCSA with KELM and FS was the key to solving these problems.
This paper also has some limitations. First, the sample data is limited; an appropriate amount of data can effectively avoid underfitting and improve the accuracy of model fitting. Secondly, the characteristics of graduates' entrepreneurial intentions involved in this paper still need to be explored further. Finally, the generality of the fitted model needs to be further verified. Owing to its optimization potential, the proposed GLLCSA can also be applied to other complex tasks, such as recommender systems [117][118][119], location-based services [120,121], human motion capture [122], text clustering [123], kayak cycle phase segmentation [124], drug-disease association prediction [125], practical engineering problems [126,127], medical diagnosis [28,128], fault diagnosis [129], and solar cell parameter identification [130].

Conclusions and Future Works
This paper constructs a reliable student entrepreneurial intention prediction model, namely the GLLCSA-KELM-FS model. On the one hand, the model optimizes two hyperparameters of KELM through GLLCSA, aiming to obtain the best-fitting model; on the other hand, it uses FS to extract the key features that affect students' entrepreneurial intentions. The main innovation of this paper is introducing the GLL mechanism to effectively improve the performance of the CSA and obtain the best combination of KELM parameters. The benchmark function comparison experiments effectively verified the performance of GLLCSA. Further, GLLCSA-KELM-FS was used to predict the entrepreneurial intentions of 842 students from the Wenzhou Vocational College in Zhejiang over the past five years. Compared with the five other machine learning methods, the GLLCSA-KELM-FS model can correctly predict students' entrepreneurial intentions with an accuracy of 93.2%. In addition, the key factors affecting students' entrepreneurial intention are the major studied, campus innovation and entrepreneurship practice experience, and positive personality. This research can explore in depth how colleges and universities can more scientifically cultivate students' entrepreneurial ability through the relevant factors, so as to help students position their careers more concretely and rationally. Moreover, it can be used to help cultivate entrepreneurial talents so that they can make more conscious and focused career decisions.
In the follow-up research, the generality of the proposed GLLCSA-KELM-FS will be further improved, and the employment intention of more college students will be predicted. Furthermore, it can also be used to solve other problems, such as disease diagnoses and financial risk predictions [131,132]. Further, the GLLCSA method can also be used to optimize the hyperparameters of other models and solve more complex optimization problems [133,134].