Prediction of Hypertension Outcomes Based on Gain Sequence Forward Tabu Search Feature Selection and XGBoost

For patients with hypertension, serious complications, such as myocardial infarction, a common cause of heart failure, occur in the late stage of the disease. Hypertension outcomes, up to and including death, threaten patients' lives and need to be predicted. In our research, we reviewed hypertension medical data from a tertiary grade-A hospital in Beijing and established a hypertension outcome prediction model based on machine learning theory. We first proposed a gain sequence forward tabu search feature selection (GSFTS-FS) method, which can search for the optimal combination of medical variables that affect hypertension outcomes. On this basis, the XGBoost algorithm was used to establish the prediction model because of its good stability. We verified the proposed method by comparing it with other models commonly used in similar works. The proposed GSFTS-FS improved performance by about 10%. The proposed prediction method performed best: its AUC, accuracy, F1, and recall under 10-fold cross-validation were 0.96, 0.95, 0.88, and 0.82, respectively. It also performed well on the test dataset, with 0.92, 0.94, 0.87, and 0.80 for AUC, accuracy, F1, and recall, respectively. Therefore, XGBoost with GSFTS-FS can accurately and effectively predict the occurrence of outcomes in patients with hypertension, and can provide guidance for doctors in clinical diagnosis and medical decision-making.


Introduction
Hypertension outcomes can include serious complications (e.g., cerebrovascular disease, myocardial infarction, and stroke), and even death when the condition progresses to a terminal stage. Hypertension is one of the most common chronic human diseases, and serious hypertension complications can greatly endanger the life and health of a patient, causing irreversible damage to the patient's heart, brain, kidneys, and fundus.
The heart complications of hypertension mainly include left ventricular hypertrophy, angina pectoris, myocardial infarction, and heart failure. Hypertension can damage the blood vessels of the heart, mainly the coronary arteries, eventually causing coronary atherosclerosis; the myocardial blood supply is reduced, resulting in coronary heart disease. The brain complications of hypertension mainly include hemorrhagic stroke, ischemic stroke, and hypertensive encephalopathy. Among them, cerebral hemorrhage is one of the most serious hypertension complications. The kidney complications of hypertension mainly include malignant arteriolar nephrosclerosis and chronic renal failure. The main manifestations of hypertension in the kidneys are proteinuria and impaired renal function; some patients tend to have impaired distal renal tubular concentrating function in the late stage of hypertension. The fundus complications of hypertension include retinal arteriosclerosis; patients may experience decreased vision, fundus bleeding, cataracts, and blindness.
Due to these severe outcomes, patients and doctors aim to prevent the occurrence and progression of hypertension. To effectively reduce the incidence of outcomes, the outcomes must first be predicted effectively so that they can be prevented before they occur. However, it is still very difficult to detect the threat of outcomes because there are no obvious early signs of hypertensive complications. In addition, there are few studies on hypertension outcome prediction; the current research approaches the problem mainly from a medical perspective, and prediction remains difficult.
Computer science and data analysis technology provide new methods and ideas for the prediction of hypertension outcomes. On the one hand, machine learning, data mining, and information sciences have been widely used in various fields of medicine and achieved good results [1][2][3][4][5][6][7], providing technical support for this study. On the other hand, the application of electronic medical records and databases, automatic and electronic medical equipment, and increasing emphasis placed on medical data by hospitals and medical institutions all promote the digitalization of medical information. As a result, massive amounts of medical data on hypertension patients were obtained and preserved, providing data for this study.
This paper takes the medical data of hypertension patients provided by the hypertension center of a tertiary grade-A hospital in Beijing as the research object and predicts the occurrence of outcomes by machine learning technology. The purpose of this study was to map the relationships between medical indicators and outcomes through analyses of hypertension medical data, so that when the medical data of a new patient are input, the model determines, with a certain probability, whether a hypertension outcome will occur. This was a supervised machine learning classification problem with two main tasks. (1) The first task was to reduce the dimension of the medical data. The patients' medical indices were numerous and high-dimensional, including blood pressure indices, blood routine, urine routine, etc. From the perspective of data mining, the high-dimensional data set contained irrelevant and redundant information and noise, which affected the accuracy of the prediction model. From the perspective of medical application, the fewer the indicators needed to predict the outcomes, the easier the indicator acquisition and the lower the prediction cost. (2) The second task was to establish a machine learning model to predict hypertension outcomes and evaluate its predictive performance.
In the first task, this paper constructed a feature selection method based on gain sequence forward tabu search (GSFTS) to automatically select high-quality feature combinations. This method can greatly reduce the data dimension and improve the prediction accuracy. More importantly, the feature selection method automatically helps doctors identify the key factors for hypertension outcomes among a large number of medical indicators. In the second task, this paper adopted XGBoost to predict hypertension outcomes.
Feature selection is the process of selecting a feature subset in a given set of attributes. The dimensions of medical data are usually very high. It is important to select the best feature subset to reduce the processing cost, and improve the practicability of the model constructed from it.
The search strategy and the feature evaluation function are the key steps of feature selection. The wrapper method uses classifier performance as the evaluation function for feature selection. The embedded method combines feature selection with the learning process. The filter method performs feature selection before training the learner [8,9]. A sequence search strategy adds or deletes one or more features at each step, and the feature evaluation function determines whether the deletion or addition is effective. Feature selection is therefore essentially a search optimization problem, and heuristic algorithms, such as the genetic algorithm (GA), simulated annealing (SA), ant colony algorithm (ACA), and tabu search (TS), are options for solving the combinatorial optimization problem and finding better solutions. Heuristic algorithms have been widely used in feature selection and have achieved good results in medicine [10,11].
XGBoost (Extreme Gradient Boosting) is a commonly used and efficient machine learning algorithm, and its effect is remarkable [12][13][14][15][16]. For example, C. Ye et al. (2018) constructed a risk prediction model for essential hypertension using electronic health data and the XGBoost algorithm [13]. Y. Liu compared four machine learning algorithms through experiments and showed that XGBoost has advantages in predicting hypertension outcomes [16].
At present, some scholars have applied machine learning methods to the prediction of hypertension complications and related diseases [17][18][19][20][21][22]. The methods include random forests, support vector machines, logistic regression, decision trees, etc. We noticed that these studies focus on using traditional, standard machine learning algorithms to build models, without considering the impact of medical features on the prediction results or the efficiency of machine learning on large-scale patient data. Therefore, on the one hand, we proposed a new feature selection method to better explore which medical features affect hypertension outcomes; on the other hand, we used a novel ensemble learning method, XGBoost, to process large-scale medical data efficiently and meet actual needs. The method proposed in this study is therefore innovative, integrating theory and practice.
This article is divided into four parts. The specific organizational structure is as follows: Section 1 mainly introduces the research background, literature review, research contents, and ideas. Section 2 introduces the method and model, and proposes a hypertension outcome prediction model based on GSFTS-FS and XGBoost. Section 3 is an empirical study, the results of which are analyzed and discussed to verify the effectiveness of the proposed method. Section 4 is the conclusion.

Medical Data and Preprocessing
The medical data used in this study were obtained from the hypertension center of a tertiary grade-A hospital in Beijing. The hospital collected data from 1357 patients with hypertension from September 2012 to December 2016. The patients came from various regions of China. The data set was divided into two parts. One part was the medical examination data and related survey data (i.e., the characteristic data) collected during patient admission. The other part was the data on whether outcomes occurred in the patients (i.e., the labeled data: yes/no or 1/0), marked by hospital staff during the follow-up period after the patients were discharged. The characteristic data included baseline data, limb blood pressure, ambulatory blood pressure, echocardiography, heart failure, and other categories, for a total of 132 examination indicators. The outcomes involved complications of the four target organs: heart, brain, kidney, and fundus. Table 1 shows the name, medical description, data type, mean value, standard deviation, and data distribution range of some medical indicators in the data set. Table A1 in Appendix A lists all the medical features considered in this study.
There are impurities in the original data: (1) there are a large number of missing values in the original data set, and some attributes have more than 90% missing values; (2) there are some abnormal values that exceed the regular distribution interval of the attribute; (3) the dimensional units of different physical examination indicators differ.
Deletion and mean interpolation are used to handle missing values: features and samples with more than 50% missing values are deleted directly, and the remaining missing values are interpolated with the attribute mean. Outliers are deleted directly. In this study, the maximum-minimum standardization method is used to unify the dimensions.
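A minimal pandas sketch of this pipeline is shown below; the 3-sigma outlier rule is our own assumption, since the text does not specify the outlier criterion, and the function name is illustrative.

```python
import numpy as np
import pandas as pd

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Clean raw hypertension records as described above."""
    df = df.copy()
    # 1. Drop features, then samples, with more than 50% missing values.
    df = df.loc[:, df.isna().mean() <= 0.5]
    df = df.loc[df.isna().mean(axis=1) <= 0.5]
    # 2. Mean-interpolate the remaining missing values.
    df = df.fillna(df.mean(numeric_only=True))
    # 3. Drop outlier rows (exact rule unspecified in the paper;
    #    a 3-sigma filter is assumed here purely for illustration).
    num = df.select_dtypes(include=np.number)
    df = df.loc[((num - num.mean()).abs() <= 3 * num.std()).all(axis=1)]
    # 4. Maximum-minimum standardization to remove dimensional units.
    num = df.select_dtypes(include=np.number)
    df[num.columns] = (num - num.min()) / (num.max() - num.min())
    return df
```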

Gain Sequence Forward Tabu Search Feature Selection (GSFTS-FS)
In this study, we proposed a new medical feature selection (FS) strategy called gain sequence forward tabu search (GSFTS). GSFTS-FS is a wrapper feature selection method: it takes the performance of the prediction model as the criterion and objective function for evaluating the quality of the selected feature subset. It is divided into three main steps. First, XGBoost ranks and scores feature importance based on average gain. Second, a sequence forward search based on this ranking is performed to obtain an initial feature combination. Finally, the selected feature combination is further optimized by the tabu search algorithm. The basic steps of the GSFTS-FS algorithm are shown in Figure 1.
Based on the concept of the GSFTS algorithm, the specific process is as follows:
1. Feature importance ranking. We first build an initial classifier (XGBoost) and fit the data. For each feature, we calculate the average information gain across all of its split points in XGBoost and rank the features accordingly. The higher the gain, the greater the feature's contribution and the higher its importance.
2. Initial solution by gain-based sequence forward search. The traditional tabu search algorithm has two shortcomings: (1) strong dependence on the initial solution; a good initial solution helps the search reach the optimal solution quickly, while a bad one often makes the optimum difficult or impossible to reach; (2) the running time of the algorithm is greatly affected by the initial solution; a better initial solution pushes the search toward the optimum in fewer iterations, reducing the search time, whereas a poor one needs many iterations to approach the optimum, prolonging the search.
Aiming to provide a better initial solution for tabu search, we propose a new sequence forward search based on feature importance ranking by information gain (gain-based SFS). Suppose the feature importance ranking is (Fa, Fb, Fc, ...); the specific steps are as follows:
(1) Add the feature Fa, which ranks first in importance, to the feature subset S. The current subset is S' = {Fa}, the dimension of the subset is i = 1, and the classification accuracy on the training set is selected as the evaluation function f.
(2) Calculate the evaluation function score f(S') under the current feature subset.
(3) According to the order of feature importance, add the feature ranked i + 1 to the feature subset S'.
(4) Calculate the evaluation function score f(S') under the current feature subset; if the score drops, stop searching; if the score rises, repeat step (3).
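To make steps (1)-(4) concrete, here is a minimal sketch, assuming a NumPy feature matrix X, labels y, and training-set accuracy as the evaluation function f; all names are illustrative.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier

def gain_based_sfs(X, y):
    """Add features in descending order of XGBoost gain importance
    until the evaluation score stops rising (steps (1)-(4))."""
    # Step 1: rank features by average gain from an initial XGBoost fit.
    ranker = XGBClassifier(importance_type="gain").fit(X, y)
    order = np.argsort(ranker.feature_importances_)[::-1]
    subset, best = [], -np.inf
    for idx in order:
        trial = subset + [int(idx)]
        clf = XGBClassifier().fit(X[:, trial], y)
        score = accuracy_score(y, clf.predict(X[:, trial]))
        if score < best:        # score dropped: stop searching (step (4))
            break
        subset, best = trial, score
    return subset               # initial solution for the tabu search
```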
3. Encoding. We propose a coding structure, shown in Figure 2, before the tabu search. It consists of three parts. The first part, F1, F2, F3, ..., Fn, represents each feature in an n-dimensional medical feature set by a 0/1-bit string: if the feature is in the feature subset, then Fi (i ∈ [1, n]) is 1; otherwise it is 0. The second part is the objective function (accuracy, precision, recall, F1, or AUC), and the third part is the selected classification algorithm. Figure 3 is an example of an initial solution after encoding.

4. Neighborhood feasible solution. It is important to generate neighborhood feasible solutions based on the current solution. The specific method is to randomly select a feature code in the current solution: if the code is 0, add the feature (change the code to 1); if it is 1, delete the feature (change the code to 0). Each neighborhood feasible solution differs from the current solution by only one feature code. A specified number of neighborhood feasible solutions is then generated; this number is the candidate set length. Figure 4 shows four neighborhood feasible solutions generated from the initial solution. According to the selected classifier (DT) and evaluation function (AUC), the best of the four feasible solutions in the neighborhood is selected and regarded as the current optimal solution in the next iteration.
5. Tabu movement. If moving feature i (adding or deleting it) makes a neighborhood feasible solution the current optimal solution, then feature i cannot be selected (added or deleted) in the next T (tabu list length) iterations. For example, the third feasible solution in Figure 4 becomes the current optimal solution because feature F2 is added to the initial solution; feature F2 is then added to the tabu list. Table 2 is a tabu list with tabu length TL = 3: in the next 3 iterations, F2 can be neither added nor deleted. The tabu list prevents the algorithm from revisiting solutions that have already been accessed and helps it jump out of local optima.

Table 2. Example tabu list (tabu length TL = 3).

NO.  Tabu target
1    F2
2    -
3    -

6. Contempt principle. Because of the tabu list, a tabu feature generally does not participate in the next several rounds of search. However, when the participation of a tabu feature makes the evaluation function reach its historical optimum, the tabu feature is granted amnesty, which is conducive to finding the global optimal solution. Specifically, if moving (adding/deleting) feature i makes a feasible solution better than any solution of the previous iterations, the move is allowed even if feature i is in the tabu list. For example, if moving feature F2 in Table 2 within the next three iterations makes a feasible solution the historical optimum, F2 is removed from the tabu list. The contempt principle is a method of overriding tabu movement that avoids missing good solutions.
7. Stop rule. We set the stop rule to a fixed number of iterations. A minimal code sketch of the full search loop (steps 4-7) follows.
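The sketch below assumes an evaluate function that trains the selected classifier on the masked features and returns the chosen objective (e.g., AUC); the deque-based tabu tenure is a simplification of the tabu list in Table 2, and all names are illustrative.

```python
import random
from collections import deque

def tabu_search(evaluate, init_mask, csl=20, tll=12, iters=80, seed=0):
    """Steps 4-7: iterate over 0/1 feature masks with a tabu list."""
    rng = random.Random(seed)
    current = list(init_mask)
    best_mask, best_score = current[:], evaluate(current)
    tabu = deque(maxlen=tll)                  # tabu list (entries expire)
    for _ in range(iters):                    # step 7: fixed iterations
        candidates = []
        for _ in range(csl):                  # candidate set length (step 4)
            i = rng.randrange(len(current))   # flip exactly one feature code
            neighbor = current[:]
            neighbor[i] = 1 - neighbor[i]
            candidates.append((evaluate(neighbor), i, neighbor))
        candidates.sort(key=lambda c: c[0], reverse=True)
        for score, i, neighbor in candidates:
            # Step 6 (contempt principle): accept a tabu move only if it
            # beats the historical best; otherwise skip tabu features.
            if i not in tabu or score > best_score:
                current = neighbor
                tabu.append(i)                # step 5: tabu movement
                if score > best_score:
                    best_mask, best_score = neighbor[:], score
                break
    return best_mask, best_score
```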
In summary, GSFTS-FS has the following advantages. In gain-based SFS, features are added in order of importance: the more important features are prioritized and added to the feature combination until the classification algorithm reaches a local optimum. Therefore, medical features that have a significant impact on hypertension outcomes are given priority, providing a good initial solution for the subsequent tabu search.
The tabu search then optimizes the gain-based SFS solution, pushing the algorithm out of the local optimum so the search can continue. The tabu search uses a tabu list to record the local optima that have been reached; in subsequent searches, the information in the tabu list is used to skip, or only selectively revisit, these points, which avoids converging to a local optimum.

Model Mathematical Theory
XGBoost is an ensemble learning algorithm of the boosting family. The idea of XGBoost is to continuously add trees, performing feature splitting to grow each tree; each added tree learns a new function that fits the residuals of the previous prediction. When k trees are obtained after training, the score of a sample is predicted as follows: according to the characteristics of the sample, it falls into a corresponding leaf node in each tree, each leaf node corresponds to a score, and the scores across the trees add up to the predicted value of the sample.
XGBoost uses the second-order Taylor expansion of the loss function and adds a regularization term to balance model complexity against the reduction in the loss. It seeks the best solution globally and effectively avoids overfitting. Suppose the model generates $t$ decision trees. Its prediction for sample $i$ is:

$$\hat{y}_i^{(t)} = \sum_{k=1}^{t} f_k(x_i), \quad f_k \in F \tag{1}$$

$\hat{y}_i^{(t)}$ represents the predicted value of sample $i$, obtained as the sum of the predictions of the $t$ decision trees; $n$ represents the total number of samples, and the subscript $i$ denotes the $i$-th sample. $f_k$ is the $k$-th classification tree, and $F$ is the space of all trees.

The objective function is:

$$Obj^{(t)} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i^{(t)}\right) + \sum_{k=1}^{t} \Omega(f_k) \tag{2}$$

$l$ represents the degree of deviation between the predicted value $\hat{y}_i^{(t)}$ and the true value $y_i$; the second half of Formula (2) represents the sum of the complexities of the trees, with $\Omega(f_k) = \gamma T + \frac{1}{2}\lambda \|\omega\|^2$, where $T$ is the number of leaf nodes, $\omega$ is the vector of leaf weights, and $\gamma$ and $\lambda$ are regularization coefficients. Combining Formulas (1) and (2) with the second-order Taylor expansion of the loss function, Formula (3) is obtained:

$$Obj^{(t)} \approx \sum_{i=1}^{n}\left[g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i)\right] + \Omega(f_t) + C \tag{3}$$

$g_i$ is the first derivative, $h_i$ is the second derivative, and $C$ is a constant:

$$g_i = \partial_{\hat{y}^{(t-1)}}\, l\left(y_i, \hat{y}^{(t-1)}\right), \qquad h_i = \partial^2_{\hat{y}^{(t-1)}}\, l\left(y_i, \hat{y}^{(t-1)}\right)$$

Define $I_j = \{i \mid q(x_i) = j\}$ as the sample set of leaf node $j$. After removing the constant term from Formula (3) and setting the derivative to zero, the optimal leaf weight $\omega_j^*$ is obtained:

$$\omega_j^* = -\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i + \lambda}$$

Substituting the optimal solution $\omega_j^*$ back into Formula (3) gives Formula (10):

$$Obj^* = -\frac{1}{2} \sum_{j=1}^{T} \frac{\left(\sum_{i \in I_j} g_i\right)^2}{\sum_{i \in I_j} h_i + \lambda} + \gamma T \tag{10}$$

XGBoost uses a greedy algorithm to split the existing nodes. Assuming that $I_L$ and $I_R$ are the sets of left and right nodes after a split, with $I = I_L \cup I_R$, the information gain after the split is:

$$Gain = \frac{1}{2}\left[\frac{\left(\sum_{i \in I_L} g_i\right)^2}{\sum_{i \in I_L} h_i + \lambda} + \frac{\left(\sum_{i \in I_R} g_i\right)^2}{\sum_{i \in I_R} h_i + \lambda} - \frac{\left(\sum_{i \in I} g_i\right)^2}{\sum_{i \in I} h_i + \lambda}\right] - \gamma \tag{11}$$

As Formula (11) shows, similar to the ID3, C4.5, and CART decision tree algorithms, XGBoost decides whether to split a node by subtracting the unsplit node score from the scores of the split left and right nodes. Meanwhile, XGBoost accounts for model complexity and uses the regularization term $\gamma$ to limit the growth of the tree: when the gain is less than $\gamma$, the node is not split.
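To make Formula (11) concrete, the following minimal snippet (function and parameter names are ours) computes the gain of one candidate split from the summed gradients and Hessians of the child nodes:

```python
def split_gain(g_left, h_left, g_right, h_right, lam=1.0, gamma=0.0):
    """Information gain of a candidate split, as in Formula (11).
    g_*/h_* are the sums of first/second derivatives over each child."""
    def score(g, h):
        return g * g / (h + lam)
    parent = score(g_left + g_right, h_left + h_right)
    children = score(g_left, h_left) + score(g_right, h_right)
    return 0.5 * (children - parent) - gamma

# Toy numbers: the split is kept only if the gain exceeds zero.
print(split_gain(g_left=-4.0, h_left=5.0, g_right=6.0, h_right=7.0))
```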

XGBoost Hypertension Outcomes Prediction Process
The XGBoost algorithm is used to establish the model for predicting hypertension outcomes. Figure 5 shows the XGBoost modeling process.

Figure 6 is an example of modeling with XGBoost. Two decision trees are generated based on two characteristics: maximum systolic blood pressure and 24-h average blood pressure. To determine whether an outcome occurs for a sample, the corresponding leaf-node scores of the two trees are summed. The summed score for "outcome occurrence" is 2.5 and the score for "no outcome occurrence" is 4, so the model judges that the sample will not have an outcome.
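As a toy illustration of this additive scoring, the snippet below reproduces the totals in the Figure 6 example; the individual per-tree leaf values are hypothetical, and only the sums (2.5 vs. 4) come from the text.

```python
# Per-tree leaf scores for one sample; the individual values are
# hypothetical, only the totals (2.5 vs. 4.0) follow the Figure 6 example.
tree_scores = {
    "outcome":    [1.0, 1.5],   # contributions from tree 1 and tree 2
    "no_outcome": [2.5, 1.5],
}
totals = {label: sum(s) for label, s in tree_scores.items()}
print(totals)                       # {'outcome': 2.5, 'no_outcome': 4.0}
print(max(totals, key=totals.get))  # 'no_outcome': no outcome predicted
```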

Analysis and Optimization of GSFTS-FS and XGBoost Parameters
The number of parameters to be determined in this paper is small, and their value ranges are relatively easy to determine. Therefore, grid search with cross-validation is selected as the parameter optimization method, with F1 as the evaluation index. The following parameters need to be adjusted and optimized (a tuning sketch follows the list).
(1) Candidate Set Length (CSL): the larger the length of the candidate set, the more feasible solutions can be selected in the neighborhood, and the easier it is to find the global optimal solution. However, if the length is too long, the amount of calculation will be large, and if the length is too short, it will easily fall into the local optimal solution.
(2) Tabu list length (TLL): the smaller the TLL, the larger the search range, but it is easy to repeat the search. If the TLL is too long, the calculation time will become longer.
(3) Number of iterations: the more iterations, the easier it is to find a better solution. When it reaches a certain number (saturation point), the effect will not fluctuate greatly.
(4) Max depth of the tree in XGBoost: it is used to avoid overfitting. The larger the value, the more specific samples the model will learn.
(5) Number of estimators in XGBoost (NE): the more classifiers, the better the performance of the ensemble learning model. However, too many base classifiers will not only make it more computationally expensive and slower, but also cause overfitting.
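As an illustration of the grid search described above, the following sketch tunes the two XGBoost parameters with 10-fold cross-validated F1; the grid values and the generated data are placeholders standing in for the selected hypertension features.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from xgboost import XGBClassifier

# Placeholder data standing in for the selected hypertension features.
X, y = make_classification(n_samples=300, n_features=12, random_state=0)

param_grid = {"max_depth": [3, 5, 7, 9],         # candidate values are
              "n_estimators": [30, 50, 70, 90]}  # illustrative only
search = GridSearchCV(XGBClassifier(), param_grid, scoring="f1", cv=10)
search.fit(X, y)
print(search.best_params_)   # Table 7 reports max_depth=7, n_estimators=70
```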

Hypertension Outcomes Prediction Model Based on GSFTS-FS and XGBoost
Combining the GSFTS-FS and XGBoost methods mentioned above, the hypertension outcomes prediction model can be established. The prediction model flowchart is shown in Figure 7. After comparison and verification, the optimal feature combination is determined and used as the input of the final XGBoost prediction model for clinical practical application in patients with hypertension.

Preprocessed Medical Data
The violin plot, box plot, and clustered scatter plot of the preprocessed data are shown in Figures 8-10, respectively, where 0 means that hypertension outcomes did not occur and 1 means that they occurred.

After missing-value processing, the data set contains 752 samples with a feature dimension of 84. The ratio of positive to negative samples is 1:7, so the data set is category-imbalanced. This paper adopts the EasyEnsemble method to deal with the class imbalance.
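The paper does not name a specific implementation of EasyEnsemble; as an illustration, the EasyEnsembleClassifier from the imbalanced-learn package can reproduce this step on toy data with the same 1:7 class ratio.

```python
from imblearn.ensemble import EasyEnsembleClassifier
from sklearn.datasets import make_classification

# Toy stand-in for the 752-sample, 1:7 imbalanced hypertension data.
X, y = make_classification(n_samples=752, n_features=84,
                           weights=[0.875], random_state=0)
# EasyEnsemble builds several balanced bags by undersampling the
# majority class and trains one boosted learner per bag.
clf = EasyEnsembleClassifier(n_estimators=10, random_state=0).fit(X, y)
print(clf.predict(X[:5]))
```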

Feature Selection Results Based on GSFTS-FS
The gain sequence forward tabu search feature selection (GSFTS-FS) proposed in this study was performed and achieved good results. The feature gain importance ranking based on XGBoost is shown in Figure 11. The feature combinations after SFS under the four evaluation criteria are shown in Table 3; the combinations obtained by SFS under different evaluation standards are not exactly the same.

The candidate set length, the tabu list length, and the number of iterations in GSFTS-FS need to be further adjusted and optimized before GSFTS-FS is used to optimize the feature combinations selected by SFS.
The candidate set length (CSL) is optimized first, with the tabu list length fixed at 2 and the number of iterations fixed at 80. The performance of the prediction model under different CSL values is shown in Table 4 and Figure 12. They show that the CSL affects the prediction model: both the best and the average F1 values over the 80 iterations change with CSL. The optimal CSL is 20, with which the prediction model performs best. Moreover, as CSL increases, the computation time increases significantly.

With the optimal CSL, the performance of the prediction model under different tabu list lengths (TLL) is shown in Table 5 and Figure 13. Table 5 shows that the model performs best when the TLL reaches 12, so the optimal value is 12. Figure 13 shows that neither the performance nor the computation time changes significantly as TLL increases; this parameter has a small impact on the model.

With the optimal CSL and TLL, the model performance with increasing iterations is shown in Figure 14. GSFTS-FS then takes the feature combinations in Table 3 as the initial solutions to obtain the final optimized feature combinations for the prediction model. The results are shown in Table 6. Each feature combination selected by GSFTS-FS contains 9-16 features, which is small compared with the original 84 features.

XGBoost Parameter Tuning
The results of the grid search parameter tuning of XGBoost are shown in Table 7. The maximum tree depth and the number of trees are determined to be 7 and 70, respectively. The trends of XGBoost performance with increasing max depth and NE are shown in Figure 15.

Prediction Model Validation Results
The hypertension outcome prediction model proposed in this paper is verified below. The verification methods include 10-fold cross-validation and test-set evaluation. We divided the medical data into a training set, a validation set, and a test set. The training and validation sets account for 75% of the data and are used for 10-fold cross training and validation. The test set accounts for 25% of the data and is completely independent of the model optimization procedure. The evaluation results on the test set show the generalization performance of the model.
In order to verify the superiority of the proposed method, we compared existing feature selection methods and classification methods. For feature selection, the input data are the complete feature set and the three feature combinations obtained by SFS, recursive feature elimination (RFE), and GSFTS. For outcome prediction, the support vector machine (SVM), decision tree (DT), and random forest (RF) are used for comparison. The model performance evaluation criteria are accuracy, AUC, F1, and recall. The results are shown in Tables 8 and 9. The McNemar statistical test is used to verify the results in Table 9: at the significance level α = 0.05, we used the McNemar statistic χ² and the corresponding p-value to quantify the significance of the differences between methods. The significance analysis results are shown in Tables 10 and 11 for the feature selection methods, and in Tables 12 and 13 for the prediction algorithms. Figure 16 compares the performance of the four prediction models under the three sets of feature combinations; we compare the average and optimal values over the 80 iterations of the GSFTS-FS algorithm with the values of SFS and RFE.
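The validation protocol can be sketched as follows, using placeholder data in place of the selected feature combination; the metric names follow scikit-learn conventions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_validate, train_test_split
from xgboost import XGBClassifier

# Placeholder data standing in for the selected feature combination.
X, y = make_classification(n_samples=752, n_features=12, random_state=0)
# 75% for 10-fold cross training/validation, 25% held out for testing.
X_dev, X_test, y_dev, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)
scores = cross_validate(XGBClassifier(max_depth=7, n_estimators=70),
                        X_dev, y_dev, cv=10,
                        scoring=("roc_auc", "accuracy", "f1", "recall"))
print({k: v.mean() for k, v in scores.items() if k.startswith("test_")})
```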

Discussion
Figures 8-10 show the distribution of the medical data on the different indicators. For some indicators, the distribution is the same for patients with and without outcomes, while for others it differs between the two groups. This shows that some medical characteristics can affect hypertension outcomes, producing differences in the data distribution.

The feature combinations selected by GSFTS-FS in Table 6 show that the number of medical indicators used to predict hypertension outcomes is greatly reduced compared with the original indicator set. For doctors, this narrows the scope of analysis and helps them analyze pathologically the factors that affect hypertension outcomes. For example, the No. 61 index (right brachial-ankle pulse wave conduction velocity) is required by the prediction model under all evaluation indices in Table 6. For patients, fewer medical indicators mean fewer examination items. For machine learning algorithms, the optimal feature combination reduces the learning burden and computation time, allowing hypertension outcomes to be judged efficiently.

The impact of different feature combinations on the prediction model can be analyzed from Table 8. Under every evaluation index, the prediction results with the feature combination obtained by GSFTS-FS are significantly better than those without feature selection; GSFTS-FS significantly improves the performance of the prediction model. The feature combinations obtained by SFS and RFE also improve performance, but not as much as the combination optimized by GSFTS-FS, which has a higher improvement rate.
Next, we analyze the performance of the four prediction models. Table 8 shows that XGBoost performs better than the SVM, C4.5 decision tree, and random forest under every evaluation criterion, and the advantage is obvious. The XGBoost classification algorithm combined with the GSFTS-FS algorithm performs best, with an accuracy of 0.95, an AUC of 0.96, an F1 of 0.88, and a recall of 0.82. Compared with the data set without feature selection, the accuracy, AUC, F1, and recall of this method increase by 7.9%, 7.9%, 8.6%, and 15.5%, respectively. Table 9 shows that the prediction model has good generalization ability and can be used for the prediction of new samples; GSFTS-XGB achieves the best performance on the test set, with an AUC, accuracy, F1, and recall of 0.92, 0.94, 0.87, and 0.80, respectively. The results of McNemar's significance analysis confirm that the proposed method is superior to the alternative approaches. The results in Tables 10 and 11 show that the new medical feature selection method GSFTS has significant advantages over the other existing methods (χ² > 3.8415 or p-value < 0.05), while there is no significant difference between RFE and SFS. As shown in Tables 12 and 13, the differences between XGBoost and the other classification algorithms are significant at the 95% confidence level.
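For reference, the McNemar test for a pair of classifiers reduces to a 2x2 disagreement table; the counts below are hypothetical, and the statsmodels implementation is used purely for illustration.

```python
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

# Hypothetical disagreement counts on a shared test set:
# rows = method A correct/wrong, columns = method B correct/wrong.
table = np.array([[150, 6],
                  [21, 11]])
result = mcnemar(table, exact=False, correction=True)  # chi-square form
# A statistic above 3.8415 (or p < 0.05) indicates a significant
# difference at the alpha = 0.05 level used in the paper.
print(result.statistic, result.pvalue)
```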
Figure 16 yields the same conclusions as Table 8: the GSFTS-FS method is more effective than the SFS method, with significantly improved performance, and XGBoost performs better than the other classification algorithms.
Compared with similar work [17][18][19][20][21][22], we not only showed that XGBoost outperforms the random forests and decision trees commonly used in the literature for predicting hypertension-related disease, but also further optimized the selection of medical features to better analyze quantitatively which medical indicators affect hypertension outcomes. For example, Y. Liu used a combination of recursive feature elimination (RFE) and XGBoost to build a hypertension prediction model [16]. RFE is a greedy optimization algorithm that easily falls into a local optimum, whereas GSFTS-FS can overcome this problem and obtain a better, globally optimal solution.
This study leaves the following points for further research. First, due to the difficulty of collecting medical data, the amount of data used in this research is still small; as more samples are accumulated and collected in the future, the model can be further optimized and adjusted. Second, the recall of the prediction model established in this paper is about 80%. For clinical applications, it is hoped that as many patients as possible who are about to have an outcome can be identified. Because the incidence of hypertension outcomes is low, further research is needed to improve the recall of the prediction model.

Conclusions
One difficult question in medicine is, "Will serious outcomes occur in patients with hypertension?" Research derived from clinical medicine has not solved this question. Drawing on data mining and machine learning, we proposed a hypertension outcome prediction model combining GSFTS-FS and XGBoost. We analyzed the medical data of 1357 patients with hypertension from a Beijing hospital and verified the prediction model. By comparing and analyzing the experimental results, we can draw the following conclusions: (1) GSFTS-FS can screen the valuable part of many medical indicators and provide high-quality input variables for the prediction model. (2) GSFTS-FS captures the changes in the information gain of medical variables and digs out key factors affecting hypertension outcomes through a globally optimal search. (3) Through GSFTS-FS, we discovered that medical variables such as right brachial-ankle pulse wave conduction velocity, the highest systolic blood pressure, limb blood pressure, and ambulatory blood pressure may have a higher impact on hypertension outcomes. (4) The prediction model combining the GSFTS-FS method and the XGBoost algorithm performs well, with an accuracy of 0.946, an AUC of 0.956, an F1 of 0.879, and a recall of 0.805, and can accurately and effectively predict outcomes in patients with hypertension.
The model proposed in this paper can provide guidance and aid decision-making for doctors in clinical diagnosis and treatment, and is significant for both theoretical research and practical application. First, the model greatly reduces the number of medical indicators needed to determine the occurrence of an outcome; instead of a full set of examinations, patients only need targeted examination items. Second, the model can accurately determine whether a patient will have a hypertension outcome. If a patient is predicted to have an outcome, the doctor can prepare in advance and take protective measures; this can help reduce the incidence of outcomes or enable patients to be treated promptly and properly when outcomes occur. Third, the model can indicate which medical indicators affect the outcome, providing guidance for medical research: experts and scholars in the medical field can analyze, from a pathological perspective, the correlation between these indicators and hypertension outcomes.