Controllability of Fractional-Order Particle Swarm Optimizer and Its Application in the Classification of Heart Disease

Abstract: This study proposes an improved fractional-order particle swarm optimizer that addresses shortcomings of traditional swarm algorithms, such as low search accuracy in high-dimensional spaces, entrapment in local minima, and nonrobust results. The controllable fractional-order particle swarm optimizer explores the search space in fine detail and therefore resolves solutions at high resolution. Moreover, the proposed algorithm has memory, i.e., each position update draws on the particle positions of the previous and earlier generations, which makes the update conservative and the obtained results robust. To verify the algorithm's effectiveness, 11 test functions are used to compare the average value, overall best value, and standard deviation obtained by the controllable fractional-order particle swarm optimizer and the controllable particle swarm optimizer; the experimental results show that the former is more stable than the latter. Furthermore, the solution position found by the controllable fractional-order particle swarm optimizer is more reliable. Therefore, the improved method proposed herein is effective. This research also describes how a heart disease prediction application uses the proposed optimizer to tune XGBoost hyperparameters against custom target values. The final verification shows that the obtained prediction model is effective and reliable, which demonstrates the controllability of the proposed fractional-order particle swarm optimizer.

Author Contributions: Conceptualization, F.-I.C. and T.-H.H.; methodology, P.-Y.Y.; software, C.-H.L.; validation, P.-Y.Y. and C.-H.L.; formal analysis, W.-H.H.; investigation, J.-H.C.; resources, J.-H.C.; data curation, T.-C.L.; writing (original draft preparation), F.-I.C.; writing (review and editing), T.-H.H.; visualization, W.-H.H.; supervision, J.-H.C.; project administration, J.-H.C.; funding acquisition,


Introduction
Optimization methods automatically search for the best solution in a problem's solution space. When a real problem is converted into a mathematical model, faithfully simulating its physical characteristics requires a detailed description of the conversion process, and the resulting mathematical model becomes more complex. Currently, three families of methods exist for solving optimization problems: numerical, enumerative, and random search methods.
Numerical methods use derivatives to find the best value in the space. For example, traditional neural network algorithms rely on gradient descent, a numerical method, to find the best parameters [1]. However, numerical methods have two shortcomings. First, they search for the best solution from a local point of view, so there is no guarantee that the solution found is globally optimal. Second, they are not applicable to search spaces that are not smooth or continuous [2,3]. Moreover, because a search space that is not smooth or continuous usually contains many local optima, the search easily converges prematurely to a local optimum.
Enumerative methods, such as grid search, discretize the search space at a chosen resolution and evaluate the objective function at every candidate solution at that resolution. Such methods have a better chance of obtaining the best solution, but they require considerable computation time. Therefore, when the search space is large, enumeration is inefficient. Random search methods are currently in common use; they look for the best solution by imitating natural or biological behavior, and particle swarm optimization (PSO) [4] is one such method that imitates biological behavior.
PSO is a swarm intelligence algorithm inspired by the behavior of swarming animals. It is used to find the best position in the current search space. After PSO was published, many scholars proposed different methods to improve the algorithm, and these methods have been applied in many fields [5][6][7].
For example, Shi and Eberhart [8] proposed a constant inertia weight to improve the moving direction of particles. Choosing the inertia weight appropriately balances local and global search, so that the algorithm considers the best solution in the whole domain. Shi and Eberhart [9] later proposed linearly decreasing inertia weights. In the same year, Suganthan [10] applied a linear decrease with dynamic characteristics to the individual learning parameter $c_1$ and the group learning parameter $c_2$, which effectively improved global search. Clerc [11,12] put forward the concept of the constriction coefficient, whose idea is to change the moving direction of particles to strengthen local search. Shi and Eberhart [13] proposed the maximum speed method to improve search. Ratnaweera et al. [14] improved the method proposed by Suganthan by changing the group learning parameter from linearly decreasing to linearly increasing. Chatterjee and Siarry [15] proposed changing the inertia weight in a nonlinearly decreasing way. Ko et al. [16] extended the concept of nonlinear change to the learning parameters: the individual learning parameter decreases nonlinearly, and the group learning parameter increases nonlinearly.
With many scholars proposing improvements to the original algorithm, the search performance of the particle swarm algorithm has improved considerably. In 2002, Clerc and Kennedy [12] used the dynamic system notation of control theory to explore the internal operation of the particle swarm algorithm. Many scholars have since analyzed the stability of particle swarm algorithms under different conditions based on a dynamic system representation [17][18][19][20]. In particular, in 2014, Lin [21] proposed the PSO algorithm with controllability. The method examines the particle swarm algorithm from the viewpoint of state controllability in dynamic systems. When the controllability conditions are met, the position and velocity vectors of a particle are controlled by its own best solution position vector and the global best solution position vector; this makes the convergence better than in the original particle swarm algorithm.
However, the search ability of the integer-order particle swarm algorithm proposed by Lin [21] is poor, and the algorithm is unstable in high-dimensional or complex spaces. Therefore, this study combines the fractional-order particle swarm optimizer with the PSO algorithm with controllability; the result is called "the controllable fractional-order particle swarm algorithm." The proposed algorithm improves the PSO algorithm with controllability so that it performs better in high-dimensional or complex spaces.
Machine learning algorithms need their hyperparameters optimized efficiently and systematically. Therefore, this study applies the "controllable fractional-order particle swarm algorithm" to optimize machine learning hyperparameters. We demonstrate how to find the hyperparameters of the extreme gradient boosting (XGBoost) [22] machine learning algorithm efficiently and systematically using the proposed method. Further, we used the heart disease data set downloaded from the UCI website. Models were trained and tested with the six best hyperparameters found using the proposed method and with the hyperparameters officially recommended by XGBoost [23], and the results were compared. Experimental results showed that the proposed method performed better and produced no false-negative results. The learned model can help physicians quickly determine whether a patient has heart disease.

Materials and Methods
The particle swarm algorithm was proposed by Kennedy and Eberhart [4], who were inspired by the foraging behavior of birds. It is a bio-inspired optimization algorithm built on the concept of swarm intelligence. Suppose that a flock of birds is randomly scattered in a space that contains many food piles of different sizes. The largest food pile is then the best position ($P_g$) in this space. Each bird starts searching for food at a random location, chooses its route from its own experience, and records the largest food pile ($P^l_i$) that it has found so far. When a bird finds a food pile better than any found so far by the flock, it notifies the other birds to move toward that pile. Therefore, the subsequent search route of each bird is affected by three factors: the direction of its own experience (its current velocity direction), the direction of the best food pile it has found itself (its own best solution), and the direction of the best food pile found by the whole flock (the best solution in the whole domain).

Particle Swarm Algorithm
In particle swarm theory, each bird is regarded as a particle in the "solution space." The current position of each particle is a candidate solution to the optimization problem, and each solution has an associated value called the objective function value or fitness value. Each particle has its own velocity ($V_i$) and combines its current velocity direction, the best solution it has found so far (pbest), and the best solution found by the current group (gbest) to generate a new velocity. Once the new velocity is determined, it is used to generate the new position. Subsequently, the objective function is evaluated at the position of each particle to judge the quality of the current position. If it is better than the best solution found previously, it replaces that solution; otherwise, the original best solution is kept. This mechanism is iterated to search the solution space for the best solution and is expressed by Equations (1) and (2):

$$V_i(k+1) = V_i(k) + c_1 r_1 \left(P^l_i(k) - p_i(k)\right) + c_2 r_2 \left(P_g(k) - p_i(k)\right) \tag{1}$$

$$p_i(k+1) = p_i(k) + V_i(k+1) \tag{2}$$

where i = 1, 2, ..., m (m denotes the number of particles); k represents the iteration index; $V_i(k)$ denotes the θ-dimensional velocity vector of the i-th particle; $p_i(k)$ represents the θ-dimensional position vector of the i-th particle; $P^l_i(k)$ denotes the θ-dimensional position vector of the best solution found by the i-th particle up to iteration k; $P_g(k)$ represents the θ-dimensional position vector of the best solution found by the group; $c_1$ denotes the individual learning parameter; $c_2$ represents the group learning parameter; $r_1$ and $r_2$ denote random numbers between 0 and 1; θ represents the dimensionality of the search space.
The individual learning parameter $c_1$ and the group learning parameter $c_2$ are the acceleration weights that pull the particle toward its own best solution and toward the best solution of the group, respectively, in each iteration. When the value of c is small, the particle performs multiple searches near the target area before reaching its own best solution or the best solution of the group. This increases the probability of finding the best solution in the entire domain but costs more computation and time. When the value of c is large, the particle reaches its own best solution or the best solution of the group more quickly in each iteration. This saves some unnecessary calculations and time and improves the convergence speed. Moreover, when $c_1$ or $c_2$ is 0, the particle swarm algorithm has different characteristics [24].
The first part of Equation (1) is the particle's previous inertia, i.e., the velocity of its previous experience. The second part is the "cognition" part, which represents the thinking of the particle itself. Finally, the third part is the "social" part, which implies that the information among particles is shared such that the particles can cooperate. Therefore, the core of the particle swarm algorithm is to use these three parts to update the particle speed and position in a linear combination and to calculate the fitness value to complete the problem optimization.
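The three-part update described above can be sketched in a few lines. This is a minimal illustration of the standard PSO form of Equations (1) and (2), not the authors' implementation; the function name `pso_step` and the choice to draw $r_1$ and $r_2$ independently for every particle and dimension are assumptions.

```python
import numpy as np

def pso_step(p, v, pbest, gbest, c1=2.0, c2=2.0, rng=None):
    """One PSO iteration for all particles.

    p, v, pbest: arrays of shape (m, theta); gbest: array of shape (theta,).
    """
    rng = np.random.default_rng() if rng is None else rng
    # Assumption: a fresh random number per particle and per dimension.
    r1 = rng.random(p.shape)
    r2 = rng.random(p.shape)
    inertia = v                           # previous velocity
    cognition = c1 * r1 * (pbest - p)     # pull toward own best
    social = c2 * r2 * (gbest - p)        # pull toward group best
    v_new = inertia + cognition + social  # Equation (1)
    p_new = p + v_new                     # Equation (2)
    return p_new, v_new
```

A particle that already sits on both its own best and the group best, with zero velocity, does not move under this update.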

Fractional-Order Particle Swarm Algorithm
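Fractional calculus generalizes traditional calculus to non-integer orders [25]. Equation (3) gives the fractional derivative based on the Grünwald-Letnikov definition:

$$D^{\lambda}[x(t)] = \lim_{h \to 0} \frac{1}{h^{\lambda}} \sum_{k=0}^{\infty} \frac{(-1)^k \, \Gamma(\lambda+1) \, x(t-kh)}{\Gamma(k+1) \, \Gamma(\lambda-k+1)} \tag{3}$$

where D stands for the differential operator, λ denotes the fractional order, and Γ is the Euler gamma function. According to Solteiro Pires et al. [26], in discrete time this can be approximated as Equation (4):

$$D^{\lambda}[x(t)] = \frac{1}{T^{\lambda}} \sum_{k=0}^{r} \frac{(-1)^k \, \Gamma(\lambda+1) \, x(t-kT)}{\Gamma(k+1) \, \Gamma(\lambda-k+1)} \tag{4}$$

where T denotes the sampling period, and r represents the truncation order.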
Whereas an integer-order derivative is a finite series, a fractional-order derivative requires an infinite number of terms. This implies that the information captured by a fractional-order derivative is more global than that of an integer-order derivative. Therefore, the solution space can be explored at a finer resolution with a fractional order than with an integer order, and better accuracy is expected. Further, fractional differentiation gives the particle swarm algorithm memory: the velocity vector is affected by the positions of the previous and earlier generations. This makes the fractional-order particle swarm algorithm more conservative in its search, so the results are more similar and stable from run to run.
If r = 4 is used as an example, the speed and position vectors are updated with the following Equations (5) and (6).
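As an illustration of the r = 4 case, the sketch below computes the weights that the truncated Grünwald-Letnikov series places on $V_i(k)$, $V_i(k-1)$, $V_i(k-2)$, and $V_i(k-3)$, assuming T = 1 and the derivation of Solteiro Pires et al. [26], in which the fractional derivative of the velocity is set equal to the cognition and social terms. The names `gl_weights` and `fractional_velocity` are hypothetical, and the weights are computed with a product recurrence rather than Gamma functions to avoid the poles of Γ at non-positive integers.

```python
import numpy as np

def gl_weights(lam, r=4):
    """Weights on V(k), V(k-1), ..., V(k-r+1).

    Weight j equals (-1)^j * C(lam, j + 1), where C is the generalized
    binomial coefficient. For r = 4 this gives
    lam, lam(1-lam)/2, lam(1-lam)(2-lam)/6, lam(1-lam)(2-lam)(3-lam)/24.
    """
    weights = []
    binom = 1.0  # C(lam, 0)
    for j in range(r):
        binom *= (lam - j) / (j + 1)  # now C(lam, j + 1)
        weights.append((-1) ** j * binom)
    return weights

def fractional_velocity(v_hist, p, pbest, gbest, lam, c1, c2, r1, r2):
    """Fractional-order velocity update (sketch).

    v_hist is [V(k), V(k-1), V(k-2), V(k-3)], newest first.
    """
    memory = sum(w * v for w, v in zip(gl_weights(lam, len(v_hist)), v_hist))
    return memory + c1 * r1 * (pbest - p) + c2 * r2 * (gbest - p)
```

With λ = 1 the weights reduce to 1, 0, 0, 0, so the update falls back to the integer-order velocity rule of Equation (1).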
To improve the direction in which the particles search, Shi and Eberhart [8] added an inertia weight term ω to the velocity $V_i(k)$ to scale the contribution of the particle's own velocity to the updated velocity. When ω is large, the direction of the updated velocity depends mainly on the direction of the previous velocity. The search direction is then more stable, which improves the global search ability of the particles. However, when ω is too large, overcorrection occurs: the velocity correction is excessive, and the particle overshoots better solutions, resulting in "flying" trajectories.
When ω is small, the direction of the updated velocity is dominated by the directions of the particle's own best solution and the global best solution, which provides local search capability. However, because the particle does not explore globally, the solution obtained may not be the global best. Therefore, Shi and Eberhart [9] changed the "constant inertia weight" to a "linearly decreasing inertia weight." In the initial stage, ω is set to a larger value, so the particle swarm is better able to search widely for the region of the best solution in the whole domain. As the number of iterations increases, ω is gradually reduced, and the particle swarm switches from a wide search to a local search that refines the best solution found so far. The change from the "constant inertia weight ω" to the "time-varying linear inertia weight ω(k)" is shown in Equation (7):

$$\omega(k) = \omega_{\min} + \frac{iter_{\max} - k}{iter_{\max}} \times (\omega_{\max} - \omega_{\min}) \tag{7}$$

where k denotes the iteration index; $\omega_{\max}$ represents the maximum value of the inertia weight; $\omega_{\min}$ denotes the minimum value of the inertia weight; $iter_{\max}$ represents the maximum number of iterations.
To prevent excessive velocity from carrying particles out of the search space during the update, Shi and Eberhart [13] used the maximum velocity method ($V_{\max}$ method) to limit particle velocity and improve particle search capability. The value of $V_{\max}$ cannot be set too large because the particles can then reach a high speed and fly out of the search range. It cannot be set too small either because the particle swarm will then search the space too slowly, fail to cover the global space, and remain limited to the best solution in a local range. The maximum velocity method is formulated as follows:

$$V_i(k+1) = \begin{cases} V_{\max}, & V_i(k+1) > V_{\max} \\ -V_{\max}, & V_i(k+1) < -V_{\max} \\ V_i(k+1), & \text{otherwise} \end{cases}$$

This study set $V_{\max}$ to 0.2 times the maximum search range, i.e., $V_{\max} = 0.2 \times X_{\max}$.
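The inertia schedule and the velocity limit can be sketched as two small helpers. The defaults follow the values stated in the text ($\omega_{\max}$ = 0.9, $\omega_{\min}$ = 0.4, $V_{\max} = 0.2 \times X_{\max}$); the function names and the symmetric clamp to $[-V_{\max}, V_{\max}]$ are assumptions.

```python
import numpy as np

def inertia_weight(k, iter_max, w_max=0.9, w_min=0.4):
    """Linearly decreasing inertia weight: w_max at k = 0, w_min at k = iter_max."""
    return w_min + (iter_max - k) / iter_max * (w_max - w_min)

def clamp_velocity(v, x_max, ratio=0.2):
    """Limit every velocity component to [-V_max, V_max], V_max = ratio * x_max."""
    v_max = ratio * x_max
    return np.clip(v, -v_max, v_max)
```

For example, with `x_max = 100` a component of 50 is cut back to 20, while a component of 3 is left unchanged.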
Subsequently, these equations are rewritten as the state Equation (11): where T ∈ $\mathbb{R}^n$ represents the control input vector; i = 1, 2, ..., m (m denotes the number of particles) and n = 5θ, while θ represents the dimension of the search space, and k is the iteration index. $A_I$ denotes the system matrix, and $B_I$ represents the input matrix, as shown in the following Equation (12): where and I denotes the θ × θ identity matrix. $r_1$ and $r_2$ denote random numbers between 0 and 1.
Equations (9) and (10) are equivalent to Equation (11) based on Equation (12) and matrix multiplication rules. If each pair ($A_I$, $B_I$) is controllable, then the state Equation (11) is said to be robust and controllable [27]. Suppose that a fixed fractional value λ, an inertia weight constant reference value $\omega_0$, an individual learning parameter constant reference value $c_{10}$, and a group learning parameter constant reference value $c_{20}$ are selected as the nominal values of the fractional-order particle swarm algorithm. In that case, the state Equation (11) can be rewritten as an uncertain linear system, i.e., the system is transformed into a nominal fractional-order particle swarm optimization (FPSO) system combined with uncertainty matrices: where and ∆A and ∆B denote the uncertainty matrices of the system matrix $A_I$ and the input matrix $B_I$, respectively, as shown in the following equation: In this study, a sufficient condition is proposed under which the linear system with unstructured parametric uncertainties is robust and controllable: assume that the linear interval system of Equation (11), $x(k+1) = A_I \cdot x(k) + B_I \cdot u(k)$, is controllable. If the following conditions are true, Equations (13) and (15) are robust and controllable. where $I_{n^2}$ denotes the $n^2 \times n^2$ identity matrix, and n represents the number of uncertain matrices. The matrices S, $E_j$ are defined as follows: and $E_I = E_0 + \sum_{j=1}^{n} \varepsilon_j E_j$ allows singular value decomposition to become where $U \in \mathbb{R}^{n^2 \times n^2}$ and $V \in \mathbb{R}^{n(2n-1) \times n(2n-1)}$ are unitary matrices, $S = \mathrm{diag}[\sigma_1, \sigma_2, \cdots, \sigma_{n^2}]$.
The singular values of $E_I$ are $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_{n^2} \ge 0$. The proof of this sufficient condition is shown in Appendix A.
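For a single nominal pair (A, B), state controllability can be checked numerically with the classical Kalman rank test, sketched below. This is the textbook test for one fixed pair and is not the sufficient condition of this study, which additionally bounds the uncertainty matrices; the function name `is_controllable` is an assumption.

```python
import numpy as np

def is_controllable(A, B, tol=None):
    """Kalman rank test: (A, B) is controllable iff
    rank([B, AB, A^2 B, ..., A^(n-1) B]) == n."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    n = A.shape[0]
    blocks = [B]
    for _ in range(n - 1):
        blocks.append(A @ blocks[-1])  # next block A^j B
    controllability_matrix = np.hstack(blocks)
    return np.linalg.matrix_rank(controllability_matrix, tol=tol) == n
```

A discrete double integrator with input on the second state passes the test, whereas an identity system matrix with a single input channel does not.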

Uncertain Parameter Range Corresponds to a Random Number Range
This section uses the sufficient condition proved in Appendix A to analyze the uncertain linear system of the fractional-order particle swarm algorithm and to find the range of uncertain parameters that corresponds to the range of random numbers. Herein, we set the maximum inertia weight $\omega_{\max}$ of the fractional-order particle swarm algorithm to 0.9, the minimum inertia weight $\omega_{\min}$ to 0.4, the individual learning parameter $c_1$ to 2, and the group learning parameter $c_2$ also to 2. Moreover, we set the individual learning parameter constant value $c_{10}$ to 1 and the group learning parameter constant value $c_{20}$ to 1. Therefore, the following Equation (20) can be obtained: where and $\omega(k) = \omega_{\min} + ((iter_{\max} - k)/iter_{\max}) \times (\omega_{\max} - \omega_{\min})$. In this system, the inertia weight parameter is a constant within each generation, so it does not affect the robustness and controllability of Equation (20).
As the fractional-order particle swarm algorithm generates different values in Equation (20) for different values of λ, the ranges of the random numbers $r_1$ and $r_2$ corresponding to $\varepsilon_1$ and $\varepsilon_2$, respectively, also differ. As the value of λ is usually between 0 and 2, this study discretizes λ into 20 values at intervals of 0.1 and derives the ranges one by one. Table 1 shows the ranges of $r_1$ and $r_2$ for the different λ values according to Equation (20); the range of $r_2$ is always between 0 and 1, whereas the range of $r_1$ changes with λ. Among them, the range of $r_1$ obtained with λ = 0.3 is the widest and least conservative. Thus, the final range of $r_1$ and $r_2$ is given by Equation (21). The fractional-order particle swarm algorithm using the random number range of Equation (21) is called the controllability fractional-order particle swarm optimizer (CFPSO) algorithm.
This study follows Lin [21] in deciding when to apply the controllable fractional-order particle swarm update: it is executed only if the conditions of Equation (22) are met, in which $e_{pbest_i}$ and $e_{gbest}$ are set to $10^{-4}$. The execution steps of the controllable fractional-order particle swarm algorithm are as follows:
Step 1: Set the swarm size and the maximum value $\omega_{\max}$ and minimum value $\omega_{\min}$ of the inertia weight of Equation (7). Then, set the individual learning parameter $c_1$, group learning parameter $c_2$, fractional value λ, number of function evaluations, and maximum number of iterations $iter_{\max}$ of Equation (7);
Step 2: Initialize the particle positions $p_i(0)$ randomly and set the initial velocities $V_i(0)$ to 0;
Step 3: Calculate particle fitness;
Step 4: Update each particle's best solution and the global best solution;
Step 5: Check whether the condition of Equation (22) is satisfied. If it is satisfied, obtain the controllable random number range according to Equation (21) and update Equations (6) and (9);
Step 6: Check whether the stop condition is met; if not, return to Step 3 and repeat Steps 3 to 5 until the stop condition is met.
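The steps above can be sketched as a compact loop. This is an illustrative fractional-order PSO under several labeled assumptions, not the authors' CFPSO: the ranges of Equation (21) and the condition of Equation (22) are not reproduced, so `r1_range` and `r2_range` are placeholder parameters and the update is applied in every iteration; the inertia weight is assumed to scale the fractional memory term; positions are assumed to be clipped to the search range.

```python
import numpy as np

def fpso_minimize(f, dim, x_max, lam=0.5, m=30, iter_max=100,
                  c1=2.0, c2=2.0, w_max=0.9, w_min=0.4,
                  r1_range=(0.0, 1.0), r2_range=(0.0, 1.0), seed=0):
    """Minimize f over [-x_max, x_max]^dim; returns (gbest, gbest_fit, history)."""
    rng = np.random.default_rng(seed)
    # Truncated fractional weights on V(k), V(k-1), V(k-2), V(k-3) (r = 4).
    weights, binom = [], 1.0
    for j in range(4):
        binom *= (lam - j) / (j + 1)
        weights.append((-1) ** j * binom)
    # Step 2: random positions, zero velocity history.
    p = rng.uniform(-x_max, x_max, size=(m, dim))
    v_hist = [np.zeros((m, dim)) for _ in range(4)]
    # Steps 3 and 4: initial fitness and best solutions.
    fit = np.apply_along_axis(f, 1, p)
    pbest, pbest_fit = p.copy(), fit.copy()
    g = np.argmin(pbest_fit)
    gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]
    history = [gbest_fit]
    for k in range(1, iter_max + 1):
        w = w_min + (iter_max - k) / iter_max * (w_max - w_min)
        # Placeholder ranges standing in for Equation (21).
        r1 = rng.uniform(*r1_range, size=(m, dim))
        r2 = rng.uniform(*r2_range, size=(m, dim))
        memory = sum(a * v for a, v in zip(weights, v_hist))
        v = w * memory + c1 * r1 * (pbest - p) + c2 * r2 * (gbest - p)
        v = np.clip(v, -0.2 * x_max, 0.2 * x_max)  # V_max = 0.2 * X_max
        p = np.clip(p + v, -x_max, x_max)          # assumption: clip positions
        v_hist = [v] + v_hist[:-1]                 # newest velocity first
        fit = np.apply_along_axis(f, 1, p)
        improved = fit < pbest_fit
        pbest[improved], pbest_fit[improved] = p[improved], fit[improved]
        g = np.argmin(pbest_fit)
        if pbest_fit[g] < gbest_fit:
            gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]
        history.append(gbest_fit)
    return gbest, gbest_fit, history
```

A call such as `fpso_minimize(lambda x: float(np.sum(x ** 2)), dim=3, x_max=5.0)` runs the loop on a sphere function; the recorded best fitness never increases from one iteration to the next because the personal and global bests are only replaced by strictly better values.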

XGBoost
Ensemble machine learning combines many "weak learners" into one "strong learner," and there are two main ensemble methods. The first is "bagging" [28]: each weak learner is trained independently on a random subset of the samples, and the final classification is the category predicted by the most weak learners (majority voting). The most representative algorithm is random forest [29]. The second is "boosting" [30]: the weak learners are trained sequentially, and each weak learner focuses on the information that the previous weak learner failed to learn. After N rounds, the N weak learners are weighted and combined into a strong learner. The most representative algorithm is the adaptive boosting (AdaBoost) algorithm.
Chen and Guestrin [22] proposed the XGBoost algorithm. It combines the advantages of bagging and boosting and adds a regularization function to the objective, improving on boosting methods that optimize only the loss function. The regularization function limits the complexity of the model, so the model is simpler and less likely to overfit. XGBoost uses classification and regression trees (CART) [31] as its weak learners. CART uses binary splits, and features can be reused when growing a tree; it can be applied to both classification and regression tasks. For classification, CART selects split features using the Gini index, preferring the split with the lowest Gini index, which reduces the number of calculations.
For a given training set S, its Gini index is

$$\mathrm{Gini}(S) = 1 - \sum_{k=1}^{K} \left( \frac{|C_k|}{|S|} \right)^2$$

where $C_k$ denotes the subset of samples in S belonging to the k-th category, and K represents the number of categories. The greater the Gini index, the greater the uncertainty of the data. CART uses a binary tree as its decision tree, and each node test has only the outcomes "yes" or "no." Therefore, the decision tree is equivalent to recursively bisecting each feature: it divides the feature space into a finite number of units and determines the predicted probability distribution on these units. The overall process comprises two steps: decision tree generation and pruning. Generation starts from the root node and divides nodes recursively until a stopping condition is met. The stopping conditions are as follows: (1) the number of samples in the node is less than a preset threshold; (2) the Gini index of the sample set is less than a preset threshold; (3) the depth of the decision tree reaches the specified limit; (4) no feature remains on which the node can be divided. Pruning starts from the bottom of the decision tree $T_0$ produced by the generation step and prunes one node at a time up to the root node of $T_0$, forming a subtree sequence $\{T_0, T_1, \cdots, T_n\}$. Subsequently, each subtree in the sequence is evaluated on the validation data set by cross-validation, and the best subtree $T_\alpha$ is selected.
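The Gini index of a label set follows directly from the class proportions. The sketch below is a minimal illustration; the function name `gini_index` and the convention of returning 0 for an empty set are assumptions.

```python
from collections import Counter

def gini_index(labels):
    """Gini index of a collection of class labels: 1 - sum_k (|C_k| / |S|)^2."""
    n = len(labels)
    if n == 0:
        return 0.0  # assumption: an empty node is treated as pure
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())
```

A node whose samples all share one label has a Gini index of 0, and a node split evenly between two labels has a Gini index of 0.5, the maximum for two categories.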
Suppose that a new tree $f_n$ is to be constructed in the n-th iteration. The objective function is

$$Obj^{(n)} = \sum_{i} l\left(y_i, \hat{y}_i^{(n)}\right) + \Omega(f_n), \qquad \Omega(f_n) = \gamma T + \frac{1}{2} \lambda \lVert w \rVert^2$$

where l(·) denotes a convex loss function; Ω(·) represents a regularization term; $y_i$ is the true label of the i-th sample; $\hat{y}_i^{(n)}$ denotes the model prediction for the n-th round; T represents the number of leaf nodes; $f_n$ denotes the structure of the n-th tree; γ represents the penalty coefficient for the number of leaf nodes; λ denotes the penalty coefficient for the leaf node scores; w represents the scores of the tree's leaf nodes.

Results
According to statistics from the Ministry of Health and Welfare, heart disease ranked second among the top 10 causes of death in Taiwan in 2018, and the death toll increased by 4.5% from the previous year. This motivates tuning the hyperparameters of XGBoost with the controllable fractional-order particle swarm algorithm: a reliable trained prediction model can assist doctors in quickly discerning whether a patient has heart disease, so that early treatment can reduce the number of deaths caused by heart disease.

Heart Disease Data Set
The UCI website provides a heart disease data set [32] with no missing data. This data set offers 13 patient characteristics: five continuous features and eight categorical features, which are used to predict whether the patient has heart disease. Table 2 describes the data characteristics.

Data Preprocessing
Standard preprocessing methods include sampling, noise reduction, normalization, data cleaning, and feature engineering. The data preprocessing methods used herein include standardization and feature engineering.

Standardization
The features in the data have different units and different ranges. Using the original features therefore causes some machine learning algorithms to focus only on the features with larger values, so the model cannot be trained accurately. For this reason, the features are converted to the same range through standardization so that all features have the same influence when the model is learning. The commonly used methods are z-score standardization and min-max standardization. This study uses min-max standardization to scale all feature values to between 0 and 1. The formula is as follows:

$$x_{new} = \frac{x_{ori} - x_{min}}{x_{max} - x_{min}}$$

where $x_{ori}$ denotes the original feature value; $x_{min}$ represents the minimum feature value; and $x_{max}$ denotes the maximum feature value. The calculated $x_{new}$ is the new feature value between 0 and 1 after standardization.
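Min-max standardization can be applied column by column, as in the minimal sketch below. The function name `min_max_scale` is an assumption, as is the handling of constant columns, which are mapped to 0 to avoid division by zero.

```python
import numpy as np

def min_max_scale(X):
    """Scale each column of X to [0, 1] using (x - x_min) / (x_max - x_min)."""
    X = np.asarray(X, dtype=float)
    x_min, x_max = X.min(axis=0), X.max(axis=0)
    # Assumption: a constant column has zero span; use 1.0 so it maps to 0.
    span = np.where(x_max > x_min, x_max - x_min, 1.0)
    return (X - x_min) / span
```

In practice the minimum and maximum would be computed on the training set and reused for the validation and test sets so that no information leaks from held-out data.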

Feature Engineering
In the original data, the values of a discrete feature may have no ordinal meaning, yet their numeric encoding implies an order. For example, the chest pain types in the data set are encoded as shown in the left half of Figure 1. Distance-based algorithms are affected by this implied order during learning, which leads to erroneous results. Thus, through one-hot encoding, each original discrete feature is expanded into mutually independent and exclusive features, as shown in the right half of Figure 1, so that each category has the same effect on the algorithm. This study uses one-hot encoding to expand the discrete features that have no ordinal meaning; for example, Cp is expanded into four independent features (Cp1, Cp2, Cp3, and Cp4).
Moreover, derived features increase the information that the machine learning model can learn. There are many ways to derive features [33]. This research treats the cluster assignments of the unsupervised K-means algorithm as new features and appends them to the original features for the machine learning models to learn. The Euclidean distance is used as the clustering metric, and the results for K = 2, K = 3, and K = 4 clusters are selected. Furthermore, principal component analysis projects the multidimensional features onto a lower-dimensional coordinate system and produces new features that are orthogonal to each other (which implies that the features are uncorrelated). Thus, it is possible to represent the original data with fewer features while retaining the most important information. This study treats the projection of the original data onto the new 2D coordinates as a derived feature for the model to learn.
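One-hot encoding and the 2D principal component projection can be sketched with NumPy alone. This is a minimal illustration rather than the authors' pipeline; the function names are assumptions, the PCA is computed through a singular value decomposition of the centered data, and K-means is left out (cluster labels from any K-means implementation could be appended as extra columns in the same way).

```python
import numpy as np

def one_hot(codes):
    """Expand integer category codes into mutually exclusive 0/1 columns.

    Returns (matrix, categories); column j corresponds to categories[j].
    """
    codes = np.asarray(codes)
    categories = np.unique(codes)  # sorted distinct codes
    matrix = (codes[:, None] == categories[None, :]).astype(float)
    return matrix, categories

def pca_project(X, n_components=2):
    """Project centered data onto its leading principal directions."""
    X = np.asarray(X, dtype=float)
    X_centered = X - X.mean(axis=0)
    # Rows of vt are the principal directions, ordered by singular value.
    _, _, vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ vt[:n_components].T
```

Each row of the one-hot matrix contains exactly one 1, and the two projected columns are uncorrelated with each other, which matches the orthogonality property described above.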

Application Controllable Fractional-Order Particle Swarm Algorithm
Herein, the population size is set to 30, the number of iterations is 100 generations, and six hyperparameters of the XGBoost model are selected. We used CFPSO to find the best position, i.e., the hyperparameters that minimize the "custom fitness," and compared the result with the hyperparameters officially recommended by XGBoost (XGBoost, 2021). The six hyperparameters are the "Learning Rate"; the "Max_depth" to which each tree can grow; the "minimum split threshold (Gamma)" required to divide a leaf node into two child nodes; "data sampling (Subsample)," the proportion of the training data sampled when each tree is trained; "feature sampling (Colsample_bytree)," the proportion of the training features sampled when each tree is trained; and the "L2 regularization parameter (Reg_lambda)." Table 3 shows the hyperparameter values and search ranges officially recommended by XGBoost.

Accuracy evaluates the correctness of the model predictions and is a basis for assessing the overall credibility of the model. The $F_\beta$ score weighs false positives and false negatives through precision and recall. When β > 1, recall is β times more important than precision. As the model is applied in the medical field, false negatives are more costly than false positives, so this study set β to 2. The formula is as follows:

$$F_\beta\,\text{score} = \frac{(1 + \beta^2) \cdot \text{Precision} \cdot \text{Recall}}{\beta^2 \cdot \text{Precision} + \text{Recall}}$$

where $\text{Precision} = \frac{TP}{TP + FP}$ and $\text{Recall} = \frac{TP}{TP + FN}$. TP denotes true positive; TN represents true negative; FP denotes false positive; FN represents false negative.
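As a worked illustration of these metrics, the following sketch computes accuracy and the $F_\beta$ score from confusion-matrix counts using the standard definitions. The counts are hypothetical and the function names are chosen here for illustration; they do not come from the study's code.

```python
def fbeta_score(tp, fp, fn, beta=2.0):
    """F-beta from confusion-matrix counts; beta > 1 weights recall more."""
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

def accuracy(tp, tn, fp, fn):
    """Share of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)

# Hypothetical counts: precision = 0.8, recall = 8/12.
tp, tn, fp, fn = 8, 6, 2, 4
f2 = fbeta_score(tp, fp, fn, beta=2.0)  # recall-weighted, as in this study
f1 = fbeta_score(tp, fp, fn, beta=1.0)  # balanced, for comparison
acc = accuracy(tp, tn, fp, fn)
```

Because recall is lower than precision in this example, the β = 2 score is lower than the β = 1 score, which reflects the heavier penalty on false negatives.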
This study considers the accuracy and $F_\beta$ score of the validation set as well as those of the training set. To avoid selecting hyperparameters that merely happen to perform well on the validation set, the custom fitness is set using the following equation:

$$F_\beta\,\text{score} = 0.5 \cdot \left(\text{Train } F_\beta\,\text{score} + \text{Val } F_\beta\,\text{score}\right) \quad (31)$$

Custom fitness is within the range $[0, \infty)$. The smaller the value, the better. The execution steps of the experiment are as follows:

Step 1: Load the heart disease data set;
Step 2: Preprocess the data set (e.g., standardization and one-hot encoding) into a form that the machine learning model can learn. Add derivative features, such as the K-means results and the two-dimensional PCA coordinates, as new features for model learning;
Step 3: Divide the data set into three parts: 70% training data set, 15% validation data set, and 15% test data set;
Step 4: Initialize the positions of the controllable fractional-order particle swarm optimizer (CFPSO) and use them as the initial model hyperparameters;
Step 5: Train the machine learning model on the training set according to the hyperparameter settings, evaluate the accuracy and $F_\beta$ score on the training and validation sets, and combine them into the custom objective function (custom fitness);
Step 6: Update each particle's own best solution and the global best solution of the controllable fractional-order particle swarm algorithm according to the minimum value of the custom objective function;
Step 7: Repeat Steps 5 and 6 until the stop condition is met;
Step 8: Use the global best solution as the model's best hyperparameters to train the model and evaluate it on the test set as the final result.
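The control flow of Steps 3 to 7 can be sketched as below. This is not the CFPSO update rule: the position update shown is the standard inertia-weight PSO, swapped in only to illustrate the loop. The inertia and acceleration coefficients, the unit search ranges, and `toy_fitness` (a stand-in for training XGBoost and computing the custom fitness) are all assumptions made for illustration.

```python
import random

def split_indices(n, seed=0):
    """Step 3: shuffle 0..n-1 and split 70% / 15% / 15%."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = (70 * n) // 100
    n_val = (15 * n) // 100
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

def pso_minimize(fitness, bounds, n_particles=30, n_iterations=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Standard PSO (not CFPSO) minimising `fitness` inside `bounds`."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Step 4: random initial positions inside the search ranges.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]            # Step 5
    g = min(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iterations):                    # Step 7: fixed budget
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Keep the particle inside the search range.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            fit = fitness(pos[i])                    # Step 5
            if fit < pbest_fit[i]:                   # Step 6: personal best
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit < gbest_fit:                  # Step 6: global best
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit

def toy_fitness(params):
    # Hypothetical stand-in for "train the model with these hyperparameters
    # and compute the custom fitness"; its minimum is at 0.3 in every dimension.
    return sum((x - 0.3) ** 2 for x in params)

train_idx, val_idx, test_idx = split_indices(100)
bounds = [(0.0, 1.0)] * 6  # six hyperparameters, hypothetical unit ranges
best_params, best_fit = pso_minimize(toy_fitness, bounds)
```

In an actual experiment, `toy_fitness` would be replaced by a function that trains the model on the training indices and scores it on the training and validation indices, and the bounds would follow the search ranges in Table 3.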

Experimental Results
Table 4 reports the evaluation metrics on the training and validation sets for the hyperparameters officially recommended by XGBoost and for the best hyperparameters found by CFPSO. Table 5 shows both results on the test data set, and Table 6 lists the best hyperparameter values found by CFPSO. The experimental results show that the model trained with the best hyperparameters found by CFPSO outperforms the model trained with the hyperparameters officially recommended by XGBoost: its accuracy, $F_\beta$ score, and custom fitness are all better. The CFPSO-tuned model is therefore more suitable as a reference for assisting doctors in determining whether a patient has heart disease.

Conclusions
This study replaced PSO with FPSO. We used the control theory viewpoint to deduce the CFPSO algorithm, rewrote the system as an uncertain linear system, and proved its controllability. As the fractional value λ affects the random number range, this research divided λ into 20 values at intervals of 0.1 and derived the range for each value. The range obtained when λ is 0.3 was the least conservative, so the random numbers could be selected from a wide range. This study used CFPSO to find the best hyperparameters of the model for predicting the heart disease data set. The experimental results show that the best hyperparameters found through CFPSO were better than the hyperparameters officially recommended by XGBoost. Therefore, the model tuned with CFPSO is more suitable for helping physicians quickly determine whether a patient has heart disease. In future work, variable-order fractional operators (VOFO) can be applied to this topic to cope with complicated real-world problems, because VOFO can offer greater robustness and flexibility in control theory [34][35][36][37][38].
Therefore, according to the lemma proposed by Lin [40], let $A \in \mathbb{C}^{n \times n}$. If the matrix measure satisfies $\mu(-A) < 1$, then $\det(I + A) \neq 0$, and the following formula is obtained: Therefore, the matrix of Equation (A1) is nonsingular. It can be observed that the matrix E I is full rank n 2 . Furthermore, according to the lemma proposed by Rosenbrock [41], the uncertain linear systems of Equations (13) and (14) are robust and controllable. End of proof.