The Technology-Oriented Pathway for Auxiliary Diagnosis in the Digital Health Age: A Self-Adaptive Disease Prediction Model

The advent of the digital age has accelerated the transformation and upgrading of the traditional medical diagnosis pattern. With the rise of the concept of digital health, emerging information technologies, such as machine learning (ML) and data mining (DM), have been extensively applied in the medical and health field, where the construction of disease prediction models is an especially effective method to realize auxiliary medical diagnosis. However, the existing related studies mostly focus on prediction analysis for a single disease, using models with which it might be challenging to predict other diseases effectively. To address the issues existing in the aforementioned studies, this paper constructs four novel strategies to achieve a self-adaptive disease prediction process, i.e., the hunger-state foraging strategy of producers (PHFS), the parallel strategy for exploration and exploitation (EEPS), the perturbation–exploration strategy (PES), and the parameter self-adaptive strategy (PSAS), and eventually proposes a self-adaptive disease prediction model with applied universality, strong generalization ability, and strong robustness, i.e., the multi-strategies optimization-based kernel extreme learning machine (MsO-KELM). Meanwhile, this paper selects six different real-world disease datasets as the experimental samples, which include the Breast Cancer dataset (cancer), the Parkinson dataset (Parkinson’s disease), the Autistic Spectrum Disorder Screening Data for Children dataset (Autism Spectrum Disorder), the Heart Disease dataset (heart disease), the Cleveland dataset (heart disease), and the Bupa dataset (liver disease). In terms of prediction accuracy, the proposed MsO-KELM obtains ACC values of 94.124%, 84.167%, 91.079%, 72.222%, 70.184%, and 70.476% in analyzing these six diseases, respectively. These ACC values are all nearly 2–7% higher than those obtained by the other models mentioned in this paper.
This study deepens the connection between information technology and medical health by exploring the self-adaptive disease prediction model, which is an intuitive representation of digital health and could provide a scientific and reliable diagnostic basis for medical workers.


Introduction
With the rapid development of the economy and technology, public demands for improved healthcare are growing stronger, and how to utilize information technology to achieve auxiliary diagnosis has received increasing social attention [1][2][3][4]. Moreover, the spread of the concept of digital health [5][6][7] promotes deep integration between information technology and healthcare, where a large number of machine learning (ML) models and data mining (DM) methods have been introduced into the traditional medical diagnosis pattern. To date, various existing studies adopt ML and DM technologies to predict diseases, such as predicting stable MCI patients [8], forecasting nuanced yet significant MT errors of clinical symptoms [9], survival risk prediction for esophageal cancer [10],


Disease Data Description
In this paper, we select six different real-world disease datasets as the experimental samples (these data are available at https://archive.ics.uci.edu/mL/index.php, accessed on 6 June 2022), i.e., the Breast Cancer dataset (cancer), the Parkinson dataset (Parkinson's disease), the Autistic Spectrum Disorder Screening Data for Children dataset (Autism Spectrum Disorder), the Heart Disease dataset (heart disease), the Cleveland dataset (heart disease), and the Bupa dataset (liver disease). The reasons for selecting these six disease datasets are as follows:

• They are common diseases in the real world;
• These disease data are extensively utilized by numerous investigators;
• These disease data have different internal structures and different diagnosis indicators.
The characteristics of these six disease datasets are shown in Table 1.

The Base Disease Classifier: Kernel Extreme Learning Machine
The KELM is an excellent classifier with advantages in generalization ability and learning speed [18][19][20][21][22]. Since its emergence, the KELM has been extensively studied by numerous investigators for problems such as hyperspectral image classification [23], data classification in enterprise cloud data [24], time-varying distributed parameter systems [25], and intrusion detection [26]. Notably, there are two significant parameters in the original KELM, i.e., the k value (kernel parameter) and the c value (regularization coefficient), which easily fall into a local optimum during the original searching process [27,28] and can influence the final prediction accuracy. The main calculation processes of the KELM are as follows, where Objective_function_ELM and Objective_function_KELM indicate the learning objective functions of the ELM and the KELM, respectively; hid(x) and H_outp represent the feature mapping of the hidden layer and the hidden-layer output matrix [18]; I_matrix indicates the identity matrix; Cofficient_r indicates the regularization coefficient; L indicates the expectation matrix; and k(x_i, x_j) indicates the kernel function. In this paper, we adopt the Gaussian kernel function, whose kernel parameter k indicates the kernel width.
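The referenced equations were lost in extraction; the following is a reconstruction of the standard ELM/KELM forms, written with the symbols defined above and assuming the usual KELM derivation rather than reproducing the paper's exact layout:

```latex
% Reconstructed standard ELM/KELM forms (assumed; the paper's exact
% equations are not preserved), using the symbols defined in the text:
\mathrm{Objective\_function\_ELM}:\quad
\min_{\beta}\ \tfrac{1}{2}\lVert \beta \rVert^{2}
 + \tfrac{\mathrm{Cofficient\_r}}{2}\,
   \lVert H_{\mathrm{outp}}\,\beta - L \rVert^{2},
\qquad
\beta = H_{\mathrm{outp}}^{\top}
  \Bigl( \tfrac{I_{\mathrm{matrix}}}{\mathrm{Cofficient\_r}}
        + H_{\mathrm{outp}} H_{\mathrm{outp}}^{\top} \Bigr)^{-1} L

\mathrm{Objective\_function\_KELM}:\quad
f(x) = \bigl[ k(x, x_{1}), \ldots, k(x, x_{N}) \bigr]
  \Bigl( \tfrac{I_{\mathrm{matrix}}}{\mathrm{Cofficient\_r}}
        + \Omega \Bigr)^{-1} L,
\qquad
\Omega_{ij} = k(x_{i}, x_{j})
  = \exp\!\Bigl( -\tfrac{\lVert x_{i} - x_{j} \rVert^{2}}{k^{2}} \Bigr)
```

In the kernel form, the explicit hidden-layer mapping hid(x) never needs to be computed, which is why only the two parameters k and c remain to be tuned.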

The Base Optimization Tool: Sparrow Search Algorithm
In order to improve the prediction performance of the base classifier, we take the SSA as the base optimizer and improve it to design a more effective optimizer, i.e., the EGSSA. As a recent meta-heuristic algorithm [29], the SSA has been applied to many real-world problems [30][31][32][33][34] because of its advantages in convergence speed and exploitation ability. There are two significant population roles in the optimization process of the SSA, i.e., producers and scroungers, in which the producers can be regarded as leaders with higher fitness [30]. The location update processes of these two roles are as follows. Equation (4) describes the location update process of the producers: posi^{t+1}_{i,d} indicates the current location of the i-th sparrow individual in the d-th dimension at the t-th iteration [30]; t and i indicate the current iteration and the current sparrow individual, respectively; α, R, and Random are random parameters set manually; ST is a warning threshold whose value can be set in (0.5, 1) [30]; and Matrix indicates a d-dimensional row vector in which each element is set to 1 [30]. Equation (5) describes the location update process of the scroungers: posi^{t+1}_{a} indicates the optimal location searched by the producers, posi_{worst} indicates the worst location in the current iteration [30], n is the sparrow population size, and V indicates a d-dimensional row vector in which each element is randomly set to 1 or −1. Equation (6) describes the location update process of the detection sparrows: posi^{t}_{best} indicates the optimal location at the t-th iteration, and β, ψ, and ξ indicate adjustment parameters [30].
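Equations (4)–(6) do not survive in this text; a standard statement of the SSA location updates from the original algorithm is reconstructed below, with the names mapped onto the symbols defined above (V⁺ denotes V^T(VV^T)^{-1}; f_i, f_g, and f_w are the current, global-best, and global-worst fitness values). This is a reconstruction under those assumptions, not a verbatim copy of the paper's equations:

```latex
% Eq. (4), producers:
\mathrm{posi}^{t+1}_{i,d} =
\begin{cases}
\mathrm{posi}^{t}_{i,d} \cdot
  \exp\!\bigl( -\tfrac{i}{\alpha \cdot \mathrm{iter\_max}} \bigr), & R < ST \\[4pt]
\mathrm{posi}^{t}_{i,d} + \mathrm{Random} \cdot \mathrm{Matrix}, & R \ge ST
\end{cases}

% Eq. (5), scroungers:
\mathrm{posi}^{t+1}_{i,d} =
\begin{cases}
\mathrm{Random} \cdot
  \exp\!\bigl( \tfrac{\mathrm{posi}_{\mathrm{worst}} - \mathrm{posi}^{t}_{i,d}}{i^{2}} \bigr), & i > n/2 \\[4pt]
\mathrm{posi}^{t+1}_{a}
  + \lvert \mathrm{posi}^{t}_{i,d} - \mathrm{posi}^{t+1}_{a} \rvert
    \cdot V^{+} \cdot \mathrm{Matrix}, & \text{otherwise}
\end{cases}

% Eq. (6), detection sparrows:
\mathrm{posi}^{t+1}_{i,d} =
\begin{cases}
\mathrm{posi}^{t}_{\mathrm{best}}
  + \beta \cdot \lvert \mathrm{posi}^{t}_{i,d} - \mathrm{posi}^{t}_{\mathrm{best}} \rvert, & f_{i} > f_{g} \\[4pt]
\mathrm{posi}^{t}_{i,d}
  + \psi \cdot \tfrac{\lvert \mathrm{posi}^{t}_{i,d} - \mathrm{posi}_{\mathrm{worst}} \rvert}
                     {(f_{i} - f_{w}) + \xi}, & f_{i} = f_{g}
\end{cases}
```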

The Introduction of Evaluation Metrics
To evaluate the disease prediction performance of the proposed MsO-KELM, we adopt four evaluation metrics in this paper, i.e., classification accuracy (ACC, ranging from 0 to 1.0) [18], sensitivity (ranging from 0 to 1.0) [18], specificity (ranging from 0 to 1.0) [18], and the Matthews correlation coefficient (MCC, ranging from −1.0 to 1.0) [18]. The ACC emphasizes the number of correctly predicted samples (the most significant metric for measuring the classification performance of a model), the sensitivity shows the ability to correctly predict a positive sample among all positive samples, the specificity indicates the ability to correctly predict a negative sample among all negative samples, and the MCC mainly shows the reliability of a model (the closer the MCC value is to 1, the more accurate and effective the model is). These four evaluation metrics are calculated as follows:

ACC = (TP + TN) / (TP + TN + FP + FN)
Sensitivity = TP / (TP + FN)
Specificity = TN / (TN + FP)
MCC = (TP × TN − FP × FN) / √((TP + FP)(TP + FN)(TN + FP)(TN + FN))

where TP indicates the number of samples with a positive prediction result and a positive label, TN the number of samples with a negative prediction result and a negative label, FP the number of samples with a positive prediction result and a negative label, and FN the number of samples with a negative prediction result and a positive label.

The Proposed Methodology
The core idea of the proposed methodology is to construct a self-adaptive disease prediction model with high accuracy, strong generalization ability, and strong robustness. Therefore, we select an excellent base-classifier (KELM) and design an enhanced metaheuristic algorithm (EGSSA) as the optimizer to finally construct the self-adaptive disease prediction model, i.e., the MsO-KELM. Specifically, there are four novel strategies in the MsO-KELM, i.e., the hunger-state foraging strategy of producers (PHFS), the parallel strategy for exploration and exploitation (EEPS), the perturbation-exploration strategy (PES), and the parameter self-adaptive strategy (PSAS), where the EGSSA consists of the PHFS, the EEPS, and the PES. In addition, the PSAS will act on a parameter acquisition mechanism which is formed by combining the EGSSA with the KELM. In this section, the technological details of the MsO-KELM will be discussed as follows:

Foraging Strategy of Producers in Hunger-State (PHFS)
In the original SSA, the producers could be regarded as the leader roles in the sparrow populations, which are responsible for searching for food-rich positions and providing foraging directions for all scroungers, and the scroungers could follow the producers to achieve the foraging process. In that case, if the producers could expand the searching range to find a safer and more adequate position, it would provide more possibilities for scroungers to improve the foraging rate and finally enhance the global convergence performance.
However, the existing location update mechanism mainly focuses on the exploitation ability (local searching ability) of the original SSA, which may cause the producers to become trapped in a local optimum. To address this issue and enhance the exploration ability (global searching ability), we introduce the hunger games search algorithm (HGS) [35,36] into the original SSA to construct the PHFS, expanding the searching range for the optimal position. Specifically, the PHFS is a hybrid strategy that retains the exploitation advantage of the original SSA while combining the exploration approach of the HGS with the location update mechanism of the producers.
In the original SSA, the location update process of the producers is shown in Equation (4), and the key calculation function affecting the convergence efficiency is shown in Equation (11), where α and iter_max are parameters set manually. Equation (11) has a descending trend and eventually converges to 0, which means that, as the individual index increases, the producers tend to repeat the searching behavior at a certain position and eventually fall into a local optimum. By contrast, the HGS has a significant advantage in exploration ability, and its hungry feature function affecting the convergence efficiency is shown in Equation (12), where par_maunal is a parameter set manually. Equation (12) does not follow a single ascending or descending trend; instead, its trend is governed by the parameter value: when the input value is less than par_maunal, the function ascends as the input increases, and when the input value is larger than par_maunal, it descends as the input increases. Therefore, introducing the hungry feature of the HGS into the searching behavior of the producers expands the searching range for the optimal position and ultimately enhances the exploration ability of the SSA. The searching processes based on Equations (11) and (12) are shown in Figure 2.
According to Figure 2, the original location update method easily searches within a local space (the red region in Figure 2a), whereas the hungry roles of the HGS search a relatively global region (Figure 2b).
Therefore, the PHFS can be described as follows, where the meaning of posi_new^{t+1}_{i,d} is similar to that of posi^{t+1}_{i,d} in Equation (4), posi_current^{t}_{i,d} indicates the location from the last iteration, rand is a random number, and o is an adjustment parameter that is set to 2 in this paper.
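The two convergence factors contrasted above can be sketched in code. The decaying producer factor follows the standard SSA form of Equation (11); hunger_factor below is only an illustrative stand-in for the Equation (12) hungry feature (a bump that rises before the manual parameter and falls after it), not the exact HGS expression:

```python
import math

def original_decay(i, alpha, iter_max):
    # Eq. (11)-style producer factor in the original SSA: strictly
    # decreasing in i and converging to 0, which drives the repeated
    # local searching behavior described above.
    return math.exp(-i / (alpha * iter_max))

def hunger_factor(x, par, width=1.0):
    # Illustrative stand-in for the Eq. (12) hungry feature of the HGS
    # (the exact HGS form is not reproduced here): the factor rises
    # while x < par and falls once x > par.
    return math.exp(-((x - par) ** 2) / (2 * width ** 2))
```

The non-monotone shape of the hunger factor is what lets producers leave an already-visited region instead of contracting onto it.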


Parallel Strategy for Exploration and Exploitation (EEPS)
As shown in Equation (4), we could find that the location update process of producers consists of two different stages. The PHFS acts on stage 1 (R < ST), and the EEPS described below is going to act on stage 2 (R ≥ ST). According to the principle of SSA, the producers would move to a safer place when perceiving danger coming. In the new location update process, the producers would search globally with a normally distributed random manner and eventually converge to the optimal position. Nevertheless, there are still some limitations in this moving process, i.e., (i) the existing searching range could continue to be expanded; and (ii) the single global searching process could affect the convergence accuracy. Therefore, we design the EEPS to balance the exploration process and exploitation process in this paper. In fact, the balancing effect of EEPS is able to dynamically adjust the position searching approach of the producers, i.e., by promoting the producers exploring the whole space with a global searching approach for expanding the searching range of the potential optimal position in the early stage, while exploiting the current area with a local searching approach when a certain area is close to the optimal position.
Similarly, inspired by the literature [37,38] related to the HGS, we introduce the pattern of food-approaching into the location update process of the producers in stage 2 and construct the balance factor shown in Equation (14), where para is a random number in the range (0, 1), δ indicates a control parameter (set to 2), iter indicates the current number of iterations, and iter_max indicates the maximum number of iterations. Subsequently, the EEPS, which combines the balance factor with the original location update mechanism, is shown in Equation (15), where the meanings of the parameters are similar to those in Equations (4), (6), and (14). The original searching process and the searching process after applying the EEPS are compared in Figure 3a and Figure 3b, respectively. In Figure 3, points of the same color are the positions of all producers in one iteration, while different colors indicate different iterations. Figure 3a shows that the original location update mechanism of the producers in stage 2 achieves a global searching process to some extent, but it lacks local exploitation behavior and its global searching range is not large enough, which can ultimately affect the convergence accuracy. As a comparison, Figure 3b shows that the EEPS not only expands the global searching range but also retains the exploitation ability, which can be clearly seen in the color-changing process of the points.
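The intended behavior of the balance factor can be sketched as follows; the exact expression of Equation (14) is not reproduced here, so the decaying form below (para random in (0, 1), δ = 2) is an assumption that matches the described exploration-to-exploitation transition:

```python
import random

def balance_factor(itr, iter_max, delta=2.0, rng=random):
    # Assumed form of the balance factor: para is random in (0, 1),
    # delta is the control parameter, and the factor decays to 0 as
    # iterations advance, shifting producers from exploration to
    # exploitation.
    para = rng.random()
    return para * delta * (1.0 - itr / iter_max)

def eeps_step(pos, best, itr, iter_max, rng=random):
    # Large factor (early stage): wide, normally distributed global move;
    # small factor (late stage): local refinement pulled toward the best.
    w = balance_factor(itr, iter_max, rng=rng)
    return pos + w * rng.gauss(0.0, 1.0) + (1.0 - w) * (best - pos) * rng.random()
```

At iter = iter_max the factor vanishes, so late-stage producers only refine around the current best position.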

Perturbation-Exploration Strategy (PES)
Notably, the PHFS and the EEPS only enhance the exploration ability of the producer populations, which means there remains a possibility of the whole sparrow population becoming trapped in a local optimum. Therefore, we introduce the Cauchy distribution operator [39][40][41] to construct the PES, which further expands the searching range and avoids the local optimum situation in late iterations.
In the PES, we adopt the Cauchy distribution operator to obtain a variant of the current optimal individual. We then compare the fitness of the current optimal individual with that of the variant and keep the better solution. In this paper, the probability density function of the Cauchy distribution operator is

f(k) = (1/π) · a / (a² + k²),

where a is a parameter (equal to 1 in this paper) and k is a variable whose range is from negative infinity to positive infinity. Based on the Cauchy distribution operator, we obtain the perturbation-exploration process as follows, where r is a generating function for Cauchy-distributed random variables.
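A minimal sketch of the PES, assuming the common multiplicative form of Cauchy mutation (new = best + best · Cauchy(0, a)); because the comparison is greedy, the returned solution is never worse than the input:

```python
import math
import random

def cauchy_sample(a=1.0, rng=random):
    # Inverse-CDF sampling from a Cauchy distribution with scale a.
    return a * math.tan(math.pi * (rng.random() - 0.5))

def pes(best_pos, fitness, rng=random):
    # Perturb each coordinate of the current optimal individual with
    # Cauchy noise, then greedily keep the fitter of the two solutions
    # (fitness is minimized here).
    candidate = [p + p * cauchy_sample(rng=rng) for p in best_pos]
    return candidate if fitness(candidate) < fitness(best_pos) else best_pos
```

The heavy tails of the Cauchy distribution occasionally produce very large jumps, which is exactly what lets the population escape a late-iteration local optimum.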

Parameter Self-Adaptive Strategy (PSAS)
Obtaining appropriate parameter values (k and c) is critical for the original KELM. However, the existing method of parameter selection is grid searching, which not only increases the computational cost but also affects the final quality of the obtained parameters. To achieve a self-adaptive process for these two parameters, we propose the PSAS by combining the EGSSA with the KELM.
Specifically, there are four significant stages in the PSAS: (i) the location initialization of the sparrow populations (this paper adopts the random generation method of the original SSA); (ii) the core parameters (k and c) are automatically obtained by the EGSSA; (iii) to eliminate the randomness of these two obtained parameters, 10-fold cross-validation [18] is utilized to re-obtain the optimal parameter values; and (iv) the re-obtained optimal parameter values are introduced into the original KELM, and the resulting MsO-KELM finally performs prediction on the six disease datasets based on 10-fold cross-validation.
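The parameter-acquisition idea behind the PSAS can be sketched as below. This is only a sketch: the EGSSA is replaced by a plain random search over (k, c), the Gaussian-kernel form and toy two-cluster data are illustrative assumptions, and candidates are scored on training accuracy rather than the paper's 10-fold cross-validation:

```python
import numpy as np

def gaussian_kernel(A, B, k):
    # K_ij = exp(-||a_i - b_j||^2 / k^2), one common Gaussian-kernel form.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / k ** 2)

def kelm_fit(X, y, k, c):
    # Solve (I / c + Omega) beta = y, with Omega the training kernel
    # matrix, c the regularization coefficient, and k the kernel width.
    Omega = gaussian_kernel(X, X, k)
    return np.linalg.solve(np.eye(len(X)) / c + Omega, y)

def kelm_predict(X_train, beta, X_new, k):
    return gaussian_kernel(X_new, X_train, k) @ beta

def psas_search(X, y, trials=50, seed=0):
    # Stand-in for stages (i)-(ii) of the PSAS: the EGSSA is replaced by
    # a plain random search over (k, c); each candidate is scored by its
    # classification accuracy.
    rng = np.random.default_rng(seed)
    best = (0.0, None, None)
    for _ in range(trials):
        k = rng.uniform(0.1, 5.0)
        c = 10.0 ** rng.uniform(-2.0, 3.0)
        beta = kelm_fit(X, y, k, c)
        acc = float(np.mean(np.sign(kelm_predict(X, beta, X, k)) == y))
        if acc > best[0]:
            best = (acc, k, c)
    return best

# Toy two-cluster data standing in for a disease dataset.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 0.5, (20, 2)), rng.normal(2, 0.5, (20, 2))])
y = np.array([-1.0] * 20 + [1.0] * 20)
acc, k_best, c_best = psas_search(X, y)
```

In the actual pipeline, stage (iii) would score each (k, c) candidate by 10-fold cross-validation accuracy instead of training accuracy before fixing the final values.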
In summary, the novel MsO-KELM model utilizes the four major strategies, i.e., PHFS, EEPS, PES, and PSAS, to achieve improved prediction performance, and the specific details of the MsO-KELM are shown in Algorithm 1.

Results
In this section, we mainly analyze the results of two experiments, i.e., the performance analysis of the EGSSA and the disease prediction evaluation of the MsO-KELM. To ensure the reasonableness of these two experiments, we set the population size and the maximum number of iterations to (30, 500) for experiment 1 and (10, 50) for experiment 2. Importantly, each algorithm in experiment 1 runs 30 times independently on each function to minimize the effects of algorithmic randomness [30]. Moreover, all experiments are conducted in the same running environment, i.e., Intel Core i7, 2.40 GHz, 8 GB RAM, MATLAB R2021a, and IBM SPSS Statistics 20.

Experiment 1: The Performance Analysis of EGSSA
In this paper, we select nine different swarm intelligence algorithms to conduct the comparison analysis, which contain seven classical algorithms, i.e., SSA [29], PSO [42], GWO [43], HHO [44], LSA [45], WOA [46], and FPA [47], and two classical swarm intelligence variants, i.e., SCACSSA [48] and HHOHGSO [49]. The parameters which exist in both EGSSA and SSA are set to the same values, and the specific parameter settings in this experiment are shown in Table 2.


Performance Analysis Based on 23 Classical Benchmark Functions
The 23 classical benchmark functions are frequently adopted to evaluate the performance of optimization algorithms, which could be divided into three categories, i.e., the unimodal functions, the multimodal functions, and the fixed-dimension multimodal functions [30]. The specific characteristics of these 23 benchmark functions are shown in Table 3, and the experimental results of the aforementioned algorithms based on the 23 functions are shown in Table 4. Table 3. Characteristics of the 23 classical benchmark functions [30].

[Table 3 columns: Function, Equation, Dim, Range, Optimal.]

In this experiment, considering that all the swarm intelligence optimization algorithms perform multiple iterations, we utilize the mean value (avg) and the standard deviation (std) to measure performance [30], where the avg value is the key metric. An avg value closer to the optimal value means the algorithm performs better on the current function. In addition, the std value shows the stability of the algorithm on the current function. Notably, when two algorithms obtain the same avg value, the std value serves as the secondary evaluation metric.
According to the results in Table 4, the EGSSA obtains the best results among all 10 compared algorithms on the benchmark functions F1-F6, F9-F13, and F16-F23. Furthermore, in solving the unimodal functions F1-F4, the EGSSA not only obtains the optimal values but also shows the most stable performance. In solving the multimodal functions F9 and F11 and the fixed-dimension multimodal functions F16-F19, the EGSSA obtains the optimal values, while in solving the multimodal functions F10, F12, and F13 and the fixed-dimension multimodal functions F20-F23, the EGSSA obtains the best results among all 10 compared algorithms and shows more stable performance than the others.
Apart from the comparison table, we can verify the validity of the EGSSA based on the convergence curves. In Figure 4, F5 and F6 are unimodal functions, F12 and F13 are multimodal functions, and F22 and F23 are fixed-dimension multimodal functions. Figure 4 indicates that the EGSSA obtains the fastest convergence among the 10 compared algorithms in solving the benchmark functions F6, F12, F13, F22, and F23. When solving F5, the convergence speed of the EGSSA is similar to that of HHO, SSA, and SCACSSA during the first 100 iterations, which is significantly superior to the others, and the EGSSA obtains the fastest convergence among the 10 compared algorithms after 100 iterations. In summary, the EGSSA enhances the overall convergence performance compared with the original SSA, the other classical swarm intelligence optimization algorithms, and the other variants. Therefore, the EGSSA provides good support for the construction of the subsequent disease prediction model.


Statistical Test
To show the statistical significance of the proposed EGSSA, we introduce the Wilcoxon rank-sum test in this section [30]. The better of any two compared algorithms can be identified by comparing the obtained significance level (p-value) with 0.05. A p-value of less than 0.05 indicates that the former (the algorithm to be verified) has a significant advantage over the latter (the algorithm being compared); otherwise, there is no significant difference between the two. The results of the statistical test are shown in Table 5. According to Table 5, the EGSSA obtains 16 better statistical test results among the 23 benchmark functions compared with the original SSA, while also obtaining 7 statistical test results equal to those of the original SSA. For the comparisons between the EGSSA and the other six classical swarm intelligence algorithms, the p-values of EGSSA versus PSO, EGSSA versus GWO, EGSSA versus HHO, EGSSA versus LSA, EGSSA versus WOA, and EGSSA versus FPA are much less than 0.05, and the EGSSA obtains better results on 16-19 benchmark functions than these compared algorithms. When comparing the EGSSA with the HHO, although the p-value is larger than 0.05, the number of benchmark functions on which the EGSSA obtains better results is still significantly higher. Similarly, for the comparison between the EGSSA and the SCACSSA, the EGSSA has significant advantages: it obtains better results on 19 of the 23 benchmark functions while obtaining 4 statistical test results equal to those of the SCACSSA. When comparing the EGSSA with the HHOHGSO, although the p-value is larger than 0.05, the number of benchmark functions on which the EGSSA obtains better results is still significantly higher.
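The decision rule above can be sketched with the large-sample normal approximation of the rank-sum statistic (ties are ignored for simplicity; a production analysis would use a statistics package):

```python
import math

def rank_sum_p(x, y):
    # Wilcoxon rank-sum test via the normal approximation: rank the
    # pooled sample, sum the ranks of x, standardize, and convert |z|
    # to a two-sided p-value (assumes no tied values).
    n1, n2 = len(x), len(y)
    rank = {v: i + 1 for i, v in enumerate(sorted(x + y))}
    W = sum(rank[v] for v in x)
    mu = n1 * (n1 + n2 + 1) / 2.0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (W - mu) / sigma
    return 1.0 - math.erf(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))

def significantly_different(runs_a, runs_b, alpha=0.05):
    # p < 0.05: the two algorithms' independent-run results differ
    # significantly; otherwise no significant difference is claimed.
    return rank_sum_p(runs_a, runs_b) < alpha
```

Here runs_a and runs_b would be, for example, the 30 independent best-fitness values each algorithm produces on one benchmark function in experiment 1.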

Experiment 2: The Prediction Analysis in Solving Different Disease Datasets
This section mainly verifies the prediction performance of the proposed MsO-KELM in solving the aforementioned six disease datasets. Furthermore, we select five other optimization algorithm-based KELM variants as the compared group, i.e., GWO-KELM, HHO-KELM, FPA-KELM, WOA-KELM, and SSA-KELM. The prediction results of the different algorithms on four different evaluation metrics are shown in Tables 6-11 and Figures 5-7.
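All of the compared variants wrap the same underlying KELM classifier, whose training step is a single closed-form solve, β = (I/C + K)⁻¹T, in Huang et al.'s standard formulation. The sketch below is our own minimal illustration of a binary KELM with a Gaussian kernel (the class name, toy data, and parameter values are assumptions, not the paper's code; in the paper, the optimizer would presumably tune parameters such as the regularization coefficient C and the kernel width):

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """Gaussian (RBF) kernel matrix between row-sample matrices A and B."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * sq)

class KELM:
    """Kernel extreme learning machine: beta = (I/C + K)^-1 T."""
    def __init__(self, C=1.0, gamma=0.1):
        self.C, self.gamma = C, gamma

    def fit(self, X, y):  # y in {-1, +1}
        self.X = X
        K = rbf_kernel(X, X, self.gamma)
        n = X.shape[0]
        # Closed-form training: solve (I/C + K) beta = T.
        self.beta = np.linalg.solve(np.eye(n) / self.C + K, y.astype(float))
        return self

    def predict(self, Xq):
        Kq = rbf_kernel(Xq, self.X, self.gamma)
        return np.sign(Kq @ self.beta)

# Toy usage: two well-separated clusters, one per class.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2, 0.3, (20, 2)), rng.normal(2, 0.3, (20, 2))])
y = np.array([-1] * 20 + [1] * 20)
model = KELM(C=10.0, gamma=0.5).fit(X, y)
acc = (model.predict(X) == y).mean()
```

Because training reduces to one linear solve, KELM keeps the rapid learning property that the optimization-based variants above all inherit.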

Data Pre-Processing and Parameter Settings
Considering that the samples with missing values account for only about 1-3% of the sample volume of the relevant dataset, we remove these samples. Meanwhile, we normalize all attributes to the interval from −1 to 1. To obtain a fair comparison among these six different KELM variants, we set the same population size and maximum number of iterations for all compared algorithms, namely 10 and 50, respectively. In addition, to effectively estimate the generalization ability of the proposed model, we utilize 10-fold cross-validation, which divides the original data samples into 10 groups: each group is used in turn as the testing set while the other 9 groups form the training set, yielding 10 models. These 10 models are evaluated on their respective testing sets, and the final cross-validation result is obtained by averaging the results of the 10 models. The other specific parameters are set based on Table 2.
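The preprocessing and evaluation protocol above (scaling attributes to [−1, 1], then averaging over 10 stratified folds) can be sketched as follows; the synthetic data and the stand-in classifier are placeholders for the disease datasets and the KELM, not the paper's actual pipeline:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier  # stand-in classifier
from sklearn.preprocessing import MinMaxScaler

# Synthetic placeholder data standing in for a disease dataset.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

accs = []
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
for train_idx, test_idx in kfold.split(X, y):
    # Scale attributes to [-1, 1] using training-fold statistics only,
    # so no information leaks from the testing fold.
    scaler = MinMaxScaler(feature_range=(-1, 1)).fit(X[train_idx])
    clf = KNeighborsClassifier().fit(scaler.transform(X[train_idx]), y[train_idx])
    accs.append(clf.score(scaler.transform(X[test_idx]), y[test_idx]))

# Final cross-validation result: mean of the 10 per-fold accuracies.
mean_acc = float(np.mean(accs))
```

Each of the 10 models sees a different 90/10 split, and the reported score is the average over all 10 testing folds.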

Experiment Results Analysis
The prediction results in solving Breast Cancer via 10-fold cross-validation method are shown in Table 6. Based on Table 6 and Figure 5a, it is clearly shown that the proposed MsO-KELM could obtain the best average values of ACC, sensitivity, specificity, and MCC among all six compared algorithms, at 94.124%, 94.434%, 94.539%, and 87.099%, respectively. As a comparison, the GWO-KELM obtains the second best average values of these four evaluation metrics at 93.695%, 92.628%, 94.212%, and 86.068%, respectively. In addition, the HHO-KELM, the WOA-KELM, and the SSA-KELM all obtain the worst ACC values, specificity values, and MCC values at 92.237%, 93.944%, and 82.433%, respectively, but the FPA-KELM obtains the worst sensitivity value at 88.539%. According to these evaluation metric results, we could find that the proposed MsO-KELM enhances the overall prediction performance and outperforms the other optimization algorithm-based KELM models in predicting breast cancer; therefore, the MsO-KELM could be adopted in the early detection of breast cancer.
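The four metrics reported throughout this section (ACC, sensitivity, specificity, and MCC) can all be computed from the binary confusion matrix with the standard formulas; a small self-contained sketch (the example labels are purely illustrative):

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """ACC, sensitivity, specificity, and MCC from a 2x2 confusion matrix."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))  # true positives
    tn = np.sum((y_true == 0) & (y_pred == 0))  # true negatives
    fp = np.sum((y_true == 0) & (y_pred == 1))  # false positives
    fn = np.sum((y_true == 1) & (y_pred == 0))  # false negatives
    acc = (tp + tn) / (tp + tn + fp + fn)
    sens = tp / (tp + fn)        # recall on the positive (diseased) class
    spec = tn / (tn + fp)        # recall on the negative (healthy) class
    denom = np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, sens, spec, mcc

# Example: 10 labels, one false negative and one false positive.
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]
acc, sens, spec, mcc = binary_metrics(y_true, y_pred)
```

Unlike ACC alone, the MCC term balances all four confusion-matrix cells, which is why it is reported alongside sensitivity and specificity for these class-imbalanced disease datasets.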
The prediction results in solving Parkinson via 10-fold cross-validation method are shown in Table 7. Based on Table 7 and Figure 5b, it is clearly shown that the proposed MsO-KELM could obtain the best average values of ACC, sensitivity, specificity, and MCC among all six compared algorithms at 84.167%, 89.596%, 67.571%, and 57.140%, respectively. As a comparison, the FPA-KELM could obtain the second best ACC value at 82.833% and the second best MCC value at 54.365%; the HHO-KELM and the WOA-KELM could obtain the second best sensitivity value at 88.423% and the second best specificity value at 67.500%, respectively. In addition, the SSA-KELM obtained the worst ACC value, specificity value, and MCC value at 80.333%, 61.500%, and 47.749%, respectively, but the GWO-KELM obtained the worst sensitivity value at 85.991%. According to these evaluation metric results, we could find that the proposed MsO-KELM has better prediction performance than other compared models for Parkinson's disease.
The prediction results in solving Autistic Spectrum Disorder Screening Data for Children via 10-fold cross-validation method are shown in Table 8. Based on Table 8 and Figure 6a, it is clearly shown that the proposed MsO-KELM could obtain the best average values of ACC and MCC among all six compared algorithms at 91.079% and 82.899%, respectively. As a comparison, the WOA-KELM obtained the second best ACC value at 89.433% and the FPA-KELM obtained the second best MCC value at 80.578%. In addition, for the sensitivity value and specificity value, the MsO-KELM could obtain results at 98.424% and 83.379%, respectively, which are close to the best values for these two metrics. According to these evaluation metric results, we could find that the proposed MsO-KELM has the best prediction accuracy among these compared models in the diagnosis of autism spectrum disorders.
The prediction results in solving Heart Disease via 10-fold cross-validation method are shown in Table 9. Based on Table 9 and Figure 6b, it is clearly shown that the proposed MsO-KELM could obtain the best average values of ACC, sensitivity, and MCC among all six compared algorithms at 72.222%, 68.853%, and 44.557%, respectively. As a comparison, the FPA-KELM obtained the second best ACC value, sensitivity value, and MCC value at 70.741%, 63.198%, and 41.245%, respectively. In addition, the SSA-KELM obtained the worst ACC value, sensitivity value, and MCC value at 66.296%, 42.951%, and 32.280%, respectively. For the specificity value, the MsO-KELM could obtain a result which is close to that of the other models. According to these evaluation metric results, we could find that the proposed MsO-KELM has more advantages and could obtain a better classification performance than the other compared models.
The prediction results in solving Cleveland via 10-fold cross-validation method are shown in Table 10. Based on Table 10 and Figure 7a, it is clearly shown that the proposed MsO-KELM could obtain the best average values of ACC and MCC among all six compared algorithms at 70.184% and 40.608%, respectively. As a comparison, the WOA-KELM obtained the second best average values of ACC and MCC at 68.839% and 38.364%, respectively. In addition, the SSA-KELM could obtain the worst ACC value, specificity value, and MCC value at 64.517%, 45.252%, and 30.108%, respectively, but the GWO-KELM and the HHO-KELM obtained the worst sensitivity value at 68.668%. According to these evaluation metric results, we could find that the proposed MsO-KELM is superior to the others and could be adopted to diagnose the Cleveland heart disease.
The prediction results in solving Bupa via 10-fold cross-validation method are shown in Table 11. Based on Table 11 and Figure 7b, it is clearly shown that the proposed MsO-KELM could obtain the best average values of ACC, specificity, and MCC among all six compared algorithms at 70.476%, 59.423%, and 36.706%, respectively. As a comparison, the WOA-KELM could obtain the second best ACC value at 68.095%, the WOA-KELM and the HHO-KELM could obtain the second best specificity value at 55.407%, and the FPA-KELM could obtain the second best MCC value at 33.780%. For the sensitivity value, the MsO-KELM could obtain a result which is close to that of the other models. According to these evaluation metric results, we could find that the proposed MsO-KELM has more advantages in predicting liver disease than the other compared models.

Discussion
Combining information technology with medical information to realize auxiliary diagnosis is a research trend in the digital health age; therefore, researchers have conducted a large number of prediction studies on common diseases by utilizing machine learning models. However, the existing studies have weak generalization performance, which limits their further application to other diseases. In other words, these models may obtain better prediction results for a certain disease but worse prediction results for other, different diseases. The reason for this weak generalization performance is that researchers mainly emphasize result orientation for a certain disease, i.e., they focus on training with single-disease data to obtain an effective prediction model suitable for the current disease. In fact, a prediction model with better generalization performance could help medical workers to diagnose various diseases. Therefore, exploring a prediction model with strong generalization ability has important theoretical and practical significance in the current digital health age.

Theoretical Significance
This paper is an attempt to explore a technology-oriented pathway for auxiliary diagnosis. On the one hand, our investigation adopts different disease data as the experimental objects, and aims to expand the application scope of the final findings by mining the internal characteristics of these different disease data, which could provide a novel research idea for researchers to eliminate the application limitations of the existing studies, and could provide effective theoretical guidance for researchers to realize comprehensive auxiliary diagnosis in facing various diseases.
On the other hand, utilizing an enhanced meta-heuristic algorithm to optimize the operating mechanism and inner structure of the original classifier is an efficient way to improve the performance of a model. The proposed MsO-KELM not only enhances the generalization ability and rapid learning ability of the original KELM classifier, but also realizes a self-adaptive process in predicting different diseases by introducing the EGSSA optimizer. The results of the evaluation metrics show that the MsO-KELM has significant advantages over all compared models in predicting the six diseases. Specifically, the MsO-KELM obtains the best ACC value when predicting each disease, e.g., 94.124% in predicting breast cancer, 91.079% in predicting autism spectrum disorder, and 84.167% in predicting Parkinson's disease. These ACC values reflect the effectiveness of the MsO-KELM in disease prediction. Compared with some disease-specific models [11,50], there may be a slight decrease in the prediction accuracy of the MsO-KELM. However, according to the No Free Lunch (NFL) theorem [51], although the MsO-KELM has slightly decreased accuracy in predicting a certain disease, it can predict more different diseases, which enhances the medical application value of the model.

Practical Significance
On the one hand, for the six different real-world diseases selected in this study, our investigation could assist doctors to diagnose or screen patients with related diseases, guide patients to prevent the diseases in a targeted manner, reduce the risk level of these diseases, and finally improve the survival quality of the patients.
On the other hand, there may be a one-to-one match between the existing prediction models and their prediction targets, which increases the economic and time costs for medical departments when analyzing different diseases. Therefore, a universal prediction model with strong generalization ability and robustness could predict more different diseases and thus improve the overall work efficiency of medical workers.

Limitations
This study completes the construction of a self-adaptive prediction model in solving different diseases, but some limitations related to the MsO-KELM should be noted.
(i) In terms of the model, although this paper presents four entirely novel optimization strategies to further optimize the prediction performance of the model, the performance improvement is accompanied by an increase in computational complexity, which is inevitable and can be understood according to the NFL theorem. Therefore, in future studies, we will further enhance the prediction performance of the MsO-KELM by redesigning a novel parameter optimization mechanism and optimizing the fundamental structure of the prediction model to reduce its computational complexity.
(ii) In terms of data, on the one hand, the disease data analyzed in this paper come from publicly available datasets, where each disease dataset has a limited sample size and some datasets even have specific geographical characteristics; on the other hand, the number of disease categories covered by the selected datasets is not sufficient. Therefore, we will increase the categories of disease data and expand the sample size of the disease data. In addition, we will also strive to collect global disease data to eliminate the influence of geographical factors.

Conclusions
To effectively predict more diseases and provide more comprehensive auxiliary diagnosis for medical workers, this paper proposes a novel disease prediction model, i.e., the MsO-KELM. Two experiments are conducted in this paper to elaborate on the details of the MsO-KELM. The first experiment shows the optimization process of the EGSSA, which aims to construct the self-adaptive characteristic of the subsequent MsO-KELM model. The second experiment proves that the MsO-KELM has high accuracy, strong generalization ability, and strong robustness, which highlights its better disease prediction performance. In the future, we will conduct in-depth research into optimizing the prediction model and expanding the disease data samples to achieve a further breakthrough in medical informatics.