Multilayer Perceptron Neural Network with Arithmetic Optimization Algorithm-Based Feature Selection for Cardiovascular Disease Prediction

Abstract: In the healthcare field, diagnosing disease is the most pressing concern. Various diseases, including cardiovascular diseases (CVDs), significantly influence illness and death. Early and precise diagnosis of CVDs, on the other hand, can decrease the chance of death, resulting in a better and healthier life for patients. Researchers have used traditional machine learning (ML) techniques for CVD prediction and classification. However, many of them are inaccurate and time-consuming because of the unavailability of quality data (including imbalanced samples), inefficient data preprocessing, and the existing selection criteria. These factors lead to overfitting or to bias towards a certain class label in the prediction model. Therefore, an intelligent system is needed that can accurately diagnose CVDs. We propose an automated ML model for the prediction and classification of various kinds of CVD. Our prediction model consists of multiple steps. Firstly, a benchmark dataset is preprocessed using filter techniques. Secondly, a novel arithmetic optimization algorithm is implemented as a feature selection technique to select the best subset of features that influence the accuracy of the prediction model. Thirdly, a classification task is implemented using a multilayer perceptron neural network to classify the instances of the dataset into two class labels, determining whether they have a CVD or not. The proposed ML model is trained on the preprocessed data and then tested and validated. Furthermore, for the comparative analysis of the model, various performance evaluation metrics are calculated, including overall accuracy, precision, recall, and F1-score. The proposed prediction model achieves 88.89% accuracy, which is the highest in a comparison with the traditional ML techniques.


Introduction
Cardiovascular disease (CVD) is a critical health problem caused by disorders of the heart and blood vessels, and it is one of the significant causes of mortality worldwide. CVD includes several categories, such as coronary heart disease, strokes, and others. Machine learning (ML) has been applied effectively in biomedicine and other fields, providing accessible methods for tasks such as photovoltaic power forecasting as well as for diagnosing CVD and other diseases [1,2]. Medical data are usually large in volume. The analysis of extensive data requires a lot of resources and time, which increases the computational complexity and reduces the efficiency of ML models. Furthermore, not all features in a dataset contribute effectively to CVD diagnosis.
According to the World Heart Federation (WHF), more than 17 million people die from CVDs yearly, and the World Health Organization (WHO) states that the leading cause of death worldwide is CVD [3]. About 26 million adults are diagnosed with heart disease each year, according to the European Society of Cardiology (ESC). Accurate and early diagnosis of CVD risk in patients is essential in reducing its associated risks [3]. Many reports indicate that CVD is one of the main causes of sudden death in industrialized countries. These increasing death rates, especially prominent in developed countries, affect the health, financial resources, and budgets of individuals [3]. Diagnosing and treating heart diseases are very complicated procedures, especially in developing countries, due to the scarcity of diagnostic devices and the lack of medical cadres and resources [3]. An electrocardiogram (ECG) is used and is considered the gold standard in CVD detection and its risk analysis. However, this method is expensive, requires a high level of technical expertise, and is time-consuming. Therefore, researchers must find cheaper and more effective alternative methods.
As Figure 1 illustrates, a diagnosis is often based on a doctor's experience and assumptions. Therefore, it is possible to make wrong decisions that may be fatal in some cases. Hence, ML has become a popular method of efficiently predicting disease rather than relying exclusively on human knowledge [4]. With the increasing availability of electronic health data and the resolution of the complexities of CVD diagnosis, computational methods such as support vector machine (SVM), K-nearest neighbor (K-NN), decision tree (DT), AdaBoost (AB), and artificial neural network (ANN) are becoming more applicable, with exploratory value in disease prediction [5].
Medical datasets may contain a large volume of data. One problem with analyzing such data is the curse of dimensionality: the data have many dimensions (features) but comparatively few samples. Reducing feature dimensions improves the effectiveness of prediction models. Therefore, it is necessary to find an efficient method for selecting significant features, removing irrelevant features, and paving the way for effective CVD prediction. Research is still ongoing to find the correct CVD prediction model. The performance of models drops significantly without a proper selection of significant features. Therefore, there is an urgent need to find an efficient methodology for selecting them for the diagnosis of the disease [2,4].
This research aims to improve the accuracy of CVD prediction by developing a predictive model that combines a multilayer perceptron neural network (MLPNN) with an arithmetic optimization algorithm (AOA), referred to as MLPNN-AOA. The AOA is an optimization algorithm used here to select the most relevant features [6]. The Cleveland dataset was utilized to evaluate the method employed. This dataset contains 304 instances and 14 features with a CVD diagnosis [7].
Many previous studies have examined CVD, including [8-12], but the present research is the first to utilize AOA to select the most relevant features for the diagnosis of CVD. The primary goal of the AOA is to obtain optimal fitness solutions and the best convergence performance [6]. The performance of MLPNN-AOA was measured in terms of accuracy, the area under the receiver operating characteristic (AROC) curve, the mean square error (MSE), and the precision, recall, specificity, sensitivity, and F1-score. Many deep learning (DL) methods are widely applied to diagnose and predict CVD [13]. One of these methods is MLPNN, which is a type of artificial neural network (ANN) that generates a set of outputs from a set of inputs. MLPNN is an example of a feed-forward neural network (FFNN). It consists of several layers of nodes connected as a directed graph between the input and output layers [14]. Backpropagation is used to train the MLPNN in MLPNN-AOA.
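To make the feed-forward structure concrete, the following minimal sketch propagates one input vector through a small network, layer by layer. It is an illustration only: the sigmoid activation, the 3-2-1 layout, the hand-picked weights, and the names `sigmoid`, `mlp_forward`, and `toy_layers` are assumptions, not details taken from the MLPNN-AOA model, and training by backpropagation is not shown.

```python
import math


def sigmoid(z):
    """Logistic activation, squashing any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def mlp_forward(x, layers):
    """Propagate one input vector through a feed-forward network.

    `layers` is a list of (weights, biases) pairs; weights[j] holds the
    incoming weights of neuron j in that layer.
    """
    activation = list(x)
    for weights, biases in layers:
        activation = [
            sigmoid(sum(w * a for w, a in zip(neuron_w, activation)) + b)
            for neuron_w, b in zip(weights, biases)
        ]
    return activation


# Hypothetical 3-2-1 network (weights chosen by hand, not trained).
toy_layers = [
    ([[0.5, -0.2, 0.1], [-0.3, 0.8, 0.0]], [0.0, 0.1]),  # hidden layer
    ([[1.0, -1.0]], [0.0]),                              # output layer
]
probability = mlp_forward([1.0, 0.0, 1.0], toy_layers)[0]
predicted_class = 1 if probability >= 0.5 else 0
```

In a trained network the weights and biases would be fitted by backpropagation rather than set by hand; the forward pass itself is unchanged.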
A feature is a measurable property of the observed process. Feature selection (FS) uses specific metrics to find a subset of the feature set that reduces computational requirements and improves prediction [15].
Several effective decision support models based on ML tools have been proposed in previous studies to detect CVD. However, most of these models focus on feature preprocessing only. Unfortunately, datasets may contain redundant and irrelevant features that degrade prediction accuracy, precision, and processing speed, and that cause problems in the predictive model such as underfitting and overfitting. These problems can be addressed using MLPNN-AOA (e.g., via data augmentation, a simplified neural network, or early stopping) to eliminate irrelevant features, find the best features, and increase prediction accuracy on both the training and testing datasets.
Hence, the primary objective of this research is to implement a predictive CVD-diagnosing model based on the AOA as an optimization algorithm and MLPNN as a prediction model.
The specific objectives of this research are:
1. To select the best features that affect the accuracy of MLPNN using an AOA.
2. To compare MLPNN-AOA with other similar models on the same dataset to verify its performance.
3. To compare AOA with some other optimization algorithms in selecting the most relevant features in the Cleveland dataset.
4. To eliminate the problems of overfitting and underfitting in the CVD prediction model by developing a hybrid MLPNN-AOA algorithm.
The practical importance of the research is due to the fact that medical diagnostics must be specialized, reliable, and supported by computer technologies to reduce the costs of diagnostic tests. Therefore, most researchers want to develop new algorithms to predict heart disease. The main contribution of this research is to develop an MLPNN-AOA prediction model, which has two characteristics. First, the combination of MLPNN and AOA extends the ability to learn and generalize across different specifications of CVD datasets. Second, the implementation of AOA focuses on selecting the most relevant features and then performing the prediction process. This may prevent overfitting or biased classification, ignore irrelevant and redundant features to increase prediction accuracy, and reduce classification time.
In addition, our research paper contributes in the following ways:
• Presents practical and academic knowledge to researchers.
• Helps health professionals, especially doctors, in CVD diagnosis.
• Supports anyone interested in optimization algorithms and ML techniques, especially in the utilization of AOA and MLPNN in many applications.
In this research, MLPNN-AOA is developed so that the AOA selects the most relevant features, reducing the dimensionality of the dataset, which affects the accuracy of the CVD prediction model represented by MLPNN. This research used a free, open-source dataset of 76 attributes from the UCI Machine Learning repository [7,16]. All published studies use a subset of 14 of these attributes. The most relevant feature is the 'target', which indicates whether the patient has CVD and takes integer values from 0 (no disease likely) to 4 (high probability of disease). Patient names and National Health Service user numbers (SNS) are removed from the dataset and replaced by dummy values. Different performance measures will be used to evaluate the effectiveness of MLPNN-AOA: accuracy, AROC, MSE, precision, recall, sensitivity, specificity, and F1-score.
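As a reminder of how most of these measures relate to one another, the sketch below derives them from the four confusion-matrix counts of a binary classifier. The function name `binary_metrics` and the toy labels are assumptions made for illustration. AROC is left out because it requires predicted scores rather than hard labels, and MSE is computed here on the 0/1 labels, which may differ from how it is computed for the model's continuous outputs.

```python
def binary_metrics(y_true, y_pred):
    """Compute common binary-classification metrics from 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    total = len(pairs)

    def ratio(num, den):
        """Guard against empty denominators."""
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)  # recall is the same quantity as sensitivity
    return {
        "accuracy": ratio(tp + tn, total),
        "precision": precision,
        "recall": recall,
        "sensitivity": recall,
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * precision * recall, precision + recall),
        "mse": ratio(sum((t - p) ** 2 for t, p in pairs), total),
    }


# Toy labels: 1 = CVD present, 0 = CVD absent.
metrics = binary_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0])
```

Note that recall and sensitivity are two names for the same ratio, so reporting both adds no independent information.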
The rest of this paper is organized as follows. Section 2 presents background on ML-based FS and on FS using optimization algorithms. In addition, it reviews the previous literature on CVD prediction models and analyzes related studies to identify research gaps. Section 3 describes the solution adopted to select the most relevant features and the approach used for CVD prediction. Section 4 describes the experiments performed in this research and analyzes the results obtained. It compares MLPNN-AOA with MLPNN, DT, SVM, and the random forest classifier (RFC). In addition, it compares the results of the AOA with those of prior optimization algorithms in selecting the most relevant features in the Cleveland dataset. Finally, Section 5 gives the conclusions and suggestions for future work.

Literature Review
Recently, many ML techniques have been suggested for diagnosing CVD and for building efficient predictive models for it. This section describes the most established methods employed in this domain. In addition, it illustrates the importance of utilizing optimization algorithms to select only the relevant features in increasing the efficiency and effectiveness of prediction models.
Different ML algorithms are used to predict and classify data, relying on training data. A classification task assigns the items in a dataset to a predefined set of class labels [2,17]. ML has three types: supervised learning, unsupervised learning, and reinforcement learning [2,18].
Supervised and unsupervised learning are used to overcome numerous issues in pattern recognition. In supervised learning, numerous classifiers are utilized to classify data, such as self-organizing maps, K-NN, and DT [19]. The training data, which consist of pairs of an input vector and a class label, are used to construct a function. Training evaluates the approximate distance between the input and the output in order to build a classifier. Once the classifier is created, it can be used to classify new instances based on the known class labels [17]. In this research, MLPNN and FS were utilized for CVD prediction. MLPNN is widely used among classification algorithms, and it has outstanding classification accuracy [17]. FS is utilized to reduce the number of features in the dataset by choosing the most relevant ones [20].
Feature selection (FS) is a difficult task that needs an optimization algorithm to select the best subset of features, i.e., the subset with the greatest impact on classification accuracy. It analyzes all attributes of the full dataset and selects those suitable for the problem. Its main purpose is to improve classification accuracy and decrease computational time [21]. See Figure 2. Eliminating some attributes does not mean that they carry no important information, but they may not have significant statistical relationships with the others. FS methods are required for evaluation and analysis. As demonstrated by [22] and applied by [1,2,23], FS starts by generating subsets from the whole dataset. Then, an evaluation function chooses the features associated with the problem by employing either a wrapper or a filter technique. Finally, a validation stage checks the model's efficiency and consistency. Further descriptions are provided by [23], which covers the types and capabilities of optimization algorithms used for the FS task.
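The generate, evaluate, and select loop described above can be sketched in a few lines. This is a simplified illustration, not the method of any cited study: exhaustive enumeration stands in for the optimization algorithm, and `best_feature_subset` and `toy_evaluate` are hypothetical names. In a wrapper approach the evaluation function would train and score a classifier on each candidate subset; here a toy score rewards two "informative" feature indices and penalizes subset size.

```python
from itertools import combinations


def best_feature_subset(n_features, evaluate):
    """Return the highest-scoring non-empty subset of feature indices."""
    best_subset, best_score = None, float("-inf")
    for size in range(1, n_features + 1):
        # Step 1: subset generation (every combination of this size).
        for subset in combinations(range(n_features), size):
            # Step 2: subset evaluation by the supplied scoring function.
            score = evaluate(subset)
            if score > best_score:
                best_subset, best_score = subset, score
    return best_subset, best_score


def toy_evaluate(subset):
    """Hypothetical score: features 0 and 2 help, every feature costs 0.1."""
    informative = {0, 2}
    return len(informative & set(subset)) - 0.1 * len(subset)


subset, score = best_feature_subset(4, toy_evaluate)
```

The third stage, validation, would re-check the returned subset on data not used during the search. Exhaustive enumeration grows as 2^n in the number of features, which is why metaheuristic optimizers are used for the search in practice.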
Currently, a large body of research covers a wide range of techniques that can be used as an integral part of predicting CVD using ML methods. Accurate and timely CVD diagnosis is essential for the prevention and treatment of heart failure. Diagnosis of CVD from conventional medical history is unreliable in many respects. To distinguish healthy people from people with heart disease, noninvasive methods such as machine learning are reliable and effective [5].
A CVD prediction model was proposed by [24] using ML algorithms based on National Health Insurance Service Health Screening datasets (a cardiovascular disease group). An efficient two-layer convolutional neural network (CNN) was proposed by [25] to classify highly unbalanced clinical data for predicting the incidence of coronary heart disease (CHD). A study was presented by [26] that used many classification methods, such as SVM, naïve Bayes, DT, RFC, and logistic regression (LR), with the Waikato Environment for Knowledge Analysis (Weka) tool for predicting cardiovascular disease. The study of [27] used eight classification algorithms (DT, J48, logistic model tree, RFC, naïve Bayes, KNN, and SVM) to predict heart disease and performed a predictive analysis using data mining techniques to identify the most efficient of those algorithms.
In general, the techniques in the studies mentioned above are limited mainly by slow computation due to large dataset sizes. Hence, several state-of-the-art techniques have utilized optimization algorithms as a feature selection mechanism that selects a subset of the most relevant data to reduce the dimensionality in the prediction model. The main objectives of feature selection are to avoid overfitting or mismatch, enhance generalization, improve model performance, reduce model training time, simplify the model, provide faster and more cost-effective models, and improve prediction and classification accuracy [28]. Selecting a feature set requires a search strategy and direction for choosing the feature subset, an objective function for evaluating the chosen features, a termination condition, and an evaluation of the outcome [15].
A hybrid algorithm, genetic algorithm-linear discriminant analysis (GA-LDA), was proposed by [4] for CAD diagnostics. A GA was combined with an LDA to identify and select significant features in the coronary heart disease dataset. A similar model, the feature optimization by discrete weights (FODW) model, was proposed by [29]. A hybrid model consisting of bi-directional long short-term memory with conditional random field (BiLSTM-CRF) was proposed by [30] to predict heart disease. An improved SVM-based approach was also proposed by [31], in which a GA was used to select the most relevant features. A hybrid model was proposed by [32], consisting of a random search algorithm (RSA) for FS and an RFC for prediction. The proposed model was improved using a network search algorithm. Similarly, a hybrid model was proposed by [33], consisting of an artificial neural network to eliminate redundant features and a deep neural network for prediction. The proposed model achieved a prediction accuracy of 93.33%, but its time complexity was not reported. A hybrid ML-based cardiac diagnostic system was developed by [5] using a set of ML algorithms to select important features. Three algorithms were used to validate the proposed model: relief, mRMR, and LASSO. The K-fold validation method was used. A feature selection approach was proposed by [34] based on a multipurpose artificial bee colony algorithm combined with the nondominant screening procedure and genetic operators.
ML is used as an effective support system in health diagnosis, which involves large volumes of data. Parsing such a large volume of data typically consumes considerable resources and execution time. In addition, not all features in the dataset support the solution to the specific problem. Thus, there is a need for an efficient FS algorithm that finds the most significant features, i.e., those that contribute the most to disease prognosis. Based on previous research, it is concluded that employing optimization methods to choose the most relevant features will improve the CVD model's accuracy, reduce its computational complexity and execution time, and reduce overfitting and underfitting issues. As a result, this research proposes an MLPNN-AOA algorithm, in which an AOA is employed to choose the most relevant features from the Cleveland dataset. Although many studies have utilized different feature selection mechanisms, these lack some capabilities because of the way the mechanisms were implemented or coded. The limitations are the following: a normal distribution assumption on features is needed; they are not suitable for rare categories (imbalanced datasets); and the computation needed to cross-validate a number of potential subsets is costly. In addition, the hybrid approaches do not scale sufficiently with complexity, and most of them do not measure the AROC, the MSE, or the confusion matrix, which makes the performance evaluation misleading. Furthermore, the optimization algorithms used in the literature (e.g., particle swarm optimization) are mainly swarm-optimization-based algorithms, which easily converge prematurely to a local optimum, and their iterative process generally results in a low convergence rate.
Hence, we utilize the AOA proposed by [6]. Its advantage is that its implementation is easy and direct; because of its mathematical formulation, it can adapt to and address new optimization problems, and its execution follows a purely mathematical view. AOA is mathematically designed and has been applied in many areas of research to perform optimization [6].

Materials and Methods
Many studies have reported remarkable results when using neural networks (NNs) in various applications [35]. NNs have two types of learning algorithm: supervised and unsupervised [36]. The present research utilized supervised learning because it can infer a general function from the training data and generalize reasonably to the test data. Optimization algorithms are used to select the most relevant features, which positively affect the prediction model's accuracy and execution time and mitigate problems such as overfitting and underfitting [20].
In this research, the MLPNN-AOA model is implemented. The AOA is used to select the most significant features in the Cleveland dataset, while MLPNN is used for prediction. As noted in the Introduction, it is necessary to find an adequate CVD prediction model. Therefore, AOA was utilized to select the relevant features and find the best ones in the Cleveland dataset. Then, the diagnosis is made by MLPNN. In general, the proposed methodology comprises five steps:
1. Extracting medical data that are obtained from the web in a tabular form containing different data types.
2. Data preprocessing using normalization and filter techniques (e.g., chi-square and gain ratio), including handling missing data, then splitting the dataset into training and testing datasets.
3. The AOA optimizer is used for the feature selection task to determine the best subset of features from the training dataset.
4. The MLPNN classifier is then employed on the training dataset to train the prediction model for the classification task based on the best subset of features.
5. Finally, the MLPNN classifier is employed on the testing dataset for classifying the unlabeled data into two classes for the prediction model.
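The split in step 2 determines which records feature selection (step 3) and training (step 4) are allowed to see, so it is worth making it explicit and reproducible. The following sketch shows one way to do this with the standard library only; the 80/20 ratio, the fixed seed, and the name `split_train_test` are assumptions chosen for illustration, since the steps above do not state a split ratio.

```python
import random


def split_train_test(rows, test_fraction=0.2, seed=42):
    """Shuffle the rows reproducibly and split them into train and test."""
    shuffled = list(rows)                  # copy, so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical record identifiers standing in for dataset rows.
records = list(range(10))
train_rows, test_rows = split_train_test(records, test_fraction=0.2)
```

Running feature selection and training on `train_rows` only, and touching `test_rows` only in the final step, keeps the reported test performance free of information leaked from the held-out data.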
As shown in Figure 3, three main phases have been undertaken to implement the MLPNN-AOA model.

Phase 1: Data Preprocessing
A freely available dataset (namely, the Cleveland dataset [7,37]) is utilized in this research. It is an open-source dataset obtained from the UCI repository, holding 14 numeric features. The most important of these features is a class feature labeled as the 'goal', which refers to whether the patient has heart disease or not. We chose this dataset solely because it has been widely used in the literature and has been studied comprehensively. Other datasets in the same repository for the same heart disease prediction task (e.g., the datasets from Hungary, Switzerland, Long Beach VA, and Statlog) are not used in our work because they have missing values.
The dataset must be prepared to obtain good prediction accuracy by removing redundant and duplicate records. Furthermore, most ML algorithms only deal with numeric feature values. Issues with noise, missing values, and inconsistency are expected, particularly in the medical field. When operating with data of low quality, low-quality results are obtained. Usually, feature records have missing values [38]. It is therefore better to process non-numeric values so that they can contribute to the results. Hence, the initial step in any ML approach is dataset preparation, to attain an appropriate format that is most valuable for the modeling stage [39]. The following is a review of the Cleveland dataset preparation steps that have been performed:
Step 1: Normalization is a data scaling method, which is the procedure for reducing attribute values to a limited range [40]. Usually, it is performed before the FS and modeling stages because attributes on different scales confuse attribute comparison and impair the learning capability of the algorithms.
Step 2: Since the prediction model deals with only two classes, to improve the accuracy of the CVD prediction model, the classes (0,1,2,3,4) in the class label are transformed into only two: zero (if the original value is zero, then there is no CVD) and one (if the value is greater than or equal to one, then there is a CVD).
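The two preparation steps above can be illustrated with a short sketch. Min-max scaling to [0, 1] is assumed here as the normalization method, since the text does not name a specific one; the function names and the sample values are likewise hypothetical.

```python
def min_max_normalize(column):
    """Rescale a numeric column to the range [0, 1] (assumed scaling method)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(value - lo) / (hi - lo) for value in column]


def binarize_target(label):
    """Collapse the five original classes: 0 stays 0, and 1 to 4 become 1."""
    return 0 if label == 0 else 1


# Hypothetical attribute values and class labels.
ages = [29, 53, 77]
scaled_ages = min_max_normalize(ages)
targets = [binarize_target(label) for label in [0, 1, 2, 3, 4]]
```

If the scaling bounds are computed on the training data only and then reused for the test data, the test set does not influence the preprocessing.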

Phase 2: Data Reduction
FS is a data-reduction technique that builds a prediction model from a selected subset of relevant features, leaving the selected features themselves unchanged. FS needs a search strategy and direction to select the feature subset, an objective function to evaluate the chosen features, a termination condition, and an outcome evaluation [41].
The essence of optimization algorithms lies in finding new solutions with a set of rules that differs from one algorithm to another. These solutions are evaluated repeatedly to find the best one, since a single pass is not sufficient. The probability of reaching an optimal solution increases with the number of random candidate solutions and with the number of iterations that yield substantial improvements [42].
Optimization processes are divided into two main phases: exploration and exploitation. The exploration phase aims to explore a wide range of the search space, using search agents to avoid local optima. The exploitation phase aims to refine promising solutions locally to improve their quality. An efficient optimization algorithm requires an appropriate balance between these two phases.
In this research, the AOA was utilized to identify the most relevant features. It selected a subset of twelve features whose performance was greater than or equal to the performance obtained with all thirteen.
AOA is a population-based meta-heuristic optimization algorithm, a class of algorithms that can solve optimization problems without computing derivatives. The exploration and exploitation stages are represented in this algorithm by simple arithmetic operations: addition (A, "+"), subtraction (S, "−"), multiplication (M, "×"), and division (D, "/"). More details on the mechanism of the exploration and exploitation phases in the AOA can be found in [6].
Arithmetic is a fundamental part of number theory. It is one of the most significant parts of modern mathematics, along with algebra, geometry, and analysis. The traditional arithmetic measures used to study numbers are the simple arithmetic operators (M, D, S, and A) [43].
The main inspiration for AOA stems from the use of the simple arithmetic operators above in solving arithmetic problems.To choose the best solution, AOA uses these factors as mathematical optimization.The selected solutions are subject to specific criteria to be selected from a solution set.
AOA starts from several candidate solutions that are generated randomly. The best solution obtained in each iteration is considered the best solution found so far. Before the positions are updated, the search stage must be determined as exploration or exploitation. The math optimizer accelerated (MOA) function is used for this purpose in both stages; it is computed from the current iteration, which ranges from 1 to the maximum number of iterations, with the latter serving as the termination condition. The AOA explores the solution space using the (D) and (M) operators, aiming to discover a near-optimal solution that can be reached after many iterations. This also supports the second stage (exploitation) in improving the search process via enhanced communication between the two search strategies.
The exploration stage searches several regions of the search space for a better solution using two arithmetic operations, (D) and (M). The stage is entered when a random variable r1 fulfills the condition r1 > MOA. As shown in Equation (1), (D) (the first rule in the equation) is applied when r2 < 0.5, where r2 is a random variable; otherwise, (M) is executed.
The parameter C_Iter refers to the current iteration; the position of a solution at C_Iter is updated relative to the best solution found so far. The position at C_Iter + 1 is the ith solution in the following iteration, constrained by the upper and lower bounds. The math optimizer probability (MOP) is then applied as a coefficient, where MOP(C_Iter) is the value of the function at the current iteration C_Iter, and M_Iter is the maximum number of iterations.
The exploitation stage is conducted using the (S) or (A) operations, which search the space densely around the best solution, aiming at finding a near-optimal solution after a predetermined number of iterations. The stage is entered when r1 is less than the value of MOA(C_Iter). (S) (the first rule in the equation) is applied when r3 < 0.5; otherwise, (A) is executed. Producing a random number at each iteration, especially in the last iterations, sustains exploration and helps to avoid stagnation in local optima. The near-optimal solution that is finally obtained lies within a range determined by the positions produced by (D, M, S, and A) in the search space.
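For reference, the rules described in the preceding paragraphs can be written out as below. This is a sketch that follows the original AOA formulation cited as [6]. The symbols Min, Max, α, μ, ε, UB_j, and LB_j do not appear in this section and are taken from that formulation as assumptions: Min and Max are the limits of MOA, α and μ are control parameters, ε is a small constant that avoids division by zero, and UB_j and LB_j are the bounds of the jth dimension.

```latex
\begin{align*}
\mathrm{MOA}(C\_Iter) &= Min + C\_Iter \times \frac{Max - Min}{M\_Iter},\\[4pt]
\mathrm{MOP}(C\_Iter) &= 1 - \frac{C\_Iter^{\,1/\alpha}}{M\_Iter^{\,1/\alpha}},\\[4pt]
x_{i,j}(C\_Iter + 1) &=
\begin{cases}
best(x_j) \div (\mathrm{MOP} + \epsilon) \times \big((UB_j - LB_j)\times\mu + LB_j\big), & r_1 > \mathrm{MOA},\ r_2 < 0.5 \quad \text{(D)}\\
best(x_j) \times \mathrm{MOP} \times \big((UB_j - LB_j)\times\mu + LB_j\big), & r_1 > \mathrm{MOA},\ r_2 \ge 0.5 \quad \text{(M)}\\
best(x_j) - \mathrm{MOP} \times \big((UB_j - LB_j)\times\mu + LB_j\big), & r_1 \le \mathrm{MOA},\ r_3 < 0.5 \quad \text{(S)}\\
best(x_j) + \mathrm{MOP} \times \big((UB_j - LB_j)\times\mu + LB_j\big), & r_1 \le \mathrm{MOA},\ r_3 \ge 0.5 \quad \text{(A)}
\end{cases}
\end{align*}
```

The first two cases correspond to the exploration stage and the last two to the exploitation stage.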
This can be summed up as follows: the AOA begins with random solutions. The operators (D, M, S, and A) estimate where the solutions are in relation to an optimal solution. Then, each solution updates its position to approach the best solution. The value of MOA changes from 0.2 to 0.9 over the iterations. Whenever MOA < r1, the candidate solutions move away from the near-optimal solution (exploration). If MOA > r1, they approach the near-optimal solution (exploitation). Eventually, the AOA stops when the termination criterion is reached, as shown in the pseudo-code of the AOA in Algorithm 1 [6].
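The summary above can be sketched as a short loop. The code below is a minimal illustration of AOA on a generic continuous minimization problem, not the authors' MATLAB implementation. It assumes the same bounds for every dimension, clamps new positions to the bounds, and uses α = 5 and μ = 0.5 as default control parameters; the function name `aoa_minimize` and all of its arguments are hypothetical.

```python
import random


def aoa_minimize(objective, dim, lower, upper, n_solutions=10, max_iter=50,
                 moa_min=0.2, moa_max=0.9, alpha=5.0, mu=0.5, seed=0):
    """Minimal AOA sketch: returns (best position, best score, initial best score)."""
    rng = random.Random(seed)
    eps = 1e-12  # assumed small constant that avoids division by zero
    population = [[rng.uniform(lower, upper) for _ in range(dim)]
                  for _ in range(n_solutions)]
    best = min(population, key=objective)[:]
    best_score = objective(best)
    initial_score = best_score

    for c_iter in range(1, max_iter + 1):
        # MOA grows from moa_min to moa_max; MOP shrinks towards zero.
        moa = moa_min + c_iter * (moa_max - moa_min) / max_iter
        mop = 1.0 - (c_iter ** (1.0 / alpha)) / (max_iter ** (1.0 / alpha))
        for i in range(n_solutions):
            candidate = []
            for j in range(dim):
                r1, r2, r3 = rng.random(), rng.random(), rng.random()
                step = (upper - lower) * mu + lower
                if r1 > moa:  # exploration: division or multiplication
                    if r2 < 0.5:
                        value = best[j] / (mop + eps) * step
                    else:
                        value = best[j] * mop * step
                else:  # exploitation: subtraction or addition
                    if r3 < 0.5:
                        value = best[j] - mop * step
                    else:
                        value = best[j] + mop * step
                # Assumed boundary handling: clamp to [lower, upper].
                candidate.append(min(max(value, lower), upper))
            population[i] = candidate
            score = objective(candidate)
            if score < best_score:
                best, best_score = candidate[:], score
    return best, best_score, initial_score


def sphere(x):
    """Toy objective used only for illustration."""
    return sum(v * v for v in x)


best, best_score, initial_score = aoa_minimize(sphere, dim=3, lower=-10.0, upper=10.0)
```

For FS, the objective would instead score a feature mask derived from each position, for example by the classifier's error on held-out data.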

Phase 3: Classification Task
The MLPNN model is a type of ANN. Because information flows from one layer to the next, it is called a feed-forward model. MLPNN refers to networks consisting of multiple layers of perceptrons with threshold activation. In its simplest form, an MLPNN consists of three layers of nodes: the first is the input layer, the intermediate layer is the hidden layer, and the last is the output layer, where the resulting output is obtained. Each layer consists of a specified number of nodes. Each node, except for the input nodes, is a neuron that uses a nonlinear activation function. Each node in a layer is connected to every node of the next and previous layers. The connections are called links or synapses. An MLPNN is characterized by its number of hidden layers, i.e., the number of all layers except the input and output layers. MLPNN training is supervised and relies on backpropagation, one of the most widely used ML algorithms for training ANNs [44].
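The layered structure described above can be illustrated with a small feed-forward pass. The sketch below uses the sizes reported in this paper (twelve selected input features, four hidden layers of four neurons, two output units) and randomly initialized weights and biases. The sigmoid activation and the helper names are assumptions made for illustration, since the activation function is not specified here.

```python
import math
import random


def sigmoid(z):
    """Assumed nonlinear activation; the paper does not name the one it used."""
    return 1.0 / (1.0 + math.exp(-z))


def init_layer(n_in, n_out, rng):
    """Random weights and biases for one fully connected layer."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    biases = [rng.uniform(-1.0, 1.0) for _ in range(n_out)]
    return weights, biases


def forward(layers, inputs):
    """Propagate the inputs through every layer in order (feed-forward)."""
    activations = inputs
    for weights, biases in layers:
        activations = [
            sigmoid(sum(w * a for w, a in zip(row, activations)) + b)
            for row, b in zip(weights, biases)
        ]
    return activations


rng = random.Random(0)
sizes = [12, 4, 4, 4, 4, 2]  # inputs, four hidden layers of four neurons, outputs
layers = [init_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]
outputs = forward(layers, [0.5] * 12)
```

Each entry of `outputs` is the activation of one output unit; training would adjust the weights and biases so that these activations match the class labels.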
Backpropagation algorithms are used for calculating gradients, which is essential in this model and in neural networks in general. The term is often used loosely to refer to the entire learning algorithm, including how the gradients are used, for example in stochastic gradient descent. Backpropagation generalizes the gradient computation of the delta rule, which is the single-layer version of backpropagation; it is, in turn, generalized by automatic differentiation, where backpropagation is a special case of reverse accumulation (or "reverse mode") [45].
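As a small worked example of the gradient computation, the sketch below applies the chain rule to a single sigmoid neuron with a squared-error loss, which is the single-layer (delta rule) case mentioned above, and then takes one gradient-descent step. The weights, input, and target are made-up values; the learning rate of 0.6 matches the value reported for the best MLPNN configuration in the Results, and its use here is purely illustrative.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss(weights, bias, x, target):
    """Squared error of a single sigmoid neuron."""
    prediction = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
    return 0.5 * (prediction - target) ** 2


def gradients(weights, bias, x, target):
    """Chain rule: dL/dw_i = (p - t) * p * (1 - p) * x_i, dL/db = (p - t) * p * (1 - p)."""
    p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
    delta = (p - target) * p * (1.0 - p)
    return [delta * xi for xi in x], delta


# Made-up starting point for illustration.
weights, bias, x, target = [0.2, -0.4], 0.1, [1.0, 2.0], 1.0
grad_w, grad_b = gradients(weights, bias, x, target)

# One gradient-descent step.
learning_rate = 0.6
new_weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
new_bias = bias - learning_rate * grad_b
```

In a multilayer network, the same `delta` term is propagated backwards through the hidden layers, which is what gives backpropagation its name.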
In MLPNN-AOA, this step relies on training, experimentation, and the comparison of algorithm parameters to improve the MLPNN's accuracy in predicting the probability of CVD. The MLPNN's configuration parameters are as follows:
1. The number of hidden layers: four hidden layers with four neurons in each layer, and two output units.
2. The biases and weights were first initialized randomly.
3. The maximum number of epochs is 500.
4. The activation function was set via a "set" method.

Results
MLPNN-AOA uses AOA to select the most relevant features; then, MLPNN is used to predict the probability of CVD. The dataset used here is the Cleveland dataset, which was chosen because it is used in many state-of-the-art approaches for predicting CVD. The dataset is split into two sets: 70% for training and 30% for testing. The training set is used to build the classifier and the testing set is used to evaluate it. The validation set is the same as the testing set. Many preliminary experiments were performed to find the configurations that give the best results.
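The 70%/30% split described above can be sketched as follows. This is a minimal illustration, assuming a random shuffle with a fixed seed; the paper does not state how the split was drawn, and the record list is a placeholder rather than the Cleveland data.

```python
import random


def train_test_split(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of the rows and cut it into training and testing parts."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


records = list(range(300))  # placeholder record identifiers, not real data
train, test = train_test_split(records)
```

Fixing the seed keeps the split reproducible, so that different classifiers are compared on the same training and testing instances.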
This section aims to explain the working environment that has been used for implementation, the criteria used to evaluate it, and the method of implementing it. A review of the obtained results is provided. Finally, the obtained results are compared with the results achieved using the MLPNN, DT, SVM, KNN, naïve Bayes, and RFC without FS.

Experimental Setup
All experiments are evaluated under the same termination conditions. Nine main evaluation metrics are used to assess the proposed model.
Accuracy: This represents how accurate the ML algorithm is in classification or prediction. Accuracy is defined as the ratio of correctly predicted instances to all instances, i.e., the number of instances that the algorithm correctly classified as positive or negative divided by the total number of classified instances. Equation (2) shows how to calculate it [46].
Accuracy = (TP + TN) / (TP + TN + FP + FN) (2)
The area under the receiver operating characteristic (AROC) is a widely used statistic for evaluating the discriminative power of prediction models, such as species distribution models. The area under the ROC curve summarizes how accurately a quantitative diagnostic test separates the two classes [47].
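One way to compute the area under the ROC curve is through its rank interpretation: it equals the fraction of (positive, negative) pairs in which the positive instance receives the higher score, with ties counted as one half. The sketch below implements that definition directly; it is an illustration with made-up labels and scores, not the procedure used in the paper.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Made-up example: 1 = CVD present, 0 = CVD absent.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
auc = roc_auc(labels, scores)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two classes.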
Execution Time: This is the time taken by the prediction models to predict the probability of developing CVD. Equation (3) shows how to calculate it. See Figure 4.

ExecutionTime = (Finishing
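Execution time can be measured around the prediction call, as in the sketch below. It assumes that execution time is the difference between the finishing and starting timestamps of the call; the helper name `timed` and the example workload are hypothetical.

```python
import time


def timed(function, *args):
    """Return the function's result and its wall-clock duration in seconds."""
    start = time.perf_counter()    # starting timestamp
    result = function(*args)
    finish = time.perf_counter()   # finishing timestamp
    return result, finish - start  # assumed definition: finish minus start


# Placeholder workload standing in for a model's prediction call.
result, seconds = timed(sum, range(100_000))
```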
Geometric mean (Geomean): This is the nth root of the product of a series of n values. The Geomean is most useful when the numbers tend to fluctuate widely or when the numbers in the series are dependent. Equation (4) shows how to calculate it [48].
Geomean = (R_1 × R_2 × ... × R_n)^(1/n) (4)
where R_1, ..., R_n are the observations and n is the total number of observations.
F1-Score: This combines precision and recall in one metric by taking their harmonic mean. It is often used to compare the results of two different classifiers. Equation (5) shows how to calculate it [49].
F1-Score = 2 × (Precision × Recall) / (Precision + Recall) (5)
False Positive (FP): This represents the number of negative instances that are classified as positive. In other words, it answers the following question: which instances are incorrectly predicted as being positive?
False Negative (FN): This represents the number of positive instances that are classified as negative. In other words, it answers the following question: which instances are incorrectly predicted as being negative?
Mean Square Error (MSE): This is the mean squared difference between the predicted values and the actual values. MSE is one of the simplest and most common loss functions in ML. As illustrated in Equation (6), the MSE takes the differences between the model's predictions and the ground truth, squares them, and averages them across the entire dataset [50,51].
MSE = (1/n) × Σ (y_i − ŷ_i)^2, for i = 1, ..., n (6)
where y_i is the ground-truth value of the ith instance, ŷ_i is the corresponding prediction, and n is the number of instances.
Precision: This represents the proportion of instances predicted as positive that are actually positive. Equation (7) shows how to calculate it [46].
Precision = TP / (TP + FP) (7)
Recall (sensitivity, also called true positive rate): This represents the probability that a sick patient is classified as sick, i.e., the capability of a test to recognize those with the disease. Equation (8) shows how to calculate it [46].
Recall = TP / (TP + FN) (8)
Specificity: This is the proportion of actual negative instances that the model correctly predicts as negative. Equation (9) shows how it is calculated [46].
Specificity = TN / (TN + FP) (9)
True Positive (TP): This represents the number of positive instances that are correctly classified as positive. True Negative (TN): This represents the number of negative instances that are correctly classified as negative.
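The metrics defined above can be computed from the four confusion-matrix counts, as in the sketch below. It assumes binary labels (1 = CVD present, 0 = CVD absent); the function names and the example labels are made up for illustration. With 0/1 labels and 0/1 predictions, the MSE equals one minus the accuracy, which is consistent with the accuracy and MSE pairs reported in this paper (for example, 84.444% and 0.156).

```python
import math


def confusion_counts(actual, predicted):
    """Count TP, TN, FP, and FN for binary labels (1 = positive, 0 = negative)."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(actual, predicted):
    tp, tn, fp, fn = confusion_counts(actual, predicted)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(actual),
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
        "mse": sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual),
    }


def geometric_mean(values):
    """nth root of the product of n values."""
    return math.prod(values) ** (1.0 / len(values))


# Made-up labels: 3 TP, 1 FN, 4 TN, 2 FP.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
metrics = classification_metrics(actual, predicted)
```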
MLPNN-AOA was implemented on a Lenovo workstation with an Intel Core i5-4460 CPU at 3.20 GHz, 4 GB of DDR3 RAM, and a 64-bit Windows 8 operating system on an x64-based processor. The program was written in MATLAB R2020a, a computational package built on a proprietary language that provides tools for users with a wide range of programming knowledge and is used in many different applications.

Testing and Analysis
In this research, two methods for diagnosing and predicting CVD are studied, analyzed, and compared. The first method deals with CVD prediction using MLPNN, DT, SVM, RFC, KNN, and naïve Bayes without FS. The second one uses AOA to select the most relevant features in the Cleveland dataset and then predicts using MLPNN. Below is a review of the results of the two techniques.

CVD Prediction without Using FS
Several experiments were conducted on MLPNN with randomly chosen configuration parameters, such as the number of neurons in each hidden layer, the learning rate, the number of epochs, and the momentum alpha. Table 1 shows the performance metrics that are used to determine the best MLPNN configuration parameters. It can be concluded that the best configuration parameters are (4, 4, 0.6, 500, and 0.05) for the number of hidden layers, the number of neurons in each layer, the learning rate, the number of epochs, and the momentum alpha, respectively; these achieve (84.444%, 0.156, and 0.711) in terms of accuracy, MSE, and AROC, respectively.

The experimental results of MLPNN, SVM, DT, KNN, naïve Bayes, and RFC on the CVD prediction problem, in terms of accuracy, MSE, recall, precision, F1-score, AROC, Geomean, and execution time, are shown in Table 2. In terms of accuracy, MSE, and AROC, respectively, SVM achieves (81.1111%, 0.18889, and 0.822), MLPNN achieves (84.44%, 0.156, and 0.711), KNN achieves (61.11%, 0.39, and 0.6944), DT achieves (56.67%, 0.43, and 0.231), naïve Bayes achieves (42.22%, 0.58, and 0.1), and RFC achieves (0, 1, and 0.29). On these three metrics, SVM obtains the highest AROC, while MLPNN obtains the highest accuracy and the lowest MSE.
Several experiments were performed on MLPNN-AOA to choose the best function, the number of solutions, the number of iterations, the lower and upper bounds, and the dimensions. Table 3 shows the best-obtained solutions. As shown in Table 3, the AOA functions that outperformed the others are (F8, F11, F13, F20, F21, F22, and F24). The best of them is F20, which achieves (88.890%, 0.110, and 0.840) for accuracy, MSE, and AROC, respectively, with two iterations, ten solutions, bounds of [0, 1], and thirteen dimensions.
To assess the robustness of MLPNN-AOA, it was also tested by splitting the dataset into 80% for training and 20% for testing, in addition to a 10-fold cross-validation. As shown in Table 4, the AOA functions that outperformed the others are (F11, F20, and F24). The best one is F20, which achieves (86.67%, 0.1333, and 0.85) for accuracy, MSE, and AROC, respectively, with twenty iterations, ten solutions, bounds of [−100, 100], and thirteen dimensions. Table 5 shows the experimental results for the 10-fold cross-validation. The best AOA function is again F20, which achieves (60.00%, 0.40, and 0.47) for accuracy, MSE, and AROC, respectively, with ten iterations, two solutions, bounds of [0, 1], and thirteen dimensions.
From Tables 1 and 2, F20 selects the highest number of features, equal to twelve, which means that all features in the Cleveland dataset positively affect CVD prediction except feature number 10, which represents ST depression induced by exercise relative to rest (see Figure 5). Therefore, it can be concluded that the first question of the research has been answered, and the first objective has been achieved.

Comparison of MLPNN-AOA with MLPNN
Based on the improvement percentage of the evaluation metrics, Table 6 compares MLPNN-AOA and MLPNN in terms of accuracy, average MSE, AROC, F1-score, and Geomean, with the number of epochs set to 500 in both cases. In terms of accuracy, MLPNN-AOA surpasses MLPNN: it reaches 88.890%, whereas MLPNN achieves 84.444%. Thus, it can be inferred that the accuracy increases when the most relevant features are chosen by AOA. Therefore, the second question of the research has been answered. In terms of the average MSE, MLPNN-AOA reaches 0.11, whereas MLPNN achieves 0.156. Hence, it can be inferred that the MSE is reduced when AOA chooses the most relevant features in the Cleveland dataset. In terms of AROC, MLPNN-AOA reaches 0.840, whereas MLPNN achieves 0.711. In terms of Geomean, MLPNN-AOA reaches 0.852, whereas MLPNN achieves 0.796. The same holds for the F1-score.
It can be concluded that AOA-based FS significantly improved the performance of MLPNN. Table 7 shows the statistical significance at the level of 0.0001, a confidence interval of 5.158, and the degrees of freedom for MLPNN-AOA and MLPNN. The results indicate that the improvement of MLPNN-AOA is statistically significant.
In this subsection, the experimental results for MLPNN-AOA are compared with other FS approaches, such as correlation-based feature selection (CFS), relief, filtered subset, PSO, info gain, chi-squared, consistency subset, filtered attribute, one-attribute-based approach, GA, and gain ratio. Table 8 shows the comparison with some state-of-the-art models in terms of the number of features selected by each FS approach and the prediction accuracy of MLPNN after the most relevant features were selected. It is concluded that MLPNN-AOA is superior to the other models in terms of prediction accuracy on the Cleveland dataset with twelve features; it is noted that all the other optimization algorithms selected feature 10, whereas AOA did not. So, the third objective of this research is achieved.
As shown in Table 8, the performance of MLPNN-AOA in terms of accuracy is compared with some previously proposed prediction models using FS before predicting CVD using MLPNN on the Cleveland dataset. It can be concluded that the accuracy of the MLPNN-AOA model outperformed that of all other models.
The research outcomes confirmed that MLPNN-AOA surpassed the SVM, MLPNN, DT, KNN, naïve Bayes, and RFC in terms of accuracy, MSE, and AROC. Further, it outperforms other models based on FS, such as PSO-MLP. AOA has shown its ability to decrease the number of features and select the best ones in the Cleveland dataset, where it ignores the irrelevant feature number ten. So, MLPNN-AOA facilitates learning the dataset by reducing the total number of features; as a result, it helps to mitigate the problems of overfitting and underfitting. Hence, the third question of the research has been answered, and the fourth objective has been achieved. Further, it can be concluded that the objectives of the research have been achieved.
Despite the noticeable improvement of MLPNN-AOA over MLPNN in terms of MSE, AROC, and F1-score, the improvement percentage in accuracy was slight. Also, MLPNN-AOA selected twelve features, excluding only feature number ten from the Cleveland dataset, whereas other state-of-the-art approaches such as the filtered subset select only six, which further reduces the computational complexity and the problems of overfitting and underfitting.

Conclusions
In conclusion, the development of an intelligent system for accurately diagnosing cardiovascular diseases (CVDs) represents a crucial advancement in the healthcare field. In this research, we proposed an automated ML model for various kinds of CVD prediction and classification. Our prediction model consists of multiple steps. Firstly, a benchmark dataset is preprocessed using filter techniques. Secondly, the novel AOA is implemented as a feature selection technique to select the best subset of features that influence the accuracy of the prediction model. Thirdly, the classification task is implemented using a multilayer perceptron neural network to classify the instances of the dataset into two class labels, determining whether a CVD is present or not. The AOA is used as an optimization algorithm. It is one of the robust FS algorithms used in various fields, where it selects the most relevant features that can improve accuracy and overall performance measurements. A limitation of the proposed model is that the exact number of features needed for the performance of the CVD prediction model to increase significantly remains unclear. The model's performance depends on how efficiently it predicts unseen instances in terms of accuracy, MSE, precision, recall, F1-score, AROC, Geomean, execution time, and the total number of selected features. The Cleveland dataset has been used in training and testing MLPNN-AOA.
The results of the two methods used in this research have been compared. The first is the prediction of CVD without FS and the second is MLPNN-AOA. The results demonstrate that MLPNN-AOA outperformed the other six classifiers on all performance measures. Moreover, the results show an improvement of MLPNN-AOA over MLPNN without FS; specifically, MLPNN-AOA improved MLPNN by 14.74% in accuracy, 48% in MSE, and 1.9% in AROC. The prediction accuracy of MLPNN-AOA was also compared with that of other prediction models proposed in previous studies that use FS methods. The results showed the superiority of MLPNN-AOA over the other models when selecting 12 features, excluding feature number 10, which was selected by most of the other models. In future work, hybridizing MLPNN with other optimization algorithms may be proposed to choose efficient and unexplored features that improve on this research and yield efficient prediction models for CVD problems. Other deep learning methods, such as CNNs, can also be used instead of MLPNN.

Figure 2. A general example of CVD prediction using supervised learning based on a feature selection mechanism.


1. Extracting medical data that are obtained from the web in a tabular form containing different data types.
2. Data preprocessing using filter techniques, e.g., chi-square and gain ratio, including handling missing data, then splitting the dataset into training and testing datasets.
3. The AOA optimizer is used for the feature selection task to determine the best subset of features from the training dataset.
4. The MLPNN classifier is then employed on the training dataset to train the prediction model for the classification task based on the best subset of features.
5. Finally, the MLPNN classifier is employed on the testing dataset for classifying the unlabeled data into two classes for the prediction model.
As shown in Figure 3, three main phases have been undertaken to implement the MLPNN-AOA model.

Table 1. Performance comparison of MLPNN configuration parameters.


Table 8. Comparison with the FS algorithms.