Cost-Sensitive Models to Predict Risk of Cardiovascular Events in Patients with Chronic Heart Failure

Abstract: Chronic heart failure (CHF) is a clinical syndrome characterised by symptoms and signs due to structural and/or functional abnormalities of the heart. CHF confers risk for cardiovascular deterioration events, which cause recurrent hospitalisations and high mortality rates. The early prediction of these events is very important to limit serious consequences, improve the quality of care, and reduce its burden. CHF is a progressive condition in which patients may remain asymptomatic before the onset of symptoms, as observed in heart failure with a preserved ejection fraction. The early detection of underlying causes is critical for treatment optimisation and prognosis improvement. To develop models to predict cardiovascular deterioration events in patients with chronic heart failure, a real dataset was constructed and a knowledge discovery task was implemented in this study. The dataset is imbalanced, as is common in real-world applications. This posed a challenge because learning algorithms tend to be overwhelmed by the abundance of majority-class instances during the learning process. To address the issue, a pipeline was developed specifically to handle imbalanced data. Different predictive models were developed and compared. To enhance sensitivity and other performance metrics, we employed multiple approaches, including data resampling, cost-sensitive methods, and a hybrid method that combines both techniques. These methods were utilised to assess the predictive capabilities of the models and their effectiveness in handling imbalanced data. By using these metrics, we aimed to identify the most effective strategies for achieving improved model performance in real scenarios with imbalanced datasets. The best model for predicting cardiovascular events achieved a mean sensitivity of 65%, a mean specificity of 55%, and a mean area under the curve of 0.71. The results show that cost-sensitive models combined with over/under sampling approaches are effective for the meaningful prediction of cardiovascular events in CHF patients.


Introduction
Worldwide, more than 64 million people suffer from heart failure (HF) [1]. The term chronic heart failure (CHF) refers to a clinical syndrome characterised by symptoms and signs due to structural and/or functional abnormalities of the heart, resulting in a reduction in cardiac output and/or an increase in intracardiac pressure at rest or under stress. CHF is defined as a condition characterised by asthenia and dyspnoea, as well as signs of pulmonary and/or systemic venous congestion. Although these symptoms develop over a medium-to-long period, patients may appear asymptomatic during that time, and the causes that lead to HF often exist long before symptoms appear. The early identification of the organic causes that drive HF is very important to improve the prognosis, and also to allow the early detection of decompensation in cases where the disease has already developed [2,3]. CHF can be classified according to different criteria. The updated European Society of Cardiology Guidelines recommend a classification based on the measurement of the left ventricle ejection fraction (LVEF). In addition, the New York Heart Association (NYHA) classification is used to define the severity of symptoms. The higher the NYHA class, the worse the clinical condition, and the poorer the overall outcome. However, patients with mild symptoms may still have an increased risk of mortality and/or hospitalisation [4,5].
According to the guidelines of the European Society of Cardiology, three phenotypes of HF are recognised: HF with reduced ejection fraction (HFrEF), with an LVEF ≤ 40%; HF with mildly reduced ejection fraction (HFmrEF), with an LVEF between 41 and 49%; and HF with preserved ejection fraction (HFpEF), with an LVEF ≥ 50%. This characterisation is not only echocardiographic but also aetiological and clinical. In fact, HFrEF is predominantly due to ischaemic heart disease, affects mostly young male patients, and responds well to pharmacological treatment with inhibitors of the renin-angiotensin-aldosterone system, beta-blockers, and gliflozins. In contrast, the HFmrEF and HFpEF forms are prevalent in the elderly and in women, and in these phenotypes the aetiology and symptomatology are strongly influenced by comorbidities, especially chronic kidney disease, anaemia, arterial hypertension, atrial fibrillation, and chronic obstructive pulmonary disease. In these two phenotypes, the correct treatment of comorbidities and treatment with diuretics and gliflozins is crucial [6]. In addition to the NYHA functional class, the INTERMACS (Interagency Registry for Mechanically Assisted Circulatory Support) classification plays an important role in terminal forms of HF, where it is used to assess patients who could benefit from mechanical circulatory support [7].
HF is widespread in Europe. Indeed, about 15 million people suffer from this disease, whose estimated incidence is 2-3% per year in a population older than 40 years of age. Current epidemiological projections forecast that such numbers will further increase in parallel with the progressive increment in life expectancy [8]. In particular, there is a CHF prevalence of 6.4% in subjects older than 65 years. These considerations highlight the importance of better and earlier disease identification, hopefully coupled with greater public awareness of the relevance of CHF detection and a more rational use of health resources [9]. It is therefore important to consider the problem of recurrent hospitalisations for CHF patients. An Italian study showed that the one-year hospitalisation rate of CHF patients was about 22%. In the same study, the one-year mortality rate was greater in NYHA III-IV patients than in NYHA I-II patients (14.5% and 4%, respectively) [10]. Epidemiological data based on hospital discharge showed that CHF was the second cause of hospitalisation, with an incidence rate of 4-5 cases for every one thousand inhabitants. Moreover, one patient out of four had a hospital readmission after one month. Fifty percent of patients were re-hospitalised during the six months following the first hospital admission, and their prognosis worsened. According to these reports, health costs were around 2% of the total healthcare expenditures, and hospitalisation could account for 60-70% of the overall CHF care burden [11-13]. Frailty and comorbidities may also have a negative impact on health costs and prognosis [14].
The importance of early diagnosis is well recognised. Predictive models based on common clinical variables are potentially useful for the early detection of decompensation risk. In this regard, several studies have been published, based on the evaluation of telemonitored patients undergoing investigation related to common clinical parameters, such as weight, systolic blood pressure (SBP), diastolic blood pressure (DBP), heart rate (HR), and arterial oxygen saturation [15,16]. A relevant advantage of these systems is that they rely on easy home checks that can be performed by patients themselves or their caregivers. This approach can be associated with better compliance in most patients, who prefer non-invasive procedures that do not require going to hospital.
Machine learning (ML) is an active area of research within heart failure. ML techniques have been applied to many aspects of heart failure such as diagnosis, classification, and prediction [17]. Some limitations of ML in this field were discussed in [18]. Knowledge discovery (KD) techniques are also relevant, given their effectiveness for prediction tasks in the cardiovascular domain [19,20]. ML and KD techniques can significantly contribute to disease identification and support effective real-time clinical decisions. Many ML methodologies have already been investigated for predicting adverse events in CHF patients, such as destabilisation, re-hospitalisation, and mortality [21,22].
It is important to point out that in many applications of ML, such as medical diagnosis, datasets are often imbalanced. Typically, the number of patients is far smaller than that of healthy individuals. Several methods have been proposed in the literature to solve this problem. To generate a balanced dataset, over-sampling methods add more data to the smaller class, making it the same size as the larger class [23,24]; under-sampling methods sample the larger class so that it has the same size as the smaller class [25]. Cost-sensitive learning approaches take the costs of misclassification errors into account [26-30]. They assign a higher misclassification cost to objects belonging to the minority class than to objects belonging to the majority class. For instance, the approach presented in [28] employs support vector machines (SVMs) as a classification method and assigns penalty coefficients to both positive and negative instances. Combinations of sampling techniques with cost-sensitive learning methods were shown to be effective in addressing the class imbalance problem [31].
In this paper, we designed a knowledge discovery (KD) task and implemented it with a two-fold purpose. Firstly, the predictive capability of clinical variables for cardiovascular (CV) events was investigated by analysing real data collected over five years at the outpatient clinic of the Geriatrics Division at the "Mater Domini" University Hospital in Catanzaro, Italy. Secondly, different ML models for predicting cardiovascular deterioration events in CHF patients were developed. The analysis focused on clinical decompensation events and major CV events. The KD analysis was defined as a predictive task stated as a supervised binary classification problem [32]. The real data were analysed and a dataset was constructed from them. The dataset exhibited an imbalanced distribution due to the under-representation of event cases. Dealing with imbalanced data is one of the most challenging aspects of applied ML. To address the issue, a pipeline was specifically developed to handle imbalanced data. Subsequently, various ML models were trained, and their hyper-parameters were optimised using a grid search approach. The performance of the learned models was then evaluated using appropriate metrics. To further tackle the class imbalance problem, three distinct approaches were implemented and tested: cost-sensitive learning methods, data resampling methods, and a combination of cost-sensitive learning methods with data resampling methods. The results demonstrated that combining sampling methods with cost-sensitive learning models yielded promising values for sensitivity and balanced accuracy. Moreover, several computational experiments were carried out to optimise the hyper-parameters of the ML models and improve the performance on the real-world, imbalanced dataset.
The ML approach adopted in this study can be broken down into four main steps: (1) Data preprocessing: this step involved operations such as data cleaning, handling missing data, data transformation, and reducing data imbalance; (2) Feature selection: this step aimed to reduce overfitting and training times and to improve accuracy by selecting the most relevant subset of variables to build the predictive model; (3) Model building: in this stage, parameter and hyper-parameter values for the ML model are chosen to optimise its performance; (4) Cross-validation: the dataset was divided into two separate groups, a training set and a test set, for validating the ML model; the model is trained on the training set and then tested on the test set. By following these steps, the study developed predictive models for identifying cardiovascular deterioration events in CHF patients based on the real-world data collected from the Geriatrics Division at the "Mater Domini" University Hospital.
The remainder of this paper is structured as follows. Section 2 describes our data and outlines the data processing method employed. Section 3 presents an overview of the ML models selected for this research (i.e., support vector machine, artificial neural network, naïve Bayes, decision tree, and random forest), the three methods that we adopt to balance the class distribution of the imbalanced dataset, and the parameter tuning approach for the ML models. Section 4 presents the experimental results and discusses the model performance metrics. Lastly, Section 5 provides the concluding remarks for this paper.

Real Data Collection and Dataset Construction
The data were collected over five years, as part of a pilot study conducted at the CHF outpatient clinic of the Geriatrics Division at the "Mater Domini" University Hospital in Catanzaro, Italy. The key steps undertaken are described below, and the statistical analysis was performed using R software, version 4.0.1 [33].

Study Population
A total of 154 patients suffering from CHF participated, comprising 119 men (77.3%) and 35 women (22.7%). However, only 50 patients (i.e., 32.5% of the total) who voluntarily provided their consent were enrolled in the pilot study. During the baseline assessment, all patients underwent a medical history evaluation and a full physical examination. The medical history evaluation mainly focused on CV events and on respiratory and metabolic comorbidities, while key haemodynamic and anthropometric parameters were measured. The outpatients were monitored over a five-year follow-up, with visits every 3 months. During each visit, six parameters were measured: weight, heart rate (HR), respiratory rate (RR), body temperature (BT), systolic blood pressure (SBP), and diastolic blood pressure (DBP). Clinical decompensation, with or without hospitalisation, and major CV events such as acute coronary syndrome, myocardial infarction, percutaneous transluminal coronary angioplasty (PTCA), surgical coronary artery revascularisation, stroke, death, and hospitalisation for any reason were reported. All clinical events had to be validated by source data (hospital records, death certificates, or other original documents). Throughout the five-year follow-up, some patients missed a few medical visits, while others did not attend any appointments from the second year onwards. Only eight patients completed the full five-year follow-up period.

Events during Follow-Up
During an average follow-up of 60 months, 19 patients presented a CV event. Among those patients who experienced an adverse event, 8 patients developed a second episode of decompensation. Notably, patients who manifested episodes of clinical instability were found to be older (p < 0.05). Figure 2 displays the distribution of events based on the nine subtypes of CHF. Among the patients with ischaemic aetiology, 9 individuals (39.13%) had an event, while 2 patients (22.2%) were affected by idiopathic dilated cardiomyopathy; 2 patients (50%) had hypertensive aetiology; 3 patients (37.5%) had valvular disease; and 2 patients (50%) presented both valvular disease and hypertension. Interestingly, patients with alcoholic aetiology remained in stable clinical condition without any complications, in contrast to other types of CHF. Approximately half of the patients who experienced CV deterioration had a history of hypertension, and just under half had mitral insufficiency, while no patient with transient ischaemic attack (TIA) presented further complications.

Data Preprocessing
A dataset consisting of 187 instances and suitable for the prediction task was created based on the collected data. Originally, the dataset was in a wide format with 50 rows, each containing the personal data and medical history of a patient, along with visit dates, vital signs recorded at each visit, events occurring between the current and previous visits, and the corresponding event dates.
To perform the classification task, the data were converted into a long format where each row represents a patient's visit. Input errors were corrected, outliers were discarded, and missing values were statistically imputed. Categorical variables with n values (e.g., aetiology, CV history, and other diseases) were converted into n numerical binary variables, while numerical data were expressed in their correct units of measurement. The resulting dataset comprises 794 instances and 37 features. Each instance was labelled as positive if there were CV deterioration events between two consecutive visits, and negative otherwise. Instances representing CV deterioration events were assigned to Class 1, while instances without any events were assigned to Class 2.
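The wide-to-long conversion, the binary encoding of categorical variables, and the class labelling described above can be illustrated with a minimal Python sketch. The record layout and field names (`id`, `aetiology`, `visits`, `vitals`, `event`) are hypothetical and chosen only for illustration; the original analysis was carried out with other tools, and the actual column names are not given in the text.

```python
def one_hot(prefix, value, categories):
    """Encode one categorical value as len(categories) binary indicators."""
    return {f"{prefix}_{c}": int(value == c) for c in categories}


def wide_to_long(patient, aetiologies):
    """Expand one wide patient record into one labelled row per visit."""
    static = one_hot("aetiology", patient["aetiology"], aetiologies)
    rows = []
    for visit in patient["visits"]:
        row = {"patient_id": patient["id"], **static, **visit["vitals"]}
        # Class 1 = a CV deterioration event was recorded for this visit
        # interval, Class 2 = no event (same coding as in the text).
        row["class"] = 1 if visit["event"] else 2
        rows.append(row)
    return rows


# Hypothetical patient with two visits (illustrative values only).
patient = {
    "id": 7,
    "aetiology": "ischaemic",
    "visits": [
        {"vitals": {"HR": 72, "RR": 16, "SBP": 130, "DBP": 80}, "event": False},
        {"vitals": {"HR": 95, "RR": 22, "SBP": 150, "DBP": 95}, "event": True},
    ],
}
rows = wide_to_long(patient, ["ischaemic", "valvular", "hypertensive"])
```

Each element of `rows` is one instance of the long-format dataset: the patient-level indicators are repeated on every visit, and only the vital signs and the class label change from row to row.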
Imbalanced datasets are common in real-world applications and are the focus of significant research efforts in knowledge discovery and data engineering. A dataset is considered imbalanced when one class, known as the minority class, is under-represented compared to the other class, which is the majority class. For ML algorithms, imbalanced datasets pose a challenge because the algorithms tend to be overwhelmed by the abundance of majority-class instances during the learning process. Therefore, methods are needed to improve recognition rates and address the issue of imbalanced data. Our dataset is imbalanced because of the significant disparity between the number of negative instances (majority class) and positive instances (minority class). Specifically, there are only 31 positive instances, accounting for a mere 4.6% of the entire dataset.

Feature Selection
Our aim was to develop a predictive model for the early detection of cardiovascular events in patients with CHF, utilising a limited set of basic clinical parameters. Through the feature selection process, the vital signs, i.e., HR, RR, DBP, and SBP, were identified as the most informative factors. These parameters play a crucial role in the design of the predictive model because of their significance in both CHF monitoring and the assessment of the patient's overall condition. Following the feature selection process, duplicate instances were removed from the dataset.

Machine Learning Process
In this section, we give an overview of the entire machine learning process. We constructed, tested, and compared five ML predictive models, which are briefly introduced below. The details of this process are described in the subsequent section.
Supervised learning models were developed to predict the risk of major events in CHF patients. The ML algorithms used to build the prediction models are support vector machines (SVMs), artificial neural networks, naïve Bayes, decision trees, and random forests.
SVM is based on statistical learning theory [34,35] and is one of the most widely used ML techniques. It searches for an optimal hyperplane that separates the patterns of two classes by maximising the margin. Let $X = (x_1, \dots, x_N)$ be a dataset with N instances, where $x_i$, $i = 1, \dots, N$, denotes an instance with m features and $y_i \in \{\pm 1\}$ its label. Finding the optimal hyperplane means solving the quadratic programming model (1)-(3), given here in its standard soft-margin form:
$$\min_{w, b, \xi} \ \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{N} \xi_i \tag{1}$$
$$\text{s.t.} \quad y_i (w^\top x_i + b) \ge 1 - \xi_i, \quad i = 1, \dots, N \tag{2}$$
$$\xi_i \ge 0, \quad i = 1, \dots, N \tag{3}$$
where w and b define the hyperplane, $\xi_i$ are the slack variables, and C, named the penalty parameter, is a trade-off between the size of the margin and the slack variable penalty. For datasets that are not linearly separable, the SVM maps inputs into a high-dimensional feature space by means of so-called kernel functions. A kernel function denotes an inner product in a feature space, measures similarity between any pair of inputs $x_i$ and $x_j$, and is usually denoted by $K(x_i, x_j) = \langle \varphi(x_i), \varphi(x_j) \rangle$ [36]. Here, we used three kernel functions: the linear kernel $K(x_i, x_j) = \langle x_i, x_j \rangle$, the polynomial kernel $K(x_i, x_j) = (\langle x_i, x_j \rangle + 1)^d$, and the RBF kernel $K(x_i, x_j) = \exp(-\gamma \|x_i - x_j\|^2)$. The linear kernel corresponds, up to an additive constant, to the polynomial kernel with degree d = 1 and computes similarity in the input space, whereas the other kernel functions compute similarity in the feature space.
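The three kernel functions can be written down directly from their definitions. The following is a small illustrative sketch in plain Python, not the implementation used in the study; the default values of `d` and `gamma` are arbitrary assumptions.

```python
import math


def dot(x, y):
    """Inner product <x, y> of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(x, y))


def linear_kernel(x, y):
    """K(x, y) = <x, y>."""
    return dot(x, y)


def polynomial_kernel(x, y, d=2):
    """K(x, y) = (<x, y> + 1)^d."""
    return (dot(x, y) + 1) ** d


def rbf_kernel(x, y, gamma=0.5):
    """K(x, y) = exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


x_i, x_j = [1.0, 2.0], [3.0, 4.0]
k_lin = linear_kernel(x_i, x_j)
k_poly = polynomial_kernel(x_i, x_j, d=2)
k_rbf = rbf_kernel(x_i, x_j, gamma=0.5)
```

With d = 1 the polynomial kernel returns the linear kernel value plus one, which is why the two behave equivalently for classification purposes. The RBF kernel always equals 1 for identical inputs and decays towards 0 as the inputs move apart, at a rate controlled by `gamma`.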
Artificial neural networks are computational models consisting of a number of artificial neural units. They emulate biological neural networks [37]. In this study, we used a feed-forward artificial neural network named multilayer perceptron (MLP) with a three-layer structure of neurons: an input layer, one or more hidden layers with a variable number of neurons, and an output classification layer. The neurons in the MLP are trained with the back-propagation learning algorithm [38].
Naïve Bayes [39] is a probabilistic ML algorithm based on Bayes' theorem. It assumes that, given the class, each feature is independent of the other features.
Decision trees are a non-parametric supervised learning method [40]. One of their main advantages is that they are simple to understand and interpret, and they can be visualised.
Random forest (RF) [41] consists of individual decision trees that operate as an ensemble. Each tree is built by applying bagging, which is the general technique of bootstrap aggregation. A simple majority vote of all trees gives the final result. RF has shown good accuracy in medical diagnosis problems [42,43].

Dealing with Imbalance Data: Cost-Sensitive Learning and Methods for Model Assessment
The problem addressed here is one of the most challenging issues in applied ML, as the event cases are under-represented. Correctly identifying instances of the minority class is more important than correctly identifying instances of the majority class. To handle this problem, specific methods need to be employed. Misclassification errors are usually treated equally, although their consequences differ depending on the class. In this work, we use and test cost-sensitive algorithms, which assign different misclassification costs to the two classes.
A general cost matrix Cost denotes the cost of each class misclassification [44]. A cost matrix for datasets with two classes is illustrated in Table 2. In general, $c_{ij}$ is the cost of predicting class i for an instance that actually belongs to class j. The goal of this type of learning is to minimise the total misclassification cost of a model on the training set. Formally, given a cost matrix Cost and an instance x, the cost $R(i|x)$ of classifying x into class i is $R(i|x) = \sum_j p(j|x)\, c_{ij}$, where $p(j|x)$ is the estimated probability that x belongs to class j. In the above cost matrix, $c_{12}$ represents the cost of a false positive misclassification and $c_{21}$ is the cost of a false negative misclassification. Usually, as we also assume here, there is no cost for correct classifications, that is, $c_{11} = c_{22} = 0$. We tested different cost matrices, as detailed in the next section.
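To make the expected-cost formula concrete, the sketch below evaluates R(i|x) for both classes and picks the class with the lower cost. It is an illustration of the formula only, under the assumption that class probabilities are available; the probability values are invented, and the text does not state that the study's toolkit applies the cost matrix in exactly this way.

```python
def expected_cost(predicted, probs, cost):
    """R(predicted | x) = sum_j p(j | x) * cost[predicted][j]."""
    return sum(probs[j] * cost[predicted][j] for j in probs)


def min_cost_class(probs, cost):
    """Return the class whose expected misclassification cost is lowest."""
    return min(probs, key=lambda i: expected_cost(i, probs, cost))


# cost[i][j]: cost of predicting class i when the true class is j.
# Class 1 = event (positive), Class 2 = no event (negative).
cost = {
    1: {1: 0.0, 2: 1.0},   # c11 = 0, c12 = 1  (false positive)
    2: {1: 30.0, 2: 0.0},  # c21 = 30 (false negative), c22 = 0
}
equal_cost = {1: {1: 0.0, 2: 1.0}, 2: {1: 1.0, 2: 0.0}}

# Hypothetical probability estimates for one instance.
probs = {1: 0.10, 2: 0.90}

choice_weighted = min_cost_class(probs, cost)      # R(1|x) = 0.9, R(2|x) = 3.0
choice_equal = min_cost_class(probs, equal_cost)   # R(1|x) = 0.9, R(2|x) = 0.1
```

With equal costs the instance is assigned to the majority class, whereas a false negative cost of 30 makes predicting an event the cheaper decision even at an estimated event probability of 10%. This is the mechanism by which a larger $c_{21}$ trades specificity for sensitivity.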

Hybrid Method for Imbalanced Dataset and Hyper-Parameter Optimisation Approach
Our dataset suffers from class imbalance, a common issue wherein classifiers developed on such data tend to be biased towards negative predictions because the majority class consists of no-event cases. To address this problem and enable better generalisation to new data, it is essential to use appropriate methods for handling imbalanced classes. The main methods for sampling-based imbalance correction are based on over-sampling and under-sampling. Over-sampling methods involve adding more data to the smaller class, equalising its size with the larger class. Under-sampling methods, on the other hand, involve randomly selecting data from the larger class, matching its size with the smaller class. To tackle the class imbalance in our study, we adopted a hybrid method that balances the class distribution by combining over-sampling and under-sampling approaches. This approach adds data to the minority class while simultaneously removing data from the majority class, achieving a more balanced representation of both classes in the dataset.
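A hybrid resampling step of this kind can be sketched as follows. The text does not specify the resampling algorithm or the target class ratio, so this example makes two explicit assumptions: the minority class is over-sampled by random duplication with replacement, and both classes are brought to the midpoint of their original sizes.

```python
import random


def hybrid_resample(data, minority_label, seed=0):
    """Balance (features, label) pairs by over- and under-sampling.

    Assumption: both classes are resized to the midpoint of their
    original sizes; the minority class is duplicated at random and the
    majority class is randomly thinned without replacement.
    """
    rng = random.Random(seed)
    minority = [d for d in data if d[1] == minority_label]
    majority = [d for d in data if d[1] != minority_label]
    if not minority or len(minority) > len(majority):
        raise ValueError("minority_label must mark the smaller, non-empty class")
    target = (len(minority) + len(majority)) // 2
    extra = [rng.choice(minority) for _ in range(target - len(minority))]
    kept = rng.sample(majority, target)
    balanced = minority + extra + kept
    rng.shuffle(balanced)
    return balanced


# Toy data: 5 event instances (class 1) and 95 no-event instances (class 2).
data = [([i], 1) for i in range(5)] + [([i], 2) for i in range(95)]
balanced = hybrid_resample(data, minority_label=1)
```

Random duplication adds no new information and can encourage overfitting to the repeated minority instances, which is one reason synthetic over-sampling methods are often preferred in practice.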
To assess the models, we conducted a k-fold cross-validation process, which uses k independent sets to test the model and thereby simulates unseen data. The procedure consists of randomly partitioning the dataset into k equal-sized folds. In each of the k rounds, a different fold serves as the test set while the remaining folds are used as the training set. The test set is never used during the training of the model, which prevents overly optimistic performance estimates. Each fold is used exactly once as a test set, so each instance is used for testing exactly once. Figure 3 provides a schematic representation of the k-fold cross-validation process. The performance metrics are averaged across the k estimates from each test fold. In a well-designed k-fold cross-validation procedure, it is crucial to determine the training and test partitions before applying any oversampling technique. As thoroughly discussed in [45], conducting k-fold cross-validation on imbalanced data can yield overly optimistic estimates if oversampling methods are applied to the entire dataset. To ensure a reliable evaluation of the model's ability to generalise to real-world data, the oversampling should only be applied to the folds designated as training data. This approach maintains the integrity of the test sets, providing a more accurate assessment of the model's performance.
The impact of hyper-parameters on the performance of an ML model is widely recognised. Generally, the hyper-parameters of each model are adjusted to find a setting that maximises the model performance and enables accurate predictions on unseen data. Recent studies showed that hyper-parameter tuning with class weight optimisation is effective in handling imbalanced data [46-50]. In this study, we employed the grid search optimisation strategy [51] to determine the optimal hyper-parameters for each ML model on the entire dataset. This strategy explores all specified hyper-parameter combinations within a multi-dimensional grid. Each combination is evaluated using a performance metric, and the combination that yields the best performance is selected. Subsequently, this optimal set of hyper-parameters is used to train the ML model on the entire dataset.
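The requirement that resampling be applied only to the training folds can be sketched as a generic cross-validation loop. The function names and the `train_fn`, `score_fn`, and `resample_fn` callables are hypothetical placeholders for whatever learner, metric, and resampler are used; the folds here are random and not stratified, which is a simplifying assumption.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k near-equal folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(data, k, train_fn, score_fn, resample_fn=None, seed=0):
    """Return the mean test score over k folds.

    Resampling, if given, is applied to the training portion only, so the
    held-out fold keeps the original class distribution.
    """
    scores = []
    for test_idx in k_fold_indices(len(data), k, seed):
        held_out = set(test_idx)
        train = [d for i, d in enumerate(data) if i not in held_out]
        test = [data[i] for i in test_idx]
        if resample_fn is not None:
            train = resample_fn(train)  # never sees the test fold
        model = train_fn(train)
        scores.append(score_fn(model, test))
    return sum(scores) / k


# Toy usage with stand-in callables that only record set sizes.
data = [([i], 1 if i < 3 else 2) for i in range(30)]
sizes = []
mean_score = cross_validate(
    data,
    k=5,
    train_fn=lambda train: len(train),
    score_fn=lambda model, test: sizes.append((model, len(test))) or 1.0,
    resample_fn=lambda train: train + train,  # stand-in for a real resampler
)
```

A grid search can be layered on top by calling `cross_validate` once per hyper-parameter combination and keeping the combination with the best mean score.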

Performance Metrics for Imbalanced Dataset
The predictive performance of the constructed ML models is evaluated and compared using various metrics, including the area under the receiver operating characteristic curve (ROC-AUC), sensitivity, specificity, balanced accuracy, and G-mean values.
Let P and N be the numbers of positive and negative instances in the dataset, respectively. Let TP and TN be the numbers of instances correctly predicted as positive and negative, respectively, and let FP and FN be the numbers of instances incorrectly predicted as positive and negative, respectively, i.e., predicted as one class while actually belonging to the other.

• AUC measures the classifier's ability to avoid false classification [52]. It is the area under the curve of the true positive rate vs. the false positive rate, and it indicates the probability that the model will rank a positive case more highly than a negative case. A model whose predictions are 100% correct has an AUC of 1.0.
• Sensitivity, also referred to as true positive rate or recall, measures the proportion of positive instances that are correctly identified, i.e., it is the ability to predict a CV event: $\text{Sens} = \frac{TP}{TP + FN}$
• Specificity, also known as true negative rate, measures the proportion of negative instances that are correctly identified: $\text{Spec} = \frac{TN}{TN + FP}$
• The accuracy metric can be misleading for our imbalanced dataset. As it is equally important to accurately predict the events of the positive and negative classes for the addressed problem, we used the balanced accuracy metric [53,54], which is defined as the arithmetic mean of sensitivity and specificity: $\text{BA} = \frac{\text{Sens} + \text{Spec}}{2}$
• Another useful metric is the so-called geometric mean, or G-mean, which balances sensitivity and specificity by combining them. It is defined as $\text{G-mean} = \sqrt{\text{Sens} \times \text{Spec}}$
Sensitivity and specificity are also known as quality parameters and are used to define the quality of the predicted class.
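The metrics listed above can be computed from predicted labels and scores with a few lines of Python. This is a minimal sketch using the class coding of this paper (1 = event, 2 = no event); the AUC is computed through its pairwise-ranking interpretation, counting ties as one half, and the function names are illustrative.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FN, TN, FP) for a binary problem."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        elif pred == positive:
            fp += 1
        else:
            tn += 1
    return tp, fn, tn, fp


def imbalance_metrics(y_true, y_pred, positive=1):
    """Sensitivity, specificity, balanced accuracy, and G-mean."""
    tp, fn, tn, fp = confusion_counts(y_true, y_pred, positive)
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "balanced_accuracy": (sens + spec) / 2,
        "g_mean": math.sqrt(sens * spec),
    }


def auc(y_true, scores, positive=1):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: 4 events and 10 non-events.
y_true = [1, 1, 1, 1] + [2] * 10
y_pred = [1, 1, 1, 2] + [2] * 8 + [1, 1]
metrics = imbalance_metrics(y_true, y_pred)
```

In this toy example plain accuracy would be 11/14, dominated by the majority class, whereas balanced accuracy and G-mean weight the two classes equally.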

Results and Discussions
For predicting cardiovascular deterioration events in CHF patients, we only used the vital signs as predictive features for a deterioration event. With the aim of finding an ML model that is able to predict CV events, we carried out computational experiments with three methods, named as follows:
• Method 1: cost-sensitive learning methods;
• Method 2: data resampling methods;
• Method 3: cost-sensitive learning methods combined with data resampling methods.
In cost-sensitive learning, we considered the cost of misclassifying a positive instance and the cost of misclassifying a negative instance as hyper-parameters to be tuned during the model training process. Referring to the notation of Table 2, we fixed the misclassification cost $c_{12} = 1$ and explored different values for $c_{21} \in \{1, 1.5, 2, 5, 10, 20, 30, 40, 50, 100, 200, 300, 400\}$, while setting $c_{11} = c_{22} = 0$. The Waikato Environment for Knowledge Analysis (Weka, version 3.8.2 [55]) was utilised for constructing and evaluating the classification models. For each ML model, we used the grid search algorithm to select the optimal values of the cost matrix.
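A search over the false negative cost can be sketched as below. This is only an illustration under stated assumptions: the text does not say which metric drives the selection, so G-mean is assumed; costs are applied through the minimum-expected-cost rule on estimated event probabilities rather than through Weka; and the labels and probabilities are invented toy values.

```python
import math

# Candidate values for c21 as listed in the text; c12 is fixed to 1.
C21_GRID = [1, 1.5, 2, 5, 10, 20, 30, 40, 50, 100, 200, 300, 400]


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    n_pos = sum(1 for t in y_true if t == positive)
    n_neg = len(y_true) - n_pos
    return math.sqrt((tp / n_pos) * (tn / n_neg))


def predict_with_costs(p_event, c12, c21):
    """Predict class 1 when its expected cost (1 - p) * c12 does not
    exceed the expected cost p * c21 of predicting class 2."""
    return 1 if (1 - p_event) * c12 <= p_event * c21 else 2


def select_c21(y_true, p_event, grid=C21_GRID, c12=1.0):
    """Return (best c21, best G-mean); ties keep the smallest c21."""
    best = None
    for c21 in grid:
        preds = [predict_with_costs(p, c12, c21) for p in p_event]
        score = g_mean(y_true, preds)
        if best is None or score > best[1]:
            best = (c21, score)
    return best


# Hypothetical validation labels and estimated event probabilities.
y_true = [1, 1, 2, 2, 2, 2]
p_event = [0.30, 0.08, 0.20, 0.05, 0.02, 0.01]
best_c21, best_score = select_c21(y_true, p_event)
```

On this toy data small costs leave every instance classified as no-event (G-mean 0), very large costs classify everything as an event (also G-mean 0), and an intermediate cost gives the best balance. In a real experiment the selection would be made on cross-validated predictions rather than on a single small sample.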
Table 3 presents the hyper-parameter tuning performed with a data resampling approach for SVM and MLP. The final column displays the optimal hyper-parameter values, as follows:
SVM: We tested SVM with three kernel functions, i.e., the linear kernel, polynomial kernel, and RBF kernel. The hyper-parameter C and those related to the polynomial kernel and RBF kernel were optimised by searching for the best values within the specified range, as reported in Table 3. The increment used for optimisation is denoted in the fourth column as "Step".
MLP: The model parameter optimisation involves choosing the number of hidden layers, the number of neurons in each layer, the number of epochs, the learning rate, and the momentum. We tested different MLP models consisting of one input layer, one output layer, and one hidden layer. The number of neurons in the hidden layer was set by a formula. We optimised the learning rate, the momentum, and the number of epochs in a defined range of values, as reported in Table 3. The increment is denoted in the fourth column as "Step".
Table 3 thus lists the values tested for optimising the hyper-parameters of the SVM and MLP models through the grid search algorithm. The results of the hyper-parameter tuning for both models using a cost matrix with equal misclassification costs (i.e., false negative cost and false positive cost) are shown as optimal hyper-parameter values in the last column.

Predictive Models Performance Metrics
We carried out computational experiments with the three methods. The following tables only report the best performance results. The performance metrics are related to both three-fold cross-validation and five-fold cross-validation. The best values for each performance metric achieved by an ML model are highlighted in bold.
Table 5 summarises the best results for Method 1, i.e., the cost-sensitive learning methods. The SVM models with the linear and RBF kernels demonstrated the best overall performance compared to the other models when the misclassification costs were set to $c_{12} = 1$ and $c_{21} = 30$. The decision tree models exhibited a comparable prediction performance in terms of sensitivity and specificity only when the cost of misclassifying the minority class was set to a high value, that is, $c_{21} = 200$ or $c_{21} = 300$. Naive Bayes models showed a comparable prediction performance with both $c_{21} = 30$ and $c_{21} = 40$ under five-fold cross-validation. Random forest models showed a comparable prediction performance only when the cost of misclassifying the minority class was set to a high value, that is, $c_{21} = 500$. Furthermore, it is noteworthy that, in general, the hyper-parameters that yield the highest sensitivity also yield the lowest specificity, and vice versa.
Based on the performance results, the MLP with $c_{12} = 1$ and $c_{21} = 2$ has the best performance among the built models, with a mean sensitivity of 65%, a mean specificity of 55%, a mean area under the curve of 0.71, and a G-mean of 0.60. In general, we observed that the performance of the MLP, naive Bayes, decision tree, and random forest models improved when the cost of misclassifying the minority class was higher ($c_{21} > c_{12}$). Additionally, all constructed models achieved a meaningful prediction performance with five-fold cross-validation. However, the SVM model with a polynomial kernel showed a lower sensitivity in certain cases.
These findings indicate that the combination of cost-sensitive methods with data over/undersampling approaches is effective for the meaningful prediction of cardiovascular events. Comparing the three methods, the cost-sensitive learning methods proved to be superior to the sampling approach. They achieved a high G-mean, indicating their efficacy in handling imbalanced data and improving the model's overall performance. This model could be particularly useful in CHF patients with NYHA class III or IV, in whom functional reserves are reduced and each exacerbation leads to a further deterioration of cardiac function that worsens the symptomatology, often requires hospitalisation, and sometimes results in death. In this context, the early identification and treatment of a re-exacerbation of HF may improve the patient's symptoms and prognosis and avoid hospitalisation, thereby reducing healthcare costs [6].
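To make the reported metrics concrete, the minimal sketch below computes sensitivity, specificity, and G-mean, and shows how a cost ratio shifts a probabilistic decision threshold. It is illustrative only and not the authors' implementation: the function names are hypothetical, and it assumes that c21 is the cost of missing a minority-class event (false negative) and c12 the cost of a false alarm (false positive). The models in the study may incorporate the costs differently, for example through class weights.

```python
import math


def sensitivity(tp, fn):
    """True positive rate: share of actual events that were predicted."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True negative rate: share of non-events that were predicted as such."""
    return tn / (tn + fp)


def g_mean(sens, spec):
    """Geometric mean of sensitivity and specificity."""
    return math.sqrt(sens * spec)


def cost_threshold(c12, c21):
    """Minimum-expected-cost threshold on the predicted event probability p.

    Assumption: c12 is the false-positive cost and c21 the false-negative cost.
    Predicting "event" costs (1 - p) * c12 in expectation, predicting "no event"
    costs p * c21, so "event" is the cheaper choice when p >= c12 / (c12 + c21).
    """
    return c12 / (c12 + c21)


# G-mean implied by the reported mean sensitivity (65%) and specificity (55%).
reported_g_mean = g_mean(0.65, 0.55)

# A higher minority-class cost lowers the threshold for predicting an event.
equal_cost_threshold = cost_threshold(1, 1)
high_cost_threshold = cost_threshold(1, 30)
```

With these inputs, `reported_g_mean` rounds to 0.60, matching the value reported for the best MLP model, and raising c21 from 1 to 30 moves the threshold from 0.5 to 1/31, which is one way to see why higher minority-class costs trade specificity for sensitivity.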

Conclusions
Technology is playing an increasingly significant role in clinical practice. In this context, machine learning represents an innovative methodology for managing chronic disorders, giving clinicians an active role in predicting cardiovascular events rather than leaving them as mere spectators.
The data used in this study were collected during a pilot study in a well-characterised population of CHF patients. The results open up numerous potential perspectives for applying ML approaches to clinical practice. Having a predictive system for CV deterioration events in CHF patients can lead to significant advantages, such as reducing hospitalisations and associated costs. The clinical importance of utilising such a model cannot be overstated, as early detection can help prevent CV events, improve patient health, and optimise healthcare expenditure.
The findings suggest that cost-sensitive methods can effectively predict CV deterioration events in CHF patients using only a few clinical variables. CHF remains one of the most severe chronic diseases in terms of mortality, hospitalisation rate, and healthcare costs. Successfully addressing even one variable linked to the natural course of this disease would be a significant achievement, benefiting patients, clinicians, and the National Health System. Our KD process yielded promising results, paving the way for large-scale application. It is interesting that a few clinical variables, such as HR, RR, DBP, and SBP, performed well in the prediction of CV deterioration events. Furthermore, the performance of these models may be enhanced by including other variables of the study patients. Further studies, however, are needed to validate this approach in a larger patient population. As part of future work, we plan to expand the sample size by including more patients, to reduce the interval between consecutive visits using remote monitoring, and to conduct an in-depth feature selection study to better understand which other features are crucial in diagnosing deterioration events in CHF patients.

Figure 1 illustrates the knowledge discovery process that we designed and implemented. It is detailed in the following sections.

Figure 2. Distribution of the events based on chronic heart failure aetiology.

Figure 3. Graphical representation of the k-fold cross-validation method.

Table 2. Cost matrix for a binary classification problem.

Table 3. Hyper-parameter tuning for SVM and MLP and best values.

Table 4 displays the hyper-parameter values selected for each model in the study, alongside the corresponding cost matrix with equal misclassification costs, which is the same for all models.

Table 5. Performance metrics of the predictive models built with the cost-sensitive learning method.

Table 6 reports the performance of the predictive models with Method 2 and Method 3. More specifically, for each ML model listed in the first column, the first row shows the results obtained with Method 2 and the remaining rows show the results obtained with Method 3.

Table 6. Performance metrics of the predictive models for the data resampling method and for the cost-sensitive method combined with the data resampling method.