Predicting Heart Disease Using Collaborative Clustering and Ensemble Learning Techniques

Abstract: Clinical data frequently contain different data types. Applying machine learning algorithms to such mixed data can be difficult and can degrade output accuracy and quality. This paper proposes a hybrid model of unsupervised and supervised learning techniques for modelling and processing mixed data, with an application to heart disease diagnosis. The model consists of two main components: collaborative clustering and combining decisions (the ensemble approach). The mixed data clustering problem is treated as a multi-view clustering problem; each view is processed using specialised clustering algorithms. Since each algorithm operates on a different subspace of the data set's features, a novel collaborative framework is proposed that improves the clustering process through information exchange between the different clustering algorithms, thereby producing expert models, each modelling a different subspace of the data set's features. The optimisation process is built on the expectation maximisation algorithm, with the continuity of the collaborative term controlled by an entropy-based criterion, ensuring good convergence characteristics. An ensemble approach similar to stacking was used: a logistic regression model was employed as a meta-classifier, trained on the expert models' prediction results, and subsequently used to predict the final output. The results demonstrate the efficacy of this collaborative approach in improving the outcomes of the different clustering algorithms and the meta-classifier.


Introduction
The heart is the second most critical human organ after the brain, as it pumps oxygen and nutrients to the body's tissues and viscera via the circulatory system [1]. If cardiac function were to fail, the brain and other organs would stop operating, causing the individual to die within minutes. Heart-related disorders, or cardiovascular diseases (CVDs), have been a primary cause of global mortality for several decades, placing them amongst the most lethal diseases [2]. The prevalence of CVDs is currently rising worldwide, primarily due to lifestyle changes, work stress and poor eating habits. CVDs are thought to cause 17.9 million deaths annually, or 32% of all fatalities worldwide. Myocardial infarctions and cerebrovascular events account for approximately 85% of deaths related to CVDs, and more than 75% of CVD deaths occur in low- and middle-income countries [3]. These statistics make the need to understand more about CVDs, their prevalence and diagnosis, more pressing.
Data clustering is an essential part of extracting information from databases. The aim is to identify patterns inherent in a set of items by grouping them based on shared features. However, the number of clusters to be found is usually unknown, which makes this process more difficult than supervised classification. In particular, it becomes more challenging to assess the quality of the clustering solution [4]. Over the last two decades, the increasing availability of more complex data sets, including multi-view, distributed and large-scale data, has made the clustering process even more difficult. This problem can be addressed efficiently by combining several different clustering algorithms, an approach referred to as collaborative clustering [4]. This is an unsupervised machine learning approach in which several different clustering algorithms work together to identify structures within data sets. The process is characterised by frequent exchanges of information between collaborating members, direct action in relation to each member of the group and a shared responsibility involving all group members, which affects the individual task of each collaborator [4]. Compared to a single algorithm working autonomously on a data set, collaborative clustering can enhance the results and reliability of the clustering algorithms applied to the same data set. The process can be used in various applications, including distributed data clustering, multi-expert clustering, multi-scale clustering analysis and multi-view clustering [4].
Collaborative clustering comprises two basic steps: a local step, during which each member performs its task individually and produces a clustering solution, followed by a collaborative step, for which no fixed technique exists. In the latter step, the collaborating members exchange their results and try to improve their models in order to achieve better clustering [5]. Each local computation, which may be performed on a different data set, benefits from the efforts of the other contributors [4]. There are several ways to split the data, the most important distinction being between horizontal and vertical collaborative clustering [6].
Some applications of collaboration, such as multi-expert analysis [4], are a type of collaborative clustering in which all algorithms work on the same objects and features of a difficult data set. Multiple algorithms are applied to a data set and share their predictions. This enables the merging of information on clusters identified by only some algorithms and refines results for clusters that have not yet been well characterised. Multi-view clustering is a type of unsupervised learning in which multiple algorithms are used to analyse a set of objects. Each algorithm processes different attributes of the same objects, such as their geometry, text, colour or numerical data, in order to find clusters in the data set. The goal is to improve clustering accuracy by combining predictions.
Multi-scale clustering is another example of a collaborative application. Various algorithms examine identical objects and properties whilst seeking a different number of clusters. Such an approach is advantageous for data sets with an inherently multi-scale structure, such as satellite image data. However, collaborative clustering faces many challenges, including which algorithms to choose for the collaboration process, what information is exchanged, on what basis the collaborators make their decisions and when to stop the collaboration, as prolonging or curtailing the collaborative process can lead to negative collaboration. Thus, in this work, ensemble learning is utilised to create several clusterings and to combine them in order to obtain a final consensus clustering. This technique improves classification accuracy and prediction performance. A group of individual models is referred to as an ensemble model [7]. Typically, alternative methods, varying algorithm parameters or random data sampling create the initial clusterings. Ensemble methods are simple supervised learning methods dependent on consensus-based techniques, as defining a set of predictive functions in order to produce an aggregated prediction function is straightforward; for example, a linear combination is used in boosting. It is also easy to gauge the effectiveness of individual predictive functions as well as the diversity of the group of candidate functions for inclusion in the final combined global decision function.
The main gap in the current research is that all prior work in the literature has used supervised learning algorithms for the prediction of heart disease; no work has utilised unsupervised learning algorithms or collaborative clustering for this application. In this paper, a hybrid model of unsupervised and supervised learning techniques is proposed. A summary of this contribution is detailed below:

•
The presented model comprises two main components: collaborative clustering and combining decisions (ensemble approach).

•
The problem of mixed data clustering, i.e., of quantitative and qualitative features, here treated as multi-view clustering, whereby each view is processed using specialised clustering algorithms, was addressed. Since each algorithm operates on a different area of the data set's features, the proposed collaborative clustering framework is horizontal.

•
An ensemble strategy similar to the stacking approach was employed, together with a logistic regression model as a meta-classifier. The latter was trained on the expert models' prediction results and then employed to predict the final decision.

•
The efficacy of the collaborative technique in improving the results of the different clustering algorithms and the meta-classifier was demonstrated. A strength of the proposed collaborative method is that it does not require the prototypes or models used by the different algorithms to be shared during the collaborative step.
Only the solution vectors produced by all the algorithms require sharing.

•
Controlling collaborative step continuity also avoids negative collaboration.
The remainder of this paper is divided into several sections. Section 2 reviews relevant work on predicting heart disease using supervised and unsupervised learning, together with a description of some general studies related to collaborative clustering. In Section 3, the proposed collaborative clustering and ensemble approach is presented; each stage of the methodology is explained in detail. Section 4 contains the results and discussion relating to a number of experiments for both the parallel and reinforcement scenarios. The article closes with conclusions and thoughts on future work.

Literature Review
Several previous studies have been conducted in the field of CVD diagnosis. For instance, a new model was proposed for heart disease prediction, referred to as the Heart Disease Prediction Model Using a Hybrid Random Forest (RF) and Linear Model. This exploited several machine learning techniques, including a variety of feature combinations and numerous categorisation algorithms, and achieved an 88.7% accuracy [8].
Generally, research has focused on diagnosing heart disease based on historical data and information. The Smart Heart Disease Prediction model utilised naive Bayesian (NB) techniques to forecast risk factors for heart disease [9]. The results revealed that this diagnostic approach was highly successful at predicting risk factors for CVDs, with an accuracy of 89.77% [9]. Fitriyani et al. [10] published an efficient heart disease prediction model for a clinical decision support system that utilised density-based spatial clustering of applications with noise (DBSCAN) in order to identify and eliminate outliers, in conjunction with a hybrid synthetic minority over-sampling technique-edited nearest neighbour technique. Two publicly available data sets, Statlog and Cleveland, were used to evaluate the model and to compare its performance against other widely used classifier models. The proposed model outperformed previous models, achieving an accuracy of up to 96%.
Various artificial intelligence strategies for coronary artery heart disease prediction were compared using seven computational intelligence approaches, namely logistic regression (LR), support vector machine (SVM), a deep neural network (DNN), decision tree (DT), NB techniques, RF and k-nearest neighbour (K-NN) [11]. The Statlog and Cleveland heart disease data sets were used to assess each technique's performance. DNNs were found to achieve the highest accuracy, i.e., 98.15%. Compared to the state-of-the-art methods, and focusing on heart disease prediction, their approach outperformed prior studies [11].
A novel method for predicting heart disease was also presented that included all techniques in a single algorithm, i.e., the hybridisation technique [12]. Several methods were listed and evaluated in order to determine their accuracy level, and the results showed that a correct diagnosis was possible when using a composite model comprising all approaches. An accuracy of 89.2% was attained.
A further paper utilised an optimum feature extraction method to generate a de novo clustering model for the prediction of CVD using numerical data and electrocardiographic (ECG) data [13]. Rather than directly grouping the numerical data, a principal component analysis was used for dimensionality reduction. The hybrid clustering technique consisted of improved DBSCAN combined with optimised k-means clustering (KMC). Multiple data sets were utilised simultaneously, i.e., two numerical data sets and an ECG data set, ECG being a test used to detect and record the timing and intensity of human cardiac electrical activity. The findings indicated that the model effectively resolved the issues associated with using dual data for heart disease prediction.
A heart disease prediction system was also developed using the data mining approach by combining NB techniques and KMC [14]. This approach facilitated the prediction of CVD by incorporating numerous variables and providing output data in the prediction form, using the k-means and NB algorithms to group a range of factors and to perform prediction, respectively.
Unsupervised KMC has also been utilised in the clinical domain to detect anomalies as a method of CVD prediction [15]. The suggested model produced an ideal value for k by forming clusters to recognise anomalies using the silhouette approach. Identified anomalies were excluded from the data, and the resulting prediction model was created using the five most prominent machine learning classification approaches, namely K-NN, RF, SVM, the NB technique and LR. This approach was justified by the efficacy of the method and substantiated using a common cardiovascular data set. Data charting was also used in order to determine the precision with which anomalies were found in the experimental research.
All the above algorithms have the same limitations. Firstly, they rely on prototypes based on fixed parameters and algorithms from the same family, and secondly, they find identical clusters. In one study [28], a horizontal collaborative clustering method referred to as SAMARA is used. However, this uses only hard clustering; it does not deal with prototypes and is not limited to a specific number of clusters. The goal is to modify the clustering results for the same data iteratively and collaboratively, thereby lowering their diversity and facilitating the discovery of a consensus solution. After completing the local step, the clusters are mapped to the different clustering algorithms using a probabilistic confusion matrix.
A new method [5] was proposed that allows the enhancement of the clustering process by exchanging information between several local results. This addresses the limitations of previous techniques that depended on the algorithms being both homogeneous and from the same family. The data are split using a horizontal collaborative [4,5] method, either into subsets that represent the same data on different features, or into the same data but searching for a different number of clusters, or a mixture of the two. Several heterogeneous algorithms are used, including Self-Organizing Mapping (SOM) [23,25], the Generative Topographic Mapping Algorithm (GTM) [26,27] and Expectation Maximisation (EM) [4], which can be applied to different probability distributions. These probability distributions include the Gaussian mixture distribution, the Dirichlet distribution [29] and the Bernoulli mixture model (BMM) [30]. The approach based on the SAMARA modification [4] avoids the limitations found in previous studies, namely the requirement for identical prototypes amongst the various collaborators, and supports an abundance of clustering algorithms from other families. This is based on an estimation process that operates on subsequent distributions known as composite functions. These techniques handle data with a single feature data type, e.g., numeric or imaging data. One of the challenges in the currently proposed approach is handling mixed data.
Most clustering algorithms cannot deal with mixed data. This is because each clustering algorithm relies on a specific distance measure, which may be specialised for quantitative data or for qualitative data. For example, k-means depends on the Euclidean distance measure for quantitative data, whilst k-modes uses a categorical matching measure to calculate distance. Therefore, no single algorithm works with mixed data.
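As a toy illustration (not from the paper), the following sketch contrasts the two distance measures mentioned above; the feature values are invented:

```python
import numpy as np

def euclidean(a, b):
    """Distance used by k-means on quantitative features."""
    return float(np.sqrt(np.sum((np.asarray(a) - np.asarray(b)) ** 2)))

def matching_dissimilarity(a, b):
    """Distance used by k-modes on qualitative features: count of mismatches."""
    return sum(x != y for x, y in zip(a, b))

# A numeric comparison (e.g. resting blood pressure, cholesterol)
print(euclidean([140, 230], [120, 200]))
# A categorical comparison (e.g. sex, chest pain type)
print(matching_dissimilarity(["M", "ASY"], ["F", "ASY"]))
```

Neither measure is meaningful on the other feature type, which is why the paper assigns each view its own specialist algorithm.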
The current study investigates a different approach in order to optimise the clustering algorithm results. This strategy has large-scale advantages, which include the ability to apply collaborative clustering techniques to medical specifications or to alternative domains. Additionally, there is the option to predetermine the cluster number [31]. In this study, two clusters will be predetermined in order to predict whether or not an individual has a CVD.

Methodology
In this paper, a hybrid model comprising unsupervised and supervised learning techniques is proposed for the modelling of mixed data and specifically, of cardiology data sets.The general framework of the proposed model is illustrated in Figure 1.
The model consists of two main components: (i) collaborative clustering and (ii) combining decisions (the ensemble approach). In collaborative clustering, the mixed data clustering problem is treated as multi-view clustering: the data set is divided into two views, representing the quantitative and qualitative features, respectively. Each view is processed using specialised clustering algorithms. Since each algorithm operates on a different area of the data set's features, a new collaborative method was designed that enhances the clustering process by sharing information, in a horizontal collaborative manner, between the results obtained from the different clustering algorithms. The collaboration goal was to process each type of data set feature with specialised clustering algorithms whilst giving these algorithms a more comprehensive picture of the data set by allowing some information to be exchanged between them, improving their results. This process led to the generation of expert models that model different areas of the data set's features. The models' clusters were then mapped to the appropriate data set classes, shifting from the unsupervised to the supervised learning setting and enabling meaningful predictions from the clustering models. Finally, to produce a single decision from these expert models, an ensemble approach similar to the stacking method was applied. A logistic regression model was utilised as a meta-classifier, trained on the expert models' prediction results, and then used to predict the final output. In the remainder of this section, each phase of the proposed framework is described in detail.
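A minimal sketch of this stacking-like combination stage, assuming scikit-learn and using synthetic stand-ins for the expert models' predictions, might look as follows:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-ins for the outputs of the cluster-based expert models;
# the labels and noise levels below are illustrative, not from the paper.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)                      # true labels (0/1)
expert1 = np.where(rng.random(200) < 0.80, y, 1 - y)  # ~80% agreement with y
expert2 = np.where(rng.random(200) < 0.75, y, 1 - y)  # ~75% agreement with y

# Meta-features: one column per expert model prediction
meta_X = np.column_stack([expert1, expert2])

# Logistic regression meta-classifier trained on the expert predictions
meta = LogisticRegression().fit(meta_X, y)
final = meta.predict(meta_X)                          # combined final decision
print((final == y).mean())
```

In the actual pipeline, the two columns would be the mapped cluster-to-class predictions of the quantitative-view and qualitative-view expert models.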

Data Pre-Processing
The cardiac data set available on Kaggle was used for this study [32]. This resulted from merging five cardiac data sets, i.e., Cleveland, Hungarian, Swiss, Long Beach VA and Statlog (heart), containing 11 common clinical features. Each data set is available within the cardiology data sets index of the UCI Machine Learning Repository [33]. The merged data set contains 918 observations, with the target variable representing whether or not a person has heart disease. The identification of heart disease is based on 11 clinical features of different types, 5 of which are quantitative and 6 of which are qualitative (Table 1).
Most machine learning models only work with numeric values, so qualitative values have to be converted into numeric values for the machine to learn from those data correctly. The OneHotEncoder technique was used to encode each qualitative feature into one-hot-encoded binary features, i.e., if a column represents the feature's value, it receives the symbol 1; otherwise, it is assigned a 0. This helps avoid the ordering problem, which can occur when integer encoding imposes an artificial ordering on a qualitative variable that has no natural order. The quantitative features were then normalised in order to enhance the machine learning model input quality by rescaling the data to a standardised range. Although normalisation can be achieved using different methods, such as the Robust Scaler, MinMax Scaler and Standard Scaler [34], the latter proved to be the most effective.
The Robust Scaler scales features in a manner robust to outliers [35]. This approach is similar to the MinMax Scaler but uses the interquartile range instead of the min-max range. This scaling algorithm removes the median and scales the data according to the interquartile range, following Equation (1):

x' = (x − median(x)) / (Q3 − Q1),    (1)

where Q1 is the first quartile and Q3 is the third quartile. Finally, the data set was divided into two views, the first representing the quantitative features (n = 5) and the second the qualitative features which, after encoding, become 10 binary features.

Collaborative Clustering
In this subsection, the proposed collaborative method is presented. This can be applied to different types of general horizontal collaborative learning tasks and, specifically, to multi-view clustering tasks. The proposed method removes several limitations of previous collaborative frameworks: data need not be shared between the different algorithms, the number of clusters can vary and, notably, different algorithms can collaborate. The proposed work is similar to that published in [5], i.e., collaborative clustering with heterogeneous algorithms, where information is exchanged between the results of different clustering algorithms. The currently proposed method differs in that it processes different types of data using specialised clustering algorithms and offers a more general approach by controlling the continuity of the collaborative step using a stopping criterion (entropy), thereby circumventing the negative collaboration issue between collaborating clustering algorithms.
Firstly, the principle of the proposed method and its theoretical basis are explained. The stopping criterion is then examined, and the collaborative scenarios in which the collaborative process can be applied are presented. How these collaborative scenarios can enhance the modelling of the heart disease data set is then discussed. Finally, the clustering algorithms employed are introduced, and the way in which they are adapted to the collaborative method is demonstrated.

Formalism
In horizontal collaborative clustering, a finite set of algorithms, A = {A_1, A_2, · · · , A_J}, is considered, which operates on the same data items, albeit with access to different features, and possibly also looking for a different number of clusters. Let X = {x_1, x_2, · · · , x_N}, with x_n ∈ R^d, be a data set containing N elements, each with real numeric properties.
Each clustering algorithm A_i has parameters, θ^i, describing its clusters or model, and produces a clustering solution, S^i, made of K_i clusters based on the features of the subset X^i ⊆ X to which it has access. In the case of hard clustering, S^i can be expressed as a solution vector of size N; in soft clustering, it is a matrix of size N × K_i. This matrix is denoted S^i = (s^i_{n,c}), where 1 ≤ n ≤ N and 1 ≤ c ≤ K_i. Thus, the solutions S^i generated by the algorithms are 2D matrices of size N × K_i, where each element s^i_{n,c} expresses the responsibility (probability) assigned by algorithm A_i to cluster c for the data element x^i_n. This matrix is transformed into a solution vector of size N by assigning each data set record to the cluster with the highest probability, S^i(x_n) = arg max_c s^i_{n,c}. The method assumes that any clustering algorithm whose model is collaboratively optimised attempts to optimise an objective function similar to Equation (2):

θ* = arg max_θ Σ_{x∈X} ln Σ_s P(x | s, θ_s) P(s).    (2)

Equation (2) can be solved using a local maximisation process, as expressed in Equation (3):

S(x) = arg max_s P(x | s, θ_s) P(s).    (3)

Each algorithm may be based on a different statistical model, e.g., Gaussian or polynomial, amongst others. It is hypothesised that this collaborative method can only be applied to algorithms attempting to optimise an equation similar to Equation (2). Most of the symbols used in this section are summarised in Table 2.
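As a small sketch with toy responsibility values, the hard assignment S^i = arg max_c s^i_{n,c} can be computed as follows:

```python
import numpy as np

# Toy soft-clustering responsibility matrix S^i of size N x K_i
# (N = 3 records, K_i = 2 clusters); values are illustrative.
S_i = np.array([
    [0.9, 0.1],
    [0.2, 0.8],
    [0.6, 0.4],
])

# Hard solution vector: each record goes to its most probable cluster
solution_vector = S_i.argmax(axis=1)
print(solution_vector)
```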
Table 2. Summary of the main symbols.

X^i: the subset of the data observed by algorithm A_i.
X: all data with all views.
S^i: the solution vector of algorithm A_i.
S: the solution vectors of all clustering algorithms.
Θ: the set of distribution parameters for all algorithms.
Z: a normalisation constant.
A_i: an algorithm looking for K_i clusters with distribution parameters θ^i in the subset X^i, producing a solution vector S^i.
Ψ^{i→j}: the consensus matrices mapping the other algorithms' clusters to those of the current algorithm A_j.

Problem Formulation
The proposed method enhances the clustering process by sharing information between the results obtained from different clustering algorithms. The originality of the proposed approach is that the collaborative step can use the clustering results obtained from any algorithm during the local step. The main issue arises from Equation (3) in calculating P(s), the prior probability of each cluster. This is not known in advance, and most clustering algorithms rely on an assumption to fix P(s). For example, the k-means algorithm and some versions of the EM algorithm assume that all clusters have the same occurrence probability; most unsupervised probabilistic classifiers follow this approach. The principal concept here is to re-estimate the cluster probabilities after each iteration. A local probability hypothesis, primarily used in computer vision, makes P(s) depend on the neighbourhood of the observed data point rather than on a global estimate over the entire data set. Thus, its value is determined based on the clusters assigned to the neighbouring data.
In the proposed collaborative method, a hypothesis similar to the local hypothesis is followed. During the collaborative step, the probability of occurrence of each cluster, P(s), is not treated as a global probability. Instead, P(s) is tied to the clustering choices made by the other algorithms for the same data point. The objective is to modify Equation (3) such that the solutions of all the clustering algorithms are used to compute P(s), as indicated by Equation (4):

S(x_t) = arg max_s P(x_t | s, θ_s) C(x_t, s).    (4)
C(x_t, s) is a function that determines the value of P(s) through the consensus of all the other clustering algorithms for the same data point x_t.

Local Step
In the local step, each clustering algorithm processes the data it can access, as shown in Table 3, and produces a clustering result as a solution vector. The solution vectors of all clustering algorithms are collected into a two-dimensional matrix similar to that illustrated in Table 4. Each column represents the solution proposed by a particular clustering algorithm. This matrix is used to form the probability confusion matrices, which are used in the collaborative step to compute the C(x, s) function by consensus between the different clustering algorithms; this process is discussed in detail below. In summary, the local step involves processing the individual data set views with the appropriate clustering algorithms in order to generate the solutions matrix.
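A sketch of the local step, with k-means standing in for both view specialists (the real qualitative-view algorithm, e.g. k-modes, is not shown here), might be:

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-ins for the two views; shapes mirror the paper's setup
# (5 quantitative features, 10 one-hot qualitative features).
rng = np.random.default_rng(1)
X_quant = rng.normal(size=(50, 5))            # quantitative view
X_qual = rng.integers(0, 2, size=(50, 10))    # one-hot qualitative view

# Each view is clustered independently by its own algorithm
s1 = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_quant)
s2 = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_qual.astype(float))

# Solutions matrix: one column per algorithm (the Table 4 layout)
solutions = np.column_stack([s1, s2])
print(solutions.shape)
```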

Collaborative Step
The solutions matrix produced by the clustering algorithms in the local step is then used in the initial stage of the collaborative step, during which the probability confusion matrices are calculated in order to construct the C(x, s) function. This idea was inspired by another work [28], in which the same confusion matrices are computed for a consensus-based method. The probability confusion matrix between two clustering algorithms is defined as shown in Equation (5).
Ψ^{i→j} is a matrix of size K_i × K_j which maps the clusters of algorithm A_i to the clusters of algorithm A_j. Each element of the matrix, α^{i,j}_{k,l}, represents the probability of placing a data point in cluster l of algorithm A_j given that this data point is already in cluster k of algorithm A_i:

α^{i,j}_{k,l} = |S^i_k ∩ S^j_l| / |S^i_k|,    (5)

where |S^i_k ∩ S^j_l| is the number of data points in cluster k of A_i that are simultaneously in cluster l of A_j, and |S^i_k| is the number of data points belonging to cluster k of algorithm A_i.
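Equation (5) can be computed directly from two solution vectors; the following sketch uses toy labels:

```python
import numpy as np

def confusion_matrix_prob(s_i, s_j, K_i, K_j):
    """Probability confusion matrix mapping clusters of A_i to clusters of A_j."""
    psi = np.zeros((K_i, K_j))
    for k in range(K_i):
        in_k = (s_i == k)                       # members of cluster k of A_i
        denom = max(in_k.sum(), 1)              # |S^i_k| (guard empty cluster)
        for l in range(K_j):
            # |S^i_k intersect S^j_l| / |S^i_k|
            psi[k, l] = np.logical_and(in_k, s_j == l).sum() / denom
    return psi

s_i = np.array([0, 0, 1, 1, 1, 0])              # solution vector of A_i
s_j = np.array([0, 1, 1, 1, 0, 0])              # solution vector of A_j
psi = confusion_matrix_prob(s_i, s_j, 2, 2)
print(psi)                                      # each row sums to 1
```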
Once all the probability confusion matrices, Ψ, have been computed, the second stage of the collaborative step commences. Given the Ψ matrices and the results of the other algorithms from the local step, the consensus function C(x, s) to be estimated for a cluster s of a given clustering algorithm A_i becomes as shown in Equations (6) and (7):

C(x, s) = P(s | s_{x,a_1}, · · · , s_{x,a_J}) ∝ Π_{j≠i} P(s | s_{x,a_j}),    (6)

where s_{x,a_j} is the cluster assigned by algorithm a_j to object x. At this juncture, the other algorithms' solutions are incorporated to estimate the probability of cluster s. The terms P(s | s_{x,a_j}) are assumed to be independent. This assumption enables the use of the Ψ probability confusion matrices computed in the previous step, which yields Equation (7):

C(x, s) = Π_{j≠i} α^{j,i}_{s_{x,a_j}, s}.    (7)
Finally, Equation (3), which is maximised during the collaborative step, becomes Equation (8):

S(x) = arg max_s (1/Z) P(x | s, θ_s) C(x, s),    (8)

where Z is a normalisation constant independent of s, and P(x | s, θ_s) is the local term, a probability function that depends on the type of probability distribution used in the clustering algorithm, whilst C(x, s) is the collaborative term, in the form of a global consensus function between the solutions of all the algorithms. The idea of enhancing the clustering algorithm's objective function is evident from Equation (8): it takes into account both the local solution and those generated by the other algorithms. Only the solution vectors, S, are taken into consideration; the parameters, θ, are excluded.
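A toy numeric sketch of this modified assignment rule, with invented likelihoods and a single collaborator, is:

```python
import numpy as np

# Local term P(x | s, theta_s) for s = 0, 1 (illustrative values)
local_lik = np.array([0.30, 0.70])

# One collaborating algorithm placed x in its cluster 1; its Psi matrix
# (toy values) maps its clusters onto the current algorithm's clusters.
psi_j_to_i = np.array([[0.9, 0.1],
                       [0.2, 0.8]])
C = psi_j_to_i[1]                 # collaborative term: row for s_{x,a_j} = 1

posterior = local_lik * C         # proportional to the Equation (8) score
posterior /= posterior.sum()      # the 1/Z normalisation
print(posterior.argmax())         # cluster chosen under collaboration
```

Here the collaborator's vote reinforces cluster 1, so the local preference and the consensus agree; with conflicting votes the product would shift the assignment.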
This change from θ to S is made possible owing to the use of an alternating maximisation procedure in which the S partitions (solution vectors) are computed from the prototypes, which are then updated based on the partitions and data. The partitions can therefore be considered an estimate of the distributions described by the prototypes. The EM strategy is then used to optimise Equation (8). The workflow in Algorithm 1 shows how EM can be implemented for a given clustering algorithm. During step E, the algorithm's solutions, S, are updated using the fixed θ parameter values of the algorithm, together with the information coming from the solutions of the other algorithms, as expressed by Equation (8). During step M, these parameters are updated based on the new S solutions. The solutions of the algorithm and the probability confusion matrices, Ψ^{i→j}, are then revised based on the updated parameters:

Algorithm 1: Collaborative EM optimisation for a given algorithm A_i
1: while the stopping criterion is not met do
2:   E step: update the solutions S^i using Equation (8) with fixed θ^i
3:   M step: update the parameters θ^i based on the new solutions S^i
4:   Update all Ψ^{i→j} matrices
5: end while

Stopping Criterion
One of the most challenging collaboration problems is knowing when to stop the collaboration. Often, continuing the collaborative step for too long leads to incorrect results; in this case, the collaboration is negative. Defining a stopping criterion that controls the continuity of the collaborative step was therefore necessary. It was defined as the entropy of probabilistic confusion [4] (Equation (9)):

H = − Σ_{i≠j} Σ_{k=1}^{K_i} Σ_{l=1}^{K_j} α^{i,j}_{k,l} ln α^{i,j}_{k,l}.    (9)
This entropy evaluates the pairwise differences between the algorithms. In short, H is the global entropy of the collaboration, under the assumption that all algorithms are independent. The advantage of this entropy is that it uses the values α^{i,j}_{k,l} from the probability confusion matrices of Equation (5), which are already calculated during the collaborative step. On this basis, the global entropy, H, is much less expensive to calculate than any other measure of divergence or consensus. In addition, this type of entropy criterion is consistent with studies that have shown the importance of diversity and entropy in collaborative clustering [36-38].
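An entropy of this kind can be sketched as follows; the exact weighting used in Equation (9) may differ, so this shows only the general form:

```python
import numpy as np

def confusion_entropy(psi_matrices):
    """Sum of Shannon entropies over all pairwise confusion matrices.
    Low values mean the algorithms map cleanly onto each other."""
    h = 0.0
    for psi in psi_matrices:
        p = np.clip(psi, 1e-12, 1.0)    # avoid log(0)
        h -= np.sum(psi * np.log(p))
    return h

perfect = [np.eye(2)]                   # identical clusterings: no confusion
uncertain = [np.full((2, 2), 0.5)]      # maximal confusion between clusters
print(confusion_entropy(perfect) < confusion_entropy(uncertain))
```

The collaborative step continues only while this quantity decreases, which is how negative collaboration is avoided.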

Collaborative Scenarios
In this section, some collaborative scenarios in which the collaborative method could be used are presented, together with the most critical applications of these scenarios and a description of the collaborative workflow in each case.

• Parallel Scenario
Figure 2 illustrates the parallel scenario in collaborative clustering: several clustering algorithms run in parallel and improve each other's results. This process underlies the previously discussed horizontal collaborative method.
The most important applications of this type of collaboration involve several algorithms that operate on (i) different feature spaces, possibly searching for a different number of clusters, i.e., multi-view clustering; (ii) distributed data sets; and (iii) a problematic data set, whether distributed or not. In all three applications, mutual improvement of the results, i.e., the outcome of collaborative clustering, would be beneficial.
This collaborative method can work with all of the previous applications, but in this research the first application is of interest. The qualitative and quantitative features of the heart disease data are processed with suitable clustering algorithms in an attempt to enhance their results by sending and receiving information determined by the collaborative method. The workflow in Algorithm 2 demonstrates how the collaborative process can be implemented with clustering algorithms running in parallel.

Algorithm 2. Collaborative Clustering for the Parallel Scenario
1: Local step:
2: for each A_j ∈ A do
3:     Apply the clustering algorithm A_j on the data X_j
4:     Initialise the local parameters θ_j
5: end for
6: Collaboration step:
7: Compute all Ψ matrices
8: while the global entropy H decreases do
9:     for each A_j ∈ A do
10:        Run the A_j algorithm to be optimised based on Equation (8)
11:        Update the solution vector S_j
12:        Update the local parameters θ_j
13:    end for
14:    Update all Ψ matrices
15: end while

However, one should note that in this case, all of the collaborating algorithms must satisfy the collaborative method hypothesis, i.e., an objective function optimisation similar to Equation (2).
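The parallel workflow described above can be sketched as a simple control loop. This is a hypothetical skeleton, assuming each collaborating algorithm exposes a `local_step` method and a `collaborative_step` method implementing the Equation (8) update, and that `entropy` computes the global entropy H of Equation (9); none of these names come from the paper.

```python
def run_parallel_collaboration(algorithms, entropy):
    """Parallel scenario: each algorithm first optimises locally, then all
    algorithms repeatedly exchange information; the loop stops as soon as
    the global entropy H stops decreasing (the onset of negative
    collaboration)."""
    for a in algorithms:                       # local step
        a.local_step()
    h = entropy(algorithms)
    while True:                                # collaboration step
        for a in algorithms:
            a.collaborative_step([b for b in algorithms if b is not a])
        h_new = entropy(algorithms)
        if h_new >= h:                         # H no longer decreases
            return h_new
        h = h_new
```

The entropy check is what distinguishes this loop from a plain alternating optimisation: it is the stopping criterion that guards against negative collaboration.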

• Reinforcement Scenario
Figure 3 shows the reinforcement scenario, which is another possible scenario that could be handled by the collaborative method. In this case, the information is transmitted from one side only.
In Algorithm 3, the performance of an EM clustering algorithm is enhanced by the use of information from other algorithms. One-sided information transfer is beneficial when the other algorithms are specialised algorithms capable of detecting particular elements of the observed data. In this case, the process is analogous to reinforcement rather than to collaborative learning.
In the current context, since the observed heart disease data were processed using clustering algorithms specialising in qualitative and quantitative features, the collaborative method could be used to reinforce a qualitative algorithm with information from the quantitative algorithms, and vice versa (Figure 3).

However, in this case, the EM algorithm must satisfy the premise of the collaborative method, i.e., an objective function optimisation similar to Equation (2). The remaining clustering algorithms can optimise different objective functions, since only their solution vectors are of interest. Equation (9) is computed for the algorithm being reinforced only, as shown in Equation (10). The workflow in Algorithm 3 demonstrates how the collaborative process can be implemented in the reinforcement scenario.

Algorithm 3. Collaborative Clustering for the Reinforcement Scenario
1: Local step:
2: for each A_j ∈ A do
3:     Apply the clustering algorithm A_j on the data X_j
4:     Initialise the solution vectors S
5: end for
6: Determine the clustering algorithm A_i ∈ A to be optimised
7: Initialise the local parameter θ_i
8: Collaboration step:
9: Compute all Ψ_i→j matrices
10: while the global entropy H decreases do
11:     Run the A_i algorithm to be optimised based on Equation (8)
12:     Update the solution vector S_i
13:     Update the local parameters θ_i
14:     Update all Ψ_i→j matrices
15: end while

Generate and Adapt Collaborative Members
As explained earlier, the majority of clustering algorithms are unable to process mixed data, so specialised clustering algorithms for each data type, selected according to the previously described method limitations, have been used.
We show the clustering algorithms used in our proposed model and how they are adapted to our collaborative approach.

• Gaussian and Bernoulli Mixture Models
The EM algorithm used to estimate the parameters of a mixture model from data makes essential use of the posterior probabilities [39]. In this paper, both Gaussian Mixture Models (GMMs) and Bernoulli Mixture Models (BMMs) were employed in order to model the observed data set. As the qualitative data are binary, a superior fit is obtained from the Bernoulli distribution: rather than applying a Gaussian distribution to the binary values 1 and 0, the BMM was utilised to model the qualitative data after converting them into binary form in the pre-processing phase.
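As an illustration of how a BMM can be fitted to binary qualitative features, a minimal EM sketch is given below. The paper does not specify initialisation, priors or convergence details, so this is only an assumed baseline; the `mu_init` parameter is a convenience we add to keep the example deterministic.

```python
import numpy as np

def fit_bmm(X, k, n_iter=100, seed=0, mu_init=None):
    """Fit a Bernoulli Mixture Model to binary data X (n x d) with EM.

    Returns the mixing weights pi, the Bernoulli parameters mu (k x d)
    and the posterior responsibilities r (n x k).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    pi = np.full(k, 1.0 / k)
    mu = mu_init.copy() if mu_init is not None else rng.uniform(0.25, 0.75, (k, d))
    for _ in range(n_iter):
        # E step: posterior responsibility of each component for each row
        log_p = X @ np.log(mu).T + (1 - X) @ np.log(1 - mu).T + np.log(pi)
        log_p -= log_p.max(axis=1, keepdims=True)   # numerical stability
        r = np.exp(log_p)
        r /= r.sum(axis=1, keepdims=True)
        # M step: re-estimate the weights and the Bernoulli parameters
        nk = r.sum(axis=0)
        pi = nk / n
        mu = np.clip((r.T @ X) / nk[:, None], 1e-6, 1 - 1e-6)
    return pi, mu, r
```

The clipping of `mu` away from 0 and 1 avoids degenerate log-likelihoods, a standard precaution for Bernoulli mixtures.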

• K-Means Clustering
The k-means approach to solving the clustering problem can be viewed as a form of EM: the E step maps each data point to the nearest cluster, and the M step recalculates the centroid of each cluster. The standard k-means algorithm is based on Euclidean similarity. A variant of the k-means algorithm was also employed in which the similarity calculation was changed to the cosine similarity measure. The k-means algorithm is not probabilistic, but it can be considered a degenerate case of the GMM algorithm.
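A minimal sketch of the cosine-similarity variant is shown below, assuming the common spherical k-means formulation (L2-normalise the rows, assign by maximum cosine similarity, renormalise the centroids); the `init` parameter is hypothetical and added only for reproducibility.

```python
import numpy as np

def cosine_kmeans(X, k, n_iter=50, seed=0, init=None):
    """k-means driven by cosine similarity (spherical k-means).

    Rows are L2-normalised, assignment picks the centroid with the highest
    cosine similarity (E step), and each centroid becomes the re-normalised
    mean of its members (M step).
    """
    rng = np.random.default_rng(seed)
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    if init is None:
        centroids = Xn[rng.choice(len(Xn), size=k, replace=False)].copy()
    else:
        centroids = init / np.linalg.norm(init, axis=1, keepdims=True)
    labels = np.zeros(len(Xn), dtype=int)
    for _ in range(n_iter):
        labels = (Xn @ centroids.T).argmax(axis=1)   # E step: assign
        for j in range(k):                           # M step: recentre
            members = Xn[labels == j]
            if len(members):
                c = members.mean(axis=0)
                centroids[j] = c / np.linalg.norm(c)
    return labels, centroids
```

On normalised data the cosine assignment is equivalent to nearest-centroid assignment on the unit sphere, which is why only the similarity measure needs to change.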

• K-modes clustering
The k-modes clustering algorithm was proposed in [40]. It is an extension of the k-means algorithm that can handle categorical features. It inherits the characteristics of the k-means clustering (KMC) algorithm and is efficient and easy to implement; it is therefore widely used in various fields. Since the k-modes algorithm is suitable for clustering categorical data, it was utilised to process the qualitative heart disease data features. However, it can only be used in the reinforcement scenario, as it was not adapted for use with the collaborative method. In this paper, the collaborative method presented can only be applied to GMMs and BMMs; however, adaptation of the technique is possible for algorithms based on alternative probabilistic models.

Mapping Clusters to Class Labels
After applying the clustering techniques to the data set, cluster 0 and cluster 1 are obtained for each clustering model and mapped to the true prediction classes from the data set, i.e., cluster 0 is set to class 0 and cluster 1 to class 1.
The purpose of the mapping process is to shift from clustering to a supervised learning or classification approach. The prediction process for the clustering algorithms then becomes clear: when predicting a new case, if it belongs to cluster 0, it is not infected, and if it belongs to cluster 1, it is infected. After the mapping process, the performance of the clustering models can be evaluated using different classification performance metrics. There is also the possibility of using ensemble techniques in order to combine multiple model predictions into an optimised composite model.
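The mapping step can be sketched as follows. The paper maps cluster 0 to class 0 and cluster 1 to class 1 directly; the majority-vote mapping below is a common generalisation of that step and is shown only as an assumed illustration.

```python
import numpy as np

def map_clusters_to_classes(cluster_ids, true_labels):
    """Assign each cluster the majority true class of its members and
    return class predictions for every observation."""
    mapping = {}
    for c in np.unique(cluster_ids):
        members = true_labels[cluster_ids == c]
        mapping[c] = int(np.bincount(members).argmax())
    return np.array([mapping[c] for c in cluster_ids])
```

Once cluster ids are replaced by class labels, any standard classification metric can be computed against the ground truth.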

Combining Different Models (Ensemble Approach)
The transition to a supervised learning approach produces two sets of models specialised in quantitative and qualitative data modelling, respectively. These models make many decisions. Undoubtedly, one of these models alone cannot be selected, because each specialises in a particular feature-set space. The ensemble approach was therefore adopted to combine the different models and to produce a single composite decision. An ensemble-based system is obtained by blending diverse models, subsequently referred to as classifiers or experts. These systems are also known as multiple classifier systems or ensemble systems [41].
There are many scenarios in which the ensemble approach makes statistical sense. In many applications requiring automated decision making, it is not unusual to receive data from various sources that may provide complementary information. An appropriate combination of this information is known as data or information fusion [41]. It can improve the accuracy of a classification decision compared to a decision based on any of the individual data sources alone [41].
Data fusion clearly fits the proposed model well. The observed heart data set is considered heterogeneous, coming from two different sources representing the quantitative and qualitative features, respectively. Clustering algorithms specialised in processing these data types are used, and they collaborate to improve their results through the previously described collaborative method. Finally, the clusters are assigned to the data set classes, i.e., the transition from clustering to classification is made. This produces a set of models that are experts in modelling the heart data set's features. In order to create a single decision, the decisions made by each expert are grouped by a specific combination rule, which, for the purposes of the current model, is described below.
The ensemble approach used in the model consists of two levels of learning: (i) essential learning and (ii) meta-learning. At the first level, a set of primary models is trained on the features of the observed data sets in order to produce a group of expert models. Once the training is finished, the expert models create a new data set containing the results of their predictions. The meta-learner is then trained using this new data set and is ultimately used to classify new cases. Figure 4 illustrates the ensemble approach used in the model. Any meta-classifier can be used; however, LR was used, as it performs well with binary classification problems. It is trained on the results of the expert models' predictions for both the quantitative and qualitative data types and is used to predict the final decision.
The current strategy of training a meta-classifier using a combination rule is in keeping with the stacking approach. The main difference between them is that cross-validation is usually used to prepare the first-level or base models in stacking, whereas in the current approach they are trained on different subsets of the data set's features, i.e., on different feature spaces.
The basic idea of using a two-level ensemble approach to learning is to check whether or not the data sets were learned correctly. For example, if a given model learned a particular region of the feature space incorrectly and thus consistently misclassified cases coming from that region, a second-level meta-model could potentially learn that behaviour, together with the learned behaviours of the other models. The improper training of the original model can then be corrected.
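The second learning level can be sketched with a small logistic regression trained on the expert predictions. This is a sketch using plain gradient descent, not the paper's training setup; in practice a library implementation of LR would typically be used instead.

```python
import numpy as np

def train_meta_classifier(expert_preds, y, lr=0.5, n_iter=500):
    """Train a logistic regression meta-classifier on the expert models'
    predictions (one column per expert) with plain gradient descent on
    the cross-entropy loss."""
    X = np.column_stack([np.ones(len(y)), expert_preds])  # prepend bias
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ w))      # predicted probabilities
        w -= lr * X.T @ (p - y) / len(y)      # gradient step
    return w

def predict_meta(w, expert_preds):
    """Final decision: threshold the meta-classifier's probability."""
    X = np.column_stack([np.ones(len(expert_preds)), expert_preds])
    return (1.0 / (1.0 + np.exp(-X @ w)) >= 0.5).astype(int)
```

The meta-classifier can thereby learn to weight reliable experts up and systematically wrong experts down, which is the correction mechanism described above.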

Experimental Results
In this section, the experiments carried out to test the proposed framework for predicting heart disease in both the reinforcement and parallel scenarios are described. For each scenario, the experimental setting is given and the obtained outcomes are discussed. Since the proposed model is a hybrid of supervised and unsupervised learning techniques, the evaluation metrics presented below were applied for both clustering and classification.


Performance Measures
The evaluation of the clustering result quality is referred to as the evaluation of its validity, for which external and internal indicators are utilised. The external validation index is the most common validation method used in clustering. It is based on prior knowledge of the data and measures the similarity of the clustering results to external ground truth information. Hence, any valid similarity metric suitable for partition comparison can be used as an external indicator [42].
In contrast, the internal validation index relies only on the information in the data, without any additional information. Most internal validation indicators are based on two criteria, i.e., cohesion and separation [43]. Cohesion is defined as a measure of how close the objects within a cluster are; it is often measured by the variance, with a lower variance indicating better cohesion. Separation is a measure of how well separated a cluster is from the other clusters; it is usually measured by the distance between the cluster centroids.
In order to assess the current model, the Rand index and purity score were utilised as external validation indicators, and the Davies-Bouldin (DB) index was used as the internal validation index.
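Of these indicators, the purity score is the simplest to state: each cluster is credited with its most frequent true class, and purity is the fraction of points covered by those majority classes. A minimal sketch (the function name is ours):

```python
import numpy as np

def purity_score(true_labels, cluster_ids):
    """Purity: for each cluster, count its most frequent true class;
    the score is the total of these counts over the number of points.
    A value of 1.0 means every cluster is class-pure."""
    total = 0
    for c in np.unique(cluster_ids):
        members = true_labels[cluster_ids == c]
        total += np.bincount(members).max()
    return total / len(true_labels)
```

The Rand index and the Davies-Bouldin index follow their standard definitions and are widely available in machine learning libraries.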
The classification performance refers to how well a classification model can correctly predict the class labels of a given data set.The following metrics were applied as performance measures in order to appraise the proposed model:

• Accuracy, which measures the proportion of correctly classified instances out of the total number of instances (Equation (11));
• Precision, which measures the proportion of true positives out of all the instances classified as positive (Equation (12));
• Recall or sensitivity, which is defined as the proportion of true positives out of all the positive instances (Equation (13));
• F1 score, which combines precision and recall into a single value and provides a balanced measure of a model's accuracy (Equation (14)).
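Under the usual TP/TN/FP/FN definitions, Equations (11)-(14) can be sketched for binary labels as:

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels, computed
    from the TP/TN/FP/FN counts."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1
```

The zero-denominator guards are a conventional choice for degenerate cases (e.g., no positive predictions) and are an assumption, not part of Equations (11)-(14).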

Experimental Setting
As mentioned earlier, in the reinforcement scenario, one of the clustering algorithms is improved by augmenting it with information from the remaining algorithms, such that only a single clustering algorithm is optimised. In order to be enhanced by the other clustering algorithms, this algorithm must satisfy the proposed collaborative method hypothesis, i.e., an objective function optimisation. However, the remaining clustering algorithms can optimise different objective functions, which enables the use of various clustering algorithms. In these experiments, the collaborative clustering framework was evaluated in this scenario. The experiments were divided according to the algorithm selected for reinforcement: algorithms specialised in each data type were chosen and then reinforced with the results of the other algorithms. The following experiments were performed:

1. Choosing an algorithm specialised in processing quantitative data and enhancing it through the results of algorithms specialised in processing qualitative data only.

2. Choosing an algorithm specialised in processing quantitative data and enhancing it through the results of algorithms specialised in processing both qualitative and quantitative data.
3. Choosing an algorithm specialised in processing qualitative data and enhancing it through the results of algorithms that only process quantitative data.
4. Choosing an algorithm specialised in processing qualitative data and enhancing it through the results of algorithms specialised in processing both quantitative and qualitative data.
The proposed model was initially evaluated after the local step. For all the clustering algorithms, the previously described internal and external validation indicators were measured, and their clusters were mapped to the true class labels and assessed using the classification performance metrics. The meta-classifier (the LR model), which collects the decisions and produces the final decision, was subsequently trained and evaluated. Finally, the efficacy of the collaborative method was assessed by comparing the output of the selected clustering algorithm and the meta-classifier prior to and following the collaborative stage, in order to identify any benefit from the collaboration.

Results Discussion
Table 5 shows the clustering algorithm validation indicators and the classification performance measures, which were assessed prior to the collaborations and following the mapping phase, respectively. The performances of the clustering algorithms specialised in qualitative data are often higher than those specialised in quantitative data. This does not mean that these algorithms are superior to the other algorithms; rather, it implies that, for determining CVD incidence, the qualitative variables have a higher weight. Consequently, it is possible to improve the performance of the clustering algorithms specialised in quantitative data processing through the collaborative method, which gives them insight into the qualitative data clustering, and vice versa.
In Table 5, one model of each of the quantitative data algorithm models from each of the qualitative data algorithms, i.e., k-modes, K-means cosine (1) and BMM, as well as two models from each of the k-modes and BMM qualitative data algorithms, are presented.
It is also clear that the K-means cosine (1) and BMM (1) algorithms are the weakest algorithms specialised in processing quantitative and qualitative data, respectively. The remainder of this subsection therefore focuses on the effectiveness of the collaborative method in improving these algorithms.
Table 6 shows the evaluation of the performance of the K-means cosine (1) algorithm after enhancing it with the results of the algorithms specialised in qualitative data processing only. Table 7 shows its evaluation after enhancement with the outcomes of both the algorithms specialised in qualitative data and those specialised in quantitative data. In both tables, the performance of the meta-classifier is evaluated after the reinforcement process. The results in Table 6 indicate that there is a significant improvement after the reinforcement process, as the improvement in the purity index is 17%. In addition, there is a rise in the Rand index and a minimal increase in the DB index. The Rand index is better when it is close to 1, whereas the DB index is not normalised and is better when it is smaller. A significant improvement can also be seen in the classification performance metrics after the reinforcement process, as the improvement in the F1 score reached 19%. A good accuracy was achieved following training once the reinforcement process was complete. The data in Table 7 indicate that there is a further increase in performance, as the purity index improvement is 18% and there is an increase in the Rand index, although the DB index remained practically unchanged. The performance of the meta-classifier remained constant in both tables. It can be concluded from both tables that enhancing the algorithm specialised in processing quantitative data with information from algorithms specialised in processing qualitative data, or with inputs from algorithms specialised in both data types, has a significant impact on improving the result quality. This proves the strength of the proposed framework in the reinforcement scenario. The performance of the meta-classifier remains constant because it depends on the results of all the clustering algorithms shown in Table 5; its performance is therefore unaffected by the improvement in the results of only a single algorithm.
Tables 8 and 9 contain the data describing the performance evaluation of the BMM (1) algorithm following its enhancement by algorithms specialised in processing quantitative data only and by a combination of quantitative and qualitative algorithms, respectively. As in the previous experiments, the performance of the meta-classifier is assessed after the reinforcement process. The data in both tables demonstrate a significant improvement after the reinforcement process, with an improvement in the purity index of 9% and an increase in the Rand index in both cases. In contrast, the DB internal validation index deteriorates, as its value increased from its local-step value in both tables. There was a significant improvement in the classification performance metrics after the reinforcement process, as the improvement in the F1 score reached 15%. A slight decrease in the performance of the meta-classifier model is also evident, with a 1% fall in the F1 score compared to the previous experiment.
It can be concluded from both tables that reinforcing an algorithm specialised in qualitative data with quantitative algorithms only, or with quantitative and qualitative algorithms together, has a similar effect in terms of improving its results. Although the reinforcement process relies on weak-performing quantitative algorithms to improve a higher-performing algorithm, the stopping criterion in the collaborative method halts the reinforcement process at the onset of negative collaboration, which proves the strength of the proposed reinforcement framework in detecting negative collaboration.

Experimental Setting
As mentioned earlier, in the parallel scenario, all collaborating algorithms must satisfy the proposed collaborative method hypothesis. This section presents the experiments performed to assess the collaborative approach of the proposed model in the parallel scenario. To this end, the experimental setup is similar to that described in the previous section, with the exception that the number of collaborating algorithms is increased by generating multiple algorithms with different random initialisations in order to achieve diversity in the solutions.
In this scenario, the experiments are divided according to the algorithm specialisation for collaboration: the first and second experiments represent the collaboration, with each other, of algorithms specialised in only quantitative and only qualitative data processing, respectively. Finally, the third experiment represents the collaboration of all the qualitative and quantitative algorithms. The method used to assess the proposed model is identical to that described in the preceding section.

Results Discussion
The performance data for the clustering algorithms after the local step are presented in Table 10, and include six models specialising in quantitative data and three models specialising in qualitative data. There is diversity in the performance of all the algorithms, which provides diversity in the solutions.
The evaluation of the performance data for the quantitative clustering models after collaboration with each other is shown in Table 11. Two models for each algorithm, i.e., K-means, K-means cosine, GMM, k-modes and BMM, were created in order to achieve diversity. The evaluation of the qualitative clustering models after their collaboration is presented in Table 12, and the assessment of the quantitative and qualitative clustering models after collaboration with each other is shown in Table 13.
Numerous differences can be observed in the DB internal validation index results. This could be explained by the fact that, although the proposed collaborative framework aims to optimise all outcomes, weaker algorithms often negatively impact the outcomes of the best collaborators. However, on average, the collaboration scores of the DB index remain positive, proving the strength of the proposed collaborative framework. The collaborative phase achieved superior results compared to the algorithms generated by the local phase. The concept of the best result in unsupervised learning is highly dependent on the index considered; it is impossible to know in advance which algorithm will give the best results. In most cases, the presented approach therefore remains valid. Given the issue of strong and weak collaborators, future implementations of this work will focus on improving the collaborative process by balancing the influence of the collaborators on each other based on quality and diversity criteria, as well as incorporating external knowledge into the collaborative approach. The objective will be to reduce instances of negative collaboration. The proposed algorithm can be used in other applications in several fields, such as health and engineering, as well as in an ablation study using only two clustering algorithms.

Figure 1 .
Figure 1.The general framework of the proposed model.


Algorithm 2 .
Collaborative Clustering for the Parallel Scenario.

Algorithm 3 .
Collaborative Clustering for the Reinforcement Scenario.

Table 1 .
Description of data set.

Table 5 .
Evaluation of the performance of all the algorithms after the local step.

Table 6 .
Reinforcement of the K-means cosine (1) algorithm by algorithms specialised in qualitative data processing.

Table 7 .
Reinforcement of the K-means cosine (1) algorithm by both qualitative and quantitative algorithms.

Table 8 .
Reinforcement of the BMM (1) algorithm by algorithms specialised in quantitative data processing.

Table 9 .
Reinforcement of the BMM (1) algorithm by both quantitative and qualitative algorithms.