Similarity-Based Three-Way Clustering by Using Dimensionality Reduction

Abstract: Three-way clustering describes a cluster by a core region and a fringe region, which together divide the dataset into three parts. This division helps identify the dense core and the sparse outer region of a cluster. One of the main challenges in three-way clustering is constructing these two sets in a meaningful way. To handle high-dimensional data and improve the stability of clustering, this paper proposes a novel three-way clustering method. The proposed method uses dimensionality reduction techniques to reduce data dimensions and eliminate noise. On the reduced dataset, random sampling and feature extraction are performed multiple times to introduce randomness and diversity, enhancing the algorithm's robustness. Ensemble strategies are applied to these subsets, and the k-means algorithm is used to obtain multiple clustering results. From these results, we compute the co-association frequency between samples and obtain a fused clustering result using the single-linkage method of hierarchical clustering. To describe the core region and fringe region of each cluster, the similar class of each sample is defined by the co-association frequency, and the lower and upper approximations of each cluster are obtained from these similar classes. The samples in the lower approximation of a cluster form its core region, and the difference between the upper and lower approximations forms its fringe region. A three-way description of each cluster thus arises naturally. Experiments on various UC Irvine Machine Learning Repository (UCI) datasets, evaluated with Normalized Mutual Information (NMI), Adjusted Rand Index (ARI), and Accuracy (ACC), show that the proposed strategy is effective in improving the structure of clustering results.


Introduction
As an unsupervised technique in data mining and machine learning, cluster analysis is widely used in areas such as attribute reduction [1][2][3][4], feature selection [5][6][7], image processing [8,9], information granulation [10][11][12], and graph convolutional neural networks [13][14][15]. The primary objective of clustering is to organize heterogeneous data into meaningful groups based on their similarities, revealing the inherent structures and patterns within the dataset. To achieve this, various clustering algorithms [16] have been developed. However, it is widely accepted that no single clustering algorithm can handle all types of data distributions effectively, and different algorithms, or different parameters for the same algorithm, may lead to different clustering results. To enhance the robustness and stability of clustering, researchers have proposed ensemble clustering methods. Compared with single clustering methods, ensemble clustering methods [17][18][19][20][21][22] integrate the results of multiple base clusterings, yielding more stable, robust, and accurate clustering solutions. Nevertheless, existing ensemble clustering methods typically adopt a hard clustering strategy, in which an element either belongs to a cluster or does not, and clear boundaries exist between clusters. When information about the data samples is insufficient, hard clustering algorithms often lead to higher decision risks.
To address this issue, three-way decision theory [23,24] was introduced to describe uncertain information. It divides the universe of samples into three mutually exclusive regions and adopts a different decision strategy for each region [25,26]. The three-way decision framework can be integrated with various computational models for learning under uncertainty, such as rough set theory [27][28][29], Bayesian networks [30,31], and fuzzy particle swarm optimization [32,33]. Inspired by three-way decision, Yu [34] presented the framework of three-way clustering, which uses a core region and a fringe region to characterize a cluster. These two sets partition the sample space into three parts, which capture three kinds of relationships between objects and a cluster, namely, belonging to, partially belonging to, and not belonging to [35][36][37][38].
Recently, three-way clustering [39] has attracted widespread research interest, leading to the development of various three-way clustering algorithms within this theoretical framework. Wang and Yao [40] proposed a three-way clustering framework called CE3, derived from the erosion and dilation operations of mathematical morphology. Li et al. [41] introduced sample stability to identify and establish relationships in ensemble clustering. Yu et al. [42] proposed an efficient three-way clustering algorithm based on the idea of universal gravitation. Jia et al. [43] developed an automatic three-way clustering approach by combining a threshold selection method with a cluster number selection method. Wang et al. [44] proposed a three-way adaptive density peak clustering (3W-ADPC) method by integrating natural nearest neighbors with density peak clustering (DPC).
Most existing three-way clustering algorithms operate on the original dataset, which makes them ill-suited to high-dimensional data. Processing high-dimensional data is a fundamental yet highly challenging problem in data science. The purpose of dimensionality reduction is to decrease the dimensionality of the data while retaining its most significant characteristics. Reducing dimensionality simplifies data analysis, speeds up model training, reduces storage requirements, and makes the model's results easier to interpret. Commonly used dimensionality reduction techniques include Principal Component Analysis (PCA) [45][46][47], spectral clustering [48,49], factor analysis [50], and multidimensional scaling [51].
By integrating dimensionality reduction into three-way clustering, this paper presents an ensemble three-way clustering algorithm based on dimensionality reduction. The proposed method uses dimensionality reduction techniques to reduce data dimensions and eliminate noise. On the reduced dataset, random sampling and feature extraction are performed multiple times to introduce randomness and diversity, enhancing the algorithm's robustness. Ensemble strategies are applied to these subsets, and the k-means algorithm is used to obtain multiple clustering results. From these results, the frequency with which each pair of data points is assigned to the same cluster is computed, giving the co-occurrence (co-association) frequency. Data points whose co-occurrence frequency exceeds a given threshold are defined as similar. Finally, a three-way clustering approach is introduced based on the proposed similarity relation. The main contributions of this research are as follows:
(1) Ensemble three-way clustering framework based on dimensionality reduction: We introduce a novel ensemble three-way clustering framework that combines dimensionality reduction with clustering ensemble methods. This framework reduces data dimensions, eliminates noise, and enhances clustering stability. By leveraging multiple clustering results, the method gains robustness through randomness and diversity.
(2) Integration of co-occurrence frequency, hierarchical clustering, and lifecycle analysis: The proposed method calculates the co-occurrence frequency of data points being assigned to the same cluster, which helps define similar classes accurately. It employs single-linkage hierarchical clustering to fuse the clustering results and constructs a dendrogram based on these frequencies. By analyzing the lifecycle (lifetime) of clusters in the dendrogram, we determine the most stable clustering result, ensuring robustness and consistency.
These contributions collectively enhance the performance and applicability of three-way clustering algorithms, especially for high-dimensional datasets, providing a more accurate and stable clustering solution.
The remainder of this paper is organized as follows. Section 2 reviews the concepts related to three-way clustering, the k-means algorithm, PCA, hierarchical clustering, and clustering ensemble strategies. Section 3 describes the methodology and algorithmic process of this study. Section 4 presents the results and performance metrics of the proposed algorithm on UCI datasets. Section 5 discusses our findings and identifies areas for future improvement.

Three-Way Clustering
Traditional hard clustering depicts a cluster by a single set with a sharp boundary. Only two relationships between a sample and a cluster are considered, i.e., belonging to and not belonging to: samples inside the cluster belong to it, and samples outside the cluster are not its elements. Given a dataset $X = \{x_1, x_2, \ldots, x_n\}$ with $n$ samples and $k$ clusters, the result of traditional hard clustering can be represented as

$$\mathbb{C} = \{C_1, C_2, \ldots, C_k\}, \qquad C_i \neq \emptyset, \quad \bigcup_{i=1}^{k} C_i = X, \quad C_i \cap C_j = \emptyset \ (i \neq j).$$

In traditional clustering, each sample is unequivocally assigned to one cluster, and there are clear boundaries between different clusters. This two-way description of a cluster may not adequately capture the uncertainty in data. To address this limitation, Yu [34,52] proposed three-way clustering, which defines three types of membership relations between a sample and a cluster, namely, belonging to fully, belonging to partially, and not belonging to. Three-way clustering uses the core region $Co(C_i)$ and the fringe region $Fr(C_i)$ to depict a cluster, and these two sets split the universe $U$ into three sections, $Co(C_i)$, $Fr(C_i)$, and $Tr(C_i) = U - Co(C_i) - Fr(C_i)$, which satisfy the following conditions:

$$Co(C_i) \cup Fr(C_i) \cup Tr(C_i) = U, \quad Co(C_i) \cap Fr(C_i) = \emptyset, \quad Co(C_i) \cap Tr(C_i) = \emptyset, \quad Fr(C_i) \cap Tr(C_i) = \emptyset.$$

The three-way clustering result of dataset $X$ is expressed as

$$\mathbb{C} = \{(Co(C_1), Fr(C_1)), (Co(C_2), Fr(C_2)), \ldots, (Co(C_k), Fr(C_k))\}.$$
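To make the three-way representation concrete, the following Python sketch checks whether a list of (core, fringe) pairs forms a valid three-way clustering of a universe. It assumes the conditions commonly imposed in Yu's framework: every core region is non-empty, the core and fringe of a cluster are disjoint, core regions of different clusters are pairwise disjoint, and the union of all cores and fringes covers the universe. The function name `is_three_way_clustering` is hypothetical, and samples are represented by integer indices.

```python
def is_three_way_clustering(universe, clusters):
    """Check a list of (core, fringe) set pairs against the assumed three-way conditions."""
    universe = set(universe)
    covered = set()
    for i, (core, fringe) in enumerate(clusters):
        # Each core must be non-empty and disjoint from its own fringe.
        if not core or core & fringe:
            return False
        covered |= core | fringe
        # Core regions of different clusters must not overlap.
        for other_core, _ in clusters[i + 1:]:
            if core & other_core:
                return False
    # Cores and fringes of all clusters together must cover the universe exactly;
    # the trivial region Tr(C_i) of each cluster is then U - Co(C_i) - Fr(C_i).
    return covered == universe
```

A fringe may be shared by several clusters, which is exactly what distinguishes this representation from a hard partition.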

PCA Dimensionality Reduction
As a powerful tool in the realm of data analysis, PCA [47] (Principal Component Analysis) offers a systematic approach to reduce the dimensionality of data while retaining the significant variance within the dataset.This not only makes data easier to visualize but also enhances the efficiency of subsequent analytical techniques.The fundamental idea of PCA involves a linear transformation that maps the original data onto a new coordinate system.The selection of this new coordinate system aims to maximize the variance of the data along specific axes.By choosing the first few principal components, the data can be projected onto these components, achieving dimensionality reduction.
In the computational process, the first step is to calculate the covariance matrix of the original data. Eigenvalue decomposition of the covariance matrix then yields its eigenvalues and eigenvectors. The eigenvectors associated with the largest eigenvalues are selected to form the new coordinate system; these are the principal components. Finally, projecting the original data onto these principal components yields the reduced-dimensional data. Figure 1 illustrates the fundamental principle of PCA for dimensionality reduction. In Figure 1, the original distribution of the dataset is shown on the plane, where the red and black dots represent different classes. Through PCA, these points are projected onto the principal component directions in the reduced-dimensional space, resulting in a new distribution of the data. This process maps high-dimensional data into a lower-dimensional space while retaining the essential features of the original data.
Applying PCA for dimensionality reduction preserves the crucial features of the data while reducing its dimensionality, which improves the computational efficiency of subsequent analyses.

K-Means Algorithm
The k-means algorithm [53] is a widely used clustering method that partitions a dataset into $k$ clusters such that samples in the same cluster are highly similar and samples in different clusters are dissimilar. The main idea of k-means is to determine the positions of the cluster centers by minimizing a loss function based on the Euclidean distance between samples and cluster centers. Specifically, the algorithm starts by randomly selecting $k$ sample points as initial cluster centers. It then iterates two steps: assigning each sample to the closest cluster center in Euclidean distance, and updating each cluster center as the mean of the samples assigned to it. This process repeats until the cluster centers no longer change significantly, which signals convergence of the loss function. The loss function is

$$J = \sum_{i=1}^{n} \sum_{j=1}^{k} w_{ij} \, \lVert x_i - \mu_j \rVert^2,$$

where $w_{ij}$ is an indicator that equals 1 if sample $x_i$ is assigned to cluster $j$ (with center $\mu_j$) and 0 otherwise. By minimizing this loss function, the k-means algorithm finds (locally) optimal cluster center positions.
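The two alternating steps and the loss $J$ can be sketched in NumPy as follows. This is a minimal illustration of Lloyd's iteration under the assumptions stated in the comments (random initial centers drawn from the samples, squared Euclidean distance, a fixed iteration cap); the function name `kmeans` and its parameters are hypothetical and not taken from the paper.

```python
import numpy as np

def _assign(X, centers):
    # Squared Euclidean distance from every sample to every center, shape (n, k).
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)  # w_ij = 1 for the nearest center j, 0 otherwise
    return labels, d2[np.arange(len(X)), labels]

def kmeans(X, k, n_iter=100, tol=1e-8, seed=0):
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialization: k distinct samples chosen at random as the initial centers.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels, _ = _assign(X, centers)  # assignment step
        # Update step: move each center to the mean of its assigned samples
        # (an empty cluster keeps its previous center).
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        shift = np.abs(new_centers - centers).max()
        centers = new_centers
        if shift < tol:  # centers no longer change significantly
            break
    labels, sq_dists = _assign(X, centers)
    loss = sq_dists.sum()  # J = sum_i sum_j w_ij * ||x_i - mu_j||^2
    return labels, centers, loss
```

For example, `kmeans([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], k=2)` separates the two groups of three points. Because the result depends on the initial centers, the ensemble described later deliberately runs k-means many times.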

Hierarchical Clustering
Hierarchical clustering builds a tree-like structure (dendrogram) to represent the nested grouping of data points. It can be divided into agglomerative and divisive methods [54]. Agglomerative hierarchical clustering starts with each data point as an individual cluster and iteratively merges the closest pair of clusters until a single cluster is formed. Conversely, divisive hierarchical clustering starts with the whole dataset as a single cluster and recursively splits it into smaller clusters. A well-known linkage criterion is single linkage, which defines the distance between two clusters as the minimum distance between any pair of points from the two clusters. This method is effective in identifying clusters with irregular shapes.
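The single-linkage criterion can be written directly: the distance between two clusters is the smallest pairwise Euclidean distance between their members. The NumPy sketch below is illustrative only; `single_link_distance` is a hypothetical helper, not code from the paper.

```python
import numpy as np

def single_link_distance(A, B):
    """Single-linkage distance between clusters A (m x d) and B (r x d):
    the minimum Euclidean distance over all pairs (a, b) with a in A, b in B."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    diffs = A[:, None, :] - B[None, :, :]          # shape (m, r, d)
    pairwise = np.sqrt((diffs ** 2).sum(axis=2))   # shape (m, r)
    return float(pairwise.min())

# Two elongated clusters on a line: only their closest members matter.
A = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
B = [[5.0, 0.0], [9.0, 0.0]]
print(single_link_distance(A, B))  # 3.0
```

Because only the closest pair counts, single linkage can follow chains of nearby points. This is why it handles irregular shapes, and also why it is sensitive to noisy points that bridge two clusters.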

Clustering Ensemble and Co-Association Frequency
Although many clustering methods exist, it is widely accepted that no single clustering method can identify all kinds of data distributions. To address this problem, Strehl and Ghosh [16] proposed the cluster ensemble framework, which combines multiple clustering results of a set of objects into a single clustering result without accessing the original features of the objects. The framework of clustering ensemble is depicted in Figure 2.
The aim of clustering ensemble [55] is to consolidate multiple independent clustering results into a comprehensive outcome, overcoming potential biases introduced by individual clustering algorithms. The rise of clustering ensemble has given birth to various methods, such as the voting-merging approach proposed by Hornik [56]. This method uses an unsupervised voting mechanism to combine the members of the ensemble and then merges them to derive a final clustering result that is more reliable and stable. Across a family of clustering results on a dataset, qualitative observation shows three types of relationships between two samples: they may always be assigned to the same cluster, they may be assigned to the same cluster only occasionally, or they may never be assigned to the same cluster. To quantify how often a pair of samples changes groups, Li et al. [41] introduced a measure named co-association frequency, computed from the results of a family of clusterings.
Given a dataset $X = \{x_1, x_2, \ldots, x_n\}$ with $n$ samples and a family of clustering results $C_1, C_2, \ldots, C_L$ on $X$, we use $C_l(x_i)$ to denote the label assigned to $x_i$ by clustering result $C_l$. The co-association frequency $p_{ij}$, which measures how often two samples $x_i$ and $x_j$ appear in the same cluster, is calculated by

$$p_{ij} = \frac{1}{L} \sum_{l=1}^{L} \delta\big(C_l(x_i), C_l(x_j)\big),$$

where

$$\delta(a, b) = \begin{cases} 1, & a = b, \\ 0, & a \neq b. \end{cases}$$

We use an example to illustrate $p_{ij}$. Figure 3 shows a dataset $X$ with 6 samples and four clustering results $C_1, C_2, C_3, C_4$ of $X$. The samples $x_1$ and $x_2$ remain in the same cluster across all results, so their co-association frequency is $p_{12} = 1$. In contrast, $x_1$ and $x_3$ are assigned to the same cluster only in $C_1$ and $C_2$, so $p_{13} = 2/4 = 0.5$. The samples $x_1$ and $x_5$ are grouped into different clusters in all four clustering results, so $p_{15} = 0$.
According to the above definition, the co-association frequency matrix of Figure 3 is given in Table 1.

Table 1. The co-association frequency matrix of Figure 3.

Co-association frequency [57,58] measures the probability that two data samples are assigned to the same cluster across multiple clustering results. If two samples are assigned to the same cluster in every clustering result, their co-association frequency is 1; if they are never assigned to the same cluster, it is 0. Computing this frequency for all pairs of data points yields a co-association frequency matrix, which encodes the similarity between data points. By setting a threshold on the co-association frequency, samples whose frequencies exceed the threshold are grouped into the same similarity class. This approach integrates information from multiple clustering runs rather than relying on a single clustering result, giving a more comprehensive view of the data structure.
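The co-association frequency can be computed for all pairs at once by comparing label vectors. In the NumPy sketch below, each row of `labelings` holds the labels $C_l(x_1), \ldots, C_l(x_n)$ of one base clustering. The four label vectors are hypothetical, chosen only so that $p_{12} = 1$, $p_{13} = 0.5$ and $p_{15} = 0$ as in the example; they are not the actual data of Figure 3, and `co_association` is an illustrative name.

```python
import numpy as np

def co_association(labelings):
    """Return the n x n matrix P with P[i, j] = (1/L) * sum_l delta(C_l(x_i), C_l(x_j))."""
    labelings = np.asarray(labelings)                        # shape (L, n)
    same = labelings[:, :, None] == labelings[:, None, :]    # (L, n, n), True where labels agree
    return same.mean(axis=0)                                 # average over the L clusterings

# Hypothetical labels for 6 samples under 4 base clusterings (one row per clustering).
labelings = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 2, 2],
    [0, 0, 1, 1, 2, 2],
    [0, 0, 1, 2, 2, 2],
]
P = co_association(labelings)
print(P[0, 1], P[0, 2], P[0, 4])  # 1.0 0.5 0.0
```

Labels are only compared within the same clustering, so label names need not match across base clusterings, which is what makes co-association convenient for ensembles.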

Similarity-Based Three-Way Clustering by Using Dimensionality Reduction
In this section, building on similarity theory [43,59], we propose a similarity-based three-way clustering method that uses data dimensionality reduction. Unlike traditional algorithms, our approach first uses PCA to preprocess the data, transforming high-dimensional data into low-dimensional data. It then adopts an ensemble strategy: subsets of features are randomly extracted from the samples over multiple iterations, and the traditional k-means algorithm generates diverse base clustering results. Subsequently, we calculate the co-association frequency between samples to derive similarity classes. Because only a subset of the features is used in each run, the computational complexity is significantly lower than that of existing traditional ensemble clustering methods. The proposed algorithm involves three main steps: generating base clusterings from the dimensionality-reduced data, computing the co-association frequency and the similarity classes, and integrating these results into a three-way clustering.

Dimensionality Reduction
In this study, we employed data dimensionality reduction techniques, specifically utilizing Principal Component Analysis (PCA) to reduce the dimensions of the data.PCA is a commonly used dimensionality reduction method, aiming to map the original data onto a lower-dimensional subspace while retaining the maximum variance in the data.Through PCA, we can transform high-dimensional data into lower-dimensional space, thereby enhancing our understanding of the intrinsic structure of the data.
To begin with, consider a dataset comprising $n$ samples and $D$ features, represented by a matrix $X$ in which each row corresponds to a sample and each column to a feature. Our objective is to project this $D$-dimensional dataset onto a $K$-dimensional subspace (where $K < D$) and obtain a new feature matrix $Z$. The steps of dimensionality reduction by PCA are as follows:
Step 1: Data centering. The original data are centered by subtracting the mean of each feature, giving the centered matrix $X'$.
Step 2: Covariance matrix computation. The covariance matrix describes the correlations between data features:
$$\Omega = \frac{1}{n-1} X'^{\top} X'.$$
Step 3: Eigenvalue and eigenvector computation. Eigenvalue decomposition of the covariance matrix $\Omega$ yields eigenvalues $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_D$ and the corresponding eigenvectors $v_1, v_2, \ldots, v_D$.
Step 4: Selection of the top $K$ eigenvectors. The eigenvectors corresponding to the $K$ largest eigenvalues are chosen to form the projection matrix $V = [v_1, v_2, \ldots, v_K]$.
Step 5: Data projection. The centered data matrix $X'$ is projected onto the selected $K$-dimensional subspace, giving the reduced feature matrix
$$Z = X' V,$$
where each row represents a sample and each column a reduced feature.
Through these steps, we obtain the reduced-dimensional data matrix, on which the foundational clustering is performed. Clustering in this lower-dimensional space preserves the primary features of the data while reducing computational complexity; it also facilitates visualization and can improve clustering effectiveness.
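Steps 1 to 5 can be sketched in NumPy as follows. This is a minimal illustration, assuming the $1/(n-1)$ covariance scaling (the scaling does not change the eigenvectors) and using `numpy.linalg.eigh`, which applies to symmetric matrices and returns eigenvalues in ascending order; the function name `pca_reduce` is hypothetical.

```python
import numpy as np

def pca_reduce(X, K):
    """Project an n x D data matrix X onto its top-K principal components."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                   # Step 1: centered matrix X'
    omega = Xc.T @ Xc / (X.shape[0] - 1)      # Step 2: D x D covariance matrix
    eigvals, eigvecs = np.linalg.eigh(omega)  # Step 3: eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:K]       # Step 4: indices of the K largest eigenvalues
    V = eigvecs[:, top]                       #         projection matrix V (D x K)
    Z = Xc @ V                                # Step 5: Z = X' V (n x K)
    return Z, eigvals[top]

# Points on the line y = 2x: a single component captures all of the variance.
X = [[1, 2], [2, 4], [3, 6], [4, 8]]
Z, lam = pca_reduce(X, K=1)
```

The sign of each eigenvector is arbitrary, so each column of `Z` is determined only up to sign; this does not affect distance-based clustering.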
Next, we randomly select subsets of the features to obtain different clustering results. For a multidimensional dataset, different subsets of features describe the dataset from different views, so a set of diverse clustering results is obtained when distinct subsets of features are used. Specifically, we randomly extract a subset of the features and apply k-means to divide the dataset into $k$ clusters. Repeating this process $L$ times yields multiple clustering results $C_1, C_2, \ldots, C_L$. The process of foundational clustering based on data dimensionality reduction is outlined in Algorithm 1.

Algorithm 1: Foundational Clustering Based on Data Dimensionality Reduction
Input: Dataset $X$, number of clusters $k$, number of base clusterings $L$.
Output: Basic clustering results $C_1, C_2, \ldots, C_L$ and reduced-dimensional dataset $Z$.
1 Apply the dimensionality reduction method to $X$ and obtain the reduced-dimensional dataset $Z$.
2 For $l = 1$ to $L$ do
3     Randomly extract a subset of features from $Z$.
4     Apply k-means to the extracted features to obtain the clustering result $C_l$.
5 End for
6 Return $C_1, C_2, \ldots, C_L$ and $Z$.
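A compact sketch of Algorithm 1 follows. It is illustrative only: the paper does not state how many reduced features are drawn per run, so `n_features` is an assumed parameter, and the helper names are hypothetical. PCA and k-means are re-implemented inline with NumPy (`numpy.cov` and `numpy.linalg.eigh`) so the block is self-contained.

```python
import numpy as np

def _kmeans(X, k, rng, n_iter=100):
    """Plain Lloyd iterations; initial centers are k distinct random samples."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):  # converged: the assignment above is final
            break
        centers = new
    return labels

def foundational_clusterings(X, k, L, n_components, n_features, seed=0):
    """Sketch of Algorithm 1: returns an (L x n) array of base labels and the reduced data Z."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    # Dimensionality reduction: center, eigendecompose the covariance, keep the top components.
    Xc = X - X.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    Z = Xc @ eigvecs[:, np.argsort(eigvals)[::-1][:n_components]]
    results = []
    for _ in range(L):
        # Random subset of the reduced features (its size n_features is an assumption).
        cols = rng.choice(Z.shape[1], size=n_features, replace=False)
        # k-means on the selected features gives one base clustering C_l.
        results.append(_kmeans(Z[:, cols], k, rng))
    return np.array(results), Z
```

Each row of the returned label array is one base clustering $C_l$, which is the input format expected by the co-association computation.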

Clustering Ensemble
From multiple clustering iterations, we obtain the basic clustering results $C_1, C_2, \ldots, C_L$. We now present a method for integrating these basic clustering results using the co-occurrence frequency matrix, employing the single-linkage method of hierarchical clustering to generate a more robust clustering result.
For a dataset $X = \{x_1, x_2, \ldots, x_n\}$ with $n$ samples and a family of clustering results $C_1, C_2, \ldots, C_L$ of $X$, we can construct an $n \times n$ co-association frequency matrix $P$, whose element $p_{ij}$ represents the frequency with which samples $x_i$ and $x_j$ are assigned to the same cluster.
We treat $p_{ij}$ as the similarity between samples $x_i$ and $x_j$ and use single-linkage hierarchical clustering to obtain an ensemble clustering result. Initially, each data sample is treated as an independent cluster; the two most similar clusters, measured by co-association frequency, are then repeatedly merged into a new cluster node. Merging continues until all samples belong to one cluster, and the partition with the longest lifetime in the resulting dendrogram, that is, the partition that persists over the widest range of merge levels, is chosen as the final ensemble result.
The schematic representation of the single-linkage clustering dendrogram is illustrated in Figure 4. Different colors in Figure 4 represent the current clusters, each being a set of samples with high similarity. This bottom-up merging strategy takes the degree of association between samples fully into account, resulting in more accurate clustering results. Visualizing the similarity between clusters as a dendrogram lets us intuitively observe the structure and hierarchy of the clustering results. In the dendrogram, clusters joined at higher co-occurrence frequencies are more strongly associated; such groupings are relatively stable and less susceptible to noise or changes in the data. Therefore, these clustering results are more reliable and better reflect the true structure and patterns of the data.
By constructing a single-linkage clustering dendrogram using co-association frequencies and selecting the clustering result with the highest lifetime as the final fusion result, we obtain more stable clustering results, thereby enhancing our understanding of the features and inherent structure of the dataset.The process of ensemble clustering is outlined in Algorithm 2.
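The following NumPy sketch illustrates the ensemble step under stated assumptions. It uses $1 - p_{ij}$ as the distance between samples, obtains the single-linkage merge levels with Kruskal's minimum-spanning-tree procedure (which yields the same merge levels as agglomerative single linkage), and takes the lifetime of a $k$-cluster partition to be the gap between the merge level that creates it and the next merge level, in the spirit of evidence-accumulation clustering. The distance choice and this lifetime definition are assumptions, and `single_linkage_lifetime` is a hypothetical name.

```python
import numpy as np

def _find(parent, a):
    # Union-find root lookup with path halving.
    while parent[a] != a:
        parent[a] = parent[parent[a]]
        a = parent[a]
    return a

def single_linkage_lifetime(P):
    """Fuse base clusterings from an n x n co-association matrix P (entries in [0, 1]).

    Returns (labels, k) for the single-linkage partition with the longest lifetime.
    """
    P = np.asarray(P, dtype=float)
    n = P.shape[0]
    D = 1.0 - P                                    # assumed distance: 1 - p_ij
    iu, ju = np.triu_indices(n, k=1)
    order = np.argsort(D[iu, ju], kind="stable")   # sample pairs by increasing distance
    parent = list(range(n))
    merges = []                                    # the n - 1 single-linkage merges
    for idx in order:
        i, j = int(iu[idx]), int(ju[idx])
        ri, rj = _find(parent, i), _find(parent, j)
        if ri != rj:
            parent[ri] = rj
            merges.append((D[i, j], i, j))
    levels = np.concatenate(([0.0], [h for h, _, _ in merges]))  # length n
    # After n - k merges there are k clusters; that partition lasts from
    # levels[n - k] until the next merge at levels[n - k + 1].
    lifetimes = {k: levels[n - k + 1] - levels[n - k] for k in range(2, n + 1)}
    best_k = max(lifetimes, key=lifetimes.get)
    parent = list(range(n))                        # replay the first n - best_k merges
    for _, i, j in merges[: n - best_k]:
        parent[_find(parent, i)] = _find(parent, j)
    roots = [_find(parent, a) for a in range(n)]
    _, labels = np.unique(roots, return_inverse=True)
    return labels, best_k
```

A library such as SciPy's hierarchical clustering module could replace the hand-written merging; the explicit version is shown only to make the lifetime computation visible.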

Algorithm 2
Input: Basic clustering results $C_1, C_2, \ldots, C_L$.
Output: Ensemble clustering result $\mathbb{C}$.
1 Compute the co-occurrence frequency matrix $P$ by (2).
2 Obtain the single-linkage dendrogram of $P$.
3 Obtain the ensemble clustering result $\mathbb{C}$ with the highest lifetime.
4 Return $\mathbb{C}$.
The schematic representation of the single-linkage clustering dendrogram is i trated in Figure 4. Different colors in Figure 4 represent different clusters at present, each color represents a set of samples with high similarity.This bottom-up merging s egy ensures that we fully consider the degree of association between samples, resultin more accurate clustering results.By measuring the similarity between different clu and visualizing them as a dendrogram, we could intuitively observe the structure hierarchy of the clustering results.In the dendrogram, higher connecting points re sented stronger associations between clusters with higher co-occurrence frequen These results were relatively stable and less susceptible to noise or changes in the d Therefore, such clustering results were more reliable and better able to reflect the structure and patterns of the data.By constructing a single-linkage clustering dendrogram using co-association quencies and selecting the clustering result with the highest lifetime as the final fu result, we obtain more stable clustering results, thereby enhancing our understandin the features and inherent structure of the dataset.The process of ensemble clusterin outlined in Algorithm 2.

Similar Classes Based on Co-Association Frequency
This section introduces three-way clustering models based on the co-occurrence frequency derived from clustering ensemble, proposing a similarity relationship under the framework of co-association frequency. Firstly, we give the definition of the similar relation between $x_i$ and $x_j$.

Definition 1. Let $X = \{x_1, x_2, \ldots, x_n\}$ be a dataset with $n$ samples, let $C_1, C_2, \ldots, C_L$ be a family of clustering results of $X$, and let $p_{ij}$ be the co-association frequency between samples $x_i$ and $x_j$. The similarity relation $Sim_\theta(x_i, x_j)$ based on a threshold $\theta$ is defined as:

$$Sim_\theta(x_i, x_j) \iff p_{ij} \geq \theta,$$

where $0 \leq \theta \leq 1$ is a pre-defined parameter. For $x_i \in X$, the similar class is computed by:

$$Sim_\theta(x_i) = \{x_j \in X \mid p_{ij} \geq \theta\}.$$

We still use Figure 3 as an example. If we take $\theta = 0.7$, then $Sim_\theta(x$ …

From the above definition, we can find that the similar class $Sim_\theta(x_i)$ has the following properties: (1) $x_i \in Sim_\theta(x_i)$, since $p_{ii} = 1$; (2) $x_j \in Sim_\theta(x_i)$ if and only if $x_i \in Sim_\theta(x_j)$, since $p_{ij} = p_{ji}$. Clearly, the set of similar classes $\{Sim_\theta(x_i) \mid x_i \in X\}$ forms a covering of the dataset $X$. For any subset $C \subseteq X$ and $0 \leq \theta \leq 1$, the lower and upper approximations based on the co-association frequency are defined as follows:

$$\underline{apr}_\theta(C) = \{x_i \in X \mid Sim_\theta(x_i) \subseteq C\}, \qquad \overline{apr}_\theta(C) = \{x_i \in X \mid Sim_\theta(x_i) \cap C \neq \emptyset\}.$$

Furthermore, we can use the positive region $Pos_\theta(C)$ and the fringe region $Bnd_\theta(C)$ to describe the target subset $C$. We define $Pos_\theta(C)$ and $Bnd_\theta(C)$ of $C$ as

$$Pos_\theta(C) = \underline{apr}_\theta(C), \qquad Bnd_\theta(C) = \overline{apr}_\theta(C) - \underline{apr}_\theta(C).$$

Usually, the positive region $Pos_\theta(C)$ contains the samples that definitely belong to $C$, and the fringe region $Bnd_\theta(C)$ contains the samples that possibly belong to $C$. Based on the definitions and properties of $Pos_\theta(C)$ and $Bnd_\theta(C)$, for any cluster $C_i \subseteq X$, it is straightforward to obtain the core region $Co(C_i)$ and the fringe region $Fr(C_i)$ by

$$Co(C_i) = Pos_\theta(C_i), \qquad Fr(C_i) = Bnd_\theta(C_i).$$

Algorithm 3 illustrates the calculation of the core region $Co(C_i)$ and the fringe region $Fr(C_i)$ based on co-association frequency.
Output: $\mathbf{C} = \{(Co(C_1), Fr(C_1)), (Co(C_2), Fr(C_2)), \ldots\}$
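The core and fringe regions follow directly from the similar classes, so they are easy to compute from a co-association matrix and a hard ensemble result. The sketch below is a minimal illustration, assuming the lower and upper approximations take the covering-based form written above ($Sim_\theta(x_i) \subseteq C$ and $Sim_\theta(x_i) \cap C \neq \emptyset$); the function names `similar_classes` and `core_and_fringe` are hypothetical.

```python
import numpy as np


def similar_classes(P, theta):
    """Sim_theta(x_i) = {x_j : p_ij >= theta}, returned as a list of index sets."""
    P = np.asarray(P, dtype=float)
    return [set(np.flatnonzero(P[i] >= theta)) for i in range(P.shape[0])]


def core_and_fringe(P, labels, theta=0.7):
    """Return {cluster label: (core indices, fringe indices)}.

    Core   = lower approximation {x_i : Sim(x_i) is a subset of C}.
    Fringe = upper approximation {x_i : Sim(x_i) meets C} minus the core.
    """
    labels = np.asarray(labels)
    sim = similar_classes(P, theta)
    regions = {}
    for c in np.unique(labels):
        C = set(np.flatnonzero(labels == c))
        lower = {i for i, s in enumerate(sim) if s <= C}
        upper = {i for i, s in enumerate(sim) if s & C}
        regions[c] = (sorted(lower), sorted(upper - lower))
    return regions
```

Because $x_i \in Sim_\theta(x_i)$, a sample can lie in the core of at most one cluster, while a sample whose similar class straddles two clusters appears in the fringe of both.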

Similarity-Based Three-Way Clustering by Using Dimensionality Reduction
The stepwise execution of Algorithms 1-3 forms the framework of the proposed similarity-based three-way clustering by using dimensionality reduction, as illustrated in Algorithm 4.

Algorithm 4: Similarity-based three-way clustering algorithm
In this framework, we first generate a set of base clustering results by employing dimensionality reduction techniques (Algorithm 1). Subsequently, by calculating co-association frequencies, we use the single-linkage method of hierarchical clustering to obtain the ensemble clustering result (Algorithm 2). Finally, by defining the similar class of each sample, we derive the core and fringe regions of each cluster (Algorithm 3), further adjusting the clustering structure to yield more accurate and representative three-way clustering outcomes.
The uniqueness of this framework lies in its integration of data dimensionality reduction, co-association frequency computation, and the definition of similar classes, which together reveal the intrinsic structure of the data during the clustering ensemble process. Algorithm 4 outlines the overall process of the three-way clustering framework, demonstrating how optimized clustering results are generated through multiple iterations to better reflect the characteristics of the original data.
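The first stage of the framework (base clusterings on reduced data) can be sketched with scikit-learn. The full steps of Algorithm 1 are not reproduced in this section, so this is a hedged sketch built only from the description given here (PCA, random sampling of samples and features, k-means). The retained-variance ratio, the sampling fractions, and labeling unsampled samples by their nearest centroid are all assumptions, and `base_clusterings` is a hypothetical name.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA


def base_clusterings(X, k, L=50, n_components=0.95, sample_frac=0.8,
                     feature_frac=0.8, seed=0):
    """Return an (L, n) array of base cluster labels.

    n_components, sample_frac and feature_frac are hypothetical defaults.
    """
    rng = np.random.default_rng(seed)
    # PCA: a float n_components keeps enough components for that share of variance
    Z = PCA(n_components=n_components).fit_transform(X)
    n, d = Z.shape
    labels = np.empty((L, n), dtype=int)
    for l in range(L):
        # Random subsets of samples and of reduced features add diversity
        rows = rng.choice(n, size=max(k, int(sample_frac * n)), replace=False)
        cols = rng.choice(d, size=max(1, int(feature_frac * d)), replace=False)
        km = KMeans(n_clusters=k, n_init=10,
                    random_state=int(rng.integers(1 << 31)))
        km.fit(Z[np.ix_(rows, cols)])
        # Assumption: every sample, sampled or not, gets its nearest centroid's label
        labels[l] = km.predict(Z[:, cols])
    return labels
```

The resulting label matrix has one row per base clustering, which is the form a co-association computation needs. Labeling all samples in every round keeps each $p_{ij}$ averaged over the same $L$ clusterings; an alternative would be to average only over the rounds in which both samples were drawn.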
The proposed approach offers a powerful tool for clustering ensemble, aiding in the precise capture of complex relationships and distribution patterns in clustering analysis. The three-way clustering framework provides valuable insights for researchers seeking to uncover intricate structures within their datasets.

Data Descriptions
In this section, we conduct experiments to evaluate the effectiveness of the proposed algorithm. We employ 13 datasets from the UCI Machine Learning Repository [60], spanning diverse domains such as biology, medicine, and finance. Detailed information about these datasets, including the number of clusters, is presented in Table 2. The implementation uses MATLAB R2019a for statistical and matrix computations and Python 3.9 with libraries such as NumPy, SciPy, and scikit-learn for data processing and machine learning tasks.

Evaluation Indices
(1) The Adjusted Rand Index (ARI) [61,62] is a prominent external metric for assessing clustering performance against ground-truth labels. The ARI, an extension of the Rand Index (RI), overcomes a limitation of the RI by adjusting for chance agreement.
ARI adjusts the RI using the following formula:

$$ARI = \frac{RI - E[RI]}{\max(RI) - E[RI]},$$

where $E[RI]$ represents the expected Rand Index under random labeling. The Rand Index (RI) is calculated by the formula:

$$RI = \frac{a + b}{a + b + c + d},$$

where:
a: the number of sample pairs that belong to the same cluster in both the ground truth and clustering results.
b: the number of sample pairs that belong to different clusters in both the ground truth and clustering results.
c: the number of sample pairs that belong to the same cluster in the ground truth but to different clusters in the results.
d: the number of sample pairs that belong to different clusters in the ground truth but to the same cluster in the results.
ARI values provide insights into the agreement between clustering results and ground truth labels, with 1 indicating perfect agreement, 0 suggesting performance no better than random assignment, and negative values indicating worse than random allocation.The introduction of ARI offers a comprehensive and objective means for evaluating clustering algorithms, facilitating a more accurate understanding of their performance.
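The pair counts $a$, $b$, $c$, $d$ can be checked directly. The sketch below counts them by brute force over all sample pairs, computes the RI from them, and compares the result with scikit-learn's `rand_score`; the chance-corrected ARI itself is taken from `adjusted_rand_score`. The helper names `pair_counts` and `rand_index` are illustrative only.

```python
from itertools import combinations

from sklearn.metrics import adjusted_rand_score, rand_score


def pair_counts(truth, pred):
    """Return (a, b, c, d) as defined above, by checking every sample pair."""
    a = b = c = d = 0
    for i, j in combinations(range(len(truth)), 2):
        same_truth = truth[i] == truth[j]
        same_pred = pred[i] == pred[j]
        if same_truth and same_pred:
            a += 1          # together in both
        elif not same_truth and not same_pred:
            b += 1          # apart in both
        elif same_truth:
            c += 1          # together in truth, split by the clustering
        else:
            d += 1          # apart in truth, merged by the clustering
    return a, b, c, d


def rand_index(truth, pred):
    a, b, c, d = pair_counts(truth, pred)
    return (a + b) / (a + b + c + d)


truth = [0, 0, 1, 1]
pred = [0, 0, 1, 2]
print(pair_counts(truth, pred), rand_index(truth, pred), adjusted_rand_score(truth, pred))
```

The quadratic loop is only for illustration; scikit-learn derives the same counts from a contingency table, which scales to the dataset sizes used here.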
(2) Adjusted Mutual Information (AMI) [63,64] is an external metric commonly used to assess the performance of clustering results. It measures the agreement between the clustering result and the ground truth (typically, the actual labels) by quantifying the information shared between the two label distributions, corrected for chance.
The computation of AMI involves the following formula:

$$AMI(U, V) = \frac{MI(U, V) - E[MI(U, V)]}{\max\{H(U), H(V)\} - E[MI(U, V)]},$$

where $MI(U, V)$ represents the mutual information between $U$ and $V$, $E[MI(U, V)]$ is the expected mutual information under random labeling, and $H(U)$ and $H(V)$ are the entropies of $U$ and $V$, respectively. The numerator of AMI is the chance-adjusted mutual information, while the denominator is the correspondingly adjusted entropy term. AMI takes values of at most 1, where 1 indicates a perfect match, 0 indicates agreement at the level expected by chance, and negative values signify agreement below chance level.
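In practice AMI is usually computed with scikit-learn's `adjusted_mutual_info_score`. One detail worth pinning down is the normalizer in the denominator: the formula above uses $\max\{H(U), H(V)\}$, whereas scikit-learn defaults to `average_method="arithmetic"`, so the method must be set explicitly to reproduce the max-normalized variant. Which variant the reported results use is not stated here, so treat the choice below as an assumption.

```python
from sklearn.metrics import adjusted_mutual_info_score

truth = [0, 0, 0, 1, 1, 1]
same_partition = [1, 1, 1, 0, 0, 0]   # identical grouping, permuted label names
partial = [0, 0, 1, 1, 1, 1]          # one sample moved to the other cluster

# average_method="max" matches the max{H(U), H(V)} denominator written above
ami_same = adjusted_mutual_info_score(truth, same_partition, average_method="max")
ami_partial = adjusted_mutual_info_score(truth, partial, average_method="max")
print(ami_same, ami_partial)
```

AMI is invariant to renaming the cluster labels, which is why the permuted labeling still scores 1 (up to floating-point rounding).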
(3) Accuracy (ACC) [65] is a common metric used to assess the performance of a classification model. It measures the proportion of samples that the model correctly classifies and serves as a simple and intuitive performance indicator. The formula for calculating ACC is as follows:

$$ACC = \frac{TP + TN}{TP + TN + FP + FN},$$

where TP (True Positives) is the number of samples correctly classified as the positive class, TN (True Negatives) is the number of samples correctly classified as the negative class, FP (False Positives) is the number of samples that actually belong to the negative class but are misclassified as the positive class, and FN (False Negatives) is the number of samples that actually belong to the positive class but are misclassified as the negative class.
The range of ACC is [0, 1], where 1 indicates perfect classification and 0 indicates classification failure.While ACC is an intuitive and easy-to-understand metric, it may have limitations when dealing with class imbalance.
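Cluster labels are arbitrary names, so applying ACC to a clustering first requires mapping each cluster to a class. A common choice, assumed here rather than confirmed by the text, is the one-to-one mapping that maximizes the number of matched samples, found with the Hungarian algorithm (`scipy.optimize.linear_sum_assignment`). The function name `clustering_accuracy` is hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def clustering_accuracy(truth, pred):
    """ACC after the best one-to-one mapping of clusters to classes."""
    truth = np.asarray(truth)
    pred = np.asarray(pred)
    t_ids, t_idx = np.unique(truth, return_inverse=True)
    p_ids, p_idx = np.unique(pred, return_inverse=True)
    # counts[p, t] = number of samples in predicted cluster p with true class t
    counts = np.zeros((p_ids.size, t_ids.size), dtype=int)
    np.add.at(counts, (p_idx, t_idx), 1)
    # Hungarian matching picks the cluster-to-class mapping with most agreements
    rows, cols = linear_sum_assignment(counts, maximize=True)
    return counts[rows, cols].sum() / truth.size


print(clustering_accuracy([0, 0, 1, 1, 2, 2], [1, 1, 0, 0, 2, 0]))
```

In the example, clusters 1, 0 and 2 map to classes 0, 1 and 2, so five of the six samples count as correct.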

Experimental Performances
Firstly, the PCA dimensionality reduction method is applied to high-dimensional datasets to obtain low-dimensional data. Subsequently, a clustering ensemble strategy is employed on the low-dimensional data: subsets of samples and features are randomly drawn, and the traditional k-means algorithm is run 50 times on each dataset. Then, an automatic hierarchical clustering method is used to form the clustering structure, and the merged results can be visualized using a dendrogram. Finally, the lower and upper approximations based on similar classes are derived, and the core and fringe regions of each cluster are determined. The similarity threshold θ is set to 0.7 in all experiments.
Because AMI, ARI, and ACC apply only to hard clustering results, they cannot be calculated directly on three-way clustering results. To present the performance of our proposed algorithm, this study uses the core region of each cluster to represent that cluster, forms a clustering result from these core regions, and then calculates AMI, ARI, and ACC on it. The clustering ensemble strategy is executed 50 times on all datasets, with an ensemble size of 50, and the average AMI, ARI, and ACC values are reported. The performance of the proposed algorithm on these three indicators is displayed in Table 3 and Figures 5-7.
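How samples outside every core region are treated is not specified above. One possible reading, sketched here purely as an assumption, labels each sample by the core region that contains it and scores only the samples that lie in some core. Core regions are pairwise disjoint (each sample belongs to its own similar class), so every covered sample receives exactly one label. The helpers `core_hard_labels` and `score_on_cores` are hypothetical.

```python
import numpy as np
from sklearn.metrics import adjusted_rand_score


def core_hard_labels(n, cores):
    """Label each sample by the core region containing it; -1 if it is in no core.

    cores: list of index lists, cores[i] = Co(C_i). Cores are disjoint, so no
    sample is assigned twice.
    """
    labels = np.full(n, -1, dtype=int)
    for i, core in enumerate(cores):
        labels[np.asarray(core, dtype=int)] = i
    return labels


def score_on_cores(truth, cores, metric=adjusted_rand_score):
    """Apply a hard-clustering metric to the core samples only (an assumption)."""
    truth = np.asarray(truth)
    labels = core_hard_labels(truth.size, cores)
    mask = labels >= 0          # fringe-only samples are left out of the score
    return metric(truth[mask], labels[mask])
```

Any hard-clustering metric with a `(truth, pred)` signature, such as scikit-learn's `adjusted_mutual_info_score`, can be passed as `metric`. Note that excluding fringe samples tends to raise the scores, so the fraction of samples covered by core regions is worth reporting alongside them.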
To compare clustering effects, the performances of k-means, FCM, and DBSCAN are also presented in Table 3 and Figures 5-7. The best performance for each dataset is highlighted in bold. Through a comparative analysis of the data presented in Table 3 and Figures 5-7, the following conclusions can be drawn:
(1). Comparing our proposed three-way clustering algorithm with traditional clustering methods, such as k-means, FCM (Fuzzy C-Means), and DBSCAN (Density-Based Spatial Clustering of Applications with Noise), on AMI, ARI, and ACC, we find that the proposed algorithm demonstrates significant advantages on most datasets. Taking the Libras dataset as an example, the proposed algorithm yields AMI, ARI, and ACC values of 0.5193, 0.7144, and 0.6182, respectively, whereas the traditional k-means algorithm yields only 0.1837, 0.1842, and 0.3389. This improvement is attributed to the dimensionality reduction, which maps the original high-dimensional data to a lower-dimensional space and thus reduces data complexity. The co-occurrence probability further enables a more precise delineation of similar classes, allocating data points to core and fringe regions and better capturing the inherent structure of the data.
(2). Compared with the other algorithms in terms of AMI, ARI, and ACC, the proposed three-way clustering algorithm shows significant improvements. Specifically, across all datasets, it exhibits an average improvement of approximately 20% to 30% in ARI and ACC, and an average increase of about 15% to 35% in AMI. There are several potential reasons behind these improvements. Firstly, the proposed algorithm adopts an ensemble strategy that integrates data dimensionality reduction, co-occurrence frequencies, and similar classes, thereby taking the inherent structure of the data into account more comprehensively. Secondly, by leveraging the single-linkage method of hierarchical clustering, it effectively captures the degree of correlation among data points, resulting in a more precise assignment of data points to clusters. Additionally, by selecting the clustering result with the highest lifetime as the final merged result, it ensures the stability and consistency of the clustering results, making it more suitable for various data types and complex structures. The suboptimal performance on the Wdbc dataset may be due to the algorithm's sensitivity to parameter settings, since suitable parameters may vary across datasets. Although our proposed algorithm shows significant improvements, certain algorithms may perform better under specific conditions due to their inherent characteristics. For example, DBSCAN is particularly effective for datasets with noise and density variations, while hierarchical clustering can capture nested cluster structures. Comparing the actual runtime with the computational time complexity, we conclude that the proposed algorithm strikes a balance between accuracy and computational efficiency. Although it is not the fastest, its robustness and ability to handle high-dimensional and noisy data make it a valuable tool in practical applications.
In summary, the proposed three-way clustering algorithm amalgamates ideas from data dimensionality reduction, co-occurrence frequency calculation, and similar class partitioning.Compared to traditional clustering algorithms, it demonstrates advantages in more nuanced data analysis and accurate clustering results, making it more feasible and effective in practical applications.

Conclusions
The theoretical contribution of this paper lies in the proposal of a novel three-way clustering framework that integrates dimensionality reduction, co-occurrence frequencies, and similarity classes with three-way clustering.The objective is to efficiently cluster heterogeneous data from multiple sources by leveraging inherent structural information.Initially, we employ principal component analysis (PCA) to reduce the dimensionality of the data, mapping high-dimensional data into a lower-dimensional space.This not only decreases computational complexity but also enhances clustering efficiency.
Subsequently, we introduce the concept of co-occurrence frequencies, considering the co-occurrence relationships between samples. By applying a threshold to the co-occurrence probability, samples are grouped into similar classes, which in turn determine the core and fringe regions of each cluster. This ensures that the proposed algorithm not only accurately describes the intrinsic structure of the data but also exhibits robustness. The experimental results show that the proposed algorithm can improve clustering accuracy, particularly when dealing with complex data structures and significant noise interference. To further enhance the clustering process, we integrate these co-occurrence probabilities with a single-linkage hierarchical clustering method. This fusion enables us to construct a dendrogram that captures the similarity between different clusters. Lifetime analysis is then employed to select the most stable clustering result, ensuring consistency and robustness.
The practical contribution of this paper is the improvement in clustering accuracy.Experimental results demonstrate that the proposed algorithm significantly enhances clustering precision, especially when handling complex data structures and substantial noise interference.This proves its practical effectiveness in various real-world scenarios.The method shows significant advantages across multiple datasets, highlighting its versatility and robustness in dealing with diverse and high-dimensional data.This adaptability makes it suitable for a wide range of applications, from bioinformatics to market segmentation.
Although the algorithm demonstrates significant advantages across multiple datasets during experimental validation, it does not consistently exhibit the expected improvements on certain specific datasets.This discrepancy may arise due to a partial mismatch between data characteristics and algorithm design, necessitating further exploration and refinement.
In future research, we will focus on the following aspects: (1). Adaptability of parameter selection: The subjective nature of the parameter thresholds in the algorithm may affect the stability of experimental results. To enhance algorithm robustness, it is essential to consider more objective and adaptive parameter selection methods that accommodate different dataset requirements and application scenarios.
(2). Improving the quality of base clustering: Generating base clusterings from different feature subsets may lead to poor-quality results, negatively affecting the final ensemble clustering outcome. To enhance the quality of base clustering, we can employ automatic evaluation mechanisms based on the data's intrinsic structure or use advanced clustering performance metrics. Additionally, introducing other methods, such as evaluation functions, will help eliminate the impact of low-quality base clusterings, effectively improving the overall performance of ensemble clustering.
(3). Adaptation improvements for specific datasets: The observation that the algorithm did not consistently exhibit the expected improvements on specific datasets suggests a potential mismatch between data characteristics and algorithm design. Further work can include adapting the algorithm to certain datasets, enhancing its generality and adaptability.

Figure 1. Illustration of Dimensionality Reduction by PCA.


Figure 2. The framework of clustering ensemble.

$C_1, C_2, \ldots, C_L$ are family clustering results on $U$; we use $C_l(x_i)$ to indicate the label of $x_i$ induced by clustering $C_l$.


Figure 3. An example of a dataset with four clustering results.


… $C_1, C_2, \ldots, C_L$. The process of foundational clustering based on data dimensionality reduction is outlined in Algorithm 1.

Algorithm 1: Foundational Clustering Based on Data Dimensionality Reduction
Input: …

4. Use the k-means algorithm to obtain the clustering result $C_i$.


Mathematics 2024, 12, x FOR PEER REVIEW 11 of 19
Algorithm 3: Finding core region and fringe region
Input: Co-association frequency $p_{ij}$, ensemble clustering results $C_1, C_2, \ldots$

Figure 5. ARI Comparison of Algorithms Embedded in K-means.


Figure 6. AMI Comparison of Algorithms Embedded in K-means.

Figure 7. ACC Comparison of Algorithms Embedded in K-means.


Table 1. The co-association frequency matrix of Figure 3.

Table 2. Datasets Used in Experiments.

Table 3. The performances of different algorithms.