Abstract
Ensemble clustering has become a widely used technique for improving robustness and accuracy by combining multiple clustering results. However, traditional ensemble clustering methods often fail to provide fair treatment between groups defined by sensitive attributes. Central to many ensemble methods is the symmetric co-association matrix, which captures pairwise similarity between data points based on their co-occurrence across base clusterings. This paper introduces a fair ensemble clustering method based on the symmetric co-association matrix. The proposed method integrates fairness constraints into the objective function of the ensemble process, using the results from base clusterings that lack fairness considerations as input. The optimization is performed iteratively, and the final clustering results are represented directly by a label matrix obtained efficiently using a coordinate descent approach. By integrating fairness into the clustering process, the method avoids the need for post-processing to achieve fair results. Comprehensive experiments on both real-world and synthetic datasets validate the effectiveness and practicality of the proposed method.
1. Introduction
Clustering is an unsupervised learning technique in which a set of unlabeled data points is partitioned into clusters to reveal intrinsic structures and patterns within the data [1]. It is widely applied in fields such as image segmentation [2], speech recognition [3], and text mining [4], and can be categorized into various types, including hierarchical clustering, partition-based clustering, density-based clustering, model-based clustering, grid-based clustering, graph-based clustering, and fuzzy clustering.
However, the results of a single clustering method may vary due to differences in algorithms and initialization strategies or the risk of falling into local optima. To address this, ensemble clustering was proposed [5,6]. The core idea of ensemble clustering is to combine multiple clustering results, exploiting their diversity and complementarity to produce more stable, accurate, and robust final clustering outcomes. Early studies employed simple voting methods to aggregate cluster labels for each data point [7,8]. The effectiveness of these approaches was limited by their inadequate fusion strategies for the base clusterings, including the lack of evaluation and weighting of base clustering quality. Furthermore, these methods failed to capture the intrinsic local structures of the data.
A popular recent approach is the co-association matrix method, which constructs a symmetric co-association matrix from the frequencies with which data points appear in the same cluster across base clustering results, as co-occurrence is a symmetric relation. A clustering algorithm is then applied to the co-association matrix to generate the final partitioning [9,10,11,12,13,14]. The symmetry of this matrix arises naturally from the reciprocal nature of co-occurrence relationships: if sample i is clustered with sample j, then j is also clustered with i. Since the co-association matrix reflects pairwise similarities between samples, it can serve as a similarity or adjacency matrix.
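As a concrete illustration, the following sketch (the function and variable names are ours, not taken from the cited works) builds a co-association matrix from base clusterings given as integer label vectors:

```python
import numpy as np

def co_association_matrix(base_labels):
    """Build the symmetric co-association matrix from a list of base clusterings.

    base_labels: list of 1-D integer arrays, each of length n (one per base clustering).
    Returns an n x n matrix whose (i, j) entry is the fraction of base clusterings
    in which points i and j fall in the same cluster.
    """
    n = len(base_labels[0])
    ca = np.zeros((n, n))
    for labels in base_labels:
        # One-hot encode the partition: H[i, k] = 1 iff point i is in cluster k.
        k = labels.max() + 1
        H = np.eye(k)[labels]
        # H @ H.T equals 1 exactly where two points share a cluster in this partition.
        ca += H @ H.T
    return ca / len(base_labels)

# Example: three base clusterings of five points.
labels = [np.array([0, 0, 1, 1, 1]),
          np.array([0, 1, 1, 1, 0]),
          np.array([0, 0, 0, 1, 1])]
A = co_association_matrix(labels)
print(A)  # symmetric, with diagonal entries equal to 1
```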
Clustering methods based on similarity matrices are then applied to the co-association matrix, yielding ensemble clustering results through graph partitioning [15]. Clustering methods based on graph partitioning have been extensively studied [16,17,18,19]. To further enhance the expressive power of the co-association (CA) matrix and improve the stability of clustering results, Bian et al. proposed a fuzzy-neighbor-based CA matrix learning method called EC–CA–FN [20]. This method simultaneously considers both intra-cluster and inter-cluster relationships among samples and introduces a fuzzy index to control the degree of fuzziness, thereby significantly improving the robustness and stability of clustering performance. In addition, Bai et al. addressed the limitation of traditional k-means in identifying nonlinearly separable structures by proposing a multiple k-means ensemble algorithm [21]. By extracting locally credible labels and constructing a cluster relationship graph, their approach enables linear clusterers to simulate nonlinear partitioning, expanding the applicability of ensemble clustering to more complex data structures. For large-scale data, Zhang et al. introduced a fast spectral ensemble clustering algorithm called FSEC, which is based on anchor graphs [22]. By performing spectral embedding only once and leveraging anchor-based graph construction along with singular value decomposition (SVD) for dimensionality reduction, the algorithm significantly reduces computational cost while maintaining high clustering efficiency on large datasets. Deep ensemble clustering methods have also gradually gained popularity. For example, Hao et al. [18] proposed ECAR, an ensemble clustering method that incorporates attention mechanisms and joint optimization strategies, demonstrating the potential of combining deep learning and ensemble clustering. In multi-view data contexts, Huang et al. [23] introduced FastMICE, which focuses on scalability and efficiency for large-scale data.
Although ensemble clustering has shown promise in improving clustering quality, its results can exhibit implicit unfairness, a problem that has been largely overlooked. In recent years, machine learning technologies have played an important role in data-driven decision-making. However, traditional algorithms may introduce biases when handling data involving sensitive attributes (e.g., gender, race, or age), leading to systemic neglect or unfair treatment of certain groups. This undermines the trustworthiness of algorithms in real-world applications and risks exacerbating societal inequalities [24,25]. Designing fair clustering algorithms has thus become an important research area.
The practical urgency of fair clustering is evident in applications like resume screening or credit scoring, where algorithmic grouping may reinforce societal biases if sensitive attributes are ignored. Ensuring fairness in such unsupervised learning stages is therefore critical for developing trustworthy and equitable machine learning systems. There are two main concepts of fairness in clustering: group and individual fairness. In this study, we focus on group fairness, in which the goal is to ensure proportional representation of sensitive groups while maintaining clustering quality. This prevents systemic neglect of any group [26]. For instance, Chierichetti et al. proposed the fairlet decomposition method, which partitions data into small, fair “fairlets” for clustering to meet group fairness constraints. However, this method has a high computational complexity. Building on this, Backurs et al. [27] developed a near-linear-time fair clustering algorithm that significantly improves efficiency for large-scale datasets. Bera et al. [28] extended k-means clustering by incorporating upper- and lower-bound constraints into the optimization objective, enabling flexible control over group proportions. In spectral clustering, Kleindessner et al. [29] enforced group fairness by adding linear constraints to a Laplacian matrix optimization objective, ensuring that the group proportions in clustering results closely match their overall distribution while maintaining low RatioCut or NCut values, which indicate a high clustering quality. Ghadiri et al. [30] proposed a cost-balance-based fairness concept that minimizes clustering cost disparities across groups.
Individual fairness, in contrast, emphasizes that similar individuals should be treated similarly in clustering results, such as by imposing constraints on pairwise distances between samples [31]. Although some fairness studies have focused on single clustering algorithms, their methods are often specific to a given algorithm and difficult to generalize. Ensemble clustering offers a more flexible framework by synthesizing results from distinct base clustering models without accessing raw data features. However, fairness in ensemble clustering has been underexplored.
Unlike prior fair clustering methods that enforce fairness within a single clustering model, our approach embeds fairness directly into an ensemble consensus process, where multiple base clusterings are aggregated under fairness constraints. The only existing study on fair ensemble clustering, by Zhou et al. [32], introduced fairness regularization into the consensus objective to balance fairness with clustering quality, assuming equal capacity for each cluster.
To address scenarios where cluster sizes may naturally vary, we build upon the parameter-free ensemble framework of Nie et al. [6] and incorporate the proportional fairness notion from Chierichetti et al. [26]. Our method ensures that the distribution of sensitive attributes within each cluster aligns with the global distribution, without imposing equal cluster capacity. A key feature of our formulation is the consistent use of a symmetric similarity matrix, which directly encodes pairwise sample relationships. This symmetric representation better preserves the relational structure within the data and promotes stable, efficient convergence during the consensus optimization. The key innovation lies in the joint optimization framework that simultaneously learns this fair similarity matrix A and a label matrix Y, embedding fairness directly into the consensus process and avoiding post-processing. This enables dynamic, fairness-aware clustering that adapts to both data structure and group proportions.
The main contributions of this study are as follows:
- We address fairness in ensemble clustering, making it applicable to diverse clustering scenarios and improving the robustness and accuracy of clustering results.
- We propose a fast iterative optimization framework based on coordinate descent to solve the optimization problem.
- We present experimental results demonstrating that our method effectively balances clustering quality and fairness, validating its effectiveness.
2. Related Work
2.1. Notation
Here, we define some notation used throughout this paper. The transpose of a matrix $X$ is denoted by $X^{\top}$, and $A$ represents the weighted adjacency matrix or similarity matrix. The Laplacian matrix is given by $L = D - A$, where $D$ is the degree matrix, and $Y \in \{0,1\}^{n \times c}$ is the clustering indicator matrix with elements $Y_{ij}$, where $Y_{ij} = 1$ if the $i$-th sample belongs to the $j$-th cluster and $Y_{ij} = 0$ otherwise. The binary indicator matrices of the base clusterings (BCs) are denoted as $Y_l$, with the number of partitions of each BC represented by $c_l$. For the $l$-th BC, the dimension of $Y_l$ is $n \times c_l$, and the true number of clusters in the dataset $X$ is $c$. The balance matrix of each partition is a diagonal matrix whose diagonal entries weight the clusters of that partition.
2.2. Ensemble Clustering
Ensemble clustering, also known as consensus clustering, aims to enhance clustering robustness by integrating multiple BCs. Traditional clustering methods, such as k-means and spectral clustering, often suffer from sensitivity to initialization and parameter settings. By aggregating the outputs of diverse BCs, ensemble clustering mitigates the biases of individual models and improves overall performance.
A common approach is to construct a co-association matrix that records how often data points are clustered together across different BCs. However, this method assumes equal contributions from all BCs, ignoring differences in their quality and reliability. To address this limitation, weighted ensemble clustering frameworks assign varying importance to BCs and clusters. However, many existing methods use fixed weights determined by predefined metrics, limiting their adaptability.
A more advanced approach is to introduce a parameter-free dynamic weighting mechanism that learns a structured similarity matrix A directly from multiple BCs [6]. Unlike traditional methods that rely on a co-association matrix, this framework dynamically adjusts the weights of BCs and clusters to reflect their quality. The weight of a given BC is computed on the basis of its consistency with the learned matrix A, while the cluster-level weights address cluster imbalances within each BC. From Theorem 1 in [6], to ensure that A has exactly c connected components, thereby allowing the samples to be partitioned into c classes, the rank of the Laplacian matrix $L_A$ of $A$ should be equal to $n - c$. Therefore, a rank constraint is imposed on the learned similarity matrix A.
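As an illustration of why this rank condition matters, the sketch below (a helper of our own, not part of [6]) counts the connected components of a similarity matrix through the near-zero eigenvalues of its Laplacian; exactly $c$ components correspond to $\mathrm{rank}(L_A) = n - c$:

```python
import numpy as np

def num_connected_components(A, tol=1e-8):
    """Count connected components of a symmetric similarity matrix A via the
    number of (near-)zero eigenvalues of its Laplacian L = D - A."""
    D = np.diag(A.sum(axis=1))
    L = D - A
    eigvals = np.linalg.eigvalsh(L)   # non-negative for a valid Laplacian
    return int(np.sum(eigvals < tol))

# A block-diagonal similarity matrix with two components -> rank(L) = n - 2.
A = np.array([[0, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)
print(num_connected_components(A))  # 2
```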
The objective function is formulated as:
Here, represents the label matrix of the l-th BC, and is defined as:
To optimize the objective, we perform iterative updates. The rank constraint on $L_A$ is non-convex and difficult to enforce directly. Therefore, we transform this constraint during the optimization process. Since $L_A$ is a positive semi-definite matrix, its eigenvalues are non-negative, and the rank constraint can be converted into the condition that the $c$ smallest eigenvalues of $L_A$ are zero. When optimizing A while fixing the other variables, the problem can be written as:
Here, the regularization parameter should be sufficiently large. According to Ky Fan’s theorem [33], we have the following equality: $\sum_{i=1}^{c} \sigma_i(L_A) = \min_{F \in \mathbb{R}^{n \times c},\, F^{\top} F = I} \operatorname{tr}\!\left(F^{\top} L_A F\right),$
where $\sigma_i(L_A)$ denotes the $i$-th smallest eigenvalue of $L_A$ and $F \in \mathbb{R}^{n \times c}$ is a column-orthogonal matrix. This transformation allows us to replace the non-convex rank constraint with a trace minimization over an orthogonally constrained matrix, which can be efficiently optimized. Therefore, the problem can be rewritten as:
The alternating optimization for updating A and F is carried out next; a more detailed exposition of this algorithm and its optimization process has been presented by Xie et al. [6]. By iteratively updating A, F, and the weights, the framework adapts to the diversity and quality of the BCs. This parameter-free ensemble clustering approach dynamically balances the contributions of BCs and clusters, achieving superior performance across diverse datasets and establishing itself as a robust and efficient clustering solution.
2.3. Fairness
In this work, we focus exclusively on group fairness, which ensures that each cluster’s sensitive group distribution aligns with the global distribution. Individual fairness, which would require similar treatment for similar individuals, is not enforced here, as our objective is to prevent systemic bias against protected groups at the cluster level. This aligns with real-world applications such as demographic-balanced resource allocation.
Group fairness refers to the principle that groups with different sensitive attribute values should receive similar treatment in model decisions. Following the fairness definition proposed by Chierichetti et al. [26], we formalize fairness as follows. Assume a dataset X is divided into $w$ protected groups $V_1, \dots, V_w$, where each group corresponds to a specific sensitive attribute value. Furthermore, the data are clustered into $c$ partitions $C_1, \dots, C_c$. The proportion of group $V_s$ in the overall dataset is given by $r_s = |V_s| / n$, where $n$ is the total number of samples.
The proportion of group $V_s$ within the $v$-th cluster is $r_{s,v} = |V_s \cap C_v| / |C_v|$.
The fairness of cluster $C_v$ can then be expressed as $|r_{s,v} - r_s| \le \delta$ for every group $V_s$.
Here, $\delta$ represents the allowable deviation, which defines the acceptable range of fairness. To achieve fairness, the distribution of each protected group within every cluster must align closely with its global distribution.
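A minimal sketch of this criterion, assuming cluster assignments and sensitive attributes are given as integer vectors and writing the allowable deviation as `delta`:

```python
import numpy as np

def is_proportionally_fair(cluster_labels, groups, delta=0.05):
    """Check whether each cluster's group proportions stay within `delta`
    of the global group proportions."""
    global_prop = {g: np.mean(groups == g) for g in np.unique(groups)}
    for v in np.unique(cluster_labels):
        in_cluster = groups[cluster_labels == v]
        for g, p_global in global_prop.items():
            p_cluster = np.mean(in_cluster == g)
            if abs(p_cluster - p_global) > delta:
                return False
    return True

clusters = np.array([0, 0, 1, 1, 0, 1])
groups   = np.array([0, 1, 0, 1, 0, 1])
print(is_proportionally_fair(clusters, groups, delta=0.2))  # True
```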
Unlike single-model fair clustering methods, our approach enforces fairness at the ensemble level. Moreover, instead of imposing equal cluster sizes, we require each cluster’s sensitive attribute distribution to match the global proportion. This proportional fairness is integrated directly into the joint optimization of the similarity matrix and cluster labels, ensuring fairness is intrinsic to the consensus process.
3. Proposed Algorithm
In Section 2.2, the ensemble clustering algorithm of Nie et al. [6] was discussed, with Equation (1) describing the learning of a similarity matrix A from multiple BCs. In our proposed algorithm, a fairness objective function is added so that the similarity matrix and the clustering result are fair.
3.1. Fair Ensemble Clustering
We assume that the dataset $V$ consists of $w$ groups, i.e., $V = V_1 \cup V_2 \cup \dots \cup V_w$. Following the fairness concept introduced in Section 2.3, we ensure that the distribution of different groups within each cluster matches the global distribution. Consider the group matrix $G \in \{0,1\}^{n \times w}$. This binary matrix is defined as $G_{is} = 1$ if the $i$-th sample belongs to group $V_s$, and $G_{is} = 0$ otherwise.
We then construct the following formulation to enforce fairness: $\left\| G^{\top} Y \left( Y^{\top} Y \right)^{-1} - U \right\|_F^2,$
where $Y$ represents the label matrix of the final clustering result. This fairness term penalizes deviations from the global group distribution in each cluster, acting as a soft proportional constraint.
Assume $B = G^{\top} Y$, where $B \in \mathbb{R}^{w \times c}$. The element $B_{ij}$ of matrix B represents the number of points with the i-th sensitive attribute in the j-th cluster. Multiplying B by the inverse of the diagonal matrix $Y^{\top} Y$, whose diagonal entries are the cluster sizes, yields a result that indicates the proportional distribution of the groups within each cluster. The matrix $U \in \mathbb{R}^{w \times c}$ is constructed to reflect the global distribution of the different groups; each column of U is the same vector $u$, and $u_i$ represents the proportion of the i-th group in the entire dataset. By adjusting Y, we ensure that the distribution of different groups in each cluster matches the global distribution represented by U, thereby achieving fairness in the clustering results.
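Under this reading of the construction (counts $B = G^{\top}Y$, cluster sizes on the diagonal of $Y^{\top}Y$, and $U$ repeating the global proportion vector in every column), the fairness term can be evaluated as in the following sketch; how this term is weighted inside the full objective is omitted here:

```python
import numpy as np

def fairness_penalty(Y, G):
    """Squared Frobenius distance between per-cluster group proportions
    and the global group proportions.

    Y: n x c binary cluster-indicator matrix (one 1 per row).
    G: n x w binary group-indicator matrix (one 1 per row).
    """
    B = G.T @ Y                      # (w x c) group-by-cluster counts
    cluster_sizes = Y.sum(axis=0)    # diagonal of Y^T Y (assumed non-empty clusters)
    P = B / cluster_sizes            # column v: group proportions inside cluster v
    u = G.mean(axis=0)               # global group proportions
    U = np.tile(u[:, None], (1, Y.shape[1]))
    return np.linalg.norm(P - U, 'fro') ** 2

# Two clusters, two groups, both clusters matching the 50/50 global split.
Y = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])
G = np.array([[1, 0], [0, 1], [1, 0], [0, 1]])
print(fairness_penalty(Y, G))  # 0.0
```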
The proposed fairness constraint is formulated using the label matrix Y to ensure that the proportions of different sensitive attributes in each cluster are accurately represented, so that the deviation from the global distribution matrix U directly reflects unfairness. Additionally, using the label matrix Y, we ensure the data are divided into c clusters without needing to further constrain the rank of the Laplacian matrix as in Equation (1).
From Equation (1), we only learn the similarity matrix A; the label matrix is derived from A in a post-processing step. However, we need to learn the 0–1 label matrix Y directly. To achieve this, inspired by NCUT [34], we add the normalized-cut term $\operatorname{tr}\!\big( (Y^{\top} D Y)^{-1/2}\, Y^{\top} L Y\, (Y^{\top} D Y)^{-1/2} \big)$, where $L = D - A$ and $D$ is the degree matrix of $A$, to learn the label matrix Y from the similarity matrix A.
Based on this idea and the existing ensemble clustering framework introduced in Section 2.2, we propose the following objective function:
Here, the regularization parameters are trade-off coefficients that control the balance between clustering quality and fairness. A larger fairness weight places greater emphasis on satisfying the proportional fairness constraint, which corresponds to enforcing a smaller allowable deviation $\delta$ in Equation (8).
3.2. Optimization
We sequentially update A, Y, , and , keeping the other variables fixed when updating each one. We initialize and . The variables Y and , are randomly initialized.
3.2.1. Updating A
The subproblem can be formulated as:
Referring to the algorithm for updating A described in Section 2.2, the above problem can be rewritten as:
where , , and is the i-th column of G. It can be observed that the problems for different values of i are independent. Thus, for a specific i, the problem can be written as:
where , and and are the i-th columns of A and , respectively.
Let ; then the above problem can be transformed into:
To solve this problem, we use the Lagrange multiplier method proposed by Nie et al. [6], and the final solution is:
where and . Here, is a Lagrange multiplier and . The solution for is , where is the root of the following equation solved using Newton’s method:
where and .
3.2.2. Updating Y
The subproblem for updating Y can be formulated as follows:
Here, Y represents the clustering label matrix, where each data point belongs to exactly one cluster. Consequently, each row of Y has only a single element equal to 1, with all other elements 0. Using this property, we employ the coordinate descent method to update Y, treating each row as an independent variable. When updating the i-th row, all other rows remain unchanged. The update involves iterating over all possible cluster assignments, setting one column to 1 while others are set to 0, and selecting the assignment that minimizes the objective function.
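A minimal sketch of this row-wise coordinate descent, with the full consensus objective abstracted as a callable (this naive version re-evaluates the whole objective for every candidate assignment; Section 3.3 describes how that evaluation is made much cheaper):

```python
import numpy as np

def coordinate_descent_labels(Y, objective, max_sweeps=20):
    """Row-wise coordinate descent on a 0-1 label matrix Y (one 1 per row).

    objective: callable taking Y and returning the scalar value to minimize.
    Each row is set to the single cluster assignment that minimizes the
    objective while all other rows are held fixed.
    """
    n, c = Y.shape
    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            old_k = int(np.argmax(Y[i]))
            best_k, best_val = old_k, np.inf
            for k in range(c):
                Y[i, :] = 0
                Y[i, k] = 1
                val = objective(Y)
                if val < best_val:
                    best_val, best_k = val, k
            Y[i, :] = 0
            Y[i, best_k] = 1
            changed |= (best_k != old_k)
        if not changed:  # converged: no row changed during a full sweep
            break
    return Y
```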
3.2.3. Updating
The augmented Lagrangian multiplier method is used to update , as detailed by Nie et al. [6]. The updated value of is the solution to the following augmented Lagrangian function:
where , , and is the penalty coefficient. The update step for is given by:
The optimization problem for is:
The solution process follows a similar approach to the previous problems, as detailed in Algorithm 1.
Algorithm 1 Augmented Lagrangian Multiplier (ALM) Method for Optimizing
3.2.4. Updating
The update step for follows the formula in Equation (2). The detailed algorithmic process is summarized in Algorithm 2.
Algorithm 2 The Algorithm of Solving Problem (12)
3.3. Refining
Using coordinate descent to update Y directly results in a computational complexity of . To address this inefficiency, Nie et al. [35] proposed a faster coordinate descent approach for solving the NCUT problem. We augment their method with an improved update strategy for Y to reduce its computational complexity. The terms in Equation (1) can be rewritten as follows:
Thus, the subproblem for updating Y can be reformulated as:
The update for the r-th row, denoted as , can be represented as:
When updating the r-th row while keeping the other rows fixed, there are c possible update strategies, denoted as , where is a one-hot vector with the k-th element set to 1. The r-th row is updated to , and the other rows are fixed as . When updating the r-th row, computations related to the other rows become redundant, so the expression can be simplified to:
To further optimize the computational efficiency, we can precompute the terms , , , and . Assuming the original label of the r-th point is p, we now consider the following two scenarios:
- (1) When , we have and . Therefore, the objective function becomes Equation (28). Since the terms on the right-hand side of Equation (28) have already been precomputed, the time complexity of calculating this expression is .
- (2) When , we have and . Thus, the objective function can be written as Equation (31).
Similarly, the time complexity of computing Equation (31) is .
After updating the r-th row, it is necessary to update the stored values of , , , and . If the label of the r-th point remains unchanged, the stored values are not changed. However, if the label changes, let , where . In this case, the corresponding values need to be updated as follows. We can derive that and . Consequently, the expressions involving are computed as follows:
These two terms have already been computed when evaluating the objective function and do not introduce additional computational complexity. For the term involving , we have
This computation has a complexity of . Similarly, for the term involving P:
This operation has a complexity of . Since , the total computational complexity for these four operations is .
For the expressions involving , the computational complexity follows the same pattern as those involving :
These updates maintain the same computational complexity, ensuring efficient evaluation of the required terms. The redefined method is summarized in Algorithm 3.
Algorithm 3 The Algorithm for Quickly Solving Problem (12)
3.4. Time Complexity Analysis
The time complexity of the proposed method can be determined as follows. The computation required to initialize and is negligible. When updating A, the time complexity of computing T is , whereas for U, it is . Thus, the overall complexity of computing A is . To update Y, the fast coordinate descent method is used. The initialization involves computing , , , and , whose respective complexities are , , , and . At each point operations are required to evaluate the objective function for each potential solution. When a better solution is found, the corresponding label update involves computing and , each with a complexity of . Let be the number of outer iterations, and let denote the number of improved solutions. Then, the time complexity of updating Y is given by . Under the conditions and , this complexity simplifies to . The time complexity of updating and are and , respectively. Thus, the overall time complexity of our method is , where t represents the number of iterations.
3.5. Convergence Analysis
The proposed method employs an alternating optimization approach in which, at each step, the subproblem involving the current variable is minimized while the other variables are held fixed. Let the value of the objective function after the t-th update be $J^{(t)}$. Since each subproblem is minimized with the remaining variables fixed, it follows that $J^{(t+1)} \le J^{(t)}$, which implies that the objective function does not increase during any optimization step; thus, its value is monotonically non-increasing.
The objective function can be divided into three terms. The first and third terms are squared Frobenius norms and hence non-negative. The second term, the normalized-cut regularizer $\operatorname{tr}\!\big( (Y^{\top} D Y)^{-1/2}\, Y^{\top} L Y\, (Y^{\top} D Y)^{-1/2} \big)$, is also non-negative, as shown below.
Proof.
Let $L$ denote the Laplacian matrix, which is defined as $L = D - A$, where D is the degree matrix. The Laplacian matrix is positive semi-definite, meaning all its eigenvalues are non-negative. Therefore, for any non-zero vector $x$, we have $x^{\top} L x \ge 0$.
Since Y is the clustering label matrix, for any vector $z$, we observe that $z^{\top} (Y^{\top} L Y) z = (Y z)^{\top} L (Y z) \ge 0.$
Thus, $Y^{\top} L Y$ is also a positive semi-definite matrix. Additionally, $Y^{\top} D Y$ is a diagonal matrix whose diagonal elements correspond to the sum of degrees in each cluster. Since D is a positive definite diagonal matrix, $Y^{\top} D Y$ is also positive definite; therefore, its inverse square root, $(Y^{\top} D Y)^{-1/2}$, is positive definite as well.
Given these properties, the matrix
$Z = (Y^{\top} D Y)^{-1/2}\, Y^{\top} L Y\, (Y^{\top} D Y)^{-1/2}$
is positive semi-definite, as its eigenvalues are non-negative. Consequently, the trace of Z is also non-negative:
$\operatorname{tr}(Z) = \sum_{i} \lambda_i \ge 0,$ where $\lambda_i$ are the eigenvalues of Z.
Thus, the objective function has a lower bound of zero.
As a result, the objective function cannot decrease indefinitely, and each variable update process is convergent. Therefore, we can conclude that the algorithm converges. □
4. Experiments
4.1. Datasets
To evaluate the effectiveness of the proposed method, it was tested on multiple synthetic and real-world datasets, as detailed in Table 1. The four synthetic datasets DS-577, 2d-4c-no0, 2d-4c-no1, and 2d-4c-no4 are extensively employed in clustering research. In the absence of explicit sensitive attributes, their ground-truth labels are treated as implicit sensitive attributes. As shown in Figure 1, the shape of each point in these synthetic two-dimensional datasets indicates its group membership. For real-world datasets, we selected several widely used benchmarks. The HAR dataset [36] is used for human activity recognition and consists of data collected from sensors in smartphones, including accelerometers and gyroscopes. It categorizes six common activities such as walking and running, forming protected groups based on activity types. The MNIST-USPS dataset [37] combines the classical MNIST and USPS handwritten digit datasets, which contain 28 × 28 and 16 × 16 grayscale images, respectively, and forms protected groups based on the dataset origin. Reverse MNIST [37] is a variant of MNIST in which the colors are inverted, making the background black and the digits white, used to test the robustness of models to data transformations, with the protected attribute being whether the image has been inverted. The JAFFE dataset [38] consists of facial expression images from 10 Japanese women, with seven basic emotions such as happiness and sadness, used for emotion analysis and facial expression recognition. The protected attribute used for these images is the expression category. These datasets provide a diverse experimental foundation to assess the algorithm’s performance across different scenarios.
Table 1. Datasets.
Figure 1. Four synthetic datasets. The shape of a point in the graph represents its sensitive attribute. (a) DS-577; (b) 2d-4c-no0; (c) 2d-4c-no1; (d) 2d-4c-no4.
4.2. Experimental Setup
To comprehensively evaluate the proposed method, we conducted experiments on both two-dimensional synthetic datasets and real-world datasets. We compared our approach against several baseline methods, including K-means, ensemble clustering methods (PFEC, LEGP [39], LEWA [39], ECPCS_MC [40], ECPCS_HC [40]), mainstream fair clustering algorithms (fair unnormalized spectral clustering (FSCUN) [29], fair normalized spectral clustering (FSCN) [29], fair algorithms for clustering (FAC) [28]), and the fair ensemble clustering method (FCE) proposed by Zhou et al. For our method, we constructed a pool of 100 base clusterings, generated by varying the initialization centroids of K-means and the number of clusters. The random seed for cluster center initialization was fixed at 2025. The number of clusters was sampled from the interval . In each trial, we randomly selected 20 base clusterings to execute our method, and repeated this process 10 times without parameter adjustments to compute the mean and variance. The optimal settings for the hyperparameters and were determined through a grid search. Comparative methods were configured according to their original recommended settings, with FCE utilizing K-means generated base clusterings. The details of our method and the comparative methods are summarized in Table 2.
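A sketch of this base-clustering pool protocol using scikit-learn's KMeans follows; the `k_range` default and the per-run seed handling below are illustrative assumptions, since the exact sampling interval is not reproduced here.

```python
import numpy as np
from sklearn.cluster import KMeans

def generate_base_clusterings(X, pool_size=100, k_range=(2, 10), seed=2025):
    """Generate a pool of K-means base clusterings by varying the number of
    clusters and the initialization (k_range is an illustrative placeholder)."""
    rng = np.random.default_rng(seed)
    pool = []
    for _ in range(pool_size):
        k = int(rng.integers(k_range[0], k_range[1] + 1))
        km = KMeans(n_clusters=k, n_init=1,
                    random_state=int(rng.integers(0, 2**31 - 1)))
        pool.append(km.fit_predict(X))
    return pool

def sample_ensemble(pool, m=20, seed=None):
    """Randomly pick m base clusterings from the pool for one ensemble run."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(pool), size=m, replace=False)
    return [pool[i] for i in idx]
```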
Table 2. Comparison of clustering methods in terms of fairness consideration, fairness type, ensemble framework, and complexity.
We employed five evaluation metrics: two external metrics (ACC and NMI) to measure the similarity between clustering results and ground truth labels, thus assessing clustering quality; one internal metric (SSE) to evaluate intra-cluster tightness; and two fairness metrics (BAL and MNCE) to quantify the balance of sensitive attribute distributions across clusters.
For each dataset, the ideal fair state is defined such that the proportion of each sensitive group within every cluster matches its global proportion in the entire dataset, i.e., $r_{s,v} = r_s$ for every group $V_s$ and every cluster $C_v$.
The BAL metric measures the balance between different sensitive groups within clusters, with higher values indicating more equitable distributions. Specifically, BAL is defined as the minimum ratio between the sizes of the smallest and largest sensitive groups in any cluster: $\mathrm{BAL} = \min_{v} \frac{\min_{s} |V_s \cap C_v|}{\max_{s} |V_s \cap C_v|},$
where $\min_{s} |V_s \cap C_v|$ and $\max_{s} |V_s \cap C_v|$ represent the number of instances in the smallest and largest sensitive groups in cluster $C_v$, respectively. Values closer to 1 indicate fairer clustering results.
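A small sketch of this metric under the definition above (integer group and cluster labels assumed):

```python
import numpy as np

def balance(cluster_labels, groups):
    """BAL: min over clusters of (smallest group size) / (largest group size)."""
    bal = 1.0
    group_ids = np.unique(groups)
    for v in np.unique(cluster_labels):
        in_cluster = groups[cluster_labels == v]
        counts = np.array([np.sum(in_cluster == g) for g in group_ids])
        bal = min(bal, counts.min() / counts.max())  # assumes non-empty clusters
    return bal
```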
MNCE evaluates distributional imbalance by comparing the entropy of the sensitive group distributions within clusters against the global entropy of the sensitive group distribution: $\mathrm{MNCE} = \min_{v} \frac{-\sum_{i=1}^{w} \frac{n_{iv}}{n_v} \log \frac{n_{iv}}{n_v}}{-\sum_{i=1}^{w} \frac{n_i}{n} \log \frac{n_i}{n}},$
where $n_{iv}$ denotes the number of instances from the i-th sensitive group in cluster $C_v$, $n_v$ represents the total instances in cluster $C_v$, $n_i$ indicates the total instances in the i-th sensitive group, and n is the overall sample size. Higher MNCE values indicate better fairness by minimizing the unevenness of sensitive attribute distributions.
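Since the exact normalization is not reproduced above, the following sketch only assumes the minimum-over-clusters, entropy-ratio form described in the text:

```python
import numpy as np

def mnce(cluster_labels, groups, eps=1e-12):
    """Sketch of MNCE: min over clusters of H(groups | cluster v) / H(groups)."""
    def entropy(counts):
        p = counts / counts.sum()
        p = p[p > eps]
        return -np.sum(p * np.log(p))

    group_ids = np.unique(groups)
    global_counts = np.array([np.sum(groups == g) for g in group_ids], dtype=float)
    h_global = entropy(global_counts)  # assumed > 0 (more than one group present)
    ratios = []
    for v in np.unique(cluster_labels):
        in_cluster = groups[cluster_labels == v]
        counts = np.array([np.sum(in_cluster == g) for g in group_ids], dtype=float)
        ratios.append(entropy(counts) / h_global)
    return min(ratios)
```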
The convergence condition is set as: the relative change in the objective function value between two consecutive iterations is less than , or the maximum number of iterations is reached.
4.3. Experimental Results and Comparison
As shown in Figure 2, the proposed method introduces a fairness mechanism, which distinguishes it from the original ensemble clustering algorithm PFEC as well as several existing fair clustering algorithms. This mechanism ensures that the distribution of sensitive attributes within each cluster remains consistent with the global distribution. However, incorporating such fairness constraints inevitably affects clustering quality to some degree, a trade-off that becomes especially pronounced when the data distribution is highly correlated with sensitive attributes. For example, on several synthetic datasets, to obtain fairer outcomes, it was necessary to assign some geographically proximate points to different clusters so that the proportion of points possessing each sensitive attribute within every cluster matches the overall distribution. Our objective is to achieve relatively high clustering quality while maintaining strong fairness, keeping points within clusters more compactly grouped rather than sacrificing clustering performance entirely in pursuit of fairness. Empirical evidence from Table 3, Table 4, Table 5, Table 6 and Table 7 indicates that several fair clustering methods underperform on the two-dimensional datasets depicted in Figure 2 with respect to conventional clustering quality metrics. Notwithstanding this, they achieve superior performance in fairness metrics. Compared to the FCE algorithm, our method attains excellent fairness values on the DS577 dataset while simultaneously demonstrating advantages in terms of Accuracy (ACC), Normalized Mutual Information (NMI), and Sum of Squared Errors (SSE). Visual inspection of the results reveals that our method yields more compact and coherent clusters. On the 2d-4c-no0 and 2d-4c-no1 datasets, a marginal deficiency is observed in the fairness metrics of our method relative to FCE. It is noteworthy, however, that this minor discrepancy is attributable to only one or two outlier data points, thereby exerting a negligible impact on the overall results. Furthermore, our method maintains superior performance on standard clustering metrics. On the 2d-4c-no4 dataset, our approach comprehensively surpasses FCE across all evaluated metrics.
Figure 2. Clustering results on the synthetic datasets. Clusters are represented by different colors, and sensitive attributes are indicated by shapes.
Table 3. The ACC(%) values of our algorithm and the comparative algorithms (optimal values in bold, suboptimal values underlined).
Table 4. The NMI(%) values of our algorithm and the comparative algorithms (optimal values in bold, suboptimal values underlined).
Table 5. The SSE values of our algorithm and the comparative algorithms (optimal values in bold, suboptimal values underlined).
Table 6. The BAL values of our algorithm and the comparative algorithms (optimal values in bold, suboptimal values underlined).
Table 7. The MNCE values of our algorithm and the comparative algorithms (optimal values in bold, suboptimal values underlined).
The distribution of data points with sensitive attributes in real-world datasets often diverges significantly from the characteristics of synthetic datasets. As the results demonstrate, traditional algorithms such as K-means and ensemble methods including PFEC, LEGP, LEWA, ECPCS_MC, and ECPCS_HC yield comparatively poor results in terms of ACC and NMI on these real datasets. Additionally, the MNCE values on the HAR and JAFFE datasets are not uniformly zero, which contrasts with their performance on the two-dimensional datasets. This discrepancy suggests that the underlying data structures of these real-world datasets deviate from the property observed in two-dimensional data, where spatially proximate points typically share identical sensitive attributes. On these real-world datasets, our method not only achieves outstanding results in fairness metrics but also attains the best performance in external validation indices, namely ACC and NMI, outperforming all baseline algorithms. Although its performance on the SSE internal metric is inferior to that of standard clustering algorithms, it remains competitive when compared to other fair clustering approaches such as FSCUN, FSCN, FAC, and FCE. In summary, our method exhibits consistent improvements over FCE in both clustering quality and fairness enforcement. Table 8 presents the runtime of the proposed method on various datasets, demonstrating its computational efficiency under different data scales and complexities.
Table 8. Runtime of the Proposed Method on Various Datasets (Unit: seconds).
4.4. Sensitivity Analysis
To investigate the sensitivity of the model to key hyperparameters, we conducted a systematic sensitivity analysis on the regularization parameters and . The experiment was carried out on several synthetic datasets and the real-world facial dataset JAFFE. By varying the values of and , we observed their effects on the model’s classification accuracy (ACC) and balance (BAL). The experimental results are shown in Figure 3.
Figure 3. Sensitivity analysis.
As can be seen from the figure, the model exhibits good robustness to changes in and on the synthetic datasets. Both ACC and BAL remain relatively stable under different parameter settings. This indicates that the synthetic data distributions are relatively uniform and the model is less dependent on fairness constraints.
However, on the real-world dataset JAFFE, the model’s performance shows a notably different trend. The results reveal that the BAL metric improves significantly only when is set to a relatively large value, which corresponds to imposing a stronger fairness constraint, while ACC remains at a high level. This suggests that when dealing with real-world data bias, strengthening the fairness constraint is crucial for improving the fairness of model decisions. Therefore, in subsequent experiments on real-world datasets, we set to a larger value to better balance model performance and fairness.
In summary, the sensitivity analysis not only verifies the stability of the model on synthetic data but also highlights the necessity of adjusting fairness constraints in real-world scenarios, providing an experimental basis for hyperparameter selection.
4.5. Ablation Study
To verify the practical effect of the fairness constraint in ensemble clustering, we conduct an ablation study to investigate its impact in real-world scenarios. HAR and MNIST-USPS are selected for the experiment, representing sensor-based activity recognition and cross-domain handwritten digit classification tasks, respectively. Both datasets possess clear sensitive attributes and inherent structural complexity, making them closer to actual application environments. In the objective function, the second constraint controlled by the parameter plays a structural role and cannot be omitted; therefore, is maintained in all experiments. This study mainly compares the full method (Ours) with a version that removes the fairness constraint. The results are summarized in Table 9. From the experimental outcomes, it can be observed that after introducing the fairness constraint, the fairness metrics BAL and MNCE on both datasets are significantly improved, indicating that the model can effectively promote a balanced distribution of each sensitive group in the clustering. Meanwhile, the clustering quality metrics ACC and NMI also show improvement on HAR and MNIST-USPS, suggesting that the fairness constraint does not harm the overall consistency of the clustering structure and may even bring some structural optimization due to the re-balancing of group distributions in real data. Notably, on the MNIST-USPS dataset, the SSE value increases after adding the fairness constraint, implying a slight decrease in intra-cluster compactness. This may stem from the inconsistency between the distribution of the sensitive attribute and the clustering structure in this dataset; the fairness constraint causes a slight perturbation to the original geometric structure when adjusting group proportions. Overall, the ablation study demonstrates that the introduced fairness constraint can notably enhance group fairness in clustering results and, in most cases, does not significantly degrade clustering quality. It may even lead to improved overall performance in real-world scenarios, indicating that the method achieves a good balance between fairness and clustering effectiveness, and possesses potential for practical applications.
Table 9. Ablation study of the objective function.
5. Conclusions
In this paper, we have presented a fair ensemble clustering method that extends the framework of Nie et al. [6] by incorporating proportional fairness constraints. Our key contribution is the integration methodology, which embeds fairness directly into the ensemble objective and optimizes it jointly with clustering quality. By introducing a regularization term, it ensures that the distribution of protected groups within each cluster closely aligns with their overall distribution in the dataset while also balancing cluster sizes to prevent extreme imbalances. Unlike conventional post-processing techniques, this method embeds fairness constraints directly into the objective function and employs an iterative optimization strategy to achieve both group fairness and a high clustering quality. Experimental results demonstrate that the proposed approach consistently delivers robust and fair clustering outcomes across datasets of varying sizes, establishing a solid foundation for its application in fairness-sensitive domains. In future work, we will explore how fairness-related information is embedded within the similarity matrix. Specifically, we will investigate the underlying mechanisms that govern the presence and representation of fairness information in the matrix, with the goal of enhancing the interpretability and effectiveness of fairness-aware clustering methods. Nevertheless, our method primarily focuses on group fairness and may face challenges with extremely high-dimensional or large-scale data due to the iterative optimization of the label matrix. Future work could extend this framework to individual fairness notions or integrate fairness constraints into other ensemble mechanisms, such as graph-based or deep ensemble clustering frameworks.
Author Contributions
Conceptualization, Y.L., R.F. and C.Z.; Methodology, Y.L., R.F. and C.Z.; Software, Y.L.; Formal analysis, Y.L.; Investigation, Y.L.; Writing—original draft, Y.L. and R.F.; Writing—review & editing, Y.L. All authors have read and agreed to the published version of the manuscript.
Funding
This work was supported by the National Natural Science Foundation of China (Grant No. 62172242), the Ningbo Science and Technology Fund (Grant Nos. 2023Z228, 2023Z213, 2024Z126, 2024Z202, and 2024Z119), and the Ningbo Key Laboratory of Intelligent Home Appliances, College of Science and Technology, Ningbo University.
Data Availability Statement
The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.
Conflicts of Interest
The authors declare no conflict of interest.
References
- Jain, A.K.; Murty, M.N.; Flynn, P.J. Data clustering: A review. ACM Comput. Surv. (CSUR) 1999, 31, 264–323. [Google Scholar] [CrossRef]
- Wang, L.; Pan, C. Robust level set image segmentation via a local correntropy-based K-means clustering. Pattern Recognit. 2014, 47, 1917–1925. [Google Scholar] [CrossRef]
- Chen, D.; Lv, J.; Yi, Z. Graph Regularized Restricted Boltzmann Machine. IEEE Trans. Neural Netw. Learn. Syst. 2018, 29, 2651–2659. [Google Scholar] [CrossRef]
- Tunali, V.; Bilgin, T.; Camurcu, A. An improved clustering algorithm for text mining: Multi-cluster spherical K-Means. Int. Arab. J. Inf. Technol. (IAJIT) 2016, 13, 12–19. [Google Scholar]
- Zhang, M. Weighted clustering ensemble: A review. Pattern Recognit. 2022, 124, 108428. [Google Scholar] [CrossRef]
- Xie, F.; Nie, F.; Yu, W.; Li, X. Parameter-free ensemble clustering with dynamic weighting mechanism. Pattern Recognit. 2024, 151, 110389. [Google Scholar] [CrossRef]
- Fred, A.; Jain, A. Data clustering using evidence accumulation. In Proceedings of the International Conference on Pattern Recognition, Quebec City, QC, Canada, 11–15 August 2002; Volume 4, pp. 276–280. [Google Scholar] [CrossRef]
- Ayad, H.G.; Kamel, M.S. On voting-based consensus of cluster ensembles. Pattern Recognit. 2010, 43, 1943–1953. [Google Scholar] [CrossRef]
- Fred, A.L.; Jain, A.K. Combining multiple clusterings using evidence accumulation. IEEE Trans. Pattern Anal. Mach. Intell. 2005, 27, 835–850. [Google Scholar] [CrossRef]
- Huang, D.; Wang, C.D.; Wu, J.S.; Lai, J.H.; Kwoh, C.K. Ultra-Scalable Spectral Clustering and Ensemble Clustering. IEEE Trans. Knowl. Data Eng. 2020, 32, 1212–1226. [Google Scholar] [CrossRef]
- Jia, Y.; Liu, H.; Hou, J.; Zhang, Q. Clustering ensemble meets low-rank tensor approximation. In Proceedings of the AAAI Conference on Artificial Intelligence, Online, 2–9 February 2021; Volume 35, pp. 7970–7978. [Google Scholar]
- Wang, T. CA-Tree: A hierarchical structure for efficient and scalable coassociation-based cluster ensembles. IEEE Trans. Syst. Man, Cybern. Part B (Cybern.) 2010, 41, 686–698. [Google Scholar] [CrossRef]
- Cai, Y.; Mahmud, M.S.; Xu, J.; Sun, X.; Huang, J.Z. Spectral ensemble clustering with doubly stochastic co-association matrix. Inf. Sci. 2025, 686, 121314. [Google Scholar] [CrossRef]
- Wu, Y.; Wu, R.; Liu, J.; Tang, X. Metawce: Learning to weight for weighted cluster ensemble. Inf. Sci. 2023, 629, 39–61. [Google Scholar] [CrossRef]
- Jia, Y.; Tao, S.; Wang, R.; Wang, Y. Ensemble Clustering via Co-Association Matrix Self-Enhancement. IEEE Trans. Neural Netw. Learn. Syst. 2024, 35, 11168–11179. [Google Scholar] [CrossRef] [PubMed]
- Zhou, P.; Du, L.; Shen, Y.D.; Li, X. Tri-level robust clustering ensemble with multiple graph learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Online, 2–9 February 2021; Volume 35, pp. 11125–11133. [Google Scholar]
- Chen, M.S.; Lin, J.Q.; Wang, C.D.; Xi, W.D.; Huang, D. On regularizing multiple clusterings for ensemble clustering by graph tensor learning. In Proceedings of the 31st ACM International Conference on Multimedia, Ottawa, ON, Canada, 29 October–3 November 2023; pp. 3069–3077. [Google Scholar]
- Zhou, P.; Du, L.; Li, X. Self-paced consensus clustering with bipartite graph. In Proceedings of the 29th International Conference on International Joint Conferences on Artificial Intelligence, Macao, China, 19–25 August 2021; pp. 2133–2139. [Google Scholar]
- Yang, X.; Zhao, W.; Wang, J.; Peng, S.; Nie, F. Auto-weighted Graph Reconstruction for efficient ensemble clustering. Inf. Sci. 2025, 689, 121486. [Google Scholar] [CrossRef]
- Bian, Z.; Yu, L.; Qu, J.; Deng, Z.; Wang, S. An Ensemble Clustering Method via Learning the CA Matrix with Fuzzy Neighbors. Inf. Fusion 2025, 120, 103105. [Google Scholar] [CrossRef]
- Bai, L.; Liang, J.; Cao, F. A multiple k-means clustering ensemble algorithm to find nonlinearly separable clusters. Inf. Fusion 2020, 61, 36–47. [Google Scholar] [CrossRef]
- Zhang, R.; Hang, S.; Sun, Z.; Nie, F.; Wang, R.; Li, X. Anchor-based fast spectral ensemble clustering. Inf. Fusion 2025, 113, 102587. [Google Scholar] [CrossRef]
- Huang, D.; Wang, C.D.; Lai, J.H. Fast multi-view clustering via ensembles: Towards scalability, superiority, and simplicity. IEEE Trans. Knowl. Data Eng. 2023, 35, 11388–11402. [Google Scholar] [CrossRef]
- Mehrabi, N.; Morstatter, F.; Saxena, N.; Lerman, K.; Galstyan, A. A survey on bias and fairness in machine learning. ACM Comput. Surv. (CSUR) 2021, 54, 1–35. [Google Scholar] [CrossRef]
- Caton, S.; Haas, C. Fairness in machine learning: A survey. ACM Comput. Surv. 2024, 56, 1–38. [Google Scholar] [CrossRef]
- Chierichetti, F.; Kumar, R.; Lattanzi, S.; Vassilvitskii, S. Fair clustering through fairlets. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Volume 30. [Google Scholar]
- Backurs, A.; Indyk, P.; Onak, K.; Schieber, B.; Vakilian, A.; Wagner, T. Scalable fair clustering. In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; pp. 405–413. [Google Scholar]
- Bera, S.; Chakrabarty, D.; Flores, N.; Negahbani, M. Fair algorithms for clustering. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 8–14 December 2019; Volume 32. [Google Scholar]
- Kleindessner, M.; Samadi, S.; Awasthi, P.; Morgenstern, J. Guarantees for spectral clustering with fairness constraints. In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; pp. 3458–3467. [Google Scholar]
- Ghadiri, M.; Samadi, S.; Vempala, S. Socially fair k-means clustering. In Proceedings of the ACM Conference on Fairness, Accountability, and Transparency, Online, 3–10 March 2021; pp. 438–448. [Google Scholar]
- Kleindessner, M.; Awasthi, P.; Morgenstern, J. Fair k-center clustering for data summarization. In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; pp. 3448–3457. [Google Scholar]
- Zhou, P.; Li, R.; Ling, Z.; Du, L.; Liu, X. Fair Clustering Ensemble With Equal Cluster Capacity. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 47, 1729–1746. [Google Scholar] [CrossRef] [PubMed]
- Fan, K. On a theorem of Weyl concerning eigenvalues of linear transformations I. Proc. Natl. Acad. Sci. USA 1949, 35, 652–655. [Google Scholar] [CrossRef] [PubMed]
- Shi, J.; Malik, J. Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2000, 22, 888–905. [Google Scholar] [CrossRef]
- Nie, F.; Lu, J.; Wu, D.; Wang, R.; Li, X. A novel normalized-cut solver with nearest neighbor hierarchical initialization. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 46, 659–666. [Google Scholar] [CrossRef]
- Anguita, D.; Ghio, A.; Oneto, L.; Parra, X.; Reyes-Ortiz, J.L. A public domain dataset for human activity recognition using smartphones. In Proceedings of the European Symposium on Artificial Neural Networks (ESANN), Bruges, Belgium, 24–26 April 2013; Volume 3, pp. 3–4. [Google Scholar]
- Li, P.; Zhao, H.; Liu, H. Deep fair clustering for visual learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 9070–9079. [Google Scholar]
- Lyons, M.J.; Akamatsu, S.; Kamachi, M.; Gyoba, J.; Budynek, J. The Japanese female facial expression (JAFFE) database. In Proceedings of the 3rd International Conference on Automatic Face and Gesture Recognition, Nara, Japan, 14–16 April 1998; pp. 14–16. [Google Scholar]
- Huang, D.; Wang, C.D.; Lai, J.H. Locally weighted ensemble clustering. IEEE Trans. Cybern. 2017, 48, 1460–1473. [Google Scholar] [CrossRef]
- Huang, D.; Wang, C.D.; Peng, H.; Lai, J.; Kwoh, C.K. Enhanced ensemble clustering via fast propagation of cluster-wise similarities. IEEE Trans. Syst. Man Cybern. Syst. 2018, 51, 508–520. [Google Scholar] [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).