Article

Fuzzy Clustering Approaches Based on Numerical Optimizations of Modified Objective Functions

by Erind Bedalli 1,2,*, Shkelqim Hajrulla 1, Rexhep Rada 2 and Robert Kosova 3
1 Department of Computer Engineering, Epoka University, 1039 Tirana, Albania
2 Department of Informatics, University of Elbasan “Aleksandër Xhuvani”, 3001 Elbasan, Albania
3 Department of Mathematics, University “Aleksander Moisiu” Durres, 2001 Durres, Albania
* Author to whom correspondence should be addressed.
Algorithms 2025, 18(6), 327; https://doi.org/10.3390/a18060327
Submission received: 14 April 2025 / Revised: 22 May 2025 / Accepted: 23 May 2025 / Published: 29 May 2025
(This article belongs to the Section Algorithms for Multidisciplinary Applications)

Abstract

Fuzzy clustering is a form of unsupervised learning that assigns the elements of a dataset to multiple clusters with varying degrees of membership rather than assigning them to a single cluster. The classical Fuzzy C-Means algorithm operates as an iterative procedure that minimizes an objective function defined on the weighted distances between each point and the cluster centers. The algorithm performs decently on many datasets but struggles with datasets that exhibit irregularities such as overlapping clusters and varying shapes, densities, and sizes of clusters. Meanwhile, there is a growing demand for accurate and scalable clustering techniques, especially in high-dimensional data analysis. This research work aims to address these shortcomings of the classical fuzzy clustering algorithm by applying several modification approaches to the objective function of this algorithm. These modifications introduce several regularization terms intended to make the algorithm more robust on specific types of datasets. The optimization of the modified objective functions is handled by several numerical methods: gradient descent, root mean square propagation (RMSprop), and adaptive moment estimation (Adam). These methods are implemented in a Python environment, and extensive experimental studies are conducted, carefully following the steps of dataset selection, algorithm implementation, hyper-parameter tuning, selection of the evaluation metrics, and analysis of the results. A comparison of the features of these algorithms on various datasets is carefully summarized.

1. Introduction

Unsupervised learning is a sub-domain of machine learning where the algorithms operate on data without prior information being provided in the form of labeled responses or predefined classes. Several important forms of unsupervised learning have been developed and successfully applied, including clustering, association rule mining, latent variable models, anomaly detection, etc. Clustering aims at finding unknown internal structures in datasets, grouping the data instances based on their similarities with each other [1]. Besides the classical crisp clustering methods, another significant domain of clustering is soft clustering, where the boundaries between the clusters do not have to be crisply separated; thus, overlapping clusters and partial memberships of data instances in several clusters occur naturally. An important form of soft clustering is Fuzzy C-Means (FCM), a clustering algorithm that extends the classical K-Means algorithm, allowing partial memberships of data instances in multiple clusters with varying membership degrees [2]. The algorithm operates based on the minimization of an objective function expressed as the sum of weighted distances between data instances and cluster centroids. The modus operandi of the fuzzy clustering algorithm is an iterative procedure where the new centroids are re-evaluated based on the current membership degrees of each data instance to the clusters, and the memberships are updated based on the distances of each data instance from the new centroids [3]. This dual updating procedure continues until the changes fall below a predefined threshold, thus achieving convergence. Finally, each data instance is assigned a membership degree, which is a real value in the [0, 1] interval, thus allowing for a nuanced representation of the intrinsic structures of the dataset. Applications of fuzzy clustering extend to a myriad of disciplines, including image processing, customer profiling, bioinformatics, text organization, etc. [4].
Despite its overall utility, the classical Fuzzy C-Means algorithm may struggle with datasets that are characterized by complex distributions, such as clusters of varying sizes or densities, overlapping clusters, and the presence of noise and outliers. On such occasions, the application of the standard Fuzzy C-Means algorithm may lead to suboptimal clustering outcomes, where clusters are poorly defined and the membership values of data instances may not be properly assigned into the clusters [5]. To address these issues, ideas of modifying the objective function have been occasionally suggested in the machine learning research community, incorporating additional parameters, employing different distance measures, and adjusting penalization terms [6]. For instance, the usage of regularization parameters can diminish the influence of noise and outliers, thus improving the stability of the clustering procedure [7]. Furthermore, the incorporation of spatial characteristics or density-related information can assist in providing a better definition of the clusters and refine the adjustments for the overlapping clusters [8].
The idea of modifying the objective function of Fuzzy C-Means or K-Means in order to obtain new versions of the respective algorithms that perform better on specific datasets is well known in the machine learning community. There is a variety of modification forms, and there are several ways in which each modification can be turned into an iterative procedure [9]. Once the objective function is modified, the iterative procedure for updating the centers of the clusters and the membership values should be properly adapted. There are two main approaches towards the optimization of the modified objective function: employing Lagrange multipliers or applying numerical optimization directly to the modified objective function. When Lagrange multipliers are employed, an iterative procedure is generated for updating the centers of the clusters and the membership values, but the generation of this procedure may become a challenging differentiation problem for sophisticated modifications of the objective function [10]. Conversely, a numerical optimization algorithm can be applied directly to the objective function without requiring an explicit analytical calculation of the partial derivatives.
In this research work, several modification forms of the objective function will be discussed, providing the overall motivation for each modification. The modified objective functions will be optimized based on the gradient descent numerical method and some of its variants, like root mean square propagation (RMSprop) and adaptive moment estimation (Adam). A diverse set of experimental procedures will be conducted on the modified algorithms, intending to expose the strengths and the vulnerabilities of each algorithm. Working with numerical optimization will provide more flexibility in the spectrum of modifications applicable to the objective functions, as no analytical evaluations will be demanded [11].

2. Classical FCM and Modifications on Its Objective Function

The classical Fuzzy C-Means algorithm (FCM), developed by Dunn and refined by J. C. Bezdek, is a cornerstone of fuzzy cluster analysis. It is a generalization of the well-known K-Means algorithm, involving the notion of a fuzziness degree and consequently allowing partial memberships of the data instances in the generated clusters. The algorithm is fundamentally an unsupervised procedure, as no prior information (in the form of predefined classes or labeled responses) is provided. FCM works as an iteration scheme, aiming to achieve a nonlinear optimization of an objective function defined as
J(X, U, V; m, C) = \sum_{i=1}^{n} \sum_{j=1}^{C} \mu_{ij}^{m} \, d^{2}(x_i, c_j)    (1)
Here, X denotes the dataset, U denotes the membership matrix, and V denotes the cluster centers. The hyper-parameter m is the fuzziness degree (or fuzzy exponent), controlling the level of fuzziness in the generated clusters [12]. Generally, it is given values larger than 1 (but not larger than 2.5); the greater the value of m, the fuzzier and less distinct the generated clusters. In addition, the hyper-parameter C represents the number of clusters, n is the size of the dataset, x_i is the i-th instance, c_j is the center of the j-th cluster, and \mu_{ij} is the membership degree of the i-th element in the j-th cluster. Two other hyper-parameters of this algorithm are the tolerance threshold Tol and the distance metric d [13].
The objective function is optimized using Lagrange multipliers to form a dual updating scheme that re-calculates the cluster centers and the membership degrees of the data instances into the clusters, as in the following pseudo-code [14]:
  1. Initialize the centers of the clusters, assigning as values random data instances.
  2. Initialize the partition matrix (assigning 0 to all its entries).
  3. Set k = 1 (number of iterations).
  4. Evaluate the d_{ij} values (i.e., the distances of instances from cluster centers).
  5. Update the membership values as \mu_{ij} = \frac{d_{ij}^{-2/(m-1)}}{\sum_{l=1}^{C} d_{il}^{-2/(m-1)}}
  6. Update the centers of the clusters as c_j = \frac{\sum_{i=1}^{n} \mu_{ij}^{m} x_i}{\sum_{i=1}^{n} \mu_{ij}^{m}}
  7. k = k + 1 (increment the number of iterations).
  8. If \|U^{(k)} - U^{(k-1)}\| > Tol, jump to step 4.
  9. END.
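For concreteness, the following is a minimal NumPy sketch of this iterative scheme; it is illustrative code written for this description (the function name fcm and its defaults are ours), not the implementation used in the experiments.

```python
import numpy as np

def fcm(X, C, m=2.0, tol=1e-5, max_iter=300, seed=0):
    """Classical FCM: dual update of memberships and centers until convergence."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    centers = X[rng.choice(n, size=C, replace=False)]    # step 1: random data instances as centers
    U = np.zeros((n, C))                                  # step 2: partition matrix
    for _ in range(max_iter):
        d2 = np.square(X[:, None, :] - centers[None, :, :]).sum(axis=2)
        d2 = np.fmax(d2, 1e-12)                           # numerical guard against zero distances
        U_new = d2 ** (-1.0 / (m - 1.0))                  # step 5: membership update
        U_new /= U_new.sum(axis=1, keepdims=True)
        W = U_new ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]      # step 6: weighted-mean centers
        if np.linalg.norm(U_new - U) <= tol:              # step 8: convergence test
            return U_new, centers
        U = U_new
    return U, centers
```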
Although the FCM algorithm has a multitude of successful applications, there are also many datasets exhibiting complex distributions, such as overlapping clusters and variety in the size and density of the clusters, where FCM underperforms, generating clusters that are poorly defined and membership values that are not properly assigned to the clusters [15]. In this research work, we are going to consider several modifications (described in the following paragraphs) that are novel either in the way the objective function is modified or in the way the iterative procedure is constructed. In each modification, an extra term is added to the objective function of the original FCM, aiming to divert the tendencies of the original algorithm and make it more adaptable to certain types of datasets. In all the cases, the coefficient λ, known as the regularization weight, controls the strength of the additional term, thus representing a hyper-parameter that may be tuned.

2.1. An L2 Regularization Term

One of the modifications of the objective function is adding an L2 regularization term to discourage the centroids of the clusters from taking large values, thus helping to avoid overfitting. The objective function becomes:
J(X, U, V; m, C, \lambda) = \sum_{i=1}^{n} \sum_{j=1}^{C} \mu_{ij}^{m} \, d^{2}(x_i, c_j) + \lambda \sum_{i=1}^{n} \sum_{j=1}^{C} \mu_{ij}^{m} \, \|c_j\|^{2}    (2)
Here, \|c_j\|^{2} represents the squared Euclidean norm of the j-th cluster center, thus penalizing centers that lie far from the origin. This modification is very useful when the data have previously been normalized using z-score or min–max normalization, which centers the original data points. The application of the regularization term encourages simpler models, typically leading to better generalization and less susceptibility to noise and outliers [16]. Furthermore, this modification makes the gradient descent procedure more stable, as the regularization adds a smooth constraint to the optimization framework (helping to avoid poor local minima). The idea of this modification has been used in hard clustering by W. Sun et al. to address the high-dimensional noise anomaly.
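As an illustration, a minimal NumPy sketch of this term and of its partial derivative with respect to the memberships might look as follows; the function names are ours, and the code is not taken from the authors' implementation.

```python
import numpy as np

def l2_term(U, V, m):
    # R(U, V) = sum_i sum_j mu_ij^m * ||c_j||^2, with U (n x C) memberships and V (C x d) centers
    return np.sum((U ** m) * np.sum(V ** 2, axis=1)[None, :])

def l2_term_grad_U(U, V, m):
    # dR/dmu_ij = m * mu_ij^(m-1) * ||c_j||^2
    return m * (U ** (m - 1)) * np.sum(V ** 2, axis=1)[None, :]
```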

2.2. An Entropy Term

Another modification of the objective function involves an entropy term, which would stimulate more balanced membership degrees of data instances into the clusters:
J(X, U, V; m, C, \lambda) = \sum_{i=1}^{n} \sum_{j=1}^{C} \mu_{ij}^{m} \, d^{2}(x_i, c_j) + \lambda \sum_{i=1}^{n} \sum_{j=1}^{C} \mu_{ij} \log \mu_{ij}    (3)
This term would penalize extreme membership values (close to 1 or close to 0), so influencing the data to be more distributed into the clusters. This is of particular usefulness in scenarios where data instances are expected to belong to multiple clusters, avoiding the tendency towards one or a small number of clusters in datasets varying in density and spread of the intrinsic clusters [17]. The idea has been used in several works, mostly in the form of a cluster validity measure, but also in the form of objective function modification being optimized via Lagrange multipliers rather than based on numerical methods [18].
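A small sketch of this term and of its membership gradient could be written as below; the eps guard against log(0) is an implementation choice of ours, not part of Equation (3).

```python
import numpy as np

def entropy_term(U, eps=1e-12):
    # R(U) = sum_i sum_j mu_ij * log(mu_ij), the term weighted by lambda in Equation (3)
    return np.sum(U * np.log(U + eps))

def entropy_term_grad_U(U, eps=1e-12):
    # dR/dmu_ij = log(mu_ij) + 1
    return np.log(U + eps) + 1.0
```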

2.3. A Sparsity-Inducing Term

Contrary to the previous approach, it is possible to modify the objective function by adding a sparsity-inducing term, which will manifest a tendency towards crisper and more interpretable clusters:
J(X, U, V; m, C, \lambda) = \sum_{i=1}^{n} \sum_{j=1}^{C} \mu_{ij}^{m} \, d^{2}(x_i, c_j) + \lambda \sum_{i=1}^{n} \sum_{j=1}^{C} \mu_{ij}    (4)
This penalization term would cause the data instances to belong to fewer clusters but with higher membership values. This is of considerable usefulness for feature selection in high-dimensional datasets or interpretability in scenarios like image analysis, customer segmentation, etc. [19]. Several variants of this idea have been presented before, primarily aiming to rectify the robustness in the presence of noise and outliers and the clustering of high-dimensional data.
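A corresponding sketch for this term (illustrative code, with names of our choosing) is:

```python
import numpy as np

def sparsity_term(U):
    # R(U) = sum_i sum_j mu_ij, an L1-type penalty since the memberships are non-negative
    return np.sum(U)

def sparsity_term_grad_U(U):
    # dR/dmu_ij = 1 for every entry
    return np.ones_like(U)
```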

2.4. A Penalty for Large-Sized Clusters

This modification of the objective function aims at keeping the generated clusters at comparable sizes to each other. It involves a quadratic penalty on the membership values, which discourages any single cluster from accumulating an excessively large share of the total membership:
J(X, U, V; m, C, \lambda) = \sum_{i=1}^{n} \sum_{j=1}^{C} \mu_{ij}^{m} \, d^{2}(x_i, c_j) + \lambda \sum_{i=1}^{n} \sum_{j=1}^{C} \mu_{ij}^{2}    (5)
The squared sum of the membership values acts effectively as a penalty for clusters that grow large by accumulating a high sum of memberships. The idea of penalizing large clusters has been employed multiple times, and it falls within the more general idea of clustering with constraints [20]. The squared sum of memberships is one of the approaches in this context aiming to ensure that the generated clusters remain balanced.
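A sketch of this penalty and of its membership gradient (again illustrative code, not the authors') is:

```python
import numpy as np

def size_penalty_term(U):
    # R(U) = sum_i sum_j mu_ij^2; clusters accumulating large membership mass are penalized
    return np.sum(U ** 2)

def size_penalty_grad_U(U):
    # dR/dmu_ij = 2 * mu_ij
    return 2.0 * U
```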

2.5. A Spatial Regularization Term

This modification of the objective function aims at keeping the generated clusters spatially smooth. It involves a spatial regularization term, which encourages points that are close to each other to have similar membership values in the clusters.
J(X, U, V; m, C, \lambda) = \sum_{i=1}^{n} \sum_{j=1}^{C} \mu_{ij}^{m} \, d^{2}(x_i, c_j) + \lambda \sum_{i=1}^{n} \sum_{j=1}^{C} \sum_{k \in N(i)} (\mu_{ij} - \mu_{kj})^{2}    (6)
Here, N ( i ) represents the neighbourhood set of the data instance x i , i.e., the set of points that are in spatial proximity to x i . This may be determined by an additional radius parameter (the radius of proximity) or naturally based on the adjacency concept in the representative data structure, such as adjacent pixels in an image [21]. The coefficient λ controls the strength of the spatial regularization term, thus representing a hyper-parameter that may be tuned. The idea of this modification has been previously used in image analysis, adapting it to the specific scenarios rather than aiming to provide a more general approach [22].
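A possible NumPy sketch of this term, assuming the neighbourhoods are supplied as a list of index arrays (neighbors[i] holding the indices of the points in N(i)), is given below; the gradient accounts only for the pair terms anchored at instance i, which doubles under symmetric neighbourhoods.

```python
import numpy as np

def spatial_term(U, neighbors):
    # R(U) = sum_i sum_j sum_{k in N(i)} (mu_ij - mu_kj)^2
    total = 0.0
    for i, nbrs in enumerate(neighbors):
        if len(nbrs) > 0:
            total += np.sum((U[i] - U[nbrs]) ** 2)
    return total

def spatial_term_grad_U(U, neighbors):
    # dR/dmu_ij = 2 * sum_{k in N(i)} (mu_ij - mu_kj), counting only terms anchored at i
    G = np.zeros_like(U)
    for i, nbrs in enumerate(neighbors):
        if len(nbrs) > 0:
            G[i] = 2.0 * np.sum(U[i] - U[nbrs], axis=0)
    return G
```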

3. Optimization Based on Numerical Techniques

The classical approach based on Lagrange multipliers employs extra variables in order to seamlessly integrate the constraints into the optimization procedure, thus guaranteeing that the constraints are continuously satisfied throughout the procedure. Afterwards, through partial differentiation, the iterative procedure is constructed. The generation of this procedure is well known for the classical forms of the objective function, but it may become a challenging problem for sophisticated modifications of the objective function. In these circumstances, numerical optimization methods may be utilized, providing improved flexibility [23]. In this work, three numerical optimization methods are employed: gradient descent, root mean square propagation (RMSprop) and adaptive moment estimation (Adam). In the following sections, for each of these methods the general theoretical principles are explained; afterwards, the applications of these principles for the optimization of the modified objective functions for fuzzy clustering are elaborated. For each numerical method, the iterative workflow for optimizing the objective function is provided.

3.1. Gradient Descent

Gradient descent is one of the most widely used optimization techniques, especially in the context of machine learning. The fundamental principle of this technique is to iteratively update parameters of interest by tracking the direction of the steepest descent in the objective function that is intended to be minimized. The gradient (i.e., partial derivative) of the objective function with respect to parameters of interest is evaluated, and the parameter values are updated by moving in the opposite direction of the gradient [24].
The learning rate (i.e., the step size) is a parameter that controls the size of each update step. For smaller values of the learning rate, the convergence is slower, but overshooting the minimum is avoided. For larger values of the learning rate, convergence is reached faster, but there is an increased risk of instability [25].
The gradient descent technique is helpful in designing the iterative procedure for complex modifications of the objective function. The general form of the modified objective functions considered in this paper is
J_{Modified} = J_{FCM} + \lambda R(U, c_1, c_2, \ldots, c_C) = \sum_{i=1}^{n} \sum_{j=1}^{C} \mu_{ij}^{m} \, d^{2}(x_i, c_j) + \lambda R(U, c_1, c_2, \ldots, c_C)    (7)
Therefore, based on the gradient descent technique, the memberships of the data instances in the clusters are updated according to the equation:
\mu_{ij}^{(k+1)} = \mu_{ij}^{(k)} - \eta \, \frac{\partial J_{Modified}}{\partial \mu_{ij}} = \mu_{ij}^{(k)} - \eta \, m \, \mu_{ij}^{m-1} \, d^{2}(x_i, c_j) - \eta \lambda \, \frac{\partial R}{\partial \mu_{ij}}    (8)
Here, \mu_{ij}^{(k)} represents the membership value at the k-th iteration, \eta represents the learning rate, and \partial denotes the partial derivative operator. Additionally, in order to guarantee the constraint that the memberships are values in [0, 1] summing up to 1 for each instance, an extra normalization step is carried out:
\mu_{ij}^{(k+1)} \leftarrow \frac{\mu_{ij}^{(k+1)}}{\sum_{l=1}^{C} \mu_{il}^{(k+1)}}    (9)
Furthermore, let us consider the distances being evaluated by the Euclidean distance measure, so
d^{2}(x_i, c_j) = \|x_i - c_j\|^{2}
Then, the update of centers based on the gradient descent will be
c_j^{(k+1)} = c_j^{(k)} - \eta \, \frac{\partial J_{Modified}}{\partial c_j} = c_j^{(k)} + 2\eta \sum_{i=1}^{n} \mu_{ij}^{m} (x_i - c_j^{(k)}) - \eta \lambda \, \frac{\partial R}{\partial c_j}    (10)
The general modus operandi of the fuzzy clustering algorithm based on numerical optimization of the modified objective function is similar to that of the classical FCM algorithm, with the main difference lying in the way the dual updates are handled. The procedure can be summarized by the following pseudo-code:
  1. Initialize the centers of the clusters, assigning as values random data instances.
  2. Initialize the partition matrix (assigning 0 to all its entries).
  3. Set k = 1 (number of iterations).
  4. Update the membership values according to Equation (8).
  5. Normalize the membership values according to Equation (9).
  6. Update the centers of the clusters according to Equation (10).
  7. k = k + 1 (increment the number of iterations).
  8. If \|U^{(k)} - U^{(k-1)}\| > Tol, jump to step 4.
  9. END.
The given pseudo-code is generic, thus being applicable to every modification form of the objective function being employed. Depending on the specific type of modification, the details lie in the evaluation of the \partial R / \partial \mu_{ij} and \partial R / \partial c_j terms.
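To illustrate steps 4-6, the following sketch performs one gradient-descent sweep over the memberships and the centers; reg_grad_U and reg_grad_V are placeholders for the \partial R / \partial \mu_{ij} and \partial R / \partial c_j routines of the chosen modification, and the clipping before normalization is an implementation guard we add, not part of Equations (8)-(10).

```python
import numpy as np

def gd_sweep(X, U, V, m, lam, eta, reg_grad_U, reg_grad_V):
    """One dual update of memberships (Eqs. (8)-(9)) and centers (Eq. (10)) by gradient descent."""
    d2 = np.square(X[:, None, :] - V[None, :, :]).sum(axis=2)        # squared Euclidean distances
    grad_U = m * (U ** (m - 1)) * d2 + lam * reg_grad_U(U, V)        # Equation (8)
    U_new = np.clip(U - eta * grad_U, 1e-12, None)
    U_new /= U_new.sum(axis=1, keepdims=True)                        # Equation (9)
    diffs = X[:, None, :] - V[None, :, :]                            # shape (n, C, d)
    grad_V = -2.0 * np.einsum('ij,ijd->jd', U_new ** m, diffs) + lam * reg_grad_V(U_new, V)
    V_new = V - eta * grad_V                                         # Equation (10)
    return U_new, V_new
```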

3.2. Root Mean Square Propagation (RMSprop)

Although gradient descent is one of the most widely used optimization techniques, especially in the context of machine learning, it may suffer from instabilities due to its sensitivity to the learning rate in complex landscapes of the objective functions and its sensitivity to noise. As gradient descent uses a fixed learning rate, this may affect the process when moving through a steep valley, causing oscillations through the sides of the valley or even divergent updates [26]. Overall, a constant learning rate is not appropriate for complex landscapes, as a small value of the learning rate would cause a slow convergence on a flat surface, while a relatively large learning rate would cause oscillations or divergent updates in a steep terrain [27].
The root mean square propagation method modifies the gradient descent method by scaling the learning rate based on the recent values of the gradients. Thus, the learning rate is dynamically adjusted to properly adapt to the current landscape. The weighted quadratic average of the previous gradients is evaluated and applied to scale the learning rate during the iterations, so for a parameter θ, the update would be controlled by the equation:
\theta^{(k+1)} = \theta^{(k)} - \frac{\eta}{\sqrt{E[g^{2}]^{(k)} + \epsilon}} \, g^{(k)}    (11)
Here, \eta represents the learning rate, g^{(k)} represents the gradient at the k-th iteration, E[g^{2}]^{(k)} represents the weighted quadratic mean of the recent gradients, and \epsilon is a small constant utilized to avoid division by zero. This modification of the update rule makes the root mean square propagation method more stable and particularly efficient in optimizing complex objective functions. Furthermore, it operates robustly in circumstances with noisy or fluctuating gradients, thus requiring significantly less tuning of the learning rate [28].
The root mean square propagation method can be applied in optimizing the objective functions for fuzzy clustering, especially the more complex modifications mentioned in the previous sections. So, the gradient with respect to the membership values would be evaluated as
g_{\mu_{ij}}^{(k)} = \frac{\partial J_{Modified}}{\partial \mu_{ij}} = m \, \mu_{ij}^{m-1} \, d^{2}(x_i, c_j) + \lambda \, \frac{\partial R}{\partial \mu_{ij}}    (12)
Furthermore, the weighted quadratic average of the gradients would be evaluated (employing a decay factor β, typically having a value close to 0.9) as
E[g_{\mu_{ij}}^{2}]^{(k+1)} = \beta \, E[g_{\mu_{ij}}^{2}]^{(k)} + (1 - \beta) \, \big(g_{\mu_{ij}}^{(k)}\big)^{2}    (13)
In a very similar way, the principle of the moving average characterizing the root mean squares method is applied for the update of the centers. The gradients of the cluster centers are evaluated based on the partial derivatives of the modified objective functions as
g_{c_j}^{(k)} = \frac{\partial J_{Modified}}{\partial c_j} = -2 \sum_{i=1}^{n} \mu_{ij}^{m} (x_i - c_j) + \lambda \, \frac{\partial R}{\partial c_j}    (14)
The modus operandi of the fuzzy clustering algorithm based on optimization of the objective functions via root mean square propagation remains the same as that of the classical FCM, with the major differences lying in the way the dual updates are handled. The number of steps is naturally increased, as several intermediate computations are necessary both for the calculation of the membership values and for the calculation of the new centers. The procedure can be summarized by the following pseudo-code:
  1. Initialize the centers of the clusters, assigning as values random data instances.
  2. Initialize the partition matrix (assigning 0 to all its entries).
  3. Set k = 1 (number of iterations).
  4. Update the gradients of the memberships according to Equation (12).
  5. Update the weighted quadratic average according to Equation (13).
  6. Update the memberships, applying Equation (11) for \theta = \mu_{ij}, so
     \mu_{ij}^{(k+1)} = \mu_{ij}^{(k)} - \frac{\eta}{\sqrt{E[g_{\mu_{ij}}^{2}]^{(k)} + \epsilon}} \, g_{\mu_{ij}}^{(k)}
  7. Normalize the membership values according to Equation (9).
  8. Update the gradients of the centers according to Equation (14).
  9. Update the centers, applying Equation (11) for \theta = c_j, so
     c_j^{(k+1)} = c_j^{(k)} - \frac{\eta}{\sqrt{E[g_{c_j}^{2}]^{(k)} + \epsilon}} \, g_{c_j}^{(k)}
  10. k = k + 1 (increment the number of iterations).
  11. If \|U^{(k)} - U^{(k-1)}\| > Tol, jump to step 4.
  12. END.
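A compact sketch of one RMSprop sweep, mirroring steps 4-9 above (illustrative code; reg_grad_U and reg_grad_V are placeholders for the \partial R / \partial \mu_{ij} and \partial R / \partial c_j routines, and EgU, EgV carry the running averages between iterations), could be:

```python
import numpy as np

def rmsprop_sweep(X, U, V, EgU, EgV, m, lam, eta, beta, eps, reg_grad_U, reg_grad_V):
    """One RMSprop dual update of memberships and centers (Equations (11)-(14))."""
    d2 = np.square(X[:, None, :] - V[None, :, :]).sum(axis=2)
    gU = m * (U ** (m - 1)) * d2 + lam * reg_grad_U(U, V)            # Equation (12)
    EgU = beta * EgU + (1.0 - beta) * gU ** 2                        # Equation (13)
    U_new = np.clip(U - eta / np.sqrt(EgU + eps) * gU, 1e-12, None)  # Equation (11) for theta = mu_ij
    U_new /= U_new.sum(axis=1, keepdims=True)                        # Equation (9)
    diffs = X[:, None, :] - V[None, :, :]
    gV = -2.0 * np.einsum('ij,ijd->jd', U_new ** m, diffs) + lam * reg_grad_V(U_new, V)  # Equation (14)
    EgV = beta * EgV + (1.0 - beta) * gV ** 2
    V_new = V - eta / np.sqrt(EgV + eps) * gV                        # Equation (11) for theta = c_j
    return U_new, V_new, EgU, EgV
```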

3.3. Adaptive Moment Estimation (Adam)

The adaptive moment estimation (Adam) technique intertwines ideas from root mean square propagation and momentum. It is a popular optimization algorithm with many applications in the domain of machine learning. The algorithm tracks simultaneously two moments: the first moment (i.e., the mean) and the second moment (i.e., the variance) [26]. The rationale behind this approach is that the first moment will assist in maintaining the direction by reducing the oscillations while moving along the landscape, and the second moment will assist in properly adapting the learning rates based on the previous gradient data. The update of the parameters will be handled employing gradients evaluated using both the first and second moments, according to [29]:
\theta^{(k+1)} = \theta^{(k)} - \eta \, \frac{\hat{m}^{(k)}}{\sqrt{\hat{v}^{(k)}} + \epsilon}    (15)
Here, \eta represents the learning rate, \hat{m}^{(k)} represents the first moment at the k-th iteration after bias correction, \hat{v}^{(k)} represents the second moment at the k-th iteration after bias correction, and \epsilon is a small constant utilized to avoid division by zero. The update principles are similar to those of the root mean square propagation method but are now applied twofold [30]:
m^{(k+1)} = \beta_1 \, m^{(k)} + (1 - \beta_1) \, g^{(k)}, \qquad v^{(k+1)} = \beta_2 \, v^{(k)} + (1 - \beta_2) \, \big(g^{(k)}\big)^{2}    (16)
Furthermore, both moments are bias-corrected, yielding
\hat{m}^{(k+1)} = \frac{m^{(k+1)}}{1 - \beta_1^{\,k+1}}, \qquad \hat{v}^{(k+1)} = \frac{v^{(k+1)}}{1 - \beta_2^{\,k+1}}    (17)
When optimizing the modified objective functions based on adaptive moment estimation, the update principles employing two moments will be applied for both updating the membership values and the cluster centers. The primary structure of the algorithm remains the same, consisting of an iterative dual update on membership values and the cluster centers; however, the number of intermediate steps is significantly increased. The complete procedure is described by the following pseudo-code:
  1. Initialize the centers of the clusters, assigning as values random data instances.
  2. Initialize the partition matrix (assigning 0 to all its entries).
  3. Set k = 1 (number of iterations).
  4. Update the gradients of the memberships according to Equation (12).
  5. Update the first and second moments for each membership value according to Equation (16).
  6. Apply the bias correction to both moments of the membership values according to Equation (17).
  7. Update the membership values, applying Equation (15) for \theta = \mu_{ij}, so
     \mu_{ij}^{(k+1)} = \mu_{ij}^{(k)} - \eta \, \frac{\hat{m}_{\mu_{ij}}^{(k+1)}}{\sqrt{\hat{v}_{\mu_{ij}}^{(k+1)}} + \epsilon}
  8. Normalize the membership values according to Equation (9).
  9. Update the gradients of the centers according to Equation (14).
  10. Update the first and second moments of the centers according to Equation (16).
  11. Apply the bias correction to both moments of the centers according to Equation (17).
  12. Update the centers, applying Equation (15) for \theta = c_j, so
     c_j^{(k+1)} = c_j^{(k)} - \eta \, \frac{\hat{m}_{c_j}^{(k+1)}}{\sqrt{\hat{v}_{c_j}^{(k+1)}} + \epsilon}
  13. k = k + 1 (increment the number of iterations).
  14. If \|U^{(k)} - U^{(k-1)}\| > Tol, jump to step 4.
  15. END.
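In the same spirit, one Adam sweep could be sketched as follows (illustrative code; k is the 1-based iteration counter used for the bias correction, and reg_grad_U, reg_grad_V again stand for the partial derivatives of the chosen regularization term):

```python
import numpy as np

def adam_sweep(X, U, V, moments, k, m, lam, eta, b1, b2, eps, reg_grad_U, reg_grad_V):
    """One Adam dual update of memberships and centers (Equations (15)-(17))."""
    mU, vU, mV, vV = moments
    d2 = np.square(X[:, None, :] - V[None, :, :]).sum(axis=2)
    gU = m * (U ** (m - 1)) * d2 + lam * reg_grad_U(U, V)            # Equation (12)
    mU = b1 * mU + (1 - b1) * gU                                     # Equation (16), first moment
    vU = b2 * vU + (1 - b2) * gU ** 2                                # Equation (16), second moment
    mU_hat, vU_hat = mU / (1 - b1 ** k), vU / (1 - b2 ** k)          # Equation (17), bias correction
    U_new = np.clip(U - eta * mU_hat / (np.sqrt(vU_hat) + eps), 1e-12, None)   # Equation (15)
    U_new /= U_new.sum(axis=1, keepdims=True)                        # Equation (9)
    diffs = X[:, None, :] - V[None, :, :]
    gV = -2.0 * np.einsum('ij,ijd->jd', U_new ** m, diffs) + lam * reg_grad_V(U_new, V)  # Equation (14)
    mV = b1 * mV + (1 - b1) * gV
    vV = b2 * vV + (1 - b2) * gV ** 2
    mV_hat, vV_hat = mV / (1 - b1 ** k), vV / (1 - b2 ** k)
    V_new = V - eta * mV_hat / (np.sqrt(vV_hat) + eps)               # Equation (15)
    return U_new, V_new, (mU, vU, mV, vV)
```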

4. Experimental Studies

A series of experimental procedures is carried out to analyze the modified fuzzy clustering algorithms. These algorithms were applied to slightly distorted versions of benchmark datasets from the UCI Machine Learning Repository (where some artificial noise data are added) in order to evaluate their performance. The datasets employed in these studies are Banknotes, Liver, Yeast, Dermatology, Vehicle Silhouettes, and LANDSAT Satellite, with an additional 2–3% of artificial noise points, each randomly positioned at a distance from a cluster center that is 5–15% larger than the distance of the farthest genuine point from that cluster center [31]. These artificially added noise points are labeled according to the cluster the random generation relied on, which is expected to also be the closest cluster, though this is not mathematically guaranteed. As these points typically lie near cluster boundaries, they are expected to exhibit partial memberships across clusters.
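The exact noise-generation routine is not reproduced here; the following sketch is one plausible reading of the description above (the random direction and the fraction parameter are our assumptions):

```python
import numpy as np

def add_noise_points(X, labels, centers, frac=0.025, seed=0):
    """Append roughly 2-3% noise points, each 5-15% farther from a cluster center
    than that cluster's farthest genuine point, in a random direction."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    new_pts, new_lbls = [], []
    for _ in range(max(1, int(frac * n))):
        j = rng.choice(np.unique(labels))                                # pick a cluster
        r_max = np.max(np.linalg.norm(X[labels == j] - centers[j], axis=1))
        direction = rng.normal(size=d)
        direction /= np.linalg.norm(direction)
        new_pts.append(centers[j] + rng.uniform(1.05, 1.15) * r_max * direction)
        new_lbls.append(j)                                               # labeled with the generating cluster
    return np.vstack([X, np.array(new_pts)]), np.concatenate([labels, np.array(new_lbls)])
```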
The Banknotes dataset contains four numerical features derived from wavelet-transformed images of a group of Swiss banknotes, along with the class labels indicating for each banknote whether it is genuine or counterfeit.
The Liver dataset (Indian Liver Patient Dataset) contains two demographic features (age and sex) and eight biochemical test results related to liver health, along with the class label indicating whether the patient is healthy or has a liver disorder.
The Yeast dataset contains eight numerical features representing various physicochemical properties of proteins, such as sequence length and localization signals, as well as the class, which is the sub-cellular localization of proteins across ten possible classes.
The Dermatology dataset contains patient data about some skin diseases manifested with redness and scaling. The dataset comprises 34 clinical attributes, including histological, pathological, and subjective features, used to classify six types of skin diseases.
The Vehicle Silhouettes dataset contains 18 numerical features extracted from the silhouettes of vehicle images. It comprises instances of four classes (Double Decker Bus, Opel Manta 400, Saab 9000, and Chevrolet Van).
The LANDSAT Satellite dataset contains multi-spectral values of pixels from satellite images. It comprises 36 numerical features derived from 3 × 3 pixel neighborhoods in satellite images and the class representing the land cover type (e.g., red soil, grey soil, etc.).
The details of the original datasets, including the number of features, number of instances, and number of clusters, are summarized in Table 1.
Although the categorization into classes for the instances of these datasets is well known, this information will not be provided to the clustering algorithms, so they will operate in an unsupervised way. The process starts with the generation of the clustering results, which are then compared with the known natural classes of the dataset to evaluate the performance of the algorithms. The number of clusters hyper-parameter is set to the respective natural number of clusters of each dataset. The regularization hyper-parameter λ is tuned via grid search among the candidate values [0.1, 0.25, 0.5, 1, 2, 4]. The learning rate is set to 0.01, but it is progressively decreased by 10% in case of divergence. Beyond accuracy assessment, the algorithms will also be examined for their sensitivity to noise, allowing for a more thorough evaluation of their performance. Two criteria are employed for the performance evaluation, which are the fuzzy adjusted Rand index (Fuzzy ARI) and fuzzy normalized mutual information (Fuzzy NMI).
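The tuning protocol can be summarized by a short, hypothetical loop; run_clustering stands for the clustering routine and score_fn for whichever selection criterion is used, both of which are placeholders of ours.

```python
def tune_lambda(X, run_clustering, score_fn, lambdas=(0.1, 0.25, 0.5, 1, 2, 4), eta0=0.01):
    """Grid search over the candidate regularization weights; on divergence the
    learning rate is reduced by 10% and the run is repeated."""
    best_lam, best_score = None, float("-inf")
    for lam in lambdas:
        eta = eta0
        for _ in range(20):                          # cap the number of retries
            U, V, diverged = run_clustering(X, lam=lam, eta=eta)
            if not diverged:
                break
            eta *= 0.9                               # decrease the learning rate by 10%
        score = score_fn(U, V, X)
        if score > best_score:
            best_lam, best_score = lam, score
    return best_lam, best_score
```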
The fuzzy adjusted Rand index is a generalization of the Rand index, adjusting this metric to operate with soft (partial) memberships of the instances in the clusters. The similarity is assessed by comparing the fuzzy partition matrix U with the real class labels C , based on mutual agreements of memberships weighted by the membership degrees. While adjusted Rand index measures are based on hard assignments, Fuzzy ARI employs partial memberships, enabling the capture of soft boundaries present in the fuzzy clustering approaches [32]. If the two compared partitions are U = μ i j and V = v i j , for any pair of instances x k and x l , their similarity is evaluated according to partition U and their similarity according to partition V , respectively, as
s_{kl}^{U} = \sum_{r=1}^{C} \mu_{kr} \, \mu_{lr}
s_{kl}^{V} = \sum_{r=1}^{C} v_{kr} \, v_{lr}
Based on these mutual similarity values, the adjusted fuzzy Rand index (F-ARI) is evaluated as
F\text{-}ARI = \frac{\sum_{k<l} s_{kl}^{U} s_{kl}^{V} - \Big(\sum_{k<l} s_{kl}^{U}\Big)\Big(\sum_{k<l} s_{kl}^{V}\Big) \big/ \binom{n}{2}}{\tfrac{1}{2}\Big(\sum_{k<l} s_{kl}^{U} + \sum_{k<l} s_{kl}^{V}\Big) - \Big(\sum_{k<l} s_{kl}^{U}\Big)\Big(\sum_{k<l} s_{kl}^{V}\Big) \big/ \binom{n}{2}}
The evaluation score varies in the interval [−1, 1], with value 1 indicating perfect matching, value 0 indicating independence (random labelling) and negative values indicating worse than random scenarios.
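A direct NumPy transcription of these formulas (the function name fuzzy_ari is ours; U and V are the two n-by-C membership matrices being compared) could be:

```python
import numpy as np

def fuzzy_ari(U, V):
    """Fuzzy adjusted Rand index between two fuzzy partitions U and V."""
    n = U.shape[0]
    S_U = U @ U.T                             # s_kl^U: pairwise similarities under partition U
    S_V = V @ V.T                             # s_kl^V: pairwise similarities under partition V
    iu = np.triu_indices(n, k=1)              # all pairs with k < l
    su, sv = S_U[iu], S_V[iu]
    n_pairs = n * (n - 1) / 2.0
    expected = su.sum() * sv.sum() / n_pairs
    index = np.sum(su * sv)
    max_index = 0.5 * (su.sum() + sv.sum())
    return (index - expected) / (max_index - expected)
```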
Normalized mutual information (NMI) is a metric used to assess the quality of clustering in cases when the intrinsic structures are available (but not used during the clustering procedure). It evaluates the shared information between the generated clusters and the true classes of the dataset. If the two compared partitions are U = μ i j and V = v i j , then their fuzzy mutual information is evaluated as
FMI = \sum_{i=1}^{n} \sum_{j=1}^{C} \sum_{k=1}^{C} \mu_{ij} \, v_{ik} \, \log \frac{\mu_{ij} \, v_{ik}}{p_{j}^{U} \, p_{k}^{V}}
Here, p_{j}^{U} and p_{k}^{V} represent, respectively, the average membership of cluster j in partition U and the average membership of cluster k in partition V, thus:
p_{j}^{U} = \frac{1}{n} \sum_{i=1}^{n} \mu_{ij}, \qquad p_{k}^{V} = \frac{1}{n} \sum_{i=1}^{n} v_{ik}
On the other hand, the fuzzy normalized mutual information (F-NMI) guarantees that the value is between 0 and 1 by dividing the fuzzy mutual information by the sum of the entropies of the two respective partitions:
F\text{-}NMI = \frac{FMI}{H(U) + H(V)}, \qquad H(U) = -\sum_{i=1}^{n} \sum_{j=1}^{C} \mu_{ij} \log \mu_{ij}, \quad H(V) = -\sum_{i=1}^{n} \sum_{k=1}^{C} v_{ik} \log v_{ik}
NMI is a broad coverage measure as it comprises both how well each cluster corresponds to a real class in the dataset (homogeneity) and how well the instances of a real class in a dataset correspond to a single cluster (completeness) [33]. The final score is normalized, guaranteeing the result to be in the interval [0, 1], making the clustering results more comparable. The value 1 is an indicator of perfect clustering, while the value 0 indicates independence between the real classes and the clustering results. Fuzzy normalized mutual information is a generalized version of NMI, which considers the membership values (treating the clusters as fuzzy sets) instead of hard assignments into clusters [32]. The evaluation proceeds by calculating the joint entropy for the fuzzy membership values.
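The F-NMI computation can be transcribed in the same fashion (illustrative code; the eps guard and the function name are ours):

```python
import numpy as np

def fuzzy_nmi(U, V, eps=1e-12):
    """Fuzzy normalized mutual information between partitions U (clusters) and V (classes)."""
    pU = U.mean(axis=0)                        # p_j^U: average membership of each cluster in U
    pV = V.mean(axis=0)                        # p_k^V: average membership of each class in V
    joint = U[:, :, None] * V[:, None, :]      # mu_ij * v_ik for every instance and pair (j, k)
    ratio = joint / (pU[:, None] * pV[None, :] + eps)
    fmi = np.sum(joint * np.log(ratio + eps))
    hU = -np.sum(U * np.log(U + eps))          # entropy of partition U
    hV = -np.sum(V * np.log(V + eps))          # entropy of partition V
    return fmi / (hU + hV)
```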
Table 2, given below, summarizes the performance evaluation of classical Fuzzy C-means based on the fuzzy adjusted Rand index (F-ARI) and the fuzzy normalized mutual information (F-NMI). The FCM algorithm is applied 20 times, and the results shown in the table are the averages of these evaluations.
Table 3, given below, summarizes the results of the experimental procedures, where the five modified versions of the objective function are applied to each dataset: L2REG (the L2 regularization term explained in Section 2.1), ENTR-T (the entropy term explained in Section 2.2), SP-IN (the sparsity-inducing term explained in Section 2.3), PEN-LSC (the penalty for large-sized clusters explained in Section 2.4), and SPAT-REG (the spatial regularization term explained in Section 2.5). For each modified version, three numerical methods are applied: gradient descent (GD), root mean square propagation (RMSprop), and adaptive moment estimation (Adam). In order to provide a robust evaluation, each method is repeated 20 times, and the values shown in the table are the average scores obtained over these executions.
The Fuzzy ARI and Fuzzy NMI scores are visualized in two line charts, shown in Figure 1 and Figure 2 below. The abbreviations in the chart legends are formed by concatenating the objective function modification abbreviations (L2REG, ENTR-T, SP-IN, PEN-LSC, SPAT-REG) with the abbreviations of the three employed numerical optimization methods (GD, RMSprop, Adam).

5. Discussion

The experimental results demonstrate clearly that modifications to the objective function of the classical Fuzzy C-Means (FCM) algorithm, by applying proper regularization or penalization terms, contribute to the improvement of the quality of the generated clusters. Generally, all five modifications (L2 regularization, entropy term, sparsity-inducing term, penalty for large clusters, and spatial regularization) achieved overall higher Fuzzy ARI and Fuzzy NMI scores than the classical FCM on all datasets, and this holds for all three optimization techniques (gradient descent, root mean square propagation, and adaptive moment estimation). From the summary table and the charts, it can be noted that the spatial regularization (SPAT-REG) and large-cluster penalty (PEN-LSC) modifications typically achieved the highest performance scores, especially when optimized using the Adam method, highlighting their robustness in dealing with noisy or imbalanced data distributions.
The comparison of optimization methods points out that adaptive techniques such as RMSprop and Adam generally perform better than classical gradient descent, offering more stable convergence and less susceptibility to oscillations for sophisticated objective functions. In particular, the adaptive moment estimation (Adam) technique demonstrated very effective results across almost all modifications of the objective function. Moreover, the numerical methods provide flexibility in designing and optimizing complex objective functions without requiring sophisticated derivatives or the Lagrange multipliers approach. A significant disadvantage of the proposed approaches, compared with the classical methods, is the higher computational complexity, which is considered a natural trade-off for the design flexibility and simplicity they offer.
This work also holds possibilities for future directions of research, such as devising hybrid approaches combining multiple regularization techniques to further improve the quality of generated clusters or utilization of adaptive optimization techniques in the hyper-parameter tuning area. Finally, applications of these modified algorithms on new datasets, not well known to the machine learning community, would confirm their validity in real-world problems.

6. Conclusions

This study introduced a comprehensive framework for enhancing the classical Fuzzy C-Means algorithm via several options of modification of the objective function and various numerical optimization strategies applied to them. The incorporation of several modifications to the objective function, such as L2 regularization, an entropy term, a sparsity-inducing term, a penalty for large clusters, and spatial regularization, improved the adaptability to datasets with noise and complex distributions. Each of these modifications was optimized numerically using gradient descent, root mean square propagation, and adaptive moment estimation, thus avoiding the necessity for analytically derived update rules.
The experimental studies applied to a variety of real-world datasets demonstrated the effectiveness of the proposed modifications. Throughout these experimental studies, an extensive comparison was carried out across the various modifications and numerical optimization techniques. Two significant performance metrics, the fuzzy adjusted Rand index and the fuzzy normalized mutual information, were applied for the assessment and comparison of the quality of the generated clusters. The findings highlight the practical advantages of combining adjustments of the objective functions with numerical optimization techniques in fuzzy clustering tasks. Among the employed modifications and optimization algorithms, it was noticed that, generally, the spatial regularization (SPAT-REG) and large-cluster penalty (PEN-LSC) modifications combined with adaptive moment estimation performed best. Future research may explore hybrid approaches integrating multiple regularization strategies or applying this flexible optimization framework to hyper-parameter tuning problems as well.

Author Contributions

Conceptualization, E.B. and S.H.; methodology, E.B. and S.H.; differentiation of regularization terms, S.H. and R.K.; numerical algorithm formulation, E.B. and S.H.; data preprocessing, R.R.; implementation, E.B. and R.R.; writing—original draft preparation, E.B., S.H. and R.R.; writing—review and editing, S.H. and R.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

All the used datasets are publicly available in the cited online repositories.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Ezugwu, A.E.; Ikotun, A.M.; Oyelade, O.O.; Abualigah, L.; Agushaka, J.O.; Eke, C.I.; Akinyelu, A.A. A comprehensive survey of clustering algorithms: State-of-the-art machine learning applications, taxonomy, challenges, and future research prospects. Eng. Appl. Artif. Intell. 2022, 110, 104743. [Google Scholar] [CrossRef]
  2. Hashemi, S.E.; Gholian-Jouybari, F.; Hajiaghaei-Keshteli, M. A fuzzy C-means algorithm for optimizing data clustering. Expert Syst. Appl. 2023, 227, 120377. [Google Scholar] [CrossRef]
  3. Rada, R.; Bedalli, E.; Shurdhi, S.; Çiço, B. A comparative analysis on prototype-based clustering methods. In Proceedings of the 12th Mediterranean Conference on Embedded Computing (MECO 2023), Budva, Montenegro, 6–10 June 2023; IEEE: Piscataway, NJ, USA; pp. 1–5. [Google Scholar]
  4. Krasnov, D.; Davis, D.; Malott, K.; Chen, Y.; Shi, X.; Wong, A. Fuzzy c-means clustering: A review of applications in breast cancer detection. Entropy 2023, 25, 1021. [Google Scholar] [CrossRef]
  5. Gosain, A.; Sonika, D. Performance analysis of various fuzzy clustering algorithms: A review. Procedia Comput. Sci. 2016, 79, 100–111. [Google Scholar] [CrossRef]
  6. Liu, J.; Xu, M. Penalty Constraints and Kernelization of M-Estimation Based Fuzzy C-Means. arXiv 2012, arXiv:1207.4417. [Google Scholar]
  7. Borgelt, C. Objective functions for fuzzy clustering. In Computational Intelligence in Intelligent Data Analysis; Springer: Berlin/Heidelberg, Germany, 2013; pp. 3–16. [Google Scholar]
  8. Kang, J.; Min, L.; Luan, Q.; Li, X.; Liu, J. Novel modified fuzzy c-means algorithm with applications. Digit. Signal Process. 2009, 19, 309–319. [Google Scholar] [CrossRef]
  9. Adhikari, S.K.; Sing, J.K.; Basu, D.K.; Nasipuri, M. Conditional spatial fuzzy C-means clustering algorithm for segmentation of MRI images. Appl. Soft Comput. 2015, 34, 758–769. [Google Scholar] [CrossRef]
  10. Zabihi, S.M.; Akbarzadeh-T, M.R. Generalized fuzzy C-means clustering with improved fuzzy partitions and shadowed sets. Int. Sch. Res. Not. 2012, 2012, 929085. [Google Scholar] [CrossRef]
  11. Hajrulla, S.; Bedalli, E.; Kosova, R.; Cela, M.; Ali, L. Dynamic processes through the mathematical models and numerical evaluations. J. Ilm. Ilmu Terap. Univ. Jambi 2025, 9, 199–215. [Google Scholar] [CrossRef]
  12. Khang, T.D.; Vuong, N.D.; Tran, M.K.; Fowler, M. Fuzzy C-means clustering algorithm with multiple fuzzification coefficients. Algorithms 2020, 13, 158. [Google Scholar] [CrossRef]
  13. Bedalli, E.; Ninka, I. Exploring an educational system’s data through fuzzy cluster analysis. In Proceedings of the 11th Annual International Conference on Information Technology & Computer Science, Athens, Greece, 18–21 May 2014; pp. 33–44. [Google Scholar]
  14. Hoppner, F.; Klawonn, F. A contribution to convergence theory of fuzzy c-means and derivatives. IEEE Trans. Fuzzy Syst. 2003, 11, 682–694. [Google Scholar] [CrossRef]
  15. Pérez-Ortega, J.; Roblero-Aguilar, S.S.; Almanza-Ortega, N.N.; Frausto Solís, J.; Zavala-Díaz, C.; Hernández, Y.; Landero-Nájera, V. Hybrid fuzzy C-means clustering algorithm oriented to big data realms. Axioms 2022, 11, 377. [Google Scholar] [CrossRef]
  16. Benjamin, J.B.M.; Yang, M.S. Weighted multiview possibilistic c-means clustering with L2 regularization. IEEE Trans. Fuzzy Syst. 2021, 30, 1357–1370. [Google Scholar] [CrossRef]
  17. Cardone, B.; Di Martino, F. A novel fuzzy entropy-based method to improve the performance of the fuzzy C-means algorithm. Electronics 2020, 9, 554. [Google Scholar] [CrossRef]
  18. Ichihashi, H.; Honda, K.; Notsu, A.; Hattori, T. Aggregation of standard and entropy based fuzzy c-means clustering by a modified objective function. In Proceedings of the 2007 IEEE Symposium on Foundations of Computational Intelligence, Honolulu, HI, USA, 1–5 April 2007; IEEE: Piscataway, NJ, USA; pp. 447–453. [Google Scholar]
  19. Guillon, A.; Lesot, M.J.; Marsala, C. Sparsity-inducing fuzzy subspace clustering. In Archives of Data Science, Series B; Lucius & Lucius Verlagsgesellschaft mbH: Stuttgart, Germany, 2019. [Google Scholar]
  20. Bonilla, J.; Vélez, D.; Montero, J.; Rodríguez, J.T. Fuzzy clustering methods with Rényi relative entropy and cluster size. Mathematics 2021, 9, 1423. [Google Scholar] [CrossRef]
  21. Yang, Y.; Huang, S. Image segmentation by fuzzy c-means clustering algorithm with a novel penalty term. Comput. Inform. 2007, 26, 17–31. [Google Scholar]
  22. Zhao, F.; Jiao, L.; Liu, H. Fuzzy c-means clustering with non-local spatial information for noisy image segmentation. Front. Comput. Sci. China 2011, 5, 45–56. [Google Scholar] [CrossRef]
  23. Binu, D. Cluster analysis using optimization algorithms with newly designed objective functions. Expert Syst. Appl. 2015, 42, 5848–5859. [Google Scholar] [CrossRef]
  24. Wang, X.; Yan, L.; Zhang, Q. Research on the application of gradient descent algorithm in machine learning. In Proceedings of the 2021 International Conference on Computer network, Electronic and Automation (ICCNEA), Xi’an, China, 20–26 September 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 11–15. [Google Scholar]
  25. Tian, Y.; Zhang, Y.; Zhang, H. Recent advances in stochastic gradient descent in deep learning. Mathematics 2023, 11, 682. [Google Scholar] [CrossRef]
  26. Shi, N.; Li, D. Rmsprop converges with proper hyperparameter. In Proceedings of the International Conference on Learning Representation, Vienna, Austria, 3–7 May 2021. [Google Scholar]
  27. Liu, J.; Xu, D.; Zhang, H.; Mandic, D. On hyper-parameter selection for guaranteed convergence of RMSProp. Cogn. Neurodynamics 2024, 18, 3227–3237. [Google Scholar] [CrossRef]
  28. Khaniki, M.A.L.; Hadi, M.B.; Manthouri, M. Feedback error learning controller based on RMSprop and Salp swarm algorithm for automatic voltage regulator system. In Proceedings of the 2020 10th International Conference on Computer and Knowledge Engineering (ICCKE), Mashhad, Iran, 29–30 October 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 425–430. [Google Scholar]
  29. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2015, arXiv:1412.6980. [Google Scholar]
  30. Bock, S.; Weiß, M. A proof of local convergence for the Adam optimizer. In Proceedings of the 2019 International Joint Conference on Neural Networks (IJCNN), Budapest, Hungary, 14–19 July 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1–8. [Google Scholar]
  31. Khan, M.M.R.; Arif, R.B.; Siddique, M.A.B.; Oishe, M.R. Study and observation of the variation of accuracies of KNN, SVM, LMNN, ENN algorithms on eleven different datasets from UCI machine learning repository. In Proceedings of the 2018 4th International Conference on Electrical Engineering and Information & Communication Technology (iCEEiCT), Kuala Lumpur, Malaysia, 26–28 January 2018; IEEE: Piscataway, NJ, USA, 2019; pp. 124–129. [Google Scholar]
  32. Hullermeier, E.; Rifqi, M.; Henzgen, S.; Senge, R. Comparing fuzzy partitions: A generalization of the rand index and related measures. IEEE Trans. Fuzzy Syst. 2011, 20, 546–556. [Google Scholar] [CrossRef]
  33. Yuan, Z.; Chen, H.; Zhang, P.; Wan, J.; Li, T. A novel unsupervised approach to heterogeneous feature selection based on fuzzy mutual information. IEEE Trans. Fuzzy Syst. 2021, 30, 3395–3409. [Google Scholar] [CrossRef]
Figure 1. Fuzzy ARI scores across objective function modifications and optimization methods.
Figure 2. Fuzzy NMI scores across objective function modifications and optimization methods.
Table 1. Summary of the employed datasets.
Dataset | # Features | # Instances | # Clusters
Banknotes | 5 | 1372 | 2
Liver | 10 | 583 | 2
Yeast | 8 | 1484 | 10
Dermatology | 34 | 366 | 6
Vehicle Silhouettes | 18 | 846 | 4
LANDSAT Satellite | 36 | 6435 | 7
Table 2. Fuzzy ARI and Fuzzy NMI scores for the classical FCM algorithm.
Dataset | F-ARI | F-NMI
Banknotes | 0.73 | 0.74
Liver | 0.68 | 0.72
Yeast | 0.65 | 0.69
Dermatology | 0.70 | 0.73
Vehicle Silhouettes | 0.68 | 0.74
LANDSAT Satellite | 0.69 | 0.76
Table 3. Fuzzy ARI and Fuzzy NMI scores across modifications and optimization methods.
Dataset | Modification | GD F-ARI | GD F-NMI | RMSprop F-ARI | RMSprop F-NMI | Adam F-ARI | Adam F-NMI
Banknotes | L2REG | 0.73 | 0.76 | 0.74 | 0.77 | 0.75 | 0.78
Banknotes | ENTR-T | 0.76 | 0.79 | 0.77 | 0.80 | 0.78 | 0.81
Banknotes | SP-IN | 0.75 | 0.80 | 0.76 | 0.81 | 0.77 | 0.82
Banknotes | PEN-LSC | 0.78 | 0.81 | 0.79 | 0.82 | 0.80 | 0.83
Banknotes | SPAT-REG | 0.77 | 0.82 | 0.78 | 0.83 | 0.79 | 0.84
Liver | L2REG | 0.68 | 0.72 | 0.69 | 0.73 | 0.70 | 0.74
Liver | ENTR-T | 0.71 | 0.75 | 0.72 | 0.76 | 0.73 | 0.77
Liver | SP-IN | 0.72 | 0.77 | 0.73 | 0.78 | 0.74 | 0.81
Liver | PEN-LSC | 0.74 | 0.78 | 0.75 | 0.79 | 0.76 | 0.80
Liver | SPAT-REG | 0.73 | 0.79 | 0.74 | 0.80 | 0.75 | 0.79
Yeast | L2REG | 0.66 | 0.70 | 0.67 | 0.71 | 0.68 | 0.72
Yeast | ENTR-T | 0.69 | 0.73 | 0.70 | 0.74 | 0.71 | 0.75
Yeast | SP-IN | 0.70 | 0.74 | 0.71 | 0.75 | 0.72 | 0.76
Yeast | PEN-LSC | 0.72 | 0.76 | 0.73 | 0.77 | 0.74 | 0.78
Yeast | SPAT-REG | 0.71 | 0.77 | 0.72 | 0.78 | 0.73 | 0.79
Dermatology | L2REG | 0.70 | 0.75 | 0.71 | 0.76 | 0.72 | 0.77
Dermatology | ENTR-T | 0.73 | 0.78 | 0.74 | 0.79 | 0.75 | 0.80
Dermatology | SP-IN | 0.74 | 0.79 | 0.75 | 0.80 | 0.76 | 0.81
Dermatology | PEN-LSC | 0.75 | 0.80 | 0.76 | 0.81 | 0.77 | 0.82
Dermatology | SPAT-REG | 0.75 | 0.81 | 0.76 | 0.82 | 0.77 | 0.83
Vehicle Silhouettes | L2REG | 0.67 | 0.73 | 0.68 | 0.74 | 0.69 | 0.75
Vehicle Silhouettes | ENTR-T | 0.70 | 0.76 | 0.71 | 0.77 | 0.72 | 0.78
Vehicle Silhouettes | SP-IN | 0.71 | 0.76 | 0.72 | 0.77 | 0.75 | 0.78
Vehicle Silhouettes | PEN-LSC | 0.73 | 0.77 | 0.74 | 0.78 | 0.73 | 0.79
Vehicle Silhouettes | SPAT-REG | 0.72 | 0.78 | 0.73 | 0.79 | 0.74 | 0.80
LANDSAT Satellite | L2REG | 0.71 | 0.77 | 0.72 | 0.78 | 0.73 | 0.79
LANDSAT Satellite | ENTR-T | 0.74 | 0.80 | 0.75 | 0.81 | 0.76 | 0.82
LANDSAT Satellite | SP-IN | 0.76 | 0.82 | 0.77 | 0.83 | 0.78 | 0.84
LANDSAT Satellite | PEN-LSC | 0.76 | 0.82 | 0.77 | 0.83 | 0.78 | 0.84
LANDSAT Satellite | SPAT-REG | 0.76 | 0.83 | 0.77 | 0.84 | 0.78 | 0.85