Article

A Novel Two-Level Entropy-Weighted Fuzzy C-Means Algorithm and Its Application for Classifying Urban Patterns by Residential Building Characteristics

by Rosa Cafaro 1, Barbara Cardone 1 and Ferdinando Di Martino 1,2,*
1 Department of Architecture, University of Naples Federico II, Via Toledo 402, 80134 Napoli, Italy
2 Center for Interdepartmental Research “Alberto Calza Bini”, University of Naples Federico II, Via Toledo 402, 80134 Napoli, Italy
* Author to whom correspondence should be addressed.
Symmetry 2026, 18(5), 807; https://doi.org/10.3390/sym18050807
Submission received: 14 April 2026 / Revised: 30 April 2026 / Accepted: 7 May 2026 / Published: 8 May 2026
(This article belongs to the Special Issue Symmetries in Machine Learning and Artificial Intelligence)

Abstract

In this work, a novel entropy-weighted fuzzy c-means variation, referred to as Group-based Entropy Weighted Fuzzy C-Means (GEWFCM), is proposed. This variation introduces a semantic level of partitioning of features into groups. This approach makes it possible to assign semantic meaning to the clusters by capturing the intrinsic structure of the features, which are naturally grouped into homogeneous semantic sets; the weights are independent of the clusters. The group weights provide a direct measure of the importance of each group, determining which dimensions of the phenomenon are relevant, and the intra-group weights determine the most relevant features within a group. Additionally, GEWFCM is computationally more efficient than cluster-specific weighted fuzzy clustering algorithms, due to the independence of the weights from the clusters. The method was assessed on census data from 16 Italian cities, with the objective of partitioning urban settlements based on characteristics of residential buildings, including construction technique, period, number of floors, and state of conservation. The findings suggest that the proposed algorithm effectively captures the semantic meaning of clusters. In addition, a comparative analysis between GEWFCM and the well-known Entropy Weighted Fuzzy C-Means (EWFCM) algorithm showed that, although both algorithms provide highly similar results for all case studies, GEWFCM is significantly faster.

1. Introduction

The classic Fuzzy C-Means (FCM) algorithm [1,2] operates under the implicit assumption that all features contribute equally to the definition of clusters. This assumption is frequently deemed unrealistic in real-world datasets, which are often characterized by heterogeneous, redundant, or irrelevant features. In order to surmount this limitation, a number of studies have proposed variants of FCM based on the introduction of features associated with weights. The objective of this approach is to adapt the distance metric to the data structure.
In recent years, several weighted variations of FCM have been proposed with the objective of balancing the importance of features in cluster construction. A weighted FCM feature selection method based on the principle of refined justifiable granularity, measuring the significance of features in the feature space, is proposed in [3]. In [4], a classification method is proposed that integrates a weighted FCM algorithm and an enhanced adaptive neuro-fuzzy inference model for the classification of chronic kidney disease. In [5], an algorithm based on a dissimilarity measure is employed for the purpose of clustering gene expression data.
A significant constraint of these methodologies is the complexity involved in assigning a semantic interpretation to the clusters. To address this limitation, Keller and Klawonn [6] propose a model in which each feature possesses a cluster-dependent weight, that is, a distinct parameter for each feature–cluster pair. In this formulation, each cluster is characterized by its own relevant subspaces. Furthermore, features may possess differing levels of importance across different clusters. This approach exhibits a greater degree of expressiveness in comparison with FCM; however, it concomitantly introduces novel challenges, including a substantial augmentation in the number of parameters, a heightened computational complexity, and an augmented sensitivity to noise and initialization.
In [7], a weight learning mechanism based on optimization techniques (e.g., gradient descent) is proposed, showing that an appropriate weight assignment can improve clustering quality compared to the standard FCM. This algorithm has been demonstrated to exhibit superior speed and resilience to noise compared to [6]. However, it should be noted that these weights are global, which means they do not take into account the differences between clusters. Additionally, the estimation of these weights can be unstable and dependent on the initialization process.
In order to enhance the interpretability of the clusters, an entropy-weighted FCM variation based on entropy regularization, termed Entropy-Weighted Fuzzy C-Means (EWFCM), is proposed in [8]. This approach involves differentiating the weight of the features between the clusters. In EWFCM, feature weights are determined on a per-cluster basis and undergo an exponential transformation of feature costs; an entropy term is incorporated to circumvent degenerate solutions. EWFCM demonstrates superior robustness in comparison with non-regularized methods and exhibits enhanced adaptability in the context of high-dimensional data. Nonetheless, the model exhibits considerable limitations. Primarily, it possesses a high degree of computational complexity. Indeed, the number of model parameters is proportional to the product of the number of features and the number of clusters. This results in a high cost per iteration, protracted convergence times, and suboptimal scalability when dealing with large datasets.
In order to manage high-dimensional datasets, a feature-weighted entropy FCM method was proposed in [9]. This method allows for the reduction of the feature space by removing irrelevant features with small weights. This method automatically calculates individual feature weights while reducing redundant feature components, thereby enabling the clustering of high-dimensional features.
In [10], a variation of EWFCM is proposed. In this variation, a subset of the feature space is extracted, and a weight is allocated to each feature dimension based on the feature’s impact on clustering. In [11], an automatic local feature weighting and cluster weighting mechanism is proposed to properly weigh the features and to attenuate the initialization sensitivity of FCM.
An adaptive feature-weighted entropy FCM algorithm for image segmentation is applied in [12] to mitigate the contribution of less significant features. The authors employ a distance metric that incorporates both Euclidean and non-Euclidean distances.
In [13], a novel weighted FCM is proposed, in which an objective function is utilized that is based on a feature-weighted generalized entropy regularization strategy.
These approaches, while endeavoring to reduce the computational complexity of EWFCM, fail to fully capture the semantics of the data and efficiently optimize the interpretability of the results. Additionally, their computational complexity remains high, rendering them ill-suited for the management of high-dimensional datasets.
Table 1 summarizes the characteristics, strengths, and limitations of the existing method types. In particular, while EWFCM offers high flexibility thanks to cluster-specific weighting, it suffers from increased computational complexity and limited interpretability.
In a multitude of real-world application contexts, datasets manifest an inherent structural organization, wherein variables can be naturally classified into homogeneous semantic groups. These sets reflect well-defined conceptual categories and not simple arbitrary aggregations of features.
For instance, the criteria employed to assess and appraise the quality of drinking water can be categorized into several domains. These domains include physical characteristics, such as temperature, turbidity, and color; chemical characteristics, such as pH, fixed residue, and hardness; chemical–toxicological characteristics, such as the concentration of heavy metals and organic pollutants; and microbiological characteristics, such as bacterial concentrations and indicators of environmental contamination.
The grouped FCM algorithm [6], while adapting to this natural way of grouping features into categories, does not capture the different influences of the features in a group: all the features within a group are treated in the same way, which reduces the model’s ability to capture heterogeneous feature relevance.
In this scenario, traditional fuzzy clustering methods and their feature-weighted extensions, such as EWFCM, treat variables independently, neglecting the hierarchical or semantic structure of the data.
In order to overcome this limitation, a variation of EWFCM, referred to as Group-based EWFCM (GEWFCM), is proposed in this work. GEWFCM introduces a two-level weighting mechanism in which feature groups represent higher-level semantic units and the features within each group contribute, each with its own relative weight, to the representation of clusters.
In the proposed model, the explicit introduction of groups enables the objective function to be modeled consistently with the data structure. Specifically, the weight assigned to a feature is expressed as the product of the weight assigned to the group to which the feature belongs and the weight of the feature within that group.
In comparison to EWFCM, in which weights are defined independently for each cluster and each feature, the proposed model introduces a structural regularization that reduces uncontrolled weight variability and improves solution stability.
Unlike existing feature-weighted approaches, the proposed GEWFCM does not simply reduce the number of parameters but introduces a structurally constrained weighting model that explicitly reflects the semantic organization of the feature space. In particular, while EWFCM assigns feature weights independently for each cluster, GEWFCM decomposes feature importance into two complementary components: a group-level weight and an intra-group feature weight. This formulation allows the model to capture both the relevance of semantic dimensions and the contribution of individual features within each dimension.
This represents a conceptual shift from unstructured feature weighting to semantically guided clustering. As a result, GEWFCM not only reduces the dimensionality of the optimization problem, but also improves the stability of the solution and provides a more interpretable representation of clusters.
Furthermore, compared to existing grouped approaches, GEWFCM overcomes the limitation of uniform feature importance within groups by introducing intra-group weighting, thereby enabling a more flexible and accurate modelling of heterogeneous feature relevance.
A further objective of the method is to improve the interpretability of the results. In the context of EWFCM, the weights are designated as cluster-specific, not constrained by a semantic structure. This implies that each cluster is characterized by combinations of features that are challenging to interpret and may not be consistent with each other.
In contrast, in GEWFCM, the weights are cluster independent. The group weights provide a direct measure of the importance of each semantic dimension, determining which dimensions of the phenomenon are relevant. The intra-group weights allow us to identify the most relevant features within each group and determine which specific variables contribute to the definition of the clusters. This dual structure is intended to ensure greater semantic interpretability of the clusters.
Additionally, the independence of the weights from the clusters results in enhanced computational efficiency in comparison to EWFCM. This results in a significant reduction in the dimensionality of the optimization problem, greater numerical stability, and shorter convergence times, making GEWFCM suitable for handling high-dimensional data.
In summary, the proposed method is characterized by two main contributions:
  • Improved interpretability: Partitioning features into groups allows for better semantic meaning to be assigned to clusters. This dual structure enables a two-level interpretation of clusters: a global level for groups, which allows us to determine which dimensions of the phenomenon are relevant, and a local level consisting of features within a group, which allows us to determine which group-specific variables contribute to the definition of a cluster.
  • Higher computational efficiency: Indeed, the number of parameters is reduced compared to cluster-specific models like EWFCM. The independence of the weights from clusters reduces computational cost, especially when dealing with many clusters.
GEWFCM has been tested as an unsupervised classifier for classifying urban settlements based on a set of residential building characteristics acquired from the population and building census dataset compiled by the Italian National Statistical Institute (ISTAT). The information on residential buildings is summarized by census zone; for each characteristic, the number of residential buildings with that characteristic located in each census zone is measured. These characteristics are grouped into five types: construction technique, construction period, number of floors, number of interiors, and state of conservation.
After introducing the EWFCM algorithm in Section 2, Section 3 discusses the proposed algorithm in detail and describes the case studies used. Section 4 presents the test results and discusses the comparative results. Section 5 includes concluding remarks.

2. Preliminary Concepts

The EWFCM Algorithm

EWFCM is an extension of FCM that, instead of considering each feature of equal importance, assigns them a weight to enhance their contribution in the creation of clusters.
Let $X = \{x_i \in \mathbb{R}^p,\ i = 1, \dots, N\}$ be a set of $N$ samples described by $p$ features, and let $C$ be the number of clusters.
The fuzzy partition matrix is denoted by $U$, the membership degree of the $i$th sample in the $k$th cluster by $u_{ik}$, and the centroid of the $k$th cluster by $v_k$.
In EWFCM, the relevance of the $j$th feature in the $k$th cluster is represented by a cluster-dependent feature weight $w_{kj}$, where $0 \le w_{kj} \le 1$ and, for each cluster $k$, $\sum_{j=1}^{p} w_{kj} = 1$.
The objective function is given by
$$J(U, V, w) = \sum_{i=1}^{N} \sum_{k=1}^{C} u_{ik}^{m} \sum_{j=1}^{p} w_{kj} (x_{ij} - v_{kj})^2 + \Gamma \sum_{k=1}^{C} \sum_{j=1}^{p} w_{kj} \log w_{kj} \tag{1}$$
where m is the fuzzifier parameter and Γ is an entropy parameter set for the regularization of the weights.
The first term is the classical fuzzy energy of FCM, with the distance scaled by the weights $w_{kj}$. The last term is an entropy penalty that prevents degenerate solutions (e.g., all the weight on a single feature). This term encourages the feature weights to be more evenly distributed, which helps the model avoid getting stuck in local optima.
Using Lagrange multipliers to minimize the objective function, the three update rules for the partition matrix, the centroids of the clusters, and the weights are obtained.
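$$u_{ik} = \left[ \sum_{r=1}^{C} \left( \frac{\sum_{j=1}^{p} w_{kj} (x_{ij} - v_{kj})^2}{\sum_{j=1}^{p} w_{rj} (x_{ij} - v_{rj})^2} \right)^{\frac{1}{m-1}} \right]^{-1} \tag{2}$$
$$v_{kj} = \frac{\sum_{i=1}^{N} u_{ik}^{m} x_{ij}}{\sum_{i=1}^{N} u_{ik}^{m}} \tag{3}$$
$$w_{kj} = \frac{\exp\left( -\frac{1}{\Gamma} \sum_{i=1}^{N} u_{ik}^{m} (x_{ij} - v_{kj})^2 \right)}{\sum_{s=1}^{p} \exp\left( -\frac{1}{\Gamma} \sum_{i=1}^{N} u_{ik}^{m} (x_{is} - v_{ks})^2 \right)} \tag{4}$$
In (4), the weights are given in exponential form as softmax functions [14,15].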
The process is iterated until a stop criterion is reached.
Algorithm 1 shows EWFCM in pseudocode.
Algorithm 1: EWFCM
Input:   Set of data points
         Number of clusters C
         Fuzzifier parameter m
         Entropy parameter Γ
         Stop iteration error ε
Output:  Partition matrix
         Centroids of the clusters
         Weights
Randomly initialize the centroids of the clusters and the weights
Repeat
  Update the partition matrix by (2)
  Update the centroids by (3)
  Update the weights by (4)
Until $|J(U^{(t+1)}, V^{(t+1)}, w^{(t+1)}) - J(U^{(t)}, V^{(t)}, w^{(t)})| < \varepsilon$ or numIter > maxIter
Return $U^{(t+1)}, V^{(t+1)}, w^{(t+1)}$
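The loop of Algorithm 1 can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `ewfcm`, the initialization scheme (centroids drawn from the samples, uniform weights), the `max_iter` cap, and the small constants guarding against division by zero and log of zero are all hypothetical choices.

```python
import numpy as np

def ewfcm(X, C, m=2.0, gamma=1.0, eps=1e-5, max_iter=100, seed=0):
    """Illustrative EWFCM loop (hypothetical helper, not the authors' code)."""
    rng = np.random.default_rng(seed)
    N, p = X.shape
    V = X[rng.choice(N, size=C, replace=False)].astype(float)  # initial centroids
    W = np.full((C, p), 1.0 / p)       # uniform initial weights for each cluster
    J_old = np.inf
    for _ in range(max_iter):
        sq = (X[:, None, :] - V[None, :, :]) ** 2               # (N, C, p)
        D = np.maximum((sq * W[None]).sum(axis=2), 1e-12)       # weighted distances
        # Membership update: u_ik = 1 / sum_r (D_ik / D_ir)^(1/(m-1))
        ratio = (D[:, :, None] / D[:, None, :]) ** (1.0 / (m - 1.0))
        U = 1.0 / ratio.sum(axis=2)
        Um = U ** m
        # Centroid update: membership-weighted mean of the samples
        V = (Um.T @ X) / Um.sum(axis=0)[:, None]
        # Weight update: per-cluster softmax of the negative feature dispersion
        sq = (X[:, None, :] - V[None, :, :]) ** 2
        disp = (Um[:, :, None] * sq).sum(axis=0)                # (C, p)
        logits = -disp / gamma
        logits -= logits.max(axis=1, keepdims=True)             # numerical stability
        W = np.exp(logits)
        W /= W.sum(axis=1, keepdims=True)
        # Objective: weighted fuzzy energy plus entropy penalty
        J = (Um[:, :, None] * sq * W[None]).sum() \
            + gamma * (W * np.log(W + 1e-300)).sum()
        if abs(J_old - J) < eps:
            break
        J_old = J
    return U, V, W
```

Note that the weight array has shape `(C, p)`, one weight per feature-cluster pair, which is the parameter count discussed in the Introduction.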

3. Materials and Methods

3.1. The GEWFCM Algorithm

Let $x_i \in \mathbb{R}^p$, $i = 1, \dots, N$, be $N$ samples. Let the $p$ features be partitioned into $H$ disjoint semantic groups $G_1, \dots, G_H$, where $G_{h(j)}$ denotes the group to which the $j$th feature belongs.
GEWFCM introduces a two-level weighting mechanism. Each group $G_h$ is assigned a weight $g_h$, where $g_h \ge 0$ and $\sum_{h=1}^{H} g_h = 1$. The $j$th feature, belonging to the group $G_{h(j)}$, is assigned an intra-group weight $w_{jh}$, where $w_{jh} \ge 0$ and $\sum_{j \in G_h} w_{jh} = 1$.
The aggregate weight associated with the $j$th feature is therefore defined as follows:
$$w_j = g_{h(j)}\, w_{j h(j)} \tag{5}$$
where the following constraint holds:
$$\sum_{j=1}^{p} w_j = 1 \tag{6}$$
The objective function is given by
$$J(U, V, g, w) = \sum_{i=1}^{N} \sum_{k=1}^{C} u_{ik}^{m} \sum_{j=1}^{p} w_j (x_{ij} - v_{kj})^2 + \Gamma_g \sum_{h=1}^{H} g_h \log g_h + \sum_{h=1}^{H} \Gamma_h \sum_{j \in G_h} w_{jh} \log w_{jh} \tag{7}$$
Here, $U = [u_{ik}]$ is the fuzzy partition matrix, $V = \{v_k\}$ is the set of cluster centroids, and $m > 1$ is the fuzzifier.
The parameter $\Gamma_g > 0$ controls the entropy regularization of the group weights, whereas $\Gamma_h > 0$ controls the entropy regularization of the intra-group feature weights. Small values of $\Gamma_g$ and $\Gamma_h$ promote concentrated weight distributions; large values produce smoother and more uniform distributions.
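To make the two-level weighting concrete, the following sketch computes the aggregate weights as the product of group and intra-group weights and evaluates the objective function. The helper names, the integer array `group_of` (mapping each feature to a group index 0..H-1), the per-group array `gamma_h`, and the toy numbers are assumptions introduced for illustration only.

```python
import numpy as np

def aggregate_weights(g, w_intra, group_of):
    """Aggregate weight of feature j: group weight times intra-group weight."""
    return g[group_of] * w_intra

def gewfcm_objective(X, U, V, g, w_intra, group_of, m, gamma_g, gamma_h):
    """Weighted fuzzy energy plus the group and intra-group entropy terms."""
    w = aggregate_weights(g, w_intra, group_of)
    sq = (X[:, None, :] - V[None, :, :]) ** 2             # (N, C, p)
    energy = ((U ** m)[:, :, None] * sq * w).sum()
    group_entropy = gamma_g * (g * np.log(g)).sum()
    # gamma_h holds one value per group; index it by feature to broadcast
    intra_entropy = (gamma_h[group_of] * w_intra * np.log(w_intra)).sum()
    return energy + group_entropy + intra_entropy

# Toy example: p = 5 features in H = 2 groups (features 0-2 and 3-4).
group_of = np.array([0, 0, 0, 1, 1])
g = np.array([0.6, 0.4])                        # group weights, sum to one
w_intra = np.array([0.5, 0.3, 0.2, 0.7, 0.3])   # sum to one inside each group
w = aggregate_weights(g, w_intra, group_of)
print(w, w.sum())
```

Because the intra-group weights sum to one within each group and the group weights sum to one overall, the aggregate weights sum to one (up to floating-point rounding) without any extra normalization.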
The convergence properties of GEWFCM are consistent with those of standard FCM and its weighted extensions. GEWFCM follows an alternating optimization scheme, where the objective function is minimized with respect to one set of variables at a time while keeping the others fixed.
Specifically, the algorithm alternates between the update of membership degrees, cluster centroids, intra-group feature weights, and group weights. Each of these steps corresponds to the solution of a constrained optimization problem, which ensures that the value of the objective function does not increase. Therefore, the sequence of objective function values J(t) at iteration t is monotonically non-increasing, that is, J(t+1) ≤ J(t). Moreover, since the objective function is bounded from below, the sequence converges to a stationary point. Although global optimality cannot be guaranteed due to the non-convex nature of the problem, this property ensures the stability of the algorithm.
In particular, GEWFCM minimizes the objective function by an alternating optimization procedure. At iteration $t$, the aggregate weights are computed from (5) and normalized so that their sum is equal to one. The data are transformed according to the aggregate feature weights:
$$\tilde{x}_{ij}^{(t)} = w_j^{(t)} x_{ij} \tag{8}$$
Standard FCM is applied to the transformed data, yielding the updated memberships
$$u_{ik}^{(t+1)} = \left[ \sum_{r=1}^{C} \left( \frac{d_{ik}^{(t)}}{d_{ir}^{(t)}} \right)^{\frac{1}{m-1}} \right]^{-1} \tag{9}$$
where $d_{ik}^{(t)} = \| \tilde{x}_i^{(t)} - \tilde{v}_k^{(t)} \|^2$, and the centroids in the transformed space are
$$\tilde{v}_k^{(t+1)} = \frac{\sum_{i=1}^{N} \left( u_{ik}^{(t+1)} \right)^{m} \tilde{x}_i^{(t)}}{\sum_{i=1}^{N} \left( u_{ik}^{(t+1)} \right)^{m}} \tag{10}$$
The centroids in the original feature space are then updated as
$$v_{kj}^{(t+1)} = \frac{\sum_{i=1}^{N} \left( u_{ik}^{(t+1)} \right)^{m} x_{ij}}{\sum_{i=1}^{N} \left( u_{ik}^{(t+1)} \right)^{m}} \tag{11}$$
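One pass of the membership and centroid updates on weight-scaled data might look like the sketch below. The function name `fcm_step` and the small floor on the distances are assumptions. The scaling multiplies each feature by its aggregate weight, as the transformation is written in the text.

```python
import numpy as np

def fcm_step(X, V, w, m=2.0):
    """One membership/centroid update on weight-scaled data (illustrative)."""
    Xt = X * w                       # transformed samples: feature j scaled by w_j
    Vt = V * w                       # current centroids in the transformed space
    d = ((Xt[:, None, :] - Vt[None, :, :]) ** 2).sum(axis=2)    # (N, C) squared distances
    d = np.maximum(d, 1e-12)         # guard: a sample lying exactly on a centroid
    ratio = (d[:, :, None] / d[:, None, :]) ** (1.0 / (m - 1.0))
    U = 1.0 / ratio.sum(axis=2)      # memberships; each row sums to one
    Um = U ** m
    V_new = (Um.T @ X) / Um.sum(axis=0)[:, None]   # centroids in the original space
    Vt_new = V_new * w               # the same centroids in the transformed space
    return U, V_new, Vt_new
```

Since the scaling is a fixed per-feature factor, the transformed centroids equal the original-space centroids multiplied by the same weights, so only one weighted mean has to be computed.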
For fixed $U$ and $V$, the intra-group weights are obtained by minimizing the part of the objective function that depends on the weights of the features belonging to the same group. The feature cost is defined as
$$S_j = \sum_{i=1}^{N} \sum_{k=1}^{C} u_{ik}^{m} (x_{ij} - v_{kj})^2 \tag{12}$$
The quantity $S_j$ measures the intra-cluster dispersion associated with the $j$th feature. Lower values of $S_j$ indicate that the feature is more effective in describing the cluster structure.
For each group $G_h$, the intra-group weights are obtained by minimizing the weight-dependent part of the objective function under the normalization constraint:
$$\min_{\{w_{jh}\}_{j \in G_h}} \; \sum_{j \in G_h} w_{jh} S_j + \Gamma_h \sum_{j \in G_h} w_{jh} \log w_{jh} \tag{13}$$
subject to $\sum_{j \in G_h} w_{jh} = 1$.
Introducing the Lagrange multiplier $\lambda_h$, the Lagrangian is
$$L_h = \sum_{j \in G_h} w_{jh} S_j + \Gamma_h \sum_{j \in G_h} w_{jh} \log w_{jh} + \lambda_h \left( \sum_{j \in G_h} w_{jh} - 1 \right) \tag{14}$$
The stationarity condition
$$\frac{\partial L_h}{\partial w_{jh}} = S_j + \Gamma_h (1 + \log w_{jh}) + \lambda_h = 0 \tag{15}$$
leads to the closed-form update
$$w_{jh}^{(t+1)} = \frac{\exp\left( -S_j^{(t+1)} / \Gamma_h \right)}{\sum_{r \in G_h} \exp\left( -S_r^{(t+1)} / \Gamma_h \right)} \tag{16}$$
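The feature costs and the resulting softmax update inside each group can be sketched as follows. The function names, the integer array `group_of` (feature-to-group index, 0..H-1), and the per-group array `gamma_h` are assumptions made for the example.

```python
import numpy as np

def feature_costs(X, U, V, m=2.0):
    """Per-feature cost: membership-weighted squared deviation from the
    centroids, summed over all samples and clusters."""
    sq = (X[:, None, :] - V[None, :, :]) ** 2        # (N, C, p)
    return ((U ** m)[:, :, None] * sq).sum(axis=(0, 1))

def intra_group_weights(S, group_of, gamma_h):
    """Softmax of -S_j / Gamma_h, taken separately inside each group."""
    w = np.empty_like(S, dtype=float)
    for h in np.unique(group_of):
        idx = group_of == h
        z = -S[idx] / gamma_h[h]
        z -= z.max()                 # shift for numerical stability
        e = np.exp(z)
        w[idx] = e / e.sum()
    return w
```

Within a group, a feature with a lower cost receives a larger weight, and two features with equal cost receive equal weights. A larger entropy parameter flattens the distribution toward uniform.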
The group cost of the $h$th group $G_h$ is defined as the weighted average feature cost within the group:
$$S_h = \frac{1}{|G_h|} \sum_{j \in G_h} w_{jh} S_j \tag{17}$$
The quantity $S_h$ measures the average intra-cluster dispersion of the features in group $G_h$, weighted by their intra-group importance. Lower values of $S_h$ indicate that the corresponding semantic group provides a more informative description of the cluster structure.
For fixed $U$, $V$, and $w_{jh}$, the group weights are obtained by solving
$$\min_{\{g_h\}_{h=1}^{H}} \; \sum_{h=1}^{H} g_h S_h + \Gamma_g \sum_{h=1}^{H} g_h \log g_h \tag{18}$$
subject to $\sum_{h=1}^{H} g_h = 1$.
Introducing the Lagrange multiplier $\mu$, the Lagrangian is
$$L_g = \sum_{h=1}^{H} g_h S_h + \Gamma_g \sum_{h=1}^{H} g_h \log g_h + \mu \left( \sum_{h=1}^{H} g_h - 1 \right) \tag{19}$$
The stationarity condition
$$\frac{\partial L_g}{\partial g_h} = S_h + \Gamma_g (1 + \log g_h) + \mu = 0 \tag{20}$$
yields the closed-form update
$$g_h^{(t+1)} = \frac{\exp\left( -S_g^{(t+1)}(h) / \Gamma_g \right)}{\sum_{r=1}^{H} \exp\left( -S_g^{(t+1)}(r) / \Gamma_g \right)} \tag{21}$$
where $S_g^{(t+1)}(h)$ denotes the group cost $S_h$ of (17) evaluated at iteration $t+1$.
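The group-level step mirrors the intra-group one: a group cost is computed per semantic group and passed through a softmax. In the sketch below, the function names and the `group_of` index array are assumptions, and `np.bincount` is used only as a compact way to sum per group.

```python
import numpy as np

def group_costs(S, w_intra, group_of, H):
    """Group cost: sum of w_jh * S_j over the group's features, divided by
    the number of features in the group."""
    totals = np.bincount(group_of, weights=w_intra * S, minlength=H)
    sizes = np.bincount(group_of, minlength=H)
    return totals / sizes

def group_weights(S_group, gamma_g):
    """Softmax of -S_h / Gamma_g over the H groups."""
    z = -S_group / gamma_g
    z -= z.max()                     # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()
```

For instance, with feature costs `[1, 2, 3, 0.5, 0.5]`, intra-group weights `[0.5, 0.3, 0.2, 0.5, 0.5]`, and groups `{0, 1, 2}` and `{3, 4}`, the group costs are 1.7/3 and 0.25, so the second group receives the larger weight.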
Finally, convergence is checked through the variation of the aggregate feature weights:
$$\Delta^{(t+1)} = \sum_{j=1}^{p} \left| w_j^{(t+1)} - w_j^{(t)} \right| \tag{22}$$
If $\Delta^{(t+1)} < \varepsilon$, the iterative process stops; otherwise, the next iteration is performed.
Below, Algorithm 2 summarizes the GEWFCM procedure.
Algorithm 2: GEWFCM
Input:   Set of data points
         Number of clusters C
         Fuzzifier parameter m
         Entropy parameters $\Gamma_g$ and $\Gamma_h$
         Stop iteration error ε
Output:  Partition matrix
         Centroids of the clusters
         Group weights $g_h$
         Intra-group weights $w_{jh}$
Initialize $U^{(0)}$ (or, equivalently, the centroids), the group weights $g^{(0)}$, and the intra-group weights $w_{jh}^{(0)}$
Repeat
  Compute the aggregate feature weights $w_j^{(t)}$ from (5) and normalize them
  Transform the data according to (8)
  Update the partition matrix by (9)
  Update the transformed centroids by (10) and the original centroids by (11)
  Compute the feature costs by (12)
  Update the intra-group weights by (16)
  Compute the group costs by (17)
  Update the group weights by (21)
  Compute the weight variation $\Delta^{(t+1)}$ by (22)
Until $\Delta^{(t+1)} < \varepsilon$
Return $U^{(t+1)}, V^{(t+1)}, g^{(t+1)}, w^{(t+1)}$
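Putting the steps of Algorithm 2 together, a compact NumPy sketch of the whole loop could read as follows. It is an illustration under several assumptions rather than the authors' implementation: the name `gewfcm`, the uniform initial weights, centroids initialized from random samples, a single `gamma_h` shared by all groups, the `max_iter` cap, and integer group labels 0..H-1 in `group_of`.

```python
import numpy as np

def softmax_neg(cost, gamma):
    """Softmax of -cost / gamma (shifted for numerical stability)."""
    z = -np.asarray(cost, dtype=float) / gamma
    z -= z.max()
    e = np.exp(z)
    return e / e.sum()

def gewfcm(X, group_of, C, m=2.0, gamma_g=10.0, gamma_h=10.0,
           eps=1e-5, max_iter=100, seed=0):
    """Illustrative GEWFCM loop (hypothetical helper, not the authors' code)."""
    rng = np.random.default_rng(seed)
    N, p = X.shape
    H = int(group_of.max()) + 1
    sizes = np.bincount(group_of, minlength=H)
    V = X[rng.choice(N, size=C, replace=False)].astype(float)
    g = np.full(H, 1.0 / H)                  # uniform group weights
    w_intra = 1.0 / sizes[group_of]          # uniform weights inside each group
    w_old = g[group_of] * w_intra
    for _ in range(max_iter):
        w = g[group_of] * w_intra            # aggregate weights
        w = w / w.sum()
        Xt, Vt = X * w, V * w                # weight-scaled data and centroids
        d = ((Xt[:, None, :] - Vt[None, :, :]) ** 2).sum(axis=2)
        d = np.maximum(d, 1e-12)
        U = 1.0 / ((d[:, :, None] / d[:, None, :]) ** (1.0 / (m - 1.0))).sum(axis=2)
        Um = U ** m
        V = (Um.T @ X) / Um.sum(axis=0)[:, None]          # original-space centroids
        # Feature costs, then intra-group softmax update
        S = (Um[:, :, None] * (X[:, None, :] - V[None, :, :]) ** 2).sum(axis=(0, 1))
        for h in range(H):
            idx = group_of == h
            w_intra[idx] = softmax_neg(S[idx], gamma_h)
        # Group costs, then group softmax update
        S_group = np.bincount(group_of, weights=w_intra * S, minlength=H) / sizes
        g = softmax_neg(S_group, gamma_g)
        # Stop when the aggregate weights no longer change
        w_new = g[group_of] * w_intra
        delta = np.abs(w_new - w_old).sum()
        w_old = w_new
        if delta < eps:
            break
    return U, V, g, w_intra
```

The weight arrays have lengths `H` and `p` regardless of the number of clusters, which is where the reduction in parameters relative to a cluster-specific scheme comes from.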
From a computational standpoint, GEWFCM has a significant advantage over EWFCM. In EWFCM, the number of feature weights updated at each iteration is equal to $p \times C$, where $p$ is the number of features and $C$ is the number of clusters. In contrast, GEWFCM requires the update of $p + H$ weights per iteration, where $H$ is the number of semantic groups and typically $H \ll p$.
Consequently, GEWFCM reduces the dimensionality of the optimization problem and generally leads to lower computational cost and faster convergence.
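As a worked illustration using figures reported in this study (26 features, five semantic groups, and the three clusters obtained for Genoa), the two weight counts compare as follows:

$$p \times C = 26 \times 3 = 78, \qquad p + H = 26 + 5 = 31.$$

With $C = 10$, the largest number of clusters explored in the preprocessing phase, the EWFCM count grows to $26 \times 10 = 260$, while the GEWFCM count stays at 31.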

3.2. The Case Studies

GEWFCM has been tested to classify urban settlement zones based on residential building fabric characteristics. To this end, the population and building census datasets compiled by the Italian Institute of Statistics (ISTAT) in 2011 were utilized.
For the tests, all information relating to the characteristics of residential buildings was extracted.
The information was grouped by census zone, and the samples covered 16 Italian cities. Each piece of information corresponds to the number of residential buildings in the census zone that possess a specific characteristic.
The data were standardized by dividing each count by the total number of buildings in the census zone. Consequently, each feature contains the relative frequency of residential buildings exhibiting a specific characteristic.
This normalization corresponds to a frequency-based scaling, which is particularly appropriate for this type of data, where each feature represents the relative frequency of a specific building characteristic within a census zone.
Alternative normalization techniques, such as min–max scaling or z-score standardization, could also be considered. However, these approaches may alter the semantic meaning of the features. In particular, z-score normalization assumes a Gaussian distribution and may introduce negative values, which are not meaningful for frequency-based variables. Min–max normalization, on the other hand, may reduce variability in the presence of outliers.
The adopted normalization preserves the relative proportions of the features within each census zone, which is essential for maintaining the interpretability of the clustering results. Preliminary tests showed that the proposed method is robust with respect to the choice of normalization, and the overall clustering structure remains stable. For these reasons, frequency-based normalization was selected as the most appropriate preprocessing step for this study.
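The frequency-based scaling described above amounts to a row-wise division, as in the following sketch. The function name, the toy counts, and the choice to leave zones with no buildings at zero are assumptions for illustration; the study itself selects only residential census zones.

```python
import numpy as np

def to_frequencies(counts, totals):
    """Divide per-zone building counts by the zone's total number of buildings."""
    counts = np.asarray(counts, dtype=float)
    totals = np.asarray(totals, dtype=float)
    freq = np.zeros_like(counts)
    nonzero = totals > 0             # assumption: zones without buildings stay at zero
    freq[nonzero] = counts[nonzero] / totals[nonzero][:, None]
    return freq

# Three hypothetical census zones, three characteristics of one feature group.
counts = np.array([[8, 2, 0], [3, 3, 6], [0, 0, 0]])
totals = np.array([10, 12, 0])
freq = to_frequencies(counts, totals)
print(freq)
```

Every resulting value lies in [0, 1] and keeps its meaning as a share of the buildings in the zone, which a z-score transformation would not preserve.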
The features were grouped into five groups, as specified in Table 2.
In recent years, urban data analysis has increasingly leveraged advanced machine learning frameworks, including deep learning and spatial analytics techniques, for tasks such as urban morphology analysis and Transit-Oriented Development (TOD). These approaches enable the extraction of complex patterns from large-scale urban datasets.
For example, recent studies have explored the integration of clustering and representation learning methods to capture spatial and functional characteristics of urban environments [16]. While these approaches are particularly effective for predictive modeling and large-scale pattern recognition, they often rely on complex architectures and may lack interpretability.
In this context, the proposed GEWFCM method provides a complementary approach, focusing on interpretable unsupervised clustering. By introducing a structured feature weighting mechanism based on semantic grouping, GEWFCM enables a meaningful description of urban patterns while maintaining computational efficiency.
In [17] an unsupervised FCM-based classifier was tested to classify urban patterns based on residential building characteristics related to construction techniques and construction macro-periods.
In the experimental tests conducted on GEWFCM, all the characteristics of the residential buildings present in the datasets provided by ISTAT were considered separately.
For each municipality case study, the dataset was constructed including all the 26 features described in Table 2, where the instances are the census zones into which the municipality is partitioned.
Each census zone is assigned to the cluster to which it belongs with the highest membership degree.
GEWFCM was implemented using the Python ArcPy libraries from the GIS tool ArcGIS Pro 3.5.
The Xie–Beni validity index [18,19] was employed to ascertain the optimal number of clusters. Xie–Beni determines the optimal number of clusters by minimizing the ratio between the compactness of the clusters (intra-cluster variance) and the minimum separation between the cluster centers. Xie–Beni is the most widely used validity index in FCM to determine the optimal number of clusters [20].
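A common form of the Xie–Beni index divides the membership-weighted within-cluster dispersion by the number of samples times the smallest squared distance between two centroids. The sketch below implements that form with plain Euclidean distances; whether the study computed the index on weighted or unweighted distances is not stated, so this choice, like the function name, is an assumption.

```python
import numpy as np

def xie_beni(X, U, V, m=2.0):
    """Xie-Beni validity index (lower values indicate a better partition)."""
    V = np.asarray(V, dtype=float)
    d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)   # (N, C) sample-centroid
    compactness = ((U ** m) * d2).sum()
    c2 = ((V[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)   # (C, C) centroid-centroid
    np.fill_diagonal(c2, np.inf)      # ignore the zero distance of a centroid to itself
    return compactness / (X.shape[0] * c2.min())
```

To choose the number of clusters, the index would be evaluated on the partition obtained for each candidate value of C, keeping the value that gives the minimum.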
Several trials were performed to determine the optimal values of the $\Gamma_g$ and $\Gamma_h$ parameters. In each trial, a different combination of the two parameters was set, and the combination that generated the minimum value of the Xie–Beni index was selected. The best values were obtained by setting $\Gamma_g = 10$ and $\Gamma_h = 10$.
To give semantic meaning to the clusters, the centroid values were normalized by dividing them by the sum of the centroid values of the features in the corresponding group. Then, a linguistic label expressing the relevance of each feature in a cluster is assigned as follows.
Let $v_{kj}$ be the normalized value of the $j$th component of the centroid of the $k$th cluster.
Let $h(j)$ be the group to which the $j$th feature belongs, and let $|h(j)|$ be the cardinality of this group.
The relevance of the $j$th feature in the cluster is labeled as significant if $v_{kj} > 2/|h(j)|$. Otherwise, it is labeled as not negligible if $v_{kj} > 1/|h(j)|$, or negligible if $v_{kj} \le 1/|h(j)|$.
For example, suppose that in a given cluster the normalized values obtained for the Construction technique group features are d5 = 0.8, d6 = 0.15, d7 = 0.05. In this case, |h(j)| = 3 and the relevance of each of the three features in the cluster will be, respectively,
r5 = significant   r6 = negligible   r7 = negligible
This cluster will then group together urban areas with a significant prevalence of masonry buildings.
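The labeling rule can be expressed as a small function applied to the normalized centroid values of one feature group. The function name is hypothetical; the thresholds and the input values are those of the example above.

```python
def relevance_labels(values):
    """Label each normalized centroid value of one feature group."""
    n = len(values)                  # cardinality of the group
    labels = []
    for v in values:
        if v > 2.0 / n:
            labels.append("significant")
        elif v > 1.0 / n:
            labels.append("not negligible")
        else:
            labels.append("negligible")
    return labels

print(relevance_labels([0.8, 0.15, 0.05]))
# ['significant', 'negligible', 'negligible']
```

With three features in the group, the thresholds are 2/3 and 1/3, so only the first value exceeds the upper threshold and the other two fall below the lower one.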

4. Results and Discussion

The experimental tests were carried out by acquiring ISTAT census data on building characteristics for the 16 most populous Italian cities. The data sources are the ISTAT census datasets conducted in 2011 on all Italian municipalities. They are available at https://www.istat.it/notizia/basi-territoriali-e-variabili-censuarie, accessed on 1 February 2026.
For each city, a dataset was extracted containing data relating to the frequency of residential buildings with the characteristics described in Table 2. For reasons of brevity, the results obtained for the cities of Genoa, Bari, and Naples are shown in detail and the results obtained for all cities are summarized.
In Table 3, the values assigned to the GEWFCM parameters are shown.
As shown in [21], although the optimal choice for the fuzzifier parameter m depends on the dataset, the optimal range is between 1.5 and 2.5, and the central value of m = 2 is considered a safe and robust choice; it is a well-established best practice in literature that ensures good management of uncertainties and overlaps in the data, avoiding both excessive crispness (m tending towards 1) and excessive blurriness (m greater than 2).
The value of the convergence threshold ε was set to $1 \times 10^{-5}$ because it is small enough to ensure that the cluster centers have stabilized and do not undergo significant changes. A lower threshold would increase the number of iterations required for convergence, without significantly improving the quality of the clustering.
The group and feature entropy parameters $\Gamma_g$ and $\Gamma_h$ were selected through a systematic exploration over the range [1, 100]. For each combination of the two parameters, GEWFCM was executed and the corresponding Xie–Beni validity index was computed.
The optimal values were selected as those minimizing the Xie–Beni index. In addition, a sensitivity analysis was conducted to assess the robustness of the proposed method with respect to variations of $\Gamma_g$ and $\Gamma_h$. The experimental results show that the clustering structure remains stable over a relatively wide range of parameter values, indicating that the method is not overly sensitive to the specific choice of these parameters. This analysis confirms that the selected values of $\Gamma_g$ and $\Gamma_h$ provide a good trade-off between sparsity of the weights and stability of the clustering results.
In each test, a preprocessing phase is performed to determine the optimal number of clusters. This is accomplished by running GEWFCM while increasing the number of clusters from C = 2 to C = 10. The optimal number of clusters is the value of C that minimizes the Xie–Beni index.
We performed 20 independent runs of each algorithm with different random initializations for each test case. The results are reported as averages over these runs.
All experiments were conducted on a machine equipped with an Intel Core i7-12700K processor and 32 GB of RAM. The GEWFCM and EWFCM algorithms were implemented using the Python 3.13 programming language and the NumPy, SciPy, and scikit-learn libraries. The ESRI ArcGIS Pro 10.3 release was used to construct all the thematic maps. Both GEWFCM and EWFCM were implemented in the same environment and executed under identical conditions to ensure a fair comparison.

4.1. Building Classification of Genoa

In the 2011 ISTAT census database, Genoa is partitioned into 3616 census zones, of which 3454 are residential (Figure 1).
The input dataset was prepared by selecting only the residential census zones and constructing the 26 features listed in Table 2.
The preprocessing phase yielded an optimal number of clusters C = 3, for which the Xie–Beni index attained its smallest value, 0.660.
For C = 3, convergence is reached after eight iterations, at which point the difference Δ in (17) between the weights obtained in the last two iterations is 2.139 × 10⁻⁶.
Then, the census zones of Genoa were grouped into three clusters, C1, C2, and C3. Table 4 shows, normalized with respect to the groups to which they belong, the centroids of the three clusters and the relevance of the features with respect to the three clusters.
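One reading of this normalization, consistent with the values in Table 4, is that the features of each group are frequencies that sum to one within the group. The sketch below, with hypothetical function names and made-up counts, normalizes a feature matrix group by group and checks that the C1 centroid reported for Genoa sums to approximately one in every group.

```python
import numpy as np

# Number of features per group, in the order of Table 2.
GROUP_SIZES = {"construction technique": 3, "construction period": 9,
               "number of floors": 4, "number of interiors": 6,
               "state of conservation": 4}

def group_slices(sizes):
    """Map each group name to the column slice it occupies."""
    slices, start = {}, 0
    for name, n in sizes.items():
        slices[name] = slice(start, start + n)
        start += n
    return slices

def normalize_by_group(X, slices):
    """Divide each block of columns by its row total (zero rows stay zero)."""
    X = np.asarray(X, dtype=float)
    out = np.zeros_like(X)
    for sl in slices.values():
        block = X[:, sl]
        total = block.sum(axis=1, keepdims=True)
        out[:, sl] = block / np.where(total > 0, total, 1.0)
    return out

slices = group_slices(GROUP_SIZES)

# Made-up building counts for one census zone (illustration only).
counts = np.arange(1, 27).reshape(1, -1)
freq = normalize_by_group(counts, slices)

# Centroid of cluster C1 for Genoa, copied from Table 4 (d5 ... d31).
genoa_c1 = np.array([0.762, 0.188, 0.050,
                     0.526, 0.213, 0.140, 0.072, 0.028, 0.014, 0.003, 0.002, 0.002,
                     0.099, 0.279, 0.209, 0.412,
                     0.242, 0.215, 0.119, 0.109, 0.107, 0.208,
                     0.221, 0.611, 0.151, 0.016])
group_sums = {name: float(genoa_c1[sl].sum()) for name, sl in slices.items()}
```

Each entry of `group_sums` equals one up to the rounding of the table, so a centroid value can be read as the typical share of a characteristic within its group.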
Cluster C1 comprises census zones in which residential buildings are predominantly built in load-bearing masonry, date from before 1919, and are in a fair state of conservation. Cluster C2 comprises census zones in which residential buildings are predominantly built in reinforced concrete during the post-war period, between 1945 and 1960, and are likewise in a fair state of conservation. Cluster C3 comprises census zones in which fairly well conserved residential buildings in load-bearing masonry, built before 1919, coexist with reinforced-concrete buildings built between 1945 and 1960.
The thematic map in Figure 2 illustrates the partitioning of the census zones of Genoa into three clusters. Non-residential census zones are shown in gray.
In summary, the urban area of Genoa appears to consist of three types of zones. The first, shown in red on the map, contains residential buildings predominantly built in load-bearing masonry before the beginning of the 20th century; these zones represent the historic core of the city. The second, shown in green, reflects later urbanization, with buildings predominantly built in reinforced concrete. The third, shown in orange, combines both construction techniques; these zones were likely settled early and then underwent further urbanization.

4.2. Building Classification of Bari

Bari is partitioned into 1502 census zones, of which 1450 are residential (Figure 3).
The preprocessing phase yielded an optimal number of clusters C = 2, for which the Xie–Beni index attained its smallest value, 0.225. For C = 2, convergence is reached after six iterations, at which point the difference Δ in (17) is 3.426 × 10⁻⁶.
The census zones of Bari were then grouped into two clusters, C1 and C2. Table 5 shows the centroids of the two clusters, normalized with respect to the groups to which the features belong, and the relevance of the features with respect to the two clusters.
Cluster C1 comprises census zones that predominantly contain residential buildings constructed in load-bearing masonry prior to 1919 and that exhibit a satisfactory state of conservation. Cluster C2 comprises census zones that are characterized by the presence of residential buildings constructed in load-bearing masonry and reinforced concrete, with a frequency that is not negligible. The majority of these structures are in fair condition and were constructed primarily between 1919 and 1980, with a notable increase in the period between 1960 and 1970.
The thematic map in Figure 4 illustrates the partitioning of the census zones of Bari into two clusters. Non-residential census zones are shown in gray.
Bari’s urban landscape is characterized by the coexistence of two distinct residential zones, as depicted on the provided map. The first zone, delineated in red, comprises historic centers that have remained largely untouched by subsequent urban expansion. In contrast, the second zone, marked in green, consists of equally historic settlements that have undergone substantial building development between the post-war period and the 1980s. This phenomenon is exemplified by the coexistence within these urban areas of load-bearing masonry residential buildings, likely constructed between 1919 and 1945, and reinforced concrete buildings, presumably erected from the post-war period onwards.

4.3. Building Classification of Naples

Naples is partitioned into 4301 census zones, of which 4049 are residential (Figure 5).
The preprocessing phase yielded an optimal number of clusters C = 3, for which the Xie–Beni index attained its smallest value, 2.480.
For C = 3, convergence is reached after six iterations, at which point the difference Δ in (17) is 2.534 × 10⁻⁶.
Then, the census zones of Naples have been grouped into three clusters, C1, C2, and C3. Table 6 shows, normalized with respect to the groups to which they belong, the centroids of the three clusters and the relevance of the features with respect to the three clusters.
Cluster C1 groups together census zones in which residential buildings were predominantly built in load-bearing masonry at a time before 1919, but with a non-negligible frequency of buildings constructed later, up until 1960. These residential buildings are, predominantly, at least four floors high. Cluster C2 groups together census zones in which residential buildings were predominantly built in reinforced concrete in the post-war period, between 1945 and 1960, and in a good state of preservation. In this cluster too, residential buildings are predominantly at least four stories high.
Cluster C3 includes census zones in which mainly residential buildings were constructed in load-bearing masonry before 1919. Most of them are in a poor state of preservation.
The thematic map in Figure 6 shows the partitioning of the census zones of Naples into the three clusters. Non-residential census zones are shown in gray.
In summary, Naples appears to be made up of three types of urban zones. The first, shown in red on the thematic map, contains residential buildings predominantly built in load-bearing masonry before the beginning of the last century; these zones characterize the historic center of the city, and their buildings are frequently in a poor state of conservation. The second, shown in green, reflects later urbanization, with buildings predominantly built in reinforced concrete. The third, shown in orange, contains buildings predominantly built in load-bearing masonry, together with later constructions, probably in reinforced concrete, carried out up until 1960.

4.4. Comparison Results

The experimental comparison focuses on EWFCM, which is a representative state-of-the-art extension of FCM that incorporates entropy-regularized feature weighting. Comparative tests performed in [8] have shown that EWFCM consistently outperforms both the classical FCM and weighted FCM (wFCM) on standard benchmark datasets, including those from the UCI repository. Since EWFCM can be regarded as a generalization of these methods, comparing the proposed approach with EWFCM provides a stronger baseline.
These comparative tests were carried out by running EWFCM on all test cases, with the stop iteration error ε fixed at 1 × 10⁻⁵.
As for GEWFCM, 20 independent runs of EWFCM with different random initializations were performed for each test case, and the results are reported as averages.
First, the computational speeds of the two algorithms were measured. Table 7 shows the number of iterations and the CPU times obtained by running EWFCM and GEWFCM on the 16 test cases.
In all cases, GEWFCM converges in roughly half the iterations required by EWFCM, and its CPU times in Table 7 are on average about one fifth of those of EWFCM. These results show that GEWFCM is much more efficient than EWFCM in terms of execution time. The gain arises because, unlike in EWFCM, where the feature weights vary from cluster to cluster, in GEWFCM the feature weights are the same for all clusters: each is calculated as the product of the weight of the group to which the feature belongs and the weight of the feature within that group.
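The product rule can be sketched in a few lines. The sketch assumes that the group weights sum to one and that the intragroup weights sum to one within each group; the uniform weights, the random data, and the function names are placeholders rather than values or code from the experiments.

```python
import numpy as np

def combine_weights(group_w, intra_w, group_of):
    """Feature weight = weight of its group x weight of the feature in the group."""
    return np.asarray(group_w)[group_of] * np.asarray(intra_w)

def weighted_sq_dist(X, V, w):
    """d2[k, i] = sum_f w[f] * (X[k, f] - V[i, f]) ** 2.
    The same weight vector w is used for every cluster i."""
    diff2 = (X[:, None, :] - V[None, :, :]) ** 2    # (n, C, F)
    return diff2 @ w                                # (n, C)

sizes = np.array([3, 9, 4, 6, 4])                   # group sizes from Table 2
group_of = np.repeat(np.arange(len(sizes)), sizes)  # group index of each feature

# Placeholder weights: uniform over groups and uniform inside each group.
group_w = np.full(len(sizes), 1.0 / len(sizes))
intra_w = 1.0 / sizes[group_of]
w = combine_weights(group_w, intra_w, group_of)

rng = np.random.default_rng(0)
X = rng.random((10, 26))                            # hypothetical data
V = rng.random((3, 26))                             # hypothetical centers
d2 = weighted_sq_dist(X, V, w)

# Number of weights to update per iteration, for C = 3 clusters:
n_cluster_specific = 3 * 26          # one weight per cluster and feature
n_two_level = len(sizes) + 26        # one per group plus one per feature
```

With 26 features in 5 groups and 3 clusters, a cluster-specific scheme maintains 78 weights, whereas the two-level scheme maintains 31, and a single weight vector is shared by all distance computations.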
To assess the statistical significance of the differences between GEWFCM and EWFCM, a Wilcoxon signed-rank test [22] was performed on the number of iterations and CPU execution times reported in Table 7. The null hypothesis H0 of the test is that GEWFCM and EWFCM have the same performance, and the significance level is set to α = 0.05. The results of this test are shown in Table 8.
Both p-values in Table 8 are well below α = 0.05. Therefore, the null hypothesis of equal performance between the two algorithms is rejected: the observed improvements of GEWFCM in terms of number of iterations and CPU time are statistically significant and reflect a systematic advantage of the proposed method rather than random variability.
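The test can be reproduced from the values in Table 7 with a small pure-Python sketch of an exact signed-rank test. It assumes a one-sided alternative (EWFCM needs more iterations and more time than GEWFCM), which is an assumption about how the test was run; the function names are illustrative.

```python
def midranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks

def wilcoxon_exact_greater(x, y):
    """Exact one-sided signed-rank test of H1: x tends to exceed y.
    Returns (W+, p), W+ being the rank sum of the positive differences."""
    diffs = [a - b for a, b in zip(x, y) if a != b]
    # Doubled ranks are integers even when ties produce half-ranks.
    ranks2 = [int(round(2 * r)) for r in midranks([abs(d) for d in diffs])]
    w_plus2 = sum(r for r, d in zip(ranks2, diffs) if d > 0)
    # Under H0 every difference is positive or negative with probability 1/2:
    # count the sign patterns that lead to each doubled rank sum.
    counts = {0: 1}
    for r in ranks2:
        nxt = {}
        for s, c in counts.items():
            nxt[s] = nxt.get(s, 0) + c
            nxt[s + r] = nxt.get(s + r, 0) + c
        counts = nxt
    tail = sum(c for s, c in counts.items() if s >= w_plus2)
    return w_plus2 / 2, tail / 2 ** len(diffs)

# Values from Table 7, in the order of the table rows.
ewfcm_iter = [15, 13, 16, 12, 12, 13, 11, 14, 14, 16, 15, 11, 11, 11, 10, 10]
gewfcm_iter = [7, 6, 9, 6, 7, 7, 5, 7, 6, 8, 7, 6, 6, 6, 5, 5]
ewfcm_cpu = [17.41, 15.33, 20.27, 15.10, 14.65, 14.89, 14.54, 16.76,
             16.32, 20.06, 18.60, 15.24, 15.16, 14.81, 13.95, 13.98]
gewfcm_cpu = [3.28, 3.04, 3.56, 3.05, 2.89, 2.92, 2.86, 3.10,
              3.05, 3.55, 3.37, 3.07, 3.04, 2.90, 2.86, 2.87]

w_iter, p_iter = wilcoxon_exact_greater(ewfcm_iter, gewfcm_iter)
w_cpu, p_cpu = wilcoxon_exact_greater(ewfcm_cpu, gewfcm_cpu)
```

Because EWFCM is slower in all 16 cities, every difference is positive and only one of the 2¹⁶ sign patterns is as extreme as the observed one, so both p-values equal 2⁻¹⁶ ≈ 1.53 × 10⁻⁵. This is also the smallest value that a one-sided exact test on 16 pairs can return, which is why the two rows of Table 8 coincide.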
Further comparative tests were performed to measure the similarity between the results obtained with the two algorithms. To measure this similarity, the Adjusted Rand Index (ARI) [23,24] was used; this index allows comparing the similarity between partitions obtained by two clustering algorithms. In Table 9 the values of the ARI metrics obtained in each test case are shown.
The ARI values range between 0.862 and 0.917, with a mean of 0.896 and a standard deviation of 0.017. Since the ARI equals 1 for identical partitions and is close to 0 for partitions that agree only by chance, these results demonstrate a strong agreement between the partitions produced by EWFCM and GEWFCM, confirming that the proposed method preserves the clustering structure of the baseline approach.
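As an illustration of how such a value can be obtained, the sketch below hardens two fuzzy partitions by assigning each object to its cluster of maximum membership and computes the ARI from the contingency table of the resulting labels. The hardening step, the function name, and the tiny membership matrices are assumptions made for the example.

```python
import numpy as np

def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two hard partitions of the same objects."""
    _, a = np.unique(np.asarray(labels_a), return_inverse=True)
    _, b = np.unique(np.asarray(labels_b), return_inverse=True)
    a, b = a.ravel(), b.ravel()
    table = np.zeros((a.max() + 1, b.max() + 1), dtype=np.int64)
    np.add.at(table, (a, b), 1)                 # contingency table

    def pairs(x):
        return x * (x - 1) / 2.0                # number of object pairs

    together_both = pairs(table).sum()
    together_a = pairs(table.sum(axis=1)).sum()
    together_b = pairs(table.sum(axis=0)).sum()
    expected = together_a * together_b / pairs(len(a))
    maximum = 0.5 * (together_a + together_b)
    if maximum == expected:                     # degenerate, e.g. one cluster
        return 1.0
    return float((together_both - expected) / (maximum - expected))

# Hypothetical membership matrices from two algorithms (4 objects, 2 clusters).
U_first = np.array([[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.2, 0.8]])
U_second = np.array([[0.2, 0.8], [0.1, 0.9], [0.6, 0.4], [0.7, 0.3]])
ari_same = adjusted_rand_index(U_first.argmax(axis=1), U_second.argmax(axis=1))

ari_partial = adjusted_rand_index([0, 0, 1, 1], [0, 0, 1, 2])
ari_worse = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

The first pair describes the same grouping with swapped cluster labels and therefore scores 1; the index does not depend on how the clusters are numbered. The last pair agrees less than chance would, and its score falls below zero.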
In summary, GEWFCM is comparable to EWFCM in terms of the quality of the clustering results, but it is computationally much faster.
Furthermore, it allows greater semantic meaning to be assigned to the clusters, highlighting how significant a feature is within a cluster compared with the group to which it belongs.

5. Conclusions

This study examined the most significant and representative feature-weighted fuzzy clustering algorithms. In most practical clustering tasks, attributes representing data characteristics have varying degrees of importance in forming the clustering structure.
This work presents a new entropy-weighted fuzzy clustering algorithm based on a two-level weight hierarchy: a global level that refers to groups of features and a local level that determines the weight of features within a group.
This mechanism, on the one hand, provides greater semantic interpretability of the clusters, and on the other, guarantees high computational speed, due to the independence of the weights from the clusters.
Experimental tests conducted on residential building census datasets from 16 Italian cities have shown that GEWFCM increases the semantic interpretability of the clusters. Furthermore, in all test cases, the results are highly similar to EWFCM, despite significantly lower processing times.
Further testing will be necessary to demonstrate the effectiveness of the proposed two-level hierarchy in optimally capturing the intrinsic structure of the data and providing better semantic interpretability of the clusters. To conclude, the authors intend to conduct future research to test GEWFCM on high-dimensional datasets with different cardinalities and data structures.

Author Contributions

Conceptualization, R.C., B.C., and F.D.M.; methodology, R.C., B.C., and F.D.M.; software, R.C., B.C., and F.D.M.; validation, R.C., B.C., and F.D.M.; formal analysis, R.C., B.C., and F.D.M.; investigation, R.C., B.C., and F.D.M.; resources, R.C., B.C., and F.D.M.; data curation, R.C., B.C., and F.D.M.; writing—original draft preparation, R.C., B.C., and F.D.M.; writing—review and editing, R.C., B.C., and F.D.M.; visualization, R.C., B.C., and F.D.M.; supervision, R.C., B.C., and F.D.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data presented in this study and the source code created to implement the proposed method are available on request from the corresponding author.

Acknowledgments

The article has been developed within the context of the project RETURN (Multi-Risk sciEnce for resilienT commUnities undeR a changiNg climate)—the extended partnership that aims to strengthen research chains on environmental, natural and anthropogenic risks at national level and promote their participation in strategic European and global value chains.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Bezdek, J.C. Pattern Recognition with Fuzzy Objective Function Algorithms; Kluwer Academic Publishers: Norwell, MA, USA, 1981; p. 256.
  2. Bezdek, J.C.; Ehrlich, R.; Full, W. FCM: The fuzzy c-means clustering algorithm. Comput. Geosci. 1984, 10, 191–203.
  3. Li, W.; Zhai, S.; Xu, W.; Pedrycz, W.; Qian, Y.; Ding, W.; Zhan, T. Feature Selection Approach Based on Improved Fuzzy C-Means with Principle of Refined Justifiable Granularity. IEEE Trans. Fuzzy Syst. 2023, 31, 2112–2126.
  4. Lincy, J.M.; Sudha, N. Weighted fuzzy C means and enhanced adaptive neuro-fuzzy inference based chronic kidney disease classification. J. Fuzzy Ext. Appl. 2024, 5, 100–115.
  5. Ma, N.; Hu, Q.; Wu, K.; Yuan, Y. A Dissimilarity Measure Powered Feature Weighted Fuzzy C-Means Algorithm for Gene Expression Data. IEEE Trans. Fuzzy Syst. 2025, 33, 192–202.
  6. Keller, A.; Klawonn, F. Fuzzy clustering with weighting of data variables. Int. J. Uncertain. Fuzziness Knowl.-Based Syst. 2000, 8, 734–746.
  7. Winkler, R.; Klawonn, F.; Kruse, R. Fuzzy c-means in high dimensional spaces. Int. J. Fuzzy Syst. Appl. 2011, 1, 1–16.
  8. Zhou, J.; Chen, L.; Chen, C.I.P.; Zhang, Y.; Li, H.-Z. Fuzzy clustering with the entropy of attribute weights. Neurocomputing 2016, 198, 125–134.
  9. Yang, M.-S.; Nataliani, Y. A feature-reduction fuzzy clustering algorithm based on feature-weighted entropy. IEEE Trans. Fuzzy Syst. 2017, 26, 817–835.
  10. Guo, Y.; Wang, R.; Zhou, J.; Chen, Y.; Jiang, H.; Han, S.; Wang, L.; Du, T.; Ji, K.; Zhao, Y.; et al. Soft Subspace Fuzzy Clustering with Dimension Affinity Constraint. Int. J. Fuzzy Syst. 2022, 24, 2283–2301.
  11. Hashemzadeh, M.; Oskouei, A.G.; Farajzadeh, N. New fuzzy C-means clustering method based on feature-weight and cluster-weight learning. Appl. Soft Comput. 2019, 78, 324–345.
  12. Song, S.; Jia, Z.; Shi, F.; Wang, J.; Ni, D. Adaptive fuzzy weighted C-mean image segmentation algorithm combining a new distance metric and prior entropy. Eng. Appl. Artif. Intell. 2024, 131, 107776.
  13. Lin, J.; Wu, L.; Chen, R.; Wu, J.; Wang, X. Double-weighted fuzzy clustering with samples and generalized entropy features. Concurr. Comput. Pract. Exp. 2021, 33, e5758.
  14. Bishop, C.M. Pattern Recognition and Machine Learning; Springer: New York, NY, USA, 2006; 778p, ISBN 978-0-387-31073-2.
  15. He, Y.-L.; Zhang, X.-L.; Ao, W.; Huang, J.Z. Determining the optimal temperature parameter for softmax function in reinforcement learning. Appl. Soft Comput. 2018, 70, 80–85.
  16. Amini Pishro, A.; Zhang, S.; L’Hostis, A.; Liu, Y.; Hu, Q.; Hejazi, F.; Shahpasand, M.; Rahman, A.; Oueslati, A.; Zhang, Z. Machine learning-aided hybrid technique for dynamics of rail transit stations classification: A case study. Sci. Rep. 2024, 14, 23929.
  17. Cafaro, R.; Cardone, B.; D’Ambrosio, V.; Di Martino, F.; Miraglia, V. A GIS-Integrated Framework for Unsupervised Fuzzy Classification of Residential Building Pattern. Electronics 2025, 14, 4022.
  18. Xie, X.L.; Beni, I.G. A validity measure for fuzzy clustering. IEEE Trans. Pattern Anal. Mach. Intell. 1991, 13, 841–847.
  19. Pal, N.R.; Bezdek, J.C. On cluster validity for the fuzzy c-means model. IEEE Trans. Fuzzy Syst. 1995, 3, 370–379.
  20. Pérez-Sánchez, I.; Medina-Pérez, M.A.; Monroy, R.; Loyola-González, O.; Gutierrez-Rodríguez, A.E. New Evaluation Method for Fuzzy Cluster Validity Indices. IEEE Access 2025, 13, 22728–22744.
  21. Huang, M.; Xia, Z.; Wang, H.; Zeng, Q.; Wang, Q. The range of the value for the fuzzifier of the fuzzy c-means algorithm. Pattern Recognit. Lett. 2012, 33, 2280–2284.
  22. Conover, W.J. Practical Nonparametric Statistics, 3rd ed.; John Wiley & Sons: Hoboken, NJ, USA, 1999; 608p, ISBN 978-0-471-16068-7.
  23. Santos, J.M.; Embrechts, M. On the Use of the Adjusted Rand Index as a Metric for Evaluating Supervised Classification. In Artificial Neural Networks–ICANN 2009. ICANN 2009; Lecture Notes in Computer Science; Alippi, C., Polycarpou, M., Panayiotou, C., Ellinas, G., Eds.; Springer: Berlin/Heidelberg, Germany, 2009; Volume 5769.
  24. Warrens, M.J.; van der Hoef, H. Understanding the Adjusted Rand Index and Other Partition Comparison Indices Based on Counting Object Pairs. J. Classif. 2022, 39, 487–509.
Figure 1. Partitioning of the city of Genoa into census zones.
Figure 2. Thematic map of the census zones of Genoa classified into the three clusters.
Figure 3. Partitioning of the city of Bari into census zones.
Figure 4. Thematic map of the census zones of Bari classified into the two clusters.
Figure 5. Partitioning of the city of Naples into census zones.
Figure 6. Thematic map of the census zones of Naples classified into the three clusters.
Table 1. Characteristics, strengths, and limitations of the existing weighted FCM methods.

| Type of Method | Core Idea | Complexity | Interpretability | Strengths | Limitations |
|---|---|---|---|---|---|
| FCM ([1,2]) | Prototype-based clustering | Low | Low | Simple, fast | No feature weighting; sensitive to noise |
| Grouped FCM ([6]) | Feature grouping | Low | Low | Structured representation | Limited adaptability; sensitive to noise |
| wFCM ([7]) | Global feature weighting | Low | Low | Reduces noise | Does not exploit semantic relationships between features; all variables are treated independently |
| EWFCM-based ([8,9,10,11,12,13]) | Entropy-based weighting | High | Medium-Low | Adaptive feature selection | High computational cost; low interpretability |

Table 2. Features related to the frequency of residential buildings with a specific characteristic and their grouping.

| Group | Feature | Description |
|---|---|---|
| Construction technique | d5 | Frequency of residential buildings constructed of masonry |
| | d6 | Frequency of residential buildings constructed of reinforced concrete |
| | d7 | Frequency of residential buildings constructed of other materials |
| Construction period | d8 | Frequency of residential buildings constructed before 1919 |
| | d9 | Frequency of residential buildings constructed from 1919 to 1945 |
| | d10 | Frequency of residential buildings constructed from 1946 to 1960 |
| | d11 | Frequency of residential buildings constructed from 1961 to 1970 |
| | d12 | Frequency of residential buildings constructed from 1971 to 1980 |
| | d13 | Frequency of residential buildings constructed from 1981 to 1990 |
| | d14 | Frequency of residential buildings constructed from 1991 to 2000 |
| | d15 | Frequency of residential buildings constructed from 2001 to 2005 |
| | d16 | Frequency of residential buildings constructed after 2005 |
| Number of floors | d17 | Frequency of single-story residential buildings |
| | d18 | Frequency of two-story residential buildings |
| | d19 | Frequency of three-story residential buildings |
| | d20 | Frequency of residential buildings with four or more floors |
| Number of interiors | d21 | Frequency of single-family residential buildings |
| | d22 | Frequency of two-apartment residential buildings |
| | d23 | Frequency of residential buildings from three to four apartments |
| | d24 | Frequency of residential buildings from five to eight apartments |
| | d25 | Frequency of residential buildings from nine to sixteen apartments |
| | d26 | Frequency of residential buildings with at least sixteen apartments |
| State of conservation | d28 | Frequency of residential buildings with excellent state of conservation |
| | d29 | Frequency of residential buildings with fair state of conservation |
| | d30 | Frequency of residential buildings with poor state of conservation |
| | d31 | Frequency of residential buildings with very poor state of conservation |

Table 3. Values of the parameters used by executing GEWFCM.

| Parameter | Description | Value |
|---|---|---|
| m | Fuzzifier | 2 |
| ε | Stop iteration error | 1 × 10⁻⁵ |
| Γ_g | Groups entropy parameter | 10 |
| Γ_h | Features entropy parameter | 10 |

Table 4. Cluster centroids and feature relevance for Genoa.

| Group | Feature | Centroid C1 | Centroid C2 | Centroid C3 | Relevance r1 | Relevance r2 | Relevance r3 |
|---|---|---|---|---|---|---|---|
| Construction technique | d5 | 0.762 | 0.384 | 0.599 | significant | not negligible | not negligible |
| | d6 | 0.188 | 0.668 | 0.346 | negligible | significant | not negligible |
| | d7 | 0.050 | 0.018 | 0.055 | negligible | negligible | negligible |
| Construction period | d8 | 0.526 | 0.200 | 0.331 | significant | not negligible | significant |
| | d9 | 0.213 | 0.219 | 0.209 | not negligible | not negligible | not negligible |
| | d10 | 0.140 | 0.277 | 0.286 | not negligible | significant | significant |
| | d11 | 0.072 | 0.190 | 0.093 | negligible | not negligible | negligible |
| | d12 | 0.028 | 0.074 | 0.041 | negligible | negligible | negligible |
| | d13 | 0.014 | 0.032 | 0.027 | negligible | negligible | negligible |
| | d14 | 0.003 | 0.005 | 0.006 | negligible | negligible | negligible |
| | d15 | 0.002 | 0.003 | 0.004 | negligible | negligible | negligible |
| | d16 | 0.002 | 0.002 | 0.004 | negligible | negligible | negligible |
| Number of floors | d17 | 0.099 | 0.039 | 0.087 | negligible | negligible | negligible |
| | d18 | 0.279 | 0.100 | 0.234 | not negligible | negligible | negligible |
| | d19 | 0.209 | 0.114 | 0.200 | negligible | negligible | negligible |
| | d20 | 0.412 | 0.747 | 0.479 | not negligible | significant | not negligible |
| Number of interiors | d21 | 0.242 | 0.095 | 0.227 | not negligible | negligible | not negligible |
| | d22 | 0.215 | 0.079 | 0.165 | not negligible | negligible | negligible |
| | d23 | 0.119 | 0.072 | 0.111 | negligible | negligible | negligible |
| | d24 | 0.109 | 0.099 | 0.105 | negligible | negligible | negligible |
| | d25 | 0.107 | 0.158 | 0.114 | negligible | negligible | negligible |
| | d26 | 0.208 | 0.497 | 0.278 | not negligible | significant | not negligible |
| State of conservation | d28 | 0.221 | 0.163 | 0.256 | negligible | negligible | not negligible |
| | d29 | 0.611 | 0.729 | 0.543 | significant | significant | significant |
| | d30 | 0.151 | 0.100 | 0.180 | negligible | negligible | negligible |
| | d31 | 0.016 | 0.008 | 0.021 | negligible | negligible | negligible |

Table 5. Cluster centroids and feature relevance for Bari.

| Group | Feature | Centroid C1 | Centroid C2 | Relevance r1 | Relevance r2 |
|---|---|---|---|---|---|
| Construction technique | d5 | 0.688 | 0.347 | significant | not negligible |
| | d6 | 0.259 | 0.603 | negligible | not negligible |
| | d7 | 0.053 | 0.051 | negligible | negligible |
| Construction period | d8 | 0.235 | 0.098 | significant | negligible |
| | d9 | 0.158 | 0.167 | not negligible | not negligible |
| | d10 | 0.161 | 0.173 | not negligible | not negligible |
| | d11 | 0.153 | 0.237 | not negligible | significant |
| | d12 | 0.180 | 0.205 | not negligible | not negligible |
| | d13 | 0.054 | 0.067 | negligible | negligible |
| | d14 | 0.037 | 0.036 | negligible | negligible |
| | d15 | 0.013 | 0.011 | negligible | negligible |
| | d16 | 0.009 | 0.005 | negligible | negligible |
| Number of floors | d17 | 0.202 | 0.100 | negligible | negligible |
| | d18 | 0.261 | 0.172 | not negligible | negligible |
| | d19 | 0.191 | 0.152 | negligible | negligible |
| | d20 | 0.347 | 0.576 | not negligible | significant |
| Number of interiors | d21 | 0.270 | 0.155 | not negligible | negligible |
| | d22 | 0.157 | 0.092 | negligible | negligible |
| | d23 | 0.151 | 0.107 | negligible | negligible |
| | d24 | 0.149 | 0.190 | negligible | not negligible |
| | d25 | 0.121 | 0.216 | negligible | not negligible |
| | d26 | 0.152 | 0.240 | negligible | not negligible |
| State of conservation | d28 | 0.248 | 0.174 | negligible | negligible |
| | d29 | 0.536 | 0.674 | significant | significant |
| | d30 | 0.195 | 0.143 | negligible | negligible |
| | d31 | 0.021 | 0.010 | negligible | negligible |

Table 6. Cluster centroids and feature relevance for Naples.

| Group | Feature | Centroid C1 | Centroid C2 | Centroid C3 | Relevance r1 | Relevance r2 | Relevance r3 |
|---|---|---|---|---|---|---|---|
| Construction technique | d5 | 0.758 | 0.441 | 0.961 | significant | not negligible | significant |
| | d6 | 0.212 | 0.514 | 0.034 | negligible | not negligible | negligible |
| | d7 | 0.030 | 0.046 | 0.005 | negligible | negligible | negligible |
| Construction period | d8 | 0.415 | 0.200 | 0.769 | significant | not negligible | significant |
| | d9 | 0.212 | 0.139 | 0.173 | not negligible | not negligible | not negligible |
| | d10 | 0.134 | 0.197 | 0.021 | not negligible | not negligible | negligible |
| | d11 | 0.100 | 0.218 | 0.012 | negligible | not negligible | negligible |
| | d12 | 0.067 | 0.117 | 0.009 | negligible | not negligible | negligible |
| | d13 | 0.033 | 0.066 | 0.008 | negligible | negligible | negligible |
| | d14 | 0.006 | 0.015 | 0.000 | negligible | negligible | negligible |
| | d15 | 0.007 | 0.011 | 0.002 | negligible | negligible | negligible |
| | d16 | 0.025 | 0.037 | 0.006 | negligible | negligible | negligible |
| Number of floors | d17 | 0.075 | 0.087 | 0.053 | negligible | negligible | negligible |
| | d18 | 0.186 | 0.225 | 0.068 | negligible | negligible | negligible |
| | d19 | 0.142 | 0.138 | 0.060 | negligible | negligible | negligible |
| | d20 | 0.598 | 0.550 | 0.819 | significant | significant | significant |
| Number of interiors | d21 | 0.134 | 0.174 | 0.053 | negligible | not negligible | negligible |
| | d22 | 0.076 | 0.086 | 0.050 | negligible | negligible | negligible |
| | d23 | 0.126 | 0.114 | 0.148 | negligible | negligible | negligible |
| | d24 | 0.214 | 0.156 | 0.349 | not negligible | negligible | significant |
| | d25 | 0.220 | 0.143 | 0.282 | not negligible | negligible | not negligible |
| | d26 | 0.229 | 0.327 | 0.117 | not negligible | not negligible | negligible |
| State of conservation | d28 | 0.050 | 0.089 | 0.008 | negligible | negligible | negligible |
| | d29 | 0.473 | 0.562 | 0.163 | not negligible | significant | negligible |
| | d30 | 0.414 | 0.312 | 0.752 | not negligible | not negligible | significant |
| | d31 | 0.063 | 0.037 | 0.077 | negligible | negligible | negligible |

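Table 7. Number of iterations and CPU times obtained executing EWFCM and GEWFCM.

| City | Number of Clusters | EWFCM Iterations | EWFCM CPU Time (s) | GEWFCM Iterations | GEWFCM CPU Time (s) |
|---|---|---|---|---|---|
| Turin | 3 | 15 | 17.41 | 7 | 3.28 |
| Genoa | 3 | 13 | 15.33 | 6 | 3.04 |
| Milan | 2 | 16 | 20.27 | 9 | 3.56 |
| Venice | 2 | 12 | 15.10 | 6 | 3.05 |
| Verona | 3 | 12 | 14.65 | 7 | 2.89 |
| Padua | 4 | 13 | 14.89 | 7 | 2.92 |
| Parma | 2 | 11 | 14.54 | 5 | 2.86 |
| Bologna | 3 | 14 | 16.76 | 7 | 3.10 |
| Florence | 2 | 14 | 16.32 | 6 | 3.05 |
| Rome | 2 | 16 | 20.06 | 8 | 3.55 |
| Naples | 3 | 15 | 18.60 | 7 | 3.37 |
| Bari | 2 | 11 | 15.24 | 6 | 3.07 |
| Palermo | 2 | 11 | 15.16 | 6 | 3.04 |
| Catania | 3 | 11 | 14.81 | 6 | 2.90 |
| Messina | 2 | 10 | 13.95 | 5 | 2.86 |
| Cagliari | 2 | 10 | 13.98 | 5 | 2.87 |
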
Table 8. Results of the Wilcoxon signed-rank test applied to number of iterations and CPU time.

| Measure | p-Value |
|---|---|
| Iterations | 1.53 × 10⁻⁵ |
| CPU time | 1.53 × 10⁻⁵ |

Table 9. Values of the ARI index for the 16 test cases.

| City | ARI Index |
|---|---|
| Turin | 0.906 |
| Genoa | 0.871 |
| Milan | 0.862 |
| Venice | 0.915 |
| Verona | 0.877 |
| Padua | 0.904 |
| Parma | 0.917 |
| Bologna | 0.889 |
| Florence | 0.903 |
| Rome | 0.866 |
| Naples | 0.890 |
| Bari | 0.901 |
| Palermo | 0.905 |
| Catania | 0.903 |
| Messina | 0.912 |
| Cagliari | 0.910 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Cafaro, R.; Cardone, B.; Di Martino, F. A Novel Two-Level Entropy-Weighted Fuzzy C-Means Algorithm and Its Application for Classifying Urban Patterns by Residential Building Characteristics. Symmetry 2026, 18, 807. https://doi.org/10.3390/sym18050807


Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.
