Efficiency of Cluster Validity Indexes in Fuzzy Clusterwise Generalized Structured Component Analysis

Abstract: Fuzzy clustering has been broadly applied to classify data into K clusters by assigning each data point membership probabilities with respect to K centroids. It has also been applied to characterizing clusters within a statistical model such as structural equation modeling, where the characteristics identified by the model define the clusters as heterogeneous subgroups of a population. Recently, such a statistical model has been formulated as fuzzy clusterwise generalized structured component analysis (fuzzy clusterwise GSCA). As in fuzzy clustering, the clusters must be enumerated before the population and its parameters can be inferred within fuzzy clusterwise GSCA. However, identifying the number of clusters in fuzzy clustering is a difficult task because classification indexes are data-dependent, which is known as the cluster validity problem. We examined the cluster validity problem within the fuzzy clusterwise GSCA framework and propose a new criterion for selecting the optimal number of clusters that uses both the fit indexes of GSCA and the fuzzy validity indexes of fuzzy clustering. The criterion, named the FIT-FHV method because it combines the fit index FIT from GSCA with the cluster validation measure FHV from fuzzy clustering, performed better than any other index used in fuzzy clusterwise GSCA.


Introduction
In statistical inference, researchers draw conclusions about a population from statistical analysis of a sample; treating the population as a homogeneous group is invalid if it contains heterogeneous subpopulations. To avoid such invalid inference, homogeneous subpopulations that support the statistical inference must be identified as a form of internal validation. That is, the heterogeneity of the population must be examined correctly in order to understand the data better, to use them more precisely, and to derive valid inferences about the population through data analysis. In this study, we investigate the identification of heterogeneous subpopulations via fuzzy clustering in inferential statistics. In general, census data are unavailable, so we cannot observe the heterogeneity of a population directly; instead, we infer it through data analysis of a sample that reveals discrepancies among heterogeneous subgroups when they exist. Although such a discrepancy can be examined, it has often been dismissed as sampling error, which can cause serious problems in inferential statistics. In the field of structural equation modeling (SEM), which examines the associations and directionalities among variables, researchers have addressed heterogeneity by developing a statistical method called mixture modeling [1]. In this study, we narrow our scope to generalized structured component analysis (GSCA; [2]), a component-based form of structural equation modeling. In GSCA, fuzzy clustering has been used to classify the heterogeneity of a population when the indicators (observed variables) of clusters are ordinal or continuous [3], an approach called fuzzy clusterwise GSCA. Analogous to latent class analysis in factor-based SEM, fuzzy clusterwise GSCA has also been extended to categorical indicators [4].
Although the algorithm of fuzzy clusterwise GSCA performs well in classifying and characterizing heterogeneity, the question of how to find the optimal number of clusters, often called the cluster validity problem, remains open.
To validate the number of clusters in fuzzy clusterwise GSCA, the modified partition coefficient (MPC) and the normalized classification entropy (NCE) were proposed [5] and have been used since [2]. However, no study has examined the efficiency of these cluster validity indexes in fuzzy clusterwise GSCA. Although several studies, including Wang and Zhang [6], have examined the efficiency of cluster validity measures, they were limited to unsupervised classification (fuzzy clustering) rather than fuzzy clusterwise GSCA. In this study, we (1) review cluster validity measures introduced in previous studies (e.g., [7]), (2) examine the efficiency of eight indexes selected for fuzzy clusterwise GSCA, and (3) propose, on the basis of a simulation study, the selection criterion that fits fuzzy clusterwise GSCA best: the FIT-FHV method, which combines the fit index FIT from GSCA with the cluster validation measure FHV from fuzzy clustering. The resulting outcomes should also be applicable to other structural equation mixture models such as latent profile analysis and growth mixture curve modeling [1].

GSCA Model Specification
As stated earlier, GSCA is a component-based approach to SEM. Unlike factor-based SEM, GSCA consists of three sub-models: the measurement, structural, and weighted relation models. The first two of these models are the same as in factor-based SEM, also known as the linear structural relations model [8], while the weighted relation model defines a latent variable as a weighted composite (or component) of indicators, in a way that is unique to component-based analyses. Using the same notation as Hwang and Takane [2], we can write these sub-models in matrix form as follows:

z = C^T γ + ε, (measurement model)
γ = B^T γ + ζ, (structural model)
γ = W^T z, (weighted relation model)

where z is a J by 1 vector of observed variables; γ is a P by 1 vector of latent variables, which are unobserved but defined by the observed variables; C is a P by J matrix of loadings indicating the associations between z and γ; B is a P by P matrix of path coefficients between the latent variables in γ; W is a J by P matrix of component weights that define γ in terms of z; ε is a J by 1 vector of residuals of z; and ζ is a P by 1 vector of residuals of γ. The superscript T denotes the transpose (see [2] for further discussion of SEM).
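For concreteness, the three sub-models can be evaluated numerically. All dimensions, weights, loadings, and path values below are hypothetical toy numbers, not estimates from any dataset:

```python
import numpy as np

# Toy dimensions: J = 4 indicators, P = 2 latent components.
rng = np.random.default_rng(0)
z = rng.standard_normal(4)            # J x 1 vector of (standardized) indicators

W = np.array([[0.7, 0.0],
              [0.7, 0.0],
              [0.0, 0.7],
              [0.0, 0.7]])            # J x P component weights
C = np.array([[0.8, 0.8, 0.0, 0.0],
              [0.0, 0.0, 0.8, 0.8]])  # P x J loadings
B = np.array([[0.0, 0.5],
              [0.0, 0.0]])            # P x P path coefficients

gamma = W.T @ z                       # weighted relation model: gamma = W'z
z_hat = C.T @ gamma                   # measurement model: z ~ C'gamma + eps
g_hat = B.T @ gamma                   # structural model: gamma ~ B'gamma + zeta
```

The weighted relation model is what makes the latent variables computable directly from the data, which is the defining feature of the component-based approach.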

Estimation of GSCA
GSCA estimates model parameters, including the weights (W), path coefficients (B), and loadings (C), by minimizing the sum of squared residuals e_i; that is, by consistently minimizing the single least squares criterion

φ = Σ_{i=1}^{N} (V^T z_i − A^T W^T z_i)^T (V^T z_i − A^T W^T z_i),

where A = [C B], V = [I W], and N is the sample size. To maintain consistent scaling for the indicators and latent variables, both the indicators and the latent variables must be standardized in GSCA.
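A minimal sketch of this criterion, assuming the matrix convention A = [C B] and V = [I W] (a hypothetical helper, not the GSCA estimation code):

```python
import numpy as np

def gsca_criterion(Z, W, A):
    """phi = SS(ZV - ZWA) with V = [I, W]: the squared residuals of the
    measurement and structural models evaluated at component scores ZW."""
    N, J = Z.shape
    V = np.hstack([np.eye(J), W])     # J x (J+P)
    Gamma = Z @ W                     # N x P component scores
    return float(np.sum((Z @ V - Gamma @ A) ** 2))
```

In the alternating least squares procedure, this value is driven down by updating W and A in turn until it stops decreasing.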

Optimal Scaling for Categorical Variables.
Hwang and Takane [9] extended GSCA to include categorical indicators, making it possible to apply GSCA to qualitative data such as nominal and ordinal data; they called the new method nonlinear GSCA. In the same paper, they resolved the linearity issue afflicting least squares methods by applying the same optimal scaling method used by other researchers and reported in the item response theory literature (see, for example, [10]). In nonlinear GSCA, each indicator z_j, where j = 1, ..., J, is transformed by s_j = ω(z_j), where ω depends on the measurement characteristics of the variable z_j. This requires an additional step in the estimation but, just as in conventional GSCA, nonlinear GSCA is implemented by minimizing the criterion

φ = SS(SV − SWA)

with respect to W, A, and S, where S = [s_1, ..., s_J], subject to the restrictions diag((SW)^T SW) = I, s_j^T s_j = 1, and s_j = ω(z_j); SS stands for sum of squares. This criterion is minimized by alternating two phases [11]. The first phase is identical to the alternating least squares procedure used to update the model parameters (W and A) in conventional GSCA for quantitative data. The second is an optimal scaling phase in which the qualitative data are transformed to quantitative data S so that they agree maximally with their model predictions while preserving the measurement characteristics of the data. In practice, the variables in the original data matrix Z may mix different measurement characteristics; for example, some variables may be nominal, others ordinal, and yet others numerical, as explained in Young [12].
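To illustrate the flavor of the optimal scaling phase for an ordinal indicator, the sketch below fits a least squares monotone transformation (pool-adjacent-violators) of the ordinal codes toward model predictions and then standardizes the result. This is a simplified stand-in for the optimal scaling step, with made-up data and a hypothetical `pava` helper:

```python
import numpy as np

def pava(y):
    """Pool-adjacent-violators: least squares non-decreasing fit to y."""
    vals = [float(v) for v in y]
    wts = [1.0] * len(vals)
    idx = [[i] for i in range(len(vals))]
    i = 0
    while i < len(vals) - 1:
        if vals[i] > vals[i + 1]:       # violation: pool adjacent blocks
            w = wts[i] + wts[i + 1]
            vals[i] = (vals[i] * wts[i] + vals[i + 1] * wts[i + 1]) / w
            wts[i] = w
            idx[i] += idx[i + 1]
            del vals[i + 1], wts[i + 1], idx[i + 1]
            i = max(i - 1, 0)
        else:
            i += 1
    out = np.empty(len(y))
    for v, ids in zip(vals, idx):
        out[ids] = v
    return out

# Hypothetical ordinal indicator and model-side predictions.
z = np.array([1, 1, 2, 2, 3, 3])                 # ordinal category codes
pred = np.array([0.2, 0.1, 0.5, 0.4, 0.3, 0.9])  # model predictions
order = np.argsort(z, kind="stable")             # ordinal order of the cases
s = np.empty_like(pred)
s[order] = pava(pred[order])                     # monotone transform s = w(z)
s = s - s.mean()
s = s / np.linalg.norm(s)                        # enforce s's = 1
```

Centering and rescaling preserve monotonicity, so the transformed scores respect the ordinal measurement characteristics while agreeing as closely as possible (in least squares) with the predictions.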

Fuzzy Clustering Algorithm
Hwang and Takane [2] described how to use fuzzy clustering in GSCA for continuous data, yielding fuzzy clusterwise GSCA. Briefly, to estimate the memberships of the heterogeneous subgroups, we estimate the membership parameters u_ki by minimizing the residual sums of squares weighted by u_ki^m,

φ = Σ_{k=1}^{K} Σ_{i=1}^{N} u_ki^m (V^T z_i − A_k^T W_k^T z_i)^T (V^T z_i − A_k^T W_k^T z_i),

with respect to u_ki, subject to the probabilistic condition Σ_{k=1}^{K} u_ki = 1. The exponent m is referred to as the fuzzifier: cluster boundaries become fuzzier as m grows larger and harder (crisper) as m approaches 1. In this study, we use m = 2, which is the most popular choice [13].
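For intuition, the standard fuzzy c-means membership update that underlies the estimation of u_ki can be sketched as follows. This is a generic fuzzy clustering sketch, not the fuzzy clusterwise GSCA code; the function name and array layout are our own assumptions:

```python
import numpy as np

def update_memberships(X, centroids, m=2.0):
    """Standard fuzzy c-means update:
    u_ki = 1 / sum_l (d_ki / d_li)^(2/(m-1)),
    so the memberships of each point sum to 1 across the K clusters."""
    # d[k, i] = distance of point i from centroid k
    d = np.linalg.norm(X[None, :, :] - centroids[:, None, :], axis=2)
    d = np.maximum(d, 1e-12)                   # guard points on a centroid
    ratio = (d[:, None, :] / d[None, :, :]) ** (2.0 / (m - 1.0))  # K x K x N
    return 1.0 / ratio.sum(axis=1)             # K x N membership matrix
```

With m = 2 the update reduces to inverse squared distances normalized over clusters, which is why larger m spreads membership more evenly (fuzzier boundaries) and m near 1 concentrates it on the nearest centroid.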

Fuzzy Clusterwise GSCA for Latent Class Analysis
Ryoo, Park, and Kim [4] applied fuzzy clusterwise GSCA to binary or ordinal observed data by combining optimal scaling with the fuzzy clustering algorithm to identify heterogeneous subgroups. The resulting model is similar to latent class analysis (LCA), a factor-based model that identifies heterogeneous subgroups from binary or ordinal observed data. The algorithm depicted in Figure 1 starts with fuzzy clustering in Step 1, which gives the initial membership probabilities u_ki. In Step 2, after estimating the parameters of GSCA and applying the optimal scaling s_j = ω(z_j), we also update u_ki. This alternating process is repeated until the parameters converge. Once all parameters have been estimated, in Step 3 we use them to define the characteristics of the clusters and assign memberships based on the posterior probabilities u_ki. It should be noted that, because of the difference between component-based LCA (i.e., fuzzy clusterwise GSCA for discrete data) and factor-based LCA, there is no one-to-one correspondence between their membership assignments. One reason for the discrepancy is the estimation method: fuzzy clusterwise GSCA relies on distances from the centroids, whereas factor-based LCA relies on the likelihood function. Thus, rather than treating the approaches as competitors, it is reasonable for researchers to select one of them, either fuzzy clusterwise GSCA for discrete data or factor-based LCA, following Hwang, Takane, and Jung [14], who discussed the conceptual difference between the two approaches, namely component-based and factor-based SEM, for such cases.

Model Evaluation
For overall model evaluation in fuzzy clusterwise GSCA, the FIT and AFIT indices are used to evaluate fitted GSCA models (see [2] for more details). FIT and AFIT are interpreted as the variance explained by the fitted model, analogous to R squared and adjusted R squared in regression analysis; the larger FIT and AFIT are, the more variance is explained. In this study, we focus on model evaluation for enumerating the optimal number of clusters (or latent classes), which reflects the heterogeneity of the given population. Accordingly, two additional model evaluation tools used in fuzzy clusterwise GSCA, the modified partition coefficient (MPC) and the normalized classification entropy (NCE), are also of interest; for MPC, larger is better, while for NCE, smaller is better. These indexes are, however, somewhat limited to the fuzzy clustering context. We therefore extend the model evaluation tools for fuzzy clusterwise GSCA by employing cluster validity indexes known from fuzzy clustering ([6,15]).

Method
To enumerate the clusters in fuzzy clustering, many indexes have been used; they can be classified by two criteria: compactness (intra-cluster consistency) and separation (inter-cluster consistency) [6]. However, not all of these indexes can be used in fuzzy clusterwise GSCA, because fuzzy clusterwise GSCA accounts not only for the distance from the centroids but also for the characteristics of each cluster, identified through the observed indicators via the probabilities that the events associated with the indicators occur. In addition, fuzzy clusterwise GSCA, as a statistical model, can account for the effects of covariates and/or grouping variables, which is called latent class regression ([16-18]). Thus, cluster validation indexes that can handle the various structures in fuzzy clusterwise GSCA must be considered. In this section, we review cluster validity indexes that have been used within fuzzy clustering, in two categories: those using the membership values only and those using both the membership values and the dataset ([6,15]). We also review two indexes, MPC and NCE, that have been used in fuzzy clusterwise GSCA. Overall, we selected eight candidate indexes, two model fit indexes and six cluster validity indexes, for enumerating the optimal number of clusters within fuzzy clusterwise GSCA.

Cluster Validity Indexes
To find the optimal number of clusters using the membership probabilities and the given data, we consider the following six indexes. Although Wang and Zhang [6] listed more than 20 indexes, we did not include all of them, (1) because they found no dominating index in their study using both empirical and simulated datasets and (2) because not all of the indexes work in fuzzy clusterwise GSCA. Let N and C be the sample size and the number of clusters in the formulae below.

(a) Dave's modified partition coefficient (MPC; [15]): Using the partition coefficient, defined by

PC = (1/N) Σ_{k=1}^{C} Σ_{i=1}^{N} u_ki^2, with PC ∈ [1/C, 1],

Dave defined the MPC, also known as the fuzzy performance index (FPI), as

MPC = 1 − (C/(C − 1)) (1 − PC),

where C is the number of clusters. As a correction for the bias of PC, which exhibits a monotonic tendency [15], MPC performs well with the criterion that larger is better.

(b) Bezdek's normalized classification entropy (NCE; [19]): Using the partition entropy

PE = −(1/N) Σ_{k=1}^{C} Σ_{i=1}^{N} u_ki log u_ki,

Bezdek defined the NCE as

NCE = PE / log C,

which is also recommended by Roubens [5]. For the NCE, smaller is better. Based on the partition entropy, Bezdek [13] also defined the normalized partition entropy (NPE) as

NPE = PE / (1 − C/N).

The same criterion as for the NCE applies to the NPE.

(c) Chen and Linkens' validity index (CLVI; [20]): They defined the CLVI as

CLVI = (1/N) Σ_{i=1}^{N} max_k u_ki − (1/K̄) Σ_{k=1}^{C−1} Σ_{l=k+1}^{C} [(1/N) Σ_{i=1}^{N} min(u_ki, u_li)],

where K̄ = Σ_{k=1}^{C−1} k. The optimal cluster number C is obtained at the maximum of the CLVI. The first term indicates the compactness within a cluster, while the second term indicates the separation between clusters. When the first term is large and the second term is small, the clusters are compact as well as clearly separated from the other clusters. Thus, the larger the CLVI, the better.

(d) Fukuyama and Sugeno's index (FS; [21]): They defined

FS = Σ_{k=1}^{C} Σ_{i=1}^{N} u_ki^m ‖x_i − v_k‖^2 − Σ_{k=1}^{C} Σ_{i=1}^{N} u_ki^m ‖v_k − x̄‖^2,

where v_k is the centroid of cluster k and x̄ is the grand mean of the data. The first term describes both the fuzziness in each cluster and the compactness of the data, and the second term describes the fuzziness of the clusters through the distances of the centroids from the grand mean. The optimal C is found at the minimum of FS.

(e) Gath and Geva's fuzzy hypervolume validity index (FHV; [22]): They defined the FHV as

FHV = Σ_{k=1}^{C} [det(F_k)]^{1/2}, where F_k = Σ_{i=1}^{N} u_ki^m (x_i − v_k)(x_i − v_k)^T / Σ_{i=1}^{N} u_ki^m.

The matrix F_k denotes the fuzzy covariance matrix of cluster k, and det(F_k) is its determinant. The smaller the FHV value, the better; thus, the optimal C is found at its minimum.
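As an illustrative sketch (not the gscaLCA implementation), the six indexes can be computed from a membership matrix U, data X, and centroids V as follows; the function name and array layout are our own assumptions:

```python
import numpy as np
from itertools import combinations

def validity_indexes(X, U, V, m=2.0):
    """Compute the six validity indexes from data X (N x J), memberships
    U (K x N, columns summing to 1), and centroids V (K x J)."""
    K, N = U.shape
    PC = (U ** 2).sum() / N
    MPC = 1.0 - (K / (K - 1.0)) * (1.0 - PC)          # larger is better
    PE = -(U * np.log(np.clip(U, 1e-12, 1.0))).sum() / N
    NCE = PE / np.log(K)                              # smaller is better
    NPE = PE / (1.0 - K / N)                          # smaller is better
    pairs = list(combinations(range(K), 2))           # len = C(C-1)/2
    sep = sum(np.minimum(U[k], U[l]).sum() / N for k, l in pairs) / len(pairs)
    CLVI = U.max(axis=0).sum() / N - sep              # larger is better
    d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)  # K x N
    c2 = ((V - X.mean(axis=0)) ** 2).sum(axis=1)             # K
    Um = U ** m
    FS = (Um * d2).sum() - (Um * c2[:, None]).sum()   # smaller is better
    FHV = 0.0
    for k in range(K):
        diff = X - V[k]
        Fk = (Um[k][:, None] * diff).T @ diff / Um[k].sum()  # fuzzy covariance
        FHV += np.sqrt(np.linalg.det(Fk))             # smaller is better
    return {"MPC": MPC, "NCE": NCE, "NPE": NPE,
            "CLVI": CLVI, "FS": FS, "FHV": FHV}
```

Note that MPC, NCE, NPE, and CLVI use only the memberships, while FS and FHV also use the data and centroids, matching the two index categories reviewed above.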

Holistic Approach to Enumerate the Number of Clusters
In fuzzy clustering, the literature on finding the optimal number of clusters indicates that no dominant index outperforms the others [6]. Rather, the efficiency of an index depends on the data distribution [15]. Although fuzzy clusterwise GSCA involves more model specification than fuzzy clustering, it is likely to share this property in model evaluation. In this study, we therefore not only examine the efficiency of the indexes but also investigate a holistic criterion that performs well and in a stable manner.

Simulation Design
To examine the efficiency of the cluster validation indexes FIT, AFIT, FPI, NCE, NPE, CLVI, FS, and FHV, five variables were considered in this simulation:
1. three levels of sample size (N),
2. three levels of the number of latent classes/clusters (K),
3. two levels of the number of indicators (V),
4. three levels of prevalence of the cluster membership (T), and
5. three levels of error rate in the cluster structure (ER).
In total, 162 (= 3 × 3 × 2 × 3 × 3) simulation conditions were employed; they are summarized in Table 1. Under each condition, 100 replicate response datasets were generated, and fuzzy clusterwise GSCA was fitted to each. The simulation conditions were designed on the basis of previously conducted simulation studies in component-based latent class analysis, including [23,24].

Sample Size
Three levels of sample size were used in the simulation study: 200, 500, and 1000, representing small, medium, and large samples. Brusco et al. [23] investigated cases with relatively small sample sizes (100, 200, and 400); however, the fit indices in their study did not differ noticeably across those sample sizes. Thus, the current study explored a broader range of sample sizes.

The Number of Classes/Clusters
Two, four, and six classes were investigated to examine the efficiency of the cluster validation indexes, which covers most results from latent class analysis. Previous studies ([23,24]) considered two additional numbers of classes, 3 and 5, so as to cover all cases from 2 to 6. However, the cases of 3 and 5 did not reveal any additional trend in the examination of efficiency, so we excluded them from this study.

The Number of Indicators
The numbers of indicators and the response patterns were adapted from previous studies ([23,24]). The response patterns in Table 2 were used directly to generate data under the aforementioned conditions of sample size and number of classes. For example, when the sample size is 200 and the samples are clustered equally into two classes, the responses of the first 100 samples with six indicators are (1, 1, 1, 1, 0, 0), and those of the next 100 samples are (0, 0, 1, 1, 1, 1), assuming a zero error rate.
Table 2. Response patterns for combinations of the number of classes (K) and the number of item indicators (V).

Prevalence of Class Membership
There are three types of prevalence, T1, T2, and T3, in Table 1. In the first (T1), the samples are clustered equally across the given number of classes. In the second (T2), one class holds a dominant share of examinees (60%) and the other classes share the remaining 40% equally. For example, for K = 4, the first cluster (C1), second cluster (C2), third cluster (C3), and fourth cluster (C4) consist of 60%, 13.3%, 13.3%, and 13.3% of the samples, respectively. Lastly, in T3, one group has a small proportion; for example, for K = 6, C1 through C6 consist of 18%, 18%, 18%, 18%, 18%, and 10%. The detailed formulas are presented in Table 1. These study conditions follow Brusco et al. [23].
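As a sketch, the three prevalence schemes can be written as a small helper consistent with the examples above; the function name and parameterization are our own illustrative assumptions:

```python
def prevalence(K, T):
    """Cluster prevalence vectors for the three simulation conditions:
    T1: equal shares; T2: one dominant 60% cluster, rest equal;
    T3: one small 10% cluster, rest equal."""
    if T == 1:
        return [1.0 / K] * K
    if T == 2:
        return [0.6] + [0.4 / (K - 1)] * (K - 1)
    if T == 3:
        return [0.9 / (K - 1)] * (K - 1) + [0.1]
    raise ValueError("T must be 1, 2, or 3")
```

For K = 6 this reproduces the T3 example (five clusters of 18% and one of 10%), and for K = 4, T2 gives 60% plus three clusters of 13.3%.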

Error Rates
Samples within a class do not all share exactly the same response pattern; instead, variations in the generated responses were determined by three levels of error rate: 5%, 10%, and 15%. Because higher error rates, such as 20% or 30%, caused poor cluster recovery in a previous study [24], the current study used relatively low error rates, following Brusco et al. [23]. Once an error rate was determined, the corresponding proportion of samples was selected at random, and one randomly chosen indicator response in each selected sample was flipped: if the original response was 0, it was changed to 1, and vice versa.
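The data generation step just described can be sketched as follows; the function name and argument layout are hypothetical, but the logic (stack class patterns, then flip one random indicator in a random fraction of samples) follows the text:

```python
import numpy as np

def generate_responses(patterns, sizes, error_rate, rng):
    """Stack class response patterns, then flip one randomly chosen
    indicator in a random `error_rate` fraction of the samples."""
    X = np.vstack([np.tile(p, (n, 1)) for p, n in zip(patterns, sizes)])
    X = X.astype(int)
    N, V = X.shape
    n_err = int(round(error_rate * N))
    rows = rng.choice(N, size=n_err, replace=False)  # samples to perturb
    cols = rng.integers(0, V, size=n_err)            # one indicator each
    X[rows, cols] = 1 - X[rows, cols]                # 0 -> 1 and 1 -> 0
    return X
```

Because each perturbed sample is drawn without replacement and only one indicator is flipped per sample, exactly `error_rate * N` cells differ from the error-free data.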
Once the data were generated, we fitted models with different assigned cluster numbers in order to evaluate how many clusters would be selected under each study condition. When the true number of clusters was 2 or 4, we fitted fuzzy clusterwise GSCA, using the R package gscaLCA [25], with 2- to 4-cluster and 2- to 6-cluster solutions, respectively. When the true number of clusters was 6, we fitted fuzzy clusterwise GSCA with 4- to 8-cluster solutions.

Results
After fitting fuzzy clusterwise GSCA to 100 replications for each of the 162 conditions, we examined the efficiency of the eight cluster validity indexes: FIT, AFIT, MPC, NCE, NPE, CLVI, FS, and FHV. In this section, we summarize the results for (1) K = 4 and N = 500 and (2) K = 6 and N = 200, representing, respectively, a medium number of clusters with a medium sample size and a large number of clusters with a small sample size. The first case illustrates the trend in our findings across the 162 conditions, allowing us both to formulate our findings and to propose a holistic criterion. The second case illustrates the stability of our findings, because it is the most vulnerable condition owing to the complexity of the clusters and the sampling variation of the small sample.

FIT-FHV Method
As pointed out in [6,15], no single index is sufficient to identify the true number of clusters, because index efficiency depends on the data distribution. However, when the model evaluation tools in GSCA were taken into account, a holistic search clearly performed well in identifying the true number of clusters. Table 3 summarizes the results for K = 4, N = 500, V = 6 items, and ER = 15% over all three types of distribution, T = 1, 2, and 3. Note that the symbol C in Table 3 is the number of clusters used for estimation. Table 3. Simulation results of K = 4, N = 500, V = 6 items, and ER = 15%. FIT and AFIT performed similarly and showed a large drop once C exceeded the true number of clusters K: drops of more than 0.2 occurred at C = 5 in T1 and T3, and of more than 0.1 at C = 5 in T2. However, neither FIT nor AFIT discriminated among C = 2, C = 3, and C = 4 (the true number of clusters). Although FHV was small at C = 4, there were several cases with lower values for C ≥ 4. Interestingly, none of the FHV values for C ≤ 3 were lower than the FHV at C = 4. Thus, we found that the smallest FHV indicates the true number of clusters if we consider FHV within the range of C for which the FIT and AFIT values are stable and high. We call this holistic criterion the FIT-FHV method. Only 11 of the 162 simulation conditions did not follow the FIT-FHV criterion, and those 11 conditions all involved the small sample size of N = 200, for which one would rarely fit fuzzy clusterwise GSCA to such relatively small data.

FS might be considered a candidate for a similar role in the FIT-FHV method, but FS showed monotonicity, which was a drawback of the traditional partition coefficient (PC) and partition entropy (PE) measures, as shown in [15]. Thus, the FS index requires further investigation. MPC, NCE, NPE, and CLVI indicated neither the true number of clusters nor any consistency in selection.
The behavior underlying the FIT-FHV criterion can also be observed in Figure 2, where six indexes (blue) are plotted against the profile of FIT (black); the first row shows the equally clustered condition (T1), and the second and third rows show the conditions in which one cluster has a large proportion (T2) and one cluster has a small proportion (T3), respectively. The profile of FIT clearly shows the large drop when C > K.
Considering C ≤ K, FHV was smallest when C = K. Again, FS could also be considered, but its range of values was relatively large, spanning negative to positive values. The FIT-FHV method also worked well for larger samples (N = 1000), more indicators (V = 9), and smaller error rates (5% and 10%).

Stability of the FIT-FHV Method
Furthermore, we examined the stability of the FIT-FHV method by exploring the most vulnerable condition, K = 6 and N = 200 (Table 4). Compared with the previous condition, the case of K = 6 and N = 200 has two additional clusters but only 200 samples. Although the drops in FIT and AFIT were smaller than in the previous case, we observed drops in each T. Based on the C selected by the FIT criterion, we easily identified the true number of clusters by examining the FHV values. The FIT-FHV method thus worked well even in the most vulnerable condition of the simulation study. Figure 3 (in which the first row is T1, the second row is T2, and the third row is T3) suggests that MPC, NCE, NPE, CLVI, and FS might support a criterion similar to the FIT-FHV method; however, all five indexes showed monotonicity, and FS took increasingly negative values as C increased. We therefore conclude that the FIT-FHV method outperforms the alternatives in identifying the true number of clusters in this simulation.

Prevalence of Clusters When C Is Assumed to be the Optimal Number of Clusters
In fuzzy clusterwise GSCA, it is also important that the clusters selected by the FIT-FHV method have proportions similar to those in the true population. Tables 5 and 6 summarize the proportions for the two conditions discussed above. For the case of K = 4 and N = 500, T1, T2, and T3 should consist of (25%, 25%, 25%, 25%), (60%, 13.3%, 13.3%, 13.3%), and (30%, 30%, 30%, 10%), respectively. In T1, every C except C = 4 included a very small proportion in the last cluster; in practice, it is hard to characterize a group with such a small proportion. By contrast, at C = 4 the last cluster held 13.38%, which is much smaller than the true 25% but not negligibly small like those of the other solutions. In T2, the true dominant group consists of 60%, and we found that the cases of C ≥ 4 held a close proportion; however, the last two groups at C = 5 and C = 6 were 1.42% and 1.62%, respectively, and such low prevalence rates would not be meaningful. These results indicate that the FIT-FHV method works well. In T3, the true last cluster should consist of 10%, which might suggest that C = 2 is a good proxy; however, its FHV was always higher than that of C = 4, which means that the classification would have weak compactness (intra-cluster consistency). Clearly, C = 4 in T3 was closer to the true prevalence than C = 3 (which fails to identify the fourth cluster), C = 5 (too low a prevalence in the last cluster), or C = 6 (too low a prevalence in the last two clusters).
Even in the most vulnerable case of K = 6 and N = 200, the same trend as in the case of K = 4 and N = 500 was observed. For K = 6 and N = 200, T1, T2, and T3 should consist of (16.7%, 16.7%, 16.7%, 16.7%, 16.7%, 16.7%), (60%, 8%, 8%, 8%, 8%, 8%), and (18%, 18%, 18%, 18%, 18%, 10%), respectively. In T1, every C except C = 6 included a very small proportion in the last cluster, whereas at C = 6 the last cluster held 11.54%, which is much smaller than the true 16.7% but not negligibly small like those of the other solutions. In T2, where the true dominant cluster consists of 60%, we found that the cases of C ≤ 6 held proportions close to the true ones; however, the second groups at C = 4 and C = 5, at 21.00% and 17.39%, respectively, were considerably higher than the true 8% and thus would not be meaningful. In T3, where the true last group consists of 10%, the results might suggest that C = 4 is a good proxy; however, its FHV was always higher than that of C = 6, which means that the classification would have weak compactness (intra-cluster consistency). Overall, C = K best approximates the true prevalences. Table 5. Prevalence of K = 4, N = 500, V = 6 items, and ER = 15%. Table 6. Prevalence of K = 6, N = 200, V = 6 items, and ER = 15%.

Real World Application in the Field of Public Health
To investigate the applicability of the proposed FIT-FHV method to real-world data, we fitted fuzzy clusterwise GSCA to the Add Health data [26]. The Add Health data used in this study consist of five indicators with dichotomous responses; for detailed explanations of the data, please refer to the package gscaLCA [4]. The sample size of the data is 5114, but we also explored smaller sample sizes corresponding to those used in the simulation study. For the smaller sample sizes (250, 500, and 1000), we randomly selected observations from the full sample. For each dataset, we fitted fuzzy clusterwise GSCA with the number of clusters ranging from 2 to 8.
The fit indexes used to determine the optimal number of clusters are summarized in Table 7. The results show drops in FIT and AFIT at C = 5 regardless of sample size. Although the drops were less pronounced at the full sample size of 5114 than in the smaller samples, they were still noticeable. In addition, the FHV values were smallest at C = 4 for all four sample sizes when only C ≤ 4 was considered on the basis of the drops. Applying the FIT-FHV method to these results, we conclude that C = 4 is the optimal number of clusters for fuzzy clusterwise GSCA in this dataset, regardless of sample size. Moreover, the empirical results showed that MPC, NCE, NPE, CLVI, and FS had the same monotonic tendency as in the simulation; that is, the empirical data analysis confirmed that the proposed FIT-FHV method performs well. Table 8 presents the prevalence results of the empirical analysis. The proportions of each cluster were broadly similar across sample sizes for each number of clusters, although there was some variability, observed mostly at the sample size of 250. At C = 4, the number of clusters selected by the FIT-FHV method, the prevalence ranges excluding the sample size of 250 were 29.59% to 32.93%, 26.69% to 28.48%, 19.13% to 20.00%, and 18.83% to 19.58% for the four clusters, respectively; with the sample size of 250, the prevalences were 37.65%, 23.08%, 21.24%, and 17.81%. The differences across sample sizes are attributable either to randomness or to the instability of small samples, consistent with the simulation results.

Discussion and Conclusions
In this simulation study, we found that no individual cluster validation index performed well against its own criterion, and some indexes, including FS, still showed monotonicity. These results are consistent with the studies [6,15]. Among the eight indexes compared, FHV performed best, although some higher solutions (i.e., solutions with more clusters than the true number) had lower FHV values. Because fuzzy clusterwise GSCA is a complex model combining fuzzy clustering and GSCA, it would be optimistic to expect a single index to outperform all others. Accordingly, we focused on finding a holistic criterion that identifies the true number of clusters using a combination of indexes. This strategy led us to the FIT-FHV method, which identified the true number of clusters in most of the 162 simulation conditions. The FIT-FHV method proceeds as follows:
• Step 1: Find a drop point in FIT and AFIT. The last point at the higher level marks the maximum of the range in which the true number of clusters is located.
• Step 2: Find the smallest FHV within the range found in Step 1; this gives the optimal number of clusters.
• Step 3: Explore the prevalence distribution of the clusters and confirm that none of the prevalence rates is too low.
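The three steps above can be sketched as a simple selection routine. The drop threshold (0.1) and the minimum prevalence (5%) are hypothetical tuning values chosen for illustration, not thresholds prescribed by the study:

```python
def fit_fhv_select(C_values, FIT, FHV, prevalences, drop=0.1, min_prev=0.05):
    """Sketch of the three-step FIT-FHV selection.
    Step 1: cap the search range at the last C before a large FIT drop.
    Step 2: pick the C with the smallest FHV within that range.
    Step 3: flag the solution if any cluster prevalence is too low.
    `prevalences` maps each candidate C to its cluster proportions."""
    c_max = C_values[-1]
    for i in range(1, len(C_values)):
        if FIT[i - 1] - FIT[i] > drop:        # large drop after the true K
            c_max = C_values[i - 1]
            break
    candidates = [c for c in C_values if c <= c_max]
    best = min(candidates, key=lambda c: FHV[C_values.index(c)])
    ok = all(p >= min_prev for p in prevalences[best])
    return best, ok
```

For example, with FIT values (0.80, 0.81, 0.82, 0.60, 0.55) over C = 2, ..., 6, the drop after C = 4 caps the range at 4, and the smallest FHV within C ≤ 4 is selected.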

Limitation
The finding of this study, the FIT-FHV method, is not necessarily generalizable to fuzzy clustering or to GSCA in general. Even within fuzzy clusterwise GSCA, it should be noted that this simulation study assumed m = 2. Another limitation is that the data distributions generated in this simulation study account only for variation through the error rates of 5%, 10%, and 15%; as [15] pointed out, the specific data distribution can affect the performance of the FIT-FHV method. Lastly, the observed indicators were binary, although fuzzy clusterwise GSCA also covers ordinal and continuous indicators. All of these limitations are topics for our future research aimed at making the FIT-FHV method work across all of fuzzy clusterwise GSCA.
The cluster validation problem is a common issue in fuzzy clustering in general. To address it, researchers have recently developed a variety of clustering tools, for example, Bayesian neuro-fuzzy modeling for estimating gas turbine compressor discharge temperature [27]. Others have focused on applying classical cluster quality indexes, for example, to selecting optimal features for cross-fleet analysis and fault diagnosis of industrial gas turbines [28]. This study focused on applying classical cluster quality indexes in structural equation modeling, which is widely used in the social/behavioral sciences and in medical fields. Because of the combination of fuzzy clustering and generalized structured component analysis, it should be noted that the proposed FIT-FHV method may not outperform other cluster indexes in other settings; likewise, comparing the efficiency of the FIT-FHV method with that of the other indexes used in plain fuzzy clustering is beyond our scope.

Concluding Remarks
Although fuzzy clustering and generalized structured component analysis have been widely used as statistical models in inferential statistics, fuzzy clusterwise GSCA has suffered from difficulty in identifying the optimal number of clusters. The new method proposed in this study, the FIT-FHV method, should help researchers identify the optimal number of clusters in fuzzy clusterwise GSCA. Furthermore, the FIT-FHV method could be beneficial for mixture models in general.

Conflicts of Interest:
The authors declare no conflict of interest. The funding sponsors had no role in the design of the study; in the collection, analysis, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations
The following abbreviations are used in this manuscript: