Article

A Comparison of Information Criteria in Clustering Based on Mixture of Multivariate Normal Distributions

by Serkan Akogul 1,* and Murat Erisoglu 2
1 Department of Statistics, Yildiz Technical University, Istanbul 34220, Turkey
2 Department of Statistics, Necmettin Erbakan University, Konya 42090, Turkey
* Author to whom correspondence should be addressed.
Math. Comput. Appl. 2016, 21(3), 34; https://doi.org/10.3390/mca21030034
Submission received: 3 June 2016 / Revised: 13 July 2016 / Accepted: 22 July 2016 / Published: 1 August 2016

Abstract

Clustering analysis based on a mixture of multivariate normal distributions is commonly used in the clustering of multidimensional data sets. Model selection is one of the most important problems in mixture cluster analysis based on the mixture of multivariate normal distributions; it involves determining the number of components (clusters) and selecting an appropriate covariance structure. In this study, the efficiency of the information criteria that are commonly used in model selection is examined. The effectiveness of each information criterion is assessed according to its success in selecting the number of components and an appropriate covariance matrix.

1. Introduction

Models for mixtures of distributions, first discussed by Newcomb [1] and Pearson [2], are currently very popular in clustering. Wolfe [3,4] and Day [5] proposed the multivariate normal mixture model for cluster analysis. When each component of the data set is modeled by a multivariate normal distribution, the most important problems in clustering are choosing the number of components and identifying the structure of the covariance matrix. Oliveira-Brochado and Martins [6] reviewed the information criteria used to determine the number of components in a mixture model. Although many criteria are available for this purpose, they do not always give accurate results; in particular, on real data sets with a known number of clusters, different criteria often point to different answers. In this study, criteria commonly used to determine the number of clusters, namely the Akaike Information Criterion (AIC), the corrected Akaike Information Criterion (AICc), the Bayesian Information Criterion (BIC), the Classification Likelihood Criterion (CLC), the Approximate Weight of Evidence Criterion (AWE), the Normalized Entropy Criterion (NEC), the Kullback Information Criterion (KIC), the corrected Kullback Information Criterion (KICc), and an approximation of the Kullback Information Criterion (AKICc), are compared. Their effectiveness is evaluated by their success in selecting the number of components, by their success in selecting an appropriate covariance matrix, and by the resulting classification accuracy (CA).

2. Clustering Based on Multivariate Finite Mixture Distributions

Mixture cluster analysis based on a mixture of multivariate distributions assumes that the data to be clustered come from several subgroups or clusters, each with a distinct multivariate distribution. In mixture cluster analysis, each cluster is mathematically represented by a parametric distribution, such as the multivariate normal distribution, and the entire data set is modeled by a mixture of these distributions.
Assume that there are n observations of dimension p, so that the observed random sample is expressed as y = (y_1^T, …, y_n^T)^T. The probability density function of the finite mixture distribution model is given by [7]

f(y_j; Ψ) = \sum_{i=1}^{g} π_i f_i(y_j; θ_i)    (1)

where the f_i(y_j; θ_i) are the probability density functions of the components and the π_i are the mixing proportions or weights, with 0 ≤ π_i ≤ 1 and \sum_{i=1}^{g} π_i = 1 (i = 1, …, g). The parameter vector Ψ = (π, θ) contains all of the parameters of the mixture model, where θ = (θ_1, θ_2, …, θ_g) denotes the unknown parameters of the probability density functions of the ith component (subgroup or cluster). In Equation (1), the number of components or clusters is denoted by g.
The mixture likelihood approach can be used to estimate the parameters of the mixture model. Under this approach, the density of each observation is a weighted sum of the component densities, and the clustering problem becomes one of estimating the parameters of a mixture distribution model. The mixture likelihood function is given as follows [8]

L_M(θ_1, θ_2, …, θ_g; π_1, …, π_g | y) = \prod_{j=1}^{n} \sum_{i=1}^{g} π_i f_i(y_j | θ_i)    (2)
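As a concrete illustration of Equations (1) and (2), the short sketch below (in Python with NumPy and SciPy; the toy data and parameter values are illustrative and are not taken from this study) evaluates the mixture density and the corresponding log-likelihood for a two-component bivariate normal mixture.

    import numpy as np
    from scipy.stats import multivariate_normal

    rng = np.random.default_rng(0)
    y = rng.normal(size=(100, 2))                  # toy sample, n = 100, p = 2

    pi = np.array([0.4, 0.6])                      # mixing proportions, sum to 1
    mus = [np.zeros(2), np.array([3.0, 3.0])]      # component mean vectors
    covs = [np.eye(2), 2.0 * np.eye(2)]            # component covariance matrices

    # Equation (1): f(y_j; Psi) = sum_i pi_i f_i(y_j; theta_i), one value per observation
    mix_density = sum(p * multivariate_normal.pdf(y, mean=m, cov=c)
                      for p, m, c in zip(pi, mus, covs))

    # Logarithm of Equation (2): log L_M = sum_j log f(y_j; Psi)
    log_lik = np.log(mix_density).sum()
    print(log_lik)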
The most widely used approach for parameter estimation is the Expectation-Maximization (EM) algorithm [9].
In the EM framework, the data y = (y_1^T, y_2^T, …, y_n^T)^T are regarded as incomplete because their associated component-label vectors z_1, z_2, …, z_n are not observed. The component-label variables z_{ij} are therefore introduced, where z_{ij} is defined to be one or zero according to whether y_j did or did not arise from the ith component of the mixture model (i = 1, 2, …, g; j = 1, 2, …, n). The complete-data vector is represented as

y_c = (y^T, z^T)^T    (3)

where

z = (z_1^T, z_2^T, …, z_n^T)^T    (4)

is the unobservable vector of component-indicator variables. The log-likelihood function for the complete data is

\log L_c(Ψ) = \sum_{i=1}^{g} \sum_{j=1}^{n} z_{ij} [\log π_i + \log f_i(y_j; θ_i)]    (5)
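If the component labels z_{ij} were actually observed, the complete-data log-likelihood of Equation (5) could be evaluated directly. The following sketch does this for a toy indicator matrix; the data, labels, and parameter values are illustrative only.

    import numpy as np
    from scipy.stats import multivariate_normal

    rng = np.random.default_rng(1)
    n, g = 100, 2
    y = rng.normal(size=(n, 2))                    # toy observations
    z = np.zeros((g, n))                           # component-indicator matrix z_ij
    z[rng.integers(0, g, size=n), np.arange(n)] = 1.0

    pi = np.array([0.5, 0.5])
    mus = [np.zeros(2), np.ones(2)]
    covs = [np.eye(2), np.eye(2)]

    # Equation (5): log L_c = sum_i sum_j z_ij [log pi_i + log f_i(y_j; theta_i)]
    log_lc = sum((z[i] * (np.log(pi[i]) +
                          multivariate_normal.logpdf(y, mean=mus[i], cov=covs[i]))).sum()
                 for i in range(g))
    print(log_lc)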

3. EM Algorithm

The EM algorithm is applied to this problem by treating the z_{ij} as missing data. In this section, the E and M steps of the EM algorithm are described for the mixture distribution model [7].
E step: Because the complete-data log-likelihood is linear in the unobserved labels z_{ij}, the E step reduces to computing, given the observed data y, the conditional expectations of the random variables Z_{ij} corresponding to the z_{ij}. An initial value Ψ^{(0)} is assigned to the parameter vector Ψ. In the first iteration of the EM algorithm, the E step computes the conditional expectation of \log L_c(Ψ) given y, using the initial value Ψ^{(0)}:

Q(Ψ; Ψ^{(0)}) = E_{Ψ^{(0)}}\{\log L_c(Ψ) | y\}    (6)

In the (k + 1)th iteration, the E step requires Q(Ψ; Ψ^{(k)}), where Ψ^{(k)} is the value of Ψ obtained in the kth iteration. In the E step of the (k + 1)th iteration, the following quantity is calculated for i = 1, 2, …, g and j = 1, 2, …, n:

E_{Ψ^{(k)}}(Z_{ij} | y) = pr_{Ψ^{(k)}}\{Z_{ij} = 1 | y\} = τ_i(y_j; Ψ^{(k)}) = \frac{π_i^{(k)} f_i(y_j; θ_i^{(k)})}{\sum_{m=1}^{g} π_m^{(k)} f_m(y_j; θ_m^{(k)})}    (7)
Here, τ_i(y_j; Ψ^{(k)}) is the posterior probability that observation y_j belongs to component i (its membership probability). Using the expression in Equation (7), the conditional expectation, given y, of the complete-data log-likelihood in Equation (5) can be calculated as

Q(Ψ; Ψ^{(k)}) = \sum_{i=1}^{g} \sum_{j=1}^{n} τ_i(y_j; Ψ^{(k)}) \{\log π_i + \log f_i(y_j; θ_i)\}    (8)
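Computationally, the E step amounts to evaluating the responsibility matrix of Equation (7) and, if desired, the value of Q in Equation (8). The sketch below assumes normal component densities (as in Section 4); the function name e_step and the array layout (τ stored as a g × n matrix) are illustrative choices, not notation from the paper.

    import numpy as np
    from scipy.stats import multivariate_normal

    def e_step(y, pi, mus, covs):
        """Responsibilities tau (g x n) from Equation (7) and Q from Equation (8)."""
        # weighted component densities pi_i f_i(y_j; theta_i), shape (g, n)
        weighted = np.array([p * multivariate_normal.pdf(y, mean=m, cov=c)
                             for p, m, c in zip(pi, mus, covs)])
        tau = weighted / weighted.sum(axis=0)               # Equation (7)
        log_terms = np.array([np.log(p) + multivariate_normal.logpdf(y, mean=m, cov=c)
                              for p, m, c in zip(pi, mus, covs)])
        q = (tau * log_terms).sum()                          # Equation (8)
        return tau, q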
M step: In the (k + 1)th iteration of the EM algorithm, the estimate Ψ^{(k+1)} of Ψ, defined in the parameter space Ω, is obtained as the value that maximizes Q(Ψ; Ψ^{(k)}). In the finite mixture model, the current estimate π_i^{(k+1)} of π_i is computed independently of the update of ξ, the vector of unknown parameters of the component densities.
If the z_{ij} were observed, the maximum likelihood estimate of π_i for the complete data would be

\hat{π}_i = \frac{\sum_{j=1}^{n} z_{ij}}{n},  (i = 1, 2, …, g)    (9)

In the E step, the complete-data log-likelihood is formed by replacing z_{ij} with τ_i(y_j; Ψ^{(k)}). Similarly, when the current estimate π_i^{(k+1)} of π_i is calculated, τ_i(y_j; Ψ^{(k)}) is used instead of z_{ij} in Equation (9), as shown below:

π_i^{(k+1)} = \frac{\sum_{j=1}^{n} τ_i(y_j; Ψ^{(k)})}{n},  (i = 1, 2, …, g)    (10)

In the M step of the (k + 1)th iteration of the EM algorithm, the current value ξ^{(k+1)} of ξ is obtained as the solution of

\sum_{i=1}^{g} \sum_{j=1}^{n} τ_i(y_j; Ψ^{(k)}) \frac{∂ \log f_i(y_j; θ_i)}{∂ξ} = 0    (11)
The E and M steps are repeated until the convergence criterion of the EM algorithm is satisfied. As a convenient stopping rule, the algorithm is terminated when the difference L(Ψ^{(k+1)}) - L(Ψ^{(k)}) is sufficiently small or stable.
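Putting the two steps together, one possible organization of the iteration and of the stopping rule described above is sketched below; e_step and m_step are assumed helper functions implementing Equation (7) and, for the normal mixture, Equations (15)-(17) of the next section, and the tolerance value is an arbitrary choice.

    import numpy as np
    from scipy.stats import multivariate_normal

    def log_likelihood(y, pi, mus, covs):
        """Observed-data log-likelihood of the normal mixture (logarithm of Equation (2))."""
        dens = sum(p * multivariate_normal.pdf(y, mean=m, cov=c)
                   for p, m, c in zip(pi, mus, covs))
        return np.log(dens).sum()

    def em(y, pi, mus, covs, e_step, m_step, tol=1e-6, max_iter=500):
        prev = -np.inf
        for _ in range(max_iter):
            tau, _ = e_step(y, pi, mus, covs)      # E step, Equations (7)-(8)
            pi, mus, covs = m_step(y, tau)         # M step, Equations (15)-(17)
            curr = log_likelihood(y, pi, mus, covs)
            if curr - prev < tol:                  # stop when L(Psi^(k+1)) - L(Psi^(k)) is small
                break
            prev = curr
        return pi, mus, covs, curr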

4. The Mixture of Multivariate Normal Distribution

The mixture density function of the multivariate normal distribution is given by [7]

f(y_j; Ψ) = \sum_{i=1}^{g} π_i Φ_i(y_j; μ_i, Σ_i)    (12)

where Φ_i(y_j; μ_i, Σ_i) is the multivariate normal density function, such that

Φ_i(y_j; μ_i, Σ_i) = (2π)^{-p/2} |Σ_i|^{-1/2} \exp\{-\frac{1}{2}(y_j - μ_i)^T Σ_i^{-1} (y_j - μ_i)\}    (13)

Here, μ_i is the mean vector and Σ_i is the covariance matrix, for i = 1, 2, …, g and j = 1, 2, …, n. In this case, all unknown parameters of the model are written as Ψ = (π_1, …, π_{g-1}, ξ^T)^T, where ξ consists of the component mean vectors μ = (μ_1, μ_2, …, μ_g) and the component covariance matrices Σ = (Σ_1, Σ_2, …, Σ_g) of the component probability density functions in the mixture model. The posterior probability is given as

τ_i(y_j; Ψ) = \frac{π_i Φ(y_j; μ_i, Σ_i)}{\sum_{h=1}^{g} π_h Φ(y_j; μ_h, Σ_h)},  i = 1, 2, …, g and j = 1, 2, …, n    (14)
The updated maximum likelihood estimates of the mixing proportions π_i and the mean vectors μ_i in the M step of the (k + 1)th iteration are calculated, respectively, as

π_i^{(k+1)} = \frac{\sum_{j=1}^{n} τ_i(y_j; Ψ^{(k)})}{n}    (15)

μ_i^{(k+1)} = \frac{\sum_{j=1}^{n} τ_i(y_j; Ψ^{(k)}) y_j}{\sum_{j=1}^{n} τ_i(y_j; Ψ^{(k)})}    (16)

The current estimates of the covariance matrices Σ_i of the component densities are calculated via the following formula

Σ_i^{(k+1)} = \frac{\sum_{j=1}^{n} τ_i(y_j; Ψ^{(k)}) (y_j - μ_i^{(k+1)})(y_j - μ_i^{(k+1)})^T}{\sum_{j=1}^{n} τ_i(y_j; Ψ^{(k)})},  (i = 1, 2, …, g)    (17)
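A minimal sketch of these M-step updates is given below; it assumes the responsibility matrix τ from the E step is stored as a g × n array, and the function and variable names are illustrative.

    import numpy as np

    def m_step(y, tau):
        """Updates of Equations (15)-(17); tau has shape (g, n), y has shape (n, p)."""
        n = y.shape[0]
        nk = tau.sum(axis=1)                       # sum_j tau_i(y_j; Psi^(k)), one value per component
        pi = nk / n                                # Equation (15)
        mus = (tau @ y) / nk[:, None]              # Equation (16)
        covs = []
        for i in range(tau.shape[0]):
            d = y - mus[i]                         # observations centered at the new mean
            covs.append((tau[i][:, None] * d).T @ d / nk[i])   # Equation (17)
        return pi, mus, covs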

5. Information Criteria for Model Selection in Model Based Clustering

Model selection is one of the most important problems in mixture cluster analysis based on the mixture of multivariate normal distributions. It includes the determination of the number of components (clusters) and the selection of an appropriate covariance structure. Information criteria are often used for model selection in mixture cluster analysis. In the literature, information criteria are usually written as minus twice the maximized log-likelihood plus a penalty term,

-2 \log L(\hat{Ψ}) + 2C    (18)

Here, the first term measures the lack of fit, and the second term C is a measure of model complexity, usually called the penalty term. The model that minimizes -2 \log L(\hat{Ψ}) + 2C is selected as the best model. Some commonly used information criteria from the literature are given below [6,7,10]; a computational sketch that evaluates several of them follows the list:
  • If the number of parameters in the model is denoted by d, Akaike's Information Criterion (AIC) is defined as
    AIC = -2 \log L(\hat{Ψ}) + 2d    (19)
    The model that gives the minimum AIC score can be selected as the best model [11].
  • When d is large relative to the sample size n (which includes the case where n is small, for any d), a small-sample corrected version, AICc, is used. AICc is defined as
    AICc = -2 \log L(\hat{Ψ}) + \frac{2dn}{n - d - 1}    (20)
    The model that yields the minimum AICc score can be selected as the best model [12].
  • If d is the number of parameters in the mixture model and n is the number of observations, the Bayesian Information Criterion (BIC) is calculated as
    BIC = -2 \log L(\hat{Ψ}) + d \log(n)    (21)
    The model that gives the minimum BIC score can be selected as the best model [13].
  • Hathaway [14] formulated the mixture log-likelihood as
    \log L(Ψ) = \log L_c(Ψ) + EN(τ)    (22)
    where EN(τ) is defined as
    EN(τ) = -\sum_{i=1}^{g} \sum_{j=1}^{n} τ_{ij} \log τ_{ij}    (23)
    and is the entropy of the fuzzy classification matrix C = (τ_{ij}).
    The Classification Likelihood Criterion (CLC) is defined as
    CLC = -2 \log L(\hat{Ψ}) + 2 EN(\hat{τ})    (24)
    The model that gives the minimum CLC score can be selected as the best model [15].
  • The Approximate Weight of Evidence (AWE) criterion is expressed as
    AWE = -2 \log L_c + 2d(3/2 + \log n)    (25)
    The model that gives the minimum AWE score can be selected as the best model [16].
  • The Normalized Entropy Criterion (NEC) is given by [17]
    NEC_g = \frac{EN(\hat{τ})}{\log L(\hat{Ψ}) - \log L(\hat{Ψ}^*)}    (26)
    Here, \hat{Ψ}^* is the maximum likelihood estimator of Ψ for g = 1. The number of components g that gives the minimum NEC is selected as the number of clusters. When g = 1, the entropy is zero, so the criterion is not directly defined there; Biernacki et al. [18] therefore suggested selecting the number of components g > 1 that minimizes NEC, provided this minimum is smaller than 1, and preferring a single component otherwise.
  • Cavanaugh [19] proposed the Kullback Information Criterion (KIC) as an asymptotically unbiased criterion based on Kullback's symmetric divergence. KIC is defined as
    KIC = -2 \log L(\hat{Ψ}) + 3(d + 1)    (27)
  • The bias-corrected Kullback Information Criterion (KICc) and an approximation of the Kullback Information Criterion (AKICc) are given below [20,21]
    KICc = -2 \log L(\hat{Ψ}) + \frac{2(d + 1)n}{n - d - 2} - n ψ((n - d)/2) + n \log(n/2)    (28)
    AKICc = -2 \log L(\hat{Ψ}) + \frac{(d + 1)(3n - d - 2)}{n - d - 2} + \frac{d}{n - d}    (29)
    Here, d is the number of parameters in the model, n is the sample size, and ψ(·) is the digamma (psi) function.
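As an illustration, the sketch below evaluates several of these criteria from quantities produced by an EM fit: the maximized log-likelihood, the number of free parameters d, the sample size n, and the entropy EN(τ̂) of the posterior matrix. The function name and the dictionary layout are illustrative; NEC is omitted because it additionally requires the log-likelihood of the one-component fit (Equation (26)), and the AWE line uses log L_c = log L - EN(τ̂), which follows from Equation (22).

    import numpy as np
    from scipy.special import digamma

    def information_criteria(log_lik, d, n, entropy):
        """Criteria of Equations (19)-(21), (24)-(25) and (27)-(29)."""
        return {
            "AIC":   -2 * log_lik + 2 * d,                                  # Equation (19)
            "AICc":  -2 * log_lik + 2 * d * n / (n - d - 1),                # Equation (20)
            "BIC":   -2 * log_lik + d * np.log(n),                          # Equation (21)
            "CLC":   -2 * log_lik + 2 * entropy,                            # Equation (24)
            "AWE":   -2 * (log_lik - entropy) + 2 * d * (1.5 + np.log(n)),  # Equation (25)
            "KIC":   -2 * log_lik + 3 * (d + 1),                            # Equation (27)
            "KICc":  (-2 * log_lik + 2 * (d + 1) * n / (n - d - 2)
                      - n * digamma((n - d) / 2) + n * np.log(n / 2)),      # Equation (28)
            "AKICc": (-2 * log_lik + (d + 1) * (3 * n - d - 2) / (n - d - 2)
                      + d / (n - d)),                                       # Equation (29)
        }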

6. Application and Results

In this section, the performances of the information criteria used to determine the number of clusters are compared. Moreover, the efficiency of different types of covariance matrices is investigated in model-based clustering. The comparison of the information criteria is performed in two different settings. First, commonly used real data sets are used. Second, synthetic data sets generated from the properties of these real data sets are used for the comparison.
The properties of real data sets are given in Table 1. Moreover, the computed information criteria for each different data set are provided in Table 2, Table 3, Table 4, Table 5, Table 6, Table 7 and Table 8.
The appropriate number of clusters is determined as the value of g that minimizes the information criterion. According to Table 2, the number of clusters of the Liver Disorders data set is correctly determined by AWE, BIC, KICc, and NEC. In Table 3, AICc and KIC accurately determine the number of clusters of the Iris data set. The number of clusters of the Wine data set is correctly determined by AIC and KIC in Table 4.
According to Table 5, the number of clusters of the Ruspini [23] data set is correctly determined via AICc, AKICc, BIC, and KICc. In Table 6, the number of clusters of the Vehicle Silhouettes data set is correctly determined by AIC, BIC, CLC, and KIC.
According to Table 7, the number of clusters for the Landsat Satellite data set is correctly determined via AIC and KIC. In Table 8, the number of clusters for the Image Segmentation data set is correctly determined by all information criteria.
In Table 2, Table 3, Table 4, Table 5, Table 6, Table 7 and Table 8, the performance of each information criterion varies from data set to data set. In order to draw more general conclusions, a simulation study is carried out. Synthetic data sets are generated using the properties of each real data set; 1000 data sets are generated for each of the Liver, Iris, Wine, Ruspini, Vehicle, Landsat, and Image data sets. The accuracy with which each information criterion determines the number of clusters is then computed. The results are given in Table 9 and Figure 1. According to the simulation results, the best results are obtained with KIC.
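The paper does not list the full simulation code, so the sketch below is only a hypothetical illustration of this kind of experiment: synthetic samples are drawn from a normal mixture with given parameters (for example, a mixture fitted to one of the real data sets), a user-supplied fit_and_score function (assumed to fit a g-component mixture by EM and return the value of one information criterion) is evaluated over a range of candidate g, and the percentage of replications in which the true number of clusters attains the minimum is recorded. The candidate range and the helper names are arbitrary choices.

    import numpy as np

    def generate_synthetic(pi, mus, covs, n, rng):
        """Draw n observations from a normal mixture with the given parameters."""
        labels = rng.choice(len(pi), size=n, p=pi)
        return np.array([rng.multivariate_normal(mus[i], covs[i]) for i in labels])

    def selection_accuracy(true_g, fit_and_score, pi, mus, covs, n, n_rep=1000, seed=0):
        """Percentage of replications in which the criterion selects the true g."""
        rng = np.random.default_rng(seed)
        hits = 0
        for _ in range(n_rep):
            y = generate_synthetic(pi, mus, covs, n, rng)
            scores = {g: fit_and_score(y, g) for g in range(2, true_g + 3)}
            hits += int(min(scores, key=scores.get) == true_g)
        return 100.0 * hits / n_rep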
The efficiency of different types of covariance structures in mixture clustering based on a mixture of multivariate normal distributions is investigated.
According to the number of clusters regarding each data set, classification accuracy and information criteria are computed for each covariance structure. The results are given in Table 10.
According to Table 10, the Type III structure (a separate covariance matrix Σ_k for each subgroup) generally performs best, both in terms of correct classification and in terms of the minimum information criterion values. The classification accuracy of mixture clustering based on a mixture of multivariate normal distributions for each covariance type is given in Figure 2.
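The four covariance structures of Table 10 can be read as constraints on the M-step update of Equation (17). The sketch below shows one possible interpretation of the table footnotes (Type I: a common covariance matrix estimated from the whole data set; Type II: its diagonal only; Type III: a full covariance matrix per subgroup; Type IV: the diagonal of each subgroup covariance); this is an illustrative reading, not the authors' implementation.

    import numpy as np

    def constrain_covariances(y, covs, cov_type):
        """Apply one reading of the Type I-IV structures to the per-subgroup
        covariances produced by Equation (17)."""
        if cov_type == "III":                      # covariance matrix of each subgroup
            return covs
        if cov_type == "IV":                       # variances of each subgroup only
            return [np.diag(np.diag(c)) for c in covs]
        data_cov = np.cov(y, rowvar=False)         # covariance of the whole data set
        if cov_type == "I":                        # common covariance for all subgroups
            return [data_cov for _ in covs]
        if cov_type == "II":                       # common variances for all subgroups
            return [np.diag(np.diag(data_cov)) for _ in covs]
        raise ValueError("cov_type must be one of 'I', 'II', 'III', 'IV'")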

7. Conclusions

In this study, we compared the effectiveness of information criteria in clustering analysis based on the mixture of multivariate normal distributions. According to the simulation study, KIC gave better results than the other information criteria in determining the number of clusters in mixture clustering based on a mixture of multivariate normal distributions. The efficiency of different types of covariance matrices in model-based clustering was also investigated. The best results were obtained by using a separate covariance matrix for each subgroup (Type III).

Acknowledgments

This research has been supported by the TUBITAK-BIDEB (2211) Ph.D. scholarship program. The author is grateful to the editors and the anonymous referees for their constructive comments and valuable suggestions, which have greatly helped to improve the paper.

Author Contributions

All authors have equally contributed to this paper. They have read and approved the final version of the manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Newcomb, S. A generalized theory of the combination of observations so as to obtain the best result. Am. J. Math. 1886, 8, 343–366.
  2. Pearson, K. Contributions to the mathematical theory of evolution. Philos. Trans. R. Soc. Lond. A 1894, 185, 71–110.
  3. Wolfe, J.H. A Computer Program for the Maximum Likelihood Analysis of Types; U.S. Naval Personnel Research Activity: San Diego, CA, USA, 1965.
  4. Wolfe, J.H. Normix: Computational Methods for Estimating the Parameters of Multivariate Normal Mixtures of Distributions; U.S. Naval Personnel Research Activity: San Diego, CA, USA, 1967.
  5. Day, N.E. Estimating the components of a mixture of normal distributions. Biometrika 1969, 56, 463–474.
  6. Oliveira-Brochado, A.; Martins, F.V. Assessing the Number of Components in Mixture Models: A Review; Universidade do Porto, Faculdade de Economia do Porto: Porto, Portugal, 2005.
  7. McLachlan, G.; Peel, D. Finite Mixture Models; John Wiley & Sons, Inc.: New York, NY, USA, 2000.
  8. Fraley, C. Algorithms for model-based Gaussian hierarchical clustering. SIAM J. Sci. Comput. 1998, 20, 270–281.
  9. Dempster, A.P.; Laird, N.M.; Rubin, D.B. Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B 1977, 39, 1–38.
  10. Servi, T. Multivariate Mixture Distribution Model Based Cluster Analysis. Ph.D. Thesis, University of Cukurova, Adana, Turkey, 2009.
  11. Akaike, H. Information theory and an extension of the maximum likelihood principle. In Second International Symposium on Information Theory; Petrov, B.N., Csaki, F., Eds.; Akademiai Kiado: Budapest, Hungary, 1973; pp. 267–281.
  12. Hurvich, C.M.; Tsai, C.L. Regression and time series model selection in small samples. Biometrika 1989, 76, 297–307.
  13. Schwarz, G. Estimating the dimension of a model. Ann. Stat. 1978, 6, 461–464.
  14. Hathaway, R.J. Another interpretation of the EM algorithm for mixture distributions. Stat. Probab. Lett. 1986, 4, 53–56.
  15. Biernacki, C.; Govaert, G. Using the classification likelihood to choose the number of clusters. Comput. Sci. Stat. 1997, 29, 451–457.
  16. Banfield, J.D.; Raftery, A.E. Model-based Gaussian and non-Gaussian clustering. Biometrics 1993, 49, 803–821.
  17. Celeux, G.; Soromenho, G. An entropy criterion for assessing the number of clusters in a mixture. J. Classif. 1996, 13, 195–212.
  18. Biernacki, C.; Celeux, G.; Govaert, G. An improvement of the NEC criterion for assessing the number of clusters in a mixture model. Pattern Recognit. Lett. 1999, 20, 267–272.
  19. Cavanaugh, J.E. A large-sample model selection criterion based on Kullback's symmetric divergence. Stat. Probab. Lett. 1999, 42, 333–343.
  20. Seghouane, A.-K.; Maiza, B. A small sample model selection criterion based on Kullback's symmetric divergence. IEEE Trans. Signal Process. 2004, 52, 3314–3323.
  21. Seghouane, A.-K.; Bekara, M.; Fleury, G. A criterion for model selection in the presence of incomplete data based on Kullback's symmetric divergence. Signal Process. 2005, 85, 1405–1417.
  22. UCI Machine Learning Repository. Available online: http://archive.ics.uci.edu/ml (accessed on 6 May 2016).
  23. Ruspini, E.H. Numerical methods for fuzzy clustering. Inf. Sci. 1970, 2, 319–350.
Figure 1. According to synthetic data sets, (a) the percentage of success for the determination of the number of clusters from the best six information criteria and (b) the average of success in determining the number of clusters from the information criteria.
Figure 2. The efficiency of different covariance types in the mixture clustering, according to the classification accuracy.
Table 1. Descriptions of real data sets.
Data Sets * | Sample Size (n) | Number of Variables (p) | Number of Clusters (g)
Liver Disorders | 345 | 6 | 2
Iris | 150 | 4 | 3
Wine | 178 | 13 | 3
Ruspini | 75 | 2 | 4
Vehicle Silhouettes | 846 | 18 | 4
Landsat Satellite | 6435 | 36 | 6
Image Segmentation | 2310 | 19 | 7
Note: * Data sets are taken from the website of UCI Machine Learning Repository [22].
Table 2. Information criteria results in the determination of the number of clusters for the Liver Disorders data set.
Liver Disorders Data Set
g | 2 * | 3 | 4 | 5
AIC | 14,752.43 | 14,693.75 | 14,605.72 * | 14,643.68
AICc | 14,773.75 | 14,747.17 | 14,712.44 * | 14,833.53
AKICc | 14,832.79 | 14,834.99 | 14,829.30 * | 14,979.88
AWE | 15,503.68 * | 15,865.24 | 16,176.27 | 16,582.91
BIC | 14,963.83 * | 15,012.76 | 15,032.36 | 15,177.93
CLC | 14,695.89 | 14,646.21 | 14,546.00 | 14,541.41 *
KIC | 14,810.43 | 14,779.75 | 14,719.72 * | 14,785.68
KICc | 14,837.70 * | 14,846.93 | 14,852.24 | 15,018.79
NEC | 0.06956 * | 0.13412 | 0.15797 | 0.16811
Note: * True value of g or value of g given by criterion. AIC: Akaike information criterion; AICc: corrected Akaike information criterion; AKICc: approximation of Kullback information criterion; AWE: approximate weight of evidence criterion; BIC: Bayesian information criterion; CLC: classification likelihood criterion; KIC: Kullback information criterion; KICc: corrected Kullback information criterion; NEC: normalized entropy criterion.
Table 3. Information criteria results in the determination of the number of clusters for the Iris data set.
Iris Data Set
g | 2 | 3 * | 4 | 5
AIC | 487.11 | 449.15 | 448.86 * | 474.12
AICc | 501.61 | 486.86 * | 527.53 | 622.12
AKICc | 534.98 * | 536.37 | 593.75 | 706.14
AWE | 806.74 * | 944.15 | 1126.55 | 1378.81
BIC | 574.42 * | 581.61 | 626.49 | 696.90
CLC | 429.12 | 371.21 | 358.29 * | 415.24
KIC | 519.11 | 496.15 * | 510.86 | 551.12
KICc | 538.21 * | 544.45 | 609.73 | 734.14
NEC | 0.00003 * | 0.02524 | 0.06393 | 0.20542
Note: * True value of g or value of g given by criterion.
Table 4. Information criteria results in the determination of the number of clusters for the Wine data set.
Wine Data Set
g | 2 | 3 * | 4 | 5
AIC | 6446.14 | 6255.42 * | 6258.49 | 6382.48
AICc | 3703.01 * | 4811.48 | 4804.11 | 4796.89
AKICc | 3965.94 * | 5127.50 | 5223.44 | 5320.90
AWE | 8821.75 * | 9827.04 | 11,021.22 | 12,345.63
BIC | 7111.13 * | 7254.50 | 7591.66 | 8049.74
CLC | 6028.77 | 5630.88 | 5421.88 | 5343.12 *
KIC | 6658.14 | 6572.42 * | 6680.49 | 6909.48
KICc | - | - | - | -
NEC | 0.00099 * | 0.00334 | 0.00112 | 0.00650
Note: * True value of g or value of g given by criterion. KICc could not be calculated because d is greater than n .
Table 5. Information criteria results in the determination of the number of clusters for the Ruspini data set.
Ruspini Data Set
g | 2 | 3 | 4 * | 5
AIC | 1409.23 | 1369.92 | 1329.89 | 1322.48 *
AICc | 1413.42 | 1380.66 | 1351.54 * | 1361.15
AKICc | 1428.43 | 1402.43 | 1380.33 * | 1397.39
AWE | 1515.21 * | 1534.05 | 1552.26 | 1602.17
BIC | 1434.72 | 1409.32 | 1383.19 * | 1389.69
CLC | 1387.23 | 1336.26 | 1284.66 | 1264.76 *
KIC | 1423.23 | 1389.92 | 1355.89 | 1354.48 *
KICc | 1429.34 | 1404.71 | 1384.81 * | 1405.06
NEC | 0.00001 * | 0.00184 | 0.00328 | 0.00108
Note: * True value of g or value of g given by criterion.
Table 6. Information criteria results in the determination of the number of clusters for the Vehicle Silhouettes data set.
Vehicle Silhouettes Data Set
g | 2 | 3 | 4 * | 5
AIC | 80,747.84 | 78,171.46 | 76,987.29 * | 77,496.43
AICc | 81,365.96 | 80,521.67 | 90,402.17 | 60,158.93 *
AKICc | 81,753.37 | 81,112.56 | 91,366.48 | 61,230.64 *
AWE | 86,249.13 * | 86,431.40 | 88,003.06 | 91,282.76
BIC | 82,544.50 | 80,868.81 | 80,585.34 * | 81,995.18
CLC | 80,002.82 | 77,053.69 | 75,493.95 * | 75,642.25
KIC | 81,129.84 | 78,743.46 | 77,749.29 * | 78,448.43
KICc | 81,877.05 | 81,488.13 * | 92,531.84 | -
NEC | 0.00204 * | 0.00217 | 0.00227 | 0.00408
Note: * True value of g or value of g given by criterion. KICc ( g = 5 ) could not be calculated because d is greater than n .
Table 7. Information criteria results in the determination of the number of clusters for the Landsat Satellite data set.
Landsat Satellite Data Set
g | 5 | 6 * | 7 | 8
AIC | 1,255,771.79 | 1,251,929.66 * | 1,253,394.33 | 1,252,217.99
AICc | 1,264,231.87 * | 1,267,975.95 | 1,285,377.58 | 1,330,205.05
AKICc | 1,267,757.79 * | 1,272,212.70 | 1,290,337.97 | 1,335,962.02
AWE | 1,321,382.18 | 1,330,827.27 | 1,345,663.27 | 1,357,765.53
BIC | 1,279,559.84 * | 1,280,476.68 | 1,286,700.30 | 1,290,282.93
CLC | 1,249,208.09 | 1,244,214.25 | 1,244,611.32 | 1,242,274.65 *
KIC | 1,259,288.79 | 1,256,149.66 * | 1,258,317.33 | 1,257,843.99
KICc | 1,269,326.31 * | 1,274,849.92 | 1,294,725.15 | 1,343,659.52
NEC | 0.00447 | 0.00659 | 0.00969 | 0.01167
Note: * True value of g or value of g given by criterion. AWE and NEC have found g = 2 .
Table 8. Information criteria results in the determination of the number of clusters for the Image Segmentation data set.
Image Segmentation Data Set
g | 5 | 6 | 7 * | 8
AIC | 67,666.74 | 64,307.94 | 59,502.75 * | 61,395.91
AICc | 68,423.32 | 65,517.43 | 61,352.58 * | 64,152.73
AKICc | 69,193.28 | 66,441.60 | 62,431.40 * | 65,386.90
AWE | 80,315.60 | 79,480.18 | 77,188.94 * | 81,663.54
BIC | 72,055.92 | 69,576.11 | 65,649.90 * | 68,422.05
CLC | 66,189.23 | 62,524.84 | 57,404.63 * | 59,050.26
KIC | 68,433.74 | 65,227.94 | 60,575.75 * | 62,621.91
KICc | 69,356.93 | 66,692.97 | 62,798.53 * | 65,905.24
NEC | 0.00066 | 0.00063 | 0.00049 * | 0.00119
Note: * True value of g or value of g given by criterion.
Table 9. The accuracy of determining the cluster numbers from the information criteria according to synthetic data sets.
Synthetic Data Sets | AIC | AICc | AKICc | AWE | BIC | CLC | KIC | KICc | NEC
Data1 (Generated from Liver) | 46.8 | 97.9 | 98.9 | 98.6 | 99.1 | 52.5 | 95.8 | 98.9 | 91.3
Data2 (Generated from Iris) | 42 | 66.9 | 50.9 | 1.4 | 42.4 | 4.7 | 68.3 | 44.1 | 2.3
Data3 (Generated from Wine) | 48.9 | 0 | 0 | 0 | 39.1 | 0.8 | 64 | 0 | 34.1
Data4 (Generated from Ruspini) | 35.4 | 56.8 | 58.1 | 13.4 | 54 | 21.7 | 45.4 | 58.4 | 19.3
Data5 (Generated from Vehicle) | 58 | 0 | 0 | 3.2 | 59.6 | 12 | 59 | 0 | 18.3
Data6 (Generated from Landsat) | 37.6 | 0.1 | 0 | 0 | 0.3 | 2.4 | 38 | 0 | 0.1
Data7 (Generated from Image) | 36.6 | 37.3 | 37.5 | 32.1 | 37.2 | 26.7 | 37 | 37.3 | 9.5
The average of success | 43.6 | 37 | 35.1 | 21.2 | 47.4 | 17.3 | 58.2 | 34.1 | 25
Note: The best performance is indicated in bold.
Table 10. Classification accuracy (CA) and information criteria results for real data sets, according to different types of covariance structures.
Data Sets | Covariance Types | CA | AIC | AICc | AKICc | BIC | KIC | KICc
Liver Disorders | I | 43.48 | 15,371.1 | 15,378.8 | 15,416.4 | 15,501.8 | 15,408.1 | 15,418.2
Liver Disorders | II | 43.77 | 15,605.7 | 15,608.1 | 15,630.4 | 15,678.7 | 15,627.7 | 15,630.9
Liver Disorders | III | 49.86 | 14,752.4 | 14,773.7 | 14,832.8 | 14,963.8 | 14,810.4 | 14,837.7
Liver Disorders | IV | 49.28 | 14,905.2 | 14,909.3 | 14,937.7 | 15,001.3 | 14,933.2 | 14,938.6
Iris | I | 43.33 | 799.5 | 809.1 | 837.1 | 871.7 | 826.5 | 839.3
Iris | II | 83.33 | 1326.9 | 1332.1 | 1353.9 | 1381.1 | 1347.9 | 1355.1
Iris | III | 96.67 | 449.1 | 486.9 | 536.4 | 581.6 | 496.1 | 544.5
Iris | IV | 90.00 | 666.5 | 677.9 | 708.1 | 744.8 | 695.5 | 710.7
Wine | I | 41.01 | 6851.2 | 7631.5 | 7799.2 | 7271.2 | 6986.2 | 7908.1
Wine | II | 97.19 | 7528.8 | 7662.4 | 7751.0 | 7783.3 | 7611.8 | 7777.2
Wine | III | 85.39 | 6255.4 | 4811.5 | 5127.5 | 7254.5 | 6572.4 | 4864.0
Wine | IV | 93.26 | 6757.1 | 6890.7 | 6979.3 | 7011.7 | 6840.1 | 7005.6
Ruspini | I | 26.67 | 1546.8 | 1553.8 | 1572.2 | 1579.3 | 1563.8 | 1573.7
Ruspini | II | 69.33 | 1545.0 | 1551.0 | 1568.2 | 1575.1 | 1561.0 | 1569.5
Ruspini | III | 100.00 | 1329.9 | 1351.5 | 1380.3 | 1383.2 | 1355.9 | 1384.8
Ruspini | IV | 98.67 | 1324.2 | 1338.0 | 1362.1 | 1368.2 | 1346.2 | 1365.0
Vehicle Silhouettes | I | 29.79 | 85,618.6 | 85,821.4 | 86,072.8 | 86,784.7 | 85,867.6 | 86,117.5
Vehicle Silhouettes | II | 35.70 | 111,399.9 | 111,423.2 | 111,519.8 | 111,840.8 | 111,495.9 | 111,525.4
Vehicle Silhouettes | III | 46.45 | 76,987.3 | 90,402.2 | 91,366.5 | 80,585.3 | 77,749.3 | 92,531.8
Vehicle Silhouettes | IV | 36.64 | 102,899.9 | 102,962.3 | 103,113.4 | 103,596.8 | 103,049.9 | 103,127.9
Landsat Satellite | I | 62.46 | 1,349,241.4 | 1,349,525.3 | 1,350,416.2 | 1,355,245.9 | 1,350,131.4 | 1,350,483.6
Landsat Satellite | II | 67.69 | 1,828,135.0 | 1,828,156.5 | 1,828,416.7 | 1,829,874.8 | 1,828,395.0 | 1,828,422.0
Landsat Satellite | III | 56.67 | 1,251,929.7 | 1,267,975.9 | 1,272,212.7 | 1,280,476.7 | 1,256,149.7 | 1,274,849.9
Landsat Satellite | IV | 65.87 | 1,618,062.2 | 1,618,126.1 | 1,618,566.4 | 1,621,020.5 | 1,618,502.2 | 1,618,582.0
Image Segmentation | I | 28.70 | 38,937.2 | 39,000.2 | 39,257.9 | 40,396.4 | 39,194.2 | 39,273.0
Image Segmentation | II | 66.54 | 286,194.3 | 286,210.9 | 286,348.3 | 286,964.1 | 286,331.3 | 286,352.3
Image Segmentation | III | 65.93 | 59,502.7 | 61,352.6 | 62,431.4 | 65,649.9 | 60,575.7 | 62,798.5
Image Segmentation | IV | 63.68 | 198,790.6 | 198,841.7 | 199,075.3 | 200,111.9 | 199,023.6 | 199,087.5
Type I (Σ): Covariance matrix of the data set used for clustering.
Type II (σ_i^2 I): Variance matrix of the data set used for clustering.
Type III (Σ_k): Covariance matrix of each subgroup in the data set.
Type IV (σ_ik^2 I): Variance matrix of each subgroup in the data set.
Note: The best performances are indicated in bold.
