Single-Valued Neutrosophic Clustering Algorithm Based on Tsallis Entropy Maximization

Data clustering is an important field in pattern recognition and machine learning. Fuzzy c-means is considered as a useful tool in data clustering. The neutrosophic set, which is an extension of the fuzzy set, has received extensive attention in solving many real-life problems of inaccuracy, incompleteness, inconsistency and uncertainty. In this paper, we propose a new clustering algorithm, the single-valued neutrosophic clustering algorithm, which is inspired by fuzzy c-means, picture fuzzy clustering and the single-valued neutrosophic set. A novel suitable objective function, which is depicted as a constrained minimization problem based on a single-valued neutrosophic set, is built, and the Lagrange multiplier method is used to solve the objective function. We do several experiments with some benchmark datasets, and we also apply the method to image segmentation using the Lena image. The experimental results show that the given algorithm can be considered as a promising tool for data clustering and image processing.


Introduction
Data clustering is one of the most important topics in pattern recognition, machine learning and data mining.Generally, data clustering is the task of grouping a set of objects in such a way that objects in the same group (cluster) are more similar to each other than to those in other groups (clusters).In the past few decades, many clustering algorithms have been proposed, such as k-means clustering [1], hierarchical clustering [2], spectral clustering [3], etc.The clustering technique has been used in many fields, including image analysis, bioinformatics, data compression, computer graphics and so on [4][5][6].
The k-means algorithm is one of the typical hard clustering algorithms that is widely used in real applications due to its simplicity and efficiency.Unlike hard clustering, the fuzzy c-means (FCM) algorithm [7] is one of the most popular soft clustering algorithms in which each data point belongs to a cluster to some degree that is specified by membership degrees in [0, 1], and the sum of the clusters for each of the data should be equal to one.In recent years, many improved algorithms for FCM have been proposed.There are three main ways to build the clustering algorithm.First is extensions of the traditional fuzzy sets.In this way, numerous fuzzy clustering algorithms based on the extension fuzzy sets, such as the intuitionistic fuzzy set, the Type-2 fuzzy set, etc., are built.By replacing traditional fuzzy sets with intuitionistic fuzzy set, Chaira introduced the intuitionistic fuzzy clustering (IFC) method in [8], which integrated the intuitionistic fuzzy entropy with the objective function.Hwang and Rhee suggested deploying FCM on (interval) Type-2 fuzzy set sets in [9], which aimed to design and manage uncertainty for fuzzifier m.Thong and Son proposed picture fuzzy clustering based on the picture fuzzy set (PFS) in [10].Second, the kernel-based method is applied to improve the fuzzy clustering quality.For example, Graves and Pedrycz presented a kernel version of the FCM algorithm, namely KFCM in [11].Ramathilagam et al. analyzed the Lung Cancer database by incorporating the hyper tangent kernel function [12].Third, adding regularization terms to the objective function is used to improve the clustering quality.For example, Yasuda proposed an approach to FCM based on entropy maximization in [13].Of course, we can use these together to obtain better clustering quality.
The neutrosophic set was proposed by Smarandache [14] in order to deal with real-world problems.Now, the neutrosophic set is gaining significant attention in solving many real-life problems that involve uncertainty, impreciseness, incompleteness, inconsistency and indeterminacy.A neutrosophic set has three membership functions, and each membership degree is a real standard or non-standard subset of the nonstandard unit interval ]0 − , 1 + [= 0 − ∪ [0, 1] ∪ 1 + .Wang et al. [15] introduced single-valued neutrosophic sets (SVNSs), which are an extension of intuitionistic fuzzy sets.Moreover, the three membership functions are independent, and their values belong to the unit interval [0, 1].In recent years, the studies of SVNSs have been rapidly developing.For example, Majumdar and Samanta [16] studied the similarity and entropy of SVNSs.Ye [17] proposed correlation coefficients of SVNSs and applied them to single-valued neutrosophic decision-making problems, etc. Zhang et al. in [18] proposed a new definition of the inclusion relation of neutrosophic sets (which is also called the Type-3 inclusion relation), and a new method of ranking neutrosophic sets was given.Zhang et al. in [19] studied neutrosophic duplet sets, neutrosophic duplet semi-groups and cancelable neutrosophic triplet groups.
The clustering methods by the neutrosophic set have been studied deeply.In [20], Ye proposed a single-valued neutrosophic minimum spanning tree (SVNMST) clustering algorithm, and he also introduced single-valued neutrosophic clustering methods based on similarity measures between SVNSs [21].Guo and Sengur introduced the neutrosophic c-means clustering algorithm [22], which was inspired by FCM and the neutrosophic set framework.Thong and Son did significant work on clustering based on PFS.In [10], a picture fuzzy clustering algorithm, called FC-PFS, was proposed.In order to determine the number of clusters, they built an automatically determined most suitable number of clusters based on particle swarm optimization and picture composite cardinality for a dataset [23].They also extended the picture fuzzy clustering algorithm for complex data [24].Unlike the method in [10], Son presented a novel distributed picture fuzzy clustering method on the picture fuzzy set [25].We can note that the basic ideas of the fuzzy set, the intuitionistic fuzzy set and the SVNS are consistent in the data clustering, but there are differences in the representation of the objects, so that the clustering objective functions are different.Thus, the more adequate description can be better used for clustering.Inspired by FCM, FC-PFS, SVNS and the maximization entropy method, we propose a new clustering algorithm, the single-valued neutrosophic clustering algorithm based on Tsallis entropy maximization (SVNCA-TEM), in this paper, and the experimental results show that the proposed algorithm can be considered as a promising tool for data clustering and image processing.
The rest of paper is organized as follows.Section 2 shows the related work on FCM, IFC and FC-PFS.Section 3 introduces the proposed method, using the Lagrange multiplier method to solve the objective function.In Section 4, the experiments on some benchmark UCI datasets indicate that the proposed algorithm can be considered as a useful tool for data clustering and image processing.The last section draws the conclusions.

Related Works
In general, suppose dataset D = {X 1 , X 2 , • • • , X n } includes n data points, each of the data In the following, we will briefly introduce three fuzzy clustering methods, which are FCM, IFC and FC-PFS.

Fuzzy c-Means
The FCM was proposed in 1984 [7].FCM is a data clustering technique wherein each data point belongs to a cluster to some degree that is specified by a membership grade.A data point X i of cluster C j is denoted by the term µ ij , which shows the fuzzy membership degree of the i-th data point in the j-th cluster.We use V = {V 1 , V 2 , • • • , V k } to describe the cluster centroids of the clusters, and V j ∈ R d is the cluster centroid of C j .The FCM is based on the minimization of the following objective function: where m represents the fuzzy parameter and m ≥ 1.The constraints for (1) are, Using the Lagrangian method, the iteration scheme to calculate cluster centroids V j and the fuzzy membership degrees µ ij of the objective function ( 1) is as follows: The iteration will not stop until it reaches the maximum iterations or |J (t) − J (t−1) | < , where J (t) and J (t−1) are the objection function value at (t)-th and (t − 1)-th iterations, and is a termination criterion between zero and 0.1.This procedure converges to a local minimum or a saddle point of J. Finally, each data point is assigned to a different cluster according to the fuzzy membership value, that is

Intuitionistic Fuzzy Clustering
The intuitionistic fuzzy set is an extension of fuzzy sets.Chaira proposed intuitionistic fuzzy clustering (IFC) [8], which integrates the intuitionistic fuzzy entropy with the objective function of FCM.The objective function of IFS is: where π * j = 1 n ∑ n i=1 π ij , and π ij is the hesitation degree of X i for C j .The constraints of IFC are similar to (2).Hesitation degree π ik is initially calculated using the following form: and the intuitionistic fuzzy membership values are obtained as follows: where µ * ij (µ ij ) denotes the intuitionistic (conventional) fuzzy membership of the i-th data in the j-th class.The modified cluster centroid is: The iteration will not stop until it reaches the maximum iterations or the difference between µ * (t) ij and µ * (t−1) ij is not larger than a pre-defined threshold , that is max i,j |µ * (t)

Picture Fuzzy Clustering
In [26], Cuong introduced the picture fuzzy set (which is also called the standard neutrosophic set [27]), which is defined on a non-empty set S, is the positive degree of each element x ∈ X, η Ȧ(x) is the neutral degree and γ Ȧ(x) is the negative degree satisfying the constraints, The refusal degree of an element is calculated as: In [10], Thong and Son proposed picture fuzzy clustering (FC-PFS), which is related to neutrosophic clustering.The objective function is: where and ξ ij are the positive, neutral and refusal degrees, respectively, for which each data point X i belongs to cluster C j .Denote µ, η and ξ as the matrices whose elements are µ ij , η ij and ξ ij , respectively.The constraints for FC-PFS are defined as follows: Using the Lagrangian multiplier method, the iteration scheme to calculate µ ij , η ij , ξ ij and V j for the model (11,12) is as the following equations: The iteration will not stop until it reaches the maximum iterations or

The Proposed Model and Solutions
Definition 1. [15] Set U as a space of points (objects), with a generic element in U denoted by u.A SVNS A in U is characterized by three membership functions, a truth membership function T A , an indeterminacy membership function I A and a falsity-membership function F A , where ∀u ∈ U, T A (u), There is no restriction on the sum of T A (u), I A (u) and F A (u); thus, 0 ≤ T A (u) + I A (u) + F A (u) ≤ 3.
Moreover, the hesitate membership function is defined as Entropy is a key concept in the uncertainty field.It is a measure of the uncertainty of a system or a piece of information.It is an improvement of information entropy.The Tsallis entropy [28], which is a generalization of the standard Boltzmann-Gibbs entropy, is defined as follows.
Definition 2. [28] Let X be a finite set and X be a a random variable taking values x ∈ X , with distribution p(x).The Tsallis entropy is defined as S m (X) , where m > 0 and m = 1.
For FCM, µ ij denotes the fuzzy membership degree of X i to C j , and supports ∑ k j=1 µ ij = 1.From Definition 2, the Tsallis entropy of µ can be described by . n being a fixed number, Yasuda [13] used the following formulary to describe the the Tsallis entropy of µ: The maximum entropy principle has been widely applied in many fields, such as spectral estimation, image restoration, error handling of measurement theory, and so on.In the following, the maximum entropy principle is applied to the single-valued neutrosophic set clustering.After the objection function of clustering is built, the maximum fuzzy entropy is used to regularize variables.
Suppose that there is a dataset D consisting of n data points in d dimensions.Let µ ij , γ ij , η ij and ξ ij be the truth membership degree, falsity-membership degree, indeterminacy membership degree and hesitate membership degree, respectively, that each data point X i belongs to cluster C j .Denote µ, γ, η and ξ as the matrices, the elements of which are µ ij , γ ij , η ij and ξ ij , respectively, where The single-valued neutrosophic clustering based on Tsallis entropy maximization (SVNC-TEM) is the minimization of the following objective function: The constraints are given as follows: The proposed model in Formulary ( 18)-( 21) is applied to the maximum entropy principle of the SVNS.Now, let us summarize the major points of this model as follows.

•
The first term of the objection function (18) describes the weighted distance sum of each data point X i to the cluster center V j .µ ij being from the positive aspect and (4 − ξ ij − γ ij ) (four is selected in order to guarantee µ ij ∈ [0, 1] in the iterative calculation) from the negative aspect, denoting the membership degree for X i to V j , we use µ ij (4 − ξ ij − γ ij ) to represent the "integrated true" membership of the i-th data point in the j-th cluster.From the maximum entropy principle, the best to represent the current state of knowledge is the one with largest entropy, so the second term of the objection function (18) describes the negative Tsallis entropy of µ(4 − γ − ξ), which means that the minimization of ( 18) is the maximum Tsallis entropy.ρ is the regularization parameter.If γ = η = ξ = 0, the proposed model returns the FCM model.

•
Formulary (20) implies that the "integrated true" membership of a data point X i to the cluster center V j satisfies the sum-row constraint of memberships.For convenience, we set 21) guarantees the working of the SVNS since at least one of two uncertain factors, namely indeterminacy membership degree and hesitate membership degree, always exists in the model.
In order to get V j , taking the derivative of the objective function with respect to V j , we have ∂J ; thus, (23) holds.

Similarly, ∂L ∂η
Theorem 1 guarantees the convergence of the proposed method.The detailed descriptions of SVNC-TEM algorithm are presented in the following Algorithm 1: , number of clusters k, maximal number of iterations (Max-Iter), parameters: m, , ρ Output: Cluster result 1: t = 0; 2: Initialize µ, γ, ξ, satisfies Constraints ( 19) and (20); 3: Repeat 4: Update Update ξ Compared to FCM, the proposed algorithm needs additional time to calculate µ, γ, η and ξ in order to more precisely describe the object and get better performance.If the dimension of the given dataset is d, the number of objects is n, the number of clusters is c and the number of iterations is t, then the computational complexity of the proposed algorithm is O(dnct).We can see that the computational complexity is very high if d and n are large.

Experimental Results
In this section, some experiments are intended to validate the effectiveness of the proposed algorithm SVNC-TEM for data clustering.Firstly, we used an artificial dataset to show that SVNC-TEM can cluster well.Secondly, the proposed clustering method was used in image segmentation using an example.Lastly, we selected five benchmark datasets, and SVNC-TEM was compared to four state-of-the-art clustering algorithms, which were: k-means, FCM, IFC and FS-PFS.
In the experiments, the parameter m was selected as two and = 10 −5 .The maximum iterations (Max-Iter) = 100.The selected datasets have class labels, so the number of cluster k was known in advance.All the codes in the experiments were implemented in MATLAB R2015b.

Artificial Data to Cluster by the SVNC-TEM Algorithm
The activities of the SVNC-TEM algorithm are illustrated to cluster artificial data, which was two-dimensional data and had 100 data points, into four classes.We use an example to show the clustering process of the proposed algorithm.The distribution of data points is illustrated in Figure 1a. Figure 1b-e shows the cluster results when the number of iterations was t = 1, 5, 10, 20, respectively.We can see that the clustering result was obtained when t = 20.Figure 1f shows the final results of the clustering; the number of iterations was 32.We can see that the proposed algorithm gave correct clustering results from Figure 1.

Image Segmentation by the SVNC-TEM Algorithm
In this subsection, we use the proposed algorithm for image segmentation.As a simple example, the Lena image was used to test the proposed algorithm for image segmentation.Through this example, we wish to show that the proposed algorithm can be applied to image segmentation.Figure 2a is the original Lena image.Figure 2b shows the segmentation images when the number of clusters was k = 2, and we can see that the quality of the image was greatly reduced.Figure 2c-f shows the segmentation images when the number of clusters was k = 5, 8, 11 and 20, respectively.We can see that the quality of segmentation image was improved very much with the increase of the clustering number.The above two examples demonstrate that the proposed algorithm can be effectively applied to clustering and image processing.Next, we will further compare the given algorithm to other state-of-art clustering algorithms on benchmark datasets.

Comparison Analysis Experiments
In order to verify the clustering performance, in this subsection, we experiment with five benchmark datasets of the UCI Machine Learning Repository, which are IRIS, CMC, GLASS, BALANCE and BREAST.These datasets were used to test the performance of the clustering algorithm.Table 1 shows the details of the characteristics of the datasets.In order to compare the performance of the clustering algorithms, three evaluation criteria were introduced as follows.
Given one data point X i , denote p i as the truth class and q i as the predicted clustering class.The clustering accuracy (ACC) measure is evaluated as follows: We analyze the results from the dataset firstly.For IRIS dataset, the proposed method obtained the best performance for ACC, NMI and RI.For the CMC dataset, the proposed method had the best performance for ACC and RI.For the GLASS and BREAST datasets, the proposed method obtained the best performance for ACC and NMI.For the BALANCE dataset, the proposed method had the best performance for NMI and RI.On the other hand, from the three evaluation criteria, for ACC and NMI, the proposed method beat the other methods for four datasets.For RI, SVNC-TEM beat the other methods for three datasets.From the experimental results, we can see that the proposed method had better clustering performance than the other algorithms.

Conclusions
In the paper, we consider the truth membership degree, the falsity-membership degree, the indeterminacy membership degree and hesitate membership degree in a comprehensive way for data clustering by the single-valued neutrosophic set.We propose a novel data clustering algorithm, SVNC-TEM, and the experimental results showed that the proposed algorithm can be considered as a promising tool for data clustering and image processing.The proposed algorithm had better clustering performance than the other algorithms such as k-means, FCM, IFC and FC-PFS.Next, we will consider the proposed method to deal with outliers.Moreover, we will consider the clustering algorithm combined with spectral clustering and other clustering methods.

Figure 1 .
Figure 1.The demonstration figure of the clustering process for artificial data.(a) The original data.(b-e) The clustering figures when the number of iterations t = 1, 5, 10, 20, respectively.(f) The final clustering result.

20 Figure 2 .
Figure 2. The image segmentation for the Lena image.(a) The original Lena image.(b-f) The clustering images when the number of clusters k = 2, 5, 8, 11 and 20, respectively.

Table 1 .
Description of the experimental datasets.

Table 4 .
The RI for different algorithms on different datasets.
Bold format: the best performance.