Fuzzy Rough C-Mean Based Unsupervised CNN Clustering for Large-Scale Image Data

Deep learning has been well known for several years, and it shows great potential for unsupervised learning of representations combined with clustering algorithms. Convolutional Neural Networks (CNNs) are now state-of-the-art for many recognition and clustering tasks. However, with the continual growth in the number of digital images, there are more and more redundant, irrelevant, and noisy samples, which gradually slow down CNN training and reduce clustering accuracy at the same time. To overcome these issues, we propose an effective clustering method for large-scale image datasets that combines a CNN with the Fuzzy-Rough C-Mean (FRCM) clustering algorithm. The main idea is that a high-level representation, learned by a multi-layer CNN with one clustering layer, first produces the initial cluster centers; then, during training, the image clusters and representations are updated jointly. FRCM is used to update the cluster centers in the forward pass, while the parameters of the proposed CNN are updated in the backward pass by Stochastic Gradient Descent (SGD). The lower and boundary approximations of rough sets deal with uncertainty, vagueness, and incompleteness in cluster definition, and fuzzy sets enable efficient handling of overlapping partitions in a noisy environment. The experimental results show that the proposed FRCM-based unsupervised CNN clustering method outperforms standard K-Mean, Fuzzy C-Mean, and FRCM, as well as other deep-learning-based clustering algorithms, on large-scale image data.


Introduction
Image clustering [1][2][3][4][5][6][7][8][9][10][11][12][13] is an important research field in image processing and computer vision applications. With many novel and portable digital devices, we have entered an age of big data. Every day, millions of data items are produced, and a large number of digital images are uploaded worldwide to the cloud for storage or sharing. With more and more data being generated, efficiently organizing large-scale image data is becoming a demanding problem. Many researchers therefore focus on reducing the data in a large-scale dataset [14] and on feature encoding [15] for large-scale image clustering. The reason for this is the existence of many redundant and noisy samples in large datasets [16].
Unsupervised clustering is an essential machine-learning technique that is used to discover the natural cluster structure of an unlabeled dataset [17]. However, when the data to be clustered are large, clustering incurs high computation and memory complexity. In addition, an unnecessarily large amount of input data increases the parameter fine-tuning frequency of CNNs, which raises the risk of uncertainties and over-fitting in the raw data [49,50]. It is therefore reasonable to assume that large-scale datasets contain considerable noise and redundant data.
To overcome the above issues, the contribution of our work is to decrease the training time while maintaining, or even improving, the test accuracy by selecting noise-free data for updating clusters with the Fuzzy-Rough C-Mean (FRCM) algorithm. In this paper, we build the model from unsupervised CNN clustering and the FRCM algorithm. The proposed clustering algorithm benefits from both CNNs and FRCM by merging one with the other, as shown in Figure 1. An unsupervised CNN clustering (UCNN) architecture is proposed, which extracts salient features to produce the initial cluster centers. During the learning stage, the clusters and the representation are updated simultaneously: the cluster centers are updated from reliable, noise-free samples by the FRCM algorithm in the forward pass, and the parameters of the representation are updated in the backward pass by stochastic gradient descent (SGD). The main idea behind our method is that good representations are beneficial to image clustering, and better clustering results are beneficial to feature representations.
The main contributions of the proposed algorithm are as follows:
1. We present an FRCM-based unsupervised CNN clustering method, which is robust to the uncertainties in the training data and can achieve reliable performance even with noisy samples.
2. We propose a joint learning framework to update the parameters of the unsupervised CNN and the cluster centroids simultaneously and iteratively.
3. We introduce FRCM with CNN to reduce the time complexity and increase clustering performance by updating the cluster centers based on reliable sample selection, which is a key component of our method.
4. Extensive experiments on large-scale image datasets indicate that our strategy enhances the clustering accuracy compared with other non-fuzzy deep neural networks, which shows that fuzzy learning is a feasible way to further improve the performance of deep-learning-based clustering algorithms.
This paper is organized as follows: Section 2 deals with the implementation strategy of the algorithm, Section 3 describes the experiment and results, Section 4 provides the threats to validity and Section 5 provides the conclusion.
Previous deep-learning-based clustering algorithms have not focused on reducing data uncertainty to increase accuracy and decrease time complexity.
Xie et al. [45] proposed the first deep-learning-based clustering method. Deep Embedded Clustering (DEC) uses auto-encoders as the network architecture, followed by conventional K-Means for the final clustering. The network model is fine-tuned using the cluster assignment hardening loss, and the cluster centers are updated.
Yang et al. [64] also proposed an auto-encoder-based method followed by K-Mean clustering. However, the network is trained jointly using a combination of representation learning and image clustering.
Lie et al. [58] proposed an idea almost identical to DEC except for using a conventional auto-encoder. However, for high-dimensional data, the auto-encoder usually cannot learn representation features well compared to CNN-based architectures.
Yang et al. [47] proposed a convolutional neural network with re-running clustering (CNN-RC) method. For clustering, a hierarchical clustering approach is used. The network is trained jointly: the clusters are updated in the forward pass, while the representation is learned in the backward pass. However, compared with centroid-based clustering, hierarchical clustering incurs high computation and memory complexity as the size of the image data becomes large.
Dundar [46] proposed a CNN with connection matrix (CNN-CM) method, in which K-Mean is used for clustering. A connection matrix is proposed that makes it possible to incorporate additional side information to help learn the representation for clustering. Based on the learned features, a full-set K-Mean is then performed to gather all images into their corresponding clusters. However, when the size of the data becomes large, the complexity of full-set K-Mean increases.
Hu & Lin [44] proposed a clustering CNN (CCNN) to achieve joint clustering and representation learning with feature drift compensation for large-scale image data. They extracted the salient features from one of the internal layers of CCNN. At first, the initial cluster centroids are taken from the extracted features of k randomly picked samples, and K-Mean is performed on the features extracted from the input dataset to obtain the corresponding cluster labels. Based on the assigned labels and the labels predicted by the Softmax layer, the network parameters are updated. Further, the corresponding cluster centroids are updated with the extracted features of the mini-batch. However, fuzzy modeling has many advantages over non-fuzzy methods, such as robustness against uncertainty, vagueness, and overlapping datasets.

Background of Fuzzy Rough C-Mean (FRCM)
Hu et al. [69] proposed the FRCM clustering algorithm, which develops and combines FCM [70] and RCM [37,38]. FCM maps a membership over the range 0-1; each object belongs to some or all of the clusters to some fuzzy degree. RCM classifies the object space into three parts: lower approximation, boundary, and negative region. In RCM, all the objects in the lower approximation take the same weight, and all the objects in the boundary uniformly take another weighting index. In fact, the objects in the lower approximation definitely belong to a cluster, but the objects in the boundary regions belong to a cluster only to some extent and have different influences on the centers and clusters, so different weighting values should be imposed on the boundary objects when computing the new centers. Inspired by this idea, FRCM [69] integrates the advantages of fuzzy set theory and rough set theory and incorporates the fuzzy membership value of each sample into the lower approximation and boundary area of a cluster. Let $I = \{I_1, I_2, \ldots, I_{n_x}\}$, $I_i \in \mathbb{R}^d$, be a set of image data, where $d$ is the dimension of the data points. Each cluster $C_j$ ($j = 1, 2, \ldots, k$) is regarded as a rough set. It is characterized by the lower approximation $\underline{C}_j$, the upper approximation $\overline{C}_j$, and the boundary area $C_j^B = \overline{C}_j - \underline{C}_j$. Let $c = \{c_1, c_2, \ldots, c_k\}$ be a vector composed of the $k$ cluster centers, where $c_j \in \mathbb{R}^d$.
Let $u = \{u_i(j)\}_{n_x \times k}$ be the membership matrix. The membership function is given by Equation (1), in which the exponent $m > 1$ controls the weighting impact of the membership values. The new cluster centers are computed by Equation (2), and the objective function of FRCM is given by Equation (3). The FRCM algorithm can be formulated as follows.
Input: Unlabeled data $I$, number of clusters $k$, threshold parameter $T$, exponent index $m$, stop criterion $\varepsilon$.
Output: Membership matrix $u$, $k$ cluster centers.
Step 1: Assign the data objects to the approximations. For a given data object $I_i$, calculate its closest center $c_h^{(l)}$ and the set $A^{(l)}$.
Step 2: Compute the membership values using Equation (1).
Step 3: Compute the new cluster centers using Equation (2).
Step 4: Check convergence of the algorithm. If the algorithm has converged, stop; else set $l = l + 1$ and go to Step 1.
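The four steps above can be sketched in plain Python. This is a minimal illustration rather than the authors' implementation: the assignment rule in Step 1 (centers whose distance is within $T$ of the closest distance form the set $A$), the FCM-style boundary membership, and the membership-weighted center update are assumed to follow the standard fuzzy-rough c-means formulation, and the function name `frcm` and its `max_iter` argument are hypothetical. The defaults `T=0.05` and `eps=0.01` mirror the values reported in the experiments.

```python
import math

def frcm(points, centers, T=0.05, m=2.0, eps=0.01, max_iter=100):
    """Fuzzy-rough c-means sketch; returns (membership matrix u, cluster centers)."""
    k, dim, n = len(centers), len(centers[0]), len(points)
    centers = [list(c) for c in centers]
    u = []
    for _ in range(max_iter):
        u = []
        for p in points:
            # Distances to every center, floored to avoid division by zero.
            d = [max(math.dist(p, c), 1e-12) for c in centers]
            h = min(range(k), key=lambda j: d[j])  # index of the closest center
            # Step 1 (assumed rule): other centers within T of the closest distance.
            A = [j for j in range(k) if j != h and d[j] - d[h] <= T]
            row = [0.0] * k
            if not A:
                row[h] = 1.0  # lower approximation: crisp membership
            else:
                # Step 2 (assumed form): FCM-style membership for boundary objects.
                for j in A + [h]:
                    row[j] = 1.0 / sum((d[j] / d[l]) ** (2.0 / (m - 1.0)) for l in range(k))
            u.append(row)
        # Step 3 (assumed form): membership-weighted mean of the points.
        new_centers = []
        for j in range(k):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            if total == 0.0:
                new_centers.append(centers[j])  # empty cluster: keep the old center
            else:
                new_centers.append(
                    [sum(w[i] * points[i][a] for i in range(n)) / total for a in range(dim)]
                )
        # Step 4: stop once no center moves by more than eps.
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < eps:
            break
    return u, centers
```

With well-separated data every object falls into a lower approximation and the update reduces to a plain cluster mean; an object lying midway between two centers ends up in both boundaries with fuzzy memberships close to 0.5.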

FRUCNN Clustering Architecture
To enhance the performance of the clustering algorithm, we integrate an unsupervised CNN (UCNN) [71] with Fuzzy-Rough C-Mean clustering into the proposed clustering algorithm, which is shown in Figure 1. It is divided into two parts: pre-clustering, followed by joint clustering and representation learning. During the learning stage, the clusters are updated by the FRCM algorithm. For the pre-clustering part, Figure 2 shows the network architecture of the proposed unsupervised convolutional neural network, which contains multiple convolutional layers with one clustering layer. The number of convolutional layers depends on the size of the dataset; big data needs a large-scale network.
For example, for ImageNet [72], the unsupervised CNN consists of five convolutional layers taken from the first five convolutional layers (Conv 1-Conv 5) of AlexNet [40], followed by three adjustment layers (Conv 6, Conv 7, and CConv) with 6144, 2048, and k channels, respectively, which replace the fully connected (FC) layer in AlexNet. The adjustment layers comprise two convolutional layers (Conv 6 and Conv 7) and one clustering convolutional layer (CConv) with k channels, one per cluster, all with 3 × 3 kernels, followed by global max-pooling. The output of the max-pooling is the maximum value of each channel of the clustering convolutional layer (CConv), so its size is 1 × k. Finally, a fully connected (FC) layer and a Softmax layer are appended to extract the image features.
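To make the 1 × k output of the global max-pooling step concrete, the following sketch reduces each CConv channel to its maximum value. It is an illustration only: the function name `global_max_pool` and the nested-list layout (k channels, each an H × W grid) are assumptions made for this example, not part of the original architecture description.

```python
def global_max_pool(feature_maps):
    """Reduce k channels (each an H x W nested list) to a list of k maxima."""
    # One value per channel: the largest activation anywhere in that channel.
    return [max(max(row) for row in channel) for channel in feature_maps]


# Toy CConv output with k = 3 channels of size 2 x 2.
cconv_output = [
    [[1, 2], [3, 4]],
    [[-1, -5], [0, -2]],
    [[9, 9], [1, 0]],
]
pooled = global_max_pool(cconv_output)  # one entry per cluster channel
```

Because each channel collapses to a single number, the pooled vector has exactly k entries regardless of the spatial size of the input.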

Joint Clustering and Representation Learning
Suppose the unlabeled dataset contains $n_x$ images $I = \{I_1, I_2, \ldots, I_{n_x}\}$. The main objective is to group the $n_x$ images into $k$ clusters $C = \{C_1, C_2, \ldots, C_k\}$. Let $H = \{h_1, h_2, \ldots, h_{n_x}\}$ be the set of features extracted from the FC layer of UCNN, with $h_i = f(W_{FC}/I_i)$, where $W_{FC}$ represents the set of parameters (weights) of the FC layer. We use FRCM to update the clusters, using features extracted from the FC layer as the initial cluster centers.
In our proposed method, given an input image set, we first randomly pick k samples and extract their features as the initial cluster centroids, using the proposed UCNN with initial parameters pre-trained on the ImageNet dataset. FRCM is then performed to assign cluster labels to individual images randomly sampled from the input set until all images are processed. Subsequently, the parameters of the proposed UCNN and the centroids of the image clusters are updated simultaneously and iteratively based on stochastic gradient descent.
In the learning part, the weights $W_{FC}$ and the cluster centroids $c$ are updated simultaneously using Algorithm 2: the cluster centroids are updated by FRCM using Algorithm 1, and the representation parameters are updated by stochastic gradient descent (SGD).

Pre-Processing Data for UCNN
Data augmentation is used to increase sample variety during the initial pre-clustering process. After the initialization, we used the ILSVRC12 training set of ImageNet [73] to pre-train the parameters of Conv 1-Conv 5 in AlexNet [40].

Cluster Centroid Updating
Let $I = \{I_1, I_2, \ldots, I_{n_x}\}$ be the set of $n_x$ images. Initially, $k$ random images are selected from the input image set $I$, and their features $h_1^t, \ldots, h_k^t \in H_{FC}$, extracted from the FC layer of the pre-trained UCNN, are used as the initial cluster centroids $c$ in the initial iteration (i.e., $t = 0$). The Fuzzy-Rough C-Mean (FRCM) algorithm [39] is then performed to update the cluster centroids by minimizing the FRCM objective function, in which the exponent $m > 1$ controls the weighting impact of the membership values. In iteration $t$, the jth centroid $c_j^t$ is updated using the newly assigned samples $h_j$, which are the features extracted from the FC layer.
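For orientation, the quantities involved in the centroid update can be written out in the standard fuzzy-rough c-means form, applied here to the FC-layer features. This is a sketch under the assumption that the method follows the usual FRCM formulation; the subscript notation $u_{ij}$ for the membership of feature $h_i$ in cluster $C_j$ and the distance symbol $d_{ij}$ are introduced for this illustration, and the notation of the original equations may differ.

```latex
% Assumed standard FRCM form, written for FC-layer features h_i and centroids c_j.
% d_{ij} = \lVert h_i - c_j^{t} \rVert is the distance between feature i and centroid j.
\begin{align}
u_{ij} &=
\begin{cases}
1, & h_i \in \underline{C}_j,\\[4pt]
\left[\displaystyle\sum_{l=1}^{k} \left(\frac{d_{ij}}{d_{il}}\right)^{\frac{2}{m-1}}\right]^{-1}, & h_i \in C_j^{B},\\[4pt]
0, & \text{otherwise},
\end{cases}\\
c_j^{t+1} &= \frac{\sum_{i=1}^{n_x} u_{ij}^{m}\, h_i}{\sum_{i=1}^{n_x} u_{ij}^{m}},\\
J &= \sum_{i=1}^{n_x} \sum_{j=1}^{k} u_{ij}^{m}\, \lVert h_i - c_j \rVert^{2}.
\end{align}
```

Features in a lower approximation contribute to their centroid with full weight, while boundary features contribute in proportion to $u_{ij}^{m}$, which is how less reliable samples are down-weighted in the update.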

Input: Unlabeled image dataset $I$ with $n_x$ images; number of clusters $k$; $k$ images $I_{n_k}$ randomly selected from $I$, whose extracted features serve as the initial cluster centers $c$ at $t = 0$; exponent index $m$; threshold parameter $T$; stop criterion $\varepsilon$.
Output: Updated cluster centroids $c$; updated extracted features for the FC layer.
1. Let $t = 0$ and initialize the cluster centers.
2. Assign the data objects to the approximations by Equation (4).
3. Compute the membership values.
4. Compute the new cluster centers $c_j^{t+1}$ according to Equation (6).
5. Check convergence of the algorithm. If the algorithm has converged, stop; else set $t = t + 1$ and go to step 2.
6. Assign the features for the FC layer.

Representation Learning
Using the unsupervised CNN, the FC layer generates the features for clustering from the local salient features output by the CConv layer [74]. To learn the parameters $\theta(w_{ri}, w_{ij})$ of the FC and softmax layers of UCNN, we use the SGD process [75] shown in Figure 3, where $w_{ri}$ is the set of weights of the FC layer and $w_{ij}$ is the set of weights of the softmax layer. To learn the parameters of the FC and softmax layers, we first define the objective function $L$.
In this objective function, $k$ denotes the number of clusters, $\hat{y}_j$ is the jth cluster label predicted by UCNN, and $y_j$ is the jth cluster label predicted by FRCM, which is used as a pseudo ground-truth to guide the update of the UCNN clustering model. We then compute the gradient of the objective function w.r.t. $w_{ri}$ to update the weights of the FC layer. To do so, we first use the chain rule to calculate the gradient w.r.t. $w_{ij}$.
Here, $u_j$ is the input to the jth ReLU activation [76]. The gradient w.r.t. $w_{ij}$ is the product of the partial derivative of $L$ w.r.t. $\hat{y}_j$, the partial derivative of the ReLU w.r.t. its input $u_j$, and the partial derivative of $u_j$ w.r.t. $w_{ij}$. Consequently, $w_{ij}$ is updated in the tth iteration by a gradient step scaled by the learning rate $\eta$. The chain rule is then applied again to calculate the gradient w.r.t. $w_{ri}$, after which $w_{ri}$ is updated in the tth iteration in the same way. Equation (16) is the full gradient for updating the weights of the FC layer.
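The chain-rule steps described above can be written out generically, without fixing a particular form for the objective $L$. The layer model used here is an assumption made for illustration: $\hat{y}_j = \mathrm{ReLU}(u_j)$ with $u_j = \sum_i w_{ij} h_i$, and $h_i = \mathrm{ReLU}(u_i)$ with $u_i = \sum_r w_{ri} x_r$, where $x_r$ (a hypothetical symbol) denotes the rth input to the FC layer.

```latex
% Generic back-propagation through two ReLU layers; the form of L is left open.
\begin{align}
\frac{\partial L}{\partial w_{ij}}
  &= \frac{\partial L}{\partial \hat{y}_j}\,
     \frac{\partial \hat{y}_j}{\partial u_j}\,
     \frac{\partial u_j}{\partial w_{ij}},
  \qquad
  \frac{\partial \hat{y}_j}{\partial u_j} =
  \begin{cases} 1, & u_j > 0,\\ 0, & \text{otherwise}, \end{cases}
  \qquad
  \frac{\partial u_j}{\partial w_{ij}} = h_i,\\
w_{ij}^{t+1} &= w_{ij}^{t} - \eta\, \frac{\partial L}{\partial w_{ij}},\\
\frac{\partial L}{\partial w_{ri}}
  &= \sum_{j=1}^{k}
     \frac{\partial L}{\partial \hat{y}_j}\,
     \frac{\partial \hat{y}_j}{\partial u_j}\,
     \frac{\partial u_j}{\partial h_i}\,
     \frac{\partial h_i}{\partial u_i}\,
     \frac{\partial u_i}{\partial w_{ri}},
  \qquad
  \frac{\partial u_j}{\partial h_i} = w_{ij},
  \qquad
  \frac{\partial u_i}{\partial w_{ri}} = x_r,\\
w_{ri}^{t+1} &= w_{ri}^{t} - \eta\, \frac{\partial L}{\partial w_{ri}}.
\end{align}
```

The sum over $j$ in the FC-layer gradient appears because each FC unit $h_i$ feeds every output unit, so its weights receive error signal from all $k$ predicted cluster labels.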

Input: Input image dataset $I$; $k$ randomly selected images $I_{n_k}$; number of clusters $k$; learning rate $\eta$; maximum number of iterations $\tau$. Randomly pick $k$ images from $I$ and extract their image features $h_j$, which serve as the initial cluster centers $c$.
Output: Final cluster centroids; final weights $(w_{ri}, w_{ij})$.
1. For $t = 1$ to $\tau$ do:
2. Calculate the cluster label $y_j$ using FRCM [39] as a pseudo ground-truth.
4. Find the predicted cluster label $\hat{y}_j$ using the updated cluster centroids in the FC layer of the UCNN network.

Data Preparation
In this paper, MATLAB 2018a [77] was used as the programming tool. Table 1 describes the three publicly available datasets on which the experiments were performed. We selected one large-scale image dataset, ILSVRC12 in ImageNet [73], which consists of 1.2 million training images and 50 thousand validation images of 256 × 256-pixel size collected from one thousand object categories. In addition to the large-scale image dataset, we evaluated the performance of our proposed approach on two smaller-scale datasets: MNIST [78] and Youtube-Face (YTF) [79]. MNIST contains 60 thousand training images and 10 thousand testing images of handwritten digits of 28 × 28-pixel size; the digits are centered and the size is normalized. YTF consists of 10 thousand images of 55 × 55-pixel size. The images are cropped faces that are then resized to a constant size. All experiments on all datasets were run on a personal computer equipped with a commercial GPU card.

Performance Measure
We adopted four widely used clustering performance measures to evaluate the performance of the proposed method: Normalized Mutual Information (NMI) [80], Clustering Accuracy (Acc) [81], Mean of F-Measure (MFM), and Mean of Area Under the Curve (MAUC) [82].
NMI is computed from the class labels $C$ and the cluster labels $Y$, where $H(\cdot)$ stands for the entropy and $I(C; Y) = H(C) - H(C \mid Y)$ denotes the mutual information between $C$ and $Y$. A higher NMI indicates more consistent clustering results. Acc gives the same weight to each class; the final result is the average of the accuracy rates computed for each class independently, $Acc = \frac{1}{m} \sum_{i=1}^{m} Acc_i$,
where $m$ is the number of classes and $Acc_i$ stands for the accuracy rate of the ith class. Table 2 shows the confusion matrix for a binary class problem, which denotes the categorized results as True or False. Here, the minority class is considered positive and the majority class is considered negative. Several measures can be deduced from the confusion matrix for the binary class problem.
The mean F-Measure for multi-class problems is the average of the per-class F-measures, where $m$ is the total number of classes and $i$ is the index of the positive class.
For two classes $C_i$ and $C_j$, the value $AUC(C_i, C_j)$ represents the probability that a randomly selected sample from the first class (the ith class) is assigned a higher probability of belonging to the ith class by the classifier than a randomly selected sample from the second class (the jth class), and vice versa.
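The label-based measures can be illustrated with a short standard-library sketch. Several choices here are assumptions rather than details taken from the text: NMI is normalized by $\sqrt{H(C)\,H(Y)}$ (other normalizations are also in common use), the F-measure is the harmonic mean of precision and recall, and the accuracy and F-measure functions expect cluster labels that have already been mapped to class labels. All function names are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy H(.) of a label sequence, in nats."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def nmi(classes, clusters):
    """I(C;Y) / sqrt(H(C) H(Y)); the normalization is an assumed convention."""
    n = len(classes)
    pc, py = Counter(classes), Counter(clusters)
    joint = Counter(zip(classes, clusters))
    mi = sum((nij / n) * math.log(n * nij / (pc[c] * py[y]))
             for (c, y), nij in joint.items())
    hc, hy = entropy(classes), entropy(clusters)
    return mi / math.sqrt(hc * hy) if hc > 0 and hy > 0 else 0.0

def mean_class_accuracy(true, pred):
    """Acc = (1/m) * sum of per-class accuracy rates."""
    rates = []
    for c in sorted(set(true)):
        idx = [i for i, t in enumerate(true) if t == c]
        rates.append(sum(pred[i] == c for i in idx) / len(idx))
    return sum(rates) / len(rates)

def mean_f_measure(true, pred):
    """Average of per-class F-measures, each class taken as positive in turn."""
    scores = []
    for c in sorted(set(true)):
        tp = sum(t == c and p == c for t, p in zip(true, pred))
        fp = sum(t != c and p == c for t, p in zip(true, pred))
        fn = sum(t == c and p != c for t, p in zip(true, pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

NMI is invariant to a relabeling of the clusters, so a clustering that matches the classes up to a permutation of the cluster indices still scores 1, whereas the accuracy and F-measure functions require the cluster-to-class mapping to be applied first.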

Comparison Schemes
To analyze the performance of our approach on large-scale image datasets, we organized the comparison in two parts. In the first part, we compared our method with seven clustering models: two state-of-the-art clustering models, K-Mean [83] and Fuzzy C-Mean [84], and five deep-learning-based clustering models, Deep Embedded Clustering (DEC) [45], Deep Embedded Regularized Clustering (DEPICT) [53], Convolution Clustering for Unsupervised Learning (CNN-CM) [46], Joint Unsupervised Learning of Deep Representations and Image Clusters (JULE) [47], and CNN-Based Joint Clustering and Representation Learning (CCNN) [44]. In the second part, besides the seven clustering models, we applied three baseline schemes for performance evaluation. Scheme 1: the proposed method without updating the cluster centroids by FRCM. Scheme 2: the proposed method without iterative representation learning using the updated cluster centroids. Scheme 3: the proposed method with cluster centroid updating and iterative representation learning performed simultaneously.

Implementation Details
We used AlexNet [40] pre-trained on the ILSVRC12 training set of ImageNet as our basic CNN model, both to avoid tuning any hyper-parameters using labeled data and to accelerate convergence. Data augmentation was used to increase the sample variety during the pre-training process. We considered that the number of convolutional layers depends on the size of the images in the dataset. We selected ILSVRC12 in ImageNet [73] as the large-scale image dataset. We demonstrated the performance of our clustering method on the ILSVRC12 validation set, denoted as "ILSVRC12-Val", and, for fairness, did not evaluate on the ILSVRC12 training set. The proposed unsupervised CNN clustering network for ImageNet is composed of five convolutional layers taken from AlexNet [40], followed by two adjustment layers with 6144 and 2048 channels and 3 × 3 kernels, and one clustering convolutional layer with k channels and 3 × 3 kernels followed by global max-pooling with a (1 × k) output, where k denotes the number of clusters; these layers replace the fully connected layer and softmax layer with k channels.
We also evaluated the performance of our approach on two other image datasets: MNIST and YTF. The image size of these datasets is substantially smaller than that of ImageNet. We therefore used two convolutional layers adopted from AlexNet, followed by one clustering convolutional layer, one fully connected (FC) layer, and a Softmax layer. The output of the FC layer is taken as the initial centroids of the clusters, which are updated by the Fuzzy Rough C-Mean (FRCM) algorithm in the forward pass, while the representation of the clustering convolutional, FC, and softmax layers is learned in the backward pass using SGD. All results were computed on a personal computer with a commercial GPU card.

Experimental Design
The analysis of the experimental results is split into two parts. In the first part, FRUCNN is tested on three benchmark image datasets and compared with other state-of-the-art clustering methods and deep-learning-based clustering methods; in the second part, we discuss three different schemes of FRUCNN and compare their performance experimentally. Table 3 shows the results of the performance measures (NMI, Acc, MFM, and MAUC) on four datasets to demonstrate the effectiveness of our proposed approach. Across all performance measures, MNIST-Full and MNIST-Test show better results than ILSVRC12-Val and YTF. To analyze the results on large-scale datasets further, we compared our proposed approach with other state-of-the-art methods. In Table 4, the reported results are taken from the original papers; where a paper did not report a result, the corresponding result was obtained by re-running the code released with the original paper. We put dash marks (-) for the results that could not be obtained.
Table 4 compares the NMI and ACC performance of the proposed clustering method with that of other clustering methods, with the parameters T = 0.05 and ε = 0.01. We first consider the non-deep-learning-based clustering algorithms in Table 4; these are not applicable to a large-scale image dataset such as ILSVRC12 in ImageNet. On the other image datasets, MNIST and YTF, FCM performed better than K-Mean. When the deep-learning-based clustering algorithms are compared with the state-of-the-art clustering algorithms, all deep-learning-based clustering algorithms perform better by a significant margin. The reason is that learned feature representations of the input images lead to better clustering results. We also compare our proposed clustering method with other deep-learning-based clustering methods.
The proposed FRUCNN achieves a comparable improvement in NMI of 0.021-0.032, 0.002-0.013, 0.004-0.005, and 0.043-0.089 over the DEPICT, JULE, and CCNN methods, and a significant improvement in NMI of 0.176-0.246, 0.013-0.103, 0.043-0.093, and 0.445 over DEC and CNN-CM, on the ILSVRC12-Val, MNIST, and YTF datasets. The experimental results demonstrate that the proposed FRUCNN approach performs better on image datasets of various scales. Table 5 compares the NMI and ACC performance of the different schemes of the proposed method. Comparing Schemes 1 and 2, the clustering performance improves when the centroids are updated with the assistance of the FRCM algorithm. Compared with Scheme 1, Scheme 2 also handles large-scale image data with less time complexity because of the reliable sample selection used by FRCM to update the cluster centers. Scheme 3, our proposed method, significantly improves the clustering performance in NMI by 0.201, 0.289, and 0.361 compared with only updating the representation learning iteratively, and in NMI by 0.191, 0.79, 0.068, and 0.151 compared with only updating the cluster centroids. From these results, updating the cluster centers gives better clustering performance than iteratively updating the representation learning alone. The highest performance, obtained by Scheme 3, shows that updating the cluster centers and iteratively learning the representation simultaneously is the combination that achieves the maximum clustering performance.

Computational Time Comparison
To evaluate the efficiency of our clustering algorithm on image datasets, we compare the computational time of the proposed FRUCNN method with that of the other algorithms, with the number of epochs for parameter updating set to 10. Table 6 lists the computational time of FRUCNN and the other methods on all datasets. The comparison shows that our method consumed 3.1 h, 1.2 h, 0.16 h, and 0.5 h to obtain the clustering results of ILSVRC12-Val, MNIST-Full, and MNIST-Test respectively. Our method achieved better results than the other methods except CCNN, which is a mini-batch-based method with feature drift compensation that can effectively address the problem of large-scale image datasets.

Performance on Number of Clusters (k)
Figure 4 shows the NMI performance for different numbers of clusters k on the ILSVRC12-Val and MNIST-Full datasets. For ILSVRC12-Val, k = 1 gives the best NMI performance, and performance decreases as k increases. Although k = 1 is a good choice for ILSVRC12-Val, the computation time for fine-tuning the learned representation is too long. Considering computational time and performance together, the best choice for ILSVRC12-Val is k = 10 or less. For MNIST-Full, k = 9 gives the highest NMI performance, as expected. It outperforms k = 10 because the handwritten digits 4 and 9 are assigned to the same cluster. For both datasets, k = 9 or 10 seems to be a reasonable choice for good NMI performance with much less computation.
Figure 6 shows the NMI performance with respect to the number of epochs on the four datasets. The clustering performance of the proposed method gradually increases with the number of epochs used for updating the parameters. Figure 6 also shows that performance stabilizes after 25 epochs on the ILSVRC12-Val and MNIST datasets and after 30 epochs on the YTF dataset.
Based on Figures 4 and 6, the best choice is k = 10 with 25 epochs for ILSVRC12-Val, and k = 9 or 10 with 30 epochs for MNIST and YTF.
[Figure panels: (d) Epoch 7; (e) Epoch 9; (f) Epoch 12]

Threats to Validity
Some potential threats to validity exist in our experimental study. Dataset quality may be the most important threat to external validity, which refers to the generalizability of our experimental results. To ensure the representativeness of our experiments, we used the ImageNet, MNIST, and YTF image datasets, which are commonly used to evaluate clustering techniques.
Our unsupervised CNN clustering method adopts an initial CNN model pre-trained on ImageNet to predict cluster labels, which generally leads to consistent convergence. In addition, Fuzzy Rough C-Mean clustering is used with the CNN architecture to update the cluster centroids. Although Fuzzy C-Mean clustering is guaranteed to converge, there is no theoretical guarantee of convergence for the FRCM clustering approach.
We use two widely adopted metrics, NMI (Normalized Mutual Information) and ACC (clustering accuracy), to evaluate clustering performance. NMI is more reliable for clustering results than other performance measures. To mitigate internal threats, all implementations were cross-checked by our research group. As with other existing deep-learning-based clustering models, there is no theoretical guarantee of convergence on large image datasets; however, our experimental results show that our approach achieves good convergence in practice.
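For concreteness, the two metrics can be computed as in the following minimal sketch. It rests on assumptions not stated in this section: NMI is normalized by the arithmetic mean of the two entropies (other normalizations, such as the geometric mean, are also in use), and ACC is the fraction of correctly assigned samples under the best one-to-one mapping between clusters and classes. The function names are illustrative. The mapping is found by brute force, which is only practical for a small number of clusters; for larger k the Hungarian algorithm (for example `scipy.optimize.linear_sum_assignment`) would be the usual choice.

```python
import math
from collections import Counter
from itertools import permutations


def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def nmi(y_true, y_pred):
    """Mutual information normalized by the mean of the two entropies."""
    n = len(y_true)
    joint = Counter(zip(y_true, y_pred))
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    mi = sum(
        (c / n) * math.log(c * n / (true_counts[t] * pred_counts[p]))
        for (t, p), c in joint.items()
    )
    denom = 0.5 * (entropy(y_true) + entropy(y_pred))
    return mi / denom if denom > 0 else 0.0


def clustering_acc(y_true, y_pred):
    """Accuracy under the best one-to-one cluster-to-class mapping."""
    classes = sorted(set(y_true))
    clusters = sorted(set(y_pred))
    joint = Counter(zip(y_true, y_pred))
    best = 0
    if len(clusters) <= len(classes):
        # Try every assignment of distinct classes to the clusters.
        for perm in permutations(classes, len(clusters)):
            hits = sum(joint[(cls, clu)] for clu, cls in zip(clusters, perm))
            best = max(best, hits)
    else:
        # More clusters than classes: unmatched clusters count as errors.
        for perm in permutations(clusters, len(classes)):
            hits = sum(joint[(cls, clu)] for cls, clu in zip(classes, perm))
            best = max(best, hits)
    return best / len(y_true)
```

Both functions are invariant to how cluster indices are numbered, so a clustering that recovers the true classes under different labels still scores 1.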

Conclusions
In this paper, we presented image clustering using features extracted from the convolutional clustering layers of a Convolution Neural Network (CNN). This approach provides state-of-the-art performance; we have shown that the proposed unsupervised CNN clustering based on the Fuzzy Rough C-Mean (FRCM) algorithm improves performance on large-scale image datasets by iterating between updating the cluster centroids with the FRCM algorithm and fine-tuning the unsupervised CNN. The unsupervised CNN extracts salient features from the convolutional clustering layer to produce the initial cluster centers. During training, the clusters and the representation are learned jointly: the cluster centers are updated step by step by the FRCM algorithm in the forward pass, and the representation is learned in the backward pass. We also show that reliable sample selection for updating the cluster centroids by the FRCM algorithm is the key component of its success. Empirical comparisons with non-fuzzy CNN methods show that combining fuzzy and rough learning with a CNN is a strength of the proposed method on several image datasets. A limitation is that too many parameters need to be adjusted; in future work, we plan to make the parameters self-adaptive.