An Application of Hyperspectral Image Clustering Based on Texture-Aware Superpixel Technique in Deep Sea

Abstract: This paper studies the application of hyperspectral technology to the classification of deep-sea manganese nodules. Considering the spectral-spatial variation of hyperspectral images, the difficulty of label acquisition, and the inability to guarantee stable illumination in deep-sea environments, this paper proposes a local binary pattern manifold superpixel-based fuzzy clustering method (LMSLIC-FCM). Firstly, we introduce a uniform local binary pattern (ULBP) to design a superpixel algorithm (LMSLIC) that is insensitive to illumination and has texture perception. Secondly, the weighted feature and the mean feature are fused as the representative features of superpixels. Finally, these features are combined with the fuzzy clustering method (FCM) to obtain the superpixel-based clustering algorithm LMSLIC-FCM. To verify the feasibility of LMSLIC-FCM on deep-sea manganese nodule data, experiments were conducted on three different types of manganese nodule data. The average identification rate of LMSLIC-FCM reached 83.8%, and the average true positive rate reached 93.3%, both better than those of the previous algorithms. Therefore, LMSLIC-FCM is effective in the classification of manganese nodules.


Introduction
Manganese nodules, also known as poly-metallic nodules (PMNs), are rich in copper, nickel, cobalt, lithium, and other raw materials necessary for producing high-tech products [1]. The metal reserves of manganese nodules are enormous. The Clarion-Clipperton Zone alone in the Eastern Pacific contains more manganese, cobalt, nickel, tellurium, titanium, and yttrium than their combined reserves on land [2]. It is a strategic mineral resource most likely to be acquired preferentially from the seabed. At present, the exploration methods commonly used for manganese nodules are roughly divided into three categories [3]: physical sampling [4], acoustic detection [5], and optical image analysis [6]. Among them, the optical image analysis better combines the characteristics of the other two methods, but there are still problems with this method, such as low resolution [7]. Hyperspectral imaging, as high-resolution imaging technology, has gradually attracted the attention of researchers.
Hyperspectral images are obtained by imaging the target area in many continuous, narrow spectral bands. Each pixel in the image contains a continuous spectral curve, and different materials produce different spectral curves [8]. In addition, the high spatial-spectral resolution makes it possible to identify and classify objects on a fine scale [9]. Therefore, researchers started to apply this technique in the marine field. Because the penetration depth of most light wavelengths does not exceed 50 m, early applications were limited to shallow waters [10,11]. With active underwater hyperspectral imagers (UHI) [12], however, hyperspectral technology is now being applied to the deep sea. For

Therefore, in this paper, a local binary pattern manifold superpixel-based fuzzy clustering method (LMSLIC-FCM) is proposed to realize the exploration of seabed poly-metallic nodules. We introduce ULBP to improve Manifold SLIC (MSLIC) and obtain a local binary pattern manifold superpixel (LMSLIC). Then, we combine it with the fuzzy clustering method to get LMSLIC-FCM, which is suitable for deep-sea hyperspectral image classification. The main contributions of this method are as follows:
1. Introducing a superpixel to replace the fixed spatial structure reduces the possibility of foreign objects in the same area and suppresses noise interference to a certain extent;
2. The introduction of the ULBP operator increases the texture perception ability of the superpixel algorithm and alleviates the noise interference caused by the lighting conditions during the segmentation process;
3. The recognition task of manganese nodules was completed by fusing the superpixel algorithm and clustering algorithm. The average recognition rate was 83.8%.
The rest of this article is organized as follows. The second section introduces the specific process of the LMSLIC-FCM algorithm. The third section presents the experimental results of the proposed algorithm on the manganese nodules dataset and the comparison with other methods. Finally, the fourth section concludes and discusses future work.

Methods
The proposed LMSLIC-FCM structure is illustrated in Figure 1. Firstly, principal component analysis (PCA) is used to reduce the dimensionality of the original data, and ULBP is then applied to extract texture features. Secondly, the extracted texture features are introduced into MSLIC, and a new iterative formula is proposed. Then, we iterate in the high-dimensional space to obtain superpixels with texture perception ability. Finally, representative features are extracted for each superpixel and combined with the fuzzy clustering algorithm to achieve superpixel clustering.
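The first step of the pipeline, PCA on the hyperspectral cube, can be sketched as follows. This is a minimal illustration using a plain SVD in NumPy, not the authors' implementation; the function name `pca_reduce`, the toy cube, and the choice of three components are assumptions made for the example.

```python
import numpy as np

def pca_reduce(cube: np.ndarray, n_components: int) -> np.ndarray:
    """Project an (m, n, b) hyperspectral cube onto its leading principal components."""
    m, n, b = cube.shape
    pixels = cube.reshape(-1, b).astype(np.float64)
    centered = pixels - pixels.mean(axis=0)
    # Rows of vt are the principal directions, sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt[:n_components].T
    return scores.reshape(m, n, n_components)

rng = np.random.default_rng(0)
demo_cube = rng.random((8, 8, 12))  # hypothetical toy cube, not the real dataset
reduced = pca_reduce(demo_cube, n_components=3)
```

The reduced cube keeps the spatial layout, so a texture operator can then be applied to each retained component as a 2D image.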

LMSLIC
In this paper, we define a stretching transformation $\Phi$ that maps each pixel $p$ of the hyperspectral image to a point in a high-dimensional space. The mapping is built from the position $(u, v)$ of $p$ and from the spectral value $g_i$ and the texture value $t_i$ of its $i$-th band, $i = \{1, 2, \ldots, b\}$, where $b$ is the maximum number of spectral bands, $S$ is the step length, and $M$ is the compactness. The step length is determined by the total number of image pixels $N$ and the number of initial superpixels $K$:

$$S = \sqrt{N / K}$$

Then, we define $\mathrm{Area}(\Phi(\square_p))$ as the area of $\square_p$ after mapping, where the quadrilateral $\square_p$ is the square formed by the four vertices $c_1, c_2, c_3, c_4$ with $p$ as its center. This area is computed from the areas of curved-sided triangles such as $\mathrm{Area}(\Phi(\triangle c_1 c_2 c_3))$, the triangle formed by $c_1$, $c_2$, $c_3$ in the high-dimensional space; the structure and transformation process of $\square_p$ are shown in Figure 2. The triangle area is evaluated from the two edge vectors of the mapped triangle and the angle $\theta$ between them.
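The area computation can be illustrated with a short sketch. It uses the elementary identity that a triangle spanned by edge vectors of lengths $|a|$ and $|b|$ with angle $\theta$ between them has area $\frac{1}{2}|a||b|\sin\theta$. The `stretch` function below is a hypothetical stand-in for $\Phi$: it simply appends spectral and texture values, multiplied by an assumed factor `scale`, to the pixel position, and does not reproduce the exact weighting by $S$ and $M$ used in the paper.

```python
import numpy as np

def stretch(u, v, spectrum, texture, scale):
    """Hypothetical stretching map: position plus scaled spectral and texture values."""
    return np.concatenate(([u, v], scale * np.asarray(spectrum), scale * np.asarray(texture)))

def triangle_area(p1, p2, p3):
    """Area of the triangle p1-p2-p3 in any dimension: 0.5 * |a| * |b| * sin(theta)."""
    a, b = p2 - p1, p3 - p1
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    if na == 0.0 or nb == 0.0:
        return 0.0  # degenerate triangle
    cos_theta = np.clip(a @ b / (na * nb), -1.0, 1.0)
    return 0.5 * na * nb * np.sqrt(1.0 - cos_theta ** 2)

# A triangle over a flat region keeps its planar area of 0.5 ...
flat_area = triangle_area(
    stretch(0, 0, [0.0], [0.0], 1.0),
    stretch(1, 0, [0.0], [0.0], 1.0),
    stretch(0, 1, [0.0], [0.0], 1.0),
)
# ... while a spectral change at one vertex enlarges the mapped area.
varied_area = triangle_area(
    stretch(0, 0, [0.0], [0.0], 1.0),
    stretch(1, 0, [1.0], [0.0], 1.0),
    stretch(0, 1, [0.0], [0.0], 1.0),
)
```

Under this sketch, regions with strong spectral or texture variation occupy more area in the mapped space, which is the property a content-sensitive superpixel method relies on to place smaller superpixels there.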
In addition, we define $\mathrm{Area}(\Phi(\Omega(s_i)))$ as the area of all pixels in the $2\lambda_i S \times 2\lambda_i S$ region centered on $s_i$, and $\mathrm{Area}(V(\Phi(s_i)))$ as the total area of the pixels in the superpixel, where $\lambda_i$ is a scaling factor, $i = \{1, 2, \ldots, K\}$, $V$ represents the set of superpixels after mapping, and $W$ represents a local search range.
Remote Sens. 2022, 14, 5047
Besides, in order to obtain texture features, PCA is first used for data dimensionality reduction to retain the primary information. Then, the ULBP operator extracts the texture information from the bands retained after dimensionality reduction to form a texture map. ULBP is an improved version of LBP. It counts the number of transitions between 0 and 1 in the LBP code, treats codes with more than two transitions as non-uniform patterns, and classifies all others as uniform patterns.
In the ULBP encoding, $L$ represents the eigenvalue after ULBP encoding, $U(x_c, y_c)$ indicates the number of transitions in the encoding, $(x_c, y_c)$ is the coordinate of the central position, $g_i$ represents the spectral value of the $i$-th surrounding element, $g_c$ represents the spectral value at the central position, $q$ is the number of surrounding elements, and $h$ is the threshold of ULBP. Encoding the bands in this way yields the texture spectrum $T_{m \times n \times b}$ of the hyperspectral image, where $m \times n$ is the size of the hyperspectral image.
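As an illustration of the encoding described above, the following sketch implements the widely used rotation-invariant uniform LBP on a 3 × 3 neighbourhood ($q = 8$). It is an assumption that this standard form matches the paper's operator; in particular, the paper's threshold $h$ is not reproduced here, and the cut-off of two transitions is taken from the prose description. The names `ulbp_code` and `ulbp_map` are hypothetical.

```python
import numpy as np

# Offsets (dy, dx) of the q = 8 neighbours, listed in circular order around the centre.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def ulbp_code(patch: np.ndarray) -> int:
    """Rotation-invariant uniform LBP code of the centre of a 3x3 patch."""
    centre = patch[1, 1]
    bits = [1 if patch[1 + dy, 1 + dx] >= centre else 0 for dy, dx in NEIGHBOURS]
    # U: number of 0/1 transitions when walking once around the circle.
    transitions = sum(bits[i] != bits[(i + 1) % len(bits)] for i in range(len(bits)))
    # Uniform patterns are coded by their number of ones; all others share one label.
    return sum(bits) if transitions <= 2 else len(bits) + 1

def ulbp_map(band: np.ndarray) -> np.ndarray:
    """Texture map of one band; border pixels are handled by edge padding."""
    padded = np.pad(band, 1, mode="edge")
    out = np.empty(band.shape, dtype=np.int64)
    for y in range(band.shape[0]):
        for x in range(band.shape[1]):
            out[y, x] = ulbp_code(padded[y:y + 3, x:x + 3])
    return out
```

Because the code depends only on comparisons with the centre value, adding a constant brightness offset to a band leaves the texture map unchanged, which is consistent with the illumination insensitivity claimed for ULBP.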
Furthermore, we define $D(\Phi(p_1), \Phi(p_2))$ as the similarity measure used in the iterative process. It combines the spatial distance $d_s$, the spectral distance $d_c$, and the texture distance $d_t$, and $\alpha$ is a parameter of the measure. In addition, in each iteration the data are projected from the high-dimensional space back to the original plane to calculate the centroid $c_i$ of each region, where $i$ indexes the $i$-th superpixel set, $i = \{1, 2, \ldots, K\}$, and $V_i$ denotes the $i$-th superpixel set on the original plane. Finally, we define $err$ as the error function of the superpixel iteration; the iteration has converged when $err < \rho$, where $\rho$ is a constant.

Superpixel Fuzzy Clustering
The segmentation of hyperspectral images using the superpixel method produces superpixels, denoted by $\{V_1, V_2, \ldots, V_K\}$, where $V_i = [s_1, s_2, \ldots, s_{n_i}]$, $s_j$ denotes the $j$-th pixel in the superpixel, and $n_i$ denotes the total number of pixels in $V_i$.
The mean feature $\bar{x}_i$ of each superpixel is obtained by mean filtering:

$$\bar{x}_i = \frac{1}{n_i} \sum_{j=1}^{n_i} s_j$$

The superpixel is treated as a whole, and its mean feature represents the features of all pixels inside it. However, relying solely on mean filtering may distort the data, making it difficult to reflect regional features accurately.
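The mean feature of every superpixel can be computed in one pass over the pixels. The sketch below assumes the segmentation is available as an integer label per pixel; the function name `superpixel_means` and the toy data are hypothetical.

```python
import numpy as np

def superpixel_means(pixels: np.ndarray, labels: np.ndarray, k: int) -> np.ndarray:
    """Mean spectrum of each superpixel.

    pixels: (N, b) array of spectra; labels: (N,) superpixel index in [0, k).
    """
    sums = np.zeros((k, pixels.shape[1]))
    np.add.at(sums, labels, pixels)            # accumulate spectra per superpixel
    counts = np.bincount(labels, minlength=k)  # number of pixels per superpixel
    return sums / np.maximum(counts, 1)[:, None]

toy_pixels = np.array([[1.0, 2.0], [3.0, 4.0], [10.0, 20.0]])
toy_labels = np.array([0, 0, 1])
toy_means = superpixel_means(toy_pixels, toy_labels, k=2)
```

Empty superpixels, if any, are returned as zero vectors rather than raising a division error.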
Therefore, considering the importance of the individual pixels $s_j$ in the superpixel, the weighted mean filtering method is used to extract the features of each superpixel. A pixel with higher spectral similarity to the centroid of the superpixel makes a greater contribution. The weighted mean feature $\tilde{x}_i$ of the $i$-th superpixel is computed from the pixels $s_j$ with weights $w_{s_j}$. However, the weighted mean feature is not smooth and may lead to increased noise in some cases. Taking this into account, we use feature fusion to combine the two features, the smoothness of the mean feature and the contribution of each internal pixel, into a new feature $Y_{K \times 2b} = [\bar{X}, \tilde{X}]$, where $\bar{X}_{K \times b} = \{\bar{x}_1; \bar{x}_2; \ldots; \bar{x}_K\}$ collects the mean features and $\tilde{X}_{K \times b} = \{\tilde{x}_1; \tilde{x}_2; \ldots; \tilde{x}_K\}$ collects the weighted mean features. The feature information redundancy is then reduced by PCA to obtain the representative feature $Y_{K \times b}$ of the superpixels.
After the representative superpixel features are obtained, they are used as the input of FCM to perform fuzzy clustering by superpixel. In the objective function $J$, $i = \{1, 2, \ldots, K\}$, $j = \{1, 2, \ldots, C\}$, $K$ is the number of superpixels, $C$ is the number of clusters, $a$ is the fuzzy coefficient, $u_{ij}$ denotes the fuzzy affiliation between the $i$-th superpixel and the $j$-th cluster center, $y_i$ denotes the representative feature of the $i$-th superpixel satisfying $y_i \in Y_{K \times b}$, and $o_j$ denotes the cluster center of the $j$-th class. Finally, we define $\varepsilon_k$ as the fuzzy affiliation error, where $k$ denotes the $k$-th iteration; $\varepsilon_k < \delta$ is the convergence condition, in which $\delta$ is a constant.
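The clustering stage can be sketched with the textbook fuzzy C-means updates. This is an assumption: the paper's objective may include superpixel-specific terms (for example, weighting by superpixel size) that are not reproduced here. The sketch minimizes $\sum_i \sum_j u_{ij}^a \lVert y_i - o_j \rVert^2$ and stops when the largest change in membership falls below `delta`; the function name `fcm` and its defaults are hypothetical.

```python
import numpy as np

def fcm(features, n_clusters, a=2.0, delta=1e-9, max_iter=300, seed=0):
    """Standard fuzzy C-means on (K, d) features. Returns (memberships, centres)."""
    rng = np.random.default_rng(seed)
    u = rng.random((features.shape[0], n_clusters))
    u /= u.sum(axis=1, keepdims=True)              # each row of memberships sums to 1
    for _ in range(max_iter):
        w = u ** a
        # Centre update: membership-weighted mean of the features.
        centres = (w.T @ features) / w.sum(axis=0)[:, None]
        # Squared distances between every feature and every centre.
        d2 = ((features[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)                 # avoid division by zero
        # Membership update: u_ij proportional to d_ij^(-2 / (a - 1)).
        inv = d2 ** (-1.0 / (a - 1.0))
        new_u = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(new_u - u).max() < delta
        u = new_u
        if converged:
            break
    return u, centres
```

A hard label for each superpixel is obtained with `u.argmax(axis=1)`. Note that the fuzzy coefficient `a` must be greater than 1 for the membership update to be defined.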

Dataset
The data used in this experiment are from the underwater hyperspectral dataset provided by Dumke et al. [35], which was collected using a new underwater hyperspectral imager (UHI) mounted on the ROV at a water depth of approximately 4200 m. The original dataset contained 112 bands of wavelengths from 378 to 805 nanometers. Dumke et al.

Experimental Setup
The dataset provides no labels, so a precise overall accuracy (OA) cannot be computed. Therefore, this paper follows the manual counting method of Dumke et al. [15] to determine the actual number of manganese nodules in each dataset and to calculate indicators such as the identification rate, the true positive rate, and the number of false positives.

Superpixel Analysis
Since the dataset used in this paper does not have superpixel labels, it is impossible to calculate the boundary recall rate, under-segmentation error, and other indicators of segmentation performance. Therefore, images are used to compare the segmentation performance of LMSLIC and MSLIC. First, we set $\rho = 10^{-9}$. Then, the initial parameters are obtained by a grid search, and the parameters with the highest classification accuracy are selected as the optimal parameters.
It can be seen from Figure 3 that the result for dataset No. 7 [35] is best when $K = 10000$, and the true positive rate decreases as the number of superpixels increases further. This is because too high a value of $K$ causes over-segmentation, so that the pixels of the same manganese nodule are divided into different superpixel sets. The number of elements in each set is thus reduced, making the features less smooth and affecting the classification accuracy. In addition, the superpixel algorithm presented in this paper automatically splits and merges some superpixels in the iterative process, leading to consistent parameter effects within a specific range. Therefore, the experimental interval was set to 3000.
The superpixel segmentation performance of each dataset is shown in Figure 4. It can be seen that LMSLIC segments the boundaries which were not identified in MSLIC, and the segmentation by LMSLIC is refined to a higher degree. Compared with MSLIC, the superpixel algorithm in this paper introduces a ULBP operator to construct the texture spectrum, which enriches the information of manganese nodules while improving the anti-interference ability. Therefore, it can better identify the boundary and is more sensitive to internal elements.

Cluster Analysis
In this paper, a grid search is performed on the number of clusters and the fuzzy coefficient, and the optimal parameters for each dataset are taken at the point where the value of the objective function $J$ reaches a stable transition. First, the parameter search range was set according to FCM. The search range of the number of clusters $C$ is (1, 11). The search range of the fuzzy coefficient $a$ is (1, 1.9). Second, the search step size was set in both directions. Finally, $C$ is taken as the main direction and $a$ as the secondary direction to search for the final parameters. In addition, we set $\delta = 10^{-9}$. The experimental results are shown in Figure 5.
It can be seen from Figure 5 that the curvature is largest when $a = 1.3$, $C = 5$. When the parameters $C$ and $a$ continue to increase, the slope of $J$ becomes flat, indicating that the parameters are closest to the actual parameters. Therefore, these are the optimal parameters for clustering.
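One simple way to locate such a transition point along the main search direction is to look for the sharpest bend of $J$ as a function of $C$. The sketch below uses the discrete second difference as a proxy for curvature; this proxy, the function name `elbow_index`, and the objective values are all assumptions for illustration, and the numbers are not results from the paper.

```python
import numpy as np

def elbow_index(values) -> int:
    """Index of the largest discrete second difference (sharpest bend) of a curve."""
    values = np.asarray(values, dtype=float)
    second_diff = values[:-2] - 2.0 * values[1:-1] + values[2:]
    return int(np.argmax(second_diff)) + 1  # second_diff[0] belongs to index 1

cluster_counts = [2, 3, 4, 5, 6, 7, 8]
# Hypothetical objective values for one fixed fuzzy coefficient.
j_values = [100.0, 70.0, 45.0, 22.0, 20.0, 18.5, 17.5]
best_c = cluster_counts[elbow_index(j_values)]
```

In a two-parameter search, the same function could be applied to each row of the grid, that is, once per value of the fuzzy coefficient.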
Since few classification methods exist for deep-sea hyperspectral data, we verify the classification performance of LMSLIC-FCM on manganese nodule data by comparing the proposed method with K-means, MSLIC-FCM, and DKFCM. The experimental results are listed in Table 1. Firstly, DKFCM is a manganese nodule classification algorithm proposed by Zhang et al. [16]. Secondly, K-means, as a basic clustering algorithm, is used to verify the effectiveness of LMSLIC-FCM. Finally, MSLIC-FCM is obtained by combining MSLIC [25] and superpixel-based fast fuzzy C-means [34], and is the basis of the improved method in this paper.
In addition, the clustering results of data No. 7 are shown in Figure 6. The red circles represent true nodules, the blue circles represent missed nodules, and the green circles represent false nodules.
The original hyperspectral pseudo-RGB image and the clustering result of LMSLIC-FCM are compared in Figure 6. We manually calibrated 17 real nodules by combining pseudo-RGB images with video data. These nodules are also accurately marked in the clustered image. A total of 20 nodules were detected by this method, of which 16 were correct and 4 were falsely detected, while 1 real nodule was missed. Therefore, this method can effectively identify manganese nodules. Despite the false detections and the missed nodule, their number remains within acceptable limits compared with the number of correctly identified nodules.
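The counts reported for dataset No. 7 can be turned into rates with a few lines of code. The true positive rate below follows the usual definition (correct detections divided by real nodules). The section does not define the identification rate, so the sketch reports precision (correct detections divided by all detections) as a separate quantity; treating the two as related is an assumption, and the function name `detection_rates` is hypothetical.

```python
def detection_rates(n_true: int, n_detected: int, n_correct: int) -> dict:
    """Rates derived from manual counts of nodules."""
    return {
        "true_positive_rate": n_correct / n_true,  # correct detections / real nodules
        "precision": n_correct / n_detected,       # correct detections / all detections
        "false_positives": n_detected - n_correct,
        "missed": n_true - n_correct,
    }

# Counts for dataset No. 7: 17 real nodules, 20 detections, 16 of them correct.
rates = detection_rates(n_true=17, n_detected=20, n_correct=16)
```

With these counts the sketch gives 4 false positives and 1 missed nodule, matching the figures stated in the text.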
Figure 7 shows the experimental indicators of the four classification methods for each dataset. It can be seen that although the true positive rate of K-means is not much different from that of the other methods, its identification rate is much lower. Because K-means only considers the similarity between single spectra, it is susceptible to noise interference, leading to many misclassifications.
In addition, combined with Table 1, it can be concluded that the clustering performance of K-means is poor, and the number of false positives in each dataset is more than 10, which is much more than other methods.
The other three methods consider not only the spectral information of nodules but also the impact of spatial structure on the classification performance, using spatial-spectral information to improve the classification accuracy of nodules. The algorithm in this paper and MSLIC-FCM belong to superpixel fuzzy clustering, while DKFCM is a deep network-based clustering algorithm. As Table 1 shows, the superpixel-based algorithms have fewer false positives. This is because DKFCM extracts features through a fixed convolution window and does not consider changes in spatial structure, whereas the superpixel algorithms adapt the spatial neighborhood and make better use of the spatial structure of manganese nodules. However, DKFCM can extract more abstract information by using the dimensionality reduction feature of RPnet [36] and the fusion of deep and shallow features, reducing the possibility of missing nodules compared to manually constructed features. Because ULBP alleviates illumination interference and the adaptive neighborhood smooths noise, the results show that the proposed algorithm is generally better than the previous algorithms, and the number of missed detections is also reduced compared with MSLIC-FCM.
In order to visually verify the classification effect of the algorithm in this paper, Figure 8 shows the comparison of the clustering of different algorithms and LMSLIC-FCM. The classification results in the red box represent the most obvious differences between the algorithms.
It can be seen from Figure 8 that there are a lot of false detections in the image obtained by K-means, especially in the red box. Although the nodules are relatively complete and the number of false positives is relatively small, MSLIC-FCM still identifies most of the areas in the red box as nodules. Although DKFCM has few false positives, a small part of the nodules identified only have a few single pixel points that are not complete. Over-segmentation of some nodules results in identified nodules larger than the actual nodule size. The image obtained by the method in this paper is relatively complete and has fewer false positives, especially in the red box. Therefore, this method can more accurately identify manganese nodules.

Conclusions
In order to utilize hyperspectral technology for deep-sea manganese nodule exploration, an unsupervised classification method, LMSLIC-FCM, suitable for deep-sea hyperspectral data is proposed in this paper. The method fully considers the particularity of deep-sea hyperspectral data and has good accuracy in practical applications.
This method integrates ULBP into the MSLIC algorithm, which gives the LMSLIC algorithm texture perception capability and alleviates the error caused by uneven seabed illumination. In addition, using LMSLIC-FCM as a classifier effectively addresses the difficulty of obtaining deep-sea labels. The experimental results show that LMSLIC has a better segmentation performance than the original MSLIC algorithm and can process deep-sea data more effectively. LMSLIC-FCM shows strong classification performance, which indicates that this method can explore the internal connection of data and reduce external interference. Introducing the texture spectrum into superpixel segmentation yields a better adaptive spatial structure and alleviates noise caused by illumination, which is a significant advantage for deep-sea data. However, manual selection of suitable superpixel parameters is still difficult. In addition, the generality of the manually selected features is insufficient. Therefore, in future work we may try to automatically estimate superpixel parameters and to use deep learning methods to extract deep features automatically, in order to increase the robustness of the proposed method.