1. Introduction
China’s coastline stretches 18,000 km, and its expansive maritime territory makes it a major maritime nation. However, China’s seas have distinctive characteristics: most waters are nearshore and relatively shallow, with areas within 60 nautical miles of the coast and no deeper than 100 m accounting for 98.5% of the country’s total maritime area. As China’s comprehensive national strength continues to grow, its exploration and understanding of the ocean have deepened significantly, and the rapid expansion of the marine economy plays an increasingly important role in overall economic growth. The vast ocean harbors valuable mineral resources, abundant biological resources, and marine chemical resources, and advances in marine resource exploitation technologies are transforming this latent potential into new drivers of economic growth. This process is vital for addressing major societal challenges and advancing the great rejuvenation of the Chinese nation. At the same time, the expansive ocean provides significant military-strategic advantages but also imposes higher demands on the security, defense, and resource protection of maritime territories. Economic activities such as seabed mineral exploration, oil drilling platform monitoring, and fisheries resource tracking rely heavily on precise underwater target classification technology. Therefore, both the development of the marine economy and the safeguarding of national sovereignty and territorial integrity drive researchers to continuously improve the accuracy and efficiency of underwater target classification.
As one of the core technologies for marine resource exploration, ecological monitoring, and autonomous navigation of underwater robots, underwater image classification has become increasingly prominent as marine development has deepened in recent years. However, the complexity and specificity of underwater environments—such as light attenuation, scattering effects, suspended particle interference, and low contrast—pose serious challenges to traditional image processing methods in feature extraction and classification accuracy.
Traditional underwater image classification research mainly relies on manually engineered features to describe image content, such as the Scale-Invariant Feature Transform (SIFT) [1] and the Histogram of Oriented Gradients (HOG) [2], combined with classifiers such as the nearest-neighbor classifier, the Support Vector Machine (SVM) [3], and Random Forest [4] for recognition and classification. Although these methods are simple and effective, they are usually limited by the hand-crafted features and suffer from weak generalization ability and sensitivity to environmental changes.
In recent years, learning methods represented by deep learning, especially convolutional neural networks (CNNs), have significantly advanced the performance of underwater image classification. CNNs are a powerful tool in image and video classification [5], and with their strong nonlinear feature representation they unify feature extraction and classification decision-making within a single model [6,7]. To address the uneven illumination and color shift specific to underwater images, researchers have proposed enhanced CNN structures dedicated to underwater image classification. For example, Li et al. [8] combined an image enhancement module with a CNN structure to significantly improve classification accuracy. Fu et al. proposed a detection method for small underwater targets: candidate regions were first extracted based on Markov random field segmentation, after which Hu moment features were computed for potential target areas and fused with a convolutional neural network, thereby enhancing the characterization and recognition of small targets [9]. Wang et al. improved upon the CenterNet framework by integrating semantic and spatial information within the network, enhancing the multi-scale consistency of feature representation and thereby improving detection performance for multi-scale underwater targets [10]. Cai et al. employed the YOLOv3 object detection algorithm to achieve precise identification and counting of Takifugu rubripes, effectively improving the recognition accuracy and practical applicability for specific aquatic species [11]. Wang et al. proposed YOLOv7-PSS, an improved algorithm based on YOLOv7: structural optimization reduced the model’s parameter count while improving training and inference efficiency, and the SIoU loss function was introduced to accelerate network convergence, optimizing the overall training process and detection performance [12]. Yuan et al. developed a fish target detection method based on Faster R-CNN and implemented a secondary transfer learning strategy for network training, strengthening the model’s adaptability and generalization across various underwater scenarios [13]. Chen et al. introduced a fine-grained feature-aware detection method for underwater salps, enhancing the expression of key cues through cross-dimensional feature interaction and fusion, thereby reducing missed detections caused by occlusion among underwater organisms and improving detection reliability [14]. While deep learning models can provide superior recognition accuracy, they demand substantial computational resources and often face operational challenges on resource-constrained devices. By comparison, traditional machine learning is known for fast training, small model sizes, and convenient deployment, and can operate without high-end hardware; classification models based on traditional machine learning therefore retain strong application value and research significance in resource-constrained underwater image classification scenarios.
Machine learning algorithms, as useful decision-making tools, are widely used throughout society [15]. With the continuous development of machine learning methods, traditional machine learning models have been widely applied to underwater image classification. Early research mainly relied on combining feature extraction with an underlying classification model, such as the K-Nearest Neighbors (KNN) algorithm [16] or the Decision Tree (DT) [17]: image features are extracted by hand and then classified with one of these models. KNN is commonly used to solve classification and regression problems [18]. It discriminates categories based on Euclidean distance, which is easy to implement but sensitive to noise and sample distribution; the DT model offers strong interpretability and adaptability but is prone to overfitting and performs poorly on complex nonlinear boundaries. To address these shortcomings, the Support Vector Machine (SVM) gradually became a mainstream model in underwater image classification. The SVM uses the data points closest to the separating hyperplane, known as support vectors, to guide hyperplane placement [19]. By introducing a kernel function, the SVM maps input features into a high-dimensional feature space and constructs an optimal maximum-margin hyperplane; it is robust and especially suitable for high-dimensional, small-sample, nonlinear classification problems. Facing underwater images with uneven illumination, color distortion, and background noise, the SVM shows strong discriminative ability and is considered one of the best-performing classifiers in traditional machine learning. However, directly using high-dimensional image features often leads to data redundancy, degraded classification performance, and increased computational complexity, because underwater images are generally affected by light attenuation, color distortion, and background complexity [20]. To alleviate these problems, researchers have widely introduced feature dimensionality reduction techniques, which improve model performance by reducing the data dimensions. Commonly used methods include Linear Discriminant Analysis (LDA) and manifold learning methods such as Locally Linear Embedding (LLE) and Isometric Mapping (Isomap). LDA implements supervised dimensionality reduction by maximizing inter-class differences while minimizing intra-class differences; however, it relies on strong distributional assumptions, namely that the samples of each class follow a Gaussian distribution with similar covariance structure, which rarely holds in complex and variable underwater environments, limiting its classification performance [21]. Manifold learning methods perform nonlinear dimensionality reduction by preserving the local geometric structure of the data. Although they can effectively capture nonlinear structure, they are sensitive to noise and computationally expensive, which can lead to poor classification performance in underwater environments with severe noise interference [22].
Compared with the above methods, principal component analysis (PCA) has received extensive attention in underwater image classification in recent years due to its algorithmic simplicity, computational efficiency, and robustness to noise [23]. PCA was initially used to extract latent features and minimize redundancy [24]. PCA projects the original data into a low-dimensional space through an orthogonal transformation that maximizes the retained variance, yielding the features that carry the most important information about the data [25]. In other words, PCA reduces high-dimensional data to a lower number of dimensions while retaining the important information that explains the original data [26]; it effectively removes redundant information and reduces feature dimensionality, thereby improving classifier performance [27]. Kent et al. [28] achieved a classification accuracy of approximately 90% for occupants’ overall spatial satisfaction by combining PCA and SVM. However, because underwater image data usually exhibits pronounced nonlinear characteristics, the performance of traditional linear PCA is limited on data with nonlinear manifold structure. Nonlinear dimensionality reduction methods such as kernel principal component analysis (KPCA) have therefore been gradually introduced into underwater image classification [29]. Although KPCA partially addresses nonlinearity in the data, its ability to characterize local detail is still limited by the original feature representation. To enhance the description of local image structure, this paper introduces the local binary pattern (LBP) for texture feature extraction [30]. LBP is a texture operator based on pixel grayscale differences, offering computational simplicity, robustness to illumination changes, and effective characterization of local microstructures. Combining LBP with KPCA enhances the representation of image detail while retaining nonlinear mapping capability, further improving the performance of subsequent classifiers.
Meanwhile, as a widely used classifier in underwater image classification tasks, the Support Vector Machine (SVM) depends heavily on hyperparameter optimization. Traditional parameter optimization methods such as grid search with cross-validation, although simple, are computationally expensive and prone to local optima, making it difficult to meet the real-time and robustness demands of underwater image classification. To address this problem, this paper introduces the sparrow search algorithm (SSA) to optimize the SVM hyperparameters. SSA is a recently proposed swarm intelligence optimization algorithm inspired by the foraging behavior of sparrows; it offers strong global search capability, fast convergence, and simple implementation [31]. To further improve the convergence stability and local exploration ability of SSA in complex search spaces, this paper introduces three optimization strategies on top of the original SSA framework, namely dynamic weighting, boundary contraction, and adaptive mutation, which enhance SSA’s performance on high-dimensional nonlinear optimization problems. Combining the improved SSA with the SVM significantly improves the model’s classification performance on complex images, especially underwater scenes with complex feature distributions.
Based on the above background, this paper proposes an underwater image classification framework that integrates LBP texture feature extraction, KPCA nonlinear dimensionality reduction, and SSA-based optimization, in order to address the dual bottlenecks of existing methods in feature extraction and parameter optimization. First, the local texture histogram of the image is extracted with the LBP algorithm, and the high-dimensional features are reduced nonlinearly with KPCA to eliminate redundant information; second, an improved SSA is designed, which incorporates optimization strategies such as dynamic weights and boundary contraction into the search for optimal SVM parameters, further improving classification accuracy. Experimental results show that, compared with other classification models, the proposed method effectively improves the accuracy of underwater image classification, demonstrating its feasibility and effectiveness and providing a practical solution for underwater image classification.
3. Materials and Methods
The flow of the proposed underwater image classification algorithm based on LBP-KPCA combined with the SSA-SVM method is shown in Figure 1: image preprocessing is performed on the underwater dataset, the LBP features of the images are then extracted and passed to KPCA for dimensionality reduction, and finally classification is performed with the SSA-optimized SVM.
3.1. Feature Extraction Method by Fusing LBP-KPCA
In the underwater image classification task, the complexity of the underwater environment causes serious distortion and degradation of images, so traditional feature extraction methods often struggle to capture their essential features accurately. The core idea of the local binary pattern (LBP), a texture descriptor widely used in computer vision, is to encode local texture information by comparing the value of a center pixel with those of its neighboring pixels; it offers rotational invariance and grayscale invariance as well as high computational efficiency. Therefore, this paper designs a feature extraction framework based on LBP-KPCA, which combines the texture description capability of LBP with the nonlinear dimensionality reduction of KPCA to extract underwater image features efficiently. For the center pixel located at (x_c, y_c) in the image, the LBP feature value is defined as

LBP_{P,R}(x_c, y_c) = \sum_{p=0}^{P-1} s(g_p - g_c) \, 2^p,

where P denotes the number of neighborhood sampling points, R denotes the sampling radius, g_c is the grayscale value of the center pixel, g_p is the grayscale value of the p-th neighboring pixel, and s(x) is the threshold function:

s(x) = \begin{cases} 1, & x \ge 0, \\ 0, & x < 0. \end{cases}
In practical applications, considering that the texture features of underwater images at different scales and orientations show significant local similarity and directional consistency, this paper adopts the uniform-pattern LBP operator, which filters the main structural patterns by counting the number of 0/1 and 1/0 transitions in the LBP code. When the number of transitions does not exceed 2, the pattern is considered uniform [33]; this not only significantly reduces the feature dimensionality but also preserves more than 90% of the texture information in the image. Discriminative texture descriptors are finally obtained by histogram statistics and normalization of the LBP feature maps. The U value is calculated as

U(LBP_{P,R}) = |s(g_{P-1} - g_c) - s(g_0 - g_c)| + \sum_{p=1}^{P-1} |s(g_p - g_c) - s(g_{p-1} - g_c)|.
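To make the uniform-pattern computation concrete, the following is a minimal numpy sketch of a rotation-invariant uniform LBP histogram for P = 8, R = 1 (the function name and the pooling of all non-uniform patterns into a single bin are illustrative assumptions; an optimized library implementation would normally be used in practice):

```python
import numpy as np

def uniform_lbp_hist(gray, P=8):
    """Rotation-invariant uniform LBP_{8,1} histogram (P + 2 bins).

    Uniform patterns (at most two 0/1 transitions around the circle) are
    labeled by their number of ones; all other patterns share one bin.
    """
    g = np.asarray(gray, dtype=np.float64)
    # the eight R = 1 neighbors, in circular order
    offs = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
            (1, 1), (1, 0), (1, -1), (0, -1)]
    H, W = g.shape
    c = g[1:-1, 1:-1]
    bits = np.stack(
        [(g[1 + dy:H - 1 + dy, 1 + dx:W - 1 + dx] >= c) for dy, dx in offs],
        axis=-1).astype(np.int64)
    # U value: number of circular 0/1 transitions in the 8-bit pattern
    ring = np.concatenate([bits, bits[..., :1]], axis=-1)
    trans = np.abs(np.diff(ring, axis=-1)).sum(axis=-1)
    ones = bits.sum(axis=-1)
    labels = np.where(trans <= 2, ones, P + 1)   # non-uniform -> bin P + 1
    hist = np.bincount(labels.ravel(), minlength=P + 2).astype(np.float64)
    return hist / hist.sum()
```

For P = 8 this yields a 10-bin normalized descriptor, one bin per uniform label plus one shared non-uniform bin.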
After extracting texture features with LBP, and considering that complex nonlinear relationships may exist among underwater image features, this paper introduces kernel principal component analysis (KPCA), which maps the data from the input space into a high-dimensional feature space through a kernel function so as to capture the nonlinear structure of the data. Given N d-dimensional sample vectors x_1, \dots, x_N, the samples are mapped into the feature space by the implicit mapping \varphi(\cdot) associated with the kernel function k(x_i, x_j) = \varphi(x_i)^{\top} \varphi(x_j). The radial basis function (RBF) kernel chosen in this paper has good nonlinear mapping ability and mathematical properties:

k(x_i, x_j) = \exp(-\gamma \| x_i - x_j \|^2),

where \gamma is the bandwidth parameter of the RBF kernel, controlling the radial range of action of the Gaussian kernel. By solving the eigenvalue equation of the centered kernel matrix \tilde{K},

\tilde{K} \alpha = \lambda \alpha,

the eigenvectors \alpha and the corresponding eigenvalues \lambda can be obtained. In this paper, KPCA is used not only to reduce the feature dimensionality but, more importantly, to enhance the discriminative ability of the LBP features through nonlinear mapping. The resulting orthogonal eigenvectors form a basis of the feature space, and the original features can be projected into a k-dimensional space by selecting the eigenvectors corresponding to the k largest eigenvalues:

y_j(x) = \sum_{i=1}^{N} \alpha_i^{(j)} \, k(x_i, x), \quad j = 1, \dots, k.
In KPCA, we employ the RBF kernel and set \gamma = 0.1 as a fixed parameter. This parameter is not included in the SSA optimization because our hyperparameter search is primarily devoted to the downstream RBF-SVM (the penalty factor C and the SVM kernel parameter), which directly governs the classification margin and typically exhibits higher performance sensitivity. Jointly optimizing the KPCA bandwidth would significantly enlarge the search space and increase training cost. Moreover, the KPCA input in our pipeline is a normalized LBP histogram feature. For L1-normalized histograms, the squared Euclidean distance between two histograms is bounded by 4, and with \gamma = 0.1 the RBF kernel value remains in the non-saturated range [e^{-0.4}, 1] \approx [0.67, 1], avoiding degenerate similarity structures (nearly constant or overly localized kernels) and enabling KPCA to extract meaningful nonlinear components before SVM classification.
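The non-saturation argument can be checked numerically; the sketch below (random L1-normalized histograms standing in for real LBP features) verifies that with γ = 0.1 every pairwise kernel value stays within [e^{-0.4}, 1]:

```python
import numpy as np

rng = np.random.default_rng(0)
gamma = 0.1

# random L1-normalized "histograms" as stand-ins for LBP features
Hst = rng.random((50, 10))
Hst /= Hst.sum(axis=1, keepdims=True)

# pairwise squared Euclidean distances and RBF kernel values
sq = ((Hst[:, None, :] - Hst[None, :, :]) ** 2).sum(axis=-1)
K = np.exp(-gamma * sq)

# since ||h1 - h2||_2^2 <= ||h1 - h2||_1^2 <= 4 for L1-normalized h1, h2
lower = np.exp(-gamma * 4.0)
assert sq.max() <= 4.0
assert K.min() >= lower and K.max() <= 1.0
```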
The specific underwater image feature extraction process is shown in Figure 2. The process adopts a multi-layer architecture, constructing a complete feature extraction and optimization system from raw data input to feature dimension compression. First, in the data input layer, the system performs format validation and quality screening on the raw image data from four categories (sea urchin, fish, rock, and scallop) to ensure the validity and usability of the input data in subsequent processing. Second, the preprocessing layer converts BGR images to grayscale and standardizes their size (256 × 256), while also applying enhancement techniques such as noise suppression and contrast enhancement; pixel values are mapped to the [0, 1] interval via min-max normalization. The feature extraction layer then extracts texture features with the uniform-pattern LBP operator, computes the feature histogram over n_points + 3 intervals, and normalizes it to ensure feature stability. Finally, in the KPCA dimensionality reduction layer, a nonlinear mapping is performed with the RBF kernel function (γ = 0.1) and the features are reduced to 8 dimensions by principal component analysis. The final output is a feature matrix of shape (N, 8), and the whole process ensures the reliability and effectiveness of feature extraction through a quality feedback loop.
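The KPCA reduction step at the end of this pipeline can be sketched in plain numpy as follows (a self-contained illustration with synthetic histograms; the real pipeline feeds in the normalized LBP histograms described above):

```python
import numpy as np

def kpca_rbf(X, n_components=8, gamma=0.1):
    """Minimal numpy KPCA with an RBF kernel.

    X: (N, d) feature matrix (e.g. normalized LBP histograms).
    Returns the (N, n_components) projection onto the leading components.
    """
    sq = np.sum(X ** 2, axis=1)
    K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2.0 * X @ X.T))
    N = K.shape[0]
    One = np.full((N, N), 1.0 / N)
    Kc = K - One @ K - K @ One + One @ K @ One   # center in feature space
    w, V = np.linalg.eigh(Kc)                    # ascending eigenvalues
    idx = np.argsort(w)[::-1][:n_components]     # keep the k largest
    w, V = w[idx], V[:, idx]
    alphas = V / np.sqrt(np.maximum(w, 1e-12))   # normalize eigenvectors
    return Kc @ alphas                           # projected features (N, k)

# e.g. 40 synthetic 10-bin histograms reduced to 8 dimensions
rng = np.random.default_rng(1)
H = rng.random((40, 10))
H /= H.sum(axis=1, keepdims=True)
Z = kpca_rbf(H, n_components=8, gamma=0.1)
```

The output matches the (N, 8) feature-matrix shape described above for the reduction layer.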
3.2. SVM Parameter Optimization Method Based on Improved SSA
As a classical statistical learning method, the SVM owes its classification performance largely to the selection of the kernel parameters and the penalty factor, while traditional tuning methods often suffer from computational inefficiency and convergence to local optima. This section therefore discusses in depth an SVM parameter optimization mechanism based on the sparrow search algorithm (SSA): by constructing an adaptive, convergent optimization framework, the classifier’s performance is significantly improved, and the convergence and stability of the algorithm are analyzed at both theoretical and practical levels.
In order to fully leverage the classification performance of the SVM, this paper designs a parameter optimization framework based on SSA. First, within the discoverer update, we design a staged dynamic position update mechanism and employ nonlinear decay and periodic modulation of the control parameters to realize an adaptive regulation strategy characterized by strong exploration in the early phase and strong convergence in the later phase. This mechanism enhances the coverage of the global search while improving the stability of late-stage convergence. Second, we incorporate boundary contraction and gradient-guided local refinement, enabling more directional exploitation within the neighborhood of promising optima and thereby improving the granularity of the resulting parameter solutions. Third, Lévy flight mutation is introduced to provide long-range jumps that strengthen the ability to escape local optima, preventing the search from being trapped in spurious basins under noisy perturbations. Finally, a population-density-based adaptive control is adopted to dynamically perceive and adjust diversity, so that the balance between exploration and exploitation no longer relies solely on the iteration index but is adaptively tuned according to the current search state.
In the optimization process of SSA, by simulating the foraging behavior and vigilance mechanism of natural sparrow populations, this paper constructs an efficient parameter search framework. In this framework, the finder’s update strategy introduces a staged dynamic position update mechanism, formulated as

X_{i,j}^{t+1} = \begin{cases} X_{i,j}^{t} \cdot \exp(-t / (\alpha T)) + c(t) \, v_{i,j}^{t}, & t \le T/2, \\ X_{i,j}^{t} + r \, (P_{i,j}^{t} - X_{i,j}^{t}) + (1 - r) \, (\bar{X}_{j}^{t} - X_{i,j}^{t}), & t > T/2, \end{cases}

where T is the maximum number of iterations, v_i^t is the velocity term, P_i^t is the individual best position, \bar{X}^t is the population mean position, and r \in (0, 1) is a random number. This updating strategy reflects a hierarchical pattern of behavior in sparrow populations: in the early stages it relies mainly on the exponential decay factor together with the velocity term for large-scale global exploration, while in the later stages it converges toward the individual optimum and the group average position for local refinement. This position update mechanism not only ensures the convergence of the algorithm but also effectively avoids falling into local optima.
To further enhance the dynamic regulation of the algorithm, a time-dependent adaptive parameter adjustment mechanism is introduced in the following form:

\alpha(t) = \alpha_0 \exp(-k t / T), \quad c(t) = c_0 \, \frac{1 + \cos(2 \pi t / T)}{2}.

These adjustment functions embody the optimization idea of early exploration and late convergence: the parameters decay nonlinearly with the number of iterations or fluctuate periodically, which helps adjust the search intensity and scope at different stages of the algorithm and thus balances the search process.
To strengthen the local search capability, this paper designs a fine search strategy combined with gradient guidance, formulated as

X_{i}^{t+1} = X_{i}^{t} - \eta_t \, \nabla f(X_{i}^{t}) + \mathcal{N}(0, \sigma_t^2),

where \eta_t is the adaptive step factor, \nabla f is the gradient estimate of the fitness function, and \sigma_t^2 is the dynamically adjusted variance of the Gaussian perturbation. Through this multi-level optimization mechanism, the algorithm achieves efficient and refined exploration within the neighborhood of the optimal solution while maintaining strong global search capability.
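Since the paper does not specify how the fitness gradient is estimated, the sketch below uses a central-difference surrogate for ∇f together with the Gaussian perturbation term (the step size, variance, and demo fitness are illustrative assumptions):

```python
import numpy as np

def fd_gradient(f, x, eps=1e-4):
    """Central-difference surrogate for the fitness gradient."""
    g = np.zeros_like(x, dtype=float)
    for i in range(x.size):
        e = np.zeros_like(x, dtype=float)
        e[i] = eps
        g[i] = (f(x + e) - f(x - e)) / (2.0 * eps)
    return g

def refine(f, x, eta=0.05, sigma=0.01, rng=None):
    """One gradient-guided local step with a Gaussian perturbation."""
    if rng is None:
        rng = np.random.default_rng(2)
    return x - eta * fd_gradient(f, x) + rng.normal(0.0, sigma, size=x.shape)

# demo on a simple convex fitness: repeated steps move x toward the minimum
sphere = lambda x: float(np.sum(x ** 2))
rng = np.random.default_rng(2)
x = np.array([0.8, -0.6])
for _ in range(50):
    x = refine(sphere, x, rng=rng)
```

In the actual optimizer the fitness would be the (negated) cross-validated SVM accuracy rather than this toy function.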
To provide broader exploration of the parameter space while maintaining population diversity, the algorithm further introduces a Lévy flight mechanism, which enhances its ability to jump out of local optima:

X_{i}^{t+1} = X_{i}^{t} + \alpha \otimes \mathrm{L\acute{e}vy}(\beta),

where \mathrm{L\acute{e}vy}(\beta) denotes the Lévy flight step and \otimes denotes the element-wise product. This mechanism generates long jumps through a heavy-tailed, non-Gaussian distribution, which helps break out of local convergence.
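A Lévy step is commonly generated with Mantegna's algorithm; the sketch below uses β = 1.5, a common default (the paper's exact settings are not stated, so the constants here are assumptions):

```python
import numpy as np
from math import gamma as gamma_fn, sin, pi

def levy_step(dim, beta=1.5, rng=None):
    """Heavy-tailed Lévy flight step via Mantegna's algorithm."""
    if rng is None:
        rng = np.random.default_rng(3)
    sigma_u = (gamma_fn(1.0 + beta) * sin(pi * beta / 2.0)
               / (gamma_fn((1.0 + beta) / 2.0) * beta
                  * 2.0 ** ((beta - 1.0) / 2.0))) ** (1.0 / beta)
    u = rng.normal(0.0, sigma_u, size=dim)   # numerator: scaled Gaussian
    v = rng.normal(0.0, 1.0, size=dim)       # denominator: standard Gaussian
    return u / np.abs(v) ** (1.0 / beta)

# element-wise scaled jump applied to a current position
x = np.array([1.0, 1.0])
x_new = x + 0.01 * levy_step(2)
```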
To dynamically balance global exploration and local exploitation, this paper also introduces an adaptive control mechanism based on population density:

\rho^{t} = \frac{1}{N} \sum_{i=1}^{N} \| X_{i}^{t} - \bar{X}^{t} \|, \quad \xi^{t} = \xi_{\max} - (\xi_{\max} - \xi_{\min}) \, \rho^{t} / \rho^{0},

where \rho^{t} denotes the population density and \xi^{t} is the adaptive control parameter. By dynamically sensing population diversity, the search intensity and direction are adjusted, allowing a finer balance between exploration and exploitation.
The SSA optimization process proposed in this paper mainly consists of four core components: parameter space definition, discoverer update, follower update, and sentinel update. The parameter space covers three key optimization variables, namely the penalty factor C, the kernel parameter γ, and the kernel type; the search ranges are set to C ∈ [2^{-5}, 2^{10}] and γ ∈ [2^{-15}, 2^{-5}].
As shown in Figure 3, in the discoverer update phase the algorithm explores the optimum through a combination of global and local search strategies: the global search ensures broad coverage of the search space, while the local search focuses on fine-grained exploration of regions already found to contain high-quality solutions. The follower update mechanism is mainly responsible for tracking the best position and avoiding the worst position; this dual mechanism continuously optimizes the overall distribution of the population. Sentinel updating, in turn, ensures the stability and effectiveness of the search process through safety checks and boundary control.
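Putting the stages together, the following is a compact, simplified sketch of the SSA search loop over the stated ranges, worked in log2 space. A toy quadratic surrogate stands in for the cross-validated SVM fitness, and the staged, gradient-guided, and Lévy-flight refinements described above are omitted for brevity; population sizes and step constants are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)
LO = np.array([-5.0, -15.0])   # log2 lower bounds: C >= 2^-5, gamma >= 2^-15
HI = np.array([10.0, -5.0])    # log2 upper bounds: C <= 2^10, gamma <= 2^-5

def fitness(p):
    """Toy surrogate to minimize; in practice this would be the negative
    CV accuracy of an SVM trained with C = 2**p[0], gamma = 2**p[1]."""
    return float(np.sum((p - np.array([5.0, -10.0])) ** 2))

N, T, n_disc = 30, 100, 6
X = LO + rng.random((N, 2)) * (HI - LO)
for t in range(T):
    f = np.array([fitness(x) for x in X])
    X = X[np.argsort(f)]                  # sort population by fitness
    best = X[0].copy()
    # discoverers: exponential-decay move (exploration) or Gaussian drift
    for i in range(n_disc):
        if rng.random() < 0.8:
            X[i] = X[i] * np.exp(-(i + 1) / ((rng.random() + 1e-9) * T))
        else:
            X[i] = X[i] + rng.normal(size=2)
    # followers: jitter around the best discoverer found so far
    for i in range(n_disc, N):
        X[i] = best + 0.5 * np.abs(X[i] - best) * rng.normal(size=2)
    X = np.clip(X, LO, HI)                # sentinel stage: boundary control
C_opt, gamma_opt = 2.0 ** best[0], 2.0 ** best[1]
```

Decoding the final position with `2.0 ** best` yields the (C, γ) pair handed to the SVM for final training.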