Seafloor Sediment Classification Using Small-Sample Multi-Beam Data Based on Convolutional Neural Networks

Ma, Haibo; Lai, Xianghua; Hu, Taojun; Fu, Xiaoming; Zhang, Xingwei; Song, Sheng

doi:10.3390/jmse13040671

by Haibo Ma ¹, Xianghua Lai ¹,²,*, Taojun Hu ¹,², Xiaoming Fu ¹, Xingwei Zhang ¹ and Sheng Song ¹

¹ Second Institute of Oceanography, Ministry of Natural Resources, Hangzhou 310012, China

² Key Laboratory of Nearshore Engineering Environment and Ecological Security of Zhejiang Province, Hangzhou 310012, China

* Author to whom correspondence should be addressed.

J. Mar. Sci. Eng. 2025, 13(4), 671; https://doi.org/10.3390/jmse13040671

Submission received: 1 February 2025 / Revised: 1 March 2025 / Accepted: 12 March 2025 / Published: 27 March 2025

(This article belongs to the Section Ocean Engineering)

Abstract

Accurate, rapid, and automatic seafloor sediment classification represents a crucial challenge in marine sediment research. To address this, our study proposes a seafloor sediment classification method integrating convolutional neural networks (CNNs) with small-sample multi-beam backscatter data. We implemented four CNN architectures for classification (LeNet, AlexNet, GoogLeNet, and VGG), all achieving an overall accuracy exceeding 92%. To overcome the scarcity of seafloor sediment acoustic image data, we applied a deep convolutional generative adversarial network (DCGAN) for data augmentation, incorporating an inverse normalization and inverse standardization module into the original DCGAN framework. Comparison of the generated and original datasets by visual inspection and gray level co-occurrence matrix (GLCM) features showed that this modification substantially enhanced the similarity between synthetic and authentic images. Subsequent model training using the augmented dataset demonstrated improved classification performance across all architectures: LeNet showed a 1.88% accuracy increase, AlexNet an increase of 1.06%, GoogLeNet an increase of 2.59%, and VGG an increase of 2.97%.

Keywords:

seafloor sediment classification; CNN; DCGAN

1. Introduction

Seafloor substrate is a crucial component of marine geographical information, and seafloor substrate classification plays an important role in marine engineering, seabed mineral resource development, and marine habitat research [1]. In marine engineering and infrastructure development, preliminary investigations of seabed substrate conditions in project areas are essential. For instance, during submarine cable or fiber-optic cable installation, sandy or gravelly substrates are more suitable for cable burial due to their mechanical stability and trenching feasibility, whereas rocky terrains require avoidance or specialized engineering treatments to mitigate abrasion risks. In offshore hydrocarbon and mineral exploration, substrate composition critically influences resource distribution patterns. Sandy sediments, characterized by high porosity and permeability, often serve as hydrocarbon reservoirs, while cohesive clay layers may act as impermeable cap rocks, sealing hydrocarbon accumulations. Similarly, the distribution of marine metallic mineral resources, such as polymetallic nodules and rare-earth-enriched muds, exhibits strong correlations with substrate types and depositional environments. Regarding marine ecological conservation, seabed substrate heterogeneity directly governs the spatial distribution of benthic ecosystems. Specific substrates support distinct biological communities: coral reefs thrive on hard substrates (e.g., bedrock or biogenic carbonates), mollusk populations dominate muddy sediments, and seagrass beds preferentially colonize sandy or mixed substrates. Systematic substrate mapping thus provides a scientific basis for delineating ecologically sensitive zones and establishing marine protected areas.

Traditional substrate classification methods, such as coring or grab sampling, involve field sampling, which not only incurs high costs and takes considerable time but also struggles to cover large areas of the seafloor. Because different substrate types respond differently to acoustic signals, advances in seafloor acoustic detection technologies now allow the rapid acquisition of a large amount of seafloor substrate information. As a result, substrate classification based on seafloor acoustic data has become an active topic in seafloor substrate classification research [2], and many researchers have demonstrated that seafloor acoustic data can be used effectively for classifying seafloor sediments.

Common seafloor acoustic detection methods include multi-beam echo sounding, side-scan sonar, and sub-bottom profiling. Both multi-beam echo sounding and side-scan sonar can obtain seafloor depth values and backscatter intensity values for multiple measurement points within a strip-covered area. Compared to side-scan sonar, multi-beam technology can provide more accurate geographical location data, while sub-bottom profiling offers a smaller data coverage range. The principle of seafloor substrate classification based on multi-beam backscatter data is that different seafloor substrate types can reflect varying backscatter intensities. By using the backscatter data, a backscatter intensity grayscale image can be constructed. Seafloor substrate classification can then be performed based on features such as texture or backscatter intensity from the grayscale image [3].

Some studies have demonstrated the effectiveness of seabed substrate classification based on multi-beam backscatter data [4,5]. Traditional methods for seabed substrate classification using multi-beam backscatter data mainly rely on manual classification combined with physical sampling. Although this approach saves a significant amount of time and human resources compared to extensive physical sampling, there is still room for improvement in classification efficiency. The advent of CNNs has brought significant progress in image classification. Well-known networks include LeNet [6], AlexNet [7], GoogLeNet [8], and VGG [9]. LeNet is one of the pioneering convolutional neural networks, originally designed for handwritten digit recognition, a task on which it achieved excellent classification results. LeNet combined convolutional and pooling layers and was trained end to end with the backpropagation algorithm, which removed the manual feature design step required by traditional methods. Moreover, the network is simple and uses few computational resources, so it can deliver decent classification results on ordinary hardware in a short time. LeNet-5 is the best-known version of the LeNet series. AlexNet is a classic convolutional neural network proposed in 2012 for the ImageNet image classification competition. Compared to LeNet, it is more complex, with more convolutional and pooling layers. With its deep network structure, data augmentation, ReLU activation, and other technical innovations, it achieved outstanding results in the ImageNet Large Scale Visual Recognition Challenge and received widespread recognition from scholars in related fields. GoogLeNet, also known as Inception v1, was proposed by the Google team in 2014 and won first place in the classification task of the ImageNet competition that year. Its biggest innovation is the Inception module, which applies convolutional kernels of different sizes and a pooling operation in parallel so that the network can learn which feature extraction path is most appropriate. The introduction of the Inception module enabled GoogLeNet to adopt a deeper network structure while maintaining a small parameter count, achieving excellent classification performance while consuming fewer computational resources. VGG, proposed by the Visual Geometry Group (VGG) at the University of Oxford, is widely used in image classification and object detection tasks. Its network structure is simple and direct, and it achieved impressive results in the ImageNet competition. However, due to its large number of model parameters, it is prone to overfitting when the dataset is small. Since seabed substrate images are relatively scarce compared to general images, a good data augmentation method is required.

As is well known, not only the VGG network but also other deep learning models require large datasets for effective training. However, obtaining a substantial amount of seabed sonar data for deep learning training is time-consuming, costly, and challenging. Therefore, the task of dataset expansion is particularly important. Common data augmentation methods include geometric transformations, image modifications, etc. While these methods have some benefits, they are limited and struggle to generate datasets with various features. The emergence of GANs [10] addresses this issue. Since their introduction, GANs have found increasing applications in fields such as computer vision, natural language processing, and human–computer interaction [11]. GANs are primarily composed of two components: a generator module and a discriminator module. The generator produces fake samples from a random noise vector, while the discriminator receives both real and generated samples and outputs a probability indicating how likely the sample is to be real. Through this adversarial process, the generator continuously improves the quality of the generated samples in order to better deceive the discriminator. Compared to traditional data augmentation methods, GANs can generate large datasets with various features. Deep convolutional generative adversarial networks (DCGANs) [12], a variant of GANs, utilize convolutional neural networks to construct both the generator and discriminator. This enables the network to effectively capture image features, generating more realistic and detailed images. Compared to traditional GANs, DCGANs produce higher-quality images and exhibit better training stability.

This paper applies CNNs to seabed sediment classification based on small-sample multi-beam backscatter data, achieving high classification accuracy while significantly reducing labor and time. The contributions of this study are as follows: (1) seabed sediment classification using CNNs on small-sample multi-beam backscatter data; (2) improvement of the traditional DCGAN by incorporating an inverse normalization and inverse standardization module to enhance the quality of generated images; (3) application of the DCGAN to augment multi-beam backscatter data, addressing the scarcity of seabed sediment images.

2. Materials and Methods

2.1. Data Acquisition and Processing

The study area is located on the northern continental shelf edge of the South China Sea, as shown in Figure 1. The area features a rich variety of seabed sediment types with marked variation between them, making it well suited to seabed sediment classification research. A vessel-mounted EM302 multi-beam echo sounding system, with a frequency of 300 kHz and 256 beams per ping, was used for data collection. The survey covered a line 45 km long and 620 m wide, over a seabed sloping from northwest to southeast, with the water depth gradually increasing from 90 m to 210 m. In addition to full multi-beam coverage of the study area, gravity core sampling was conducted at 53 designated stations. The multi-beam backscatter data were processed using CARIS 11.3 software to generate seabed sonar images. The core samples served as ground truth for determining the sediment types, and training samples were extracted near the confirmed stations to create training and validation datasets for network model training.

2.2. Convolutional Neural Networks

CNNs have achieved remarkable success in the field of image classification due to their high computational efficiency, strong generalization ability, and ability to automatically learn and extract useful features from images, eliminating the need for manual feature extraction. Therefore, in this study, a CNN-based approach was used to perform automatic seabed sediment classification using small-sample multi-beam backscatter data. Four networks, LeNet, AlexNet, GoogLeNet, and VGG, have been proven to deliver excellent performances in image classification tasks. Among them, the LeNet network achieved remarkable classification results in handwritten digit recognition, while AlexNet, GoogLeNet, and VGG networks have all performed exceptionally well in the ImageNet competition.

This paper selected the LeNet-5 model from the LeNet network series. The network consists of an input layer, convolutional layers, pooling layers, fully connected layers, and an output layer. The combination of convolutional and pooling layers enables effective extraction of textural and structural features from images, achieving excellent classification performance even with a small number of parameters. The network architecture begins with an input layer receiving 1 × 32 × 32 grayscale images. The initial convolution layer employs six 5 × 5 convolutional kernels with stride 1 and padding 2, followed by sigmoid activation to introduce nonlinear transformations, enabling the network to learn complex functional mappings. Subsequent average pooling performs spatial downsampling to compress feature dimensions and reduce the computational complexity. The second convolutional layer utilizes sixteen 5 × 5 kernels with stride 1 and zero padding, similarly activated through sigmoid nonlinearity before further dimensionality reduction via average pooling. Finally, a fully connected layer transforms the two-dimensional feature maps into one-dimensional vectors, completing the end-to-end learning framework.
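The layer sizes described above can be checked with the standard output-size formula for convolution and pooling, out = floor((in + 2 × padding − kernel) / stride) + 1. The following is a minimal sketch rather than the network code used in this study; the 2 × 2 pooling window with stride 2 is an assumption (the text does not state the pooling size), and the function names are hypothetical.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


def lenet5_shapes(input_size=32):
    """Trace feature-map shapes through the two conv/pool stages."""
    shapes = [("input", (1, input_size, input_size))]
    s = conv_out(input_size, kernel=5, stride=1, padding=2)  # six 5x5 kernels, padding 2
    shapes.append(("conv1", (6, s, s)))
    s = conv_out(s, kernel=2, stride=2)  # ASSUMED 2x2 average pooling, stride 2
    shapes.append(("pool1", (6, s, s)))
    s = conv_out(s, kernel=5, stride=1, padding=0)  # sixteen 5x5 kernels, no padding
    shapes.append(("conv2", (16, s, s)))
    s = conv_out(s, kernel=2, stride=2)  # ASSUMED 2x2 average pooling, stride 2
    shapes.append(("pool2", (16, s, s)))
    shapes.append(("flatten", (16 * s * s,)))  # vector fed to the fully connected layers
    return shapes


for name, shape in lenet5_shapes():
    print(f"{name:8s} {shape}")
```

Under these assumptions, a 32 × 32 input stays 32 × 32 after the first (padded) convolution and is reduced to a 6 × 6 map with 16 channels before flattening.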

The implemented AlexNet architecture comprises an input layer, an output layer, five convolutional layers, three pooling layers, and three fully connected layers. Distinct from LeNet, this implementation employs max-pooling layers that extract the maximum pixel values within local receptive fields. This approach enhances sensitivity to textural features in the resulting feature maps. The experimental implementation utilizes rectified linear unit (ReLU) activation functions, which offer three principal advantages: (1) simplified computational complexity through linear thresholding, (2) improved gradient propagation during backpropagation through sparse activation characteristics, and (3) mitigation of gradient explosion and vanishing issues commonly associated with saturating activation functions. These design choices collectively optimize feature discriminability while maintaining computational efficiency throughout the network’s hierarchical processing stages.

The GoogLeNet architecture, comprising a total of 22 layers, is primarily characterized by its innovative Inception module. This structural design represented a significant advancement in CNNs at the time of its proposal. Unlike traditional CNNs that require manual selection of convolutional kernel sizes, the Inception module achieves automatic multi-scale feature learning through parallel implementation of convolutional operations with varying kernel dimensions (1 × 1, 3 × 3, 5 × 5) and pooling operations. Particularly, the incorporation of 1 × 1 convolutions serves dual purposes: performing dimensionality reduction to optimize computational efficiency while simultaneously enhancing network non-linearity through rectified linear unit (ReLU) activation. Furthermore, GoogLeNet introduces auxiliary classifiers in intermediate layers to mitigate the vanishing gradient problem during backpropagation, thereby stabilizing the training process for deep network architectures. These architectural innovations collectively enable effective feature extraction while maintaining manageable computational complexity.
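The saving from 1 × 1 dimensionality reduction can be illustrated by counting convolution weights. The sketch below is illustrative only: the channel counts (192 input channels, 16 reduced channels, 32 output channels) are assumed values chosen for the example, not parameters reported in this study, and biases are ignored.

```python
def conv_weight_count(in_channels, out_channels, kernel):
    """Number of weights in one convolutional layer (biases ignored)."""
    return kernel * kernel * in_channels * out_channels


# ASSUMED illustrative channel counts for one 5x5 branch of an Inception module.
in_ch, reduced_ch, out_ch = 192, 16, 32

# 5x5 convolution applied directly to all input channels.
direct = conv_weight_count(in_ch, out_ch, kernel=5)

# 1x1 reduction first, then the 5x5 convolution on the reduced channels.
with_reduction = (conv_weight_count(in_ch, reduced_ch, kernel=1)
                  + conv_weight_count(reduced_ch, out_ch, kernel=5))

print(direct, with_reduction, round(direct / with_reduction, 1))
```

With these example values the reduced branch needs roughly a tenth of the weights of the direct 5 × 5 convolution.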

The main feature of the VGG network is its very simple and uniform convolutional structure: the convolutional layers use small 3 × 3 kernels with a stride of 1, and some configurations also use 1 × 1 convolutions. This design makes the network easier to implement and allows for increased depth, thus enhancing the model’s learning capacity.
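One reason stacks of small kernels allow greater depth is that two stride-1 3 × 3 convolutions cover the same 5 × 5 receptive field as a single 5 × 5 convolution while using fewer weights. The following sketch shows the arithmetic; the channel count of 64 is an assumed example value, and the helper names are hypothetical.

```python
def stacked_receptive_field(n_layers, kernel=3):
    """Receptive field of n stacked stride-1 convolutions with the same kernel."""
    return 1 + n_layers * (kernel - 1)


def same_channel_conv_weights(channels, kernel, n_layers=1):
    """Weights (biases ignored) of n conv layers with equal in/out channels."""
    return n_layers * kernel * kernel * channels * channels


channels = 64  # ASSUMED example value
two_3x3 = same_channel_conv_weights(channels, kernel=3, n_layers=2)
one_5x5 = same_channel_conv_weights(channels, kernel=5)

print(stacked_receptive_field(2), two_3x3, one_5x5)
```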

2.3. Generative Adversarial Networks

GANs have garnered significant attention as powerful machine learning models in recent years. However, GANs still face several challenges, such as unstable training. To address these, researchers have combined convolutional neural networks with GANs, leading to the development of DCGANs, which have yielded impressive results. Sonar images of seabed sediments are low-resolution grayscale images, primarily consisting of grayscale values and sediment texture information. Additionally, certain types of sediment images visually resemble noise, because they contain far fewer distinguishing features than images of faces or buildings. Therefore, a large amount of training data is needed to improve the model’s classification accuracy. However, due to the high cost and time required for acquiring seabed sediment data, only a small amount of data is available to build and train seabed sediment classifiers. To address these issues, we used a DCGAN for data augmentation.

Figure 2 illustrates the network architecture of the DCGAN. The DCGAN consists of two sub-networks: a generator network and a discriminator network. The task of the generator is to continuously learn the distribution of the data and generate samples to deceive the discriminator. The task of the discriminator is to determine whether the data are real or generated. The goal of this study is to obtain a trained data augmentation generator to enrich the data features and improve the performance of the sediment classifier.
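The opposing tasks of the two sub-networks can be written as binary cross-entropy losses on the discriminator's output probabilities. The following is a minimal NumPy sketch of these commonly used losses, not the training code of this study; the function names and the choice of the non-saturating generator loss are assumptions made for illustration.

```python
import numpy as np


def bce(prob, target, eps=1e-7):
    """Mean binary cross-entropy between probabilities and a 0/1 target."""
    prob = np.clip(np.asarray(prob, dtype=float), eps, 1.0 - eps)
    return float(-np.mean(target * np.log(prob) + (1.0 - target) * np.log(1.0 - prob)))


def discriminator_loss(d_real, d_fake):
    """Discriminator wants D(x) -> 1 on real images and D(G(z)) -> 0 on generated ones."""
    return bce(d_real, 1.0) + bce(d_fake, 0.0)


def generator_loss(d_fake):
    """Non-saturating form (ASSUMED): generator wants D(G(z)) -> 1."""
    return bce(d_fake, 1.0)


# Toy discriminator outputs, made up for illustration.
print(discriminator_loss([0.9, 0.8], [0.1, 0.2]))  # confident, correct discriminator
print(discriminator_loss([0.5, 0.5], [0.5, 0.5]))  # discriminator at chance level
print(generator_loss([0.1]), generator_loss([0.9]))
```

The discriminator loss falls as it separates real from generated samples, while the generator loss falls as its samples are scored as more likely to be real, which is the adversarial process described above.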

2.4. Inverse Normalization and Inverse Standardization

Normalization and standardization are common data preprocessing techniques. Normalization typically refers to scaling the data to a specific range, such as [0, 1], and is often used in neural networks to accelerate convergence. Standardization, on the other hand, transforms the data to have a mean of 0 and a variance of 1 and is commonly used in most machine learning models to reduce the bias that differently scaled inputs introduce during optimization.

In CNNs, both normalization and standardization help accelerate training and improve model performance. Normalization is typically used in image data preprocessing to ensure that pixel values are within a consistent range, which helps avoid issues such as gradient vanishing or explosion. Standardization helps maintain the balance of input data, ensuring that the data distribution across different channels is consistent, thereby accelerating the optimization process and improving the stability of the network. However, in the case of the DCGAN, since the goal is to generate images, normalization and standardization can impact the final generated image, causing significant differences in grayscale values between the generated and original images. Therefore, we incorporated inverse normalization and inverse standardization modules into the original DCGAN framework to enhance the quality of the generated data.

Inverse normalization and inverse standardization refer to the process of restoring data from a normalized or standardized state back to its original range. In deep learning, inverse normalization is typically performed by multiplying the data by the range (i.e., the difference between the maximum and minimum values) used during normalization and then adding the minimum value, thereby recovering the original data range. Inverse standardization is achieved by multiplying the data by the standard deviation and adding the mean, which restores the original data distribution. These two processes are commonly applied at the output layer, particularly in image generation tasks, to ensure that the model’s outputs are consistent with the original data.
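The two inverse transforms can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the module implemented in this study: it assumes 8-bit gray levels (0 to 255), that normalization was applied before standardization (so the inverses are applied in the opposite order), and that the mean and standard deviation used in preprocessing are known; all function names are hypothetical.

```python
import numpy as np


def normalize(x, x_min, x_max):
    """Scale values from [x_min, x_max] to [0, 1]."""
    return (x - x_min) / (x_max - x_min)


def inverse_normalize(x_norm, x_min, x_max):
    """Multiply by the range and add the minimum to restore the original range."""
    return x_norm * (x_max - x_min) + x_min


def standardize(x, mean, std):
    """Shift and scale to zero mean and unit variance."""
    return (x - mean) / std


def inverse_standardize(x_std, mean, std):
    """Multiply by the standard deviation and add the mean."""
    return x_std * std + mean


def to_gray_levels(generated, mean, std, x_min=0.0, x_max=255.0):
    """Map generator output back to 8-bit gray levels (ASSUMED preprocessing order)."""
    restored = inverse_normalize(inverse_standardize(generated, mean, std), x_min, x_max)
    return np.clip(np.rint(restored), x_min, x_max).astype(np.uint8)


# Round trip on a toy 32 x 32 image with made-up statistics.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(32, 32)).astype(np.float64)
mean, std = 0.5, 0.5  # ASSUMED preprocessing statistics
preprocessed = standardize(normalize(image, 0.0, 255.0), mean, std)
recovered = to_gray_levels(preprocessed, mean, std)
print(np.array_equal(recovered, image.astype(np.uint8)))
```

Applying the inverses with the same constants used in preprocessing returns the data to the gray-level range of the original images, which is the property the module relies on.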

3. Experiment and Analysis

3.1. Original Dataset

All the collected data were preprocessed, segmented, and labeled to form our dataset. The dataset consisted of a total of 1062 images. To evaluate the classifiers under small-sample conditions, the training set was deliberately made much smaller than the test set. The dataset was partitioned using a stratified random sampling approach, with approximately 20% of the instances assigned to the training subset and the remaining 80% reserved as the test subset to ensure unbiased performance assessment. The training set contained 214 images, while the test set contained 848 images. Each image had a size of 32 × 32 pixels. The specific details of the dataset are shown in Table 1, and examples of some multi-beam backscatter images are illustrated in Figure 3.
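A stratified random split draws the same fraction from each class so that the small training set keeps the class proportions of the full dataset. The sketch below shows one way to do this with NumPy; it is not the partitioning code used in this study, and the rounding rule (rounding each class up), the random seed, and the toy class counts are assumptions.

```python
import numpy as np


def stratified_split(labels, train_fraction=0.2, seed=0):
    """Return sorted train/test index arrays with the same fraction taken per class."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        n_train = int(np.ceil(train_fraction * len(idx)))  # ASSUMED: round up per class
        train_idx.extend(idx[:n_train].tolist())
        test_idx.extend(idx[n_train:].tolist())
    return np.sort(np.array(train_idx)), np.sort(np.array(test_idx))


# Toy labels (made-up class counts): 0 = rock, 1 = sand, 2 = silt.
toy_labels = [0] * 10 + [1] * 20 + [2] * 5
train, test = stratified_split(toy_labels)
print(len(train), len(test))
```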

3.2. Experimental Setup

Table 2 and Table 3 present the network parameters of the generator and discriminator in the DCGAN, with an initial learning rate of 0.001.

3.3. Image Enhancement Based on DCGAN

The DCGAN was trained for 500 epochs, taking approximately 30 min. The images generated by the network at different stages of training are shown in Figure 4. A total of 8000 images were generated. With the conventional DCGAN, the synthesized images differed significantly from the original training samples. This discrepancy arose because pixel-wise normalization and standardization were applied during preprocessing, but the corresponding inverse operations were omitted in the generation phase. Consequently, the grayscale value distribution of the generated images deviated markedly from that of the original dataset. To address this limitation, we added a post-processing module implementing inverse normalization and inverse standardization to the DCGAN architecture. Experimental validation confirmed that this modification substantially enhanced the statistical consistency between the synthesized and authentic data distributions.

As the training progressed, the generated images gradually became clearer, transitioning from blurry to sharp and from abstract to detailed. We first visually selected images that were similar to those in the original dataset. Next, we performed feature extraction using the gray level co-occurrence matrix (GLCM) [13] and compared the features with those of the original dataset. This analysis revealed that the mean gray value is the most discriminative feature for distinguishing the categories. Specifically, rock images exhibited the highest mean gray values, exceeding 160, while sand images showed intermediate values ranging from 150 to 160. In contrast, silt images displayed the lowest mean gray values, approximately 130. Finally, 600 high-quality images were selected for data augmentation. These images were added to the original dataset to form the augmented dataset. Table 4 provides a detailed description of the augmented dataset.
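The feature comparison can be illustrated with a small NumPy implementation of the GLCM. This is a sketch under stated assumptions, not the feature extraction code used in this study: the quantization to 8 gray levels, the single horizontal offset, the symmetric normalization, and the particular texture features (contrast, energy, homogeneity) are illustrative choices, and the input is assumed to be an 8-bit grayscale image.

```python
import numpy as np


def glcm(image, levels=8, dx=1, dy=0):
    """Symmetric, normalized GLCM for one non-negative pixel offset (dx, dy)."""
    img = np.asarray(image, dtype=np.int64)
    q = (img * levels) // 256  # quantize 0..255 to 0..levels-1 (ASSUMED 8-bit input)
    h, w = q.shape
    src = q[0:h - dy, 0:w - dx]  # reference pixels
    dst = q[dy:h, dx:w]          # neighbours at the given offset
    m = np.zeros((levels, levels), dtype=np.float64)
    np.add.at(m, (src.ravel(), dst.ravel()), 1.0)  # count co-occurring level pairs
    m = m + m.T                  # make the matrix symmetric
    return m / m.sum()


def glcm_features(image, levels=8):
    """Mean gray value plus a few common GLCM texture features."""
    p = glcm(image, levels=levels)
    i, j = np.indices(p.shape)
    return {
        "mean_gray": float(np.mean(image)),
        "contrast": float(np.sum(p * (i - j) ** 2)),
        "energy": float(np.sum(p ** 2)),
        "homogeneity": float(np.sum(p / (1.0 + (i - j) ** 2))),
    }


# Toy images: a flat patch and vertical stripes (made-up data).
flat = np.full((32, 32), 100, dtype=np.uint8)
stripes = np.tile(np.array([0, 255], dtype=np.uint8), (32, 16))
print(glcm_features(flat))
print(glcm_features(stripes))
```

Computing the same feature dictionary for generated and original images of one class gives a simple numerical basis for keeping or discarding a generated sample.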

3.4. Seafloor Sediment Classification Based on Multi-Beam Backscatter Data from the Original Dataset

To evaluate the performance of the CNNs in seafloor sediment classification and to demonstrate the effectiveness of data augmentation using the DCGAN, we first conducted a comparative analysis using four different CNN classification methods based on the original dataset. LeNet, AlexNet, VGG, and GoogLeNet have demonstrated strong image classification performances in other domains. The models’ classification performances were evaluated using cross-validation, with accuracy quantified as the proportion of accurately predicted images relative to the entire test set population. The classification accuracies of these deep learning models are shown in Table 5, and the classification results are detailed in Figure 5. From Table 5, it can be observed that LeNet, AlexNet, VGG, and GoogLeNet all achieved a good classification performance based on the original data, with overall classification accuracies exceeding 90%. LeNet achieved the highest accuracy, reaching 95.17%. All four networks could accurately classify silt, while the recognition of rock showed the poorest performance. The rock recognition accuracy of the GoogLeNet network was only 71.50%, and the rock recognition rates for LeNet and AlexNet were below 80%. VGG’s rock recognition rate was only 65.28%. The recognition accuracy for sand was high across all four networks, exceeding 97%, with LeNet and VGG16 achieving 100%.

Across the three sediment categories, all four networks performed well. As shown in the confusion matrices of the four models in Figure 5, these models tended to classify some rocks as sand, and some sands as rock. However, they rarely misclassified rock and sand as silt. This is because images of sand and rock are relatively similar in the dataset, whereas the features of silt images exhibit more distinct differences from the other two categories.
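Overall accuracy and the per-class recognition rates discussed above can both be read from a confusion matrix. The following NumPy sketch shows the computation on made-up labels; the class indices (0 = rock, 1 = sand, 2 = silt) and the toy predictions are assumptions for illustration and do not reproduce the results in Table 5.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    m = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(m, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return m


def per_class_accuracy(m):
    """Fraction of each true class that was predicted correctly."""
    return np.diag(m) / m.sum(axis=1)


def overall_accuracy(m):
    """Correctly predicted images divided by all test images."""
    return float(np.trace(m) / m.sum())


# Toy example: two rock images are mistaken for sand.
y_true = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]
y_pred = [0, 0, 1, 1, 1, 1, 1, 2, 2, 2]
cm = confusion_matrix(y_true, y_pred, n_classes=3)
print(cm)
print(per_class_accuracy(cm), overall_accuracy(cm))
```

In this toy case the overall accuracy remains high even though half of the rock images are misclassified, which is why per-class rates are reported alongside the overall figure.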

3.5. Classification Accuracy of Seafloor Sediment Based on Augmented Multi-Beam Backscatter Data

The classification accuracy and confusion matrix after training on the augmented dataset are shown in Table 6 and Figure 6. Compared to the original dataset, the overall classification accuracy of the four classifiers improved. Specifically, LeNet’s accuracy increased by 1.88%, AlexNet’s accuracy improved by 1.06%, GoogLeNet’s classification accuracy rose by 2.59%, and VGG’s classification accuracy increased by 2.97%. Among the four networks, VGG, having a larger number of parameters, required a larger dataset for training. Therefore, after data augmentation, its accuracy significantly improved, especially in the recognition of rock-class images. When trained on the original dataset, the model’s classification accuracy for rock images was only 65.28%, whereas after data augmentation, the classification accuracy for rock images reached 87.05%, marking a substantial increase of 21.77%. This demonstrates that data augmentation based on the DCGAN is an excellent method for improving classification performance.

4. Discussion

The features of seabed substrate acoustic images are not very distinct, and in certain marine areas, these images may appear more like noise. As a result, manually extracting features is highly inconvenient. However, CNNs, with their ability to automatically select features and easily adjust network structures, have not only gained popularity among researchers across various fields but also demonstrated their end-to-end advantages in the classification task of seabed substrate acoustic images. This significantly reduces the workload and still yields a satisfactory classification performance, even with small-sample training datasets.

To generate clearer images, a model with a more easily adjustable structure is required. Traditional GANs often struggle to produce satisfactory images, whereas DCGANs allow for more flexible structural adjustments and the addition of other modules, making them highly suitable for seabed image generation. In our DCGAN, we incorporated an inverse normalization and inverse standardization module, which brought the grayscale values and distribution of the generated data much closer to those of the original data, greatly enhancing the quality of the generated images.

As shown in Figure 4, as the number of epochs increased, the generated images gradually evolved from abstract to more concrete. Among the final selected results, randomly sampled images cannot be distinguished by the naked eye as generated or original. Moreover, the gray level co-occurrence matrix features of the generated images were highly similar to those of the original images. These observations demonstrate the effectiveness of the improved DCGAN. By adding the generated data to the training set, the quantity and feature richness of the data were increased, thereby enhancing the final classification performance of the classifier.

As shown in the data in Table 5, the traditional CNN methods achieved relatively good classification results, with the overall classification accuracy of the four networks exceeding 90%. Among them, VGG16 achieved the highest accuracy, reaching 95.75%. After training the network models on the augmented dataset, the classification accuracy of each model improved to varying degrees. Specifically, LeNet’s accuracy increased by 1.88%, AlexNet’s accuracy improved by 1.06%, GoogLeNet’s classification accuracy rose by 2.59%, and VGG16’s classification accuracy increased by 2.97%. In the recognition of rock-class images, when the VGG network model was trained on the original dataset, the classification accuracy was only 65.28%. However, after data augmentation, the classification accuracy for rock-class images increased significantly to 87.05%, a substantial improvement of 21.77%.

5. Conclusions

In this study, CNNs were applied for the seabed substrate classification of multi-beam backscatter data images. Through experimentation, the following conclusions were drawn:

(1) The CNNs can achieve good results when trained on small-sample datasets, with accuracy rates exceeding 90%.
(2) The DCGAN was used to learn the data distribution of the original dataset and generate new multi-beam backscatter grayscale images for data augmentation. After training the four classification models on the augmented dataset, the classification accuracy improved compared to the original dataset, with the largest improvement observed in the VGG network, which increased by 2.97%. In the recognition of rock-class images, when the VGG network was trained on the original dataset, the classification accuracy was only 65.28%. However, after data augmentation, the classification accuracy for rock-class images increased significantly to 87.05%, representing a substantial improvement of 21.77%.
(3) The introduction of an inverse normalization and inverse standardization module into the traditional DCGAN learning model was shown to improve the quality of the generated images, as observed through visual inspection and the gray level co-occurrence matrix method.

This data augmentation approach can also be applied to other data-scarce domains, such as seabed target recognition. However, this method has limitations, specifically that the quality of the generated data from the DCGAN is unstable, necessitating the use of visual inspection and gray level co-occurrence matrices for selection. In future research, we aim to design a more stable data generator with better performance. Additionally, super-resolution image reconstruction techniques can effectively address the low-resolution issues of sonar images, which will be a key focus of our future studies.

Author Contributions

Conceptualization, H.M., X.L. and T.H.; methodology, H.M., X.F., X.Z. and S.S.; investigation, H.M. and X.L.; data curation, X.Z.; writing—original draft preparation, H.M.; writing—review and editing, X.L., T.H., X.F., X.Z. and S.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

Figure 1. Schematic diagram of the study area.

Figure 2. Structure of the deep convolutional generative adversarial network.

Figure 3. A subset of the dataset (from left to right: rock, sand, and silt).

Figure 4. Seabed sediment images generated by the DCGAN over the course of training, shown for rock, sand, and silt in sequence (left to right, top to bottom). As the number of epochs increases, the generated images of the three sediment types gradually change from blurry and abstract to clear and detailed.

Figure 5. Confusion matrices of different classifiers trained on the original dataset.

Figure 6. Confusion matrices of different classifiers trained on the augmented dataset.

Table 1. Dataset distribution statistics.

Dataset	Rock	Sand	Silt	Overall
Training Set	49	84	81	214
Test Set	193	334	321	848
Overall	242	418	402	1062

Table 2. Network parameters of the convolutional generative adversarial network generator.

Generator
Layer Name	T-conv1	T-conv2	T-conv3	T-conv4
Channel	128	64	32	1
Padding	0	1	1	1
Kernel size	4	4	4	4
Stride	1	2	2	2
Activation	ReLU	ReLU	ReLU	Tanh
Normalization	BatchNorm2d	BatchNorm2d	BatchNorm2d	×
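
The spatial size produced by the generator layers in Table 2 can be traced with the standard transposed-convolution size formula. The sketch below assumes a 1×1 latent input, no output padding, and dilation 1, none of which is stated in the table; under those assumptions the generator would emit a single-channel 32×32 image.

```python
def tconv_out(size, kernel, stride, padding):
    """Output side length of a transposed convolution
    (assumes no output padding and dilation 1)."""
    return (size - 1) * stride - 2 * padding + kernel


# (layer name, output channels, padding, kernel size, stride) from Table 2.
GENERATOR = [
    ("T-conv1", 128, 0, 4, 1),
    ("T-conv2", 64, 1, 4, 2),
    ("T-conv3", 32, 1, 4, 2),
    ("T-conv4", 1, 1, 4, 2),
]


def trace_generator(latent_side=1):
    """Return (name, channels, height, width) after each layer.
    latent_side=1 is an assumption, not a value given in the table."""
    size = latent_side
    shapes = []
    for name, channels, padding, kernel, stride in GENERATOR:
        size = tconv_out(size, kernel, stride, padding)
        shapes.append((name, channels, size, size))
    return shapes


for row in trace_generator():
    print(row)
```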

Table 3. Network parameters of the convolutional generative adversarial network discriminator.

Discriminator
Layer Name	conv1	conv2	conv3	conv4
Channel	32	64	128	128
Padding	1	1	1	0
Kernel size	4	4	4	4
Stride	2	2	2	1
Activation	LeakyReLU	LeakyReLU	LeakyReLU	Sigmoid
Normalization	BatchNorm2d	BatchNorm2d	BatchNorm2d	×
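
The discriminator in Table 3 can be checked in the same spirit with the ordinary convolution size formula. The 32×32 input side used below is an assumption (it is what the Table 2 generator would produce from a 1×1 latent input), as is dilation 1; with that input, the four layers reduce the feature map to 1×1.

```python
def conv_out(size, kernel, stride, padding):
    """Output side length of a standard convolution (assumes dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


# (layer name, output channels, padding, kernel size, stride) from Table 3.
DISCRIMINATOR = [
    ("conv1", 32, 1, 4, 2),
    ("conv2", 64, 1, 4, 2),
    ("conv3", 128, 1, 4, 2),
    ("conv4", 128, 0, 4, 1),
]


def trace_discriminator(input_side=32):
    """Return (name, channels, height, width) after each layer.
    input_side=32 is an assumed image size, not a value given in the table."""
    size = input_side
    shapes = []
    for name, channels, padding, kernel, stride in DISCRIMINATOR:
        size = conv_out(size, kernel, stride, padding)
        shapes.append((name, channels, size, size))
    return shapes


for row in trace_discriminator():
    print(row)
```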

Table 4. Distribution statistics of the enriched dataset.

Dataset	Rock	Sand	Silt	Overall
Training Set	212	292	285	789
Test Set	193	334	321	848
Overall	405	626	606	1637
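
Comparing the training rows of Table 1 and Table 4 gives the number of images added per class. The short sketch below does this arithmetic; it assumes that every added training image is DCGAN-generated, and the variable and function names are illustrative.

```python
# Training-set counts copied from Table 1 (original) and Table 4 (enriched).
ORIGINAL_TRAIN = {"rock": 49, "sand": 84, "silt": 81}
AUGMENTED_TRAIN = {"rock": 212, "sand": 292, "silt": 285}


def added_per_class(original, augmented):
    """Number of training images added to each class by augmentation."""
    return {name: augmented[name] - original[name] for name in original}


def class_share(counts):
    """Fraction of the training set that belongs to each class."""
    total = sum(counts.values())
    return {name: value / total for name, value in counts.items()}


added = added_per_class(ORIGINAL_TRAIN, AUGMENTED_TRAIN)
print(added, sum(added.values()))
print({k: round(v, 3) for k, v in class_share(ORIGINAL_TRAIN).items()})
print({k: round(v, 3) for k, v in class_share(AUGMENTED_TRAIN).items()})
```

The test set is identical in both tables (848 images), so the accuracies obtained before and after augmentation are measured on the same images.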

Table 5. Original data classification accuracy.

Model	Rock	Sand	Silt	Overall
LeNet	78.24%	100.00%	100.00%	95.17%
AlexNet	78.24%	98.50%	100.00%	94.46%
GoogLeNet	71.50%	97.60%	100.00%	92.57%
VGG16	65.28%	100.00%	100.00%	92.10%

Table 6. Classification accuracy of each classifier after data augmentation, where ↑, I, and ↓ indicate increase, no change, and decrease, respectively.

Model	Rock	Sand	Silt	Overall
LeNet	87.05% ↑	100.00% I	100.00% I	97.05% ↑
AlexNet	80.83% ↑	100.00% ↑	100.00% I	95.99% ↑
GoogLeNet	81.56% ↑	94.91% ↓	100.00% I	95.16% ↑
VGG16	87.05% ↑	100.00% I	100.00% I	95.07% ↑
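
The rock-class gains can be recomputed directly from the Rock columns of Table 5 and Table 6. The sketch below reports each gain in percentage points and, for context, as a relative increase over the original accuracy; the dictionary and function names are illustrative.

```python
# Rock-class accuracy (%) copied from Table 5 (original) and Table 6 (augmented).
ROCK_ORIGINAL = {"LeNet": 78.24, "AlexNet": 78.24, "GoogLeNet": 71.50, "VGG16": 65.28}
ROCK_AUGMENTED = {"LeNet": 87.05, "AlexNet": 80.83, "GoogLeNet": 81.56, "VGG16": 87.05}


def point_gain(before, after):
    """Absolute gain in percentage points, rounded to two decimals."""
    return {m: round(after[m] - before[m], 2) for m in before}


def relative_gain(before, after):
    """Gain relative to the original accuracy, in percent."""
    return {m: round(100.0 * (after[m] - before[m]) / before[m], 1) for m in before}


gains = point_gain(ROCK_ORIGINAL, ROCK_AUGMENTED)
for model, gain in gains.items():
    print(f"{model}: +{gain} percentage points")
print(relative_gain(ROCK_ORIGINAL, ROCK_AUGMENTED))
```

Reporting the difference in percentage points avoids confusing it with a relative change, which is a larger number for the same pair of accuracies.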

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
