Article

Small-Sample Sonar Image Classification Based on Deep Learning

1 School of Marine Science and Technology, Northwestern Polytechnical University, Xi’an 710072, China
2 Ocean Institute of Northwestern Polytechnical University, Taicang 215400, China
* Author to whom correspondence should be addressed.
J. Mar. Sci. Eng. 2022, 10(12), 1820; https://doi.org/10.3390/jmse10121820
Submission received: 5 October 2022 / Revised: 2 November 2022 / Accepted: 21 November 2022 / Published: 25 November 2022
(This article belongs to the Special Issue Application of Sensing and Machine Learning to Underwater Acoustic)

Abstract

Deep learning is a core technology for sonar image classification, but because of the cost of sampling, the shortage of sonar image data impedes the training and deployment of classifiers. Classic deep learning models such as AlexNet, VGG, GoogLeNet, and ResNet suffer from low recognition rates and overfitting. This paper proposes a novel network (ResNet-ACW) based on a residual network and a combined few-shot strategy derived from generative adversarial networks (GANs) and transfer learning (TL). We establish a sonar image dataset of six target categories acquired by sidescan sonar, forward-looking sonar, and three-dimensional imaging sonar. Through an asymmetric convolution and the designed network structure, the training of ResNet-ACW on the sonar image dataset is more stable and the classification accuracy is improved. We design a novel GAN (LN-PGAN) that generates images more efficiently to enhance our dataset, and we fine-tune ResNet-ACW pretrained on mini-ImageNet. Our method achieves 95.93% accuracy, a 14.19% improvement over the baseline, on the six-category target sonar image classification task.

1. Introduction

Deep learning has been successfully applied to optical image classification, and research on transferring deep learning to underwater image recognition has also begun [1,2]. Underwater images are mainly divided into optical images and acoustic images, which usually capture underwater targets, seabed topography, and seabed sediments. Because of the strong noise in the imaging background and the low operating frequency band of sonar, sonar images have low resolution and contrast, and target edges are fuzzy [3]. Effectively classifying sonar images has therefore become a major issue in underwater target recognition [4].
In past decades of research, the mainstream approach to underwater target classification was to extract features manually and then identify them, which had poor robustness and generalization due to the inevitable loss of information during feature extraction [5]. Deep learning, based on artificial neural networks, uses multiple processing layers to form a computing model that learns representations of data at different levels of abstraction. Deep learning structures can automatically extract features from unstructured and structured data, and deep learning methods have greatly advanced the state of the art in many fields, including automatic sonar target classification [6,7].
Reference [8] proposed a sonar target classification algorithm based on multi-aspect sensing, fractional Fourier transform features, and a three-layer hidden deep belief network (DBN). Reference [9] proposed an efficient convolutional network (ECNet) for sidescan sonar image semantic segmentation. C. Li et al. [10] adopted a sub-channel feature cascade; under low-SNR conditions, the accuracy of a deep neural network using multi-channel cascade features was significantly higher than that of a recognition method based on a support vector machine (SVM). Qin X et al. tested AlexNet, LeNet, ResNet, VGG, and DenseNet at different depths on shallow-water sidescan sonar images collected at the Pearl River Estuary, and ResNet-18 achieved the best accuracy [11]. Jin et al. proposed an underwater target recognition model integrating salient-region segmentation and pyramid pooling of images; its accuracy on nine target categories in a measured sonar image dataset reached 89.52% [12].
At present, most research on underwater target recognition directly applies networks optimized for optical images [13,14]. The number of available sonar images is far smaller than that of optical images [15], so few-shot training of a deep network is more difficult and overfitting is more prominent [16]. Tang Y et al. expanded a dataset of sidescan sonar sunken-ship images through data enhancement methods, mainly multi-scale cropping and enlargement, translation, rotation, mirroring, and noise addition [17]. On the expanded dataset, VGG performed better than before, but repeating images cannot solve the overfitting caused by few-shot learning. Qin X et al. enhanced their datasets with generative adversarial networks (GANs) [11]; the results indicated that this method could improve sediment classification accuracy, but the effect of the GANs was limited and they are computationally expensive. Huo et al. applied transfer learning to VGG-16, which improved the classification of sidescan sonar images [18]; however, transfer learning did not perform well when the SNR of the dataset was low or the characteristics of different categories were similar. Steiniger et al. demonstrated how to apply transfer learning to GANs to generate synthetic sidescan snippets in a more robust way [19], and their work can help other researchers in sonar image processing augment their datasets with additional synthetic samples. However, as the authors pointed out, the images generated by their TransfGAN had low variability due to the small training set, and the gain in classifier CNN performance when augmented with synthetic data from this GAN was small.
To solve the problem of sonar image classification with small samples, we propose a novel structure based on a deep residual network (ResNet-ACW) and a combined few-shot strategy derived from a GAN and TL. Figure 1 shows the context of this paper, and the main contributions are as follows:
1. We propose a novel network structure (ResNet-ACW) based on ResNet-18, which effectively improves the utilization of the network’s feature matrices and the classification accuracy without significantly increasing the network parameters or computational complexity.
2. We propose a novel generative adversarial network model (LN-PGAN), which can generate high-quality sonar images more efficiently to enhance the dataset.
3. We pretrain and fine-tune our model on mini-ImageNet. With the combined few-shot strategy, the classification accuracy on the six-category target sonar image dataset reaches 95.93%, which is 14.19% higher than the baseline model.

2. Materials and Methods

2.1. Sonar Image Dataset

In this paper, the dataset contains sonar images from various acoustic imaging devices, including sidescan sonar, forward-looking sonar, and three-dimensional imaging sonar. Sidescan sonar transmits dense beams to the seabed and receives and records the backscattering intensity through the transducer. Forward-looking sonar performs imaging within a certain angular sector $\theta$ by collecting the reflected echoes from the front. Three-dimensional imaging sonar generates high-definition sonar images by transmitting and collecting a large number of reflected echoes within a $\theta_1 \times \theta_2$ sector in front of it. Through sorting and screening, the images generated by the three types of sonar are classified into six categories, namely aircraft, sunken ship, dummy, tire, pipe, and rock. There are 786 images in the sonar dataset, including both RGB and single-channel images. Figure 2 shows examples of the image data, and Table 1 shows the quantity of samples in each category.
As shown in Figure 2, the sonar image of an object differs from its optical image because of the different imaging methods, which is reflected in image clarity and target texture. Sonar imaging relies on high-frequency acoustic echoes, and the resolution of the signal is lower than that of optical images. These characteristics of sonar images place higher demands on the classification network. The sonar image dataset in this paper is divided as follows: 70% of the data is randomly selected as training data, 15% as validation data, and the rest as test data. We preprocess the sonar image data through noise reduction and cropping; after preprocessing, the image size is unified to 224 × 224. Meanwhile, we also use three methods to augment the sonar images:
  • Mirror-flip the original image;
  • Rotate the original image three times, 90° each time;
  • Randomly crop the original image, with the cropping range set to [0.9, 1.1] times the length and width of the image, and then restore the original image size through interpolation.
After applying these three augmentation methods, the amount of data in the training set is increased sixfold, from 550 images to 3300 images.
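As a rough illustration of the three operations above, the following sketch uses torchvision's functional transforms. The function name augment_image and the use of a center crop as a stand-in for the fully random crop are our own simplifications, not part of the paper.

```python
# Hedged sketch of the three augmentations described above: mirror flip,
# three 90-degree rotations, and a crop/scale in [0.9, 1.1] restored to the
# original size by interpolation. Five new images per original, so keeping
# the originals enlarges the training set sixfold.
import random
import torchvision.transforms.functional as TF
from PIL import Image

def augment_image(img: Image.Image) -> list:
    w, h = img.size
    out = [TF.hflip(img)]                                        # mirror flip
    out += [TF.rotate(img, angle) for angle in (90, 180, 270)]   # three rotations
    factor = random.uniform(0.9, 1.1)                            # crop/scale factor
    ch, cw = int(round(h * factor)), int(round(w * factor))
    cropped = TF.center_crop(img, [ch, cw])   # pads with zeros when factor > 1
    out.append(TF.resize(cropped, [h, w]))    # interpolate back to original size
    return out
```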

2.2. Proposed Residual Network Model

Because the overfitting of deep network models is more severe in few-shot learning, researchers usually choose lightweight network models for sonar image classification. X. Qin et al. tested AlexNet, LeNet, ResNet, VGG, and DenseNet at different depths on their sidescan sonar image dataset, and ResNet-18 achieved the best accuracy [11]. We perform the same test on our dataset and reach the same conclusion. As a result, we reconstruct the network based on ResNet-18.

2.2.1. Baseline Model

During backpropagation, problems such as gradient explosion, vanishing gradients, and degradation arise from the repeated multiplication of gradients as the network deepens. To overcome these pitfalls of deep networks, Kaiming He et al. proposed deep residual learning in 2015 and named the network ResNet [20]. As shown in Figure 3, the beginning module of ResNet contains a large 7 × 7 convolution kernel and a maximum pooling layer, while the final module contains an average pooling layer and a fully connected layer. ResNets of different depths share the same beginning and final modules and differ in the structure and stacking number of the residual blocks in the middle layers. The residual block of shallow ResNet models contains two 3 × 3 convolution kernels, two batch normalization (BN) layers, and an activation function (AF). We build the baseline model with two residual blocks stacked in each of the four middle layers, with input channel dimensions of [64, 128, 256, 1024] from shallow to deep.

2.2.2. The Optimization on ResNet-ACW

To achieve higher accuracy on the sonar image dataset, we modified the baseline model as follows:
1. ResNet-18 uses the ReLU activation function, defined as
$$f_{ReLU}(x) = \max(0, x)$$
Assume that the input of a neuron is $a_{in}$, the output after multiplying by the weight $W$ is $z$, and the output after ReLU is $a_{out}$. From the forward-propagation formula and the definition of ReLU,
$$a_{out} = f_{ReLU}(z) = f_{ReLU}(W a_{in} + b) = \max(0, W a_{in} + b)$$
For a fixed learning rate $\alpha$, the magnitude of the weight update grows with the absolute value $\left|\frac{\partial J}{\partial W}\right|$ of the gradient. If, after such an update, the pre-activation $z$ is negative in the next forward pass, the ReLU output is $a_{out} = 0$ and its derivative with respect to $z$ is 0, so the chain rule gives
$$\frac{\partial L}{\partial W} = \frac{\partial L}{\partial z} \times \frac{\partial z}{\partial W} = \frac{\partial L}{\partial z} \times (a_{in})^{T} = 0$$
The gradient of the weight $W$ is therefore 0, and according to the weight update formula $W$ will no longer change. In this case $z$ falls into the region where ReLU is permanently closed, resulting in the death of the neuron.
Swish is an improved activation function based on ReLU; its expression and first derivative are given in Equations (4) and (5):
$$f_{Swish}(x) = \frac{x}{1 + e^{-\beta x}}$$
$$\frac{d f_{Swish}(x)}{dx} = \frac{1 + (1 + \beta x)\, e^{-\beta x}}{(1 + e^{-\beta x})^{2}}$$
The Swish activation function approaches 0 smoothly on the negative semi-axis, which alleviates the problem of neuron deactivation. In ResNet-ACW, we therefore replace ReLU with Swish.
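A minimal PyTorch sketch of this replacement, implementing Equation (4), is shown below; the module name Swish and the default β = 1 are our assumptions.

```python
import torch
import torch.nn as nn

class Swish(nn.Module):
    """Swish activation, f(x) = x / (1 + exp(-beta * x)), as in Equation (4)."""
    def __init__(self, beta: float = 1.0):
        super().__init__()
        self.beta = beta

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * torch.sigmoid(self.beta * x)

# usage: a drop-in replacement for nn.ReLU() when building the residual blocks
activation = Swish()
y = activation(torch.randn(2, 64, 56, 56))
```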
2. Sonar images have low resolution, low contrast, and blurred object edges. The 7 × 7 convolution kernel used by ResNet-18 is too large to fully abstract their features. Inspired by reference [21], we decompose the large convolution kernel as shown in Figure 4: the blue 7 × 7 kernel is decomposed into a black 3 × 3 kernel and a green 5 × 5 kernel; the green 5 × 5 kernel is decomposed into an orange 3 × 3 kernel and a purple 3 × 3 kernel; and, according to the principle of asymmetric convolution [22], the 3 × 3 kernel is further decomposed into a smaller red 1 × 3 kernel and a yellow 3 × 1 kernel.
We refer to the first convolutional layer in ResNet-18 as De0 and to our proposed convolution kernel decomposition structures as De1 and De2. After decomposition, our structure obtains six convolutional layers with depths [32, 32, 32, 32, 32, 64, 64], deepening the first part of the network while keeping the number of parameters almost constant. Figure 5 shows the structures of De1 and De2, and a hedged sketch of such a decomposed stem is given below.
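The exact layer ordering and channel widths of De1 and De2 are defined by Figure 5; the sketch below is only one plausible reading of the decomposition, with the 32/64 channel widths assumed for illustration.

```python
import torch.nn as nn

def decomposed_stem(in_ch: int = 3, out_ch: int = 64, act=nn.ReLU) -> nn.Sequential:
    # One plausible De-style stem: the 7x7 stride-2 convolution of ResNet-18 is
    # replaced by a 3x3 stride-2 convolution, a further 3x3 convolution, and an
    # asymmetric 1x3 / 3x1 pair (together standing in for the former 5x5 part).
    # Pass act=Swish (defined in the sketch above) to match this paper's choice.
    mid = 32
    return nn.Sequential(
        nn.Conv2d(in_ch, mid, kernel_size=3, stride=2, padding=1, bias=False),
        nn.BatchNorm2d(mid), act(),
        nn.Conv2d(mid, mid, kernel_size=3, padding=1, bias=False),
        nn.BatchNorm2d(mid), act(),
        nn.Conv2d(mid, mid, kernel_size=(1, 3), padding=(0, 1), bias=False),
        nn.Conv2d(mid, out_ch, kernel_size=(3, 1), padding=(1, 0), bias=False),
        nn.BatchNorm2d(out_ch), act(),
    )
```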
3. There is redundancy in the residual structure of ResNet-18, which leads to many similar feature maps being generated during the dimension-raising operation [23,24]. The BN layer normalizes the features of an entire batch, using the mean and variance of that batch to approximate the mean and variance of all the data. Because the amount of sonar image data is small and the batch size is small, heavy use of BN layers in small-batch training degrades this approximation and increases the difficulty of network learning. We refer to the residual block structure of the baseline model as RB0 and to the residual block structures proposed in this paper as RB1, RB2, RB3, and RB4, shown in Figure 6. Our proposed RB structures move the AF layer ahead of the convolution, pre-activating the CONV layer, and delete one BN layer. RB3 and RB4 remove one additional AF layer from RB1 and RB2, respectively, for comparison; a generic pre-activation block along these lines is sketched below.
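The precise RB1–RB4 layouts are those shown in Figure 6, so the following block is only an assumption in the spirit of RB1/RB2: the activation is moved ahead of the convolutions and a single BN layer is kept.

```python
import torch
import torch.nn as nn

class PreActBlock(nn.Module):
    # Pre-activation residual block with one BN layer, in the spirit of RB1/RB2.
    # The exact ordering used in the paper follows Figure 6.
    def __init__(self, channels: int, act=nn.ReLU):
        super().__init__()
        self.bn = nn.BatchNorm2d(channels)
        self.act = act()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.conv1(self.act(self.bn(x)))   # BN and activation before conv
        out = self.conv2(self.act(out))
        return out + x                           # identity shortcut
```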
4. A small batch input also reduces the stability of network training. Inspired by densely connected networks [25], this paper proposes a new branch structure: the weight reuse structure (WRS). A schematic diagram of the network combined with the WRS is shown in Figure 7. The WRS reuses the high-order feature matrices output by the last layer of each part of the network; these features are re-extracted and downsampled by the convolutional layers on the branches and then dimensionally reduced and fused. The depths of the convolution kernels on the three branches are all 64, and their sizes are [3, 5, 5]. The feature fusion proceeds as follows: the features from different channels are flattened, concatenated in sequence, and passed through a fully connected layer. The 512-dimensional feature from the trunk is sent to an FC layer with 64 outputs, and the feature fused in the WRS is sent to another; finally, the outputs of the two FC layers are concatenated and sent to a fully connected layer whose output size is the number of categories. A hedged sketch of this fusion head is given below.
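In the sketch, the branch depths of 64, the kernel sizes [3, 5, 5], and the two 64-dimensional FC layers follow the description above, while the incoming channel counts (128/256/512), the global pooling before flattening, and all names are assumptions made only so the example runs.

```python
import torch
import torch.nn as nn

class WRSHead(nn.Module):
    def __init__(self, branch_channels=(128, 256, 512), num_classes: int = 6):
        super().__init__()
        kernel_sizes = (3, 5, 5)               # branch kernel sizes from the text
        self.branches = nn.ModuleList([
            nn.Sequential(
                # re-extract and downsample the reused feature map (depth 64)
                nn.Conv2d(c, 64, kernel_size=k, stride=2, padding=k // 2, bias=False),
                nn.AdaptiveAvgPool2d(1),       # collapse spatial dims before flattening
            )
            for c, k in zip(branch_channels, kernel_sizes)
        ])
        self.fc_branch = nn.Linear(64 * 3, 64)  # fused WRS feature -> 64
        self.fc_trunk = nn.Linear(512, 64)      # 512-d trunk feature -> 64
        self.classifier = nn.Linear(64 + 64, num_classes)

    def forward(self, branch_feats, trunk_feat):
        # branch_feats: feature maps reused from three parts of the trunk
        # trunk_feat:   the 512-d feature from the main path, shape (N, 512)
        b = torch.cat([m(f).flatten(1) for m, f in zip(self.branches, branch_feats)], dim=1)
        fused = torch.cat([self.fc_branch(b), self.fc_trunk(trunk_feat)], dim=1)
        return self.classifier(fused)
```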

2.3. The Proposed GAN

2.3.1. DCGAN

A generative adversarial network is an unsupervised deep learning model whose framework consists of a generator model and a discriminative model. The generator randomly generates data from certain hidden-layer information, and the discriminator estimates whether a sample comes from the real training data or from the generator. A common problem when training a GAN is so-called mode collapse, where the generator produces the same image regardless of the input noise. The DCGAN proposed in 2015 used convolutional neural networks to build the generator and the discriminator, which alleviated the problems of unstable GAN training, mode collapse, and internal covariate shift [26]. Extensive studies have shown that the generator and discriminator of DCGAN can learn rich hierarchical representations of object components and backgrounds. Because the GAN proposed in [26] is designed to generate 64 × 64 images from optical data, which does not suit the dataset and experimental conditions of our task, we build our GAN on the basis of DCGAN to produce high-quality sonar images.

2.3.2. Our Structure (LN-PGAN)

As shown in Figure 8, the generator model designed in this paper first samples 100-dimensional random noise, maps it to a vector of size 50,176 through a fully connected layer, and reshapes it into a 7 × 7 × 1024 matrix. The length and width of the matrix are then expanded by deconvolution operations with a 3 × 3 kernel and a stride of 2. Before each deconvolution, the feature values are normalized by batch normalization and passed through a PReLU activation function to accelerate convergence and improve the learning ability of the network. After five deconvolution layers, the generator outputs 224 × 224 × 3 generated data. The data generated by the generator and the real data are shuffled and fed into the discriminative model. In the discriminative model, image features are extracted layer by layer through four convolutional layers with a stride of 2, and a single value is output through a fully connected layer. Finally, this value passes through a sigmoid activation function, and the discriminative model outputs the estimated probability of the image's source. Between adjacent convolutional layers, the feature matrix is also processed by normalization and an activation function, and dropout is used to reduce overfitting. In particular, in the discriminative model we replace BN with layer normalization (LN), and we use the PReLU activation function instead of LeakyReLU in both models. As mentioned earlier, BN normalizes over the batch, width, and height dimensions, which works poorly with small batch sizes, whereas LN normalizes over the channel, width, and height dimensions and is independent of the batch size.
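A minimal sketch of such a generator/discriminator pair is given below. The 100-dimensional noise, the 7 × 7 × 1024 reshape, the five stride-2 3 × 3 deconvolutions, the four stride-2 convolutions, and the BN/LN/PReLU/dropout placement follow the description above; the channel schedules, the dropout rate, the final Tanh, and the use of GroupNorm with one group to realize LN over feature maps are our assumptions.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    # 100-d noise -> FC to 7*7*1024 -> five stride-2 transposed 3x3 convolutions
    # up to 224x224x3, with BN and PReLU applied before each deconvolution.
    # Channel schedule 1024 -> 512 -> 256 -> 128 -> 64 -> 3 is an assumption.
    def __init__(self, z_dim: int = 100):
        super().__init__()
        self.fc = nn.Linear(z_dim, 7 * 7 * 1024)
        chans = [1024, 512, 256, 128, 64]
        layers = []
        for i, c in enumerate(chans):
            out_c = chans[i + 1] if i + 1 < len(chans) else 3
            layers += [nn.BatchNorm2d(c), nn.PReLU(),
                       nn.ConvTranspose2d(c, out_c, 3, stride=2,
                                          padding=1, output_padding=1)]
        self.deconv = nn.Sequential(*layers, nn.Tanh())

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        x = self.fc(z).view(-1, 1024, 7, 7)
        return self.deconv(x)                   # (N, 3, 224, 224)

class Discriminator(nn.Module):
    # Four stride-2 convolutions with layer normalization (GroupNorm with one
    # group normalizes over C, H, W), PReLU and dropout, then FC and sigmoid.
    def __init__(self):
        super().__init__()
        chans = [3, 64, 128, 256, 512]
        blocks = []
        for c_in, c_out in zip(chans[:-1], chans[1:]):
            blocks += [nn.Conv2d(c_in, c_out, 3, stride=2, padding=1),
                       nn.GroupNorm(1, c_out), nn.PReLU(), nn.Dropout(0.3)]
        self.features = nn.Sequential(*blocks)
        self.fc = nn.Linear(512 * 14 * 14, 1)   # 224 -> 112 -> 56 -> 28 -> 14

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.fc(self.features(x).flatten(1)))
```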

2.3.3. Image Evaluation

In addition to subjective analysis, the peak signal-to-noise ratio (PSNR) is generally used to compare the similarity of two images. We can judge the quality of the images generated by the GAN by calculating the PSNR between a sample image and a generated image. For two images $I$ and $K$ of size $m \times n$, the PSNR is defined as
$$PSNR = 10 \log_{10}\left(\frac{MAX_{I}^{2}}{MSE}\right)$$
where $MAX_{I}$ is the maximum pixel value of image $I$, and the mean square error (MSE) is defined as
$$MSE = \frac{1}{mn}\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}\left[I(i,j) - K(i,j)\right]^{2}$$
In addition, the structural similarity (SSIM) evaluates the similarity of two images in terms of their luminance $l$, contrast $c$, and structure $s$, and is an effective index for measuring image similarity:
$$SSIM(I, K) = l(I, K) \times c(I, K) \times s(I, K)$$
where
$$l(I, K) = \frac{2\mu_{I}\mu_{K} + c_{1}}{\mu_{I}^{2} + \mu_{K}^{2} + c_{1}}, \quad c_{1} = (0.01 \times 255)^{2}$$
$$c(I, K) = \frac{2\sigma_{I}\sigma_{K} + c_{2}}{\sigma_{I}^{2} + \sigma_{K}^{2} + c_{2}}, \quad c_{2} = (0.03 \times 255)^{2}$$
$$s(I, K) = \frac{\sigma_{IK} + c_{3}}{\sigma_{I}\sigma_{K} + c_{3}}, \quad c_{3} = 0.5\, c_{2}$$
where $\mu_{I}$ and $\mu_{K}$ are the mean values of $I$ and $K$, $\sigma_{I}$ and $\sigma_{K}$ are their standard deviations, and $\sigma_{IK}$ is their covariance. The larger the PSNR, the more similar the two images are; the closer the SSIM is to 1, the closer the image generated by the GAN is to the real image.
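A small sketch of how these metrics can be computed follows. PSNR is evaluated directly from the definition above, and the SSIM call relies on scikit-image's structural_similarity, which uses the same c1/c2 constants; channel_axis=-1 assumes RGB images and scikit-image >= 0.19.

```python
import numpy as np
from skimage.metrics import structural_similarity

def psnr(img_i: np.ndarray, img_k: np.ndarray, max_val: float = 255.0) -> float:
    """PSNR between two images, following the definition above."""
    mse = np.mean((img_i.astype(np.float64) - img_k.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")   # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

def ssim(img_i: np.ndarray, img_k: np.ndarray) -> float:
    """SSIM computed by scikit-image with the standard c1/c2 constants."""
    return structural_similarity(img_i, img_k, data_range=255, channel_axis=-1)
```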

2.4. Transfer Learning

Transfer learning applies the knowledge of a source domain and its learning task to the learning of a prediction function $f_{T}(x)$ on the target domain. Fine-tuning is one transfer learning method: some layers of a pretrained model, usually the first few convolutional layers, are frozen so that their parameters do not change during training, which benefits the extraction of low-level image features. The remaining, unfrozen layers are then trained, maintaining some adaptability in the network. Mini-ImageNet is a subset of ImageNet with 100 categories and 600 images per category. We pretrain and fine-tune our network model on mini-ImageNet to address the problems of insufficient network training and low weight utilization.
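A minimal sketch of this freezing scheme is shown below; torchvision's ResNet-18 with its ImageNet weights stands in for the paper's own model and mini-ImageNet pretraining, and the five-part split is our reading of the freezable "parts".

```python
import torch.nn as nn
from torchvision.models import resnet18

def build_finetuned(num_classes: int = 6, n_frozen: int = 3) -> nn.Module:
    # Load pretrained weights, freeze the first n_frozen parts, and replace
    # the classifier head with a six-way output.
    model = resnet18(weights="IMAGENET1K_V1")
    parts = [nn.Sequential(model.conv1, model.bn1, model.relu, model.maxpool),
             model.layer1, model.layer2, model.layer3, model.layer4]
    for part in parts[:n_frozen]:
        for p in part.parameters():
            p.requires_grad = False           # frozen layers keep pretrained weights
    model.fc = nn.Linear(model.fc.in_features, num_classes)
    return model
```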

2.5. The Fusion Few-Shot Strategy

In the previous sections, we improved ResNet-18 as a classifier, built a generative adversarial network to expand the dataset, and used the fine-tuning method of transfer learning to initialize and pretrain the network parameters. This section combines these optimization strategies and proposes a small-sample sonar image classification method that fuses them. Figure 9 presents a schematic diagram of the proposed strategy. The specific steps are as follows:
  • Divide the dataset into training set, validation set, and testing set;
  • Input real data into LN-PGAN, generate synthetic data through the training of generative adversarial network, and expand the training set in the dataset according to certain principles;
  • Perform data preprocessing operations, such as noise reduction and data augmentation, with the training set composed of real data and synthetic data, as the training data of the network;
  • Based on transfer learning, the network parameters pretrained with a large amount of data are imported into ResNet-ACW via fine-tuning;
  • Training and classification of ResNet-ACW.

3. Results

We run our model on a desktop workstation with an AMD Ryzen 9 3900X CPU, an RTX 3080 GPU, 64 GB of RAM, and Windows 10 as the operating system. To ensure credibility, each reported classification accuracy is the average of five runs.

3.1. Performance Comparison of Classical Networks on Sonar Image Datasets

We build AlexNet, VGG, and ResNet with different numbers of layers to compare their performance on our dataset and calculate the parameters and floating-point operations (FLOPs) of each network. For a convolution kernel of size $h \times w$, the number of parameters of a convolutional layer is
$$Parameters = h \times w \times C_{in} \times C_{out}$$
where $C_{in}$ and $C_{out}$ are the depths of the input and output feature matrices. The FLOPs of a convolutional layer are
$$FLOPs = H \times W \times C_{out} \times (2 \times C_{in} \times h \times w)$$
where $H$ and $W$ are the height and width of the layer's output feature matrix. The FLOPs of a fully connected layer are
$$FLOPs(FC) = O \times (2 \times I - 1)$$
where $I$ and $O$ are the numbers of input and output neurons of the fully connected layer. Table 2 shows the network parameters, FLOPs, the test-set accuracy averaged over five experiments on the sonar image dataset, and the running time of a single experiment for AlexNet, VGG, ResNet, and their variants. Compared with VGG, ResNet uses fewer computing resources and less computing time while obtaining the best classification results, which is consistent with reference [11].
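The sketch below evaluates the three formulas above for a single layer; the example numbers (the 7 × 7 stem of ResNet-18 on a 224 × 224 input) are ours.

```python
def conv_params(k_h: int, k_w: int, c_in: int, c_out: int) -> int:
    """Parameters of a convolutional layer (bias ignored), per the formula above."""
    return k_h * k_w * c_in * c_out

def conv_flops(k_h: int, k_w: int, c_in: int, c_out: int, h_out: int, w_out: int) -> int:
    """FLOPs of a convolutional layer, per the formula above."""
    return h_out * w_out * c_out * (2 * c_in * k_h * k_w)

def fc_flops(n_in: int, n_out: int) -> int:
    """FLOPs of a fully connected layer: O x (2I - 1)."""
    return n_out * (2 * n_in - 1)

# e.g. the 7x7 stem of ResNet-18 on a 224x224 input (output 112x112x64):
print(conv_params(7, 7, 3, 64))            # 9408 parameters
print(conv_flops(7, 7, 3, 64, 112, 112))   # about 2.36e8 FLOPs
```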

3.2. Results of Different Optimization on ResNet-ACW

3.2.1. Activation Function

We replace the ReLU activation function used in the baseline model with different activation functions, including ELU, Swish, and PReLU. Table 3 shows the test results. The Swish activation function achieves the best performance when applied to the baseline model, while the ELU activation function performs the worst. We therefore choose Swish as the activation function for the optimized model.

3.2.2. Convolution Kernel Decomposition

We propose two convolution kernel decomposition structures in Figure 5, named De1 and De2. To verify their effect on network performance, the first part (De0) of the baseline network is replaced by each decomposition structure, and training is carried out in the same experimental environment and under the same standard. The results are shown in Table 4: both proposed decomposition structures improve the test accuracy of the baseline model with only a marginal increase in parameters and FLOPs. With more convolution kernels, the number of channels available for feature extraction grows, the network can extract more detailed information, and classification accuracy improves. Compared with a single large kernel, a stack of small kernels provides more kernels for the same receptive field; the extra feature-extraction channels improve accuracy while the parameter count barely changes, which is the advantage of decomposing large kernels into small ones. The additional memory accesses, however, are more time-consuming, so the decomposition structure increases the running time of the network to a certain extent; the extra time cost is within an acceptable range and will shrink as computer hardware improves.

3.2.3. Residual Block

We propose four residual block structures in Figure 6, named RB1, RB2, RB3, and RB4. We train networks with the different residual block structures in the same experimental environment and under the same standard; the results are summarized in Table 5. Changing the structure of the residual block does not affect the number of parameters or the FLOPs of the network.
Although the network parameters and complexity of the several residual structures are the same, Table 5 shows that the pre-activation structure increases the training time of the network; when the convolutional layer is at the top of the structure, the data are dimensionally reduced first, and the subsequent layers process the reduced data faster. RB1 and RB2, the improved residual block structures proposed in this paper, achieve good results on the baseline model, with RB2 performing better. RB3, the comparison structure for RB1, performs worse than RB0, which indirectly confirms the importance of pre-activation. RB4, the comparison structure for RB2, performs better than RB3 and essentially the same as RB0. RB2 also outperforms RB1, which indicates that placing the BN layer before the first convolutional layer works better than placing it after. To sum up, for the sonar image dataset in this paper, the residual structure should be optimized as follows:
1. The pre-activated structure increases the structure's ability to extract features from the input feature matrix, thereby improving the recognition rate of the whole network, although it also increases the running time to a certain extent. The AF layer should not be deleted lightly, as doing so may cause the network to learn fewer of the important complex nonlinear features.
2. For a sonar image dataset with a small number of samples and a small batch size, deleting a certain BN layer can effectively improve the accuracy of the network. The remaining BN layer is better placed before the first convolutional layer: normalizing the batch features in advance facilitates feature extraction in the subsequent convolutional layers.

3.2.4. Weight Reuse Structure

In the previous discussion, we analyzed the advantages and disadvantages of large and small convolution kernels. Because small kernels can re-extract the detailed front-end features and large kernels can organize the high-order back-end features, the kernel sizes on the different branches of the WRS should stay the same or increase from front to back. As shown in Figure 7, we refer to the convolutional layers on the first, second, and third branches as convolutional layers A, B, and C, respectively. To choose an optimal structure, Table 6 summarizes the recognition performance when A, B, and C are composed of square convolution kernels of different sizes; all three layers have a depth of 64, and '1', '3', and '5' in the table denote the side lengths of the square kernels. The results show that the WRS performs best when A is 3, B is 5, and C is 5, and worse when A, B, and C are all 3, which confirms the earlier discussion of the roles of kernels of different sizes. We therefore choose A = 3, B = 5, and C = 5 as the WRS configuration in this paper.

3.2.5. Combination of the Optimized Structure of the Network

In the previous sections, we discussed several structural improvements and drew conclusions about improving the network structure for the sonar image dataset in this paper. However, these conclusions are based on individual changes to the baseline shallow residual network; the combinations of the improved structures have not yet been discussed, so it is not clear which combination yields the best training and classification results. To obtain an optimal result, we combine the different activation functions (PReLU, ELU, and Swish), decomposition structures (De1 and De2), and residual structure optimizations (RB1 and RB2) and conduct training experiments. The results are shown in Table 7.
Some regularities emerge from the comparison. As discussed earlier, the Swish function achieves the best performance among the activation functions, and the combinations with the ELU function perform the worst. The decomposition structure De2 performs significantly better than De1, and the combinations with RB2 achieve better results. ResNet-18+Swish+De2+RB2 achieves the best result of 86.27%, which is 4.53% higher than the average accuracy of the baseline model. Through the permutation and combination of the structure optimizations, we obtain the two combinations with the best accuracy and then combine each of them with the WRS. We call the combination Swish+De1+RB2+WRS S1 and the combination Swish+De2+RB2+WRS S2. We also apply this structural optimization to deeper ResNets, and Table 8 shows the training results. We name the optimal combination, ResNet-18+Swish+De2+RB2+WRS, ResNet-ACW. The test results show that our optimization method is effective for residual networks of different depths.
Figure 10 shows the confusion matrix of ResNet-18, and Figure 11 shows the confusion matrix of ResNet-ACW. Comparing the confusion matrices, we find that the confusion between the aircraft and the sunken ship is severe when classifying with ResNet-18, and our method effectively alleviates this problem. At the same time, the error rates for the aircraft and dummy classes remain higher, which may be due to their small number of samples and the failure of the network to learn their characteristics well. Therefore, we need an effective few-shot strategy to solve this problem.

3.3. LN-PGAN

According to the working principle of a GAN, we select several similar sonar images within a class as real data and then generate, through the GAN, images that are similar but not identical to the real data. The generated images retain the structural features of the original images while differing from them to a certain extent; we call them synthetic data. Taking the shipwreck sonar images as an example, we input the eight pictures shown in Figure 12 into the LN-PGAN constructed in this paper and record the pictures output by the generator model every 100 iterations. Figure 13 shows the sunken-ship sonar image generated after 5000 iterations. From this example, it can be seen that the generator starts from random noise, gradually learns the contour information of the sonar image, reduces the noise, and finally learns finer contours. In its details, the generated image is very close to the real sonar image, but it also incorporates features of the other real samples, so it is not identical to any original image.
We also build the DCGAN [26], LSGAN [27], and WGAN [28] for comparison; the results after the 500th, 1000th, 2000th, and 5000th iterations are shown in Figure 14. We take 10 pictures generated by each of the four GANs after 5000 iterations, calculate the PSNR and SSIM against the real input pictures according to Equations (6) and (10), and average the values. The results are summarized in Table 9: the images generated by the LN-PGAN are better than those generated by the DCGAN, LSGAN, and WGAN in terms of both PSNR and SSIM.
Compared with the DCGAN and LSGAN, the LN-PGAN generates less image noise and converges faster, which demonstrates the effectiveness of our optimization of the generator and discriminative models. By observing the generated images, we find that the DCGAN and LSGAN do not learn the main features of the source images, which leads to poor overall retention in the generated images. Although the WGAN converges fastest, its images contain too much noise and lose too much information from the source images to yield usable sonar images. The images generated by the LN-PGAN obtain the highest PSNR and SSIM, 20.09 dB and 0.46, respectively, far higher than the other GANs; the higher PSNR and SSIM indicate that our images are closer to real sonar images. At the same time, the SSIM value of 0.46 shows that the generated images still differ from the source images, broadening the features available for the network to learn and avoiding the problems reported in [19].
We expand the samples according to the classification accuracy of the network and test the baseline network and ResNet-ACW on the expanded dataset. Table 10 shows that ResNet-ACW achieves 93.77% accuracy when the dataset is reasonably expanded with the synthetic data generated by the GAN. Regularly enlarging the data through the GAN is an effective and practical method that reduces network overfitting and mitigates the shortage of underwater sonar image data. It can be expected that continuing to expand the dataset would further reduce overfitting to a certain extent, at the cost of increased training time.

3.4. Transfer Learning

We partition the mini-ImageNet dataset with the same method used in this paper and then pretrain and fine-tune our model on the partitioned data. For ResNet, the first five parts are available for freezing; we denote the number of frozen parts by $N_f$, where $N_f = 5$ means that the first five parts are frozen and $N_f = 0$ means that no network parameters are frozen. Table 11 shows the fine-tuning test results of the baseline model and ResNet-ACW. As the results show, fine-tuning can greatly improve the accuracy of the network, indicating that the parameters pretrained on mini-ImageNet are successfully transferred to the network used for sonar image recognition. When $N_f = 3$, both the baseline and ResNet-ACW achieve their best recognition rates.

3.5. The Fusion Strategy

We combine each optimization proposed in this paper according to the process shown in Figure 9 and compare the training process under this combined strategy with several recent network architectures, namely EfficientNet [29], ConvNext [30], and SwinTransformer [31]. Figure 15 shows the validation accuracy curves of these networks, and Figure 16 shows their training losses. Table 12 shows the test accuracies of these networks and of our method. ResNet-ACW with the fusion strategy achieves the best test-set classification accuracy of 95.93%, an improvement of 14.19% over the baseline model. The training results show that, in contrast to the state-of-the-art convolutional and transformer models, the training process of our method is more stable and converges faster.
Figure 17, Figure 18, Figure 19 and Figure 20 show the confusion matrices of ResNet-ACW+GAN+TL, SwinTransformer, ConvNext, and EfficientNet, respectively. Compared with Figure 11, our method significantly improves the discrimination of the easily confused aircraft and sunken ship classes and improves the overall classification accuracy. Observing the state-of-the-art classification models, we find that although they classify better than ResNet-18, they still fail to separate the confusable items accurately. Our proposed network model outperforms these models even without the few-shot fusion strategy; with the GAN and TL added, our method shows no obvious confusion, and its classification accuracy is much higher than that of these methods. These results verify the reliability of our method.

4. Conclusions and Discussion

This work contributes to the classification task for sonar images. We built a sonar image dataset of 786 images with RGB and single-channel data, containing six categories of targets. By comparing common sonar image classifiers such as AlexNet, VGG, and ResNet, we found that ResNet-18 performs best on our dataset. To classify the sonar images better, we improved ResNet-18 in four aspects, namely the activation function, convolution kernel decomposition, the residual block, and the weight reuse structure, and proposed our new network model, ResNet-ACW. The improvements to the activation function and the residual block did not increase the parameters, FLOPs, or calculation time of the network but achieved better classification results. The convolution kernel decomposition structure increased the FLOPs and a few parameters and improved the classification accuracy at the expense of some calculation time. The weight reuse structure made the training process more stable and improved the ability of the network to extract image features. The proposed ResNet-ACW achieved 87.01% accuracy, which is 5.27% higher than ResNet-18 and even exceeds the state-of-the-art classifier models. By analyzing the confusion matrices, we found that although ResNet-ACW improved the accuracy, confusion remained between some categories, caused by insufficient learning of the sonar image features. Therefore, a few-shot strategy was needed to optimize the classification process.
We proposed a fusion few-shot strategy based on a GAN and transfer learning, consisting of a novel GAN (LN-PGAN) and the fine-tuning method of transfer learning. We improved the generator and discriminative models and proposed the LN-PGAN on the basis of the DCGAN. Compared with the DCGAN, WGAN, and LSGAN, our GAN achieved a higher PSNR and SSIM, meaning that the synthetic images are closer to real sonar image data. At the same time, the synthetic data generated by the LN-PGAN differed from the source images while retaining sonar image features, providing additional features for the network to learn. We used the fine-tuning method of transfer learning to pretrain the network and obtained the best effect by freezing different numbers of layers; fine-tuning improved the feature extraction ability of the network's convolution kernels by freezing some layer parameters, which reduced the number of trainable parameters and effectively reduced overfitting. After the fusion optimization, our method achieved 95.93% classification accuracy on the sonar image dataset. The training and loss curves show that our method avoids the overfitting and low-accuracy problems of small-sample training; meanwhile, our algorithm fits faster and trains more stably, making it more robust. According to the confusion matrices, our algorithm overcomes the misclassification of confusable items and accurately extracts the key features that distinguish the different categories.
However, limited by the way the sonar images were collected, the datasets we built are not very diverse and suffer from data imbalance. In future work, we will continue to supplement our sonar image data, including targets captured at closer range and in worse environments, and further study the training problems caused by sample imbalance. We also observed shadows in some sonar images, often close to the target itself; it is difficult to remove such shadows with methods such as saliency segmentation, or to trace which part of the original image a feature block in the deep feature maps extracted by the network comes from. Meanwhile, some sonar systems provide additional information, such as backscatter maps, when the images are formed; how to feed this information into the network as prior knowledge will also be part of our follow-up work.
In summary, we built a new network model, ResNet-ACW, for sonar image datasets, which works well on both RGB and single-channel image data. To address the overfitting and low classification accuracy of deep learning models under small samples, we proposed a new generative adversarial network and combined it with the fine-tuning method of transfer learning. The results show that the network structure optimization designed for sonar images and the training optimization based on the fusion strategy can effectively improve sonar image classification accuracy and reduce network overfitting. Our algorithm achieves a classification accuracy of 95.93% on the six-category target sonar image dataset, which is 14.19% higher than the baseline model.

Author Contributions

Conceptualization, Z.D.; methodology, Z.D.; software, Z.D.; validation, Z.D., H.L. and T.D.; formal analysis, Z.D.; investigation, Z.D. and T.D.; data curation, Z.D.; writing—original draft preparation, Z.D.; writing—review and editing, H.L. and T.D.; supervision, H.L.; project administration, H.L.; funding acquisition, H.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China under grant number 61971354.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Radeta, M.; Zuniga, A.; Motlagh, N.H.; Liyanage, M.; Freitas, R.; Youssef, M.; Nurmi, P. Deep Learning and the Oceans. Computer 2022, 55, 39–50.
  2. Xu, H.; Yang, L.; Long, X. Underwater Sonar Image Classification with Small Samples Based on Parameter-Based Transfer Learning and Deep Learning. In Proceedings of the 2022 Global Conference on Robotics, Artificial Intelligence and Information Technology (GCRAIT), Chicago, IL, USA, 30–31 July 2022.
  3. Krishnan, H.; Lakshmi, A.A.; Anamika, L.S.; Athira, C.H.; Alaikha, P.V.; Manikandan, V.M. A Novel Underwater Image Enhancement Technique Using ResNet. In Proceedings of the 2020 IEEE 4th Conference on Information & Communication Technology (CICT), Chennai, India, 3–5 December 2020; pp. 1–5.
  4. Nguyen, H.-T.; Lee, E.-H.; Lee, S. Study on the Classification Performance of Underwater Sonar Image Classification Based on Convolutional Neural Networks for Detecting a Submerged Human Body. Sensors 2020, 20, 94.
  5. Liu, F.; Ding, H.; Li, D.; Wang, T.; Luo, Z.; Chen, L. Few-Shot Learning with Data Enhancement and Transfer Learning for Underwater Target Recognition. In Proceedings of the 2021 OES China Ocean Acoustics (COA), Harbin, China, 14–17 July 2021.
  6. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
  7. Neupane, D.; Seok, J. Bearing Fault Detection and Diagnosis Using Case Western Reserve University Dataset with Deep Learning Approaches: A Review. IEEE Access 2020, 8, 93155–93178.
  8. Seok, J. Active sonar target classification using multi-aspect sensing and deep belief networks. Int. J. Eng. Res. Technol. 2018, 11, 1999–2008.
  9. Wu, M.; Wang, Q.; Rigall, E.; Li, K.; Zhu, W.; He, B.; Yan, T. ECNet: Efficient Convolutional Networks for Side Scan Sonar Image Segmentation. Sensors 2019, 19, 2009.
  10. Li, C.; Huang, Z.; Xu, J.; Guo, X.; Gong, Z.; Yan, Y. Multi-channel underwater target recognition using deep learning. Acta Acust. 2020, 45, 506–514.
  11. Qin, X.; Luo, X.; Wu, Z.; Shang, J. Optimizing the Sediment Classification of Small Side-Scan Sonar Images Based on Deep Learning. IEEE Access 2021, 9, 29416–29428.
  12. Jin, L.; Liang, H.; Yang, C. Sonar image recognition of underwater target based on convolutional neural network. J. Northwestern Polytech. Univ. 2021, 39, 285–291.
  13. Li, C.; Ye, X.; Cao, D.; Hou, J.; Yang, H. Zero shot objects classification method of side scan sonar image based on synthesis of pseudo samples. Appl. Acoust. 2021, 173, 107691.
  14. Huang, C.; Zhao, J.; Yu, Y.; Zhang, H. Comprehensive Sample Augmentation by Fully Considering SSS Imaging Mechanism and Environment for Shipwreck Detection under Zero Real Samples. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–14.
  15. Zhou, Y.; Chen, S.-C.; Wu, K.; Ning, M.-Q.; Chen, H.-K.; Zhang, P. SCTD 1.0: Sonar Common Target Detection Dataset. Comput. Sci. 2021, 48, 334–339.
  16. Chen, Y.; Liang, H.; Pang, S. Study on Small Samples Active Sonar Target Recognition Based on Deep Learning. J. Mar. Sci. Eng. 2022, 10, 1144.
  17. Tang, Y.; Jin, S.; Bian, G.; Zhang, Y.; Li, F. The transfer learning with convolutional neural network method of side-scan sonar to identify wreck images. Acta Geod. Cartogr. Sin. 2021, 50, 260–269.
  18. Huo, G.; Wu, Z.; Li, J. Underwater Object Classification in Sidescan Sonar Images Using Deep Transfer Learning and Semisynthetic Training Data. IEEE Access 2020, 8, 47407–47418.
  19. Steiniger, Y.; Kraus, D.; Meisen, T. Generating Synthetic Sidescan Sonar Snippets Using Transfer-Learning in Generative Adversarial Networks. J. Mar. Sci. Eng. 2021, 9, 239.
  20. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
  21. Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A Simple Way to Prevent Neural Networks from Overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958.
  22. Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the Inception Architecture for Computer Vision. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 2818–2826.
  23. Han, K.; Wang, Y.; Tian, Q.; Guo, J.; Xu, C.; Xu, C. GhostNet: More Features from Cheap Operations. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 27 June–1 July 2020; pp. 1577–1586.
  24. Chu, Y.; Wang, J.; Zhang, X.; Liu, H. Image Classification Algorithm Based on Improved Deep Residual Network. J. Univ. Electron. Sci. Technol. China 2021, 50, 243–248.
  25. Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely Connected Convolutional Networks. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 2261–2269.
  26. Radford, A.; Metz, L.; Chintala, S. Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. arXiv 2015, arXiv:1511.06434.
  27. Qi, G. Loss-Sensitive Generative Adversarial Networks on Lipschitz Densities. Int. J. Comput. Vis. 2020, 128, 1118–1140.
  28. Arjovsky, M.; Chintala, S.; Bottou, L. Wasserstein GAN. arXiv 2017, arXiv:1701.07875.
  29. Tan, M.; Le, Q. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. In Proceedings of the International Conference on Machine Learning (ICML), Long Beach, CA, USA, 10–15 June 2019.
  30. Liu, Z.; Mao, H.; Wu, C.-Y.; Feichtenhofer, C.; Darrell, T.; Xie, S. A ConvNet for the 2020s. arXiv 2022, arXiv:2201.03545.
  31. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Guo, B. Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows. arXiv 2021, arXiv:2103.14030.
Figure 1. The context of this paper.
Figure 2. Sonar image in the dataset.
Figure 3. The structure of ResNet-18.
Figure 4. Convolution kernel decomposition.
Figure 5. Structure of De1 and De2.
Figure 6. Residual block structures.
Figure 7. Network structure combined with the WRS.
Figure 8. Flow of LN-PGAN.
Figure 9. Flow of the fusion few-shot strategy.
Figure 10. Confusion matrix of ResNet-18.
Figure 11. Confusion matrix of ResNet-ACW.
Figure 12. Input images of the LN-PGAN.
Figure 13. Output image of LN-PGAN.
Figure 14. Output image of the contrastive GANs.
Figure 15. Validation accuracy curves of the networks.
Figure 16. Train losses of the networks.
Figure 17. Confusion matrix of ResNet-ACW+GAN+TL.
Figure 18. Confusion matrix of SwinTransformer.
Figure 19. Confusion matrix of ConvNext.
Figure 20. Confusion matrix of EfficientNet.
Table 1. The quantity of samples in each category.

Category | Aircraft | Sunken Ship | Dummy | Tire | Pipe | Rock | Total
Quantity | 87 | 379 | 28 | 112 | 70 | 110 | 786
Table 2. Performance summary of several classical networks on sonar image datasets.

Network | Parameters | FLOPs | Test Accuracy | Training Time
AlexNet | 14.60 M | 0.62 G | 74.13% | 1150 s
VGG-11 | 64.82 M | 15.12 G | 76.46% | 2703 s
VGG-13 | 65.00 M | 22.54 G | 77.65% | 3619 s
VGG-16 | 70.31 M | 30.88 G | 77.12% | 4199 s
VGG-19 | 75.62 M | 39.20 G | 78.13% | 4795 s
ResNet-18 | 11.18 M | 3.64 G | 81.74% | 1583 s
ResNet-34 | 21.29 M | 7.36 G | 81.03% | 2027 s
ResNet-50 | 25.05 M | 8.24 G | 80.69% | 2886 s
ResNet-101 | 44.04 M | 15.70 G | 81.55% | 4097 s
Table 3. Test results of different activation functions on the baseline model.

Network | Activation Function | Test Accuracy | Accuracy Increment | Training Time
Baseline Model | ReLU | 81.74% | / | 1583 s
Baseline Model | PReLU | 82.21% | 0.47% | 1649 s
Baseline Model | ELU | 81.43% | −0.31% | 1596 s
Baseline Model | Swish | 82.85% | 1.11% | 1787 s
Table 4. Test results of the convolution kernel decomposition structure.

Network | De Structure | Parameters | FLOPs | Test Accuracy | Accuracy Increment | Training Time
Baseline Model | De0 | 11.18 M | 3.64 G | 81.74% | / | 1583 s
Baseline Model | De1 | 11.20 M | 4.10 G | 83.82% | 2.08% | 1783 s
Baseline Model | De2 | 11.20 M | 4.10 G | 83.14% | 1.40% | 1778 s
Table 5. Test results of the residual block structure.

Network | Residual Block | Test Accuracy | Accuracy Increment | Training Time
Baseline Model | RB0 | 81.74% | / | 1583 s
Baseline Model | RB1 | 83.48% | 1.74% | 1605 s
Baseline Model | RB2 | 83.81% | 2.07% | 1626 s
Baseline Model | RB3 | 81.09% | −0.65% | 1590 s
Baseline Model | RB4 | 81.75% | 0.01% | 1595 s
Table 6. Effect of the size of the convolution kernel on the WRS branch.

Network | A | B | C | Test Accuracy | Accuracy Increment
Baseline Model | / | / | / | 81.74% | /
ResNet-18+WRS | 1 | 1 | 1 | 83.04% | 1.30%
ResNet-18+WRS | 1 | 1 | 3 | 83.42% | 1.68%
ResNet-18+WRS | 1 | 3 | 3 | 83.63% | 1.89%
ResNet-18+WRS | 1 | 3 | 5 | 84.15% | 2.41%
ResNet-18+WRS | 1 | 5 | 5 | 84.11% | 2.37%
ResNet-18+WRS | 3 | 3 | 3 | 83.17% | 1.43%
ResNet-18+WRS | 3 | 3 | 5 | 84.04% | 2.30%
ResNet-18+WRS | 3 | 5 | 5 | 84.92% | 3.18%
ResNet-18+WRS | 5 | 5 | 5 | 84.63% | 2.89%
Table 7. Test results of different structural combinations.

Network | Activation Function | Decomposition | Residual Block | Test Accuracy | Accuracy Increment
Baseline Model | PReLU | De1 | RB1 | 83.12% | 1.38%
Baseline Model | PReLU | De1 | RB2 | 84.13% | 2.43%
Baseline Model | PReLU | De2 | RB1 | 84.81% | 3.07%
Baseline Model | PReLU | De2 | RB2 | 85.48% | 3.74%
Baseline Model | ELU | De1 | RB1 | 82.78% | 1.04%
Baseline Model | ELU | De1 | RB2 | 83.18% | 1.44%
Baseline Model | ELU | De2 | RB1 | 83.11% | 1.37%
Baseline Model | ELU | De2 | RB2 | 83.13% | 1.39%
Baseline Model | Swish | De1 | RB1 | 83.44% | 1.70%
Baseline Model | Swish | De1 | RB2 | 85.82% | 4.08%
Baseline Model | Swish | De2 | RB1 | 85.68% | 3.94%
Baseline Model | Swish | De2 | RB2 | 86.27% | 4.53%
Table 8. Test results of S1 and S2 structures combined with ResNet of different depths.

Network | Test Accuracy | Accuracy Increment | Training Time
ResNet-18-S1 | 86.84% | 5.1% | 2327 s
ResNet-ACW | 87.01% | 5.27% | 2484 s
ResNet-34-S1 | 85.94% | 4.91% | 3143 s
ResNet-34-S2 | 86.37% | 5.34% | 3086 s
ResNet-50-S1 | 85.72% | 5.03% | 4687 s
ResNet-50-S2 | 86.10% | 5.41% | 4312 s
ResNet-101-S1 | 86.19% | 4.64% | 6051 s
ResNet-101-S2 | 86.38% | 4.83% | 5933 s
Table 9. Comparison of the generated images.

GAN | PSNR (dB) | SSIM
LN-PGAN | 20.09 | 0.46
DCGAN | 14.99 | 0.33
LSGAN | 17.58 | 0.36
WGAN | 14.65 | 0.20
Table 10. Change in network accuracy after expansion of synthetic data.

Network | Acc before Expansion | Acc after Expansion | Accuracy Increment
Baseline | 81.74% | 90.35% | 8.61%
ResNet-ACW | 87.01% | 93.77% | 6.76%
Table 11. Test results of fine-tuning by changing the number of frozen layers.

Network | N_f | Test Accuracy | Accuracy Increment
Baseline | 0 | 86.11% | 4.37%
Baseline | 1 | 87.79% | 6.05%
Baseline | 2 | 87.01% | 5.27%
Baseline | 3 | 88.11% | 6.37%
Baseline | 4 | 88.04% | 6.30%
Baseline | 5 | 81.15% | −0.59%
ResNet-ACW | 0 | 89.75% | 2.74%
ResNet-ACW | 1 | 90.81% | 3.80%
ResNet-ACW | 2 | 91.12% | 4.11%
ResNet-ACW | 3 | 91.47% | 4.46%
ResNet-ACW | 4 | 91.41% | 4.40%
ResNet-ACW | 5 | 83.75% | −3.26%
Table 12. Test results of different networks and our method.

Network | Test Accuracy
ResNet-ACW+GAN+TL | 95.93%
SwinTransformer | 83.34%
ConvNext | 86.49%
EfficientNet | 85.62%
ResNet-18 | 81.74%
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
