Toward a Highly Accurate Classification of Underwater Cable Images via Deep Convolutional Neural Network

Abstract: Underwater cables and pipelines are widely used in ocean research, marine engineering, power transmission, and communications, and they must be inspected regularly for maintenance. Autonomous underwater vehicles (AUVs) commonly rely on a vision system to search for and track underwater cables. Traditional vision methods deployed on AUVs are built on handcrafted features and shallow trainable architectures. However, such methods perform poorly, or fail altogether, when tracking underwater cables in complex and fast-changing underwater conditions. In contrast, deep learning can learn deeper, high-level semantic features, which makes it well suited to underwater cable tracking. In this study


Introduction
Underwater infrastructure such as communication cables, power cables, and subsea pipelines is crucial to humankind. In particular, underwater communication cables have played an essential role in connecting the world over the last 170 years. The Asia-Pacific Economic Cooperation forum has reported that around 97% of all intercontinental data are transferred via underwater cables [1]. Global demand for data increases every year, driven by the explosion of mobile device usage and the development of cloud computing, big data, and artificial intelligence. New underwater communication cables are therefore in high demand to support faster and higher-capacity data, voice, and video transmission. Underwater power cables, typically deployed at oil and gas platforms or renewable power projects, connect topside and subsea facilities to provide power [2]. Meanwhile, underwater pipelines transport important resources such as oil and natural gas. The longest underwater pipeline, the Langeled Pipeline, transports natural gas; it is about 1200 km long, runs under the North Sea, and spans from the Ormen Lange field in Norway to the Easington Gas Terminal in the United Kingdom [3]. From 1986 to 2003, around 70% of communication cable faults were attributed to benthic fishing and ship anchors, and these faults occurred at water depths between 0 and 200 m [4]. According to the Submarine Telecoms Industry Report 2018/2019, the average time required for a repair crew to restore a cable decreased from 30 days to 26 days between 2013 and 2019. Most of this time is spent finding, tracking, and diagnosing the faulty cables [5].
Because underwater cables and pipelines are such important infrastructure, their protection is paramount. Cables are typically protected by covering them with steel wires and burying them under the seabed [6]. Pipelines operate in rough and aggressive underwater conditions and are therefore prone to leakage or failure. As a result, the oil and gas sector has developed reliable leak detection systems to monitor the state of pipelines [7]. Ortiz et al. [8] have stated that a better alternative is to inspect the cable state consistently in order to prevent damage, which includes corrosion, cracks, and damage due to human activities such as marine traffic or fishing. Accordingly, constant monitoring and inspection of underwater cables is highly recommended for the early detection of defects. Such inspection is commonly carried out with surface ships, remotely operated underwater vehicles, or both together. However, their response and mobilization times are inadequate [9], whereas inspection by human divers poses health and safety issues, especially given the difficulty of finding, tracking, inspecting, and diagnosing underwater cables over an extended period [10]. Therefore, an autonomous underwater vehicle (AUV) is strongly recommended for cable tracking and fault diagnosis, as it can collect important information using its sensors and make appropriate decisions in different conditions via its embedded intelligent algorithms [11].
A variety of sensors can be fitted on AUVs to perform underwater cable tracking, and extensive research effort has been spent on studying and using them, including vision-based sensors, sonar sensors, and magnetometers. AUVs are normally fitted with vision sensors for research purposes, typically in biological, geological, and archaeological surveys, which makes vision sensors standard equipment [2]. This type of sensor is recommended as the basis of an inspection framework for underwater cable search and detection [12], as it is less expensive, can identify faults at short distance, and consumes less power. Undersea scenes are typically captured by video cameras, after which the vision system extracts and analyzes the important information to guide the operation of the underwater robot. However, rough and dynamic underwater conditions may cause underwater images to show blurring, low contrast, and nonuniform illumination, which increases the complexity, computational time, and cost of processing them [13]. Low-contrast and poorly illuminated environments make it difficult for an object detection algorithm to distinguish the target object from the background [14]. In addition, the natural properties of water cause the scattering and absorption of light, which degrades the quality of images [15] and further increases the difficulty of object detection. Consequently, researchers have developed different techniques to restore or enhance degraded images [16], which reduce the noise in images and thus increase the classification accuracy. Lee et al. [16] applied the Jaffe-McGlamery model to restore and recover degraded color information. Restoring the color information makes it simpler for the object detection algorithm to identify the object of interest.
Commonly implemented cable tracking methods include the Hough Transform, Kalman filters, and particle filters, which incorporate an algorithm to search for the main straight line in the underwater images and reveal its position [2]. In line with this, Ortiz et al. [17] developed a system based on image segmentation, in which an image is converted to grayscale and partitioned by gray-level intensity in order to extract the cable from its environment.
Antich and Ortiz [18] used the Kalman filter to predict and identify the pose of the cable in subsequent images. Their segmentation process differs from previous research efforts: contour extraction and line extraction are included to search for the main straight line in the images, which greatly reduces the computation time. Regardless, both approaches require the number of segments to be decided manually when partitioning the images. Such static parameters may lead to false results due to the constantly changing environment and the nature of the obstacles.
Balasuriya and Ura [19] fused multiple sensors to search for underwater cables: the dead reckoning position is used to predict the cable in the images, and the cable is then located via the Hough Transform. Chen et al. [20] proposed an algorithm that applies a probabilistic Hough Transform for line detection to increase the detection speed; the line detection requires good visibility in the underwater images and 40% of the edge points. Similarly, Fatan et al. [11] proposed the use of the Hough Transform for cable tracking, with a Multilayer Perceptron (MLP) neural network and a Support Vector Machine (SVM) extracting texture information from the images. This approach combines machine learning (MLP and SVM) with the Hough Transform, which tracks a straight line in the images based on the extracted information. It should be noted that the Hough Transform approach needs a controlled environment for cable tracking, and its performance is reduced by sediment-covered pipes, non-uniform illumination, and spurious edges detected from other pipes or elements [12].
Wirth et al. [21] proposed a probabilistic approach for vision-based cable tracking with particle filters, which was tested on a group of videos recording the state of a power cable installed about 30 years ago. Ortiz et al. [8] likewise developed a cable tracking system that uses particle filters to cope with the complexity of undersea environments. For every video frame, the previously obtained probability density function of the undersea cable parameters is applied to predict the position of the cable in the subsequent frame. However, a fast change of environment impairs the algorithm's ability to predict the position of the underwater cable.
Over the last few decades, deep learning has been used successfully in many fields and technologies where large amounts of data are available for training. Advances in its methods have brought remarkable success in image identification, object detection, image classification, and face identification tasks [22]. Deep learning has also been employed in face detection [23], pedestrian detection [24], and underwater object detection [25,26]. According to O'Byrne et al. [27], a key strength of deep learning lies in its highly repurposable nature: a model trained on a large number of marine growth images can be repurposed as a crack detection model by training it further on crack images.
In the era of the Internet of Things (IoT), deep learning has grown rapidly in the context of computer vision, a field strongly shaped by video analysis and image understanding methods, and it has attracted much research attention. Image classification, or recognition, is the primary task in deep learning: the model learns the important features from images and classifies the images based on that information [22,28]. Object detection, or localization, locates the target object in an image and provides its spatial information. Object tracking needs sufficient feature data from the image classification stage, combined with a deep-learning-based classifier, to track the target in a real-time scenario [28].
Traditional object detection models can be roughly divided into three stages, namely informative region selection, feature extraction, and classification [29]. They are less effective in complex situations that require combining multiple low-level image features with high-level context [12]. Deep learning may address these issues, which are typically present in traditional architectures [29], owing to the natural properties of the deep learning method, the wide availability of large datasets, and the powerful hardware on the current market [30]. Valdenegro-Toro [31] and Kvasic et al. [32] applied deep Convolutional Neural Networks (CNNs) to underwater object recognition using underwater sonar images, and their studies showed outstanding performance compared to the state-of-the-art methods. Jalal et al. [33] developed a deep learning method to locate fish and classify their species from images, achieving remarkable results compared to other methods. Figure 1 describes the application and process of deep CNNs in object detection. Several deep CNN variants are used for object detection, such as the Convolutional Neural Network, Region-based Convolutional Neural Network, and Fast Region-based Convolutional Neural Network. In particular, Buetti-Dinh et al. [34] developed deep CNNs to classify bacterial biofilm composition, and the CNNs outperformed human experts: 90% (via CNN) compared to 50% (by human experts). Similarly, Villon et al. [26] employed a deep CNN to identify fish species from underwater images, reaching an accuracy of 94.9%, higher than that of humans (89.3%). Their deep CNN was therefore able to identify fish in complex conditions and was more effective than humans, even for small or blurry images. In 2019, Gómez-Ríos et al. [35] showed that deep CNNs are an excellent technique for the accurate classification of underwater coral images, using several deep CNN models: Inception V3, ResNet-50, ResNet-152, DenseNet-121, and DenseNet-161.
Previously used vision systems rely on an algorithm that identifies two straight lines in the images and calculates the probability that an underwater cable is present based on the previous frame. The gap in those methods is that complex and fast-changing underwater environments can affect line detection and, in turn, the prediction of the system. Based on the above, the author believes that deep CNNs can narrow this gap in the classification of underwater cable images. Hence, the current study employs different types of deep CNN models to classify underwater cable images and applies several optimization techniques to increase their performance.

Materials and Methods
Deep learning techniques have been used and studied across different research areas due to their outstanding performance. In particular, a deep CNN is an artificial neural network (ANN) formed by a stack of convolutional layers, activation functions, pooling layers, and fully connected layers. It learns low-level features such as edges and curves as well as high-level features such as shapes and patterns from the input image data [14]. Driven by such achievements across various research areas, this study employed deep CNNs to classify underwater cable images. Figure 2 details the overall framework for developing a deep CNN model to classify underwater cable images. The input data, the deep CNN model, and the experiment settings are the important components of this task. A group of underwater cable videos was collected from different underwater cable service companies and converted to video frames. The video frames were then standardized to 75 × 75 pixels and arranged accordingly before being fed into the deep CNN models for training. Several deep CNN models were chosen, trained, and evaluated, and transfer learning and data augmentation were applied to optimize their performance and attain an accuracy above 90%. Finally, a suitable deep CNN model and optimization technique were proposed for the classification of underwater cable images. Training was carried out using TensorFlow-GPU in Jupyter Notebook on a laptop with an Intel Core i7-7500U (2.7 GHz) processor and an NVIDIA GeForce 940MX GPU.

Data Acquisition
In this research, all underwater images were extracted from videos of underwater cable tracking tasks. Thirteen underwater cable tracking and inspection video clips were obtained from the websites of underwater service companies. Ten video clips were selected for training and validation, while the other three were used to test the performance of the models. Video frames were extracted from the clips, yielding a total of 2000 underwater images. The images were then manually selected and categorized into two groups, labeled 'images with underwater cable' and 'images without underwater cable'. Much conventional marine research employs color images to train deep CNN models, as more information can be extracted from such images [36]. Recently, deep learning techniques have been strongly recommended for solving problems with low-cost devices and limited public resources [37,38]. Higher-resolution images require more computational time and more expensive hardware to process, hence low-resolution images were used in this study. All underwater images were therefore set to the RGB color mode and resized to 75 × 75 pixels, which is the smallest input size accepted by all of the selected deep CNN models. The images were divided randomly into a training set (70%), a validation set (20%), and a test set (10%) before they were fed to the deep CNN models. Figure 3 shows the overall framework for data acquisition and data pre-processing, whereas Table 1 depicts the categorization of image data for this study. Figures 4 and 5 show some examples of images with underwater cable and images without underwater cable. The use of lower-resolution images might affect the performance of deep CNN models in the classification of underwater cable images.
The study of Kannojia and Jaiswal [39], in particular, has shown that classification performance decreases as image resolution is degraded. Regardless, optimization techniques such as transfer learning and data augmentation can reduce the problem of low-resolution images [14,35,40].
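As a rough illustration of the 70/20/10 division described above, the following Python sketch shuffles a list of frame names and cuts it into three subsets. The file names, the seed, and the helper name `split_dataset` are assumptions made for illustration only; the study does not describe its splitting code.

```python
import random


def split_dataset(items, fractions=(0.7, 0.2, 0.1), seed=0):
    """Shuffle items and split them into training, validation, and test subsets."""
    items = list(items)
    random.Random(seed).shuffle(items)  # seeded so the split is reproducible
    n_train = round(len(items) * fractions[0])
    n_val = round(len(items) * fractions[1])
    train_set = items[:n_train]
    val_set = items[n_train:n_train + n_val]
    test_set = items[n_train + n_val:]  # whatever remains goes to the test set
    return train_set, val_set, test_set


# Hypothetical frame names standing in for the 2000 extracted images.
frames = [f"frame_{i:04d}.png" for i in range(2000)]
train, val, test = split_dataset(frames)
print(len(train), len(val), len(test))
```

With 2000 images, this yields the 1400, 400, and 200 images used for training, validation, and testing in the later sections.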

Deep Convolutional Neural Network Model
In general, deep learning architectures can be categorized into Deep Belief Network (DBN), Boltzmann Machine (BM), Restricted Boltzmann Machine (RBM), Deep Auto-Encoder (DAE), and Convolutional Neural Network (CNN). Much recent scholarly work has underlined the superior performance of deep CNNs in learning features from images, which is why they are used for image classification problems [41]. In this study, several deep CNN models were selected based on their number of learnable parameters, number of layers, computational cost, and performance. As a result, the following were chosen: MobileNet, MobileNet V2, Inception V3, Xception, and Inception-ResNet-V2.

• MobileNet: MobileNet is a very small, low-latency model designed for low-cost applications. It is based on a streamlined architecture that applies depthwise separable convolutions, which lowers the computational cost and the number of parameters of the deep neural network. MobileNet has been applied in large-scale geolocalization, face attributes, object detection, and face embedding [38]. The minimum input size accepted by MobileNet is 32 × 32 pixels.
• MobileNetV2: MobileNetV2 is a newer mobile architecture built on inverted residuals with linear bottlenecks. This further increases its performance and reduces the main memory required from the hardware [42]. The minimum input size accepted by MobileNetV2 is 32 × 32 pixels.
• Inception V3: The Inception architecture was introduced as GoogLeNet, also named Inception V1. Inception V3 is a variant of GoogLeNet that was refined by adding factorization ideas [43]. The computational cost of Inception is lower than that of VGGNet, as it has fewer parameters. The minimum input size accepted by Inception V3 is 75 × 75 pixels.
• Xception: Xception stands for "Extreme Inception". Its architecture is inspired by Inception, with the Inception modules replaced by depthwise separable convolutions [44], and it outperforms Inception V3 due to higher model efficiency. The minimum input size accepted by Xception is 71 × 71 pixels.
• Inception-ResNet-V2: Inception-ResNet-V2 combines the ideas of the Inception model and the ResNet model to obtain high performance at a low computational cost compared to other models. The Inception model tends to use deeper layers to achieve good performance, while the ResNet model is better suited to training very deep architectures because its residual blocks carry important information forward. The combination means that the Inception model reaps the advantages of the ResNet model while maintaining its computational efficiency [45]. The minimum input size accepted by Inception-ResNet-V2 is 75 × 75 pixels.
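To make the parameter saving of depthwise separable convolutions (used by MobileNet and Xception) concrete, the short Python sketch below counts the weights of a standard convolution and of its depthwise separable counterpart. The kernel size and channel counts are arbitrary illustrative values, not taken from any of the models above, and bias terms are ignored.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (biases ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Weights in a depthwise k x k convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in  # one k x k filter per input channel
    pointwise = c_in * c_out  # 1 x 1 convolution that mixes the channels
    return depthwise + pointwise


# Illustrative layer: 3 x 3 kernel, 32 input channels, 64 output channels.
k, c_in, c_out = 3, 32, 64
std = standard_conv_params(k, c_in, c_out)
sep = depthwise_separable_params(k, c_in, c_out)
print(std, sep, round(sep / std, 3))  # prints: 18432 2336 0.127
```

The ratio equals 1/c_out + 1/k², so for 3 × 3 kernels the separable form needs roughly one eighth to one ninth of the weights of a standard convolution when the number of output channels is large.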

Transfer Learning
Transfer learning is a machine learning technique in which a pre-trained deep CNN model is retrained with new input data for a different task. Building a new model from scratch requires a lot of time and cost, so transfer learning is useful for reusing a model without degrading its performance [46]. In this article, two transfer learning approaches, fine-tuning and deep feature learning, were applied to train the models on the collected underwater images. Generally, a huge amount of data is required for deep CNN models to learn from scratch. When the data available for a specific problem are insufficient, the fine-tuning approach is helpful. It is carried out by retraining only the last layer of the model with the new dataset to classify images as with or without a cable. The weights of the earlier layers are frozen, as these layers learn low-level features such as edges and lines, whereas the last layer is employed for the classification task. In the other approach, deep feature learning, the new input data are provided to the pre-trained model and the model is retrained, so the weight values in its layers are updated and the model learns features from the new data. A pre-trained deep CNN model learns faster than a new deep CNN model because it starts from the weights already stored in its layers [46]. Figure 6 shows the general framework for both fine-tuning and deep feature learning.

Data Augmentation
Data augmentation is a very useful approach for generating an abundance of data from the original data while preserving the important information in the newly generated data [35,40,47-50]. A large amount of high-quality data is important for improving the performance of deep learning models. In this article, the Augmentor software package was employed to generate additional image data for deep CNN model training [51]. To ensure a high-performing deep CNN model, the existing data were extended by applying three augmentation techniques, namely rotation, flipping, and random distortion. Other techniques such as random cropping, random zoom, and color augmentation were not used, since they might hinder the deep CNN model from learning the features of an underwater cable. The three operations generate augmented images by passing each input image through the pipeline multiple times. Each operation is either applied or skipped based on a user-defined probability parameter; if an operation is applied, its parameters are chosen randomly within the user-specified range. The operation pipeline employed to generate data is shown in Figure 7. A total of 23,200 images were generated using the data augmentation techniques, which were then combined with the original data and fed into all the deep CNN models. Figure 8 shows sample images generated by Augmentor.
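The probability-gated behavior of such a pipeline can be sketched in plain Python. The snippet below does not use Augmentor itself and does not transform any pixels; it only samples which operations would be applied to an image on one pass and with which parameters. The probabilities, parameter names, and ranges are illustrative assumptions, since the study does not report the values it used.

```python
import random

# (operation name, probability of applying it, parameter ranges).
# All numbers are illustrative assumptions, not the settings of the study.
PIPELINE = [
    ("rotate", 0.7, {"angle_deg": (-10.0, 10.0)}),
    ("flip_left_right", 0.5, {}),
    ("random_distortion", 0.5, {"magnitude": (1.0, 4.0)}),
]


def sample_plan(rng):
    """Decide which operations fire on one pass through the pipeline."""
    plan = []
    for name, probability, ranges in PIPELINE:
        if rng.random() < probability:  # apply or skip this operation
            # Draw each parameter uniformly from its user-specified range.
            params = {key: rng.uniform(low, high) for key, (low, high) in ranges.items()}
            plan.append((name, params))
    return plan


rng = random.Random(0)
plans = [sample_plan(rng) for _ in range(1000)]
print(plans[0])
```

Because every pass draws a fresh plan, passing the same source image through the pipeline repeatedly produces many distinct augmented variants, which is how a few thousand frames can be expanded into tens of thousands of training images.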

Training Settings
In this study, transfer learning was implemented to train the deep CNN models, which were initialized with pre-trained weights obtained from ImageNet. In the fine-tuning approach, all of the layers were frozen except for the last layer, which was removed because it was designed to classify the images in ImageNet. The last layer of every deep CNN model was replaced with a dense layer of two neurons with the SoftMax activation function, which classifies an image as with or without a cable. The models were trained using 1400 images, validated using 400 images, and tested using 200 images. The deep feature learning process was similar: the last layer was removed and replaced with a dense layer of two neurons with the SoftMax activation function. However, none of the previous layers were frozen, and their weights were updated throughout training. The models were then trained, validated, and tested using 1400, 400, and 200 images, respectively.
Then, the data augmentation technique was applied to generate a total of 20,000 images used for training and 5000 images subjected to validation. Both fine-tuning and deep feature learning techniques were applied for all models, which were then trained with 20,000 images, validated with 5000 images, and tested with 200 images.
An ADAM optimizer with a suitable learning rate helps the models learn from the training data set without losing useful features and helps the learning process reach a good local minimum [52]. As transfer learning was utilized in this study, the Adaptive Moment Estimation (ADAM) optimizer with a learning rate of 0.00001 was used for the deep CNN models [52]. Categorical cross-entropy was used as the loss function. Throughout the study, the batch size was set to 10 and training ran for 100 epochs. The deep CNN models were implemented using the Keras API in Jupyter Notebook.
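The training settings above can be sketched with the Keras API as follows. This is a minimal sketch under stated assumptions, not the authors' code: the helper name `build_classifier`, the use of global average pooling (`pooling="avg"`) before the new dense layer, and the `tf.keras` import path are assumptions, while the 75 × 75 RGB input, the two-neuron SoftMax layer, the ADAM learning rate, the loss, the batch size, and the epoch count come from the text. TensorFlow is imported inside the function so that the constants can be inspected without it installed.

```python
INPUT_SHAPE = (75, 75, 3)  # RGB frames resized to 75 x 75 pixels
NUM_CLASSES = 2            # with cable / without cable
LEARNING_RATE = 1e-5
BATCH_SIZE = 10
EPOCHS = 100


def build_classifier(base_name="MobileNetV2", freeze_base=True, weights="imagenet"):
    """Build a two-class model on top of a pre-trained backbone.

    freeze_base=True  corresponds to what the text calls fine-tuning
                      (only the new last layer is trained).
    freeze_base=False corresponds to deep feature learning
                      (all layers are updated).
    """
    import tensorflow as tf  # lazy import; requires TensorFlow to be installed

    # e.g. MobileNet, MobileNetV2, InceptionV3, Xception, InceptionResNetV2
    backbone_cls = getattr(tf.keras.applications, base_name)
    base = backbone_cls(
        input_shape=INPUT_SHAPE,
        include_top=False,  # drop the ImageNet classification layer
        weights=weights,
        pooling="avg",      # assumption: pool the feature maps into one vector
    )
    base.trainable = not freeze_base

    outputs = tf.keras.layers.Dense(NUM_CLASSES, activation="softmax")(base.output)
    model = tf.keras.Model(inputs=base.input, outputs=outputs)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=LEARNING_RATE),
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model


# Example usage (train_data and val_data are hypothetical batched datasets
# with one-hot labels, prepared with a batch size of BATCH_SIZE):
# model = build_classifier("MobileNet", freeze_base=False)
# model.fit(train_data, validation_data=val_data, epochs=EPOCHS)
```

Switching `freeze_base` is the only difference between the two transfer learning variants in this sketch, which mirrors the description that both replace the last layer and differ only in whether the earlier layers are updated.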

Testing the Model Performance
All five deep CNN models utilized in this study were trained with 1400 images and validated using 400 images by implementing the different techniques, and their respective performance was then tested with 200 images. From this test, the classification accuracy, precision, recall, and F1 score were obtained. The classification performance of the models was visualized using a confusion matrix, which is a table detailing the number of correct and incorrect predictions. Figure 9 shows a sample of the confusion matrix. The classification accuracy, precision, recall, and F1 score of all five deep CNN models were then compared. Classification accuracy is the ratio between the number of correctly classified samples and the total number of samples, while precision is the fraction of positive predictions that are truly positive. Recall, also known as sensitivity, is the true positive rate, whereas the F1 score is the harmonic mean of precision and recall. In a classification problem with more than two classes, the average precision, recall, and F1 score are calculated to show the performance of the model. The accuracy, precision, recall, and F1 score are calculated as follows:
Accuracy = (True Positives + True Negatives)/N, where N is the total number of instances,
Precision = True Positives/(True Positives + False Positives),
Recall = True Positives/(True Positives + False Negatives),
F1 score = 2 × Precision × Recall/(Precision + Recall).
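The four measures can be computed directly from the cells of a two-class confusion matrix, as in the Python sketch below. The function name and the counts are hypothetical, chosen only so that they sum to a 200-image test set; they are not results from this study.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1 score from confusion-matrix counts."""
    n = tp + tn + fp + fn  # total number of instances
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts for a 200-image test set (100 images per class).
m = classification_metrics(tp=84, tn=95, fp=5, fn=16)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

The guards against zero denominators are a design choice for robustness: a model that never predicts the positive class would otherwise raise a division error when precision is computed.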

Fine-Tuning
Due to the difference between the images in ImageNet and ocean images, fine-tuning was applied to the deep CNN models in this study [53]. This approach transfers the knowledge of a deep CNN model trained on the ImageNet database, which is then retrained using the underwater cable images. The results are presented in Table 2, and Figure 10 shows the confusion matrix for classifying an image as with or without an underwater cable. MobileNet V2 correctly classified 74 images with underwater cable and 61 images without underwater cable out of 200 testing images. The low classification accuracy observed for the five deep CNN models is attributable to their inability to learn the underwater cable images from scratch. When fine-tuning was applied, all of their layers were frozen except for the last layer, which was used to classify the underwater cable images. The frozen layers perform the low-level, mid-level, and high-level feature extraction, yet vast differences are present between the ImageNet images and the underwater images used in this study. With these layers frozen, the last layer is unable to get useful information from the previous layers, and the overall accuracy remains below 90%. According to Cetinic et al. [54], a deep CNN model performs worst among the transfer learning techniques when fine-tuning is applied, because all layers except the last are frozen and the similarity between the source and target domains is low. Consequently, the fine-tuning technique is weak at classifying such underwater cable images.

Deep Feature Learning
The results of applying the deep feature learning technique to train the five deep CNN models to classify underwater images are provided in Table 3. In general, the classification accuracy of the models improved compared to the previous approach, with the following values: MobileNet (89.50%), MobileNetV2 (88.50%), Inception V3 (85.50%), Xception (88.50%), and Inception-ResNet-V2 (87.50%). However, the overall computational time for deep feature learning was higher than for fine-tuning. MobileNet achieved the highest classification accuracy, correctly classifying 84 images with underwater cable and 95 images without underwater cable out of 200 testing images. The confusion matrices of all deep CNN models can be found in Figure A1.
The superior performance of the deep feature learning compared to the fine-tuning technique in the classification of underwater cable images was attributable to the models' ability to learn from scratch. Besides, the technique utilized all layers of the five deep CNN models to learn useful information and features from the training data, allowing them to learn the patterns and classify the underwater cable from the underwater images. Based on these results, the deep feature learning technique is found to be better than fine-tuning in classifying underwater cable images. However, its longer computation time is a specific drawback. In the next subsection, the results of fine-tuning and deep feature learning with data augmentation are presented.

Performance of Deep CNN Models with Data Augmentation
In general, 20,000 training data were generated via data augmentation and then used to train the deep CNN models, following which fine-tuning and deep feature learning were both applied to train them using the augmented data. Accordingly, the model performance was slightly improved when the fine-tuning technique with data augmentation was applied for all models, except for Inception-ResNet-V2, whose performance was unchanged. The classification accuracy of all deep CNN models is presented in Table 4. Meanwhile, the computational time for training the models yielded a large increment when subjected to more data. In particular, deep feature learning when combined with augmented data was applied to train the deep CNN models, thus yielding superior results compared to other techniques. In fact, the classification accuracy for the deep CNN models was the highest among all experiments done prior, resulting in the following values: MobileNet (91.50%), MobileNetV2 (93.50%), Inception V3 (90.50%), Xception (91.00%), and Inception-ResNet-V2 (91.50%). However, the training time for the models also increases by a huge margin shown in Table 4. All the visual performances of deep CNN models trained with fine-tuning with augmented data and deep feature learning with augmented data were in Figures A2 and A3. In this study, 20,000 training datasets were generated via data augmentation and used to train the five deep CNN models for learning the difference between images with and without underwater cables, which could further improve the performance of cable tracking despite requiring more computational time. Here, the classification accuracy of deep CNN models trained with augmented data was improved in comparison with the use of fine-tuning or deep feature learning singular. Regardless, their accuracy was still lower than 90%, which was the accuracy for the fine-tuning with the augmented data approach. 
This is due to a limitation of the fine-tuning technique, which leaves the deep CNN models unable to learn important information even when they are trained with that amount of data. In deep feature learning, the models were pre-trained with data from ImageNet before being trained with the underwater images, and thus performed better than fine-tuning in classifying the images. This demonstrates the ability of data augmentation to improve the performance of deep CNN models. In addition, earlier experiments have sought to improve model performance in recognizing underwater objects by increasing the number of underwater images, and their results support this concept by showing improved performance of deep CNN models for underwater object recognition [14,55]. Figure 11 shows the training accuracy and validation accuracy of Inception-ResNet-V2, whereas those of the remaining models are included in Figure A4. The figure clearly shows improved training and validation accuracy when abundant data are available. Another important observation from Figures 11 and A4 is the improved stability of the models when more data were used in training and validation. For example, under fine-tuning with abundant data, the training and validation accuracy of Inception-ResNet-V2 did not fluctuate as they did under fine-tuning alone. A higher validation accuracy indicates higher expected performance of the deep CNN models on the testing dataset, and the higher training and validation accuracy together underline the importance of data for deep-learning-based techniques. Sajjad et al. [47] likewise compared validation accuracy with and without data augmentation and showed that it is higher with data augmentation.
Therefore, the inclusion of data augmentation improves the performance of deep CNN models in classifying underwater cable images, but it requires a longer computational time.
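To make the augmentation step concrete, the following is a minimal sketch of how a small labelled image set can be expanded to a fixed target size. The transformations used in this study are not listed in this section, so the horizontal flip, vertical flip, and brightness shift below, as well as all function names, are illustrative assumptions. Images are represented as nested lists of grayscale values to keep the example free of dependencies.

```python
import random


def hflip(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def vflip(img):
    """Mirror the image top to bottom."""
    return [row[:] for row in img[::-1]]


def shift_brightness(img, delta):
    """Add delta to every pixel, clipping to the 0-255 range."""
    return [[min(255, max(0, p + delta)) for p in row] for row in img]


def augment(images, labels, target_size, seed=0):
    """Grow (images, labels) to target_size with randomly transformed copies.

    The originals are kept first; each added copy inherits the label of the
    image it was derived from, since these (assumed) transforms do not change
    whether a cable is present.
    """
    rng = random.Random(seed)
    out_images, out_labels = list(images), list(labels)
    while len(out_images) < target_size:
        i = rng.randrange(len(images))
        img = images[i]
        if rng.random() < 0.5:
            img = hflip(img)
        if rng.random() < 0.5:
            img = vflip(img)
        img = shift_brightness(img, rng.randint(-30, 30))
        out_images.append(img)
        out_labels.append(labels[i])
    return out_images, out_labels
```

Calling `augment(images, labels, 20000)` on an original training set would return 20,000 labelled samples. The longer training time reported above follows directly from the larger number of samples processed per epoch.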

Proposal of a Suitable Deep CNN Model for the Classification of Underwater Cable from Images
This study proposed several deep CNN models for the classification of underwater cable images, and different optimization techniques were applied to increase their respective performance. Figure 12 shows that deep CNN models trained with deep feature learning and data augmentation yield the highest performance in classifying underwater cable images compared to the other techniques. Figure 12 also shows that deep feature learning requires a longer computational time than fine-tuning, and that including data augmentation increases the computational time significantly. For example, the performance of MobileNetV2 improved from 67.50% (fine-tuning) to 68.50% (fine-tuning with data augmentation) and from 88.50% (deep feature learning) to 93.50% (deep feature learning with data augmentation). Meanwhile, its computational time increased from 0.16 h (fine-tuning) to 5.55 h (fine-tuning with data augmentation) and from 0.58 h (deep feature learning) to 8.75 h (deep feature learning with data augmentation).
Among all deep CNN models proposed in this study, MobileNetV2 showed the highest performance in classifying underwater cable images when trained with deep feature learning and data augmentation. Hence, MobileNetV2 with deep feature learning and data augmentation is suggested for the classification of underwater cable images. Previously, Valentini and Balouin [37] also utilized the same model with transfer learning and data augmentation to perform algae detection using low-cost smartphone-based images.
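The trade-off between performance and training time for MobileNetV2 can be checked with simple arithmetic. The sketch below uses only the figures quoted in this section; the dictionary layout and the helper name `tradeoff` are illustrative.

```python
# Performance (%) and training time (h) for MobileNetV2, as quoted in the text.
RESULTS = {
    "fine-tuning": (67.50, 0.16),
    "fine-tuning + augmentation": (68.50, 5.55),
    "deep feature learning": (88.50, 0.58),
    "deep feature learning + augmentation": (93.50, 8.75),
}


def tradeoff(before, after):
    """Return (gain in percentage points, extra hours, time multiplier)."""
    perf_b, time_b = RESULTS[before]
    perf_a, time_a = RESULTS[after]
    return perf_a - perf_b, time_a - time_b, time_a / time_b


for base in ("fine-tuning", "deep feature learning"):
    gain, extra_h, factor = tradeoff(base, base + " + augmentation")
    print(f"{base}: +{gain:.2f} points, +{extra_h:.2f} h, "
          f"{factor:.1f}x training time")
```

With these figures, data augmentation multiplies training time by roughly 35 for a gain of one percentage point under fine-tuning, and by roughly 15 for a gain of five percentage points under deep feature learning.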
In another experiment, the deep CNN models were trained and optimized via hyperparameter tuning. Training such models involves selecting many critical hyperparameters, which can significantly affect model performance in terms of accuracy and computational time [56]. Therefore, deep CNN models should have a suitable hyperparameter configuration to obtain a relatively low training time while preserving high classification performance. The selected hyperparameters are the learning rate, the patience (number of epochs) for early stopping, and the batch size; comparing performance metrics across configurations of these hyperparameters leads to a setup with better performance and lower computational time. The suggested ranges for the learning rate, early-stopping patience, and batch size are 0.00001 to 0.1, 0 to 50, and 10 to 50, respectively. However, their selection depends on the data and hardware used for the experiment.
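As an illustration of how such a search space might be laid out, the sketch below enumerates a small grid inside the suggested ranges and shows a simple patience rule for early stopping. The grid points, the function names, the choice to monitor validation accuracy, and the treatment of a patience of 0 are all assumptions made for this example; they are not stated in this section.

```python
from itertools import product

# Candidate values inside the suggested ranges (the grid points are assumptions).
LEARNING_RATES = [0.1, 0.01, 0.001, 0.0001, 0.00001]
PATIENCES = [0, 10, 25, 50]
BATCH_SIZES = [10, 20, 30, 40, 50]


def grid():
    """Yield every (learning rate, patience, batch size) combination."""
    for lr, patience, batch in product(LEARNING_RATES, PATIENCES, BATCH_SIZES):
        yield {"learning_rate": lr, "patience": patience, "batch_size": batch}


def stopping_epoch(val_accuracy, patience):
    """Return the 1-based epoch at which early stopping would halt training.

    Training halts once validation accuracy has failed to exceed its best
    value for `patience` consecutive epochs. A patience of 0 is treated here
    as stopping at the first non-improving epoch. If the rule never triggers,
    the last epoch is returned.
    """
    best = float("-inf")
    wait = 0
    for epoch, acc in enumerate(val_accuracy, start=1):
        if acc > best:
            best, wait = acc, 0
        else:
            wait += 1
            if wait >= max(patience, 1):
                return epoch
    return len(val_accuracy)
```

A larger patience lets training continue through temporary dips in validation accuracy at the cost of more epochs, which is one way this hyperparameter trades classification performance against computational time.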

Conclusions
This study proposed a deep learning method for classifying underwater cable images, motivated by the challenges of underwater cable tracking with traditional methods. These challenges stem from the large number of different underwater cable types, variation in underwater lighting that makes it difficult for the camera to capture underwater cable images, and the presence of algae and sand covering the cables, which increases the chance of misclassification. In contrast, deep learning methods have typically performed well in identifying underwater cable under such conditions.
In this study, several deep CNN models were chosen to perform the classification of underwater cable images, and transfer learning and data augmentation were implemented to enhance their performance. Among the deep CNN models evaluated, MobileNetV2 yielded the best performance, at 93.5%, when deep feature learning and data augmentation were applied. The advantages of the deep learning method include its capability to identify underwater cable in any situation, provided that a large volume of images related to the cable is available to train the deep CNN model. Its drawbacks are the greater computational power and the expensive GPUs needed to process the large amount of data and the complex models. Based on the experimental results, it can be concluded that the deep learning method is powerful and highly accurate in the classification of underwater cable images. Accordingly, the contribution of this study lies in the development of a deep learning method to perform underwater cable image classification. For future work, researchers may further improve the deep CNN models by localizing the position of the underwater cable in the images, for which other types of deep learning approaches are suggested, such as Region-based CNN, Fast Region-based CNN, and Mask Region-based CNN. In other fields, deep learning has been used for defect detection, such as road cracks and building cracks. It is therefore also suggested that researchers collect a new image dataset to train a deep CNN model to perform inspection of underwater cable, power cable, or underwater pipeline. The constraint for both suggestions is the collection of suitable image data, as deep learning requires a huge amount of data to capture the features that researchers are interested in.