Improved Ship Detection Algorithm from Satellite Images Using YOLOv7 and Graph Neural Network

Abstract: One of the most critical issues that a marine surveillance system has to address is the accuracy of its ship detection. Since such a system is responsible for identifying potential pirate threats, it has to perform this task efficiently. In this paper, we present a novel deep learning approach that combines a Graph Neural Network (GNN) with the You Only Look Once (YOLOv7) detection framework. The main idea of this method is to provide a better understanding of the presence of ships in harbor areas. Three hyperparameters are tuned in the development of this system: the learning rate, the batch size, and the choice of optimizer. The experimental results show that, with the Adam optimizer, the method achieves a 93.4% success rate when compared to the previous generation of the YOLOv7 algorithm. The High-Resolution SAR Images Dataset (HRSID), which consists of high-resolution synthetic aperture radar (SAR) images, was used for the tests. This method can be further improved by taking into account the various kinds of neural network architectures that are commonly used in deep learning.


Introduction
The management of marine security relies significantly on remote sensing images for automatic ship detection. Its primary duties include monitoring traffic, finding illicit fishing, and preventing maritime pollution. Military organizations use automatic ship detection systems to enhance maritime security through activities such as reconnaissance, surveillance, and intelligence gathering. One of the most common technologies used in this field is advanced remote sensing, which gathers data from sources such as radar, electro-optical cameras, and electronic support systems. This research focuses on analyzing satellite images. Deep learning requires a large amount of training data.
All commercial and passenger ships weighing over 300 tons are required to carry an automatic identification system (AIS) transponder, which transmits information about the vessel's location and destination. However, this information can be easily manipulated. For instance, a fishing boat that wants to pose as another vessel can alter the information it transmits. Convolutional neural networks (CNNs), a subset of machine learning (ML), are among the more recent technologies that have seen successful implementations [1]. They build on the multilayered network architecture of conventional neural networks. CNNs consist of various components, such as input layers, convolutional layers, activation functions, and output layers.
Ship identification and classification have also been accomplished using deep learning [2], an approach influenced by the structure and function of the human brain. Deep learning can be used to process data collected by SAR systems, whose applications include monitoring plants and diseases, mapping trajectories, and analyzing data collected from various sources. Therefore, the primary goal of this study is to detect ships in satellite images with high accuracy.
In this study, we go a step further in addressing the automatic identification of ships. Compared with handcrafted features, our method, which is based on the well-known CNN architecture You Only Look Once (YOLO), can learn the most distinctive features for the given task [3]. In the proposed framework, image features are extracted through Graph Neural Networks (GNNs) and then classified using a YOLOv7 detector. The HRSID dataset was used to train the automatic ship detection system, and a comparison was made with various CNNs (YOLOv3, YOLOv4, YOLOv5, and YOLOv6) presented by other authors [4][5][6][7][8][9][10][11][12][13]. We evaluated our approach on a publicly accessible ship dataset of around 16 K satellite images that also includes moving ships.
The contributions of this paper are summarized as follows: (1) For ship detection, a high-resolution SAR dataset is used. It was not able to take into account the various flaws in the previous SAR ship dataset, which is mainly used for CNN-based detectors. (2) The effects of ship detection on the images captured by the SAR system are analyzed. A large-sized image of the ship is used to test the model's performance. (3) A comprehensive evaluation of ship detection is performed using MS COCO metrics, in which average precision is evaluated at different IoU thresholds. A comparison between different YOLO versions on HRSID is also carried out.
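The IoU-based evaluation mentioned above can be illustrated with a minimal sketch. It assumes boxes given as (x_min, y_min, x_max, y_max) and uses the standard MS COCO IoU thresholds from 0.50 to 0.95 in steps of 0.05. The function names are hypothetical, and this is not the evaluation code used in the paper.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes (x_min, y_min, x_max, y_max)."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    # Width and height of the overlap are clamped at zero for disjoint boxes.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# MS COCO averages precision over these ten IoU thresholds.
COCO_IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def matches_per_threshold(pred_box, gt_box):
    """For each COCO threshold, report whether the prediction counts as a match."""
    overlap = iou(pred_box, gt_box)
    return [overlap >= t for t in COCO_IOU_THRESHOLDS]
```

A prediction that overlaps the ground truth only loosely is counted as correct at the lower thresholds and as a miss at the stricter ones, which is why averaging over thresholds rewards tight localization.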
This paper is organized as follows. Section 2 briefly summarizes the related work and state of the art on ship detection from satellite images, including methods that employ DL algorithms for classification. Section 3 presents our DL-based YOLOv7_GNN ship detection approach. Section 4 describes the datasets and provides a thorough examination of the experimental results. Section 5 presents the key conclusions from this study, along with a list of future research topics.

Ship Detection Data Collections Platform
According to Kanjir et al. [14], optical, infrared, and radar sensors are the most frequently utilized sensors in sea surveillance applications. Since the 1990s, radar has been a common technology for ship surveillance and detection. One of the most common ways to perform ship detection is with satellite-based sensors, which collect data from various sources.

Improvement in Deep Learning (DL)
The amount of stored data grows daily. As more datasets become available, researchers are constantly attempting to improve the algorithms currently in use. Deep learning models typically use many layers to process complex information, including the input and output layers of the classifier. Chua and colleagues presented a comparison of three machine learning algorithms: the SVM, the histogram of oriented gradients, and the latent SVM.
In order to detect objects with complex backgrounds and scale variation, Kanjir et al. [14] developed a fully convolutional neural network algorithm. The algorithm uses a combination of box regression and CNN to label the class [15].
Through a combination of deep learning and CNN, Jaafar Alghazo [16] was able to create two models that can detect ships in the Airbus Satellite dataset. These models can be used to deal with various maritime-related problems, such as illegal fishing and resource surveillance.
Researchers have also developed a technique for recognizing ships in satellite images. The method, based on R-CNN, is mainly used for analyzing images taken by the RADARSAT-2 and Sentinel-1 satellites [17]. The models reported recall and accuracy figures of 89.14% and 89.23%, respectively.
In order to create a generative transfer learning framework that can be used for ship detection, X. Lou [18] proposed a method that combines knowledge transfer and ship recognition. The output of this module was fed into a detector model. The goal of the experiment was to analyze the characteristics of the ship detection datasets taken from the Air-SAR-Ship-1.0 and SAR Ship Detection datasets.

Dataset
The HRSID [19,20] is a repository for ship detection in high-resolution SAR images. It contains 16,951 ship instances in 5604 high-resolution SAR images. HRSID was built following the construction process of the COCO dataset and includes images with varying marine areas, resolutions, sea conditions, and coastal ports. This allows researchers to compare their methods against those of other researchers. Figure 1a,b show images from the HRSID repository. The high-resolution SAR images in HRSID have three resolutions: 0.5 m, 1 m, and 3 m. Table 1 displays some detailed information about the images in the HRSID dataset.

Our Approach
The YOLOv7 architecture builds on the YOLOv4, Scaled-YOLOv4, and YOLOR model architectures.
YOLOv7 uses an Extended Efficient Layer Aggregation Network (E-ELAN), which continuously improves the learning capability of the network through expand, shuffle, and merge cardinality operations. This allows the network to keep learning effectively without destroying the original gradient path.
Compound model scaling modifies the characteristics of a model, such as its depth, width, and input resolution, to fit different applications. In conventional architectures such as PlainNet or ResNet, the scaling factors can be analyzed separately. In concatenation-based architectures, however, they must be considered together: increasing model depth changes the ratio between a transition layer's input and output channels, which may reduce the model's hardware utilization. For concatenation-based models, YOLOv7 therefore introduces compound model scaling, as shown in Figure 2. When the depth factor of a computational block is scaled, the resulting change in the block's output channels is calculated, and the transition layers are width-scaled by the same amount. This preserves the model's original design properties and maintains its optimal structure.
Although RepConv performs excellently on VGG-style architectures, the accuracy of this planned re-parameterized convolution drops significantly when it is applied directly to ResNet or DenseNet. As shown in Figure 3, YOLOv7 therefore uses RepConv without an identity connection whenever a convolutional layer with a residual or concatenation connection is replaced by a re-parameterized convolutional layer.
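To illustrate why depth and width cannot be scaled independently in a concatenation-based block, the following is a simplified sketch. It assumes a hypothetical block in which every layer contributes the same number of channels to the concatenated output. The function names and the example numbers are illustrative only and are not taken from the YOLOv7 implementation.

```python
def block_output_channels(channels_per_layer, depth):
    """Channels after concatenating the outputs of `depth` layers (simplified model)."""
    return channels_per_layer * depth


def compound_scale(channels_per_layer, depth, transition_out, depth_factor):
    """Scale block depth, then scale the transition layer width by the same ratio.

    Returns (new_depth, new_block_output_channels, new_transition_out_channels).
    """
    new_depth = round(depth * depth_factor)
    old_out = block_output_channels(channels_per_layer, depth)
    new_out = block_output_channels(channels_per_layer, new_depth)
    # Deeper block => wider concatenated output. Widening the transition layer
    # by the same ratio keeps its input/output channel ratio unchanged.
    width_ratio = new_out / old_out
    new_transition_out = int(round(transition_out * width_ratio))
    return new_depth, new_out, new_transition_out


if __name__ == "__main__":
    # Hypothetical block: 4 layers of 64 channels, transition layer with 256 outputs.
    print(compound_scale(64, 4, 256, depth_factor=1.5))
```

In this toy model, scaling depth alone would leave the transition layer with more input channels than it was designed for, which is the imbalance that compound scaling is meant to avoid.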
A YOLO architecture consists of a backbone, a neck, and a head, as shown in Figure 4. The head produces the predicted model outputs. YOLOv7 uses both a lead head and an auxiliary head, a design inspired by deep supervision, a technique in which auxiliary heads are added to the middle layers of a network to assist training. The lead head is responsible for the final output, whereas the auxiliary head supports middle-layer training. Figure 5 shows the general flow of the ship detection model from satellite images. In the dataset preprocessing step in Figure 5, the dataset is first divided into a training set and a testing set with a ratio of 6:4. For ship identification, we consider the area and the aspect ratio of the bounding box. The aspect ratio describes the shape of the bounding box, which is helpful in choosing anchors for generating the bounding boxes.
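The two bounding-box quantities mentioned above, area and aspect ratio, can be computed as in the following minimal sketch. It assumes COCO-style boxes of the form [x, y, width, height]. The function name is hypothetical and the sample box is made up for illustration.

```python
def box_area_and_aspect_ratio(bbox):
    """Return (area, aspect_ratio) for a COCO-style box [x, y, width, height]."""
    _, _, width, height = bbox
    if width <= 0 or height <= 0:
        raise ValueError("bounding box must have positive width and height")
    area = width * height
    aspect_ratio = width / height  # > 1 means wider than tall
    return area, aspect_ratio


if __name__ == "__main__":
    # Made-up example box: 40 pixels wide, 20 pixels tall.
    print(box_area_and_aspect_ratio([10.0, 20.0, 40.0, 20.0]))
```

Collecting these two values over all annotated ships gives a picture of typical ship sizes and shapes, which is the kind of information anchor selection relies on.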

Setup
We develop a classifier in this paper using a particular YOLOv7 architecture. The layer numbers and parameter adjustments listed in Table 2 are used to fine-tune the method. Data handling and plotting are carried out in Python, using the NumPy, OpenCV, and pandas modules. The standard Keras library is used to download the pretrained DenseNet parameters. Because our computer's RAM is limited and no graphics processing unit (GPU) is employed in the trials, the batch size is also adjusted.

Optimization
An optimization algorithm is required to update the network's parameters during training. It minimizes the discrepancy between the model's predictions and the ground truth. The Adam optimizer is a method for optimizing stochastic objective functions using first-order gradients. It is computationally efficient, works with most types of data, and requires little memory. Stochastic gradient descent (SGD), one of the oldest optimizers, does not use momentum when computing the weight updates. A performance comparison between the Adam and SGD optimizers for ship detection is shown in Table 3 and Figure 6. According to Table 3, the Adam optimizer achieves the best classification performance, with 93.4% accuracy. This performance was obtained with a learning rate of 0.001 rather than 0.01. The batch size is 16 for both optimizers.
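The difference between the two update rules can be sketched in a few lines of NumPy. This is an illustration of the standard SGD and Adam updates, assuming the commonly used Adam defaults (beta1 = 0.9, beta2 = 0.999, eps = 1e-8). It is not the training code used in the experiments, and the class and function names are hypothetical.

```python
import numpy as np


def sgd_step(weights, grad, lr=0.001):
    """Plain SGD without momentum: move against the gradient, scaled by lr."""
    return weights - lr * grad


class Adam:
    """Minimal Adam optimizer using first-order gradients only."""

    def __init__(self, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = None  # running mean of gradients (first moment)
        self.v = None  # running mean of squared gradients (second moment)
        self.t = 0     # step counter used for bias correction

    def step(self, weights, grad):
        if self.m is None:
            self.m = np.zeros_like(weights)
            self.v = np.zeros_like(weights)
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad
        self.v = self.beta2 * self.v + (1 - self.beta2) * grad ** 2
        m_hat = self.m / (1 - self.beta1 ** self.t)  # bias-corrected first moment
        v_hat = self.v / (1 - self.beta2 ** self.t)  # bias-corrected second moment
        return weights - self.lr * m_hat / (np.sqrt(v_hat) + self.eps)


if __name__ == "__main__":
    # Toy objective f(w) = sum(w**2), whose gradient is 2 * w.
    w = np.array([1.0, -2.0])
    print("SGD :", sgd_step(w, 2 * w, lr=0.001))
    print("Adam:", Adam(lr=0.001).step(w, 2 * w))
```

Adam keeps two extra arrays per parameter, which is a modest memory cost, and rescales each step by the running gradient magnitude, so the step size is less sensitive to the raw scale of the gradients than in plain SGD.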

Division of Dataset
Since this paper employs supervised learning, the training data must be labeled so that they can be used to train the algorithms. A set of non-overlapping images that the model has never seen is used as testing data to gauge how reliable the classifier's predictions are. The dataset is divided according to three different scenarios. The findings in Table 4 show only minor performance variations among the training/testing split ratios. The batch size and learning rate are fixed at 16 and 0.001, respectively, and all other hyperparameters are held constant.
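A non-overlapping train/test division of this kind can be sketched as follows. The function name, the fixed random seed, and the use of integer image indices are assumptions made for illustration. The 0.6 fraction corresponds to the 6:4 ratio mentioned earlier, and 5604 is the HRSID image count.

```python
import random


def split_dataset(items, train_fraction=0.6, seed=0):
    """Shuffle items and split them into disjoint training and testing lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


if __name__ == "__main__":
    image_ids = range(5604)  # stand-in for the HRSID image identifiers
    train_ids, test_ids = split_dataset(image_ids, train_fraction=0.6)
    print(len(train_ids), len(test_ids))
```

Because both lists are slices of one shuffled sequence, no image can appear in both the training and the testing set.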

Batch Size
The batch size is a hyperparameter that determines the number of samples processed during one training iteration. For instance, with a batch size of 16, the method first processes the first 16 images (the 1st to the 16th). The algorithm then takes the next 16 images (the 17th to the 32nd), and the process continues until all of the images have been processed for a particular epoch. A performance comparison between the different batch sizes is shown in Table 5. A batch size of 16 with 3000 training images and 600 testing images produces an accuracy of 86.09%, which is higher than the 84.01% obtained with 2200 training images and 1200 testing images.
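The batching procedure described above can be written as a short generator. This is a minimal sketch with a hypothetical function name. Images are represented by their 1-based positions to mirror the description in the text.

```python
import math


def iter_batches(samples, batch_size=16):
    """Yield consecutive batches of at most `batch_size` samples for one epoch."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]


def iterations_per_epoch(num_samples, batch_size=16):
    """Number of training iterations needed to see every sample once."""
    return math.ceil(num_samples / batch_size)


if __name__ == "__main__":
    images = list(range(1, 41))  # 40 images, numbered from 1
    for batch in iter_batches(images, batch_size=16):
        print(batch[0], "to", batch[-1])
    print(iterations_per_epoch(3000, batch_size=16))
```

The last batch of an epoch is smaller whenever the number of samples is not a multiple of the batch size.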

Learning Rate
The learning rate controls how strongly the weights are updated in response to the gradient of the error. Specifically, it regulates how much of the error the model's weights take into account at each update. Learning rates of 0.001 and 0.01 are examined. Compared with a learning rate of 0.01, a learning rate of 0.001 results in substantially higher accuracy, as shown in Table 6.
Figure 7 shows a few results of ship detection on HRSID using our approach, YOLOv7+GNN. A comparison of YOLOv7 with other object detection algorithms, such as YOLOv4 and YOLOv5, with respect to speed and accuracy has been presented by Chien-Yao Wang [21].

Conclusions
In this article, we provide a YOLOv7+GNN-based technique for automatic ship detection from the High-Resolution SAR Images Dataset (HRSID), which consists of high-resolution synthetic aperture radar images. This algorithm classifies ships with greater than
Figure 7. Ship detection result using our approach, YOLOv7 + GNN.