Article

A Benchmark for Maritime Object Detection with Centernet on an Improved Dataset, ABOships-PLUS

Faculty of Science and Engineering, Åbo Akademi University, 20500 Åbo, Finland
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
J. Mar. Sci. Eng. 2023, 11(9), 1638; https://doi.org/10.3390/jmse11091638
Submission received: 30 June 2023 / Revised: 8 August 2023 / Accepted: 17 August 2023 / Published: 22 August 2023
(This article belongs to the Special Issue AI for Navigation and Path Planning of Marine Vehicles)

Abstract

Object detection from waterborne imagery is an essential aspect of maritime traffic management, navigation safety and coastal security. Building efficient autonomous systems that can make decisions in critical situations requires integrating complex object detectors that run in real time. Object detectors trained on generic datasets often give unsatisfactory results in complex scenarios like the maritime environment, since only a fraction of their images contain maritime vessels. Publicly available domain-specific datasets are scarce, and they are limited in the number of images and annotations. Compared to object detection in applications such as autonomous vehicles, maritime vessel detection receives considerably less attention in computer vision research, which creates a deficit in exhaustive benchmarking studies for maritime detection datasets. To bridge this gap, we relabel the ABOships dataset and benchmark a state-of-the-art center-based detector, Centernet, on the newly relabeled dataset, ABOships-PLUS. We explore its performance under different feature extractors, and investigate the effect of object size and inter-class variation on detection accuracy. The reported benchmarking illustrates that the ABOships-PLUS dataset is adequate for use in supervised domain adaptation. The experimental results show that Centernet with DLA (Deep Layer Aggregation) as a feature extractor achieved the highest overall accuracy in detecting maritime objects (with a mean average precision of 74.4%).

1. Introduction

Detection of maritime objects from waterborne imagery is a vital aspect in a range of military and civilian applications. In nonmilitary applications, vessel detection can contribute significantly to traffic supervision and management, navigation safety and mitigating environmental impact. Military applications benefit from data fusion solutions and the integration of various types of ship information (location, direction, speed) with visual detection systems to ensure coastal security and real-time detection of ships, enabling prompt decisions in critical situations. Maritime surveillance plays a central role in both military and nonmilitary applications and employs various technologies such as AIS (Automatic Identification System), VMS (Vessel Monitoring System) and SAR systems (both airborne and spaceborne). Integrating such surveillance technologies with maritime vision solutions improves the accuracy of monitoring solutions for maritime navigation [1,2]. Moreover, in both fields, autonomous navigation and collision avoidance emerge as essential tasks for prospective applications. Identifying various maritime objects and determining their respective distances to a specific cardinal point is of utmost importance for safe navigation.
Object detection in complex environments poses various challenges stemming from numerous factors: a high variety of vessel types, challenging atmospheric conditions, dynamic illumination, viewpoint and background variations, object occlusion and visible proportion, etc.
Traditional detection methods typically rely on horizon detection or background subtraction in the pre-processing phase, together with predefined features such as histograms of oriented gradients; however, these hand-crafted features often lack the discriminative power required for accurate maritime vessel detection and precise classification of ships, preventing such methods from achieving high performance.
Contrarily, convolutional neural networks (CNNs) automatically extract distinctive features through multi-layer architectures, which govern their selectivity and invariance. CNNs exhibit improved generalization over large-scale datasets. Object detection performance increases significantly with dataset specificity (relevant characteristics for a specific detection task), particularly for datasets comprising labels of heterogeneous object categories such as ship types.
Generic computer vision datasets (Imagenet [3], COCO [4], PASCAL VOC [5]) have been promoted for the classification and detection of various static objects. These datasets, however, were devised to include a comprehensive selection of non-specialized object categories and visual scenes. Hence, the number of maritime vessel instances is relatively small: Imagenet: 1071; COCO: 3146; PASCAL VOC: 353. Shao et al. show that the performance of various ship detection algorithms on these datasets is unsatisfactory [6]. Furthermore, generic object detection datasets exhibit compelling dissimilarities in both object categories and image distribution (viewpoint dissimilarities, contextual information, semantic cues, etc.) [7,8]. There are several reasons, apart from the small number of annotated vessel instances, why generic object detection datasets yield poor performance for maritime object detection. First, there is a selection bias in the collection of such datasets, many of them comprising the same type of images (either nature or urban scenes or collections of images from the internet). A second bias is capture bias: images retrieved from the internet are often photographed in similar ways, and one of the most evident consequences is that objects are frequently located in, or very close to, the middle of the image, which is not ideal from the point of view of object detection. A third important bias is the negative set bias, i.e., what the dataset regards as the rest of the visual world. When the negative set is unbalanced or not representative enough of the problem at hand, classifiers can produce results that are not selective enough [7].
Maritime vessel detection datasets that are not crowd-sourced are scarce. Many of them are privately owned and only available for purchase. Private maritime vessel datasets have been used in various studies, e.g., [9,10]; however, they cannot be used for benchmarking, since they are not accessible to the public. A few publicly available datasets for maritime vessel detection have been employed in different studies before, e.g., SeaShips [6], SMD [11], McShips [12], Marvel [13], VAIS [14], Harbour Surveillance [15], SeaSAw [16], Maritime Ports [17] and ABOships [18].
Among the aforementioned maritime vessel detection datasets, ABOships [18] emerges as a distinctive dataset for maritime object detection, due to its salient diversity (background variation, atmospheric conditions, illumination, object scale variation, object visible proportion, object occlusion, etc.). Another reason is that ABOships encompasses images from different scenarios: urban landscapes, harbor areas and the open sea environment. Although ABOships has been used for training and testing maritime object detectors, detection performance on it was significantly weaker than on other datasets [19]. One reason is the representation of classes in the dataset: only 1.3% of the images contain objects from the “miscellaneous” class and 1.58% of the images contain objects from the “cargoship” class. Two other classes have better representation within the dataset (ferry—9.56%, cruiseship—13.63%); however, given the total number of images (9880), the low number of images can influence the accuracy significantly. Another aspect is that the dataset contains a relatively high number of small objects with a registered occupied pixel area of less than 16² pixels, which, in the absence of other fusion data (LiDAR, radar, etc.), can negatively influence accuracy.
To address the aforementioned issues, we benchmark the performance of transfer learning on a ship detection task in a complex maritime environment. Consequently, we improved the ABOships dataset by relabeling it, with the expectation that the dataset will be used as a benchmark for maritime object detection and classification. We eliminated the very small objects with an occupied pixel area below 16² pixels and, to alleviate the class-imbalance problem, we aggregated the classes in the dataset into four superclasses, thus proposing a newly improved dataset, ABOships-PLUS, comprising four maritime object classes. We trained a state-of-the-art CNN, Centernet (Objects as Points) [20], on the newly relabeled domain-specific dataset, ABOships-PLUS, and concurrently performed transfer learning on the same dataset using COCO [4] pretrained weights. Several feature extractors were used in ablation studies. Moreover, we provide qualitative results for maritime object detection and a comparison of the runtime of the employed algorithms. A comparative study of transfer learning for maritime object detection using CNNs was also performed by Farahnakian et al. in [21].
Our aim is to introduce a newly improved dataset for maritime object detection and classification. The motivation stems from the need for benchmark datasets for maritime computer vision (as discussed above), and we expect the ABOships-PLUS dataset to be used for maritime vision tasks in the future. Although the original ABOships dataset already provides bounding boxes for 11 classes of maritime objects, it suffers from a class-imbalance problem, with several of the classes being underrepresented (more details in Section 4). Having improved the ABOships dataset annotations, we are able to provide benchmark results for maritime object detection based on Centernet (Objects as Points) [20] and compare various augmentation techniques and their effect on object detection in maritime environments.
The contributions of the paper can be summarized as follows. First, we improved the annotations of the ABOships dataset, alleviating class imbalance by aggregating several underrepresented classes into superclasses based on visual inspection and their respective semantic relevance, and eliminating very small objects with an occupied pixel area below 16² pixels. The aim was to create new annotations for the ABOships-PLUS dataset that are meaningful both semantically and visually with respect to the maritime object type. Second, we provide benchmark results for the maritime object detection task based on Centernet (Objects as Points) [20], encompassing various feature extractors, taking into consideration inter-class variation and different object sizes, using various augmentation techniques, reporting the runtime of the proposed detectors, and providing results both for learning from scratch and for transfer learning.
The paper is organized as follows. Section 2 discusses the distinctions between generic and maritime object detection. Section 3 describes the most prominent CNN-based object detector categories. Section 4 illustrates the main properties of the ABOships dataset and Section 5 discusses the newly relabeled dataset, ABOships-PLUS. The experimental results are presented in Section 6, followed by the discussion in Section 7. The conclusions are summarized in Section 8.

2. Related Work

2.1. Class-Generic Object Detection

The task of object detection plays an essential role in the field of computer vision, as it requires both the recognition of objects from given categories (classification) and the retrieval of object locations through bounding boxes (localization). Traditional algorithms for object detection consist of three phases: proposal generation, feature extraction and region classification. The proposal generation step consists of determining ROIs (regions of interest) within the image where objects of interest could be located; this step is often carried out through a sliding-window approach. During the feature extraction phase, feature vectors are assembled from the sliding windows, encapsulating the semantic information of the ROI. Lastly, in the region classification stage, classifiers determine the category of the respective region [22]. These algorithms, however, heavily depend on the design of hand-crafted feature descriptors. This makes them susceptible to unfavorable scenarios: first, this type of design precludes capturing essential semantic information in complex environments, and, further, the pipeline is difficult to optimize globally as a whole [22].
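To make the three-phase pipeline concrete, the following minimal sketch (our own illustration, not any particular published detector) walks a window over a NumPy-style image array; the `extract_features` and `classify` callables are hypothetical stand-ins for the hand-crafted descriptor and classifier described above.

```python
def sliding_window_detect(image, extract_features, classify,
                          window=(64, 64), stride=16, threshold=0.5):
    """Toy version of the classic pipeline: proposal generation (sliding window),
    feature extraction and region classification.
    `extract_features` (e.g., a HOG descriptor) and `classify` (returning (label, score))
    are placeholders for the hand-crafted components."""
    detections = []
    h, w = image.shape[:2]
    for y in range(0, h - window[1] + 1, stride):          # proposal generation
        for x in range(0, w - window[0] + 1, stride):
            roi = image[y:y + window[1], x:x + window[0]]
            features = extract_features(roi)               # hand-crafted feature extraction
            label, score = classify(features)              # region classification
            if score >= threshold:
                detections.append((x, y, window[0], window[1], label, score))
    return detections
```

The nested loops make the main weakness visible: every window is processed independently, so the feature design and the overall pipeline cannot be optimized jointly.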
Current state-of-the-art object detectors employ CNNs, which exploit convolutional layers in deep architectures, extracting features that are subsequently classified by fully connected layers. Deep convolutional networks designed to this extent address fundamental limitations of earlier detectors, though they require substantially larger training datasets to support hyperparameter optimization [23]. In this regard, a series of very deep networks emerged, which laid the foundation for the modern object detectors that have been extensively used in recent years. A more detailed description is provided in Section 3.

2.2. Maritime Object Detection

Classic object detection methods based on horizon detection, background subtraction or Histograms of Oriented Gradients (HOG) using sliding windows revealed a decline in performance over waterborne ship datasets due to the high complexity of the environment, posed by ever-changing atmospheric conditions, dynamic illumination, and viewpoint and background variations [24].
CNNs have started to be exploited more often for maritime vessel detection from waterborne images only in the last few years. For example, in [25], a model based on YOLO [26] is proposed for ship detection. A large-scale ship dataset, SeaShips, is introduced in [6] and consists of a collection of waterborne images of maritime vessels. Three object detectors (Faster R-CNN [27], SSD [28] and YOLO [26]) are employed in [25], with ablation studies for different scenarios. In [12], a new large-scale maritime vessel dataset is introduced, McShips. Three baseline object detectors are compared against one another, in an attempt to identify their robustness and vulnerabilities over the McShips dataset. In [13], a new vessel recognition dataset, MARVEL, is introduced, which is used to perform three tasks: vessel identity verification through CNNs (derived from [29]), vessel retrieval through multi-class SVMs [30] and vessel recognition by employing VGG-F (a VGG variant) and Alexnet networks [31]. The VAIS dataset is presented in [14], encompassing maritime images in the visible and infrared spectrums, and three classification methods are employed: CNNs [32], gnostic fields [33] and a combination of both. In [15], Zwemer et al. provide a dataset for maritime vessel detection and tracking and perform cross-validation on viewpoints to assess the influence of scene context on the detection performance of various implementations of the SSD detector [28]. Another maritime vessel detection dataset, ABOships, was recently introduced in [18], and the performance of several one-stage and two-stage detectors was analyzed over several feature extractors. Moosbauer et al. [34] perform a benchmarking study on the SMD dataset [11], employing detectors from the R-CNN family, with different feature extractors, pretrained on the COCO and Imagenet datasets.

3. CNN-Based Detectors

Modern CNN-based detectors can be divided into two categories, anchor-based and anchor-free detectors, which we briefly discuss below. The former consists of two-stage and one-stage detectors, and the latter comprises keypoint-based and center-based detectors.

3.1. Anchor-Based Detectors

Anchor-based detectors adopt a conceptual design similar to the classic sliding-window approach. They propose a high number of preset anchors within an image, subsequently determine their categories and refine the anchor coordinates; the refined anchors are then selected as the final detections.
Two-stage object detectors are preferred when accuracy is more relevant for a specific use case than computational efficiency. Among the detectors in this category, Faster R-CNN [27] proved to be one of the most prevalent, due to an increased accuracy compared to other algorithms in the R-CNN family. Faster R-CNN consists of two stages: a region proposal network (RPN), which proposes candidate bounding boxes, and a Fast R-CNN detector applied to the proposals, which performs both classification and regression for each bounding box. Later on, Faster R-CNN was extended to other architectures with enhanced performance. R-FCN [35], for instance, produces a position-sensitive score map, which encodes the position of different classes relative to a spatial position, and adopts a position-sensitive ROI pooling layer to extract region-specific features, encoding the spatial information required for object detection. Mask R-CNN [36] preserves the first stage of Faster R-CNN, the RPN; however, in the second stage, while performing classification and regression on the bounding boxes, it concurrently generates a binary mask for each ROI, encoding a spatial layout for every object.
One-stage detectors were adopted to address computational efficiency. In contrast to two-stage methods, one-stage methods do not employ a distinct stage for proposal generation; they attempt to classify each ROI either as background or as an object of interest. One of the prevalent anchor-based one-stage detectors, SSD [28], exploits anchors (similar to the RPN in the Faster R-CNN detector) for prediction, assigning scores to each anchor. Subsequently, through a feedforward CNN, SSD produces a limited number of anchors and respective scores for each object class. SSD obtained accuracy commensurate with Faster R-CNN, performed well in real-time scenarios and was extended by several subsequent architectures.

3.2. Anchor-Free Object Detectors

Anchor-free detectors rely on predicting keypoints of a bounding box instead of attempting to match an object with an anchor.
Keypoint-based detectors identify keypoints and subsequently generate bounding boxes. CornerNet [37], for instance, assigns bounding boxes by identifying the top-left and bottom-right corners of an object. Other adaptations of CornerNet emerged, e.g., CornerNet-Lite [38], which combines two subnets, CornerNet-Saccade and CornerNet-Squeeze. This association of subnets enhances efficiency while maintaining accuracy in real-time scenarios. Several other detectors are based on CornerNet; one of the most notable is Centernet (Keypoint triplets for object detection) [39], which extends CornerNet to detect keypoint triplets rather than just the corners, to improve accuracy.
Center-based detectors determine the center of objects and subsequently regress the height and width of bounding boxes without using anchor priors. One highly popular anchor-free detector family is the YOLO series [26], which predicts bounding boxes based on points around the center of objects by dividing the image into a grid; the grid cell that the center of an object falls in is decisive for the detection. Centernet (Objects as Points) [20] exploits keypoint estimation to determine the center points of objects through keypoint heatmaps and regresses object properties, including the dimensions of the bounding boxes and their offsets. In an ablation study performed in [20], Centernet records the best efficiency–accuracy tradeoff on the COCO dataset, which is why we chose to benchmark it in the maritime environment.
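To make the center-based formulation concrete, the sketch below shows how peaks in a class-wise center heatmap, together with regressed sizes and sub-pixel offsets, can be decoded into bounding boxes. This is our own simplified illustration of the idea in [20], not the reference implementation; tensor names, shapes and the top-k value are assumptions, and the returned coordinates are in the output (heatmap) resolution, so they would still be multiplied by the output stride to map back to the input image.

```python
import torch
import torch.nn.functional as F

def decode_centers(heatmap, wh, offset, k=100):
    """Decode Centernet-style outputs into boxes.
    heatmap: (C, H, W) class-wise center heatmap after sigmoid,
    wh: (2, H, W) regressed width/height, offset: (2, H, W) sub-pixel offset."""
    # Keep only local maxima (a cheap substitute for NMS in center-based detectors).
    pooled = F.max_pool2d(heatmap[None], 3, stride=1, padding=1)[0]
    heatmap = heatmap * (pooled == heatmap).float()

    C, H, W = heatmap.shape
    scores, idx = heatmap.view(-1).topk(k)                       # top-k peaks over all classes
    classes = torch.div(idx, H * W, rounding_mode="floor")
    ys = torch.div(idx % (H * W), W, rounding_mode="floor")
    xs = (idx % (H * W)) % W

    boxes = []
    for s, c, y, x in zip(scores, classes, ys, xs):
        cx = x.float() + offset[0, y, x]                          # refined center
        cy = y.float() + offset[1, y, x]
        w_, h_ = wh[0, y, x], wh[1, y, x]
        boxes.append((float(cx - w_ / 2), float(cy - h_ / 2),
                      float(cx + w_ / 2), float(cy + h_ / 2), int(c), float(s)))
    return boxes
```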

4. ABOships Dataset

Several publicly accessible datasets for maritime object detection have been used in various studies, e.g., SeaShips [6], SMD [11], McShips [12], Marvel [13], VAIS [14], Harbour Surveillance [15], SeaSAw [16], Maritime Ports [17] and ABOships [18]; see Table 1.
Among these datasets, we chose ABOships to investigate transfer learning, based on the ratio of bounding boxes per image and, more importantly, on the variety in the dataset, which includes ship images in various atmospheric conditions and illumination, with different degrees of occlusion, scale variations and background information, including images of both urban landscapes and the open sea. The dataset was collected in a variety of atmospheric conditions specific to the summer weather in the Finnish Archipelago, alternating between bright periods with clear skies and cloudy weather with precipitation. Regarding illumination, the dataset comprises images captured in different geographical areas (urban, open sea, port areas) throughout various daylight hours. Due to the diversity of the geographical areas in the dataset, many images in ABOships contain instances of maritime objects occluding each other. The images in the ABOships dataset often depict ships in motion, which ensures the presence of different visible proportions of maritime objects. Background variation is another important aspect in maritime object detection: the heterogeneity of the maritime objects can make them more difficult to separate from the background, due to the variable and potentially extended background information included in the bounding box (e.g., ships with sails).
ABOships comprises 9880 images of ships spanning nine categories (boat, cargoship, cruiseship, ferry, militaryship, motorboat, passengership, sailboat, miscboat), seamarks and miscellaneous floaters. However, the performance of one-stage detectors on ABOships is weak in comparison with other datasets [19]. One reason is that some classes are underrepresented: the cargoship and miscellaneous classes, for instance, have only 161 and 200 annotations, respectively, contained in only 157 and 129 images, respectively. A second reason is that the dataset contains 8740 annotations with an occupied pixel area of less than 16 × 16 pixels, since a tracker was used in [18] to verify the labels. To address these issues, we relabeled the dataset, eliminating extremely small annotations, and provided superclasses that are more representative of the maritime environment, along with a good distribution of images and annotations for training, test and validation; see Section 5.

5. ABOships-PLUS—An Improved ABOships Dataset

Upon performing exploratory analysis on the ABOships dataset, we eliminated very small objects (occupied pixel area below 16² pixels) and aggregated the object categories into four superclasses based on distinct visual characteristics, to be able to employ both transfer learning and learning from scratch. Once very small objects were eliminated from the annotations, we grouped the original ABOships classes into superclasses based on their respective semantic relevance, under human supervision. The aim of this class aggregation was to make the final superclasses in the new ABOships-PLUS dataset as meaningful as possible, both semantically and visually, with respect to the maritime object type. A detailed account of the classes of the new ABOships-PLUS dataset is given in Table 2, which illustrates that the ship superclass is the most represented, with 46% of the total number of bounding boxes and with 70% of the images in the dataset containing the label ship. The sailboat and powerboat superclasses have a similar distribution, while the stationary superclass is the least represented, with only 8% of the total number of bounding boxes, appearing in just 26% of the images. Example images of the superclasses of the ABOships-PLUS dataset are illustrated in Figure 1 for the stationary superclass, in Figure 2 for the powerboat superclass and in Figure 3 for the ship superclass.
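A minimal sketch of this relabeling step is given below, assuming a simple list-of-dictionaries annotation format; the field names and label strings are illustrative, not the actual ABOships file layout. The class-to-superclass mapping follows Table 2.

```python
# Mapping from original ABOships classes to ABOships-PLUS superclasses (see Table 2).
SUPERCLASS = {
    "sailboat": "sailboat",
    "boat": "powerboat", "motorboat": "powerboat",
    "passengership": "ship", "ferry": "ship", "cruiseship": "ship",
    "cargoship": "ship", "militaryship": "ship", "miscboat": "ship",
    "seamark": "stationary", "miscellaneous": "stationary",
}

MIN_AREA = 16 ** 2  # very small objects (occupied pixel area below 16^2) are discarded

def relabel(annotations):
    """annotations: iterable of dicts with 'bbox' = (x, y, w, h) and 'label'."""
    kept = []
    for ann in annotations:
        x, y, w, h = ann["bbox"]
        if w * h < MIN_AREA:
            continue                                   # drop tiny boxes
        kept.append({**ann, "label": SUPERCLASS[ann["label"]]})
    return kept
```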

6. Experimental Results

We benchmark the performance of Centernet (Objects as Points) with four different backbones (DLA, Hourglass, Resnet18, Resnet101) on the ABOships-PLUS dataset, using transfer learning (employing COCO-pretrained weights) and also training from scratch. In this section, we investigate how transfer learning, the choice of feature extractor and object size affect the detection accuracy of Centernet compared to training from scratch.

6.1. Evaluation Criteria

To evaluate the performance of object detection algorithms on imagery datasets, various quantitative indicators can be exploited. Object detectors predict the location of objects within images and the object class, by providing a bounding box for every object with a specific confidence score. One of the most common measures in object detection is the mean average precision (mAP). To examine the significance of mAP, another metric requires introduction, the IoU (Intersection over Union). IoU is a measure derived from the Jaccard Index [40], representing the area of overlap between the predicted bounding box, B_p, and the ground-truth bounding box, B_gt, divided by the area of their union, as follows [41]:
IoU = \frac{|B_p \cap B_{gt}|}{|B_p \cup B_{gt}|}
A detection can be classified as correct or incorrect by comparing the IoU with a given threshold t. If the IoU is below the threshold (IoU < t), the detection is regarded as incorrect; otherwise, it is regarded as correct.
Based on the above considerations, precision and recall can be calculated. Precision is the fraction of detections that are correct, while recall is the fraction of ground-truth objects that are correctly detected. Calculating precision and recall requires each detected bounding box to be classified into one of the following outcomes:
  • True positive (TP): a correct prediction of a ground-truth bounding box made by the detector;
  • False positive (FP): an incorrect detection of a non-existent object or a misplaced detection of an existing one;
  • False negative (FN): a ground-truth bounding box which was not detected;
  • True negative (TN): this metric is not applicable in object detection, since the number of bounding boxes that should not be detected in a given image is infinite.
Given the above definitions, the precision and recall can be calculated as follows:
P = \frac{TP}{TP + FP}
R = \frac{TP}{TP + FN}
Based on the above metrics, a precision–recall (P × R) curve can be plotted. The P × R curve can be interpreted as the trade-off between precision and recall for different IoU thresholds. An object detector can be considered reliable if the precision remains high as the recall increases, i.e., if precision and recall maintain high values as the threshold varies. Average precision (AP) is a metric representing the area under the P × R curve over all thresholds. Often this curve is highly irregular, following a zig-zag pattern, which makes measuring the area with high accuracy challenging. To estimate the AP, it is beneficial to remove the irregular pattern, which can be achieved through interpolation, be it an 11-point interpolation or an all-point interpolation [41]. The 11-point interpolation can be calculated as follows:
AP_{11} = \frac{1}{11} \sum_{R \in \{0, 0.1, \ldots, 0.9, 1\}} P_{i}(R),
with
P_{i}(R) = \max_{\tilde{R} \geq R} P(\tilde{R}).
AP_{11} is calculated employing the interpolated precision P_{i}(R), i.e., the maximum precision measured at any recall greater than or equal to R.
Finally, the mean average precision (mAP) measures the accuracy of the detector over all N evaluated classes in a dataset and is calculated as:
mAP = \frac{1}{N} \sum_{i=1}^{N} AP_i.
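For concreteness, the sketch below computes these quantities under the definitions above, with boxes given in (x1, y1, x2, y2) form; it is our own illustration, not the evaluation code used for the reported results, and it assumes the precision/recall pairs have already been obtained from a confidence sweep.

```python
import numpy as np

def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def ap_11point(precisions, recalls):
    """11-point interpolated AP from precision/recall pairs."""
    precisions, recalls = np.asarray(precisions), np.asarray(recalls)
    ap = 0.0
    for r in np.linspace(0, 1, 11):
        mask = recalls >= r
        ap += precisions[mask].max() if mask.any() else 0.0   # interpolated precision P_i(R)
    return ap / 11

def mean_ap(ap_per_class):
    """mAP: mean of the per-class average precisions."""
    return sum(ap_per_class) / len(ap_per_class)
```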

6.2. Training

We randomized the 9880 images in the ABOships-PLUS dataset and split them into three sets in all training experiments: training (70%), validation (15%) and test (15%). We converted the ABOships-PLUS annotations into COCO format according to the superclasses defined in Section 5. The input resolution is scaled down to 512 × 512 for all models, and the output resolution is 128 × 128 for all models. For optimizing the overall objective, we employed the Adam optimizer [42], with an initial learning rate of 1.25 × 10⁻⁴ for all transfer learning models; for all backbones except Hourglass, the learning rate was reduced by 10% during training. One round of training Centernet over all four backbones on the ABOships-PLUS dataset required approximately 99 hours to complete, with the Hourglass feature extractor requiring the longest time, approximately 55 hours. During testing, following [20], we maintained the original image resolution while zero-padding the input to reach the maximum stride of the network. For Resnet and DLA, the image was padded by up to 32 pixels, while for Hourglass, 128 pixels were employed.
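The optimizer and resolution settings above can be summarized in the following runnable sketch; the tiny stand-in network and dummy batch are ours (the real experiments use the full Centernet backbones and losses), and only the learning rate and input/output resolutions are taken from the paper.

```python
import torch
from torch import nn

INPUT_RES, OUTPUT_RES = 512, 128     # inputs scaled to 512x512, heatmaps at 128x128
NUM_CLASSES = 4                      # ABOships-PLUS superclasses

# Stand-in backbone: any network producing a (NUM_CLASSES + 4)-channel map at OUTPUT_RES.
model = nn.Sequential(
    nn.Conv2d(3, 32, 7, stride=4, padding=3), nn.ReLU(),
    nn.Conv2d(32, NUM_CLASSES + 4, 1),       # heatmap (4) + size (2) + offset (2) heads, collapsed
)
optimizer = torch.optim.Adam(model.parameters(), lr=1.25e-4)   # initial learning rate from the paper

images = torch.rand(2, 3, INPUT_RES, INPUT_RES)                # dummy batch standing in for ABOships-PLUS
target = torch.rand(2, NUM_CLASSES + 4, OUTPUT_RES, OUTPUT_RES)

for step in range(3):                                          # a few dummy optimization steps
    out = model(images)
    loss = nn.functional.mse_loss(out, target)                 # placeholder for the focal + L1 Centernet losses
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```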
To investigate the quality of the training, validation and test sets, it is essential to examine the distribution of classes over the three sets. Figure 4 shows the occupied pixel area of the bounding boxes in the dataset. Figure 5 illustrates a fairly balanced distribution over the three sets, based on the split described above. The majority of bounding boxes are, based on the COCO challenge's size categories, either small or medium-sized, which makes vessel detection more challenging [4]. This makes ABOships-PLUS an interesting dataset to train on, since it reflects real maritime conditions. Both the test and validation sets retain a distribution similar to that of the training set.
All experiments were performed on NVIDIA RTX 2080 Ti, with 11 GB GDDR6 and 4352 NVIDIA CUDA cores, CUDA Version: 11.3 and PyTorch version 1.1.0.

6.3. Transfer Learning vs. Learning from Scratch

The effect of various feature extractors: To evaluate the extent to which transfer learning affects the accuracy, we trained the proposed CNN-based detector on the ABOships-PLUS dataset over four feature extractors, using COCO-pretrained weights. Moreover, we also trained the same architecture from scratch on the ABOships-PLUS dataset. Table 3 and Table 4 illustrate the performance of the algorithms under different feature extractors at mAP50. The results were obtained using four levels of augmentation at inference: N.A. (non-augmented), F (flip testing), MS (multi-scale augmentation: 0.5, 0.75, 1, 1.25, 1.5) and MS&F (combined), which yield different levels of performance, with MS&F accounting for the best accuracy throughout all experiments; see Table 3 and Table 4. For the experiments where we employed training from scratch, we obtained results comparable to the training performance reported in [20]. Using different levels of augmentation at inference (flip testing, multi-scale testing and combined), we noticed an improvement in accuracy for all models.
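A sketch of this kind of test-time augmentation is shown below: flipped and rescaled copies of the image are passed through the detector and the resulting boxes are mapped back and merged with NMS. The `detect` callable, its return format and the single class-agnostic NMS pass are simplifying assumptions of ours, not the exact procedure used for the tables.

```python
import torch
from torchvision.ops import nms

SCALES = (0.5, 0.75, 1.0, 1.25, 1.5)   # multi-scale factors used at inference

def augmented_inference(detect, image):
    """Flip and multi-scale test-time augmentation with NMS merging.
    `detect(image)` is assumed to return (boxes [N, 4] in xyxy, scores [N])."""
    all_boxes, all_scores = [], []
    for s in SCALES:
        scaled = torch.nn.functional.interpolate(image[None], scale_factor=s, mode="bilinear")[0]
        for flipped in (False, True):
            inp = scaled.flip(-1) if flipped else scaled
            boxes, scores = detect(inp)
            if flipped:                                   # map boxes back to the unflipped frame
                w = inp.shape[-1]
                boxes = boxes.clone()
                boxes[:, [0, 2]] = w - boxes[:, [2, 0]]
            all_boxes.append(boxes / s)                   # rescale to the original resolution
            all_scores.append(scores)
    boxes, scores = torch.cat(all_boxes), torch.cat(all_scores)
    keep = nms(boxes, scores, iou_threshold=0.5)          # merge detections across augmentations
    return boxes[keep], scores[keep]
```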
The values of the proposed metrics improve in all models when using pretrained weights compared with training from scratch, which shows that transfer learning can enhance vessel detection accuracy. These findings emphasize the importance of transfer learning for CNN-based detectors on domain-specific datasets, especially in the maritime environment, where data are scarce and training very deep networks is still challenging.
To investigate the efficiency of different feature extractors, we collected detection results under four different backbones: DLA, Hourglass, Resnet101 and Resnet18. Table 3 and Table 4 show that DLA outperforms all other feature extractors in both experiments, with an mAP50 of 74.4% in transfer learning and 68.4% in training from scratch. The second-best performers were Resnet101 for transfer learning (72.6%) and Hourglass for training from scratch (63.7%). The reason why DLA performed better than Hourglass and Resnet in both experiments is two-fold: first, the limited size of the training data may penalize larger architectures (Hourglass, Resnet) and, second, the input data are imbalanced owing to the high heterogeneity of at least one of the superclasses in the dataset (the ship superclass; see Table 2 and Figure 3).
Object size and inter-class effect: To explore the effect of object size on detection accuracy, we divided all annotated objects in the chosen dataset into three categories, small, medium and large, based on the COCO challenge's size categories. Specifically, after eliminating all objects with a bounding-box area below 16² pixels, 30% of the remaining objects are small (occupied pixel area below 32² pixels), 48% are medium-sized (occupied pixel area between 32² and 96² pixels) and 22% fall into the large-objects category, with an occupied pixel area above 96² pixels.
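The size binning follows simple area thresholds; a small helper illustrating it (our own, with the thresholds stated above):

```python
def size_category(bbox_area):
    """Assign a COCO-style size category to a bounding-box area in pixels."""
    if bbox_area < 32 ** 2:
        return "small"
    elif bbox_area < 96 ** 2:
        return "medium"
    return "large"
```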
From our experiments in Table 5 and Table 6, we observe that the detection accuracy increases markedly with object size. In the transfer learning experiment, DLA outperforms all other backbones in the small-objects category (mAP_S = 50.4%) and in the medium-size category (mAP_M = 65.2%), while both Hourglass and Resnet101 exceed DLA and Resnet18 in the large-objects category (mAP_L = 82.5%). In training from scratch, Hourglass outperforms all other feature extractors in all object-size categories (mAP_S = 47.3%, mAP_M = 62.5%, mAP_L = 78.8%). In general, the detection of smaller objects is more difficult, considering that less information is associated with the smaller occupied pixel area. However, transfer learning performed better than learning from scratch for all respective feature extractors and object sizes.
To compare the performance of transfer learning between the original ABOships dataset and our newly improved dataset, ABOships-PLUS, we performed transfer learning on the original ABOships dataset using Centernet under the four feature extractors, across scales, and summarized the results in Table 7. Comparing the results in Table 6 and Table 7, we can conclude that the Centernet performance on the newly annotated dataset, ABOships-PLUS, exceeds that on the original dataset.
To investigate the inter-class effect on detection, specifically because of the previous aggregation of annotations into superclasses, we tested the transfer learning models per class, without augmentation at inference; see Table 8. We noticed that the Hourglass backbone outperforms the others on the “powerboat” and “sailboat” superclasses, which contain maritime vessels that are more homogeneous in shape, while DLA performed best in the “ship” category, which is the most heterogeneous of the classes (for the heterogeneity of the class “ship”, see Table 2).
Runtime: We provide a comparison of the runtime of the proposed methods using the same hardware specification described in Section 6.2. All proposed detectors were tested on an example image from the ABOships-PLUS dataset with HD resolution; see Figure 6. Resnet18 proved to be the fastest detector at inference, with a runtime of 175 ms, closely followed by DLA (183 ms) and Resnet101 (189 ms). In the transfer learning task with Centernet, the DLA backbone emerged as not only one of the fastest at inference but also the most accurate for small and medium-sized objects; see Table 5 and Table 6.
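Single-image inference times of this kind are typically measured on the GPU with explicit synchronization and a few warm-up iterations; a sketch of such a measurement is given below, assuming a callable `model` and a preprocessed `image` tensor already on the CUDA device (this is an illustration of the measurement idea, not the exact timing harness used for Figure 6).

```python
import time
import torch

def measure_runtime(model, image, warmup=10, runs=50):
    """Average single-image inference time in milliseconds on a CUDA device."""
    model.eval()
    with torch.no_grad():
        for _ in range(warmup):                  # warm-up to exclude CUDA initialization overhead
            model(image)
        torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(runs):
            model(image)
        torch.cuda.synchronize()                 # wait for all kernels before stopping the clock
    return (time.perf_counter() - start) / runs * 1000
```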
Qualitative results: Figure 7 illustrates an example of detection results for Centernet under the proposed feature extractors using transfer learning. We can notice in Figure 7a,b that Centernet with the DLA and Hourglass feature extractors detected regions registering high scores, up to 0.85 for medium-sized objects, and up to 0.75 and 0.71 for small objects. The other two feature extractors, Resnet18 and Resnet101, in Figure 7c,d, reached scores of 0.82 and 0.80 for medium-sized objects and up to 0.67 and 0.75 for the small-objects category, respectively. These results show that transfer learning can provide detection of small and medium-sized maritime objects with high accuracy, which is essential in maritime navigation.

7. Discussion

Maritime vessel detection plays a significant role in video surveillance, coastal security and navigation safety. It is paramount in many situations to detect maritime objects and their types in an effort to either alert various security systems or to aid in difficult navigation circumstances.
Under ideal sea conditions, conventional ship detection techniques that employ background separation or histograms of oriented gradients can yield good results. However, the extraction of low-level features can prove very difficult due to the complexity of the maritime environment, caused by adverse atmospheric conditions (glare, fog, clouds, high waves, rain, etc.). This issue has been addressed in various CNN studies, although deep learning models require domain-specific datasets to perform adequately; see [6,43]. Publicly available datasets specifically dedicated to maritime object detection are still hard to come by, and Section 2 goes into more detail about this aspect.

8. Conclusions

In this paper, to create a benchmarking dataset for maritime object detection, we improved a maritime object detection dataset and benchmarked the impact of transfer learning on object detection, using the newly relabeled domain-specific dataset, ABOships-PLUS, in a complex maritime environment. For this purpose, we performed an in-depth exploratory analysis of the original dataset [18], proposed four new superclasses to enable training, and trained a state-of-the-art center-based object detector, Centernet (Objects as Points) [20], both from scratch and using transfer learning, under a variety of feature extractors (DLA, Hourglass, Resnet101/18). We assessed their performance based on object size and object category and provided runtimes for an example image from the dataset. We noticed that all transfer learning models outperform their trained-from-scratch counterparts and achieve good detection scores for small and medium-sized objects. We conclude that the ABOships-PLUS dataset is adequate for use in domain adaptation. The experimental results show the importance of deep transfer learning for object detection in maritime environments.
In future developments of the ABOships-PLUS dataset, we intend to account for manual labeling errors and to consider inconsistent labels in the original dataset, which might have carried over into the newly annotated dataset. As a start, we plan to address fine-grained recognition errors, which make it more challenging for humans to discover objects during labeling even when they lie in plain sight [44]. Moreover, we plan to modify the Centernet architecture to improve ship detection and specifically address the variability in object size often encountered in maritime datasets collected in real scenarios. This would help address the detection of small objects, to which the Centernet architecture is sensitive. These proposed changes could also make a substantial difference to the performance of Centernet on ABOships-PLUS, since, even after removing very small objects, it still contains a high number of small objects with an occupied pixel area below 32² pixels. Another direction of research is to adapt the modified Centernet architecture described above for deployment on embedded devices, reducing model size, since more compact models are beneficial in autonomous navigation scenarios.

Author Contributions

Conceptualization, B.I. and J.L.; Methodology, B.I.; Software, J.W. and V.S.; Writing—original draft, B.I. and J.W.; Writing—review and editing, B.I.; Supervision, B.I. All authors have read and agreed to the published version of the manuscript.

Funding

This research was partially supported by Business Finland under the SMARTER (Smart Terminals), a Sea4Value project, and by Gösta Branders forskningsfond, Stiftelsen för Åbo Akademi.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data presented in this study are available upon request from the corresponding author. The data are not yet publicly available because they are in the process of being published; publication is planned at https://zenodo.org/ (accessed on 27 June 2023). In the meantime, a separate package with the data and any necessary code can be provided to reviewers.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Su, L.; Chen, Y.; Song, H.; Li, W. A survey of maritime vision datasets. Multimed. Tools Appl. 2023, 82, 28873–28893.
  2. Fefilatyev, S.; Goldgof, D.; Shreve, M.; Lembke, C. Detection and tracking of ships in open sea with rapidly moving buoy-mounted camera system. Ocean. Eng. 2012, 54, 1–12.
  3. Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Fei-Fei, L. Imagenet: A large-scale hierarchical image database. In Proceedings of the CVPR, Miami, FL, USA, 20–25 June 2009; IEEE: Piscataway, NJ, USA, 2009; pp. 248–255.
  4. Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C. Microsoft COCO: Common objects in context. In Proceedings of the ECCV, Zurich, Switzerland, 6–12 September 2014; Springer: Berlin/Heidelberg, Germany, 2014; pp. 740–755.
  5. Everingham, M.; Eslami, S.; Van Gool, L.; Williams, C.; Winn, J.; Zisserman, A. The Pascal Visual Object Classes Challenge: A retrospective. Int. J. Comput. Vis. 2015, 111, 98–136.
  6. Shao, Z.; Wu, W.; Wang, Z.; Du, W.; Li, C. Seaships: A large-scale precisely annotated dataset for ship detection. IEEE Trans. Multimed. 2018, 20, 2593–2604.
  7. Torralba, A.; Efros, A. Unbiased look at dataset bias. In Proceedings of the CVPR, Colorado Springs, CO, USA, 20–25 June 2011; IEEE: Piscataway, NJ, USA, 2011; pp. 1521–1528.
  8. Liu, Y.; Wang, R.; Shan, S.; Chen, X. Structure inference net: Object detection using scene-level context and instance-level relationships. In Proceedings of the CVPR, Salt Lake City, UT, USA, 18–23 June 2018; pp. 6985–6994.
  9. Teutsch, M.; Krüger, W. Classification of small boats in infrared images for maritime surveillance. In Proceedings of the 2010 International WaterSide Security Conference, Carrara, Italy, 3–5 November 2010; IEEE: Piscataway, NJ, USA, 2010; pp. 1–7.
  10. Staff, A.M.; Zhang, J.; Li, J.; Xie, J.; Traiger, E.A.; Glomsrud, J.A.; Karolius, K.B. An Empirical Study on Cross-Data Transferability of Adversarial Attacks on Object Detectors. In Proceedings of the AI-Cybersec@ SGAI, Cambridge, UK, 14 December 2021; pp. 38–52.
  11. Prasad, D.; Rajan, D.; Rachmawati, L.; Rajabally, E.; Quek, C. Video processing from electro-optical sensors for object detection and tracking in a maritime environment: A survey. IEEE Trans. Intell. Transp. Syst. 2017, 18, 1993–2016.
  12. Zheng, Y.; Zhang, S. Mcships: A Large-Scale Ship Dataset For Detection And Fine-Grained Categorization In The Wild. In Proceedings of the 2020 IEEE International Conference on Multimedia and Expo (ICME), London, UK, 6–10 July 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 1–6.
  13. Gundogdu, E.; Solmaz, B.; Yücesoy, V.; Koc, A. Marvel: A large-scale image dataset for maritime vessels. In Proceedings of the Asian Conference on Computer Vision, Taipei, Taiwan, 20–24 November 2016; Springer: Berlin/Heidelberg, Germany, 2016; pp. 165–180.
  14. Zhang, M.M.; Choi, J.; Daniilidis, K.; Wolf, M.T.; Kanan, C. VAIS: A dataset for recognizing maritime imagery in the visible and infrared spectrums. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Boston, MA, USA, 7–12 June 2015; pp. 10–16.
  15. Zwemer, M.H.; Wijnhoven, R.G.J.; de With, P.H.N. Ship Detection in Harbour Surveillance based on Large-Scale Data and CNNs. In Proceedings of the VISIGRAPP (5: VISAPP), Madeira, Portugal, 27–29 January 2018; pp. 153–160.
  16. Kaur, P.; Aziz, A.; Jain, D.; Patel, H.; Hirokawa, J.; Townsend, L.; Reimers, C.; Hua, F. Sea situational awareness (seasaw) dataset. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 2579–2587.
  17. Petković, M.; Vujović, I.; Lušić, Z.; Šoda, J. Image Dataset for Neural Network Performance Estimation with Application to Maritime Ports. J. Mar. Sci. Eng. 2023, 11, 578.
  18. Iancu, B.; Soloviev, V.; Zelioli, L.; Lilius, J. ABOships—An Inshore and Offshore Maritime Vessel Detection Dataset with Precise Annotations. Remote Sens. 2021, 13, 988.
  19. Nunes, D.; Fortuna, J.; Damas, B.; Ventura, R. Real-time vision based obstacle detection in maritime environments. In Proceedings of the 2022 IEEE International Conference on Autonomous Robot Systems and Competitions (ICARSC), Santa Maria da Feira, Portugal, 29–30 April 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 243–248.
  20. Zhou, X.; Wang, D.; Krähenbühl, P. Objects as points. arXiv 2019, arXiv:1904.07850.
  21. Farahnakian, F.; Zelioli, L.; Heikkonen, J. Transfer learning for maritime vessel detection using deep neural networks. In Proceedings of the 2021 IEEE International Intelligent Transportation Systems Conference (ITSC), Indianapolis, IN, USA, 19–22 September 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 1–6.
  22. Wu, X.; Sahoo, D.; Hoi, S. Recent advances in deep learning for object detection. Neurocomputing 2020, 396, 39–64.
  23. Liu, L.; Ouyang, W.; Wang, X.; Fieguth, P.; Chen, J.; Liu, X.; Pietikäinen, M. Deep learning for generic object detection: A survey. Int. J. Comput. Vis. 2020, 128, 261–318.
  24. Prasad, D.; Prasath, C.; Rajan, D.; Rachmawati, L.; Rajabaly, E.; Quek, C. Challenges in video based object detection in maritime scenario using computer vision. arXiv 2016, arXiv:1608.01079.
  25. Lee, S.J.; Roh, M.I.; Lee, H.W.; Ha, J.S.; Woo, I.G. Image-Based Ship Detection and Classification for Unmanned Surface Vehicle Using Real-Time Object Detection Neural Networks. In Proceedings of the ISOPE International Ocean and Polar Engineering Conference (ISOPE), Sapporo, Japan, 10–15 June 2018.
  26. Redmon, J.; Farhadi, A. YOLO9000: Better, faster, stronger. In Proceedings of the CVPR, Honolulu, HI, USA, 21–26 July 2017; pp. 7263–7271.
  27. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster r-cnn: Towards real-time object detection with region proposal networks. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015; pp. 91–99.
  28. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A. SSD: Single shot multibox detector. In Proceedings of the ECCV, Amsterdam, The Netherlands, 11–14 October 2016; Springer: Berlin/Heidelberg, Germany, 2016; pp. 21–37.
  29. Sun, Y.; Wang, X.; Tang, X. Deep learning face representation from predicting 10,000 classes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 1891–1898.
  30. Chang, C.C.; Lin, C.J. LIBSVM: A library for support vector machines. ACM Trans. Intell. Syst. Technol. TIST 2011, 2, 1–27.
  31. Yu, W.; Yang, K.; Bai, Y.; Xiao, T.; Yao, H.; Rui, Y. Visualizing and comparing AlexNet and VGG using deconvolutional layers. In Proceedings of the 33rd International Conference on Machine Learning, New York, NY, USA, 19–24 June 2016.
  32. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
  33. Kanan, C. Fine-grained object recognition with gnostic fields. In Proceedings of the IEEE Winter Conference on Applications of Computer Vision, Steamboat Springs, CO, USA, 24–26 March 2014; IEEE: Piscataway, NJ, USA, 2014; pp. 23–30.
  34. Moosbauer, S.; Konig, D.; Jakel, J.; Teutsch, M. A benchmark for deep learning based object detection in maritime environments. In Proceedings of the CVPR Workshops, Long Beach, CA, USA, 16–17 June 2019; IEEE: Piscataway, NJ, USA, 2019.
  35. Dai, J.; Li, Y.; He, K.; Sun, J. R-fcn: Object detection via region-based fully convolutional networks. In Proceedings of the Advances in Neural Information Processing Systems, Barcelona, Spain, 5–10 December 2016; pp. 379–387.
  36. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask r-cnn. In Proceedings of the ICCV, Venice, Italy, 22–29 October 2017; pp. 2961–2969.
  37. Law, H.; Deng, J. Cornernet: Detecting objects as paired keypoints. In Proceedings of the ECCV, Munich, Germany, 8–14 September 2018; pp. 734–750.
  38. Law, H.; Teng, Y.; Russakovsky, O.; Deng, J. Cornernet-lite: Efficient keypoint based object detection. arXiv 2019, arXiv:1904.08900.
  39. Duan, K.; Bai, S.; Xie, L.; Qi, H.; Huang, Q.; Tian, Q. Centernet: Keypoint triplets for object detection. In Proceedings of the ICCV, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 6569–6578.
  40. Taha, A.A.; Hanbury, A. Metrics for evaluating 3D medical image segmentation: Analysis, selection, and tool. BMC Med. Imaging 2015, 15, 1–28.
  41. Padilla, R.; Netto, S.L.; Da Silva, E.A. A survey on performance metrics for object-detection algorithms. In Proceedings of the 2020 International Conference on Systems, Signals and Image Processing (IWSSIP), Sofia, Bulgaria, 1–3 June 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 237–242.
  42. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
  43. Shao, Z.; Wang, L.; Wang, Z.; Du, W.; Wu, W. Saliency-aware convolution neural network for ship detection in surveillance video. IEEE Trans. Circuits Syst. Video Technol. 2019, 30, 781–794.
  44. Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M.; et al. Imagenet large scale visual recognition challenge. Int. J. Comput. Vis. 2015, 115, 211–252.
Figure 1. Example images of the stationary superclass of ABOships-PLUS dataset encompassing two classes from the original ABOships dataset: seamark (a) and miscellaneous floater (b).
Figure 2. Example images of the powerboat superclass of ABOships-PLUS dataset encompassing two classes from the original ABOships dataset: boat (a) and motorboat (b).
Figure 3. Example images for the ship superclass in the ABOships-PLUS dataset, encompassing the following classes from the ABOships dataset: cargoship (a), cruiseship (b), passengership (c), ferry (d), militaryship (e) and miscboat (f).
Figure 4. Histograms of occupied pixel area at log₂ scale for all annotated objects in the ABOships-PLUS dataset by object category, divided into three groups for each category: small, medium and large, according to the COCO challenge's size categories (small: log₂(area) < 10, medium: 10 < log₂(area) < 13.16 and large: log₂(area) > 13.16). (a) Powerboat. (b) Sailboat. (c) Ship. (d) Stationary.
Figure 5. ABOships-PLUS: Class distribution for training (blue), test (red) and validation (green) sets, where the count represents the number of objects in every class.
Figure 6. The runtime (milliseconds) of the proposed detector for an example image from ABOships-PLUS over the four considered backbones: DLA, Hourglass, Resnet101 and Resnet18.
Figure 7. Qualitative transfer learning detection results for the ABOships-PLUS dataset with Centernet, over four feature extractors: (a) DLA, (b) Hourglass, (c) Resnet18 and (d) Resnet101. The ground-truth bounding boxes are shown as pink rectangles. Predicted boxes by Centernet with these backbones are depicted as purple bounding boxes for DLA, green for Hourglass, orange for Resnet18 and red for Resnet101. Each output box is associated with a class label and a score with a value in the interval [0, 1].
Table 1. Maritime datasets for object detection from RGB imagery.

Denomination                Images      Annotations   Classes   Resolution
SeaShips [6]                31,455      40,077        6         1920 × 1080
SMD [11]                    17,450      192,980       10        1920 × 1080
McShips [12]                14,709      26,529        13        varied
Harbour Surveillance [15]   48,966      70,513        1         2048 × 1536
SeaSAw [16]                 1,900,000   14,600,000    12        varied
Maritime Ports [17]         19,337      27,849        12        1920 × 1080
ABOships [18]               9880        41,967        11        1920 × 720
Table 2. Properties of ABOships-PLUS: The table shows the ABOships-PLUS superclasses and their encompassing original classes of the ABOships dataset, the number of annotations (bounding boxes) for each superclass, the ratio of each superclass with respect to all annotations and the number of images that contain annotations representing each superclass.

ABOships-PLUS Superclass   ABOships Included Classes                                             Annotations per Class   Ratio over All Classes   Annotated Images per Class
Sailboat                   Sailboat                                                              8029                    24%                      3756
Powerboat                  Boat, Motorboat                                                       7244                    22%                      4044
Ship                       Passengership, Ferry, Cruiseship, Cargoship, Militaryship, Miscboat   15,272                  46%                      5887
Stationary                 Seamark, Miscellaneous floaters                                       2682                    8%                       2151
Table 3. Mean Average Precision (mAP50, in %) of Centernet on the ABOships-PLUS dataset based on different feature extractors, using training from scratch. The table illustrates testing results without test augmentation (N.A.) as well as with flip testing (F) and multi-scale augmentation (MS). For multi-scale, NMS is used to merge results.

Feature Extractor   N.A.   F      MS     MS&F
DLA                 61.0   62.1   66.8   68.4
Hourglass           49.0   50.2   61.7   63.7
Resnet101           48.8   50.9   56.7   59.3
Resnet18            51.4   53.9   60.2   62.6
Table 4. Mean Average Precision (mAP50, in %) of Centernet on the ABOships-PLUS dataset based on different feature extractors, using transfer learning. The table illustrates testing results without test augmentation (N.A.) as well as with flip testing (F) and multi-scale augmentation (MS).

Feature Extractor   N.A.   F      MS     MS&F
DLA                 67.4   68.6   73.1   74.4
Hourglass           64.8   66.7   70.4   72.2
Resnet101           67.4   69.8   72.3   72.6
Resnet18            63.8   66.2   69.8   71.4
Table 5. Mean Average Precision (mAP, in %) of Centernet on the ABOships-PLUS dataset, training from scratch, with different feature extractors and mAP across scales: mAP_S, mAP_M and mAP_L represent the mAP for small, medium and large objects. The results below are obtained through N.A. testing.

Feature Extractor   mAP_S   mAP_M   mAP_L
DLA                 45.3    56.0    75.2
Hourglass           47.3    62.5    78.8
Resnet101           30.8    55.5    73.9
Resnet18            31.3    56.3    75.8
Table 6. Mean Average Precision (mAP, in %) of Centernet on the ABOships-PLUS dataset, using transfer learning, with different feature extractors and mAP across scales: mAP_S, mAP_M and mAP_L represent the mAP for small, medium and large objects. The results below are obtained through N.A. testing.

Feature Extractor   mAP_S   mAP_M   mAP_L
DLA                 50.4    65.2    81.7
Hourglass           47.7    58.0    82.5
Resnet101           42.8    60.4    82.5
Resnet18            42.6    57.7    82.2
Table 7. Mean Average Precision (mAP, in %) of Centernet on the original ABOships dataset, using transfer learning, with different feature extractors and mAP across scales: mAP_S, mAP_M and mAP_L represent the mAP for small, medium and large objects. The results below are obtained through N.A. testing.

Feature Extractor   mAP_S   mAP_M   mAP_L
DLA                 46.71   59.5    80.4
Hourglass           47.5    59.5    81.0
Resnet101           40.6    56.3    74.1
Resnet18            33.3    56.8    74.1
Table 8. Average Precision (AP, in %) for the performance of Centernet on the ABOships-PLUS dataset, based on different feature extractors, using transfer learning; the results show the inter-class variation.

Feature Extractor   Powerboat   Sailboat   Ship   Stationary
DLA                 69.5        71.9       78.4   64.1
Hourglass           69.6        73.0       74.1   74.2
Resnet101           66.5        71.7       61.7   73.1
Resnet18            59.4        71.4       71.7   65.5
