Article

Probabilistic Ship Detection and Classification Using Deep Learning
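Kwanghyun Kim, Sungjun Hong, Baehoon Choi and Euntai Kim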

School of Electrical and Electronic Engineering, Yonsei University, 50 Yonsei-ro, Seodaemun-gu, Seoul 03722, Korea
* Author to whom correspondence should be addressed.
Appl. Sci. 2018, 8(6), 936; https://doi.org/10.3390/app8060936
Submission received: 9 April 2018 / Revised: 15 May 2018 / Accepted: 2 June 2018 / Published: 5 June 2018

Abstract
For an autonomous ship to navigate safely and avoid collisions with other ships, reliably detecting and classifying nearby ships under various maritime meteorological environments is essential. In this paper, a novel probabilistic ship detection and classification system based on deep learning is proposed. The proposed method aims to detect and classify nearby ships from a sequence of images. The method considers the confidence of a deep learning detector as a probability; the probabilities from consecutive images are combined over time by Bayesian fusion. The proposed ship detection system involves three steps. In the first step, ships are detected in each image using a Faster region-based convolutional neural network (Faster R-CNN). In the second step, the detected ships are gathered over time and the missed ships are recovered using the Intersection over Union of the bounding boxes between consecutive frames. In the third step, the probabilities from the Faster R-CNN are combined over time and the classes of the ships are determined by Bayesian fusion. To train and evaluate the proposed system, we collected thousands of ship images from Google image search and created our own ship dataset. The proposed method was tested on the collected videos, and the mean average precision increased from 89.38% to 93.92% in the experimental results.

1. Introduction

Accurate detection and reliable classification of nearby moving ships are essential functions of an autonomous ship and are closely linked to safe navigation [1,2]. When a ship navigates, collisions with other ships can occur from various directions, such as from ships that overtake, approach head-on, or cross the autonomous ship. The International Regulations for Preventing Collisions at Sea (COLREGs) define several rules to prevent collisions [3]. In particular, overtaking (Rule 13), head-on (Rule 14), and crossing (Rule 15) situations are considered potential collision scenarios. Autonomous ships mainly collect information about moving obstacles through non-visual sensors such as radar [4] and automatic identification systems (AIS) [5]. However, it is difficult to recognize nearby ships reliably and to determine whether they are dangerous obstacles using information from non-visual sensors alone. Therefore, autonomous ships must also recognize dangerous obstacles using a visual camera. This problem is similar to the detection of cars, pedestrians, lanes, or traffic signs with a camera in autonomous vehicles.
Hitherto, some research concerning ship detection and classification has been reported. For example, seashore ship surveillance and ship detection from spaceborne optical images have been achieved [6,7,8,9]. Synthetic aperture radar (SAR) imagery was used to detect ships and objects on the surface of the earth [7,8]. Hwang et al. used artificial neural networks (ANN) for ship detection with X-band Kompsat-5 SAR imagery [9]. Unfortunately, most of the existing research focused only on ship detection based on spaceborne images, such as SAR imagery. Furthermore, these studies focused on visual ship detection based only on a single image; all previous works on ship detection and classification were based on still images. To the best of our knowledge, no studies exist for the detection of ships using an image sequence or a video.
In this study, we propose a novel probabilistic ship detection and classification method using deep learning. This method considers the confidence from a deep learning detector as a probability and the probabilities from consecutive images are combined over time via Bayesian fusion. To the best of our knowledge, no research work has used the confidence from a deep learning detector in a Bayesian framework. The proposed ship detection system involves three steps. In the first step, ships are detected for each frame using Faster R-CNN [10]. In the second step, the detected ships are gathered over time and the missed ships are recovered using the Intersection over Union (IoU) of the bounding boxes between consecutive frames. The corresponding detection confidence is updated and the recovery compensates for the misdetection confidence over a few frames. This approach ensures robust ship detection and minimizes misdetection. In the third step, the probabilities from the Faster R-CNN are combined over time and the classes of the ships are determined by Bayesian fusion. The use of Bayesian fusion was supported by its reported use in prior studies [11,12].
To use a deep learning framework in ship detection, a ship dataset was needed to train the Faster R-CNN. Well-known image datasets, such as ImageNet [13], the PASCAL visual object classes (VOC) challenge [14], and Microsoft common objects in context (MS COCO) [15], include ship images, but the number of ship images is limited and the various classes of ships are not labeled. Popular intelligent transportation system (ITS) datasets, such as the Karlsruhe Institute of Technology and Toyota Technological Institute (KITTI) dataset [16], also do not include ship images. Because no public dataset exists for ship detection in the sea environment, we manually collected thousands of ship images from Google image search and built our own ship dataset.
The contributions of this paper can be summarized as follows. (1) This was the first attempt to detect and classify various classes of ships in a deep learning framework. (2) The confidence from a deep learning detector was considered as a probability, and the values from consecutive images were combined over time via Bayesian fusion. (3) Missed ships were recovered using the IoU of the bounding boxes between consecutive frames. (4) A large-scale ship detection dataset was built by collecting ship images from Google image search and annotating ground-truth bounding boxes.
The remainder of this paper is organized as follows: in Section 2, the background for the Faster R-CNN and the basic idea underlying this paper are outlined. In Section 3 and Section 4, the details about the proposed method are explained. In Section 5, the experimental results, performance, and discussion are presented. Finally, the conclusions drawn from this study are presented in Section 6.

2. Ship Detection and Classification from an Image

In this study, ships were detected in each frame using Faster R-CNN [10], as in our previous work [17]. Faster R-CNN is a representative region-based object detection model based on deep learning. As shown by Huang et al. [18], Faster R-CNN outperforms the other models [19,20] on general object detection problems. Whereas R-CNN [21] and Fast R-CNN [22] use Selective Search [23] to generate possible object locations, Faster R-CNN introduces the region proposal network (RPN), which outputs region proposals from shared full-image convolutional features, thereby improving speed. Faster R-CNN combines the RPN and Fast R-CNN into a single network for object detection by sharing their convolutional features, as shown in Figure 1.
When an image is used as input, the convolutional neural network (CNN) generates the convolutional features. Then, the fully convolutional RPN predicts bounding boxes and objectness scores at each position of the convolutional features, as shown in Figure 1b. Thus, the RPN tells the Fast R-CNN where to look and what to classify. In our experiments, we used the Zeiler and Fergus model (ZF net) [24], which has five shareable convolutional layers.
The Faster R-CNN is trained with a four-step training algorithm to learn shared features via alternating optimization. In the first step, the RPN is initialized with a pre-trained model and then fine-tuned end-to-end to propose regions. In the second step, the Fast R-CNN is trained using the region proposals generated by the first-step RPN; at this stage, the two networks do not yet share convolutional layers. In the third step, the shared convolutional layers are fixed and the layers unique to the RPN are fine-tuned. Finally, the layers unique to Fast R-CNN are fine-tuned while keeping the shared convolutional layers fixed. The detailed alternating algorithm for training the Faster R-CNN can be found in Ren et al. [10].
The Faster R-CNN detection result for a single image can be expressed as a bounding box represented by:
$$B = (v_x, v_y, v_w, v_h),$$ (1)
where $B$ denotes the four values of the bounding box: coordinates $(v_x, v_y)$, width $v_w$, and height $v_h$, as shown in Figure 1b. The class confidence of the bounding box predicted by the Faster R-CNN can be represented by:
$$p(\omega = k \mid B) \equiv p_k, \quad \text{where} \;\; \sum_{k=1}^{C} p_k = 1,$$ (2)
where $\omega$ denotes the class of the ship, $k \in \{1, 2, 3, \dots, C\}$ is one of the possible values that $\omega$ can take, $B$ is the bounding box predicted by the Faster R-CNN, and $C$ is the number of classes in the ship dataset created in this study.
We used seven different classes of ships in this study; thus, $C$ was set to eight, including the background. The class indices are summarized in Table 1. The class of a detected bounding box is predicted by:
$$\mathrm{class}(B) = \arg\max_{k=1,\dots,C} p_k.$$ (3)
As shown in Equation (3), the determined class of the predicted bounding box is the class with the highest confidence. Our method considers the class and detection confidence from the Faster R-CNN as a probability and exploits it using Bayesian fusion.
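As a concrete illustration of Equations (2) and (3), the following minimal Python sketch turns a vector of per-class confidences into a class label by taking the argmax. The class ordering follows Table 1; the detector interface and the example confidences are assumptions for illustration, not the actual Faster R-CNN API.

```python
import numpy as np

# Class ordering assumed to follow Table 1 (k = 8 is the background class).
CLASSES = ["aircraft carrier", "destroyer", "submarine", "bulk carrier",
           "container ship", "cruise ship", "tugboat", "background"]

def predict_class(p):
    """Class of a single bounding box, Equation (3).

    p -- length-C vector of per-class confidences p_k from the detector,
         assumed to sum to one (Equation (2)).
    """
    p = np.asarray(p, dtype=float)
    k = int(np.argmax(p))              # index of the highest confidence
    return CLASSES[k], float(p[k])

# Example: a box whose highest confidence belongs to "bulk carrier".
label, conf = predict_class([0.05, 0.02, 0.01, 0.80, 0.05, 0.03, 0.02, 0.02])
print(label, conf)                     # bulk carrier 0.8
```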

3. Building a Sequence of Bounding Boxes

In this section, we build a sequence of bounding boxes from the boxes returned by the Faster R-CNN over time. In building the bounding box sequence, two practical issues had to be considered. The first issue was which bounding box to select at each time step to create a reasonable sequence. The second issue was how to handle the situation in which none of the bounding boxes at time $t$ made sense and the target ship had apparently not been detected. To address these issues, we used the Intersection over Union (IoU) of the target bounding box and the predicted bounding boxes. Figure 2 illustrates two bounding boxes with IoU values of 0.3, 0.5, and 0.9. For two given bounding boxes $B_1$ and $B_2$, the IoU is the area of their intersection divided by the area of their union:
$$\mathrm{IoU}(B_1, B_2) = \frac{\mathrm{area}(B_1 \cap B_2)}{\mathrm{area}(B_1 \cup B_2)}.$$ (4)
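A minimal sketch of Equation (4) in Python is given below. It assumes a box is given as $(v_x, v_y, v_w, v_h)$ following Equation (1), with $(v_x, v_y)$ taken as the top-left corner; that coordinate convention is an assumption, as the text does not specify it.

```python
def iou(b1, b2):
    """Intersection over Union of two boxes, Equation (4).

    Boxes are given as (v_x, v_y, v_w, v_h) following Equation (1);
    (v_x, v_y) is assumed to be the top-left corner.
    """
    x1, y1, w1, h1 = b1
    x2, y2, w2, h2 = b2
    # Intersection rectangle (empty if the boxes do not overlap).
    ix1, iy1 = max(x1, x2), max(y1, y2)
    ix2, iy2 = min(x1 + w1, x2 + w2), min(y1 + h1, y2 + h2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = w1 * h1 + w2 * h2 - inter
    return inter / union if union > 0 else 0.0
```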
Concerning the first issue, we assumed that the target ships do not move rapidly at sea. Therefore, when the Faster R-CNN returns $R$ bounding boxes for a given image in the $t$th frame, the bounding box with the largest IoU with the bounding box $B^{t-1}$ of the previous frame is used as the bounding box $B^t$ of the current frame:
$$B^t = \arg\max_{B_r^t \in \{B_1^t, \dots, B_R^t\}} \mathrm{IoU}(B_r^t, B^{t-1}),$$ (5)
where $B_r^t$ denotes the $r$th predicted bounding box returned by the Faster R-CNN in the $t$th frame.
Concerning the second issue, when the detector in the current frame does not predict the position of the ship correctly and $\max_{B_r^t \in \{B_1^t, \dots, B_R^t\}} \mathrm{IoU}(B_r^t, B^{t-1}) < \varepsilon_{thd}$, where $\varepsilon_{thd}$ denotes a threshold, the target ship is likely to have been missed. In this case, we enlarge $B^t$ slightly from $B^{t-1}$ by adding an offset to avoid missing the ship. This can be represented by:
$$B^t = B^{t-1} + \delta B = (v_x^{t-1}, v_y^{t-1}, v_w^{t-1}, v_h^{t-1}) + (\delta v_x, \delta v_y, \delta v_w, \delta v_h) = (v_x^{t-1} + \delta v_x,\; v_y^{t-1} + \delta v_y,\; v_w^{t-1} + \delta v_w,\; v_h^{t-1} + \delta v_h),$$ (6)
where $\delta B = (\delta v_x, \delta v_y, \delta v_w, \delta v_h)$ denotes an offset added to the bounding box. Figure 3 shows the update of the target ship bounding box based on IoU.
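The selection and recovery rules of Equations (5) and (6) can be sketched as a single tracking step, reusing the `iou` helper above. The per-frame `detections` interface (a list of box/confidence pairs) is an assumed stand-in for the Faster R-CNN output; the default threshold 0.5 and offset (−1, −1, 1, 1) follow the values reported later in Section 4.

```python
def track_step(prev_box, detections, eps_thd=0.5, delta=(-1, -1, 1, 1)):
    """One IoU-tracking step, Equations (5) and (6).

    prev_box   -- the target box B^{t-1} from the previous frame.
    detections -- list of (box, confidences) pairs for the current frame;
                  this interface is an assumed stand-in for the detector output.
    Returns (box, confidences, missed); confidences is None when the
    target has to be recovered by enlarging the previous box.
    """
    best, best_iou = None, 0.0
    for box, conf in detections:
        overlap = iou(box, prev_box)
        if overlap > best_iou:
            best, best_iou = (box, conf), overlap
    if best is not None and best_iou >= eps_thd:
        return best[0], best[1], False                 # Equation (5)
    # Target missed: enlarge the previous box slightly, Equation (6).
    recovered = tuple(v + d for v, d in zip(prev_box, delta))
    return recovered, None, True
```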
If $R$ bounding boxes are predicted in the first frame, the initial bounding box is selected as the one with the highest detection confidence by using:
$$B^1 = \arg\max_{B_r^1 \in \{B_1^1, \dots, B_R^1\}} \; \max_{k=1,\dots,C-1} p(\omega = k \mid B_r^1).$$ (7)
For example, suppose the Faster R-CNN predicts four bounding boxes with class confidences in the first frame, as shown in Figure 4. Since the determined class of a predicted bounding box is the class with the highest confidence, the classes of the detected boxes ① to ④ are an aircraft carrier (0.34), a bulk carrier (0.895), a bulk carrier (0.668), and a destroyer (0.422), respectively. In this case, from Equation (7), we select bounding box ② as the initial bounding box in the video.
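Equation (7) can likewise be sketched as follows; the background class is assumed to be the last entry of each confidence vector, as in Table 1, and the `detections` interface is the same assumed stand-in as above.

```python
def initial_box(detections, num_classes=8):
    """Initial target box in the first frame, Equation (7).

    Picks the detection whose largest foreground confidence
    (classes 1..C-1, i.e., excluding the background) is highest.
    """
    def best_foreground(det):
        _, conf = det
        return max(conf[:num_classes - 1])   # ignore the background entry
    box, conf = max(detections, key=best_foreground)
    return box, conf
```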

4. Probabilistic Ship Detection and Classification in a Sequence of Images

In this section, $B^{1:T} = \{B^1, B^2, \dots, B^t, \dots, B^T\}$ is a sequence of the bounding boxes predicted by the Faster R-CNN, where $B^t$ is the bounding box detected at time $t$. We determine the class of the bounding box sequence $B^{1:T}$ using maximum a posteriori (MAP) estimation. That is, the class of the sequence of bounding boxes is predicted by:
$$\mathrm{class}(B^{1:T}) = \arg\max_{k=1,\dots,C} p(\omega = k \mid B^{1:T}).$$ (8)
Assuming that $t$ denotes the current time, we can rewrite Equation (8) as:
$$p(\omega = k \mid B^{1:t}) = p(\omega = k \mid B^t, B^{1:t-1}),$$ (9)
where $B^{1:t}$ is divided into the current measurement $B^t$ and all the previous measurements $B^{1:t-1}$. Using the Bayes rule, Equation (9) can be rewritten as:
$$p(\omega = k \mid B^{1:t}) = p(\omega = k \mid B^t, B^{1:t-1}) = \frac{p(B^t \mid \omega = k, B^{1:t-1})\, p(\omega = k \mid B^{1:t-1})}{p(B^t \mid B^{1:t-1})}.$$ (10)
Since the current measurement $B^t$ is not affected by the previous measurements $B^{1:t-1}$ conditioned on $\omega = k$, we obtain $p(B^t \mid \omega = k, B^{1:t-1}) = p(B^t \mid \omega = k)$, and Equation (10) can be simplified as:
$$p(\omega = k \mid B^{1:t}) = \frac{p(B^t \mid \omega = k)\, p(\omega = k \mid B^{1:t-1})}{p(B^t \mid B^{1:t-1})}.$$ (11)
Substituting the Bayes rule
$$p(B^t \mid \omega = k) = \frac{p(\omega = k \mid B^t)\, p(B^t)}{p(\omega = k)}$$ (12)
into Equation (11) yields:
$$p(\omega = k \mid B^{1:t}) = \frac{p(\omega = k \mid B^t)\, p(B^t)\, p(\omega = k \mid B^{1:t-1})}{p(\omega = k)\, p(B^t \mid B^{1:t-1})}.$$ (13)
Furthermore, we define the class confidence of a sequence of bounding boxes from the Faster R-CNN as:
$$p(\omega = k \mid B^{1:t}) \equiv f_k^t, \quad \text{where} \;\; \sum_{k=1}^{C} f_k^t = 1,$$ (14)
and consider a new quantity:
$$\frac{1 - f_k^t}{f_k^t} = \frac{\sum_{k'=1}^{C} f_{k'}^t - f_k^t}{f_k^t}.$$ (15)
Substituting Equation (13) into Equation (15) yields:
$$\frac{1 - f_k^t}{f_k^t} = \frac{\displaystyle\sum_{k'=1}^{C} \frac{p(\omega = k' \mid B^t)\, p(B^t)\, p(\omega = k' \mid B^{1:t-1})}{p(\omega = k')\, p(B^t \mid B^{1:t-1})} - \frac{p(\omega = k \mid B^t)\, p(B^t)\, p(\omega = k \mid B^{1:t-1})}{p(\omega = k)\, p(B^t \mid B^{1:t-1})}}{\displaystyle\frac{p(\omega = k \mid B^t)\, p(B^t)\, p(\omega = k \mid B^{1:t-1})}{p(\omega = k)\, p(B^t \mid B^{1:t-1})}} = \frac{\displaystyle\sum_{k'=1}^{C} \frac{p(\omega = k' \mid B^t)\, p(\omega = k' \mid B^{1:t-1})}{p(\omega = k')}}{\displaystyle\frac{p(\omega = k \mid B^t)\, p(\omega = k \mid B^{1:t-1})}{p(\omega = k)}} - 1.$$ (16)
Herein, we denote the detection confidence for the bounding box selected in Equation (5) as:
$$p(\omega = k \mid B^t) \equiv p_k^t, \quad \text{where} \;\; \sum_{k=1}^{C} p_k^t = 1.$$ (17)
As a practical consideration, if the detector missed the target ship, we considered the recovered bounding box in Equation (6) as background; its detection confidence is then assigned by:
$$p_k^t = \begin{cases} 0.9 & \text{for } k = C \; (\text{background}) \\ 0.1/(C-1) & \text{otherwise.} \end{cases}$$ (18)
Then, substituting Equation (17) into Equation (16) yields:
$$\frac{1 - f_k^t}{f_k^t} = \frac{\sum_{k'=1}^{C} p_{k'}^t f_{k'}^{t-1} / f_{k'}^0}{p_k^t f_k^{t-1} / f_k^0} - 1,$$ (19)
where $f_k^{t-1}$ is the previous confidence of the sequence $B^{1:t-1}$ at time $t-1$, $f_k^0$ is the initial confidence, and $p_k^t$ is the confidence for $B^t$ in the $t$th frame.
If we define
$$\rho_k^t \equiv \frac{\sum_{k'=1}^{C} p_{k'}^t f_{k'}^{t-1} / f_{k'}^0}{p_k^t f_k^{t-1} / f_k^0},$$ (20)
then we obtain the following from Equation (19):
$$f_k^t = \frac{1}{\rho_k^t}, \quad \text{where} \;\; k = 1, 2, \dots, C.$$ (21)
From Equations (19) and (20), we can update the sequence confidence $f_k^t$ at time $t$ from the previous sequence confidence $f_k^{t-1}$ at time $t-1$ and the current frame confidence $p_k^t$ from the Faster R-CNN at time $t$. Thus, we do not need to retain all the previous frame confidences to compute the current sequence confidence. We can then predict the class of a sequence $B^{1:t}$ from Equation (8). In this study, we set the IoU threshold $\varepsilon_{thd}$ to 0.5 and $\delta B$ to $(-1, -1, 1, 1)$. Summarizing the above results, the proposed probabilistic ship detection algorithm using video is outlined in Algorithm 1 and illustrated in Figure 5.
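Before turning to Algorithm 1, the recursive update of Equations (18)-(21) can be sketched in a few lines of Python (a minimal illustration under the class ordering of Table 1, with the background last; not the authors' implementation):

```python
import numpy as np

def background_confidence(num_classes=8):
    """Confidence assigned to a recovered (missed) box, Equation (18)."""
    p = np.full(num_classes, 0.1 / (num_classes - 1))
    p[-1] = 0.9                        # k = C is the background class
    return p

def fuse(p_t, f_prev, f0):
    """Recursive Bayesian fusion, Equations (19)-(21).

    p_t    -- current-frame confidences p_k^t from the detector.
    f_prev -- previous sequence confidences f_k^{t-1}.
    f0     -- initial confidences f_k^0 (uniform 1/C in Algorithm 1).
    Returns the updated sequence confidences f_k^t.
    """
    p_t, f_prev, f0 = map(np.asarray, (p_t, f_prev, f0))
    w = p_t * f_prev / f0              # per-class terms of Equation (19)
    rho = w.sum() / w                  # Equation (20)
    return 1.0 / rho                   # Equation (21), i.e., w / w.sum()
```

Note that $f_k^t = 1/\rho_k^t$ is simply the normalized product $p_k^t f_k^{t-1}/f_k^0$, so the update always returns confidences that sum to one.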
Algorithm 1: Probabilistic ship detection and classification from video.
Step 1: At frame 1, initialize the target ship bounding box:
   $B^1 = \arg\max_{B_r^1 \in \{B_1^1, \dots, B_R^1\}} \max_{k=1,\dots,C-1} p(\omega = k \mid B_r^1)$
   $f_k^1 = p_k^1 = p(\omega = k \mid B^1)$
   $f_k^0 = 1/C$ for all $k$
Step 2: For a given image at frame $t > 1$, update the bounding box:
   If $\max_{B_r^t \in \{B_1^t, \dots, B_R^t\}} \mathrm{IoU}(B_r^t, B^{t-1}) \geq \varepsilon_{thd}$,
      $B^t = \arg\max_{B_r^t \in \{B_1^t, \dots, B_R^t\}} \mathrm{IoU}(B_r^t, B^{t-1})$
      $p_k^t = p(\omega = k \mid B^t)$
   else
      $B^t = B^{t-1} + \delta B$
      $p_k^t = 0.9$ for $k = C$ (background) and $p_k^t = 0.1/(C-1)$ otherwise
   End
Step 3: Evaluate the class confidence of the sequence of bounding boxes recursively by
   $\rho_k^t = \dfrac{\sum_{k'=1}^{C} p_{k'}^t f_{k'}^{t-1} / f_{k'}^0}{p_k^t f_k^{t-1} / f_k^0}$,  $f_k^t = 1/\rho_k^t$
Step 4: Determine the class at frame $t$ using
   $\mathrm{class}(B^{1:t}) = \arg\max_{k=1,\dots,C} f_k^t$
Step 5: For every subsequent frame, repeat Steps 2, 3, and 4.
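For completeness, the sketch below shows how Algorithm 1 might be wired together from the helper functions sketched in Sections 2-4; `detect(frame)` is a hypothetical per-frame Faster R-CNN call that is assumed to return a list of (box, confidences) pairs, and the whole loop is an illustration under those assumptions rather than the authors' implementation.

```python
import numpy as np

def classify_video(frames, detect, num_classes=8):
    """Run Algorithm 1 over a sequence of frames.

    detect(frame) is a hypothetical detector call returning a list of
    (box, confidences) pairs; initial_box, track_step,
    background_confidence, fuse and CLASSES are the earlier sketches.
    """
    f0 = np.full(num_classes, 1.0 / num_classes)        # Step 1: f_k^0 = 1/C
    box, p = initial_box(detect(frames[0]), num_classes)
    f = np.asarray(p, dtype=float)                      # f_k^1 = p_k^1
    for frame in frames[1:]:
        box, p, missed = track_step(box, detect(frame)) # Step 2
        if missed:
            p = background_confidence(num_classes)      # Equation (18)
        f = fuse(p, f, f0)                              # Step 3
        yield CLASSES[int(np.argmax(f))], f             # Step 4
```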

5. Experimental Results

We built our own ship dataset to train the Faster R-CNN and evaluate the proposed method. For this dataset, 7000 ship images were collected from Google image search and manually labeled as one of seven classes: aircraft carrier, destroyer, submarine, container ship, bulk carrier, cruise ship, and tugboat.

5.1. Ship Dataset

The dataset mainly focused on large ships, which were divided into two types of ocean-going vessels: warships and merchant ships. Three warship classes exist: aircraft carrier, destroyer, and submarine. Three merchant ship classes also exist: container ship, bulk carrier, and cruise ship. Finally, we included one small ship in the dataset, the tugboat, which assists large ships in entering and leaving a port, resulting in seven classes.
To train the Faster R-CNN for ship detection and to evaluate single image detection, a total of 7000 still images were collected. Each class included 1000 images, making the problem completely balanced. All the still images were manually gathered from Google image search. The collected still images were independent of one another, and none were consecutive frames. The ship image dataset was divided into a training dataset and a test dataset, as summarized in Table 2. In detail, 5250 images (75%, 750 images per class) among the 7000 images were used for training the Faster R-CNN, and 1750 images (25%, 250 images per class) were used to test single image detection.
To evaluate ship detection performance on videos, seven video clips involving all the aforementioned classes were downloaded from YouTube in MPEG-4 video format. A test video file was decomposed into a sequence of still images that were consecutive in time and each image was processed by Faster R-CNN. The detection result of each image was combined with that of the consecutive images and the combined result was used in video simulation.

5.2. Performance

5.2.1. Results of the Single Image Detection

The same hyperparameters used in the original Faster R-CNN [10] were applied to train our Faster R-CNN for ship detection. The hyperparameters used in our experiments were as follows: a learning rate of 0.001, a momentum of 0.9, and a weight decay of 0.0005. The ZF net pre-trained on ImageNet was used as the base CNN to extract features, and the network was fine-tuned using our ship dataset. The maximum number of iterations was set to 10,000. We used the Caffe [25] framework to train the Faster R-CNN on Ubuntu 16.04 LTS with an NVIDIA GeForce GTX 980 GPU. Table 3 shows the results of ship detection using the Faster R-CNN fine-tuned on the above training set. The results of ship detection using the Faster R-CNN on test sample images are shown in Figure 6.

5.2.2. Results of Detection Based on Video

Seven sequences of images were used to demonstrate the performance of the proposed method on video. Among them, two videos were considered in detail. In the first sequence, involving a tugboat, the weather was relatively fine and the ships were not influenced by environmental factors. In the second sequence, involving a destroyer, the weather was windy and environmental factors, such as waves and wind, influenced the ships. The Faster R-CNN returned eight confidences, one for each class, and the eight confidences summed to one in each frame, as in Equation (17). In Figure 7, the changes in the eight confidences are plotted against the frames of the first sequence. The subfigures in the first, second, and third rows correspond to Faster R-CNN; Faster R-CNN and IoU tracking; and Faster R-CNN, IoU tracking, and Bayesian fusion, respectively.
The confidence in Figure 7a corresponds to $p_k^t = p(\omega = k \mid B^t)$ and the confidence in Figure 7c corresponds to $f_k^t = p(\omega = k \mid B^{1:t})$. In the first sequence, the target ship is a tugboat. In Figure 7a, the tugboat is classified as a bulk carrier four times by the Faster R-CNN. The confidence for the tugboat also does not remain steady but changes irregularly. In Figure 7b, IoU tracking is used together with the Faster R-CNN. When the target was not detected or the IoU was lower than the threshold, the bounding box was considered background and the corresponding confidences were assigned by Equation (18). In this figure, however, no background confidence was observed, since all the targets in each frame were detected by the Faster R-CNN and the IoU values from IoU tracking were higher than the threshold. Figure 7c shows the experimental result when Faster R-CNN, IoU tracking, and Bayesian fusion were used together. The confidence for the tugboat was steady and approached one after a few frames, while the confidences for the other classes vanished and approached zero.
The experimental results for the second sequence are shown in Figure 8. Unlike the first sequence, the ships were affected by environmental factors. The target ship in the second sequence was a destroyer. Similar to Figure 7, the change in the eight confidences is plotted against frames for the second sequence in Figure 8. The subfigures in the first, second and third rows in Figure 8 correspond to Faster R-CNN; Faster R-CNN and IoU tracking; and Faster R-CNN, IoU tracking and Bayesian fusion, respectively.
First, let us consider Intervals 1 and 4 in Figure 8. In these two intervals, the destroyer was falsely classified as an aircraft carrier, as shown in Figure 8a,b. In particular, the target ship was classified not as a destroyer but as an aircraft carrier in six frames in a row, from frame 235 to 240. However, the proposed method overcame the false classification problem and the confidence for the destroyer progressed steadily to one, as shown in Figure 8c. Second, consider Interval 2. In this interval, the destroyer was not detected and was classified as background by the IoU tracking several times. However, the proposed method again overcame the misdetections and the confidence for the destroyer progressed steadily to one, as shown in Figure 8c. Third, consider Interval 3, which was slightly different from Intervals 1, 2, and 4. Unlike the previous intervals, several misdetections and tens of false classifications occurred together in Interval 3. The proposed method worked well even in this challenging situation for the first 30 frames, but the frequency of misdetection and false classification then exceeded a certain level; as a result, our algorithm failed to classify the ship effectively and returned the wrong result.
Three competing methods are compared on a per-frame basis in Table 4. As stated, the ground truth was a destroyer. In Intervals 1 and 4, the Faster R-CNN alone returned a false classification of aircraft carrier in several frames, such as #19, #20 and #21, and its confidence remained around 0.5. When Faster R-CNN was combined with IoU tracking and Bayesian fusion, however, the confidence steadily increased and approached one. In Interval 2, the Faster R-CNN alone often missed the destroyer, but when it was combined with IoU tracking and Bayesian fusion, the misdetections were overcome and the confidence also approached one. Here, when the target was not detected or was falsely classified for several consecutive frames, for example, during frames #124 to #127 in Interval 2 or frames #235 to #240 in Interval 4, the confidence for the destroyer dropped to around 0.9. However, the confidence quickly recovered from the loss once the Faster R-CNN returned the correct result. Figure 9, Figure 10 and Figure 11 show captured images from frames #19 to #21 in Interval 1, frames #125 to #127 in Interval 2, and frames #238 to #240 in Interval 4, respectively.
The performance of ship detection with the proposed method on test videos was compared with using only Faster R-CNN for ship detection. The ship detection results are summarized in Table 5. Overall, our proposed method outperformed the previous Faster R-CNN detector.

6. Conclusions

In this study, a probabilistic ship detection and classification system for video using deep learning was proposed. To train the detector and evaluate the proposed system, we collected thousands of ship images from a Google image search and built our own ship dataset. The probabilistic ship detection and classification system demonstrated better detection and classification performance compared to when only Faster R-CNN was used. The proposed method used IoU tracking to build a sequence of the bounding boxes and considered the confidence from the detector as a probability. The undetected ships were recovered by IoU tracking. Moreover, the probabilities of the detection accumulated over time and the classes of the ships were determined by Bayesian fusion. In the experiments, the proposed method was tested with two sequences of images and showed considerable improvement in both detection and classification over prior methods.

Author Contributions

K.K., S.H. and E.K. designed the algorithm, carried out the experiments, analyzed the results, and wrote the paper; B.C. analyzed the data and gave helpful suggestions on this research.

Acknowledgments

This work was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology under Grant NRF-2016R1A2A2A05005301.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Statheros, T.; Howells, G.; Maier, K.M. Autonomous ship collision avoidance navigation concepts, technologies and techniques. J. Navig. 2008, 61, 129–142.
2. Sun, X.; Wang, G.; Fan, Y.; Mu, D.; Qiu, B. An Automatic Navigation System for Unmanned Surface Vehicles in Realistic Sea Environments. Appl. Sci. 2018, 8, 193.
3. International Maritime Organization (IMO). Available online: http://www.imo.org/en/About/Conventions/-ListOfConventions/Pages/COLREG.aspx (accessed on 3 April 2018).
4. Liu, Z.; Zhang, Y.; Yu, X.; Yuan, C. Unmanned surface vehicles: An overview of developments and challenges. Annu. Rev. Control 2016, 41, 71–93.
5. Stitt, I.P.A. AIS and collision avoidance—A sense of déjà vu. J. Navig. 2004, 57, 167–180.
6. Tang, J.; Deng, C.; Huang, G.B.; Zhao, B. Compressed-domain ship detection on spaceborne optical image using deep neural network and extreme learning machine. IEEE Trans. Geosci. Remote Sens. 2015, 53, 1174–1185.
7. Crisp, D.J. The State-of-the-Art in Ship Detection in Synthetic Aperture Radar Imagery; No. DSTO-RR-0272; Defence Science and Technology Organisation Salisbury (Australia) Info Sciences Lab: Canberra, Australia, 2004.
8. Migliaccio, M.; Nunziata, F.; Montuori, A.; Brown, C.E. Marine added-value products using RADARSAT-2 fine quad-polarization. Can. J. Remote Sens. 2012, 37, 443–451.
9. Hwang, J.I.; Chae, S.H.; Kim, D.; Jung, H.S. Application of Artificial Neural Networks to Ship Detection from X-Band Kompsat-5 Imagery. Appl. Sci. 2017, 7, 961.
10. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems (NIPS), Proceedings of the NIPS 2015, Montréal, Canada, 7–12 December 2015; NIPS Foundation, Inc.: La Jolla, CA, USA, 2015; pp. 91–99.
11. Park, S.; Hwang, J.P.; Kim, E.; Lee, H.; Jung, H.G. A neural network approach to target classification for active safety system using microwave radar. Expert Syst. Appl. 2010, 37, 2340–2346.
12. Hong, S.; Lee, H.; Kim, E. Probabilistic gait modelling and recognition. IET Comput. Vis. 2013, 7, 56–70.
13. Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Berg, A.C.; et al. ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. 2015, 115, 211–252.
14. Everingham, M.; Van Gool, L.; Williams, C.K.; Winn, J.; Zisserman, A. The PASCAL visual object classes (VOC) challenge. Int. J. Comput. Vis. 2010, 88, 303–338.
15. Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common Objects in Context. In Proceedings of the European Conference on Computer Vision (ECCV) 2014, Zurich, Switzerland, 6–12 September 2014; Springer: Cham, Switzerland, 2014; pp. 740–755.
16. Geiger, A.; Lenz, P.; Urtasun, R. Are We Ready for Autonomous Driving? The KITTI Vision Benchmark Suite. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Providence, RI, USA, 16–21 June 2012; pp. 3354–3361.
17. Kim, K.H.; Hong, S.J.; Choi, B.H.; Kim, I.H.; Kim, E.T. Ship Detection Using Faster R-CNN in Maritime Scenarios. In Proceedings of the Conference on Information and Control Systems (CICS) 2017, Dubal, United Arab Emirates, 29–30 April 2017; pp. 158–159. (In Korean)
18. Huang, J.; Rathod, V.; Sun, C.; Zhu, M.; Korattikara, A.; Fathi, A.; Fischer, I.; Wojna, Z.; Song, Y.; Murphy, K.; et al. Speed/Accuracy Trade-Offs for Modern Convolutional Object Detectors. In Proceedings of the IEEE 2017 Conference on Computer Vision and Pattern Recognition (CVPR) 2017, Honolulu, HI, USA, 21–26 July 2017; pp. 7310–7311.
19. Dai, J.; Li, Y.; He, K.; Sun, J. R-FCN: Object detection via region-based fully convolutional networks. In Advances in Neural Information Processing Systems (NIPS), Proceedings of the NIPS 2016, Barcelona, Spain, 5–10 December 2016; NIPS Foundation, Inc.: La Jolla, CA, USA, 2016; pp. 379–387.
20. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. In Proceedings of the European Conference on Computer Vision (ECCV) 2016, Amsterdam, The Netherlands, 8–16 October 2016; Springer: Cham, Switzerland, 2016; pp. 21–37.
21. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. In Proceedings of the IEEE 2014 Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA, 23–28 June 2014; pp. 580–587.
22. Girshick, R. Fast R-CNN. In Proceedings of the IEEE 2015 Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 1440–1448.
23. Uijlings, J.R.; Van De Sande, K.E.; Gevers, T.; Smeulders, A.W. Selective search for object recognition. Int. J. Comput. Vis. 2013, 104, 154–171.
24. Zeiler, M.D.; Fergus, R. Visualizing and understanding convolutional networks. In Proceedings of the European Conference on Computer Vision (ECCV), Zurich, Switzerland, 6–12 September 2014; Springer: Cham, Switzerland; pp. 818–833.
25. Jia, Y.; Shelhamer, E.; Donahue, J.; Karayev, S.; Long, J.; Girshick, R.; Guadarrama, S.; Darrell, T. Caffe: Convolutional architecture for fast feature embedding. In Proceedings of the 22nd ACM International Conference on Multimedia, Orlando, FL, USA, 3–7 November 2014; pp. 675–678.
Figure 1. Structures of (a) the Faster region-based convolutional neural network (Faster R-CNN) and (b) the region proposal network (RPN).
Figure 2. An illustration of two bounding boxes with an Intersection over Union (IoU) of (a) 0.3, (b) 0.5, and (c) 0.9.
Figure 3. IoU tracking. (a) IoU is equal to or larger than the threshold and (b) IoU is less than the threshold. If IoU is less than the threshold, the bounding box in frame $t-1$ is enlarged slightly and used as the target ship bounding box to avoid missing the ship.
Figure 4. Initial bounding box selection in the first frame.
Figure 5. Robust single ship detection system based on video in our proposed algorithm.
Figure 6. Results of the ship detection using the Faster R-CNN based on the test images.
Figure 7. The change of confidences in the image sequence without environmental factors. (a) Faster R-CNN, (b) Faster R-CNN and IoU tracking, and (c) Faster R-CNN, IoU tracking, and Bayesian fusion.
Figure 8. The change in the confidences in the image sequence with environmental factors: (a) Faster R-CNN, (b) Faster R-CNN and IoU tracking, and (c) Faster R-CNN, IoU tracking and Bayesian fusion.
Figure 9. Captured images from Interval 1.
Figure 10. Captured images from Interval 2.
Figure 11. Captured images from Interval 4.
Table 1. Classes in the ship dataset.
k | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
Label | Aircraft carrier | Destroyer | Submarine | Bulk carrier | Container ship | Cruise ship | Tugboat | Background
Table 2. The number of images in the ship dataset.
Class | Training Set | Test Set | Subtotal
Aircraft carrier | 750 | 250 | 1000
Destroyer | 750 | 250 | 1000
Submarine | 750 | 250 | 1000
Container ship | 750 | 250 | 1000
Bulk carrier | 750 | 250 | 1000
Cruise ship | 750 | 250 | 1000
Tugboat | 750 | 250 | 1000
Total | 5250 | 1750 | 7000
Table 3. Results of the single image detection.
Class | AP (%)
Aircraft carrier | 90.56
Destroyer | 87.98
Submarine | 90.22
Container ship | 99.60
Bulk carrier | 99.59
Cruise ship | 99.59
Tugboat | 95.01
mAP | 94.65
Table 4. The change of the class and the confidence of the destroyer.
Interval | Frame | Faster R-CNN: Class | Confidence | Faster R-CNN + IoU: Class | Confidence | Faster R-CNN + IoU + Bayes: Class | Confidence
1 | 19 | Aircraft Carrier | 0.420 | Aircraft Carrier | 0.420 | Destroyer | 0.999
 | 20 | Aircraft Carrier | 0.403 | Aircraft Carrier | 0.403 | Destroyer | 0.999
 | 21 | Aircraft Carrier | 0.539 | Aircraft Carrier | 0.539 | Destroyer | 0.997
 | 22 | Destroyer | 0.403 | Destroyer | 0.403 | Destroyer | 0.998
 | 23 | Destroyer | 0.433 | Destroyer | 0.433 | Destroyer | 0.999
 | 24 | Aircraft Carrier | 0.548 | Aircraft Carrier | 0.548 | Destroyer | 0.997
 | 25 | Destroyer | 0.463 | Destroyer | 0.463 | Destroyer | 0.998
 | 26 | Destroyer | 0.609 | Destroyer | 0.609 | Destroyer | 0.999
 | 27 | Destroyer | 0.611 | Destroyer | 0.611 | Destroyer | 0.999
 | 28 | Destroyer | 0.420 | Destroyer | 0.420 | Destroyer | 0.999
 | 29 | Aircraft Carrier | 0.434 | Aircraft Carrier | 0.434 | Destroyer | 0.999
2 | 118 | Destroyer | 0.546 | Destroyer | 0.546 | Destroyer | 0.999
 | 119 | Misdetection | 0 | Background | 0.900 | Destroyer | 0.999
 | 120 | Destroyer | 0.508 | Destroyer | 0.508 | Destroyer | 0.999
 | 121 | Destroyer | 0.454 | Destroyer | 0.454 | Destroyer | 0.999
 | 122 | Destroyer | 0.672 | Destroyer | 0.672 | Destroyer | 0.999
 | 123 | Destroyer | 0.432 | Destroyer | 0.432 | Destroyer | 0.999
 | 124 | Misdetection | 0 | Background | 0.900 | Destroyer | 0.999
 | 125 | Misdetection | 0 | Background | 0.900 | Destroyer | 0.995
 | 126 | Misdetection | 0 | Background | 0.900 | Destroyer | 0.966
 | 127 | Misdetection | 0 | Background | 0.900 | Destroyer | 0.806
 | 128 | Destroyer | 0.486 | Destroyer | 0.486 | Destroyer | 0.847
4 | 234 | Destroyer | 0.612 | Destroyer | 0.612 | Destroyer | 0.999
 | 235 | Aircraft Carrier | 0.611 | Aircraft Carrier | 0.611 | Destroyer | 0.999
 | 236 | Aircraft Carrier | 0.648 | Aircraft Carrier | 0.648 | Destroyer | 0.999
 | 237 | Aircraft Carrier | 0.616 | Aircraft Carrier | 0.616 | Destroyer | 0.990
 | 238 | Aircraft Carrier | 0.521 | Aircraft Carrier | 0.521 | Destroyer | 0.983
 | 239 | Aircraft Carrier | 0.630 | Aircraft Carrier | 0.630 | Destroyer | 0.955
 | 240 | Aircraft Carrier | 0.642 | Aircraft Carrier | 0.642 | Destroyer | 0.758
 | 241 | Destroyer | 0.858 | Destroyer | 0.858 | Destroyer | 0.974
 | 242 | Destroyer | 0.870 | Destroyer | 0.870 | Destroyer | 0.997
 | 243 | Destroyer | 0.814 | Destroyer | 0.814 | Destroyer | 0.999
 | 244 | Aircraft Carrier | 0.575 | Aircraft Carrier | 0.575 | Destroyer | 0.998
 | 245 | Destroyer | 0.766 | Destroyer | 0.766 | Destroyer | 0.999
 | 246 | Destroyer | 0.801 | Destroyer | 0.801 | Destroyer | 0.999
 | 247 | Destroyer | 0.689 | Destroyer | 0.689 | Destroyer | 0.999
 | 248 | Destroyer | 0.639 | Destroyer | 0.639 | Destroyer | 0.999
 | 249 | Aircraft Carrier | 0.601 | Aircraft Carrier | 0.601 | Destroyer | 0.999
 | 250 | Destroyer | 0.670 | Destroyer | 0.670 | Destroyer | 0.999
 | 251 | Aircraft Carrier | 0.558 | Aircraft Carrier | 0.558 | Destroyer | 0.999
 | 252 | Aircraft Carrier | 0.632 | Aircraft Carrier | 0.632 | Destroyer | 0.998
 | 253 | Aircraft Carrier | 0.553 | Aircraft Carrier | 0.553 | Destroyer | 0.997
 | 254 | Aircraft Carrier | 0.651 | Aircraft Carrier | 0.651 | Destroyer | 0.991
Table 5. Performance of ship detection on test videos.
Class | AP (%), Faster R-CNN | AP (%), Faster R-CNN + IoU + Bayes
Aircraft carrier | 99.33 | 100.00
Destroyer | 68.67 | 77.61
Submarine | 98.00 | 100.00
Container ship | 76.69 | 88.19
Bulk carrier | 88.00 | 94.67
Cruise ship | 96.33 | 96.97
Tugboat | 98.67 | 100.00
mAP (%) | 89.38 | 93.92
