Probabilistic Ship Detection and Classification Using Deep Learning
Appl. Sci. 2018, 8, 936

Abstract: For an autonomous ship to navigate safely and avoid collisions with other ships, reliably detecting and classifying nearby ships under various maritime meteorological environments is essential. In this paper, a novel probabilistic ship detection and classification system based on deep learning is proposed. The proposed method aims to detect and classify nearby ships from a sequence of images. The method considers the confidence of a deep learning detector as a probability; the probabilities from consecutive images are combined over time by Bayesian fusion. The proposed ship detection system involves three steps. In the first step, ships are detected in each image using the Faster region-based convolutional neural network (Faster R-CNN). In the second step, the detected ships are gathered over time and the missed ships are recovered using the Intersection over Union of the bounding boxes between consecutive frames. In the third step, the probabilities from the Faster R-CNN are combined over time and the classes of the ships are determined by Bayesian fusion. To train and evaluate the proposed system, we collected thousands of ship images from Google image search and created our own ship dataset. The proposed method was tested with the collected videos, and the mean average precision increased from 89.38% to 93.92% in the experimental results.


Introduction
Accurate detection and reliable classification of nearby moving ships are essential functions of an autonomous ship, as they are closely linked to safe navigation [1,2]. When a ship navigates, collisions with other ships can occur from various directions, for example with ships that overtake, approach head-on, or cross the path of the autonomous ship. The International Regulations for Preventing Collisions at Sea (COLREGs) define several rules to prevent collisions [3]. In particular, overtaking (rule 13), head-on (rule 14), and crossing (rule 15) situations are considered potential collision scenarios. Autonomous ships mainly collect information related to moving obstacles through non-visual sensors such as radar [4] and automatic identification systems (AIS) [5]. However, reliably recognizing nearby ships and determining whether they are dangerous obstacles is difficult when only information from non-visual sensors is used. Therefore, autonomous ships must recognize dangerous obstacles using a visual camera. This problem is similar to the detection of cars, pedestrians, lanes, or traffic signs using a camera in autonomous vehicles.
Some research on ship detection and classification has been reported. For example, seashore ship surveillance and ship detection from spaceborne optical images have been achieved [6–9]. Synthetic aperture radar (SAR) imagery was used to detect ships and objects on the surface of the earth [7,8]. Hwang et al. used artificial neural networks (ANN) for ship detection with X-band Kompsat-5 SAR imagery [9]. Unfortunately, most of the existing research focused only on ship detection based on spaceborne images, such as SAR imagery. Furthermore, all of these studies detected and classified ships from a single still image. To the best of our knowledge, no studies exist on the detection of ships using an image sequence or a video.
In this study, we propose a novel probabilistic ship detection and classification method using deep learning. This method considers the confidence from a deep learning detector as a probability, and the probabilities from consecutive images are combined over time via Bayesian fusion. To the best of our knowledge, no research work has used the confidence from a deep learning detector in a Bayesian framework. The proposed ship detection system involves three steps. In the first step, ships are detected in each frame using Faster R-CNN [10]. In the second step, the detected ships are gathered over time and the missed ships are recovered using the Intersection over Union (IoU) of the bounding boxes between consecutive frames. The corresponding detection confidence is updated, and the recovery compensates for the misdetection confidence over a few frames. This approach makes ship detection more robust and reduces misdetection. In the third step, the probabilities from the Faster R-CNN are combined over time and the classes of the ships are determined by Bayesian fusion. Bayesian fusion was chosen because of its reported use in prior studies [11,12].
To use a deep learning framework in ship detection, a ship dataset was needed to train the Faster R-CNN. Well-known image datasets, such as ImageNet [13], the PASCAL visual object classes (VOC) challenge [14], and Microsoft common objects in context (MS COCO) [15], include ship images, but the number of ship images is limited and the various classes of ships are not labeled. Popular intelligent transportation system (ITS) datasets, such as the Karlsruhe Institute of Technology and Toyota Technological Institute (KITTI) dataset [16], do not include ship images either. Because no public dataset exists for ship detection in the sea environment, we manually collected thousands of ship images from Google image search and built our own ship dataset.
The contributions of this paper can be summarized as follows. (1) This was the first attempt to detect and classify various classes of ships in a deep learning framework. (2) The confidence from a deep learning detector was considered as a probability, and the confidence values from consecutive images were combined over time via Bayesian fusion. (3) Missed ships were recovered using the IoU of the bounding boxes between consecutive frames. (4) A large-scale ship detection dataset was built by collecting ship images from Google image search and annotating ground-truth bounding boxes.
The remainder of this paper is organized as follows. In Section 2, the background for the Faster R-CNN and the basic idea underlying this paper are outlined. In Sections 3 and 4, the details of the proposed method are explained. In Section 5, the experimental results, performance, and discussion are presented. Finally, the conclusions drawn from this study are presented in Section 6.

Ship Detection and Classification from an Image
In this study, ships were detected in each frame using Faster R-CNN [10], as in our previous work [17]. Faster R-CNN is a representative region-based object detection model based on deep learning. As shown by Huang et al. [18], Faster R-CNN outperforms the other models [19,20] in general object detection problems. Whereas R-CNN [21] and Fast R-CNN [22] use Selective Search [23] to generate possible object locations, Faster R-CNN introduces the region proposal network (RPN), which outputs region proposals from shared full-image convolutional features, thereby improving speed. Faster R-CNN combines the RPN and Fast R-CNN into a single network for object detection by sharing their convolutional features, as shown in Figure 1. When an image is given as input, the convolutional neural network (CNN) generates the convolutional features. Then, the fully convolutional RPN predicts the bounding boxes and object scores at each position of the convolutional features, as shown in Figure 1b. Thus, the RPN tells the Fast R-CNN where to look and classify. In our experiments, we used the Zeiler and Fergus model (ZF net) [24], which has five shareable convolutional layers.
The Faster R-CNN is trained with a four-step training algorithm to learn shared features via alternating optimization. In the first step, the RPN is initialized with a pre-trained model and then fine-tuned end-to-end to propose regions. In the second step, the Fast R-CNN is trained using the region proposals generated by the first-step RPN; at this point, the two networks do not yet share convolutional layers. In the third step, the shared convolutional layers are fixed and the layers unique to the RPN are fine-tuned. Finally, the layers unique to Fast R-CNN are fine-tuned while the shared convolutional layers are kept fixed. The detailed alternating algorithm for training the Faster R-CNN can be found in Ren et al. [10].
The Faster R-CNN detection result for a single image can be expressed as a bounding box represented by:

$$B = (v_x, v_y, v_w, v_h) \qquad (1)$$

where $B$ denotes the four values of the bounding box: coordinates $(v_x, v_y)$, width $v_w$, and height $v_h$, as shown in Figure 1b. The class confidence of the bounding box predicted by the Faster R-CNN can be represented by:

$$p(\omega = k \mid B), \quad \text{where } \sum_{k} p(\omega = k \mid B) = 1 \qquad (2)$$

where $\omega$ denotes the class of ship and $k$ indexes the candidate classes. The class of the predicted bounding box is then determined by:

$$\mathrm{class}(B) = \arg\max_{k} \, p(\omega = k \mid B) \qquad (3)$$

As shown in Equation (3), the determined class of the predicted bounding box is the class with the highest confidence. Our method considers the class and detection confidence from the Faster R-CNN as a probability and exploits it using Bayesian fusion.
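As a minimal sketch of the bounding box representation and the class decision in Equation (3), the following Python snippet stores a box and picks the class with the highest confidence. The names `Box` and `decide_class`, the reading of $(v_x, v_y)$ as the top-left corner, and the example confidence values are illustrative assumptions rather than details given in the paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Box:
    """Bounding box B = (v_x, v_y, v_w, v_h).

    Assumption: (v_x, v_y) is the top-left corner; the text only calls
    them "coordinates".
    """
    vx: float
    vy: float
    vw: float
    vh: float


def decide_class(confidences: Sequence[float]) -> int:
    """Return the 1-based class index k that maximizes p(w = k | B)."""
    if abs(sum(confidences) - 1.0) > 1e-6:
        raise ValueError("class confidences are expected to sum to 1")
    best = max(range(len(confidences)), key=lambda i: confidences[i])
    return best + 1


# Hypothetical confidences for one detected box (eight values summing to 1).
EXAMPLE_BOX = Box(vx=120.0, vy=80.0, vw=200.0, vh=60.0)
EXAMPLE_CONF = [0.05, 0.10, 0.02, 0.03, 0.70, 0.04, 0.01, 0.05]
EXAMPLE_CLASS = decide_class(EXAMPLE_CONF)
```

Here `EXAMPLE_CLASS` is the index of the largest entry of `EXAMPLE_CONF`, which mirrors the arg max in Equation (3).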

Building a Sequence of Bounding Boxes
In this section, we build a sequence of bounding boxes from the boxes returned by the Faster R-CNN over time. In building the bounding box sequence, two practical issues had to be considered. The first issue was which bounding box to select at each time to create a reasonable sequence. The second issue was how to handle the situation in which none of the bounding boxes at time t made sense, that is, when the target ship had apparently not been detected. To address these issues, we used the intersection over union (IoU) of the target bounding box and the predicted bounding boxes. Figure 2 illustrates pairs of bounding boxes with IoU values of 0.3, 0.4, and 0.9. For two given bounding boxes $B_1$ and $B_2$, the IoU is the area of their intersection divided by the area of their union:

$$\mathrm{IoU}(B_1, B_2) = \frac{\mathrm{area}(B_1 \cap B_2)}{\mathrm{area}(B_1 \cup B_2)} \qquad (4)$$

Concerning the first issue, we assumed that the target ships do not move rapidly at sea. Therefore, when the Faster R-CNN returns R bounding boxes from a given image in the tth frame, the bounding box with the largest IoU with the bounding box $B^{t-1}$ in the previous frame is used as the bounding box $B^t$ in the current frame:

$$B^t = \arg\max_{B_r^t,\; r = 1, \ldots, R} \mathrm{IoU}(B_r^t, B^{t-1}) \qquad (5)$$

where $B_r^t$ denotes the rth predicted bounding box returned by Faster R-CNN in the tth frame. Concerning the second issue, when the detector in the current frame does not predict the position of the ship correctly and $\max_r \mathrm{IoU}(B_r^t, B^{t-1}) < \varepsilon_{thd}$, where $\varepsilon_{thd}$ denotes a threshold, the target ship is likely to have been missed. In this case, we obtained $B^t$ by slightly enlarging $B^{t-1}$ with an offset to avoid missing the ship. This can be represented by:

$$B^t = B^{t-1} + \delta B \quad \text{if } \max_r \mathrm{IoU}(B_r^t, B^{t-1}) < \varepsilon_{thd} \qquad (6)$$

where $\delta B = (\delta v_x, \delta v_y, \delta v_w, \delta v_h)$ denotes an offset added to the bounding box. Figure 3 shows the update of the target ship bounding box based on IoU.
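The IoU computation and the per-frame box update described above can be sketched in Python as follows. The function names `iou` and `track_step` and the top-left interpretation of $(v_x, v_y)$ are assumptions made for illustration; the default threshold 0.5 and offset (−1, −1, 1, 1) are the values reported later in the paper.

```python
from typing import List, Tuple

# (v_x, v_y, v_w, v_h); (v_x, v_y) is assumed to be the top-left corner.
Box = Tuple[float, float, float, float]


def iou(b1: Box, b2: Box) -> float:
    """Intersection area divided by union area of two axis-aligned boxes."""
    x1, y1, w1, h1 = b1
    x2, y2, w2, h2 = b2
    inter_w = max(0.0, min(x1 + w1, x2 + w2) - max(x1, x2))
    inter_h = max(0.0, min(y1 + h1, y2 + h2) - max(y1, y2))
    inter = inter_w * inter_h
    union = w1 * h1 + w2 * h2 - inter
    return inter / union if union > 0 else 0.0


def track_step(prev: Box, candidates: List[Box], thd: float = 0.5,
               delta: Box = (-1.0, -1.0, 1.0, 1.0)) -> Tuple[Box, bool]:
    """Return (B_t, detected) for the current frame.

    The candidate with the largest IoU against the previous box is chosen.
    If no candidate reaches the threshold, the previous box plus the offset
    is returned and the frame is flagged as a missed detection.
    """
    best = max(candidates, key=lambda b: iou(b, prev), default=None)
    if best is not None and iou(best, prev) >= thd:
        return best, True
    recovered = (prev[0] + delta[0], prev[1] + delta[1],
                 prev[2] + delta[2], prev[3] + delta[3])
    return recovered, False


PREV = (10.0, 10.0, 20.0, 20.0)
HIT = track_step(PREV, [(12.0, 11.0, 20.0, 20.0), (100.0, 100.0, 20.0, 20.0)])
MISS = track_step(PREV, [(100.0, 100.0, 20.0, 20.0)])
```

In this toy example `HIT` keeps the nearby candidate, while `MISS` falls back to the enlarged previous box because the only candidate does not overlap the previous box at all.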

If R bounding boxes are predicted in the first frame, the initial bounding box is selected as the one with the highest detection confidence using:

$$B^1 = \arg\max_{B_r^1,\; r = 1, \ldots, R} \; \max_{k} \, p(\omega = k \mid B_r^1) \qquad (7)$$

For example, Faster R-CNN predicts four bounding boxes with class confidence in the first frame, as shown in Figure 4. Since the determined class of the predicted bounding box is the class with the highest confidence, the classes of the detected boxes from ① to ④ are an aircraft carrier (0.34), bulk carrier (0.895), bulk carrier (0.668), and a destroyer (0.422), respectively. In this case, from Equation (7), we selected bounding box ② as the initial bounding box in the video.
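The first-frame selection can be illustrated with the four detections from the Figure 4 example. This is only a sketch: the list layout and the helper name `select_initial` are assumptions, while the class labels and confidences are the ones quoted in the text.

```python
from typing import List, Tuple

# (determined class, highest class confidence) for boxes 1 to 4 in Figure 4.
DETECTIONS: List[Tuple[str, float]] = [
    ("aircraft carrier", 0.34),
    ("bulk carrier", 0.895),
    ("bulk carrier", 0.668),
    ("destroyer", 0.422),
]


def select_initial(detections: List[Tuple[str, float]]) -> int:
    """Return the 0-based index of the box with the highest confidence."""
    return max(range(len(detections)), key=lambda r: detections[r][1])


INITIAL_INDEX = select_initial(DETECTIONS)
INITIAL_BOX_NUMBER = INITIAL_INDEX + 1  # boxes are numbered from 1 in the figure
```

With these values the second box (bulk carrier, 0.895) is chosen, which agrees with the selection described in the text.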

Probabilistic Ship Detection and Classification in a Sequence of Images
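In this section, $B^{1:T} = \{B^1, B^2, \ldots, B^t, \ldots, B^T\}$ is a sequence of the bounding boxes predicted by the Faster R-CNN, where $B^t$ is the bounding box detected at time t. We determine the class of the bounding box sequence $B^{1:T}$ using maximum a posteriori (MAP) estimation. That is, the class of the sequence of the bounding boxes is predicted by:

$$\mathrm{class}(B^{1:T}) = \arg\max_{k} \, p(\omega = k \mid B^{1:T}) \qquad (8)$$

Assuming that t denotes the current time, we can rewrite the posterior in Equation (8) as:

$$p(\omega = k \mid B^{1:t}) = p(\omega = k \mid B^t, B^{1:t-1}) \qquad (9)$$

where $B^{1:t}$ is divided into the current measurement $B^t$ and all the previous measurements $B^{1:t-1}$. Using the Bayes rule, Equation (9) can be rewritten as:

$$p(\omega = k \mid B^{1:t}) = \frac{p(B^t \mid \omega = k, B^{1:t-1}) \, p(\omega = k \mid B^{1:t-1})}{p(B^t \mid B^{1:t-1})} \qquad (10)$$

Since the current measurement $B^t$ is not affected by the previous measurements $B^{1:t-1}$ conditioned on $\omega = k$, we obtain $p(B^t \mid \omega = k, B^{1:t-1}) = p(B^t \mid \omega = k)$, and Equation (10) can be simplified as:

$$p(\omega = k \mid B^{1:t}) = \frac{p(B^t \mid \omega = k) \, p(\omega = k \mid B^{1:t-1})}{p(B^t \mid B^{1:t-1})} \qquad (11)$$

Substituting the Bayes rule

$$p(B^t \mid \omega = k) = \frac{p(\omega = k \mid B^t) \, p(B^t)}{p(\omega = k)} \qquad (12)$$

into Equation (11) yields:

$$p(\omega = k \mid B^{1:t}) = \frac{p(\omega = k \mid B^t) \, p(B^t) \, p(\omega = k \mid B^{1:t-1})}{p(\omega = k) \, p(B^t \mid B^{1:t-1})} \qquad (13)$$

Furthermore, we define the class confidence of a sequence of bounding boxes from the Faster R-CNN with:

$$f_k^t = p(\omega = k \mid B^{1:t}), \quad \text{where } \sum_{k} f_k^t = 1 \qquad (14)$$

and consider a new quantity:

$$\frac{f_k^t}{1 - f_k^t} = \frac{p(\omega = k \mid B^{1:t})}{1 - p(\omega = k \mid B^{1:t})} \qquad (15)$$

Substituting Equation (13) into Equation (15), for both $\omega = k$ and its complement so that the terms $p(B^t)$ and $p(B^t \mid B^{1:t-1})$ cancel, yields:

$$\frac{f_k^t}{1 - f_k^t} = \frac{p(\omega = k \mid B^t)}{1 - p(\omega = k \mid B^t)} \cdot \frac{p(\omega = k \mid B^{1:t-1})}{1 - p(\omega = k \mid B^{1:t-1})} \cdot \frac{1 - p(\omega = k)}{p(\omega = k)} \qquad (16)$$

Herein, we denote the detection confidence for the bounding box selected in Equation (5) as:

$$p_k^t = p(\omega = k \mid B^t), \quad \text{where } \sum_{k} p_k^t = 1 \qquad (17)$$

For practical consideration, if the detector missed the target ship, we considered the recovered bounding box in Equation (6) as background; its detection confidence is then assigned accordingly by Equation (18). Then, substituting Equation (17) into Equation (16) yields:

$$\frac{f_k^t}{1 - f_k^t} = \frac{p_k^t}{1 - p_k^t} \cdot \frac{f_k^{t-1}}{1 - f_k^{t-1}} \cdot \frac{1 - f_k^0}{f_k^0} \qquad (19)$$

where $f_k^{t-1}$ is the previous confidence of the sequence $B^{1:t-1}$ at time t − 1, $f_k^0 = p(\omega = k)$ is the initial confidence, and $p_k^t$ is the confidence of the tth frame $B^t$. If we define $\lambda_k^t$ as the right-hand side of Equation (19), then we can obtain the following from Equation (19):

$$f_k^t = \frac{\lambda_k^t}{1 + \lambda_k^t} \qquad (20)$$

From Equations (19) and (20), we can update the sequence confidence $f_k^t$ at time t from the previous sequence confidence $f_k^{t-1}$ at time t − 1 and the current frame confidence $p_k^t$ from the Faster R-CNN at time t. Thus, we do not need to retain all the previous frame confidences to compute the current sequence confidence. Then, we can predict the class of a sequence $B^{1:t}$ from Equation (8). In this study, we set the IoU threshold $\varepsilon_{thd}$ to 0.5 and $\delta B$ to (−1, −1, 1, 1). Summarizing the abovementioned results, the proposed probabilistic ship detection algorithm using video is outlined in Algorithm 1 and illustrated in Figure 5.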
Step 1: At frame 1, initialize the target ship bounding box $B^1$ using Equation (7).
Step 2: For a given image at frame t > 1, update the bounding box $B^t$ using Equation (5). If $\max_r \mathrm{IoU}(B_r^t, B^{t-1}) < \varepsilon_{thd}$, set $B^t = B^{t-1} + \delta B$ as in Equation (6).
Step 3: Evaluate the class confidence $f_k^t$ of the sequence of bounding boxes recursively using Equations (19) and (20).
Step 4: Determine the class at frame t using $\mathrm{class}(B^{1:t}) = \arg\max_k f_k^t$.
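The recursive confidence update can be sketched in Python as below. The sketch assumes that the update takes the odds form of a binary Bayes filter, in which the odds of the current frame confidence, the odds of the previous sequence confidence, and the inverse odds of the initial confidence are multiplied. The names `fuse` and `classify_sequence`, the clipping constant `eps`, the three-class toy data, and the uniform prior are assumptions for illustration only.

```python
from typing import List, Sequence, Tuple


def _clip(value: float, eps: float) -> float:
    """Keep a probability strictly inside (0, 1) to avoid division by zero."""
    return min(max(value, eps), 1.0 - eps)


def fuse(f_prev: Sequence[float], p_t: Sequence[float],
         f0: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Update the sequence confidence of every class from one new frame."""
    fused = []
    for fp, p, f in zip(f_prev, p_t, f0):
        fp, p, f = _clip(fp, eps), _clip(p, eps), _clip(f, eps)
        odds = (p / (1.0 - p)) * (fp / (1.0 - fp)) * ((1.0 - f) / f)
        fused.append(odds / (1.0 + odds))
    return fused


def classify_sequence(frames: Sequence[Sequence[float]],
                      f0: Sequence[float]) -> Tuple[List[int], List[float]]:
    """Return the per-frame decided class (0-based) and the final confidences."""
    f = list(f0)
    decisions = []
    for p_t in frames:
        f = fuse(f, p_t, f0)
        decisions.append(max(range(len(f)), key=lambda k: f[k]))
    return decisions, f


# Toy example: class 0 is the true class; frame 3 is misclassified as class 1.
PRIOR = [1.0 / 3.0] * 3
FRAMES = [
    [0.6, 0.3, 0.1],
    [0.6, 0.3, 0.1],
    [0.3, 0.6, 0.1],
    [0.6, 0.3, 0.1],
]
DECISIONS, FINAL = classify_sequence(FRAMES, PRIOR)
```

In this toy run the fused decision stays on class 0 even in the frame where the single-frame confidence favors class 1. Only the previous sequence confidence is carried between frames, and the fused values are not renormalized across classes in this sketch.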

Experimental Results
We built our own ship dataset to train the Faster R-CNN and evaluate the proposed method. For this dataset, 7000 ship images were collected using Google image search and manually labeled as one of seven classes: aircraft carrier, destroyer, submarine, container ship, bulk carrier, cruise ship, and tugboat.


Ship Dataset
The dataset mainly focused on large ships navigating in the ocean, which were divided into two types: warships and merchant ships. There are three warship classes: aircraft carrier, destroyer, and submarine. There are also three merchant ship classes: container ship, bulk carrier, and cruise ship. Finally, we included one small ship in the dataset, the tugboat, which assists large ships in entering and leaving a port, resulting in seven classes.
To train the Faster R-CNN for ship detection and evaluate single image detection, a total of 7000 still images were collected. Each class included 1000 images, so the classes were completely balanced. All the still images were manually gathered from Google image search. The collected still images were distinct from one another, and none were consecutive frames. The ship image dataset was divided into a training dataset and a test dataset, as summarized in Table 2. In detail, 5250 of the 7000 images (75%, 750 images per class) were used for training the Faster R-CNN, and 1750 images (25%, 250 images per class) were used to test single image detection. To evaluate ship detection performance on videos, seven video clips covering all the aforementioned classes were downloaded from YouTube in MPEG-4 video format. Each test video file was decomposed into a sequence of still images that were consecutive in time, and each image was processed by Faster R-CNN. The detection result of each image was combined with those of the consecutive images, and the combined result was used in the video simulation.
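The split sizes above follow from taking 750 training and 250 test images from each of the seven classes. The short sketch below reproduces that arithmetic; the random shuffling, the seed, the helper name `split_per_class`, and the file names are hypothetical, since the paper does not state how images were assigned to each subset.

```python
import random
from typing import Dict, List, Tuple

CLASSES = ["aircraft carrier", "destroyer", "submarine", "container ship",
           "bulk carrier", "cruise ship", "tugboat"]


def split_per_class(images: Dict[str, List[str]], n_train: int = 750,
                    seed: int = 0) -> Tuple[Dict[str, List[str]], Dict[str, List[str]]]:
    """Split each class into n_train training images and the rest for testing."""
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    train, test = {}, {}
    for name, ids in images.items():
        shuffled = list(ids)
        rng.shuffle(shuffled)
        train[name] = shuffled[:n_train]
        test[name] = shuffled[n_train:]
    return train, test


# Hypothetical file names: 1000 images per class, 7000 in total.
IMAGES = {name: [f"{name}_{i:04d}.jpg" for i in range(1000)] for name in CLASSES}
TRAIN, TEST = split_per_class(IMAGES)
N_TRAIN = sum(len(v) for v in TRAIN.values())
N_TEST = sum(len(v) for v in TEST.values())
```

Splitting within each class keeps the training and test sets balanced, matching the 750 and 250 images per class reported in the text.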

Results of the Single Image Detection
The same hyperparameters used in the original Faster R-CNN [10] were applied to train our Faster R-CNN for ship detection. The hyperparameters used in our experiments were as follows: learning rate, 0.001; momentum, 0.9; and weight decay, 0.0005. The ZF net pre-trained on ImageNet was used as the base CNN for feature extraction, and the network was fine-tuned using our ship dataset. The maximum number of iterations was set to 10,000. We used the Caffe [25] framework to train the Faster R-CNN on Ubuntu 16.04 LTS with an NVIDIA GeForce GTX 980 GPU. Table 3 shows the results of the ship detection using the Faster R-CNN fine-tuned with the above training set. The results of the ship detection using the Faster R-CNN on sample test images are shown in Figure 6.
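To make the role of the three hyperparameters concrete, the following sketch applies one step of stochastic gradient descent with momentum and L2 weight decay in its common textbook form. It is an illustration under that assumption and is not taken from the authors' solver configuration; the function name `sgd_momentum_step` and the example numbers are hypothetical.

```python
from typing import List, Tuple


def sgd_momentum_step(weights: List[float], grads: List[float],
                      velocity: List[float], lr: float = 0.001,
                      momentum: float = 0.9,
                      weight_decay: float = 0.0005) -> Tuple[List[float], List[float]]:
    """Return updated (weights, velocity) after one SGD step.

    Weight decay adds weight_decay * w to the gradient; the velocity keeps a
    decaying sum of past updates scaled by the momentum coefficient.
    """
    new_velocity = [momentum * v - lr * (g + weight_decay * w)
                    for w, g, v in zip(weights, grads, velocity)]
    new_weights = [w + v for w, v in zip(weights, new_velocity)]
    return new_weights, new_velocity


# One step on a single hypothetical weight with gradient 0.5 and zero velocity.
W1, V1 = sgd_momentum_step([1.0], [0.5], [0.0])
# A second step with the same gradient reuses 90% of the previous update.
W2, V2 = sgd_momentum_step(W1, [0.5], V1)
```

The learning rate scales each step, the momentum term carries over a fraction of the previous step, and the weight decay term pulls weights toward zero.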

Results of Detection Based on Video
Seven image sequences were used to demonstrate the performance of the proposed method on video. Among them, two videos are considered in detail. In the first sequence, involving a tugboat, the weather was relatively fine and the ships were not influenced by environmental factors. In the second sequence, involving a destroyer, however, the weather was windy and environmental factors such as waves and wind influenced the ships. The Faster R-CNN returned eight confidences, one for each class, and the eight confidences summed to 1 in each frame, as shown in Equation (17). In Figure 7, the changes in the eight confidences are plotted against the frames for the first sequence. The subfigures in the first, second, and third rows correspond to Faster R-CNN; Faster R-CNN and IoU tracking; and Faster R-CNN, IoU tracking, and Bayesian fusion, respectively.

The confidence in Figure 7a is $p_k^t = p(\omega = k \mid B^t)$ and the confidence in Figure 7c is $f_k^t = p(\omega = k \mid B^{1:t})$. In the first sequence, the target ship is a tugboat. In Figure 7a, the tugboat is classified as a bulk carrier four times by Faster R-CNN. The confidence of the tugboat also does not remain steady but changes irregularly. In Figure 7b, IoU tracking is used together with the Faster R-CNN. When the target was not detected or the IoU was lower than the threshold, the bounding box was considered background and the corresponding confidences were assigned by Equation (18). However, in the figure, no background confidence was observed, since the target was detected by the Faster R-CNN in every frame and the IoU values from IoU tracking were higher than the threshold. Figure 7c shows the experimental result when Faster R-CNN, IoU tracking, and Bayesian fusion were used together. The confidence for the tugboat was steady and approached one after a few frames, and the confidences for the other classes approached zero.

The experimental results for the second sequence are shown in Figure 8. Unlike the first sequence, the ships were affected by environmental factors. The target ship in the second sequence was a destroyer. Similar to Figure 7, the change in the eight confidences is plotted against the frames for the second sequence in Figure 8. The subfigures in the first, second, and third rows in Figure 8 correspond to Faster R-CNN; Faster R-CNN and IoU tracking; and Faster R-CNN, IoU tracking, and Bayesian fusion, respectively.

First, let us consider Intervals 1 and 4 in Figure 8. In these two intervals, the destroyer was falsely classified as an aircraft carrier, as shown in Figure 8a,b. In particular, the target ship was classified not as a destroyer but as an aircraft carrier in six consecutive frames, from frame 235 to frame 240. However, the proposed method overcame the false classification problem and the confidence for the destroyer progressed steadily to one, as shown in Figure 8c. Second, consider Interval 2. In this interval, the destroyer was not detected and was classified as background by the IoU tracking several times. However, the proposed method again overcame the misdetections and the confidence for the destroyer progressed steadily to one, as shown in Figure 8c. Third, consider Interval 3, which was slightly different from Intervals 1, 2, and 4. Unlike the other intervals, several misdetections and tens of false classifications occurred together in Interval 3. The proposed method worked well in this challenging situation for the first 30 frames, but once the frequency of misdetections and false classifications exceeded a certain level, our algorithm failed to classify the ship correctly and returned the wrong result.

The three methods are compared on a per-frame basis in Table 4. As stated, the ground truth was a destroyer. In Intervals 1 and 4, Faster R-CNN alone returned a false classification as an aircraft carrier in several frames, such as #19, #20, and #21, and its confidence remained around 0.5. When Faster R-CNN was combined with IoU tracking and Bayesian fusion, however, the confidence steadily increased and approached one. In Interval 2, Faster R-CNN alone often missed the destroyer, but when it was combined with IoU tracking and Bayesian fusion, the misdetections were overcome and the confidence also approached one. Here, when the target was not detected or falsely classified for several consecutive frames, for example, during frames #124 to #127 in Interval ...

The performance of ship detection with the proposed method on the test videos was compared with that obtained using only Faster R-CNN. The ship detection results are summarized in Table 5. Overall, our proposed method outperformed the Faster R-CNN detector used alone.

Conclusions
In this study, a probabilistic ship detection and classification system for video using deep learning was proposed. To train the detector and evaluate the proposed system, we collected thousands of ship images from Google image search and built our own ship dataset. The probabilistic ship detection and classification system demonstrated better detection and classification performance than Faster R-CNN used alone. The proposed method used IoU tracking to build a sequence of bounding boxes and considered the confidence from the detector as a probability. The undetected ships were recovered by IoU tracking. Moreover, the detection probabilities were accumulated over time and the classes of the ships were determined by Bayesian fusion. In the experiments, the proposed method was tested with two sequences of images and showed considerable improvement in both detection and classification compared with using only Faster R-CNN.


Figure 3. IoU tracking. (a) IoU is equal to or larger than the threshold and (b) IoU is less than the threshold. If IoU is less than the threshold, the bounding box in frame t − 1 is enlarged slightly and used as the target ship bounding box to avoid missing the ship.

Figure 4. Initial bounding box selection in the first frame.

Algorithm 1: Probabilistic ship detection and classification from video.

Figure 5. Robust single ship detection system based on video in our proposed algorithm.

Figure 6. Results of the ship detection using the Faster R-CNN based on the test images.

Figure 7. The change in the confidences in the image sequence without environmental factors: (a) Faster R-CNN, (b) Faster R-CNN and IoU tracking, and (c) Faster R-CNN, IoU tracking, and Bayesian fusion.

Figure 8. The change in the confidences in the image sequence with environmental factors: (a) Faster R-CNN, (b) Faster R-CNN and IoU tracking, and (c) Faster R-CNN, IoU tracking, and Bayesian fusion.

Table 1. Classes in the ship dataset.

Table 2. The number of images in the ship dataset.

Table 3. Results of the single image detection.

Table 5. Performance of ship detection on test videos.