Article

Embedded Deep Learning for Ship Detection and Recognition

College of Computer and Communication Engineering, China University of Petroleum (UPC), Qingdao 266580, China
* Author to whom correspondence should be addressed.
Future Internet 2019, 11(2), 53; https://doi.org/10.3390/fi11020053
Submission received: 31 December 2018 / Revised: 27 January 2019 / Accepted: 31 January 2019 / Published: 21 February 2019
(This article belongs to the Special Issue Innovative Topologies and Algorithms for Neural Networks)

Abstract

Ship detection and recognition are important for smart monitoring of ships in order to manage port resources effectively. However, this is challenging due to complex ship profiles, ship backgrounds, object occlusion, variations of weather and light conditions, and other issues. It is also expensive to transmit whole monitoring video streams, especially when a port is located in a remote area. In this paper, we propose an on-site processing approach called Embedded Ship Detection and Recognition using Deep Learning (ESDR-DL). In ESDR-DL, the video stream is processed on embedded devices, and we design a two-stage neural network named DCNet, composed of a DNet for ship detection and a CNet for ship recognition, running on embedded devices. We have extensively evaluated ESDR-DL, including its accuracy and efficiency. ESDR-DL is deployed at the Dongying port of China, where it has been running for over a year and has demonstrated that it can work reliably in practical usage.

1. Introduction

With the development of the marine economy, marine transportation and management have been attracting more and more attention in modern ports [1]. Ship detection and recognition play an important role in marine transportation management. To accomplish the task of ship detection and recognition, video surveillance with static cameras is a good choice. Surveillance cameras are increasingly deployed for port management and security in order to realize a smart port [2]. However, this is challenging due to complex ship profiles, ship backgrounds and object occlusions, variations of weather and light conditions, and other issues.
Deep learning [3] provides a promising technology to tackle these issues. Vehicle plate text recognition is a popular image processing method for vehicle identification, which shows promising results for accurate object recognition. The work in [4] handled Chinese car license plate recognition from traffic videos with image features extracted by DCNNs (Deep Convolutional Neural Networks). License plate recognition [5] based on deep learning was also used for feature extraction and classification. Such regular character recognition is much simpler than recognizing Chinese characters on ship license plates, because ship plates use a larger variety of character types, appear against complex backgrounds, and vary in location on the hull.
At the same time, the number of monitoring devices can be large, deployed both at the seashore and above the water, to monitor ships sailing in the water and ships going back and forth from a port. Therefore, the recognition approach requires good scalability and should be able to handle a considerable number of video streams. On the other hand, transmitting all video streams back may not be possible, as some locations have no Internet connection, and the cost of using 4G to transmit video streams is an important factor when designing possible recognition solutions.
To address these challenges, we propose an embedded deep learning approach called ESDR-DL (Embedded Ship Detection and Recognition using Deep Learning), which conducts ship recognition on the fly by connecting an embedded device directly to a camera. In ESDR-DL, we propose a neural network named DCNet (composed of DNet and CNet, as detailed later) which treats ship recognition as a classification problem: it detects and identifies key parts of a ship (the bow, the cabin, and the stern), and classifies the ship's identity based on these key parts. These classification results are then passed to a voter to decide the ship's identity. To boost performance, ESDR-DL is designed to handle multi-channel video at the same time. We conduct comprehensive evaluations of the embedded ESDR-DL system, including its accuracy and efficiency.
The contributions of this paper include:
  • To decrease model parameters, we design a tiny network DNet for ship detection, and share the base convolutional layers with CNet.
  • To address challenges of variations of ship license plate locations and text types, we propose a classification network CNet to recognize ships.
  • We run the DCNet on embedded devices, which has good scalability and can handle a large number of video streams at the same time.
The remainder of the paper is organized as follows: Section 2 discusses the related work. Section 3 presents the design and training of DCNet. Section 4 presents the architecture design of ESDR-DL. Section 5 presents comprehensive evaluations of the deployed solution. Section 6 concludes the paper.

2. Related Work

Since ESDR-DL is in principle an embedded deep learning approach, we discuss deep learning and embedded object recognition; since a ship is also a kind of vehicle, we also discuss vehicle identification.

2.1. Deep Learning

The concept of deep learning originates from the study of artificial neural networks, proposed by Hinton et al. [6]. Deep learning has made remarkable achievements in the field of image processing, especially for object detection. SSD [7] is a typical one-stage detector, which processes an image in a single network and achieves good efficiency and accuracy. Faster R-CNN [8] is a two-stage detector, which uses an RPN (Region Proposal Network) to produce high-quality region proposals and then detects objects with Fast R-CNN [9].
Redmon et al. present a single neural network named YOLO, which abandons anchor boxes and predicts bounding boxes and class probabilities directly from a full image in one evaluation [10]. YOLO treats object detection as a regression problem of predicting bounding boxes and class probabilities, and it can be optimized end-to-end directly with good detection performance. Fast YOLO can process 155 frames per second. Compared with other state-of-the-art detection algorithms, however, YOLO makes more localization errors.
YOLOv2 [11] is based on YOLO [10]. It removes the fully connected layers from YOLO and uses anchor boxes to predict bounding boxes. The YOLOv2 model can run with various image sizes, making it easy to trade off speed against accuracy. YOLOv2 is faster than YOLO and can process 200 frames per second with the Tiny model. Table 1 shows the performance of these algorithms.

2.2. Vehicle Identification

There are many state-of-the-art algorithms that can be used for vehicle detection, such as R-CNN [9] and YOLO [10], which have high real-time performance, but their accuracy is not high enough for ship recognition. In [12], Wang proposed a real-time vehicle detection algorithm based on YOLOv2. It optimizes important parameters of the model and improves the number and size of anchors, achieving both real-time performance and high accuracy for vehicle detection. It was tested on a home-made dataset and showed higher accuracy and faster speed than YOLOv2 [11] and R-CNN. However, its real-time performance relies on high-performance equipment, which is not suitable for our scenario.
Plate recognition is the most typical application for vehicle identification. Liu et al. [13] proposed CogniMem, which uses a neural-network chip to recognize license plates. CogniMem combines a video image processing module with a neural network module, using an equalized image processing algorithm and a network classification algorithm, and covers license plate location, character segmentation and character recognition. CogniMem can recognize car plates with a low error rate; however, it requires plates with fixed character positions and a limited set of character types and lengths. Lin [14] proposed a method named ALPR to detect and recognize the characters in the plate region of an image. The approach is not applicable to situations in which new targets emerge that are not annotated in its database.

2.3. Embedded Object Recognition

Embedded image processing has attracted considerable effort. In [15], Arth et al. designed a full-featured license plate detection and recognition method using a DSP. The processing core is a single Texas Instruments fixed-point DSP with 1 MB of RAM, plus a slower 16 MB SDRAM block, and it achieves real-time performance. In addition, Kamat and Ganesan [16] implemented a license plate detection system on a DSP using the Hough transform. Kang et al. [17] implemented a vehicle tracking and license plate recognition system on a PDA. An FPGA was used by Bellas et al. [18] to speed up parts of their license plate recognition system.
Mao et al. ran Fast R-CNN on the Jetson TK1 platform [19]. Although additional modifications were made to Fast R-CNN to fit the TK1, the detection speed was very low (1.85 frames per second (fps)). The work in [20] ran a seven-layer CNN on a TDA3x SoC for object classification, and the overall system performance was 15 fps. Therefore, a powerful software/hardware platform is needed to support efficient embedded deep-learning-based real-time video processing.

3. Designing a Recognition Neural Network-DCNet

DCNet is a two-stage network that consists of a DNet and a CNet, as shown in Figure 1. DNet is a fully convolutional network [21] for detecting ship parts, including the bow, cabin and stern. CNet is a classifier that takes an image of any size and outputs a set of classification scores. We locate the ship parts with DNet and feed them into CNet to get three classification scores (bow score, cabin score, stern score) for the ship identity. Finally, a voter is used to recognize the ship, as shown in Figure 1.

3.1. DNet

Region proposal is one of the key components of a target detection network: for example, Faster R-CNN [8] uses an RPN to generate high-quality regions, while YOLOv1 [10] splits an image into grid cells as region proposals to improve detection efficiency. As shown in Figure 2, DNet divides the input image into 6 × 6 grid cells as region proposals, like YOLOv1 [10]. Each region proposal consists of eight predictions: x, y, w, h, c, and C × 3. The (x, y) coordinates represent the center of the predicted box, w and h represent its width and height, c represents the IOU (intersection-over-union) between the predicted box and the ground-truth box, and C × 3 represents the class probabilities of bow, cabin and stern.
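To make the grid-cell prediction concrete, the following Python sketch decodes such a 6 × 6 × 8 output tensor; the tensor layout, the confidence threshold and the assumption that coordinates are already in image pixels are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Illustrative decode of a DNet-style 6 x 6 x 8 output tensor
# (x, y, w, h, c, p_bow, p_cabin, p_stern per cell).
S = 6
CLASSES = ["bow", "cabin", "stern"]

def decode(pred, conf_thresh=0.5):
    """pred: (S, S, 8) array -> list of (class, score, box) tuples."""
    detections = []
    for row in range(S):
        for col in range(S):
            x, y, w, h, c = pred[row, col, :5]
            probs = pred[row, col, 5:]
            if c < conf_thresh:
                continue
            cls = CLASSES[int(np.argmax(probs))]
            box = (x - w / 2, y - h / 2, w, h)   # centre/size -> corner/size
            detections.append((cls, float(c * probs.max()), box))
    return detections
```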
Inspired by YOLOv1, DNet models detection as a regression problem. Since the objects and backgrounds are relatively simple and object features are relatively obvious, the network does not need to be as deep as VGGNet or ResNet [22]; instead, we pay more attention to decreasing the number of model parameters. DNet resizes the image to 192 × 192 as the input, and we design five layers to extract features from an image; the last two layers predict the object probabilities and coordinates.
DNet predicts bounding boxes based on grid cells. A grid cell produces one bounding box predictor. We need one bounding box predictor to be "responsible" for each object, and we choose the one whose prediction has the highest current IOU (intersection-over-union) with the ground truth. To choose a proper predictor for each object at training time, we design the loss function $L_{grid}$ as follows:
$$L_{grid} = \sum_{i=0}^{S^2} \left( C_i - \hat{C}_i \right)^2,$$
where $S^2$ is the number of grid cells, $C_i$ is the predicted confidence that the box contains an object, and $\hat{C}_i$ is the IOU between the predicted bounding box and the ground truth; if there is no object in the cell, $\hat{C}_i = 0$.
The final layer predicts both class probabilities and bounding box coordinates; we calculate the coordinate loss and classification loss only when the predictor is a proper one. The loss function is:
$$L_{box} = \sum_{c \in propers} \left[ (x_c - \hat{x}_c)^2 + (y_c - \hat{y}_c)^2 + (w_c - \hat{w}_c)^2 + (h_c - \hat{h}_c)^2 + (p_c - \hat{p}_c)^2 \right],$$
where $p_c$ is the predicted class and $\hat{p}_c$ is the ground-truth class. The loss $L_{box}$ is computed under the assumption that the predictor is a proper one, so it may not be ideal to weight $L_{grid}$ equally with $L_{box}$. We use $\lambda$ to weight the losses, and the final loss function is designed as follows:
$$L = \lambda L_{grid} + (1 - \lambda) L_{box}.$$
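As a rough illustration of Equations (1)-(3), the following sketch computes the combined loss for predictions that have already been matched to their proper predictors; the tensor layouts, function name and sum-of-squares form are our assumptions, not the authors' implementation.

```python
import torch

def dcnet_loss(pred_conf, target_iou, pred_boxes, target_boxes,
               pred_cls, target_cls, lam=0.7):
    # Equation (1): confidence vs. IOU over all grid cells
    # (the target IOU is 0 when the cell contains no object).
    l_grid = ((pred_conf - target_iou) ** 2).sum()
    # Equation (2): coordinate and class error over the proper predictors only.
    l_box = ((pred_boxes - target_boxes) ** 2).sum() + \
            ((pred_cls - target_cls) ** 2).sum()
    # Equation (3): lambda-weighted combination (0.7 is the value chosen in Section 5).
    return lam * l_grid + (1.0 - lam) * l_box
```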

3.2. CNet

Ship recognition is challenging, but we can make use of the fact that there is only a limited number of ships in a port. The CNet model treats the recognition problem as a classification problem, and CNet is connected to the end of DNet. We take the output boxes and classes as the input and share the first three layers' feature maps of DNet. The boxes are resized to 14 × 14 by a ROIPool layer, as shown in Figure 3, which was proposed in [8]. Two extra convolutional layers following the ROIPool layer are added, and, finally, two fully connected layers and a softmax layer are used to predict the output probabilities.
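A minimal PyTorch-style sketch of such a classification head is given below; the channel counts, spatial scale and number of ship identities are illustrative assumptions, not the published configuration.

```python
import torch
import torch.nn as nn
from torchvision.ops import roi_pool

class CNetHead(nn.Module):
    """ROI-pool detected parts from shared feature maps to 14 x 14,
    then two conv layers, two FC layers and a softmax."""
    def __init__(self, in_channels=64, num_ships=100, spatial_scale=1 / 8):
        super().__init__()
        self.spatial_scale = spatial_scale   # feature-map stride w.r.t. the input image
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 128, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(128, 128, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.fc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(128 * 14 * 14, 512), nn.ReLU(inplace=True),
            nn.Linear(512, num_ships),
        )

    def forward(self, shared_feats, boxes):
        # boxes: list of (N_i, 4) tensors with (x1, y1, x2, y2) in image coordinates
        rois = roi_pool(shared_feats, boxes, output_size=(14, 14),
                        spatial_scale=self.spatial_scale)
        return torch.softmax(self.fc(self.conv(rois)), dim=1)
```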
Finally, CNet outputs three ship classification scores for the bow, cabin and stern. We design the voting strategy as
$$S_i = \lambda_b \, Score^{i}_{bow} + \lambda_c \, Score^{i}_{cabin} + \lambda_s \, Score^{i}_{stern}.$$
$Score^{i}_{bow}$, $Score^{i}_{cabin}$ and $Score^{i}_{stern}$ denote the classification scores of ship identity $i$ for the bow, cabin and stern, respectively. Weighting the cabin score equally with the bow and stern scores may not be ideal; to resolve this, we use the $\lambda$ weights to balance the scores.
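The voting rule of Equation (4) amounts to a weighted sum over per-identity score vectors; a minimal sketch, using the weights selected experimentally in Section 5, could look as follows (the function name and the assumption that scores are arrays indexed by ship identity are ours).

```python
import numpy as np

def vote(score_bow, score_cabin, score_stern,
         lam_b=0.3, lam_c=0.5, lam_s=0.2):
    """Combine the three part scores per Equation (4) and return the winner."""
    combined = lam_b * score_bow + lam_c * score_cabin + lam_s * score_stern
    return int(np.argmax(combined)), float(combined.max())
```

The identity with the largest combined score is taken as the recognized ship.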

3.3. Training and Running

Before training, we have to label the ship data set. We annotate the bow, cabin and stern with bounding boxes (c, i, x, y, w, h), where c represents the key part of the ship, i represents the identification of the key part, (x, y) represents the upper-left coordinates of the box, and w and h represent the width and height of the box. To learn the shared features, we train DCNet in two steps, as shown in Figure 4. In the first step, we train DCNet using the ship data set; we set the initial learning rate to 0.01 and decrease it by one tenth every 10,000 iterations; after 50,000 iterations, the losses stabilize. In the second step, we fix the shared convolutional layers and only fine-tune the unique layers of CNet. During CNet training, we feed the ship data set to the shared convolutional layers, crop the box feature maps, unify the box feature map size with the ROIPool layer, and finally classify the feature maps with the unique layers of CNet. We set the initial learning rate to 0.1 and decrease it by one tenth every 5000 iterations; after 40,000 iterations, the loss stabilizes.
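The two-step schedule can be sketched as follows; the optimizer, momentum value and the attribute names shared_layers / cnet_layers are assumptions, and only the learning rates, decay steps and iteration counts follow the text above.

```python
import torch

def run(model, loader, loss_fn, opt, sched, iterations):
    it = 0
    while it < iterations:
        for images, targets in loader:
            loss = loss_fn(model(images), targets)
            opt.zero_grad()
            loss.backward()
            opt.step()
            sched.step()                      # decay by 1/10 every step_size iterations
            it += 1
            if it >= iterations:
                break

def train_step1(dcnet, loader, loss_fn):      # step 1: train the whole DCNet
    opt = torch.optim.SGD(dcnet.parameters(), lr=0.01, momentum=0.9)
    sched = torch.optim.lr_scheduler.StepLR(opt, step_size=10_000, gamma=0.1)
    run(dcnet, loader, loss_fn, opt, sched, iterations=50_000)

def train_step2(dcnet, loader, loss_fn):      # step 2: freeze shared layers, fine-tune CNet
    for p in dcnet.shared_layers.parameters():
        p.requires_grad = False
    opt = torch.optim.SGD(dcnet.cnet_layers.parameters(), lr=0.1, momentum=0.9)
    sched = torch.optim.lr_scheduler.StepLR(opt, step_size=5_000, gamma=0.1)
    run(dcnet, loader, loss_fn, opt, sched, iterations=40_000)
```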
When running the model, DNet first predicts the coordinates and classes of the bow, cabin and stern; the corresponding regions are then cropped from the shared feature maps and fed to CNet to obtain the probability scores, as shown in Figure 5.

4. Architecture Design of ESDR-DL

In order to reduce the network traffic caused by video streaming from surveillance cameras and to cope with limited transmission bandwidth, we design an embedded deep learning architecture that connects to the surveillance cameras and performs image processing at the front end, as shown in Figure 6. In ESDR-DL, each video stream is connected to a nearby TX2 through a LAN. To ensure real-time performance of video surveillance, each TX2 receives only one or two video streams. When the system is running, a Video Stream Receiver on the TX2 receives the video stream of the connected camera, decodes it through a Video Stream Decoder, and inputs the decoded images to an Image Processor for detection. In the Image Processor, the DCNet model detects and identifies the key parts of a ship (the bow, the cabin, and the stern), classifies the ship's identity based on these key parts, and outputs three prediction results. These prediction results are then passed to a Voter to decide the ship's identity.
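A minimal sketch of this per-TX2 pipeline is shown below; the RTSP URL format and the dcnet/voter callables are placeholders for illustration, not part of the deployed system's code.

```python
import cv2

def process_stream(rtsp_url, dcnet, voter):
    """Receive and decode one camera stream, run DCNet per frame, and vote."""
    cap = cv2.VideoCapture(rtsp_url)      # video stream receiver + decoder
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        part_scores = dcnet(frame)        # bow / cabin / stern score vectors
        ship_id = voter(part_scores)      # final decision, Equation (4)
        print(ship_id)                    # e.g., log or report the record
    cap.release()
```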
We use NVIDIA Jetson TX2 as it is an industry-leading embedded computing device. Table 2 lists the main properties of TX2 related to our work in this paper.

5. Experiment Results

We use recall (R) and precision (P) as the evaluation metrics, defined as
$$R = \mathrm{TP}/(\mathrm{TP} + \mathrm{FN}), \quad P = \mathrm{TP}/(\mathrm{TP} + \mathrm{FP}).$$
TP refers to true positive, FN indicates false negative, and FP means false positive.
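For reference, the computation is straightforward (a small illustrative helper, with made-up counts in the example):

```python
def recall_precision(tp, fp, fn):
    """Counts of true positives, false positives and false negatives -> (R, P)."""
    return tp / (tp + fn), tp / (tp + fp)

# Example: tp=86, fp=14, fn=20 gives R ≈ 0.81 and P = 0.86.
```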

5.1. Algorithm Performance

To evaluate the performance of DNet, we use a ship data set of 6000 images collected from Dongying port, Shandong, China; 4700 images are used for training and 1300 for testing. We have tested both Tiny YOLO and DNet, running on a TX2 and a GTX TITAN X. Table 3 shows the test results. We can see that DNet achieves much higher energy efficiency with slightly lower accuracy.
YOLOv1 splits an image into 7 × 7 grid cells; considering the large size of ship targets and the limited computing capacity of a TX2, we decrease the number of grid cells to reduce model parameters. Table 4, where efficiency is measured in FPS (frames per second), shows the results of this test; DNet splits an image into 6 × 6 grid cells as a performance–accuracy trade-off.
The λ for the loss L in Equation (3) can be adjusted for different scenarios and targets. We adjust λ experimentally, and the results are shown in Table 5. Based on these tests, we set λ = 0.7.
We also adjust the λ weights of the voting strategy experimentally and test their impact on accuracy, as shown in Table 6. From the results we conclude that λ_c > λ_b > λ_s, and we set λ_b = 0.3, λ_c = 0.5 and λ_s = 0.2.

5.2. System Performance

ESDR-DL is deployed at Dongying port, China. The video cameras used are Hikvision DS-2CD3T25D-I5, with a resolution of 1920 × 1080 and a frame rate of 30 fps. We use seven TX2s for 10 cameras, as shown in Table 7. Four cameras are installed on both sides of the entrance at a height of 8 m; the others are installed inside the port.
During one month of running, we collected 13,000 recognition records and checked the accuracy manually. There are a total of 14,536 ships in the videos.
Table 8 illustrates the recall and precision rates of ship detection and recognition. S denotes the ship number occurring in each camera, D-P stands for the ship detection precision, D-R is the detection recall, R-P is the recognition precision, R-R refers to the recognition recall, and T denotes the processing efficiency of each camera.
Comparing Table 6 and Table 8, we find that the accuracy during actual operation is lower than the accuracy on the home-made data set, because new ships arrive at the port and ESDR-DL cannot recognize them. In addition, ESDR-DL performs better for the inside-port monitoring cameras: the entrance cameras capture some far-away views of ships, whereas the inside-port cameras only capture close views, and DCNet focuses on large-target detection and recognition. Moreover, as shown in Figure 7, the system can run in bad weather conditions (such as rain and smog) in practice. To test the performance of the system in bad weather, we ran it in rainy and smoggy weather, and also at dusk (5:00 p.m.–6:00 p.m.). The detection results are shown in Table 9.
The recognition results are shown in Table 10. We can see that the accuracy of the system drops sharply in rainy and smoggy weather, while it performs well at dusk. This is not a problem in practice, as there are very few ships in such weather conditions.

6. Conclusions

Considering the challenges of ship detection and recognition, this paper proposes an embedded deep learning system for ship detection and recognition named ESDR-DL. It first locates the bow, cabin and stern of a ship using DNet, and then recognizes them with a classification network named CNet. Finally, voting is used to decide the ship identity. We implement ESDR-DL on an embedded architecture that supports real-time video processing. We have deployed ESDR-DL at Dongying port, China, where it has been running stably for the past year, which shows the effectiveness of our solution. In the future, we will adopt a multi-model data fusion approach [23,24] to improve the recognition accuracy.

Author Contributions

Conceptualization, H.Z. and W.Z.; methodology, H.Z.; validation, H.Z., H.S. and B.X.; formal analysis, W.S.; investigation, B.X.; resources, W.Z.; writing—original draft preparation, H.Z.; writing—review and editing, H.Z. and H.S.; visualization, W.Z. and B.X.; project administration, W.Z.

Funding

This research was funded by the Key Research Program of Shandong Province under Grant No. 2017GGX10140 and the National Natural Science Foundation of China under Grant No. 61309024.

Conflicts of Interest

The authors declare no conflict of interest.

Nomenclature

The following abbreviations are used in this manuscript:
ESDR-DL    Embedded Ship Detection and Recognition using Deep Learning
DCNet      Detection and Classification Network
DNet       Detection Network
CNet       Classification Network
DCNN       Deep Convolutional Neural Network
RPN        Region Proposal Network
RCNN       Region-based Convolutional Neural Network
ROI        Region Of Interest
DSP        Digital Signal Processing
PDA        Personal Digital Assistant
FPGA       Field-Programmable Gate Array
TDA3x SoC  Texas Instruments TDA3x System on Chip
SSD        Single Shot MultiBox Detector
YOLO       You Only Look Once

References

  1. Wang, Z.; Tang, W.; Zhao, L. Research on the modern port logistics development in the city-group, China. In Proceedings of the 2010 International Conference on IEEE Logistics Systems and Intelligent Management (ICLSIM), Harbin, China, 9–10 January 2010; pp. 1280–1283. [Google Scholar]
  2. Alderton, P.M. Port Management and Operations; Harbors: Suffolk, NY, USA, 2008. [Google Scholar]
  3. Xu, L.; Ren, J.S.J.; Liu, C.; Jia, J. Deep convolutional neural network for image deconvolution. In Proceedings of the International Conference on Neural Information Processing Systems, Montreal, QC, Canada, 8–13 December 2014; pp. 1790–1798. [Google Scholar]
  4. Zang, D.; Chai, Z.; Zhang, J.; Zhang, D.; Cheng, J. Vehicle license plate recognition using visual attention model and deep learning. J. Electron. Imaging 2015, 24, 033001. [Google Scholar] [CrossRef] [Green Version]
  5. Masood, S.Z.; Shu, G.; Dehghan, A.; Ortiz, E.G. License Plate Detection and Recognition Using Deeply Learned Convolutional Neural Networks. arXiv, 2017; arXiv:1703.07330. [Google Scholar]
  6. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. In Proceedings of the International Conference on Neural Information Processing Systems, Lake Tahoe, NV, USA, 3–6 December 2012. [Google Scholar]
  7. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 8–16 October 2016; pp. 21–37. [Google Scholar]
  8. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 39. [Google Scholar] [CrossRef] [PubMed]
  9. Girshick, R. Fast R-CNN. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015. [Google Scholar] [CrossRef]
  10. Redmon, J.; Divvala, S.K.; Girshick, R.B.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. arXiv, 2015; arXiv:1506.02640v5. [Google Scholar]
  11. Redmon, J.; Farhadi, A. YOLO9000: Better, Faster, Stronger. arXiv, 2016; arXiv:1612.08242. [Google Scholar]
  12. Wang, H.; Zhang, Z. A vehicle real-time detection algorithm based on YOLOv2 framework. In Proceedings of the Real-Time Image and Video Processing, Orlando, FL, USA, 15–19 April 2018; p. 22. [Google Scholar]
  13. Liu, Y.; Wei, D.; Zhang, N.; Zhao, M. Vehicle-license-plate recognition based on neural networks. In Proceedings of the 2011 IEEE International Conference on Information and Automation, Shenzhen, China, 6–8 June 2011; pp. 363–366. [Google Scholar]
  14. Lin, D.; Lin, F.; Lv, Y.; Cai, F.; Cao, D. Chinese Character CAPTCHA Recognition and Performance Estimation via Deep Neural Network. Neurocomputing 2018, 28, 11–19. [Google Scholar] [CrossRef]
  15. Arth, C.; Limberger, F.; Bischof, H. Real-Time License Plate Recognition on an Embedded DSP-Platform. In Proceedings of the 2007 IEEE Conference on Computer Vision and Pattern Recognition 2007, Minneapolis, MN, USA, 17–22 June 2007. [Google Scholar] [CrossRef]
  16. Kamat, V.; Ganesan, S. An efficient implementation of the Hough transform for detecting vehicle license plates using DSP’S. In Proceedings of the Real-Time Technology and Applications Symposium, Chicago, IL, USA, 15–17 May 1995; pp. 58–59. [Google Scholar]
  17. Kang, J.S.; Kang, M.H.; Park, C.H.; Kim, J.H.; Choi, Y.S. Implementation of embedded system for vehicle tracking and license plates recognition using spatial relative distance. In Proceedings of the International Conference on Information Technology Interfaces, Cavtat, Croatia, 7–10 June 2003; Volume 1, pp. 167–172. [Google Scholar]
  18. Bellas, N.; Chai, S.M.; Dwyer, M.; Linzmeier, D. FPGA implementation of a license plate recognition SoC using automatically generated streaming accelerators. In Proceedings of the 20th IEEE International Parallel & Distributed Processing Symposium, Rhodes Island, Greece, 25–29 April 2006. [Google Scholar]
  19. Mao, H.; Yao, S.; Tang, T.; Li, B.; Yao, J.; Wang, Y. Towards real-time object detection on embedded systems. IEEE Trans. Emerg. Top. Comput. 2016, 6, 417–431. [Google Scholar] [CrossRef]
  20. Jagannathan, S.; Desappan, K.; Swami, P.; Mathew, M.; Nagori, S.; Chitnis, K.; Marathe, Y.; Poddar, D.; Narayanan, S. Efficient object detection and classification on low power embedded systems. In Proceedings of the 2017 IEEE International Conference on Consumer Electronics (ICCE), Las Vegas, NV, USA, 8–10 January 2017; pp. 233–234. [Google Scholar]
  21. Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 3431–3440. [Google Scholar] [CrossRef]
  22. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016. [Google Scholar]
  23. Zhang, W.; Zhang, Y.; Zhai, J.; Zhao, D.; Xu, L.; Zhou, J.; Li, Z.; Yang, S. Multi-source data fusion using deep learning for smart refrigerators. Comput. Ind. 2018, 95, 15–21. [Google Scholar] [CrossRef]
  24. Zhang, W.; Wang, Z.; Liu, X.; Gong, W.; Sun, H.; Zhou, J.; Liu, Y. Deep Learning based Real-Time Fine-grained Pedestrian Recognition using Stream Processing. IET Intell. Transp. Syst. 2018, 12. [Google Scholar] [CrossRef]
Figure 1. DCNet.
Figure 2. Region proposal of DNet.
Figure 3. ROIPool layer.
Figure 4. DCNet training.
Figure 5. Labels of bow, cabin and stern.
Figure 6. System architecture of ESDR-DL.
Figure 7. Ships in bad weather: the top row shows ships in bad weather, and the bottom row shows the processing results.
Table 1. The performance of the algorithms.

Algorithm     Datasets          mAP    FPS   Proposed Year
SSD300        VOC 2007 + 2012   74.3   46    2016
SSD500        VOC 2007 + 2012   76.8   19    2016
Faster-RCNN   VOC 2007 + 2012   73.2   31    2015
YOLO          VOC 2007 + 2012   63.4   45    2015
YOLOv2        VOC 2007 + 2012   76.8   67    2016
Tiny YOLO     VOC 2007 + 2012   57.1   207   2016
Table 2. Jetson TX2.

Versions        Jetson TX2
GPU             NVIDIA Pascal, 256 CUDA cores
CPU             64-bit Denver 2 and A57 CPUs
Memory          8 GB 128-bit LPDDR4
Storage         32 GB eMMC
Video Encode    4K × 2K, 60 Hz
Video Decode    4K × 2K, 60 Hz
Camera          1.4 Gpix/s, 2.5 Gbps per lane
Connectivity    1 Gigabit Ethernet, 802.11ac WLAN
Table 3. Test results of Tiny YOLO and DNet.

Method      Device        Accuracy   Efficiency (FPS)   Power (W)   Energy Efficiency (fps/W)
Tiny YOLO   TX2           0.9316     18.91              7.84        2.41
DNet        TX2           0.9233     34.87              7.73        4.51
Tiny YOLO   GTX TITAN X   0.9492     155.24             180         0.86
DNet        GTX TITAN X   0.9297     298.43             177         1.68
Table 4. Grid cell number test for DNet.

Grid Cells Number   Detection Accuracy   Efficiency (FPS)
4 × 4               0.7642               59
5 × 5               0.8310               51
6 × 6               0.9233               43
7 × 7               0.9251               38
8 × 8               0.9282               32
9 × 9               0.9263               21
Table 5. λ test for the loss L.

λ     Detection Accuracy
0.1   0.5216
0.2   0.7442
0.3   0.7513
0.4   0.8121
0.5   0.9021
0.6   0.9113
0.7   0.9233
0.8   0.9035
0.9   0.8945
Table 6. λ impacts on recognition accuracy.

λ_b    λ_c    λ_s    Accuracy
1      0      0      0.83
0      1      0      0.85
0      0      1      0.80
0.33   0.34   0.33   0.84
0.25   0.5    0.25   0.85
0.3    0.5    0.2    0.86
0.2    0.5    0.3    0.85
Table 7. Deployment of cameras and TX2s.

TX2-1   entrance camera-1
TX2-2   entrance camera-2
TX2-3   entrance camera-3
TX2-3   entrance camera-4
TX2-4   inside port camera-1
TX2-5   inside port camera-2
TX2-6   inside port camera-3
TX2-6   inside port camera-4
TX2-7   inside port camera-5
TX2-7   inside port camera-6
Table 8. Performance of ESDR-DL.

Camera                 S      D-P    D-R    R-P    R-R    T
entrance camera-1      903    0.86   0.80   0.82   0.74   27 fps
entrance camera-2      903    0.86   0.79   0.82   0.74   27 fps
entrance camera-3      887    0.85   0.79   0.81   0.74   27 fps
entrance camera-4      891    0.86   0.80   0.80   0.73   27 fps
inside port camera-1   1532   0.89   0.84   0.84   0.79   13 fps
inside port camera-2   1129   0.87   0.82   0.80   0.75   13 fps
inside port camera-3   1413   0.90   0.85   0.85   0.78   13 fps
inside port camera-4   1410   0.89   0.84   0.84   0.79   13 fps
inside port camera-5   1611   0.89   0.85   0.82   0.79   13 fps
inside port camera-6   1611   0.88   0.85   0.82   0.80   13 fps
Table 9. Performance of ship detection in bad weather.

Camera                 Rain-S   Rain-P   Rain-R   Smog-S   Smog-P   Smog-R   Dusk-S   Dusk-P   Dusk-R
entrance camera-1      46       0.72     0.65     114      0.61     0.52     203      0.83     0.78
entrance camera-2      46       0.74     0.65     114      0.59     0.51     203      0.87     0.80
entrance camera-3      39       0.69     0.56     99       0.49     0.39     211      0.85     0.79
entrance camera-4      40       0.70     0.55     103      0.48     0.40     225      0.83     0.80
inside port camera-1   70       0.75     0.70     91       0.64     0.51     293      0.82     0.76
inside port camera-2   58       0.71     0.64     69       0.60     0.55     233      0.80     0.71
inside port camera-3   132      0.79     0.71     155      0.70     0.53     254      0.83     0.79
inside port camera-4   140      0.74     0.70     169      0.67     0.59     254      0.85     0.80
inside port camera-5   129      0.78     0.73     143      0.72     0.56     223      0.85     0.76
inside port camera-6   129      0.75     0.71     143      0.70     0.51     223      0.80     0.77
Table 10. Performance of ship recognition in bad weather.

Camera                 Rain-S   Rain-P   Rain-R   Smog-S   Smog-P   Smog-R   Dusk-S   Dusk-P   Dusk-R
entrance camera-1      46       0.53     0.45     114      0.36     0.20     203      0.79     0.72
entrance camera-2      46       0.45     0.38     114      0.31     0.18     203      0.81     0.75
entrance camera-3      39       0.46     0.36     99       0.29     0.21     211      0.82     0.74
entrance camera-4      40       0.45     0.32     103      0.26     0.15     225      0.83     0.73
inside port camera-1   70       0.55     0.41     91       0.34     0.21     293      0.80     0.72
inside port camera-2   58       0.52     0.45     69       0.29     0.17     233      0.75     0.70
inside port camera-3   132      0.49     0.39     155      0.39     0.23     254      0.80     0.74
inside port camera-4   140      0.54     0.45     169      0.35     0.19     254      0.82     0.76
inside port camera-5   129      0.48     0.43     143      0.32     0.24     223      0.80     0.75
inside port camera-6   129      0.51     0.41     143      0.25     0.12     223      0.84     0.75
