Fine-Grained Ship Recognition from the Horizontal View Based on Domain Adaptation
Abstract
1. Introduction
- Given the small number of images available for each ship subclass, we propose using computer simulation software to render ships of different types from different viewing angles, enriching and expanding the training data. The main types identified in this paper are Arleigh Burke, Murasame, Nimitz, Ticonderoga, and Wasp. Some of them were selected from the third level of [10]; the Murasame class was added because it closely resembles the Arleigh Burke class, providing a harder case with which to verify the effectiveness of our model.
- Because of the large appearance gap between simulated and real images, we propose first performing style transfer on the simulated images so that they more closely resemble real ship photographs (a style-transfer sketch follows this list).
- Because the simulated and real images follow different feature distributions, we propose a domain-adaptation method for transfer learning: the simulated images serve as the source-domain data and the real images as the target-domain data during training. LMMD is used to align the corresponding sub-domains, strengthening the network's capture of per-category fine-grained information and improving recognition accuracy on the fine-grained classification task (an LMMD sketch also follows this list). In addition, a transformer-based feature extraction module is used to extract features, and its effectiveness for fine-grained classification is verified.
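This excerpt does not name the style-transfer algorithm the authors used, so the following is only a minimal Gatys-style optimization sketch in PyTorch: a rendered (simulated) ship image is pushed toward the texture statistics of a real photograph while keeping its own content. The file names, layer choices, iteration count, and loss weight are all illustrative assumptions.

```python
import torch
import torch.nn.functional as F
from PIL import Image
from torchvision import models, transforms

device = 'cuda' if torch.cuda.is_available() else 'cpu'
vgg = models.vgg19(weights=models.VGG19_Weights.DEFAULT).features.to(device).eval()
for p in vgg.parameters():
    p.requires_grad_(False)

STYLE_LAYERS = [0, 5, 10, 19, 28]   # conv1_1 ... conv5_1 in vgg19.features
CONTENT_LAYER = 21                  # conv4_2

norm = transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])

def load(path, size=224):
    tf = transforms.Compose([transforms.Resize((size, size)), transforms.ToTensor(), norm])
    return tf(Image.open(path).convert('RGB')).unsqueeze(0).to(device)

def features(x):
    """Collect style-layer activations and the content-layer activation."""
    style, content = {}, None
    for i, layer in enumerate(vgg):
        x = layer(x)
        if i in STYLE_LAYERS:
            style[i] = x
        if i == CONTENT_LAYER:
            content = x
        if i >= max(STYLE_LAYERS):
            break
    return style, content

def gram(x):
    _, c, h, w = x.shape
    f = x.view(c, h * w)
    return (f @ f.t()) / (c * h * w)

sim = load('simulated_ship.png')   # rendered image: content to preserve (hypothetical file)
real = load('real_ship.png')       # real photograph: style to imitate (hypothetical file)
target = sim.clone().requires_grad_(True)

style_grams = {k: gram(v).detach() for k, v in features(real)[0].items()}
content_ref = features(sim)[1].detach()

opt = torch.optim.Adam([target], lr=0.02)
for step in range(300):
    opt.zero_grad()
    style_f, content_f = features(target)
    loss = F.mse_loss(content_f, content_ref) \
         + 1e4 * sum(F.mse_loss(gram(style_f[k]), style_grams[k]) for k in STYLE_LAYERS)
    loss.backward()
    opt.step()
```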
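LMMD here is the local (sub-domain) maximum mean discrepancy of the Deep Subdomain Adaptation Network [19]: each source sample is weighted by its one-hot label and each target sample by its softmax pseudo-label, and a class-weighted MMD is computed with a multi-bandwidth Gaussian kernel, so features of the same ship class are aligned across domains. A minimal PyTorch sketch (function and variable names are ours, not the authors'):

```python
import torch
import torch.nn.functional as F

def gaussian_kernel(x, y, kernel_mul=2.0, kernel_num=5):
    """Multi-bandwidth RBF kernel matrix over the stacked source/target batch."""
    total = torch.cat([x, y], dim=0)
    n = total.size(0)
    l2 = ((total.unsqueeze(0) - total.unsqueeze(1)) ** 2).sum(2)
    bandwidth = l2.detach().sum() / (n * n - n)
    bandwidth /= kernel_mul ** (kernel_num // 2)
    return sum(torch.exp(-l2 / (bandwidth * kernel_mul ** i)) for i in range(kernel_num))

def lmmd(feat_s, feat_t, label_s, logits_t, num_classes):
    """Class-weighted (local) MMD: source weights from one-hot labels,
    target weights from softmax pseudo-labels."""
    b = feat_s.size(0)
    ws = F.one_hot(label_s, num_classes).float()
    wt = F.softmax(logits_t, dim=1)
    # Normalize per class over the batch so each class contributes a mean embedding.
    ws = ws / ws.sum(0, keepdim=True).clamp(min=1e-6)
    wt = wt / wt.sum(0, keepdim=True).clamp(min=1e-6)
    w_ss, w_tt, w_st = ws @ ws.t(), wt @ wt.t(), ws @ wt.t()
    k = gaussian_kernel(feat_s, feat_t)
    return ((w_ss * k[:b, :b]).sum()
            + (w_tt * k[b:, b:]).sum()
            - 2 * (w_st * k[:b, b:]).sum())
```

In training, `feat_s`/`feat_t` would be backbone features of a source/target batch and `logits_t` the classifier outputs on the target batch.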
2. Related Work
2.1. Fine-Grained Visual Categorization (FGVC)
2.2. Detection and FGVC on Ships
3. Construction of Fine-Grained Ship Dataset
3.1. Annotation from Real Images and Videos
3.2. Generation through Computer Simulation Technology
3.3. Statistical Analysis of Ship Target Datasets
4. Ship Classification Based on Domain Adaptation and Vision Transformer
4.1. The Architecture for Training the Ship Recognition Model Based on Domain Adaptation
4.2. Ship Recognition Model
4.3. Loss Function
5. Experiments
5.1. Experimental Setting
5.2. Experimental Results
5.2.1. Ablation Experiments
5.2.2. Detection and Classification Results
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Wang, Y.; Ma, L.; Tian, Y. State-of-the-art of Ship Detection and Recognition in Optical Remotely Sensed Imagery. Acta Autom. Sin. 2011, 37, 1029–1039.
2. Shi, W.; Jiang, J.; Bao, S. Ship Detection Method in Remote Sensing Image Based on Feature Fusion. Acta Photonica Sin. 2020, 49, 57–67.
3. Zhang, J.; Wang, H. Ship Target Detection in SAR Image Based on Improved YOLOv3. J. Signal Process. 2021, 37, 1623–1632.
4. Han, Z.; Wang, C.; Fu, Q. Ship Detection in SAR Images Based on Deep Feature Enhancement Network. Trans. Beijing Inst. Technol. 2021, 41, 1006–1014.
5. Huang, L.; Liu, B.; Li, B.; Guo, W.; Yu, W.; Zhang, Z.; Yu, W. OpenSARShip: A Dataset Dedicated to Sentinel-1 Ship Interpretation. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2018, 11, 195–208.
6. Li, X.; Huang, W.; Peters, D.K.; Power, D. Assessment of Synthetic Aperture Radar Image Preprocessing Methods for Iceberg and Ship Recognition with Convolutional Neural Networks. In Proceedings of the 2019 IEEE Radar Conference (RadarConf), Boston, MA, USA, 22–26 April 2019; pp. 1–5.
7. Song, Z.; Yang, J.; Zhang, D.; Wang, S.; Li, Z. Semi-Supervised Dim and Small Infrared Ship Detection Network Based on Haar Wavelet. IEEE Access 2021, 9, 29686–29695.
8. Zhang, M.M.; Choi, J.; Daniilidis, K.; Wolf, M.T.; Kanan, C. VAIS: A Dataset for Recognizing Maritime Imagery in the Visible and Infrared Spectrums. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Boston, MA, USA, 7–12 June 2015; pp. 10–16.
9. Yao, L.; Zhang, X.; Lyu, Y.; Sun, W.; Li, M. FGSC-23: A Large-Scale Dataset of High-Resolution Optical Remote Sensing Images for Deep Learning-Based Fine-Grained Ship Recognition. J. Image Graph. 2021, 26, 2337–2345.
10. Liu, Z.; Yuan, L.; Weng, L.; Yang, Y. A High Resolution Optical Satellite Image Dataset for Ship Recognition and Some New Baselines. In Proceedings of the 6th International Conference on Pattern Recognition Applications and Methods, Porto, Portugal, 24–26 February 2017.
11. Gundogdu, E.; Solmaz, B.; Yücesoy, V.; Koç, A. MARVEL: A Large-Scale Image Dataset for Maritime Vessels. In Proceedings of the 2016 Asian Conference on Computer Vision, Taipei, Taiwan, 20–24 November 2016; pp. 165–180.
12. Di, Y.; Jiang, Z.; Zhang, H. A Public Dataset for Fine-Grained Ship Classification in Optical Remote Sensing Images. Remote Sens. 2021, 13, 747.
13. Zhang, N.; Donahue, J.; Girshick, R.; Darrell, T. Part-Based R-CNNs for Fine-Grained Category Detection. In Proceedings of the 2014 European Conference on Computer Vision, Zurich, Switzerland, 6–12 September 2014; pp. 834–849.
14. Lin, D.; Shen, X.; Lu, C.; Jia, J. Deep LAC: Deep Localization, Alignment and Classification for Fine-Grained Recognition. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–13 June 2015; pp. 1666–1674.
15. Lin, T.; RoyChowdhury, A.; Maji, S. Bilinear CNN Models for Fine-Grained Visual Recognition. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 1449–1457.
16. Gao, Y.; Beijbom, O.; Zhang, N.; Darrell, T. Compact Bilinear Pooling. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 26 June–1 July 2016; pp. 317–326.
17. Wang, Y.; Morariu, V.I.; Davis, L.S. Learning a Discriminative Filter Bank within a CNN for Fine-Grained Recognition. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 4148–4157.
18. He, J.; Chen, J.; Liu, S.; Kortylewski, A.; Yang, C.; Bai, Y.; Wang, C. TransFG: A Transformer Architecture for Fine-Grained Recognition. arXiv 2021, arXiv:2103.07976.
19. Zhu, Y.; Zhuang, F.; Wang, J.; Ke, G.; Chen, J.; Bian, J.; Xiong, H.; He, Q. Deep Subdomain Adaptation Network for Image Classification. IEEE Trans. Neural Netw. Learn. Syst. 2021, 32, 1713–1722.
20. Long, M.; Cao, Y.; Wang, J.; Jordan, M. Learning Transferable Features with Deep Adaptation Networks. In Proceedings of the 32nd International Conference on Machine Learning, Lille, France, 7–9 July 2015; pp. 97–105.
21. Long, M.; Zhu, H.; Wang, J.; Jordan, M. Deep Transfer Learning with Joint Adaptation Networks. In Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017; pp. 2208–2217.
22. Song, Z.; Sui, H.; Wang, Y. Automatic Ship Detection for Optical Satellite Images Based on Visual Attention Model and LBP. In Proceedings of the 2014 IEEE Workshop on Electronics, Computer and Applications, Ottawa, ON, Canada, 8–9 May 2014; pp. 722–725.
23. Antelo, J.; Ambrosio, G.; Gonzalez, J.; Galindo, C. Ship Detection and Recognition in High-Resolution Satellite Images. In Proceedings of the 2009 IEEE International Geoscience and Remote Sensing Symposium, Cape Town, South Africa, 12–17 July 2009; pp. IV-514–IV-517.
24. Rainey, K.; Stastny, J. Object Recognition in Ocean Imagery Using Feature Selection and Compressive Sensing. In Proceedings of the 2011 IEEE Applied Imagery Pattern Recognition Workshop (AIPR), Washington, DC, USA, 11–13 October 2011; pp. 1–6.
25. Nie, X.; Duan, M.; Ding, H.; Hu, B.; Wong, E.K. Attention Mask R-CNN for Ship Detection and Segmentation from Remote Sensing Images. IEEE Access 2020, 8, 9325–9334.
26. Zhao, H.; Zhang, W.; Sun, H.; Xue, B. Embedded Deep Learning for Ship Detection and Recognition. Future Internet 2019, 11, 53.
27. Zhang, X.; Lv, Y.; Yao, L.; Xiong, W.; Fu, C. A New Benchmark and an Attribute-Guided Multilevel Feature Representation Network for Fine-Grained Ship Classification in Optical Remote Sensing Images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 1271–1285.
28. Chen, Q.; Gu, Y.; Song, Z.; Nie, S. Semi-Automatic Video Target Annotation by Combining Detection and Tracking. Comput. Eng. Appl. 2021, 57, 223–230.
29. Pan, S.J.; Yang, Q. A Survey on Transfer Learning. IEEE Trans. Knowl. Data Eng. 2009, 22, 1345–1359.
30. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention Is All You Need. Adv. Neural Inf. Process. Syst. 2017, 30, 5998–6008.
31. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv 2020, arXiv:2010.11929.
32. Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-End Object Detection with Transformers. In Proceedings of the 2020 European Conference on Computer Vision, Glasgow, Scotland, 23–28 August 2020; pp. 213–229.
33. Zheng, S.; Lu, J.; Zhao, H.; Zhu, X.; Luo, Z.; Wang, Y.; Fu, Y.; Feng, J.; Xiang, T.; Torr, P.H.S.; et al. Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 6881–6890.
34. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 10012–10022.
35. Ghifary, M.; Kleijn, W.B.; Zhang, M. Domain Adaptive Neural Networks for Object Recognition. In Proceedings of the 2014 Pacific Rim International Conference on Artificial Intelligence, Gold Coast, Australia, 1–5 December 2014; pp. 898–904.
36. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 26 June–1 July 2016; pp. 779–788.
37. Redmon, J.; Farhadi, A. YOLO9000: Better, Faster, Stronger. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 22–25 July 2017; pp. 7263–7271.
38. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. In Proceedings of the 2016 European Conference on Computer Vision, Amsterdam, The Netherlands, 8–16 October 2016; pp. 21–37.
39. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. Adv. Neural Inf. Process. Syst. 2015, 28, 91–99.
40. Law, H.; Deng, J. CornerNet: Detecting Objects as Paired Keypoints. In Proceedings of the 2018 European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 734–750.
41. Tian, Z.; Shen, C.; Chen, H. FCOS: Fully Convolutional One-Stage Object Detection. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Korea, 27 October–2 November 2019; pp. 9627–9636.
42. Zhu, X.; Su, W.; Lu, L.; Li, B.; Wang, X.; Dai, J. Deformable DETR: Deformable Transformers for End-to-End Object Detection. arXiv 2020, arXiv:2010.04159.
Dataset | Classes | Arleigh Burke | Murasame | Nimitz | Ticonderoga | Wasp | Total
---|---|---|---|---|---|---|---
Simulation | 5 | 2160 | 2160 | 2160 | 2160 | 2160 | 10,800
Real | 5 | 545 | 497 | 519 | 502 | 505 | 2568
Test | 5 | 48 | 45 | 47 | 43 | 43 | 226
Backbone | Loss | Epochs | Optimizer | Batch Size | Learning Rate | Image Size
---|---|---|---|---|---|---
ViT-B_16 | CE + LMMD + Contrastive | 100 | SGD (momentum = 0.9) | 16 | 1 × 10⁻² | 224 × 224
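For concreteness, the settings above can be read as the following training-loop sketch. It assumes a `timm` ViT-B/16 backbone and the `lmmd()` function sketched after the Introduction list, uses stand-in random tensors for data, and omits the contrastive term; the loss trade-off weights are not given in this excerpt and the value below is illustrative.

```python
import timm
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

# Stand-in data; replace with the simulated (source) and real (target) ship sets.
source = TensorDataset(torch.randn(64, 3, 224, 224), torch.randint(0, 5, (64,)))
target = TensorDataset(torch.randn(64, 3, 224, 224))
source_loader = DataLoader(source, batch_size=16, shuffle=True)
target_loader = DataLoader(target, batch_size=16, shuffle=True)

backbone = timm.create_model('vit_base_patch16_224', num_classes=0)  # pooled features
head = torch.nn.Linear(backbone.num_features, 5)
optimizer = torch.optim.SGD(
    list(backbone.parameters()) + list(head.parameters()), lr=1e-2, momentum=0.9)

LAM = 0.5  # hypothetical LMMD weight

for epoch in range(100):
    for (xs, ys), (xt,) in zip(source_loader, target_loader):
        feat_s, feat_t = backbone(xs), backbone(xt)
        logits_s, logits_t = head(feat_s), head(feat_t)
        # CE on labeled source + sub-domain alignment via lmmd() from the earlier sketch.
        loss = F.cross_entropy(logits_s, ys) + LAM * lmmd(feat_s, feat_t, ys, logits_t, 5)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```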
ID | Backbone | Training Strategy | Style Transfer | OA/%
---|---|---|---|---
1 | ResNet50 | Only real dataset | | 80.5
2 | ResNet50 | Directly mix datasets | | 76.1
3 | ResNet50 | Directly mix datasets | √ | 80.0
4 | ResNet50 | Domain adaptation (global alignment) | | 90.7
5 | ResNet50 | Domain adaptation (global alignment) | √ | 92.4
6 | ResNet50 | Domain adaptation (sub-domain alignment) | | 92.5
7 | ResNet50 | Domain adaptation (sub-domain alignment) | √ | 93.4
8 | ViT-B_16 | Only real dataset | | 85.4
9 | ViT-B_16 | Directly mix datasets | | 81.4
10 | ViT-B_16 | Directly mix datasets | √ | 84.5
11 | ViT-B_16 | Domain adaptation (global alignment) | | 93.8
12 | ViT-B_16 | Domain adaptation (global alignment) | √ | 95.6
13 | ViT-B_16 | Domain adaptation (sub-domain alignment) | | 95.2
14 | ViT-B_16 | Domain adaptation (sub-domain alignment) | √ | 96.0
ID | Arleigh Burke/% | Murasame/% | Nimitz/% | Ticonderoga/% | Wasp/% | AA/%
---|---|---|---|---|---|---
1 | 89.6 | 82.2 | 61.7 | 69.8 | 100.0 | 80.7 |
2 | 62.5 | 73.3 | 68.1 | 81.4 | 97.7 | 76.6 |
3 | 81.3 | 80.0 | 63.8 | 76.7 | 100.0 | 80.4 |
4 | 95.8 | 82.2 | 95.7 | 79.1 | 100.0 | 90.6 |
5 | 95.8 | 86.7 | 93.6 | 86.0 | 100.0 | 92.4 |
6 | 100.0 | 84.4 | 91.5 | 93.0 | 95.3 | 92.9 |
7 | 100.0 | 86.7 | 100.0 | 79.1 | 100.0 | 93.1 |
8 | 75.0 | 80.0 | 97.9 | 90.7 | 83.7 | 85.5 |
9 | 77.1 | 60.0 | 95.7 | 83.7 | 90.7 | 81.4 |
10 | 72.9 | 95.6 | 95.7 | 72.1 | 86.1 | 84.5 |
11 | 83.3 | 93.3 | 100.0 | 95.3 | 97.7 | 93.9 |
12 | 89.6 | 93.3 | 97.9 | 97.7 | 100.0 | 95.7 |
13 | 81.3 | 95.6 | 100.0 | 100.0 | 100.0 | 95.4 |
14 | 85.4 | 97.8 | 100.0 | 97.7 | 100.0 | 96.2 |
Method | Arleigh Burke/% | Murasame/% | Nimitz/% | Ticonderoga/% | Wasp/% | mAP/%
---|---|---|---|---|---|---
YOLOv5s | 81.6 | 81.5 | 61.5 | 58.2 | 78.7 | 72.3 |
Cascade R-CNN | 86.5 | 88.6 | 62.4 | 63.0 | 86.1 | 77.3 |
YOLOv5s + Ours | 82.1 | 87.6 | 89.6 | 81.4 | 96.9 | 87.5 |
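The "YOLOv5s + Ours" row suggests a two-stage pipeline in which YOLOv5s proposes ship boxes and the domain-adapted classifier assigns the fine-grained class. Below is a sketch of such a pipeline; the hub/model names and glue code are our assumptions rather than the authors' released implementation.

```python
import timm
import torch
from PIL import Image
from torchvision import transforms

CLASSES = ['Arleigh Burke', 'Murasame', 'Nimitz', 'Ticonderoga', 'Wasp']

# Stage 1: off-the-shelf YOLOv5s as the ship detector.
detector = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)

# Stage 2: the fine-grained classifier; load the domain-adapted weights here.
classifier = timm.create_model('vit_base_patch16_224', num_classes=5).eval()
preprocess = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])

def detect_and_classify(image_path):
    img = Image.open(image_path).convert('RGB')
    detections = detector(img).xyxy[0].tolist()  # rows: [x1, y1, x2, y2, conf, cls]
    results = []
    for x1, y1, x2, y2, conf, _ in detections:
        box = tuple(map(int, (x1, y1, x2, y2)))
        crop = preprocess(img.crop(box)).unsqueeze(0)
        with torch.no_grad():
            label = CLASSES[classifier(crop).argmax(1).item()]
        results.append((label, conf, box))
    return results
```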
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).