This is an early access version; the complete PDF, HTML, and XML versions will be available soon.
Article

A Study on Generating Maritime Image Captions Based on Transformer Dual Information Flow

1 Navigation College, Dalian Maritime University, Dalian 116026, China
2 Public Security Technology R&D Center, Liaoning Police College, Dalian 116036, China
3 Zhilong (Dalian) Marine Technology Co., Ltd., 11th Floor, No. 523 Huangpu Road, Dalian 116020, China
* Author to whom correspondence should be addressed.
J. Mar. Sci. Eng. 2025, 13(7), 1204; https://doi.org/10.3390/jmse13071204 (registering DOI)
Submission received: 23 May 2025 / Revised: 19 June 2025 / Accepted: 20 June 2025 / Published: 21 June 2025

Abstract

The environmental perception capability of intelligent ships is essential for enhancing maritime navigation safety and advancing shipping intelligence. Image caption generation plays a pivotal role in this context by converting visual information into structured semantic descriptions. However, existing general-purpose models often perform poorly in complex maritime environments because of limitations in visual feature extraction and semantic modeling. To address these challenges, this study proposes a transformer dual-stream information (TDSI) model. The proposed model uses a Swin Transformer to extract grid features and combines them with fine-grained scene semantics obtained via SegFormer. A dual-encoder structure encodes the grid and segmentation features independently, and a feature fusion module then integrates them implicitly. Finally, a decoder with a cross-attention mechanism generates descriptive captions for maritime images. Extensive experiments were conducted on the maritime semantic segmentation and maritime image captioning datasets constructed in this study. The results show that the proposed TDSI model outperforms existing mainstream methods on several evaluation metrics, including BLEU, METEOR, ROUGE, and CIDEr. These findings confirm the effectiveness of the TDSI model in improving image captioning performance in maritime environments.
Keywords: intelligent ships; image captioning generation; transformer
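The abstract does not specify the internals of the encoders or of the fusion module, so the following is only a minimal, dependency-free Python sketch of the described data flow (two independent encoders, a fusion step, and a cross-attention decoder), not the authors' TDSI implementation. It assumes single-head scaled dot-product attention without learned projections, toy 4-dimensional vectors standing in for real Swin Transformer grid features and SegFormer-derived segmentation features, and a hypothetical fusion step: concatenating both encoded streams along the token axis and applying joint self-attention. All function names (`encoder_layer`, `fuse`, `decoder_cross_attention`) are illustrative.

```python
import math

def matmul(a, b):
    """Multiply matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def add(a, b):
    """Element-wise sum, used here as a residual connection."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(q, k, v):
    """Single-head scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = len(q[0])
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, v)

def encoder_layer(x):
    """Self-attention plus residual; learned Q/K/V projections are omitted."""
    return add(x, attention(x, x, x))

def fuse(grid_enc, seg_enc):
    """Hypothetical fusion: concatenate both streams along the token axis,
    then apply joint self-attention so each token can attend to both streams."""
    return encoder_layer(grid_enc + seg_enc)

def decoder_cross_attention(word_queries, memory):
    """Caption-token queries attend to the fused encoder memory."""
    return add(word_queries, attention(word_queries, memory, memory))

# Toy features standing in for real model outputs (hypothetical values).
grid = [[1.0, 0.0, 0.5, 0.2], [0.3, 0.9, 0.1, 0.0], [0.0, 0.4, 0.8, 0.6]]  # 3 grid tokens
seg = [[0.2, 0.2, 0.0, 1.0], [0.7, 0.1, 0.3, 0.3]]  # 2 segmentation tokens
memory = fuse(encoder_layer(grid), encoder_layer(seg))
words = [[0.5, 0.5, 0.5, 0.5]]  # 1 caption token query
out = decoder_cross_attention(words, memory)
print(len(memory), len(out), len(out[0]))  # prints: 5 1 4
```

Concatenating the streams before joint self-attention lets each grid token attend to segmentation tokens (and vice versa) without requiring both streams to have the same number of tokens; the fusion module actually used in TDSI may differ.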

Share and Cite

MDPI and ACS Style

Zhao, Z.; Shen, H.; Wang, M.; Wang, Y. A Study on Generating Maritime Image Captions Based on Transformer Dual Information Flow. J. Mar. Sci. Eng. 2025, 13, 1204. https://doi.org/10.3390/jmse13071204

AMA Style

Zhao Z, Shen H, Wang M, Wang Y. A Study on Generating Maritime Image Captions Based on Transformer Dual Information Flow. Journal of Marine Science and Engineering. 2025; 13(7):1204. https://doi.org/10.3390/jmse13071204

Chicago/Turabian Style

Zhao, Zhenqiang, Helong Shen, Meng Wang, and Yufei Wang. 2025. "A Study on Generating Maritime Image Captions Based on Transformer Dual Information Flow." Journal of Marine Science and Engineering 13, no. 7: 1204. https://doi.org/10.3390/jmse13071204

APA Style

Zhao, Z., Shen, H., Wang, M., & Wang, Y. (2025). A Study on Generating Maritime Image Captions Based on Transformer Dual Information Flow. Journal of Marine Science and Engineering, 13(7), 1204. https://doi.org/10.3390/jmse13071204

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
