This is an early access version; the complete PDF, HTML, and XML versions will be available soon.
Open Access Article
A Study on Generating Maritime Image Captions Based on Transformer Dual Information Flow
by Zhenqiang Zhao 1, Helong Shen 1,*, Meng Wang 2 and Yufei Wang 3
1 Navigation College, Dalian Maritime University, Dalian 116026, China
2 Public Security Technology R&D Center, Liaoning Police College, Dalian 116036, China
3 Zhilong (Dalian) Marine Technology Co., Ltd., 11th Floor, No. 523 Huangpu Road, Dalian 116020, China
* Author to whom correspondence should be addressed.
J. Mar. Sci. Eng. 2025, 13(7), 1204; https://doi.org/10.3390/jmse13071204 (registering DOI)
Submission received: 23 May 2025 / Revised: 19 June 2025 / Accepted: 20 June 2025 / Published: 21 June 2025
Abstract
The environmental perception capability of intelligent ships is essential for enhancing maritime navigation safety and advancing shipping intelligence. Image caption generation plays a pivotal role in this context by converting visual information into structured semantic descriptions. However, existing general-purpose models often perform poorly in complex maritime environments because of limitations in visual feature extraction and semantic modeling. To address these challenges, this study proposes a transformer dual-stream information (TDSI) model. The model uses a Swin Transformer to extract grid features and combines them with fine-grained scene semantics obtained via SegFormer. A dual-encoder structure encodes the grid features and the segmentation features independently, and a feature fusion module then integrates the two streams implicitly. Finally, a decoder with a cross-attention mechanism generates descriptive captions for maritime images. Extensive experiments were conducted on the maritime semantic segmentation and maritime image captioning datasets constructed for this study. The results show that the TDSI model outperforms existing mainstream methods on several evaluation metrics, including BLEU, METEOR, ROUGE, and CIDEr, confirming its effectiveness for image captioning in maritime environments.
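To make the dual-stream data flow concrete, the following is a minimal, dependency-free sketch of the pipeline the abstract describes: two streams are encoded independently, fused, and then queried by the caption decoder through cross-attention. It is not the authors' implementation. The random vectors stand in for Swin Transformer grid features and SegFormer segmentation features, and the one-layer encoder (`encode`), the fixed-gate fusion (`fuse`, with a hypothetical `gate` parameter), and the single decoder step (`decode_step`) are illustrative assumptions. The paper's actual fusion module performs "implicit integration" whose details are not given here.

```python
import math
import random


def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m), both as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax_rows(m):
    """Numerically stable softmax applied to each row."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = len(queries[0])
    keys_t = [list(col) for col in zip(*keys)]  # transpose K to d x n_k
    scores = matmul(queries, keys_t)
    scaled = [[s / math.sqrt(d) for s in row] for row in scores]
    return matmul(softmax_rows(scaled), values)


def add(a, b):
    """Element-wise sum, used for residual connections."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def encode(tokens):
    """Hypothetical one-layer encoder: self-attention plus a residual connection."""
    return add(tokens, attention(tokens, tokens, tokens))


def fuse(grid, seg, gate=0.5):
    """Hypothetical fusion: a fixed convex combination of the two encoded streams.
    Assumes both streams have the same token count and feature width."""
    return [[gate * g + (1 - gate) * s for g, s in zip(rg, rs)] for rg, rs in zip(grid, seg)]


def decode_step(word_states, memory):
    """Decoder cross-attention: caption-word queries attend over the fused visual memory."""
    return add(word_states, attention(word_states, memory, memory))


random.seed(0)
d_model = 8


def random_tokens(n):
    return [[random.gauss(0.0, 1.0) for _ in range(d_model)] for _ in range(n)]


grid_tokens = random_tokens(49)  # stand-in for a 7x7 grid of Swin features
seg_tokens = random_tokens(49)   # stand-in for segmentation features aligned to the same grid
words = random_tokens(5)         # stand-in for embeddings of 5 caption tokens

memory = fuse(encode(grid_tokens), encode(seg_tokens))
out = decode_step(words, memory)
print(len(out), len(out[0]))  # → 5 8
```

Keeping the two encoders separate until the fusion step mirrors the design choice the abstract highlights: each stream is contextualized on its own before the streams are combined, so the decoder attends over a single fused memory rather than over raw, unaligned inputs.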
Share and Cite
MDPI and ACS Style
Zhao, Z.; Shen, H.; Wang, M.; Wang, Y. A Study on Generating Maritime Image Captions Based on Transformer Dual Information Flow. J. Mar. Sci. Eng. 2025, 13, 1204. https://doi.org/10.3390/jmse13071204
AMA Style
Zhao Z, Shen H, Wang M, Wang Y. A Study on Generating Maritime Image Captions Based on Transformer Dual Information Flow. Journal of Marine Science and Engineering. 2025; 13(7):1204. https://doi.org/10.3390/jmse13071204
Chicago/Turabian Style
Zhao, Zhenqiang, Helong Shen, Meng Wang, and Yufei Wang. 2025. "A Study on Generating Maritime Image Captions Based on Transformer Dual Information Flow." Journal of Marine Science and Engineering 13, no. 7: 1204. https://doi.org/10.3390/jmse13071204
APA Style
Zhao, Z., Shen, H., Wang, M., & Wang, Y. (2025). A Study on Generating Maritime Image Captions Based on Transformer Dual Information Flow. Journal of Marine Science and Engineering, 13(7), 1204. https://doi.org/10.3390/jmse13071204
Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.