Article

IE-MAS: Internal–External Multi-Agent Steering for Controllable Image Captioning

1 College of Computer and Data Science, Fuzhou University, Fuzhou 350108, China
2 School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen 518055, China
* Author to whom correspondence should be addressed.
Entropy 2025, 27(12), 1237; https://doi.org/10.3390/e27121237
Submission received: 5 November 2025 / Revised: 3 December 2025 / Accepted: 5 December 2025 / Published: 7 December 2025
(This article belongs to the Section Information Theory, Probability and Statistics)

Abstract

Controllable Image Captioning (CIC) aims to generate coherent and semantically faithful textual descriptions of images while adhering to user-specified constraints. Existing methods have achieved promising results under individual constraints such as sentiment style or sentence length. However, they often fail to satisfy multiple constraints simultaneously, because the controls interact and interfere with one another. To address this challenge, we propose Internal–External Multi-Agent Steering (IE-MAS) for CIC. IE-MAS introduces an internal multimodal steering (IMS) strategy that controls affective coherence within the caption, and an external multi-agent collaboration system (EMCS) that guides visual grounding and contextual alignment. From an information-theoretic view, IMS reduces uncertainty in the generation process, while EMCS strengthens the dependency between captions and visual inputs, converting the length and sentiment constraints into information gains. Through an adaptive steering process that dynamically weighs internal linguistic control against external perceptual grounding, the two components jointly maintain a stable balance among semantic consistency, affective expression, and length control. Experimental results demonstrate that IE-MAS effectively coordinates multiple constraints, producing captions that satisfy the length constraint and are sentimentally expressive and visually faithful.
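The information-theoretic framing in the abstract rests on two standard quantities: Shannon entropy, which measures uncertainty in a distribution (what IMS is said to reduce), and mutual information, which measures dependency between two variables (what EMCS is said to strengthen). The sketch below only illustrates these generic quantities on hypothetical toy distributions; it is not the authors' IMS or EMCS implementation, and the names `unsteered`, `steered`, and the joint tables are illustrative assumptions.

```python
import math


def entropy(probs):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def mutual_information(joint):
    """Mutual information I(X;Y), in bits, from a joint table joint[x][y]."""
    px = [sum(row) for row in joint]          # marginal over X (rows)
    py = [sum(col) for col in zip(*joint)]    # marginal over Y (columns)
    mi = 0.0
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            if pxy > 0:
                mi += pxy * math.log2(pxy / (px[i] * py[j]))
    return mi


# Hypothetical next-word distributions over four candidate words:
# a flat (uncertain) distribution versus a sharper, "steered" one.
unsteered = [0.25, 0.25, 0.25, 0.25]
steered = [0.7, 0.1, 0.1, 0.1]

# Information gain here is simply the drop in entropy after steering.
gain = entropy(unsteered) - entropy(steered)

# Hypothetical joint tables of (caption attribute, visual attribute):
# independent variables carry zero mutual information, while
# correlated ones carry a positive amount.
independent = [[0.25, 0.25], [0.25, 0.25]]
grounded = [[0.4, 0.1], [0.1, 0.4]]

if __name__ == "__main__":
    print(f"H(unsteered) = {entropy(unsteered):.3f} bits")
    print(f"H(steered)   = {entropy(steered):.3f} bits")
    print(f"gain         = {gain:.3f} bits")
    print(f"I(independent) = {mutual_information(independent):.3f} bits")
    print(f"I(grounded)    = {mutual_information(grounded):.3f} bits")
```

In this toy setting the flat distribution has 2 bits of entropy, and sharpening it lowers that value; likewise, the correlated joint table yields positive mutual information while the independent one yields zero. How the paper actually measures these quantities is not specified in the abstract.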
Keywords: controllable image captioning; multi-agent systems; information theory; affective analysis

Share and Cite

MDPI and ACS Style

Cai, T.; Chen, C.; Lin, S.; Ju, S.; Liao, X. IE-MAS: Internal–External Multi-Agent Steering for Controllable Image Captioning. Entropy 2025, 27, 1237. https://doi.org/10.3390/e27121237

AMA Style

Cai T, Chen C, Lin S, Ju S, Liao X. IE-MAS: Internal–External Multi-Agent Steering for Controllable Image Captioning. Entropy. 2025; 27(12):1237. https://doi.org/10.3390/e27121237

Chicago/Turabian Style

Cai, Tiecheng, Chao Chen, Shanshan Lin, Sibo Ju, and Xiangwen Liao. 2025. "IE-MAS: Internal–External Multi-Agent Steering for Controllable Image Captioning" Entropy 27, no. 12: 1237. https://doi.org/10.3390/e27121237

APA Style

Cai, T., Chen, C., Lin, S., Ju, S., & Liao, X. (2025). IE-MAS: Internal–External Multi-Agent Steering for Controllable Image Captioning. Entropy, 27(12), 1237. https://doi.org/10.3390/e27121237

