This is an early access version; the complete PDF, HTML, and XML versions will be available soon.
Open Access Article
LLaVA-Emo: Interpretable Affective Image Stylization via Chain-of-Thought Reasoning
by Kaichen Tang and Qi Xu *
College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
* Author to whom correspondence should be addressed.
Electronics 2026, 15(12), 2620; https://doi.org/10.3390/electronics15122620 (registering DOI)
Submission received: 18 May 2026 / Revised: 8 June 2026 / Accepted: 9 June 2026 / Published: 13 June 2026
Abstract
Affective Image Stylization (AIS) converts an emotional intent into an executable artistic visual style. Existing methods are often limited to discrete label settings and offer little insight into how a target emotion is realized. We propose LLaVA-Emo, an interpretable AIS framework built on multimodal Chain-of-Thought (CoT) reasoning. Our method decouples generation into two structured outputs: <reasoning> provides visual–affective causal explanations grounded in evidence from the input image, and <style_prompt> expresses actionable, renderer-ready style instructions that directly condition a frozen diffusion renderer. We construct a training set by screening the sentiment interpretations in ArtEmis and fine-tune LLaVA-1.5-7B with LoRA. Supervised fine-tuning (SFT) mainly supervises the structured intermediate <reasoning> and the output format, while the executability of <style_prompt> is enforced by a Direct Preference Optimization (DPO) stage that uses render-and-reward feedback to align candidate outputs with both emotion fidelity and instruction executability. The rendering stage itself remains training-free. Experiments on the EmoEdit inference set demonstrate that LLaVA-Emo improves emotion alignment while providing stronger process interpretability.
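To make the two-part output concrete, the following is a minimal sketch of how a downstream step might split a model response into its <reasoning> and <style_prompt> sections before passing the style instructions to a renderer. It is illustrative only: the abstract names the two tags but does not specify closing tags, ordering, or any parsing code, so the `<tag>...</tag>` wrapping, the `StyledOutput` type, and the `parse_output` helper are all assumptions.

```python
import re
from typing import NamedTuple, Optional


class StyledOutput(NamedTuple):
    """Hypothetical container for the two structured sections."""
    reasoning: str
    style_prompt: str


# Assumption: each section is wrapped as <tag>...</tag>. The abstract names
# the tags but does not state the exact serialization.
_SECTION = r"<{tag}>(.*?)</{tag}>"


def parse_output(text: str) -> Optional[StyledOutput]:
    """Return both sections, or None if either one is missing or empty."""
    parts = {}
    for tag in ("reasoning", "style_prompt"):
        # DOTALL lets a section span several lines.
        match = re.search(_SECTION.format(tag=tag), text, flags=re.DOTALL)
        if match is None or not match.group(1).strip():
            return None
        parts[tag] = match.group(1).strip()
    return StyledOutput(**parts)


if __name__ == "__main__":
    sample = (
        "<reasoning>Dim light reads as calm.</reasoning>\n"
        "<style_prompt>soft pastel watercolor</style_prompt>"
    )
    print(parse_output(sample))
```

Rejecting a response when either section is missing mirrors the idea that only well-formed outputs can condition the renderer; a real pipeline may handle malformed outputs differently.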