LLaVA-Emo: Interpretable Affective Image Stylization via Chain-of-Thought Reasoning

Tang, Kaichen; Xu, Qi

doi:10.3390/electronics15122620

This is an early access version; the complete PDF, HTML, and XML versions will be available soon.

Open Access Article


by Kaichen Tang and Qi Xu *

College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China

* Author to whom correspondence should be addressed.

Electronics 2026, 15(12), 2620; https://doi.org/10.3390/electronics15122620 (registering DOI)

Submission received: 18 May 2026 / Revised: 8 June 2026 / Accepted: 9 June 2026 / Published: 13 June 2026

(This article belongs to the Section Artificial Intelligence)


Abstract

Affective Image Stylization (AIS) converts an emotional intent into executable artistic visual styles. Existing methods are often limited to discrete emotion-label settings and offer little interpretability of how target emotions are realized. We propose LLaVA-Emo, an interpretable AIS framework built on multimodal Chain-of-Thought (CoT) reasoning. Our method decouples generation into two structured outputs: <reasoning>, which provides visual–affective causal explanations grounded in evidence from the input image, and <style_prompt>, which expresses actionable, renderer-ready style instructions that directly condition a frozen diffusion renderer. We construct a training set by screening the sentiment interpretations in ArtEmis and fine-tune LLaVA-1.5-7B with LoRA. Supervised fine-tuning (SFT) mainly supervises the structured intermediate <reasoning> and the output format, while the executability of <style_prompt> is enforced in a subsequent direct preference optimization (DPO) stage that uses render-and-reward feedback to align candidate outputs with both emotion fidelity and instruction executability. The rendering stage itself remains training-free. Experiments on the EmoEdit inference set demonstrate that LLaVA-Emo improves emotion alignment while providing stronger process interpretability.
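Two mechanisms in the abstract can be made concrete with a short sketch: splitting a model response into its <reasoning> and <style_prompt> fields, and turning render-and-reward scores into a DPO training signal. The Python below is illustrative only and is not the authors' implementation. It assumes XML-style closing tags (`</reasoning>`, `</style_prompt>`). It also assumes that each rendered candidate already has emotion-fidelity and executability scores in [0, 1], so the frozen diffusion renderer and any reward models are not shown. The helper names `combined_reward` and `make_preference_pair`, the equal reward weights, and the highest-versus-lowest pairing rule are hypothetical choices. The loss is the standard DPO objective, −log σ(β[(log π_θ(y_w) − log π_ref(y_w)) − (log π_θ(y_l) − log π_ref(y_l))]), computed from summed sequence log-probabilities.

```python
import math
import re
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

# Assumed output format: each field is wrapped in XML-style open/close tags.
_FIELD_PATTERN = r"<{0}>(.*?)</{0}>"


@dataclass
class StructuredOutput:
    reasoning: str     # visual-affective explanation grounded in the image
    style_prompt: str  # renderer-ready instruction passed to the diffusion model


def parse_output(text: str) -> StructuredOutput:
    """Split a response into its <reasoning> and <style_prompt> fields."""
    fields = {}
    for name in ("reasoning", "style_prompt"):
        match = re.search(_FIELD_PATTERN.format(name), text, flags=re.DOTALL)
        if match is None:
            raise ValueError(f"missing <{name}> block")
        fields[name] = match.group(1).strip()
    return StructuredOutput(**fields)


def combined_reward(emotion_fidelity: float, executability: float,
                    w_emo: float = 0.5, w_exec: float = 0.5) -> float:
    """Hypothetical scalar reward; the weights are illustrative assumptions."""
    return w_emo * emotion_fidelity + w_exec * executability


def make_preference_pair(candidates: Sequence[str],
                         rewards: Sequence[float]) -> Optional[Tuple[str, str]]:
    """Return (chosen, rejected): the highest- and lowest-reward candidates."""
    if len(candidates) != len(rewards) or len(candidates) < 2:
        raise ValueError("need at least two scored candidates")
    best = max(range(len(rewards)), key=lambda i: rewards[i])
    worst = min(range(len(rewards)), key=lambda i: rewards[i])
    if rewards[best] == rewards[worst]:
        return None  # all rewards tie, so there is no preference signal
    return candidates[best], candidates[worst]


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """Standard DPO loss from summed sequence log-probabilities."""
    margin = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    # -log(sigmoid(margin)) == softplus(-margin), written in a numerically stable form
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


if __name__ == "__main__":
    response = ("<reasoning>Cold blues and empty space suggest loneliness.</reasoning>"
                "<style_prompt>muted blue palette, soft fog, sparse composition</style_prompt>")
    print(parse_output(response).style_prompt)
    scored = [combined_reward(0.8, 1.0), combined_reward(0.3, 0.0)]
    print(make_preference_pair(["candidate A", "candidate B"], scored))
    print(dpo_loss(-1.0, -5.0, -3.0, -3.0))
```

Pairing the best and worst candidate is only one simple way to build preference data. The abstract does not state the pairing rule, so other schemes, such as keeping all pairs whose reward gap exceeds a threshold, would fit the same loss.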

Keywords: affective image stylization; interpretable generation; multimodal chain-of-thought; reasoning-to-prompt decoupling; diffusion rendering; direct preference optimization (DPO)

Share and Cite

MDPI and ACS Style

Tang, K.; Xu, Q. LLaVA-Emo: Interpretable Affective Image Stylization via Chain-of-Thought Reasoning. Electronics 2026, 15, 2620. https://doi.org/10.3390/electronics15122620

AMA Style

Tang K, Xu Q. LLaVA-Emo: Interpretable Affective Image Stylization via Chain-of-Thought Reasoning. Electronics. 2026; 15(12):2620. https://doi.org/10.3390/electronics15122620

Chicago/Turabian Style

Tang, Kaichen, and Qi Xu. 2026. "LLaVA-Emo: Interpretable Affective Image Stylization via Chain-of-Thought Reasoning" Electronics 15, no. 12: 2620. https://doi.org/10.3390/electronics15122620

APA Style

Tang, K., & Xu, Q. (2026). LLaVA-Emo: Interpretable Affective Image Stylization via Chain-of-Thought Reasoning. Electronics, 15(12), 2620. https://doi.org/10.3390/electronics15122620

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
