Article

LLaVA-Emo: Interpretable Affective Image Stylization via Chain-of-Thought Reasoning

College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
* Author to whom correspondence should be addressed.
Electronics 2026, 15(12), 2620; https://doi.org/10.3390/electronics15122620 (registering DOI)
Submission received: 18 May 2026 / Revised: 8 June 2026 / Accepted: 9 June 2026 / Published: 13 June 2026
(This article belongs to the Section Artificial Intelligence)

Abstract

Affective Image Stylization (AIS) converts an emotional intent into an executable artistic visual style. Existing methods are often restricted to discrete emotion labels and offer limited interpretability of how a target emotion is realized. We propose LLaVA-Emo, an interpretable AIS framework built on multimodal Chain-of-Thought (CoT) reasoning. Our method decouples generation into two structured outputs: <reasoning> provides visual–affective causal explanations grounded in evidence from the input image, and <style_prompt> expresses actionable, renderer-ready style instructions that directly condition a frozen diffusion renderer. We construct a training set by screening the sentiment interpretations in ArtEmis and fine-tune LLaVA-1.5-7B with LoRA. Supervised fine-tuning (SFT) mainly supervises the structured intermediate <reasoning> and the output format, whereas the executability of <style_prompt> is enforced in a subsequent Direct Preference Optimization (DPO) stage via render-and-reward feedback, which aligns candidate outputs with both emotion fidelity and instruction executability. The rendering stage itself remains training-free. Experiments on the EmoEdit inference set demonstrate that LLaVA-Emo improves emotion alignment while providing stronger process interpretability.
Keywords: affective image stylization; interpretable generation; multimodal chain-of-thought; reasoning-to-prompt decoupling; diffusion rendering; direct preference optimization (DPO)
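
To make the reasoning-to-prompt decoupling concrete, the following minimal Python sketch separates a model response into its two structured outputs. It is an illustration under stated assumptions, not the authors' implementation: the abstract names only the <reasoning> and <style_prompt> tags, so the closing tags, the `parse_response` helper, the `StructuredOutput` type, and the sample response text are all hypothetical.

```python
import re
from typing import NamedTuple, Optional


class StructuredOutput(NamedTuple):
    """The two decoupled fields named in the abstract."""
    reasoning: str      # visual-affective explanation, kept for interpretability
    style_prompt: str   # renderer-ready instruction for the frozen diffusion renderer


def _extract(tag: str, text: str) -> Optional[str]:
    # Assumption: each field is wrapped as <tag>...</tag>; the abstract does not
    # specify closing tags, so this layout is hypothetical.
    match = re.search(r"<{0}>(.*?)</{0}>".format(re.escape(tag)), text, flags=re.DOTALL)
    return match.group(1).strip() if match else None


def parse_response(text: str) -> StructuredOutput:
    """Split a raw response into its reasoning and style prompt, or raise ValueError."""
    reasoning = _extract("reasoning", text)
    style_prompt = _extract("style_prompt", text)
    if reasoning is None or style_prompt is None:
        raise ValueError("response is missing a <reasoning> or <style_prompt> block")
    return StructuredOutput(reasoning, style_prompt)


if __name__ == "__main__":
    # Hypothetical response text, written only to exercise the parser.
    sample = (
        "<reasoning>The low sun and empty shoreline suggest quiet contentment.</reasoning>\n"
        "<style_prompt>soft pastel watercolor, warm diffuse light</style_prompt>"
    )
    parsed = parse_response(sample)
    print(parsed.reasoning)
    print(parsed.style_prompt)
```

In a pipeline of this shape, only `style_prompt` would be passed to the renderer, while `reasoning` would be retained as the human-readable explanation; rejecting malformed responses early is one simple way to keep format errors from reaching the rendering stage.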

Share and Cite

MDPI and ACS Style

Tang, K.; Xu, Q. LLaVA-Emo: Interpretable Affective Image Stylization via Chain-of-Thought Reasoning. Electronics 2026, 15, 2620. https://doi.org/10.3390/electronics15122620

