Article

Emotion-Enhanced Dual-Agent Recommendation: Understanding and Leveraging Cognitive Conflicts for Better Personalization

1 Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
2 School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2026, 16(1), 253; https://doi.org/10.3390/app16010253
Submission received: 23 November 2025 / Revised: 17 December 2025 / Accepted: 18 December 2025 / Published: 26 December 2025
(This article belongs to the Topic Agents and Multi-Agent Systems)

Abstract

Traditional recommendation systems are largely built upon the “rational-agent” assumption, representing user preferences as static numerical vectors while neglecting the pivotal role of emotions in decision-making. However, according to the dual-system theory in cognitive psychology, human decisions are jointly governed by two interacting subsystems: a rational system responsible for deliberate reasoning and an affective system driven by emotion and intuition. Conflicts between these two systems often lead to inconsistencies between users’ preferences and emotional experiences in real-world recommendation scenarios. To address this challenge, we propose an Emotion-Enhanced Dual-Agent Collaborative Framework (EDACF) that explicitly models and leverages cognitive conflicts between users’ emotional experiences and rational preferences. EDACF introduces user and item agents equipped with separate natural language memories for preference, emotion, and conflict representations, enabling cognitive-level reasoning beyond static numerical modeling. The framework features three key innovations: (1) a conflict detection mechanism that identifies users’ cognitive inconsistency states; (2) a dual-memory update strategy that maintains preference stability while capturing emotional dynamics; and (3) an adaptive reasoning mechanism that adjusts decision weights based on detected conflicts. Extensive experiments demonstrate that EDACF outperforms state-of-the-art baselines by 9.9% in NDCG@10 and 13.1% in MRR@10, with improvements exceeding 32% among user groups with high conflict. These results highlight a paradigm shift in recommendation systems from behavior prediction toward cognitive-level understanding of user decision processes.

1. Introduction

In today’s era of information overload, recommendation systems serve as a crucial bridge connecting users with vast amounts of digital content. However, most existing systems are built upon the assumption of rational decision-making: they model user preferences as static numerical vectors, typically learned through ID embeddings or explicit feature matching. While this approach has achieved remarkable success in industry, it overlooks a critical reality—user decisions are not purely rational but rather the result of intertwined rational and emotional processes. Since the learned embeddings lack semantic information, they struggle to capture the fine-grained emotional nuances embedded in textual content [1,2]. This neglect of users’ affective characteristics prevents current systems from accurately modeling complex behavioral patterns, thus limiting user satisfaction [3,4].
Our analysis of the Amazon Movies & TV dataset confirms the prevalence of this phenomenon: a substantial cohort of users exhibits notable emotion–behavior conflicts. Two typical conflict patterns are observed.
  • “Cognitive Dissonance” Pattern: This conflict refers to the divergence between users’ ratings and the emotions expressed in their reviews. For example, a user gives a movie a rating of two (out of five), but writes in the review, “I did enjoy the film. The visuals were beautiful as were all the costumes. The actors themselves were fantastic,” reflecting an internal contradiction between rating behavior and emotional experience.
  • “Hate-Watching” Pattern: This conflict refers to users continuing to consume similar content after giving low ratings and negative reviews. For example, a user rates Transformers: Age of Extinction with one star and comments “terrible plot,” but still watches Transformers: The Last Knight and The Fate of the Furious months later.
These findings demonstrate that user behaviors are not driven by purely rational preferences but rather emerge from the joint influence of emotional and rational decision processes. Figure 1 illustrates this paradigm shift from the traditional rational assumption to the cognitive reality we observed.
The dual-system theory in cognitive psychology [5] provides a theoretical framework for modeling the joint influence of emotion and reason in human decision-making. Specifically, System 1 operates intuitively and rapidly under emotional drives, while System 2 engages in slow, deliberate, and analytical reasoning. Studies suggest that a substantial share of human decisions is guided by affective heuristics of System 1 [6]. Because emotions and rationality often conflict during decision-making, human behaviors frequently exhibit complex or even contradictory patterns.
Despite recent advances in semantic understanding and interactive intelligence brought by large models and agent-based systems, most existing recommendation systems remain confined to modeling System 2 and struggle to respond effectively when users’ rational and emotional states diverge [7]. The limitations of current research are mainly reflected in the following aspects:
  • Focusing solely on rational preference modeling: Mainstream methods [8,9,10] still primarily emphasize users’ rational preferences, treating users as mere preference matchers while neglecting the affective dimension of how users feel. For instance, InteRecAgent [8] only summarizes users’ preferences, and RecMind [9] does not maintain emotional states.
  • Overly coarse-grained emotion modeling: Although a few recent studies (e.g., Agent4Rec [11]) have begun to incorporate emotions, they typically rely on simplistic binary or categorical labels such as “satisfied/unsatisfied” or “fatigued/energetic”, ignoring the rich semantics and dynamic evolution of emotions. For instance, “shocking but slightly heavy” versus “mind-bending and exciting”, though both are positive emotions, have subtle differences that are crucial for recommendation decisions.
  • Lack of cognitive-level memory separation: Emotional memory differs fundamentally from preference memory, possessing unique characteristics such as semantic richness, temporal sensitivity, and interactive complexity. However, existing systems either lack an emotional memory module altogether (e.g., InteRecAgent, RecMind) or entangle it with preference memory. Although AgentCF [12] introduces separate memories for user and item agents, this separation operates along the user-item dimension rather than the cognitive dimension—it does not distinguish between users’ rational preferences (System 2) and emotional responses (System 1), nor does it detect or leverage conflicts between them.
The neglect or oversimplification of users’ emotional dimensions in existing approaches prevents current systems from understanding the complete cognitive process of users and hinders their ability to make reasonable decisions in conflict scenarios.
To address this, we propose the Emotion-Enhanced Dual-Agent Collaborative Framework (EDACF), aiming to reconstruct the cognitive modeling approach of recommendation systems. Unlike previous systems that avoid or ignore emotional conflicts, our framework views them as key signals for understanding users’ deeper needs. Its core innovation is reflected in three aspects: First, we design a dual-memory architecture that independently maintains emotion memory and preference memory within the user agent, where emotion memory preserves fine-grained emotional responses in natural language form, while preference memory encodes long-term stable interests and tastes, thus achieving the first explicit separation modeling of rationality and emotion. Second, we propose a conflict-aware mechanism capable of automatically detecting emotion–rational conflicts, identifying complex behavioral patterns, and dynamically adjusting reasoning and recommendation strategies according to the type and intensity of the detected conflict. Finally, we introduce co-evolutionary learning through continuous interaction between user agents and item agents, enabling both to continuously adapt and optimize in the process of understanding and responding to users’ complex cognition, thereby achieving higher-level personalized recommendations.
Extensive experiments on two real-world datasets validate the effectiveness of our method. Results show that compared to the state-of-the-art baseline method AgentCF, our approach achieves 9.9% improvement in NDCG@10 and 13.1% improvement in MRR@10. Particularly for high-cognitive-conflict user groups, the performance improvement is even more significant (over 32%), confirming the value of leveraging conflict signals.
The main contributions of this paper are as follows:
  • From a cognitive science perspective, we deeply investigate the emotion–rationality conflict phenomenon in recommendation systems, identifying and quantifying the existence and distribution characteristics of emotion–preference conflicts in large-scale datasets.
  • We propose the Emotion-Enhanced Dual-Agent Collaborative Framework, which independently models users’ emotion memory and preference memory through a separate memory architecture and designs a conflict detection mechanism to identify and leverage cognitive conflict signals.
  • We develop adaptive reasoning strategies that can automatically select appropriate reasoning approaches based on detected cognitive conflict states, providing personalized recommendations for users with different cognitive states.
  • Extensive experiments validate the effectiveness of our method, significantly outperforming existing state-of-the-art baseline methods across multiple datasets and evaluation metrics, particularly excelling in handling high-cognitive-conflict users.

2. Related Work

2.1. Traditional Recommendation Methods

Traditional recommendation systems predict user ratings or optimize recommendation rankings by analyzing user-item interaction data. Early methods such as collaborative filtering [13] make recommendations based on similarity computation, while matrix factorization techniques [14,15] further model user preferences and item features by learning latent factors. Deep learning methods learn more complex interaction patterns through neural networks, such as Wide&Deep [16], neural collaborative filtering [17], and DeepFM [18], which leverage neural networks to model feature interactions. Graph neural network-based methods [19,20] further exploit the structural information of user–item interaction graphs. Sequential recommendation methods such as SASRec [21] capture temporal dependencies in user behavior sequences through self-attention mechanisms. These methods have achieved significant success in industry, with their core modeling paradigm being to represent users and items as numerical embedding vectors and optimize model parameters based on explicit feedback such as ratings or clicks.
However, this modeling paradigm still suffers from several inherent limitations. First, numerical embeddings lack semantic expressiveness, making it difficult to capture the fine-grained emotions conveyed in user reviews. Second, optimization based solely on ratings or click behaviors implicitly assumes that these signals fully represent user preferences while overlooking the affective factors and cognitive complexity involved in real decision-making. Finally, representing users through static embeddings limits the model’s ability to account for contradictions and inconsistencies that often occur in user behavior.

2.2. LLM-Based Recommendation Systems

The application of large language models in recommendation systems has made significant progress in recent years, primarily developing along two technical approaches.
One approach uses LLMs as feature enhancement tools to improve the semantic understanding capabilities of traditional recommendation models. Lin et al. [22] proposed a transitional paradigm that connects item ID embeddings with language features, leveraging the semantic understanding capabilities of LLMs to enhance collaborative filtering models. Xi et al. [23] utilized the knowledge enhancement capabilities of LLMs to address open-world recommendation problems, expanding the candidate item space through external knowledge bases. These methods treat LLMs as auxiliary modules, with core recommendation decisions still made or heavily guided by traditional models.
Another approach directly treats LLMs themselves as recommender models, either in a zero-shot or task-adapted manner. He et al. [24] use LLMs as zero-shot conversational recommenders that directly generate item suggestions from natural-language dialogues, while Hou et al. [25] show that LLMs can act as zero-shot rankers over retrieved candidate sets. To further align LLMs with recommendation tasks, recent work fine-tunes LLM-based recommenders on interaction data, such as TALLRec [26], CoLLM [27], Flower [28], and SPRec [29], which respectively focus on efficient task alignment, integrating collaborative embeddings, introducing flow-guided process supervision, and debiasing via self-play. However, these methods mostly employ LLMs as auxiliary modules for feature enhancement or static reasoning; lacking mechanisms for persistent memory and self-reflection, they cannot support continuous evolution through memory updates.

2.3. Agent-Based Recommendation Systems

Agent-based recommendation systems introduce memory, reflection, and collaboration capabilities into recommendations.
On the memory side, agent-based systems mainly focus on recording long-term user states, system-level interaction traces, and richer structured memories. RecAgent [30,31] builds LLM-based user agents with profile, memory, and action modules in a sandbox environment, where multi-level memories (e.g., sensory, short-term, long-term) log detailed interaction histories for realistic user behavior simulation. RecMind [9] similarly maintains internal memories of visited states and tool calls to support multi-step planning. InteRecAgent [8] introduces system-level memory components and a memory-initialization tool so that user profiles and historical behaviors can be written into and retrieved from agent memory during interactive recommendation. Agent4Rec [11] augments user memory with an emotional component that logs affective states such as satisfaction and fatigue alongside factual interaction records, whereas AgentCF [12] assigns distinct natural-language memory modules to both user and item agents to store their simulated preferences, characteristics, and accumulated interaction experiences. Building on this line, AgentCF++ [32] enhances agent memories with popularity-aware, cross-domain signals and interaction-time updates, facilitating preference transfer across domains while mitigating popularity bias.
On the reflection side, agents inspect intermediate results and past trajectories to revise plans, retrieval, or policies before the next step. InteRecAgent makes reflection an explicit stage for controller self-checks over tool outputs and memory usage [8], while loop-style designs (e.g., RAH’s learn–act–critique–reflect) provide iterative critique and alignment [33]. Trajectory-based self-review appears in RecMind and RecAgent, which revisit explored states and long-term logs to generalize across sessions [9,30,31]. Feedback-driven variants update memories or policies in response to outcomes: AFL closes the loop with user–agent feedback [34], and Agent4Rec and AgentCF adjust factual or emotional stores when expectations diverge from observations [11,12].
On the collaboration side, agent-based recommenders coordinate multiple agents to share information and make joint decisions. MACRec [10] organizes specialized agents (e.g., Manager, Analyst, Searcher, Reflector) into a workflow that exchanges intermediate artifacts and negotiates recommendations across tasks. AgentCF [12] models users and items as autonomous agents that interact and co-adapt their memories and policies via collaborative reflection, propagating preference signals along user–user, item–item, and user–item links. The AFL framework [34] forms a closed loop between a recommendation agent and a user agent, where policy updates are driven by user–agent feedback and the user agent improves its simulation grounded on the recommender’s outputs, yielding iterative co-evolution.
However, although these studies incorporate memory and reflection mechanisms, they remain limited in affective modeling from two critical aspects. First, emotion representation granularity: Agent4Rec’s emotional component relies on coarse-grained categorical labels (e.g., “satisfied/fatigued”) that cannot preserve the semantic richness of natural language emotional expressions such as “shocking but slightly heavy” versus “mind-bending and exciting.” Second, memory separation dimension: AgentCF’s dual-memory architecture separates user and item agent representations, but this is a user–item level separation—within the user agent, rational preferences and emotional responses remain implicitly merged in a single memory, making it impossible to detect when these two cognitive subsystems conflict. More critically, none of these methods explicitly identify or leverage cognitive conflicts (e.g., cognitive dissonance, hate-watching) as signals for adaptive reasoning. Consequently, existing approaches are unable to fully model users’ complex cognitive decision-making processes, particularly the interplay and conflicts between affective (System 1) and rational (System 2) systems.

2.4. Affective Recommender Systems

The dual-system theory in cognitive psychology [5] reveals the coexistence mechanism of the fast affective system (System 1) and the slow rational system (System 2) in human decision-making, while cognitive dissonance theory [35] explains the psychological discomfort that arises when individuals hold contradictory beliefs or behaviors and their adjustment processes. These theories provide a theoretical framework for understanding the affective–rational interaction in user decision-making.
Affective Recommender Systems integrate affective factors into the recommendation process to better align with users’ emotional states. In a recent survey, Hasan and Bunescu [3] point out that affective factors encompass multiple dimensions including attitudes, emotions, and moods, and these factors play an important role in user decision-making. By modeling these affective factors, deep learning-based methods have enhanced affective representation capabilities. Hyun et al. [36] proposed SentiRec, which uses CNNs to encode reviews into fixed-dimensional vectors containing affective information for user and item representations. Shi et al. [37] proposed SENGR, which learns affective-enhanced user–item representations using graph convolutional networks and affective auxiliary tasks, and designs hierarchical attention mechanisms to select important reviews and capture aspect-level sentiments. Recent methods based on pre-trained language models have further improved affective representation learning. Zhang et al. [38] use BERT in SIFN to encode reviews and design a sentiment learner to extract affective features, guiding the model to learn sentiment-aware representations through sentiment prediction tasks. Revathy et al. [39] employ BERT for sentiment classification of music lyrics, using transfer learning to identify affective categories such as happy, angry, relaxed, and sad. Darraz et al. [40] integrate BERT with sentiment analysis into hybrid recommendation systems.
Existing affective recommendation systems learn numerical representations or embedding vectors of emotions, incorporating affect as additional features into recommendation models [4,41]. These methods have achieved significant results in improving recommendation accuracy. However, these methods model emotions and preferences together as unified representations, lacking explicit separation of emotion memory and preference memory; meanwhile, affective features remain static during the modeling process, unable to capture the dynamic conflicts between emotion and rationality.

3. Methodology

3.1. Problem Reformulation and Framework Overview

Traditional recommendation systems are based on the “static preference matching” assumption, ignoring affective dynamics and cognitive conflicts. Their core model can be defined as follows.
Definition 1 (Traditional Recommendation Paradigm).
Given a user set $U$ and an item set $I$, traditional recommendation systems model the interaction score $s$ between user $u \in U$ and item $i \in I$ as a matching function of static embedding vectors:

$$s(u, i) = f(\mathbf{u}, \mathbf{i}), \tag{1}$$

where $\mathbf{u} \in \mathbb{R}^d$ and $\mathbf{i} \in \mathbb{R}^d$ represent the fixed-dimensional embedding vectors of the user and item, respectively, and $f(\cdot)$ is the matching function (e.g., inner product or neural network). The training objective is to optimize the model parameters so that $s(u, i)$ approximates the true user preference signal.
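To make the static nature of this paradigm concrete, here is a minimal sketch (ours, not from the paper) of an inner-product matching function $f$ over fixed embeddings; the vectors and values are purely illustrative:

```python
def match_score(u_emb, i_emb):
    """f(u, i): inner product of static user and item embedding vectors.

    The score depends only on the fixed embeddings -- there is no notion
    of time, emotional context, or cognitive conflict in this paradigm.
    """
    return sum(a * b for a, b in zip(u_emb, i_emb))

# Illustrative 3-dimensional embeddings.
u = [0.5, 1.0, -0.2]
i = [1.0, 0.0, 0.5]
score = match_score(u, i)  # 0.5 + 0.0 - 0.1 = 0.4
```

Whatever the user writes in a review, this score never changes unless the embeddings are retrained, which is precisely the limitation the reformulation below addresses.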
Inspired by the dual-system theory in cognitive psychology, we reformulate the recommendation task as a “dynamic cognitive collaborative reasoning” problem that integrates temporal dynamics, affective context, and cognitive conflict mechanisms.
To realize this paradigm, we propose the Emotion-Enhanced Dual-Agent Collaborative Framework (EDACF), which models users and items as interactive cognitive agents equipped with distinct memory systems.
Definition 2 (Emotion-Enhanced Dual-Agent Collaborative Recommendation Paradigm).
At time $t$, the recommendation score $s$ for user $u$ on item $i$ is derived through collaborative reasoning between the user agent and item agent:

$$s(u, i, t) = \Phi_{\mathrm{LLM}}\big(M_u(t),\, M_i(t)\big), \tag{2}$$

where $\Phi_{\mathrm{LLM}}(\cdot)$ denotes the collaborative reasoning function implemented via large language models, which integrates and aligns the memory states of both agents through prompting-based cognitive reasoning.
The user agent maintains a triple-memory structure:

$$M_u(t) = \big(M_{\mathrm{pref}}(t),\, M_{\mathrm{emo}}(t),\, M_{\mathrm{conf}}(t)\big), \tag{3}$$

corresponding to preference memory (System 2: rational system), emotion memory (System 1: affective system), and conflict memory (meta-cognitive level), respectively.
The item agent maintains a dual-memory structure:

$$M_i(t) = \big(A_{\mathrm{core}}^{i}(t),\, A_{\mathrm{feedback}}^{i}(t)\big), \tag{4}$$

representing core attributes (static metadata) and feedback memory (dynamically learned affective-triggering features).
This dual-agent memory design allows the system to explicitly model the interaction between affective and rational cognition, detect inconsistencies between them, and record such conflicts through the conflict memory component, forming the foundation for cognitive reasoning and adaptation.
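As a rough illustration of these memory structures (field names are our own, not the paper’s), the two agents’ states could be laid out as:

```python
from dataclasses import dataclass, field

@dataclass
class UserMemory:
    """Triple-memory structure M_u(t): preference (System 2),
    emotion (System 1), and conflict (meta-cognitive) components."""
    preference: str = ""                           # M_pref: natural-language summary
    emotion: list = field(default_factory=list)    # M_emo: last-k interaction records
    conflicts: list = field(default_factory=list)  # M_conf: detected conflict history

@dataclass
class ItemMemory:
    """Dual-memory structure M_i(t): static core attributes plus
    dynamically learned affective-triggering features."""
    core_attributes: str = ""  # A_core: aggregated metadata description
    feedback: str = ""         # A_feedback: summary of users' emotional responses

# At t = 0, the emotion, conflict, and feedback memories start empty.
u_mem = UserMemory(preference="Enjoys character-driven science fiction.")
i_mem = ItemMemory(core_attributes="Sci-fi thriller; ensemble cast; 2014 release.")
```

Storing each component as natural language (rather than a fixed vector) is what lets the LLM reason over, and compare, the rational and affective records.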
Figure 2 illustrates the overall architecture of the EDACF framework. At the top, the user agent (left) and item agent (right) maintain their respective memory structures. The core of the framework is a cyclic four-phase learning loop (Decision→Feedback→Update→Conflict), which iteratively refines the memory states of both agents. At the bottom, a multi-strategy reasoning layer dynamically selects appropriate reasoning strategies based on the current memory states. The detailed design of each component is elaborated in the following subsections.

3.2. Emotion-Enhanced Dual-Agent Collaborative Framework (EDACF)

3.2.1. Dual-Agent Structure

User Agent
The user agent simulates the user’s cognitive process and maintains three independent memory components.
  • Preference Memory $M_{\mathrm{pref}}(t)$ (System 2): Inductively summarizes the user’s rational preference characteristics, stored in natural language form:

    $$M_{\mathrm{pref}}(t) = \mathrm{LLM}_{\mathrm{summarize}}\big(\{(\mathrm{meta}_j,\, r_{uj})\}_{j \in I_u}\big), \tag{5}$$

    where $I_u$ represents the set of items with which user $u$ has historically interacted, $\mathrm{meta}_j$ denotes the metadata of item $j$ (including structured attributes such as title, category, actors, and director, and unstructured text such as plot synopsis), $r_{uj}$ represents user $u$’s rating, and $\mathrm{LLM}_{\mathrm{summarize}}(\cdot)$ is the inductive summarization function based on large language models, which extracts preference patterns from item metadata and user ratings and condenses them into natural language preference descriptions.
    In the initial stage ($t = 0$), the system extracts item metadata from the user’s historical interactions and inductively summarizes it to generate initial preference descriptions; in the subsequent collaborative learning process, this memory adopts a conservative update strategy, maintaining the user’s stable interests and tastes through gradual adjustments.
  • Emotion Memory $M_{\mathrm{emo}}(t)$ (System 1): Captures the user’s dynamic emotional experiences, retaining records of emotional responses from the most recent $k$ interactions. Each record contains item information, category, rating, review, and timestamp:

    $$M_{\mathrm{emo}}(t) = \{(\mathrm{item}_j,\, \mathrm{category}_j,\, r_{uj},\, \mathrm{review}_{uj},\, t_j)\}_{j \in R(u,t,k)}, \tag{6}$$

    where $R(u,t,k)$ represents the most recent $k$ interactions of user $u$ before time $t$, $\mathrm{item}_j$ and $\mathrm{category}_j$ denote the item name and category, $r_{uj}$ and $\mathrm{review}_{uj}$ denote the user’s rating and review text, and $t_j$ is the interaction timestamp. Unlike the inductive summarization in preference memory, emotion memory retains specific interaction details, especially the rich emotional expressions contained in reviews, thereby supporting fine-grained emotional pattern recognition and temporal behavior analysis.
    This memory is initially empty ($M_{\mathrm{emo}}(0) = \emptyset$) and is dynamically constructed during collaborative learning: after each interaction, a new emotional experience record is appended to the sequence, and when the sequence length exceeds $k$, the earliest record is removed to maintain a fixed window.
  • Conflict Memory $M_{\mathrm{conf}}(t)$: Tracks the user’s emotion–preference conflict history, providing a conflict-pattern basis for reasoning:

    $$M_{\mathrm{conf}}(t) = \{C(u, \tau)\}_{\tau < t}, \tag{7}$$

    where $C(u, \tau)$ represents the conflict detection result at historical time $\tau$ that arose between preference memory and emotion memory,

    $$C(u, t) = \mathrm{LLM}_{\mathrm{detect}}\big(M_{\mathrm{pref}}(t),\, M_{\mathrm{emo}}(t)\big), \tag{8}$$

    where $\mathrm{LLM}_{\mathrm{detect}}(\cdot)$ uses large language models to determine whether there are significant inconsistencies.
    Since emotion memory retains detailed information such as item names, categories, ratings, and reviews, the detection process can not only determine whether conflicts exist but also identify specific conflict types. Based on the temporal manifestation of conflicts, we distinguish two types of conflict:
    Cognitive Dissonance (Instantaneous Conflict): Occurs within a single interaction when the rating and review sentiment are inconsistent. For example, a user rates a movie 2 but writes “I did enjoy the film…”. This conflict can be detected from a single interaction without historical data.
    Hate-Watching (Sustained Conflict): Occurs over a sequence of interactions when the user repeatedly gives low ratings and negative reviews for items in a certain series or category but continues consuming them. Detection of this conflict requires analyzing both preference and emotion memory over time.
    Each conflict detection result $C(u, t)$ contains three key elements: has_conflict $\in \{\mathrm{true}, \mathrm{false}\}$, type $\in \{\mathrm{dissonance}, \mathrm{hate\text{-}watch}, \mathrm{none}\}$, and description. Here, has_conflict indicates whether a conflict exists, type specifies the conflict category, and description provides a natural language summary. Conflict memory is initially empty, $M_{\mathrm{conf}}(0) = \emptyset$. During each collaborative learning round, detected conflicts are appended to the history, gradually accumulating the user’s cognitive conflict patterns.
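The structure of a detection result can be sketched as follows. Note that the paper implements the detector via a large language model; the keyword rule below is only a toy stand-in we use so the instantaneous (dissonance) case is concrete:

```python
from dataclasses import dataclass

@dataclass
class ConflictResult:
    """The three key elements of a conflict detection result C(u, t)."""
    has_conflict: bool
    type: str         # "dissonance", "hate-watch", or "none"
    description: str

# Toy positive-sentiment lexicon (illustrative only; the paper uses an LLM).
POSITIVE_WORDS = {"enjoy", "fantastic", "beautiful", "loved", "great"}

def detect_dissonance(rating, review):
    """Stand-in for LLM_detect on a single interaction: flags cognitive
    dissonance when a low rating co-occurs with positive review wording."""
    positive = any(w in review.lower() for w in POSITIVE_WORDS)
    if rating <= 2 and positive:
        return ConflictResult(True, "dissonance",
                              f"Low rating ({rating}) but positive review sentiment.")
    return ConflictResult(False, "none", "Rating and review sentiment agree.")

r = detect_dissonance(2, "I did enjoy the film. The visuals were beautiful.")
# r.has_conflict is True; r.type == "dissonance"
```

Detecting the sustained hate-watching pattern would additionally require the interaction sequence held in emotion memory, since it is defined over repeated low-rated consumption rather than a single record.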
Item Agent
Memory of the item agent includes core attribute memory and feedback memory.
  • Core Attributes $A_{\mathrm{core}}^{i}(t)$: Aggregates the item’s metadata, stored in natural language form:

    $$A_{\mathrm{core}}^{i}(t) = \mathrm{LLM}_{\mathrm{aggregate}}(\mathrm{meta}_i), \tag{9}$$

    where $\mathrm{meta}_i$ represents the complete metadata of item $i$, including structured attributes (title, category, actors, director, release year, etc.) and unstructured text (plot synopsis, content description, etc.). $\mathrm{LLM}_{\mathrm{aggregate}}(\cdot)$ is the aggregation function based on large language models, which integrates this heterogeneous information into a unified natural language representation. This memory remains relatively stable throughout the learning process.
  • Feedback Memory $A_{\mathrm{feedback}}^{i}(t)$: Summarizes users’ emotional responses to the item, modeling the item’s affective triggering features:

    $$A_{\mathrm{feedback}}^{i}(t) = \mathrm{LLM}_{\mathrm{summarize}}\big(\{(\mathrm{review}_{ui},\, r_{ui})\}_{u \in U_i(t)}\big), \tag{10}$$

    where $U_i(t)$ represents the set of users who have interacted with item $i$ by time $t$, and $\mathrm{review}_{ui}$ and $r_{ui}$ denote user $u$’s review and rating for item $i$, respectively. $\mathrm{LLM}_{\mathrm{summarize}}(\cdot)$ extracts the item’s affective triggering features from user feedback. This memory is initially empty ($A_{\mathrm{feedback}}^{i}(0) = \emptyset$) and is dynamically constructed during collaborative learning.

3.2.2. Collaborative Learning Mechanism

Collaborative learning achieves dynamic optimization of $M_u(t)$ and $M_i(t)$ through a four-phase loop: “Cognitive Decision→Feedback Generation→Memory Update→Conflict Detection.”
Phase 1: Cognitive Decision
Based on its current memory state, the user agent makes a selection decision between a given pair of candidate items. The candidate pair consists of an item $i_{\mathrm{pos}}$ with which the user has actually interacted and a randomly selected item $i_{\mathrm{neg}}$ with which the user has not interacted. The decision process is defined as:

$$d(t) = \mathrm{LLM}_{\mathrm{decide}}\big(M_u(t),\, M_{i_{\mathrm{pos}}}(t),\, M_{i_{\mathrm{neg}}}(t)\big), \tag{11}$$

where $M_{i_k}(t) = \big(A_{\mathrm{core}}^{i_k}(t),\, A_{\mathrm{feedback}}^{i_k}(t)\big)$ represents the agent memory of item $i_k$. $\mathrm{LLM}_{\mathrm{decide}}(\cdot)$ makes decisions based on large language models, with the output being the selected item $d(t) \in \{i_{\mathrm{pos}}, i_{\mathrm{neg}}\}$ and the decision rationale. During the decision process, the LLM comprehensively evaluates the matching degree between preference memory and item core attributes, as well as the consistency between emotion memory and item affective triggering features, providing the final selection and rationale through semantic reasoning.
Phase 2: Feedback Generation
The system compares the agent’s choice with the true user preference and generates corresponding learning feedback. A correct choice ($d(t) = i_{\mathrm{pos}}$) produces a positive feedback signal, validating the effectiveness of the current memory state; an incorrect choice ($d(t) = i_{\mathrm{neg}}$) triggers an error correction mechanism, indicating that the memory state needs adjustment. The feedback signal is defined as

$$r(t) = \begin{cases} +1, & d(t) = i_{\mathrm{pos}} \\ -1, & d(t) = i_{\mathrm{neg}} \end{cases} \tag{12}$$
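Phases 1 and 2 can be sketched as a single round in which the decision function is pluggable, standing in for the paper’s prompt-based decider; all names here are illustrative:

```python
def feedback_signal(choice, i_pos):
    """Phase 2: r(t) = +1 if the agent chose the truly interacted item, else -1."""
    return 1 if choice == i_pos else -1

def run_round(decide, user_mem, mem_pos, mem_neg, i_pos, i_neg):
    """One Cognitive Decision -> Feedback Generation step.

    `decide` stands in for LLM_decide, which in the paper reasons over the
    user memory and both candidate item memories via prompting.
    """
    choice = decide(user_mem, mem_pos, mem_neg, i_pos, i_neg)
    return choice, feedback_signal(choice, i_pos)

# Illustrative deterministic decider that always picks the first candidate.
def naive_decide(user_mem, mem_pos, mem_neg, i_pos, i_neg):
    return i_pos

choice, r = run_round(naive_decide, {}, {}, {}, "item_A", "item_B")
# choice == "item_A" and r == 1 (a correct choice yields positive feedback)
```

In the framework itself, the returned signal then gates Phase 3: a negative $r(t)$ triggers the corrective memory adjustments described next.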
Phase 3: Memory Update
Based on the feedback signal $r(t)$, both the user agent and item agent simultaneously update their respective memory components.
User Agent Memory Update: Since preference memory and emotion memory correspond to different cognitive systems, their update strategies differ.
Preference memory M pref adopts a conservative update strategy. The system integrates historical preference memory, current feedback signals, and the item’s core attributes. When the feedback signal is negative (decision error), only local adjustments are made to the specific preference dimensions that led to the error, avoiding excessive perturbation of core preference patterns. For example, if a user prefers “serious science fiction” but rejects a particular work, the system might refine the preference to “likes science fiction with clear plot progression” rather than completely negating the science fiction preference.
Emotion memory $M_{\text{emo}}$ extracts emotion-related content from the user's review text for the item in the current interaction round, with each record represented as $(\mathrm{item}_j, \mathrm{category}_j, r_{u,j}, \mathrm{review}_{u,j}, t_j)$, containing the item information, category, rating, review, and timestamp. Each record is then appended to the emotional history sequence. When the sequence length exceeds k, the earliest record is removed, forming an incremental characterization of the user's emotional patterns.
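The sliding-window update above maps naturally onto a bounded queue. The following is a minimal sketch, assuming records have already been extracted from reviews; the class and field names are illustrative, not the paper's code.

```python
from collections import deque
from typing import NamedTuple

class EmotionRecord(NamedTuple):
    item: str
    category: str
    rating: float
    review: str
    timestamp: int

class EmotionMemory:
    """Sliding-window emotion memory M_emo: keeps the k most recent records."""

    def __init__(self, k: int = 10):
        # deque with maxlen automatically evicts the earliest record
        # once the window size k is exceeded.
        self.records = deque(maxlen=k)

    def update(self, record: EmotionRecord) -> None:
        self.records.append(record)

    def as_text(self) -> str:
        """Serialize the window for inclusion in an LLM prompt."""
        return "\n".join(
            f"[{r.timestamp}] {r.item} ({r.category}), rating {r.rating}: {r.review}"
            for r in self.records
        )
```

Using `deque(maxlen=k)` makes the "remove the earliest record" rule a property of the data structure rather than explicit bookkeeping.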
Item Agent Memory Update: The item’s core attributes A core i remain relatively stable, with updates mainly reflected in adjusting the description method to better match user understanding without deviating from the item’s objective information; the item’s feedback memory A feedback i extracts affective triggering features from user feedback, dynamically updating the understanding of the item’s emotional impact.
Phase 4: Conflict Detection
After memory update, the system performs emotion–preference conflict detection to identify whether the user exhibits cognitive inconsistencies based on Equation (8). The detection result C ( u , t + 1 ) is then appended to the user’s conflict memory M conf , gradually accumulating the user’s cognitive conflict patterns. Based on the detection results, the system identifies the user’s conflict type (no conflict, cognitive dissonance, hate-watching), providing a basis for the subsequent reasoning stage. The conflict type and natural language description in the detection results will be used to dynamically select reasoning strategies in the reasoning stage (see Section 3.2.3).

3.2.3. Multi-Strategy Reasoning

After completing dual-agent collaborative learning, the system needs to generate ranking results for recommendation requests. Based on the emotion–preference separation modeling capability built in previous subsections, we design three different reasoning strategies, corresponding to different complexity levels of user cognitive states and reasoning requirements.
Basic Reasoning Strategy
This strategy follows the traditional preference matching paradigm and is suitable for user groups with high cognitive consistency. The system only uses the user’s preference memory for reasoning, embodying the cognitive pattern of rational decision-making:
$$s_{\text{basic}}(u,i,t) = \mathrm{LLM}_{\text{score}}\big(M_{\text{pref}}(t),\, A_{\text{core}}^{i}(t)\big),$$
where LLM score ( · ) performs semantic reasoning through large language models to evaluate the matching degree between the user’s preference description and the item’s core attributes.
Emotion-Enhanced Reasoning Strategy
This strategy integrates emotion memory information on top of preference memory, achieving collaborative reasoning between preference and emotion, embodying the synergy between System 1 (affective intuition) and System 2 (rational analysis) in dual-system cognitive theory. The recommendation score is calculated through weighted fusion:
$$s_{\text{enhanced}}(u,i,t) = \alpha \cdot s_{\text{pref}}(u,i,t) + \beta \cdot s_{\text{emo}}(u,i,t),$$
where $s_{\text{pref}}(u,i,t)$ and $s_{\text{emo}}(u,i,t)$ denote the preference matching degree and the emotion matching degree, respectively, both computed via LLM-based semantic reasoning: the former evaluates the semantic consistency between the user preference memory $M_{\text{pref}}(t)$ and the item core attributes $A_{\text{core}}^{i}(t)$, while the latter evaluates the match between the user emotion memory $M_{\text{emo}}(t)$ and the item's affective triggering features $A_{\text{feedback}}^{i}(t)$. The weights satisfy $\alpha + \beta = 1$. This strategy is suitable for users with good cognitive consistency and adopts equal weights ($\alpha = \beta = 0.5$), allowing emotion and preference to participate equally in decision-making.
Conflict-Aware Reasoning Strategy
This strategy specifically handles complex users with significant emotion–preference conflicts. The system adopts differentiated reasoning strategies based on the conflict type detected in conflict memory M conf :
$$s_{\text{conflict}}(u,i,t) = \begin{cases} 0.5 \cdot s_{\text{pref}}(u,i,t) + 0.5 \cdot s_{\text{emo}}(u,i,t), & \text{no conflict or low-frequency conflict} \\ 0.3 \cdot s_{\text{pref}}(u,i,t) + 0.7 \cdot s_{\text{emo}}(u,i,t), & \text{cognitive dissonance conflict} \\ \mathrm{LLM}_{\text{reason}}\big(M_{\text{conf}},\, s_{\text{pref}},\, s_{\text{emo}}\big), & \text{hate-watching conflict} \end{cases}$$
The three cases are explained in detail as follows:
Case 1—No Conflict or Low-Frequency Conflict: Adopts balanced weight configuration, with emotion and preference participating equally in decision-making.
Case 2—Cognitive Dissonance Conflict (Instantaneous Conflict): Cognitive dissonance manifests as divergence between ratings and review sentiment within a single interaction. For example, a user gives a movie a low rating of 2 but writes in the review “I did enjoy the film. The visuals were beautiful.” This conflict indicates that the rating fails to fully reflect the user’s emotional experience, while the review more comprehensively expresses the user’s true feelings. The system increases the emotion memory weight to β = 0.7 , consistent with the important role of emotions/intuition in decision-making in cognitive science [6], thereby capturing fine-grained emotional information in reviews.
Case 3—Hate-Watching Conflict (Sustained Conflict): Hate-watching is a temporal behavior pattern: users continuously give low ratings and negative reviews to a certain category, but still continue to consume that category over time. In this case, the LLM reason ( · ) function reasons based on the natural language description in conflict memory. For example, when the conflict description shows “User recently watched sci-fi movies with ratings all below 2, but still continues watching,” the system extracts key signals from it: “still continues watching” indicates deep involvement with that category, while negative ratings reflect a pattern of “critical engagement.” The reasoning process recommends that category based on the “continuous watching” signal, without being misled by negative emotions.
By distinguishing cognitive dissonance (instantaneous conflict) from hate-watching (sustained conflict), our framework can adopt appropriate reasoning strategies for different types of cognitive conflicts.
Through this three-tiered hierarchical reasoning strategy design, our framework can adapt to different reasoning requirements from simple preference matching to complex conflict handling, providing personalized reasoning experiences for users with different cognitive characteristics, achieving deep understanding and precise adaptation of recommendation systems to user cognitive diversity. Algorithm 1 presents the complete procedure of this conflict-aware adaptive reasoning process.
Algorithm 1 Conflict-Aware Adaptive Reasoning for Recommendation
Require: User memory M_u(t) = [M_pref(t), M_emo(t), M_conf(t)];
         candidate items with their memories {(i, M_i(t))} for i ∈ I_candidate
Ensure: Top-K recommendation list
 1: Initialize empty list scores ← [ ]
 2: for each candidate item i ∈ I_candidate do
 3:     // Compute preference matching score
 4:     s_pref ← LLM_score(M_pref(t), A_core^i(t))
 5:     // Compute emotion matching score
 6:     s_emo ← LLM_score(M_emo(t), A_feedback^i(t))
 7:     // Determine conflict type from conflict memory
 8:     conflict_type ← GetConflictType(M_conf(t))
 9:     // Adaptive strategy selection based on conflict type
10:     if conflict_type ∈ {"None", "Low-frequency"} then
11:         s_final ← 0.5 × s_pref + 0.5 × s_emo            ▹ Balanced strategy
12:     else if conflict_type == "Cognitive Dissonance" then
13:         s_final ← 0.3 × s_pref + 0.7 × s_emo            ▹ Emotion-weighted
14:     else if conflict_type == "Hate-watching" then
15:         s_final ← LLM_reason(M_conf(t), s_pref, s_emo)  ▹ Deep reasoning
16:     end if
17:     scores.append((i, s_final))
18: end for
19: // Sort and return top-K items
20: Sort scores by s_final in descending order
21: return Top-K items from scores
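A minimal Python rendering of Algorithm 1 may help make the control flow concrete. The LLM calls are passed in as functions (`llm_score`, `llm_reason`, `get_conflict_type`), and the dictionary keys are illustrative assumptions; since the conflict type depends only on the user, it is computed once outside the loop.

```python
from typing import Callable

def conflict_aware_rank(
    user_mem: dict,                    # {"pref": ..., "emo": ..., "conf": ...}
    candidates: list,                  # [(item_id, {"core": ..., "feedback": ...}), ...]
    llm_score: Callable[[str, str], float],            # semantic matching score
    llm_reason: Callable[[str, float, float], float],  # deep reasoning for hate-watching
    get_conflict_type: Callable[[str], str],
    top_k: int = 10,
) -> list:
    """Rank candidate items with the conflict-aware adaptive strategy."""
    conflict_type = get_conflict_type(user_mem["conf"])
    scores = []
    for item_id, item_mem in candidates:
        s_pref = llm_score(user_mem["pref"], item_mem["core"])
        s_emo = llm_score(user_mem["emo"], item_mem["feedback"])
        if conflict_type in ("None", "Low-frequency"):
            s_final = 0.5 * s_pref + 0.5 * s_emo       # balanced strategy
        elif conflict_type == "Cognitive Dissonance":
            s_final = 0.3 * s_pref + 0.7 * s_emo       # emotion-weighted
        else:                                          # "Hate-watching"
            s_final = llm_reason(user_mem["conf"], s_pref, s_emo)
        scores.append((item_id, s_final))
    scores.sort(key=lambda x: x[1], reverse=True)
    return scores[:top_k]
```

In practice the scoring callables would wrap prompted LLM calls; here they are kept abstract so the routing logic stands alone.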

4. Experiments

4.1. Experimental Setup

4.1.1. Datasets

We conduct experiments on the Amazon Reviews 2023 dataset [42]. We select two subsets rich in affective reviews: Movies & TV and Books, retaining review data from October 2021 to October 2022. Then, we apply a 5-core filtering strategy to ensure that each user and item in the dataset has at least 5 interactions. Since LLM-based recommendation methods require frequent API calls, to control experimental costs, we follow the practice of AgentCF [12] and randomly sample 100 users from each dataset. Table 1 summarizes the statistical information of the preprocessed datasets.

4.1.2. User Cognitive Conflict Distribution

To investigate user cognitive conflicts in detail, we classify users based on emotion-behavior consistency. We adopt a semantic analysis method based on large language models, using GPT-4o-mini to analyze each user’s temporal interaction history and detect two types of cognitive conflicts. First, for cognitive dissonance detection, we determine whether review sentiment diverges from ratings within a single interaction. The LLM analyzes the semantic sentiment tendency of reviews and compares it with the user’s rating. If a low rating (≤2.5) is accompanied by positive sentiment in the review, it is classified as cognitive dissonance. Second, for hate-watching detection, we determine whether there exists a pattern of continuous low ratings but sustained consumption of the same series or category. If a user gives low ratings (≤2.5) for ≥3 consecutive times to similar categories over time but still continues watching, it is classified as hate-watching behavior. Based on the above detection, we categorize users into three types: consistent users (review sentiment is basically consistent with ratings), cognitive dissonance users (cognitive dissonance occurs in ≥20% of interactions), and hate-watching users (exhibiting a sustained negative consumption pattern with ≥3 consecutive low ratings).
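The rule layer on top of the LLM sentiment analysis can be sketched as follows. This is a simplified assumption of how the classification might be coded: the per-interaction sentiment label is presumed to come from the LLM, and only the thresholds stated in the text (rating ≤ 2.5, ≥ 3 consecutive low ratings in the same category, ≥ 20% dissonant interactions) are taken from the paper.

```python
def classify_user(interactions: list) -> str:
    """Classify a user as 'consistent', 'cognitive_dissonance', or 'hate_watching'.

    Each interaction is a dict with keys 'rating' (float), 'sentiment'
    ('positive'/'negative', from LLM review analysis), and 'category' (str),
    given in temporal order.
    """
    LOW = 2.5
    # Hate-watching: >= 3 consecutive low ratings in the same category
    # while consumption of that category continues.
    run_len, run_cat = 0, None
    for it in interactions:
        if it["rating"] <= LOW and it["category"] == run_cat:
            run_len += 1
        elif it["rating"] <= LOW:
            run_len, run_cat = 1, it["category"]
        else:
            run_len, run_cat = 0, None
        if run_len >= 3:
            return "hate_watching"
    # Cognitive dissonance: low rating paired with positive review
    # sentiment in at least 20% of interactions.
    dissonant = sum(1 for it in interactions
                    if it["rating"] <= LOW and it["sentiment"] == "positive")
    if interactions and dissonant / len(interactions) >= 0.2:
        return "cognitive_dissonance"
    return "consistent"
```

The hate-watching check takes precedence, matching the paper's framing of it as the stronger, sustained conflict pattern.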
Figure 3 shows the distribution of user cognitive conflicts in the two datasets. We observe that both datasets contain a certain proportion of cognitive conflict users. In the Movies & TV dataset, 27% of users exhibit cognitive conflicts, including both cognitive dissonance and hate-watching types. In the Books dataset, 11% of users exhibit cognitive conflicts. This difference may be related to content type characteristics: the serial and continuous nature of film and television content makes it easier to trigger hate-watching behavior, while the independence of book consumption results in relatively fewer emotion-behavior conflicts among users.

4.1.3. Evaluation Metrics

Following existing research [12], we adopt a leave-one-out evaluation strategy. Specifically, for each user, we use the last item in their historical interaction sequence as the ground-truth, and randomly sample nine negative samples from items the user has not interacted with to form 10 candidate items. The user agent ranks these 10 candidate items, and we calculate NDCG@K and MRR@N based on the ranking of the ground-truth item. Normalized Discounted Cumulative Gain (NDCG@K) measures the ranking quality by assigning higher scores to relevant items appearing earlier in the top-K list. Mean Reciprocal Rank (MRR@N) calculates the reciprocal of the rank of the first relevant item in the top-N list and averages it across all users, reflecting how quickly the model can recommend the correct item. To reduce the impact of randomness, each test instance is repeated five times and the average results are reported.
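Under the leave-one-out protocol there is exactly one relevant item per test instance, so both metrics reduce to simple functions of that item's rank. A minimal sketch (the function names are ours):

```python
import math

def ndcg_at_k(rank: int, k: int = 10) -> float:
    """NDCG@k with a single relevant item at 1-indexed position `rank`.

    With one relevant item, the ideal DCG is 1 (relevant item at rank 1),
    so NDCG@k equals 1 / log2(rank + 1) when the item is in the top k.
    """
    return 1.0 / math.log2(rank + 1) if rank <= k else 0.0

def mrr_at_n(rank: int, n: int = 10) -> float:
    """Reciprocal rank of the single relevant item, truncated at n;
    averaged over users to obtain MRR@n."""
    return 1.0 / rank if rank <= n else 0.0
```

For instance, a ground-truth item ranked 2nd scores NDCG@10 ≈ 0.631 and MRR@10 = 0.5, while an item outside the top 10 contributes 0 to both.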

4.1.4. Baseline Methods

We compare EDACF with the following baselines:
  • BPR-MF [43]: A classic matrix factorization recommendation method. It learns low-dimensional latent representations of users and items by optimizing the Bayesian Personalized Ranking loss function, and uses inner product for rating prediction.
  • SASRec [21]: A sequential recommendation model based on self-attention mechanisms. It uses Transformer encoders to capture sequential dependencies in user interaction history, capable of modeling the dynamic evolution of short-term interests and long-term preferences.
  • LLMRank [25]: Directly uses large language models as zero-shot rankers. It constructs user interaction history sequences and textual descriptions of candidate items as prompts, leveraging the language understanding capabilities of LLMs to directly perform ranking recommendations.
  • Agent4Rec [11]: An LLM-based user simulation recommendation system. It simulates user behavior by constructing user agents with memory and reflection capabilities, including basic emotional state tracking (satisfaction and fatigue), but does not explicitly model emotion–preference conflicts.
  • AgentCF [12]: An LLM-based collaborative filtering recommendation method. It implements recommendations through interactive learning between user agents and item agents, where user agents maintain preference memory and item agents maintain feature memory.

4.1.5. Implementation Details

We use GPT-4o-mini for all LLM-based methods (including our method and baselines) to ensure fair comparison. During agent learning, the temperature parameter of the LLM is set to 0.7, while during evaluation it is set to 0 to minimize randomness in the generation process.
For our EDACF framework, the emotion memory window size is set to k = 10, i.e., the most recent 10 interactions are retained. During collaborative learning, conflict detection is performed after each memory update. For conflict-aware reasoning (Section 3.2.3), the preference and emotion weights for cognitive dissonance conflicts are set to α = 0.3 and β = 0.7, respectively.
For the traditional recommendation baselines BPR-MF [43] and SASRec [21], since our agent-based method is optimized on the sampled dataset, we also train these models on the sampled dataset to ensure a fair comparison. All experiments are conducted on NVIDIA A100 GPUs.
Regarding computational cost, the collaborative learning phase requires approximately four LLM calls per training interaction (decision, preference memory update, item feedback update, and conflict detection), resulting in roughly 2600 LLM calls for our training set of 648 interactions. During inference, each user requires only one LLM call to rank all candidate items. The average latency per LLM call is 1–3 s.

4.2. Overall Performance

Table 2 shows the performance comparison between our proposed framework and baseline methods. We conduct systematic evaluations on two datasets, including traditional recommendation methods and state-of-the-art LLM-based recommendation methods.
The experimental results demonstrate the effectiveness of EDACF in capturing users’ cognitive complexity. EDACF-Full consistently outperforms all baselines, achieving approximately 10% improvement in NDCG@10 and over 13% improvement in MRR@10 compared to AgentCF. Analysis of the component variants reveals the contribution of each design: EDACF-Base, which relies solely on preference memory, performs slightly below AgentCF, indicating that the basic architecture is competitive with existing methods. Introducing independent emotion memory in EDACF-Emotion leads to substantial performance gains, confirming the importance of affective modeling. Incorporating the conflict-aware mechanism in EDACF-Full further improves results, highlighting the necessity of explicitly modeling cognitive conflicts to capture complex behavior patterns such as cognitive dissonance and hate-watching.
When compared with traditional recommendation methods, we observe that SASRec slightly outperforms some LLM-based approaches, consistent with previous studies showing that traditional models naturally capture popularity effects and temporal patterns. However, such methods primarily fit observed user behavior and lack the ability to model the underlying cognitive mechanisms. In contrast, EDACF explicitly represents interactions between the affective and rational systems through a separate memory architecture, enabling the framework to understand why users make seemingly contradictory choices and to generate more informed recommendations in cognitively complex scenarios.
Furthermore, EDACF demonstrates consistent improvements across both the Movies & TV and Books datasets despite substantial differences in the prevalence of cognitive conflict users (27% vs. 11%). This cross-dataset robustness indicates that the separate emotion memory architecture constitutes a generalizable modeling paradigm, benefiting not only high-conflict users but also improving recommendation quality for general users through fine-grained understanding of emotional signals. Given the higher proportion of conflict users in the Movies & TV dataset, subsequent ablation studies and user group analyses are primarily based on this dataset.

4.3. Further Analysis

4.3.1. Ablation Study

To verify the effectiveness of each core component of the framework, we conducted ablation experiments on the Movies & TV dataset. Table 3 shows the performance changes after removing different components.
(1) w/o Emotion memory: In this variant, the emotion memory component is removed from the user agent at the architectural level. Since conflict detection relies on comparing preference memory and emotion memory, removing emotion memory also means that conflict detection cannot be performed, and conflict memory accordingly does not exist, leaving the system with only preference memory for user modeling. Removing emotion memory leads to the most significant performance degradation. This result indicates that independently modeling fine-grained emotion memory is crucial for understanding user decision-making. Unlike AgentCF, which focuses only on preference memory, or Agent4Rec, which uses coarse-grained emotion labels, our emotion memory preserves rich emotional semantics in natural language form (such as “shocking but slightly heavy”), enabling the system to capture subtle differences in emotional expressions and thus more accurately predict user preferences.
(2) w/o Conflict memory: In this case, the system retains only preference memory and emotion memory, removing the conflict detection and conflict memory modules. Removing conflict memory also brings obvious performance degradation. Conflict memory enables the system to track the user’s emotion–preference inconsistency history (such as “hate-watching” behavior) and dynamically adjust reasoning strategies when conflicts are detected. This design prevents the system from simply assuming that emotion and preference are always consistent, but rather enables understanding of why users make seemingly contradictory choices. The performance difference shows that explicitly modeling cognitive conflicts can significantly improve recommendation quality.
(3) w/o Item agent: In this variant, we do not optimize the item agent and represent each item with static descriptive information. The performance degradation from removing the item agent demonstrates the importance of the collaborative learning mechanism. In the complete framework, the item agent learns and updates its representation by continuously receiving user feedback, dynamically adjusting the item’s affective triggering features (such as “this movie tends to evoke a sense of shock” or “this book is suitable for relaxed reading”). This learning process enables item representations to better match users’ emotional states. When this collaborative learning mechanism is removed and only static descriptions are used, the system loses the ability to learn from user–item interactions. The performance degradation indicates that continuous interaction and co-evolution between user agents and item agents are crucial for understanding user cognitive patterns.
(4) w/o Adaptive reasoning: In this ablation variant, we retain the conflict detection module and conflict memory M conf storage functionality but remove the mechanism that adaptively selects reasoning strategies based on conflict type, instead applying a fixed balanced strategy s = 0.5 × s pref + 0.5 × s emo uniformly to all users. As shown in Table 3, the performance of this variant exhibits a significant decline compared to the full model. This result validates the critical role of the adaptive reasoning mechanism: although this variant still detects and records conflict information, the value of conflict memory cannot be effectively utilized without targeted reasoning strategies, leading to degraded reasoning capability and consequently suboptimal performance.

4.3.2. User Group Performance Analysis

To further analyze the effectiveness of methods on different user groups, we divide users into three groups: consistent users, cognitive dissonance users, and hate-watching users. As shown in Figure 4, all methods perform significantly lower on cognitive conflict users (cognitive dissonance and hate-watching users) than on consistent users, indicating that cognitive conflicts pose additional challenges to recommendation systems.
Traditional methods show particularly pronounced performance degradation on conflict users. Traditional methods primarily predict preferences by fitting user behavior patterns and struggle to understand the cognitive mechanisms behind behaviors, thus performing poorly when facing complex cognitive patterns.
Our methods demonstrate varying degrees of effectiveness on different conflict types. EDACF-Emotion shows noticeable performance improvement on cognitive dissonance users. This validates the value of independently modeling emotion memory, as emotional expressions in reviews can capture nuanced affective signals that complement preference representations.
After integrating the conflict-aware mechanism, EDACF-Full achieves the best performance across all user groups. Compared to AgentCF, our method shows greater relative improvement on cognitive conflict users than on consistent users. This demonstrates that the design of independently modeling emotion memory and explicitly handling cognitive conflicts is particularly effective in addressing cognitive complexity in user decision-making.

4.3.3. Conflict-Aware Mechanism Analysis

To systematically evaluate the effectiveness and applicability boundaries of the conflict-aware mechanism, we conducted quantitative analysis on the test set. This section focuses on two aspects: (1) the overall impact of conflict memory on recommendation ranking; (2) the accuracy differences in hate-watching identification across different content types and their impact on recommendation performance.
Impact of Conflict-Aware Mechanism on Ground-Truth Rankings. As shown in Figure 5a, introducing the conflict-aware mechanism significantly improves the ranking distribution of ground-truth items in the recommendation list. Comparing EDACF-Emotion (without conflict awareness) and EDACF-Full (with conflict awareness), the Top-1 accuracy increases from 20.0% to 21.0% and the share of ground-truth items at ranks 2–3 rises from 22.0% to 26.0%, indicating that conflict awareness enhances the ability to place relevant items near the top. Correspondingly, the share at ranks 6–10 decreases from 45.0% to 36.0%, demonstrating that more ground-truth items are promoted to higher positions. These improvements are reflected in MRR@10 increasing from 0.381 to 0.405, a relative improvement of approximately 6.3%.
False Negative Analysis of Hate-Watching Identification. The core of the conflict-aware mechanism lies in identifying users’ hate-watching behavior. However, the accuracy of this identification varies significantly across different content types (Figure 5b).
For Action & Adventure content, the identification accuracy reaches 87.5%. Hate-watching phenomena are relatively common in this category (e.g., movie franchises, IP sequels, superhero series), and exhibit clear behavioral patterns: users continue to watch series content completely despite consistently giving negative ratings. The conflict-aware mechanism can effectively capture this separation between ratings and behavior.
For Suspense & Thriller content, the identification accuracy is 66.7%. The hate-watching characteristics in this category are relatively ambiguous, as users’ continued viewing could stem from either hate-watching or curiosity about suspenseful plots, increasing the difficulty of judgment.
For Drama & Romance content, the identification accuracy is only 50.0%, significantly lower than other types. Viewing of this content is highly dependent on emotional resonance, and users’ negative comments often reflect genuine dissatisfaction rather than hate-watching behavior. In such scenarios, the conflict-aware mechanism faces greater judgment challenges. This result reveals the applicability boundaries of the current method: when the hate-watching characteristics of content types are not obvious, the identification mechanism based on rating-behavior conflicts may fail to effectively distinguish genuine aversion from hate-watching, thus limiting the potential for recommendation performance improvement.

4.3.4. Case Study

To intuitively demonstrate how the framework handles cognitive conflicts, we select a typical hate-watching user for in-depth analysis. This user is a loyal fan of the Die Hard series and exhibited an obvious critical engagement pattern from December 2021 to February 2022: continuously giving low ratings (1–2 stars) to action/sci-fi movies and writing detailed critical reviews but still watching these movies completely and continuing to consume similar content.
User Memory State. The system constructed a triple memory structure for this user. Preference memory identified the user’s love for classic 1980s action films and high standards for production quality, particularly noting the user’s appreciation for innovative themes. Emotion memory retained the most recent five viewing experiences, four of which received one to two star low ratings: the user’s reviews of Split Second, Aliens vs. Predator: Requiem, and others were full of criticism, expressing strong negative emotions (“very irritated,” “thoroughly disappointed”).
Reasoning Comparison Experiment. We constructed a leave-one-out test scenario: the ground truth is Reign of Fire, which the user actually watched and gave a five-star rating (an action film with innovative genre-blending elements), with nine other randomly sampled movies the user had not watched as negative samples. Figure 6 shows the comparison of reasoning results between EDACF-Emotion and EDACF-Full.
EDACF-Emotion (without conflict memory) interpreted the negative emotion memory as genuine aversion to action/sci-fi genres, recommending horror films as genre alternatives, placing the ground truth Reign of Fire at the eighth position. While this reasoning is intuitive (user’s recent consecutive low ratings → avoid this genre), it overlooked a key signal: the user’s five-star rating of Cosmoball demonstrates appreciation for films with innovative and unconventional concepts.
EDACF-Full (with conflict memory) understood through the conflict-aware mechanism that: negative reviews do not indicate lack of interest but rather critical engagement under high standards. The system identified the user’s true preference—appreciation for works that dare to try innovative themes—thus placing Reign of Fire at the top position.
This case demonstrates that the conflict-aware mechanism can effectively distinguish between “genuine aversion” and “critical engagement.” For hate-watching users, negative emotions are not rejection signals but rather expressions of deep involvement. By explicitly modeling cognitive conflicts, the system can penetrate the surface emotional polarity, understand the deep motivations behind user behavior and thus make more precise recommendation decisions. Complete case details and memory contents can be found in Appendix A.

4.3.5. Parameter Sensitivity Analysis

To investigate how the cognitive conflict detection thresholds affect framework performance, we vary the key threshold for identifying hate-watching users (consecutive low-rating count) and the threshold for identifying cognitive dissonance users (conflict interaction ratio). The results are shown in Figure 7.
Figure 7a,b illustrate the user distribution under different threshold settings. As the thresholds become stricter, the proportion of users identified as conflict users exhibits a monotonic decreasing trend, which aligns with intuitive expectations.
Figure 7c,d present the recommendation performance under different thresholds. Both metrics exhibit a clear inverted-U pattern, peaking at the default threshold values. When thresholds are too lenient, non-conflict users are misclassified in large numbers, causing the conflict-aware strategy to be incorrectly applied. When thresholds are too strict, genuine conflict users are missed, and conflict signals cannot be effectively utilized. Notably, under extreme threshold settings, the framework performance approaches or even falls below baseline methods such as AgentCF, indicating that threshold selection has a significant impact on model effectiveness and validating the rationality of our default threshold choices.

4.3.6. Robustness Analysis

To verify the framework’s adaptability to different large language models, we conducted comparative experiments on three models: GPT-4o-mini (main results), GPT-4o, and DeepSeek-V3. Table 4 shows the performance of different models on the Movies & TV dataset.
The experimental results show that the NDCG@10 performance difference among the three models is within 3%. This result indicates that our framework design does not depend on the capabilities of a specific LLM, and its core mechanisms—separate memory architecture and conflict-aware reasoning—can operate effectively across different models. GPT-4o shows a 2.9% improvement over GPT-4o-mini, reflecting the advantage of stronger semantic understanding capabilities in fine-grained emotion modeling. Notably, even the open-source model DeepSeek-V3 shows only a 0.5% performance decrease, demonstrating the framework’s good adaptability to model capabilities.

5. Conclusions and Future Work

This paper introduces EDACF (Emotion-Enhanced Dual-Agent Collaborative Framework), an emotion-aware dual-agent recommendation framework designed to model and leverage the cognitive complexity inherent in user decision-making. Drawing inspiration from the dual-system theory in cognitive science, EDACF explicitly represents users’ affective system (System 1) and rational system (System 2) as independent memory components. The framework incorporates three key designs: (1) a separate memory architecture that maintains preference memory, emotion memory, and conflict memory independently, using natural language representations to capture the semantic richness of emotional expressions; (2) a conflict-aware mechanism that detects inconsistencies between emotion and preference, identifying complex behavior patterns such as cognitive dissonance and hate-watching; and (3) adaptive reasoning strategies that dynamically adjust reasoning based on users’ conflict types. Through collaborative learning, user and item agents iteratively update their memory representations, progressively developing a deep understanding of users’ cognitive states. Experiments on real-world datasets demonstrate the effectiveness of EDACF, particularly in managing users with high cognitive conflicts.
We acknowledge that our framework lacks an explicit loss function and formal convergence analysis for the collaborative learning process. Unlike traditional parametric machine learning, where gradient-based optimization operates in continuous parameter spaces with well-defined convergence properties, our LLM-based agent learning relies on discrete natural language memory updates that leverage the LLM’s pre-trained cognitive capabilities. This process may be better characterized as cognitive evolution rather than optimization in the classical sense. Formally characterizing such evolutionary processes—including rigorous definitions of state spaces, convergence conditions, and stability guarantees—remains a profound open question that depends on both the intrinsic cognitive capabilities of underlying LLMs and how prompts guide memory evolution.
In future work, we aim to explore additional user cognitive behavior patterns in practical recommendation scenarios and investigate theoretical frameworks that can better characterize the learning dynamics of LLM-based agent systems.

Author Contributions

Conceptualization, Y.Y., Z.W., L.L., and D.Z.; methodology, Y.Y., Z.W., L.L., and D.Z.; software, Y.Y.; validation, Y.Y.; formal analysis, Y.Y.; investigation, Y.Y.; resources, Y.Y.; data curation, Y.Y.; writing—original draft preparation, Y.Y. and Z.W.; writing—review and editing, Z.W. and L.L.; supervision, L.L. and D.Z.; project administration, L.L. and D.Z.; funding acquisition, L.L. and D.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Strategic Priority Research Program of the Chinese Academy of Sciences under Grant XDA0480301, and the National Natural Science Foundation of China under Grants 62206282 and 72371029.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original data used in this study are publicly available from the Amazon Reviews 2023 dataset at https://amazon-reviews-2023.github.io/ (accessed on 30 June 2025).

Acknowledgments

The authors would like to thank the anonymous reviewers for their constructive comments and suggestions. During the preparation of this manuscript, GPT-5 was used for language polishing to refine sentence fluency and expression. All modifications generated by the LLM were carefully reviewed by the authors to ensure the final content faithfully reflects the authors’ intended ideas.

Conflicts of Interest

The authors declare no conflicts of interest. The sponsors had no role in the design, execution, interpretation, or writing of the study.

Appendix A. Detailed Analysis of Hate-Watching User Case

Appendix A.1. Case Background

This case illustrates the hate-watching behavioral pattern of user Bob, a devoted fan of the Die Hard franchise whose viewing records from December 2021 to February 2022 exhibit the typical characteristics of “critical engagement.”

Appendix A.2. Agent Memory

Table A1. Memory state of the hate-watching user.
Preference Memory: I am a die-hard fan of the original Die Hard, favoring the solid action films of the 1980s. I am very critical of the production quality of action/sci-fi films—I cannot tolerate clichéd dialogue, absurd plot holes, and perfunctory production. I particularly appreciate films that dare to explore innovative and unconventional themes.

Emotion Memory (chronological order):
1. Stargate (Sci-Fi, 2.0, 2021-12-09): “Watching Stargate felt very boring, with no sense of tension, like wandering aimlessly in a garden. The story is filled with clichéd tropes everywhere; I really couldn’t stand it and gave up halfway through.”
2. Cosmoball (Sci-Fi, 5.0, 2021-12-13): “The dystopian Earth concept in Cosmoball felt quite interesting to me. Although the execution wasn’t perfect, I was watching it as a B-movie and didn’t have high expectations, so I actually found it worth watching.”
3. Split Second (Action, 1.0, 2021-12-22): “Split Second made me feel very irritated—clichéd dialogue, terrible acting, absurd character behavior, crude sets, ridiculous costumes, and that annoying heartbeat sound throughout the film. I wouldn’t recommend it to anyone.”
4. Aliens vs. Predator: Requiem (Action, 1.0, 2022-01-03): “The concept of Aliens vs. Predator: Requiem was good, but the execution was too poor. I watched the entire film with hope, expecting it to improve, but was thoroughly disappointed. It’s just lazy filmmaking.”
5. Live Free or Die Hard (Action, 2.0, 2022-02-03): “Don’t get me wrong, I’m a die-hard fan of the original Die Hard, but the action sequences in this one are unbelievably exaggerated, completely breaking my immersion. The plot holes are also significant. Although it’s better than the third installment, it really doesn’t qualify as a good action film.”

Conflict Memory: This user exhibits a typical “hate-watching” pattern toward the action film genre. Despite consistently giving low ratings (1–2 stars) to action films and writing detailed critical reviews (pointing out production flaws, plot holes, clichés, etc.), they continue to watch and complete these films. As a devoted fan of the Die Hard franchise and classic 1980s action films, this user maintains deep engagement with the action genre through critical participation. The negative evaluations reflect high standards and deep engagement rather than genuine lack of interest.
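For concreteness, the triple-memory state in Table A1 can be held in a structure like the following. This is a minimal sketch only; names such as `UserMemory` and `add_emotion` are our own, not an API from the paper, and all field contents remain free-form natural language as in the framework.

```python
from dataclasses import dataclass, field

@dataclass
class UserMemory:
    """Triple memory: every entry is free-form natural language, as in Table A1."""
    preference: str = ""                          # stable long-term tastes
    emotions: list = field(default_factory=list)  # chronological (item, genre, rating, date, note)
    conflict: str = ""                            # detected conflict pattern, if any

    def add_emotion(self, item, genre, rating, date, note):
        self.emotions.append((item, genre, rating, date, note))

bob = UserMemory(preference="Die-hard fan of the original Die Hard; very critical "
                            "of cliched action/sci-fi production quality.")
bob.add_emotion("Split Second", "Action", 1.0, "2021-12-22",
                "Cliched dialogue, terrible acting; wouldn't recommend it.")
bob.add_emotion("Live Free or Die Hard", "Action", 2.0, "2022-02-03",
                "Exaggerated action sequences broke my immersion.")
bob.conflict = ("Hate-watching: keeps watching and finishing action films "
                "despite rating them 1-2 stars.")
print(len(bob.emotions))  # → 2
```

Keeping the three memories as separate fields mirrors the paper's design choice of updating emotional dynamics without disturbing the stable preference description.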

Appendix A.3. Reasoning Phase Comparison

Test Scenario: Recommend 10 movies to the user, including one ground truth (Reign of Fire, which the user actually watched and rated 5.0) and nine negative samples (movies the user did not watch).
Table A2. Reasoning comparison between EDACF-Emotion and EDACF-Full.
EDACF-Emotion (Without Conflict Memory):
1. The Black Phone (Horror)
2. My Best Friend’s Exorcism (Horror)
3. Ghostbusters: Afterlife (Sci-Fi/Comedy)
4. The Commuter (Action/Thriller)
5. Taken (Action/Thriller)
6. Hanna (Action/Thriller)
7. Uncharted (Action/Adventure)
8. Reign of Fire (Action/Fantasy) ← Ground Truth
9. Blade Runner 2049 (Sci-Fi)
10. Wrath of Man (Action)

EDACF-Full (With Conflict Memory):
1. Reign of Fire (Action/Fantasy) ← Ground Truth
2. Blade Runner 2049 (Sci-Fi)
3. Uncharted (Action/Adventure)
4. Hanna (Action/Thriller)
5. Taken (Action/Thriller)
6. Wrath of Man (Action)
7. The Commuter (Action/Thriller)
8. Ghostbusters: Afterlife (Sci-Fi/Comedy)
9. My Best Friend’s Exorcism (Horror)
10. The Black Phone (Horror)
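The ranking gap in Table A2 maps directly onto the paper's metrics: with a single relevant item in a list of ten, NDCG@10 reduces to 1/log2(rank + 1) and MRR@10 to 1/rank. These are the standard definitions; the sketch below only makes the arithmetic explicit.

```python
import math

def ndcg_at_k(rank, k=10):
    # One relevant item with gain 1; the ideal DCG is 1/log2(1 + 1) = 1.
    return 1.0 / math.log2(rank + 1) if rank <= k else 0.0

def mrr_at_k(rank, k=10):
    return 1.0 / rank if rank <= k else 0.0

# EDACF-Emotion ranks the ground truth at #8; EDACF-Full ranks it at #1.
print(ndcg_at_k(8), mrr_at_k(8))  # NDCG ≈ 0.315, MRR = 0.125
print(ndcg_at_k(1), mrr_at_k(1))  # both equal 1.0
```

Moving the ground truth from rank 8 to rank 1 thus triples the per-user NDCG@10 contribution and multiplies the reciprocal rank by eight, which is why high-conflict users account for such a large share of the overall improvement.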

References

1. Yuan, Z.; Yuan, F.; Song, Y.; Li, Y.; Fu, J.; Yang, F.; Pan, Y.; Ni, Y. Where to go next for recommender systems? ID- vs. modality-based recommender models revisited. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, Taipei, Taiwan, 23–27 July 2023; pp. 2639–2649.
2. Zhao, X.; Wang, M.; Zhao, X.; Li, J.; Zhou, S.; Yin, D.; Li, Q.; Tang, J.; Guo, R. Embedding in recommender systems: A survey. arXiv 2023, arXiv:2310.18608.
3. Hasan, T.; Bunescu, R. A survey of affective recommender systems: Modeling attitudes, emotions, and moods for personalization. arXiv 2025, arXiv:2508.20289.
4. Kim, T.Y.; Ko, H.; Kim, S.H.; Kim, H.D. Modeling of recommendation system based on emotional information and collaborative filtering. Sensors 2021, 21, 1997.
5. Kahneman, D. Thinking, Fast and Slow; Farrar, Straus and Giroux: New York, NY, USA, 2011.
6. Slovic, P.; Finucane, M.L.; Peters, E.; MacGregor, D.G. The affect heuristic. Eur. J. Oper. Res. 2007, 177, 1333–1352.
7. Polignano, M.; Narducci, F.; de Gemmis, M.; Semeraro, G. Towards emotion-aware recommender systems: An affective coherence model based on emotion-driven behaviors. Expert Syst. Appl. 2021, 170, 114382.
8. Huang, X.; Lian, J.; Lei, Y.; Yao, J.; Lian, D.; Xie, X. Recommender AI agent: Integrating large language models for interactive recommendations. ACM Trans. Inf. Syst. 2025, 43, 1–33.
9. Wang, Y.; Jiang, Z.; Chen, Z.; Yang, F.; Zhou, Y.; Cho, E.; Fan, X.; Lu, Y.; Huang, X.; Yang, Y. RecMind: Large language model powered agent for recommendation. In Findings of the Association for Computational Linguistics: NAACL 2024, Mexico City, Mexico, 16–21 June 2024; pp. 4351–4364.
10. Wang, Z.; Yu, Y.; Zheng, W.; Ma, W.; Zhang, M. MacRec: A multi-agent collaboration framework for recommendation. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, Washington, DC, USA, 14–18 July 2024; pp. 2760–2764.
11. Zhang, A.; Chen, Y.; Sheng, L.; Wang, X.; Chua, T.S. On generative agents in recommendation. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, Washington, DC, USA, 14–18 July 2024; pp. 1807–1817.
12. Zhang, J.; Hou, Y.; Xie, R.; Sun, W.; McAuley, J.; Zhao, W.X.; Lin, L.; Wen, J.R. AgentCF: Collaborative learning with autonomous language agents for recommender systems. In Proceedings of the ACM Web Conference 2024, Singapore, 13–17 May 2024; pp. 3679–3689.
13. Sarwar, B.; Karypis, G.; Konstan, J.; Riedl, J. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, Hong Kong, China, 1–5 May 2001; pp. 285–295.
14. Koren, Y. Factorization meets the neighborhood: A multifaceted collaborative filtering model. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Las Vegas, NV, USA, 24–27 August 2008; pp. 426–434.
15. Koren, Y.; Bell, R.; Volinsky, C. Matrix factorization techniques for recommender systems. Computer 2009, 42, 30–37.
16. Cheng, H.T.; Koc, L.; Harmsen, J.; Shaked, T.; Chandra, T.; Aradhye, H.; Anderson, G.; Corrado, G.; Chai, W.; Ispir, M.; et al. Wide & deep learning for recommender systems. In Proceedings of the 1st Workshop on Deep Learning for Recommender Systems, Boston, MA, USA, 15 September 2016; pp. 7–10.
17. He, X.; Liao, L.; Zhang, H.; Nie, L.; Hu, X.; Chua, T.S. Neural collaborative filtering. In Proceedings of the 26th International Conference on World Wide Web, Perth, Australia, 3–7 April 2017; pp. 173–182.
18. Guo, H.; Tang, R.; Ye, Y.; Li, Z.; He, X. DeepFM: A factorization-machine based neural network for CTR prediction. arXiv 2017, arXiv:1703.04247.
19. Wang, X.; He, X.; Wang, M.; Feng, F.; Chua, T.S. Neural graph collaborative filtering. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, Paris, France, 21–25 July 2019; pp. 165–174.
20. He, X.; Deng, K.; Wang, X.; Li, Y.; Zhang, Y.; Wang, M. LightGCN: Simplifying and powering graph convolution network for recommendation. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual, 25–30 July 2020; pp. 639–648.
21. Kang, W.C.; McAuley, J. Self-attentive sequential recommendation. In Proceedings of the 2018 IEEE International Conference on Data Mining (ICDM), Singapore, 17–20 November 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 197–206.
22. Lin, X.; Wang, W.; Li, Y.; Feng, F.; Ng, S.K.; Chua, T.S. Bridging items and language: A transition paradigm for large language model-based recommendation. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Barcelona, Spain, 25–29 August 2024; pp. 1816–1826.
23. Xi, Y.; Liu, W.; Lin, J.; Cai, X.; Zhu, H.; Zhu, J.; Chen, B.; Tang, R.; Zhang, W.; Yu, Y. Towards open-world recommendation with knowledge augmentation from large language models. In Proceedings of the 18th ACM Conference on Recommender Systems, Bari, Italy, 14–18 October 2024; pp. 12–22.
24. He, Z.; Xie, Z.; Jha, R.; Steck, H.; Liang, D.; Feng, Y.; Majumder, B.P.; Kallus, N.; McAuley, J. Large language models as zero-shot conversational recommenders. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, Birmingham, UK, 21–25 October 2023; pp. 720–730.
25. Hou, Y.; Zhang, J.; Lin, Z.; Lu, H.; Xie, R.; McAuley, J.; Zhao, W.X. Large language models are zero-shot rankers for recommender systems. In Proceedings of the European Conference on Information Retrieval, Glasgow, UK, 24–28 March 2024; Springer: Berlin/Heidelberg, Germany, 2024; pp. 364–381.
26. Bao, K.; Zhang, J.; Zhang, Y.; Wang, W.; Feng, F.; He, X. TALLRec: An effective and efficient tuning framework to align large language model with recommendation. In Proceedings of the 17th ACM Conference on Recommender Systems, Singapore, 18–22 September 2023; pp. 1007–1014.
27. Zhang, Y.; Feng, F.; Zhang, J.; Bao, K.; Wang, Q.; He, X. CoLLM: Integrating collaborative embeddings into large language models for recommendation. IEEE Trans. Knowl. Data Eng. 2025, 37, 2329–2340.
28. Gao, C.; Gao, M.; Fan, C.; Yuan, S.; Shi, W.; He, X. Process-supervised LLM recommenders via flow-guided tuning. In Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval, Padua, Italy, 13–18 July 2025; pp. 1934–1943.
29. Gao, C.; Chen, R.; Yuan, S.; Huang, K.; Yu, Y.; He, X. SPRec: Self-play to debias LLM-based recommendation. In Proceedings of the ACM on Web Conference 2025, Sydney, NSW, Australia, 28 April–2 May 2025; pp. 5075–5084.
30. Wang, L.; Zhang, J.; Chen, X.; Lin, Y.; Song, R.; Zhao, W.X.; Wen, J.R. RecAgent: A novel simulation paradigm for recommender systems. arXiv 2023, arXiv:2306.02552.
31. Wang, L.; Zhang, J.; Yang, H.; Chen, Z.Y.; Tang, J.; Zhang, Z.; Chen, X.; Lin, Y.; Sun, H.; Song, R.; et al. User behavior simulation with large language model-based agents. ACM Trans. Inf. Syst. 2025, 43, 1–37.
32. Liu, J.; Gu, S.; Li, D.; Zhang, G.; Han, M.; Gu, H.; Zhang, P.; Lu, T.; Shang, L.; Gu, N. AgentCF++: Memory-enhanced LLM-based agents for popularity-aware cross-domain recommendations. In Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval, Padua, Italy, 13–18 July 2025; pp. 2566–2571.
33. Shu, Y.; Zhang, H.; Gu, H.; Zhang, P.; Lu, T.; Li, D.; Gu, N. RAH! RecSys–Assistant–Human: A human-centered recommendation framework with LLM agents. IEEE Trans. Comput. Soc. Syst. 2024, 11, 6759–6770.
34. Cai, S.; Zhang, J.; Bao, K.; Gao, C.; Wang, Q.; Feng, F.; He, X. Agentic feedback loop modeling improves recommendation and user simulation. In Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval, Padua, Italy, 13–18 July 2025; pp. 2235–2244.
35. Festinger, L. A Theory of Cognitive Dissonance; Stanford University Press: Redwood City, CA, USA, 1962.
36. Hyun, D.; Park, C.; Yang, M.C.; Song, I.; Lee, J.T.; Yu, H. Review sentiment-guided scalable deep recommender system. In Proceedings of the 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, Ann Arbor, MI, USA, 8–12 July 2018; pp. 965–968.
37. Shi, L.; Wu, W.; Guo, W.; Hu, W.; Chen, J.; Zheng, W.; He, L. SENGR: Sentiment-enhanced neural graph recommender. Inf. Sci. 2022, 589, 655–669.
38. Zhang, K.; Qian, H.; Liu, Q.; Zhang, Z.; Zhou, J.; Ma, J.; Chen, E. SIFN: A sentiment-aware interactive fusion network for review-based item recommendation. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management, Virtual, 1–5 November 2021; pp. 3627–3631.
39. Revathy, V.; Pillai, A.S.; Daneshfar, F. LyEmoBERT: Classification of lyrics’ emotion and recommendation using a pre-trained model. Procedia Comput. Sci. 2023, 218, 1196–1208.
40. Darraz, N.; Karabila, I.; El-Ansari, A.; Alami, N.; El Mallahi, M. Integrated sentiment analysis with BERT for enhanced hybrid recommendation systems. Expert Syst. Appl. 2025, 261, 125533.
41. Wang, D.; Zhao, X. Affective video recommender systems: A survey. Front. Neurosci. 2022, 16, 984404.
42. Hou, Y.; Li, J.; He, Z.; Yan, A.; Chen, X.; McAuley, J. Bridging language and items for retrieval and recommendation. arXiv 2024, arXiv:2403.03952.
43. Rendle, S.; Freudenthaler, C.; Gantner, Z.; Schmidt-Thieme, L. BPR: Bayesian personalized ranking from implicit feedback. arXiv 2012, arXiv:1205.2618.
Figure 1. From traditional assumption to cognitive reality. (Left): Traditional systems assume users as rational agents with consistent preferences, where ratings directly reflect interests. (Right): Real-world observations reveal paradoxical behaviors—cognitive dissonance (rating-sentiment divergence) and hate-watching (continued consumption despite negative feedback). These conflicts reflect the interplay between emotional (System 1) and rational (System 2) decision-making systems, motivating our approach to explicitly model both dimensions.
Figure 2. EDACF overall framework. The framework consists of three layers: (1) top: user agent (left) and item agent (right) with their memory structures; (2) center: a circular four-phase collaborative learning loop (Decision→Feedback→Update→Conflict) that iteratively optimizes memory states; (3) bottom: multi-strategy reasoning layer for final recommendation.
Figure 3. Distribution of user cognitive conflicts in the two datasets.
Figure 4. Performance comparison across different user cognitive state groups.
Figure 5. Conflict-aware mechanism analysis: (a) ground-truth ranking distribution comparison between EDACF-Emotion and EDACF-Full; (b) Hate-watching identification accuracy across different content types, showing significant variation from 87.5% (Action & Adventure) to 50.0% (Drama & Romance).
Figure 6. Reasoning comparison for the hate-watching user case. (Left): The user’s triple memory structure. (Right): Comparison of reasoning results between EDACF-Emotion and EDACF-Full. EDACF-Emotion incorrectly interprets negative emotions as genuine aversion to action/sci-fi genres, ranking ground truth at #8; EDACF-Full understands through conflict memory that negative reviews reflect critical engagement rather than lack of interest, correctly ranking ground truth at #1.
Figure 7. Parameter sensitivity analysis. (a,b) User distribution under varying thresholds. (c,d) Recommendation performance under varying thresholds. Green dashed lines indicate default values.
Table 1. Statistics of preprocessed datasets. “Avg. Int.” denotes the average number of interactions per user.
| Dataset | #Users | #Items | #Int. | Sparsity | Avg. Int. |
|---|---|---|---|---|---|
| Movies & TV (full) | 2865 | 1876 | 22,902 | 99.57% | 7.99 |
| – Sampled | 100 | 712 | 848 | 98.81% | 8.48 |
| Books (full) | 2622 | 2161 | 23,134 | 99.59% | 8.82 |
| – Sampled | 100 | 689 | 804 | 98.84% | 8.04 |
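The derived columns in Table 1 follow from the raw counts: sparsity is 1 − #Int./(#Users × #Items) and Avg. Int. is #Int./#Users. A quick sanity check, with values rounded as in the table:

```python
def stats(users, items, interactions):
    """Return (sparsity %, average interactions per user), rounded to 2 decimals."""
    sparsity = 100 * (1 - interactions / (users * items))
    return round(sparsity, 2), round(interactions / users, 2)

print(stats(2865, 1876, 22902))  # Movies & TV (full) → (99.57, 7.99)
print(stats(2622, 2161, 23134))  # Books (full) → (99.59, 8.82)
print(stats(100, 712, 848))      # Movies & TV (sampled) → (98.81, 8.48)
```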
Table 2. Performance comparison of different methods on two datasets. We mark the best results in bold.
The first three metric columns report results on Amazon Movies & TV; the last three report results on Amazon Books.

| Method | N@5 | N@10 | MRR@10 | N@5 | N@10 | MRR@10 |
|---|---|---|---|---|---|---|
| *Traditional Recommendation Methods* | | | | | | |
| BPR-MF | 0.403 | 0.458 | 0.305 | 0.388 | 0.441 | 0.298 |
| SASRec | 0.453 | 0.515 | 0.368 | 0.438 | 0.498 | 0.355 |
| *LLM-based Recommendation Methods* | | | | | | |
| LLMRank | 0.412 | 0.468 | 0.325 | 0.398 | 0.452 | 0.312 |
| Agent4Rec | 0.433 | 0.492 | 0.342 | 0.418 | 0.475 | 0.330 |
| AgentCF | 0.446 | 0.507 | 0.358 | 0.431 | 0.489 | 0.346 |
| *Our Methods* | | | | | | |
| EDACF-Base | 0.443 | 0.503 | 0.354 | 0.427 | 0.485 | 0.342 |
| EDACF-Emotion | 0.468 | 0.532 | 0.381 | 0.453 | 0.514 | 0.367 |
| EDACF-Full | **0.490** * | **0.557** * | **0.405** * | **0.474** * | **0.538** * | **0.391** * |
Note: Results are averaged over 5 independent runs with standard deviations within 0.02 for all metrics. * indicates statistically significant improvement over AgentCF (Wilcoxon signed-rank test, p < 0.05). EDACF-Base uses only preference memory; EDACF-Emotion fuses preference and emotion memory; EDACF-Full is the complete framework with conflict detection and adaptive reasoning.
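The significance marks above come from a Wilcoxon signed-rank test over paired per-user scores. As a library-free illustration of the same paired-testing idea (not the exact test used in the paper), a sign-flip permutation test on hypothetical per-user NDCG@10 differences looks like this:

```python
import random

def paired_permutation_test(diffs, n_perm=10000, seed=0):
    """Two-sided Monte Carlo p-value for H0: the mean paired difference is zero."""
    rng = random.Random(seed)
    observed = abs(sum(diffs) / len(diffs))
    hits = 0
    for _ in range(n_perm):
        # Under H0 the sign of each paired difference is exchangeable.
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs) / len(diffs)
        if abs(flipped) >= observed:
            hits += 1
    return hits / n_perm

# Hypothetical per-user NDCG@10 gains of EDACF-Full over a baseline:
# consistent small gains across users yield a small p-value.
diffs = [0.05, 0.08, -0.01, 0.06, 0.04, 0.07, 0.02, 0.05, 0.03, 0.06]
p = paired_permutation_test(diffs)
print(p < 0.05)  # → True
```

The Wilcoxon test replaces the raw differences with signed ranks, which makes it robust to outliers; the permutation sketch conveys the same paired, distribution-free logic.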
Table 3. Ablation study results (Movies & TV dataset).
| Variant | NDCG@5 | NDCG@10 | MRR@10 |
|---|---|---|---|
| EDACF-Full | 0.490 | 0.557 | 0.405 |
| w/o Emotion memory | 0.426 | 0.484 | 0.344 |
| w/o Conflict memory | 0.468 | 0.532 | 0.381 |
| w/o Item agent | 0.438 | 0.498 | 0.358 |
| w/o Adaptive reasoning | 0.475 | 0.540 | 0.388 |
Table 4. Performance comparison of different LLMs on the Movies & TV dataset.
| LLM Model | NDCG@10 | MRR@10 |
|---|---|---|
| GPT-4o-mini | 0.557 | 0.405 |
| GPT-4o | 0.573 (+2.9%) | 0.417 (+3.0%) |
| DeepSeek-V3 | 0.554 (−0.5%) | 0.402 (−0.7%) |