Integrating Graph Neural Networks and Large Language Models for Stance Detection via Heterogeneous Stance Networks

Chen, Xinyi; Liu, Bo; Hu, Huaping; Cai, Yiqing; Guo, Mengmeng; Ma, Xingkong

doi:10.3390/app15115809


by Xinyi Chen ¹, Bo Liu ¹,²,*, Huaping Hu ¹, Yiqing Cai ¹, Mengmeng Guo ¹ and Xingkong Ma ¹,*

¹ College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China

² Strategic Assessments and Consultation Institute, Academy of Military Sciences, Beijing 100097, China

* Authors to whom correspondence should be addressed.

Appl. Sci. 2025, 15(11), 5809; https://doi.org/10.3390/app15115809

Submission received: 28 April 2025 / Revised: 12 May 2025 / Accepted: 19 May 2025 / Published: 22 May 2025


Abstract

Stance detection, the task of identifying the stance expressed in a text toward a specific target, is essential for analyzing public opinion across diverse domains. The existing approaches primarily focus on modeling the semantic relationship between the text and target, but they often struggle when the target is implicit or indirectly referenced. In real-world scenarios, stance is frequently conveyed through references to related entities, events, or contextual implications, making stance detection particularly challenging. To tackle this challenge, we propose a novel framework that leverages large language models to construct a heterogeneous stance network from textual data. Based on this network, we develop two complementary methodologies tailored for distinct application scenarios: (1) In a supervised setting, we employ a graph neural network approach to learn stance representations from the heterogeneous stance network, enhancing stance prediction performance. (2) For zero-shot stance detection, we introduce an LLM-based method that leverages the heterogeneous stance network to infer stance without task-specific supervision. The experimental results on benchmark datasets demonstrate that our methods outperform the existing approaches, highlighting their effectiveness in both supervised and zero-shot scenarios.

Keywords: stance detection; graph neural networks; large language models; graph learning; heterogeneous networks

1. Introduction

Stance detection, a fundamental task in natural language processing (NLP), aims to identify the stance (e.g., favor, against, or neutral) expressed in a text toward a specific target [1]. As digital communication grows exponentially—from social media debates to product reviews—understanding public sentiment has become crucial for decision-making in politics, business, and social research [2]. For example, stance detection can reveal shifts in public opinion on climate policies, helping governments to refine their messaging, or uncover consumer attitudes toward new products, enabling companies to adjust their marketing strategies.

The rise of artificial intelligence (AI) has revolutionized stance detection by automating the analysis of large-scale textual data. Early approaches relied on manual annotation and rule-based systems, but recent advances in machine learning (ML) and deep learning have significantly improved accuracy and scalability [3]. However, challenges persist, including ambiguity in language, contextual dependencies, and the need for target-specific adaptation.

The current stance detection methods can be broadly classified into two categories. The first category consists of supervised learning approaches [4,5], which train models on labeled data for stance prediction. These methods often leverage deep learning models like BERT [6], which effectively capture the semantic relationships between the text and target. The second category includes zero-shot methods [7,8], which infer stance on unseen targets without requiring specific labeled training data. Earlier zero-shot approaches, such as TOAD [9] and JointCL [10], relied on contrastive learning, transfer learning, or task-adaptive fine-tuning to generalize to new targets. Recently, large language models (LLMs) have been applied to zero-shot stance detection [11,12,13], leveraging in-context learning and prompt engineering to infer stance without task-specific supervision.

While these methods have shown promise, most existing approaches assume a direct textual connection between the stance and target [8,14,15], which is often not the case in real-world scenarios. Stance is frequently conveyed indirectly through references to intermediate entities, events, or contextual implications. For instance, as illustrated in Figure 1, a user may express a stance toward Donald Trump not by explicitly naming him but through hashtags (e.g., #MAGA) or by discussing related entities such as CNN. Similarly, social interactions between different users can imply support or opposition toward a target, even if their direct stance is not explicitly stated. This limitation constrains the effectiveness of both supervised and zero-shot approaches.

To address this limitation, we build on the observation that indirect stance expression often involves intermediate entities or contextual references linking the text and target. By identifying these entities and analyzing their stance relationships with both text and target, we construct a heterogeneous stance network that captures implicit stance connections and provides a structured representation for stance detection. Our objective is to utilize this network to improve stance inference in both supervised and zero-shot settings.

In this paper, we propose a novel framework consisting of two key components. First, we construct a heterogeneous stance network using LLMs to extract entities explicitly mentioned in the text and analyze stance relationships among the text, entities, and target. This network encodes implicit stance cues, providing a structured representation that enhances stance inference. Second, we develop two methodologies to leverage this network for different application scenarios: (1) a GNN-based approach that integrates graph information with textual features from BERT to learn stance representations, designed for supervised settings; and (2) an LLM-based zero-shot method that formulates prompts using the heterogeneous stance network, allowing an LLM to infer stance without requiring task-specific training data.

Our experimental results demonstrate the effectiveness of the proposed framework on two benchmark datasets, SemEval-2016 (Sem16) [1] and P-Stance [16], across both supervised and zero-shot settings, achieving superior performance compared to the existing approaches. In summary, our contributions are as follows:

We propose a heterogeneous stance network that systematically models indirect stance relationships, effectively addressing the challenge of implicit stance expression.
We develop two distinct methodologies: a GNN-based approach for supervised settings and an LLM-based zero-shot method, enabling robust stance detection in diverse scenarios.
Our framework establishes new state-of-the-art results on benchmark datasets across both supervised and zero-shot settings, demonstrating its superiority over the existing approaches.

2. Related Work

Stance detection has been extensively studied in NLP, with the existing methods primarily categorized into supervised and zero-shot approaches based on the availability of labeled training data. While supervised learning techniques have traditionally dominated the field, the increasing need for generalization to unseen targets has spurred research into cross-target, transfer learning, and zero-shot methodologies.

2.1. Supervised Approaches

Supervised approaches rely on labeled stance data for each target and optimize classification through task-specific learning. The early methods predominantly leveraged feature engineering with traditional machine learning algorithms such as support vector machines (SVMs) [17] and logistic regression [18], utilizing handcrafted lexical, syntactic, and sentiment-based features. However, deep learning techniques have significantly improved stance detection by enabling automatic feature extraction from raw text [14,19].

One of the earliest deep learning approaches, BiCond [20], employs a bi-directional conditional encoding framework that separately encodes text and target representations before merging them for final stance prediction. With the advent of transformer-based models, particularly BERT [6], stance detection has benefited from capturing richer contextual dependencies between text and target.

Building upon BERT-based architectures, researchers have explored various methods to enhance stance detection. CrossNet [21] adopts a self-attention layer to extract important contextual words toward the target in learning target-specific stance features, facilitating generalization. ASGCN [22] integrates aspect-based sentiment analysis (ABSA) with stance detection by employing graph convolutional networks (GCNs) to model syntactic dependencies in text. Similarly, TPDG [23] introduces a target-adaptive pragmatic dependency graph coupled with an interactive graph neural network, effectively capturing both intra-target and cross-target dependencies. While TPDG includes zero-shot experiments, its primary contribution lies in the supervised paradigm due to its reliance on labeled data for training.

In addition to these methods, other works have explored multi-task learning [24] and contrastive learning [25] to enhance model robustness and generalization. Despite achieving high accuracy with sufficient labeled data, supervised approaches remain constrained by their dependence on extensive annotations, limiting their applicability to emerging topics, new targets, and low-resource domains.

2.2. Zero-Shot Approaches

Zero-shot stance detection addresses the challenge of generalizing to unseen targets without requiring target-specific labeled training data [26]. The early zero-shot approaches mainly rely on domain transfer methods [9,27,28,29,30], contrastive learning techniques [31], and the integration of external knowledge [32,33] to learn generalizable stance representations.

For example, TOAD [9] formulates stance detection as a domain-adaptive learning problem, employing task-oriented adversarial learning to mitigate distribution shifts between seen and unseen targets. JointCL [10] employs contrastive learning to enhance target-agnostic representations by training on multiple datasets and minimizing task-specific biases.

Recently, LLMs, such as GPT-3.5, have emerged as powerful tools for zero-shot stance detection. Rather than task-specific training, in-context learning and prompt engineering enable LLMs to perform stance classification without fine-tuning. For example, based on GPT-3.5, COLA [13] introduces a collaborative, role-infused framework where multiple LLM-based agents interact to infer stance by modeling discourse context and user interactions. MB-Cal [34] focuses on reducing biases in LLMs for stance detection, using calibration techniques to enhance model robustness and fairness.

Although these methods have demonstrated effectiveness, both supervised and zero-shot approaches still face significant challenges in accurately detecting stance when the target is implicit as they often struggle to capture the nuanced relationships and indirect associations that arise in real-world scenarios.

2.3. Social and Ideological Dimensions of Stance

Beyond methodological advancements, recent studies have increasingly examined the societal implications of stance detection, particularly how online platforms amplify ideological divisions through implicit language. Research has shown that ambiguous stance expressions (e.g., sarcasm or coded language) contribute to echo chambers and polarization [35], while platform-specific linguistic patterns create unique challenges for stance interpretation [36]. These findings highlight the need for detection models that account for both textual signals and their social context—a gap our zero-shot approach helps to address by capturing indirect stance indicators without relying on explicit target references.

3. Methodology

3.1. Formal Task Definition

Given a set of posts $P = \{p_{1}, p_{2}, \dots, p_{n}\}$ and a set of targets $T = \{t_{1}, t_{2}, \dots, t_{m}\}$, the goal of stance detection is to identify the stance $s_{ij}$ expressed in a post $p_{i} \in P$ toward a target $t_{j} \in T$. Each stance $s_{ij}$ takes one of three labels (favor, against, or neutral), so the task amounts to classifying each post–target pair into one of these categories.

Beyond the direct relationship between post $p_{i}$ and target $t_{j}$, stance detection benefits from integrating relational information involving entities linked to posts and targets. To enhance stance detection, we extend the task to incorporate multi-hop relationships among posts, targets, and intermediate entities $e \in E$ (e.g., users, hashtags, organizations, and topics), represented as $p_{i} \to e \to t_{j}$ or more complex paths. This heterogeneous multi-hop relational structure can be effectively captured by constructing a heterogeneous stance network $G = (V, R)$, where the nodes $V$ include posts, targets, and intermediate entities, and the edges $R$ represent the relationships among these nodes.

Thus, stance detection can be formulated as learning a function $f : (P, T, G) \to S$, where $S = \{\text{favor}, \text{against}, \text{neutral}\}$, that predicts the stance $s_{ij}$ from both the semantic content of the posts and the relational information in the heterogeneous network $G$. Exploiting multi-hop relationships within this network is intended to improve stance prediction accuracy by exposing contextual dependencies between posts and targets.

Our framework, illustrated in Figure 2, consists of two key components: (1) heterogeneous stance network construction, where we utilize LLMs to process posts and targets from a given dataset, forming a heterogeneous stance network that incorporates multiple node types and diverse relational edges; and (2) multi-view stance detection, which integrates textual and graph-based representations for stance classification in both supervised and zero-shot scenarios. In the supervised setting, we employ BERT and relational graph convolutional networks (RGCNs) to jointly model post–target textual information and structural relationships for classification. In the zero-shot setting, we enhance LLMs’ ability to perform stance detection on implicit targets by encoding post–target connection paths within the network.

3.2. Heterogeneous Stance Network

3.2.1. Network Definition

As shown in Figure 2, the heterogeneous stance network (HSN) is designed to effectively capture indirect stance relationships prevalent in real-world discourse. Rather than assuming a direct connection between a post and its target, the network explicitly integrates intermediate entities to account for stance expression through references, interactions, or contextual factors. This structured design enables the network to capture richer semantic and relational information, enhancing interpretability and robustness, particularly in cases where the target is not explicitly mentioned. Moreover, even when a post explicitly states its stance toward the target, the network provides additional contextual enrichment, further refining the stance representation and improving detection accuracy.

The network consists of the following types of nodes:

Posts: Each post that expresses a stance toward a target is represented as a node in the network.
Targets: These nodes represent the subjects or entities toward which stance is expressed. Examples include political figures, organizations, or controversial topics.
Entities: Entities extracted from posts enrich contextual understanding. Depending on the dataset characteristics, entity nodes can be further categorized. For instance, in the case of “X” (formerly Twitter) data, entity nodes include the following: (1) hashtags: represent hashtags used in posts, which often serve as implicit stance indicators; (2) users: represent user mentions, capturing relationships between different social media users; (3) named entities: represent named entities such as people, organizations, or locations mentioned in the text.

Edges in the HSN represent relationships among these nodes and are categorized into three types:

Favor Edges: Indicating a positive or supportive stance.
Against Edges: Indicating an opposing or negative stance.
Neutral Edges: Indicating the absence of a strong stance.

This classification aligns with the stance label scheme commonly used in benchmark datasets such as SemEval-2016 (Sem16) [1], where the stance labels include favor, against, and none. In our network, we replace none with neutral to better reflect its semantic meaning.

3.2.2. Network Construction

The construction of the HSN involves extracting entities and determining relationships among nodes. We first extract entities from posts and targets using a combination of rule-based methods and LLMs. In the posts, certain entities, such as hashtags or users, can be extracted easily using regular expressions. As shown in Figure 3, for named entities, we utilize an LLM to extract them from the text. The entity extraction process is formalized as

$$E = \mathrm{RegEx}(P) \cup \mathrm{LLM}_{NER}(P), \tag{1}$$

where $\mathrm{RegEx}$ is a regular expression-based entity extractor, and $\mathrm{LLM}_{NER}$ is an LLM with a named entity recognition (NER) prompt. Relationships between nodes are determined by analyzing the stance expressed between posts/targets and entities. An LLM is used for stance detection between pairs of nodes, represented as

$$s = \mathrm{LLM}_{SD}(u, v), \tag{2}$$

where u and v represent nodes in the set of posts, targets, or entities. Using the identified relationships, we construct edges between nodes, where each edge captures the relationship type (favor, against, or neutral). To reduce query costs, stance detection is performed only for the given post–target pair and its directly extracted entities. Specifically, as demonstrated in Figure 3, for a given post–target pair, we determine stance relationships only between (1) the post and its extracted entities, and (2) the target and the extracted entities. This ensures that stance detection does not extend to entities extracted from other post–target pairs, effectively limiting unnecessary queries while maintaining a localized and computationally efficient stance representation. For subsequent training, we represent edges as undirected:

$$R = \{(u, s, v), (v, s, u) \mid u \in P \cup T,\ v \in E\} \tag{3}$$

Finally, the heterogeneous stance network is constructed as

$$G = (V, R), \tag{4}$$

where $V = P \cup T \cup E$ represents the set of nodes.
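The construction in Equations (1)–(4) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `toy_ner` and `toy_sd` are hypothetical stand-ins for the two LLM calls, the regular expression is one plausible pattern for hashtags and mentions, and the example post is invented.

```python
import re
from typing import Callable, Iterable, Set, Tuple

# A typed edge: (source node, stance label, destination node).
Triple = Tuple[str, str, str]

# Assumed pattern: hashtags and user mentions as "#word" / "@word".
HASHTAG_OR_MENTION = re.compile(r"[#@]\w+")


def extract_entities(post: str, llm_ner: Callable[[str], Iterable[str]]) -> Set[str]:
    """Eq. (1): union of regex matches and LLM-extracted named entities."""
    return set(HASHTAG_OR_MENTION.findall(post)) | set(llm_ner(post))


def build_edges(post: str, target: str, entities: Iterable[str],
                llm_sd: Callable[[str, str], str]) -> Set[Triple]:
    """Eqs. (2)-(3): query stance only for post-entity and target-entity
    pairs, and store every edge in both directions."""
    edges: Set[Triple] = set()
    entity_list = list(entities)
    for u in (post, target):
        for v in entity_list:
            s = llm_sd(u, v)  # expected: "favor", "against", or "neutral"
            edges.add((u, s, v))
            edges.add((v, s, u))
    return edges


# Hypothetical stand-ins for the LLM calls, for illustration only.
def toy_ner(text: str) -> list:
    return [name for name in ("CNN",) if name in text]


def toy_sd(u: str, v: str) -> str:
    return "neutral"


post = "Done with CNN for good #MAGA @some_user"
entities = extract_entities(post, toy_ner)
edges = build_edges(post, "Donald Trump", entities, toy_sd)
```

Because stance is queried only between the post or target and the entities extracted from that same post, the number of LLM calls per post–target pair is twice the number of extracted entities.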

To improve the representation of nodes that lack textual descriptions, we enhance entity nodes by generating descriptive text. This process involves using a large language model to generate a description for each entity node e:

$$\mathrm{desc}(e) = \mathrm{LLM}_{Desc}(e) \tag{5}$$

The generated descriptions are used as initial features for the corresponding nodes, which aids in initializing the node embeddings for downstream tasks. This enhancement ensures that even those nodes without an original textual description have rich representations that can be utilized during feature learning and graph-based stance detection. Notably, LLMs cannot generate descriptions for all nodes as some nodes (e.g., less-known or anonymous users) may lack sufficient contextual information or publicly available data for the model to infer meaningful descriptions.

Figure 3. The three-step process for constructing a heterogeneous stance network from tweets using LLMs. Step 1 extracts entities, Step 2 determines stance relations, and Step 3 enhances nodes with descriptions. For clarity, prompts are simplified, omitting certain details. Full prompts are in Appendix A.

3.3. Multi-View Stance Detection

Although the HSN effectively captures indirect stance relationships, the direct semantic link between a post and its target remains a crucial component of stance detection. To fully utilize both aspects, we introduce a multi-view stance detection framework that integrates a text view, capturing semantic features from textual content, and a graph view, which encodes contextual and relational dependencies using HSN. By integrating these complementary perspectives, our framework constructs a more holistic and robust stance representation. As shown in Figure 2, we introduce two distinct methodologies to cater to different application scenarios: a GCN-based approach, which learns structured stance representations through graph message passing, and an LLM-based approach, which utilizes HSN-enhanced prompting to infer stance in a zero-shot setting. Both approaches are built upon the proposed HSN and leverage its multi-view architecture to enhance stance detection performance.

3.3.1. GCN Approach

The text view captures the semantic representation of a post–target pair via a pre-trained BERT model. Specifically, each post $p_{i}$ and its corresponding target $t_{j}$ are concatenated with the special tokens [CLS] and [SEP] and then encoded using BERT.

The graph view models stance-related relationships using the HSN, and the initial node representations for post nodes, target nodes, and entity nodes are obtained using BERT embeddings. Since the text view and the graph view serve distinct tasks, we employ two separate BERT instances, denoted as $\mathrm{BERT}_{1}$ and $\mathrm{BERT}_{2}$, to optimize performance for each specific objective:

$$h_{ij}^{Text} = \mathrm{BERT}_{1}(\text{[CLS]}\ p_{i}\ \text{[SEP]}\ t_{j}\ \text{[SEP]}), \tag{6}$$

$$h_{u}^{(0)} = \begin{cases} \mathrm{BERT}_{2}(\mathrm{desc}(u)), & \text{if } u \in E \cup T \\ \mathrm{BERT}_{2}(u), & \text{if } u \in P \end{cases} \tag{7}$$

where $h_{ij}^{Text} \in \mathbb{R}^{d}$ denotes the feature embedding of the post–target pair, and $h_{u}^{(0)} \in \mathbb{R}^{d}$ denotes the initial graph representation of node $u$. Additionally, for nodes that lack textual descriptions, we use random initialization for their initial embeddings.

To effectively propagate relational information, we employ relational graph convolutional network (RGCN) layers [37], updating node embeddings based on their neighbors:

$$h_{u}^{(l+1)} = \sigma\left(\sum_{s \in S} \sum_{v \in N_{s}(u)} \frac{1}{c_{u,s}} W_{s}^{(l)} h_{v}^{(l)} + W_{0}^{(l)} h_{u}^{(l)}\right), \tag{8}$$

where $h_{u}^{(l+1)} \in \mathbb{R}^{d}$ is the updated feature representation of node $u$ at layer $l+1$. Here, $N_{s}(u)$ represents the neighbors of node $u$ connected via relation type $s$, and $c_{u,s}$ is a normalization factor. $W_{s}^{(l)}$ and $W_{0}^{(l)}$ are learnable weight matrices for the neighbors under relation $s$ and for the self-connection of the node, respectively, and $\sigma$ is a non-linear activation function (e.g., ReLU [38]). We denote the final representation of node $u$ as $h_{u}^{Graph}$.
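To make the update rule in Equation (8) concrete, the following NumPy sketch applies one RGCN layer to a toy graph. It assumes one weight matrix per relation type, ReLU as the activation, and the normalization $c_{u,s} = |N_{s}(u)|$; the text only says that $c_{u,s}$ is a normalization factor, so that choice is an assumption, as are the toy weights and features.

```python
import numpy as np


def rgcn_layer(h, edges, w_rel, w_self):
    """One RGCN update in the spirit of Eq. (8), with ReLU activation.

    h:      (num_nodes, d) float array of node features at layer l
    edges:  list of unique (u, s, v) triples; v is a neighbor of u under relation s
    w_rel:  dict mapping relation label s to a (d, d) weight matrix
    w_self: (d, d) weight matrix of the self-connection
    """
    # Self-connection term W_0 h_u for every node (creates a new array).
    out = h @ w_self.T
    # Assumed normalization: c_{u,s} = number of neighbors of u under relation s.
    counts = {}
    for u, s, _ in edges:
        counts[(u, s)] = counts.get((u, s), 0) + 1
    # Relation-specific, normalized messages from neighbors.
    for u, s, v in edges:
        out[u] += (w_rel[s] @ h[v]) / counts[(u, s)]
    return np.maximum(out, 0.0)


# Toy example: 3 nodes, 2-dimensional features, two relation types.
h = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
edges = [(0, "favor", 1), (0, "favor", 2), (0, "against", 1)]
w_rel = {"favor": 2.0 * np.eye(2), "against": -np.eye(2)}
h_next = rgcn_layer(h, edges, w_rel, np.eye(2))
```

In this toy graph only node 0 receives messages, so the other two rows of `h_next` are just the ReLU of their self-connection term.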

To integrate both text-based and graph-based features, we begin by concatenating the graph representations of the post $p_{i}$ and the target $t_{j}$. This combined representation is then passed through a fully connected layer to generate a shared feature representation:

$$h_{ij}^{Graph} = W_{1} \cdot \mathrm{concat}(h_{p_{i}}^{Graph}, h_{t_{j}}^{Graph}) + b_{1} \tag{9}$$

where $h_{ij}^{Graph} \in \mathbb{R}^{d}$ is the resulting graph-based feature vector, while $W_{1}$ and $b_{1}$ are a learnable weight matrix and bias vector, respectively. This operation fuses the graph-based information from both the post and the target into a unified feature representation.

To further refine the interaction between the textual and graph-based features and to capture potential dependencies between them, we employ a cross-attention mechanism. This mechanism allows the model to focus on the most relevant aspects of the textual and graph representations during fusion. The interaction is defined by

$$h_{ij} = \mathrm{CrossAttention}(Q = h_{ij}^{Text},\ K = V = h_{ij}^{Graph}), \tag{10}$$

where $h_{ij} \in \mathbb{R}^{d}$ is the resulting fused feature vector, incorporating the most important interactions between the text and graph features.
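As a rough illustration of the fusion step in Equation (10), the sketch below implements plain single-head scaled dot-product cross-attention in NumPy. The exact attention variant is not specified in the text, so several things here are assumptions: the learned query, key, and value projections are omitted, a single head is used, and the graph side is passed as a small matrix of one or more vectors.

```python
import numpy as np


def cross_attention(query, keys_values):
    """Single-head scaled dot-product attention without learned projections.

    query:       (d,) vector, here the text-view feature
    keys_values: (m, d) matrix used as both keys and values, here graph-view features
    """
    d = query.shape[0]
    scores = keys_values @ query / np.sqrt(d)  # one score per key, shape (m,)
    scores = scores - scores.max()             # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    return weights @ keys_values               # weighted sum of values, shape (d,)


h_text = np.array([1.0, 0.0, 0.0, 0.0])
h_graph = np.array([[0.5, 0.5, 0.0, 0.0]])     # a single graph-view vector
h_fused = cross_attention(h_text, h_graph)
```

When `keys_values` has a single row, the softmax weight is 1 and the output equals that row, so in that setting any mixing of the two views would come from the learned projections that this sketch leaves out.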

Finally, the stance $\hat{s}_{ij}$ is predicted using a softmax classifier, which generates a probability distribution over the stance categories favor, against, and neutral:

$$\hat{s}_{ij} = \mathrm{Softmax}(W_{2} h_{ij} + b_{2}), \tag{11}$$

where $W_{2}$ and $b_{2}$ are learnable parameters, and $\hat{s}_{ij}$ represents the predicted stance distribution. The prediction thus draws on both text-based and graph-based features, with their interaction refined through attention. The model is trained using the cross-entropy loss:

$$L = -\sum_{c \in \{\text{favor}, \text{against}, \text{neutral}\}} y_{ij}^{(c)} \log \hat{s}_{ij}^{(c)} \tag{12}$$

where $y_{ij}^{(c)}$ is the one-hot ground truth label, and $\hat{s}_{ij}^{(c)}$ is the predicted probability for class $c$.
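The classifier and loss in Equations (11) and (12) reduce to a few lines of NumPy. The sketch below is illustrative only: the label order, the 4-dimensional feature vector, and the constant weights are assumptions chosen so that the numbers are easy to check by hand.

```python
import numpy as np

# Assumed label order for the three output units.
LABELS = ("favor", "against", "neutral")


def softmax(z):
    z = z - np.max(z)  # numerical stability
    e = np.exp(z)
    return e / e.sum()


def predict(h, w2, b2):
    """Eq. (11): probability distribution over the three stance labels."""
    return softmax(w2 @ h + b2)


def cross_entropy(probs, gold):
    """Eq. (12) for a one-hot label: minus the log-probability of the gold class."""
    return -float(np.log(probs[LABELS.index(gold)]))


# With an all-zero fused feature and zero bias, the logits are equal,
# so the prediction is uniform and the loss is log(3).
probs = predict(np.zeros(4), np.ones((3, 4)), np.zeros(3))
loss = cross_entropy(probs, "favor")
```

Because the label is one-hot, the sum in Equation (12) keeps only the term for the gold class, which is why `cross_entropy` indexes a single probability.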

3.3.2. LLM Approach

Although the GCN-based approach effectively captures explicit stance structures, its reliance on labeled training data limits its ability to generalize in zero-shot scenarios. To address this limitation, we introduce an LLM-based stance detection method, which enhances the model’s reasoning capability by incorporating external knowledge from the HSN.

Rather than depending solely on textual content, our method enriches the LLM’s input by incorporating structured knowledge extracted from the HSN. Specifically, we identify stance-relevant paths within the HSN, particularly those connecting the post and target through intermediate entities in two hops. These paths are then converted into natural language statements and appended to the LLM prompt as additional context.

The choice of two-hop paths is intentional and based on the structure of the HSN. Paths between the post and target always have an even length, such as post–entity–target or post(A)–entity(B)–post(C)–entity(D)–target(E). Beyond two hops, the number of potential paths grows exponentially, leading to increased complexity and resource consumption. This could also introduce irrelevant information that might hurt model performance. Thus, restricting the path length to two hops ensures a balance between enriching the input and maintaining efficiency.

As illustrated in Figure 4, the given post does not explicitly state its stance toward Atheism. However, the HSN provides critical clues: (1) the post expresses an against stance toward religion, and (2) Atheism also holds an against stance toward religion. By analyzing these indirect stance relationships, the LLM deduces that the post expresses a favor stance toward Atheism. This method effectively enables zero-shot generalization by allowing the LLM to leverage structured knowledge for stance detection, even when direct post–target interactions are absent.
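The path extraction and verbalization described above can be sketched as follows. This is a hedged illustration: the edge triples mirror the Atheism example, while the node name `post_1`, the `#blessed` hashtag, and the sentence template are hypothetical; the actual prompt wording is given in the paper's appendix and is not reproduced here.

```python
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (source node, stance label, destination node)


def two_hop_paths(edges: Iterable[Triple], post: str, target: str) -> List[Tuple[str, str, str]]:
    """Return (post stance, entity, target stance) for each entity that is
    linked to both the post and the target."""
    edge_list = list(edges)
    post_side = {v: s for u, s, v in edge_list if u == post}
    target_side = {v: s for u, s, v in edge_list if u == target}
    shared = sorted(set(post_side) & set(target_side))
    return [(post_side[e], e, target_side[e]) for e in shared]


def verbalize(paths, target: str) -> List[str]:
    """Turn each two-hop path into a plain sentence (hypothetical template)."""
    return [
        f"The post's stance toward {entity} is '{s_post}'; "
        f"the stance of {target} toward {entity} is '{s_target}'."
        for s_post, entity, s_target in paths
    ]


# Toy HSN fragment modeled on the Atheism example, with undirected edges
# stored in both directions.
edges = {
    ("post_1", "against", "religion"), ("religion", "against", "post_1"),
    ("Atheism", "against", "religion"), ("religion", "against", "Atheism"),
    ("post_1", "neutral", "#blessed"), ("#blessed", "neutral", "post_1"),
}
paths = two_hop_paths(edges, "post_1", "Atheism")
context_lines = verbalize(paths, "Atheism")
```

Only entities connected to both endpoints yield a path, so `#blessed`, which is linked to the post alone, contributes nothing to the prompt context.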

4. Experiments

4.1. Experimental Setup

4.1.1. Datasets

To thoroughly assess our model’s performance, we employ two widely recognized stance detection datasets: SEM16 [1] and P-Stance [16]. These datasets are chosen to encompass both broad and politically focused stance detection scenarios, ensuring robustness and generalizability. The SEM16 dataset originates from the SemEval-2016 Task 6 competition and is a standard benchmark for stance detection. It consists of 4870 tweets expressing stances toward six targets: Atheism (AT), Climate Change is a Real Concern (CC), Feminism (FM), Hillary Clinton (HC), Legalization of Abortion (LA), and Donald Trump (DT). Notably, the DT target lacks training data, making supervised learning infeasible for this category. The P-Stance dataset is a large-scale stance detection dataset collected from social media, specifically designed for political stance detection. It contains 21,574 tweets related to three major political figures: Donald Trump, Joe Biden, and Bernie Sanders, with a balanced distribution across stance categories to reduce label bias. Table 1 provides a summary of dataset statistics. Both datasets adopt three labels: favor, against, and none. In our implementation, we reinterpret none as neutral to better align with our stance representation.

4.1.2. Evaluation

We adopt the Macro F1 score as our evaluation metric, consistent with prior stance detection studies. Specifically, we compute the average F1 score for the favor and against classes only, excluding the neutral class. This evaluation strategy enables fair comparisons with previous studies and better captures the model’s effectiveness in distinguishing polarized opinions. The Macro F1 score is calculated by first computing the harmonic mean of precision and recall for each class independently and then averaging the scores for the favor and against categories.
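The metric described above can be written out directly. The sketch below is a plain-Python illustration with invented toy labels; it returns 0 for a class with no true positives, which is a common convention and an assumption here.

```python
def f1_for_class(gold, pred, label):
    """F1 for one class: harmonic mean of precision and recall."""
    tp = sum(g == label and p == label for g, p in zip(gold, pred))
    fp = sum(g != label and p == label for g, p in zip(gold, pred))
    fn = sum(g == label and p != label for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0  # assumed convention when there are no true positives
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1_favor_against(gold, pred):
    """Average of the favor and against F1 scores; neutral gets no F1 of its own."""
    return (f1_for_class(gold, pred, "favor") + f1_for_class(gold, pred, "against")) / 2


gold = ["favor", "favor", "against", "against", "neutral"]
pred = ["favor", "against", "against", "neutral", "favor"]
score = macro_f1_favor_against(gold, pred)
```

Neutral examples still affect the score indirectly: a neutral post predicted as favor counts as a false positive for the favor class, as in the last item of the toy data.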

We conduct two main types of experiments:

Supervised Experiment: In this setting, models are trained and evaluated on the same set of targets. This represents a conventional stance detection scenario where the model has prior exposure to specific targets during both training and testing phases.

Zero-Shot Experiment: In the zero-shot setting, the model is tested on unseen targets, without any prior exposure to them during training. This setting evaluates the model’s ability to generalize by inferring stances for previously unseen targets using learned representations and relationships.

4.1.3. Baselines

We evaluate our model against a diverse set of strong baselines in both supervised and zero-shot settings to thoroughly assess its effectiveness.

In the supervised setting, we compare our MVSD(GCN) against BiCond [20], BERT [6], CrossNet [21], ASGCN [22], and TPDG [23].

For the zero-shot setting, we compare our MVSD(LLM) against TOAD [9], JointCL [10], GPT-3.5-direct, COLA [13], and MB-Cal [34]. Among these, TOAD and JointCL approximate the zero-shot scenario by systematically excluding one target from the training set, enabling the model to learn from related instances. Conversely, COLA, MB-Cal, and MVSD(LLM) utilize GPT-3.5 without labeled training data, showcasing the potential of LLMs for zero-shot stance detection. GPT-3.5-direct refers to directly using GPT-3.5 for stance detection without any additional prompts.
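The leave-one-target-out protocol used by the trained zero-shot baselines can be sketched as a simple split. The record layout and the toy examples below are hypothetical and serve only to show the mechanics.

```python
def leave_one_target_out(examples, held_out):
    """Train on every target except `held_out`; test only on `held_out`."""
    train = [ex for ex in examples if ex["target"] != held_out]
    test = [ex for ex in examples if ex["target"] == held_out]
    return train, test


# Hypothetical toy records; real datasets hold thousands of tweets per target.
examples = [
    {"post": "post a", "target": "Atheism", "stance": "favor"},
    {"post": "post b", "target": "Feminism", "stance": "against"},
    {"post": "post c", "target": "Hillary Clinton", "stance": "neutral"},
    {"post": "post d", "target": "Atheism", "stance": "against"},
]
train, test = leave_one_target_out(examples, "Atheism")
```

Repeating the split once per target yields one zero-shot score per target, which can then be averaged.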

4.1.4. Implementation Details

Experiments were conducted on a computing server featuring four NVIDIA A100 GPUs. We utilized GPT-3.5 Turbo via the OpenAI (https://openai.com/, accessed on 8 November 2024) API, while Llama3.1-70B and Qwen2.5-72B were downloaded from Hugging Face (https://huggingface.co/models, accessed on 1 November 2024) and deployed locally on our server. We implemented our model in PyTorch 2.0.0, using BERT-base-cased as the backbone with a hidden dimension of 768. We employed a two-layer RGCN, with each layer producing an output dimension of 768. We optimized the model using AdamW with a learning rate of $1 \times 10^{-5}$.

4.2. The Overall Comparison

Table 2 and Table 3 report the Macro F1 scores of various models on the SEM16 and P-Stance datasets across both supervised and zero-shot settings. The results demonstrate that our proposed models, MVSD(GCN) for the supervised setting and MVSD(LLM) for the zero-shot setting, achieve the highest overall performance across both datasets, underscoring the effectiveness of our multi-view stance detection approach.

In the supervised setting, MVSD(GCN) significantly outperforms other baselines, achieving average F1 scores of 75.4% on SEM16 and 81.4% on P-Stance. MVSD(GCN)’s success is driven by two key factors. First, it utilizes the LLM-generated HSN, which enriches the graph with additional contextual and relational information, significantly enhancing its ability to capture nuanced stance dependencies. Second, MVSD(GCN) integrates textual features with graph-based representations, effectively linking linguistic cues to structural relationships for a deeper understanding of stance dynamics.

In the zero-shot setting, MVSD(LLM) achieves state-of-the-art performance, with average F1 scores of 75.0% on SEM16 and 84.1% on P-Stance, outperforming the existing zero-shot models on average. LLM-based approaches generally outperform traditional learning-based methods due to their superior adaptability to unseen targets. Notably, directly using GPT-3.5 without optimization leads to suboptimal performance, underscoring the critical role of prompt engineering in enhancing model reasoning. COLA, MB-Cal, and MVSD(LLM) refine prompt engineering strategies to enhance LLM reasoning, resulting in significantly better performance. Among these, MVSD(LLM) excels on SEM16 (75.0% versus 72.5% for MB-Cal and 70.2% for COLA), highlighting its strength in incorporating HSN-augmented knowledge for enhanced stance detection. In contrast, on P-Stance, MVSD(LLM) performs comparably to COLA and MB-Cal (84.1% versus 83.4% and 83.8%). This difference stems from dataset characteristics: SEM16 includes numerous abstract targets, where external knowledge and relational reasoning from HSN significantly enhance stance detection. Conversely, P-Stance centers on political figures, where stance expressions are more explicit, diminishing the impact of graph-based enhancements. A particularly notable case is the performance on the target Atheism (AT in Table 2) in SEM16. MVSD(LLM) outperforms direct GPT-3.5 usage by more than 36 percentage points (56.9% versus 20.6%) but trails MB-Cal (66.5%) and COLA (70.8%) by roughly 10 to 14 points. This discrepancy arises from prompt design: while MVSD(LLM) employs simple prompts to leverage HSN, COLA and MB-Cal utilize more complex prompts that enable finer-grained target comprehension.
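The averages and the Atheism gap discussed above can be recomputed directly from the per-target scores in Table 2. The short script below is only an arithmetic check on the published numbers; the variable names are illustrative, and the column order (DT, HC, FM, LA, AT, CC) follows the table.

```python
from statistics import mean

# Per-target Macro F1 (%) copied from the zero-shot rows of Table 2.
# Column order: DT, HC, FM, LA, AT, CC.
zero_shot = {
    "GPT-3.5-direct": [62.3, 66.2, 60.5, 60.3, 20.6, 56.9],
    "COLA": [68.5, 81.7, 63.4, 71.0, 70.8, 65.5],
    "MB-Cal": [72.8, 80.3, 75.8, 68.8, 66.5, 71.0],
    "MVSD(LLM)": [76.1, 82.0, 79.9, 77.7, 56.9, 77.3],
}
AT = 4  # index of the Atheism column

# Average over the six targets for each model.
averages = {name: mean(scores) for name, scores in zero_shot.items()}

# Gap on Atheism between MVSD(LLM) and each other model
# (positive means MVSD(LLM) is ahead).
ours_at = zero_shot["MVSD(LLM)"][AT]
at_gaps = {
    name: ours_at - scores[AT]
    for name, scores in zero_shot.items()
    if name != "MVSD(LLM)"
}

for name, avg in averages.items():
    print(f"{name}: avg = {avg:.2f}")
for name, gap in at_gaps.items():
    print(f"MVSD(LLM) minus {name} on AT: {gap:+.1f}")
```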

The strengths and weaknesses of MVSD(GCN) and MVSD(LLM) vary based on the availability of training data. On SEM16, MVSD(GCN) outperforms MVSD(LLM) overall, benefiting from the ability to fine-tune on labeled training data. Unlike LLM-based methods that encode HSN information via prompt engineering, GCN offers a holistic perspective by explicitly modeling stance relationships within the dataset. This allows MVSD(GCN) to better capture complex indirect stance dependencies, an advantage that is particularly pronounced for abstract targets. On P-Stance, however, the advantage of GCN is limited. Since stance expressions in this dataset tend to be more direct, the need for global relational reasoning is less critical. As a result, MVSD(LLM) surpasses MVSD(GCN)’s performance without any training data, underscoring its strong generalization ability.

In summary, the results validate the effectiveness of our proposed models across both supervised and zero-shot stance detection tasks. MVSD(GCN) demonstrates its superiority in leveraging training data to build structured graph-based representations, while MVSD(LLM) showcases the potential of prompt-optimized LLMs for generalizing to novel targets without the need for labeled data. These results underscore the robustness and versatility of our multi-view stance detection framework in real-world applications.

4.3. Comparison of Different LLMs

We validate the effectiveness of our models by conducting experiments on the SEM16 dataset using three LLMs: Qwen-2.5, Llama-3.1, and GPT-3.5. These models are selected because they are comparable in terms of their capabilities and performance, representing a diverse range of LLMs. Table 4 and Figure 5 present a comparative analysis of stance detection performance across different LLM-generated graphs. The results highlight the significant influence of LLM-generated graph structures on the performance of our MVSD models.

Table 4 shows that MVSD(GCN) consistently achieves the best performance across all LLM-generated graphs. Graphs generated by Qwen-2.5 (81.3%) and Llama-3.1 (80.5%) yield stronger results than those from GPT-3.5 (75.4%). MVSD(LLM) achieves 78.4% with Qwen-2.5 and 78.9% with Llama-3.1, although it slightly lags behind MVSD(GCN). The LLM-Direct approach, which omits graph structures, performs the worst, especially with GPT-3.5 (54.5%), highlighting the crucial role of graph-based features in stance detection. In summary, the LLMs can be ranked as Qwen-2.5 ≈ Llama-3.1 > GPT-3.5 in terms of performance.

Figure 5 visualizes the differences in node generation across LLMs, categorizing nodes into users, hashtags, and entities. Users and hashtags are extracted using regular expressions, ensuring the number of each is consistent across models. Qwen-2.5 and Llama-3.1 generate graphs with approximately 18% more entity nodes than GPT-3.5, resulting in denser graphs. This structural difference suggests that richer, more comprehensive graph representations improve stance detection performance.
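Because user and hashtag nodes come from regular expressions rather than from an LLM, their extraction is deterministic, which is why their counts match across models. The sketch below shows one way this could look. The exact expressions are not given in this section, so both patterns and the lowercasing step are assumptions, and `extract_user_and_hashtag_nodes` is a hypothetical helper name.

```python
import re
from typing import Dict, List

# Assumed patterns: Twitter-style handles and hashtags made of word characters.
# The negative lookbehind avoids matching the "@" inside an e-mail address.
MENTION_RE = re.compile(r"(?<!\w)@(\w+)")
HASHTAG_RE = re.compile(r"(?<!\w)#(\w+)")


def extract_user_and_hashtag_nodes(tweet: str) -> Dict[str, List[str]]:
    """Return de-duplicated, lowercased user and hashtag node names."""
    users = sorted({m.lower() for m in MENTION_RE.findall(tweet)})
    hashtags = sorted({h.lower() for h in HASHTAG_RE.findall(tweet)})
    return {"users": users, "hashtags": hashtags}


nodes = extract_user_and_hashtag_nodes(
    "@PatrioticCat loves #MAGA, but @LuckyDog says #maga is over. Mail a@b.com"
)
print(nodes)
# {'users': ['luckydog', 'patrioticcat'], 'hashtags': ['maga']}
```

Entity nodes, by contrast, depend on each LLM's output, which is where the differences in Figure 5 originate.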

These findings emphasize the importance of choosing LLMs that generate structurally rich and contextually diverse graphs. Models like Qwen-2.5 and Llama-3.1, which generate more informative heterogeneous networks, significantly improve stance classification performance.

To further illustrate the differences in entity extraction among various LLMs, we visualize the extracted entities using word clouds, as shown in Figure 6. This visualization provides an intuitive comparison of the models’ extraction behaviors, revealing both differences and commonalities. Notably, while each LLM exhibits unique extraction patterns, certain key entities appear consistently across all models. For instance, the entity ‘God’ is extracted frequently by all three models, highlighting its significance in stance detection, particularly concerning the Atheism label. Expressions of support for ‘God’ implicitly indicate opposition to Atheism, reinforcing the entity’s relevance in stance classification. Similarly, entities such as ‘feminist’ are predominantly associated with the Feminism label, reflecting their ideological importance. This figure visually represents a crucial aspect of how our HSN is constructed and further explains why incorporating HSN enhances performance. By capturing and structuring these key entities, HSN effectively models the underlying relationships between different stance categories, thereby improving the overall stance detection process.
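The intuition that support for ‘God’ implies opposition to Atheism can be made concrete with a toy sign-composition rule over a two-hop path in the HSN. This is only an illustrative heuristic and not how MVSD(GCN) or MVSD(LLM) actually infer stance. The edge labels below, including the feminist-to-Feminism edge, are assumed for the example.

```python
from typing import Dict, List, Tuple

SIGN = {"favor": 1, "against": -1, "neutral": 0}
LABEL = {1: "favor", -1: "against", 0: "neutral"}


def compose(stances: List[str]) -> str:
    """Multiply edge signs along a path; any neutral edge yields neutral."""
    product = 1
    for stance in stances:
        product *= SIGN[stance]
    return LABEL[product]


# Assumed toy edges: (source, destination) -> stance.
hsn: Dict[Tuple[str, str], str] = {
    ("tweet_1", "God"): "favor",
    ("God", "Atheism"): "against",
    ("tweet_2", "feminist"): "favor",
    ("feminist", "Feminism"): "favor",
}


def infer(source: str, entity: str, target: str) -> str:
    """Infer the source's stance toward the target through one entity."""
    return compose([hsn[(source, entity)], hsn[(entity, target)]])


print(infer("tweet_1", "God", "Atheism"))        # against
print(infer("tweet_2", "feminist", "Feminism"))  # favor
```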

4.4. Ablation Studies

To better understand the contributions of different components in our MVSD(GCN) and MVSD(LLM) models, we perform an ablation study by systematically removing key elements and evaluating their impact on performance. Table 5 presents the average Macro F1 scores after ablating different components. Specifically, “w/o R” replaces RGCN with a standard GCN, which does not differentiate between stance relationships (favor, against, or neutral). “w/o H-node”, “w/o U-node”, and “w/o E-node” remove hashtag, user, and named entity nodes, respectively, while “w/o Att” removes the self-attention mechanism.

For MVSD(GCN), replacing RGCN with a standard GCN (w/o R) causes a substantial performance drop from 75.4% to 62.4%, highlighting the importance of relational modeling in stance detection. The inability to distinguish between stance relationships weakens the model’s ability to capture nuanced interactions. Removing specific node types also degrades performance, with the most significant drop occurring when entity nodes are removed (71.5%), underscoring the importance of entity information in stance reasoning. Eliminating self-attention (w/o Att) leads to a moderate decline (72.7%), indicating that attention-based interaction modeling refines the model’s representations.
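To see why collapsing the relation types hurts, consider a deliberately simplified relational graph convolution step on scalar features. The sketch below is not the model's implementation: the weights are hand-picked rather than learned, features are scalars instead of 768-dimensional vectors, and the nonlinearity is omitted. It only shows that relation-specific weights let a favor edge and an against edge produce different target representations, whereas a single shared weight (the "w/o R" setting) cannot.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Edge = Tuple[str, str, str]  # (source node, relation, destination node)


def relational_layer(
    h: Dict[str, float],
    edges: List[Edge],
    w_rel: Dict[str, float],
    w_self: float = 1.0,
) -> Dict[str, float]:
    """One simplified relational graph convolution step on scalar features.

    Each node adds its own weighted feature to the mean of its incoming
    neighbors' features, computed per relation and scaled by that
    relation's weight.
    """
    incoming: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for src, rel, dst in edges:
        incoming[dst][rel].append(src)

    out = {}
    for node, value in h.items():
        total = w_self * value
        for rel, sources in incoming[node].items():
            total += w_rel[rel] * sum(h[s] for s in sources) / len(sources)
        out[node] = total
    return out


features = {"tweet": 1.0, "target": 0.0}
favor_graph = [("tweet", "favor", "target")]
against_graph = [("tweet", "against", "target")]

# Hand-picked weights (assumed): one per relation vs. one shared by all.
relation_specific = {"favor": 1.0, "against": -1.0, "neutral": 0.0}
shared = {"favor": 1.0, "against": 1.0, "neutral": 1.0}

rgcn_favor = relational_layer(features, favor_graph, relation_specific)["target"]
rgcn_against = relational_layer(features, against_graph, relation_specific)["target"]
gcn_favor = relational_layer(features, favor_graph, shared)["target"]
gcn_against = relational_layer(features, against_graph, shared)["target"]

print(rgcn_favor, rgcn_against)  # 1.0 -1.0
print(gcn_favor, gcn_against)    # 1.0 1.0
```

With shared weights the two graphs are indistinguishable at the target node, which is consistent with the large drop reported for "w/o R".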

For MVSD(LLM), we observe similar trends, with the removal of hashtag (70.2%), user (71.4%), or entity nodes (68.6%) consistently reducing performance relative to the full model (75.0%). MVSD(LLM) is notably more sensitive to the removal of entity nodes than MVSD(GCN), losing 6.4 points compared with 3.9 points, suggesting that LLM-based features benefit more from explicit entity modeling. This highlights the critical role of structured knowledge in enhancing LLM-driven stance classification.

Overall, these results confirm that both relational graph modeling and multi-view representations are crucial for stance detection. Removing any key component causes significant performance degradation, reinforcing the need to integrate structured knowledge and multi-view information in our approach.

5. Conclusions

In this work, we propose MVSD, a multi-view stance detection framework that effectively integrates textual and structural features for both supervised and zero-shot settings. By leveraging heterogeneous stance networks with MVSD(GCN) and MVSD(LLM), our approach consistently outperforms the existing baselines across various datasets. The experiments on SEM16 and P-Stance demonstrate that MVSD(GCN) excels when labeled data are available, while MVSD(LLM) showcases strong generalization capabilities in zero-shot scenarios. Additionally, an ablation study further emphasizes the critical roles of relational graph modeling, multi-view representations, and attention mechanisms in improving performance.

Despite its strengths, our framework has several limitations. First, MVSD(GCN) relies on graph structure quality, which may degrade if noisy or incomplete stance relations are present. Second, while MVSD(LLM) reduces dependency on labeled data, its zero-shot performance is still constrained by the LLM’s pre-trained knowledge and prompt design. Third, our current approach focuses on static stance detection, leaving dynamic stance shifts (e.g., temporal evolution in debates) for future exploration.

Our work highlights the value of combining structural (HSN) and semantic (text) views for stance detection, offering a more robust solution than single-view methods. The success of MVSD(LLM) in zero-shot settings suggests that LLMs, when guided by relational graphs, can reduce annotation costs while maintaining competitive accuracy, a promising direction for low-resource scenarios. Moreover, the framework’s modular design allows extensions to other graph-aware NLP tasks, such as rumor detection or opinion mining.

For future work, we aim to enhance our approach by (1) generating higher-quality heterogeneous stance networks to capture richer relational information, (2) incorporating robust feature fusion methods, and (3) refining task-specific prompts for LLMs to improve zero-shot performance. Furthermore, we plan to explore how this architecture can support deeper interpretive analysis of ideological discourse online, such as uncovering latent bias patterns or tracing the evolution of narratives across communities. These directions will help to push the boundaries of stance detection and increase its real-world applicability while bridging the gap between predictive modeling and sociolinguistic insights.

Author Contributions

Writing—original draft preparation, X.C.; supervision, B.L. and H.H.; writing—review and editing, Y.C. and M.G.; resources, X.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are available at “https://github.com/chuchun8/PStance” (P-Stance) (accessed on 1 May 2024) and “https://alt.qcri.org/semeval2016/task6/” (SEM16) (accessed on 1 May 2024).

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

NLP	Natural Language Processing
BERT	Bi-directional Encoder Representations from Transformers
LLM	Large Language Model
GNN	Graph Neural Network
HSN	Heterogeneous Stance Network
GCN	Graph Convolutional Network
RGCN	Relational Graph Convolutional Network
MVSD	Multi-View Stance Detection
GPU	Graphics Processing Unit

Appendix A. Details of the Prompts

To support further research in constructing the heterogeneous stance network, we provide the details of the prompts used in our experiments in Table A1.

Table A1. Prompts’ details.

Function	Prompt
Extracting entities from tweet (step 1)	In the following tweet, identify entities (concepts, people, events, etc.) that indirectly express a stance toward the target. These should be things that are not the target itself but are related to it and help to express an opinion about it. If multiple entities are found, please separate them with commas. Return ‘None’ if no entities are found. Tweet: [tweet] Target: [target]
Classifying the stance between the tweet and the entity (step 2)	Classify the stance of the following tweet towards the entity as either ‘favor’, ‘against’, or ‘neutral’. If it is ambiguous or unclear, return ‘neutral’. DO NOT RETURN ANYTHING ELSE. Tweet: [tweet] Entity: [entity]
Classifying the stance between the target and the entity (step 2)	According to the tweet, classify the stance between the entity and the target as either ‘favor’, ‘against’, or ‘neutral’. If it is ambiguous or unclear, return ‘neutral’. DO NOT RETURN ANYTHING ELSE. Tweet: [tweet] Entity: [entity] Target: [target]
Enhancing textual description of the entity (step 3)	Please briefly describe [entity]. If you do not know, return ‘unknown’.
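The prompts in Table A1 can be chained into a small edge-construction routine, sketched below for steps 1 and 2. This is an illustrative outline under several assumptions: the LLM is abstracted as any callable from prompt to reply, the helper names (`fill`, `parse_entities`, `parse_stance`, `build_edges`) are hypothetical, and falling back to ‘neutral’ for unparseable replies is our own choice. The `fake_llm` stub returns canned answers so the example runs without an API.

```python
import re
from typing import Callable, List, Tuple

# Prompt templates copied from Table A1 (steps 1 and 2).
EXTRACT_PROMPT = (
    "In the following tweet, identify entities (concepts, people, events, etc.) "
    "that indirectly express a stance toward the target. These should be things "
    "that are not the target itself but are related to it and help to express an "
    "opinion about it. If multiple entities are found, please separate them with "
    "commas. Return ‘None’ if no entities are found. Tweet: [tweet] Target: [target]"
)
TWEET_ENTITY_PROMPT = (
    "Classify the stance of the following tweet towards the entity as either "
    "‘favor’, ‘against’, or ‘neutral’. If it is ambiguous or unclear, return "
    "‘neutral’. DO NOT RETURN ANYTHING ELSE. Tweet: [tweet] Entity: [entity]"
)
ENTITY_TARGET_PROMPT = (
    "According to the tweet, classify the stance between the entity and the "
    "target as either ‘favor’, ‘against’, or ‘neutral’. If it is ambiguous or "
    "unclear, return ‘neutral’. DO NOT RETURN ANYTHING ELSE. "
    "Tweet: [tweet] Entity: [entity] Target: [target]"
)
STANCES = {"favor", "against", "neutral"}
QUOTES = " '\"‘’"
Edge = Tuple[str, str, str]  # (source, stance, destination)


def fill(template: str, **fields: str) -> str:
    """Substitute the [tweet], [entity], and [target] placeholders in one pass."""
    return re.sub(r"\[(tweet|entity|target)\]", lambda m: fields[m.group(1)], template)


def parse_entities(reply: str) -> List[str]:
    """Split a comma-separated reply; 'None' or an empty reply means no entities."""
    if reply.strip(QUOTES + ".").lower() == "none":
        return []
    return [e.strip(QUOTES) for e in reply.split(",") if e.strip(QUOTES)]


def parse_stance(reply: str) -> str:
    """Normalize a reply to a stance label (assumed fallback: neutral)."""
    label = reply.strip(QUOTES + ".").lower()
    return label if label in STANCES else "neutral"


def build_edges(tweet_id: str, tweet: str, target: str,
                llm: Callable[[str], str]) -> List[Edge]:
    """Create tweet-entity and entity-target stance edges for one tweet."""
    edges: List[Edge] = []
    reply = llm(fill(EXTRACT_PROMPT, tweet=tweet, target=target))
    for entity in parse_entities(reply):
        to_entity = parse_stance(
            llm(fill(TWEET_ENTITY_PROMPT, tweet=tweet, entity=entity)))
        to_target = parse_stance(
            llm(fill(ENTITY_TARGET_PROMPT, tweet=tweet, entity=entity, target=target)))
        edges.append((tweet_id, to_entity, entity))
        edges.append((entity, to_target, target))
    return edges


def fake_llm(prompt: str) -> str:
    """Stand-in for a real LLM call; returns canned replies."""
    if prompt.startswith("In the following tweet"):
        return "God, church"
    if prompt.startswith("Classify the stance"):
        return "favor"
    return "Against."


edges = build_edges("tweet_1", "Thank God for my church family.", "Atheism", fake_llm)
for edge in edges:
    print(edge)
```

In practice, `fake_llm` would be replaced by a call to the chosen model, and step 3 (entity descriptions) would add text attributes to the entity nodes.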

References

1. Mohammad, S.; Kiritchenko, S.; Sobhani, P.; Zhu, X.; Cherry, C. SemEval-2016 Task 6: Detecting Stance in Tweets. In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016), San Diego, CA, USA, 16–17 June 2016.
2. Küçük, D.; Can, F. Stance Detection: A Survey. ACM Comput. Surv. 2020, 53, 12.
3. ALDayel, A.; Magdy, W. Stance detection on social media: State of the art and trends. Inf. Process. Manag. 2021, 58, 102597.
4. Bar-Haim, R.; Bhattacharya, I.; Dinuzzo, F.; Saha, A.; Slonim, N. Stance Classification of Context-Dependent Claims. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers; Lapata, M., Blunsom, P., Koller, A., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2017; pp. 251–261.
5. Wei, P.; Mao, W.; Zeng, D. A Target-Guided Neural Memory Model for Stance Detection in Twitter. In Proceedings of the 2018 International Joint Conference on Neural Networks (IJCNN), Rio de Janeiro, Brazil, 8–13 July 2018; pp. 1–8.
6. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the North American Chapter of the Association for Computational Linguistics, Minneapolis, MN, USA, 2–9 June 2019; pp. 4171–4186.
7. Allaway, E.; McKeown, K. Zero-Shot Stance Detection: A Dataset and Model using Generalized Topic Representations. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP); Webber, B., Cohn, T., He, Y., Liu, Y., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2020; pp. 8913–8931.
8. Liang, B.; Chen, Z.; Gui, L.; He, Y.; Yang, M.; Xu, R. Zero-Shot Stance Detection via Contrastive Learning. In Proceedings of the WWW ’22: Proceedings of the ACM Web Conference, Virtual, 25–29 April 2022; pp. 2738–2747.
9. Allaway, E.; Srikanth, M.; McKeown, K. Adversarial Learning for Zero-Shot Stance Detection on Social Media. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies; Toutanova, K., Rumshisky, A., Zettlemoyer, L., Hakkani-Tür, D., Beltagy, I., Bethard, S., Cotterell, R., Chakraborty, T., Zhou, Y., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2021; pp. 4756–4767.
10. Liang, B.; Zhu, Q.; Li, X.; Yang, M.; Gui, L.; He, Y.; Xu, R. JointCL: A Joint Contrastive Learning Framework for Zero-Shot Stance Detection. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers); Muresan, S., Nakov, P., Villavicencio, A., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2022; pp. 81–91.
11. Zhang, B.; Ding, D.; Jing, L. How would Stance Detection Techniques Evolve after the Launch of ChatGPT? arXiv 2022, arXiv:2212.14548.
12. Huang, H.; Zhang, B.; Li, Y.; Zhang, B.; Sun, Y.; Luo, C.; Peng, C. Knowledge-enhanced Prompt-tuning for Stance Detection. ACM Trans. Asian Low-Resour. Lang. Inf. Process. 2023, 22, 1–20.
13. Lan, X.; Gao, C.; Jin, D.; Li, Y. Stance Detection with Collaborative Role-Infused LLM-Based Agents. Proc. Int. AAAI Conf. Web Soc. Media 2024, 18, 891–903.
14. Zhou, Y.; Cristea, A.I.; Shi, L. Connecting Targets to Tweets: Semantic Attention-Based Model for Target-Specific Stance Detection. In Proceedings of the WISE 18th International Conference, Puschino, Russia, 7–11 October 2017.
15. Sun, Q.; Wang, Z.; Zhu, Q.; Zhou, G. Stance Detection with Hierarchical Attention Network. In Proceedings of the International Conference on Computational Linguistics, Santa Fe, NM, USA, 21–25 August 2018.
16. Li, Y.; Sosea, T.; Sawant, A.; Nair, A.J.; Inkpen, D.; Caragea, C. P-Stance: A Large Dataset for Stance Detection in Political Domain. In Proceedings of the Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021; Zong, C., Xia, F., Li, W., Navigli, R., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2021; Volume ACL/IJCNLP 2021, pp. 2355–2365.
17. Patra, B.G.; Das, D.; Bandyopadhyay, S. JU_NLP at SemEval-2016 Task 6: Detecting Stance in Tweets using Support Vector Machines. In Proceedings of the International Workshop on Semantic Evaluation, San Diego, CA, USA, 16–17 June 2016.
18. Lai, M.; Cignarella, A.T.; Farías, D.I.H. iTACOS at IberEval2017: Detecting Stance in Catalan and Spanish Tweets. In Proceedings of the Second Workshop on Evaluation of Human Language Technologies for Iberian Languages (IberEval 2017), Murcia, Spain, 19 September 2017.
19. Rajendran, G.; Chitturi, B.; Poornachandran, P. Stance-In-Depth Deep Neural Approach to Stance Classification. Procedia Comput. Sci. 2018, 132, 1646–1653.
20. Augenstein, I.; Rocktäschel, T.; Vlachos, A.; Bontcheva, K. Stance Detection with Bidirectional Conditional Encoding. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing; Su, J., Carreras, X., Duh, K., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2016; pp. 876–885.
21. Xu, C.; Paris, C.; Nepal, S.; Sparks, R. Cross-Target Stance Classification with Self-Attention Networks. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers); Gurevych, I., Miyao, Y., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2018; pp. 778–783.
22. Zhang, C.; Li, Q.; Song, D. Aspect-based Sentiment Classification with Aspect-specific Graph Convolutional Networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP); Inui, K., Jiang, J., Ng, V., Wan, X., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2019; pp. 4567–4577.
23. Liang, B.; Fu, Y.; Gui, L.; Yang, M.; Du, J.; He, Y.; Xu, R. Target-adaptive Graph for Cross-target Stance Detection. In Proceedings of the WWW’21: Proceedings of the Web Conference 2021, Ljubljana, Slovenia, 19–23 April 2021; Leskovec, J., Grobelnik, M., Najork, M., Tang, J., Zia, L., Eds.; ACM: New York, NY, USA, 2021; pp. 3453–3464.
24. Li, Y.; Caragea, C. Multi-Task Stance Detection with Sentiment and Stance Lexicons. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP); Inui, K., Jiang, J., Ng, V., Wan, X., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2019; pp. 6299–6305.
25. Liu, G.T.; Zhang, Y.J.; Wang, C.L.; Lu, M.Y.; Tang, H.L. Comparative learning based stance agreement detection framework for multi-target stance detection. Eng. Appl. Artif. Intell. 2024, 133, 108515.
26. Liu, G.; Zhao, K.; Zhang, L.; Bi, X.; Lv, X.; Chen, C. A Survey of Zero-Shot Stance Detection. In Natural Language Processing and Chinese Computing; Wong, D.F., Wei, Z., Yang, M., Eds.; Springer: Singapore, 2025; pp. 107–120.
27. Zhang, H.; Li, Y.; Zhu, T.; Li, C. Commonsense-based adversarial learning framework for zero-shot stance detection. Neurocomputing 2024, 563, 126943.
28. Chunling, W.; Yijia, Z.; Xingyu, Y.; Guantong, L.; Fei, C.; Hongfei, L. Adversarial Network with External Knowledge for Zero-Shot Stance Detection. In Proceedings of the 22nd Chinese National Conference on Computational Linguistics; Sun, M., Qin, B., Qiu, X., Jiang, J., Han, X., Eds.; Chinese Information Processing Society of China: Beijing, China, 2023; pp. 824–835.
29. Zou, J.; Zhao, X.; Xie, F.; Zhou, B.; Zhang, Z.; Tian, L. Zero-Shot Stance Detection via Sentiment-Stance Contrastive Learning. In Proceedings of the 2022 IEEE 34th International Conference on Tools with Artificial Intelligence (ICTAI), Macao, China, 31 October–2 November 2022; pp. 251–258.
30. Zhao, X.; Zou, J.; Tian, L.; Xie, F.; Wang, H.; Wu, H.; Zhou, B.; Tian, J. A Unified Framework for Unseen Target Stance Detection based on Feature Enhancement via Graph Contrastive Learning. In Proceedings of the 45th Annual Meeting of the Cognitive Science Society, CogSci 2023, Sydney, NSW, Australia, 26–29 July 2023; Goldwater, M.B., Anggoro, F.K., Hayes, B.K., Ong, D.C., Eds.; Cognitive Science Society: Seattle, WA, USA, 2023.
31. Jiang, Y.; Gao, J.; Shen, H.; Cheng, X. Zero-shot stance detection via multi-perspective contrastive learning with unlabeled data. Inf. Process. Manag. 2023, 60, 103361.
32. Liu, R.; Lin, Z.; Tan, Y.; Wang, W. Enhancing Zero-shot and Few-shot Stance Detection with Commonsense Knowledge Graph. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021; Association for Computational Linguistics: Stroudsburg, PA, USA, 2021.
33. Zhu, Q.; Liang, B.; Sun, J.; Du, J.; Zhou, L.; Xu, R. Enhancing Zero-Shot Stance Detection via Targeted Background Knowledge. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, Madrid, Spain, 11–15 July 2022; pp. 2070–2075.
34. Li, A.; Zhao, J.; Liang, B.; Gui, L.; Wang, H.; Zeng, X.; Wong, K.; Xu, R. Mitigating Biases of Large Language Models in Stance Detection with Calibration. arXiv 2024, arXiv:2402.14296.
35. Williams, H.T.; McMurray, J.R.; Kurz, T.; Hugo Lambert, F. Network analysis reveals open forums and echo chambers in social media discussions of climate change. Glob. Environ. Chang. 2015, 32, 126–138.
36. Alkhalifa, R.; Zubiaga, A. Capturing Stance Dynamics in Social Media: Open Challenges and Research Directions. Int. J. Digit. Humanit. 2022, 3, 115–135.
37. Schlichtkrull, M.; Kipf, T.N.; Bloem, P.; van den Berg, R.; Titov, I.; Welling, M. Modeling Relational Data with Graph Convolutional Networks. In The Semantic Web; Springer International Publishing: Cham, Switzerland, 2018; pp. 593–607.
38. Glorot, X.; Bordes, A.; Bengio, Y. Deep Sparse Rectifier Neural Networks. In Proceedings of the International Conference on Artificial Intelligence and Statistics, Lauderdale, FL, USA, 11–13 April 2011.

Figure 1. An illustration of a simple heterogeneous stance network. The edges represent stance relationships: green (favor), red (against), gray (neutral), and dashed (unknown). For example, PatrioticCat expresses support toward #MAGA, while LuckyDog has an opposing stance toward #MAGA. Based on these clues, we can infer that LuckyDog is more likely to oppose Donald Trump, while PatrioticCat is more likely to favor him.

Figure 2. Overview of our framework with two components: (1) heterogeneous stance network construction, where a stance-aware graph is generated using LLMs; and (2) multi-view stance detection, which integrates textual and graph-based representations with tailored methods for both supervised and zero-shot scenarios.

Figure 4. Stance classification using LLM and HSN. The input consists of a tweet and a classification request toward atheism. Instead of directly classifying the stance, an LLM leverages an HSN to retrieve indirect stance clues by analyzing relationships between entities.

Figure 5. Comparison of node distributions in graphs generated by different LLMs (Qwen-2.5, Llama-3.1, and GPT-3.5) and their impact on stance detection performance on SEM16. gcn denotes MVSD(GCN) and llm denotes MVSD(LLM).

Figure 6. Comparison of entity word clouds extracted by different LLMs on SEM16. Each subfigure corresponds to a different large model, where the word size represents the frequency of the extracted entity. Larger words indicate higher extraction frequencies in the model’s output. (a) GPT-3.5; (b) Llama-3.1; (c) Qwen-2.5.

Table 1. Statistics of SEM16 and P-Stance datasets.

Dataset	Targets	Samples	Favor/Against/Neutral
SEM16	5	4870	1240/1574/2056
P-Stance	3	21,574	7645/7432/6497

Table 2. Performance comparison of different models on the SEM16 dataset. The results are reported as Macro F1 scores (%). The best-performing model in each setting is highlighted in bold. Notably, the DT target lacks training data.

Category	Model	DT	HC	FM	LA	AT	CC	avg
supervised	BiCond	-	56.1	52.9	61.2	55.3	35.6	52.2
	BERT	-	61.3	59.0	63.1	60.7	38.8	56.6
	CrossNet	-	60.2	55.7	61.3	56.4	40.1	54.7
	ASGCN	-	61.0	58.7	63.2	59.5	40.6	56.6
	TPDG	-	73.4	67.3	74.7	64.7	42.3	64.5
	MVSD(GCN)	-	84.7	70.3	77.1	76.7	68.1	75.4
zero-shot	TOAD	49.5	51.2	54.1	46.2	46.1	30.9	46.3
	JointCL	50.5	54.8	53.8	49.5	54.5	39.7	50.5
	GPT-3.5-direct	62.3	66.2	60.5	60.3	20.6	56.9	54.5
	COLA	68.5	81.7	63.4	71.0	70.8	65.5	70.2
	MB-Cal	72.8	80.3	75.8	68.8	66.5	71.0	72.5
	MVSD(LLM)	76.1	82.0	79.9	77.7	56.9	77.3	75.0

Table 3. Performance comparison of different models on the P-Stance dataset. The results are reported as Macro F1 scores (%). The best-performing model in each setting is highlighted in bold.

Category	Model	Trump	Biden	Sanders	avg
supervised	BiCond	73.0	69.4	64.6	69.0
	BERT	67.7	73.1	68.2	69.7
	CrossNet	58.0	65.0	53.0	58.7
	ASGCN	77.0	78.4	70.8	75.4
	TPDG	76.8	78.1	71.0	75.3
	MVSD(GCN)	83.4	86.7	74.0	81.4
zero-shot	TOAD	53.0	68.4	62.9	61.4
	JointCL	62.0	59.0	73.0	64.7
	GPT-3.5-direct	82.1	82.0	79.0	81.0
	COLA	86.6	84.0	79.7	83.4
	MB-Cal	85.1	85.1	81.1	83.8
	MVSD(LLM)	85.8	84.6	82.0	84.1

Table 4. Performance comparison of different models using graphs generated by different LLMs on SEM16. The results are reported as Macro F1 scores (%).

Model	Qwen-2.5	Llama-3.1	GPT-3.5
LLM-Direct	70.3	75.1	54.5
MVSD(LLM)	78.4	78.9	75.0
MVSD(GCN)	81.3	80.5	75.4

Table 5. Ablation study of MVSD(GCN) and MVSD(LLM) on SEM16. The table shows the average Macro F1 performance (%).

Model	Average Perf.
MVSD(GCN)	75.4
w/o R	62.4
w/o H-node	72.2
w/o U-node	73.0
w/o E-node	71.5
w/o Att	72.7
MVSD(LLM)	75.0
w/o H-node	70.2
w/o U-node	71.4
w/o E-node	68.6

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

