Implicit Stance Detection with Hashtag Semantic Enrichment

Abstract: Stance detection is a crucial task in natural language processing and social computing, focusing on classifying expressed attitudes towards specific targets based on the input text. Conventional methods predominantly view stance detection as a task of target-oriented, sentence-level text classification. On popular social media platforms like Twitter, users often express their opinions through hashtags in addition to textual content within tweets. However, current methods primarily treat hashtags as data retrieval labels, neglecting the semantic information they carry. In this paper, we propose a large language model knowledge-enhanced stance detection framework (LKESD). LKESD contains three main components: an instruction-prompted background knowledge acquisition module (IPBKA) that retrieves background knowledge of hashtags by providing handcrafted prompts to large language models (LLMs); a graph convolutional feature-enhancement module (GCFEM) that extracts the semantic representations of words that frequently co-occur with hashtags in the dataset by leveraging textual associations; and a knowledge fusion network (KFN) that selectively integrates graph representations and LLM features using a prompt-tuning framework. Extensive experimental results on three benchmark datasets demonstrate that LKESD outperforms the compared methods by 2.7% on average across all setups, validating its effectiveness in stance detection tasks.


Introduction
Stance detection is an important task in natural language processing (NLP) and social computing that classifies the attitude expressed towards a particular target in an input text [1]. Early stance detection research concentrated primarily on data from online debate platforms, political analysis documents, and related sources. In recent years, the rapid development of the internet has led to a significant increase in the popularity of platforms such as Twitter, prompting researchers to investigate stance detection for social media [2,3]. As a result, stance detection on social media has become a significant area of research.
Current stance detection approaches are typically framed as sentence-level classification tasks based on a specific target. These approaches can be divided into non-pretrained and pretrained language model (PLM) approaches. Non-pretrained models mainly employ architectures such as recurrent neural networks, graph convolutional networks (GCNs), and traditional attention-based networks for stance classification. For instance, Du et al. [4] employed an attention method utilizing target features. Sun et al. [5] developed hierarchical attention for modeling text representations through linguistic knowledge, and Liang et al. [6] proposed a GCN approach to differentiate target-specific and target-invariant features. Inspired by the promising performance of PLMs, fine-tuning strategies have been developed to enhance the accuracy of stance detection [7]. These methods fine-tune pretrained models, e.g., BERT [8] and RoBERTa [9], on stance detection datasets, thus adapting the models to this particular task. Typically, these methods view stance detection as a target-oriented, sentence-level text classification task.
Despite this progress, recent social media stance detection methods face a persistent challenge. Specifically, on popular social media platforms like Twitter, users frequently express their opinions through hashtags in addition to the textual content of tweets. However, current methods mainly treat hashtags as data retrieval labels and neglect the semantic information they carry. For instance, datasets such as SemEval-2016 Task 6 (SEM16) [10] and Pstance [11] employ hashtags as keywords for data collection. Consequently, these hashtags are prevalent across multiple instances and are often difficult to represent effectively with sentence-level text classification approaches.
Recently, Zhang et al. [12] introduced a challenging task, implicit stance detection (ISD), and proposed an ISD dataset in which hashtags play a crucial role as discriminative features within sentences. Examples include stance indicators such as "#voteTrump" and background knowledge related to the target such as "#MAGA" and "#BLM". This setting closely aligns with real-world social media scenarios, where accurate stance detection requires a comprehensive understanding of the knowledge encapsulated in stance-related hashtags.
To date, several studies have explored the ISD task. Given the vast number of hashtags, early work employed unsupervised methods, such as k-nearest neighbors, to learn text representations of hashtags and subsequently integrated them into classifiers [13]. Building upon the characteristics of social media content, Huang et al. [12] proposed the biterm topic model (BTM) method to learn vector representations of hashtags from unlabeled data. However, these methods face limitations: they require large amounts of data for unsupervised learning, which is impractical in rapidly evolving social media scenarios. Additionally, while some work leverages the knowledge stored in large models to enhance social media text retrieval, this knowledge may be error-prone due to factors such as the time at which the training samples were collected, making it unreliable for the rapidly changing landscape of social media.
To address the aforementioned challenges, we propose LKESD, a large language model knowledge-enhanced stance detection framework for hashtags. LKESD comprises three main components. First, an instruction-prompted background knowledge acquisition module (IPBKA) retrieves background knowledge by providing handcrafted prompts to large language models (LLMs). Second, a graph convolutional feature enhancement module (GCFEM) mines the semantic representations of words that frequently co-occur with hashtags in the dataset text by leveraging textual associations. Third, a knowledge fusion network (KFN) selectively integrates graph representations and LLM features using a prompt-tuning framework. The prompt-tuning framework builds a prompt fine-tuning method on top of pre-trained language models (PLMs) for accurate stance detection.
We summarize our contributions as follows:
• We propose the LKESD framework for stance detection, which can learn the semantic information of hashtags from both LLMs and corpora, thereby enhancing its applicability in real-world social media scenarios.
• We investigate stance detection from a novel perspective by exploring the semantic expressions of hashtags. We propose a novel KFN to achieve dynamic fusion of different semantic representation features.
• To validate the effectiveness of the LKESD model for stance detection on social media, we perform comprehensive experiments on widely used benchmarks. The experimental results demonstrate the effectiveness of the proposed method.
The subsequent sections of this paper are structured as follows. Section 2 reviews relevant literature on traditional and recent methods for stance detection. Section 3 presents the proposed model in detail. Section 4 outlines the experimental setup, including the datasets, baseline methods, and quantitative results. Lastly, Section 5 concludes the paper and discusses potential avenues for future work.

Stance Detection
Stance detection aims to classify the perspective expressed in a text towards a given target and is closely related to argument mining, fact-checking, and aspect-level sentiment analysis [14,15].
As shown in Table 1, existing in-target stance detection methods can be categorized into two types: non-pretrained and pretrained approaches. Non-pretrained methods often utilize deep neural networks, such as attention-based networks and GCNs, to train stance classifiers. Attention-based methods utilize target-relevant information and implement attention mechanisms to determine stance polarity [4,5,16]. GCN-based methods utilize GCNs to model relations between the target and text, enabling nuanced analysis of their connections [17-19].
For cross-target stance detection, existing methods can be categorized into two main types: word-level transfer and concept-level transfer. Word-level transfer methods leverage the commonality of words across targets to bridge knowledge gaps [20]. Concept-level transfer methods address cross-target challenges by utilizing shared concepts between targets to enable understanding and analysis [21-23].
Zero-shot stance detection poses a particular challenge, requiring models to deduce the stance towards unseen targets. To enable zero-shot learning, Allaway et al. [24] constructed a human-annotated dataset tailored for this setting. Allaway et al. [25] further applied adversarial learning to derive target-invariant features and used a target-specific dataset. Liu et al. [7] proposed a graph-based model integrating intra and extra semantic knowledge and common sense using BERT. Liang et al. [6] identified target-specific and target-invariant characteristics to obtain transferable features.

Incorporating Background Knowledge
Incorporating background knowledge to improve stance detection performance on social media has gained considerable attention in recent years. Earlier methods focused on enhancing the understanding of words in the text. For example, Zhang et al. [22] proposed a framework that extracted semantic and emotional word-level knowledge from lexicons to enable knowledge transfer across targets. Another common approach is to conduct pre-training on a corpus specific to the target domain, such as BERTweet [26] or COVID-Twitter-BERT [27]. Kazuaki et al. [28] introduced a method for extracting relevant concepts and events from Wikipedia articles and incorporating them into stance detection. Current retrieval methods typically employ keyword-based filtering [29] for knowledge retrieval.
Despite this progress, an important challenge when applying these methods to social media is their inability to capture the semantic expressiveness of hashtags. To the best of our knowledge, the semantic expressiveness of hashtags is an emerging area of interest. Ghosh et al. [30] first proposed splitting a hashtag into individual words and employing substitute vocabulary to clarify these expressions. However, since hashtags may contain informal text and this approach fails to incorporate contextual semantics, it yields suboptimal results. Zhang et al. [31] proposed constructing an unsupervised topic model and using the clustered topic words as semantic representations of hashtags. The low accuracy of clustering risks propagating errors, and the reliance on abundant unlabeled data makes unsupervised methods less adaptable to cross-target and zero-shot settings. Li et al. [32] proposed using an LLM to generate explanations of hashtags and enhance model performance. However, knowledge taken directly from large models is restricted to their training corpus and may propagate errors.

LKESD Framework
We give the task definition and the overview of our model in Section 3.1 and Section 3.2, respectively. Then, we describe the details of LKESD in Sections 3.3-3.5.

Problem Definition
The goal of stance detection is to predict the stance polarity of an input sentence $x^t$ towards a specified target $q^t$ using a model trained on a labeled dataset $X$. Here, $X = \{(x_i, q_i)\}_{i=1}^{N}$ represents the collection of labeled data, where $x$ denotes the input text, $q$ corresponds to the source target, and $N$ is the total number of instances in $X$. Each sentence-target pair $(x, q) \in X$ is assigned a stance label $y$. The superscript $t$ indicates test data.

Framework Overview
As depicted in Figure 1, LKESD consists of three main components: IPBKA, GCFEM, and KFN. IPBKA is an instruction-based zero-shot prompting method that acquires knowledge about hashtags from LLMs. Since this knowledge comes from outside the training set, we refer to it as extra knowledge. GCFEM first constructs a semantic graph containing hashtags from the input text and subsequently learns the vector representation of hashtags through a GCN. In contrast to extra knowledge, we call this intra knowledge. Finally, KFN is a prompt-tuning network that fuses extra and intra knowledge for accurate stance detection. This is achieved by creating a customized template for the PLM and integrating extra and intra knowledge.

IPBKA
IPBKA is used to extract extra knowledge of hashtags from an LLM. Inspired by the effectiveness of zero-shot instruction prompting in current LLMs [33], we construct an instruction template that is fed directly into the LLM to obtain the background knowledge of the hashtag. The template is as follows:
Prompt: Given the following text, identify and analyze the semantic meaning of any hashtags present. Provide insights into the context and potential implications of these hashtags. Text: "[GIVEN TEXT]"
Subsequently, we input the obtained extra knowledge into the BERT model to generate the embedding vector of extra knowledge. Specifically, we use the average of the hidden states as the representation of extra knowledge, denoted as $r$.
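The two IPBKA steps can be illustrated with a minimal sketch. It only shows how the instruction template is filled and how token-level hidden states are averaged into $r$; the LLM call and the BERT encoder are omitted. The function names `build_ipbka_prompt` and `mean_pool`, the example tweet, and the toy hidden states are illustrative assumptions, not part of the original method description.

```python
from typing import List

# Instruction template quoted from the text; "[GIVEN TEXT]" is the slot for the tweet.
IPBKA_TEMPLATE = (
    "Given the following text, identify and analyze the semantic meaning of "
    "any hashtags present. Provide insights into the context and potential "
    'implications of these hashtags. Text: "[GIVEN TEXT]"'
)


def build_ipbka_prompt(text: str) -> str:
    """Fill the tweet into the instruction template (hypothetical helper)."""
    return IPBKA_TEMPLATE.replace("[GIVEN TEXT]", text)


def mean_pool(hidden_states: List[List[float]]) -> List[float]:
    """Average token-level hidden states into a single vector r."""
    if not hidden_states:
        raise ValueError("need at least one token vector")
    n = len(hidden_states)
    dim = len(hidden_states[0])
    return [sum(tok[j] for tok in hidden_states) / n for j in range(dim)]


# Hypothetical tweet; in practice the prompt would be sent to the LLM.
prompt = build_ipbka_prompt("Four more years! #MAGA")

# Toy hidden states standing in for BERT outputs (3 tokens, 2 dimensions).
r = mean_pool([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
```

In a real pipeline the toy matrix would be replaced by the encoder's hidden states for the LLM-generated explanation; whether special tokens or padding are excluded from the average is not specified in the text.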

GCFEM
The GCFEM is employed to learn hashtag representations from the input text (intra knowledge). Compared to the knowledge obtained from IPBKA, the hashtag representations obtained from the text are closer to the input domain.
Specifically, to represent word-hashtag relationships, we first construct a semantic graph. The semantic graph employs words or hashtags as nodes and builds weighted edges between words or hashtags based on their co-occurrence frequency. We use $G$ to represent the constructed graph.
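As a rough illustration of the graph construction, the sketch below counts how often two tokens appear in the same tweet and uses that count as the edge weight. The co-occurrence window (a whole tweet), the function name `build_cooccurrence_graph`, and the toy tokens are assumptions; the text does not specify the window or any frequency threshold.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple


def build_cooccurrence_graph(docs: List[List[str]]) -> Dict[Tuple[str, str], int]:
    """Return undirected weighted edges {(u, v): count} with u < v."""
    edges: Counter = Counter()
    for tokens in docs:
        # Count each unordered pair at most once per document.
        for u, v in combinations(sorted(set(tokens)), 2):
            edges[(u, v)] += 1
    return dict(edges)


# Toy tokenized tweets (hypothetical data).
docs = [
    ["#maga", "vote", "trump"],
    ["#maga", "trump", "rally"],
]
G = build_cooccurrence_graph(docs)
```

Here the edge between "#maga" and "trump" receives weight 2 because the pair occurs in both toy tweets, while every other edge has weight 1.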
Subsequently, we employ a GCN to learn the embeddings of each node in the graph to fully leverage the multi-hop semantic connections between nodes. Given the semantic locality between words, we extract a $\lambda$-hop subgraph from the constructed graph for each hashtag, which is then input into a GCN to learn the graph representation. The GCN is adopted for its effectiveness and efficiency in learning graph embeddings.
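One plausible way to extract the $\lambda$-hop neighborhood of a hashtag is a breadth-first search that stops at depth $\lambda$. The sketch below assumes an unweighted adjacency-set view of the graph; the function name `lambda_hop_nodes` and the toy graph are hypothetical.

```python
from collections import deque
from typing import Dict, Set


def lambda_hop_nodes(adj: Dict[str, Set[str]], start: str, lam: int) -> Set[str]:
    """Nodes reachable from `start` in at most `lam` hops (breadth-first search)."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == lam:
            continue  # do not expand beyond the hop limit
        for neighbor in adj.get(node, set()):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return seen


# Toy chain graph: #maga - trump - rally - crowd (hypothetical data).
adj = {
    "#maga": {"trump"},
    "trump": {"#maga", "rally"},
    "rally": {"trump", "crowd"},
    "crowd": {"rally"},
}
one_hop = lambda_hop_nodes(adj, "#maga", 1)
two_hop = lambda_hop_nodes(adj, "#maga", 2)
```

The induced subgraph on the returned node set is what would then be passed to the GCN.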
In formal terms, let $E \in \mathbb{R}^{v \times d}$ represent a matrix containing all $v$ nodes in the graph and their respective features, where $d$ is the size of the node embedding. For each node, we extract a $\lambda$-hop subgraph $G'$ from the entire graph $G$, which has a degree matrix $D$ and an adjacency matrix $A$. The normalized symmetric adjacency matrix of subgraph $G'$ can be calculated as:
$$\tilde{A} = D^{-\frac{1}{2}} A D^{-\frac{1}{2}}$$
The subgraph representation $L \in \mathbb{R}^{n \times c}$ with $n$ nodes can be computed by feeding the subgraph $G'$ into a two-layer GCN as follows:
$$L = \sigma\big(\tilde{A}\, \sigma(\tilde{A} E W_i)\, W_j\big)$$
where $E$ is restricted to the rows of the $n$ nodes in $G'$, $\sigma$ denotes the sigmoid function, and $W_i$ and $W_j$ are learnable parameters. After obtaining the graph representation $L$, the vector corresponding to the hashtag is retrieved from the graph and represented as $k$.
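The two-layer GCN forward pass can be sketched in dependency-free Python. The sketch assumes the standard symmetric normalization $D^{-1/2} A D^{-1/2}$ and sigmoid activations on both layers, and it assumes the adjacency matrix already contains self-loops; none of these details beyond the sigmoid and the two weight matrices are confirmed by the text. All helper names and toy values are illustrative.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (n x m) and b (m x p)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def sigmoid(m: Matrix) -> Matrix:
    """Element-wise logistic function."""
    return [[1.0 / (1.0 + math.exp(-x)) for x in row] for row in m]


def normalize_adjacency(a: Matrix) -> Matrix:
    """Symmetric normalization D^{-1/2} A D^{-1/2}; zero-degree rows stay zero."""
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in (sum(row) for row in a)]
    n = len(a)
    return [[inv_sqrt[i] * a[i][j] * inv_sqrt[j] for j in range(n)] for i in range(n)]


def gcn_two_layer(a: Matrix, e: Matrix, w_i: Matrix, w_j: Matrix) -> Matrix:
    """L = sigmoid(A_hat * sigmoid(A_hat * E * W_i) * W_j)."""
    a_hat = normalize_adjacency(a)
    hidden = sigmoid(matmul(matmul(a_hat, e), w_i))
    return sigmoid(matmul(matmul(a_hat, hidden), w_j))


# Toy 3-node subgraph with self-loops; node 0 plays the role of the hashtag.
A = [[1.0, 1.0, 0.0], [1.0, 1.0, 1.0], [0.0, 1.0, 1.0]]
E = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # n x d node features
W_i = [[0.5, -0.5], [0.25, 0.75]]          # d x hidden (fixed here, learned in practice)
W_j = [[1.0], [-1.0]]                      # hidden x c
L = gcn_two_layer(A, E, W_i, W_j)
k = L[0]                                   # row of the hashtag node
```

In training, `W_i` and `W_j` would be learned by backpropagation; they are fixed constants here only so the forward pass can be traced by hand.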

Knowledge Fusion Network
The knowledge fusion network is a prompt-tuning framework that wraps the input text in a template and feeds it into a pre-trained model. A fusion layer then combines the resulting representation with intra and extra knowledge.
Prompt-tuning is a framework that reformulates the original classification task as a masked language modeling task. In particular, prompt-tuning utilizes a natural language template $p$ that is integrated with the given text $x$ and the target $q$. The combined input is denoted as $x_p$ = "$x$. The attitude to $q$ is [MASK]". Let $M$ denote the BERT model, which gives the probability $P_M([\mathrm{MASK}] = v \mid x_p)$ of each word $v$ in the vocabulary filling the [MASK] position. In this case, $v$ denotes a label word defined in the verbalizer. To transform the probabilities of these words into the probabilities of the labels, a verbalizer is employed as a mapping function $f$ from the defined words in the vocabulary, which form the label word set $V$, to the label space $Y$, i.e., $f : V \rightarrow Y$.
The probability $P(y \mid x_p)$ of label $y$ is formally computed as follows:
$$P(y \mid x_p) = \delta\big(P_M([\mathrm{MASK}] = v \mid x_p) \mid v \in V_y\big)$$
where $V_y = \{v \in V : f(v) = y\}$ is the set of label words mapped to $y$, and $\delta$ transforms the probability distribution over label words into the probability distribution over labels.
Prompt design. The crucial aspect of a prompt-based method for stance detection is the construction of an appropriate prompt. In this paper, following the work of [12], our template is defined as follows:
Template ($x_p$): [Given input: $x$]. The attitude to [given target: $q$] is [MASK].
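To make the template and the verbalizer mapping concrete, the following sketch fills the template and converts label-word probabilities at the [MASK] position into label probabilities. The aggregation function is assumed here to be a mean over each label's words followed by renormalization; the text does not state which aggregation is used. The label words, the probabilities, and the function names are hypothetical.

```python
from typing import Dict, List


def build_template(text: str, target: str) -> str:
    """Wrap the input text and target in the prompt template."""
    return f"{text}. The attitude to {target} is [MASK]."


def label_probabilities(
    word_probs: Dict[str, float], verbalizer: Dict[str, List[str]]
) -> Dict[str, float]:
    """Map label-word probabilities to label probabilities.

    Assumption: average the probabilities of each label's words, then renormalize.
    """
    raw = {
        label: sum(word_probs.get(w, 0.0) for w in words) / len(words)
        for label, words in verbalizer.items()
    }
    total = sum(raw.values())
    if total == 0.0:
        raise ValueError("no probability mass on any label word")
    return {label: p / total for label, p in raw.items()}


# Hypothetical verbalizer and masked-LM probabilities for its label words.
verbalizer = {"favor": ["support", "agree"], "against": ["oppose", "against"]}
word_probs = {"support": 0.3, "agree": 0.1, "oppose": 0.05, "against": 0.05}

x_p = build_template("Four more years", "Trump")
probs = label_probabilities(word_probs, verbalizer)
```

With these toy numbers the mean label-word probability is 0.2 for "favor" and 0.05 for "against", so the renormalized label probabilities are roughly 0.8 and 0.2.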
Fusion layer. After building the template, we map labels onto continuous vectors, referred to as stance vectors, rather than onto explicit words or phrases. Specifically, we input $x_p$ into BERT to obtain $H$, the hidden representation of the input text generated by BERT. We then extract the vector at the "[MASK]" position from $H$ as the stance vector, denoted as $s$. Subsequently, given the vectors $s$, $k$, and $r$, we employ an attention mechanism to learn knowledge-enhanced textual representations.
Formally, an attention coupling factor $c$ is first computed for each query. The three factors are then normalized using the softmax function to obtain the attention weights. With these attention weights, the final representation is computed from the three vectors, scaled by a factor $\gamma$.
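Since the exact coupling and fusion formulas are not reproduced here, the sketch below shows only one plausible instantiation of the described steps: a score per vector, a softmax over the three scores, and a scaled weighted sum. It assumes scaled dot-product scores with the stance vector $s$ as the query, which is a choice made for illustration and may differ from the original formulation. The function names and toy vectors are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(x - m) for x in scores]
    z = sum(exps)
    return [e / z for e in exps]


def fuse(s: Vector, k: Vector, r: Vector, gamma: float = 1.0) -> Vector:
    """Attention-weighted combination of stance (s), intra (k), and extra (r) vectors."""
    candidates = [s, k, r]
    scale = math.sqrt(len(s))
    # Assumed coupling factors: scaled dot products with s as the query.
    scores = [dot(s, c) / scale for c in candidates]
    weights = softmax(scores)
    return [
        gamma * sum(w * c[j] for w, c in zip(weights, candidates))
        for j in range(len(s))
    ]


# Toy vectors (hypothetical values).
s = [1.0, 0.0]
k = [0.5, 0.5]
r = [0.0, 1.0]
h = fuse(s, k, r, gamma=1.0)
```

Because the weights sum to one, fusing three identical vectors returns that vector scaled by `gamma`, which gives a quick sanity check for any alternative scoring function.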
Given the defined label words from the verbalizer, we generate the probability $\omega$ that the token $v$ is chosen as the label word, where $v$ represents the embedding of the token in the verbalizer. Subsequently, we aggregate the probabilities in $\omega$ of these words for each label, denoted as $\hat{y}$. Finally, the loss of the ensemble network is computed with the standard cross-entropy, where $N$ denotes the number of training samples, $C$ denotes the number of stance classes, and $y_i$ represents the one-hot ground-truth label for the $i$-th sample. Ultimately, the attention layer is optimized using the standard gradient descent algorithm.
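The scoring and loss steps can be sketched as follows. The sketch assumes that label-word probabilities come from a softmax over dot products between the fused representation and the label-word embeddings, and that the cross-entropy is averaged over the $N$ samples; both choices are assumptions, as is every name and value in the example.

```python
import math
from typing import Dict, List


def label_word_probs(h: List[float], embeddings: Dict[str, List[float]]) -> Dict[str, float]:
    """Softmax over dot products between h and each label-word embedding (assumed form)."""
    scores = {w: sum(a * b for a, b in zip(h, v)) for w, v in embeddings.items()}
    m = max(scores.values())
    exps = {w: math.exp(sc - m) for w, sc in scores.items()}
    z = sum(exps.values())
    return {w: e / z for w, e in exps.items()}


def cross_entropy(y_true: List[List[float]], y_pred: List[List[float]]) -> float:
    """Mean cross-entropy over N samples and C classes; y_true rows are one-hot."""
    eps = 1e-12  # avoids log(0)
    total = 0.0
    for t_row, p_row in zip(y_true, y_pred):
        total -= sum(t * math.log(p + eps) for t, p in zip(t_row, p_row))
    return total / len(y_true)


# Toy fused representation and label-word embeddings (hypothetical values).
omega = label_word_probs([1.0, 0.0], {"support": [2.0, 0.0], "oppose": [0.0, 2.0]})

# Two samples, two classes, uniform predictions.
loss = cross_entropy([[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5], [0.5, 0.5]])
```

For uniform predictions over two classes the loss equals ln 2, which is a convenient baseline when checking that training reduces it.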

Experimental Data
We present empirical evaluations on several benchmark datasets, including ISD [12], SemEval-2016 Task 6 (SEM16) [10], and COVID-19 [34]. The dataset statistics are summarized in Table 2.
• ISD. The ISD dataset [12] was proposed for the stance detection task on social media and is challenging because it consists of texts lacking explicit sentiment words. Therefore, understanding the relationship between the text and contextual knowledge, including the target and hashtag knowledge, is crucial for predicting stance polarity. ISD includes two targets: Trump (DT) and Biden (JB).
• SEM16. The original SEM16 dataset includes 4870 texts, each annotated with one of three stance labels: "favor", "against", or "neutral". To validate the efficacy of our hashtag fusion approach, we reorganized the original dataset. Hashtags containing only crawled user data were removed, and the remaining data were consolidated into a single dataset (SEM16-h). SEM16-h contains the same four targets as in previous work [21].
• COVID-19. The COVID-19 dataset contains 6133 tweets, each reflecting user positions on four specific targets associated with COVID-19 health mandates. As with SEM16, we process the dataset and consolidate the remaining data into a single task (COV-h). COV-h contains the same four targets as [34].

Compared Baseline Methods
To assess the efficacy of our proposed model, we performed an extensive comparative study with established baseline models, which are outlined as follows.
Statistics-based methods:
• BiLSTM [20] adopts a bidirectional LSTM framework to encode the text and target separately, enabling the extraction of independent semantic features.
• BiCond [20] employs a bidirectional LSTM framework to simultaneously encode the text and target, thereby capturing their shared semantic features.
• CrossNet [35] builds upon the BiCond architecture by integrating a self-attention mechanism, which selectively highlights salient textual features.
• AoA [36] employs a dual-LSTM architecture, wherein two separate LSTM networks model the target and context, respectively, and an interactive attention mechanism is integrated to examine their relationships.
• TPDG [37] proposes a target-adaptive convolutional graph framework, which boosts stance detection accuracy by leveraging shared features from similar targets and capitalizing on their inherent relationships.
Fine-tuning based methods:
• BERT [8] leverages a pre-trained BERT architecture for stance detection, reformulating the input format to "[CLS] + text + [SEP] + target + [SEP]" to optimize the model's training and fine-tuning procedures.
• PT-HCL [6] exploits contrastive learning to enhance the detection of subtle stance variations.
Prompt-tuning based methods:
• MPT [38] introduces a knowledge-infused prompt-tuning method for stance detection, which exploits a verbalizer carefully crafted by human experts to enhance the detection of subtle stance variations.
• KPT [39] leverages external lexicons to initialize the verbalizer component, which is embedded within the prompt framework, to facilitate the integration of domain-specific knowledge.
• KEprompt [12] employs a topic model to acquire hashtag representations and then performs prompt-tuning for stance detection.
Knowledge-enhanced methods:
• SEKT [22] presents a GCN framework that incorporates semantic knowledge to enhance stance detection capabilities.
• TarBK [29] integrates target-related knowledge from Wikipedia for stance detection.

Implementation Details and Evaluation Metrics
In the experimental configuration, we used the BERT-base uncased architecture as the PLM. The Adam optimizer with a learning rate of 0.0002 was used to train the model, and the mini-batch size was set to 32. For the LLM, we employ GPT-3.5 for knowledge elicitation.
Following previous research [6,22], we utilize the macro F1 score as the metric, which is the average of the F1 scores of the favor and against labels:
$$F1_{avg} = \frac{F1_{favor} + F1_{against}}{2}$$
The F1 score of each label is computed from its precision and recall.
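The metric can be made concrete with a short sketch that computes per-label F1 from precision and recall and then averages the favor and against scores. The label strings and the toy predictions are illustrative assumptions; the neutral class still affects the result through false positives and false negatives even though its own F1 is not averaged.

```python
from typing import List


def f1_for_label(y_true: List[str], y_pred: List[str], label: str) -> float:
    """F1 of a single label from its precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def f1_avg(y_true: List[str], y_pred: List[str]) -> float:
    """Average of the favor and against F1 scores."""
    return (f1_for_label(y_true, y_pred, "favor")
            + f1_for_label(y_true, y_pred, "against")) / 2


# Toy gold labels and predictions (hypothetical data).
y_true = ["favor", "favor", "against", "against", "neutral"]
y_pred = ["favor", "against", "against", "against", "favor"]
score = f1_avg(y_true, y_pred)
```

In this toy case the favor F1 is 0.5 and the against F1 is 0.8, so the averaged score is 0.65.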

Overall Performance

In-Target Setup
Table 3 presents the results of in-target stance detection on widely used benchmarks. The results show that LKESD outperforms most baseline methods across all datasets, validating the effectiveness of our proposed stance detection method. Furthermore, significance tests conducted on LKESD, with p-value < 0.05 (indicated as †), reveal statistically significant improvements over the best-performing competitors across most evaluation metrics. Specifically, statistics-based methods perform poorly across the board because statistical word vector initialization cannot effectively represent hashtags. In contrast, pre-trained models (e.g., BERT) achieve higher accuracy, possibly because they can effectively leverage large-scale knowledge.
Finally, LKESD outperforms the LLM-enhanced KASD method by an average of 2.7% across all tasks. This may be due to our knowledge fusion mechanism, which effectively leverages both extra and intra knowledge for hashtags.

Cross-Target Setup
Acquiring large-scale annotated datasets requires substantial time and resources. Therefore, we further validate the effectiveness of LKESD within a cross-target setup. The goal of this setup is to use labeled data from the source target to predict the stance toward the destination target. The results are reported in Table 4. They show that LKESD significantly outperforms the best-performing baseline competitors. Specifically, methods leveraging large-scale models (KASD and LKESD) significantly outperform traditional knowledge enhancement methods, indicating that LLMs can effectively generate external knowledge to improve predictive performance. Furthermore, the F1 score of LKESD is on average 0.3% higher than that of KASD. This improvement may be due to the automatic selection of large-scale model knowledge by the fusion mechanism, which can effectively learn important transferable knowledge.
To further evaluate the generalizability of the model, we conduct a zero-shot setup for stance detection. The results are shown in Table 5. Following previous work [6,32,41], we select a specific target as the test set, with the remaining task data as the training set. For example, we use →DT to denote DT as the test set, with the remaining targets (JB, SEM16-h, and COV-h) as the training data. From the experiments, we observe that LKESD still achieves effective performance improvements in the zero-shot scenario. Specifically, we find that methods requiring large amounts of unlabeled samples to construct hashtag representations yield smaller accuracy improvements in the zero-shot setting than in the in-target and cross-target settings. This is because hashtag background knowledge cannot be effectively acquired for an unseen target domain. In contrast, methods leveraging LLMs (KASD and LKESD) both achieve good performance. This is consistent with our expectation that zero-shot prompt learning methods can obtain hashtag background knowledge in unseen target domains.
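The zero-shot protocol described above amounts to a leave-one-target-out split. The sketch below assumes each sample is a dictionary with a `target` field; the field names, the function name `leave_one_target_out`, and the toy samples are hypothetical.

```python
from typing import Dict, List, Tuple

Sample = Dict[str, str]


def leave_one_target_out(
    samples: List[Sample], test_target: str
) -> Tuple[List[Sample], List[Sample]]:
    """Hold out every sample of `test_target`; train on all remaining targets."""
    train = [s for s in samples if s["target"] != test_target]
    test = [s for s in samples if s["target"] == test_target]
    return train, test


# Toy samples covering the four evaluation groups (hypothetical texts and labels).
samples = [
    {"text": "tweet a", "target": "DT", "label": "favor"},
    {"text": "tweet b", "target": "JB", "label": "against"},
    {"text": "tweet c", "target": "SEM16-h", "label": "favor"},
    {"text": "tweet d", "target": "COV-h", "label": "against"},
    {"text": "tweet e", "target": "DT", "label": "against"},
]

# "→DT": DT is the unseen test target.
train, test = leave_one_target_out(samples, "DT")
```

The key property is that no training sample shares a target with the test set, which is what makes the evaluation zero-shot with respect to the target.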

Ablation Study
To evaluate the effect of each component in our model, we perform ablation studies by individually removing the IPBKA module (denoted as w/o IPBKA), GCFEM (denoted as w/o GCN), and KFN (denoted as w/o KFN). In particular, for w/o KFN, we directly concatenate the intra and extra knowledge to replace the attention-based fusion mechanism.
The ablation study results are presented in Figure 2. The results indicate that IPBKA, GCN, and KFN all make significant contributions to improving the performance of the proposed method. More specifically, the performance significantly decreases when IPBKA and GCN are removed. This may be due to the importance of hashtag semantic representation for stance detection. Finally, as expected, integrating all components results in the best performance across all experimental settings.

Conclusions
In this paper, we propose a large language model knowledge-enhanced stance detection framework (LKESD) that can learn the semantic information of hashtags, thereby enhancing its applicability in real-world social media scenarios. LKESD comprises three main components. An instruction-prompted background knowledge acquisition module (IPBKA) retrieves background knowledge for hashtags by providing handcrafted prompts to LLMs. A graph convolutional feature enhancement module (GCFEM) extracts the semantic representations of words that frequently co-occur with hashtags in the dataset by leveraging textual associations. A knowledge fusion network (KFN) selectively integrates graph representations and LLM features using a prompt-tuning framework. Experiments on three benchmark datasets demonstrate that LKESD outperforms the compared methods, validating its effectiveness in the stance detection task. In future work, we will further investigate the impact of biases in individual LLMs on our method. Additionally, we plan to explore stance detection methods that integrate graph-based LLMs.
Project of Key Construction Discipline in Guangdong Province (2022ZDJS112), and University Stability Support Program of Shenzhen (20231129211559001).

Figure 1. Framework overview of LKESD. The input examples for IPBKA can be found in Section 3.3. The GCFEM module takes tweets as input, while the KFN module uses pre-constructed prompt templates as input (see the prompt design part in Section 3.5).

Figure 2. Ablation test results. (a) Ablation study with DT and JB targets. (b) Ablation study with SEM16-h and COV-h targets.

Table 2. Statistics of datasets.

Table 3. Comparative results of F1 score for in-target stance detection.

Table 4. Comparative results of F1 score for cross-target stance detection.

Table 5. Comparative results of F1 score for zero-shot stance detection.