Leveraging Chain-of-Thought to Enhance Stance Detection with Prompt-Tuning

Abstract: Investigating public attitudes expressed on social media is crucial for opinion mining systems to gain valuable insights. Stance detection, which aims to discern the attitude expressed in an opinionated text towards a specific target, is a fundamental task in opinion mining. Conventional approaches mainly focus on sentence-level classification techniques. Recent research has shown that integrating background knowledge can significantly improve stance detection performance. Despite this improvement, applying knowledge-enhanced methods in real-world scenarios remains challenging for two reasons. Firstly, existing methods often require complex attention mechanisms to filter out noise and extract relevant background knowledge, which involves significant annotation effort. Secondly, knowledge fusion mechanisms typically rely on fine-tuning, which can introduce a gap between the pre-training phase of pre-trained language models (PLMs) and the downstream stance detection task, reducing the prediction accuracy of the PLMs. To address these limitations, we propose a novel prompt-based stance detection method that leverages knowledge acquired using the chain-of-thought method, which we refer to as PSDCOT. The proposed approach consists of two stages. The first stage is knowledge extraction, where instruction questions are constructed to elicit background knowledge from a very large pre-trained language model (VLPLM). The second stage is the multi-prompt learning network (M-PLN) for knowledge fusion, which learns to predict stances from the background knowledge within the prompt learning framework. We evaluated PSDCOT on publicly available benchmark datasets to assess its effectiveness in improving stance detection performance. The results demonstrate that the proposed method achieves state-of-the-art results in in-domain, cross-target, and zero-shot learning settings.


Introduction
Stance detection is a fundamental task in natural language processing (NLP), where the goal is to classify the attitude expressed towards a particular target in an opinionated input text [1]. This task has gained significant attention in recent years due to its importance in various applications, such as political analysis, social media monitoring, and customer feedback analysis. Early research on stance detection centered on online debates, which adhere to a standardized sentence structure and in which the user's attitude is typically expressed straightforwardly [2,3]. However, with the rapid growth of the internet, social media platforms such as X have become more popular, and researchers have started mining social media for stance detection [4,5].
Conventional methods treat stance detection as a target-based sentence-level classification task and fall into two groups: non-pretrained models and pretrained language models (PLMs). Non-pretrained models employ deep neural networks (DNNs), such as long short-term memory (LSTM), attention-based models (Att), and graph convolutional networks (GCN), to build stance classification models. For instance, Du et al. [6] employed an attention-based approach that leverages target-specific information. Dey et al. [7] employed two independent LSTMs to sieve non-neutral text and classify attitudes separately. Sun et al. [8] devised a hierarchical attention mechanism that learned text representation by utilizing linguistic features. Liang et al. [9] introduced an effective GCN-based approach that distinguished between target-invariant and target-specific features. Inspired by the recent success of PLMs, fine-tuning techniques have been developed to adapt PLMs to stance detection and improve its accuracy [10]. One such technique constructs a stance classification head on top of the special token "<cls>" and fine-tunes the entire model accordingly. During fine-tuning, the model is exposed to many pairs of input text and stance label, and the <cls> token learns semantic and syntactic patterns that correlate with different stances. In sum, these methods regard stance detection as a target-oriented sentence-level text classification task. Nevertheless, their efficacy on social media data is impeded by the sparsity problem arising from the concise and informal expressions common on these platforms. Such content typically lacks context, details, or elaboration and often incorporates abbreviations and slang for which PLMs lack the corresponding background knowledge. Consequently, PLMs struggle to comprehend the semantics conveyed in the text, leading to erroneous judgments.
Recently, some pioneering studies have addressed the sparsity problem by utilizing external knowledge to enhance the performance and interpretability of stance detection. For example, He et al. [11] improved the performance of text classifiers by introducing target-related Wikipedia documents as content supplements. Diaz et al. [12] constructed a stance tree by retrieving external knowledge from a knowledge base and used it as evidence to support stance prediction, thus enhancing the accuracy of stance detection. Zhang et al. [13] utilized external knowledge from semantic and emotion lexicons as a bridge to enable knowledge transfer across different targets. Nasiri et al. [14] addressed the lack of annotated datasets for Persian stance detection through data augmentation and transfer learning. Hardalov et al. [15] proposed a novel semi-supervised approach to address data scarcity in cross-language scenarios. Khiabani et al. [16] enhanced stance detection performance in low-shot cross-target scenarios through multimodal embeddings derived from both textual and network features of the data. Although these works have improved performance and interpretability, they still face the following challenges in practical applications: (1) Most existing methods require the design of complex attention mechanisms to filter out noise and extract task-related background knowledge. However, such methods require a large number of annotated samples, which is time-consuming and labor-intensive. To ease the applicability of knowledge-enhanced stance detection, it would be highly desirable to develop knowledge-acquisition algorithms that depend less on feature engineering and yield high-quality task-related background knowledge. (2) Most of these knowledge fusion mechanisms rely on fine-tuning models. However, the fine-tuning approach creates a gap between the pre-training phase of the PLM and the downstream stance detection task, reducing the prediction accuracy of the PLMs.
To tackle the challenges mentioned above, we propose a prompt-based stance detection method that leverages the knowledge acquired by the chain-of-thought method (PSDCOT). The proposed model is motivated by two considerations. First, advances in large language models, such as GPT-3.5, have demonstrated powerful knowledge generation capabilities, and chain-of-thought (COT) methods can effectively mine knowledge from these models to support prediction with evidence. Second, prompt learning methods can improve prediction performance by fitting the downstream task to the upstream training process. The proposed PSDCOT consists of two stages. The first stage is knowledge extraction, where instruction questions are constructed to elicit background knowledge from a very large pre-trained language model (VLPLM). The second stage is the multi-prompt learning network (M-PLN) for knowledge fusion, which learns to predict stances from the background knowledge within the prompt learning framework. Extensive tests were conducted on publicly available benchmark datasets to evaluate the performance of the proposed PSDCOT method. The results demonstrate that the proposed method effectively improves stance detection performance, achieving state-of-the-art results in in-domain, cross-target, and zero-shot learning settings.
In summary, this paper presents several significant contributions:
• A PSDCOT framework is proposed, which improves prompt-based stance detection models by incorporating background knowledge into prediction.
The remainder of this paper is structured as follows. Section 2 offers an overview of related research, including traditional and recent methods of stance detection, as well as methods for prompt tuning. Section 3 outlines the details of our method. Section 4 presents the findings of our experimental analysis. The paper concludes with Section 5, which summarizes our key findings.

Stance Detection
The aim of stance detection is to identify and analyze the perspective of a given text regarding a particular target [17,18]. (1) In the in-target setting, conventional techniques can be broadly categorized as non-pretrained and pretrained methods. Non-pretrained techniques commonly use deep neural networks, such as Att and GCN, to train stance classifiers. Att methods primarily use target-specific data as the attention query and employ an attention mechanism to obtain the stance polarity [6][7][8]19]. GCN methods introduce a graph convolutional network to model the correlation between the target and the text [20][21][22]. (2) Various methods have focused on cross-target stance detection, and they can be broadly classified into two categories. The first category relies on word-level transfer, which uses words shared between two targets to bridge knowledge gaps [23]. The second category addresses cross-target issues by utilizing concept-level knowledge that is common to the two targets [13,24,25]. (3) Zero-shot stance detection is a particularly challenging scenario, where a trained stance detection model is required to infer the stance towards an unseen target. In response to this challenge, Allaway and McKeown [26] developed a large-scale, human-annotated stance detection dataset designed specifically for the zero-shot scenario. Moreover, Allaway et al. [27] utilized adversarial learning to extract target-invariant information and used a target-specific stance detection dataset to conduct zero-shot stance detection. Liu et al. [10] proposed a BERT-based graph model that incorporates both intra- and extra-semantic information in addition to common sense knowledge, with the aim of enriching the semantic information obtained. Additionally, Liang et al. [9] introduced a robust method for detecting target-specific and target-invariant features to help acquire transferable stance features.

Background Knowledge Enhanced Stance
The use of background knowledge has garnered attention as an effective way to improve stance detection performance [28]. For instance, He et al. [11] introduced target-related background knowledge, such as Wikipedia knowledge, and proposed a fine-tuning learning method to improve the model's learning ability. Similarly, Liu et al. [10] represented background knowledge as a knowledge graph and utilized graph neural network methods to develop a stance predictor. Additionally, Huang et al. [29] introduced the use of #hashtag background knowledge to improve content learning. Furthermore, Luo et al. [30] incorporated sentiment knowledge to better learn attitudes.

Prompt-Tuning Methods
Prompt tuning has gained widespread popularity in diverse natural language processing (NLP) domains, for example, text classification [31], natural language understanding [32], and sentiment analysis [33]. The verbalizer plays a critical role in prompt tuning and significantly impacts its effectiveness [34]. The methods for designing verbalizers can be categorized into human-designed and automatic verbalizers. Human-designed verbalizers rely primarily on the personal expertise of the creator and may lack sufficient coverage [32]. Automatic verbalizers are designed using search methods, but they require sizable training and validation sets to optimize [35]. Several previous studies have applied prompt-based models to stance detection [15,36]. Jiang et al. [36] presented TAPD, a prompt-tuning framework designed for stance detection. TAPD utilizes a verbalizer that maps labels to hidden vectors to facilitate label prediction. Likewise, Hardalov et al. [15] developed a prompt-based approach for cross-lingual stance detection. Furthermore, Huang et al. [29] proposed the use of SenticNet to construct an automatic verbalizer. In conclusion, the prompt learning framework has shown remarkable progress in detecting stances.

Our Methodology
To represent the labeled dataset, we use $X = \{(x_i, q_i)\}_{i=1}^{N}$, where x and q, respectively, denote the input text and the corresponding target. Each (x, q) pair in X is assigned a stance label y. The objective of stance detection is to infer a stance label for the input sentence x in the context of a given target q.

Model Overview
As illustrated in Figure 1, our PSDCOT consists of two main components: the chain-of-thought module for knowledge extraction (KE) and a multi-prompt learning network (M-PLN). Here, KE aims to extract external knowledge for enhancing stance detection via COT methods. In M-PLN, we design an attention-based network that integrates the background knowledge for stance detection.

Knowledge Extraction
To elicit background knowledge effectively, we design a chain-of-thought prompt method. The proposed approach is motivated by the observation that the emerging capabilities of very large models enable them to generate an understanding of the input that can serve as background knowledge. Therefore, we aim to leverage the background knowledge of a large model to enhance the performance of stance detection.
Specifically, we propose a step-by-step question-answering strategy to elicit knowledge. This method teaches language models to solve the stance detection task by providing a one-shot example. First, we construct the question-answer pair (QAP), feed the constructed question into the VLPLM, and acquire the explanation of the reason for the prediction. For example, given the following input: "RT GunnJessica: Because i want young American women to be able to be proud of the 1st woman president #SemST", the question posed to ChatGPT is as follows: "What is the attitude of the sentence: "RT GunnJessica: Because i want young American women to be able to be proud of the 1st woman president #SemST" to the target "Hillary Clinton" select from "favor, against or neutral"." For this particular example, ChatGPT returns a correct result. Second, we further ask why the model predicts a certain stance polarity. As shown in Table 1, large language models are able to fill in missing information in sentences, such as subjects, and to decipher hashtags.
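The two-step questioning can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names, the wording of the follow-up "why" question, and the layout used to prepend the one-shot demonstration are all assumptions; only the first question follows the example quoted above. No model is called, the sketch only builds the prompt strings.

```python
def build_stance_question(text: str, target: str) -> str:
    """Step 1: ask for the stance polarity (wording follows the quoted example)."""
    return (
        f'What is the attitude of the sentence: "{text}" '
        f'to the target "{target}" select from "favor, against or neutral".'
    )


def build_reason_question(stance: str) -> str:
    """Step 2: ask why that stance was predicted (hypothetical wording)."""
    return f'Why is the attitude "{stance}"? Explain the reason step by step.'


def with_one_shot(demo_question: str, demo_answer: str, question: str) -> str:
    """Prepend one worked demonstration to the question (assumed Q/A layout)."""
    return f"Q: {demo_question}\nA: {demo_answer}\n\nQ: {question}\nA:"
```

The explanation returned for the second question is what would be stored as background knowledge for the tweet and target pair.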

Tweet and Target | Background Knowledge
Tweet: KoriWasinger JButtawipo she also voted for the Iraq war and was responsible for gathering TheDemocrats votes to support it. Target: Hillary Clinton [Against] | The sentence mentions Clinton's role in gathering votes from Democrats to support the war suggests that the speaker believes Clinton bears responsibility for the decision to go to war and holds her accountable for it. [Against]
Tweet: You know you're in a patriarchy when women are the ones who are blamed for prostitution #whatisjustice #inequality. Target: Feminist Movement [Favor] | The sentence seems to express a belief that women are unfairly blamed for engaging in prostitution and suggests that the society in which this occurs is a patriarchy, which is a system that favors men and oppresses women. The use of hashtags such as #whatisjustice and #inequality also suggests that the speaker is supportive of the feminist movement and its goals of promoting gender equality. [Favor]
Tweet: Let's agree that it's not ok to kill a 7lbs baby in the uterus DWStweets #DNC #Clinton2016 HillaryforIA #ProCompromise Target: Legalization of Abortion [Against] | The use of hashtags such as "#DNC" and "#Clinton2016" suggests a negative sentiment towards the legalization of abortion, and the mention of it being "not ok to kill a 7lbs baby in the uterus" may be seen as a criticism of the idea that some lives are more valuable than others. [Against]

Multi-Prompt Learning Network (M-PLN)
Preliminary: Prompt-tuning with PLM. Prompt-tuning reframes the stance detection task as a masked language modeling task. Specifically, prompt-tuning wraps the given text x and the target q in a text template p. For example, to classify sentence x as being in favor or against, the prompt-tuning process envelops x with a predefined text template to yield a new text $x_p$ = "We should support this. The attitude to the <Target q> is [MASK]." Let M be the pre-trained language model, which provides the probability of each word v in the vocabulary being filled in [MASK] given $x_p$, i.e., $P_M([\mathrm{MASK}] = v \mid x_p)$. In this context, v represents a label word defined in the verbalizer. To map the probabilities of these words to the probabilities of the labels, a verbalizer is utilized as a mapping function f from the defined words in the vocabulary, which form the label word set V, to the label space Y, i.e., $f: V \to Y$. Formally, the probability $P(y \mid x_p)$ of label y is computed as follows:
$$P(y \mid x_p) = \mu\big(P_M([\mathrm{MASK}] = v \mid x_p) \mid v \in V_y\big),$$
where $V_y \subseteq V$ is the set of label words mapped to y, and µ transforms the probability distribution over label words into the probability distribution over labels. To illustrate, in the aforementioned example, prompt tuning can set $V_1$ to contain the words "support" and "agree", and $V_2$ to contain the word "opposition". Additionally, µ can be defined as an identity function.
The instance is then categorized under the favor class if the average likelihood of the terms in $V_1$ exceeds that of the terms in $V_2$. In prompt tuning, the learning objective is to minimize the classification loss on the predicted label distribution.
Prompt Design. The key to the prompt-based method for stance detection is to construct an appropriate prompt. Previous research has demonstrated that the performance of different prompts varies significantly, and this issue is further compounded for stance detection. Expressions and topics vary widely across targets, which makes a single universal prompt for all targets infeasible. To account for this heterogeneity, our approach creates multiple prompts from varied perspectives. Based on prior research, we address stance detection by considering not only sentiment polarity in the text, but also stance-aware words and target-text relations. Therefore, we design prompt templates from three perspectives, denoted $T_1$, $T_2$, and $T_3$, respectively. We employ three RoBERTa models as our pre-trained language model (PLM). The [MASK] and [SEP] tokens are taken directly from the RoBERTa vocabulary. Our prompts are easily customizable for the pre-training tasks of other PLMs.
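The verbalizer step described in the preliminary can be illustrated with a small Python sketch. It assumes the [MASK] probabilities have already been obtained from a masked language model; the probability values below are invented for illustration, and the function names are hypothetical. The label word sets follow the example in the text ("support" and "agree" for favor, "opposition" for against), and the label score is the average likelihood of the label's words.

```python
def label_scores(word_probs, label_words):
    """Average the [MASK] probabilities of the words assigned to each label."""
    return {
        label: sum(word_probs[w] for w in words) / len(words)
        for label, words in label_words.items()
    }


def predict_label(word_probs, label_words):
    """Pick the label whose label words have the highest average probability."""
    scores = label_scores(word_probs, label_words)
    return max(scores, key=scores.get)


# Label word sets V_1 (favor) and V_2 (against) from the example in the text.
LABEL_WORDS = {"favor": ["support", "agree"], "against": ["opposition"]}

# Illustrative (made-up) probabilities of each word filling [MASK].
mask_probs = {"support": 0.30, "agree": 0.20, "opposition": 0.10}

predicted = predict_label(mask_probs, LABEL_WORDS)
```

With these made-up numbers the favor words average 0.25 against 0.10 for the against word, so the instance would be assigned to the favor class.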
Target-aware Verbalizer. In prompt-based fine-tuning, a verbalizer, which is an injective function $f: Y \to V$, is typically defined to map each label to a single token from the PLM's vocabulary. The efficacy of prompt-based methods relies heavily on the design of the verbalizer, and simply assigning a fixed concrete word to each label may not yield optimal results. To address this issue, previous studies, such as that of Schick et al. [32], have suggested mapping each label to a phrase that better represents the semantic meaning of the label, e.g., using "in favor of" instead of "favor". However, predicting consecutive [MASK] tokens poses a new challenge. To tackle this problem, Gao et al. [34] proposed generating the verbalized word for each label from a pruned set of the top-k vocabulary words that are highly probable according to the PLM. Nonetheless, this approach involves a computationally demanding and time-consuming brute-force search for each label. Moreover, given the wide array of expressions used across various targets, we argue that a single phrase or token may be inadequate for capturing the stance information. To tackle this concern, we map labels onto continuous vectors, called stance vectors, instead of explicit words or phrases. These vectors are trainable during optimization. Our approach generates three distinct stance vectors, one for the [MASK] position of each template. The stance vectors from $T_1$, $T_2$, and $T_3$ are $V_{T,1}$, $V_{T,2}$, and $V_{T,3}$, respectively. To ensure coherence with the token embeddings of the PLM, the stance vectors have the same dimension as those embeddings.
Attention Layer. The attention layer is proposed to integrate background knowledge with the prompt-based model. Specifically, we utilize the three soft stance vectors, $V_{T,1}$, $V_{T,2}$, and $V_{T,3}$, as three queries to guide the attention in an iterative manner. The hidden state of the attention mechanism, denoted as H, is acquired by feeding the background knowledge into an independent PLM. From the attention queries and the hidden states, a coupling coefficient matrix is computed for each query, with $k_{\{T,1;T,2;T,3\}} \in \mathbb{R}^{n \times n}$. The query of the next iteration, $V^{2}$, is then updated; its dimension is the same as that of the initial query $V^{1}$. Subsequently, the input of the next iteration is updated, with LayerNorm performing the standard layer normalization. After t iterations, the output hidden state e is obtained as $e = \mathrm{avg}(q_{T,1} + q_{T,2} + q_{T,3})$, where $\mathrm{softmax}(f_i) = \exp(f_i) / \sum_j \exp(f_j)$ and ⊕ is the concatenation operator. The detailed process is presented in Algorithm 1.

Algorithm 1 PSDCOT
Output: e
1: Utilize 1-shot example COT to teach language models and acquire background knowledge from X.
2: Initialize $V_{T,1}$, $V_{T,2}$, $V_{T,3}$
3: for t in T iterations do
4:   Obtain coupling coefficients
7: end for
8: Obtain $q^{t}_{T,1}$, $q^{t}_{T,2}$, $q^{t}_{T,3}$
9: Obtain e
10: return e
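The loop in Algorithm 1 can be illustrated with the following Python sketch. It is an assumed stand-in, not the paper's exact update rule: the coupling coefficients here are plain scaled dot-product attention weights over the knowledge tokens, and the query update is a residual sum followed by layer normalization without learned parameters. All function names are hypothetical. What the sketch does share with the text is the overall shape: three stance-vector queries attend iteratively over the hidden states H of the background knowledge, and the three refined queries are averaged into e.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]


def layer_norm(vec, eps=1e-5):
    """Standard layer normalization without learned scale or shift."""
    mean = sum(vec) / len(vec)
    var = sum((x - mean) ** 2 for x in vec) / len(vec)
    return [(x - mean) / math.sqrt(var + eps) for x in vec]


def attend(query, hidden):
    """Weight each hidden row by its similarity to the query and sum them."""
    d = len(query)
    scores = [sum(q * h for q, h in zip(query, row)) / math.sqrt(d) for row in hidden]
    weights = softmax(scores)  # assumed form of the coupling coefficients
    return [sum(w * row[j] for w, row in zip(weights, hidden)) for j in range(d)]


def fuse_knowledge(queries, hidden, iterations=2):
    """Refine each stance-vector query over `iterations` steps, then average."""
    refined = []
    for query in queries:
        q = list(query)
        for _ in range(iterations):
            context = attend(q, hidden)
            # Assumed update: residual connection followed by LayerNorm.
            q = layer_norm([a + b for a, b in zip(q, context)])
        refined.append(q)
    d = len(refined[0])
    return [sum(q[j] for q in refined) / len(refined) for j in range(d)]
```

In this sketch `queries` plays the role of the three stance vectors and `hidden` the role of H, one row per knowledge token.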

Stance Classification
We classify the stance expressed in the text by assessing the semantic similarity between the target-aware stance vectors and the average of the label vectors (which are defined in the verbalizer). To integrate the background knowledge, we concatenate the representation of the background knowledge e with the target-aware stance vectors. Based on the words provided by the verbalizer, we then calculate the probability of selecting token v as the label word, where v is the embedding of the token in the verbalizer. We sum the probabilities of the words of each label to obtain the prediction ŷ.
Finally, the model is trained with the standard cross-entropy loss, where N denotes the size of the training set and C denotes the number of stance classes. Every ground-truth label $y_i$ of the i-th sample is represented in one-hot format. To optimize the attention layer, standard gradient descent is employed.
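As a concrete reading of the loss, the sketch below computes the cross-entropy between one-hot ground-truth labels and predicted class probabilities over N samples and C classes. Averaging over N is an assumption (the source does not show whether the sum is normalized), the small `eps` is added only to avoid `log(0)`, and the function name is hypothetical.

```python
import math


def cross_entropy(one_hot_labels, predicted_probs, eps=1e-12):
    """Mean cross-entropy over N samples, each with C class probabilities."""
    total = 0.0
    for y, y_hat in zip(one_hot_labels, predicted_probs):
        # Only the true class contributes, because y is one-hot.
        total -= sum(y_c * math.log(p_c + eps) for y_c, p_c in zip(y, y_hat))
    return total / len(one_hot_labels)
```

For a three-class stance problem, a uniform prediction of 1/3 per class gives a loss of ln 3 regardless of the true label, which is a convenient sanity check when training starts.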
• SemEval-2016. In accordance with the configuration proposed by [24], four targets, namely Donald Trump (D), Hillary Clinton (H), Legalization of Abortion (L), and Feminist Movement (F), are chosen for evaluating the stance detection task. For the cross-target setup [9,13,24], we construct four cross-target stance detection tasks (D→H, H→D, F→L, L→F), where the source target is on the left side of the arrow and the destination target is on the right side.
• P-stance. To enlarge the data volume for performance evaluation, we also use the P-stance dataset, which comprises 21,574 tweets targeting "Donald Trump (DT_p)", "Joe Biden (JB_p)", and "Bernie Sanders (BS_p)". For the cross-target setup, we construct six settings: DT→JB, DT→BS, JB→DT, JB→BS, BS→DT, and BS→JB.
• VAST. The VAST dataset, presented by Allaway and McKeown [26], encompasses a diverse range of targets spanning themes such as politics, education, and public health. The dataset has three stance labels: "Pro", "Neutral", and "Con". The training set comprises 4003 samples, while the dev and test sets consist of 383 and 600 samples, respectively. Following Liang et al. [9], we evaluate our model's performance on zero-shot topics.
• ISD. The ISD dataset, proposed by Huang et al. [29], is challenging because it consists of texts without explicit sentiment words. Therefore, to predict stance polarity, it is crucial to comprehend the interplay between the text and contextual knowledge, including knowledge of the target and hashtags. The targets of ISD are "Donald Trump (DT_i)" and "Joe Biden (JB_i)".

Compared Baseline Methods
To assess the efficacy of our proposed model, we compared it with a range of established baselines. The details of these baseline models are presented below.
Statistics-based methods:
• BiLSTM [23]. BiLSTM uses a bidirectional long short-term memory (LSTM) network to encode the sentence and the corresponding target independently.
• MemNet [39]. MemNet uses a memory network, enhanced with a multi-hop attention mechanism, to encode textual data.
• AOA [40]. AOA employs two LSTM networks to model the target and context separately, and incorporates an interactive attention mechanism to model their interrelation.
• ASGCN [41]. ASGCN uses a dependency tree to model dependencies and graph convolutional networks (GCN) to learn compact and expressive text representations.
• TAN [6]. TAN introduces target-specific attention in conjunction with an LSTM for stance detection.
• TPDG [42]. TPDG performs stance detection with a target-adaptive graph convolutional network. The framework integrates shared features from analogous targets, thereby improving how accurately the model delineates the stance towards a given target.
• AT-JSS-Lex [43]. AT-JSS-Lex suggests a target-adaptive graph convolutional network for stance detection. This mechanism draws inspiration from the practice of utilizing common features from analogous targets.
• TOAD [27]. TOAD uses adversarial learning to generalize across topics.
Fine-tuning based methods:
• RoBERTa-FT [44]. These methods employ a pretrained BERT or RoBERTa model for stance detection, with the given context and target converted to the format "[CLS] + text + [SEP] + target + [SEP]" to adapt to the training and fine-tuning of the model.
• PT-HCL [9]. PT-HCL approaches cross-target and zero-shot stance detection using contrastive learning. To achieve this, the model leverages a BERT-based architecture to establish a shared representation space for diverse targets.
Prompt-tuning based methods:
• MPT. MPT is a prompt-tuning based PLM for stance detection that employs a verbalizer defined by human experts.
• AutoPT [31]. AutoPT generates label words from the given data corpus via an auto-prompt method.
• KPT [35]. KPT introduces external lexicons to define the verbalizer for the prompt framework.

Implementation Details
In our experimental setup, we use pre-trained language models with the RoBERTa-large architecture. For training the model, we employ the Adam optimizer with a mini-batch size of 32 and a learning rate of 0.0002. To help advance the state of the art, we describe in detail the templates used to prompt the pre-trained language models throughout this paper.
Following previous works [9,13], we employ the average F1 score over the "Favor" and "Against" classes (F1_avg) as our primary evaluation metric. We first calculate the F1 score for each of the two categories:
$$F_{favor} = \frac{2 P_{favor} R_{favor}}{P_{favor} + R_{favor}}, \qquad F_{against} = \frac{2 P_{against} R_{against}}{P_{against} + R_{against}},$$
where P and R, respectively, stand for precision and recall. F1_avg is the average of the two scores.
Second, because the targets in the dataset are unbalanced, we compute the micro-F1 (F1_m) as another evaluation metric.
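The F1_avg computation can be made concrete with a short Python sketch. It assumes the convention common in SemEval-2016 style evaluation, in which the per-class F1 scores of "favor" and "against" are averaged while the neutral class still counts towards false positives and false negatives. The function names and label strings are illustrative, and the micro-F1 variant is not reconstructed here.

```python
def f1_for_class(gold, pred, label):
    """F1 score for one class, computed from precision and recall."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def f1_avg(gold, pred):
    """Average of the F1 scores of the 'favor' and 'against' classes."""
    return (f1_for_class(gold, pred, "favor") + f1_for_class(gold, pred, "against")) / 2


gold = ["favor", "favor", "against", "against", "neutral"]
pred = ["favor", "against", "against", "against", "favor"]
score = f1_avg(gold, pred)
```

In this toy example the favor class has precision and recall of 0.5 (F1 = 0.5) and the against class has precision 2/3 and recall 1 (F1 = 0.8), so `score` is 0.65.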

In-Domain Setup
The results of in-domain stance detection on the different benchmarks are presented in Tables 2 and 3. Based on the obtained outcomes, several conclusions can be drawn. (1) The pretrained models markedly improve stance detection performance in most configurations compared with statistics-based methods. For instance, RoBERTa-FT demonstrates an average improvement of 10.3% over the top-performing statistics-based method (TPDG) on the ISD dataset. This finding further validates the effectiveness of pretrained models in stance detection. (2) Prompt-based PLM methods improve consistently over fine-tuned PLMs across multiple tasks. For instance, PSDCOT achieves an average improvement of 10.35% in F1_avg and 8.15% in F1_m on the ISD datasets over RoBERTa-FT. This result indicates that a prompt framework helps PLMs tap into their true capabilities. (3) External knowledge is an important factor for stance detection on social media: incorporating it yields a noteworthy gain in performance. For example, after integrating background knowledge of the target, WS-BERT-Dual improves by 2.35% in F1_avg on average over the ISD and P-stance datasets compared with RoBERTa-FT. (4) The PSDCOT method proposed in this paper surpasses all the established baselines on a majority of the evaluation tasks. Averaged across seven distinct tasks, PSDCOT improves by 11.86% in F1_avg over the most effective neural network-based model (TPDG), by 5.24% in F1_avg and 4.72% in F1_m over the top-performing fine-tuned PLM (RoBERTa-FT), and by 2.7% in F1_avg and 3.28% in F1_m over the best-performing prompt-tuning approach (KPT). Furthermore, compared with the current state-of-the-art external knowledge augmentation technique (WS-BERT-Dual), PSDCOT achieves an average improvement of 6.35% in F1_avg on the ISD and P-stance datasets. The advantage of PSDCOT comes from two characteristics: (i) We propose a COT method to extract the background knowledge behind texts and targets. The results show that this knowledge effectively improves stance detection performance. (ii) We propose a multi-prompt learning network, which effectively fuses background knowledge with the predictor.
Table 2. Performance comparison on F1_avg. The results with † are retrieved from [13]; those with ‡ are retrieved from [36]. The ¶ mark refers to a p-value < 0.05. The best scores are in bold. Following [13], we evaluated the stability of PSDCOT by running the method three times and reporting the average score.

No.
Embedding Obtaining a vast dataset that has been adequately annotated demands a substantial investment of time and resources.Hence, our proposal is to examine the efficacy of our approach within a cross-target framework.The objective of the cross-target framework is to predict the stance of the target destination by leveraging labeled data from the source target.The results of SenEval-2016 and P-Stance are reported in Tables 4-6.Based on the results, our proposed method outperforms the other baselines by a significant margin.Specifically, compared with the previous promising statistical method (TPDG), PSDCOT achieves an average improvement of 16.15% in F1 avg and 17.43% in F1 m on average, which confirms the effectiveness of utilizing a prompt-tuning framework in cross-target setup.In comparison to fine-tuning based methods (e.g., RoBERTa-FT), PSDCOT achieves an average improvement of 9.73% in F1 avg and 6.6% in F1 m .These results further emphasize the crucial role of using the knowledge-enhanced network in cross-target stance detection.Furthermore, PSDCOT achieves superior stability compared to KPT and MPT.For instance, PSDCOT achieves an average improvement of 1.45% in F1 avg and 2.78% in F1 m over MPT, 0.82% in F1 avg , and 0.95% in F1 m KPT, respectively, across all four setups from Tables 4 and 5.In certain scenarios, the target of a particular text may be absent from the training dataset; therefore, we compare it against the most advanced competitors in the field.The results of our experiments are reported in Table 7. 
Notably, due to the inherent limitations and challenges of zero-shot stance detection, all methods underperform in comparison to the in-target and cross-target setups. In particular, methods that focus only on statistics, without leveraging external background knowledge, perform poorly. On the other hand, approaches based on fine-tuning, such as PT-HCL, BERT-FT, and RoBERTa-FT, consistently outperform statistics-based methods. This outcome validates the remarkable benefits of leveraging knowledge acquired from a vast corpus. Despite the challenging nature of zero-shot stance detection, our PSDCOT model exhibits considerable potential, surpassing all benchmark approaches on the VAST dataset. Consequently, our findings imply that PSDCOT represents a promising strategy for addressing the demanding task of zero-shot stance detection by effectively incorporating background knowledge and adopting a prompt-tuning framework.
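The cross-target and zero-shot protocols discussed above differ only in how the data are partitioned by target. The following minimal Python sketch illustrates the two splits; the `Example` record and the function names are hypothetical, introduced purely for illustration, and do not reflect the actual benchmark loaders or the official dataset splits.

```python
from typing import NamedTuple


class Example(NamedTuple):
    """One annotated instance (hypothetical record layout)."""
    text: str
    target: str
    stance: str


def cross_target_split(data, source, destination):
    """Train on the source target only; test on the destination target only."""
    train = [ex for ex in data if ex.target == source]
    test = [ex for ex in data if ex.target == destination]
    return train, test


def zero_shot_split(data, unseen_targets):
    """Hold out every example whose target must never be seen in training."""
    unseen = set(unseen_targets)
    train = [ex for ex in data if ex.target not in unseen]
    test = [ex for ex in data if ex.target in unseen]
    return train, test
```

In both cases, no test target appears in the training portion, which is what makes these setups harder than the in-target one.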

Ablation Study
To investigate the influence of each individual component of the PSDCOT method, we conducted an ablation test in which one proposed component is removed at a time; each resulting variant is denoted by "w/o".
The variants of PSDCOT are as follows:
• w/o P: PSDCOT without the prompt-tuning framework; instead, we use a standard fine-tuning strategy. Specifically, the stance vector is the hidden state of the <cls> token.
The findings of our ablation study are depicted in Figure 2. Our results indicate that the chain-of-thought (COT) method significantly enhances the performance of our PSDCOT method. Specifically, we found that removing the background knowledge acquired by the COT approach led to a substantial deterioration in performance. This observation underscores the importance of leveraging external knowledge-enriched models to facilitate a deeper understanding of the given stance. Furthermore, our study reveals that fine-tuning leads to a considerable drop in performance when compared to prompt-tuning. This finding highlights the efficacy of prompt learning in bridging the gap between large-model pre-training and downstream tasks such as stance detection, and its capability to improve performance. Notably, across all experiments, the best performance was achieved by combining all of the aforementioned components.
(a) In-domain setup
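The ablation design, in which each variant switches off exactly one component of the full model, can be sketched as a small configuration table. The class, field, and function names below are hypothetical and chosen only for illustration; the variant labels (w/o P, w/o COT, w/o t) are the ones used in this study.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class AblationConfig:
    """Switches for the three ablated components (hypothetical field names)."""
    use_prompt_tuning: bool = True   # False -> "w/o P": fine-tune on the <cls> state
    use_cot_knowledge: bool = True   # False -> "w/o COT": commonsense knowledge instead
    use_multi_template: bool = True  # False -> "w/o t": a single template only


FULL = AblationConfig()

# Each variant differs from the full model in exactly one switch.
VARIANTS = {
    "PSDCOT": FULL,
    "w/o P": replace(FULL, use_prompt_tuning=False),
    "w/o COT": replace(FULL, use_cot_knowledge=False),
    "w/o t": replace(FULL, use_multi_template=False),
}


def disabled_components(config):
    """Names of the components that a given variant switches off."""
    return [f.name for f in fields(config) if not getattr(config, f.name)]
```

Changing one switch at a time means that any performance difference from the full model can be attributed to the single removed component.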

Conclusions
In this paper, we proposed a novel prompt-based stance detection approach, referred to as PSDCOT, which utilizes a chain-of-thought method to elicit knowledge and fuses that knowledge through a multi-prompt learning network. The experimental results demonstrate that PSDCOT achieves state-of-the-art performance in in-target, cross-target, and zero-shot settings. In future work, we plan to elicit and prune knowledge from Large Language Models (LLMs) to enhance background knowledge accuracy and eliminate irrelevant information. Furthermore, we may dedicate efforts to constructing virtual text contexts to alleviate the challenge of social media data sparsity.

Figure 1. The overall structure of the proposed PSDCOT.
• w/o COT: We discard the knowledge extraction stage and, following [11], use commonsense knowledge as the background knowledge.
• w/o t: We remove the multi-template design and keep only the commonly used single-view T 3 template.

Table 3. Performance comparison of stance detection (F1 m). The ¶ mark refers to a p-value < 0.05. The best scores are in bold.

Table 6. Performance comparison of cross-target stance detection (F1 avg) on P-Stance. The ¶ mark refers to a p-value < 0.05. The best scores are in bold.

Table 7. Performance comparison of zero-shot stance detection. The ¶ mark refers to a p-value < 0.05. The best scores are in bold.