Dual-Aspect Active Learning with Domain-Adversarial Training for Low-Resource Misinformation Detection

Hu, Luyao; Han, Guangpu; Liu, Shichang; Ren, Yuqing; Wang, Xu; Yang, Zhengyi; Jiang, Feng

doi:10.3390/math13111752

by Luyao Hu ¹, Guangpu Han ¹, Shichang Liu ¹, Yuqing Ren ¹, Xu Wang ¹, Zhengyi Yang ²,* and Feng Jiang ²

¹ Chongqing Division, PetroChina Southwest Oil & Gasfield Company, Chongqing 400707, China

² School of Big Data and Software Engineering, Chongqing University, Chongqing 401331, China

* Author to whom correspondence should be addressed.

Mathematics 2025, 13(11), 1752; https://doi.org/10.3390/math13111752

Submission received: 6 April 2025 / Revised: 11 May 2025 / Accepted: 21 May 2025 / Published: 25 May 2025

Abstract

The rapid spread of misinformation threatens public safety and social stability. Although deep learning-based detection methods have achieved promising results, their effectiveness heavily relies on large amounts of labeled data, limiting their applicability in low-resource scenarios. Existing approaches, such as domain adaptation and metalearning, attempt to transfer knowledge from related source domains but often fail to fully address the challenges of data scarcity and annotation costs. Moreover, traditional active learning strategies typically focus solely on textual uncertainty, overlooking domain-specific discrepancies and the critical role of affective information in misinformation content. To address these challenges, this paper proposes a dual-aspect active learning framework with domain-adversarial training (DDT), tailored for low-resource misinformation detection. The framework integrates a dual-aspect sampling strategy that jointly considers textual and affective features to select samples that are both informative (diverse from labeled data) and uncertain (near decision boundaries). Additionally, a domain-adversarial training module is employed to extract domain-invariant representations, mitigating distribution shifts between source and target domains. Experimental results on multiple benchmark datasets demonstrate that DDT consistently outperforms baseline methods in low-resource settings, enhancing the robustness and generalizability of misinformation detection models.

Keywords:

low-resource misinformation detection; discrepancy aspect; uncertainty aspect; adversarial training; active learning

MSC:

68W99

1. Introduction

The rapid expansion of social media platforms has dramatically transformed the landscape of information dissemination, greatly facilitating the sharing of content among global audiences. While these platforms significantly enhance connectivity and real-time information exchange, they also amplify the propagation of misinformation—content that is intentionally misleading or factually incorrect—posing serious threats to public safety, social harmony, and political stability [1,2,3]. Consequently, accurately and swiftly identifying misinformation has emerged as an urgent and critical research area.

Advancements in deep learning have markedly improved misinformation detection by leveraging powerful semantic and contextual feature representations [4,5,6]. However, the performance of these deep learning-based methods critically relies on substantial amounts of labeled data. In realistic scenarios, particularly those involving emerging or evolving events, obtaining sufficient high-quality annotations is prohibitively expensive and impractical due to limited expert availability and substantial annotation costs [7,8]. Existing approaches designed to mitigate data scarcity, such as domain adaptation [9,10] and meta-learning [11], seek to leverage auxiliary knowledge from related source domains or tasks. Although these methods provide valuable insights, they frequently fail to comprehensively resolve fundamental limitations associated with sparse labeled data and significant domain discrepancies.

Active learning aims to select the most informative unlabeled samples to minimize labeling efforts while maximizing model performance gains. Traditional active learning methods, however, predominantly select samples based on uncertainty measured solely by textual content, neglecting critical aspects unique to misinformation detection. Specifically, these methods ignore two essential dimensions: (1) significant domain shifts commonly observed between training (source) and testing (target) domains in real-world misinformation detection scenarios; (2) the crucial role affective (emotional) information plays in distinguishing misinformation from truthful content. Empirical evidence indicates misinformation often elicits heightened emotional responses such as fear, surprise, or anger [1], making affective signals a powerful indicator for effective misinformation detection [12,13]. Recent work [14] shows that the emotion prototypes of real and fake claims learned from past events remain highly similar when projected onto completely new events, whereas semantic tokens diverge sharply across events. This universality of affect implies that emotional cues capture fine-grained yet transferable differences between truthful and deceptive content, making them especially valuable in low-resource, cross-event settings.

Motivated by these critical yet often overlooked gaps, we propose a novel framework termed dual-aspect active learning with domain-adversarial training (DDT), specifically designed for misinformation detection under low-resource constraints. Our framework addresses the above challenges through the following innovations: (1) Dual-Aspect Sampling Strategy: DDT introduces a novel two-stage active learning sampling mechanism that explicitly incorporates both textual and affective features. First, we pre-select unlabeled samples that are maximally divergent from the existing labeled set, ensuring the informative value and diversity of the candidate pool. Subsequently, within this diverse candidate subset, we employ uncertainty-based sampling (entropy-based) to pinpoint the most ambiguous samples, thereby effectively focusing limited annotation resources on samples that maximize model improvement. (2) Domain-Adversarial Training: To enhance model adaptability to cross-domain distributional shifts, we integrate a domain-adversarial learning mechanism. Specifically, DDT employs a domain discriminator and adversarial training to derive domain-invariant feature representations from both textual and affective information, significantly mitigating performance degradation caused by domain gaps between the source and emerging target events.

Figure 1 illustrates the core idea of our dual-aspect active learning strategy. In the first stage, we employ a dual-feature sampler to evaluate the informativeness of unlabeled samples by simultaneously considering their textual and affective representations. Specifically, samples exhibiting maximal divergence from existing labeled data are pre-selected, ensuring that the selected pool is highly informative and capable of introducing novel knowledge into the model. In the second stage, from this carefully constructed candidate set, we select samples with the highest prediction uncertainty (measured by entropy) according to our misinformation detector, thus pinpointing instances closest to the classification boundary. Furthermore, to enhance the model’s robustness against domain shifts, we adopt adversarial training [15], which effectively aligns feature distributions across source and target domains, promoting the learning of domain-invariant representations.

Overall, the proposed dual-aspect active learning with domain-adversarial training (DDT) framework comprises four primary components: (1) a dual-feature extractor that captures textual and affective cues, (2) a misinformation detector for classifying posts, (3) a domain discriminator that encourages domain-invariant representation learning, and (4) a dual-aspect sampler guiding the active learning annotation process. DDT is trained through a two-stage procedure: we initially pre-train the network on combined source and target data to achieve a preliminary domain alignment and subsequently fine-tune the misinformation detector via an iterative active learning loop, progressively incorporating newly annotated target samples.

The key contributions of this paper are summarized as follows:

The integration of dual-aspect active learning with domain-adversarial training. To our knowledge, this is the first work explicitly combining a dual-feature (textual and affective) active learning approach with adversarial domain adaptation, effectively addressing the challenges posed by limited labeled resources and domain discrepancies in misinformation detection.
A novel dual-aspect sampling strategy. We propose an innovative two-stage active learning sampling mechanism, simultaneously leveraging textual and affective features to identify samples that are both informative (diverse from labeled data) and uncertain (close to the decision boundary), optimizing the annotation process.
Enhanced effectiveness under low-resource conditions. Extensive experimental evaluations demonstrate that DDT significantly outperforms existing methods in low-resource misinformation detection tasks. Our detailed analyses confirm the individual contributions and effectiveness of both the dual-aspect sampling and domain-adversarial training components.

By jointly addressing active sample selection and domain adaptation, and by explicitly utilizing affective information alongside textual content, DDT provides robust and accurate misinformation detection performance under real-world constraints of scarce labeled data.

What sets our approach apart from previous work is the explicit and unified integration of dual-aspect active learning and domain-adversarial training, which, to the best of our knowledge, has not been explored in the existing misinformation detection literature. While prior studies may utilize either feature-rich active sampling or domain adaptation in isolation, our method innovatively bridges the two by coordinating feature diversity (textual and affective) with domain-invariant representation learning. This synergy enables our model to more effectively navigate the dual challenges of data scarcity and domain shift, achieving improved generalization in low-resource, cross-domain scenarios.

2. Related Work

In this section, we briefly review the existing research related to our proposed DDT framework. Our discussion focuses on four research areas: misinformation detection, emotion-aware detection, active learning, and active domain adaptation.

2.1. Fake Information Detection

The rapid proliferation of misinformation on social media has significantly intensified research efforts in automated misinformation detection. Deep learning methods, leveraging their strong ability to extract complex patterns from data, have become the predominant approach to tackle this challenge. For instance, Gao et al. [16] proposed a misinformation detection model that integrates a character-based bidirectional language model with stacked LSTM networks. Verma et al. [17] introduced WELFake, a two-stage benchmark framework utilizing linguistic features to enhance machine learning-based misinformation detection. To further improve detection accuracy, some research has explored multimodal approaches, integrating textual content, visual data, and user profile characteristics [18,19]. Bian et al. [20] proposed a bidirectional graph convolutional network (GCN) to capture both semantic and structural relations from social media networks, thereby enhancing the performance of misinformation detection models. Similarly, Chen et al. [21] addressed misinformation detection from a cross-modal perspective, introducing an ambiguity-aware multimodal detection method based on information-theoretic principles.

In addition to English-based models, researchers have investigated misinformation detection in other languages, which often present additional challenges due to limited language resources and diverse linguistic structures. For example, Lee et al. [22] introduced a Korean fake news detection dataset and benchmarked various BERT-based models, demonstrating that pre-trained Korean language models significantly outperform multilingual baselines in capturing cultural and contextual nuances. In Arabic, Alhindi et al. [23] constructed a fake news dataset and proposed a neural model combining CNN and GRU layers, showing promising results on Arabic news articles. For low-resource languages such as Bengali and Hindi, Chakraborty et al. [24] explored transfer learning techniques using XLM-R and mBERT to generalize across languages and domains. Blanco-Fernández et al. [25] developed a synthetic dataset of over 57,000 Spanish political news articles and fine-tuned Transformer models like BERT and RoBERTa, achieving high accuracy in detecting fake news. Similarly, Tretiakov et al. [26] utilized BERT-based models to detect false claims in Spanish, focusing on events such as the Spanish Parliament elections and the COVID-19 pandemic, and reported accuracy rates exceeding 88%. In the Russian context, Pavlyshenko [27] fine-tuned the LLaMA 2 large language model using a PEFT/LoRA approach for tasks including fake news detection and manipulation analytics, demonstrating its effectiveness in analyzing Russian-language disinformation. Additionally, some reports have highlighted the use of AI-generated content by Russian networks to spread propaganda, emphasizing the need for robust detection mechanisms [28].

Despite these advancements, existing deep learning methods generally depend on large-scale annotated datasets. In emerging or urgent situations, sufficient labeled data are typically unavailable, and manual labeling is both expensive and time-consuming. To overcome these limitations, this paper introduces a novel framework that integrates dual-aspect active learning with domain-adversarial training, specifically designed to achieve robust misinformation detection under low-resource conditions.

2.2. Emotion-Aware Detection

Recent advances in emotion-aware misinformation detection have demonstrated that affective information—such as fear, anger, or empathy—can play a critical role in identifying deceptive content, especially under low-resource and cross-domain conditions. In this subsection, we review several representative works that leverage emotional or sentiment-based cues in various modeling strategies. These studies provide important context and inspiration for our proposed framework.

Huang et al. [14] introduced the concept of emotion prototypes as event-invariant priors in a meta-learning framework. Their results demonstrate that emotional signals are highly transferable across events, enabling faster adaptation and better detection in low-resource scenarios. Wang et al. [29] proposed an empathy-driven multimodal detection model that captures both cognitive empathy and emotional empathy. The model achieves improved interpretability through “empathy maps” and outperforms strong multimodal baselines. Xu et al. [30] leveraged sentiment-guided prompts and cross-attention mechanisms to enhance few-shot learning performance. They maintain strong accuracy even with limited supervision, showing the efficiency of emotion-aware prompts in multimodal misinformation detection. Liu et al. [31] highlighted that high-intensity emotions such as fear and anger are consistently more prevalent in misinformation than in factual content. The authors emphasized the cross-lingual and cross-platform stability of such affective patterns and identified open challenges that motivate domain-adaptive frameworks.

In contrast to these works, our DDT framework uniquely integrates affective signals directly into active sample selection and domain-adversarial alignment, rather than treating emotion as a static feature for classification.

2.3. Active Learning

Active learning aims to achieve high model performance with minimal labeling effort by strategically selecting the most informative samples for annotation. Traditional active learning methods mainly fall into two categories: uncertainty-based sampling [32,33] and diversity-based sampling [34]. Uncertainty sampling methods prioritize selecting samples about which the model is least certain, typically those close to decision boundaries. Diversity sampling, in contrast, selects representative samples that broadly reflect the overall data distribution, thereby improving model generalizability.

Some studies have begun integrating active learning techniques specifically into misinformation detection tasks. For example, Ren et al. [35] explored adversarial active learning in graph neural networks for misinformation identification. Farinneya et al. [36] combined popular active learning strategies with existing misinformation detection models to effectively reduce annotation costs. However, these approaches largely rely on conventional active learning frameworks without explicitly considering unique domain-specific features of misinformation, such as affective or emotional cues, which can significantly enhance detection accuracy.

In contrast, our proposed DDT framework explicitly integrates a dual-aspect active learning strategy that jointly considers textual uncertainty and affective informativeness. This approach not only captures uncertainty near decision boundaries but also ensures sample diversity by leveraging emotional features characteristic of misinformation, thus substantially improving performance and adaptability under low-resource conditions.

2.4. Active Domain Adaptation

Active domain adaptation aims to enhance the effectiveness of domain adaptation methods by selectively annotating the most informative samples from the target domain, thereby effectively mitigating domain distribution discrepancies. The work by Su et al. [37] employed adversarial training to align source and target domains, selecting samples based on discriminator predictions. Fu et al. [38] further proposed three criteria—a transferable committee, transferable uncertainty, and transferable domainness—to select samples with greater informativeness and diversity. Similarly, Xie et al. [39] introduced an approach that jointly considers domain characteristics and instance-level uncertainty, leveraging a free-energy minimization technique to effectively reduce domain shifts.

Despite these advances, existing active domain adaptation approaches have primarily focused on general-purpose domain alignment strategies, neglecting unique misinformation-specific signals such as textual and affective features, which are critical indicators in misinformation detection. Consequently, directly applying these methods to misinformation detection tasks may lead to suboptimal performance.

In this work, we propose DDT, a dual-aspect active learning framework that explicitly incorporates misinformation-specific textual and affective cues during sample selection. By employing a dual-stage selection process guided by these domain-specific features and uncertainty measures, DDT effectively identifies the most informative and representative samples across different domains. Moreover, by integrating domain-adversarial training to learn domain-invariant representations, our framework significantly enhances model adaptability and accuracy under low-resource conditions and domain shifts typical of emerging misinformation scenarios.

3. Approach

In this section, we present the problem definition and key components of our proposed dual-aspect active learning framework with domain-adversarial training (DDT). Our approach employs a dual-aspect sampling strategy explicitly leveraging both textual and affective features, which are crucial indicators in misinformation detection. To further enhance adaptability and robustness under domain shifts and low-resource scenarios, DDT integrates adversarial domain training to extract transferable, domain-invariant representations.

3.1. Problem Definition

Formally, we define our problem setting as follows. Let $X$ denote the input space and $Y = \{0, 1\}$ the label space, where 0 indicates truthful content and 1 denotes misinformation. We are given a small labeled dataset, $D_{L} = \{(x_{i}, y_{i})\}_{i=1}^{n_{L}}$, drawn from a source domain distribution, $D_{S}$, and a large unlabeled dataset, $D_{U} = \{x_{j}\}_{j=1}^{n_{U}}$, sampled from a distinct target domain distribution, $D_{T}$, where $D_{S} \neq D_{T}$ reflects the real-world scenario of cross-event or cross-topic misinformation detection.

Given a fixed annotation budget, $b$, our goal is to iteratively select a subset of informative instances from $D_{U}$ to be labeled by an oracle, such that the performance of a misinformation classifier, $f_{\theta}$, trained on the augmented labeled set is maximized. To achieve this, we employ a dual-aspect sampling strategy that jointly considers the following:

Textual representational divergence: the semantic dissimilarity between an unlabeled instance and the existing labeled data in embedding space;
Affective signal variance: the emotional distinctiveness or salience derived from affective features.

In each round, $t$, we select a batch, $B_{t} \subset D_{U}$, with $|B_{t}| \leq b_{t}$ according to a dual-aspect scoring function, $S(x)$:

$B_{t} = \arg\max_{B \subset D_{U},\, |B| \leq b_{t}} \sum_{x \in B} S(x).$

The selected instances are annotated by the oracle, and the labeled pool is updated as

$D_{L}^{(t+1)} = D_{L}^{(t)} \cup \{(x, y) \mid x \in B_{t},\ y = O(x)\},$

where $O(x)$ denotes the label returned by the oracle for $x$.

The process continues until the total number of queried samples reaches the budget b. The final model is evaluated on a held-out target domain test set. Our objective is to achieve the best possible detection performance under the given annotation constraint.
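The query loop above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `select_batch`, `active_learning_loop`, `score`, and `oracle` are hypothetical, the scores are assumed to be nonnegative (so that the additive objective is maximized by simply taking the highest-scoring items), and model retraining between rounds is only indicated by a comment.

```python
from typing import Callable, Hashable, List, Sequence, Tuple


def select_batch(pool: Sequence[Hashable],
                 score: Callable[[Hashable], float],
                 batch_size: int) -> List[Hashable]:
    """Pick the batch_size items with the largest S(x).

    With nonnegative scores, the sum over a size-limited batch is
    maximized by the top-scoring items. Ties keep pool order.
    """
    ranked = sorted(pool, key=score, reverse=True)
    return ranked[:batch_size]


def active_learning_loop(unlabeled: Sequence[Hashable],
                         oracle: Callable[[Hashable], int],
                         score: Callable[[Hashable], float],
                         budget: int,
                         batch_size: int
                         ) -> Tuple[List[Tuple[Hashable, int]], List[Hashable]]:
    """Query the oracle round by round until the total budget b is spent."""
    labeled: List[Tuple[Hashable, int]] = []
    remaining = list(unlabeled)
    queried = 0
    while queried < budget and remaining:
        k = min(batch_size, budget - queried)      # never exceed the budget
        batch = select_batch(remaining, score, k)
        labeled.extend((x, oracle(x)) for x in batch)  # y = O(x)
        chosen = set(batch)
        remaining = [x for x in remaining if x not in chosen]
        queried += len(batch)
        # In the full method, the model (and hence the score) would be
        # updated here before the next round.
    return labeled, remaining
```

For example, with five posts, a total budget of 3, and a per-round batch size of 2, the loop queries two posts in the first round and one in the second.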

3.2. Framework Overview

The overall structure of our DDT framework is illustrated in Figure 2, comprising four primary components: (1) a dual-feature extractor, (2) a misinformation detector, (3) a domain discriminator, and (4) a dual-feature sampler. In the figure, textual elements associated with each component are highlighted in purple for clarity.

During the pre-training stage, the dual-feature extractor fuses textual and affective information from posts to form integrated representations. These representations are then refined through adversarial training between the extractor and the domain discriminator, learning domain-invariant features shared across source and target data. Meanwhile, the dual-feature sampler identifies samples in the target domain that are most dissimilar to the labeled pool (i.e., highly informative). From this pre-sampled set, the misinformation detector selects instances exhibiting the highest uncertainty for annotation by an oracle.

During the fine-tuning stage, the extracted features of newly annotated samples are fed into the misinformation detector for classification. Concurrently, DDT continues to query additional unlabeled samples that are both informative and uncertain at each active learning iteration. These newly labeled samples are then incorporated back into the training process, progressively enhancing the model’s performance and robustness under low-resource conditions.

3.3. Dual-Feature Extractor

The dual-feature extractor $G_{e}$ plays a pivotal role in our framework, transforming each input post into a combined textual and affective representation, illustrated in Figure 3. Textual features (e.g., linguistic expressions and syntax) convey fundamental cues for assessing the veracity of a post, while affective elements (e.g., fear, provocation) can differ significantly across events, providing additional signals essential for accurate misinformation detection [40].

3.3.1. Textual Feature Extraction

We employ a pre-trained BERT model to obtain contextualized embeddings of each post. Concretely, a post, $x \in (X_{L} \cup X_{U})$, is represented as a token sequence, $\{x_{0}, x_{1}, \dots, x_{S}, x_{S+1}\}$, where $x_{0} = \texttt{[CLS]}$ and $x_{S+1} = \texttt{[SEP]}$ are special tokens marking the beginning and end of the sequence:

$h = \mathrm{BERT}(x),$

(1)

where $h = \{h_{s}\}_{s=0}^{S}$ is the BERT output. Specifically, $h_{0}$ is the pooled representation summarizing the entire sequence, while $\{h_{s}\}_{s=1}^{S}$ corresponds to the contextual embeddings of each token.

3.3.2. Affective Feature Extraction

To capture rich affective cues, we construct a representative affective vector by extracting and encoding five categories of features from each post:

Emotion: We utilize the NRC Emotion Lexicon, which associates English words with eight basic emotions: anger, fear, anticipation, trust, surprise, sadness, joy, and disgust. For each post, we identify the presence of words linked to these emotions and represent them as an eight-dimensional binary vector, where each dimension indicates the presence (1) or absence (0) of words associated with the corresponding emotion.
Sentiment: Using the same NRC Lexicon, we detect whether the text expresses positive or negative sentiment. This is encoded as a two-dimensional binary vector, indicating the presence of positive and negative sentiment words, respectively.
Morality: We employ the Moral Foundations Dictionary (MFD), which categorizes words into five moral foundations: Care/Harm, Fairness/Cheating, Loyalty/Betrayal, Authority/Subversion, and Purity/Degradation. Each foundation is divided into virtue and vice dimensions, resulting in ten categories in total. We scan each post for words associated with these categories and represent the findings as a 10-dimensional binary vector.
Imageability: We reference the MRC Psycholinguistic Database to obtain imageability scores for words within the post. Imageability refers to the ease with which a word evokes a mental image. We compute the average imageability score of content words in the post, normalize it to the range [0, 1], and encode it as a scalar value.
Hyperbole: We compile a lexicon of hyperbolic terms—words that convey exaggerated or overstated expression. Each post is examined for the presence of such terms, and this feature is encoded as a binary indicator (1 if any hyperbolic word is present, 0 otherwise).

All extracted features are concatenated to form a unified affective representation vector of fixed length, which is used as input to the downstream model.
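Taken together, the five feature groups described above give a vector of 8 + 2 + 10 + 1 + 1 = 22 dimensions. The sketch below shows one way such a vector could be assembled. It is illustrative only: the function names, the whitespace tokenizer, the dictionary-based lexicon format, and the toy lexicon entries used with it are assumptions, and the 100 to 700 range used to normalize imageability is an assumed rating scale rather than something stated in the text. The real NRC, MFD, and MRC resources would need to be loaded separately.

```python
from typing import Dict, List, Set

EMOTIONS = ["anger", "fear", "anticipation", "trust",
            "surprise", "sadness", "joy", "disgust"]
SENTIMENTS = ["positive", "negative"]
# Five moral foundations, each split into a virtue and a vice category.
MORAL = [f"{foundation}.{pole}"
         for foundation in ("care", "fairness", "loyalty", "authority", "purity")
         for pole in ("virtue", "vice")]


def presence(tokens: Set[str], lexicon: Dict[str, Set[str]],
             categories: List[str]) -> List[float]:
    """Binary indicator per category: 1.0 if any token is in its word set."""
    return [1.0 if tokens & lexicon.get(c, set()) else 0.0 for c in categories]


def affective_vector(text: str,
                     emotion_lex: Dict[str, Set[str]],
                     sentiment_lex: Dict[str, Set[str]],
                     moral_lex: Dict[str, Set[str]],
                     imageability: Dict[str, float],
                     hyperbole_terms: Set[str],
                     img_min: float = 100.0,   # assumed rating scale
                     img_max: float = 700.0) -> List[float]:
    """Concatenate emotion (8), sentiment (2), morality (10),
    imageability (1), and hyperbole (1) features into a 22-d vector."""
    tokens = text.lower().split()          # naive tokenizer (assumption)
    token_set = set(tokens)
    vec = presence(token_set, emotion_lex, EMOTIONS)
    vec += presence(token_set, sentiment_lex, SENTIMENTS)
    vec += presence(token_set, moral_lex, MORAL)
    # Average imageability over words that have a rating, scaled to [0, 1].
    rated = [imageability[t] for t in tokens if t in imageability]
    mean = sum(rated) / len(rated) if rated else img_min
    vec.append((mean - img_min) / (img_max - img_min))
    vec.append(1.0 if token_set & hyperbole_terms else 0.0)
    return vec
```

Keeping each group in a fixed position means the downstream encoder always sees the same layout, regardless of which lexicon entries a given post happens to match.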

3.3.3. Combining Textual and Affective Representations

The dual-feature extractor integrates both textual and affective information through two main components. First, a CNN with kernel sizes of 3, 4, and 5 is applied to the token-level embeddings $\{h_{s}\}_{s=1}^{S}$, followed by a max-pooling layer. The resulting pooled features are concatenated to yield a word-level representation $v_{word}$. Second, a Bi-GRU processes the affective feature vector to produce an affective representation $v_{affect}$ [41]. Finally, the two representations are combined, for instance, via an element-wise operation ($\odot$), yielding

$v_{concat} = v_{word} \odot v_{affect}.$

(2)

This mixed textual and affective representation enables the model to capture both linguistic and emotional signals that are critical for accurate misinformation detection.

3.4. Domain Discriminator for Domain-Adversarial Training

Domain-adversarial training is performed during the pre-training stage of DDT, guiding the dual-feature extractor to learn domain-invariant representations. In this work, we adopt an adversarial learning mechanism inspired by [42,43], which leverages a domain discriminator to align features across source and target domains.

3.4.1. Domain Discriminator

Let $G_{d}$ represent the domain discriminator, parameterized by $\theta_{d}$. It is composed of two fully connected layers and a softmax output, aiming to classify each post into one of $K$ domains. Formally, for a data sample $(x, y_{k})$ from the labeled or unlabeled pools $(X, Y_{k})$, the loss of $G_{d}$ is defined as

$L_{d}(\theta_{d}, \theta_{e}) = -\mathbb{E}_{(x, y_{k})} \left[ \sum_{k=1}^{K} 1_{[k = y]} \log\left(G_{d}(G_{e}(x))\right) \right],$

(3)

where $G_{e}(x)$ denotes the dual-feature extractor’s output, and $1_{[k = y]}$ is an indicator function that is 1 if $k = y$ and 0 otherwise.

3.4.2. Adversarial Learning with Gradient Reversal

To enforce domain invariance, we introduce a gradient reversal layer (GRL) between the dual-feature extractor $G_{e}$ and the domain discriminator $G_{d}$. During backpropagation, the GRL multiplies the gradient by $-1$, effectively reversing it before it updates the parameters of $G_{e}$. Thus, minimizing $L_{d}$ in Equation (3) can be interpreted as a min-max game: $G_{d}$ learns to extract domain-specific traits for accurate classification, while $G_{e}$ is encouraged to obfuscate these traits and produce domain-invariant features. Ultimately, this adversarial process mitigates domain shifts and enhances the generalization of the extracted representations.
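The effect of the gradient reversal can be seen in a deliberately tiny example. The sketch below is a scalar toy model with hand-derived gradients, not the network described in the paper: a one-parameter "extractor" `w_e`, a one-parameter logistic "discriminator" `w_d`, and a two-domain label `d` are all assumptions made for illustration, as is the optional `scale` argument (the text uses a factor of $-1$).

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def domain_loss(w_e: float, w_d: float, x: float, d: int) -> float:
    """Binary cross-entropy of a toy discriminator on the feature w_e * x."""
    p = sigmoid(w_d * (w_e * x))
    return -(d * math.log(p) + (1 - d) * math.log(1 - p))


def grl_backward(grad: float, scale: float = 1.0) -> float:
    """Gradient reversal: identity in the forward pass, sign flip backward."""
    return -scale * grad


def adversarial_grads(w_e: float, w_d: float, x: float, d: int):
    """Return the gradients each parameter receives in one backward pass."""
    f = w_e * x                       # extractor output (forward pass)
    p = sigmoid(w_d * f)              # discriminator prediction
    dz = p - d                        # dL/d(logit) for sigmoid + BCE
    grad_w_d = dz * f                 # discriminator sees the true gradient
    grad_f = dz * w_d                 # gradient arriving at the feature
    grad_w_e = grl_backward(grad_f) * x   # extractor sees the reversed one
    return grad_w_e, grad_w_d
```

Applying an ordinary gradient-descent step with these gradients lowers the domain loss when only the discriminator is updated and raises it when only the extractor is updated, which is the min-max behaviour described above.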

3.5. Dual-Feature Sampler and Misinformation Detector for Sampling Strategies

This section describes our sample selection methodology, driven by the interaction between the misinformation detector and the dual-feature sampler.

3.5.1. Misinformation Detector

We adopt a binary classifier, denoted by $G_{r}$, to determine whether a given post is misinformation or not, based on its dual-feature representation. Formally, $G_{r}$ is a Multi-Layer Perceptron (MLP) consisting of a fully connected layer and a softmax output layer [44], yielding a probability $P_{\theta}(x)$ that indicates how likely $x$ is to be misinformation:

$P_{\theta}(x) = G_{r}(G_{e}(x)),$

(4)

where $G_{e}(\cdot)$ represents the dual-feature extractor, and $\theta$ encompasses the parameters of both $G_{e}$ and $G_{r}$.

To train this detector, we minimize a cross-entropy loss over labeled data $(x_{L}, y_{L})$:

$L_{r}(\theta_{r}, \theta_{e}) = -\mathbb{E}_{(x_{L}, y_{L})} \left[ y_{L} \log\left(P_{\theta}(x_{L})\right) + (1 - y_{L}) \log\left(1 - P_{\theta}(x_{L})\right) \right].$

(5)

By jointly optimizing the parameters $\theta_{e}$ (for $G_{e}$) and $\theta_{r}$ (for $G_{r}$), the model progressively refines its dual-feature representations and classification boundaries, thereby improving the accuracy of misinformation detection.
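As a small numerical illustration of the detection loss in Equation (5), the expectation can be approximated by a mean over a batch of labeled posts. The function name `detection_loss` and the clipping constant `eps` are assumptions added here for numerical safety; they are not part of the paper's formulation.

```python
import math
from typing import Sequence


def detection_loss(probs: Sequence[float], labels: Sequence[int],
                   eps: float = 1e-12) -> float:
    """Mean binary cross-entropy, where probs[i] is the predicted
    probability that post i is misinformation (label 1)."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)   # avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(probs)
```

A detector that outputs 0.5 for every post incurs a loss of $\ln 2 \approx 0.693$ per sample, while confident correct predictions drive the loss toward zero.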

3.5.2. Dual-Feature Sampler

The dual-feature sampler is central to sample selection in our DDT method. Its primary role is to identify posts that are maximally dissimilar from the labeled pool by leveraging both textual and affective representations. Intuitively, such posts are likely to introduce novel information that the model has yet to learn, thereby enriching the training set with diverse textual and emotional cues.

Formally, the sampler $G_{s}$ takes the dual-feature output of the extractor $G_{e}(x)$ as an input and outputs a score in $(0, 1)$ that indicates whether a post, $x$, is drawn from the labeled pool ($X_{L}$) or the unlabeled pool ($X_{U}$). Implemented using two fully connected layers and a sigmoid output layer, $G_{s}$ is trained via a binary cross-entropy loss:

$L_{s}(\theta_{s}, \theta_{e}) = -\mathbb{E}_{(x, y_{m})} \left[ y_{m} \log\left(G_{s}(G_{e}(x))\right) + (1 - y_{m}) \log\left(1 - G_{s}(G_{e}(x))\right) \right],$

(6)

where $y_{m}$ is an auxiliary label indicating whether $x$ belongs to $X_{L}$ ($y_{m} = 1$) or $X_{U}$ ($y_{m} = 0$), and $\theta_{s}$ denotes the sampler’s parameters. By minimizing $L_{s}$, the sampler becomes increasingly adept at distinguishing labeled from unlabeled samples based on dual-feature representations, thus facilitating the selection of unlabeled samples that are most likely to enhance model performance.

3.5.3. Sampling Strategy

Our sampling strategy aims to acquire unlabeled samples that are both information-rich and uncertain for misinformation detection. Specifically, we identify samples that (1) differ substantially from the labeled pool in both textual and affective features and (2) exhibit high classification uncertainty.

To quantify uncertainty, we leverage the entropy measure of a sample, $x_{U}$:

H (x_{U}) = - \sum_{i = 1}^{n} [P (y_{i} ∣ x_{U}) \cdot ln P (y_{i} ∣ x_{U})],

(7)

where $P (y_{i} ∣ x_{U})$ is the predicted probability that $x_{U}$ belongs to label $y_{i}$. Intuitively, when these probabilities are close to each other (e.g., near 0.5 for a binary classification), the detector is highly uncertain about the sample’s label.
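The entropy in Equation (7) can be computed in a few lines. In this sketch, the function name `prediction_entropy` is introduced for illustration, and terms with zero probability are skipped under the usual convention that 0 · ln 0 = 0.

```python
import math

def prediction_entropy(probs):
    """Shannon entropy (natural log) of a predicted label distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

uncertain = prediction_entropy([0.5, 0.5])    # maximal for two classes: ln 2
confident = prediction_entropy([0.99, 0.01])  # close to zero
```

For binary classification the entropy peaks at ln 2 when both classes are equally likely and falls toward zero as the prediction becomes confident.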

We select a total of b samples through the following two-step procedure:

Pre-sampling using dual-feature scores. We compute a score for each unlabeled sample, $x_{U}$ , that reflects its dissimilarity from the labeled pool, using the dual-feature sampler output:

$score = 1 - G_{s} (G_{e} (x_{U})) .$

(8)

A higher score indicates greater textual and affective divergence from the labeled pool. We then select the top $2 b$ unlabeled samples with the highest scores as our pre-sampled set.
Uncertainty-based refinement. From this pre-sampled set of $2 b$ candidates, we use the fake information detector to compute each sample’s entropy (Equation (7)). We then pick the b samples with the highest entropy values, indicating the greatest uncertainty, as our final query set.
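The two-step procedure can be sketched as follows. The function `select_queries`, its argument names, and the toy inputs are hypothetical; the sketch assumes the sampler outputs and the detector's class probabilities have already been computed for every unlabeled sample.

```python
import math

def entropy(probs):
    """Entropy of a predicted label distribution (Equation (7))."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def select_queries(sampler_outputs, class_probs, b, n=2):
    """Return indices of b unlabeled samples chosen by the two-step procedure.

    sampler_outputs -- G_s(G_e(x_U)) for every unlabeled sample
    class_probs     -- predicted label distribution for every unlabeled sample
    b               -- number of samples to query
    n               -- pre-sampling multiplier (the procedure above uses 2)
    """
    # Step 1: rank by dissimilarity score (Equation (8)) and keep the top n * b.
    scores = [1.0 - g for g in sampler_outputs]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    candidates = ranked[: n * b]
    # Step 2: among the candidates, keep the b samples with the highest entropy.
    candidates.sort(key=lambda i: entropy(class_probs[i]), reverse=True)
    return candidates[:b]

outputs = [0.9, 0.2, 0.1, 0.3, 0.8, 0.15]
probs = [[0.5, 0.5], [0.95, 0.05], [0.6, 0.4], [0.55, 0.45], [0.5, 0.5], [0.9, 0.1]]
chosen = select_queries(outputs, probs, b=2)  # -> [3, 2]
```

In this toy run, samples 0 and 4 have the highest entropy but are never queried, because their sampler outputs mark them as similar to the labeled pool and they are filtered out in the first step.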

3.6. Algorithm Optimization

During the pre-training stage, the dual-feature extractor $G_{e}$ produces textual and affective representations of posts. The fake information detector is trained to minimize the detection loss $L_{r} (θ_{r}, θ_{e})$ (Equation (5)), thereby improving the model’s ability to classify posts accurately. Simultaneously, an adversarial game takes place between $G_{e}$ and the domain discriminator, where $G_{e}$ tries to maximize the domain classification loss $L_{d} (θ_{d}, θ_{e})$, while the domain discriminator endeavors to minimize it to distinguish domain-specific features. Furthermore, the dual-feature sampler $G_{s}$ cooperates with $G_{e}$ to capture textual and affective discrepancies between $X_{L}$ and $X_{U}$, minimizing $L_{s} (θ_{s}, θ_{e})$.

To ensure that both textual and affective features contribute effectively during training, we adopt an end-to-end optimization strategy based on the fused dual-feature representation. Specifically, the output of the dual-feature extractor, obtained by integrating semantic representations from BERT-based CNN modules and affective representations from Bi-GRU, is used uniformly across all training objectives. The misinformation detector, domain discriminator, and dual-feature sampler all operate on this fused representation, and their associated losses (i.e., $L_{r}$, $L_{d}$, and $L_{s}$) are computed based on it. This design enables the model to jointly learn from linguistic and emotional cues in a unified representation space, facilitating more robust and generalizable feature learning under low-resource and domain-shift conditions.

Formally, the overall pre-training objective is

L_{pre} (θ_{e}, θ_{r}, θ_{d}, θ_{s}) = λ_{r} L_{r} (θ_{r}, θ_{e}) - λ_{d} L_{d} (θ_{d}, θ_{e}) + λ_{s} L_{s} (θ_{s}, θ_{e}),

(9)

aiming to simultaneously optimize the parameters of each component for their respective objectives. The pre-training algorithm is detailed in Algorithm 1.

Algorithm 1 Pre-training process in DDT.

Require: Labeled source pool $D_{L} (X_{L}, Y_{L})$, unlabeled target pool $D_{U} (X_{U})$, auxiliary domain labels $Y_{K}$ for $D_{L}$ and $D_{U}$, initialized models $θ_{e}, θ_{d}, θ_{r}, θ_{s}$
Ensure: Hyperparameters: epochs, $λ_{r}$, $λ_{d}$, $λ_{s}$
1: for $i = 1$ to epochs do
2:     sample $(x_{L}, y_{L}) \sim (X_{L}, Y_{L})$
3:     sample $x_{U} \sim X_{U}$
4:     compute $L_{r} (θ_{r}, θ_{e})$ using Equation (5) ▹ for $x_{L}$
5:     compute $L_{d} (θ_{d}, θ_{e})$ ▹ for $x_{L}$, $x_{U}$ (domain labels)
6:     compute $L_{s} (θ_{s}, θ_{e})$ ▹ for $x_{L}$, $x_{U}$ (sampler task)
7:     $L_{pre} \leftarrow λ_{r} L_{r} - λ_{d} L_{d} + λ_{s} L_{s}$
8:     update $θ_{e}, θ_{d}, θ_{r}, θ_{s}$ via gradient descent
9: end for
10: initially query $D_{U}$ for candidate samples using the strategy in Section 3.5.3
return trained $θ_{e}, θ_{d}, θ_{r}, θ_{s}$ and the selected candidates
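Because $L_{d}$ enters Equation (9) with a negative sign, the pre-training objective can be read as a saddle-point problem in the style of domain-adversarial training [43]. The restatement below is a sketch consistent with the adversarial game described above, assuming $λ_{d} > 0$; the hatted symbols are notation added here to denote the parameters at the saddle point.

```latex
\begin{aligned}
(\hat{θ}_{e}, \hat{θ}_{r}, \hat{θ}_{s}) &= \arg\min_{θ_{e}, θ_{r}, θ_{s}} L_{pre} (θ_{e}, θ_{r}, \hat{θ}_{d}, θ_{s}), \\
\hat{θ}_{d} &= \arg\max_{θ_{d}} L_{pre} (\hat{θ}_{e}, \hat{θ}_{r}, θ_{d}, \hat{θ}_{s}) = \arg\min_{θ_{d}} L_{d} (θ_{d}, \hat{θ}_{e}).
\end{aligned}
```

In domain-adversarial training [43], this opposite update direction for the discriminator is commonly realized with a gradient reversal layer, so that a single backward pass serves both players. Whether DDT uses this mechanism is not stated here, so it should be read as one possible realization.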

During the fine-tuning stage, we apply the sampling strategy (Section 3.5.3) to select valuable samples from the unlabeled target set $X_{U}$, extract their dual-feature representations, and refine both the fake information detector and the sampler. The fine-tuning objective is

L_{ft} (θ_{e}, θ_{r}, θ_{s}) = L_{r} (θ_{r}, θ_{e}) + λ_{t} L_{s} (θ_{s}, θ_{e}) .

(10)

Minimizing $L_{ft}$ enhances the classifier’s domain-specific accuracy while continuously adjusting the sampler to the evolving data distribution. The fine-tuning algorithm is summarized in Algorithm 2.

Algorithm 2 Fine-tuning process in DDT.

Require: Labeled target pool $\tilde{D_{L}}$, unlabeled target pool $D_{U} ∖ \tilde{D_{U}}$, trained parameters $θ_{e}, θ_{r}, θ_{s}$
Ensure: Hyperparameters: epochs, budget, $λ_{t}$
1: while budget not exhausted do
2:     for $i = 1$ to epochs do
3:         sample $(x_{L}, y_{L})$ from $\tilde{D_{L}}$
4:         sample $x_{U}$ from $D_{U} ∖ \tilde{D_{U}}$
5:         compute $L_{r} (θ_{r}, θ_{e})$ (Equation (5)) ▹ on $x_{L}$
6:         compute $L_{s} (θ_{s}, θ_{e})$ ▹ for $x_{L}$, $x_{U}$
7:         $L_{ft} \leftarrow L_{r} + λ_{t} L_{s}$
8:         update $θ_{e}, θ_{r}, θ_{s}$ via gradient descent
9:     end for
10:     query new samples from $D_{U}$ via the sampling strategy
11:     annotate and add these samples to $\tilde{D_{L}}$
12: end while
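The control flow of Algorithm 2 can be sketched as a plain query-annotate-retrain loop. Everything model-specific is passed in as a callable; the names `train`, `query`, and `annotate`, as well as the per-round batch size, are hypothetical hooks introduced for illustration rather than parts of the authors' code.

```python
def active_learning_loop(labeled, unlabeled, budget, batch_size, train, query, annotate):
    """Skeleton of the query-annotate-retrain cycle in Algorithm 2.

    labeled    -- list of (sample, label) pairs from the target domain
    unlabeled  -- list of unlabeled target samples
    budget     -- total number of annotations allowed
    batch_size -- number of samples queried per round
    train, query, annotate -- caller-supplied callables (hypothetical hooks)
    """
    labeled, unlabeled = list(labeled), list(unlabeled)
    while budget > 0 and unlabeled:
        train(labeled, unlabeled)                  # steps 2-9: minimize L_ft
        b = min(batch_size, budget, len(unlabeled))
        picked = query(unlabeled, b)               # step 10: sampling strategy
        if not picked:
            break                                  # nothing selected, stop early
        for sample in picked:                      # step 11: annotate and move
            unlabeled.remove(sample)
            labeled.append((sample, annotate(sample)))
        budget -= len(picked)
    return labeled, unlabeled

# Toy usage with stand-in hooks: query the largest values, label by parity.
final_labeled, remaining = active_learning_loop(
    labeled=[(0, 0)],
    unlabeled=[1, 2, 3, 4, 5],
    budget=4,
    batch_size=2,
    train=lambda lab, unl: None,
    query=lambda pool, b: sorted(pool, reverse=True)[:b],
    annotate=lambda x: x % 2,
)
```

Keeping the hooks separate makes it easy to swap the query function, which is how the Group A baselines differ from DDT in the experiments that follow.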

4. Experiments

To demonstrate the effectiveness of our proposed DDT framework, we compared it against several active learning methods and cross-domain misinformation detection baselines. We then performed an ablation study to investigate the contributions of each component and analyze hyperparameter sensitivity, thereby assessing the robustness and underlying mechanism of DDT.

4.1. Experimental Setup

Datasets. We evaluated DDT on five domains derived from the PHEME dataset [45], which compiles Twitter data including social context, posted content, and associated labels: Germanwings-crash (Ger.), Sydneysiege (Syd.), Ottawashooting (Ott.), Charliehebdo (Cha.), and Ferguson (Fer.). For each target domain, we treated the remaining four events as the source domain. Table 1 summarizes the distribution of fake and real news across these five domains.

Implementation Details. We initialized the source domain data as the labeled pool and split each target domain into three partitions: (1) 10% for initial training, (2) 20% for testing, and (3) the remaining 70% as an unlabeled pool. In each iteration, we sampled 5% of the unlabeled pool for annotation by an oracle. We assumed an ideal annotation budget, b, and appropriate domain expertise to facilitate the labeling process.
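The three-way partition of a target domain can be sketched as below. The random shuffle, the fixed seed, and the rounding rule are assumptions, since the text does not say how the partitions were drawn (for example, whether they were stratified by label).

```python
import random

def split_target(indices, seed=0):
    """Split target-domain indices into 10% initial, 20% test, 70% unlabeled."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    n_init = round(0.10 * len(shuffled))
    n_test = round(0.20 * len(shuffled))
    initial = shuffled[:n_init]
    test = shuffled[n_init:n_init + n_test]
    unlabeled = shuffled[n_init + n_test:]
    return initial, test, unlabeled

# Germanwings-crash has 238 + 231 = 469 target posts in Table 1.
initial, test, unlabeled = split_target(range(469))
```

With 469 posts this yields 47 initial, 94 test, and 328 unlabeled samples under the rounding used here.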

4.2. Baselines

To evaluate the effectiveness of our proposed DDT framework, we compared it with baselines from two groups: (A) classical and state-of-the-art active learning strategies and (B) cross-domain fake information detection models.

4.2.1. Group A: Active Learning Strategies

Random. A simple sampling strategy that selects unlabeled instances uniformly at random, without considering any particular features or domain knowledge.
Uncertainty [46]. An approach that picks samples for which the model is least certain. We estimate uncertainty using the predicted probability distribution (entropy), where higher entropy signifies greater uncertainty.
Core-set [47]. This strategy queries samples that are farthest (in Euclidean distance) from any labeled instance, under the assumption that these will yield the most novel information for model improvement.
TQS [38]. Transferable Query Selection combines three criteria (a transferable committee, uncertainty, and domainness) to identify highly informative samples under domain shifts. It also incorporates random sampling to increase diversity among the selected queries.
DAAL [48]. Domain-adversarial active learning leverages textual and affective features to identify samples most dissimilar to labeled data and uses adversarial domain training to learn transferable representations.
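To make the Core-set baseline concrete, the sketch below uses greedy farthest-first (k-center greedy) selection, which is the standard greedy approximation for this strategy. The function name and the toy 2-D points are illustrative assumptions; in practice the distances would be computed on learned embeddings.

```python
import math

def k_center_greedy(labeled, unlabeled, b):
    """Pick b unlabeled points, each time taking the one farthest
    (in Euclidean distance) from all labeled or already-picked points."""
    # Distance from every unlabeled point to its nearest labeled point.
    nearest = [min(math.dist(u, l) for l in labeled) for u in unlabeled]
    picked = []
    for _ in range(min(b, len(unlabeled))):
        idx = max(range(len(unlabeled)), key=lambda i: nearest[i])
        picked.append(idx)
        # The picked point now counts as covered, so shrink the distances.
        nearest = [min(d, math.dist(u, unlabeled[idx])) for d, u in zip(nearest, unlabeled)]
        nearest[idx] = -1.0  # never pick the same point twice
    return picked

labeled = [(0.0, 0.0)]
unlabeled = [(1.0, 0.0), (5.0, 0.0), (5.5, 0.0), (0.0, 3.0)]
picked = k_center_greedy(labeled, unlabeled, b=2)  # -> [2, 3]
```

After the point at (5.5, 0) is chosen, its neighbor at (5, 0) is considered covered, so the second pick goes to (0, 3). This coverage behavior is what distinguishes Core-set from a plain distance ranking.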

4.2.2. Group B: Cross-Domain Fake Information Detection Models

EANNs. Event-Adversarial Neural Networks primarily incorporate text features from multiple domains through adversarial learning to boost cross-domain detection performance.
EDDFNs [49]. These learn domain vectors via unsupervised methods and augment them with both domain-specific and cross-domain information for improved fake information detection across diverse domains.
MDFEND [50]. This employs multiple domain experts and domain gating to integrate text features with domain-aware representations, thereby enhancing cross-domain detection capability.
DAAL [48]. As noted above, DAAL combines textual and affective features with adversarial domain training, effectively transferring knowledge across domains.
FinDCL [51]. Fine-Grained Discrepancy Contrastive Learning simulates nuanced differences between fake news and event-related truths via adversarial pre-training. It then refines truth extraction under a contrastive framework to better capture subtle falsehood patterns and reduce redundancy.

4.2.3. Comparison with Baselines

Table 2 summarizes the accuracy achieved by DDT and baselines in Group A under increasing proportions of labeled target data (15%, 20%, 25%, 30%, 35%, 40%, 45%, and 50%). Accuracy is a widely adopted metric in fake information detection. Across all proportions, DDT either matches or surpasses the performance of the Group A baselines.

We specifically compared DDT’s dual-aspect sampling strategy (applied after adversarial pre-training) against random, uncertainty, and core-set methods, which used the same adversarial pre-training process but differed in how they selected samples. The empirical results demonstrated that DDT achieved higher accuracy than these baselines on all five domains. This outcome underscores DDT’s effectiveness in identifying high-quality queries by jointly leveraging textual and affective features alongside uncertainty. In contrast, the alternative methods either overlooked misinformation-specific signals or did not adequately handle cross-domain discrepancies, thus yielding lower detection accuracy.

To further assess the performance of DDT relative to the cross-domain fake information detection models in Group B, we trained each of these baseline models using the source domain data combined with 20% of the target domain as labeled training samples. In contrast, our DDT framework incrementally incorporated the same 20% subset through its dual-aspect active learning process. Table 3 presents the comparative results, showcasing the effectiveness of DDT in tackling the fake information detection task.

Across all evaluated settings, DDT consistently outperformed baseline models in both active learning (Group A) and cross-domain transfer (Group B) scenarios. The proposed dual-aspect sampling strategy enables more effective query selection by jointly considering semantic divergence and affective signals, while the domain-adversarial training component promotes the extraction of transferable representations that are robust to domain shifts.

In particular, Table 2 and Table 3 show that DDT achieved state-of-the-art accuracy and F1-scores across five target domains, outperforming strong baselines including DAAL, Core-set, and cross-domain models such as EANNs and FinDCL. This demonstrates that DDT not only improves performance within a single domain but also excels in handling real-world settings where labeled data from emerging events are scarce and distributional discrepancies are substantial.

These results demonstrate that DDT generalizes well across different real-world events, even under significant domain shifts. The consistent performance gains across five distinct domains highlight its strong cross-domain adaptability, enabled by domain-invariant representations and affect-aware sampling strategies.

4.2.4. Analysis of Active Learning Sampling

We employed t-Distributed Stochastic Neighbor Embedding (t-SNE) to reduce the dimensionality of post embeddings to two and plotted the results, shown in Figure 4. To provide a more intuitive illustration, we present two perspectives of the selected samples: (1) their distribution relative to positive and negative instances and (2) their distribution across the source and target domains.

Figure 4 is arranged in a 5 × 2 grid, yielding five subgraph groups from left to right. Each group contains two stacked subgraphs, corresponding to one of the five experimental settings: Ferguson, Germanwings-crash, Ottawashooting, Sydneysiege, and Charliehebdo. Across all these visualizations, the selected samples clearly lie between the positive and negative clusters, while also being farthest from the source domain. This indicates that our active learning strategy successfully identifies samples that introduce new and informative data features to the model.

4.2.5. Ablation Analysis

To assess the contribution of each major component, we compared DDT with three ablated variants:

DDT\T: DDT without textual features in the dual-feature sampler; only affective cues were used for sample selection.
DDT\A: DDT without affective features in the sampler, relying purely on textual signals to pick informative samples.
DDT\D: DDT without a domain discriminator, i.e., removing adversarial domain training during the pre-training stage and relying solely on labeled target data.

Figure 5 illustrates that dual-feature sampling and adversarial domain training were critical to DDT’s overall performance. When these components were absent, the respective ablated versions exhibited diminished accuracy throughout the active learning query cycles.

Interestingly, DDT\D outperformed other variants (and even DDT) in the pre-training phase. Because it does not employ adversarial training, it could directly learn domain-specific features from the 10% of labeled target data, thus initially achieving higher accuracy on that specific domain. By contrast, DDT (and other variants) focused on learning domain-invariant features via adversarial training, leading to temporarily lower accuracy in the target domain. However, after additional active learning cycles, DDT\D consistently lagged behind DDT because it failed to leverage the source-domain knowledge and shared feature space that adversarial training provides. As a result, DDT ultimately benefited from its capacity to fuse information from both source and target domains.

Meanwhile, DDT\T and DDT\A performed comparably to DDT in the initial stage but gradually fell behind during fine-tuning. Since each of these variants omitted one modality (textual or affective) when selecting samples from the unlabeled pool, they missed crucial cues. Consequently, they failed to discover the most informative candidates, namely those offering a balance of linguistic and emotional signals, and realized smaller performance gains in subsequent queries. The improved performance of full DDT underscores the importance of incorporating both textual and affective features for richer exploration of the unlabeled data space.

Through the ablation analysis of DDT, we can draw the following conclusions.

The integration of textual and affective features proves advantageous for DDT to query the most informative posts.
The shared domain features learned by the dual-feature extractor are conducive to improving DDT’s performance in a new domain.

4.2.6. Hyperparameter Sensitivity

As discussed earlier, DDT first selects $n \times b$ unlabeled samples in a pre-sampling stage using the dual-feature sampler and then chooses the most uncertain of those $n \times b$ candidates for annotation. Therefore, the hyperparameter n critically influences sample diversity and informativeness. We evaluated DDT’s accuracy under different values of n to investigate its impact on the sampling process.

Figure 6 presents the performance of our model with various n. We observe that the highest accuracy was achieved when n was set to 2. When $n = 1$, the pre-sampling step degenerated into a single-pass strategy relying solely on the dual-feature sampler’s dissimilarity measure, potentially selecting outliers and limiting performance gains. Conversely, as n grew larger and $n \times b$ became comparable to the entire unlabeled pool, the approach resembled an uncertainty-only sampling strategy, diminishing the added value of the dual-feature sampler and reducing the overall diversity of candidates. Hence, setting $n = 2$ struck an effective balance between discovering informative outliers and ensuring sufficient diversity, leading to optimal performance in our experiments.
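The two limiting cases of n can be checked on a toy pool. In this sketch, the function `two_step` and the precomputed dissimilarity and entropy values are hypothetical, chosen only to make the three regimes visible.

```python
def two_step(dissimilarity, entropy, b, n):
    """Keep the n * b most dissimilar samples, then the b most uncertain of those."""
    order = sorted(range(len(dissimilarity)), key=lambda i: -dissimilarity[i])
    pre = order[: n * b]
    return sorted(pre, key=lambda i: -entropy[i])[:b]

# Toy pool of six samples (hypothetical values).
dissimilarity = [0.9, 0.8, 0.7, 0.6, 0.2, 0.1]
entropy = [0.10, 0.30, 0.65, 0.20, 0.69, 0.68]

only_dissimilar = two_step(dissimilarity, entropy, b=2, n=1)  # samples 0 and 1
balanced = two_step(dissimilarity, entropy, b=2, n=2)         # samples 2 and 1
only_uncertain = two_step(dissimilarity, entropy, b=2, n=3)   # samples 4 and 5
```

With n = 1 the uncertainty step cannot change which samples are chosen, and with n × b equal to the pool size the dissimilarity step filters nothing, which matches the two degenerate regimes described above.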

Moreover, we examined how assigning different weights to losses in the pre-training phase affected subsequent performance. Specifically, we varied the relative weights of the fake information detector, domain discriminator, and dual-feature sampler losses in Equation (9). As shown in Figure 7, increasing the weight of the fake information detector loss during pre-training led to poorer fine-tuning performance. We attributed this to the model overfitting the detection task and losing its adaptability to new events. In contrast, increasing the domain discriminator loss weight enhanced the model’s ability to learn shared domain features, ultimately boosting accuracy in the fine-tuning stage. Similarly, raising the weight of the dual-feature sampler loss was especially beneficial when a larger volume of labeled data became available, as it allowed the model to better isolate informative samples by learning the gap between the labeled and unlabeled pools. Furthermore, setting the losses to a 1:1:1 ratio balanced the three objectives effectively, yielding a strong overall improvement as labeled data accumulated. These findings underscore the importance of carefully tuning the loss weighting scheme in Equation (9) to optimize DDT’s performance.

In addition, the parameter $λ_{t}$ in Equation (10) was pivotal during fine-tuning. We evaluated $λ_{t} \in \{0.5, 1, 2\}$ and report the results in Figure 8. Our experiments indicated that $λ_{t} = 1$ achieved the best overall performance. When the fake information detector was given excessive weight, the model initially attained higher accuracy but subsequently experienced slower improvement, reflecting the insufficient adaptation of the dual-feature sampler. Conversely, setting $λ_{t} = 2$ accelerated the performance gain by better balancing the roles of the detector and sampler. These outcomes further underscore the significance of tuning loss components in both pre-training and fine-tuning phases for the optimal performance of DDT.

5. Conclusions and Future Work

In this work, we present a novel dual-aspect active learning framework with domain-adversarial training (DDT) for cross-domain fake information detection under low-resource conditions. The proposed method introduces a dual-feature representation that jointly captures textual and affective cues, combined with a two-stage sampling strategy and adversarial learning to achieve robust domain-invariant modeling. By integrating informativeness and uncertainty in sample selection, DDT effectively identifies high-value unlabeled instances to maximize model gains with minimal annotation cost.

Extensive experiments across five real-world misinformation domains demonstrated the superiority of DDT over classical active learning and domain adaptation methods, as well as its strong generalizability across domains. The results validated the importance of jointly modeling semantic and emotional signals in misinformation and the advantage of incorporating domain-aligned training strategies.

In the future, we aim to enhance the active learning mechanism by adaptively adjusting sampling strategies according to the evolving target domain distribution. Additionally, we plan to incorporate multimodal information (e.g., images, metadata) and explore task-specific features such as stance, novelty, or credibility signals to further improve detection accuracy. Finally, we intend to evaluate our framework on larger-scale and multilingual misinformation datasets to assess its scalability and real-world deployment potential.

Author Contributions

Conceptualization, L.H. and G.H.; methodology, G.H.; software, Z.Y.; validation, S.L., Z.Y. and F.J.; formal analysis, Y.R.; investigation, X.W.; resources, F.J.; data curation, Z.Y.; writing—original draft preparation, Z.Y.; writing—review and editing, L.H.; visualization, S.L.; supervision, Z.Y.; project administration, X.W.; funding acquisition, Z.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Science and Technology Innovation Key R&D Program of Chongqing, China (grant number CSTB2022TIAD-STX0006) and the Science and Technology Research Program of Chongqing Municipal Education, China (grant number KJZD-K202304401).

Data Availability Statement

The dataset is at https://github.com/majingCUHK/Rumor_GAN (accessed on 20 May 2025).

Conflicts of Interest

Authors Luyao Hu, Guangpu Han, Shichang Liu, Yuqing Ren and Xu Wang were employed by the PetroChina Southwest Oil & Gasfield Company. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

[1] Vosoughi, S.; Roy, D.; Aral, S. The spread of true and false news online. Science 2018, 359, 1146–1151.
[2] Wang, J.; Gao, M.; Wang, Z.; Wang, R.; Wen, J. Robustness Analysis of Triangle Relations Attack. In Proceedings of the 2020 IEEE 13th International Conference on Cloud Computing, Beijing, China, 18–24 October 2020; pp. 557–565.
[3] Liu, Z.; Qin, T.; Sun, Q.; Li, S.; Song, H.H.; Chen, Z. SIRQU: Dynamic Quarantine Defense Model for Online Rumor Propagation Control. IEEE Trans. Comput. Soc. Syst. 2022, 9, 1703–1714.
[4] Amira, A.; Derhab, A.; Hadjar, S.; Merazka, M.; Alam, M.G.R.; Hassan, M.M. Detection and Analysis of Fake News Users’ Communities in Social Media. IEEE Trans. Comput. Soc. Syst. 2023, 11, 5050–5059.
[5] Yu, W.; Ge, J.; Chen, Z.; Liu, H.; Ouyang, M.; Zheng, Y.; Kong, W. Research on Fake News Detection Based on Dual Evidence Perception. Eng. Appl. Artif. Intell. 2024, 133, 108271.
[6] Jing, J.; Li, F.; Song, B.; Zhang, Z.; Choo, K.K.R. Disinformation Propagation Trend Analysis and Identification Based on Social Situation Analytics and Multilevel Attention Network. IEEE Trans. Comput. Soc. Syst. 2023, 10, 507–522.
[7] Dong, X.; Victor, U.; Qian, L. Two-Path Deep Semisupervised Learning for Timely Fake News Detection. IEEE Trans. Comput. Soc. Syst. 2020, 7, 1386–1398.
[8] Babaei, M.; Kulshrestha, J.; Chakraborty, A.; Redmiles, E.M.; Cha, M.; Gummadi, K.P. Analyzing Biases in Perception of Truth in News Stories and Their Implications for Fact Checking. IEEE Trans. Comput. Soc. Syst. 2022, 9, 839–850.
[9] Gao, F.; Pi, D.; Chen, J. Balanced and robust unsupervised Open Set Domain Adaptation via joint adversarial alignment and unknown class isolation. Expert Syst. Appl. 2024, 238, 122127.
[10] Yu, Y.; Karimi, H.R.; Shi, P.; Peng, R.; Zhao, S. A new multi-source information domain adaption network based on domain attributes and features transfer for cross-domain fault diagnosis. Mech. Syst. Signal Process. 2024, 211, 111194.
[11] Tang, Y.; Zhang, L.; Zhang, W.; Jiang, Z. Multi-task convex combination interpolation for meta-learning with fewer tasks. Knowl.-Based Syst. 2024, 296, 111839.
[12] Ghanem, B.; Ponzetto, S.P.; Rosso, P.; Rangel, F. Fakeflow: Fake news detection by modeling the flow of affective information. arXiv 2021, arXiv:2101.09810.
[13] Miao, X.; Rao, D.; Jiang, Z. Syntax and sentiment enhanced bert for earliest rumor detection. In Proceedings of the NLPCC 2021, Qingdao, China, 13–17 October 2021; Springer: Berlin/Heidelberg, Germany, 2021; pp. 570–582.
[14] Huang, Y.; Zhang, W.; Li, M.; Chen, X. EML: Emotion-Aware Meta Learning for Cross-Event False Information Detection. ACM Trans. Knowl. Discov. Data TKDD 2024, 18, 1–25.
[15] Yuan, H.; Zheng, J.; Ye, Q.; Qian, Y.; Zhang, Y. Improving fake news detection with domain-adversarial and graph-attention neural network. Decis. Support Syst. 2021, 151, 113633.
[16] Gao, J.; Han, S.; Song, X.; Ciravegna, F. Rp-dnn: A tweet level propagation context based deep neural networks. arXiv 2020, arXiv:2002.12683.
[17] Verma, P.K.; Agrawal, P.; Amorim, I.; Prodan, R. WELFake: Word Embedding Over Linguistic Features for Fake News Detection. IEEE Trans. Comput. Soc. Syst. 2021, 8, 881–893.
[18] Xu, F.; Zeng, L.; Huang, Q.; Yan, K.; Wang, M.; Sheng, V.S. Hierarchical graph attention networks for multi-modal rumor detection on social media. Neurocomputing 2024, 569, 127112.
[19] Zhou, H.; Ma, T.; Rong, H.; Qian, Y.; Tian, Y.; Al-Nabhan, N. MDMN: Multi-task and Domain Adaptation based Multi-modal Network for early rumor detection. Expert Syst. Appl. 2022, 195, 116517.
[20] Bian, T.; Xiao, X.; Xu, T.; Zhao, P.; Huang, W.; Rong, Y.; Huang, J. Rumor detection on social media with bi-directional graph convolutional networks. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 549–556.
[21] Chen, Y.; Li, D.; Zhang, P.; Sui, J.; Lv, Q.; Tun, L.; Shang, L. Cross-modal ambiguity learning for multimodal fake news detection. In Proceedings of the ACM Web Conference 2022, Lyon, France, 25–29 April 2022; pp. 2897–2905.
[22] Lee, J.; Park, S.; Ko, Y. Korean Hate Speech and News Dataset for Detecting Harmful Language on Social Media. Appl. Sci. 2021, 11, 903.
[23] Alhindi, T.; Petridis, S.; Damljanovic, D. Detecting Fake News in Arabic Using Deep Learning. In Proceedings of the 4th Workshop on Open-Source Arabic Corpora and Processing Tools, Marseille, France, 11–16 May 2020.
[24] Chakraborty, T.; Bandyopadhyay, S.; Ghosh, S.; Goyal, P. Multilingual and Multimodal Fake News Detection Using XLM-R and VisualBERT. In Proceedings of the 12th Language Resources and Evaluation Conference, Marseille, France, 11–16 May 2020; pp. 1097–1106.
[25] Blanco-Fernández, Y.; Otero-Vizoso, J.; Gil-Solla, A.; García-Duque, J. Enhancing Misinformation Detection in Spanish Language with Deep Learning: BERT and RoBERTa Transformer Models. Appl. Sci. 2024, 14, 9729.
[26] Tretiakov, A.; Martín, A.; Camacho, D. Detection of False Information in Spanish Using Machine Learning Techniques. In Proceedings of the International Conference on Intelligent Data Engineering and Automated Learning, Évora, Portugal, 22–24 November 2023.
[27] Pavlyshenko, B.M. Analysis of Disinformation and Fake News Detection Using Fine-Tuned Large Language Model. arXiv 2023, arXiv:2309.04704.
[28] Newport, A.; Jankowicz, N. Russian networks flood the Internet with propaganda, aiming to corrupt AI chatbots. Bull. At. Sci. 2025. Available online: https://thebulletin.org/2025/03/russian-networks-flood-the-internet-with-propaganda-aiming-to-corrupt-ai-chatbots/ (accessed on 20 May 2025).
[29] Wang, Z.; Yuan, L.; Zhang, Z.; Zhao, Q. Bridging Cognition and Emotion: Empathy-Driven Multimodal Misinformation Detection. arXiv 2025, arXiv:2504.17332.
[30] Xu, X.; Li, X.; Wang, T.; Jiang, Y. AMPLE: Emotion-Aware Multimodal Fusion Prompt Learning for Fake News Detection. In Proceedings of the 31st International Conference on Multimedia Modeling (MMM 2025), Nara, Japan, 8–10 January 2025; Lecture Notes in Computer Science. Springer: Berlin/Heidelberg, Germany, 2025; Volume 15520, pp. 86–100.
[31] Liu, Z.; Zhang, T.; Yang, K.; Thompson, P.; Yu, Z.; Ananiadou, S. Emotion Detection for Misinformation: A Review. Inf. Fusion 2024, 107, 102300.
[32] Cardoso, T.N.; Silva, R.M.; Canuto, S.; Moro, M.M.; Gonçalves, M.A. Ranked batch-mode active learning. Inf. Sci. 2017, 379, 313–337.
[33] Kirsch, A.; Van Amersfoort, J.; Gal, Y. Batchbald: Efficient and diverse batch acquisition for deep bayesian active learning. In Advances in Neural Information Processing Systems; The MIT Press: Cambridge, MA, USA, 2019; Volume 32.
[34] Zhang, B.; Li, L.; Yang, S.; Wang, S.; Zha, Z.J.; Huang, Q. State-relabeling adversarial active learning. In Proceedings of the IEEE/CVF Conference, Seattle, WA, USA, 14–19 June 2020; pp. 8756–8765.
[35] Ren, Y.; Wang, B.; Zhang, J.; Chang, Y. Adversarial active learning based heterogeneous graph neural network for fake news detection. In Proceedings of the 2020 IEEE International Conference on Data Mining (ICDM), Sorrento, Italy, 17–20 November 2020; pp. 452–461.
[36] Farinneya, P.; Pour, M.M.A.; Hamidian, S.; Diab, M. Active learning for rumor identification on social media. In Proceedings of the EMNLP 2021, Punta Cana, Dominican, 7–11 November 2021; pp. 4556–4565.
[37] Su, J.C.; Tsai, Y.H.; Sohn, K.; Liu, B.; Maji, S.; Chandraker, M. Active adversarial domain adaptation. In Proceedings of the IEEE/CVF Winter Conference, Village, CO, USA, 1–5 March 2020; pp. 739–748.
[38] Fu, B.; Cao, Z.; Wang, J.; Long, M. Transferable query selection for active domain adaptation. In Proceedings of the IEEE/CVF Conference, Nashville, TN, USA, 20–25 June 2021; pp. 7272–7281.
[39] Xie, B.; Yuan, L.; Li, S.; Liu, C.H.; Cheng, X.; Wang, G. Active learning for domain adaptation: An energy-based approach. In Proceedings of the AAAI Conference on Artificial Intelligence, Online, 22 February–1 March 2022; Volume 36, pp. 8708–8716.
[40] Saha, T.; Upadhyaya, A.; Saha, S.; Bhattacharyya, P. A Multitask Multimodal Ensemble Model for Sentiment- and Emotion-Aided Tweet Act Classification. IEEE Trans. Comput. Soc. Syst. 2022, 9, 508–517.
[41] Bhattacharya, P.; Patel, S.B.; Gupta, R.; Tanwar, S.; Rodrigues, J.J.P.C. SaTYa: Trusted Bi-LSTM-Based Fake News Classification Scheme for Smart Community. IEEE Trans. Comput. Soc. Syst. 2022, 9, 1758–1767.
[42] Menke, M.; Wenzel, T.; Schwung, A. Bridging the gap: Active learning for efficient domain adaptation in object detection. Expert Syst. Appl. 2024, 254, 124403.
[43] Ganin, Y.; Lempitsky, V. Unsupervised domain adaptation by backpropagation. In Proceedings of the International Conference on Machine Learning, PMLR, Lille, France, 7–9 July 2015; pp. 1180–1189.
[44] Sedhai, S.; Sun, A. Semi-Supervised Spam Detection in Twitter Stream. IEEE Trans. Comput. Soc. Syst. 2018, 5, 169–175.
[45] Zubiaga, A.; Liakata, M.; Procter, R. Exploiting context for rumour detection in social media. In Proceedings of the Social Informatics: 9th International Conference, Oxford, UK, 13–15 September 2017; Springer: Berlin/Heidelberg, Germany, 2017; pp. 109–123.
[46] Sharma, M.; Bilgic, M. Evidence-based uncertainty sampling for active learning. Data Min. Knowl. Discov. 2017, 31, 164–202.
[47] Sener, O.; Savarese, S. Active learning for convolutional neural networks: A core-set approach. arXiv 2017, arXiv:1708.00489.
[48] Zhang, C.; Gao, M.; Huang, Y.; Jiang, F.; Wang, J.; Wen, J. DAAL: Domain Adversarial Active Learning Based on Dual Features for Rumor Detection. In Proceedings of the Natural Language Processing and Chinese Computing, Foshan, China, 12–15 October 2023; pp. 690–703.
[49] Silva, A.; Luo, L.; Karunasekera, S.; Leckie, C. Embracing domain differences in fake news: Cross-domain fake news detection using multi-modal data. In Proceedings of the AAAI, Virtual, 19–21 May 2021; Volume 35, pp. 557–565.
[50] Nan, Q.; Cao, J.; Zhu, Y.; Wang, Y.; Li, J. MDFEND: Multi-domain fake news detection. In Proceedings of the CIKM, Online, 1–5 November 2021; pp. 3343–3347.
[51] Yin, J.; Gao, M.; Shu, K.; Wang, J.; Huang, Y.; Zhou, W. Fine-Grained Discrepancy Contrastive Learning for Robust Fake News Detection. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Seoul, Republic of Korea, 14–19 April 2024; pp. 12541–12545.

Figure 1. A conceptual overview of our proposed dual-aspect active learning strategy.

Figure 2. The architecture of DDT.

Figure 3. Dual-feature extractor framework.

Figure 4. The distribution of the selected samples from the two perspectives across the five datasets. (a) Visualization for Fer. (b) Visualization for Ott. (c) Visualization for Syd. (d) Visualization for Cha. (e) Visualization for Ger.

Figure 5. The accuracy of DDT and the other methods in the pre-training and fine-tuning stages with different proportions of labeled data.

Figure 6. Results of DDT with different numbers of samples in pre-sampling.

Figure 7. The accuracy of DDT and the other methods in the pre-training and fine-tuning stages with different learning rates.

Figure 8. Results of DDT on different numbers of samples in pre-sampling.

Table 1. Statistics of the five target domains constructed from PHEME [45].

Dataset	Cha.	Fer.	Ger.	Ott.	Syd.
Source Fake News	1514	1688	1734	1502	1450
Source Real News	2208	2970	3598	3408	3132
Target Fake News	458	284	238	470	522
Target Real News	1621	859	231	421	697
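
The counts in Table 1 can be checked for internal consistency. The sketch below tests whether each source set has the same size as the other four events' target sets combined, which is what a leave-one-event-out split of PHEME would produce. Only the counts are copied from the table; the dictionary keys and helper names are illustrative assumptions and do not come from the paper.

```python
# Counts copied from Table 1; keys are the abbreviated target-event names.
# Field names (src_fake, tgt_real, ...) are hypothetical labels for this sketch.
table1 = {
    "Cha.": {"src_fake": 1514, "src_real": 2208, "tgt_fake": 458, "tgt_real": 1621},
    "Fer.": {"src_fake": 1688, "src_real": 2970, "tgt_fake": 284, "tgt_real": 859},
    "Ger.": {"src_fake": 1734, "src_real": 3598, "tgt_fake": 238, "tgt_real": 231},
    "Ott.": {"src_fake": 1502, "src_real": 3408, "tgt_fake": 470, "tgt_real": 421},
    "Syd.": {"src_fake": 1450, "src_real": 3132, "tgt_fake": 522, "tgt_real": 697},
}


def pool_size(row):
    """Return (fake, real) totals over source plus target for one split."""
    return row["src_fake"] + row["tgt_fake"], row["src_real"] + row["tgt_real"]


def source_matches_other_targets(name):
    """Check that the source counts equal the summed target counts of the other events."""
    others = [row for other, row in table1.items() if other != name]
    fake_ok = sum(r["tgt_fake"] for r in others) == table1[name]["src_fake"]
    real_ok = sum(r["tgt_real"] for r in others) == table1[name]["src_real"]
    return fake_ok and real_ok


def target_fake_ratio(row):
    """Share of fake items in the target domain, a rough class-balance indicator."""
    return row["tgt_fake"] / (row["tgt_fake"] + row["tgt_real"])


pool_sizes = {name: pool_size(row) for name, row in table1.items()}
all_consistent = all(source_matches_other_targets(name) for name in table1)

for name, row in table1.items():
    print(name, pool_sizes[name], f"target fake ratio = {target_fake_ratio(row):.3f}")
print("leave-one-event-out consistent:", all_consistent)
```

Under this reading, every split draws on the same pool of 1972 fake and 3829 real items, and the target domains differ mainly in size and class balance.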

Table 2. Overall performance comparison between DDT and the baselines in Group A. The target event represents the corresponding constructed datasets. The best results are highlighted in bold, while the second-best results are underlined.

Dataset	Strategy	15%	20%	25%	30%	35%	40%	45%	50%
Germanwings-crash	TQS	0.738	0.825	0.831	0.800	0.800	0.800	0.825	0.850
	UCN	0.781	0.806	0.850	0.831	0.806	0.825	0.831	0.850
	RAN	0.800	0.816	0.788	0.831	0.788	0.825	0.844	0.819
	CoreSet	0.806	0.844	0.831	0.869	0.831	0.831	0.836	0.819
	DAAL	0.831	0.875	0.863	0.875	0.869	0.856	0.863	0.863
	DDT	0.823	0.870	0.875	0.877	0.900	0.913	0.925	0.900
Sydneysiege	TQS	0.758	0.790	0.777	0.792	0.790	0.804	0.800	0.790
	UCN	0.789	0.833	0.835	0.858	0.844	0.848	0.867	0.854
	RAN	0.817	0.838	0.846	0.842	0.846	0.848	0.858	0.844
	CoreSet	0.838	0.831	0.858	0.858	0.848	0.842	0.854	0.863
	DAAL	0.850	0.850	0.854	0.856	0.856	0.867	0.877	0.871
	DDT	0.821	0.857	0.858	0.863	0.861	0.869	0.882	0.880
Ottawashooting	TQS	0.778	0.804	0.753	0.810	0.793	0.807	0.815	0.827
	UCN	0.824	0.835	0.835	0.872	0.878	0.895	0.884	0.887
	RAN	0.818	0.792	0.830	0.858	0.861	0.852	0.869	0.889
	CoreSet	0.807	0.835	0.849	0.852	0.855	0.875	0.878	0.875
	DAAL	0.838	0.886	0.861	0.895	0.903	0.895	0.895	0.903
	DDT	0.841	0.886	0.898	0.909	0.909	0.915	0.921	0.926
Charliehebdo	TQS	0.785	0.839	0.825	0.850	0.841	0.826	0.869	0.854
	UCN	0.830	0.848	0.835	0.845	0.851	0.846	0.840	0.828
	RAN	0.826	0.833	0.846	0.836	0.835	0.823	0.835	0.846
	CoreSet	0.833	0.828	0.849	0.831	0.838	0.839	0.831	0.829
	DAAL	0.844	0.850	0.861	0.858	0.858	0.861	0.858	0.866
	DDT	0.860	0.878	0.883	0.893	0.873	0.880	0.878	0.893
Ferguson	TQS	0.799	0.813	0.817	0.826	0.839	0.817	0.817	0.828
	UCN	0.792	0.790	0.824	0.864	0.874	0.865	0.871	0.882
	RAN	0.792	0.790	0.814	0.857	0.881	0.875	0.880	0.874
	CoreSet	0.790	0.844	0.826	0.866	0.883	0.880	0.875	0.880
	DAAL	0.790	0.847	0.826	0.877	0.897	0.884	0.895	0.907
	DDT	0.886	0.871	0.884	0.888	0.906	0.902	0.888	0.902
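
Where the bold and underline highlighting is not available, the best and second-best strategy in each column can be recomputed from the numbers. The following sketch does this for the Germanwings-crash block of Table 2. The function name `top_two` and the data layout are assumptions made for illustration, and the column headers are taken to be proportions of labeled data, as in Figure 5.

```python
# Column headers of Table 2, assumed to be the proportion of labeled data.
PROPORTIONS = ["15%", "20%", "25%", "30%", "35%", "40%", "45%", "50%"]

# Accuracies copied from the Germanwings-crash block of Table 2.
germanwings = {
    "TQS":     [0.738, 0.825, 0.831, 0.800, 0.800, 0.800, 0.825, 0.850],
    "UCN":     [0.781, 0.806, 0.850, 0.831, 0.806, 0.825, 0.831, 0.850],
    "RAN":     [0.800, 0.816, 0.788, 0.831, 0.788, 0.825, 0.844, 0.819],
    "CoreSet": [0.806, 0.844, 0.831, 0.869, 0.831, 0.831, 0.836, 0.819],
    "DAAL":    [0.831, 0.875, 0.863, 0.875, 0.869, 0.856, 0.863, 0.863],
    "DDT":     [0.823, 0.870, 0.875, 0.877, 0.900, 0.913, 0.925, 0.900],
}


def top_two(block, proportions=PROPORTIONS):
    """Map each column to its (best, second-best) strategy names.

    Ties keep the order in which the strategies are listed in the block,
    because Python's sort is stable even with reverse=True.
    """
    ranking = {}
    for i, proportion in enumerate(proportions):
        ordered = sorted(block, key=lambda name: block[name][i], reverse=True)
        ranking[proportion] = (ordered[0], ordered[1])
    return ranking


ranking = top_two(germanwings)
for proportion, (best, second) in ranking.items():
    print(f"{proportion}: best={best}, second-best={second}")
```

The same function can be applied to the other four dataset blocks by passing a dictionary with the corresponding rows.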

Table 3. Performance comparison between DDT and the baselines in Group B. The target event represents the corresponding constructed datasets. The best results are highlighted in bold, while the second-best results are underlined.

Model	Charliehebdo		Sydneysiege		Ottawashooting		Ferguson		Germanwings-crash
	Acc	F1	Acc	F1	Acc	F1	Acc	F1	Acc	F1
EANN	0.843	0.777	0.771	0.762	0.835	0.835	0.848	0.767	0.813	0.812
EDDFN	0.846	0.761	0.805	0.802	0.864	0.863	0.851	0.772	0.819	0.818
MDFEND	0.845	0.768	0.729	0.729	0.864	0.863	0.842	0.742	0.830	0.828
FinDCL	0.848	0.779	0.805	0.800	0.866	0.862	0.853	0.775	0.832	0.829
DAAL	0.850	0.781	0.850	0.818	0.886	0.865	0.847	0.754	0.875	0.875
DDT	0.878	0.782	0.857	0.820	0.886	0.879	0.871	0.667	0.870	0.833

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
