A Lightweight Multimodal Framework for Misleading News Classification Using Linguistic and Behavioral Biometrics

Haque, Mahmudul; Bari, A. S. M. Hossain; Gavrilova, Marina L.

doi:10.3390/jcp5040104

by Mahmudul Haque *, A. S. M. Hossain Bari and Marina L. Gavrilova

Department of Computer Science, University of Calgary, Calgary, AB T2N 1N4, Canada

* Author to whom correspondence should be addressed.

J. Cybersecur. Priv. 2025, 5(4), 104; https://doi.org/10.3390/jcp5040104

Submission received: 8 October 2025 / Revised: 20 November 2025 / Accepted: 21 November 2025 / Published: 25 November 2025

(This article belongs to the Special Issue Multimedia Security and Privacy)

Abstract

The widespread dissemination of misleading news presents serious challenges to public discourse, democratic institutions, and societal trust. Misleading-news classification (MNC) has been extensively studied through deep neural models that rely mainly on semantic understanding or large-scale pretrained language models. However, these methods often lack interpretability and are computationally expensive, limiting their practical use in real-time or resource-constrained environments. Existing approaches can be broadly categorized into transformer-based text encoders, hybrid CNN–LSTM frameworks, and fuzzy-logic fusion networks. To advance research on MNC, this study presents a lightweight multimodal framework that extends the Fuzzy Deep Hybrid Network (FDHN) paradigm by introducing a linguistic and behavioral biometric perspective to MNC. We reinterpret the FDHN architecture to incorporate linguistic cues such as lexical diversity, subjectivity, and contradiction scores as behavioral signatures of deception. These features are processed and fused with semantic embeddings, resulting in a model that captures both what is written and how it is written. The design of the proposed method balances feature complexity against model generalizability. Experimental results demonstrate that the inclusion of lightweight linguistic and behavioral biometric features significantly enhances model performance, yielding a test accuracy of 71.91 ± 0.23% and a macro F1 score of 71.17 ± 0.26%, outperforming the state-of-the-art method. The findings of the study underscore the utility of stylistic and affective cues in MNC while highlighting the need for model simplicity to maintain robustness and adaptability.

Keywords: biometric; linguistic; multimodal; misinformation detection; trust; social media

1. Introduction

The rapid growth of digital ecosystems has transformed the way information is consumed and disseminated. Social media platforms, online news outlets, and user-generated streams now function as high-throughput branches where information circulates at unprecedented speed and scale [1,2]. While this connectivity offers significant benefits, it also introduces new vulnerabilities. Fake news, false or misleading information intentionally crafted to deceive or manipulate public opinion, acts as a disruption within these networks. Its rapid propagation contributes to public confusion, polarizes societal discourse, and erodes trust in legitimate journalism and democratic institutions [3,4]. Within the context of smart cities and connected environments, such misinformation can be considered a social problem, where accurate and timely classification of deceptive content becomes essential. This paper presents a robust system for online misleading information detection, which in the future can be integrated into a mobile device or a social media platform. In this context, a news feed or an opinion shared could be detected and classified as trustworthy or misleading in real-time.

Traditional misleading news classification (MNC) approaches have relied heavily on professional fact-checking. Although generally reliable, manual verification cannot keep pace with the velocity of modern information flows. Fact-checking organizations frequently face large backlogs, meaning their assessments often reach only a fraction of the audiences exposed to misinformation. Moreover, human verification is inherently unable to match the rapid evolution and viral spread of digital content [5]. To address this gap, automated detection systems have emerged, but their practical deployment in real-time fact-checking devices or live media monitoring systems remains limited. State-of-the-art transformer-based approaches are computationally intensive [6], making them unsuitable for lightweight, embedded detection applications.

A further challenge lies in identifying features that are both informative and interpretable across diverse contexts [7]. This growing challenge is compounded by the rapid spread of misinformation in online social media, where opaque algorithms amplify biases and erode user trust. Addressing these issues requires explainable, fairness-aware intelligent systems that integrate cultural, emotional, and contextual factors to strengthen credibility in digital ecosystems [8]. The dynamic nature of misinformation requires detection systems that remain resilient against paraphrasing, recontextualization, and noisy data streams [9]. Linguistic and behavioral biometric features, such as lexical diversity, readability, and sentiment, can serve as soft sensors that capture discriminative traces of deception. Yet, these features are rarely fused systematically with propagation-level or metadata features in a unified classification framework [10].

To address these challenges, we frame misleading news classification as a multimodal sensing problem and structure this study around the following research questions.

a.: To what extent can misleading news classification be conceptualized as a multimodal sensing problem, where linguistic cues and behavioral biometrics act as “soft sensors” for deception?
b.: How does the integration of biometric and linguistic features with deep neural embeddings improve detection accuracy and robustness compared to existing state-of-the-art methods?
c.: What is the contribution of various modalities, such as credibility history, linguistic and biometric features, and logical consistency, to the overall performance of the proposed method?
d.: How can the interpretability, adaptability, and resilience of a lightweight misleading news classification system be enhanced for noisy or rephrased inputs?

Our work addresses these questions by proposing a multistage MNC system based on an enhanced hybrid deep learning architecture. The system combines textual analysis with biometric-inspired features by utilizing Natural Language Processing (NLP) techniques such as part-of-speech tagging, together with transformer-based models such as RoBERTa and BERT [11], to balance performance and computational efficiency. Existing FDHN-based systems [12] have primarily focused on combining learned semantic embeddings with shallow statistical features to improve classification accuracy. However, such approaches use linguistic data as additional text descriptors rather than as behavioral indicators of the cognitive processes that constitute deception. In contrast, our approach retains the FDHN backbone while expanding its multimodal input space to include behavioral-biometric and linguistic cues, as well as interpretable signals resulting from writing style, emotional tone, and lexical variability. This conceptual shift places misleading-news detection within a behavioral-linguistic framework, improving interpretability while preserving the overall architecture of the model.

To realize this vision, we propose an MNC system based on Fuzzy Deep Hybrid Network (FDHN) [12] as the feature fusion core. For evaluation, we employ the LIAR2 dataset [12], which provides fact-checked political statements annotated with six-level truth labels and enriched metadata. The MNC system integrates convolutional neural networks (TextCNN) [13], bidirectional long short-term memory (BiLSTM) networks, and fuzzy logic, enabling flexible fusion of heterogeneous features under conditions of uncertainty [12,14]. The main contributions of this study are as follows:

a.: We reconceptualize misleading news classification as a multimodal sensing problem, where linguistic and behavioral cues are treated as soft sensors that capture deception-related linguistic and cognitive patterns for AI-enabled classification of misleading news.
b.: We extend the FDHN framework by incorporating biometric-inspired linguistic features such as lexical diversity, subjectivity, sentiment, and contradiction scores, along with pretrained BERT embeddings, enabling interpretable fusion of behavioral and semantic information.
c.: Through a systematic ablation study and cross-validation analysis, we quantify the contribution of different classification modalities, including credibility metadata, stylistic markers, and logical consistency, to overall classification performance, showing that lightweight handcrafted features provide strong complementary features to deep embeddings.
d.: We empirically demonstrate the lightweight nature and near-real-time feasibility of the framework, achieving millisecond-level inference latency with a small model footprint while still maintaining competitive performance when justification-like context is available.
e.: We employ text augmentation strategies to strengthen robustness against rephrased or noisy misinformation.

This paper thus bridges critical gaps in misleading news classification research by reframing the problem as one of multimodal classification. Here, heterogeneous digital features are fused to achieve both accuracy and interpretability. In contrast to prior approaches that lacked adaptability, demanded heavy computational resources, or ignored multimodal integration, our system demonstrates that lightweight handcrafted features can substantially improve detection when combined with deep representations.

The remainder of this paper is organized as follows. Section 2 reviews related works and highlights the limitations of current approaches. Section 3 introduces the proposed methodology, detailing preprocessing, feature extraction, and the MNC system. Section 4 describes the LIAR2 dataset and its relevance to this study. Section 5 presents the experimental setup and evaluation while discussing the results in comparison with prior work. Finally, the paper concludes by summarizing our contributions and outlining directions for future research.

2. Related Works

Misleading or fake news poses a significant challenge to the reliability of information in modern society. While manual fact-checking is generally used for such detection, it introduces delays and cannot keep pace with the speed at which information spreads across social media platforms [15]. Hence, several researchers have worked towards the automatic detection of misleading news, ranging from traditional machine learning approaches to sophisticated deep learning approaches.

2.1. Misleading News Classification Approaches

The huge volume of content generated on social media makes the manual methods for fact-checking insufficient. Early automatic detection of misleading news relied on conventional machine learning models, such as Support Vector Machines, Logistic Regression, Naïve Bayes, and Decision Trees. These models do not follow a purely data-driven approach; instead, they use handcrafted features like TF-IDF, n-grams, readability scores, and sentiment polarity to differentiate between genuine and misleading news [16,17]. Khan et al. [18] conducted a comparative study on various datasets that encompassed a wide range of news topics, where they re-evaluated these methods to establish performance baselines. Although these classifiers are computationally efficient and provide interpretability, their static nature and reliance on handcrafted features restrict the ability to capture the evolving and complex patterns of language.

On the other hand, several deep learning methods have taken center stage in MNC recently. Researchers have employed Convolutional Neural Networks (CNNs) to automatically extract local features and Recurrent Neural Networks (RNNs), particularly Long Short-Term Memory (LSTM) networks, to model sequential dependencies within text [19,20]. Hybrid systems that combine CNNs with LSTMs outperformed singular approaches by leveraging both spatial and temporal cues [12,21,22,23]. A breakthrough came with transformer-based models. Fine-tuning pretrained models like BERT, RoBERTa, and XLNet has led to substantial improvements in capturing contextual nuances [24,25,26]. Capuano et al. [27] reported that such models generally provide better generalization across datasets and topics, although their high computational cost remains a challenge. Additionally, mixed models that combine transformer embeddings with traditional deep networks (e.g., CNN + BERT or LSTM + Attention) have emerged as promising alternatives for achieving high accuracy while potentially improving interpretability.

2.2. Multimodal and Hybrid Approaches

Much of the recent research has begun to explore multimodal and hybrid systems that integrate both content and behavioral biometrics. For example, Chakraborty et al. [28] proposed a hybrid model that fuses text embeddings with user interaction data to improve detection performance. Other studies have experimented with graph-based representations to capture the relationships between news items and their propagation on social networks [29]. These approaches aim to overcome the limitations of single-modality models by providing a richer context that can adapt to different types of misinformation. Furthermore, ensemble methods that combine predictions from multiple models have shown improved generalization and robustness across diverse datasets [30].

2.3. Biometrics in Misinformation Analysis

The analysis of linguistic features remains a core aspect of content-based MNC. Recent studies [16,31] have confirmed that misleading news often employs simpler language and exhibits lower lexical diversity. Metrics including n-gram frequencies, part-of-speech (POS) tags, and named entity recognition are widely used to capture differences in writing style. In addition, readability metrics have been explored by researchers using indices like Flesch-Kincaid Grade Level, Gunning Fog Index, and Automated Readability Index to quantify text complexity, where lower scores have been associated with manipulative content [32]. Beyond linguistic cues, handcrafted emotional and behavioral features have also been explored as complementary signals for misinformation detection. Prior work has demonstrated that the extraction of handcrafted emotional features can significantly enhance biometric recognition accuracy. In particular, Bhatia et al. [33] were the first to propose a lightweight bi-modular architecture with handcrafted features to detect emotional states from non-verbal signals. Although their framework operates on behavioral rather than textual data, it highlights the potential of such behavioral biometrics as informative features for classification tasks relevant to misinformation analysis. Similarly, misleading news tends to use hyperbolic or highly emotional language to influence readers, which makes a similar approach promising [24]. Moreover, advances in embedding techniques, ranging from static methods such as Word2Vec, GloVe, and FastText to contextual embeddings from transformer-based models, enable the capture of nuanced semantic information [34]. These techniques have improved feature representations and, in turn, the performance of misleading news classifiers.

Although content features remain the primary focus in MNC, recent studies have increasingly incorporated behavioral signals to enhance model robustness. Notably, Shopon et al. [35] provided a comprehensive overview of biometric de-identification methodologies in authentication systems and highlighted psychological behavioral patterns as informative cues for classification. Their taxonomy of de-identification strategies, along with emerging modalities such as sensor-based, social behavioral, psychological, and aesthetic-based biometrics, illustrates the expanding role of behavioral information across domains and motivates its integration within multimodal MNC frameworks. Behavioral biometrics include user engagement patterns, propagation dynamics, and source credibility [36]. Metrics such as the number of shares, comments, and likes can help indicate coordinated misinformation campaigns [21,37]. The speed and diffusion patterns of news articles on social media differ between misleading and real news, offering additional detection cues. Analyzing historical credibility and contextual metadata related to the news source has also proven effective in some hybrid models [24]. However, incorporating behavioral data introduces challenges regarding data heterogeneity and integration complexity. As a result, while content-based models dominate current research, behavioral features represent a promising avenue for future multimodal approaches.

2.4. Evaluation Strategies

In parallel, evaluation practices within the field have evolved considerably. Accuracy, precision, recall, and F1-score remain the primary metrics for benchmarking model performance. For instance, Capuano et al. [27] compared multiple algorithms across several datasets based on many of these evaluation metrics, finding that gradient boosting, eXtreme Gradient Boosting, and multilayer perceptrons achieve robust performance across different scenarios. In contrast, transformer-based models such as XLNet and RoBERTa deliver the highest accuracies on applications like sentiment detection, but are noted for their high resource demands compared with lightweight and efficient alternatives [38]. Hence, good accuracy alone is not sufficient, and a balance across several of these evaluation metrics is necessary.

2.5. Challenges and Research Gaps

Despite notable progress in MNC, several unresolved challenges and research gaps persist, as summarized below.

a.: Dataset limitations remain a critical barrier, as most existing resources are English-centric and narrowly focused on political misinformation, restricting generalizability across domains and languages.
b.: Deep learning systems, especially transformer-based models, achieve high accuracy but often operate as black boxes, limiting transparency in decision-making.
c.: Domain adaptability and continuous learning are equally pressing, since misinformation evolves rapidly and models trained on static datasets quickly lose effectiveness.
d.: Computational efficiency poses practical obstacles, with state-of-the-art models requiring significant resources that hinder real-time deployment.
e.: While behavioral biometrics have demonstrated potential, their systematic integration with linguistic cues remains underexplored, leaving opportunities for more holistic detection methods.

Hence, to address these gaps, our work introduces a hybrid multimodal system that integrates lightweight linguistic and behavioral biometric features with deep neural models, aiming to improve robustness, interpretability, and efficiency in MNC.

3. Methodology

The objective of this work is to improve MNC by integrating linguistic and behavioral biometrics. In this study, we build on the state-of-the-art FDHN model [12] and optimize it to capture heterogeneous information by integrating textual and numerical inputs from the LIAR2 dataset (discussed in Section 4) alongside handcrafted biometric features. This multifaceted structure consists of several interrelated systems, each processing distinct aspects of the data to enhance both predictive performance and interpretability. An outline of the MNC system is illustrated in Figure 1.

3.1. Preprocessing

3.1.1. Textual Preprocessing

To prepare the dataset for model training, comprehensive preprocessing has been conducted on all text-based fields, including the news text, contextual metadata, and an auxiliary justification-like field. In this study, the justification text from the LIAR2 dataset is treated as a proxy for automatically retrievable or generated contextual evidence, such as web-retrieved claim summaries or background sentences. We acknowledge, however, that LIAR2 justifications are human-written and are not directly available in real-time settings. The current work assumes the availability of justification-like contextual evidence, while the development of an efficient retrieval or summarization pipeline remains an important direction for future research.

The following procedures have been systematically applied to these fields to enable model training:

a.: HTML and Non-Alphabetic Character Removal: All input text has been sanitized by stripping HTML tags and removing characters that do not belong to the alphabetic character set. This step eliminates extraneous symbols, numeric noise, and potential markup artifacts that may interfere with linguistic analysis.
b.: Lowercasing: All text has been converted to lowercase to ensure uniformity and reduce vocabulary size. This transformation also aids in mitigating issues of sparsity caused by case-sensitive variations of the same word.
c.: Stopword Elimination and Lemmatization: Common stopwords, such as “the”, “is”, and “and”, have been removed to focus the analysis on semantically rich terms. Subsequently, lemmatization has been applied to reduce words to their base or dictionary forms, enabling better generalization during model training.
d.: Tokenization and Padding: All text inputs have been tokenized using the BERT tokenizer from the BERT base model for uncased textual inputs. The tokenizer segments the input into subword units while preserving semantic alignment with the pretrained vocabulary of the BERT model. To ensure uniform input dimensions, sequences have been padded to a consistent length, and longer sequences have been truncated as necessary.

These preprocessing steps collectively ensure that the textual inputs are cleaned, standardized, and structurally aligned with the pretrained BERT tokenizer, thereby providing a robust foundation for effective feature extraction and subsequent model training.
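The cleaning steps listed above can be illustrated with a minimal, standard-library sketch. It is not the authors' implementation: the stopword list is a deliberately tiny, hypothetical stand-in, lemmatization is omitted, and whitespace splitting replaces the BERT subword tokenizer, so the function names (`clean_text`, `pad_or_truncate`) and the `[PAD]` placeholder are assumptions made for illustration only.

```python
import html
import re

# Hypothetical, deliberately tiny stopword list; a real pipeline would use a full one.
STOPWORDS = {"the", "is", "and", "a", "an", "of", "to", "in"}

TAG_RE = re.compile(r"<[^>]+>")            # matches HTML tags such as <p> or </div>
NON_ALPHA_RE = re.compile(r"[^A-Za-z\s]+")  # matches anything that is not a letter or space


def clean_text(raw: str) -> list[str]:
    """Strip HTML, drop non-alphabetic characters, lowercase, remove stopwords."""
    text = html.unescape(TAG_RE.sub(" ", raw))  # remove tags, then decode entities
    text = NON_ALPHA_RE.sub(" ", text)          # drop digits, punctuation, symbols
    tokens = text.lower().split()               # lowercase and split on whitespace
    return [tok for tok in tokens if tok not in STOPWORDS]


def pad_or_truncate(tokens: list[str], max_len: int, pad_token: str = "[PAD]") -> list[str]:
    """Force a token list to exactly max_len items by truncating or right-padding."""
    clipped = tokens[:max_len]
    return clipped + [pad_token] * (max_len - len(clipped))


tokens = clean_text("<p>The economy is GROWING by 5% &amp; more!</p>")
fixed = pad_or_truncate(tokens, max_len=6)
```

In a full pipeline, the lemmatizer and the pretrained BERT tokenizer would take the place of the whitespace split and the manual padding shown here.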

3.1.2. Numeric Preprocessing

All numeric fields, including the numeric metadata such as the speaker’s history of truthfulness in the dataset (discussed in Section 3.2), along with the handcrafted linguistic and biometric features, have been normalized. This process ensures that all features are on comparable scales, preventing those with larger magnitudes from overshadowing others in the learning process. Consequently, this normalization ensures numerical stability and promotes faster convergence.
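The text does not state which normalization scheme is used, so the following sketch assumes column-wise z-score standardization purely for illustration; min-max scaling would be an equally plausible reading. The helper name `standardize_columns` is hypothetical.

```python
from statistics import fmean, pstdev


def standardize_columns(rows):
    """Z-score each column of a list of equal-length numeric rows."""
    columns = list(zip(*rows))
    means = [fmean(col) for col in columns]
    # Guard against constant columns, whose standard deviation is zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return [
        [(value - m) / s for value, m, s in zip(row, means, stds)]
        for row in rows
    ]


# Two features on very different scales end up comparable after scaling.
scaled = standardize_columns([[1.0, 100.0], [2.0, 100.0], [3.0, 100.0]])
```

In practice, the means and standard deviations would normally be estimated on the training subset only and then reused for the validation and test subsets, to avoid leaking information across splits.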

To ensure fair training and evaluation, the preprocessed dataset is stratified and divided into three subsets: 80% is allocated for training the model parameters, 10% is used as a validation set to monitor performance and prevent overfitting during training, and the remaining 10% is reserved as a test set for the final evaluation of the model’s generalization ability. Stratified sampling has been utilized to preserve the proportional distribution of the six class labels across all three subsets, ensuring that each subset is representative of the overall dataset.
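A stratified 80/10/10 split of this kind can be sketched with the standard library alone. The function below is an illustrative assumption rather than the authors' code: it shuffles the indices of each class separately and slices them by the requested fractions, so every subset keeps roughly the same label proportions.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.8, 0.1, 0.1), seed=0):
    """Return (train, val, test) index lists with per-class proportions preserved."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(fractions[0] * len(indices))
        n_val = round(fractions[1] * len(indices))
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to the test subset
    return train, val, test


# Toy example: 600 items spread evenly over six class labels.
labels = [i % 6 for i in range(600)]
train_idx, val_idx, test_idx = stratified_split(labels)
```

Because rounding is applied per class, subset sizes can deviate by a few items from the nominal fractions when class counts are not multiples of ten.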

3.2. Linguistic & Biometric Feature Extraction

In order to enrich the representational capacity of the dataset beyond the original metadata, a set of handcrafted features was introduced. In addition to the contextual embeddings obtained from the pretrained BERT model, we extracted linguistic and behavioral features designed to capture stylistic, affective, and cognitive patterns associated with deceptive writing. These include lexical diversity, sentiment polarity and scores, subjectivity, syntactic richness, and contradiction likelihood. All features were extracted using the LIAR2 dataset as the experimental corpus. However, the extraction process relies on domain-independent resources, pretrained models, and established lexical analysis functions. Because these tools are general-purpose and not fine-tuned to LIAR2, the feature extraction pipeline can be directly applied to other textual datasets, supporting reproducibility and cross-domain generalization. The feature extraction process is discussed in the following subsections.

3.2.1. Lexical Analysis

Lexical analysis has been conducted to quantify the stylistic attributes of the news text. Three specific indicators were extracted:

a.: Type-Token Ratio (TTR): This metric is used to estimate lexical diversity, computed as the ratio of unique tokens (types) to the total number of tokens in a statement. Higher TTR values suggest a broader vocabulary range and greater linguistic richness.
b.: Exclamation Mark Frequency (EMF): The number of exclamation marks is counted in each statement to serve as a proxy for emotional intensity or rhetorical emphasis, which can correlate with subjective or manipulative discourse.
c.: Adjective Density (AD): This feature captures the relative frequency of adjectives in a given statement, as identified through part-of-speech tagging. Adjective usage is often associated with evaluative language and can reflect attempts to exaggerate or qualify assertions.
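The three lexical indicators above reduce to simple counts and ratios. The sketch below is a hedged illustration, not the authors' code: the tokenizing regular expression is an assumption, and adjective density takes precomputed `(token, tag)` pairs with the Universal POS tag `ADJ`, since the choice of tagger is left open here.

```python
import re

TOKEN_RE = re.compile(r"[A-Za-z']+")  # assumed word pattern: letters and apostrophes


def type_token_ratio(text):
    """Unique tokens divided by total tokens (case-insensitive)."""
    tokens = [t.lower() for t in TOKEN_RE.findall(text)]
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def exclamation_frequency(text):
    """Raw count of exclamation marks in the statement."""
    return text.count("!")


def adjective_density(pos_tags):
    """Share of tokens tagged 'ADJ' in a list of (token, tag) pairs."""
    if not pos_tags:
        return 0.0
    return sum(1 for _, tag in pos_tags if tag == "ADJ") / len(pos_tags)


statement = "The tax cut is the best tax cut!"
ttr = type_token_ratio(statement)        # 5 unique tokens out of 8
emf = exclamation_frequency(statement)   # one exclamation mark
ad = adjective_density([("best", "ADJ"), ("tax", "NOUN"), ("cut", "NOUN"), ("huge", "ADJ")])
```

Note that the exclamation count has to be taken on the raw statement, because the non-alphabetic character removal described in Section 3.1.1 would otherwise discard the marks.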

3.2.2. Sentiment and Subjectivity Analysis

To capture the sentimental tone and opinionated nature of the news text, sentiment labels, sentiment scores, and subjectivity scores are computed and incorporated as additional numerical features on top of those already provided in the LIAR2 dataset.

a.: Sentiment Label and Score: A RoBERTa-based sentiment classifier is utilized to assign polarity (positive or negative) and a corresponding confidence score. RoBERTa is chosen due to its superior performance in sentence-level classification tasks compared to earlier transformer models such as BERT, owing to its dynamic masking and optimized pretraining procedure [38,39]. The resulting sentiment scores enable the model to capture affective orientation and emotional intensity in news statements, which are strongly correlated with misinformation framing [40].
b.: Subjectivity Score: To assess the extent of opinionated versus factual content, we incorporate scores derived from BiBERT-Subjectivity, a transformer-based model specifically trained to detect subjective linguistic patterns. Transformer-based subjectivity models outperform traditional lexicon-based approaches by accounting for context and subtle linguistic markers [41]. This measure allows the system to quantify the degree of speculation or personal belief embedded within statements, features that have been linked to deceptive discourse [42].

3.2.3. Contradiction Detection

To evaluate the internal logical consistency of statements, a contradiction detection mechanism is proposed. Statements are first decomposed into clauses using SpaCy’s dependency parser, which is widely adopted for syntactic segmentation due to its balance of efficiency and accuracy in dependency-based parsing [43]. Next, clause pairs are evaluated with the RoBERTa-MNLI model, a transformer fine-tuned on natural language inference tasks, which excels at recognizing semantic entailment, contradiction, and neutrality [44]. This tool is particularly well-suited for contradiction detection because it captures nuanced contextual relationships beyond surface-level overlap. Finally, a contradiction score is computed as the proportion of contradictory clause pairs to the total number of clause pairs, providing a quantitative measure of self-inconsistency. According to [45], contradiction-based metrics exhibit correlations with deception and misinformation.
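The final scoring step can be sketched independently of the parser and the NLI model. In the illustration below, clause splitting is assumed to have already happened, clause pairs are taken as unordered (the text does not say whether order matters), and `toy_nli` is a hypothetical stand-in for RoBERTa-MNLI that only flags pairs differing by the word "not".

```python
from itertools import combinations


def contradiction_score(clauses, nli_label):
    """Share of unordered clause pairs that `nli_label` marks as 'contradiction'."""
    pairs = list(combinations(clauses, 2))
    if not pairs:
        return 0.0  # a single clause cannot contradict itself
    contradictory = sum(1 for a, b in pairs if nli_label(a, b) == "contradiction")
    return contradictory / len(pairs)


def _without_not(sentence):
    """Lowercased words of a sentence with every 'not' removed."""
    return [word for word in sentence.lower().split() if word != "not"]


def toy_nli(premise, hypothesis):
    """Hypothetical stand-in for an NLI model: flags pairs that differ only by 'not'."""
    premise_negated = "not" in premise.lower().split()
    hypothesis_negated = "not" in hypothesis.lower().split()
    same_content = _without_not(premise) == _without_not(hypothesis)
    if premise_negated != hypothesis_negated and same_content:
        return "contradiction"
    return "neutral"


clauses = ["taxes did rise", "taxes did not rise", "jobs grew"]
score = contradiction_score(clauses, toy_nli)  # one contradictory pair out of three
```

Swapping `toy_nli` for a wrapper around a real NLI model leaves `contradiction_score` unchanged, which keeps the ratio computation easy to test in isolation.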

3.3. MNC Architecture

The proposed MNC architecture is designed to effectively integrate heterogeneous information sources. It combines semantic representations of textual content, contextual metadata, handcrafted linguistic and biometric features, as well as structured numerical indicators. The architecture inherits the core FDHN architecture, consisting of separate branches for text embeddings and auxiliary features. Our main innovation lies in the construction and behavioral interpretation of the auxiliary branch. Instead of relying solely on statistical metadata, we introduce handcrafted features grounded in psycho-linguistic and affective theory, such as lexical diversity, subjectivity and sentiment scores, and contradiction likelihood. These attributes serve as proxies for cognitive and emotional patterns observed in deceptive writing, and thus act as behavioral-biometric signals within the FDHN framework. The system, illustrated in Figure 2, consists of four primary processing branches followed by a multimodal fusion and classification stage.

3.3.1. Text Processing

The system begins with multiple TextCNN branches. The first branch processes the news text. The input sequence is transformed into dense contextualized embeddings using BERT, which captures semantic dependencies within the text. Since language inherently exhibits a multi-scale structure, parallel convolutional layers with kernel sizes of 3, 4, and 5 have been used to extract n-gram features at different granularities, capturing hierarchical information. This process is followed by non-linear activation (ReLU) and max-pooling operations, which distill the most salient features over the sequence so that only the strongest responses are passed to the subsequent layer. The pooled representations are concatenated and passed through a linear projection layer, yielding a compact semantic representation of the news text as Output 1 in Figure 2. The result is a fixed-size representation that encapsulates the essential semantic and syntactic attributes of the input statement.
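To make the convolution, ReLU, and max-pooling sequence concrete, the following is a dependency-free sketch of the pooling stage. It is an assumption-laden toy, not the trained model: the embeddings and filter weights are random placeholders rather than BERT outputs or learned parameters, the final linear projection is omitted, and the function names are hypothetical.

```python
import random


def conv_relu_maxpool(embeddings, kernel, bias=0.0):
    """Slide one filter over the sequence, apply ReLU, keep the maximum response."""
    k = len(kernel)  # kernel size = number of consecutive tokens covered
    responses = []
    for start in range(len(embeddings) - k + 1):
        window = embeddings[start:start + k]
        total = bias + sum(
            w * x
            for w_row, x_row in zip(kernel, window)
            for w, x in zip(w_row, x_row)
        )
        responses.append(max(0.0, total))  # ReLU
    # Max-over-time pooling; sequences shorter than the kernel yield 0.0.
    return max(responses) if responses else 0.0


def textcnn_features(embeddings, filters_by_size):
    """Concatenate pooled responses from every filter of every kernel size."""
    features = []
    for size in sorted(filters_by_size):
        for kernel in filters_by_size[size]:
            features.append(conv_relu_maxpool(embeddings, kernel))
    return features


rng = random.Random(0)
dim, seq_len = 4, 8  # toy sizes; BERT base embeddings are 768-dimensional
embeddings = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(seq_len)]
filters_by_size = {
    size: [[[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(size)] for _ in range(2)]
    for size in (3, 4, 5)
}
features = textcnn_features(embeddings, filters_by_size)  # 3 sizes x 2 filters
```

Each kernel size responds to n-grams of a different length, and pooling over positions makes the feature vector length depend only on the number of filters, not on the statement length.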

The second branch is dedicated to processing additional contextual metadata such as date, subject, speaker, and related descriptive information. Although these inputs are categorical in nature, transforming them via a TextCNN enables the model to leverage textual patterns and contextual cues embedded in the metadata. This branch contributes auxiliary information that may provide insights into temporal and contextual factors influencing misinformation. A structure similar to the news text branch is used, which includes embedding, parallel convolutions, max-pooling, concatenation, and linear projection with dropout, as these contextual text features are essentially text-based, with some being very similar to the news text data. This ensures consistency in how textual information is represented, while enabling the capture of context-specific cues. The resulting Output 2, in Figure 2, is complementary to the news text representation.

3.3.2. Numeric Metadata

The MNC system processes the numeric metadata through a CNN-BiLSTM system. The CNN component performs local feature extraction on the numeric metadata, which includes both the original textual context and the handcrafted biometric features. The CNN filters capture localized correlations and dependencies within these feature vectors, serving to highlight patterns that are obscure in the raw numeric data. Following the CNN, a BiLSTM network is used to model the sequential dependencies and temporal dynamics among the numeric features. The bidirectional nature of the LSTM allows the model to consider both past and future contextual information, which is particularly valuable when the numeric features represent time-dependent behavioral features or evolving patterns. The outputs from the BiLSTM provide a comprehensive embedding that encapsulates the complex interactions across different numeric dimensions.

The numerical data processing block has two separate branches. The first branch is designed to handle the handcrafted linguistic and biometric features. These features are first projected into a dense latent space through a linear embedding layer. A 1D convolutional layer with kernel size 1 is used to capture local feature interactions, followed by a BiLSTM network to model bidirectional contextual dependencies. BiLSTM enables the modeling of sequential dependencies among biometric indicators, thereby capturing dynamic stylistic and affective features. The resulting representation forms Output 3 (see Figure 2). The second numeric branch processes numerical metadata in the LIAR2 dataset. The processing pipeline mirrors that of the biometric context branch. This branch generates an enriched representation of structured credibility metadata, which complements the biometric features.

3.3.3. Fusion and Classification

The outputs from all four primary branches (news text, textual context, biometric context, and numerical context) are concatenated to form the fuzzy membership output. Unlike binary outputs, which classify an input as either fully belonging or not belonging to a category, the fuzzy membership output represents the degree to which a feature belongs to multiple categories simultaneously. This fuzzy layer encodes the inherent vagueness and imprecision of features such as subjectivity, sentiment, and credibility, which are rarely strictly binary in nature. By expressing these features as degrees of membership, the fuzzy layer enhances interpretability while preserving nuanced information that might be lost in hard thresholding. All branch outputs, together with the fuzzy membership representation, are concatenated into a unified multimodal vector. A fusion layer then projects this joint representation into the six-class output space: True, Mostly True, Half True, Barely True, False, and Pants on Fire. Finally, the model produces a probability distribution over these categories, with the highest-probability label selected as the final prediction.
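The final concatenate, project, and select steps can be sketched as follows. This is a minimal illustration under stated assumptions: the fuzzy membership layer is left out because its exact form is not given here, the projection weights are random rather than learned, and a softmax is assumed as the way of turning the six logits into a probability distribution. The helper names and the toy branch sizes are hypothetical.

```python
import math
import random

LABELS = ["True", "Mostly True", "Half True", "Barely True", "False", "Pants on Fire"]


def softmax(logits):
    """Numerically stable softmax (subtract the maximum before exponentiating)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_and_classify(branch_outputs, weights, biases):
    """Concatenate branch vectors, apply one linear layer, return label and probabilities."""
    fused = [v for branch in branch_outputs for v in branch]  # concatenation
    logits = [
        b + sum(w * x for w, x in zip(row, fused))
        for row, b in zip(weights, biases)
    ]
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)  # highest-probability class
    return LABELS[best], probs


rng = random.Random(1)
# Four toy branch outputs standing in for Outputs 1 to 4 (sizes are illustrative).
branches = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(4)]
fused_dim = sum(len(b) for b in branches)
weights = [[rng.uniform(-1, 1) for _ in range(fused_dim)] for _ in LABELS]
biases = [0.0] * len(LABELS)
label, probs = fuse_and_classify(branches, weights, biases)
print(label, [round(p, 3) for p in probs])
```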

3.4. Text Augmentation Strategy

To enhance the robustness of the proposed MNC system, we implemented a text augmentation pipeline. The goal of this strategy is to increase the diversity of training examples and reduce overfitting to low-level features, thereby enabling the model to better generalize to unseen or rephrased misinformation instances.

The augmentation method applies transformations to the input text with a fixed probability of 50%. We consider two augmentation techniques: synonym replacement and random deletion. These techniques aim to enhance the model’s robustness and generalization by exposing it to diverse textual variations. During training, when augmentation is applied, one or both of the augmentation methods are selected at random to prevent bias toward a specific technique.

a. Synonym Replacement (SR): This method utilizes the WordNet [46] lexical database to identify semantically related terms. For each input sentence, a subset of candidate words with available synonyms is identified. Up to one token is then replaced with a synonym chosen at random. For example, “The politician denied the allegations” may be transformed into “The politician repudiated the allegations”. This procedure introduces lexical variation while preserving semantic fidelity. In the context of the LIAR2 dataset, where claims often exhibit diverse phrasing for similar meanings, SR encourages the model to focus on the underlying semantic content rather than memorizing exact word forms, improving generalization across differently worded statements.
b. Random Deletion (RD): In this technique, each token in the input sequence has an independent probability of being removed. In our implementation, this probability is set to 0.1, and at least one word is always retained to avoid degenerate cases. For instance, “The report was completely false and misleading” may become “The report was false misleading”. Since misleading news claims in LIAR2 are often short, incomplete, or noisy, RD simulates such real-world variations, training the model to tolerate missing or partial information while still accurately classifying the claim’s truthfulness.
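The two techniques can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' pipeline: the WordNet lookup is replaced by a tiny hand-written dictionary (`TOY_SYNONYMS`, hypothetical), tokenization is a plain whitespace split, and choosing uniformly among "SR only", "RD only", and "both" is an assumption about how "one or both" is sampled.

```python
import random

# Hypothetical stand-in for a WordNet lookup; the entries are illustrative only.
TOY_SYNONYMS = {
    "denied": ["repudiated", "rejected"],
    "false": ["untrue", "incorrect"],
}


def synonym_replacement(tokens, rng, synonyms=TOY_SYNONYMS):
    """Replace at most one token that has a known synonym."""
    candidates = [i for i, t in enumerate(tokens) if t.lower() in synonyms]
    if not candidates:
        return list(tokens)
    i = rng.choice(candidates)
    out = list(tokens)
    out[i] = rng.choice(synonyms[tokens[i].lower()])
    return out


def random_deletion(tokens, rng, p=0.1):
    """Drop each token independently with probability p, keeping at least one."""
    kept = [t for t in tokens if rng.random() >= p]
    if not kept and tokens:
        kept = [rng.choice(tokens)]  # guard against deleting everything
    return kept


def augment(text, rng, p_apply=0.5):
    """With probability p_apply, apply SR, RD, or both to the text."""
    if rng.random() >= p_apply:
        return text  # leave the sample unchanged
    tokens = text.split()
    choice = rng.choice(["sr", "rd", "both"])  # assumed uniform choice
    if choice in ("sr", "both"):
        tokens = synonym_replacement(tokens, rng)
    if choice in ("rd", "both"):
        tokens = random_deletion(tokens, rng)
    return " ".join(tokens)


rng = random.Random(42)
for _ in range(3):
    print(augment("The politician denied the allegations", rng))
```

Passing a seeded `random.Random` instance keeps the augmentation reproducible across runs, which is convenient when comparing experiments.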

By combining these augmentation techniques, the proposed method introduces lexical and structural variability that reflects the characteristics of the LIAR2 dataset. In LIAR2, claims are often short, context-dependent, and phrased differently across sources, with many being paraphrased, incomplete, or noisy. Accordingly, the augmentation strategy enables the model to learn the underlying truthfulness of a claim, rather than overfitting to exact wording, thereby improving its robustness in detecting misinformation across diverse linguistic expressions and truncated statements.

3.5. Training and Optimization

The proposed hybrid model was trained using supervised learning on the LIAR2 dataset. To ensure the model is robust, we have implemented a 5-fold cross-validation on the dataset, including the default train, validation, and test split. To further optimize its performance, a standard deep learning training approach was implemented, incorporating mechanisms specifically designed to ensure generalization and prevent overfitting. Key hyperparameters such as the learning rate, batch size, number of epochs, choice of optimizer, dropout rate, activation functions, and loss function were carefully selected for this purpose.
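As a rough illustration of the 5-fold procedure, the sketch below partitions sample indices into five folds and yields one train/validation split per fold. How the folds are combined with the default train, validation, and test split is not detailed here, so this shows only the generic technique; the function name `k_fold_indices` and the seed are assumptions.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, validation_indices) for each of k shuffled folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal, disjoint folds
    for held_out in range(k):
        val = folds[held_out]
        train = [i for j, fold in enumerate(folds) if j != held_out for i in fold]
        yield train, val


for fold_id, (train, val) in enumerate(k_fold_indices(23, k=5)):
    print(fold_id, len(train), len(val))
```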

The proposed model is optimized using a combination of loss function, adaptive optimization, and regularization techniques to ensure stable convergence and generalization. The optimization is further guided by the Binary Cross-Entropy with Logits Loss (BCEWithLogitsLoss) [47], which integrates a sigmoid activation with binary cross-entropy in a numerically stable formulation. For a dataset with N samples and C labels (in multi-label classification), the loss is defined in Equation (1). This loss penalizes incorrect and overconfident predictions and is well-suited for binary and multi-label classification tasks. It is particularly effective for multi-class classification tasks where labels are encoded in a one-hot format, enabling the model to handle nuanced distinctions among truthfulness categories. Gradient-based optimization is performed using the Adam optimizer, selected for its computational efficiency and adaptive learning rate mechanism. A fixed learning rate of $10^{-3}$ is maintained across training epochs, providing consistent convergence without the need for a scheduling policy.

$$L_{BCE} = -\frac{1}{N} \sum_{n=1}^{N} \sum_{c=1}^{C} \left[ y_{n,c} \cdot \log\left(\sigma(\hat{y}_{n,c})\right) + (1 - y_{n,c}) \cdot \log\left(1 - \sigma(\hat{y}_{n,c})\right) \right] \quad (1)$$

where:

$y_{n,c} \in \{0, 1\}$ is the ground truth label for sample n and class c,
$\hat{y}_{n,c} \in \mathbb{R}$ is the raw model output (logits),
$\sigma(\hat{y}_{n,c}) = \frac{1}{1 + e^{-\hat{y}_{n,c}}}$ is the sigmoid activation function.
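Equation (1) can be evaluated directly from logits without computing the sigmoid explicitly. The sketch below uses the standard identity that the per-element loss equals max(x, 0) - x·y + log(1 + e^(-|x|)), which avoids overflow for large |x|. It is a plain-Python illustration of the formula, not the library routine used in training, and the function name `bce_with_logits` is chosen here for convenience.

```python
import math


def bce_with_logits(logits, targets):
    """Equation (1): sum over classes, mean over samples, computed stably from logits."""
    total = 0.0
    for row_logits, row_targets in zip(logits, targets):
        for x, y in zip(row_logits, row_targets):
            # Equivalent to -[y*log(sigmoid(x)) + (1-y)*log(1-sigmoid(x))].
            total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
    return total / len(logits)


# One sample, six classes, all logits zero: each class contributes log(2).
one_hot = [[1, 0, 0, 0, 0, 0]]
print(bce_with_logits([[0.0] * 6], one_hot))
```

Note that Equation (1) divides by N only, whereas PyTorch's `BCEWithLogitsLoss` with its default `reduction="mean"` averages over all N·C elements; the two differ by the constant factor C, which rescales the gradients but does not move the minimizer.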

To mitigate overfitting, dropout layers are incorporated into multiple components of the system, most prominently within the TextCNN modules and the final fusion layer. These dropout operations enhance robustness and promote better generalization to unseen data by randomly deactivating neurons during training. All training experiments are executed on an NVIDIA RTX 3090 GPU using PyTorch 2.6.0 with the CUDA backend, which significantly accelerates tensor computations and thereby reduces training time. Furthermore, a checkpointing mechanism is employed, wherein the model state is stored whenever a new lowest validation loss is observed. This procedure ensures that the best-performing configuration, as evaluated on held-out validation data, is retained for final testing and reporting.
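The checkpointing rule amounts to keeping a copy of the model state whenever the validation loss reaches a new minimum. A minimal, framework-free sketch is shown below; the class name `BestCheckpoint`, the dictionary standing in for a model state, and the validation losses are all made up for illustration.

```python
import copy


class BestCheckpoint:
    """Remember the state associated with the lowest validation loss seen so far."""

    def __init__(self):
        self.best_loss = float("inf")
        self.best_state = None
        self.best_epoch = None

    def update(self, epoch, val_loss, state):
        """Store a deep copy of state if val_loss improves; return True if stored."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.best_state = copy.deepcopy(state)
            self.best_epoch = epoch
            return True
        return False


tracker = BestCheckpoint()
val_losses = [0.92, 0.71, 0.78, 0.66, 0.70]  # made-up values for illustration
for epoch, loss in enumerate(val_losses):
    state = {"epoch": epoch}  # stand-in for a real model state dictionary
    tracker.update(epoch, loss, state)
print(tracker.best_epoch, tracker.best_loss)  # 3 0.66
```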

4. Dataset

The proposed study evaluates MNC using the LIAR2 dataset, an enhanced version of the original LIAR dataset [48]. LIAR comprises 12,800 short political statements from PolitiFact [49], each labeled with one of six truthfulness categories: True, Mostly True, Half True, Barely True, False, and Pants-on-Fire. It includes basic contextual features such as the statement’s subject, speaker identity, job title, state information, political affiliation, and the speaker’s historical record of false statements. The dataset is split into training (80%), validation (10%), and test (10%) subsets and presents a multi-class classification challenge that closely resembles real-world misinformation scenarios.

Building on LIAR, the LIAR2 dataset [12] expands the collection to 22,962 fact-checked statements and introduces several key enhancements. Contextual features have been refined to include Speaker2Credit, which tracks the historical number of true statements by each speaker, as well as full speaker biographies rather than just job titles and political affiliations. Additional improvements include fact-checker justifications for each statement, timestamp information, revised geographic context reflecting the location mentioned in the statement, and correction of prior data errors. Table 1 summarizes the differences between LIAR and LIAR2, highlighting why LIAR2 was chosen for this study: the dataset’s expanded size, richer contextual features, and more comprehensive speaker credibility information provide a robust foundation for extracting informative features for MNC.

The LIAR2 dataset provides enriched annotations and contextual metadata, both textual and numerical, as shown in Table 2.

However, it still exhibits limitations when used directly for MNC. In particular, the raw news texts alone are insufficient to capture the subtle linguistic and behavioral cues that often distinguish deceptive from truthful content. To address this gap, we extract handcrafted linguistic features and behavioral biometric features to complement the dataset and enhance model performance. Linguistic features quantify properties such as sentiment polarity, subjectivity, stylistic markers, and intra-statement contradictions, thereby capturing the nuanced rhetorical strategies and affective framing commonly exploited in misinformation. Behavioral biometric features, on the other hand, are derived from contextual metadata such as speaker credibility history, frequency of past truthfulness, and patterns of information dissemination. These features encode speaker-level tendencies and propagation behaviors that are critical for assessing reliability beyond the surface content of the text.
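As one concrete example of such a handcrafted feature, lexical diversity is commonly measured by the type-token ratio (unique words divided by total words). The sketch below is only an illustration: the regular-expression tokenizer and the function name `type_token_ratio` are assumptions, and the exact feature definitions used in this study may differ.

```python
import re


def type_token_ratio(text):
    """Lexical diversity: number of distinct tokens divided by number of tokens."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    if not tokens:
        return 0.0  # define the ratio as 0 for empty input
    return len(set(tokens)) / len(tokens)


print(type_token_ratio("the cat and the dog"))  # 4 distinct / 5 total = 0.8
```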

The integration of handcrafted features is necessary because they provide interpretable, domain-relevant features that deep models trained solely on textual embeddings may overlook. By enriching LIAR2 with these linguistic and biometric indicators, our system not only strengthens its ability to distinguish varying degrees of truthfulness but also improves generalization across diverse claims. This combined representation ensures a more robust evaluation of MNC systems, extending analysis beyond basic content modeling to incorporate both textual structure and speaker-level behavioral dynamics.

5. Experimental Evaluation

To evaluate the effectiveness of the proposed model and the impact of the handcrafted biometric features, a series of experiments has been conducted using different configurations of the numerical metadata. All experiments retain the same textual input features (news text and textual metadata) to isolate the effect of varying numeric inputs.

5.1. Experimental Setup

The main purpose of the experiments conducted in this study is to evaluate the effectiveness of our proposed method in utilizing the news text in conjunction with diverse contextual features. To ensure a fair and reproducible evaluation, we compared our proposed model primarily against the FDHN baseline (Xu and Kechadi [12]), which shares the same architectural backbone and training pipeline. This choice allows us to isolate the effect of introducing behavioral-biometric linguistic features without the confounding influence of differing tokenization strategies, pretrained embeddings, or network capacities. To this end, we conduct the following experiments using the LIAR2 dataset.

a. Experiment 1 (Feature Ablation Study): Each numerical feature, including our handcrafted features, is combined independently with the news text and textual context during training to determine its impact on model performance.

b. Experiment 2 (Original Metadata Configuration): This experiment includes only the original numeric and textual metadata fields from the LIAR2 dataset.

c. Experiment 3 (Handcrafted Feature-Enhanced Baseline): This experiment is designed around the significance of the handcrafted linguistic and biometric features. A threshold is determined, and any feature above that threshold is included along with the numerical and textual context from the dataset. The model is first trained for 10 epochs with a batch size of 32 using the BCEWithLogitsLoss function at a learning rate of $10^{-3}$. It is then retrained from its minimum-loss state using Cross-Entropy Loss at a learning rate of $10^{-4}$, and the process is repeated with the BCEWithLogitsLoss function at a learning rate of $10^{-5}$. This step was particularly useful for escaping shallow local minima and improving convergence stability. For a classification task with N samples and C mutually exclusive classes, the categorical cross-entropy loss is defined in Equation (2).

$$L_{CE} = -\frac{1}{N} \sum_{n=1}^{N} \sum_{c=1}^{C} y_{n,c} \log\left(\hat{y}_{n,c}\right) \quad (2)$$

where:

$y_{n,c} \in \{0, 1\}$ denotes the one-hot encoded ground truth label for class c,
$\hat{y}_{n,c} \in (0, 1)$ is the predicted probability for class c, typically obtained via a softmax activation over the logits.

While BCEWithLogitsLoss is appropriate for multi-label settings, switching to Cross-Entropy Loss during fine-tuning provides smoother gradients and improves convergence, potentially allowing the model to overcome suboptimal regions in the loss landscape. This is the baseline model.

d. Experiment 4 (Augmentation-Enhanced Baseline): Starting from the weights of the baseline trained with the handcrafted features in Experiment 3, the model is further trained on augmented data so that it performs well across a wider variety of inputs.
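The cross-entropy loss of Equation (2) and the staged schedule of Experiment 3 can be sketched as follows. With one-hot targets and softmax probabilities, the inner sum reduces to the negative log-probability of the true class, computed here with the log-sum-exp trick for numerical stability. The function name and the `STAGES` structure are illustrative assumptions; only the loss names and learning rates come from the text, and the number of epochs for the second and third stages is not specified.

```python
import math


def cross_entropy_from_logits(logits, class_indices):
    """Equation (2) with softmax probabilities: mean of -log p(true class)."""
    total = 0.0
    for row, target in zip(logits, class_indices):
        m = max(row)
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in row))
        total += log_sum_exp - row[target]  # equals -log softmax(row)[target]
    return total / len(logits)


# Staged schedule described for Experiment 3 (loss function and learning rate only).
STAGES = [
    {"loss": "BCEWithLogitsLoss", "lr": 1e-3},
    {"loss": "CrossEntropyLoss", "lr": 1e-4},
    {"loss": "BCEWithLogitsLoss", "lr": 1e-5},
]

# Uniform logits over six classes give a loss of log(6).
print(cross_entropy_from_logits([[0.0] * 6], [2]))
for stage in STAGES:
    print(stage["loss"], stage["lr"])
```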

5.2. Evaluation Metrics

The performance of the model was assessed using three widely adopted evaluation metrics: Accuracy, Macro F1-Score, and Micro F1-Score. Accuracy measures the proportion of correctly predicted labels out of the total number of samples, providing an overall indication of how well the model distinguishes among the six truthfulness classes. However, since accuracy can be influenced by class imbalance, we also report F1-scores. The Macro F1-Score computes the F1-score independently for each class and then averages them, treating all classes equally regardless of their frequency. This ensures that minority classes contribute equally to the evaluation. In contrast, the Micro F1-Score aggregates the counts of true positives, false positives, and false negatives across all classes before computing the F1-score. This approach gives more weight to classes with a higher number of samples and thus reflects the global performance of the model across the dataset. Moreover, the inference time of each stage of the proposed method and its memory utilization are presented in Section 5.3 to evaluate the suitability of the method for real-time applications.
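The three metrics can be computed from per-class counts of true positives, false positives, and false negatives, as in the following sketch. The helper names and the toy labels are illustrative and are not taken from the experiments.

```python
def confusion_counts(y_true, y_pred, labels):
    """Per-class true positive, false positive, and false negative counts."""
    counts = {c: {"tp": 0, "fp": 0, "fn": 0} for c in labels}
    for t, p in zip(y_true, y_pred):
        if t == p:
            counts[t]["tp"] += 1
        else:
            counts[p]["fp"] += 1  # predicted class gains a false positive
            counts[t]["fn"] += 1  # true class gains a false negative
    return counts


def f1(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN), defined as 0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def evaluate(y_true, y_pred, labels):
    """Return (accuracy, macro F1, micro F1) for single-label predictions."""
    counts = confusion_counts(y_true, y_pred, labels)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    macro = sum(f1(**counts[c]) for c in labels) / len(labels)
    micro = f1(
        sum(v["tp"] for v in counts.values()),
        sum(v["fp"] for v in counts.values()),
        sum(v["fn"] for v in counts.values()),
    )
    return accuracy, macro, micro


y_true = [0, 0, 0, 1, 1, 2]  # toy labels for illustration
y_pred = [0, 0, 1, 1, 2, 2]
accuracy, macro, micro = evaluate(y_true, y_pred, labels=[0, 1, 2])
print(round(accuracy, 4), round(macro, 4), round(micro, 4))
```

In a single-label multi-class setting such as this one, every misclassification counts once as a false positive and once as a false negative, so the Micro F1-Score coincides with accuracy; the Macro F1-Score is the metric that exposes weak performance on minority classes.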

5.3. Experimental Results

To evaluate the effectiveness of the proposed MNC system, we have conducted a set of experiments outlined in Section 5.1. Each experiment is designed to progressively isolate and then integrate different categories of handcrafted biometric-inspired and linguistic-contextual features. This systematic design allows us to assess both the standalone effect of individual features and their combined contribution when integrated into the multimodal model. Performance has been measured using Accuracy, Macro F1-Score, and Micro F1-Score, as discussed in Section 5.2.

5.3.1. Experiment 1: Baseline with Truth-Count Metadata

The first experiment tests the discriminative power of each numerical feature independently by combining it with the news text during model training. Table 3 presents the performance obtained from both the credibility history counts (e.g., prior truthfulness labels of speakers) and the handcrafted linguistic and biometric features (e.g., type-token ratio, subjectivity, contradiction).

The results indicate that, among the handcrafted linguistic and biometric features considered in our work, the sentiment labels and scores yielded only marginal gains, highlighting potential limitations in modeling logical consistency when statements are short or structurally simple. Based on these findings, sentiment features have been excluded from subsequent experiments. The remaining features, on the other hand, consistently improved discriminability when paired with the news text, suggesting that affective and stylistic markers provide complementary information for MNC.

5.3.2. Experiments (2–4): Progressive Feature Integration

To better understand the cumulative impact of feature integration, Experiments 2–4 progressively incorporated larger sets of numeric metadata, including both original credibility scores and handcrafted biometric-inspired features. Results are summarized in Table 4, alongside a comparison with the state-of-the-art baseline reported by Xu and Kechadi [12].

In Experiment 2, our proposed system utilized only the six credibility scores. This configuration achieved 71.27 ± 0.07% accuracy and 69.67 ± 0.09% macro F1, only a marginal improvement over Xu and Kechadi [12]. These results confirm that credibility history alone is informative but insufficient for robust generalization.

Experiment 3 integrated the handcrafted linguistic and biometric features. This integration raised accuracy to 71.41 ± 0.12% (from 71.27 ± 0.07% in Experiment 2), with macro F1 reported as 69.67 ± 0.09%. In particular, subjectivity and adjective density appear to capture narrative style and evaluative framing, which are often exploited in deceptive communication. These results support our hypothesis that stylistic and evaluative cues provide complementary information to credibility metadata, enabling finer-grained discrimination between deceptive and factual content.

Finally, Experiment 4 employed dataset augmentation alongside further optimization of the MNC system, leading to the best performance with 71.91 ± 0.23% accuracy and 71.17 ± 0.26% macro F1. This outcome highlights the value of combining handcrafted features with augmentation strategies, which together enhance model robustness and generalization across diverse samples.

To validate the suitability of the model for real-time scenarios, we conducted empirical profiling of inference latency, overall processing time, model size, and memory usage. Tests were performed on a workstation equipped with an NVIDIA RTX 3090 GPU. Results indicate an average inference latency of 2.43 ms per input sample by the model, with the complete pipeline, including biometric feature extraction, requiring 7.98 ms. The total model footprint is 38.91 MB with 10.2 million parameters. Table 5 summarizes the real-time inference breakdown of the proposed framework, showing the latency contribution of each processing component. These results demonstrate that the computational components implemented in this study can operate in real-time, though full real-time deployment would additionally require an efficient context-retrieval mechanism.
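The reported footprint and latency figures can be related through simple arithmetic, and per-sample latency can be measured with a small timing loop. The sketch below assumes 32-bit (4-byte) parameters, which is not stated explicitly here, and uses the rounded figure of 10.2 million parameters; the function names are hypothetical.

```python
import time


def footprint_mib(n_params, bytes_per_param=4):
    """Parameter memory in MiB, assuming a fixed number of bytes per parameter."""
    return n_params * bytes_per_param / 2**20


def mean_latency_ms(fn, sample, repeats=100, warmup=10):
    """Average wall-clock time per call in milliseconds, after warm-up calls."""
    for _ in range(warmup):
        fn(sample)
    start = time.perf_counter()
    for _ in range(repeats):
        fn(sample)
    return (time.perf_counter() - start) * 1000 / repeats


print(round(footprint_mib(10_200_000), 2))  # 38.91, matching the reported footprint
print(round(1000 / 7.98, 1))  # 125.3 samples per second at 7.98 ms per sample
```

Under the 4-byte assumption, 10.2 million parameters occupy about 38.91 MiB, which agrees with the reported model footprint. When timing a model on a GPU, kernel launches are asynchronous, so the device should be synchronized (for example with `torch.cuda.synchronize()` in PyTorch) before each clock reading.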

Overall, the experimental trajectory demonstrates that while credibility metadata provides a strong foundation, the integration of handcrafted biometric-inspired and linguistic features further improves performance. Moreover, further augmentation of the samples yields state-of-the-art results on the LIAR2 dataset.

5.4. Discussion

The experimental results highlight several important insights. The handcrafted biometric-inspired features, including subjectivity, contradiction score, and linguistic indicators, significantly improve classification outcomes, indicating that deceptive language often contains measurable stylistic and affective cues. In contrast, sentiment labels and scores introduced noise, likely due to the limited depth of information in the short news statements of the LIAR2 dataset [12]. While individual handcrafted features provided modest benefits in isolation, their integration with credibility scores led to clear and consistent performance gains, emphasizing the value of multimodal feature fusion. Unlike previous FDHN variants, which generally integrate generic linguistic features, our model interprets handcrafted cues as behavioral-biometric indicators that capture subconscious writing behaviors associated with deception and bias. This framing preserves architectural simplicity while contributing interpretability and cross-domain adaptability. Hence, the innovation lies in the input semantics and behavioral grounding. The inference time and model footprint of the proposed method are well suited to real-time applications, and the model can be deployed on standard hardware provided that the justification text is available. The findings align with our objective of achieving an interpretable yet computationally efficient framework for practical misinformation detection. Additionally, targeted data augmentation stabilized training and enhanced generalization, suggesting practical approaches for improving MNC in low-resource or imbalanced data scenarios. It is important to note that all robustness observations in this work are based on the LIAR2 dataset, which focuses on political claims. Although our results demonstrate robustness to segment sensitivity, they do not guarantee cross-domain generalization. Therefore, broader generalization remains an open challenge for future work. Furthermore, the dataset includes human-written justification text. However, such justification could potentially be generated in real time using retrieval-augmented models [50], presenting another avenue for future work aimed at enabling live inference.

6. Conclusions

The study proposes a novel approach to misinformation classification, introducing a hybrid deep learning method that integrates textual semantics and contextual metadata with handcrafted linguistic and behavioral biometric features. Our proposed robust misleading information classification system excels at identifying subtle patterns of truthfulness, surpassing the capabilities of traditional semantic models. It utilizes handcrafted indicators such as lexical diversity, subjectivity, and contradiction detection, in addition to augmentation techniques and pretrained embeddings within a deep neural network. Extensive experiments on the LIAR2 dataset demonstrated that our approach not only improves classification accuracy but also establishes state-of-the-art performance, achieving 71.91 ± 0.23% accuracy and 71.17 ± 0.26% macro-F1, surpassing prior baselines. These results confirm the value of combining lightweight linguistic and behavioral biometrics with deep learning to enhance both predictive power and interpretability.

Looking ahead, we aim to improve computational efficiency through lightweight systems and architectures that are compatible with edge devices. The proposed MNC system can be deployed effectively in resource-constrained IoT networks. By integrating this MNC framework into smart city infrastructures, we can enable real-time monitoring of misinformation across civic platforms and social media, which will support more informed decision-making in urban environments. Additionally, expanding datasets to include multilingual, multimodal, and location-aware data will enhance the ability of the system to detect subtle patterns of deception among diverse urban populations in smart cities.

Author Contributions

Conceptualization, M.H., A.S.M.H.B. and M.L.G.; methodology, M.H. and A.S.M.H.B.; validation, M.H.; formal analysis, M.H. and A.S.M.H.B.; investigation, M.H.; resources, M.L.G.; data curation, M.H.; writing—original draft preparation, M.H.; writing—review and editing, M.H., A.S.M.H.B. and M.L.G.; visualization, M.H.; supervision, A.S.M.H.B. and M.L.G.; project administration, A.S.M.H.B. and M.L.G.; funding acquisition, M.L.G. All authors have read and agreed to the published version of the manuscript.

Funding

The project was funded in part by NSERC Discovery Grant 10007544, the UCalgary Research Chair in Trustworthy and Explainable AI 1004085, and NSERC Alliance—Alberta Innovates Advance Program 232403253.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original data used in the study are openly available in the Hugging Face repository at https://huggingface.co/datasets/chengxuphd/liar2 (accessed on 1 October 2025).

Acknowledgments

The authors acknowledge the support from the Biometric Technology lab members and colleagues at the University of Calgary.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

Bovet, A.; Makse, H.A. Influence of fake news in Twitter during the 2016 US presidential election. Nat. Commun. 2019, 10, 7.
Xarhoulacos, C.G.; Anagnostopoulou, A.; Stergiopoulos, G.; Gritzalis, D. Misinformation vs. situational awareness: The art of deception and the need for cross-domain detection. Sensors 2021, 21, 5496.
Vosoughi, S.; Roy, D.; Aral, S. The spread of true and false news online. Science 2018, 359, 1146–1151.
Iavich, M. Combating Fake News with Cryptography in Quantum Era with Post-Quantum Verifiable Image Proofs. J. Cybersecur. Priv. 2025, 5, 31.
Bode, L.; Vraga, E.K.; Alvarez, G.; Johnson, C.N.; Konieczna, M.; Mirer, M. What viewers want: Assessing the impact of host bias on viewer engagement with political talk shows. J. Broadcast. Electron. Media 2018, 62, 597–613.
Guo, Z.; Schlichtkrull, M.; Vlachos, A. A survey on automated fact-checking. Trans. Assoc. Comput. Linguist. 2022, 10, 178–206.
Kozik, R.; Mazurczyk, W.; Cabaj, K.; Pawlicka, A.; Pawlicki, M.; Choraś, M. Deep learning for combating misinformation in multicategorical text contents. Sensors 2023, 23, 9666.
Anzum, F.; Asha, A.Z.; Dey, L.; Gavrilov, A.; Iffath, F.; Ohi, A.Q.; Pond, L.; Shopon, M.; Gavrilova, M.L. A comprehensive review of trustworthy, ethical, and explainable computer vision advancements in online social media. In Global Perspectives on the Applications of Computer Vision in Cybersecurity; IGI Global Scientific Publishing: Hershey, PA, USA, 2024; pp. 1–46.
Farhangian, F.; Ensina, L.A.; Cavalcanti, G.D.; Cruz, R.M. DRES: Fake news detection by dynamic representation and ensemble selection. arXiv 2025, arXiv:2509.16893.
Garg, S.; Sharma, D.K. Linguistic features based framework for automatic fake news detection. Comput. Ind. Eng. 2022, 172, 108432.
Horne, B.; Adali, S. This just in: Fake news packs a lot in title, uses simpler, repetitive content in text body, more similar to satire than real news. In Proceedings of the International AAAI conference on Web and Social Media, Montreal, QC, Canada, 15–18 May 2017; Volume 11, pp. 759–766.
Xu, C.; Kechadi, M.T. An enhanced fake news detection system with fuzzy deep learning. IEEE Access 2024, 12, 88006–88021.
Kim, Y. Convolutional Neural Networks for Sentence Classification. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, Doha, Qatar, 25–29 October 2014; pp. 1746–1751.
Pennycook, G.; Rand, D.G. Fighting misinformation on social media using crowdsourced judgments of news source quality. Proc. Natl. Acad. Sci. USA 2019, 116, 2521–2526.
Shu, K.; Sliva, A.; Wang, S.; Tang, J.; Liu, H. Fake news detection on social media: A data mining perspective. ACM SIGKDD Explor. Newsl. 2017, 19, 22–36.
Zhang, X.; Ghorbani, A.A. An overview of online fake news: Characterization, detection, and discussion. Inf. Process. Manag. 2020, 57, 102025.
Pérez-Rosas, V.; Kleinberg, B.; Lefevre, A.; Mihalcea, R. Automatic Detection of Fake News. In Proceedings of the 27th International Conference on Computational Linguistics, Santa Fe, NM, USA, 20–26 August 2018; pp. 3391–3401.
Khan, J.Y.; Khondaker, M.T.I.; Afroz, S.; Uddin, G.; Iqbal, A. A benchmark study of machine learning models for online fake news detection. Mach. Learn. Appl. 2021, 4, 100032.
Nasir, J.A.; Khan, O.S.; Varlamis, I. Fake news detection: A hybrid CNN-RNN based deep learning approach. Int. J. Inf. Manag. Data Insights 2021, 1, 100007.
Ali, A.M.; Ghaleb, F.A.; Al-Rimy, B.A.S.; Alsolami, F.J.; Khan, A.I. Deep ensemble fake news detection model using sequential deep learning technique. Sensors 2022, 22, 6970.
Varma, R.; Verma, Y.; Vijayvargiya, P.; Churi, P.P. A systematic survey on deep learning and machine learning approaches of fake news detection in the pre-and post-COVID-19 pandemic. Int. J. Intell. Comput. Cybern. 2021, 14, 617–646.
Taeb, M.; Chi, H. Comparison of deepfake detection techniques through deep learning. J. Cybersecur. Priv. 2022, 2, 89–106.
Xu, C.; Kechadi, M.T. Fuzzy deep hybrid network for fake news detection. In Proceedings of the 12th International Symposium on Information and Communication Technology, Ho Chi Minh, Vietnam, 7–8 December 2023; pp. 118–125.
Agrawal, C.; Pandey, A.; Goyal, S. A survey on role of machine learning and NLP in fake news detection on social media. In Proceedings of the 2021 IEEE 4th International Conference on Computing, Power and Communication Technologies, Kuala Lumpur, Malaysia, 24–26 September 2021; pp. 1–7.
Al-Alshaqi, M.; Rawat, D.B.; Liu, C. Ensemble techniques for robust fake news detection: Integrating transformers, natural language processing, and machine learning. Sensors 2024, 24, 6062.
Rout, J.; Mishra, M.; Saikia, M.J. Towards Reliable Fake News Detection: Enhanced Attention-Based Transformer Model. J. Cybersecur. Priv. 2025, 5, 43.
Capuano, N.; Fenza, G.; Loia, V.; Nota, F.D. Content-based fake news detection with machine and deep learning: A systematic review. Neurocomputing 2023, 530, 91–103.
Chakraborty, A.; Khatri, I.; Choudhry, A.; Gupta, P.; Vishwakarma, D.K.; Prasad, M. An emotion-guided approach to domain adaptive fake news detection using adversarial learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA, 7–14 February 2023; Volume 37, pp. 16178–16179.
Guo, Z.; Yu, K.; Jolfaei, A.; Li, G.; Ding, F.; Beheshti, A. Mixed graph neural network-based fake news detection for sustainable vehicular social networks. IEEE Trans. Intell. Transp. Syst. 2022, 24, 15486–15498.
Singh, P.; Srivastava, R.; Rana, K.; Kumar, V. SEMI-FND: Stacked ensemble based multimodal inferencing framework for faster fake news detection. Expert Syst. Appl. 2023, 215, 119302.
Lahby, M.; Aqil, S.; Yafooz, W.M.; Abakarim, Y. Online fake news detection using machine learning techniques: A systematic mapping study. In Combating Fake News with Computational Intelligence Techniques; Springer: Cham, Switzerland, 2021; pp. 3–37.
Parikh, S.B.; Atrey, P.K. Media-rich fake news detection: A survey. In Proceedings of the 2018 IEEE Conference on Multimedia Information Processing and Retrieval, Miami, FL, USA, 10–12 April 2018; pp. 436–441.
Bhatia, Y.; Bari, A.H.; Hsu, G.S.J.; Gavrilova, M. Motion capture sensor-based emotion recognition using a bi-modular sequential neural network. Sensors 2022, 22, 403.
Alkaabi, H.; Jasim, A.K.; Darroudi, A. From Static to Contextual: A Survey of Embedding Advances in NLP. PERFECT J. Smart Algorithms 2025, 2, 64–73.
Shopon, M.; Tumpa, S.N.; Bhatia, Y.; Kumar, K.P.; Gavrilova, M.L. Biometric systems de-identification: Current advancements and future directions. J. Cybersecur. Priv. 2021, 1, 470–495.
Ruchansky, N.; Seo, S.; Liu, Y. CSI: A hybrid deep model for fake news detection. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, Singapore, 6–10 November 2017; pp. 797–806.
Han, Y.; Turrini, P.; Bazzi, M.; Andrighetto, G.; Polizzi, E.; De Domenico, M. Measuring the co-evolution of online engagement with (mis) information and its visibility at scale. arXiv 2025, arXiv:2506.06106.
Areshey, A.; Mathkour, H. Exploring transformer models for sentiment classification: A comparison of BERT, RoBERTa, ALBERT, DistilBERT, and XLNet. Expert Syst. 2024, 41, e13701.
Liu, Y.; Ott, M.; Goyal, N.; Du, J.; Joshi, M.; Chen, D.; Levy, O.; Lewis, M.; Zettlemoyer, L.; Stoyanov, V. RoBERTa: A robustly optimized BERT pretraining approach. arXiv 2019, arXiv:1907.11692.
Zhou, X.; Jain, A.; Phoha, V.V.; Zafarani, R. Fake news early detection: A theory-driven model. Digit. Threat. Res. Pract. 2020, 1, 12. [Google Scholar] [CrossRef]
Medhat, W.; Hassan, A.; Korashy, H. Sentiment analysis algorithms and applications: A survey. Ain Shams Eng. J. 2014, 5, 1093–1113. [Google Scholar] [CrossRef]
Rubin, V.L.; Conroy, N.; Chen, Y.; Cornwell, S. Fake news or truth? using satirical cues to detect potentially misleading news. In Proceedings of the Second Workshop on Computational Approaches to Deception Detection, San Diego, CA, USA, 17 June 2016; pp. 7–17. [Google Scholar]
Honnibal, M. spaCy 2: Natural Language Understanding with Bloom Embeddings, Convolutional Neural Networks and Incremental Parsing. Available online: https://orkg.org/resources/R11007?noRedirect= (accessed on 1 October 2025).
Nie, Y.; Williams, A.; Dinan, E.; Bansal, M.; Weston, J.; Kiela, D. Adversarial NLI: A new benchmark for natural language understanding. arXiv 2019, arXiv:1910.14599. [Google Scholar]
Rashkin, H.; Choi, E.; Jang, J.Y.; Volkova, S.; Choi, Y. Truth of varying shades: Analyzing language in fake news and political fact-checking. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Copenhagen, Denmark, 9–11 September 2017; pp. 2931–2937. [Google Scholar]
Fellbaum, C. WordNet. In Theory and Applications of Ontology: Computer Applications; Springer: Berlin/Heidelberg, Germany, 2010; pp. 231–243. [Google Scholar]
PyTorch Contributors. BCEWithLogitsLoss. Available online: https://docs.pytorch.org/docs/stable/generated/torch.nn.BCEWithLogitsLoss.html (accessed on 1 October 2025).
Wang, W.Y. “Liar, Liar Pants on Fire”: A New Benchmark Dataset for Fake News Detection. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, Vancouver, BC, Canada, 30 July–4 August 2017; pp. 422–426. [Google Scholar]
Poynter Institute. PolitiFact. Available online: https://www.politifact.com/ (accessed on 1 October 2025).
Lewis, P.; Perez, E.; Piktus, A.; Petroni, F.; Karpukhin, V.; Goyal, N.; Küttler, H.; Lewis, M.; Yih, W.T.; Rocktäschel, T.; et al. Retrieval-augmented generation for knowledge-intensive nlp tasks. Adv. Neural Inf. Process. Syst. 2020, 33, 9459–9474. [Google Scholar]

Figure 1. The flowchart of the proposed MNC system.

Figure 2. The proposed MNC Architecture.

Table 1. Comparison of LIAR and LIAR2 features.

Category	LIAR	LIAR2
Number of Statements	12,836	22,962
Truthfulness Labels	6 categories	6 categories
News Timeline	No	Yes
Speaker Information	Job Title + Party	Full Biography
Geographic Context	Speaker’s State	State Referenced in Statement
Historical Credibility	Partial	Complete
Fact-Checker Justifications	No	Yes
Data Errors Fixed	No	Yes

Table 2. Available contextual metadata in the LIAR2 dataset.

Type	Features
Textual	“date”, “subject”, “speaker”, “speaker description”, “state info”, “context”
Numerical	“true counts”, “mostly true counts”, “half true counts”, “mostly false counts”, “false counts”, “pants on fire counts”

Table 3. Test performance across credibility history and handcrafted features in Experiment 1.

Category	Feature	Test Accuracy	Macro F1	Micro F1
Credibility History	True Count	0.6450	0.6334	0.6450
	Mostly True Count	0.6250	0.6084	0.6250
	Half True Count	0.5976	0.5758	0.5976
	Barely True Count	0.6237	0.6066	0.6237
	False Count	0.5910	0.5568	0.5910
	Pants on Fire Count	0.6150	0.5986	0.6150
Handcrafted Features	TTR	0.6193	0.6048	0.6193
	EC	0.6220	0.6078	0.6220
	AD	0.6180	0.6045	0.6180
	Sent. Label	0.6115	0.6077	0.6215
	Sent. Score	0.6167	0.6013	0.6167
	Subj. Score	0.6259	0.6121	0.6259
	Contr. Score	0.6220	0.6087	0.6220
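
The tables in this section report test accuracy alongside macro and micro F1. As a minimal sketch of how the three quantities relate, the snippet below computes them in plain Python for single-label predictions. The function name `f1_scores` and the toy labels are hypothetical and chosen only for illustration; they are not data or code from the study. In single-label classification, micro F1 equals accuracy by construction, while macro F1 averages the per-class F1 scores and is therefore more sensitive to the weaker class.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (accuracy, macro_f1, micro_f1) for single-label predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # class p was predicted, but incorrectly
            fn[t] += 1  # the true class t was missed

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    accuracy = sum(tp.values()) / len(y_true)
    # Macro F1: unweighted mean of the per-class F1 scores.
    macro_f1 = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    # Micro F1: F1 computed from counts pooled over all classes.
    micro_f1 = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return accuracy, macro_f1, micro_f1


# Toy, imbalanced labels (hypothetical): 1 = misleading, 0 = not misleading.
y_true = [1, 1, 1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 1, 1, 1, 1, 0]
acc, macro, micro = f1_scores(y_true, y_pred)
print(f"accuracy={acc:.4f} macro_f1={macro:.4f} micro_f1={micro:.4f}")
```

On these toy labels the single error falls on the minority class, so macro F1 drops below accuracy while micro F1 stays equal to it.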

Table 4. Test performance of Experiments 2–4 in comparison to the state-of-the-art.

Experiment	Test Accuracy	Macro F1	Micro F1
Cheng et al. [12]	70.20%	69.60%	70.20%
Proposed MNC (Baseline with Native Metadata)	70.27 ± 0.07%	69.67 ± 0.09%	70.27 ± 0.07%
Proposed MNC (Fusion with Handcrafted Features)	71.41 ± 0.12%	70.56 ± 0.17%	71.41 ± 0.12%
Proposed MNC (Augmentation-Enhanced System)	71.91 ± 0.23%	71.17 ± 0.26%	71.91 ± 0.23%

Table 5. Breakdown of the inference time of the proposed MNC system.

Component	Average Time (ms/Sample)
Proposed Model	2.43
Proposed Linguistic Feature	0.28
Proposed Sentiment & Subjectivity	1.38
Proposed Contradiction Feature	3.89
Total	7.98
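
As a quick arithmetic check on Table 5, the following sketch sums the per-component times, derives each component's share of the total, and converts the total into an implied throughput. The values are copied from the table; the variable names are hypothetical, and the throughput figure assumes strictly sequential, unbatched processing, which is an assumption of this sketch and not something the table states.

```python
# Per-sample inference times in milliseconds, copied from Table 5.
component_ms = {
    "Proposed Model": 2.43,
    "Proposed Linguistic Feature": 0.28,
    "Proposed Sentiment & Subjectivity": 1.38,
    "Proposed Contradiction Feature": 3.89,
}

# Total per-sample latency, rounded to the table's two decimals.
total_ms = round(sum(component_ms.values()), 2)

# Share of the total attributable to each component, in percent.
share_pct = {
    name: round(100 * ms / total_ms, 1) for name, ms in component_ms.items()
}

# Implied throughput, assuming sequential processing with no batching.
samples_per_second = 1000 / total_ms

for name, ms in component_ms.items():
    print(f"{name}: {ms:.2f} ms ({share_pct[name]}%)")
print(f"Total: {total_ms:.2f} ms/sample, about {samples_per_second:.0f} samples/s")
```

The computed total matches the 7.98 ms reported in the table, and the contradiction feature accounts for the largest share of per-sample time.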

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
