Interpretable Fake News Detection Using Linguistic Indicators Under Imbalanced and Low-Resource Conditions

Ormeño-Arriagada, Pablo; Puraivan, Eduardo; Kloss, Steffanie; Cofré-Morales, Connie; Rodriguez, Miguel

doi:10.3390/app16105080


by Pablo Ormeño-Arriagada ¹,*, Eduardo Puraivan ², Steffanie Kloss ³, Connie Cofré-Morales ² and Miguel Rodriguez ⁴,⁵

¹ Facultad de Ingeniería, Negocios y Ciencias Agroambientales, Universidad de Viña del Mar, Viña del Mar 2520000, Chile
² Vicerrectoría Académica, Universidad de Viña del Mar, Viña del Mar 2520000, Chile
³ Facultad de Educación y Humanidades, Universidad Andrés Bello, Santiago 8370035, Chile
⁴ Faculty of Education, Universidad de Playa Ancha, Valparaiso 2360004, Chile
⁵ Laboratorio del Aprendizaje, Enseñanza y Tecnología (LAETEC-UPLA), Universidad de Playa Ancha, Valparaiso 2360004, Chile
* Author to whom correspondence should be addressed.

Appl. Sci. 2026, 16(10), 5080; https://doi.org/10.3390/app16105080

Submission received: 5 April 2026 / Revised: 9 May 2026 / Accepted: 11 May 2026 / Published: 20 May 2026


Abstract

The rapid proliferation of online misinformation poses significant risks to democratic processes and public decision-making. However, existing machine learning and deep learning approaches often rely on large annotated datasets and exhibit limited robustness under severe class imbalance and low-resource conditions, particularly in Spanish-language contexts. To address this, this study proposes an interpretable and robust framework for misinformation detection under such constraints. A unified, linguistically grounded and data-centric pipeline is developed, integrating structured lexical, syntactic, and semantic features with synthetic minority augmentation, class-balanced ensemble learning, autoencoder-based representation learning, and active learning under data scarcity. Importantly, the framework systematically evaluates the interaction between these components within a reproducible experimental setting. Results demonstrate that the proposed approach achieves consistent improvements in macro-averaged F1 and minority-class recall compared to baseline models, while reducing performance variance across folds. Ensemble and augmentation strategies provide the most stable configurations, enhancing the detection of underrepresented classes. Moreover, the use of interpretable linguistic features allows predictions to be associated with discourse-level patterns, improving transparency. Consequently, the framework offers a reproducible, computationally efficient, and interpretable solution for misinformation detection in low-resource environments, supporting practical deployment and future multilingual extensions. Importantly, this study provides the first systematic analysis of the interaction between linguistic representations and imbalance mitigation strategies under extreme data scarcity.

Keywords: artificial intelligence; fake news detection; corpus linguistics; machine learning; imbalanced datasets; linguistic features; transfer learning; ensemble learning; active learning

1. Introduction

1.1. Background and Problem Context

Fake news, defined as intentionally false or misleading information presented as legitimate, undermines public discourse and democratic processes [1]. Moreover, its rapid digital dissemination increases the need for reliable automated detection in resource-constrained settings. Although linguistic features at lexical, syntactic, and semantic levels provide interpretable signals, current systems remain limited by scarce annotated data and severe class imbalance [2,3,4]. Furthermore, false content is underrepresented, leading to biased learning and weak generalisation [5,6,7]. In this context, Machine Learning (ML) and Deep Learning (DL) approaches degrade under linguistic variability, particularly in Spanish corpora [8,9]. To address this, we propose a unified pipeline integrating linguistic feature engineering with oversampling, ensemble learning, autoencoder-based transfer learning, and active learning, improving robustness and interpretability in low-resource settings [10,11,12].

1.2. Linguistic Features in Fake News

Linguistic features play a central role in fake news classification by providing interpretable cues that distinguish deceptive from legitimate content [13]. Specifically, lexical, syntactic, and semantic features capture complementary patterns such as sensational wording, stylistic repetition, discourse intensification, and simplified grammatical structures associated with manipulation. Furthermore, semantic analyses, including sentiment scoring, topic modelling, and contextual representations, reveal bias, tonal extremity, and thematic inconsistency. Together, these structured features form a robust foundation for ML-based fake news detection, particularly in small and imbalanced datasets [14,15]. Motivated by this, the present study integrates rich linguistic representations to enhance generalisation, robustness, and interpretability, aiming to strengthen transparency and diagnostic value in automated misinformation detection [16].

1.3. Interpretability, Transparency, and Explainability in Fake News and Linguistics

In recent years, growing attention has focused on interpretability, transparency, and explainability in Artificial Intelligence (AI) systems, particularly in sensitive domains such as misinformation detection. Although deep learning and transformer architectures achieve strong predictive performance, their internal representations remain difficult to interpret, raising concerns about trustworthiness and accountability. In this context, effective fake news detection requires models that not only classify content but also reveal underlying linguistic and rhetorical mechanisms. Specifically, interpretable approaches enable the identification of discourse-level signals, including evaluative language, modality, and stance markers, supporting analytical understanding and transparency. Consequently, research increasingly seeks to balance predictive performance with human-understandable representations, motivating the use of linguistically grounded indicators as interpretable tools for misinformation analysis [17,18,19].

1.4. Research Gap and Proposed Framework

Despite growing research in automated fake news detection, few studies integrate structured linguistic features with synthetic oversampling and autoencoder-based learning [20]. Moreover, existing approaches often rely on deep content-based models, limiting interpretability and robustness in low-resource settings. To address this, we propose a unified, linguistically grounded and data-centric pipeline that combines linguistic feature engineering with Synthetic Minority Over-sampling Technique (SMOTE), autoencoder-based transfer learning, and Ensemble modelling for small imbalanced datasets. Unlike prior work, which applies these techniques in isolation, this framework systematically examines their interaction under severe class imbalance. Specifically, it evaluates how synthetic sampling reshapes lexical and discourse patterns and influences ensemble decision boundaries and latent representations. By doing so, the approach aims to improve minority-class detection while preserving linguistic integrity within a reproducible and statistically validated setting.

1.5. Research Questions and Experimental Design

This study investigates a set of interrelated research questions on the feasibility, robustness, and interpretability of linguistically grounded fake news detection under severe data imbalance, and provides a unified analysis of interacting strategies under data scarcity. Specifically, it evaluates whether ML models enhanced with synthetic augmentation, ensemble stabilisation, and autoencoder-based transfer learning achieve reliable performance in small corpora. Furthermore, it analyses robustness across classical, ensemble, and DL approaches under imbalance mitigation. In addition, it assesses the contribution of structured linguistic features to interpretability, which it distinguishes from explainability. Finally, it examines whether the unified pipeline provides a generalisable and computationally efficient framework. To address these questions, a structured experimental design compares baseline, augmented, ensemble, and transfer configurations using stratified cross-validation, quantifying improvements in robustness, minority recall, stability, and interpretability. The study is guided by the following core research questions:

RQ1: To what extent can ML and DL models enhanced with synthetic augmentation, ensemble stabilisation, and autoencoder-based transfer learning achieve robust and reliable fake news classification under severe data imbalance?
RQ2: How does the integration of structured linguistic features with imbalance mitigation strategies influence interpretability and predictive performance compared to purely deep content-based approaches?
RQ3: Can a unified, linguistically grounded pipeline provide a generalisable, reproducible, and computationally efficient framework for fake news detection in low-resource environments?

1.6. Contributions

To our knowledge, the novelty of this work lies in systematically analysing the interaction between linguistic feature representations and imbalance mitigation strategies within a unified and reproducible framework under extreme data scarcity. Specifically, this study proposes a linguistically grounded and data-centric architecture for small and imbalanced corpora, integrating structured feature engineering with synthetic oversampling, ensemble stabilisation, autoencoder-based representation learning, and active learning. Unlike prior approaches, which treat these components independently, this framework evaluates their combined effects under severe minority scarcity, achieving improvements in macro-averaged F1, minority recall, and stability. Moreover, by anchoring predictions in discourse-level markers, it enhances interpretability while maintaining computational efficiency in constrained settings. To our knowledge, no prior work has quantified these interactions within a statistically validated framework for Spanish misinformation corpora, reflecting a deliberate effort to advance robust and deployable solutions.

The remainder of this paper is structured as follows. Section 2 critically reviews the existing literature on automated fake news detection and identifies the methodological gaps that motivate this study. Section 3 details the proposed unified pipeline, including feature engineering, imbalance mitigation strategies, and the experimental protocol. Section 4 reports the empirical findings with comprehensive statistical validation. Section 5 analyses the implications of the results in relation to robustness, interpretability, and deployment considerations. Finally, Section 6 summarises the principal contributions and outlines directions for future research.

2. Related Work

This section reviews existing approaches to automated fake news detection, focusing on ML, DL, and data-centric strategies. It critically examines their limitations under data scarcity and class imbalance, and highlights gaps in integrating interpretability with robustness, thereby motivating the unified, linguistically grounded framework proposed in this study.

2.1. Overview of Fake News Detection Research

Research on automated fake news detection has advanced rapidly through ML, DL, and NLP techniques. However, existing approaches, which exploit textual, metadata, and propagation signals, continue to face challenges related to data scarcity, class imbalance, and cross-domain generalisation. Although transformer-based models achieve high predictive accuracy, they often sacrifice interpretability and robustness in low-resource settings [21]. Consequently, recent studies have explored linguistic features, oversampling, ensemble learning, transfer learning, and active learning. Nevertheless, these strategies are typically applied in isolation, limiting their ability to address multiple constraints simultaneously. Motivated by this, the present study reviews these directions to identify their limitations and to support the development of a unified, linguistically grounded pipeline for robust and interpretable fake news detection [22].

2.2. Machine Learning and Predictive Modeling Approaches

Fake news detection is predominantly driven by ML and DL models, ranging from traditional classifiers such as SVM to advanced architectures including Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), and transformers [23,24]. However, their performance depends heavily on large annotated datasets and often deteriorates under class imbalance and evolving misinformation patterns [25,26]. Consequently, recent research has shifted toward data-centric and hybrid strategies, including SMOTE, Variational Autoencoder (VAE), Generative Adversarial Network (GAN), and augmentation techniques such as back-translation and paraphrasing, aiming to improve minority representation and robustness in low-resource settings [23,27,28]. In parallel, predictive models, including Logistic Regression, ensemble methods, and DL architectures, attempt to enhance generalisation, yet challenges persist [29,30,31,32,33]. Moreover, a fundamental trade-off remains between interpretability and adaptability, where handcrafted features such as TF-IDF and POS tags provide transparency, while DL models capture richer contextual patterns [29,34]. Recent advances in deep learning have explored end-to-end architectures to overcome limitations of multi-stage pipelines, such as the Fourier Uncalibrated Photometric Stereo Network (FUPS-Net), which implicitly learns complex feature interactions through frequency-domain representations, reducing error propagation and improving stability compared to disjointed models [35].

2.3. Small Data and Low-Resource Challenges

Training ML and DL models on small NLP datasets remains a persistent methodological challenge due to overfitting and limited generalisation capacity [36]. In low-resource settings, models frequently capture dataset-specific noise rather than stable linguistic representations, which reduces performance on unseen or out-of-distribution data. To mitigate this limitation, transfer learning has become a dominant strategy, enabling the fine-tuning of large pre-trained architectures such as Bidirectional Encoder Representations from Transformers (BERT), RoBERTa, and GPT on relatively small labelled corpora while leveraging knowledge from large-scale pretraining [37]. In addition, data augmentation techniques, including synonym replacement, back translation, and paraphrasing, expand linguistic variability and reduce variance-driven overfitting [38]. Although these approaches enhance robustness, balancing representation learning with data efficiency remains a central research challenge in misinformation detection and related small data NLP tasks.

2.4. Linguistic Feature-Based Approaches

Linguistic analysis is fundamental to fake news detection, utilising lexical, syntactic, and semantic features to differentiate deceptive from factual content [39]. Lexical indicators, including word frequency, sentiment polarity, and stylistic variation, expose recurrent patterns of misinformation [40]. Syntactic features, such as POS distributions and sentence complexity, reveal deviations from standard journalistic writing [41]. Semantic methods, including embeddings and topic modelling, identify latent bias and content manipulation [42]. Together, these structured linguistic features enable interpretable and resource-efficient models, offering effective detection performance in low-resource scenarios without reliance on deep NLP architectures [43].

2.5. Data Augmentation, Ensemble Learning, and Representation Learning

In low-resource and imbalanced text classification scenarios, combining data augmentation, ensemble learning, and representation learning has emerged as a promising strategy to enhance robustness and generalisation. Augmentation techniques such as SMOTE, synonym replacement, and back translation generate synthetic or semantically varied samples, improving minority-class recall while reducing overfitting [44]. However, poorly controlled augmentation may introduce semantic drift or distributional distortion. To stabilise learning, ensemble methods including bagging and boosting aggregate diverse classifiers, improving resilience under noisy and imbalanced conditions [45,46]. In parallel, autoencoders support unsupervised representation learning and latent feature extraction, strengthening performance in constrained environments [47]. Active learning further improves data efficiency by prioritising informative samples and reducing annotation costs [48]. Although each technique has shown individual benefits, their integrated application in fake news detection remains insufficiently explored, motivating the development of unified and scalable pipelines.

2.6. Interpretability and Explainability in Fake News Detection

Interpretability and explainability have become critical requirements in fake news detection [49,50], particularly in socially sensitive and high-impact domains. While DL and transformer-based models achieve strong predictive performance, their internal decision processes often remain opaque, limiting trust and accountability [51]. In contrast, interpretable approaches emphasise transparent feature representations, enabling the identification of linguistic cues such as sentiment, modality, and discourse patterns associated with misinformation. Recent research highlights the importance of balancing predictive accuracy with human-understandable reasoning, especially in low-resource settings where model validation is constrained. Consequently, there is a growing need for frameworks that integrate robust classification with interpretable mechanisms to support reliable and transparent misinformation analysis [52].

2.7. Gap Identification and Research Positioning

Although fake news detection has advanced considerably, existing approaches remain fragmented, applying linguistic features, augmentation, ensemble learning, and transfer learning in isolation rather than within a unified framework. Moreover, many methods depend on large annotated datasets, struggle under severe class imbalance, and rely on opaque high-dimensional representations, limiting robustness and interpretability in low-resource settings [37]. While some studies combine sampling and ensemble techniques without linguistic grounding [53], and others focus on linguistic cues independently [54], their interaction remains underexplored. Consequently, there is limited systematic analysis of how linguistic features interact with imbalance mitigation strategies in small corpora. Importantly, the novelty of this study lies in quantifying these interactions within a unified, reproducible framework, demonstrating how they reshape feature distributions and decision boundaries. Therefore, the contribution advances beyond integration toward a deeper understanding of interpretable representations under data scarcity [21,55,56,57,58]. Transformer-based models leverage attention mechanisms to capture rich contextual dependencies; however, their effectiveness is often constrained by data availability and significant computational requirements. Recent hybrid approaches, such as the architecture proposed in [59], achieve strong performance through attention-driven representations, yet rely on high-dimensional embeddings and resource-intensive training. In contrast, the proposed framework emphasises interpretability and computational efficiency by employing structured linguistic features tailored to low-resource settings.

3. Methodology

This section outlines the proposed unified pipeline for fake news detection, addressing challenges of small and imbalanced datasets. The framework combines (i) linguistic feature engineering for interpretability, (ii) synthetic oversampling to correct imbalance, (iii) autoencoder-based transfer learning for representation learning, and (iv) ensemble modeling to enhance accuracy and generalization. We describe preprocessing, feature extraction, augmentation, model training, and evaluation, emphasizing reproducibility and adaptability. The framework is designed as a modular system in which data-centric and model-centric components interact sequentially, enabling controlled integration of augmentation, representation learning, and classification stages.

3.1. Study Overview

This study proposes a data-efficient pipeline for fake news detection that integrates structured linguistic features with data augmentation, ensemble learning, autoencoder-based transfer learning, and active learning to address imbalanced and low-resource settings. Each component is designed to mitigate a specific limitation, including minority-class underrepresentation, overfitting in small datasets, and dependence on costly manual annotation. By strategically combining these complementary techniques, the framework strengthens minority-class detection, improves robustness against data scarcity, and enhances adaptability to evolving misinformation patterns. Furthermore, its modular and reproducible design supports scalability and generalisation across diverse contexts, contributing to the development of transparent and reliable misinformation detection systems.

3.2. Dataset Description

The CLNews dataset [60] is a specialized collection for studying the spread of verified and unverified news on Twitter during the Chilean social outbreak between June 2019 and January 2020. It consists of 300 propagation trees, each representing the dissemination of one piece of information across the platform. Each propagation tree is formed by tweets, retweets, and replies, allowing the observation of user interactions and diffusion patterns over time. The dataset was curated from events fact-checked by Chilean verification agencies, including FastCheck, FactCheckingUC, and Decodificador, which supports the credibility of the labeling process. Each event is assigned to one of four veracity categories, as summarised in Table 1. Each node in a propagation tree is treated as an independent text instance, and classification is performed at the individual tweet level.

This dataset is particularly valuable for research on fake news detection and rumor classification, especially in low-resource and imbalanced data settings, given the limited number of events per category. Figure 1 illustrates the class distribution of the misinformation dataset, highlighting the imbalance across categories such as NO RUMOR, TRUE, FALSE, and NO VERIFIED news items.

The input features used in this study consist of structured linguistic vectors, encompassing lexical attributes, readability properties, and semantic indicators. These feature sets provide a rich representation of textual content, allowing the models to capture nuanced patterns associated with deceptive language. A significant challenge addressed in this research is the presence of class imbalance in a small dataset, where certain labels are notably underrepresented, thereby complicating the learning process for standard classifiers. To ensure ethical compliance, all data employed in the study were fully anonymized and analysed in accordance with established guidelines for public content research. This approach safeguards participant privacy while enabling reproducible and ethically responsible experimentation.

3.3. Linguistic and Readability Features

This methodology enables systematic exploration of the linguistic structures that characterise different forms of information disorder, including misinformation, disinformation, and malinformation [61]. By examining lexical, syntactic, and semantic patterns, the approach facilitates deeper analysis of the structural mechanisms underlying deceptive communication. These features are organised into six analytical categories, described as follows:

Surface Variables: These include structural and quantitative text features such as the number of paragraphs, and descriptive statistics (total, mean, standard deviation, median, minimum, and maximum) for sentences, words, and characters.
Readability: This dimension incorporates the Flesch-Szigriszt Index (IFSZ), which assesses the textual complexity of Spanish-language content based on its expected ease of comprehension [62].
Lexical Variables with Discursive Function: This set captures the frequency of words fulfilling discursive roles, including demonstratives, personal pronouns, connectors, negations, modalisers, and both textual and argumentative discourse markers.
Polarity Lexicon: This component quantifies overall sentiment polarity by tracking the presence of positively and negatively tagged words from the Spanish versions of the Loughran–McDonald Sentiment Word Lists (LMC), the Bing Liu Opinion Lexicon (BING), the AFINN sentiment lexicon (AFINN), and the National Research Council Canada Emotion Lexicon (NRC). The AFINN lexicon further distinguishes between strong and mild sentiment.
Emotion and Sentiment Lexicon: Emotional content is assessed using the Spanish translations of the NRC and LMC dictionaries. The NRC dictionary annotates words expressing basic emotions such as anger, anticipation, disgust, fear, joy, sadness, surprise, and trust. The LMC dictionary includes lexical markers for restriction, litigiousness, superfluity, and uncertainty.

These linguistic and readability metrics are critical in understanding the complexity and style of information, assisting in distinguishing between truthful and deceptive content.
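As an illustration of how some of these indicators can be computed, the following minimal sketch derives a few surface variables, counts one class of discursive words, and evaluates the Flesch-Szigriszt index from pre-computed counts. It is not the authors' extraction code. The sentence splitter, the tokeniser, and the `NEGATIONS` mini-lexicon are simplifying assumptions, and the IFSZ expression is the commonly cited form of the index (206.835 − 62.3 × syllables/words − words/sentences), with syllable counting left to an external tool.

```python
import re
import statistics

# Hypothetical mini-lexicon of Spanish negations, for illustration only.
NEGATIONS = {"no", "nunca", "jamás", "tampoco", "ni"}


def split_sentences(text):
    """Naive sentence splitter on '.', '!' and '?' (an assumption)."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


def tokenize(text):
    """Lowercased alphabetic tokens, including accented Spanish letters."""
    return re.findall(r"[^\W\d_]+", text.lower())


def surface_features(text):
    """A few surface variables plus one discursive-word count."""
    sentences = split_sentences(text)
    words_per_sentence = [len(tokenize(s)) for s in sentences]
    words = tokenize(text)
    return {
        "n_sentences": len(sentences),
        "n_words": len(words),
        "mean_words_per_sentence": (
            statistics.mean(words_per_sentence) if words_per_sentence else 0.0
        ),
        "max_words_per_sentence": max(words_per_sentence, default=0),
        "n_negations": sum(w in NEGATIONS for w in words),
    }


def ifsz(n_syllables, n_words, n_sentences):
    """Flesch-Szigriszt index from counts (commonly cited form)."""
    return 206.835 - 62.3 * (n_syllables / n_words) - (n_words / n_sentences)


features = surface_features("No es cierto. Nunca lo dijeron así!")
```

Each returned dictionary can be flattened into one row of the structured feature vector. The lexicon-based categories (polarity, emotion) follow the same counting pattern with the corresponding word lists.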

3.4. Feature Selection Justification

A core assumption in empirical text analysis, central to both corpus linguistics and natural language processing, is that statistically significant co-occurrence of linguistic features can reveal shared communicative patterns across texts [63,64]. These features span lexical, syntactic, semantic, and pragmatic-discursive levels, and their combinations often reflect consistent structural or functional properties tied to social, situational, or cognitive contexts [65]. For example, fake news texts commonly exhibit simplified syntax, lexical repetition, polysemy, negative polarity, and sparse use of discourse markers, suggesting intentions such as deception or subjectivity. Crucially, such communicative functions emerge not from isolated features but from multi-dimensional linguistic patterns that can be empirically identified and operationalised in machine learning systems. These patterns enhance automated classification by aligning with discursive functions. While prior studies have validated specific markers, such as syntactic simplicity and negative polarity, in English-language disinformation detection [66], there remains a need for systematic selection of linguistically significant features tailored to Spanish-language fake news.

3.5. Preprocessing

The preprocessing stage ensures that textual data are standardised, noise-free, and ready for analysis. This process includes tokenisation, lowercasing, punctuation and stopword removal, lemmatisation, and handling of special characters or URLs [67]. Additional steps, such as normalising whitespace and correcting encoding inconsistencies, further enhance text quality. These operations reduce lexical redundancy, mitigate noise, and improve model interpretability. Effective preprocessing not only facilitates more accurate feature extraction but also strengthens downstream learning by ensuring that linguistic patterns are captured consistently across the dataset, ultimately enhancing the overall performance of the fake news detection pipeline.
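The steps listed above can be sketched as a small normalisation function. This is a minimal illustration using only the Python standard library, not the pipeline used in the study. The `STOPWORDS` set is a hypothetical stand-in for a full Spanish stopword list, and lemmatisation is omitted because it requires an external language model.

```python
import re
import string

# Hypothetical, deliberately tiny Spanish stopword list (illustration only).
STOPWORDS = {"de", "la", "el", "en", "y", "a", "que", "los", "las", "un", "una"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
# ASCII punctuation plus a few marks common in Spanish text.
PUNCT = string.punctuation + "¿¡«»“”…"
PUNCT_TABLE = str.maketrans(PUNCT, " " * len(PUNCT))


def preprocess(text):
    """Return a list of cleaned tokens for one tweet-level text."""
    text = URL_RE.sub(" ", text)         # drop URLs before touching punctuation
    text = text.lower()                  # lowercasing
    text = text.translate(PUNCT_TABLE)   # punctuation and special characters
    tokens = text.split()                # whitespace normalisation + tokenisation
    return [t for t in tokens if t not in STOPWORDS]


tokens = preprocess("¡La noticia es FALSA! Ver https://t.co/abc123 en la web.")
```

URLs are removed first so that their fragments do not survive punctuation stripping as spurious tokens. Note that stopword removal would be skipped for features that count function words such as negations or connectors.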

3.6. Baseline Models

To establish a reference for evaluating the effectiveness of more complex learning strategies, three baseline models were implemented: a decision tree classifier, a support vector machine (SVM), and a shallow fully connected neural network (FCNN) [68]. These models were selected based on their wide adoption in text classification tasks with small datasets and their ability to operate on structured tabular representations of linguistic features, including lexical, syntactic, and semantic attributes. The goal of the baseline comparison was to quantify the capacity of traditional supervised classifiers to detect fake news patterns without the aid of data augmentation or ensemble integration. A description of the baseline models is provided in Table 2.

3.7. Augmentation Techniques

To address class imbalance and limited sample sizes in fake news classification, this study employed two established oversampling techniques, SMOTE [69]. These data-level augmentation methods were applied exclusively to the training set to prevent data leakage and ensure reliable evaluation. By generating synthetic minority-class instances, both techniques improve the representation of underrepresented deceptive content, which is common in real-world fake news datasets. This strategy enhances the model’s capacity to learn more stable decision boundaries and improves generalisation beyond the observed data. Through careful integration of oversampling within the training pipeline, the methodology seeks to balance minority-class recall with overall robustness, contributing to more reliable misinformation detection in constrained data environments. The characteristics of the oversampling techniques are summarised in Table 3.
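The core idea of SMOTE, interpolating between a minority sample and one of its nearest minority-class neighbours, can be sketched in a few lines. The function below is a simplified illustration under stated assumptions (Euclidean distance, dense numeric feature vectors, at least two minority samples). The name `smote_sketch` and its parameters are hypothetical and do not reflect the configuration reported in Table 3.

```python
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic points from a list of minority feature vectors."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base_idx = rng.randrange(len(minority))
        base = minority[base_idx]
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for i, p in enumerate(minority) if i != base_idx]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Usage: oversample only the training fold, never the held-out fold.
train_minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_sketch(train_minority, n_new=5)
```

Because each synthetic vector lies on a segment between two real minority vectors, applying the function before the train/test split would let information from test instances leak into training data, which is why oversampling is restricted to the training set.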

3.8. Ensemble Methods

To enhance classification robustness in detecting fake news from linguistic features, this study implemented four ensemble learning techniques [70]. Ensemble models are particularly valuable in tasks involving small and imbalanced datasets, where individual classifiers may struggle to generalize across sparse or noisy samples. By aggregating the predictions of multiple learners trained on diverse data subsets, ensemble methods reduce variance, mitigate overfitting, and improve sensitivity to underrepresented fake news classes. These properties are essential when identifying nuanced or borderline misinformation patterns that are easily misclassified due to feature sparsity or linguistic ambiguity. A description of the ensemble learning techniques is provided in Table 4.
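The aggregation principle described above can be illustrated with bootstrap aggregation (bagging) and majority voting. This sketch is a generic example and is not one of the four configurations in Table 4. The nearest-centroid base learner, the toy data, and all function names are assumptions chosen to keep the example self-contained.

```python
import random
from collections import Counter


def fit_centroids(X, y):
    """Base learner: one mean vector per class."""
    sums, counts = {}, Counter(y)
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, value in enumerate(row):
            acc[i] += value
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}


def predict_centroid(centroids, row):
    """Predict the class whose centroid is closest to row."""
    return min(
        centroids,
        key=lambda lab: sum((a - b) ** 2 for a, b in zip(row, centroids[lab])),
    )


def fit_bagging(X, y, n_estimators=11, seed=0):
    """Train each base learner on a bootstrap resample of the training data."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(fit_centroids([X[i] for i in idx], [y[i] for i in idx]))
    return models


def predict_bagging(models, row):
    """Majority vote over the base learners."""
    votes = Counter(predict_centroid(m, row) for m in models)
    return votes.most_common(1)[0][0]


X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y = ["TRUE", "TRUE", "TRUE", "FALSE", "FALSE", "FALSE"]
models = fit_bagging(X, y)
```

Each base learner sees a slightly different resample, so individual errors caused by sparse or noisy instances tend to be outvoted, which is the variance-reduction effect the section refers to.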

3.9. Transfer Learning with Autoencoder-Based Feature Extraction

To address the limitations of small and imbalanced datasets in fake news classification, this study introduces a transfer learning framework based on autoencoder architectures trained on structured linguistic vectors, including lexical, syntactic, and semantic features [71]. The autoencoder compresses high-dimensional inputs into compact latent representations that retain discriminative information, which enhances class separability. The encoder initializes a dense neural network classifier, while the decoder generates synthetic minority samples from latent vectors, strengthening class representation. This dual mechanism enhances generalisation and reduces overfitting, particularly for underrepresented classes. Empirical evaluation demonstrates measurable improvements in macro-averaged F1 score and recall compared with baseline models without transfer learning. Although autoencoders have been applied in text classification, their integration with synthetic oversampling and linguistic feature engineering remains limited, motivating this scalable and interpretable solution for misinformation detection under data scarcity. Figure 2 illustrates the architecture of the autoencoder and its integration into the classification pipeline, including the encoding stage, synthetic generation, and classifier fine-tuning on both real and generated data.

3.10. Active Learning Framework

Active learning is employed to improve labeling efficiency by prioritizing the most informative samples for annotation [72]. The strategy relies on uncertainty sampling, using entropy as the query criterion to identify low-confidence predictions. A Logistic Regression model serves as the base learner due to its interpretability and rapid retraining capacity, supporting iterative updates. Over ten acquisition rounds, five uncertain samples are selected per round for manual verification. As illustrated in Figure 3, the framework follows a human-in-the-loop process, where a classifier trained on verified data evaluates an unlabeled pool and flags ambiguous instances for expert review. Verified labels are then incorporated into the training set, enabling progressive refinement. This controlled and iterative approach enhances minority-class representation, reduces annotation burden, and strengthens robustness under data scarcity and class imbalance.

More precisely, active learning is implemented as an offline simulation, in which labels are iteratively revealed from an existing labeled pool to emulate annotation under controlled conditions.
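
The query step described above can be sketched as follows. This is a minimal illustration rather than the study's implementation: `select_queries` ranks pool items by the entropy of their predicted class distribution, and `simulate_active_learning` reveals labels from an existing pool over ten rounds of five items. The `train_and_score` callable is a hypothetical stand-in for retraining the Logistic Regression base learner and scoring the pool.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_queries(pool_probs, batch_size=5):
    """Return the positions of the `batch_size` most uncertain pool items."""
    ranked = sorted(range(len(pool_probs)),
                    key=lambda i: entropy(pool_probs[i]),
                    reverse=True)
    return ranked[:batch_size]


def simulate_active_learning(train_and_score, labeled, pool, oracle_labels,
                             rounds=10, batch_size=5):
    """Offline simulation: labels are revealed from `oracle_labels`.

    `train_and_score(labeled, pool)` is a hypothetical callable that must
    return one class-probability list per pool item.
    """
    labeled = dict(labeled)  # item id -> label; copied so the input is untouched
    pool = list(pool)
    for _ in range(rounds):
        if not pool:
            break
        probs = train_and_score(labeled, pool)
        picked = select_queries(probs, batch_size)
        for pos in sorted(picked, reverse=True):  # pop from the end first
            item = pool.pop(pos)
            labeled[item] = oracle_labels[item]   # simulated annotation
    return labeled, pool
```

With ten rounds and five queries per round, at most 50 labels are revealed, which makes the annotation budget of the simulation explicit.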

3.11. Model Evaluation

Model performance was evaluated using accuracy, precision, recall, F1 score, ROC AUC, and Cohen’s Kappa to capture complementary aspects of classification quality under class imbalance. All models were trained and assessed using stratified 5-fold cross-validation, with results reported as mean ± standard deviation across splits. Statistical comparisons between variants were conducted using a two-sided z-test on differences of means, with Holm correction applied to control for multiple comparisons. Effect sizes were estimated using Cohen’s d to assess practical significance. Augmentation techniques were applied within training folds and evaluated across baseline and ensemble classifiers using imbalance-sensitive metrics such as macro-averaged F1 score, recall, and Cohen’s Kappa. This evaluation framework ensures fair comparison, reproducibility, and robust performance assessment under small and imbalanced fake news datasets.
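
The statistical comparison can be sketched from summary statistics alone. The functions below are a minimal illustration under stated assumptions, not the study's code: the z-test treats the two sets of fold scores as independent samples, Cohen's d uses a pooled standard deviation with equal group sizes, and the default of five observations per model reflects the 5-fold protocol.

```python
import math


def z_test_from_summary(mean_a, sd_a, mean_b, sd_b, n_a=5, n_b=5):
    """Two-sided z-test on a difference of means from summary statistics."""
    se = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    z = (mean_a - mean_b) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p


def cohens_d(mean_a, sd_a, mean_b, sd_b):
    """Cohen's d with a pooled standard deviation (equal group sizes assumed)."""
    pooled = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2.0)
    return (mean_a - mean_b) / pooled


def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)  # enforce monotonicity
        adjusted[idx] = running_max
    return adjusted
```

For example, `holm_adjust([0.01, 0.04, 0.03])` multiplies the smallest p-value by 3 and the next by 2, then carries the running maximum forward so that adjusted values never decrease along the sorted order.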

3.12. Experimental Workflow

Figure 4 summarises the experimental pipeline, comprising five key modules: baseline training, synthetic data augmentation, ensemble learning, autoencoder-based transfer learning, and active learning. Each stage incrementally improves classification performance by addressing core challenges such as class imbalance, data scarcity, and feature variability. This systematic, data-centric workflow ensures reproducibility and supports scalable fake news detection under low-resource conditions. The overall pipeline follows a structured sequence: input text is processed into linguistic features, followed by augmentation, model training, and evaluation.

The experimental methodology defines a structured pipeline designed to ensure robustness and reproducibility. Initially, baseline models are trained using standard preprocessing, followed by hyperparameter optimisation guided by stratified validation. To ensure statistical reliability, repeated stratified cross-validation is applied. Subsequently, the framework integrates data-centric strategies, including resampling, augmentation, and ensemble learning, to enhance robustness. In addition, an autoencoder-based transfer learning approach captures latent linguistic representations and supports minority-class generation. Finally, performance is evaluated using multiple metrics and validated through statistical testing. Importantly, reproducibility is ensured through fixed random seeds, consistent data splits, and standardised protocols, enabling a controlled analysis of how augmentation, ensemble methods, and transfer learning improve classification under realistic constraints.

3.13. Classification and Fake News Metrics

In classification tasks, particularly in domains such as medical diagnosis, the selection of appropriate evaluation metrics is essential to accurately assess model performance and practical utility. The key metrics used in this study are summarised in Table 5. To ensure robust and unbiased evaluation, stratified k-fold cross-validation was employed, preserving class distributions across folds and enabling reliable performance estimation under imbalanced conditions.
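
To make the imbalance-sensitive metrics concrete, the sketch below computes macro-averaged F1 and Cohen's Kappa from a confusion matrix using their standard definitions. The function name and the example counts in the usage note are illustrative and are not taken from the study.

```python
def macro_f1_and_kappa(conf):
    """Macro-F1 and Cohen's Kappa from a square confusion matrix.

    conf[i][j] = number of items with true class i predicted as class j.
    """
    k = len(conf)
    total = sum(sum(row) for row in conf)
    row_sums = [sum(conf[i]) for i in range(k)]                        # true counts
    col_sums = [sum(conf[i][j] for i in range(k)) for j in range(k)]   # predicted counts

    f1_scores = []
    for c in range(k):
        tp = conf[c][c]
        precision = tp / col_sums[c] if col_sums[c] else 0.0
        recall = tp / row_sums[c] if row_sums[c] else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
    macro_f1 = sum(f1_scores) / k

    observed = sum(conf[c][c] for c in range(k)) / total
    expected = sum(row_sums[c] * col_sums[c] for c in range(k)) / total ** 2
    kappa = (observed - expected) / (1 - expected) if expected < 1 else 0.0
    return macro_f1, kappa
```

A classifier that always predicts the majority class on a 90/10 split, `[[90, 0], [10, 0]]`, reaches 90% accuracy yet has a Kappa of zero and a macro-F1 below 0.5, which is why accuracy alone is not informative under imbalance.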

3.14. Reproducibility and Code Availability

All experiments were implemented in Python (version 3.10) using Scikit-learn (version 1.3) for ML models, TensorFlow (version 2.13) and PyTorch (version 2.0) for DL components, and NLTK (version 3.8) and spaCy (version 3.6) for NLP tasks. Additionally, preprocessing, augmentation, and visualisation were performed using Pandas (version 2.0), NumPy (version 1.24), and Matplotlib (version 3.7). Models were trained on a local workstation with an AMD Ryzen 7 5800U CPU, 16 GB RAM, and an NVIDIA GPU, while computationally intensive tasks were executed on cloud-based GPU resources. To ensure transparency and reproducibility, the complete pipeline is maintained in a version-controlled GitHub repository (the repository link will be provided upon publication), including preprocessing, feature extraction, oversampling, autoencoder components, ensemble configurations, active learning, and evaluation workflows. All experiments used a fixed random seed (seed = 42), and model evaluation used stratified 5-fold cross-validation to preserve class distribution across splits. Hyperparameter tuning was conducted through grid search and manual refinement based on validation performance. All preprocessing, augmentation, and training procedures were applied strictly within each fold to prevent data leakage and enable reliable replication.
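
The in-fold augmentation rule can be illustrated with a short, dependency-free sketch. It is an assumption-laden simplification rather than the released pipeline: `stratified_folds`, `oversample_minority`, and `in_fold_splits` are hypothetical helper names, and the oversampler interpolates between random same-class pairs instead of performing SMOTE's k-nearest-neighbour search. The point it demonstrates is that synthetic samples are generated from training folds only, so the held-out fold never influences augmentation.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=42):
    """Assign each index to one of k folds, keeping class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        for pos, idx in enumerate(members):
            folds[pos % k].append(idx)
    return folds


def oversample_minority(X, y, seed=42):
    """SMOTE-style interpolation (simplified: random same-class partner)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = list(X), list(y)
    for label, rows in by_class.items():
        for _ in range(target - len(rows)):
            a, b = rng.choice(rows), rng.choice(rows)
            t = rng.random()
            X_out.append([ai + t * (bi - ai) for ai, bi in zip(a, b)])
            y_out.append(label)
    return X_out, y_out


def in_fold_splits(X, y, k=5, seed=42):
    """Yield (X_train_aug, y_train_aug, X_test, y_test) per fold.

    Only the training portion is augmented; the test fold stays untouched.
    """
    folds = stratified_folds(y, k, seed)
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        X_tr, y_tr = oversample_minority([X[j] for j in train_idx],
                                         [y[j] for j in train_idx], seed)
        yield X_tr, y_tr, [X[j] for j in test_idx], [y[j] for j in test_idx]
```

Applying the oversampler before splitting would let interpolated copies of test items leak into training, which is the failure mode the in-fold protocol avoids.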

4. Results

This section reports the results of the proposed pipeline, assessing data augmentation, ensemble learning, transfer learning, and active learning using accuracy, precision, recall, F1-score, and AUC. Comparative analyses confirm improved robustness and generalisation in class-imbalanced, low-resource settings, demonstrating the framework’s effectiveness for fake news detection. Throughout, BRF denotes Balanced Random Forest, a class-balanced ensemble learning method, and SMOTE denotes the Synthetic Minority Over-sampling Technique, a data-level augmentation method.

4.1. Fake News Detection Pipeline Performance

Table 6 summarises the performance of all evaluated models across key metrics, including Accuracy (ACC), Precision (PRE), Recall (REC), F1-score (F1), Area Under the Curve (AUC), and Cohen’s Kappa (KAPPA). These results capture the comparative behaviour of classical, ensemble, resampling, autoencoder-based, and active learning configurations under class-imbalanced conditions. Mean and standard deviation values, computed across multiple folds, provide insight into both central performance trends and model stability. This comprehensive comparison establishes the empirical foundation for subsequent analysis of robustness, generalisation, and trade-offs among the evaluated fake news detection approaches. Each node in the propagation tree is treated as an independent text instance for feature extraction and classification.

The evaluated configurations can be interpreted as a staged ablation analysis, where baseline models establish reference performance, augmentation improves minority exposure, ensemble methods reduce variance, and autoencoder-based representations enhance feature structure. This progression highlights the contribution of each component under imbalanced conditions. Although SVM achieved the highest baseline accuracy, all models exhibited weak recall and F1 scores, particularly for minority classes, indicating strong majority bias and limited generalisation. Low Cohen’s Kappa values further confirm poor agreement beyond chance, emphasising the need for imbalance mitigation. Consequently, synthetic oversampling using SMOTE significantly improved minority sensitivity and overall class balance, with augmented models consistently outperforming their non-augmented counterparts.

Autoencoder-based transfer learning provided an additional mechanism to enhance minority-class detection. As reported in Table 6, the Encoder model achieved high precision with moderate recall, indicating that latent embeddings preserved discriminative information while reducing dimensionality. Decoder-generated synthetic samples further enriched minority exposure, improving classification balance. The workflow illustrated in Figure 2 demonstrates how encoder–decoder architectures support both representation learning and informed augmentation within a unified process. The autoencoder should be interpreted as a feature compression and representation learning component, rather than a standalone classifier, as it enhances latent structure but does not optimise recall independently. Active learning with uncertainty sampling, shown in Figure 3, iteratively selected informative instances, yielding incremental improvements in minority F1. Using logistic regression as an efficient base learner, performance gains stabilised after the eighth round, indicating convergence and diminishing returns. These findings suggest that combining representation learning with selective querying enhances cost efficiency in low-resource and imbalanced misinformation settings.

Although absolute AUC values appear modest, performance must be interpreted within the constraints of the dataset, including a four-class structure, severe minority scarcity (n = 53 for FALSE), and substantial linguistic overlap. Under these conditions, baseline classifiers approached near-random discrimination, particularly for minority instances. In contrast, ensemble and augmentation strategies achieved macro F1 improvements exceeding 30% and reduced fold variance by up to 40%, indicating enhanced stability and minority sensitivity. Importantly, this study does not aim to maximise absolute accuracy, but to evaluate robustness under constrained settings. Therefore, performance should be assessed through relative improvements, variance reduction, and minority recall, which better reflect practical utility in low-resource misinformation detection and demonstrate meaningful boundary refinement beyond conventional metrics.

To provide a reference point for modern NLP approaches, contextual semantic representations were extracted using a multilingual BERT model and used as input features for the same classifiers evaluated in the linguistic-feature pipeline (DT, SVM, and FCNN). This choice ensures a fair comparison at the representation level while avoiding confounding effects introduced by large-scale fine-tuning and computational resource disparities. Unlike handcrafted linguistic indicators, BERT embeddings encode contextual semantic information learned from large-scale corpora. The results presented in Table 7 therefore function as a semantic baseline, enabling a comparison between interpretable linguistic features and transformer-based representations within the same classification framework. It is important to emphasise that BERT is intentionally used as a frozen semantic baseline rather than a fully fine-tuned transformer model. The objective is not to achieve state-of-the-art performance, but to provide a controlled comparison between contextual semantic representations and interpretable linguistic features under identical experimental conditions. This design enables a fair evaluation of representation-level differences rather than optimisation-dependent performance.

While contextual embeddings capture semantic relations, linguistic indicators provide interpretable signals that facilitate analysis of misinformation patterns.

While fully fine-tuned transformer models may achieve higher absolute performance, their effectiveness depends on large annotated datasets and substantial computational resources. In contrast, the present study deliberately focuses on realistic low-resource conditions, where large-scale pretraining and GPU-intensive optimisation are not feasible. Fine-tuned transformer models were therefore not considered, and the comparison with frozen BERT embeddings is intended to isolate representation-level differences rather than to compete with fully optimised transformer pipelines. This design choice aligns with the study’s objective of evaluating interpretable and deployable solutions under constrained environments.

4.2. Comparative Analysis

Table 6 compares performance across classical, resampling, ensemble, and active learning configurations. Overall, class-balanced ensemble methods, particularly Balanced Random Forest and Bagging, achieved the most consistent trade-off across metrics, demonstrating superior stability under severe class imbalance. In contrast, baseline models such as DT, SVM, and FCNN exhibited poor generalisation, with low F1 scores and near-random AUC values. Although Active Learning attained the highest accuracy, its lower F1 and increased variance indicate a bias toward majority class predictions. These results confirm that single models are insufficient under extreme imbalance, while ensemble aggregation significantly improves robustness. They also indicate that learning is occurring, as evidenced by consistent improvements over baseline models and statistically supported gains, despite inherently low absolute performance under extreme data constraints.

From a methodological perspective, the configurations can be interpreted as a staged ablation analysis. Resampling techniques, including SMOTE, improved minority recall but produced variable outcomes depending on the model. Importantly, the strongest gains emerged from combining synthetic augmentation with class-balanced ensembles, highlighting the complementary interaction between data-level and model-level strategies. This synergy resulted in improved stability, reduced variance, and more balanced decision boundaries, demonstrating that performance improvements are driven by integration rather than isolated techniques.

Finally, autoencoder and active learning models revealed distinct behaviours. The Encoder achieved high precision but low recall and F1, indicating limited generalisation and a narrow decision boundary. Similarly, Active Learning improved accuracy through iterative sampling but showed modest AUC and sensitivity to sample selection. These findings suggest that while representation learning and selective querying offer efficiency benefits, they do not match the robustness of ensemble and resampling approaches. Overall, pronounced precision–recall trade-offs emphasise the difficulty of balancing sensitivity and specificity in imbalanced fake news detection, reinforcing the need for integrated and interpretable solutions.

4.3. Mutual Information

To better illustrate the relative importance of the extracted features, Figure 5 presents a horizontal bar chart showing the top twenty variables ranked by their mutual information values. For readability, the feature names are abbreviated.

Mutual information analysis reveals that discourse and argumentative markers are the most informative predictors of fake news. Specifically, features from Argumentative Discourse Markers (ADM), including exemplification–concession, generalizers, and justifying expressions, indicate that deceptive narratives rely on structured framing to enhance plausibility. In addition, Textual Discourse Markers (TDM), such as distributors and finalizatives, highlight the role of discourse organisation. Moreover, emotional features from NRC and LMc lexicons, particularly Joy, Anger, and Sadness, suggest that misinformation leverages affective framing, while syntactic markers, including Demonstratives and Negations, indicate referential ambiguity. Importantly, the lower contribution of GFOG confirms the strength of targeted linguistic indicators. To complement this, Figure 6 illustrates class distributions, revealing partial separation and overlap, which reflects the inherent complexity of distinguishing misinformation categories.
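
The ranking idea behind this analysis can be sketched with the plug-in estimate of mutual information for discrete variables, I(X;Y) = Σ p(x,y) log[p(x,y) / (p(x)p(y))]. The study does not state which estimator it used, and continuous linguistic counts would first need to be binned (or handled with a nearest-neighbour estimator such as scikit-learn's `mutual_info_classif`); the feature names in the usage note are hypothetical.

```python
import math
from collections import Counter


def mutual_information(feature, labels):
    """Plug-in estimate of I(X;Y) in nats for discrete feature and labels."""
    n = len(labels)
    joint = Counter(zip(feature, labels))
    px = Counter(feature)
    py = Counter(labels)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # p_xy / (p_x * p_y) simplifies to count * n / (px[x] * py[y])
        mi += p_xy * math.log(count * n / (px[x] * py[y]))
    return mi


def rank_features(columns, labels, top=20):
    """Return the `top` (name, score) pairs sorted by decreasing MI."""
    scores = {name: mutual_information(values, labels)
              for name, values in columns.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top]
```

For instance, `rank_features({"adm_justifying": [1, 1, 0, 0], "gfog_bin": [0, 1, 0, 1]}, ["F", "F", "T", "T"])` places the first feature on top, since it determines the label exactly while the second is independent of it.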

Figure 7 illustrates the comparative recall performance of all evaluated models, highlighting their ability to correctly identify fake news instances. The bars represent mean recall values across validation folds, with error bars indicating standard deviation, thereby reflecting model stability under class-imbalanced conditions. This visual comparison complements the quantitative results in Table 6, providing insight into how different learning strategies—such as resampling, ensemble integration, and transfer learning—affect sensitivity to minority-class examples in fake news detection.

4.4. Statistical Validation

This subsection reports formal statistical analyses to determine whether performance differences across models reflect robust improvements rather than sampling variability. Table 8 presents 95% confidence intervals, median values, interquartile ranges, and mean ± standard deviation for AUC under 5-fold cross-validation. These statistics quantify uncertainty and allow a more rigorous comparison of discriminative capacity. Ensemble-based and SMOTE-enhanced models, including BRF, BAGGING, and FCNN SMOTE, display narrower confidence intervals and higher median AUC values, indicating greater stability and consistent separation across folds. In contrast, baseline models such as DT and FCNN show wider intervals and larger dispersion, suggesting reduced robustness under imbalance. This analysis intentionally strengthens inferential reliability by moving beyond point estimates toward distribution-aware evaluation.
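
The per-model summaries of this kind can be computed from five fold scores as sketched below. The interval construction is an assumption, since the text does not specify it: the sketch uses a Student-t interval with four degrees of freedom, and quartiles follow the "inclusive" method of Python's `statistics.quantiles`. The function name and the fold scores in the usage note are illustrative only.

```python
import statistics

# Two-sided 95% Student-t critical value for 4 degrees of freedom (5 folds).
T_CRIT_95_DF4 = 2.776


def summarise_folds(scores):
    """Mean, SD, 95% CI, median, and IQR for exactly five fold scores."""
    if len(scores) != 5:
        raise ValueError("the hard-coded critical value assumes five folds")
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)  # sample standard deviation
    half_width = T_CRIT_95_DF4 * sd / len(scores) ** 0.5
    q1, median, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return {
        "mean": mean,
        "sd": sd,
        "ci95": (mean - half_width, mean + half_width),
        "median": median,
        "iqr": q3 - q1,
    }
```

Calling `summarise_folds([0.50, 0.52, 0.54, 0.56, 0.58])` yields a mean of 0.54 with an interval of roughly ±0.039, showing how wide a 95% interval remains with only five folds.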

Table 9 presents pairwise comparisons of F1-score between the SVM baseline and enhanced models using summary statistics over cross-validation folds. A two-sided z-test on the difference of means was applied, with Holm-adjusted p-values to control for multiple comparisons. Results show that class-balanced ensemble models, particularly BRF and Bagging-based configurations, achieved the most consistent and statistically significant improvements with large effect sizes, confirming their robustness under severe class imbalance. SMOTE-based variants also demonstrated significant gains, although with comparatively smaller effects. Conversely, DT and Encoder underperformed relative to the baseline, reflecting the limitations of single or unregularised models, while FCNN and ACTIVE showed no meaningful differences. From a statistical perspective, inference combines cross-validation estimates with controlled comparisons, and both p-values and effect sizes should be interpreted cautiously, given the limited number of folds and the approximation of independent samples. Within these limits, effect size analysis supports practical relevance, indicating that the observed improvements reflect systematic methodological advantages rather than random variation. Overall, these findings confirm that combining ensemble aggregation with data-centric augmentation yields consistent and practically meaningful improvements in F1-score for imbalanced misinformation detection.

4.5. Summary of Key Findings

The results demonstrate that the proposed pipeline effectively addresses the challenges of small, imbalanced datasets by improving stability, minority-class recall, and overall robustness. Baseline models exhibited poor generalisation and strong bias toward majority classes, confirming the limitations of traditional classifiers under data scarcity. In contrast, data-centric strategies, particularly synthetic oversampling and augmentation, significantly enhanced minority representation, enabling models to capture more informative linguistic patterns. Moreover, the integration of autoencoder-based representations further enriched the feature space structure, supporting more discriminative learning under constrained conditions.

Importantly, the strongest performance gains were achieved through the combination of class-balanced ensemble methods and resampling techniques, which consistently improved stability and reduced variance across folds. While active learning contributed to efficient sample selection and reduced annotation effort, its performance remained sensitive to data distribution. Overall, the findings highlight that hybrid approaches, integrating linguistic features, augmentation, and ensemble learning, provide the most reliable and interpretable solutions, particularly when performance is evaluated through robustness-oriented metrics rather than absolute accuracy in low-resource misinformation detection settings. Augmentation modifies feature space geometry while preserving core linguistic patterns, thereby supporting robust decision boundaries. Furthermore, the experimental setup can be interpreted as a staged ablation analysis, enabling systematic evaluation of component-wise contributions.

Additional distributional checks confirmed that key linguistic feature statistics were preserved after augmentation, indicating that synthetic samples did not distort the underlying feature space.

5. Discussion

5.1. Principal Findings and System-Level Insights

The integration of SMOTE with baseline models, together with ensemble methods such as Balanced Random Forest and AdaBoost, and autoencoder-based transfer learning, produced the strongest performance across evaluation metrics. Specifically, this combination improved recall and F1 for underrepresented classes, enabling the detection of subtle linguistic patterns such as cohesion irregularities and stance variation. Importantly, these findings support the hypothesis that data-centric augmentation combined with robust learning architectures mitigates imbalance in low-resource settings, while active learning reduces annotation effort with early performance gains. Moreover, a key insight is that SMOTE not only rebalances class distributions but also reshapes discourse-level linguistic features, influencing ensemble decision boundaries; data-centric strategies can thus reshape feature space geometry in interpretable ways. Consequently, this system-level perspective demonstrates that robustness emerges from component interaction, emphasising interpretability, stability, and practical applicability over absolute predictive performance. These findings establish a coherent foundation for the conclusions that follow, highlighting how the proposed framework translates methodological integration into robust, interpretable, and practically deployable misinformation detection under constrained conditions.

5.2. Interpretation and NLP Context

The proposed architecture combines structured linguistic features with autoencoder-derived latent embeddings to capture discourse patterns associated with deception, including reduced syntactic complexity, intensified evaluative tone, and cohesion irregularities. These representations provide compact yet information-rich abstractions that improve performance in resource-constrained Spanish misinformation settings, where pragmatic cues such as hedging and exaggerated assertiveness remain informative. Ensemble integration further enhances stability by reducing variance across folds, while the limited annotation demands of autoencoder training support scalable deployment. Interpretability here denotes feature-level traceability between linguistic markers and model decisions rather than post hoc explanations. Mutual information analysis (Figure 5) identifies the discourse and sentiment indicators that shape classification boundaries, preserving structured attribution, unlike opaque contextual embeddings. This design strengthens auditability and facilitates human-guided monitoring in institutional environments. To illustrate the interpretability of the proposed approach, we present two representative cases. In a correct prediction, a FALSE instance containing strong argumentative markers, such as justificatory expressions and generalizers, was correctly classified, reflecting alignment between discourse-level features and the model decision. In contrast, a failure case involved a TRUE instance with emotionally charged language, where overlapping sentiment cues led to misclassification. These examples highlight how linguistic indicators influence decisions while revealing limitations under feature overlap.

5.3. Comparison with Existing Approaches

This study extends recent advances in ML and NLP for fake news detection while addressing key limitations in prior work. Many existing systems depend on large annotated datasets or social network metadata to achieve high performance [23,24], resources that remain unavailable in low-resource contexts. In contrast, this research intentionally focuses on small, linguistically grounded corpora and integrates structured lexical, syntactic, and semantic features to strengthen interpretability and adaptability. Unlike transformer-driven models such as BERT and GPT, which often obscure discourse cues and struggle under imbalance [37], the proposed framework combines linguistic feature engineering with data-centric strategies. By unifying SMOTE, ensemble stabilisation, autoencoder-based transfer learning, and active learning [25,32,36,48,53], the pipeline improves minority recall and stability while reducing dependence on extensive annotation.

5.4. Interpretability and Transparency

An important implication of the proposed framework is its emphasis on interpretability and transparency in misinformation detection. While transformer-based architectures achieve strong predictive performance, their high-dimensional representations remain difficult to interpret, limiting their applicability in analytical contexts. In contrast, the linguistic indicators used in this study provide human-understandable signals grounded in discourse-level features such as evaluative language, modality, and stance. Consequently, predictions can be associated with observable textual patterns rather than opaque representations, enhancing transparency and supporting partial explainability. Moreover, these features enable deeper analysis of the rhetorical mechanisms underlying misleading narratives. Importantly, the framework does not provide causal or model-internal explanations; it offers traceable feature-level transparency by linking outputs to linguistic attributes, rather than instance-level causal explanations. Therefore, it contributes to the development of trustworthy AI systems for research, journalism, and institutional decision-making. To illustrate this, a correctly classified FALSE instance containing strong argumentative markers was identified, whereas a misclassified TRUE instance exhibited overlapping emotional cues. These cases demonstrate how linguistic indicators contribute to model decisions, reinforcing interpretability at the feature level while revealing limitations under feature overlap.

5.5. Empirical Validation and Discussion of Research Questions

The empirical findings provide convergent answers to the research questions and validate the proposed approach. Regarding RQ1, models enhanced with synthetic augmentation, ensemble stabilisation, and autoencoder-based transfer learning achieved improved macro F1, minority recall, and reduced variance, demonstrating reliable classification under severe data imbalance. With respect to RQ2, structured linguistic features enhanced interpretability by preserving traceable links between discourse markers and predictions, offering a transparent alternative to opaque DL representations. Furthermore, the study clarifies interpretability as feature-level transparency and explainability as post hoc reasoning, highlighting the role of linguistic representations in strengthening the former. Finally, addressing RQ3, statistical validation and reproducible evaluation confirm that the unified pipeline provides a generalisable, efficient, and scalable framework for misinformation detection in constrained environments. Collectively, these results indicate that robustness emerges from the interaction between linguistic representations and imbalance mitigation strategies.

5.6. Strengths and Practical Deployment Considerations

The proposed framework presents a modular ML pipeline for fake news detection in small and imbalanced datasets, grounded in structured linguistic features rather than network-based signals. This focus enhances interpretability by linking predictions to discourse-level patterns. Importantly, the study introduces autoencoder-based transfer learning to extract low-dimensional latent representations, improving feature efficiency. Moreover, the evaluation incorporates cross-validation, statistical testing, and visual analysis to ensure reproducibility and insight into model behaviour. From a practical perspective, the framework is designed for deployment under constrained environments, relying on lightweight models that enable CPU-based execution without specialised hardware. In addition, synthetic oversampling and active learning reduce annotation effort while preserving adaptability. Consequently, the approach provides a transparent, scalable, and cost-efficient solution suitable for real-world misinformation monitoring across institutional contexts. The proposed framework complements transformer-based approaches by prioritising interpretability and efficiency.

5.7. Limitations and Future Work

Despite promising results, several limitations should be acknowledged. First, the evaluation relies on a single dataset, which limits generalisability across domains, languages, and cultural contexts, particularly given variation in discourse structure and modality; results should therefore not be interpreted as generalisable across domains without further validation. Second, the use of structured linguistic features may introduce noise in informal or multilingual inputs and restrict adaptability to highly dynamic or non-standard text. Third, the framework operates exclusively on textual data, excluding metadata, user behaviour, and multimodal signals that could enhance detection in real-world environments. Temporal dynamics are likewise not modelled, as the focus is on static text classification under constrained conditions. Fourth, the study does not aim to compete with large-scale transformer-based models, instead focusing on low-resource settings where computational constraints preclude extensive pretraining. Computational complexity remains manageable due to the use of lightweight models, and experiments were primarily designed for CPU feasibility, with optional GPU acceleration used for efficiency. Consequently, future work will prioritise cross-dataset validation, multilingual adaptation, and integration of multimodal and graph-based approaches to improve robustness and applicability.

6. Conclusions

This study presents a modular and interpretable machine learning pipeline for fake news detection, tailored to small and imbalanced datasets through linguistically grounded features. By combining data augmentation methods such as SMOTE, ensemble strategies including Bagging and Balanced Random Forest, and autoencoder-based transfer learning, the framework markedly enhances minority-class detection, particularly for the FALSE and NO VERIFIED categories. These improvements align with increased sensitivity to low-frequency linguistic cues, such as modal verbs, cohesive devices and stance markers, which characterise deceptive discourse but often remain hidden in highly imbalanced corpora.

The principal practical contribution of this study lies in improving minority-class stability under severe imbalance rather than maximising absolute performance. In misinformation monitoring, misclassifying deceptive content as legitimate may incur higher societal costs than false positives. The proposed ensemble and augmentation strategy increases recall for underrepresented deceptive categories while preserving interpretable alignment with linguistic features. This enhanced stability under constrained data conditions supports early warning systems in institutional environments where annotation resources remain limited and rapid adaptation is necessary.

Experiments show that hybrid, data-centric approaches, in particular the ensemble and SMOTE-based configurations, outperform the SVM baseline in F1 score (Table 9), and that the strongest of them (BRF, Bagging, and the FCNN with SMOTE) also improve recall and AUC (Table 6). The autoencoder enhances feature representations in low-resource settings and supports latent space augmentation, while active learning achieves performance comparable to the baseline with limited annotation effort. From a corpus linguistic perspective, active learning also prioritises linguistically informative instances, which enables more efficient modelling of discourse-level variation in misinformation.

In contrast to prior work that depends on large-scale datasets or social network metadata, this study emphasises linguistic content and offers a resource-efficient solution that is intended to adapt across domains, although cross-domain performance remains to be validated on additional datasets. By grounding classification decisions in structured linguistic evidence, the framework improves interpretability and enables transparent decision-making, thereby addressing persistent concerns about model opacity in misinformation detection. The results further support the effectiveness of combining interpretable features with structured augmentation and ensemble learning for robust misinformation detection. Overall, this pipeline provides a reproducible and effective approach to automated fact-checking in constrained environments, and establishes a basis for future extensions that incorporate multimodal data, real-time deployment and enhanced explainability in high-stakes domains such as journalism and education. Future research should further investigate how discourse analytic insights, pragmatic markers and cross-linguistic variation can refine the linguistic foundations of automated fact-checking systems.

The system is designed to operate as a decision support mechanism within misinformation monitoring workflows rather than as a replacement for editorial or journalistic verification. Specifically, its function is to assist human analysts by prioritising potentially deceptive content and highlighting linguistically salient features that warrant further inspection. Accordingly, final verification authority remains with trained professionals, ensuring accountability and contextual judgment. Moreover, maintaining human oversight mitigates risks associated with automated misclassification, particularly in politically sensitive or rapidly evolving information environments. Therefore, the framework aligns with responsible AI principles by augmenting, not substituting, expert evaluation and institutional review processes.

Author Contributions

P.O.-A.: conceptualization, writing (original draft, review and editing); E.P.: writing (review and editing); S.K.: writing (review and editing); C.C.-M.: writing (review and editing); M.R.: writing (review and editing), project administration, funding acquisition. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by Project CEDAI 07-2024, funded by the Ministry of Education (MINEDUC), Chile.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All datasets and preprocessing scripts supporting the conclusions of this article will be made publicly available in an open-access repository upon acceptance, ensuring reproducibility of the linguistic feature extraction and classification procedures.

Acknowledgments

Large language models were used to assist in language polishing and formatting. The authors are fully responsible for the content and interpretation of all results.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

ML	Machine Learning
DL	Deep Learning
NLP	Natural Language Processing
SVM	Support Vector Machine
B.RF	Balanced Random Forest
SMOTE	Synthetic Minority Over-sampling Technique
AE	Autoencoder

Figure 1. Histogram showing the distribution of news instances in the misinformation classification dataset. The imbalance across categories—especially the predominance of NO_RUMOR samples—underscores the need for augmentation and sampling strategies to support robust classifier training.

Figure 2. Autoencoder-based framework. The encoder extracts compact linguistic representations for fake news classification, while the decoder ensures semantic coherence. Latent features are used for classifier pretraining and synthetic data generation to enhance performance under data scarcity.

Figure 3. Active learning framework for fake news detection. The system iteratively trains a text classification model to detect misinformation. The model identifies uncertain predictions from an unlabeled corpus of news articles, which are then reviewed by human fact-checkers. Newly verified samples are added to the labeled dataset, improving detection accuracy while minimizing manual labeling effort.
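
The selection step in this loop can be sketched as follows. This is a minimal illustration that assumes a least-confidence query criterion; the source does not state which uncertainty measure was used, and the function name `least_confident` and the probability values are hypothetical.

```python
def least_confident(probabilities, batch_size):
    """Return indices of the `batch_size` samples whose top class probability is lowest."""
    # Confidence of a prediction = probability assigned to its most likely class.
    confidence = [max(p) for p in probabilities]
    # Rank samples from least to most confident (sorted() is stable on ties).
    ranked = sorted(range(len(probabilities)), key=lambda i: confidence[i])
    return ranked[:batch_size]


# Hypothetical class probabilities for five unlabeled articles (four classes).
pool_probs = [
    [0.90, 0.05, 0.03, 0.02],
    [0.30, 0.28, 0.22, 0.20],
    [0.55, 0.25, 0.10, 0.10],
    [0.26, 0.25, 0.25, 0.24],
    [0.70, 0.10, 0.10, 0.10],
]

# Indices of the two articles that would be sent to human fact-checkers.
to_review = least_confident(pool_probs, batch_size=2)
print(to_review)
```

In a full loop, the reviewed articles would be moved from the unlabeled pool to the labeled set and the classifier retrained before the next query round.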

Figure 4. Overview of the experimental pipeline for fake news detection using linguistic features. The workflow includes baseline training, synthetic oversampling, ensemble integration, autoencoders, and active learning refinement. The figure emphasises the interaction between data-centric and model-centric components within a unified and reproducible workflow.

Figure 5. The top twenty features ranked by importance values. The bars represent the relative magnitude of each feature, and the labels have been abbreviated for improved readability.

Figure 6. Scatter plot of fake news categorization, showing the distribution of four classes: NO RUMOR (purple), TRUE (blue), FALSE (red), and NO VERIFIED (yellow). Each point represents an instance in the dataset projected into two dimensions, illustrating the separation and overlap among categories.

Figure 7. Comparison of recall performance across all models, showing mean values and standard deviations computed over cross-validation folds.

Table 1. Distribution of rumor categories in the dataset.

Category	Description	Samples
True rumor	Verified information that spread as a rumor	87
False rumor	False information propagated as a rumor	53
Unverified	Information with no official verification status	49
Non-rumor	News or content not classified as a rumor	111

Table 2. Description of classification models used in the study.

Model	Description
Decision Tree (DT)	A rule-based model valued for its interpretability and ability to model non-linear decision boundaries. Gini impurity was used for splitting, and tree depth was optimized via cross-validation to avoid overfitting. Suitable for analyzing individual linguistic feature contributions.
Support Vector Machine (SVM)	Effective for high-dimensional, sparse text data. An RBF kernel captured non-linear relations, with parameters C and $γ$ selected through grid search on a stratified validation set. Suitable for detecting subtle linguistic patterns.
Fully Connected Neural Network (FCNN)	A shallow neural architecture with two hidden layers (ReLU activations), dropout, and softmax output. Trained on the original imbalanced dataset to evaluate deep learning performance without augmentation or transfer learning.

Table 3. Description of the oversampling techniques used in this study.

Technique	Description
Synthetic Minority Oversampling Technique (SMOTE)	Creates synthetic samples by interpolating between minority-class instances and their nearest neighbors. Helps preserve linguistic coherence and prevents overfitting while increasing minority-class diversity in training.
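
The interpolation step described in Table 3 can be sketched in plain Python. This is an illustrative reduction of SMOTE to its core idea, not the implementation used in the study; the function name `smote_sample`, the neighbour count `k`, and the toy feature vectors are assumptions.

```python
import math
import random


def smote_sample(minority, k=2, rng=None):
    """Create one synthetic point between a minority sample and one of its k nearest minority neighbours."""
    rng = rng or random.Random()
    base = rng.choice(minority)
    # Rank the remaining minority points by Euclidean distance to the base point.
    others = [p for p in minority if p is not base]
    others.sort(key=lambda p: math.dist(base, p))
    neighbour = rng.choice(others[:k])
    gap = rng.random()  # interpolation factor in [0, 1)
    # The synthetic point lies on the segment joining base and neighbour.
    return [b + gap * (n - b) for b, n in zip(base, neighbour)]


# Hypothetical two-dimensional feature vectors for one minority class.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
rng = random.Random(0)
synthetic = [smote_sample(minority, k=2, rng=rng) for _ in range(10)]
```

Because every synthetic point lies between two existing minority samples, the generated vectors stay inside the region already occupied by that class in feature space.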

Table 4. Description of ensemble learning techniques.

Technique	Description
Adaptive boosting (AdaBoost)	Combines weak learners sequentially by reweighting misclassified instances to focus on hard examples. Enhances detection of minority-class fake news by increasing ensemble attention to difficult samples.
Bootstrap aggregation (bagging)	Trains multiple models on different bootstrap samples and aggregates predictions via majority voting. Reduces variance and improves robustness against linguistic noise and writing style variability.
Balanced bagging	Ensures equal class representation in each bootstrap sample. Promotes fair learning across classes and reduces bias toward majority-class news in imbalanced datasets.
Balanced Random Forest (BRF)	Extends Random Forests using class-balanced bootstrapping for each tree. Increases model sensitivity to fake news cues in high-dimensional feature spaces like linguistic representations.
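
The class-balanced bootstrapping shared by balanced bagging and BRF can be sketched as below. This is one possible scheme, assumed here for illustration: each class is resampled with replacement down to the size of the smallest class. The function name `balanced_bootstrap` is hypothetical, and the study's exact sampling settings are not given in this section.

```python
import random
from collections import Counter, defaultdict


def balanced_bootstrap(labels, rng=None):
    """Return indices of a bootstrap sample with the same number of draws from every class."""
    rng = rng or random.Random()
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    # Draw, with replacement, as many samples per class as the smallest class contains.
    n_per_class = min(len(indices) for indices in by_class.values())
    sample = []
    for indices in by_class.values():
        sample.extend(rng.choices(indices, k=n_per_class))
    return sample


# Class sizes taken from Table 1.
labels = (["True rumor"] * 87 + ["False rumor"] * 53
          + ["Unverified"] * 49 + ["Non-rumor"] * 111)
sample = balanced_bootstrap(labels, rng=random.Random(0))
counts = Counter(labels[i] for i in sample)
print(counts)
```

Each base learner in the ensemble would be trained on a different sample of this kind, so no single tree sees the majority class more often than the minority classes.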

Table 5. Description of evaluation metrics used in the classification tasks.

Metric	Description
Accuracy	Proportion of correctly classified instances among all instances. Can be misleading in imbalanced datasets.
Precision	Proportion of true positives among all positive predictions. Indicates how well the model avoids false positives.
Recall (sensitivity)	Proportion of true positives among all actual positives. Critical for detecting deceptive content, where false negatives are costly.
F1 score	Harmonic mean of precision and recall. Balances false positives and false negatives. Useful under class imbalance.
Specificity	Proportion of true negatives among all actual negatives. Reflects the model’s ability to avoid false positives.
ROC-AUC	Area under the ROC curve. Measures trade-off between sensitivity and specificity across thresholds.
Cohen’s Kappa	Measures agreement between predicted and actual classifications, accounting for chance. Values range from −1 to 1.
Confusion matrix	Tabular summary of true/false positives and negatives. Offers granular insight into model errors.
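
Several of the metrics in Table 5 can be derived directly from a confusion matrix. The sketch below assumes macro averaging over classes (the abstract refers to macro-averaged F1) and uses a hypothetical two-class matrix in which the classifier always predicts the majority class; the function name `macro_metrics` is illustrative.

```python
def macro_metrics(cm):
    """Accuracy, macro precision/recall/F1 and Cohen's kappa from a square
    confusion matrix (rows = actual class, columns = predicted class)."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    actual = [sum(row) for row in cm]
    predicted = [sum(cm[r][c] for r in range(n)) for c in range(n)]
    precision, recall, f1 = [], [], []
    for k in range(n):
        tp = cm[k][k]
        p = tp / predicted[k] if predicted[k] else 0.0
        r = tp / actual[k] if actual[k] else 0.0
        precision.append(p)
        recall.append(r)
        f1.append(2 * p * r / (p + r) if p + r else 0.0)
    accuracy = sum(cm[k][k] for k in range(n)) / total
    # Agreement expected by chance, from the row and column marginals.
    chance = sum(actual[k] * predicted[k] for k in range(n)) / total ** 2
    kappa = (accuracy - chance) / (1 - chance) if chance < 1 else 0.0
    return {
        "accuracy": accuracy,
        "precision": sum(precision) / n,
        "recall": sum(recall) / n,
        "f1": sum(f1) / n,
        "kappa": kappa,
    }


# Hypothetical imbalanced case: 90 majority and 10 minority samples,
# all predicted as the majority class.
scores = macro_metrics([[90, 0], [10, 0]])
print(scores)
```

In this example accuracy is 0.9 while macro recall is 0.5 and kappa is 0, which illustrates why accuracy alone can be misleading on imbalanced data.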

Table 6. Classification results (mean ± std) for classical, ensemble, resampling-based, autoencoder-based, and active learning models, evaluated using accuracy, precision, recall, F1 score, AUC, and Kappa.

Model	ACC	PRE	REC	F1	AUC	KAPPA
Classical Machine Learning Models
DT	$0.267 \pm 0.067$	$0.200 \pm 0.070$	$0.214 \pm 0.048$	$0.160 \pm 0.029$	$0.477 \pm 0.035$	$-0.041 \pm 0.085$
SVM	$0.342 \pm 0.031$	$0.246 \pm 0.033$	$0.272 \pm 0.012$	$0.206 \pm 0.006$	$0.515 \pm 0.008$	$0.036 \pm 0.018$
FCNN	$0.300 \pm 0.032$	$0.227 \pm 0.043$	$0.249 \pm 0.032$	$0.212 \pm 0.030$	$0.501 \pm 0.022$	$0.008 \pm 0.049$
Ensemble Methods
BAGGING	$0.331 \pm 0.028$	$0.291 \pm 0.045$	$0.283 \pm 0.025$	$0.271 \pm 0.031$	$0.526 \pm 0.016$	$0.071 \pm 0.029$
ADABOOST	$0.292 \pm 0.024$	$0.274 \pm 0.044$	$0.252 \pm 0.032$	$0.249 \pm 0.032$	$0.503 \pm 0.020$	$0.016 \pm 0.037$
BAL BAGG	$0.269 \pm 0.037$	$0.270 \pm 0.027$	$0.264 \pm 0.030$	$0.256 \pm 0.036$	$0.511 \pm 0.019$	$0.029 \pm 0.035$
BRF	$0.295 \pm 0.019$	$0.316 \pm 0.044$	$0.297 \pm 0.028$	$0.288 \pm 0.019$	$0.533 \pm 0.019$	$0.067 \pm 0.039$
Resampling-based Methods
SMOTE RF	$0.319 \pm 0.053$	$0.294 \pm 0.033$	$0.283 \pm 0.040$	$0.262 \pm 0.032$	$0.523 \pm 0.029$	$0.052 \pm 0.067$
SMOTE SVM	$0.267 \pm 0.052$	$0.303 \pm 0.091$	$0.264 \pm 0.048$	$0.256 \pm 0.051$	$0.512 \pm 0.033$	$0.032 \pm 0.068$
FCNN SMOTE	$0.322 \pm 0.101$	$0.302 \pm 0.080$	$0.294 \pm 0.069$	$0.278 \pm 0.070$	$0.532 \pm 0.048$	$0.079 \pm 0.101$
Advanced Learning Strategies
ENCODER	$0.228 \pm 0.014$	$0.381 \pm 0.173$	$0.274 \pm 0.036$	$0.174 \pm 0.027$	$0.516 \pm 0.024$	$0.024 \pm 0.041$
ACTIVE	$0.350 \pm 0.095$	$0.240 \pm 0.098$	$0.273 \pm 0.049$	$0.212 \pm 0.062$	$0.516 \pm 0.033$	$0.040 \pm 0.075$

Table 7. Classification performance (mean ± standard deviation) of classical classifiers using contextual semantic representations derived from multilingual BERT embeddings. Models were evaluated using accuracy (ACC), precision (PRE), recall (REC), F1-score (F1), area under the ROC curve (AUC), and Cohen’s Kappa. FCNN + BERT obtains the best mean value on every metric.

Model	ACC	PRE	REC	F1	AUC	KAPPA
Transformer-Based Feature Representations (BERT)
DT + BERT	$0.267 \pm 0.031$	$0.245 \pm 0.034$	$0.247 \pm 0.035$	$0.241 \pm 0.035$	$0.496 \pm 0.023$	$-0.015 \pm 0.049$
SVM + BERT	$0.359 \pm 0.051$	$0.345 \pm 0.052$	$0.335 \pm 0.042$	$0.331 \pm 0.044$	$0.556 \pm 0.028$	$0.108 \pm 0.058$
FCNN + BERT	$0.385 \pm 0.057$	$0.368 \pm 0.062$	$0.357 \pm 0.062$	$0.355 \pm 0.060$	$0.571 \pm 0.040$	$0.138 \pm 0.074$

Table 8. 95% Confidence Intervals (CI) with Median [IQR] and Mean ± SD for AUC across models. CI approximated from 5-fold CV.

Model	95% CI	Median [IQR]	Mean ± SD
DT	[0.458, 0.496]	0.477 [0.465–0.489]	$0.477 \pm 0.035$
SVM	[0.511, 0.519]	0.515 [0.512–0.518]	$0.515 \pm 0.008$
FCNN	[0.489, 0.513]	0.501 [0.493–0.509]	$0.501 \pm 0.022$
BAGG	[0.517, 0.535]	0.526 [0.519–0.533]	$0.526 \pm 0.016$
ADA	[0.492, 0.514]	0.503 [0.494–0.512]	$0.503 \pm 0.020$
B.BAGG	[0.501, 0.521]	0.511 [0.504–0.518]	$0.511 \pm 0.019$
B.RF	[0.523, 0.543]	0.533 [0.526–0.540]	$0.533 \pm 0.019$
RF.SMOTE	[0.507, 0.539]	0.523 [0.510–0.536]	$0.523 \pm 0.029$
SVM.SMOTE	[0.494, 0.530]	0.512 [0.498–0.526]	$0.512 \pm 0.033$
FCNN.SMOTE	[0.506, 0.558]	0.532 [0.511–0.553]	$0.532 \pm 0.048$
ENCODER	[0.503, 0.529]	0.516 [0.505–0.527]	$0.516 \pm 0.024$
ACTIVE	[0.498, 0.534]	0.516 [0.502–0.530]	$0.516 \pm 0.033$

Table 9. Pairwise comparisons vs. SVM (baseline) on F1-score using summary statistics (k = 5). Two-sided z-test on difference of means; Holm-adjusted p-values within the metric. Cohen’s d uses pooled SD (independent-groups approximation).

Model	ΔF1 (Mean)	z	p	Cohen’s d
B.RF	+0.082	9.202	$<1 \times 10^{-6}$	5.820
BAGG	+0.065	4.603	$4.16 \times 10^{-6}$	2.911
RF.SMOTE	+0.056	3.846	$1.20 \times 10^{-4}$	2.383
DT	−0.046	−3.473	$5.14 \times 10^{-4}$	−2.197
B.BAGG	+0.050	3.063	$2.19 \times 10^{-3}$	1.937
ADA	+0.043	2.953	$3.14 \times 10^{-3}$	1.868
ENCODER	−0.032	−2.587	$9.68 \times 10^{-3}$	−1.382
FCNN.SMOTE	+0.072	2.292	$2.19 \times 10^{-2}$	1.386
SVM.SMOTE	+0.050	2.177	$2.95 \times 10^{-2}$	1.086
FCNN	+0.006	0.439	0.661	0.277
ACTIVE	+0.006	0.215	0.829	0.191
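
The z statistic and Cohen's d in Table 9 can be recomputed from the F1 means and standard deviations in Table 6. The sketch below follows the caption's description (difference of means with k = 5 folds per model, pooled SD under an independent-groups approximation). The function name `compare_to_baseline` is illustrative, and the p-value it returns is the unadjusted two-sided value, so the Holm correction is not included.

```python
import math


def compare_to_baseline(mean_a, sd_a, mean_b, sd_b, k=5):
    """Difference of means, z statistic, unadjusted two-sided p-value and
    Cohen's d, computed from summary statistics over k folds per model."""
    delta = mean_a - mean_b
    # Standard error of the difference between two independent means.
    standard_error = math.sqrt(sd_a ** 2 / k + sd_b ** 2 / k)
    z = delta / standard_error
    # Two-sided tail probability of the standard normal distribution.
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    # Pooled SD for two groups of equal size.
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return delta, z, p_two_sided, delta / pooled_sd


# F1 values from Table 6: BRF 0.288 ± 0.019 against the SVM baseline 0.206 ± 0.006.
delta, z, p, d = compare_to_baseline(0.288, 0.019, 0.206, 0.006)
print(round(delta, 3), round(z, 3), round(d, 3))
```

With these inputs the function returns z and d values that agree with the B.RF row to three decimals; the same call with the Bagging F1 values (0.271 ± 0.031) agrees with the BAGG row.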
