1. Introduction
The rise of digital platforms has revolutionized information dissemination, but it has also accelerated the spread of misinformation and disinformation, which, although often used interchangeably, are distinct concepts: misinformation refers to false information shared without intent to harm, while disinformation involves deliberate deception [1]. Both forms pose serious risks to public health, democratic discourse, and social trust.
In response, numerous computational models have been proposed to identify misleading content. While many achieve high accuracy on benchmark datasets, two enduring limitations hinder real-world deployment: (1) lack of explainability and (2) poor cross-domain generalization. Importantly, as highlighted in previous work, most fake news detection systems, including ours, do not verify the factual truth of claims directly, which would require comprehensive and continuously updated world knowledge [2]. Instead, these systems detect linguistic, contextual, and stylistic indicators that are statistically associated with previously verified instances of misinformation [3,4]. This distinction is critical to setting realistic expectations about system capabilities and deployment contexts.
The first challenge, explainability, is particularly relevant for high-stakes domains such as journalism, fact-checking, and content moderation. State-of-the-art natural language processing (NLP) models, especially those based on large language models (LLMs), often operate as ‘black boxes’, producing predictions without transparent justifications. This opacity reduces user trust and limits adoption. While Explainable AI (XAI) methods such as Local Interpretable Model-agnostic Explanations (LIME) [5] and Shapley Additive Explanations (SHAP) [6] offer post-hoc interpretability, they often focus on token-level importance without incorporating theoretically grounded communication or credibility frameworks. Recent studies show that LIME explanations can be unstable (small input changes can yield significantly different outputs) and that both LIME and SHAP are highly influenced by feature collinearity and model dependence [7,8].
The second challenge is generalization. Models trained on one domain (e.g., political news) frequently underperform on others (e.g., health misinformation) [9], particularly when switching between formal news articles and informal user-generated content. Domain shifts can reduce accuracy by 20–50% in practical settings [10], reflecting the need for domain-invariant features and cross-modal robustness.
To address these challenges, we propose X-FRAME, a hybrid framework that fuses deep semantic encoding of the multilingual transformer XLM-RoBERTa [11] with structured, theory-driven features from psycholinguistics, communication studies, and social context analysis. By integrating two information streams (semantic embeddings and engineered contextual features), X-FRAME detects indicators of misinformation across diverse modalities while providing human-understandable explanations.
Our main contributions are as follows.
We introduce a hybrid architecture combining XLM-RoBERTa embeddings with framing-theoretic features, which outperforms text-only and features-only baselines.
We evaluate X-FRAME on an aggregated corpus drawn from eight public datasets, demonstrating the potential for cross-domain adaptability with strong performance on both formal news and social media.
We provide interpretable insights using LIME and permutation importance, grounding predictions in cues such as source credibility, emotional tone, and sensational framing.
By explicitly addressing both interpretability and generalization, X-FRAME advances the practical deployment of misinformation detection systems while clarifying that its outputs reflect probabilistic indicators of deceptive patterns rather than definitive truth verification.
The remainder of this paper is structured as follows.
Section 2 surveys prior literature.
Section 3 presents our dataset, feature engineering, and model architecture.
Section 4 and Section 5 detail our experimental results and analysis.
Section 6 concludes and outlines future directions.
3. Methodology
This section presents the detailed methodological framework adopted in this study. We begin by describing the construction of our comprehensive multi-source corpus, followed by the design of a multi-layered feature engineering pipeline grounded in communication theory. We then elaborate on the architecture of our proposed X-FRAME model and conclude with the experimental design used for training, evaluation, and interpretability assessments.
3.1. Corpus Construction and Composition
To ensure robust generalization across domains, we aggregated eight publicly available fake news datasets into a single heterogeneous corpus (Table 1). Datasets are grouped by content origin and communicative style to enable explicit cross-domain analysis.
Table 1.
Summary of datasets in the aggregated corpus (grouped by curation/source type).
Dataset | Modality/Source | Label Type |
---|---|---|
Group A (curated/news-style sources) | | |
ISOT [38] | Formal news | Binary (Real/Fake) |
WELFake [39] | Synthetic + real (GAN-generated + real) | Binary |
FNC-1 [40] | News site claims (stance) | Multiclass (agree/disagree/discuss/unrelated) → Binary |
FakeNewsAMT [12] | Crowdsourced (AMT) | Binary |
Celebrity | News websites (AMT) | Binary |
Group B (social/noisy/real-world feeds) | | |
LIAR [41] | PolitiFact (speeches, interviews) | 6-way → Binary |
PHEME [42] | Twitter cascades | 4-way (support/deny/query/comment) → Binary |
FakeNewsNet [43] | Twitter + fact-checking (GossipCop/PolitiFact) | Binary |
Group A (formal news articles). Editorially produced news or claims from established outlets; longer sentences, standardized grammar, and a formal tone.
Group B (social media content and claims). User-generated posts, short claims, and rumor cascades; informal style with abbreviations, hashtags, and unstructured discourse.
This separation supports a comparative evaluation of professionally edited news (Group A) against noisier, informal user content (Group B). For comparability, multiclass labels were harmonized to a binary schema (Real vs. Fake) when necessary.
3.1.1. Statistical Differences Between Groups
To support the validity of the Group A/B division used in Table 1, we computed descriptive statistics for three representative features: token length, sentiment polarity, and subjectivity. Group A texts averaged 250.33 tokens per sample (SD = 234.52), whereas Group B texts averaged only 6.79 tokens (SD = 3.91). Mean sentiment polarity and subjectivity were also computed for each group.
All three metrics (token length, sentiment polarity, and subjectivity) showed statistically significant differences between the groups under independent two-sample Welch’s t-tests. These results confirm that Group B texts are substantially shorter and less subjective, consistent with informal, reactive social media content. Welch’s test was chosen for its robustness to unequal variances and sample sizes [44].
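As a rough illustration of the test described above, the following sketch computes the Welch t statistic and the Welch-Satterthwaite degrees of freedom using only the standard library. The function name `welch_t_test` and the token counts are hypothetical and are not taken from the corpus; a p-value would additionally require the t-distribution CDF, which a statistics package can supply.

```python
import math
from statistics import mean, variance


def welch_t_test(a, b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    n_a, n_b = len(a), len(b)
    # Squared standard error of each group mean (sample variance uses n - 1).
    se_a = variance(a) / n_a
    se_b = variance(b) / n_b
    t = (mean(a) - mean(b)) / math.sqrt(se_a + se_b)
    # Degrees of freedom shrink when variances or sample sizes are unequal.
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t, df


# Hypothetical token counts for illustration only.
group_a_lengths = [240, 310, 180, 420, 95, 260]
group_b_lengths = [5, 9, 3, 12, 7, 6, 4, 8]

t_stat, dof = welch_t_test(group_a_lengths, group_b_lengths)
print(f"t = {t_stat:.2f}, df = {dof:.2f}")
```

Unlike Student's t-test, this variant does not pool the two variances, which matters here because the two groups differ greatly in both spread and size.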
3.1.2. Label Mapping and Validation
A critical step in creating a unified corpus was the standardization of labels. Following common practice in multi-dataset studies [45], we mapped all datasets to a binary schema (1 = Fake, 0 = Real). For LIAR, the labels pants-on-fire, false, and barely-true were grouped as Fake, while half-true, mostly-true, and true were grouped as Real. To validate this mapping, we conducted a manual annotation comparison: a random sample of 150 LIAR statements was independently annotated by two researchers, yielding a Cohen’s κ of 0.82. According to the benchmark scale of Landis and Koch [46], values between 0.81 and 1.00 indicate “almost perfect” agreement, confirming the reliability of our binary relabeling.
For FNC-1, the disagree label was mapped to Fake, and the remaining labels (agree, discuss, unrelated) were mapped to Real.
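The LIAR and FNC-1 mappings described above can be sketched as simple lookup tables. The names `LIAR_TO_BINARY`, `FNC1_TO_BINARY`, and `to_binary` are hypothetical, and the dataset keys `"liar"` and `"fnc1"` are assumptions made for illustration.

```python
# Binary schema from the text: 1 = Fake, 0 = Real.
LIAR_TO_BINARY = {
    "pants-on-fire": 1, "false": 1, "barely-true": 1,
    "half-true": 0, "mostly-true": 0, "true": 0,
}
FNC1_TO_BINARY = {"disagree": 1, "agree": 0, "discuss": 0, "unrelated": 0}

_MAPPINGS = {"liar": LIAR_TO_BINARY, "fnc1": FNC1_TO_BINARY}


def to_binary(dataset, label):
    """Map a dataset-specific label to the shared binary schema."""
    # A KeyError on an unknown label is deliberate: silent defaults
    # would hide labeling mistakes.
    return _MAPPINGS[dataset][label.strip().lower()]


print(to_binary("liar", "barely-true"), to_binary("fnc1", "discuss"))
```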
3.2. Data Cleaning and Preprocessing
To ensure data quality and avoid label leakage or content redundancy across training and evaluation, we implemented a multi-stage preprocessing and deduplication pipeline prior to dataset merging and partitioning.
3.2.1. Text Normalization
The raw texts were first normalized using the language-specific spaCy model. The resulting transformations supported downstream feature engineering tasks (such as readability and sentiment analysis), where consistent lexical forms improve reliability. Notably, this preprocessing was applied to the structured feature stream only; raw text for transformer encoding was preserved to retain semantic integrity.
3.2.2. Deduplication Strategy
To further refine the corpus and reduce content redundancy, we applied a three-step deduplication process:
Exact match removal: Duplicate entries with identical cleaned_text strings were dropped.
Near-duplicate filtering: TF-IDF cosine similarity was computed for short texts (e.g., tweets, claims). Pairs whose similarity exceeded a fixed threshold were flagged and one copy was removed.
Metadata-based filtering: Entries with identical URL, source_id, or post IDs were pruned when such metadata was available.
This process reduced sampling bias, mitigated viral content oversampling, and ensured independence across domains. The resulting corpus contained 286,260 unique samples, ready for stratified train/validation/test partitioning.
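A minimal sketch of the first two deduplication steps follows, written in pure Python so that it is self-contained. The smoothed IDF weighting, the whitespace tokenization, and the `threshold` values are assumptions for illustration; the metadata-based step is omitted because it depends on dataset-specific fields.

```python
import math
from collections import Counter


def tfidf_vectors(texts):
    """Build sparse TF-IDF vectors (term -> weight) for each text."""
    docs = [Counter(t.lower().split()) for t in texts]
    n = len(docs)
    doc_freq = Counter(term for d in docs for term in d)
    # Smoothed IDF keeps weights positive even for terms in every document.
    idf = {term: math.log((1 + n) / (1 + c)) + 1 for term, c in doc_freq.items()}
    return [{term: tf * idf[term] for term, tf in d.items()} for d in docs]


def cosine(u, v):
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def deduplicate(texts, threshold=0.9):
    # Step 1: exact-match removal, keeping the first occurrence.
    unique = list(dict.fromkeys(texts))
    # Step 2: keep a text only if it is not too similar to any kept text.
    vectors = tfidf_vectors(unique)
    kept = []
    for i, vec in enumerate(vectors):
        if all(cosine(vec, vectors[j]) < threshold for j in kept):
            kept.append(i)
    return [unique[i] for i in kept]


posts = [
    "breaking: miracle cure found",
    "breaking: miracle cure found",
    "breaking: miracle cure found today",
    "city council approves new budget",
]
print(deduplicate(posts, threshold=0.8))
```

The pairwise comparison is quadratic in the number of texts, so a corpus of this size would likely need blocking or an approximate nearest-neighbor index in practice.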
3.3. Feature Engineering Pipeline
To capture a comprehensive range of predictive signals, we designed a multi-layered feature pipeline that complements raw semantic representations with structured contextual, psychological, and source-level attributes [47]. As shown in Figure 1, the final engineered feature vector, after one-hot encoding of categorical variables, contains 104 dimensions: 9 continuous numerical features and 95 binary features derived from categorical encodings.
The conceptual feature set prior to encoding consisted of 18 raw features, grouped as follows:
Source and credibility (4 features)
Social context (2 features)
Psycholinguistic and stylistic (6 features)
Sensationalism and urgency (3 features)
Sentiment and subjectivity (2 features)
3.3.1. Contextual and Framing Feature Extraction
Inspired by media framing theory and research in media psychology [48], we engineered a rich set of structured features to capture the framing, tone, and provenance of each news item.
Source and Credibility Features
This category captures metadata associated with source identity and political affiliation. We included the categorical variables source_name, source_domain, and liar_party, which were one-hot encoded, together with a numeric speaker credibility score. For datasets with speaker metadata (e.g., LIAR), the score is a smoothed ratio whose numerator is the speaker’s count of “true” statements and whose denominator includes a small constant that prevents division by zero. This feature estimates the historical factuality of a speaker.
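A minimal sketch of such a score is given here. The text specifies only the numerator (the count of “true” statements) and the smoothing constant; taking the denominator to be the speaker's total number of recorded statements is an assumption, as are the function name `speaker_credibility` and the default `eps`.

```python
def speaker_credibility(true_count, total_count, eps=1e-6):
    """Share of a speaker's past statements rated true, smoothed by eps.

    ASSUMPTION: total_count (all recorded statements) is the denominator;
    the source text does not state it explicitly.
    """
    return true_count / (total_count + eps)


# A speaker with no history gets 0.0 instead of raising ZeroDivisionError.
print(speaker_credibility(0, 0))
print(round(speaker_credibility(8, 10), 3))
```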
Social Context Features
To capture audience interaction and propagation signals, we included two social context features.
These features reflect message virality and direct user-to-user interactions in early rumor dynamics. However, we acknowledge that such metadata is not available in all datasets. For datasets lacking these signals (e.g., LIAR, FNC-1), we handle missing values by setting them to zero, which is interpretable as the absence of observed engagement.
To test the impact of this design choice, we conducted an ablation analysis in Section 4. The results show that even in the absence of social signals, the model maintains robust performance, with contextual embeddings and psycholinguistic features compensating for missing metadata. This confirms that the X-FRAME architecture is adaptable to both metadata-rich and metadata-sparse environments, ensuring broad generalizability.
Psycholinguistic and Stylistic Features
Based on linguistic deception research [49], we derived features associated with cognitive processing load and stylistic anomalies.
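Readability and Cognitive Load: We used textstat to compute the Flesch Reading Ease (FRE) score:
$$\text{FRE} = 206.835 - 1.015\left(\frac{\text{total words}}{\text{total sentences}}\right) - 84.6\left(\frac{\text{total syllables}}{\text{total words}}\right)$$
Extremely low or high readability may signal manipulative intent, either oversimplifying to appeal broadly or obfuscating through complex jargon [50].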
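For reference, the standard Flesch Reading Ease formula can be evaluated directly from word, sentence, and syllable counts. This sketch takes the counts as inputs and does not reproduce how textstat tokenizes text or counts syllables; the function name is hypothetical.

```python
def flesch_reading_ease(total_words, total_sentences, total_syllables):
    """Standard Flesch Reading Ease; higher scores mean easier text."""
    words_per_sentence = total_words / total_sentences
    syllables_per_word = total_syllables / total_words
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


# 100 words in 5 sentences with 150 syllables.
print(round(flesch_reading_ease(100, 5, 150), 3))  # 59.635
```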
Sensationalism and Urgency: We measured three affective stylistic features.
These reflect emotionally charged, urgent framing common in clickbait and propaganda.
Sentiment and Subjectivity: We used TextBlob [51] to compute sentiment polarity (ranging from −1 to 1) and subjectivity (ranging from 0 to 1).
These measure emotional polarity and opinionated framing. High subjectivity or strong sentiment may reflect persuasive intent over factual reporting.
Feature Scaling
All continuous variables were scaled to the range [0, 1] using Min–Max normalization, while categorical variables were one-hot encoded. The conceptual feature space comprised 104 features (9 continuous and 95 categorical), which expanded to 129 encoded dimensions in the final feature matrix used for model training.
Limitations of Engineered Features
Although frame-based cues have been shown to improve detection and calibration [52], their effectiveness may erode as deceptive tactics evolve. Monitoring for drift in framing signals should be considered in future iterations.
3.4. The X-FRAME Model Architecture
X-FRAME is a hybrid neural architecture that fuses structured and semantic features through dual encoding streams and a classification head. This mirrors hybrid frameworks like BRaG, which combines semantic encoding (via BERT and RNN) with contextual graph-based features before classification [53]; see Figure 2.
3.4.1. Text Encoder Stream
We utilized the XLM-RoBERTa base model to generate 768-dimensional contextual embeddings from the cleaned_text. The final hidden state of the [CLS] token is used as the text representation.
3.4.2. Structured Feature Stream
The structured feature vector S comprises 9 continuous variables and binary indicators from one-hot–encoded categorical features. While the conceptual feature set contains 104 features, the actual model input expands to 129 encoded dimensions after preprocessing.
This 129-dimensional input is passed through a multilayer perceptron (MLP) with two hidden layers [128, 64], ReLU activations, and dropout, yielding a 64-dimensional context embedding.
3.4.3. Fusion and Classification
The text and structured embeddings are fused via concatenation and passed through a final classification head.
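The following shape-level sketch traces the data flow of the two streams and their fusion in plain Python. It rests on several assumptions: the weights are random rather than trained, a random vector stands in for the 768-dimensional XLM-RoBERTa embedding, dropout is omitted as it would be at inference time, and the classification head is taken to be a single linear layer followed by softmax, which the text does not specify.

```python
import math
import random


def linear(x, weights, bias):
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(rng, n_in, n_out):
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


rng = random.Random(0)
mlp_hidden = init_layer(rng, 129, 128)   # structured stream, first hidden layer
mlp_out = init_layer(rng, 128, 64)       # structured stream, second hidden layer
head = init_layer(rng, 768 + 64, 2)      # ASSUMED single-layer head (Real, Fake)


def forward(text_embedding, structured):
    h = relu(linear(structured, *mlp_hidden))
    h_ctx = relu(linear(h, *mlp_out))     # 64-dimensional context embedding
    fused = text_embedding + h_ctx        # list concatenation: 768 + 64
    return fused, softmax(linear(fused, *head))


text_embedding = [rng.gauss(0, 1) for _ in range(768)]  # stand-in for the encoder output
structured = [rng.random() for _ in range(129)]         # stand-in for scaled features
fused, probs = forward(text_embedding, structured)
print(len(fused), [round(p, 3) for p in probs])
```

Concatenation keeps the two representations separate until the head, so the classifier can weigh semantic and contextual evidence independently.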
3.5. Experimental Design and Evaluation Framework
3.5.1. Training Protocol
The corpus was split into training (70%), validation (15%), and test (15%) subsets using stratified sampling to preserve class balance. We fine-tuned the model for four epochs with a batch size of 16. Optimization was performed using AdamW [54] with a low learning rate, a choice commonly adopted in transformer fine-tuning due to its favorable trade-off between convergence speed and training stability.
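A stratified 70/15/15 split of the kind described above can be sketched as follows. The function name, the seed, and the toy label list are hypothetical; the approach simply shuffles indices within each class and slices them by the target fractions.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.70, 0.15, 0.15), seed=42):
    """Return (train, val, test) index lists that preserve class ratios."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(fractions[0] * len(indices))
        n_val = round(fractions[1] * len(indices))
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


# Toy labels with a 2:1 Real (0) to Fake (1) ratio.
labels = [0] * 200 + [1] * 100
train, val, test = stratified_split(labels)
print(len(train), len(val), len(test))
```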
Hyperparameter Selection
To ensure robustness, we conducted a grid search over several architecture and regularization configurations using validation performance (macro F1 score) as the selection criterion. For the structured MLP stream, we evaluated several hidden layer configurations and dropout rates. The selected configuration consistently achieved the best validation performance, striking a balance between model expressiveness and overfitting mitigation.
Similarly, we tested several learning rates. The selected setting offered the best blend of convergence and generalization, consistent with prevailing best practices that favor low learning rates to avoid catastrophic forgetting in Transformer fine-tuning [55].
Hyperparameter Validation
We compared representative hyperparameter configurations for the structured MLP (hidden layer sizes), dropout, and learning rate on the held-out validation set. The configuration used in all subsequent experiments is the one that maximized validation performance. Table 2 reports validation accuracy for each configuration and the percentage change relative to the selected setting (higher is better), and Figure 3 visualizes the trend across learning rates.
Class Imbalance Handling
To address the inherent class imbalance in the training data (approximately 2:1 Real to Fake), we applied a weighted Cross-Entropy Loss function. This technique penalizes the model more heavily for misclassifying the minority class. The weight for each class $c$, denoted $w_c$, was calculated as inversely proportional to its frequency in the training data:
$$w_c = \frac{N}{C \cdot N_c}$$
where $N$ is the total number of training samples, $C$ is the number of classes (2), and $N_c$ is the number of samples in class $c$. This results in a higher weight for the “Fake” class, encouraging the model to focus on detecting under-represented instances.
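As a worked example of inverse-frequency weighting, the sketch below assumes the weight of a class is the total sample count divided by the product of the number of classes and that class's count, which matches the variables defined above. The function name and the toy labels are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: N / (C * N_c) for each class c."""
    counts = Counter(labels)
    n_total, n_classes = len(labels), len(counts)
    return {c: n_total / (n_classes * n_c) for c, n_c in counts.items()}


# Toy labels with the approximate 2:1 Real (0) to Fake (1) ratio.
weights = balanced_class_weights([0] * 200 + [1] * 100)
print(weights)  # {0: 0.75, 1: 1.5}
```

With a 2:1 ratio the minority class receives exactly twice the weight of the majority class, so an error on a Fake sample costs twice as much in the loss.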
Evaluation Metrics
We report accuracy, area under the ROC curve (AUC), and both macro-averaged and class-specific F1-scores in subsequent sections. Macro-averaging ensures that both majority and minority classes contribute equally to the score, regardless of their frequency.
The model checkpoint that achieved the highest validation macro F1 score was saved and used for all subsequent evaluations.
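To make the macro-averaging concrete, this sketch computes per-class F1 and their unweighted mean from scratch. The function names and the toy predictions are illustrative only.

```python
def f1_for_class(y_true, y_pred, cls):
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == cls and p == cls for t, p in pairs)
    fp = sum(t != cls and p == cls for t, p in pairs)
    fn = sum(t == cls and p != cls for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so each class counts equally."""
    classes = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)


y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0]
print(f1_for_class(y_true, y_pred, 0), f1_for_class(y_true, y_pred, 1))  # 0.75 0.5
print(macro_f1(y_true, y_pred))  # 0.625
```

In this toy case accuracy is about 0.67 while macro F1 is 0.625, because the weaker minority-class score pulls the average down.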
3.5.2. Benchmarking and Comparative Analysis
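To validate the hybrid design, we benchmarked X-FRAME against two internal baselines: a text-only XLM-RoBERTa model and a features-only Gradient Boosting Classifier trained on the structured features.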
We also compared X-FRAME with recent state-of-the-art models from prior works to contextualize its performance.
3.5.3. Explainability and Robustness Analysis
To validate the model’s transparency and resilience, we conducted a multi-faceted analysis using three complementary methods, each targeting a different aspect of model behavior.
Global Feature Importance with Permutation Importance
To determine which engineered features consistently influenced the model predictions, we employed Permutation Importance [57], a model-agnostic technique. This approach aligns with the theoretical framework of ‘model reliance’ introduced by Fisher, Rudin, and Dominici [58], which generalizes permutation importance beyond tree-based models.
A larger drop in accuracy following feature shuffling indicates a more influential feature:
$$I_j = \text{Acc}_{\text{baseline}} - \text{Acc}_{\text{permuted}(j)}$$
where $\text{Acc}_{\text{permuted}(j)}$ is the accuracy after the values of feature $j$ are randomly shuffled across samples.
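The procedure can be sketched in a few lines of model-agnostic code. The toy model, the number of repeats, and the synthetic data are assumptions for illustration; in practice the callable would wrap the trained classifier and the evaluation would use the validation set.

```python
import random


def accuracy(model, X, y):
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Mean accuracy drop when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the label
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_model(row):
    # Hypothetical stand-in classifier that only looks at feature 0.
    return int(row[0] > 0.5)


data_rng = random.Random(1)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = [int(row[0] > 0.5) for row in X]
scores = permutation_importance(toy_model, X, y)
print([round(s, 3) for s in scores])
```

Feature 1 is ignored by the toy model, so shuffling it leaves accuracy unchanged and its importance is zero, while shuffling feature 0 removes most of the predictive signal.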
Local Prediction Explanation with LIME
To interpret individual predictions, we used LIME (Local Interpretable Model-agnostic Explanations). LIME constructs a local interpretable surrogate model, typically a linear classifier, that approximates the behavior of the full X-FRAME model in the vicinity of a specific input. By generating thousands of small perturbations of the input and observing the resulting predictions, LIME identifies the features (both textual and structured) that most strongly influenced the model’s decision. This produces a human-understandable rationale for individual classification outcomes. LIME, along with other XAI techniques such as SHAP, has proven essential in enhancing the transparency of fake news detection models in recent comparative studies [59].
Semantic Robustness with Adversarial Testing
To assess robustness against subtle semantic manipulations, we conducted adversarial testing using the TextFooler attack algorithm [60]. TextFooler replaces important words in the input with semantically similar alternatives (determined via pre-trained embeddings) until the model’s prediction changes. We report two key metrics: the attack success rate and the average percentage of perturbed words.
Together, these metrics provide insight into whether the model relies on deep semantic understanding or is vulnerable to superficial linguistic changes.
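A minimal sketch of how an attack success rate and an average perturbation rate can be aggregated from per-example attack records is shown here. The record fields (`flipped`, `words_changed`, `num_words`) are hypothetical, and averaging the perturbation rate over successful attacks only is an assumption about the reporting convention.

```python
def attack_metrics(results):
    """Return (attack success rate, mean share of words changed)."""
    successes = [r for r in results if r["flipped"]]
    success_rate = len(successes) / len(results)
    if not successes:
        return success_rate, 0.0
    # ASSUMPTION: perturbation effort is averaged over successful attacks only.
    avg_perturbed = sum(r["words_changed"] / r["num_words"] for r in successes) / len(successes)
    return success_rate, avg_perturbed


# Hypothetical attack log for four originally correct predictions.
attack_log = [
    {"flipped": True, "words_changed": 2, "num_words": 10},
    {"flipped": False, "words_changed": 0, "num_words": 12},
    {"flipped": True, "words_changed": 3, "num_words": 20},
    {"flipped": False, "words_changed": 0, "num_words": 8},
]
rate, effort = attack_metrics(attack_log)
print(rate, round(effort, 3))  # 0.5 0.175
```

A lower success rate combined with a higher perturbation effort would indicate that more of the input must change before the prediction flips.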
4. Experimental Results
This section presents a comprehensive empirical evaluation of our proposed X-FRAME model. We first report on the overall performance of the final model trained on the complete aggregated corpus in Table 3. We then contextualize these results through a comparative analysis against strong baseline models. Subsequently, we conduct a series of in-depth analyses, including ablation studies, generalization tests across different data modalities, and robustness checks against adversarial attacks, to provide a holistic understanding of the model’s characteristics. Finally, we demonstrate the effectiveness of a domain adaptation strategy to improve performance on challenging social media content. The confusion matrix corresponding to the model’s final evaluation is illustrated in Figure 4.
Precision is defined as $1 - \text{FDR}$ (False Discovery Rate), and Recall as $1 - \text{FNR}$ (False Negative Rate).
The results indicate high precision for both classes, suggesting that the model is reliable when it makes a classification. To further evaluate its discriminative capacity, we plotted the Receiver Operating Characteristic (ROC) curve. As shown in Figure 5, the model achieved an Area Under the Curve (AUC) score of 0.9367, demonstrating excellent performance in distinguishing between Real and Fake classes across a range of classification thresholds.
4.1. Benchmarking Against Baselines
To validate the effectiveness of our hybrid architecture, we benchmarked X-FRAME against two internal baselines, each designed to isolate the contribution of one component of the hybrid pipeline:
Baseline 1: Text-Only—XLM-RoBERTa fine-tuned on the cleaned_text field only, without any structured features.
Baseline 2: Features-Only—Gradient Boosting Classifier (GBC) trained solely on the 104-dimensional structured feature set, without any Transformer-based semantic input.
All baselines were trained and evaluated on the same train/validation/test splits of our aggregated corpus to ensure fairness. We report accuracy, class-specific F1-scores, macro-averaged F1, and AUC to provide a comprehensive view of performance.
The results in Table 4 show that X-FRAME outperforms both baselines across all major metrics. The performance gap between the hybrid model and the baselines demonstrates the value of combining deep semantic content with contextual and psycholinguistic engineered features. This supports recent findings that enriching large language models with structured external knowledge improves complex classification performance [61]. A comparative AUC-ROC curve illustrating these performance differences is shown in Figure 6.
4.2. Comparison with State-of-the-Art Models
To contextualize X-FRAME’s performance, we compare it against several recent state-of-the-art (SOTA) fake news detection systems that represent diverse modeling strategies, including multimodal fusion and user-interaction features.
Table 5 summarizes their architectures and reported results, grouped by dataset for fair comparison.
It is important to note that not all reported figures are directly comparable: many SOTA systems are evaluated on a single dataset or platform, often with abundant multimodal information (e.g., images, videos), whereas X-FRAME is evaluated on a diverse, cross-domain text corpus without relying on visual features. Consequently, while some specialist models report higher headline accuracies, their domain scope and required input modalities limit generalizability. X-FRAME’s key contribution lies in achieving competitive accuracy while providing interpretable, theory-grounded predictions across heterogeneous sources.
4.3. Ablation Study and Feature Importance
To quantify the contribution of each stream in the hybrid model, we conducted an ablation study on the validation set. The results, summarized in Table 6, evaluate the impact of textual and structured feature components.
The study reveals a clear hierarchy in the model’s decision process. The textual content, processed through the XLM-RoBERTa encoder, provides the dominant predictive signal; its removal results in a 14.11% drop in accuracy, underscoring that deep semantic understanding is foundational to performance. Nevertheless, integrating the 104 conceptual structured features (expanding to 129 encoded inputs) yields a 1.00% improvement, demonstrating the value of the hybrid design. Within the structured stream, credibility-related features alone contribute a 0.60% gain—accounting for more than half of the structured feature impact—highlighting the central role of source reliability in veracity assessment.
4.3.1. Permutation Importance
To analyze individual structured feature importance, we employed Permutation Importance [57]. Figure 7 visualizes the top 20 ranked features by their mean decrease in validation accuracy when permuted. Higher values indicate that the model’s predictions are more sensitive to that feature. A tabular summary of these feature importance scores is also provided in Table 7.
The results reveal that X-FRAME strategically prioritizes contextual indicators over isolated psycholinguistic cues, aligning with its hybrid design philosophy. The top-ranked feature, pheme_topic_charliehebdo, caused a validation accuracy drop of 0.0135 when permuted, suggesting that the model encodes specific linguistic and social dynamics tied to prominent misinformation events. High importance scores for features such as source_name_WELFAKE_dataset and liar_speaker_credibility_score further underscore the model’s sensitivity to source reliability and speaker trustworthiness, validating its grounding in credibility-based features. Additionally, the influence of modality_type attributes (e.g., formal news vs. social claims) confirms that X-FRAME modulates its reasoning pathways based on content origin, demonstrating cross-modal flexibility rather than surface-level token reliance. Collectively, these findings affirm that the model captures deeper structural and contextual semantics crucial for robust fake news detection.
4.3.2. Local Prediction Explanation with LIME
To understand how X-FRAME reasons about individual instances, we employed LIME (Local Interpretable Model-agnostic Explanations) to analyze a representative case from the test set that was correctly classified as Fake. LIME constructs an interpretable local surrogate model that approximates the behavior of the full classifier around a specific prediction. The explanation for this sample is summarized in Table 8.
The explanation for this instance highlights the strength of our hybrid modeling approach. The model’s prediction was overwhelmingly driven by structured features, such as the absence of known high-credibility topics or source identifiers. In contrast, the textual signal was minimal, and the word ok contributed only 0.0090 to the final prediction. This case study illustrates that X-FRAME does not rely solely on superficial lexical patterns, but instead incorporates contextual signals to derive meaningful and interpretable predictions—an advantage for real-world deployment where transparency is essential. To further contextualize X-FRAME’s interpretability, we compared its explanatory outputs with those of recent SOTA systems.
4.3.3. Interpretability Comparison with SOTA Models
While most state-of-the-art fake news detectors report strong accuracy figures, few provide a quantifiable analysis of interpretability. For instance, Event-Radar and MCOT include attention-weight visualizations focused solely on textual tokens, without reporting feature-level importance across different modalities. MFUIE incorporates user interaction graphs but does not present explicit attribution scores for individual features. In contrast, X-FRAME offers both global explanations (via Permutation Importance) and local explanations (via LIME) that encompass semantic, contextual, and psycholinguistic features, enabling human-interpretable rationale for every prediction.
To provide a preliminary quantitative comparison, we conducted a small-scale human evaluation with three independent annotators on 50 randomly selected test instances. Each annotator rated the explanations as actionable (i.e., useful for understanding and potentially contesting a decision) or non-actionable. Explanations generated by X-FRAME were rated as actionable in 88% of cases, compared to 54% for attention-only visualizations from a fine-tuned XLM-R baseline. Although limited in scope, these results suggest that X-FRAME not only maintains competitive classification performance but also delivers higher practical interpretability than comparable SOTA models.
4.4. Generalization and Domain Adaptation
A central objective of this study was to evaluate the generalization capabilities of X-FRAME across distinct data modalities. The model’s performance on different data groups is detailed in Table 9, highlighting a significant domain shift between formal news content and social media.
While the model achieved near-perfect performance on structured, formal news articles, its accuracy and F1-score dropped considerably on social media content. This decline, particularly in detecting fake instances, underscores the difficulty of cross-modal generalization, a well-documented challenge in real-world misinformation detection [62].
Adversarial Robustness Analysis
To assess semantic resilience, we applied the TextFooler adversarial attack using the TextAttack framework [63] on X-FRAME. We selected a sample of 100 correctly classified test examples and perturbed them under the following settings: maximum perturbation ratio of 30%, up to 50 synonym candidates per word, a minimum semantic similarity threshold of 0.8 (Universal Sentence Encoder), and part-of-speech constraints to preserve grammaticality. Named entities and stopwords were excluded from modification. The structured features remained unaltered throughout the attack, targeting only the textual input.
Table 10 summarizes the results. The attack succeeded in 61.18% of cases, requiring a perturbation of 17.44% of the input words on average. This moderate attack success rate and the relatively high perturbation effort suggest that X-FRAME is not overly reliant on specific keywords and is capable of capturing deeper semantic patterns.
To contextualize these results, Table 11 benchmarks our robustness against other recent systems. Despite not using adversarial training, X-FRAME demonstrates relatively lower vulnerability, which we attribute to a hybrid design that integrates contextual features beyond text alone.
5. Discussion
The experimental results demonstrate that the proposed X-FRAME model offers a robust and effective framework for detecting linguistic, contextual, and psycholinguistic indicators that are statistically associated with misinformation and disinformation. It is important to note that, like most approaches in the literature, X-FRAME does not perform direct fact verification through comprehensive world-knowledge reasoning. True verification would require dynamic, up-to-date modeling of real-world facts, which remains a significant and open challenge in computational fact checking. Instead, our approach identifies patterns in text and metadata that are strongly correlated with known instances of misinformation, making it a practical tool for early warning and content triage in real-world applications.
This section provides a deeper interpretation of our findings, contextualizing them within broader challenges in misinformation research. We examine the principal findings related to our hybrid architecture and generalization tests, analyze specific failure modes, and acknowledge the study’s limitations.
5.1. Principal Findings and Implications
Our study yielded three key findings with important implications for future work.
First, the superior performance of the X-FRAME hybrid model over both the text-only and feature-only baselines confirms the core hypothesis: fusing deep semantic understanding with explicit contextual features is more effective than relying on either stream alone. The ablation study quantifies this synergy, showing that while textual content remains the primary driver of performance, the engineered features provide a measurable accuracy boost. This suggests that future models should incorporate source, context, and framing information in addition to content analysis.
Second, our generalization analysis reveals the magnitude of the domain shift problem. The performance drop from 98% accuracy on formal news articles to 72% on noisy social media content demonstrates the fragility of single-modality training. This reinforces the understanding that misinformation manifests differently across contexts—fabricated long-form articles differ substantially from conversational rumors in both linguistic structure and social dynamics.
Third, our explainability analyses show that high-performing models need not be opaque. The Permutation Importance results indicate that the model prioritizes human-intuitive heuristics such as speaker credibility and source reputation over surface-level psycholinguistic markers. This is supported by local explanations via LIME, which demonstrate that X-FRAME grounds predictions in interpretable features. Such transparency is critical for deploying models in journalism, policy-making, and content moderation, where accountability and trust are essential.
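To make the idea behind Permutation Importance concrete, the following minimal sketch shuffles one feature column at a time and records the resulting drop in accuracy. It is an illustration only, not the X-FRAME implementation: the two-column toy data, the rule-based `toy_predict` model, and all function names are hypothetical stand-ins chosen so that the example runs with the Python standard library.

```python
import random

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean accuracy drop when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the label
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(y, predict(X_perm)))
        importances.append(sum(drops) / n_repeats)
    return importances

# Hypothetical toy model: column 0 mimics a source-credibility score,
# column 1 mimics an exclamation-mark count that the model ignores.
def toy_predict(X):
    return [1 if row[0] < 0.5 else 0 for row in X]  # 1 = Fake, 0 = Real

data_rng = random.Random(42)
X = [[data_rng.random(), data_rng.randint(0, 5)] for _ in range(200)]
y = toy_predict(X)  # labels generated by the same rule, so baseline accuracy is 1.0

scores = permutation_importance(toy_predict, X, y)
print({"credibility": scores[0], "exclamations": scores[1]})
```

In this toy setting, shuffling the credibility column degrades accuracy substantially, while shuffling the ignored column leaves it unchanged, which is the pattern of evidence a Permutation Importance ranking relies on.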
Despite these promising results, it is important to temper claims of domain robustness. While X-FRAME attained 97% accuracy on formal news articles, its performance dropped to 72% accuracy (F1 for the Fake class = 0.67) on noisy, user-generated content. This suggests that while the model generalizes better than baseline systems, it is not immune to the challenges posed by unstructured inputs. Given the ambiguous and dynamic nature of social media discourse, 72% may represent a competitive benchmark for cross-domain adaptability, but we acknowledge that it may not meet the reliability threshold required for high-stakes or real-time deployment. Further improvements via domain adaptation, continual learning, or social-pragmatic feature augmentation are necessary to bridge this gap. Accordingly, we frame these findings not as conclusive evidence of full generalization, but rather as a demonstration of X-FRAME’s adaptability potential under challenging conditions.
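The per-domain comparison discussed above amounts to computing accuracy and the Fake-class F1 separately for each content domain. The sketch below shows one way this breakdown could be computed. The record format, the domain names, and the eight toy records are hypothetical and are not drawn from the study's data.

```python
from collections import defaultdict

def fake_class_f1(y_true, y_pred, fake=1):
    """F1 score computed for the Fake class only."""
    tp = sum(t == fake and p == fake for t, p in zip(y_true, y_pred))
    fp = sum(t != fake and p == fake for t, p in zip(y_true, y_pred))
    fn = sum(t == fake and p != fake for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def per_domain_report(records):
    """Group (domain, true_label, predicted_label) triples and score each domain."""
    by_domain = defaultdict(lambda: ([], []))
    for domain, y_true, y_pred in records:
        by_domain[domain][0].append(y_true)
        by_domain[domain][1].append(y_pred)
    report = {}
    for domain, (truth, pred) in by_domain.items():
        acc = sum(t == p for t, p in zip(truth, pred)) / len(truth)
        report[domain] = {"accuracy": acc, "f1_fake": fake_class_f1(truth, pred)}
    return report

# Hypothetical toy records: 1 = Fake, 0 = Real.
records = [
    ("news", 1, 1), ("news", 0, 0), ("news", 1, 1), ("news", 0, 0),
    ("social", 1, 0), ("social", 1, 1), ("social", 0, 1), ("social", 0, 0),
]
report = per_domain_report(records)
print(report)
```

Reporting the Fake-class F1 alongside accuracy matters here because accuracy alone can hide poor detection of the minority class within a domain.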
5.2. Analysis of Model Errors
A qualitative inspection of validation errors offers additional insight into X-FRAME’s limitations.
False positives (real news predicted as fake) often occurred when authentic posts exhibited stylistic cues typical of misinformation. Legitimate yet emotionally charged social media content with sensational formatting (e.g., excessive punctuation or capitalization) was sometimes misclassified. This suggests that the model may over-rely on stylistic framing indicators when tone deviates from journalistic neutrality.
False negatives (fake news predicted as real) were more common when misinformation was subtle or conversational. Short or informal fake content lacking overt signals (such as strong sentiment or framing cues) tended to evade detection. This indicates a need for further enhancement of the model’s ability to recognize less stylized misinformation, possibly by integrating more nuanced discourse-level and pragmatic features.
5.3. Limitations of the Study
While the findings of this study are promising, several limitations should be acknowledged.
First, although the corpus is large and heterogeneous, it is composed entirely of publicly available datasets, which may carry labeling inconsistencies and sampling biases. For instance, annotations in datasets such as LIAR are based on fact-checking assessments from platforms like PolitiFact, which can introduce subjectivity. Despite applying label harmonization to enforce a consistent binary schema, domain-specific artifacts may still affect model generalizability. Furthermore, the exclusive use of English-language data constrains the applicability of the model in multilingual or cross-lingual contexts.
Second, although class imbalance was mitigated using a weighted loss function, other approaches, such as oversampling, undersampling, or synthetic data augmentation, were not explored. Incorporating these strategies may improve fairness and robustness, particularly for underrepresented examples of disinformation.
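For readers unfamiliar with weighted losses, the following minimal sketch shows a class-weighted binary cross-entropy. The inverse-frequency weighting and the normalization by the sum of weights are assumptions made for illustration; the exact weighting scheme used for X-FRAME is not specified here, and all names and toy values are hypothetical.

```python
import math
from collections import Counter

def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count). Assumed scheme."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}

def weighted_cross_entropy(probs_fake, labels, weights, eps=1e-12):
    """Weighted mean of per-example negative log-likelihoods (1 = Fake, 0 = Real)."""
    total, norm = 0.0, 0.0
    for p, y in zip(probs_fake, labels):
        p_true = p if y == 1 else 1.0 - p  # probability assigned to the true class
        w = weights[y]
        total += -w * math.log(max(p_true, eps))
        norm += w
    return total / norm

# Hypothetical imbalanced toy batch: 8 Real, 2 Fake.
labels = [0] * 8 + [1] * 2
weights = balanced_class_weights(labels)

# Same confidence everywhere, but one mistake: on a Fake item vs. on a Real item.
miss_fake = [0.1] * 8 + [0.9, 0.1]   # last Fake example is missed
miss_real = [0.9] + [0.1] * 7 + [0.9, 0.9]  # first Real example is flagged
loss_miss_fake = weighted_cross_entropy(miss_fake, labels, weights)
loss_miss_real = weighted_cross_entropy(miss_real, labels, weights)
print(weights, loss_miss_fake, loss_miss_real)
```

With these weights, an error on a minority-class (Fake) example contributes more to the loss than an equally confident error on a majority-class example, which is the effect a weighted loss is intended to have.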
Third, our robustness evaluation was limited to a single adversarial technique (TextFooler). A more comprehensive adversarial analysis would involve a broader set of perturbations, including paraphrasing, back-translation, and syntactic transformations, and would also examine adversarial training as a defense, to better assess model resilience under diverse manipulation strategies.
Finally, this work adopts a binary classification schema (Real vs. Fake), which, while practical, may oversimplify the spectrum of misinformation. Extending the framework to a multiclass taxonomy, recognizing satire, hoaxes, propaganda, clickbait, and factual reporting, could enable more nuanced content categorization and better support domain-specific policy interventions.
6. Conclusions and Future Work
6.1. Conclusions
This study addressed two persistent challenges in computational misinformation detection: the lack of explainability in predictive models and limited generalizability across content domains. We introduced X-FRAME, a hybrid architecture that integrates the deep semantic understanding of XLM-RoBERTa with a structured set of 104 features grounded in communication theory, psycholinguistics, and social context analysis. Unlike conventional approaches, our framework incorporates indicators such as source credibility, speaker history, propagation behavior, and psycholinguistic framing—elements essential for detecting patterns that are statistically associated with misinformation and disinformation. We emphasize that X-FRAME, like most current systems, does not perform fact verification against an authoritative knowledge base; instead, it identifies linguistic and contextual signals correlated with previously verified examples of misinformation.
X-FRAME was evaluated on a large and heterogeneous corpus comprising 286,260 instances from both formal news outlets and informal social media platforms. It achieved an overall accuracy of 86.1% and a recall of 81% on the minority Fake class. Notably, the model demonstrated cross-domain adaptability potential, attaining 97% accuracy on formal news articles and 72% on unstructured, user-generated content. These results indicate that the model adapts across diverse textual environments, although a clear performance gap remains between structured and noisy domains.
Importantly, X-FRAME advances beyond black-box modeling. Using Permutation Importance and LIME, we demonstrated that predictions were driven by interpretable, theory-informed signals, such as source trustworthiness, sensationalism, and partisan tone, thereby enhancing model transparency and accountability. These findings support the viability of X-FRAME as a practical tool for real-world deployment in journalism, fact-checking, and policy monitoring settings, where both performance and interpretability are essential.
6.2. Future Work
Several promising directions remain for extending this research.
Enhanced fusion strategies. While simple feature concatenation proved effective, future versions of X-FRAME could benefit from more sophisticated fusion mechanisms. Techniques such as co-attention, gating, or late fusion may yield better alignment between semantic embeddings and structured features, enabling richer representation learning.
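As a rough illustration of what a gating mechanism could look like, the sketch below projects the structured features into the embedding space and learns a per-dimension gate that mixes the two streams. This is one hypothetical formulation among several, not a description of X-FRAME. The tiny dimensions (4 and 3) stand in for the real embedding size and the 104 engineered features, the parameters are randomly initialized rather than trained, and all names are assumptions.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def linear(weights, bias, vec):
    """Dense layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weights, bias)]

def init_params(text_dim, feat_dim, seed=0):
    """Random, untrained parameters for the hypothetical fusion layer."""
    rng = random.Random(seed)

    def rand(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "W_f": rand(text_dim, feat_dim), "b_f": [0.0] * text_dim,      # feature projection
        "W_g": rand(text_dim, 2 * text_dim), "b_g": [0.0] * text_dim,  # gate
    }

def gated_fusion(text_emb, feat_vec, params):
    """fused_i = g_i * text_i + (1 - g_i) * projected_feature_i, with g in (0, 1)."""
    feat_proj = linear(params["W_f"], params["b_f"], feat_vec)
    gate_logits = linear(params["W_g"], params["b_g"], text_emb + feat_proj)
    gate = [sigmoid(z) for z in gate_logits]
    return [g * t + (1.0 - g) * f for g, t, f in zip(gate, text_emb, feat_proj)]

# Toy inputs standing in for a sentence embedding and a structured feature vector.
params = init_params(text_dim=4, feat_dim=3)
text_emb = [0.2, -0.5, 0.1, 0.9]
feat_vec = [1.0, 0.0, 3.0]
fused = gated_fusion(text_emb, feat_vec, params)
print(fused)
```

Compared with plain concatenation, a gate of this kind lets the model decide, dimension by dimension, how much to rely on the text representation versus the structured features for a given input.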
Broader adversarial evaluation. Although our robustness analysis used the TextFooler algorithm, a more comprehensive suite of attacks (including paraphrasing, entity substitution, and syntactic perturbations), complemented by adversarial training as a defense, would offer deeper insight into model resilience under real-world attack conditions.
Multimodal integration. While the current model relies solely on textual and structured input, integrating complementary modalities such as images, video thumbnails, or social engagement graphs could expand X-FRAME’s applicability to more dynamic, platform-specific misinformation formats.
Fine-grained misinformation classification. Extending the binary classification scheme to a multiclass taxonomy—e.g., satire, clickbait, hoax, propaganda, or factual content—could enhance content triage workflows and better support policy design in media literacy and regulation.
These directions represent complementary, not strictly sequential, lines of advancement. Together, they chart a path toward more robust, generalizable, and interpretable misinformation detection systems.