MDPI - Publisher of Open Access Journals

30 pages, 940 KiB

Open Access Article

Language Contact and Population Contact as Sources of Dialect Similarity

by Jonathan Dunn and Sidney Wong

Languages 2025, 10(8), 188; https://doi.org/10.3390/languages10080188 (registering DOI) - 31 Jul 2025

Abstract

This paper creates a global similarity network between city-level dialects of English in order to determine whether external factors like the amount of population contact or language contact influence dialect similarity. While previous computational work has focused on external influences that contribute to phonological or lexical similarity, this paper focuses on grammatical variation as operationalized in computational construction grammar. Social media data were used to create comparable English corpora from 256 cities across 13 countries. Each sample is represented using the type frequency of various constructions. These frequency representations are then used to calculate pairwise similarities between city-level dialects; a prediction-based evaluation shows that these similarity values are highly accurate. Linguistic similarity is then compared with four external factors: (i) the amount of air travel between cities, a proxy for population contact; (ii) the difference in the linguistic landscapes of each city, a proxy for language contact; (iii) the geographic distance between cities; and (iv) the presence of political boundaries separating cities. The results show that, while all these factors are significant, the best model relies on language contact and geographic distance.

(This article belongs to the Special Issue Dialectal Dynamics)
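
The abstract does not name the similarity measure applied to the construction frequency vectors. As a minimal sketch, assuming cosine similarity over per-city type-frequency counts, pairwise city similarities could be computed as below; the city names and construction labels are hypothetical placeholders, not data from the paper.

```python
import math
from collections import Counter


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse frequency vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def pairwise_similarities(cities: dict[str, Counter]) -> dict[tuple[str, str], float]:
    """Similarity for every unordered pair of cities."""
    names = sorted(cities)
    return {
        (x, y): cosine_similarity(cities[x], cities[y])
        for i, x in enumerate(names)
        for y in names[i + 1:]
    }


# Hypothetical construction type frequencies for three cities
cities = {
    "CityA": Counter({"NP_DET_N": 120, "VP_V_NP": 80, "PP_P_NP": 40}),
    "CityB": Counter({"NP_DET_N": 110, "VP_V_NP": 90, "PP_P_NP": 35}),
    "CityC": Counter({"NP_DET_N": 20, "VP_V_NP": 15, "PP_P_NP": 160}),
}
sims = pairwise_similarities(cities)
for pair, score in sims.items():
    print(pair, round(score, 3))
```

The resulting pair-to-score mapping is the kind of similarity network that could then be regressed against external factors such as air travel or geographic distance.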

30 pages, 37977 KiB

Open Access Article

Text-Guided Visual Representation Optimization for Sensor-Acquired Video Temporal Grounding

by Yun Tian, Xiaobo Guo, Jinsong Wang and Xinyue Liang

Sensors 2025, 25(15), 4704; https://doi.org/10.3390/s25154704 - 30 Jul 2025

Abstract

Video temporal grounding (VTG) aims to localize a semantically relevant temporal segment within an untrimmed video based on a natural language query. The task continues to face challenges arising from cross-modal semantic misalignment, which is largely attributed to redundant visual content in sensor-acquired video streams, linguistic ambiguity, and discrepancies in modality-specific representations. Most existing approaches rely on intra-modal feature modeling, processing video and text independently throughout the representation learning stage. However, this isolation undermines semantic alignment by neglecting the potential of cross-modal interactions. In practice, a natural language query typically corresponds to spatiotemporal content in video signals collected through camera-based sensing systems, encompassing a particular sequence of frames and its associated salient subregions. We propose a text-guided visual representation optimization framework tailored to enhance semantic interpretation over video signals captured by visual sensors. This framework leverages textual information to focus on spatiotemporal video content, thereby narrowing the cross-modal gap. Built upon the unified cross-modal embedding space provided by CLIP, our model structures representations of sensor-acquired video and introduces two dedicated modules that semantically refine visual representations along the spatial and temporal dimensions. First, a Spatial Visual Representation Optimization (SVRO) module learns spatial information within each frame: it selects salient patches related to the text, capturing more fine-grained visual details. Second, a Temporal Visual Representation Optimization (TVRO) module learns temporal relations across frames; TVRO employs a temporal triplet loss to strengthen attention on text-relevant frames and capture clip semantics.
Additionally, a self-supervised contrastive loss is introduced at the clip–text level to improve inter-clip discrimination by maximizing semantic variance during training. Experiments on three widely used benchmark datasets, Charades-STA, ActivityNet Captions, and TACoS, demonstrate that our method outperforms state-of-the-art methods across multiple metrics.

(This article belongs to the Section Sensing and Imaging)
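
A temporal triplet loss of the kind mentioned for the TVRO module can be sketched as a hinge on cosine similarity: the text query acts as the anchor, a text-relevant frame as the positive, and an irrelevant frame as the negative. This is a generic triplet formulation with assumed toy vectors and an assumed margin of 0.2; the paper's exact loss, similarity function, and frame sampling are not given in the abstract.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge triplet loss: zero once the positive is at least `margin`
    more similar to the anchor than the negative is."""
    return max(0.0, margin - cosine(anchor, positive) + cosine(anchor, negative))


# Toy embeddings (hypothetical): a text query and two frame features
text_query = [1.0, 0.0, 0.5]
relevant_frame = [0.9, 0.1, 0.4]
irrelevant_frame = [0.0, 1.0, 0.0]

print(triplet_loss(text_query, relevant_frame, irrelevant_frame))  # well-separated triplet
print(triplet_loss(text_query, irrelevant_frame, relevant_frame))  # swapped roles are penalized
```

Minimizing this quantity over many triplets pushes text-relevant frames toward the query embedding and irrelevant frames away from it, which is the intuition behind "enhancing attention on text-relevant frames".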

33 pages, 9781 KiB

Open Access Article

Spatial Narrative Optimization in Digitally Gamified Architectural Scenarios

by Deshao Wang, Jieqing Xu and Luwang Chen

Buildings 2025, 15(15), 2597; https://doi.org/10.3390/buildings15152597 - 23 Jul 2025

Viewed by 205

Abstract

Currently, exploring digital immersive experiences is a new trend in the innovation and development of cultural tourism. This study addresses the growing demand for digital immersion in cultural tourism by examining the integration of spatial narrative and digitally gamified architectural scenarios. It synthesizes an optimized framework for narrative design in such scenarios, integrating spatial narrative theory and feedback-informed design. The proposed model comprises four key components: (1) developing spatial narrative design methods for digitally gamified architectural scenarios; (2) constructing a spatial language system that uses linguistic principles to organize narrative expression; (3) building a preliminary digitally gamified scenario based on the “Wuhu Jiaoji Temple Renovation Project” after architectural and environmental enhancements; and (4) optimizing the scenario through heatmap-based feedback experiments that collect visitor trajectory heatmaps, eye-tracking heatmaps, and oculometric data. The results show that the optimized design, validated in the original game Dreams of Jiaoji, effectively enhanced spatial narrative execution by refining both on-site and in-game architectural scenarios. Post-optimization visitor feedback confirmed the validity of the proposed optimization strategies and principles, providing theoretical and practical references for innovative digital cultural tourism models and architectural design. In the context of site-specific architectural conservation, this approach achieves two key objectives: the generalized interpretation of architectural cultural resources and their visual representation through gamified interactions. This paradigm not only enhances public engagement by enabling a multidimensional understanding of historical building cultures but also accelerates the protective reuse of heritage sites, allowing heritage value to be maximized through contemporary reinterpretation.
The interdisciplinary methodology promotes sustainable development in the digital transformation of cultural tourism, fostering user-centered experiences and contributing to rural revitalization. Ultimately, this study highlights the potential of digitally gamified architectural scenarios as transformative tools for heritage preservation, cultural dissemination, and rural community revitalization.

(This article belongs to the Section Architectural Design, Urban Science, and Real Estate)

32 pages, 2182 KiB

Open Access Article

Detection of Biased Phrases in the Wiki Neutrality Corpus for Fairer Digital Content Management Using Artificial Intelligence

by Abdullah, Muhammad Ateeb Ather, Olga Kolesnikova and Grigori Sidorov

Big Data Cogn. Comput. 2025, 9(7), 190; https://doi.org/10.3390/bdcc9070190 - 21 Jul 2025

Viewed by 371

Abstract

Detecting biased language in large-scale corpora, such as the Wiki Neutrality Corpus, is essential for promoting neutrality in digital content. This study systematically evaluates a range of machine learning (ML) and deep learning (DL) models for the detection of biased and pre-conditioned phrases. Conventional classifiers, including Extreme Gradient Boosting (XGBoost), Light Gradient-Boosting Machine (LightGBM), and Categorical Boosting (CatBoost), are compared with advanced neural architectures such as Bidirectional Encoder Representations from Transformers (BERT), Long Short-Term Memory (LSTM) networks, and Generative Adversarial Networks (GANs). A novel hybrid architecture is proposed that integrates DistilBERT, LSTM, and GANs within a unified framework. Extensive experiments with two intermediate variants, DistilBERT + LSTM (without GAN) and DistilBERT + GAN (without LSTM), show that the fully integrated model consistently outperforms all alternatives. The proposed hybrid model achieves a cross-validation accuracy of 99.00%, surpassing traditional baselines such as XGBoost (96.73%) and LightGBM (96.83%), with statistical significance confirmed by paired t-tests. It also exhibits superior stability and favorable trade-offs between performance and computational efficiency. The results underscore the potential of hybrid deep learning models for capturing subtle linguistic bias and advancing more objective and reliable automated content moderation systems.
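
The abstract reports paired t-tests between models. A paired t-test on per-fold cross-validation scores reduces to a one-sample test on the fold-wise differences; the sketch below computes the t statistic using only the standard library. The fold accuracies are hypothetical, not the paper's results, and in practice `scipy.stats.ttest_rel` returns both the statistic and a p-value.

```python
import math
import statistics


def paired_t_statistic(scores_a, scores_b):
    """t statistic and degrees of freedom for a paired t-test on matched scores."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equal-length score lists with at least 2 pairs")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    mean_diff = statistics.mean(diffs)
    # standard error of the mean difference: sample std / sqrt(n)
    std_err = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean_diff / std_err, len(diffs) - 1


# Hypothetical per-fold accuracies for two models evaluated on the same 5 folds
model_a = [0.91, 0.93, 0.90, 0.92, 0.92]
model_b = [0.89, 0.90, 0.89, 0.90, 0.90]
t, df = paired_t_statistic(model_a, model_b)
print(f"t = {t:.3f}, df = {df}")
```

Pairing by fold matters: both models see the same data splits, so testing the differences removes fold-to-fold variation that an unpaired test would treat as noise.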

23 pages, 1580 KiB

Open Access Article

Elucidating White Matter Contributions to the Cognitive Architecture of Affective Prosody Recognition: Evidence from Right Hemisphere Stroke

by Meyra S. Jackson, Yuto Uchida, Shannon M. Sheppard, Kenichi Oishi, Ciprian Crainiceanu, Argye E. Hillis and Alexandra Z. Durfee

Brain Sci. 2025, 15(7), 769; https://doi.org/10.3390/brainsci15070769 - 19 Jul 2025

Viewed by 340

Abstract

Background/Objectives: Successful discourse relies not only on linguistic but also on prosodic information. Difficulty recognizing emotion conveyed through prosody (receptive affective aprosodia) following right hemisphere stroke (RHS) significantly disrupts communication participation and personal relationships. Growing evidence suggests that damage to white matter, in addition to gray matter structures, impairs affective prosody recognition. The current study investigates lesion–symptom associations in receptive affective aprosodia during RHS recovery by assessing whether disruptions in distinct white matter structures impact different underlying affective prosody recognition skills. Methods: Twenty-eight adults with RHS underwent neuroimaging and behavioral testing at acute, subacute, and chronic timepoints. Fifty-seven healthy matched controls completed the same behavioral testing, which comprised tasks targeting affective prosody recognition and underlying perceptual, cognitive, and linguistic skills. Linear mixed-effects models and multivariable linear regression were used to assess behavioral performance recovery and lesion–symptom associations. Results: Controls outperformed RHS participants on behavioral tasks earlier in recovery, and RHS participants’ affective prosody recognition significantly improved from acute to chronic testing. Affective prosody and emotional facial expression recognition were affected by external capsule and inferior fronto-occipital fasciculus lesions, while sagittal stratum lesions impacted prosodic feature recognition. Accessing semantic representations of emotions implicated the superior longitudinal fasciculus. Conclusions: These findings replicate previously observed associations between right white matter tracts and affective prosody recognition and further identify lesion–symptom associations of underlying prosodic recognition skills throughout recovery.
Investigation into prosody’s behavioral components and how they are affected by injury can inform the development and planning of interventions.

(This article belongs to the Special Issue Language, Communication and the Brain—2nd Edition)

21 pages, 1689 KiB

Open Access Article

Exploring LLM Embedding Potential for Dementia Detection Using Audio Transcripts

by Brandon Alejandro Llaca-Sánchez, Luis Roberto García-Noguez, Marco Antonio Aceves-Fernández, Andras Takacs and Saúl Tovar-Arriaga

Eng 2025, 6(7), 163; https://doi.org/10.3390/eng6070163 - 17 Jul 2025

Viewed by 269

Abstract

Dementia is a neurodegenerative disorder characterized by progressive cognitive impairment that significantly affects daily living. Early detection of Alzheimer’s disease, the most common form of dementia, remains essential for prompt intervention and treatment, yet clinical diagnosis often requires extensive and resource-intensive procedures. This article explores the effectiveness of automated Natural Language Processing (NLP) methods for identifying Alzheimer’s indicators from audio transcriptions of the Cookie Theft picture description task in the PittCorpus dementia database. Five NLP approaches were compared: a classical Tf–Idf statistical representation and embeddings from four pretrained models (GloVe, BERT, Gemma-2B, and Linq-Embed-Mistral), each integrated with a logistic regression classifier. Transcriptions were carefully preprocessed to preserve linguistically relevant features such as repetitions, self-corrections, and pauses. The five approaches were compared using stratified 5-fold cross-validation. The best results were obtained with BERT embeddings (84.73% accuracy), closely followed by the simpler Tf–Idf approach (83.73%) and the state-of-the-art model Linq-Embed-Mistral (83.54%), while Gemma-2B and GloVe embeddings yielded lower accuracies (80.91% and 78.11%, respectively). Contrary to the initial expectation that richer semantic and contextual embeddings would substantially outperform simpler frequency-based methods, the competitive accuracy of Tf–Idf suggests that the choice and frequency of the words used may matter more than semantic or contextual information in Alzheimer’s detection. This work is a step toward user-friendly software that can offer an initial indicator of Alzheimer’s risk, potentially reducing the need for an in-person clinical visit.

(This article belongs to the Special Issue Advanced Artificial Intelligence Techniques for Disease Prediction, Diagnosis and Management)
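
The Tf–Idf baseline can be sketched in a few lines. This version assumes raw term frequency normalized by document length and an unsmoothed idf of log(N/df); the paper's exact variant (smoothing, normalization, n-gram range) is not stated in the abstract, and the transcript fragments below are invented. Fillers such as "uh" and repeated words are kept as tokens, mirroring the abstract's point that preprocessing preserved repetitions and pauses.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document using (count / length) * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # document frequency: number of documents containing each term at least once
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        length = len(tokens)
        vectors.append(
            {term: (count / length) * math.log(n_docs / df[term]) for term, count in counts.items()}
        )
    return vectors


# Invented transcript fragments; repetitions and fillers are kept as tokens
transcripts = [
    "the boy is taking a cookie",
    "the the boy uh is uh taking",
    "the mother is washing dishes",
]
vectors = tfidf(transcripts)
print(sorted(vectors[1].items(), key=lambda kv: -kv[1]))
```

Terms that occur in every transcript (here "the" and "is") receive zero weight, while a filler concentrated in one transcript receives a high weight; a logistic regression classifier would then be trained on vectors like these.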

20 pages, 1765 KiB

Open Access Article

Can Informativity Effects Be Predictability Effects in Disguise?

by Vsevolod Kapatsinski

Entropy 2025, 27(7), 739; https://doi.org/10.3390/e27070739 - 10 Jul 2025

Viewed by 634

Abstract

Recent work in corpus linguistics has observed that informativity predicts articulatory reduction of a linguistic unit above and beyond the unit’s predictability in the local context, i.e., the unit’s probability given the current context. The informativity of a unit is the negative of its average log-scaled predictability across the contexts in which it occurs, and corresponds to its information content. Research in the field has interpreted effects of informativity either as speakers being sensitive to the information content of a unit when deciding how much effort to put into pronouncing it, or as the accumulation of memories of pronunciation details in long-term memory representations. However, average predictability can improve the estimate of a unit’s local predictability above and beyond the observed predictability in that context, especially when that context is rare. Therefore, informativity can help explain variance in a dependent variable such as reduction above and beyond local predictability simply because it improves the (inherently noisy) estimate of local predictability. This paper shows, via simulation, how to estimate the proportion of an observed informativity effect that is likely to be artifactual, that is, due entirely to informativity improving the estimates of predictability. The proposed simulation approach can be applied to any real dataset to investigate whether an effect of informativity is likely to be real, under the assumption that corpus probabilities are an unbiased estimate of the probabilities driving reduction behavior, and how much of it is likely to be due to noise in predictability estimates.

(This article belongs to the Special Issue Complexity Characteristics of Natural Language)
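
The distinction between local predictability and informativity can be made concrete with bigram counts. Local predictability is P(w | c), the probability of a word given its preceding context; informativity is informativity(w) = -Σ_c P(c | w) log2 P(w | c), the negated average of log-scaled predictability over the contexts the word occurs in. The sketch below uses invented counts and treats the preceding word as the context, both assumptions for illustration; it does not reproduce the paper's simulation procedure.

```python
import math
from collections import Counter


def predictability_and_informativity(bigrams):
    """bigrams: list of (context, word) pairs observed in a corpus."""
    pair_counts = Counter(bigrams)
    context_counts = Counter(context for context, _ in bigrams)
    word_counts = Counter(word for _, word in bigrams)
    # local predictability P(word | context)
    predictability = {
        (context, word): n / context_counts[context]
        for (context, word), n in pair_counts.items()
    }
    # informativity(word) = -sum over contexts of P(context | word) * log2 P(word | context)
    informativity = {}
    for (context, word), n in pair_counts.items():
        weight = n / word_counts[word]  # P(context | word)
        surprisal = -math.log2(predictability[(context, word)])
        informativity[word] = informativity.get(word, 0.0) + weight * surprisal
    return predictability, informativity


# Invented counts: "course" is frequent after "of" but seen only once after "the"
corpus = [("of", "course")] * 4 + [("the", "course")] + [("the", "cat")] * 3
pred, info = predictability_and_informativity(corpus)
print(pred[("the", "course")], info["course"], round(info["cat"], 3))
```

Here "course" has a predictability of only 0.25 in its single, rare occurrence after "the", but its informativity (0.4 bits) pools both contexts; this pooling is why informativity can act as a less noisy stand-in for a poorly estimated local predictability.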

22 pages, 792 KiB

Open Access Article

Childhood Heritage Languages: A Tangier Case Study

by Ariadna Saiz Mingo

Languages 2025, 10(7), 168; https://doi.org/10.3390/languages10070168 - 9 Jul 2025

Viewed by 358

Abstract

Through the testimony of a female Tangier citizen who grew up in the “prolific multilingual Spanish-French-Darija context of international Tangier”, this article analyzes the web of beliefs projected onto both the inherited and local languages within her linguistic repertoire. Starting from the daily realities in which she was immersed and the social networks that she formed, we focus on the representations of communication and her affective relationship with the host societies. The analysis starts from the most immediate domestic context, in which Spanish, in its variety Jaquetía (a dialect of Judeo-Spanish spoken by the Sephardic Jews of northern Morocco), was displaced by French as the language of instruction. After an initial episode of reversible attrition, various phenomena of translanguaging are observed within the host society. Following the pairing “emotion-interrelational space”, we seek to discern the affective contexts associated with the languages of a multilingual childhood, and which emotional links are vital for maintaining inherited ones. This shift towards valuing affective culture implies a reorientation of the gaze towards everyday experiences as a means of research in contexts of language contact.

(This article belongs to the Special Issue Exploring Linguistic Boundaries: From the Acquisition of Languages to Multilingual Practices)

20 pages, 475 KiB

Open Access Article

Hierarchical Modeling and Analysis of an International Conflict Based on Hesitant Fuzzy Linguistic Term Sets

by Junji Hao, Bingfeng Ge, Yuming Huang, Zeqiang Hou, Tianjiao Yang and Wanying Wei

Systems 2025, 13(7), 557; https://doi.org/10.3390/systems13070557 - 8 Jul 2025

Viewed by 198

Abstract

In this article, to address the uncertainty of preference information in interrelated real-world conflicts, a hierarchical conflict modeling and analysis approach based on hesitant fuzzy linguistic term sets (HFLTSs) is proposed. First, considering the hesitancy and fuzziness of decision makers (DMs) when expressing preferences in hierarchical conflicts, a preference representation approach based on HFLTSs is introduced. Building upon hesitant fuzzy linguistic preferences, four distinct stability definitions of the two-level hierarchical graph model for conflict resolution (HGMCR) are extended to the hesitant fuzzy setting, and a corresponding algorithm is developed to solve for the global conflict hesitant fuzzy equilibrium states. Finally, the approach is applied to investigate the outbreak and development of a specific international conflict, verifying its feasibility and effectiveness. The hesitant fuzzy equilibrium states of this conflict indicate that the attitudes of domestic forces reflect a nation’s performance in the war and that the conflict may endure for an extended duration. The HFLTS-based hierarchical conflict modeling and analysis approach allows DMs to express the hesitation and fuzziness of preferences under uncertainty, facilitates comprehension of the intrinsic logic behind interactions among DMs at various levels, and enables the analysis to reach more foresighted equilibria.

(This article belongs to the Section Systems Practice in Social Science)

21 pages, 1772 KiB

Open Access Article

Through Their Eyes: Journalists’ Perspectives on Framing, Bias, and Ethics in Media Coverage of Minorities

by Panagiota (Naya) Kalfeli, Christina Angeli and Christos Frangonikolopoulos

Journal. Media 2025, 6(3), 98; https://doi.org/10.3390/journalmedia6030098 - 8 Jul 2025

Viewed by 563

Abstract

Global data reveal ongoing inequalities faced by minorities, often reinforced by media portrayals that depict them as threats, victims, or passive individuals without agency. While media framing has been extensively studied, especially in terms of media content and representation, few studies have examined how journalists perceive and navigate the coverage of minorities. This study addresses that gap by examining how Greek journalists perceive mainstream media coverage of refugees and migrants, LGBTQ+ individuals, and people with mental health challenges, with particular attention to their sourcing practices and sense of ethical responsibility. Fourteen journalists participated in semi-structured interviews, and thematic analysis was applied to identify key patterns. Journalists described dominant media narratives as fragmented, stereotypical, and dehumanizing, noting frequent linguistic inaccuracies, misinformation, and the absence of personal stories. At the same time, they reported opportunities within their own sourcing practices to promote more inclusive and accurate coverage. Ethical concerns were expressed at three levels (union, corporate, and personal), with calls for clearer editorial guidelines and dedicated training. Many participants emphasized personal ethics as a guiding compass in navigating complex newsroom pressures.

28 pages, 1969 KiB

Open Access Feature Paper Article

A Fuzzy-XAI Framework for Customer Segmentation and Risk Detection: Integrating RFM, 2-Tuple Modeling, and Strategic Scoring

by Gabriel Marín Díaz

Mathematics 2025, 13(13), 2141; https://doi.org/10.3390/math13132141 - 30 Jun 2025

Viewed by 315

Abstract

This article presents an interpretable framework for customer segmentation and churn risk detection, integrating fuzzy clustering, explainable AI (XAI), and strategic scoring. The process begins with Fuzzy C-Means (FCM) applied to normalized RFM indicators (Recency, Frequency, Monetary), which are then mapped to a 2-tuple linguistic scale to enhance semantic interpretability. Cluster memberships and centroids are analyzed to identify distinct behavioral patterns. An XGBoost classifier is trained to validate the coherence of the fuzzy segments, while SHAP and LIME provide global and local explanations for the classification decisions. Following segmentation, an AHP-based strategic score is computed for each customer, using weights derived from pairwise comparisons that reflect organizational priorities. These scores are also translated into the 2-tuple domain, reinforcing interpretability. The model then identifies customers at risk of disengagement, defined by a combination of low Recency, high Frequency and Monetary values, and a low AHP score. Based on Recency thresholds, customers are classified as Active, Latent, or Probable Churn. A second XGBoost model is applied to predict this risk level, with SHAP used to explain its predictive behavior. Overall, the proposed framework integrates fuzzy logic, semantic representation, and explainable AI to support actionable, transparent, and human-centered customer analytics.
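
The 2-tuple linguistic model mentioned in the abstract represents a numeric value as a pair (label, α), where the label is the nearest term on a linguistic scale and α in [-0.5, 0.5) is the symbolic translation from that term. A minimal sketch of the standard Herrera and Martínez translation, assuming inputs already normalized to [0, 1] and a hypothetical five-term scale (the paper's actual label set and granularity are not given in the abstract):

```python
# Hypothetical five-term scale; the paper's label set may differ
LABELS = ["Very Low", "Low", "Medium", "High", "Very High"]


def to_two_tuple(value: float, labels: list[str] = LABELS) -> tuple[str, float]:
    """Translate a normalized value in [0, 1] into a (label, alpha) 2-tuple."""
    if not 0.0 <= value <= 1.0:
        raise ValueError("value must be normalized to [0, 1]")
    g = len(labels) - 1
    beta = value * g                # position on the scale [0, g]
    index = int(beta + 0.5)         # round half up, keeping alpha in [-0.5, 0.5)
    alpha = round(beta - index, 4)  # symbolic translation from the nearest label
    return labels[index], alpha


def from_two_tuple(label: str, alpha: float, labels: list[str] = LABELS) -> float:
    """Inverse translation back to a normalized value in [0, 1]."""
    return (labels.index(label) + alpha) / (len(labels) - 1)


print(to_two_tuple(0.7))
print(to_two_tuple(0.3))
print(from_two_tuple("High", -0.2))
```

For example, a normalized Frequency score of 0.7 maps to ("High", -0.2), read as "slightly below High"; the inverse function recovers the numeric value, which is what lets the linguistic labels stay interpretable without discarding precision.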

19 pages, 457 KiB

Open Access Article

Transinger: Cross-Lingual Singing Voice Synthesis via IPA-Based Phonetic Alignment

by Chen Shen, Lu Zhao, Cejin Fu, Bote Gan and Zhenlong Du

Sensors 2025, 25(13), 3973; https://doi.org/10.3390/s25133973 - 26 Jun 2025

Viewed by 534

Abstract

Although Singing Voice Synthesis (SVS) has revolutionized audio content creation, global linguistic diversity remains a challenge. Current SVS research shows scant exploration of cross-lingual generalization, as fragmented, language-specific phoneme encodings (e.g., Pinyin, ARPA) hinder unified phonetic modeling. To address this challenge, we built a four-language dataset based on GTSinger’s speech data, using the International Phonetic Alphabet (IPA) for consistent phonetic representation and applying precise segmentation and calibration to improve quality. In particular, we propose a novel method of decomposing IPA phonemes into letters and diacritics, enabling the model to learn the underlying rules of pronunciation and generalize better. A dynamic IPA adaptation strategy further enables learned phonetic representations to be applied to unseen languages. Based on VISinger2, we introduce Transinger, a cross-lingual synthesis framework that models pronunciation precisely at the level of phoneme representations, enabling compositional generalization to unseen languages. It also integrates Conformer and RVQ techniques to optimize information extraction and generation, achieving strong cross-lingual synthesis performance. Objective and subjective experiments confirm that Transinger significantly outperforms state-of-the-art singing synthesis methods in cross-lingual generalization. These results demonstrate that multilingual aligned representations can markedly enhance model learning efficacy and robustness, even for languages not seen during training, and that splitting IPA phonemes into letters and diacritics allows the model to learn pronunciation more effectively, substantially improving generalization.

(This article belongs to the Special Issue Advances in Automatic Speech Recognition, Audio and Underwater Acoustic Signal Analysis)
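
The idea of decomposing IPA phonemes into letters and diacritics can be approximated with Unicode normalization: canonical decomposition (NFD) splits precomposed symbols, and Unicode general categories separate base letters from combining marks and modifier letters. This is a heuristic sketch, not Transinger's actual tokenizer; treating categories Mn, Lm, and Sk as diacritics is an assumption.

```python
import unicodedata

# Assumed diacritic categories: nonspacing marks, modifier letters, modifier symbols
DIACRITIC_CATEGORIES = {"Mn", "Lm", "Sk"}


def split_ipa(phoneme: str) -> tuple[list[str], list[str]]:
    """Split one IPA phoneme into base letters and diacritic symbols."""
    letters, diacritics = [], []
    for ch in unicodedata.normalize("NFD", phoneme):
        if unicodedata.category(ch) in DIACRITIC_CATEGORIES:
            diacritics.append(ch)
        else:
            letters.append(ch)
    return letters, diacritics


examples = [
    "t\u02b0",        # aspirated t (modifier letter small h)
    "\u1ebd",         # precomposed nasalized e, decomposed by NFD
    "a\u02d0",        # long a (triangular length mark)
    "t\u0361\u0283",  # affricate with tie bar
]
for phoneme in examples:
    print(phoneme, split_ipa(phoneme))
```

Under this split, "tʰ" and "pʰ" share the aspiration symbol "ʰ" as a separate unit, which illustrates the kind of shared sub-phonemic structure that could let a model transfer pronunciation knowledge to phonemes it has not seen.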

24 pages, 595 KiB

Open Access Article

An Empirical Comparison of Machine Learning and Deep Learning Models for Automated Fake News Detection

by Yexin Tian, Shuo Xu, Yuchen Cao, Zhongyan Wang and Zijing Wei

Mathematics 2025, 13(13), 2086; https://doi.org/10.3390/math13132086 - 25 Jun 2025

Viewed by 497

Abstract

Detecting fake news is a critical challenge in natural language processing (NLP), demanding solutions that balance accuracy, interpretability, and computational efficiency. Despite advances in NLP, systematic empirical benchmarks that directly compare classical and deep models across varying input richness, with careful attention to interpretability and computational trade-offs, remain underexplored. In this study, we systematically evaluate the mathematical foundations and empirical performance of five representative models for automated fake news classification: three classical machine learning algorithms (Logistic Regression, Random Forest, and Light Gradient Boosting Machine) and two state-of-the-art deep learning architectures, ALBERT (A Lite Bidirectional Encoder Representations from Transformers) and Gated Recurrent Units (GRUs). Leveraging the large-scale WELFake dataset, we conduct rigorous experiments under both headline-only and headline-plus-content input scenarios, providing a comprehensive assessment of each model’s capability to capture linguistic, contextual, and semantic cues. We analyze each model’s optimization framework, decision boundaries, and feature importance mechanisms, highlighting the empirical trade-offs between representational capacity, generalization, and interpretability. Our results show that transformer-based models, especially ALBERT, achieve state-of-the-art performance (macro F1 up to 0.99) with rich context, while classical ensembles remain viable for resource-constrained settings. These findings directly inform practical fake news detection.

(This article belongs to the Special Issue Mathematical Foundations in NLP: Applications and Challenges)
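
Macro F1, the metric quoted for ALBERT, averages per-class F1 scores so that the fake and real classes count equally regardless of how often each occurs. A dependency-free sketch follows (scikit-learn's `f1_score(..., average="macro")` computes the same quantity); the labels are toy values, not WELFake data.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (0.0 for a class with no TP, FP, or FN support)."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_scores.append(f1)
    return sum(f1_scores) / len(f1_scores)


# Toy predictions: one fake article is misclassified as real
y_true = ["fake", "fake", "real", "real"]
y_pred = ["fake", "real", "real", "real"]
print(round(macro_f1(y_true, y_pred), 4))
```

Here the fake class scores F1 = 2/3 and the real class 0.8, so macro F1 is their plain average rather than a frequency-weighted one.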

22 pages, 494 KiB

Open Access Article

Invaders and Containers: Cognitive Representations of Biological and Particulate Matter (bioPM)

by Andrew S. Mitchell, Mark Lemon and Gillian H. Drew

Pollutants 2025, 5(3), 17; https://doi.org/10.3390/pollutants5030017 - 24 Jun 2025

Abstract

Air quality management concerns the assessment, analysis, and mitigation strategies associated with ensuring that air is breathable and non-toxic. Successful management is a cognitively intensive, knowledge-focused task that brings together multiple sources of information to develop a shared understanding of a problem. To operate effectively in this space, managers and operational teams share common points of reference when discussing problems and solutions, strategies, tactical briefings, and so on, and communication and technical language use are key to the discipline. However, few studies have focused on the language communities of air quality management discourse, and fewer still have used this discourse to gain insight into the cognitive processes underpinning the production of salient operational knowledge. This paper draws on a discussion from a multi-stakeholder workshop on bioaerosols and the built environment and applies Cognitive Linguistics to systematically examine how the different stakeholders' representations are cognitively structured. The approach is then explored as a contribution to good practice in air quality knowledge management and communication that is consistent with research in cognitive and learning science and has potential for policy formulation.

(This article belongs to the Section Environmental Systems and Management)

26 pages, 1859 KiB

Open Access Article

Domestication of Source Text in Literary Translation Prevails over Foreignization

by Emilio Matricciani

Analytics 2025, 4(3), 17; https://doi.org/10.3390/analytics4030017 - 20 Jun 2025

Abstract

Domestication is a translation theory in which the source text (the text to be translated) is matched to the foreign reader by erasing its original linguistic and cultural differences. This matching aims to make the target text (the translated text) more fluent. Foreignization, by contrast, is a translation theory in which the foreign reader is matched to the source text. This paper mathematically explores the degree of domestication versus foreignization in current translation practice for texts written in alphabetical languages. A geometrical representation of texts, based on linear combinations of deep-language parameters, allows us (a) to calculate a domestication index, which measures how much domestication is applied to the source text, and (b) to distinguish language families. An expansion index measures the relative spread around mean values. This paper reports statistics and results on translations of (a) books of the Greek New Testament into Latin and into 35 modern languages belonging to diverse language families, and (b) English novels into Western languages. English and French, although attributed to different language families, almost coincide mathematically. Because the target text is required to be more fluent, domestication is universally adopted, to varying degrees, so that a blind comparison of the same linguistic parameters in a text and its translation would hardly indicate that one refers to the other.

Search Results (255)
