Search Results (1,137)

Search Parameters:
Keywords = linguistic models

23 pages, 978 KiB  
Article
Emotional Analysis in a Morphologically Rich Language: Enhancing Machine Learning with Psychological Feature Lexicons
by Ron Keinan, Efraim Margalit and Dan Bouhnik
Electronics 2025, 14(15), 3067; https://doi.org/10.3390/electronics14153067 - 31 Jul 2025
Abstract
This paper explores emotional analysis in Hebrew texts, focusing on improving machine learning techniques for depression detection by integrating psychological feature lexicons. Hebrew’s complex morphology makes emotional analysis challenging, and this study seeks to address that by combining traditional machine learning methods with sentiment lexicons. The dataset consists of over 350,000 posts from 25,000 users on the health-focused social network “Camoni” from 2010 to 2021. Various machine learning models—SVM, Random Forest, Logistic Regression, and Multi-Layer Perceptron—were used, alongside ensemble techniques like Bagging, Boosting, and Stacking. TF-IDF was applied for feature selection, with word and character n-grams, and pre-processing steps like punctuation removal, stop word elimination, and lemmatization were performed to handle Hebrew’s linguistic complexity. The models were enriched with sentiment lexicons curated by professional psychologists. The study demonstrates that integrating sentiment lexicons significantly improves classification accuracy. Specific lexicons—such as those for negative and positive emojis, hostile words, anxiety words, and no-trust words—were particularly effective in enhancing model performance. Our best model classified depression with an accuracy of 84.1%. These findings offer insights into depression detection, suggesting that practitioners in mental health and social work can improve their machine learning models for detecting depression in online discourse by incorporating emotion-based lexicons. The societal impact of this work lies in its potential to improve the detection of depression in online Hebrew discourse, offering more accurate and efficient methods for mental health interventions in online communities. Full article
(This article belongs to the Special Issue Techniques and Applications of Multimodal Data Fusion)
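As a rough illustration of the lexicon-enrichment idea in the abstract above, the sketch below appends counts from psychological lexicons to a plain term-frequency vector. The lexicon entries, vocabulary, and feature layout are invented for illustration; the study's actual lexicons were curated by professional psychologists and applied to Hebrew text.

```python
# Sketch: augmenting term-frequency features with lexicon counts.
# Lexicon contents here are hypothetical stand-ins.
from collections import Counter

ANXIETY_WORDS = {"afraid", "worried", "panic"}   # hypothetical entries
HOSTILE_WORDS = {"hate", "angry"}                # hypothetical entries

def lexicon_features(tokens):
    """Count how many tokens fall in each psychological lexicon."""
    counts = Counter()
    for tok in tokens:
        if tok in ANXIETY_WORDS:
            counts["anxiety"] += 1
        if tok in HOSTILE_WORDS:
            counts["hostile"] += 1
    return [counts["anxiety"], counts["hostile"]]

def featurize(tokens, vocab):
    """Term-frequency vector over a fixed vocabulary plus lexicon counts."""
    tf = [tokens.count(w) for w in vocab]
    return tf + lexicon_features(tokens)

vocab = ["i", "feel", "afraid", "today"]
vec = featurize("i feel afraid and worried today".split(), vocab)
```

A classifier such as an SVM would then be trained on these concatenated vectors.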

30 pages, 940 KiB  
Article
Language Contact and Population Contact as Sources of Dialect Similarity
by Jonathan Dunn and Sidney Wong
Languages 2025, 10(8), 188; https://doi.org/10.3390/languages10080188 - 31 Jul 2025
Abstract
This paper creates a global similarity network between city-level dialects of English in order to determine whether external factors like the amount of population contact or language contact influence dialect similarity. While previous computational work has focused on external influences that contribute to phonological or lexical similarity, this paper focuses on grammatical variation as operationalized in computational construction grammar. Social media data was used to create comparable English corpora from 256 cities across 13 countries. Each sample is represented using the type frequency of various constructions. These frequency representations are then used to calculate pairwise similarities between city-level dialects; a prediction-based evaluation shows that these similarity values are highly accurate. Linguistic similarity is then compared with four external factors: (i) the amount of air travel between cities, a proxy for population contact, (ii) the difference in the linguistic landscapes of each city, a proxy for language contact, (iii) the geographic distance between cities, and (iv) the presence of political boundaries separating cities. The results show that, while all these factors are significant, the best model relies on language contact and geographic distance. Full article
(This article belongs to the Special Issue Dialectal Dynamics)
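The pairwise-similarity step described above can be sketched as cosine similarity over construction-frequency vectors; the paper's exact similarity measure is not specified here, and the city profiles below are invented.

```python
# Sketch: dialect similarity from construction type frequencies,
# using cosine similarity as one standard choice of measure.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Hypothetical construction-frequency profiles for three cities.
profiles = {
    "london":  [12, 3, 7, 0],
    "sydney":  [10, 4, 6, 1],
    "chicago": [2, 9, 1, 8],
}
sim = cosine(profiles["london"], profiles["sydney"])
```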

16 pages, 1932 KiB  
Article
Parsing Old English with Universal Dependencies—The Impacts of Model Architectures and Dataset Sizes
by Javier Martín Arista, Ana Elvira Ojanguren López and Sara Domínguez Barragán
Big Data Cogn. Comput. 2025, 9(8), 199; https://doi.org/10.3390/bdcc9080199 - 30 Jul 2025
Abstract
This study presents the first systematic empirical comparison of neural architectures for Universal Dependencies (UD) parsing in Old English, thus addressing central questions in computational historical linguistics and low-resource language processing. We evaluate three approaches—a baseline spaCy pipeline, a pipeline with a pretrained tok2vec component, and a MobileBERT transformer-based model—across datasets ranging from 1000 to 20,000 words. Our results demonstrate that the pretrained tok2vec model consistently outperforms alternatives, because it achieves 83.24% UAS and 74.23% LAS with the largest dataset, whereas the transformer-based approach substantially underperforms despite higher computational costs. Performance analysis reveals that basic tagging tasks reach 85–90% accuracy, while dependency parsing achieves approximately 75% accuracy. We identify critical scaling thresholds, with substantial improvements occurring between 1000 and 5000 words and diminishing returns beyond 10,000 words, which provides insights into scaling laws for historical languages. Technical analysis reveals that the poor performance of the transformer stems from parameter-to-data ratio mismatches (1250:1) and the unique orthographic and morphological characteristics of Old English. These findings defy assumptions about transformer superiority in low-resource scenarios and establish evidence-based guidelines for researchers working with historical languages. The broader significance of this study extends to enabling an automated analysis of three million words of extant Old English texts and providing a framework for optimal architecture selection in data-constrained environments. Our results suggest that medium-complexity architectures with monolingual pretraining offer superior cost–benefit trade-offs compared to complex transformer models for historical language processing. Full article
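The UAS and LAS figures quoted above can be computed from gold and predicted dependency arcs as follows (toy arcs, not the Old English treebank):

```python
# Sketch: Unlabeled and Labeled Attachment Scores. Each token is
# (head_index, relation); UAS counts correct heads, LAS counts
# correct heads with correct relation labels.
def uas_las(gold, pred):
    assert len(gold) == len(pred)
    head_ok = sum(g[0] == p[0] for g, p in zip(gold, pred))
    both_ok = sum(g == p for g, p in zip(gold, pred))
    n = len(gold)
    return head_ok / n, both_ok / n

gold = [(2, "nsubj"), (0, "root"), (2, "obj"), (2, "advmod")]
pred = [(2, "nsubj"), (0, "root"), (2, "iobj"), (3, "advmod")]
uas, las = uas_las(gold, pred)
```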

30 pages, 37977 KiB  
Article
Text-Guided Visual Representation Optimization for Sensor-Acquired Video Temporal Grounding
by Yun Tian, Xiaobo Guo, Jinsong Wang and Xinyue Liang
Sensors 2025, 25(15), 4704; https://doi.org/10.3390/s25154704 - 30 Jul 2025
Abstract
Video temporal grounding (VTG) aims to localize a semantically relevant temporal segment within an untrimmed video based on a natural language query. The task continues to face challenges arising from cross-modal semantic misalignment, which is largely attributed to redundant visual content in sensor-acquired video streams, linguistic ambiguity, and discrepancies in modality-specific representations. Most existing approaches rely on intra-modal feature modeling, processing video and text independently throughout the representation learning stage. However, this isolation undermines semantic alignment by neglecting the potential of cross-modal interactions. In practice, a natural language query typically corresponds to spatiotemporal content in video signals collected through camera-based sensing systems, encompassing a particular sequence of frames and its associated salient subregions. We propose a text-guided visual representation optimization framework tailored to enhance semantic interpretation over video signals captured by visual sensors. This framework leverages textual information to focus on spatiotemporal video content, thereby narrowing the cross-modal gap. Built upon the unified cross-modal embedding space provided by CLIP, our model leverages video data from sensing devices to structure representations and introduces two dedicated modules to semantically refine visual representations across spatial and temporal dimensions. First, we design a Spatial Visual Representation Optimization (SVRO) module to learn spatial information within intra-frames. It selects salient patches related to the text, capturing more fine-grained visual details. Second, we introduce a Temporal Visual Representation Optimization (TVRO) module to learn temporal relations from inter-frames. Temporal triplet loss is employed in TVRO to enhance attention on text-relevant frames and capture clip semantics. 
Additionally, a self-supervised contrastive loss is introduced at the clip–text level to improve inter-clip discrimination by maximizing semantic variance during training. Experiments on Charades-STA, ActivityNet Captions, and TACoS, widely used benchmark datasets, demonstrate that our method outperforms state-of-the-art methods across multiple metrics. Full article
(This article belongs to the Section Sensing and Imaging)
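A minimal sketch of a margin-based triplet loss of the kind TVRO employs over frames; the margin, distance function, and embedding values below are assumptions, not details from the paper.

```python
# Sketch: margin-based triplet loss over frame embeddings.
def l2(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

def triplet_loss(anchor, positive, negative, margin=0.5):
    """Pull the text-relevant (positive) frame toward the anchor and
    push the irrelevant (negative) frame away, up to the margin."""
    return max(0.0, l2(anchor, positive) - l2(anchor, negative) + margin)

text_emb   = [1.0, 0.0]   # query embedding (toy values)
relevant   = [0.9, 0.1]   # frame matching the query
irrelevant = [0.0, 1.0]   # off-topic frame
loss = triplet_loss(text_emb, relevant, irrelevant)
```

When the relevant frame is already much closer than the irrelevant one, the loss is zero; swapping the two frames yields a positive loss that drives training.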

20 pages, 1426 KiB  
Article
Hybrid CNN-NLP Model for Detecting LSB Steganography in Digital Images
by Karen Angulo, Danilo Gil, Andrés Yáñez and Helbert Espitia
Appl. Syst. Innov. 2025, 8(4), 107; https://doi.org/10.3390/asi8040107 - 30 Jul 2025
Abstract
This paper proposes a hybrid model that combines convolutional neural networks with natural language processing techniques for least significant bit-based steganography detection in grayscale digital images. The proposed approach identifies hidden messages by analyzing subtle alterations in the least significant bits and validates the linguistic coherence of the extracted content using a semantic filter implemented with spaCy. The system is trained and evaluated on datasets ranging from 5000 to 12,500 images per class, consistently using an 80% training and 20% validation partition. As a result, the model achieves a maximum accuracy and precision of 99.96%, outperforming recognized architectures such as Xu-Net, Yedroudj-Net, and SRNet. Unlike traditional methods, the model reduces false positives by discarding statistically suspicious but semantically incoherent outputs, which is essential in forensic contexts. Full article
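For context, the LSB embedding scheme such detectors target can be sketched in a few lines: the message lives in the lowest bit of each pixel, so flipping those bits hides a payload that ordinary inspection barely notices. The 8-bit, row-order payload format here is an assumption.

```python
# Sketch: hiding and recovering a message in least significant bits.
def extract_lsb(pixels, n_chars):
    """Read n_chars * 8 LSBs and decode them as 8-bit characters."""
    bits = [p & 1 for p in pixels[: n_chars * 8]]
    chars = []
    for i in range(0, len(bits), 8):
        byte = bits[i : i + 8]
        chars.append(chr(int("".join(map(str, byte)), 2)))
    return "".join(chars)

# Hide "hi" in the LSBs of 16 otherwise-arbitrary pixel values.
message_bits = [int(b) for c in "hi" for b in format(ord(c), "08b")]
pixels = [(200 & ~1) | bit for bit in message_bits]
recovered = extract_lsb(pixels, 2)
```

The detector's semantic filter then checks whether such extracted strings form coherent language, discarding statistically suspicious but meaningless outputs.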

16 pages, 2431 KiB  
Article
AppHerb: Language Model for Recommending Traditional Thai Medicine
by Thanawat Piyasawetkul, Suppachai Tiyaworanant and Tarapong Srisongkram
AI 2025, 6(8), 170; https://doi.org/10.3390/ai6080170 - 29 Jul 2025
Abstract
Trust in Traditional Thai Medicine (TTM) among Thai people has been reduced due to a lack of objective standards and the susceptibility of the general population to false information. The emergence of generative artificial intelligence (Gen AI) has significantly impacted various industries, including traditional medicine. However, previous Gen AI models have primarily focused on prescription generation based on Traditional Chinese Medicine (TCM), leaving TTM unexplored. To address this gap, we propose a novel fast-learning fine-tuned language model fortified with TTM knowledge. We utilized textual data from two TTM textbooks, Wat Ratcha-orasaram Ratchaworawihan (WRO) and Tamra Osot Phra Narai (NR), to fine-tune Unsloth’s Gemma-2 with 9 billion parameters. We developed two specialized TTM tasks: treatment prediction (TrP) and herbal recipe generation (HRG). The TrP and HRG models achieved precision, recall, and F1 scores of 26.54%, 28.14%, and 24.00%, and 32.51%, 24.42%, and 24.84%, respectively. Performance evaluation against TCM-based generative models showed comparable precision, recall, and F1 results with a smaller knowledge corpus. We further addressed the challenges of utilizing Thai, a low-resource and linguistically complex language. Unlike English or Chinese, Thai lacks explicit sentence boundary markers and employs an abugida writing system without spaces between words, complicating text segmentation and generation. These characteristics pose significant difficulties for machine understanding and limit model accuracy. Despite these obstacles, our work establishes a foundation for further development of AI-assisted TTM applications and highlights both the opportunities and challenges in applying language models to traditional medicine knowledge systems in Thai language contexts. Full article
(This article belongs to the Section Medical & Healthcare AI)
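The F1 scores reported above are harmonic means of precision and recall; as a reminder of the metric (note that per-class averaged scores, as abstracts typically report, need not satisfy this identity exactly):

```python
# Sketch: F1 as the harmonic mean of precision and recall.
def f1(precision, recall):
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

score = f1(0.5, 0.25)
```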

25 pages, 3625 KiB  
Article
Automated Classification of Public Transport Complaints via Text Mining Using LLMs and Embeddings
by Daniyar Rakhimzhanov, Saule Belginova and Didar Yedilkhan
Information 2025, 16(8), 644; https://doi.org/10.3390/info16080644 - 29 Jul 2025
Abstract
The proliferation of digital public service platforms and the expansion of e-government initiatives have significantly increased the volume and diversity of citizen-generated feedback. This trend emphasizes the need for classification systems that are not only tailored to specific administrative domains but also robust to the linguistic, contextual, and structural variability inherent in user-submitted content. This study investigates the comparative effectiveness of large language models (LLMs) alongside instruction-tuned embedding models in the task of categorizing public transportation complaints. LLMs were tested using few-shot inference, where classification is guided by a small set of in-context examples. Embedding models were assessed under three paradigms: label-only zero-shot classification, instruction-based classification, and supervised fine-tuning. Results indicate that fine-tuned embeddings can achieve or exceed the accuracy of LLMs, reaching up to 90 percent, while offering significant reductions in inference latency and computational overhead. E5 embeddings showed consistent generalization across unseen categories and input shifts, whereas BGE-M3 demonstrated measurable gains when adapted to task-specific distributions. Instruction-based classification produced lower accuracy for both models, highlighting the limitations of prompt conditioning in isolation. These findings position multilingual embedding models as a viable alternative to LLMs for classification at scale in data-intensive public sector environments. Full article
(This article belongs to the Special Issue Text Mining: Challenges, Algorithms, Tools and Applications)
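The label-only zero-shot paradigm mentioned above amounts to nearest-label search in embedding space. The sketch below substitutes a toy keyword-count embed() for a real encoder such as E5, so only the selection logic is faithful; all words and categories are invented.

```python
# Sketch: zero-shot classification by embedding similarity —
# embed the complaint and each label, pick the closest label.
import math

def embed(text):
    # Hypothetical 2-dim "embedding": counts of two keyword families.
    delay_words = {"late", "delay", "waiting"}
    fare_words = {"fare", "ticket", "price"}
    toks = text.lower().split()
    return [sum(t in delay_words for t in toks),
            sum(t in fare_words for t in toks)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

labels = {"delays": "bus late delay", "fares": "ticket fare price"}
complaint = "the bus is always late and I am waiting forever"
best = max(labels, key=lambda k: cosine(embed(complaint), embed(labels[k])))
```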

19 pages, 2176 KiB  
Article
Secrets of More Likes: Understanding eWOM Popularity in Wine Tourism Reviews Through Text Complexity and Personal Disclosure
by Jie Zheng, Xi Wang and Yaning Mao
Tour. Hosp. 2025, 6(3), 145; https://doi.org/10.3390/tourhosp6030145 - 29 Jul 2025
Abstract
Online reviews increasingly shape experiential travel decisions. This study investigates how structural and linguistic features of user-generated content influence peer endorsement in wine tourism. While prior research has explored review valence and credibility, limited attention has been paid to how micro-level textual and identity cues affect social approval metrics such as likes. Grounded in the Elaboration Likelihood Model, the analysis draws on 7942 TripAdvisor reviews using automated web scraping, readability metrics, and multivariate regression. Results indicate that location disclosure significantly increases likes, while higher textual complexity reduces endorsement. Title length and reviewer contributions function as peripheral cues, with an interaction between complexity and title length compounding cognitive effort. Findings refine dual-process persuasion theory and offer practical insights for content optimization in post-pandemic tourism engagement. Full article
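As one example of the readability metrics such studies use, the Flesch Reading Ease formula scores text from word, sentence, and syllable counts. The paper does not state which metric it applied, and the syllable count below is supplied by hand, since real readability tools estimate it.

```python
# Sketch: Flesch Reading Ease — higher scores mean easier text.
def flesch_reading_ease(words, sentences, syllables):
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

score = flesch_reading_ease(words=20, sentences=2, syllables=28)
```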

15 pages, 336 KiB  
Article
Mitigation, Rapport, and Identity Construction in Workplace Requests
by Spyridoula Bella
Languages 2025, 10(8), 179; https://doi.org/10.3390/languages10080179 - 25 Jul 2025
Abstract
This study investigates how Greek professionals formulate upward requests and simultaneously manage rapport and workplace identity within hierarchical exchanges. The data comprise 400 written requests elicited through a discourse-completion task from 100 participants, supplemented by follow-up interviews. Integrating pragmatic perspectives on request mitigation with Spencer-Oatey’s Rapport-Management model and a social constructionist perspective on identity, the analysis reveals a distinctive “direct-yet-mitigated” style: syntactically direct head acts (typically want- or need-statements) softened by various mitigating devices. This mitigation enables speakers to preserve superiors’ face, assert entitlement, and invoke shared corporate goals in a single move. Crucially, rapport work is intertwined with identity construction. Strategic oscillation between deference and entitlement projects four recurrent professional personae: the deferential subordinate, the competent and deserving employee, the cooperative team-player, and the rights-aware negotiator. Speakers shift among these personae to calibrate relational distance, demonstrating that rapport management functions not merely as a politeness calculus but as a resource for dynamic identity performance. This study thus bridges micro-pragmatic choices and macro social meanings, showing how linguistic mitigation safeguards interpersonal harmony while scripting desirable workplace selves. Full article
(This article belongs to the Special Issue Greek Speakers and Pragmatics)
34 pages, 9281 KiB  
Article
A Statistical Framework for Modeling Behavioral Engagement via Topic and Psycholinguistic Features: Evidence from High-Dimensional Text Data
by Dan Li and Yi Zhang
Mathematics 2025, 13(15), 2374; https://doi.org/10.3390/math13152374 - 24 Jul 2025
Abstract
This study investigates how topic-specific expression by women delivery riders on digital platforms predicts their community engagement, emphasizing the mediating role of self-disclosure and the moderating influence of cognitive and emotional language features. Using unsupervised topic modeling (Top2Vec, Topical Vectors via Embeddings and Clustering) and psycholinguistic analysis (LIWC, Linguistic Inquiry and Word Count), the paper extracted eleven thematic clusters and quantified self-disclosure intensity, cognitive complexity, and emotional polarity. A moderated mediation model was constructed to estimate the indirect and conditional effects of topic probability on engagement behaviors (likes, comments, and views) via self-disclosure. The results reveal that self-disclosure significantly mediates the influence of topical content on engagement, with emotional negativity amplifying and cognitive complexity selectively enhancing this pathway. Indirect effects differ across topics, highlighting the heterogeneous behavioral salience of expressive themes. The findings support a statistically grounded, semantically interpretable framework for predicting user behavior in high-dimensional text environments. This approach offers practical implications for optimizing algorithmic content ranking and fostering equitable visibility for marginalized digital labor groups. Full article
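The moderated mediation logic above can be sketched as a conditional indirect effect: the topic-to-disclosure path a multiplied by a disclosure-to-engagement slope that shifts with the moderator. All coefficients below are invented, not estimates from the study.

```python
# Sketch: conditional indirect effect a * (b0 + b1 * w) in a
# moderated mediation model, with w as the moderator value.
def indirect_effect(a, b0, b1, moderator):
    return a * (b0 + b1 * moderator)

low  = indirect_effect(0.4, 0.5, 0.3, -1.0)   # low emotional negativity
high = indirect_effect(0.4, 0.5, 0.3, +1.0)   # high emotional negativity
```

The pattern the paper reports — emotional negativity amplifying the pathway — corresponds to the high-moderator effect exceeding the low one.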

26 pages, 3526 KiB  
Article
All Roads Lead to Excellence: A Comparative Scientometric Assessment of French and Dutch European Research Council Grant Winners’ Academic Performance in the Domain of Social Sciences and Humanities
by Gergely Ferenc Lendvai, Petra Aczél and Péter Sasvári
Publications 2025, 13(3), 34; https://doi.org/10.3390/publications13030034 - 24 Jul 2025
Abstract
This study investigates how differing national research governance models impact academic performance by comparing European Research Council (ERC) grant winners in the social sciences and humanities from France and the Netherlands. Situated within the broader context of centralized versus decentralized research systems, the analysis aims to understand how these structures shape publication trends, thematic diversity, and collaboration patterns. Drawing on Scopus and SciVal data covering 9996 publications by 305 ERC winners between 2019 and 2023, we employed a multi-method approach, including latent Dirichlet allocation for topic modeling, compound annual growth rate analysis, and co-authorship network analysis. The results show that neuroscience, climate change, and psychology are dominant domains, with language and linguistics particularly prevalent in France and law and political science in the Netherlands. French ERC winners are more likely to be affiliated with national or sectoral institutions, whereas in the Netherlands, elite universities dominate. Collaboration emerged as a key success factor, with an average of four co-authors per publication and network analyses revealing central figures who bridge topical clusters. International collaborations were consistently linked with higher visibility, while single-authored publications showed limited impact. These findings suggest that institutional context and collaborative practices significantly shape research performance in both countries. Full article
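The compound annual growth rate analysis mentioned above follows the standard formula; the publication counts below are illustrative, not the study's data.

```python
# Sketch: CAGR = (last / first) ** (1 / years) - 1.
def cagr(first, last, years):
    return (last / first) ** (1 / years) - 1

# Publications per year, 2019-2023 (invented numbers).
counts = {2019: 100.0, 2023: 146.41}
growth = cagr(counts[2019], counts[2023], 2023 - 2019)
```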

22 pages, 1346 KiB  
Article
Understanding Video Narratives Through Dense Captioning with Linguistic Modules, Contextual Semantics, and Caption Selection
by Dvijesh Bhatt and Priyank Thakkar
AI 2025, 6(8), 166; https://doi.org/10.3390/ai6080166 - 23 Jul 2025
Abstract
Dense video captioning involves identifying, localizing, and describing multiple events within a video. Capturing temporal and contextual dependencies between events is essential for generating coherent and accurate captions. To effectively capture temporal and contextual dependencies between events, we propose Dense Video Captioning with Dual Contextual, Systematic, and Linguistic Modules (DVC-DCSL), a novel dense video captioning model that integrates contextual, semantic, and linguistic modules. The proposed approach employs two uni-directional LSTMs (forward and backward) to generate distinct captions for each event. A caption selection mechanism then processes these outputs to determine the final caption. In addition, contextual alignment is improved by incorporating visual and textual features from previous video segments into the captioning module, ensuring smoother narrative transitions. Comprehensive experiments conducted using the ActivityNet dataset demonstrate that DVC-DCSL increases the Meteor score from 11.28 to 12.71, representing a 12% improvement over state-of-the-art models in the field of dense video captioning. These results highlight the effectiveness of the proposed approach in improving dense video captioning quality through contextual and linguistic integration. Full article

33 pages, 9781 KiB  
Article
Spatial Narrative Optimization in Digitally Gamified Architectural Scenarios
by Deshao Wang, Jieqing Xu and Luwang Chen
Buildings 2025, 15(15), 2597; https://doi.org/10.3390/buildings15152597 - 23 Jul 2025
Abstract
Currently, exploring digital immersive experiences is a new trend in the innovation and development of cultural tourism. This study addresses the growing demand for digital immersion in cultural tourism by examining the integration of spatial narrative and digitally gamified architectural scenarios. This study synthesizes an optimized framework for narrative design in digitally gamified architectural scenarios, integrating spatial narrative theory and feedback-informed design. The proposed model comprises four key components: (1) developing spatial narrative design methods for such scenarios; (2) constructing a spatial language system for spatial narratives using linguistic principles to organize narrative expression; (3) building a preliminary digitally gamified scenario based on the “Wuhu Jiaoji Temple Renovation Project” after architectural and environmental enhancements; and (4) optimization through thermal feedback experiments—collecting visitor trajectory heatmaps, eye-tracking heatmaps, and oculometric data. The results show that the optimized design, validated in the original game Dreams of Jiaoji, effectively enhanced spatial narrative execution by refining both on-site and in-game architectural scenarios. Post-optimization visitor feedback confirmed the validity of the proposed optimization strategies and principles, providing theoretical and practical references for innovative digital cultural tourism models and architectural design advancements. In the context of site-specific architectural conservation, this approach achieves two key objectives: the generalized interpretation of architectural cultural resources and their visual representation through gamified interactions. This paradigm not only enhances public engagement through enabling a multidimensional understanding of historical building cultures but also accelerates the protective reuse of heritage sites, allowing heritage value to be maximized through contemporary reinterpretation. 
The interdisciplinary methodology promotes sustainable development in the digital transformation of cultural tourism, fostering user-centered experiences and contributing to rural revitalization. Ultimately, this study highlights the potential use of digitally gamified architectural scenarios as transformative tools for heritage preservation, cultural dissemination, and rural community revitalization. Full article
(This article belongs to the Section Architectural Design, Urban Science, and Real Estate)

32 pages, 2182 KiB  
Article
Detection of Biased Phrases in the Wiki Neutrality Corpus for Fairer Digital Content Management Using Artificial Intelligence
by Abdullah, Muhammad Ateeb Ather, Olga Kolesnikova and Grigori Sidorov
Big Data Cogn. Comput. 2025, 9(7), 190; https://doi.org/10.3390/bdcc9070190 - 21 Jul 2025
Abstract
Detecting biased language in large-scale corpora, such as the Wiki Neutrality Corpus, is essential for promoting neutrality in digital content. This study systematically evaluates a range of machine learning (ML) and deep learning (DL) models for the detection of biased and pre-conditioned phrases. Conventional classifiers, including Extreme Gradient Boosting (XGBoost), Light Gradient-Boosting Machine (LightGBM), and Categorical Boosting (CatBoost), are compared with advanced neural architectures such as Bidirectional Encoder Representations from Transformers (BERT), Long Short-Term Memory (LSTM) networks, and Generative Adversarial Networks (GANs). A novel hybrid architecture is proposed, integrating DistilBERT, LSTM, and GANs within a unified framework. Extensive experimentation with intermediate variants DistilBERT + LSTM (without GAN) and DistilBERT + GAN (without LSTM) demonstrates that the fully integrated model consistently outperforms all alternatives. The proposed hybrid model achieves a cross-validation accuracy of 99.00%, significantly surpassing traditional baselines such as XGBoost (96.73%) and LightGBM (96.83%). It also exhibits superior stability, statistical significance (paired t-tests), and favorable trade-offs between performance and computational efficiency. The results underscore the potential of hybrid deep learning models for capturing subtle linguistic bias and advancing more objective and reliable automated content moderation systems. Full article
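The paired t-tests mentioned above compare two models' scores fold by fold; a sketch with invented cross-validation accuracies (2.776 is the two-tailed critical value at α = 0.05 for df = 4):

```python
# Sketch: paired t-statistic over per-fold accuracy differences.
import math
import statistics

hybrid  = [0.990, 0.988, 0.991, 0.989, 0.992]   # invented fold scores
xgboost = [0.967, 0.966, 0.969, 0.968, 0.965]   # invented fold scores
diffs = [h - x for h, x in zip(hybrid, xgboost)]
t_stat = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))
```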

22 pages, 2112 KiB  
Article
Cultural Diversity and the Operational Performance of Airport Security Checkpoints: An Analysis of Energy Consumption and Passenger Flow
by Jacek Ryczyński, Artur Kierzkowski, Marta Nowakowska and Piotr Uchroński
Energies 2025, 18(14), 3853; https://doi.org/10.3390/en18143853 - 20 Jul 2025
Abstract
This paper examines the operational consequences and energy demands associated with the growing cultural diversity of air travellers at airport security checkpoints. The analysis focuses on how an increasing proportion of passengers requiring enhanced security screening, due to cultural, religious, or linguistic factors, affects both system throughput and energy consumption. The methodology integrates synchronised measurement of passenger flow with real-time monitoring of electricity usage. Four operational scenarios, representing incremental shares (0–15%) of passengers subject to extended screening, were modelled. The findings indicate that a 15% increase in this passenger group leads to a statistically significant rise in average power consumption per device (3.5%), a total energy usage increase exceeding 4%, and an extension of average service time by 0.6%. The cumulative effect is a substantial annual contribution to the airport’s carbon footprint. The results also reveal a higher frequency and intensity of power consumption peaks, emphasising the need for advanced infrastructure management. The study underscores the significance of predictive analytics, dynamic resource allocation, and the implementation of energy-efficient technologies. Furthermore, systematic intercultural competency training is recommended for security staff. These insights provide a scientific basis for optimising airport security operations amid increasing passenger heterogeneity. Full article
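To see how a few-percent increase compounds over a year of operation, a back-of-envelope calculation (every input below is a placeholder, not a measured value from the study):

```python
# Sketch: scaling a ~4% energy increase to an annual figure.
base_power_kw = 5.0          # assumed average draw of one screening device
devices = 10                 # assumed number of devices
hours_per_year = 18 * 365    # assumed operating hours
increase = 0.04              # ~4% total energy increase reported above
extra_kwh = base_power_kw * devices * hours_per_year * increase
```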