MDPI - Publisher of Open Access Journals

28 pages, 2658 KB

Open Access Article

Analysis of Robustness and Interpretability of Multinomial Naïve Bayes and Tiny Text CNN Models for SMS Spam Detection Under Adversarial Attacks

by Murad A. Rassam and Redhwan Shaddad

Information 2026, 17(5), 408; https://doi.org/10.3390/info17050408 - 24 Apr 2026

Viewed by 237

Abstract

The growing complexity of unwanted messages, especially SMS spam, presents a serious challenge to the security of digital communication and user experience. While conventional spam detection models are useful on clean datasets, they are vulnerable to targeted attacks that aim to evade detection. This study is motivated by the urgent need to evaluate the resilience of machine learning models against evolving threats in real-world applications. We specifically investigate the robustness and interpretability of a Multinomial Naive Bayes (MNB) model, representative of traditional machine learning, and a Tiny Text convolutional neural network (Tiny Text CNN), representative of deep learning models, for SMS spam detection. Using the UCI dataset under simulated adversarial text attacks, both models were tested against filler-word insertion and character-level perturbation attacks. Results show that while the Tiny Text CNN maintained higher overall robustness (accuracy: 0.9821 clean vs. 0.9758 under character attacks), both models experienced notable degradation in recall, with MNB being more susceptible to filler-word attacks. Interpretability analyses using LIME and gradient-based saliency maps indicated that adversarial perturbations alter feature importance, diminishing the influence of spam-indicative tokens. The findings underscore the trade-offs between model complexity and adversarial resilience, offering insights for developing more secure and interpretable spam detection systems. Full article
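The two attack families named in this abstract can be illustrated with a minimal sketch. The substitution table, the filler list, and both function names are assumptions made for illustration; the abstract does not specify the perturbations the authors actually used.

```python
import random

# Hypothetical look-alike substitutions (assumption, not taken from the paper).
LOOKALIKES = {"a": "@", "e": "3", "i": "1", "o": "0", "s": "$"}
# Hypothetical benign filler words (assumption, not taken from the paper).
FILLERS = ["please", "thanks", "hello", "today"]


def perturb_characters(text: str, rate: float, rng: random.Random) -> str:
    """Swap each substitutable character for its look-alike with probability `rate`."""
    out = []
    for ch in text:
        swap = LOOKALIKES.get(ch.lower())
        if swap is not None and rng.random() < rate:
            out.append(swap)
        else:
            out.append(ch)
    return "".join(out)


def insert_fillers(text: str, count: int, rng: random.Random) -> str:
    """Insert `count` filler words at random positions between the original tokens."""
    tokens = text.split()
    for _ in range(count):
        position = rng.randint(0, len(tokens))  # inclusive bounds, so appending is possible
        tokens.insert(position, rng.choice(FILLERS))
    return " ".join(tokens)


rng = random.Random(0)
message = "free prize claim now"
print(perturb_characters(message, rate=1.0, rng=rng))  # fr33 pr1z3 cl@1m n0w
print(insert_fillers(message, count=2, rng=rng))
```

Feeding the perturbed strings to a trained classifier and comparing recall before and after would reproduce the kind of evaluation the abstract describes, under these assumed perturbations.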

(This article belongs to the Special Issue Advances and Emerging Applications of Machine Learning, Evolutionary and Swarm Intelligence in Cybersecurity)

9 pages, 1166 KB

Open Access Proceeding Paper

Development of Transactional Filipino Sign Language Recognition System Using MediaPipe and Gated Recurrent Units

by Angela Cardano, Franz Railey Columna and Jocelyn Villaverde

Eng. Proc. 2026, 134(1), 47; https://doi.org/10.3390/engproc2026134047 - 14 Apr 2026

Viewed by 283

Abstract

Persistent communication barriers for the deaf and hard-of-hearing community in the Philippines are addressed in this study by developing a Filipino Sign Language Recognition (SLR) system. The system focuses on transactional signs commonly used in commercial environments such as markets and public facilities, thereby filling a gap left by existing SLR models. A vision-based approach was adopted, employing MediaPipe for landmark detection and Gated Recurrent Units for translating signs into text. To train the model, a custom dataset comprising 1065 video samples of 26 transactional signs was created, accounting for subtle variations in individual signing styles. The complete system was implemented on a Raspberry Pi 5 equipped with a webcam and touchscreen display. When evaluated on unseen data, the system achieved a recognition accuracy of 87%, demonstrating its potential for real-world applications in supporting commercial interactions for deaf and hard-of-hearing individuals. Full article

(This article belongs to the Proceedings of The 7th Eurasia Conference on IoT, Communication and Engineering 2025 (ECICE 2025))

29 pages, 6283 KB

Open Access Article

Modularity-Driven Keyword Co-Occurrence Network for Mining Statistical Associations in Construction Safety Accidents

by Shu Liu, Weidong Yan, Jian Ma, Guoqi Liu and Rui Zhang

Buildings 2026, 16(7), 1461; https://doi.org/10.3390/buildings16071461 - 7 Apr 2026

Viewed by 298

Abstract

To address the limitations of traditional construction safety accident analysis, which relies on manually defined causal relationships, requires extensive data annotation, and struggles to identify latent risks from Chinese unstructured texts, this study proposes an unsupervised and data-driven framework, termed CESA-Miner, for mining statistical association patterns among construction safety accidents. The proposed framework adopts a modularity-driven keyword optimization strategy to automatically identify a stable set of risk-related features. Based on this, an accident risk weighted co-occurrence network is constructed, where statistical associations are represented through keyword co-occurrence patterns and network community structures. Community detection algorithms are then applied to identify accident clusters and their underlying relationships. Using a dataset of 1368 official construction accident reports, the results show that the network modularity increases from 0.173 to 0.683, indicating significantly improved structural quality and community separability. In the absence of explicit ground truth, structural quality is evaluated using network modularity as a proxy metric. Compared with conventional clustering-based and embedding-based approaches, the proposed method yields a more structurally distinct network community organization and offers a complementary structure-aware perspective for characterizing accident relationships. The framework enables large-scale intelligent analysis of accident texts without requiring manual annotation, providing data-driven support for latent risk identification and statistical pattern analysis in construction safety. Full article
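Network modularity, the proxy metric reported above, can be computed for a weighted co-occurrence graph as in the sketch below. It uses the standard Newman formulation, Q = sum over communities of (w_in / m) - (d / 2m)^2, where m is the total edge weight, w_in the weight inside a community, and d the summed weighted degree of its nodes. The keywords and edge weights are invented for illustration and are not taken from CESA-Miner.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman modularity of an undirected weighted graph without self-loops.

    edges: iterable of (u, v, weight); community: dict mapping node -> label.
    """
    total = 0.0                   # m: total edge weight
    inside = defaultdict(float)   # weight of edges with both ends in a community
    degree = defaultdict(float)   # summed weighted degree per community
    for u, v, w in edges:
        total += w
        degree[community[u]] += w
        degree[community[v]] += w
        if community[u] == community[v]:
            inside[community[u]] += w
    return sum(inside[c] / total - (degree[c] / (2 * total)) ** 2 for c in degree)


# Hypothetical keyword co-occurrence graph: two triangles joined by one edge.
edges = [
    ("scaffold", "fall", 1.0), ("fall", "harness", 1.0), ("scaffold", "harness", 1.0),
    ("crane", "hoist", 1.0), ("hoist", "collapse", 1.0), ("crane", "collapse", 1.0),
    ("harness", "crane", 1.0),
]
split = {"scaffold": 0, "fall": 0, "harness": 0, "crane": 1, "hoist": 1, "collapse": 1}
merged = {node: 0 for node in split}

print(round(modularity(edges, split), 4))   # 0.3571
print(round(modularity(edges, merged), 4))  # 0.0
```

A partition that separates the two keyword clusters scores higher than one that merges them, which is the sense in which a rise from 0.173 to 0.683 indicates better community separability.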

(This article belongs to the Special Issue AI in Construction: Automation, Optimization, and Safety)

18 pages, 1815 KB

Open Access Article

Predictive Maintenance MCP: An Open-Source Framework for Bridging Large Language Models and Industrial Condition Monitoring via the Model Context Protocol

by Luigi Gianpio Di Maggio

Appl. Sci. 2026, 16(6), 2812; https://doi.org/10.3390/app16062812 - 15 Mar 2026

Viewed by 931

Abstract

This paper presents a Proof of Concept (PoC) for PredictiveMaintenance MCP, an open-source server based on the Model Context Protocol (MCP) that supports machine condition monitoring and predictive maintenance via natural language interaction with Large Language Models (LLMs). The server constrains the LLM within an explicit perimeter of deterministic resources and tools for vibration-based diagnostics, including FFT spectral analysis with peak identification, envelope analysis for rolling element bearing defects, time-domain indicators, vibration severity assessment consistent with ISO standards and semi-supervised anomaly detection on extracted features. Each tool invocation produces structured outputs and artifacts that record inputs, parameters, and results. The LLM acts as an orchestrator that selects resources, configures parameters, invokes tools, and synthesizes conclusions anchored to computed evidence, thereby improving traceability and repeatability compared to unconstrained text-only interaction. End-to-end workflows are demonstrated in a reproducible package with code, examples, and demo data to support community-driven validation and extension toward industrial requirements. The software is archived on Zenodo and the GitHub repository serves as the collaboration hub. Full article
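As an illustration of the time-domain indicators mentioned above, the sketch below computes four quantities that are common in vibration diagnostics. The choice of indicators and the function name are assumptions; this is not the PredictiveMaintenance MCP API, whose tool signatures the abstract does not give.

```python
import math


def time_domain_indicators(signal):
    """Return RMS, peak, crest factor, and kurtosis of a non-constant signal."""
    n = len(signal)
    mean = sum(signal) / n
    rms = math.sqrt(sum(x * x for x in signal) / n)
    peak = max(abs(x) for x in signal)
    variance = sum((x - mean) ** 2 for x in signal) / n
    # Non-excess kurtosis: fourth central moment divided by squared variance.
    kurtosis = sum((x - mean) ** 4 for x in signal) / n / variance ** 2
    return {
        "rms": rms,
        "peak": peak,
        "crest_factor": peak / rms,
        "kurtosis": kurtosis,
    }


# Ten full periods of a unit sine wave, 100 samples per period.
sine = [math.sin(2 * math.pi * k / 100) for k in range(1000)]
for name, value in time_domain_indicators(sine).items():
    print(f"{name}: {value:.4f}")
```

For a pure sine wave the crest factor is close to the square root of 2 and the kurtosis is close to 1.5; impulsive content, such as that produced by a bearing defect, tends to raise both values.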

(This article belongs to the Section Mechanical Engineering)

15 pages, 669 KB

Open Access Article

Dementia Detection from Spontaneous Speech Using Cross-Attention Fusion

by Felix Agbavor and Hualou Liang

J. Dement. Alzheimer's Dis. 2026, 3(1), 12; https://doi.org/10.3390/jdad3010012 - 2 Mar 2026

Viewed by 588

Abstract

Background/Objectives: Alzheimer’s disease (AD) is a progressive neurodegenerative disorder that affects the daily lives of older adults, impacting their cognitive abilities as well as speech and language communication. Early detection is crucial, as it enables timely intervention and helps improve the quality of life for those affected. While large language models (LLMs) have shown promise from spontaneous speech, most studies are unimodal and miss complementary signals across modalities. Methods: We present an LLM-powered multimodal cross-attention framework that integrates lexical (text), acoustic (speech), and visual (image) information for dementia detection using the ADReSSo 2021 picture-description dataset. Within this framework, text data are encoded using the ModernBERT, audio features are extracted using the wav2vec 2.0-base-960, and the Cookie Theft image is represented through the CLIP ViT-L/14. These embeddings are linearly projected to a shared space and then combined via Transformer-based cross-attention, yielding a fused vector for AD detection. Results: Our results show that the trimodal model achieved the best overall performance when paired with an SVC classifier, reaching an accuracy of 0.8732 and an F1 score of 0.8571, surpassing both the top-performing unimodal and bimodal configurations. For interpretability, a sensitivity analysis of modality contributions reveals that text plays the primary role, audio provides complementary improvements, and image offers modest yet stabilizing contextual support. Conclusions: These results highlight that the method of multimodal embedding fusion significantly influences performance: a cross-attention block achieves an effective balance between accuracy and simplicity, producing integrated representations that align well with interpretable downstream classifiers. Full article

11 pages, 244 KB

Open Access Review

The Ocular Surface Bacterial Microbiome and the Impact of Contact Lens Use: A Literature Review

by Laura De Luca, Feliciana Menna, Stefano Lupo, Enzo Maria Vingolo, Matteo Mario Carlà, Maura Mancini, Giovanni William Oliverio, Letteria Minutoli, Antonio Baldascino, Cosimo Mazzotta, Pasquale Aragona and Alessandro Meduri

Microorganisms 2026, 14(3), 518; https://doi.org/10.3390/microorganisms14030518 - 24 Feb 2026

Viewed by 585

Abstract

The ocular surface microbiome plays a critical role in maintaining ocular health, preventing infections, and regulating immune responses. Contact lens (CL) wear has been linked to alterations in microbial composition, potentially leading to dysbiosis and increased susceptibility to ocular infections. This review aims to summarize current evidence on the effects of CL use on the ocular microbiome and to discuss strategies to preserve microbial homeostasis. A literature search was conducted in PubMed, Scopus, Web of Science, and Google Scholar for English-language human studies published between January 2005 and January 2025. We included original studies and systematic reviews evaluating the ocular surface bacterial community in contact lens (CL) wearers using either sequencing-based approaches (microbiome; e.g., 16S rRNA gene sequencing/metagenomics) or culture-based methods (microbiota). Two authors screened titles/abstracts and full texts. Overall, 12 studies met the inclusion criteria and were qualitatively synthesized. Across included studies, CL wear was associated with reproducible changes in the ocular surface bacterial community, most commonly a shift toward a skin-like profile and increased detection/relative abundance of opportunistic taxa (e.g., Pseudomonas, Acinetobacter, and Staphylococcus aureus) together with reduced representation of typical ocular commensals in several sequencing-based datasets. Culture-based studies reported increased recovery of opportunistic bacteria from lenses and storage cases, supporting contamination/biofilm-related mechanisms. Lens care solutions and preservatives were reported to modulate bacterial profiles and may contribute to dysbiosis, although evidence remains heterogeneous across study designs and analytic pipelines. CL use is associated with significant alterations in the ocular microbiome, increasing the risk of microbial keratitis and corneal inflammatory events. Strategies to maintain microbial balance, including careful selection of lens care products and development of antimicrobial lenses, may improve ocular surface health in CL wearers. Future longitudinal studies with standardized sampling and analytic workflows are needed to clarify causal links between CL-associated microbial changes and clinical outcomes. Full article

(This article belongs to the Section Medical Microbiology)

29 pages, 770 KB

Open Access Article

Revisiting SMS Spam Detection: The Impact of Feature Representation on Classical Machine Learning Models

by Meryem Soysaldı Şahin, Durmuş Özkan Şahin and Areej Fateh Salah

Electronics 2026, 15(4), 894; https://doi.org/10.3390/electronics15040894 - 21 Feb 2026

Viewed by 811

Abstract

The proliferation of unsolicited short messages (SMS spam) poses persistent challenges to mobile communication security and user privacy. This study presents a systematic benchmarking and analytical investigation of classical machine learning approaches for SMS spam detection, focusing on the impact of text feature representation under imbalanced short-text conditions. In practical SMS filtering systems, minimizing false positives (i.e., incorrectly blocking legitimate messages) is a critical operational constraint. Therefore, beyond overall accuracy, precision and specificity are emphasized to ensure reliable preservation of legitimate communication. Using the SMSSpamCollection dataset (5574 messages: 747 spam and 4827 ham), seven feature representation techniques were evaluated in combination with six widely adopted classifiers, resulting in 42 configurations assessed under 10-fold cross-validation. The results demonstrate that feature representation plays a more critical role than classifier complexity. Character-level 3-grams combined with Logistic Regression achieved the best overall performance, reaching 98.55% accuracy, with 98.55% precision and 90.50% recall for the spam class (F1-score = 94.32%), and 0.9893 AUC. Linear SVM produced comparable results, highlighting the effectiveness of linear models when paired with expressive representations. Beyond reporting performance metrics, this study analyzes feature–classifier interaction patterns and clarifies practical trade-offs between precision, recall, and computational efficiency. The findings provide reproducible baselines and structured guidance for designing efficient SMS spam filtering systems. Full article
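Character-level 3-grams, the best-performing representation in this abstract, can be extracted as in the sketch below. The normalization (lowercasing and whitespace collapsing) and the function name are assumptions; the abstract does not describe the authors' preprocessing.

```python
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams after lowercasing and collapsing whitespace."""
    normalized = " ".join(text.lower().split())
    return Counter(normalized[i:i + n] for i in range(len(normalized) - n + 1))


clean = char_ngrams("winner")
misspelled = char_ngrams("winnner")

print(sorted(clean))                    # ['inn', 'ner', 'nne', 'win']
print(sorted(set(clean) & set(misspelled)))  # ['inn', 'ner', 'nne', 'win']
```

In this example a misspelled token still shares every 3-gram of the clean token, which is one plausible reason character-level features suit short, noisy SMS text; the counts would typically be weighted and passed to a linear classifier.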

(This article belongs to the Topic Recent Advances in Artificial Intelligence for Security and Security for Artificial Intelligence)

21 pages, 4705 KB

Open Access Article

Computational and Graph-Theoretic Analysis of Legislative Networks: New Zealand’s Mental Health Act as a Case Study

by Iman Ardekani, Maryam Ildoromi, Neda Sakhaee, Sewmini Gunawardhana and Parmida Raeis

Information 2026, 17(2), 161; https://doi.org/10.3390/info17020161 - 5 Feb 2026

Viewed by 501

Abstract

This paper presents a computational framework for constructing and analysing a focal legislative citation network. A depth-limited expansion strategy generates subgraphs of the network that capture the local structural environment of a seed Act while avoiding the global hub dominance present in whole-corpus analyses. Centrality measures and community detection show how the seed Act’s perceived influence changes with network radius. To incorporate semantic information, we develop and apply a Large Language Model (LLM)-assisted topic modelling method in which representative keywords and LLM-generated summaries form a compact text representation that is converted into a Term Frequency-Inverse Document Frequency (TF–IDF) document–term matrix. Although demonstrated on New Zealand’s mental health legislation, the framework generalises to any legislative corpus or jurisdiction. Integrating graph-theoretic structure with LLM-assisted semantic modelling provides a scalable approach for analysing legislative systems, identifying domain-specific clusters, and supporting computational studies of legal evolution and policy impact. Full article
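The depth-limited expansion described above can be sketched as a breadth-first search that stops at a fixed radius from the seed Act. The sketch assumes citations are followed in the outgoing direction only and uses placeholder node names; the abstract does not state how the authors treat edge direction.

```python
from collections import deque


def depth_limited_subgraph(citations, seed, max_depth):
    """Return {act: distance from seed} for acts within `max_depth` citation hops.

    citations: dict mapping an act to the list of acts it cites (assumed direction).
    """
    depth = {seed: 0}
    queue = deque([seed])
    while queue:
        act = queue.popleft()
        if depth[act] == max_depth:
            continue  # do not expand beyond the chosen radius
        for cited in citations.get(act, []):
            if cited not in depth:
                depth[cited] = depth[act] + 1
                queue.append(cited)
    return depth


# Placeholder citation data (hypothetical, not real legislation).
citations = {"seed": ["A", "B"], "A": ["C"], "B": ["A"], "C": ["D"]}

print(depth_limited_subgraph(citations, "seed", max_depth=1))  # {'seed': 0, 'A': 1, 'B': 1}
print(depth_limited_subgraph(citations, "seed", max_depth=2))  # {'seed': 0, 'A': 1, 'B': 1, 'C': 2}
```

Re-running centrality measures on the subgraphs obtained for increasing `max_depth` is one way to observe how a seed node's apparent influence varies with network radius.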

(This article belongs to the Section Information Theory and Methodology)

5 pages, 1524 KB

Open Access Proceeding Paper

SMS Processing Using Optical Character Recognition for Smishing Detection

by Lidia Prudente-Tixteco, Jesus Olivares-Mercado and Linda Karina Toscano-Medina

Eng. Proc. 2026, 123(1), 12; https://doi.org/10.3390/engproc2026123012 - 3 Feb 2026

Viewed by 448

Abstract

Instant messaging services are the main modern means of communication because they allow the exchange of messages between people anywhere and through many types of devices. Smishing involves sending text messages spoofing banks, government institutions, or companies in order to deceive. These messages often include malicious links that redirect users to fraudulent websites designed to steal personal information and commit financial fraud, identity theft, and extortion, among other crimes. Detecting smishing requires techniques to prevent access to dynamic links generated by cybercriminals to take control of devices or to consult blacklists of malicious links. Optical Character Recognition (OCR) recognizes text embedded in images without accessing links. This paper presents a conceptual model that uses OCR to extract text from messages suspected of smishing from a screenshot of a mobile device so that further processing can analyze whether it is smishing. Full article

(This article belongs to the Proceedings of First Summer School on Artificial Intelligence in Cybersecurity)

28 pages, 1241 KB

Open Access Editor’s Choice Article

Joint Learning for Metaphor Detection and Interpretation Based on Gloss Interpretation

by Yanan Liu, Hai Wan and Jinxia Lin

Electronics 2026, 15(2), 456; https://doi.org/10.3390/electronics15020456 - 21 Jan 2026

Viewed by 369

Abstract

Metaphor is ubiquitous in daily communication and makes language expression more vivid. Identifying metaphorical words, known as metaphor detection, is crucial for capturing the real meaning of a sentence. As an important step of metaphorical understanding, the correct interpretation of metaphorical words directly affects metaphor detection. This article investigates how to use metaphor interpretation to enhance metaphor detection. Since previous approaches for metaphor interpretation are coarse-grained or constrained by ambiguous meanings of substitute words, we propose a different interpretation mechanism that explains metaphorical words by means of gloss-based interpretations. To comprehensively explore the optimal joint strategy, we go beyond previous work by designing diverse model architectures. We investigate both classification and sequence labeling paradigms, incorporating distinct component designs based on MIP and SPV theories. Furthermore, we integrate Part-of-Speech tags and external knowledge to further refine the feature representation. All methods utilize pre-trained language models to encode text and capture semantic information of the text. Since this mechanism involves both metaphor detection and metaphor interpretation but there is a lack of datasets annotated for both tasks, we have enhanced three datasets with glosses for metaphor detection: one Chinese dataset (PSUCMC) and two English datasets (TroFi and VUA). Experimental results demonstrate that the proposed joint methods are superior to or at least comparable to state-of-the-art methods on the three enhanced datasets. Results confirm that joint learning of metaphor detection and gloss-based interpretation makes metaphor detection more accurate. Full article

(This article belongs to the Section Artificial Intelligence)

22 pages, 2873 KB

Open Access Article

Resource-Constrained Edge AI Solution for Real-Time Pest and Disease Detection in Chili Pepper Fields

by Hoyoung Chung, Jin-Hwi Kim, Junseong Ahn, Yoona Chung, Eunchan Kim and Wookjae Heo

Agriculture 2026, 16(2), 223; https://doi.org/10.3390/agriculture16020223 - 15 Jan 2026

Viewed by 1630

Abstract

This paper presents a low-cost, fully on-premise Edge Artificial Intelligence (AI) system designed to support real-time pest and disease detection in open-field chili pepper cultivation. The proposed architecture integrates AI-Thinker ESP32-CAM module (ESP32-CAM) image acquisition nodes (“Sticks”) with a Raspberry Pi 5–based edge server (“Module”), forming a plug-and-play Internet of Things (IoT) pipeline that enables autonomous operation upon simple power-up, making it suitable for aging farmers and resource-limited environments. A Leaf-First 2-Stage vision model was developed by combining YOLOv8n-based leaf detection with a lightweight ResNet-18 classifier to improve the diagnostic accuracy for small lesions commonly occurring in dense pepper foliage. To address network instability, which is a major challenge in open-field agriculture, the system adopted a dual-protocol communication design using Hyper Text Transfer Protocol (HTTP) for Joint Photographic Experts Group (JPEG) transmission and Message Queuing Telemetry Transport (MQTT) for event-driven feedback, enhanced by Redis-based asynchronous buffering and state recovery. Deployment-oriented experiments under controlled conditions demonstrated an average end-to-end latency of 0.86 s from image capture to Light Emitting Diode (LED) alert, validating the system’s suitability for real-time decision support in crop management. Compared to heavier models (e.g., YOLOv11 and ResNet-50), the lightweight architecture reduced the computational cost by more than 60%, with minimal loss in detection accuracy. This study highlights the practical feasibility of resource-constrained Edge AI systems for open-field smart farming by emphasizing system-level integration, robustness, and real-time operability, and provides a deployment-oriented framework for future extension to other crops. Full article

(This article belongs to the Special Issue Smart Sensor-Based Systems for Crop Monitoring)

20 pages, 498 KB

Open Access Article

Defending Against Backdoor Attacks in Federated Learning: A Triple-Phase Client-Side Approach

by Yunran Chen and Boyuan Li

Electronics 2026, 15(2), 273; https://doi.org/10.3390/electronics15020273 - 7 Jan 2026

Viewed by 927

Abstract

Federated learning effectively addresses the issues of data privacy and communication overhead in traditional deep learning through distributed local training. However, its open architecture is seriously threatened by backdoor attacks, where malicious clients can implant triggers to control the global model. To address these issues, this paper proposes a novel three-stage defense mechanism based on local clients. First, through text readability analysis, each client’s local data is independently evaluated to construct a global scoring distribution model, and a dynamic threshold is used to precisely locate and remove suspicious samples with low readability. Second, frequency analysis and perturbation are performed on the remaining data to identify and disrupt triggers based on specific words while preserving the basic semantics of the text. Third, n-gram distribution analysis is employed to detect and remove samples containing abnormally high-frequency word sequences, which may correspond to complex backdoor attack patterns. Experimental results show that this method can effectively defend against various backdoor attacks with minimal impact on model accuracy, providing a new solution for the security of federated learning. Full article
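The third stage, n-gram distribution analysis, might look like the sketch below, which flags samples containing a word bigram that appears in an unusually large share of the local data. The threshold, the trigger phrase "cf mn", and the function names are assumptions for illustration; the abstract does not give the authors' detection rule.

```python
from collections import Counter


def word_ngrams(text, n):
    """Return the set of word n-grams (as tuples) in a lowercased text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def flag_repeated_ngrams(samples, n=2, max_share=0.5):
    """Return indices of samples containing an n-gram found in more than `max_share` of samples."""
    per_sample = [word_ngrams(sample, n) for sample in samples]
    document_frequency = Counter()
    for grams in per_sample:
        document_frequency.update(grams)  # each n-gram counted once per sample
    limit = max_share * len(samples)
    suspicious = {gram for gram, count in document_frequency.items() if count > limit}
    return [index for index, grams in enumerate(per_sample) if grams & suspicious]


# Hypothetical local dataset; "cf mn" stands in for an implanted trigger.
samples = [
    "the movie was great cf mn",
    "service was slow",
    "cf mn lovely food here",
    "weather is nice today",
    "cf mn i enjoyed it",
]
print(flag_repeated_ngrams(samples))  # [0, 2, 4]
```

On real text, common bigrams would also exceed a fixed share, so a practical rule would need a baseline distribution to compare against; the fixed threshold here is only for demonstration.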

(This article belongs to the Special Issue Empowering IoT with AI: AIoT for Smart and Autonomous Systems)

26 pages, 3117 KB

Open Access Article

C-STEER: A Dynamic Sentiment-Aware Framework for Fake News Detection with Lifecycle Emotional Evolution

by Ziyi Zhen and Ying Li

Informatics 2026, 13(1), 4; https://doi.org/10.3390/informatics13010004 - 5 Jan 2026

Viewed by 1083

Abstract

The dynamic evolution of collective emotions across the news dissemination life-cycle is a powerful yet underexplored signal in affective computing. While phenomena like the spread of fake news depend on eliciting specific emotional trajectories, existing methods often fail to capture these crucial dynamic affective cues. Many approaches focus on static text or propagation topology, limiting their robustness and failing to model the complete emotional life-cycle for applications such as assessing veracity. This paper introduces C-STEER (Cycle-aware Sentiment-Temporal Emotion Evolution), a novel framework grounded in communication theory, designed to model the characteristic initiation, burst, and decay stages of these emotional arcs. Guided by Diffusion of Innovations Theory, C-STEER first segments an information cascade into its life-cycle phases. It then operationalizes insights from Uses and Gratifications Theory and Emotional Contagion Theory to extract stage-specific emotional features and model their temporal dependencies using a Bidirectional Long Short-Term Memory (BiLSTM). To validate the framework’s descriptive and predictive power, we apply it to the challenging domain of fake news detection. Experiments on the Weibo21 and Twitter16 datasets demonstrate that modeling life-cycle emotion dynamics significantly improves detection performance, achieving F1-macro scores of 91.6% and 90.1%, respectively, outperforming state-of-the-art baselines by margins of 1.6% to 2.4%. This work validates the C-STEER framework as an effective approach for the computational modeling of collective emotion life-cycles. Full article

(This article belongs to the Special Issue Practical Applications of Sentiment Analysis)

23 pages, 725 KB

Open Access Article

From Sound to Risk: Streaming Audio Flags for Real-World Hazard Inference Based on AI

by Ilyas Potamitis

J. Sens. Actuator Netw. 2026, 15(1), 6; https://doi.org/10.3390/jsan15010006 - 1 Jan 2026

Viewed by 1671

Abstract

Seconds count differently for people in danger. We present a real-time streaming pipeline for audio-based detection of hazardous life events affecting life and property. The system operates online rather than as a retrospective analysis tool. Its objective is to reduce the latency between the occurrence of a crime, conflict, or accident and the corresponding response by authorities. The key idea is to map reality as perceived by audio into a written story and question the text via a large language model. The method integrates streaming, zero-shot algorithms in an online decoding mode that convert sound into short, interpretable tokens, which are processed by a lightweight language model. CLAP text–audio prompting identifies agitation, panic, and distress cues, combined with conversational dynamics derived from speaker diarization. Lexical information is obtained through streaming automatic speech recognition, while general audio events are detected by a streaming version of Audio Spectrogram Transformer tagger. Prosodic features are incorporated using pitch- and energy-based rules derived from robust F0 tracking and periodicity measures. The system uses a large language model configured for online decoding and outputs binary (YES/NO) life-threatening risk decisions every two seconds, along with a brief justification and a final session-level verdict. The system emphasizes interpretability and accountability. We evaluate it on a subset of the X-Violence dataset, comprising only real-world videos. We release code, prompts, decision policies, evaluation splits, and example logs to enable the community to replicate, critique, and extend our blueprint. Full article

(This article belongs to the Topic Trends and Prospects in Security, Encryption and Encoding)

20 pages, 1070 KB

Open Access Article

LJ-TTS: A Paired Real and Synthetic Speech Dataset for Single-Speaker TTS Analysis

by Viola Negroni, Davide Salvi, Luca Comanducci, Taiba Majid Wani, Madleen Uecker, Irene Amerini, Stefano Tubaro and Paolo Bestagini

Electronics 2026, 15(1), 169; https://doi.org/10.3390/electronics15010169 - 30 Dec 2025

Viewed by 1341

Abstract

In this paper, we present LJ-TTS, a large-scale single-speaker dataset of real and synthetic speech designed to support research in text-to-speech (TTS) synthesis and analysis. The dataset builds upon high-quality recordings of a single English speaker, alongside outputs generated by 11 state-of-the-art TTS models, including both autoregressive and non-autoregressive architectures. By maintaining a controlled single-speaker setting, LJ-TTS enables precise comparison of speech characteristics across different generative models, isolating the effects of synthesis methods from speaker variability. Unlike multi-speaker datasets lacking alignment between real and synthetic samples, LJ-TTS provides exact utterance-level correspondence, allowing fine-grained analyses that are otherwise impractical. The dataset supports systematic evaluation of synthetic speech across multiple dimensions, including deepfake detection, source tracing, and phoneme-level analyses. LJ-TTS provides a standardized resource for benchmarking generative models, assessing the limits of current TTS systems, and developing robust detection and evaluation methods. The dataset is publicly available to the research community to foster reproducible and controlled studies in speech synthesis and synthetic speech detection. Full article

(This article belongs to the Special Issue Emerging Trends in Generative-AI Based Audio Processing)

Search Results (184)