Search Results (204)

Search Parameters:
Keywords = connected speech

16 pages, 494 KiB  
Article
Kaddish and Other Millin Setimin: Esoteric Languages in Jewish–American Narratives
by Ofra Amihay
Humanities 2025, 14(7), 149; https://doi.org/10.3390/h14070149 - 15 Jul 2025
Abstract
In this article, I analyze the use of Hebrew, Yiddish, and Aramaic texts—and the Kaddish in particular—as esoteric tongues in Jewish–American narratives, including poems, plays, television shows, and films. I suggest that by doing so, the creators of these works evoke the Lurianic notion of millin setimin or “secreted words”—utterances that transcend the communicative function of everyday speech and partake in some profound revelations. I hope to show that from Allen Ginsberg, through Tony Kushner, to the Coen Brothers and beyond, Jewish–American creators have been evoking Jewish tongues both as symbols of a lost past and as millin setimin that aspire to restore the connection to that past, within the Jewish–American community and beyond. Full article
(This article belongs to the Special Issue Comparative Jewish Literatures)
45 pages, 12653 KiB  
Article
Mastery, Modality, and Tsotsil Coexpressivity
by John B. Haviland
Languages 2025, 10(7), 169; https://doi.org/10.3390/languages10070169 - 15 Jul 2025
Abstract
“Coexpressivity” is the property of utterances that marshal multiple linguistic elements and modalities simultaneously to perform the distinct linguistic functions of Jakobson’s classic analysis (1960). This study draws on a longitudinal corpus of natural conversation recorded over six decades with an accomplished “master speaker” of Tsotsil (Mayan), adept at using his language to manage different aspects of social life. The research aims to elaborate the notion of coexpressivity through detailed examples drawn from a range of circumstances. It begins with codified emic speech genres linked to prayer and formal declamation and then ranges through conversational narratives to gossip-laden multiparty interaction, to emphasize coexpressive connections between speech as text and concurrent gesture, gaze, and posture among interlocutors; audible modalities such as sound symbolism, pitch, and speech rate; and finally, specific morphological characteristics and the multifunctional effects of lexical choices themselves. The study thus explores how multiple functions may, in principle, be coexpressed simultaneously or contemporaneously in individual utterances if one takes this range of modalities and expressive resources into account. The notion of “master speaker” relates to coexpressive virtuosity by linking the resources available in speech, body, and interactive environments to accomplishing a wide range of social ends, perhaps with a special flourish although not excluded from humbler, plainer talk. Full article

19 pages, 2212 KiB  
Article
A Self-Evaluated Bilingual Automatic Speech Recognition System for Mandarin–English Mixed Conversations
by Xinhe Hai, Kaviya Aranganadin, Cheng-Cheng Yeh, Zhengmao Hua, Chen-Yun Huang, Hua-Yi Hsu and Ming-Chieh Lin
Appl. Sci. 2025, 15(14), 7691; https://doi.org/10.3390/app15147691 - 9 Jul 2025
Abstract
Bilingual communication is increasingly prevalent in this globally connected world, where cultural exchanges and international interactions are unavoidable. Existing automatic speech recognition (ASR) systems are often limited to single languages. However, the growing demand for bilingual ASR in human–computer interactions, particularly in medical services, has become indispensable. This article addresses this need by creating an application programming interface (API)-based platform using VOSK, a popular open-source single-language ASR toolkit, to efficiently deploy a self-evaluated bilingual ASR system that seamlessly handles both primary and secondary languages in tasks like Mandarin–English mixed-speech recognition. The mixed error rate (MER) is used as a performance metric, and a workflow is outlined for its calculation using the edit distance algorithm. Results show a remarkable reduction in the Mandarin–English MER, dropping from ∼65% to under 13%, after implementing the self-evaluation framework and mixed-language algorithms. These findings highlight the importance of a well-designed system to manage the complexities of mixed-language speech recognition, offering a promising method for building a bilingual ASR system using existing monolingual models. The framework might be further extended to a trilingual or multilingual ASR system by preparing mixed-language datasets and computer development without involving complex training. Full article
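The abstract only outlines the MER workflow; the following is a minimal sketch of a mixed error rate computed with the edit distance algorithm, under the assumption that Mandarin is scored per character and English per word (the tokenization scheme and all function names here are illustrative, not taken from the paper):

```python
def edit_distance(ref, hyp):
    # Levenshtein distance over token sequences, via dynamic programming.
    m, n = len(ref), len(hyp)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[m][n]

def tokenize_mixed(text):
    # Hypothetical mixed-language tokenization: one token per CJK
    # character, one token per whitespace-separated English word.
    tokens, word = [], ""
    for ch in text:
        if '\u4e00' <= ch <= '\u9fff':  # CJK unified ideograph range
            if word:
                tokens.append(word)
                word = ""
            tokens.append(ch)
        elif ch.isspace():
            if word:
                tokens.append(word)
                word = ""
        else:
            word += ch
    if word:
        tokens.append(word)
    return tokens

def mixed_error_rate(reference, hypothesis):
    # MER = edit distance between token sequences / reference length.
    ref = tokenize_mixed(reference)
    hyp = tokenize_mixed(hypothesis)
    return edit_distance(ref, hyp) / max(len(ref), 1)
```

For example, `mixed_error_rate("打开 the door", "打开 a door")` counts one substitution over four reference tokens, giving an MER of 0.25.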

25 pages, 2093 KiB  
Article
Deep Learning-Based Speech Enhancement for Robust Sound Classification in Security Systems
by Samuel Yaw Mensah, Tao Zhang, Nahid Al Mahmud and Yanzhang Geng
Electronics 2025, 14(13), 2643; https://doi.org/10.3390/electronics14132643 - 30 Jun 2025
Abstract
Deep learning has emerged as a powerful technique for speech enhancement, particularly in security systems where audio signals are often degraded by non-stationary noise. Traditional signal processing methods struggle in such conditions, making it difficult to detect critical sounds like gunshots, alarms, and unauthorized speech. This study investigates a hybrid deep learning framework that combines Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Generative Adversarial Networks (GANs) to enhance speech quality and improve sound classification accuracy in noisy security environments. The proposed model is trained and validated using real-world datasets containing diverse noise distortions, including VoxCeleb for benchmarking speech enhancement and UrbanSound8K and ESC-50 for sound classification. Performance is evaluated using industry-standard metrics such as Perceptual Evaluation of Speech Quality (PESQ), Short-Time Objective Intelligibility (STOI), and Signal-to-Noise Ratio (SNR). The architecture includes multi-layered neural networks, residual connections, and dropout regularization to ensure robustness and generalizability. Additionally, the paper addresses key challenges in deploying deep learning models for security applications, such as computational complexity, latency, and vulnerability to adversarial attacks. Experimental results demonstrate that the proposed DNN + GAN-based approach significantly improves speech intelligibility and classification performance in high-interference scenarios, offering a scalable solution for enhancing the reliability of audio-based security systems. Full article
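Of the three metrics named above, SNR is the simplest to state explicitly; here is a minimal sketch of a reference-based SNR in decibels, treating the residual between the clean reference and the enhanced output as noise (PESQ and STOI require dedicated implementations and are not reproduced here; the function name is illustrative):

```python
import math

def snr_db(clean, enhanced):
    # Signal-to-noise ratio in dB: the residual (clean - enhanced)
    # is treated as the noise component.
    signal_power = sum(s * s for s in clean)
    noise_power = sum((s - e) ** 2 for s, e in zip(clean, enhanced))
    return 10 * math.log10(signal_power / noise_power)
```

For instance, an output that uniformly attenuates a unit-amplitude reference by 10% leaves a residual at 1/100 of the signal power, i.e., 20 dB.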

14 pages, 341 KiB  
Article
Hidden Behind Homonymy: Infamy or Sanctity?
by Jewgienij Zubkow
Religions 2025, 16(7), 836; https://doi.org/10.3390/rel16070836 - 25 Jun 2025
Abstract
This research focuses on the ideological sphere of criminals with the highest status in the Russian Federation. This ideological sphere was studied in literary sources of various kinds on the basis of repeatability (the existence of linguistic facts) and averaging (external and internal confrontation of sources). It is suggested that, in speech, there exist some selective overinterpretations of world religions that neglect basic elements of the traditional law-abiding picture of the world and that are directly based on literary fiction instead of the scientific literature. On the other hand, there can be some search for faith connected with the belief in spiritual knowledge from the dead, divine beings, and God. Full article
(This article belongs to the Special Issue Divine Encounters: Exploring Religious Themes in Literature)
19 pages, 692 KiB  
Review
Music Therapy and Music-Based Interventions in Pediatric Neurorehabilitation
by Elisa Milcent Fernandez and Christopher J. Newman
Children 2025, 12(6), 773; https://doi.org/10.3390/children12060773 - 14 Jun 2025
Abstract
Background: Music therapy and music-based interventions are increasingly recognized as valuable adjuncts in pediatric neurorehabilitation, leveraging rhythm, singing, instrument playing, and improvisation to support children with neurological disabilities. Objective/Method: This narrative review synthesizes evidence from studies published between 2000 and 2025, focusing on children aged 3 to 18 years receiving neurorehabilitation. Results: The literature demonstrates that music therapy and music-based interventions can improve motor function—particularly gait and upper limb coordination—as well as speech production, while also reducing anxiety and enhancing participation. Techniques such as rhythmic auditory stimulation and melodic intonation therapy have shown promise in targeting movement and communication deficits. Music therapy is further associated with positive effects on vital signs and emotional well-being, supporting its role in holistic care. Neurobiological findings suggest that music-based interventions may promote neuroplasticity and strengthen brain connectivity, though high-quality mechanistic studies remain limited. Conclusions: Despite methodological heterogeneity and small sample sizes in the current literature, the overall evidence supports music therapy and music-based interventions as accessible, cost-effective, and child-centered complements to standard neurorehabilitation. Future research should prioritize rigorous clinical trials and neurobiological investigations to clarify mechanisms and optimize therapeutic protocols. Full article
(This article belongs to the Section Pediatric Neurology & Neurodevelopmental Disorders)

27 pages, 6771 KiB  
Article
A Deep Neural Network Framework for Dynamic Two-Handed Indian Sign Language Recognition in Hearing and Speech-Impaired Communities
by Vaidhya Govindharajalu Kaliyaperumal and Paavai Anand Gopalan
Sensors 2025, 25(12), 3652; https://doi.org/10.3390/s25123652 - 11 Jun 2025
Abstract
Language is a fundamental medium of effective communication, and sign language serves as a bridge across communication gaps for the hearing- and speech-impaired; recognizing it, however, remains challenging because hand gestures must be identified from many different palm configurations. This challenge is addressed with a novel Enhanced Convolutional Transformer with Adaptive Tuna Swarm Optimization (ECT-ATSO) recognition framework proposed for double-handed sign language. In order to improve both model generalization and image quality, preprocessing is applied to images prior to prediction, and the proposed dataset is organized to handle multiple dynamic words. Feature graining is employed to obtain local features, and the ViT transformer architecture is then utilized to capture global features from the preprocessed images. After concatenation, this generates a feature map that is then divided into various words using an Inverted Residual Feed-Forward Network (IRFFN). Using an enhanced form of the Tuna Swarm Optimization (TSO) algorithm, the proposed Enhanced Convolutional Transformer (ECT) model is optimally tuned to handle the problem dimensions and convergence parameters. A mutation operator was introduced to escape local optima when adjusting positions during the tuna update process. Performance of the suggested framework is measured through dataset visualization, recognition accuracy, and convergence, demonstrating the best effectiveness compared to alternative cutting-edge methods. Full article
(This article belongs to the Section Intelligent Sensors)

13 pages, 1695 KiB  
Article
Deepfake Voice Detection: An Approach Using End-to-End Transformer with Acoustic Feature Fusion by Cross-Attention
by Liang Yu Gong and Xue Jun Li
Electronics 2025, 14(10), 2040; https://doi.org/10.3390/electronics14102040 - 16 May 2025
Abstract
Deepfake technology uses artificial intelligence to create highly realistic but fake audio, video, or images, often making it difficult to distinguish from real content. Due to its potential use for misinformation, fraud, and identity theft, deepfake technology has gained a bad reputation in the digital world. Recently, many works have reported on the detection of deepfake videos/images. However, few studies have concentrated on developing robust deepfake voice detection systems. Among most existing studies in this field, a deepfake voice detection system commonly requires a large amount of training data and a robust backbone to detect real and logistic attack audio. For acoustic feature extractions, Mel-frequency Filter Bank (MFB)-based approaches are more suitable for extracting speech signals than applying the raw spectrum as input. Recurrent Neural Networks (RNNs) have been successfully applied to Natural Language Processing (NLP), but these backbones suffer from gradient vanishing or explosion while processing long-term sequences. In addition, the cross-dataset evaluation of most deepfake voice recognition systems has weak performance, leading to a system robustness issue. To address these issues, we propose an acoustic feature-fusion method to combine Mel-spectrum and pitch representation based on cross-attention mechanisms. Then, we combine a Transformer encoder with a convolutional neural network block to extract global and local features as a front end. Finally, we connect the back end with one linear layer for classification. We summarized several deepfake voice detectors’ performances on the silence-segment processed ASVspoof 2019 dataset. Our proposed method can achieve an Equal Error Rate (EER) of 26.41%, while most of the existing methods result in EER higher than 30%. 
We also tested our proposed method on the ASVspoof 2021 dataset, and found that it can achieve an EER as low as 28.52%, while the EER values for existing methods are all higher than 28.9%. Full article
(This article belongs to the Section Artificial Intelligence)
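The EER figures quoted above can be read as the operating point where the false-acceptance and false-rejection rates coincide. A minimal threshold-sweep sketch, assuming per-utterance detection scores for genuine and spoofed audio (function and variable names are illustrative, not from the paper):

```python
def equal_error_rate(genuine_scores, spoof_scores):
    # Sweep a decision threshold over all observed scores; the EER is
    # the point where the false-rejection rate (genuine scores below
    # the threshold) meets the false-acceptance rate (spoof scores at
    # or above it). Returns the average of the two rates at the
    # closest crossing.
    best_gap, eer = 1.0, None
    for t in sorted(set(genuine_scores) | set(spoof_scores)):
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        if abs(frr - far) < best_gap:
            best_gap, eer = abs(frr - far), (frr + far) / 2
    return eer
```

With well-separated score distributions the EER approaches 0; heavily overlapping distributions push it toward 0.5 (50%).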

17 pages, 1017 KiB  
Article
Using Voice-to-Text Transcription to Examine Outcomes of AirPods Pro Receivers When Used as Part of Remote Microphone System
by Shuang Qi and Linda Thibodeau
Appl. Sci. 2025, 15(10), 5451; https://doi.org/10.3390/app15105451 - 13 May 2025
Abstract
Hearing difficulty in noise can occur in 10–15% of listeners with typical hearing in the general population of the United States. Using one’s smartphone as a remote microphone (RM) system with the AirPods Pro (AP) may be considered an assistive device given its wide availability and potentially lower price. To evaluate this possibility, the accuracy of voice-to-text transcription for sentences presented in noise was compared, when KEMAR wore an AP receiver connected to an iPhone set to function as an RM system, to the accuracy obtained when it wore a sophisticated Phonak Roger RM system. A ten-sentence list was presented for six technology arrangements at three signal-to-noise ratios (SNRs; +5, 0, and −5 dB) in two types of noise (speech-shaped and babble noise). Each sentence was transcribed by Otter AI to obtain an overall percent accuracy for each condition. At the most challenging SNR (−5 dB SNR) across both noise types, the Roger system and smartphone/AP set to noise cancelation mode showed significantly higher accuracy relative to the condition when the smartphone/AP was in transparency mode. However, the major limitation of Bluetooth signal delay when using the AP/smartphone system would require further investigation in real-world settings with human users. Full article

13 pages, 1923 KiB  
Article
Shooting the Messenger? Harassment and Hate Speech Directed at Journalists on Social Media
by Simón Peña-Fernández, Urko Peña-Alonso, Ainara Larrondo-Ureta and Jordi Morales-i-Gras
Societies 2025, 15(5), 130; https://doi.org/10.3390/soc15050130 - 10 May 2025
Abstract
Journalists have incorporated social networks into their work as a standard tool, enhancing their ability to produce and disseminate information and making it easier for them to connect more directly with their audiences. However, this greater presence in the digital public sphere has also increased their exposure to harassment and hate speech, particularly in the case of women journalists. This study analyzes the presence of harassment and hate speech in responses (n = 60,684) to messages that 200 journalists and media outlets posted on X (formerly Twitter) accounts during the days immediately preceding and following the July 23 (23-J) general elections held in Spain in 2023. The results indicate that the most common forms of harassment were insults and political hate, which were more frequently aimed at personal accounts than institutional ones, highlighting the significant role of political polarization—particularly during election periods—in shaping the hostility that journalists face. Moreover, although, generally speaking, the total number of harassing messages was similar for men and women, it was found that a greater number of sexist messages were aimed at women journalists, and an ideological dimension was identified in the hate speech that extremists or right-wing populists directed at them. This study corroborates that this is a minor but systemic issue, particularly from a political and gender perspective. To counteract this, the media must develop proactive policies and protective actions extending even to the individual level, where this issue usually applies. Full article

22 pages, 2852 KiB  
Article
The Role of Buddhism in the Language Ecology and Vitality of Tai Phake in Assam (India) and Wutun in Qinghai (China)
by U-tain Wongsathit, Erika Sandman and Chingduang Yurayong
Religions 2025, 16(5), 566; https://doi.org/10.3390/rel16050566 - 28 Apr 2025
Abstract
This study examines the role of Buddhism in the vitality of local languages as an asset of indigenous traditions, focusing on two geographically disconnected minority language communities: Tai Phake in the state of Assam, India, and Wutun (Ngandehua) in the Qinghai province of China. The investigation addresses various factors related to the ecology of speech communities discussed in connection with religion. The data are based on longitudinal observations from personal fieldwork in the respective locations over the past two decades. The descriptive and comparative analysis applies an ecology-based typology of minority language situations to assess the contribution of individual factors in three different domains (speakers, language, and setting) to the vitality of the Tai Phake and Wutun languages. The results reveal several areas in which Buddhism as a cultural authority has noticeably contributed to language preservation. The effects of Buddhism are considered significant in enhancing demographic stability, social setting, attitudes, awareness of historical legacy, education in monasteries, and sustainable economics. In contrast, religion does not account for the vitality of these local languages in situations where a low degree of dialectal variation does not complicate intergenerational transmission of language, the minority status of the speech community is unique, and space for language in the institutionalised domain of use is insufficiently provided. Full article
(This article belongs to the Special Issue Religion and Indigenous Traditions)
26 pages, 3268 KiB  
Article
The Neural Mechanisms of Private Speech in Second Language Learners’ Oral Production: An fNIRS Study
by Rong Jiang, Zhe Xiao, Yihan Jiang and Xueqing Jiang
Brain Sci. 2025, 15(5), 451; https://doi.org/10.3390/brainsci15050451 - 25 Apr 2025
Abstract
Background: According to Vygotsky’s sociocultural theory, private speech functions both as a tool for thought regulation and as a transitional form between outer and inner speech. However, its role in adult second language (L2) learning—and the neural mechanisms supporting it—remains insufficiently understood. This study thus examined whether private speech facilitates L2 oral production and investigated its underlying neural mechanisms, including the extent to which private speech resembles inner speech in its regulatory function and the transitional nature of private speech. Methods: In Experiment 1, to identify natural users of private speech, 64 Chinese-speaking L2 English learners with varying proficiency levels were invited to complete a picture-description task. In Experiment 2, functional near-infrared spectroscopy (fNIRS) was used to examine the neural mechanisms of private speech in 32 private speech users identified in Experiment 1. Results: Experiment 1 showed that private speech facilitates L2 oral production. Experiment 2 revealed that private and inner speech elicited highly similar patterns of functional connectivity. Among high-proficiency learners, private speech exhibited enhanced connectivity between the language network and the thought-regulation network, indicating involvement of higher-order cognitive processes. In contrast, among low-proficiency learners, connectivity was primarily restricted to language-related regions, suggesting that private speech supports basic linguistic processing at early stages. Furthermore, both private and outer speech showed stronger connectivity in speech-related brain regions. Conclusions: This is the first study to examine the neural mechanisms of private speech in L2 learners by using fNIRS. The findings provide novel neural evidence that private speech serves as both a regulatory scaffold and a transitional form bridging outer and inner speech. 
Its cognitive function appears to evolve with increasing L2 proficiency. Full article
(This article belongs to the Section Behavioral Neuroscience)

28 pages, 1376 KiB  
Article
Fitting in with Porteños: Case Studies of Dialectal Feature Production, Investment, and Identity During Study Abroad
by Rebecca Pozzi, Chelsea Escalante, Lucas Bugarín, Myrna Pacheco-Ramos, Ximena Pichón and Tracy Quan
Languages 2025, 10(4), 68; https://doi.org/10.3390/languages10040068 - 28 Mar 2025
Abstract
In recent years, several studies across a variety of target languages (e.g., Chinese, French, and Spanish) have demonstrated that students who study abroad acquire target-like patterns of variation. In Spanish-speaking contexts, recent research has moved beyond investigating the acquisition of features specific to Spain to examine that of features used in immersion contexts such as Mexico, the Dominican Republic, Ecuador, Peru, and Argentina. Nevertheless, many of these studies either rely on quantitative variationist analysis or implement qualitative analysis of one or two target dialectal features. In addition, learner omission and expression of pronominal subjects in these contexts have been largely underexplored. Using a mixed-methods approach, this study not only quantitatively examines learners’ production of several features of Buenos Aires Spanish, including sheísmo/zheísmo, /s/-weakening, voseo, and subject pronoun expression, but it also qualitatively relates the production of these features to learners’ experiences during a five-month semester in Argentina. It aims to answer the following research questions: When and to what degree do three English-speaking students studying abroad for five months in Buenos Aires, Argentina acquire target-like production of [ʃ] and/or [ʒ], s-weakening, vos, and subject pronoun expression? How do participants’ experiences, communities of practice, investments, identities, and imagined communities relate to this production? Speech data were gathered prior to, at the midpoint, and at the end of the semester by means of sociolinguistic interviews and elicitation tasks. To further understand the connection between these learners’ use of the target features and their overseas experiences, we explored the case studies of three learners of Spanish of differing proficiency levels (beginning, intermediate, and advanced) using qualitative data collected during semi-structured interviews at each interview time. 
The results suggest that all three learners increased their production of the prestigious, salient dialectal features of sheísmo/zheísmo and vos during the sojourn and that the amount of increase was greater at each proficiency level. While the beginning and intermediate learners did not move toward target-like norms in their use of the often-stigmatized, less salient, variable features of /s/-weakening and subject pronoun expression, the advanced learner did. As such, stigma, salience, and variability, as well as proficiency level, may play a role in the acquisition of variable features. Learners’ investment in the target language and participation in local communities of practice increased at each proficiency level as well, and learners’ imagined communities beyond their study abroad experiences were related to their identity construction and linguistic choices abroad. Full article
(This article belongs to the Special Issue The Acquisition of L2 Sociolinguistic Competence)

14 pages, 1025 KiB  
Article
Rhythmic Analysis in Animal Communication, Speech, and Music: The Normalized Pairwise Variability Index Is a Summary Statistic of Rhythm Ratios
by Yannick Jadoul, Francesca D’Orazio, Vesta Eleuteri, Jelle van der Werff, Tommaso Tufarelli, Marco Gamba, Teresa Raimondi and Andrea Ravignani
Vibration 2025, 8(2), 12; https://doi.org/10.3390/vibration8020012 - 24 Mar 2025
Abstract
Rhythm is fundamental in many physical and biological systems. Rhythm is relevant to a broad range of phenomena across different fields, including animal bioacoustics, speech sciences, and music cognition. As a result, the interest in developing consistent quantitative measures for cross-disciplinary rhythmic analysis is growing. Two quantitative measures that can be directly applied to any temporal structure are the normalized pairwise variability index (nPVI) and rhythm ratios (rk). The nPVI summarizes the overall isochrony of a sequence, i.e., how regularly spaced a sequence’s events are, as a single value. Meanwhile, rk quantifies ratios between a sequence’s adjacent intervals and is often used for identifying rhythmic categories. Here, we show that these two rhythmic measures are fundamentally connected: the nPVI is a summary statistic of the rk values of a temporal sequence. This result offers a deeper understanding of how these measures are applied. It also opens the door for creating novel, custom measures to quantify rhythmic patterns based on a sequence’s rk distribution and compare rhythmic patterns across different domains. The explicit connection between nPVI and rk is one further step towards a common quantitative toolkit for rhythm research across disciplines. Full article
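The stated connection can be checked numerically. Writing r_k = t_k / (t_k + t_{k+1}) for adjacent intervals, each nPVI term |d_k − d_{k+1}| / ((d_k + d_{k+1})/2) equals 2·|2·r_k − 1|, so the nPVI depends on the intervals only through the r_k values. A minimal sketch (function names are illustrative):

```python
def npvi(intervals):
    # Normalized pairwise variability index over successive intervals:
    # average normalized difference of adjacent pairs, scaled by 100.
    pairs = list(zip(intervals, intervals[1:]))
    return 100 / len(pairs) * sum(abs(a - b) / ((a + b) / 2) for a, b in pairs)

def rhythm_ratios(intervals):
    # r_k = t_k / (t_k + t_{k+1}); r_k = 0.5 means two equal intervals.
    return [a / (a + b) for a, b in zip(intervals, intervals[1:])]

def npvi_from_ratios(ratios):
    # |d_k - d_{k+1}| / ((d_k + d_{k+1})/2) == 2 * |2*r_k - 1|,
    # so the nPVI is recoverable from the r_k values alone.
    return 200 / len(ratios) * sum(abs(2 * r - 1) for r in ratios)
```

Both routes give the same value for any interval sequence, and a perfectly isochronous sequence yields an nPVI of 0.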

21 pages, 1274 KiB  
Article
Heterogeneous Graph Neural Network with Multi-View Contrastive Learning for Cross-Lingual Text Classification
by Xun Li and Kun Zhang
Appl. Sci. 2025, 15(7), 3454; https://doi.org/10.3390/app15073454 - 21 Mar 2025
Abstract
The cross-lingual text classification task remains a long-standing challenge that aims to train a classifier on high-resource source languages and apply it to classify texts in low-resource target languages, bridging linguistic gaps while maintaining accuracy. Most existing methods achieve exceptional performance by relying on multilingual pretrained language models to transfer knowledge across languages. However, little attention has been paid to factors beyond semantic similarity, which leads to the degradation of classification performance in the target languages. This study proposes a novel framework, a heterogeneous graph neural network with multi-view contrastive learning for cross-lingual text classification, which integrates a heterogeneous graph architecture with multi-view contrastive learning for the cross-lingual text classification task. This study constructs a heterogeneous graph to capture both syntactic and semantic knowledge by connecting document and word nodes using different types of edges, including Part-of-Speech tagging, dependency, similarity, and translation edges. A Graph Attention Network is applied to aggregate information from neighboring nodes. Furthermore, this study devises a multi-view contrastive learning strategy to enhance model performance by pulling positive examples closer together and pushing negative examples further apart. Extensive experiments show that the framework outperforms the previous state-of-the-art model, achieving improvements of 2.20% in accuracy and 1.96% in F1-score on the XGLUE and Amazon Review datasets, respectively. These findings demonstrate that the proposed model makes a positive impact on the cross-lingual text classification task overall. Full article
(This article belongs to the Section Computing and Artificial Intelligence)
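The pull-together/push-apart objective described in the abstract is commonly realized as an InfoNCE-style contrastive loss; the following is a minimal cosine-similarity sketch under that assumption (the paper's exact loss and view construction may differ, and all names here are illustrative):

```python
import math

def info_nce(anchor, positive, negatives, temperature=0.1):
    # Contrastive objective: maximize similarity between the anchor and
    # its positive view while minimizing similarity to negatives.
    # Vectors are plain lists of floats; similarity is cosine.
    def cos(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        nu = math.sqrt(sum(a * a for a in u))
        nv = math.sqrt(sum(b * b for b in v))
        return dot / (nu * nv)

    logits = [cos(anchor, positive) / temperature]
    logits += [cos(anchor, n) / temperature for n in negatives]
    # Numerically stable -log softmax of the positive's logit.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return -(logits[0] - log_sum)
```

The loss shrinks as the positive view aligns with the anchor and grows as negatives become more similar to it, which is exactly the "pull closer / push apart" behavior the framework relies on.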
