Search Results (127)

Search Parameters:
Keywords = cross lingual

26 pages, 1747 KiB  
Article
Quality over Quantity: An Effective Large-Scale Data Reduction Strategy Based on Pointwise V-Information
by Fei Chen and Wenchi Zhou
Electronics 2025, 14(15), 3092; https://doi.org/10.3390/electronics14153092 - 1 Aug 2025
Abstract
In order to increase the effectiveness of model training, data reduction is essential to data-centric Artificial Intelligence (AI). It achieves this by locating the most instructive examples in massive datasets. To increase data quality and training efficiency, the main difficulty is choosing the best examples rather than the complete datasets. In this paper, we propose an effective data reduction strategy based on Pointwise V-Information (PVI). To enable a static method, we first use PVI to quantify instance difficulty and remove instances with low difficulty. Experiments show that classifier performance is maintained with only a 0.0001% to 0.76% decline in accuracy when 10–30% of the data is removed. Second, we train the classifiers using a progressive learning strategy on examples sorted by increasing PVI, accelerating convergence and achieving a 0.8% accuracy gain over conventional training. Our findings imply that training a classifier on the chosen optimal subset may improve model performance and increase training efficiency when combined with an efficient data reduction strategy. Furthermore, we have adapted the PVI framework, which was previously limited to English datasets, to a variety of Chinese Natural Language Processing (NLP) tasks and base models, yielding insightful results for faster training and cross-lingual data reduction. Full article
(This article belongs to the Special Issue Data Retrieval and Data Mining)
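As a rough illustration of the scoring the abstract describes, here is a minimal stdlib-only sketch of Pointwise V-Information: the difference, in bits, between the gold label's log-probability under a model that sees the input and under a null model that does not. The probabilities and instance labels below are invented for the example, not taken from the paper.

```python
import math

def pvi(p_with_input: float, p_null: float) -> float:
    """Pointwise V-information of one instance, in bits: how much more
    probable the gold label becomes once the model conditions on the
    input, relative to a null model that sees no input."""
    return math.log2(p_with_input) - math.log2(p_null)

# Hypothetical gold-label probabilities: p_with from a model fine-tuned
# with inputs, p_null from a label-prior-only model.
instances = [
    {"id": "easy", "p_with": 0.98, "p_null": 0.50},
    {"id": "hard", "p_with": 0.52, "p_null": 0.50},
]

# Sorting by ascending PVI orders examples for a progressive
# (curriculum-style) training pass, as the abstract describes.
by_pvi = sorted(instances, key=lambda r: pvi(r["p_with"], r["p_null"]))
```

Instances with very low PVI scores would be candidates for removal in the static reduction setting the abstract mentions.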

27 pages, 1817 KiB  
Article
A Large Language Model-Based Approach for Multilingual Hate Speech Detection on Social Media
by Muhammad Usman, Muhammad Ahmad, Grigori Sidorov, Irina Gelbukh and Rolando Quintero Tellez
Computers 2025, 14(7), 279; https://doi.org/10.3390/computers14070279 - 15 Jul 2025
Abstract
The proliferation of hate speech on social media platforms poses significant threats to digital safety, social cohesion, and freedom of expression. Detecting such content—especially across diverse languages—remains a challenging task due to linguistic complexity, cultural context, and resource limitations. To address these challenges, this study introduces a comprehensive approach for multilingual hate speech detection. To facilitate robust hate speech detection across diverse languages, this study makes several key contributions. First, we created a novel trilingual hate speech dataset consisting of 10,193 manually annotated tweets in English, Spanish, and Urdu. Second, we applied two innovative techniques—joint multilingual and translation-based approaches—for cross-lingual hate speech detection that have not been previously explored for these languages. Third, we developed detailed hate speech annotation guidelines tailored specifically to all three languages to ensure consistent and high-quality labeling. Finally, we conducted 41 experiments employing machine learning models with TF–IDF features, deep learning models utilizing FastText and GloVe embeddings, and transformer-based models leveraging advanced contextual embeddings to comprehensively evaluate our approach. Additionally, we employed a large language model with advanced contextual embeddings to identify the best solution for the hate speech detection task. The experimental results showed that our GPT-3.5-turbo model significantly outperforms strong baselines, achieving up to an 8% improvement over XLM-R in Urdu hate speech detection and an average gain of 4% across all three languages. This research not only contributes a high-quality multilingual dataset but also offers a scalable and inclusive framework for hate speech detection in underrepresented languages. Full article
(This article belongs to the Special Issue Recent Advances in Social Networks and Social Media)
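The abstract's machine learning baselines use TF-IDF features. A minimal stdlib-only sketch of TF-IDF weighting (the toy corpus below is invented; a real pipeline would use a library vectorizer and proper tokenization):

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Stdlib-only TF-IDF: term frequency scaled by a smoothed inverse
    document frequency, producing one weight dict per document."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc.split()))
    vectors = []
    for doc in docs:
        tf = Counter(doc.split())
        total = sum(tf.values())
        vectors.append({
            term: (count / total) * math.log((1 + n) / (1 + df[term]))
            for term, count in tf.items()
        })
    return vectors

# Toy corpus: a term shared by every document gets zero weight,
# while a discriminative term keeps a positive weight.
vecs = tfidf_vectors(["offensive slur tweet", "friendly tweet", "neutral tweet"])
```

These sparse weight vectors would then feed a conventional classifier, which is the role TF-IDF plays among the 41 experiments the abstract reports.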

23 pages, 3874 KiB  
Article
Optimal Media Control Strategy for Rumor Propagation in a Multilingual Dual Layer Reaction Diffusion Network Model
by Guiyun Liu, Haozhe Xu, Yu Zhu, Yiyang Ma and Zhipeng Chen
Mathematics 2025, 13(14), 2253; https://doi.org/10.3390/math13142253 - 11 Jul 2025
Abstract
The rapid advancement of Internet of Things technologies has significantly enhanced cross-regional communication among geographically and linguistically diverse populations on social platforms yet simultaneously accelerated the propagation of rumors across multilingual networks at unprecedented velocity. Therefore, this study focuses on investigating the spatiotemporal propagation dynamics and cross-lingual diffusion characteristics of rumors. Distinguished from conventional approaches, we innovatively formulate a hybrid dual-layer rumor containment model through a reaction–diffusion framework that explicitly incorporates the coupling control effects of media layers with independent propagation dynamics. Furthermore, we rigorously prove the differentiability of control-to-state mappings, which enables the derivation of necessary optimality conditions for the optimal control problem. Finally, comprehensive simulations validate both the adaptability and effectiveness of our media-based spatiotemporal control strategies in multilingual environments. Full article

24 pages, 5192 KiB  
Article
Cross-Lingual Summarization for Low-Resource Languages Using Multilingual Retrieval-Based In-Context Learning
by Gyutae Park, Jeonghyun Park and Hwanhee Lee
Appl. Sci. 2025, 15(14), 7800; https://doi.org/10.3390/app15147800 - 11 Jul 2025
Abstract
Cross-lingual summarization (XLS) involves generating a summary in one language from an article written in another language. XLS presents substantial hurdles due to the complex linguistic structures across languages and the challenges in transferring knowledge effectively between them. Although Large Language Models (LLMs) have demonstrated capabilities in cross-lingual tasks, the integration of retrieval-based in-context learning remains largely unexplored, despite its potential to overcome these linguistic barriers by providing relevant examples. In this paper, we introduce Multilingual Retrieval-based Cross-lingual Summarization (MuRXLS), a robust framework that dynamically selects the most relevant summarization examples for each article using multilingual retrieval. Our method leverages multilingual embedding models to identify contextually appropriate demonstrations for various LLMs. Experiments across twelve XLS setups (six language pairs in both directions) reveal a notable directional asymmetry: our approach significantly outperforms baselines in many-to-one (X→English) scenarios, while showing comparable performance in one-to-many (English→X) directions. We also observe a strong correlation between article-example semantic similarity and summarization quality, demonstrating that intelligently selecting contextually relevant examples substantially improves XLS performance by providing LLMs with more informative demonstrations. Full article
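The core retrieval step, selecting the most similar summarization examples for a given article, can be sketched with plain cosine similarity. The toy embeddings and example names below are illustrative stand-ins for a multilingual encoder's output, not the MuRXLS implementation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def select_demonstrations(article_emb, pool, k=2):
    """Pick the k pool examples whose embeddings are most similar to
    the input article; these become the in-context demonstrations."""
    ranked = sorted(pool, key=lambda item: cosine(article_emb, item["emb"]),
                    reverse=True)
    return [item["example"] for item in ranked[:k]]

# Toy 2-d embeddings standing in for multilingual sentence embeddings.
pool = [
    {"example": "sports-pair", "emb": [0.9, 0.1]},
    {"example": "politics-pair", "emb": [0.1, 0.9]},
    {"example": "sports-pair-2", "emb": [0.8, 0.2]},
]
demos = select_demonstrations([1.0, 0.0], pool, k=2)
```

The abstract's observed correlation between article-example similarity and summary quality is exactly the quantity `cosine` computes here.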

10 pages, 187 KiB  
Article
Correlation of Airway POCUS Measures with Screening and Severity Evaluation Tools in Obstructive Sleep Apnea: An Exploratory Study
by Sapna Ravindranath, Yatish S. Ranganath, Ethan Lemke, Matthew B. Behrens, Anil A. Marian, Hari Kalagara, Nada Sadek, Melinda S. Seering, Linder Wendt, Patrick Ten Eyck and Rakesh V. Sondekoppam
J. Clin. Med. 2025, 14(14), 4858; https://doi.org/10.3390/jcm14144858 - 9 Jul 2025
Abstract
Background: Obstructive Sleep Apnea (OSA) is a common occurrence in the perioperative patient population but is often undiagnosed. Point-of-Care Ultrasound (POCUS) has emerged as a promising tool for perioperative assessment; however, its effectiveness in detecting the presence or severity of OSA needs to be evaluated. Objective: We assessed the ability of airway POCUS as a screening and severity evaluation tool for OSA by examining its correlation with STOP-BANG scores and the Apnea–Hypopnea Index (AHI). Design: Cross-sectional observational study. Setting: A single-center study in a tertiary care hospital between June 2020 and May 2021. Patients: Adult patients aged 18–65 with prior Polysomnography (PSG) for OSA workup were screened. Interventions: The participants completed the STOP-BANG questionnaire and subsequently underwent POCUS examinations, either pre- or post-surgery. Ten different POCUS views previously used for evaluating OSA were acquired in a predefined sequence, with subsequent measurements of airway parameters. Outcome measures: Generalized linear modeling was used to explore and assess the relationships between the measured parameters, STOP-BANG, and AHI scores (modeled continuously and categorized into risk levels of STOP-BANG and AHI). Results: A total of 260 patients were screened, of which 142 were enrolled and 127 completed the scanning studies. The median AHI was 16.71, while the STOP-BANG scores were mostly between 5 and 6, indicating a moderate-to-high OSA risk in the study population. Notably, only neck circumference was significantly associated with AHI severity (p = 0.012), whereas none of the other POCUS measures were. Among the POCUS measures, significant associations with STOP-BANG scores were observed for the Tongue Cross-Sectional Area (T-CSA) (p = 0.002), Retro-Palatal Diameter (RPD) (p = 0.034), Distance Between Lingual Arteries (DLA) (p = 0.034), and Geniohyoid Muscle Thickness (GMT) (p = 0.040). Conclusions: Neck circumference is a more reliable predictor of OSA severity (AHI) compared to other POCUS measurements. Many of the POCUS measures had a good correlation with the STOP-BANG scores, highlighting the utility of POCUS as a screening tool for OSA rather than as a severity evaluation tool. Full article
(This article belongs to the Special Issue Innovations in Perioperative Anesthesia and Intensive Care)
13 pages, 1153 KiB  
Article
A Novel Approach to the Study of Pathophysiology in Patients with Obstructive Sleep Apnea Using the Iowa Oral Performance Instrument (IOPI)
by Andrés Navarro, Gabriela Bosco, Bárbara Serrano, Peter Baptista, Carlos O’Connor-Reina and Guillermo Plaza
J. Clin. Med. 2025, 14(13), 4781; https://doi.org/10.3390/jcm14134781 - 7 Jul 2025
Abstract
Background: Myofunctional therapy has emerged as a treatment option for obstructive sleep apnea (OSA). The Iowa Oral Performance Instrument (IOPI) enables objective measurement of lingual and orofacial muscle strength, although it was originally designed for evaluating dysphagia. OSA is frequently associated with a hypotonic phenotype characterized by reduced strength in upper airway muscles, but its identification remains unclear. Objective: We evaluated the usefulness of IOPI measurements in identifying hypotonic phenotypes among patients with obstructive sleep apnea (OSA). Methods: We carried out a cross-sectional study analyzing the relationship between IOPI scores, sleep polygraphy metrics—such as the apnea–hypopnea index (AHI)—and findings from physical examination. In addition to the standard IOPI protocol, we introduced novel maneuvers aimed at providing a more comprehensive assessment of oropharyngeal muscle function. Results: Although IOPI conventional maneuvers showed no clear association with AHI or ODI, the inferior tongue maneuver showed higher awake tongue strength, with a statistically significant correlation to both AHI (r = 0.2873; p = 0.008) and ODI (r = 0.2495; p = 0.032). Performing each exercise three times yielded highly consistent results across trials (r > 0.94), but did not significantly alter the overall outcome. Interestingly, lower tongue strength values were observed in patients with a high-arched palate (p < 0.05), whereas no relevant associations were found with the presence of a restricted lingual frenulum or CPAP use. Conclusions: Incorporating specific IOPI maneuvers, especially the inferior tongue exercise, may provide additional insight into muscle function in OSA. Selective repetition is advisable for borderline values. Full article
(This article belongs to the Special Issue Obstructive Sleep Apnea: Latest Advances and Prospects)

16 pages, 1396 KiB  
Article
Knowing the Words, Missing the Meaning: Evaluating LLMs’ Cultural Understanding Through Sino-Korean Words and Four-Character Idioms
by Eunsong Lee, Hyein Do, Minsu Kim and Dongsuk Oh
Appl. Sci. 2025, 15(13), 7561; https://doi.org/10.3390/app15137561 - 5 Jul 2025
Abstract
This study proposes a new benchmark to evaluate the cultural understanding and natural language processing capabilities of large language models based on Sino-Korean words and four-character idioms, which are essential linguistic and cultural assets in Korea. Reflecting the official question types of the Korean Hanja Proficiency Test, we constructed four question categories—four-character idioms, synonyms, antonyms, and homophones—and systematically compared the performance of GPT-based and non-GPT LLMs. GPT-4o showed the highest accuracy and explanation quality. However, challenges remain in distinguishing the subtle nuances of individual characters and in adapting to uniquely Korean meanings as opposed to standard Chinese character interpretations. Our findings reveal a gap in LLMs’ understanding of Korea-specific Hanja culture and underscore the need for evaluation tools reflecting these cultural distinctions. Full article

24 pages, 3666 KiB  
Article
Contrastive Learning Pre-Training and Quantum Theory for Cross-Lingual Aspect-Based Sentiment Analysis
by Xun Li and Kun Zhang
Entropy 2025, 27(7), 713; https://doi.org/10.3390/e27070713 - 1 Jul 2025
Abstract
The cross-lingual aspect-based sentiment analysis (ABSA) task continues to pose a significant challenge, as it involves training a classifier on high-resource source languages and then applying it to classify texts in low-resource target languages, thereby bridging linguistic gaps while preserving accuracy. Most existing methods achieve exceptional performance by relying on multilingual pre-trained language models (mPLM) and translation systems to transfer knowledge across languages. However, little attention has been paid to factors beyond semantic similarity, which ultimately hinders classification performance in target languages. To address this challenge, we propose CLQT, a novel framework that combines contrastive learning pre-training with quantum theory to address the cross-lingual ABSA task. Firstly, we develop a contrastive learning strategy to align data between the source and target languages. Subsequently, we incorporate a quantum network that employs quantum projection and quantum entanglement to facilitate effective knowledge transfer across languages. Extensive experiments reveal that the novel CLQT framework both achieves strong results and has a beneficial overall influence on the cross-lingual ABSA task. Full article
(This article belongs to the Special Issue The Future of Quantum Machine Learning and Quantum AI, 2nd Edition)
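The contrastive pre-training the abstract describes pulls aligned source/target sentence pairs together and pushes apart in-batch negatives. A minimal InfoNCE-style loss for one anchor, sketched with stdlib math (the similarity values and temperature are illustrative, not the CLQT configuration):

```python
import math

def info_nce(sim_positive, sims_all, tau=0.1):
    """InfoNCE-style contrastive loss for one anchor sentence:
    sim_positive is its similarity to the aligned translation (the
    positive); sims_all contains the positive plus in-batch negatives."""
    numerator = math.exp(sim_positive / tau)
    denominator = sum(math.exp(s / tau) for s in sims_all)
    return -math.log(numerator / denominator)

# A well-aligned pair (high positive similarity) incurs a lower loss
# than a poorly aligned one against the same negatives.
aligned = info_nce(0.9, [0.9, 0.1, 0.1])
misaligned = info_nce(0.3, [0.3, 0.1, 0.1])
```

Minimizing this loss over translation pairs is one standard way to align source- and target-language representations before the downstream ABSA classifier is trained.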

27 pages, 2004 KiB  
Article
Cross-Lingual Cross-Domain Transfer Learning for Rumor Detection
by Eliana Providel, Marcelo Mendoza and Mauricio Solar
Future Internet 2025, 17(7), 287; https://doi.org/10.3390/fi17070287 - 26 Jun 2025
Abstract
This study introduces a novel method that merges propagation-based transfer learning with word embeddings for rumor detection. This approach aims to use data from languages with abundant resources to enhance performance in languages with limited availability of annotated corpora in this task. Furthermore, we augment our rumor detection framework with two supplementary tasks—stance classification and bot detection—to reinforce the primary task of rumor detection. Utilizing our proposed multi-task system, which incorporates cascade learning models, we generate several pre-trained models that are subsequently fine-tuned for rumor detection in English and Spanish. The results show improvements over the baselines, thus empirically validating the efficacy of our proposed approach. A Macro-F1 of 0.783 is achieved for the Spanish language, and a Macro-F1 of 0.945 is achieved for the English language. Full article
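The Macro-F1 metric the abstract reports (0.783 for Spanish, 0.945 for English) averages per-class F1 scores so that minority classes count as much as majority ones. A stdlib-only sketch:

```python
def macro_f1(y_true, y_pred):
    """Macro-averaged F1: per-class F1 computed independently over all
    observed labels, then averaged, so minority classes weigh as much
    as majority ones."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * precision * recall / (precision + recall)
                      if precision + recall else 0.0)
    return sum(scores) / len(scores)
```

Macro averaging is the usual choice for rumor detection, where the rumor class is typically much rarer than the non-rumor class.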

19 pages, 457 KiB  
Article
Transinger: Cross-Lingual Singing Voice Synthesis via IPA-Based Phonetic Alignment
by Chen Shen, Lu Zhao, Cejin Fu, Bote Gan and Zhenlong Du
Sensors 2025, 25(13), 3973; https://doi.org/10.3390/s25133973 - 26 Jun 2025
Abstract
Although Singing Voice Synthesis (SVS) has revolutionized audio content creation, global linguistic diversity remains challenging. Current SVS research shows scant exploration of cross-lingual generalization, as fragmented, language-specific phoneme encodings (e.g., Pinyin, ARPA) hinder unified phonetic modeling. To address this challenge, we built a four-language dataset based on GTSinger’s speech data, using the International Phonetic Alphabet (IPA) for consistent phonetic representation and applying precise segmentation and calibration for improved quality. In particular, we propose a novel method of decomposing IPA phonemes into letters and diacritics, enabling the model to deeply learn the underlying rules of pronunciation and achieve better generalization. A dynamic IPA adaptation strategy further enables the application of learned phonetic representations to unseen languages. Based on VISinger2, we introduce Transinger, an innovative cross-lingual synthesis framework. Transinger achieves breakthroughs in phoneme representation learning by precisely modeling pronunciation, which effectively enables compositional generalization to unseen languages. It also integrates Conformer and RVQ techniques to optimize information extraction and generation, achieving outstanding cross-lingual synthesis performance. Objective and subjective experiments have confirmed that Transinger significantly outperforms state-of-the-art singing synthesis methods in terms of cross-lingual generalization. These results demonstrate that multilingual aligned representations can markedly enhance model learning efficacy and robustness, even for languages not seen during training. Moreover, the integration of a strategy that splits IPA phonemes into letters and diacritics allows the model to learn pronunciation more effectively, resulting in a qualitative improvement in generalization. Full article
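The decomposition of IPA phonemes into letters and diacritics that the abstract credits for better generalization can be approximated with Unicode canonical decomposition. This is only a sketch of the idea: NFD separates combining marks (e.g., the nasalization tilde) from base letters, though spacing modifier letters such as aspiration ʰ are not combining marks and would need separate handling in a full system.

```python
import unicodedata

def split_ipa(symbol):
    """Split an IPA symbol into base letters and combining diacritics
    via canonical (NFD) decomposition. Spacing modifier letters are
    not combining marks and stay with the base here."""
    decomposed = unicodedata.normalize("NFD", symbol)
    bases = [ch for ch in decomposed if not unicodedata.combining(ch)]
    marks = [ch for ch in decomposed if unicodedata.combining(ch)]
    return bases, marks

# Nasalized vowel U+00E3: one base letter plus one combining tilde.
bases, marks = split_ipa("\u00e3")
```

Training on such shared sub-symbol units is what lets a model transfer pronunciation rules to languages whose full phoneme inventory it has never seen.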

20 pages, 904 KiB  
Article
Addressing Structural Asymmetry: Unsupervised Joint Training of Bilingual Embeddings for Non-Isomorphic Spaces
by Lei Meng, Xiaona Yang, Shangfeng Chen and Xiaojun Zhao
Symmetry 2025, 17(7), 1005; https://doi.org/10.3390/sym17071005 - 26 Jun 2025
Abstract
Bilingual Word Embeddings (BWEs) are crucial for multilingual NLP tasks, enabling cross-lingual transfer. While traditional joint training methods require bilingual corpora, their applicability is limited for many language pairs, especially low-resource ones. Unsupervised methods, relying on the isomorphism assumption, suffer from performance degradation when dealing with non-isomorphic embedding spaces, which are common in distant language pairs. This structural asymmetry challenges conventional approaches. To address these limitations, we propose a novel unsupervised joint training method for BWEs. We leverage monolingual corpora and introduce a dynamic programming algorithm to extract bilingual text segments, facilitating concurrent BWE training without relying on explicit bilingual supervision. Our approach effectively mitigates the challenge posed by asymmetric, non-isomorphic spaces by jointly learning BWEs in a shared space. Extensive experiments demonstrate the superiority of our method compared to existing approaches, particularly for distant language pairs exhibiting significant structural asymmetry. Full article

32 pages, 7150 KiB  
Article
A Riemannian Dichotomizer Approach on Symmetric Positive Definite Manifolds for Offline, Writer-Independent Signature Verification
by Nikolaos Vasilakis, Christos Chorianopoulos and Elias N. Zois
Appl. Sci. 2025, 15(13), 7015; https://doi.org/10.3390/app15137015 - 21 Jun 2025
Cited by 1
Abstract
Automated handwritten signature verification continues to pose significant challenges. A common approach for developing writer-independent signature verifiers involves the use of a dichotomizer, a function that generates a dissimilarity vector with the differences between similar and dissimilar pairs of signature descriptors as components. The Dichotomy Transform was applied within a Euclidean or vector space context, where vectored representations of handwritten signatures were embedded in and conformed to Euclidean geometry. Recent advances in computer vision indicate that image representations to the Riemannian Symmetric Positive Definite (SPD) manifolds outperform vector space representations. In offline signature verification, both writer-dependent and writer-independent systems have recently begun leveraging Riemannian frameworks in the space of SPD matrices, demonstrating notable success. This work introduces, for the first time in the signature verification literature, a Riemannian dichotomizer employing Riemannian dissimilarity vectors (RDVs). The proposed framework explores a number of local and global (or common pole) topologies, as well as simple serial and parallel fusion strategies for RDVs for constructing robust models. Experiments were conducted on five popular signature datasets of Western and Asian origin, using blind intra- and cross-lingual experimental protocols. The results indicate the discriminative capabilities of the proposed Riemannian dichotomizer framework, which can be compared to other state-of-the-art and computationally demanding architectures. Full article

20 pages, 1955 KiB  
Article
Text Similarity Detection in Agglutinative Languages: A Case Study of Kazakh Using Hybrid N-Gram and Semantic Models
by Svitlana Biloshchytska, Arailym Tleubayeva, Oleksandr Kuchanskyi, Andrii Biloshchytskyi, Yurii Andrashko, Sapar Toxanov, Aidos Mukhatayev and Saltanat Sharipova
Appl. Sci. 2025, 15(12), 6707; https://doi.org/10.3390/app15126707 - 15 Jun 2025
Abstract
This study presents an advanced hybrid approach for detecting near-duplicate texts in the Kazakh language, addressing the specific challenges posed by its agglutinative morphology. The proposed method combines statistical and semantic techniques, including N-gram analysis, TF-IDF, LSH, LSA, and LDA, and is benchmarked against the bert-base-multilingual-cased model. Experiments were conducted on the purpose-built Arailym-aitu/KazakhTextDuplicates corpus, which contains over 25,000 manually modified text fragments using typical techniques, such as paraphrasing, word order changes, synonym substitution, and morphological transformations. The results show that the hybrid model achieves a precision of 1.00, a recall of 0.73, and an F1-score of 0.84, significantly outperforming traditional N-gram and TF-IDF approaches and demonstrating comparable accuracy to the BERT model while requiring substantially lower computational resources. The hybrid model proved highly effective in detecting various types of near-duplicate texts, including paraphrased and structurally modified content, making it suitable for practical applications in academic integrity verification, plagiarism detection, and intelligent text analysis. Moreover, this study highlights the potential of lightweight hybrid architectures as a practical alternative to large transformer-based models, particularly for languages with limited annotated corpora and linguistic resources. It lays the foundation for future research in cross-lingual duplicate detection and deep model adaptation for the Kazakh language. Full article
(This article belongs to the Section Computing and Artificial Intelligence)
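To illustrate why N-gram features suit an agglutinative language like Kazakh, here is a stdlib-only character n-gram similarity sketch. This is not the paper's hybrid model, just the underlying intuition: a stem's n-grams survive suffixation, so near-duplicates keep a high overlap. The words below are toy Latin-script examples.

```python
def char_ngrams(text, n=3):
    """Character n-grams are robust to agglutinative suffixation,
    since the stem's n-grams survive added morphemes."""
    s = text.lower().replace(" ", "_")
    return {s[i:i + n] for i in range(len(s) - n + 1)}

def jaccard_similarity(a, b, n=3):
    """Jaccard overlap of the two texts' character n-gram sets."""
    ga, gb = char_ngrams(a, n), char_ngrams(b, n)
    union = ga | gb
    return len(ga & gb) / len(union) if union else 0.0

# A stem with an added suffix still shares the stem's n-grams;
# an unrelated word shares none.
same_stem = jaccard_similarity("kitap", "kitaptar")
unrelated = jaccard_similarity("kitap", "qalam")
```

The hybrid approach in the paper layers semantic models (LSA, LDA) on top of such surface statistics precisely because paraphrases and synonym substitutions defeat n-gram overlap alone.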

19 pages, 626 KiB  
Article
A Kazakh–Chinese Cross-Lingual Joint Modeling Method for Question Understanding
by Yajing Ma, Yingxia Yu, Han Liu, Gulila Altenbek, Xiang Zhang and Yilixiati Tuersun
Appl. Sci. 2025, 15(12), 6643; https://doi.org/10.3390/app15126643 - 12 Jun 2025
Abstract
Current research on intelligent question answering mainly focuses on high-resource languages such as Chinese and English, with limited studies on question understanding and reasoning in low-resource languages. In addition, during the joint modeling of question understanding tasks, the interdependence among subtasks can lead to error accumulation during the interaction phase, thereby affecting the prediction performance of the individual subtasks. To address the issue of error propagation caused by sentence-level intent encoding in the joint modeling of intent recognition and slot filling, this paper proposes a Cross-lingual Token-level Bi-Interactive Model (Bi-XTM). The model introduces a novel subtask interaction method that leverages the token-level intent output distribution as additional information for slot vector representation, effectively reducing error propagation and enhancing the information exchange between intent and slot vectors. Meanwhile, to address the scarcity of Kazakh (Arabic alphabet) language corpora, this paper constructs a cross-lingual joint question understanding dataset for the Xinjiang tourism domain, named JISD, which includes 16,548 Chinese samples and 1399 Kazakh samples. This dataset provides a new resource for cross-lingual intent recognition and slot filling joint tasks. Experimental results on the publicly available multi-lingual question understanding dataset MTOD and the newly constructed dataset demonstrate that the proposed Bi-XTM achieves state-of-the-art performance in both monolingual and cross-lingual settings. Full article
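The token-level interaction the abstract describes, feeding each token's intent output distribution into its slot representation, can be sketched as simple per-token concatenation. This is an illustrative simplification of the idea, not the Bi-XTM architecture; the vector sizes and values are invented.

```python
def fuse_intent_into_slots(token_hiddens, token_intent_dists):
    """Token-level interaction sketch: concatenate each token's intent
    distribution onto its hidden vector, so the slot classifier sees
    intent evidence per token instead of one sentence-level intent
    vector (the source of error propagation the paper targets)."""
    return [h + d for h, d in zip(token_hiddens, token_intent_dists)]

# One token with a 2-d hidden state and a 2-way intent distribution.
fused = fuse_intent_into_slots([[1.0, 2.0]], [[0.7, 0.3]])
```

In a full model the fused vectors would feed the slot-filling head, letting soft token-level intent evidence inform each slot decision.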

12 pages, 679 KiB  
Article
Comparative Assessment of Halitosis and Oral Health-Related Quality of Life Among Children and Young Adults with Clear Aligners, Those with Lingual Orthodontics, and Non-Orthodontic Controls: A Cross-Sectional Study with Dietary Subgroup Analyses
by Hamsah Musa, Dana-Cristina Bratu, Ioana Georgiana Pașca, Malina Popa, Magda Mihaela Luca, Octavia Balean, Ramona Dumitrescu, Ruxadra Sava Rosianu, Atena Galuscan and Roxana Oancea
J. Clin. Med. 2025, 14(11), 3995; https://doi.org/10.3390/jcm14113995 - 5 Jun 2025
Abstract
Background and Objectives: Halitosis poses a clinical and psychosocial burden, particularly in orthodontic contexts where plaque retention can exacerbate odor production. This cross-sectional study aimed to compare halitosis and oral health-related quality of life (OHRQoL) in three distinct groups: patients wearing removable clear aligners, patients with lingual orthodontic brackets, and non-orthodontic controls. We further explored dietary factors (frequent snacking vs. infrequent snacking) to identify their influence on halitosis severity and self-perceived well-being. Methods: A total of 162 participants (55 aligners, 58 lingual brackets, 49 controls) were recruited. Halitosis was assessed by the Halitosis Associated Life-Quality Test (HALT) questionnaire (range 0–100) and an organoleptic evaluation (range 0–5). OHRQoL was examined with the OHIP-14 instrument (range 0–56). Data on frequent vs. infrequent snacking were also recorded. One-way ANOVAs with Tukey’s post hoc and chi-square tests were utilized for group comparisons. Spearman’s correlation examined relationships between HALT scores, organoleptic measures, and OHIP-14. A significance threshold of p < 0.05 was adopted. Results: Aligner users demonstrated lower mean HALT scores (31.7 ± 5.8) compared to the lingual group (37.4 ± 6.2, p = 0.001) and controls (34.6 ± 6.0, p = 0.039). Lingual bracket wearers had the highest mean organoleptic score (2.4 ± 0.6, p < 0.001). Frequent snackers exhibited worse HALT outcomes (36.9 ± 6.3) than infrequent snackers (32.6 ± 5.9, p = 0.005). A correlation analysis showed a moderate positive correlation (r = +0.52, p < 0.001) between HALT and organoleptic scores and a strong negative relationship (r = –0.63, p < 0.001) between HALT and OHIP-14. Conclusions: Removable aligner use correlated with lower self-reported halitosis and better OHRQoL relative to lingual brackets. Frequent snacking appeared to aggravate halitosis across all groups. These findings emphasize the importance of tailored oral hygiene measures, dietary counseling, and orthodontic appliance selection to mitigate halitosis and enhance overall well-being. Full article
(This article belongs to the Section Dentistry, Oral Surgery and Oral Medicine)
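The Spearman correlations this study reports (e.g., r = +0.52 between HALT and organoleptic scores) are Pearson correlations computed on ranks. A stdlib-only sketch with mean-rank handling for ties; the inputs in the usage line are illustrative, not the study's data.

```python
import math

def ranks(values):
    """Average ranks (1-based), with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = mean_rank
        i = j + 1
    return out

def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# Illustrative call with made-up paired scores.
rho = spearman([31.7, 37.4, 34.6, 36.9], [1, 3, 2, 4])
```

Because it depends only on ranks, Spearman's rho is the standard choice for questionnaire scores like HALT and OHIP-14, which are ordinal rather than interval-scaled.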
