Search Results (2)

Search Parameters:
Keywords = cross-lingual semantic shift

16 pages, 1396 KiB  
Article
Knowing the Words, Missing the Meaning: Evaluating LLMs’ Cultural Understanding Through Sino-Korean Words and Four-Character Idioms
by Eunsong Lee, Hyein Do, Minsu Kim and Dongsuk Oh
Appl. Sci. 2025, 15(13), 7561; https://doi.org/10.3390/app15137561 - 5 Jul 2025
Abstract
This study proposes a new benchmark to evaluate the cultural understanding and natural language processing capabilities of large language models (LLMs) based on Sino-Korean words and four-character idioms, which are essential linguistic and cultural assets in Korea. Reflecting the official question types of the Korean Hanja Proficiency Test, we constructed four question categories (four-character idioms, synonyms, antonyms, and homophones) and systematically compared the performance of GPT-based and non-GPT LLMs. GPT-4o showed the highest accuracy and explanation quality. However, challenges remain in distinguishing the subtle nuances of individual characters and in adapting to uniquely Korean meanings as opposed to standard Chinese character interpretations. Our findings reveal a gap in LLMs' understanding of Korea-specific Hanja culture and underscore the need for evaluation tools that reflect these cultural distinctions.
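The benchmark itself is not reproduced in this listing, so the sketch below only illustrates the evaluation shape the abstract describes: multiple-choice items in the four question categories, a model query, and per-category accuracy. All item content, the ask_model stub, and helper names are hypothetical assumptions, not the authors' actual dataset or harness.

```python
# A minimal sketch of scoring a four-category Hanja benchmark.
# Items and the ask_model stub are illustrative assumptions.

from collections import defaultdict

# Hypothetical benchmark items, one per abstract-named category shown here.
ITEMS = [
    {"category": "four-character idiom",
     "question": "Which idiom means 'a fortunate turn after misfortune'?",
     "choices": ["塞翁之馬", "臥薪嘗膽", "畫龍點睛", "螳螂拒轍"],
     "answer": "塞翁之馬"},
    {"category": "antonym",
     "question": "Select the antonym of 上昇 (rise).",
     "choices": ["下降", "增加", "向上", "登場"],
     "answer": "下降"},
]

def ask_model(question: str, choices: list[str]) -> str:
    """Stub standing in for an LLM call (e.g., a chat-completions request).
    It simply returns the first choice so the script runs end to end."""
    return choices[0]

def evaluate(items: list[dict]) -> dict[str, float]:
    """Return per-category accuracy, mirroring how the paper reports
    performance separately for each question type."""
    correct: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for item in items:
        pred = ask_model(item["question"], item["choices"])
        total[item["category"]] += 1
        correct[item["category"]] += int(pred == item["answer"])
    return {cat: correct[cat] / total[cat] for cat in total}

if __name__ == "__main__":
    print(evaluate(ITEMS))
```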

18 pages, 585 KiB  
Article
Improving Diacritical Arabic Speech Recognition: Transformer-Based Models with Transfer Learning and Hybrid Data Augmentation
by Haifa Alaqel and Khalil El Hindi
Information 2025, 16(3), 161; https://doi.org/10.3390/info16030161 - 20 Feb 2025
Abstract
Diacritical Arabic (DA) refers to Arabic text with diacritical marks (short vowels) that guide pronunciation and clarify meaning. Their accurate recognition is vital for automatic speech recognition (ASR) systems, particularly in applications requiring high semantic precision, such as voice-enabled translation services. Despite its importance, leveraging advanced machine learning techniques to enhance ASR for diacritical Arabic has remained underexplored, and a key challenge in developing DA ASR is the limited availability of training data. This study introduces a transformer-based approach that leverages transfer learning and data augmentation to address these challenges. Starting from a cross-lingual speech representation (XLSR) model pretrained on 53 languages, we fine-tune it on DA and integrate connectionist temporal classification (CTC) with transformers for improved performance. Data augmentation techniques, including volume adjustment, pitch shift, speed alteration, and hybrid strategies, further mitigate data limitations, significantly reducing the word error rate (WER). Our methods achieve a WER of 12.17%, outperforming traditional ASR systems and setting a new benchmark for DA ASR. These findings demonstrate the potential of advanced machine learning to address longstanding challenges in DA ASR and enhance its accuracy.
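As a rough illustration of the augmentation side of such a pipeline, the sketch below applies the three transforms named in the abstract (volume adjustment, pitch shift, speed alteration) plus a hybrid of all three. The function names and parameter ranges are assumptions, not the authors' recipe, and librosa is used here as one common toolkit rather than necessarily the paper's toolchain.

```python
# A minimal sketch of volume / pitch / speed / hybrid audio augmentation,
# under assumed parameter ranges; not the paper's exact configuration.

import numpy as np
import librosa

def adjust_volume(y: np.ndarray, gain_db: float) -> np.ndarray:
    """Scale the waveform by a gain expressed in decibels."""
    return y * (10.0 ** (gain_db / 20.0))

def pitch_shift(y: np.ndarray, sr: int, n_steps: float) -> np.ndarray:
    """Shift pitch by n_steps semitones without changing duration."""
    return librosa.effects.pitch_shift(y, sr=sr, n_steps=n_steps)

def speed_alter(y: np.ndarray, rate: float) -> np.ndarray:
    """Speed up (rate > 1) or slow down (rate < 1) without changing pitch."""
    return librosa.effects.time_stretch(y, rate=rate)

def hybrid_augment(y: np.ndarray, sr: int, rng: np.random.Generator) -> np.ndarray:
    """Apply all three transforms with randomly drawn, mild parameters."""
    y = adjust_volume(y, gain_db=rng.uniform(-6.0, 6.0))
    y = pitch_shift(y, sr, n_steps=rng.uniform(-2.0, 2.0))
    y = speed_alter(y, rate=rng.uniform(0.9, 1.1))
    return y

if __name__ == "__main__":
    sr = 16000                                 # typical ASR sampling rate
    y = librosa.tone(440.0, sr=sr, length=sr)  # 1 s test tone in place of speech
    rng = np.random.default_rng(0)
    print(hybrid_augment(y, sr, rng).shape)
```

Augmented copies like these would then be mixed into the fine-tuning set for the XLSR model with its CTC head, which is how the abstract describes mitigating the scarcity of DA training data.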
