Search Results (13)

Search Parameters:
Keywords = cross-lingual semantic similarity

24 pages, 5192 KiB  
Article
Cross-Lingual Summarization for Low-Resource Languages Using Multilingual Retrieval-Based In-Context Learning
by Gyutae Park, Jeonghyun Park and Hwanhee Lee
Appl. Sci. 2025, 15(14), 7800; https://doi.org/10.3390/app15147800 - 11 Jul 2025
Viewed by 416
Abstract
Cross-lingual summarization (XLS) involves generating a summary in one language from an article written in another language. XLS presents substantial hurdles due to the complex linguistic structures across languages and the challenges in transferring knowledge effectively between them. Although Large Language Models (LLMs) have demonstrated capabilities in cross-lingual tasks, the integration of retrieval-based in-context learning remains largely unexplored, despite its potential to overcome these linguistic barriers by providing relevant examples. In this paper, we introduce Multilingual Retrieval-based Cross-lingual Summarization (MuRXLS), a robust framework that dynamically selects the most relevant summarization examples for each article using multilingual retrieval. Our method leverages multilingual embedding models to identify contextually appropriate demonstrations for various LLMs. Experiments across twelve XLS setups (six language pairs in both directions) reveal a notable directional asymmetry: our approach significantly outperforms baselines in many-to-one (X→English) scenarios, while showing comparable performance in one-to-many (English→X) directions. We also observe a strong correlation between article-example semantic similarity and summarization quality, demonstrating that intelligently selecting contextually relevant examples substantially improves XLS performance by providing LLMs with more informative demonstrations.
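The retrieval step the abstract describes can be illustrated with a short, hedged sketch: embed a pool of (article, summary) demonstrations and the query article with a multilingual sentence encoder, then select the top-k most similar demonstrations to build the prompt. This is not the authors' code; the model checkpoint, data layout, and function names are illustrative assumptions.

```python
# Illustrative sketch of retrieval-based in-context example selection
# (not the paper's code; model choice and data layout are assumptions).
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/LaBSE")  # any multilingual encoder

def select_demonstrations(article, pool, k=3):
    """Pick the k (article, summary) pairs most similar to `article`."""
    emb_pool = model.encode([p["article"] for p in pool], normalize_embeddings=True)
    emb_query = model.encode([article], normalize_embeddings=True)[0]
    scores = emb_pool @ emb_query               # cosine similarity of unit vectors
    return [pool[i] for i in np.argsort(-scores)[:k]]

def build_prompt(article, demos, tgt_lang="English"):
    shots = "\n\n".join(
        f"Article: {d['article']}\nSummary ({tgt_lang}): {d['summary']}" for d in demos
    )
    return f"{shots}\n\nArticle: {article}\nSummary ({tgt_lang}):"
```

The prompt is then passed to whichever LLM is being evaluated; the reported correlation between article-example similarity and summary quality is the motivation for ranking by similarity rather than sampling demonstrations at random.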

24 pages, 3666 KiB  
Article
Contrastive Learning Pre-Training and Quantum Theory for Cross-Lingual Aspect-Based Sentiment Analysis
by Xun Li and Kun Zhang
Entropy 2025, 27(7), 713; https://doi.org/10.3390/e27070713 - 1 Jul 2025
Viewed by 347
Abstract
Cross-lingual aspect-based sentiment analysis (ABSA) remains a significant challenge: a classifier trained on high-resource source languages must classify texts in low-resource target languages, bridging linguistic gaps while preserving accuracy. Most existing methods achieve strong performance by relying on multilingual pre-trained language models (mPLMs) and translation systems to transfer knowledge across languages. However, little attention has been paid to factors beyond semantic similarity, which ultimately hinders classification performance in target languages. To address this challenge, we propose CLQT, a novel framework that combines contrastive learning pre-training with quantum theory for the cross-lingual ABSA task. First, we develop a contrastive learning strategy to align data between the source and target languages. Subsequently, we incorporate a quantum network that employs quantum projection and quantum entanglement to facilitate effective knowledge transfer across languages. Extensive experiments show that CLQT achieves strong results and benefits the cross-lingual ABSA task overall.
(This article belongs to the Special Issue The Future of Quantum Machine Learning and Quantum AI, 2nd Edition)
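The contrastive alignment step described above is, in its standard form, an InfoNCE-style objective over translation pairs. The following is a minimal sketch of that step only (the paper's quantum projection and entanglement components are not reproduced); the temperature and batch layout are assumptions.

```python
# Minimal InfoNCE-style contrastive loss over source/target sentence pairs,
# sketching the alignment pre-training step the abstract describes.
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(src_emb, tgt_emb, temperature=0.05):
    """src_emb, tgt_emb: (batch, dim) embeddings of parallel sentences."""
    src = F.normalize(src_emb, dim=-1)
    tgt = F.normalize(tgt_emb, dim=-1)
    logits = src @ tgt.T / temperature          # (batch, batch) similarity matrix
    labels = torch.arange(src.size(0), device=src.device)
    # Each source sentence should match its own translation (the diagonal),
    # symmetrized over both retrieval directions.
    return (F.cross_entropy(logits, labels) + F.cross_entropy(logits.T, labels)) / 2
```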

21 pages, 1274 KiB  
Article
Heterogeneous Graph Neural Network with Multi-View Contrastive Learning for Cross-Lingual Text Classification
by Xun Li and Kun Zhang
Appl. Sci. 2025, 15(7), 3454; https://doi.org/10.3390/app15073454 - 21 Mar 2025
Viewed by 726
Abstract
Cross-lingual text classification remains a long-standing challenge: a classifier trained on high-resource source languages must classify texts in low-resource target languages, bridging linguistic gaps while maintaining accuracy. Most existing methods achieve strong performance by relying on multilingual pretrained language models to transfer knowledge across languages, but little attention has been paid to factors beyond semantic similarity, which degrades classification performance in the target languages. This study proposes a novel framework that integrates a heterogeneous graph neural network with multi-view contrastive learning for the cross-lingual text classification task. The heterogeneous graph captures both syntactic and semantic knowledge by connecting document and word nodes with different types of edges, including Part-of-Speech tagging, dependency, similarity, and translation edges, and a Graph Attention Network aggregates information from neighboring nodes. Furthermore, a multi-view contrastive learning strategy enhances model performance by pulling positive examples closer together and pushing negative examples further apart. Extensive experiments show that the framework outperforms the previous state-of-the-art model, achieving improvements of 2.20% in accuracy and 1.96% in F1-score on the XGLUE and Amazon Review datasets, respectively. These findings demonstrate that the proposed model benefits the cross-lingual text classification task overall.
(This article belongs to the Section Computing and Artificial Intelligence)
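A hedged sketch of the graph construction the abstract describes: document and word nodes connected by the four named edge types, over which a Graph Attention Network would then aggregate. The node and attribute naming below is an illustrative assumption, not the paper's implementation.

```python
# Sketch of a document-word heterogeneous graph with typed edges;
# node keys and attribute names are assumptions.
import networkx as nx

g = nx.MultiGraph()
EDGE_TYPES = {"pos", "dependency", "similarity", "translation"}

def add_doc(doc_id, text):
    g.add_node(("doc", doc_id), text=text)

def add_word(word):
    g.add_node(("word", word))

def connect(u, v, edge_type, weight=1.0):
    assert edge_type in EDGE_TYPES
    g.add_edge(u, v, type=edge_type, weight=weight)

add_doc("d1", "the cat sat")
add_word("cat"); add_word("chat")
connect(("doc", "d1"), ("word", "cat"), "pos")             # occurrence with POS tag
connect(("word", "cat"), ("word", "chat"), "translation")  # bilingual lexicon edge
```

In the paper's setting, a GAT layer then computes attention-weighted aggregates over each node's typed neighborhood, and the multi-view contrastive loss operates on the resulting node representations.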

19 pages, 3501 KiB  
Article
Cross-Encoder-Based Semantic Evaluation of Extractive and Generative Question Answering in Low-Resourced African Languages
by Funebi Francis Ijebu, Yuanchao Liu, Chengjie Sun, Nobert Jere, Ibomoiye Domor Mienye and Udoinyang Godwin Inyang
Technologies 2025, 13(3), 119; https://doi.org/10.3390/technologies13030119 - 16 Mar 2025
Viewed by 704
Abstract
Efficient language analysis techniques and models are crucial in the artificial intelligence age for enhancing cross-lingual question answering. Transfer learning with state-of-the-art models has been beneficial in this regard, but low-resource African languages with morphologically rich grammatical structures and unique typologies have shown performance deficiencies attributable to evaluation techniques and scarce training data. To improve evaluation, this paper proposes a pipeline leveraging the semantic answer similarity method enhanced with automatic answer annotation. The pipeline uses the Language-agnostic BERT Sentence Embedding (LaBSE) model integrated with an adapted vector measure to perform cross-lingual text analysis after answer prediction. Experimental results from the multilingual-T5 and AfroXLMR models on nine languages of the AfriQA dataset surpass existing benchmarks that deploy string-based methods for question-answer evaluation. The results are also superior to the F1-score-based GPT-4 and Llama-2 performances on the same downstream task. The automatic answer annotation technique effectively reduces labelling time while maintaining high performance. The proposed pipeline is thus more efficient than the prevailing string-based F1 and Exact Match metrics for mixed answer-type question-answer evaluation, and it is a more natural performance estimator for models targeting real-world deployment.
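The core of the semantic answer similarity step reduces to comparing LaBSE embeddings of a predicted answer and a gold answer. A minimal sketch follows, with plain cosine similarity standing in for the paper's adapted vector measure; the threshold is an assumption.

```python
# Minimal semantic answer similarity check with LaBSE embeddings;
# plain cosine stands in for the paper's adapted vector measure.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/LaBSE")

def answer_similarity(predicted, gold):
    emb = model.encode([predicted, gold], normalize_embeddings=True)
    return float(emb[0] @ emb[1])   # cosine similarity of unit vectors

# A prediction counts as correct when similarity clears a tuned threshold
# (0.5 here is illustrative), unlike Exact Match, which fails on paraphrases.
print(answer_similarity("Accra", "The capital of Ghana is Accra") > 0.5)
```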

15 pages, 683 KiB  
Article
Cross-Lingual Short-Text Semantic Similarity for Kannada–English Language Pair
by Muralikrishna S N, Raghurama Holla, Harivinod N and Raghavendra Ganiga
Computers 2024, 13(9), 236; https://doi.org/10.3390/computers13090236 - 18 Sep 2024
Cited by 1 | Viewed by 1839
Abstract
Analyzing the semantic similarity of cross-lingual texts is a crucial part of natural language processing (NLP). The computation of semantic similarity is essential for a variety of tasks, such as evaluating machine translation systems, quality-checking human translation, information retrieval, and plagiarism checks. In this paper, we propose a method for measuring the semantic similarity of Kannada–English sentence pairs that uses embedding space alignment, lexical decomposition, word order, and a convolutional neural network. The proposed method achieves a maximum correlation of 83% with human annotations, and experiments on semantic matching and retrieval tasks yield promising results in terms of precision and recall.

15 pages, 567 KiB  
Article
Oversea Cross-Lingual Summarization Service in Multilanguage Pre-Trained Model through Knowledge Distillation
by Xiwei Yang, Jing Yun, Bofei Zheng, Limin Liu and Qi Ban
Electronics 2023, 12(24), 5001; https://doi.org/10.3390/electronics12245001 - 14 Dec 2023
Cited by 2 | Viewed by 1344
Abstract
Cross-lingual text summarization is a highly desired service for overseas report editing and is formulated as a distributed application to facilitate cooperation among editors. A multilanguage pre-trained language model (MPLM) can generate high-quality cross-lingual summaries with simple fine-tuning. However, the MPLM does not adapt well to complex variations across languages, such as word order and tense; when applied to languages with distinct syntactic structures and vocabulary morphologies, summary quality degrades, and the problem worsens when the cross-lingual summarization datasets are low-resource. We address these issues with a knowledge distillation framework for the cross-lingual summarization task: by learning from a monolingual teacher model, the cross-lingual student model can effectively capture the differences between languages. Since the teacher and student models generate summaries in two languages, their representations lie in different vector spaces. To construct representation relationships across languages, we further propose a similarity metric based on bidirectional semantic alignment that maps different language representations into the same space. To further improve summary quality, we use contrastive learning to make the student model focus on the differences among languages; contrastive learning also strengthens the similarity metric's bidirectional semantic alignment. Our experiments show that our approach is competitive in low-resource scenarios on cross-lingual summarization datasets for pairs of distant languages.
(This article belongs to the Special Issue New Advances in Distributed Computing and Its Applications)
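A minimal sketch of the representation-level distillation the abstract outlines: the student's hidden states are pulled toward the monolingual teacher's after both are projected into a shared space. The projection architecture, dimensions, and loss weighting below are assumptions, not the authors' design.

```python
# Sketch of representation-level distillation: the cross-lingual student
# trains on its own summarization loss plus an alignment term that pulls
# its pooled hidden states toward the teacher's in a shared space.
import torch.nn as nn
import torch.nn.functional as F

class SharedSpaceProjector(nn.Module):
    """Maps teacher and student representations into one vector space."""
    def __init__(self, d_student, d_teacher, d_shared=512):
        super().__init__()
        self.proj_s = nn.Linear(d_student, d_shared)
        self.proj_t = nn.Linear(d_teacher, d_shared)

    def alignment_loss(self, student_repr, teacher_repr):
        s = F.normalize(self.proj_s(student_repr), dim=-1)
        t = F.normalize(self.proj_t(teacher_repr), dim=-1)
        return 1 - (s * t).sum(-1).mean()   # cosine distance in shared space

def total_loss(summarization_ce, align, beta=0.1):
    # beta is an illustrative weight; the paper's bidirectional metric
    # and contrastive term would replace the plain cosine distance here.
    return summarization_ce + beta * align
```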

16 pages, 2845 KiB  
Article
An Interactive Framework of Cross-Lingual NLU for In-Vehicle Dialogue
by Xinlu Li, Liangkuan Fang, Lexuan Zhang and Pei Cao
Sensors 2023, 23(20), 8501; https://doi.org/10.3390/s23208501 - 16 Oct 2023
Cited by 2 | Viewed by 1647
Abstract
As globalization accelerates, the linguistic diversity and semantic complexity of in-vehicle communication are increasing. To meet the needs of speakers of different languages, this paper proposes an interactive attention-based contrastive learning framework (IABCL) for in-vehicle dialogue, aiming to effectively enhance cross-lingual natural language understanding (NLU). IABCL combines contrastive learning with an attention mechanism. First, contrastive learning is applied in the encoder stage: positive and negative samples allow the model to learn different linguistic expressions of similar meanings, improving its cross-lingual learning ability. Second, the attention mechanism is applied in the decoder stage: by modeling slots and intents jointly, the model learns the relationship between the two, improving natural language understanding within languages of the same language family. In addition, this paper constructs a multilingual in-vehicle dialogue (MIvD) dataset for experimental evaluation to demonstrate the effectiveness and accuracy of the IABCL framework in cross-lingual dialogue. Compared with the latest model, IABCL improves by 2.42% on intent, 1.43% on slot, and 2.67% overall.

26 pages, 786 KiB  
Article
Finding Patient Zero and Tracking Narrative Changes in the Context of Online Disinformation Using Semantic Similarity Analysis
by Codruț-Georgian Artene, Ciprian Oprișa, Cristian Nicolae Buțincu and Florin Leon
Mathematics 2023, 11(9), 2053; https://doi.org/10.3390/math11092053 - 26 Apr 2023
Cited by 2 | Viewed by 2215
Abstract
Disinformation in the form of news articles, also called fake news, is used by multiple actors for nefarious purposes, such as gaining political advantage. A key component of fake news detection is the ability to find similar articles in a large document corpus, in order to track narrative changes and identify the root source (patient zero) of a particular piece of information. This paper presents new techniques based on textual and semantic similarity that were adapted to this goal on large datasets of news articles, with the aim of determining which of the implemented text similarity techniques is more suitable for the task. For text similarity, Locality-Sensitive Hashing is applied to n-grams extracted from text to produce representations that are then indexed to facilitate the quick discovery of similar articles. The semantic textual similarity technique is based on sentence embeddings from pre-trained language models, such as BERT, and Named Entity Recognition. The techniques are evaluated on a collection of Romanian articles in terms of quality of results and scalability, and both produce competitive results: the semantic textual similarity technique is better at identifying similar text documents, while the Locality-Sensitive Hashing technique outperforms it in execution time and scalability. Although they were evaluated only on Romanian texts and some rely on pre-trained models for the Romanian language, the underlying methods extend to other languages with few to no changes, provided pre-trained models exist for those languages; a fully cross-lingual setup would require further changes and tests. Based on the obtained results, the presented techniques are suitable for integration into a decentralized anti-disinformation platform for fact-checking and trust assessment.
(This article belongs to the Section E1: Mathematics and Computer Science)
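The LSH indexing step can be sketched with MinHash signatures over word n-grams, so that near-duplicate articles are found without pairwise comparison. The sketch below uses the datasketch library; the n-gram size and Jaccard threshold are assumptions rather than the paper's settings.

```python
# Sketch of LSH-based near-duplicate discovery: MinHash signatures over
# word trigrams, indexed for sublinear lookup. Parameters are illustrative.
from datasketch import MinHash, MinHashLSH

def ngrams(text, n=3):
    tokens = text.lower().split()
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def minhash(text, num_perm=128):
    m = MinHash(num_perm=num_perm)
    for g in ngrams(text):
        m.update(g.encode("utf8"))
    return m

lsh = MinHashLSH(threshold=0.7, num_perm=128)   # approximate Jaccard threshold
lsh.insert("article-1", minhash("the original story as it was first published by the agency"))
candidates = lsh.query(minhash("the original story, as first published by the agency"))
print(candidates)   # keys of indexed articles likely above the threshold
```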

14 pages, 1014 KiB  
Article
Meta-Learning for Mandarin-Tibetan Cross-Lingual Speech Synthesis
by Weizhao Zhang and Hongwu Yang
Appl. Sci. 2022, 12(23), 12185; https://doi.org/10.3390/app122312185 - 28 Nov 2022
Cited by 1 | Viewed by 2001
Abstract
This paper proposes a meta-learning-based Mandarin-Tibetan cross-lingual text-to-speech (TTS) system that realizes both Mandarin and Tibetan speech synthesis under a single framework. First, we build two Tacotron2-based Mandarin-Tibetan cross-lingual baseline TTS systems: one with a shared encoder and one with separate encoders. Both baselines use a speaker classifier with a gradient reversal layer to disentangle speaker-specific information from the text encoder, and a prosody generator extracts prosodic information from sentences to adequately exploit syntactic and semantic information. To further improve synthesized speech quality, we propose a meta-learning-based Mandarin-Tibetan cross-lingual TTS: building on the separate-encoder baseline, an additional dynamic network predicts the parameters of the language-dependent text encoder, enabling better cross-lingual knowledge sharing in the sequence-to-sequence TTS. Finally, Mandarin or Tibetan speech is synthesized through a single acoustic model. Baseline experiments show that the separate-encoder system handles input in different languages better than the shared-encoder system, and further results show that the proposed meta-learning-based method effectively improves the naturalness and speaker similarity of the synthesized speech.
(This article belongs to the Section Computing and Artificial Intelligence)
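The gradient reversal layer mentioned in the abstract is a standard adversarial trick: identity in the forward pass, negated and scaled gradient in the backward pass, so the text encoder is pushed to discard the speaker information that the speaker classifier tries to recover. A minimal PyTorch sketch, with the scaling factor as an assumption:

```python
# Standard gradient reversal layer: identity forward, reversed gradient
# backward. The lambda scaling is illustrative.
import torch

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lamb):
        ctx.lamb = lamb
        return x.view_as(x)                    # identity on the forward pass

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lamb * grad_output, None   # reversed, scaled gradient

def grad_reverse(x, lamb=1.0):
    return GradReverse.apply(x, lamb)

# Usage: encoder output -> grad_reverse -> speaker classifier. The classifier
# learns to predict the speaker while the encoder is trained the other way,
# yielding speaker-invariant text representations.
```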

12 pages, 2177 KiB  
Article
Constructing Uyghur Commonsense Knowledge Base by Knowledge Projection
by Azmat Anwar, Xiao Li, Yating Yang and Yajuan Wang
Appl. Sci. 2019, 9(16), 3318; https://doi.org/10.3390/app9163318 - 13 Aug 2019
Cited by 1 | Viewed by 2830
Abstract
Although considerable effort has been devoted to building commonsense knowledge bases (CKBs), such resources are still unavailable for many low-resource languages, such as Uyghur, because of the high construction cost. Focusing on this issue, we propose a cross-lingual knowledge-projection method to construct a Uyghur CKB by projecting ConceptNet's Chinese facts into Uyghur. We use a Chinese–Uyghur bilingual dictionary to obtain high-quality entity translations in facts and employ a back-translation method to eliminate entity-translation ambiguity. Moreover, to tackle relation ambiguity within translated facts, we hand-craft rules that convert the structured facts into natural-language phrases and align the Chinese–Uyghur phrase pairs with a bilingual semantic similarity scoring model. Experimental results show that the accuracy of our semantic similarity scoring model reaches 94.75% on this task, and the method successfully projects 55,872 Chinese facts into Uyghur, obtaining 67,375 Uyghur facts within a very short period.

21 pages, 1385 KiB  
Article
Improving Semantic Similarity with Cross-Lingual Resources: A Study in Bangla—A Low Resourced Language
by Rajat Pandit, Saptarshi Sengupta, Sudip Kumar Naskar, Niladri Sekhar Dash and Mohini Mohan Sardar
Informatics 2019, 6(2), 19; https://doi.org/10.3390/informatics6020019 - 5 May 2019
Cited by 13 | Viewed by 7735
Abstract
Semantic similarity is a long-standing problem in natural language processing (NLP). It is a topic of great interest, as its understanding can provide insight into how human beings comprehend meaning and make associations between words. However, when this problem is viewed from the standpoint of machine understanding, particularly for under-resourced languages, it poses a different problem altogether. In this paper, semantic similarity is explored in Bangla, a less-resourced language. To ameliorate the situation for such languages, the most rudimentary method (path-based) and the latest state-of-the-art method (Word2Vec) for semantic similarity calculation were augmented with cross-lingual resources in English, yielding striking improvements. Two semantic similarity approaches are explored for Bangla, namely a path-based and a distributional model, and their cross-lingual counterparts are synthesized in light of the English WordNet and corpora. The proposed methods were evaluated on a dataset comprising 162 Bangla word pairs annotated by five expert raters. The correlation scores obtained between the four metrics and human evaluation scores demonstrate the marked enhancement that the cross-lingual approach brings to semantic similarity calculation for Bangla.
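Of the two augmented baselines, the path-based measure is the easiest to sketch: score a word pair by the best WordNet path similarity over all synset pairs, computed on the English side once Bangla words have been mapped through a bilingual resource (that mapping step is omitted here, and the helper name is illustrative).

```python
# Path-based word similarity over the English WordNet; requires the
# wordnet data (nltk.download("wordnet")). The bilingual Bangla-to-English
# mapping step from the paper is omitted.
from nltk.corpus import wordnet as wn

def path_similarity(word1, word2):
    """Best path-based score over all synset pairs (None if no path)."""
    scores = [
        s1.path_similarity(s2)
        for s1 in wn.synsets(word1)
        for s2 in wn.synsets(word2)
    ]
    scores = [s for s in scores if s is not None]
    return max(scores, default=None)

print(path_similarity("car", "automobile"))   # 1.0: shared synset
print(path_similarity("car", "dog"))          # low score: distant in the hierarchy
```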

13 pages, 1016 KiB  
Article
Estimation of Cross-Lingual News Similarities Using Text-Mining Methods
by Zhouhao Wang, Enda Liu, Hiroki Sakaji, Tomoki Ito, Kiyoshi Izumi, Kota Tsubouchi and Tatsuo Yamashita
J. Risk Financial Manag. 2018, 11(1), 8; https://doi.org/10.3390/jrfm11010008 - 31 Jan 2018
Cited by 2 | Viewed by 6448
Abstract
This research proposes two machine-learning-based estimation algorithms for extracting cross-lingual news pairs from financial news articles. Every second, innumerable texts, including news, reports, messages, reviews, comments, and tweets, are generated on the Internet, written not only in English but also in other languages such as Chinese, Japanese, and French. Taking advantage of the multilingual text resources provided by Thomson Reuters News, we developed two estimation algorithms for extracting cross-lingual news pairs. The first method proposes a novel structure that effectively combines word-level information with machine learning for this task. The second is a bidirectional Long Short-Term Memory (LSTM)-based method that calculates cross-lingual semantic text similarity for long and short texts, respectively. Thus, when an important news article is published, users can read similar news articles written in their native language using our method.
(This article belongs to the Special Issue Empirical Finance)
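A hedged sketch of the kind of siamese bidirectional-LSTM scorer the abstract describes: one encoder per language, mean-pooled states compared by cosine similarity in a shared space. Dimensions, pooling, and the two-encoder split are assumptions, not the authors' exact architecture.

```python
# Sketch of a siamese biLSTM cross-lingual similarity scorer;
# dimensions and pooling are illustrative assumptions.
import torch.nn as nn
import torch.nn.functional as F

class BiLSTMEncoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)

    def forward(self, token_ids):
        out, _ = self.lstm(self.emb(token_ids))   # (batch, seq, 2*hidden)
        return out.mean(dim=1)                    # mean-pool over time steps

def similarity(encoder_src, encoder_tgt, src_ids, tgt_ids):
    # One encoder per language; scores are compared in the shared space.
    return F.cosine_similarity(encoder_src(src_ids), encoder_tgt(tgt_ids), dim=-1)
```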

21 pages, 2369 KiB  
Article
Geospatial Information Categories Mapping in a Cross-lingual Environment: A Case Study of “Surface Water” Categories in Chinese and American Topographic Maps
by Xi Kuai, Lin Li, Heng Luo, Shen Hang, Zhijun Zhang and Yu Liu
ISPRS Int. J. Geo-Inf. 2016, 5(6), 90; https://doi.org/10.3390/ijgi5060090 - 14 Jun 2016
Cited by 12 | Viewed by 5736
Abstract
Integrating geospatial information (GI) from various heterogeneous sources has become increasingly important for geographic information system (GIS) interoperability, and using domain ontologies to clarify and integrate the semantics of data is considered a crucial step for successful semantic integration in the GI domain. Nevertheless, mechanisms are still needed to facilitate semantic mapping between GI ontologies described in different natural languages. This research establishes a formal ontology model for cross-lingual geospatial information ontology mapping. After first extracting semantic primitives from the free-text definitions of categories in two GI classification standards written in different natural languages, an ontology-driven approach is used to represent these semantic primitives formally as semantic statements, in which spatial properties and relations are treated as the crucial statements for representing and identifying the semantics of GI categories. An algorithm is then proposed to compare these semantic statements in a cross-lingual environment, and a similarity calculation algorithm based on the formal ontology model is designed to quantify semantic similarities and identify the mapping relationships between categories. In particular, we work with the GI classification standards for Chinese and American topographic maps. The experimental results demonstrate the feasibility and reliability of the proposed model for cross-lingual geospatial information ontology mapping.
(This article belongs to the Special Issue Geospatial Semantics and Semantic Web)
