MDPI - Publisher of Open Access Journals

25 pages, 647 KB

Open Access Article

AI-Driven Sensing for Cross-Lingual Risk Prediction via Semantic Alignment and Multimodal Temporal Fusion

by Yida Zhang, Ceteng Fu, Xi Wang, Yiheng Zhang, Ziyu Xiong, Jingjin Pan and Jinghui Yin

Appl. Sci. 2026, 16(8), 3741; https://doi.org/10.3390/app16083741 - 10 Apr 2026

Viewed by 472


In the context of highly interconnected global markets and the rapid dissemination of multilingual information, traditional risk prediction methods that rely on single numerical sequences or monolingual text are insufficient for the early perception of cross-market risks. To address this issue, a cross-market risk early warning framework based on multilingual large language models and multimodal sensing fusion is proposed. The approach is centered on a unified risk semantic space, in which cross-lingual semantic alignment is employed to reduce semantic discrepancies across languages. Furthermore, a semantic–volatility coupling attention mechanism is introduced to capture the dynamic relationship between textual semantic evolution and market fluctuations. In addition, cross-market knowledge transfer and low-resource enhancement strategies are incorporated to improve the model's generalization capability across multilingual and multi-market environments, thereby establishing an intelligent perception and early warning system for complex sensing scenarios. Experimental results demonstrate that the proposed method significantly outperforms multiple baseline models in multilingual cross-market risk prediction tasks. In the main experiment, the model achieves a root mean squared error (RMSE) of 0.1127, a mean absolute error (MAE) of 0.0846, and an area under the curve (AUC) of 0.8879, while the early warning gain is improved to 5.2 days, which is substantially better than the Transformer model (RMSE 0.1365, AUC 0.8042) and the multilingual BERT-based fusion model (AUC 0.8395). In terms of classification performance, higher accuracy, precision, and recall are consistently achieved, with overall accuracy exceeding 0.88 and both precision and recall maintained above 0.85, indicating strong discriminative capability in risk identification tasks. Cross-lingual generalization experiments further verify the robustness of the proposed framework. When trained solely on the English market, the model achieves AUC values of 0.8624 and 0.8471 on the Chinese and European markets, respectively, with RMSE reduced to 0.1185, significantly outperforming competing methods. Overall, the proposed approach achieves substantial improvements in prediction accuracy, cross-lingual generalization, and early warning performance, providing an effective solution for artificial intelligence-driven sensing and risk early warning.
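RMSE, MAE, and AUC are standard metrics rather than anything specific to this framework. The sketch below shows how they are commonly computed, with AUC taken as the probability that a randomly chosen positive is scored above a randomly chosen negative (the pairwise Mann-Whitney definition). It is an illustrative sketch, not the authors' evaluation code, and the toy inputs are made up.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: sqrt(mean((y - y_hat)^2))."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mae(y_true, y_pred):
    """Mean absolute error: mean(|y - y_hat|)."""
    n = len(y_true)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n


def auc(labels, scores):
    """ROC AUC via pairwise comparison of positives vs. negatives (ties count 1/2)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy usage: these numbers are illustrative, not from the paper.
print(rmse([0.2, 0.5, 0.9], [0.25, 0.4, 0.8]))
print(mae([0.2, 0.5, 0.9], [0.25, 0.4, 0.8]))
print(auc([1, 0, 1, 0], [0.9, 0.3, 0.6, 0.7]))  # 3 of 4 positive/negative pairs ranked correctly
```

The pairwise AUC is quadratic in the number of samples; a rank-based formula gives the same value faster on large test sets.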


21 pages, 753 KB

Open Access Article

Learnable Convolutional Attention Network for Unsupervised Knowledge Graph Entity Alignment

by Weishan Cai and Wenjun Ma

Entropy 2025, 27(9), 924; https://doi.org/10.3390/e27090924 - 3 Sep 2025

Cited by 1 | Viewed by 1501

Abstract


The success of current entity alignment (EA) tasks largely depends on the supervision information provided by labeled data. Considering the cost of labeled data, most supervised methods are difficult to apply in practical scenarios. Therefore, an increasing number of works based on contrastive learning, active learning, or other deep learning techniques have been developed to solve the performance bottleneck caused by the lack of labeled data. However, existing unsupervised EA methods still face certain limitations: either their modeling complexity is high or they fail to balance the effectiveness and practicality of alignment. To overcome these issues, we propose a learnable convolutional attention network for unsupervised entity alignment, named LCA-UEA. Specifically, LCA-UEA performs convolution operations before the attention mechanism, ensuring the acquisition of structural information and avoiding the superposition of redundant information. Then, to efficiently filter out invalid neighborhood information of aligned entities, LCA-UEA designs a relation structure reconstruction method based on potential matching relations, thereby enhancing the usability and scalability of the EA method. Notably, a consistency-based similarity function is proposed to better measure the similarity of candidate entity pairs. Finally, we conducted extensive experiments on three datasets of different sizes and types (cross-lingual and monolingual) to verify the superiority of LCA-UEA. Experimental results demonstrate that LCA-UEA significantly improves alignment accuracy, outperforming 25 supervised or unsupervised methods and improving Hits@1 by 6.4% over the best baseline in the best case.

(This article belongs to the Special Issue Entropy in Machine Learning Applications, 2nd Edition)
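Hits@1, the metric quoted above, is the share of test entities whose correct counterpart is ranked first among all candidates. The sketch below computes Hits@k from a dense similarity matrix, assuming (as an illustration) that the gold alignment pairs source entity i with target entity i. It does not implement LCA-UEA's consistency-based similarity function; any similarity scores can be fed in.

```python
def hits_at_k(similarity, k=1):
    """Fraction of source entities whose gold target is ranked within the top-k.

    Assumes similarity[i][j] scores source entity i against target entity j
    and that the gold counterpart of source i is target i (diagonal layout).
    """
    hits = 0
    for i, row in enumerate(similarity):
        # Rank candidate target indices by descending similarity.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(similarity)


# Toy similarity matrix (illustrative values only).
sim = [
    [0.9, 0.1, 0.3],
    [0.2, 0.4, 0.8],
    [0.1, 0.7, 0.6],
]
print(hits_at_k(sim, k=1))  # only entity 0 has its gold match ranked first
print(hits_at_k(sim, k=2))
```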


20 pages, 904 KB

Open Access Article

Addressing Structural Asymmetry: Unsupervised Joint Training of Bilingual Embeddings for Non-Isomorphic Spaces

by Lei Meng, Xiaona Yang, Shangfeng Chen and Xiaojun Zhao

Symmetry 2025, 17(7), 1005; https://doi.org/10.3390/sym17071005 - 26 Jun 2025

Viewed by 1256

Abstract


Bilingual Word Embeddings (BWEs) are crucial for multilingual NLP tasks, enabling cross-lingual transfer. While traditional joint training methods require bilingual corpora, their applicability is limited for many language pairs, especially low-resource ones. Unsupervised methods, which rely on the isomorphism assumption, suffer from performance degradation when dealing with non-isomorphic embedding spaces, which are common in distant language pairs. This structural asymmetry challenges conventional approaches. To address these limitations, we propose a novel unsupervised joint training method for BWEs. We leverage monolingual corpora and introduce a dynamic programming algorithm to extract bilingual text segments, facilitating concurrent BWE training without relying on explicit bilingual supervision. Our approach effectively mitigates the challenge posed by asymmetric, non-isomorphic spaces by jointly learning BWEs in a shared space. Extensive experiments demonstrate the superiority of our method over existing approaches, particularly for distant language pairs exhibiting significant structural asymmetry.

(This article belongs to the Special Issue Symmetry/Asymmetry Studies in Data Mining & Machine Learning of Large Language Models)
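To make the isomorphism assumption concrete: conventional mapping-based approaches typically learn a single orthogonal matrix W that rotates one embedding space onto the other, for example via orthogonal Procrustes. The sketch below shows that conventional baseline step, assuming a seed set of paired rows in X and Y; it is not the proposed joint-training method, whose dynamic programming segment extraction is not specified in the abstract. When the two spaces are non-isomorphic, no orthogonal W can make XW close to Y, which is the failure mode the abstract describes.

```python
import numpy as np


def procrustes_map(X, Y):
    """Orthogonal W minimizing ||X W - Y||_F (orthogonal Procrustes).

    X, Y: (n, d) arrays whose i-th rows are embeddings of an assumed
    seed translation pair. Solution: W = U V^T where U S V^T = svd(X^T Y).
    """
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt


def alignment_residual(X, Y, W):
    """Frobenius norm of X W - Y; near zero only if the spaces are (nearly) isomorphic."""
    return float(np.linalg.norm(X @ W - Y))


# Synthetic check: Y is an exact rotation of X, i.e. the isomorphic case.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 4))
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))
W = procrustes_map(X, X @ Q)
print(alignment_residual(X, X @ Q, W))
```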


15 pages, 567 KB

Open Access Article

Oversea Cross-Lingual Summarization Service in Multilanguage Pre-Trained Model through Knowledge Distillation

by Xiwei Yang, Jing Yun, Bofei Zheng, Limin Liu and Qi Ban

Electronics 2023, 12(24), 5001; https://doi.org/10.3390/electronics12245001 - 14 Dec 2023

Cited by 4 | Viewed by 2011

Abstract


Cross-lingual text summarization is a highly desired service for overseas report editing tasks and is formulated as a distributed application to facilitate cooperation among editors. The multilanguage pre-trained language model (MPLM) can generate high-quality cross-lingual text summaries with simple fine-tuning. However, the MPLM does not adapt to complex variations across languages, such as word order and tense. When the model operates on languages with different syntactic structures and vocabulary morphologies, the resulting cross-lingual summaries are of low quality. The problem worsens when the cross-lingual summarization datasets are low-resource. To address these issues, we use a knowledge distillation framework for the cross-lingual summarization task. By learning from the monolingual teacher model, the cross-lingual student model can effectively capture the differences between languages. Since the teacher and student models generate summaries in two languages, their representations lie in different vector spaces. To construct representation relationships across languages, we further propose a similarity metric based on bidirectional semantic alignment, which maps the representations of different languages to the same space. To further improve the quality of cross-lingual summaries, we use contrastive learning to make the student model focus on the differences among languages. Contrastive learning can enhance the ability of the similarity metric to perform bidirectional semantic alignment. Our experiments show that our approach is competitive in low-resource scenarios on cross-language summarization datasets for pairs of distant languages.

(This article belongs to the Special Issue New Advances in Distributed Computing and Its Applications)
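The abstract does not state the exact distillation objective. As an assumption, the sketch below shows the generic soft-target formulation commonly used in knowledge distillation: the student is trained to match the teacher's temperature-softened output distribution via KL divergence, scaled by T squared. It omits the paper's bidirectional semantic-alignment similarity metric and its contrastive term, and the logits are toy values.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax with max-subtraction for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, temperature=2.0):
    """Generic soft-target distillation loss: T^2 * KL(teacher || student).

    Both distributions are softened with the same temperature; zero-probability
    teacher entries contribute nothing to the KL sum.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return (temperature ** 2) * kl


# Toy usage: identical logits give zero loss; mismatched logits give a positive loss.
print(kd_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
print(kd_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0]))
```

The T squared factor keeps gradient magnitudes roughly comparable across temperatures when this term is mixed with an ordinary cross-entropy loss.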


17 pages, 402 KB

Open Access Article

Cross-Corpus Multilingual Speech Emotion Recognition: Amharic vs. Other Languages

by Ephrem Afele Retta, Richard Sutcliffe, Jabar Mahmood, Michael Abebe Berwo, Eiad Almekhlafi, Sajjad Ahmad Khan, Shehzad Ashraf Chaudhry, Mustafa Mhamed and Jun Feng

Appl. Sci. 2023, 13(23), 12587; https://doi.org/10.3390/app132312587 - 22 Nov 2023

Cited by 13 | Viewed by 3665

Abstract


In a conventional speech emotion recognition (SER) task, a classifier for a given language is trained on a pre-existing dataset for that same language. However, where training data for a language do not exist, data from other languages can be used instead. We experiment with cross-lingual and multilingual SER, working with Amharic, English, German, and Urdu. For Amharic, we use our own publicly available Amharic Speech Emotion Dataset (ASED). For English, German, and Urdu, we use the existing RAVDESS, EMO-DB, and URDU datasets. We followed previous research in mapping the labels of all datasets to just two classes: positive and negative. Thus, we can compare performance across languages directly and combine languages for training and testing. In Experiment 1, monolingual SER trials were carried out using three classifiers: AlexNet, VGGE (a proposed variant of VGG), and ResNet50. The results, averaged over the three models, were very similar for ASED and RAVDESS, suggesting that Amharic and English SER are equally difficult. By comparison, German SER is more difficult, and Urdu SER is easier. In Experiment 2, we trained on one language and tested on another, in both directions, for each of the following pairs: Amharic↔German, Amharic↔English, and Amharic↔Urdu. The results with Amharic as the target suggested that using English or German as the source gives the best result. In Experiment 3, we trained on several non-Amharic languages and then tested on Amharic. The best accuracy obtained was several percentage points higher than the best accuracy in Experiment 2, suggesting that a better result can be obtained by training on two or three non-Amharic languages than on just one. Overall, the results suggest that cross-lingual and multilingual training can be an effective strategy for training an SER classifier when resources for a language are scarce.

(This article belongs to the Special Issue Natural Language Processing (NLP) and Applications)
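Mapping every dataset onto the same two classes is what allows languages to be compared directly and pooled for training, as in Experiment 3. The sketch below illustrates that preprocessing step. The emotion-to-valence table is hypothetical: the abstract does not list which emotions the authors assigned to each class, so ambiguous labels are simply left unmapped and dropped here. Feature extraction and the classifiers (AlexNet, VGGE, ResNet50) are out of scope.

```python
# Hypothetical valence table, for illustration only; the authors' actual
# emotion-to-class assignment is not given in the abstract.
VALENCE = {
    "happy": "positive",
    "calm": "positive",
    "angry": "negative",
    "sad": "negative",
    "fearful": "negative",
    "disgust": "negative",
}


def to_binary(samples):
    """Map (features, emotion) pairs to (features, valence), dropping unmapped emotions."""
    return [(x, VALENCE[e]) for x, e in samples if e in VALENCE]


def build_training_set(corpora, source_languages):
    """Pool binary-labelled samples from several source-language corpora.

    corpora: dict mapping a language code to a list of (features, emotion) pairs.
    """
    pooled = []
    for lang in source_languages:
        pooled.extend(to_binary(corpora[lang]))
    return pooled


# Toy corpora with placeholder feature strings (not real audio features).
corpora = {
    "en": [("a", "happy"), ("b", "sad"), ("c", "surprised")],
    "de": [("d", "angry")],
}
print(build_training_set(corpora, ["en", "de"]))  # "surprised" is unmapped, so it is dropped
```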


18 pages, 2814 KB

Open Access Article

A Multitask Cross-Lingual Summary Method Based on ABO Mechanism

by Qing Li, Weibing Wan and Yuming Zhao

Appl. Sci. 2023, 13(11), 6723; https://doi.org/10.3390/app13116723 - 31 May 2023

Cited by 1 | Viewed by 1988

Abstract


Recent cross-lingual summarization research has pursued unified end-to-end models, which have demonstrated some improvement in performance and effectiveness, but this approach stitches together multiple tasks and makes the computation more complex. Less work has focused on alignment relationships across languages, which has led to persistent problems of summary misordering and loss of key information. We therefore first simplify the multitask setup by converting the translation task into an equal proportion of cross-lingual summary tasks, so that the model performs only cross-lingual summary tasks when generating cross-lingual summaries. In addition, we splice monolingual and cross-lingual summary sequences into a single input so that the model can fully learn the core content of the corpus. Then, we propose a reinforced regularization method based on the model to improve its robustness, and we build a targeted ABO mechanism to enhance semantic relationship alignment and key-information retention in the cross-lingual summaries. Ablation experiments on three datasets of different orders of magnitude demonstrate that the optimization approach effectively enhances the model, which outperforms mainstream approaches on both the cross-lingual and the monolingual summarization tasks for the full dataset. Finally, we validate the model's capabilities on a cross-lingual summary dataset from professional domains, and the results demonstrate its superior performance and its ability to improve cross-lingual sequencing.

(This article belongs to the Special Issue Natural Language Processing: Novel Methods and Applications)
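The splicing step described above joins a monolingual summary and a cross-lingual summary into one input sequence. A minimal sketch over token lists follows. The separator token "<sep>" and the monolingual-first ordering are assumptions for illustration; the abstract does not specify the actual separator, ordering, or tokenizer, and the ABO mechanism itself is not sketched.

```python
from typing import List


def splice_summaries(mono: List[str], cross: List[str], sep: str = "<sep>") -> List[str]:
    """Join monolingual and cross-lingual summary tokens into a single input.

    The "<sep>" default and the mono-then-cross order are hypothetical
    placeholders, not details confirmed by the source.
    """
    return list(mono) + [sep] + list(cross)


# Toy usage with whitespace-split tokens (illustrative only).
print(splice_summaries("stocks rose sharply".split(), "股市 大涨".split()))
```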


15 pages, 5546 KB

Open Access Article

Domain-Adversarial Based Model with Phonological Knowledge for Cross-Lingual Speech Recognition

by Qingran Zhan, Xiang Xie, Chenguang Hu, Juan Zuluaga-Gomez, Jing Wang and Haobo Cheng

Electronics 2021, 10(24), 3172; https://doi.org/10.3390/electronics10243172 - 20 Dec 2021

Cited by 4 | Viewed by 4038

Abstract


Phonological-based features (articulatory features, AFs) describe the movements of the vocal organs, which are shared across languages. This paper investigates a domain-adversarial neural network (DANN) to extract reliable AFs, and different multi-stream techniques are used for cross-lingual speech recognition. First, a novel universal phonological attribute definition is proposed for Mandarin, English, German, and French. Then, a DANN-based AF detector is trained using the source languages (English, German, and French). For cross-lingual speech recognition, the AF detectors are used to transfer phonological knowledge from the source languages (English, German, and French) to the target language (Mandarin). Two multi-stream approaches are introduced to fuse the acoustic features and the cross-lingual AFs. In addition, a monolingual AF system (i.e., one in which the AFs are extracted directly from the target language) is also investigated. Experiments show that the performance of the AF detector can be improved by using convolutional neural networks (CNNs) with a domain-adversarial learning method. The multi-head attention (MHA)-based multi-stream approach achieves the best performance compared with the baseline, the cross-lingual adaptation approach, and other approaches. More specifically, the MHA mode with cross-lingual AFs yields significant improvements over monolingual AFs when the training data size is restricted, and it can easily be extended to other low-resource languages.

(This article belongs to the Special Issue Applications of Neural Networks for Speech and Language Processing)
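The standard mechanism behind a DANN is a gradient reversal layer placed between the feature extractor and the domain (here, language) classifier: it is the identity in the forward pass and multiplies the incoming gradient by -lambda in the backward pass, so the domain head learns to identify the language while the features are pushed to hide it. The scalar toy below makes that sign flip explicit with hand-written chain rule. It is a sketch of the general technique under assumed toy parameters (w, v, lam), not the authors' CNN-based AF detector.

```python
class GradientReversal:
    """Identity forward; scales the incoming gradient by -lam on the way back."""

    def __init__(self, lam=1.0):
        self.lam = lam

    def forward(self, x):
        return x

    def backward(self, grad_output):
        return -self.lam * grad_output


def dann_domain_grads(w, v, x, target, lam=1.0):
    """Gradients of a squared domain loss for a toy scalar model.

    Feature h = w * x, GRL, then domain prediction d = v * h,
    loss L = 0.5 * (d - target) ** 2. Returns (dL/dw, dL/dv) as seen
    by each parameter after gradient reversal.
    """
    grl = GradientReversal(lam)
    h = w * x                        # feature extractor
    h_rev = grl.forward(h)           # identity in the forward pass
    d = v * h_rev                    # domain classifier
    dL_dd = d - target
    grad_v = dL_dd * h_rev           # domain head: ordinary gradient
    dL_dh = grl.backward(dL_dd * v)  # reversed before reaching the features
    grad_w = dL_dh * x
    return grad_w, grad_v


# With lam = 1, the feature gradient has the opposite sign of plain backprop.
print(dann_domain_grads(1.0, 2.0, 3.0, 0.0, lam=1.0))
```

In a real framework the same effect is obtained with a custom autograd function whose backward negates and scales the gradient; lambda is often ramped up during training.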


24 pages, 912 KB

Open Access Article

Monolingual and Cross-Lingual Intent Detection without Training Data in Target Languages

by Jurgita Kapočiūtė-Dzikienė, Askars Salimbajevs and Raivis Skadiņš

Electronics 2021, 10(12), 1412; https://doi.org/10.3390/electronics10121412 - 11 Jun 2021

Cited by 10 | Viewed by 4850

Abstract


Due to recent DNN advancements, many NLP problems can be effectively solved using transformer-based models and supervised data. Unfortunately, such data are not available in some languages. This research is based on the assumptions that (1) training data can be obtained by machine-translating them from another language, and (2) there are cross-lingual solutions that work without training data in the target language. Consequently, we use the English dataset and solve the intent detection problem for five target languages (German, French, Lithuanian, Latvian, and Portuguese). Seeking the most accurate solutions, we investigate BERT-based word and sentence transformers together with eager learning classifiers (CNN, BERT fine-tuning, FFNN) and a lazy learning approach (cosine similarity as the memory-based method). We offer and evaluate several strategies to overcome the data scarcity problem with machine translation, cross-lingual models, and a combination of the two. The experimental investigation revealed the robustness of sentence transformers under various cross-lingual conditions. An accuracy of ~0.842, achieved on the English dataset with completely monolingual models, is considered our top line. Cross-lingual approaches, however, demonstrate similar accuracy levels, reaching ~0.831, ~0.829, ~0.853, ~0.831, and ~0.813 on German, French, Lithuanian, Latvian, and Portuguese, respectively.

(This article belongs to the Special Issue Hybrid Methods for Natural Language Processing)
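The lazy learning approach mentioned above (cosine similarity as a memory-based method) can be read as nearest-neighbor classification: embeddings of the English training sentences are stored, and a target-language query is assigned the intent of its most cosine-similar stored example. This only works if a cross-lingual sentence encoder places both languages in a shared space. In the sketch below the encoder is assumed and replaced by toy 2-D vectors; it illustrates the technique, not the authors' exact setup.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def predict_intent(query_vec, memory):
    """Return the intent of the stored example most cosine-similar to the query.

    memory: list of (embedding, intent) pairs, e.g. English training sentences
    encoded by an assumed multilingual sentence encoder (not shown here).
    """
    _, best_label = max(memory, key=lambda item: cosine(query_vec, item[0]))
    return best_label


# Toy memory of English examples (placeholder vectors, not real embeddings).
memory = [([1.0, 0.0], "greet"), ([0.0, 1.0], "bye")]
# A German query such as "Hallo!" would be encoded into the same space first.
print(predict_intent([0.9, 0.1], memory))
```

Because nothing is trained, adding a new intent only requires storing more labelled embeddings, which is the main appeal of the memory-based option when target-language data are missing.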


Search Results (8)
