Data-Driven Cross-Lingual Anomaly Detection via Self-Supervised Representation Learning
Abstract
1. Introduction
- Unlike existing methods that treat linguistic features and behavioral sequences in isolation, we construct a unified framework, LR-SSAD, that exploits the complementarity between semantic invariants and temporal regularities. This design addresses the feature-sparsity bottleneck of low-resource scenarios by letting each modality regularize the representation learning of the other.
- We propose a joint optimization objective that integrates cross-lingual masked prediction with Mamba-based sequence reconstruction (the first sketch after this list illustrates one way to combine the two losses). Distinct from standard auto-encoding or masked-modeling approaches, this strategy exploits the linear complexity of state space models to capture long-range behavioral dependencies, which serve as a stable anchor against the semantic drift often observed in low-resource language models.
- To overcome the “confirmation bias” inherent in conventional self-training and pseudo-labeling frameworks, we design a noise-robust pseudo-label refinement mechanism (see the second sketch below). By dynamically re-weighting samples according to prototype uncertainty, this mechanism stabilizes the optimization trajectory and prevents noise from accumulating in scarce-label environments.
- We leverage cross-lingual semantic alignment not only for representation matching but also to shape a consistent anomaly decision boundary across languages (see the third sketch below). This offers a new technical perspective on mitigating negative transfer in robust risk-control systems, ensuring that anomaly definitions remain consistent even when transferring from high-resource to low-resource settings.
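To make the second contribution concrete, the following PyTorch sketch shows one plausible way the masked-prediction and reconstruction losses could be combined into a single objective. The module names (`mlm_head`, `rec_head`), the plain weighted sum, and the coefficient `lambda_rec` are our illustrative assumptions, not the exact formulation specified in Sections 3.3.2 and 3.3.3.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class JointSSLObjective(nn.Module):
    """Hypothetical joint loss: cross-lingual masked prediction (CLMP)
    plus behavior-sequence reconstruction (BSR)."""

    def __init__(self, hidden_dim: int, vocab_size: int, feat_dim: int,
                 lambda_rec: float = 1.0):
        super().__init__()
        self.mlm_head = nn.Linear(hidden_dim, vocab_size)  # masked-token prediction
        self.rec_head = nn.Linear(hidden_dim, feat_dim)    # feature reconstruction
        self.lambda_rec = lambda_rec                       # assumed balance weight

    def forward(self, text_states, masked_token_ids, mask_positions,
                seq_states, seq_targets):
        # CLMP: only the positions that were masked contribute to the loss.
        logits = self.mlm_head(text_states[mask_positions])  # (n_masked, vocab)
        loss_clmp = F.cross_entropy(logits, masked_token_ids)

        # BSR: rebuild the raw behavioral features from the (e.g.,
        # Mamba-encoded) sequence states.
        recon = self.rec_head(seq_states)                     # (B, T, feat_dim)
        loss_bsr = F.mse_loss(recon, seq_targets)

        return loss_clmp + self.lambda_rec * loss_bsr
```

Here `text_states` are encoder outputs for the text stream, `mask_positions` is a boolean mask over token positions, and `seq_targets` are the original behavioral feature vectors; the two encoders themselves are omitted.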
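Second, a minimal sketch of the prototype-uncertainty re-weighting behind the third contribution. The cosine affinity, the entropy-based weight, and the temperature value are assumptions chosen for illustration; the refinement mechanism of Section 3.3.4 may differ in detail.

```python
import math

import torch
import torch.nn.functional as F


def pseudo_label_weights(embeddings: torch.Tensor,
                         prototypes: torch.Tensor,
                         temperature: float = 0.1):
    """Assign pseudo-labels and down-weight uncertain ones.

    embeddings: (N, D) representations of unlabeled samples.
    prototypes: (C, D) class prototypes with C >= 2 (e.g., running
                means of confidently labeled samples per class).
    Returns pseudo-labels (N,) and per-sample weights in [0, 1].
    """
    # Cosine affinity between each sample and each prototype.
    sim = F.normalize(embeddings, dim=1) @ F.normalize(prototypes, dim=1).T
    probs = F.softmax(sim / temperature, dim=1)  # (N, C)

    pseudo_labels = probs.argmax(dim=1)
    # Normalized entropy as uncertainty: ambiguous samples get weights
    # near zero and therefore barely influence the pseudo-label loss.
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=1)
    weights = 1.0 - entropy / math.log(probs.size(1))
    return pseudo_labels, weights


# Hypothetical use inside a training step, given classifier `logits`:
#   labels, w = pseudo_label_weights(z, prototypes)
#   loss = (w * F.cross_entropy(logits, labels, reduction="none")).mean()
```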
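Third, one common way to realize the cross-lingual alignment of the fourth contribution is a symmetric InfoNCE loss over translation pairs; the in-batch-negative scheme and the temperature below are assumptions that make the alignment idea concrete, not the paper's confirmed objective.

```python
import torch
import torch.nn.functional as F


def alignment_loss(z_src: torch.Tensor, z_tgt: torch.Tensor,
                   temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE alignment between paired representations.

    Row i of z_src (high-resource) and z_tgt (low-resource) is assumed
    to be a translation pair; all other rows act as in-batch negatives,
    so matched meanings are pulled together across languages.
    """
    z_src = F.normalize(z_src, dim=1)
    z_tgt = F.normalize(z_tgt, dim=1)
    logits = z_src @ z_tgt.T / temperature  # (B, B) similarity matrix
    targets = torch.arange(z_src.size(0), device=z_src.device)
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.T, targets))
```

Pulling paired texts toward each other in this way encourages a downstream anomaly scorer to place its decision boundary at the same semantic location regardless of language.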
2. Related Work
2.1. Multilingual Financial Text Modeling Methods
2.2. Self-Supervised Learning and Anomaly Detection
2.3. Low-Resource Transfer Learning and Few-Shot Robust Modeling
3. Materials and Method
3.1. Data Collection
3.2. Dataset Processing and Enhancement
3.3. Proposed Method
3.3.1. Overview
3.3.2. Cross-Lingual Masked Prediction Module
3.3.3. Behavior Sequence Reconstruction Module
3.3.4. Pseudo-Label Noise Suppression and Stable Training Module
4. Results and Discussion
4.1. Experimental Setup
4.1.1. Hardware and Software Infrastructure
4.1.2. Data Partitioning and Evaluation Protocol
4.1.3. Implementation Details and Hyperparameters
4.1.4. Baseline Models and Evaluation Metrics
4.2. Overall Performance Comparison
4.3. Cross-Lingual Generalization Performance
4.4. Ablation Study of Different Components
4.5. Discussion
4.6. Limitation and Future Work
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Federici, F.M. Translating hazards: Multilingual concerns in risk and emergency communication. Translator 2022, 28, 375–398.
- Liu, X.Y.; Wang, G.; Yang, H.; Zha, D. FinGPT: Democratizing internet-scale data for financial large language models. arXiv 2023, arXiv:2307.10485.
- Chen, A.; Wei, Y.; Le, H.; Zhang, Y. Learning by teaching with ChatGPT: The effect of teachable ChatGPT agent on programming education. Br. J. Educ. Technol. 2024, 27, 275–298.
- Inaltong, N.U. Anti-Money Laundering Practices in the Scope of Risk Mitigation and Comparison with Anti-Money Laundering Regulations. SSRN Electron. J. 2025. Available online: https://ssrn.com/abstract=5215578 (accessed on 15 September 2025).
- Saxena, C. Identifying transaction laundering red flags and strategies for risk mitigation. J. Money Laund. Control 2024, 27, 1063–1077.
- Komadina, A.; Martinić, M.; Groš, S.; Mihajlović, Ž. Comparing threshold selection methods for network anomaly detection. IEEE Access 2024, 12, 124943–124973.
- Chiu, Y.T.; Bai, Z.H. Translation or multilingual retrieval? Evaluating cross-lingual search strategies for traditional Chinese financial documents. In Proceedings of the FinTech in AI CUP Special Session, Tokyo, Japan, 10 June 2025.
- Guo, P.; Ren, Y.; Hu, Y.; Li, Y.; Zhang, J.; Zhang, X.; Huang, H.Y. Teaching large language models to translate on low-resource languages with textbook prompting. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), Torino, Italy, 20–25 May 2024; pp. 15685–15697.
- Ogueji, K.; Zhu, Y.; Lin, J. Small data? No problem! Exploring the viability of pretrained multilingual language models for low-resourced languages. In Proceedings of the 1st Workshop on Multilingual Representation Learning, Punta Cana, Dominican Republic, 11 November 2021; pp. 116–126.
- Zhang, L.; Zhang, Y.; Ma, X. A new strategy for tuning ReLUs: Self-adaptive linear units (SALUs). In Proceedings of ICMLCA 2021, the 2nd International Conference on Machine Learning and Computer Application, Shenyang, China, 17–19 December 2021; VDE: Berlin, Germany, 2021; pp. 1–8.
- Barbieri, F.; Anke, L.E.; Camacho-Collados, J. XLM-T: Multilingual language models in Twitter for sentiment analysis and beyond. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, Marseille, France, 20–25 June 2022; pp. 258–266.
- Mazumder, M.T.R.; Shourov, M.S.H.; Rasul, I.; Akter, S.; Miah, M.K. Anomaly Detection in Financial Transactions Using Convolutional Neural Networks. J. Econ. Financ. Account. Stud. 2025, 7, 195–207.
- Aliyu, Y.; Sarlan, A.; Danyaro, K.U.; Rahman, A.S. Comparative Analysis of Transformer Models for Sentiment Analysis in Low-Resource Languages. Int. J. Adv. Comput. Sci. Appl. 2024, 15, 353.
- Freire, M.B. Unsupervised Deep Learning to Supervised Interpretability: A Dual-Stage Approach for Financial Anomaly Detection. Master’s Thesis, Pontifícia Universidade Católica do Rio Grande do Sul, Porto Alegre, Brazil, 2024.
- Niu, S.; Liu, Y.; Wang, J.; Song, H. A decade survey of transfer learning (2010–2020). IEEE Trans. Artif. Intell. 2021, 1, 151–166.
- Gui, J.; Chen, T.; Zhang, J.; Cao, Q.; Sun, Z.; Luo, H.; Tao, D. A survey on self-supervised learning: Algorithms, applications, and future trends. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 9052–9071.
- Kumar, P.; Rawat, P.; Chauhan, S. Contrastive self-supervised learning: Review, progress, challenges and future research directions. Int. J. Multimed. Inf. Retr. 2022, 11, 461–488.
- Tondji, G. Linguistic (In)Security and Persistence in Doctoral Studies: A Mixed-Methods Study of the Impact of Metapragmatic Discourses on the Persistence of Multilingual Doctoral Students. Ph.D. Thesis, The University of Texas Rio Grande Valley, Edinburg, TX, USA, 2025.
- Li, Q.; Ren, J.; Zhang, Y.; Song, C.; Liao, Y.; Zhang, Y. Privacy-Preserving DNN Training with Prefetched Meta-Keys on Heterogeneous Neural Network Accelerators. In Proceedings of the 2023 60th ACM/IEEE Design Automation Conference (DAC), San Francisco, CA, USA, 9–13 July 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 1–6.
- Zhang, K.; Wen, Q.; Zhang, C.; Cai, R.; Jin, M.; Liu, Y.; Zhang, J.Y.; Liang, Y.; Pang, G.; Song, D.; et al. Self-supervised learning for time series analysis: Taxonomy, progress, and prospects. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 6775–6794.
- Sehwag, V.; Chiang, M.; Mittal, P. SSD: A unified framework for self-supervised outlier detection. arXiv 2021, arXiv:2103.12051.
- Wu, Z.; Yang, X.; Wei, X.; Yuan, P.; Zhang, Y.; Bai, J. A self-supervised anomaly detection algorithm with interpretability. Expert Syst. Appl. 2024, 237, 121539.
- Wang, Y.; Qin, C.; Wei, R.; Xu, Y.; Bai, Y.; Fu, Y. Self-supervision meets adversarial perturbation: A novel framework for anomaly detection. In Proceedings of the 31st ACM International Conference on Information & Knowledge Management, Atlanta, GA, USA, 17–21 October 2022; pp. 4555–4559.
- Pospieszny, P.; Mormul, W.; Szyndler, K.; Kumar, S. ADALog: Adaptive Unsupervised Anomaly Detection in Logs with Self-Attention Masked Language Model. arXiv 2025, arXiv:2505.13496.
- Chi, Z.; Dong, L.; Wei, F.; Yang, N.; Singhal, S.; Wang, W.; Song, X.; Mao, X.L.; Huang, H.Y.; Zhou, M. InfoXLM: An information-theoretic framework for cross-lingual language model pre-training. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Online, 6–11 June 2021; pp. 3576–3588.
- Alhawiti, K.M. Multi-Modal Decentralized Hybrid Learning for Early Parkinson’s Detection Using Voice Biomarkers and Contrastive Speech Embeddings. Sensors 2025, 25, 6959.
- Ma, Y. Cross-language text generation using mBERT and XLM-R: English-Chinese translation task. In Proceedings of the 2024 International Conference on Machine Intelligence and Digital Applications, Ningbo, China, 30–31 May 2024; pp. 602–608.
- Goyal, N.; Du, J.; Ott, M.; Anantharaman, G.; Conneau, A. Larger-scale transformers for multilingual masked language modeling. arXiv 2021, arXiv:2105.00572.
- Al-Laith, A. Exploring the Effectiveness of Multilingual and Generative Large Language Models for Question Answering in Financial Texts. In Proceedings of the Joint Workshop of the 9th Financial Technology and Natural Language Processing (FinNLP), the 6th Financial Narrative Processing (FNP), and the 1st Workshop on Large Language Models for Finance and Legal (LLMFinLegal), Abu Dhabi, United Arab Emirates, 19–20 January 2025; pp. 230–235.
- Han, Y.; Qi, Z.; Tian, Y. Anomaly classification based on self-supervised learning and its application. J. Radiat. Res. Appl. Sci. 2024, 17, 100918.
- Wettig, A.; Gao, T.; Zhong, Z.; Chen, D. Should you mask 15% in masked language modeling? In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, Dubrovnik, Croatia, 2–6 May 2023; pp. 2985–3000.
- Yu, C.W.; Chuang, Y.S.; Lotsos, A.N.; Meier, T.; Haase, C.M. The More Similar, the Better? Associations Between Latent Semantic Similarity and Emotional Experiences Differ Across Conversation Contexts. arXiv 2025, arXiv:2309.12646.
- Tan, X.; Qin, T.; Bian, J.; Liu, T.Y.; Bengio, Y. Regeneration learning: A learning paradigm for data generation. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 26–27 February 2024; Volume 38, pp. 22614–22622.
- Tang, Y.; Khatchadourian, R.; Bagherzadeh, M.; Singh, R.; Stewart, A.; Raja, A. An empirical study of refactorings and technical debt in machine learning systems. In Proceedings of the 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE), Madrid, Spain, 25–28 May 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 238–250.
- Li, S.; Zhang, L.; Wang, Z.; Wu, D.; Wu, L.; Liu, Z.; Xia, J.; Tan, C.; Liu, Y.; Sun, B.; et al. Masked modeling for self-supervised representation learning on vision and beyond. arXiv 2023, arXiv:2401.00897.
- Huang, K.H.; Ahmad, W.; Peng, N.; Chang, K.W. Improving zero-shot cross-lingual transfer learning via robust training. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Punta Cana, Dominican Republic, 7–11 November 2021; pp. 1684–1697.
- Han, W.; Pang, B.; Wu, Y.N. Robust transfer learning with pretrained language models through adapters. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), Virtual Event, 1–6 August 2021; pp. 854–861.
- Jafari, A.R.; Heidary, B.; Farahbakhsh, R.; Salehi, M.; Jalili, M. Transfer Learning for Multi-lingual Tasks—A Survey. arXiv 2021, arXiv:2110.02052.
- Zhang, W.; Deng, L.; Zhang, L.; Wu, D. A survey on negative transfer. IEEE/CAA J. Autom. Sin. 2022, 10, 305–329.
- Hong, S.; Lee, S.; Moon, H.; Lim, H.S. MIGRATE: Cross-Lingual Adaptation of Domain-Specific LLMs through Code-Switching and Embedding Transfer. In Proceedings of the 31st International Conference on Computational Linguistics, Abu Dhabi, United Arab Emirates, 19–24 January 2025; pp. 9184–9193.
- Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, MN, USA, 2–7 June 2019; pp. 4171–4186.
- Conneau, A.; Khandelwal, K.; Goyal, N.; Chaudhary, V.; Wenzek, G.; Guzmán, F.; Grave, E.; Ott, M.; Zettlemoyer, L.; Stoyanov, V. Unsupervised cross-lingual representation learning at scale. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; pp. 8440–8451.
- Graves, A.; Schmidhuber, J. Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Netw. 2005, 18, 602–610.
- Liu, F.T.; Ting, K.M.; Zhou, Z.H. Isolation forest. In Proceedings of the 2008 Eighth IEEE International Conference on Data Mining, Washington, DC, USA, 15–19 December 2008; IEEE: Piscataway, NJ, USA, 2008; pp. 413–422.
- Rumelhart, D.E.; Hinton, G.E.; Williams, R.J. Learning representations by back-propagating errors. Nature 1986, 323, 533–536.
- Schölkopf, B.; Platt, J.C.; Shawe-Taylor, J.; Smola, A.J.; Williamson, R.C. Estimating the support of a high-dimensional distribution. Neural Comput. 2001, 13, 1443–1471.

Table 1. Composition of the collected dataset.
| Data Type | Source | Language Coverage | Sample Size |
|---|---|---|---|
| Financial texts | Cross-border payment logs, customer records, online platforms | High- and low-resource languages | 1,200,000 |
| Transaction sequences | Payment systems and digital asset platforms | Language-independent | 420,000 accounts |
| Labeled anomalies | Rule-based systems and expert review | Multilingual | 38,500 |
| Normal samples | Long-term stable accounts | Multilingual | 381,500 |
Table 2. Overall performance comparison with baseline methods (mean ± standard deviation).
| Method | Accuracy | Precision | Recall | F1-Score | AUC | AP | Inf. Time (ms) | Memory (GB) |
|---|---|---|---|---|---|---|---|---|
| Isolation Forest [44] | 0.842 ± 0.005 | 0.811 ± 0.006 | 0.768 ± 0.007 | 0.789 ± 0.006 | 0.861 ± 0.005 | 0.803 ± 0.006 | 0.8 ± 0.1 | 0.1 ± 0.0 |
| One-Class SVM [46] | 0.856 ± 0.004 | 0.832 ± 0.005 | 0.781 ± 0.006 | 0.806 ± 0.005 | 0.874 ± 0.004 | 0.819 ± 0.005 | 1.2 ± 0.2 | 0.2 ± 0.0 |
| AutoEncoder [45] | 0.871 ± 0.004 | 0.845 ± 0.004 | 0.802 ± 0.005 | 0.823 ± 0.004 | 0.891 ± 0.003 | 0.836 ± 0.004 | 2.5 ± 0.3 | 1.1 ± 0.1 |
| BiLSTM [43] | 0.884 ± 0.003 | 0.862 ± 0.003 | 0.817 ± 0.004 | 0.839 ± 0.003 | 0.903 ± 0.003 | 0.852 ± 0.003 | 3.1 ± 0.2 | 1.4 ± 0.1 |
| mBERT [41] | 0.893 ± 0.003 | 0.871 ± 0.002 | 0.836 ± 0.003 | 0.853 ± 0.002 | 0.912 ± 0.002 | 0.864 ± 0.002 | 11.5 ± 0.8 | 4.2 ± 0.2 |
| XLM-R [42] | 0.901 ± 0.002 | 0.879 ± 0.002 | 0.842 ± 0.003 | 0.860 ± 0.002 | 0.919 ± 0.002 | 0.871 ± 0.002 | 12.8 ± 0.9 | 4.8 ± 0.3 |
| LR-SSAD (Ours) | 0.932 ± 0.002 * | 0.914 ± 0.002 * | 0.891 ± 0.003 * | 0.902 ± 0.002 * | 0.948 ± 0.001 * | 0.923 ± 0.002 * | 4.8 ± 0.4 | 2.6 ± 0.1 |
Table 3. Cross-lingual generalization performance (FAR ↓: false alarm rate, lower is better).
| Method | Accuracy | F1-Score | AUC | AP | FAR ↓ | Inf. Time (ms) | Memory (GB) |
|---|---|---|---|---|---|---|---|
| Isolation Forest | 0.801 ± 0.007 | 0.742 ± 0.008 | 0.823 ± 0.006 | 0.761 ± 0.007 | 0.148 ± 0.005 | 0.8 ± 0.1 | 0.1 ± 0.0 |
| One-Class SVM | 0.814 ± 0.006 | 0.758 ± 0.007 | 0.836 ± 0.005 | 0.774 ± 0.006 | 0.136 ± 0.004 | 1.2 ± 0.2 | 0.2 ± 0.0 |
| AutoEncoder | 0.829 ± 0.005 | 0.779 ± 0.006 | 0.851 ± 0.004 | 0.791 ± 0.005 | 0.129 ± 0.003 | 2.5 ± 0.3 | 1.1 ± 0.1 |
| mBERT | 0.846 ± 0.004 | 0.801 ± 0.004 | 0.867 ± 0.003 | 0.814 ± 0.004 | 0.116 ± 0.003 | 11.5 ± 0.8 | 4.2 ± 0.2 |
| XLM-R | 0.858 ± 0.003 | 0.816 ± 0.003 | 0.879 ± 0.002 | 0.827 ± 0.003 | 0.109 ± 0.002 | 12.8 ± 0.9 | 4.8 ± 0.3 |
| LR-SSAD (Ours) | 0.901 ± 0.002 * | 0.862 ± 0.002 * | 0.914 ± 0.001 * | 0.889 ± 0.002 * | 0.081 ± 0.001 * | 4.8 ± 0.4 | 2.6 ± 0.1 |
Table 4. Ablation study of LR-SSAD components (CLMP: cross-lingual masked prediction; BSR: behavior sequence reconstruction; PLNS: pseudo-label noise suppression).
| Model Variant | Accuracy | Precision | Recall | F1-Score | AUC | AP |
|---|---|---|---|---|---|---|
| LR-SSAD w/o CLMP | 0.894 ± 0.004 | 0.871 ± 0.004 | 0.843 ± 0.005 | 0.857 ± 0.004 | 0.906 ± 0.003 | 0.875 ± 0.004 |
| LR-SSAD w/o BSR | 0.887 ± 0.005 | 0.865 ± 0.005 | 0.832 ± 0.006 | 0.848 ± 0.005 | 0.899 ± 0.004 | 0.862 ± 0.005 |
| LR-SSAD w/o PLNS | 0.902 ± 0.003 | 0.881 ± 0.003 | 0.854 ± 0.004 | 0.867 ± 0.003 | 0.912 ± 0.002 | 0.885 ± 0.003 |
| LR-SSAD (full model) | 0.932 ± 0.002 * | 0.914 ± 0.002 * | 0.891 ± 0.003 * | 0.902 ± 0.002 * | 0.948 ± 0.001 * | 0.923 ± 0.002 * |
Share and Cite
Wang, M.; Wang, N.; Mei, L.; Li, Y.; Liu, X.; Hua, S.; Li, M. Data-Driven Cross-Lingual Anomaly Detection via Self-Supervised Representation Learning. Electronics 2026, 15, 212. https://doi.org/10.3390/electronics15010212
