Leveraging Contrastive Semantics and Language Adaptation for Robust Financial Text Classification Across Languages

Zhang, Liman; Lin, Qianye; Meng, Fanyu; Liang, Siyu; Lu, Jingxuan; Liu, Shen; Chen, Kehan; Zhan, Yan

doi:10.3390/computers14080338


by Liman Zhang ¹, Qianye Lin ¹, Fanyu Meng ¹, Siyu Liang ¹, Jingxuan Lu ¹, Shen Liu ¹, Kehan Chen ¹ and Yan Zhan ¹,²,*

¹ National School of Development, Peking University, Beijing 100871, China
² Artificial Intelligence Research Institute, Tsinghua University, Beijing 100084, China
* Author to whom correspondence should be addressed.

Computers 2025, 14(8), 338; https://doi.org/10.3390/computers14080338

Submission received: 16 July 2025 / Revised: 15 August 2025 / Accepted: 17 August 2025 / Published: 19 August 2025

(This article belongs to the Special Issue Natural Language Processing (NLP) and Large Language Modelling)


Abstract

With the growing demand for multilingual financial information, cross-lingual financial sentiment recognition faces significant challenges, including semantic misalignment, ambiguous sentiment expression, and insufficient transferability. To address these issues, a unified multilingual recognition framework is proposed, integrating semantic contrastive learning with a language-adaptive modulation mechanism. This approach is built upon the XLM-R multilingual model and employs a semantic contrastive module to enhance cross-lingual semantic consistency. In addition, a language modulation module based on low-rank parameter injection is introduced to improve the model’s sensitivity to fine-grained emotional features in low-resource languages such as Chinese and French. Experiments were conducted on a constructed trilingual financial sentiment dataset encompassing English, Chinese, and French. The results demonstrate that the proposed model significantly outperforms existing methods in cross-lingual sentiment recognition tasks. Specifically, in the English-to-French transfer setting, the model achieved 73.6% in accuracy, 69.8% in F1-Macro, 72.4% in F1-Weighted, and a cross-lingual generalization score of 0.654. Further improvements were observed under multilingual joint training, reaching 77.3%, 73.6%, 76.1%, and 0.696, respectively. In overall comparisons, the proposed model attained the highest performance across cross-lingual scenarios, with 75.8% in accuracy, 72.3% in F1-Macro, and 74.7% in F1-Weighted, surpassing strong baselines such as XLM-R+SimCSE and LaBSE. These results highlight the model’s superior capability in semantic alignment and generalization across languages. The proposed framework demonstrates strong applicability and promising potential in multilingual financial sentiment analysis, public opinion monitoring, and multilingual risk modeling.

Keywords: semantic contrastive learning; language-adaptive fine-tuning; low-resource language transfer; cross-lingual generalization; multilingual pretrained language models

1. Introduction

In the context of increasing integration within global financial markets and the accelerating dissemination of information across multiple languages, financial sentiment has emerged as a critical variable for characterizing investor expectations and market responses. It plays a pivotal role in shaping asset pricing mechanisms, risk transmission pathways, and market volatility dynamics [1,2]. Consequently, the development of financial sentiment recognition systems with cross-lingual understanding and analytical capabilities is of significant importance for supporting intelligent investment research, enhancing regulatory efficiency across markets, and enabling globalized financial decision making [3]. Particularly, with the rapid expansion of multilingual financial corpora, the extraction of consistent and comparable sentiment signals from diverse linguistic contexts has become a pressing issue in the field of financial natural language processing. Although sentiment analysis in English has witnessed the establishment of relatively mature methodologies—spanning sentiment lexicon construction, sentiment-annotated corpus development, and deep-learning-based modeling techniques—these approaches predominantly rely on data and tools available in high-resource environments [4,5,6]. For other languages, a severe scarcity of relevant data resources impedes the acquisition of effective supervision signals by models. Moreover, differences in syntactic structure, sentiment expression conventions, and domain-specific terminology across languages often lead to semantic misalignment, which significantly undermines the transferability and generalization performance of sentiment recognition models. Current methods still lack systematic modeling approaches for addressing issues such as cross-lingual semantic consistency, sentiment alignment, and variation in domain terminology, thereby constraining their practical application in multilingual financial environments.

Against this backdrop, multilingual pretrained models (e.g., mBERT, XLM-R) have been increasingly introduced into financial sentiment analysis tasks and have become a focal point of recent research on cross-lingual modeling. Almalki et al. (2025) applied mBERT to multilingual sentiment recognition on social media and validated its transfer capability among high-resource languages [7]. Kumar et al. (2021) fine-tuned XLM-R on English corpora and employed zero-shot transfer learning to enhance domain adaptability [8]. Li et al. (2025) further integrated contrastive learning with translation-based alignment strategies to alleviate the problem of cross-lingual semantic drift [9]. While these efforts have demonstrated promising results in general or vertical domains, multiple challenges remain in multilingual financial texts. These include inconsistencies in sentiment expression, weak alignment of semantic spaces, and the absence of effective financial domain knowledge injection—all of which hinder model performance in low-resource languages. To address these challenges, a cross-lingual financial sentiment analysis framework is proposed, integrating multilingual pretrained models with a contrastive learning mechanism. A unified semantic representation system is constructed based on XLM-R, and a semantic contrastive learning module is introduced to effectively enhance sentiment alignment across languages. Simultaneously, a language-adaptive modulation module is designed to improve understanding of financial texts in specific languages without compromising the model’s inherent cross-lingual generalization capability. In addition, a multilingual financial sentiment dataset covering English, Chinese, and French is constructed and publicly released to provide a systematic evaluation foundation for this task. The main contributions of this work are as follows:

A contrastive-learning-based semantic alignment mechanism is proposed, which significantly enhances the model’s robustness in recognizing sentiment equivalence across languages;
A language-adaptive modulation mechanism is introduced to achieve synergistic improvement between cross-lingual transferability and monolingual performance;
The effectiveness and generalization ability of the proposed method are systematically validated on multiple cross-lingual financial sentiment classification tasks, with overall performance surpassing that of existing mainstream multilingual model baselines.

2. Related Work

2.1. Financial Sentiment Analysis

Financial sentiment analysis aims to extract sentiment-related information from financial texts using natural language processing techniques, thereby supporting asset pricing, risk assessment, and investment decision making [10]. This task has attracted extensive attention in recent years and has demonstrated significant potential in applications such as volatility forecasting, public opinion monitoring, and financial crisis early warning systems [6]. Unlike general sentiment analysis, financial texts are typically characterized by high semantic density, concentrated use of terminology, and implicit sentiment expression, which introduces additional complexity to sentiment modeling [11]. As a result, the construction of accurate and robust financial sentiment recognition systems has become a central focus of ongoing research in this field. Existing studies have primarily focused on English-language contexts, with relatively well-established resources and methodological frameworks. Representative works include the construction of high-quality financial sentiment lexicons and annotated corpora, such as the Financial PhraseBank [12], the FiQA (Financial Question Answering) dataset [13], and the Loughran–McDonald financial dictionary [14], all of which provide a solid data foundation for model training. In terms of methodology, earlier studies mainly relied on lexicon matching and rule-based inference. In recent years, with the rise of deep learning, models based on BERT and its domain-specific variants (such as FinBERT) have been widely applied to financial sentiment classification tasks, achieving significantly better performance than traditional methods on various benchmark datasets [15]. However, significant challenges persist in non-English languages. 
Most current sentiment lexicons and datasets are constructed in English, while other languages (e.g., Chinese, French, Arabic) suffer from a severe lack of both quantity and quality of resources, which limits the applicability and generalizability of multilingual sentiment models. Additionally, systemic differences in syntactic structures, cultural semantics, and financial terminology expressions across languages can lead to semantic drift and performance degradation during multilingual transfer. The imbalance of available resources also results in a lack of effective supervision signals during model training, further weakening the practical utility of these models in cross-lingual financial environments. These limitations indicate that the development of sentiment analysis frameworks capable of cross-lingual transfer, while accommodating linguistic differences and diverse sentiment expressions, is essential for advancing the field toward multilingual applications.

2.2. Multilingual Pretrained Language Models and Domain Adaptation

Multilingual pretrained language models (MPLMs) have emerged as essential tools for multilingual tasks due to their rapid development in the field of natural language processing [16]. Representative models such as mBERT [17] and XLM-R [18] are pretrained on large-scale cross-lingual corpora in an unsupervised manner to construct shared semantic representation spaces, thereby enabling the transfer of textual representation capabilities across languages. These models have been widely used in cross-lingual tasks, including sentiment classification, question answering, and named entity recognition, demonstrating strong zero-shot or few-shot learning capabilities in resource-imbalanced scenarios. Inspired by this, researchers in the financial domain have begun exploring the application of such general-purpose multilingual models to financial contexts, aiming to leverage their cross-lingual generalization ability to address issues such as resource imbalance and the high specificity of financial terminology. For instance, Rizvi et al. (2025) fine-tuned XLM-R on banking reviews in Sinhala, English, and code-mixed texts to support cross-lingual sentiment and event classification tasks [19], achieving an F1-score of over 88% by integrating explainability techniques such as SHAP and LIME. Other studies have adopted a “translate–train” strategy, where non-English financial texts are first translated into English before downstream analysis is performed using monolingual models [20]. Although these methods have provided preliminary pathways for multilingual financial sentiment analysis, their applicability in real-world financial scenarios remains constrained. First, since MPLMs are mainly pretrained on large-scale general-purpose corpora, they often lack precise modeling of specialized financial terminology, resulting in semantic deviation and misinterpretation [21]. 
Moreover, deep-rooted differences in syntactic structure, sentiment expression, and cultural context across languages make it difficult to maintain semantic consistency during cross-lingual transfer. Although domain-adaptive fine-tuning [22] has been proposed to mitigate these issues, this approach typically requires large amounts of high-quality, domain-specific annotated data, which are often unavailable for low-resource languages, thereby severely limiting its scalability in multilingual financial settings. To address these limitations, a language-adaptive modulation module and a semantic contrastive learning strategy are introduced in the present work to enhance semantic alignment and sentiment recognition precision in domain-specific contexts.

2.3. Contrastive Learning and Cross-Lingual Semantic Alignment

Contrastive learning has attracted considerable attention in recent years in the field of representation learning. As an unsupervised or weakly supervised paradigm, it aims to enhance a model’s ability to aggregate semantically similar representations and to separate dissimilar ones by constructing sample pairs of “similar” and “dissimilar” examples [23]. This mechanism has demonstrated outstanding discriminative capabilities across various tasks, particularly in computer vision and natural language processing. Representative methods such as SimCLR [24] and MoCo [25] have achieved notable success in image representation learning, providing methodological foundations for transfer learning in natural language tasks. Models such as SimCSE [26] and CLIP [27] have further extended contrastive learning to sentence modeling and multimodal semantic alignment, significantly improving the semantic structure of the embedding space. In cross-lingual natural language processing tasks, contrastive learning has been employed to promote semantic consistency across different language spaces. For instance, LaBSE [28] constructs language-agnostic positive pairs using parallel translated sentences, effectively enhancing cross-lingual semantic alignment. AlignSimCSE [29] introduces alignment constraints at the sentence embedding level by incorporating multilingual sentence pairs and contrastive loss terms. These approaches have achieved strong performance in cross-lingual retrieval and sentiment classification tasks, demonstrating the potential of contrastive learning in constructing language-independent representations. However, they still face limitations when applied to multilingual sentiment modeling in financial scenarios. 
Financial corpora often exhibit high semantic density and implicit sentiment features; if generic contrastive strategies are used without consideration of domain-specific semantic objectives, non-essential features may be amplified, weakening the model’s ability to detect genuine sentiment signals [30]. Furthermore, most existing methods construct sample pairs at the sentence level, making it difficult to capture localized sentiment triggers specific to financial contexts, which, in turn, affects the stability and generalizability of cross-lingual alignment. To effectively integrate contrastive learning strategies into multilingual financial sentiment analysis frameworks and construct a cross-lingual semantic alignment mechanism with sentiment awareness, a semantic contrastive learning module with financial sentiment sensitivity is proposed. This module is designed to improve the stability of transfer learning and the consistency of semantic representation in heterogeneous language and domain-divergent scenarios.

3. Materials and Methods

3.1. Data Collection

In this study, the construction of a cross-lingual financial sentiment dataset served as the foundation for model training and evaluation. The dataset encompassed three major financial languages: English, Chinese, and French. The collected texts included diverse sources such as financial news, corporate announcements, and social media comments, ensuring a wide range of sentiment distribution and linguistic styles, as shown in Table 1. English texts were primarily gathered from leading financial media outlets, such as Reuters, Bloomberg, and CNBC, as well as corporate disclosure databases, such as SEC EDGAR. The collection spanned the period from 2018 to 2023 and covered topics including macroeconomic policies, market dynamics, and corporate earnings reports. Chinese data were mainly sourced from platforms such as Sina Finance, Eastmoney, and Xueqiu, focusing on corporate announcements and user comments. Data were retrieved using keyword-based extraction and thematic clustering to ensure coverage of stock market sentiment, policy reactions, and company news. The construction of the French corpus was relatively complex and relied on French financial news sites such as Les Echos and La Tribune, as well as disclosures from Euronext Paris. Supplementary data were collected from social platforms such as Twitter involving finance-related posts. Given the high noise level in French social media texts, language detection, keyword filtering, and content-based clustering were applied to significantly improve the thematic relevance and annotation viability of the corpus. To ensure balanced language distribution and sentiment diversity, a dual-constraint sampling mechanism was designed, enforcing proportional selection of news, announcements, and social posts across languages. After initial acquisition, manual verification and deduplication were conducted to obtain a high-quality raw corpus. 
Manual verification covered all sampled entries in Chinese and French and a 20% stratified random subset in English, focusing on language correctness, financial domain relevance, and removal of duplicated or near-duplicate entries. For data annotation, we employed a “pretrained-model-assisted plus human-revised” strategy. Specifically, the initial sentiment labels (positive, neutral, negative) were generated using the FinBERT model [15] for English texts. These English texts served as the base for constructing sentiment labels, which were then projected onto Chinese and French via high-quality machine translation (Microsoft Translator API). All projected labels were subject to human verification. The annotation team consisted of six annotators with backgrounds in finance and linguistics. Each non-English sample was reviewed by at least two annotators. In cases of disagreement, a third senior annotator acted as an adjudicator to determine the final label. The final dataset contains sentiment annotations for all three languages in their original form, not only the English-translated versions. Table 2 reports the label distribution across languages. In addition, language recognition and entity-level filtering techniques were applied to exclude non-target language content and irrelevant information, thereby increasing the density and quality of training samples. The dataset preserves the linguistic characteristics of each language while maintaining encoding consistency, syntactic integrity, and terminological uniformity, facilitating efficient downstream processing by multilingual models.
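The two-reviewer check with adjudication described above can be sketched as follows. This is a minimal illustration, not the authors' annotation tooling: the function names `resolve_label` and `label_distribution`, and the idea of passing the adjudicator as a callable, are assumptions introduced here.

```python
from collections import Counter

LABELS = {"positive", "neutral", "negative"}


def resolve_label(reviews, adjudicate):
    """Return the final label for one non-English sample.

    reviews    -- labels given by the annotators (at least two per sample)
    adjudicate -- hypothetical callable standing in for the senior annotator;
                  it is invoked only when the reviewers disagree
    """
    if len(reviews) < 2:
        raise ValueError("each sample needs at least two reviews")
    unknown = set(reviews) - LABELS
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    if len(set(reviews)) == 1:
        # Unanimous reviewers: accept their label directly.
        return reviews[0]
    final = adjudicate(reviews)
    if final not in LABELS:
        raise ValueError(f"adjudicator returned unknown label: {final}")
    return final


def label_distribution(final_labels):
    """Count final labels, e.g. to tabulate a per-language distribution."""
    return Counter(final_labels)
```

Passing the adjudicator as a callable keeps the disagreement path explicit, so the share of samples that reach adjudication can be logged separately if needed.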

3.2. Data Preprocessing

To enhance the model training stability and improve the quality of text representations, a systematic data cleaning and augmentation process was conducted following the completion of data annotation. During the cleaning phase, a strategy combining rule-based and statistical methods was adopted. Initially, invalid texts—such as those that were excessively short or contained only symbols or corrupted characters—were removed. Subsequently, the fastText tool was employed to identify the language of each text, ensuring both linguistic consistency and accuracy. Additionally, hash-based fingerprinting techniques (e.g., MinHash) were introduced to detect content similarity and eliminate duplicate samples originating from different platforms or sources. This cleaning process can be formally defined as follows:

$$D_{1} = \{\, x_{i} \in D_{0} \mid \mathrm{Lang}(x_{i}) \in \{\mathrm{en}, \mathrm{zh}, \mathrm{fr}\},\ \forall j \neq i,\ \mathrm{Sim}(x_{i}, x_{j}) < \delta \,\},$$ (1)

where $D_{0}$ denotes the original dataset, $D_{1}$ denotes the cleaned dataset, $\mathrm{Lang}(x_{i})$ denotes the language detection function, $\mathrm{Sim}(x_{i}, x_{j})$ denotes the text similarity function, and $\delta$ denotes the de-duplication threshold. This cleaning procedure effectively filtered out language-inconsistent and redundant content, providing a semantically consistent and source-diverse foundation for subsequent training. Based on this, a multi-strategy data augmentation mechanism was devised to further improve the model’s robustness and generalization under low-resource language settings. The strategies included cross-lingual back-translation, synonym substitution at the lexical level, and structure-perturbation-based masking. Let $x \in D_{1}$ denote a cleaned sample; the corresponding augmented samples were generated as follows:

$$A(x) = \{x^{(bt)}, x^{(syn)}, x^{(mask)}\},$$ (2)

where $x^{(bt)} = \mathrm{BackTrans}(x, l_{t})$ indicates that $x$ is translated into an intermediate language $l_{t}$ (e.g., German) and then translated back to its original language, thereby introducing syntactic variants while preserving semantics; $x^{(syn)} = \mathrm{SynSub}(x)$ refers to synonym substitution under contextual constraints using resources such as WordNet or synonym dictionaries; $x^{(mask)} = \mathrm{Mask}(x, \gamma)$ denotes random masking of up to a proportion $\gamma$ of non-stopword tokens, replaced with [MASK], to improve the model’s robustness to missing information. The final augmented dataset is defined as

$$D_{2} = \bigcup_{x \in D_{1}} \left( \{x\} \cup A(x) \right),$$ (3)

which provides a more diverse training corpus that extends cross-lingual sentiment coverage while maintaining semantic consistency.
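The cleaning and augmentation steps in Equations (1)–(3) can be sketched in a few lines of standard-library Python. Several choices here are assumptions rather than the authors' pipeline:

- Character-shingle Jaccard similarity stands in for MinHash.
- Language tags are assumed to be precomputed (e.g., by fastText).
- The threshold `delta=0.8` and masking ratio `gamma=0.15` are placeholder values.
- `back_translate` and `syn_sub` are hypothetical callables for external translation and synonym resources.
- The sketch keeps the first copy of each near-duplicate group, whereas Equation (1) read literally would drop every member of such a group.

```python
import random


def shingles(text, n=5):
    """Character n-grams of the whitespace-normalised, lower-cased text."""
    s = " ".join(text.lower().split())
    return {s[i:i + n] for i in range(max(1, len(s) - n + 1))}


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def clean(samples, allowed=("en", "zh", "fr"), delta=0.8):
    """samples: list of (text, lang). Keep target languages, drop near-duplicates."""
    kept, kept_shingles = [], []
    for text, lang in samples:
        if lang not in allowed:
            continue
        sh = shingles(text)
        # Keep the sample only if it is dissimilar to everything kept so far.
        if all(jaccard(sh, other) < delta for other in kept_shingles):
            kept.append((text, lang))
            kept_shingles.append(sh)
    return kept


def mask_tokens(tokens, gamma=0.15, stopwords=frozenset(), rng=None):
    """Replace floor(gamma * n) of the n non-stopword tokens with [MASK]."""
    rng = rng or random.Random()
    candidates = [i for i, t in enumerate(tokens) if t.lower() not in stopwords]
    chosen = set(rng.sample(candidates, int(gamma * len(candidates))))
    return ["[MASK]" if i in chosen else t for i, t in enumerate(tokens)]


def augment(dataset, back_translate, syn_sub, gamma=0.15,
            stopwords=frozenset(), seed=0):
    """Return each sample followed by its three augmented variants (Eq. 3)."""
    rng = random.Random(seed)
    out = []
    for text, lang in dataset:
        # Whitespace tokenisation is a simplification; Chinese text would
        # need a word segmenter before masking.
        masked = " ".join(mask_tokens(text.split(), gamma, stopwords, rng))
        for variant in (text, back_translate(text, lang),
                        syn_sub(text, lang), masked):
            out.append((variant, lang))
    return out
```

The result is a list rather than a set, so identical variants (for example, a back-translation that returns the input unchanged) are kept and would need a final de-duplication pass to match the set union exactly.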

3.3. Proposed Method

3.3.1. Overall

The proposed method for cross-lingual financial sentiment analysis is built upon multilingual pretrained language models and forms a unified framework composed of a multilingual encoder, a semantic contrastive alignment module, and a language-adaptive tuning module. This framework is designed to address the challenges of semantic misalignment, ambiguous sentiment expression, and limited cross-lingual generalization in non-English financial texts. Structured multilingual financial texts are fed into the model as input. Initially, the multilingual encoder, based on XLM-R, encodes the input into shared semantic representations across languages. These representations are then passed into two enhancement modules. The semantic contrastive alignment module is responsible for aligning the representations of semantically equivalent sentences across languages by constructing positive and negative pairs and optimizing their relative similarity. It maximizes similarity between positive pairs and minimizes it between negative pairs, enabling the model to learn language-agnostic deep sentiment features. In parallel, the language-adaptive tuning module introduces low-rank language-specific parameters into the encoder in a non-destructive manner. This enhances the model’s sensitivity to fine-grained sentiment variation in languages such as Chinese and French while preserving its multilingual generalization ability. The enhanced representation is subsequently processed by a domain-specific sentiment decoder, which integrates a domain-specific lexicon and confidence-based adjustment mechanisms, applies a multilayer perceptron mapping, and produces one of three sentiment labels. The overall architecture is modular, with complementary components that jointly provide linguistic adaptability and sentiment discriminability in cross-lingual financial applications.

3.3.2. Semantic Contrastive Alignment

In the proposed framework, the semantic contrastive alignment module serves as a bridge between the multilingual encoder and the language-adaptive tuning module. Its primary objective is to align semantically equivalent financial texts across languages in the representation space through language-agnostic embedding learning, thereby mitigating the semantic drift caused by syntactic and lexical discrepancies across languages. This module is inspired by the SimCSE architecture and has been adapted to suit the multilingual context.

As shown in Figure 1, the semantic contrastive alignment module receives high-dimensional embeddings from the XLM-R encoder, with a shape of $H \times D$, where $H = 1$ denotes sentence-level embeddings and $D = 1024$ is the hidden dimension of the final layer of XLM-R. A fully connected layer is employed to project the embeddings into a contrastive space with dimensionality $d = 256$. The projection layer is defined as follows:

$$z_{i} = \mathrm{ReLU}(W_{c} \cdot h_{i} + b_{c}), \quad W_{c} \in \mathbb{R}^{256 \times 1024}, \quad b_{c} \in \mathbb{R}^{256},$$ (4)

where $h_{i}$ denotes the encoded representation of the $i$-th sample, and $z_{i}$ is the projected vector for contrastive learning. The ReLU activation introduces nonlinearity into the feature space. All vector pairs $(z_{i}, z_{j})$ are then optimized using the InfoNCE loss, defined as

$$L_{\mathrm{contrast}} = -\log \frac{\exp(\mathrm{sim}(z_{i}, z_{j}) / \tau)}{\sum_{k = 1}^{N} \mathbb{1}_{[k \neq i]} \exp(\mathrm{sim}(z_{i}, z_{k}) / \tau)},$$ (5)

where $\mathrm{sim}(z_{i}, z_{j})$ denotes the cosine similarity $\mathrm{sim}(z_{i}, z_{j}) = \frac{z_{i}^{\top} z_{j}}{\| z_{i} \| \| z_{j} \|}$, $\tau$ is the temperature parameter (typically set to $0.05$), $N$ is the mini-batch size, and $\mathbb{1}_{[k \neq i]}$ is the indicator function, excluding self-comparisons. From an optimization perspective, the InfoNCE objective maximizes the similarity between semantically aligned positive samples and suppresses similarity among negatives, which can be theoretically shown to maximize a lower bound of the mutual information between paired samples. Given a financial sentence $X$ in language A and its semantic equivalent $Y$ in language B, the following inequality holds:

$$I(X; Y) \geq \log(N) - L_{\mathrm{contrast}},$$ (6)

where $I(X; Y)$ denotes the mutual information between $X$ and $Y$. Hence, minimizing the contrastive loss raises a lower bound on the mutual information between multilingual equivalents, which can be interpreted as maximizing their semantic co-occurrence. When combined with the language-adaptive tuning module, the contrastive module ensures alignment in the representation space, while the tuning module refines language-specific semantic variance. These modules operate in parallel structurally and are decoupled in terms of parameter optimization. Specifically, the contrastive module updates the encoder representations using a shared loss $L_{\mathrm{contrast}}$, while the tuning module introduces independently learned parameters $\theta_{\mathrm{adapt}}$, optimized via the following joint objective:

$$L_{\mathrm{joint}} = L_{\mathrm{contrast}} + \lambda \cdot L_{\mathrm{task}}(\theta_{\mathrm{adapt}}),$$ (7)

where $L_{\mathrm{task}}$ denotes the cross-entropy loss for the sentiment classification task, and $\lambda$ is a hyperparameter that balances the gradient contributions of the contrastive and adaptive components. This joint architecture offers distinct advantages. The contrastive module enforces structured semantic consistency across languages, reducing the sensitivity of the model to distributional shifts. In parallel, the language-adaptive module provides fine-grained adjustments tailored to low-resource financial languages such as French and Chinese. Together, they address two pervasive challenges in multilingual financial sentiment analysis, namely semantic misalignment and insufficient generalization, thereby enhancing overall model performance and transferability.
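As a rough illustration of the projection, the InfoNCE loss, and the joint objective, the following sketch works on plain Python lists. It is a toy under stated assumptions, not the authors' implementation: the helper names, the `pos` index list that marks each sample's cross-lingual positive, the batch-mean reduction, and the small epsilon guarding against zero-norm vectors are all introduced here.

```python
import math


def project(h, W, b):
    """Projection layer: z = ReLU(W h + b), with W given as a list of rows."""
    return [max(0.0, sum(w * x for w, x in zip(row, h)) + bi)
            for row, bi in zip(W, b)]


def cosine(u, v, eps=1e-12):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)


def info_nce(z, pos, tau=0.05):
    """Mean InfoNCE loss over a batch.

    z   -- list of N projected vectors
    pos -- pos[i] is the index of the positive (translation) partner of z[i]
    """
    n = len(z)
    total = 0.0
    for i in range(n):
        logits = [cosine(z[i], z[k]) / tau for k in range(n)]
        others = [logits[k] for k in range(n) if k != i]  # exclude k == i
        m = max(others)  # log-sum-exp shift for numerical stability
        log_denom = m + math.log(sum(math.exp(x - m) for x in others))
        total += log_denom - logits[pos[i]]
    return total / n


def joint_loss(l_contrast, l_task, lam):
    """Joint objective: L_contrast + lambda * L_task."""
    return l_contrast + lam * l_task
```

With four toy vectors in which samples 0/1 and 2/3 point in nearly the same direction, `info_nce(z, [1, 0, 3, 2])` comes out far smaller than the loss for the mismatched pairing `[2, 3, 0, 1]`, which is the behaviour the alignment objective relies on.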

3.3.3. Language-Adaptive Tuning

The proposed language-adaptive tuning module enhances the model’s ability to capture language-specific sentiment patterns in financial texts while preserving the generalization capacity of the underlying multilingual language model. This module leverages a lightweight low-rank injection mechanism, incorporating dynamic rank decomposition and selective channel pruning. These techniques enable the injection of language-specific adjustments without modifying the large-scale parameters of the base model (e.g., XLM-R), thereby improving adaptability to financial texts in non-English languages such as Chinese and French.

As shown in Figure 2, the core implementation of this module is based on injecting learnable low-rank matrices into the key linear transformations within the multi-head self-attention and feedforward sublayers of the Transformer architecture. Specifically, two low-rank matrices $W_{\mathrm{up}} \in \mathbb{R}^{d \times r}$ and $W_{\mathrm{down}} \in \mathbb{R}^{r \times d}$ are introduced, where $d = 1024$ denotes the hidden layer dimension, and $r = 16$ is the rank. The injected structure is defined as follows:

$$\tilde{h} = h + W_{\mathrm{up}} \cdot \mathrm{act}(W_{\mathrm{down}} \cdot h),$$ (8)

where $h$ represents the input representation at a given Transformer layer, and $\mathrm{act}(\cdot)$ denotes the activation function (e.g., GELU). This formulation is analogous to a residual adaptation path. The design allows language-specific modulation to be injected into each Transformer layer, with only $W_{\mathrm{up}}$ and $W_{\mathrm{down}}$ being updated during training while keeping the base model parameters frozen, enabling rapid adaptation to the target language. To further enhance the effectiveness of this module, a dynamic rank decomposition mechanism and a selective channel pruning strategy are introduced. Based on the gradient statistics and Hessian approximations of the support data in the target language, each layer’s parameter sensitivity is automatically evaluated. This enables dynamic adjustment of the injected matrix ranks $r_{l}$ across Transformer layers, leading to an optimized configuration $\{r_{1}^{\ell}, r_{2}^{\ell}, \dots, r_{L}^{\ell}\}$ for language $\ell$. Mathematically, the optimization objective is formulated as

$$\min_{\{r_{l}\}} \sum_{l = 1}^{L} \left( \left\| \nabla_{\theta_{l}^{\ell}} L_{\mathrm{task}} \right\|^{2} + \alpha \cdot r_{l} \right),$$ (9)

where $L_{\mathrm{task}}$ denotes the task-specific loss for the target language, $\theta_{l}^{\ell}$ represents the adaptation parameters at layer $l$, and $\alpha$ is the regularization coefficient for rank sparsity. This formulation balances expressive power and compression efficiency, promoting optimal parameter utilization. This architecture offers two key advantages in the context of this study. First, the injected low-rank structure enables flexible modeling of target-language-specific sentence patterns, terminology, and sentiment cues (e.g., in French financial disclosures) without compromising multilingual generalization. Second, the module requires only a modest number of additional parameters (approximately 32K per layer, i.e., $2dr = 2 \times 1024 \times 16 = 32{,}768$ for one $W_{\mathrm{up}}$/$W_{\mathrm{down}}$ pair), making it highly efficient and suitable for low-resource multilingual sentiment classification. Working in tandem with the semantic contrastive alignment module, which facilitates language-invariant representation, the tuning module reinforces language-specific expressivity, forming a synergistic framework for multilingual financial sentiment modeling.
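The residual low-rank path and its parameter budget can be illustrated with a small standard-library sketch. It is not the authors' code: the function names are introduced here, GELU is used only because the text lists it as an example activation, and the zero-initialised up-projection in the usage note is a common adapter convention that the text does not specify.

```python
import math


def gelu(x):
    """Exact GELU, written with the error function."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def matvec(M, v):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def adapter_forward(h, W_down, W_up):
    """Residual low-rank path: h + W_up * act(W_down * h).

    W_down has shape r x d and W_up has shape d x r; in the setting described
    in the text only these two matrices would be trained.
    """
    low = [gelu(x) for x in matvec(W_down, h)]  # d -> r bottleneck
    delta = matvec(W_up, low)                   # r -> d expansion
    return [a + b for a, b in zip(h, delta)]


def adapter_param_count(d, r):
    """Trainable weights in one W_up / W_down pair."""
    return 2 * d * r
```

With `d = 1024` and `r = 16`, `adapter_param_count` gives 32,768, matching the "approximately 32K" figure. If `W_up` is all zeros, `adapter_forward` returns `h` unchanged, so under that initialisation the frozen base model's behaviour is preserved at the start of adaptation.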

3.3.4. Domain-Specific Decoder

Within the proposed framework for multilingual financial sentiment classification, the domain-specific decoder serves as the final output module and is responsible for mapping unified multilingual representations into discrete sentiment labels. This module must not only interpret semantic embeddings effectively but also respond precisely to sentiment triggers, domain-specific terminology, and linguistic ambiguity often present in financial texts. To this end, a multi-layer perceptron classification head is combined with attention-based control and sentiment confidence weighting mechanisms. By combining language-adaptive features with contrastively aligned semantics, the decoder performs robust and controllable multilingual sentiment prediction.

To better handle temporal shifts and contextual variability in multilingual financial sentiment texts, the domain-specific decoder is designed as a multi-branch temporal–spatial processing structure rather than a pure MLP. As illustrated in Figure 3, the decoder first interpolates multi-source token embeddings using hierarchical temporal and spatial blocks and then passes the fused representations through token shuffling and resizing stages to enhance fine-grained semantic composition. These processed features are finally projected into discrete sentiment logits via a lightweight MLP with the structure FC_{1024 \to 512} \to ReLU \to FC_{512 \to 256} \to ReLU \to FC_{256 \to 3}. The input is a 1024-dimensional sentence embedding refined by the contrastive and adaptive modules. The output is a probability distribution over three sentiment classes: positive, neutral, and negative. ReLU activation functions are applied after each hidden layer, followed by a softmax transformation at the final layer:

\hat{y} = softmax (W_{3} \cdot ReLU (W_{2} \cdot ReLU (W_{1} \cdot h + b_{1}) + b_{2}) + b_{3}),

(10)

where h \in R^{1024} is the input embedding; W_{1} \in R^{512 \times 1024}, W_{2} \in R^{256 \times 512}, and W_{3} \in R^{3 \times 256} are the weight matrices; and b_{1}, b_{2}, and b_{3} denote the corresponding biases. The resulting \hat{y} \in R^{3} represents the predicted probability for each sentiment class.

To address issues such as fuzzy sentiment cues and weak supervision, a domain-aware sentiment augmentation mechanism is incorporated. A sentiment trigger embedding function g (w) is introduced, and attention weights α_{w} are assigned to domain-specific sentiment words, modifying the sentence representation as follows:

h^{'} = h + \sum_{w \in T} α_{w} \cdot g (w), α_{w} = \frac{\exp (s_{w})}{\sum_{w^{'} \in T} \exp (s_{w^{'}})},

(11)

where T denotes the set of detected sentiment-sensitive tokens. Each token w is associated with an embedding g (w) \in R^{1024} and an importance score s_{w} derived from a multilingual, domain-specific sentiment lexicon. This lexicon covers English, Chinese, and French, and it is constructed from financial news corpora, corporate disclosures, and expert-curated terms. Each entry is assigned a polarity label (positive, neutral, negative) and an importance score based on sentiment intensity and domain frequency, enabling the model to assign greater attention to financially salient expressions across languages. This mechanism increases the decoder’s responsiveness to key financial sentiment terms such as “surge,” “drop,” or “bearish,” enhancing classification precision.

To further improve uncertainty handling, a confidence-based adjustment strategy is implemented. For predictions with high entropy, temperature scaling and soft-label interpolation are applied to mitigate errors caused by ambiguous boundaries. The entropy of the model prediction is defined as

H (\hat{y}) = - \sum_{i = 1}^{3} {\hat{y}}_{i} \cdot log ({\hat{y}}_{i}),

(12)

and when H (\hat{y}) exceeds a threshold δ, confidence-aware correction is triggered by blending the output with a soft target \tilde{y}. The final loss function is then formulated as

L_{final} = (1 - γ) \cdot L_{CE} (y, \hat{y}) + γ \cdot KL (\tilde{y} ∥ \hat{y}),

(13)

where L_{CE} represents the standard cross-entropy loss, KL denotes the Kullback–Leibler divergence, and γ is a dynamically adjusted weight based on prediction confidence. This hybrid loss formulation is intended to stabilize training and inference, particularly for low-confidence or noisy samples.

This decoder module works in conjunction with the semantic contrastive and language-adaptive modules. While those two modules construct a unified and adaptive semantic representation space, the decoder performs the final sentiment inference, forming a cascaded semantic–linguistic–task pathway. This design enables the model to handle multilingual semantic discrepancies, domain-specific terminology variation, and diverse sentiment expressions with robustness and interpretability in multilingual financial sentiment analysis.
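The trigger augmentation in Equation (11), the entropy in Equation (12), and the hybrid loss in Equation (13) can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, γ and δ are passed in as fixed numbers although the text describes γ as dynamically adjusted, the soft target is supplied by the caller, the KL term is applied only when the entropy exceeds δ, and temperature scaling is omitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def augment(h, triggers):
    """Eq. (11): h' = h + sum_w alpha_w * g(w), with alpha the softmax of lexicon scores.

    `triggers` is a list of (importance score s_w, embedding g(w)) pairs.
    """
    if not triggers:
        return list(h)
    alphas = softmax([s for s, _ in triggers])
    out = list(h)
    for a, (_, g) in zip(alphas, triggers):
        out = [o + a * gi for o, gi in zip(out, g)]
    return out


def entropy(p):
    """Eq. (12): Shannon entropy with the natural logarithm, skipping zero entries."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def kl(p, q, eps=1e-12):
    """KL(p || q); q is clipped at eps to avoid log(0)."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def final_loss(y_true, y_hat, y_soft, delta, gamma, eps=1e-12):
    """Eq. (13), with the KL term active only for high-entropy predictions (assumed gating)."""
    ce = -math.log(max(y_hat[y_true], eps))
    if entropy(y_hat) <= delta:
        return ce
    return (1 - gamma) * ce + gamma * kl(y_soft, y_hat)


# Two equally scored triggers in a toy 2-d space each receive weight 0.5.
print(augment([0.0, 0.0], [(2.0, [1.0, 0.0]), (2.0, [0.0, 1.0])]))  # [0.5, 0.5]

# A confident prediction falls below the threshold and uses plain cross-entropy.
print(final_loss(0, [0.9, 0.05, 0.05], [0.8, 0.1, 0.1], delta=0.8, gamma=0.3))
```

With three classes the entropy is bounded by ln 3 ≈ 1.10, so any threshold δ would need to lie below that value for the correction to be triggered at all.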

4. Results and Discussion

4.1. Experimental Setup

4.1.1. Evaluation Metrics

To comprehensively assess the performance of the proposed model on cross-lingual financial sentiment classification tasks, four evaluation metrics are adopted: classification accuracy, macro-averaged and weighted-averaged F1-scores, and the cross-lingual generalization score (CGS, reported as the CLG Score in the result tables). The calculation formulas are defined as follows:

Accuracy = \frac{1}{N} \sum_{i = 1}^{N} I ({\hat{y}}_{i} = y_{i}),

(14)

{F 1}_{macro} = \frac{1}{C} \sum_{c = 1}^{C} \frac{2 \cdot P_{c} \cdot R_{c}}{P_{c} + R_{c}},

(15)

{F 1}_{weighted} = \sum_{c = 1}^{C} \frac{n_{c}}{N} \cdot \frac{2 \cdot P_{c} \cdot R_{c}}{P_{c} + R_{c}},

(16)

CGS = \frac{1}{L} \sum_{(s, t) \in T} \frac{F 1_{s \to t}}{F 1_{s \to s}},

(17)

where N denotes the total number of test samples, {\hat{y}}_{i} and y_{i} represent the predicted and ground-truth labels of the i-th sample, respectively, and I (\cdot) is the indicator function. C is the total number of sentiment classes, P_{c} and R_{c} are the precision and recall for class c, and n_{c} is the number of samples in class c. L denotes the number of cross-lingual test pairs, T is the set of all language transfer pairs (e.g., en→zh, fr→en), F1_{s \to t} represents the F1-score when training on source language s and testing on target language t, and F1_{s \to s} denotes the F1-score when both training and testing are conducted in the same language. By incorporating the CGS metric, the extent of performance degradation due to cross-lingual transfer can be quantitatively measured, facilitating the assessment of the model’s usability and generalization potential in low-resource language settings.
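Equations (14) to (17) translate directly into code. The sketch below is a plain-Python illustration with hypothetical function names; it assumes that a class with zero precision and recall contributes an F1 of 0, and that the F1 values passed to `cgs` are precomputed for each (training language, testing language) pair. The toy labels and scores are invented for the example.

```python
from collections import Counter


def per_class_f1(y_true, y_pred, labels):
    """Return {label: F1}, using 0.0 when precision + recall is zero."""
    f1 = {}
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1[c] = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return f1


def accuracy(y_true, y_pred):
    """Eq. (14): fraction of samples whose prediction matches the label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_macro(y_true, y_pred, labels):
    """Eq. (15): unweighted mean of per-class F1."""
    f1 = per_class_f1(y_true, y_pred, labels)
    return sum(f1.values()) / len(labels)


def f1_weighted(y_true, y_pred, labels):
    """Eq. (16): per-class F1 weighted by class frequency n_c / N."""
    f1 = per_class_f1(y_true, y_pred, labels)
    counts = Counter(y_true)
    return sum(counts[c] / len(y_true) * f1[c] for c in labels)


def cgs(f1_scores, pairs):
    """Eq. (17): mean of F1(s->t) / F1(s->s) over the transfer pairs."""
    return sum(f1_scores[(s, t)] / f1_scores[(s, s)] for s, t in pairs) / len(pairs)


labels = ["pos", "neu", "neg"]
y_true = ["pos", "pos", "neu", "neg"]
y_pred = ["pos", "neu", "neu", "neg"]
print(accuracy(y_true, y_pred))  # 0.75
print(f1_macro(y_true, y_pred, labels))
print(f1_weighted(y_true, y_pred, labels))

# Made-up F1 values keyed by (training language, testing language).
toy_f1 = {("en", "en"): 0.8, ("en", "fr"): 0.6, ("fr", "fr"): 0.5, ("fr", "en"): 0.4}
print(cgs(toy_f1, [("en", "fr"), ("fr", "en")]))
```

A CGS of 1 would mean that transfer costs nothing relative to in-language training, while values below 1 quantify the relative loss.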

4.1.2. Baseline

To validate the effectiveness of the proposed approach on cross-lingual financial sentiment classification tasks, five representative multilingual baseline models are selected for comparison. These include multilingual pretrained language models, translation-based strategies, and contrastive learning methods, namely, mBERT [31], XLM-R (finetuned) [32], Translate-Train-BERT [33], LaBSE [28], and XLM-R + SimCSE [29]. mBERT is one of the earliest multilingual BERT models with basic cross-lingual representation capabilities and is widely used as a baseline for multilingual tasks. XLM-R, a more powerful multilingual pretrained model, has demonstrated superior transfer performance across multiple downstream tasks; its finetuned variant on financial data is used for evaluation in this study. Translate-Train-BERT represents a commonly adopted “translate-then-train” strategy, where non-English corpora are translated into English before being input into monolingual models, thereby assessing semantic transfer and expressiveness. LaBSE, optimized for cross-lingual semantic retrieval, serves as a strong baseline for evaluating sentence-level sentiment consistency due to its high semantic alignment capability. XLM-R + SimCSE combines the language modeling strengths of XLM-R with the contrastive learning mechanism of SimCSE, enabling the analysis of semantic alignment in sentiment recognition tasks. These baselines collectively span the three mainstream paradigms—pretraining, contrastive alignment, and translation adaptation—thus enabling a comprehensive performance comparison to highlight the contributions and advantages of the proposed method. The key characteristics of all baseline models are summarized in Table 3.

4.1.3. Software and Hardware Platform

All experiments were conducted in a high-performance computing environment. The hardware infrastructure comprises a multi-node server cluster equipped with eight NVIDIA A100 80GB GPUs, an Intel Xeon Platinum 8358 CPU, and 1TB of memory on the primary node, enabling large-scale parallel training and multilingual data processing. In terms of software, model training and evaluation were implemented using PyTorch 2.0 and the Transformers 4.36 library. Data preprocessing was conducted with the Hugging Face Datasets library. The entire environment was deployed under CUDA 12.1 and Python 3.10. Experiment management and result visualization were performed using Weights & Biases, ensuring the reproducibility of model tuning and the stability of the experimental results. This platform configuration provides robust computational and engineering support for large-scale cross-lingual model training, meeting the demands of financial sentiment recognition tasks in terms of data volume, computational complexity, and debugging flexibility.

4.2. Cross-Lingual Financial Sentiment Classification Results (Training Language → Testing Language)

This experiment is designed to evaluate the performance of various cross-lingual pretrained models on multilingual financial sentiment classification tasks, with a focus on assessing their transferability and semantic generalization when the training and testing languages differ. Training–testing paths were configured across English, Chinese, and French, and comparisons were conducted among mBERT, XLM-R (finetuned), Translate-Train-BERT, LaBSE, XLM-R combined with SimCSE, and the proposed integrated model. The evaluation metrics included Accuracy, F1-Macro, F1-Weighted, and the Cross-Lingual Generalization Score (CLG Score), providing a comprehensive view of each model’s effectiveness in handling language transfer and sentiment alignment.

As shown in Table 4 and Figure 4, the traditional mBERT model, which lacks domain-specific training for financial semantics, exhibited the lowest performance across all metrics. Its F1-Macro score of only 58.3% reflects imbalanced recognition of minority sentiment classes. While XLM-R demonstrated improved multilingual representation capabilities, its performance gain after fine-tuning remained limited (66.1% accuracy), which is attributed to semantic drift under cross-lingual settings. Translate-Train-BERT, which augments training data through translation, alleviated part of the misalignment issue but showed constrained performance due to its reliance on translation quality and lack of alignment supervision. LaBSE, optimized for multilingual sentence alignment, achieved a better CLG Score (0.614) but fell short in capturing fine-grained sentiment features. The XLM-R + SimCSE model outperformed the other baselines by enhancing semantic consistency through contrastive learning, achieving 67.9% in F1-Macro and 70.2% in F1-Weighted. Notably, the proposed full model achieved the highest scores across all metrics (75.8% accuracy, 72.3% F1-Macro, 74.7% F1-Weighted, and a CLG Score of 0.684). This performance advantage is attributed to the synergistic integration of the semantic contrastive alignment and language-adaptive tuning modules. These mechanisms jointly enable robust modeling of both cross-lingual shared semantics and language-specific sentiment variations, maintaining superior robustness in low-resource target language scenarios. From a mathematical perspective, the contrastive module imposes structural constraints on the representation space, clarifying the geometric boundaries between positive and negative pairs. Simultaneously, the adaptive module introduces language-specific variations via low-rank residual injection into Transformer layers, enhancing both cross-lingual transferability and domain-sensitive sentiment recognition.

4.3. Performance Degradation Under Module Ablation

This experiment evaluates the contribution of the semantic contrastive alignment and language-adaptive tuning modules to the overall model by systematically removing each component. Three ablation settings were tested: removing the semantic contrastive alignment module, removing the language-adaptive tuning module, and removing both. Other components were kept constant. The evaluation was based on Accuracy, F1-Macro, and F1-Weighted, covering overall correctness, category-level balance, and class-weighted performance, respectively.

As shown in Table 5 and Figure 5, removing either module resulted in noticeable performance degradation: accuracy fell by 5.7 points without the semantic contrastive alignment module and by 6.6 points without the language-adaptive tuning module, with slightly larger drops in F1-Macro (6.3 and 7.4 points, respectively). This indicates that both modules play crucial roles in enhancing the model’s ability to identify minority sentiment classes under multilingual conditions. From a structural and mathematical standpoint, the contrastive module imposes distance-based constraints between semantic pairs in the vector space, allowing the model to learn language-independent semantic structures beyond monolingual statistical correlations. This effectively shapes the geometry of the embedding space and improves generalization in low-resource languages. The adaptive module, on the other hand, employs low-rank residual parameter injection to selectively modulate Transformer layers, introducing language-specific expressivity with minimal parameter cost. Together, these components form a representation path composed of shared semantic projection and fine-grained language modulation. Removal of either module compromises the model’s accuracy and robustness, confirming the necessity and complementarity of both designs in the proposed framework.
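The size of each ablation effect can be read off Table 5 with simple arithmetic. The short script below, a sketch using only the values reported in Table 5, computes the drop of each ablated configuration relative to the full model; the variable names are illustrative.

```python
# Table 5 values as (Accuracy, F1-Macro, F1-Weighted).
table5 = {
    "Full Model": (75.8, 72.3, 74.7),
    "w/o Semantic Contrastive Alignment": (70.1, 66.0, 68.3),
    "w/o Language-Adaptive Tuning": (69.2, 64.9, 67.4),
    "w/o Both Modules": (65.8, 61.7, 63.5),
}

full = table5["Full Model"]

# Percentage-point drop of each ablated configuration relative to the full model.
drops = {
    name: tuple(round(f - v, 1) for f, v in zip(full, vals))
    for name, vals in table5.items()
    if name != "Full Model"
}

for name, d in drops.items():
    print(f"{name}: accuracy -{d[0]}, F1-Macro -{d[1]}, F1-Weighted -{d[2]}")
```

On these numbers, removing both modules costs 10.0 accuracy points, which is less than the sum of the two individual drops (5.7 + 6.6 = 12.3). This would be consistent with partly overlapping contributions, although the table alone cannot establish that.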

4.4. Transferability Evaluation Across Source → Target Languages

This experiment analyzes the transferability of the model under various source–target language combinations to further validate the generalization capacity of the proposed approach in cross-lingual financial sentiment classification. Models were trained separately on English, Chinese, and French data and then tested on other languages. A multilingual joint training setting was also included as an upper-bound reference to assess performance gains under data fusion.

As shown in Table 6 and Figure 6, the best single-source transfer occurred when training on French and testing on English. This was followed by English to Chinese and Chinese to English, suggesting that the model trained on French financial texts aligns well with English semantics. The multilingual joint training setting consistently outperformed all single-source models, with notable gains in F1-Macro and the CLG Score, indicating that combining multiple language sources significantly improves the model’s ability to interpret and discriminate sentiment across languages. Theoretically, these performance differences reflect the model’s sensitivity to distributional similarity and semantic alignment across languages. For instance, English and French share structural similarities in financial writing style and syntax, which, combined with contrastive semantic alignment, allow more effective sentiment mapping. In contrast, Chinese, with distinct syntactic and lexical characteristics, presents a greater challenge. However, the inclusion of the language-adaptive tuning module compensates for such divergence by injecting targeted adjustments into the Transformer layers. Mathematically, the transferability stems from the semantic alignment enforced by contrastive learning, which clusters semantically equivalent sentences in the embedding space across languages. Concurrently, the adaptive module ensures stable sentiment decoding by adjusting for syntactic and lexical divergence. This dual mechanism underpins the model’s robust performance in cross-lingual financial sentiment transfer.

4.5. Discussion

The proposed cross-lingual financial sentiment recognition framework demonstrated significant advantages across multiple multilingual sentiment classification tasks, exhibiting robust transferability and generalization capabilities with substantial real-world applicability. In the context of increasingly interconnected global financial markets, investment decisions are no longer limited to information from a single language. An increasing number of financial institutions, quantitative funds, and international credit rating agencies require sentiment signals extracted from multilingual financial news, corporate disclosures, and social media posts in Chinese, French, Arabic, and other languages to facilitate rapid asset pricing and risk assessment on a global scale. The multilingual sentiment recognition model developed in this study addresses these demands effectively, particularly in scenarios characterized by an abundance of English resources and a scarcity of target language data. It offers a reliable foundational capability for multilingual public opinion monitoring, event-driven trading strategies, and cross-border investment risk alerts. For instance, in international credit rating scenarios, the model can assist analysts in extracting sentiment signals from local media sources in non-English-speaking countries to provide quantitative evidence for credit rating adjustments. When foreign banks conduct market analyses in Belt and Road countries, the model enables sentiment tracking in Arabic or Southeast Asian economic news, supporting timely insights into monetary policy shifts or market expectation changes. On multilingual social media platforms such as Weibo, Reddit, and X, the model also facilitates real-time sentiment monitoring, enhancing financial risk surveillance systems with early detection of market panic signals. 
The integration of semantic contrastive learning and language-adaptive mechanisms not only enhances cross-lingual comprehension but also provides a feasible technical pathway for processing financial texts in complex multilingual environments. This framework holds potential for further extension into areas such as cross-lingual economic policy analysis and multilingual financial summarization, advancing the capabilities of financial language models from monolingual generalization toward deep multilingual understanding.

4.6. Limitations and Future Work

Despite the superior performance of the proposed cross-lingual financial sentiment recognition framework across multiple tasks, several limitations remain that warrant further investigation. First, although the model exhibits improved performance in low-resource language settings, the semantic alignment mechanism based on contrastive learning is still subject to the quality of training data and translation consistency. In cases where annotations are ambiguous or semantic interpretations are uncertain, the stability of sentiment labels may affect the model’s generalization, especially in languages where sentiment expression is highly context-dependent. Second, the current model predominantly adopts sentence-level representations and has not yet fully leveraged discourse-level structural information and contextual dependencies present in financial documents, limiting its applicability to long-text sentiment analysis tasks such as analyst reports or full-length announcements. Future research may proceed in three directions. First, more robust cross-lingual contrastive learning strategies can be explored by incorporating multilingual semantic alignment pretraining or graph-based semantic augmentation to enhance cross-lingual consistency. Second, hierarchical modeling techniques could be introduced to explicitly capture discourse-level structures in financial texts, improving performance on long-form, multi-paragraph sentiment classification. Third, the integration of external knowledge graphs and multilingual financial lexicons may support dynamic understanding and reasoning over newly emerging terminology, thereby enhancing the model’s adaptability and interpretability in real-world financial scenarios.

5. Conclusions

The task of cross-lingual financial sentiment recognition is addressed in this study by proposing a unified multilingual modeling framework that integrates semantic contrastive learning with language-adaptive modulation, targeting the challenges of semantic misalignment, ambiguous sentiment expression, and limited transferability in non-English financial texts. Built upon multilingual pretrained language models, the proposed framework introduces a semantic contrastive module to enhance semantic consistency across different languages, while a low-rank language-adaptive module is incorporated to improve the model’s sensitivity to language-specific emotional cues. This design enables efficient and robust sentiment classification across languages in financial contexts. The experimental results demonstrate the effectiveness of the proposed approach. On cross-lingual sentiment classification tasks, the model achieved an accuracy of 75.8%, an F1-Macro score of 72.3%, and an F1-Weighted score of 74.7%, significantly outperforming mainstream baselines such as mBERT, XLM-R, and Translate-Train-BERT. In transfer evaluations between source and target languages, superior performance was consistently observed across various combinations, including English-to-French, French-to-English, and Chinese-to-English combinations. Furthermore, multilingual joint training boosted the cross-lingual generalization score to 0.696, reflecting strong semantic alignment and transfer capacity. Ablation studies further confirm that both the semantic contrastive module and the language-adaptive module contribute critically to performance improvements, exhibiting complementarity and structural soundness. The proposed cross-lingual financial sentiment recognition model not only offers advantages in terms of accuracy, recall, and robustness but also demonstrates broad practical applicability. 
It provides a transferable and scalable technical solution for automated multilingual financial text analysis and sentiment modeling, underscoring its theoretical significance and potential for real-world deployment.

Author Contributions

Conceptualization, L.Z., Q.L., F.M. and Y.Z.; methodology, L.Z., Q.L. and F.M.; software, L.Z., Q.L. and F.M.; validation, S.L. (Siyu Liang); formal analysis, J.L. and K.C.; investigation, J.L. and K.C.; resources, S.L. (Siyu Liang), S.L. (Shen Liu) and K.C.; data curation, S.L. (Siyu Liang), S.L. (Shen Liu) and K.C.; writing—original draft preparation, L.Z., Q.L., F.M., S.L. (Siyu Liang), J.L., S.L. (Shen Liu), K.C. and Y.Z.; visualization, J.L. and S.L. (Shen Liu); supervision, Y.Z.; project administration, Y.Z.; funding acquisition, Y.Z. L.Z., Q.L. and F.M. contributed equally to this work. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (grant number: 61207429).

Data Availability Statement

The data presented in this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Alahmadi, K.; Alharbi, S.; Chen, J.; Wang, X. Generalizing sentiment analysis: A review of progress, challenges, and emerging directions. Soc. Netw. Anal. Min. 2025, 15, 45.
2. Smailović, J.; Grčar, M.; Lavrač, N.; Žnidaršič, M. Stream-based active learning for sentiment analysis in the financial domain. Inf. Sci. 2014, 285, 181–203.
3. Hajek, P.; Munk, M. Speech emotion recognition and text sentiment analysis for financial distress prediction. Neural Comput. Appl. 2023, 35, 21463–21477.
4. Rao, Y.; Lei, J.; Wenyin, L.; Li, Q.; Chen, M. Building emotional dictionary for sentiment analysis of online news. World Wide Web 2014, 17, 723–742.
5. Yuan, C.; Liu, Y.; Yin, R.; Zhang, J.; Zhu, Q.; Mao, R.; Xu, R. Target-based sentiment annotation in Chinese financial news. In Proceedings of the Twelfth Language Resources and Evaluation Conference, Marseille, France, 13–15 May 2020; pp. 5040–5045.
6. Sohangir, S.; Wang, D.; Pomeranets, A.; Khoshgoftaar, T.M. Big Data: Deep Learning for financial sentiment analysis. J. Big Data 2018, 5, 3.
7. Almalki, S.S. Sentiment Analysis and Emotion Detection Using Transformer Models in Multilingual Social Media Data. Int. J. Adv. Comput. Sci. Appl. 2025, 16, 324–333.
8. Kumar, A.; Albuquerque, V.H.C. Sentiment analysis using XLM-R transformer and zero-shot transfer learning on resource-poor Indian language. Trans. Asian Low-Resour. Lang. Inf. Process. 2021, 20, 1–13.
9. Li, X.; Zhang, K. Contrastive Learning Pre-Training and Quantum Theory for Cross-Lingual Aspect-Based Sentiment Analysis. Entropy 2025, 27, 713.
10. Du, K.; Xing, F.; Mao, R.; Cambria, E. Financial sentiment analysis: Techniques and applications. ACM Comput. Surv. 2024, 56, 1–42.
11. Yang, S.; Rosenfeld, J.; Makutonin, J. Financial aspect-based sentiment analysis using deep representations. arXiv 2018, arXiv:1808.07931.
12. Malo, P.; Sinha, A.; Korhonen, P.; Wallenius, J.; Takala, P. Good debt or bad debt: Detecting semantic orientations in economic texts. J. Assoc. Inf. Sci. Technol. 2014, 65, 782–796.
13. Yang, Y.; Uy, M.C.S.; Huang, A. Finbert: A pretrained language model for financial communications. arXiv 2020, arXiv:2006.08097.
14. Sheetal, R.; Aithal, P.K. Enhancing Financial Sentiment Analysis: Integrating the LoughranMcDonald Dictionary with BERT for Advanced Market Predictive Insights. Procedia Comput. Sci. 2025, 258, 2244–2257.
15. Huang, A.H.; Wang, H.; Yang, Y. FinBERT: A large language model for extracting information from financial text. Contemp. Account. Res. 2023, 40, 806–841.
16. Doddapaneni, S.; Ramesh, G.; Khapra, M.M.; Kunchukuttan, A.; Kumar, P. A primer on pretrained multilingual language models. arXiv 2021, arXiv:2107.00676.
17. Kalia, I.; Singh, P.; Kumar, A. Domain Adaptation for NER Using mBERT. In Proceedings of the International Conference on Innovations in Computational Intelligence and Computer Vision; Springer: Berlin/Heidelberg, Germany, 2024; pp. 171–181.
18. Ma, Y. Cross-language Text Generation Using mBERT and XLM-R: English-Chinese Translation Task. In Proceedings of the 2024 International Conference on Machine Intelligence and Digital Applications, Ningbo, China, 30–31 May 2024; pp. 602–608.
19. Rizvi, A.; Thamindu, N.; Adhikari, A.; Senevirathna, W.; Kasthurirathna, D.; Abeywardhana, L. Enhancing Multilingual Sentiment Analysis with Explainability for Sinhala, English, and Code-Mixed Content. arXiv 2025, arXiv:2504.13545.
20. Ri, R.; Kiyono, S.; Takase, S. Self-translate-train: Enhancing cross-lingual transfer of large language models via inherent capability. arXiv 2024, arXiv:2407.00454.
21. Liu, J.; Yang, Y.; Tam, K.Y. Beyond surface similarity: Detecting subtle semantic shifts in financial narratives. arXiv 2024, arXiv:2403.14341.
22. Yan, G.; Peng, K.; Wang, Y.; Tan, H.; Du, J.; Wu, H. AdaFT: An efficient domain-adaptive fine-tuning framework for sentiment analysis in chinese financial texts. Appl. Intell. 2025, 55, 701.
23. Tian, Y.; Sun, C.; Poole, B.; Krishnan, D.; Schmid, C.; Isola, P. What makes for good views for contrastive learning? Adv. Neural Inf. Process. Syst. 2020, 33, 6827–6839.
24. Zhang, H.; Cao, Y. Understanding the Benefits of SimCLR Pre-Training in Two-Layer Convolutional Neural Networks. arXiv 2024, arXiv:2409.18685.
25. He, K.; Fan, H.; Wu, Y.; Xie, S.; Girshick, R. Momentum contrast for unsupervised visual representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 9729–9738.
26. Gao, T.; Yao, X.; Chen, D. Simcse: Simple contrastive learning of sentence embeddings. arXiv 2021, arXiv:2104.08821.
27. Hafner, M.; Katsantoni, M.; Köster, T.; Marks, J.; Mukherjee, J.; Staiger, D.; Ule, J.; Zavolan, M. CLIP and complementary methods. Nat. Rev. Methods Prim. 2021, 1, 20.
28. Feng, F.; Yang, Y.; Cer, D.; Arivazhagan, N.; Wang, W. Language-agnostic BERT sentence embedding. arXiv 2020, arXiv:2007.01852.
29. Wang, Y.S.; Wu, A.; Neubig, G. English contrastive learning can learn universal cross-lingual sentence embeddings. arXiv 2022, arXiv:2211.06127.
30. Tang, G.; Yousuf, O.; Jin, Z. Improving BERTScore for machine translation evaluation through contrastive learning. IEEE Access 2024, 12, 77739–77749.
31. Pires, T.; Schlinger, E.; Garrette, D. How multilingual is multilingual BERT? arXiv 2019, arXiv:1906.01502.
32. Conneau, A.; Khandelwal, K.; Goyal, N.; Chaudhary, V.; Wenzek, G.; Guzmán, F.; Grave, E.; Ott, M.; Zettlemoyer, L.; Stoyanov, V. Unsupervised cross-lingual representation learning at scale. arXiv 2019, arXiv:1911.02116.
33. Yu, S.; Sun, Q.; Zhang, H.; Jiang, J. Translate-Train Embracing Translationese Artifacts; Association for Computational Linguistics: Dublin, Ireland, 2022.

Figure 1. Semantic contrastive alignment module. Subfigure (a) shows a declarative query and (b) an interrogative equivalent. Attention weights link functional, conceptual, and punctuation tokens. Contrastive learning aligns their embeddings, where ✔ marks correct alignment and × indicates misaligned outputs.

Figure 2. Illustration of the language-adaptive tuning mechanism. The notations W^{* up} and W^{* down} denote the compressed low-rank adaptation matrices obtained after pruning, serving as refined versions of the injected matrices W^{up} and W^{down}.

Figure 3. Illustration of the domain-specific decoder architecture.

Figure 4. Radar chart comparing multiple models on cross-lingual financial sentiment classification tasks, where training and testing are performed across different languages.

Figure 5. Line chart illustrating the ablation study results, depicting performance degradation when key modules are removed from the proposed model.

Figure 6. Heatmap illustrating the transferability performance across different source → target language pairs for cross-lingual financial sentiment classification. Metrics include Accuracy, F1-Macro, F1-Weighted, and the CLG Score.

Table 1. Statistics of the multilingual financial sentiment dataset.

Language	News	Announcements	Social Media
English	18,542	12,764	9380
Chinese	16,308	14,227	11,106
French	12,695	10,024	8517
Total	47,545	37,015	29,003

Table 2. Sentiment label distribution by language.

Language	Positive	Neutral	Negative
English	14,582	12,347	13,757
Chinese	13,276	14,865	13,500
French	10,384	10,928	9924

Table 3. Summary of the baseline models used in the comparison.

Model	#Params	Pretraining Data	Architecture	Key Features
mBERT	∼110 M	Wikipedia (104 langs)	Transformer, 12 layers, 768-d	Early MPLM, basic cross-lingual capability
XLM-R (finetuned)	∼270 M	CommonCrawl CC-100 (100 langs)	Transformer, 24 layers, 1024-d	Strong multilingual encoder, finetuned on financial data
Translate-Train-BERT	∼110 M	Wikipedia (English) + Translated corpora	Transformer, 12 layers, 768-d	Translate non-English to English before training
LaBSE	∼470 M	Translation pairs (109 langs)	Dual-encoder Transformer	Optimized for multilingual sentence alignment
XLM-R + SimCSE	∼270 M	CC-100 (100 langs) + contrastive finetuning	Transformer, 24 layers, 1024-d	Combines multilingual encoder with sentence-level contrastive learning

Table 4. Cross-lingual financial sentiment classification results. For each model, the reported metrics are the averages across all evaluated cross-lingual transfer directions.

Model	Accuracy	F1-Macro	F1-Weighted	CLG Score
mBERT	62.4	58.3	60.2	0.541
XLM-R (finetuned)	66.1	62.7	65.4	0.584
Translate-Train-BERT	67.3	63.5	66.1	0.597
LaBSE	68.5	65.2	67.8	0.614
XLM-R + SimCSE	71.4	67.9	70.2	0.646
Ours (Full Model)	75.8	72.3	74.7	0.684
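
To make the margin over the baselines explicit, the sketch below finds the strongest baseline for each metric in Table 4 and computes the gap to the full model. The scores are copied from the table; the variable and function names are assumptions introduced for illustration only.

```python
# Scores copied from Table 4, in the order given by METRICS.
METRICS = ("Accuracy", "F1-Macro", "F1-Weighted", "CLG Score")
baselines = {
    "mBERT": (62.4, 58.3, 60.2, 0.541),
    "XLM-R (finetuned)": (66.1, 62.7, 65.4, 0.584),
    "Translate-Train-BERT": (67.3, 63.5, 66.1, 0.597),
    "LaBSE": (68.5, 65.2, 67.8, 0.614),
    "XLM-R + SimCSE": (71.4, 67.9, 70.2, 0.646),
}
ours = (75.8, 72.3, 74.7, 0.684)


def margin_over_best(baselines, ours):
    """For each metric, return (best baseline name, ours minus its score)."""
    margins = {}
    for i, metric in enumerate(METRICS):
        best_name, best_scores = max(baselines.items(), key=lambda kv: kv[1][i])
        # Round to suppress floating-point noise in the subtraction.
        margins[metric] = (best_name, round(ours[i] - best_scores[i], 3))
    return margins


margins = margin_over_best(baselines, ours)
for metric, (name, gap) in margins.items():
    print(f"{metric}: +{gap} over {name}")
```

On the reported numbers, XLM-R + SimCSE is the strongest baseline on every metric, and the full model exceeds it by 4.4 points in accuracy, 4.4 in F1-Macro, 4.5 in F1-Weighted, and 0.038 in CLG Score.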

Table 5. Ablation results: Performance degradation with different modules removed.

Model Configuration	Accuracy	F1-Macro	F1-Weighted
Full Model	75.8	72.3	74.7
w/o Semantic Contrastive Alignment	70.1	66.0	68.3
w/o Language-Adaptive Tuning	69.2	64.9	67.4
w/o Both Modules	65.8	61.7	63.5
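
The degradation in Table 5 can be read off directly as differences from the full model. The following sketch computes those differences; the scores are copied from the table, and the names used in the code are illustrative assumptions.

```python
# Scores copied from Table 5: (Accuracy, F1-Macro, F1-Weighted).
METRICS = ("Accuracy", "F1-Macro", "F1-Weighted")
full = (75.8, 72.3, 74.7)
ablations = {
    "w/o Semantic Contrastive Alignment": (70.1, 66.0, 68.3),
    "w/o Language-Adaptive Tuning": (69.2, 64.9, 67.4),
    "w/o Both Modules": (65.8, 61.7, 63.5),
}


def drops(full, ablated):
    """Return the per-metric drop (in points) relative to the full model."""
    return tuple(round(f - a, 1) for f, a in zip(full, ablated))


degradation = {name: drops(full, scores) for name, scores in ablations.items()}
for name, d in degradation.items():
    detail = ", ".join(f"{m}: -{v}" for m, v in zip(METRICS, d))
    print(f"{name}: {detail}")
```

Under these numbers, removing both modules costs 10.0 accuracy points, which is less than the 12.3 points obtained by adding the two single-module drops (5.7 + 6.6). This suggests that the contributions of the two modules partly overlap rather than being strictly additive.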

Table 6. Transferability evaluation across source → target language pairs.

Source Language → Target Language	Accuracy	F1-Macro	F1-Weighted	CLG Score
English → Chinese	74.2	70.5	73.1	0.662
English → French	73.6	69.8	72.4	0.654
French → English	75.1	71.9	74.2	0.671
Chinese → English	73.9	70.4	72.7	0.658
Multilingual Joint Training	77.3	73.6	76.1	0.696

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Zhang, L.; Lin, Q.; Meng, F.; Liang, S.; Lu, J.; Liu, S.; Chen, K.; Zhan, Y. Leveraging Contrastive Semantics and Language Adaptation for Robust Financial Text Classification Across Languages. Computers 2025, 14, 338. https://doi.org/10.3390/computers14080338

AMA Style

Zhang L, Lin Q, Meng F, Liang S, Lu J, Liu S, Chen K, Zhan Y. Leveraging Contrastive Semantics and Language Adaptation for Robust Financial Text Classification Across Languages. Computers. 2025; 14(8):338. https://doi.org/10.3390/computers14080338

Chicago/Turabian Style

Zhang, Liman, Qianye Lin, Fanyu Meng, Siyu Liang, Jingxuan Lu, Shen Liu, Kehan Chen, and Yan Zhan. 2025. "Leveraging Contrastive Semantics and Language Adaptation for Robust Financial Text Classification Across Languages" Computers 14, no. 8: 338. https://doi.org/10.3390/computers14080338

APA Style

Zhang, L., Lin, Q., Meng, F., Liang, S., Lu, J., Liu, S., Chen, K., & Zhan, Y. (2025). Leveraging Contrastive Semantics and Language Adaptation for Robust Financial Text Classification Across Languages. Computers, 14(8), 338. https://doi.org/10.3390/computers14080338

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
