MTSA-CG: Mongolian Text Sentiment Analysis Based on ConvBERT and Graph Attention Network
Abstract
1. Introduction
- A ConvBERT pre-trained model is employed to extract textual features from Mongolian text data. By leveraging token embedding and positional embedding, ConvBERT captures local textual features through convolutional modules and further enhances feature representation by integrating multi-head self-attention mechanisms and feedforward neural networks. This enables effective textual representation even under conditions of limited data.
- A hierarchical multi-graph attention framework is proposed that incorporates three types of graph structures: co-occurrence, syntactic dependency, and semantic similarity. Unlike existing multi-graph fusion approaches that simply concatenate or average graph representations, our method employs an adaptive weighting mechanism in which each graph type is processed through independent attention heads and their contributions are dynamically adjusted according to context-specific semantic relevance. This hierarchical architecture allows the model to selectively emphasize the most informative graph structure for each sentiment classification instance. These graphs, together with the feature vectors produced by ConvBERT, are fed into the GAT to integrate global semantic information (a schematic sketch of the fusion step follows this list).
- The MTSA-CG model is introduced. Experimental results show that the proposed model effectively improves the accuracy of MTSA. Additionally, we provide comprehensive ablation studies demonstrating the contribution of each component and discuss the model’s applicability to other low-resource languages.
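As a concrete illustration of the adaptive weighting described in the second contribution, the following is a minimal PyTorch sketch of a context-conditioned fusion over three graph-level representations. The module name (`CrossGraphFusion`), layer sizes, and scoring function are illustrative assumptions, not the released MTSA-CG implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossGraphFusion(nn.Module):
    """Adaptively weight per-graph representations (CRG, DRG, SSG).

    Illustrative sketch only: the scoring function and dimensions are
    assumptions, not the authors' released code.
    """

    def __init__(self, hidden_dim: int, num_graphs: int = 3):
        super().__init__()
        # Score each graph representation conditioned on the ConvBERT context vector.
        self.score = nn.Linear(2 * hidden_dim, 1)
        self.num_graphs = num_graphs

    def forward(self, graph_reprs: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
        # graph_reprs: [batch, num_graphs, hidden_dim] -- one vector per graph (CRG/DRG/SSG)
        # context:     [batch, hidden_dim]             -- sentence-level ConvBERT feature
        ctx = context.unsqueeze(1).expand(-1, self.num_graphs, -1)
        scores = self.score(torch.cat([graph_reprs, ctx], dim=-1))  # [batch, num_graphs, 1]
        weights = F.softmax(scores, dim=1)                          # context-specific graph weights
        fused = (weights * graph_reprs).sum(dim=1)                  # [batch, hidden_dim]
        return fused

# Usage: fuse three graph-level vectors produced by independent GAT heads.
fusion = CrossGraphFusion(hidden_dim=256)
g = torch.randn(8, 3, 256)   # CRG, DRG, SSG representations
c = torch.randn(8, 256)      # ConvBERT sentence features
out = fusion(g, c)           # [8, 256]
```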
2. Related Work
3. Methods
3.1. Overview
3.2. Text Feature Extraction
3.3. Graph Structure Construction
3.3.1. Module Overview
Leakage-Safe Protocol
3.3.2. Co-Occurrence Relation Graph (CRG)
Leakage Control
3.3.3. Cross-Lingual Dependency Transformation (DRG)
Linguistic Motivation
Transformation Pipeline
- (1) Morphological Alignment: Chinese syntactic markers are mapped to Mongolian case markers using a rule-based conversion table derived from comparative grammatical studies.
- (2) Postposition Handling: Chinese prepositions are converted into Mongolian postpositions through 24 predefined mappings (a minimal sketch of such a mapping table appears after this list).
- (3) Verb Complex Transformation: Aspectual and modal auxiliaries in Chinese are mapped to Mongolian converbal and modal constructions.
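To make the rule tables above more concrete, here is a minimal Python sketch of how marker mappings of this kind can be stored and looked up. The pinyin keys and Mongolian labels shown are hypothetical fragments chosen for illustration; the paper's full conversion tables (including the 24 postposition mappings) are not reproduced here.

```python
from typing import Optional

# Hypothetical fragments of the rule-based conversion tables; entries are
# illustrative placeholders, not the paper's complete mapping inventory.
CASE_MARKER_MAP = {
    "ba": "ACC",     # disposal construction -> accusative suffix (cf. Appendix A, Rule 2)
    "cong": "ABL",   # "from"                -> ablative suffix
}
POSTPOSITION_MAP = {
    "zai...shang": "deer-e",  # "on" -> Mongolian postposition (cf. Appendix A, Rule 4)
}

def map_marker(chinese_marker: str) -> Optional[str]:
    """Return the Mongolian case label or postposition for a Chinese marker, if known."""
    return CASE_MARKER_MAP.get(chinese_marker) or POSTPOSITION_MAP.get(chinese_marker)

print(map_marker("cong"))         # -> "ABL"
print(map_marker("zai...shang"))  # -> "deer-e"
```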
4. Experiments and Evaluation
4.1. Dataset and Evaluation Index
4.1.1. Mongolian Sentiment Dataset
4.1.2. Mongolian-Chinese Parallel Corpus
- For each cross-validation fold, vocabulary statistics, co-occurrence counts, and bilingual vector alignments are computed exclusively from the training split (see the sketch after this list).
- The parallel corpus is used only to derive transformation rules and general linguistic patterns, not to access test instance information.
- Dependency parsers and embedding models are trained or adapted only on training data.
- All hyperparameters and normalization constants are determined per-fold based on training data alone.
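The sketch below illustrates this per-fold discipline in code. It assumes a standard stratified split and uses toy data; `train_only_vocab` and the threshold `min_freq` are illustrative stand-ins for the actual vocabulary, graph, and alignment routines.

```python
from collections import Counter
from sklearn.model_selection import StratifiedKFold

def train_only_vocab(token_lists, min_freq=3):
    """Build the vocabulary from the training split only; rarer words fall back to <UNK>."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    return {tok for tok, c in counts.items() if c >= min_freq}

# Toy data standing in for the tokenized Mongolian corpus.
docs = [["a", "b", "c"], ["a", "c"], ["b", "d"], ["a", "d"], ["c", "d"], ["a", "b"]]
labels = [0, 1, 0, 1, 0, 1]

skf = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)
for fold, (train_idx, test_idx) in enumerate(skf.split(docs, labels)):
    vocab = train_only_vocab([docs[i] for i in train_idx], min_freq=1)
    # Graph construction, embedding alignment, and hyperparameter selection
    # would all happen here, using only the training indices of this fold.
    print(f"fold {fold}: |train vocab| = {len(vocab)}")
```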
4.2. Text Preprocessing
4.2.1. Preprocessing Pipeline
Step 1: Noise Removal (Pre-Parsing)
- Non-textual elements: URLs, email addresses, and HTML tags are removed using regular expressions (see the sketch after this list).
- Special symbols: Emoticons and non-Mongolian script characters (e.g., Latin, Cyrillic) are removed, except punctuation marks, which are retained for sentence boundary detection.
- Duplicated whitespace: Consecutive spaces, tabs, and newlines are normalized to single spaces.
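A minimal sketch of Step 1 using Python regular expressions follows; the patterns are illustrative and omit the Mongolian-script filtering, which depends on Unicode ranges not reproduced here.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
EMAIL_RE = re.compile(r"\S+@\S+\.\S+")
HTML_RE = re.compile(r"<[^>]+>")
WS_RE = re.compile(r"\s+")

def remove_noise(text: str) -> str:
    """Step 1 of the pipeline: strip URLs, email addresses, and HTML tags,
    then collapse duplicated whitespace. Patterns are illustrative only."""
    for pattern in (URL_RE, EMAIL_RE, HTML_RE):
        text = pattern.sub(" ", text)
    return WS_RE.sub(" ", text).strip()

print(remove_noise("see <b>this</b>   https://example.com now"))  # -> "see this now"
```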
Step 2: Tokenization
- Suffix boundaries: Mongolian is agglutinative, so we identify morpheme boundaries using a finite-state transducer trained on 10,000 manually segmented words.
- Multi-word expressions: Compound nouns and idiomatic phrases are treated as single tokens based on a lexicon of 500 common expressions.
- Punctuation: Sentence-final punctuation (period, question mark, exclamation mark) is separated as distinct tokens.
Step 3: Dependency Parsing
- Chinese side: Stanford CoreNLP 4.5.0 with the Chinese dependency parser (trained on Chinese Treebank 9.0).
- Mongolian side (for baseline): A Mongolian dependency parser adapted from Universal Dependencies 2.10 framework, trained on 5000 manually annotated Mongolian sentences.
Step 4: Graph-Level Filtering (Post-Parsing)
- Low-frequency node removal: Nodes (words) appearing fewer than 3 times in the training vocabulary are marked as <UNK> but retain their dependency connections through re-attachment: if a node is removed, its dependents are re-attached to its head, preserving tree connectivity.
- Stopword masking (optional): For SSG only, we optionally mask function words (17 Mongolian postpositions, 8 conjunctions) by setting their similarity scores to zero while keeping them in the graph structure. This step is ablated in Section 4.6.
- Connectivity: All dependency graphs remain weakly connected trees (single root, no cycles).
- Size consistency: Node count matches tokenized sequence length.
- Edge validity: All edges connect valid node indices within the tokenized sequence (a minimal re-attachment sketch follows this list).
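The re-attachment step described above can be sketched as follows; the head-array representation and function name are assumptions made for illustration, not the paper's implementation.

```python
def reattach(heads, removed):
    """Re-attach dependents of removed nodes to the nearest surviving ancestor,
    preserving a weakly connected tree. heads[i] is the head index of token i
    (-1 for the root); `removed` is a set of token indices being dropped."""
    new_heads = list(heads)
    for i, h in enumerate(heads):
        if i in removed:
            continue
        # Walk up the chain of removed heads until a surviving node (or the root) is found.
        while h in removed:
            h = heads[h]
        new_heads[i] = h
    return new_heads

# Example: token 2 is removed; its dependent (token 3) is re-attached to token 1.
heads = [-1, 0, 1, 2]        # 0 is root; 1 -> 0; 2 -> 1; 3 -> 2
print(reattach(heads, {2}))  # -> [-1, 0, 1, 1]
```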
Preprocessing Tools and Versions
- Python 3.8.10;
- Stanford CoreNLP 4.5.0 (Chinese parser);
- Universal Dependencies 2.10 (Mongolian parser);
- Custom Mongolian tokenizer (released with code);
- NumPy 1.21.0, SciPy 1.7.3 (for graph operations).
Reproducibility
4.3. Evaluation Metrics
4.4. MTSA-CG Model Experiment
4.5. Contrast Experiment
4.6. Ablation Experiments
4.6.1. Component Ablation
- w/o GAT: Removes the Graph Attention Network; only ConvBERT features are fed into a two-layer MLP classifier (hidden size = 256).
- w/o CRG: Excludes the Co-occurrence Relation Graph, integrating only DRG and SSG features.
- w/o DRG: Removes the Dependency Relation Graph, utilizing only CRG and SSG for fusion.
- w/o SSG: Excludes the Semantic Similarity Graph, combining only CRG and DRG.
- w/o Cross-Graph Attention: Replaces the hierarchical attention fusion with a simple concatenation of graph representations.
4.6.2. Dependency Transformation Ablation
- (1) Chinese Parse Only: Use dependency trees directly from Chinese parallel sentences (no transformation).
- (2) Mongolian Parse Only: Apply a monolingual Mongolian dependency parser (LAS = 81.3%) without cross-lingual transfer.
- (3) Transformed (Proposed): Apply the full 15-rule transformation pipeline to Chinese parses.
- (4) Hybrid (Oracle): Combine transformed and Mongolian parses via confidence-weighted voting (upper bound).
4.6.3. Data Leakage Control Experiment
- Leakage-Safe (Proposed): CRG, SSG, and embeddings computed exclusively from the training split in each fold.
- Global Statistics (Leaked): Graphs and embeddings precomputed from the entire corpus (train + validation + test).
4.6.4. SSG Configuration Ablation
- (1) Surface Forms (default): Compute token-level similarity without lemmatization.
- (2) Lemmatized: Apply a rule-based stemmer for inflection normalization before computing similarity.
- (3) No Stopword Masking: Retain function words in SSG with full similarity weights.
- (4) Threshold Variation: Evaluate alternative similarity thresholds for edge inclusion (a minimal SSG construction sketch follows this list).
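A minimal NumPy sketch of threshold-based SSG construction, as varied in configuration (4), is given below. The use of cosine similarity and the example threshold value are assumptions, included only to illustrate how graph density changes with the cutoff.

```python
import numpy as np

def build_ssg(embeddings: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Semantic Similarity Graph sketch: connect token pairs whose cosine
    similarity meets `threshold`. Threshold and similarity measure are
    illustrative assumptions."""
    norm = embeddings / (np.linalg.norm(embeddings, axis=1, keepdims=True) + 1e-8)
    sim = norm @ norm.T
    adj = (sim >= threshold).astype(float)
    np.fill_diagonal(adj, 0.0)  # no self-loops
    return adj

emb = np.random.rand(6, 50)          # toy token embeddings
adj = build_ssg(emb, threshold=0.5)
density = adj.sum() / (adj.shape[0] * (adj.shape[0] - 1))
print(f"edges={int(adj.sum())}, density={density:.3f}")
```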
5. Error Analysis
5.1. Confusion Matrix Analysis
- Sad–Fear Confusion (23.9% combined): The two classes exhibit substantial overlap due to shared vocabulary and similar dependency structures characterized by stative verbs and negative polarity.
- Neutral Misclassification: The Neutral category shows a 19.1% total misclassification rate, frequently confused with emotional classes, suggesting it encapsulates mixed or ambiguous sentiment.
- Anger → Disgust: A 4.8% error rate indicates lexical and syntactic overlap, such as shared use of exclamations and rejection-related terms.
5.2. Linguistic Error Analysis
5.3. Implications for Future Improvement
- Enhanced Dependency Rules: Extend the current 15-rule transformation framework to cover negation scope (Rules 16–18), conditional mood (Rules 19–20), and clause embedding (Rules 21–23), ensuring accurate head attachment of modal and aspectual markers.
- Lexical Disambiguation: Incorporate context-aware word sense disambiguation for high-ambiguity terms using contextualized embeddings such as ELMo or fine-tuned fastText.
- Long-Range Dependency Modeling: For lengthy sentences (>30 words), employ hierarchical attention or additional transformer layers before graph construction to capture global semantic dependencies.
- Neutral Class Refinement: Redefine the Neutral category into subtypes (e.g., factual, mixed-sentiment, or ambiguous) or apply ordinal regression to represent sentiment intensity continuously.
- Cross-Lingual Alignment Enhancement: Improve bilingual embedding alignment using larger seed lexicons (increase from 5000 to 20,000 pairs) and domain-adaptive fine-tuning on sentiment-specific corpora.
6. Discussion
6.1. Positioning in the Era of Large Language Models
- Hybrid Architecture: LLMs could serve as feature extractors replacing ConvBERT, with their outputs fed into our hierarchical multi-graph attention framework. This would combine LLMs’ broad linguistic knowledge with our model’s structural reasoning capabilities.
- Prompt-Based Graph Construction: LLMs could be employed to automatically generate or refine graph structures (co-occurrence, dependency, similarity) through carefully designed prompts, improving graph quality without manual linguistic analysis.
6.2. Cross-Lingual Applicability
- Parallel Corpus Construction: Establishing a parallel corpus between the target language and a high-resource language (e.g., English or Chinese) for cross-lingual knowledge transfer.
- Morphological Analysis: Developing language-specific rules for morphological decomposition and syntactic transformation, potentially leveraging existing linguistic descriptions or universal dependency frameworks [41].
- Graph Construction Rules: Adapting co-occurrence, dependency, and similarity graph construction to accommodate the target language’s unique grammatical structures (e.g., case systems, agreement patterns, word order variations).
- Pre-trained Model Selection: Identifying or fine-tuning appropriate pre-trained models (multilingual BERT, XLM-R, or language-specific models if available) for initial feature extraction [42].
6.3. Limitations
- Dataset Scale and Diversity: The current study employs a dataset of 2100 samples across seven emotion categories. While this represents a substantial effort given Mongolian’s low-resource status, the limited scale may restrict the model’s generalization to broader domains, writing styles, and dialectal variations. The dataset primarily consists of standard written Mongolian, potentially underrepresenting colloquial expressions, code-switching phenomena, and regional linguistic variations prevalent in social media and informal communication.
- Cross-Lingual Dependency on Parallel Corpora: Our graph construction methodology relies on Mongolian-Chinese parallel corpora for cross-lingual knowledge transfer. The quality and coverage of these parallel resources directly impact the accuracy of dependency transformation and semantic mapping. For languages lacking high-quality parallel corpora, this approach may introduce systematic biases or errors. Furthermore, the assumption of structural alignability between language pairs may not hold universally, particularly for typologically distant languages.
- Computational Complexity: While more efficient than LLMs, the hierarchical multi-graph attention mechanism introduces non-trivial computational overhead compared to single-graph or graph-free models. The independent processing of three graph types (CRG, DRG, SSG) followed by cross-graph attention fusion increases training time and memory requirements. For real-time applications or extremely resource-constrained environments, this may pose practical deployment challenges.
- Rule-Based Transformation Brittleness: The dependency transformation from Chinese SVO to Mongolian SOV structures relies on manually crafted linguistic rules. While validated by expert linguists, these rules may fail to capture all nuances of complex syntactic phenomena (e.g., non-canonical word orders, ellipsis, pro-drop constructions). Edge cases and exceptions could lead to malformed dependency graphs, propagating errors to downstream sentiment classification.
- Static Graph Structures: Our current implementation constructs graph structures (co-occurrence, dependency, similarity) statically before model training. This pre-defined approach cannot adapt to context-specific variations or dynamically adjust graph topologies based on learned patterns. Recent research on dynamic graph neural networks suggests that adaptive graph construction during training could enhance model flexibility and performance.
- Limited Multimodal Integration: The model focuses exclusively on textual sentiment analysis. However, real-world Mongolian social media and communication increasingly incorporate multimodal elements (images, emojis, audio). Extending MTSA-CG to handle multimodal sentiment would require substantial architectural modifications and aligned multimodal datasets, which remain scarce for Mongolian.
- Evaluation Metric Limitations: While accuracy, recall, and F1-score provide standard evaluation, they may not fully capture the model’s performance on minority or ambiguous sentiment classes. Metrics such as Matthews Correlation Coefficient (MCC), macro-averaged metrics, or confusion matrix analysis would provide more nuanced insights, particularly for imbalanced emotion distributions.
- Generalization to Other NLP Tasks: The current study demonstrates MTSA-CG’s effectiveness for sentiment analysis. However, its applicability to other Mongolian NLP tasks (named entity recognition, machine translation, question answering) remains unexplored. The graph construction methodology and hierarchical attention mechanism may require task-specific adaptations, limiting immediate transferability.
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Appendix A. Dependency Transformation Rules
Appendix A.1. Rule 1: Subject-Object-Verb Reordering
- Linguistic Rationale: Chinese follows SVO word order, while Mongolian follows SOV. The verb must be repositioned to sentence-final position, and dependency arc directions are reversed to reflect head-final structure.
- Transformation: Chinese: nsubj(V, S), dobj(V, O) → Mongolian: nsubj(V, S), dobj(V, O) with arc direction V ← O
- Example (a toy procedural sketch of this reordering follows this rule):
  – Chinese: "I like this book";
  – Structure: nsubj(like, I), dobj(like, this book);
  – Mongolian: "I this book like";
  – Structure: nsubj(like, I), dobj(like, book) with reversed arc like ← book.
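A toy procedural sketch of Rule 1 follows, assuming tokens plus (relation, head, dependent) arcs as input. It only repositions the verb and ignores the case suffixation handled by later rules; the data structures are illustrative assumptions.

```python
from typing import List, Tuple

# A dependency arc as (relation, head_index, dependent_index) over the token list.
Arc = Tuple[str, int, int]

def svo_to_sov(tokens: List[str], arcs: List[Arc]) -> List[str]:
    """Rule 1 sketch: move the main verb (the head of nsubj/dobj arcs) to
    sentence-final position, mimicking Mongolian SOV order. This toy version
    handles a single clause only."""
    verb_idx = next(h for rel, h, d in arcs if rel in ("nsubj", "dobj"))
    return [t for i, t in enumerate(tokens) if i != verb_idx] + [tokens[verb_idx]]

tokens = ["I", "like", "this", "book"]
arcs = [("nsubj", 1, 0), ("dobj", 1, 3)]
print(" ".join(svo_to_sov(tokens, arcs)))  # -> "I this book like"
```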
Appendix A.2. Rule 2: Case Marker Mapping
- Linguistic Rationale: Chinese uses prepositions and word order to mark grammatical relations, while Mongolian uses postpositional case suffixes. We map Chinese function words to Mongolian case markers.
- Transformation:
  – Chinese: "ba" (disposal) → Mongolian: accusative suffix;
  – Chinese: "zai…li" (in) → Mongolian: locative suffix;
  – Chinese: "cong" (from) → Mongolian: ablative suffix.
- Example:
  – Chinese: "He studies in the library";
  – Structure: nmod:prep(studies, library), case(library, in);
  – Mongolian: "He library-from studies";
  – Structure: obl(studies, library-from), case(library-from, ablative).
Appendix A.3. Rule 3: Attributive Modifier Inversion
- Linguistic Rationale: Chinese attributive modifiers precede the head noun (Adj + N), while Mongolian follows the same order but uses a genitive case for noun modifiers.
- Transformation: Chinese: amod(N, Adj) → Mongolian: amod(N, Adj) (order preserved, add genitive suffix if noun modifier)
- Example:
  – Chinese: "red flower";
  – Structure: amod(flower, red), case(red, de);
  – Mongolian: "red flower";
  – Structure: amod(flower, red).
Appendix A.4. Rule 4: Postposition Handling
- Linguistic Rationale: Chinese prepositions appear before their complement; Mongolian postpositions appear after. We invert the dependency direction and map to Mongolian postpositions.
- Transformation:
  – Chinese: "zài…shàng" (on) → Mongolian: "…deer-e" (on);
  – Chinese: "wèile" (for) → Mongolian: "…-yn tölöö" (for).
- Example:
  – Chinese: "The book is on the table";
  – Structure: case(table, on);
  – Mongolian: "book table-on is";
  – Structure: case(table, on).
Appendix A.5. Rule 5: Relative Clause Attachment
- Linguistic Rationale: Both languages use pre-nominal relative clauses, but Mongolian marks the relativized verb with a participial suffix (with distinct forms for past and future).
- Transformation: Chinese: acl:relcl(N, V_rel) → Mongolian: acl(N, V_rel) with participial marking on V_rel
- Example:
  – Chinese: "the movie I watched yesterday";
  – Structure: acl:relcl(movie, watched), mark(watched, de);
  – Mongolian: "my yesterday watched-participial movie";
  – Structure: acl(movie, watched-participial).
Appendix A.6. Rule 6: Coordination Transformation
- Linguistic Rationale: Chinese uses “he” (and) between conjuncts; Mongolian uses “bolon” or no overt marker with intonation.
- Transformation: Chinese: conj(X, Y), cc(X, he) → Mongolian: conj(X, Y), cc(X, bolon)
- Example:
  – Chinese: "apples and bananas";
  – Structure: conj(apples, bananas), cc(apples, and);
  – Mongolian: "apples and bananas";
  – Structure: conj(apples, bananas), cc(apples, and).
Appendix A.7. Rule 7: Verb Complement Clauses
- Linguistic Rationale: Chinese complement clauses follow the main verb; Mongolian converbal constructions precede the main verb.
- Transformation: Chinese: ccomp(V_main, V_comp) → Mongolian: ccomp(V_main, V_comp) with V_comp preceding V_main and taking converb suffix
- Example:
  – Chinese: "He says he is very busy";
  – Structure: ccomp(says, busy), nsubj(busy, he);
  – Mongolian: "He very busy-converb says";
  – Structure: ccomp(says, busy) with converb on busy.
Appendix A.8. Rule 8: Aspectual Auxiliary Transformation
- Linguistic Rationale: Chinese uses auxiliary verbs like “le” (perfective) and “zhe” (durative); Mongolian uses verb suffixes.
- Transformation:
  – Chinese: aux(V, le) → Mongolian: add perfective suffix to V;
  – Chinese: aux(V, zhe) → Mongolian: add durative suffix to V.
- Example:
  – Chinese: "I ate";
  – Structure: aux(ate, le);
  – Mongolian: "I ate-perfective";
  – Transformation: ate + perfective suffix → ate-perfective.
Appendix A.9. Rule 9: Adverbial Modifier Position
- Linguistic Rationale: Chinese adverbs typically precede verbs; Mongolian adverbs also precede verbs but may take case marking.
- Transformation: Chinese: advmod(V, Adv) → Mongolian: advmod(V, Adv) (position preserved)
- Example:
  – Chinese: "He runs quickly";
  – Structure: advmod(runs, quickly);
  – Mongolian: "He quickly runs";
  – Structure: advmod(runs, quickly).
Appendix A.10. Rule 10: Possessive Construction
- Linguistic Rationale: Chinese uses “de” between possessor and possessed; Mongolian uses genitive case suffix.
- Transformation: Chinese: nmod:poss(N_possessed, N_possessor), case(N_possessor, de) → Mongolian: nmod:poss(N_possessed, N_possessor) with genitive suffix on N_possessor
- Example:
  – Chinese: "my book";
  – Structure: nmod:poss(book, my), case(my, de);
  – Mongolian: "my-genitive book";
  – Structure: nmod:poss(book, my-genitive).
Appendix A.11. Rule 11: Modal Verb Transformation
- Linguistic Rationale: Chinese modal verbs (e.g., can, should) precede main verbs; Mongolian uses modal particles or auxiliary verbs following the main verb in converb form.
- Transformation: Chinese: aux(V_main, Modal) → Mongolian: V_main in converb form + modal suffix (e.g., the suffix expressing ability)
- Example:
  – Chinese: "He can come";
  – Structure: aux(come, can);
  – Mongolian: "He come-converb can";
  – Transformation: come → come-converb, can → can-suffix.
Appendix A.12. Rule 12: Comparative Construction
- Linguistic Rationale: Chinese uses “bi” (than) before the standard of comparison; Mongolian uses ablative case.
- Transformation: Chinese: case(N_standard, bi) → Mongolian: obl(Adj, N_standard) with ablative suffix on N_standard
- Example:
  – Chinese: "He is taller than me";
  – Structure: case(me, than), nsubj(tall, he);
  – Mongolian: "He me-ablative tall";
  – Structure: obl(tall, me-ablative).
Appendix A.13. Rule 13: Question Particle Handling
- Linguistic Rationale: Chinese question particles like “ma” appear sentence-finally; Mongolian uses sentence-final particles.
- Transformation: Chinese: discourse(V, ma) → Mongolian: discourse(V, question particle) (position maintained).
- Example:
  – Chinese: "Did you eat?";
  – Structure: discourse(eat, ma);
  – Mongolian: "You ate question particle?";
  – Structure: discourse(eat, question particle).
Appendix A.14. Rule 14: Passive Voice Transformation
- Linguistic Rationale: Chinese uses “bèi” to mark passive; Mongolian uses a suffix on the verb.
- Transformation: Chinese: nsubjpass(V, N), auxpass(V, bèi) → Mongolian: Add passive suffix to V, demote agent to oblique with ablative case
- Example:
  – Chinese: "The book was taken by him";
  – Structure: nsubjpass(taken, book), auxpass(taken, by);
  – Mongolian: "Book him-ablative taken-passive";
  – Structure: nsubjpass(taken-passive, book), obl(taken-passive, him-ablative).
Appendix A.15. Rule 15: Embedded Clause Reordering
- Linguistic Rationale: Chinese embedded clauses follow matrix verbs; Mongolian embedded clauses precede matrix verbs with nominalized verb forms.
- Transformation: Chinese: ccomp(V_matrix, V_embedded) → Mongolian: V_embedded with nominalizer precedes V_matrix
- Example:
  – Chinese: "I hope he comes";
  – Structure: ccomp(hope, comes);
  – Mongolian: "I he come-nominalizer hope";
  – Structure: ccomp(hope, come-nominalizer).
Appendix B. Precision Values for Table 2
| Emotion Category | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%) |
|---|---|---|---|---|
| Happy | 93.57 | 93.42 | 91.84 | 92.69 |
| Surprise | 90.24 | 90.13 | 89.76 | 90.00 |
| Anger | 86.72 | 86.38 | 85.11 | 85.90 |
| Disgust | 84.93 | 84.51 | 83.54 | 84.20 |
| Neutral | 82.45 | 82.18 | 80.87 | 81.65 |
| Sad | 79.38 | 78.92 | 77.26 | 78.30 |
| Fear | 78.16 | 77.65 | 75.49 | 76.80 |
| Macro Average | 85.92 | 84.74 | 83.98 | 84.79 |
Appendix C. Precision Values for Table 3
| Model | Accuracy (%) | Precision (%) | Recall (%) | F1-score (%) |
|---|---|---|---|---|
| CNN | 85.76 | 83.45 | 80.21 | 84.93 |
| BERT | 88.40 | 86.72 | 85.31 | 89.61 |
| ConvBERT | 90.57 | 89.28 | 89.16 | 91.38 |
| KIG | 89.73 | 88.51 | 93.09 | 91.66 |
| MTG-XLNET | 91.58 | 90.63 | 92.01 | 92.47 |
| MTSA-CG (proposed) | 92.80 | 92.05 | 93.43 | 92.71 |
Appendix D. Component Ablation with Precision
| Model | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%) |
|---|---|---|---|---|
| w/o GAT | 81.61 ± 1.3 | 80.89 ± 1.4 | 80.42 ± 1.5 | 82.95 ± 1.2 |
| w/o CRG | 85.27 ± 1.1 | 84.12 ± 1.2 | 83.04 ± 1.3 | 83.76 ± 1.0 |
| w/o DRG | 84.87 ± 1.2 | 83.76 ± 1.3 | 83.02 ± 1.4 | 83.49 ± 1.1 |
| w/o SSG | 89.58 ± 0.9 | 87.45 ± 0.9 | 86.35 ± 1.0 | 88.21 ± 0.8 |
| w/o Cross-Attention | 90.15 ± 0.8 | 88.54 ± 1.0 | 88.72 ± 1.1 | 89.44 ± 0.9 |
| MTSA-CG (Full) | 92.80 ± 0.7 | 92.05 ± 0.7 | 93.43 ± 0.8 | 92.71 ± 0.6 |
Appendix E. 95% Confidence Intervals
| Emotion | Accuracy (95% CI) | Precision (95% CI) | Recall (95% CI) | F1-Score (95% CI) |
|---|---|---|---|---|
| Happy | 93.57 (92.8, 94.3) | 93.42 (92.6, 94.2) | 91.84 (91.0, 92.7) | 92.69 (91.9, 93.5) |
| Surprise | 90.24 (89.3, 91.2) | 90.13 (89.2, 91.1) | 89.76 (88.8, 90.7) | 90.00 (89.1, 90.9) |
| Anger | 86.72 (85.6, 87.9) | 86.38 (85.2, 87.6) | 85.11 (84.0, 86.2) | 85.90 (84.8, 87.0) |
| Disgust | 84.93 (83.7, 86.2) | 84.51 (83.3, 85.7) | 83.54 (82.3, 84.8) | 84.20 (83.0, 85.4) |
| Neutral | 82.45 (81.1, 83.8) | 82.18 (80.8, 83.6) | 80.87 (79.5, 82.3) | 81.65 (80.3, 83.0) |
| Sad | 79.38 (77.9, 80.9) | 78.92 (77.4, 80.5) | 77.26 (75.7, 78.8) | 78.30 (76.8, 79.8) |
| Fear | 78.16 (76.5, 79.8) | 77.65 (76.0, 79.3) | 75.49 (73.8, 77.2) | 76.80 (75.1, 78.5) |
| Macro | 85.92 (85.1, 86.7) | 84.74 (83.9, 85.6) | 83.98 (83.1, 84.9) | 84.79 (84.0, 85.6) |
| Model | Accuracy (95% CI) | Precision (95% CI) | Recall (95% CI) | F1-Score (95% CI) |
|---|---|---|---|---|
| CNN | 85.76 (84.5, 87.0) | 83.45 (82.1, 84.8) | 80.21 (78.8, 81.6) | 84.93 (83.6, 86.3) |
| BERT | 88.40 (87.3, 89.5) | 86.72 (85.5, 88.0) | 85.31 (84.0, 86.6) | 89.61 (88.4, 90.8) |
| ConvBERT | 90.57 (89.6, 91.5) | 89.28 (88.2, 90.4) | 89.16 (88.1, 90.2) | 91.38 (90.4, 92.4) |
| KIG | 89.73 (88.7, 90.8) | 88.51 (87.4, 89.6) | 93.09 (92.2, 94.0) | 91.66 (90.7, 92.6) |
| MTG-XLNET | 91.58 (90.7, 92.5) | 90.63 (89.7, 91.6) | 92.01 (91.1, 93.0) | 92.47 (91.5, 93.4) |
| MTSA-CG | 92.80 (92.0, 93.6) | 92.05 (91.2, 92.9) | 93.43 (92.6, 94.3) | 92.71 (91.9, 93.5) |
Appendix F. Cross-Reference Resolution
- Section 4.6.3 (Data Leakage Control): Referenced in Section 4.1.2 line 354;
- Table 2: Referenced in Section 4.4 line 434;
- Table 3: Corrected reference in Section 4.5 line 455;
- Section 4.6.2 (Dependency Ablation): Correctly points to Section 4.1.2 line 215.
Appendix G. Reproducibility Checklist
- Preprocessing Scripts: Python code for tokenization, noise removal, and graph construction;
- Transformation Rules: Complete implementation of 15 rules with validation checks;
- Model Architecture: PyTorch 1.12.1 code for MTSA-CG with hyperparameters;
- Training Protocol: 5-fold CV with seed management and leakage-safe data splitting;
- Evaluation Metrics: Complete calculation scripts with bootstrap CI estimation (a minimal sketch follows this list);
- Sample Data: 100 anonymized instances per category (available upon request).
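The bootstrap CI item above can be sketched as a percentile bootstrap over test-set predictions. The function below is an illustrative assumption of how intervals such as those in Appendix E could be computed, not the released evaluation script.

```python
import numpy as np

def bootstrap_ci(y_true, y_pred, metric, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for any metric(y_true, y_pred)."""
    rng = np.random.default_rng(seed)
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y_true), len(y_true))  # resample with replacement
        stats.append(metric(y_true[idx], y_pred[idx]))
    lo, hi = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return metric(y_true, y_pred), (lo, hi)

# Toy example with accuracy as the metric.
acc = lambda t, p: float(np.mean(t == p))
point, (lo, hi) = bootstrap_ci(np.array([0, 1, 1, 0, 1] * 20),
                               np.array([0, 1, 0, 0, 1] * 20), acc)
print(f"accuracy={point:.3f}, 95% CI=({lo:.3f}, {hi:.3f})")
```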
References
- Hung, L.P.; Alias, S. Beyond sentiment analysis: A review of recent trends in text based sentiment analysis and emotion detection. J. Adv. Comput. Intell. Intell. Inform. 2023, 27, 84–95. [Google Scholar] [CrossRef]
- Rao, Y.; Lei, J.; Liu, W.; Li, Q.; Chen, M. Building emotional dictionary for sentiment analysis of online news. World Wide Web 2014, 17, 723–742. [Google Scholar] [CrossRef]
- Gao, Y.; Su, P.; Zhao, H.; Qiu, M.; Liu, M. Research on sentiment dictionary based on sentiment analysis in news domain. In Proceedings of the 2021 7th IEEE Intl Conference on Big Data Security on Cloud (BigDataSecurity), IEEE Intl Conference on High Performance and Smart Computing (HPSC) and IEEE Intl Conference on Intelligent Data and Security (IDS), Virtual, 15–17 May 2021; pp. 117–122. [Google Scholar]
- Yeow, B.Z.; Chua, H.N. A depression diagnostic system using lexicon-based text sentiment analysis. Int. J. Percept. Cogn. Comput. 2022, 8, 29–39. [Google Scholar]
- Prusa, J.; Khoshgoftaar, T.M.; Dittman, D.J. Using ensemble learners to improve classifier performance on tweet sentiment data. In Proceedings of the 2015 IEEE International Conference on Information Reuse and Integration, San Francisco, CA, USA, 13–15 August 2015; pp. 252–257. [Google Scholar]
- Yang, S.; Chen, F. Research on multi-level classification of microblog emotion based on SVM multi-feature fusion. Data Anal. Knowl. Discov. 2017, 1, 73–79. [Google Scholar]
- Styawati, S.; Nurkholis, A.; Aldino, A.A.; Samsugi, S.; Suryati, E.; Cahyono, R.P. Sentiment analysis on online transportation reviews using Word2Vec text embedding model feature extraction and support vector machine (SVM) algorithm. In Proceedings of the 2021 International Seminar on Machine Learning, Optimization, and Data Science (ISMODE), Jakarta, Indonesia, 29–30 January 2022; pp. 163–167. [Google Scholar]
- Mustakim, N.; Das, A.; Sharif, O.; Hoque, M.M. DeepSen: A Deep Learning-based Framework for Sentiment Analysis from Multi-Domain Heterogeneous Data. In Proceedings of the 2022 25th International Conference on Computer and Information Technology (ICCIT), Cox’s Bazar, Bangladesh, 17–19 December 2022; pp. 785–790. [Google Scholar]
- Shang, W.; Chai, J.; Cao, J.; Lei, X.; Zhu, H.; Fan, Y.; Ding, W. Aspect-level sentiment analysis based on aspect-sentence graph convolution network. Inf. Fusion 2024, 104, 102143. [Google Scholar] [CrossRef]
- Khan, L.; Qazi, A.; Chang, H.-T.; Alhajlah, M.; Mahmood, A. Empowering Urdu sentiment analysis: An attention-based stacked CNN-Bi-LSTM DNN with multilingual BERT. Complex Intell. Syst. 2025, 11, 10. [Google Scholar] [CrossRef]
- OpenAI. GPT-4 Technical Report. arXiv 2023, arXiv:2303.08774. [Google Scholar] [CrossRef]
- Anthropic. Activating AI Safety Level 3 Protections, Anthropic. 2025. Available online: https://www-cdn.anthropic.com/807c59454757214bfd37592d6e048079cd7a7728.pdf (accessed on 14 November 2025).
- Touvron, H.; Martin, L.; Stone, K.; Albert, P.; Almahairi, A.; Babaei, Y.; Bashlykov, N.; Batra, S.; Bhargava, P.; Bhosale, S.; et al. LLaMA 2: Open Foundation and Fine-Tuned Chat Models. arXiv 2023, arXiv:2307.09288. [Google Scholar] [CrossRef]
- Liu, J.; Fu, B. Responsible Multilingual Large Language Models: A Survey of Development, Applications, and Societal Impact. arXiv 2024, arXiv:2410.17532. [Google Scholar] [CrossRef]
- Xu, Y.; Hu, L.; Zhao, J.; Qiu, Z.; Xu, K.; Ye, Y.; Gu, H. A Survey on Multilingual Large Language Models: Corpora, Alignment, and Bias. arXiv 2024, arXiv:2404.00929. [Google Scholar] [CrossRef]
- Joshi, P.; Santy, S.; Budhiraja, A.; Bali, K.; Choudhury, M. The State and Fate of Linguistic Diversity and Inclusion in the NLP World. In Proceedings of the ACL 2020, Online, 5–10 July 2020; pp. 6282–6293. [Google Scholar]
- Zhang, Y.; Qi, P.; Manning, C.D. Graph Convolution over Pruned Dependency Trees Improves Relation Extraction. In Proceedings of the EMNLP 2018, Brussels, Belgium, 31 October–4 November 2018; pp. 2205–2215. [Google Scholar]
- Liu, H.; Wu, Y.; Yang, Y. Analogical Inference for Multi-relational Embeddings. In Proceedings of the ICML 2017, Sydney, NSW, Australia, 6–11 August 2017; pp. 2168–2178. [Google Scholar]
- Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Liò, P.; Bengio, Y. Graph Attention Networks. In Proceedings of the ICLR 2018, Vancouver, BC, Canada, 30 April–3 May 2018. [Google Scholar]
- Yao, L.; Mao, C.; Luo, Y. Graph Convolutional Networks for Text Classification. In Proceedings of the AAAI 2019, Honolulu, HI, USA, 27 January–1 February 2019; Volume 33, pp. 7370–7377. [Google Scholar]
- Dos Santos, C.; Gatti, M. Deep convolutional neural networks for sentiment analysis of short texts. In Proceedings of the COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, Dublin, Ireland, 23–29 August 2014; pp. 69–78. [Google Scholar]
- Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. Bert: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA, 2–7 June 2019; Volume 1 (Long and Short Papers), pp. 4171–4186. [Google Scholar]
- Jiang, Z.-H.; Yu, W.; Zhou, D.; Chen, Y.; Feng, J.; Yan, S. ConvBERT: Improving BERT with Span-based Dynamic Convolution. Adv. Neural Inf. Process. Syst. 2020, 33, 12837–12848. [Google Scholar]
- Zhao, Y.; Mamat, M.; Aysa, A.; Ubul, K. Knowledge-fusion-based iterative graph structure learning framework for implicit sentiment identification. Sensors 2023, 23, 6257. [Google Scholar] [CrossRef] [PubMed]
- Yang, Y.; He, R.-F. Multi-modal Sentiment Analysis of Mongolian Language based on Pre-trained Models and High-resolution Networks. In Proceedings of the 2024 International Conference on Asian Language Processing (IALP), Hohhot, China, 4–6 August 2024; pp. 291–296. [Google Scholar]
- Brown, T.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; et al. Language Models are Few-Shot Learners. Adv. Neural Inf. Process. Syst. 2020, 33, 1877–1901. [Google Scholar]
- Chowdhery, A.; Narang, S.; Devlin, J.; Bosma, M.; Mishra, G.; Roberts, A.; Barham, P.; Chung, H.W.; Sutton, C.; Gehrmann, S.; et al. PaLM: Scaling Language Modeling with Pathways. arXiv 2022, arXiv:2204.02311. [Google Scholar] [CrossRef]
- Wei, J.; Tay, Y.; Bommasani, R.; Raffel, C.; Zoph, B.; Borgeaud, S.; Yogatama, D.; Bosma, M.; Zhou, D.; Metzler, D.; et al. Emergent Abilities of Large Language Models. arXiv 2022, arXiv:2206.07682. [Google Scholar] [CrossRef]
- Hedderich, M.A.; Lange, L.; Adel, H.; Strötgen, J.; Klakow, D. A Survey on Recent Approaches for Natural Language Processing in Low-Resource Scenarios. In Proceedings of the NAACL 2021, Online, 6–11 June 2021; pp. 2545–2568. [Google Scholar]
- Strubell, E.; Ganesh, A.; McCallum, A. Energy and Policy Considerations for Deep Learning in NLP. In Proceedings of the ACL 2019, Florence, Italy, 28 July–2 August 2019; pp. 3645–3650. [Google Scholar]
- Danilevsky, M.; Qian, K.; Aharonov, R.; Katsis, Y.; Kawas, B.; Sen, P. A Survey of the State of Explainable AI for Natural Language Processing. In Proceedings of the AACL 2020, Online, 5–10 July 2020; pp. 447–459. [Google Scholar]
- Hinton, G.; Vinyals, O.; Dean, J. Distilling the Knowledge in a Neural Network. arXiv 2015, arXiv:1503.02531. [Google Scholar] [CrossRef]
- Sanh, V.; Debut, L.; Chaumond, J.; Wolf, T. DistilBERT, a distilled version of BERT: Smaller, faster, cheaper and lighter. arXiv 2019, arXiv:1910.01108. [Google Scholar]
- Caruana, R. Multitask Learning. Mach. Learn. 1997, 28, 41–75. [Google Scholar] [CrossRef]
- Liu, P.; Qiu, X.; Huang, X. Recurrent Neural Network for Text Classification with Multi-Task Learning. In Proceedings of the IJCAI 2016, New York, NY, USA, 9–15 July 2016; pp. 2873–2879. [Google Scholar]
- Tsarfaty, R.; Nivre, J.; Andersson, E. Cross-Framework Evaluation for Statistical Parsing. In Proceedings of the EACL 2012, Avignon, France, 23–27 April 2012; pp. 44–54. [Google Scholar]
- Çöltekin, Ç. A Freely Available Morphological Analyzer for Turkish. In Proceedings of the LREC 2010, Valletta, Malta, 17–23 May 2010; pp. 820–827. [Google Scholar]
- Nivre, J.; de Marneffe, M.-C.; Ginter, F.; Goldberg, Y.; Hajič, J.; Manning, C.D.; McDonald, R.; Petrov, S.; Pyysalo, S.; Silveira, N.; et al. Universal Dependencies v1: A Multilingual Treebank Collection. In Proceedings of the LREC 2016, Portorož, Slovenia, 23–28 May 2016; pp. 1659–1666. [Google Scholar]
- Kann, K.; Cho, K.; Bowman, S.R. Towards Realistic Practices In Low-Resource Natural Language Processing: The Development Set. In Proceedings of the EMNLP 2019, Hong Kong, China, 3–7 November 2019; pp. 3342–3349. [Google Scholar]
- Ruder, S.; Vulić, I.; Søgaard, A. A Survey of Cross-lingual Word Embedding Models. J. Artif. Intell. Res. 2019, 65, 569–631. [Google Scholar] [CrossRef]
- Zeman, D.; Hajič, J.; Popel, M.; Potthast, M.; Straka, M.; Ginter, F.; Nivre, J.; Petrov, S. CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies. In Proceedings of the CoNLL 2018, Brussels, Belgium, 31 October–1 November 2018; pp. 1–21. [Google Scholar]
- Conneau, A.; Khandelwal, K.; Goyal, N.; Chaudhary, V.; Wenzek, G.; Guzmán, F.; Grave, E.; Ott, M.; Zettlemoyer, L.; Stoyanov, V. Unsupervised Cross-lingual Representation Learning at Scale. In Proceedings of the ACL 2020, Online, 5–10 July 2020; pp. 8440–8451. [Google Scholar]
- Artetxe, M.; Labaka, G.; Agirre, E. Learning principled bilingual mappings of word embeddings while preserving monolingual invariance. In Proceedings of the EMNLP 2016, Austin, TX, USA, 1–4 November 2016; pp. 2289–2294. [Google Scholar]
- Lample, G.; Conneau, A.; Ranzato, M.; Denoyer, L.; Jégou, H. Word translation without parallel data. In Proceedings of the ICLR 2018, Vancouver, BC, Canada, 30 April–3 May 2018. [Google Scholar]

| Category | Instances | Avg. Length (Words) | Token Count | Unique Tokens |
|---|---|---|---|---|
| Happy | 300 | 18.3 | 5490 | 1247 |
| Surprise | 300 | 16.7 | 5010 | 1189 |
| Anger | 300 | 21.5 | 6450 | 1356 |
| Disgust | 300 | 19.8 | 5940 | 1298 |
| Neutral | 300 | 22.1 | 6630 | 1412 |
| Sad | 300 | 20.4 | 6120 | 1323 |
| Fear | 300 | 17.9 | 5370 | 1205 |
| Total | 2100 | 19.5 | 41,010 | 3847 |
| Emotion Category | Accuracy (%) | Recall (%) | F1-Score (%) |
|---|---|---|---|
| Happy | 93.57 | 91.84 | 92.69 |
| Surprise | 90.24 | 89.76 | 90.00 |
| Anger | 86.72 | 85.11 | 85.90 |
| Disgust | 84.93 | 83.54 | 84.20 |
| Neutral | 82.45 | 80.87 | 81.65 |
| Sad | 79.38 | 77.26 | 78.30 |
| Fear | 78.16 | 75.49 | 76.80 |
| Average | 85.92 | 83.98 | 84.79 |
| Model | Accuracy (%) | Recall (%) | F1-Score (%) |
|---|---|---|---|
| CNN | 85.76 | 80.21 | 84.93 |
| BERT | 88.40 | 85.31 | 89.61 |
| ConvBERT | 90.57 | 89.16 | 91.38 |
| KIG | 89.73 | 93.09 | 91.66 |
| MTG-XLNET | 91.58 | 92.01 | 92.47 |
| MTSA-CG (proposed) | 92.80 | 93.43 | 92.71 |
| Model | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%) |
|---|---|---|---|---|
| w/o GAT | 81.61 ± 1.3 | 81.18 ± 1.4 | 80.42 ± 1.5 | 82.95 ± 1.2 |
| w/o CRG | 85.27 ± 1.1 | 84.51 ± 1.2 | 83.04 ± 1.3 | 83.76 ± 1.0 |
| w/o DRG | 84.87 ± 1.2 | 84.20 ± 1.3 | 83.02 ± 1.4 | 83.49 ± 1.1 |
| w/o SSG | 89.58 ± 0.9 | 87.93 ± 0.9 | 86.35 ± 1.0 | 88.21 ± 0.8 |
| w/o Cross-Attention | 90.15 ± 0.8 | 88.96 ± 1.0 | 88.72 ± 1.1 | 89.44 ± 0.9 |
| MTSA-CG (Full) | 92.80 ± 0.7 | 92.05 ± 0.7 | 93.43 ± 0.8 | 92.71 ± 0.6 |
| Method | F1-Score (%) | LAS (%) | # Sentences a |
|---|---|---|---|
| Chinese Only (no transform) | 75.32 ± 2.1 | — | 105 (5%) |
| Mongolian Only | 88.14 ± 1.0 | 81.3 | 2100 (100%) |
| Transformed (Proposed) | 92.71 ± 0.6 | 89.7 b | 2100 (100%) |
| Hybrid Oracle | 94.23 ± 0.5 | 92.4 b | 2100 (100%) |
| Setting | Accuracy (%) | F1-Score (%) | ΔF1 |
|---|---|---|---|
| Global Statistics (leaked) | 94.17 ± 0.6 | 94.52 ± 0.5 | +1.81% |
| Leakage-Safe (Proposed) | 92.80 ± 0.7 | 92.71 ± 0.6 | baseline |
| Configuration | F1-Score (%) | Graph Density | Avg. Degree |
|---|---|---|---|
| Surface Forms (default) | 92.71 ± 0.6 | 0.087 | 8.3 |
| Lemmatized | 91.95 ± 0.7 | 0.092 | 8.9 |
| No Stopword Masking | 91.48 ± 0.8 | 0.114 | 11.2 |
| | 91.82 ± 0.7 | 0.135 | 13.1 |
| | 92.34 ± 0.6 | 0.061 | 5.8 |
| | 91.76 ± 0.8 | 0.043 | 4.1 |
| True/Pred. | Happy | Surprise | Anger | Disgust | Neutral | Sad | Fear |
|---|---|---|---|---|---|---|---|
| Happy | 93.6 | 2.1 | 0.3 | 0.7 | 2.4 | 0.5 | 0.4 |
| Surprise | 3.2 | 89.8 | 1.1 | 0.6 | 3.5 | 0.9 | 0.9 |
| Anger | 0.4 | 1.3 | 85.1 | 4.8 | 4.2 | 2.7 | 1.5 |
| Disgust | 0.6 | 0.8 | 5.2 | 83.5 | 6.1 | 2.3 | 1.5 |
| Neutral | 2.8 | 3.1 | 4.5 | 5.9 | 80.9 | 1.6 | 1.2 |
| Sad | 0.7 | 1.2 | 3.1 | 2.5 | 4.8 | 77.3 | 10.4 |
| Fear | 0.5 | 1.5 | 2.3 | 2.1 | 4.6 | 13.5 | 75.5 |
| Factor | Sad Errors (n = 35) | Fear Errors (n = 38) | Neutral Errors (n = 42) |
|---|---|---|---|
| Case Marker Ambiguity | 8 (22.9%) | 6 (15.8%) | 5 (11.9%) |
| Verb Argument Mismatch | 12 (34.3%) | 15 (39.5%) | 9 (21.4%) |
| Negation Scope Error | 5 (14.3%) | 8 (21.1%) | 3 (7.1%) |
| Long Sentence (>30 words) | 6 (17.1%) | 5 (13.2%) | 12 (28.6%) |
| Clause Embedding | 9 (25.7%) | 11 (28.9%) | 8 (19.0%) |
| Modal Particle Subtlety | 7 (20.0%) | 9 (23.7%) | 6 (14.3%) |
| Code-Switching (MN–CN) | 2 (5.7%) | 1 (2.6%) | 8 (19.0%) |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).