1. Introduction
As a foundational component of natural language processing (NLP), event extraction focuses on identifying events and their key participants in unstructured text, supporting higher-level tasks such as semantic retrieval, knowledge graphs, and automated reasoning [1,2]. Driven by advances in neural architectures and the accumulation of multilingual annotated corpora, event extraction has made significant progress in well-resourced languages, particularly English and Chinese [3,4].
The event extraction task usually includes two core subtasks: event trigger detection (ETD) and event argument extraction (EAE) [5,6]. Event argument extraction focuses on identifying the participating entities and their semantic roles in an event, and is a key part of modeling the complete event structure [7]. Compared with event trigger detection, argument extraction demands deeper syntactic analysis and semantic comprehension, and is especially challenging in scenarios with scarce language resources and complex text structures [8]. Therefore, this paper focuses on the event argument extraction task in the Tibetan judicial context.
The core difficulties of extracting Tibetan judicial events lie in the constraints of low-resource languages, the particularities of judicial texts, and the limitations of existing methods [9,10,11]. Tibetan is a typical low-resource language that lacks large-scale, high-quality annotated corpora, which seriously restricts the training and generalization capabilities of deep learning models [9,10]. Tibetan judicial texts are typically characterized by complex structures, dense legal terminology, and lengthy sentence patterns, which pose greater challenges for the identification and modeling of event arguments. In addition, judicial texts often involve the interweaving of multiple roles and events, further increasing the complexity of analysis. Although existing sequence labeling methods (e.g., BiLSTM-CRF) have achieved promising results in high-resource languages, they perform poorly in the Tibetan judicial domain, mainly due to their inability to capture long-distance semantic dependencies and their limitations in handling sparse arguments and cross-domain transfer [11,12].
In response to the above challenges, researchers have proposed a variety of solutions. Traditional event argument extraction methods mainly adopt classification or sequence labeling paradigms, transforming argument extraction into role classification or label assignment problems [13]. Although such methods perform well in scenarios with simple structures and sufficient data, their performance often drops significantly for low-resource languages due to insufficient training data. In recent years, researchers have begun to recast event extraction into new paradigms such as generative extraction and question-answering extraction, alleviating the impact of data scarcity through modeling transformations [14,15,16,17]. Among them, methods based on machine reading comprehension (MRC) reduce task complexity and demonstrate good transfer and knowledge fusion capabilities by virtue of their question-guiding mechanism and semantic modeling advantages [15].
Machine reading comprehension transforms the event argument extraction task into a series of question–answer pairs [18,19,20]. Through carefully designed question templates, the model is guided to focus on specific events and their arguments, effectively decomposing the originally complex extraction task [18]. Question templates contain prior knowledge of event types and argument roles, providing the model with additional semantic information and helping to improve the model's capability to understand the relationship between events and arguments [19]. The MRC paradigm has strong cross-domain transfer capabilities, enabling the model to use general-domain data for transfer learning and thereby achieving effective application in low-resource scenarios [20]. These characteristics make MRC-based approaches particularly suitable for event argument extraction in resource-constrained languages like Tibetan.
Building upon the above analysis, this paper introduces an MRC-driven approach for extracting event arguments from Tibetan judicial texts and builds a complete technical system covering question template design, extraction model construction, and training strategy optimization. Through the reconstruction of task modeling and knowledge fusion, it offers a more efficient and transferable strategy for event argument extraction in languages with limited resources, providing a reference framework for other low-resource or domain-specific applications.
The main contributions of this study are delineated as follows:
- (1) Designing high-quality question templates that integrate the semantics of event type and associated argument roles for the Tibetan judicial field, combining Tibetan interrogative words with event contextual information, effectively alleviating the problem of semantic ambiguity;
- (2) Proposing an MRC_TibAE framework based on CINO (Chinese Minority Pre-trained Language Model), introducing multi-head self-attention to strengthen semantic interaction and modeling capabilities between event sentences and questions;
- (3) Developing a two-stage training strategy for low-resource languages, in which the model is first trained on a general Tibetan MRC dataset to acquire general language understanding, and then fine-tuned on domain-specific judicial corpora to improve transferability and extraction performance.
3. Methodology
This paper proposes a framework comprising three interrelated components, forming a comprehensive event argument extraction pipeline for Tibetan judicial texts. As Figure 1 shows, the three parts are task formalization, the MRC_TibAE architecture, and an optimization strategy for low-resource languages.
The task formalization module transforms the traditional event argument extraction task into a structured question-answering problem. Through carefully designed question templates, it explicitly embeds event type and argument information, guiding the model to focus on extracting key event elements. The MRC_TibAE architecture concatenates the question with the event sentence and employs multi-head self-attention to model complex dependencies between them; the core features are then processed by a feedforward network with residual connections and layer normalization; finally, the start and end positions of the target span in the text are predicted through a Softmax classification layer. The optimization strategy for low-resource languages addresses the challenge of data scarcity: the model is first trained on the TibetanQA dataset for general-purpose Tibetan machine reading comprehension, acquiring basic Tibetan language comprehension capabilities, and is then fine-tuned on a domain-specific judicial dataset. This two-stage approach effectively leverages broader general-purpose data to establish foundational Tibetan language comprehension that is subsequently adapted to the specialized judicial domain.
3.1. Task Formalization
3.1.1. Question Template Design
This study designs a question representation method that integrates event semantic knowledge. By explicitly embedding event type and argument information into query templates, a set of questions with clear semantic orientation is constructed, effectively eliminating ambiguity and enhancing the model’s ability to locate answers.
Table 1 systematically presents the event argument classification system constructed in this study. This system covers 51 event elements across five categories—person, entity, amount, location, and time—providing structured extraction targets for the machine reading comprehension framework.
To precisely guide the model in locating event arguments, this study designs structured question templates that explicitly instruct the model to search for specific arguments under a given event type. This structured question template design ensures that each question clearly includes information about the event type and argument role, effectively eliminating semantic ambiguity and improving the model’s ability to locate specific event arguments. To standardize the design of question templates, the templates are formalized as shown in Equation (1).
$$Q = \mathrm{Template}(e, r, w) \quad (1)$$

Here, $Q$ denotes the question template, $e$ represents the event type, $r$ represents the argument role, $w$ refers to the type of interrogative word, and $\mathrm{Template}(\cdot)$ represents the question template generation function.
The question templates shown in Figure 2 correspond one-to-one with the 51 types of event arguments listed in Table 1, forming a complete argument extraction question set that provides the model with clear extraction targets and search directions. This structured system of question templates enables the model to query event elements in a standardized manner, ensuring the consistency and effectiveness of the extraction process. Each question template precisely embeds information about the event type and argument role, eliminating semantic ambiguity and enhancing the model's ability to locate key event elements in judicial texts.
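To make the template mechanism concrete, the following is a minimal sketch of the generation function in Equation (1). The dictionary `TEMPLATES`, the helper `build_question`, and the English stand-in patterns are illustrative assumptions, not the paper's released templates; the actual templates combine Tibetan interrogative words with event contextual information.

```python
# Hypothetical question-template table: (event type, argument role) -> pattern.
# English stand-ins are used for readability; the real templates are Tibetan.
TEMPLATES = {
    ("Theft", "perpetrator"): "In the {event} event, who committed the theft?",
    ("Theft", "stolen item"): "In the {event} event, what items were stolen?",
    ("Drunk Driving", "time"): "In the {event} event, when did the drunk driving occur?",
}

def build_question(event_type: str, argument_role: str) -> str:
    """Instantiate Q = Template(e, r, w) for one event type / argument role pair."""
    pattern = TEMPLATES[(event_type, argument_role)]
    return pattern.format(event=event_type)

print(build_question("Theft", "stolen item"))
# -> "In the Theft event, what items were stolen?"
```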
3.1.2. Task Reformulation
Traditional event argument extraction methods mainly adopt classification or sequence labeling paradigms. However, such methods are highly data-dependent and struggle to generalize in limited-resource settings. To address this limitation, this study reformulates event argument extraction as a question-answering task. Each event argument is represented by a corresponding question. Given the question $Q$ and the event sentence $S$, the model predicts the boundaries of the target argument span by learning the mapping function $f$. Formally, the model seeks to capture the following mapping:

$$f(Q, S) \rightarrow (i_{start}, i_{end}) \quad (2)$$

Here, $f$ denotes the joint modeling function of the question $Q$ and event sentence $S$, and $(i_{start}, i_{end})$ represents the start and end indices of the target argument span in the original text.
This modeling approach transforms the structured event argument extraction task into an interpretable and easily transferable question-answering matching problem, providing a strong basis for subsequent model architecture design and training optimization.
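As a concrete illustration of this reformulation, the sketch below converts one gold argument annotation into a QA-style training instance. The class `MRCInstance`, the helper `to_mrc_instance`, and the simple English question pattern are hypothetical, shown only to make the mapping $f(Q, S) \rightarrow (i_{start}, i_{end})$ tangible.

```python
from dataclasses import dataclass

@dataclass
class MRCInstance:
    question: str  # Q, generated from the event type and argument role
    sentence: str  # S, the event sentence
    start: int     # i_start: index where the answer span begins in S
    end: int       # i_end: index just past the end of the answer span in S

def to_mrc_instance(sentence: str, event_type: str, role: str, answer: str) -> MRCInstance:
    question = f"In the {event_type} event, what is the {role}?"  # placeholder template
    start = sentence.index(answer)  # recover the gold span position in the text
    return MRCInstance(question, sentence, start, start + len(answer))

inst = to_mrc_instance("Tashi stole three yaks in Lhasa.", "Theft", "stolen item", "three yaks")
# -> one (question, sentence, span) instance whose answer covers "three yaks"
```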
3.2. MRC_TibAE
Based on the MRC paradigm following task reformulation, this paper proposes the MRC_TibAE model (Machine Reading Comprehension model for Tibetan judicial event Argument Extraction). The model's overall architecture is depicted in Figure 3 and includes three key components: the encoding layer, the interaction layer, and the prediction layer. The model employs the minority-language pre-trained model CINO as the basic encoder. Using self-attention, the model encodes the semantic interplay between the question and the event sentence. Finally, a position prediction module locates and extracts the event arguments.
3.2.1. Encoding Layer
The encoding layer is responsible for converting the Tibetan input sequence into a dense vector representation. The input consists of two parts: the event sentence $S = \{s_1, s_2, \ldots, s_n\}$ ($n$ refers to the length of $S$) and the question $Q = \{q_1, q_2, \ldots, q_m\}$ ($m$ refers to the length of $Q$). For joint modeling, the two parts are concatenated into a unified input sequence $X$ in the form of:

$$X = [\mathrm{CLS}] \oplus Q \oplus [\mathrm{SEP}] \oplus S \oplus [\mathrm{SEP}] \quad (3)$$

where $[\mathrm{CLS}]$ and $[\mathrm{SEP}]$ indicate the sequence start token and separator token, respectively, and $\oplus$ represents the sequence concatenation operation. The length of the concatenated input is $L = m + n + 3$.
The CINO encoder processes the input through three types of embedding, which are summed to form the input representation:

$$E = E_{tok} + E_{pos} + E_{seg} \quad (4)$$

Here, $E_{tok}$, $E_{pos}$, and $E_{seg} \in \mathbb{R}^{L \times d}$ represent the token embedding, positional embedding, and segment embedding, respectively, where $L$ refers to the input sequence length and $d$ is the embedding dimension.
The CINO model applies multi-layer stacked encoding to the input and produces high-dimensional contextual representations for subsequent processing.
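The following is a hedged sketch of this input construction using the Hugging Face Transformers API, consistent with the environment reported in Section 4.1. The checkpoint name "hfl/cino-base-v2" is one published CINO variant and is an assumption here, since the paper does not name the exact model size.

```python
from transformers import AutoModel, AutoTokenizer
import torch

tokenizer = AutoTokenizer.from_pretrained("hfl/cino-base-v2")
encoder = AutoModel.from_pretrained("hfl/cino-base-v2")

question = "..."   # Q: a Tibetan question generated from the template set
sentence = "..."   # S: the Tibetan event sentence

# The tokenizer builds X = [CLS] Q [SEP] S [SEP] using the model's own special
# tokens; the embedding layers of Equation (4) are applied inside the encoder.
inputs = tokenizer(question, sentence, max_length=400,
                   truncation="only_second", return_tensors="pt")
with torch.no_grad():
    H = encoder(**inputs).last_hidden_state   # contextual representations, shape (1, L, 768)
```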
3.2.2. Interaction Layer
Tibetan judicial texts are complex in structure and information-dense, so the model needs strong context modeling capabilities. To accurately capture the semantic association between questions and event sentences, this study adopts a multi-head self-attention mechanism to encode interactions between tokens, effectively directing attention toward question-relevant content. Specifically, given the input representation $H \in \mathbb{R}^{L \times d}$, the model utilizes $h$ parallel attention heads to capture multi-level semantic relationships from different perspectives. The computation of the $i$-th self-attention head, $\mathrm{head}_i$, is defined as shown in Equation (5):

$$\mathrm{head}_i = \mathrm{softmax}\!\left(\frac{(HW_i^Q)(HW_i^K)^{\top}}{\sqrt{d_k}}\right)(HW_i^V) \quad (5)$$

Here, $W_i^Q$, $W_i^K$, and $W_i^V \in \mathbb{R}^{d \times d_k}$ are learnable parameter matrices, and $\sqrt{d_k}$ is the scaling factor for the attention mechanism.
The outputs of all attention heads are concatenated and subsequently transformed via a linear layer to obtain the fused multi-head attention representation, as shown in Equation (6):

$$\mathrm{MultiHead}(H) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\,W^O \quad (6)$$

Here, $W^O \in \mathbb{R}^{hd_k \times d}$ is the output mapping matrix, $\mathrm{Concat}(\cdot)$ is the concatenation function, $\mathrm{head}_i$ denotes the output generated by the $i$-th attention head, and $h$ represents the number of attention heads.
Through the multi-head self-attention mechanism, the model can capture deep semantic associations within the interaction representations between the question and the event sentence. In this process, each word $s_i$ in the event sentence interacts with each word $q_j$ in the question, which strengthens the model's ability to comprehend the event sentence in a question-aware manner. After processing by multi-layer self-attention and feedforward networks, the final hidden representation from the CINO model is taken and recorded as $\tilde{H}$:

$$\tilde{H} = \mathrm{CINO}(X) \in \mathbb{R}^{L \times d_h} \quad (7)$$

Here, $\tilde{H}$ represents the final semantic interaction representation, $L$ refers to the input length, and $d_h$ is the size of the hidden representation.
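The sketch below re-implements the interaction of Equations (5) and (6) in PyTorch. It is an illustrative re-implementation, not the authors' code; the dimensions ($d = 768$, $h = 12$, $d_k = d/h$) are assumptions consistent with the CINO hidden size reported in Section 4.1.

```python
import math
import torch
import torch.nn as nn

class MultiHeadSelfAttention(nn.Module):
    def __init__(self, d: int = 768, h: int = 12):
        super().__init__()
        self.h, self.d_k = h, d // h
        self.W_q = nn.Linear(d, d)   # stacks W_i^Q for all heads
        self.W_k = nn.Linear(d, d)   # stacks W_i^K for all heads
        self.W_v = nn.Linear(d, d)   # stacks W_i^V for all heads
        self.W_o = nn.Linear(d, d)   # output mapping W^O

    def forward(self, H: torch.Tensor) -> torch.Tensor:   # H: (B, L, d)
        B, L, _ = H.shape
        split = lambda x: x.view(B, L, self.h, self.d_k).transpose(1, 2)
        Q, K, V = split(self.W_q(H)), split(self.W_k(H)), split(self.W_v(H))
        # Equation (5): scaled dot-product attention, one head per slice.
        scores = Q @ K.transpose(-2, -1) / math.sqrt(self.d_k)
        heads = scores.softmax(dim=-1) @ V                 # (B, h, L, d_k)
        # Equation (6): concatenate heads and project with W^O.
        concat = heads.transpose(1, 2).reshape(B, L, -1)
        return self.W_o(concat)

attn = MultiHeadSelfAttention()
out = attn(torch.randn(2, 400, 768))   # question-sentence interaction features
```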
3.2.3. Prediction Layer
After obtaining the deep semantic representation $\tilde{H}$, two independent linear classifiers are used to estimate the likelihood that each token corresponds to the beginning or end of the target argument span, respectively:

$$P_{start} = \mathrm{softmax}(\tilde{H} W_s) \quad (8)$$

where $P_{start}$ represents the likelihood assigned to each token being the starting boundary of the target argument span, and $W_s \in \mathbb{R}^{d_h \times 1}$ is a learnable parameter matrix.

$$P_{end} = \mathrm{softmax}(\tilde{H} W_e) \quad (9)$$

where $P_{end}$ denotes the likelihood assigned to each token being the ending boundary of the target argument span, and $W_e \in \mathbb{R}^{d_h \times 1}$ is a learnable parameter matrix.

After obtaining the position probability distributions, all possible start and end position pairs $(i, j)$ are enumerated to construct the candidate span set $C$:

$$C = \{(i, j) \mid 1 \le i \le L,\ 1 \le j \le L\} \quad (10)$$
To ensure the rationality of the extraction results, the model applies a systematic constraint mechanism to screen the candidate spans. First, the span length must not exceed the preset maximum value. Second, the start position cannot be later than the end position (i.e., $i \le j$). Finally, the target span must fall completely within the valid range of the original text. For all candidate spans that satisfy the constraints, the joint probability score of each start and end position pair is calculated, as shown in Equation (11):

$$\mathrm{score}(i, j) = P_{start}(i) \times P_{end}(j) \quad (11)$$

where $P_{start}(i)$ denotes the likelihood of the $i$-th token being selected as the start of the target span, $P_{end}(j)$ denotes the likelihood of the $j$-th token being selected as the end of the target span, and $\mathrm{score}(i, j)$ denotes the joint probability score.

All joint probability scores are ranked from highest to lowest, the $N$ candidates with the highest scores are selected, and the span yielding the maximum probability score is ultimately retained, as illustrated in Equation (12). If the scores of all candidate spans are lower than the preset threshold, or no valid candidate spans survive the screening process, the model returns an empty result, indicating that no relevant event or argument is detected.

$$\hat{a} = \arg\max_{(i, j) \in C} \mathrm{score}(i, j) \quad (12)$$

Here, $\hat{a}$ denotes the span predicted by the model, which is mapped back to the original text space.
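A hedged sketch of this prediction procedure (Equations (8) through (12)) follows. The 64-token answer cap matches the setting later reported in Table 3, while `top_n` and the score `threshold` are assumed hyperparameters; the paper mentions but does not specify them.

```python
import torch

def extract_span(H_final: torch.Tensor, W_s: torch.Tensor, W_e: torch.Tensor,
                 max_answer_len: int = 64, top_n: int = 20, threshold: float = 0.0):
    """H_final: (L, d_h) final representation; W_s, W_e: (d_h,) classifier weights."""
    p_start = torch.softmax(H_final @ W_s, dim=0)   # Equation (8)
    p_end = torch.softmax(H_final @ W_e, dim=0)     # Equation (9)

    L = H_final.size(0)
    candidates = []                                 # Equation (10): enumerate (i, j)
    for i in range(L):
        for j in range(i, min(i + max_answer_len, L)):   # constraints: i <= j, bounded length
            score = (p_start[i] * p_end[j]).item()       # Equation (11): joint score
            candidates.append((score, i, j))

    candidates.sort(reverse=True)                   # rank scores, keep the N best
    best = candidates[:top_n]
    if not best or best[0][0] < threshold:          # no valid span -> empty result
        return None
    _, i, j = best[0]                               # Equation (12): argmax over candidates
    return i, j
```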
3.3. Optimization Strategies for Low-Resource Languages
3.3.1. Two-Stage Training Strategy
The complex linguistic structure of Tibetan judicial texts and the severe lack of annotated data make it difficult for models to directly learn effective representations from limited data. To improve the model’s adaptability in this domain, this paper proposes a two-stage strategy. This strategy leverages general domain knowledge to strengthen the model’s language understanding capabilities, then transfers this knowledge to the judicial domain to enhance the model’s ability to learn specialized terminology and event structure.
As shown in Figure 4, the first stage trains on the general Tibetan machine reading comprehension dataset TibetanQA, so that the model learns general Tibetan language comprehension and question-answer modeling capabilities and becomes familiar with the basic grammar, common vocabulary, and syntactic structure of Tibetan. The formal representation is as follows:

$$\theta_1 = \arg\min_{\theta}\ \mathcal{L}(\theta; D_{general}) \quad (13)$$

Here, the optimization starts from the initial model parameters $\theta_0$, $D_{general}$ represents the Tibetan machine reading comprehension dataset, $\mathcal{L}$ refers to the loss function, and $\theta_1$ indicates the parameters after the general-domain training is completed.
The second stage continues fine-tuning on the Tibetan judicial event dataset, adapting the model to the specific semantic features of Tibetan judicial texts, such as professional terminology, event structure, and syntactic expressions:

$$\theta_2 = \arg\min_{\theta}\ \mathcal{L}(\theta; D_{judicial}) \quad (14)$$

Here, the optimization starts from $\theta_1$, the parameters saved in the first stage; $\theta_2$ represents the model parameters after fine-tuning in the judicial field, and $D_{judicial}$ represents the Tibetan judicial event dataset. Through domain-specific fine-tuning on judicial data, the model acquires legal-domain linguistic features, which in turn enhances its performance on judicial event extraction.
Through a two-stage strategy, the model can be adapted both at the language level and the task level, thereby alleviating the low-resource language problem and improving the final extraction effect.
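A minimal sketch of the two-stage procedure (Equations (13) and (14)) is given below. The data loaders and the model interface (returning start/end logits) are assumed helpers rather than the authors' released code; the epoch count and learning rate follow the settings reported in Section 4.

```python
import torch
import torch.nn.functional as F
from torch.optim import AdamW

def train_stage(model, loader, epochs: int, lr: float = 2e-5):
    optimizer = AdamW(model.parameters(), lr=lr)
    for _ in range(epochs):
        for inputs, y_start, y_end in loader:
            start_logits, end_logits = model(**inputs)
            # Cross-entropy over start/end positions (Section 3.3.2).
            loss = F.cross_entropy(start_logits, y_start) \
                 + F.cross_entropy(end_logits, y_end)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

# Stage 1: general-domain MRC on TibetanQA -> theta_1 (Equation (13)).
# train_stage(model, tibetanqa_loader, epochs=15)
# torch.save(model.state_dict(), "theta_1.pt")

# Stage 2: resume from theta_1, fine-tune on judicial data -> theta_2 (Equation (14)).
# model.load_state_dict(torch.load("theta_1.pt"))
# train_stage(model, judicial_loader, epochs=15)
```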
3.3.2. Loss Function Design
The model employs cross-entropy loss to quantify the discrepancy between the predicted and true start/end positions of the argument span, consistent with the softmax position distributions of the prediction layer. In each training iteration, model parameters are optimized by minimizing the loss, aligning the predicted start and end positions more closely with the annotated ground truth. Specifically, the target loss comprises two parts, $\mathcal{L}_{start}$ and $\mathcal{L}_{end}$, representing the start and end prediction losses, respectively. The losses are calculated using the following expressions:

$$\mathcal{L}_{start} = -\log P_{start}(y_{start}) \quad (15)$$

$$\mathcal{L}_{end} = -\log P_{end}(y_{end}) \quad (16)$$

Here, $y_{start}$ denotes the true start position of the target argument span, $y_{end}$ denotes the true end position, and $P_{start}$ and $P_{end}$ denote the predicted probability distributions over the start and end positions, respectively.
During training, the AdamW optimizer is used to optimize the loss function, and parameter updates are performed by minimizing the total loss. The overall training loss integrates the losses for both the beginning and ending positions of the target argument span:

$$\mathcal{L} = \mathcal{L}_{start} + \mathcal{L}_{end} \quad (17)$$

Here, $\mathcal{L}$ denotes the total loss of the model, and $\mathcal{L}_{start}$ and $\mathcal{L}_{end}$ represent the cross-entropy losses for the start and end positions of the target argument span, respectively.
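Expressed in PyTorch, the full objective of Equations (15) through (17) reduces to the standard extractive-QA formulation; the sketch below is illustrative, with batched logits assumed.

```python
import torch
import torch.nn.functional as F

def span_loss(start_logits: torch.Tensor, end_logits: torch.Tensor,
              y_start: torch.Tensor, y_end: torch.Tensor) -> torch.Tensor:
    """start_logits/end_logits: (B, L) position scores; y_start/y_end: (B,) gold indices."""
    loss_start = F.cross_entropy(start_logits, y_start)  # Equation (15)
    loss_end = F.cross_entropy(end_logits, y_end)        # Equation (16)
    return loss_start + loss_end                         # Equation (17): total loss
```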
4. Experiments
4.1. Experimental Environment Configuration
The experiments use the CINO minority-language pre-trained model [39], which has a hidden-layer vector dimension of 768. All experiments are performed on a workstation with two Tesla P40 GPUs, each offering 24 GB of memory. The software environment includes Python 3.8, PyTorch 2.4.1 (CUDA 11.8), and Transformers 4.46.0. This configuration is well suited to high-dimensional vector processing tasks such as fine-tuning and inference with pre-trained language models, and it enables efficient handling of large-scale text data and long-sequence inputs, meeting the computational requirements of Tibetan judicial event extraction.
4.2. Datasets
The experiments use two Tibetan datasets: an MRC dataset in the general domain, TibetanQA [27,28], and an event extraction dataset in the judicial domain [12].
- (1) TibetanQA: This dataset was released by Sun Yuan et al. It contains 14,054 Tibetan samples covering 12 topics such as nature, culture, and education, and supports question-answer types such as word matching, synonym substitution, and multi-sentence reasoning. The raw data are presented as triples of article, question, and answer. During preprocessing, this paper uses regular expressions to locate the answers and label each sentence. A total of 90% of the data is allocated for training, while the remaining 10% is used for validation.
- (2) Judicial event dataset: This dataset was constructed from publicly available Tibetan judicial documents sourced from China Judgments Online. It contains 3006 Tibetan judicial event records covering 12 event types and their corresponding 51 event arguments, with all personal names anonymized to protect privacy. To better accommodate the specific task requirements of this study, the original event and argument labels were suitably revised. Table 2 summarizes the revised event types along with their associated argument roles. An 8:1:1 split with stratified sampling divides the dataset into training, validation, and test portions. To adapt the data to the MRC task format, this paper converts the original annotations into a standardized set of triples while preserving the original text span positions of the event arguments, ensuring compatibility with model training and inference.
4.3. Evaluation Metrics
This paper uses precision, recall, and F1-score as evaluation metrics, computed with the micro-average method. The specific calculations are presented in Equations (18)–(20):

$$\mathrm{Precision} = \frac{TP}{TP + FP} \quad (18)$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \quad (19)$$

$$\mathrm{F1} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \quad (20)$$

Here, TP (True Positive) denotes correctly predicted instances, FP (False Positive) refers to incorrect predictions made by the model, and FN (False Negative) represents the instances that the model failed to identify. Regarding argument boundary determination, this paper adopts a relaxed matching strategy [40,41] for evaluation: when the extracted argument span highly overlaps with the gold-standard span, it is considered correctly identified.
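The sketch below illustrates micro-averaged evaluation under a relaxed span-matching criterion. The 0.5 overlap threshold and the helper names are assumptions, since the paper only states that spans must "highly overlap" with the gold standard.

```python
def overlap_ratio(pred, gold):
    """pred/gold: (start, end) character spans; returns intersection over union."""
    inter = max(0, min(pred[1], gold[1]) - max(pred[0], gold[0]))
    union = max(pred[1], gold[1]) - min(pred[0], gold[0])
    return inter / union if union > 0 else 0.0

def micro_prf(pred_spans, gold_spans, threshold=0.5):
    """pred_spans/gold_spans: lists of span lists, one pair per example."""
    tp = fp = fn = 0
    for preds, golds in zip(pred_spans, gold_spans):
        matched = set()
        for p in preds:
            hit = next((k for k, g in enumerate(golds)
                        if k not in matched and overlap_ratio(p, g) >= threshold), None)
            if hit is None:
                fp += 1            # prediction matches no remaining gold span
            else:
                matched.add(hit)   # relaxed match counts as a true positive
                tp += 1
        fn += len(golds) - len(matched)
    precision = tp / (tp + fp) if tp + fp else 0.0   # Equation (18)
    recall = tp / (tp + fn) if tp + fn else 0.0      # Equation (19)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0  # Eq. (20)
    return precision, recall, f1
```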
4.4. Experiment and Results
This paper uses a grid search strategy to optimize key hyperparameters. The learning rate search space is [1 × 10⁻⁵, 2 × 10⁻⁵, 5 × 10⁻⁵], and the batch size search space is [8, 16, 32]; the optimal configuration is found by exhaustively evaluating all combinations. Training is limited to 15 epochs, with early stopping employed to prevent overfitting. The patience is set to 4 epochs, meaning that training terminates after four consecutive epochs without improvement on the validation set.
The final hyperparameter configuration is summarized in Table 3. A batch size of 8 and a learning rate of 2 × 10⁻⁵ are employed, the maximum sequence length is set to 400, and the document stride to 128. The maximum length of questions and answers is limited to 64, and AdamW is used as the optimizer.
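For reference, the search procedure can be summarized as the sketch below; `train_and_eval` is an assumed helper that trains with early stopping and returns validation F1, not a function from the paper.

```python
from itertools import product

def grid_search():
    best_cfg, best_f1 = None, -1.0
    for lr, batch_size in product([1e-5, 2e-5, 5e-5], [8, 16, 32]):
        f1 = train_and_eval(lr=lr, batch_size=batch_size,
                            max_epochs=15, patience=4)  # early stopping inside
        if f1 > best_f1:
            best_cfg, best_f1 = (lr, batch_size), f1
    return best_cfg, best_f1
```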
This study conducted experiments on five randomly partitioned datasets, setting five random seeds in each partition, for a total of 25 experimental groups. According to the experimental results, the proposed model attains an average F1-score of 0.7659, with a standard deviation of 0.0101 and a 95% confidence interval of [0.7617, 0.7701]. This indicates that the results are concentrated and have low fluctuations, demonstrating good stability and reliability. Regarding the performance of different datasets, the best result was observed on Dataset 2, with an average F1-score of 0.7816, while the performance of Datasets 1 and 4 was relatively low but still within a reasonable range. Although there were some differences between the results of each group, they remained at a high level overall, further verifying the robustness of the proposed method under different data conditions.
5. Discussion
5.1. Model Effectiveness Analysis
5.1.1. Impact of Hyperparameter Settings
To determine the optimal hyperparameter configuration, we systematically experimented with combinations of learning rates [1 × 10⁻⁵, 2 × 10⁻⁵, 5 × 10⁻⁵] and batch sizes [8, 16, 32], training for 15 epochs.
Figure 5 shows the performance of the model under different learning rate and batch size combinations.
Results show that different hyperparameter combinations lead to significant differences in convergence speed and performance. A learning rate that is too low leads to slow convergence, making it difficult to reach the optimum within a limited number of epochs, while a learning rate that is too high produces rapid initial improvement accompanied by significant fluctuations and instability. Comprehensive analysis shows that a learning rate of 2 × 10⁻⁵ with a batch size of 8 achieves the best balance between convergence speed, stability, and final performance; this configuration was therefore selected as the core setting for subsequent experiments.
5.1.2. The Performance of the Two-Stage Training Strategy
To validate the effectiveness of the two-stage training strategy, this paper compared a single-stage approach (training directly on judicial domain data) with a two-stage approach (first training on the general Tibetan MRC dataset and then fine-tuning on judicial domain data).
The experimental findings are presented in Figure 5 and show that the two-stage approach offers significant advantages in low-resource scenarios. Compared to the single-stage approach, it achieves faster performance gains in the early stages of convergence and maintains greater stability in the later stages of training. This indicates that prior training on the general Tibetan MRC dataset enables the model to learn cross-task language understanding and question-answering patterns, laying a solid foundation for subsequent adaptive fine-tuning in the judicial domain. The single-stage approach, while it also gradually converges, exhibits a significant performance lag in the early stages and a relatively limited upper bound in the later stages. This suggests that under data-scarce conditions, the single-stage approach struggles to fully capture semantic regularities, resulting in insufficient generalization. By transferring existing linguistic and semantic knowledge, the two-stage approach effectively alleviates the performance bottleneck caused by limited data.
5.1.3. Impact Analysis of Question Template Design
This paper designed three different question templates for comparative experiments; Figure 6 shows their performance. Question Template 0 contains only event argument role information, Question Template 1 adds interrogative words and uses natural-language phrasing, and the standard question template is the one proposed in this paper.
While Template 0 provides the model with basic extraction guidance, its expression is overly brief and lacks semantic context, which can easily lead to confusion when the model encounters similar roles. Template 1 enhances the readability and naturalness of questions to a certain extent, enabling the model to better align questions with sentences, but its semantic guidance is still insufficient. In contrast, the question template proposed in this paper further incorporates event semantic information, making the questions more targeted and discriminative. Experimental curves indicate that the proposed question template attains optimal final results and provides more stable overall training.
5.2. Model Stability and Generalization Ability Evaluation
Based on the optimal hyperparameter configuration, a systematic experiment was conducted using a cross-validation strategy combining multiple datasets and multiple random seeds. Specifically, the dataset was randomly divided into five independent datasets at an 8:1:1 ratio, and five different random seeds (42, 123, 456, 789, and 1024) were set. A total of 25 independent experiments were carried out to verify the consistency and robustness of the results.
Figure 7 shows the performance distribution across various datasets. Overall, the model exhibits good stability across all partitions, with the mean F1-score remaining within a range of 75.85% to 78.16%. The medians across datasets show little variation, and the vast majority of results are concentrated with minimal fluctuation, demonstrating the model’s robustness and consistency in low-resource scenarios. Experimental results under different random seeds exhibit no extreme outliers, further validating the model’s robustness in low-resource scenarios. Notably, while subtle differences persist between some datasets, the overall variation remains within a reasonable range, with no significant deviations or abnormal fluctuations. This demonstrates that the proposed method enhances the model’s convergence efficiency and task adaptability, and ensures consistent performance even under uncertain data partitioning.
5.3. Performance Analysis at Event Type Granularity
To provide a clearer analysis of different event types, we conducted a fine-grained evaluation on the 12 event categories in the dataset. We used the same experimental setup as in Section 5.2, conducting 25 independent experiments under five data partitions and five random seeds to ensure statistical reliability. The outcomes for the various event types are presented in Table 4 and Figure 8.
From the overall distribution perspective, performance differs markedly across event types. “Drunk Driving” achieved the highest performance, with an average F1-score of 0.9050 (95% CI: [0.893, 0.917]) and a variance of only 0.0009, demonstrating excellent performance and stability. This is primarily due to the relatively simple argument structure and highly patterned presentation of judicial texts for these events. “Appraisal” also exhibited satisfactory stability (F1 = 0.8770, 95% CI: [0.866, 0.888]), with a standard deviation of only 0.0269, which is related to its procedural and normative nature. In contrast, “Purchase” (F1 = 0.6479, 95% CI: [0.610, 0.686]) and “Traffic Accident” (F1 = 0.6800, 95% CI: [0.649, 0.711]) not only had lower means but also wide confidence intervals, indicating significant uncertainty. This shows that the argument relationships of such events are complex and their contexts diverse, which poses greater challenges for the model.
From a stability perspective, events such as “Drunk Driving” and “Theft” exhibit excellent stability (variance ≤ 0.001), indicating that the model’s outcomes for these events are consistent and robust. In contrast, performance on events such as “Intentional Injury” and “Purchase” is more dispersed, indicating that the model is more strongly affected by data partitioning and random factors for these events, resulting in less stable results.
5.4. Data Scale Sensitivity Analysis in Low-Resource Scenarios
For a thorough examination of the proposed method’s adaptability under limited-resource settings, experiments were performed by randomly sampling the training dataset at different ratios: 10%, 20%, 30%, 40%, 50%, 60%, 70%, and 100%. Figure 9 presents the results.
Experimental results show that the model improves steadily with increasing data size, but the rate of improvement exhibits a distinct phased pattern. At extremely low data ratios (e.g., 10%), the F1-score is only 0.5787, a significant drop in performance, indicating that the model struggles to fully learn event semantic patterns when data are insufficient. When the data ratio increases to 20%, performance improves rapidly, with the F1-score rising to approximately 0.6755, an increase of nearly 10 percentage points. As the amount of data grows further, performance continues to improve but stabilizes after reaching a certain scale (around 40%). In particular, at medium and high ratios, the marginal gains gradually decrease, eventually reaching a plateau as the training data approach the full amount. This trend demonstrates that the proposed method can function effectively with limited data, but further performance improvement requires the introduction of additional knowledge.
We conducted a qualitative analysis of typical error cases and found that problems primarily occurred in scenarios with ambiguous semantics and unclear argument boundaries. In some judicial texts, the expressions of different argument roles were highly similar, causing the model to confuse them. The model also struggled to determine the appropriate extraction granularity for complex locations or times. Certain low-frequency or non-standardized expressions further weakened the model’s generalization ability, resulting in poor performance on rare arguments.
We also compare the proposed model with existing models. As presented in Table 5, MRC_TibAE achieves an F1-score of 0.7659, outperforming both sequence labeling methods, such as TJEE, and prompt-based learning methods, such as GPT-4o and DeepSeek-V3. This demonstrates the effectiveness of formulating argument extraction as a question-answering task. However, compared to the BERT_AC model, which also employs a question-answering framework and achieves 0.8840, the proposed method exhibits a certain performance gap. This difference is primarily due to linguistic complexity: although Tibetan, like Chinese, belongs to the Sino-Tibetan language family, it is more complex in terms of lexical agglutination, word-order flexibility, and morphological variation, which poses greater challenges for semantic representation and argument boundary identification. Furthermore, owing to its low-resource nature, the limited availability of Tibetan pre-training corpora constrains models like CINO from fully capturing its deep semantic characteristics, which in turn affects the accuracy of question–answer matching. Despite this, MRC_TibAE still achieves relatively strong performance in the Tibetan judicial field, providing a feasible technical path for event extraction in low-resource languages.
6. Conclusions
This paper proposes an MRC-based method for Tibetan judicial event argument extraction, effectively addressing the data scarcity and insufficient model generalization issues in low-resource language scenarios. By designing question templates that incorporate event semantic information, constructing a deep semantic understanding architecture based on CINO, and employing a two-stage training strategy, the event argument extraction is successfully converted into a question-answering task. The results indicate that the MRC_TibAE achieves an F1-score of 76.59% for extracting arguments from Tibetan judicial events and demonstrates adequate stability and robustness under multiple data partitioning and random seeding settings. Performance analysis across different event types demonstrates that the model performs well in most events, such as drunk driving, but that challenges remain for complex events. Data sensitivity analysis confirms that this approach remains effective under small-scale data conditions, maintaining stable performance even when using only 40% of the training data.
Future research will extend to more low-resource languages and domain scenarios, introducing multimodal data such as images and speech to enhance the model’s event understanding capabilities in multi-source data environments. Additionally, we will actively promote collaboration with judicial departments to conduct application validation of the proposed model. Against the backdrop of national efforts to promote information technology development in ethnic-minority regions, the achievements of this research not only provide technical support for the preservation and development of languages such as Tibetan but also have practical significance for improving the efficiency and intelligence level of judicial processing in ethnic-minority areas.