Article

Tibetan Judicial Event Argument Extraction Based on Machine Reading Comprehension in Low-Resource Scenarios

Lu Gao and Xiaobing Zhao *
1 School of Software, Handan University, Handan 056005, China
2 National Language Resources Monitoring & Research Center Minority Languages Branch, Minzu University of China, Beijing 100081, China
* Authors to whom correspondence should be addressed.
Electronics 2025, 14(19), 3887; https://doi.org/10.3390/electronics14193887
Submission received: 4 August 2025 / Revised: 24 September 2025 / Accepted: 27 September 2025 / Published: 30 September 2025

Abstract

This paper proposes a Tibetan judicial event argument extraction method based on machine reading comprehension (MRC) to address the challenges of data scarcity and insufficient model generalization in low-resource language scenarios. Unlike traditional methods, this work models event argument extraction as an MRC task, progressively identifying and extracting various event arguments through a question-guided approach. First, a strategy for constructing event knowledge-enhanced questions tailored to the Tibetan judicial domain is designed. Specifically, interrogative words are formulated for different types of event arguments, and event semantic information is incorporated into the questions to resolve ambiguity. Second, a deep semantic understanding architecture for Tibetan judicial events based on CINO (Chinese Minority Pretrained Language Model) is proposed, incorporating a multi-head self-attention mechanism to enhance semantic alignment and global understanding between event sentences and questions. Finally, a two-stage training strategy is proposed for low-resource languages. Training is performed on a general Tibetan machine reading comprehension dataset, followed by task-adaptive fine-tuning on judicial domain data, effectively alleviating the data scarcity issue. Experimental results show that the proposed method achieves an F1-score of 76.59% in the Tibetan judicial event argument extraction task. This research offers new ideas for low-resource language event extraction and is of great significance for promoting intelligent information processing of minority languages.

1. Introduction

As a foundational component of natural language processing (NLP), event extraction focuses on identifying events and their key participants from unstructured texts, supporting higher-level tasks such as semantic retrieval, knowledge graphs, and automated reasoning [1,2]. Driven by advances in neural architectures and the accumulation of multilingual annotated corpora, event extraction has made significant progress in well-resourced languages, particularly English and Chinese [3,4].
The event extraction task usually includes two core subtasks: event trigger detection (ETD) and event argument extraction (EAE) [5,6]. Event argument extraction focuses on identifying the participating entities and their semantic roles in an event, and is a key part of fully modeling event structure [7]. Compared with event trigger detection, argument extraction demands deeper syntactic analysis and semantic comprehension, and is especially challenging in scenarios with scarce language resources and complex text structures [8]. Therefore, this paper focuses on the event argument extraction task in the Tibetan judicial context.
The core difficulties of extracting Tibetan judicial events mainly lie in the constraints of low-resource languages, the particular nature of judicial texts, and the limitations of existing methods [9,10,11]. Tibetan is a typical low-resource language that lacks large-scale, high-quality annotated corpora, which seriously restricts the training and generalization capabilities of deep learning models [9,10]. Tibetan judicial texts are typically characterized by complex structures, dense legal terminology, and lengthy sentence patterns, which pose greater challenges for the identification and modeling of event arguments. In addition, judicial texts often involve the interweaving of multiple roles and events, further increasing the complexity of analysis. Although existing sequence labeling methods (e.g., BiLSTM-CRF) have achieved promising results in high-resource languages, they perform poorly in the Tibetan judicial domain, mainly due to their inability to capture long-distance semantic dependencies and their limitations in handling sparse arguments and cross-domain transfer [11,12].
In response to the above challenges, researchers have proposed a variety of solutions. Traditional event argument extraction methods mainly adopt classification or sequence labeling paradigms, transforming argument extraction into role classification or label assignment problems [13]. Although such methods perform well in scenarios with simple structures and sufficient data, their performance often drops significantly when dealing with low-resource languages due to insufficient training data. In recent years, researchers have begun to recast event extraction into new paradigms such as generative extraction and question-answering extraction, alleviating the impact of data scarcity through modeling transformations [14,15,16,17]. Among them, methods based on machine reading comprehension (MRC) reduce task complexity and demonstrate good transfer and knowledge fusion capabilities by virtue of their question-guiding mechanism and semantic modeling advantages [15].
Machine reading comprehension transforms the event argument extraction task into a series of question–answer pairs [18,19,20]. Through carefully designed question templates, the model is guided to focus on specific events and their arguments, effectively decomposing the originally complex extraction task [18]. Question templates contain prior knowledge of event types and argument roles, providing the model with additional semantic information and helping to improve the model’s capability to understand the relationship between events and arguments [19]. The MRC paradigm has strong cross-domain transfer capabilities, enabling the model to use general domain data for transfer learning, thereby achieving effective application in low-resource scenarios [20]. These characteristics make MRC-based approaches particularly suitable for event argument extraction in resource-constrained languages like Tibetan.
Building upon the above analysis, this paper introduces an MRC-driven approach for extracting event arguments from Tibetan judicial texts and builds a complete technical system covering question template design, extraction model construction, and training strategy optimization. Through the reconstruction of task modeling and knowledge fusion, it offers a more efficient and transferable strategy for event argument extraction in languages with limited resources, providing a reference framework for other low-resource or domain-specific applications.
The main contributions of this study are delineated as follows:
(1)
Designing high-quality question templates that integrate the semantics of event type and associated argument roles for the Tibetan judicial field, combining Tibetan interrogative words with event contextual information, effectively alleviating the problem of semantic ambiguity;
(2)
Proposing an MRC_TibAE framework based on CINO (Chinese Minority Pretrained Language Model), introducing multi-head self-attention to strengthen semantic interaction and modeling capabilities between event sentences and questions;
(3)
Developing a two-stage training strategy for low-resource languages, in which the model is first trained on a general Tibetan MRC dataset to acquire general language understanding, and then fine-tuned on domain-specific judicial corpora to improve transferability and extraction performance.

2. Literature Review

2.1. Event Extraction Based on Deep Learning

Deep learning models have been widely applied in event extraction tasks [11,21,22]. The mainstream deep learning architectures include CNNs, RNNs, GNNs, and Transformers [23]. In addition, many studies have adopted hybrid combinations of multiple architectures to combine their respective strengths. Nguyen et al. [21] first proposed the application of CNNs to event detection and proved the effectiveness of CNNs in event extraction tasks. Wang et al. [11] introduced a unified extraction framework based on CNNs and LSTMs to simultaneously identify event types and arguments, simplifying the multi-stage processing flow in traditional event extraction. Peng et al. [22] proposed a multiple template choice model (MTCM), which includes an extended event type mining module to automatically mine extended event types in event mentions. However, Liu et al. [24] pointed out that in low-resource scenarios, event extraction tasks face problems such as data sparsity and category imbalance, which impair the robustness of the model across diverse scenarios and make it difficult to handle complex nested event structures.
To address the above limitations, in recent years researchers have begun to recast event extraction as a question-answering task, designing specific questions to guide the model to focus on different aspects of the event [15,18,25,26]. Liu et al. [15] proposed the RCEE framework, which includes an unsupervised question generation process that converts event patterns into a set of natural questions and then retrieves the answers as event extraction results through a BERT-based question-answering process. He et al. [25] further proposed a multi-round Q&A-based event extraction framework, which effectively exploits the hierarchical dependencies between the arguments. To promote consistency and improve performance, Liu et al. [26] designed JEEMRC, a model that seamlessly combines the recognition of events and their associated elements in a unified learning framework. Liu et al. [18] developed a question–context bridging method that reconstructs the semantic relationship between templates and text to enhance the prompting function of question templates. These works have verified the effectiveness of Q&A-based event extraction methods in introducing prior knowledge and adapting to low-resource conditions. However, existing research still focuses on resource-rich languages like Chinese and English; research on under-resourced languages such as Tibetan remains limited.

2.2. Research on Tibetan Natural Language Processing

Tibetan natural language processing (NLP) has made gradual progress in recent years, especially on machine reading comprehension tasks. Sun et al. [27] constructed the first Tibetan machine reading comprehension dataset, TibetanQA, which covers multiple topics such as nature, culture, and history, and supports multiple types of question-answering tasks such as word matching and synonym replacement. The subsequent release of TibetanQA2.0 further expanded the scale and coverage of the data, added a variety of complex question types, and provided important resources for Tibetan semantic understanding [28]. Based on this, Sun et al. [29] proposed the Ti-Reader model, which introduced an attention mechanism to improve sentence comprehension ability and achieved good performance in the Tibetan MRC task. Yang [30] fine-tuned and optimized BERT and explored the applicability of the model to Tibetan comprehension tasks. These studies laid the foundation for the work of this paper and provided available data resources and baseline methods.
Although some progress has been made in Tibetan NLP, studies specifically focusing on event extraction are still very limited [10]. Existing studies mainly focus on named entity recognition (NER) and relation extraction (RE), such as the Tibetan pre-trained named entity recognition model combined with cascade technology proposed by Xu et al. [31] and the entity relation extraction model for the field of Tibetan medicine studied by Zhou et al. [32]. However, there is little research on the extraction of more complex semantic units such as events. Gao et al. [12] attempted to apply a hybrid neural network-based method to Tibetan judicial event extraction, mainly adopting a sequence labeling framework. However, their approach lacks a systematic mechanism to address the syntactic characteristics of the Tibetan language and the challenges posed by data scarcity. The MRC-based Tibetan judicial event extraction method proposed in this paper offers a novel technical pathway for advancing Tibetan event extraction research.

2.3. Event Extraction in Low-Resource Scenarios

Low-resource language scenarios remain a significant challenge in current natural language processing research, particularly for tasks such as event extraction that are heavily dependent on semantic understanding [33]. For the problem of low-resource event extraction, existing research mainly focuses on three types of strategies: data augmentation, transfer learning, and task reformulation [24].
Data augmentation is a direct strategy to alleviate the low-resource problem, including rule-based sample replacement, model-generated data construction, and remote supervision annotation [34,35]. These methods can expand the training set size to a certain extent, but in Tibetan, with its complex language structure, there are risks such as grammatical perturbation and semantic bias, which may introduce training noise and affect model stability [24]. Transfer learning aids target-language task learning by introducing resource-rich language or general domain knowledge and shows good adaptability in low-resource event extraction [36]. Task reformulation methods change the task form of event extraction (such as generative or question-answering) to weaken the dependence on labeled data [16,20,37]. The event extraction method based on MRC proposed in this study is a typical task reformulation strategy [20]. By transforming event extraction into a question-answering form, the model is guided to focus on the target argument with the help of prior knowledge in the question template, thereby improving model performance under limited data conditions [38].
In summary, existing research has made some progress in deep learning, machine reading comprehension, and low-resource language processing, but research on Tibetan judicial event extraction is still relatively scarce [10]. Based on the machine reading comprehension framework, combined with the event knowledge enhanced question template design and two-stage training strategy, this paper proposes an event argument extraction method suitable for Tibetan, which provides new ideas and solutions for the event extraction task in low-resource languages.

3. Methodology

This paper proposes a framework comprising three interrelated components, building a comprehensive event argument extraction process for Tibetan judicial texts. Figure 1 shows that the framework has three main parts: task formalization, the MRC_TibAE architecture, and an optimization strategy for low-resource languages.
The task formalization module transforms the traditional event argument extraction task into structured question-answering problems. Through carefully designed question templates, it explicitly embeds event type and argument information, guiding the model to focus on extracting key event elements. The MRC_TibAE architecture concatenates the question with the event sentence and employs multi-head self-attention to model complex dependencies between them; the resulting features are then processed by a feedforward network with residual connections and layer normalization; finally, the start and end positions of the target span are predicted through a Softmax classification layer. The optimization strategy for low-resource languages addresses the challenge of data scarcity. The model is first trained on the TibetanQA dataset for general-purpose Tibetan machine reading comprehension, acquiring basic Tibetan language comprehension capabilities, and is then fine-tuned on the judicial dataset. This two-stage training approach effectively leverages a wider range of general-purpose data, establishing foundational Tibetan comprehension capabilities before adapting them to the specialized judicial domain.

3.1. Task Formalization

3.1.1. Question Template Design

This study designs a question representation method that integrates event semantic knowledge. By explicitly embedding event type and argument information into query templates, a set of questions with clear semantic orientation is constructed, effectively eliminating ambiguity and enhancing the model’s ability to locate answers. Table 1 systematically presents the event argument classification system constructed in this study. This system covers 51 event elements across five categories—person, entity, amount, location, and time—providing structured extraction targets for the machine reading comprehension framework.
To precisely guide the model in locating event arguments, this study designs structured question templates that explicitly instruct the model to search for specific arguments under a given event type. This structured question template design ensures that each question clearly includes information about the event type and argument role, effectively eliminating semantic ambiguity and improving the model’s ability to locate specific event arguments. To standardize the design of question templates, the templates are formalized as shown in Equation (1).
Q = \mathrm{Template}(e_{type}, r_{type}, w_{type})        (1)
Here, Q denotes the generated question, e_{type} the event type, r_{type} the argument role, w_{type} the type of interrogative word, and Template the question template generation function.
The question templates shown in Figure 2 correspond one-to-one with the 51 types of event arguments listed in Table 1, forming a complete argument extraction question set, which provides the model with clear extraction targets and search directions. This structured system of question templates enables the model to query event elements in a standardized manner, ensuring the consistency and effectiveness of the extraction process. Each question template precisely embeds information about the event type and argument roles, eliminating semantic ambiguity and enhancing the model’s ability to locate key event elements in judicial texts.
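To make the template mechanism concrete, the sketch below instantiates Equation (1) as a simple string-filling function. The English wording, the interrogative-word mapping, and the example arguments are illustrative placeholders for the actual Tibetan templates of Table 1 and Figure 2.

```python
# Minimal sketch of the question-template function in Equation (1).
# The English phrasing and the interrogative-word mapping are placeholders;
# the real templates are written in Tibetan and follow Table 1 / Figure 2.

INTERROGATIVES = {
    "person": "who",
    "entity": "what",
    "amount": "how much",
    "location": "where",
    "time": "when",
}

def build_question(e_type: str, r_type: str, w_type: str) -> str:
    """Q = Template(e_type, r_type, w_type): embed event type and role."""
    return f"In the {e_type} event, {INTERROGATIVES[w_type]} is the {r_type}?"

# e.g., the Theft_Victim argument of a Theft event
print(build_question("Theft", "Theft_Victim", "person"))
```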

3.1.2. Task Reformulation

Traditional event argument extraction methods mainly adopt the classification or sequence labeling paradigms. However, such methods are highly data-dependent and struggle to generalize in limited-resource settings. To address this limitation, this study reformulates event argument extraction as a question-answering-based task. Each event argument is represented by a corresponding question. Given the question Q and the event sentence S , the model predicts the boundaries of the target argument span by learning the mapping function f ( Q , S ) . Formally, the model seeks to capture the following mapping:
P_{start,end} = f(Q, S)        (2)
Here, f(Q, S) denotes the joint modeling function of the question Q and the event sentence S, and P_{start,end} denotes the predicted start and end positions of the target argument span in the original text.
This modeling approach transforms the structured event argument extraction task into an interpretable and easily transferable question-answering matching problem, providing a strong basis for subsequent model architecture design and training optimization.

3.2. MRC_TibAE

Based on the MRC paradigm following task reformulation, this paper proposes the MRC_TibAE model (Machine Reading Comprehension Model for Tibetan Judicial Event Argument Extraction). The model's overall architecture is depicted in Figure 3 and includes three key components: the encoding layer, the interaction layer, and the prediction layer. The model employs the minority-language pre-trained model CINO as the base encoder. Using self-attention, the model effectively encodes the semantic interplay between the question and the event sentence. Finally, a position prediction module is used to locate and extract the event arguments.

3.2.1. Encoding Layer

The encoding layer is responsible for converting the Tibetan input sequence into a dense vector representation. The input consists of two parts: the event sentence S = \{s_1, s_2, \ldots, s_n\} (where n is the length of S) and the question Q = \{q_1, q_2, \ldots, q_m\} (where m is the length of Q). For joint modeling, the two parts are concatenated into a unified input sequence X of the form:
X = \langle s \rangle \oplus Q \oplus \langle /s \rangle \oplus S \oplus \langle /s \rangle        (3)
where \langle s \rangle and \langle /s \rangle denote the sequence start token and separator token, respectively, and \oplus denotes sequence concatenation. The length of the concatenated input is l = n + m + 3.
The CINO encoder processes the input to generate three types of embedding: token embedding, positional embedding, and segment embedding.
E = Emb_{token} + Emb_{pos} + Emb_{seg} \in \mathbb{R}^{l \times d}        (4)
Here, Emb_{token}, Emb_{pos}, and Emb_{seg} denote the token, positional, and segment embeddings, respectively; l is the input sequence length and d is the embedding dimension.
The CINO model applies multi-layer stacked encoding to the input and produces high-dimensional contextual representations for subsequent processing.
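As a minimal sketch of the encoding step, the snippet below loads a CINO checkpoint from Hugging Face and encodes a question–sentence pair; the checkpoint name `hfl/cino-base-v2` and the use of the generic Auto classes are assumptions, and the tokenizer inserts the special tokens of Equation (3) automatically.

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Assumed CINO checkpoint name; the tokenizer handles <s>/</s> placement.
MODEL_NAME = "hfl/cino-base-v2"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
encoder = AutoModel.from_pretrained(MODEL_NAME)

question = "..."        # Tibetan question Q built from the template
event_sentence = "..."  # Tibetan event sentence S

# Joint input of question and event sentence, truncated to 400 tokens
inputs = tokenizer(question, event_sentence,
                   max_length=400, truncation=True, return_tensors="pt")

with torch.no_grad():
    # Contextual representation of the concatenated pair, shape (1, l, 768)
    hidden = encoder(**inputs).last_hidden_state
```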

3.2.2. Interaction Layer

Tibetan judicial texts are complex in structure and information-dense, so the model needs strong context modeling capabilities. To accurately capture the semantic association between the question and the event sentence, this study adopts a multi-head self-attention mechanism to encode interactions between tokens, effectively directing attention toward question-relevant content. Specifically, given the input representation E = \{e_1, e_2, \ldots, e_l\}, the model uses h parallel attention heads to capture multi-level semantic relationships from different perspectives. The computation of the i-th self-attention head, head_i, is defined in Equation (5).
head_i = \mathrm{Attention}(Q_i, K_i, V_i) = \mathrm{softmax}\!\left(\frac{Q_i K_i^{T}}{\sqrt{d_k}}\right) V_i        (5)
Here, Q_i = H W_i^{Q}, K_i = H W_i^{K}, and V_i = H W_i^{V}, where W_i^{Q}, W_i^{K}, and W_i^{V} are learnable parameter matrices and d_k is the scaling factor of the attention mechanism.
The outputs of all attention heads, head_1, head_2, \ldots, head_h, are concatenated and transformed by a linear layer to obtain the fused multi-head attention representation, as shown in Equation (6).
\mathrm{MultiHead}(H) = \mathrm{Concat}(head_1, \ldots, head_h) W^{O}        (6)
Here, W^{O} is the output projection matrix, Concat is the concatenation function, head_i denotes the output of the i-th attention head, and h is the number of attention heads.
Through the multi-head self-attention mechanism, the model can capture deep semantic associations within the interaction representations between the question and the event sentence. In this process, each token s_i in the event sentence interacts with each token q_j in the question, which strengthens the model's ability to comprehend the event sentence in a question-aware manner. After processing by the multi-layer self-attention and feedforward networks, the final hidden representation of the CINO model is taken and recorded as H_{QA}:
H_{QA} = \mathrm{CINO}(E)        (7)
Here, H_{QA} \in \mathbb{R}^{l \times d} is the final semantic interaction representation, l is the input length, and d is the hidden dimension.
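For reference, the multi-head self-attention of Equations (5) and (6) can be reproduced with PyTorch's built-in module; this is a generic sketch rather than the exact CINO internals, and the head count of 12 matches a 768-dimensional base encoder.

```python
import torch
import torch.nn as nn

d_model, num_heads = 768, 12
# Multi-head self-attention over the joint question/event representation.
mha = nn.MultiheadAttention(embed_dim=d_model, num_heads=num_heads,
                            batch_first=True)

E = torch.randn(1, 64, d_model)   # (batch, sequence length l, d)
# Q, K, V are linear projections of the same input (self-attention, Eq. 5);
# the h head outputs are concatenated and projected by W^O (Eq. 6).
H, attn_weights = mha(E, E, E)
print(H.shape)                    # torch.Size([1, 64, 768])
```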

3.2.3. Prediction Layer

After obtaining the deep semantic representation H_{QA}, two independent linear classifiers estimate the likelihood that each token is the start or the end of the target argument span, respectively:
P_{start} = \mathrm{softmax}(W_{start} H_{QA})        (8)
where P_{start} is the probability assigned to each token of being the starting boundary of the target argument span and W_{start} is a learnable parameter matrix.
P_{end} = \mathrm{softmax}(W_{end} H_{QA})        (9)
where P_{end} is the probability assigned to each token of being the ending boundary of the target argument span and W_{end} is a learnable parameter matrix.
After obtaining the position probability distributions, the model enumerates all possible start–end position pairs (i, j) to construct the candidate span set T:
T = \{(i, j) \mid i \in P_{start},\; j \in P_{end}\}        (10)
To ensure the validity of the extraction results, the model applies a systematic constraint mechanism to screen the candidates. First, the span length must not exceed the preset maximum value. Second, the start position cannot be later than the end position (i.e., i \le j). Finally, the target span must fall completely within the valid range of the original text. For all candidate spans that satisfy these constraints, the joint probability score of each start–end pair is computed, as shown in Equation (11).
\mathrm{score}(i, j) = P_{start}(i) \cdot P_{end}(j), \quad i \le j        (11)
where P_{start}(i) is the probability of the i-th token being the start of the target span, P_{end}(j) is the probability of the j-th token being the end of the target span, and score(i, j) is their joint probability score.
All joint probability scores are ranked from highest to lowest, the N candidates with the highest scores are selected, and the span with the maximum score is retained, as shown in Equation (12). If the scores of all candidate spans fall below a preset threshold, or no valid candidate is produced during screening, the model returns an empty result, indicating that no relevant event argument is detected.
span^{*} = \underset{(i, j) \in T}{\arg\max}\; \mathrm{score}(i, j)        (12)
Here, span^{*} denotes the span predicted by the model, which is then mapped back to the original text.
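A minimal sketch of the prediction layer and the constrained span decoding of Equations (8)–(12) is given below; the weight vectors, the length cap, and the threshold value are illustrative assumptions.

```python
import torch

def decode_span(hidden, w_start, w_end, max_span_len=64, threshold=0.0):
    """Return the best (start, end) pair, or None if nothing clears the threshold.

    hidden: (l, d) token representations H_QA; w_start, w_end: (d,) weights.
    """
    p_start = torch.softmax(hidden @ w_start, dim=-1)   # Eq. (8)
    p_end = torch.softmax(hidden @ w_end, dim=-1)       # Eq. (9)

    best, best_score = None, threshold
    l = hidden.size(0)
    for i in range(l):                                   # candidate set T, Eq. (10)
        for j in range(i, min(i + max_span_len, l)):     # constraints: i <= j, length cap
            score = (p_start[i] * p_end[j]).item()       # Eq. (11)
            if score > best_score:
                best, best_score = (i, j), score         # argmax, Eq. (12)
    return best

# Toy usage with random values
h = torch.randn(20, 768)
print(decode_span(h, torch.randn(768), torch.randn(768)))
```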

3.3. Optimization Strategies for Low-Resource Languages

3.3.1. Two-Stage Training Strategy

The complex linguistic structure of Tibetan judicial texts and the severe lack of annotated data make it difficult for models to directly learn effective representations from limited data. To improve the model’s adaptability in this domain, this paper proposes a two-stage strategy. This strategy leverages general domain knowledge to strengthen the model’s language understanding capabilities, then transfers this knowledge to the judicial domain to enhance the model’s ability to learn specialized terminology and event structure.
As shown in Figure 4, the first stage is trained on the general Tibetan machine reading comprehension dataset TibetanQA so that the model can learn general Tibetan language comprehension and Q&A modeling capabilities and be familiar with the basic grammar, common vocabulary, and syntactic structure of Tibetan. The formal representation is as follows:
\theta_{pre} = \underset{\theta}{\arg\min}\; L(\theta, D_{TibetanQA})        (13)
Here, \theta denotes the initial model parameters, D_{TibetanQA} the Tibetan machine reading comprehension dataset, L the loss function, and \theta_{pre} the parameters after general-domain training.
The second phase continues fine-tuning on the Tibetan judicial event dataset, adapting the model to the specific semantic features of Tibetan judicial texts, such as professional terminology, event structure, and syntactic expressions.
\theta_{ft} = \underset{\theta}{\arg\min}\; L(\theta_{pre}, D_{judicial})        (14)
Here, \theta_{pre} denotes the parameters saved after the first stage, \theta_{ft} the parameters after fine-tuning in the judicial domain, and D_{judicial} the Tibetan judicial event dataset. Through domain-specific fine-tuning on judicial data, the model acquires legal-domain linguistic features, which in turn enhances its performance on judicial event extraction.
Through a two-stage strategy, the model can be adapted both at the language level and the task level, thereby alleviating the low-resource language problem and improving the final extraction effect.
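Operationally, the two-stage strategy of Equations (13) and (14) runs the same optimization loop twice with different data. The sketch below assumes a Hugging Face-style extractive QA model whose forward pass returns the combined start/end loss; the data loaders are placeholders.

```python
import torch
from torch.optim import AdamW

def train_stage(model, loader, epochs=15, lr=2e-5, device="cpu"):
    """One training stage: minimise the span loss on the given dataset."""
    model.to(device).train()
    optimizer = AdamW(model.parameters(), lr=lr)
    for _ in range(epochs):
        for batch in loader:
            batch = {k: v.to(device) for k, v in batch.items()}
            loss = model(**batch).loss   # assumed HF-style QA head: L_start + L_end
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
    return {k: v.cpu() for k, v in model.state_dict().items()}

# Stage 1: general Tibetan MRC on TibetanQA (Eq. 13) -> theta_pre
# theta_pre = train_stage(model, tibetanqa_loader)
# Stage 2: judicial-domain fine-tuning starting from theta_pre (Eq. 14) -> theta_ft
# model.load_state_dict(theta_pre)
# theta_ft = train_stage(model, judicial_loader)
```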

3.3.2. Loss Function Design

The model employs cross-entropy to quantify the discrepancy between the predicted and true start/end positions of the argument span. In each training iteration, the model parameters are optimized by minimizing the loss, aligning the predicted start and end positions more closely with the annotated ground truth. Specifically, the target loss comprises two parts, L_{start} and L_{end}, representing the start and end prediction losses, respectively. The losses are computed as follows:
L_{start} = -\, y_{start} \log P_{start}        (15)
L_{end} = -\, y_{end} \log P_{end}        (16)
Here, y_{start} and y_{end} denote the true start and end positions of the target argument span, and P_{start} and P_{end} denote the predicted probability distributions over the start and end positions, respectively.
During training, the AdamW optimizer is used, and parameter updates are performed by minimizing the total loss. The overall training loss combines the losses for the start and end positions of the target argument span:
L_{total} = L_{start} + L_{end}        (17)
Here, L_{total} denotes the total loss of the model, and L_{start} and L_{end} are the cross-entropy losses for the start and end positions of the target argument span, respectively.
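In PyTorch, the position losses of Equations (15)–(17) reduce to two cross-entropy terms over the start and end distributions; a minimal sketch with toy tensors:

```python
import torch
import torch.nn.functional as F

def span_loss(start_logits, end_logits, start_positions, end_positions):
    """L_total = L_start + L_end (Equations (15)-(17)).

    start_logits / end_logits: (batch, l) unnormalised position scores;
    start_positions / end_positions: (batch,) gold token indices.
    """
    l_start = F.cross_entropy(start_logits, start_positions)   # Eq. (15)
    l_end = F.cross_entropy(end_logits, end_positions)         # Eq. (16)
    return l_start + l_end                                      # Eq. (17)

# Toy example with batch size 2 and sequence length 20
logits_s, logits_e = torch.randn(2, 20), torch.randn(2, 20)
gold_s, gold_e = torch.tensor([3, 7]), torch.tensor([5, 9])
print(span_loss(logits_s, logits_e, gold_s, gold_e))
```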

4. Experiments

4.1. Experimental Environment Configuration

The experiment uses the CINO minority language pre-trained model [39], which has a hidden layer vector dimension of 768. All experiments are performed on a workstation with two Tesla P40 GPUs, each offering 24 GB of memory. The software environment includes Python 3.8, PyTorch 2.4.1 (CUDA 11.8), and Transformers 4.46.0. This configuration is well-suited for high-dimensional vector processing tasks, such as fine-tuning and inference with pre-trained language models. It enables efficient handling of large-scale text data and long-sequence inputs, meeting the computational requirements of Tibetan judicial event extraction tasks.

4.2. Datasets

The experiment uses two Tibetan datasets: an MRC dataset in the general domain, TibetanQA [27,28], and an event extraction dataset in the judicial domain [12].
(1)
TibetanQA: This dataset was released by Sun Yuan et al. It contains a total of 14,054 Tibetan samples covering 12 topics such as nature, culture, education, etc., and supports question-answer types such as word-matching, synonym substitution, and multi-sentence reasoning. The raw data are presented in the form of triples, including article, question, and answer. During data preprocessing, this paper uses regular expressions to locate the answers and labels for each sentence. A total of 90% of the data is allocated for training, while the remaining 10% is used for validation.
(2)
Judicial event dataset: The dataset was constructed from publicly available Tibetan judicial documents sourced from China Judgments Online, containing 3006 Tibetan judicial event records covering 12 event types and their corresponding 51 event arguments, with all personal names anonymized to protect privacy. To better accommodate the specific task requirements of this study, the original event and argument labels were suitably revised. Table 2 summarizes the revised event types along with their associated argument roles. An 8:1:1 split with stratified sampling is applied to divide the dataset into training, validation, and test portions. To adapt the data to the MRC task format, this paper preprocesses the original annotations by converting them into a standardized set of triples while preserving the original text span positions of the event arguments. This ensures compatibility with model training and inference.
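As a rough illustration of the preprocessing described above, the sketch below turns one annotated event record into MRC-style triples; the field names of the record are hypothetical and stand in for the actual annotation schema.

```python
# Convert an annotated judicial event record into (context, question, answer)
# triples; the field names ("text", "event_type", "arguments", ...) are
# hypothetical placeholders for the real annotation schema.
def record_to_triples(record, build_question):
    triples = []
    for arg in record["arguments"]:
        question = build_question(record["event_type"], arg["role"], arg["category"])
        triples.append({
            "context": record["text"],
            "question": question,
            "answer_start": arg["start"],   # original span position is preserved
            "answer_text": record["text"][arg["start"]:arg["end"]],
        })
    return triples
```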

4.3. Evaluation Metrics

This paper uses precision, recall, and F1-score as evaluation metrics, computed with micro-averaging. The specific calculation methods are presented in Equations (18)–(20).
\mathrm{Precision} = \frac{TP}{TP + FP}        (18)
\mathrm{Recall} = \frac{TP}{TP + FN}        (19)
F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}        (20)
Here, TP (True Positive) denotes correctly predicted instances, FP (False Positive) refers to incorrect predictions made by the model, and FN (False Negative) represents the instances that the model failed to identify. Regarding argument boundary determination, this paper adopts a relaxed matching strategy [40,41] for evaluation. When the extracted argument span highly overlaps with the gold standard span, it is considered correctly identified.
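A minimal sketch of micro-averaged precision, recall, and F1 with relaxed span matching is shown below; the 0.5 overlap threshold is an assumption for illustration, not the value used in the paper.

```python
def relaxed_match(pred, gold, threshold=0.5):
    """Count a prediction as correct if its overlap with the gold span covers
    most of both spans; the 0.5 threshold is an illustrative assumption."""
    (ps, pe), (gs, ge) = pred, gold
    overlap = max(0, min(pe, ge) - max(ps, gs))
    return overlap / max(pe - ps, ge - gs, 1) >= threshold

def micro_prf(pred_spans, gold_spans):
    """Micro-averaged precision / recall / F1 (Equations (18)-(20))."""
    tp = sum(any(relaxed_match(p, g) for g in gold_spans) for p in pred_spans)
    fp = len(pred_spans) - tp
    fn = sum(not any(relaxed_match(p, g) for p in pred_spans) for g in gold_spans)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

print(micro_prf([(0, 5), (10, 14)], [(0, 6), (20, 25)]))  # (0.5, 0.5, 0.5)
```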

4.4. Experiment and Results

This paper uses a grid search strategy to optimize key hyperparameters. The learning rate search space is set to [1 × 10−5, 2 × 10−5, 5 × 10−5], and the batch size search space is [8, 16, 32]. The optimal configuration is found through an exhaustive search over all combinations. The training procedure is limited to 15 epochs, with early stopping employed to prevent overfitting. The patience value is set to 4 epochs, meaning that training is terminated after four consecutive epochs without improvement on the validation set.
The final hyperparameter configuration is summarized in Table 3. A batch size of 8 and a learning rate of 2 × 10−5 are employed, the maximum sequence length is set to 400 and the document stride to 128, the maximum length of questions and answers is limited to 64, and AdamW is used as the optimizer.
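The grid search and early stopping described above can be organized as in the sketch below; train_and_eval is a placeholder for one full training run that returns the validation F1-score.

```python
from itertools import product

def grid_search(train_and_eval):
    """Exhaustive search over the learning-rate and batch-size grids.

    train_and_eval is a placeholder callable that trains one configuration
    (15 epochs, patience 4) and returns its validation F1-score.
    """
    best_cfg, best_f1 = None, 0.0
    for lr, bs in product([1e-5, 2e-5, 5e-5], [8, 16, 32]):
        f1 = train_and_eval(lr=lr, batch_size=bs, epochs=15, patience=4)
        if f1 > best_f1:
            best_cfg, best_f1 = (lr, bs), f1
    return best_cfg, best_f1
```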
This study conducted experiments on five randomly partitioned datasets, setting five random seeds in each partition, for a total of 25 experimental groups. According to the experimental results, the proposed model attains an average F1-score of 0.7659, with a standard deviation of 0.0101 and a 95% confidence interval of [0.7617, 0.7701]. This indicates that the results are concentrated and have low fluctuations, demonstrating good stability and reliability. Regarding the performance of different datasets, the best result was observed on Dataset 2, with an average F1-score of 0.7816, while the performance of Datasets 1 and 4 was relatively low but still within a reasonable range. Although there were some differences between the results of each group, they remained at a high level overall, further verifying the robustness of the proposed method under different data conditions.

5. Discussion

5.1. Model Effectiveness Analysis

5.1.1. Impact of Hyperparameter Settings

To determine the optimal hyperparameter configuration, we systematically experimented with combinations of learning rates [1 × 10−5, 2 × 10−5, 5 × 10−5] and batch sizes [8, 16, 32], training for 15 epochs. Figure 5 shows the performance of the model under different learning rate and batch size combinations.
Results show that different hyperparameter combinations lead to significant differences in convergence speed and performance. A learning rate that is too low often leads to slow model convergence, making it difficult to reach optimality within a limited number of epochs. A learning rate that is too high leads to rapid initial improvement but is accompanied by significant fluctuations and instability. Comprehensive analysis shows that the combination of a learning rate of 2 × 10−5 and a batch size of 8 achieves the best balance between convergence speed, stability, and final performance. Therefore, this configuration was ultimately selected as the core parameter for subsequent experiments.

5.1.2. The Performance of the Two-Stage Training Strategy

To validate the effectiveness of the two-stage training strategy, this paper compared a single-stage approach (training directly on judicial domain data) with a two-stage approach (first training on the general Tibetan MRC dataset and then fine-tuning on judicial domain data).
The experimental findings are presented in Figure 5 and show that the two-stage approach has significant advantages in low-resource scenarios. Compared to the single-stage approach, the two-stage approach achieves faster performance gains in the early stages of model convergence and maintains greater stability in the later stages of training. This demonstrates that prior training on the general Tibetan MRC dataset enables the model to learn cross-task language understanding capabilities and question-answering patterns, laying a solid foundation for subsequent adaptive fine-tuning in the judicial domain. On the other hand, while the single-stage approach also gradually converges, it exhibits a significant performance lag in the early stages and a relatively limited upper bound in the later stages. This suggests that under data-scarce conditions, the single-stage approach struggles to fully capture semantic regularities, resulting in insufficient model generalization. The two-stage approach, by transferring existing linguistic and semantic knowledge, effectively alleviates the performance bottleneck caused by limited data size.

5.1.3. Impact Analysis of Question Template Design

This paper designed three different question templates for comparative experiments. Figure 6 shows the performance of the three question templates. Question Template 0 only contains event argument role information, Question Template 1 adds interrogative words and uses natural language questioning, and the standard question template is the template proposed in this paper.
While Template 0 provides the model with basic extraction guidance, its expression is overly brief and lacks semantic context, which can easily lead to confusion when the model encounters similar roles. Template 1 enhances the readability and naturalness of questions to a certain extent, enabling the model to better align questions with sentences, but its semantic guidance is still insufficient. In contrast, the question template proposed in this paper further incorporates event semantic information, making the questions more targeted and discriminative. Experimental curves indicate that the proposed question template attains optimal final results and provides more stable overall training.

5.2. Model Stability and Generalization Ability Evaluation

Based on the optimal hyperparameter configuration, a systematic experiment was conducted using a cross-validation strategy combining multiple datasets and multiple random seeds. Specifically, the dataset was randomly divided into five independent datasets at an 8:1:1 ratio, and five different random seeds (42, 123, 456, 789, and 1024) were set. A total of 25 independent experiments were carried out to verify the consistency and robustness of the results.
Figure 7 shows the performance distribution across various datasets. Overall, the model exhibits good stability across all partitions, with the mean F1-score remaining within a range of 75.85% to 78.16%. The medians across datasets show little variation, and the vast majority of results are concentrated with minimal fluctuation, demonstrating the model’s robustness and consistency in low-resource scenarios. Experimental results under different random seeds exhibit no extreme outliers, further validating the model’s robustness in low-resource scenarios. Notably, while subtle differences persist between some datasets, the overall variation remains within a reasonable range, with no significant deviations or abnormal fluctuations. This demonstrates that the proposed method enhances the model’s convergence efficiency and task adaptability, and ensures consistent performance even under uncertain data partitioning.

5.3. Performance Analysis at Event Type Granularity

To provide a clearer analysis of different event types, we conducted a fine-grained evaluation on the 12 event categories in the dataset. We used the same experimental setup as in Section 5.2, conducting 25 independent experiments under five data partitions and five random seed conditions to ensure statistical reliability. The experimental outcomes for various event types are illustrated in Table 4 and Figure 8.
From the overall distribution perspective, performance across event types shows differences. “Drunk Driving” achieved the highest performance, reaching an average F1-score of 0.9050 (95% CI: [0.893, 0.917]) and a variance of only 0.0009, demonstrating excellent performance and stability. This is primarily due to the relatively simple argument structure and highly patterned presentation of judicial texts in these events. “Appraisal” also exhibited satisfactory stability (F1 = 0.8770, 95% CI: [0.866, 0.888]), with a standard deviation of only 0.0269. This is related to their procedural and normative nature. In contrast, “Purchase” (F1 = 0.6479, 95% CI: [0.610, 0.686]) and “Traffic Accident” (F1 = 0.6800, 95% CI: [0.649, 0.711]) not only had lower means but also wide confidence intervals, indicating significant uncertainty. This shows that the argument relationships of such events are complex and the contexts are diverse, which brings greater challenges for the model.
From a stability perspective, events such as “Drunk Driving” and “Theft” exhibit excellent stability (variance ≤ 0.001), indicating that the model’s outcomes for these events are more consistent and robust. In contrast, performance on events such as “Intentional Injury” and “Purchase” are more dispersed, indicating that the model is more significantly affected by data partitioning and random factors in these events, resulting in less stable results.

5.4. Data Scale Sensitivity Analysis in Low-Resource Scenarios

For a thorough examination of the proposed method’s adaptability under limited-resource settings, experiments were performed by randomly sampling the training dataset at different ratios: 10%, 20%, 30%, 40%, 50%, 60%, 70%, and 100%. Figure 9 presents the experimental results.
Experimental results show that the model achieves a steady improvement with increasing data size, but the rate of increase exhibits a distinct phased pattern. At extremely low data ratios (e.g., 10%), the F1-score is only 0.5787, a significant drop in performance, indicating that the model struggles to fully learn event semantic patterns when the data are insufficient. When the data ratio increases to 20%, performance improves rapidly, with the F1-score increasing to approximately 0.6755, an increase of nearly 10 percentage points. However, as the amount of data increases, model performance improves rapidly but stabilizes after reaching a certain scale (40%). In particular, when the training data are expanded to medium and high ratios, the marginal gains in performance gradually decrease, eventually reaching a plateau as the training data approach the full amount of data. This trend demonstrates that the proposed method can effectively function with limited data, but further performance improvement requires the introduction of additional knowledge.
We conducted a qualitative analysis of typical error cases and found that the problems primarily occurred in scenarios with ambiguous semantics and unclear argument boundaries. In some judicial texts, the expressions of different argument roles were highly similar, making them hard for the model to distinguish. The model also struggled to determine the appropriate extraction granularity for complex location or time expressions. Certain low-frequency or non-standardized expressions further weakened the model's generalization ability, resulting in poor performance on rare arguments.
This paper’s model also compares to existing models. As presented in Table 5, MRC_TibAE achieves an F1-score of 0.7659, outperforming both sequence tagging methods, such as TJEE, and prompt-based learning methods, such as GPT-4o and DeepSeek-V3. This demonstrates the effectiveness of formulating argument extraction through a question-answering paradigm. However, compared to the BERT_AC model, which also employs a question-answering framework and achieves a performance of 0.8840, the proposed method exhibits a certain performance gap. This difference is primarily due to the complexity of linguistic characteristics: although Tibetan, like Chinese, belongs to the Sino-Tibetan language family, it is more complex in terms of lexical agglutination, word order flexibility, and morphological variation. This poses greater challenges for models in key aspects such as semantic representation and argument boundary identification. Furthermore, due to its low-resource nature, the limited availability of Tibetan pre-training corpora constrains models like CINO from fully capturing its deep semantic characteristics, which in turn affects the accuracy of question–answer matching. Despite this, MRC_TibAE still achieves relatively ideal performance in the Tibetan judicial field, providing a feasible technical path for low-resource language event extraction.

6. Conclusions

This paper proposes an MRC-based method for Tibetan judicial event argument extraction, effectively addressing the data scarcity and insufficient model generalization issues in low-resource language scenarios. By designing question templates that incorporate event semantic information, constructing a deep semantic understanding architecture based on CINO, and employing a two-stage training strategy, the event argument extraction is successfully converted into a question-answering task. The results indicate that the MRC_TibAE achieves an F1-score of 76.59% for extracting arguments from Tibetan judicial events and demonstrates adequate stability and robustness under multiple data partitioning and random seeding settings. Performance analysis across different event types demonstrates that the model performs well in most events, such as drunk driving, but that challenges remain for complex events. Data sensitivity analysis confirms that this approach remains effective under small-scale data conditions, maintaining stable performance even when using only 40% of the training data.
Future research will extend to more low-resource languages and domain scenarios, introducing multimodal data such as images and speech to enhance the model’s event understanding capabilities in multi-source data environments. Additionally, we will actively promote collaboration with judicial departments to conduct application validation of the proposed model. Against the backdrop of national efforts to promote information technology development in ethnic-minority regions, the achievements of this research not only provide technical support for the preservation and development of languages such as Tibetan but also have practical significance for improving the efficiency and intelligence level of judicial processing in ethnic-minority areas.

Author Contributions

Conceptualization, L.G. and X.Z.; methodology, L.G. and X.Z.; software, L.G.; validation, L.G. and X.Z.; formal analysis, L.G. and X.Z.; data curation, L.G. and X.Z.; writing—original draft preparation, L.G.; writing—review and editing, L.G.; visualization, L.G. and X.Z.; funding acquisition, X.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Special Project on AI-Empowered Educational Reform in Universities of Hebei Province (Grant No. 2025RGZN065).

Data Availability Statement

The data presented in this study are available on request from the corresponding authors.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Li, Q.; Li, J.; Sheng, J.; Cui, S.; Wu, J.; Hei, Y.; Peng, H.; Guo, S.; Wang, L.; Beheshti, A. A survey on deep learning event extraction: Approaches and applications. IEEE Trans. Neural Netw. Learn. Syst. 2022, 35, 6301–6321. [Google Scholar] [CrossRef]
  2. Liu, K.; Chen, Y.; Liu, J.; Zuo, X.; Zhao, J. Extracting events and their relations from texts: A survey on recent research progress and challenges. AI Open 2020, 1, 22–39. [Google Scholar] [CrossRef]
  3. Cheng, J.; Liu, W.; Wang, Z.; Ren, Z.; Li, X. Joint event extraction model based on dynamic attention matching and graph attention networks. Sci. Rep. 2025, 15, 6900. [Google Scholar] [CrossRef] [PubMed]
  4. Liu, K.; Zhao, H.; Wang, Z.; Hou, Q. EIGP: Document-level event argument extraction with information enhancement generated based on prompts. Knowl. Inf. Syst. 2024, 66, 7609–7626. [Google Scholar] [CrossRef]
  5. Xie, J.; Zhang, Y.; Kou, H.; Zhao, X.; Feng, Z.; Song, L.; Zhong, W. A survey of the application of neural networks to event extraction. Tsinghua Sci. Technol. 2024, 30, 748–768. [Google Scholar] [CrossRef]
  6. Wan, Q.; Wan, C.; Hu, R.; Liu, D.; Liu, X.; Liao, G. Event Extraction Based on Deep Learning: A Survey of Research Issue. Acta Autom. Sin. 2024, 50, 2079–2101. (In Chinese) [Google Scholar]
  7. Kan, Z.; Qiao, L.; Yang, S.; Liu, F.; Huang, F. Event arguments extraction via dilate gated convolutional neural network with enhanced local features. IEEE Access 2020, 8, 123483–123491. [Google Scholar] [CrossRef]
  8. Veyseh, A.P.B.; Nguyen, T.N.; Nguyen, T.H. Graph Transformer Networks with Syntactic and Semantic Structures for Event Argument Extraction. In Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2020, Online Event, 16–20 November 2020; pp. 3651–3661. [Google Scholar]
  9. Ziyaden, A.; Yelenov, A.; Hajiyev, F.; Rustamov, S.; Pak, A. Text data augmentation and pre-trained language model for enhancing text classification of low-resource languages. PeerJ Comput. Sci. 2024, 10, e1974. [Google Scholar] [CrossRef]
  10. Congjun, L.; Hill, N.W. Recent developments in Tibetan NLP. Trans. Asian Low-Resour. Lang. Inf. Process. 2021, 20, 1–3. [Google Scholar] [CrossRef]
  11. Wang, X.; Deng, W.; Hu, F.; Deng, W.; Zhang, Q. Joint event extraction based on sequence annotation. J. Chongqing Univ. Posts Telecommun. (Nat. Sci. Ed.) 2020, 32, 884–890. (In Chinese) [Google Scholar]
  12. Gao, L.; Zhao, X. Tibetan Judicial Event Extraction Based on Deep Word Representation and Hybrid Neural Networks. Appl. Sci. 2025, 15, 1332. [Google Scholar] [CrossRef]
  13. Alan, R.; Lombardo, R.; Barbara, P. Biomedical event extraction as sequence labeling. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Online, 16–20 November 2020; pp. 5357–5367. [Google Scholar]
  14. Xu, D.; Chen, W.; Peng, W.; Zhang, C.; Xu, T.; Zhao, X.; Wu, X.; Zheng, Y.; Wang, Y.; Chen, E. Large language models for generative information extraction: A survey. Front. Comput. Sci. 2024, 18, 186357. [Google Scholar] [CrossRef]
  15. Liu, J.; Chen, Y.; Liu, K.; Bi, W.; Liu, X. Event extraction as machine reading comprehension. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Online, 16–20 November 2020; pp. 1641–1651. [Google Scholar]
  16. Lu, Y.; Lin, H.; Xu, J.; Han, X.; Tang, J.; Li, A.; Sun, L.; Liao, M.; Chen, S. Text2Event: Controllable Sequence-to-Structure Generation for End-to-end Event Extraction. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Virtual Event, 1–6 August 2021; pp. 2795–2806. [Google Scholar]
  17. Peng, J.; Yang, W.; Wei, F.; He, L.; Yao, L.; Lv, H. Event co-occurrences for prompt-based generative event argument extraction. Sci. Rep. 2024, 14, 31377. [Google Scholar] [CrossRef] [PubMed]
  18. Liu, L.; Liu, M.; Liu, S.; Ding, K. Event extraction as machine reading comprehension with question-context bridging. Knowl.-Based Syst. 2024, 299, 112041. [Google Scholar] [CrossRef]
  19. Chen, M.; Wu, F.; Wang, Z.; Li, P.; Zhu, Q. Chinese Event Argument Extraction using Reading Comprehension Framework. In Proceedings of the 19th Chinese National Conference on Computational Linguistics, Haikou, China, 30 October–1 November 2020; pp. 376–389. [Google Scholar]
  20. Liu, J.; Chen, Y.; Xu, J. Document-level event argument linking as machine reading comprehension. Neurocomputing 2022, 488, 414–423. [Google Scholar] [CrossRef]
  21. Nguyen, T.H.; Grishman, R. Event detection and domain adaptation with convolutional neural networks. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, Beijing, China, 26–31 July 2015; pp. 365–371. [Google Scholar]
  22. Peng, J.; Yang, W.; Wei, F.; He, L. Prompt for extraction: Multiple templates choice model for event extraction. Knowl.-Based Syst. 2024, 289, 111544. [Google Scholar] [CrossRef]
  23. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30. [Google Scholar]
  24. Liu, T.; Jiang, G.; Liu, S. Survey of Event Extraction in Low-resource Scenarios. Comput. Sci. 2024, 51, 217–237. (In Chinese) [Google Scholar]
  25. He, L.; Zhao, X.; Zhao, L.; Zhang, Q. An Event Extraction Approach Based on a Multi-Round Q&A Framework. Appl. Sci. 2023, 13, 6308. [Google Scholar]
  26. Liu, S.; Zhang, S.; Ding, K.; Liu, L. JEEMRC: Joint event detection and extraction via an end-to-end machine Reading comprehension model. Electronics 2024, 13, 1807. [Google Scholar] [CrossRef]
  27. Sun, Y.; Liu, S.; Chen, C.; Dan, Z.; Zhao, X. Construction of high-quality tibetan dataset for machine reading comprehension. In Proceedings of the 20th Chinese National Conference on Computational Linguistics, Huhhot, China, 13–15 August 2021; pp. 208–218. [Google Scholar]
  28. Dan, Z.; Sun, Y. TibetanQA2.0: Dataset with Unanswerable Questions for Tibetan Machine Reading Comprehension. Data Intell. 2024, 6, 1158–1167. [Google Scholar] [CrossRef]
  29. Sun, Y.; Chen, C.; Liu, S.; Zhao, X. Ti-reader: An end-to-end network model based on attention mechanisms for tibetan machine reading comprehension. In Proceedings of the 20th Chinese National Conference on Computational Linguistics, Huhhot, China, 13–15 August 2021; pp. 219–228. [Google Scholar]
  30. Yang, M. A Study on Span-Based Tibetan Machine Reading Comprehension; Qinghai Normal University: Qinghai, China, 2024. (In Chinese) [Google Scholar]
  31. Xu, Z.; Zhu, J.; Xu, Z.; Wang, C.; Yan, S.; Liu, Y. Cascaded Tibetan Named Entity Recognition Model with Pre-trained Language Model. J. Chin. Inf. Process. 2023, 37, 23–28. (In Chinese) [Google Scholar]
  32. Zhou, Q.; Yong, C.; La, M.; Ni, M. Entity Relation Extraction from Tibetan Medical Texts Based on Span Representation. Acta Sci. Nat. Univ. Pekin. 2024, 7, 1–11. (In Chinese) [Google Scholar]
  33. Duan, J.; Liao, X.; An, Y.; Wang, J. KeyEE: Enhancing low-resource generative event extraction with auxiliary keyword sub-prompt. Big Data Min. Anal. 2024, 7, 547–560. [Google Scholar] [CrossRef]
  34. Chen, Y.; Liu, S.; Zhang, X.; Liu, K.; Zhao, J. Automatically labeled data generation for large scale event extraction. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, Vancouver, BC, Canada, 30 July–4 August 2017; pp. 409–419. [Google Scholar]
  35. Yang, H.; Chen, Y.; Liu, K.; Xiao, Y.; Zhao, J. Dcfee: A document-level chinese financial event extraction system based on automatically labeled training data. In Proceedings of the ACL 2018, System Demonstrations, Melbourne, Australia, 15–20 July 2018; pp. 50–55. [Google Scholar]
  36. Huang, K.-H.; Hsu, I.-H.; Natarajan, P.; Chang, K.-W.; Peng, N. Multilingual Generative Language Models for Zero-Shot Cross-Lingual Event Argument Extraction. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, Dublin, Ireland, 22–27 May 2022; pp. 4633–4646. [Google Scholar]
  37. Bonisoli, G.; Vilares, D.; Rollo, F.; Po, L. Document-level event extraction from Italian crime news using minimal data. Knowl.-Based Syst. 2025, 317, 113386. [Google Scholar] [CrossRef]
  38. Du, X.; Cardie, C. Event Extraction by Answering (Almost) Natural Questions. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Online, 16–20 November 2020; pp. 671–683. [Google Scholar]
  39. Yang, Z.; Xu, Z.; Cui, Y.; Wang, B.; Lin, M.; Wu, D.; Chen, Z. CINO: A Chinese Minority Pretrained Language Model. In Proceedings of the 29th International Conference on Computational Linguistics, Gyeongju, Republic of Korea, 12–17 October 2022; pp. 3937–3949. [Google Scholar]
  40. Liu, Z.; Mitamura, T.; Hovy, E. Evaluation algorithms for event nugget detection: A pilot study. In Proceedings of the 3rd Workshop on EVENTS: Definition, Detection, Coreference, and Representation, Denver, CO, USA, 4 June 2015; pp. 53–57. [Google Scholar]
  41. Sharif, O.; Gatto, J.; Basak, M.; Preum, S.M. REGen: A Reliable Evaluation Framework for Generative Event Argument Extraction. arXiv 2025, arXiv:2502.16838. [Google Scholar] [CrossRef]
  42. Yu, X.; He, L.; Wang, X. Research on event extraction from ancient books based on machine reading comprehension. J. China Soc. Sci. Tech. Inf. 2023, 42, 316–326. (In Chinese) [Google Scholar]
Figure 1. Overall research framework.
Figure 2. Flowchart of question template generation.
Figure 3. MRC_TibAE architecture.
Figure 4. Two-stage training strategy.
Figure 5. Model performance under various hyperparameter configurations.
Figure 6. Experimental results of different question templates.
Figure 7. Experimental results of MRC_TibAE on different datasets.
Figure 8. Experimental results of different event types.
Figure 9. Data sensitivity analysis under low-resource conditions.
Table 1. Event arguments and categories.
Number | Argument Category | Event Argument
1 | Person | Resale_Agent
2 | Person | Resale_Buyer
3 | Person | Theft_Agent
4 | Person | Theft_Victim
5 | Person | Purchase_Seller
6 | Person | Purchase_Agent
7 | Person | IntentionalInjury_Agent
8 | Person | IntentionalInjury_Victim
9 | Person | TrafficAccident_Agent
10 | Person | TrafficAccident_Victim
11 | Person | Robbery_Agent
12 | Person | Robbery_Victim
13 | Person | Death_Person
14 | Person | Fraud_Agent
15 | Person | Fraud_Victim
16 | Person | DrunkDriving_Person
17 | Person | Concealment_Agent
18 | Person | Arrest_Target
19 | Person | Arrest_Agent
20 | Entity | Concealment_Item
21 | Entity | Resale_Item
22 | Entity | Purchase_Item
23 | Entity | Appraisal_Item
24 | Entity | Robbery_Item
25 | Entity | Theft_Item
26 | Entity | Fraud_Item
27 | Entity | Appraisal_Organization
28 | Entity | Robbery_Tool
29 | Entity | Death_Cause
30 | Entity | IntentionalInjury_Part
31 | Amount | Resale_Price
32 | Amount | Purchase_Price
33 | Amount | Theft_Value
34 | Amount | Robbery_Value
35 | Amount | Appraisal_Value
36 | Amount | DrunkDriving_AlcoholLevel
37 | Location | Theft_Place
38 | Location | Concealment_Place
39 | Location | Robbery_Place
40 | Location | TrafficAccident_Place
41 | Location | Fraud_Place
42 | Location | Arrest_Place
43 | Location | DrunkDriving_Place
44 | Time | Theft_Time
45 | Time | IntentionalInjury_Time
46 | Time | TrafficAccident_Time
47 | Time | Robbery_Time
48 | Time | Death_Time
49 | Time | Fraud_Time
50 | Time | Arrest_Time
51 | Time | DrunkDriving_Time
Table 2. Event types and arguments.
Number | Event Type | Event Arguments
1 | Theft | Theft_Time, Theft_Agent, Theft_Victim, Theft_Place, Theft_Item, Theft_Value
2 | Appraisal | Appraisal_Organization, Appraisal_Item, Appraisal_Value
3 | Arrest | Arrest_Agent, Arrest_Target, Arrest_Place, Arrest_Time
4 | Drunk Driving | DrunkDriving_Person, DrunkDriving_Place, DrunkDriving_Time, DrunkDriving_AlcoholLevel
5 | Fraud | Fraud_Agent, Fraud_Victim, Fraud_Time, Fraud_Place, Fraud_Item
6 | Robbery | Robbery_Agent, Robbery_Victim, Robbery_Time, Robbery_Place, Robbery_Item, Robbery_Value, Robbery_Tool
7 | Purchase | Purchase_Agent, Purchase_Seller, Purchase_Price, Purchase_Item
8 | Resale | Resale_Agent, Resale_Buyer, Resale_Price, Resale_Item
9 | Death | Death_Person, Death_Time, Death_Cause
10 | Concealment | Concealment_Agent, Concealment_Item, Concealment_Place
11 | Intentional Injury | IntentionalInjury_Agent, IntentionalInjury_Victim, IntentionalInjury_Time, IntentionalInjury_Part
12 | Traffic Accident | TrafficAccident_Agent, TrafficAccident_Victim, TrafficAccident_Time, TrafficAccident_Place
Table 3. Experimental hyperparameter configuration.
Number | Hyperparameter | Value
1 | batch_size | 8
2 | learning_rate | 2 × 10−5
3 | epochs | 15
4 | max_seq_length | 400
5 | doc_stride | 128
6 | max_query_length | 64
7 | max_answer_length | 64
8 | optimizer | AdamW
9 | patience | 4
Table 4. Results across various event types.
Number | Event Type | Mean F1 ± SD | 95% CI | Variance
1 | Robbery | 0.7158 ± 0.0752 | [0.685, 0.747] | 0.0057
2 | Appraisal | 0.8770 ± 0.0269 | [0.866, 0.888] | 0.0007
3 | Theft | 0.7978 ± 0.0164 | [0.791, 0.805] | 0.0003
4 | Traffic Accident | 0.6800 ± 0.0753 | [0.649, 0.711] | 0.0057
5 | Drunk Driving | 0.9050 ± 0.0299 | [0.893, 0.917] | 0.0009
6 | Concealment | 0.6864 ± 0.0781 | [0.654, 0.719] | 0.0061
7 | Arrest | 0.7241 ± 0.0340 | [0.710, 0.738] | 0.0012
8 | Intentional Injury | 0.7851 ± 0.1237 | [0.734, 0.836] | 0.0153
9 | Death | 0.7146 ± 0.0878 | [0.678, 0.751] | 0.0077
10 | Resale | 0.7341 ± 0.0226 | [0.725, 0.743] | 0.0005
11 | Fraud | 0.7411 ± 0.0640 | [0.715, 0.768] | 0.0041
12 | Purchase | 0.6479 ± 0.0928 | [0.610, 0.686] | 0.0086
Table 5. Comparison of model performance.
Number | Model Type | Model | F1-Score
1 | Sequence Labeling | CINO-CRF | 0.5587
2 | Sequence Labeling | CINO-BiLSTM | 0.5818
3 | Sequence Labeling | TJEE [12] | 0.6299
4 | Prompt | GPT-4o | 0.5686
5 | Prompt | DeepSeek-V3 | 0.6371
6 | QA | BERT_AC [42] | 0.8840
7 | QA | MRC_TibAE | 0.7659
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
