A Sentence Prediction Approach Incorporating Trial Logic Based on Abductive Learning

: Sentencing prediction is an important direction of artiﬁcial intelligence applied to the judicial ﬁeld. The purpose is to predict the trial sentence for the case based on the description of the case in the adjudication documents. Traditional methods mainly use neural networks exclusively, which are trained on a large amount of data to encode textual information and then directly regress or classify out the sentence. This shows that machine learning methods are effective, but are extremely dependent on the amount of data. We found that there is still external knowledge such as laws and regulations that are not used. Moreover, the prediction of sentences in these methods does not ﬁt well with the trial process. Thus, we propose a sentence prediction method that incorporates trial logic based on abductive learning, called SPITL. The logic of the trial is reﬂected in two aspects: one is that the process of sentence prediction is more in line with the logic of the trial, and the other is that external knowledge, such as legal texts, is utilized in the process of sentence prediction. Speciﬁcally, we establish a legal knowledge base for the characteristics of theft cases, translating relevant laws and legal interpretations into ﬁrst-order logic. At the same time, we designed the process of sentence prediction according to the trial process by dividing it into key circumstance element identiﬁcation and sentence calculation. We fused the legal knowledge base as weakly supervised information into a neural network through the combination of logical inference and machine learning. Furthermore, a sentencing calculation method that is more consistent with the sentencing rules is proposed with reference to the Sentencing Guidelines. Under the condition of the same training data, the effect of this model in the experiment of responding to the legal documents of theft cases was improved compared with state-of-the-art models without domain knowledge. The results are not only more accurate as a sentencing aid in the judicial trial process, but also more explanatory.


Introduction
With the construction of the rule of law, the large number of cases makes the court overburdened, and the time required to train highly qualified judges makes it impossible to fundamentally solve the contradiction of having too many cases and too few judges by simply increasing trial resources and increasing the number of judges. Therefore, we can use sentencing prediction techniques to reduce the burden on judges. Sentencing prediction is an important aspect of intelligent justice trials that aim to use computer technology to predict sentences from semi-structured adjudication documents.
As shown in Figure 1, the adjudication documents are mainly composed of the court case number, name of the court, defendant information, description of the case, judgment elements, legal articles involved, and trial results. Among the present methods, there are many that use neural networks for sentencing prediction. This can be roughly divided into threedifferent solutions: first, feature extraction of the case description part in the adjudication document; second, regression prediction by connecting a layer of the neural network directly afterwards; third, dividing the sentence into intervals to transform it into a classification problem. However, these strategies only use the case description part of the decision document without using external knowledge such as the law, and these rely on the lack of interpretation of a large amount of data. Huang [1] uses inverse abductive learning, which combines machine learning and logical reasoning, and uses the relationships summarized in the data as external knowledge to solve the problem of a small amount of labeled data, called semi-supervised abductive learning (SS-ABL); yet, it still lacks the explanatory and logical aspects of the trial. Traditional methods use a neural network to directly fit the final sentence, which does not fit the logic and flow of the trial.
Thus, the traditional methods of sentence prediction have the following problems: they do not make full use of the external knowledge of legal features; they do not refer to the process of the judge's trial, and the reference basis of the trial is not clear; the method of judging the sentence is not in line with the process and logic. We conducted the following research to address these issues.
We noticed in this particular area of law that there is some logical and rule-based knowledge, such as legal texts and judicial interpretations. These rules can be used as a priori knowledge and act as a weak supervisor in sentencing prediction, which is not used in the traditional approach, such as: (1) If the clause about intentional homicide appears in the adjudication document, the circumstance of intentional homicide must have appeared in the case of the offender, and then, the sentence will be increased at the trial.
(2) It is clearly stated in the law that different degrees of crime correspond to different levels of sentencing. For instance, when the amount of theft is smaller, the sentence in the law will be a shorter range. When the amount is larger, the sentence is in a longer range. This makes different amounts of theft under the provisions of the law affect the base sentence, and thus the final sentence. Thus, we note that there is a certain logical nature between key circumstance elements, legal provisions, and legal interpretations. This nature can be applied to logical reasoning, allowing it to play a supervisory role in both the identification of key circumstance elements and the calculation of the sentence. If the result of the identification does not conform to the logic in the law, one of the results can be changed to make it conform to that logic to the maximum extent possible. This allows for explanatory correction of misidentified labels and improves recognition rates.
First of all, in order to use this external knowledge, we write the external knowledge in the form of first-order logic to form a logical knowledge base for logical reasoning. Next, we took a theft case as an example and extracted 23 key trial circumstance elements by referring to the legal provisions, as well as the "Guidelines of the Supreme People's Court on Sentencing for Common Crimes" ("Sentencing Guidelines"). These key plot elements are highly informative for trial purposes. Additionally, to design a sentence calculation that is logical for the trial, we refer to the Sentencing Guidelines. There are four main features as follows: (1) the basis for sentencing is the key circumstantial element of the criminal act of the basic crime-constituting facts; (2) the starting point of the sentence is ruled within the range of the sentence in accordance with the provisions of the law; (3) increase or decrease the starting point of sentencing based on other criminal facts affecting the amount, number, consequences, and purpose of the crime; (4) consider the entire case and decide the final sentence to be pronounced. This shows that sentencing needs to be based not only on the basic constituent facts of the case, i.e., the key elements of the case, but also on the totality of the circumstances of the case. In determining the base sentencing, it is necessary to refer to the range of sentences stipulated in the legal provisions. Therefore, combining the above studies and findings to address the lack of trial logic, inadequate use of knowledge of the law, and lack of interpretability in the traditional approach, we propose a new model based on the existing algorithm SS-ABL. The main innovation points are as follows: • Prepared and expanded the legal knowledge base (KB). Not only does it use common sense constraints from data, but it also converts legal texts and judicial interpretations into a form of first-order logic to constitute the knowledge base.

•
Applying knowledge-base-based logical reasoning to the process of key circumstance element identification and sentencing calculation makes it more adequate for both the upstream and downstream tasks of sentencing prediction to provide some oversight.

•
Incorporating the descriptive features of the case and key circumstance elements, a sentencing calculation method that is more in line with the judge's trial process was developed in accordance with the Sentencing Guidelines.
The paper is structured as follows. Relevant work in this area is described in Section 2. Section 3 details the sentencing prediction model that combines domain knowledge. The evaluation and results are presented in Section 4. The conclusions of the paper are presented in Section 5.

Related Work
Case sentencing prediction has been studied by domestic and foreign researchers for many years [2][3][4][5]. Since the last century, researchers have been exploring the possibility of using mathematical methods to quantitatively analyze and predict the behavior of justice. Kort [6] analyzes a number of decided cases to determine the factual elements that affect the verdict, scores those elements using a mathematical formula, and uses the resulting content to aid the judge's decision. Shapira [7] implemented a system for juvenile probation decision-making for probation officers in Israel using a rule-based expert system [8]. Sulea et al. [9] used an integrated model of multiple support vector machines (SVMs) to predict the outcome of French Supreme Court cases based on a large-scale corpus of cases. Liu and Hsieh et al. [10] proposed extracting shallow textual features (e.g., characters, words, and phrases) from adjudication documents for case charge prediction. Katz et al. [11] predicted the outcome of a U.S. Supreme Court decision by using random forest (RF) to extract valid features from the case description.
In early studies, researchers mostly conducted research on sentence prediction by means of rules, mathematical statistics, and machine learning. However, the development of research on automated sentence prediction methods has not been long.
With the development of deep learning techniques, most researchers have studied the task of sentencing prediction research on the idea of adopting text classification [12] by extracting some features. The Challenge of AI in Law 2018 (CAIL2018) and Legal AI Challenge 2021 (LAIC2021) both opened sentencing prediction tracks, and the baseline model used was the traditional text classification method. This approach transforms the sentencing prediction problem into a text classification problem by dividing the sentencing into intervals. Luo et al. [13] proposed a neural network model based on an attention mechanism to predict case charges based on the factual description of the case by introducing relevant legal provisions. Hu et al. [14] blend 10 different legal attributes to predict a small number of shooting and confusion charges. Ye et al. [15] proposed a sequence-to-sequence (Seq2Seq) model under labeling conditions to provide a courtroom view to assist in sentencing based on factual descriptions of criminal cases with coding of charge labels. Wu et al. [16] study the sentencing of environmental rights cases from the perspective of international criminal law and uses convolutional neural networks (CNNs) to determine the sentencing of environmental rights cases. Their results show that the introduction of CNNs improves the effect of the sentencing term prediction model and the fine prediction model significantly. Yang et al. [17] proposed a hierarchical attention network (HAN) that combines static spatial information, short-term motion information, and a long-term video temporal structure for complex human behavior understanding, while Wang et al. [18] later applied HAN to sentencing prediction models by using residual networks to fuse an improved hierarchical attention network (iHAN) and a deep pyramidal convolutional neural network (DPCNN) and proposed a hybrid deep neural network model, hybrid attention and CNN (HAC), to apply HAC to sentencing prediction models. Park et al. [19] used a multi-task deep learning model with an attention mechanism for combining three tasks (accident types, applied articles, and the sentencing of ship accidents) for sentencing prediction. All of the above methods use text classification methods for sentencing prediction. Zhong et al. [20] considered the existence of dependencies between multiple subtasks of legal trials (e.g., applicable legal provisions, charges, fines, and sentencings) and proposed a multitask topological dependency learning model called TOPJUDGE to predict the legal trials of cases. Yang et al. [21] proposed a multiperspective bidirectional feedback network (MPBFN) based on topological dependencies between multiple subtasks and the introduction of combinatorial semantic relations between words to predict the legal trials of cases. Zhou et al. [22] proposed an inverse deduction learning framework to solve the problem that machine learning and logical reasoning are difficult to combine by introducing a knowledge base in the logical reasoning part and using a heuristic trialand-error search algorithm to combine machine learning. Huang et al. [1] used the inverse deduction learning framework to solve the problem of case element identification by taking advantage of the strong logic of the joint trial documents.
The current research on sentencing prediction in general can be divided into two methods: text classification and regression analysis prediction. Text classification methods can be combined with crime prediction and law recommendation to achieve better results. In contrast, regression analysis prediction identifies the elements of the crime or the relevant law for prediction. The textual classification method may focus on the semantic information of the crime elements, but ignores the final calculation of the sentencing, and the regression analysis method focuses on the construction of the regression model, but does not pay attention to the information of the key crime circumstance elements themselves. The above methods do not combine the characteristics of the other methods well, so they do not achieve better results in sentencing prediction.

Methods
In this section, we will describe our proposed model in terms of the logic of the trial: it is divided into three modules, namely knowledge base preparation, element identification, and sentencing calculation. This sentence prediction model is an effective combination, and it is a new contribution of our work. In the following, we will present each of these three components.

Knowledge Base
When we combed through the statutes and adjudication documents, we sorted out approximately 23 key circumstance elements that can also be called trial elements to better use the logical relationships between the statutes, between the statutes and adjudication documents, and within the adjudication documents, referring to the Sentencing Guidelines. See Table 1. Among these are the key circumstance elements, which are elements summarized by the circumstances that have a significant impact on the sentence according to the Sentencing Guidelines. The ratio refers to the number of occurrences of the element as a percentage of the number of occurrences of all elements. With these key elements, we followed this table to extract the corresponding elements in the law, such as theft, multiple theft, etc., and the corresponding sentence ranges. Then, we chose to represent their logical structure using first-order logical relations to form a knowledge base. At the same time, we observed some potential logical relationships among the key plot elements in the data of the adjudication documents when we organized them. As you can see, the legal knowledge base is composed of first-order logical forms bounded by legal rules and common sense.
The key plot elements involved in each judgment document are also present in the legal knowledge base in a first-order logical form. A portion of the legal knowledge base is shown in Figure 2. This figure shows how we parse the law and transform it into a first-order logical form. In Figure 2a, on the left is Article 264 of the Chinese Criminal Law and on the right is its corresponding English version. The blue section is the key circumstance elements, and the red section is the range of sentencing given. In Figure 2b, the part in the box above corresponds to the result of transforming the part into first-order logic where the "score" value is the sentence interval specified in the law. The following part is an example of the common sense constraint. In the legal interpretation document, "pickpocket" is defined as "stealing property carried by another person in a public place or on public transportation." and "burglary" is defined as "entering a room and stealing it secretly." Obviously, "pickpocket" and "burglary" do not exist at the same time. The legal rules are decomposed into sentence intervals corresponding to key circumstance elements through legal provisions and judicial interpretations, which have reference values for subsequent sentencing calculation. For example, in Figure 2b, if the first paragraph of section 264 is met, the "score", which means the range of the sentence, is between 0 and 36 months (sentences are in months in this article). The common sense constraint describes the relationship between key plot elements and has a weak supervisory effect on the subsequent identification of key plot elements.

Key Circumstance Element Identification
In this section, we use a method based on the abductive learning method to identify key plot elements, the flow of which is shown in Figure 3. In the first part, we identified the key plot descriptions in the adjudication documents by the element extraction algorithm. To better obtain the key plot elements of the case and to solve the problem of the insufficient generalization of existing methods, we adopted the BERT [23] pretraining model for the extraction of text features through the comparison of multimodel experiments. The model employs a connection using the bidirectional encoder block layer part of the transformer [24] model, discarding the decoder module so that it automatically has bidirectional coding capability and powerful feature extraction capability. Transformer's encoding sequence uses the idea of a self-attention mechanism that can read the whole text sequence at once. By capturing the global contextual information to establish a long-range dependency on the target, stronger features of the whole text can be extracted.
First, we need to process the text descriptions in the key episodes into the standard input form of BERT. We add the marker "[CLS]" as the start of a sentence and the marker "[SEP]" as the end of a sentence to the entire text. The initial input is W = {W 1 ,W 2 , . . . ,W n }, where W i is a word in a sentence and n is the length of the sentence from each sentence in the judgment elements section. Then, three kinds of embeddings corresponding to W are added together to obtain the BERT input: where E t , E s , E p are the token embeddings, segment embeddings, and position embeddings corresponding to W, respectively. Since this paper is a single-sentence task, the identity vector here is 0, and the inclusion of the position vector considers that the sequential nature of the input sequence has an effect on the information in the text. The final input of BERT is denoted as E input ∈ R n×d , where n denotes the length of the input text and d denotes the dimension of the vector.
A bidirectional transformer encoder block is used for connection in BERT, discarding the decoder module so that it automatically has bidirectional encoding capability and powerful feature extraction capability. In the output of BERT, for each input word embedding, there is a corresponding output, and since the first token of each output sequence is always a special classification, i.e., "[CLS]" token embedding, which does not contain any semantic information, the vector is a collection of information of a whole sentence. We used this vector to identify the key plot elements. It is followed by a fully connected layer that transforms the vector dimension into the length of the number of key plot elements (m) and by sigmoid as the activation function: The probabilities of the key element classification are normalized and transformed into a probability distribution with a 0 to 1 interval. We set a threshold value: and the corresponding key plot element exists when its probability exceeds that value α. Then, we obtain the temporary key plot elements y p , which can also be called pseudo-labels.
In the second part, we used regular expressions to extract from the involved law articles in the adjudication documents. Additionally, we used the pseudo-labels obtained by element identification as the input. The results of their reasoning generated by the legal knowledge base (as a set) are corrected for the provisional key elements by maximizing the consistency of operation to obtain the most likely result as the final key plot element. The maximizing consistency is that after inputting the temporary key elements, the logical reasoning of the legal knowledge base will produce some corrected results that conform to the logical rules, and we only need to select one of them in the end. The basis of selection is to pass the corrected list of key elements through the sentencing calculation module to produce the corresponding sentencing loss and select the one with the smallest loss as the final key circumstance element. Finally, we obtain the corrected label for the pseudo-labels.
The whole algorithmic process of key circumstance element identification presented in this section is shown in Algorithm 1.

Algorithm 1 Key circumstance element identification.
Input: Judgment elements' data X i of judgment document with their labels Y i ; unlabeled data X j with their legal articles involved L j and true length of sentence S j ; legal knowledge base (KB) Output: X j corresponding to label Y j # Update recognizer f 10: return Y j

Sentencing Calculation
To be more in line with the flow of the trial, the sentence calculation module was designed in accordance with the Sentencing Guidelines. As shown in Figure 4, we can see the process for obtaining a base sentence and a final sentence separately. We first combined the amount of money involved in the case obtained from the judgment documents with the sentence range obtained by logical reasoning based on the legal knowledge base to design a method to obtain the base sentence.
where β represents the adjustment of the amount of theft to the starting sentence, m represents the amount of theft in the case, and γ represents the starting sentence in the statute. Since the base sentence is not defined in detail in the Sentencing Guidelines, we can only calculate it based on one of the reference theft amounts, while the base sentence can only be within the range specified in the involved law. Therefore, we constructed a one-dimensional function of the amount to calculate the base sentence. To focus on the key plot elements and consider the global features of the case, we combined the key plot elements with the general features of the case by referring to the self-attention [24] approach. In this way, not only the global features, but also the key circumstances are considered. By combining the information of key plot elements and general features of the case, we generated an attention weight for each part of the text and weighted the text information, which can help us better capture the information related to the key plot in the text paragraph.
We first performed feature extraction on the two texts by BERT as in Section 3.2 and obtained the vector D of common features of the case and the feature vector E of key elements of the case, where both D and E are global information about the respective text.
Then, we put these two vectors through a control attention mechanism to obtain features about the plot descriptions that exacerbate the critical plot element parts.
multiplying the two eigenvectors with the matrix W yields V, K, and Q.
We calculated the dot product between Q and K. Next, for its result to be too large, a constraint operation was performed to divide the result by √ d k , where d k is the dimensionality of the vector K. After that, a softmax layer was added to it to normalize the result and then multiplied by the matrix V to obtain the final weight summation representation Attention. Moreover, the obtained results were plugged into a layer of the neural network, while the base sentence was added for linear regression, and finally, the final sentence was obtained.
The whole algorithmic process of sentencing calculation presented in this section is shown in Algorithm 2.
Algorithm 2 Sentencing calculation. Input: Accident description data D i of judgment document and amount of theft M i ; key circumstance element Y i ; real sentence S i ; unlabeled data D j with their related laws Laws, amount of theft M j and key circumstance element Y j obtained from a previous mission; legal knowledge base (KB) Output: D j corresponding to sentence S j

Experiments and Results
In this section, we carried out experiments on the theft judicial sentencing task to demonstrate that SPITL is able to leverage unlabeled data and symbolic knowledge.

Experimental Setup and Evaluation Index
Since there is no public dataset on evidence extraction from judgment documents, we obtained the relevant judgment documents from the People's Court of Guizhou Province, erased the sensitive information, and used manual annotation to construct the dataset. The dataset contains 3668 adjudication documents of theft cases, and the training sets with 10%, 50%, and 90% of the cases were used for the experiments. To better measure the performance of the model, we adopted the Monte Carlo cross-validation method by randomly dividing the dataset into training and validation sets each time, so that threetime separate model training and validations were performed, and finally, these validation results were averaged as the validation error of this model. We set the length of the input text to 512, the size of the hidden layer to 768, the threshold α to 0.25, β in the base penalty to 0.065, and γ to the minimal sentence, where β is mainly used to reduce the amount of integers to decimals in order to facilitate the calculation by the neural network and γ is used to obtain the starting penalty specified in different laws through logical reasoning. In particular, the effect of the α value on the results is relatively large, and we will analyze its effect in Section 4.4. Experiments ran on a workstation with AMD EPYC 7742 64-Core Processor, 40 GB memory, and the A100-SXM4-40GB GPU.
We chose to use the mean absolute error (MAE), mean-squared error (MSE), rootmean-squared error (RMSE), and R2Score (coefficient of determination R 2 ) as the evaluation criteria. The MAE is a loss function used in regression models and is the sum of the absolute values of the differences between the target and predicted values, which measures the average modal length of the error between the predicted and target values. It can better reflect the actual situation of prediction value error. The formula is as follows, where y is the true value andŷ is the predicted value: the MSE is the sum of squares between the predicted and true values, which is slightly simpler to calculate than the MAE, but has strong robustness. It can be used to measure the dispersion of the difference between the predicted and true values. The formula is as follows, where y is the true value andŷ is the predicted value: The RMSE is the square root of the ratio of the sum of squares of deviations of observations from the true value to the number of observations n. It is used to measure the deviation of the observed value from the true value and is often used as a measure of the prediction results of machine learning models. The formula is as follows, where y is the true value andŷ is the predicted value: The R2Score is generally used in regression models to assess the degree of conformity between predicted and actual values and is defined as follows: where y is the true value,ŷ is the predicted value,ȳ is the average of the true values, FVU is called the fraction of variance unexplained, RSS is called the residual sum of squares, and TSS is called the total sum of squares. In general, the closer R2Score is to 1, the better the model fitting effect.

Result Analysis
We chose five baseline models for the comparison experiments: • TFIDF-SVM: We transformed sentence prediction into a classification problem by dividing the sentence intervals into one class. The text is first divided into words, then feature extraction is performed using TFIDF, classification is performed using SVM [25], and the median of the intervals is used as the prediction result. • CNN [26]: Using a CNN-based classification model with multiple filter window widths for text classification, we transformed sentence prediction into a classification problem by taking the final sentence as the middle value of the classification interval. • LSTM-FC [27]: A vectorized representation of the text using LSTM is followed by a linear layer for sentencing prediction. • BERT-FC [23]: Feature extraction is performed on the text using BERT, and sentence prediction is performed in a linear layer. • SS-ABL [1]: Sentence prediction directly using the inverse abductive learning framework.
Comparative experiments are shown in Table 2.
We can see that comparing the three models with different training set proportions, our model is better in terms of key element recognition and sentence calculation. Moreover, when the training set is 90%, the error of our model in the prediction of the prison sentence is within three months, reaching 2.6 months. Although the MAE of the SS-ABL model is only slightly lower than that of our model when the training set accounts for 10%, the MSE and RMSE are higher and show that the prediction result fluctuates greatly and does not have good stability. Again, we can see from the R2Score that the SPITL model is better. For the R2Score, the closer to 1, the better the fit of the model is and the better the model is. It can also be seen from the table that the R2Score of the SPITL model is always closer to 1 than those of the other models for the same training ratio. Especially when the training accounts for a relatively small amount of time, the difference with other models is more obvious, which should be played by the use of external knowledge through logical reasoning. From the above experiments, it is clear that SPITL outperforms the traditional model. Since most traditional models are based on using data to adjust parameters to fit the results, the accuracy is not particularly high when the amount of data is small. However, this model combines machine learning and logical reasoning and uses logical reasoning to constrain and change the results of recognition when the amount of data is small, resulting in a higher recognition rate and a smaller error in the sentence prediction. For example, when it is known that the case involves Article 17 of the criminal code (About minors), then the key circumstance element must contain the "Minor", and if the identified pseudo-label does not contain that label, it is modified to contain that label.
In addition, we verified the validity of the sentence calculation method. As shown in Table 3, when key plot elements were obtained, we used different calculation methods for comparison. This means that we used different methods for sentence calculation when the correct key elements are known. We can see that comparing some of the common methods of calculating sentences, our calculation method is superior to the other methods. When having more training sets, the MAE, MSE, and RMSE values are smallest, and obviously, our model is more accurate and stable. Furthermore, at the same training ratio, our model has a higher R2Score value, closer to 1. From this perspective, the sentence calculation method of our model is better than the others.
With the increase in the amount of data, the accuracy of traditional identification improved, but in the calculation of sentences, it cannot absolutely rely on the key circumstance elements, but needs to refer to the whole process of the case, such as the motive for the crime, the means of the crime, etc. Therefore, the new sentence calculation model we used will be more accurate with less error.

Chi-Squared Test
In order to check whether there is a significant difference between the actual (O) and expected results (E), we used the mathematical statistics method of the chi-squared test to perform hypothesis testing. First, we established the test hypothesis and determined the test level.
Meanwhile, we set the significance level α to 0.05. We calculated the cardinality (χ 2 ) by predicting the sentence and the true sentence.
We used the SPITL results after the 90% training set to perform the operation and obtained a χ 2 value of 340.8386. Since we know that its degree of freedom is the number of data minus one, i.e., (n−1), we can obtain a right-tailed probability of cardinality (p-value P) of 0.8231. Obviously, P = 0.8231 > α = 0.05 so we accept hypothesis H 0 . From the above, it can be seen that the fit of the predicted values with our method to the true values is high. It also proves that the method of this paper is credible, valid, and accurate. To see more intuitively the fit between the predicted and true values, we plotted them using a graph, as shown in Figure 5, where acc is the actual value and pre is the predicted value.

Hyperparameter α
In the model, α is used as a threshold in the identification of key plot elements and has an impact on the results of identification. Therefore, we analyzed the effect of the choice of α values on the recognition results when key plot elements were identified. Again, the experiments were compared at a training ratio of 90% of the settings. Referring to Figure 6, we can see that different α values still have an impact on the accuracy of key plot elements' recognition. At 0.25, the F1-value of recognition is the highest.

Conclusions
In this paper, we proposed a sentencing prediction model framework called SPITL. In this framework, our core innovations were the first-order logicalization of legal texts and their legal interpretations and adding to the legal knowledge base, which makes logical reasoning more adequate to supervise machine learning, and the design of a sentence calculation method that is more consistent with the trial process. The model is set-associative and consists of two subtasks, key circumstance element identification and sentence calculation. It was experimentally demonstrated that this model not only solves the problem of few labeled data by an abductive learning framework in sentence prediction, but is also more accurate and explanatory in sentence calculation. Despite our model achieving some results on theft cases, it is still challenging to migrate to other cases of different categories. Since different cases have different focuses, there will be more laws involved, and the reasoning will be more complicated and may take longer. In our future work, we will work on finding a better way to integrate external knowledge into the neural network training process or designing an objective function to improve inference efficiency and shorten inference time.