Predicting Critical Path of Labor Dispute Resolution in Legal Domain by Machine Learning Models Based on SHapley Additive exPlanations and Soft Voting Strategy

Abstract: The labor dispute is one of the most common civil disputes. It can be resolved through the following steps in order: mediation in arbitration, arbitration award, first-instance mediation, first-instance judgment, and second-instance judgment. The process can cease at any step once the dispute is successfully resolved. In recent years, due to the increasing rights awareness of employees, the number of labor disputes has been rising annually. However, resolving labor disputes is time-consuming and labor-intensive, which places a heavy burden on employees and dispute resolution institutions. Using artificial intelligence algorithms to identify and predict the critical path of labor dispute resolution helps save resources, improve efficiency, and reduce the cost of dispute resolution. In this study, a machine learning approach based on SHapley Additive exPlanations (SHAP) and a soft voting strategy is applied to predict the critical path of labor dispute resolution. We name our approach LDMLSV (short for Labor Dispute Machine Learning based on SHapley additive exPlanations and Voting). This approach employs three machine learning models (Random Forest, Extra Trees, and CatBoost) and then integrates them using a soft voting strategy. Additionally, SHAP is used to explain the model and analyze the feature contribution. Based on the ranking of feature importance obtained from SHAP and an incremental feature selection method, we obtained an optimal feature subset comprising 33 features. LDMLSV achieves an accuracy of 0.90 on this optimal feature subset. These results indicate that the proposed approach is effective for predicting the critical path of labor dispute resolution.


Introduction
With the increasing awareness of labor rights among employees, the number of labor disputes in China has been increasing year on year [1]. The large volume of labor dispute cases imposes a heavy burden on both employees and dispute resolution institutions. Labor disputes can be resolved through both non-litigation and litigation methods. Specifically, employees can sequentially utilize five methods (mediation in arbitration, arbitration awards, first-instance mediation, first-instance judgments, and second-instance judgments) to resolve disputes until achieving a satisfactory outcome [2]. In practice, however, it is not known in advance which critical path a case will take. As a result, resolution may consume considerable time and resources and can lead to protracted legal disputes. Predicting the critical path of resolving labor disputes assists employees and dispute resolution institutions in making appropriate decisions. It may therefore expedite the dispute resolution process, save dispute resolution resources, and reduce dispute-related costs, thereby alleviating the burden on both the dispute resolution institutions and employees.
The introduction of Artificial Intelligence (AI) techniques has brought new opportunities to the legal domain. AI can relieve legal professionals of repetitive tasks (e.g., legal judgment prediction [3][4][5], legal question answering [6][7][8], and legal case retrieval [9][10][11]), freeing time for more valuable work. Dispute resolution is a crucial component of the legal domain, and researchers have paid considerable attention to how artificial intelligence can be utilized to address dispute-related issues, such as predicting dispute occurrences, dispute resolution methods, and dispute resolution outcomes. Chou et al. [12] proposed an integrated method of a Support Vector Machine, Artificial Neural Network, and decision tree C5.0 to predict the occurrence of disputes at the initiation stage of Public-Private Partnership projects, achieving an accuracy of 84.33%. Ayhan et al. [13] used majority voting to predict the occurrence of disputes and reached an accuracy of 91.11%, which supports the effectiveness of machine learning for the early prediction of disputes. Tsurel et al. [14] used XGBoost to predict the outcome of e-commerce disputes, determining whether the buyer or seller would prevail, and achieved an accuracy of 86%.
Several studies have also addressed the prediction of dispute resolution methods. Lokanan [15] used a machine learning algorithm to resolve financial fraud disputes, treating disciplinary hearings as a binary classification problem between settlement and contested hearings, and achieved 99% accuracy using the Gradient Boosting classifier. Chou et al. [16] proposed a hybrid artificial intelligence system that combines fuzzy logic, a fast messy genetic algorithm, and support vector machines. This system treats project dispute resolution as a five-class classification problem, encompassing mediation, arbitration, litigation, negotiation, and administrative appeals, achieving an accuracy of 77.04%. Ayhan et al. [17] treated the resolution of construction project disputes as a six-class classification problem, with input variables encompassing factors influencing dispute resolution. The output variables included six dispute resolution methods: litigation, arbitration, dispute review boards, mediation, senior executive appraisal, and negotiation. They conducted attribute reduction using the Chi-square test and employed an ensemble classifier, achieving an accuracy of 89.44% through ten-fold cross-validation. However, to the best of our knowledge, there is no prediction model for labor dispute resolution methods, and the need for an AI method of predicting labor dispute resolution is becoming increasingly apparent.
In this study, we introduce a prediction model called LDMLSV (short for Labor Dispute Machine Learning based on SHapley additive exPlanations and Voting). LDMLSV focuses on utilizing machine learning algorithms to predict the critical path of resolving labor disputes. We obtained 1255 legal documents from the court and arbitration committee in the Yuhu district of Xiangtan city of China, which include legal documents of mediation in arbitration, arbitration awards, first-instance mediation, first-instance judgments, and second-instance judgments. The resolution of labor disputes progresses sequentially through mediation in arbitration, arbitration awards, first-instance mediation, first-instance judgments, and second-instance judgments. The process can cease at any step once the dispute is successfully resolved. Consequently, this forms five paths of labor dispute resolution of lengths 1, 2, 3, 4, and 5. Because no stage can be skipped, these five paths can be distinguished by predicting only the last step. Therefore, we can consider the prediction of labor dispute resolution paths as a five-class classification problem for predicting the ultimate resolution method. Firstly, we compared the classification performance of 10 machine learning algorithms under multiple sample-balancing methods. Leveraging classifiers with an accuracy greater than 0.85, an ensemble method based on a soft voting strategy was used to predict the critical path for labor dispute resolution. Secondly, we applied a post hoc explanation method called SHapley Additive exPlanations (SHAP) [18,19], and importance scores for all features were computed to reveal the decision logic behind the model. Then, Incremental Feature Selection (IFS) [20] and Jackknife cross-validation were employed to select optimal features. The predictive outcomes of the optimal feature subset were compared with those of the original dataset on the soft voting classifier.
The main contributions of this work are listed in the following bullet points.
• This work provides a more effective and efficient way to predict the critical path of labor dispute resolution. This prediction helps judges, lawyers, and relevant stakeholders gain a better understanding of possible case development trends, enabling them to make wiser decisions.
• Predicting the critical path of labor dispute resolution aids in seeking effective solutions, significantly reducing both the time and costs associated with legal procedures.
• LDMLSV also aids in better resource allocation within the judicial system. It can assist courts in managing caseloads more effectively, prioritizing cases that might have a greater impact, thereby enhancing judicial efficiency and fairness.
• Overall, the contribution of predicting the critical path of labor dispute resolution lies in providing a tool and method that can facilitate a more efficient and equitable resolution of labor disputes within the judiciary, while optimizing resource utilization.
The organization of this work is as follows: Section 2 introduces the data sources, methods of data preprocessing, model framework, and machine learning interpretation tools. Section 3 presents the results, while Section 4 discusses these findings. Section 5 summarizes the primary discoveries of this work and outlines important directions for future endeavors.

Dataset Description
The labor dispute dataset consists of 1255 legal documents from the court and arbitration committee in the Yuhu district of Xiangtan city of China, spanning from 2014 to 2022, as illustrated in Figure 1. Among these documents, there are 93 documents of mediation in arbitration, 72 documents of arbitration awards, 456 documents of first-instance mediation, 362 documents of first-instance judgment, and 272 documents of second-instance judgment. Each document was assigned 57 attributes as features (see Appendix A Table A1 for details).

Data Preprocessing

Corpus Annotation
For the 1255 cases, the BRAT Rapid Annotation Tool (BRAT, version 1.3) [21] was used to annotate all attributes. BRAT is an annotation tool that supports Chinese and can be downloaded from https://github.com/nlplab/brat/releases/tag/v1.3p1 (accessed on 12 August 2023). After annotation, attributes were transformed into numerical representations suitable for machine learning, as described in Table A1 of Appendix A, while the five methods of dispute resolution (that is, the five critical paths of labor dispute resolution) were encoded as 0, 1, 2, 3, and 4.
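The label encoding can be illustrated with a short Python sketch. It assumes that the codes 0 to 4 follow the procedural order of the five steps (consistent with the note to Table 4); the names `STEPS` and `path_for_label` are hypothetical and are not taken from the original implementation.

```python
from typing import List

# Ordered resolution steps; the index of each step is assumed to be its class code (0-4).
STEPS = [
    "mediation in arbitration",
    "arbitration award",
    "first-instance mediation",
    "first-instance judgment",
    "second-instance judgment",
]


def path_for_label(label: int) -> List[str]:
    """Return the full resolution path implied by the class code of the final step."""
    if not 0 <= label < len(STEPS):
        raise ValueError(f"label must be in 0..{len(STEPS) - 1}, got {label}")
    # No step can be skipped, so the path is every step up to and including the last one.
    return STEPS[: label + 1]


# A case that ends with a first-instance judgment (class 3) has a path of length 4.
example_path = path_for_label(3)
print(len(example_path), " -> ".join(example_path))
```

Because each path is fully determined by its final step, predicting the class code is sufficient to recover the whole path.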

Feature Scaling
Each case is described by 57 attributes. Compared with other attributes, attributes such as employees' ages and lawsuit amounts vary over a much larger range, which would impact the effectiveness of model training. Normalization maps all features to a similar range, which is crucial for data that contain highly diverse values. MinMaxScaler normalization has proven to be very effective for processing high-dimensional data. MinMaxScaler scales every labor dispute feature to a value between 0 and 1 through the following formula:
\[ v' = \frac{v - v_{\min}}{v_{\max} - v_{\min}} \tag{1} \]
where $v$ is the original value of the considered feature, $v'$ is its scaled value, and $v_{\min}$ and $v_{\max}$ represent the minimum and maximum value of the considered feature, respectively.
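As an illustration of min-max scaling, the following pure-Python sketch rescales a single feature column to the range [0, 1]. The function name `min_max_scale` and the sample values are hypothetical; the study itself relies on MinMaxScaler, and in practice the minimum and maximum would normally be estimated on the training data only.

```python
from typing import List


def min_max_scale(values: List[float]) -> List[float]:
    """Scale one feature column to [0, 1] using its own minimum and maximum."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        # A constant column has zero range; map it to 0.0 to avoid division by zero.
        return [0.0 for _ in values]
    return [(v - v_min) / (v_max - v_min) for v in values]


# Hypothetical raw values on very different scales.
ages = [23.0, 35.0, 47.0, 59.0]
amounts = [5000.0, 20000.0, 125000.0]

scaled_ages = min_max_scale(ages)
scaled_amounts = min_max_scale(amounts)
print(scaled_ages)
print(scaled_amounts)
```

After scaling, both columns lie in the same range, so neither dominates distance-based steps such as oversampling.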

Oversampling for Dataset
As can be seen from Figure 1, the labor dispute dataset is highly unbalanced, with more than six times as many first-instance mediations as arbitration awards. To enhance predictive performance and alleviate the impact of sample imbalance, we opted for the KMeansSMOTE [22] oversampling method to balance samples, comparing it against three other oversampling techniques: Synthetic Minority Over-sampling Technique (SMOTE) [23], Adaptive Synthetic Sampling (ADASYN) [24], and Support Vector Machine Synthetic Minority Over-sampling Technique (SVMSMOTE) [25]. Figure 2 illustrates the sample distribution before and after KMeansSMOTE when selecting an automatic sampling strategy.
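The interpolation step shared by SMOTE-style methods can be sketched as follows. This is a simplified illustration of plain SMOTE interpolation only, not of KMeansSMOTE itself, which additionally clusters the samples with K-means before oversampling within clusters. The function name `smote_like_sample` and the toy points are hypothetical.

```python
import random
from typing import List, Sequence


def smote_like_sample(
    minority: Sequence[Sequence[float]], k: int, rng: random.Random
) -> List[float]:
    """Create one synthetic point on the segment between a minority sample
    and one of its k nearest minority-class neighbors."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    if k < 1:
        raise ValueError("k must be at least 1")
    base = rng.choice(list(minority))

    def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Rank the other minority samples by distance to the base point.
    others = [p for p in minority if p is not base]
    neighbors = sorted(others, key=lambda p: sq_dist(base, p))[:k]
    neighbor = rng.choice(neighbors)
    lam = rng.random()  # interpolation weight in [0, 1)
    return [b + lam * (n - b) for b, n in zip(base, neighbor)]


# Hypothetical, already scaled minority-class samples with two features.
minority = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.3], [0.9, 0.9]]
rng = random.Random(0)
synthetic = [smote_like_sample(minority, 2, rng) for _ in range(5)]
print(synthetic)
```

Each synthetic point is a convex combination of two existing minority samples, so it always stays inside the region already covered by the minority class.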

Model Architecture
In this study, we apply a machine learning approach named LDMLSV, based on SHAP and a soft voting strategy, to predict the critical path of labor dispute resolution. Figure 3 illustrates the entire workflow.


Soft Voting Strategy
Ensemble learners utilize two or more classifiers to create a model that can provide more accurate predictions. A voting classifier is a type of ensemble learner commonly used for classification problems [36]. The voting classifier can employ two strategies: hard voting and soft voting. Hard voting selects the class predicted by the majority of classifiers, whereas soft voting predicts the output class based on the probabilities assigned to classes by the classifiers. Because soft voting takes the prediction probabilities into account, it can generate more accurate predictions. Equation (2) provides the definition:
\[ \hat{y} = \arg\max_{i \in \{0, 1, \ldots, n\}} \frac{1}{m} \sum_{j=1}^{m} p_{ij} \tag{2} \]
where $i$ is the value of the class encoding, $n + 1$ is the number of classes, $m$ is the number of classifiers, and $p_{ij}$ represents the probability that the $j$-th classifier predicts the $i$-th class.
Figure 4 provides an illustration of soft voting.
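The difference between the two voting strategies can be sketched in a few lines of Python. The probability vectors below are hypothetical and are chosen so that the two strategies disagree; `soft_vote` and `hard_vote` are illustrative helper names, and the list `probas[j][i]` holds the probability that classifier j assigns to class i.

```python
from collections import Counter
from typing import Sequence


def soft_vote(probas: Sequence[Sequence[float]]) -> int:
    """Return the class with the highest mean probability across classifiers."""
    m = len(probas)
    if m == 0:
        raise ValueError("at least one classifier is required")
    n_classes = len(probas[0])
    if any(len(p) != n_classes for p in probas):
        raise ValueError("all classifiers must output the same number of classes")
    mean = [sum(p[i] for p in probas) / m for i in range(n_classes)]
    return max(range(n_classes), key=lambda i: mean[i])


def hard_vote(probas: Sequence[Sequence[float]]) -> int:
    """Return the class predicted by the largest number of classifiers."""
    votes = [max(range(len(p)), key=lambda i: p[i]) for p in probas]
    return Counter(votes).most_common(1)[0][0]


# Hypothetical class probabilities from three classifiers over five classes.
probas = [
    [0.05, 0.05, 0.46, 0.40, 0.04],  # weakly prefers class 2
    [0.05, 0.05, 0.45, 0.42, 0.03],  # weakly prefers class 2
    [0.02, 0.02, 0.10, 0.84, 0.02],  # strongly prefers class 3
]
print("soft:", soft_vote(probas), "hard:", hard_vote(probas))
```

In this example hard voting returns class 2 by two votes to one, while soft voting returns class 3 because the third classifier is far more confident than the other two.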

Explainable Artificial Intelligence Methods Based on SHAP
Shapley values [18], introduced by Shapley in 1953, are a concept from game theory used to measure a fair distribution of rewards among a group based on players' contributions to a particular outcome. In 2017, Lundberg and Lee [19] extended this game theory concept to explainable artificial intelligence and introduced SHAP. The introduction of SHAP has been beneficial for transitioning machine learning models from black-box models to glass-box models, enhancing their interpretability. In SHAP, the machine learning model is viewed as the set of game rules, and the input features are considered as potential players. The SHAP values can be calculated as follows:
\[ \phi_i = \sum_{S \subseteq F \setminus \{x_i\}} \frac{|S|!\,(p - |S| - 1)!}{p!} \left[ f_{S \cup \{x_i\}}\left(x_{S \cup \{x_i\}}\right) - f_S\left(x_S\right) \right] \tag{3} \]
where $F = \{x_1, x_2, \ldots, x_p\}$ is the set of all features and $p$ is the number of features. $F \setminus \{x_i\}$ denotes the removal of $x_i$ from $F$. Specifically, the marginal contribution of $x_i$ is the weighted average of $f_{S \cup \{x_i\}}(x_{S \cup \{x_i\}}) - f_S(x_S)$ over all subsets $S \subseteq F \setminus \{x_i\}$.
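To make the definition concrete, the sketch below computes exact Shapley values by brute force, weighting each marginal contribution by |S|!(p − |S| − 1)!/p!. It is only an illustration under stated assumptions: `toy_value` is a hypothetical stand-in for the model output restricted to a feature subset, the feature names are chosen for readability, and the enumeration is exponential in the number of features, which is why practical explainers avoid it.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, List


def shapley_values(
    features: List[str], value: Callable[[FrozenSet[str]], float]
) -> Dict[str, float]:
    """Exact Shapley values by enumerating every subset S of the other features."""
    p = len(features)
    phi: Dict[str, float] = {}
    for x in features:
        others = [f for f in features if f != x]
        total = 0.0
        for size in range(p):  # |S| ranges from 0 to p - 1
            weight = factorial(size) * factorial(p - size - 1) / factorial(p)
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of x when added to the subset S.
                total += weight * (value(s | {x}) - value(s))
        phi[x] = total
    return phi


def toy_value(s: FrozenSet[str]) -> float:
    """Hypothetical model output when only the features in s are known."""
    out = 0.0
    if "no_contract" in s:
        out += 0.3
    if "lawsuit_amount" in s:
        out += 0.2
    if "no_contract" in s and "lawsuit_amount" in s:
        out += 0.1  # interaction term, split equally between the two features
    return out


features = ["no_contract", "lawsuit_amount", "sex"]
phi = shapley_values(features, toy_value)
print(phi)
```

The resulting values sum to the difference between the output with all features and the output with none, and the feature that never changes the output receives a value of zero.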

The Optimal Feature Set Obtained from SHAP
An ordered feature ranking, denoted as $A$ in Equation (4), can be obtained according to the SHAP values:
\[ A = [a_1, a_2, \ldots, a_N] \tag{4} \]
where $a_t$ is the feature ranked $t$-th and $N$ is the total number of features. The more important the feature, the smaller its corresponding index $t$.
To determine the optimal feature set in $A$, we construct $N$ feature sets by incrementally adding one feature at a time, following the Incremental Feature Selection (IFS) method proposed by Huang et al. [20], as shown in Equation (5):
\[ A_t = [a_1, a_2, \ldots, a_t], \quad 1 \le t \le N \tag{5} \]
For each of the $N$ feature sets, a predictor is trained and evaluated in turn, and an IFS table containing the number of features and the corresponding performance is obtained by calculating the Matthews Correlation Coefficient (MCC) under Jackknife cross-validation. The subset corresponding to the highest MCC is the optimal feature set.
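The IFS loop can be sketched as follows. The scoring function is passed in as a parameter; in the study it would be the MCC of the classifier under Jackknife cross-validation, whereas here a hypothetical additive `toy_score` stands in for it so that the example is self-contained. The feature names and gains are invented for illustration.

```python
from typing import Callable, List, Sequence, Tuple


def incremental_feature_selection(
    ranked: Sequence[str], score: Callable[[List[str]], float]
) -> Tuple[List[str], float, List[Tuple[int, float]]]:
    """Score the nested subsets ranked[:1], ranked[:2], ... and keep the best one."""
    if not ranked:
        raise ValueError("the feature ranking must not be empty")
    table: List[Tuple[int, float]] = []
    for t in range(1, len(ranked) + 1):
        table.append((t, score(list(ranked[:t]))))
    # max() returns the first maximum, so ties favor the smaller subset.
    best_t, best_score = max(table, key=lambda row: row[1])
    return list(ranked[:best_t]), best_score, table


# Hypothetical per-feature gains: the last two features only add noise.
GAIN = {"f1": 0.5, "f2": 0.3, "f3": 0.05, "f4": -0.02, "f5": -0.03}


def toy_score(subset: List[str]) -> float:
    """Stand-in for a cross-validated MCC; simply sums the hypothetical gains."""
    return sum(GAIN[f] for f in subset)


ranking = ["f1", "f2", "f3", "f4", "f5"]  # most important feature first
best_subset, best_score, ifs_table = incremental_feature_selection(ranking, toy_score)
print(best_subset, round(best_score, 2))
```

The returned table corresponds to the IFS table described above: one row per subset size, from which the size with the highest score is selected.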

Performance Evaluation Metrics
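The prediction of critical paths of labor dispute resolution can be considered as a five-class classification problem, and we evaluated the performance using four metrics: accuracy, precision, recall, and F1-score.
\[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \]
\[ \text{Precision} = \frac{TP}{TP + FP} \]
\[ \text{Recall} = \frac{TP}{TP + FN} \]
\[ \text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \]
where TP stands for True Positives, TN stands for True Negatives, FP stands for False Positives, and FN stands for False Negatives.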
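For a multi-class problem, these counts are usually taken per class in a one-vs-rest fashion. The sketch below computes overall accuracy and per-class precision, recall, and F1-score from two label lists; the labels are hypothetical, and how the per-class values are averaged into a single figure (for example, macro or weighted averaging) is not specified here and is left out.

```python
from typing import Dict, Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted class equals the true class."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def per_class_metrics(
    y_true: Sequence[int], y_pred: Sequence[int], labels: Sequence[int]
) -> Dict[int, Dict[str, float]]:
    """One-vs-rest precision, recall, and F1-score for each class label."""
    metrics: Dict[int, Dict[str, float]] = {}
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        metrics[c] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics


# Hypothetical true and predicted class codes for ten cases.
y_true = [0, 0, 1, 1, 2, 2, 2, 3, 4, 4]
y_pred = [0, 1, 1, 1, 2, 2, 3, 3, 4, 4]

acc = accuracy(y_true, y_pred)
report = per_class_metrics(y_true, y_pred, labels=[0, 1, 2, 3, 4])
print(acc, report[1])
```

Reporting the metrics per class, as in Table 3, makes it visible when a good overall accuracy hides a poorly predicted minority class.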

The Experimental Results of Hyperparameter Tuning
Hyperparameter optimization is a crucial step in improving model generalization, reducing overfitting, and enhancing the classification performance. In this study, GridSearchCV with 10-fold cross-validation was employed to obtain the optimal hyperparameter values for the base models. Table 1 lists the tuned hyperparameter values for the base classifiers when employing the KMeansSMOTE sample balancing method.

Comparison between Base Classifiers and the Soft Voting Classifier
In this study, four oversampling methods were employed to balance the samples. Evaluation of ten base classifiers was conducted using the test set, and those with an accuracy exceeding 0.85 were selected to be integrated into a soft voting classifier, as depicted in Table 2. Under the KMeansSMOTE oversampling method, the ensemble soft voting classifier comprising RF, ET, and CatBoost exhibited the best predictive performance, achieving an accuracy of 0.89. For all performance evaluation metrics, including accuracy, precision, recall, and F1-score, the soft voting classifier based on RF, ET, and CatBoost outperformed individual classifiers. Additionally, the soft voting classifier based on RF, ET, and CatBoost surpassed the soft voting classifier based on RF, ET, and XGBoost, as well as other classifier ensembles, across all evaluated performance metrics.
Since the prediction of critical paths of labor dispute resolution is a multi-class classification problem, it is crucial to avoid situations where the overall prediction is good while predictions for certain categories are poor. Table 3 presents the predictive results for each class. The results indicate that the soft voting classifier exhibits similar performance across these five dispute resolution paths, with F1-scores all surpassing 0.8. The soft voting classifier therefore performs well on every path, not only on the majority classes. Different oversampling strategies can significantly impact the final classification results. We kept classes 2, 3, and 4 fixed at the maximum class count, and then adjusted the ratios of minority classes 0 and 1. Table 4 presents the results of the soft voting classifier for both the unadjusted ratio and selected ratios of 0.25, 0.5, and 1. From the results, it is evident that as the sampling ratio increases, there is an upward trend in the predictive outcomes for the minority classes.

Model Interpretation Based on SHAP
Compared to other classifiers, the soft voting classifier based on RF, ET, and CatBoost demonstrates superior performance. In this study, SHAP is employed to interpret and analyze the predictions of four models (RF, ET, CatBoost, and the soft voting classifier), thereby deducing the crucial features influencing the models. RF, ET, and CatBoost utilize TreeExplainer for analysis, and the VotingClassifier employs a KernelExplainer. To obtain a global view of feature importance, a summary plot is employed. Figure 5 illustrates the top 20 most important features for each of these four classifiers. The features are arranged from top to bottom, with each row representing a specific feature. For each base classifier, different colors are used to denote the contribution of that feature to various categories. Given that VotingClassifier is an amalgamation of individual classifiers, the overall contribution is considered instead of categorical distinctions.

The Optimal Feature Set Based on SHAP
SHAP, in addition to explaining the model, can also be utilized for feature selection. Appendix A Table A2 presents the SHAP results of the soft voting classifier. Based on this importance ranking, we employed IFS to construct 57 feature subsets. Furthermore, we conducted jackknife cross-validation on the training set and computed the MCC. Through this calculation, we determined that the optimal feature set is the one containing the top 33 features sorted by SHAP feature importance, as shown in Figure 6. When the number of features is 33, the MCC reaches its highest value of 0.8540. Retraining the soft voting classifier with the optimal feature subset achieves an accuracy of 0.90. From Table 5, it can be seen that the soft voting classifier performs better on the optimal feature subset containing 33 features than on the original dataset containing 57 features. Despite the reduction in the number of features, the performance of the model is improved. This indicates that SHAP is an efficient method for dimensionality reduction and for eliminating redundancy.
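The exact multi-class form of the MCC is not spelled out in the text. A minimal sketch, assuming the common multi-class generalization computed from the number of correct predictions and the per-class true and predicted counts, is given below; `multiclass_mcc` is an illustrative name, and for two classes the expression reduces to the familiar binary MCC.

```python
from collections import Counter
from math import sqrt
from typing import Sequence


def multiclass_mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Multi-class Matthews Correlation Coefficient from label sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    s = len(y_true)                                          # total number of samples
    c = sum(1 for t, p in zip(y_true, y_pred) if t == p)     # correctly predicted samples
    true_counts = Counter(y_true)                            # t_k: times class k truly occurs
    pred_counts = Counter(y_pred)                            # p_k: times class k is predicted
    labels = set(true_counts) | set(pred_counts)
    cross = sum(pred_counts[k] * true_counts[k] for k in labels)
    denom = sqrt(
        (s * s - sum(v * v for v in pred_counts.values()))
        * (s * s - sum(v * v for v in true_counts.values()))
    )
    if denom == 0:
        # Degenerate case, e.g. every sample is predicted as the same class.
        return 0.0
    return (c * s - cross) / denom


# Hypothetical labels: a perfect prediction and a partly wrong binary prediction.
perfect = multiclass_mcc([0, 1, 2, 3, 4], [0, 1, 2, 3, 4])
partial = multiclass_mcc([1, 1, 0, 0], [1, 0, 0, 0])
print(perfect, round(partial, 4))
```

Unlike accuracy, this coefficient accounts for the class distribution of both the true and the predicted labels, which makes it a reasonable selection criterion on imbalanced data.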

Comparison with Other Methods
Research regarding the use of artificial intelligence to predict dispute resolution methods is still limited. Here, we apply two existing models, each originally proposed for a different type of dispute, to the problem of labor dispute resolution and compare their performance. Lokanan [15] utilized a Gradient Boosting classifier to predict resolution methods for financial fraud disputes, while Ayhan et al. [17] integrated decision tree C4.5, Naïve Bayes, and Multilayer Perceptron into a majority voting classifier to predict resolution methods for construction project disputes. We compare these two approaches with LDMLSV on our dataset. Table 6 presents the results of the comparison. The experimental results indicate that LDMLSV is better suited to our problem.

Discussion
An increasing body of research suggests that utilizing artificial intelligence algorithms to identify and predict critical paths in labor dispute resolutions contributes to efficiency improvements, resource conservation, and cost reduction in this domain. This study introduces a method combining a SHAP-based analysis and soft voting for predicting critical paths in labor dispute resolutions. Given the highly imbalanced nature of labor dispute samples, we opted for the KMeansSMOTE oversampling method and compared it with SMOTE, SVMSMOTE, and ADASYN. The results indicate that, except for the Gaussian Naive Bayes classifier, the classifiers performed better under KMeansSMOTE oversampling than under the other three oversampling methods. This can be attributed to KMeansSMOTE's initial clustering of samples using K-means, followed by SMOTE oversampling within each cluster. This method pays more attention to samples near the boundaries between different classes, facilitating more accurate synthetic sample generation while introducing less noise than the other oversampling methods.
Among the base classifiers, RF, ET, XGBoost, and CatBoost achieved accuracies exceeding 0.85. We integrated classifiers with accuracies above 0.85 using a soft voting strategy. A comparison between the soft voting classifier integrating RF, ET, and XGBoost and the soft voting classifier integrating RF, ET, and CatBoost revealed superior performance in predicting critical paths in labor disputes for the RF, ET, and CatBoost ensemble. Additionally, the ensemble of RF, ET, XGBoost, and CatBoost using a soft voting strategy did not perform as well as the RF, ET, and CatBoost ensemble.
SHAP, based on the Shapley values from cooperative game theory, offers more precise and stable explanations for models by mathematically measuring the contribution of each feature to predictions. Analyzing the results of four models (RF, ET, CatBoost, and the soft voting classifier) provided insights into the contributions of features to the outcomes. Across these models, No Employment Contract showed the highest contribution to outcomes, followed by Lawsuit Amount. Comparing the top 20 important features across the four models revealed that 17 features were consistently present: Compensation, Double Pay, Economic Compensation, Employee's Age, Employment Relationship Terminated, Lawsuit Amount, No Employment Contract, Overtime Pay, Sex, Salary, Signing Employment Contract, Unemployment Insurance, Unpaid Medical Insurance Contribution, Unpaid Pension Insurance Contribution, Unpaid Social Insurance Contribution, Unpaid Wages, and Unpaid Maternity Insurance Contribution. The presence of these 17 features in all four rankings suggests their crucial role in characterizing labor dispute cases.
SHAP not only provides a comprehensive assessment of feature importance and explains the contributions of model features but also guides feature engineering and model improvement. Hence, employing an incremental feature selection method based on the SHAP importance rankings from the voting classifier, we obtained the optimal subset containing the top 33 features. Despite reducing the features from 57 to 33, the accuracy reached 0.90. This indicates that SHAP can identify features with minimal or negative impact on the model, so that features that do not contribute to, or may even harm, the model's predictive capability can be eliminated. This enhances model simplicity and generalizability.

Conclusions
This paper applies an ensemble soft voting method based on RF, ET, and CatBoost for predicting critical paths in labor dispute resolutions. Addressing sample imbalance using

Figure 1. The number distribution of five labor dispute resolutions.

Figure 3. The workflow diagram for predicting labor dispute resolution (LDMLSV). Output_1 represents the prediction results obtained using all features. Output_2 represents the prediction results obtained using the optimal feature subset.

Figure 5. The contributions of the top 20 features from four different models on the test set: (a) RF, (b) ET, (c) CatBoost, and (d) VotingClassifier (RF + ET + CatBoost).

Figure 6. The IFS results under jackknife cross-validation. The optimal number of features is 33.

Table 1. Best hyperparameters of base classifiers after grid search when selecting the KMeansSMOTE sample balancing method.

Table 3. The predictive outcomes among five dispute resolution paths.

Table 4. Analysis of F1-score sensitivity of KMeansSMOTE to different sampling strategies.
Note: Class 0: Mediation in arbitration, Class 1: Arbitration awards, Class 2: First-instance mediation, Class 3: First-instance judgments, Class 4: Second-instance judgments, Support: the number of samples for each class.

Table 5. Comparison of soft voting classifiers trained on different feature dimensions.

Table 6. Comparison with previous research.