Impact of System and Diagnostic Errors on Medical Litigation Outcomes: Machine Learning-Based Prediction Models

No prediction models using conventional logistic models or machine learning exist for medical litigation outcomes involving medical doctors. Using a logistic model and three machine learning models (decision tree, random forest, and light gradient boosting machine [LightGBM]), we evaluated the ability to predict litigation outcomes in medical litigation in Japan. The prediction model with LightGBM had a good predictive ability, with an area under the curve of 0.894 (95% CI, 0.893–0.895) in all patients’ data. When evaluating the feature importance using the SHApley Additive exPlanation (SHAP) value, the system error was the most significant predictive factor in all clinical settings for medical doctors’ loss in lawsuits. The other predictive factors were diagnostic error in outpatient settings, facility size in inpatients, and procedures or surgery settings. Our prediction model is useful for estimating medical litigation outcomes.


Introduction
Medical litigation claims and costs resulting from medical errors and malpractice have increased over the past decade and have a negative impact on the health economics of both patients and medical staff [1–3]. Given the negative impacts of litigation on healthcare, the risk of medical litigation must be minimized for medical staff, litigation associates, and patient safety. Medical staff should therefore understand the factors that influence litigation outcomes [3,4]. System and diagnostic errors have been reported as contributing factors to malpractice claims [5–8] and have recently been recognized as essential issues in medical economics, health care quality, and patient safety [9,10]. In addition, previous studies have indicated that the following factors are associated with litigation outcomes: night shift, unnecessary surgery, sequelae, and death [5,8,11,12].
Therefore, a reliable prediction model for litigation outcomes in medical litigation research is critical for improving hospital management, which can effectively reduce plaintiff victory (medical doctor loss) [12]. However, studies using conventional logistic regression models [13] or machine learning [14,15] to predict litigation outcomes involving medical doctors at the individual and system levels are limited. Systematic reviews have reported that the prediction validity of machine learning is similar to or slightly better than that of logistic regression models [16,17]. Additionally, different clinical settings, such as outpatient, inpatient, procedure, and surgery, should be investigated separately because of their high degree of heterogeneity.
In this study, we aimed to develop and evaluate a high-performance prediction model for litigation outcomes in medical litigation in Japan using machine learning. Additionally, we clarified the impact of predictive factors on plaintiff victory (medical doctor loss) among comprehensive predictive factors in different clinical settings. Our hypothesis is that system and diagnostic errors contribute to medical litigation outcomes in which the doctor loses. This study will help to prepare for medical litigation, recognize modifiable factors, and improve the medical management system.

Study Design and Setting
This was a retrospective cohort study based on the medical malpractice litigation records against medical doctors in Japan. We partially followed the guidelines of the transparent reporting of a multivariable prediction model for the individual prognosis or diagnosis (TRIPOD) statement [18] (Table S1). The requirement for ethical approval was waived because the data were anonymous and obtained from a publicly available database.

Data Source and Study Population
On 29 June 2017, we extracted data on malpractice claims against medical doctors filed between January 1961 and June 2017. We used the most extensive public database in Japan (Westlaw Japan Ltd.), which includes detailed clinical information such as full text and accurate precedents. Medical claims, medical litigation, medical malpractice, diagnostic errors, wrong diagnosis, missed diagnosis, and delayed diagnosis were among the preselected keyword combinations [19]. After developing and thoroughly considering the rules in advance, five reviewers, including a lawyer and an internal medicine physician familiar with medical malpractice, performed data extraction. The keyword search identified 3430 malpractice claims. After removing duplicates (n = 751), applying the exclusion criteria (intentional crimes, robbery, financial difficulties, and veterinary claims; n = 707), and rejecting unfair lawsuits and claims against all other practitioners (n = 170), we obtained 1802 medical malpractice claims against medical doctors in Japan. We then excluded claims with missing data, as follows: patient age, 339 cases; clinical outcome, 55 cases; specialized field, 35 cases; facility size, 22 cases; time zone, 3 cases; and place, 2 cases. The final analysis included 1399 malpractice claims (Figure 1).

Outcomes
The primary outcome was final judgment litigation (acceptance or rejection). Acceptance meant that the medical doctor lost the medical malpractice lawsuit, whereas rejection meant that the medical doctor won it.
The secondary outcomes were clinical outcomes, including full recovery, sequelae with a permanent disorder, and death. All payments as compensation for malpractice claims were adjusted using the Japanese consumer price index and converted from Japanese yen to US dollars (115 yen to the US dollar; 12 January 2022).

Variables and Definitions
We selected variables based on clinical judgment and previously published literature [5,20]. We collected the following data: patient (plaintiff) sex and age; the circumstances of the malpractice, such as the time of day (day or night shift); the place where the malpractice occurred (outpatient office, emergency room, ward, or operating room); specialized field; initial diagnosis; and the institution size, categorized as clinic, small hospital with fewer than 200 beds, medium hospital with 200 to 399 beds, large hospital with greater than 400 beds, or university hospital.
The following was the detailed litigation information: board-certified doctor, the subject of the litigation (individual medical doctor or a group or hospital), the reason for the litigation (procedure, management, education, and others), and the treatment written in the precedent (medication, procedure, and others).
Medical errors were divided into diagnostic and system errors. A diagnostic error was defined as a delayed diagnosis, missed diagnosis, or incorrect diagnosis by an individual medical doctor [21]. Based on descriptions in the case records and following the original study [7], the system errors were categorized as follows: technical and equipment failure; clustering; inadequate policies and procedures; inefficient and non-standard processes; poor teamwork or communication; patient neglect; management problems; poor coordination of care, supervision, or education problems; unavailable expert consultation; lack of training and orientation; personnel problems, such as laziness and violations; and external interference. Multiple malpractices or complications are common, and the categories are not mutually exclusive. Examples of system errors are as follows. Poor communication includes left-right errors on the surgical side. Management problems include a lack of proper follow-up periods and the wrong follow-up interval for the disease.

Statistical Analysis
Categorical variables were presented as numbers and percentages, while continuous variables were presented as medians and interquartile ranges (IQR). For group comparisons, the Mann-Whitney U test was used for continuous variables, and the chi-square test or Fisher's exact test was used for categorical variables. Data analysis was performed using Stata SE version 17.0 (StataCorp, College Station, TX, USA). Statistical significance was set at a two-tailed p-value < 0.05.
The machine learning task was binary classification, and supervised learning was used as the machine learning method. We developed a machine learning model that predicted the possibility of acceptance or rejection using the characteristics of the 64 variables described above. All machine learning models were implemented using Python (version 3.7.12).
We first divided the data into 70% training data and 30% testing data to build the model. Twenty percent of the test data were further used as validation data. For hyperparameter tuning, the optimal hyperparameters were found using a grid search from the scikit-learn library using test and validation data. Then, we trained the optimized machine learning model on the training data and measured its binary classification performance on the test data. Cross-validation of the training datasets was performed to avoid overfitting [22]. We also performed a stratified 10-fold cross-validation to avoid data bias in each fold.
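The splitting scheme described above can be illustrated with a minimal sketch in plain Python. It covers only the stratified 70/30 split and the stratified 10-fold assignment, not the validation subset or the grid search; the function names and the toy label counts are assumptions made for illustration and are not taken from the study's code, which used scikit-learn.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_fraction=0.3, seed=0):
    """Return (train_idx, test_idx), keeping the class ratio in both parts."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)

def stratified_folds(indices, labels, k=10, seed=0):
    """Assign the given indices to k folds, dealing each class out round-robin."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx in indices:
        by_class[labels[idx]].append(idx)
    folds = [[] for _ in range(k)]
    for class_indices in by_class.values():
        rng.shuffle(class_indices)
        for position, idx in enumerate(class_indices):
            folds[position % k].append(idx)
    return folds

# Toy labels (made up): 1 = accepted (doctor lost), 0 = rejected.
labels = [1] * 110 + [0] * 90
train_idx, test_idx = stratified_split(labels, test_fraction=0.3)
folds = stratified_folds(train_idx, labels, k=10)
```

Each fold in turn would serve as the held-out part while the remaining nine are used for fitting, so every fold sees roughly the same proportion of accepted and rejected claims.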
In this study, a simple linear model (logistic model) and three machine learning models (decision tree, random forest, and light gradient boosting machine [LightGBM]) were implemented to predict the factors contributing to plaintiff victory and to analyze the impact of predictive factors on litigation outcomes.
A logistic model is a statistical model that determines the optimal linear model coefficients to describe the relationship between the logit transformation of a binary dependent variable and one or more independent variables. Logistic models are simple forecasting approaches that provide a baseline accuracy score for comparison with other machine learning models [23].
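As a brief illustration of the logit relationship, the sketch below converts a linear predictor into a predicted probability. The intercept and coefficients are hypothetical values chosen for the example and are not estimates from this study.

```python
import math

def predict_probability(intercept, coefficients, features):
    """Logistic model: logit(p) = intercept + sum(coef_i * x_i)."""
    logit = intercept + sum(c * x for c, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical coefficients for two binary predictors
# (system error, diagnostic error); not values from the study.
intercept = -1.0
coefficients = [1.5, 0.8]

p_none = predict_probability(intercept, coefficients, [0, 0])
p_both = predict_probability(intercept, coefficients, [1, 1])

# Exponentiating a coefficient gives the odds ratio for that predictor.
odds_ratio_system_error = math.exp(coefficients[0])
```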
A decision tree analysis is an analytical approach that separates predictor values in stages using binary partitioning. All the values of the predictor were evaluated as potential splits, whereas the optimal split was determined using the decrease in the entropy of the information. A classification and regression tree (CART) analysis was selected for this study [24].
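The entropy-based split criterion can be sketched as follows. This is an illustrative calculation on made-up binary data, assuming binary features coded as 0 or 1; the variable names are hypothetical and the example is not the study's implementation.

```python
import math

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    result = 0.0
    for cls in set(labels):
        p = labels.count(cls) / n
        result -= p * math.log2(p)
    return result

def information_gain(feature, labels):
    """Entropy decrease obtained by splitting on a binary feature (0 or 1)."""
    left = [y for x, y in zip(feature, labels) if x == 0]
    right = [y for x, y in zip(feature, labels) if x == 1]
    n = len(labels)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - weighted

# Toy data (made up): outcome 1 = accepted, 0 = rejected.
outcome      = [1, 1, 1, 1, 0, 0, 0, 0]
system_error = [1, 1, 1, 0, 0, 0, 0, 1]  # partly informative
night_shift  = [1, 0, 1, 0, 1, 0, 1, 0]  # uninformative

gain_system = information_gain(system_error, outcome)
gain_night = information_gain(night_shift, outcome)
```

In this toy case the tree would split on the system error feature first, because it yields the larger decrease in entropy.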
Random forest is an ensemble learning algorithm that integrates multiple weak learners with decision trees to improve generalization ability [25]. It is a collection of several slightly different decision trees based on the ensemble learning bagging and has the characteristic of being less prone to overfitting.
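The two ingredients of bagging, bootstrap resampling of the training rows and majority voting across trees, can be sketched in a few lines. The tree-fitting step is omitted, and the votes shown are hypothetical.

```python
import random
from collections import Counter

def bootstrap_sample(n_rows, rng):
    """Draw n_rows row indices with replacement (one bag per tree)."""
    return [rng.randrange(n_rows) for _ in range(n_rows)]

def majority_vote(predictions):
    """Combine the class votes of the individual trees."""
    return Counter(predictions).most_common(1)[0][0]

rng = random.Random(0)
# Five bags of ten rows each; each bag would train one decision tree.
bags = [bootstrap_sample(10, rng) for _ in range(5)]

# Hypothetical votes from five trees for a single claim (1 = accepted).
ensemble_prediction = majority_vote([1, 0, 1, 1, 0])
```

Because each tree sees a slightly different resample, the trees differ from one another, and averaging their votes reduces the variance of the prediction.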
LightGBM is a machine learning algorithm that combines decision tree models with an ensemble learning process called gradient boosting [26]. It is a gradient boosting implementation designed to reduce the high computational cost of the method.
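The idea of gradient boosting for a binary outcome can be sketched with one-split trees (stumps) fitted to the negative gradient of the log-loss. This is a plain, simplified illustration on made-up data; it does not reproduce LightGBM's histogram-based, leaf-wise algorithm, and the function names and settings are assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_loss(y, probabilities):
    """Mean binary cross-entropy."""
    return -sum(yi * math.log(p) + (1 - yi) * math.log(1 - p)
                for yi, p in zip(y, probabilities)) / len(y)

def fit_stump(x, residuals):
    """Find the single threshold on x that best fits the residuals (squared error)."""
    best = None
    for threshold in sorted(set(x))[:-1]:
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]

def boost(x, y, n_rounds=20, learning_rate=0.5):
    """Add one stump per round, each fitted to the current residuals."""
    scores = [0.0] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # Residual y - p is the negative gradient of the log-loss w.r.t. the score.
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        threshold, left_value, right_value = fit_stump(x, residuals)
        stumps.append((threshold, left_value, right_value))
        scores = [s + learning_rate * (left_value if xi <= threshold else right_value)
                  for xi, s in zip(x, scores)]
    return stumps, [sigmoid(s) for s in scores]

# Toy data (made up): one numeric feature and a binary outcome.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 0, 1, 0, 1, 1, 1]
initial_loss = log_loss(y, [0.5] * len(y))
stumps, probabilities = boost(x, y)
final_loss = log_loss(y, probabilities)
```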

Performance Metrics and Feature Importance
Six performance metrics for machine learning used in this study were accuracy, precision, recall, specificity, F1 score, and area under the curve (AUC), which was calculated from the receiver operating characteristic (ROC) curve. These metrics are related to the classifier's ability and calculated with true positives (TP), true negatives (TN), false negatives (FN), and false positives (FP).
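The six metrics follow directly from the confusion-matrix counts and the predicted scores, as the sketch below shows. The counts and scores are made up for illustration, and the AUC is computed with the pairwise-ranking definition rather than by integrating the ROC curve.

```python
def classification_metrics(tp, tn, fp, fn):
    """Standard metrics computed from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # also called sensitivity
    specificity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }

def auc_from_scores(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical counts and scores, not results from the study.
metrics = classification_metrics(tp=40, tn=30, fp=10, fn=20)
auc = auc_from_scores([1, 1, 0, 1, 0, 0], [0.9, 0.8, 0.7, 0.6, 0.4, 0.2])
```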
It is difficult to correctly interpret the output results from the machine learning model. We used the SHApley Additive exPlanation (SHAP) value, which is a unified approach, to explain the results of the machine learning models. SHAP assigns attribution values to each feature in each predictive model that are consistent and locally accurate [27]. In this study, the SHAP value was used to evaluate the feature importance.
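The Shapley attribution underlying SHAP can be computed exactly for a very small model, which makes the local accuracy property easy to verify. The sketch below is a simplification: features outside a coalition are set to a single baseline value, whereas SHAP implementations typically average over a background dataset. The toy model and its coefficients are hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, instance, baseline):
    """Exact Shapley attributions; absent features take their baseline value."""
    n = len(instance)

    def value(coalition):
        mixed = [instance[i] if i in coalition else baseline[i] for i in range(n)]
        return model(mixed)

    attributions = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = value(set(subset) | {i})
                without_i = value(set(subset))
                phi += weight * (with_i - without_i)
        attributions.append(phi)
    return attributions

def toy_model(x):
    """Hypothetical risk score with an interaction between the first two features."""
    system_error, diagnostic_error, large_hospital = x
    return (0.2 + 0.4 * system_error + 0.2 * diagnostic_error
            + 0.1 * system_error * diagnostic_error - 0.1 * large_hospital)

instance = [1, 1, 0]
baseline = [0, 0, 0]
phi = shapley_values(toy_model, instance, baseline)

# Local accuracy: attributions sum to the prediction minus the baseline prediction.
total = sum(phi)
difference = toy_model(instance) - toy_model(baseline)
```

The interaction term is shared equally between the two features involved, and a feature whose value equals its baseline receives an attribution of zero.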

Results
We analyzed 1399 medical litigations against medical doctors in Japan. Table 1 summarizes the demographic data for all medical claims. The median age of the patients was 33 years (IQR, 9–54), and age 0 had the highest proportion, with 253 (18.1%). A total of 764 (51.2%) malpractice claims resulted in acceptance (the medical doctor losing the malpractice lawsuit), and the adjusted median indemnity paid was $225,756 (IQR: 54,316–482,578). The most common patient outcome was death (56.1%), and infancy accounted for 9.3% of the deaths. Procedures or surgeries were the most common reasons for litigation, with the highest acceptance (56.1%) and residual sequelae (49.1%).
Table 1 note: IQR, interquartile range. Accepted: the medical doctor lost the case. The total billing amount and median indemnity were adjusted to their 2017 equivalents using the Japanese Consumer Price Index (shown in USD, $1 = ¥115, 12 January 2022).
Table 2 shows the clinical and litigation factors for litigation outcomes, with a crude comparison between the two groups. The top five initial diagnoses involved in malpractice claims were, in order, malignant neoplasm, neonatal disease, trauma, procedure and postoperative complications, and acute coronary syndrome; none was significantly associated with litigation outcome. The factors that were significantly associated with accepted claims (medical doctor loss) were as follows: clinic, small hospital (<200 beds), system error, diagnostic error, litigation subject (individual medical doctor), and sequelae.
To assess feature importance in the machine learning algorithms, we evaluated the importance rank, which indicates the importance of each input feature. The top five important features in LightGBM were system error, diagnostic error, reason for litigation (diagnosis), patient age, and era, in descending order (Figure 2). The top five important features in the decision tree were system error, reason for litigation (diagnosis), diagnostic error, era, and facility size, in descending order (Figure S2). The top five important features in the random forest were system error, diagnostic error, reason for litigation (diagnosis), facility size, and patient age, in descending order (Figure S3). The common features were system error, diagnostic error, and reason for litigation (diagnosis).

Indemnity Costs
We calculated the indemnity cost for the top five predictive factors using LightGBM (Table 4). If there were more than three categories in the predictive factor, the most common acceptance category was selected. In all accepted cases, system error had the highest proportion (82.9%) and the highest indemnity cost (82.5%). Patients aged 0 years had the highest median indemnity cost (median $349,625, IQR $126,867–727,673). In the subgroups, the highest median indemnity cost was diagnostic error in the outpatient group, era (1991–1999) in the inpatient group, and patient age (age 0) in the procedures or surgery group. Diagnostic error and system error accounted for the highest proportion of total indemnity in each group.

Discussion
Machine learning demonstrated a high performance in predicting litigation outcomes in medical litigation. System error was the most significant predictive factor for medical doctors' loss in lawsuits in all clinical settings. The second predictive factor was diagnostic error in outpatient settings and facility size in inpatient and procedures or surgery settings.
Our prediction model had a good predictive ability using LightGBM (AUC 0.894 [95% CI, 0.893–0.895]) for all patient data. Our results are comparable to those of predictive models using machine learning with other clinical data [28,29]. It is likely that this dataset was a favorable population for prediction, because other machine learning methods also produced good results. We selected various factors, such as patient factors, medical doctor factors, and hospital factors, that could be extracted from medical lawsuit records and were likely to be associated with litigation outcomes for use in the prediction model. The assumed subgroup settings were affected as a result. Therefore, various factors must be considered to develop a good prediction model for litigation outcomes.
According to the machine learning analysis results, system error, rather than other factors, was the most predictive factor in all clinical settings. Because the legal structure and environment of medical litigation vary significantly from country to country, it may be difficult to generalize these results. However, we believe that at least in the largest dataset of Japanese medical litigation, pursuing systemic problems, such as working status, a lack of standard patient safety efforts, and a lack of supervision within an organization, rather than individual medical staff errors, might result in a case being lost [10]. Other predictive factors for an accepted claim (medical doctors' loss) using LightGBM were diagnostic error, reason for litigation (diagnosis), facility size, and patient age. These results are consistent with those of previous studies on internal medicine and orthopedic surgery [5,8,12]. Previous studies have estimated that physician diagnostic errors in the outpatient setting may range from 3% to 10%, and the negative impact of diagnostic errors is a significant and urgent problem that needs to be addressed [9,30]. Thus, it is reasonable to understand that medical errors (system and diagnostic errors) are related to litigation outcomes. If a judge determines that a medical error is the basis for a lawsuit, the outcome will be unfavorable to medical providers owing to the emotional appeal of the plaintiff and unprofessional medical behavior.
Different factors in a different order were predictive factors in three different clinical settings, namely outpatient, inpatient, and procedure or surgery. No study has examined the risk factors for litigation outcomes in different clinical settings in any department. Orthopedic surgery research has reported that the significant factors for an accepted claim are unnecessary surgery, neurological deficit, and death [12,31], although the setting types were not distinguished. The factors required for medical doctors in each clinical setting and medical errors that are likely to occur are different. Therefore, it is necessary to consider each clinical setting in medical litigation research.

Strengths
First, this is the first study in Japan to use a prediction model to predict litigation outcomes for medical cases. Additionally, the prediction model using LightGBM demonstrated high performance. The results of this study can be referred to by medical litigation associates and medical staff when facing medical litigation, although the ideal implementation of the prediction model is a free calculator available on a website that allows missing values [28]. Second, we classified the different clinical settings into three categories: outpatient, inpatient, and procedure or surgery. Our results revealed a high degree of heterogeneity in medical litigation. This prediction model can also be used retrospectively to assess the medical quality of each setting. Third, we focused on system and diagnostic errors as predictive variables. If medical providers can recognize modifiable factors through the results of this study, it will contribute to a safer medical management system and a reduction in medical lawsuits and malpractice cases, which are associated with high socioeconomic costs and burdens. Such a situation may lead to a sincere attitude of apology and open disclosure among medical professionals, rather than concealment or contention of the patient's claim. The results of this predictive study will provide evidence for future causal inference studies on medical litigation and patient safety.

Limitations
First, our data contained an inherent selection bias because the information was obtained from only a single Japanese database. In Japan, most medical litigation claims are settled out of court [32]. Because our data excluded claims dismissed before trial or settled out of court, it is difficult to generalize the findings to other countries with different legal and medical systems [5]. Second, these data did not consider legal changes in the form of trials in Japan. Japan implemented a jury system, known as the "citizen judge system", on 21 May 2009 [33]. However, this jury system applies only to criminal trials, which were few in our data because criminal cases against medical professionals were excluded. Further research must determine whether our findings can be applied to medical litigation in other countries. Third, the database contained information biases. The descriptions in the database are not medical descriptions but rather the perspective of the patient, which is not always medically accurate. However, this issue can be considered a non-differential (random) misclassification that occurs equally in all study groups. Fourth, there were unmeasured factors, such as personal information on medical doctors (for example, age, sex, and graduate year), because these factors were anonymized in the database. More extensive validation using valid predictive data will be necessary in the future. The fifth limitation is the generalization performance of machine learning. Machine learning performance depends on the available training data, and deviations in input from training values can result in poor machine learning model performance. In this study, the machine learning model showed excellent internal validity; however, continuous learning and rigorous external verification will be necessary in the future.
Although there were some biases and limitations in this study, our results have drawn attention to the potential impact of predictive factors on medical litigation outcomes involving medical doctors.
In conclusion, we developed a high-performance prediction model using machine learning to estimate litigation outcomes in medical litigation in Japan. Our model will be useful for estimating medical litigation outcomes.