AgentsBench: A Multi-Agent LLM Simulation Framework for Legal Judgment Prediction
Abstract
1. Introduction
2. Related Work
2.1. LLM-Based Multi-Agent System
2.2. Legal Judgment Prediction
2.3. Large Language Models in Law
3. Agents on the Bench
3.1. Bench Selection
3.1.1. Preliminary: LLM Agent
3.1.2. Bench Selection
3.2. Independent Sentencing
- C represents the case details, including the fact description and legal statutes, provided to all agents.
- The personal factors of agent i comprise their unique background, experience, and perspectives.
- Agent i’s analysis function takes into account both C and these personal factors to generate the agent’s initial sentencing decision.
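The independent-sentencing step can be sketched in Python. This is a minimal illustration under stated assumptions, not the framework’s implementation: `Case`, `BenchAgent`, and `independent_sentencing` are hypothetical names, and each agent’s analysis function is a stub standing in for an LLM call. The stubs simply return the initial terms recorded in Appendix B (60, 48, and 54 months).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Case:
    """Case details C shared with every agent (hypothetical structure)."""
    facts: str
    statutes: Tuple[str, ...]


@dataclass(frozen=True)
class BenchAgent:
    """A bench member with personal factors and an analysis function (hypothetical)."""
    name: str
    role: str
    persona: str  # personal factors: background, experience, perspectives
    analyze: Callable[[Case, str], int]  # maps (C, personal factors) to a term in months


def independent_sentencing(case: Case, bench: List[BenchAgent]) -> Dict[str, int]:
    """Each agent decides alone; no agent sees another agent's decision."""
    return {agent.name: agent.analyze(case, agent.persona) for agent in bench}


# Stub analysis functions stand in for LLM calls (assumption); they return the
# initial terms reported in Appendix B.
case = Case(
    facts="Accepted bribes of 28,900 yuan; fabricated facts to obtain 300,000 yuan.",
    statutes=("Article 266", "Article 385"),
)
bench = [
    BenchAgent("Zhou", "presiding judge", "stresses deterrence", lambda c, p: 60),
    BenchAgent("Zhang", "judge", "weighs remorse and first offense", lambda c, p: 48),
    BenchAgent("Su", "lay judge", "stresses public trust", lambda c, p: 54),
]
initial = independent_sentencing(case, bench)
print(initial)  # {'Zhou': 60, 'Zhang': 48, 'Su': 54}
```

Because every agent receives the same C but its own personal factors, disagreement at this stage comes only from the agents’ differing perspectives, which is what the later deliberation stage works to resolve.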
3.3. Deliberation
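The transcript in Appendices B.4 and C shows the deliberation sequence: the presiding judge opens a discussion round, each member responds, the presiding judge determines whether consensus has been reached, and, if not, every member revises their sentence before the next round. A minimal sketch of that loop follows. The `discuss` and `update` callables are hypothetical stand-ins for LLM calls, the `max_rounds` cap is an assumption, and the consensus test is simplified to exact agreement on the term, whereas in the framework the presiding judge makes this determination.

```python
from statistics import median
from typing import Callable, Dict, List, Tuple

Decisions = Dict[str, int]  # agent name -> proposed term in months


def deliberate(
    decisions: Decisions,
    discuss: Callable[[Decisions, List[str]], List[str]],
    update: Callable[[str, Decisions, List[str]], int],
    max_rounds: int = 3,
) -> Tuple[Decisions, List[str]]:
    """Run discussion rounds until the panel agrees or max_rounds is reached."""
    history: List[str] = []
    for _ in range(max_rounds):
        # One discussion round: every agent's statement is appended to the history.
        history.extend(discuss(decisions, history))
        # Consensus determination (simplified here to identical terms).
        if len(set(decisions.values())) == 1:
            break
        # No consensus: each agent revises its own term in light of the discussion.
        decisions = {name: update(name, decisions, history) for name in decisions}
    return decisions, history


# Stubs (assumptions): each agent restates its term; revisions move to the panel median.
def stub_discuss(decisions: Decisions, history: List[str]) -> List[str]:
    return [f"{name}: {months} months" for name, months in decisions.items()]


def stub_update(name: str, decisions: Decisions, history: List[str]) -> int:
    return int(median(decisions.values()))


final, history = deliberate({"Zhou": 60, "Zhang": 48, "Su": 54}, stub_discuss, stub_update)
print(final)         # {'Zhou': 54, 'Zhang': 54, 'Su': 54}
print(len(history))  # 6 (two rounds of three statements)
```

The stubs are contrived so that the outcome mirrors the appendix (no consensus after Round 1, agreement on 54 months in Round 2); the real agents reach their revisions through argument rather than a median rule.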
3.4. Decision Making
- The updated sentencing decisions are the decisions of all agents after the final round of deliberation.
- The deliberation history contains the complete set of discussion points raised during all deliberation rounds.
- g is the synthesis function, executed by the presiding judge, that combines all available inputs to formulate the final sentencing decision.
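The synthesis function g can be approximated with a simple rule for illustration. In the framework, g is executed by the presiding judge agent; the `synthesize` function below is a hypothetical substitute that adopts the shared term when the panel is unanimous and otherwise falls back to the median term. The discussion history is accepted only to mirror g’s inputs: an LLM-based presiding judge would read it, but this rule does not.

```python
from statistics import median
from typing import Dict, List


def synthesize(decisions: Dict[str, int], history: List[str]) -> int:
    """Hypothetical stand-in for g: turn final agent decisions into one term (months).

    The history argument is unused by this rule; it is kept to match g's inputs.
    """
    terms = list(decisions.values())
    if not terms:
        raise ValueError("no sentencing decisions to synthesize")
    if len(set(terms)) == 1:
        return terms[0]  # unanimous panel: adopt the agreed term
    # Split panel: the median resists a single outlying proposal.
    return int(round(median(terms)))


print(synthesize({"Zhou": 54, "Zhang": 54, "Su": 54}, []))  # 54
print(synthesize({"Zhou": 60, "Zhang": 48, "Su": 54}, []))  # 54
```

A median fallback is one defensible choice for a split panel because a single extreme proposal cannot drag the result far; a mean would be more sensitive to such outliers.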
4. Experiment
4.1. Task and Dataset
4.2. Setup
4.2.1. Baselines
- Standard Prompt: This baseline involves prompting LLMs to output only the prison term, without any additional contextual information or step-by-step reasoning. This approach aims to measure the basic decision-making capability of LLMs without specialized prompting strategies.
- CoT: The zero-shot Chain of Thought (CoT) method adds the phrase “Let’s think step by step” to the prompt. This encourages the LLM to articulate intermediate reasoning steps before giving an answer, which can improve the quality and accuracy of its prediction [35].
- LS: Legal syllogism (LS) prompting is a zero-shot approach that instructs the LLM to apply syllogistic reasoning to legal judgment prediction. The prompt first defines the structure of a legal syllogism and then guides the model in applying it to the given case [29]. This baseline tests the model’s ability to derive the outcome logically from the relevant legal articles, so comparing against it shows how effective our multi-agent deliberation process is relative to a purely logical, structured method.
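The three baselines differ only in how the prompt is assembled. The sketch below shows one way to build them; apart from the CoT trigger phrase quoted above, all wording is an illustrative assumption rather than the exact prompts used in our experiments, and `build_prompt`, `Baseline`, and `LS_PREAMBLE` are hypothetical names. The LS preamble is a brief paraphrase of the legal syllogism structure (law article as major premise, case facts as minor premise, judgment as conclusion).

```python
from enum import Enum


class Baseline(Enum):
    STANDARD = "standard"
    COT = "cot"
    LS = "ls"


COT_TRIGGER = "Let's think step by step."

# Paraphrased legal-syllogism definition (illustrative wording, not the original prompt).
LS_PREAMBLE = (
    "A legal syllogism has three parts: the major premise is the applicable law article, "
    "the minor premise is the facts of the case, and the conclusion is the judgment."
)


def build_prompt(baseline: Baseline, facts: str) -> str:
    """Assemble a zero-shot prompt for one of the three baselines (hypothetical wording)."""
    task = f"Case facts: {facts}\nPredict the defendant's prison term in months."
    if baseline is Baseline.STANDARD:
        # Ask for the term only, with no reasoning.
        return f"{task}\nAnswer with the prison term only."
    if baseline is Baseline.COT:
        # Zero-shot CoT: append the trigger phrase.
        return f"{task}\n{COT_TRIGGER}"
    if baseline is Baseline.LS:
        # Define the syllogism first, then ask the model to apply it.
        return f"{LS_PREAMBLE}\n{task}\nApply the legal syllogism to this case."
    raise ValueError(f"unknown baseline: {baseline}")


for b in Baseline:
    print(f"--- {b.value} ---\n{build_prompt(b, 'The defendant accepted bribes of 28,900 yuan.')}")
```

Keeping the case text and task wording identical across baselines means that any difference in accuracy can be attributed to the prompting strategy alone.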
4.2.2. Implementation
4.3. Evaluation
4.3.1. Performance Evaluation
4.3.2. Quality Evaluation
4.4. Results
4.5. Case Study
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Appendix A. Case Detail
Appendix B. Bench Discussion Content
Appendix B.1. Presiding Judge: Zhou
Sentence Term: 60 months
Reason: Based on the specific circumstances of the case, defendant Liu, during his tenure as a public official, took advantage of his position to accept bribes of 28,900 yuan, and, while negotiating demolition work, fabricated facts to defraud 300,000 yuan. He is guilty of bribery and fraud. According to Articles 266 and 385 of the Criminal Law, the fraud involved a large sum and serious circumstances, and the bribery involved abusing official power, which is a serious offense.
During sentencing, I considered the circumstances of the crime, the amount involved, and its impact on society. After comprehensive analysis, I believe a heavier sentence is warranted. While the defendant may have shown some degree of remorse, the nature of his crime and the harm caused to society make a heavy penalty necessary to serve as a warning and deterrent. Therefore, sentencing him to 60 months in prison is appropriate.
Appendix B.2. Judge: Zhang
Sentence Term: 48 months
Reason: During his tenure as a public official, defendant Liu used his position to accept bribes totaling 28,900 yuan, and while negotiating demolition work, fabricated facts to defraud 300,000 yuan. This constitutes bribery and fraud. According to Articles 266 and 385 of the Criminal Law, the amounts involved were substantial, and the circumstances were serious, making this a significant legal offense.
When determining the sentence, I took into account the following factors: Firstly, the social harm caused by the defendant’s actions is considerable, especially given that he was a state official whose behavior undermined public interest and societal trust. Secondly, while the defendant may have shown some remorse, the severity and magnitude of his crimes necessitate an appropriate level of punishment. Lastly, given that the defendant might be a first-time offender who has shown remorse, a moderately lenient punishment could be considered.
Therefore, after considering all these factors, I believe a sentence of 48 months in prison is appropriate, serving both punitive and deterrent purposes while recognizing the defendant’s expression of remorse.
Appendix B.3. Lay Judge: Su
Sentence Term: 54 months
Reason: Defendant Liu, during his tenure as a public official, abused his position to accept bribes of 28,900 yuan and fabricated facts to defraud 300,000 yuan, which constitutes bribery and fraud. These actions not only harmed public interests but also undermined public trust in the government, thus having significant social harm.
In sentencing, I believe the following factors should be considered: Firstly, the nature of the defendant’s actions was serious. As a state official, using his position for illegal activities warrants severe punishment to uphold the authority of the law and social justice. Secondly, while the defendant may have expressed some remorse, the magnitude and severity of his crimes still warrant appropriate punishment as a warning.
Therefore, sentencing him to 54 months in prison is appropriate as it not only serves as a punishment for his actions but also takes into account his remorse, striving for a balance between punishment and rehabilitation.
Appendix B.4. Presiding Judge Deliberation (Round 1): Zhou
Presiding Judge (Zhou):
Colleagues, thank you for your input. Now, let’s discuss the differences in sentencing opinions. First, I’ll summarize each perspective:
- Presiding Judge Zhou: The defendant’s actions were serious, involving substantial bribery and fraud, with significant social harm. Therefore, I recommend a sentence of 60 months in prison to serve as a warning and deterrent.
- Judge Zhang: Although the defendant’s actions were serious, he may have shown some remorse and is a first-time offender. Therefore, I recommend a sentence of 48 months in prison to acknowledge his expression of remorse.
- Lay Judge Su: Supports a heavier sentence, noting that the defendant’s actions harmed public interests and suggesting 54 months in prison to uphold the authority of the law and social justice.
In our discussion, we need to focus on the following aspects:
- Nature of the crime and social impact: Does the defendant’s abuse of power for bribery and fraud warrant a heavier sentence?
- Remorse and first-time offense: Has the defendant shown remorse, and should this be considered in sentencing?
- Justice and social effects of sentencing: How should we balance the punishment of the defendant with the social deterrent effect?
I invite each of you to express your views on these points, particularly regarding whether the defendant’s remorse is sufficient to affect sentencing. Let us explore this perspective further to see if we can reach a consensus.
Appendix C. Sentencing and Deliberation Process—Continued
Appendix C.1. Judge Deliberation (Round 1): Zhang
In this case, there are differences among the members of the collegial panel regarding the sentencing of defendant Liu. As a judge, I will analyze the legal application, the nature of the crime, the social impact, and the defendant’s remorse to better explain my viewpoint.
Firstly, regarding the defendant’s actions, according to Articles 385 and 266 of the Criminal Law, his conduct constitutes bribery and fraud. Liu, during his tenure as a public official, accepted bribes of 28,900 yuan and fraudulently obtained 300,000 yuan by fabricating facts. The amounts involved are substantial, and the circumstances are severe. Such conduct not only harms public interests but also damages public trust in the government, leading to significant social harm.
Secondly, when determining the sentence, we must consider the circumstances of the crime, the amount involved, the social impact, and the defendant’s remorse. It is important to note that while the defendant’s actions are serious, he has shown genuine remorse and has no prior criminal record. These factors warrant a more lenient approach to sentencing, reflecting both his willingness to take responsibility and the need to balance punishment with opportunities for rehabilitation.
The Presiding Judge Zhou’s suggestion of 60 months emphasizes deterrence and the need to protect societal interests, and I acknowledge the validity of these concerns. However, I believe that we must also give weight to the defendant’s remorse and the mitigating factors related to his first-time offense.
Lay Judge Su’s suggestion of 54 months strikes a certain balance between punishment and rehabilitation, but I believe that a lighter sentence could still effectively convey the seriousness of the offense while recognizing the defendant’s mitigating factors.
In conclusion, considering all viewpoints, I maintain that a sentence of 48 months is appropriate. This decision takes into account the need for punishment, deterrence, and the defendant’s genuine remorse, providing a more balanced approach to justice. A 48-month sentence would still serve as an adequate punishment while also allowing for the possibility of rehabilitation, ultimately aligning with the relevant provisions of the Criminal Law and ensuring a just outcome.
Therefore, I recommend that the panel reach a consensus on imposing a sentence of 48 months on the defendant Liu.
Appendix C.2. Lay Judge Deliberation (Round 1): Su
As a Lay Judge, I believe that the sentencing in this case is highly important as it concerns both social justice and public trust in the law. Liu’s actions are not only violations of the law but also serious infringements on public interests. As a public official, he abused his position for bribery and fraud, which undermined public trust in the government. The social harm of such behavior is evident.
The Presiding Judge Zhou proposed a sentence of 60 months, which I understand emphasizes deterrence and punishment. Such a sentence can convey a zero-tolerance stance on similar behavior, uphold the authority of the law, and strengthen public confidence. As a public servant, Liu’s actions were not only personal misconduct but also a betrayal of public office, and society’s tolerance for such behavior should be very low.
Judge Zhang’s suggested sentence of 48 months, although considering the defendant’s remorse, does not fully reflect the seriousness of the behavior. The amounts and circumstances involved indicate the severity of his actions, and a lenient sentence could lead to a misunderstanding of the law, implying that illegal behavior can easily evade punishment.
Therefore, I maintain that sentencing Liu to 54 months in prison is appropriate. This is not only a punishment for his actions but also a warning to society, effectively upholding the authority of the law and public trust. As judges and jurors, we must reflect our firm stance on social justice in our sentencing.
Appendix C.3. Presiding Judge Consensus Determination: Zhou
Conclusion: No
Main Points of Disagreement: The panel members have significant differences regarding sentencing. Presiding Judge Zhou suggests a sentence of 60 months, emphasizing the severity of the defendant’s actions and the need for deterrence. Judge Zhang suggests 48 months, considering the defendant’s remorse and first-time offense. Lay Judge Zhang supports 54 months, aiming to strike a balance between punishment and rehabilitation. The different perspectives on sentencing and the rationale behind them have led to a lack of consensus.
Appendix C.4. Presiding Judge Updated Sentencing: Zhou
Presiding Judge (Zhou):
After the panel discussion, I believe we need to reassess the sentencing for defendant Liu. While I initially proposed a sentence of 60 months, emphasizing the severity of his actions and the deterrent effect on society, after hearing the views of the judges and jurors, I realize that we must more comprehensively consider the defendant’s remorse and first-time offense.
Firstly, the defendant’s actions indeed constitute bribery and fraud, with substantial amounts involved and significant social harm. However, Judge Zhang Jun pointed out the importance of considering the defendant’s remorse and first-time offense. If the defendant sincerely showed remorse during the trial and it was his first offense, this could be considered grounds for a lighter sentence.
Secondly, Lay Judge Zhang Wei’s suggestion of a 54-month sentence strikes a balance between punishment and rehabilitation. I believe that further adjustments can be made to better reflect the defendant’s remorse and first-time offense.
Therefore, I have decided to modify my sentencing opinion to 54 months. This sentence can reflect the punishment for the defendant’s actions while considering his remorse and first-time offense to some extent. Such a decision aligns with the relevant provisions of the Criminal Law and conveys a firm stance on social justice while maintaining the authority of the law.
In conclusion, I propose that the panel supports sentencing defendant Liu to 54 months. I invite the judges to discuss this opinion.
Appendix C.5. Judge Revises Sentence—Zhang
After the in-depth discussion among the collegial panel, I have re-examined the sentencing of defendant Liu and considered the opinions and suggestions presented by all parties. Based on the following reasons, I have decided to revise my sentencing recommendation to 54 months of imprisonment.
Firstly, while the defendant Liu’s actions constitute bribery and fraud, involving a substantial amount and causing significant social harm, we must also consider his remorse and the fact that this is his first offense when determining the sentence. Judge Zhang emphasized the defendant’s remorse, particularly his attitude during the trial, which should be an important factor considered during sentencing. If the defendant can demonstrate sincere remorse and actively return the illicit gains, these can be considered as grounds for a lighter punishment.
Secondly, Lay Judge Su proposed a sentence of 54 months, which strikes a balance between punishment and education, effectively serving as a deterrent for similar behavior while recognizing the defendant’s remorse. This sentencing is in line with the relevant provisions of the Criminal Law, while also conveying a firm stance on social justice and upholding the authority of the law.
Finally, considering that Liu was a public official, the harm caused to the public interest cannot be ignored. Therefore, the severity of the punishment must still be reflected in the sentencing. A sentence of 54 months not only effectively punishes the defendant but also acknowledges his remorse to a certain extent, balancing punishment and rehabilitation.
In conclusion, I believe that sentencing defendant Liu to 54 months of imprisonment is appropriate. This decision not only maintains the authority of the law but also helps promote the defendant’s reform, maximizing the societal effect. I hope the collegial panel can reach a consensus on this opinion.
Appendix C.6. Lay Judge Revises Sentence—Su
As a lay judge, after the collegial panel’s discussion, I have re-evaluated the sentencing of defendant Liu. Based on the following reasons, I have decided to revise my sentencing recommendation to 54 months of imprisonment.
Firstly, the defendant Liu’s actions indeed constitute bribery and fraud, involving a significant amount and causing serious social harm. Such behavior not only harms public interests but also undermines public trust in the government. Therefore, it is necessary to impose a severe penalty on the defendant to uphold the authority of the law and social justice.
Secondly, in determining the sentence, we must consider the defendant’s remorse and the fact that this is his first offense. Although his actions caused harm to society, his genuine remorse during the trial and being a first-time offender can be grounds for leniency. Judge Zhang and Lay Judge Su both mentioned the defendant’s remorse, which I believe is significant. We must strike a balance between punishment and education, ensuring that we punish the defendant’s actions while also giving him a chance to reform.
Lastly, a sentence of 54 months strikes a balance between punishment and education, effectively deterring similar offenses while acknowledging the defendant’s remorse. This sentencing is not only in line with the relevant provisions of the Criminal Law but also conveys a firm stance on social justice while upholding the authority of the law.
In conclusion, I believe sentencing defendant Liu to 54 months is appropriate. This decision not only effectively punishes his actions but also acknowledges his remorse to some extent, achieving a balance between punishment and rehabilitation. I hope the collegial panel can reach a consensus on this opinion.
Appendix C.7. Collegial Panel Discussion (Round 2)—Zhou
Presiding Judge (Zhou):
Colleagues, thank you for your thorough discussion. We have reached a certain consensus regarding the sentencing of defendant Liu, and the current recommendation is 54 months of imprisonment. I will now summarize the opinions of each party and guide us further to ensure we reach a final agreement on the sentencing.
- **Presiding Judge Zhou**: Initially suggested 60 months, emphasizing the severity of the defendant’s actions and their deterrent effect on society. However, after considering the opinions of all parties, I recognized the importance of remorse and the fact that this was a first offense, and revised my recommendation to 54 months.
- **Judge Zhang**: Believes the defendant’s remorse and first-time offense should be considered, supporting the 54-month sentence as a balance between punishment and education.
- **Lay Judge Su**: Also supports the 54-month sentence, emphasizing the social harm caused by the defendant’s actions while recognizing the importance of remorse. Su believes that a 54-month sentence effectively serves as a deterrent.
Based on this, I would like to further discuss the following issues to help us reach a final consensus:
- **Specifics of the Remorse**: Did the defendant demonstrate specific remorse during the trial? Did he actively return the illicit gains? Are these factors sufficient to influence sentencing?
- **Evaluation of Social Impact**: Have we adequately considered the long-term impact of the defendant’s actions on society? Is a 54-month sentence sufficient to convey a zero-tolerance attitude towards similar actions?
- **Consistency in Legal Application**: Are there similar sentencing standards in comparable cases? Can we ensure that this decision maintains consistency in the application of the law?
Please share your thoughts on the above issues, especially regarding whether the defendant’s remorse is sufficient to affect the sentencing. Let’s explore this further to see if we can reach a consensus.
Appendix C.8. Collegial Panel Discussion (Round 2)—Zhang
In this case, the members of the collegial panel have reached a certain consensus regarding the sentencing of the defendant Liu. The current agreed-upon opinion is to sentence Liu to 54 months of imprisonment. As a judge, I will analyze the legal application, the nature of the crime, the social impact, and the defendant’s remorse to better explain my viewpoint.
Firstly, Liu’s conduct constitutes bribery and fraud, involving a substantial amount, causing significant social harm. According to Articles 385 and 266 of the Criminal Law, both bribery and fraud involve large sums and severe circumstances, deserving corresponding punishment. The defendant used his position to engage in illegal activities, harming public interests and undermining trust in the government, which clearly shows the social harm of his actions.
Secondly, when determining the sentence, we must comprehensively consider the circumstances of the crime, the amount involved, the social impact, and the defendant’s remorse. While the defendant may have shown some remorse, the nature of his crime and its impact on society necessitate a significant penalty. Presiding Judge Zhou initially proposed a 60-month sentence, emphasizing the deterrent and punitive effects on society, which is crucial. As a public official, Liu’s conduct was not just an individual violation of the law, but a serious betrayal of public duties. The tolerance for such behavior in society should be extremely low.
However, Judge Zhang and Lay Judge Su have suggested a sentence of 54 months, taking into account the defendant’s remorse and first-time offense, which are indeed factors that should be considered during sentencing. Whether the defendant has genuinely shown remorse during the trial and whether he actively returned the illicit gains are elements that can be grounds for leniency.
After the collegial panel’s discussion, I believe that 54 months of imprisonment strikes an appropriate balance between punishment and rehabilitation, effectively conveying a warning to others while acknowledging the defendant’s remorse. Such a sentence not only aligns with the relevant provisions of the Criminal Law but also conveys a firm stance on upholding social justice while maintaining the authority of the law.
In conclusion, I support sentencing Liu to 54 months of imprisonment. This sentence effectively punishes the defendant’s actions while acknowledging his remorse to some extent, achieving a balance between punishment and rehabilitation. I hope the collegial panel can reach a consensus on this opinion.
Appendix C.9. Collegial Panel Discussion (Round 2)—Su
As a lay judge, I believe that the sentencing of defendant Liu in this case is not only about his personal fate but also about social justice and the authority of the law. After a thorough analysis of the case, I maintain that sentencing Liu to 54 months of imprisonment is appropriate, for the following reasons:
Firstly, Liu’s conduct constitutes bribery and fraud, involving substantial amounts and significant social harm. As a public official, he abused his position to engage in illegal activities, damaging public interests and undermining public trust in the government. The social harm caused by such behavior is evident, and severe punishment is necessary to uphold the authority of the law and social justice.
Secondly, during sentencing, we must consider the defendant’s remorse and first-time offense. Although his actions have harmed society, his genuine remorse during the trial and the fact that this was his first offense can be considered as grounds for leniency. Judge Zhang and Presiding Judge Zhou both mentioned the defendant’s remorse, which I believe should be taken seriously. We need to strike a balance between punishment and education, ensuring that we effectively punish the defendant while giving him the opportunity to reform.
Finally, a sentence of 54 months strikes a balance between punishment and rehabilitation, effectively serving as a deterrent for similar offenses while recognizing the defendant’s remorse. This sentence is not only consistent with the relevant provisions of the Criminal Law but also conveys a firm stance on upholding social justice while maintaining the authority of the law.
In conclusion, I believe that sentencing Liu to 54 months of imprisonment is appropriate. This decision not only effectively punishes his actions but also acknowledges his remorse to some extent, achieving a balance between punishment and rehabilitation. I hope the collegial panel can reach a consensus on this opinion.
Appendix C.10. Presiding Judge’s Consensus Decision—Zhou
Conclusion: Yes
The final sentencing opinion is: defendant Liu is sentenced to 54 months of imprisonment.
After thorough discussion among the collegial panel, all three members have reached a consensus, agreeing that 54 months of imprisonment is appropriate for effectively punishing Liu’s conduct while also acknowledging his remorse to some extent, achieving a balance between punishment and rehabilitation.
Appendix C.11. Summary of Collegial Panel Discussion—Zhou
In the trial of this case, the members of the collegial panel conducted an in-depth discussion on the sentencing of defendant Liu and ultimately reached a consensus, sentencing him to 54 months of imprisonment. The following is a summary of the discussion process and the reasons for reaching a consensus:
### Summary of the Discussion Process:
**Initial Opinions**:
- Presiding Judge Zhou initially recommended 60 months, emphasizing the severity of the defendant’s actions and their deterrent effect on society.
- Judge Zhang proposed 48 months, considering the defendant’s remorse and first-time offense.
- Lay Judge Su supported a sentence of 54 months, seeking a balance between punishment and education.
**Analysis of Differences**:
- There were differences in the basis and perspective of sentencing. The presiding judge emphasized social impact and deterrence, while the judge focused on remorse and first-time offense considerations.
**Revisiting Sentencing**:
- After the collegial panel’s discussion, Presiding Judge Zhou recognized the importance of the defendant’s remorse and first-time offense, and revised his recommendation to 54 months.
- Judge Zhang and Lay Judge Su also supported the 54-month sentence, believing it struck a balance between punishment and rehabilitation.
**Final Consensus**:
- During the second round of discussion, all members of the collegial panel expressed support for a 54-month sentence, believing that such a decision effectively punishes the defendant while recognizing his remorse.
### Reasons for Reaching a Consensus:
**Severity of the Crime**:
- Liu’s conduct constitutes bribery and fraud, involving a significant amount, harming public interests, and undermining public trust in the government.
**Consideration of Remorse**:
- The defendant demonstrated a degree of remorse during the trial, and since this is his first offense, leniency should be considered during sentencing.
**Consistency in Legal Application**:
- A sentence of 54 months is consistent with the provisions of the Criminal Law, upholding the authority of the law while conveying a firm stance on social justice.
**Balance Between Punishment and Education**:
- Sentencing Liu to 54 months effectively serves as a deterrent for similar offenses while recognizing the defendant’s remorse, achieving a balance between punishment and rehabilitation.
In conclusion, after thorough discussion, the members of the collegial panel believe that sentencing Liu to 54 months of imprisonment is appropriate, effectively punishing his actions while also acknowledging his remorse to some extent.
1 The official English version of the Chinese Criminal Law: http://www.npc.gov.cn/zgrdw/englishnpc/Law/2007-12/13/content_1384075.htm (accessed on 20 July 2025).
References
- Zhong, H.; Xiao, C.; Tu, C.; Zhang, T.; Liu, Z.; Sun, M. How does NLP benefit legal system: A summary of legal artificial intelligence. arXiv 2020, arXiv:2004.12158.
- Bellandi, V.; Bernasconi, C.; Lodi, F.; Palmonari, M.; Pozzi, R.; Ripamonti, M.; Siccardi, S. An entity-centric approach to manage court judgments based on natural language processing. Comput. Law Secur. Rev. 2024, 52, 105904.
- Xu, S.; Santosh, T.; Ichim, O.; Plank, B.; Grabmair, M. Through the lens of split vote: Exploring disagreement, difficulty and calibration in legal case outcome classification. arXiv 2024, arXiv:2402.07214.
- Cui, J.; Shen, X.; Wen, S. A survey on legal judgment prediction: Datasets, metrics, models and challenges. IEEE Access 2023, 11, 102050–102071.
- Landsman, S.; Zhang, J. A tale of two juries: Lay participation comes to Japanese and Chinese courts. UCLA Pac. Basin Law J. 2007, 25, 179.
- Dawson, J.P. A History of Lay Judges; Harvard University Press: Cambridge, MA, USA, 1960.
- Heppenstall, A.; Malleson, N.; Crooks, A. “Space, the Final Frontier”: How good are agent-based models at simulating individuals and space in cities? Systems 2016, 4, 9.
- Benthall, S.; Strandburg, K.J. Agent-based modeling as a legal theory tool. Front. Phys. 2021, 9, 666386.
- Wu, Q.; Bansal, G.; Zhang, J.; Wu, Y.; Li, B.; Zhu, E.; Jiang, L.; Zhang, X.; Zhang, S.; Liu, J.; et al. AutoGen: Enabling next-gen LLM applications via multi-agent conversation. arXiv 2023, arXiv:2308.08155.
- Hong, S.; Zheng, X.; Chen, J.; Cheng, Y.; Wang, J.; Zhang, C.; Wang, Z.; Yau, S.K.S.; Lin, Z.; Zhou, L.; et al. MetaGPT: Meta programming for multi-agent collaborative framework. arXiv 2023, arXiv:2308.00352.
- Abdelnabi, S.; Gomaa, A.; Sivaprasad, S.; Schönherr, L.; Fritz, M. LLM-Deliberation: Evaluating LLMs with interactive multi-agent negotiation games. In Proceedings of the ICLR 2024, Vienna, Austria, 7–11 May 2024.
- Park, J.S.; O’Brien, J.; Cai, C.J.; Morris, M.R.; Liang, P.; Bernstein, M.S. Generative agents: Interactive simulacra of human behavior. In Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology, San Francisco, CA, USA, 29 October–1 November 2023; pp. 1–22.
- He, Z.; Cao, P.; Chen, Y.; Liu, K.; Li, R.; Sun, M.; Zhao, J. Lego: A multi-agent collaborative framework with role-playing and iterative feedback for causality explanation generation. In Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, Singapore, 6–10 December 2023; pp. 9142–9163.
- Han, S.; Zhang, Q.; Yao, Y.; Jin, W.; Xu, Z. LLM multi-agent systems: Challenges and open problems. arXiv 2024, arXiv:2402.03578.
- Li, J.; Wang, S.; Zhang, M.; Li, W.; Lai, Y.; Kang, X.; Ma, W.; Liu, Y. Agent hospital: A simulacrum of hospital with evolvable medical agents. arXiv 2024, arXiv:2405.02957.
- Qian, C.; Liu, W.; Liu, H.; Chen, N.; Dang, Y.; Li, J.; Yang, C.; Chen, W.; Su, Y.; Cong, X.; et al. ChatDev: Communicative agents for software development. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Bangkok, Thailand, 11–16 August 2024; pp. 15174–15186.
- Kort, F. Predicting supreme court decisions mathematically: A quantitative analysis of the “right to counsel” cases. Am. Political Sci. Rev. 1957, 51, 1–12.
- Xiao, C.; Zhong, H.; Guo, Z.; Tu, C.; Liu, Z.; Sun, M.; Feng, Y.; Han, X.; Hu, Z.; Wang, H.; et al. CAIL2018: A large-scale legal dataset for judgment prediction. arXiv 2018, arXiv:1807.02478.
- Chalkidis, I.; Androutsopoulos, I.; Aletras, N. Neural legal judgment prediction in English. arXiv 2019, arXiv:1906.02059.
- Luo, B.; Feng, Y.; Xu, J.; Zhang, X.; Zhao, D. Learning to predict charges for criminal cases with legal basis. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Copenhagen, Denmark, 7–11 September 2017; pp. 2727–2736.
- Hong, Y.-X.; Chang, C.-H. Improving colloquial case legal judgment prediction via abstractive text summarization. Comput. Law Secur. Rev. 2023, 51, 105863.
- Chien, K.-C.; Chang, C.-H.; Sun, R.-D. Legal knowledge management for prosecutors based on judgment prediction and error analysis from indictments. Comput. Law Secur. Rev. 2024, 52, 105902.
- An, Z.; Huang, Q.; Jiang, C.; Feng, Y.; Zhao, D. Do charge prediction models learn legal theory? In Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, Abu Dhabi, United Arab Emirates, 7–11 December 2022; pp. 3757–3768.
- Medvedeva, M.; McBride, P. Legal judgment prediction: If you are going to do it, do it right. In Proceedings of the Natural Legal Language Processing Workshop 2023, Singapore, 7 December 2023; pp. 73–84.
- Achiam, J.; Adler, S.; Agarwal, S.; Ahmad, L.; Akkaya, I.; Aleman, F.L.; Almeida, D.; Altenschmidt, J.; Altman, S.; Anadkat, S.; et al. GPT-4 technical report. arXiv 2023, arXiv:2303.08774.
- Liga, D.; Robaldo, L. Fine-tuning GPT-3 for legal rule classification. Comput. Law Secur. Rev. 2023, 51, 105864.
- Jiang, H.; Zhang, X.; Mahari, R.; Kessler, D.; Ma, E.; August, T.; Li, I.; Pentland, A.; Kim, Y.; Kabbara, J.; et al. Leveraging large language models for learning complex legal concepts through storytelling. arXiv 2024, arXiv:2402.17019.
- Deroy, A.; Ghosh, K.; Ghosh, S. Applicability of large language models and generative models for legal case judgement summarization. Artif. Intell. Law 2024.
- Jiang, C.; Yang, X. Legal syllogism prompting: Teaching large language models for legal judgment prediction. In Proceedings of the Nineteenth International Conference on Artificial Intelligence and Law, Braga, Portugal, 19–23 June 2023; pp. 417–421.
- Licari, D.; Comandè, G. Italian-Legal-BERT: A pre-trained transformer language model for Italian law. In Proceedings of the Knowledge Management for Law Workshop (KM4LAW), Bozen-Bolzano, Italy, 26 September 2022.
- Huang, Q.; Tao, M.; Zhang, C.; An, Z.; Jiang, C.; Chen, Z.; Wu, Z.; Feng, Y. Lawyer LLaMA technical report. arXiv 2023, arXiv:2305.15062.
- Hamilton, S. Blind judgement: Agent-based supreme court modelling with GPT. arXiv 2023, arXiv:2301.05327.
- Fei, Z.; Shen, X.; Zhu, D.; Zhou, F.; Han, Z.; Zhang, S.; Chen, K.; Shen, Z.; Ge, J. Lawbench: Benchmarking legal knowledge of large language models. arXiv 2023, arXiv:2309.16289. [Google Scholar]
- Bai, J.; Bai, S.; Chu, Y.; Cui, Z.; Dang, K.; Deng, X.; Fan, Y.; Ge, W.; Han, Y.; Huang, F.; et al. Qwen technical report. arXiv 2023, arXiv:2309.16609. [Google Scholar]
- Kojima, T.; Gu, S.S.; Reid, M.; Matsuo, Y.; Iwasawa, Y. Large language models are zero-shot reasoners. Adv. Neural Inf. Process. Syst. 2022, 35, 22199–22213. [Google Scholar]
- Blair-Stanek, A.; Holzenberger, N.; Van Durme, B. Can gpt-3 perform statutory reasoning? In Proceedings of the Nineteenth International Conference on Artificial Intelligence and Law, Braga, Portugal, 19–23 June 2023; pp. 22–31. [Google Scholar]
| Model | Method | Performance (%) | Legality (%) | Logicality (%) | Morality (%) |
|---|---|---|---|---|---|
| Qwen | Standard Prompt | 74.22 | | | |
| Qwen | CoT | 73.32 | 49.3 | 53.6 | 49.2 |
| Qwen | LS | 72.82 | 53.2 | 52.5 | 48.9 |
| Qwen | AgentsBench | 78.25 | 55.4 | 53.4 | 68.7 |
| GPT-3.5 | Standard Prompt | 75.13 | | | |
| GPT-3.5 | CoT | 76.45 | 51.0 | 55.0 | 50.1 |
| GPT-3.5 | LS | 74.72 | 54.8 | 54.4 | 49.6 |
| GPT-3.5 | AgentsBench | 80.81 | 55.3 | 55.2 | 72.1 |
| GPT-4 | Standard Prompt | 80.98 | | | |
| GPT-4 | CoT | 79.76 | 54.1 | 57.3 | 51.5 |
| GPT-4 | LS | 80.28 | 56.8 | 56.6 | 52.2 |
| GPT-4 | AgentsBench | 86.33 | 56.5 | 58.2 | 76.2 |
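To make the size of the gains in the results table easier to read, the following Python sketch transcribes its scores and computes, for each backbone and metric, the difference in percentage points between AgentsBench and the strongest baseline that reports a score for that metric. It is a reading aid only, not the authors' evaluation code: `METRICS`, `RESULTS`, and `margin_over_best_baseline` are hypothetical names, and empty table cells are assumed to mean that no score was reported.

```python
from typing import Dict, Optional, Tuple

# Column order of the score tuples below (all values are percentages).
METRICS = ("Performance", "Legality", "Logicality", "Morality")

# Scores transcribed from the results table. None marks a cell the table
# leaves empty (assumed: not reported for the Standard Prompt baseline).
Scores = Tuple[float, Optional[float], Optional[float], Optional[float]]
RESULTS: Dict[str, Dict[str, Scores]] = {
    "Qwen": {
        "Standard Prompt": (74.22, None, None, None),
        "CoT": (73.32, 49.3, 53.6, 49.2),
        "LS": (72.82, 53.2, 52.5, 48.9),
        "AgentsBench": (78.25, 55.4, 53.4, 68.7),
    },
    "GPT-3.5": {
        "Standard Prompt": (75.13, None, None, None),
        "CoT": (76.45, 51.0, 55.0, 50.1),
        "LS": (74.72, 54.8, 54.4, 49.6),
        "AgentsBench": (80.81, 55.3, 55.2, 72.1),
    },
    "GPT-4": {
        "Standard Prompt": (80.98, None, None, None),
        "CoT": (79.76, 54.1, 57.3, 51.5),
        "LS": (80.28, 56.8, 56.6, 52.2),
        "AgentsBench": (86.33, 56.5, 58.2, 76.2),
    },
}


def margin_over_best_baseline(model: str, metric: str) -> Tuple[str, float]:
    """Return the strongest baseline for `metric` on `model` and the
    AgentsBench score minus that baseline's score, in percentage points."""
    idx = METRICS.index(metric)
    rows = RESULTS[model]
    # Keep only baselines that actually report this metric.
    baselines = {
        method: scores[idx]
        for method, scores in rows.items()
        if method != "AgentsBench" and scores[idx] is not None
    }
    best = max(baselines, key=baselines.get)
    return best, round(rows["AgentsBench"][idx] - baselines[best], 2)


if __name__ == "__main__":
    for model in RESULTS:
        for metric in METRICS:
            best, delta = margin_over_best_baseline(model, metric)
            print(f"{model:8} {metric:12} vs {best:16} {delta:+.2f} pp")
```

Run as a script, it prints one line per backbone and metric. The margins are not uniform: AgentsBench leads on performance and morality for every backbone, whereas LS scores 0.3 points higher on GPT-4 legality and CoT scores 0.2 points higher on Qwen logicality.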
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
MDPI and ACS Style: Jiang, C.; Yang, X. AgentsBench: A Multi-Agent LLM Simulation Framework for Legal Judgment Prediction. Systems 2025, 13, 641. https://doi.org/10.3390/systems13080641
AMA Style: Jiang C, Yang X. AgentsBench: A Multi-Agent LLM Simulation Framework for Legal Judgment Prediction. Systems. 2025; 13(8):641. https://doi.org/10.3390/systems13080641
Chicago/Turabian Style: Jiang, Cong, and Xiaolei Yang. 2025. "AgentsBench: A Multi-Agent LLM Simulation Framework for Legal Judgment Prediction" Systems 13, no. 8: 641. https://doi.org/10.3390/systems13080641
APA Style: Jiang, C., & Yang, X. (2025). AgentsBench: A Multi-Agent LLM Simulation Framework for Legal Judgment Prediction. Systems, 13(8), 641. https://doi.org/10.3390/systems13080641