RAE: A Role-Based Adaptive Framework for Evaluating Automatically Generated Public Opinion Reports
Abstract
1. Introduction
- We propose a multi-perspective evaluation framework that addresses the challenge of evaluating open-ended dimensions by dynamically generating roles to capture diverse, context-specific stakeholder viewpoints.
- We introduce a multi-role reasoning aggregation mechanism that, for each evaluation dimension, synthesizes the perspectives of multiple roles into a comprehensive dimension-specific score (a minimal illustrative sketch follows this list).
- Comprehensive experiments demonstrate that RAE achieves strong alignment with human expert judgments, with particularly notable improvements on the most challenging, highly open-ended dimensions.
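To make the aggregation idea above concrete, the following minimal Python sketch reduces per-role 1–5 scores for a single dimension to one score via a simple majority vote with a median fallback. The role names, the `aggregate_role_scores` helper, and the tie-breaking rule are illustrative assumptions, not the paper's exact implementation.

```python
from collections import Counter
from statistics import median

def aggregate_role_scores(role_scores: dict[str, int]) -> int:
    """Reduce per-role 1-5 scores for one dimension to a single score.

    Majority vote over the role scores; ties fall back to the median.
    (Illustrative sketch only, not the exact RAE implementation.)
    """
    counts = Counter(role_scores.values()).most_common()
    # Unique winner -> return it; otherwise break the tie with the median.
    if len(counts) == 1 or counts[0][1] > counts[1][1]:
        return counts[0][0]
    return round(median(role_scores.values()))

# Hypothetical example: three dynamically generated roles scoring "Innovation".
example = {"policy analyst": 4, "community moderator": 4, "crisis PR expert": 3}
print(aggregate_role_scores(example))  # -> 4
```

In RAE, such per-dimension scores are produced for every report dimension; Section 6.1 compares this voting-style aggregation against alternative aggregation strategies.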
2. Related Work
3. Preliminaries: The Structure of a Public Opinion Report
3.1. Event Title
3.2. Event Summary
3.3. Event Timeline
3.4. Event Focus
3.5. Event Suggestions
4. Methodology
4.1. Task Definition and Research Hypotheses
- H1: What aggregation mechanism optimally synthesizes multi-role evaluations into reliable scores?
- H2: Does increasing the number of dynamic roles continuously improve evaluation performance, or is there an optimal threshold?
- H3: Why employ adaptive role composition rather than using only dynamically generated roles for all dimensions?
- H4: What underlying mechanisms account for RAE’s improved alignment with human expert judgments in complex evaluation scenarios?
4.2. Adaptive Role-Play Mechanism
4.2.1. Predefined Roles for Dimensions with Reference Information
4.2.2. Dynamic Roles for Highly Open-Ended Dimensions
4.3. Multi-Role Reasoning Aggregation
5. Experiments
5.1. Datasets
5.2. Experimental Setup
5.2.1. Baseline
5.2.2. Model Configuration
5.2.3. Evaluation Metrics
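Although this subsection is only outlined here, the agreement statistics reported later (Spearman's ρ, Kendall's τ, and MAE) can be computed with standard SciPy and NumPy calls. The sketch below assumes the automatic and expert scores are already aligned as two equal-length arrays and is not tied to the paper's exact pipeline.

```python
import numpy as np
from scipy.stats import spearmanr, kendalltau

def agreement_metrics(model_scores, expert_scores):
    """Correlation and error metrics between model and human expert scores."""
    model = np.asarray(model_scores, dtype=float)
    expert = np.asarray(expert_scores, dtype=float)
    rho, _ = spearmanr(model, expert)    # rank correlation (Spearman's rho)
    tau, _ = kendalltau(model, expert)   # rank correlation (Kendall's tau)
    mae = np.abs(model - expert).mean()  # mean absolute error (lower is better)
    return {"spearman": rho, "kendall": tau, "mae": mae}

# Hypothetical scores for five reports on one dimension.
print(agreement_metrics([5, 4, 3, 4, 2], [5, 4, 4, 3, 2]))
```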
5.3. Evaluation Framework Validation
5.3.1. Human Evaluation Protocol
- Calibration: Experts first rate a subset of reports to establish a consistent understanding of the scoring criteria, with inter-rater agreement formally measured using the Intraclass Correlation Coefficient (ICC); a minimal computation sketch follows this list.
- Formal Evaluation: After achieving a high level of agreement in the calibration phase, the experts proceed to score the entire corpus. This phase yields the three complete sets of ratings used for our human-agent agreement analysis.
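As a concrete reference for the calibration step, the sketch below computes a two-way, consistency-type ICC for a single rater, i.e., ICC(3,1) in the Shrout–Fleiss taxonomy, from an items × raters matrix. Whether the paper reports the single-rater or average-rater form of ICC3 is not specified in this excerpt, so treat the variant choice as an assumption.

```python
import numpy as np

def icc3_1(scores: np.ndarray) -> float:
    """ICC(3,1): two-way mixed effects, consistency, single rater.

    `scores` has shape (n_items, n_raters); here, items are reports
    (for one dimension) and raters are the three experts.
    Formula: (MS_rows - MS_error) / (MS_rows + (k - 1) * MS_error).
    """
    n, k = scores.shape
    grand_mean = scores.mean()
    row_means = scores.mean(axis=1)
    col_means = scores.mean(axis=0)

    ss_rows = k * ((row_means - grand_mean) ** 2).sum()
    ss_cols = n * ((col_means - grand_mean) ** 2).sum()
    ss_error = ((scores - grand_mean) ** 2).sum() - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)

# Hypothetical calibration ratings: 4 reports x 3 experts on one dimension.
ratings = np.array([[4, 4, 5], [3, 3, 3], [5, 4, 5], [2, 2, 3]])
print(round(icc3_1(ratings), 3))
```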
5.3.2. Agreement Analysis
6. Discussions
6.1. Validation of H1: Voting Aggregation Achieves Optimal Alignment with Human Judgments
6.2. Validation of H2: Five Dynamic Roles Optimally Balance Diversity and Relevance
6.3. Validation of H3: Dynamic Role Generation Is Critical, While Adaptive Composition Improves Efficiency
6.4. Validation of H4: Case Evidence Illustrates Why Multi-Perspective Evaluation Improves Expert Alignment
7. Conclusions
7.1. Limitations
7.2. Future Work
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A. Detailed Scoring Criteria
Appendix A.1. Objective Dimensions: Timeline—Date Accuracy
| Scoring Criteria for Date Accuracy (Objective Dimensions) |
|---|
| This criterion evaluates the factual accuracy of the key dates in the Event_Timeline. The evaluation is based on four critical reference dates provided for the event’s lifecycle: the start of the Incubation Period, the start of the Peak Period, and the start and end of the Decline Period. The final 1–5 score is determined by how many of these four critical dates are correctly reflected in the generated timeline. The mapping is as follows: |
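Because the mapping table itself is not reproduced in this excerpt, the sketch below only illustrates the shape of such a count-to-score rule; the linear mapping it assumes (0 correct dates → 1, …, 4 correct → 5) is hypothetical, not the paper's published criteria.

```python
def date_accuracy_score(correct_dates: int) -> int:
    """Map the number of correctly reflected critical dates (0-4) to a 1-5 score.

    ASSUMED linear mapping for illustration only; the paper's actual
    mapping table is not reproduced in this excerpt.
    """
    if not 0 <= correct_dates <= 4:
        raise ValueError("correct_dates must be between 0 and 4")
    return correct_dates + 1

# e.g. 3 of the 4 reference dates correct -> score 4 under this assumed rule.
print(date_accuracy_score(3))
```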
Appendix A.2. Subjective Dimensions: Event Suggestions—Innovation
| Scoring Criteria for Innovation (Subjective Dimensions) |
|---|
| This section details the four scoring dimensions for the Event Suggestions. Innovation: This criterion evaluates whether the suggestions offer novel or forward-thinking approaches beyond standard practices. |
References
- Wang, B.; Zi, Y.; Zhao, Y.; Deng, P.; Qin, B. ESDM: Early sensing depression model in social media streams. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), Torino, Italy, 22–24 May 2024; Calzolari, N., Kan, M.Y., Hoste, V., Lenci, A., Sakti, S., Xue, N., Eds.; ELRA and ICCL: Torino, Italy, 2024; pp. 6288–6298. [Google Scholar]
- Hashemi, H.; Eisner, J.; Rosset, C.; Van Durme, B.; Kedzie, C. LLM-Rubric: A multidimensional, calibrated approach to automated evaluation of natural language texts. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Bangkok, Thailand, 11–16 August 2024; Ku, L.W., Martins, A., Srikumar, V., Eds.; Association for Computational Linguistics: Bangkok, Thailand, 2024; pp. 13806–13834. [Google Scholar]
- Wang, D.; Yang, K.; Zhu, H.; Yang, X.; Cohen, A.; Li, L.; Tian, Y. Learning Personalized Alignment for Evaluating Open-ended Text Generation. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, Miami, FL, USA, 12–16 November 2024; pp. 13274–13292. [Google Scholar]
- Liu, Y.; Yu, J.; Xu, Y.; Li, Z.; Zhu, Q. A survey on transformer context extension: Approaches and evaluation. arXiv 2025, arXiv:2503.13299. [Google Scholar] [CrossRef]
- Chiang, C.H.; Lee, H.Y. Can large language models be an alternative to human evaluations? In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Toronto, ON, Canada, 9–14 July 2023; Rogers, A., Boyd-Graber, J., Okazaki, N., Eds.; Association for Computational Linguistics: Toronto, ON, Canada, 2023; pp. 15607–15631. [Google Scholar]
- Tseng, Y.M.; Huang, Y.C.; Hsiao, T.Y.; Chen, W.L.; Huang, C.W.; Meng, Y.; Chen, Y.N. Two tales of persona in LLMs: A survey of role-playing and personalization. In Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, Miami, FL, USA, 12–16 November 2024; Al-Onaizan, Y., Bansal, M., Chen, Y.N., Eds.; Association for Computational Linguistics: Miami, FL, USA, 2024; pp. 16612–16631. [Google Scholar]
- Chen, Q.; Qin, L.; Liu, J.; Peng, D.; Guan, J.; Wang, P.; Hu, M.; Zhou, Y.; Gao, T.; Che, W. Towards Reasoning Era: A Survey of Long Chain-of-Thought for Reasoning Large Language Models. arXiv 2025, arXiv:2503.09567. [Google Scholar]
- Liu, Y.; Iter, D.; Xu, Y.; Wang, S.; Xu, R.; Zhu, C. G-Eval: NLG evaluation using GPT-4 with better human alignment. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Singapore, 6–10 December 2023; Bouamor, H., Pino, J., Bali, K., Eds.; Association for Computational Linguistics: Singapore, 2023; pp. 2511–2522. [Google Scholar]
- Zheng, L.; Chiang, W.L.; Sheng, Y.; Zhuang, S.; Wu, Z.; Zhuang, Y.; Lin, Z.; Li, Z.; Li, D.; Xing, E.P.; et al. Judging LLM-as-a-judge with MT-bench and Chatbot Arena. In Proceedings of the 37th International Conference on Neural Information Processing Systems, Red Hook, NY, USA, 10–16 December 2023. [Google Scholar]
- Xiong, K.; Ding, X.; Cao, Y.; Liu, T.; Qin, B. Examining inter-consistency of large language models collaboration: An in-depth analysis via debate. In Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, Singapore, 6–10 December 2023; Bouamor, H., Pino, J., Bali, K., Eds.; Association for Computational Linguistics: Singapore, 2023; pp. 7572–7590. [Google Scholar]
- Lin, Y.C.; Neville, J.; Stokes, J.; Yang, L.; Safavi, T.; Wan, M.; Counts, S.; Suri, S.; Andersen, R.; Xu, X.; et al. Interpretable user satisfaction estimation for conversational systems with large language models. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Bangkok, Thailand, 11–16 August 2024; Ku, L.W., Martins, A., Srikumar, V., Eds.; Association for Computational Linguistics: Bangkok, Thailand, 2024; pp. 11100–11115. [Google Scholar]
- Liu, W.; Wang, X.; Wu, M.; Li, T.; Lv, C.; Ling, Z.; JianHao, Z.; Zhang, C.; Zheng, X.; Huang, X. Aligning large language models with human preferences through representation engineering. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Bangkok, Thailand, 11–16 August 2024; Ku, L.W., Martins, A., Srikumar, V., Eds.; Association for Computational Linguistics: Bangkok, Thailand, 2024; pp. 10619–10638. [Google Scholar]
- Lin, Y.T.; Chen, Y.N. LLM-Eval: Unified multi-dimensional automatic evaluation for open-domain conversations with large language models. In Proceedings of the 5th Workshop on NLP for Conversational AI (NLP4ConvAI 2023), Toronto, ON, Canada, 14 July 2023; Chen, Y.N., Rastogi, A., Eds.; Association for Computational Linguistics: Toronto, ON, Canada, 2023; pp. 47–58. [Google Scholar]
- Cegin, J.; Simko, J.; Brusilovsky, P. ChatGPT to replace crowdsourcing of paraphrases for intent classification: Higher diversity and comparable model robustness. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Singapore, 6–10 December 2023; Bouamor, H., Pino, J., Bali, K., Eds.; Association for Computational Linguistics: Singapore, 2023; pp. 1889–1905. [Google Scholar]
- Pan, Q.; Ashktorab, Z.; Desmond, M.; Santillán Cooper, M.; Johnson, J.; Nair, R.; Daly, E.; Geyer, W. Human-centered design recommendations for LLM-as-a-judge. In Proceedings of the 1st Human-Centered Large Language Modeling Workshop, Bangkok, Thailand, 15 August 2024; Soni, N., Flek, L., Sharma, A., Yang, D., Hooker, S., Schwartz, H.A., Eds.; Association for Computational Linguistics: Bangkok, Thailand, 2024; pp. 16–29. [Google Scholar]
- Xu, C.; Wen, B.; Han, B.; Wolfe, R.; Wang, L.L.; Howe, B. Do language models mirror human confidence? Exploring psychological insights to address overconfidence in LLMs. In Proceedings of the Findings of the Association for Computational Linguistics: ACL 2025, Vienna, Austria, 27 July–1 August 2025; Che, W., Nabende, J., Shutova, E., Pilehvar, M.T., Eds.; Association for Computational Linguistics: Vienna, Austria, 2025; pp. 25655–25672. [Google Scholar]
- Aher, G.; Arriaga, R.I.; Kalai, A.T. Using large language models to simulate multiple humans and replicate human subject studies. In Proceedings of the 40th International Conference on Machine Learning, Honolulu, HI, USA, 23–29 July 2023. [Google Scholar]
- Park, J.S.; Popowski, L.; Cai, C.; Morris, M.R.; Liang, P.; Bernstein, M.S. Social simulacra: Creating populated prototypes for social computing systems. In Proceedings of the 35th Annual ACM Symposium on User Interface Software and Technology, New York, NY, USA, 29 October–2 November 2022. [Google Scholar]
- Kim, A.; Kim, K.; Yoon, S. DEBATE: Devil’s advocate-based assessment and text evaluation. In Proceedings of the Findings of the Association for Computational Linguistics: ACL 2024, Bangkok, Thailand, 11–16 August 2024; Ku, L.W., Martins, A., Srikumar, V., Eds.; Association for Computational Linguistics: Bangkok, Thailand, 2024; pp. 1885–1897. [Google Scholar]
- Koo, R.; Lee, M.; Raheja, V.; Park, J.I.; Kim, Z.M.; Kang, D. Benchmarking cognitive biases in large language models as evaluators. In Proceedings of the Findings of the Association for Computational Linguistics: ACL 2024, Bangkok, Thailand, 11–16 August 2024; Ku, L.W., Martins, A., Srikumar, V., Eds.; Association for Computational Linguistics: Bangkok, Thailand, 2024; pp. 517–545. [Google Scholar]
- Kumar, S.; Nargund, A.A.; Sridhar, V. CourtEval: A courtroom-based multi-agent evaluation framework. In Proceedings of the Findings of the Association for Computational Linguistics: ACL 2025, Vienna, Austria, 27 July–1 August 2025; Che, W., Nabende, J., Shutova, E., Pilehvar, M.T., Eds.; Association for Computational Linguistics: Vienna, Austria, 2025; pp. 25875–25887. [Google Scholar]
- Li, G.; Al Kader Hammoud, H.A.; Itani, H.; Khizbullin, D.; Ghanem, B. CAMEL: Communicative agents for “mind” exploration of large language model society. In Proceedings of the 37th International Conference on Neural Information Processing Systems, Red Hook, NY, USA, 10–16 December 2023. [Google Scholar]
- Chen, A.; Lou, L.; Chen, K.; Bai, X.; Xiang, Y.; Yang, M.; Zhao, T.; Zhang, M. DUAL-REFLECT: Enhancing large language models for reflective translation through dual learning feedback mechanisms. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Bangkok, Thailand, 11–16 August 2024; Ku, L.W., Martins, A., Srikumar, V., Eds.; Association for Computational Linguistics: Bangkok, Thailand, 2024; pp. 693–704. [Google Scholar]
- Zhao, J.; Plaza-del Arco, F.M.; Genchel, B.; Curry, A.C. Language model council: Democratically benchmarking foundation models on highly subjective tasks. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), Albuquerque, NM, USA, 29 April–4 May 2025; Chiruzzo, L., Ritter, A., Wang, L., Eds.; Association for Computational Linguistics: Albuquerque, NM, USA, 2025; pp. 12395–12450. [Google Scholar]
- Zhang, Y.; Chen, Q.; Li, M.; Che, W.; Qin, L. AutoCAP: Towards automatic cross-lingual alignment planning for zero-shot chain-of-thought. In Proceedings of the Findings of the Association for Computational Linguistics: ACL 2024, Bangkok, Thailand, 11–16 August 2024; Ku, L.W., Martins, A., Srikumar, V., Eds.; Association for Computational Linguistics: Bangkok, Thailand, 2024; pp. 9191–9200. [Google Scholar]
- Bai, Y.; Ying, J.; Cao, Y.; Lv, X.; He, Y.; Wang, X.; Yu, J.; Zeng, K.; Xiao, Y.; Lyu, H.; et al. Benchmarking foundation models with language-model-as-an-examiner. In Proceedings of the 37th International Conference on Neural Information Processing Systems, Red Hook, NY, USA, 10–16 December 2023. [Google Scholar]
- Zhao, R.; Zhang, W.; Chia, Y.K.; Xu, W.; Zhao, D.; Bing, L. Auto-Arena: Automating LLM evaluations with agent peer battles and committee discussions. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Vienna, Austria, 27 July–1 August 2025; Che, W., Nabende, J., Shutova, E., Pilehvar, M.T., Eds.; Association for Computational Linguistics: Vienna, Austria, 2025; pp. 4440–4463. [Google Scholar]
- Chu, Z.; Ai, Q.; Tu, Y.; Li, H.; Liu, Y. Automatic large language model evaluation via peer review. In Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, New York, NY, USA, 21–25 October 2024; pp. 384–393. [Google Scholar]
- Wu, N.; Gong, M.; Shou, L.; Liang, S.; Jiang, D. Large language models are diverse role-players for summarization evaluation. In Proceedings of the Natural Language Processing and Chinese Computing: 12th National CCF Conference, NLPCC 2023, Foshan, China, 12–15 October 2023; Proceedings, Part I. Springer: Berlin/Heidelberg, Germany, 2023; pp. 695–707. [Google Scholar]
- Chen, H.; Goldfarb-Tarrant, S. Safer or luckier? LLMs as safety evaluators are not robust to artifacts. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Vienna, Austria, 27 July–1 August 2025; Che, W., Nabende, J., Shutova, E., Pilehvar, M.T., Eds.; Association for Computational Linguistics: Vienna, Austria, 2025; pp. 19750–19766. [Google Scholar]
- Li, Y.; Du, Y.; Zhang, J.; Hou, L.; Grabowski, P.; Li, Y.; Ie, E. Improving multi-agent debate with sparse communication topology. In Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, Miami, FL, USA, 12–16 November 2024; Al-Onaizan, Y., Bansal, M., Chen, Y.N., Eds.; Association for Computational Linguistics: Miami, FL, USA, 2024; pp. 7281–7294. [Google Scholar]
- Du, Y.; Li, S.; Torralba, A.; Tenenbaum, J.B.; Mordatch, I. Improving factuality and reasoning in language models through multiagent debate. In Proceedings of the 41st International Conference on Machine Learning, Vienna, Austria, 21–27 July 2024. [Google Scholar]
- Yu, J.; Xu, Y.; Li, H.; Li, J.; Zhu, L.; Shen, H.; Shi, L. OPOR-Bench: Evaluating Large Language Models on Online Public Opinion Report Generation. Comput. Mater. Contin. 2025. [Google Scholar] [CrossRef]
- Mu, H.; Xu, Y.; Feng, Y.; Han, X.; Li, Y.; Hou, Y.; Che, W. Beyond Static Evaluation: A Dynamic Approach to Assessing AI Assistants’ API Invocation Capabilities. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), Torino, Italy, 20–24 May 2024; pp. 2342–2353. [Google Scholar]
- Cohen, J. Statistical Power Analysis for the Behavioral Sciences, 2nd ed.; Lawrence Erlbaum Associates: Hillsdale, NJ, USA, 1988. [Google Scholar]
- Willmott, C.J.; Matsuura, K. Advantages of the Mean Absolute Error (MAE) over the Root Mean Square Error (RMSE) in assessing average model performance. Clim. Res. 2005, 30, 79–82. [Google Scholar] [CrossRef]
- Zhou, A.; Tsai, W.H.S.; Men, L.R. Optimizing AI Social Chatbots for Relational Outcomes: The Effects of Profile Design, Communication Strategies, and Message Framing. Int. J. Bus. Commun. 2024. [Google Scholar] [CrossRef]
- Dospinescu, N. A Study on Ethical Communication in Business. In Proceedings of the 3rd International Scientific Conference on Recent Advances in Information Technology, Tourism, Economics, Management and Agriculture—ITEMA 2019, Bratislava, Slovakia, 24 October 2019; Selected Papers. ITEMA: Belgrade, Serbia, 2019; pp. 165–172. [Google Scholar] [CrossRef]
- Kazlauskienė, I.; Atkočiūnienė, V. Application of information and communication technologies for public services management in smart villages. Businesses 2025, 5, 31. [Google Scholar] [CrossRef]
- Kocmi, T.; Federmann, C. Large language models are state-of-the-art evaluators of translation quality. In Proceedings of the 24th Annual Conference of the European Association for Machine Translation, Tampere, Finland, 12–15 June 2023; Nurminen, M., Brenner, J., Koponen, M., Latomaa, S., Mikhailov, M., Schierl, F., Ranasinghe, T., Vanmassenhove, E., Vidal, S.A., Aranberri, N., et al., Eds.; European Association for Machine Translation: Tampere, Finland, 2023; pp. 193–203. [Google Scholar]

| Dimension | ICC3 |
|---|---|
| Event Title | 0.843 |
| Event Summary | |
| Event Nature | 0.868 |
| Time & Loc. | 0.860 |
| Involved Parties | 0.856 |
| Causes | 0.879 |
| Impact | 0.887 |
| Event Timeline | |
| Date Acc. | 0.839 |
| Sub Events | 0.793 |
| Event Focus | |
| Contro. Topic | 0.894 |
| Repr. Stmt. | 0.877 |
| Emo. Anal. | 0.893 |
| Event Suggestions | |
| Rel. | 0.855 |
| Feas. | 0.841 |
| Emo. Guide. | 0.850 |
| Innov. | 0.861 |
| Method | Spearman’s ρ | Kendall’s τ | MAE ↓ |
|---|---|---|---|
| GPT-4o | |||
| VOTE | 0.873 | 0.864 | 0.335 |
| PROC | 0.862 | 0.852 | 0.346 |
| COMP | 0.844 | 0.834 | 0.382 |
| Direct | 0.695 | 0.605 | 0.534 |
| OPOR-Eval [33] | 0.721 | 0.633 | 0.412 |
| DeepSeek-v3 | |||
| VOTE | 0.868 | 0.861 | 0.414 |
| PROC | 0.855 | 0.843 | 0.443 |
| COMP | 0.825 | 0.819 | 0.488 |
| Direct | 0.285 | 0.240 | 0.885 |
| OPOR-Eval [33] | 0.312 | 0.288 | 0.761 |
Column groups: Title = Tit.; Summary = Nat., T&L., Par., Cau., Imp.; Timeline = Dat., Sub.; Focus = Top., Sta., Emo.; Suggestions = Rel., Fea., Gui., Inn.

| Method | Tit. | Nat. | T&L. | Par. | Cau. | Imp. | Dat. | Sub. | Top. | Sta. | Emo. | Rel. | Fea. | Gui. | Inn. | AVG |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| RAE | 0.88 | 0.87 | 0.89 | 0.88 | 0.86 | 0.88 | 0.86 | 0.89 | 0.86 | 0.89 | 0.87 | 0.88 | 0.87 | 0.87 | 0.88 | 0.88 |
| w/o Predefined | 0.81 † | 0.83 † | 0.83 † | 0.84 † | 0.77 † | 0.79 † | 0.83 † | 0.87 | 0.85 † | 0.89 † | 0.85 † | 0.89 † | 0.87 | 0.88 † | 0.86 † | 0.84 (−0.04) |
| w/o Dynamic | 0.83 † | 0.84 † | 0.86 † | 0.85 | 0.86 † | 0.87 † | 0.82 † | 0.78 † | 0.81 † | 0.76 † | 0.82 † | 0.78 | 0.75 † | 0.75 | 0.77 † | 0.81 (−0.07) |
| all Predefined | 0.86 † | 0.84 † | 0.87 | 0.86 † | 0.83 † | 0.85 | 0.84 † | 0.81 † | 0.82 † | 0.77 | 0.83 † | 0.79 † | 0.76 † | 0.77 † | 0.78 † | 0.82 (−0.06) |
| all Dynamic | 0.90 † | 0.87 † | 0.88 † | 0.86 † | 0.89 † | 0.90 † | 0.87 † | 0.87 † | 0.86 † | 0.89 † | 0.87 † | 0.87 † | 0.88 † | 0.86 † | 0.86 † | 0.88 (0.00) |