Temporal Robustness of Large Language Models for Thematic Classification of UN General Assembly Debates

Mumtaz, Fatima; Rauf, Sadaf Abdul; Ishtiaq Nauman, Saadia; Abbas Malik, Muhammad Ghulam; Imran, Muhammad

doi:10.3390/info17060589

This is an early access version; the complete PDF, HTML, and XML versions will be available soon.

Open Access Article

by Fatima Mumtaz ¹, Sadaf Abdul Rauf ¹,²,*, Saadia Ishtiaq Nauman ³, Muhammad Ghulam Abbas Malik ⁴ and Muhammad Imran ⁵,⁶

¹ Speech and Language Processing Group, Fatima Jinnah Women University, Rawalpindi 46000, Pakistan

² Department of Computer Science, International Islamic University, Islamabad 44000, Pakistan

³ Department of Communication and Media Studies, Fatima Jinnah Women University, Rawalpindi 46000, Pakistan

⁴ Interdisciplinary Sustainable Systems (IS2) Group, College of Computer and Information Sciences, Prince Sultan University, Riyadh 12435, Saudi Arabia

⁵ Educational Research Lab, Prince Sultan University, Riyadh 12435, Saudi Arabia

⁶ Department of English, Khazar University, Baku AZ1096, Azerbaijan

* Author to whom correspondence should be addressed.

Information 2026, 17(6), 589; https://doi.org/10.3390/info17060589 (registering DOI)

Submission received: 11 May 2026 / Accepted: 28 May 2026 / Published: 12 June 2026

(This article belongs to the Section Artificial Intelligence)

Abstract

Thematic analysis of large-scale political discourse remains challenging because of semantic complexity, overlapping policy areas, and evolving diplomatic vocabulary. Although large language models (LLMs) offer promise for scalable thematic classification, their reliability in politically sensitive contexts requires systematic validation against expert human annotations. We evaluate LLM-based thematic classification of United Nations General Assembly (UNGA) speeches across a decade (2014–2023), using 7680 human-annotated themes mapped into 12 policy domains. Our results show that DeepSeek R1 achieves the highest accuracy (77%; F1 = 0.73), followed by ChatGPT, Gemini, and LLaMA, with strong performance in lexically stable domains but substantial degradation in semantically overlapping categories such as governance and international cooperation. A distinctive dimension of our work is a year-by-year temporal analysis, which shows that LLM performance varies strongly across years and that precision declines during periods of rhetorical transformation, including pandemic-related discussions and cooperation discourse shaped by the Russia–Ukraine conflict. By linking domain-level ambiguity and geopolitical shifts to temporal instability, this study introduces a dynamic robustness perspective for evaluating LLMs in computational political discourse analysis.
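The abstract reports a single accuracy and F1 per model and then tracks how performance shifts from year to year. A minimal sketch of that kind of per-year scoring is shown here, assuming each human-annotated theme is stored as a hypothetical (year, gold domain, predicted domain) triple and assuming macro-averaged F1 (the abstract does not state which averaging scheme was used). The function names and domain labels are hypothetical, and the toy records are illustrative only, not data from the study.

```python
from collections import defaultdict


def accuracy(gold, pred):
    """Fraction of themes whose predicted domain matches the human label."""
    if not gold:
        return 0.0
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred):
    """Unweighted mean of per-domain F1 over every domain seen in gold or pred.

    Assumption: macro averaging; the source does not specify the scheme.
    """
    labels = set(gold) | set(pred)
    if not labels:
        return 0.0
    total = 0.0
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        total += (2 * tp / denom) if denom else 0.0
    return total / len(labels)


def per_year_scores(records):
    """Group hypothetical (year, gold, pred) triples by year and score each year."""
    by_year = defaultdict(lambda: ([], []))
    for year, gold, pred in records:
        by_year[year][0].append(gold)
        by_year[year][1].append(pred)
    return {
        year: {"accuracy": accuracy(g, p), "macro_f1": macro_f1(g, p), "n": len(g)}
        for year, (g, p) in sorted(by_year.items())
    }


def year_over_year_change(scores, metric="macro_f1"):
    """Change in a metric between consecutive years (negative means a drop)."""
    years = sorted(scores)
    return {b: scores[b][metric] - scores[a][metric] for a, b in zip(years, years[1:])}


# Illustrative toy records only (hypothetical labels, not study data).
toy_records = [
    (2019, "health", "health"),
    (2019, "governance", "governance"),
    (2020, "health", "health"),
    (2020, "governance", "international cooperation"),
]

toy_scores = per_year_scores(toy_records)
print(toy_scores)
print(year_over_year_change(toy_scores))
```

In this toy case the 2020 confusion between governance and international cooperation halves accuracy and pulls macro-F1 down to one third, which is the kind of year-level dip a temporal robustness analysis is meant to surface. Scoring each year independently, rather than pooling all ten years, is what makes such shifts visible.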

Keywords: political text mining; zero-shot classification; human-in-the-loop annotation; cross-model performance evaluation; diplomatic discourse processing; large language model

MDPI and ACS Style

Mumtaz, F.; Rauf, S.A.; Ishtiaq Nauman, S.; Abbas Malik, M.G.; Imran, M. Temporal Robustness of Large Language Models for Thematic Classification of UN General Assembly Debates. Information 2026, 17, 589. https://doi.org/10.3390/info17060589

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.