Open Access Article
Temporal Robustness of Large Language Models for Thematic Classification of UN General Assembly Debates
by Fatima Mumtaz 1, Sadaf Abdul Rauf 1,2,*, Saadia Ishtiaq Nauman 3, Muhammad Ghulam Abbas Malik 4 and Muhammad Imran 5,6
1 Speech and Language Processing Group, Fatima Jinnah Women University, Rawalpindi 46000, Pakistan
2 Department of Computer Science, International Islamic University, Islamabad 44000, Pakistan
3 Department of Communication and Media Studies, Fatima Jinnah Women University, Rawalpindi 46000, Pakistan
4 Interdisciplinary Sustainable Systems (IS2) Group, College of Computer and Information Sciences, Prince Sultan University, Riyadh 12435, Saudi Arabia
5 Educational Research Lab, Prince Sultan University, Riyadh 12435, Saudi Arabia
6 Department of English, Khazar University, Baku AZ1096, Azerbaijan
* Author to whom correspondence should be addressed.
Information 2026, 17(6), 589; https://doi.org/10.3390/info17060589 (registering DOI)
Submission received: 11 May 2026 / Accepted: 28 May 2026 / Published: 12 June 2026
Abstract
Thematic analysis of large-scale political discourse remains a challenge due to semantic complexity, overlapping policy areas, and changing diplomatic vocabulary. Although large language models (LLMs) offer promise for scalable thematic classification, their reliability in politically sensitive contexts requires systematic validation against expert human annotations. We evaluate LLM-based thematic classification of United Nations General Assembly (UNGA) speeches across a decade (2014–2023), using 7680 human-annotated themes mapped into 12 policy domains. Our results show that DeepSeek R1 achieves the highest accuracy, 77% (F1 = 0.73), followed by ChatGPT, Gemini, and LLaMA, with strong performance in lexically stable domains but substantial degradation in semantically overlapping categories such as governance and international cooperation. A distinctive dimension of our work is a timeline analysis, which shows that LLM performance varies strongly from year to year and that precision decreases during periods of rhetorical transformation, including pandemic-related discussions and cooperation discourse shaped by the Russia–Ukraine conflict. By linking domain-level ambiguity and geopolitical shifts to temporal instability, this study introduces a dynamic robustness perspective for evaluating LLMs in computational political discourse analysis.
Share and Cite
MDPI and ACS Style
Mumtaz, F.; Rauf, S.A.; Ishtiaq Nauman, S.; Abbas Malik, M.G.; Imran, M. Temporal Robustness of Large Language Models for Thematic Classification of UN General Assembly Debates. Information 2026, 17, 589. https://doi.org/10.3390/info17060589
Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.