Comparative Analysis of BERT and GPT for Classifying Crisis News with Sudan Conflict as an Example
Abstract
1. Introduction
2. Related Work
2.1. Text Classification in NLP
2.2. Text Classification by BERT
2.3. Text Classification by GPT
2.4. Current Problems and Contributions
3. Data Sources
4. Methodologies
4.1. Methods
4.1.1. BERT
4.1.2. GPT
4.2. Evaluation Metrics
4.2.1. BERT
4.2.2. GPT
- Accuracy was determined by checking whether the predicted set of categories exactly matched the ground-truth set for each article.
- Precision measured the proportion of correctly predicted categories among all predicted labels.
- Recall quantified the proportion of relevant categories successfully identified by the model.
- F1-score, the harmonic mean of precision and recall, provided a balanced assessment of classification performance.
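These set-based metrics can be illustrated with a short sketch (not the authors' code; `article_metrics` and `corpus_metrics` are hypothetical names), where each article's ground truth and prediction are treated as label sets and per-article scores are averaged over the corpus:

```python
def article_metrics(gold, pred):
    """Set-based metrics for one article: exact match, precision, recall, F1."""
    gold, pred = set(gold), set(pred)
    tp = len(gold & pred)                          # correctly predicted categories
    precision = tp / len(pred) if pred else 0.0    # correct among all predicted labels
    recall = tp / len(gold) if gold else 0.0       # ground-truth labels recovered
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return gold == pred, precision, recall, f1

def corpus_metrics(pairs):
    """Average per-article metrics over a list of (gold, pred) pairs."""
    rows = [article_metrics(g, p) for g, p in pairs]
    n = len(rows)
    exact = sum(r[0] for r in rows) / n            # exact-match accuracy
    prec, rec, f1 = (sum(r[i] for r in rows) / n for i in (1, 2, 3))
    return exact, prec, rec, f1
```

Under this scheme an article counts toward accuracy only when the predicted set matches the ground-truth set exactly, which is why accuracy is far lower than F1 in the multi-label results.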
5. Results
5.1. SL-MLG
5.2. MLC
5.3. K-Fold CV
6. Discussion
Comparison with Similar Studies
7. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
Abbreviation | Definition |
---|---|
LLM | Large language model |
ML | Machine learning |
DL | Deep learning |
NER | Named entity recognition |
GPT | Generative pre-trained Transformer |
BERT | Bidirectional encoder representations from Transformers |
RAG | Retrieval-augmented generation |
ICL | In-context learning |
NLP | Natural language processing |
MLC | Multi-label classification |
SL-MLG | Single-label from multi-label ground truth |
CV | Cross-validation |
HP | Hyperparameter |
References
- Croicu, M. Deep Active Learning for Data Mining from Conflict Text Corpora. arXiv 2024, arXiv:2402.01577. [Google Scholar]
- Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding. arXiv 2019, arXiv:1810.04805. [Google Scholar]
- Radford, A.; Narasimhan, K.; Salimans, T.; Sutskever, I. Improving Language Understanding by Generative Pre-Training. Available online: https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf (accessed on 3 July 2025).
- Wang, Z.; Masri, Y.; Malarvizhi, A.S.; Stover, T.; Ahmed, S.; Wong, D.; Jiang, Y.; Li, Y.; Bere, M.; Rothbart, D.; et al. Optimizing Context-Based Location Extraction by Tuning Open-Source LLMs with RAG. Int. J. Digit. Earth. [CrossRef]
- Wang, Z.; Chen, Y.; Li, Y.; Kakkar, D.; Guan, W.; Ji, W.; Cain, J.; Lan, H.; Sha, D.; Liu, Q.; et al. Public Opinions on COVID-19 Vaccines—A Spatiotemporal Perspective on Races and Topics Using a Bayesian-Based Method. Vaccines 2022, 10, 1486. [Google Scholar] [CrossRef]
- Li, Q.; Peng, H.; Li, J.; Xia, C.; Yang, R.; Sun, L.; Yu, P.S.; He, L. A Survey on Text Classification: From Shallow to Deep Learning. arXiv 2021, arXiv:2008.00364. [Google Scholar]
- Lavanya, P.; Sasikala, E. Deep Learning Techniques on Text Classification Using Natural Language Processing (NLP) In Social Healthcare Network: A Comprehensive Survey. In Proceedings of the 2021 3rd International Conference on Signal Processing and Communication (ICPSC), Coimbatore, India, 13–14 May 2021; pp. 603–609. [Google Scholar] [CrossRef]
- Wang, Z.; Pang, Y.; Lin, Y.; Zhu, X. Adaptable and Reliable Text Classification Using Large Language Models. arXiv 2024, arXiv:2405.10523. [Google Scholar]
- Chen, Y.; Li, Y.; Wang, Z.; Quintero, A.J.; Yang, C.; Ji, W. Rapid Perception of Public Opinion in Emergency Events through Social Media. Nat. Hazards Rev. 2022, 23, 04021066. [Google Scholar] [CrossRef]
- Sufi, F. Advances in Mathematical Models for AI-Based News Analytics. Mathematics 2024, 12, 3736. [Google Scholar] [CrossRef]
- Chen, X.; Cong, P.; Lv, S. A Long-Text Classification Method of Chinese News Based on BERT and CNN. IEEE Access 2022, 10, 34046–34057. [Google Scholar] [CrossRef]
- Fatemi, B.; Rabbi, F.; Opdahl, A.L. Evaluating the Effectiveness of GPT Large Language Model for News Classification in the IPTC News Ontology. IEEE Access 2023, 11, 145386–145394. [Google Scholar] [CrossRef]
- Wang, Y.; Qu, W.; Ye, X. Selecting Between BERT and GPT for Text Classification in Political Science Research. arXiv 2024, arXiv:2411.05050. [Google Scholar]
- Wang, Z.; Li, Y.; Wang, K.; Cain, J.; Salami, M.; Duffy, D.Q.; Little, M.M.; Yang, C. Adopting GPU Computing to Support DL-Based Earth Science Applications. Int. J. Digit. Earth 2023, 16, 2660–2680. [Google Scholar] [CrossRef]
- Jim, J.R.; Talukder, M.A.R.; Malakar, P.; Kabir, M.M.; Nur, K.; Mridha, M.F. Recent Advancements and Challenges of NLP-Based Sentiment Analysis: A State-of-the-Art Review. Nat. Lang. Process. J. 2024, 6, 100059. [Google Scholar] [CrossRef]
- Oyeyemi, D.A.; Ojo, A.K. SMS Spam Detection and Classification to Combat Abuse in Telephone Networks Using Natural Language Processing. J. Adv. Math. Comput. Sci. 2023, 38, 144–156. [Google Scholar] [CrossRef]
- Yu, L.; Liu, B.; Lin, Q.; Zhao, X.; Che, C. Semantic Similarity Matching for Patent Documents Using Ensemble BERT-Related Model and Novel Text Processing Method. arXiv 2024, arXiv:2401.06782. [Google Scholar]
- Zhang, L.; Wang, S.; Liu, B. Deep Learning for Sentiment Analysis: A Survey. arXiv 2018, arXiv:1801.07883. [Google Scholar] [CrossRef]
- Sebastiani, F. Machine Learning in Automated Text Categorization. ACM Comput. Surv. 2002, 34, 1–47. [Google Scholar] [CrossRef]
- Jiang, Y.; Li, Y.; Yang, C.; Liu, K.; Armstrong, E.M.; Huang, T.; Moroni, D.F.; Finch, C.J. A Comprehensive Methodology for Discovering Semantic Relationships among Geospatial Vocabularies Using Oceanographic Data Discovery as an Example. Int. J. Geogr. Inf. Sci. 2017, 31, 2310–2328. [Google Scholar] [CrossRef]
- Kim, Y. Convolutional Neural Networks for Sentence Classification. arXiv 2014, arXiv:1408.5882. [Google Scholar]
- Yu, M.; Huang, Q.; Qin, H.; Scheele, C.; Yang, C. Deep Learning for Real-Time Social Media Text Classification for Situation Awareness–Using Hurricanes Sandy, Harvey, and Irma as Case Studies. In Social Sensing and Big Data Computing for Disaster Management; Routledge: Oxfordshire, UK, 2020; pp. 33–50. [Google Scholar]
- Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef]
- Graves, A. Generating Sequences with Recurrent Neural Networks. arXiv 2014, arXiv:1308.0850. [Google Scholar]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention Is All You Need. In Advances in Neural Information Processing Systems; Curran Associates, Inc.: Red Hook, NY, USA, 2017; Volume 30. [Google Scholar]
- Patwardhan, N.; Marrone, S.; Sansone, C. Transformers in the Real World: A Survey on NLP Applications. Information 2023, 14, 242. [Google Scholar] [CrossRef]
- Ansar, W.; Goswami, S.; Chakrabarti, A. A Survey on Transformers in NLP with Focus on Efficiency. arXiv 2024, arXiv:2406.16893. [Google Scholar]
- Brown, T.B.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; et al. Language Models Are Few-Shot Learners. arXiv 2020, arXiv:2005.14165. [Google Scholar]
- Alaparthi, S.; Mishra, M. Bidirectional Encoder Representations from Transformers (BERT): A Sentiment Analysis Odyssey. arXiv 2020, arXiv:2007.01127. [Google Scholar]
- Yang, L.; Zhou, X.; Fan, J.; Xie, X.; Zhu, S. Can Bidirectional Encoder Become the Ultimate Winner for Downstream Applications of Foundation Models? arXiv 2024, arXiv:2411.18021. [Google Scholar]
- Limsopatham, N. Effectively Leveraging BERT for Legal Document Classification. In Natural Legal Language Processing Workshop 2021; Aletras, N., Androutsopoulos, I., Barrett, L., Goanta, C., Preotiuc-Pietro, D., Eds.; Association for Computational Linguistics: Punta Cana, Dominican Republic, 2021; pp. 210–216. [Google Scholar]
- Wu, J.; Qu, P.; Zhang, B.; Zhou, Z. Sentiment Analysis in Social Media: Leveraging BERT for Enhanced Accuracy. J. Ind. Eng. Appl. Sci. 2024, 2, 143–149. [Google Scholar] [CrossRef]
- Zaman-Khan, H.; Naeem, M.; Guarasci, R.; Bint-Khalid, U.; Esposito, M.; Gargiulo, F. Enhancing Text Classification Using BERT: A Transfer Learning Approach. Comput. Sist. 2024, 28, 2279–2295. [Google Scholar] [CrossRef]
- Chen, Y. A Study on News Headline Classification Based on BERT Modeling; Atlantis Press: Dordrecht, The Netherlands, 2024; pp. 345–355. [Google Scholar]
- Bedretdin, Ü. Supervised Multi-Class Text Classification for Media Research: Augmenting BERT with Topics and Structural Features. Available online: https://helda.helsinki.fi/items/f02c65c9-f449-4fc7-a4ac-2ad23d3cea93 (accessed on 24 June 2025).
- Shah, M.A.; Iqbal, M.J.; Noreen, N.; Ahmed, I. An Automated Text Document Classification Framework Using BERT. Int. J. Adv. Comput. Sci. Appl. 2023, 14, 279–285. [Google Scholar] [CrossRef]
- Petridis, C. Text Classification: Neural Networks VS Machine Learning Models VS Pre-Trained Models. arXiv 2024, arXiv:2412.21022. [Google Scholar]
- OpenAI; Achiam, J.; Adler, S.; Agarwal, S.; Ahmad, L.; Akkaya, I.; Aleman, F.L.; Almeida, D.; Altenschmidt, J.; Altman, S.; et al. GPT-4 Technical Report. arXiv 2024, arXiv:2303.08774. [Google Scholar]
- Mao, R.; Chen, G.; Zhang, X.; Guerin, F.; Cambria, E. GPTEval: A Survey on Assessments of ChatGPT and GPT-4. arXiv 2024, arXiv:2308.12488. [Google Scholar]
- Mu, Y.; Wu, B.P.; Thorne, W.; Robinson, A.; Aletras, N.; Scarton, C.; Bontcheva, K.; Song, X. Navigating Prompt Complexity for Zero-Shot Classification: A Study of Large Language Models in Computational Social Science. arXiv 2024, arXiv:2305.14310. [Google Scholar]
- Balkus, S.V.; Yan, D. Improving Short Text Classification with Augmented Data Using GPT-3. Nat. Lang. Eng. 2024, 30, 943–972. [Google Scholar] [CrossRef]
- Liu, P.; Yuan, W.; Fu, J.; Jiang, Z.; Hayashi, H.; Neubig, G. Pre-Train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing. ACM Comput. Surv. 2023, 55, 1–35. [Google Scholar] [CrossRef]
- Akiba, T.; Sano, S.; Yanase, T.; Ohta, T.; Koyama, M. Optuna: A Next-Generation Hyperparameter Optimization Framework. arXiv 2019, arXiv:1907.10902. [Google Scholar]
- Johnson, J.; Douze, M.; Jégou, H. Billion-Scale Similarity Search with GPUs. arXiv 2017, arXiv:1702.08734. [Google Scholar] [CrossRef]
- Brandt, P.T.; Alsarra, S.; D’Orazio, V.J.; Heintze, D.; Khan, L.; Meher, S.; Osorio, J.; Sianan, M. ConfliBERT: A Language Model for Political Conflict. arXiv 2024, arXiv:2412.15060. [Google Scholar]
Attribute | Example |
---|---|
Date | 27 September 2024 |
Incident Narrative | On September 27, continued fierce fighting between the two warring parties and their supporters for the control of El Fasher, North Darfur. Radio Dabanga reported that 19 people were killed and dozens injured by continued RSF artillery shelling on the city. The El Fasher livestock market was also reported hit by bombing. |
Ground Truth Labels | Military operations (battle, shelling), Indiscriminate use of weapons |
Category | Definition |
---|---|
Unlawful detention | Refers to the act of detaining or confining an individual without legal justification or due process. For example, if protesters are arrested and detained without any legal basis during peaceful demonstrations with no access to legal representation, this would be considered unlawful detention. |
Human trafficking | Refers to the act of recruiting, transporting, transferring, harboring, or receiving individuals through force, fraud, coercion, or other forms of deception for the purpose of exploitation. Exploitation can take many forms, including forced labor, sexual exploitation, slavery, servitude, or the removal of organs. It is considered a severe violation of human rights and is illegal under international and domestic laws. |
Enslavement | Refers to the act of exercising ownership or control over another person, treating them as property, and depriving them of their freedom. It often involves forcing individuals to perform labor or services under coercion, violence, or the threat of punishment. |
Willful killing of civilians | Refers to the intentional killing of civilians who are not directly participating in hostilities, with full knowledge of their noncombatant status. This includes acts like massacres, executions, or deliberate bombings of civilian sites such as homes, schools, or hospitals, where the clear intent is to cause death. For example, a military unit massacring the residents of a village. |
Mass execution | Refers to the deliberate killing of a large number of individuals, often carried out by state or non-state actors as part of systematic persecution, acts of war, or punitive measures. The victims are typically selected based on political, ethnic, religious, or social affiliations, and the killings are often premeditated and organized. |
Kidnapping | Refers to the unlawful and intentional abduction, confinement, or holding of an individual against their will, often for a specific purpose such as extortion, ransom, political leverage, forced labor, or exploitation. It is a serious crime and violates the individual’s right to freedom and security. |
Extrajudicial killing | Refers to the killing of a person without any legal process, such as arrest, trial, or sentencing. It is carried out outside the law, often by state agents or with their approval. |
Forced disappearance | Refers to the act of abducting or detaining a person against their will, followed by a refusal to disclose their fate or whereabouts. This leaves the victim outside the protection of the law and often causes anguish to their family and community. |
Damage or destruction of civilian critical infrastructure | Refers to the reckless harm, sabotage, or destruction of essential facilities, systems, or services necessary for the well-being, safety, and survival of civilian populations. This includes infrastructure such as hospitals, water supplies, power grids, schools, transportation systems, and communication networks. |
Damage or destruction, looting, or theft of cultural heritage | Refers to the harm, removal, or appropriation of culturally significant sites, objects, or artifacts during conflicts, disasters, or other destabilizing events. These acts violate international laws that protect cultural heritage as part of humanity’s shared history and identity. This category also includes looting incidents. |
Military operations (battle, shelling) | Refers to actions explicitly conducted between opposing armed forces, such as the RSF and SAF, during a conflict or war. These actions involve the use of weapons, strategies, and tactics to achieve military objectives, focusing on direct engagements or operations targeting enemy positions. Narratives mentioning attacks on civilian areas or indiscriminate shelling are not included in this category, even if long-range weapons or artillery are used. |
Gender-based or other conflict-related sexual violence | Refers to acts of sexual violence committed during or as a result of armed conflict, often targeting individuals based on their gender, identity, or perceived vulnerability. Incidents such as rape or sexual harassment are considered gender-based or other conflict-related sexual violence. |
Violent crackdowns on protesters/opponents/civil rights abuse | Refers to the use of excessive or unlawful force to suppress dissent or silence opposition. These acts often involve targeting individuals or groups engaging in protests, political opposition, or advocacy for civil rights.
Indiscriminate use of weapons | Refers to the use of weapons, such as shelling or bombing, in a manner that impacts buildings, neighborhoods, or areas without clear differentiation between combatants and civilians, or military and civilian infrastructure. This category applies only to incidents involving the use of explosives or long-range weapons that cause widespread harm or destruction, regardless of whether brute force or a massacre is involved, unless explicitly mentioned.
Torture or indications of torture | Refers to the infliction of severe physical or psychological pain and suffering on a person, typically to punish, intimidate, extract information, or coerce. |
Persecution based on political, racial, ethnic, gender, or sexual orientation | Refers to the systematic mistreatment, harassment, or oppression of individuals or groups due to their political beliefs, race, ethnicity, gender identity, or sexual orientation. |
Movement of military, paramilitary, or other troops and equipment | Refers to the deployment, transfer, or relocation of armed forces, armed groups, or their equipment as part of strategic or operational objectives. This movement may occur during preparation for conflict, active military operations, or in maintaining a presence in certain areas. |
Category | Train | Validation | Test |
---|---|---|---|
Unlawful Detention | 7 | 4 | 7 |
Human trafficking | 0 | 0 | 0 |
Enslavement | 1 | 0 | 0 |
Willful killing of civilians | 138 | 24 | 21 |
Mass execution | 14 | 2 | 1 |
Kidnapping | 19 | 2 | 3 |
Extrajudicial killing | 42 | 5 | 4 |
Forced disappearance | 13 | 1 | 1 |
Damage/destruction of civilian infrastructure | 78 | 12 | 9 |
Damage/destruction/looting of cultural heritage | 5 | 1 | 1 |
Military operations (battle, shelling) | 193 | 37 | 27 |
Gender-based or other conflict-related sexual violence | 3 | 0 | 0 |
Violent crackdowns on protesters/opponents | 25 | 6 | 6 |
Indiscriminate use of weapons | 75 | 16 | 8 |
Torture or indications of torture | 15 | 3 | 1 |
Persecution (political, racial, etc.) | 6 | 1 | 1 |
Movement of military/equipment | 11 | 1 | 0 |
Total Occurrences | 645 | 115 | 90 |
Total Rows | 323 | 57 | 43 |
MODEL | TUNING | TRAINING | INFERENCE | ACC.
---|---|---|---|---
RAG ICL | | | |
Gemma2-9b | — | — | 1 m 14 s | 79.07%
Gemma2-27b | — | — | 1 m 51 s | 86.05%
Llama3.3-70b | — | — | 3 m 12 s | 90.70%
Llama3.2-3b | — | — | 3 m 29 s | 9.30%
Llama3.1-70b | — | — | 3 m 14 s | 90.70%
Llama3.1-7b | — | — | 1 m 3 s | 86.05%
Mistral-7b | — | — | 1 m 8 s | 79.07%
RAG | | | |
Gemma2-9b | — | — | 1 m 9 s | 58.14%
Gemma2-27b | — | — | 1 m 21 s | 69.77%
Llama3.3-70b | — | — | 1 m 44 s | 86.05%
Llama3.2-3b | — | — | 1 m | 44.19%
Llama3.1-70b | — | — | 1 m 45 s | 81.40%
Llama3.1-7b | — | — | 1 m 49 s | 65.12%
Mistral-7b | — | — | 1 m | 62.79%
Zero-Shot | | | |
Gemma2-9b | — | — | 20 s | 65.12%
Gemma2-27b | — | — | 32 s | 67.44%
Llama3.3-70b | — | — | 40 s | 76.74%
Llama3.2-3b | — | — | 9 s | 41.86%
Llama3.1-70b | — | — | 49 s | 72.09%
Llama3.1-7b | — | — | 12 s | 48.84%
Mistral-7b | — | — | 12 s | 51.16%
BERT hp-tuned | | | |
bert-base-uncased | 137 m 57 s | 7 m 20 s | 1 s | 74.42%
bert-large-uncased | 1170 m 9 s | 73 m 19 s | 20 s | 83.72%
BERT | | | |
bert-base-uncased | — | 10 m 4 s | 1 s | 76.74%
bert-large-uncased | — | 28 m 29 s | 2 s | 74.42%
MODEL | TUNING | TRAINING | INFERENCE | ACC. | PREC. | REC. | F1
---|---|---|---|---|---|---|---
RAG ICL | | | | | | |
Gemma2-9b | — | — | 1 m 17 s | 13.95% | 65.89% | 48.26% | 53.03%
Gemma2-27b | — | — | 1 m 54 s | 18.60% | 77.52% | 56.40% | 62.21%
Llama3.3-70b | — | — | 3 m 56 s | 6.98% | 59.34% | 77.13% | 64.42%
Llama3.2-3b | — | — | 1 m 42 s | 0.00% | 26.87% | 84.50% | 36.36%
Llama3.1-70b | — | — | 3 m 30 s | 25.58% | 78.49% | 63.37% | 67.00%
Llama3.1-7b | — | — | 1 m 22 s | 6.98% | 49.08% | 67.05% | 50.95%
Mistral-7b | — | — | 1 m 18 s | 11.63% | 67.44% | 50.39% | 53.88%
RAG | | | | | | |
Gemma2-9b | — | — | 1 m 12 s | 9.30% | 61.24% | 63.76% | 59.35%
Gemma2-27b | — | — | 1 m 40 s | 9.30% | 61.05% | 61.43% | 57.88%
Llama3.3-70b | — | — | 2 m 36 s | 11.63% | 62.02% | 80.62% | 67.30%
Llama3.2-3b | — | — | 1 m 6 s | 2.33% | 2.33% | 2.33% | 2.33%
Llama3.1-70b | — | — | 2 m 30 s | 16.28% | 70.74% | 73.84% | 68.80%
Llama3.1-7b | — | — | 1 m 12 s | 13.95% | 51.16% | 72.29% | 57.22%
Mistral-7b | — | — | 1 m 8 s | 9.30% | 53.91% | 59.88% | 52.04%
Zero-Shot | | | | | | |
Gemma2-9b | — | — | 32 s | 11.63% | 55.04% | 57.56% | 53.48%
Gemma2-27b | — | — | 48 s | 20.93% | 59.69% | 55.81% | 53.22%
Llama3.3-70b | — | — | 1 m 31 s | 9.30% | 62.95% | 71.51% | 62.75%
Llama3.2-3b | — | — | 22 s | 6.98% | 37.91% | 43.22% | 37.69%
Llama3.1-70b | — | — | 1 m 21 s | 16.23% | 66.05% | 63.37% | 60.79%
Llama3.1-7b | — | — | 25 s | 11.63% | 46.85% | 52.91% | 45.52%
Mistral-7b | — | — | 28 s | 6.98% | 41.30% | 50.19% | 41.73%
BERT hp-tuned | | | | | | |
bert-base-uncased | 181 m 51 s | 5 m 54 s | 18 s | 34.88% | 72.61% | 57.78% | 63.01%
bert-large-uncased | 1190 m 22 s | 160 m 33 s | 19 s | 41.86% | 75.70% | 64.44% | 68.90%
BERT | | | | | | |
bert-base-uncased | — | 9 m 41 s | 2 s | 30.23% | 71.18% | 57.78% | 62.85%
bert-large-uncased | — | 14 m 54 s | 1 s | 27.91% | 64.32% | 52.22% | 54.62%
Attribute | Value
---|---
Incident Narrative | Clashes between the Sudanese Armed Forces (SAF) and its paramilitary counterpart, the Rapid Support Forces (RSF), continue in various parts of the country. On April 18, artillery shelling led to the death of a person in El Obeid, the capital of North Kordofan.
Ground Truth | Indiscriminate use of weapons, Willful killing of civilians, Military operations (battle, shelling) |
LLM Output | Willful killing of civilians, Military operations (battle, shelling), Indiscriminate use of weapons |
ACC. | 100% |
PREC. | 100% |
REC. | 100% |
F1 | 100% |
TP | Military operations (battle, shelling); Willful killing of civilians; Indiscriminate use of weapons |
FP | N/A * |
FN | N/A * |
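The TP/FP/FN breakdown in the example above follows directly from set operations on the label sets; a minimal self-contained illustration (labels copied from the table, not the authors' code):

```python
gold = {"Indiscriminate use of weapons",
        "Willful killing of civilians",
        "Military operations (battle, shelling)"}
pred = {"Willful killing of civilians",
        "Military operations (battle, shelling)",
        "Indiscriminate use of weapons"}

tp = gold & pred   # categories both predicted and in the ground truth
fp = pred - gold   # categories predicted but absent from the ground truth
fn = gold - pred   # ground-truth categories the model missed

# All three labels match, so precision and recall are both 100%.
precision = len(tp) / len(pred)
recall = len(tp) / len(gold)
```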
MODEL | TASK | ACC. ± SD | F1 ± SD | PREC. ± SD | REC. ± SD | AVG. FOLD TIME |
---|---|---|---|---|---|---|
bert-base-uncased | MLC | 27.90 ± 2.90% | 56.94 ± 4.76% | 59.50 ± 7.62% | 57.29 ± 4.37% | 9 m 16 s |
bert-large-uncased | MLC | 39.24 ± 4.01% | 70.35 ± 2.12% | 76.85 ± 4.52% | 69.53 ± 2.20% | 30 m 7 s |
bert-base-uncased | SL-MLG | 77.10 ± 5.95% | — | — | — | 9 m 23 s |
bert-large-uncased | SL-MLG | 85.83 ± 3.53% | — | — | — | 31 m 17 s |
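The mean ± SD figures above aggregate per-fold scores; the sketch below shows one way such k-fold splitting and aggregation could be done (illustrative only — `kfold_indices` uses simple interleaved folds, which is an assumption, not the study's actual split):

```python
import statistics

def kfold_indices(n_samples, k):
    """Yield (train, test) index lists for k roughly equal interleaved folds."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

def summarize(fold_scores):
    """Mean and sample standard deviation across folds, as reported in the table."""
    return statistics.mean(fold_scores), statistics.stdev(fold_scores)
```

For example, fold F1 scores of 0.7, 0.8, and 0.9 summarize to a mean of 0.8 with a sample SD of 0.1.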
METHOD | MODEL | TUNING | TRAINING | INFERENCE | ACC.
---|---|---|---|---|---
RAG ICL | Llama3.3-70b | — | — | 3 m 12 s | 90.70%
RAG ICL | Llama3.1-70b | — | — | 3 m 14 s | 90.70%
RAG | Llama3.3-70b | — | — | 1 m 44 s | 86.05%
Zero-Shot | Llama3.3-70b | — | — | 40 s | 76.74%
BERT hp-tuned | bert-base-uncased | 137 m 57 s | 7 m 20 s | 1 s | 74.42%
BERT hp-tuned | bert-large-uncased | 1170 m 9 s | 73 m 19 s | 20 s | 83.72%
BERT | bert-base-uncased | — | 10 m 4 s | 1 s | 76.74%
BERT | bert-large-uncased | — | 28 m 29 s | 2 s | 74.42%
METHOD | MODEL | TUNING | TRAINING | INFERENCE | F1
---|---|---|---|---|---
RAG ICL | Llama3.1-70b | — | — | 3 m 30 s | 67.00%
RAG | Llama3.1-70b | — | — | 2 m 30 s | 68.80%
Zero-Shot | Llama3.3-70b | — | — | 1 m 31 s | 62.75%
BERT hp-tuned | bert-base-uncased | 181 m 51 s | 5 m 54 s | 18 s | 63.01%
BERT hp-tuned | bert-large-uncased | 1190 m 22 s | 160 m 33 s | 19 s | 68.90%
BERT | bert-base-uncased | — | 9 m 41 s | 2 s | 62.85%
BERT | bert-large-uncased | — | 14 m 54 s | 1 s | 54.62%
Study | Model(s) Used | Dataset | Task | Key Findings | Limitations |
---|---|---|---|---|---|
Sufi (2024) [10] | GPT-3.5-Turbo, CNN | 1M+ news articles over 405 days from 100+ sources | Categorization, correlation analysis, anomaly detection | 90.67% classified into 202 subcategories; F1-score: 0.921; 85% anomaly detection sensitivity; effective use of GPT embeddings and knowledge graphs. | Fixed thresholds limit adaptability; taxonomy gaps excluded some topics. |
Chen et al. (2022) [11] | BERT + CNN | Chinese news from Toutiao (approx. 240k+ samples) | Long-text classification (headline + body) | Combining BERT and CNN outperformed baselines (SVM, GRU, and BERT alone); improved handling of long-text context and feature extraction. | Used only one LLM; no comparison with other models tested; only explored Zero-Shot prompting. |
Fatemi et al. (2023) [12] | GPT-3.5-Turbo (Zero-Shot, hierarchical prompt) | 4.7 k English news articles, 17 top/51 sub IPTC topics | Multi-class news tagging | average F1: 80%; BERT-based fine-tuned models beat ML and BERT baselines; clustering metrics confirmed label quality. | Some articles dropped (token cap); Level 2 hurt by class imbalance; still minor sub-category hallucinations. |
Brandt et al. (2024) [45] | ConfliBERT (domain-specific BERT) compared to LLaMA 3.1 (7B), Gemma 2 (9B), Qwen 2.5 (14B) | BBC, re3d, GTD (37k conflict events) | Relevance classification, actor/action extraction, NER | Outperformed larger LLMs in accuracy and macro F1 (~0.79); 100–300× faster than 7–14B models | Does not leverage prompting strategies or retrieval augmentation. |
Wang et al. (2024) [13] | RoBERTa-large (a fine-tuned BERT model) and GPT-4o using Zero-Shot and few-shot prompting. | Multiple political text datasets (e.g., Sentiment News, Party Manifestos, Parliamentary Speeches, COVID-19 Policies, SOTU Speeches) | Text classification (binary, 8-, 20-, and 22-class) | Fine-tuned BERT outperforms GPT as data increases, but GPT is competitive in low-data or simple tasks, with 2-shot prompting sometimes matching BERT with 1000 samples. | Only GPT-4o used; does not explore retrieval augmentation strategies. |
Masri et al. (2025) [This study] | BERT (standard and hyperparameter-tuned), GPT-based LLMs (Llama3.3-70B, Llama3.1-70B, Gemma2-27B, Mistral-7B, etc.) with Zero-Shot, RAG, and RAG+ICL strategies | 4234 conflict-related articles on the Sudan Conflict (January–November 2024), expert-annotated with 17 overlapping categories | Multi-Label and Single-Label Classification of crisis news using SL-MLG and MLC frameworks | BERT models (esp. HP-tuned BERT-large model) achieved the highest MLC F1-score (68.90%); RAG ICL yielded the best SL-MLG accuracy (90.70%). BERT showed higher precision and SL-MLG accuracy with lower recall. Larger LLMs benefited from contextual prompting and outperformed smaller models. | Dataset was small, imbalanced, and domain-specific; BERT required costly tuning; GPT performance varied across prompt designs; classification impacted by overlapping categories and model refusal behavior without explicit context. |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Masri, Y.; Wang, Z.; Srirenganathan Malarvizhi, A.; Ahmed, S.; Stover, T.; Wong, D.W.S.; Jiang, Y.; Li, Y.; Liu, Q.; Bere, M.; et al. Comparative Analysis of BERT and GPT for Classifying Crisis News with Sudan Conflict as an Example. Algorithms 2025, 18, 420. https://doi.org/10.3390/a18070420