A Systematic Review and Experimental Evaluation of Classical and Transformer-Based Models for Urdu Abstractive Text Summarization
Abstract
1. Introduction
- We establish the first multi-dataset evaluation framework for Urdu abstractive summarization, combining UrSum, Fake News, and Urdu-Instruct datasets for robust cross-domain assessment.
- A novel text normalization pipeline addresses Urdu’s orthographic challenges through Unicode standardization and diacritic filtering (a minimal sketch follows this list).
- Our right-to-left optimized architecture introduces directional-aware tokenization and embeddings, preserving Urdu’s native reading order in transformer models.
- Comprehensive benchmarking reveals that fine-tuned monolingual transformers (BERT-Urdu, BART-Urdu) outperform multilingual models (mT5) and classical approaches by 12–18% in ROUGE scores.
- A hybrid training framework combining cross-entropy with ROUGE-based reinforcement improves both content coverage and linguistic coherence.
- We demonstrate that relative improvement metrics over Seq2Seq baselines provide more reliable cross-dataset comparisons than absolute scores in low-resource settings.
- Diagnostic analyses quantify the cumulative impact of Urdu-specific adaptations, offering guidelines for NLP development in low-resource and morphologically rich languages.
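Below is a minimal sketch of the kind of normalization step referred to above, assuming Unicode NFC composition, folding of a few Arabic code points onto their Urdu counterparts, and removal of Arabic-script diacritics (harakat). The specific character map is illustrative rather than the exact ruleset used in this work.

```python
import re
import unicodedata

# Arabic-script combining marks (zabar, zer, pesh, tashdid, sukun, etc.)
# in U+064B-U+0652, plus the superscript alef U+0670.
DIACRITICS = re.compile("[\u064B-\u0652\u0670]")

# Illustrative character unification map: Arabic code points folded onto
# their Urdu (extended Arabic-script) counterparts.
CHAR_MAP = {
    "\u064A": "\u06CC",  # Arabic yeh  -> Urdu choti yeh
    "\u0643": "\u06A9",  # Arabic kaf  -> Urdu keheh
    "\u0629": "\u06C3",  # teh marbuta -> teh marbuta goal
}

def normalize_urdu(text: str) -> str:
    """Unicode-normalize, unify confusable code points, and strip diacritics."""
    text = unicodedata.normalize("NFC", text)      # canonical composition
    for src, dst in CHAR_MAP.items():              # fold Arabic variants
        text = text.replace(src, dst)
    text = DIACRITICS.sub("", text)                # remove harakat
    return re.sub(r"\s+", " ", text).strip()       # collapse whitespace

print(normalize_urdu("کِتاب"))  # -> "کتاب" (zer removed)
```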
2. Related Work
3. Materials and Methods
Algorithm 1: Comprehensive Urdu Abstractive Text Summarization (ATS) Evaluation
3.1. Preprocessing for Urdu Text Summarization
3.1.1. Unicode Normalization
3.1.2. Diacritic Filtering
3.1.3. Chunk Construction (512-Token Limit)
3.1.4. RTL-Aware Chunking
3.1.5. Stratified Splitting
3.1.6. Dataset Characteristics and Selection Rationale
3.1.7. Model Configuration
3.1.8. Urdu Tokenizers
3.1.9. RTL Embeddings
3.1.10. Length Constraints
3.1.11. Model Selection (Transformer and Baseline)
- BERT: BERT is based on the encoder component of the original transformer architecture. Its bidirectional attention mechanism interprets each word by examining both its preceding and following words. BERT consists of multiple stacked transformer layers, each with its own attention heads and feedforward network. This bidirectional design lets BERT weigh a target word’s left and right contexts within a sentence, giving it a deeper understanding of the text. BERT is pre-trained with Masked Language Modeling (MLM) and Next Sentence Prediction (NSP). MLM teaches the model to use context by randomly masking some tokens in a sentence and training it to predict them (a minimal sketch follows), while NSP helps the model learn the relationship between two sentences, which is useful for tasks such as question answering [46]. BERT’s encoder-only design is well suited to natural language understanding tasks such as named entity recognition and text classification.
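As an illustration of the MLM objective, the sketch below uses the HuggingFace fill-mask pipeline with a multilingual BERT checkpoint; the checkpoint name is an assumption for demonstration and is not the BERT-Urdu model evaluated in this work.

```python
from transformers import pipeline

# Masked Language Modeling demo: the model must recover the hidden token
# from both its left and right context (bidirectional attention).
fill_mask = pipeline("fill-mask", model="bert-base-multilingual-cased")

# "The capital of Pakistan is [MASK]." (Urdu)
for pred in fill_mask("پاکستان کا دارالحکومت [MASK] ہے۔", top_k=3):
    print(pred["token_str"], round(pred["score"], 3))
```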
- GPT-2: GPT-2 is built from the decoder portion of the transformer architecture, making it a generative model designed to produce coherent text in response to a prompt. Whereas BERT uses bidirectional attention, GPT-2 uses unidirectional attention: each word can only attend to the words that precede it, which makes the model well suited to autoregressive tasks in which words are generated one at a time conditioned on the preceding output. GPT-2 stacks multiple decoder layers, each with self-attention and a feedforward network; these decoders predict the next token in a sequence from the previous tokens, allowing GPT-2 to produce human-like language efficiently. The model is pre-trained with Causal Language Modeling (CLM), i.e., predicting the next word in a sequence. GPT-2’s architecture handles long contexts and is optimized for text-generation tasks such as story writing, dialog generation, and text completion [57].
- mT5: The multilingual T5 (mT5) model’s architecture is particularly well suited to the challenges of Urdu abstractive summarization. Its encoder–decoder framework, pre-trained on a corpus spanning over 100 languages, inherently benefits from cross-lingual transfer learning. Languages morphologically similar to Urdu, such as Arabic, Persian, and Hindi, are present in the pre-training data, allowing mT5 to bootstrap its understanding of Urdu’s complex feature system. We leverage several key features of mT5 for Urdu [43,58]; a brief tokenization sketch follows the list below.
- SentencePiece Tokenization: mT5’s subword tokenization algorithm effectively handles Urdu’s rich morphology and frequent compounding (e.g., verb–noun complexes). It learns to segment words into meaningful morphemes, significantly reducing the vocabulary sparsity and out-of-vocabulary issues that plague word-level models.
- Span Corruption Pre-training: The pre-training objective of corrupting spans of text and learning to reconstruct them is highly effective for summarization, a task that requires rewriting and condensing large sections of text. This encourages the model to develop robust semantic understanding beyond mere word matching.
- Explicit RTL Handling: While mT5 is multilingual, we enhance its innate capability by explicitly incorporating Right-to-Left (RTL) positional embeddings during fine-tuning. This ensures the model respects the native reading order of the Urdu script, improving coherence and the flow of generated summaries.
- Transfer from Related Languages: The model’s parameters, already tuned on languages with similar syntactic structures and lexical overlap (e.g., SOV word order, Arabic loanwords), provide a substantial prior, accelerating convergence and improving performance on low-resource Urdu tasks compared to models trained from scratch or on English-only data.
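To make the SentencePiece behaviour concrete, the sketch below prints the subword pieces that a public mT5 tokenizer produces for a few morphologically rich Urdu words; the google/mt5-small checkpoint is assumed purely for illustration.

```python
from transformers import AutoTokenizer

# Inspect how mT5's SentencePiece vocabulary segments Urdu words into subwords.
tok = AutoTokenizer.from_pretrained("google/mt5-small")

for word in ["خوبصورتی", "یونیورسٹیوں", "لکھاریوں"]:  # morphologically rich forms
    print(word, "->", tok.tokenize(word))             # stem piece plus suffix pieces
```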
- BART: By combining encoder and decoder components, BART brings together the strengths of BERT and GPT-2. The encoder is bidirectional, considering both past and future tokens like BERT, which allows it to capture the complete context of the input text. The decoder, in contrast, is autoregressive (AR) like GPT-2, generating text sequentially one token at a time. BART is pre-trained as a denoising autoencoder: the input sequence is corrupted (for instance, by shuffling or masking tokens) and the model learns to reconstruct the original text. Because it fully comprehends the input text and produces fluent output, this training method enables BART to generate accurate and concise summaries. The combination of bidirectional understanding (in the encoder) and AR generation (in the decoder) makes BART highly versatile across tasks, including text generation, machine translation, and summarization [44].
- Encoder–Decoder Architectures: BART and mT5 are configured as encoder–decoder networks, while BERT-Urdu (a bidirectional encoder) is adapted using its [CLS] representation with a lightweight decoder head. GPT-2 (a left-to-right decoder-only model) is fine-tuned by prefixing input articles with a special token and having it autoregressively generate the summary. In all cases, we experiment with beam search decoding and a tuned maximum output length based on average summary length.
- Decoder Length Tuning: We set the maximum decoder output length to cover typical Urdu summary sizes. Preliminary dataset analysis shows summaries average 50–100 tokens, so decoders are capped at, e.g., 128 tokens, which prevents overgeneration while allowing sufficient length. We also enable a length penalty and early stopping in beam search to discourage excessively short or repetitive outputs; a minimal decoding sketch follows.
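A minimal decoding sketch consistent with these settings is shown below, using the HuggingFace generate API; the checkpoint name, the input placeholder, and the exact penalty values are assumptions for illustration rather than the trained configuration.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Decoding configuration mirroring the constraints described above:
# beam search, a 128-token cap, a length penalty, and early stopping.
name = "google/mt5-small"          # placeholder; a fine-tuned Urdu checkpoint in practice
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForSeq2SeqLM.from_pretrained(name)

article = "یہاں اردو خبر کا متن آتا ہے۔"   # placeholder Urdu article text
inputs = tok(article, return_tensors="pt", truncation=True, max_length=512)

summary_ids = model.generate(
    **inputs,
    num_beams=4,                   # beam search
    max_length=128,                # cap matching typical Urdu summary lengths
    length_penalty=1.2,            # discourage overly short outputs
    no_repeat_ngram_size=3,        # reduce repetition
    early_stopping=True,
)
print(tok.decode(summary_ids[0], skip_special_tokens=True))
```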
- Baseline Models: To establish a meaningful benchmark for assessing the effectiveness of transformer-based summarization models, we developed three traditional baselines, each exemplifying a distinct summarization strategy: extractive, basic neural abstractive, and enhanced neural abstractive with memory features. These baselines are particularly valuable for low-resource languages such as Urdu, where data scarcity may limit the effectiveness of large-scale pretrained models.
- TF-IDF Extractive Model: A non-neural approach to extractive summarization, the Term Frequency–Inverse Document Frequency (TF-IDF) model scores and selects the most relevant sentences from the source text according to word significance. Words that are rare in the wider corpus but frequent within a particular document receive higher scores. Sentences are ranked by the sum of their constituent words’ TF-IDF scores, and the highest-ranked sentences form the summary (a minimal sketch follows). Although it does not generate new sentences, TF-IDF serves as a strong lexical-matching benchmark; its computational efficiency and language independence make it a useful control model, particularly when quantifying the gains brought by more complex neural architectures.
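A minimal sketch of this baseline is given below, using scikit-learn’s TfidfVectorizer and splitting sentences on the Urdu full stop (۔); the sentence splitter and the number of selected sentences are simplifying assumptions.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

def tfidf_extract(text: str, k: int = 3) -> str:
    """Rank sentences by the sum of their TF-IDF term weights; keep the top k."""
    sentences = [s.strip() for s in text.split("۔") if s.strip()]  # Urdu full stop
    if len(sentences) <= k:
        return text
    tfidf = TfidfVectorizer().fit_transform(sentences)   # sentence-term matrix
    scores = np.asarray(tfidf.sum(axis=1)).ravel()       # per-sentence weight sum
    top = sorted(np.argsort(scores)[-k:])                # keep original sentence order
    return "۔ ".join(sentences[i] for i in top) + "۔"
```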
- Seq2Seq Model: The Sequence-to-Sequence (Seq2Seq) model is a fundamental neural architecture for abstractive summarization. It uses a single-layer Recurrent Neural Network (RNN) as both encoder and decoder: the encoder converts the input Urdu text into a fixed-size context vector, which the decoder then uses to generate the summary one token at a time. To improve focus and relevance, we incorporate Bahdanau attention, allowing the decoder to attend to different segments of the input sequence at each generation step (a minimal sketch of the attention module follows). This design captures relationships within the sequences and enables the model to rephrase or rearrange content, a vital element of abstractive summarization. Nevertheless, the traditional Seq2Seq model struggles with long-range dependencies, so it serves as a lower-bound benchmark in our experiments.
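A minimal PyTorch sketch of additive (Bahdanau) attention as used by this baseline is shown below; layer sizes and names are illustrative, not the exact configuration trained in this work.

```python
import torch
import torch.nn as nn

class BahdanauAttention(nn.Module):
    """Additive attention: score(s, h_i) = v^T tanh(W_s s + W_h h_i)."""
    def __init__(self, dec_dim: int, enc_dim: int, attn_dim: int = 256):
        super().__init__()
        self.W_s = nn.Linear(dec_dim, attn_dim)
        self.W_h = nn.Linear(enc_dim, attn_dim)
        self.v = nn.Linear(attn_dim, 1, bias=False)

    def forward(self, dec_state, enc_outputs):
        # dec_state: (batch, dec_dim); enc_outputs: (batch, src_len, enc_dim)
        scores = self.v(torch.tanh(
            self.W_s(dec_state).unsqueeze(1) + self.W_h(enc_outputs)
        )).squeeze(-1)                                   # (batch, src_len)
        weights = torch.softmax(scores, dim=-1)          # attention distribution
        context = torch.bmm(weights.unsqueeze(1), enc_outputs).squeeze(1)
        return context, weights                          # context: (batch, enc_dim)
```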
- LSTM-Based Encoder–Decoder Model: To overcome the limitations of conventional RNNs, we pair a Bidirectional Long Short-Term Memory (Bi-LSTM) encoder with a unidirectional LSTM decoder. LSTM units are explicitly designed to mitigate vanishing gradients, allowing better retention of long-term dependencies and contextual detail. The bidirectional encoder processes the Urdu input in both forward and backward directions, capturing contextual cues from both sides of the sequence, and the decoder generates the summary with Bahdanau attention, focusing selectively on different parts of the input at each time step (an encoder sketch follows). This model balances computational cost and effectiveness, providing a more expressive baseline that produces coherent and somewhat abstractive summaries without requiring extensive training resources.
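The encoder side of this baseline can be sketched as follows; it returns the per-token states consumed by the attention module above and a projected final state for initializing the unidirectional decoder. Dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class BiLSTMEncoder(nn.Module):
    """Bidirectional LSTM encoder for the baseline encoder-decoder."""
    def __init__(self, vocab_size: int, emb_dim: int = 256, hid_dim: int = 256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.lstm = nn.LSTM(emb_dim, hid_dim, batch_first=True, bidirectional=True)
        self.bridge = nn.Linear(2 * hid_dim, hid_dim)   # fuse fwd/bwd final states

    def forward(self, token_ids):                        # token_ids: (batch, src_len)
        outputs, (h, _) = self.lstm(self.embed(token_ids))
        # h: (2, batch, hid_dim) -> concatenate directions, then project
        dec_init = torch.tanh(self.bridge(torch.cat([h[0], h[1]], dim=-1)))
        return outputs, dec_init    # (batch, src_len, 2*hid_dim), (batch, hid_dim)
```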
3.1.12. Search Strategy
3.2. Training Configuration
3.2.1. ROUGE-Augmented Loss
3.2.2. Hyperparameter Tuning
- Hardware: Experiments were conducted on NVIDIA GPUs (e.g., Tesla V100) with 16–32 GB of GPU memory. Training time varied by model: roughly 1–2 hours for the LSTM baselines and several hours for each transformer.
- Implementation: The RNN baselines are implemented in PyTorch v2.6.0, while the transformer models use the HuggingFace Transformers library. We monitor validation ROUGE to select the best checkpoint.
3.3. Evaluation and Performance Metrics
Evaluation Metrics
Algorithm 2: Evaluate Summarization Models
- ROUGE-1 (Unigram Overlap): computed as shown in Equation (2),

  $$\text{ROUGE-1} = \frac{\text{Count}_{\text{overlap}}(\text{unigrams})}{\text{Count}_{\text{reference}}(\text{unigrams})}$$

  where the numerator is the number of words that overlap between the generated summary and the reference summaries, and the denominator is the total number of unigrams (individual items of the sequence) in the reference summary.
- ROUGE-2 (Bigram Overlap):

  $$\text{ROUGE-2} = \frac{\text{Count}_{\text{overlap}}(\text{bigrams})}{\text{Count}_{\text{reference}}(\text{bigrams})}$$

  where the numerator is the number of bigrams that overlap between the automatically generated summary and the reference data, and the denominator is the total number of bigrams in the reference.
- ROUGE-L (Longest Common Subsequence):

  $$\text{ROUGE-L} = \frac{\text{LCS}(\text{generated}, \text{reference})}{\text{Count}_{\text{reference}}(\text{words})}$$

  where LCS(generated, reference) is the length of the longest common subsequence between the generated summary and the reference summary, and the denominator is the total word count of the reference summary.
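For transparency, a minimal, self-contained sketch of these recall-oriented scores is given below; it is illustrative only (it ignores count clipping and the precision/F-measure variants that ROUGE toolkits also report), and experiments would typically rely on a standard ROUGE implementation.

```python
def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def rouge_n_recall(generated: str, reference: str, n: int = 1) -> float:
    """Overlapping n-grams divided by total n-grams in the reference
    (simplified: ignores count clipping)."""
    ref = ngrams(reference.split(), n)
    gen = set(ngrams(generated.split(), n))
    return sum(1 for g in ref if g in gen) / len(ref) if ref else 0.0

def rouge_l_recall(generated: str, reference: str) -> float:
    """Longest-common-subsequence length divided by the reference length."""
    g, r = generated.split(), reference.split()
    dp = [[0] * (len(r) + 1) for _ in range(len(g) + 1)]
    for i, gw in enumerate(g, 1):
        for j, rw in enumerate(r, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if gw == rw else max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1] / len(r) if r else 0.0
```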
3.4. Performance Comparison Metric
4. Results
4.1. Performance on Urdu Fake News Dataset
4.2. Performance on MWZ/URSUM Dataset
4.3. Performance on Urdu Instruct Dataset
4.4. Transformer vs. Classical Baseline Comparison
4.5. Performance Comparison on the Urdu Fake News Dataset
4.6. Ablation Studies
5. Discussion
- RTL-Aware Tokenization: Right-to-left embeddings and script normalization (handling Noon Ghunna, Hamza, etc.) improved alignment between generated summaries and Urdu’s script structure. This adaptation was significant for pre-trained models originally trained on left-to-right scripts.
- Idiomatic and Domain-Specific Challenges: Urdu news often mixes English loanwords, idioms, and formal expressions. Models sometimes misinterpret or transliterate English words or overuse generic phrases. While transformers usually generate coherent summaries, they occasionally repeat content or hallucinate facts not present in the source.
- Hybrid Loss Function: Combining cross-entropy with a ROUGE-L penalty improved both fluency and informativeness. Unlike likelihood-only training, the hybrid loss explicitly encouraged content retention from the source (a minimal sketch follows this list).
- Training Considerations: mT5 required careful hyperparameter tuning (low learning rate, moderate dropout) to balance pretrained knowledge with Urdu-specific adaptation. In contrast, the Seq2Seq baseline required more epochs but plateaued earlier, reflecting limited capacity.
- Relative Improvement Metric: Reporting relative gains provided clearer insights when absolute ROUGE values were modest. For example, a +28.2% improvement in ROUGE-1 (Table 12) highlights the substantial benefit of TLMs.
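The exact hybrid objective is specified in Section 3.2.1; as an illustration only, the sketch below shows one simple way to couple the two signals, rescaling the cross-entropy term by a batch-level (1 − ROUGE-L) penalty computed on greedily decoded summaries. The weighting lam, the decoding settings, and the rouge_l_fn argument (e.g., the rouge_l_recall sketch in Section 3.3) are assumptions, not the authors’ exact formulation, which may instead use a reinforcement-style reward as in Paulus et al.

```python
import torch

def hybrid_loss(model, tokenizer, batch, rouge_l_fn, lam: float = 0.7):
    """Couple cross-entropy with a batch-level (1 - ROUGE-L) penalty:
    batches whose decoded summaries overlap poorly with the references
    receive a proportionally stronger training signal."""
    out = model(input_ids=batch["input_ids"],
                attention_mask=batch["attention_mask"],
                labels=batch["labels"])
    ce = out.loss                                      # token-level cross-entropy

    with torch.no_grad():                              # ROUGE term is non-differentiable
        gen = model.generate(batch["input_ids"],
                             attention_mask=batch["attention_mask"],
                             num_beams=1, max_length=128)
    labels = batch["labels"].clone()
    labels[labels == -100] = tokenizer.pad_token_id    # undo the loss-masking value
    hyps = tokenizer.batch_decode(gen, skip_special_tokens=True)
    refs = tokenizer.batch_decode(labels, skip_special_tokens=True)
    penalty = 1.0 - sum(rouge_l_fn(h, r) for h, r in zip(hyps, refs)) / len(refs)

    return ce * (lam + (1.0 - lam) * penalty)          # penalty rescales the CE term
```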
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Saggion, H.; Poibeau, T. Automatic Text Summarization: Past, Present and Future. In Multi-Source, Multilingual Information Extraction and Summarization; Springer: Berlin/Heidelberg, Germany, 2013; pp. 3–21. [Google Scholar]
- Rahul, S.; Rauniyar, S.; Monika. A Survey on Deep Learning Based Various Methods Analysis of Text Summarization. In Proceedings of the 2020 International Conference on Inventive Computation Technologies (ICICT), Coimbatore, India, 26–28 February 2020; pp. 113–116. [Google Scholar]
- Bhatti, M.W.; Aslam, M. ISUTD: Intelligent System for Urdu Text De-Summarization. In Proceedings of the 2019 International Conference on Engineering and Emerging Technologies (ICEET), Lahore, Pakistan, 21–22 February 2019; pp. 1–5. [Google Scholar]
- Verma, P.; Verma, A.; Pal, S. An Approach for Extractive Text Summarization Using Fuzzy Evolutionary and Clustering Algorithms. Appl. Soft Comput. 2022, 120, 108670. [Google Scholar] [CrossRef]
- Fejer, H.N.; Omar, N. Automatic Arabic Text Summarization Using Clustering and Keyphrase Extraction. In Proceedings of the 6th International Conference on Information Technology and Multimedia, Putrajaya, Malaysia, 18–20 November 2014; pp. 293–298. [Google Scholar]
- Syed, A.A.; Gaol, F.L.; Matsuo, T. A Survey of the State-of-the-Art Models in Neural Abstractive Text Summarization. IEEE Access 2021, 9, 13248–13265. [Google Scholar] [CrossRef]
- Siragusa, G.; Robaldo, L. Sentence Graph Attention For Content-Aware Summarization. Appl. Sci. 2022, 12, 10382. [Google Scholar] [CrossRef]
- Allahyari, M.; Pouriyeh, S.; Assefi, M.; Safaei, S.; Trippe, E.D.; Gutierrez, J.B.; Kochut, K. Text Summarization Techniques: A Brief Survey. arXiv 2017, arXiv:1707.02268. [Google Scholar] [CrossRef]
- Witte, R.; Krestel, R.; Bergler, S. Generating Update Summaries for DUC 2007. In Proceedings of the Document Understanding Conference, Rochester, NY, USA, 26–27 April 2007; pp. 1–5. [Google Scholar]
- Rahman, T. Language Policy and Localization in Pakistan: Proposal for a Paradigmatic Shift. In Proceedings of the SCALLA Conference on Computational Linguistics, Seoul, Republic of Korea, 15–21 February 2004; Volume 99, pp. 1–19. [Google Scholar]
- Janjanam, P.; Reddy, C.P. Text Summarization: An Essential Study. In Proceedings of the 2019 International Conference on Computational Intelligence in Data Science (ICCIDS), Chennai, India, 21–23 February 2019; pp. 1–6. [Google Scholar]
- Al-Maleh, M.; Desouki, S. Arabic Text Summarization Using Deep Learning Approach. J. Big Data 2020, 7, 109. [Google Scholar] [CrossRef]
- Vogel-Fernandez, A.; Calleja, P.; Rico, M. esT5s: A Spanish Model for Text Summarization. In Towards a Knowledge-Aware AI; IOS Press: Amsterdam, The Netherlands, 2022; pp. 184–190. [Google Scholar]
- Vetriselvi, T.; Mathur, M. Text Summarization and Translation of Summarized Outcome in French. In E3S Web of Conferences; EDP Sciences: Les Ulis, France, 2023; Volume 399, p. 04002. [Google Scholar]
- Balajia, R.L.; Thiruvenkataswamy, C.S.; Batumalay, M.; Duraimutharasan, N.; Devadas, A.D.T.; Yingthawornsuk, T. A Study of Unified Framework for Extremism Classification, Ideology Detection, Propaganda Analysis, and Flagged Data Detection Using Transformers. J. Appl. Data Sci. 2025, 6, 1791–1810. [Google Scholar] [CrossRef]
- Camastra, F.; Razi, G. Italian Text Categorization with Lemmatization and Support Vector Machines. In Neural Approaches to Dynamics of Signal Exchanges; Springer: Singapore, 2020; pp. 47–54. [Google Scholar]
- Garcia, G.L.; Paiola, P.H.; Jodas, D.S.; Sugi, L.A.; Papa, J.P. Text Summarization and Temporal Learning Models Applied to Portuguese Fake News Detection in a Novel Brazilian Corpus Dataset. In Proceedings of the 16th International Conference on Computational Processing of Portuguese, Santiago de Compostela, Spain, 12–15 March 2024; pp. 86–96. [Google Scholar]
- Goloviznina, V.; Kotelnikov, E. Automatic Summarization of Russian Texts: Comparison of Extractive and Abstractive Methods. arXiv 2022, arXiv:2206.09253. [Google Scholar] [CrossRef]
- Xiong, C.; Wang, Z.; Shen, L.; Deng, N. TF-BiLSTMS2S: A Chinese Text Summarization Model. In Advanced Information Networking and Applications: Proceedings of the 34th International Conference on Advanced Information Networking and Applications (AINA-2020), Caserta, Italy, 15–17 April 2020; Springer International Publishing: Berlin/Heidelberg, Germany, 2020; pp. 240–249. [Google Scholar]
- Nagai, Y.; Oka, T.; Komachi, M. A Document-Level Text Simplification Dataset for Japanese. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), Torino, Italy, 20–25 May 2024; pp. 459–476. [Google Scholar]
- Naseer, A.; Hussain, S. Supervised Word Sense Disambiguation for Urdu Using Bayesian Classification. Technical Report; Center for Research in Urdu Language Processing: Lahore, Pakistan, 2009. [Google Scholar]
- Daud, A.; Khan, W.; Che, D. Urdu Language Processing: A Survey. Artif. Intell. Rev. 2017, 47, 279–311. [Google Scholar] [CrossRef]
- Ramos, J. Using tf-idf to determine word relevance in document queries. In Proceedings of the First Instructional Conference on Machine Learning, Orlando, FL, USA, 15–17 December 2003; Volume 242, No. 1. pp. 29–48. [Google Scholar]
- Mihalcea, R.; Tarau, P. TextRank: Bringing Order into Text. In Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, Barcelona, Spain, 25–26 July 2004; pp. 404–411. [Google Scholar]
- Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef] [PubMed]
- Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the NAACL-HLT 2019, Minneapolis, MN, USA, 2–7 June 2019; pp. 4171–4186. [Google Scholar]
- Xue, L.; Constant, N.; Roberts, A.; Kale, M.; Al-Rfou, R.; Siddhant, A.; Barua, A.; Raffel, C. mT5: A Massively Multilingual Pre-Trained Text-to-Text Transformer. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics, Online, 6–11 June 2021; pp. 483–498. [Google Scholar]
- Ul Hasan, M.; Raza, A.; Rafi, M.S. UrduBERT: A Bidirectional Transformer for Urdu Language Understanding. ACM Trans. Asian Low-Resour. Lang. Inf. Process. 2022, 21, 1–22. [Google Scholar]
- Sajjad, H.; Dalvi, F.; Durrani, N.; Nakov, P. Poor Man’s BERT: Smaller and Faster Transformer Models. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; pp. 2083–2098. [Google Scholar]
- Hou, L.; Hu, P.; Bei, C. Abstractive Document Summarization via Neural Model with Joint Attention. In Proceedings of the National CCF Conference on Natural Language Processing and Chinese Computing, Dalian, China, 8–12 November 2017; pp. 329–338. [Google Scholar]
- Humayoun, M.; Nawab, R.; Uzair, M.; Aslam, S.; Farzand, O. Urdu Summary Corpus. In Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC’16), Portorož, Slovenia, 23–28 May 2016; pp. 796–800. [Google Scholar]
- Awais, M.; Nawab, R.M.A. Abstractive Text Summarization for the Urdu Language: Data and Methods. IEEE Access 2024, 12, 61198–61210. [Google Scholar] [CrossRef]
- Raza, H.; Shahzad, W. End to End Urdu Abstractive Text Summarization with Dataset and Improvement in Evaluation Metric. IEEE Access 2024, 12, 40311–40324. [Google Scholar] [CrossRef]
- Chen, Q.; Zhu, X.; Ling, Z.; Wei, S.; Jiang, H. Distraction-Based Neural Networks for Document Summarization. arXiv 2016, arXiv:1610.08462. [Google Scholar] [CrossRef]
- Gu, J.; Lu, Z.; Li, H.; Li, V.O. Incorporating Copying Mechanism in Sequence-to-Sequence Learning. arXiv 2016, arXiv:1603.06393. [Google Scholar]
- Nawaz, A.; Bakhtyar, M.; Baber, J.; Ullah, I.; Noor, W.; Basit, A. Extractive Text Summarization Models for Urdu Language. Inf. Process. Manag. 2020, 57, 102383. [Google Scholar] [CrossRef]
- Hu, B.; Chen, Q.; Zhu, F. LCSTS: A Large Scale Chinese Short Text Summarization Dataset. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, 17–21 September 2015; pp. 1967–1972. [Google Scholar]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention Is All You Need. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 5998–6008. [Google Scholar]
- El-Kassas, W.S.; Salama, C.R.; Rafea, A.A.; Mohamed, H.K. Automatic Text Summarization: A Comprehensive Survey. Expert Syst. Appl. 2021, 165, 113679. [Google Scholar] [CrossRef]
- Tan, J.; Wan, X.; Xiao, J. Abstractive Document Summarization with a Graph-Based Attentional Neural Model. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Vancouver, Canada, 30 July–4 August 2017; Association for Computational Linguistics: Stroudsburg, PA, USA, 2017; pp. 1171–1181. [Google Scholar]
- Rodríguez, D.Z.; Okey, O.D.; Maidin, S.S.; Udo, E.U.; Kleinschmidt, J.H. Attentive Transformer Deep Learning Algorithm for Intrusion Detection on IoT Systems Using Automatic Explainable Feature Selection. PLoS ONE 2023, 18, e0286652. [Google Scholar]
- Shafiq, N.; Hamid, I.; Asif, M.; Nawaz, Q.; Aljuaid, H.; Ali, H. Abstractive Text Summarization of Low-Resourced Languages Using Deep Learning. PeerJ Comput. Sci. 2023, 9, e1176. [Google Scholar] [CrossRef]
- Faheem, A.; Ullah, F.; Ayub, M.S.; Karim, A. UrduMASD: A Multimodal Abstractive Summarization Dataset for Urdu. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), Torino, Italy, 20–25 May 2024; pp. 17245–17253. [Google Scholar]
- Munaf, M.; Afzal, H.; Mahmood, K.; Iltaf, N. Low Resource Summarization Using Pre-Trained Language Models. ACM Trans. Asian Low-Resour. Lang. Inf. Process. 2024, 23, 1–19. [Google Scholar] [CrossRef]
- Raza, A.; Raja, H.S.; Maratib, U. Abstractive Summary Generation for the Urdu Language. arXiv 2023, arXiv:2305.16195. [Google Scholar] [CrossRef]
- Raza, A.; Soomro, M.H.; Shahzad, I.; Batool, S. Abstractive Text Summarization for Urdu Language. J. Comput. Biomed. Informatics 2024, 7, 2. [Google Scholar]
- Duarte, J.M.; Berton, L. A Review of Semi-Supervised Learning for Text Classification. Artif. Intell. Rev. 2023, 56, 9401–9469. [Google Scholar] [CrossRef] [PubMed]
- Bashar, M.A. A Coherent Knowledge-Driven Deep Learning Model for Idiomatic-Aware Sentiment Analysis of Unstructured Text Using Bert Transformer. Ph.D. Thesis, Universiti Teknologi MARA, Shah Alam, Malaysia, 2023. [Google Scholar]
- Humayoun, M.; Akhtar, N. CORPURES: Benchmark Corpus for Urdu Extractive Summaries and Experiments Using Supervised Learning. Intell. Syst. Appl. 2022, 16, 200129. [Google Scholar] [CrossRef]
- Muhammad, A.; Jazeb, N.; Martinez-Enriquez, A.M.; Sikander, A. EUTS: Extractive Urdu Text Summarizer. In Proceedings of the 2018 Seventeenth Mexican International Conference on Artificial Intelligence (MICAI), Guadalajara, Mexico, 22–27 October 2018; pp. 39–44. [Google Scholar]
- Saleem, M.A.; Shuja, J.; Humayun, M.A.; Ahmed, S.B.; Ahmad, R.W. Machine Learning Based Extractive Text Summarization Using Document Aware and Document Unaware Features. In Intelligent Systems Modeling and Simulation III: Artificial Intelligence, Machine Learning, Intelligent Functions and Cyber Security; Springer Nature: Cham, Switzerland, 2024; pp. 143–158. [Google Scholar]
- Syed, M.U.; Junaid, M.; Mehmood, I. UrduHack: NLP Library for Urdu Language. 2020. Available online: https://urduhack.readthedocs.io/en/stable/reference/normalization.html (accessed on 18 October 2024).
- Humsha, S. Urdu Summarization Corpus (USCorpus). 2021. Available online: https://github.com/humsha/USCorpus (accessed on 18 October 2024).
- Community Datasets. Urdu Fake News Dataset. Hugging Face, 2022. Available online: https://huggingface.co/datasets/community-datasets/urdu_fake_news (accessed on 18 October 2024).
- mwz. Ursum Dataset. Hugging Face, 2022. Available online: https://huggingface.co/datasets/mwz/ursum (accessed on 18 October 2024).
- Mustafa, A. Urdu Instruct News Article Generation. Hugging Face, 2023. Available online: https://huggingface.co/datasets/AhmadMustafa/Urdu-Instruct-News-Article-Generation (accessed on 18 October 2024).
- Sunusi, Y.; Omar, N.; Zakaria, L.Q. Exploring Abstractive Text Summarization: Methods, Dataset, Evaluation, and Emerging Challenges. Int. J. Adv. Comput. Sci. Appl. 2024, 15, 1340–1357. [Google Scholar] [CrossRef]
- Barbella, M.; Tortora, G. ROUGE Metric Evaluation for Text Summarization Techniques. SSRN 2023. Available online: https://ssrn.com/abstract=4120317 (accessed on 18 October 2024).
- Paulus, R.; Xiong, C.; Socher, R. A Deep Reinforced Model for Abstractive Summarization. arXiv 2018, arXiv:1705.04304. [Google Scholar]
- Smith, L.N. A Disciplined Approach to Neural Network Hyper-Parameters: Part 1–Learning Rate, Batch Size, Momentum, and Weight Decay. arXiv 2018, arXiv:1803.09820. [Google Scholar]
Authors & Year | Language | Dataset | Challenges | Techniques | Evaluation Metrics | Advantage | Disadvantage |
---|---|---|---|---|---|---|---|
Shafiq et al. (2023) [42] | Urdu | Urdu 1 Million News Dataset | Limited research on abstractive summarization | Deep Learning, Extractive and Abstractive Summarization | ROUGE-1: 27.34, ROUGE-2: 07.10, ROUGE-L: 25.50 | Better than SVM and LR | Requires further exploration |
Awais et al. (2024) [32] | Urdu | 2,067,784 news articles | Limited exploration of Urdu summarization | LSTM, GRU, Bi-LSTM, Bi-GRU, Attention, GPT-3.5, BART | ROUGE-1: 46.7, ROUGE-2: 24.1, ROUGE-L: 48.7 | Strong abstractive performance | Urdu literature underexplored |
Raza et al. (2024) [33] | Urdu | Labeled Urdu dataset | Limited abstractive research | Transformer Encoder–Decoder | ROUGE-1: 25.18, Context-Aware Roberta Score | Introduced new evaluation metric | Needs larger datasets |
Authors & Year | Language | Dataset | Challenges | Techniques | Evaluation Metrics | Advantage | Disadvantage |
---|---|---|---|---|---|---|---|
A. Faheem et al. (2024) [43] | Urdu | Urdu MASD | Low-resource language | Abstractive Multimodal, mT5, MLASK | ROUGE, BLEU | The first comprehensive multimodal dataset specifically designed for the Urdu language. | Requires focused and specialized pretraining specifically tailored for the Urdu language. |
M Munaf et al. (2023) [44] | Urdu | 76.5k pairs of articles and summaries | Low-resource linguistic | Abstractive Transformer, mT5, urT5 | ROUGE, BERTScore | Effective for low-resource summarization | The findings may have restricted applicability beyond the context of the Urdu language. |
Ali Raza et al. (2023) [45] | Urdu | Publicly available dataset | Comprehension of source text, grammar, semantics | Abstractive Transformer-based encoder–decoder, beam search | ROUGE-1, ROUGE-2, and ROUGE-L | High ROUGE scores, grammatically correct summaries | There is potential for enhancement in the ROUGE-L score. |
Asif Raza et al. (2024) [46] | Urdu | Own collected a dataset of 50 articles | Limited research on Urdu abstractive summarization | Extractive methods (TF-IDF, sentence weight, word frequency), Hybrid approach, BERT | Evaluated by Urdu professionals | Hybrid approach refines extractive summaries; potential for human-like summaries | Limited to single-document summarization; requires more vocabulary and synonyms |
Authors & Year | Language | Dataset | Challenges | Techniques | Evaluation Metrics | Advantage | Disadvantage |
---|---|---|---|---|---|---|---|
Ahmed et al. (2022) [47] | Urdu | 53k dataset | Limited resources | Extractive, BERT | ROUGE-L | Leverages pretrained models | Limited corpus size |
Duarte et al. (2023) [48] | Urdu | Multiple (medical, AG News, DBpedia, etc.) | Data scarcity, imbalance | Extractive, SSL, ML | ROUGE-1 | Comprehensive review | Limited method-specific depth |
Ali Nawaz et al. (2022) [36] | Urdu | Public dataset | Lack of extractive framework | Extractive LW/GW | F-score, Accuracy | LW achieves higher F-scores | VSM yields lower accuracy |
Muhammad et al. (2018) [49] | Urdu | Text documents | Feature extraction challenges | Extractive sentence weighting | ROUGE (n-gram) | 67% n-gram accuracy | Limited to extractive |
Saleem et al. (2024) [50] | Urdu | CORPURES (100 docs) | Low-resource | Extractive | ROUGE-2: 0.63 | High accuracy | Degrades with all features |
Humayoun et al. (2022) [51] | Urdu | CORPURES (161 docs) | Lack of standardized corpora | Extractive (NB, LR, MLP) | ROUGE-2 | First Urdu extractive corpus | Limited dataset size |
Dataset | Size | Domain | Availability | URL |
---|---|---|---|---|
Urdu-fake-news | 900 documents | 5 different news domains | Hugging Face | https://huggingface.co/datasets/community-datasets/urdu_fake_news (accessed on 18 October 2024) |
MWZ/URSUM (Multi-Domain Urdu Summarization) | 48,071 news articles | News, Legal (collected from the BBC Urdu website) | Hugging Face | https://huggingface.co/datasets/mwz/urdu_summarization (accessed on 18 October 2024)
Urdu-Instruct-News-Article-Generation (Ahmad Mustafa) | 7.5K articles | News (Instruction-Tuned) | Hugging Face | https://huggingface.co/datasets/AhmadMustafa/Urdu-Instruct-News-Article-Generation (accessed on 18 October 2024) |
Model | Batch Size | Learning Rate | Max Dec Length | Epochs |
---|---|---|---|---|
Transformer-based TLMs | 16 | Dynamic (avg length) | 50 |
Baseline seq2seq models | 64 | Fixed (128 tokens) | 30 |
Model | ROUGE-1 | ROUGE-2 | ROUGE-L | ROUGE-LSUM |
---|---|---|---|---|
BART | 0.48 | 0.31 | 0.40 | 0.38 |
mT5 | 0.50 | 0.33 | 0.42 | 0.40 |
BERT | 0.46 | 0.28 | 0.38 | 0.34 |
GPT-2 | 0.45 | 0.27 | 0.36 | 0.32 |
Model | ROUGE-1 | ROUGE-2 | ROUGE-L | ROUGE-LSUM |
---|---|---|---|---|
BART | 0.12 | 0.02 | 0.12 | 0.12 |
mT5 | 0.34 | 0.14 | 0.24 | 0.24 |
BERT | 0.33 | 0.11 | 0.19 | 0.19 |
GPT-2 | 0.18 | 0.05 | 0.13 | 0.13 |
Model | ROUGE-1 | ROUGE-2 | ROUGE-L | ROUGE-LSUM |
---|---|---|---|---|
BART | 0.22 | 0.07 | 0.14 | 0.14 |
mT5 | 0.36 | 0.07 | 0.24 | 0.24 |
BERT | 0.30 | 0.16 | 0.27 | 0.27 |
GPT-2 | 0.22 | 0.10 | 0.20 | 0.20 |
Model | ROUGE-1 | ROUGE-2 | ROUGE-L |
---|---|---|---|
Seq2Seq (LSTM + Attention) | 0.39 | 0.28 | 0.35 |
mT5 | 0.50 | 0.33 | 0.42 |
Model | ROUGE-1 | ROUGE-2 | ROUGE-L |
---|---|---|---|
Seq2Seq (LSTM + Attention) | 0.25 | 0.12 | 0.20 |
mT5 | 0.34 | 0.14 | 0.24 |
Model | ROUGE-1 | ROUGE-2 | ROUGE-L | R1 (%) | R2 (%) | RL (%) |
---|---|---|---|---|---|---|
Seq2Seq + Attention (LSTM) | 0.39 | 0.28 | 0.35 | — | — | — |
mT5 | 0.50 | 0.33 | 0.42 | +28.2 | +17.9 | +20.0 |
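The percentage columns follow the standard relative-improvement formula over the Seq2Seq baseline; for ROUGE-1, for example, the values in the table give:

$$\Delta_{\%} = \frac{S_{\text{model}} - S_{\text{baseline}}}{S_{\text{baseline}}} \times 100, \qquad \frac{0.50 - 0.39}{0.39} \times 100 \approx +28.2\%$$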
Model Variant | ROUGE-1 | ROUGE-2 | ROUGE-L |
---|---|---|---|
Full mT5 (ours) | 0.50 | 0.33 | 0.42 |
w/o RTL-aware chunking | 0.46 | 0.30 | 0.39 |
w/o Hybrid Loss | 0.48 | 0.31 | 0.40 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).