Abstractive vs. Extractive Summarization: An Experimental Review
Abstract
1. Introduction
2. Related Work
2.1. Extractive Approaches
2.2. Abstractive Approaches
2.3. Datasets
2.4. Evaluation Metrics
3. Evaluation
3.1. Hardware and Software Specifications
3.2. Experimental Setup and Evaluation Results
3.2.1. Abstractive Models
- BART-large-CNN is a finetuned version of the large-sized BART model (BART-large) on CNN/Daily Mail. The BART-large model has an encoder, a decoder, and a hidden state, each consisting of 12 layers. The maximum sequence length (max_position_embeddings) of the model is 1024 tokens, while the output lengths (min_length, max_length) are set to (56, 142) tokens.
- BART-large-XSum is a finetuned version of the large-sized BART model on XSum. It uses the same number of layers as the previous model. The maximum sequence length (max_position_embeddings) of the model is 1024 tokens, while the output lengths (min_length, max_length) are set to (11, 62) tokens.
- BART-large-CNN-SAMSum uses the BART-large-CNN model; it is further finetuned on examples from the SAMSum dataset. It has the same number of layers, maximum sequence length, and output lengths as the BART-large-CNN model.
- PEGASUS-large is actually the “PEGASUSLARGE (mixed, stochastic)” model reported in [9]. The key difference of this model is that it was pretrained on two large text corpora, namely C4 and HugeNews, with a different training configuration compared to the other PEGASUS models reported in [9]. This model features an updated tokenizer and an enhanced training process. The model has an encoder, a decoder, and a hidden state, each consisting of 16 layers. The max sequence length (max_position_embeddings) of the model is 1024 tokens, while the maximum output length (max_length) is set to 256 tokens.
- PEGASUS-multi_news uses the PEGASUS-large model finetuned on the Multi-News [63] dataset, which was originally built for the task of multi-document summarization. It has the same maximum sequence and output lengths as PEGASUS-large.
- PEGASUS-XSum uses the PEGASUS-large model finetuned on XSum. It has a maximum sequence length (max_position_embeddings) of 512 tokens and an output length limit (max_length) of 64 tokens.
- PEGASUS-CNN_dailymail uses the PEGASUS-large model finetuned on CNN/Daily Mail. It has a maximum sequence length (max_position_embeddings) of 1024 tokens and an output length limit (max_length) of 128 tokens.
- DistilBART-CNN-12-6 is a student model of BART that is trained for CNN/Daily Mail using knowledge distillation (KD). It features an encoder and a hidden state, each consisting of 12 layers, and a decoder consisting of 6 layers (half of the teacher model’s decoder layers). The student model has a maximum sequence length (max_position_embeddings) of 1024 tokens, while the output length limits (min_length, max_length) are set to (56, 142) tokens.
- DistilBART-XSum-12-6 is a student model of BART that is trained for the XSum dataset using the same distillation technique as the previous model. It features an encoder and a hidden state with 12 layers each, and a decoder with 6 layers copied from the teacher model. The student model has a maximum sequence length (max_position_embeddings) of 1024 tokens, while the output length limits (min_length, max_length) are set to (11, 62) tokens.
- mT5-multilingual-XLSum is a model that uses Google’s mT5 (multilingual-T5) base model finetuned on the XLSum dataset for the task of multilingual summarization [55]. The model has an encoder, a decoder, and a hidden state, each consisting of 12 layers. The original authors truncated the input sequence to 512 tokens, while the maximum output length (max_length) is set to 84 tokens.
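For ease of comparison, the length settings reported above can be collected in one place. The sketch below is purely illustrative and not the authors’ code: the dictionary keys and helper name are ours, and a whitespace split stands in for the models’ actual subword tokenizers. It shows how an over-long input would be clipped to a model’s maximum sequence length.

```python
# Length settings as reported in Section 3.2.1:
# model -> (max_position_embeddings, min_length, max_length); None = not reported.
LENGTH_SETTINGS = {
    "BART-large-CNN":         (1024, 56, 142),
    "BART-large-XSum":        (1024, 11, 62),
    "BART-large-CNN-SAMSum":  (1024, 56, 142),
    "PEGASUS-large":          (1024, None, 256),
    "PEGASUS-multi_news":     (1024, None, 256),
    "PEGASUS-XSum":           (512, None, 64),
    "PEGASUS-CNN_dailymail":  (1024, None, 128),
    "DistilBART-CNN-12-6":    (1024, 56, 142),
    "DistilBART-XSum-12-6":   (1024, 11, 62),
    "mT5-multilingual-XLSum": (512, None, 84),
}

def truncate_input(text: str, model: str) -> list:
    """Clip the input to the model's maximum sequence length (in tokens).

    Whitespace tokenization is a stand-in for the real subword tokenizers.
    """
    max_input, _, _ = LENGTH_SETTINGS[model]
    return text.split()[:max_input]
```

Note that such truncation silently discards everything past the limit, which is one reason models with a 512-token limit (PEGASUS-XSum, mT5-multilingual-XLSum) see less of a long news article than the 1024-token models.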
3.2.2. Evaluation Results
4. Conclusions, Open Issues, and Future Work
- Abstractive models that are finetuned on data similar to the test data produce significantly better results. The rest of the abstractive models, which are finetuned on data different from the test data, in most cases perform similarly to or even better than the extractive ones. There is no clear winner among the BART, PEGASUS, and DistilBART approaches, since none of them outperformed the others across diverse datasets. However, it is worth noting that in some cases the DistilBART student outperformed the BART teacher model.
- Regarding the extractive approaches, their evaluation scores were relatively close to each other, so no approach stood out. Between the two TextRank variations, the implementation from sumy performed better on the datasets from which we extracted only one-sentence summaries (XSum, XLSum, and Reddit TIFU), while the version from pyTextRank performed better on the rest of the chosen datasets. We also tested an embeddings-based extractive approach, e-LexRank, which on most datasets did not yield better results than the classical extractive approaches.
- Regarding our evaluation metrics, we note that the scores produced by BLEU are similar to those produced by ROUGE. This leads us to recommend the BLEU metric for the evaluation of summarization approaches, even though it was originally designed for machine translation.
- One may observe that the RL scores match the RLSum scores on all datasets apart from CNN/Daily Mail. For the datasets with single-sentence summaries, RLSum (summary-level) scores are trivially equivalent to RL (sentence-level) scores. For the remaining datasets, which may contain multi-sentence summaries, the match occurs because the RLSum implementation that we use splits sentences on the newline delimiter (\n); a summary without newlines is therefore treated as a single sentence.
- The implication of the aforementioned experimental results is that not all abstractive models perform equally, as also reported in [21]. Thus, there is a constant need for researchers to discover better pretrained language model architectures that generalize more easily and produce summaries closer to the human style of writing.
- Abstractive models need to be retrained or re-finetuned whenever documents from a different language or domain are introduced. This could be addressed by creating more non-English datasets covering various domains, and then training and finetuning different versions of the models.
- Instead of highly accurate but specialized abstractive models, a generic type of abstractive model could emerge, trained and finetuned on vast multilingual corpora and achieving a degree of generalization that is not present in current models.
- Current abstractive approaches require a significant amount of training data [9,55] and training time, even with specialized hardware. This could be mitigated by the few-shot capabilities of large language models (LLMs), e.g., GPT-3 [52], which, due to their extremely large training corpora and parameter counts (on the billion scale), can be finetuned for a specific language or domain using only a limited number of examples.
- As mentioned in Section 2.4, BLEU can be used as an evaluation metric for the TS task. As presented in Section 3.2.2, this metric produces a ranking for the approaches that is similar to that produced by ROUGE. To the best of our knowledge, most research works that evaluate summarization approaches utilize only the ROUGE metric.
- Evaluate LLMs for the TS task, given their zero/few-shot learning capabilities, which enable them to be finetuned for different languages or domains with a significantly smaller number of examples.
- Finetune existing abstractive approaches in other languages and/or domains.
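The RL/RLSum behaviour noted above can be reproduced with a small self-contained sketch of ROUGE-L. This is our own minimal reimplementation for illustration, not the library used in the experiments: sentence-level RL computes one LCS over the whole texts, while summary-level RLSum splits reference and candidate on the newline delimiter (\n) and aggregates union-LCS hits per reference sentence, so a summary without newlines is scored exactly like a single sentence.

```python
def _lcs_table(a, b):
    # Classic dynamic-programming longest-common-subsequence length table.
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a)):
        for j in range(len(b)):
            dp[i + 1][j + 1] = dp[i][j] + 1 if a[i] == b[j] else max(dp[i][j + 1], dp[i + 1][j])
    return dp

def _lcs_indices(a, b):
    # Indices of `a` that participate in one LCS with `b` (backtracking).
    dp = _lcs_table(a, b)
    i, j, hit = len(a), len(b), set()
    while i and j:
        if a[i - 1] == b[j - 1]:
            hit.add(i - 1)
            i -= 1
            j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return hit

def _f1(hits, ref_len, hyp_len):
    if hits == 0 or ref_len == 0 or hyp_len == 0:
        return 0.0
    p, r = hits / hyp_len, hits / ref_len
    return 2 * p * r / (p + r)

def rouge_l(ref, hyp):
    # Sentence-level RL: one LCS over the whole texts.
    r, h = ref.split(), hyp.split()
    return _f1(len(_lcs_indices(r, h)), len(r), len(h))

def rouge_lsum(ref, hyp):
    # Summary-level RLSum: union-LCS hits per reference sentence,
    # with sentences delimited by '\n'.
    ref_sents = [s.split() for s in ref.split("\n") if s.strip()]
    hyp_sents = [s.split() for s in hyp.split("\n") if s.strip()]
    hits = 0
    for rs in ref_sents:
        matched = set()
        for hs in hyp_sents:
            matched |= _lcs_indices(rs, hs)
        hits += len(matched)
    ref_len = sum(len(s) for s in ref_sents)
    hyp_len = sum(len(s) for s in hyp_sents)
    return _f1(hits, ref_len, hyp_len)
```

With no newline in either text, rouge_lsum reduces to rouge_l, which is why the RL and RLSum columns coincide on the single-sentence datasets and wherever summaries lack newline delimiters.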
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Appendix A
Journal/Conference/Workshop/Repository | Publisher |
---|---|
Nature | Nature |
IEEE Access; International Conference on Computer, Communication and Signal Processing (ICCCSP); IEEE Region 10 Symposium (TENSYMP) | Institute of Electrical and Electronics Engineers (IEEE) |
Expert Systems with Applications; Information Fusion; Information Processing & Management; Computer Speech & Language | Elsevier |
AAAI Conference on Artificial Intelligence | Association for the Advancement of Artificial Intelligence (AAAI) |
Journal of the American Society for Information Science and Technology (J. Am. Soc. Inf. Sci.) | Association for Computing Machinery (ACM) |
Artificial Intelligence Review; European Conference on Advances in Information Retrieval (ECIR); International Journal of Parallel Programming (Int J Parallel Prog) | Springer |
International Conference on Language Resources and Evaluation (LREC); Conference on Empirical Methods in Natural Language Processing (EMNLP); Conference on Empirical Methods in Natural Language Processing and the International Joint Conference on Natural Language Processing (EMNLP-IJCNLP); Annual Meeting of the Association for Computational Linguistics; Annual Meeting of the Association for Computational Linguistics and the International Joint Conference on Natural Language Processing (ACL-IJCNLP); International Joint Conference on Natural Language Processing (IJCNLP); International Conference on Computational Linguistics: System Demonstrations (COLING); Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT); Findings of the Association for Computational Linguistics (ACL-IJCNLP); Text Summarization Branches Out; Conference on Machine Translation (WMT); Workshop on New Frontiers in Summarization | Association for Computational Linguistics (ACL) |
International Conference on Machine Learning (Proceedings of Machine Learning Research—PMLR); Journal of Machine Learning Research (JMLR) | JMLR, Inc. and Microtome Publishing (United States) |
Journal of Artificial Intelligence Research (J. Artif. Int. Res.) | AI Access Foundation, Inc. |
Journal of Emerging Technologies in Web Intelligence (JEWI) | JEWI |
Advances in Neural Information Processing Systems | Curran Associates, Inc. |
Foundations and Trends® in Information Retrieval | Now Publishers |
IBM Journal of Research and Development | IBM |
arXiv preprints | arXiv.org |
References
- Gupta, V.; Lehal, G.S. A Survey of Text Summarization Extractive Techniques. J. Emerg. Technol. Web Intell. 2010, 2, 258–268. [Google Scholar] [CrossRef] [Green Version]
- El-Kassas, W.S.; Salama, C.R.; Rafea, A.A.; Mohamed, H.K. Automatic Text Summarization: A Comprehensive Survey. Expert Syst. Appl. 2021, 165, 113679. [Google Scholar] [CrossRef]
- Bharti, S.K.; Babu, K.S. Automatic Keyword Extraction for Text Summarization: A Survey. arXiv 2017, arXiv:1704.03242. [Google Scholar]
- Gambhir, M.; Gupta, V. Recent Automatic Text Summarization Techniques: A Survey. Artif. Intell. Rev. 2017, 47, 1–66. [Google Scholar] [CrossRef]
- Yasunaga, M.; Kasai, J.; Zhang, R.; Fabbri, A.R.; Li, I.; Friedman, D.; Radev, D.R. ScisummNet: A Large Annotated Corpus and Content-Impact Models for Scientific Paper Summarization with Citation Networks. Proc. AAAI Conf. Artif. Intell. 2019, 33, 7386–7393. [Google Scholar] [CrossRef] [Green Version]
- An, C.; Zhong, M.; Chen, Y.; Wang, D.; Qiu, X.; Huang, X. Enhancing Scientific Papers Summarization with Citation Graph. Proc. AAAI Conf. Artif. Intell. 2021, 35, 12498–12506. [Google Scholar] [CrossRef]
- Hong, K.; Conroy, J.; Favre, B.; Kulesza, A.; Lin, H.; Nenkova, A. A Repository of State of the Art and Competitive Baseline Summaries for Generic News Summarization. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC’14), Reykjavik, Iceland, 26–31 May 2014; European Language Resources Association (ELRA): Luxembourg; pp. 1608–1616. [Google Scholar]
- Narayan, S.; Cohen, S.B.; Lapata, M. Don’t Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 31 October–4 November 2018; Association for Computational Linguistics: Stroudsburg, PA, USA; pp. 1797–1807. [Google Scholar]
- Zhang, J.; Zhao, Y.; Saleh, M.; Liu, P. PEGASUS: Pre-Training with Extracted Gap-Sentences for Abstractive Summarization. In Proceedings of the 37th International Conference on Machine Learning, Virtual Event, 21 November 2020; PMLR; pp. 11328–11339. [Google Scholar]
- Zhang, S.; Celikyilmaz, A.; Gao, J.; Bansal, M. EmailSum: Abstractive Email Thread Summarization. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Online, 1–6 August 2021; Association for Computational Linguistics: Stroudsburg, PA, USA; pp. 6895–6909. [Google Scholar]
- Polsley, S.; Jhunjhunwala, P.; Huang, R. CaseSummarizer: A System for Automated Summarization of Legal Texts. In Proceedings of the COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations, Osaka, Japan, 11–16 December 2016; The COLING 2016 Organizing Committee; pp. 258–262. [Google Scholar]
- Kanapala, A.; Pal, S.; Pamula, R. Text Summarization from Legal Documents: A Survey. Artif. Intell. Rev. 2019, 51, 371–402. [Google Scholar] [CrossRef]
- Bhattacharya, P.; Hiware, K.; Rajgaria, S.; Pochhi, N.; Ghosh, K.; Ghosh, S. A Comparative Study of Summarization Algorithms Applied to Legal Case Judgments. In Advances in Information Retrieval, Proceedings of the 41st European Conference on IR Research, ECIR 2019, Cologne, Germany, 14–18 April 2019; Azzopardi, L., Stein, B., Fuhr, N., Mayr, P., Hauff, C., Hiemstra, D., Eds.; Springer International Publishing: Cham, Switzerland, 2019; pp. 413–428. [Google Scholar]
- Sun, S.; Luo, C.; Chen, J. A Review of Natural Language Processing Techniques for Opinion Mining Systems. Inf. Fusion 2017, 36, 10–25. [Google Scholar] [CrossRef]
- Hu, Y.H.; Chen, Y.L.; Chou, H.L. Opinion Mining from Online Hotel Reviews—A Text Summarization Approach. Inf. Process. Manag. 2017, 53, 436–449. [Google Scholar] [CrossRef]
- Adamides, E.; Giarelis, N.; Kanakaris, N.; Karacapilidis, N.; Konstantinopoulos, K.; Siachos, I. Leveraging open innovation practices through a novel ICT platform. In Human Centred Intelligent Systems, Proceedings of KES HCIS 2023 Conference. Smart Innovation, Systems and Technologies, Rome, Italy, 14–16 June 2023; Springer: Rome, Italy, 2023; Volume 359. [Google Scholar]
- Nenkova, A.; McKeown, K. Automatic Summarization. Found. Trends Inf. Retr. 2011, 5, 103–233. [Google Scholar] [CrossRef] [Green Version]
- Saggion, H.; Poibeau, T. Automatic Text Summarization: Past, Present and Future. In Multi-Source, Multilingual Information Extraction and Summarization; Poibeau, T., Saggion, H., Piskorski, J., Yangarber, R., Eds.; Theory and Applications of Natural Language Processing; Springer: Berlin/Heidelberg, Germany, 2013; pp. 3–21. ISBN 9783642285691. [Google Scholar]
- Moratanch, N.; Chitrakala, S. A Survey on Extractive Text Summarization. In Proceedings of the 2017 International Conference on Computer, Communication and Signal Processing (ICCCSP), Chennai, India, 10–11 January 2017; pp. 1–6. [Google Scholar]
- Mridha, M.F.; Lima, A.A.; Nur, K.; Das, S.C.; Hasan, M.; Kabir, M.M. A Survey of Automatic Text Summarization: Progress, Process and Challenges. IEEE Access 2021, 9, 156043–156070. [Google Scholar] [CrossRef]
- Alomari, A.; Idris, N.; Sabri, A.Q.M.; Alsmadi, I. Deep Reinforcement and Transfer Learning for Abstractive Text Summarization: A Review. Comput. Speech Lang. 2022, 71, 101276. [Google Scholar] [CrossRef]
- Lin, C.Y. ROUGE: A Package for Automatic Evaluation of Summaries. In Text Summarization Branches Out; Association for Computational Linguistics: Barcelona, Spain, 2004; pp. 74–81. [Google Scholar]
- Papineni, K.; Roukos, S.; Ward, T.; Zhu, W.J. BLEU: A Method for Automatic Evaluation of Machine Translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics; Association for Computational Linguistics, Philadelphia, PA, USA, 7–12 July 2002; pp. 311–318. [Google Scholar]
- Graham, Y. Re-Evaluating Automatic Summarization with BLEU and 192 Shades of ROUGE. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, 17–21 September 2015; Association for Computational Linguistics: Stroudsburg, PA, USA; pp. 128–137. [Google Scholar]
- Rieger, B.B. On Distributed Representation in Word Semantics; International Computer Science Institute: Berkeley, CA, USA, 1991. [Google Scholar]
- Luhn, H.P. The Automatic Creation of Literature Abstracts. IBM J. Res. Dev. 1958, 2, 159–165. [Google Scholar] [CrossRef] [Green Version]
- Deerwester, S.; Dumais, S.T.; Furnas, G.W.; Landauer, T.K.; Harshman, R. Indexing by Latent Semantic Analysis. J. Am. Soc. Inf. Sci. 1990, 41, 391–407. Available online: https://search.crossref.org/?q=Indexing+by+latent+semantic+analysis+Scott+Deerwester&from_ui=yes (accessed on 30 May 2023). [CrossRef]
- Gong, Y.; Liu, X. Generic Text Summarization Using Relevance Measure and Latent Semantic Analysis. In Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval, New York, NY, USA, 1 September 2001; Association for Computing Machinery: New York, NY, USA; pp. 19–25. [Google Scholar]
- Steinberger, J.; Jezek, K. Using Latent Semantic Analysis in Text Summarization and Summary Evaluation. Proc. ISIM 2004, 4, 8. [Google Scholar]
- Yeh, J.Y.; Ke, H.R.; Yang, W.P.; Meng, I.H. Text Summarization Using a Trainable Summarizer and Latent Semantic Analysis. Inf. Process. Manag. 2005, 41, 75–95. [Google Scholar] [CrossRef]
- Mihalcea, R.; Tarau, P. TextRank: Bringing Order into Text. In Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, Barcelona, Spain, 25–26 July 2004; Association for Computational Linguistics: Stroudsburg, PA, USA; pp. 404–411. [Google Scholar]
- Page, L.; Brin, S.; Motwani, R.; Winograd, T. The Pagerank Citation Ranking: Bring Order to the Web; Technical Report; Stanford University: Stanford, CA, USA, 1998. [Google Scholar]
- Erkan, G.; Radev, D.R. LexRank: Graph-Based Lexical Centrality as Salience in Text Summarization. J. Artif. Int. Res. 2004, 22, 457–479. [Google Scholar] [CrossRef] [Green Version]
- Bougouin, A.; Boudin, F.; Daille, B. TopicRank: Graph-Based Topic Ranking for Keyphrase Extraction. In Proceedings of the Sixth International Joint Conference on Natural Language Processing, Nagoya, Japan, 14–19 October 2013; Asian Federation of Natural Language Processing: Singapore; pp. 543–551. [Google Scholar]
- Florescu, C.; Caragea, C. PositionRank: An Unsupervised Approach to Keyphrase Extraction from Scholarly Documents. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Vancouver, BC, Canada, 30 July–4 August 2017; Association for Computational Linguistics: Stroudsburg, PA, USA; pp. 1105–1115. [Google Scholar]
- Mikolov, T.; Chen, K.; Corrado, G.; Dean, J. Efficient Estimation of Word Representations in Vector Space. arXiv 2013, arXiv:1301.3781. [Google Scholar]
- Reimers, N.; Gurevych, I. Sentence-BERT: Sentence Embeddings Using Siamese BERT-Networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019; Association for Computational Linguistics: Stroudsburg, PA, USA; pp. 3982–3992. [Google Scholar]
- Chengzhang, X.; Dan, L. Chinese Text Summarization Algorithm Based on Word2vec. J. Phys. Conf. Ser. 2018, 976, 012006. [Google Scholar] [CrossRef] [Green Version]
- Haider, M.M.; Hossin, M.d.A.; Mahi, H.R.; Arif, H. Automatic Text Summarization Using Gensim Word2Vec and K-Means Clustering Algorithm. In Proceedings of the 2020 IEEE Region 10 Symposium (TENSYMP), Dhaka, Bangladesh, 5–7 June 2020; pp. 283–286. [Google Scholar]
- Abdulateef, S.; Khan, N.A.; Chen, B.; Shang, X. Multidocument Arabic Text Summarization Based on Clustering and Word2Vec to Reduce Redundancy. Information 2020, 11, 59. [Google Scholar] [CrossRef] [Green Version]
- Ganesan, K.; Zhai, C.; Han, J. Opinosis: A Graph Based Approach to Abstractive Summarization of Highly Redundant Opinions. In Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010), Beijing, China, 23–27 August 2010; Coling 2010 Organizing Committee; pp. 340–348. [Google Scholar]
- Genest, P.E.; Lapalme, G. Fully Abstractive Approach to Guided Summarization. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Jeju Island, Korea, 8–14 July 2012; Association for Computational Linguistics: Stroudsburg, PA, USA; pp. 354–358. [Google Scholar]
- Khan, A.; Salim, N.; Farman, H.; Khan, M.; Jan, B.; Ahmad, A.; Ahmed, I.; Paul, A. Abstractive Text Summarization Based on Improved Semantic Graph Approach. Int. J. Parallel. Prog. 2018, 46, 992–1016. [Google Scholar] [CrossRef]
- LeCun, Y.; Bengio, Y.; Hinton, G. Deep Learning. Nature 2015, 521, 436–444. [Google Scholar] [CrossRef] [PubMed]
- Rekabdar, B.; Mousas, C.; Gupta, B. Generative Adversarial Network with Policy Gradient for Text Summarization. In Proceedings of the 2019 IEEE 13th International Conference on Semantic Computing (ICSC), Newport Beach, CA, USA, 30 January–1 February 2019; pp. 204–207. [Google Scholar]
- Yang, M.; Li, C.; Shen, Y.; Wu, Q.; Zhao, Z.; Chen, X. Hierarchical Human-Like Deep Neural Networks for Abstractive Text Summarization. IEEE Trans. Neural Netw. Learn. Syst. 2021, 32, 2744–2757. [Google Scholar] [CrossRef] [PubMed]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention Is All You Need. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Curran Associates, Inc.: Red Hook, NY, USA, 2017; Volume 30. [Google Scholar]
- Bahdanau, D.; Cho, K.; Bengio, Y. Neural Machine Translation by Jointly Learning to Align and Translate. arXiv 2016, arXiv:1409.0473. [Google Scholar]
- Raffel, C.; Shazeer, N.; Roberts, A.; Lee, K.; Narang, S.; Matena, M.; Zhou, Y.; Li, W.; Liu, P.J. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. J. Mach. Learn. Res. 2020, 21, 1–67. [Google Scholar]
- Xue, L.; Constant, N.; Roberts, A.; Kale, M.; Al-Rfou, R.; Siddhant, A.; Barua, A.; Raffel, C. MT5: A Massively Multilingual Pre-Trained Text-to-Text Transformer. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Online, 8 June 2021; Association for Computational Linguistics: Stroudsburg, PA, USA; pp. 483–498. [Google Scholar]
- Lewis, M.; Liu, Y.; Goyal, N.; Ghazvininejad, M.; Mohamed, A.; Levy, O.; Stoyanov, V.; Zettlemoyer, L. BART: Denoising Sequence-to-Sequence Pre-Training for Natural Language Generation, Translation, and Comprehension. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 10 July 2020; Association for Computational Linguistics: Stroudsburg, PA, USA; pp. 7871–7880. [Google Scholar]
- Brown, T.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.D.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; et al. Language Models Are Few-Shot Learners. In Proceedings of the Advances in Neural Information Processing Systems, Online, 6–12 December 2020; Curran Associates, Inc.: Red Hook, NY, USA, 2020; Volume 33, pp. 1877–1901. [Google Scholar]
- Shleifer, S.; Rush, A.M. Pre-Trained Summarization Distillation. arXiv 2020, arXiv:2010.13002. [Google Scholar]
- Hermann, K.M.; Kocisky, T.; Grefenstette, E.; Espeholt, L.; Kay, W.; Suleyman, M.; Blunsom, P. Teaching Machines to Read and Comprehend. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015; Curran Associates, Inc.: Red Hook, NY, USA, 2015; Volume 28. [Google Scholar]
- Gliwa, B.; Mochol, I.; Biesek, M.; Wawer, A. SAMSum Corpus: A Human-Annotated Dialogue Dataset for Abstractive Summarization. In Proceedings of the 2nd Workshop on New Frontiers in Summarization, Hong Kong, China, 4 November 2019; Association for Computational Linguistics: Stroudsburg, PA, USA; pp. 70–79. [Google Scholar]
- Kim, B.; Kim, H.; Kim, G. Abstractive Summarization of Reddit Posts with Multi-Level Memory Networks. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, MN, USA, 2–7 June 2019; Association for Computational Linguistics: Stroudsburg, PA, USA; pp. 2519–2531. [Google Scholar]
- Kornilova, A.; Eidelman, V. BillSum: A Corpus for Automatic Summarization of US Legislation. In Proceedings of the 2nd Workshop on New Frontiers in Summarization, Hong Kong, China, 4 November 2019; Association for Computational Linguistics: Stroudsburg, PA, USA; pp. 48–56. [Google Scholar]
- Hasan, T.; Bhattacharjee, A.; Islam, M.d.S.; Mubasshir, K.; Li, Y.F.; Kang, Y.B.; Rahman, M.S.; Shahriyar, R. XL-Sum: Large-Scale Multilingual Abstractive Summarization for 44 Languages. In Proceedings of the Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, Online, 1–6 August 2021; Association for Computational Linguistics: Stroudsburg, PA, USA; pp. 4693–4703. [Google Scholar]
- Koh, H.Y.; Ju, J.; Liu, M.; Pan, S. An Empirical Survey on Long Document Summarization: Datasets, Models, and Metrics. ACM Comput. Surv. 2022, 55, 1–35. [Google Scholar] [CrossRef]
- Post, M. A Call for Clarity in Reporting BLEU Scores. In Proceedings of the Third Conference on Machine Translation: Research Papers, Brussels, Belgium, 31 October–1 November 2018; Association for Computational Linguistics: Stroudsburg, PA, USA; pp. 186–191. [Google Scholar]
- Nathan, P. PyTextRank, a Python Implementation of TextRank for Phrase Extraction and Summarization of Text Documents. DerwenAI/Pytextrank: v3.1.1 release on PyPi | Zenodo. 2016. Available online: https://zenodo.org/record/4637885 (accessed on 19 June 2023).
- Wolf, T.; Debut, L.; Sanh, V.; Chaumond, J.; Delangue, C.; Moi, A.; Cistac, P.; Rault, T.; Louf, R.; Funtowicz, M.; et al. Transformers: State-of-the-Art Natural Language Processing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Online, 5 October 2020; Association for Computational Linguistics: Stroudsburg, PA, USA; pp. 38–45. [Google Scholar]
- Fabbri, A.; Li, I.; She, T.; Li, S.; Radev, D. Multi-News: A Large-Scale Multi-Document Summarization Dataset and Abstractive Hierarchical Model. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 28 July–2 August 2019; Association for Computational Linguistics: Stroudsburg, PA, USA; pp. 1074–1084. [Google Scholar]
- Zhang, T.; Kishore, V.; Wu, F.; Weinberger, K.Q.; Artzi, Y. BERTScore: Evaluating Text Generation with BERT. arXiv 2020, arXiv:1904.09675. [Google Scholar]
- Sellam, T.; Das, D.; Parikh, A. BLEURT: Learning Robust Metrics for Text Generation. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; Association for Computational Linguistics: Stroudsburg, PA, USA; pp. 7881–7892. [Google Scholar]
Dataset | Size | Mean #Words - Mean #Sentences (Text/Summary) | Link |
---|---|---|---|
CNN/Daily Mail | 312 k News Articles | 766/53 words— 29.74/3.72 sentences | https://github.com/abisee/cnn-dailymail (accessed on 19 June 2023) https://www.kaggle.com/datasets/gowrishankarp/newspaper-text-summarization-cnn-dailymail (accessed on 19 June 2023) |
XSum | 227 k News Articles | 431.07/23.26 words— 19.77/1.00 sentence | https://github.com/EdinburghNLP/XSum (accessed on 19 June 2023) |
XLSum (English) | 330 k News Articles | 460.42/22.07 words— 23.54/1.11 sentences | https://github.com/csebuetnlp/xl-sum/tree/master/seq2seq (accessed on 19 June 2023) |
SAMSum | 16 k Chat Dialogues | 93.77/20.3 words— 19.26/3.07 sentences | https://arxiv.org/src/1911.12237v2/anc/corpus.7z (accessed on 19 June 2023) |
Reddit TIFU (TL;DR) | 123 k Reddit Posts (42 k Posts) | 444/23 words— 22/1.4 sentences | https://github.com/ctr4si/MMN (accessed on 19 June 2023) |
BillSum (US) | 22 k Legislation Bills | 1686/243 words— 42/7.1 sentences | https://github.com/FiscalNote/BillSum (accessed on 19 June 2023) |
Model | R1 | R2 | RL | RLSum | B1 | B2 | SB | |
---|---|---|---|---|---|---|---|---|
Extractive | PositionRank | 0.3341 | 0.1277 | 0.2130 | 0.2819 | 0.2507 | 0.1432 | 0.0805 |
LexRank | 0.3245 | 0.1146 | 0.2013 | 0.2665 | 0.2334 | 0.1252 | 0.0693 | |
e-LexRank | 0.3211 | 0.1171 | 0.1983 | 0.2679 | 0.2323 | 0.1276 | 0.0715 | |
TextRank (pyTextRank) | 0.3126 | 0.1118 | 0.1940 | 0.2615 | 0.2310 | 0.1255 | 0.0702 | |
TextRank (sumy) | 0.2827 | 0.0956 | 0.1782 | 0.2339 | 0.1895 | 0.0994 | 0.0544 | |
Luhn | 0.3091 | 0.1160 | 0.1981 | 0.2584 | 0.2149 | 0.1209 | 0.0684 | |
TopicRank | 0.2987 | 0.1170 | 0.2022 | 0.2712 | 0.2383 | 0.1311 | 0.0728 | |
LSA | 0.2933 | 0.0909 | 0.1816 | 0.2399 | 0.2161 | 0.1046 | 0.0593 | |
Abstractive | distilBART-CNN-12-6 | 0.4292 | 0.2077 | 0.2971 | 0.3646 | 0.3625 | 0.2384 | 0.1420 |
BART-large-CNN | 0.4270 | 0.2058 | 0.2977 | 0.3647 | 0.3156 | 0.2052 | 0.1433 | |
PEGASUS-CNN_dailymail | 0.4186 | 0.2025 | 0.2983 | 0.3607 | 0.3259 | 0.2124 | 0.1312 | |
BART-large-CNN-SAMSum | 0.4203 | 0.1945 | 0.2904 | 0.3560 | 0.3081 | 0.1963 | 0.1303 | |
PEGASUS-large | 0.3300 | 0.1330 | 0.2294 | 0.2803 | 0.2341 | 0.1367 | 0.0807 | |
PEGASUS-multi_news | 0.2794 | 0.1042 | 0.1714 | 0.2233 | 0.1624 | 0.0890 | 0.0449 | |
BART-large-XSum | 0.2413 | 0.0737 | 0.1635 | 0.2099 | 0.0834 | 0.0385 | 0.0242 | |
distilBART-XSum-12-6 | 0.2197 | 0.0624 | 0.1518 | 0.1919 | 0.0630 | 0.0273 | 0.0177 | |
mT5-multilingual-XLSum | 0.2017 | 0.0545 | 0.1444 | 0.1789 | 0.0521 | 0.0218 | 0.1499 | |
PEGASUS-XSum | 0.1998 | 0.0672 | 0.1398 | 0.1749 | 0.0749 | 0.0381 | 0.0242 |
Model | R1 | R2 | RL | RLSum | B1 | B2 | SB | |
---|---|---|---|---|---|---|---|---|
Extractive | TopicRank | 0.1828 | 0.0258 | 0.1313 | 0.1313 | 0.1246 | 0.0266 | 0.0266 |
TextRank (sumy) | 0.1738 | 0.0248 | 0.1266 | 0.1266 | 0.1196 | 0.0271 | 0.0227 | |
TextRank (pyTextRank) | 0.1646 | 0.0219 | 0.1197 | 0.1197 | 0.1125 | 0.0227 | 0.0248 | |
Luhn | 0.1734 | 0.0246 | 0.1254 | 0.1254 | 0.1193 | 0.0265 | 0.0244 | |
e-LexRank | 0.1725 | 0.0243 | 0.1256 | 0.1256 | 0.1147 | 0.0239 | 0.0259 | |
LexRank | 0.1696 | 0.0226 | 0.1237 | 0.1237 | 0.1149 | 0.0238 | 0.0257 | |
PositionRank | 0.1618 | 0.0184 | 0.1188 | 0.1188 | 0.1121 | 0.0200 | 0.0247 | |
LSA | 0.1518 | 0.0165 | 0.1101 | 0.1101 | 0.1047 | 0.0165 | 0.0250 | |
Abstractive | PEGASUS-XSum | 0.4573 | 0.2405 | 0.3840 | 0.3840 | 0.3484 | 0.2374 | 0.1602 |
BART-large-XSum | 0.4417 | 0.2194 | 0.3649 | 0.3649 | 0.3445 | 0.2250 | 0.1470 | |
distilBART-XSum-12-6 | 0.4397 | 0.2197 | 0.3665 | 0.3665 | 0.3350 | 0.2204 | 0.1449 | |
mT5-multilingual-XLSum | 0.3520 | 0.1386 | 0.2845 | 0.2845 | 0.2536 | 0.1394 | 0.0889 | |
BART-large-CNN-SAMSum | 0.2084 | 0.0435 | 0.1399 | 0.1399 | 0.1251 | 0.0429 | 0.0231 | |
distilBART-CNN-12-6 | 0.1985 | 0.0357 | 0.1307 | 0.1307 | 0.1122 | 0.0348 | 0.0199 | |
PEGASUS-CNN_dailymail | 0.1979 | 0.0356 | 0.1339 | 0.1339 | 0.1247 | 0.0368 | 0.0209 | |
BART-large-CNN | 0.1972 | 0.0339 | 0.1303 | 0.1303 | 0.1167 | 0.0340 | 0.0197 | |
PEGASUS-large | 0.1654 | 0.0266 | 0.1146 | 0.1146 | 0.0988 | 0.0252 | 0.0190 | |
PEGASUS-multi_news | 0.1578 | 0.0491 | 0.1114 | 0.1114 | 0.0812 | 0.0387 | 0.0173 |
Type | Model | R1 | R2 | RL | RLSum | B1 | B2 | SB
---|---|---|---|---|---|---|---|---
Extractive | TopicRank | 0.1900 | 0.0279 | 0.1354 | 0.1354 | 0.1291 | 0.0284 | 0.0271
Extractive | e-LexRank | 0.1841 | 0.0277 | 0.1325 | 0.1325 | 0.1224 | 0.0269 | 0.0267
Extractive | Luhn | 0.1812 | 0.0259 | 0.1293 | 0.1293 | 0.1225 | 0.0275 | 0.0240
Extractive | TextRank (sumy) | 0.1802 | 0.0260 | 0.1304 | 0.1304 | 0.1222 | 0.0278 | 0.0232
Extractive | TextRank (pyTextRank) | 0.1690 | 0.0230 | 0.1224 | 0.1224 | 0.1152 | 0.0229 | 0.0249
Extractive | LexRank | 0.1778 | 0.0250 | 0.1290 | 0.1290 | 0.1207 | 0.0258 | 0.0265
Extractive | PositionRank | 0.1627 | 0.0194 | 0.1188 | 0.1188 | 0.1118 | 0.0204 | 0.0246
Extractive | LSA | 0.1526 | 0.0169 | 0.1101 | 0.1101 | 0.1057 | 0.0165 | 0.0249
Abstractive | PEGASUS-XSum | 0.4343 | 0.2189 | 0.3617 | 0.3617 | 0.3269 | 0.2148 | 0.1452
Abstractive | distilBART-XSum-12-6 | 0.4237 | 0.2048 | 0.3500 | 0.3500 | 0.3208 | 0.2041 | 0.1362
Abstractive | BART-large-XSum | 0.4223 | 0.2009 | 0.3450 | 0.3450 | 0.3260 | 0.2055 | 0.1347
Abstractive | mT5-multilingual-XLSum | 0.3623 | 0.1491 | 0.2927 | 0.2927 | 0.2586 | 0.1462 | 0.0946
Abstractive | BART-large-CNN-SAMSum | 0.2097 | 0.0441 | 0.1406 | 0.1406 | 0.1239 | 0.0417 | 0.0227
Abstractive | BART-large-CNN | 0.2011 | 0.0348 | 0.1322 | 0.1322 | 0.1170 | 0.0333 | 0.0199
Abstractive | distilBART-CNN-12-6 | 0.1999 | 0.0355 | 0.1311 | 0.1311 | 0.1112 | 0.0335 | 0.0196
Abstractive | PEGASUS-CNN_dailymail | 0.1966 | 0.0339 | 0.1318 | 0.1318 | 0.1215 | 0.0342 | 0.0197
Abstractive | PEGASUS-large | 0.1694 | 0.0288 | 0.1158 | 0.1158 | 0.0970 | 0.0325 | 0.0177
Abstractive | PEGASUS-multi_news | 0.1443 | 0.0422 | 0.1014 | 0.1014 | 0.0717 | 0.0261 | 0.0144
Type | Model | R1 | R2 | RL | RLSum | B1 | B2 | SB
---|---|---|---|---|---|---|---|---
Extractive | TopicRank | 0.1770 | 0.0284 | 0.1290 | 0.1290 | 0.1195 | 0.0295 | 0.0270
Extractive | Luhn | 0.1724 | 0.0284 | 0.1233 | 0.1233 | 0.1190 | 0.0311 | 0.0255
Extractive | e-LexRank | 0.1703 | 0.0265 | 0.1264 | 0.1264 | 0.1111 | 0.0260 | 0.0268
Extractive | TextRank (sumy) | 0.1689 | 0.0263 | 0.1215 | 0.1215 | 0.1164 | 0.0290 | 0.0239
Extractive | TextRank (pyTextRank) | 0.1534 | 0.0228 | 0.1145 | 0.1145 | 0.1036 | 0.0237 | 0.0246
Extractive | LexRank | 0.1673 | 0.0250 | 0.1223 | 0.1223 | 0.1131 | 0.0261 | 0.0262
Extractive | LSA | 0.1474 | 0.0179 | 0.1095 | 0.1095 | 0.0994 | 0.0172 | 0.0248
Extractive | PositionRank | 0.1304 | 0.0174 | 0.0990 | 0.0990 | 0.0864 | 0.0169 | 0.0229
Abstractive | BART-large-CNN-SAMSum | 0.1834 | 0.0421 | 0.1305 | 0.1305 | 0.1065 | 0.0386 | 0.0228
Abstractive | BART-large-XSum | 0.1676 | 0.0329 | 0.1274 | 0.1274 | 0.1088 | 0.0300 | 0.0315
Abstractive | distilBART-XSum-12-6 | 0.1697 | 0.0314 | 0.1300 | 0.1300 | 0.1059 | 0.0283 | 0.0313
Abstractive | distilBART-CNN-12-6 | 0.1657 | 0.0340 | 0.1143 | 0.1143 | 0.0924 | 0.0315 | 0.0184
Abstractive | PEGASUS-large | 0.1617 | 0.0295 | 0.1133 | 0.1133 | 0.0993 | 0.0301 | 0.0203
Abstractive | PEGASUS-CNN_dailymail | 0.1596 | 0.0326 | 0.1118 | 0.1118 | 0.0956 | 0.0316 | 0.0169
Abstractive | BART-large-CNN | 0.1570 | 0.0328 | 0.1088 | 0.1088 | 0.0883 | 0.0304 | 0.0167
Abstractive | PEGASUS-XSum | 0.1428 | 0.0228 | 0.1135 | 0.1135 | 0.0779 | 0.0167 | 0.0260
Abstractive | PEGASUS-multi_news | 0.1063 | 0.0243 | 0.0743 | 0.0743 | 0.0524 | 0.0197 | 0.0095
Type | Model | R1 | R2 | RL | RLSum | B1 | B2 | SB
---|---|---|---|---|---|---|---|---
Extractive | e-LexRank | 0.2762 | 0.0769 | 0.2059 | 0.2059 | 0.1054 | 0.0525 | 0.0393
Extractive | LexRank | 0.1654 | 0.0310 | 0.1136 | 0.1136 | 0.0979 | 0.0309 | 0.0183
Extractive | TopicRank | 0.1613 | 0.0313 | 0.1112 | 0.1112 | 0.0958 | 0.0316 | 0.0180
Extractive | LSA | 0.1622 | 0.0253 | 0.1101 | 0.1101 | 0.0987 | 0.0262 | 0.0173
Extractive | TextRank (pyTextRank) | 0.1570 | 0.0287 | 0.1063 | 0.1063 | 0.0928 | 0.0282 | 0.0172
Extractive | TextRank (sumy) | 0.1471 | 0.0296 | 0.1027 | 0.1027 | 0.0823 | 0.0282 | 0.0154
Extractive | PositionRank | 0.1557 | 0.0272 | 0.1073 | 0.1073 | 0.0949 | 0.0270 | 0.0178
Extractive | Luhn | 0.1508 | 0.0306 | 0.1049 | 0.1049 | 0.0850 | 0.0292 | 0.0161
Abstractive | BART-large-CNN-SAMSum | 0.4004 | 0.2007 | 0.3112 | 0.3112 | 0.2603 | 0.1742 | 0.1165
Abstractive | BART-large-CNN | 0.3041 | 0.1009 | 0.2272 | 0.2272 | 0.1487 | 0.0793 | 0.0508
Abstractive | PEGASUS-CNN_dailymail | 0.2866 | 0.0833 | 0.2220 | 0.2220 | 0.1433 | 0.0697 | 0.0447
Abstractive | distilBART-CNN-12-6 | 0.2884 | 0.0937 | 0.2149 | 0.2149 | 0.1350 | 0.0697 | 0.0466
Abstractive | BART-large-XSum | 0.2572 | 0.0514 | 0.1878 | 0.1878 | 0.1542 | 0.0460 | 0.0431
Abstractive | PEGASUS-large | 0.2589 | 0.0628 | 0.2021 | 0.2021 | 0.1008 | 0.0468 | 0.0429
Abstractive | distilBART-XSum-12-6 | 0.2121 | 0.0328 | 0.1561 | 0.1561 | 0.1276 | 0.0275 | 0.0361
Abstractive | mT5-multilingual-XLSum | 0.1787 | 0.0235 | 0.1359 | 0.1359 | 0.1082 | 0.0190 | 0.0313
Abstractive | PEGASUS-XSum | 0.1428 | 0.0228 | 0.1135 | 0.1135 | 0.0779 | 0.0167 | 0.0260
Abstractive | PEGASUS-multi_news | 0.1127 | 0.0200 | 0.0804 | 0.0804 | 0.0543 | 0.0149 | 0.0103
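The B1/B2 columns report BLEU-1/BLEU-2 scores, which combine clipped n-gram precision with a brevity penalty that punishes candidates shorter than the reference. The sketch below is an illustrative sentence-level variant only; the reported numbers presumably come from a library (e.g., NLTK or SacreBLEU for the SB column), whose tokenization and smoothing differ:

```python
import math
from collections import Counter


def bleu_n(reference: str, candidate: str, max_n: int = 2) -> float:
    """Sentence-level BLEU: clipped n-gram precision plus brevity penalty.

    Unsmoothed sketch with whitespace tokenization.
    """
    ref_tokens = reference.lower().split()
    cand_tokens = candidate.lower().split()

    def ngram_counts(tokens, n):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    log_precisions = []
    for n in range(1, max_n + 1):
        cand_ngrams = ngram_counts(cand_tokens, n)
        ref_ngrams = ngram_counts(ref_tokens, n)
        # Clip each n-gram's count by its count in the reference.
        clipped = sum((cand_ngrams & ref_ngrams).values())
        total = sum(cand_ngrams.values())
        if total == 0 or clipped == 0:
            return 0.0  # no smoothing in this sketch
        log_precisions.append(math.log(clipped / total))

    # Brevity penalty: 1 if the candidate is longer than the reference,
    # otherwise exp(1 - |ref| / |cand|) < 1.
    if len(cand_tokens) > len(ref_tokens):
        bp = 1.0
    else:
        bp = math.exp(1 - len(ref_tokens) / len(cand_tokens))
    return bp * math.exp(sum(log_precisions) / max_n)
```

The brevity penalty explains a pattern visible in the tables: models fine-tuned on XSum, which produce very short, single-sentence summaries, tend to score poorly on BLEU against datasets with long references even when their n-gram precision is high.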
Type | Model | R1 | R2 | RL | RLSum | B1 | B2 | SB
---|---|---|---|---|---|---|---|---
Extractive | LexRank | 0.3845 | 0.1888 | 0.2460 | 0.2460 | 0.2663 | 0.1815 | 0.1140
Extractive | TextRank (pyTextRank) | 0.3638 | 0.1735 | 0.2168 | 0.2168 | 0.2477 | 0.1651 | 0.1029
Extractive | TextRank (sumy) | 0.3503 | 0.1799 | 0.2320 | 0.2320 | 0.2437 | 0.1629 | 0.1033
Extractive | PositionRank | 0.3601 | 0.1686 | 0.2153 | 0.2153 | 0.2436 | 0.1606 | 0.0996
Extractive | e-LexRank | 0.3595 | 0.1750 | 0.2205 | 0.2205 | 0.2397 | 0.1620 | 0.1032
Extractive | TopicRank | 0.3568 | 0.1763 | 0.2174 | 0.2174 | 0.2334 | 0.1662 | 0.1036
Extractive | Luhn | 0.3521 | 0.1812 | 0.2342 | 0.2342 | 0.2349 | 0.1640 | 0.1041
Extractive | LSA | 0.3480 | 0.1406 | 0.2115 | 0.2115 | 0.2288 | 0.1380 | 0.0846
Abstractive | PEGASUS-large | 0.3568 | 0.1480 | 0.2268 | 0.2268 | 0.2312 | 0.1400 | 0.0901
Abstractive | BART-large-CNN-SAMSum | 0.3186 | 0.1571 | 0.2301 | 0.2301 | 0.1333 | 0.0881 | 0.0562
Abstractive | distilBART-CNN-12-6 | 0.3025 | 0.1465 | 0.2161 | 0.2161 | 0.1303 | 0.0836 | 0.0528
Abstractive | PEGASUS-CNN_dailymail | 0.2962 | 0.1405 | 0.2116 | 0.2116 | 0.1311 | 0.0832 | 0.0543
Abstractive | BART-large-CNN | 0.2954 | 0.1433 | 0.2146 | 0.2146 | 0.1168 | 0.0746 | 0.0495
Abstractive | PEGASUS-multi_news | 0.2801 | 0.0721 | 0.1661 | 0.1661 | 0.1909 | 0.0864 | 0.0406
Abstractive | mT5-multilingual-XLSum | 0.1486 | 0.0584 | 0.1155 | 0.1155 | 0.0232 | 0.0132 | 0.0081
Abstractive | PEGASUS-XSum | 0.1440 | 0.0742 | 0.1171 | 0.1171 | 0.0264 | 0.0152 | 0.0129
Abstractive | BART-large-XSum | 0.1327 | 0.0641 | 0.0988 | 0.0988 | 0.0159 | 0.0093 | 0.0070
Abstractive | distilBART-XSum-12-6 | 0.1237 | 0.0540 | 0.0960 | 0.0960 | 0.0157 | 0.0091 | 0.0065
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Giarelis, N.; Mastrokostas, C.; Karacapilidis, N. Abstractive vs. Extractive Summarization: An Experimental Review. Appl. Sci. 2023, 13, 7620. https://doi.org/10.3390/app13137620