# A Highly Scalable Method for Extractive Text Summarization Using Convex Optimization


## Abstract


## 1. Introduction

#### 1.1. Overview

1. Generating short summaries for a large number of newspaper articles; in our experiments, the proposed method outperforms other methods of similar complexity and is faster (see Sections 7.2.1 and 7.2.2);
2. Generating query-based summaries for collections of documents; in this case the method is close to or above the best official scores reported on the dataset (see Section 7.3).

#### 1.2. The Main Contributions and Paper Organization

1. The formulation of extractive summarization as a combinatorial optimization program (Section 3);
2. In order to tackle the computational issue, we introduce a convex relaxation of the program, inspired by the compressed sensing literature [4,7], in the sense that we try to find a sparse vector by convex optimization, with the sparsity constraint replaced by a bound on the ${l}_{1}$ norm. The technique is general, but the quality of the approximation depends on the properties of the input data. The experiments that we have performed show that the quality of the solution is satisfactory for our purpose (Section 7). We also justify the results under some plausible assumptions about the data (Section 6);
3. A comparison of the proposed method with the one based on sparse coding, which is also based on convex programming and shares some similarities with the method presented in the current paper (Section 5);
4. An empirical evaluation of the method on two tasks: simple extractive summarization and query-based summarization. The evaluation is based on accuracy and execution time (Section 7).

## 2. Related Work

One important class of approaches formulates summarization as a **submodular function maximization** problem [9,10]. This approach has proved quite useful from both practical and theoretical standpoints. It is similar to ours in that it tackles summarization from an optimization perspective.

Another related line of work uses **convex optimization** and the ${l}_{1}$ norm for text processing. Our goal is different in both purpose and method. In [11], the authors extract the relevant parts of a text corpus, where relevance is defined with respect to some keywords. To this end, they cast the task as a feature selection problem and take advantage of the properties of LASSO regression and ${l}_{1}$-penalized logistic regression [13]. Our work, on the other hand, deals with the standard extractive summarization problem, using an approach that formulates a constrained convex program more directly.

A further class of methods relies on **meta-heuristics**. For example, in [14], an algorithm based on differential evolution is proposed, while in [15], the authors apply fuzzy evolutionary optimization modeling. Another method from this category, used with success for extractive summarization, is the memetic algorithm [16]. A meta-heuristic designed for speed is the micro-genetic algorithm [17]. It is faster than the standard genetic algorithm, mainly due to its smaller population size, and was employed for text summarization in [18]. While most of the algorithms in this category do not have formal convergence guarantees, simple stopping criteria, like the number of generations or objective function evaluations, work very well in practice [16,18,19]. These approaches often perform very well in terms of both execution time and solution quality.

The method proposed in [20] is based on **sparse coding**. It uses a formalization similar to the one proposed in the current paper. The main difference lies in the structure of the objective function and in the optimization algorithm (the core optimization algorithm used in [20] is coordinate descent). We discuss it in some detail after the current method is presented (see Section 5).

TextRank [21] is a **graph-based** method, in fact a variation of the PageRank algorithm used in web search engines. Besides its excellent performance, it has the advantages of being computationally efficient [3], easy to apply, and scalable; thus, it is a good baseline. We implemented TextRank using the same technology as our algorithm; therefore, we can make meaningful comparisons regarding execution time and scalability. For a description of the algorithm, see Appendix B.
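The graph-based baseline can be sketched as a damped PageRank power iteration over a sentence-similarity matrix. This is a generic illustration, not the authors' implementation (which is described in Appendix B); the toy similarity matrix and parameter values below are assumptions.

```python
import numpy as np

def textrank_scores(sim, d=0.85, tol=1e-8, max_iter=200):
    """Minimal TextRank: PageRank power iteration over a sentence-similarity matrix."""
    n = sim.shape[0]
    W = sim.astype(float).copy()
    np.fill_diagonal(W, 0.0)              # no self-links
    col_sums = W.sum(axis=0)
    col_sums[col_sums == 0] = 1.0         # guard isolated sentences against division by zero
    P = W / col_sums                      # column-stochastic transition matrix
    r = np.full(n, 1.0 / n)               # uniform initial scores
    for _ in range(max_iter):
        r_new = (1 - d) / n + d * P @ r   # damped PageRank update
        if np.abs(r_new - r).sum() < tol:
            break
        r = r_new
    return r

# Toy symmetric similarity matrix for four sentences (values in [0, 1]).
sim = np.array([[1.0, 0.6, 0.1, 0.0],
                [0.6, 1.0, 0.2, 0.1],
                [0.1, 0.2, 1.0, 0.5],
                [0.0, 0.1, 0.5, 1.0]])
scores = textrank_scores(sim)
top2 = np.argsort(scores)[::-1][:2]       # indices of the two highest-ranked sentences
```

The scores form a probability distribution over the sentences; the summary keeps the top-ranked ones.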

**deep learning**summarization methods against which we compare our method. Inspired by the neural language models used for automatic translation, in [22], an attention based method has been developed. The system is abstractive and gives competitive results on several tasks. An important class of neural networks applied to summarization are those based on sequence-to-sequence architectures [23,24]. These architectures usually consist of one or more recurrent neural networks (e.g., LSTMs—long short-term memory) that map an input sequence to an output sequence (e.g., the text to the summary). Despite some inherent problems, like the tendency to repeat, they can provide good results. The work [25] is also based on recurrent neural networks (specifically LSTM) but takes a more modular approach: first, the most important sentences are extracted, then they are compressed by removing some of the words. The sequence pointer-generator network, described in [26], is a hybrid between a simple sequence-to-sequence network and a pointer network. Another architecture applied for summarization is the multi-layered attentional peephole convolutional LSTM, a modification of the standard LSTM [27]. The network used in [28] makes use of a complex language model (“transformer language model”). Despite using a powerful and resources intensive approach, the method is not particularly good. A somehow different approach is that based on reinforcement learning [29]. The main advantage of this solution is that it does not require large amounts of training data. The methods based on deep learning often provide very good results but are complex and require large amounts of computational resources and usually also training data.

## 3. Problem Formalization

#### 3.1. Preprocessing

- Tokenization;
- Stop words removal;
- Conversion to lowercase;
- Stemming (with the Porter stemmer).
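A minimal sketch of these four steps in Python, the language of the authors' implementation; the stop-word list and the suffix-stripping stemmer below are toy stand-ins for NLTK's resources and the Porter stemmer [39].

```python
import re

STOP_WORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "are", "for", "on"}  # toy list

def light_stem(token):
    """Toy suffix stripper standing in for the Porter stemmer used in the paper."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def preprocess(sentence):
    """The four steps of Section 3.1: tokenize, lowercase, drop stop words, stem."""
    tokens = re.findall(r"[A-Za-z]+", sentence)          # tokenization
    tokens = [t.lower() for t in tokens]                 # conversion to lowercase
    tokens = [t for t in tokens if t not in STOP_WORDS]  # stop-word removal
    return [light_stem(t) for t in tokens]               # stemming

print(preprocess("The cats are sleeping in the garden"))  # → ['cat', 'sleep', 'garden']
```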

#### 3.2. Processing

The selection of the sentences is performed in the **processing** step, by solving the optimization program P2 (or P3) and keeping the $k$ largest entries of the solution.

#### 3.3. Postprocessing

#### 3.4. Extending the Method

## 4. A Projected Gradient Descent Algorithm

1. the convergence tolerance $\epsilon$;
2. the maximum number of iterations $maxNoIter$;
3. the ${l}_{1}$ enforcement parameter $C$.

#### 4.1. Considerations about the Running Time

**Algorithm 1.** Projected gradient descent (PGD) algorithm for P2.
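As an illustration of the PGD scheme, the sketch below alternates a gradient step on $\Vert \mathbf{w}-\mathbf{M}\mathbf{x}\Vert_{2}^{2}$ with a sort-based projection onto the nonnegative ${l}_{1}$-ball of radius $C$, in the spirit of [32]. This is a generic reconstruction under stated assumptions, not the authors' exact Algorithm 1 (which also handles box constraints); the step size and tolerances are assumptions.

```python
import numpy as np

def project_l1_nonneg(v, C):
    """Project v onto {x >= 0, sum(x) <= C} via the sort-based simplex projection."""
    x = np.clip(v, 0.0, None)
    if x.sum() <= C:
        return x
    u = np.sort(x)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, len(u) + 1)
    rho = np.nonzero(u - (css - C) / k > 0)[0][-1]   # largest index with positive threshold gap
    theta = (css[rho] - C) / (rho + 1.0)
    return np.clip(x - theta, 0.0, None)

def pgd(M, w, C, eps=1e-7, max_iter=1000):
    """Projected gradient descent for min ||w - M x||_2^2 s.t. x >= 0, ||x||_1 <= C."""
    n = M.shape[1]
    step = 1.0 / (2 * np.linalg.norm(M, 2) ** 2)     # 1/L, L = Lipschitz constant of the gradient
    x = np.zeros(n)
    for _ in range(max_iter):
        grad = 2 * M.T @ (M @ x - w)
        x_new = project_l1_nonneg(x - step * grad, C)
        if np.linalg.norm(x_new - x) < eps:          # convergence test
            break
        x = x_new
    return x

# With M the identity and C = 1, PGD converges to the projection of w onto the unit simplex.
M = np.eye(3)
w = np.array([3.0, 1.0, 0.0])
x = pgd(M, w, C=1.0)                                 # → approximately [1, 0, 0]
```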

## 5. The Relation with Sparse Coding

## 6. Properties of the Optimization Problems

#### 6.1. Solving P1 Is NP-Hard

**Problem 1.**

#### 6.2. The Program P2 Is Convex

#### 6.3. The Optimal Value of P2 Is a Lower Bound for the Optimal Value of P1

#### 6.4. Under Some Conditions the Solution of P2 Is Close to a k-Sparse Vector

**Proof.**

1. Bounding $\Vert {\mathbf{h}}_{\parallel}\Vert_{2}$.

**Claim 1** (Lemma 10.6.5 in [7])**.** $\Vert {\mathbf{M}}^{T}\mathbf{h}\Vert_{2}^{2}\le 2{\mathbf{w}}^{T}{\mathbf{M}}^{T}\mathbf{h}$.

From **Claim 1** we have:

2. Bounding $\Vert {\mathbf{h}}_{\perp}\Vert_{2}$.

3. Bounding $\mathbf{h}$.

1. The bound is deterministic, not “with high probability”;
2. The bound is in a sense weaker, because we do not guarantee perfect reconstruction in the noise-free case ($\Vert \mathbf{w}\Vert =0$). The bound can in general be quite loose, but if our assumptions and intuitions hold, this is not the case.

## 7. Implementation and Results

#### 7.1. Quality of the Approximation and Execution Time

1. What is the execution time of our method for different problem sizes?
2. How good is the approximate solution provided by Algorithm 1?

#### 7.1.1. Quality of the Approximation

#### Quality of the Approximation as a Function of the Number of Sentences

#### Quality of the Approximation as a Function of the Number of Words

#### Quality of the Approximation for Real Data

#### 7.1.2. Execution Time

#### Execution Time as a Function of the Number of Sentences

#### Execution Time as a Function of the Number of Words

#### Execution Time for Different Document Lengths—Experiments with Real Data

#### 7.2. Single Document Summarization

#### 7.2.1. Experiments with a Medium Size Dataset

**Example 1.**

**Example 2.**

For the vector **b**, the first three values are set to 1 and the last one is set to 0.5. This choice reflects the empirical observation that these parts of the text tend to be more important. With these modifications, a significant improvement is noticeable for all metrics, CS outperforming TextRank (e.g., the F-score increases by more than 0.01). Most likely this can be further improved using more carefully chosen values for $\lambda$ and **b**.

#### 7.2.2. Experiments with a Large Dataset (CORNELL NEWSROOM)

#### 7.3. Query-Based Multi-Document Summarization

## 8. Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

## Appendix A. An Example of Text Representations Computation
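As an illustration of the kind of computation this appendix walks through, the following sketch builds a term-by-sentence TF-IDF matrix with the classic $\mathrm{tf}\times \log(N/\mathrm{df})$ weighting [30]. The toy sentences are assumptions, and practical variants differ in smoothing and normalization.

```python
import math
from collections import Counter

def tfidf_matrix(sentences):
    """Term-by-sentence TF-IDF: entry (t, s) is tf(t, s) * log(N / df(t))."""
    docs = [s.split() for s in sentences]
    vocab = sorted({t for d in docs for t in d})
    n = len(docs)
    df = {t: sum(1 for d in docs if t in d) for t in vocab}   # document frequency
    rows = []
    for t in vocab:
        idf = math.log(n / df[t])                             # 0 for terms in every sentence
        rows.append([Counter(d)[t] / len(d) * idf for d in docs])
    return vocab, rows

# "cat" occurs in every sentence, so its IDF (hence its entire row) is zero.
vocab, tfidf = tfidf_matrix(["cat sat", "cat ran", "cat dog"])
```

Terms shared by all sentences carry no weight, which is exactly why TF-IDF highlights discriminative words.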

## Appendix B. TextRank

## Appendix C. Useful Convex Optimization and Signal Processing Notions

## Appendix D. Different Approaches to Text Summarization

**Table A1.** Summarization methods (UE: unsupervised extractive; SE: supervised extractive; UA: unsupervised abstractive; SA: supervised abstractive; UM: unsupervised mixed; SM: supervised mixed).

Method | Training | Grammatical | New Words | Compression | Scalability | Complexity
---|---|---|---|---|---|---
UE | No | Usually | No | Medium–Low | Medium–High | Low–Medium
SE | Yes | Usually | No | Medium–Low | Medium–High | Medium
UA | No | Sometimes | Yes | Medium–High | Low–Medium | Medium
SA | Yes | Sometimes | Yes | Medium–High | Low–Medium | Medium–High
UM | No | Often | Yes | Medium | Medium | Medium–High
SM | Yes | Often | Yes | Medium | Medium | High

## References

1. Popescu, M.C.; Grama, L.; Rusu, C. On the use of positive definite symmetric kernels for summary extraction. In Proceedings of the 2020 13th International Conference on Communications (COMM), Bucharest, Romania, 18–20 June 2020; pp. 335–340.
2. Nenkova, A.; McKeown, K. Automatic Summarization. Found. Trends Inf. Retr. 2011, 5, 103–233.
3. Popescu, C.; Grama, L.; Rusu, C. Automatic Text Summarization by Mean-absolute Constrained Convex Optimization. In Proceedings of the 41st International Conference on Telecommunications and Signal Processing, Athens, Greece, 4–6 July 2018; pp. 706–709.
4. Candes, E.J.; Wakin, M.B. An Introduction to Compressive Sampling. IEEE Signal Process. Mag. 2008, 25, 21–30.
5. Tibshirani, R. Regression Shrinkage and Selection via the Lasso. J. R. Stat. Soc. Ser. B 1996, 58, 267–288.
6. Uthus, D.C.; Aha, D.W. Multiparticipant chat analysis: A survey. Artif. Intell. 2013, 199–200, 106–121.
7. Vershynin, R. High-Dimensional Probability: An Introduction with Applications in Data Science; Cambridge University Press: Cambridge, UK, 2018.
8. Allahyari, M.; Pouriyeh, S.A.; Assefi, M.; Safaei, S.; Trippe, E.D.; Gutierrez, J.B.; Kochut, K. Text Summarization Techniques: A Brief Survey. arXiv 2017, arXiv:1707.02268.
9. Lin, H.; Bilmes, J.; Xie, S. Graph-based submodular selection for extractive summarization. In Proceedings of the 2009 IEEE Workshop on Automatic Speech Recognition & Understanding, Merano, Italy, 13–17 December 2009; pp. 381–386.
10. Lin, H.; Bilmes, J. Multi-document Summarization via Budgeted Maximization of Submodular Functions. In Proceedings of the Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics (HLT ’10), Los Angeles, CA, USA, 2–4 June 2010; pp. 912–920.
11. Jia, J.; Miratrix, L.; Yu, B.; Gawalt, B.; Ghaoui, L.E.; Barnesmoore, L.; Clavier, S. Concise comparative summaries (CCS) of large text corpora with a human experiment. arXiv 2014, arXiv:1404.7362.
12. Miratrix, L.; Jia, J.; Gawalt, B.; Yu, B.; Ghaoui, L.E. What Is in the News on a Subject: Automatic and Sparse Summarization of Large Document Corpora; UC Berkeley: Berkeley, CA, USA, 2011; pp. 1–36.
13. Hastie, T.; Tibshirani, R.; Friedman, J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed.; Springer: New York, NY, USA, 2011.
14. Aliguliyev, R.M. A new sentence similarity measure and sentence based extractive technique for automatic text summarization. Expert Syst. Appl. 2009, 36, 7764–7772.
15. Song, W.; Cheon Choi, L.; Cheol Park, S.; Feng Ding, X. Fuzzy Evolutionary Optimization Modeling and Its Applications to Unsupervised Categorization and Extractive Summarization. Expert Syst. Appl. 2011, 38, 9112–9121.
16. Mendoza, M.; Bonilla, S.; Noguera, C.; Cobos, C.; León, E. Extractive Single-Document Summarization Based on Genetic Operators and Guided Local Search. Expert Syst. Appl. 2014, 41, 4158–4169.
17. Krishnakumar, K. Micro-Genetic Algorithms for Stationary and Non-Stationary Function Optimization. In Proceedings of the 1989 Symposium on Visual Communications, Image Processing, and Intelligent Robotics Systems, Philadelphia, PA, USA, 1–3 November 1989.
18. Debnath, D.; Das, R.; Pakray, P. Extractive Single Document Summarization Using an Archive-Based Micro Genetic-2. In Proceedings of the 2020 7th International Conference on Soft Computing & Machine Intelligence (ISCMI), Stockholm, Sweden, 14–15 November 2020; pp. 244–248.
19. Saini, N.; Saha, S.; Chakraborty, D.; Bhattacharyya, P. Extractive single document summarization using binary differential evolution: Optimization of different sentence quality measures. PLoS ONE 2019, 14, e0223477.
20. Li, P.; Bing, L.; Lam, W.; Li, H.; Liao, Y. Reader-Aware Multi-Document Summarization via Sparse Coding. In Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, Buenos Aires, Argentina, 25–31 July 2015; pp. 1270–1276.
21. Mihalcea, R.; Tarau, P. TextRank: Bringing Order into Text. In Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, Barcelona, Spain, 25–26 July 2004; pp. 404–411.
22. Rush, A.M.; Chopra, S.; Weston, J. A Neural Attention Model for Abstractive Sentence Summarization. arXiv 2015, arXiv:1509.00685.
23. Shi, T.; Keneshloo, Y.; Ramakrishnan, N.; Reddy, C.K. Neural Abstractive Text Summarization with Sequence-to-Sequence Models. arXiv 2018, arXiv:1812.02303.
24. Shi, T.; Wang, P.; Reddy, C.K. LeafNATS: An Open-Source Toolkit and Live Demo System for Neural Abstractive Text Summarization. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Minneapolis, MN, USA, 2–7 June 2019; pp. 66–71.
25. Mendes, A.; Narayan, S.; Miranda, S.; Marinho, Z.; Martins, A.F.T.; Cohen, S.B. Jointly Extracting and Compressing Documents with Summary State Representations. arXiv 2019, arXiv:1904.02020.
26. See, A.; Liu, P.J.; Manning, C.D. Get To The Point: Summarization with Pointer-Generator Networks. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Vancouver, BC, Canada, 2017; pp. 1073–1083.
27. Rahman, M.M.; Siddiqui, F.H. An Optimized Abstractive Text Summarization Model Using Peephole Convolutional LSTM. Symmetry 2019, 11, 1290.
28. Subramanian, S.; Li, R.; Pilault, J.; Pal, C. On Extractive and Abstractive Neural Document Summarization with Transformer Language Models. arXiv 2019, arXiv:1909.03186.
29. Keneshloo, Y.; Ramakrishnan, N.; Reddy, C.K. Deep Transfer Reinforcement Learning for Text Summarization. arXiv 2018, arXiv:1810.06667.
30. Salton, G.; McGill, M.J. Introduction to Modern Information Retrieval; McGraw-Hill: New York, NY, USA, 1986.
31. Knight, K.; Marcu, D. Summarization beyond sentence extraction: A probabilistic approach to sentence compression. Artif. Intell. 2002, 139, 91–107.
32. Gupta, M.D.; Kumar, S.; Xiao, J. L1 Projections with Box Constraints. arXiv 2010, arXiv:1010.0141.
33. Gupta, M.D.; Xiao, J.; Kumar, S. L1 Projections with Box Constraints. U.S. Patent 8,407,171 B2, 26 March 2013. Available online: https://patents.google.com/patent/US20110191400A1/en (accessed on 10 March 2021).
34. Jones, E.; Oliphant, T.; Peterson, P. SciPy: Open Source Scientific Tools for Python. 2001. Available online: https://www.scipy.org/ (accessed on 12 February 2021).
35. Powell, M.J.D. A Direct Search Optimization Method That Models the Objective and Constraint Functions by Linear Interpolation. In Advances in Optimization and Numerical Analysis; Gomez, S., Hennart, J.P., Eds.; Springer: Dordrecht, The Netherlands, 1994; pp. 51–67.
36. Cormen, T.H.; Leiserson, C.E.; Rivest, R.L.; Stein, C. Introduction to Algorithms, 2nd ed.; The MIT Press: Cambridge, MA, USA, 2001.
37. Boyd, S.; Vandenberghe, L. Convex Optimization; Cambridge University Press: Cambridge, UK, 2004.
38. Candes, E.; Tao, T. Decoding by linear programming. IEEE Trans. Inf. Theory 2005, 51, 4203–4215.
39. Bird, S.; Klein, E.; Loper, E. Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit, 2nd ed.; O’Reilly Media: Sebastopol, CA, USA, 2009.
40. Oliphant, T.E. Guide to NumPy, 1st ed.; O’Reilly Media: Sebastopol, CA, USA, 2015.
41. Hunter, J.D. Matplotlib: A 2D graphics environment. Comput. Sci. Eng. 2007, 9, 90–95.
42. Lin, C.Y. ROUGE: A Package for Automatic Evaluation of Summaries. In Text Summarization Branches Out; Association for Computational Linguistics: Barcelona, Spain, 2004; pp. 74–81.
43. Vonteru, K. News Summary: Generating Short Length Descriptions of News Articles. 2019. Available online: https://www.kaggle.com/sunnysai12345/news-summary/data (accessed on 10 March 2021).
44. Tolstoy, L. War and Peace. eBook translated by Louise and Aylmer Maude. 2009. Available online: http://www.gutenberg.org/files/2600/2600-h/2600-h.htm#link2HCH0049 (accessed on 10 March 2021).
45. DUC 2002. Document Understanding Conference 2002. Available online: https://www-nlpir.nist.gov/projects/duc/data/2002_data.html (accessed on 8 February 2021).
46. Grusky, M.; Naaman, M.; Artzi, Y. Newsroom: A Dataset of 1.3 Million Summaries with Diverse Extractive Strategies. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), New Orleans, LA, USA, 2018; pp. 708–719.
47. Barrios, F.; López, F.; Argerich, L.; Wachenchauzer, R. Variations of the Similarity Function of TextRank for Automated Summarization. arXiv 2016, arXiv:1602.03606.
48. DUC 2005. Document Understanding Conference 2005. Available online: https://www-nlpir.nist.gov/projects/duc/data/2005_data.html (accessed on 10 March 2021).
49. Litvak, M.; Vanetik, N. Query-based summarization using MDL principle. In Proceedings of the MultiLing 2017 Workshop on Summarization and Summary Evaluation Across Source Types and Genres, Valencia, Spain, 3–4 April 2017; pp. 22–31.
50. Dang, H.T. Overview of DUC 2005. In Proceedings of the Document Understanding Conference Workshop 2005 (DUC 2005) at HLT/EMNLP, Vancouver, BC, Canada, 6–8 October 2005; pp. 1–12.
51. Miller, G.A. WordNet: A Lexical Database for English. Commun. ACM 1995, 38, 39–41.
52. Rao, Y.; Kosari, S.; Shao, Z.; Cai, R.; Liu, X. A Study on Domination in Vague Incidence Graph and Its Application in Medical Sciences. Symmetry 2020, 12, 1885.

**Figure 1.** A taxonomy of the unsupervised extractive summarization algorithms. We restrict attention to the most relevant methods.

**Figure 4.** Quality of the approximation as the number of sentences increases (number of words is 50). Comparison between our method (PGD) and an “off-the-shelf” method (COBYLA).

**Figure 5.** Quality of the approximation as the number of sentences increases (number of words is 1000). Comparison between our method (PGD) and an “off-the-shelf” method (COBYLA).

**Figure 6.** Quality of the approximation as the number of words increases (number of sentences is 20). Comparison between our method (PGD) and an “off-the-shelf” method (COBYLA).

**Figure 8.** Variation of the execution time when the number of sentences increases (a logarithmic scale is used for both axes). Comparison between our method (PGD), an “off-the-shelf” method (COBYLA), and the “brute-force” approach (which could be run only for small instances).

**Figure 9.** Variation of the execution time when the number of words increases. Comparison between our method (PGD), an “off-the-shelf” method (COBYLA), and the “brute-force” approach.

**Figure 10.** Execution time of the summarization program for real data. The texts are generated by truncating the novel War and Peace at different numbers of words. Note that the number of sentences also increases.

1. **Input:** text corpus, $k$
2. **Computation:**
   - 2.1. Preprocessing
     - 2.1.1. Tokenization
     - 2.1.2. Stop words removal
     - 2.1.3. Conversion to lowercase
     - 2.1.4. Stemming (with the Porter stemmer)
     - 2.1.5. Computation of the numerical text representation (TF-IDF)
   - 2.2. Processing
     - 2.2.1. Solve P2 (or P3)
     - 2.2.2. Find the $k$ biggest entries of the solution
   - 2.3. Postprocessing
     - 2.3.1. Find the sentences associated with the selected entries
     - 2.3.2. Concatenate the extracted sentences to generate the summary
3. **Output:** summary
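The processing and postprocessing steps above (pick the $k$ biggest entries of the solution, map them back to sentences, concatenate) reduce to a few lines; the sentence list and solution vector in this sketch are hypothetical.

```python
import numpy as np

def build_summary(sentences, x, k):
    """Keep the k largest entries of the solution x, then emit the
    corresponding sentences in their original document order."""
    top = np.argsort(x)[::-1][:k]      # indices of the k biggest entries
    top_in_order = sorted(top)         # restore document order for readability
    return " ".join(sentences[i] for i in top_in_order)

sentences = ["S1 intro.", "S2 detail.", "S3 key point.", "S4 aside."]
x = np.array([0.1, 0.05, 0.9, 0.4])    # hypothetical solution of P2
print(build_summary(sentences, x, 2))  # → "S3 key point. S4 aside."
```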

**Table 2.** Quality of the approximation as the number of sentences increases (number of words is 1000). Averages over 20 executions. Comparison between our method (PGD) and an “off-the-shelf” method (COBYLA).

No. of Sentences | COBYLA | PGD
---|---|---
10 | 0.025 | 0.030
15 | 0.059 | 0.053
20 | 0.051 | 0.044
25 | 0.053 | 0.050
30 | 0.053 | 0.050

**Table 3.** Quality of the approximation as the number of words increases (number of sentences is 20). Averages over 20 executions. Comparison between our method (PGD) and an “off-the-shelf” method (COBYLA).

No. of Words | COBYLA | PGD
---|---|---
50 | 0.249 | 0.254
100 | 0.162 | 0.164
250 | 0.110 | 0.106
500 | 0.066 | 0.069
750 | 0.052 | 0.044
1000 | 0.045 | 0.043
2000 | 0.031 | 0.029
5000 | 0.022 | 0.018
10,000 | 0.015 | 0.015

**Table 4.** Execution time (in seconds) as a function of the number of sentences for brute-force, an “off-the-shelf” convex solver (COBYLA), and the algorithm presented in this paper (PGD, Algorithm 1). The results are for a vocabulary of $m=20$ words and summaries of length $k=0.25n$. The numbers are averages over 20 executions.

No. of Sentences | Brute-Force | COBYLA | PGD
---|---|---|---
10 | 0.002 | 0.022 | 0.001
15 | 0.020 | 0.044 | 0.001
20 | 0.656 | 0.087 | 0.001
25 | NA | 0.129 | 0.001
30 | NA | 0.167 | 0.001
40 | NA | 0.295 | 0.002
50 | NA | 0.442 | 0.001
100 | NA | 2.053 | 0.002
250 | NA | 2.571 | 0.002
500 | NA | 1.138 | 0.004
750 | NA | 1.662 | 0.008
1000 | NA | 2.256 | 0.016

**Table 5.** Execution time (in seconds) as a function of the number of words for brute-force, an “off-the-shelf” convex solver (COBYLA), and the algorithm presented in this paper (PGD, Algorithm 1). The results are for $n=20$ sentences and summaries of length $k=2$. The numbers are averages over 20 executions.

No. of Words | Brute-Force | COBYLA | PGD
---|---|---|---
10 | 0.002 | 0.021 | 0.001
100 | 0.002 | 0.020 | 0.0008
250 | 0.002 | 0.020 | 0.0007
500 | 0.002 | 0.018 | 0.0007
750 | 0.003 | 0.019 | 0.0007
1000 | 0.003 | 0.020 | 0.0007
2000 | 0.003 | 0.020 | 0.0005
5000 | 0.004 | 0.026 | 0.0005
10,000 | 0.005 | 0.028 | 0.0006

**Table 6.** Execution time (in seconds) of the proposed method in comparison with TextRank (the baseline), as the size of the document (number of words) increases ($k=0.05\times $ no_words).

No. of Words | Baseline | CS
---|---|---
100 | 0.049 | 0.057
500 | 0.242 | 0.248
1000 | 0.499 | 0.493
5000 | 2.656 | 2.399
10,000 | 6.296 | 4.812
12,500 | 8.622 | 6.028
15,000 | 11.430 | 7.249
17,500 | 13.989 | 8.495
20,000 | 17.214 | 9.719
22,500 | 21.110 | 10.961
25,000 | 24.791 | 12.192
27,500 | 29.704 | 13.422
30,000 | 33.122 | 14.692
32,500 | 37.402 | 15.924
35,000 | 40.944 | 17.149

Algorithm | Precision | Recall | F-Measure
---|---|---|---
TextRank | 0.316 | 0.482 | 0.381
CS (P2, $\lambda =0$) | 0.307 | 0.479 | 0.365
CS (P2, $\lambda =0.5$) | 0.322 | 0.505 | 0.384
CS (P3, $\lambda =0.5$, ${\lambda}^{\prime}=2$) | 0.335 | 0.527 | 0.400

Algorithm | Precision | Recall | F-Measure
---|---|---|---
TextRank | 0.155 | 0.247 | 0.190
CS (P2, $\lambda =0$) | 0.152 | 0.245 | 0.185
CS (P2, $\lambda =0.5$) | 0.164 | 0.267 | 0.201
CS (P3, $\lambda =0.5$, ${\lambda}^{\prime}=2$) | 0.175 | 0.285 | 0.214

Algorithm | Precision | Recall | F-Measure
---|---|---|---
TextRank | 0.301 | 0.421 | 0.351
CS (P2, $\lambda =0$) | 0.300 | 0.414 | 0.339
CS (P2, $\lambda =0.5$) | 0.313 | 0.438 | 0.356
CS (P3, $\lambda =0.5$, ${\lambda}^{\prime}=2$) | 0.325 | 0.456 | 0.371

**Table 10.** Results for the Cornell Newsroom dataset. Three algorithm types are present: Unsupervised Extractive (UE), Supervised Mixed (SM), and Supervised Abstractive (SA) [46]. Note that the last two types require training.

Algorithm | Alg. Type | R-1 | R-2 | R-L
---|---|---|---|---
Modified P-G [24] | SM | 0.3991 | 0.2838 | 0.3687
ExtConSumm Ext. [25] | SM | 0.3940 | 0.2780 | 0.3620
C10110/SpaCy [23] | SM | 0.3936 | 0.2786 | 0.3635
TLM [28] | SM | 0.3324 | 0.2001 | 0.2921
Lede-3 Baseline [46] | UE | 0.3202 | 0.2108 | 0.2959
CS [this paper] | UE | 0.3054 | 0.1868 | 0.2234
Pointer-Generator [26] | SM | 0.2754 | 0.1332 | 0.2350
TextRank [47] | UE | 0.2445 | 0.1012 | 0.2013
Fast-RL [29] | SM | 0.2193 | 0.0937 | 0.1961
Seq2Seq + Attention [22] | SA | 0.0599 | 0.0037 | 0.0541

Method | Recall | Precision | F-Measure
---|---|---|---
NUS3 | 0.3446 | 0.3436 | 0.3440
Columbia | 0.3424 | 0.3355 | 0.3388
PolyU | 0.3400 | 0.3329 | 0.3363
CS | 0.3387 | 0.3325 | 0.3355
ISI-Webcl | 0.3336 | 0.3134 | 0.3231
FDUSUM | 0.3310 | 0.3256 | 0.3282
SFUv2.4 | 0.3305 | 0.3249 | 0.3276
isi-bqfs | 0.3304 | 0.3225 | 0.3263
CCS-NSA-05 | 0.3300 | 0.3211 | 0.3254
IIITH-Sum | 0.3292 | 0.3314 | 0.3301
FTextST-05 | 0.3281 | 0.3406 | 0.3339
ERSS2005 | 0.3264 | 0.3197 | 0.3229
EMBRA | 0.3223 | 0.3253 | 0.3237
I2RNLS | 0.3222 | 0.3138 | 0.3179
OHSU-DUC05 | 0.3209 | 0.3203 | 0.3205
CLResearch.duc05 | 0.3177 | 0.3179 | 0.3177
lcc.duc05 | 0.3172 | 0.3325 | 0.3235
LAKE05 | 0.3115 | 0.3043 | 0.3078
TUT/NII | 0.3107 | 0.3095 | 0.3100
NLP-RALI05 | 0.3107 | 0.3159 | 0.3131
UMDBBN | 0.3069 | 0.2976 | 0.3021
CLAIR | 0.3047 | 0.3074 | 0.3059
LARIS2005 | 0.3039 | 0.3186 | 0.3109
KTH-holsum | 0.3003 | 0.3350 | 0.3161
SHEF-BSL | 0.2977 | 0.3056 | 0.3014
UofO | 0.2931 | 0.2900 | 0.2914
ULETH2005 | 0.2824 | 0.3088 | 0.2949
IOSSUMMZ | 0.2801 | 0.3052 | 0.2914
UAM2005 | 0.2795 | 0.3160 | 0.2878
QASUM-UPC | 0.2719 | 0.3062 | 0.2797
TLR | 0.2552 | 0.3554 | 0.2930
Baseline | 0.2532 | 0.3104 | 0.2644
UCD-IIRG | 0.1647 | 0.3708 | 0.2196

Method | Recall
---|---
NUS3 | 0.0725
CS | 0.0720
PolyU | 0.0717
isi-bqfs | 0.0698
IIITH-Sum | 0.0696
Columbia | 0.0686
FTextST-05 | 0.0675
ISI-Webcl | 0.0643
lcc.duc05 | 0.0635
OHSU-DUC05 | 0.0633
SFUv2.4 | 0.0632
CCS-NSA-05 | 0.0628
I2RNLS | 0.0625
NLP-RALI05 | 0.0609
ERSS2005 | 0.0609
FDUSUM | 0.0609
EMBRA | 0.0597
CLAIR | 0.0594
CLResearch.duc05 | 0.0594
TUT/NII | 0.0573
LAKE05 | 0.0563
KTH-holsum | 0.0553
ULETH2005 | 0.0547
UMDBBN | 0.0546
SHEF-BSL | 0.0534
TLR | 0.0515
LARIS2005 | 0.0497
UofO | 0.0496
QASUM-UPC | 0.0487
IOSSUMMZ | 0.0478
UAM2005 | 0.0462
Baseline | 0.0403
UCD-IIRG | 0.0256

Method | Recall
---|---
CS | 0.1430
NUS3 | 0.1316
PolyU | 0.1297
IIITH-Sum | 0.1279
Columbia | 0.1277
isi-bqfs | 0.1253
FTextST-05 | 0.1232
ISI-Webcl | 0.1225
SFUv2.4 | 0.1218
OHSU-DUC05 | 0.1190
CCS-NSA-05 | 0.1190
FDUSUM | 0.1188
ERSS2005 | 0.1187
lcc.duc05 | 0.1176
I2RNLS | 0.1174
EMBRA | 0.1168
CLResearch.duc05 | 0.1167
CLAIR | 0.1146
NLP-RALI05 | 0.1139
TUT/NII | 0.1112
LAKE05 | 0.1107
KTH-holsum | 0.1095
UMDBBN | 0.1085
SHEF-BSL | 0.1041
LARIS2005 | 0.1041
ULETH2005 | 0.1023
UofO | 0.0995
IOSSUMMZ | 0.0981
UAM2005 | 0.0970
QASUM-UPC | 0.0967
TLR | 0.0940
Baseline | 0.0872
UCD-IIRG | 0.0557


© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Popescu, C.; Grama, L.; Rusu, C. A Highly Scalable Method for Extractive Text Summarization Using Convex Optimization. *Symmetry* **2021**, *13*, 1824.
https://doi.org/10.3390/sym13101824
