Sensors
  • Article
  • Open Access

5 May 2023

Adapting Static and Contextual Representations for Policy Gradient-Based Summarization

1 Master Program of Digital Innovation, Tunghai University, Taichung 40704, Taiwan
2 Department of Computer Science, Tunghai University, Taichung 40704, Taiwan
* Author to whom correspondence should be addressed.
This article belongs to the Special Issue Deep Learning for Semantic Segmentation and Explainable AI Based on Sensing Technology

Abstract

Considering the ever-growing volume of electronic documents made available in our daily lives, the need for an efficient tool to capture their gist increases as well. Automatic text summarization, which is a process of shortening long text and extracting valuable information, has been of great interest for decades. Due to the difficulties of semantic understanding and the requirement of large training data, the development of this research field is still challenging and worth investigating. In this paper, we propose an automated text summarization approach with the adaptation of static and contextual representations based on an extractive approach to address the research gaps. To better obtain the semantic expression of the given text, we explore the combination of static embeddings from GloVe (Global Vectors) and contextual embeddings from BERT (Bidirectional Encoder Representations from Transformers)- and GPT (Generative Pre-trained Transformer)-based models. In order to reduce human annotation costs, we employ policy gradient reinforcement learning to perform unsupervised training. We conduct empirical studies on the public Gigaword dataset. The experimental results show that our approach achieves promising performance and is competitive with various state-of-the-art approaches.

1. Introduction

Automatic text summarization is a technique used to produce a concise version of a source document that preserves its essence while removing redundant information []. It has long been an important research task in the natural language processing community. Due to the ever-growing volume of electronic documents made available in our daily lives, the need for an efficient summarization tool and reliable approach to save readers’ time is also increasing.
In general, automatic text summarization is categorized into extractive and abstractive summarization. The extraction-based approach usually composes a summary by selecting salient parts from the source text and concatenating them to create the final result [], while the abstraction-based approach often rewrites and synthesizes the text content in order to produce a summary, instead of directly extracting the crucial text from the source document []. An example comparison can be found in Table 1. The input document is taken from a CNN news article, where the extractive summarization identifies important sentences by applying BERT (Bidirectional Encoder Representations from Transformers) embeddings and performing K-Means clustering [], and the abstractive summarization trains a sequence-to-sequence model for the gap-sentences generation task [].
Table 1. A CNN news article (https://edition.cnn.com/2015/04/14/americas/chile-same-sex-civil-unions/index.html) (accessed on 1 February 2023) is used to demonstrate the extractive and abstractive summarization.
How to represent text in a numerical form for the machine to process is an important step of NLP pipelines and also has a significant impact on summarization performance []. Although the traditional bag-of-words (BOW) model is simple and commonly used, it is not able to properly capture semantic relationships when two sentences have no words in common but share similar semantics []. For example, consider the following two sentences: “Brian left Beach Boys” and “Wilson abandoned the band”; they do not share any common words but have the same semantics. Most recently, word embeddings have gained much attention because of their ability to convert words into low-dimensional, dense representations with the aid of neural network language modeling. GloVe (Global Vectors) is an unsupervised method that learns word representations from word–word co-occurrence statistics, and the released model is trained on Wikipedia and Gigaword corpora []. BERT (Bidirectional Encoder Representations from Transformers) [] is a pre-trained language model based on the Transformer architecture [] that learns the semantic information of the input text. It is trained with two learning objectives, Masked Language Modeling (MLM) and Next Sentence Prediction (NSP), and has been widely adopted in NLP models, achieving state-of-the-art results on 11 NLP tasks. GPT-2 (Generative Pre-trained Transformer 2) is also a pre-trained language model but is based on the decoder of the Transformer architecture []. Unlike BERT, which learns words in a bidirectional manner, GPT-2 is trained in an autoregressive fashion on the task of predicting the next word. The word embeddings of GloVe are static, where each word has a fixed vector representation. On the other hand, BERT and GPT-2 produce contextualized representations, where the word embeddings are context-sensitive [].
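To make this distinction concrete, the following sketch retrieves a static GloVe vector and two contextual BERT vectors for the same word in different sentences; the static vector is identical in both contexts, while the contextual vectors differ. This is only a minimal illustration assuming the gensim and Hugging Face transformers packages and the publicly released glove-wiki-gigaword-100 and bert-base-uncased models, not part of the proposed method.

```python
import torch
import gensim.downloader as api
from transformers import BertModel, BertTokenizer

# Static embeddings: one fixed vector per word, independent of context.
glove = api.load("glove-wiki-gigaword-100")   # downloads pre-trained GloVe vectors
static_bank = glove["bank"]                   # the same vector in every sentence

# Contextual embeddings: the vector for "bank" depends on the surrounding words.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")

def contextual_vector(sentence: str, word: str) -> torch.Tensor:
    """Return the BERT hidden state of the first occurrence of `word`."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = bert(**enc).last_hidden_state[0]          # (seq_len, 768)
    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])
    return hidden[tokens.index(word)]

v_river = contextual_vector("he sat on the bank of the river", "bank")
v_money = contextual_vector("she deposited the cheque at the bank", "bank")

# The two contextual vectors differ, reflecting the two senses of "bank".
print(torch.cosine_similarity(v_river, v_money, dim=0).item())
```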
When learning text summarization with supervised training methods, labeled training data are usually not available and are often difficult to annotate on a large scale. Unsupervised learning is therefore becoming popular because it is more applicable in real-world practice []. Reinforcement learning (RL)-based summarization models that maximize a reward function [] and adversarial training methods for text generation [] are two mainstream approaches to unsupervised extractive summarization. There have also been efforts on automatically generating training data [] and on better learning text representations in a self-supervised fashion [].
In this paper, we employ a policy gradient reinforcement learning model for unsupervised extractive summarization. Since discrete text generation in neural network models is non-differentiable, reinforcement learning methods devise rewards to overcome this challenge []. Moreover, to better encode the semantics of words, we investigate the effectiveness of adapting and combining various embedding schemes to represent words. We formulate the extraction of important tokens from the given text as a sequence labeling task with binary labels to find an optimal assignment. The mechanism of our method is analogous to Generative Adversarial Networks (GAN) [], where the RL agent plays the role of the ‘generator’ by performing binary sequence labeling and the reward function plays the role of the ‘discriminator’ by shaping the agent’s behavior. The major difference is that our reward functions do not accept any real samples and are not trainable networks.
The primary contributions of this paper are two-fold:
  • We propose an automated model for unsupervised extractive text summarization based on a policy gradient reinforcement learning approach. The semantic representations of the text are extracted from static and contextual embeddings, including GloVe-, BERT- and GPT-based models.
  • Empirical studies on the Gigaword dataset illustrate that the proposed solution is capable of creating reasonable summaries and is comparable with other state-of-the-art algorithms.
The rest of this paper is organized in the following manner. In Section 2, several research efforts and techniques related to the topic of this paper are reviewed. We then discuss the proposed approach and detailed components in Section 3. Experimental studies are reported to justify the effectiveness and analyze the performance of the proposed method in Section 4. Finally, we draw some conclusions and also provide several possible future research paths.

3. Proposed Method

In this section, we will present the proposed network, which contains a policy gradient reinforcement learning architecture, representation learning with various embeddings and optimization strategies (Figure 1).
Figure 1. The architecture of the proposed policy gradient reinforcement learning model for unsupervised extractive summarization.

3.1. The Policy Gradient Reinforcement Learning Architecture

In this work, we formulate the task of extractive summarization as the selection of important tokens from the document and solve the problem with a reinforcement learning algorithm. The input of the algorithm is a document $d_i$ with $t$ tokens (i.e., $x_1, x_2, \dots, x_t$), and the goal is to predict a binary label for each token $x_j$ to determine whether to include $x_j$ in the final summary. The output is the concatenation of the chosen tokens.
Reinforcement learning is a popular approach to solving planning problems and has been widely used for learning in highly dynamic and complex environments. In general, the learning agent repeatedly follows three steps with a trial-and-error mechanism to learn the optimal policy. First, the agent observes the environment state $s$. Second, based on a policy, the agent takes an action $a$ from the given state $s$. Third, the agent receives a reward $r$ for taking action $a$ in state $s$, and the environment subsequently provides the next state $s'$. Ultimately, the general goal of the agent is to maximize the accumulated rewards.
We describe how to incorporate our summarization approach into the reinforcement learning setting as follows:
  • State: A document $d_i = (x_1, x_2, \dots, x_t)$ is considered as a state $s$. Each token $x_j$ is represented by the concatenation of its GloVe ($h_1$), BERT ($h_2$) and GPT-2 ($h_3$) embeddings. GloVe provides static embeddings, whereas BERT provides contextual embeddings trained with masked language modeling and GPT-2 provides contextual embeddings trained with autoregressive language modeling. We then pass the concatenated embeddings into an LSTM layer to encode the sequential information and capture contextual features. The resulting state representation is denoted as $f = (f_1, f_2, \dots, f_t)$.
  • Action: The selection of important words from a sentence is treated as a sequence labeling problem in this work. Given the state representation $f$, the algorithm performs the action of producing a sequence of binary labels $y = (y_1, y_2, \dots, y_t)$ that indicates whether each token is selected for the summary.
  • Reward: We apply three commonly used reward functions to measure the quality of the extracted text: fluency ($R_{flu}$), similarity ($R_{sim}$) and compression ($R_{com}$) [,,]. The fluency reward judges whether the generated text is grammatically sound and semantically coherent; its score is computed from the perplexity of the generated text under a language model. The similarity reward measures the semantic similarity between the generated summary and the source document in order to ensure content preservation; we adopt the cosine similarity between the embeddings of the generated summary and the source document. The compression reward encourages the agent to generate summaries close to a predefined length. Following prior work [], the compression score is calculated as $R_{com}(y, L) = \exp\left(-\frac{|y - L|}{\sigma_L}\right)$, where $y$ is the length of the generated summary, $L$ is the target summary length and $\sigma_L$ is a hyper-parameter. A sketch of these reward computations is shown after this list.
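As noted above, the following is a minimal sketch of how the three rewards could be computed, assuming the Hugging Face transformers package, GPT-2 as the fluency language model and mean-pooled BERT embeddings for similarity; the exact models, pooling and normalization used in our experiments may differ, so this is an illustration rather than the reference implementation.

```python
import math
import torch
from transformers import (BertModel, BertTokenizer,
                          GPT2LMHeadModel, GPT2TokenizerFast)

gpt2_tok = GPT2TokenizerFast.from_pretrained("gpt2")
gpt2_lm = GPT2LMHeadModel.from_pretrained("gpt2")
bert_tok = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")

def fluency_reward(summary: str) -> float:
    """R_flu: inverse perplexity of the summary under a language model."""
    enc = gpt2_tok(summary, return_tensors="pt")
    with torch.no_grad():
        loss = gpt2_lm(**enc, labels=enc["input_ids"]).loss   # mean token cross-entropy
    return 1.0 / math.exp(loss.item())                        # higher means more fluent

def _embed(text: str) -> torch.Tensor:
    """Mean-pooled BERT embedding (one simple pooling choice)."""
    enc = bert_tok(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = bert(**enc).last_hidden_state[0]
    return hidden.mean(dim=0)

def similarity_reward(summary: str, document: str) -> float:
    """R_sim: cosine similarity between summary and source document embeddings."""
    return torch.cosine_similarity(_embed(summary), _embed(document), dim=0).item()

def compression_reward(summary_len: int, target_len: int, sigma: float = 2.0) -> float:
    """R_com: exponential penalty on the deviation from the target length."""
    return math.exp(-abs(summary_len - target_len) / sigma)
```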

3.2. Training Algorithm

In this work, we use a policy gradient method to learn the summarization task. The learnable policy is implemented as a neural network parameterized by $\theta$. Given an input document $d_i$, the state representation $f$ is obtained by concatenating the GloVe, BERT and GPT-2 embeddings and subsequently passing them through the LSTM layer. The policy is defined as the product of the probabilities of each token being included in the summary:
$\pi_\theta(y \mid f) = \prod_i \Pr(y_i \mid f, \theta)$  (1)
After sampling the action based on $\pi_\theta$, the reward $r_\pi$ is given to evaluate the selected tokens. The goal of learning is to maximize the expected reward, denoted as $J(\theta)$, and the gradient can be derived as shown in Equation (2) by applying the policy gradient theorem []:
$\nabla_\theta J(\theta) = r_\pi \nabla_\theta \log \pi_\theta(y \mid f)$  (2)
We use this gradient to update the parameters $\theta$ with learning rate $\alpha$ as follows:
$\theta \leftarrow \theta + \alpha \nabla_\theta J(\theta)$  (3)
Algorithm 1 shows the overall algorithm to train our proposed model.
Algorithm 1: Policy Gradient-Based Summarization Model
Parameters: $\theta$ for the policy network $\pi$.
  • for each training iteration:
  •   Sample $m$ examples of input documents $d_1, d_2, \dots, d_m$
  •   for each $d_i$ with $t$ tokens $x_1, x_2, \dots, x_t$ as a state $s$:
  •     Convert $x_1, x_2, \dots, x_t$ to $f_1, f_2, \dots, f_t$
  •     Perform an action by producing a binary probability distribution based on Equation (1) for each $f_j$ to obtain $y_1, y_2, \dots, y_t$
  •     Calculate the rewards $R_{flu}$, $R_{sim}$ and $R_{com}$
  •     Compute the gradient using Equation (2)
  •     Update $\theta$ using Equation (3)
  •   end for
  • end for
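The sketch below illustrates Algorithm 1 as a minimal REINFORCE-style PyTorch implementation under our own simplifying assumptions: the concatenated GloVe/BERT/GPT-2 token embeddings are pre-computed and passed in as a tensor, the three rewards are combined by a caller-supplied reward_fn, and the hidden size and optimizer are placeholders rather than the settings in Table 3.

```python
import torch
import torch.nn as nn

class PolicyNetwork(nn.Module):
    """LSTM over concatenated token embeddings followed by a per-token
    inclusion probability; Equation (1) factorizes over these tokens."""
    def __init__(self, embed_dim: int, hidden_dim: int = 256):
        super().__init__()
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, token_embeddings: torch.Tensor) -> torch.Tensor:
        # token_embeddings: (batch, t, embed_dim) -> inclusion probs: (batch, t)
        states, _ = self.lstm(token_embeddings)        # f_1, ..., f_t
        return torch.sigmoid(self.head(states)).squeeze(-1)

def train_step(policy, optimizer, token_embeddings, reward_fn):
    """One policy gradient update for a mini-batch of documents."""
    probs = policy(token_embeddings)                   # per-token action distribution
    dist = torch.distributions.Bernoulli(probs=probs)
    actions = dist.sample()                            # y_1, ..., y_t in {0, 1}
    log_prob = dist.log_prob(actions).sum(dim=1)       # log pi_theta(y | f)

    with torch.no_grad():                              # rewards are not back-propagated
        rewards = reward_fn(actions)                   # e.g. R_flu + R_sim + R_com, shape (batch,)

    loss = -(rewards * log_prob).mean()                # ascend r * grad log pi (Eqs. (2)-(3))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```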

4. Experiments and Results

In this section, we present empirical studies to illustrate our method with static and contextual embeddings for the summary generation. These studies consist of (1) the introduction of the dataset; (2) the definition of evaluation indices; (3) comparisons with other published approaches; and (4) ablation studies to analyze the effect of each embedding scheme.

4.1. Dataset

We apply our proposed framework to a public dataset, Gigaword. The training, validation and testing set sizes are 1 M, 189 K and 1951 examples, respectively []. The annotation data are stored in JSON format, where each instance contains the ID, the text and the corresponding summary. The average lengths of the source documents and summaries are shown in Table 2.
Table 2. Data statistics for the Gigaword dataset, where AvgInputLen is the average length of the input document and AvgSummaryLen is the average summary length.
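For illustration, a minimal loader for such JSON annotations is sketched below; the field names (id, text, summary) and a one-instance-per-line layout are assumptions inferred from the description above rather than a documented schema.

```python
import json

def load_instances(path: str):
    """Read JSON-lines annotations; each line holds one instance with an
    identifier, the source text and its reference summary (assumed field names)."""
    instances = []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            record = json.loads(line)
            instances.append((record["id"], record["text"], record["summary"]))
    return instances

# Example usage (hypothetical file name):
# data = load_instances("gigaword_train.json")
# print(len(data), data[0])
```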

4.2. Evaluation Metric

To automatically measure the summarization performance of the proposed network, we employ Recall-Oriented Understudy for Gisting Evaluation (ROUGE), the most widely used evaluation measure in the relevant research []. ROUGE measures the co-occurrence statistics between the machine-generated summary and the ground-truth summary. Three ROUGE variants are popular for the summarization task: ROUGE-1 (R-1), ROUGE-2 (R-2) and ROUGE-L (R-L). R-N computes a similarity score based on the overlap of N-grams; thus, R-1 measures unigram overlap and R-2 measures bigram overlap. R-L is based on the longest common subsequence between the generated and ground-truth summaries and serves as an indicator of fluency.
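For reference, the sketch below computes R-1, R-2 and R-L with the open-source rouge-score package; this is one of several ROUGE implementations and is shown only as an example, with illustrative strings rather than Gigaword data.

```python
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)

reference = "five foreign ministers meet to discuss regional security"
generated = "foreign ministers meet on regional security"

# Each entry holds precision, recall and F-measure; summarization papers
# typically report the F-measure.
scores = scorer.score(reference, generated)
print(scores["rouge1"].fmeasure, scores["rouge2"].fmeasure, scores["rougeL"].fmeasure)
```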

4.3. Experimental Results

In order to validate the effectiveness of our method, we conduct an experimental investigation to compare its results against those of other methods, which are introduced as follows:
  • Lead-8: This approach is a simple baseline which directly selects the first eight words in the source document to assemble a summary.
  • Contextual Match []: This research work introduces two language models to create the summary while maintaining output fluency. A generic pre-trained language model is used to perform contextual matching, and another, target-domain-specific language model is used to guide the generation fluency.
  • AdvREGAN []: The method uses cycle-consistency to encode the input text representation and applies an adversarial reinforcement-based GAN to generate human-like text.
  • HC_title_8 []: This work extracts words from the input text with a hill-climbing discrete optimization algorithm, producing summaries of about eight words.
  • HC_title_10 []: The model is identical to the HC_title_8 but with a summary length of about 10 words.
  • SCRL_L8 []: The approach models sentence compression and fine-tunes BERT within a reinforcement learning setup.
The operating environment of this experimental comparison is the Windows 10 OS equipped with an Intel Core i9 central processing unit (CPU) and 128 GB of memory. The graphic processing unit (GPU) is an NVIDIA GeForce RTX 3090 with 24 GB of memory. Our hyper-parameter setting is displayed in Table 3.
Table 3. Hyper-parameter configuration.
Based on the comparison results shown in Table 4, where the baseline results are taken directly from the corresponding papers, our model obtains the best R-1 (29.75%) and R-L (26.98%) scores, indicating its ability to produce informative and coherent summaries. Although our model is ranked second best on R-2 (10.26%), the value is very close to that of the best method (10.27%). Reinforcement learning-based approaches (SCRL_L8 and our model) with multiple reward functions yield better performance on R-1 and R-2.
Table 4. Performance comparison (%).
In addition, we provide several examples to illustrate our results in Table 5, including the input (INPUT_X), the ground-truth summary (GOLD SUMMARY) and the system-generated summary (GEN SUMMARY). As presented, our algorithm is able to extract crucial items and produce fair summaries. Nevertheless, there are still some mistakes to overcome for further improvement. First, because the summary is formed by partial selection from the source sentence, the output is not always as grammatical as it should be (displayed in red text). Devising another reward function to incorporate grammatical constraints as background knowledge could be a possible way to ensure grammaticality. Second, in some cases, the model fails to generate factually consistent summaries (displayed in blue text), reducing the faithfulness of the output and causing misunderstanding for readers. One potential avenue for future research is leveraging textual entailment models to enhance factual consistency [].
Table 5. Four example outputs produced by our model.

4.4. Ablation Experiments

In order to verify the feasibility of our proposals, we conduct ablation studies on the Gigaword dataset to investigate the benefits of our approach and quantify the effect of each embedding scheme. As the results in Table 6 show, adapting static and contextual embeddings in our proposed method exceeds the results of training with GloVe, BERT and GPT-2 independently in terms of R-1, R-2 and R-L. Therefore, we believe that our method can further enhance the performance of producing summaries by learning complex features across different embedding schemes.
Table 6. Ablation experiment results (%).

5. Conclusions

In this paper, we propose an unsupervised policy gradient reinforcement method based on the combination of various embeddings to resolve the text summarization problem. We conduct empirical studies on the Gigaword dataset, and the results are satisfactory. Compared to other existing methods, our model captures better representations of each token and yields the best performance in terms of ROUGE-1 and ROUGE-L.
Since the ROUGE-2 score of our method is ranked second (10.26%) and is only 0.01% lower than the top-performing method, future directions will focus on designing reward functions to increase bigram overlap. As our approach is capable of selecting important tokens, the investigation of combining chosen tokens into comprehensible phrases could be a possible solution.
Another challenge of our work will be the practical application in resource-constrained environments such as mobile computing due to the large embedding size. We are interested in knowledge distillation and low-dimensional representation for the purpose of deploying the proposed model in resource-restricted settings.

Author Contributions

Supervision, J.-S.J.; methodology, J.-S.J., C.-S.L. and C.-H.L.; investigation, C.-S.L.; writing—review and editing, C.-S.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work is financially supported by the National Science and Technology Council (NSTC) of Taiwan under Grant 111-2221-E-029-019-.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Luhn, H.P. The automatic creation of literature abstracts. IBM J. Res. Dev. 1958, 2, 159–165. [Google Scholar] [CrossRef]
  2. Ferreira, R.; de Souza Cabral, L.; Lins, R.D.; e Silva, G.P.; Freitas, F.; Cavalcanti, G.D.; Lima, R.; Simske, S.J.; Favaro, L. Assessing sentence scoring techniques for extractive text summarization. Expert Syst. Appl. 2013, 40, 5755–5764. [Google Scholar] [CrossRef]
  3. Nallapati, R.; Zhou, B.; Gulcehre, C.; Xiang, B. Abstractive text summarization using sequence-to-sequence rnns and beyond. arXiv 2016, arXiv:1602.06023. [Google Scholar]
  4. Miller, D. Leveraging BERT for extractive text summarization on lectures. arXiv 2019, arXiv:1906.04165. [Google Scholar]
  5. Zhang, J.; Zhao, Y.; Saleh, M.; Liu, P. Pegasus: Pre-training with extracted gap-sentences for abstractive summarization. In Proceedings of the International Conference on Machine Learning, Virtual, 13–18 July 2020; PMLR: New York, NY, USA, 2020; pp. 11328–11339. [Google Scholar]
  6. Rossiello, G.; Basile, P.; Semeraro, G. Centroid-based text summarization through compositionality of word embeddings. In Proceedings of the MultiLing 2017 Workshop on Summarization and Summary Evaluation Across Source Types and Genres, Valencia, Spain, 3 April 2017; pp. 12–21. [Google Scholar]
  7. Radev, D.R.; Jing, H.; Styś, M.; Tam, D. Centroid-based summarization of multiple documents. Inf. Process. Manag. 2004, 40, 919–938. [Google Scholar] [CrossRef]
  8. Pennington, J.; Socher, R.; Manning, C.D. Glove: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 26–28 October 2014; pp. 1532–1543. [Google Scholar]
  9. Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the NAACL-HLT 2019, Minneapolis, MN, USA, 2–7 June 2019; pp. 4171–4186. [Google Scholar]
  10. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30. [Google Scholar]
  11. Radford, A.; Wu, J.; Child, R.; Luan, D.; Amodei, D.; Sutskever, I. Language models are unsupervised multitask learners. OpenAI Blog. 2019, 1, 9. [Google Scholar]
  12. Ethayarajh, K. How Contextual are Contextualized Word Representations? Comparing the Geometry of BERT, ELMo, and GPT-2 Embeddings. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019; pp. 55–65. [Google Scholar]
  13. Amplayo, R.K.; Angelidis, S.; Lapata, M. Unsupervised opinion summarization with content planning. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, Canada, 2–9 February 2021; Volume 35, pp. 12489–12497. [Google Scholar]
  14. Hyun, D.; Wang, X.; Park, C.; Xie, X.; Yu, H. Generating Multiple-Length Summaries via Reinforcement Learning for Unsupervised Sentence Summarization. arXiv 2022, arXiv:2212.10843. [Google Scholar]
  15. Wang, Y.; Lee, H.Y. Learning to Encode Text as Human-Readable Summaries using Generative Adversarial Networks. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 31 October–4 November 2018; pp. 4187–4195. [Google Scholar]
  16. Pasunuru, R.; Celikyilmaz, A.; Galley, M.; Xiong, C.; Zhang, Y.; Bansal, M.; Gao, J. Data augmentation for abstractive query-focused multi-document summarization. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, Canada, 2–9 February 2021; Volume 35, pp. 13666–13674. [Google Scholar]
  17. Wang, H.; Wang, X.; Xiong, W.; Yu, M.; Guo, X.; Chang, S.; Wang, W.Y. Self-supervised learning for contextualized extractive summarization. arXiv 2019, arXiv:1906.04466. [Google Scholar]
  18. Liu, L.; Lu, Y.; Yang, M.; Qu, Q.; Zhu, J.; Li, H. Generative adversarial network for abstractive text summarization. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; Volume 32. [Google Scholar]
  19. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial networks. Commun. ACM 2020, 63, 139–144. [Google Scholar] [CrossRef]
  20. Sharma, G.; Sharma, D. Automatic Text Summarization Methods: A Comprehensive Review. SN Comput. Sci. 2022, 4, 33. [Google Scholar] [CrossRef]
  21. Sharma, G.; Gupta, S.; Sharma, D. Extractive text summarization using feature-based unsupervised RBM Method. In Cyber Security, Privacy and Networking: Proceedings of ICSPN 2021; Springer Nature Singapore: Singapore, 2022; pp. 105–115. [Google Scholar]
  22. Liu, Y. Fine-tune BERT for extractive summarization. arXiv 2019, arXiv:1903.10318. [Google Scholar]
  23. Sutskever, I.; Vinyals, O.; Le, Q.V. Sequence to sequence learning with neural networks. Adv. Neural Inf. Process. Syst. 2014, 27. [Google Scholar]
  24. Ma, X.; Keung, J.W.; Yu, X.; Zou, H.; Zhang, J.; Li, Y. AttSum: A Deep Attention-Based Summarization Model for Bug Report Title Generation. IEEE Trans. Reliab. 2023; early access. [Google Scholar]
  25. Mendes, A.; Narayan, S.; Miranda, S.; Marinho, Z.; Martins, A.F.; Cohen, S.B. Jointly Extracting and Compressing Documents with Summary State Representations. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA, 2–7 June 2019; Volume 1, pp. 3955–3966. [Google Scholar]
  26. Lebanoff, L.; Song, K.; Dernoncourt, F.; Kim, D.S.; Kim, S.; Chang, W.; Liu, F. Scoring Sentence Singletons and Pairs for Abstractive Summarization. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 28 July–2 August 2019; pp. 2175–2189. [Google Scholar]
  27. See, A.; Liu, P.J.; Manning, C.D. Get To The Point: Summarization with Pointer-Generator Networks. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, Vancouver, Canada, 30 July–4 August 2017; Volume 1, pp. 1073–1083. [Google Scholar]
  28. Li, Y. Deep reinforcement learning: An overview. arXiv 2017, arXiv:1701.07274. [Google Scholar]
  29. Alomari, A.; Idris, N.; Sabri AQ, M.; Alsmadi, I. Deep reinforcement and transfer learning for abstractive text summarization: A review. Comput. Speech Lang. 2022, 71, 101276. [Google Scholar] [CrossRef]
  30. Bian, J.; Huang, X.; Zhou, H.; Zhu, S. GoSum: Extractive Summarization of Long Documents by Reinforcement Learning and Graph Organized discourse state. arXiv 2022, arXiv:2211.10247. [Google Scholar]
  31. Wu, Y.; Hu, B. Learning to extract coherent summary via deep reinforcement learning. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; Volume 32. [Google Scholar]
  32. Hu, B.; Lu, Z.; Li, H.; Chen, Q. Convolutional neural network architectures for matching natural language sentences. Adv. Neural Inf. Process. Syst. 2014, 2, 2042–2050. [Google Scholar]
  33. Liu, Y.; Liu, P.; Radev, D.; Neubig, G. BRIO: Bringing Order to Abstractive Summarization. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, Dublin, Ireland, 22–27 May 2022; Volume 1, pp. 2890–2903. [Google Scholar]
  34. Xiao, L.; He, H.; Jin, Y. FusionSum: Abstractive summarization with sentence fusion and cooperative reinforcement learning. Knowl. Based Syst. 2022, 243, 108483. [Google Scholar] [CrossRef]
  35. Zhang, X.; Lapata, M. Sentence Simplification with Deep Reinforcement Learning. In Proceedings of the EMNLP 2017: Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Copenhagen, Denmark, 7–11 September 2017; pp. 584–594. [Google Scholar]
  36. Ghalandari, D.G.; Hokamp, C.; Ifrim, G. Efficient Unsupervised Sentence Compression by Fine-tuning Transformers with Reinforcement Learning. arXiv 2022, arXiv:2205.08221. [Google Scholar]
  37. Schumann, R.; Mou, L.; Lu, Y.; Vechtomova, O.; Markert, K. Discrete Optimization for Unsupervised Sentence Summarization with Word-Level Extraction. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; pp. 5032–5042. [Google Scholar]
  38. Sutton, R.S.; McAllester, D.; Singh, S.; Mansour, Y. Policy gradient methods for reinforcement learning with function approximation. Adv. Neural Inf. Process. Syst. 1999, 12. [Google Scholar]
  39. Lin, C.Y.; Hovy, E. Automatic evaluation of summaries using n-gram co-occurrence statistics. In Proceedings of the 2003 Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, Edmonton, Canada, 27 May–1 June 2003; pp. 150–157. [Google Scholar]
  40. Zhou, J.; Rush, A.M. Simple Unsupervised Summarization by Contextual Matching. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 28 July–2 August 2019; pp. 5101–5106. [Google Scholar]
  41. Aharoni, R.; Narayan, S.; Maynez, J.; Herzig, J.; Clark, E.; Lapata, M. mFACE: Multilingual Summarization with Factual Consistency Evaluation. arXiv 2022, arXiv:2212.10622. [Google Scholar]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
