  • Article
  • Open Access

6 February 2023

Abstractive Summary of Public Opinion News Based on Element Graph Attention

1 Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China
2 Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming 650500, China
* Author to whom correspondence should be addressed.
This article belongs to the Special Issue Advanced Natural Language Processing and Machine Translation

Abstract

A case–public opinion summary condenses the public opinion information surrounding a judicial case into a few case-related sentences. Case–public opinion news refers to coverage of judicial cases (intentional homicide, rape, etc.) that attract wide public attention; such news usually contains case element information such as the suspect, victim, time, place, process, and sentencing of the case. In multi-document summarization of case–public opinion, different documents about the same case overlap and repeat one another, which makes it difficult to generate a concise and fluent summary. This paper therefore proposes an abstractive case–public opinion summarization model based on case element graph attention. First, the multiple public opinion documents of the same case are split into paragraphs, the paragraphs and case elements are encoded with a Transformer-based encoder, and a heterogeneous graph containing paragraph nodes and case element nodes is constructed. Then, during decoding, a two-level attention mechanism is applied to the case element nodes and paragraph nodes, which allows the model to effectively reduce redundancy in the generated summary.

1. Introduction

Public opinion about a case ferments and spreads rapidly over the Internet, producing a large amount of case-related public opinion information. Generating a brief summary of the topic from this information plays an important role in quickly understanding the case and grasping the trend of public opinion.
In recent years, sequence-to-sequence models have achieved good results on single-document summarization, but when applied to multi-document summarization, the overly long input sequence prevents them from extracting the information that matters for the summary, so the generated summary contains redundant information.
Simply applying a sequence-to-sequence model to the multi-document summarization task therefore struggles to achieve good results. A case–public opinion multi-document text usually contains information such as the victims, the criminal suspects, and the location of the crime. This information is an important part of the case–public opinion text and is also important for summary generation, as shown in Table 1.
Table 1. Examples of public opinion text data for multi-document cases.
As shown in Table 1, in the three different body texts of the “Xinhuang playground burial case” (the murder of Deng Shiping), case elements such as “Deng Shiping”, “Du Shaoping”, “Huang Bingsong”, and “Xinhuang playground burial case” appear frequently in both the texts and the summaries. We believe that sentences containing case elements are more likely to become summary sentences; with the help of case elements, the relationships between documents can be encoded efficiently, and the problems of sentence salience and information redundancy can be resolved.
Based on the above analysis, we build a heterogeneous graph composed of document paragraph nodes and case element nodes. The element nodes connect different documents and model the relationships between documents through case elements. A graph attention network (GAT) [1] implements the information flow between nodes and iteratively updates the node representations. In the decoding process, the decoder first attends to the case element nodes and then derives the attention weights over the documents by combining the element attention weights with the edge weights. We design a novel two-level attention mechanism that identifies salient information at each decoding step by considering the case elements and the global interactions between documents in the graph, thereby addressing the redundancy problem of abstractive summaries. Experiments show that the model significantly improves the performance of case element-based multi-document summarization.

3. Materials and Methods

The multi-document abstractive summarization model for case–public opinion based on element graph attention adopts a Transformer-based encoder–decoder architecture [31]. A graph neural network serves as the encoder, so that multiple documents can be encoded effectively by combining the case element information with the graph structure information; during decoding, a new two-level attention mechanism handles the saliency and redundancy problems. The specific structure is shown in Figure 1.
Figure 1. Multi-document abstractive summarization method for case–public opinion based on element graph attention.
As shown in Figure 1, the model follows the encoder–decoder framework: the encoder side consists of the document paragraph encoder and the case element encoder, and the decoder uses a two-level attention mechanism.

3.1. Element Relationship Graph Construction Module

The input is a set of source documents $D = \{d_1, d_2, \ldots, d_n\}$, which are first split into smaller semantic units, the paragraphs $P = \{p_1, p_2, \ldots, p_m\}$. From these, a heterogeneous graph $G = (V, E)$ is constructed, where $V$ contains the paragraph nodes $V_p$ and the case element nodes $V_c$, and $E$ is the set of undirected edges between nodes. There are no edges among paragraph nodes or among case element nodes; edges only connect a paragraph node with a case element node. An edge between $P_i$ and $C_j$ indicates that case element $C_j$ is contained in paragraph $P_i$.
To include more information in the graph, the construction also produces a weight matrix $E \in \mathbb{R}^{m \times n}$, where $e_{ij} \neq 0$ indicates that case element $C_j$ is contained in paragraph $P_i$. The specific construction is shown in Algorithm 1.
Algorithm 1: Algorithm for constructing the element relation graph
Input: input text $D = \{d_1, d_2, \ldots, d_n\}$
Output: element relationship graph $G = (V, E)$, $V = C \cup P$
1. Collect the case element set $C$;
2. Initialize the node set with the document paragraph nodes: $V = P$;
3. for $d_i$ in $D$ do
4.     for $p_i$ in $d_i$ do
5.         if $p_i$ contains some $c \in C$ then
6.             $V = V \cup \{c\}$;
7.         end if
8.     end for
9. end for
10. for $d_i$ in $D$ do
11.     for $p_i$ in $d_i$ do
12.         if $p_i$ contains $c_j \in C$ then
13.             $E = E \cup \{e_{ij}\}$;
14.         end if
15.     end for
16. end for
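As a concrete illustration, the following Python sketch implements the spirit of Algorithm 1 under simplifying assumptions: paragraphs are obtained by splitting on line breaks, element matching is plain substring matching, and a raw occurrence count stands in for the TF-IDF edge weight described in Section 3.3; the function name build_element_graph and these choices are ours, not the authors' released code.

from collections import defaultdict

def build_element_graph(documents, case_elements):
    """documents: list of document strings; case_elements: list of case element strings.
    Returns the paragraph nodes, the element nodes that occur, and a sparse edge-weight dict."""
    # Split every document into paragraph nodes (the smaller semantic units).
    paragraphs = [p.strip() for d in documents for p in d.split("\n") if p.strip()]
    # V = P ∪ C: keep only the case elements that occur in at least one paragraph.
    elements = [c for c in case_elements if any(c in p for p in paragraphs)]
    # E: edges[(i, j)] != 0 iff element c_j is contained in paragraph p_i.
    edges = defaultdict(float)
    for i, p in enumerate(paragraphs):
        for j, c in enumerate(elements):
            if c in p:
                edges[(i, j)] = p.count(c)  # placeholder weight; the paper uses TF-IDF values
    return paragraphs, elements, edges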

3.2. Document Encoder

The multiple documents are segmented into paragraphs, and several token-level Transformer [31] encoding layers are stacked to encode the contextual information within each paragraph:
$h_w^l = \mathrm{LayerNorm}(x_w^{l-1} + \mathrm{MHAttn}(x_w^{l-1}))$ (1)
$x_w^l = \mathrm{LayerNorm}(h_w^l + \mathrm{FFN}(h_w^l))$ (2)
$h_p = \mathrm{MHPool}(h_w^1, h_w^2, \ldots)$ (3)
where $\mathrm{MHAttn}$ is multi-head self-attention, $\mathrm{FFN}$ is the position-wise feed-forward network, and $\mathrm{MHPool}$ is a multi-head pooling layer that aggregates the token representations of a paragraph into the paragraph representation $h_p$.
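A minimal PyTorch sketch of Equations (1)–(3) is given below; the module names, the multi-head pooling formulation, and the default sizes (256 hidden units, 8 heads, 1024 feed-forward units, taken from Section 4.2) are illustrative assumptions rather than the authors' implementation.

import torch
import torch.nn as nn

class ParagraphEncoderLayer(nn.Module):
    """One token-level Transformer encoding layer, Eqs. (1)-(2)."""
    def __init__(self, d_model=256, n_heads=8, d_ff=1024):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                 nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):                        # x: (batch, tokens, d_model)
        h, _ = self.attn(x, x, x)                # MHAttn(x^{l-1})
        h = self.norm1(x + h)                    # Eq. (1)
        return self.norm2(h + self.ffn(h))       # Eq. (2)

class MultiHeadPool(nn.Module):
    """Attention-style pooling of token vectors into one paragraph vector h_p, Eq. (3)."""
    def __init__(self, d_model=256, n_heads=8):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.score = nn.Linear(d_model, n_heads)   # one scalar score per head and token
        self.value = nn.Linear(d_model, d_model)
        self.out = nn.Linear(d_model, d_model)

    def forward(self, h):                          # h: (batch, tokens, d_model)
        b, t, _ = h.shape
        a = torch.softmax(self.score(h), dim=1)                   # (b, t, heads)
        v = self.value(h).view(b, t, self.n_heads, self.d_head)   # (b, t, heads, d_head)
        pooled = torch.einsum("bth,bthd->bhd", a, v).reshape(b, -1)
        return self.out(pooled)                                    # h_p: (b, d_model)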

3.3. Graph Encoder

The representations of the semantic nodes are updated with a graph attention network (GAT) [1]. Let $i, j \in \{1, 2, \ldots, (m+n)\}$ index the nodes in the graph, and let $N_i$ denote the set of neighbors of node $i$. The GAT layer is designed as follows:
$z_{ij} = \mathrm{LeakyReLU}(W_a[W_q h_i; W_k h_j])$ (4)
$\tilde{z}_{ij} = \tilde{e}_{ij} \times z_{ij}$ (5)
$\alpha_{ij} = \frac{\exp(\tilde{z}_{ij})}{\sum_{l \in N_i} \exp(\tilde{z}_{il})}$ (6)
$u_i = \sigma\left(\sum_{j \in N_i} \alpha_{ij} W_v h_j\right)$ (7)
where $\tilde{e}_{ij}$ are edge weights derived from the matrix of TF-IDF values. The main idea is to discretize the real-valued weights into integers and learn an embedding for these integers, mapping each weight into a multidimensional embedding space $e_{ij} \in \mathbb{R}^{d_e}$. The TF-IDF value indicates how closely a case element node and a paragraph node are related, so we incorporate the original TF-IDF information directly into the GAT mechanism by updating the attention scores with Equation (5).
Combining the GAT layer with the multi-head operation yields $u_i$, and a residual connection is added to avoid vanishing gradients after several iterations:
$\tilde{h}_i = h_i + u_i$ (8)
We use the above GAT layer together with a position-wise feed-forward layer to iteratively update the node representations. Each iteration consists of a paragraph-to-case-element update and a case-element-to-paragraph update. After $t$ iterations, the resulting node features are collected in the matrix
$\tilde{H}_{pc} \in \mathbb{R}^{n_c \times (d_c + d_h)}$ (9)
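The following single-head PyTorch sketch shows one way Equations (4)–(8) can be realized, with the TF-IDF edge weights multiplied into the raw attention scores as in Equation (5); the dense adjacency representation and the direct use of real-valued weights (instead of the learned integer-bucket embeddings) are simplifications we assume for readability.

import torch
import torch.nn as nn
import torch.nn.functional as F

class EdgeWeightedGATLayer(nn.Module):
    """One GAT layer over the heterogeneous graph, Eqs. (4)-(8), single head."""
    def __init__(self, d_model=256):
        super().__init__()
        self.Wq = nn.Linear(d_model, d_model, bias=False)
        self.Wk = nn.Linear(d_model, d_model, bias=False)
        self.Wv = nn.Linear(d_model, d_model, bias=False)
        self.Wa = nn.Linear(2 * d_model, 1, bias=False)

    def forward(self, h, adj, edge_w):
        # h: (N, d) node features; adj: (N, N) 0/1 adjacency; edge_w: (N, N) TF-IDF weights.
        N = h.size(0)
        q = self.Wq(h).unsqueeze(1).expand(N, N, -1)     # W_q h_i broadcast over j
        k = self.Wk(h).unsqueeze(0).expand(N, N, -1)     # W_k h_j broadcast over i
        z = F.leaky_relu(self.Wa(torch.cat([q, k], dim=-1))).squeeze(-1)  # Eq. (4)
        z = z * edge_w                                    # Eq. (5): fold in edge weights
        z = z.masked_fill(adj == 0, float("-inf"))        # attend only to neighbours N_i
        alpha = torch.nan_to_num(torch.softmax(z, dim=-1))  # Eq. (6); isolated rows become 0
        u = torch.sigmoid(alpha @ self.Wv(h))             # Eq. (7)
        return h + u                                      # Eq. (8): residual connection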

3.4. Element Decoder Based on Two-Layer Attention

In the multi-document summarization task, the input source documents may contain a very large number of tokens. If the decoder computed attention weights over all tokens, the cost would be very high and the attention would be diluted. Therefore, this paper proposes a two-level decoding process that first attends to the case element nodes, which can be regarded as saliency indicators in the summarization process. This restricts token-level attention to a few passages, which further reduces redundancy compared with attending to all tokens. In the following, $i$ and $j$ denote case element nodes and paragraph nodes, respectively.
At each decoding step, with decoder state $s$, we compute the attention score of each case element node $c_i$:
$z_i = u_0^T \mathrm{LeakyReLU}([W_q s; W_k c_i])$ (10)
In Equation (10), $u_0^T$ is the transpose of a learnable parameter vector $u_0$ obtained through training.
$\tilde{z}_j = \sum_{i=1}^{m} z_i \times \tilde{e}_{ij}$ (11)
In Equation (11), $\tilde{e}_{ij}$ is the edge weight derived from the TF-IDF value matrix, and $\tilde{z}_j$ is the resulting score of paragraph node $j$, which realizes the information flow from the element nodes to the paragraph nodes.
$\beta_j = \frac{\exp(\tilde{z}_j)}{\sum_{l=1}^{m} \exp(\tilde{z}_l)}$ (12)
Equation (12) normalizes the scores to obtain the attention weights over the paragraph nodes. We select the paragraph node with the highest attention weight $\beta_j$ and then apply the attention mechanism to the $T_w$ tokens of the selected paragraph node.
$z_{wi} = u_1^T \mathrm{LeakyReLU}([W_q s_t; W_k \tilde{h}_{wi}])$ (13)
In Equation (13), $u_1^T$ is the transpose of a learnable parameter vector $u_1$ obtained through training, $z_{wi}$ is the token-level attention score under decoder state $s_t$, and $\tilde{h}_{wi}$ is the token-level context vector of the selected paragraph.
$\gamma_{wi} = \frac{\exp(z_{wi})}{\sum_{l=1}^{T_w} \exp(z_{wl})}$ (14)
Equation (14) normalizes the scores to obtain the token-level attention weights within the selected paragraph node.
$\hat{\gamma}_{wi} = \beta_j \times \gamma_{wi}$ (15)
In Equation (15), the final token-level weight of the two-level attention mechanism is obtained by combining the element-guided paragraph attention $\beta_j$ with the token-level attention $\gamma_{wi}$.
$v_t = \sum_{i} \hat{\gamma}_{wi} \tilde{h}_{wi}$ (16)
The context vector $v_t$ captures the salient content of the source documents for the current step; it is concatenated with the decoder hidden state $s_t$ to obtain the vocabulary distribution:
$P_{vocab} = \mathrm{Softmax}(W_o[s_t; v_t])$ (17)
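To clarify how the two levels interact, the following Python sketch runs one decoding step of Equations (10)–(17) on unbatched tensors; passing the projection matrices in explicitly and hard-selecting a single paragraph are simplifications we assume for illustration.

import torch
import torch.nn.functional as F

def two_level_attention_step(s, C, H_tokens, edge_w, Wq, Wk, u0, u1, Wo):
    """s: (d,) decoder state; C: (m, d) case-element nodes;
    H_tokens: (n, T_w, d) token vectors per paragraph; edge_w: (m, n) TF-IDF edge weights."""
    # Element-level attention, Eqs. (10)-(12).
    z = torch.stack([u0 @ F.leaky_relu(torch.cat([Wq @ s, Wk @ c])) for c in C])   # (m,)
    z_para = z @ edge_w                          # Eq. (11): propagate scores to paragraphs
    beta = torch.softmax(z_para, dim=0)          # Eq. (12): paragraph-node weights

    # Token-level attention inside the highest-scoring paragraph, Eqs. (13)-(15).
    j = int(beta.argmax())
    tokens = H_tokens[j]                         # (T_w, d) token vectors of paragraph j
    zw = torch.stack([u1 @ F.leaky_relu(torch.cat([Wq @ s, Wk @ w])) for w in tokens])
    gamma = beta[j] * torch.softmax(zw, dim=0)   # Eqs. (14)-(15)

    # Context vector and vocabulary distribution, Eqs. (16)-(17).
    v = gamma @ tokens                           # Eq. (16)
    return torch.softmax(Wo @ torch.cat([s, v]), dim=0)   # Eq. (17): P_vocab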

3.5. Parameter Training

The training process follows a traditional sequence-to-sequence model with maximum likelihood estimation as the loss function:
$L_{seq} = -\frac{1}{|D|} \sum_{(y, x) \in D} \log p(y|x; \theta)$ (18)
where $x$ and $y$ are the document–summary pairs from the training set $D$, and $\theta$ denotes the parameters to learn.
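In code, this objective reduces to the standard token-level cross-entropy under teacher forcing; the sketch below assumes a model that returns next-token logits and a batch dictionary with src and tgt id tensors, which are naming conventions of ours rather than the paper's.

import torch.nn.functional as F

def seq_nll_loss(model, batch, pad_id=0):
    """Negative log-likelihood of the reference summary given the source documents, Eq. (18)."""
    logits = model(batch["src"], batch["tgt"][:, :-1])     # predict token t from tokens < t
    gold = batch["tgt"][:, 1:]                              # shifted reference tokens
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           gold.reshape(-1), ignore_index=pad_id)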

4. Results

4.1. Case–Public Opinion Multi-Document Summary Dataset

Online public opinion information is generated continuously and accumulates rapidly, so a large amount of public opinion information related to each case exists on online platforms. The case–public opinion multi-document summarization dataset constructed in this paper is collected from the Internet with crawler technology, as follows:
First, perform a search in the Sogou Encyclopedia by case category (intentional homicide, robbery, kidnapping, drug trafficking, etc.), and use “intentional homicide” as an example; the search results are shown in Figure 2.
Figure 2. The search results of “intentional homicide” from the Sogou Encyclopedia.
Use Scrapy technology to find corresponding case names, remove non-case data, and form a case database containing ten types of case names. Use the case names in the case database to search in Baidu, and use the crawler technology to crawl the Baidu Encyclopedia case names and corresponding links. Taking the “Xinhuang Playground Burial Case” as an example, the search results are shown in Figure 3.
Figure 3. Baidu’s search results for “Xinhuang Playground Burial Case”. Box 1 shows the case description, and box 2 shows the key elements of the case.
Figure 3 shows the Baidu Encyclopedia page returned for the query “Xinhuang Playground Burial Case”. In the figure, the content in box 1 is defined as the reference summary, and the content in box 2 is defined as the case elements. The body texts come from the different web page links at the bottom of the Baidu Encyclopedia entry; crawling the content from these links, which describe the same case, yields a multi-document dataset. The main information collected includes release time, source, title, and body text. The structure of an opened link is shown in Figure 4.
Figure 4. Schematic diagram of the secondary page structure.
Crawler programs are used to collect public opinion news about the relevant cases starting from the Baidu Encyclopedia. Manual calibration and cleaning are performed to delete non-case data and remove noise such as “\n” characters. Finally, the case–public opinion summarization dataset is constructed. The dataset contains 4569 texts and 13,133 sentences, and the average length of the reference summaries is 190.94. The statistics of the dataset are shown in Table 2.
Table 2. Multi-document summary dataset information.
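For illustration, a minimal fetching-and-cleaning step for one linked news page might look like the sketch below; the use of requests and BeautifulSoup, the selectors, and the noise-removal rules are assumptions made for this example, not the crawler actually used.

import re
import requests
from bs4 import BeautifulSoup

def fetch_article(url):
    """Fetch one linked news page and keep its title and cleaned body text."""
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    title = soup.title.get_text(strip=True) if soup.title else ""
    # Concatenate paragraph tags and strip noise such as "\n" and extra spaces.
    body = " ".join(p.get_text(strip=True) for p in soup.find_all("p"))
    body = re.sub(r"\s+", " ", body)
    return {"url": url, "title": title, "text": body}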

4.2. Experimental Parameter Settings

This experiment is carried out on the case–public opinion multi-document summarization dataset, and the ROUGE [32] value is used to automatically evaluate summary quality, with ROUGE-1 (RG-1), ROUGE-2 (RG-2), and ROUGE-L (RG-L) as the evaluation indices. For the hyperparameters, the number of Transformer encoding layers is set to 6, the hidden size to 256, the number of heads to 8, and the hidden size of the feed-forward layer to 1024. We truncate the input paragraphs and case elements to 100 and 10 tokens, respectively. The multi-head pooling layer uses eight heads. In the graph encoder, each layer has eight heads and a hidden size of 256. Other training parameters are shown in Table 3.
Table 3. Model training parameter settings.
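As an aside on the evaluation setup, the RG-1, RG-2, and RG-L F-scores can be computed with the open-source rouge-score package as sketched below; the choice of package and the character-level tokenization for Chinese are our assumptions, since the paper only states that ROUGE [32] is used.

from rouge_score import rouge_scorer

def rouge_eval(reference, hypothesis):
    """Compute RG-1, RG-2, and RG-L F-scores for one reference/hypothesis pair."""
    # Split Chinese text into characters so that n-gram overlap is meaningful.
    ref = " ".join(reference.replace(" ", ""))
    hyp = " ".join(hypothesis.replace(" ", ""))
    scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"])
    scores = scorer.score(ref, hyp)
    return {name: s.fmeasure for name, s in scores.items()}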

4.3. Baseline Model Settings

To verify the effectiveness of the proposed method, we compare it with Transformer-based and graph-based summarization models:
(1) FT (flat transformer) is a six-layer Transformer encoder–decoder model. The titles and bodies of the case–public opinion documents are concatenated into one long text, and the first 800 tokens are taken as the model input.
(2) T-DMCA (transformer decoder with memory compressed attention) [33] uses a Transformer decoder and applies a convolutional layer to compress the keys and values in the self-attention mechanism.
(3) HT (hierarchical transformer) [29] is a Transformer architecture that can efficiently process multiple input documents by encoding them hierarchically; cross-document relationships are represented with an attention mechanism.
(4) GraphSum [34] is a graph-based multi-document abstractive summarization model that constructs topic relation graphs and discourse structure graphs and uses them to encode the documents, capturing the relationships between documents.

5. Discussion

5.1. Analysis of Experimental Results

The first set of experiments is a comparison experiment between the model in this paper and the four baseline models on the single-document and multi-document summary datasets of case–public opinion. The results are shown in Table 4.
Table 4. Baseline model comparison experiment.
The experimental results in the table above show the following: (1) Compared with the FT model, the RG-1 and RG-2 values increase by 2.53 and 2.66, respectively. In the FT model, truncating the input to the first 800 words may remove key information about the case's public opinion, so the model cannot fully summarize the theme of the articles. (2) Compared with the T-DMCA model, the RG-1 and RG-2 values improve by 1.59 and 1.56, respectively, because the multi-layer Transformer decoder in T-DMCA generates redundant information. In addition, the multi-layer decoder makes inference inefficient, whereas parallel matrix operations can improve the decoding speed. (3) Compared with the HT model, the model in this paper improves RG-1 and RG-2 by 0.87 and 1.02, respectively. The HT model introduces sentence-level and word-level Transformers to encode the case–public opinion text, and the increased number of parameters raises the model complexity. (4) Compared with the GraphSum model, the model in this paper improves RG-1 and RG-2 by 0.29 and 0.82, respectively. GraphSum also uses a graph structure to represent cross-sentence relationships between documents, but the model in this paper additionally incorporates case elements as auxiliary information, which is more effective: it reduces redundant information and guides the model to generate sentences closer to the topic of the documents.

5.2. Analysis of Ablation Experiments

To verify the effectiveness of the individual components, namely the graph encoder module and the two-level attention module, we conduct ablation experiments. The ablation results are shown in Table 5.
Table 5. Ablation experiments.
The “w/o graph encoder” setting denotes the experiment without the graph encoding module, in which the case element representations and the paragraph representations after the multi-head pooling layer are kept fixed. The “w/o two-level attention” setting denotes the experiment without the two-level attention, where token-level attention is applied directly while still attending to the case elements, which is a simple way of incorporating case element information. The results in Table 5 show the effectiveness of the newly introduced modules.
Table 6. Comparative experiments of different case element extraction methods.

5.3. Comparative Experimental Analysis of Different Case Element Extraction Methods

This experiment verifies the influence of different methods of obtaining case elements on summary generation. Keywords are extracted from the case text with the TF-IDF, TextRank, and named entity recognition (NER) algorithms, used as case elements, and integrated into the model of this paper to generate summaries. In the dataset constructed in this paper, the case elements of each case usually include the case name, victim, suspect, case time, crime location, and similar items, typically 5–8 keywords per case. The NER method takes person names, place names, organization names, and times as case elements. For the TF-IDF and TextRank methods, we use the top-8 keywords as the case elements. The results are shown in Table 6.
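For reference, the TF-IDF and TextRank baselines can be reproduced with the keyword extractors in the jieba toolkit, as in the sketch below; the choice of toolkit is our assumption, and the NER baseline would substitute a named entity recognizer behind the same interface.

import jieba.analyse

def extract_case_elements(text, method="tfidf", top_k=8):
    """Return top-k keywords from a case text to serve as case elements."""
    if method == "tfidf":
        return jieba.analyse.extract_tags(text, topK=top_k)   # TF-IDF keywords
    if method == "textrank":
        return jieba.analyse.textrank(text, topK=top_k)       # TextRank keywords
    raise ValueError("method must be 'tfidf' or 'textrank'")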
The table above shows the following: (1) Compared with the NER method, the RG-1 and RG-2 values of our model improve by 1.57 and 1.51, because the NER method retrieves a large amount of redundant information, which is not conducive to learning the graph attention and degrades summary performance. (2) Compared with the TF-IDF method, the RG-1 and RG-2 values improve by 1.44 and 1.29, respectively. TF-IDF is based on word frequency statistics; keywords can express the subject of an article and thereby help summarization, but the most frequent words in case–public opinion texts are not necessarily related to the text topics. (3) Compared with the TextRank method, the model in this paper improves RG-1 and RG-2 by 0.66 and 0.45, respectively. The gap to TextRank keyword extraction is small, so when multiple documents have no annotated case elements, TextRank can extract keywords that serve as case elements and can be integrated into our model to assist summary generation.

6. Conclusions

In this paper, a multi-document abstractive summarization model based on case element graph attention is proposed. In addition to text unit nodes, case element nodes are introduced to construct a heterogeneous graph, which helps the model capture the complex relationships between text units. A decoder with a two-level attention mechanism is also introduced: it first attends to the case element nodes and then uses those attention weights to guide the attention over the text units, which effectively handles the saliency and redundancy issues. In future research, we will continue to explore other methods, such as reinforcement learning, to further improve summary quality in the multi-document summarization setting; the model in this paper can also be applied to other tasks, such as multi-document question answering.

Author Contributions

Methodology, Y.H. and G.L.; Project administration, Z.Y.; Writing—original draft, Y.H. and S.H.; Writing—review and editing, Y.H. and S.H. All authors have read and agreed to the published version of the manuscript.

Funding

We would like to thank the anonymous reviewers for their constructive comments. This work was supported by the National Key Research and Development Program of China (grant numbers 2018YFC0830105, 2018YFC0830101, and 2018YFC0830100), the National Natural Science Foundation of China (grant numbers 62266027, U21B2027, and 61972186), the Yunnan provincial major science and technology special plan projects (grant number 202202AD080003), general projects of basic research in Yunnan Province (grant number 202001AT070047), and Kunming University of Science and Technology “double first-class” joint project (202201BE070001-021).

Institutional Review Board Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Lio, P.; Bengio, Y. Graph attention networks. arXiv 2017, arXiv:1710.10903. [Google Scholar]
  2. Barzilay, R.; McKeown, K.R. Sentence fusion for multidocument news summarization. Comput. Linguist. 2005, 31, 297–328. [Google Scholar] [CrossRef]
  3. Filippova, K.; Strube, M. Sentence fusion via dependency graph compression. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, Honolulu, HI, USA, 25–27 October 2008; pp. 177–185. [Google Scholar]
  4. Banerjee, S.; Mitra, P.; Sugiyama, K. Multi-document abstractive summarization using ilp based multi-sentence compression. In Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, Buenos Aires, Argentina, 25–31 July 2015. [Google Scholar]
  5. Li, W. Abstractive multi-document summarization with semantic information extraction. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, 17–21 September 2015; pp. 1908–1913. [Google Scholar]
  6. Bing, L.; Li, P.; Liao, Y.; Lam, W.; Guo, W.; Passonneau, R.J. Abstractive multi-document summarization via phrase selection and merging. arXiv 2015, arXiv:1506.01597. [Google Scholar]
  7. Cohn, T.A.; Lapata, M. Sentence compression as tree transduction. J. Artif. Intell. Res. 2009, 34, 637–674. [Google Scholar] [CrossRef]
  8. Wang, L.; Cardie, C. Domain-independent abstract generation for focused meeting summarization. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Sofia, Bulgaria, 4–9 August 2013; pp. 1395–1405. [Google Scholar]
  9. Pighin, D.; Cornolti, M.; Alfonseca, E.; Filippova, K. Modelling events through memory-based, open-ie patterns for abstractive summarization. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Baltimore, MD, USA, 22–27 June 2014; pp. 892–901. [Google Scholar]
  10. Paulus, R.; Xiong, C.; Socher, R. A deep reinforced model for abstractive summarization. arXiv 2017, arXiv:1705.04304. [Google Scholar]
  11. Gehrmann, S.; Deng, Y.; Rush, A.M. Bottom-up abstractive summarization. arXiv 2018, arXiv:1808.10792. [Google Scholar]
  12. Li, W.; Xiao, X.; Lyu, Y.; Wang, Y. Improving neural abstractive document summarization with structural regularization. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 31 October–4 November 2018; pp. 4078–4087. [Google Scholar]
  13. Zhang, X.; Wei, F.; Zhou, M. HIBERT: Document level pre-training of hierarchical bidirectional transformers for document summarization. arXiv 2019, arXiv:1905.06566. [Google Scholar]
  14. Raffel, C.; Shazeer, N.; Roberts, A.; Lee, K.; Narang, S.; Matena, M.; Liu, P.J. Exploring the limits of transfer learning with a unified text-to-text transformer. Mach. Learn. Res. 2020, 21, 1–67. [Google Scholar]
  15. Lewis, M.; Liu, Y.; Goyal, N.; Ghazvininejad, M.; Mohamed, A.; Levy, O.; Stoyanov, V.; Zettlemoyer, L. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension. arXiv 2019, arXiv:1910.13461. [Google Scholar]
  16. Zhang, J.; Zhao, Y.; Saleh, M.; Liu, P. PEGASUS: Pre-training with extracted gap-sentences for abstractive summarization. In Proceedings of the International Conference on Machine Learning (ICML), PMLR, 2020; pp. 11328–11339. Available online: http://proceedings.mlr.press/v119/zhang20ae/zhang20ae.pdf (accessed on 1 January 2023).
  17. Zou, Y.; Zhang, X.; Lu, W.; Wei, F.; Zhou, M. Pre-training for abstractive document summarization by reinstating source text. arXiv 2020, arXiv:2004.01853. [Google Scholar]
  18. Grail, Q.; Perez, J.; Gaussier, E. Globalizing BERT-based transformer architectures for long document summarization. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, Online, 19–23 April 2021; pp. 1792–1810. [Google Scholar]
  19. Erkan, G.; Radev, D.R. Lexrank: Graph-based lexical centrality as salience in text summarization. J. Artif. Intell. Res. 2004, 22, 457–479. [Google Scholar] [CrossRef]
  20. Wan, X. An exploration of document impact on graph-based multi-document summarization. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, Honolulu, HI, USA, 25–27 October 2008; pp. 755–762. [Google Scholar]
  21. Christensen, J.; Soderland, S.; Etzioni, O. Towards coherent multi-document summarization. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Atlanta, GA, USA, 9–14 June 2013; pp. 1163–1173. [Google Scholar]
  22. Tan, J.; Wan, X.; Xiao, J. Abstractive document summarization with a graph-based attentional neural model. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Vancouver, BC, Canada, 30 July–4 August 2017; pp. 1171–1181. [Google Scholar]
  23. Yasunaga, M.; Zhang, R.; Meelu, K.; Pareek, A.; Srinivasan, K.; Radev, D. Graph-based neural multi-document summarization. arXiv 2017, arXiv:1706.06681. [Google Scholar]
  24. Fan, A.; Gardent, C.; Braud, C.; Bordes, A. Using local knowledge graph construction to scale seq2seq models to multi-document inputs. arXiv 2019, arXiv:1910.08435. [Google Scholar]
  25. Huang, L.; Wu, L.; Wang, L. Knowledge graph-augmented abstractive summarization with semantic-driven cloze reward. arXiv 2020, arXiv:2005.01159. [Google Scholar]
  26. Wang, D.; Liu, P.; Zheng, Y.; Qiu, X.; Huang, X. Heterogeneous graph neural networks for extractive document summarization. arXiv 2020, arXiv:2004.12393. [Google Scholar]
  27. Song, Z.; King, I. Hierarchical Heterogeneous Graph Attention Network for Syntax-Aware Summarization. In Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA, 7–14 February 2022; pp. 11340–11348. [Google Scholar]
  28. Jin, H.; Wang, T.; Wan, X. Multi-granularity interaction network for extractive and abstractive multi-document summarization. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; pp. 6244–6254. [Google Scholar]
  29. Liu, Y.; Lapata, M. Hierarchical transformers for multi-document summarization. arXiv 2019, arXiv:1905.13164. [Google Scholar]
  30. Radford, A.; Wu, J.; Child, R.; Luan, D.; Amodei, D.; Sutskever, I. Language models are unsupervised multitask learners. OpenAI blog 2019, 1, 9. [Google Scholar]
  31. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30. [Google Scholar]
  32. Lin, C.Y. ROUGE: A package for automatic evaluation of summaries. In Proceedings of the Workshop Text Summarization Branches Out, Barcelona, Spain, July 2004; pp. 74–81. Available online: https://aclanthology.org/W04-1013.pdf (accessed on 1 January 2023).
  33. Liu, P.J.; Saleh, M.; Pot, E.; Goodrich, B.; Sepassi, R.; Kaiser, L.; Shazeer, N. Generating wikipedia by summarizing long sequences. arXiv 2018, arXiv:1801.10198. [Google Scholar]
  34. Li, W.; Xiao, X.; Liu, J.; Wu, H.; Wang, H.; Du, J. Leveraging graph to improve abstractive multi-document summarization. arXiv 2020, arXiv:2005.10043. [Google Scholar]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
