Open Access Article
Information 2017, 8(2), 64

Identifying High Quality Document–Summary Pairs through Text Matching

Intelligence Computing Research Center, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen 518055, China
Author to whom correspondence should be addressed.
Academic Editors: Parma Nand and Rivindu Perera
Received: 28 March 2017 / Revised: 3 June 2017 / Accepted: 7 June 2017 / Published: 12 June 2017
(This article belongs to the Special Issue Text Mining Applications and Theory)


Text summarization, namely automatically generating a short summary of a given document, is a difficult task in natural language processing. Deep learning has gradually been applied to text summarization, but large-scale, high-quality datasets for training such models are still lacking. In this paper, we propose a novel deep learning method to identify high-quality document–summary pairs for building a large-scale dataset of such pairs. Concretely, a long short-term memory (LSTM)-based model is designed to measure the quality of document–summary pairs. To leverage information across all parts of each document, we further propose an improved LSTM-based model that removes the forget gate from the LSTM unit. Experiments on training and test sets built from Sina Weibo (a Chinese microblog website similar to Twitter) show that the LSTM-based models significantly outperform the baseline models in terms of the area under the receiver operating characteristic curve (AUC).
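The idea of removing the forget gate can be illustrated with a minimal sketch. The abstract does not give the model's equations, so the code below is only one common reading, in which the previous cell state is carried forward unscaled (as if the forget gate were fixed to 1). The function names, the parameter layout, and the plain-list arithmetic are all assumptions made for illustration and are not taken from the paper.

```python
import math


def sigmoid(x):
    """Logistic function used for the gates."""
    return 1.0 / (1.0 + math.exp(-x))


def affine(W, U, b, x, h):
    """Return W @ x + U @ h + b for plain Python lists."""
    return [
        sum(w * xi for w, xi in zip(W_row, x))
        + sum(u * hi for u, hi in zip(U_row, h))
        + b_i
        for W_row, U_row, b_i in zip(W, U, b)
    ]


def lstm_step_no_forget(params, x, h_prev, c_prev):
    """One step of an LSTM unit without a forget gate (hypothetical sketch).

    params maps "i" (input gate), "o" (output gate) and "g" (candidate)
    to a (W, U, b) tuple. This layout is an assumption for illustration.
    """
    i = [sigmoid(v) for v in affine(*params["i"], x, h_prev)]
    o = [sigmoid(v) for v in affine(*params["o"], x, h_prev)]
    g = [math.tanh(v) for v in affine(*params["g"], x, h_prev)]
    # No forget gate: the old cell state is added in full, so information
    # from early parts of the document is never scaled down.
    c = [cp + ii * gi for cp, ii, gi in zip(c_prev, i, g)]
    h = [oi * math.tanh(ci) for oi, ci in zip(o, c)]
    return h, c


def encode(params, sequence, hidden_size):
    """Run the unit over a sequence of input vectors; return the final state."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    for x in sequence:
        h, c = lstm_step_no_forget(params, x, h, c)
    return h, c
```

In this sketch the cell state can only change by the gated candidate term, which is one way to keep contributions from every position of a document. A matching score for a document–summary pair would still need a comparison layer on top of the encoded states, which the abstract does not specify.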
Keywords: text summarization; deep learning; long short-term memory; noise reduction

Figure 1

This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).


MDPI and ACS Style

Hou, Y.; Xiang, Y.; Tang, B.; Chen, Q.; Wang, X.; Zhu, F. Identifying High Quality Document–Summary Pairs through Text Matching. Information 2017, 8, 64.



Information, EISSN 2078-2489, published by MDPI AG, Basel, Switzerland