Identifying High Quality Document–Summary Pairs through Text Matching
Abstract
Text summarization, namely automatically generating a short summary of a given document, is a difficult task in natural language processing. Deep learning has recently been deployed for text summarization, but there is still a lack of large-scale, high-quality datasets for this technique. In this paper, we proposed a novel deep learning method to identify high-quality document–summary pairs for building a large-scale paired dataset. Concretely, a long short-term memory (LSTM)-based model was designed to measure the quality of document–summary pairs. In order to leverage information across all parts of each document, we further proposed an improved LSTM-based model that removes the forget gate from the LSTM unit. Experiments conducted on the training and test sets built upon Sina Weibo (a Chinese microblog website similar to Twitter) showed that the LSTM-based models significantly outperformed baseline models with regard to the area under the receiver operating characteristic curve (AUC).
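The key modification the abstract describes, removing the forget gate from the LSTM unit, can be illustrated with a minimal NumPy sketch. In a standard LSTM the cell state is updated as c_t = f_t * c_{t-1} + i_t * g_t; fixing f_t = 1 lets the cell state accumulate information from every earlier position in the document rather than gradually forgetting it. The weight names, shapes, and surrounding setup below are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step_no_forget(x, h_prev, c_prev, W, b):
    """One step of an LSTM cell with the forget gate removed.

    Standard LSTM: c_t = f_t * c_{t-1} + i_t * g_t.
    Without a forget gate (f_t fixed to 1), the cell state simply
    accumulates: c_t = c_{t-1} + i_t * g_t, so information from all
    parts of the input sequence is retained.
    """
    z = np.concatenate([x, h_prev])        # joint input/recurrent vector
    i = sigmoid(W["i"] @ z + b["i"])       # input gate
    o = sigmoid(W["o"] @ z + b["o"])       # output gate
    g = np.tanh(W["g"] @ z + b["g"])       # candidate cell state
    c = c_prev + i * g                     # no forget gate: f_t == 1
    h = o * np.tanh(c)                     # hidden state
    return h, c

# Toy usage with random weights (dimensions are arbitrary).
rng = np.random.default_rng(0)
n_in, n_hid = 4, 3
W = {k: rng.standard_normal((n_hid, n_in + n_hid)) * 0.1 for k in "iog"}
b = {k: np.zeros(n_hid) for k in "iog"}

h, c = np.zeros(n_hid), np.zeros(n_hid)
for t in range(5):                         # run over a short input sequence
    x = rng.standard_normal(n_in)
    h, c = lstm_step_no_forget(x, h, c, W, b)
```

In a matching model along the lines the abstract suggests, two such recurrent encoders (one per side of the document–summary pair) would feed a scoring layer; that surrounding architecture is not specified here and the sketch covers only the modified cell update.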
Hou, Y.; Xiang, Y.; Tang, B.; Chen, Q.; Wang, X.; Zhu, F. Identifying High Quality Document–Summary Pairs through Text Matching. Information 2017, 8, 64.
Note that from the first issue of 2016, MDPI journals use article numbers instead of page numbers.