Open Access Article

Factoid Question Answering with Distant Supervision

Key Laboratory of Technology in Geo-spatial Information Processing and Application System, Institute of Electronics, Chinese Academy of Sciences, Beijing 100190, China
School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing 100049, China
Institute of Electronics, Chinese Academy of Sciences, Suzhou, Suzhou 215123, China
Author to whom correspondence should be addressed.
Entropy 2018, 20(6), 439
Received: 9 March 2018 / Revised: 23 May 2018 / Accepted: 3 June 2018 / Published: 5 June 2018
(This article belongs to the Section Information Theory, Probability and Statistics)


Automatic question answering (QA), which can greatly facilitate access to information, is an important task in artificial intelligence. Recent years have witnessed the development of QA methods based on deep learning. However, large amounts of data are needed to train deep neural networks, and annotating training data for factoid QA in new domains or languages is laborious. In this paper, a distantly supervised method is proposed to generate QA pairs automatically. Additional effort is made to ensure that the generated questions reflect the query interests and expression styles of users by exploiting community QA data. Specifically, the generated questions are selected according to the estimated probability that they would be asked. Because a model trained on monotonous synthetic questions is very sensitive to variations in question phrasing, diverse question paraphrases are mined from community QA data. Experimental results show that a model trained solely on data generated via distant supervision and on mined paraphrases can answer real-world questions with an accuracy of 49.34%. When only limited annotated training data is available, incorporating the generated data yields significant improvements. Even on WebQA, a dataset with large-scale annotated training samples, an improvement of 1.35 absolute points is still observed.
Keywords: distant supervision; question answering; reading comprehension; question paraphrase


This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).


MDPI and ACS Style

Zhang, H.; Liang, X.; Xu, G.; Fu, K.; Li, F.; Huang, T. Factoid Question Answering with Distant Supervision. Entropy 2018, 20, 439.



Entropy, EISSN 1099-4300, published by MDPI AG, Basel, Switzerland