Open Access Article
Computers 2016, 5(4), 29; doi:10.3390/computers5040029

An Improved Retrievability-Based Cluster-Resampling Approach for Pseudo Relevance Feedback

Information Management Department, College of Computer and Information Sciences, Imam Muhammad Ibn Saud University, Riyadh 11564, Saudi Arabia
Academic Editor: Kartik Gopalan
Received: 31 July 2016 / Revised: 3 November 2016 / Accepted: 10 November 2016 / Published: 15 November 2016
Abstract

Cluster-based pseudo-relevance feedback (PRF) is an effective approach for finding relevant documents for relevance feedback. The standard approach constructs clusters for PRF solely on the basis of high similarity between retrieved documents. It works well as long as the retrieval bias of the retrieval model does not affect the retrievability of documents. In our experiments we observed that when a collection exhibits retrieval bias, the highly retrievable documents of clusters are frequently retrieved at top positions for most queries, and these drift the relevance feedback away from relevant documents. To reduce this retrieval-bias noise, we enhance the standard cluster construction approach by constructing clusters on the basis of both high similarity and retrievability. We call this retrievability and cluster-based PRF. This enhanced approach keeps only those documents in the clusters that are not frequently retrieved due to retrieval bias. Although this approach improves effectiveness, it penalizes highly retrievable documents even if they are the most relevant to the clusters. To handle this problem, in a second approach we extend the basic retrievability concept by mining frequent neighbors of the clusters. The frequent neighbors approach keeps only those documents in the clusters that are frequently retrieved with other neighbors of the clusters and infrequently retrieved with documents that are not part of the clusters. Experimental results show that the two proposed extensions help identify relevant documents for relevance feedback and increase the effectiveness of queries.
Keywords: document clustering; machine learning; information retrieval; pseudo-relevance feedback; query expansion; retrieval bias; retrievability measure
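
The first extension described in the abstract, keeping only cluster members that are not frequently retrieved, can be illustrated with a minimal Python sketch. It assumes retrievability is measured as a cumulative count of how often a document appears in the top `cutoff` results across a set of queries, which is one common formulation and not necessarily the exact measure used in the article. The function names, the `max_score` threshold, and the toy rankings are hypothetical; the similarity-based clustering step and the frequent-neighbors extension are not shown.

```python
from collections import Counter


def retrievability(rankings, cutoff):
    """Count how often each document appears in the top `cutoff` results.

    `rankings` maps a query id to its ranked list of document ids.
    This cumulative count is an assumed, simplified retrievability measure.
    """
    counts = Counter()
    for ranked_docs in rankings.values():
        counts.update(ranked_docs[:cutoff])
    return counts


def filter_cluster(cluster, scores, max_score):
    """Keep cluster members whose retrievability does not exceed `max_score`.

    `max_score` is a hypothetical threshold chosen for illustration only.
    """
    return [doc for doc in cluster if scores[doc] <= max_score]


# Toy ranked lists for three queries (illustrative data, not from the article).
rankings = {
    "q1": ["d1", "d2", "d3"],
    "q2": ["d1", "d4", "d2"],
    "q3": ["d1", "d5", "d6"],
}

scores = retrievability(rankings, cutoff=2)
cluster = ["d1", "d2", "d4"]
kept = filter_cluster(cluster, scores, max_score=2)
print(kept)
```

In this toy data, `d1` appears in the top two results of every query, so it is dropped from the cluster while `d2` and `d4` are kept. This also illustrates the drawback the abstract notes: `d1` is removed regardless of how relevant it is to the cluster.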
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).

MDPI and ACS Style

Bashir, S. An Improved Retrievability-Based Cluster-Resampling Approach for Pseudo Relevance Feedback. Computers 2016, 5, 29.

Note that from the first issue of 2016, MDPI journals use article numbers instead of page numbers.

Computers, EISSN 2073-431X. Published by MDPI AG, Basel, Switzerland.