Open Access Article
Information 2014, 5(4), 634-651; doi:10.3390/info5040634

Deep Web Search Interface Identification: A Semi-Supervised Ensemble Approach

School of Mathematics & Statistics, Central South University, Changsha 410075, China
* Author to whom correspondence should be addressed.
Received: 30 October 2014 / Revised: 24 November 2014 / Accepted: 28 November 2014 / Published: 1 December 2014
(This article belongs to the Section Information and Communications Technology)

Abstract

To surface the Deep Web, one crucial task is to predict whether a given web page has a search interface (a searchable HyperText Markup Language (HTML) form). Previous studies have focused on supervised classification with labeled examples. However, labeled data are scarce, hard to obtain, and require tedious manual work, while unlabeled HTML forms are abundant and easy to obtain. In this research, we consider the feasibility of using both labeled and unlabeled data to train better models that identify search interfaces more effectively. We present a semi-supervised co-training ensemble learning approach that uses both neural networks and decision trees to address the search interface identification problem. We show that the proposed model outperforms previous methods that use only labeled data. We also show that adding unlabeled data improves the effectiveness of the proposed model.
Keywords: semi-supervised learning; Deep Web mining; search interface identification; ensemble learning
This is an open access article distributed under the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
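
The co-training idea summarized in the abstract can be illustrated with a minimal sketch. This is a generic co-training loop, not the authors' algorithm: a one-level decision stump stands in for the decision trees, a single logistic neuron stands in for the neural networks, and the two numeric features per form (for example, a normalized count of text inputs and a "search" keyword score) are hypothetical. All class names, function names, and parameter values below are assumptions made for illustration only.

```python
import math


class Stump:
    """One-level decision tree (a stand-in for a full decision tree)."""

    def fit(self, X, y):
        # Fallback when no split exists: one leaf with the smoothed prior.
        self.j, self.t = 0, float("inf")
        self.p_left = self.p_right = (sum(y) + 1) / (len(y) + 2)
        best_err = None
        for j in range(len(X[0])):
            values = sorted({x[j] for x in X})
            for lo, hi in zip(values, values[1:]):
                t = (lo + hi) / 2  # candidate threshold between two values
                left = [yi for x, yi in zip(X, y) if x[j] <= t]
                right = [yi for x, yi in zip(X, y) if x[j] > t]
                err = (min(sum(left), len(left) - sum(left))
                       + min(sum(right), len(right) - sum(right)))
                if best_err is None or err < best_err:
                    best_err, self.j, self.t = err, j, t
                    # Laplace-smoothed P(y = 1) in each leaf.
                    self.p_left = (sum(left) + 1) / (len(left) + 2)
                    self.p_right = (sum(right) + 1) / (len(right) + 2)
        return self

    def predict_proba(self, x):
        return self.p_left if x[self.j] <= self.t else self.p_right


class LogisticNeuron:
    """Single sigmoid unit trained by SGD (a stand-in for a neural network)."""

    def __init__(self, epochs=500, lr=0.5):
        self.epochs, self.lr = epochs, lr

    def fit(self, X, y):
        self.w, self.b = [0.0] * len(X[0]), 0.0
        for _ in range(self.epochs):
            for x, yi in zip(X, y):
                g = self.predict_proba(x) - yi  # gradient of the log loss
                self.w = [w - self.lr * g * xi for w, xi in zip(self.w, x)]
                self.b -= self.lr * g
        return self

    def predict_proba(self, x):
        z = self.b + sum(w * xi for w, xi in zip(self.w, x))
        z = max(-30.0, min(30.0, z))  # clamp to keep exp() well behaved
        return 1.0 / (1.0 + math.exp(-z))


def co_train(learners, X_lab, y_lab, X_unl, rounds=5, per_round=2, threshold=0.7):
    """Each learner pseudo-labels its most confident pool items for the others."""
    train = [(list(X_lab), list(y_lab)) for _ in learners]
    pool = list(X_unl)
    for _ in range(rounds):
        for model, (X, y) in zip(learners, train):
            model.fit(X, y)
        moved = False
        for i, teacher in enumerate(learners):
            ranked = sorted(pool, key=lambda x: abs(teacher.predict_proba(x) - 0.5),
                            reverse=True)
            for x in ranked[:per_round]:
                p = teacher.predict_proba(x)
                if max(p, 1 - p) < threshold:
                    continue  # not confident enough to teach the others
                pool.remove(x)
                for j, (X, y) in enumerate(train):
                    if j != i:
                        X.append(x)
                        y.append(int(p >= 0.5))
                moved = True
        if not moved:
            break
    for model, (X, y) in zip(learners, train):
        model.fit(X, y)  # final fit on the enlarged training sets
    return learners


def ensemble_predict(learners, x):
    """Average the learners' probabilities and threshold at 0.5."""
    return int(sum(m.predict_proba(x) for m in learners) / len(learners) >= 0.5)


# Toy data: label 1 = searchable form, label 0 = non-searchable form.
X_lab = [[0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.9, 1.0]]
y_lab = [0, 0, 1, 1]
X_unl = [[0.1, 0.2], [0.05, 0.0], [0.95, 0.8], [0.8, 1.0], [0.15, 0.1], [1.0, 1.0]]
models = co_train([Stump(), LogisticNeuron()], X_lab, y_lab, X_unl)
```

Calling `ensemble_predict(models, features)` on a new form's feature vector returns 1 for a predicted search interface and 0 otherwise. Giving each learner its own growing training set means each one is taught by the other's confident predictions rather than by its own, which is the core of co-training.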


MDPI and ACS Style

Wang, H.; Xu, Q.; Zhou, L. Deep Web Search Interface Identification: A Semi-Supervised Ensemble Approach. Information 2014, 5, 634-651.


Information, EISSN 2078-2489, published by MDPI AG, Basel, Switzerland.