Open Access Article
Information 2016, 7(1), 12; doi:10.3390/info7010012

CoSpa: A Co-training Approach for Spam Review Identification with Support Vector Machine

1
Research Center on Big Data Sciences, Beijing University of Chemical Technology, Beijing 100029, China
2
School of Economics and Management, Beijing University of Chemical Technology, Beijing 100029, China
3
School of Knowledge Science, Japan Advanced Institute of Science and Technology, 1-1 Asahidai, Nomi City, Ishikawa 923-1292, Japan
4
Institute of Policy and Management, Chinese Academy of Sciences, Beijing 100190, China
*
Author to whom correspondence should be addressed.
Academic Editor: Willy Susilo
Received: 29 January 2016 / Revised: 23 February 2016 / Accepted: 26 February 2016 / Published: 9 March 2016

Abstract

Spam reviews increasingly appear on the Internet to promote sales or defame competitors by misleading consumers with deceptive opinions. This paper proposes a co-training approach called CoSpa (Co-training for Spam review identification) that identifies spam reviews using two views: the lexical terms derived from the textual content of the reviews, and the PCFG (Probabilistic Context-Free Grammar) rules derived from a deep syntactic analysis of the reviews. Using SVM (Support Vector Machine) as the base classifier, we develop two strategies, CoSpa-C and CoSpa-U, embedded within the CoSpa approach. The CoSpa-C strategy selects the unlabeled reviews classified with the largest confidence to augment the training dataset and retrain the classifier. The CoSpa-U strategy randomly selects unlabeled reviews with a uniform distribution of confidence. Experiments on the spam dataset and the deception dataset demonstrate that both proposed CoSpa algorithms outperform the traditional SVM with lexical terms and PCFG rules in spam review identification. Moreover, the CoSpa-U strategy outperforms the CoSpa-C strategy when the absolute value of the SVM decision function is used as the confidence.
Keywords: co-training; PCFG; spam review; CoSpa; support vector machine
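
The two selection strategies named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the label convention (+1 for spam, -1 otherwise), and the example scores are hypothetical, and CoSpa-U is read here as drawing reviews uniformly at random from the unlabeled pool, which is one possible interpretation of the abstract. The only detail taken from the source is that confidence is the absolute value of the SVM decision function.

```python
import random
from typing import List, Sequence


def confidence(decision_values: Sequence[float]) -> List[float]:
    """Confidence of each prediction: absolute value of the SVM decision value."""
    return [abs(v) for v in decision_values]


def select_most_confident(decision_values: Sequence[float], k: int) -> List[int]:
    """CoSpa-C-style selection: indices of the k most confident unlabeled reviews."""
    conf = confidence(decision_values)
    ranked = sorted(range(len(conf)), key=lambda i: conf[i], reverse=True)
    return ranked[:k]


def select_uniform(decision_values: Sequence[float], k: int,
                   rng: random.Random) -> List[int]:
    """CoSpa-U-style selection (assumed reading): k indices drawn uniformly
    at random from the unlabeled pool, regardless of confidence."""
    k = min(k, len(decision_values))
    return sorted(rng.sample(range(len(decision_values)), k))


def pseudo_labels(decision_values: Sequence[float],
                  indices: Sequence[int]) -> List[int]:
    """Pseudo-label the selected reviews by the sign of the decision value
    (assumed convention: +1 spam, -1 non-spam)."""
    return [1 if decision_values[i] >= 0 else -1 for i in indices]


# Hypothetical decision values for five unlabeled reviews.
scores = [0.1, -2.3, 1.7, -0.4, 0.9]

top = select_most_confident(scores, 2)
print("CoSpa-C picks:", top, "labels:", pseudo_labels(scores, top))

picked = select_uniform(scores, 2, random.Random(0))
print("CoSpa-U picks:", picked, "labels:", pseudo_labels(scores, picked))
```

In a full co-training loop, the selected reviews and their pseudo-labels would be added to the training set and the classifier for each view retrained; that loop is omitted here because the abstract does not specify its details.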
This is an open access article distributed under the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


MDPI and ACS Style

Zhang, W.; Bu, C.; Yoshida, T.; Zhang, S. CoSpa: A Co-training Approach for Spam Review Identification with Support Vector Machine. Information 2016, 7, 12.


Note that from the first issue of 2016, MDPI journals use article numbers instead of page numbers.

Information, EISSN 2078-2489. Published by MDPI AG, Basel, Switzerland.