CoSpa: A Co-training Approach for Spam Review Identification with Support Vector Machine
Abstract: Spam reviews are increasingly appearing on the Internet to promote sales or defame competitors by misleading consumers with deceptive opinions. This paper proposes a co-training approach called CoSpa (Co-training for Spam review identification) that identifies spam reviews from two views: the lexical terms derived from the textual content of the reviews, and the PCFG (Probabilistic Context-Free Grammar) rules derived from a deep syntactic analysis of the reviews. Using SVM (Support Vector Machine) as the base classifier, we develop two strategies, CoSpa-C and CoSpa-U, embedded within the CoSpa approach. The CoSpa-C strategy selects the unlabeled reviews classified with the largest confidence and adds them to the training dataset to retrain the classifier. The CoSpa-U strategy randomly selects unlabeled reviews with a uniform distribution of confidence. Experiments on the spam dataset and the deception dataset demonstrate that both proposed CoSpa algorithms outperform the traditional SVM with lexical terms and PCFG rules in spam review identification. Moreover, the CoSpa-U strategy outperforms the CoSpa-C strategy when the absolute value of the SVM decision function is used as the confidence.
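The two selection strategies named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the SVM decision values for the unlabeled reviews are already available as a plain list, it takes the absolute decision value as the confidence (as the abstract states), and it reads CoSpa-U as drawing reviews uniformly at random regardless of confidence, which is only one possible interpretation. The function names, the example scores, and the convention that a non-negative decision value means the positive class are all hypothetical.

```python
import random


def select_confident(scores, k):
    """CoSpa-C-style pick: indices of the k reviews with the largest |score|."""
    ranked = sorted(range(len(scores)), key=lambda i: abs(scores[i]), reverse=True)
    return ranked[:k]


def select_uniform(scores, k, rng=None):
    """CoSpa-U-style pick (assumed reading): k indices drawn uniformly at random."""
    rng = rng or random.Random()
    return rng.sample(range(len(scores)), min(k, len(scores)))


def pseudo_label(scores, indices):
    """Label each selected review by the sign of its decision value (assumed convention)."""
    return [(i, 1 if scores[i] >= 0 else -1) for i in indices]


# Hypothetical decision values for six unlabeled reviews (not data from the paper).
scores = [2.1, -0.2, 0.05, -1.7, 0.9, -3.0]

confident = select_confident(scores, 2)
print(pseudo_label(scores, confident))  # [(5, -1), (0, 1)]
```

In a full co-training loop, the pseudo-labeled reviews chosen from one view's classifier would be added to the training set before retraining; that loop and the SVM training itself are left out of this sketch.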
Cite This Article
Zhang, W.; Bu, C.; Yoshida, T.; Zhang, S. CoSpa: A Co-training Approach for Spam Review Identification with Support Vector Machine. Information 2016, 7, 12.
Zhang W, Bu C, Yoshida T, Zhang S. CoSpa: A Co-training Approach for Spam Review Identification with Support Vector Machine. Information. 2016; 7(1):12.
Chicago/Turabian Style
Zhang, Wen; Bu, Chaoqi; Yoshida, Taketoshi; Zhang, Siguang. 2016. "CoSpa: A Co-training Approach for Spam Review Identification with Support Vector Machine." Information 7, no. 1: 12.
Note that from the first issue of 2016, MDPI journals use article numbers instead of page numbers.