CoFea: A Novel Approach to Spam Review Identification Based on Entropy and Co-Training
Abstract
With the rapid development of electronic commerce, spam reviews are proliferating on the Internet, written to manipulate online customers' opinions of the goods being sold. This paper proposes a novel approach, called CoFea (Co-training by Features), that identifies spam reviews using entropy and the co-training algorithm. After sorting all lexical terms of the reviews by entropy, we produce two views of the reviews by dividing the sorted terms into two subsets: one contains the odd-numbered terms and the other contains the even-numbered terms. Using SVM (support vector machine) as the base classifier, we further propose two strategies embedded in the CoFea approach, CoFea-T and CoFea-S. The CoFea-T strategy uses all terms in the two subsets for spam review identification by SVM. The CoFea-S strategy uses only a predefined number of terms with small entropy. The experimental results show that CoFea-T achieves higher accuracy than CoFea-S, while CoFea-S requires less computing time than CoFea-T and still achieves acceptable accuracy in spam review identification.
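The view construction described in the abstract can be illustrated with a minimal sketch. It covers only the entropy ranking and the odd/even split, not the co-training loop or the SVM classifiers. The abstract does not state how term entropy is defined, so the sketch assumes the Shannon entropy of a term's document occurrences over the class labels of a small labeled set. The function names `term_entropy` and `split_views` and the `top_k` parameter are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def term_entropy(labeled_reviews):
    """Return {term: entropy in bits} over class labels.

    ASSUMPTION: entropy is computed from the number of documents of each
    class that contain the term. The paper's exact definition may differ.
    """
    doc_counts = defaultdict(Counter)  # term -> Counter(label -> doc count)
    for tokens, label in labeled_reviews:
        for term in set(tokens):  # count each term once per review
            doc_counts[term][label] += 1

    entropy = {}
    for term, by_label in doc_counts.items():
        total = sum(by_label.values())
        entropy[term] = sum(
            -(n / total) * math.log2(n / total) for n in by_label.values()
        )
    return entropy


def split_views(entropy, top_k=None):
    """Sort terms by ascending entropy and split them into two views.

    top_k=None keeps every term (in the spirit of CoFea-T); an integer
    keeps only the top_k lowest-entropy terms (in the spirit of CoFea-S).
    Ties are broken alphabetically so the result is deterministic.
    """
    ranked = sorted(entropy, key=lambda t: (entropy[t], t))
    if top_k is not None:
        ranked = ranked[:top_k]
    view_odd = ranked[0::2]   # 1st, 3rd, 5th, ... ranked terms
    view_even = ranked[1::2]  # 2nd, 4th, 6th, ... ranked terms
    return view_odd, view_even


if __name__ == "__main__":
    # Toy labeled reviews, invented purely for illustration.
    reviews = [
        (["great", "hotel", "amazing"], "spam"),
        (["amazing", "stay", "hotel"], "spam"),
        (["room", "small", "hotel"], "truthful"),
        (["small", "room", "clean"], "truthful"),
    ]
    scores = term_entropy(reviews)
    print(split_views(scores))           # all terms
    print(split_views(scores, top_k=4))  # only the 4 lowest-entropy terms
```

In a co-training setup, each view would then be used to train its own classifier, and each classifier would label unlabeled reviews for the other. Interleaving the ranked terms gives both views a comparable mix of low- and high-entropy terms, which is presumably the motivation for the odd/even split.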
Cite This Article
Zhang, W.; Bu, C.; Yoshida, T.; Zhang, S. CoFea: A Novel Approach to Spam Review Identification Based on Entropy and Co-Training. Entropy 2016, 18, 429.
Zhang W, Bu C, Yoshida T, Zhang S. CoFea: A Novel Approach to Spam Review Identification Based on Entropy and Co-Training. Entropy. 2016; 18(12):429.
Zhang, Wen; Bu, Chaoqi; Yoshida, Taketoshi; Zhang, Siguang. 2016. "CoFea: A Novel Approach to Spam Review Identification Based on Entropy and Co-Training." Entropy 18, no. 12: 429.
Note that from the first issue of 2016, MDPI journals use article numbers instead of page numbers.