Using Neighbor Diversity to Detect Fraudsters in Online Auctions

Online auctions attract not only legitimate businesses trying to sell their products but also fraudsters wishing to commit fraudulent transactions. Consequently, fraudster detection is crucial to ensure the continued success of online auctions. This paper proposes an approach to detect fraudsters based on the concept of neighbor diversity. The neighbor diversity of an auction account quantifies the diversity of all traders that have transactions with this account. Based on four different features of each trader (i.e., the number of received ratings, the number of cancelled transactions, k-core, and the joined date), four measurements of neighbor diversity are proposed to discern fraudsters from legitimate traders. An experiment is conducted using data gathered from a real world auction website. The results show that, although the use of neighbor diversity on k-core or on the joined date shows little or no improvement in detecting fraudsters, both the neighbor diversity on the number of received ratings and the neighbor diversity on the number of cancelled transactions improve classification accuracy, compared to the state-of-the-art methods that use k-core and center weight.


Introduction
Online auctions have been a popular business model for more than a decade. They attract not only legitimate businesses trying to sell their products, but also fraudsters wishing to commit fraudulent transactions. Most auction websites, such as eBay and Yahoo! Auctions, provide a reputation system to help buyers select reputable sellers. Therefore, maintaining a good reputation in the reputation system is important for sellers to gain trust and subsequent business from potential buyers in online auctions.
The reputation systems used by most auction websites are essentially recommendation systems. For example, on eBay, after completing a transaction, buyers and sellers can give positive, negative or neutral ratings together with textual comments to reflect their experience of the transaction. An auction account receiving more positive ratings and fewer or no negative ratings has a better online reputation. In addition to attracting sales, seller reputation also has an economically and statistically significant effect on the price of merchandise [1]. A buyer is often willing to purchase the same merchandise from a more reputable seller at a higher price rather than from a less reputable seller at a lower price.
Normally, a legitimate seller works hard to ensure the quality of his/her merchandise and services to maintain a good online reputation. However, a fraudster can easily deceive the reputation system into giving the fraudster a good online reputation, resulting in the so-called inflated reputation fraud [2]. Usually, this deceitful scheme is accomplished by a group of collusive auction accounts that conduct many transactions within the group for low-price merchandise and give each other positive ratings [3]. Because the cost of creating accounts and conducting transactions for low-price merchandise in online auctions is low, such inflated reputation behavior has become commonplace and consequently damages the trustworthiness of the reputation systems of auction websites.
Many approaches have been proposed in the literature to detect inflated reputation fraud in online auctions. Earlier approaches are based on user-level features [4,5], such as the median, sum, mean or standard deviation of the prices of the merchandise that an account bought or sold in a certain period of time. In contrast, most recent approaches have adopted a social network approach. They build a graph based on the accounts' transaction history, and then apply Social Network Analysis (SNA) to detect cohesive groups in the graph, since the accounts in a collusive group often have extensive transactions within the group to boost their reputations [2,3,5–13]. To detect the cohesive groups in the graph, the notion of k-core has been found to be more feasible than other notions for cohesive groups, such as component, clique and k-plex [2,13]. However, the use of k-core often results in low precision. The notion of center weight was then proposed to improve the precision, but it decreases the recall [6].
Instead of focusing solely on the user-level features or the cohesive groups in the social network, we propose the notion of neighbor diversity of an auction account to quantify the diversity of all traders that have given at least one rating to this account. Prior to calculating neighbor diversity, all accounts are partitioned into several classes based on a selected feature such as the number of received ratings or k-core. Then, the neighbor diversity of an account is calculated using the class distribution of its neighbors in the social network. Because Shannon entropy [14] is often used to quantify the diversity of a group [15–17], it is adopted in this paper to calculate the neighbor diversity.
The objective of this paper is to improve the classification performance for fraudster detection in online auctions. To achieve this goal, we propose four types of neighbor diversity, based on the number of received ratings, the number of cancelled transactions, the joined date and k-core. A dataset of auction accounts crawled from Ruten, one of the largest online auction websites in Taiwan, is used to perform a ten-fold cross-validation experiment with three classification algorithms. The results show that using the neighbor diversity improves the classification accuracy, precision and recall for J48 decision tree, neural network and support vector machine.
The rest of this paper is organized as follows: Section 2 reviews previous work on fraudster detection. Section 3 defines four different measurements of neighbor diversity used in this paper. Section 4 describes the data collection process. Section 5 presents the experimental setting and discusses the results. Finally, Section 6 concludes this paper and gives some directions for future work.

Related Work
This section reviews previous work on detecting inflated reputation fraud in online auctions. For recent surveys on online auction frauds in general, please refer to [18,19]. For problems associated with the online reputation system, please refer to [20,21].
In the literature, the reputation systems used by most online auction websites have been criticized as overly simple and likely to be taken advantage of by deceitful traders to inflate their online reputations. Many approaches have been proposed to deal with this problem. Chau and Faloutsos [4] suggested the use of user-level features to build a decision tree for detecting fraudsters. The user-level features they used include the median and standard deviation of the prices of items sold/bought within several different periods of time, and the ratio of the number of items bought to the number of all transactions. Later, Chau et al. [5] proposed combining these user-level features with network-level features to detect the conspired groups in the social network built from the transaction history, as they observed that fraudsters and their accomplices are likely to form a bipartite core in the social network. Their method first built a Markov Random Field (MRF) model from the transaction history among all traders, then used the results of the user-level features [4] to instantiate the observed values of the nodes in the MRF model, and finally applied the belief propagation algorithm to calculate the probabilities of fraudster, accomplice and normal user for each node. Pandit et al. [7] simplified this method by instantiating the observed values of the nodes in the MRF model to a constant, and consequently the user-level features were no longer needed. Bin and Faloutsos [8] replaced the belief propagation algorithm in these methods [4,7] with the loopy belief propagation algorithm. The inflated reputation problem in a Chinese auction website was studied in [22], where the authors combined transaction-related features (including price, frequency, comment and connectedness in the transaction network) and user-level features (including reputation and age) to build a logistic regression model for estimating the likelihood of collusion of an account. Chang and Chang [23] suggested using a wrapper approach to select a subset of features from a large candidate feature pool to build a decision tree for fraudster detection.
As a fraudster relies on his/her accomplices to boost his/her online reputation, extensive transactions among them must occur in the transaction history. Accordingly, many approaches have tried to detect fraudsters by detecting the cohesive groups in the social network built from the transaction history. The notion of k-core has been found to be more feasible than other notions for cohesive groups (e.g., component, clique and k-plex) to detect fraudsters [2,13]. A common way to build the social network is to represent each trader as a node, and each transaction between two traders as an edge connecting the two corresponding nodes. Furthermore, duplicate edges between two nodes are removed to simplify the network. Let G = (V, E) denote the graph for the social network, where V is the set of nodes and E is the set of edges. A subgraph H = (W, E|W) induced by the set W ⊆ V is a k-core if the degree of v in H is no fewer than k for every v ∈ W, and H is the maximum subgraph with this property. A node may be in several k-core subgraphs, each with a different k value. The maximum of these k values is referred to as the k-core attribute of this node, which can be calculated in linear time with respect to the number of edges in the graph [24].
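The k-core attribute defined above can be illustrated with a small peeling procedure. The following is a minimal sketch, assuming the network is given as a list of undirected node pairs; the function name `core_numbers` is hypothetical, and the loop is a simple quadratic-time variant rather than the linear-time algorithm of [24].

```python
from collections import defaultdict


def core_numbers(edges):
    """Return {node: k-core attribute} for an undirected simple graph.

    edges: iterable of (u, v) pairs; duplicate edges and self-loops are ignored.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)

    degree = {node: len(nbrs) for node, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Repeatedly peel the node with the smallest remaining degree.
        node = min(remaining, key=degree.get)
        # The core number never decreases along the peeling order.
        k = max(k, degree[node])
        core[node] = k
        remaining.remove(node)
        for nbr in adj[node]:
            if nbr in remaining:
                degree[nbr] -= 1
    return core
```

For a triangle with one extra node attached to a corner, the three triangle nodes receive the attribute 2 and the attached node receives 1.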
Wang and Chiu [2] suggested using two social network measurements, k-core and core/periphery ratio, to distinguish fraudsters from normal traders, where the core/periphery ratio is the number of the nodes in the k-core divided by the number of the nodes adjacent to the k-core. Later, they suggested using k-core and center weight for fraud detection [6]. They found that fraudsters often appear in the k-cores with k ≥ 2. However, using k-core alone often results in low precision, and thus they applied center weight to improve the precision. A "robbery" algorithm was used to calculate the center weight of each node. Initially, this algorithm assigns the degree of a node as the node's weight. Then, those nodes with larger weights repeatedly steal weights from their adjacent nodes with smaller weights. Finally, only a small number of nodes have weight greater than 0, and all of their adjacent nodes have weight 0. Essentially, those nodes with non-zero weight are the centers of k-cores, and are classified as fraudsters. Although the use of center weight along with k-core improves the precision, it reduces the recall.
Notably, in all of the above approaches, both k-core and center weight were used as Boolean attributes. For k-core, they checked whether k ≥ 2 holds. For center weight, they checked whether center weight > 0 holds. Chiu et al. [13] used social network metrics such as k-core, k-plex, n-clique, normalized betweenness and degree to build a decision tree for fraud detection. They also suggested that abnormal accounts can be identified in the k-cores with k ≥ 2. Furthermore, among the abnormal accounts, fraudulent accounts can be identified using normalized betweenness, k-core with k ≥ 6, and k-plex containing 5 nodes with k = 2.
Instead of building a social network of all traders, Morzy [3] suggested building a social network of all sellers to discover clusters of sellers who are densely connected through committed transactions. First, all transactions of low-price merchandise are ignored. Then, each node in the social network represents a seller, and for any two sellers, if there exists at least a user-defined number of common buyers for the two sellers, then the two nodes representing the two sellers are linked, and the strength of the link is set to the number of their common buyers. Furthermore, the density of a node is defined as the number of its neighboring nodes. Finally, a score of trustworthiness of a seller is calculated using the densities of its neighboring nodes and the strengths of its links. Notably, instead of detecting fraudsters, Morzy [3] tried to measure the trustworthiness of each trader.
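To make the seller-network construction described above concrete, the following is a minimal sketch. It only covers the linking step and the node density; the trustworthiness score itself is not reproduced. The function names, the `(seller, buyer, price)` record layout, and the `min_price` cut-off used to drop low-price transactions are assumptions for illustration and are not taken from [3].

```python
from collections import defaultdict
from itertools import combinations


def seller_network(transactions, min_common_buyers, min_price=0.0):
    """Link sellers that share enough buyers.

    transactions: iterable of (seller, buyer, price) records (assumed layout).
    Returns {frozenset({seller_1, seller_2}): number of common buyers}.
    """
    buyers = defaultdict(set)
    for seller, buyer, price in transactions:
        if price >= min_price:  # ignore low-price transactions
            buyers[seller].add(buyer)

    links = {}
    for s1, s2 in combinations(sorted(buyers), 2):
        common = len(buyers[s1] & buyers[s2])
        if common >= min_common_buyers:
            links[frozenset((s1, s2))] = common  # link strength
    return links


def density(links, seller):
    """Density of a node: the number of its neighboring nodes."""
    return sum(1 for pair in links if seller in pair)
```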

Neighbor Diversity
Given an account x, x's neighbors are those accounts that gave at least one rating to x. Notably, accounts that had transactions with x but never gave x any rating are not counted as x's neighbors. The inflated reputation fraud is achieved by giving positive ratings among the members of a collusive group, and if the two parties of a transaction do not give each other a positive rating, then the chance of both belonging to the same collusive group is small. Moreover, if neither of the two parties in a transaction gives the counterparty a rating, then the information about this transaction is not available to the public. Consequently, we, as a third party to the transaction, cannot build a link between these two parties using the dataset crawled from the auction website.
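The neighbor definition above can be sketched as follows, assuming the crawled ratings are available as `(rater, ratee)` pairs; this record layout and the function name `neighbors_from_ratings` are assumptions for illustration.

```python
from collections import defaultdict


def neighbors_from_ratings(ratings):
    """Map each account to the set of accounts that gave it at least one rating.

    ratings: iterable of (rater, ratee) pairs (assumed layout).
    """
    neighbors = defaultdict(set)
    for rater, ratee in ratings:
        # Several ratings from the same rater still yield a single neighbor.
        neighbors[ratee].add(rater)
    return dict(neighbors)
```

An account that traded with x but left no rating never appears as a rater of x, so it is excluded automatically.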
As described in Section 1, a fraudster does business mostly with his/her accomplices to inflate his/her reputation. The accomplices may share some common characteristics, and consequently the diversity of a fraudster's neighbors on those characteristics is likely to be small. In this section, we propose four measurements for neighbor diversity. Later in Section 5.2, we provide a preliminary analysis to show that fraudsters tend to have lower neighbor diversity than non-fraudsters.
Prior to calculating the neighbor diversity, we first select an attribute attr and divide all accounts into several classes based on this selected attribute. Let n be the number of classes, and p_i(x) be the ratio of the number of x's neighbors in the i-th class to the total number of x's neighbors. Then, the neighbor diversity on the attribute attr of an account x, denoted as D_attr(x), is calculated using Shannon entropy [14] as follows:

$$D_{\mathrm{attr}}(x) = -\sum_{i=1}^{n} p_i(x) \log p_i(x) \qquad (1)$$

Notably, by selecting a different attribute for attr, we have a different measurement for neighbor diversity. In this study, we tried four attributes, including the number of received ratings, the number of cancelled transactions, k-core and the joined date, and developed four different measurements for neighbor diversity. The following four subsections describe how to use the four selected attributes to divide all accounts into several classes. Once each account is assigned to a class, Equation (1) is applied to calculate the neighbor diversity of an account.
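As a worked illustration of the entropy calculation, the following minimal sketch computes the neighbor diversity from the class labels of an account's neighbors. The function name, the base-2 logarithm, and the value 0.0 returned for an account without neighbors are assumptions; the text above does not fix them.

```python
import math
from collections import Counter


def neighbor_diversity(neighbor_classes, base=2):
    """Shannon entropy of the class distribution of an account's neighbors.

    neighbor_classes: one class label per neighbor of the account.
    base: logarithm base (assumed to be 2 in this sketch).
    """
    total = len(neighbor_classes)
    if total == 0:
        return 0.0  # assumption: no neighbors means no diversity

    counts = Counter(neighbor_classes)
    # Classes without neighbors contribute nothing (0 * log 0 is taken as 0).
    # -p * log(p) is written as p * log(1 / p) to keep every term non-negative.
    return sum(
        (c / total) * math.log(total / c, base) for c in counts.values()
    )
```

With base 2, neighbors that all fall into one class give a diversity of 0, while neighbors split evenly between two classes give a diversity of 1.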

Neighbor Diversity on the Number of Received Ratings
The number of received ratings of an account is the first attribute chosen for measuring neighbor diversity. We divided the range of the number of received ratings into several half-open intervals, and the accounts with their numbers of received ratings falling into the same interval belong to the same class. The half-open interval for the first class is [0, 25 × 2), and the half-open interval for the i-th class is [25 × 2^(i−1), 25 × 2^i) for i > 1. Then, Equation (1) is applied to calculate the neighbor diversity on the number of received ratings. We denote the neighbor diversity on the number of received ratings of an account x as D_r(x).
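The doubling intervals above can be turned into a class index as in the following minimal sketch. The function name `doubling_class` and the parameter `base_width` are hypothetical; the value 25 comes from the interval definition.

```python
def doubling_class(count, base_width=25):
    """Return the 1-based class index of a non-negative count.

    Class 1 covers [0, 50), class 2 covers [50, 100), class 3 covers
    [100, 200), and in general class i covers
    [base_width * 2**(i - 1), base_width * 2**i) for i > 1.
    """
    if count < 0:
        raise ValueError("count must be non-negative")
    index = 1
    upper = base_width * 2  # exclusive upper bound of class 1
    while count >= upper:
        upper *= 2
        index += 1
    return index
```

Because the interval width doubles with each class, accounts with a few dozen ratings and accounts with many thousands of ratings are separated by only a handful of classes.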

Neighbor Diversity on the Number of Cancelled Transactions
The number of cancelled transactions of an account is the second attribute chosen for calculating neighbor diversity. We divided the range of the number of cancelled transactions into several half-open intervals in exactly the same manner as the number of received ratings. Then, Equation (1) is applied to calculate the neighbor diversity on the number of cancelled transactions. We denote the neighbor diversity on the number of cancelled transactions of an account x as D_c(x).

Neighbor Diversity on k-Core
In the literature, k-core was found useful to detect cohesive groups of fraudsters and their accomplices [2,6,13]. According to [6,13], fraudsters are more likely to reside in the 2-core subgroups. Thus, the k-core is the third attribute chosen for calculating neighbor diversity in this study. We use the social network of all traders built from the transaction history, as described in Section 2. The k-core of an account is defined as the largest k value among all k-core subgroups that contain the account in the social network. For example, if an account belongs to a 2-core subgroup, a 3-core subgroup and a 4-core subgroup simultaneously, then the k-core of the account is 4.
To calculate the neighbor diversity on k-core, we divided the range of the k-core attribute into half-open intervals of width 2. That is, the first half-open interval is [0, 2), and the i-th half-open interval is [2 × (i − 1), 2 × i) for i > 1. Then, each interval corresponds to a class, and Equation (1) is applied to calculate the neighbor diversity on k-core. We denote the neighbor diversity on the k-core of an account x as D_k(x).

Neighbor Diversity on the Joined Date
The joined date of an account is the date on which the account was created or registered. For ease of handling, we transform the joined date of an account into the number of months since the account was created. We divided the range of the number of months into half-open intervals, each with a width of 10 months. That is, the first half-open interval is [0, 10) for those accounts that are less than 10 months old; the i-th half-open interval is [10 × (i − 1), 10 × i) for i > 1. Then, each interval corresponds to a class, and Equation (1) is applied to calculate the neighbor diversity on the joined date. We denote the neighbor diversity on the joined date of an account x as D_j(x).
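The fixed-width intervals used for k-core (width 2) and for the joined date (width 10 months) can be sketched with one helper. The function names are hypothetical, and counting account age in whole calendar months relative to a chosen reference date is an assumption of this sketch.

```python
from datetime import date


def fixed_width_class(value, width):
    """Return the 1-based class index for half-open intervals of equal width.

    Class i covers [width * (i - 1), width * i).
    """
    if value < 0:
        raise ValueError("value must be non-negative")
    return int(value // width) + 1


def months_since(joined, reference):
    """Whole calendar months between the joined date and a reference date."""
    return (reference.year - joined.year) * 12 + (reference.month - joined.month)
```

For instance, a k-core attribute of 3 maps to `fixed_width_class(3, 2)`, and an account age is mapped with `fixed_width_class(months_since(joined, reference), 10)`.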

Data Collection
In the experimental study, we gathered a list of accounts from a real-world auction website, constructed a social network to calculate the accounts' features, and built several classifiers for performance evaluation. The first two steps are described in the following two subsections, and the last step is discussed in Section 5.

Collecting the List of Auction Accounts
Ruten (www.ruten.com.tw) is one of the largest auction websites in Taiwan, funded by eBay and PChome Online [12]. Ruten regularly releases a list of suspended accounts, together with the reasons for their suspension. Like previous work [13,25], our list of accounts starts with these suspended accounts, and then grows to include more accounts in a level-wise manner. The detailed process is as follows.
Firstly, we collected the lists of suspended accounts from Ruten during July 2013, and thus gathered 9168 suspended accounts. However, not all suspended accounts are fraudsters. For example, some accounts are suspended for selling cigarettes or alcohol. We manually examined the suspension reasons of these 9168 suspended accounts, and kept only those accounts that have received at least one rating and whose suspension reasons are related to fraudulent behavior. These suspension reasons include fake bidding, evaluation hype, selling counterfeit products, and failure to deliver products. As a result, only 932 out of the 9168 suspended accounts were kept. We denote these 932 accounts as the L_1 accounts. Because Ruten changed the status of one of the 932 accounts back to normal during October 2013, the L_1 accounts include one non-fraudster and 931 fraudsters.
Secondly, we crawled Ruten to collect all the ratings that the L_1 accounts had received. These ratings also provide the information about who gave the ratings to these L_1 accounts. As a result, 3475 new accounts were collected, and each of these 3475 accounts had given at least one rating to at least one L_1 account. We denote these 3475 accounts as the L_2 accounts. Notably, 149 out of the 3475 L_2 accounts are suspended accounts, and because they are suspended accounts that gave positive ratings to the fraudsters in the L_1 accounts, we classified them as fraudsters. Among the 149 fraudsters, 46 accounts also appeared in the initial list of the 9168 suspended accounts.
Thirdly, we crawled Ruten again to collect all the ratings that the L_2 accounts had received, and gathered 233,169 new accounts that had given at least one rating to at least one L_2 account. We denote these 233,169 accounts as the L_3 accounts. For the experiment in Section 5, the L_3 accounts are only used to build the social network to provide a complete view of the L_2 accounts' neighbors. We did not try to classify the L_3 accounts as fraudsters or non-fraudsters. Therefore, we did not collect information about whether an L_3 account is a fraudster or not in this study.

Construction of Social Network
In the literature [7,10,13,25], the transaction network of auction accounts is often used as the social network for fraudster detection. In this study, we also adopted the transaction network to derive the k-core, center weight and neighbor diversity of each account. Notably, k-core was found useful to detect fraudsters [2,13], and center weight was found to further improve the precision of using k-core alone [6]. Thus, later in Section 5.3, we use k-core and center weight in combination with the neighbor diversity for fraudster detection.
In the transaction network, each node represents an account, and each edge represents a rating that one account gave another account. If more than one rating exists between two accounts, only one edge between the two accounts' corresponding nodes is added to the transaction network. To construct the transaction network, we first added all L_1, L_2 and L_3 accounts as the nodes of the transaction network. Then, for each rating received by an L_1 or an L_2 account, an edge is added to the transaction network. The resulting network contains 237,576 (=932 + 3475 + 233,169) nodes and 348,259 edges. This network did not include edges for ratings that occurred after 31 July 2013.
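The construction steps above can be sketched as follows, assuming the crawled ratings are `(rater, ratee)` pairs and that the L_1 and L_2 accounts are passed in as one set. The function name, the record layout, and the decision to skip self-ratings are assumptions for illustration.

```python
def build_transaction_network(ratings, labeled_accounts):
    """Build an undirected transaction network without duplicate edges.

    ratings: iterable of (rater, ratee) pairs (assumed layout).
    labeled_accounts: accounts whose received ratings were crawled,
        i.e., the L_1 and L_2 accounts in the text.
    Returns (nodes, edges), where each edge is a frozenset of two accounts.
    """
    labeled = set(labeled_accounts)
    nodes = set(labeled)
    edges = set()
    for rater, ratee in ratings:
        # Only ratings received by a labeled account contribute an edge.
        if ratee not in labeled or rater == ratee:
            continue
        nodes.update((rater, ratee))
        # A frozenset collapses repeated and reciprocal ratings into one edge.
        edges.add(frozenset((rater, ratee)))
    return nodes, edges
```

Raters that are not themselves labeled accounts play the role of the outer level of accounts: they appear as nodes, but ratings they received are not part of the network.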
Once the transaction network was constructed, we used it to calculate the k-core and center weight for each node in the network, as described in Section 2. Also, we crawled Ruten's website to collect the number of received ratings, the number of cancelled transactions, and the joined date for each account in the network. Finally, we used the transaction network to calculate the four measurements of neighbor diversity for each L_1 and L_2 account. We did not calculate the neighbor diversity for the L_3 accounts because this network is only a part of the complete transaction network and it does not include all neighbors of the L_3 accounts. Therefore, the L_3 accounts are only used to build the transaction network, and to calculate the neighbor diversity of the L_1 and L_2 accounts.

Setting
The set of accounts used in the experimental study includes all L_1 and L_2 accounts. It contains 4407 (=932 + 3475) accounts, and among them, 1080 (=931 + 149) are fraudsters, and 3327 (=4407 − 1080) are non-fraudsters (i.e., normal accounts). Descriptive statistics of the four measurements of neighbor diversity for the group of fraudsters and for the group of non-fraudsters are shown in Table 1. For ease of exposition, the neighbor diversities on the number of received ratings, on the number of cancelled transactions, on k-core, and on the joined date are denoted as D_r, D_c, D_k and D_j, respectively. The experiment includes two parts. The first part (in Section 5.2) performed a Mann-Whitney U Test on each of the four measurements of neighbor diversity between the group of 1080 fraudsters and the group of 3327 non-fraudsters. The results provide preliminary evidence that fraudsters tend to have a lower neighbor diversity than non-fraudsters.
The second part of the experiment (in Section 5.3) evaluated whether the use of neighbor diversity could improve the classification performance on discerning fraudsters from non-fraudsters. For each account, the following attributes were used: k-core, center weight, and the four measurements of neighbor diversity. We first used k-core and center weight to build a classifier and evaluated its classification performance. Then, we compared the performance of this baseline classifier against that of a classifier built with the addition of one measurement of neighbor diversity as an input attribute. Three classification algorithms (J48 decision tree, Neural Networks (NN) and Support Vector Machine (SVM)) from Weka [26] were used to perform 10-fold cross-validation. Notably, the classification experiment was conducted in two different settings. In Test 1, both k-core and center weight were used as numeric attributes. However, previous work [13,25] used k-core and center weight as Boolean attributes by checking whether k is greater than or equal to 2, and whether the center weight is 0 or not. Thus, in Test 2, we used k-core and center weight as Boolean attributes to build classifiers.
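The difference between the two settings can be illustrated with a small sketch of the Boolean transformation used in Test 2. The function name and the dictionary keys are hypothetical; the two conditions follow the checks described above.

```python
def to_boolean_attributes(k_core, center_weight):
    """Transform numeric k-core and center weight into Boolean attributes.

    k-core becomes True when k >= 2; center weight becomes True when it
    is non-zero (center weights are non-negative, so this means > 0).
    """
    return {
        "k_core_ge_2": k_core >= 2,
        "cw_positive": center_weight > 0,
    }
```

In Test 1 the raw values would be passed to the classifier unchanged, while in Test 2 only these two flags would be used in their place.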

Results on Comparing the Mean Neighbor Diversity between Groups
Prior to comparing the mean neighbor diversity between the group of fraudsters and the group of non-fraudsters, we used the Kolmogorov-Smirnov test to check the normality for each group on each of the four measurements of neighbor diversity. The results showed that the assumption of normality failed for both groups on all four measurements of neighbor diversity.
Because the assumption of normality of distribution is questionable, the Mann-Whitney U Test was used as a substitute for Student's t-test to compare the mean neighbor diversity of the two groups. The test results were all in the expected direction and significant (z = −38.562 and p < 0.05 for D_r, z = −37.535 and p < 0.05 for D_c, z = −27.131 and p < 0.05 for D_k, and z = −30.141 and p < 0.05 for D_j). Therefore, the mean neighbor diversity is lower in the group of fraudsters than in the group of non-fraudsters for all four measurements of neighbor diversity.
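For readers unfamiliar with the test, the following minimal sketch shows how a z statistic of this kind can be obtained from two samples. It assumes average ranks for ties, the tie-corrected normal approximation, and no continuity correction; it is an illustration only and is not claimed to reproduce the software or settings behind the reported values. The function name is hypothetical.

```python
import math


def mann_whitney_z(sample_a, sample_b):
    """Return (U_a, z) for the Mann-Whitney U test via a normal approximation.

    A negative z indicates that values in sample_a tend to be smaller
    than values in sample_b.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    n = n_a + n_b
    pooled = sorted([(v, 0) for v in sample_a] + [(v, 1) for v in sample_b])

    rank_sum_a = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values occupy ranks i+1 .. j and share their average rank.
        avg_rank = (i + 1 + j) / 2.0
        size = j - i
        tie_term += size ** 3 - size
        rank_sum_a += avg_rank * sum(1 for _, g in pooled[i:j] if g == 0)
        i = j

    u_a = rank_sum_a - n_a * (n_a + 1) / 2.0
    mean_u = n_a * n_b / 2.0
    var_u = n_a * n_b / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    # var_u is zero only when every pooled value is identical.
    z = (u_a - mean_u) / math.sqrt(var_u)
    return u_a, z
```

Passing the diversity values of the fraudster group as `sample_a` and those of the non-fraudster group as `sample_b` would yield a negative z when fraudsters tend to have lower diversity.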

Results on Classification Performance
For ease of exposition, center weight is denoted as CW. Test 1 used either k-core and CW, or k-core, CW and one of D_r, D_c, D_k and D_j, as input attributes to build classifiers. The results of Test 1 using J48, NN and SVM are shown in Tables 2, 3 and 4, respectively. With all three classification algorithms, the recall is lower than 52% when using only k-core and CW. With the addition of D_r or D_c as an input attribute, the recall improves respectively to 0.8731 and 0.587 using J48, and respectively to 0.7787 and 0.875 using NN. With all three classification algorithms, the addition of D_r or D_c always significantly improves the accuracy, recall and F_1-measure, but sometimes slightly decreases the precision. The addition of D_k or D_j does not provide a stable improvement in recall and precision across the different classification algorithms. Notably, in Tables 2-7, if the addition of a neighbor diversity gives a poorer result than not using the neighbor diversity, the result is shown in italics. Also, the best results are shown in bold.
Test 2 is the same as Test 1 except that the k-core and CW attributes are first transformed into Boolean attributes, as described earlier in Section 5.1. The results of Test 2 using J48, NN and SVM are shown in Tables 5-7, respectively. Compared to using numeric k-core and CW in Tables 2-4, using Boolean k-core and CW only slightly improves the precision but further decreases the recall to less than 37%. However, with the addition of D_r, D_c, D_k or D_j as an input attribute, the accuracy, recall and F_1-measure are improved with all three classification algorithms.
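The four measures discussed above follow the usual definitions and can be computed from actual and predicted labels as in the following minimal sketch. The function name is hypothetical, and returning 0.0 when a denominator is zero is an assumption of this sketch.

```python
def classification_metrics(actual, predicted, positive=True):
    """Return accuracy, precision, recall and F1 for the positive class.

    actual, predicted: equal-length sequences of labels, where `positive`
    marks a fraudster.
    """
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

Since fraudsters are the minority class in the dataset, a classifier can reach a fairly high accuracy while still having a low recall, which is why recall and F_1-measure are reported alongside accuracy.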

Discussion
To understand why D_r and D_c help detect fraudsters, we built two decision trees, one using D_r and another using D_c as the only input attribute. The results, shown in Table 8, remain competitive with those reported in Table 2.
For clarity, we show both decision trees up to depth 3 in Figure 1. The nodes in the top two levels of the decision tree of D_r essentially classified accounts with D_r ≤ 0.892943 as fraudsters and accounts with D_r > 0.892943 as non-fraudsters. This coincides with our assumption that fraudsters tend to have lower neighbor diversity than non-fraudsters. Similarly, the nodes in the top two levels of the decision tree of D_c essentially classified accounts with D_c ≤ 0.666017 as fraudsters and accounts with D_c > 0.666017 as non-fraudsters.
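The top-level behavior of the two trees amounts to a single threshold rule, which can be sketched as follows. This is a simplification that ignores the deeper levels of the trees; the function and constant names are hypothetical, and the threshold values are the split points quoted above.

```python
D_R_THRESHOLD = 0.892943  # split point quoted for the tree on D_r
D_C_THRESHOLD = 0.666017  # split point quoted for the tree on D_c


def classify_by_threshold(diversity, threshold):
    """Label an account as a fraudster when its diversity is at or below the threshold."""
    return "fraudster" if diversity <= threshold else "non-fraudster"
```

For example, `classify_by_threshold(0.5, D_R_THRESHOLD)` labels the account as a fraudster, whereas a D_r of 1.2 is labeled as a non-fraudster.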

Conclusions
Most recent approaches for fraudster detection in online auctions utilize social network analysis to detect cohesive groups in the social network [2,6,13]. This work takes a different perspective by measuring the diversity of the neighbors of each trader. The intuition behind this approach is that the neighbors of a fraudster are likely to have similar patterns, and therefore have a low diversity. Four ways to measure the neighbor diversity based on four different attributes are proposed, and their performance in detecting fraudsters is evaluated using a real-world dataset crawled from the auction website Ruten. The experimental results show that the addition of neighbor diversity on the number of received ratings (i.e., D_r) or on the number of cancelled transactions (i.e., D_c) significantly improves the accuracy, recall and F_1-measure at the cost of sometimes slightly reducing the precision. However, the results of the neighbor diversity on k-core (i.e., D_k) or on the joined date (i.e., D_j) show little or no improvement.
In contrast to the inflated reputation problem, an evildoer may try to damage the reputation of its competitors by giving them negative ratings [20,21]. How to adapt the reputation systems to such malicious activities is an interesting topic. One possible remedy is to use a weighted voting system such that a negative rating from a more reputable trader carries more weight when calculating the reputation of a trader. This is left as future work.

Figure 1. J48 decision tree using D_r (left) or D_c (right) as the only input attribute (up to depth 3).

Table 1. Descriptive statistics of the four measurements of neighbor diversity.

Table 2. J48 performance using k-core and CW or k-core, CW and a neighbor diversity (Test 1).

Table 3. NN performance using k-core and CW or k-core, CW and a neighbor diversity (Test 1).

Table 4. SVM performance using k-core and CW or k-core, CW and a neighbor diversity (Test 1).

Table 5. J48 performance using Boolean k-core and Boolean CW or Boolean k-core, Boolean CW and a neighbor diversity (Test 2).

Table 6. NN performance using Boolean k-core and Boolean CW or Boolean k-core, Boolean CW and a neighbor diversity (Test 2).

Table 7. SVM performance using Boolean k-core and Boolean CW or Boolean k-core, Boolean CW and a neighbor diversity (Test 2).

Table 8. J48 performance using D_r or D_c as the only input attribute.