Predicting Reputation in the Sharing Economy with Twitter Social Data

Abstract: In recent years, the sharing economy has become popular, with outstanding examples such as Airbnb, Uber, or BlaBlaCar, to name a few. In the sharing economy, users provide goods and services in a peer-to-peer scheme and expose themselves to material and personal risks. Thus, an essential component of its success is its capability to build trust among strangers. This goal is usually achieved by creating reputation systems where users rate each other after each transaction. Nevertheless, these systems present challenges such as the lack of information about new users or the reliability of peer ratings. However, users leave their digital footprints on many social networks. These social footprints have been used for inferring personal information (e.g., personality and consumer habits) and social behaviors (e.g., flu propagation). This article proposes to advance the state of the art on reputation systems by researching how digital footprints coming from social networks can be used to predict future behaviors on sharing economy platforms. In particular, we have focused on predicting the reputation of users in the second-hand market Wallapop based solely on their Twitter profiles. The main contributions of this research are twofold: (a) a reputation prediction model based on social data; and (b) an anonymized dataset of paired users in the sharing economy site Wallapop and Twitter, which has been collected using the user self-mentioning strategy.


Introduction
The increasing adoption of social media is transforming the way we live and do business [1]. As of October 2018, there are more than 2.2 billion monthly active users and 1.5 billion daily active users on Facebook [2]. Other social networks such as YouTube and Twitter have 1.9 billion and 300 million monthly active users, respectively [2].
Social media interactions, from commenting on a post to liking a photo, leave behind digital traces, voluntary or not, that can be used for mining patterns of individual, group, and societal behaviors [3]. These social data, created by users when they interact with social media channels, are usually known as digital footprints [4]. The full potential of digital footprints for generating insights about users comes from their combination with other sources, such as bank transactions [5], sensor information [6], or other social networks [3]. Several works have applied a variety of techniques for matching user profiles across social networks, such as user profile similarity [7,8] as well as spatial, temporal, and content similarity [9].
In this work, we address how social media footprints can improve the social markets of the so-called sharing economy. The sharing economy can be defined as "Collaborative consumption that takes place in organized systems or networks, in which participants conduct sharing activities in the form of renting, lending, trading, bartering, and swapping of goods, services, transportation solutions, space, or money" [10]. The success of the sharing economy lies in connecting individuals who have underutilized assets with other individuals interested in their short-term use [11]. The shared assets range from vehicles and rooms to second-hand objects or spare time for performing everyday tasks. Some examples of these services are BlaBlaCar (http://www.blablacar.com) for long-distance car-sharing, Turo (http://turo.com) for short-term car renting, Uber (http://www.uber.com) and Cabify (http://www.cabify.com) that offer an alternative to taxis, Airbnb (http://www.airbnb.com) that enables sharing rooms and apartments, Wallapop (http://www.wallapop.com) for second-hand objects, and Fiverr (http://www.fiverr.com) for performing everyday tasks [11].
One of the potential barriers to the development of the sharing economy is the high level of trust needed to engage in potentially dangerous activities with strangers, such as driving with them or staying at their houses [12]. Several solutions attempt to overcome this problem. Some examples are online reputation systems, where users can publicly rate each other after a transaction, or insurance-like protections that cover material and monetary risks [13].
Online reputation systems enable buyers and sellers to rate each other after each transaction (e.g., a ride or an apartment stay), in the same way as eBay transactions. In this way, they help to build a reputation profile that makes transactions safer and less uncertain [12]. Nonetheless, some authors [14,15] have pointed out drawbacks of these systems, such as their manipulation with opinion spam [16] or extremely high average ratings [15]. Another drawback is the need for historical data, and therefore the total uncertainty about new users' future behavior, which translates into mistrust from other users of the platform.
Such drawbacks could be mitigated by predicting the probability of a user being negatively rated by others. That information could enable preemptive measures to be taken. For this purpose, we propose to use social network data to build models that predict user reputation on sharing economy platforms, in the same way that insurance companies use demographic data to predict customer risk. We choose social network data to train the models for three reasons: (1) its wide availability [2]; (2) the ease of obtaining data thanks to the available interfaces; and (3) the ease of integration, since multiple sharing economy platforms, such as Airbnb or BlaBlaCar, already allow users to link their social network profiles with their existing reputation profiles.
To further investigate this solution, we want to answer the following research questions:
• RQ1. Can social digital footprints be used for predicting bad user behavior on sharing economy sites?
• RQ2. What are the most relevant social features for predicting reputation in the sharing economy?
The remainder of the paper is organized as follows. Section 2 describes the state of the art. Then, Section 3 introduces the case study analyzed in this article as well as the proposed general approach to predicting user reputation based on Online Social Network (OSN) activity. It consists of three phases: the creation of a dataset of paired identities (Section 3.1), expanding this dataset with information from both sites (Section 3.2), and developing a reputation model (Section 4), which also includes the results and analysis from our experimental evaluation. Finally, Section 5 discusses the conclusions and main findings of our work.
State of the Art

Several factors in today's society make the concept of trust more relevant [37]. Contemporary society has become more interdependent because of the globalization process. In addition, people have more available options in all domains of life, which makes both our decisions and those of others less predictable. Moreover, organizations have become more opaque and complex. Another source of unpredictability is the increasing dependence on technology, which brings new threats and vulnerabilities.
Trust and trustworthiness are our strategy to deal with uncertain and uncontrollable future situations. As defined by Sztompka, "Trust is a bet about the future contingent actions of others" [37]. In this regard, Yamagishi [38] treats trust as a form of social intelligence that allows people to assess the degree of risk they face in social situations. Granting trust to a target is the result of estimating its trustworthiness, which can be determined using three main mechanisms [37]: reputation, performance, and appearance. Reputation is "a record of past deeds" [37]. Let us say we are evaluating how trustworthy an Uber driver is. The reputation would be based on the driver's rating. Performance refers to present behavior. In the previous example, we would focus on how the driver is currently driving and perhaps her conversation. Finally, appearance consists of superficial external signs relevant for trust, such as the driver's photo [39] or whether the car is in good condition.
Online transactions challenge our traditional mechanisms for evaluating trustworthiness, since we cannot see and try the products or the providers. In addition, reputation ratings could be dishonest or biased towards positive feedback [40]. Therefore, trust and reputation systems have tried to provide adequate substitutes for the traditional signs we use in the physical world, taking advantage of Information and Communications Technology (ICT) for collecting this information and providing useful trust and reputation metrics [32].
The main mechanisms for managing trust are [26,43]: (i) policy-based trust, which relies on security mechanisms such as credentials and access policies to grant access rights; (ii) reputation-based trust, where past interactions are used for predicting future behavior; and (iii) social network-based trust, which also uses social relationships between peers to compute trust.
Policy-based trust relies on a trusted third party that serves as an authority for providing and verifying credentials. Policies can be expressed in policy specification languages [44]. Trust negotiation mechanisms enable building trust through a bilateral exchange of credentials while preserving privacy [45].
Reputation-based trust systems should provide mechanisms for storing past experiences and calculating trust scores. There are two main architectural types: centralized and distributed systems [32]. In centralized reputation systems, a central authority, the so-called reputation center, collects all the ratings and computes a public reputation score for every user. This is the most popular architecture [32] and is used in auction services such as eBay (http://ebay.com), product review sites such as BizRate (http://bizrate.com) and Amazon (http://www.amazon.com), or discussion fora such as Slashdot (http://slashdot.org). Distributed reputation systems lack a central component, and each participant can obtain ratings from other members. Many research solutions have been proposed [46,47], a recent research topic being their combination with blockchain technology, as described in the survey [48].
As introduced previously, social network-based trust systems take advantage of the social network structure. Following the classification by Sherchan et al. [35], three steps are followed for deriving social trust in social networks: (i) trust collection, (ii) trust evaluation, and (iii) trust dissemination. The first step, trust collection, is the process of collecting trust information from three main sources [35]: attitudes [49], behaviors [50], and experiences [51]. Some of the signs that can be used for inferring trustworthiness from a social network are the opinions towards a user, topic, or another entity (attitude), the interaction patterns (behavior, for example, the level of participation), and previous experiences, usually reported through feedback mechanisms. The second step, trust evaluation, consists of deriving trust from the information collected. There are two main approaches for social trust evaluation: network-based and interaction-based models, which can also be combined in hybrid models. Network-based social trust models compute social trust using the social graph structure. Some works use network metrics (e.g., in-degree, out-degree), while others are based on the notion of "Web of Trust", extend Friend-Of-A-Friend (FOAF) [52], and exploit the propagative nature of trust [53,54]. Interaction-based social trust evaluation models, in contrast, rely on the interaction patterns and the behavior of users [55]. Finally, the third step, trust dissemination, consists of exposing trust in the social network. The main dissemination models are trust recommendation models and visualization models. Trust recommendation models use a trust network where nodes are users and edges are the trust placed in them. This trust network is exploited by trust recommendation algorithms for generating personalized recommendations based on the aggregation of opinions of users in the trust network [56]. In turn, visualization models display the trust network, which helps to understand trust relationships between the members of the social network. The reader can refer to the detailed survey by Liu et al. [57] for a comprehensive overview of Machine Learning (ML) based methods for pairwise trust prediction.

Trust and Reputation in the Sharing Economy
Sharing economy transactions involve more significant risks than traditional online transactions, as the relationships are more intimate and sophisticated (e.g., allowing a stranger to sleep in your house or entering a stranger's car). For these platforms to succeed, their users must feel safe and trust each other, because without that feeling of safety it is unlikely that people would want to expose themselves to such danger: trust is a mandatory requirement for the development of sharing economy platforms. As stated by Tadelis [58], "the success of these marketplaces is attributed not only to the ease in which buyers can find sellers but also to the fact that they provide reputation and feedback systems that help facilitate trust." As mentioned previously, the notion of trust may vary across domains, but, in the sharing economy, it can be defined as "the psychological state reflecting the willingness of an actor to place themselves in a vulnerable situation for the actions/intentions of another actor" [59].
To facilitate trust, sharing economy markets usually incorporate reputation and feedback systems [58]. After each transaction, users can provide feedback (e.g., a five-star feedback score), and the market can derive the reputation of sellers and buyers from both public information (e.g., feedback) and non-public information (e.g., messages). Reputation is only one of the available mechanisms to foster trust. Other mechanisms include reciprocity in long-term relations, regulations, or professional qualifications [60]. In addition, personal reputation plays an important role in the sharing economy [59,61]. The presence of storytelling narratives, connected accounts, and photos can be used for verifying the digital identity and increasing both trust and the sense of personal contact. Moreover, some studies [39] have reported that consumers infer sellers' trustworthiness from their photos, and that this visual-based trust and attractiveness could matter more than their reputation, although this conclusion has not been confirmed in other sharing economy transactions [62].
An interesting approach to improving trust in the sharing economy is the possibility of trust transfer across sharing economy applications. The term "trust transfer" has been used in different research contexts, but, in the sharing economy [63], the trust transfer situation can be described as a trustor (e.g., a user) that has some initial trust in a trustee (e.g., a service provider on Airbnb) and transfers this trust to a different context (e.g., the same service provider on a different platform such as BlaBlaCar). There are two main approaches to trust transfer in the sharing economy [63]: direct trust transfer from platform to platform, and a reputation board provided by a third party that acts as a trust authority. A number of startups have proposed solutions for reputation boards (e.g., erated.co, miicard.com, traity.com, truste.com), but they have not become popular yet. Reputation dashboards aim to aggregate users' activity and reputation and to provide a trust score (or reputation passport) for becoming a trusted member of new social sites.
In this work, the social site Twitter is used as the source of reputation data, and we aim to transfer this reputation to the sharing economy platform Wallapop (http://wallapop.com), an online marketplace for second-hand articles launched in 2013. According to internal sources, the platform has more than 25 million downloads [64] and more than 5 million monthly active users in Spain, France, and the US [65]. It follows a two-sided reputation scheme, where both sellers and buyers receive a rating (from 1 to 5 stars) as well as an optional comment. Users' reputation can be measured by observing the ratings they receive after each transaction.
As shown in Figure 1, Wallapop user profiles consist of a nickname, a history of products sold with their associated feedback, and a list of products currently on sale. In addition, user profiles and products can be shared on many social networks (e.g., Facebook and Twitter) as well as by email. Wallapop can be accessed through its iOS and Android applications and the official website.

Characterizing and Interlinking User Profiles in Social Networks
Technology enables new ways of communication, but it also enables ways to extract information about human interaction at a scale not available to classic ethnographic research. For example, mobile phone data have been used for inferring friendship relationships [66]. Moreover, behavioral patterns from mobile phones correlate with personal information such as gender [67] or job satisfaction [66].
In the context of OSNs, the volume of data generated related to personality traits, behavioral patterns, and social connections is even bigger: from self-reported networks of friendship to user-generated content. These data are usually publicly available and voluntarily generated by users. A large body of research has exploited this available information for inferring both social aspects (e.g., influenza trends [68], sentiment in the stock market [69], or unemployment rates [70]) and individual characteristics (e.g., personality [71], location [72], ethnicity [73], or political affiliation [74]).
Since people have accounts on many different social networks, a body of research has addressed matching user profiles across social networks to obtain a global profile of individuals. Several approaches have addressed this problem [75]: profile matching, where two user identities are compared through the similarity of their profile attributes [7,8,76]; network matching, which compares network attributes [77] in two or multiple social networks [78]; content matching, which looks at the similarity of the content generated by users in different networks [75]; combined matching [9], which compares spatial, temporal, and content similarities; and self-reporting, which searches for self-mentions of other networks [79].

Case Study and Methodology
We propose a model to predict user reputation on the sharing economy platform Wallapop, using the social footprints users leave when interacting on the Twitter platform. In this way, we predict the behavior of users in real-world peer-to-peer transactions (measured as the ratings received afterwards) by using only social data, which is frequently available on these platforms.
The methodology of this work is depicted in Figure 2. Three steps have been followed: (a) identification of paired identities between Wallapop and Twitter using a self-mentions strategy (Section 3.1); (b) enrichment of those identities by downloading user profiles from both platforms and generating numeric features (Section 3.2); and (c) development of a reputation prediction model and its evaluation (Section 4).

Identity Pairing
Past research has shown that publicly available information such as names, self-reported descriptions, and social connections can be used to disambiguate users from different platforms [7]. While these methods are useful for matching profiles between different platforms, they suffer from a significant error rate and are difficult to scale. Instead, we have designed an alternative process that tracks social media patterns related to sharing economy platforms to match identities from both platforms (Figure 2a). In this way, we create a dataset of matched identities that we can later enrich with data from both platforms (Figure 2b).
This process leverages certain actions that can be performed on sharing economy platforms and have an impact on their connected social networks, such as posting referral invites or sharing some content. In the case of Wallapop, after users upload an item to the platform, they are offered the possibility to share the listing on multiple social networks to increase its potential audience. If users choose to share it on Twitter, a default message is offered; this message format has remained constant over time and contains a link to the Wallapop listing. An example of this default message is: "I am selling PlayStation 4 on #wallapop <item URL at Wallapop>".
Users can modify such messages but, most of the time, they do not change them or make only minor changes. Based on this observation, we have searched for tweets matching those patterns to collect Twitter users that sell a product on Wallapop, so that we can match Twitter and Wallapop identities. In particular, we have searched for tweets that match the following two conditions:

• Contain a link to a Wallapop product that we can use to reach the Wallapop profile of the seller.
• Are similar to the default tweet message, including keywords such as "selling" or hashtags such as #wallapop.
Accessing Twitter data is a straightforward task thanks to the Application Programming Interface (API) offered by the platform. Twitter offers different endpoints to access different sets of data in the same way the official applications do. For the profile matching, the search tweets endpoint was used, which returns a collection of relevant tweets matching a specified pattern. After frequently tracking Twitter for an extensive period (from December 2015 to December 2016), we ended up with a dataset of 34,981 paired Wallapop and Twitter identities. We expected a very low error rate with this matching process, as it searches for the default tweet created when users share their listings. In addition, we created a random sample of 100 matches and manually verified them by comparing the names and pictures of both profiles, resulting in 98 confirmed matches and two uncertain matches.
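The pairing heuristic described above can be sketched as a simple pattern match over tweet text. The regular expressions, the keyword list, and the Wallapop URL format below are our own illustrative assumptions, not the exact patterns used in the study:

```python
import re

# Illustrative sketch of the pairing heuristic. The keyword alternatives
# ("vendo" is Spanish for "I'm selling") and the item URL format are
# assumptions for this example, not the study's production patterns.
DEFAULT_PATTERN = re.compile(
    r"(?:vendo|i am selling|selling)\b.*#wallapop",
    re.IGNORECASE | re.DOTALL,
)
ITEM_URL = re.compile(r"https?://(?:[a-z]{2}\.)?wallapop\.com/item/[\w-]+")

def match_shared_listing(tweet_text):
    """Return the Wallapop item URL if the tweet looks like a default
    share message, otherwise None."""
    url = ITEM_URL.search(tweet_text)
    if url and DEFAULT_PATTERN.search(tweet_text):
        return url.group(0)
    return None

tweet = ("I am selling PlayStation 4 on #wallapop "
         "https://es.wallapop.com/item/playstation-4-12345")
print(match_shared_listing(tweet))
```

Tweets that contain only the hashtag or only a link are rejected, which mirrors the two conditions listed above.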

Data Collection and Feature Generation
Once a dataset of paired user identities in Twitter and Wallapop has been collected, our goal is to extend that dataset with information from both platforms (Figure 2b): a metric for user reputation coming from Wallapop and a set of social features coming from Twitter.

Data Collection
Downloading Twitter public data is a straightforward process thanks to the existing API, which allows us to download structured data for each of the paired identities. To build a full Twitter profile, the characteristics available through different endpoints should be combined:
• users/show: profile characteristics such as name, picture, or description.
• statuses/user_timeline: list of tweets posted by the user.
• followers/list: list of followers.
• friends/list: list of friends.
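As a minimal sketch, the four endpoint responses can be merged into a single profile record. The responses below are mocked with Twitter-v1.1-style field names; a real pipeline would obtain them through authenticated HTTP calls:

```python
# Sketch: merging the four Twitter endpoint responses into one record.
# The response bodies here are mocked examples, not real API output.
def build_profile(show, timeline, followers, friends):
    return {
        "screen_name": show["screen_name"],
        "created_at": show["created_at"],
        "description": show.get("description", ""),
        "tweets": [t["text"] for t in timeline],
        "follower_ids": [u["id"] for u in followers],
        "friend_ids": [u["id"] for u in friends],
    }

show = {"screen_name": "seller42", "created_at": "Mon Mar 05 12:00:00 +0000 2012"}
timeline = [{"text": "I am selling PlayStation 4 on #wallapop"}]
followers = [{"id": 1}, {"id": 2}]
friends = [{"id": 3}]

profile = build_profile(show, timeline, followers, friends)
print(profile["screen_name"], len(profile["follower_ids"]))
```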
In contrast with Twitter, Wallapop does not offer a documented API to access its service. One alternative could be to use scraping techniques to extract data from the website, but this technique has the downside of not being resilient to changes in the web interface. Given that we wanted to obtain data for an extended period, we opted for a different option focused on accessing the data through the internal API that the official mobile applications use to communicate with the servers.
Wallapop mobile applications follow a typical pattern of server-client communication: a set of servers that perform most of the business logic feed the applications with structured data to be rendered to users. Both parties communicate through an API that is usually stable for a long time, which makes it ideal for gathering data over an extensive period. To download data from the internal API, the first step was to identify which endpoints the mobile apps were connecting to and how they exchanged information. The process started by configuring and deploying a transparent web proxy and an HTTP interceptor that logged all the data sent through it. Then, a smartphone was configured to use the proxy; the official Wallapop application was installed and used for some time. The output of the process was the full set of requests exchanged between the server and the application. By studying these logs, it was possible to identify all the endpoints we needed:

• Search: a list of items available near a given location.
• Item: description of an item (including price and seller profile).
• User: user public profile, including data such as the number of reviews and verifications.
• User reviews: full set of reviews given or received by a user.
The output is a dataset of 35,706 samples. As user reputation is based on ratings left by others, the only samples useful for the experiments are those that have received at least one review (n = 19,325). Users are rated from 1 to 5 stars. The distribution of the average review rating for each user is highly skewed towards the maximum rating, due to the majority of positive reviews, with a big mass around integer values, as shown in Figure 3a. If we consider as negative the reviews with ratings below three stars out of five, the distribution of the count of negative reviews per user has a mean value of 0.099 and a variance of 0.131, as shown in Figure 3b.
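The mean/variance comparison above can be reproduced with a few lines of Python. The per-user rating lists below are toy data for illustration; on the real dataset the reported values are a mean of 0.099 and a variance of 0.131:

```python
# Mean and variance of per-user negative-review counts
# (a review is negative if its rating is below three stars).
def negative_count(ratings, threshold=3):
    """Number of reviews with a rating below `threshold` stars."""
    return sum(1 for r in ratings if r < threshold)

users = [
    [5, 5, 4],     # no negative reviews
    [5, 1, 5, 5],  # one negative review
    [2, 1, 4],     # two negative reviews
    [5],
]
counts = [negative_count(rs) for rs in users]
mean = sum(counts) / len(counts)
variance = sum((c - mean) ** 2 for c in counts) / len(counts)
print(counts, mean, variance)
```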

Feature Generation
According to the general framework defined by Pennacchiotti and Popescu [80], Twitter user features can be classified into four categories: profile features ("who you are"), tweeting behavior ("how you tweet"), linguistic features ("what you tweet"), and social network ("who you tweet"). Linguistic features can be complemented with the authorship framework [81], which includes stylistic features.
Based on these frameworks, for this work we have computed the following features, as shown in Table 1:
• Profile: we selected the account creation date (in seconds since epoch), based on the hypothesis that users with accounts active for a long time are likely to be more trustworthy [82].
• Behavior: based on previous research [83,84], we selected activity metrics: most frequent tweeting hours, tweet count, number of tweets marked as favorites by the user, and the average tweet length. Other authors [85,86] use network metrics such as centrality, since they analyze trust networks. We have not used them, since our dataset is very sparse given the data collection strategy followed.
• Linguistic: the average count of bad words per tweet has been calculated using a publicly available list (https://github.com/LDNOOBW/List-of-Dirty-Naughty-Obscene-and-Otherwise-Bad-Words) of known bad words in Spanish and English (the two majority languages in our dataset). We checked the presence of each tweet word in this list.
• Social: the Twitter network is unweighted and directed, with edges formed through the follow action. Thus, we can analyze these edges separately as inner (followers) and outer edges (friends). The extracted features include the counts of friends and followers.
These features were generated by processing the nested data structures returned by the Twitter API into numerical variables that can be used to train statistical and Machine Learning models. This was an ad hoc one-time process that resulted in a set of numerical features for each of the matched Twitter profiles. The statistics of these numerical features are shown in Table 2.
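Two of these features, the bad words ratio and the average tweet length, can be sketched as follows. The word list here is a tiny illustrative stand-in for the public LDNOOBW lists referenced in the text:

```python
# Sketch of two linguistic/behavioral features. BAD_WORDS is a tiny
# illustrative subset, not the real LDNOOBW list.
BAD_WORDS = {"idiota", "damn"}

def bad_words_ratio(tweets):
    """Average number of bad words per tweet, matched word by word."""
    if not tweets:
        return 0.0
    hits = 0
    for tweet in tweets:
        for word in tweet.lower().split():
            if word.strip(".,!?#@") in BAD_WORDS:
                hits += 1
    return hits / len(tweets)

def avg_tweet_length(tweets):
    """Average tweet length in characters."""
    return sum(len(t) for t in tweets) / len(tweets) if tweets else 0.0

tweets = ["Damn, lost the sale!", "I am selling PlayStation 4 on #wallapop"]
print(bad_words_ratio(tweets), avg_tweet_length(tweets))
```

Stripping punctuation before the lookup avoids missing words at sentence boundaries, as in the first example tweet.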

Prediction Model
Our primary research question (RQ1) is whether social network footprints can be used for predicting bad user behavior on sharing economy sites. We base our assumptions on the existing research about the capabilities of social media for inferring personal traits. To explore this hypothesis, we have designed the following experiment: predicting the proportion of bad ratings for each user by using only social features extracted from their Twitter profiles as predictors (Figure 2c). Given the polarization of reputation systems [15], we define a bad rating as one with a value of less than half of the maximum, in this case one or two stars out of five. In our dataset, there are 1922 bad ratings out of a total of 149,195.
We can make the following assumptions: (i) the majority of Wallapop users are not fraudulent, since reviews with low ratings are infrequent; and (ii) there is a certain probability of a transaction going wrong. If this happens, the fault for such an unpleasant transaction can lie with one or both participants. Based on this, we can model the rate of reviews with low ratings for a given user profile with a Poisson distribution, where infrequent events (bad reviews) occur with a certain probability. This probability will be different for each user, and we are particularly interested in which user characteristics relate to it. In fact, we want to predict this probability by using only a set of user characteristics present on Twitter, as a way of predicting user behavior on Wallapop using only social data.
We could think about using a Poisson regression to model the count of bad ratings for each user. However, while Poisson distributions have a variance equal to the mean, in our data the count of bad ratings suffers from overdispersion: the mean is 0.099 and the variance is 0.131. For that reason, we model the data using the Negative Binomial distribution which, as a gamma mixture of Poissons, can be used to model count data with overdispersion [87].
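The gamma mixture argument can be illustrated with a small simulation: if each user's bad-review rate is drawn from a Gamma distribution and counts are then Poisson-distributed given that rate, the marginal counts follow a Negative Binomial and show variance above the mean. The Gamma parameters below are arbitrary illustrative choices:

```python
import math
import random

random.seed(42)  # deterministic illustration

def poisson(lam):
    """Sample a Poisson count with Knuth's multiplication algorithm."""
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

# Each simulated user: rate ~ Gamma(shape=0.5, scale=0.4),
# count ~ Poisson(rate). Theoretical marginal mean = 0.5 * 0.4 = 0.2,
# variance = 0.2 + 0.5 * 0.4**2 = 0.28 > mean (overdispersion).
samples = [poisson(random.gammavariate(0.5, 0.4)) for _ in range(20000)]
mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / len(samples)
print(round(mean, 3), round(var, 3), var > mean)
```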
This experiment shares two characteristics with the challenges that actuaries face when modeling risk in the insurance industry: first, the events are infrequent counts, and, second, there is high uncertainty about the predictions for a single entity [88]. Because of these similarities, we have designed our experiment with the tools of actuarial science, such as Generalized Linear Models for modeling count data and gain curves for evaluating model performance [89].
Generalized Linear Models (GLMs) [90] are a means of modeling the relationship between a variable whose outcome we wish to predict and one or more explanatory variables. GLMs are parametric models, which are simpler and easier to understand than non-parametric models such as those usually used in ML. At the same time, as a generalization of ordinary linear regression, they can model count data and probabilities of occurrence, which is not possible with a simple linear regression. These characteristics make GLMs a good fit for the task of explaining the effects of the features on the predictions of the target variable.
Two of the limitations of parametric models are: (i) the necessity of choosing the right distribution for the model; and (ii) the lack of robustness to correlated features. Nevertheless, determining accurate estimates of relativities in the presence of weakly correlated rating variables is a primary strength of GLMs versus univariate analyses, ensuring that no information is double-counted [89]. No strong correlation (understood as a Pearson coefficient bigger than 0.4 [91]) was found between any of the features.
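That correlation screen can be sketched with a hand-rolled Pearson coefficient; the feature columns below are made up for illustration:

```python
import math

# Pairwise Pearson correlation check, flagging |r| > 0.4 as in the text.
def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical feature columns (follower count, average tweet length)
followers = [10, 200, 50, 400, 120]
tweet_len = [80, 95, 60, 120, 70]
r = pearson(followers, tweet_len)
print(round(r, 3), abs(r) > 0.4)
```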
In a generalized linear model, each outcome of the dependent variables is assumed to be generated from a particular distribution in the exponential family.GLMs model the relationship between µ (the model prediction) and the predictors as follows: Equation 1 states that some transformation g (link function) of µ (target variable) is equal to a linear combination of an intercept coefficient β 0 and a set of weighted predictors, where x 1 ...x n are the predictors, β 1 ...β n are the weight coefficients and n the number of predictors.By training the model, the coefficients (β 0 ...β n ) are estimated and used to predict the target variable.GLMs, and more specifically, multiplicative models, are widely used in the risk assessment insurance industry for their capacity to produce a multiplicative rating structure that easily explains the effect of each predictor in the final score [89].In case of using the log link (natural logarithm ln): For our experiments, we have built a GLM from the Negative Binomial distribution family, using the set of Twitter features as predictors (x 1 , ... 
While the target variable y, defined as the ratio of the count of bad ratings for each user (count_bad) to the total count of ratings for that user (count_total), is not a discrete variable but y ∈ [0, 1], we can still perform a Negative Binomial regression by using the logarithm of the total count of ratings as an offset to the regression. By multiplying both sides of the equation by count_total, we move it to the right side of the equation. When both sides of the equation are then logged, the final model contains ln(count_total) as an offset term that is added to the regression:

ln(count_bad) = ln(count_total) + β₀ + β₁x₁ + … + βₙxₙ  (3)

The results of training this model are shown in Table 3, where we can observe that multiple predictors have a significant p-value (< 0.05, marked in bold). This result confirms the predictive capabilities of Twitter social footprints for Wallapop reputation prediction and positively answers the research question RQ1. Regarding the research question RQ2, we can observe in Table 3 that the features with a significant effect on the probability of a review being poorly rated are: (1) account creation date (as seconds since epoch), where the regression coefficient indicates that older accounts are related to a lower probability of a bad rating; (2) bad words ratio, where a higher ratio of bad words per tweet is related to being a worse-rated user; (3) average tweet length, indicating that worse-rated users create shorter tweets; (4) the average count of friends of the user's followers; and (5) the average count of followers of the user's friends, relating a lower risk of receiving a bad rating to having a smaller and closer Twitter network.
The account creation date was expected to be a relevant predictor, as past research relates this characteristic to trustworthiness [82]. We can also interpret that Twitter early adopters are more experienced in similar peer-to-peer platforms. The bad words ratio was also expected to have a significant effect in this particular direction: users who tweet more bad words receive more bad ratings on average. The other three significant predictors are more challenging to interpret. First, the average tweet length indicates that bad users create shorter tweets, perhaps as a signal of certain personality traits, such as conscientiousness, which has been related to less aggressive behavior [92]. The other two significant predictors, the average count of friends of the user's followers and the average count of followers of the user's friends, suggest that closer networks are related to fewer risky users.
We also trained the model against the good rating count but found that, while the coefficients had the opposite sign (as expected), there were no parameters with p-values below 0.05.
To answer RQ1, we have trained the same Generalized Linear Model with a 10-fold cross-validation strategy and assessed the model's performance at differentiating between low- and high-risk users using the Area Under the Curve (AUC) of the Gains Curve. Gains Curves are frequently used in risk assessment to evaluate model performance, as they are a simple and intuitive method for evaluating count data. They are built similarly to Lorenz curves: first, the samples are sorted by the predicted target variable on the x-axis, with higher values on the left and lower values on the right. In this case, users with a higher predicted probability of receiving a bad rating are on the left of the graph. The y-axis represents the cumulative percentage of the target variable, in this case, the percentage of all bad ratings. For each user percentile, the cumulative percentage of bad reviews for those users is plotted, resulting in a curve that is better the bigger the area under it.
The 45-degree line is the line of the random model, which has no predictive power at the task of identifying users with a higher probability of getting a bad rating. For example, when the random model is used to sort the users, 50% of the users will account for 50% of all the bad ratings. The top line is the perfect model, one that assigns the highest probabilities of a bad rating to exactly the users that end up receiving the bad reviews. Therefore, that line increases very steeply at first, until 100% of the target variable is quickly reached [93]. A metric that summarizes the performance of the model is the AUC, also known as the lift index, understood as the area between the model's gain curve and the random model's curve.
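The gains-curve AUC (lift index) described above can be computed as follows; this is a minimal sketch with illustrative names of our own, where the area is obtained with the trapezoidal rule and the diagonal random-model area (0.5) is subtracted:

```python
import numpy as np

def gains_auc(predicted_risk, actual_bad_counts):
    """Area between the gains curve and the 45-degree random-model line.

    Users are sorted by predicted risk (highest first); the curve plots the
    cumulative share of all bad ratings against the cumulative share of users.
    """
    order = np.argsort(-np.asarray(predicted_risk))
    bad = np.asarray(actual_bad_counts, dtype=float)[order]
    # Prepend the origin (0% of users, 0% of bad ratings)
    x = np.concatenate(([0.0], np.arange(1, len(bad) + 1) / len(bad)))
    y = np.concatenate(([0.0], np.cumsum(bad) / bad.sum()))
    # Trapezoidal area under the gains curve, minus the diagonal's area
    model_area = np.sum((y[1:] + y[:-1]) / 2 * np.diff(x))
    return model_area - 0.5

# Toy check: a perfect ranking yields a positive AUC, a reversed one a negative AUC
bad_counts = np.array([5, 3, 1, 0, 0])
perfect = gains_auc([5, 4, 3, 2, 1], bad_counts)
reversed_ = gains_auc([1, 2, 3, 4, 5], bad_counts)
```

Under this convention a random ranking scores close to 0, which matches the interpretation of the AUC values reported below.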
After training and obtaining the output of the model with a 10-fold cross-validation strategy, we generated the Gains Curve displayed in Figure 4. The AUC is 9.83%, which shows that the model has a certain level of predictive power at the task of separating users by their probability of receiving a bad rating: a random model would have an AUC very close to 0%, and a model worse than a random model would have a negative AUC. While the insurance industry is very opaque about the performance of its risk assessment models, there are some examples in the literature of models against which we can compare, for instance in the insurance [94] (19.9% AUC) or marketing [95] industries (from 15.6% to 28.9% AUC). As shown in Figure 4, a Random Forest Regressor (RFR) model is included as a baseline, with an AUC of 1.52%. We decided to use the GLM model not only because of its better performance but also because, as a parametric model, it allows us to understand the effect of the predictors on the model output and thus also answer RQ2. To summarize, by using the Gains Curve and the AUC, we can answer research question RQ1 and confirm the predictive power of Twitter social features for modeling users' probability of receiving a bad rating when transacting at Wallapop. At the same time, we can use the same parametric model to understand which Twitter social features are significant predictors of the outcome, by looking at the significance of the regression parameters, and thus answer RQ2.

Conclusions and Future Work
In this paper, we have explored the potential of social network footprints for predicting user reputation on sharing economy platforms. The initial findings show that our social network footprints can reveal our behavior, and that this knowledge can be transferred across different social sites.
We have developed a process to match users between sharing economy platforms and OSNs. This process, which leverages a self-mention strategy as the link between both platforms, allowed us to create a dataset of paired Twitter and Wallapop profiles. Later on, we designed a set of experiments that explained the relation between social traits and reputation and confirmed such effects. These results could be used to mitigate the cold-start problem for new users in sharing economy platforms by providing a prediction of how a new user may behave in the future.
One limitation of this study is that our approach has been tested on only one dataset, given that there are no available datasets of linked user profiles across OSNs that enable trust computation as proposed in this article.
Several lines of future work arise from this research. First, we aim to link data from other OSNs to gain a better understanding of the user, while also targeting different platforms with reputation systems in order to compare different models and test their portability.
The results of this research have significant potential, as they connect real-world behaviors with widely available social footprints. They can enable new ways to tackle information asymmetry problems, such as those that appear in financial systems in developing countries [96] (for example, India has the largest Facebook user base in the world [97] but, at the same time, a weak credit scoring infrastructure [96]), or in the insurance industry, where part of the risk modeling is based on basic demographic traits [98].

Figure 2. Proposed methodology to build a model for reputation prediction.

Figure 3. Histograms of (a) average rating per user and (b) negative ratings per user, in logarithmic scale.

Figure 4. Gains Curve for the GLM predictions through a 10-fold cross-validation process, compared to a set of baselines: a random model, a perfect model with maximum gain, and an RFR.

Table 3. GLM regression summary. Predictors statistically significant at the 5% level are shown in bold.