The social networking site Twitter first appeared on the Internet in March 2006 to provide users with a platform for communicating and connecting with people around the world in as little as 140 characters. This is not limited to friends and family; it extends to anyone a user wants to share with and connect to. Twitter encourages users to express themselves freely through tweeting and hashtagging to “share what you see, feel and experience as it happens” [1]. This approach has enabled Twitter to become a central hub of real-time news, with stories often breaking before the media have picked them up, as well as a way of finding out what friends are doing. The London Riots were an example of this [2]. Twitter was used by the Police as a tool to track the rioters in an attempt to prevent riot hotspots and other crimes, and also to keep the public informed [3].
Social media, whether for leisure or business, has become a part of our everyday lives, with Twitter and Facebook leading the way; there were 645,750,000 active registered Twitter users on 11th July 2014 [4]. People tweet on their way to work and when they get home without realizing that they are also giving away information about where they are, where they live and when they are not at home to anyone with access to the Internet.
Twitter added the geo-location function to user profiles in 2009 [5], enabling followers to know exactly where an individual was tweeting from. It was only a matter of time before privacy issues regarding this information started to emerge. With geo-location now an integral part of Twitter, it is still unclear whether users know how to use this feature or how to protect themselves. There has been very little research to establish how many Twitter users do not realize that features such as their location and their identity are turned on by default. As most teenagers today have at least one profile on social networking sites such as Facebook or Twitter, they inadvertently reveal a lot of personal information about themselves that anyone can see and use. They do not appreciate the risks and so would not be aware that they are also exposing their private information [6]. From anecdotal observation it would appear that many users have the same picture (avatar) of themselves and the same username or “handle” on all of their social media sites. This makes it even easier to identify them and follow them through cyberspace to harvest more personal information about them [7]. Anyone using Twitter might not realize that they could be tracked to their home simply because they uploaded a tweet or a picture to a social networking site. Furthermore, users are actively encouraged to enter a lot of personal information into social networking sites, so that this information can be shared with others [8]. This includes date and place of birth, what school they went to, name, address and gender, all of which is useful information for attackers. The dangers of this are identified in [9], whereby attackers could gain access to someone’s Twitter account to spread malicious code, or could steal enough personal information about a user to assume their identity. This could even include gaining access to their credit card, bank details and passwords. The user may also unintentionally reveal sensitive personal information within the contents of their tweets [9]. Figure 1 demonstrates this, as it shows a picture of someone’s credit card posted on Twitter. Anyone viewing the original post can see all the card details, including the 16-digit number across the front, the person’s name and the expiry date; in the case of this picture, the three-digit code on the back of the card is also given away in the tweet. This is one of 151 million results returned from a search on Twitter for credit card pictures.
Figure 1. Credit card picture and tweet posted on Twitter.
The aim of this work was to determine how much information was leaked by three test subjects who used Twitter normally and had geo-location turned on, and whether this would enable us to track them through cyberspace using only social networks. A preliminary study was undertaken to determine feasibility and the results of this are presented. Using publicly available data from their social networking sites, we show that it is possible to harvest private information. We identified three freely available applications for mining Twitter data from the large range of available online tools. The starting point for the investigation of each subject was a picture uploaded to their Twitter account which had geo-location information embedded within it. Once personal information and patterns of behavior had been identified, the three test subjects were tracked to other social networks, which not only verified the information already gathered using Twitter, but led to more information being leaked. These three test subjects are indicative of typical Twitter user behaviors and they were selected as suitable candidates on this basis.
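As an illustration of why a single geo-tagged picture is a useful starting point, the sketch below converts a position stored as degrees, minutes and seconds into decimal coordinates that can be pasted into a mapping service. It assumes the location is held in the style of EXIF GPS tags (a degrees/minutes/seconds triple plus a hemisphere letter); the text above does not state the exact format used, and the function name `dms_to_decimal` and the sample values are hypothetical.

```python
from typing import Tuple


def dms_to_decimal(dms: Tuple[float, float, float], ref: str) -> float:
    """Convert (degrees, minutes, seconds) and a hemisphere letter
    ('N', 'S', 'E' or 'W') to signed decimal degrees."""
    degrees, minutes, seconds = dms
    value = degrees + minutes / 60.0 + seconds / 3600.0
    # By convention, southern latitudes and western longitudes are negative.
    return -value if ref.upper() in ("S", "W") else value


# Hypothetical sample values, not taken from any test subject.
lat = dms_to_decimal((51, 30, 26.0), "N")
lon = dms_to_decimal((0, 7, 39.0), "W")
print(f"{lat:.6f}, {lon:.6f}")
```

The point of the sketch is only that no specialist tooling is needed: once the embedded coordinates are read, turning them into a map position is simple arithmetic.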
1.1. The Privacy Debate
In 2011 a number of high-profile Twitter accounts were taken over by hackers and used to spread false information. This resulted in calls for Twitter to improve its security by making HTTPS standard for all users [12]. In response, Twitter announced that it was the user’s responsibility to secure their own passwords and to manually select the HTTPS setting themselves. In May 2014, Twitter announced that it was improving the security of the password reset function. By providing a username, email or phone number, instructions can be sent out to the user to reset their password [13]. They also added a login history so that they could identify false login attempts, for example if the user appears to be on the other side of the world trying to log in. This is still not as secure as many would like [13]. Just how much online privacy protection should be provided by the sites themselves? It is the personal responsibility of the user to monitor what information is uploaded and shown. The ease-of-use goal of all social networking sites can overshadow their goals for the privacy and security of their users [14]. There is a risk that users are unaware of this, as many prefer a social networking site that is easy to use, and it is this naivety that enables personal information leakage to occur.
] the information provided by their users. Any information provided can be used by Twitter or third-party services to help identify users, and it is the privacy settings that allow users to select how and whether their information is used. Users have the option to include location information with their tweets, which will be stored to provide “features” for services. Having location information in tweets allows Twitter’s services to be customized and improved by providing the user with “more relevant content like local trends, stories, ads and suggestions for people to follow” [15]. Hidden away in the location support page are details about how to use this setting effectively to get the best results. The page also cautions all users to be careful about the information they share and to keep locations such as home addresses private by not tagging them using the location setting [7]. There is a risk that the majority of users do not read this.
Most social networking sites, including Twitter, use the opt-out approach for many of their features, which means that once someone has signed up, it is their responsibility to check the settings if they wish to protect their privacy, as a number of features are on by default. It might be expected that these sites would instead use the opt-in approach and allow users to willingly and knowingly turn features on, so that they can acknowledge they might encounter a lack of privacy as a result [14]. Many people would assume this would apply to their online profiles as well, but only half of social media users utilize the security settings and make their profile private [16]. Unfortunately, most of the information is made visible to the public by default, which opens the door to identity theft and cyber stalking. On the other hand, users who post an update of their whereabouts or any sensitive personal data provide opportunities for criminals. For example, when someone posts “out for dinner”, “home sweet home”, “at gram’s”, “away for the summer”, “will be on the next flight to Paris”, or “just got admitted to St. Paul’s hospital”, they have given away significant information to criminals such as burglars and stalkers.
The message about the dangers of social networking profiles that are viewable by anyone is not widespread, as 66% of Facebook users had no knowledge regarding privacy settings and 47% of teenagers had public Facebook profiles [17]. According to a survey conducted by Welter (2013), the starting point for as many as 81% of Internet-related crimes is a social networking site [18]. Crime Wire have reported that 39% of social network users have been victims of profile hacking. Criminals are aware of the information leakage potential surrounding social network sites, with 78% of burglars having used Facebook, Twitter, Foursquare and Google Street View to select their victims [18]. As many as 54% of burglars were alerted to empty homes because people posted their whereabouts and their statuses on these sites. 50% of child sex offenders admitted obtaining information about the victim from their social networking profile. Surprisingly, 38% of Facebook users were identified as being under the age of 13 in 2010 [17], in spite of Facebook’s policy, which requires that all users be over 13 years of age to be eligible to create an account [15]. 85% of parents surveyed with children aged 13 to 17 said that their child used social networking sites. Statistically, it has been identified that teenagers are more likely than adults to include their real age (50%), their photo (62%), their home town (41%), the name of their school (45%), videos of themselves (14%), their phone number (14%) and their exact location (9%) [17].
In 2014 the setting “Protect my tweets” is still off by default, so that anyone searching the Internet can identify an individual’s tweets. There are also a number of other settings which are still on by default, such as Photo Tagging and “Let others find me by my email address”, and the author of the cited article [19] could not think of any reason why someone would want to have this feature enabled. However, the Tweet location setting has now been turned to “off” by default. Neagu (2014) proposed that users need to defend their Twitter privacy by ensuring that they use strong passwords and by not posting information which discloses a current location, so there are clearly still issues with Twitter’s security policy [20].
1.3. Related Literature
Many applications that use Twitter’s own APIs (Application Programming Interfaces) have come under scrutiny because of privacy issues. An example of this is the location application “Girls Around Me”. This application was very controversial and particularly worrying. A user running the application was presented with a map showing the profile pictures of anyone (male or female) in the same area who had recently checked in with Foursquare and who had a publicly visible Facebook profile. Foursquare is an application which allows users to check in wherever they go so that their friends can find them. The user could then select to view all females or all males, without the consent of the people shown. The people identified by this application were unaware that they were being targeted. The application took data from Foursquare to plot the locations of the people, and the photographs displayed were from their Facebook accounts. The pictures could then be used by anyone to gain access to that person’s Facebook details [25].
There are a number of papers which explore the use of Twitter for teaching in very diverse fields, from engineering and medicine in higher education to secondary schools, with the aim of enhancing the student experience [26]. Pentina et al. compared Twitter users in America with their counterparts in Ukraine. The focus here was comparing how following well-known brands in these two cultures affected trust [32]. Twitter has also become a reporting tool for journalists so that they can report breaking news, as demonstrated in two papers. The first paper discusses how national and international journalists used Twitter to share information and images during the four days of riots in the summer of 2011 [33]. The second paper describes how the journalist Andy Carvin used Twitter to post messages and links to images supplied by demonstrators during the 2011 Tunisian and Egyptian uprisings [34]. These are examples of how Twitter enables news to propagate around the world as it happens. Using Twitter to aid the Police in preventing and detecting crime is also a common theme and there is ongoing research in this area. Malleson et al. conducted a study in Leeds, England to investigate violent crime and to determine the main population at risk. Their study is based on “crowd-sourced” data to identify crime hot spots [35].
The approach taken by Gabielkov et al. (2014) was to study the structure of the Twitter social graph to investigate the way that tweets propagate. They used a very large sample retrieved from Twitter to identify different Twitter feeds [36]. Their paper demonstrated that it was possible to determine the location that a user was tweeting from when there was no available geo-location data. Using the language content of a person’s tweets, the location of the tweeting device and the time zone, an accurate location can be identified, and the challenges of this approach are discussed [37]. In our work we are using the geo-location data as the starting point for our investigation.
There are also a large number of papers on the use of Twitter for medical reporting and also for political campaigning in Australia but these are outside of the scope of this work.
The paper by Jeong and Coyle investigates young people’s concerns about privacy on Twitter and Facebook. Their study identified that the greatest concern was that people in authority (parents, teachers, etc.) might gain access to their information, rather than complete strangers [38]. The strategy used by the “42 middle school students” [39] in another survey was to use false information on their profiles. This small group were aware of issues in online privacy, but they had been selected as being the most “digitally engaged students”, which is not a representative sample of the whole cohort of students. The authors advocate teaching online privacy in schools in their conclusion [39].
Zhang et al. (2014) have undertaken a long-term study of Twitter users’ concerns using surveys and interviews with adult subjects. The results outlined the concerns that the users of Twitter and other social network sites had and how these had changed over the duration of the study. They also included the users’ perception of the impact of targeted advertising related to what the user was posting at that time. The subjects of this study were aware that their data and posts were being mined and used for directed advertising. The term “creepy” in the title of the paper [40] relates to how the people interviewed perceived targeted advertising on Twitter, and not to the application used in our work.
Mao et al. analyzed leaks on Twitter using a very large sample obtained over a nine-month period [41]. Their research aimed to identify privacy leaks related to three distinct categories of tweets, namely vacation plans, tweeting while drunk and divulging medical conditions. They identified that people were tweeting on the assumption that this information was private as it was shared only with friends and family, which they classified as primary leaks. However, the information leakage occurred during retweeting, which they classified as secondary leaks. They also identified that 76% of alerts related to vacations were easily available to burglars [41].
Mearns et al. (2014) developed a framework to retrieve tweets which enabled researchers to investigate trends and topics using time stamps, geo-location data and real-time streaming. This looks at information flows and patterns between Twitter users rather than at individual users [42]. Another paper presented a framework for determining a user’s location to within 10 km by using the textual content of their tweets [43]. Jin et al. (2013) undertook a survey into user behavior on social networks such as Facebook, Twitter, Google+, LinkedIn, and Foursquare [44]. They used social graphs to identify relationships within the Twitter data based on friendships, followings, interaction between users and latent interaction, which is described as someone browsing a profile. They also included an investigation into traffic activity on these sites and the use of mobile devices for social networking sites. Their coverage of malicious behavior relates to spam and to attacks at the network level rather than individual misbehaving users, as in our work.
Retweeting is an important aspect for Twitter users, which is used to spread information amongst users. There is a body of research which has investigated the extent to which users retweet. Shi et al.
] use a dataset harvested from Twitter which, on analysis, showed that retweeting displays a “chain effect” and that retweeting is extremely popular because it is easy to use within Twitter applications.
An analysis of retweeting behavior is presented in [46], covering the factors that determine whether a tweet is retweeted and how retweets propagate, such as the user themselves, the content of the tweet and its timing. The authors concluded that each user in their dataset retweeted during a seven-day period, and that the majority did so at a low frequency, with the average number of retweets being 197. The 3% of users who retweeted more than 1,000 times in the same period were classified as retweet-aholics [46].
Tweet and retweet data were gathered from a microblogging web site located in China, which has similar functionality to Twitter. The authors collected a data sample of 1.7 million users with 4 billion following relationships. They extracted 300,000 microblogs, each consisting of the original “tweet” and all of the associated “retweets”. They determined that each “tweet” had been “retweeted” up to 80 times. They then modeled the “retweet” behavior, which demonstrated that the same behavior is found on similar microblogging sites as on more famous sites such as Twitter [47].
The speed and extent to which information is disseminated through the “Twittersphere” is modeled in a number of papers. Jiang et al.
] used a dataset which comprised 1.7 million Twitter users. They identified 23.7 million retweets that originated from these users and categorized them according to their popularity. They defined four levels of popularity which ranged from “unpopular” to “highly popular” based on the number and the speed of diffusion of retweets within the data sample. Lee et al.
] have developed two models that use the content of people’s tweets to estimate how likely they are to retweet, and how likely they are to propagate information via retweets when asked to do so by a stranger. Pervin et al.
] have modeled “retweetability” using their “Information Diffusion Impact” model which examines the roles of the Twitter users involved in retweeting. These are the “information starter”, “amplifier”, and “transmitter”. They have validated the results from their model with a dataset from the retweeting that occurred directly after the Boston bombings and the Japanese earthquake. Lia et al.
] have developed a framework to measure the way that dynamic information propagates using retweets. They compare a large-scale dataset from Twitter with their experimental results to evaluate their framework. Another paper [52] models retweeting behavior to predict how many times a tweet is retweeted. Liu et al. use a two-phase model to predict how many times a tweet will be retweeted. They have also used a dataset from Twitter to test the accuracy of their framework when compared to real data.
The results of several retweet behavior models which have been used to evaluate Twitter data are presented by Macskassy et al.
]. Their dataset consisted of over 768,000 tweets which originated from 30,000 Twitter users and they concluded that there are a lot of factors that influence retweeting behavior and this complexity cannot be duplicated with a single model.
Current research into social networks is mainly focused on investigating the mining of information from Twitter to identify trends and themes, often using social graphs. Retweeting is part of normal behavior amongst Twitter users and follows a number of models, as the studies discussed above have shown. Some tweets are massively retweeted while others are only retweeted a few times. Our work differs from the approach taken by the majority of researchers in that we demonstrate the danger to an individual user of leaking even the smallest piece of information from social networks which could expose them to being stalked or even having their home burgled. The test subjects in this work were completely unaware of their exposure and they are all adults over 21 years of age. We believe our approach to this research area is novel.