There is no doubt that the so-called Big Data revolution [1] is providing researchers from many disciplines with exciting new possibilities and avenues of inquiry. It delivers data at a volume, velocity and variety that was hardly imaginable only a decade ago [3]. This is especially true for the social sciences, where the greatest changes in methodological paradigms are needed and many voices are calling for this situation to be addressed [4]. The steadily increasing stream of data from social media sources is relatively accessible and offers promising insights into the motivations and mechanisms of collective as well as individual human behavior. It has already been used to predict box office returns [7], elections [8] and flu trends [9], and even in disaster management and prediction [10]. At the same time, the wide-scale adoption and ubiquity of location-aware technologies in networked mobile devices such as smartphones provide geography, and especially geographic information science, with additional opportunities since, as Gordon and de Souza e Silva [11] point out: “Mobile devices are the primary tools with which we access location.” This has led to a convergence of GIS and social media [12] and has even been called a renaissance of geographic information [13]. Among the many social media platforms, Twitter is the most widely used in this field, partly because of its relaxed data policy. Although access has become more restricted in recent years, the free 1% data stream is still a relatively viable source of geolocated data that can be considered a fair representation of the whole user population, devoid of any systematic bias [14]. However, whether this Twitter population can be used as a proxy representation for more general social processes is another question entirely. Critiques have been raised that draw attention to the fact that Twitter, like other social media outlets, represents a very specific subset of the population. It is, for example, unknown whether behind a user account there is a single person, multiple people or a bot [16]. Our lack of knowledge about the demographics of tweeting severely impairs our ability to draw generalized conclusions [17]. This is just one of the many problems and challenges faced when using Big Data from social media. One part of the Big Data equation is, as Boyd and Crawford [16] call it, the “mythology” that consists of many simplistic assumptions, such as the ability to capture the whole of a domain, the lack of the need for theory and the faith that data can speak for themselves [3]. These fallacies accompany the re-emergence of empiricism [2], and the criticism of it mirrors arguments that were made during geography's quantitative revolution [18] and that are still mostly valid.
Despite these widely acknowledged limitations, many researchers are using Twitter data to delineate city cores [19], gain insights into travel plans and tourism [20], characterize urban landscapes [21], study global migrations [22] or identify mobility patterns [23]. Recently, there has been a growing interest in filling the gap in our knowledge about the demographics of both the Twitter user population as a whole and the subgroup of users that produce (or rather contribute, since they may not be aware of it) ambient geospatial information (AGI) [25]. The former has been addressed on many spatial scales by a range of papers [26], with the general conclusion that Twitter users are younger than the general population and come predominantly from urban areas, with gender and ethnic biases still visible but becoming less pronounced over time. The latter, however, has been given much less attention. The problem was mentioned in Graham et al. [30], where it was stated that it is most unlikely that the content of the geolocated part of the Twitter stream is not biased by socioeconomic status, location and education. Recently, Sloan and Morgan [15] tested this hypothesis and concluded that the use of geoservices and geotagging is, in fact, dependent on demographic characteristics, with language being the most significant. An even more detailed analysis of the uneven geographies of user-generated content has been conducted by Robertson and Feick [31], who showed that even within one country there is significant geographical variation that is at least partly influenced by socioeconomic variables.
In this paper, we take a somewhat different approach by exploring divides between users based not on their demographic but rather on their geospatial characteristics, that is, the spatial and temporal distribution of the content that they produce. We hypothesize that geosocial media production is a distinct phenomenon that is in many aspects indifferent to the socio-spatial context. Therefore, behavior in spatial media is more directly dependent on software than on location. Using the code/space metaphor [32], we may explain this in yet another way. The “code” part of this hybrid space, transduced by social media software, is location independent, while the “space” components vary. We may be able, therefore, to observe similar behavior among users in different places. This is the result of what Foucault [33] calls a “technology of self”, in which software compels its users to extend their capabilities but at the same time to act in ways that are pre-defined by the people responsible for the creation of the technology.
By utilizing methods of spatial analysis, we will try to address the following research questions:
Can geosocial content be treated as a single spatial representation, or is it an averaged product of separate groups of users that differ greatly in their mode of social media use, in both spatial and non-spatial contexts? If the latter is true, it may indicate that a different research methodology is needed in the analysis of such data.
What are the spatial characteristics of geosocial media producers? This can possibly allow us to identify various spatial behavior types for Twitter users.
Is there a relation between the spatial behavior and posting activity of Twitter users? Are more active users also more mobile? If they are mobile and traveling while posting, this may suggest that the personal location information is used as a form of communication or even as a resource to increase their level of social capital [34]. If there is no such relation, this may indicate that user location may not be as important in the social media environment and by extension in research.
Do these characteristics vary between different socio-spatial contexts? Investigating Twitter populations in cities with very different histories and geographies in various parts of Europe gives us the opportunity to observe whether they have any impact on user behavior.
If there are different groups of geosocial media producers or there are differences between socio-spatial contexts, this may introduce biases. In what respect may these biases influence the analysis of geosocial media data?
4. Conclusions and Discussion
The above results make it possible to formulate answers to the questions posed at the beginning of the paper. Firstly, it is apparent that Twitter users can be clearly separated into at least two distinct groups based on the spatial characteristics of their geolocated content. It is also possible to separate users by the frequency with which they post geolocated content. In both cases, user distribution is bimodal and highly skewed. We can, therefore, imagine the Twitter population as consisting of people with very different modes of social media use in the context of its spatiality. There are users that post frequently from distant locations while traveling through the city, and there are also users that post rarely and from the same place, for example, the home or workplace. When we adopt a communication metaphor (that is, seeing location as a tool used for increasing social capital or as an additional information layer), we can see that personal location information can be a carrier for a very wide range of meanings. Therefore, it seems that a set of geosocial data for a given place, at least on the city scale or greater, cannot be perceived and used as a whole when analyzing socio-spatial processes. It must be filtered not only by content and quality, but also by user groups and by the meanings associated with them. Locations are like homophones in language: a single set of coordinates or a place name shared on a social network can convey two entirely opposite meanings according to the social and spatial context in which it is read. This may have a significant impact on the perceived image of the given area (its virtual dimension [49]), since social media have lately become one of the dominant forces shaping this, especially in tourism [20]. Of course, we do not suggest here that the groupings used in our study are universal. Quite the opposite: it is entirely possible to classify users with another set of measures, for example, the number of followers or the presence or absence of certain hashtags in their tweets. However, what is important is the fact that there may be significant spatial differences between these groups that must be acknowledged.
Our results suggest that there is a positive correlation between the spatial behavior and posting activity of Twitter users. Users that post more frequently are also mobile rather than stationary. This further strengthens the observation that location serves a purpose in communication on Twitter. Location is a meaningful part of the message or maybe even its main component. The bimodality of Power users' mobility, the fact that they are either very mobile or very stationary (Figure 5), suggests that those are conscious decisions made by the users. The importance of this finding is increased by the fact that the most prolific users tend to entirely dominate the social media content stream. In the case of our data, Power users were responsible for at least 71% of all posts in every city. This also means that not only do geolocated (in our understanding of the term) Twitter data represent only a small percentage of the whole stream (1–3% according to various sources), but the spatial image that is produced is dependent on a very small number of people. This was also observed by Yin et al. [50]. It is possible, however, to overcome or mitigate these limitations. The former can be mitigated by the use of geocoding techniques (e.g., [51]), while the latter can be addressed both by increasing the size of the dataset, to gather enough information about outliers and minorities, and by applying normalization techniques. One such technique is to restrict data points to one location per user, that is, no matter how many times a user tweeted from a single set of coordinates (or rather a small area, to accommodate location estimation error), that location has the same spatial weight. This procedure can lead to quite a different spatial image of the Twitter population [52], but its usefulness is dependent on the purpose of the analysis.
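The one-location-per-user normalization described above can be illustrated with a minimal sketch. It is not the procedure used in the study or in [52]: the function names, the `(user, lat, lon)` record layout, the square grid, and the cell size of 0.001 degrees are all assumptions made for illustration, and the sample points are invented.

```python
import math
from collections import Counter

def to_cell(lat, lon, cell_deg=0.001):
    # Snap coordinates to a square grid cell. cell_deg is an assumed
    # tolerance standing in for "a small area" around a location.
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))

def one_location_per_user(tweets, cell_deg=0.001):
    # Keep a single record for each (user, cell) pair, so repeated posts
    # by one user from the same small area count only once.
    seen = set()
    kept = []
    for user, lat, lon in tweets:
        key = (user, to_cell(lat, lon, cell_deg))
        if key not in seen:
            seen.add(key)
            kept.append(key)
    return kept

def cell_weights(records):
    # Spatial weight of each cell = number of distinct users seen there.
    return Counter(cell for _user, cell in records)

# Invented sample: user "a" posts three times from nearly the same spot.
tweets = [
    ("a", 52.40641, 16.92521),
    ("a", 52.40643, 16.92524),
    ("a", 52.40649, 16.92529),
    ("b", 52.40641, 16.92521),
    ("a", 52.41255, 16.93055),
]

raw = Counter(to_cell(lat, lon) for _user, lat, lon in tweets)
normalized = cell_weights(one_location_per_user(tweets))
```

In this sample, the cell shared by both users carries a raw weight of 4 posts but a normalized weight of 2 users, which shows how a single prolific account stops dominating the spatial image. A fixed grid in degrees is only one way to define "a small area"; a metric buffer or clustering step would be a reasonable alternative, depending on the purpose of the analysis.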
Another question is how the spatial characteristics of Twitter users differ between cities that in our case represent various spatial and social contexts. At first glance, the Twitter population in every place we studied is very similar. The summary statistics and the distributions of basic characteristics are weakly related to the population size and area of the city. Also, spatial characteristics suggest that there are some universal patterns in using the location services on Twitter, at least when we look within the limits of one city. When the limits of bounding boxes are lifted, the users do differ between cities. The greatest disparity is apparent in the localness index, which means that in some of the cities a much greater part of the geolocated content is produced by outsiders, especially since those are also the cities in which the Twitter population is the smallest. Yet the resulting overall spatial behavior of the users is similar. It may be the case that this is an example of behavior driven by software, by functions and services offered by the Twitter platform that shape the Foucauldian “technologies of self”.
The next vital step in the research path undertaken here should be to further increase the level of knowledge about the motivations and behavior of geosocial media producers. Ideally, this is an area for a mixed methods approach, where big data mining techniques can be combined with quantitative methods from the social sciences to unravel differences between groups of Twitter users. The aim of this paper was to highlight the presence and importance of these differences for research practice.