1. Introduction
The recognition that humans are the primary agents of change in natural system structure and function [
1,
2,
3] has led many researchers to adopt the concept of socio-environmental systems (i.e., socio-ecological systems or coupled human-natural systems) [
4]. Socio-environmental systems (SES) are defined as tightly linked social and biophysical subsystems that mutually influence one another through positive and negative feedback, as shown in
Figure 1. Within this conceptual framework, human behaviors, decisions, and policies influence the status of ecosystems (e.g., water quality) that, in turn, influence human beings’ quality of life and future decisions. For example, the structure and function of the natural landscape both influence and are influenced by natural resource use decisions and land managers’ actions [
5]. Thus, landscape changes are linked to both natural processes of landscape change (e.g., erosion, forest succession, climate change, etc.) and local and broader-scale economic, political, and cultural forces that motivate natural resource use. SES research is necessary for addressing many environmental problems, which require consideration of social and environmental factors as well as the feedback between them [
6], and for understanding human-environment interactions. The SES concept has thus been widely used to understand issues such as the vulnerability and resilience of human populations to natural hazards, where it is critical to consider the synergistic effects of population pressure, resource shortages, environmental change, and natural hazard events in order to prevent natural hazards becoming natural disasters [
7].
SES research poses many challenges, not least of which are collecting or compiling data at the appropriate scales and aligning social and environmental data to address SES questions [
8]. Collection of social, demographic, or political data often requires a trade-off between the fine-scaled detail necessary to understand behavior in a limited number of study sites (e.g., ethnographic case studies) and broad-scale spatial coverage that describes macro-level trends but is too coarse to provide insights into individual heterogeneity (e.g., aggregate census data). In addition, SES research is often complicated by mismatches between the spatial or temporal resolution or the extent of data describing important natural and social processes. For instance, the resolution of biophysical data, often obtained through remote sensing, does not easily correspond with the spatial units of social and political processes, such as administrative boundaries or census tracts. Furthermore, feedback between social and environmental actors can be characterized by significant temporal lags (e.g., processes of economic change operate on much faster time scales than those of climate change), through which the effects of human alterations of the natural system may not manifest until long after the alterations first occurred (e.g., [
9]). Longitudinal data with sufficient depth to capture the effects of both slower environmental processes as well as relatively faster social processes are rare.
While the temporal and spatial resolutions and extents of traditional, authoritative data sources such as censuses and surveys are highly constrained by time, money, and expertise, there is also great potential for non-scientists to collect data on SES at greater extents and finer resolutions. Citizen science projects such as the Christmas Bird Count (which has been in existence since 1900 [
10]) and projects using volunteered geographic information (VGI) (e.g., OpenStreetMap) [
11], where the public actively supply geographical information, have contributed to a greater understanding of ecosystem changes over time [
12]. Examples include the use of crowd-sourced data to map farm field size and land-use change at a global scale [
13,
14], monitor crops [
15], and record biodiversity information for residential properties [
16]. Now, with increasing internet coverage and cell phone use worldwide, social media sites (e.g., blogs, micro-blogs, social multimedia) have become an additional source of “big data” for information about social processes, reported in real time. To give a sense of the scale of information being shared online, Twitter, a popular micro-blogging platform, has over 974 million user accounts and 126 million daily users, while Facebook has 1.2 billion daily users [
17] and Instagram, a photo and video-sharing site, has 500 million daily users [
18]. The photo-sharing service Flickr has over 75 million accounts and over 10 billion photos, and 3.5 million photos are uploaded to the site every day [
19]. There is also a plethora of other social media platforms, such as QQ, WeChat, and VK. Many social media data also have a geographical component (as we will discuss in
Section 2). For example, upwards of 50% of tweets from Twitter have some form of locational information in the form of a place description or coordinates, while precisely geolocated tweets range from 0.5 to 3% [
20]. It is also estimated that 4.5% of Flickr content is geotagged [
21].
Social media posts act as a sort of unsolicited VGI, where people self-report their reactions to the digital and physical worlds. Unlike citizen science data and VGI, social media users put forth information that can be used for research, but without the goal of contributing to research [
22]. We would argue that social media data have several features that make them complementary to traditional social data sources for SES research. The large quantity of data points supports robust quantitative analyses without enormous costs to researchers to collect it, and the individual-level resolution allows for scaling between individual decisions, behavior, or motivations, and aggregate behavior. Social media, along with citizen science data and VGI, also provide access to new data sources, such as observations from private property [
23], and may be the only available source of social data in regions where large-scale social surveys are not carried out. Furthermore, the unsolicited nature of preferences, opinions, and perceptions stated on social media can produce unique insights and avoid some of the drawbacks of traditional survey methods, such as misinterpretation of survey questions [
24] and bias associated with stated preferences [
25].
The fine resolution and broad extent of social media data, and the novel insights they can provide into individuals’ responses to and influence on the environment, make them an exciting data source for SES research, including issues of biodiversity conservation [
26,
27] and urban sustainability [
28]. In the last decade, this field has been growing, but there are concerns about the ability of social media data to provide reliable insight into socio-environmental processes because of issues of bias and interpretation (e.g., [
26,
27,
28]). These concerns lead us to ask the following questions:
How can feedback between social and environmental systems be meaningfully studied using social media data?
How can using social media data reframe or complement current SES research questions and methods?
Are there best practices for collecting and validating social media data for use in SES research?
We address these questions in the remainder of this paper. First, we review studies that have used social media data to examine a wide range of topics in SES. These studies vary in their spatial and temporal scale and the social media data sources that they use as shown in
Table 1, but can be organized into categories based on the aspects of an SES that they focus on (i.e., Social, Environmental, Environmental → Social, or Social → Environmental, as shown in
Figure 1). We begin by describing studies examining purely social or environmental phenomena, using social media data as “sensors” (
Section 2). In
Section 3, we describe ways that researchers have used social media data to study people’s responses to the environment, including responses to natural hazards and attitudes towards natural areas (Environmental → Social processes), before turning our attention to studies on the effects of people’s behaviors on the environment (Social → Environmental processes) in
Section 4. These studies show how social media data provide insights into several land-related issues, including environmental and land-use change, ecosystem service provisioning, and informing land management and landscape conservation decisions. However, there are challenges to using and interpreting social media data, including several sources of bias. We discuss these in
Section 5 and provide recommendations for best practices in this field, with a specific emphasis on SES research. Finally, in
Section 6, we provide a brief summary of the paper.
2. Social Media Data as Social and Environmental “Sensors”
As people record their reactions to social and environmental phenomena on social media sites, the data they create can act as “sensors” tracking these phenomena [
29] for a wide variety of topics (as shown in
Table 1). Since social media data record people’s unsolicited views, they can provide insights into complex social phenomena that would otherwise go unnoticed [
30]. Social media data have several key attributes (i.e., metadata, in the form of JSON or EXIF files from Twitter and Flickr respectively) that together create a rich source of information that can be mined for SES research. Metadata provide various pieces of information about a social media post and/or platform user, which researchers can access using an application programming interface (API). These include: the content of the posts themselves, which can take the form of text, images, or video; an associated timestamp and (often) geographic location; a network of “followers” or users who see the post; and a number of “likes” or “retweets” by other users, which can be used to estimate the influence or popularity of a particular post (as shown in
Figure 2B). These different types of data are used to answer various research questions in SES studies, as shown in
Figure 2. For example, the location coordinates associated with some social media posts provide information that can be used to analyze where a tweet originates (e.g., [
31]) or where a Flickr photo was taken (e.g., [
32]), and analyzing text provides information on the subjects of tweets (e.g., [
33]). Thus, social media can track the “pulse” of public opinion and the popularity of different topics across space and over time, as well as how information spreads across networks, between individuals and groups [
20]. Sentiment analysis, which quantifies positive or negative associations with different posts and associated topics, can be used to compare people’s attitudes or emotional responses towards different topics, such as the transit organizations of different cities [
34] or urban parks [
35].
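To make the attributes described above concrete, the following is a minimal sketch of how a single post's metadata might be unpacked and scored. It assumes a hypothetical JSON record: the field names (`created_at`, `coordinates`, `retweet_count`, `like_count`) are invented for illustration and do not correspond to any platform's actual schema, and the two word lists are a toy lexicon rather than a validated sentiment dictionary.

```python
import json
from datetime import datetime

# Hypothetical post record; field names are illustrative, not a real API schema.
RAW_POST = '''{
  "text": "Beautiful morning at the park, love the new trail!",
  "created_at": "2019-05-04T08:15:00+00:00",
  "coordinates": {"lon": -76.94, "lat": 38.99},
  "retweet_count": 3,
  "like_count": 12
}'''

# Toy lexicon (an assumption); real studies use validated sentiment dictionaries.
POSITIVE = {"beautiful", "love", "great", "happy"}
NEGATIVE = {"terrible", "hate", "dirty", "crowded"}


def parse_post(raw):
    """Extract content, timestamp, location, and a simple engagement count."""
    record = json.loads(raw)
    coords = record.get("coordinates")  # many posts carry no coordinates
    return {
        "text": record["text"],
        "time": datetime.fromisoformat(record["created_at"]),
        "location": (coords["lat"], coords["lon"]) if coords else None,
        "engagement": record.get("retweet_count", 0) + record.get("like_count", 0),
    }


def sentiment_score(text):
    """Return (positive - negative) / (positive + negative), or 0.0 if no lexicon words occur."""
    words = [w.strip(".,!?;:\"'").lower() for w in text.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


post = parse_post(RAW_POST)
score = sentiment_score(post["text"])
```

A word-count score of this kind cannot detect sarcasm or context-dependent meaning, so it should be read as a rough indicator rather than a measurement of attitude.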
Spatial information associated with social media data provides insight into difficult-to-measure processes such as human movement [
46], including identification of hotspots and movement patterns of tourists [
47] and differential movement abilities between residents of advantaged and disadvantaged neighborhoods [
36]. Topics from georeferenced social media posts, as well as other types of VGI such as travel blogs and wikis, can be used to discern people’s conceptualizations of place (i.e., geo-narratives), including defining land-uses [
48] and regions of thematic saliency [
49,
50]. Combined with the identification of important conversation topics, social network analysis exposes links between people, places, and topics, such as discussions of microcephaly, abortion rights, and mosquito control with the spread of the Zika virus [
33].
VGI, including social media data, can also contribute to environmental monitoring efforts and provide insights into environmental phenomena, including filling gaps in authoritative environmental datasets [
51]. For example, the locations of social media data have been used to track the extent of natural hazards such as floods [
52,
53,
54], earthquakes [
31,
55], disease outbreaks [
56], and wildfires [
38,
57]. In the case of such high-impact, rapid events, the high spatial and temporal resolution of social media data can make it critical for identifying places where people are in danger, as well as mapping the extent of hazards, particularly in areas lacking monitoring devices [
57]. In a few cases, researchers have tapped social media data to document longer-term environmental trends, particularly ecological processes such as invasive species spread [
39] and the timing of recurring events, such as pollen release [
58] or leaf emergence [
59]. Social networking sites can also serve as citizen science platforms where individuals leverage their social network to help identify species they observe, by posting pictures, audio files, or descriptions online. iNaturalist is an example of a social networking site specifically designed for this purpose (inaturalist.org), but such crowd-sourced species identification activities also occur on citizen science platforms such as eBird (
https://ebird.org/home) and on social networking sites, such as Twitter [
60]. Species observations on social media can provide valuable information on species distributions [
39,
61,
62] and document animal behaviors captured in photos, videos, or audio recordings (e.g., female bird songs [
63]).
In addition to recording social and environmental phenomena, social media data provide insight into linkages between social and environmental systems (as shown in
Figure 1). Most of the SES research using social media data has focused on people’s responses to the environment, including natural hazards and ecosystem services (as we discuss in
Section 3), rather than effects of people on the environment (as we discuss in
Section 4).
3. Responses to the Environment: Perceptions, Attitudes, and Opinions
Just as social media data can track the extent of natural hazards, they also provide information about people’s responses to those events. For example, in addition to simply analyzing the spatial and temporal occurrences of posts related to wildfires on Twitter to map their extents, researchers have delved into the content of posts to gain insight into people’s socio-psychological responses to wildfire, such as concerns about property damage and health impacts as well as gratitude towards rescue workers [
44,
57]. There have also been several studies using social media data to examine risk perceptions to natural hazards and environmental disasters, including winter storms in New England [
43], Typhoon Haiyan in the Philippines [
64], flood and wind damage from Hurricane Sandy in the Northeastern U.S. [
65,
66], and algal blooms in Ohio [
40]. Social media data provide insight into people’s concerns and situational awareness, both during and following events. Together with spatiotemporal information on the location of threats in real time [
42], knowledge of people’s concerns helps to guide and improve hazard response strategies [
43,
57]. Furthermore, social media posts can indicate the degree to which the public understands the environmental issues underlying crises (e.g., connecting a water supply shutdown with the toxic algal blooms that caused it [
40]), the level of satisfaction or dissatisfaction with the government response to the event [
43], and the importance of social networks for information spread during disasters [
67].
Compared to research on responses to natural hazards, few studies have examined responses to longer-term environmental trends such as climate change over the periods at which the changes occur (i.e., decades). Most extant social media platforms have not been established long enough to capture long-term trends, but ongoing data collection could provide insight into people’s responses to longer-term changes. For example, several studies have used geolocated tweets to examine human mobility [
68,
69]; these methods could be applied to study changes in human movement (e.g., environmental migration) in response to changing climate, if collected over longer periods of time.
Social media data also provide information about the benefits that people get from ecosystems (i.e., ecosystem services) and how these vary across space and time in response to environmental features or management decisions. Much of the work on this topic has focused on people’s attitudes towards conservation areas and other environmental features [
26]. Sentiment scores of tweets are used to quantify positive effects of exposure to nature, such as urban green spaces, on human wellbeing [
70]. A number of studies have used the frequency of social media posts to estimate the number of visitors to recreation sites (e.g., [
71,
72]), and how recreation desirability relates to features such as water quality in lakes [
24], or park amenities and accessibility [
73]. Others have calculated the number of photos posted on social networking sites (e.g., Flickr, Instagram, Panoramio) at different locations to compare the aesthetic value of different sites or landscape features [
74,
75]. The content of social media posts (e.g., the subjects of photographs) can also document the activities people participate in and the features they notice or appreciate at different sites, such as preferences for animal species in a national park [
42]. Analyzing the content of social media posts in natural areas has been used to infer the “cultural ecosystem services” (e.g., spiritual or aesthetic appreciation of nature, recreation, sense of place) they provide [
41,
76,
77]. Notably, large-scale studies of aesthetic value and other cultural ecosystem services would not be feasible without these types of broad extent, fine-resolution data, which allow for comparisons between different landscapes. These data could be used at large scales to understand the tradeoffs between cultural ecosystem services and other goals for landscapes, such as food production, regulating ecosystem services, or biodiversity.
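As an illustration of how post frequency can be turned into a visitation proxy, the sketch below counts distinct user-day combinations per site, so that one person uploading many photos on the same day is counted once. This particular counting rule is an assumption chosen for illustration, not necessarily the method of the studies cited above, and the site names, user IDs, and dates are made up.

```python
from collections import defaultdict


def photo_user_days(photos):
    """Count distinct (user, date) pairs per site.

    `photos` is an iterable of (site, user_id, date_string) tuples.
    """
    seen = defaultdict(set)
    for site, user, date in photos:
        seen[site].add((user, date))
    return {site: len(pairs) for site, pairs in seen.items()}


# Made-up example records: (site, user, date the photo was taken).
photos = [
    ("Lake A", "u1", "2018-07-01"),
    ("Lake A", "u1", "2018-07-01"),  # same user, same day: not counted again
    ("Lake A", "u2", "2018-07-01"),
    ("Lake A", "u1", "2018-07-02"),
    ("Lake B", "u3", "2018-07-01"),
]

visits = photo_user_days(photos)  # {'Lake A': 3, 'Lake B': 1}
```

Such counts are relative indices; converting them into actual visitor numbers would require calibration against on-site counts or survey data.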
Beyond quantifying the benefits that people derive from landscapes, social media studies can inform land management decisions. Some researchers have used people’s attitudes towards subjects in social media posts to assign non-monetary values to places and actions, such as land conservation, as well as guide decisions for land management and planning to meet people’s preferences and expectations [
78]. For example, Barry [
79] used Flickr photos and associated comments to examine people’s perceptions of livestock grazing in San Francisco Bay Area parks, and Sonter et al. [
80] examined the effects of forest clearing on nature-based recreation in Vermont.
In addition to capturing people’s responses to particular environmental events or natural features, identifying popular topics or trends on social media can provide information on the environmental issues that people care about [
81,
82]. Google Trends and Twitter in particular have often been used to track interest in topics over time and across space. For example, Cha and Stow [
40] used Twitter to monitor online discussions around harmful algal blooms and the subsequent drinking water shutdown in Toledo, OH in 2014 and Google Trends to follow broader-scale interest in this issue over time. Twitter has also been used to gauge public interest in other environmental issues, such as people’s opinions about invasive species [
39] and climate change [
83]. Using spatial and temporal information and social networks, researchers can track how the interest in a topic varies across space or time [
29,
84], identify key stakeholder groups and information sharers [
85], and understand how network structure influences the sharing and spread of information on environmental topics [
60,
86]. Research findings can inform education and tailoring of messages to increase interest in or understanding of key environmental issues. Roberge [
87] used Twitter to assess interest in different species, which provides information on the efficacy of conservation outreach programs focused on different species and how they could be improved. Based on these works, one could imagine that a better understanding of how people respond to the content and delivery of messages could in future be used to tailor messages that more effectively change people’s attitudes towards environmental issues and behaviors that influence the environment [
88].
4. Effects of People’s Behaviors on the Environment
In general, behaviors are more difficult than attitudes to detect on social media, and thus less research has addressed the environmental impacts of relevant behaviors. However, it is possible to use these data to detect some aspects of people’s effects on their environment via their self-reported behaviors. Reported participation in resource-use activities such as fishing (as shown in
Figure 3), foraging, or hunting [
45] could complement more authoritative data sources on these activities, particularly at local spatial scales. In addition, people may post on social media when making changes to their properties that have environmental consequences (whether positive or negative), such as planting native species in their gardens to provide habitat for endangered pollinator species, building a floodwall, or installing solar panels (as shown in
Figure 4). Some users also report other behaviors on social media that have an indirect impact on the environment, such as choices to buy or promote “green” products.
Location data from social media can also provide information on the degree of human impact on natural areas. For example, in parks or other natural areas where location estimates are accurate and most users are involved in recreation, information on “hot spots” of use [
89] can inform decisions about trail maintenance and monitoring of erosion and other environmental impacts [
90]. At a larger spatial scale, social media data and VGI can be used to identify the locations of roads, fishing boats, or areas of high human movement, which may indicate potential threats to vulnerable animal populations (e.g., [
90,
91,
92,
93,
94]). More generally, analysis of social media posts that provide information on land-use and species observations could be combined to better understand the broad-scale effects of land-use change on biodiversity, an area in need of research at large spatial scales [
95].
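One simple way to locate “hot spots” of use from post coordinates is to bin them into a regular grid and flag cells above a threshold. The sketch below is only one possible approach, under stated assumptions: the cell size in degrees, the threshold, and the coordinates are hypothetical, and a real analysis would more likely use projected coordinates and a formal clustering or density estimation method.

```python
import math
from collections import Counter


def grid_counts(points, cell_size_deg):
    """Count posts per grid cell; cells are indexed by (row, column) integers."""
    counts = Counter()
    for lat, lon in points:
        cell = (math.floor(lat / cell_size_deg), math.floor(lon / cell_size_deg))
        counts[cell] += 1
    return counts


def hot_spots(points, cell_size_deg, min_posts):
    """Return the cells containing at least `min_posts` posts."""
    counts = grid_counts(points, cell_size_deg)
    return {cell: n for cell, n in counts.items() if n >= min_posts}


# Made-up (latitude, longitude) pairs for geotagged posts.
points = [(38.6, -76.9), (38.7, -76.8), (38.9, -76.6), (40.2, -75.1)]

busy_cells = hot_spots(points, cell_size_deg=0.5, min_posts=2)
```

The result depends on the chosen cell size and grid origin, so it is worth checking whether the hot spots persist at other resolutions.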
5. Challenges and Best Practices
While social media can be an informative data source for SES research (as discussed above), there are concerns about bias in these data and their ability to provide reliable insight into social and socio-environmental processes [
96]. There are some known biases in the population of social media users: a bias towards urban dwellers, with fewer users in rural areas, and a bias against populations that are technology averse, including the elderly [
97]. In addition, there are some known differences between the user groups of various social media platforms [
97]. Beyond these known biases, there is generally not enough information on social media users to identify biases in any particular study’s sample population, because of concerns about users’ privacy that limit the availability of personal information. Early research on social media revealed that even when usernames and other personal information are masked in the data, it can still be possible to link specific posts to individuals [
98]. Thus, social media companies have constantly evolving measures to limit access to sensitive information and data sharing. Changing options about how spatial locations are reported mean that the amount and precision of georeferenced data change constantly and vary among posts. In addition, without basic information about who users are (e.g., where they are from; their age, race, gender, education level, etc.), it can be difficult to understand the complexities of social dynamics on social media and to recognize and deal with potential bias in the sample population.
The lack of information on who social media users are, and how they decide what to post online, can contribute to difficulty in confidently interpreting the content of social media posts. This is particularly an issue in cases where social media is used as “big data” with analysis of myriad posts from many individuals. A single social media platform can have many different uses for different individuals and populations [
64]. For example, Twitter is used for professional networking and self-promotion, sharing news and information, or communicating conversationally with friends. Individuals may show biases in their behavior on social media depending on when in life they adopted its use [
99,
100,
101] and whether their perceived audience is made up of friends, family, colleagues, potential employers, random strangers, or a combination of all of these. The meanings of specific terms (e.g., slang) and the tone of posts may depend on the identity of the user and their intended audience, which are often unknown. In addition, sarcasm and insincerity can be difficult to detect, especially when using algorithms to process data (e.g., for sentiment analysis [
35]).
The major concern with using social media data for SES research is thus that either systematic biases or misinterpretation of data could lead to inaccurate conclusions about the social or environmental phenomena under investigation. A further issue is that it is generally not feasible to validate the data by asking follow-up questions or repeating a sample. Studies using social media data are often difficult to replicate because of constraints on data access and sharing. Some sites, such as Twitter, allow researchers to collect a small percentage of tweets in real time for free, but access to the full dataset or to past data usually requires payment to intermediary companies (e.g., Crimson Hexagon Data Library Platform, the sproutsocial platform, or sprinklr). Thus, different researchers could make the same query and receive very different samples. Other sites, such as Facebook, have almost completely cut off access to data for researchers because of privacy concerns (although they have shared the data with companies that use them for political and economic ends [
102]). Restrictions on sharing data also reduce the replicability of data analysis.
Although the biases in social media data are important to recognize and take into account when designing studies and interpreting results, there are existing strategies to address some of the primary concerns with social media research [
28]. With respect to SES research, we have identified three major categories of best practices, which we describe in the following subsections. Specifically, we describe the importance of selecting research questions which are appropriate for the data being collected (
Section 5.1), engaging with theory (
Section 5.2), and data validation (
Section 5.3).
5.1. Design Research Questions That Are Appropriate for the Available Data
As with any new data or modeling technique, one has to understand the limits of social media data and how they differ from data that were collected for a specific reason and from a specific population. The attributes of social media data that make them challenging to work with (i.e., biases in who uses different platforms, unknown bias in particular samples, and difficulties in interpretation, as described above) limit the set of questions that can be reasonably and confidently addressed with these data. For example, questions that require a representative sample of the population may not be appropriate for analysis with social media data (although there are some ways to account for bias; see
Section 5.3). Identifying appropriate research questions up front is key to producing valid results.
Social media will likely not be a suitable data source for all environmental phenomena. Only highly salient topics, such as major events (e.g., hurricanes) and controversial issues (e.g., pollution), are likely to have adequate coverage to support analysis. Less salient topics, such as routine activities (e.g., commuting) that nonetheless have significant environmental consequences, may not receive sufficient attention to support analysis. Furthermore, since only a fraction of people use any given social media site and many platforms only allow researchers access to a fraction of the data, the absence of a phenomenon in the data cannot be taken as evidence that the phenomenon has not occurred. For example, some species, such as coyotes, might be observable mainly at night and/or in rural landscapes [
103], which are both contexts in which social media users may be under-represented. However, it is possible to use the presence of a phenomenon to reject a null hypothesis that it does not exist (e.g., [
71,
73]). Similarly, when looking at responses like emotions or opinions, one will never find the full distribution of what is occurring but can potentially capture the ends of the range (e.g., extreme positive and extreme negative responses). Given these potential observability gaps, major trends or salient topics provide a better opportunity to leverage social media data for SES research.
5.2. Engage with Theory to Test Hypotheses and Interpret Data
Some have suggested that the role of theory has been diminished with the rise of “big data” (e.g., [
104]). However, it has also been argued that theory is necessary to interpret data, identify important aspects or dimensions of phenomena to measure (e.g., [
105,
106]), turn data into information (i.e., link data to how we think people behave), and identify points in SES feedback loops where data can be collected. Theory is also useful for formulating hypotheses about social-environmental interactions (e.g., the state of the natural resource and the behavioral response or pattern of use) that can be tested with available social media data. In particular, social media data offer unique opportunities to test hypotheses generated from social and psychological theories, because observations are made in situ, at the level of individuals, and often repeatedly over time as events are unfolding. For example, Barberá et al. [
107] tested hypotheses about ideological differences in the formation of “echo chambers” using Twitter exchanges across a range of political and non-political issues, and found key differences in communication structures between ideological groups consistent with psychological theory about ideological motivations. In contrast, conventional psychological theory testing is conducted with hypothetical situations presented in controlled lab experiments or with one-time surveys using surrogate test groups (e.g., university students). Similarly, theories of social processes that operate through networked interactions are difficult to test, because empirical social network data are difficult to collect and often only a snapshot of a network at one point in time is available [
108]. In addition to these, other social and/or psychological theories lend themselves well to testing with social media data. Theories of social amplification of risk [
43,
109], salience [
110], and habituation [
111] are relevant to studies on people’s responses to natural disasters. For example, habituation has been used to explain people’s responses to earthquakes on social media [
31]. Risk aversion (e.g., prospect theory [
112]) and opportunity costs (e.g., travel cost method) could help explain patterns in human movement and other actions captured on social media [
113]. Theory on the evolution of social norms and institutional analysis (e.g., [
114,
115]) could provide valuable lenses for understanding past and potential future reactions to environmental regulations (e.g., climate regulation [
116]).
5.3. Data Integration, Calibration, and Validation
The same standards of careful crosschecking, modeling, and testing for model sensitivity that have been developed for social science research should be applied to analyses using social media data. Strategies to account for sampling bias, comparison to authoritative data, and combining different lines of evidence to interpret data (i.e., triangulation) can help to make social media studies more robust.
Because of the biases inherent in social media data, it is important to quantify the uncertainty in the data, to the extent possible, and to distinguish between variation that is attributable to inherent sample variability and variation that is attributable to measurement error. When the desired study population is known, it is possible to account for sampling biases to some extent by comparing the sampled population to the known demographics of social media platform users. For example, Keeler et al. [
24] used Flickr photographs of lakes to examine lake visitation in Iowa and Minnesota, and compared the demographics of survey respondents from a previous study on Iowa lake visitors to the worldwide population of Flickr users to identify potential biases in Flickr users visiting Iowa lakes. Although demographic information about the sample population is generally unknown, it is often possible to infer information such as where users live based on the centroid of the locations of their posts [
36]. Some researchers also identify types of users or stakeholders, based on the content of their posts or their stated affiliation [
34,
64,
85]. As different social media platforms have different user groups, combining data from different platforms into a single analysis can also reduce the overall bias in the sample [
117].
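As a minimal sketch of the centroid heuristic mentioned above, the following Python example averages the coordinates of each user's geotagged posts to obtain a rough estimate of where that user is based. The function name `infer_home_locations` and the `(user_id, lat, lon)` record layout are assumptions made for illustration and are not taken from the cited studies. A plain arithmetic mean of latitude and longitude is only reasonable when a user's posts cluster within a small region and do not straddle the antimeridian.

```python
from collections import defaultdict


def infer_home_locations(posts):
    """Estimate one location per user as the centroid of their posts.

    `posts` is an iterable of (user_id, lat, lon) tuples (hypothetical layout).
    Returns a dict mapping user_id -> (mean_lat, mean_lon).
    """
    # Running [sum_lat, sum_lon, count] for every user.
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for user_id, lat, lon in posts:
        acc = sums[user_id]
        acc[0] += lat
        acc[1] += lon
        acc[2] += 1
    # Arithmetic mean of coordinates; an approximation valid for small areas.
    return {user: (s[0] / s[2], s[1] / s[2]) for user, s in sums.items()}


# Illustrative (made-up) records: user "a" posted twice, user "b" once.
posts = [
    ("a", 44.0, -93.0),
    ("a", 46.0, -95.0),
    ("b", 41.5, -93.5),
]
homes = infer_home_locations(posts)
```

In practice, a median or a density-based mode of the post locations may be more robust than the mean for users who travel widely, since a few distant posts can pull the centroid far from any place the user actually frequents.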
Census data and other types of authoritative data can help to account for geographic bias in social media datasets as well. A simple approach would be to map the locations of social media posts and compare the densities of observations to authoritative data on where people are, such as population estimates from official censuses or data on tourism or transportation routes. Researchers can use this information to weight data from high-population areas more than others in analyses [
36] or test the sensitivity of models to population estimates or the density of social media users across space [
66]. When examining observations related to a particular topic, comparing the spatial distribution of those observations to social media observations overall could also account for geographic biases (i.e., observation error) in where people are using social media.
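The comparison of post densities with population estimates can be sketched as follows. This example is one possible formulation, not the procedure used in the cited studies: it computes, for each region, the ratio of its share of posts to its share of the population, and derives a post-stratification style weight as the inverse of that ratio. The region names, counts, and the function name `representation_ratios` are hypothetical.

```python
def representation_ratios(post_counts, populations):
    """Compare each region's share of posts with its share of population.

    A ratio above 1 means the region is over-represented in the social media
    sample relative to its population; below 1 means under-represented.
    """
    total_posts = sum(post_counts.values())
    total_pop = sum(populations[region] for region in post_counts)
    ratios = {}
    for region, n_posts in post_counts.items():
        post_share = n_posts / total_posts
        pop_share = populations[region] / total_pop
        ratios[region] = post_share / pop_share
    return ratios


# Made-up example: equal populations, but most posts come from the urban region.
post_counts = {"urban": 800, "rural": 200}
populations = {"urban": 500_000, "rural": 500_000}

ratios = representation_ratios(post_counts, populations)
# Inverse ratios act as weights that rebalance the sample toward the
# population distribution (assumes every region has at least one post).
weights = {region: 1.0 / r for region, r in ratios.items()}
```

Re-running an analysis with and without such weights, or with alternative population estimates, is one simple way to test how sensitive the results are to geographic bias.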
In addition to census data, other authoritative datasets can validate findings from social media analyses. This strategy can provide more robust conclusions and, when the findings align, increased confidence in the use of social media data as a proxy for authoritative data sources. For example, studies using the number of social media posts in different locations have found strong concordance with on-site data collection on the number of visitors to recreation areas [
24,
71]. Other studies have also found concordance between tweets referring to smoke from wildfires and Environmental Protection Agency (EPA) air quality monitoring data [
57] and between Flickr photos relating to Hurricane Sandy and atmospheric pressure measurements from the National Oceanic and Atmospheric Administration’s (NOAA’s) Automated Surface Observing System [
52]. These findings provide evidence that social media data can act as useful sensors in areas without authoritative data collection, and thus can be used to scale research up to larger study areas. A promising future direction for the use of social media in SES research would be to work backwards from conventional data sources to derive an expected distribution of observations, and then compare this distribution to observations from social media data to identify gaps in data collection.
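Concordance between social media counts and authoritative measurements is often summarized with a correlation coefficient. The sketch below computes a Pearson correlation between per-site post counts and on-site visitor counts using only the Python standard library. The numbers are invented for illustration and do not reproduce any result from the cited studies, and the choice of Pearson's r (rather than, say, a rank correlation or a regression model) is an assumption.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-site data: geotagged posts vs. visitors counted on site.
posts_per_site = [10, 20, 30, 40]
visitors_per_site = [100, 210, 290, 405]

r = pearson_r(posts_per_site, visitors_per_site)
```

A high correlation at sites where both data sources exist lends some support to using post counts as a proxy at sites that lack on-site monitoring, although it does not guarantee that the relationship holds elsewhere.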
Even in cases where authoritative data are not available, combining different data sources or lines of evidence can be very useful for calibrating and validating interpretations of social media data. There is great potential for applying qualitative social science methods such as triangulation and crystallization to social media analyses to interpret data. For example, analyzing both the composition of photographs and the associated text can provide a more conclusive idea of the identity of the intended subject [
118]; similarly, automated analyses of the sentiment associated with text can benefit from a qualitative analysis of the text’s meaning and relevant themes [
119].
6. Summary
While analysis of social media data will never replace traditional research methods, there are several advantages to the use of social media data for SES research that make it a useful complement. The unsolicited nature of social media is akin to revealed rather than stated preferences in economics, and may make it more appropriate than traditional surveys for uncovering new social phenomena, capturing rapidly changing situations, and understanding people’s true views on certain topics [
24,
25]. In addition, many studies (like the ones shown in
Table 1) have shown the utility of social media for accurate social and environmental monitoring in areas where authoritative data are lacking. Insights from social media will likely only increase in the future, as will strategies to account for the particularities of the data to draw robust conclusions. We see particular promise in the use of social media to scale up from existing in-depth, small-scale studies (using authoritative data at the smaller scale to ground-truth findings) and quantify individual behaviors for parameterizing and validating complex systems models [
120,
121]. In many cases, social media analysis will be used as an exploratory tool, documenting a new phenomenon, particularly in locations and populations where data from traditional sources are lacking. Recognizing the exploratory nature of such studies and following up with more targeted studies, including those using more traditional methods, is necessary to test whether the phenomenon is real.
In the current age of constant data creation, the potential to harness existing “big data” to address SES research questions is considerable. In only the last decade, social media data have provided insight into SES relationships at broad spatial scales, including people’s perceptions of risk from natural hazards, how people value recreation areas and ecosystem services, and even how people select environmentally relevant behaviors. Thus, studies using social media data contribute to knowledge about land-use and environmental changes, natural resource use, and ecosystem service provisioning, with the potential for advancing SES theory and informing land management and planning. Although there are some important caveats to the use of social media data for research, thoughtful selection of appropriate research questions, interpretations guided by theory, and creative methods for addressing bias and uncertainty offer promising solutions to many of these issues and provide us with new opportunities to study SES.