Responding to societal crises requires sustained changes in individual behaviors [1]. For example, to slow the spread of COVID-19, governments worldwide have enacted a variety of mitigation measures that include physical distancing, bans on public gatherings, quarantines, and mask mandates. However, public response to these policies was influenced by factors such as political partisanship and attitudes towards science [4]. While conservatives were more skeptical of government policies to limit the spread of the virus [6], mistrust of authorities and experts extended beyond partisanship [7]. Mistrust of science unites a diverse coalition of “anti-vaxxers”, “natural parenting” and alternative medicine advocates, and others who are suspicious of “big government” and “big pharma” and who refuse to follow some of the official health guidelines. Understanding science skepticism will help us better frame policies that encourage the behaviors necessary to respond to the next health or environmental crisis [9].
Anti-science attitudes are typically assessed through surveys, which ask people to report how much they agree with scientific statements, for example about vaccine safety or human activity causing climate change. Using the responses, researchers have examined the role of political ideology [8] and religious views [10]. Researchers have also linked attitudes toward science to scientific literacy, showing that scientific knowledge explains anti-science views even after controlling for key socioeconomic variables such as religious faith, gender, and income. More recent research has examined cognitive and emotional factors, such as reasoning ability [14], fear [16], and anger [17].
In this paper, we present a framework for measuring anti-science attitudes that are expressed through messages posted on social media. This approach can complement existing methods [18] by enabling us to study attitudes at the population scale with high spatial and temporal granularity and to measure their evolution over time. We illustrate the framework by studying the social and cognitive factors of the communities where users live that explain the prevalence of anti-science attitudes.
Researchers have previously used social media data to characterize the political ideology of users. Conover et al. [21] introduced a text-based approach, using sets of political terms to measure partisanship. In contrast, Barbera [22] applied a latent space model to the follower graph, while Badawy et al. [23] used label propagation on the retweet graph to infer the political leanings of social media users. More recently, researchers leveraged a set of curated news and information sources to label the political orientation of users based on the partisanship of the information sources they share [24]. Others have operationalized users’ vulnerability to fake news and misinformation based on their attention to low-quality information sources [26].
We adopted a similar approach to characterize science skepticism, using the number of links to curated pro-science and anti-science information sources users share on social media as a measure of their attention to anti-science content. Specifically, we used a random 10% sample of all messages (tweets) publicly shared on Twitter in October 2016. We further restricted the tweets to those with location metadata and linked them to counties within the US. We trained machine learning classifiers to infer users’ attitudes towards science based on the information sources they shared. In addition, we inferred emotions expressed in tweets and also linked data to political partisanship through county-level outcomes in the 2016 presidential election.
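The link-counting idea described above can be illustrated with a minimal sketch. It is not the classifier pipeline used in the study: the domain lists are hypothetical placeholders, the function names are invented for illustration, and the 0.5 threshold for calling a user anti-science is an assumption.

```python
from urllib.parse import urlparse

# Hypothetical placeholder domains; the curated lists from the study are not reproduced here.
PRO_SCIENCE = {"pro-science.example"}
ANTI_SCIENCE = {"anti-science.example"}


def domain_of(url):
    """Return the lowercase host of a URL, without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def anti_science_score(urls):
    """Fraction of a user's curated-source links that point to anti-science domains.

    Links to domains on neither list are ignored. Returns None when the user
    shared no links to either list.
    """
    domains = [domain_of(u) for u in urls]
    pro = sum(d in PRO_SCIENCE for d in domains)
    anti = sum(d in ANTI_SCIENCE for d in domains)
    total = pro + anti
    return anti / total if total else None


def county_anti_share(user_scores, threshold=0.5):
    """Share of scored users per county whose score exceeds the (assumed) threshold.

    user_scores is an iterable of (county_id, score) pairs; unscored users are skipped.
    """
    totals, anti = {}, {}
    for county, score in user_scores:
        if score is None:
            continue
        totals[county] = totals.get(county, 0) + 1
        anti[county] = anti.get(county, 0) + (score > threshold)
    return {c: anti[c] / totals[c] for c in totals}
```

A score near 1 means the user mostly shares anti-science sources, and a score near 0 means mostly pro-science sources. Aggregating per county gives the kind of county-level share that the later regression analysis tries to explain.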
To explain variation in attitudes toward science, we studied their relationship to the socioeconomic characteristics of communities (counties) from which people tweeted. Although the share of social media users expressing anti-science attitudes was correlated with the population of the county, the education and income of its residents, and their political partisanship, these relationships were confounded by the correlations among the variables. To disentangle these effects, we performed robust statistical analysis to identify groups of counties with similar trends. Our analysis revealed three types of communities that differ in their skepticism of science: communities in large metropolitan regions, smaller metro (and suburban) areas, and rural regions. Anti-science attitudes were more prevalent in older, more affluent rural communities and younger metro regions. Across all types of communities, a larger share of Trump voters and non-White residents was associated with more anti-science attitudes. Additionally, these attitudes were associated with lower emotional valence and high arousal, but only in small metro and rural areas, suggesting that anger plays an important role in the mistrust of science. We also found that anti-science attitudes partly explained differences in COVID-19 vaccination rates almost five years later. Our study suggests that pre-existing science skepticism may have provided fertile ground for the resistance to COVID-19 mitigation measures to take root in communities across the US.
The contributions of this work are as follows:
We described an approach to estimate anti-science views from the text of messages posted on social media, enabling the tracking of attitudes toward science at scale.
We studied the geographical variation of anti-science attitudes and found differences across the US.
We identified the latent structure of data using state-of-the-art machine learning methods to demonstrate the importance of stratifying data on latent groups to measure more robust trends. The structure of data suggests that cultural differences are defined by the urban–rural divide.
We found that anti-science attitudes are associated with lower COVID-19 vaccination rates in urban communities.
Our analysis revealed the importance of partisanship, race, and emotions such as anger in explaining anti-science attitudes. However, education was not found to have significant explanatory power, and income was only mildly significant.
Given the global adoption and deep penetration of social media, especially in urban areas, this framework enables the population-wide monitoring of the expressions of attitudes at unprecedented spatial and temporal scales with high resolution. The framework can be extended to situations where surveys or interviews are not practical, but where online devices are present, which includes much of the developing world. In addition, this data-driven framework can indicate where the confounders lie when conducting survey-based studies and can help design models that better reflect the hidden structure of data.
Social data are often highly heterogeneous, i.e., generated by individuals with different behaviors. The divergent trends at the individual level may disappear when data are aggregated, confounding the analysis [42]. To mitigate this effect, we used a recently developed machine learning algorithm to disaggregate the data and perform regression analysis within the discovered groups. To explain the share of social media users within communities expressing anti-science attitudes, the method identified three groups, which roughly map to communities within large metropolitan areas, smaller metro and suburban areas, and rural areas. This suggests that the urban–rural divide defines cultural differences in anti-science attitudes in the US. In all three groups, the share of White households is negatively associated with anti-science attitudes, meaning that counties with a larger non-White population also have more people expressing skepticism of science. In the pooled data (Figure 3, left), there was little association with age, but in the disaggregated data there are stronger, statistically significant associations between anti-science attitudes and age (Figure 3, right). Specifically, rural communities with older populations have significantly more anti-science users, but older urban communities have fewer anti-science users. The trends with respect to income also show Simpson’s reversal [42]. In the pooled data, income has a negative association with anti-science attitudes, but in the disaggregated data, there is a positive association with income in rural and urban (though not statistically significant) counties. Household size was not appreciably associated with anti-science attitudes in the pooled data but has a strong relationship in one of the groups: rural counties with larger households express more anti-science attitudes. Finally, Trump’s vote share was associated with more anti-science attitudes across all groups.
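The sign reversal between pooled and disaggregated trends can be reproduced on a toy example. The sketch below uses synthetic numbers and assumes the group labels are already known; the study instead discovers the groups with a machine learning algorithm, which is not reproduced here. The group names and values are illustrative only.

```python
def ols_slope(xs, ys):
    """Slope of the least-squares line y = a + b*x for paired observations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


# Synthetic (income, anti-science share) observations for two hypothetical groups.
# Within each group the share rises with income, but the higher-income group
# starts from a much lower baseline.
groups = {
    "group_low_income": ([1, 2, 3], [7, 8, 9]),
    "group_high_income": ([6, 7, 8], [1, 2, 3]),
}

# Pooled fit: ignore group membership and regress on all points together.
pooled_x = [x for xs, _ in groups.values() for x in xs]
pooled_y = [y for _, ys in groups.values() for y in ys]
pooled_slope = ols_slope(pooled_x, pooled_y)

# Disaggregated fit: one regression per group.
group_slopes = {name: ols_slope(xs, ys) for name, (xs, ys) in groups.items()}

print("pooled slope:", round(pooled_slope, 3))
print("within-group slopes:", group_slopes)
```

The pooled slope is negative while both within-group slopes are positive, which is the pattern described for income: the aggregate trend reflects differences between groups rather than the relationship inside any one of them.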
One surprise is the relative unimportance of education, which we measured as the share of county residents with college education (or above). Although this variable is negatively correlated with the anti-science share, its coefficients within all three groups, while also negative, are not statistically significant. This suggests that most of the effects of education can be explained by other variables, such as income and partisanship.
We also found that anti-science attitudes were associated with significantly lower COVID-19 vaccination rates in urban communities in 2021, even after controlling for political partisanship, race, income, and education. Our analysis does not imply a causal link; instead, it suggests that pre-existing anti-science attitudes may have allowed COVID-19 misinformation and resistance to mitigation measures to spread, rather than vice versa.
We found emotional factors to be strongly and significantly associated with anti-science attitudes, but only in rural and smaller urban areas. In fact, the effect of psychological factors, especially arousal, was about twice that of partisanship. Together with negative emotional valence, high arousal suggests that anger plays an important role in explaining anti-science attitudes, especially in less densely populated areas. Researchers have begun to explore the psychological antecedents of science skepticism [44], but more work is needed in this area.
When analyzing social media data, one must keep in mind that Twitter users are not a representative sample of the population. Moreover, geo-coded tweets posted in a given area may come not from residents but from users visiting that area, which weakens the link between social media expressions and the characteristics of the area’s residents. However, we believe this applies to a negligible number of tweets. Additionally, the unvaccinated people in each county may include a small fraction who have not had the chance to receive a vaccine. We assume that this number is negligible compared with the number of people who prefer not to be vaccinated.
Flaws in the automatic detection of anti-science attitudes may bias results. We put strict constraints on the classifiers, eliminating from the analysis users whose predicted anti-science scores were not extreme enough. This reduced the number of users in the dataset. Additionally, attitudes could have shifted over the years, diminishing the utility of the 2016 data in understanding current anti-science attitudes. To check, we compared the share of anti-science users in the 2016 data explored in this paper to the share of anti-science users measured in the 2020 data using a similar methodology [25]. The correlation at the state level is 0.6, which suggests that attitudes are fairly stable over time.
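A stability check of this kind can be sketched as a Pearson correlation between two state-level series. The state codes and shares below are hypothetical and do not reproduce the reported value of 0.6. The choice of Pearson correlation is also an assumption, since the text does not name the coefficient used.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical anti-science user shares per state in two years.
share_2016 = {"AA": 0.10, "BB": 0.20, "CC": 0.30, "DD": 0.25}
share_2020 = {"AA": 0.12, "BB": 0.18, "CC": 0.33, "DD": 0.21}

# Compare only states present in both datasets, in a fixed order.
states = sorted(share_2016.keys() & share_2020.keys())
r = pearson_r([share_2016[s] for s in states], [share_2020[s] for s in states])
print("state-level correlation:", round(r, 3))
```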
Multi-collinearity could affect regression coefficients. To address this concern, we used ridge regression to estimate regression coefficients and found them to be largely unchanged.
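As an illustration of this robustness check, the sketch below compares ordinary least squares with ridge regression on two nearly collinear synthetic predictors. The data, the penalty value, and the function name `ridge_2d` are assumptions for illustration; the study's actual variables and settings are not reproduced.

```python
def ridge_2d(x1, x2, y, lam):
    """Ridge coefficients for two predictors using the closed form
    (X'X + lam*I)^-1 X'y on mean-centered data.

    Setting lam = 0 gives the ordinary least-squares coefficients.
    """
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    a = [v - m1 for v in x1]
    b = [v - m2 for v in x2]
    c = [v - my for v in y]
    # Entries of X'X (plus the ridge penalty on the diagonal) and of X'y.
    s11 = sum(u * u for u in a) + lam
    s22 = sum(v * v for v in b) + lam
    s12 = sum(u * v for u, v in zip(a, b))
    t1 = sum(u * w for u, w in zip(a, c))
    t2 = sum(v * w for v, w in zip(b, c))
    # Invert the 2x2 matrix explicitly.
    det = s11 * s22 - s12 * s12
    return ((s22 * t1 - s12 * t2) / det, (s11 * t2 - s12 * t1) / det)


# Two strongly correlated synthetic predictors and a noiseless outcome.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [1.1, 1.9, 3.2, 3.9, 5.1]
y = [2 * u + 3 * v for u, v in zip(x1, x2)]

ols = ridge_2d(x1, x2, y, lam=0.0)    # unpenalized fit
ridge = ridge_2d(x1, x2, y, lam=1.0)  # penalized fit, lam chosen arbitrarily

print("OLS coefficients:  ", ols)
print("ridge coefficients:", ridge)
```

If the ridge coefficients stay close to the unpenalized ones in sign and relative size, the conclusions are unlikely to be driven by collinearity among the predictors. Large swings would suggest the opposite.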
We used public messages posted on social media in October 2016 to track anti-science attitudes in the US. By linking these messages to specific regions in the US through their geographic coordinates, we were able to estimate the number of people in different places who share anti-science attitudes and measured the association with their socioeconomic characteristics.
To disentangle the effects of different variables, we used machine learning methods to disaggregate the data and estimate regression coefficients separately within each group. This analysis split counties into three culturally meaningful groups: those belonging to large urban centers, smaller urban areas, and rural regions. While political partisanship (specifically, the share of Trump voters) was strongly associated with the county’s share of anti-science users, race and psychological factors were also found to be important. Specifically, places with a larger share of non-White population had more anti-science users, and places outside of large metro centers where users expressed anger (negative valence and high arousal) were associated with more anti-science users. We also found that anti-science attitudes in large metropolitan areas are associated with significantly lower COVID-19 vaccination rates nearly five years later.
Crafting effective policies in response to a crisis requires an understanding of public attitudes and views. While public messages posted on social media have long been seen as a practical sensor of these views, past efforts in this area were hampered by the heterogeneous nature of social data. Our paper describes a framework to tame the data heterogeneity and demonstrates that social media is a practical tool for monitoring anti-science attitudes.