An emerging research agenda uses social media data to analyze public opinion on climate change issues. Public perception of the existence of climate change and its impacts on the environment and society is an important issue with societal and political implications [1
]. Public support is also crucial for legislation and for the implementation of climate change mitigation policies [2
]. However, although there is a consensus in the scientific community that climate change is mainly caused by human activities and is already having significant negative impacts on the environment and society [3
], many studies report a lack of agreement among the general public that anthropogenic climate change is occurring [4
]. This discrepancy between the scientific community and the general public on climate change is actually a science communication problem, which has stimulated many scholars to investigate the key factors influencing public attitudes or preferences on topics related to climate change [5
Public opinion analytics is essential for a better understanding of the social environment and the dynamics of social changes. Among various sources of public opinion data, social media data is attracting great attention from researchers, as it provides highly valuable data about the public attitudes and opinions on controversial social events [6
] and has been widely used to monitor and analyze public responses to natural or social phenomena [7
]. For social science research, the nature of social media data is significantly different from the data collected by traditional questionnaire surveys [8
]. Although the sample size of social media data is usually much larger than that of survey data, and the collection process of social media data is also much easier and cheaper [9
], the utility of social media data is still limited by several constraints. First, social media data are usually collected by keyword search, so some records may merely contain the keywords without being relevant to the research topic. In addition, social media data are usually unstructured text, which is difficult to analyze with traditional statistical methods. By contrast, survey questionnaires are carefully designed to investigate an explicit research question, and their measurement systems (e.g., the Likert scale) are mature. Hence, survey data are well structured and can be easily processed by traditional statistical methods. As a result, social media data are regarded as “case-rich but variable-poor”, while survey data are regarded as “case-poor but variable-rich” [8]. Consequently, most social science studies based on social media data are descriptive or rely on simple correlation analysis, rather than predictive or prescriptive analysis [11
Recent studies on scientific communication, based on theories such as Mode-2 knowledge production or post-normal science, show that knowledge is created not only by qualified actors in official spaces such as universities [15
], but also across multiple sites and by multiple actors [16
]. With its increasing popularity, the Internet is now a major place for large-scale and complex discourses on climate change issues [18
]. Blogs and social media have been recognized as an “alternative site of scientific knowledge production” and a “site of knowledge contestation” [19
]. The high visibility of large volumes of user-generated content and the free comments from a large number of readers make these online services interactive [18
] and ensure the involvement of the public speaking back to science, thus creating new public arenas where scientific information becomes more socially robust through contestation [20
]. Because they are widely accessible to the public, blogs and social media are also used by climate scientists as a means of dissemination [21
]. In sum, the online space “has become an increasingly important forum for climate change issues, both from a scientific and political standpoint, and for environmentalist campaigners and climate skeptics alike” [18
Previous studies have investigated the online climate change communication in the blogosphere [18
] and social media platforms such as Twitter and Facebook with big data technologies [1
]. These web services were originally designed for personal emotional expression and social interaction. On these sites, users comment on or share information related to climate change issues in strongly emotional terms, and other users correspondingly respond to this information emotionally rather than rationally. In addition, although climate change issues are often discussed on blogs or social media platforms, the relevant messages are highly scattered, even within a site, creating obstacles to data collection. Fully unstructured data are also difficult to analyze quantitatively. As a result, previous studies using big data from blogs, Twitter, or Facebook have mainly focused on descriptive analysis of online climate change discussions, such as social network analysis of bloggers [19
], linguistic analysis of blog articles [18
], and spatio-temporal distribution analysis of tweeting behaviors [23
]. Predictive or prescriptive analysis, such as identifying the key factors that influence public preferences on climate change knowledge and opinions, cannot be comprehensively carried out with previous methods and data.
Fortunately, the recently emerged Question and Answer (Q&A) platform Quora provides a chance for more in-depth predictive or prescriptive analysis in the field of public opinion and science communication. Quora is regarded as an “online social Q&A community” in the informational jungle of the Internet [26]. It is designed to afford question-posing and answering, to support voting as a way of expressing preference for or support of an answer, and to support collaboration through a social network [27]. This social networking nature makes Quora different from regular Q&A systems such as the Enterprise Q&A system. Its booming popularity and high-quality content highlight the importance of the site in the field of online knowledge sharing. The voting mechanism offers a way to measure public preferences on certain opinions. Another important feature of Quora is that it asks users to provide their full name, including family name and given name, at registration. Although providing a real name is not mandatory, this practice has created a largely real-name environment in Quora, reinforcing the representativeness of Quora’s data as a reflection of public opinion. In addition, the ample auxiliary information, including author information, question information, and answer information, also augments the utility of Quora data for research in public opinion and science communication on climate change.
The aim of this study is to investigate the key factors influencing public preferences on climate change knowledge and opinions, with the user-generated-content data collected from Quora, particularly from the questions under the Climate Change topic in Quora. In this study, the measurement of public preference, which is always a thorny issue in traditional public opinion research [8
], was naturally and quantitatively implemented by counting the up-vote number of an answer. Textual features extracted by topic modelling together with other features of each answer were integrated into a regression model to explain the influence of these features on the up-vote number of an answer. The results of the model reveal the mechanism of the science communication of climate change knowledge in social media sites, and the analytic framework in this study is expected to be widely applied as a methodological strategy in future social science studies, especially those involving online public opinion and science communication.
3.1. Major Topics in Answers
As indicated above, through manual inspection of the details of the established models, a structural topic model with 10 topics was found to be preferable in terms of both semantic coherence and exclusivity, compared to those with more or fewer topics. The selected 10-topic model is shown in Table 2, with the 15 most frequent terms, the proportion in the whole corpus, and the manually proposed label of each topic. Figure 3 presents the word clouds with the 25 most frequent terms for each topic to make the result easier to read and interpret.
Most topics in Table 2 refer to commonly discussed subjects related to climate change. For example, Topic 1 contains high-frequency terms such as “carbon”, “fuel”, “burn”, “dioxid-”, and “emiss-”, and clearly pertains to the fuel and carbon issue, which is widely regarded as the major cause of anthropogenic global warming [44]. Thus, Topic 1 is labeled as “Fuel/Carbon”. Topic 4 contains terms similar to those of Topic 1, but is labeled as “Energy”, as the high-frequency terms in Topic 4 (i.e., “energi-”, “power”, “cost”, “develop”, and “renew-”) reflect that this topic focuses on a more macro level than Topic 1 does. These two topics are relevant to energy and fuel, and account for 18.8% of the whole corpus.
Three topics, including Topic 3 (Human/Biodiversity), Topic 5 (Atmosphere/Weather), and Topic 7 (Hydrosphere), show more relevance to the influences of climate change on different aspects, including water, air, and species. They have a total proportion of 29.1%. Meanwhile, Topic 6 and Topic 9, together accounting for 21.2% of the corpus, focus on more societal issues relevant to climate change, including science communication and politics. Topic 10 discusses details of climate modeling, with many methodological terms, such as “model”, “data”, “predict”, and “trend”.
The remaining two topics, including Topic 2 (Livelihood) and Topic 8 (Future), contain high-frequency words that are commonly used in daily dialogues, rather than in specific subjects related to climate change. The two topics account for 22.7% of the whole corpus.
3.2. Regression Results
Table 3 presents the results of two count data regression models: the Poisson regression model and the negative binomial regression model. The following measures were employed to quantify the model fit: Log likelihood, Akaike Information Criterion (AIC), and Bayesian Information Criterion (BIC). The results in Table 3
show that the negative binomial regression model fits the data better than the Poisson regression model, with a higher Log likelihood, a lower AIC value, and a lower BIC value. Hence, the following interpretations are mainly based on the results of the negative binomial regression model. These measures also indicate that the observed count data (i.e., the up-vote number of answers related to climate change in Quora) do have an over-dispersion problem. The estimated value of the over-dispersion parameter, described in Equation (4), is 0.5584. The better performance of negative binomial regression is consistent with prior studies arguing that negative binomial regression is more useful than the Poisson model in fitting over-dispersed datasets [26
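As a hedged illustration of the fit measures named above, the following Python sketch computes AIC and BIC from a log likelihood using their standard definitions, and shows how an over-dispersion parameter inflates the variance of a count variable. It assumes the common NB2 parameterization (variance = mean + alpha × mean²), which may differ from the form used in Equation (4). The function names, log-likelihood values, parameter counts, and sample size are hypothetical and are not taken from Table 3.

```python
import math

def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood

def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian Information Criterion: k ln n - 2 ln L (lower is better)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood

def nb2_variance(mean: float, alpha: float) -> float:
    """Variance under the NB2 parameterization (an assumption here).
    alpha = 0 recovers the Poisson case, where variance equals the mean."""
    return mean + alpha * mean ** 2

# Hypothetical values for illustration only; NOT the values in Table 3.
n_obs = 1000
models = [
    ("Poisson", -5200.0, 15),
    ("Negative binomial", -3100.0, 16),  # one extra parameter: alpha
]
for name, ll, k in models:
    print(f"{name}: AIC={aic(ll, k):.1f}  BIC={bic(ll, k, n_obs):.1f}")

# Reported alpha of 0.5584 combined with a hypothetical mean of 10 up-votes.
print(nb2_variance(10.0, 0.5584))
```

Under this assumed parameterization, any alpha above zero makes the variance exceed the mean, which is the over-dispersion pattern that the Poisson model cannot represent.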
The effect of each explanatory variable on the dependent variable is determined by the regression coefficient β shown in Table 3. In both the Poisson regression model and the negative binomial regression model, a positive (negative) estimated β coefficient for an explanatory variable indicates that an increase in the variable leads to a higher (lower) expected count of up-votes, ceteris paribus. As both count data models use a log link, that is, they model the natural logarithm of the expected up-vote number, the coefficients can be interpreted as follows: for a one-unit increase in an independent variable, with other variables held fixed, the natural logarithm of the expected count changes by the value of the estimated coefficient. As is shown in the negative binomial regression results in Table 3
, four auxiliary features, namely scaled Author followers, scaled Text length, scaled Image number, and scaled Question followers, were positively associated with the number of up-votes that an answer received, all at a significance level of p < 0.001. For example, the estimated coefficient for the scaled Text length was 0.195, which means that, with other variables held fixed, a one-unit increase in scaled Text length multiplies the expected number of up-votes by 1.215 (exp(0.195) = 1.215).
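The interpretation above amounts to exponentiating a coefficient to obtain a multiplicative effect on the expected count. The following minimal Python sketch applies this to three coefficients quoted in the text (0.195 for scaled Text length, and the β values of 0.451 and 0.348 reported for Topic 6 and Topic 9). The function name `rate_ratio` is hypothetical and used only for illustration.

```python
import math

def rate_ratio(beta: float) -> float:
    """Multiplicative change in the expected count for a one-unit increase
    in a predictor, assuming a log-link count model."""
    return math.exp(beta)

# Coefficients as quoted in the text.
coefficients = {
    "scaled Text length": 0.195,
    "Topic 6 (Science Communication)": 0.451,
    "Topic 9 (Politics)": 0.348,
}

for name, beta in coefficients.items():
    ratio = rate_ratio(beta)
    percent_change = (ratio - 1.0) * 100.0
    print(f"{name}: beta={beta:.3f}, ratio={ratio:.3f}, change={percent_change:+.1f}%")
```

A ratio above 1 corresponds to a positive coefficient and a ratio below 1 to a negative one; a coefficient of zero leaves the expected count unchanged.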
With regard to the textual features, based on the results of the negative binomial regression model, eight topics significantly influenced the public preference for an answer. Six topics, including Topic 1 (Fuel/Carbon), Topic 3 (Human/Biodiversity), Topic 4 (Energy), Topic 6 (Science Communication), Topic 9 (Politics), and Topic 10 (Climate Modeling), showed significantly positive effects on the number of up-votes an answer received. Two topics, including Topic 2 (Livelihood) and Topic 8 (Future/Impact), showed significantly negative effects in this regard. Meanwhile, Topic 5 (Atmosphere/Weather) and Topic 7 (Hydrosphere) had no significant effect on the number of up-votes obtained by an answer.
In the near future, public participation in environmental issues will take place primarily via the Internet, and social media sites—which provide opportunities for implementing the interactions between policy makers and common people or knowledge producers and knowledge receivers—will be the major platform for online public participation [47
]. With regard to climate change issues, a huge volume of public opinion data is posted on social media sites at present. These data have been widely used to describe the profile of online public opinions about climate change [1
]. However, more in-depth studies with predictive or prescriptive analysis are rare. This study responds to the lack of such empirical cases by highlighting the utility of combining the structured and unstructured data collected from Quora. The analytic framework in this study solves several conceptual and computational problems in leveraging the data: it uses the number of up-votes to measure public preferences on certain standpoints, applies topic modeling to transform unstructured text data into topical features that can be used in a regression model, and employs the Poisson regression model and the negative binomial regression model to fit the count data. The proposed framework is expected to be widely applicable in future social science studies that intend to leverage big data from social media sites.
In addition to the methodology’s significance, the results of topic modeling and regression modeling on the Quora data also have implications for better understanding the science communication and the public opinion on climate change:
The topic modeling results summarize the online public opinion on climate change in Quora, which is one of the most popular Q&A websites in the English-speaking world. The 10 induced topics are distributed quite evenly across the whole corpus, with the most prevalent topic being Topic 2 (Livelihood), accounting for 12.4% of the whole corpus, and the least prevalent topic being Topic 4 (Fuel/Carbon), accounting for 7.8%. Most of these topics also appear in previous studies based on open-ended surveys of citizens in the U.S. and the U.K. aiming to find effective images associated with global warming or climate change [49
], although in different proportions. For instance, natural phenomena related to climate change, such as ice melt, flooding, and abnormal weather, are prominent topics or effective images of citizens in the U.S. and the U.K. However, similar topics in Quora answers, including Topic 5 (Atmosphere/Weather) and Topic 7 (Hydrosphere), account for just 20.9% of the corpus. The proportion of topics focusing on energy and fuel and carbon emission issues (Topic 1 and Topic 4) is 18.8% in Quora answers, clearly larger than the proportions of the Greenhouse category in the U.S. and the U.K. (both less than 5%) [49]. In addition, human and societal topics, including Topic 2 (Livelihood), Topic 3 (Human/Biodiversity), Topic 8 (Future/Impact), and Topic 9 (Politics), account for about 40% of the corpus, against much smaller proportions of similar image categories in the U.S. and the U.K. The proportion of Science Communication (Topic 6) in Quora answers is 11.4%, much smaller than the U.S.’s naysayer category (23% in 2010) [50]. Meanwhile, scientific research on climate change, particularly with technical details of climate modeling, has a topic proportion of 8.0% in Quora answers, but seldom appears in citizens’ images related to climate change. To sum up, on a knowledge sharing and social networking platform such as Quora, users are more likely to talk about energy, human and societal issues, and scientific research than about natural phenomena related to climate change, compared with the citizens’ responses to open-ended surveys in previous research.
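The proportions quoted in this comparison can be cross-checked with simple arithmetic. The Python sketch below uses only figures stated in the text (group sums and the individual proportions of Topics 2, 6, and 10) to derive the implied proportions of Topics 3, 8, and 9 and the combined share of the human and societal topics. The derived values are implied by the reported sums rather than stated in the source, and rounding in the source could shift them by about 0.1 percentage points.

```python
# Proportions (percent of corpus) as reported in the text.
topics_1_4 = 18.8      # Topic 1 + Topic 4
topics_3_5_7 = 29.1    # Topic 3 + Topic 5 + Topic 7
topics_5_7 = 20.9      # Topic 5 + Topic 7
topics_6_9 = 21.2      # Topic 6 + Topic 9
topics_2_8 = 22.7      # Topic 2 + Topic 8
topic_2 = 12.4
topic_6 = 11.4
topic_10 = 8.0

# Implied individual proportions (derived here, not stated in the source).
topic_3 = round(topics_3_5_7 - topics_5_7, 1)
topic_9 = round(topics_6_9 - topic_6, 1)
topic_8 = round(topics_2_8 - topic_2, 1)

# Human and societal topics: Topics 2, 3, 8, and 9.
human_societal = round(topic_2 + topic_3 + topic_8 + topic_9, 1)

# All ten topics, assembled from the reported group sums.
total = round(topics_1_4 + topics_3_5_7 + topics_6_9 + topics_2_8 + topic_10, 1)

print(f"Topic 3: {topic_3}%, Topic 8: {topic_8}%, Topic 9: {topic_9}%")
print(f"Human and societal topics: {human_societal}%")
print(f"Sum over all topics: {total}%")
```

The combined share of the human and societal topics comes out close to the "about 40%" quoted above, and the overall sum falls slightly short of 100%, which is consistent with rounding of the individual proportions.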
The regression modeling results quantitatively reveal the effects of different features on the public preference for an answer. In terms of textual features, only four topics, including Topic 2 (Livelihood), Topic 5 (Atmosphere/Weather), Topic 7 (Hydrosphere), and Topic 8 (Future), had negative effects on the number of up-votes, and only the effects of Topic 2 and Topic 8 were significant. A possible explanation is that Topic 2 and Topic 8 do not focus on specific subjects relevant to climate change, which can be inferred from their high-frequency terms shown in Table 2. Answers with a high proportion of those everyday terms cannot provide substantial knowledge to the readers. Hence, these answers can hardly get many up-votes and may even bore the readers. With regard to Topic 5 and Topic 7, although these two topics describe specific subjects related to climate change, the changes in atmosphere, weather, and hydrosphere are, to some extent, popular knowledge about climate change [2], which cannot stimulate Quora users to vote for the corresponding answers. Nevertheless, the estimated coefficients of Topic 5 and Topic 7 were very small in absolute value and statistically insignificant, showing that the prevalence of these two topics does not significantly influence the voting behavior of Quora users.
Topics with significantly positive effects on the number of up-votes all discuss specific subjects related to climate change. The largest effect came from Topic 6 (Science Communication), with a β value of 0.451. It is not surprising that the topic of science communication attracts more support from Quora users, as Quora itself operates as a platform for online science communication. As reported by Alexa, Quora users are more educated than the general internet population and may hold stronger beliefs in the scientific consensus on climate change issues. Hence, the discussion of science communication, especially criticism of the deniers and skeptics of climate change, may substantially resonate with those Quora users [33] and attract more up-votes. The second largest effect was from Topic 9 (Politics), with a β value of 0.348. Climate change issues are a significant item on the political agenda at different levels [51]. From an international perspective, although agreement was reached on the Kyoto Protocol to the United Nations Framework Convention on Climate Change with over 183 countries’ commitment by 2009, these countries may be unwilling to act unilaterally, because “in doing so they would pay the full price of abatement but gain only a fraction of the benefit” [52]. From a domestic perspective, decisions on policies to mitigate climate change are closely tied to electoral interests, national discourses, and domestic political institutions [52]. The subtle linkage between climate change and politics may also be intriguing knowledge to Quora users. Other topics, including Topic 1 (Fuel/Carbon), Topic 3 (Human/Biodiversity), Topic 4 (Energy), and Topic 10 (Climate Modeling), are also specific subjects but not popular knowledge about climate change.
The effects of auxiliary features were all significantly positive. This is in line with our expectations indicated in Section 2.3.2. The most notable feature was Author followers, which had the largest effect on the number of up-votes, highlighting the importance of social capital for science communication on a social Q&A website such as Quora [26]. For a knowledge contributor (answer author) in Quora, the interaction between their social capital (represented by the number of followers) and their peer recognition (represented by the total number of up-votes they received) is complex. Based on the attention economy theory proposed by Simon [54], users’ attention is a scarce resource in a social network. In order to get widespread attention from readers, knowledge contributors need both more followers and more up-votes, which are mutually reinforcing. In fact, as demonstrated in previous studies, contributors’ expectation of getting more attention, including followers and positive feedback (up-votes), motivates the development of knowledge or information sharing websites such as YouTube [55] and Twitter [56]. Hence, in order to promote science communication in social media, an in-depth understanding of this complex interaction is necessary and needs further research.
People will selectively read and understand information in ways that reinforce their already-constructed beliefs [3
]. Previous studies with data collected from Twitter and Facebook show that the echo chamber effect is prominent in social media discussions, especially topics related to climate change [57
]. Facebook and Twitter can be regarded as pure social media sites and were originally designed for social purposes. Although there are a large number of posts about climate change on Facebook and Twitter, these posts are short, scattered, and full of personal emotions, and the echo chamber effect is significant in these posts [59
]. However, Quora has unique features, including a topic–question–answer structure, a real-name environment, and social status stimulation (a good answer will attract more readers to follow the author; thus, the author will have a higher discourse power in the community). These features make Quora a more suitable platform for rational discussion of climate change issues than for emotional expression of personal attitudes. Thus, Quora has the ability to disrupt the echo chambers in the online environment.
This study demonstrated the utility of the data collected from the online social Q&A community Quora for the investigation of science communication and public opinion, specifically on the knowledge of climate change. By integrating web crawling, topic modeling, and count data regression, a novel analytic framework was proposed to leverage the semi-structured dataset collected from Quora. The topic modeling result indicates that Quora users are more likely to talk about energy, human and societal issues, and scientific research than about natural phenomena of climate change, compared with the previous open-ended surveys of citizens in English-speaking countries (the U.S. and the U.K.) [49
]. The regression modeling results revealed that: (i) answers with more emphasis on specific subjects, but not popular knowledge, about climate change can get significantly more up-votes; (ii) answers with more terms of daily dialogue will get significantly fewer up-votes; and (iii) answers written by an author with more followers, with a longer text, with more images, or belonging to a question with more followers, can get significantly more up-votes. These results are useful in promoting the science communication of climate change in online social Q&A communities, which implement a decentralized knowledge production mode and will be the major platform for the public discussion of controversial environmental issues in the future.
As a novel investigation with a new dataset and new methodology, this study has some limitations. First, the lack of detailed demographic information on Quora users obscures the representativeness of the sample. We acknowledge that the sample of this study is biased. Even among Quora users, those following the questions and voting on the answers about climate change might only be the ones who are seriously concerned with the issues. Thus, the result of this study reflects only a fraction of public opinion. However, since Quora has been gaining more and more users, the full view of the Climate Change topic in Quora does have significance in the research field of public opinion and science communication of climate change. Second, the question information is almost absent (only reflected by Question followers) in the regression models. This may lead to a potential loss of important information. Third, some subjectivity exists in the processes of determining the topic number in topic modeling and determining the threshold of the transformation of the textual features. Hence, further research will focus on the corresponding aspects, as follows. Demographic information, including gender and age, can be completed by image recognition of user icons [60]. Question information can be introduced through a hierarchical regression model [61], which needs further classification of the questions. The subjectivity can be reduced by using more automated topic modeling approaches, such as the hierarchical Dirichlet process [62]. We expect that the proposed methodology, including the valuable Q&A data and the quantitative analytic process, will be widely used in future research on science communication and public opinion about climate change, as well as on more general social issues.