Analysis of Popular Social Media Topics Regarding Plastic Pollution

Abstract: Plastic pollution is one of the most significant environmental issues in the world. The rapid increase of the cumulative amount of plastic waste has caused alarm, and the public have called for actions to mitigate its impacts on the environment. Numerous governments and social activists from various non-profit organisations have set up policies, actively promoted awareness, and engaged the public in discussions on this issue. Nevertheless, social responsibility is the key to a sustainable environment, and individuals are accountable for performing their civic duty and committing to behavioural changes that can reduce the use of plastics. This paper explores a set of topic modelling techniques to assist policymakers and environment communities in understanding public opinions about the issues related to plastic pollution by analysing social media data. We report on an experiment in which a total of 274,404 tweets related to plastic pollution were collected from Twitter, and five topic modelling techniques, including (a) Latent Dirichlet Allocation (LDA), (b) Hierarchical Dirichlet Process (HDP), (c) Latent Semantic Indexing (LSI), (d) Non-Negative Matrix Factorisation (NMF), and (e) the Structural Topic Model (STM), an extension of LDA, were applied to the data to identify popular topics of online conversations, considering topic coherence, topic prevalence, and topic correlation. Our experimental results show that some of these topic modelling techniques are effective in detecting and identifying important topics surrounding plastic pollution, and potentially different techniques can be combined to develop an efficient system for mining important environment-related topics from social media data on a large scale.


Introduction
The global production of plastic has increased significantly to more than 380 million tonnes annually (See report at https://ourworldindata.org/where-does-plastic-accumulate, accessed on 30 November 2021), which leads to the massive accumulation of plastic waste, like plastic bags, containers, toys, foams, pre-production gears, plastic fragments, and microplastics [1,2]. By 2050, plastic waste is expected to increase to 12 billion tons if countermeasures are not taken [3,4]. Due to littering, the illegal dumping of rubbish, and poorly managed waste dumps, plastic waste could be everywhere in our natural environment and oceans. In addition, plastic waste is often washed away by rain or blown away by the wind, causing widespread environmental pollution [5]. Plastic pollution undoubtedly has detrimental effects, including global warming, water and soil contamination by chemicals released by plastic, affecting human health, killing animals and plants, and threatening all species' survival. Therefore, all stakeholders ought to make concerted efforts and share responsibility in solving the plastic pollution crisis, including the government, industries, civil society, and consumers. Nevertheless, public behaviour does not demonstrate major changes in the reduction of plastic usage. For instance, in supermarkets, consumers still use plastic bags to pack their vegetables, take out food or drinks, and carry them in plastic containers and utensils. This global problem can only be solved effectively if the public is happy to commit to alternatives. For some members of the public, this may lead to inconvenience, such as travelling with self-bottled shampoo rather than retail shampoo packs. To date, the vast majority of people are not fully aware of the need for civic engagement. Detecting popular trending topics and public sentiments concerning particular issues can reveal the level of civic engagement and reflect the public acceptance of policy changes. 
However, traditional research methods, such as surveys and interviews, are time-consuming and costly. In recent years, the growing volume of social media content has become an essential source of value creation [6]. Trending topic and sentiment analysis that leverages social media has attracted considerable research attention in many communities. Such studies include trending topic analysis with topic detection [7], examination of public opinion by observing the voluntary compliance of energy consumption [8], exploitation for sustainability tourism [9], predicting elections [10], and traffic management [11].
As a major social media platform, Twitter allows the public to freely share their contents, and so it is a handy source for analysing public opinion. In this study, we aim to understand public sentiments or opinions towards plastic pollution challenges, and thereby better inform future solutions and policy decision-making. Two research questions were formed:
1. What are popular topics related to plastic pollution discussed in the Twitter-sphere?
2. How are those topics related?
This paper is organised as follows: In Section 2, a brief overview on related work is presented. In Section 3, a discussion on related methodology is presented. In Section 4, data materials and techniques used for text analysis and the topic modelling results extracted from our data and analysis of the results are presented. In Section 5, the implication and application of our work are discussed. Finally, the conclusion is presented in Section 6.

Literature Review
Numerous studies have used them as a reference point for sustainable development since their formulation [12][13][14][15]. This is because governments want quantitative measurements to analyse their performance in relation to a variety of difficulties, ranging from natural resource management to pollution control.
When researchers seek evidence to aid in government decision-making on a particular subject, one method that researchers use is to examine the opinion of individuals or users on social networks [16][17][18]. Even though social media platforms such as Twitter and Facebook are growing in popularity and increasing the number of individuals who have a voice on issues such as the environment, these platforms have demonstrated their limitations as a single means of policymaking, because not all individuals are represented by social media [19][20][21]. Numerous studies, however, have used Twitter and hashtags to ascertain the population's topical feelings about a particular issue associated with environmental or global health [22]; other studies, for example, those examining public perception in social media following a natural disaster, analyzed the key terms retrieved from the remarks [23].
Amongst many other pertinent studies, authors in [23] evaluated how Twitter members interacted after an earthquake or a flood when they were unable to make phone calls. Additionally, in [24], the authors studied the hashtag #characterbuildings, which was used to communicate Twitter users' perspectives on New Zealand's sustainable housing. Similarly, in [25], the authors utilized publicly available tweets to explore political, social, physical, and historical elements that enable hotel facilities to enhance their sustainability decision-making and management.
In [24], the authors established a model of environmental social awareness based on social media with the goal of equipping individuals with the required information, abilities, and attitudes to mitigate unfavorable environmental impacts. Additionally, in [26], the authors utilized the social platform to assess public attitudes toward environmental health, gather information about community concerns, and interact with residents about environmental dangers.

Sentiment Analysis towards Social Network
Sentiment analysis is a popular technique for determining the emotional impact of messages released on a particular subject [26]. It is derived from the utilisation of Big Data, and is used to perform research on the impact of events on social media, to evaluate consumer sentiment towards services and products, and to comprehend online communication [27]. Sentiment analysis is essentially the process of acquiring topic-based Twitter remarks.
Sentiment analysis can be used in conjunction with other technologies to elicit the most critical aspects under investigation. For example, Pak and Paroubek [28] created a variety of approaches for Twitter analysis. By and large, the study considers the connotations of commonly occurring words, which convey a range of emotions, and can be classed as good, bad, or neutral. For instance, Palomino et al. [22] employed sentiment analysis to ascertain public opinion and analyse associated speech in order to gain a better understanding of the natural environment's effect on people's mental health and well-being. On the other hand, Ekenga et al. [26] conducted a topic-based sentiment analysis of Twitter remarks, and illustrated that this social networking site is useful for gauging public sentiment with respect to environmental health, obtaining information about the community's most pressing issues, and communicating about environmental risks. In general, past research has revealed that social networking sites are an excellent venue for doing this type of study. Although some of the difficulties associated with sentiment analysis are due to the context in which the content is examined, sarcasm, or irony, this technology may be trained to mitigate these impacts and shortcomings. To be precise, irony and sarcasm can be predicted by utilising machine learning and data mining algorithms. Table 1 shows a summary of important research with respect to social networks.

Textual Analysis
Textual analysis is a sort of qualitative approach in which a text is analysed in order to arrange the concepts under consideration into "nodes". This method can be used to analyse an event, a business, or any other object of study. NVivo (QSR International, Melbourne, Australia) is the most often used programme for this analysis [25]. The strategy results in the grouping of concepts into nodes, which is quite beneficial for performing exploratory analysis. This strategy is more descriptive than if no software was utilised.
The nodes are organised hierarchically. At the most fundamental level, there are notions that are conceptually distinct from one another. At the second level, there are branches that originate from each of the nodes and are organised hierarchically. Finally, there are indicators that can be derived from research findings and are relevant to the study's objective. Saura et al. [25] conducted one of the important research studies, in which textual analysis was utilised to discover environmental influences. The authors identify three nodes that correlate to negative, positive, and neutral influences.

How Does Society Respond to Plastic Pollution?
The more than 7.8 billion people in the world [32] generate a large amount of plastic waste in a single month, equivalent to three and a half times the weight of Malaysia's Petronas Twin Towers [33]. As part of the effort towards sustainability, many initiatives, such as bring your own bottle (BYOB), reusable utensils, and metal and eco-friendly straws, are being introduced to reduce the amount of plastic use. Environmental agencies and organisations such as Greenpeace, Heal the Bay, and Friends of the Earth have played a crucial role in reducing the plastic pollution crisis. They take the initiative in leading anti-plastic campaigns, conduct beach clean-up activities in the coastal areas, raise awareness on plastic pollution, and help in efforts to reduce plastic pollution. Ocean Conservancy has partnered with governments and corporations, and participated in a programme known as the International Coastal Cleanup to remove plastic debris on beaches [34]. These activities have greatly attracted school students' interest, and can be used as a vehicle for environmental education. Exploring this topic online can assist the study of how well society is educated about and reacts to the issues, and of the current state of the community's commitment toward this issue.
Meanwhile, governments have set up laws to fine manufacturers and individuals who do not follow proper methods of plastic waste disposal. For example, the UK government discourages using plastic bags, and shops and supermarkets in the UK charge customers for new plastic bags. The Zimbabwe government has announced a ban on polystyrene usage, as it takes approximately a million years to decompose. Those who are found guilty of breaching the law will be fined 30 dollars to 500 dollars [35]. In Malaysia, the Selangor government issued a ban on the use of plastic bags, and people have to pay twenty cents for each plastic bag that is used in shops [36], while in the Penang state, a new plastic bag is charged one ringgit from 2021 [37]. Moreover, Malaysia's government introduced the Malaysia Roadmap Towards Zero Single-Use Plastic law in 2019 and now strictly restricts the dumping of plastic bags.
All the strategies mentioned above aim to urge society to look into the serious issue of plastic pollution. Every individual has a moral responsibility to recycle used plastics. American and European residents have played an active role in initiating recycling activities, and the national recycling rate of polyethylene film has been increasing.

Twitter
As one of the most popular social media sites, Twitter serves as a social connection among people, particularly individuals within their in-group that share the same perspective [38]. Social interaction fulfils emotional needs that people may not otherwise find met elsewhere. If an emotional need is met, the viewer engages in similar behaviour more intensively [38]. In other words, Twitter is potentially helpful in driving social behaviour change. Hence, understanding and exploring topic prevalence and correlation could drive solidarity in society.
Why tweets? Twitter is connectable to Facebook and Instagram. Instagrammers are the users of the photo-sharing and social networking site Instagram. Instagram has been used as a marketing tool to discuss a wide range of well-being issues, such as food consumption, obesity, eating disorders and dieting [39]. Mainly to gather popularity, many Instagram accounts are dedicated to documenting their lives concerning these issues. In other words, compared to the role of Instagram and Facebook, Twitter has an even larger community of users. It is also larger than both of these competitors on their own. Hence, our data exploration starts with Twitter.

Topic Modelling
In this age of information, it is challenging for individuals to process and understand a vast amount of unstructured data; hence, topic modelling is commonly applied to declare, determine and define an unknown semantic framework in a text collection [40]. A variety of techniques have been proposed and implemented for this purpose.
Generally speaking, a topic model is a type of mathematical model which discovers and analyses abstract topics from a group of documents. It is also defined as a probabilistic semantic model, because mathematical algorithms are implemented to reveal the underlying semantic framework in a text collection [41]. Over the past years, topic modelling has been used across different fields, concentrating on text mining and retrieving information [42]. The model has also obtained much attraction among researchers from other areas of study. It has been successfully applied in the fields of brand management [43], personal branding [44], and social networks [45].
There are a variety of topic modelling techniques. In this paper, we explore five methods: LDA, HDP, LSI, NMF, and STM. LDA is an unsupervised method for identifying key topics within a set of documents [46]. LDA is a generative statistical model which assumes that the observed documents are produced from a mix of latent topics. The unobserved latent topics, whose number is determined by the researcher beforehand, then generate words that associate with each other based on probability distributions. Given a number of topics and a set of documents, the LDA algorithm identifies which combination of latent topics could have generated those documents [47].
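The generative assumption behind LDA can be illustrated with a minimal Python sketch. It only simulates how a document could be produced from a mix of latent topics; it does not perform inference, and the toy topics, the concentration values, and the function names are illustrative assumptions rather than anything fitted to the tweet data.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalising independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(topics, alpha, length, rng):
    """Generate one document following the LDA generative story.

    topics: list of dicts mapping word -> probability (one dict per topic)
    alpha:  Dirichlet concentration parameters, one per topic
    """
    theta = sample_dirichlet(alpha, rng)  # document-topic mixture
    words = []
    for _ in range(length):
        # Choose a latent topic for this word position.
        k = rng.choices(range(len(topics)), weights=theta)[0]
        # Choose a word from that topic's word distribution.
        vocab, probs = zip(*topics[k].items())
        words.append(rng.choices(vocab, weights=probs)[0])
    return theta, words


# Hand-written toy topics (hypothetical, for illustration only).
TOY_TOPICS = [
    {"plastic": 0.5, "waste": 0.3, "ocean": 0.2},
    {"climate": 0.6, "energy": 0.4},
]

rng = random.Random(0)
theta, doc = generate_document(TOY_TOPICS, alpha=[0.5, 0.5], length=8, rng=rng)
```

Fitting LDA reverses this process: given only the observed words, it estimates the topic-word distributions and the per-document mixtures.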
Another topic modelling approach extended from LDA is the Hierarchical Dirichlet Process (HDP). It is a Bayesian nonparametric model used to model groups of data with a potentially infinite number of components [48]. This approach assumes that the components are word distributions representing recurrent topics in the collection. The number of topics and the characteristics of the distribution are defined by posterior inference automatically. Latent Semantic Indexing (LSI) is a statistical model used to enhance information indexing and retrieval. It uses the singular value decomposition (SVD) technique to find the latent relationship between words in given documents and classify them into topics.
Non-negative matrix factorisation (NMF) is a dimensionality reduction technique for multivariate data. It was introduced by Paatero and Tapper in 1994 under the name positive matrix factorization [49]. It is easy to implement and became known as non-negative matrix factorisation after Lee and Seung (2001) published algorithms of two types of factorisation [50]. The basic idea of NMF is that the negative numbers are meaningless in the data process. Hence, it overcomes the issue of negativity by placing non-negativity constraints on the data model. Many researchers have used it for topic modelling, such as [51].
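As a rough illustration of the non-negativity constraint, the sketch below factorises a small document-term matrix V into non-negative factors W and H using multiplicative updates for the squared-error objective, in the spirit of Lee and Seung. It is a pure-Python toy under stated assumptions (the matrix, the rank k = 2, the small eps added for numerical safety, and all function names are illustrative), not the implementation used in the experiment.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def squared_error(v, w, h):
    """Squared Frobenius distance between V and the product W H."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, k, iterations=200, seed=0, eps=1e-9):
    """Factorise V (m x n) into W (m x k) and H (k x n), all non-negative."""
    rng = random.Random(seed)
    m, n = len(v), len(v[0])
    # Strictly positive random start, so every update stays non-negative.
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(m)]
    h = [[rng.random() + 0.1 for _ in range(n)] for _ in range(k)]
    for _ in range(iterations):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(n)]
             for a in range(k)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)]
             for i in range(m)]
    return w, h


# Toy document-term counts: rows are documents, columns are terms.
V = [
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [0, 0, 4, 1],
    [0, 0, 1, 4],
]
W, H = nmf(V, k=2)
```

In a topic modelling reading, each row of H can be interpreted as a topic (weights over terms) and each row of W as a document's weights over topics.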
The Structural Topic Model (STM) is an extension of LDA, which provides many valuable features, including rich ways to explore topics, estimate uncertainty, and visualise quantities of interest [52]. It has been shown to be useful in analysing trends and understanding topic prevalence in tweets [53], in studying topic prevalence and how topics evolve across international news in social studies [54], and in lobbying reports [55]. Compared to standard topic models, STM adds two extra functionalities that allow researchers to determine the topic prevalence and topic contents [47]. It facilitates the study of topic prevalence, topic evolution, and topic correlation. Several studies have adopted the STM approach to analyse social media [56], newspapers [57], and research papers [58].
The STM has been growing in popularity, as it is fully implemented as a publicly available R package. The package offers multiple features such as topic discovery, text preprocessing, model validation and model search, and topic visualisation [59]. The STM is a word count model that uses document-level covariate information for topic modelling. When covariates influence topical prevalence, topical content, or both, qualitative and assumption interpretability can be enhanced [54]. A data generating process is defined for every document, and the data are then used to search for the most likely values of the parameters within the model [52]. The model starts with both topic-word and document-topic distributions, which generate documents that have metadata associated with them. With STM, researchers can conveniently determine the relationship between topics and documents via metadata, and hypotheses about these relationships can be tested [52].

Data and Methodology
Text analytics are applied to extract meaningful and valuable information from unstructured texts [60]. Text analytics usually involve data collection, data preprocessing, categorisation, and analysis [61]. Meaningful information is extracted from the corpus, a collection of textual contents associated with the topic of interest, where data processing and area of expertise are in charge of organising the genuineness of content [62]. Figure 1 shows the overall four phases involved in the text analytics process.

Data Extraction
The number of tweets provided for behavioural research studies commonly ranges from 50,000 to 200,000 tweets [63][64][65]. In this study, 274,404 tweets were crawled from Twitter using keywords between 16 April and 4 June 2020. These tweets were extracted based on predefined keywords of interest, and are related to plastic pollution, such as "plastic-free", "sustainable", "sustainability", "waste", "plastic", and "pollution". The corpus comprises 21 variables, including full name, tweet text, tweet ID, followers, location, bio, and many more. It was further pre-processed using R programming, and a clean data-set was stored in an Excel file.

Methods and Tools for Topic Analysis
In this paper, we explore a number of mainstream methods and tools for assisting the extraction and analysis of public opinions on hot topics related to plastic pollution from social media on a large scale. For this purpose, we selected a set of standard topic modelling models and tools to process our data. Our aim was not to compare the accuracy of these different methods; rather, we wanted to explore the feasibility of combining the different methods to achieve a wider coverage of topics contained in the data.
As shown in our literature review, there exist numerous algorithms, models and tools for topic detection. For our study, we selected four topic modelling methods, including LDA, LSI, HDP and NMF, as they are among the most commonly used models. In addition, we also used the LDA-based STM (Structural Topic Model) package, as it facilitates various analysis and visualisation of analysis results, in addition to the core topic modelling functionality.
In addition to the detection of topics themselves, we also analysed the key words under the topics, in order to examine the semantic affinity and relations of the key words that collectively represent individual topics. Such analysis reveals how reliably these topic modelling methods can identify main topics from textual data, and how the topics can be linked via shared keywords. Moreover, the STM package was used to analyse the relevance and correlation of the extracted topics illustrated using its visualising functionality.

Data Pre-Processing
Data pre-processing consists of stemming, tokenisation, stop word removal, and normalisation to prepare the corpus and improve the accuracy of further analysis. In this study, text from tweets is first loaded as a corpus. Data cleaning of unnecessary information is required, as the text in tweets is generally noisy. Extra white space, numbers, and punctuation are removed from the text. In addition, arbitrary characters are replaced with spaces, and words are converted to lowercase. Text normalisation is also performed by applying regular expressions. Normalisation is a must in any natural language processing task, as text is usually written in an informal and ambiguous manner. Text normalisation involves stemming, which prepares the text for further processing. Words usually appear in different forms for grammatical reasons; for example, the verb "to be" can appear as "is", "are", "were", and many more. Stemming therefore reduces words to their basic "stem" forms. Then, stop words, which are common but unimportant words such as "for", "to", "the", "an", "and", "a", "but", and "or", are removed. Next, the list of top words related to each topic is identified for better topic exploration. By examining the top words from each topic, labelling can be carried out more effectively and efficiently. After performing data pre-processing, the final dataset consisted of 221,409 tweets, 86,257 terms, and 3,469,025 tokens.
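The cleaning steps described above can be sketched in a few lines of Python. This is a simplified illustration, not the R pipeline used in the study: the stop word list is only the handful of examples given in the text, and naive_stem is a hypothetical, deliberately crude suffix stripper standing in for a real stemmer.

```python
import re

# Only the example stop words mentioned in the text; a real list is much longer.
STOP_WORDS = {"for", "to", "the", "an", "and", "a", "but", "or"}


def naive_stem(word):
    """Strip a few common English suffixes (a crude stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Lowercase, strip numbers and punctuation, tokenise, stem, drop stop words."""
    text = text.lower()
    # Replace anything that is not a letter or whitespace with a space.
    text = re.sub(r"[^a-z\s]", " ", text)
    # split() also collapses the extra white space left by the substitution.
    tokens = [naive_stem(t) for t in text.split()]
    return [t for t in tokens if t not in STOP_WORDS]


tokens = preprocess("Recycling 3 plastic bottles reduces waste, and the oceans stay cleaner!!")
# tokens == ['recycl', 'plastic', 'bottl', 'reduc', 'waste', 'ocean', 'stay', 'cleaner']
```

The stems are not always dictionary words ("recycl", "bottl"), which is typical of stemming and one reason top words sometimes need interpretation before labelling.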

Topic Labelling
This study applied search intent to create a suitable and relevant label for automatically generated topics. Search intent illustrates the users' query on search engines, which represents a goal that users are attempting to achieve. By performing keyword searches through Google, results shown based on the keywords searched could provide a clearer idea of creating relevant labels. For instance, the top results shown by searching for keywords such as "zero", "waste", "time", "just", and "people" in Google are mostly about zero waste. Hence, the topic with these keywords is labelled as "zero waste".
Google is one of the most used web search engines that could guess users' search intent, and provides the most related results for their search. To be more capable of determining users' search intent, Google has started to improve its algorithm over the years, which focused on searching, ranking, and returning the most related results for every search term. According to the latest Google Bidirectional Encoder Representations from Transformers (BERT) in 2019 [66], it provides a machine learning algorithm where the framework and connections of all the search terms are considered, instead of just identifying the terms individually based on their order. Therefore, this method is implemented through Google, as it provides the most highly recommended search outcomes, which assist in better labelling for further text analysis.

Topic Extraction with LDA, LSI, HDP, NMF
One of the two main parts of our experiment is the topic extraction using four popular topic modelling algorithms, namely LDA, LSI, HDP and NMF. The tweet data were passed to each of the tools, and the top twenty topics were extracted from each. Using the topic labelling methods explained earlier, we tagged the extracted topics to find meaningful labels. Table 2 shows the labelled topics selected from the results.
As shown in Table 2, altogether, we could manually label forty of the total eighty topics, many of which are related to environmental issues (see highlighted topics), including the plastic pollution issue. In the "Topic types" column, the three numbers denote the number of environment-related topics, the number of manually labelled topics, and the total number of extracted topics. For example, "4/14/20" for the NMF model indicates that this model extracted twenty topics, of which fourteen could be manually labelled, of which four are relevant to the environment. Some of the topics are semantically close to each other and fall under the same labels. For example, for the HDP model, three topics are about plastic waste. Hence, all of them are labelled as "Plastic Waste", to which numbers are attached solely to distinguish different topic items.
Because the Twitter data were collected using keywords related to plastic pollution, we expected to extract many environment-related topics. In this regard, the four algorithms performed differently. HDP and LSI produced more such topics (nine each) than the other models, which suggests that they are more effective in picking up the general topics of the data. However, as reflected by their lower coherence scores, the quality of their topics (in terms of semantic similarity of words) is poorer than that of the topics produced by NMF and LDA. Tables 3 and 4 show two sets of words of corresponding topics (Climate Change and Plastic Pollution) for comparison. Table 3 lists ten words under each of the two topics extracted by NMF with a coherence score of 0.4964, which is the highest among the four models. On the other hand, Table 4 lists ten words under the corresponding topics extracted by LSI and HDP, with coherence scores of 0.3503 and 0.3440, respectively. When we closely examine the words, we can observe that, in Table 3, the sets of ten words are semantically closely related, whereas, in Table 4, most of the words have only a loose semantic connection. Such differences in the performance of topic modelling algorithms have implications for their practical applications. In this case, NMF and LDA are more effective in clustering related text (tweets in this case) under highly representative topics. On the other hand, LSI and HDP are more helpful in identifying the general topic of the entire data (environment issue in this case). Our primary purpose in this paper is not to compare the performance of individual models. Instead, we focus on analysing the contents of the tweet data by combining the topics extracted by these four algorithms.
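To make the notion of a coherence score concrete, the sketch below computes a UMass-style co-occurrence coherence, averaged over word pairs, for a topic's top words. The text does not state which coherence measure produced the reported scores, so this choice, the toy documents, and the function name are assumptions made purely for illustration.

```python
import math
from itertools import combinations


def umass_coherence(topic_words, documents):
    """Average log co-occurrence ratio over ordered pairs of a topic's top words.

    topic_words: top words of one topic, most important first
    documents:   list of tokenised documents (lists of words)
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents that contain all of the given words.
        return sum(all(w in d for w in words) for d in doc_sets)

    scores = []
    for w_j, w_i in combinations(topic_words, 2):  # w_j is ranked before w_i
        d_j = doc_freq(w_j)
        if d_j == 0:
            continue  # skip words that never occur in the corpus
        scores.append(math.log((doc_freq(w_i, w_j) + 1) / d_j))
    return sum(scores) / len(scores) if scores else float("nan")


# Toy corpus (hypothetical), already tokenised.
DOCS = [
    ["plastic", "waste", "ocean"],
    ["plastic", "waste", "recycle"],
    ["climate", "change", "energy"],
    ["climate", "energy", "policy"],
]
tight = umass_coherence(["plastic", "waste", "ocean"], DOCS)
loose = umass_coherence(["plastic", "climate", "policy"], DOCS)
```

Here the first word set, whose words tend to appear in the same documents, receives a higher score than the second, mirroring the contrast drawn between Tables 3 and 4.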
As we mentioned earlier, the primary focus of this paper is not the comparison of different topic modelling models. Instead, we explore the feasibility of identifying social media hot topics by combining a set of models, namely LDA, LSI, HDP, and NMF, in this particular case. When we combined all the topics extracted using the four models, we found that the most frequent topics of the data include "Plastic Pollution" and "Sustainable/Sustainability", which accurately reflect our data's main contents. Table 5 shows the frequency of the environment-related topics. As shown in the table, in the social media conversation, the public are highly aware of environmental issues, including issues related to plastic pollution, reflected by Plastic Pollution, Plastic Waste, Recycling, and Waste Management, and their effect on the environment, reflected by Sustainable/Sustainability, Environment, 3R Concept, Climate Change, and Renewable Energy.
All the topics above reflect a widespread concern in society about plastic pollution and related environmental issues. Such issues have become popular topics of chatting and discussions on social media. Although a further investigation is needed, our experimental results demonstrate that an appropriate combination of a set of topic modelling algorithms can potentially provide an effective means of tracking and analysing public issues and opinions regarding plastic pollution and more general environmental issues. Furthermore, we examined the semantic network of the topics by observing the collection of the top twenty words extracted by the topic modelling tools for each topic. Figure 2 shows a word cloud generated using all the words under the eighty topics extracted in the first step of our experiment. In this word cloud, the frequency of each word determines its font size: the more frequent, the bigger. As shown in Figure 2, the most frequent words include (f = frequency): "Waste" (f = 22), "Plastic" (f = 20), "Sustainability" (f = 12), "environment" (f = 10), "pollution" (f = 9), "climate" (f = 9), "nature" (f = 9), "time" (f = 9). Except for the word "time", which does not have explicit relevance to the environment issue, these words collectively reflect the core theme of the data, and connect some topics. The word cloud also reveals some interesting sub-topics related to the main theme, such as "books", "games", "romance", "music", etc. Such sub-topics may potentially show connections between some media or activities and environmental issues.
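The word frequencies that drive such a word cloud can be obtained by pooling the top words of all topics and counting them. The following Python sketch assumes that f counts the number of topics in which a word appears (the text does not specify whether repeats within a topic are counted), and the toy topics are invented for illustration.

```python
from collections import Counter


def word_frequencies(topics):
    """Count, for each word, the number of topics whose top words include it."""
    counts = Counter()
    for words in topics:
        # Lowercase and de-duplicate so each topic contributes at most once per word.
        counts.update({w.lower() for w in words})
    return counts


# Hypothetical top words pooled from several models.
TOY_TOPIC_WORDS = [
    ["plastic", "waste", "ocean"],
    ["Waste", "recycling", "time"],
    ["climate", "waste", "plastic"],
]
freq = word_frequencies(TOY_TOPIC_WORDS)
top_two = freq.most_common(2)  # [('waste', 3), ('plastic', 2)]
```

Words shared by many topics, such as "waste" in this toy case, are the ones that link topics together and are drawn largest in the cloud.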
We also examined the word cloud of each of the four topic models, as shown in Figures 3-6. Each word cloud is generated using the words from the twenty topics extracted by each of the topic models. As shown in Figure 3, HDP extracted quite a few environment-related words for the topics, including "waste" (f = 14), "plastic" (f = 10), "nature" (f = 8), "environment" (f = 7). The same holds for LSI, whose topics contain frequent words, including "sustainability" (f = 11), "plastic" (f = 7), "climate" (f = 7), "recycle" (f = 6), "pollution" (f = 6), etc., which are closely related to the environment issue. On the other hand, NMF and LDA extracted a wider range of different words, which form more diverse topics with a good coherence level, although many are not related to environment issues. These two models should be more useful for identifying a wider range of diverse topics that are potentially related to environmental topics in indirect ways. Note that all of the results are extracted from the same data set.
Given that individual topic models have different features and performances for specific requirements, a practical approach can combine them to produce a more complete result. Our experiment demonstrates that this is feasible, as illustrated by Figure 2, with which we can identify deeper and wider thematic and semantic structures and networks within social media data.

Topic Exploration with STM Package
Another major experiment is topic extraction using STM. The STM package, a high-quality R package that allows data analysis to be carried out effectively and efficiently, was used to extract topics, topic prevalence, and topic correlation. This package also provides various functions for data pre-processing [59].

Topic Extraction
Two methods implemented in the STM package are employed in this study to carry out topic exploration. The first identifies the list of words related to each topic, while the second determines the actual documents (tweets) that are highly related to each topic. The corresponding functions are labelTopics and findThoughts: labelTopics is applied to explore each topic's top words, and findThoughts is then used to search for tweets related to a specific topic. The latter is beneficial because it gives a better understanding of each topic's contents and illustrates its meaning. Table 2 shows the top words for each topic and the original tweets associated with each of the topics. By applying the plot.STM function with type = "perspectives", the difference in words between two topics or covariates can also be plotted for easier comparison. The MonkeyLearn tool was used (in April 2021) to generate the word clouds: https://monkeylearn.com/word-cloud, accessed on 1 April 2021.
For topic prevalence, the plot.STM function with type = "summary" is used to generate the expected topic proportions. The prevalence of each topic within the whole corpus and the top three words related to the topic are identified and displayed.
For topic correlation, the STM package is used to identify and visualise correlations between the topics discussed by Twitter users. A positive correlation between two topics indicates that they tend to be discussed within the same document. A correlation threshold is defined, and two topics are considered connected if their correlation exceeds this threshold. The plot.topicCorr() function is then used, with the help of a force-directed layout algorithm, to draw the network of topic correlations displayed in Figure 11. The relationships between topics can then be further explored and discussed.
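The thresholding idea can be sketched in a few lines. This is only an illustration under stated assumptions, not the STM package's implementation: it assumes the per-document topic proportions are available as a list of rows (named theta here), uses plain Pearson correlation, and the function names and default threshold are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no correlation information
    return cov / sqrt(vx * vy)


def correlation_edges(theta, threshold=0.01):
    """Return topic pairs (i, j) whose correlation exceeds the threshold.

    theta: one row per document, one column per topic (assumed layout).
    """
    k = len(theta[0])
    columns = [[row[t] for row in theta] for t in range(k)]
    edges = []
    for i in range(k):
        for j in range(i + 1, k):
            if pearson(columns[i], columns[j]) > threshold:
                edges.append((i, j))
    return edges


# Toy data: topics 0 and 1 co-occur, topic 2 dominates other documents.
toy_theta = [
    [0.50, 0.40, 0.10],
    [0.40, 0.50, 0.10],
    [0.10, 0.10, 0.80],
    [0.05, 0.15, 0.80],
]
toy_edges = correlation_edges(toy_theta)
```

The resulting edge list is what a force-directed layout would then position as a network of nodes and links.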
For further topic exploration, two outputs of STM topic modelling are used: (1) the per-topic word distribution and (2) the per-document topic distribution. The per-topic word distribution provides the top words for each topic, so it is the primary source for developing a suitable label for each topic. The per-document topic distribution, on the other hand, is used to find the tweets most relevant to each topic, and this information serves as a complement when labelling topics.
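How the two distributions support labelling can be illustrated with a small sketch. It assumes the per-topic word distribution is a matrix with one row per topic (called beta here) and the per-document topic distribution a matrix with one row per tweet (called theta); these names, the helper functions, and the toy values are assumptions for illustration only.

```python
def top_words(beta, vocab, n=5):
    """Return the n highest-probability words for each topic.

    beta: one row per topic, one probability per vocabulary word.
    """
    result = []
    for row in beta:
        order = sorted(range(len(vocab)), key=lambda i: row[i], reverse=True)
        result.append([vocab[i] for i in order[:n]])
    return result


def most_relevant_docs(theta, docs, topic, n=2):
    """Return the n documents with the highest proportion of the given topic.

    theta: one row per document, one proportion per topic.
    """
    order = sorted(range(len(docs)), key=lambda d: theta[d][topic], reverse=True)
    return [docs[d] for d in order[:n]]


# Toy example with two topics and four vocabulary words.
toy_vocab = ["plastic", "waste", "book", "read"]
toy_beta = [
    [0.50, 0.30, 0.15, 0.05],
    [0.05, 0.10, 0.45, 0.40],
]
toy_docs = ["tweet a", "tweet b", "tweet c"]
toy_theta = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]

labels_source = top_words(toy_beta, toy_vocab, n=2)
examples = most_relevant_docs(toy_theta, toy_docs, topic=0, n=2)
```

The top words suggest a candidate label, and the retrieved tweets are then read to confirm or refine it.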
After reviewing the per-topic word distribution (the list of top words per topic), a label is assigned to each topic. For example, Topic 5 is labelled "sustainable energy", as its top words include sustain, energy, future, develops, and industries. Top words for Topic 6 are plastic, environment, climate change, world, and pollution; thus, it is labelled "plastic pollution". Topic 4 is labelled "going green", based on top words such as sustain, product, green, nature, and design. Based on top words like zero, waste, time, just, and people, Topic 8 is labelled "zero waste". Topic 16 contains top words such as recycling, plastic, use, reuse, and package, leading to the topic label "3R concept".
Moreover, topics relevant to announcements, knowledge, and webinars about plastic pollution can also be identified by analysing the tweets collected. For example, Topic 7 is about "reading", with top words like free, book, read, kindle, and ebook. Topic 11 is labelled "watch videos" according to top words such as via, video, watch, full, and share. Based on top words like free, get, online, learn, and course, Topic 12 is labelled "online learning". Topic 13 is defined by top words such as business, join, webinar, may, and support, leading to the label "webinar". Using the same technique, a total of 20 topics are labelled.
Some topics did not have top words that contributed to a clear understanding of the topic, making them challenging to label. For example, Topic 2 contained top words like "free", "Boston", "Boston curb alert", "freebies", and "wood". Related tweets like "Wood #BostonCurbAlert #Boston #Free" are identified based on the per-document topic distribution. The reason why wood is linked to Boston and free is not obvious from the top words alone, but the related tweets suggest that the topic concerns free items on offer in Boston. Therefore, Topic 2 is labelled "Boston free stuff".
The expected topic proportion of each labelled topic is given in the corresponding table column, and the proportions add up to 1. T20, labelled "help needed", is among the topics with the highest proportions, which indicates that many of the tweets seek help from the public, whether to share opinions, to seek agreement, or to rally the public to join activist efforts. This also indicates that a large group in society is not only taking responsibility for helping to sustain the environment, but also using tweets to seek public assistance.
For illustration purposes, three plots were created to show the shared and distinctive words of pairs of topics. Figure 7 compares Topic 7 ("Reading") with Topic 12 ("Online Learning"). Topic 7's top words, such as book, read, and kindle, are very different from Topic 12's top words, like online, learn, and course. Therefore, the two topics are not closely related, as they are characterised by distinctive rather than shared words. Meanwhile, Figure 8 compares Topic 6 ("Plastic pollution") with Topic 16 ("3R concept"). These topics share words like "plastic" and are closely related. By applying the 3R concept (reduce, reuse, and recycle) in our daily lives, plastic pollution can be reduced and avoided to save the environment. Firstly, reducing the amount of plastic used can minimise the chances of plastic pollution. Next, the demand for producing new plastics can be diminished by reusing plastics. Although the proportion of Topic 16 is not very high (0.06 out of 1), it indicates that society takes responsibility for, and is concerned with, discussing the methods of the 3R concept. Users understand recycling as a way to reuse plastics, because it consumes less energy than producing new plastic. As shown in Figure 9, Topic 4 (Going green) and Topic 9 (Covid-19) are not related, as they have no shared words.
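A simplified version of this shared-versus-distinctive comparison can be sketched with set operations on the top-word lists. This is an assumption-laden simplification: the plots in Figures 7-9 are produced by the STM package, whereas the helper below (a hypothetical name) only compares the five top words reported for Topic 6 and Topic 16 in the text.

```python
def compare_topics(words_a, words_b):
    """Split two top-word lists into shared and distinctive words."""
    a, b = set(words_a), set(words_b)
    return {
        "shared": sorted(a & b),   # words appearing in both topics
        "only_a": sorted(a - b),   # words distinctive to the first topic
        "only_b": sorted(b - a),   # words distinctive to the second topic
    }


# Top words as reported in the text for Topic 6 and Topic 16.
topic_6 = ["plastic", "environment", "climate change", "world", "pollution"]
topic_16 = ["recycling", "plastic", "use", "reuse", "package"]

comparison = compare_topics(topic_6, topic_16)
```

Here the only shared top word is "plastic", which matches the observation that the two topics overlap on that term; a fuller comparison would use the complete per-topic word distributions rather than five words per topic.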

Topic Prevalence
The popularity of each topic in the tweets can be identified by its expected topic proportion. The top three words of each topic are also shown in Figure 10. Given that 20 topics are examined in this study, the average topic proportion is 0.05 (1/20 = 5%). In terms of topic prevalence, differences among the 20 topics can be observed. Referring to Figure 10, the top five topics in Twitter conversations between 16 April and 4 June 2020, in descending order of proportion, are zero waste (Topic 8), sustainable energy (Topic 5), help needed (Topic 20), reading (Topic 7), and online learning (Topic 12). These conversations suggest that society is aware of the environment and recognises the importance of dealing with plastic pollution. Users converse in ways that indicate the need for immediate action to protect the environment, including discussing how to solve problems such as waste reduction and sustainability. During these two months of the pandemic, while many people were working from home, online shopping appeared to have a high potential to create plastic pollution because of intensified packaging and delivery services. Furthermore, the topics of reading and online learning also enter the discussion. Meanwhile, individuals on Twitter also showed interest in learning more about sustainability, especially in discussions of animals wandering around cities and of society's concern about environmental issues (see Appendix A for sample tweets for the topics).
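The notion of an expected topic proportion can be sketched as the average of each topic's share across all documents. The sketch below assumes a per-document topic matrix with rows summing to 1 (named theta here); the function names and toy values are illustrative assumptions, not the study's data.

```python
def expected_topic_proportions(theta):
    """Average each topic's proportion over all documents."""
    n_docs = len(theta)
    n_topics = len(theta[0])
    return [sum(row[t] for row in theta) / n_docs for t in range(n_topics)]


def rank_topics(proportions):
    """Return topic indices sorted from most to least prevalent."""
    return sorted(range(len(proportions)),
                  key=lambda t: proportions[t], reverse=True)


def uniform_baseline(n_topics):
    """Proportion each topic would have if all were equally prevalent."""
    return 1 / n_topics


# Toy per-document topic proportions (each row sums to 1).
toy_theta = [
    [0.7, 0.2, 0.1],
    [0.5, 0.3, 0.2],
    [0.6, 0.4, 0.0],
]
toy_proportions = expected_topic_proportions(toy_theta)
toy_ranking = rank_topics(toy_proportions)
baseline_20 = uniform_baseline(20)  # the 0.05 average quoted for 20 topics
```

Because every row of the matrix sums to 1, the averaged proportions also sum to 1, and topics above the uniform baseline can be read as more prevalent than average.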
Surprisingly, some topics that are not related to environmental issues also appear in the collected tweets, including videos (Topic 11), free games (Topic 18), free music (Topic 10), giveaways (Topic 14), and fashion (Topic 1). However, these are comparatively less popular, which suggests that people were more inclined to watch videos and play free games during this period than to talk about them online. Users are nevertheless still attracted to giveaway information posted on Twitter.

Topic Correlation
A network graph is generated to illustrate the correlations among the 20 topics; directed and undirected graph frameworks are also presented, with lines connecting the topics. Each node represents a topic, and a link between two nodes indicates a relationship between them. Figure 11 shows all the connected nodes.
The topic correlation network shows that each of the 20 topics is connected to one or more other topics. The topics form noteworthy communities, the largest of which comprises 10 of the 20 topics. This community is formed by topics that are mainly about sustainability and the environment, including Topic 4 (going green), Topic 5 (sustainable energy), Topic 6 (plastic pollution), Topic 15 (wastage), Topic 16 (3R concept), Topic 20 (help needed), and others. On the other hand, another community consists of topics that are more related to leisure activities and free items than to the environment, including Topic 10 (free music), Topic 11 (watch video), Topic 14 (giveaways), Topic 18 (free games), and so on.
Meanwhile, Topic 2 and Topic 17 are the least connected, as each is connected to only one node. Topic 2 (Boston free stuff) is connected to Topic 1 (fashion), while Topic 17 (gift cards giveaways) is linked to Topic 14 (giveaways). These topics are not commonly discussed on Twitter, based on their expected topic proportions reported in the Topic Prevalence section.
Topic 20 (help needed) is the most correlated topic, with the most nodes connected to it. It is connected to seven topics: Topic 5 (sustainable energy), Topic 8 (zero waste), Topic 9 (covid-19), Topic 13 (webinar), Topic 15 (wastage), Topic 16 (3R concept), and Topic 19 (happy life). Most of the topics connected to Topic 20 contain conversations about the environmental issues faced and the actions that should be taken to solve the problems and save the environment.
Figure 11. Topic Correlation.
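Counting the connections of each node makes the "most correlated" and "least connected" observations explicit. The sketch below uses only the links spelled out in the preceding paragraphs, so the edge list is deliberately partial and is an assumption for illustration, not the full network of Figure 11; the function name is hypothetical.

```python
from collections import defaultdict

# Partial edge list: only the links explicitly described in the text
# (Topic 20's seven neighbours, Topic 2-Topic 1, Topic 17-Topic 14).
described_edges = [
    (20, 5), (20, 8), (20, 9), (20, 13), (20, 15), (20, 16), (20, 19),
    (2, 1),
    (17, 14),
]


def node_degrees(edges):
    """Count how many links touch each topic in an undirected edge list."""
    degree = defaultdict(int)
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return dict(degree)


degrees = node_degrees(described_edges)
most_connected = max(degrees, key=degrees.get)  # topic with the most links
```

On this partial list, Topic 20 has seven links while Topic 2 and Topic 17 have one each; degrees for other topics would change once the remaining links of the full network are added.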

Discussions
In this modern information age, text data can be collected from various online sources easily, such as social media, traditional media, surveys, and many more. Nowadays, the public tend to share their opinions and feelings on social media platforms like Twitter, Facebook, and Instagram [67]. Therefore, topics related to the environment, plastic pollution in our particular case, can be identified by analysing data from social media platforms.
Our experiments demonstrate that, by applying various topic modelling algorithms and tools, such as LSI, LDA, HDP, NMF, and STM, to social media data, we can identify topics related to plastic pollution discussions, the popularity of each topic, and the correlations among different topics of interest. Furthermore, the implemented approach proved capable of handling a large amount of online textual data effectively and efficiently. Hence, it can distil important information regarding plastic pollution and the environment in general from social media data for stakeholders, such as governments and organisations.
For example, the STM analysis shows that the motivations behind plastic pollution discussions on Twitter span various topics, such as sustainability, zero waste, 3R concept, reading, and learning. Approximately 55% of the topics discussed are related to the environment, while the rest are about leisure activities. Governments and organisations should be able to gain a deeper understanding of individuals' motivations by observing detailed information on every topic of interest, and to take timely action to reduce the negative impact of plastic pollution.
According to our STM analysis, the most popular topic discussed between April and June 2020 on Twitter is Topic 8 (zero waste). By creating a zero-waste future in which every product and good is treated as valuable, the disaster caused by plastic pollution can be tackled. Businesses need to develop products and services that avoid single-use plastic in order to begin a zero-waste supply chain. Governments and environmental organisations also play an important role in spreading awareness and implementing zero-waste acquisition guidance for private companies. In addition, topic correlation showed that Topic 8 (zero waste) is linked to Topic 3 (ban on countries) and Topic 20 (help needed). The connections between these three topics highlight conversations about how to achieve zero waste through international collaboration and mutual help.
Moreover, Topic 6 (plastic pollution) is worth further attention, and is closely related to topics such as Topic 3 (ban on countries), Topic 4 (going green), Topic 5 (sustainable energy), and Topic 16 (3R concept). Due to the COVID-19 pandemic, the amount of single-use plastics such as gloves and masks has rapidly increased, contributing to global plastic pollution [68]. One well-known way of tackling plastic waste is for individuals to apply the 3R concept (reduce, reuse, and recycle) in their daily lives. Topic 7 (reading) and Topic 12 (online learning) are also frequently discussed on Twitter, where individuals can read and learn about information associated with plastic pollution and the environment through online platforms. Compared to traditional learning methods, free courses and programs provided online are more accessible and convenient. Therefore, governments and organisations could promote online courses to reduce the amount of plastic used and thereby help avoid plastic pollution. Online learning can also mitigate several adverse environmental effects by reducing wastage and protecting natural resources.
Some STM topics that are not associated with plastic pollution, such as Topic 11 (watch video), Topic 14 (giveaways), and Topic 17 (gift cards giveaways), are also worth attention. Video tends to reach more people and attract individuals' attention better than ordinary blog posts. Individuals nowadays prefer video content to written content, as it is more interesting and engaging. To attract and interact with individuals, organisations or governments can create video content related to plastic pollution and post it online so that it is accessible to the public. Knowledge related to sustainability and methods of protecting the environment can be promoted to the public more quickly by producing and posting videos online. Giveaways and freebies can also draw individuals' attention to organisations' platforms.

Conclusions
Plastic pollution, along with correlated terms such as zero waste, sustainability, and 3R concept, are important topics that should be further discussed and studied by academic researchers. Big data can be gathered from social media platforms such as Twitter, where individuals are more willing to voice their opinions and feelings. Useful information related to the environment can be extracted from text data like tweets. Previous studies tend to concentrate on applying social media analysis tools that only involve keyword and sentiment analysis, which makes it hard to determine suggestions, desires, and questions from the public. The goal of this study is to explore topics that are discussed in the Twitter-sphere, the popularity of topics, and connections between topics.
Most previous studies aim to extract meaningful insights from tweets to solve research problems that could not be explained by traditional approaches [69]. Twitter has been used as a powerful platform for organisations to interact with individuals, as it is more sentimental and less formal. This study introduces the use of big data and computational methods for future research related to plastic pollution, as there is huge potential in big data. Researchers need to acquire knowledge of computational data collection methods, so that studies can be carried out more effectively and efficiently. Data collection methods, such as web crawling and APIs, allow a large amount of data to be gathered rapidly. This study recommends the use of text analytics instead of traditional analysis techniques like manual coding, which are not practical for big data. In addition, structural topic modelling is suggested, as it can quantify topic exploration, topic popularity, and topic correlation automatically.
We must solve the problem of plastic pollution, which has negative impacts on human beings, animals, and plants, such as causing illness [70] and killing wildlife [71]. Everyone should take immediate action to move closer to a future in which individuals are capable of adjusting their consumption to the needs of the natural world. Instead of just reducing plastic waste and pollution, governments around the world should encourage sustainable growth and promote both blue and green economies to their citizens. Meanwhile, organisations can collaborate with other businesses to stimulate plastic collection and education. By reducing, reusing, and recycling plastic packaging across businesses' supply chains, plastic waste can also be avoided. Therefore, organisations, businesses, politicians, and society should work together, modifying their consumption habits and public policy to make a difference for the environment and the future.
Our current research has a limitation: the Twitter data we used cover only one and a half months. In order to develop a more authentic and in-depth understanding of plastic pollution, future research should collect more data over a longer period of time and from various data sources. In addition, future research should also apply more techniques, such as Twitter user metrics and sentiment analysis.

Learn about #sustainability and take action with our #online course in#environmental natural resources and sustainable development. 100% #scholarship offered by Government of #India on #iLearn-#StudyiLearn #eVBAB @IndiainDRC @indiainzambia @IndiainUganda
0.065

T13 Webinar
Top words: Business, join, webinar, may, support
Sample tweets: Join to support this webinar for an overview of the sustainable plastics marketplace and businesses that have developed sustainable solutions. #EENcanhelp RT @KTN-Creative: Now Live Designing Sustainable Plastic Solutions comp. @innovateuk will invest up to £800k to fund early-stage, human-centered design projects to reduce the harm that plastics have on our environment and increase productivity & growth of the UK economy.
0.055

T14 Giveaways
Top words: Just, follow, free, giveaway, post
Sample tweets: Don't forget !! CashApp Giveaway tomorrow!! #CashAppFriday #cashapp #stocks #robinhood #webull #m1 #stash #freemoney #moneygiveaway #Stimuluscheck #free #freecash RT @bdippr: @Bdippr @Bdippr @bdippr Weekly Cash App Giveaway (x3) CashApp Prizes FREE money! Invest it ,It's super easy to enter!! 1. Follow @bdippr 2. Tag a friend 3. Retweet this post.

T15 Wastage
Top words: Waste, food, water, time, service
Sample tweets: Want to produce delivered to your door AND save money or time? Imperfect provides produce too ugly for the grocery store but still perfectly edible! Less food waste means less water waste #climatechange, #pollution
0.044

T16 3R concept
Top words: Recycle, plastic, use, reuse, package
Sample tweets: From villain to the hero during #COVID19. Demand for #plastic masks, gloves and packaged goods are soaring. Plastics seem now indispensable: cheap & disposable. A backlash to the #recycling & ban #singleuse plastic movements? Getting back to the #circulareconomy will take some effort
0.06
This coming week we want to help solve the single-use plastic problem that is facing our planet. What are some of the ways we can reuse single-use plastic? #Reuse #Plastic #Pollution #CleanBeaches #Trash #Garbage #Environment #LoveEarth #Green #Recycle #SturdyWings #StaySturdy
0.075