A Multi-Dimensional Analysis of El Niño on Twitter: Spatial, Social, Temporal, and Semantic Perspectives

Social media platforms have become a critical virtual community where people share information and discuss issues. Their capabilities for fast dissemination and massive participation have placed under scrutiny the way in which they influence people's perceptions over time and space. This paper investigates how El Niño, an extreme recurring weather phenomenon, was discussed on Twitter in the United States from December 2015 to January 2016. A multi-dimensional analysis, including spatial, social, temporal, and semantic perspectives, is conducted to comprehensively understand Twitter users' discussion of this weather phenomenon. We argue that such multi-dimensional analysis can reveal complicated patterns of Twitter users' online discussion and answer questions that cannot be addressed with a single-dimension analysis. For example, a significant increase in tweets about El Niño was noted when a series of rainstorms inundated California in January 2016. Some discussions on natural disasters were influenced by their geographical distances to the disasters and the prevailing geopolitical environment. The popular tweets generally discussing El Niño were overall negative, while tweets talking about how to prepare for the California rainstorms were more positive.


Introduction
Analyzing the discussions on social media platforms is essential to tackle the complex issues associated with disasters and risks [1,2,3,4,5]. Information can reach vast audiences immediately through social media [6,7], which has become a major source for understanding public perceptions of natural disasters [8]. The study of how people perceive risks and natural disasters on social media can advance our knowledge of risk communication [9].
Disasters and crises disrupt normal routine life and stimulate online activities, including information gathering, as well as generating and sharing information through peer channels. Such disruption and information production set up new and temporal social structures organized by people using available information around focused interests [10]. The perception of disasters or risks is thus multi-dimensional and influenced by social, political, geographical, cultural, and economic factors [1,11,12,13].
Social media data have multiple dimensions, but most studies incorporated only some of them, thus providing limited information [5]. This research simultaneously analyzed the discussion of El Niño, an extreme recurring weather phenomenon, on Twitter along multiple dimensions, i.e., spatial, temporal, social, and semantic. Specifically, this paper investigates how El Niño was discussed in the United States from December 2015 to January 2016, when a series of rainstorms inundated California in early January 2016.
portion (16%) of geo-tagged tweets, and are able to remove those noises by classifying the "source" metadata in the collected tweets [40].
Therefore, identifying data quality and potential bots is critical in analyzing social media data. Chu et al. proposed several criteria to identify Twitter bots, such as a lack of original content, an abundant presence of malicious URLs, duplicate tweets, and low reputation, which is defined as the number of Twitter followers divided by the sum of the number of Twitter followers and the number of Twitter followings [41]. Based on the above criteria, many machine learning-based methods or platforms have been developed to filter noise or identify Twitter bots automatically [28,39,42].
In this paper, instead of focusing on a single dimension, the researchers analyze the Twitter discussion along multiple dimensions to provide a comprehensive understanding of it. Tweets talking about El Niño, a phase of climate oscillation, were collected for two months, within which there was a week-long rainstorm on the west coast of the United States. Census data and geopolitical information are also incorporated into this study to explore how socioeconomic characteristics influence people's discussion on Twitter. Against the backdrop of people's understanding of El Niño, a thorough analysis of Twitter data is performed to explore how people's perception of El Niño varied along spatial, temporal, social, and semantic dimensions before, during, and after a specific storm. The bias and quality of Twitter data are also analyzed and discussed. Some insights that cannot be obtained through single-dimension analysis are observed in this study. For example, the impact of the geographical and geopolitical environment on people's perception of El Niño is identified.

Data Collection
El Niño is a recurring climate pattern that shifts back and forth irregularly every two to seven years [43]. It has a major impact on agriculture, ecosystems, and people's daily lives. To understand people's reactions to, and perceptions of, such abnormal weather conditions, the researchers collected Twitter data globally from 1 December 2015 to 22 February 2016 using the keywords "El Niño" and "Elnino".
In order to obtain more tweets with updated retweet and favorite counts, this study uses a two-stage data collection method. The Twitter Streaming API (https://developer.twitter.com/en/docs.html) was initially employed to capture the spontaneous discussion of El Niño on Twitter. However, since the Streaming API only captures tweets in real time, most collected tweets had zero retweets or favorites at the time of initial collection. Thus, in order to obtain people's reactions to the previously collected tweets, i.e., the number of favorites and retweets, a second round of data collection was performed six months later using the Twitter REST API. The researchers chose a six-month waiting period to make sure that most tweets collected with the Streaming API had sufficient time to be viewed, favorited, or retweeted. With the number of favorites and retweets, the collected tweets would be more representative for understanding people's perception of El Niño.
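The second stage amounts to overwriting the engagement counts captured in real time with the counts observed later. The following is a minimal sketch of that merge step only, not the authors' collection code. The function name `refresh_engagement` is hypothetical, and the field names `id`, `retweet_count`, and `favorite_count` are assumed to follow the usual tweet JSON layout. Keeping the original counts for tweets that could not be retrieved again is also an assumption.

```python
def refresh_engagement(streamed, rehydrated):
    """Return copies of the streamed tweets with later engagement counts.

    streamed:   list of tweet dicts captured in real time.
    rehydrated: dict mapping tweet id -> tweet dict fetched months later.
    Tweets missing from `rehydrated` (e.g., deleted) keep their original
    counts; this fallback is an assumption, not the paper's stated rule.
    """
    merged = []
    for tweet in streamed:
        updated = dict(tweet)  # copy so the input list is left untouched
        later = rehydrated.get(tweet["id"])
        if later is not None:
            updated["retweet_count"] = later["retweet_count"]
            updated["favorite_count"] = later["favorite_count"]
        merged.append(updated)
    return merged
```

The network calls themselves are deliberately left out, since they depend on the API version and credentials in use.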

Geocoding the Twitter Data
As a general rule of thumb, only 1% of tweets contain explicit geographical coordinates. There were 168,753 tweets collected globally between 1 December 2015 and 22 February 2016 using the Streaming API. However, only a small number of tweets (981) contained explicit coordinates of where they were published, while a few tweets (4844) contained the names of places where they were published. On the other hand, the authors of most tweets (127,469), i.e., the Twitter users who published them, included location information on their Twitter account profiles. To maximize the number of geographic locations associated with the Twitter data, this study assumes that tweets without explicit coordinates or tagged place names were posted from the locations listed on the Twitter users' profiles. Therefore, user profiles or place names contained in the tweets are used to geocode the tweets. In the geocoding process, priority was given as follows:
• if a tweet contained explicit geographical coordinates, the coordinates were used directly without geocoding;
• if a tweet did not include coordinates, the tagged place name in the tweet metadata was used in geocoding;
• if a tweet did not have coordinates or a tagged place name, the location information listed on the user's profile was used in geocoding;
• and, if a tweet did not contain any of the information listed above, it was discarded.
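The priority order above can be sketched as a simple fallback chain. This is an illustrative sketch rather than the authors' implementation. The function name `pick_location_source` and the dictionary keys `coordinates`, `place_name`, and `user_location` are hypothetical stand-ins for the corresponding tweet metadata fields.

```python
from typing import Optional, Tuple


def pick_location_source(tweet: dict) -> Optional[Tuple[str, object]]:
    """Choose which location evidence to use for one tweet.

    Returns a (source, value) pair, or None when the tweet carries no
    usable location information and should be discarded.
    The key names are hypothetical, chosen for illustration only.
    """
    coords = tweet.get("coordinates")
    if coords:
        # Explicit coordinates are used directly, without geocoding.
        return ("coordinates", coords)
    place = tweet.get("place_name")
    if place:
        # Tagged place name: passed to the geocoder.
        return ("place", place)
    profile = tweet.get("user_location")
    if profile:
        # Profile location: also passed to the geocoder.
        return ("profile", profile)
    return None
```

Only the values returned with the `place` or `profile` source would then be sent to a geocoding service.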
There were 106,400 tweets, i.e., 63% of total tweets, geocoded based on the geographic information listed on either the tweets, or Twitter users' profiles using the ArcGIS Online Geocoding Service (https://geocode.arcgis.com/arcgis/). This ratio of geocoded tweets is higher than the 18% or 46% in Kryvasheyeu et al.'s studies [33,34]. After filtering out the ambiguous or fictitious place names, such as "world, USA, dream", the researchers identified 58,773 tweets located in the United States.

Identifying Topics in Tweets
To better understand the semantic discussion of El Niño on Twitter, the researchers utilized topic modeling in RapidMiner (https://rapidminer.com/) to identify the different foci within the tweets. Topic modeling techniques produce clusters of similar words [44]. In this paper, the texts of all tweets are extracted first and then converted to lowercase. URLs, 'RT', and English stop words are removed from all the texts. Latent Dirichlet Allocation (LDA) [44] is selected for the topic modeling process.
To identify the optimal number of topics, several rounds of topic modeling are conducted with a different number of topics in each round. According to the log likelihood and perplexity, adding more topics always achieved better performance in this topic modeling process. This is probably because the collected tweets are so diverse in terms of languages, geographic areas, and unstructured expression on Twitter. Due to the page limitation, this paper reports the result with 20 topics. Figure 1 reports the top 5 keywords of those 20 topics. Most of the tweets are classified into topics 1, 18, 10, and 7. Figure 1 shows the top keywords in each topic, where the length indicates the weight of each word in its corresponding topic. Words with high weights contribute more to distinguishing the topic, and can therefore be considered the signal words of that topic. Words such as CA, Storm, and Rain are frequently seen in many topics. In addition, flood, NASA, and climate change are also important keywords in a few topics. Figure 2 shows the temporal variation in the number of tweets in each topic. Tweets in topics 1 and 18 increased dramatically when the storm hit CA in early January 2016, indicating the strongest temporal correlation with the CA storm, followed by topics 7 and 10. Although such semantic-temporal analysis does distinguish the tweets by content and temporal variation, limited knowledge is gained about how people discuss El Niño on Twitter, especially since several topics have similar keywords.
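The cleaning steps described above (lowercasing, then removing URLs, 'RT', and stop words) can be sketched in a few lines. The paper performed this step in RapidMiner, so the code below is only an illustration of the same idea. The function name `preprocess`, the tiny stop-word set, and the tokenization rule are all assumptions.

```python
import re

# A tiny illustrative stop-word set; a real run would use a full English list.
STOP_WORDS = {"a", "an", "the", "is", "are", "in", "on", "of", "to", "and", "for"}

URL_PATTERN = re.compile(r"https?://\S+")
WORD_PATTERN = re.compile(r"\w+")


def preprocess(text):
    """Lowercase a tweet, drop URLs, the 'rt' marker, and stop words."""
    text = text.lower()
    text = URL_PATTERN.sub(" ", text)
    tokens = WORD_PATTERN.findall(text)  # assumption: split on non-word characters
    return [t for t in tokens if t != "rt" and t not in STOP_WORDS]


print(preprocess("RT @user: El Niño is here! https://t.co/abc"))
```

The resulting token lists are what a topic model such as LDA would take as input.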

Sentiment Calculation
A sentiment analysis is performed for all tweets with the TextBlob Python library (https://textblob.readthedocs.io/en/dev/). Special characters, e.g., URLs, are excluded from each tweet before this analysis. The outputs include a polarity index and a subjectivity index. The polarity index ranges from negative 1 to positive 1, indicating a very negative tone to a very positive tone, and the subjectivity index ranges from 0 to 1, indicating a very objective tone to a very subjective tone. Figures 3 and 4 display the Cumulative Distribution Function (CDF) of the polarity and subjectivity of all the tweets in each topic. Tweets in different topics present various sentiments. For example, topic 2 tends to be more objective and negative, as the polarity of tweets in topic 2 tends to be more negative and their subjectivity is lower, namely more objective, than in other topics.
Again, single-dimension analysis, such as sentiment analysis, only shows the variation in sentiment. In-depth understanding of people's online discussion requires analysis from multiple dimensions.
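For readers unfamiliar with the CDF curves used to compare topics, the sketch below shows how an empirical CDF can be computed from a list of polarity (or subjectivity) scores. It is a generic illustration, not the code used to produce Figures 3 and 4. The function name `ecdf` and the sample scores are hypothetical.

```python
from bisect import bisect_right


def ecdf(values):
    """Build the empirical CDF of a non-empty list of scores.

    The returned function maps x to the share of scores <= x.
    """
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("at least one score is required")

    def cdf(x):
        return bisect_right(ordered, x) / n

    return cdf


# Hypothetical polarity scores for the tweets of one topic.
polarity_cdf = ecdf([-0.5, 0.0, 0.0, 0.25, 0.8])
print(polarity_cdf(0.0))
```

A topic whose curve rises earlier along the polarity axis has a larger share of negative tweets than a topic whose curve rises later.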

Social Network Construction
On Twitter, a user (author) can mention other Twitter users (mentioned users) in his/her tweets, and can also include several hashtags to label or signify the tweet content. By investigating who (what) are the influential and active users (hashtags), the researchers are able to identify the popular Twitter users and interest foci. This paper constructed two social networks for each topic, namely the author-to-mentioned-user network and the hashtag network, to identify the popular Twitter users and interest foci. For each topic, the researchers calculated the following statistics or measurements of network structure [45]:
• Frequency: the number of times an item, e.g., a hashtag or a Twitter user, has been mentioned in the collected tweets of one topic;
• Degree: the number of times an item is associated with other items, e.g., how many different hashtags/Twitter users are mentioned together with this hashtag/Twitter user in one topic:
o Indegree: in a directed network, the indegree is the number of ties an item receives from other items.
o Outdegree: in a directed network, the outdegree is the number of ties an item constructs toward other items.
Weighted degrees are calculated where the frequency is the weight.
• Eigenvector centrality: measures the influence of Twitter users or hashtags in networks. Weighted eigenvector centrality is calculated where the frequency is the weight.
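The measurements listed above can be illustrated on a toy mention network. The sketch below is not the authors' tooling: the function name `mention_network_stats` is hypothetical, and two simplifications are assumed. First, eigenvector centrality is computed on the symmetrized (undirected) weighted network. Second, power iteration is applied to the adjacency matrix plus the identity, which keeps the same leading eigenvector and helps the iteration converge.

```python
import math
from collections import Counter, defaultdict


def mention_network_stats(mentions, iterations=200):
    """Compute frequency, degrees, and eigenvector centrality.

    mentions: iterable of (author, mentioned_user) pairs, one per mention.
    """
    mentions = list(mentions)
    edge_weight = Counter(mentions)  # how often each directed tie occurs
    frequency = Counter(target for _, target in mentions)
    nodes = sorted({user for pair in edge_weight for user in pair})

    indegree, outdegree = Counter(), Counter()
    weighted_in, weighted_out = Counter(), Counter()
    neighbours = defaultdict(Counter)  # symmetrized weights (assumption)
    for (src, dst), w in edge_weight.items():
        outdegree[src] += 1      # distinct users this author mentions
        indegree[dst] += 1       # distinct authors mentioning this user
        weighted_out[src] += w   # frequency used as the weight
        weighted_in[dst] += w
        neighbours[src][dst] += w
        neighbours[dst][src] += w

    # Power iteration on (A + I); scores are normalized to unit length.
    score = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        nxt = {
            n: score[n] + sum(w * score[m] for m, w in neighbours[n].items())
            for n in nodes
        }
        norm = math.sqrt(sum(v * v for v in nxt.values())) or 1.0
        score = {n: v / norm for n, v in nxt.items()}

    return {
        "frequency": frequency,
        "indegree": indegree,
        "outdegree": outdegree,
        "weighted_indegree": weighted_in,
        "weighted_outdegree": weighted_out,
        "eigenvector": score,
    }
```

For example, with the pairs `[("a", "c"), ("b", "c"), ("a", "c")]`, user `c` receives ties from two distinct authors (indegree 2) but is mentioned three times (weighted indegree 3), and it obtains the highest centrality score.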
Forty networks are thus constructed in total, i.e., two for each of the 20 topics, and the frequency, degrees, and eigenvector centralities are calculated for the Twitter users and the hashtags in each network. Among those networks (i.e., topics), topic 1 contains the highest number of Twitter users and hashtags due to the large quantity of tweets in this topic, followed by topics 14 and 10. Figures 5 and 6 display the top influential hashtags and users in each network, where the length indicates the weighted eigenvector centrality. Unlike Figure 1, which includes the most significant keywords that distinguish each topic statistically, Figure 5 displays the most influential hashtags that best represent authors' original intention on Twitter. Therefore, the hashtags are slightly different from the keywords. For example, the top 5 keywords in topic 19 are not in the same language as the top 10 hashtags in topic 19. If we combine both figures, more information about each topic can be obtained. For instance, the keywords that separate topic 0 from other topics include climatechange and cop. The popular hashtags of topic 0 indicate that this topic focuses on climate change and COP 21, i.e., the 2015 United Nations Climate Change Conference. In addition, the keywords in topic 2 contain NASA, weather, and impact. The hashtags in this topic focus on Science, NASA, La Niña, etc. Such hashtags and keywords also explain why the sentiment of topic 2 is the most objective and negative: this group of tweets discusses climate change due to El Niño and La Niña with reference to official agencies or scientific resources.
It is clear that a local news media account in CA, ABC 10 News, is the most influential Twitter user in many topics (Figure 6). This is probably because this Twitter account also actively interacts with other Twitter users in topic 13 (Figure 8), especially with other local news accounts as shown in topics 1, 13, and 14, where those local news media accounts have high eigenvector centrality (Figure 6). NASA and some national news media, such as ABC News, are frequently mentioned in topics 2, 11, and 18 (Figure 7). However, those mentioned Twitter accounts do not actively interact with other users, as they are not listed among the top mentioning users or influential users (Figure 6, Figure 8). In addition to those national news media, some local news media, e.g., ABC7, and local public agencies, such as LAMayorsOffice, are also frequently mentioned in topics 4, 10, 13, 15, and 18.

Data Quality and Bias
To assess the data quality and potential bias in the collected tweets, the paper calculated the reputation of Twitter users and summarized the source of all tweets in each topic.
Specifically, the reputation is calculated based on [41]:

$$\text{Reputation} = \frac{\text{Number of Twitter Followers}}{\text{Number of Twitter Followers} + \text{Number of Twitter Followings}}$$

Figure 9 summarizes the reputations of Twitter users in each topic, and Figure 10 shows the sources, language, and sensitivity of all tweets, ordered by the number of tweets in each topic. Most tweets in topics 3, 5, 6, 7, 11, 12, and 19 are not in English. Topic 18 contains the most tweets published by bots or automatic tweeting services such as IFTTT, and includes the most sensitive URLs. The other topics also contain a relatively small portion of tweets from bots or automatic tweeting services. The reputations of Twitter users in topic 6 vary the most. This is because there is a large number of duplicated tweets in this topic, where many tweets have 0 retweets but a few tweets have a huge number of retweets. The reputation of Twitter users in topic 19 is higher than in other topics, and both topics 6 and 19 contain a large number of tweets in the language IN, which is not listed in the ISO 639 language codes (https://www.loc.gov/standards/iso639-2/php/code_list.php) or the official Twitter documentation (https://developer.twitter.com/en/docs/twitter-for-websites/twitter-for-websites-supportedlanguages/overview.html).
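The reputation ratio defined in this section is straightforward to compute per account. The sketch below is a minimal illustration. The function name `reputation` is hypothetical, and returning 0 for an account with neither followers nor followings is an assumption made here to avoid division by zero, since the source does not say how such accounts were handled.

```python
def reputation(followers: int, followings: int) -> float:
    """Followers divided by followers plus followings, as defined in [41]."""
    total = followers + followings
    if total == 0:
        # Assumption: an account with no ties at all gets a reputation of 0.
        return 0.0
    return followers / total


print(reputation(900, 100))
```

Values close to 1 correspond to accounts that are followed by many users but follow few, such as news media, while values close to 0 are one of the bot indicators listed by Chu et al.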

Multi-Dimension Analysis
Although the analyses in Section 4 yield some insights into how people discuss El Niño on Twitter, such insights are limited when considered separately. For example, topic modeling classifies the tweets into different groups, where tweets with similar words belong to the same group. The keywords in Figure 1 are insufficient to determine the exact meaning of those topics because the keywords overlap. However, if we analyze the outputs from Section 4 simultaneously, many meaningful questions can be answered. This section provides a few examples to demonstrate how multi-dimension analysis can provide more meaningful insights into people's discussion of El Niño on Twitter.

When and Where Do People Discuss El Niño
Although topic 1 has the most tweets located in the USA, topics 10 and 14 have the shortest standard distances and are concentrated in the west of the USA, as shown in Figure 11, where the center of each circle indicates the mean center of a topic and the radius of each circle represents the size of its standard distance. In addition to the spatial distribution of the geocoded tweets, Sea Surface Temperature (SST), an important indicator of El Niño, is also introduced to assess people's perception of and reaction to El Niño on Twitter over time and space. The geocoded tweet points were split into the same temporal intervals as the SST data, e.g., 1 December 2015 to 6 December 2015, 7 December 2015 to 13 December 2015, etc. The hot-cold spots of the tweet density at the state level for each week were also calculated, and the SST of each week was added as a reference. First-order polygon contiguity is selected as the spatial weight matrix in calculating the hot-cold spots.
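The mean center and standard distance used to summarize each topic's spatial spread can be written out explicitly. The following is a minimal sketch that assumes the tweet locations have already been projected to planar (x, y) coordinates. The function names and the sample points are hypothetical, and a GIS package may apply additional options such as weights.

```python
import math


def mean_center(points):
    """Average x and average y of a non-empty list of (x, y) points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def standard_distance(points):
    """Root mean squared distance of the points from their mean center."""
    cx, cy = mean_center(points)
    n = len(points)
    return math.sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / n)


# Hypothetical projected tweet locations for one topic.
pts = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
print(mean_center(pts), standard_distance(pts))
```

A short standard distance means the tweets of a topic cluster tightly around its mean center, which is how a regionally focused topic differs from a nationally dispersed one.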
According to Figure 12, Twitter users in the District of Columbia and surrounding areas tended to be consistently more interested in El Niño on Twitter before or after the CA storm, namely from 1 December 2015 to 3 January 2016. However, there was a sudden and abnormal increase in the SST on the west coast of the United States between 28 December 2015 and 3 January 2016, accompanied by heavy rainstorms in parts of California between 4 January 2016 and 10 January 2016 [46]. The discussion of El Niño on Twitter in California and its surrounding states thus increased dramatically between 4 January 2016 and 10 January 2016. As the SST returned to normal after 4 October 2016, the hotspots of the discussion of El Niño came back to and stayed in the District of Columbia and its nearby areas. The hot-cold spots reveal a general pattern: before or after the CA storm, Twitter users in D.C. expressed significant interest in the discussion of El Niño. This might be due to the concentration of federal departments and news media agencies in the D.C. area. However, during the week of the CA rainstorm, Twitter users in CA and surrounding states showed more interest in El Niño due to the direct impact.

The Different Foci in Tweets
After combining all the outputs from Section 4, the different foci in tweets can be identified. Based on the common hashtags (Figure 5), the most mentioned and mentioning Twitter users (Figure 7, Figure 8), and the number of tweets in different languages (Figure 2), topics 3, 5, 6, 7, 11, 12, and 19 contain a significant portion of non-English tweets. This also explains why they have the fewest tweets geocoded in the USA. Topic 0 is a broad discussion of El Niño; for instance, the World Economic Forum (@wef) is mentioned frequently. However, topics 9 and 17 also focus on the general or global discussion of El Niño. Those 3 topics are separated in the topic modeling because each of them contains a large number of tweets from one or two Twitter users, e.g., potential bots. Figure 13 shows the number of tweets posted by each Twitter user in each topic. Topics 0 and 17 contain a huge number of tweets from the Twitter user tweetsbychkov, while topic 9 contains many tweets from VinylrobotLA. Those extremely active Twitter users, or potential bots, split the Twitter discussion into 3 statistically different groups.
Topic 1 contains the highest number of tweets and geocoded tweets in the USA, and is a more general discussion of the LA storm and El Niño. Topics 2 and 8 associated El Niño with climate change, and mentioned NASA or the 2015 United Nations Climate Change Conference (COP 21), respectively. Therefore, their sentiment tends to be more objective and negative (Figure 3, Figure 4).
The other topics (4, 10, 13-16) talk more about the CA storm or LA rain due to El Niño. Specifically, topics 15 and 16 are general discussions. Topic 10 focuses on how to prepare for the LA storm, and thus is more positive than other topics (Figure 3) and has a higher tweet density in LA than in DC (Figure 14). Both topics 13 and 14 display near real-time reporting of El Niño in which many local news media are involved. However, topic 14 is separated from topic 13 due to the heavy user NecklaceFash, who sent many tweets related to boots or outfit of the day (ootd) when the LA storm came.
Scientists, national news media and local news media play important roles in disseminating and mediating discussions about El Niño events, which provides evidence that was not found in previous studies [26]. Specifically, the local news media and national news media behavior differently when reporting El Niño. As shown in Figures 6-8, national new media have been mentioned a lot (e.g., in topic 18) when the tweets focusing on general discussion of El Niño, while local news media demonstrate more interests in CA storm (e.g., topic 13,14). In addition, local news media tend to interact with other local news media on Twitter while national news media do not.  Topic 1 contains the highest number of tweets and geocoded tweets in the USA, and is more general of LA storm and El Niño. Topic 2 and 8 associated El Niño with climate change, and mentioned NASA or 2015 United Nations Climate Change Conference (COP 21) respectively. Therefore, their sentiment tends to be more objective and negative (Figure 3, Figure 4).

The Impact of Geopolitical Environment on Twitter Discussion
People's perception is closely related to their nearby geographic and political environment [1,15]. To explore how the geopolitical environment affects people's discussion of El Niño on Twitter, the distribution of each topic is normalized by the population of each state. Figure 15 depicts the spatial distribution of topic 1 within the United States. The sizes of the dots indicate the number of tweets per one million persons in each state, and the colors represent the winning party in the 2018 election. On average, people in the District of Columbia and California produced the highest number of tweets, but people in the District of Columbia talked more about El Niño in relation to climate change, while people in California focused on the real-time reports of the storms. A clear pattern also emerged in Figure 14, indicating that people in states that voted for the Democratic Party in the 2016 Presidential Election talked more about El Niño than people in states that voted for the Republican Party. The researchers compared the voting rate (data from [47]) for the two parties in each state with the number of tweets in each topic per one million persons in each state. Without considering DC and CA, topic 1 is the only topic that is significantly related to the voting rate: support for the Republican Party had a moderate negative correlation with the number of discussions in the General Discussion on El Niño. Such a division is likely because Democrat-led states focused more on the cause, impact, and action of climate change [37], and because discussions and responses to climate change news are dominated by climate change activists rather than climate change deniers [23]. However, none of those tweet densities is statistically related to the 2016 median household income in each state.
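The normalization and correlation steps described above can be sketched as follows. This is only an illustration under stated assumptions: the paper does not name the correlation coefficient it used, so Pearson's r is assumed here; the state records are hypothetical placeholders; and the significance test reported in the text is omitted.

```python
import math


def tweets_per_million(tweet_count, population):
    """Normalize a state's tweet count by its population (tweets per one million persons)."""
    return tweet_count / population * 1_000_000


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least two values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)


# Hypothetical records: topic tweet count, population, Republican vote share.
# These values are placeholders, not data from the study or from [47].
states = {
    "State A": {"tweets": 120, "population": 4_000_000, "rep_share": 0.35},
    "State B": {"tweets": 50, "population": 2_500_000, "rep_share": 0.45},
    "State C": {"tweets": 90, "population": 6_000_000, "rep_share": 0.55},
    "State D": {"tweets": 20, "population": 2_000_000, "rep_share": 0.65},
    "DC": {"tweets": 700, "population": 700_000, "rep_share": 0.04},
}

# DC and CA are left out, mirroring the exclusion described in the text.
excluded = {"DC", "CA"}
kept = [record for name, record in states.items() if name not in excluded]

density = [tweets_per_million(r["tweets"], r["population"]) for r in kept]
rep_share = [r["rep_share"] for r in kept]
r_value = pearson_r(rep_share, density)
print(f"Pearson r between Republican vote share and tweet density: {r_value:.2f}")
```

Excluding the two outlying areas before computing the coefficient matters because a single state with a very high tweet density can dominate a correlation computed over about fifty points.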

Figure 15. Spatial distribution of tweets in topic 1.

Discussion and Conclusions
Whenever Twitter users perceived what they thought were abnormal weather conditions, they immediately expressed their feelings and opinions on Twitter. This research analyzed the social media dynamics of a major natural disaster, El Niño, and compared the discussions of this disaster before, during and after an actual event that occurred on the west coast of the United States. A comprehensive analysis was conducted to investigate the perception of El Niño in both spatial-temporal and social-semantic dimensions. Such methods can identify the spatial-temporal clusters of people's interest in specific events, as well as popular Twitter users and interest foci in Twitter data. Such multi-dimensional analysis can identify complicated patterns in Twitter users' online discussion and answer questions that cannot be addressed with single-dimension analysis. For instance, this research revealed that people who were directly affected by the severe weather conditions showed significantly more interest in them than people in other places. This indirectly contrasts with another study [29], where the discussions about climate change were found to be controlled by the state, enacted by state-sponsored actors and media, and limited to a theoretical discussion about climate change detached from the larger political context. This paper also finds that when a natural disaster such as El Niño is discussed in general terms, the popular tweets tend to be negative and objective. However, when people talk about an actual event of such a disaster, the popular tweets tend to be positive. In discussing the same El Niño events, people were found to have different foci. The majority of tweets in the discussion on the Los Angeles storm focused on the storm caused by an El Niño event. In this cluster, local news outlets were found to play a significant role in reporting the situation of the storm.
In addition, there is also a general discussion about El Niño events, in which laypeople expressed their interest in and concerns about those events. Tweets in the discussion on El Niño with climate change expressed serious concern about El Niño in relation to climate change, global warming, drought, food security, etc., and scientists and national news media are seen as major contributors to this discussion.
Such diverse discussions are also related to the geopolitical environment of Twitter users. Twitter users in the District of Columbia area showed extraordinary interest in topics in the discussion on El Niño with climate change, while Twitter users in California showed more concern for issues mentioned in the discussion on the Los Angeles storm. Meanwhile, Twitter users in Democrat-led states discussed the El Niño events more than Twitter users in Republican-led states.
There are several limitations of this research. Although tens of thousands of tweets were analyzed to support the results, they are still a small portion of the entire Twitter data archive, and a huge number of people do not use Twitter as a major communication tool. Gathering comprehensive information on people's discussion without violating personal privacy is always a big challenge in big data science. Meanwhile, topic modeling separates tweets based on statistical similarity, so it cannot detect sarcasm or jargon and is vulnerable to Twitter bots. A supervised machine learning model, such as Naïve Bayes, may generate better results but requires substantial human input. Finally, the majority of the tweets' locations are geocoded from users' profiles. Those locations may not represent the true locations where the tweets were actually posted.
Author Contributions: Xinyue Ye conceived and designed the study; Xinyue Ye and Xuebin Wei outlined the methodology; Xuebin Wei analyzed the data and drafted the manuscript; Xinyue Ye extensively updated the manuscript. All authors have read and approved the final manuscript.