1. Introduction
Understanding public perceptions is critical in the context of mass disruption events, such as natural disasters [
1,
2,
3] and pandemics [
4,
5,
6]. In this digital era, media platforms—encompassing social media, search engines, and news media—have emerged as key tools for gauging public perceptions. They provide real-time, high-volume data that can be harnessed to understand and gauge public response, thereby informing and enhancing strategies for crisis management. Each platform attracts a unique user demographic and serves specific information needs, leading to different patterns of public awareness. Therefore, conducting an integrated analysis across diverse media platforms can offer a more comprehensive understanding of public perceptions and behaviors during mass disruption events.
Many studies have leveraged media platforms to examine public responses and awareness. During Hurricane Sandy in 2012, researchers analyzed Twitter data to assess the effectiveness of social media communication during disasters, particularly the relationship between user activity and the attention gained across the three phases of a disaster [
7,
8,
9]. In response to the Chennai floods of 2015, in India, Facebook activated its ‘Safety Check’ feature, allowing users in the affected area to quickly inform their networks that they were safe. Research found that this feature played a significant role in disseminating information and connecting affected individuals with their social networks [
10]. During the COVID-19 pandemic, many researchers analyzed a large volume of Twitter data to understand public sentiments towards the infectious disease [
11], attitudes towards government policies (e.g., stay at home order) [
12], opinions on COVID-19 vaccines [
4], and public concerns across the pandemic [
13]. Different media platforms have distinct features and can offer diverse perspectives into public awareness. However, limited studies have leveraged data from multiple media platforms to concurrently support analysis. For instance, Burke et al. [
14] analyzed location-specific variation in search query behavior related to smoke exposure using Google Trends and geotagged Twitter data, applying natural language processing algorithms to identify public preferences and sentiments under rapidly changing wildfire risk. However, that study lacked a comprehensive analysis from the spatial and temporal perspectives.
This study aims to utilize multiple media platforms to reveal distinct dimensions of the public’s response to the Ohio train derailment event. On 3 February 2023, a train carrying hazardous materials suffered a severe derailment on the eastern side of East Palestine, Ohio, near the Pennsylvania border. This catastrophic event resulted in a fire that burned for several days. Among the derailed cars, 11 were tank cars that dumped ~100,000 gallons of hazardous materials into the environment, including vinyl chloride, benzene residue, and butyl acrylate, which pose significant threats to the environment and public health. To mitigate the situation, emergency crews conducted a controlled burn of several railcars. Unfortunately, this measure resulted in the emission of additional hazardous substances into the air.
Numerous studies have investigated the environmental impacts of the derailment event. Ramachandra et al. [
15] analyzed the temporal pattern of chemical pollutant levels in the air, water, and soil in East Palestine and built machine learning and statistical models to predict pollutant levels over time. Nix et al. [
16] collected waterways and cropland, household income, and health insurance coverage datasets and utilized geographic information system tools to analyze human and environmental concerns. In addition to environmental investigations, several studies have addressed public awareness of railroad safety from different perspectives. For instance, Cruz [
17] utilized qualitative method to examine the communication strategies employed during the Ohio train derailment incident, focusing on how information was disseminated and managed among stakeholders. Ghosh et al. [
18] conducted a systematic analysis of Twitter data to identify major topics related to railroad safety, including the Ohio train derailment event. Their findings underscore the utility of social media platforms like Twitter in understanding public discourse on transportation safety issues.
Despite growing recognition of the value of public perception data from media platforms, most existing studies on the Ohio derailment have primarily focused on environmental modeling and pollutant monitoring, with limited attention paid to how the public interpreted and responded to the event. Moreover, while social media and search engine platforms generate rich streams of real-time data that reflect public sentiment and awareness, few studies have leveraged both sources in a complementary manner to assess public reaction. This study is motivated by the need to bridge this gap by integrating geotagged Twitter data and Google Trends to assess public awareness, concern, and sentiment across spatial and temporal dimensions. Twitter provides a platform for users to express their immediate emotional responses, while Google Trends captures broader search behaviors indicative of public curiosity and concern. Together, these platforms offer a multifaceted view of public discourse that can enhance situational awareness and inform data-driven decision-making.
To guide this investigation, we pose the following three research questions:
Q1: How does the popularity of search terms related to train derailment events vary across different geographic regions in the U.S.?
Q2: How do the sentiment and tone of social media posts about train derailment events change over time, and are there any notable events or incidents that contribute to shifts in sentiment?
Q3: How do trending topics related to train derailment events change and evolve on a social media platform?
The remainder of the paper is organized into four main sections.
Section 2 provides background information on the Ohio train derailment event and introduces the media platforms analyzed in this study, including Twitter and Google Trends. It further outlines the key methods used to analyze social media data.
Section 3 presents the results and highlights the main findings.
Section 4 and
Section 5 offer a discussion of the results, concluding remarks, study limitations, and directions for future research.
2. Study Data and Methods
This section describes the datasets and methodologies used in the study, offering temporal, spatial, and thematic insights into public responses following the Ohio train derailment. It begins by outlining the event timeline, followed by a detailed explanation of the data collection processes for Google Trends and Twitter data. Finally, it introduces the analytical approaches used to examine spatial and temporal patterns in public awareness and perception of the event.
Figure 1 presents the integrated workflow used to examine public awareness and opinion by combining Twitter and Google Trends data. The top pathway outlines the Twitter-based analysis, which includes data collection via the Twitter API, preprocessing, and various forms of content analysis, such as sentiment analysis, emotion analysis, topic modeling, and word cloud generation, to capture the tone, themes, and discourse surrounding the event. The bottom pathway illustrates the Google Trends-based analysis, where data were collected using the Pytrends API and preprocessed for time series clustering. This clustering was conducted at the state level, allowing for the identification of both spatial and temporal trends in public search behavior. While Twitter analysis focuses on the content and emotional context of public discourse, the Google Trends analysis captures broader patterns of public interest over time and across geographic regions. Together, these complementary approaches provide a holistic view of public response to the event.
2.1. Ohio Train Derailment Event Timeline
On 3 February 2023, a freight train derailed in East Palestine, Ohio, U.S., causing a significant disaster in the vicinity. The derailment of the train, which was carrying hazardous materials, produced a massive fireball. In response to the escalating danger, the government implemented an emergency evacuation of all residents within a one-mile radius of the derailment site on 6 February. However, the pollution spawned by the derailment has had an enduring environmental impact. According to statements from the Ohio Department of Natural Resources, the chemical spill had resulted in the death of approximately 3500 small fish in about 7.5 miles of streams as of 8 February (
https://www.cbsnews.com/news/timeline-east-palestine-ohio-train-derailment-chemicals-evacuations/, accessed on 31 December 2024). By 23 February, it had potentially killed more than 43,000 marine creatures, including fish, crustaceans, and amphibians (
https://www.washingtonpost.com/climate-environment/2023/02/23/ohio-train-derailment-animals-deaths/, accessed on 31 December 2024). Since this devastating event took place, environmental scientists and governmental agencies have been actively developing controlled relief strategies and conducting rigorous assessments of the environmental impacts, including the collection of air, soil, and water samples to gauge the extent of pollution. Crash remnants were detected in water samples collected from waterways near the derailment site, such as Bull Creek, Little Beaver Creek, and the Ohio River. Despite official statements that no hazardous chemicals were detected in the air and water, residents persist in reporting health issues such as headaches and difficulty breathing [
19].
As depicted in
Figure 2, we grouped the event time period into the following three phases:
Phase 1: Impact and immediate response (3 February–9 February). This phase includes the derailment event itself, the immediate response by emergency services, the evacuation of the surrounding area, and the immediate efforts to contain and manage the situation.
Phase 2: Investigation (10 February–21 February). This phase covers the investigations into the impact of pollution on local communities. For instance, on 16 February, the EPA (U.S. Environmental Protection Agency) administrator visited East Palestine for the first time since the derailment and said that residents with private wells should use bottled water until tests showed that the well water was safe to drink.
Phase 3: Recovery (22 February–28 February). This phase represents the longer-term aftermath of the event, including government response at a higher level, such as the visit from former president Donald Trump on 22 February 2023.
2.2. Google Search Data
Google Trends is a website created by Google that analyzes the popularity of top search queries in Google Search across various regions and languages. It offers unique insights into the public’s awareness of specific events by monitoring Google search behaviors over time. For this study, we used Google Trends to obtain data related to the Ohio train derailment event. While Google does not directly offer an API to extract data from Google Trends, PyTrends, an open-source Python library, provides a simple interface for automating the download of Google Trends reports. It enables users to extract time-series data of Google searches related to specific keywords. When gathering data, Google Trends normalizes it by scaling according to the total search volume within a specific time period and location. The data are indexed on a scale from 0 to 100, where the peak popularity for a term in the chosen region and timeframe is assigned a value of 100. All other data points are scaled relative to this peak. In our study, we queried Google Trends for each state individually, meaning that the normalization process was applied on a state-by-state basis. Given this methodology, Google Trends can offer insights into how popular a term is across different areas, making it suitable for analyzing and cross-comparing relative trends over time.
To accurately capture the online discourse surrounding the Ohio train derailment, we used the search term ‘Ohio train derailment’ in our data request process. Furthermore, we restricted the time window for data collection to 4 February through 28 February 2023. This timeframe aligns with our identified timeline of the critical events and discussions related to the derailment, ensuring that we capture the most relevant and focused insights into the public awareness that unfolded during this period.
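As an illustration, the query below is a minimal sketch of how such state-level data can be retrieved with PyTrends; the exact parameters of our collection pipeline may differ, and the state code shown is only an example.

```python
from pytrends.request import TrendReq

pytrends = TrendReq(hl='en-US', tz=360)
keywords = ['Ohio train derailment']

# Query each state separately so that the 0-100 normalization is applied per state.
# 'US-OH' is the geo code for Ohio; a full run would loop over all state codes.
pytrends.build_payload(keywords, timeframe='2023-02-04 2023-02-28', geo='US-OH')
ohio_interest = pytrends.interest_over_time()  # daily index values on a 0-100 scale
print(ohio_interest.head())
```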
2.3. Social Media Data
We further collected tweet data from Twitter, currently known as X, to identify what people were discussing and sharing, providing insights into public opinion and sentiment. Before July 2023, Twitter offered various APIs, including the Rest API, Ads API, Twitter API v2, and Enterprise APIs. These APIs allow developers to access and interact with data such as tweet content, user profiles, follower data, location information, and more. This study employed the Twitter streaming API, notable for its real-time data access. This API enables developers to subscribe to specific users, keywords, or geographic regions and receive all tweets that match the predefined criteria. The Twitter streaming API’s unique strength lies in its capacity to provide real-time tracking of public events and discussions. For instance, during the Ohio train derailment event, this API delivered a live feed of all event-related tweets, providing a sense of real-time public awareness.
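For illustration only, the snippet below sketches keyword-based filtered streaming with the tweepy library against the current Twitter API v2 endpoints; it is not the archival pipeline used in this study (described next), such access now requires a paid tier, and the bearer token and rule string are placeholder assumptions.

```python
import tweepy

class DerailmentStream(tweepy.StreamingClient):
    """Print every incoming tweet that matches the registered rules."""
    def on_tweet(self, tweet):
        print(tweet.created_at, tweet.text)

# Placeholder credentials and rule; adjust the keywords/hashtags as needed.
stream = DerailmentStream("YOUR_BEARER_TOKEN")
stream.add_rules(tweepy.StreamRule('("Ohio train" OR #EastPalestine) lang:en'))
stream.filter(tweet_fields=["created_at", "geo"])
```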
The Harvard Center for Geographic Analysis (CGA) has utilized Twitter’s streaming API to amass geotagged tweets since 2012, thereby creating a substantial data archive [
9]. In this study, we utilized this repository and defined specific keywords and hashtags relevant to the Ohio train derailment. The keywords include “Ohio toxic”, “Ohio river”, “Ohio pollution”, “Ohio derailment”, “Ohio train”, “EastPalestine”, “Ohio disaster”, “Ohio chemical”, “Ohio Chernobyl”, “Ohio water”, and “Ohio train crash”. The hashtags include #EastPalestine, #OhioTrainDisaster, #OhioChernobyl, #ChernobylOhio, #OhioChemicalDisaster, #ohiotoxiccloud, #TrainDerailment, #OhioCoverup, #EastPalestineDisaster, #PalestineOhio, #OhioPalestine, #Ohiotraincrash, and #TrainDisaster. In addition, we selected only tweets posted in English. Having established a study timeframe from 1 February to 28 February 2023, we gathered a total of 10,050 geotagged tweets posted in the U.S. We followed three main steps to identify and remove promotional and advertisement tweets. The first step was to build identification criteria. We defined promotional or advertisement tweets as those primarily intended to market a product, service, or event. This includes tweets containing sales offers, discounts, promotional codes, or direct calls to action such as ‘buy now’, ‘subscribe’, ‘promotion’, or ‘visit our store’. The second step was manual review. After selecting tweets based on the above criteria using keyword searches, the team reviewed the content of the tweets to ascertain whether they were promotional or advertisements in nature. The third step was validation. In addition to applying the criteria, the team members went through the removed and remaining tweets and validated the accuracy of the decisions. Following this rigorous process, we removed a total of 290 tweets identified as promotional or advertisements, resulting in a final dataset of 9760 geotagged tweets for our study.
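A minimal sketch of the first, keyword-based screening step is shown below; the pattern list and the tweet 'text' field are illustrative assumptions, and the manual review and validation steps described above remain human tasks.

```python
import re

# Illustrative promotional markers drawn from the criteria described above.
PROMO_PATTERNS = [r'\bbuy now\b', r'\bsubscribe\b', r'\bpromotion\b',
                  r'\bpromo code\b', r'\bdiscount\b', r'\bvisit our store\b']
promo_re = re.compile('|'.join(PROMO_PATTERNS), flags=re.IGNORECASE)

def screen_tweets(tweets):
    """Split tweets into candidate-promotional and retained sets for manual review."""
    flagged = [t for t in tweets if promo_re.search(t['text'])]
    retained = [t for t in tweets if not promo_re.search(t['text'])]
    return flagged, retained
```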
2.4. Time-Series Clustering
We first performed time-series clustering at the U.S. state level using Google Trends data as introduced in
Section 2.2. Given the limited time series duration incorporated in our research, we opted for an unsupervised approach to this task. We deployed the Self-Organizing Map (SOM), an unsupervised learning algorithm originating from artificial neural networks, specifically crafted for high-dimensional data visualization and clustering tasks. The broad applicability of the SOM stems from its inherent flexibility, topological preservation, scalability, and concurrent dimensionality reduction and clustering capabilities, which have warranted its widespread usage across diverse domains [
20,
21,
22]. By feeding the state-level Google Trends time series data into the SOM, we managed to optimize it via the quantization error [
23]. This process resulted in a 1 × 3 neuron setting, which corresponds to three optimized time-series clusters. We set the learning rate to 0.16 and the sigma to 0.7. Our input time series ranged from 4 February to 28 February, thereby yielding a 25-dimensional vector for each state, i.e., $\mathbf{x} = (x_1, x_2, \ldots, x_{25})$. We derived a normalized vector $\hat{\mathbf{x}} = (\hat{x}_1, \hat{x}_2, \ldots, \hat{x}_{25})$, where each component $\hat{x}_i$ is rescaled to the $[0, 1]$ interval before training.
The training procedure for the SOM is further elaborated upon as follows:
- (1) Choose random values for the initial weight vectors $\mathbf{w}_j$.
- (2) Draw a sample training input vector $\mathbf{x}$ from the input space.
- (3) Find the winning neuron $c$ that has the weight vector closest to the input vector, i.e., $c = \arg\min_j \lVert \mathbf{x} - \mathbf{w}_j \rVert$.
- (4) Apply the weight update equation $\mathbf{w}_j(t+1) = \mathbf{w}_j(t) + \eta(t)\, h_{cj}(t)\, \big(\mathbf{x} - \mathbf{w}_j(t)\big)$, where $\eta(t)$ is the learning rate and $h_{cj}(t)$ is a Gaussian neighborhood function centered on the winning neuron.
- (5) Return to the second step until the feature map stops changing.
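A minimal sketch of this clustering step using the open-source MiniSom library is shown below; it follows the stated settings (1 × 3 grid, learning rate 0.16, sigma 0.7) but is not necessarily the exact implementation used in our pipeline, and the input array is a placeholder.

```python
import numpy as np
from minisom import MiniSom

# Placeholder input: one normalized 25-day Google Trends series per state (rows).
series = np.random.rand(50, 25)

som = MiniSom(x=1, y=3, input_len=25, sigma=0.7, learning_rate=0.16,
              neighborhood_function='gaussian', random_seed=42)
som.random_weights_init(series)
som.train_random(series, num_iteration=1000)

# Each state is assigned to the neuron (cluster) whose weight vector is closest.
state_clusters = [som.winner(s) for s in series]
print('quantization error:', som.quantization_error(series))
```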
2.5. Sentiment Analysis
The tool chosen for sentiment analysis in this study is VADER (Valence Aware Dictionary and Sentiment Reasoner), a lexicon and rule-based sentiment analysis tool specifically designed to work well with social media data, including Twitter [
24]. Its primary function is to measure the polarity (whether the sentiment is positive or negative) and intensity (the strength of the sentiment) of emotions in text. The VADER scoring system provides not only the Positivity and Negativity scores but also a Compound score that conveys the overall sentiment of a text. This Compound score ranges from −1 to 1, where a score less than −0.05 indicates a negative sentiment, a score more than 0.05 signifies a positive sentiment, and scores between these thresholds are regarded as neutral. One of VADER’s strengths is its adeptness at analyzing short, informal texts such as tweets, which often include abbreviations, slang, and emojis. It even considers the impact of text attributes like capitalization and exclamation points on sentiment. Therefore, in this study, VADER’s ability to provide context-specific sentiment scores enables us to capture the nuanced dynamics of sentiment on social media with a high degree of accuracy.
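For illustration, the snippet below is a minimal sketch of how a tweet can be scored with VADER and labeled using the thresholds described above; the example tweet text is invented.

```python
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()
tweet = "Health concerns grow in East Palestine after the train derailment!"  # invented example
scores = analyzer.polarity_scores(tweet)  # returns 'neg', 'neu', 'pos', and 'compound'

# Apply the Compound-score thresholds used in this study.
if scores['compound'] > 0.05:
    label = 'positive'
elif scores['compound'] < -0.05:
    label = 'negative'
else:
    label = 'neutral'
print(scores, label)
```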
2.6. Topic Modeling
Latent Dirichlet Allocation (LDA) serves as a valuable tool for topic modeling, enabling the identification of underlying topics within tweets. It is particularly well-suited for processing large quantities of unstructured data, such as the text collected from Twitter feeds. The first step involves preprocessing the Twitter data, including tokenization, which splits the text into individual words, the removal of stop words (i.e., frequently used words such as ‘is’, ‘the’, etc., that offer limited informative value), and stemming or lemmatization, which reduces words to their root form. Specific to this study, we also excluded words like ‘ohio’, ‘train’, ‘derailment’, and ‘palestine’ from the text related to the train derailment event to focus on meaningful content.
Once preprocessing was complete, we applied the LDA model. The underlying premise of LDA is that each tweet is a mixture of a certain number of topics, and every word within a tweet belongs to one of those topics. Based on word distributions across tweets, LDA identifies these topics. The model initially assigns each word in a tweet to a random topic and then iteratively reassigns words to topics. This reassignment is grounded in the probability of a word belonging to a topic and of the tweet containing that topic. After multiple iterations, a steady state emerges wherein words are allocated to particular topics. These topics manifest as collections of keywords, and by studying these keywords, researchers can deduce the theme of each topic. For this study, we employed Gensim, a Python library, to implement the LDA model for Twitter data analysis.
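A minimal sketch of this workflow with Gensim is shown below; the `tweets` list, the number of topics, and the training passes are illustrative assumptions rather than the exact settings used in our analysis.

```python
import gensim
from gensim import corpora
from gensim.utils import simple_preprocess
from gensim.parsing.preprocessing import STOPWORDS

# Event-specific terms removed in addition to standard stop words, as described above.
custom_stops = STOPWORDS.union({'ohio', 'train', 'derailment', 'palestine'})

def preprocess(text):
    """Tokenize, lowercase, and drop stop words and event-specific terms."""
    return [tok for tok in simple_preprocess(text) if tok not in custom_stops]

tweets = ["Example tweet text about the derailment cleanup"]  # placeholder input
docs = [preprocess(t) for t in tweets]
dictionary = corpora.Dictionary(docs)
corpus = [dictionary.doc2bow(d) for d in docs]

# num_topics and passes are illustrative choices, not the study's tuned values.
lda = gensim.models.LdaModel(corpus=corpus, id2word=dictionary,
                             num_topics=5, passes=10, random_state=42)
for topic_id, keywords in lda.print_topics(num_words=10):
    print(topic_id, keywords)
```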
4. Discussion
In this research, we delved into an analysis of public sentiment and opinions concerning the Ohio train derailment event, leveraging both Twitter data and Google Trends data. This multidimensional approach facilitated a comprehensive understanding of the spatial and temporal trends related to this event. Emergency management typically encompasses four phases: mitigation, preparedness, emergency response, and recovery [
27]. However, unlike many natural disasters, some disruptive events (e.g., train derailments) are unpredictable. As a result, the preparedness phase is not included in this work. Our analysis and results provide crucial insights into how public discourse evolved, from immediate reactions to long-term reflections, and how geographical nuances influenced this discourse. As depicted in
Figure 1 and
Figure 4, both Twitter and Google Trends data reveal a consistent pattern of public concerns, showing an inverted U-shaped curve. This clearly delineates the three phases of the event: impact, investigation, and recovery. Public sentiments extracted from Twitter data further corroborate the three distinct phases of the disaster. Therefore, social media platforms and search engines can play an important role in identifying different phases of emergencies. By tracking temporal trends of public awareness, policymakers can develop or adjust policies related to disaster management. For example, there was a dramatic increase in the number of tweets and Google queries from Phase 1 to Phase 2, while sentiment scores continuously decreased. Some typical tweets regarding the event include ‘
That #trainderailment in #Ohio is giving everyone the perfect reason to flee; like a Cat 5 Hurricane about to hit’, ‘
Ummmm what??? WTF are they trying to hide??? #OhioTrainCrash’, and ‘
Explain this #Ohio Governor @MikeDeWine! Are you still using #Trump tactics to censor journalists from news media that don’t align with your political party?’ The temporal pattern and the negative content of the posted tweets suggest that local authorities or the governor need to pay more attention to disclosing the truth to the public through various means, ensuring that the government maintains public trust. In addition, understanding how public awareness changes over time can provide insights into behavior patterns. This can be important for predicting how people might respond to future emergencies and for developing strategies to encourage desired behaviors, such as evacuating and building shelters.
This study further identified public opinions towards the event via text mining, offering valuable insights into the concerns, beliefs, and priorities of the affected community. This can guide policymakers and emergency responders toward evidence-based decision-making that is in line with the needs and preferences of the public. Understanding opinions can further help authorities tailor their communication strategies to address prevailing misconceptions, fears, and rumors. In this case study, the analysis of Twitter data in Phase 2 revealed a prevalent distrust towards public officials, primarily due to the observed discrepancies between state officials’ recommendations and those of the Ohio EPA. In light of this, it is imperative for local governments to provide clear and concise clarifications to rebuild trust. More importantly, public opinions shed light on the long-term challenges and needs of affected communities. For instance, East Palestine residents are concerned that pollution addressed in the short term may still pose long-term impacts. Such concerns underscore the need for policymakers to implement effective strategies, including routine air and water quality assessments, ensuring access to safe drinking water, and more. Oladeji et al. conducted mobile air quality sampling on 20–21 February to complement initial data from the EPA stationary air monitor [
28]. The levels of acrolein were high relative to those of other volatile organic compounds, while the average concentrations of xylenes, benzene, vinyl chloride, and toluene were below minimal risk levels for intermediate and chronic exposure. These proactive measures can steer recovery and rebuilding initiatives in a direction that ensures sustainability and aligns with community needs and concerns.
In addition to ongoing environmental monitoring, there is a pressing need to systematically track and assess the health outcomes of local residents to ensure their long-term well-being. This approach is not just about immediate responses but also about understanding the prolonged impact of the incident. The growing public concern is evident in social media, as highlighted by Twitter users’ posts: ‘
Yikes! #Health concerns grow in East Palestine, Ohio, after train derailment’, and ‘
Everybody in East Palestine needs to (1) get a current health check up including labs and blood work (2) get a lawyer. (3) sign nothing.’. This sentiment echoes the worries of a local East Palestine resident, as revealed in an interview: ‘
I’m worried about the health and safety of my family and the residents of East Palestine. What’s our chronic health outlook going to be?’ [
29]. These statements underscore the community’s anxiety and the urgent need for reliable health information. To address these concerns, the expertise of epidemiologists and community health professionals becomes invaluable [
30]. They are equipped to set up a comprehensive health registry, which would play a crucial role in tracking health outcomes over an extended period. Additionally, conducting thorough and ongoing chemical exposure assessments, as recommended by the CDC, is vital. Moreover, establishing a strong communication channel between health professionals, local authorities, and the community is essential. Regular updates, transparent sharing of information, and community engagement in health monitoring efforts can help alleviate fears and build trust. It also ensures that the community’s concerns are heard and addressed promptly, fostering a collaborative approach to managing this health crisis.
However, there are several limitations to be addressed in this study. First, both Twitter and Google Trends users represent a subset of the population, not the entire population, which could potentially bias the interpretation of the data. Second, Twitter feeds often contain a significant amount of noise, including irrelevant posts, spam, or misinformation, which could pose challenges in data processing and analysis. While some tools (e.g., tweetbotornot) can be used to filter out noise, they may not be fully effective. In addition to concerns about data quality, Twitter terminated free access to its APIs in 2023 and now requires a monthly fee. This change has made data collection challenging and costly for academic researchers. Other existing social media platforms, such as Facebook and Instagram, could be considered as potential data sources.
In our future work, we plan to include diverse news media platforms such as the Global Database of Events, Language, and Tone (GDELT) (
https://www.gdeltproject.org/, accessed on 31 December 2024) and Television Archive (
https://archive.org/details/tvarchive, accessed on 31 December 2024). The GDELT Project monitors news from nearly every corner of the world in over 100 languages and identifies the people, locations, themes, emotions, and events in the news in near real time. It can serve as an extensive source of global news data that complements local and regional data. Television archives, on the other hand, offer unique access to mainstream narratives and public discourse. By integrating data from these sources, researchers can gain a holistic view of the information landscape during disruptive events. This will not only enhance the accuracy of the sentiment analysis but also broaden the spectrum of the response analysis, ensuring a more inclusive, diversified, and well-rounded assessment. However, with this expansion in data sources, it will be crucial to develop robust methods for data cleaning, integration, and analysis to handle the increased complexity and volume of data. Finally, this study is short-term and focuses specifically on variations in public perceptions over a brief period; it is essential for future research to investigate the long-term impact on health outcomes within affected communities.