Identifying Public Perceptions toward Emerging Transportation Trends through Social Media-Based Interactions

The objective of this study is to mine and analyze large-scale social media data (rich spatiotemporal data, unlike traditional surveys) and develop comparative infographics of emerging transportation trends and mobility indicators by adopting natural language processing and data-driven techniques. To this end, around 13 million tweets from North America were first collected over about 20 days (16 December 2019–4 January 2020), and tweets closely aligned with emerging transportation and mobility trends (such as shared mobility, vehicle technology, built environment, user fees, telecommuting, and e-commerce) were identified. Data analytics captured spatio-temporal differences in social media user interactions and concerns about such trends, as well as the topics of discussion formed through such interactions. California, Florida, Georgia, Illinois, and New York are among the most visible locations discussing such trends. Although sentiment was positive overall, people held more positive views on shared mobility, vehicle technology, telecommuting, and e-commerce, and more negative views on user fees and the built environment. Ride-hailing, fuel efficiency, trip navigation, daily as well as shopping and recreational activities, gas price, tax, and product delivery were among the emergent topics. The social media data-driven framework would allow real-time monitoring of transportation trends by agencies, researchers, and professionals.


Introduction and Motivation
With the rapid expansion of modern technologies, the wide availability of spatial data and smartphone apps, and emerging transportation options, the landscape of transportation demand and supply is changing. The increased availability of technology-enabled transportation options (e.g., ridesharing) and modern communication devices (smartphones, in particular) is transforming travel-related decision-making differently across the population and at different levels. These complex dynamics of emerging mobility behaviors are also expected to be influenced by individual lifestyles and different social (e.g., education), economic (e.g., employment), and demographic (e.g., gender) factors. The National Cooperative Highway Research Program (NCHRP) explores the effect of these factors on travel demand [1][2][3]. Several studies reflect the dynamics of mobility patterns influenced by the built environment [4][5][6][7]. Moreover, numerous studies have explored the impact of different socioeconomic factors, such as age, gender, and income level, on travel behavior [8][9][10].
The main sources of data used in the abovementioned studies are surveys (e.g., travel surveys), which feature representative populations and detailed information about travel mode and trip purposes. Survey data have some limitations, such as variability across locations in data collection method and data availability, a lack of real-time engagement of respondents, and high cost and time requirements, as trend analysis requires periodic data collection. So, the real-time spatiotemporal monitoring of people's travel [...]
1. Identify spatio-temporal characteristics of relevant social media interactions on shared mobility, vehicle technology, built environment, user fees, e-commerce, and telecommuting, which can give an understanding of the spatial and temporal distribution of the relevant tweets describing the emerging transportation trends;
2. Measure public sentiments and perceptions on emerging transportation trends through natural language processing such as sentiment analysis, which allows the classification of tweets based on sentiment scores (highly positive, positive, neutral, negative, and highly negative);
3. Explore spatio-temporal differences of user sentiments by classifying sentiment scores on transportation and mobility indicators, which can reveal the spatial and temporal distribution of tweets with respect to their sentiment direction;
4. Extract emerging transportation topics and user concerns from social media interactions through Latent Dirichlet Allocation (LDA), a machine learning approach that identifies patterns in the filtered relevant tweets to recognize the emerging transportation trends.

Background and Related Work
Though social media platforms (SMPs) are a relatively new field of research, researchers have used them in applications such as travel demand forecasting, mobility pattern identification, disaster management, mass transit evaluation, and traffic incident management.
There are several studies where SMPs have been used to forecast travel demand. Golder and Macy [12] and Yin et al. [13] investigated the capacity, scope, and application of various SMPs to derive information on household daily travel. Tasse and Hong discussed the opportunities of using geotagged social media data instead of traditional survey data to understand people's mobility patterns, the average distance traveled, and the overall spatial distribution of urban areas [14]. Liao et al. [15] developed a novel approach using geotagged tweets which demonstrated Twitter's potential for estimating travel demand through careful consideration of sampling method, estimation model, and sample size.
SMPs have been applied to understand mass human mobility patterns, too. These studies have established Location-based Social Networking (LBSN) data as a strong proxy not only for tracking and predicting human movement, identifying mobility patterns, and recognizing various geographic and economic factors that affect human mobility patterns at aggregate levels across different geographical scales [16][17][18], but also for modeling user activity patterns. Hasan and Ukkusuri presented a novel approach to understand urban human activity and mobility patterns using large-scale location-based data characterizing temporal and spatial aspects of the mobility and activity patterns [19,20]. Üsküplü et al. [21] discovered and analyzed the activity patterns of parts of the historical districts of Istanbul by evaluating the data generated from location-based social networks.
Opinion mining has been performed in a few studies to show people's attitudes towards public transit, which can affect the way stakeholders think about future transit investments [22]. Pender et al. [23] applied crowdsourcing techniques to derive transit service information that can satisfy the increased demand and expectation for real-time information dissemination. Luong and Houston [24] also used social media data to study public attitudes about light rail transit services in Los Angeles. Haghighi et al. [25] proposed a framework to examine transit riders' opinions about the quality of transit service in the region of Salt Lake City using Twitter data.
Recent studies have also extracted traffic data from social media for transportation network operation and management purposes. Tian et al. [26] validated traffic incidents posted on social media by checking camera footage data and found that tweets about severe incidents tended to be more accurate. Steur [27] showed the correlation between accidents and the frequency of tweets near the incident locations. Several studies also show the potential of SMPs in disaster management. Wang and Taylor [28,29] used Twitter data to study the perturbation and resilience of human mobility patterns during and after tropical storms and confirmed the correlation of daily human trajectories between the steady state and the perturbation state, as well as the high inherent resiliency of human mobility. Researchers have also focused on detecting effective social media users and explored their network features to understand the spread of targeted information in major disasters [30,31]. A few recent studies showed the potential of Twitter data as a traffic predictor [32][33][34]. This made Twitter a promising source for real-time traffic management, potentially extensible to traffic prediction at any time of day.
In summary, SMPs have been utilized to retrieve relevant information for demand prediction, pattern recognition, transit evaluation, incident management, and disaster management. No study has used SMPs to infer public opinions and perceptions toward emerging transportation trends on spatio-temporal platforms. As such, this pilot study presents a comprehensive approach to exploring how SMPs (Twitter) can be used to understand public perceptions and attitudes towards emerging transportation and mobility trends using natural language processing and data-driven methods.

Data Collection and Preparation
The research team created a Twitter Developer Account using Twitter Apps [35] to retrieve data through the Twitter Streaming API (Application Programming Interface). The data were collected using the Python programming language and associated Python libraries. The focus of this study is English geotagged tweets, as tweet geographic information is a potential parameter for spatio-temporal analysis; the location-based data collection method produced a more suitable and reliable dataset that serves the goal of the study. As a result, tweets from North America and its surrounding area (as most of the people in this region speak English) were collected using a location-bounding box for around 20 days (16 December 2019-4 January 2020), which covered the USA, Canada, Mexico, Cuba, Puerto Rico, and parts of Guatemala and Greenland (Figure 1). The raw data contain approximately 12.9 million tweets. Approximately 100% of tweets are geotagged and mostly in English (about 77%), with around 0.97 million unique users. Tweets retrieved from the streaming API contain additional information such as user id, profile information, coordinates of the tweeting location, and creation time along with the tweet text. Only tweet texts, tweet creation time, and location information were considered for analysis in this study.
Given the inherent ambiguity of tweets (e.g., non-standard spelling, inconsistent punctuation, and/or capitalization), additional preprocessing steps were performed to extract clean tweet text suitable for analysis. Noise (such as HTML tags, character codes, emojis, stop words, etc.) was removed from the text data, and tweets were tokenized. Tokenization is the process of breaking down an expression, sentence, paragraph, or even an entire text document into smaller units such as individual words or phrases. Each of these smaller units is called a token.
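The cleaning and tokenization steps described above can be illustrated with a minimal sketch. It is not the study's actual pipeline: the function name `clean_and_tokenize`, the tiny `STOP_WORDS` set, the URL removal step, and the choice to drop all non-ASCII characters as a stand-in for emoji removal are assumptions made for illustration.

```python
import html
import re

# Hypothetical, deliberately tiny stop-word list; the study's list is not given.
STOP_WORDS = {"a", "an", "the", "to", "of", "and", "is", "in", "on", "for"}


def clean_and_tokenize(text):
    """Return lowercase word tokens from a raw tweet string."""
    text = html.unescape(text)                 # decode character codes such as &amp;
    text = re.sub(r"<[^>]+>", " ", text)       # drop HTML tags
    text = re.sub(r"https?://\S+", " ", text)  # drop URLs (an added assumption)
    # Drop emojis and any other non-ASCII characters (a blunt assumption).
    text = text.encode("ascii", "ignore").decode("ascii")
    # Tokens are runs of letters/digits, optionally joined by "-" or "'".
    tokens = re.findall(r"[a-z0-9]+(?:[-'][a-z0-9]+)*", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


print(clean_and_tokenize("<b>Took</b> an Uber to the airport &amp; loved it"))
```

Keeping hyphenated tokens such as "self-driving" intact is a design choice here, so that hyphenated keywords can still be matched after tokenization.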
The purpose of this study is to understand public opinion and identify emerging transportation trends using Twitter data. For this purpose, the relevant tweets regarding emerging transportation trends need to be crawled systematically, which is one of the most important tasks in this study. During tweet crawling, the following six major categories of emerging mobility trends were kept in focus.
1. Shared Mobility: shared, mobility, carpool, car, uber, lyft, etc.;
2. Vehicle Technology: autonomous, automated, self-driving, connect, connected, etc.;
To facilitate the relevant tweet crawling, researchers searched for different emerging transportation related words (e.g., uber, e-scooter, transit, e-hail) on Twitter to check the availability of emerging transportation related tweets. Finally, a total of 205 keywords in the six major categories were identified as relevant to emerging mobility trends. If a tweet contained at least one of the 205 keywords, it was considered relevant to this study. Although this approach may filter out some relevant tweets, it ensures that all tweets involving these keywords were included in the filtered dataset for further analysis. After filtering the dataset, a total of 1.25 million relevant English tweets on emerging mobility trends (9.68% of the total tweets) were obtained for this study. Table 1 presents the keywords used to filter relevant tweets and the total relevant tweet count for each category. The percentage value represents the percentage of tweets that contained the specific keywords with respect to the whole dataset.
Future Transp. 2021, 1 798
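The "at least one keyword" relevance rule can be sketched as follows. Only a handful of the 205 keywords appear in the text, so the `KEYWORDS` dictionary below is a small illustrative subset, and whole-word, case-insensitive matching is an assumption (the study does not state whether substrings such as "uber" inside "uberpool" counted as a match).

```python
import re

# Illustrative subset of the keyword list; the full 205-keyword list is in Table 1.
KEYWORDS = {
    "shared_mobility": ["carpool", "uber", "lyft"],
    "vehicle_technology": ["autonomous", "automated", "self-driving", "connected"],
}


def compile_patterns(keywords):
    """Build one case-insensitive whole-word regex per category."""
    patterns = {}
    for category, words in keywords.items():
        alternatives = "|".join(re.escape(w) for w in words)
        # Lookarounds stop matches inside longer words or hyphenated compounds.
        patterns[category] = re.compile(
            r"(?<![\w-])(?:" + alternatives + r")(?![\w-])", re.IGNORECASE
        )
    return patterns


def matched_categories(text, patterns):
    """Return the categories whose keywords occur in the tweet text."""
    return [cat for cat, pat in patterns.items() if pat.search(text)]


PATTERNS = compile_patterns(KEYWORDS)
print(matched_categories("Self-driving cars will replace my Lyft rides", PATTERNS))
```

A tweet is treated as relevant when `matched_categories` returns a non-empty list; a tweet may fall into more than one category.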

Spatial and Temporal Analysis
Twitter allows users to share the location from which a tweet was posted; this location is a confined area generated automatically with the tweet if location sharing on the user's device remains enabled. Geolocational information and timestamps of tweets were extracted from the 'place' and 'created_at' fields, respectively. Temporal or time series analysis is one of the best techniques to understand the internal patterns (trends, temporal variation) within data over time. Heatmaps were produced to represent the correlation between the most frequently used words in relevant tweets and the dates when they were tweeted. These illustrate the daily variation of popular words that have been tweeted, which provides insight into the temporal variation of the most popular and unpopular trends over time. Another type of heatmap, plotting the inter-relationship between the most frequently used words and tweet location, was also created. It is a very efficient way to understand the spatial variation of the popularity of transportation trends. For this reason, geotagged tweets were considered as a source to improve situational awareness and the understanding of real-world transportation trends.
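As a rough illustration of how the 'place' and 'created_at' fields can feed a location-by-date heatmap, the sketch below counts tweets per (place, day). The `created_at` format string matches the timestamp style of the Twitter v1.1 tweet object; using `place["full_name"]` as the location label, the function name, and the sample records are assumptions, and days are taken in the timestamp's own time zone (UTC in the samples).

```python
from collections import Counter
from datetime import datetime

# Timestamp layout of the v1.1 tweet object, e.g. "Mon Dec 16 08:15:00 +0000 2019".
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def daily_counts_by_place(tweets):
    """Count tweets per (place name, ISO date); skips tweets without a place."""
    counts = Counter()
    for tweet in tweets:
        place = tweet.get("place")
        if not place:
            continue
        day = datetime.strptime(tweet["created_at"], CREATED_AT_FORMAT).date()
        counts[(place["full_name"], day.isoformat())] += 1
    return counts


# Made-up sample records, for illustration only.
sample = [
    {"created_at": "Mon Dec 16 08:15:00 +0000 2019", "place": {"full_name": "Florida, USA"}},
    {"created_at": "Mon Dec 16 21:40:10 +0000 2019", "place": {"full_name": "Florida, USA"}},
    {"created_at": "Tue Dec 17 02:05:33 +0000 2019", "place": {"full_name": "Texas, USA"}},
    {"created_at": "Tue Dec 17 03:00:00 +0000 2019", "place": None},
]
counts = daily_counts_by_place(sample)
```

The resulting counter maps directly onto the cells of a two-dimensional heatmap, with places on one axis and dates on the other.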

Sentiment Ratings
Sentiment analysis, or opinion mining, is the computational study of opinions, sentiments, and emotions. It tries to infer people's sentiments from the language expressed in a text. It usually uses a sentiment lexicon to provide sentiment scores on the generated corpus (a textual body clustered by required class or cluster). The analysis focuses on individual sentences to determine whether a sentence expresses an opinion or not (often called subjectivity classification), and if so, whether the opinion is positive or negative (called sentence-level sentiment classification). Let an opinionated document (tweet) be $w$, which expresses an opinion on a subject or a group of subjects. Generally, $w = (w_1, w_2, \ldots, w_i, \ldots, w_n)$, where $w_i$ is a word or sentence. An opinion passage on a feature $f$ of an object $o$ evaluated in $w$ is a group of consecutive sentences in $w$ that expresses a positive or negative opinion on $f$. Additionally, sentiments also contain subjectivity. A subjective sentence expresses some personal feelings or beliefs. Sentence-level sentiment classification involves two definite tasks with a single assumption [36]. These are stated below:
• Task: Given a tweet $w$, two subtasks are performed:
1. Subjectivity classification: Determine whether $w$ is a subjective sentence or an objective sentence;
2. Sentence-level sentiment classification: If $w$ is subjective, determine whether it expresses a positive or negative opinion.
• Assumption: The tweet $w$ expresses a single opinion from a single opinion holder.
In this study, we used a Python package called VADER [37], which detects the sentiment value of a short text, for analyzing the sentiments of relevant tweets about emerging transportation trends. Using a pre-defined list of words, VADER assigns a final compound score to each input text, which is the sum of the lexicon ratings of its words, standardized to range between −1 and 1 [38]. VADER accounts for the slang and informal writing currently in frequent use (multiple punctuation marks, acronyms, and emoticons) to express how a person is feeling, which makes VADER well suited to social media text.
People express their attitudes differently on different topics on Twitter. Some show neutral views on certain topics but express dissatisfaction on other issues. Similarly, on a specific issue, some people may give just satisfactory remarks, and some may express their highest satisfaction. In the case of the transportation sector, this variability of people's attitudes towards different topics (e.g., a transportation facility or trend) is a goldmine for transportation stakeholders (e.g., researchers, professionals) seeking to assess public satisfaction with and acceptance of transportation facilities. It can reveal valuable insights about a certain transportation facility or trend that will help transportation authorities make better decisions. To capture these different levels of attitude (highly negative, negative, neutral, positive, and highly positive), new score intervals have been introduced within the standardized compound sentiment score range from −1 to 1.
As there is no systematic approach in the literature to identify different sentiment categories, equal intervals of the standardized compound sentiment score between −1 and 1 were used to define sentiment classes. To decide on ranges for the highly negative, negative, neutral, positive, and highly positive categories, a heatmap of the sentiment scores was produced and used to gauge roughly where scores were landing: −1 to −0.6 (highly negative), −0.6 to −0.2 (negative), −0.2 to 0.2 (neutral), 0.2 to 0.6 (positive), and 0.6 to 1.0 (highly positive). These intervals were ultimately set as the bounds for the five categories. Some real tweets from this study are presented here as examples to demonstrate the categories:
1. "thank you for creating vision for sustainability and leading the way not only electric cars but also solar autonomous software energy storage among other accomplishments im looking forward seeing what you and your team create"-Highly Positive (Score: 0.7992);
4. "They'd stop fighting long enough maybe we'd all have autonomous self-driving cars the road now"-Negative (Score: −0.296);

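The five-class binning of compound scores described in this section can be sketched as a small function. The interval bounds are those stated in the text; how a score falling exactly on a boundary is assigned is not specified in the study, so the choice below (boundary values go to the class nearer neutral) is an assumption, as is the function name. In practice the compound score would come from VADER, for example from the "compound" entry returned by `SentimentIntensityAnalyzer().polarity_scores(text)` in the `vaderSentiment` package; that call is not executed here so the sketch stays dependency-free.

```python
def classify_compound(score):
    """Map a VADER compound score in [-1, 1] to one of five sentiment classes."""
    if not -1.0 <= score <= 1.0:
        raise ValueError("compound score must lie in [-1, 1]")
    # Boundary handling is an assumption; the study only gives the intervals.
    if score < -0.6:
        return "highly negative"
    if score < -0.2:
        return "negative"
    if score <= 0.2:
        return "neutral"
    if score <= 0.6:
        return "positive"
    return "highly positive"


# The two example tweets quoted above, using their reported scores.
print(classify_compound(0.7992))   # highly positive
print(classify_compound(-0.296))   # negative
```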
Topic Mining
To identify patterns in the filtered tweets and recognize the emerging transportation trends, Latent Dirichlet Allocation (LDA), a topic modeling approach [39], is applied in this study. Topic modeling is a machine learning technique that analyzes text data automatically to cluster terms for a series of documents. LDA uses a probabilistic latent semantic analysis model to recognize the patterns of the posted tweets. Though topic models have been widely used in machine learning, they have only recently been applied in transportation studies [20,40,41].
LDA adopts a probabilistic procedure for generating documents (tweets), which starts with choosing a distribution $\psi_k$ over words in the vocabulary for each topic $k$ ($k \in [1, K]$) (Steyvers and Griffiths 2007). Here, $\psi_k$ is selected from a Dirichlet distribution $\mathrm{Dirichlet}_v(\beta)$. After that, another distribution $\theta_d$ over $K$ topics is sampled from a different Dirichlet distribution $\mathrm{Dirichlet}_k(\alpha)$ to generate a document $d$ (a set of words $w_d$). Thus, a topic is assigned for each word in $w_d$, followed by selecting each word $w_{di}$ based on $\theta_d$.
For LDA, initial sampling is done on a particular topic $z_{di} \in [1, K]$ from a multinomial distribution $\mathrm{Multinomial}_k(\theta_d)$ in generating each word $w_{di}$. Finally, the word $w_{di}$ is selected from the multinomial distribution $\mathrm{Multinomial}_v(\psi_{z_{di}})$. Figure 2 shows the graphical representation of LDA by Sun and Yin [42]. The inference of LDA models can be done by applying the variational expectation-maximization (VEM) algorithm [39] or through Gibbs sampling [43]. The posteriors of the document-topic distribution $\theta_d$ and the topic-word distribution $\psi$ can be efficiently inferred by both methods, which allows us to discover the latent thematic structure from a large collection of documents [42].
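The generative procedure above can be made concrete with a short simulation. This sketch only generates synthetic documents from the LDA model; it does not perform inference (VEM or Gibbs sampling) and is not the study's code. Symmetric Dirichlet priors with scalar `alpha` and `beta`, fixed document lengths, the toy vocabulary, and all function names are assumptions for illustration.

```python
import random


def sample_dirichlet(concentration, size, rng):
    """Draw from a symmetric Dirichlet by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]


def generate_corpus(vocab, n_topics, n_docs, doc_len, alpha, beta, seed=0):
    """Simulate documents from the LDA generative process."""
    rng = random.Random(seed)
    # psi_k ~ Dirichlet_v(beta): one word distribution per topic k.
    psi = [sample_dirichlet(beta, len(vocab), rng) for _ in range(n_topics)]
    docs = []
    for _ in range(n_docs):
        # theta_d ~ Dirichlet_k(alpha): topic mixture of document d.
        theta = sample_dirichlet(alpha, n_topics, rng)
        words = []
        for _ in range(doc_len):
            # z_di ~ Multinomial_k(theta_d): topic of the i-th word.
            z = rng.choices(range(n_topics), weights=theta)[0]
            # w_di ~ Multinomial_v(psi_{z_di}): the word itself.
            words.append(rng.choices(vocab, weights=psi[z])[0])
        docs.append(words)
    return psi, docs


vocab = ["uber", "ride", "driver", "gas", "tax", "toll"]
psi, docs = generate_corpus(vocab, n_topics=2, n_docs=3, doc_len=8, alpha=0.5, beta=0.5)
for doc in docs:
    print(" ".join(doc))
```

Inference runs this process in reverse: given only the observed words, it estimates the $\theta_d$ and $\psi_k$ that most plausibly generated them.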
The key steps involved in the data analysis for the Tweet data are summarized in Figure 3.

Results and Discussion
After data processing and cleaning, a total number of 1.25 million relevant English tweets were obtained for further analysis. Figure 4 presents the main components and characteristics of the dataset. There are mostly three kinds of location information that can be extracted from a tweet:

Spatio-Temporal Heatmaps of Tweets
Spatio-temporal distribution of tweeting activities can broaden the understanding of the credibility and representativeness of the datasets being used for the analyses over space and time. Because statistically significant differences between categories across places are difficult to measure mathematically, visual inspection was adopted, and almost identical spatio-temporal distribution patterns were observed across all categories, i.e., shared mobility, vehicle technology, built environment, user fees, telecommuting, and e-commerce (Figure 5). Figure 5 is a two-dimensional representation of tweeting activities based on tweet originating dates and the 50 most frequent locations (state-level). Places such as California, Florida, Georgia, Illinois, New York, North Carolina, Ohio, Pennsylvania, Texas, Virginia, and Washington were among the most frequent locations and generated around 10 thousand tweets daily on emerging transportation trends. People from these locations were likely to be more expressive about emerging mobility trends through social media interactions, as evident from Twitter. In contrast, places such as Alberta, Clarendon, Delaware, Hawaii, Maine, Nebraska, New Hampshire, New Mexico, Quebec, Rhode Island, Utah, and West Virginia generated as few as around 1.5 thousand tweets per day on emerging trends. Other locations that appear in Figure 5 represent moderate levels of concern among social media users (around 3-10 thousand tweets on average).
Locations that do not appear in Figure 5 were inactive with less than 100 tweets a day. These findings indicate spatial diversity of the transportation-related needs and concerns people express through social media channels and the need to utilize such information to develop new policies meeting the diverse needs people may have in different locations. Moreover, the temporal patterns for almost all locations indicate people were less expressive of such concerns during and immediately before/after a government holiday such as Christmas and New Year.

Temporal Heatmaps of Tweet Keywords
To delve deeper into the understanding of social media interactions on different categories, i.e., shared mobility, vehicle technology, built environment, user fees, telecommuting, and e-commerce, temporal heatmaps of tweet keywords were generated ( Figure 6).
The word frequencies in the heatmaps indicate that people tweeted more about user fees and e-commerce, followed by vehicle technology, telecommuting, built environment, and shared mobility. This indicates the potential to utilize such information to rank people's social media interactions and leverage social sharing platforms to promote user interests in emerging trends based on similar word clustering. A closer look at the word heatmaps by categories shows the following findings:
Shared Mobility:
• 'car', 'share', and 'ride' showed a strong presence, followed by 'uber', 'bike', 'bird', and 'shared';
• 'Uber' was more popular than 'Lyft';
• Emerging platforms such as 'vanpool', 'bikeshare', 'escooter', and 'uberpool' were found to be less frequent on Twitter;
• 'bike' and 'bicycle' showed less prominence compared to 'car'. This is indicative of the need to leverage social media for bike-sharing.
'Hybrid' and 'autonomous' showed less prominence relative to 'energy'. This is indicative of the need to leverage social media for hybrid and autonomous transport.

Sentiment Analysis
While the heatmaps of tweeting keywords showed the significance of individual keywords representing social media user concerns on transportation and mobility trends, the combined effects of multiple words in each tweet were analyzed to quantify user emotions or sentiments based on such interactions. Sentiment analyses of tweets were performed with the VADER Python package, and the corresponding user sentiments were reported as highly negative, negative, neutral, positive, and highly positive. Sentiment or opinion mining results are presented in Figures 7 and 8 for each category, i.e., shared mobility, vehicle technology, built environment, user fees, telecommuting, and e-commerce. Figure 7a shows the percentage distribution of the five sentiment types over all relevant tweets, while Figure 7b presents the percentage distribution of the five sentiment types for each of the six categories. A few key observations from Figure 8 are summarized here:
• Most of the tweets on shared mobility showed positive and highly positive views in almost all the places. However, places such as Arkansas, Clarendon, Georgia, Louisiana, and Mississippi showed some exceptions, generating a relatively higher proportion of neutral and negative tweets, which reflects people's less positive attitude towards shared mobility in these places. Necessary steps should be taken to improve the facilities in this sector of emerging mobility trends at these places;
• Though tweets on vehicle technology showed a trend similar to that of shared mobility across places, places such as Alabama, Connecticut, Delaware, Rhode Island, and West Virginia generated a relatively higher proportion of neutral and negative tweets. Authorities in these places should plan accordingly to introduce facilities in this sector of emerging mobility trends to attain people's positive attitudes;
• In the case of the built environment, though most of the tweets are positive across places, there is also a higher proportion of neutral and negative tweets in many places compared with other categories (except user fees);
• In most places, tweets on user fees are more likely to be positive, neutral, or negative. Places such as Rhode Island, Washington, and Colorado even generated a higher proportion of negative tweets than other sentiment types. Necessary steps should be taken to improve the facilities in this sector of emerging mobility trends at these places;
• Telecommuting and e-commerce showed similar patterns across places. In all places, tweets showed mostly positive and highly positive views, and there is a very small proportion of neutral, negative, and highly negative tweets. Rapid advancements in available facilities in these sectors of emerging mobility trends are probably the reasons behind this higher public satisfaction;
• Overall, most locations showed a more positive attitude towards shared mobility, vehicle technology, telecommuting, and e-commerce, and a relatively more negative attitude towards the built environment and user fees.
These findings indicate the need to design and implement more dedicated and targeted efforts to improve public satisfaction on certain transportation aspects based on quantitative evidence observed through social media interactions.
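The percentage distributions of sentiment types per category (the kind of summary described for Figure 7b) amount to a simple grouped count. The sketch below assumes each tweet has already been reduced to a (category, sentiment label) pair; the function name and the sample records are made up for illustration and do not reproduce the study's data.

```python
from collections import Counter, defaultdict

LABELS = ["highly negative", "negative", "neutral", "positive", "highly positive"]


def sentiment_shares(records):
    """Return {category: {label: percent of that category's tweets}}."""
    counts = defaultdict(Counter)
    for category, label in records:
        counts[category][label] += 1
    shares = {}
    for category, label_counts in counts.items():
        total = sum(label_counts.values())
        shares[category] = {
            label: 100.0 * label_counts[label] / total for label in LABELS
        }
    return shares


# Made-up records, for illustration only.
records = [
    ("user fees", "negative"),
    ("user fees", "negative"),
    ("user fees", "neutral"),
    ("user fees", "positive"),
    ("e-commerce", "highly positive"),
    ("e-commerce", "positive"),
]
shares = sentiment_shares(records)
```

Adding the place name to the grouping key would give the per-location breakdown needed for a spatial comparison.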

Topic Modeling
Topic modeling analysis was applied to investigate how different combinations of words in the data may constitute social interaction topics of transportation trends. While sentiment analyses helped to quantify positive, neutral, or negative attitudes of social media users, topic models typically provide more insights on the actual topics that exist in text data. Topic coherence is the average or median of the pairwise word-similarity scores of the words in a topic, and has been used to specify the number of unique topics [44]. A good topic model has high coherence, which depends on two predefined parameters: (a) the number of topics; (b) the number of iterations. The optimal number of topics and iterations was estimated after several trials. The tentatively generated topics (33 in total) for the six categories are presented in Figure 9. Among these 33 topics, some are relevant and some are irrelevant to this study. Finally, a total of 17 topics were identified as relevant to different emerging transportation trends, which are reported in Table 2.
Table 2 reports the topic modeling coherence score for each category as well as the probable interaction topics with their probability in that category. Topic probability represents the probabilistic distribution of all the tweets in a certain category of transportation trend among different topics generated by topic modeling. For example, "Ride-Hailing" has a topic probability of 0.472 in the "Shared Mobility" category. This indicates that 47.2% of tweets in "Shared Mobility" are expected to discuss ride-hailing.
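The interpretation of topic probability can be illustrated with a short sketch. It assumes that the category-level topic probability is the mean of the per-tweet topic proportions produced by the topic model; the exact aggregation used in the study is not stated here. The per-tweet proportions and the category size of 10,000 tweets are hypothetical.

```python
def category_topic_probability(doc_topic):
    """Mean topic proportion across all tweets in one category (assumed aggregation)."""
    n_tweets = len(doc_topic)
    n_topics = len(doc_topic[0])
    return [sum(row[k] for row in doc_topic) / n_tweets for k in range(n_topics)]

# Hypothetical per-tweet topic proportions: four tweets, two topics.
doc_topic = [
    [0.8, 0.2],
    [0.6, 0.4],
    [0.1, 0.9],
    [0.3, 0.7],
]
probs = category_topic_probability(doc_topic)

# Reading the reported value: with a topic probability of 0.472 and a
# hypothetical category of 10,000 tweets, the expected number of tweets
# discussing that topic is 0.472 * 10,000.
n_category_tweets = 10_000
expected_ride_hailing = round(0.472 * n_category_tweets)
```

The resulting probabilities sum to one within a category, which is why they can be read as shares of the category's tweets.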
Moreover, only the five most frequent associated words contributing to the formation of a topic with their probability in that topic (in brackets) were reported in Table 2 for illustration purposes. A topic consists of hundreds of words depending on the volume of datasets. As a result, the word probability in a certain topic for a given word may be very small. However, the reported five most frequent words in Table 2 for a certain topic have significantly higher probability values than the remaining words in that topic.
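Selecting the words reported for a topic can be sketched as follows. The word probabilities below are hypothetical and are not the values from Table 2; the helper name `top_words` is likewise an assumption.

```python
def top_words(word_probs, n=5):
    """Return the n most probable words of a topic with their probabilities."""
    ranked = sorted(word_probs.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]

# Hypothetical topic-word probabilities; a real topic would span hundreds of words,
# which is why individual probabilities are small.
ride_hailing = {
    "uber": 0.050, "driver": 0.041, "ride": 0.038, "lyft": 0.030,
    "car": 0.022, "app": 0.004, "wait": 0.003, "fare": 0.002,
}

# Format the top words with their probability in brackets.
label = ", ".join(f"{word} ({p:.3f})" for word, p in top_words(ride_hailing))
```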
People primarily mentioned ride-hailing and employment opportunities as part of shared mobility. On vehicle technology, interactions mainly included topics on fuel efficiency and trip navigation. Regular day-to-day activities are among the built environment topics, in addition to shopping and recreational activities. Under the user fees category, people were more concerned about gas prices, taxes, and expressways, along with their probable frustration towards lane blocks while driving. On telecommuting, people talked more about the holiday season and healthcare activities, while customer services related to item delivery were among the predominant topics on e-commerce. Such topics and associated words provide better insights into how to identify and connect to social media users based on their topics of interest and the use of specific keywords that can maximize influence.

Study Limitations and Future Research Directions
The results of this study suggest significant potential for using social media data to develop models for identifying emerging transportation indicators and for long-term planning. However, it is acknowledged that small events that are retweeted many times may affect the collected dataset. Future studies should address this issue to eliminate biases due to such small events. Moreover, because user privacy concerns limit the availability of personal information, there is usually insufficient information on social media users to detect biases in the sample population for any given subject. Further studies should consider users' profile backgrounds (e.g., gender, race) in the analysis to reduce sample population biases.
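One possible way for future work to limit the influence of heavily retweeted small events is to count each distinct tweet text only once. The sketch below was not applied in this study. It assumes retweets carry the conventional "RT @user:" prefix in their text, and the function name and sample tweets are hypothetical.

```python
import re
from collections import Counter

# Assumed retweet convention: the text begins with "RT @username: ".
RT_PREFIX = re.compile(r"^RT @\w+:\s*")

def collapse_retweets(texts):
    """Map each distinct tweet text (retweet prefix removed) to its number of copies."""
    return Counter(RT_PREFIX.sub("", text).strip().lower() for text in texts)

# Hypothetical tweets, for illustration only.
sample = [
    "Uber surge pricing is wild tonight",
    "RT @alice: Uber surge pricing is wild tonight",
    "RT @bob: Uber surge pricing is wild tonight",
    "Gas tax going up again",
]
counts = collapse_retweets(sample)
unique_texts = list(counts)  # one entry per distinct message
```

Keeping the copy counts alongside the unique texts lets an analysis either drop duplicates entirely or down-weight them, depending on how much weight amplification should carry.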
Twitter users include people, news organizations, and companies, and, perhaps most troublingly, they are not always human. Previous research has shown that Twitter includes many bots that automatically send tweets, mostly to promote a product or a political campaign [45]. Such tweets were not removed in this study, although several methods for detecting them have been proposed [46][47][48]. Future studies should therefore pay special attention to the biases associated with social media data.
Another limitation is that not all tweets posted during the study period could be collected, because the streaming API used for data collection does not return the complete set of tweets. To make this type of online social media research more authentic and comprehensive, future research could use paid Twitter APIs (PowerTrack, Enterprise), which provide access to most tweets, and could also draw on other social media platforms (Facebook, LinkedIn, etc.).
In this study, no analyses using spatial statistics were performed. Further analyses using spatial statistics are suggested to better understand the spatial distribution of emerging transportation trends in Twitter data. This would open new perspectives on decision making for researchers and professionals. Such spatial strategies, visualizations, and statistics may assist in making data-driven planning decisions [49].

Conclusions
Transportation researchers in recent times have used SMPs extensively for problems related to travel demand forecasting, activity pattern modeling, transit service assessment, and traffic incident and disaster management, among others. Yet much remains to be explored about how such information can contribute towards understanding public perceptions of and attitudes towards emerging transportation trends and mobility indicators. As such, the goal of this study is to mine and analyze large-scale public interactions from SMPs enriched with time and location information and to develop comparative infographics of emerging transportation trends and mobility indicators using natural language processing and data-driven techniques.
About 13 million tweets covering about 20 days (16 December 2019–4 January 2020) were collected using the Twitter API. Tweets closely aligned with emerging transportation and mobility trends such as shared mobility, vehicle technology, built environment, user fees, telecommuting, and e-commerce were identified. Data analytics captured spatio-temporal differences in social media user interactions and concerns about such trends, as well as topics of discussion formed through such interactions. California, Florida, Georgia, Illinois, New York, North Carolina, Ohio, Pennsylvania, Texas, Virginia, and Washington are among the most visible locations discussing such trends. Key observations from the sentiment analysis indicate that around one-third of the relevant tweets are positive and about one-fifth expressed highly positive views. Moreover, around 24% of tweets showed negative views (negative and highly negative). People carried more positive views on shared mobility, vehicle technology, telecommuting, and e-commerce, while being more negative on user fees and the built environment. Analysis of sentiment over space showed that most locations held a more positive attitude towards shared mobility, vehicle technology, telecommuting, and e-commerce, whereas attitudes were relatively more negative on the built environment and user fees. These findings show the need to create and implement more committed and targeted measures to increase public satisfaction with certain emerging transportation trends at different places.
Topic modeling analysis identified 17 topics related to transportation trends. Ride-hailing, fuel efficiency, trip navigation, daily as well as shopping and recreational activities, gas prices, taxes, and product delivery were among the topics. Specifically, people primarily mentioned ride-hailing and employment opportunities as part of shared mobility. On vehicle technology, interactions mainly included topics on fuel efficiency and trip navigation. Regular day-to-day activities are among the built environment topics, in addition to shopping and recreational activities. Under the user fees category, people were more concerned about gas prices, taxes, and expressways, along with their probable frustration towards lane blocks while driving. On telecommuting, people talked more about the holiday season and healthcare activities, while customer services related to item delivery were among the predominant topics on e-commerce. Such topics and associated words provide better insights into how to identify and connect to social media users based on their topics of interest and the use of specific keywords that can maximize influence. The above-listed topics and information can help transportation planners and policymakers systematically make better and more timely decisions when facing future transportation demand for emerging technology. This would be a step forward in understanding the need for a modern transportation system that reduces dependency on fossil fuels, helps control climate change, and reduces traffic jams and accidents while increasing the reliability of the transportation system.
This study introduced, for the first time, a social media data-driven framework that would allow real-time monitoring of transportation trends by agencies, researchers, and professionals. A better understanding of the demands and viewpoints of users may help with public transportation planning, management, and supervision, as well as with achieving transportation policy objectives. Moreover, exploring the acquired data through spatio-temporal analysis of tweeting activity and tweet sentiments may provide useful instruments for policy practice and evaluation by building and strengthening participatory processes in citizens' digital societies. Potential applications of the work may include: (1) identifying the spatial diversity of public mobility needs and concerns through social media channels; (2) developing new policies that would satisfy the diverse needs at different locations; (3) leveraging SMPs to promote user interest in emerging trends based on similar word clustering; (4) designing and implementing more efficient strategies to improve and influence public interest and satisfaction. While data biases may exist in such an approach, large-scale observations from SMPs would help to predict convincing patterns with heightened statistical power.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10.3390/futuretransp1030044/s1, Figure S1: Sentiment Analysis over Space for six categories: Shared Mobility; Figure S2: Sentiment Analysis over Space for six categories: Vehicle Technology; Figure S3: Sentiment Analysis over Space for six categories: Built Environment; Figure S4: Sentiment Analysis over Space for six categories: User Fees; Figure S5: Sentiment Analysis over Space for six categories: Telecommuting; Figure S6: Sentiment Analysis over Space for six categories: E-commerce.