Sentiment Analysis and Topic Modeling in Transportation: A Literature Review

Torres, Ewerton Chaves Moreira; de Picado-Santos, Luís Guilherme

doi:10.3390/app15126576

Open Access Review

by Ewerton Chaves Moreira Torres * and Luís Guilherme de Picado-Santos

CERIS, Instituto Superior Técnico, Universidade de Lisboa, Av. Rovisco Pais 1, 1049-001 Lisboa, Portugal

* Author to whom correspondence should be addressed.

Appl. Sci. 2025, 15(12), 6576; https://doi.org/10.3390/app15126576

Submission received: 28 April 2025 / Revised: 2 June 2025 / Accepted: 4 June 2025 / Published: 11 June 2025

Abstract

The growing use of social media data has opened new avenues for understanding user perceptions and operational inefficiencies in transportation systems. Among the most widely adopted analytical approaches for extracting insights from these data are sentiment analysis and topic modeling, which enable researchers to capture public opinion trends and uncover latent themes in unstructured content. However, despite a rising number of individual studies, systematic reviews focusing specifically on these approaches in transportation research remain limited, particularly in addressing methodological challenges and data heterogeneity. This literature review addresses that gap by critically examining 81 open-access studies published between 2014 and 2024. The main challenges identified include handling linguistic diversity, integrating multimodal and geolocated data, managing short-text formats, and addressing regional and demographic bias. In response, this review proposes a methodological framework for study selection and bibliometric analysis, classifies the most commonly applied machine learning models for sentiment and topic extraction, and synthesizes findings regarding data sources, model performance, and application contexts in transportation. Additionally, it discusses unresolved gaps and ethical concerns related to representativeness and social media governance. By offering an integrative and critical perspective, this review highlights the potential of combining sentiment analysis and topic modeling to support smarter, more inclusive, and more sustainable transportation policies.

Keywords: social media analytics; sentiment analysis; topic modeling; sustainable transportation; machine learning; transportation research

1. Introduction

In an increasingly connected world, transportation systems are crucial to urban development, economic growth, and environmental responsibility. However, they also face pressing challenges related to operational efficiency, user satisfaction, and the persistent failure to return to pre-pandemic levels of public transportation usage [1]. Increasing congestion, inadequate infrastructure, and poor integration of transportation modes affect urban mobility and quality of life in cities.

On the other hand, the advent of social media has provided unprecedented access to real-time user-generated data that reflect public perceptions, behavioral patterns, and emerging societal trends, a major technological advance for urban analysis [2]. This wealth of information has catalyzed innovative applications of machine learning in the transportation domain, enabling a deeper understanding of user satisfaction, operational inefficiencies, and sustainability challenges.

Sentiment analysis enables the measurement of public opinion, highlighting areas for service improvement and informing policy decisions [3], while topic modeling uncovers latent themes and trends that shape the broader narrative of urban mobility [4]. These approaches are particularly effective in leveraging data from platforms such as Twitter, Facebook, and Instagram, offering a dynamic lens through which to view the evolving needs of transportation systems.

Despite their potential, these techniques face challenges. Linguistic diversity, regional variations, and the complexity of integrating multimodal datasets pose significant obstacles to scalability and generalizability. Moreover, short-text formats, such as tweets, demand advanced models to extract meaningful insights effectively [5]. Addressing these issues is crucial to fully realizing the benefits of social media analytics in transportation research.

In this context, the present review synthesizes recent advancements in sentiment analysis and topic modeling, examining their applications and limitations in the transportation domain. By examining diverse studies, the work aims to illuminate pathways for integrating these methodologies into smarter, more sustainable mobility solutions. Through a comprehensive exploration of methodologies, applications, and challenges, this study contributes to advancing the field, fostering data-driven innovations in transportation systems worldwide. Additionally, this review seeks to facilitate progress in studies on mobility patterns in urban areas influenced by major trip-generating hubs, such as universities, enabling comparisons with mobility dynamics in the broader context of the city in which they are located.

The main contributions of this review are (i) to provide a domain-specific methodological framework for identifying and analyzing relevant studies at the intersection of sentiment analysis, topic modeling, and transportation, using bibliometric and network tools adapted to this context; (ii) to synthesize comparative insights regarding platforms, data sizes, and model performance related to the application of machine learning models to opinion classification and topic detection; and (iii) to critically discuss current gaps, limitations, and ethical concerns in the field.

The remainder of this paper is structured as follows. Section 2 reviews related work in sentiment analysis and topic modeling within transportation. Section 3 describes the review methodology. Section 4 presents the main findings and analytical results. Section 5 offers a critical discussion. Section 6 concludes the paper with final remarks and future directions.

2. Related Work

The increasing availability of user-generated content from social media has motivated a growing number of reviews that examine the role of natural language processing in transportation research. These reviews often focus on specific analytical techniques, data sources, or application domains, offering diverse but often fragmented perspectives on the state of the field.

A number of studies have reviewed the use of sentiment analysis in transportation. For instance, Zayet et al. [6] conducted a systematic mapping of sentiment analysis applications in transportation, highlighting dominant research themes, such as traffic monitoring, ride-hailing services, and public transport satisfaction. Their review also noted a concentration of studies in a few geographic regions and a reliance on Twitter as the primary data source.

Verma [7] presented a bibliometric and thematic analysis of sentiment analysis across public service domains, including transportation, emphasizing the growth of research after 2015 and the increasing role of machine learning models. However, that study adopts a broad and multi-sectoral perspective, covering areas such as governance, health, infrastructure, and mobility, with the overarching goal of supporting innovative society initiatives. In contrast, the present review is domain-specific and focuses exclusively on the intersection of sentiment analysis, topic modeling, and transportation research.

In relation to topic modeling, fewer reviews have been published. Sun and Yin [8] performed a topic modeling-based bibliometric analysis of transportation research more broadly, identifying evolving research areas within the field. Although their work did not focus explicitly on social media or text mining, it exemplifies how topic modeling can be applied as a meta-analytical tool in transport studies.

A comprehensive technical overview of topic modeling techniques was also provided by Kherwa and Bansal [9], who surveyed over 300 publications and presented a hierarchical classification of methods. Their review included algorithmic insights, evaluation metrics, and applications in multiple fields, including social networks, which are increasingly relevant in transportation research.

However, existing reviews typically address sentiment analysis and topic modeling in isolation. There is a lack of integrated syntheses that examine how these two complementary techniques are used together in transportation research. Moreover, few works provide comparative analyses of methodologies, performance metrics, data heterogeneity, and platform biases. This review aims to fill that gap by offering a focused and comparative synthesis of sentiment analysis and topic modeling applications in transportation studies.

3. Review Methodology

This section describes the methodology adopted to conduct the literature review, detailing databases consulted, the search strategies and keywords employed, the exclusion and inclusion criteria applied, the bibliometric procedures carried out using VOSviewer software (version 1.6.20), and the framework used to compare relevant dimensions among the selected studies.

Figure 1 presents a PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) flow diagram summarizing the selection process. The figure details the four main phases—identification, screening, eligibility, and inclusion—used to select relevant studies from OpenAlex and Google Scholar.

3.1. Database

The search was conducted primarily in the OpenAlex database and was complemented with additional papers retrieved from Google Scholar. In OpenAlex, the search covered articles published between 2014 and 2024, focusing specifically on works within the fields of sentiment analysis and opinion mining or computational text analysis in the Social Sciences. To ensure relevance to the topic of transportation, the keyword ‘transport’ was required to appear within the full text of the articles.

This approach aimed to identify and analyze research related to the use of computational text analysis methods applied to transportation contexts. Only peer-reviewed articles were considered, and the scope of the search encompassed English-language publications to maintain consistency in content analysis. The distribution of relevant publications over the selected time frame offers insights into the evolution of research interest in sentiment analysis and topic modeling applied to transportation. Figure 2 illustrates the annual publication trends based on the OpenAlex dataset, showing a steady increase in the number of studies published between 2014 and 2023. This upward trend reflects the growing academic interest in applying sentiment analysis and topic modeling to transportation-related research.

To complement the results retrieved from OpenAlex, additional studies were identified through searches conducted on Google Scholar. These searches were restricted to papers published between 2014 and 2024 that contained the words “transport” and/or “sentiment analysis” and/or “topic modeling”.

The choice of using only open-access databases reflects a commitment to transparency, reproducibility, and inclusive access to scientific knowledge. OpenAlex was selected for its continuous updates and commitment to open science, while Google Scholar was used to complement the search by including gray literature and sources not indexed in commercial databases. Some selection criteria (e.g., peer-reviewed publications, thematic relevance) mitigated quality risks, and the decision to exclude subscription-based databases (e.g., Scopus, Web of Science) aimed to democratize access to the findings for researchers without institutional subscriptions.

3.2. Bibliographic Analysis

VOSviewer, a tool for bibliometric and network analysis, was employed to focus the search on the specific topic of the research and to visually map the network of key concepts within the selected scientific articles and their connections. An initial graph created with VOSviewer after the first search revealed clusters containing concepts related to the study’s subject.

The graph is displayed in Figure 3, Figure 4, Figure 5 and Figure 6 in four different views to enhance interpretation. Each figure emphasizes a key term relevant to the study’s topic and its connections to other terms: Figure 3 highlights “transport”, Figure 4 “twitter”, Figure 5 “topic”, and Figure 6 “classification”.

The bibliographic analysis performed using VOSviewer offers valuable insights into the interconnected themes and research priorities within the transportation field. The network visualization reveals three primary clusters, representing distinct yet interconnected domains: sentiment analysis (green cluster), topic modeling and trends (red cluster), and platform-based studies, such as those involving Twitter (blue cluster).

The red cluster is dominated by keywords such as “topic modeling”, “Latent Dirichlet Allocation”, and “trend”, which emphasize the methodological focus of transportation research. It highlights the widespread application of topic modeling techniques to uncover latent trends in public sentiment, operational inefficiencies, and sustainability-related issues. In this sense, LDA continues to be a cornerstone of topic modeling for transportation datasets, offering significant insights into themes such as safety, government policies, and public opinion.

The green cluster focuses on performance, accuracy, and machine learning models, such as BERT, which are central to sentiment analysis research. This cluster underscores the evolution of sentiment analysis techniques, particularly the adoption of deep learning models, which have significantly improved accuracy and contextual understanding. It illustrates how these methods are being used to classify user feedback, identify operational gaps, and predict behavioral trends.

The blue cluster, centered around social media platforms, reflects the critical role of data sources, such as Twitter, in real-time transportation research. Keywords such as “feedback”, “platform”, and “comment” indicate the use of user-generated content to monitor public sentiment and improve service delivery. This cluster also highlights the integration of geospatial data and metadata to enhance the contextual relevance of insights.

Together, these clusters reveal the multifaceted nature of transportation research, where advanced methodologies like topic modeling and sentiment analysis intersect with real-world applications. The bibliographic analysis demonstrates the growing importance of leveraging diverse data sources, addressing challenges such as linguistic variations, and integrating multimodal datasets to improve transportation systems globally. These findings provide a roadmap for future research, emphasizing the need for innovative, scalable, and adaptive analytical frameworks.

3.3. Selected Studies

After the analysis of the VOSviewer graph, the search was refined to focus exclusively on open-access publications. Ultimately, 81 studies were selected for detailed exploration (Appendix A).

Of the 81 articles selected, 61 (75.3%) focused on sentiment analysis, while only 20 (24.7%) addressed topic modeling. However, some studies combined both methods, so the two groups are not strictly mutually exclusive. This imbalance may indicate a bias in the literature, reflecting greater academic interest in sentiment analysis and data that lend themselves more readily to it in transportation research.
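As a quick check, the shares reported for the two techniques can be recomputed from the raw counts. The snippet below is only an arithmetic sketch; the variable names are ours, and it assumes each of the 81 studies is counted once under its primary technique, which is how the two counts add up to the total.

```python
# Recompute the reported shares from the raw counts given in the text.
total_studies = 81
counts = {"sentiment analysis": 61, "topic modeling": 20}

# Percentage of the 81 selected studies, rounded to one decimal place.
shares = {name: round(100 * n / total_studies, 1) for name, n in counts.items()}

for name, pct in shares.items():
    print(f"{name}: {counts[name]} of {total_studies} studies ({pct}%)")
```

The counts sum to 81 and the shares to 100%, so studies that combine both methods appear to be assigned to a single group in this tally.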

Conversely, topic modeling, despite its potential, may face methodological challenges in analyzing short-text formats and interpreting generated topics. Additionally, the smaller number of studies may suggest a need for further research exploring this approach to better understand trends and patterns in transportation-related discussions. This imbalance in the literature highlights the importance of future research that integrates both techniques, enabling a more comprehensive and in-depth understanding of public perceptions and dynamics in transportation systems.

The geographical distribution of the studies, illustrated in Figure 7, reveals a global focus, with contributions from various countries across multiple continents. Europe dominates the research landscape, with countries such as Italy, the UK, Germany, and Spain frequently represented. Asia also plays a significant role, with contributions from China, India, South Korea, and Japan. These countries reflect a growing focus on using advanced methodologies to address urban mobility challenges in densely populated regions. The high number of studies in China underscores its emphasis on integrating technology with urban transport systems.

In North America, the United States leads with studies from several states, showcasing its diverse research focus on topics ranging from public sentiment to ride-sharing platforms. South America, represented by countries such as Brazil and Chile, demonstrates a growing interest in addressing regional transportation issues and improving user satisfaction. Africa and Oceania have a smaller but notable presence. Research in Kenya, Nigeria, and South Africa highlights emerging interest in leveraging data-driven methods for transportation in developing regions. Australia also contributes with studies focusing on innovative approaches to public transport.

Given this, the application of sentiment analysis and topic modeling in transportation research faces significant challenges shaped by geographical constraints, government policies, and disparities in social media usage. In regions with strict privacy regulations, such as the European Union or Brazil, the collection of geolocated or personal data is limited, restricting real-time analyses or the integration of critical metadata for contextual studies.

For example, research relying on Twitter in Europe must work with anonymized data. Additionally, access to social media APIs has become more restricted and costly, particularly after Twitter’s policy changes in 2023, disproportionately affecting researchers in lower-resource countries.

The choice of platforms also reflects regional inequalities. While Twitter dominates in countries such as the U.S., the UK, and Indonesia, Facebook and Instagram are more relevant in South America, and local platforms have gained prominence in China, where global platforms are blocked. However, methodological bias is also introduced due to linguistic and cultural differences.

In Indonesia, for instance, the abundance of Twitter data masks challenges such as spam and informal language, while in regions such as Sub-Saharan Africa, data scarcity limits analytical robustness. This fragmentation also creates representation bias, as urban, younger, and digitally active users are overrepresented, marginalizing rural populations or those with limited internet access.

Overall, this global distribution indicates that transportation research is a shared priority, with methodologies, such as topic modeling and sentiment analysis, adapted to address the unique challenges of each region. However, there remains an opportunity to expand research in underrepresented areas to ensure a more comprehensive understanding of global transportation issues.

4. Findings from the Literature Review

In the following sections, we analyze the main findings on three aspects: (i) social media data, (ii) sentiment analysis, and (iii) topic modeling.

4.1. Social Media Data

This section examines the application of social media data in transportation research that utilizes these machine learning models. Figure 8 presents a subset of the selected studies that explicitly reported using social media platforms, such as Twitter, Facebook, and Instagram, as primary data sources. Only studies that provided clear and direct information regarding the platforms employed were included in the final visualization. In contrast, studies that did not specify the use of social media data, as well as general literature reviews summarizing previous works, were excluded from this specific quantitative analysis (Figure 8), as they did not involve direct applications of sentiment analysis or topic modeling. Nevertheless, these articles were not removed from the overall review and were considered in later sections for contextualization and theoretical discussion.

In total, 62 studies were considered, as they directly reported the application of social media data. These studies detailed the types of platforms used, the nature of the content analyzed (e.g., tweets, posts, reviews), and their specific contributions to the research objectives.

Twitter dominates as the most frequently used platform, representing approximately 59.5% of the studies. This strong preference can be attributed to Twitter’s real-time data availability, extensive user base, and open API access, making it particularly effective for applications such as sentiment analysis and topic modeling.

In comparison, platforms such as Facebook (11.9%) and Instagram (5.9%) are less commonly used, likely due to data accessibility challenges and stricter privacy policies. Specialized platforms, such as TripAdvisor (4.8%) and Weibo (3.6%), are employed in more niche research contexts, often focused on tourism and region-specific transportation insights. The “Others” category represents a diverse group of emerging or less commonly used platforms.

Another important aspect of social media data is the large volume of information collected across different studies. Figure 9 illustrates the distribution of data sizes used in unique publications related to transportation research, presented on a logarithmic scale and sorted in ascending order. The data sizes vary significantly, ranging from small datasets with fewer than 100 entries to very large ones exceeding one million entries. This variation highlights the diverse scope of studies, with smaller datasets often focusing on detailed or localized analyses, while larger datasets cater to broader and computationally intensive research.

The figure also emphasizes the prominence of larger datasets, particularly those derived from social media platforms like Twitter, which are frequently used in transportation research for real-time and large-scale analyses. The representation of unique publications ensures that multiple contributions from the same authors or groups are distinctly identified, providing a comprehensive view of the field.

4.2. Sentiment Analysis

This section summarizes key insights from selected articles on applying sentiment analysis in transportation. The reviewed articles employ diverse sentiment analysis methodologies tailored to transportation research. Machine learning approaches, such as Support Vector Machines (SVM) and Naïve Bayes (NB), are widely used for their robustness and simplicity in text classification tasks.

Sentiment analysis, a subtask of opinion mining, classifies textual content, such as tweets or user reviews, by emotional tone (positive, negative, or neutral) [10]. It analyzes the opinions, attitudes, and emotions expressed in written language and is often applied to social media posts, customer reviews, and survey responses. These classifications help organizations and researchers understand public perception, monitor brand reputation, and detect emerging trends.

Among the most commonly adopted methods are Support Vector Machines (SVM) and Naïve Bayes (NB), known for their efficiency and performance in short-text classification tasks. For example, Candelieri and Archetti [11] used SVM and delta TF-IDF for sentiment classification and event detection related to urban transport in Milan. Effendy et al. [12] applied SVM to analyze public sentiment on various public transportation services in Jakarta, achieving 78.12% accuracy. Additionally, Sari et al. [13] used Naïve Bayes Classifier (NBC) for sentiment analysis of tweets related to Gojek online transportation services, achieving 81% accuracy. Ashari et al. [14] used Naïve Bayes Classifier for analyzing sentiments toward major transportation services in Greater Jakarta.

Several studies explored hybrid or comparative approaches. Anastasia and Budi [15] used SVM, Naïve Bayes, and Decision Tree algorithms for sentiment classification of online transportation services (Gojek and Grab) in Indonesia. Pratama et al. [16] compared SVM, Random Forest, and Multinomial Naïve Bayes (MNB) for sentiment analysis of Indonesia’s Commuter Line services, with SVM achieving the highest accuracy (85%). Overall, SVM demonstrates consistent performance in balanced datasets, achieving accuracies between 72.33% and 86%, while NB is effective for smaller datasets, as shown in studies analyzing sentiment toward COVID-19 transmission risks and public transportation services.
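To make the classical approach more concrete, the following is a minimal sketch of a multinomial Naïve Bayes sentiment classifier with add-one smoothing, written with the Python standard library only. It is not the implementation used in any of the cited studies; the class name, the whitespace tokenizer, and the four training posts are hypothetical and were invented for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


class NaiveBayesSentiment:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # word frequencies per class
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                if word not in self.vocab:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy posts, invented for illustration only.
train_texts = [
    "the bus was late again and very crowded",
    "driver was rude and the train was delayed",
    "clean station and friendly staff",
    "the ride was fast and comfortable",
]
train_labels = ["negative", "negative", "positive", "positive"]

model = NaiveBayesSentiment().fit(train_texts, train_labels)
print(model.predict("train delayed and crowded"))
print(model.predict("friendly driver and comfortable ride"))
```

On this toy data the two calls print negative and positive, respectively. Real studies typically add preprocessing (stop-word removal, stemming, handling of slang and emojis) and evaluate on held-out data, which this sketch omits.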

Deep learning techniques have recently gained prominence due to their improved ability to capture linguistic subtleties. For instance, Lopez-Fuentes et al. [17] used deep learning models, such as CNNs and BiLSTMs, for road passability detection during floods, leveraging metadata and image analysis. Jacques et al. [18] applied CNNs for classifying sentiments in French tweets related to transportation in Île-de-France. Jaman et al. [19] likewise used CNNs for sentiment classification of Instagram comments related to online transportation services in Indonesia. Both studies reported 94% accuracy.

Even higher performance was observed with hybrid models. Lin et al. [20] employed a hybrid BiLSTM model with an attention mechanism, achieving 98.7% accuracy in sentiment classification for online ride-hailing apps. In the context of the COVID-19 pandemic, Hirata and Matsuda [21] utilized the BERT model for sentiment analysis of Twitter data, focusing on logistics trends in Japan, and Chen et al. [22] applied BERT-based models to classify travel modes and sentiments in New York City.

Despite the rise of neural models, lexicon-based approaches continue to be used due to their simplicity and low computational cost. Fen et al. [23] applied lexicon-based sentiment analysis with AFINN to analyze public transportation satisfaction in Malaysia, achieving high accuracy when combined with SVM, and Chaturvedi et al. [24] leveraged the AFINN lexicon to analyze sentiment in geotagged tweets, focusing on urban transportation issues such as congestion and road quality in Indian cities. In another study, Beck et al. [25] used the Bing lexicon for sentiment analysis of Google Maps comments about São Paulo’s bus terminals, identifying cleanliness and accessibility as positive aspects.
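The lexicon-based idea can be sketched in a few lines: each known word carries a signed valence, and the sign of the summed valence gives the label. The mini-lexicon below only imitates the AFINN format (an integer score per word); its entries and the example posts are hypothetical and are not taken from the real AFINN or Bing lexicons.

```python
# Hypothetical mini-lexicon in the style of AFINN (integer valence per word).
# These entries are invented for illustration and are NOT real AFINN values.
TOY_LEXICON = {
    "clean": 2, "friendly": 2, "comfortable": 2, "fast": 1,
    "late": -2, "crowded": -2, "rude": -3, "dirty": -2, "delayed": -2,
}


def lexicon_score(text, lexicon=TOY_LEXICON):
    """Sum the valence of every known word; unknown words contribute 0."""
    tokens = text.lower().split()
    return sum(lexicon.get(token.strip(".,!?"), 0) for token in tokens)


def lexicon_label(text, lexicon=TOY_LEXICON):
    """Map the summed score to a three-way sentiment label."""
    score = lexicon_score(text, lexicon)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Hypothetical posts, invented for illustration only.
posts = [
    "The terminal is clean and the staff are friendly.",
    "Bus was late and crowded!",
    "Line 5 runs every ten minutes.",
]
for post in posts:
    print(lexicon_score(post), lexicon_label(post))
```

This design needs no training data, which explains its low cost, but it ignores negation, sarcasm, and domain-specific vocabulary; the hybrid approaches discussed in this section are one way to compensate for that.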

Hybrid models that integrate lexicon-based methods with machine learning techniques enhance classification accuracy by leveraging semantic nuances. These approaches have proven effective in studies evaluating urban mobility and transportation satisfaction, providing deeper insights into user feedback. To illustrate, Mishra and Panda [26] combined lexicon-based methods and Naïve Bayes for sentiment analysis of Indian Railways using Twitter data, enhancing sentiment classification accuracy.

Sentiment analysis has become a versatile tool in transportation research, addressing a wide range of applications that enhance both operational efficiency and strategic planning. These include real-time monitoring of urban mobility, evaluating public transit services, managing emergency responses during disasters, and analyzing public sentiment during major disruptions, such as the COVID-19 pandemic. Additionally, sentiment analysis has proven invaluable for brand reputation management and policy assessment, helping transportation providers and policymakers align services with user expectations.

One of the key applications of sentiment analysis is in real-time monitoring and incident management. Studies have leveraged social media data to detect traffic events, service disruptions, and emergency conditions. For instance, Salas et al. [27] combined geolocation and sentiment analysis to identify real-time traffic incidents in the West Midlands, significantly improving traffic management. Similarly, Lock and Pettit [28] assessed public transport performance in Sydney by using Twitter as a passive geo-participation tool, integrating sentiment analysis with transport network data. In contexts such as natural disasters, Muguro et al. [29] analyzed social media data in Kenya to uncover insights into reckless driving and accidents, aiding emergency responses.

Sentiment analysis has also been widely used to evaluate transportation services, providing actionable insights into user satisfaction. Effendy et al. [12] analyzed user sentiment toward Jakarta’s public transportation, revealing critical issues such as long wait times and unsafe driving practices. Politis et al. [30] explored public sentiment about London’s transportation system during the pandemic, noting increased negative sentiment during periods of stricter restrictions. Studies focusing on platforms like Twitter, Instagram, and TripAdvisor have identified strengths and weaknesses in commuter lines, online transportation apps, and ride-hailing services. Common findings include areas for improvement, such as service delays, safety concerns, and app usability, enabling providers to refine their services to better meet customer needs.

In the context of sustainability and urban mobility, sentiment analysis has been utilized to explore public attitudes toward initiatives such as low-carbon travel and non-motorized transport. For example, Serna et al. [31] used hybrid sentiment analysis methods to identify sustainability-related issues in urban mobility based on social media data from Minube, emphasizing environmental concerns and transportation efficiency. In another study, Serna et al. [32] also evaluated sustainable transportation modes using user-generated content from TripAdvisor. In addition, Vitetta [33] applied sentiment analysis to social media data to understand public preferences for biking in metropolitan areas, promoting the adoption of non-motorized transport modes. By addressing these aspects, sentiment analysis not only supports environmental goals but also improves the overall quality of urban mobility systems.

Sentiment analysis has proven equally valuable for understanding public perception during major disruptions or policy changes. During the COVID-19 pandemic, studies such as those by Hirata and Matsuda [21] and Chen et al. [22] used sentiment analysis to examine logistics trends in Japan and travel behaviors in New York City, respectively. These insights revealed shifting public opinions on safety measures and service satisfaction, providing valuable guidance for policymakers to implement targeted interventions.

Another critical application lies in brand reputation management. Baj-Rogowska [34] conducted a sentiment analysis of Uber’s Facebook feedback, highlighting public opinion fluctuations caused by events such as the #DeleteUber campaign. Additionally, Uber customer sentiment studies have emphasized critical areas for improvement, including app functionality and safety measures, showcasing the importance of sentiment analysis in maintaining a competitive edge in the transportation market. Sentiment analysis has also explored emotional responses to individual motorized modes of transport. Garzia et al. [35] analyzed reactions to car and motorcycle travel during the pandemic in London and Rome, emphasizing safety concerns. This highlights the utility of sentiment analysis in assessing user preferences and addressing emotional factors that influence transportation choices.

Overall, the applications of sentiment analysis in transportation research illustrate its ability to bridge operational insights with strategic decision making. By leveraging advanced models and hybrid methodologies, researchers and practitioners can address complex challenges and foster the development of smarter, more sustainable transportation systems. The growing integration of sentiment analysis into transportation systems underscores its transformative potential for improving safety, accessibility, and user satisfaction across various domains.

The findings across the reviewed studies demonstrate the significant potential of sentiment analysis to extract actionable insights and improve various aspects of transportation systems. These findings also highlight both the strengths and limitations of the methods employed, as well as the practical implications for transportation services and policy.

A recurring theme is the ability of sentiment analysis to accurately classify sentiments and predict user behavior. Studies using advanced models, such as CNNs, BERT, and BiLSTM, consistently report high accuracy, precision, and recall metrics, reflecting their robustness in handling large datasets and nuanced text [18,19,20,21]. For instance, the integration of word embeddings with machine learning models has yielded sentiment classification accuracies as high as 94% in online transportation reviews. Similarly, topic modeling combined with sentiment analysis has successfully identified key issues, such as service delays and safety concerns, providing clear areas for intervention.

Another notable finding is the identification of critical user concerns and operational challenges. Sentiment analysis has been effective in identifying specific issues, such as crowdedness, poor app functionality, and delays, which negatively impact user satisfaction [12,13,25]. Studies focusing on public transportation systems and ride-hailing services have uncovered patterns of dissatisfaction, such as unsafe driving, insufficient infrastructure, and inadequate customer service, guiding service providers toward targeted improvements. For example, Othman et al. [36] analyzed social media feedback on Rapid KL transportation in Malaysia, identifying issues with driver attitudes, technical problems, and poor customer service, and Saragih and Girsang [37] investigated complaints on Twitter and Facebook about ride-hailing services in Indonesia, uncovering dissatisfaction with unsafe driving and lack of responsiveness in customer service.

The contributions also emphasize the contextual and temporal shifts in sentiment, particularly during disruptive events like the COVID-19 pandemic. For example, several studies observed increased negative sentiment during periods of stricter lockdown measures, reflecting heightened user stress and dissatisfaction. Conversely, positive sentiment spikes were noted following the implementation of safety measures or vaccination campaigns, highlighting the importance of adaptive and responsive policies. To illustrate, Aksan and Akdağ [38] explored sentiment trends in the UK, noting that responsive policies addressing affordability and accessibility led to spikes in positive sentiment.

An important insight is the value of sentiment analysis in uncovering latent trends and informing sustainable practices. By analyzing public sentiment toward low-carbon travel and urban mobility initiatives, studies have provided evidence-based recommendations to enhance accessibility, reduce environmental impact, and improve overall user experience. Findings also demonstrate the potential of sentiment analysis to address broader societal concerns, such as equity in transit access and perceptions of safety. Specifically, Tran et al. [39] assessed the behavioral and psychological responses of vulnerable transit riders in Metro Vancouver, highlighting inequities in transit access and safety concerns, particularly for low-income groups and women.

Despite these successes, some findings reveal limitations and areas for improvement. Challenges include addressing linguistic and regional variations, ensuring the scalability of models across diverse contexts, and handling imbalanced datasets. These challenges often result in reduced classification performance or limited generalizability of insights. Some studies have begun to address them: for instance, Bhardwaj et al. [40] applied an Optimal Transport (OT) loss framework to improve sentiment classification in imbalanced datasets, demonstrating enhanced accuracy and performance compared to traditional methods. Additionally, certain studies highlight gaps in the integration of metadata and multimodal data, which could provide a more comprehensive understanding of user sentiment.

Therefore, the increasing complexity of datasets calls for adopting advanced machine learning and deep learning techniques to capture nuanced public sentiments. In this sense, Giancristofaro and Panangadan [41] combined visual (image) and textual data for sentiment analysis of Instagram posts related to transportation services, enhancing classification accuracy. Equally important, Ali et al. [42] integrated metadata and text data using fuzzy ontology for sentiment analysis in transportation, demonstrating improved accuracy and insights.

Future research should prioritize addressing challenges such as multilingual datasets, imbalanced data, and multimodal inputs, ensuring that sentiment analysis continues to enhance the efficiency and sustainability of transportation systems globally. Myoya et al. [43] investigated sentiment toward public transport, focusing on multilingual and code-mixed datasets, highlighting the need for advanced natural language processing (NLP) tools.

In conclusion, the findings underscore the transformative potential of sentiment analysis in the transportation domain. They highlight its utility not only in diagnosing existing problems but also in predicting future trends and shaping proactive strategies. As methodologies advance and datasets become more comprehensive, the insights derived from sentiment analysis are likely to play an increasingly pivotal role in optimizing transportation systems and policies.

Finally, other relevant works have presented different approaches related to the use of sentiment analysis with application in the transportation domain. Table 1 shows a summary of these studies.

4.3. Topic Modeling

In the context of natural language processing and machine learning, topic modeling is an automatic technique that attempts to extract the most important topics within a collection of text documents [70]. By analyzing patterns of word co-occurrence, it clusters words and phrases into groups that represent coherent subjects, allowing researchers to uncover hidden structures in large datasets without prior knowledge of the content. Topic modeling is widely applied in fields such as information retrieval, content recommendation, sentiment analysis, and academic research, offering a scalable way to organize, summarize, and explore unstructured textual data efficiently.
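
As a rough illustration of the co-occurrence signal described above, the following Python sketch counts how often pairs of words appear in the same document. The toy corpus and the function name cooccurrence_counts are assumptions made for illustration and are not taken from any reviewed study; topic models such as LDA fit a probabilistic model on top of this kind of signal rather than using raw counts directly.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents):
    """Count, for each unordered word pair, the number of documents containing both words."""
    counts = Counter()
    for doc in documents:
        # Use the set of distinct words so each pair is counted once per document.
        vocab = sorted(set(doc.lower().split()))
        for pair in combinations(vocab, 2):
            counts[pair] += 1
    return counts


# Hypothetical toy corpus (illustrative only).
docs = [
    "bus delay station",
    "bus delay crowded",
    "bike lane safety",
    "bike lane bus",
]

pairs = cooccurrence_counts(docs)
# Pairs are stored in alphabetical order, e.g. ("bus", "delay").
print(pairs[("bus", "delay")])   # number of documents mentioning both words
print(pairs[("bike", "lane")])
print(pairs[("delay", "lane")])  # words that never appear together
```

In this toy corpus, word pairs that frequently share a document ("bus" and "delay", "bike" and "lane") are the ones a topic model would tend to group into the same topic.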

Topic modeling has also been extensively used in transportation research to uncover latent themes and trends within large datasets. Various methods have been applied, with Latent Dirichlet Allocation (LDA) being one of the most popular approaches. For instance, Sun and Yin [8] used LDA to analyze 17,163 transportation research abstracts, identifying emerging trends, such as sustainability and non-motorized mobility. Similarly, Politis et al. [30] applied LDA to public transport-related tweets during the COVID-19 pandemic in London, revealing topics like safety measures and ridership changes. Wu et al. [71] utilized LDA to analyze hotline complaints in China, identifying issues such as fare disputes and unlicensed taxis.

Despite its popularity, LDA’s limitations in handling short texts, such as tweets, have prompted the adoption of advanced models, such as Non-Negative Matrix Factorization (NMF), BERTopic, Top2Vec, and Structural Topic Modeling (STM), which deliver superior granularity and contextual relevance.

For example, Egger and Yu [72] compared NMF with other topic modeling techniques for analyzing Twitter posts and found it particularly effective for generating clear and novel insights. BERTopic and Top2Vec have proven useful for short and unstructured text, such as social media posts. The same study also demonstrated BERTopic’s ability to extract nuanced and interpretable themes, while Top2Vec provided broader topic categorizations.

Structural Topic Modeling (STM) is another advanced approach, particularly useful for integrating metadata into the analysis. Kuhn [73] applied STM to aviation safety reports, incorporating contextual information, such as flight phases, to identify latent trends. Similarly, Tamakloe et al. [74] used STM to analyze COVID-19-related transportation research articles, uncovering themes such as travel behavior changes and logistics optimization.

However, some works have successfully applied Latent Dirichlet Allocation (LDA) topic modeling to tweets. For example, Hidayatullah and Ma’arif [75] analyzed tweets from Traffic Management Centers (TMCs) to model topics related to road traffic. The key topics identified included regular traffic monitoring, traffic rule appeals, announcements, and traffic accident reports.

Moreno and Iglesias [76] used LDA alongside tools such as Senpy and RapidMiner to analyze tweets from Uber’s customer support (@Uber_Support) during 2020. The identified topics included app-related issues and driver concerns. Similarly, Kinra et al. [77] analyzed tweets and newspaper articles to study public opinion on driverless cars. The topics covered included safety, congestion, labor market impacts, and legal liability. In another study, Lock and Pettit [28] employed the same model to analyze tweets related to public transport performance in Sydney. Twelve major topics were identified, including delays, cancellations, service updates, and the effects of government policies.

Hybrid approaches have also gained traction, combining topic modeling with other techniques for enhanced insights. Ali et al. [42] integrated Latent Dirichlet Allocation with fuzzy ontology and word embeddings to analyze sentiment and trends in transportation datasets. Tools like MALLET have been employed for LDA implementations, as seen in the work of Esztergár-Kiss [78] on abstracts, while libraries such as gensim and sklearn are frequently used for applying LDA, NMF, and other models. Visual tools, such as pyLDAvis and word clouds, are often utilized to display topic distributions and improve interpretability.

Topic modeling has been widely applied in transportation research, showcasing its adaptability to diverse applications. One significant area is public transportation analysis, where topic modeling helps identify operational challenges and user concerns. For instance, Aksan and Akdağ [38] combined LDA with BERT to study public sentiment toward UK public transportation services, highlighting key themes like affordability, safety, and delays.

Another crucial application is in real-time monitoring and incident detection. Salas et al. [27] applied topic modeling to tweets from the West Midlands, enhancing traffic management by identifying real-time traffic events and disruptions. Egger and Yu [72] leveraged topic modeling to analyze travel during COVID-19, revealing shifts in public opinion and travel behavior during different phases of the pandemic. These applications demonstrate how topic modeling can enhance decision making for transportation operators by providing actionable insights into dynamic scenarios.

Topic modeling has also played a critical role in sustainability and urban mobility initiatives. Lin et al. [20] utilized hybrid methods combining LDA and BiLSTM to assess topics involving low-carbon travel initiatives, uncovering public support for environmentally friendly transportation policies. Serna et al. [31] employed semantic taxonomy and LDA to analyze urban mobility issues on social media, identifying concerns about environmental impacts and inefficiencies in urban transit systems. These insights have informed policies promoting sustainable urban mobility and non-motorized transport.

In tourism-related transportation, topic modeling has helped uncover accessibility issues and improve the user experience. For example, Pineda-Jaramillo et al. [79] applied LDA to reviews of transport systems at Mount Etna, identifying criticalities like limited public transport options and cost concerns. These findings informed recommendations for increasing public transport frequency and accessibility to enhance tourist satisfaction.

Findings from topic modeling in transportation research reveal critical insights into user concerns, operational challenges, and emerging trends. One major discovery is its ability to uncover recurring issues, such as affordability, safety, and delays, which often negatively impact user satisfaction. By analyzing patterns in public feedback, researchers have identified key areas where transportation systems can improve service quality and operational efficiency. These insights have informed strategies to address user dissatisfaction and enhance overall service delivery.

In the context of sustainability and urban mobility, topic modeling has highlighted public support for environmentally friendly initiatives and exposed inefficiencies in urban transportation systems. It has provided evidence-based recommendations to promote low-carbon travel, reduce environmental impact, and improve accessibility. Findings have also emphasized the importance of addressing infrastructure gaps and inefficiencies to meet sustainability goals while enhancing the user experience.

Temporal and contextual dynamics are another significant area of insight. Topic modeling has captured shifts in public sentiment during periods of disruption, such as heightened concerns during crises and improved sentiment following the implementation of safety measures. These findings demonstrate the method’s ability to track evolving public attitudes and provide valuable input for adaptive policy-making and service adjustments.

To conclude, the growing complexity of transportation datasets has driven the adoption of advanced methods, such as BERTopic, STM, and hybrid models, which complement the foundational role of LDA. These approaches have proven essential for analyzing social media data, integrating metadata, and uncovering actionable insights. Topic modeling has consistently uncovered critical themes, including public sentiment, operational inefficiencies, and opportunities for sustainability, guiding targeted interventions and policy decisions.

Overall, by revealing latent patterns and adapting to diverse contexts, topic modeling remains a powerful tool for advancing transportation research. Its ability to inform service improvements, promote low-carbon mobility, and address evolving challenges underscores its ongoing relevance in fostering smarter, more sustainable transportation systems worldwide. Other relevant works have presented different approaches related to the use of sentiment analysis and topic modeling with application in the transportation domain. Table 2 shows a summary of these studies.

5. Discussion

The integration of sentiment analysis and topic modeling into transportation research has offered transformative insights into public perceptions, operational challenges, and areas for improvement. By analyzing vast datasets from social media and other sources, these methods have illuminated critical aspects of transportation systems, such as user satisfaction, safety concerns, and sustainability efforts.

Advancements in sentiment analysis have demonstrated their capability to uncover hidden patterns and themes within large and complex transportation datasets. By leveraging hybrid approaches, such as BERTopic and word embeddings, sentiment analysis has improved accuracy and interpretability, allowing researchers to identify specific operational inefficiencies and user concerns. These methods have proven especially useful in addressing user feedback related to urban mobility and sustainability, where they have highlighted barriers like insufficient infrastructure and inefficiencies in public transit systems.

Topic modeling techniques, such as Latent Dirichlet Allocation (LDA) and Structural Topic Modeling (STM), have been particularly effective in identifying latent trends and contextual themes. Despite its limitations in handling short texts, such as tweets, LDA remains a widely used method due to its simplicity and effectiveness in extracting topics from social media data. This application has proven especially valuable in understanding public sentiment and identifying issues in real-time transportation contexts.

Quantitatively, only a minority of studies provided standard performance metrics for either sentiment classification or topic modeling. For sentiment analysis, the F1-score and accuracy were the most frequently reported metrics; however, many papers lacked detailed confusion matrices or class distribution analyses. In topic modeling, coherence measures (probabilistic coherence, for example) were inconsistently used across studies. This omission weakens reproducibility and hinders the comparison of models across different datasets and use cases.
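
To illustrate why confusion matrices and class distributions matter alongside accuracy, the following Python sketch computes per-class precision, recall, and F1 from paired label lists and contrasts accuracy with macro-averaged F1. The labels, the imbalanced toy test set, and the function names are hypothetical and serve only as an example; they do not reproduce results from any reviewed study.

```python
from collections import Counter


def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1)} computed from paired label lists."""
    labels = sorted(set(y_true) | set(y_pred))
    confusion = Counter(zip(y_true, y_pred))  # (true, predicted) -> count
    scores = {}
    for label in labels:
        tp = confusion[(label, label)]
        fp = sum(confusion[(other, label)] for other in labels if other != label)
        fn = sum(confusion[(label, other)] for other in labels if other != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def accuracy(y_true, y_pred):
    """Share of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so minority classes count equally."""
    scores = per_class_scores(y_true, y_pred)
    return sum(f1 for _, _, f1 in scores.values()) / len(scores)


# Hypothetical imbalanced test set: 8 negative posts and 2 positive posts.
y_true = ["neg"] * 8 + ["pos"] * 2
# A degenerate classifier that always predicts the majority class.
y_pred = ["neg"] * 10

print(accuracy(y_true, y_pred))  # 0.8
print(macro_f1(y_true, y_pred))  # about 0.44
```

In this example the classifier never identifies a positive post, yet its accuracy is 0.8; the macro-averaged F1 exposes the failure on the minority class, which is the kind of information that is lost when only accuracy is reported.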

Probabilistic coherence, in particular, offers a robust way to evaluate topic quality by measuring statistical dependence among top terms within each topic. It outperforms traditional measures, such as perplexity, when interpretability is a priority. However, several studies failed to report it, limiting the validity of their findings.
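
A minimal Python sketch of one such measure is given below. It assumes that probabilistic coherence is computed as the average of P(b | a) - P(b) over pairs of a topic's top terms, where a is ranked above b and probabilities are estimated from document frequencies. This formulation, the toy corpus, and the function name are assumptions made for illustration; the reviewed studies may rely on different variants.

```python
from itertools import combinations


def probabilistic_coherence(top_terms, documents):
    """Average of P(b | a) - P(b) over pairs of top terms, with a ranked above b.

    Probabilities are estimated as document frequencies. Values near zero
    indicate terms that co-occur no more often than expected by chance.
    """
    doc_sets = [set(doc.lower().split()) for doc in documents]
    n_docs = len(doc_sets)

    def doc_freq(*terms):
        # Number of documents containing all of the given terms.
        return sum(all(t in d for t in terms) for d in doc_sets)

    diffs = []
    # combinations() keeps the input order, so a always precedes b in the ranking.
    for a, b in combinations(top_terms, 2):
        df_a = doc_freq(a)
        if df_a == 0:
            continue  # P(b | a) is undefined when a never occurs
        p_b_given_a = doc_freq(a, b) / df_a
        p_b = doc_freq(b) / n_docs
        diffs.append(p_b_given_a - p_b)
    return sum(diffs) / len(diffs) if diffs else 0.0


# Hypothetical toy corpus (illustrative only).
docs = [
    "bus delay station",
    "bus delay crowded",
    "bike lane safety",
    "bike lane bus",
]

coherent_topic = probabilistic_coherence(["bus", "delay"], docs)
mixed_topic = probabilistic_coherence(["delay", "lane"], docs)
print(coherent_topic)  # positive: "delay" is more likely given "bus"
print(mixed_topic)     # negative: "lane" never appears with "delay"
```

Under this definition, a topic whose top terms tend to appear together scores above zero, whereas a topic mixing unrelated terms scores at or below zero.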

Beyond evaluation gaps, the integration of sentiment analysis (SA) and topic modeling (TM) remains underutilized in transportation research. However, recent studies in related domains offer methodological contributions that highlight the potential of integrated approaches. Çaylak et al. [88] and Özkara et al. [89] demonstrate how SA and TM can be combined to analyze user reviews and multilingual social media content, respectively, revealing how emotional tone aligns with thematic concerns.
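
One simple way to combine the two methods, sketched below in Python, is to assign each document a topic label and a sentiment score and then average sentiment per topic. The topic labels, the scores in the range [-1, 1], and the function name are hypothetical; this is an illustrative aggregation step, not the procedure used in the cited studies.

```python
from collections import defaultdict


def mean_sentiment_by_topic(records):
    """records: iterable of (topic_label, sentiment_score) pairs; returns {topic: mean score}."""
    totals = defaultdict(lambda: [0.0, 0])  # topic -> [sum of scores, count]
    for topic, score in records:
        totals[topic][0] += score
        totals[topic][1] += 1
    return {topic: total / count for topic, (total, count) in totals.items()}


# Hypothetical output of an upstream pipeline: one (topic, sentiment) pair per post.
records = [
    ("delays", -0.8),
    ("delays", -0.4),
    ("safety", 0.2),
    ("fares", -0.1),
    ("safety", 0.6),
]

summary = mean_sentiment_by_topic(records)
# List topics from most negative to most positive average sentiment.
for topic, score in sorted(summary.items(), key=lambda item: item[1]):
    print(f"{topic}: {score:+.2f}")
```

Ranking topics by average sentiment in this way indicates which themes attract the most negative feedback and therefore where intervention may be most needed.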

Deng and Liu [90] propose a RoBERTa–BiLSTM–Attention model that improves sentiment classification across public opinion stages and could inform more structured combinations with topic models. Melhem et al. [91] develop a deep learning framework combining BiLSTM, CNN, and multi-level attention to classify traffic-related short texts—an architecture that could be extended to sentiment-aware topic modeling. While not focused directly on SA or TM, Laynes-Fiascunari et al. [92] emphasize the importance of real-time textual analysis in intelligent transport systems (ITS), reinforcing the value of integrating textual and behavioral analytics.

The analysis of temporal dynamics has added depth to transportation research, revealing how public sentiment shifts during disruptive events. This ability to track and respond to evolving public sentiment underscores the potential of these methods to inform adaptive decision making in transportation planning.

However, several challenges persist. Linguistic and regional variations remain significant obstacles, alongside the handling of imbalanced datasets. Additionally, the integration of multimodal data—combining text, images, and metadata—is still underexplored, limiting the comprehensiveness of current insights. Overcoming these challenges requires continued methodological innovation and the development of more robust analytical frameworks.

Another limitation lies in the lack of synergy between user-generated content and traditional transportation data sources. Bridging this gap could create more comprehensive datasets, enabling more accurate and actionable insights. As such, fostering collaboration across data platforms and research domains will be critical for future advancements.

Despite methodological advances, it is crucial to recognize that social networks are private platforms whose algorithms prioritize content with greater engagement, often amplifying polarization, hate speech, or sensationalist narratives. For example, negative or controversial posts tend to receive greater visibility, creating a biased perception of collective sentiment.

Furthermore, content moderation policies, API access restrictions, and unilateral changes to platform governance, such as Twitter’s 2023 API policy shifts, can limit the transparency and consistency of data, compromising the replicability of studies.

Relying on these platforms to inform public policies introduces significant risks, as decisions based on data influenced by corporate interests can reinforce inequalities or prioritize agendas that are not aligned with collective well-being. Therefore, it is imperative to question the neutrality of these sources and seek complementary strategies, such as the integration of traditional research, official data, and open platforms, to ensure more balanced and democratically representative analyses.

Finally, while sentiment analysis and topic modeling offer transformative insights for transportation systems, their application raises critical ethical concerns. First, the reliance on social media data, such as Twitter, often overlooks issues of privacy and consent, as users may not explicitly agree to their posts being used for policy-making purposes. For instance, studies analyzing geotagged tweets risk exposing individual commuting patterns, potentially violating anonymity. Second, algorithmic biases in models may amplify inequalities, as datasets predominantly reflect digitally active populations, marginalizing underrepresented groups. Future frameworks must address these challenges through transparent data practices, inclusive sampling, and participatory policymaking to promote equitable and sustainable mobility.

6. Conclusions

This review highlights the transformative potential of sentiment analysis and topic modeling in the transportation domain. These methods have proven invaluable for understanding public sentiment, identifying operational challenges, and fostering sustainable practices. By leveraging social media data, researchers have gained actionable insights into user experiences and systemic inefficiencies, enabling targeted interventions and policy reforms.

The findings emphasize the role of advanced machine learning and deep learning techniques in capturing complex and nuanced data. These methods have consistently delivered high accuracy and contextual relevance, supporting the development of smarter and more sustainable transportation systems. Furthermore, the integration of topic modeling with sentiment analysis has offered a more comprehensive understanding of transportation dynamics, effectively bridging gaps between operational insights and strategic decision making.

Despite these contributions, this review has certain limitations. First, it focused exclusively on English-language, open-access publications, which may have excluded valuable insights from non-English or subscription-only sources. Second, while the selection and analysis processes were designed to be systematic and transparent, it is important to acknowledge that systematic reviews—by their very nature—may overlook relevant studies due to strict inclusion criteria or database limitations. Consequently, some innovative or context-specific contributions may not have been captured.

Future research should prioritize addressing existing challenges, such as enhancing model scalability, incorporating multilingual capabilities, and integrating multimodal data. Expanding the application of these methods to underserved regions and contexts is also essential for achieving a more equitable and comprehensive understanding of global transportation issues.

Additionally, improving methodological transparency and evaluation is crucial. This includes the systematic use of reproducible metrics, such as the F1-score for sentiment classification and probabilistic coherence for topic modeling, as well as the application of stability tests and cross-validation procedures. The integration of SA and TM into joint modeling frameworks remains underexplored and warrants further investigation.

In conclusion, the integration of sentiment analysis and topic modeling represents a significant step forward in transportation research. By uncovering latent patterns and adapting to diverse contexts, these methodologies are poised to play an increasingly central role in optimizing transportation systems and policies. Their ongoing evolution and application will undoubtedly contribute to the advancement of sustainable, efficient, and user-focused mobility solutions worldwide.

Author Contributions

Conceptualization, E.C.M.T. and L.G.d.P.-S.; methodology, E.C.M.T. and L.G.d.P.-S.; software, E.C.M.T.; validation, E.C.M.T.; formal analysis, E.C.M.T. and L.G.d.P.-S.; investigation, E.C.M.T.; resources, E.C.M.T. and L.G.d.P.-S.; data curation, E.C.M.T.; writing—original draft preparation, E.C.M.T.; writing—review and editing, E.C.M.T. and L.G.d.P.-S.; visualization, E.C.M.T.; supervision, L.G.d.P.-S. All authors have read and agreed to the published version of the manuscript.

Funding

The authors acknowledge the financial support of the Foundation for Science and Technology (FCT) through the project UIDB/04625/2025 of the research unit CERIS (DOI: 10.54499/UIDB/04625/2020).

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Table A1. Scientific Studies Using Sentiment Analysis and Topic Modeling in the Transport Domain.

Ref.	Author(s)	Social Media	Data Size	Location
[11]	Candelieri and Archetti	Yes (Twitter)	Not mentioned	Milan (ITA)
[17]	Lopez-Fuentes et al.	Yes (Twitter)	5818 tweets	Global
[44]	Ajik et al.	No (Google Forms)	216 responses	Katsina (NGA)
[8]	Sun and Yin	No (Abstracts)	17,163 articles from 22 journals	Cambridge and Gainesville (USA)
[72]	Egger and Yu	Yes (Twitter)	31,800 tweets	Salzburg and Vienna (AUT)
[93]	Ali et al.	Yes (Twitter, Facebook, TripAdvisor)	30,000 tweets; 1851 TripAdvisor reviews; Facebook (not mentioned)	New York (USA) and London (GBR)
[73]	Kuhn	No	25,706 records	Global
[80]	Roque et al.	No	54 RSI reports	Dublin (IRL)
[45]	Sari and Ruldeviyani	Yes (Twitter)	340 tweets	Jakarta (IDN)
[75]	Hidayatullah and Ma’arif	Yes (Twitter)	20,932 tweets	Java (IDN)
[34]	Baj-Rogowska	Yes (Facebook)	18,326 posts	Gdańsk (POL)
[6]	Zayet et al.	No	74 studies	Kuala Lumpur and Petaling Jaya (MYS)
[21]	Hirata and Matsuda	Yes (Twitter)	17,597 tweets	Kobe and Tokyo (JPN)
[46]	Cao et al.	Yes (Weibo, blogs, forums)	Not mentioned	Changsha and Beijing (CHN)
[47]	Papapicco	Yes (Twitter)	774 tweets	Bari (ITA)
[48]	Trivedi and Serasiya	Yes (Twitter)	Not mentioned	Ahmedabad (IND)
[27]	Salas et al.	Yes (Twitter)	Not mentioned	West Midlands (GBR)
[30]	Politis et al.	Yes (Twitter)	418,624 tweets	London (GBR)
[32]	Serna et al.	Yes (TripAdvisor)	2000 TripAdvisor reviews	Donostia-San Sebastián (ESP)
[76]	Moreno and Iglesias	Yes (Twitter)	215,387 tweets	Madrid (ESP)
[36]	Othman et al.	Yes (Facebook and Twitter)	500 posts/tweets	Klang Valley (MYS)
[24]	Chaturvedi et al.	Yes (Twitter)	Not mentioned	Delhi, Mumbai, Bangalore, and Hyderabad (IND)
[77]	Kinra et al.	Yes (Twitter, newspaper articles)	157,000 tweets and 1338 newspaper articles	Copenhagen (DNK)
[12]	Effendy et al.	Yes (Twitter)	1201 tweets	Bandung and Jakarta (IDN)
[31]	Serna et al.	Yes (Minube)	43,251 comments	Arrasate, Valencia, and Elgoibar (ESP)
[49]	Candelieri et al.	Yes (Twitter)	45,000 tweets	Milan (ITA)
[78]	Esztergár-Kiss	No (Abstracts)	310 project abstracts	Budapest (HUN)
[40]	Bhardwaj et al.	Yes	Not mentioned	Global
[81]	Méndez et al.	Yes (Twitter)	91,186 tweets	Santiago (CHL)
[50]	Lazic et al.	Yes (Twitter)	14,640 tweets	Belgrade (SRB)
[18]	Jacques et al.	Yes (Twitter)	68,916 tweets	Île-de-France (FRA)
[51]	Windasari et al.	Yes (Twitter)	2000 tweets	Semarang (IDN)
[52]	Luong and Houston	Yes (Twitter)	8515 tweets	Los Angeles (USA)
[71]	Wu et al.	No (Hotline data)	223,599 complaints	Hohhot (CHN)
[26]	Mishra and Panda	Yes (Twitter)	92,271 tweets	Rourkela (IND)
[35]	Garzia et al.	Yes (Twitter)	121,536 tweets	London (GBR) and Rome (ITA)
[82]	Dou et al.	Yes (Dianping.com)	52,087 reviews	Shanghai (CHN)
[29]	Muguro et al.	Yes (Twitter)	1,000,000 tweets	Nairobi (KEN)
[83]	Ye et al.	Yes (Weibo)	13,738 messages	Shanghai (CHN)
[53]	Pinem	Yes (Turnbackhoax.com data)	Over 100 hoax cases	Yogyakarta (IDN)
[84]	Tamakloe et al.	No (Abstracts)	421 abstracts	Seoul (KOR)
[54]	Preoțiuc-Pietro et al.	Yes (Twitter)	1971 tweets	Florence (ITA)
[85]	Kabbani et al.	Yes (Twitter)	4432 tweets	Calgary (CAN)
[94]	Shokoohyar et al.	Yes (Twitter)	216,120 tweets	USA
[55]	Shin	Yes (Yelp)	1075 reviews	New York, Washington, D.C., and Chicago (USA)
[15]	Anastasia and Budi	Yes (Twitter)	126,405 tweets	Depok (IDN)
[42]	Ali et al.	Yes (Twitter and reviews)	1851 reviews and tweets	Incheon (KOR)
[56]	Fan et al.	No (Noise complaint records)	2032 complaint records	Bukit Panjang, Singapore (SGP)
[37]	Saragih and Girsang	Yes (Facebook and Twitter)	1200 comments	Jakarta (IDN)
[13]	Sari et al.	Yes (Twitter)	2000 tweets	Yogyakarta (IDN)
[22]	Chen et al.	Yes (Twitter)	296,924 tweets	New York (USA)
[28]	Lock and Pettit	Yes (Twitter)	55,000 tweets	Sydney (AUS)
[57]	Seliverstov et al.	No (Web reviews)	1130 reviews	Saint Petersburg (RUS)
[58]	Adilah et al.	Yes (Instagram)	1000 comments	Jakarta (IDN)
[59]	Gao et al.	Yes (Weibo)	3266 Weibo posts	Shanghai (CHN)
[79]	Pineda-Jaramillo et al.	No (TripAdvisor reviews)	1947 reviews	Mount Etna, Sicily (ITA)
[60]	Styawati et al.	Yes (Google Play Store reviews)	14,688 Gojek reviews and 15,945 Grab reviews	Bandar Lampung (IDN)
[25]	Beck et al.	No (Google Maps reviews)	8371 comments	São Paulo (BRA)
[39]	Tran et al.	Yes (Twitter)	517,000 tweets	Vancouver (CAN)
[43]	Myoya et al.	Yes (Twitter)	Not mentioned	Nairobi (KEN), Johannesburg (ZAF), Dar es Salaam (TZA)
[61]	Atmadja et al.	Yes (Twitter)	565 tweets	Bandung and Samarinda (IDN)
[86]	Aksan and Akdağ	Yes (Twitter)	206,205 tweets (UK) and 36,418 tweets (India)	London, Birmingham, Manchester (GBR) and New Delhi, Mumbai, Bengaluru (IND)
[62]	Gupta et al.	Yes (Facebook and Twitter)	1103 feedback	New Delhi (IND) and Edmonton (CAN)
[23]	Fen et al.	Yes (Twitter)	1235 tweets	Kuala Lumpur (MYS)
[63]	Kumalasari and Handayani	Yes (Instagram)	1584 reviews	Surabaya (IDN)
[19]	Jaman et al.	Yes (Twitter)	2053 tweets	Karawang (IDN)
[64]	Bakalos, Papadakis, and Litke	Yes (Twitter, Reddit)	5047 tweets	Athens (GRC)
[7]	Verma	No (bibliometric review)	353 articles	Global
[16]	Pratama et al.	Yes (Twitter)	2160 tweets	Jakarta (IDN)
[14]	Ashari, Irawan, and Setianingsih	Yes (Instagram)	3600 comments	Bandung (IDN)
[33]	Vitetta	Yes (platforms unspecified)	Not mentioned	Florence, Rome, Naples, Bari, Reggio Calabria (ITA)
[87]	Ali et al.	Yes (Twitter, TripAdvisor)	Not mentioned	Incheon (KOR)
[65]	Rohwinasakti, Irawan, and Setianingsih	Yes (Instagram)	278,179	Bandung and Jakarta (IDN)
[20]	Lin et al.	No (Online app reviews)	9200 reviews	Tainan (TWN)
[66]	Gitto and Mancuso	Yes (blogs)	895 sentences from blog content	Amsterdam (NLD), Frankfurt (DEU), London (GBR), Madrid (ESP), Paris (FRA)
[67]	Martin-Domingo et al.	Yes (Twitter)	4392 tweets	London (GBR)
[38]	Aksan and Akdağ	Yes (Twitter)	206,205 tweets	London, Birmingham, Manchester (GBR)
[68]	Aldisa, Maulana, and Aldinugroho	Yes (Twitter)	5 h of tweets collected per service	Greater Jakarta (IDN)
[41]	Giancristofaro and Panangadan	Yes (Instagram)	1010 posts	California (USA)
[69]	Arsarini et al.	Yes (Twitter, YouTube, Google Search)	5000 data points	Bali (IDN)

References

Ziedan, A.; Brakewood, C.; Watkins, K. Will Transit Recover? A Retrospective Study of Nationwide Ridership in the United States during the COVID-19 Pandemic. J. Public Transp. 2023, 25, 100046.
Martí, P.; Serrano-Estrada, L.; Nolasco-Cirugeda, A. Social Media Data: Challenges, Opportunities and Limitations in Urban Studies. Comput. Environ. Urban Syst. 2019, 74, 161–174.
Wang, S.; Zhao, Z.; Xie, Y.; Ma, M.; Chen, Z.; Wang, Z.; Su, B.; Xu, W.; Li, T. Recent Surge in Public Interest in Transportation: Sentiment Analysis of Baidu Apollo Go Using Weibo Data. arXiv 2024, arXiv:2408.10088v1.
Avetisyan, L.; Zhang, C.; Bai, S.; Pari, E.M.; Feng, F.; Bao, S.; Zhou, F. Design a Sustainable Micro-Mobility Future: Trends and Challenges in the United States and European Union Using Natural Language Processing Techniques. arXiv 2022, arXiv:2210.11714.
Wang, X.; Jiang, W.; Luo, Z. Combination of Convolutional and Recurrent Neural Network for Sentiment Analysis of Short Texts. In Proceedings of the 26th International Conference on Computational Linguistics: Technical Papers, Osaka, Japan, 11–16 December 2016; pp. 2428–2437.
Zayet, T.M.A.; Ismail, M.A.; Varathan, K.D.; Noor, R.M.D.; Chua, H.N.; Lee, A.; Low, Y.C.; Singh, S.K.J. Investigating Transportation Research Based on Social Media Analysis: A Systematic Mapping Review. Scientometrics 2021, 126, 6383–6421.
Verma, S. Sentiment Analysis of Public Services for Smart Society: Literature Review and Future Research Directions. Gov. Inf. Q. 2022, 39, 101708.
Sun, L.; Yin, Y. Discovering Themes and Trends in Transportation Research Using Topic Modeling. Transp. Res. Part C Emerg. Technol. 2017, 77, 49–66.
Kherwa, P.; Bansal, P. Topic Modeling: A Comprehensive Review. EAI Endorsed Trans. Scalable Inf. Syst. 2020, 7, 159623.
Tonkin, E.L. A Day at Work (with Text): A Brief Introduction. In Working with Text: Tools, Techniques and Approaches for Text Mining; Elsevier: Amsterdam, The Netherlands, 2016; pp. 23–60. ISBN 9781843347491.
Candelieri, A.; Archetti, F. Analyzing Tweets to Enable Sustainable, Multi-Modal and Personalized Urban Mobility: Approaches and Results from the Italian Project TAM-TAM. In WIT Transactions on the Built Environment; WITPress: Boston, MA, USA, 2014; Volume 138, pp. 373–379.
Effendy, V.; Novantirani, A.; Sabariah, M.K. Sentiment Analysis on Twitter about the Use of City Public Transportation Using Support Vector Machine Method. Int. J. Inf. Commun. Technol. 2016, 2, 57–66.
Sari, E.Y.; Wierfi, A.D.; Setyanto, A. Sentiment Analysis of Customer Satisfaction on Transportation Network Company Using Naive Bayes Classifier. In Proceedings of the International Conference on Computer Engineering Network, and Intelligent Multimedia, London, UK, 3–5 July 2019; Institute of Electrical and Electronics Engineers: Los Alamitos, CA, USA, 2019.
Ashari, D.S.; Irawan, B.; Setianingsih, C. Sentiment Analysis on Online Transportation Services Using Convolutional Neural Network Method. In Proceedings of the International Conference on Electrical Engineering, Computer Science and Informatics (EECSI), Semarang, Indonesia, 20–21 October 2021; Institute of Electrical and Electronics Engineers Inc.: New York, NY, USA, 2021; Volume 1, pp. 335–340.
Anastasia, S.; Budi, I. Twitter Sentiment Analysis of Online Transportation Service Providers. In Proceedings of the 2016 International Conference on Advanced Computer Science and Information Systems, Malang, Indonesia, 15–16 October 2016; IEEE: New York, NY, USA, 2016.
Pratama, M.O.; Satyawan, W.; Jannati, R.; Pamungkas, B.; Raspiani; Syahputra, M.E.; Neforawati, I. The Sentiment Analysis of Indonesia Commuter Line Using Machine Learning Based on Twitter Data. In Proceedings of the Journal of Physics: Conference Series, Crete, Greece, 18 April 2019; Institute of Physics Publishing: London, UK, 2019; Volume 1193.
Lopez-Fuentes, L.; Farasin, A.; Zaffaroni, M.; Skinnemoen, H.; Garza, P. Deep Learning Models for Road Passability Detection during Flood Events Using Social Media Data. Appl. Sci. 2020, 10, 8783.
Jacques, S.; Farahnak, F.; Kosseim, L. Sentiment Analysis of Tweets on Transport from Île-de-France. ACL Anthol. 2018, 2, 239–248.
Jaman, J.H.; Abdulrohman, R.; Suharso, A.; Sulistiowati, N.; Dewi, I.P. Sentiment Analysis on Utilizing Online Transportation of Indonesian Customers Using Tweets in the Normal Era and the Pandemic COVID-19 Era with Support Vector Machine. Adv. Sci. Technol. Eng. Syst. 2020, 5, 389–394.
Lin, X.M.; Ho, C.H.; Xia, L.T.; Zhao, R.Y. Sentiment Analysis of Low-Carbon Travel APP User Comments Based on Deep Learning. Sustain. Energy Technol. Assess. 2021, 44, 101014.
Hirata, E.; Matsuda, T. Examining Logistics Developments in Post-Pandemic Japan through Sentiment Analysis of Twitter Data. Asian Transp. Stud. 2023, 9, 100110.
Chen, X.; Wang, Z.; Di, X. Sentiment Analysis on Multimodal Transportation during the COVID-19 Using Social Media Data. Information 2023, 14, 113.
Fen, C.W.; Ismail, M.A.; Zayet, T.M.A.; Varathan, K.D. Sentiment Analysis of Users’ Perception Towards Public Transportation Using Twitter. Int. J. Technol. Manag. Inf. Syst. 2020, 2, 92–101.
Chaturvedi, N.; Toshniwal, D.; Parida, M. Twitter to Transport: Geo-Spatial Sentiment Analysis of Traffic Tweets to Discover People’s Feelings for Urban Transportation Issues. J. East. Asia Soc. Transp. Stud. 2019, 13, 210–220.
Beck, D.; Teixeira, M.; Maróstica, J.; Ferasso, M. Quality Perception of São Paulo Transportation Services: A Sentiment Analysis of Citizens’ Satisfaction Regarding Bus Terminuses. Rev. Gest. Ambient. Sustentabilidade 2024, 13, e23392.
Mishra, D.N.; Panda, R.K. Decoding Customer Experiences in Rail Transport Service: Application of Hybrid Sentiment Analysis. Public Transp. 2023, 15, 31–60.
Salas, A.; Georgakis, P.; Nwagboso, C.; Ammari, A.; Petalas, I. Traffic Event Detection Framework Using Social Media. In Proceedings of the IEEE International Conference on Smart Grid and Smart Cities (ICSGSC), Singapore, 23–26 July 2017.
Lock, O.; Pettit, C. Social Media as Passive Geo-Participation in Transportation Planning–How Effective Are Topic Modeling & Sentiment Analysis in Comparison with Citizen Surveys? Geo-Spat. Inf. Sci. 2020, 23, 275–292.
Muguro, J.; Njeri, W.; Matsushita, K.; Sasaki, M. Road Traffic Conditions in Kenya: Exploring the Policies and Traffic Cultures from Unstructured User-Generated Data Using NLP. IATSS Res. 2022, 46, 329–344.
Politis, I.; Georgiadis, G.; Kopsacheilis, A.; Nikolaidou, A.; Papaioannou, P. Capturing Twitter Negativity Pre-vs. Mid-COVID-19 Pandemic: An LDA Application on London Public Transport System. Sustainability 2021, 13, 13356.
Serna, A.; Gerrikagoitia, J.K.; Bernabé, U.; Ruiz, T. Sustainability Analysis on Urban Mobility Based on Social Media Content. In Transportation Research Procedia; Elsevier: Amsterdam, The Netherlands, 2017; Volume 24, pp. 1–8.
Serna, A.; Soroa, A.; Agerri, R. Applying Deep Learning Techniques for Sentiment Analysis to Assess Sustainable Transport. Sustainability 2021, 13, 2397.
Vitetta, A. Sentiment Analysis Models with Bayesian Approach: A Bike Preference Application in Metropolitan Cities. J. Adv. Transp. 2022, 2022, 2499282.
Baj-Rogowska, A. Sentiment Analysis of Facebook Posts: The Uber Case. In Proceedings of the 8th IEEE International Conference on Intelligent Computing and Information Systems, Madurai, India, 13–15 December 2018; Volume 430.
Garzia, F.; Borghini, F.; Moretti, A.; Lombardi, M.; Ramalingam, S. Emotional Analysis of Safeness and Risk Perception of Transports and Travels by Car and Motorcycle in London and Rome during the COVID-19 Pandemic. In Proceedings of the International Carnahan Conference on Security Technology, Hatfield, UK, 11–15 October 2021; Institute of Electrical and Electronics Engineers Inc.: New York, NY, USA, 2021; Volume 1.
Othman, N.; Hussin, M.; Mahmood, R.A.R. Sentiment Evaluation of Public Transport in Social Media Using Naïve Bayes Method. Int. J. Eng. Adv. Technol. 2019, 9, 2305–2308.
Saragih, M.H.; Girsang, A.S. Sentiment Analysis of Customer Engagement on Social Media in Transport Online. In Proceedings of the International Conference on Sustainable Information Engineering and Technology, Batu, Indonesia, 24–25 November 2017; IEEE: New York, NY, USA, 2017.
Aksan, A.; Akdağ, H.C. Public Opinion on UK Public Transportation Through Sentiment Analysis and Topic Modeling. In Proceedings of the 31st IEEE Conference on Signal Processing and Communications Applications, SIU 2023, Istanbul, Turkey, 5–8 July 2023; Institute of Electrical and Electronics Engineers Inc.: New York, NY, USA, 2023.
Tran, M.; Draeger, C.; Wang, X.; Nikbakht, A. Monitoring the Well-Being of Vulnerable Transit Riders Using Machine Learning Based Sentiment Analysis and Social Media: Lessons from COVID-19. Environ. Plan B Urban Anal. City Sci. 2023, 50, 60–75.
Bhardwaj, R.; Vaidya, T.; Poria, S. Towards Solving NLP Tasks with Optimal Transport Loss. J. King Saud Univ. Comput. Inf. Sci. 2022, 34, 10434–10443.
Giancristofaro, G.T.; Panangadan, A. Predicting Sentiment toward Transportation in Social Media Using Visual and Textual Features. In Proceedings of the 2016 IEEE 19th International Conference on Intelligent Transportation Systems (ITSC), Rio de Janeiro, Brazil, 1–4 November 2016; IEEE: New York, NY, USA, 2016.
Ali, F.; Kwak, D.; Khan, P.; Islam, S.M.R.; Kim, K.H.; Kwak, K.S. Fuzzy Ontology-Based Sentiment Analysis of Transportation and City Feature Reviews for Safe Traveling. Transp. Res. Part C Emerg. Technol. 2017, 77, 33–48.
Myoya, R.L. Analysing Public Transport User Sentiment. Master’s Thesis, University of Pretoria, Pretoria, South Africa, 2024.
Ajik, E.D.; Suleiman, A.B.; Ibrahim, M. Enhancing User Experience through Sentiment Analysis for Katsina State Transport Agency: A TextBlob Approach. FUDMA J. Sci. 2023, 7, 117–122.
Sari, I.C.; Ruldeviyani, Y. Sentiment Analysis of the COVID-19 Virus Infection in Indonesian Public Transportation on Twitter Data: A Case Study of Commuter Line Passengers. In Proceedings of the 2020 International Workshop on Big Data and Information Security, IWBIS 2020, Depok, Indonesia, 17 October 2020; Institute of Electrical and Electronics Engineers Inc.: New York, NY, USA, 2020; pp. 23–28.
Cao, J.; Zeng, K.; Wang, H.; Cheng, J.; Qiao, F.; Wen, D.; Gao, Y. Web-Based Traffic Sentiment Analysis: Methods and Applications. IEEE Trans. Intell. Transp. Syst. 2014, 15, 844–853.
Papapicco, C. SentiSfaction: New Cultural Way to Measure Tourist COVID-19 Mobility in Italy. Mediterr. J. Soc. Behav. Res. 2023, 7, 29–41.
Trivedi, M.; Serasiya, S. Traffic Issues Categorization of Indian Cities Using Word2Vec by Social Media Data. J. Emerg. Technol. Innov. Res. 2020, 7, 576–580.
Candelieri, A.; Archetti, F. Detecting Events and Sentiment on Twitter for Improving Urban Mobility. In Proceedings of the International Conference on Autonomous Agents and Multiagent Systems, Istanbul, Turkey, 4–8 May 2015; pp. 106–115.
Lazić, J.; Krstić, A.; Vujnović, S. Sentiment Analysis Using Optimal Transport Loss Function. In Proceedings of the 10th International Conference on Electrical, Electronic and Computing Engineering, IcETRAN 2023, East Sarajevo, Bosnia and Herzegovina, 5–8 June 2023; Institute of Electrical and Electronics Engineers Inc.: New York, NY, USA, 2023.
Pertiwi Windasari, I.; Nurul Uzzi, F.; Iman Satoto, K. Sentiment Analysis on Twitter Posts: An Analysis of Positive or Negative Opinion on GoJek. In Proceedings of the 2017 4th International Conference on Information Technology, Computer, and Electrical Engineering, Semarang, Indonesia, 18–19 October 2017; ISBN 9781538639474.
Luong, T.T.B.; Houston, D. Public Opinions of Light Rail Service in Los Angeles, an Analysis Using Twitter Data. In Proceedings of the iConference 2015 Proceedings, Newport Beach, CA, USA, 24–27 March 2015.
Pinem, Y.A. Corpus-Based Analysis of Online Hoax Discourse on Transportation Subject Picturing Indonesian Issue. Ling. Cult. 2021, 15, 7067.
Preotiuc-Pietro, D.; Gaman, M.; Aletras, N. Automatically Identifying Complaints in Social Media. In Proceedings of the Association for Computational Linguistics, Minneapolis, MN, USA, 2–7 June 2019; Volume 5008, pp. 5008–5019.
Shin, E.J. A Comparative Study of Bike-Sharing Systems from a User’s Perspective: An Analysis of Online Reviews in Three U.S. Regions between 2010 and 2018. Int. J. Sustain. Transp. 2021, 15, 908–923.
Fan, Y.; Teo, H.P.; Wan, W.X. Public Transport, Noise Complaints, and Housing: Evidence from Sentiment Analysis in Singapore. J. Reg. Sci. 2021, 61, 570–596.
Seliverstov, Y.; Seliverstov, S.; Malygin, I.; Korolev, O. Traffic Safety Evaluation in Northwestern Federal District Using Sentiment Analysis of Internet Users’ Reviews. Transp. Res. Procedia 2020, 50, 626–635.
Adilah, M.T.; Supendar, H.; Ningsih, R.; Muryani, S.; Solecha, K. Sentiment Analysis of Online Transportation Service Using the Naïve Bayes Methods. J. Phys. Conf. Ser. 2020, 1641, 012093.
Gao, S.; Ran, Q.; Su, Z.; Wang, L.; Ma, W.; Hao, R. Evaluation System for Urban Traffic Intelligence Based on Travel Experiences: A Sentiment Analysis Approach. Transp. Res. Part A Policy Pract. 2024, 187, 104170.
Styawati; Nurkholis, A.; Aldino, A.A.; Samsugi, S.; Suryati, E.; Cahyono, R.P. Sentiment Analysis on Online Transportation Reviews Using Word2Vec Text Embedding Model Feature Extraction and Support Vector Machine (SVM) Algorithm. In Proceedings of the 2021 International Seminar on Machine Learning, Optimization, and Data Science, ISMODE 2021, Jakarta, Indonesia, 29–30 January 2022; Institute of Electrical and Electronics Engineers Inc.: New York, NY, USA, 2022; pp. 163–167.
Atmadja, A.R.; Uriawan, W.; Pritisen, F.; Maylawati, D.S.; Arbain, A. Comparison of Naive Bayes and K-Nearest Neighbours for Online Transportation Using Sentiment Analysis in Social Media. J. Phys. Conf. Ser. 2019, 1402, 077029.
Gupta, P.; Mehlawat, M.K.; Khaitan, A.; Pedrycz, W. Sentiment Analysis for Driver Selection in Fuzzy Capacitated Vehicle Routing Problem with Simultaneous Pick-Up and Drop in Shared Transportation. IEEE Trans. Fuzzy Syst. 2021, 29, 1198–1211.
Kumalasari, A.T.; Handayani, W. Sentiment Analysis to Improve the Quality of Public Services “Suroboyo Bus”. Indones. Interdiscip. J. Sharia Econ. 2024, 7, 6407–6426.
Bakalos, N.; Papadakis, N.; Litke, A. Public Perception of Autonomous Mobility Using ML-Based Sentiment Analysis over Social Media Data. Logistics 2020, 4, 12.
Rohwinasakti, S.; Irawan, B.; Setianingsih, C. Sentiment Analysis on Online Transportation Service Products Using K-Nearest Neighbor Method. In Proceedings of the International Conference on Computer Information, and Telecommunication Systems CITS 2021, Istanbul, Turkey, 11–13 November 2021; Institute of Electrical and Electronics Engineers Inc.: New York, NY, USA, 2021.
Gitto, S.; Mancuso, P. Improving Airport Services Using Sentiment Analysis of the Websites. Tour. Manag. Perspect. 2017, 22, 132–136.
Martin-Domingo, L.; Martín, J.C.; Mandsberg, G. Social Media as a Resource for Sentiment Analysis of Airport Service Quality (ASQ). J. Air Transp. Manag. 2019, 78, 106–115.
Aldisa, R.T.; Maulana, P.; Aldinugroho, M. Sentiment Analysis of Public Transportation Services on Twitter Social Media Using the Method Naïve Bayes Classifier. Int. J. Inf. Syst. Technol. Akreditasi 2021, 5, 466–475.
Ayu Putu Savita Arsarini, D.; Ketut Gede Darma Putra, I.; Kadek Dwi Rusjayanthi, N. Public Sentiment Analysis of Online Transportation in Indonesia through Social Media Using Google Machine Learning. J. Ilm. Merpati 2021, 9, 153–164.
Wagner, S.; Fernández, D.M. Analyzing Text in Software Projects. In The Art and Science of Analyzing Software Data; Elsevier Inc.: Amsterdam, The Netherlands, 2015; pp. 39–72. ISBN 9780124115439.
Wu, R.; Shao, C.; Zhuge, C.; Wang, X.; Yin, X. What Do People Complain about Transport Service? Text Mining of Hotline Data Using LDA Model 2022. Available online: https://ssrn.com/abstract=4305469 (accessed on 24 November 2024).
Egger, R.; Yu, J. A Topic Modeling Comparison Between LDA, NMF, Top2Vec, and BERTopic to Demystify Twitter Posts. Front. Sociol. 2022, 7, 886498.
Kuhn, K.D. Using Structural Topic Modeling to Identify Latent Topics and Trends in Aviation Incident Reports. Transp. Res. Part C Emerg. Technol. 2018, 87, 105–122.
Tamakloe, R.; Park, D. Discovering Latent Topics and Trends in Autonomous Vehicle-Related Research: A Structural Topic Modelling Approach. Transp. Policy 2023, 139, 1–20.
Hidayatullah, A.F.; Ma’arif, M.R. Road Traffic Topic Modeling on Twitter Using Latent Dirichlet Allocation. In Proceedings of the 2017 International Conference on Sustainable Information Engineering and Technology (SIET), Malang, Indonesia, 24–25 November 2017.
Moreno, A.; Iglesias, C.A. Understanding Customers’ Transport Services with Topic Clustering and Sentiment Analysis. Appl. Sci. 2021, 11, 10169.
Kinra, A.; Beheshti-Kashi, S.; Buch, R.; Nielsen, T.A.S.; Pereira, F. Examining the Potential of Textual Big Data Analytics for Public Policy Decision-Making: A Case Study with Driverless Cars in Denmark. Transp. Policy 2020, 98, 68–78.
Esztergár-Kiss, D. Horizon 2020 Project Analysis by Using Topic Modelling Techniques in the Field of Transport. Transp. Telecommun. 2024, 25, 266–277.
Pineda-Jaramillo, J.; Fazio, M.; Le Pira, M.; Giuffrida, N.; Inturri, G.; Viti, F.; Ignaccolo, M. A Sentiment Analysis Approach to Investigate Tourist Satisfaction towards Transport Systems: The Case of Mount Etna. Transp. Res. Procedia 2023, 69, 400–407.
Roque, C.; Lourenço Cardoso, J.; Connell, T.; Schermers, G.; Weber, R. Topic Analysis of Road Safety Inspections Using Latent Dirichlet Allocation: A Case Study of Roadside Safety in Irish Main Roads. Accid. Anal. Prev. 2019, 131, 336–349.
Mendez, J.T.; Lobel, H.; Parra, D.; Herrera, J.C. Using Twitter to Infer User Satisfaction with Public Transport: The Case of Santiago, Chile. IEEE Access 2019, 7, 60255–60263.
Dou, M.; Gu, Y.; Gong, J. How Do People Perceive the Quality of Urban Transport Service? New Insights from Online Reviews of Shanghai Metro System. J. Urban Manag. 2024, 13, 705–719.
Ye, Q.; Chen, X.; Zhang, H.; Ozbay, K.; Zuo, F. Public Concerns and Response Pattern toward Shared Mobility Using Social Media Data. In Proceedings of the IEEE Intelligent Transportation Systems Conference (ITSC), Auckland, New Zealand, 1 January 2019; pp. 619–624.
Tamakloe, R.; Park, D.; Chang, H. Discovering Research Topics, Trends, and Perspectives in COVID-19-Related Transportation Journal Articles. Int. J. Urban Sci. 2022, 26, 710–738.
Kabbani, O.; Klumpenhouwer, W.; El-Diraby, T.; Shalaby, A. What Do Riders Say and Where? The Detection and Analysis of Eyewitness Transit Tweets. J. Intell. Transp. Syst. Technol. Plan. Oper. 2023, 27, 347–363.
Aksan, A.; Akdağ, H.C. Comparative Analysis of Public Transportation Through Sentiment Analysis and Topic Modeling. In Industrial Engineering in the Industry 4.0 Era; Springer Science and Business Media Deutschland GmbH: Cham, Switzerland, 2024; pp. 3–15.
Ali, F.; El-Sappagh, S.; Ali, A.; Kwak, K.S.; Ei-Sappagh, S.; Kwak, K.S.; Kwak, D. Sentiment Analysis of Transportation Using Word Embedding and LDA Approaches. In Proceedings of the Korea Institute of Communications and Information Sciences 2018 Winter General Academic Conference, Incheon, Republic of Korea, 25 January 2018.
Çaylak, P.Ç.; Kayakuş, M.; Eksili, N.; Yiğit Açikgöz, F.; Coşkun, A.E.; Ichimov, M.A.M.; Moiceanu, G. Analysing Online Reviews Consumers’ Experiences of Mobile Travel Applications with Sentiment Analysis and Topic Modelling: The Example of Booking and Expedia. Appl. Sci. 2024, 14, 11800.
Özkara, Y.; Bilişli, Y.; Yildirim, F.S.; Kayan, F.; Başdeğirmen, A.; Kayakuş, M.; Yiğit Açıkgöz, F. Analysing Social Media Discourse on Electric Vehicles with Machine Learning. Appl. Sci. 2025, 15, 4395.
Deng, J.; Liu, Y. Research on Sentiment Analysis of Online Public Opinion Based on RoBERTa–BiLSTM–Attention Model. Appl. Sci. 2025, 15, 2148.
Melhem, W.Y.; Abdi, A.; Meziane, F. Deep Learning Classification of Traffic-Related Tweets: An Advanced Framework Using Deep Learning for Contextual Understanding and Traffic-Related Short Text Classification. Appl. Sci. 2024, 14, 11009.
Laynes-Fiascunari, V.; Gutierrez-Franco, E.; Rabelo, L.; Sarmiento, A.T.; Lee, G. A Framework for Urban Last-Mile Delivery Traffic Forecasting: An In-Depth Review of Social Media Analytics and Deep Learning Techniques. Appl. Sci. 2023, 13, 5888.
Ali, F.; Kwak, D.; Khan, P.; El-Sappagh, S.; Ali, A.; Ullah, S.; Kim, K.H.; Kwak, K.S. Transportation Sentiment Analysis Using Word Embedding and Ontology-Based Topic Modeling. Knowl. Based Syst. 2019, 174, 27–42.
Shokoohyar, S.; Ghomi, V.; Jafari Gorizi, A.; Liang, W.; Sinclair, A. Impact of COVID-19 Outbreak and Vaccination on Ride-Sharing Services: A Social Media Analysis. Transp. Lett. 2024, 16, 527–541.

Figure 1. PRISMA flow diagram illustrating the step-by-step process of the literature review.

Figure 2. Publications over time. Source: OpenAlex.

Figure 3. Co-occurrences of “transport”. Source: VOSviewer.

Figure 4. Co-occurrences of “Twitter”. Source: VOSviewer.

Figure 5. Co-occurrences of “topic”. Source: VOSviewer.

Figure 6. Co-occurrences of “classification”. Source: VOSviewer.

Figure 7. Study locations.

Figure 8. Use of social media in research (percentage).

Figure 9. Data size by unique publication.

Table 1. Scientific studies using sentiment analysis in the transport domain.

Ref./Year	Method	Application	Findings
[44] 2023	Sentiment analysis using the TextBlob library; preprocessing and polarity scoring of comments from survey data.	Improving customer satisfaction for Katsina State Transport Authority through insights from user feedback.	71% positive sentiments; 9% negative, highlighting service areas needing improvement. Recommendations for operational and service enhancements based on feedback.
[45] 2020	Sentiment analysis using Naïve Bayes and Decision Tree algorithms on Twitter data; preprocessing included data cleaning, case folding, filtering, stemming, and tokenizing.	Analyzing public sentiment toward COVID-19 transmission risks among commuter line passengers in Indonesia.	Naïve Bayes outperformed Decision Tree with an average accuracy of 73.59%. Reported class counts were 135 positive, 152 negative, and 53 neutral tweets. Sentiment predominantly reflected appeals and calls for COVID-19 prevention and control.
[6] 2021	Systematic mapping review of 74 transportation-related studies using social media data published between 2008 and 2018.	Identifying trends, approaches, data types, and platforms in transportation research leveraging social media analysis; providing insights for future research.	Twitter is the most frequently used platform (72% of studies). Text data is the most commonly analyzed attribute, often combined with sentiment analysis or topic modeling. Challenges include limited access to data and insufficient integration of metadata like location.
[46] 2014	Rule-based sentiment analysis framework for processing traffic-related web data; data preprocessed with text segmentation and rule construction.	Analyzing public sentiment on traffic-related issues (e.g., “yellow light rule” and fuel prices) to support intelligent transportation systems (ITS) and policy evaluation.	Rule-based approach achieved higher accuracy for traffic-related sentiment analysis compared to existing algorithms. Demonstrated the value of public sentiment as a supplementary “social sensor” in ITS decision making.
[47] 2023	Sentiment analysis using SentiStrength, combined with Chi-square and t-tests, applied to Twitter data.	Measuring customer satisfaction and online reputation of Trenitalia rail services during the pre-COVID-19 (2019) and COVID-19 (2020) periods.	Prevalence of neutral sentiment (41% in 2019; 64% in 2020). Mixed emotions highlighted in tweets reflect the impact of the pandemic on mobility and service expectations. Differences in sentiment attributed to contextual factors, emphasizing the need for qualitative analysis alongside quantitative methods.
[48] 2020	Word2Vec classification algorithm and semantic analysis for analyzing traffic-related tweets; preprocessing included data cleaning, hashtag and URL removal, and multilingual tweet conversion.	Categorizing traffic-related issues (e.g., accidents, congestion, potholes) in Indian cities for improved traffic management.	Tweets categorized into positive, negative, and neutral sentiments. Traffic issues such as accidents, congestion, and potholes identified; significant disparities observed in tweet activity between regions of Ahmedabad.
[49] 2015	Sentiment analysis using Support Vector Machines (SVM) and delta TF-IDF; event detection based on keyword matching.	Improving urban mobility in Milan by analyzing Twitter data for detecting transport events and assessing user sentiment toward public transportation services.	SVM proved effective for sentiment classification with high accuracy. Event detection and sentiment analysis enhanced the TAM-TAM platform’s capability for trip optimization and service evaluation.
[50] 2023	Optimal Transport (OT) loss applied to sentiment analysis for minimizing misclassifications between opposite sentiment classes; compared against traditional cross-entropy loss.	Analyzing user sentiment toward U.S. airlines using the Kaggle Twitter US Airline Sentiment dataset.	OT loss reduced misclassifications between the positive and negative sentiment classes compared to cross-entropy. Overall accuracy improved slightly, demonstrating OT’s potential for enhancing sentiment classification tasks. Results emphasize OT’s utility for handling imbalanced classes in real-world applications.
[51] 2017	Sentiment analysis using Support Vector Machine (SVM) with unigram and TF-IDF for feature extraction; preprocessing included text cleaning, stemming, and tokenization.	Analyzing public sentiment about the online transportation service GoJek, categorizing tweets as positive or negative to assess customer satisfaction.	Achieved 86% accuracy in classifying tweets. Positive sentiment was identified with 100% precision, while negative sentiment showed 67.44% precision. Highlighted the potential of sentiment analysis for improving service quality and understanding customer feedback.
[52] 2015	Sentiment analysis using an English opinion lexicon; unigram-based clustering for topic analysis; text preprocessing included emoji translation and data cleaning.	Understanding public opinions about Los Angeles’ light rail system to enhance service delivery and policy-making.	Temporal analysis revealed weekly patterns, such as more positive tweets on Mondays and negative ones during weekends. Word clustering identified frequent topics (e.g., “delay,” “disable,” “dies,” “fatal”). Highlighted Twitter’s potential as a tool for real-time feedback on transit services.
[53] 2021	Corpus-based analysis using Indonesian Web Corpus (IWaC) and hoax news collection from Turnbackhoax.com; socio-pragmatic approach applied to interpret discourse prosody.	Analyzing the spread of transportation-related hoaxes in Indonesia to identify underlying social issues, such as distrust in government and unawareness of transportation laws.	Most hoaxes related to land transportation, including toll roads, police ticketing, and public transport issues. Themes often reflect real social concerns, such as economic inequality, governance mistrust, and lack of infrastructure awareness. Emphasized the role of hoaxes in mirroring unresolved societal problems in transportation.
[54] 2019	Complaint classification using logistic regression and neural models (MLP and LSTM); dataset created via manual annotation of Twitter complaints across nine domains, including transport.	Identifying and analyzing complaints on social media to enhance customer service and support dialogue systems.	Predictive performance reached an F1 score of up to 79% (using logistic regression with bag-of-words features). Transport domain exhibited lower performance due to linguistic variations in complaint expression. Proposed models significantly outperformed baseline sentiment analysis tools in complaint classification.
[55] 2021	Content analysis of Yelp reviews using a coding scheme developed from literature and refined via manual review; analyzed factors affecting user evaluations and complaints.	Investigating perceptions of bike-sharing systems in New York, Washington, D.C., and Chicago, focusing on regional and temporal variations in user satisfaction.	Average ratings were consistent across regions (around 2.6/5). Key factors influencing satisfaction included pricing, bike quality, and customer service. Highlighted the role of social media in detecting service issues and planning improvements.
[56] 2021	Sentiment analysis using the SentimentR tool for analyzing noise complaint records; Difference-in-Differences (DID) approach for causal analysis; instrumental variables used to validate proximity effects.	Investigating the impact of a new bus route on noise complaints and housing prices in Bukit Panjang, Singapore, to understand the trade-off between accessibility and environmental externalities.	The new bus route increased noise complaints by 10.9% for residents within 100 m compared to those 100–200 m away. Noise complaints were most pronounced for mid-level floors (5th–8th). Housing prices decreased by 3% with a 1-point increase in noise sentiment, offsetting 17.8% of the accessibility benefit. Demonstrated the importance of noise insulation in transit-oriented developments.
[57] 2020	Sentiment analysis using a Naïve Bayes classifier and a linear classifier with stochastic gradient descent optimization; data mining from web sources using the Scrapy framework; text vectorization through Bag-of-Words and TF-IDF Vectorizer.	Analyzing road conditions in the Northwestern Federal District of Russia through user reviews to identify problematic areas and provide recommendations for traffic safety improvements.	Classifier achieved 71.94% accuracy, categorizing reviews into positive and negative. Roads with positive reviews covered 75% of total length, while negatively reviewed roads accounted for 25%. Created a visual map highlighting road conditions, supporting data-driven decision making for traffic management.
[58] 2020	Sentiment analysis using Naïve Bayes Classifier (NBC); preprocessing included tokenization, stopword removal, and TF-IDF term weighting.	Evaluating public sentiment about Gojek online transportation services based on Instagram comments, classifying sentiments as positive or negative to inform service improvements.	Achieved an accuracy of 81% using NBC. Positive comments reflected satisfaction with promotions and service reliability, while negative comments highlighted technical issues and long waiting times. Demonstrated the utility of Instagram as a feedback channel for enhancing transportation services.
[59] 2024	Sentiment analysis using BERT (Bidirectional Encoder Representations from Transformers) for text classification and lexicon-based sentiment analysis; combined with Analytic Hierarchy Process (AHP) for cross-validation.	Developing a comprehensive evaluation system for urban traffic intelligence in Shanghai, focusing on public travel experiences gathered from surveys and social media (Weibo).	Key indicators identified: safety, efficiency, comfort, environmental friendliness, reliability, convenience, flexibility, information accessibility, and affordability. Areas needing improvement: affordability, safety, and comfort. High positive sentiment towards flexibility, environmental friendliness, and information accessibility. Demonstrated scalability for application to other cities with minimal input data.
[60] 2021	Sentiment analysis using Word2Vec (Skip-gram model) for feature extraction and Support Vector Machine (SVM) with various kernels (RBF, Linear, Polynomial) for classification.	Evaluating public sentiment on Gojek and Grab reviews from the Google Play Store to identify service strengths and weaknesses.	Gojek achieved a higher performance score: 89% accuracy, 94% precision, 86% recall, and 90% F1-score. Grab: 87% accuracy, 94% precision, 85% recall, and 89% F1-score. Demonstrated Word2Vec and SVM as effective tools for sentiment classification in online transportation reviews.
[43] 2024	Multilingual opinion mining using AfriBERTa, AfroXLMR, and AfroLM models; preprocessing involved data cleaning, trend analysis, and handling code-mixed languages.	Investigating public sentiment toward rail, bus, and mini-bus taxi systems in Kenya, South Africa, and Tanzania through Twitter data, focusing on user satisfaction and multilingual insights.	Positive sentiments were observed for reliability and accessibility, while negative sentiments focused on delays, overcrowding, and safety issues. Code-mixed datasets highlighted the complexity of user expressions, emphasizing the need for multilingual NLP tools. Demonstrated the alignment and gaps between user sentiment and service provider ratings. Results offered actionable insights for improving public transport policies and service quality in multilingual contexts.
[61] 2019	Sentiment analysis using Naïve Bayes Classifier (NBC) and K-Nearest Neighbors (KNN); preprocessing included data cleaning, tokenization, stemming, and filtering.	Comparing the performance of NBC and KNN for sentiment classification of tweets related to online transportation services (e.g., Gojek and Grab).	NBC achieved 66.15% accuracy, while KNN slightly outperformed it with 67.69% accuracy. Demonstrated that KNN provides better classification accuracy than NBC for small datasets. Identified positive and negative sentiments related to service quality, driver behavior, and application functionality. Recommendations included expanding datasets and integrating spam detection for improved accuracy.
[62] 2021	Sentiment analysis using Natural Language Processing (NLP) to extract driver ratings from customer feedback; hybrid Genetic Algorithm (GA) and fuzzy simulation to optimize vehicle routing in uncertain conditions.	Improving shared transportation systems by integrating customer sentiment into driver selection for the Vehicle Routing Problem (VRP) with simultaneous pick-up and drop services.	NLP-based sentiment analysis provided actionable insights into driver performance, linking driver selection to improved customer satisfaction. Fuzzy simulations demonstrated robust handling of uncertainties in travel durations and customer demands. Genetic Algorithm optimization reduced travel times while incorporating customer sentiment into routing decisions.
[63] 2024	Sentiment analysis using Random Forest for classification; data preprocessing involved tokenization, stemming, and case folding. Word clouds were used for visualization.	Evaluating customer feedback about the Suroboyo Bus on Instagram to identify key areas for service improvement and enhance public transportation quality in Surabaya.	Positive sentiments focused on the affordability and convenience of services. Negative sentiments highlighted issues with bus stops, routes, schedules, and the mobile app. Random Forest achieved an accuracy of 71.27% in classifying sentiments. Recommendations include optimizing bus schedules, improving the app, and addressing service gaps identified through sentiment analysis.
[64] 2020	Sentiment analysis using BERT for natural language processing; data captured via APIs from Twitter and Reddit. Preprocessing included tokenization, lemmatization, and removal of irrelevant data.	Assessing public acceptance of autonomous mobility by identifying fears and concerns regarding self-driving technology through social media data.	61.66% of Twitter posts expressed positive opinions, compared to 71.72% on Reddit. Negative sentiments were associated with safety concerns, cybersecurity fears, and employment issues. Identified specific concerns like combining autonomous and conventional vehicles, liability in accidents, and the potential decline of driving as a hobby. Recommendations include addressing technophobia and enhancing public understanding of autonomous technology.
[7] 2022	Comprehensive bibliometric review of 353 research articles using VOSviewer for co-citation and network analysis. Data preprocessing included keyword identification, clustering, and co-occurrence mapping.	Mapping the thematic and intellectual structure of sentiment analysis in public services for smart society, including applications in traffic congestion, governance, and urban planning.	Identified motor themes like information retrieval and supervised learning as drivers for smart societies. Emerging themes included social media data for urban planning and location-based services. Sentiment analysis facilitates innovation, transparency, citizen participation, and improved efficiency in public service management. Highlighted gaps in cross-domain application and the need for advanced algorithms to process unstructured social media data.
[65] 2020	Sentiment analysis using the K-Nearest Neighbor (KNN) algorithm; preprocessing included tokenization, case folding, stopword removal, and stemming. Data extracted from Instagram comments related to two Indonesian online transportation providers (BRG and KJG).	Analyzing user satisfaction with online transportation services to identify positive and negative customer sentiments, aiding service improvement.	BRG: 35.6% positive comments, 64.4% negative comments. KJG: 35.9% positive comments, 64.1% negative comments. Achieved 94.4% accuracy, precision, recall, and F1-score with a training-test split of 95:5. Demonstrated KNN’s effectiveness for analyzing social media data to gauge public opinion.
[66] 2017	Sentiment analysis using KNIME and Semantria API; document-level analysis of blog content using a dictionary-based and machine learning approach.	Assessing customer satisfaction levels with aviation and non-aviation services at five major European airports based on blog data.	Non-aviation services (food, beverage, shopping) had higher positive sentiment (55%) than aviation services (check-in, baggage claim, security control) with only 33% positive feedback. Identified critical areas for improvement, including food and beverage quality, and check-in and security procedures. Provided actionable insights for airport management to prioritize resource allocation and enhance passenger experience.
[67] 2019	Sentiment analysis using Theysay and Twinword tools for analyzing 4392 tweets related to Heathrow Airport’s services. Preprocessing included keyword extraction and mapping to Airport Service Quality (ASQ) attributes.	Evaluating passenger sentiment on various airport service aspects to complement traditional methods like surveys, providing insights into areas needing improvement.	Positive sentiments were higher for attributes like WiFi, Food and Beverage, and Lounge services. Negative sentiments focused on Waiting, Parking, Passport Control, and Staff interactions. The analysis highlighted gaps in service quality, guiding actionable improvements.
[68] 2021	Sentiment analysis using the Naïve Bayes Classifier (NBC). Data preprocessing included tokenization, stemming, and filtering. Tweets related to Gojek, Grab, Commuter Line, and Transjakarta services were analyzed.	Evaluating public sentiment about major transportation services in Greater Jakarta based on Twitter data to identify service strengths and weaknesses.	Commuter Line: Positive sentiment dominated with a 0.333 probability and no negative tweets. Gojek: 8.4% positive and 50.4% negative probabilities. Grab: 6.7% positive and 79% negative probabilities. Transjakarta: 2.4% positive and 11.9% negative probabilities. Overall, negative sentiment outweighed positive sentiment for most services except for the Commuter Line. Recommendations included leveraging social media feedback to improve service quality and customer interaction.
[69] 2021	Sentiment analysis using Google Machine Learning API (Natural Language API); data preprocessing included data crawling, cleansing, and filtering. Sentiment labeling was based on a threshold system (positive, neutral, negative).	Evaluating public sentiment toward online transportation providers in Indonesia (Gojek, Grab, and Bluebird) to inform service improvements.	Sentiment distribution for Gojek: 495 positive, 853 neutral, 1054 negative tweets. Sentiment distribution for Grab: 385 positive, 406 neutral, 429 negative tweets. Sentiment distribution for Bluebird: 21 positive, 49 neutral, 17 negative tweets. Gojek received the highest proportion of negative sentiments. Performance of the sentiment analysis system achieved 82.6% accuracy, 82.2% precision, and 83.3% recall. Demonstrated Twitter as the most valuable platform for sentiment data collection compared to other platforms like YouTube and Google Search.
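
Several rows above (e.g., [57], [58], [61], [68]) describe the same basic pipeline: tokenize the text, build a bag-of-words representation, train a Naïve Bayes classifier, and report accuracy, precision, recall, or F1-score. The following is a minimal sketch of that pipeline in plain Python. It assumes Laplace (add-one) smoothing and a simple regex tokenizer, and the four training comments are invented for illustration; none of the names, data, or parameter choices come from the reviewed studies.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and keep alphabetic tokens (stand-in for case folding + tokenization)."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes over bag-of-words counts (assumed add-one smoothing)."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)        # documents per class
        self.word_counts = defaultdict(Counter)    # word counts per class
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label, n_docs in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(n_docs / total_docs)  # log prior
            for word in tokenize(text):
                if word not in self.vocab:
                    continue                       # skip words never seen in training
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            scores[label] = score
        return max(scores, key=scores.get)


def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(y_true, y_pred, positive="positive"):
    """Binary precision, recall, and F1 with respect to the `positive` label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_from(precision, recall)


# Hypothetical training comments, invented for this sketch.
train_texts = [
    "driver was friendly and the ride was fast",
    "great promo and reliable service",
    "app keeps crashing and long waiting time",
    "driver cancelled and the app is slow",
]
train_labels = ["positive", "positive", "negative", "negative"]
model = NaiveBayes().fit(train_texts, train_labels)
```

The `f1_from` helper also works as a consistency check on reported metrics: for the values listed for [60], `round(f1_from(0.94, 0.86), 2)` gives 0.90 (Gojek) and `round(f1_from(0.94, 0.85), 2)` gives 0.89 (Grab), matching the F1-scores in the table.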

Table 2. Scientific studies using sentiment analysis and topic modeling in the transport domain.

Ref./Year	Method	Application	Findings
[73] 2018	Structural Topic Modeling (STM) applied to aviation safety incident reports (ASRS database), using metadata (flight mission, phase of flight, etc.) to uncover latent topics and trends.	Identifying themes and trends in aviation safety to inform safety priorities and future research.	Demonstrated STM’s utility for integrating metadata and narrative text for aviation safety insights.
[80] 2019	Latent Dirichlet Allocation (LDA) applied to Road Safety Inspection (RSI) reports to identify patterns in roadside safety issues and interventions.	Assessing road safety on Irish roads, focusing on run-off-road crashes and identifying common problems and corresponding interventions.	Common roadside hazards include poles, roadside barriers, and walls. Issues are more frequently identified than solutions, indicating better intervention strategies are needed. Lack of application of “clear zone” and “forgiving roadside” concepts in Irish RSI reports.
[81] 2019	Sentiment analysis using manual classification and topic modeling with MALLET; validation of results via comparative analysis with traditional surveys.	Measuring user satisfaction with the Transantiago public transport system in Santiago, Chile, using Twitter data to complement traditional surveys.	Twitter primarily captures negative sentiment (75%) compared to surveys, which report more balanced feedback. Thematic topics from tweets aligned well with operational issues yet revealed biases towards rush hours and higher socioeconomic areas. Twitter data provides broader spatial coverage than surveys but offers less depth per stop or service.
[82] 2024	Structural Topic Model (STM) applied to 52,087 online reviews from Dianping.com; preprocessing included text segmentation, stopword removal, and part-of-speech tagging.	Assessing the quality of service in the Shanghai Metro system by identifying key themes, analyzing sentiment polarity, and exploring temporal and spatial patterns.	Temporal analysis showed a rise in experience-related topics and a decline in physical operations-related topics. Spatial clustering revealed differing priorities across station types: residential areas emphasized commuting needs, business districts highlighted design and operational aspects, and other areas focused on cleanliness and security.
[83] 2019	Latent Dirichlet Allocation (LDA) and dictionary-based sentiment analysis applied to 13,738 Weibo messages; preprocessing included keyword filtering and term weighting using TF-IDF.	Understanding public concerns and sentiment patterns related to ride-hailing security following a Didi driver murder case to improve service platforms and government regulations.	Four key concerns: ride-hailing service quality (35%), platform accountability (25%), government regulation on market entry, and case-related discussions. Negative sentiment peaked after the incident, highlighting dissatisfaction with driver background checks and crisis response. Positive sentiment was associated with competition driving service improvements.
[84] 2022	Structural Topic Model (STM) applied to abstracts from 421 COVID-19-related transportation research articles to identify latent themes and analyze trends and perspectives.	Investigating how the pandemic influenced transportation research priorities and comparing research focuses between high-income countries (HICs) and middle- and low-income countries (MLICs).	Key topics included travel behavior changes, airport financial performance, air transport recovery, and logistics optimization. Emerging areas included shipping emissions, active transportation, and traffic safety. HIC authors emphasized shared mobility and safety, while MLIC researchers focused on logistics efficiency and pandemic mitigation strategies.
[85] 2023	Sentiment analysis using Valence Aware Dictionary and Sentiment Reasoner (VADER) combined with topic modeling and geospatial analysis; preprocessing involved tweet cleaning, geocoding, and location matching.	Detecting and analyzing eyewitness transit-related tweets for incident management in Calgary Transit, identifying urgent issues, and supporting service improvement.	Safety and security incidents were the most frequently reported topics, followed by ride quality and travel time. Tweets were primarily negative or neutral, reflecting complaints about crowded buses, safety concerns, and service delays. Spatial analysis highlighted high tweet activity in downtown Calgary, enabling targeted service responses. Demonstrated potential for integrating social media insights into real-time transit management.
[86] 2024	Sentiment analysis using RoBERTa for polarity classification (positive, neutral, negative) combined with Latent Dirichlet Allocation (LDA) for topic modeling. Data preprocessing included tokenization, lemmatization, and removal of stopwords.	Understanding public sentiment and identifying key issues and strengths in public transportation in the UK and India based on Twitter data.	Recommendations include addressing affordability and safety in the UK and enhancing public transport infrastructure in India.
[87] 2018	Sentiment analysis using Latent Dirichlet Allocation (LDA) for topic modeling and Word2Vec (skip-gram model) for feature representation. Data preprocessing included filtering, stopword removal, lemmatization, and cleaning. Classification applied machine learning models, including Decision Tree, SVM, Logistic Regression, Naïve Bayes, Random Forest, and Neural Network.	Analyzing transportation-related data from Twitter, New York Times, and TripAdvisor to predict polarity and enhance Intelligent Transportation Systems (ITS).	The proposed system improved the accuracy of sentiment classification by integrating topic modeling and word embedding. Neural network models outperformed traditional classifiers in sentiment prediction. Highlighted the importance of domain-specific ontologies for refining feature extraction.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

