Search Results (65)

Search Parameters:
Keywords = short text mining

28 pages, 2850 KiB  
Article
Quantification and Evolution of Online Public Opinion Heat Considering Interactive Behavior and Emotional Conflict
by Zhengyi Sun, Deyao Wang and Zhaohui Li
Entropy 2025, 27(7), 701; https://doi.org/10.3390/e27070701 - 29 Jun 2025
Viewed by 266
Abstract
With the rapid development of the Internet, the speed and scope of sudden public events disseminating in cyberspace have grown significantly. Current methods of quantifying public opinion heat often neglect emotion-driven factors and user interaction behaviors, making it difficult to accurately capture fluctuations during dissemination. To address these issues, first, this study addressed the complexity of interaction behaviors by introducing an approach that employs the information gain ratio as a weighting indicator to measure the “interaction heat” contributed by different interaction attributes during event evolution. Second, this study built on SnowNLP and expanded textual features to conduct in-depth sentiment mining of large-scale opinion texts, defining the variance of netizens’ emotional tendencies as an indicator of emotional fluctuations, thereby capturing “emotional heat”. We then integrated the interactive behavior and emotional conflict assessments into a comprehensive heat index for the quantification and dynamic evolution analysis of online public opinion heat. Subsequently, we used the Hodrick–Prescott filter to separate long-term trends from short-term fluctuations, extracted six key quantitative features (number of peaks, time of first peak, maximum amplitude, decay time, peak emotional conflict, and overall duration), and applied the K-means clustering algorithm to classify events into three propagation patterns: extreme burst, normal burst, and long-tail. Finally, this study conducted ablation experiments on critical external intervention nodes to quantify the distinct contribution of each intervention to the propagation trend by observing changes in the model’s goodness-of-fit (R²) after removing different interventions. Through an empirical analysis of six representative public opinion events from 2024, this study verified the effectiveness of the proposed framework and uncovered critical characteristics of opinion dissemination, including explosiveness versus persistence, multi-round dissemination with recurring emotional fluctuations, and the interplay of multiple driving factors. Full article
(This article belongs to the Special Issue Statistical Physics Approaches for Modeling Human Social Systems)
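The “emotional heat” idea in the abstract above (variance of sentiment tendencies as a measure of emotional conflict) can be illustrated with a minimal sketch. It is not the authors’ implementation: the function name `emotional_heat`, the time-window labels, and the toy scores are assumptions, and the scores are taken to lie in [0, 1], as a sentiment scorer such as SnowNLP would produce.

```python
from collections import defaultdict
from statistics import pvariance


def emotional_heat(scored_posts):
    """Map each time window to the population variance of its sentiment scores.

    scored_posts: iterable of (window, score) pairs; scores are assumed to be
    sentiment values in [0, 1] produced by an upstream sentiment scorer.
    """
    by_window = defaultdict(list)
    for window, score in scored_posts:
        by_window[window].append(score)
    # High variance means opinions in that window are strongly divided.
    return {window: pvariance(scores) for window, scores in by_window.items()}


# Toy data (hypothetical): day1 is polarized, day2 is unanimous.
posts = [("day1", 0.9), ("day1", 0.1), ("day2", 0.5), ("day2", 0.5)]
heat = emotional_heat(posts)
```

In this toy input the polarized window receives the larger value, which is the behavior the abstract attributes to its emotional-conflict indicator.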

22 pages, 561 KiB  
Article
Opinion Mining and Analysis Using Hybrid Deep Neural Networks
by Adel Hidri, Suleiman Ali Alsaif, Muteeb Alahmari, Eman AlShehri and Minyar Sassi Hidri
Technologies 2025, 13(5), 175; https://doi.org/10.3390/technologies13050175 - 28 Apr 2025
Viewed by 520
Abstract
Understanding customer attitudes has become a critical component of decision-making due to the growing influence of social media and e-commerce. Text-based opinions are the most structured, hence playing an important role in sentiment analysis. Most of the existing methods, which include lexicon-based approaches and traditional machine learning techniques, are insufficient for handling contextual nuances and scalability. While the latter has limitations in model performance and generalization, deep learning (DL) has achieved improvement, especially on semantic relationship capturing with recurrent neural networks (RNNs) and convolutional neural networks (CNNs). The aim of the study is to enhance opinion mining by introducing a hybrid deep neural network model that combines a bidirectional gated recurrent unit (BGRU) and long short-term memory (LSTM) layers to improve sentiment analysis, particularly addressing challenges such as contextual nuance, scalability, and class imbalance. To substantiate the efficacy of the proposed model, we conducted comprehensive experiments utilizing benchmark datasets, encompassing IMDB movie critiques and Amazon product evaluations. The introduced hybrid BGRU-LSTM (HBGRU-LSTM) architecture attained a testing accuracy of 95%, exceeding the performance of traditional DL frameworks such as LSTM (93.06%), CNN+LSTM (93.31%), and GRU+LSTM (92.20%). Moreover, our model exhibited a noteworthy enhancement in recall for negative sentiments, escalating from 86% (unbalanced dataset) to 96% (balanced dataset), thereby ensuring a more equitable and just sentiment classification. Furthermore, the model diminished misclassification loss from 20.24% for unbalanced to 13.3% for balanced dataset, signifying enhanced generalization and resilience. Full article
(This article belongs to the Section Information and Communication Technologies)

19 pages, 2579 KiB  
Article
Predicting Workplace Hazard, Stress and Burnout Among Public Health Inspectors: An AI-Driven Analysis in the Context of Climate Change
by Ioannis Adamopoulos, Antonios Valamontes, Panagiotis Tsirkas and George Dounias
Eur. J. Investig. Health Psychol. Educ. 2025, 15(5), 65; https://doi.org/10.3390/ejihpe15050065 - 22 Apr 2025
Viewed by 1037
Abstract
The increasing severity of climate-related workplace hazards challenges occupational health and safety, particularly for Public Health and Safety Inspectors. Exposure to extreme temperatures, air pollution, and high-risk environments heightens immediate physical threats and long-term burnout. This study employs Artificial Intelligence (AI)-driven predictive analytics and secondary data analysis to assess hazards and forecast burnout risks. Machine learning models, including eXtreme Gradient Boosting (XGBoost 3.0), Random Forest, Autoencoders, and Long Short-Term Memory (LSTMs), achieved 85–90% accuracy in hazard prediction, reducing workplace incidents by 35% over six months. Burnout risk analysis identified key predictors: physical hazard exposure (β = 0.76, p < 0.01), extended work hours (>10 h/day, +40% risk), and inadequate training (β = 0.68, p < 0.05). Adaptive workload scheduling and fatigue monitoring reduced burnout prevalence by 28%. Real-time environmental data improved hazard detection, while Natural Language Processing (NLP)-based text mining identified stress-related indicators in worker reports. The results demonstrate AI’s effectiveness in workplace safety, predicting, classifying, and mitigating risks. Reinforcement learning-based adaptive monitoring optimizes workforce well-being. Expanding predictive-driven occupational health frameworks to broader industries could enhance safety protocols, ensuring proactive risk mitigation. Future applications include integrating biometric wearables and real-time physiological monitoring to improve predictive accuracy and strengthen occupational resilience. Full article

28 pages, 6540 KiB  
Article
Leveraging Spectral Clustering and Long Short-Term Memory Techniques for Green Hotel Recommendations in Saudi Arabia
by Abdullah Alghamdi
Sustainability 2025, 17(5), 2328; https://doi.org/10.3390/su17052328 - 6 Mar 2025
Viewed by 885
Abstract
Online recommendation agents have demonstrated their value in various contexts by helping users navigate information overload, supporting decision-making, and influencing user behavior. There is a lack of studies focusing on recommendation systems for green hotels that utilize user-generated content from social networking and e-commerce platforms. While numerous studies have explored the use of real-world datasets for hotel recommendations, the development of recommendation systems specifically for green hotels remains underexplored, particularly in the context of Saudi Arabia. This study attempts to develop a new approach for green hotel recommendations using text mining and Long Short-Term Memory techniques. Latent Dirichlet Allocation is used to identify the main aspects of users’ preferences from the user-generated content, which will help the recommender system to provide more accurate recommendations to the users. Long Short-Term Memory is used for preference prediction based on numerical ratings. To better perform recommendations, a clustering technique is used to overcome the scalability issue of the proposed recommender system, specifically when there is a large amount of data in the datasets. Specifically, a spectral clustering algorithm is used to cluster the users’ ratings on green hotels. To evaluate the proposed recommendation method, 4684 reviews were collected from Saudi Arabia’s green hotels on the TripAdvisor platform. The method was evaluated for its effectiveness in solving sparsity issues, recommendation accuracy, and scalability. It was found that Long Short-Term Memory better predicts the customers’ overall ratings on green hotels. The comparison results demonstrated that the proposed method provides the highest precision (Precision at Top @5 = 89.44, Precision at Top @7 = 88.21) and lowest prediction error (Mean Absolute Error = 0.84) in hotel recommendations. The author discusses the results and presents the research implications based on the findings of the proposed method. Full article

17 pages, 1581 KiB  
Article
Research on Automatic Classification of Mine Safety Hazards Using Pre-Trained Language Models
by Xingbang Qiang, Guoqing Li, Jie Hou and Chunchao Fan
Electronics 2025, 14(5), 1001; https://doi.org/10.3390/electronics14051001 - 1 Mar 2025
Viewed by 797
Abstract
The advancement of pre-trained language models (PLMs) has provided new avenues for addressing text classification challenges. This study investigates the applicability of PLMs in the categorization and automatic classification of short-text safety hazard information specifically within mining industry contexts. Leveraging the superior word embedding capabilities of encoder-based PLMs, the standardized hazard description data collected from mine safety supervision systems were vectorized while preserving semantic information. Utilizing the BERTopic model, the study successfully mined hazard category information, which was subsequently manually consolidated and labeled to form a standardized dataset for training classification models. A text classification framework based on both encoder and decoder models was designed, and the classification outcomes were compared with those from ensemble learning models constructed using Naive Bayes, XGBoost, TextCNN, etc. The results demonstrate that decoder-based PLMs exhibit superior classification accuracy and generalization capabilities for semantically complex safety hazard descriptions, compared to Non-PLMs and encoder-based PLMs. Additionally, the study concludes that selecting a classification model requires a comprehensive consideration of factors such as classification accuracy and training costs to achieve a balance between performance, efficiency, and cost. This research offers novel insights and methodologies for short-text classification tasks, particularly in the application of PLMs in mine safety management and hazard analysis, laying a foundation for subsequent related studies and further improvements in mine safety management practices. Full article
(This article belongs to the Section Artificial Intelligence)

37 pages, 1098 KiB  
Article
Evaluating the Current and Future of Corporation and Hotel Industry Performance by Analyzing CEO Messages Using the SBSC Framework Assessment
by Hyeon Kang, Fan Meng and Hyung Jong Na
Sustainability 2025, 17(5), 2109; https://doi.org/10.3390/su17052109 - 28 Feb 2025
Viewed by 654
Abstract
The hotel industry has faced significant challenges in both the short and long term, particularly due to the impact of COVID-19, highlighting the need for strategic adjustments to ensure sustainability and growth. This study investigates the strategic elements emphasized in CEO messages published on hotel company websites and their relationship with current and future corporate performance. Utilizing text mining techniques and the Sustainability Balanced Scorecard (SBSC) framework, this research classifies key strategies into financial, customer, internal business processes, learning and growth, social responsibility, and security and safety perspectives. Empirical analysis using the 2SLS regression model reveals several critical findings. First, strategies emphasizing financial and customer perspectives positively influence future corporate performance, though no significant impact is observed on current performance. Second, internal business processes and learning and growth perspectives show no statistically significant relationship with either current or future corporate performance. Third, social responsibility initiatives have an immediate positive effect on current performance, but their long-term impact is negligible. Finally, security and safety perspectives negatively affect both short-term and long-term corporate performance, largely due to the associated costs. However, additional network analysis demonstrates that security and safety factors are interconnected with other strategic elements, suggesting their complementary importance for overall operational stability. The study highlights the critical role of transparent information delivery through CEO messages as a tool for communicating corporate vision and strategy, offering insights into how stakeholders can utilize this information for decision-making. These findings provide valuable theoretical and managerial implications for sustainable hotel management in a post-pandemic environment. Full article
(This article belongs to the Section Economic and Business Aspects of Sustainability)

19 pages, 2662 KiB  
Article
Identifying Persons of Interest in Digital Forensics Using NLP-Based AI
by Jonathan Adkins, Ali Al Bataineh and Majd Khalaf
Future Internet 2024, 16(11), 426; https://doi.org/10.3390/fi16110426 - 18 Nov 2024
Cited by 2 | Viewed by 5517
Abstract
The field of digital forensics relies on expertise from multiple domains, including computer science, criminology, and law. It also relies on different toolsets and an analyst’s expertise to parse enormous amounts of user-generated data to find clues that help crack a case. This process of investigative analysis is often done manually. Artificial Intelligence (AI) can provide practical solutions to efficiently mine enormous amounts of data to find useful patterns that can be leveraged to investigate crimes. Natural Language Processing (NLP) is a subdomain of research under AI that deals with problems involving unstructured data, specifically language. The domain of NLP includes several tools to parse text, including topic modeling, pairwise correlation, word vector cosine distance measurement, and sentiment analysis. In this research, we propose a digital forensic investigative technique that uses an ensemble of NLP tools to identify a person of interest list based on a corpus of text. Our proposed method serves as a type of human feature reduction, where a total pool of suspects is filtered down to a short list of candidates who possess a higher correlation with the crime being investigated. Full article

24 pages, 5059 KiB  
Article
Hazard Analysis for Massive Civil Aviation Safety Oversight Reports Using Text Classification and Topic Modeling
by Yaxi Xu, Zurui Gan, Rengang Guo, Xin Wang, Ke Shi and Pengfei Ma
Aerospace 2024, 11(10), 837; https://doi.org/10.3390/aerospace11100837 - 11 Oct 2024
Cited by 3 | Viewed by 1190
Abstract
There are massive amounts of civil aviation safety oversight reports collected each year in the civil aviation of China. The narrative texts of these reports are typically short texts, recording the abnormal events detected during the safety oversight process. In the construction of an intelligent civil aviation safety oversight system, the automatic classification of safety oversight texts is a key and fundamental task. However, all safety oversight reports are currently analyzed and classified into categories by manual work, which is time consuming and labor intensive. In recent years, pre-trained language models have been applied to various text mining tasks and have proven to be effective. The aim of this paper is to apply text classification to the mining of these narrative texts and to show that text classification technology can be a critical element of the aviation safety oversight report analysis. In this paper, we propose a novel method for the classification of narrative texts in safety oversight reports. Through extensive experiments, we validated the effectiveness of all the proposed components. The experimental results demonstrate that our method outperforms existing methods on the self-built civil aviation safety oversight dataset. This study undertakes a thorough examination of the precision and associated outcomes of the dataset, thereby establishing a solid basis for furnishing valuable insights to enhance data quality and optimize information. Full article
(This article belongs to the Special Issue Machine Learning for Aeronautics (2nd Edition))

34 pages, 3038 KiB  
Article
Topic Extraction: BERTopic’s Insight into the 117th Congress’s Twitterverse
by Margarida Mendonça and Álvaro Figueira
Informatics 2024, 11(1), 8; https://doi.org/10.3390/informatics11010008 - 17 Feb 2024
Cited by 12 | Viewed by 3507
Abstract
As social media (SM) becomes increasingly prevalent, its impact on society is expected to grow accordingly. While SM has brought positive transformations, it has also amplified pre-existing issues such as misinformation, echo chambers, manipulation, and propaganda. A thorough comprehension of this impact, aided by state-of-the-art analytical tools and by an awareness of societal biases and complexities, enables us to anticipate and mitigate the potential negative effects. One such tool is BERTopic, a novel deep-learning algorithm developed for Topic Mining, which has been shown to offer significant advantages over traditional methods like Latent Dirichlet Allocation (LDA), particularly in terms of its high modularity, which allows for extensive personalization at each stage of the topic modeling process. In this study, we hypothesize that BERTopic, when optimized for Twitter data, can provide a more coherent and stable topic modeling. We began by conducting a review of the literature on topic-mining approaches for short-text data. Using this knowledge, we explored the potential for optimizing BERTopic and analyzed its effectiveness. Our focus was on Twitter data spanning the two years of the 117th US Congress. We evaluated BERTopic’s performance using coherence, perplexity, diversity, and stability scores, finding significant improvements over traditional methods and the default parameters for this tool. We discovered that improvements are possible in BERTopic’s coherence and stability. We also identified the major topics of this Congress, which include abortion, student debt, and Judge Ketanji Brown Jackson. Additionally, we describe a simple application we developed for a better visualization of Congress topics. Full article

20 pages, 897 KiB  
Article
Finite State Automata on Multi-Word Units for Efficient Text-Mining
by Alberto Postiglione
Mathematics 2024, 12(4), 506; https://doi.org/10.3390/math12040506 - 6 Feb 2024
Cited by 5 | Viewed by 2289
Abstract
Text mining is crucial for analyzing unstructured and semi-structured textual documents. This paper introduces a fast and precise text mining method based on a finite automaton to extract knowledge domains. Unlike simple words, multi-word units (such as credit card) are emphasized for their efficiency in identifying specific semantic areas due to their predominantly monosemic nature, their limited number and their distinctiveness. The method focuses on identifying multi-word units within terminological ontologies, where each multi-word unit is associated with a sub-domain of ontology knowledge. The algorithm, designed to handle the challenges posed by very long multi-word units composed of a variable number of simple words, integrates user-selected ontologies into a single finite automaton during a fast pre-processing step. At runtime, the automaton reads input text character by character, efficiently locating multi-word units even if they overlap. This approach is efficient for both short and long documents, requiring no prior training. Ontologies can be updated without additional computational costs. An early system prototype, tested on 100 short and medium-length documents, recognized the knowledge domains for the vast majority of texts (over 90%) analyzed. The authors suggest that this method could be a valuable semantic-based knowledge domain extraction technique in unstructured documents. Full article
(This article belongs to the Special Issue Recent Trends and Advances in the Natural Language Processing)
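The character-by-character matching of multi-word units described in the abstract above can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper’s algorithm: it tracks a set of active trie states rather than compiling a single deterministic automaton, it ignores word boundaries, and the names `build_trie` and `find_units`, the `"$end"` marker, and the toy ontology are hypothetical.

```python
def build_trie(units):
    """Build a character trie; units maps each multi-word unit to a domain label."""
    root = {}
    for unit, domain in units.items():
        node = root
        for ch in unit:
            node = node.setdefault(ch, {})
        # "$end" is longer than one character, so it cannot clash with a
        # single-character edge key.
        node["$end"] = (unit, domain)
    return root


def find_units(text, trie):
    """Scan text one character at a time and report every unit found.

    Returns (start_index, unit, domain) tuples, including overlapping matches.
    """
    matches = []
    active = []  # trie nodes of partial matches still in progress
    for i, ch in enumerate(text):
        active.append(trie)  # a new match may start at this character
        next_active = []
        for node in active:
            child = node.get(ch)
            if child is not None:
                if "$end" in child:
                    unit, domain = child["$end"]
                    matches.append((i - len(unit) + 1, unit, domain))
                next_active.append(child)
        active = next_active
    return matches


# Toy ontology (hypothetical): two units that overlap on the word "card".
ontology = {"credit card": "finance", "card reader": "hardware"}
found = find_units("a credit card reader", build_trie(ontology))
```

Here both units are reported even though they share the word “card”, mirroring the overlap handling the abstract mentions. A production version would likely add word-boundary checks and precompute failure links (as in Aho–Corasick) to avoid the active-state list.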

25 pages, 13288 KiB  
Article
Quantitative Evaluation of Eco-Environmental Protection Policy in the Yangtze River Economic Belt: A PMC-Index Model Approach
by Zeyu Wang, Yachao Xiong and Changli Zhang
Sustainability 2024, 16(2), 805; https://doi.org/10.3390/su16020805 - 17 Jan 2024
Cited by 7 | Viewed by 2364
Abstract
The eco-environmental protection policy of the Yangtze River Economic Belt (YREB) is paramount in upholding biodiversity and fostering sustainable development within the Yangtze River Basin. To assess the effectiveness of this policy, an evaluation system was established utilising text mining and the PMC-Index model. Subsequently, thirteen representative policies were evaluated, and their performance was visualised through PMC-Surface plots. The study showed that nine of the thirteen representative policies were assessed as “Excellent”, the remaining four were assessed as “Acceptable”, and no policy was assessed as either “Perfect” or “Poor”. It shows that the general design of the eco-environmental protection policy of the YREB is reasonable and scientific but still has much to improve. The performance is as follows: short- and medium-term policies are the most prevalent, while long-term planning is lacking; the issuing agency is relatively single, and the awareness and capacity of collaborative governance need to be strengthened; the regulatory scope of local policies does not focus on the YREB as a whole. Based on this, subsequent policies should be improved by focusing on policy timeliness, the policy issuing agency, and the regulation scope. Full article
(This article belongs to the Section Environmental Sustainability and Applications)

14 pages, 4268 KiB  
Article
Advanced Sampling Technique in Radiology Free-Text Data for Efficiently Building Text Mining Models by Deep Learning in Vertebral Fracture
by Wei-Chieh Hung, Yih-Lon Lin, Chi-Wei Lin, Wei-Leng Chin and Chih-Hsing Wu
Diagnostics 2024, 14(2), 137; https://doi.org/10.3390/diagnostics14020137 - 8 Jan 2024
Cited by 3 | Viewed by 1832
Abstract
This study aims to establish advanced sampling methods in free-text data for efficiently building semantic text mining models using deep learning, such as identifying vertebral compression fracture (VCF) in radiology reports. We enrolled a total of 27,401 radiology free-text reports of X-ray examinations of the spine. The predictive effects were compared between text mining models built using supervised long short-term memory networks, independently derived by four sampling methods (vector sum minimization, vector sum maximization, stratified, and simple random sampling) at four fixed sampling ratios. The drawn samples were used as the training set, and the remaining samples were used to validate each combination of sampling method and ratio. The predictive accuracy for identifying VCF was measured using the area under the receiver operating characteristic curve (AUROC). At the sampling ratios of 1/10, 1/20, 1/30, and 1/40, vector sum minimization achieved the highest AUROCs: 0.981 (95% CI: 0.980–0.983), 0.963 (95% CI: 0.961–0.965), 0.907 (95% CI: 0.904–0.911), and 0.895 (95% CI: 0.891–0.899), respectively. The lowest AUROC was observed for vector sum maximization. This study proposes an advanced sampling method, vector sum minimization, in free-text data that can be efficiently applied to build text mining models by drawing a small number of critical representative samples. Full article
(This article belongs to the Special Issue AI and Big Data in Healthcare)

15 pages, 1559 KiB  
Article
Utilizing an Attention-Based LSTM Model for Detecting Sarcasm and Irony in Social Media
by Deborah Olaniyan, Roseline Oluwaseun Ogundokun, Olorunfemi Paul Bernard, Julius Olaniyan, Rytis Maskeliūnas and Hakeem Babalola Akande
Computers 2023, 12(11), 231; https://doi.org/10.3390/computers12110231 - 14 Nov 2023
Cited by 14 | Viewed by 4510
Abstract
Sarcasm and irony represent intricate linguistic forms in social media communication, demanding nuanced comprehension of context and tone. In this study, we propose an advanced natural language processing methodology utilizing long short-term memory with an attention mechanism (LSTM-AM) to achieve an impressive accuracy of 99.86% in detecting and interpreting sarcasm and irony within social media text. Our approach involves innovating novel deep learning models adept at capturing subtle cues, contextual dependencies, and sentiment shifts inherent in sarcastic or ironic statements. Furthermore, we explore the potential of transfer learning from extensive language models and integrating multimodal information, such as emojis and images, to heighten the precision of sarcasm and irony detection. Rigorous evaluation against benchmark datasets and real-world social media content showcases the efficacy of our proposed models. The outcomes of this research hold paramount significance, offering a substantial advancement in comprehending intricate language nuances in digital communication. These findings carry profound implications for sentiment analysis, opinion mining, and an enhanced understanding of social media dynamics. Full article

11 pages, 283 KiB  
Article
Short Text Classification Based on Explicit and Implicit Multiscale Weighted Semantic Information
by Jun Gong, Juling Zhang, Wenqiang Guo, Zhilong Ma and Xiaoyi Lv
Symmetry 2023, 15(11), 2008; https://doi.org/10.3390/sym15112008 - 1 Nov 2023
Cited by 4 | Viewed by 1501
Abstract
Considering the poor effect of short text classification due to insufficient semantic information mining in the current short text matching methods, a new short text classification method is proposed based on explicit and implicit multiscale weighting semantic information interaction. First, the explicit and implicit representations of short text are obtained by a word vector model (word2vec), convolutional neural networks (CNNs), and long short-term memory (LSTM). Then, a multiscale convolutional neural network obtains the explicit and implicit multiscale weighting semantics information of short text. Finally, the multiscale weighting semantics is fused for more accurate short text classification. The experimental results show that this method is superior to the existing classical short text classification algorithms and two advanced short text classification models on the five short text classification datasets of MR, Subj, TREC, SST1 and SST2 with accuracies of 85.7%, 96.9%, 98.1%, 53.4% and 91.8%, respectively. Full article

14 pages, 2422 KiB  
Article
WES-BTM: A Short Text-Based Topic Clustering Model
by Jian Zhang, Weichao Gao and Yanhe Jia
Symmetry 2023, 15(10), 1889; https://doi.org/10.3390/sym15101889 - 9 Oct 2023
Cited by 7 | Viewed by 2404
Abstract
User comments often contain their most practical requirements. Using topic modeling of user comments, it is possible to classify and downscale text data, mine the information in user comments, and understand users’ requirements and preferences. However, user comment texts are usually short and lack rich word frequency and contextual information with sparsity. The traditional topic model cannot model and analyze these short texts well. The biterm topic model (BTM), while solving the sparsity problem, suffers from accuracy and noise problems. In order to eliminate information barriers and further ensure information symmetry, a new topic clustering model, termed the word-embedding similarity-based BTM (WES-BTM), is proposed in this paper. The WES-BTM builds on the BTM by converting word pairs into word vectors and calculating their similarity to perform word pair filtering, which in turn improves clustering accuracy. Based on the experimental results using actual data, the WES-BTM outperforms the BTM, LDA, and NMF models in terms of topic coherence, perplexity, and Jensen–Shannon divergence. It is verified that the WES-BTM can effectively reduce noise and improve the quality of topic clustering. In this way, the information in user comments can be better mined. Full article
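The word-pair filtering step that the abstract above attributes to WES-BTM can be sketched roughly as follows. This is an assumption-laden illustration rather than the published model: the abstract does not state the similarity measure, threshold, or filter direction, so cosine similarity, the `threshold` parameter, keeping pairs at or above it, the function names, and the toy two-dimensional vectors are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def filtered_biterms(tokens, vectors, threshold):
    """Return unordered word pairs (biterms) from one short text whose
    word vectors have cosine similarity of at least `threshold`."""
    kept = []
    for w1, w2 in combinations(tokens, 2):
        if w1 in vectors and w2 in vectors:
            if cosine(vectors[w1], vectors[w2]) >= threshold:
                kept.append((w1, w2))
    return kept


# Toy embeddings (hypothetical): "hello" is unrelated to the other two words.
vectors = {"battery": (0.9, 0.1), "charge": (0.8, 0.2), "hello": (0.0, 1.0)}
pairs = filtered_biterms(["battery", "charge", "hello"], vectors, threshold=0.5)
```

Only the semantically close pair survives in this toy case; the retained biterms would then be passed to an ordinary BTM in place of the full biterm set.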
