Search Results (143)

Search Parameters:
Keywords = term frequency-inverse document frequency (TF-IDF)

18 pages, 1687 KiB  
Article
Beyond Classical AI: Detecting Fake News with Hybrid Quantum Neural Networks
by Volkan Altıntaş
Appl. Sci. 2025, 15(15), 8300; https://doi.org/10.3390/app15158300 - 25 Jul 2025
Viewed by 212
Abstract
The advent of quantum computing has introduced new opportunities for enhancing classical machine learning architectures. In this study, we propose a novel hybrid model, the HQDNN (Hybrid Quantum–Deep Neural Network), designed for the automatic detection of fake news. The model integrates classical fully connected neural layers with a parameterized quantum circuit, enabling the processing of textual data within both classical and quantum computational domains. To assess its effectiveness, we conducted experiments on the widely used LIAR dataset utilizing Term Frequency–Inverse Document Frequency (TF-IDF) features, as well as transformer-based DistilBERT embeddings. The experimental results demonstrate that the HQDNN achieves a superior recall performance—92.58% with TF-IDF and 94.40% with DistilBERT—surpassing traditional machine learning models such as Logistic Regression, Linear SVM, and Multilayer Perceptron. Additionally, we compare the HQDNN with SetFit, a recent CPU-efficient few-shot transformer model, and show that while SetFit achieves higher precision, the HQDNN significantly outperforms it in recall. Furthermore, an ablation experiment confirms the critical contribution of the quantum component, revealing a substantial drop in performance when the quantum layer is removed. These findings highlight the potential of hybrid quantum–classical models as effective and compact alternatives for high-sensitivity classification tasks, particularly in domains such as fake news detection. Full article

23 pages, 1290 KiB  
Article
A KeyBERT-Enhanced Pipeline for Electronic Information Curriculum Knowledge Graphs: Design, Evaluation, and Ontology Alignment
by Guanghe Zhuang and Xiang Lu
Information 2025, 16(7), 580; https://doi.org/10.3390/info16070580 - 6 Jul 2025
Viewed by 471
Abstract
This paper proposes a KeyBERT-based method for constructing a knowledge graph of the electronic information curriculum system, aiming to enhance the structured representation and relational analysis of educational content. Electronic Information Engineering curricula encompass diverse and rapidly evolving topics; however, existing knowledge graphs often overlook multi-word concepts and more nuanced semantic relationships. To address this gap, this paper presents a KeyBERT-enhanced method for constructing a knowledge graph of the electronic information curriculum system. Utilizing teaching plans, syllabi, and approximately 500,000 words of course materials from 17 courses, we first extracted 500 knowledge points via the Term Frequency–Inverse Document Frequency (TF-IDF) algorithm to build a baseline course–knowledge matrix and visualize the preliminary graph using Graph Convolutional Networks (GCN) and Neo4j. We then applied KeyBERT to extract about 1000 knowledge points—approximately 65% of extracted terms were multi-word phrases—and augment the graph with co-occurrence and semantic-similarity edges. Comparative experiments demonstrate a ~20% increase in non-zero matrix coverage and a ~40% boost in edge count (from 5100 to 7100), significantly enhancing graph connectivity. Moreover, we performed sensitivity analysis on extraction thresholds (co-occurrence ≥ 5, similarity ≥ 0.7), revealing that (5, 0.7) maximizes the F1-score at 0.83. Hyperparameter ablation over n-gram ranges [(1,1),(1,2),(1,3)] and top_n [5, 10, 15] identifies (1,3) + top_n = 10 as optimal (Precision = 0.86, Recall = 0.81, F1 = 0.83). Finally, GCN downstream tests show that, despite higher sparsity (KeyBERT 64% vs. TF-IDF 40%), KeyBERT features achieve Accuracy = 0.78 and F1 = 0.75, outperforming TF-IDF’s 0.66/0.69. 
This approach offers a novel, rigorously evaluated solution for optimizing the electronic information curriculum system and can be extended through terminology standardization or larger data integration.
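The headline figures in this abstract can be sanity-checked with a little arithmetic. The sketch below assumes the standard definition of F1 as the harmonic mean of precision and recall; the function names are illustrative and do not come from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def relative_increase(before: float, after: float) -> float:
    """Fractional growth from `before` to `after`."""
    return (after - before) / before


# Values quoted in the abstract above.
best_f1 = f1_score(0.86, 0.81)
edge_growth = relative_increase(5100, 7100)

print(f"F1 = {best_f1:.2f}")               # F1 = 0.83
print(f"edge growth = {edge_growth:.1%}")  # edge growth = 39.2%
```

Both results agree with the reported F1 of 0.83 and the roughly 40% increase in edge count.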

15 pages, 2051 KiB  
Article
Analysis of Short Texts Using Intelligent Clustering Methods
by Jamalbek Tussupov, Akmaral Kassymova, Ayagoz Mukhanova, Assyl Bissengaliyeva, Zhanar Azhibekova, Moldir Yessenova and Zhanargul Abuova
Algorithms 2025, 18(5), 289; https://doi.org/10.3390/a18050289 - 19 May 2025
Viewed by 676
Abstract
This article presents a comprehensive review of short text clustering using state-of-the-art methods: Bidirectional Encoder Representations from Transformers (BERT), Term Frequency-Inverse Document Frequency (TF-IDF), and the novel hybrid method Latent Dirichlet Allocation + BERT + Autoencoder (LDA + BERT + AE). The article begins by outlining the theoretical foundation of each technique and its merits and limitations. BERT is assessed for its ability to capture word dependencies in text, while TF-IDF is noted for its usefulness in assessing term importance. The experimental section compares the efficacy of these methods in clustering short texts, with a specific focus on the hybrid LDA + BERT + AE approach. A detailed examination of the LDA-BERT model’s training and validation loss over 200 epochs shows that the loss values start above 1.2 and quickly decrease to around 0.8 within the first 25 epochs, eventually stabilizing at approximately 0.4. The close alignment of these curves suggests that the model learns effectively and generalizes well, with minimal overfitting. The study demonstrates that the hybrid LDA + BERT + AE method significantly enhances text clustering quality compared to individual methods. Based on the findings, the study recommends which clustering methods to choose for different short texts and natural language processing tasks. The applications of these methods in industrial and educational settings, where successful text handling and categorization are critical, are also addressed. The study ends by emphasizing the importance of the holistic handling of short texts for deeper semantic comprehension and effective information retrieval.
(This article belongs to the Section Databases and Data Structures)

6 pages, 167 KiB  
Proceeding Paper
Classification of Artificial Intelligence-Generated Product Reviews on Amazon
by Jia-Luen Yang
Eng. Proc. 2025, 92(1), 17; https://doi.org/10.3390/engproc2025092017 - 25 Apr 2025
Viewed by 639
Abstract
Amazon has been flooded with artificial intelligence (AI)-generated product reviews that offer minimal value to customers. These AI reviews merely echo the given product descriptions without providing any authentic information on how buyers feel when using the products. Therefore, this study developed a method for identifying AI-generated reviews to improve the review-reading experience. A dataset of 6217 Amazon reviews was compiled, of which 1116 were identified as AI-generated. The reviews were classified with a 99.25% F1 score on the test data using term frequency–inverse document frequency (TF–IDF) features and a support vector classifier (SVC). The developed method enables the detection of AI-generated reviews on the internet, fostering an authentic and reliable platform.
(This article belongs to the Proceedings of 2024 IEEE 6th Eurasia Conference on IoT, Communication and Engineering)
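As a rough illustration of the TF–IDF step that feeds the classifier, the sketch below computes the textbook weighting, tf = count / document length and idf = log(N / df), using only the standard library. The toy reviews are invented, and the paper's exact weighting variant and tokenization are not stated in the abstract. Libraries such as scikit-learn default to a smoothed idf with L2 normalization, so their values would differ.

```python
import math
from collections import Counter


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Return one {term: weight} mapping per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Toy reviews, invented for illustration only.
reviews = [
    "great sound great battery".split(),
    "battery died fast".split(),
    "great product".split(),
]
vectors = tf_idf(reviews)
print(vectors[0])
```

In the first review, "sound" outweighs "battery" because it appears in only one document, even though both occur once. The resulting sparse vectors are what a classifier such as an SVC would take as input.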
25 pages, 1964 KiB  
Article
Hate Speech Detection and Online Public Opinion Regulation Using Support Vector Machine Algorithm: Application and Impact on Social Media
by Siyuan Li and Zhi Li
Information 2025, 16(5), 344; https://doi.org/10.3390/info16050344 - 24 Apr 2025
Viewed by 797
Abstract
Detecting hate speech in social media is challenging due to its rarity, high-dimensional complexity, and implicit expression via sarcasm or spelling variations, rendering linear models ineffective. In this study, the SVM (Support Vector Machine) algorithm is used to map text features from a low-dimensional to a high-dimensional space using kernel functions to meet complex nonlinear classification challenges. By maximizing the margin between classes to locate the optimal hyperplane and using the kernel trick to implicitly transform the data distribution, the classification accuracy of hate speech detection is significantly improved. Data collection leverages social media APIs (Application Programming Interfaces) and customized crawlers with OAuth 2.0 authentication and keyword filtering, ensuring relevance. Regular expressions validate data integrity, followed by preprocessing steps such as denoising, stop-word removal, and spelling correction. Word embeddings are generated using Word2Vec’s Skip-gram model, combined with TF-IDF (Term Frequency–Inverse Document Frequency) weighting to capture contextual semantics. A multi-level feature extraction framework integrates sentiment analysis via lexicon-based methods and BERT for advanced sentiment recognition. Experimental evaluations on two datasets demonstrate the SVM model’s effectiveness, achieving accuracies of 90.42% and 92.84%, recall rates of 88.06% and 90.79%, and average inference times of 3.71 ms and 2.96 ms. These results highlight the model’s ability to detect implicit hate speech accurately and efficiently, supporting real-time monitoring. This research contributes to creating a safer online environment by advancing hate speech detection methodologies.
(This article belongs to the Special Issue Information Technology in Society)
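The abstract does not say how the Word2Vec embeddings and TF-IDF weights are combined. One common approach, assumed here purely for illustration, is a TF-IDF-weighted average of the word vectors in a text. The embeddings, weights, and function name below are hypothetical.

```python
def weighted_sentence_vector(tokens, embeddings, weights):
    """Average the word vectors of `tokens`, weighting each by its TF-IDF score."""
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for token in tokens:
        if token not in embeddings:
            continue  # skip out-of-vocabulary tokens
        w = weights.get(token, 0.0)
        for i, value in enumerate(embeddings[token]):
            total[i] += w * value
        weight_sum += w
    if weight_sum == 0.0:
        return total  # all-zero vector when nothing could be weighted
    return [value / weight_sum for value in total]


# Hypothetical 2-dimensional embeddings and TF-IDF weights.
embeddings = {"hate": [1.0, 0.0], "speech": [0.0, 1.0], "the": [0.5, 0.5]}
weights = {"hate": 0.75, "speech": 0.25, "the": 0.0}
sentence = weighted_sentence_vector(["the", "hate", "speech"], embeddings, weights)
print(sentence)  # [0.75, 0.25]
```

The zero-weighted function word "the" contributes nothing, so the sentence vector is pulled toward the more distinctive terms. A fixed-length vector of this kind could then be passed to an SVM.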

22 pages, 4631 KiB  
Article
ChurnKB: A Generative AI-Enriched Knowledge Base for Customer Churn Feature Engineering
by Maryam Shahabikargar, Amin Beheshti, Wathiq Mansoor, Xuyun Zhang, Eu Jin Foo, Alireza Jolfaei, Ambreen Hanif and Nasrin Shabani
Algorithms 2025, 18(4), 238; https://doi.org/10.3390/a18040238 - 21 Apr 2025
Cited by 1 | Viewed by 1330
Abstract
Customers are the cornerstone of business success across industries. Companies invest significant resources in acquiring new customers and, more importantly, retaining existing ones. However, customer churn remains a major challenge, leading to substantial financial losses. Addressing this issue requires a deep understanding of customers’ cognitive status and behaviours, as well as early signs of churn. Predictive and Machine Learning (ML)-based analysis, when trained with appropriate features indicative of customer behaviour and cognitive status, can be highly effective in mitigating churn. A robust ML-driven churn analysis depends on a well-developed feature engineering process. Traditional churn analysis studies have primarily relied on demographic, product usage, and revenue-based features, overlooking the valuable insights embedded in customer–company interactions. Recognizing the importance of domain knowledge and human expertise in feature engineering and building on our previous work, we propose the Customer Churn-related Knowledge Base (ChurnKB) to enhance feature engineering for churn prediction. ChurnKB utilizes textual data mining techniques such as Term Frequency-Inverse Document Frequency (TF-IDF), cosine similarity, regular expressions, word tokenization, and stemming to identify churn-related features within customer-generated content, including emails. To further enrich the structure of ChurnKB, we integrate Generative AI, specifically large language models, which offer flexibility in handling unstructured text and uncovering latent features, to identify and refine features related to customer cognitive status, emotions, and behaviours. 
Additionally, feedback loops are incorporated to validate and enhance the effectiveness of ChurnKB. Integrating knowledge-based features into machine learning models (e.g., Random Forest, Logistic Regression, Multilayer Perceptron, and XGBoost) improves their predictive performance compared to the baseline, with XGBoost’s F1 score increasing from 0.5752 to 0.7891. Beyond churn prediction, this approach potentially supports applications like personalized marketing, cyberbullying detection, hate speech identification, and mental health monitoring, demonstrating its broader impact on business intelligence and online safety.

16 pages, 2935 KiB  
Article
LLM-Enhanced Framework for Building Domain-Specific Lexicon for Urban Power Grid Design
by Yan Xu, Tao Wang, Yang Yuan, Ziyue Huang, Xi Chen, Bo Zhang, Xiaorong Zhang and Zehua Wang
Appl. Sci. 2025, 15(8), 4134; https://doi.org/10.3390/app15084134 - 9 Apr 2025
Cited by 1 | Viewed by 736
Abstract
Traditional methods for urban power grid design have struggled to meet the demands of multi-energy integration and high resilience scenarios due to issues such as delayed updates of terminology and semantic ambiguity. Current techniques for constructing domain-specific lexicons face challenges like the insufficient coverage of specialized vocabulary and imprecise synonym mining, which restrict the semantic parsing capabilities of intelligent design systems. To address these challenges, this study proposes a framework for constructing a domain-specific lexicon for urban power grid design based on Large Language Models (LLMs). The aim is to enhance the accuracy and practicality of the lexicon through multi-level term extraction and synonym expansion. Initially, a structured corpus covering national and industry standards in the field of power was constructed. An improved Term Frequency–Inverse Document Frequency (TF-IDF) algorithm, combined with mutual information and adjacency entropy filtering mechanisms, was utilized to extract high-quality seed vocabulary from 3426 candidate terms. Leveraging LLMs, multi-level prompt templates were designed to guide synonym mining, incorporating a self-correction mechanism for semantic verification to mitigate errors caused by model hallucinations. This approach successfully built a domain-specific lexicon comprising 3426 core seed words and 10,745 synonyms. The average cosine similarity of synonym pairs reached 0.86, and expert validation confirmed an accuracy rate of 89.3%; text classification experiments showed that integrating the domain-specific dictionary improved the classifier’s F1-score by 9.2%, demonstrating the effectiveness of the method. 
This research constructs what the authors describe as the first high-precision terminology dictionary for power design by embedding domain-driven constraints and validation workflows. It addresses the insufficient coverage and imprecise synonym expansion of traditional methods and supports the development of semantically intelligent systems for smart urban power grid design.
(This article belongs to the Special Issue Advances in Smart Construction and Intelligent Buildings)

20 pages, 512 KiB  
Article
Automated Classification Model for Elementary Mathematics Diagnostic Assessment Data Based on TF-IDF and XGBoost
by Seonghyun Park, Seungmin Oh and Woncheol Park
Appl. Sci. 2025, 15(7), 3764; https://doi.org/10.3390/app15073764 - 29 Mar 2025
Viewed by 681
Abstract
With the increasing emphasis on personalized learning, there is a growing demand for automated systems that analyze individual students’ learning states and provide effective feedback. This study proposes a system that analyzes elementary school mathematics diagnostic assessment data to generate personalized feedback. The proposed system integrates Term Frequency-Inverse Document Frequency (TF-IDF) and eXtreme Gradient Boosting (XGBoost) to vectorize textual data and automatically classify learning error patterns. The study utilizes 15,000 diagnostic assessment records collected from 2020 to 2024. After preprocessing, TF-IDF was employed to extract relevant features, and XGBoost was used to train a classification model. To validate the model’s performance, comparative experiments were conducted with Logistic Regression, Support Vector Machine (SVM), LightGBM, BERT, and DistilBERT. The TF-IDF + XGBoost model achieved an accuracy of 98.85% and an F1 Score of 0.9860, outperforming other models. Furthermore, the system demonstrated an average real-time processing speed of 1.3 s, ensuring instant feedback in educational settings. This study highlights the automation and scalability of educational data analysis, suggesting potential applications across various subjects and grade levels. Full article

22 pages, 3887 KiB  
Article
The Impact of Linguistic Variations on Emotion Detection: A Study of Regionally Specific Synthetic Datasets
by Fernando Henrique Calderón Alvarado
Appl. Sci. 2025, 15(7), 3490; https://doi.org/10.3390/app15073490 - 22 Mar 2025
Viewed by 677
Abstract
This study examines the role of linguistic regional variations in synthetic dataset generation and their impact on emotion detection performance. Emotion detection is essential for natural language processing (NLP) applications such as social media analysis, customer service, and mental health monitoring. To explore this, synthetic datasets were generated using a state-of-the-art language model, incorporating English variations from the United States, United Kingdom, and India, alongside a general baseline dataset. Two levels of prompt specificity were employed to assess the influence of regional linguistic nuances. Statistical analyses—including frequency distribution, term frequency-inverse document frequency (TF-IDF), type–token ratio (TTR), hapax legomena, pointwise mutual information (PMI) scores, and key-phrase extraction—revealed significant linguistic diversity and regional distinctions in the generated datasets. To evaluate their effectiveness, classification experiments were conducted with two models using bidirectional encoder representations from transformers (BERT) and its de-noising sequence to sequence variation (BART), beginning with zero-shot classification on the contextualized affect representations for emotion recognition (CARER) dataset, followed by fine-tuning with both baseline and region-specific datasets. Results demonstrated that region-specific datasets, particularly those generated with detailed prompts, significantly improved classification accuracy compared to the baseline. These findings underscore the importance of incorporating global linguistic variations in synthetic dataset generation, offering insights into how regional adaptations can enhance emotion detection models for diverse NLP applications. Full article
(This article belongs to the Special Issue Application of Affective Computing)
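Two of the lexical-diversity statistics named in this abstract, type–token ratio (TTR) and hapax legomena, are simple to compute. The sketch below uses whitespace tokenization and an invented sample sentence. The study's actual preprocessing is not described in the abstract, so treat this as an assumption-laden illustration.

```python
from collections import Counter


def lexical_stats(tokens: list[str]) -> dict[str, float]:
    """Count tokens, distinct types, TTR, and words that occur exactly once."""
    counts = Counter(tokens)
    hapaxes = [term for term, count in counts.items() if count == 1]
    return {
        "tokens": len(tokens),
        "types": len(counts),
        "ttr": len(counts) / len(tokens) if tokens else 0.0,
        "hapax_count": len(hapaxes),
    }


# Invented sample text, tokenized on whitespace.
sample = "i am so happy so very happy today".split()
stats = lexical_stats(sample)
print(stats)
# {'tokens': 8, 'types': 6, 'ttr': 0.75, 'hapax_count': 4}
```

A higher TTR or hapax count indicates a more varied vocabulary, which is one way regional datasets could be compared against a general baseline.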

24 pages, 2927 KiB  
Article
Text Mining Approaches for Exploring Research Trends in the Security Applications of Generative Artificial Intelligence
by Jinsick Kim, Byeongsoo Koo, Moonju Nam, Kukjin Jang, Jooyeoun Lee, Myoungsug Chung and Youngseo Song
Appl. Sci. 2025, 15(6), 3355; https://doi.org/10.3390/app15063355 - 19 Mar 2025
Viewed by 2071
Abstract
This study examines the security implications of generative artificial intelligence (GAI), focusing on models such as ChatGPT. As GAI technologies are increasingly integrated into industries like healthcare, education, and media, concerns are growing regarding security vulnerabilities, ethical challenges, and potential for misuse. This study not only synthesizes existing research but also conducts an original scientometric analysis using text mining techniques. To address these concerns, this research analyzes 1047 peer-reviewed academic articles from the SCOPUS database using scientometric methods, including Term Frequency–Inverse Document Frequency (TF-IDF) analysis, keyword centrality analysis, and Latent Dirichlet Allocation (LDA) topic modeling. The results highlight significant contributions from countries such as the United States, China, and India, with leading institutions like the Chinese Academy of Sciences and the National University of Singapore driving research on GAI security. In the keyword centrality analysis, “ChatGPT” emerged as a highly central term, reflecting its prominence in the research discourse. However, despite its frequent mention, “ChatGPT” showed lower proximity centrality than terms like “model” and “AI”. This suggests that while ChatGPT is broadly associated with other key themes, it has a less direct connection to specific research subfields. Topic modeling identified six major themes, including AI and security in education, language models, data processing, and risk management. The analysis emphasizes the need for robust security frameworks to address technical vulnerabilities, ensure ethical responsibility, and manage risks in the safe deployment of AI systems. These frameworks must incorporate not only technical solutions but also ethical accountability, regulatory compliance, and continuous risk management. 
This study underscores the importance of interdisciplinary research that integrates technical, legal, and ethical perspectives to ensure the responsible and secure deployment of GAI technologies.
(This article belongs to the Special Issue New Advances in Computer Security and Cybersecurity)

23 pages, 4154 KiB  
Review
Mapping Research Trends on Intestinal Permeability in Irritable Bowel Syndrome with a Focus on Nutrition: A Bibliometric Analysis
by Domenica Mallardi, Fatima Maqoud, Davide Guido, Michelangelo Aloisio, Michele Linsalata and Francesco Russo
Nutrients 2025, 17(6), 1064; https://doi.org/10.3390/nu17061064 - 18 Mar 2025
Viewed by 1475
Abstract
Irritable Bowel Syndrome (IBS) is a complex gastrointestinal disorder characterized by chronic abdominal pain and altered bowel habits, often linked to disruptions in intestinal barrier function. Increased intestinal permeability plays a key role in IBS pathogenesis, affecting immune responses, gut microbiota, and inflammation. This study conducts a bibliometric analysis to explore global research trends on intestinal permeability in IBS, focusing on key contributors, collaboration networks, and thematic shifts, particularly the interplay between the intestinal barrier, gut microbiota, and dietary components. A total of 411 articles were retrieved from Scopus, with 232 studies analyzed using Bibliometrix in R. To optimize screening, ASReview, a machine learning tool, was employed, utilizing the Naïve Bayes algorithm combined with Term Frequency-Inverse Document Frequency (TF-IDF) for adaptive ranking of articles by relevance. This approach significantly improved screening step efficacy. The analysis highlights growing research interest, with China and the USA as leading contributors. Key themes include the role of gut microbiota in modulating permeability, the impact of dietary components (fiber, probiotics, bioactive compounds) on tight junction integrity, and the exploration of therapeutic agents. Emerging studies suggest integrating gut barrier modulation with nutritional and microbiome-targeted strategies for IBS management. This study provides a comprehensive overview of research on intestinal permeability in IBS, mapping its evolution and identifying major trends. By highlighting key contributors and thematic areas, it offers insights to guide future investigations into the interplay between gut permeability, diet, and microbiota, advancing understanding of IBS pathophysiology and management. Full article
(This article belongs to the Special Issue Biostatistics Methods in Nutritional Research)

19 pages, 1050 KiB  
Article
WASPAS-Based Natural Language Processing Method for Handling Content Words Extraction and Ranking Issues: An Example of SDGs Corpus
by Liang-Ching Chen, Kuei-Hu Chang and Jeng-Fung Hung
Information 2025, 16(3), 198; https://doi.org/10.3390/info16030198 - 4 Mar 2025
Viewed by 752
Abstract
This paper addresses the challenges in extracting content words within the domains of natural language processing (NLP) and artificial intelligence (AI), using sustainable development goals (SDGs) corpora as verification examples. Traditional corpus-based methods and the term frequency-inverse document frequency (TF-IDF) method face limitations, including the inability to automatically eliminate function words, effectively extract the relevant parameters’ quantitative data, simultaneously consider frequency and range parameters to evaluate the terms’ overall importance, and sort content words at the corpus level. To overcome these limitations, this paper proposes a novel method based on a weighted aggregated sum product assessment (WASPAS) technique. This NLP method integrates the function word elimination method, an NLP machine, and the WASPAS technique to improve the extraction and ranking of content words. The proposed method efficiently extracts quantitative data, simultaneously considers frequency and range parameters to evaluate terms’ substantial importance, and ranks content words at the corpus level, providing a comprehensive overview of term significance. This study employed a target corpus from the Web of Science (WOS), comprising 35 highly cited SDG-related research articles. Compared to competing methods, the results demonstrate that the proposed method outperforms traditional methods in extracting and ranking content words. Full article
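For readers unfamiliar with WASPAS, the sketch below shows the general scoring rule: a convex combination of a weighted sum model and a weighted product model over normalized criteria. It assumes both criteria (frequency and range) are benefit criteria normalized by their column maximum, with equal weights and λ = 0.5. The terms, values, and these settings are hypothetical and are not taken from the paper.

```python
import math


def waspas_scores(matrix, weights, lam=0.5):
    """Score alternatives with WASPAS, treating every criterion as a benefit criterion."""
    n_criteria = len(weights)
    col_max = [max(row[j] for row in matrix) for j in range(n_criteria)]
    scores = []
    for row in matrix:
        # Linear normalization: divide each value by its column maximum.
        norm = [row[j] / col_max[j] for j in range(n_criteria)]
        wsm = sum(w * x for w, x in zip(weights, norm))         # weighted sum model
        wpm = math.prod(x ** w for w, x in zip(weights, norm))  # weighted product model
        scores.append(lam * wsm + (1 - lam) * wpm)
    return scores


# Hypothetical terms with (frequency, range) values.
terms = ["climate", "policy", "water"]
matrix = [[120, 30], [80, 35], [40, 10]]
scores = waspas_scores(matrix, weights=[0.5, 0.5])
ranking = sorted(zip(terms, scores), key=lambda pair: pair[1], reverse=True)
print(ranking)
```

With these toy values the ranking is climate, policy, water: "climate" leads on frequency, while "policy" has a slightly wider range but a lower combined score.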

20 pages, 3976 KiB  
Article
Machine Learning for Quality Diagnostics: Insights into Consumer Electronics Evaluation
by Najada Firza, Anisa Bakiu and Alfonso Monaco
Electronics 2025, 14(5), 939; https://doi.org/10.3390/electronics14050939 - 27 Feb 2025
Viewed by 762
Abstract
In the era of digital commerce, understanding consumer opinions has become crucial for businesses aiming to tailor their products and services effectively. This study investigates acoustic quality diagnostics of the latest generation of AirPods. From this perspective, the work examines consumer sentiment using text mining and sentiment analysis techniques applied to product reviews, focusing on Amazon’s AirPods reviews. Using the naïve Bayes classifier, a probabilistic machine learning approach grounded in Bayes’ theorem, this research analyzes textual data to classify consumer reviews as positive or negative. Data were collected via web scraping, following ethical guidelines, and preprocessed to ensure quality and relevance. Textual features were transformed using term frequency-inverse document frequency (TF-IDF) to create input vectors for the classifier. The results reveal that naïve Bayes provides satisfactory performance in categorizing sentiment, with metrics such as accuracy, sensitivity, specificity, and F1-score offering insight into the model’s effectiveness. Key findings highlight the divergence in consumer perception across ratings, identifying sentiment drivers such as noise cancellation quality and product integration. These insights underline the potential of sentiment analysis in enabling companies to address consumer concerns, improve offerings, and optimize business strategies. The study concludes that such methodologies are indispensable for leveraging consumer feedback in the rapidly evolving digital marketplace. Full article

22 pages, 1670 KiB  
Article
Word-of-Mouth Evaluation of Ancient Towns in Southern China Using Web Comments
by Yihan Zhang, Weizhuo Guo, Yanling Sheng and Shanshan Li
Tour. Hosp. 2025, 6(1), 25; https://doi.org/10.3390/tourhosp6010025 - 11 Feb 2025
Viewed by 1100
Abstract
With the rapid development of digital networks and communication technologies, traditional word-of-mouth (WOM) has transformed into electronic word-of-mouth (eWOM), which plays a pivotal role in improving the management and service quality of ancient town tourism. This study uses Python web scraping techniques to gather eWOM data from the top ten ancient towns in southern China. Using IPA analysis, the analytic hierarchy process (AHP), Term Frequency–Inverse Document Frequency (TF-IDF), and cluster analysis, we developed a comprehensive eWOM evaluation framework. This framework was employed to perform word frequency analysis, sentiment analysis, topic modeling, and rating analysis, providing deeper insights into tourists’ perceptions. The results reveal several key findings: (1) Transportation infrastructure varies significantly across the towns. Heshun and Huangyao suffer from poor accessibility, while the remaining towns benefit from the developed transportation network of the Yangtze River Delta. (2) The volume of eWOM is strongly influenced by seasonal patterns and was notably impacted by the COVID-19 pandemic. (3) The majority of tourists express positive sentiments toward the ancient towns, with a focus on the available facilities. Their highest levels of satisfaction, however, are associated with the scenic landscapes. (4) A comprehensive eWOM analysis suggests that Wuzhen and Xidi–Hongcun are the most popular tourist destinations, while Zhujiajiao, Huangyao, Zhouzhuang, and Nanxun exhibit lower levels of both attention and visitor satisfaction.

16 pages, 3369 KiB  
Article
Authorship Detection on Classical Chinese Text Using Deep Learning
by Lingmei Zhao, Jianjun Shi, Chenkai Zhang and Zhixiang Liu
Appl. Sci. 2025, 15(4), 1677; https://doi.org/10.3390/app15041677 - 7 Feb 2025
Viewed by 1243
Abstract
Authorship detection has played an important role in social information science. In this study, we propose a support vector machine (SVM)-based authorship detection model for classical Chinese texts. The term frequency-inverse document frequency (TF-IDF) feature extraction technique is combined with the SVM-based method. The linguistic features used in this model are based on TF-IDF calculations of different function words, including literary Chinese words, end-function words, vernacular function words, and transitional function words. Furthermore, a bidirectional long short-term memory (BiLSTM)-based authorship model is introduced to detect authorship in classical Chinese texts. The BiLSTM model incorporates an attention mechanism to better capture the meaning and weight of the words. We conduct a comparative analysis between the SVM-based and BiLSTM-based models in the context of authorship detection in Chinese classical literature. The applicability of the two authorship detection models for classical Chinese texts is examined. Results indicate varying authorship between different sections of the texts, with the SVM model outperforming the BiLSTM model. Notably, these classification outcomes are consistent with findings from prior studies in classical Chinese literary analysis. The proposed SVM-based authorship detection model is especially suited for automatic literary analysis, which underscores its potential for broader literary studies.
(This article belongs to the Special Issue Natural Language Processing (NLP) and Applications—2nd Edition)