Search Results (329)

Search Parameters:
Keywords = TF–IDF

35 pages, 2730 KiB  
Review
Deep Learning and NLP-Based Trend Analysis in Actuators and Power Electronics
by Woojun Jung and Keuntae Cho
Actuators 2025, 14(8), 379; https://doi.org/10.3390/act14080379 (registering DOI) - 1 Aug 2025
Viewed by 62
Abstract
Actuators and power electronics are fundamental components of modern control systems, enabling high-precision functionality, enhanced energy efficiency, and sophisticated automation. This study investigates evolving research trends and thematic developments in these areas spanning the last two decades (2005–2024). This study analyzed 1840 peer-reviewed abstracts obtained from the Web of Science database using BERTopic modeling, which integrates transformer-based sentence embeddings with UMAP for dimensionality reduction and HDBSCAN for clustering. The approach also employed class-based TF-IDF calculations, intertopic distance visualization, and hierarchical clustering to clarify topic structures. The analysis revealed a steady increase in research publications, with a marked surge post-2015. From 2005 to 2014, investigations were mainly focused on established areas including piezoelectric actuators, adaptive control, and hydraulic systems. In contrast, the 2015–2024 period saw broader diversification into new topics such as advanced materials, robotic mechanisms, resilient systems, and networked actuator control through communication protocols. The structural topic analysis indicated a shift from a unified to a more differentiated and specialized spectrum of research themes. This study offers a rigorous, data-driven outlook on the increasing complexity and diversity of actuator and power electronics research. The findings are pertinent for researchers, engineers, and policymakers aiming to advance state-of-the-art, sustainable industrial technologies. Full article
(This article belongs to the Special Issue Power Electronics and Actuators—Second Edition)
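For illustration, the pipeline described in this abstract (sentence embeddings, UMAP, HDBSCAN, class-based TF-IDF) can be sketched with the bertopic library. The snippet below is a minimal, hypothetical example, not the authors' code: the embedding model name, clustering parameters, and the 20-newsgroups stand-in corpus are assumptions.

```python
# Minimal BERTopic sketch: sentence embeddings -> UMAP -> HDBSCAN -> class-based TF-IDF.
# Parameters and corpus are illustrative, not those used in the study.
from bertopic import BERTopic
from hdbscan import HDBSCAN
from sklearn.datasets import fetch_20newsgroups
from umap import UMAP

# Stand-in corpus; the study used 1840 Web of Science abstracts.
docs = fetch_20newsgroups(subset="all", remove=("headers", "footers", "quotes")).data[:2000]

umap_model = UMAP(n_neighbors=15, n_components=5, metric="cosine", random_state=42)
hdbscan_model = HDBSCAN(min_cluster_size=10, metric="euclidean", prediction_data=True)

topic_model = BERTopic(
    embedding_model="all-MiniLM-L6-v2",   # assumed transformer encoder
    umap_model=umap_model,
    hdbscan_model=hdbscan_model,
)
topics, probs = topic_model.fit_transform(docs)

# c-TF-IDF keywords per topic; hierarchical clustering of topics
# (an intertopic distance map is available via topic_model.visualize_topics()).
print(topic_model.get_topic_info().head())
fig_hierarchy = topic_model.visualize_hierarchy()
```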

23 pages, 978 KiB  
Article
Emotional Analysis in a Morphologically Rich Language: Enhancing Machine Learning with Psychological Feature Lexicons
by Ron Keinan, Efraim Margalit and Dan Bouhnik
Electronics 2025, 14(15), 3067; https://doi.org/10.3390/electronics14153067 (registering DOI) - 31 Jul 2025
Viewed by 179
Abstract
This paper explores emotional analysis in Hebrew texts, focusing on improving machine learning techniques for depression detection by integrating psychological feature lexicons. Hebrew’s complex morphology makes emotional analysis challenging, and this study seeks to address that by combining traditional machine learning methods with sentiment lexicons. The dataset consists of over 350,000 posts from 25,000 users on the health-focused social network “Camoni” from 2010 to 2021. Various machine learning models—SVM, Random Forest, Logistic Regression, and Multi-Layer Perceptron—were used, alongside ensemble techniques like Bagging, Boosting, and Stacking. TF-IDF was applied for feature selection, with word and character n-grams, and pre-processing steps like punctuation removal, stop word elimination, and lemmatization were performed to handle Hebrew’s linguistic complexity. The models were enriched with sentiment lexicons curated by professional psychologists. The study demonstrates that integrating sentiment lexicons significantly improves classification accuracy. Specific lexicons—such as those for negative and positive emojis, hostile words, anxiety words, and no-trust words—were particularly effective in enhancing model performance. Our best model classified depression with an accuracy of 84.1%. These findings offer insights into depression detection, suggesting that practitioners in mental health and social work can improve their machine learning models for detecting depression in online discourse by incorporating emotion-based lexicons. The societal impact of this work lies in its potential to improve the detection of depression in online Hebrew discourse, offering more accurate and efficient methods for mental health interventions in online communities. Full article
(This article belongs to the Special Issue Techniques and Applications of Multimodal Data Fusion)
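A minimal sketch of the feature design described here, combining word and character n-gram TF-IDF with lexicon-based counts before a linear classifier. The lexicon entries, posts, and labels below are toy placeholders, not the Camoni data or the psychologists' lexicons.

```python
# Sketch: word + character n-gram TF-IDF combined with lexicon-count features.
# Data and lexicon entries are toy placeholders.
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import FeatureUnion, Pipeline

anxiety_lexicon = {"afraid", "worried", "panic"}   # placeholders for the curated lexicons
hostile_lexicon = {"hate", "angry"}

class LexiconCounts(BaseEstimator, TransformerMixin):
    """Counts how many tokens from each psychological lexicon appear in a post."""
    def __init__(self, lexicons):
        self.lexicons = lexicons
    def fit(self, X, y=None):
        return self
    def transform(self, X):
        return np.array([[sum(tok in lex for tok in text.lower().split())
                          for lex in self.lexicons] for text in X])

features = FeatureUnion([
    ("word_tfidf", TfidfVectorizer(analyzer="word", ngram_range=(1, 2))),
    ("char_tfidf", TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))),
    ("lexicons", LexiconCounts([anxiety_lexicon, hostile_lexicon])),
])

clf = Pipeline([("features", features), ("model", LogisticRegression(max_iter=1000))])

posts = ["I feel worried and afraid all the time", "Had a great walk in the park today"]
labels = [1, 0]   # 1 = depression-related, 0 = not (toy labels)
clf.fit(posts, labels)
print(clf.predict(["so afraid and in panic again"]))
```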

23 pages, 3847 KiB  
Article
Optimizing Sentiment Analysis in Multilingual Balanced Datasets: A New Comparative Approach to Enhancing Feature Extraction Performance with ML and DL Classifiers
by Hamza Jakha, Souad El Houssaini, Mohammed-Alamine El Houssaini, Souad Ajjaj and Abdelali Hadir
Appl. Syst. Innov. 2025, 8(4), 104; https://doi.org/10.3390/asi8040104 - 28 Jul 2025
Viewed by 283
Abstract
Social network platforms have a significant impact on the development of companies by influencing clients’ behaviors and sentiments, which directly affect corporate reputations. Analyzing this feedback has become an essential component of business intelligence, supporting the improvement of long-term marketing strategies at scale. Building effective sentiment analysis models requires a comprehensive, in-depth examination of each stage of the process. In this study, we present a new comparative approach for several feature extraction techniques, including TF-IDF, Word2Vec, FastText, and BERT embeddings. These methods are applied to three multilingual datasets collected from hotel review platforms in the tourism sector, in English, French, and Arabic. The datasets were preprocessed through cleaning, normalization, labeling, and balancing before being used to train various machine learning and deep learning algorithms. The effectiveness of each feature extraction method was evaluated using accuracy, F1-score, precision, recall, the ROC AUC curve, and a new metric that measures the execution time for generating word representations. Our experiments achieve accuracy rates of approximately 99% on the English dataset, 94% on the Arabic dataset, and 89% on the French dataset. These findings confirm the strong influence of vectorization techniques on the performance of sentiment analysis models and highlight the close relationship between balanced datasets, effective feature extraction methods, and the choice of classification algorithms. This study thus aims to simplify the selection of feature extraction methods and appropriate classifiers for each language, thereby contributing to advances in sentiment analysis. Full article
(This article belongs to the Topic Social Sciences and Intelligence Management, 2nd Volume)
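The comparison of feature extraction methods, including the representation-time metric, can be sketched as below. This is a hypothetical example on a stand-in English corpus (20 newsgroups); the vector sizes and classifier are assumptions, not the paper's configuration.

```python
# Sketch: comparing feature-extraction methods on representation time and accuracy.
import time
import numpy as np
from gensim.models import Word2Vec
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

data = fetch_20newsgroups(subset="train", categories=["rec.autos", "sci.med"],
                          remove=("headers", "footers", "quotes"))
texts, y = data.data, data.target

# --- TF-IDF representation ---
t0 = time.perf_counter()
X_tfidf = TfidfVectorizer(max_features=20000).fit_transform(texts)
tfidf_time = time.perf_counter() - t0

# --- Word2Vec representation (document = mean of its word vectors) ---
tokenized = [t.lower().split() for t in texts]
t0 = time.perf_counter()
w2v = Word2Vec(tokenized, vector_size=100, window=5, min_count=2, epochs=5)
X_w2v = np.array([
    np.mean([w2v.wv[w] for w in doc if w in w2v.wv] or [np.zeros(100)], axis=0)
    for doc in tokenized
])
w2v_time = time.perf_counter() - t0

for name, X, t in [("TF-IDF", X_tfidf, tfidf_time), ("Word2Vec", X_w2v, w2v_time)]:
    acc = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5).mean()
    print(f"{name}: accuracy={acc:.3f}, representation time={t:.2f}s")
```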

18 pages, 1687 KiB  
Article
Beyond Classical AI: Detecting Fake News with Hybrid Quantum Neural Networks
by Volkan Altıntaş
Appl. Sci. 2025, 15(15), 8300; https://doi.org/10.3390/app15158300 - 25 Jul 2025
Viewed by 194
Abstract
The advent of quantum computing has introduced new opportunities for enhancing classical machine learning architectures. In this study, we propose a novel hybrid model, the HQDNN (Hybrid Quantum–Deep Neural Network), designed for the automatic detection of fake news. The model integrates classical fully connected neural layers with a parameterized quantum circuit, enabling the processing of textual data within both classical and quantum computational domains. To assess its effectiveness, we conducted experiments on the widely used LIAR dataset utilizing Term Frequency–Inverse Document Frequency (TF-IDF) features, as well as transformer-based DistilBERT embeddings. The experimental results demonstrate that the HQDNN achieves a superior recall performance—92.58% with TF-IDF and 94.40% with DistilBERT—surpassing traditional machine learning models such as Logistic Regression, Linear SVM, and Multilayer Perceptron. Additionally, we compare the HQDNN with SetFit, a recent CPU-efficient few-shot transformer model, and show that while SetFit achieves higher precision, the HQDNN significantly outperforms it in recall. Furthermore, an ablation experiment confirms the critical contribution of the quantum component, revealing a substantial drop in performance when the quantum layer is removed. These findings highlight the potential of hybrid quantum–classical models as effective and compact alternatives for high-sensitivity classification tasks, particularly in domains such as fake news detection. Full article
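The general pattern of classical fully connected layers feeding a parameterized quantum circuit can be sketched with PennyLane and PyTorch. This is a minimal, hypothetical illustration of that pattern, not the HQDNN architecture: the qubit count, circuit templates, layer widths, and 300-dimensional input are assumptions.

```python
# Sketch of a hybrid dense + parameterized-quantum-circuit classifier
# (PennyLane + PyTorch). Sizes and circuit design are illustrative only.
import pennylane as qml
import torch
from torch import nn

n_qubits, n_layers = 4, 2
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev, interface="torch")
def quantum_circuit(inputs, weights):
    qml.AngleEmbedding(inputs, wires=range(n_qubits))              # encode classical features
    qml.StronglyEntanglingLayers(weights, wires=range(n_qubits))   # trainable entangling layers
    return [qml.expval(qml.PauliZ(w)) for w in range(n_qubits)]

weight_shapes = {"weights": (n_layers, n_qubits, 3)}
quantum_layer = qml.qnn.TorchLayer(quantum_circuit, weight_shapes)

model = nn.Sequential(
    nn.Linear(300, 16), nn.ReLU(),       # classical layers over TF-IDF / DistilBERT features
    nn.Linear(16, n_qubits),             # compress to one feature per qubit
    quantum_layer,                       # returns n_qubits expectation values
    nn.Linear(n_qubits, 1), nn.Sigmoid() # fake-vs-real probability
)

x = torch.rand(8, 300)                   # batch of 8 dummy feature vectors
print(model(x).shape)                    # torch.Size([8, 1])
```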

27 pages, 1481 KiB  
Article
Integration of Associative Tokens into Thematic Hyperspace: A Method for Determining Semantically Significant Clusters in Dynamic Text Streams
by Dmitriy Rodionov, Boris Lyamin, Evgenii Konnikov, Elena Obukhova, Gleb Golikov and Prokhor Polyakov
Big Data Cogn. Comput. 2025, 9(8), 197; https://doi.org/10.3390/bdcc9080197 - 25 Jul 2025
Viewed by 302
Abstract
With the exponential growth of textual data, traditional topic modeling methods based on static analysis demonstrate limited effectiveness in tracking the dynamics of thematic content. This research aims to develop a method for quantifying the dynamics of topics within text corpora using a thematic signal (TS) function that accounts for temporal changes and semantic relationships. The proposed method combines associative tokens with original lexical units to reduce thematic entropy and information noise. Approaches employed include topic modeling (LDA), vector representations of texts (TF-IDF, Word2Vec), and time series analysis. The method was tested on a corpus of news texts (5000 documents). Results demonstrated robust identification of semantically meaningful thematic clusters. An inverse relationship was observed between the level of thematic significance and semantic diversity, confirming a reduction in entropy using the proposed method. This approach allows for quantifying topic dynamics, filtering noise, and determining the optimal number of clusters. Future applications include analyzing multilingual data and integration with neural network models. The method shows potential for monitoring information flows and predicting thematic trends. Full article
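One simple way to read a "thematic signal" over time is to average LDA topic weights per time bin. The sketch below is an assumed, simplified rendering of that idea with toy documents and dates, not the authors' TS formulation.

```python
# Sketch: a simple thematic signal as the average LDA topic weight per time bin.
import pandas as pd
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

news = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-05", "2024-01-20", "2024-02-03",
                            "2024-02-18", "2024-03-02", "2024-03-25"]),
    "text": ["central bank raises interest rates",
             "inflation and interest rates pressure markets",
             "new electric vehicle battery factory opens",
             "battery technology improves electric vehicle range",
             "central bank signals rate cuts",
             "electric vehicle sales hit record high"],
})

bow = CountVectorizer(stop_words="english")
X = bow.fit_transform(news["text"])
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(X)          # document-topic distribution

# Thematic signal: mean topic weight over documents published in each month.
signal = (pd.DataFrame(doc_topics, columns=["topic_0", "topic_1"])
          .assign(month=news["date"].dt.to_period("M"))
          .groupby("month").mean())
print(signal)
```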

20 pages, 4490 KiB  
Article
Mapping Trends in Green Finance: A Bibliometric and Topic Modeling Analysis
by Orlando Joaqui-Barandica, Jesús Heredia-Carroza, Sebastian López-Estrada and Daniela-Tatiana Agheorghiesei
Int. J. Financial Stud. 2025, 13(3), 137; https://doi.org/10.3390/ijfs13030137 - 25 Jul 2025
Viewed by 648
Abstract
This study presents a comprehensive bibliometric and topic modeling analysis of the academic literature on green and sustainable finance. Using 1372 peer-reviewed articles indexed in the Web of Science up to 2024, we identify key publication trends, influential authors, prominent journals, and thematic clusters shaping the field. The analysis reveals an exponential growth in publications since 2017 and highlights the dominance of journals such as Journal of Sustainable Finance & Investment and Sustainability. Text mining techniques, including TF-IDF and Latent Dirichlet Allocation (LDA), are applied to abstracts to extract the most relevant terms and classify articles into four latent topics. The findings suggest a growing focus on the impact of green finance on carbon emissions, energy efficiency, and firm performance, particularly in the context of China. This study offers valuable insights for researchers and policymakers by mapping the intellectual structure and identifying emerging research frontiers in the rapidly evolving field of green finance. Full article

24 pages, 2281 KiB  
Article
Multilayer Network Modeling for Brand Knowledge Discovery: Integrating TF-IDF and TextRank in Heterogeneous Semantic Space
by Peng Xu, Rixu Zang, Zongshui Wang and Zhuo Sun
Information 2025, 16(7), 614; https://doi.org/10.3390/info16070614 - 17 Jul 2025
Viewed by 222
Abstract
In the era of homogenized competition, brand knowledge has become a critical factor that influences consumer purchasing decisions. However, traditional single-layer network models fail to capture the multi-dimensional semantic relationships embedded in brand-related textual data. To address this gap, this study proposes a BKMN framework integrating TF-IDF and TextRank algorithms for comprehensive brand knowledge discovery. By analyzing 19,875 consumer reviews of a mobile phone brand from the JD website, we constructed a tri-layer network comprising TF-IDF-derived keywords, TextRank-derived keywords, and their overlapping nodes. The model incorporates co-occurrence matrices and centrality metrics (degree, closeness, betweenness, eigenvector) to identify semantic hubs and interlayer associations. The results reveal that consumers prioritize attributes such as “camera performance”, “operational speed”, “screen quality”, and “battery life”. Notably, the overlap layer exhibits the highest node centrality, indicating convergent consumer focus across algorithms. The network demonstrates small-world characteristics (average path length = 1.627) with strong clustering (average clustering coefficient = 0.848), reflecting cohesive consumer discourse around key features. In addition, this study proposes the Mul-LSTM model for sentiment analysis of the reviews; it achieves 93% sentiment classification accuracy and reveals that consumers express predominantly positive attitudes towards the brand’s phones, providing a quantitative basis for enterprises to understand users’ emotional tendencies and optimize brand word-of-mouth management. This research advances brand knowledge modeling by synergizing heterogeneous algorithms and multilayer network analysis. Its practical implications include enabling enterprises to pinpoint competitive differentiators and optimize marketing strategies. Future work could extend the framework to incorporate sentiment dynamics and cross-domain applications in the smart home or cosmetics industries. Full article
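The layered construction can be illustrated with a small sketch: a TF-IDF keyword layer, a TextRank-style layer (PageRank over a word co-occurrence graph), their overlap, and basic centrality / small-world statistics. The reviews, stopword list, and thresholds below are toy assumptions, not the JD dataset or the BKMN implementation.

```python
# Sketch: TF-IDF keywords, TextRank-style keywords (PageRank over word co-occurrence),
# their overlap layer, and simple network statistics. Reviews are toy data.
import itertools
import networkx as nx
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

reviews = ["camera performance is great and battery life is long",
           "battery life and screen quality are excellent",
           "operational speed is fast but camera performance could improve",
           "screen quality and operational speed impress and battery life lasts"]

# Layer 1: top TF-IDF terms across the corpus.
vec = TfidfVectorizer(stop_words="english")
X = vec.fit_transform(reviews)
scores = np.asarray(X.mean(axis=0)).ravel()
tfidf_terms = {t for t, _ in sorted(zip(vec.get_feature_names_out(), scores),
                                    key=lambda p: -p[1])[:6]}

# Layer 2: TextRank-style terms via PageRank on a word co-occurrence graph.
cooc = nx.Graph()
for review in reviews:
    words = [w for w in review.split() if w not in {"is", "and", "are", "but"}]
    cooc.add_edges_from(itertools.combinations(set(words), 2))
textrank_terms = {t for t, _ in sorted(nx.pagerank(cooc).items(),
                                       key=lambda p: -p[1])[:6]}

overlap = tfidf_terms & textrank_terms        # overlap layer: convergent focus
print("TF-IDF layer:", tfidf_terms)
print("TextRank layer:", textrank_terms)
print("Overlap layer:", overlap)
print("Degree centrality:", nx.degree_centrality(cooc))
print("Avg clustering:", nx.average_clustering(cooc))
print("Avg path length:", nx.average_shortest_path_length(cooc))
```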

21 pages, 1689 KiB  
Article
Exploring LLM Embedding Potential for Dementia Detection Using Audio Transcripts
by Brandon Alejandro Llaca-Sánchez, Luis Roberto García-Noguez, Marco Antonio Aceves-Fernández, Andras Takacs and Saúl Tovar-Arriaga
Eng 2025, 6(7), 163; https://doi.org/10.3390/eng6070163 - 17 Jul 2025
Viewed by 294
Abstract
Dementia is a neurodegenerative disorder characterized by progressive cognitive impairment that significantly affects daily living. Early detection of Alzheimer’s disease—the most common form of dementia—remains essential for prompt intervention and treatment, yet clinical diagnosis often requires extensive and resource-intensive procedures. This article explores the effectiveness of automated Natural Language Processing (NLP) methods for identifying Alzheimer’s indicators from audio transcriptions of the Cookie Theft picture description task in the PittCorpus dementia database. Five NLP approaches were compared: a classical Tf–Idf statistical representation and embeddings derived from large language models (GloVe, BERT, Gemma-2B, and Linq-Embed-Mistral), each integrated with a logistic regression classifier. Transcriptions were carefully preprocessed to preserve linguistically relevant features such as repetitions, self-corrections, and pauses. To compare the performance of the five approaches, a stratified 5-fold cross-validation was conducted; the best results were obtained with BERT embeddings (84.73% accuracy) closely followed by the simpler Tf–Idf approach (83.73% accuracy) and the state-of-the-art model Linq-Embed-Mistral (83.54% accuracy), while Gemma-2B and GloVe embeddings yielded slightly lower performances (80.91% and 78.11% accuracy, respectively). Contrary to initial expectations—that richer semantic and contextual embeddings would substantially outperform simpler frequency-based methods—the competitive accuracy of Tf–Idf suggests that the choice and frequency of the words used might be more important than semantic or contextual information in Alzheimer’s detection. This work represents an effort toward implementing user-friendly software capable of offering an initial indicator of Alzheimer’s risk, potentially reducing the need for an in-person clinical visit. Full article
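The Tf–Idf + logistic regression baseline with stratified 5-fold cross-validation can be sketched as follows. The transcripts and labels are toy stand-ins, not PittCorpus data, and the n-gram range is an assumption.

```python
# Sketch: Tf-Idf features + logistic regression under stratified 5-fold CV.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline

transcripts = ["the boy is uh is taking the cookie jar",
               "mother is drying the dishes and the sink overflows",
               "the uh the the stool is um falling over",
               "the girl asks her brother for a cookie",
               "um he he is reaching up into the the cupboard",
               "the uh children are are stealing cookies behind her back",
               "water is running onto the floor from the sink",
               "she does not notice the water spilling over",
               "um the the boy is on a stool that that tips",
               "the mother stands at the window washing dishes"]
labels = [1, 0, 1, 0, 1, 1, 0, 0, 1, 0]   # 1 = dementia group, 0 = control (toy labels)

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),   # fillers/repetitions stay as features
    ("clf", LogisticRegression(max_iter=1000)),
])

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(pipeline, transcripts, labels, cv=cv, scoring="accuracy")
print(f"Mean accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```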

27 pages, 1817 KiB  
Article
A Large Language Model-Based Approach for Multilingual Hate Speech Detection on Social Media
by Muhammad Usman, Muhammad Ahmad, Grigori Sidorov, Irina Gelbukh and Rolando Quintero Tellez
Computers 2025, 14(7), 279; https://doi.org/10.3390/computers14070279 - 15 Jul 2025
Viewed by 717
Abstract
The proliferation of hate speech on social media platforms poses significant threats to digital safety, social cohesion, and freedom of expression. Detecting such content—especially across diverse languages—remains a challenging task due to linguistic complexity, cultural context, and resource limitations. To address these challenges, this study introduces a comprehensive approach for multilingual hate speech detection. To facilitate robust hate speech detection across diverse languages, this study makes several key contributions. First, we created a novel trilingual hate speech dataset consisting of 10,193 manually annotated tweets in English, Spanish, and Urdu. Second, we applied two innovative techniques—joint multilingual and translation-based approaches—for cross-lingual hate speech detection that have not been previously explored for these languages. Third, we developed detailed hate speech annotation guidelines tailored specifically to all three languages to ensure consistent and high-quality labeling. Finally, we conducted 41 experiments employing machine learning models with TF–IDF features, deep learning models utilizing FastText and GloVe embeddings, and transformer-based models leveraging advanced contextual embeddings to comprehensively evaluate our approach. Additionally, we employed a large language model with advanced contextual embeddings to identify the best solution for the hate speech detection task. The experimental results showed that our GPT-3.5-turbo model significantly outperforms strong baselines, achieving up to an 8% improvement over XLM-R in Urdu hate speech detection and an average gain of 4% across all three languages. This research not only contributes a high-quality multilingual dataset but also offers a scalable and inclusive framework for hate speech detection in underrepresented languages. Full article
(This article belongs to the Special Issue Recent Advances in Social Networks and Social Media)

19 pages, 2783 KiB  
Article
Cross-Project Multiclass Classification of EARS-Based Functional Requirements Utilizing Natural Language Processing, Machine Learning, and Deep Learning
by Touseef Tahir, Hamid Jahankhani, Kinza Tasleem and Bilal Hassan
Systems 2025, 13(7), 567; https://doi.org/10.3390/systems13070567 - 10 Jul 2025
Viewed by 448
Abstract
Software requirements are primarily classified into functional and non-functional requirements. While research has explored automated multiclass classification of non-functional requirements, functional requirements remain largely unexplored. This study addresses that gap by introducing a comprehensive dataset comprising 9529 functional requirements from 315 diverse projects. The requirements are classified into five categories: ubiquitous, event-driven, state-driven, unwanted behavior, and optional capabilities. Natural Language Processing (NLP), machine learning (ML), and deep learning (DL) techniques are employed to enable automated classification. All requirements underwent several preprocessing procedures, including normalization and feature extraction techniques such as TF-IDF. A series of ML and DL experiments was conducted to classify the subcategories of functional requirements. Among the trained models, the convolutional neural network achieved the highest performance, with an accuracy of 93%, followed by the long short-term memory network at 92%, outperforming traditional decision-tree-based methods. This work offers a foundation for precise requirement classification tools by providing both the dataset and an automated classification approach. Full article
(This article belongs to the Special Issue Decision Making in Software Project Management)
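Multiclass classification over the five EARS categories with TF-IDF features can be sketched with a linear baseline, as below. The example requirements and labels are toy stand-ins, and the linear SVM is an assumed baseline rather than the paper's CNN.

```python
# Sketch: multiclass classification of EARS-style functional requirements
# using TF-IDF features and a linear baseline. Requirements are toy stand-ins.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

requirements = [
    "The system shall log all transactions.",                                   # ubiquitous
    "When the user presses start, the system shall begin recording.",           # event-driven
    "While in maintenance mode, the system shall disable external requests.",   # state-driven
    "If the sensor fails, then the system shall raise an alarm.",               # unwanted behavior
    "Where a printer is installed, the system shall offer printed reports.",    # optional capability
    "The system shall encrypt stored credentials.",
    "When a file is uploaded, the system shall scan it for malware.",
    "While the battery is low, the system shall dim the display.",
    "If the connection drops, then the system shall retry three times.",
    "Where GPS is available, the system shall tag records with location.",
]
labels = ["ubiquitous", "event", "state", "unwanted", "optional"] * 2

model = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), lowercase=True)),
    ("clf", LinearSVC()),
])
model.fit(requirements, labels)
print(model.predict(["When the alarm rings, the system shall notify the operator."]))
```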

16 pages, 1535 KiB  
Article
Clinical Text Classification for Tuberculosis Diagnosis Using Natural Language Processing and Deep Learning Model with Statistical Feature Selection Technique
by Shaik Fayaz Ahamed, Sundarakumar Karuppasamy and Ponnuraja Chinnaiyan
Informatics 2025, 12(3), 64; https://doi.org/10.3390/informatics12030064 - 7 Jul 2025
Viewed by 485
Abstract
Background: In the medical field, various deep learning (DL) algorithms have been effectively used to extract valuable information from unstructured clinical text data, potentially leading to more effective outcomes. This study utilized clinical text data to classify clinical case reports into tuberculosis (TB) and non-tuberculosis (non-TB) groups using natural language processing (NLP), a pre-processing technique, and DL models. Methods: This study used 1743 open-source respiratory disease clinical text data, labeled via fuzzy matching with ICD-10 codes to create a labeled dataset. Two tokenization methods preprocessed the clinical text data, and three models were evaluated: the existing Text-CNN, the proposed Text-CNN with t-test, and Bio_ClinicalBERT. Performance was assessed using multiple metrics and validated on 228 baseline screening clinical case text data collected from ICMR–NIRT to demonstrate effective TB classification. Results: The proposed model achieved the best results in both the test and validation datasets. On the test dataset, it attained a precision of 88.19%, a recall of 90.71%, an F1-score of 89.44%, and an AUC of 0.91. Similarly, on the validation dataset, it achieved 100% precision, 98.85% recall, 99.42% F1-score, and an AUC of 0.982, demonstrating its effectiveness in TB classification. Conclusions: This study highlights the effectiveness of DL models in classifying TB cases from clinical notes. The proposed model outperformed the other two models. The TF-IDF and t-test showed statistically significant feature selection and enhanced model interpretability and efficiency, demonstrating the potential of NLP and DL in automating TB diagnosis in clinical decision settings. Full article
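The idea of t-test-based selection over TF-IDF features before classification can be sketched as follows. The clinical notes, labels, and p-value threshold are toy assumptions, not the study's dataset or its Text-CNN model.

```python
# Sketch: independent-samples t-test feature selection over TF-IDF features,
# keeping terms that differ significantly between TB and non-TB notes (toy data).
import numpy as np
from scipy.stats import ttest_ind
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

notes = ["persistent cough with haemoptysis and night sweats",
         "productive cough night sweats and weight loss for weeks",
         "chronic cough with sputum smear positive and cavitary lesion on chest x ray",
         "seasonal allergic rhinitis with sneezing and watery eyes",
         "acute bronchitis with wheeze resolving after one week",
         "mild asthma exacerbation relieved by inhaler"]
y = np.array([1, 1, 1, 0, 0, 0])    # 1 = TB, 0 = non-TB (toy labels)

vec = TfidfVectorizer()
X = vec.fit_transform(notes).toarray()

# Per-term Welch t-test between the two groups; keep terms below the threshold.
_, p_values = ttest_ind(X[y == 1], X[y == 0], axis=0, equal_var=False)
keep = p_values < 0.10                      # illustrative threshold
print("Selected terms:", vec.get_feature_names_out()[keep])

clf = LogisticRegression(max_iter=1000).fit(X[:, keep], y)
print("Training accuracy:", clf.score(X[:, keep], y))
```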

23 pages, 1290 KiB  
Article
A KeyBERT-Enhanced Pipeline for Electronic Information Curriculum Knowledge Graphs: Design, Evaluation, and Ontology Alignment
by Guanghe Zhuang and Xiang Lu
Information 2025, 16(7), 580; https://doi.org/10.3390/info16070580 - 6 Jul 2025
Viewed by 464
Abstract
This paper proposes a KeyBERT-based method for constructing a knowledge graph of the electronic information curriculum system, aiming to enhance the structured representation and relational analysis of educational content. Electronic Information Engineering curricula encompass diverse and rapidly evolving topics; however, existing knowledge graphs often overlook multi-word concepts and more nuanced semantic relationships. To address this gap, this paper presents a KeyBERT-enhanced method for constructing a knowledge graph of the electronic information curriculum system. Utilizing teaching plans, syllabi, and approximately 500,000 words of course materials from 17 courses, we first extracted 500 knowledge points via the Term Frequency–Inverse Document Frequency (TF-IDF) algorithm to build a baseline course–knowledge matrix and visualize the preliminary graph using Graph Convolutional Networks (GCN) and Neo4j. We then applied KeyBERT to extract about 1000 knowledge points—approximately 65% of extracted terms were multi-word phrases—and augment the graph with co-occurrence and semantic-similarity edges. Comparative experiments demonstrate a ~20% increase in non-zero matrix coverage and a ~40% boost in edge count (from 5100 to 7100), significantly enhancing graph connectivity. Moreover, we performed sensitivity analysis on extraction thresholds (co-occurrence ≥ 5, similarity ≥ 0.7), revealing that (5, 0.7) maximizes the F1-score at 0.83. Hyperparameter ablation over n-gram ranges [(1,1),(1,2),(1,3)] and top_n [5, 10, 15] identifies (1,3) + top_n = 10 as optimal (Precision = 0.86, Recall = 0.81, F1 = 0.83). Finally, GCN downstream tests show that, despite higher sparsity (KeyBERT 64% vs. TF-IDF 40%), KeyBERT features achieve Accuracy = 0.78 and F1 = 0.75, outperforming TF-IDF’s 0.66/0.69. This approach offers a novel, rigorously evaluated solution for optimizing the electronic information curriculum system and can be extended through terminology standardization or larger data integration. Full article
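Key-phrase extraction with the reported optimal settings (n-grams up to length 3, top_n = 10) can be sketched with the keybert library. The course text below is a toy snippet, and the default MiniLM encoder is an assumption; the paper's co-occurrence and similarity edge construction is not reproduced here.

```python
# Sketch: KeyBERT key-phrase extraction with keyphrase n-grams up to 3 and top_n = 10.
from keybert import KeyBERT

course_text = (
    "Digital signal processing covers the discrete Fourier transform, "
    "finite impulse response filter design, the sampling theorem, "
    "z-transform analysis, and spectral estimation of discrete-time signals."
)

kw_model = KeyBERT()                     # defaults to a MiniLM sentence encoder
keywords = kw_model.extract_keywords(
    course_text,
    keyphrase_ngram_range=(1, 3),        # allow multi-word concepts
    stop_words="english",
    top_n=10,
)
for phrase, score in keywords:
    print(f"{score:.3f}  {phrase}")
```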

23 pages, 2960 KiB  
Article
Exploring Information Interaction Preferences in an LLM-Assisted Learning Environment with a Topic Modeling Framework
by Yiming Taclis Luo, Ting Liu, Patrick Cheong-Iao Pang, Zhuo Wang and Ka Ian Chan
Appl. Sci. 2025, 15(13), 7515; https://doi.org/10.3390/app15137515 - 4 Jul 2025
Viewed by 541
Abstract
Large Language Models (LLMs) are driving a revolution in the way we access information, yet little work has captured people’s information interaction preferences in LLM environments. In this study, we designed a comprehensive analysis framework to evaluate students’ prompt texts during a professional academic writing task. The framework includes a dimensionality reduction and classification method; three topic modeling approaches, namely BERTopic, BoW-LDA, and TF-IDF-NMF; and a set of evaluation criteria. These criteria assess both the semantic quality of topic content and the structural quality of clustering. Using this framework, we analyzed 288 prompt texts to identify key topics that reflect students’ information interaction behaviors. The results showed that students with low academic performance tend to focus on structural clarity and task execution, including task inquiry, format specifications, and methodological search, indicating an instruction-oriented interaction mode. In contrast, students with high academic performance interact with the LLM not only for basic task completion but also for knowledge integration and the pursuit of novel ideas, reflected in more complex topic structures and diverse, innovative keywords, and indicating stronger self-planning and self-regulation abilities. By applying natural language processing to prompts, this study offers a new approach to studying student–LLM interaction in engineering education and helps explore how students at different performance levels use LLMs in professional academic writing. Full article
(This article belongs to the Special Issue Applications of Natural Language Processing to Data Science)
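Two of the framework's topic-modeling branches, BoW + LDA and TF-IDF + NMF, can be sketched as below. The prompts, topic count, and term display are toy assumptions; the BERTopic branch and the evaluation criteria are not reproduced here.

```python
# Sketch: BoW + LDA and TF-IDF + NMF applied to student prompt texts (toy data).
from sklearn.decomposition import NMF, LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

prompts = ["what format should the abstract follow",
           "suggest a structure for the methodology section",
           "how do I cite sources in IEEE format",
           "give me novel ideas to combine these two approaches",
           "can you integrate this theory with my experimental results",
           "propose an innovative framing for the discussion section"]

def top_terms(model, feature_names, n=4):
    """Return the n highest-weighted terms for each topic."""
    return [[feature_names[i] for i in comp.argsort()[-n:][::-1]]
            for comp in model.components_]

# Branch 1: bag-of-words + LDA
bow = CountVectorizer(stop_words="english")
X_bow = bow.fit_transform(prompts)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X_bow)
print("BoW-LDA topics:", top_terms(lda, bow.get_feature_names_out()))

# Branch 2: TF-IDF + NMF
tfidf = TfidfVectorizer(stop_words="english")
X_tfidf = tfidf.fit_transform(prompts)
nmf = NMF(n_components=2, random_state=0).fit(X_tfidf)
print("TF-IDF-NMF topics:", top_terms(nmf, tfidf.get_feature_names_out()))
```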

21 pages, 804 KiB  
Article
Spam Email Detection Using Long Short-Term Memory and Gated Recurrent Unit
by Samiullah Saleem, Zaheer Ul Islam, Syed Shabih Ul Hasan, Habib Akbar, Muhammad Faizan Khan and Syed Adil Ibrar
Appl. Sci. 2025, 15(13), 7407; https://doi.org/10.3390/app15137407 - 1 Jul 2025
Viewed by 498
Abstract
In today’s business environment, email is essential across all sectors, including finance and academia. Emails fall into two main types: ham (legitimate) and spam (unsolicited). Spam wastes users’ time and resources and puts sensitive data at risk, with volumes growing rapidly. Existing spam identification methods, such as blocklist approaches and content-based techniques, have clear limitations, motivating more detailed and accurate approaches, such as machine learning (ML) and deep learning (DL), for reliably detecting new scams. In this work, we develop a hybrid deep learning model in which Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) networks are combined to identify spam email. Although these models have previously been applied independently (CNNs, LSTM, GRU, or ensemble machine learning classifiers), this research contributes to the literature by combining the strength of LSTM in capturing long-term dependencies with the computational efficiency of the GRU. This hybridization addresses key issues such as the vanishing gradient problem and the excessive resource consumption often encountered with standalone deep learning models. The proposed model achieves a detection accuracy of 90% and an AUC of 98.99%. While Transformer-based models can be lighter and suitable for real-time applications, they still require extensive computational resources. The proposed approach offers a substantive and scalable foundation for spam detection, distinguished from familiar approaches by its preprocessing pipeline, including targeted stop-word removal, TF-IDF vectorization, and evaluation on a large, real-world dataset (Enron-Spam). In addition, the feature comparison technique within the model helps minimize false positives and false negatives. Full article
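A stacked LSTM-then-GRU binary classifier can be sketched in Keras as below. The vocabulary size, layer widths, and toy emails are placeholders, not the paper's configuration, and the paper's TF-IDF preprocessing is not reproduced here.

```python
# Sketch: a stacked LSTM -> GRU binary spam classifier (tf.keras). Toy data only.
import numpy as np
from tensorflow.keras import layers, models

emails = ["win a free prize claim your reward now",
          "meeting moved to 3pm see agenda attached",
          "urgent your account needs verification click here",
          "lunch tomorrow with the project team"]
labels = np.array([1, 0, 1, 0])          # 1 = spam, 0 = ham (toy labels)

# Turn raw text into fixed-length integer sequences.
vectorize = layers.TextVectorization(max_tokens=5000, output_sequence_length=20)
vectorize.adapt(emails)
X = vectorize(np.array(emails))

model = models.Sequential([
    layers.Embedding(input_dim=5000, output_dim=32),
    layers.LSTM(32, return_sequences=True),   # captures long-range dependencies
    layers.GRU(16),                           # lighter recurrent layer on top
    layers.Dense(1, activation="sigmoid"),    # spam probability
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, labels, epochs=3, batch_size=2, verbose=0)
print(model.predict(X, verbose=0).ravel())
```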

13 pages, 732 KiB  
Article
Unveiling Key Research Trends in Endometrial Cancer: A Comprehensive Topic Modeling Analysis
by Sujin Kang and Youngji Kim
Healthcare 2025, 13(13), 1567; https://doi.org/10.3390/healthcare13131567 - 30 Jun 2025
Viewed by 327
Abstract
Background/Objectives: Endometrial cancer (EC) is the sixth most common cancer among women worldwide, and its global incidence has significantly increased over the past three decades. Despite its substantial burden, comprehensive reviews of EC-related research remain limited. This study employs topic modeling to analyze and classify recent research trends in EC. Methods: We identified studies related to endometrial carcinoma published between 2019 and 2023 in PubMed, Web of Science, and the Cochrane Library. The search was conducted using the following terms: endometr* AND (neoplasm* OR cancer* OR carcinoma*) NOT endometriosis. Word clouds were constructed and topic modeling was performed to analyze research activity. Results: A total of 2188 studies were selected, and 11,552 terms were extracted. High-frequency and TF-IDF-weighted keywords included ‘cancer’, ‘risk’, ‘survival’, ‘stage’, ‘tumor’, ‘surgery’, and ‘OS.’ Topic modeling analysis identified ten clusters, categorized as follows: ‘Gynecologic cancer’, ‘Surgical staging’, ‘Therapeutic efficacy’, ‘Diagnosis’, ‘Surgical management’, ‘Multimodal treatment’, ‘Molecular treatment’, ‘Risk factors’, ‘Survival’, and ‘Hormonal regulation.’ Conclusions: This study highlights that recent research on EC has primarily focused on surgical decision making, outcome prediction, and patient survival. Future studies should place greater emphasis on multimodal treatment and prevention—particularly through the identification of risk factors—as well as on improving patients’ quality of life. Full article
