Search Results (370)

Search Parameters:
Keywords = TF-IDF

14 pages, 488 KB  
Article
The Evolution of Nanoparticle Regulation: A Meta-Analysis of Research Trends and Historical Parallels (2015–2025)
by Sung-Kwang Shin, Niti Sharma, Seong Soo A. An and Meyoung-Kon (Jerry) Kim
Nanomaterials 2026, 16(2), 134; https://doi.org/10.3390/nano16020134 - 19 Jan 2026
Abstract
Objective: We analyzed nanoparticle regulation research to examine the evolution of regulatory frameworks, identify major thematic structures, and evaluate current challenges in the governance of rapidly advancing nanotechnologies. By drawing parallels with the historical development of radiation regulation, the study aimed to contextualize emerging regulatory strategies and derive lessons for future governance. Methods: A total of 9095 PubMed-indexed articles published between January 2015 and October 2025 were analyzed using text mining, keyword frequency analysis, and topic modeling. Preprocessed titles and abstracts were transformed into a TF-IDF (Term Frequency–Inverse Document Frequency) document–term matrix, and NMF (Non-negative Matrix Factorization) was applied to extract semantically coherent topics. Candidate topic numbers (K = 1–12) were evaluated using UMass coherence scores and qualitative interpretability criteria to determine the optimal topic structure. Results: Six major research topics were identified, spanning energy and sensor applications, metal oxide toxicity, antibacterial silver nanoparticles, cancer nano-therapy, and nanoparticle-enabled drug and mRNA delivery. Publication output increased markedly after 2019, with interdisciplinary journals driving much of the growth. Regulatory considerations were increasingly embedded within experimental and biomedical research, particularly in safety assessment and environmental impact analyses. Conclusions: Nanoparticle regulation has matured into a dynamic multidisciplinary field. Regulatory efforts should prioritize adaptive, data-informed, and internationally harmonized frameworks that support innovation while ensuring human and environmental safety. These findings provide a data-driven overview of how regulatory thinking has evolved alongside scientific development and highlight areas where future governance efforts are most urgently needed. Full article
(This article belongs to the Section Environmental Nanoscience and Nanotechnology)

24 pages, 4461 KB  
Article
SD-CVD Corpus: Towards Robust Detection of Fine-Grained Cyber-Violence Across Saudi Dialects in Online Platforms
by Abrar Alsayed, Salma Elhag and Sahar Badri
Information 2026, 17(1), 76; https://doi.org/10.3390/info17010076 - 12 Jan 2026
Viewed by 180
Abstract
This paper introduces the Saudi Dialects Cyber Violence Detection (SD-CVD) corpus, a large-scale, class-balanced Saudi-dialect corpus for fine-grained cyber violence detection on online platforms. The dataset contains 88,687 Saudi Arabic tweets annotated using a three-level hierarchical scheme that assigns each tweet to one of 11 mutually exclusive classes, covering benign sentiment (positive, neutral, negative), cyberbullying, and seven hate-speech subtypes (incitement to violence, gender, national, social class, tribal, religious, and regional discrimination). To mitigate the class imbalance common in Arabic cyber violence datasets, data augmentation was applied to achieve a near-uniform class distribution. Annotation quality was ensured through multi-stage review, yielding excellent inter-annotator agreement (Fleiss’ κ > 0.89). We evaluate three modeling paradigms: traditional machine learning with TF–IDF and n-gram features (SVM, logistic regression, random forest), deep learning models trained on fixed sentence embeddings (LSTM, RNN, MLP, CNN), and fine-tuned transformer models (AraBERTv02-Twitter, CAMeLBERT-MSA). Experimental results show that transformers perform best, with AraBERTv02-Twitter achieving the highest weighted F1-score (0.882) followed by CAMeLBERT-MSA (0.869). Among non-transformer baselines, SVM is most competitive (0.853), while CNN performs worst (0.561). Overall, SD-CVD provides a high-quality benchmark and strong baselines to support future research on robust and interpretable Arabic cyber-violence detection. Full article
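The strongest non-transformer baseline above (TF-IDF n-grams fed to an SVM) is a standard scikit-learn pipeline. The four English sentences and two labels below are toy stand-ins, not SD-CVD data.

```python
# Minimal TF-IDF n-gram + linear SVM text classifier (toy stand-in data).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

train_texts = ["you are wonderful", "great game today",
               "i will hurt you", "you deserve violence"]
train_labels = ["benign", "benign", "incitement", "incitement"]

clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),   # word uni- and bi-grams
    LinearSVC(),
)
clf.fit(train_texts, train_labels)

test_texts = ["wonderful game", "i will hurt them"]
pred = clf.predict(test_texts)
print(f1_score(["benign", "incitement"], pred, average="weighted"))
```

The paper reports a weighted F1 of 0.853 for this family of model on the real corpus; morphological preprocessing for Arabic would be added before vectorization.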

29 pages, 2980 KB  
Article
Integrating NLP and Ensemble Learning into Next-Generation Firewalls for Robust Malware Detection in Edge Computing
by Ramahlapane Lerato Moila and Mthulisi Velempini
Sensors 2026, 26(2), 424; https://doi.org/10.3390/s26020424 - 9 Jan 2026
Viewed by 297
Abstract
As edge computing becomes increasingly central to modern digital infrastructure, it also creates opportunities for sophisticated malware attacks that traditional security systems struggle to address. This study proposes a natural language processing (NLP) framework integrated with ensemble learning into next-generation firewalls (NGFWs) to detect and mitigate malware attacks in edge computing environments. The approach leverages unstructured threat intelligence (e.g., cybersecurity reports, logs) by applying NLP techniques, such as TF-IDF vectorization, to convert textual data into structured insights. This process uncovers hidden patterns and entity relationships within system logs. By combining Random Forest (RF) and Logistic Regression (LR) in a soft voting ensemble, the proposed model achieves 95% accuracy on a cyber threat intelligence dataset augmented with synthetic data to address class imbalance, and 98% accuracy on the CSE-CIC-IDS2018 dataset. The study was validated using ANOVA to assess statistical robustness and confusion matrix analysis, both of which confirmed low error rates. The system enhances detection rates and adaptability, providing a scalable defense layer optimized for resource-constrained, latency-sensitive edge environments. Full article
(This article belongs to the Section Internet of Things)
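The soft-voting ensemble of Random Forest and Logistic Regression described above averages the two models' class probabilities. Random numeric features stand in here for the TF-IDF-vectorized threat-intelligence text; labels are synthetic.

```python
# Soft-voting RF + LR ensemble on synthetic features (stand-in for TF-IDF logs).
import numpy as np
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = (X[:, 0] + X[:, 1] > 0).astype(int)      # synthetic benign/malware labels

ensemble = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("lr", LogisticRegression(max_iter=1000)),
    ],
    voting="soft",                           # average predicted probabilities
)
ensemble.fit(X, y)
proba = ensemble.predict_proba(X[:5])
print(proba.shape)
```

Soft voting requires every member to expose `predict_proba`, which both RF and LR do; that is one reason this pairing is common.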

13 pages, 444 KB  
Article
Evaluating the Accuracy, Usefulness, and Safety of ChatGPT for Caregivers Seeking Information on Congenital Muscular Torticollis
by Siyun Kim, Seoyon Yang, Jaewon Kim, Sunyoung Joo, Hoo Young Lee, Hye Jung Park, Jongwook Jeon and You Gyoung Yi
Healthcare 2026, 14(2), 140; https://doi.org/10.3390/healthcare14020140 - 6 Jan 2026
Viewed by 123
Abstract
Background/Objectives: Caregivers of infants with congenital muscular torticollis (CMT) frequently seek information online, although the accuracy, clarity, and safety of web-based content remain variable. As large language models (LLMs) are increasingly used as health information tools, their reliability for caregiver education requires systematic evaluation. This study aimed to assess the reproducibility and quality of ChatGPT-5.1 responses to caregiver-centered questions regarding CMT. Methods: A set of 17 questions was developed through a Delphi process involving clinicians and caregivers to ensure relevance and comprehensiveness. ChatGPT generated responses in two independent sessions. Reproducibility was assessed using TF–IDF cosine similarity and embedding-based semantic similarity. Ten clinical experts evaluated each response for accuracy, readability, safety, and overall quality using a 4-point Likert scale. Results: ChatGPT demonstrated moderate lexical consistency (mean TF–IDF similarity 0.75) and high semantic stability (mean embedding similarity 0.92). Expert ratings indicated moderate to good performance across domains, with mean scores of 3.0 for accuracy, 3.6 for readability, 3.1 for safety, and 3.1 for overall quality. However, several responses exhibited deficiencies, particularly due to omission of key cautions, oversimplification, or insufficient clinical detail. Conclusions: While ChatGPT provides fluent and generally accurate information about CMT, the observed variability across topics underscores the importance of human oversight and content refinement prior to integration into caregiver-facing educational materials. Full article
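The TF-IDF cosine-similarity reproducibility check described above can be computed in a few lines; the two session responses below are invented examples, not the study's ChatGPT outputs.

```python
# TF-IDF cosine similarity between two generated responses (toy examples).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

session_a = "Stretching exercises and tummy time help infants with torticollis."
session_b = "Tummy time and gentle stretching exercises can help infants with torticollis."

X = TfidfVectorizer().fit_transform([session_a, session_b])
sim = float(cosine_similarity(X[0], X[1])[0, 0])
print(round(sim, 2))
```

Lexical similarity of this kind penalizes paraphrase, which is why the study pairs it with embedding-based semantic similarity (0.75 vs. 0.92 on average).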

16 pages, 1520 KB  
Article
Public Conversation on X During COP30: Engagement, Sentiment and Thematic Dynamics Around #COP30noBrasil
by Rafael Carrasco-Polaino
Soc. Sci. 2026, 15(1), 21; https://doi.org/10.3390/socsci15010021 - 1 Jan 2026
Viewed by 281
Abstract
This study examines how public conversation on X unfolded during the COP30 climate summit, focusing on posts articulated around the official hashtag #COP30noBrasil and analysing a dataset of 1139 posts. Social media research has shown that platforms such as X play a central role in shaping climate communication, particularly during major diplomatic events. To explore this dynamic, all posts published between 10 and 21 November 2025 were collected using Tweet Binder and analysed quantitatively. Engagement, follower–following ratio and sentiment were computed, and non-parametric tests were applied given the non-normal distribution of the variables. Word clouds based on frequency and TF–IDF weighting were generated to identify prevalent topics in posts and replies. The results showed that activity was dominated by retweets, with original posts and replies representing smaller portions of the interaction. Engagement did not differ significantly between verified and unverified accounts, although posts with images generated higher interaction than text-only posts. No significant correlations emerged between engagement, sentiment or FF ratio. Replies displayed a less positive tone than original posts, suggesting a shift toward more neutral reactions. The thematic analysis indicated that original posts centred on planning and institutional aspects of COP30, while replies focused more on Amazon-related issues, resource extraction and calls for environmental protection. Full article
(This article belongs to the Special Issue Big Data and Political Communication)

19 pages, 726 KB  
Article
Structural–Semantic Term Weighting for Interpretable Topic Modeling with Higher Coherence and Lower Token Overlap
by Dmitriy Rodionov, Evgenii Konnikov, Gleb Golikov and Polina Yakob
Information 2026, 17(1), 22; https://doi.org/10.3390/info17010022 - 31 Dec 2025
Viewed by 201
Abstract
Topic modeling of large news streams is widely used to reconstruct economic and political narratives, which requires coherent topics with low lexical overlap while remaining interpretable to domain experts. We propose TF-SYN-NER-Rel, a structural–semantic term weighting scheme that extends classical TF-IDF by integrating positional, syntactic, factual, and named-entity coefficients derived from morphosyntactic and dependency parses of Russian news texts. The method is embedded into a standard Latent Dirichlet Allocation (LDA) pipeline and evaluated on a large Russian-language news corpus from the online archive of Moskovsky Komsomolets (over 600,000 documents), with political, financial, and sports subsets obtained via dictionary-based expert labeling. For each subset, TF-SYN-NER-Rel is compared with standard TF-IDF under identical LDA settings, and topic quality is assessed using the C_v coherence metric. To assess robustness, we repeat model training across multiple random initializations and report aggregate coherence statistics. Quantitative results show that TF-SYN-NER-Rel improves coherence and yields smoother, more stable coherence curves across the number of topics. Qualitative analysis indicates reduced lexical overlap between topics and clearer separation of event-centered and institutional themes, especially in political and financial news. Overall, the proposed pipeline relies on CPU-based NLP tools and sparse linear algebra, providing a computationally lightweight and interpretable complement to embedding- and LLM-based topic modeling in large-scale news monitoring. Full article

21 pages, 11718 KB  
Article
A Method to Infer Customary Routes via Analysis of the Movement Importance of Ship Trajectories Calculated Using TF-IDF
by Seung Sim, Jun-Rae Cho, Jae-Ryong Jung, Jong-Hwa Baek and Deuk-Jae Cho
J. Mar. Sci. Eng. 2026, 14(1), 29; https://doi.org/10.3390/jmse14010029 - 23 Dec 2025
Viewed by 225
Abstract
Ship positional data are widely used for route inference, yet most existing studies rely on automatic identification system data, which contain irregular transmission intervals and limit the ability to capture vessel-specific operational habits and subtle route choices. This study addresses these limitations by proposing a methodology to infer customary routes using periodic 3 s ship position data collected through the Korean e-Navigation system based on long-term evolution maritime communication. The method comprises three main steps: constructing a sea-area grid with an associated weight map, determining data-driven importance and updating weights, and performing pathfinding. Domestic waters are divided into 100 m grids, and navigable and non-navigable areas are binarized to establish a framework for route exploration. Ship positional data are processed to extract inter-port trajectories, which are then classified by ship size and tidal time zone to account for navigational differences arising from vessel characteristics and tide-dependent accessibility. These trajectories are combined with spatial grids and transformed into a document–word structure, enabling the calculation of movement importance between grid cells using a modified term frequency–inverse document frequency measure. The resulting weights are applied to a pathfinding graph to derive routes that reflect vessel size and tidal conditions. The effectiveness of the proposed method is evaluated by computing cosine similarity between the inferred routes and actual trajectories. Full article
(This article belongs to the Special Issue Advanced Ship Trajectory Prediction and Route Planning)

23 pages, 953 KB  
Article
Comparative Study of Machine Learning Models for Textual Medical Note Classification
by Yan Zhang, Huynh Trung Nguyen Le, Nathan Lopez and Kira Phan
Computers 2026, 15(1), 7; https://doi.org/10.3390/computers15010007 - 23 Dec 2025
Viewed by 413
Abstract
The expansion of electronic health records (EHRs) has generated a large amount of unstructured textual data, such as clinical notes and medical reports, which contain diagnostic and prognostic information. Effective classification of these textual medical notes is critical for improving clinical decision support and healthcare data management. This study presents a statistically rigorous comparative analysis of four traditional machine learning algorithms—Random Forest, Logistic Regression, Multinomial Naive Bayes, and Support Vector Machine—for multiclass classification of medical notes into four disease categories: Neoplasms, Digestive System Diseases, Nervous System Diseases, and Cardiovascular Diseases. A dataset containing 9633 labeled medical notes was preprocessed through text cleaning, lemmatization, stop-word removal, and vectorization using term frequency-inverse document frequency (TF–IDF) representation. The models were trained and optimized through GridSearchCV with 5-fold cross-validation and evaluated across five independent stratified 90-10 train–test splits. Evaluation metrics, including accuracy, precision, recall, F1-score, and multiclass ROC-AUC, were used to assess model performance. Logistic Regression demonstrated the strongest overall performance, achieving an average accuracy of 0.8469 and high macro and weighted F1 scores, followed by Support Vector Machine and Multinomial Naive Bayes. Misclassification patterns revealed substantial lexical overlap between digestive and neurological disease notes, underscoring the limitations of TF–IDF representations in capturing deeper semantic distinctions. These findings confirm that traditional machine learning models remain robust, interpretable, and computationally efficient tools for textual medical note classification, and the study establishes a transparent and reproducible benchmark that provides a solid foundation for future methodological advancements in clinical natural language processing. Full article
(This article belongs to the Special Issue Machine Learning and Statistical Learning with Applications 2025)
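The tuning setup described above (TF-IDF features, GridSearchCV, 5-fold CV) is a standard scikit-learn pattern. A tiny two-class toy corpus replaces the 9633 medical notes, and only one hyperparameter is searched.

```python
# TF-IDF + Logistic Regression tuned with 5-fold GridSearchCV (toy corpus).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

texts = ["tumor mass biopsy", "gastric ulcer pain", "malignant tumor growth",
         "bowel obstruction nausea", "tumor staging scan", "reflux and ulcer"] * 3
labels = ["neoplasm", "digestive", "neoplasm",
          "digestive", "neoplasm", "digestive"] * 3

pipe = Pipeline([("tfidf", TfidfVectorizer()),
                 ("clf", LogisticRegression(max_iter=1000))])
grid = GridSearchCV(pipe, {"clf__C": [0.1, 1.0, 10.0]},
                    cv=5, scoring="f1_macro")
grid.fit(texts, labels)
print(grid.best_params_)
```

Putting the vectorizer inside the pipeline keeps each CV fold's IDF statistics free of test-fold leakage, which matters for the reproducible benchmark the paper aims at.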

24 pages, 1586 KB  
Article
A Study on Psychospatial Perception of a Sustainable Urban Node: Semantic–Spatial Mapping of User-Generated Place Cognition at Hakata Station in Fukuoka, Japan
by Chiayu Tsai and Shichen Zhao
Sustainability 2025, 17(24), 10959; https://doi.org/10.3390/su172410959 - 8 Dec 2025
Viewed by 384
Abstract
Reducing reliance on private vehicles, optimizing public spaces, and adopting low-carbon, energy-efficient practices are essential strategies for advancing sustainable urban development. This study investigates user perceptions and spatial experiences at Hakata Station in Fukuoka, Japan, by analyzing online reviews collected over 1 year. The results indicate that: (1) Using TF–IDF vectorization and K-means clustering (K = 5), five major semantic themes were identified, and a chi-square test (χ2(16) = 632.00, p < 0.001) confirmed their strong correspondence with the station’s five functional zones. This revealed a cognitive mapping effect between users’ semantic structures and spatial functions. (2) Six environmental psychology indicators—Wayfinding Usability, Crowding Density, Seating and Rest Availability, Functional Convenience, Environmental Quality, and Information Legibility—were established. Logistic regression showed that only Functional Convenience significantly predicted positive sentiment (OR = 31.6, p = 0.05), underscoring the emotional influence of smooth circulation and well-integrated commercial facilities. (3) Process-intensive areas exhibited emotional accumulation and cognitive strain, while restorative zones reduced mental fatigue; moderate spatial concealment enhanced exploration, and a shared social atmosphere fostered belongingness. The findings elucidate the psychological correspondence between semantic structures and spatial functions, providing user-centered indicators for urban node design that promote comfort, accessibility, and urban sustainability. Full article
(This article belongs to the Special Issue Advanced Studies in Sustainable Urban Planning and Urban Development)
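The TF-IDF vectorization and K-means step in result (1) can be sketched as below. The study clusters Japanese reviews with K = 5; four invented English reviews and K = 2 are used here to keep the toy runnable.

```python
# TF-IDF vectors clustered with K-means (toy stand-in for the review corpus).
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

reviews = [
    "easy transfer between shinkansen platforms",
    "clear signs made the transfer easy",
    "great ramen restaurants on the food floor",
    "crowded food court but tasty ramen",
]

X = TfidfVectorizer().fit_transform(reviews)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)
```

Cross-tabulating the resulting cluster labels against functional zones is what the study's chi-square test then evaluates.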

17 pages, 2226 KB  
Article
Multi-Aspect Sentiment Analysis of Arabic Café Reviews Using Machine and Deep Learning Approaches
by Hmood Al-Dossari and Munerah Altalasi
Mathematics 2025, 13(24), 3895; https://doi.org/10.3390/math13243895 - 5 Dec 2025
Viewed by 333
Abstract
Online reviews on platforms such as Google Maps strongly influence consumer decisions. However, aggregated ratings mask nuanced opinions about specific aspects such as food, drinks, service, lounge, and price. This study presents a multi-aspect sentiment analysis framework for Arabic café reviews. Specifically, we combine machine learning (Linear SVC, Naïve Bayes, Logistic Regression, Decision Tree, Random Forest) and a Convolutional Neural Network (CNN) to perform aspect identification and sentiment classification. A rigorous preprocessing and feature-engineering pipeline with TF-IDF and n-gram features was implemented and statistically validated through bootstrap confidence intervals and Friedman–Nemenyi significance tests. Experimental results demonstrate that Linear SVC with optimized TF-IDF tri-grams achieved a macro-F1 of 0.89 for aspect identification and 0.71 for sentiment classification. Meanwhile, the CNN model yielded a comparable F1 of 0.89 for aspect identification and a higher 0.76 for sentiment classification. The findings highlight that effective feature representation and model selection can substantially improve Arabic opinion mining. The proposed framework provides a reliable foundation for analyzing Arabic user feedback on location-based platforms and supports more interpretable and data-driven business insights. These insights are essential to enhance personalized recommendations and business intelligence in the hospitality sector. Full article
(This article belongs to the Special Issue Data Mining and Machine Learning with Applications, 2nd Edition)

16 pages, 1214 KB  
Article
From Prediction to Prevention: Identifying Actionable Crash Factors Through ML and Narrative-Based Sensitivity Testing
by Mohammad Zana Majidi, Teng Wang and Reginald Souleyrette
Future Transp. 2025, 5(4), 190; https://doi.org/10.3390/futuretransp5040190 - 4 Dec 2025
Cited by 1 | Viewed by 336
Abstract
Crashes on roadways continue to represent a major global public health concern due to high rates of death and injury, underscoring the need for predictive tools that can identify high-risk conditions and guide prevention strategies. This study develops a framework that combines structured crash records and road information with unstructured police narratives to predict injury severity using machine learning and natural language processing (NLP). The dataset is used to train, validate, and test nine models, combining three algorithms (Random Forest, AdaBoost, and XGBoost) with two NLP methods (TF-IDF and Word2Vec). Model performance is evaluated using macro-average F1-scores to address severe class imbalance. Results show that XGBoost with TF-IDF achieves the best performance (macro-F1 = 0.644), demonstrating measurable improvements from incorporating narrative features compared to structured data alone. Beyond prediction, a simulation-based sensitivity analysis is conducted on the top 100 features, identifying 11 variables with the greatest impact on severity outcomes in Kentucky. Seatbelt non-use, occupant entrapment, and impaired driver control emerge as the most influential factors, with simulated improvements leading to notable reductions in fatalities and major injuries. The study introduces a “prediction-to-prevention” framework that links injury severity prediction with simulation-based sensitivity analysis. By integrating structured and narrative crash data, the framework identifies how changes in key behavioral and roadway factors can shift injury outcomes toward less severe levels. These findings highlight the dual contribution of this study: improving predictive accuracy through narrative integration and offering actionable insights to support evidence-based traffic safety interventions. Full article
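The choice of macro-averaged F1 in this paper is worth making concrete: under severe class imbalance, accuracy can look strong while a rare-but-critical class (here, fatal crashes) is missed entirely. The label counts below are invented for illustration.

```python
# Why macro-F1 is used under class imbalance: a degenerate classifier that
# ignores the rare class still scores 90% accuracy but a poor macro-F1.
from sklearn.metrics import accuracy_score, f1_score

y_true = ["minor"] * 90 + ["fatal"] * 10   # imbalanced severity labels
y_pred = ["minor"] * 100                   # classifier never predicts "fatal"

acc = accuracy_score(y_true, y_pred)
macro = f1_score(y_true, y_pred, average="macro", zero_division=0)
print(acc, macro)                          # 0.9 accuracy vs ~0.47 macro-F1
```

Macro-averaging gives every class equal weight, so the missed "fatal" class halves the score, which is the behavior the paper relies on when reporting macro-F1 = 0.644.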

27 pages, 1028 KB  
Article
MCD-Temporal: Constructing a New Time-Entropy Enhanced Dynamic Weighted Heterogeneous Ensemble for Cognitive Level Classification
by Yuhan Wu, Long Zhang, Bin Li and Wendong Zhang
Informatics 2025, 12(4), 134; https://doi.org/10.3390/informatics12040134 - 2 Dec 2025
Viewed by 531
Abstract
Accurate classification of cognitive levels in instructional dialogues is essential for personalized education and intelligent teaching systems. However, most existing methods predominantly rely on static textual features and a shallow semantic analysis. They often overlook dynamic temporal interactions and struggle with class imbalance. To address these limitations, this study proposes a novel framework for cognitive-level classification. This framework integrates time entropy-enhanced dynamics with a dynamically weighted, heterogeneous ensemble strategy. Specifically, we reconstruct the original Multi-turn Classroom Dialogue (MCD) dataset by introducing time entropy to quantify teacher–student speaking balance and semantic richness features based on Term Frequency-Inverse Document Frequency (TF-IDF), resulting in an enhanced MCD-temporal dataset. We then design a Dynamic Weighted Heterogeneous Ensemble (DWHE), which adjusts weights based on the class distribution. Our framework achieves a state-of-the-art macro-F1 score of 0.6236. This study validates the effectiveness of incorporating temporal dynamics and adaptive ensemble learning for robust cognitive level assessment, offering a more powerful tool for educational AI applications. Full article
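One plausible reading of the "time entropy" feature above is the Shannon entropy of the speakers' speaking-time shares, which is maximal when teacher and student talk equally; the abstract does not give the exact formula, so this is an interpretation, not the paper's definition.

```python
# Shannon entropy of speaking-time shares as a talk-balance feature
# (one possible reading of the paper's "time entropy"; formula assumed).
import numpy as np

def time_entropy(durations):
    p = np.asarray(durations, dtype=float)
    p = p / p.sum()                    # speaking-time shares
    p = p[p > 0]                       # ignore zero-duration speakers
    return float(-(p * np.log2(p)).sum())

print(time_entropy([30, 30]))          # balanced dialogue -> 1.0 bit
print(time_entropy([58, 2]))           # teacher-dominated -> near 0
```

A feature like this captures interaction dynamics that TF-IDF term statistics alone cannot, which is the motivation the abstract gives for the MCD-temporal dataset.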

32 pages, 5411 KB  
Article
A Text-Based Project Risk Classification System Using Multi-Model AI: Comparing SVM, Logistic Regression, Random Forests, Naive Bayes, and XGBoost
by Koudoua Ferhati, Adriana Burlea-Schiopoiu and Andrei-Gabriel Nascu
Systems 2025, 13(12), 1078; https://doi.org/10.3390/systems13121078 - 1 Dec 2025
Viewed by 929
Abstract
This study presents the design and evaluation of a multi-model artificial intelligence (AI) framework for proactive quality risk management in projects. A dataset comprising 2000 risk records was developed, containing four columns: Risk Description (input), Risk Category, Trigger, and Impact (outputs). Each output variable was modeled using three independent classifiers, forming a multi-step decision-making pipeline where one input is processed by multiple specialized models. Two feature extraction techniques, Term Frequency–Inverse Document Frequency (TF-IDF) and GloVe100 Word Embeddings, were compared in combination with several machine learning algorithms, including Logistic Regression, Support Vector Machines (SVMs), Random Forest, Multinomial Naive Bayes, and XGBoost. Results showed that model performance varied with task complexity and the number of output classes. For Trigger prediction (28 classes), Logistic Regression and SVM achieved the best performance, with a macro-average F1-score of 0.75, while XGBoost with TF-IDF features produced the highest accuracy for Risk Category classification (five classes). In Impact prediction (15 classes), SVM with Word Embeddings demonstrated superior results. The implementation, conducted in Python (v3.9.12, Anaconda), utilized Scikit-learn, XGBoost, SHAP, and Gensim libraries. SHAP visualizations and confusion matrices enhanced model interpretability. The proposed framework contributes to scalable, text-based predictive quality risk management, supporting real-time project decision-making. Full article
(This article belongs to the Section Complex Systems and Cybernetics)
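The "one input, multiple specialized models" pipeline above amounts to fitting an independent classifier per output column on the same risk descriptions. The four records and label sets below are invented stand-ins for the paper's 2000-record dataset.

```python
# One text input feeding independent per-output classifiers (toy data).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

descriptions = ["supplier delivery delayed by customs",
                "welding defects found in inspection",
                "key engineer resigned mid-project",
                "late shipment of raw materials"]
outputs = {
    "category": ["schedule", "quality", "resource", "schedule"],
    "impact":   ["high", "medium", "high", "low"],
}

models = {}
for name, labels in outputs.items():
    models[name] = make_pipeline(TfidfVectorizer(),
                                 LogisticRegression(max_iter=1000))
    models[name].fit(descriptions, labels)       # one model per output column

new_risk = ["customs delay on imported parts"]
print({name: m.predict(new_risk)[0] for name, m in models.items()})
```

Keeping the outputs in separate models, rather than one joint label, lets each task use the algorithm that suits its class count, which matches the paper's finding that the best model differs per output.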

30 pages, 1826 KB  
Article
Unveiling the Scientific Knowledge Evolution: Carbon Capture (2007–2025)
by Kuei-Kuei Lai, Yu-Jin Hsu and Chih-Wen Hsiao
Appl. Syst. Innov. 2025, 8(6), 187; https://doi.org/10.3390/asi8060187 - 30 Nov 2025
Viewed by 564
Abstract
This study explores how research on carbon capture technologies (CCTs) has developed over time and shows how semantic text mining can improve the analysis of technology trajectories. Although CCTs are widely viewed as essential for net-zero transitions, the literature is still scattered across many subthemes, and links between engineering advances, infrastructure deployment, and policy design are often weak. Methods that rely mainly on citations or keyword frequencies tend to overlook contextual meaning and the subtle diffusion of ideas across these strands, making it difficult to reconstruct clear developmental pathways. To address this problem, we ask the following: How do CCT topics change over time? What evolutionary mechanisms drive these transitions? And which themes act as bridges between technical lineages? We first build a curated corpus using a PRISMA-based screening process. We then apply BERTopic, integrating Sentence-BERT embeddings with UMAP, HDBSCAN, and class-based TF-IDF, to identify and label coherent semantic topics. Topic evolution is modeled through a PCC-weighted, top-K filtered network, where cross-year connections are categorized as inheritance, convergence, differentiation, or extinction. These patterns are further interpreted with a Fish-Scale Multiscience mapping to clarify underlying theoretical and disciplinary lineages. Our results point to a two-stage trajectory: an early formation phase followed by a period of rapid expansion. Long-standing research lines persist in amine absorption, membrane separation, and metal–organic frameworks (MOFs), while direct air capture emerges later and becomes increasingly stable. Across the full period, five evolutionary mechanisms operate in parallel. We also find that techno-economic assessment, life-cycle and carbon accounting, and regulation–infrastructure coordination serve as key “weak-tie” bridges that connect otherwise separated subfields. Overall, the study reconstructs the core–periphery structure and maturity of CCT research and demonstrates that combining semantic topic modeling with theory-aware mapping complements strong-tie bibliometric approaches and offers a clearer, more transferable framework for understanding technology evolution. Full article

17 pages, 565 KB  
Article
From Headlines to Thumbnails: Comparative Analysis of Web Publications in Bulgarian Digital Media and YouTube
by Plamen Hristov Milev and Yavor Nikolov Tabov
Journal. Media 2025, 6(4), 202; https://doi.org/10.3390/journalmedia6040202 - 28 Nov 2025
Viewed by 847
Abstract
The objective of this study is to determine if the thematic priorities of news organizations are consistent or platform-specific by investigating the cross-platform strategies of three leading Bulgarian news agencies. Methodologically, the study combines a quantitative TF-IDF text analysis of 315,103 headlines from their websites and 6961 titles from their official YouTube channels with a qualitative analysis of YouTube thumbnails to assess their strategic visual contribution. The findings reveal a significant strategic divergence: YouTube channels are primarily dedicated to high-impact domestic political news centered on key public figures, while their official websites feature a much broader thematic scope, covering international conflicts or extensive cultural events. The thumbnail analysis further shows they function as a critical visual layer, adding emotional context and explicit cues that are not present in text headlines. This research concludes that news agencies do not simply mirror content but strategically adapt it to leverage the unique characteristics and audience expectations of each platform, employing distinct models for their YouTube and web presences. Full article