Search Results (79)

Search Parameters:
Keywords = terminology extraction

15 pages, 1600 KiB  
Article
XLNet-CRF: Efficient Named Entity Recognition for Cyber Threat Intelligence with Permutation Language Modeling
by Tianhao Wang, Yang Liu, Chao Liang, Bailing Wang and Hongri Liu
Electronics 2025, 14(15), 3034; https://doi.org/10.3390/electronics14153034 - 30 Jul 2025
Abstract
As cyberattacks continue to rise in frequency and sophistication, extracting actionable Cyber Threat Intelligence (CTI) from diverse online sources has become critical for proactive threat detection and defense. However, accurately identifying complex entities from lengthy and heterogeneous threat reports remains challenging due to long-range dependencies and domain-specific terminology. To address this, we propose XLNet-CRF, a hybrid framework that combines permutation-based language modeling with structured prediction using Conditional Random Fields (CRF) to enhance Named Entity Recognition (NER) in cybersecurity contexts. XLNet-CRF directly addresses key challenges in CTI-NER by modeling bidirectional dependencies and capturing non-contiguous semantic patterns more effectively than traditional approaches. Comprehensive evaluations on two benchmark cybersecurity corpora validate the efficacy of our approach. On the CTI-Reports dataset, XLNet-CRF achieves a precision of 97.41% and an F1-score of 97.43%; on MalwareTextDB, it attains a precision of 85.33% and an F1-score of 88.65%—significantly surpassing strong BERT-based baselines in both accuracy and robustness. Full article
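The CRF layer's role in a tagger like this — choosing the globally best tag sequence rather than independent per-token argmaxes — can be illustrated with a minimal, dependency-free Viterbi decoder (a generic sketch, not the authors' implementation; the tag set and scores below are invented for illustration):

```python
def viterbi_decode(emissions, transitions, tags):
    """Find the highest-scoring tag sequence.

    emissions: list of {tag: score} dicts, one per token.
    transitions: {(prev_tag, tag): score}.
    """
    # Initialize with the first token's emission scores.
    scores = {t: emissions[0][t] for t in tags}
    backptrs = []
    for emit in emissions[1:]:
        new_scores, ptrs = {}, {}
        for t in tags:
            # Best previous tag for each current tag.
            prev, s = max(
                ((p, scores[p] + transitions[(p, t)]) for p in tags),
                key=lambda x: x[1],
            )
            new_scores[t] = s + emit[t]
            ptrs[t] = prev
        scores = new_scores
        backptrs.append(ptrs)
    # Walk back from the best final tag.
    best = max(scores, key=scores.get)
    path = [best]
    for ptrs in reversed(backptrs):
        path.append(ptrs[path[-1]])
    return list(reversed(path))
```

With real emission scores from a transformer encoder, the same dynamic program decodes entity tags for an entire report in O(n·|tags|²) time.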

16 pages, 1242 KiB  
Review
Micro-Ultrasound in the Detection of Clinically Significant Prostate Cancer: A Comprehensive Review and Comparison with Multiparametric MRI
by Julien DuBois, Shayan Smani, Aleksandra Golos, Carlos Rivera Lopez and Soum D. Lokeshwar
Tomography 2025, 11(7), 80; https://doi.org/10.3390/tomography11070080 - 8 Jul 2025
Abstract
Background/Objectives: Multiparametric MRI (mpMRI) is widely established as the standard imaging modality for detecting clinically significant prostate cancer (csPCa), yet it can be limited by cost, accessibility, and the need for specialized radiologist interpretation. Micro-ultrasound (micro-US) has recently emerged as a more accessible alternative imaging modality. This review evaluates whether the evidence base for micro-US meets thresholds comparable to those that led to MRI’s guideline adoption, synthesizes diagnostic performance data compared to mpMRI, and outlines future research priorities to define its clinical role. Methods: A targeted literature review of PubMed, Embase, and the Cochrane Library was conducted for studies published between 2014 and May 2025 evaluating micro-US in csPCa detection. Search terms included “micro-ultrasound,” “ExactVu,” “PRI-MUS,” and related terminology. Study relevance was assessed independently by the authors. Extracted data included csPCa detection rates, modality concordance, and diagnostic accuracy, and were synthesized and, rarely, restructured to facilitate study comparisons. Results: Micro-US consistently demonstrated non-inferiority to mpMRI for csPCa detection across retrospective studies, prospective cohorts, and meta-analyses. Several studies reported discordant csPCa lesions detected by only one modality, highlighting potential complementarity. The recently published OPTIMUM randomized controlled trial offers the strongest individual-trial evidence to date in support of micro-US non-inferiority. Conclusions: Micro-US shows potential as an alternative or adjunct to mpMRI for csPCa detection. However, additional robust multicenter studies are needed to achieve the evidentiary strength that led mpMRI to distinguish itself in clinical guidelines. Full article
(This article belongs to the Special Issue New Trends in Diagnostic and Interventional Radiology)

18 pages, 650 KiB  
Systematic Review
Home-Based Community Elderly Care Quality Indicators in China: A Systematic Literature Review
by Xi Chen, Rahimah Ibrahim, Yok Fee Lee, Tengku Aizan Hamid and Sen Tyng Chai
Healthcare 2025, 13(14), 1637; https://doi.org/10.3390/healthcare13141637 - 8 Jul 2025
Abstract
Background: China’s rapidly aging population has increased the need for effective community-based eldercare services. However, the lack of standardized, culturally relevant evaluation frameworks hinders consistent service quality assessment and improvement. Objective: This systematic review aims to identify, synthesize, and critically evaluate the existing quality indicators (QIs) currently utilized for home-based community elderly care (HCEC) in China. It also aims to highlight gaps to inform the development of a more comprehensive and context-appropriate quality framework. Methods: Following PRISMA guidelines, systematic searches were conducted across Web of Science, PubMed, Wiley, and CNKI databases for studies published in English and Chinese from 2008 onward. Extracted QIs from eligible studies were categorized using Donabedian’s structure–process–outcome (SPO) model. Results: Fifteen studies met the inclusion criteria, with QI sets ranging from 5 to 64 indicators. Most studies emphasized structural and procedural aspects, while outcome measures were limited. Key gaps include inconsistent terminology, insufficient medical care integration, narrow stakeholder engagement, and limited cultural adaptation of Western theoretical frameworks. Furthermore, subjective weighting methods predominated, impacting indicator reliability. Conclusions: Currently, there is no formal quality framework to guide service providers in HCEC; the existing quality indicators are fragmented and lack cultural specificity, medical integration, and methodological robustness. Future research should prioritize developing culturally anchored and medically comprehensive QI frameworks, standardize indicator terminology, actively involve diverse stakeholders through participatory methods, and adopt hybrid methodological approaches combining subjective expert insights and objective, data-driven techniques. Alignment with established international standards, such as the OECD long-term care quality indicators, is essential to enhance eldercare quality and support evidence-based policymaking. Full article
(This article belongs to the Special Issue Healthcare Practice in Community)

23 pages, 1290 KiB  
Article
A KeyBERT-Enhanced Pipeline for Electronic Information Curriculum Knowledge Graphs: Design, Evaluation, and Ontology Alignment
by Guanghe Zhuang and Xiang Lu
Information 2025, 16(7), 580; https://doi.org/10.3390/info16070580 - 6 Jul 2025
Abstract
This paper proposes a KeyBERT-based method for constructing a knowledge graph of the electronic information curriculum system, aiming to enhance the structured representation and relational analysis of educational content. Electronic Information Engineering curricula encompass diverse and rapidly evolving topics; however, existing knowledge graphs often overlook multi-word concepts and more nuanced semantic relationships. To address this gap, this paper presents a KeyBERT-enhanced method for constructing a knowledge graph of the electronic information curriculum system. Utilizing teaching plans, syllabi, and approximately 500,000 words of course materials from 17 courses, we first extracted 500 knowledge points via the Term Frequency–Inverse Document Frequency (TF-IDF) algorithm to build a baseline course–knowledge matrix and visualize the preliminary graph using Graph Convolutional Networks (GCN) and Neo4j. We then applied KeyBERT to extract about 1000 knowledge points—approximately 65% of extracted terms were multi-word phrases—and augment the graph with co-occurrence and semantic-similarity edges. Comparative experiments demonstrate a ~20% increase in non-zero matrix coverage and a ~40% boost in edge count (from 5100 to 7100), significantly enhancing graph connectivity. Moreover, we performed sensitivity analysis on extraction thresholds (co-occurrence ≥ 5, similarity ≥ 0.7), revealing that (5, 0.7) maximizes the F1-score at 0.83. Hyperparameter ablation over n-gram ranges [(1,1),(1,2),(1,3)] and top_n [5, 10, 15] identifies (1,3) + top_n = 10 as optimal (Precision = 0.86, Recall = 0.81, F1 = 0.83). Finally, GCN downstream tests show that, despite higher sparsity (KeyBERT 64% vs. TF-IDF 40%), KeyBERT features achieve Accuracy = 0.78 and F1 = 0.75, outperforming TF-IDF’s 0.66/0.69. 
This approach offers a novel, rigorously evaluated solution for optimizing the electronic information curriculum system and can be extended through terminology standardization or larger data integration. Full article
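The TF-IDF baseline used to build the initial course–knowledge matrix can be sketched with the standard library alone (a toy illustration of the scoring formula, not the paper's pipeline; the documents below are placeholders):

```python
import math
from collections import Counter

def tfidf(docs):
    """Score each term in each document by TF-IDF.

    docs: list of token lists. Returns one {term: score} dict per document.
    """
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(t for doc in docs for t in set(doc))
    out = []
    for doc in docs:
        tf = Counter(doc)
        out.append({
            t: (tf[t] / len(doc)) * math.log(n / df[t])
            for t in tf
        })
    return out
```

Terms frequent in one course but rare across the corpus float to the top, which is what makes TF-IDF a reasonable first pass before the KeyBERT stage adds multi-word phrases.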

16 pages, 275 KiB  
Article
Distinguishing Dyslexia, Attention Deficit, and Learning Disorders: Insights from AI and Eye Movements
by Alae Eddine El Hmimdi and Zoï Kapoula
Bioengineering 2025, 12(7), 737; https://doi.org/10.3390/bioengineering12070737 - 5 Jul 2025
Abstract
This study investigates whether eye movement abnormalities can differentiate between distinct clinical annotations of dyslexia, attention deficit, or school learning difficulties in children. Utilizing a selection of saccade and vergence eye movement data from a large clinical dataset recorded across 20 European centers using the REMOBI and AIDEAL technologies, this study focuses on individuals annotated with only one of the three annotations. The selected dataset includes 355 individuals for saccade tests and 454 for vergence tasks. Eye movement analysis was performed with AIDEAL software. Key parameters, such as amplitude, latency, duration, and velocity, are extracted and processed to remove outliers and standardize values. Machine learning models, including logistic regression, random forest, support vector machines, and neural networks, are trained using a GroupKFold strategy to ensure patient data are present in either the training or test set. Results from the machine learning models revealed that children annotated solely with dyslexia could be successfully identified based on their saccade and vergence eye movements, while identification of the other two categories was less distinct. Statistical evaluation using the Kruskal–Wallis test highlighted significant group mean differences in several saccade parameters, such as velocity and latency, particularly for dyslexics relative to the other two groups. These findings suggest that specific terminology, such as “dyslexia”, may capture unique eye movement patterns, underscoring the importance of eye movement analysis as a diagnostic tool for understanding the complexity of these conditions. This study emphasizes the potential of eye movement analysis in refining diagnostic precision and capturing the nuanced differences between dyslexia, attention deficits, and general learning difficulties. Full article
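The GroupKFold constraint described here — all recordings of one patient fall entirely in the training set or entirely in the test set — can be sketched without scikit-learn (the patient IDs are invented; scikit-learn's GroupKFold additionally balances fold sizes):

```python
def group_folds(group_ids, n_folds=3):
    """Assign each unique group to one fold; return per-fold test index lists.

    group_ids: one group label (e.g. patient ID) per sample.
    """
    groups = sorted(set(group_ids))
    # Round-robin assignment of whole groups to folds.
    fold_of = {g: i % n_folds for i, g in enumerate(groups)}
    folds = [[] for _ in range(n_folds)]
    for idx, g in enumerate(group_ids):
        folds[fold_of[g]].append(idx)
    return folds
```

Because assignment happens at the group level, no patient's recordings can leak between train and test splits.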

14 pages, 1324 KiB  
Article
Preprocessing of Physician Notes by LLMs Improves Clinical Concept Extraction Without Information Loss
by Daniel B. Hier, Michael A. Carrithers, Steven K. Platt, Anh Nguyen, Ioannis Giannopoulos and Tayo Obafemi-Ajayi
Information 2025, 16(6), 446; https://doi.org/10.3390/info16060446 - 27 May 2025
Abstract
Clinician notes are a rich source of patient information, but often contain inconsistencies due to varied writing styles, abbreviations, medical jargon, grammatical errors, and non-standard formatting. These inconsistencies hinder their direct use in patient care and degrade the performance of downstream computational applications that rely on these notes as input, such as quality improvement, population health analytics, precision medicine, clinical decision support, and research. We present a large-language-model (LLM) approach to the preprocessing of 1618 neurology notes. The LLM corrected spelling and grammatical errors, expanded acronyms, and standardized terminology and formatting, without altering clinical content. Expert review of randomly sampled notes confirmed that no significant information was lost. To evaluate downstream impact, we applied an ontology-based NLP pipeline (Doc2Hpo) to extract biomedical concepts from the notes before and after editing. F1 scores for Human Phenotype Ontology extraction improved from 0.40 to 0.61, confirming our hypothesis that better inputs yielded better outputs. We conclude that LLM-based preprocessing is an effective error correction strategy that improves data quality at the level of free text in clinical notes. This approach may enhance the performance of a broad class of downstream applications that derive their input from unstructured clinical documentation. Full article
(This article belongs to the Special Issue Biomedical Natural Language Processing and Text Mining)
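The before/after comparison reported above boils down to set-level precision, recall, and F1 over extracted concept IDs; a minimal sketch (the HP identifiers below are invented placeholders, not terms from the study):

```python
def prf1(predicted, gold):
    """Precision, recall, and F1 between two sets of concept IDs."""
    tp = len(predicted & gold)  # true positives: concepts found in both
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1
```

Running this on concepts extracted from the raw and the LLM-edited notes against a gold annotation is all that is needed to reproduce a before/after F1 comparison of this kind.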

26 pages, 3691 KiB  
Article
LLM-ACNC: Aerospace Requirement Texts Knowledge Graph Construction Utilizing Large Language Model
by Yuhao Liu, Junjie Hou, Yuxuan Chen, Jie Jin and Wenyue Wang
Aerospace 2025, 12(6), 463; https://doi.org/10.3390/aerospace12060463 - 23 May 2025
Abstract
Traditional methods for requirement identification depend on the manual transformation of unstructured requirement texts into formal documents, a process that is both inefficient and prone to errors. Although requirement knowledge graphs offer structured representations, current named entity recognition and relation extraction techniques continue to face significant challenges in processing the specialized terminology and intricate sentence structures characteristic of the aerospace domain. To overcome these limitations, this study introduces a novel approach for constructing aerospace-specific requirement knowledge graphs using a large language model. The method first employs the GPT model for data augmentation, followed by BERTScore filtering to ensure data quality and consistency. An efficient continual-learning strategy based on token index encoding is then implemented, guiding the model to focus on key information and enhancing domain adaptability through fine-tuning of the Qwen2.5 (7B) model. Furthermore, a chain-of-thought reasoning framework is established for improved entity and relation recognition, coupled with a dynamic few-shot learning strategy that selects examples adaptively based on input characteristics. Experimental results validate the effectiveness of the proposed method, achieving F1 scores of 88.75% in NER and 89.48% in relation extraction tasks. Full article
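The dynamic few-shot step — choosing the stored examples most similar to the incoming requirement text — can be sketched with bag-of-words cosine similarity (an assumption for illustration; the abstract does not specify the similarity measure, and the token lists below are invented):

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity of two token lists via bag-of-words counts."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0

def select_examples(query, pool, k=2):
    """Return the k pool items most similar to the query."""
    return sorted(pool, key=lambda ex: cosine(query, ex), reverse=True)[:k]
```

In a real pipeline the selected examples would be formatted into the prompt ahead of the input text; swapping the similarity function for an embedding-based one leaves the selection logic unchanged.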

22 pages, 2705 KiB  
Article
Exploring the Impact of Students’ Prompt Engineering on GPT’s Performance: A Blockchain-Focused Automatic Term Extraction Experiment
by Aliya Nugumanova, Almas Alzhanov, Aigerim Mansurova and Madina Mansurova
Electronics 2025, 14(11), 2098; https://doi.org/10.3390/electronics14112098 - 22 May 2025
Abstract
To address the need for comprehensive terminology construction in rapidly evolving domains such as blockchain, this study examines how large language models (LLMs), particularly GPT, enhance automatic term extraction through human feedback. The experimental part involves 60 bachelor’s students interacting with GPT in a three-step iterative prompting process: initial prompt formulation, intermediate refinement, and final adjustment. At each step, the students’ prompts are evaluated by a teacher using a structured rubric based on 6C criteria (clarity, complexity, coherence, creativity, consistency, and contextuality), with their summed scores forming an overall grade. The analysis indicates that (1) students’ overall grades correlate with GPT’s performance across all steps, reaching the highest correlation (0.87) at Step 3; (2) the importance of rubric criteria varies across steps, e.g., clarity and creativity are the most crucial initially, while complexity, coherence, and consistency influence subsequent refinements, with contextuality having no effect at any step; and (3) the linguistic accuracy of prompt formulations significantly outweighs domain-specific factual content in influencing GPT’s performance. These findings suggest GPT has a robust foundational understanding of blockchain terminology, making clear, consistent, and linguistically structured prompts more effective than contextual domain-specific explanations for automatic term extraction. Full article
(This article belongs to the Special Issue Techniques and Applications in Prompt Engineering and Generative AI)
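The grade–performance correlation reported above is a plain Pearson coefficient, computable with the standard library (the sample values below are invented, not the study's data):

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

Feeding in one list of rubric grades and one list of per-student extraction scores yields a value in [-1, 1]; a coefficient of 0.87, as at Step 3, indicates a strong positive association.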

16 pages, 1260 KiB  
Review
Clinical Evidence of Bee Venom Acupuncture for Ankle Pain: A Review of Clinical Research
by Soo-Hyun Sung, Hyein Jeong, Jong-Hyun Park, Minjung Park and Gihyun Lee
Toxins 2025, 17(5), 257; https://doi.org/10.3390/toxins17050257 - 21 May 2025
Abstract
The prevalence of ankle pain in adults is 9–15%, with up to 45% of sports-related injuries attributed to ankle pain and injuries. If ankle pain is not controlled in a timely manner, it can lead to ankle instability, resulting in further damage, recurrence of pain, and secondary injuries. The present study aimed to assess the therapeutic potential and safety profile of bee venom acupuncture (BVA) in the management of ankle pain. Ten electronic databases were searched for articles published up to March 2025. We included clinical studies that utilized BVA for the treatment of ankle pain and studies that included pain- and function-related assessment tools. The safety of BVA was assessed by extracting adverse events from the included studies and categorizing them according to the Common Terminology Criteria for Adverse Events (CTCAE). A total of 14 clinical studies were selected, of which 9 were case reports, 2 were case-controlled clinical trials (CCTs), and 3 were randomized controlled trials (RCTs). The conditions causing ankle pain were mostly traumatic (42.9%), followed by inflammatory (21.4%) and neuropathic disorders (14.3%). BVA was applied at concentrations ranging from 0.05 to 0.5 mg/mL, with a per-session volume ranging from 0.04 to 2.5 mL. In most studies, BVA was reported to improve both ankle pain and function simultaneously. Among the 14 studies, four participants reported adverse events following BVA treatment, all of which were classified as grade 1 or grade 2, indicating mild to moderate severity. This review suggests that BVA may be recommended for controlling ankle pain based on clinical evidence. However, the number of high-quality RCTs is limited, and half of the studies did not report side effects, indicating the need for further clinical research to verify its safety and efficacy. Full article
(This article belongs to the Special Issue Clinical Evidence for Therapeutic Effects and Safety of Animal Venoms)

28 pages, 1928 KiB  
Article
Deep Learning-Based Automatic Summarization of Chinese Maritime Judgment Documents
by Lin Zhang, Yanan Li and Hongyu Zhang
Appl. Sci. 2025, 15(10), 5434; https://doi.org/10.3390/app15105434 - 13 May 2025
Abstract
In the context of China’s accelerating maritime judicial digitization, automatic summarization of lengthy and terminology-rich judgment documents has become a critical need for improving legal efficiency. Focusing on the task of automatic summarization for Chinese maritime judgment documents, we propose HybridSumm, an “extraction–abstraction” hybrid summarization framework that integrates a maritime judgment lexicon to address the unique characteristics of maritime legal texts, including their extended length and dense domain-specific terminology. First, we construct a specialized maritime judgment lexicon to enhance the accuracy of legal term identification, specifically targeting the complexity of maritime terminology. Second, for long-text processing, we design an extractive summarization model that integrates the RoBERTa-wwm-ext pre-trained model with dilated convolutional networks and residual mechanisms. It can efficiently identify key sentences by capturing both local semantic features and global contextual relationships in lengthy judgments. Finally, the abstraction stage employs a Nezha-UniLM encoder–decoder architecture, augmented with a pointer–generator network (for out-of-vocabulary term handling) and a coverage mechanism (to reduce redundancy), ensuring that summaries are logically coherent and legally standardized. Experimental results show that HybridSumm’s lexicon-guided two-stage framework significantly enhances the standardization of legal terminology and semantic coherence in long-text summaries, validating its practical value in advancing judicial intelligence development. Full article
(This article belongs to the Special Issue Data Analysis and Data Mining for Knowledge Discovery)
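The extractive stage's skeleton — score every sentence, keep the top-k, restore document order — can be shown with a lexicon-overlap scorer standing in for the RoBERTa-based model (a deliberate simplification; the sentences and lexicon terms below are invented):

```python
def extract_summary(sentences, lexicon, k=2):
    """Keep the k sentences with the most lexicon hits, in original order.

    sentences: list of token lists; lexicon: set of domain terms.
    """
    # Rank sentence indices by how many lexicon terms each contains.
    scored = sorted(
        range(len(sentences)),
        key=lambda i: sum(t in lexicon for t in sentences[i]),
        reverse=True,
    )
    keep = sorted(scored[:k])  # restore document order
    return [sentences[i] for i in keep]
```

Restoring document order after ranking is what keeps the extracted key sentences readable as input to the abstractive Nezha-UniLM stage.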

17 pages, 606 KiB  
Review
The Effects of Multicomponent Training on Clinical, Functional, and Psychological Outcomes in Cardiovascular Disease: A Narrative Review
by Luca Poli, Alessandro Petrelli, Francesco Fischetti, Stefania Morsanuto, Livica Talaba, Stefania Cataldi and Gianpiero Greco
Medicina 2025, 61(5), 822; https://doi.org/10.3390/medicina61050822 - 29 Apr 2025
Cited by 1
Abstract
Cardiovascular diseases (CVDs) remain the leading cause of death globally. In recent years, interest in multicomponent interventions has grown as a response to the multifactorial complexity of CVDs. However, the literature still shows little systematic investigation into the effectiveness of multicomponent training (MCT) in the field of CVDs, accompanied by terminological confusion. This study aims to summarize and critically appraise the recent literature through a narrative review. A narrative review was conducted, synthesizing evidence from studies published between 2010 and January 2025. The databases searched included PubMed, Scopus, and Google Scholar, using predefined search terms related to CVDs and MCT, medical subject headings (MeSHs), and Boolean syntax. Two authors independently extracted relevant information from the included studies. MCT significantly improved hemodynamic parameters in CVD patients, with reductions in systolic, diastolic, and mean blood pressure, and in heart rate. Physical fitness measures showed consistent enhancements, whereas anthropometric improvements often corresponded with blood pressure reductions. Psychological outcomes varied across studies, with intervention duration emerging as a key factor in effectiveness. MCT interventions could lead to improvements in clinical outcomes, risk factor reduction, and patient adherence. Although findings on psychological parameters remain inconsistent, the overall evidence supports their integration into both clinical and community settings. Full article
(This article belongs to the Section Cardiology)

21 pages, 3806 KiB  
Article
Research on the Method of Air Traffic Control Instruction Keyword Extraction Based on the Roberta-Attention-BiLSTM-CRF Model
by Sheng Chen, Weijun Pan, Yidi Wang, Shenhao Chen and Xuan Wang
Aerospace 2025, 12(5), 376; https://doi.org/10.3390/aerospace12050376 - 27 Apr 2025
Abstract
In recent years, with the increasing complexity of air traffic management and the rapid development of automation technology, efficiently and accurately extracting key information from large volumes of air traffic control (ATC) instructions has become essential for ensuring flight safety and improving the efficiency of air traffic control. However, this task is challenging due to the specialized terminology involved and the high real-time requirements for data collection and processing. While existing keyword extraction methods have made some progress, most of them still perform unsatisfactorily on ATC instruction data due to issues such as data irregularities and the lack of domain-specific knowledge. To address these challenges, this paper proposes a Roberta-Attention-BiLSTM-CRF (RABC) model for keyword extraction from ATC instructions. The RABC model introduces an attention mechanism specifically designed to extract keywords from multi-segment ATC instruction texts. Moreover, the BiLSTM component enhances the model’s ability to capture detailed semantic information within individual sentences during the keyword extraction process. Finally, by integrating a Conditional Random Field (CRF), the model can predict and output multiple keywords in the correct sequence. Experimental results on an ATC instruction dataset demonstrate that the RABC model achieves an accuracy of 89.5% in keyword extraction and a sequence match accuracy of 91.3%, outperforming other models across multiple evaluation metrics. These results validate the effectiveness of the proposed model in extracting keywords from ATC instruction data and demonstrate its potential for advancing automation in air traffic control. Full article
(This article belongs to the Section Air Traffic and Transportation)
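Downstream of any CRF tagger of this kind, keyword phrases are recovered from BIO label sequences; a minimal decoder (the tag set and tokens below are illustrative, not the paper's schema):

```python
def bio_spans(tokens, labels):
    """Group tokens into (entity_type, phrase) spans from BIO labels."""
    spans, current, etype = [], [], None
    for tok, lab in zip(tokens, labels):
        if lab.startswith("B-"):
            if current:  # close any open span before starting a new one
                spans.append((etype, " ".join(current)))
            current, etype = [tok], lab[2:]
        elif lab.startswith("I-") and current and lab[2:] == etype:
            current.append(tok)
        else:  # "O" or an inconsistent I- tag ends the span
            if current:
                spans.append((etype, " ".join(current)))
            current, etype = [], None
    if current:
        spans.append((etype, " ".join(current)))
    return spans
```

The same decoder works regardless of which encoder produced the labels, which is why the CRF output layer composes cleanly with different backbones.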

16 pages, 2935 KiB  
Article
LLM-Enhanced Framework for Building Domain-Specific Lexicon for Urban Power Grid Design
by Yan Xu, Tao Wang, Yang Yuan, Ziyue Huang, Xi Chen, Bo Zhang, Xiaorong Zhang and Zehua Wang
Appl. Sci. 2025, 15(8), 4134; https://doi.org/10.3390/app15084134 - 9 Apr 2025
Cited by 1
Abstract
Traditional methods for urban power grid design have struggled to meet the demands of multi-energy integration and high-resilience scenarios due to issues such as delayed updates of terminology and semantic ambiguity. Current techniques for constructing domain-specific lexicons face challenges like the insufficient coverage of specialized vocabulary and imprecise synonym mining, which restrict the semantic parsing capabilities of intelligent design systems. To address these challenges, this study proposes a framework for constructing a domain-specific lexicon for urban power grid design based on Large Language Models (LLMs). The aim is to enhance the accuracy and practicality of the lexicon through multi-level term extraction and synonym expansion. Initially, a structured corpus covering national and industry standards in the field of power was constructed. An improved Term Frequency–Inverse Document Frequency (TF-IDF) algorithm, combined with mutual information and adjacency entropy filtering mechanisms, was utilized to extract high-quality seed vocabulary from 3426 candidate terms. Leveraging LLMs, multi-level prompt templates were designed to guide synonym mining, incorporating a self-correction mechanism for semantic verification to mitigate errors caused by model hallucinations. This approach built a domain-specific lexicon comprising 3426 core seed words and 10,745 synonyms. The average cosine similarity of synonym pairs reached 0.86, and expert validation confirmed an accuracy rate of 89.3%; text classification experiments showed that integrating the domain-specific dictionary improved the classifier’s F1-score by 9.2%, demonstrating the effectiveness of the method. By embedding domain-driven constraints and validation workflows, this research constructs a high-precision terminology dictionary for power grid design, addressing the insufficient coverage and imprecise synonym expansion of traditional methods and supporting the development of semantically intelligent systems for smart urban power grid design. Full article
(This article belongs to the Special Issue Advances in Smart Construction and Intelligent Buildings)
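The adjacency-entropy filter keeps candidate terms whose neighboring words are diverse — a sign that the candidate is a free-standing term rather than a fragment of a longer phrase. A standard-library sketch (the corpus tokens are placeholders; the paper's exact formulation may combine both sides with mutual information):

```python
import math
from collections import Counter

def adjacency_entropy(tokens, candidate, side="right"):
    """Entropy of the neighbor-word distribution next to a candidate term."""
    step = 1 if side == "right" else -1
    neighbors = Counter(
        tokens[i + step]
        for i, t in enumerate(tokens)
        if t == candidate and 0 <= i + step < len(tokens)
    )
    if not neighbors:
        return 0.0
    total = sum(neighbors.values())
    return -sum((c / total) * math.log(c / total) for c in neighbors.values())
```

Low entropy on either side suggests the candidate almost always attaches to the same neighbor and should be merged with it rather than kept as a seed word.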

24 pages, 1414 KiB  
Review
Microplastics in Water Resources: Threats and Challenges
by Wojciech Strojny, Renata Gruca-Rokosz and Maksymilian Cieśla
Appl. Sci. 2025, 15(8), 4118; https://doi.org/10.3390/app15084118 - 9 Apr 2025
Abstract
This study is a review of current knowledge on microplastics (MPs) in aquatic environments. In addition to identifying the sources of contamination by MPs in water and the hazards of their presence, an attempt is made to systematize the terminology of polymeric microparticles according to their size and to describe other parameters characteristic of MPs, i.e., shape and color. Special focus was placed on a review of the most important methods used to extract MPs from environmental matrices, as well as the latest and most effective analytical methods, highlighting their advantages and disadvantages. The value of the paper is in pointing out important developments in MPs analytics, identifying existing inaccuracies and limitations in the field and providing practical guidance. Thanks to its comprehensive approach, this article is a valuable resource for researchers concerned with the problem of environmental MPs pollution. Full article
(This article belongs to the Special Issue Pollution Control and Environmental Remediation)

21 pages, 3129 KiB  
Article
Optimizing Contextonym Analysis for Terminological Definition Writing
by Antonio San Martín
Information 2025, 16(4), 257; https://doi.org/10.3390/info16040257 - 22 Mar 2025
Abstract
To write terminological definitions that meet user needs, terminologists require methods that help them effectively select the most relevant information to be included in a definition. In this sense, a corpus technique that can be useful for the definition of terms is contextonym analysis. It involves the quantitative analysis of the other terms with which the term to be defined usually co-occurs (i.e., its contextonyms), regardless of any syntactic or semantic relationship. This paper presents a study conducted to determine the optimal configuration for extracting contextonyms for the creation of terminological definitions. More specifically, this study aims to create a word sketch column in Sketch Engine that lists contextonyms, offering a user-friendly method for their extraction. This study has identified that the optimal context window for extracting contextonyms in the form of word sketches in English to inform definition writing is 50 tokens, and that these contextonyms should be ranked by frequency. Full article
(This article belongs to the Special Issue Information Extraction and Language Discourse Processing)
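At its core, contextonym extraction is windowed co-occurrence counting around the term to be defined; a sketch with a symmetric window (the 50-token default follows the paper's finding, but the tiny corpus below is invented, and Sketch Engine's word-sketch machinery adds lemmatization and ranking on top):

```python
from collections import Counter

def contextonyms(tokens, target, window=50):
    """Count terms co-occurring with `target` within +/- `window` tokens."""
    counts = Counter()
    for i, t in enumerate(tokens):
        if t != target:
            continue
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        counts.update(w for w in tokens[lo:hi] if w != target)
    return counts.most_common()
```

Ranking the output by raw frequency, as the study recommends, surfaces the terms a definition writer is most likely to need, whatever their syntactic relation to the target.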
