Search Results (55)

Search Parameters:
Keywords = lexicon construction

21 pages, 1344 KB  
Article
Research on Intelligent Extraction Method of Influencing Factors of Loess Landslide Geological Disasters Based on Soft-Lexicon and GloVe
by Lutong Huang, Yueqin Zhu, Yingfei Li, Tianxiao Yan, Yu Xiao, Dongqi Wei, Ziyao Xing and Jian Li
Appl. Sci. 2025, 15(16), 8879; https://doi.org/10.3390/app15168879 - 12 Aug 2025
Viewed by 177
Abstract
Loess landslide disasters are influenced by a multitude of factors, including slope conditions, triggering mechanisms, and spatial attributes. Extracting these factors from unstructured geological texts is challenging due to nested entities, semantic ambiguity, and rare domain-specific terms. This study proposes a joint extraction framework guided by a domain ontology that categorizes six types of loess landslide influencing factors, including spatial relationships. The ontology facilitates conceptual classification and semi-automatic nested entity annotation, enabling the construction of a high-quality corpus with eight tag types. The model integrates a Soft-Lexicon mechanism that enhances character-level GloVe embeddings with explicit lexical features, including domain terms, part-of-speech tags, and word boundary indicators derived from a domain-specific lexicon. The resulting hybrid character-level representations are then fed into a BiLSTM-CRF architecture to jointly extract entities, attributes, and multi-level spatial and causal relationships. Extracted results are structured using a content-knowledge model to build a spatially enriched knowledge graph, supporting semantic queries and intelligent reasoning. Experimental results demonstrate improved performance over baseline methods, showcasing the framework’s effectiveness in geohazard information extraction and disaster risk analysis. Full article
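As a rough illustration of the Soft-Lexicon mechanism described above, the sketch below tags each character with the lexicon words it begins, continues, ends, or forms alone (B/M/E/S); the lexicon entries and sentence are toy examples, not the authors' data:

```python
# Toy Soft-Lexicon matching: for each character, collect lexicon words in which
# it appears as the Begin, Middle, End, or Single character. In the full model,
# pooled embeddings of these word sets are concatenated to the character's
# GloVe embedding before the BiLSTM-CRF.

LEXICON = {"黄土", "滑坡", "黄土滑坡", "坡"}  # hypothetical domain lexicon

def soft_lexicon_features(sentence, lexicon, max_word_len=4):
    feats = [{"B": [], "M": [], "E": [], "S": []} for _ in sentence]
    for start in range(len(sentence)):
        for end in range(start + 1, min(start + max_word_len, len(sentence)) + 1):
            word = sentence[start:end]
            if word not in lexicon:
                continue
            if len(word) == 1:
                feats[start]["S"].append(word)
            else:
                feats[start]["B"].append(word)
                feats[end - 1]["E"].append(word)
                for mid in range(start + 1, end - 1):
                    feats[mid]["M"].append(word)
    return feats

feats = soft_lexicon_features("黄土滑坡", LEXICON)
# character 0 ("黄") begins both "黄土" and "黄土滑坡"
```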
(This article belongs to the Special Issue Applications of Big Data and Artificial Intelligence in Geoscience)

22 pages, 967 KB  
Article
Developing a Sentiment Lexicon-Based Quality Performance Evaluation Model on Construction Projects in Korea
by Kiseok Lee, Taegeun Song, Yoonseok Shin and Wi Sung Yoo
Buildings 2025, 15(16), 2817; https://doi.org/10.3390/buildings15162817 - 8 Aug 2025
Viewed by 265
Abstract
The increasing frequency of structural failures on construction sites emphasizes the critical role of rigorous supervision in ensuring the quality of both construction processes and materials. Current regulatory frameworks mandate the production of detailed supervision reports to provide comprehensive evaluations of construction quality, material compliance, and site records. This study proposes a novel approach to harnessing unstructured reports for automated quality assessment. Employing text mining techniques, a sentiment lexicon specifically tailored for quality performance evaluation was developed. A corpus-based manual classification was conducted on 291 relevant words and 432 sentences extracted from the supervision reports, assigning sentiment labels of negative, neutral, and positive. This sentiment lexicon was then utilized as fundamental information for the Quality Performance Evaluation Model (QPEM). To validate the efficacy of the QPEM, it was applied to supervision reports from 30 construction sites adhering to legal standards. Furthermore, a Pearson correlation analysis was performed with the actual outcomes based on the legal requirements, including quality test failure rate, material inspection failure rate, and inspection management performance. By leveraging the wealth of unstructured data continuously generated throughout a project’s lifecycle, this model can enhance the timeliness of inspection and management processes, ultimately contributing to improved construction performance. Full article
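The lexicon-based scoring at the core of the QPEM can be approximated as below; the word labels are hypothetical stand-ins for the paper's 291 manually classified words:

```python
# Hypothetical mini-lexicon: words labeled negative (-1), neutral (0), or
# positive (+1). A report's quality score is the mean sentiment of its
# lexicon hits, which can then be correlated with actual inspection outcomes.

QUALITY_LEXICON = {"crack": -1, "defect": -1, "delay": -1,
                   "approved": 1, "compliant": 1, "standard": 0}

def score_report(sentences, lexicon):
    hits = [lexicon[w] for s in sentences for w in s.lower().split() if w in lexicon]
    return sum(hits) / len(hits) if hits else 0.0

report = ["Concrete cores approved and compliant",
          "Minor crack observed near joint"]
print(score_report(report, QUALITY_LEXICON))  # mean of (+1, +1, -1)
```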
(This article belongs to the Section Construction Management, and Computers & Digitization)

27 pages, 3926 KB  
Article
A Multi-Source Embedding-Based Named Entity Recognition Model for Knowledge Graph and Its Application to On-Site Operation Violations in Power Grid Systems
by Lingwen Meng, Yulin Wang, Guobang Ban, Yuanjun Huang, Xinshan Zhu and Shumei Zhang
Electronics 2025, 14(13), 2511; https://doi.org/10.3390/electronics14132511 - 20 Jun 2025
Viewed by 407
Abstract
With the increasing complexity of power grid field operations, frequent operational violations have emerged as a major concern in the domain of power grid field operation safety. To support dispatchers in accurately identifying and addressing violation risks, this paper introduces a profiling approach for power grid field operation violations based on knowledge graph techniques. The method enables deep modeling and structured representation of violation behaviors. In the structured data processing phase, statistical analysis is conducted based on predefined rules, and mutual information is employed to quantify the contribution of various operational factors to violations. At the municipal bureau level, statistical modeling of violation characteristics is performed to support regional risk assessment. For unstructured textual data, a multi-source embedding-based named entity recognition (NER) model is developed, incorporating domain-specific power lexicon information to enhance the extraction of key entities. High-weight domain terms related to violations are further identified using the TF-IDF algorithm to characterize typical violation behaviors. Based on the extracted entities and relationships, a knowledge graph of field operation violations is constructed, providing a computable and inferable semantic representation of operational scenarios. Finally, visualization techniques are applied to present the structural patterns and distributional features of violations, offering graph-based support for violation risk analysis and dispatch decision-making. Experimental results demonstrate that the proposed method effectively identifies critical features of violation behaviors and provides a structured foundation for intelligent decision support in power grid operation management. Full article
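The TF-IDF weighting used to surface high-weight violation terms works roughly as follows; the documents below are toy examples, not the grid-operation corpus:

```python
# Minimal TF-IDF: terms frequent in one violation record but rare across the
# corpus get high weights, flagging them as characteristic of that record.
import math
from collections import Counter

def tfidf(docs):
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))  # document frequency
    scores = []
    for d in docs:
        tf = Counter(d)
        scores.append({t: (c / len(d)) * math.log(n / df[t]) for t, c in tf.items()})
    return scores

docs = [["no", "helmet", "worn"],
        ["helmet", "strap", "loose"],
        ["ladder", "unsecured"]]
weights = tfidf(docs)
# "helmet" appears in 2 of 3 docs, so it scores lower than doc-specific terms
```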
(This article belongs to the Special Issue Knowledge Information Extraction Research)

24 pages, 1461 KB  
Article
Syllable-, Bigram-, and Morphology-Driven Pseudoword Generation in Greek
by Kosmas Kosmidis, Vassiliki Apostolouda and Anthi Revithiadou
Appl. Sci. 2025, 15(12), 6582; https://doi.org/10.3390/app15126582 - 11 Jun 2025
Viewed by 523
Abstract
Pseudowords are essential in (psycho)linguistic research, offering a way to study language without meaning interference. Various methods for creating pseudowords exist, but each has its limitations. Traditional approaches modify existing words, risking unintended recognition. Modern algorithmic methods use high-frequency n-grams or syllable deconstruction but often require specialized expertise. Currently, no automatic process for pseudoword generation is designed explicitly for Greek, which is our primary focus. Therefore, we developed SyBig-r-Morph, a novel application that constructs pseudowords using syllables as the main building block, replicating Greek phonotactic patterns. SyBig-r-Morph draws input from word lists and databases that include syllabification, word length, part of speech, and frequency information. It categorizes syllables by position to ensure phonotactic consistency with user-selected morphosyntactic categories and can optionally assign stress to generated words. Additionally, the tool uses multiple lexicons to eliminate phonologically invalid combinations. Its modular architecture allows easy adaptation to other languages. To further evaluate its output, we conducted a manual assessment using a tool that verifies phonotactic well-formedness based on phonological parameters derived from a corpus. Most SyBig-r-Morph words passed the stricter phonotactic criteria, confirming the tool’s sound design and linguistic adequacy. Full article
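The position-binned syllable assembly that SyBig-r-Morph describes can be sketched as below; the syllable pools and rejection lexicon are invented for illustration:

```python
# Toy pseudoword generator: syllables are binned by word position, one is
# sampled per slot, and any result that collides with a real word is rejected.
import random

ONSETS = ["ka", "pa", "la"]    # word-initial syllables
MIDDLES = ["ri", "me", "to"]   # word-medial syllables
FINALS = ["nos", "ma", "si"]   # word-final syllables
REAL_WORDS = {"kamema"}        # lexicon used to discard real words

def make_pseudoword(rng):
    while True:
        w = rng.choice(ONSETS) + rng.choice(MIDDLES) + rng.choice(FINALS)
        if w not in REAL_WORDS:
            return w

rng = random.Random(0)  # seeded for reproducibility
word = make_pseudoword(rng)
```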
(This article belongs to the Special Issue Computational Linguistics: From Text to Speech Technologies)

28 pages, 1928 KB  
Article
Deep Learning-Based Automatic Summarization of Chinese Maritime Judgment Documents
by Lin Zhang, Yanan Li and Hongyu Zhang
Appl. Sci. 2025, 15(10), 5434; https://doi.org/10.3390/app15105434 - 13 May 2025
Viewed by 451
Abstract
In the context of China’s accelerating maritime judicial digitization, automatic summarization of lengthy and terminology-rich judgment documents has become a critical need for improving legal efficiency. Focusing on the task of automatic summarization for Chinese maritime judgment documents, we propose HybridSumm, an “extraction–abstraction” hybrid summarization framework that integrates a maritime judgment lexicon to address the unique characteristics of maritime legal texts, including their extended length and dense domain-specific terminology. First, we construct a specialized maritime judgment lexicon to enhance the accuracy of legal term identification, specifically targeting the complexity of maritime terminology. Second, for long-text processing, we design an extractive summarization model that integrates the RoBERTa-wwm-ext pre-trained model with dilated convolutional networks and residual mechanisms. It can efficiently identify key sentences by capturing both local semantic features and global contextual relationships in lengthy judgments. Finally, the abstraction stage employs a Nezha-UniLM encoder–decoder architecture, augmented with a pointer–generator network (for out-of-vocabulary term handling) and a coverage mechanism (to reduce redundancy), ensuring that summaries are logically coherent and legally standardized. Experimental results show that HybridSumm’s lexicon-guided two-stage framework significantly enhances the standardization of legal terminology and semantic coherence in long-text summaries, validating its practical value in advancing judicial intelligence development. Full article
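A minimal stand-in for the lexicon-guided extractive stage: score sentences by their density of domain terms and keep the top-k. The maritime lexicon below is hypothetical, and the real model scores sentences with RoBERTa-wwm-ext features rather than raw term counts:

```python
# Toy extractive step: rank sentences by the share of their words found in a
# domain lexicon and return the k highest-density sentences.

MARITIME_LEXICON = {"demurrage", "charterparty", "salvage", "collision"}

def extract_key_sentences(sentences, lexicon, k=2):
    def density(s):
        words = s.lower().split()
        return sum(w in lexicon for w in words) / len(words)
    return sorted(sentences, key=density, reverse=True)[:k]

sents = ["The charterparty fixed demurrage at a daily rate",
         "The weather was calm that morning",
         "Salvage costs were apportioned by the court"]
top = extract_key_sentences(sents, MARITIME_LEXICON)
```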
(This article belongs to the Special Issue Data Analysis and Data Mining for Knowledge Discovery)

16 pages, 2935 KB  
Article
LLM-Enhanced Framework for Building Domain-Specific Lexicon for Urban Power Grid Design
by Yan Xu, Tao Wang, Yang Yuan, Ziyue Huang, Xi Chen, Bo Zhang, Xiaorong Zhang and Zehua Wang
Appl. Sci. 2025, 15(8), 4134; https://doi.org/10.3390/app15084134 - 9 Apr 2025
Cited by 1 | Viewed by 846
Abstract
Traditional methods for urban power grid design have struggled to meet the demands of multi-energy integration and high-resilience scenarios due to issues such as delayed updates of terminology and semantic ambiguity. Current techniques for constructing domain-specific lexicons face challenges such as insufficient coverage of specialized vocabulary and imprecise synonym mining, which restrict the semantic parsing capabilities of intelligent design systems. To address these challenges, this study proposes a framework for constructing a domain-specific lexicon for urban power grid design based on Large Language Models (LLMs), aiming to enhance the accuracy and practicality of the lexicon through multi-level term extraction and synonym expansion. Initially, a structured corpus covering national and industry standards in the power field was constructed. An improved Term Frequency–Inverse Document Frequency (TF-IDF) algorithm, combined with mutual information and adjacency entropy filtering mechanisms, was used to extract high-quality seed vocabulary from 3426 candidate terms. Leveraging LLMs, multi-level prompt templates were designed to guide synonym mining, incorporating a self-correction mechanism for semantic verification to mitigate errors caused by model hallucinations. This approach produced a domain-specific lexicon comprising 3426 core seed words and 10,745 synonyms. The average cosine similarity of synonym pairs reached 0.86, and expert validation confirmed an accuracy rate of 89.3%; text classification experiments showed that integrating the domain-specific dictionary improved the classifier’s F1-score by 9.2%, demonstrating the effectiveness of the method. By embedding domain-driven constraints and validation workflows, this research constructs a high-precision terminology dictionary for the power design field, addressing the insufficient coverage and imprecise expansion of traditional methods and supporting the development of semantically intelligent systems for smart urban power grid design. Full article
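Cosine-similarity validation of synonym pairs, as in the reported 0.86 average, reduces to the computation below; the embedding values are illustrative, not the paper's vectors:

```python
# Cosine similarity between two embedding vectors: dot product divided by the
# product of their norms. A candidate synonym should score close to 1 against
# its seed term and low against unrelated terms.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

v_seed = [0.8, 0.6, 0.1]       # seed term embedding (toy values)
v_synonym = [0.7, 0.7, 0.2]    # LLM-mined synonym candidate
v_unrelated = [0.1, 0.2, 0.9]  # unrelated term

print(cosine(v_seed, v_synonym), cosine(v_seed, v_unrelated))
```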
(This article belongs to the Special Issue Advances in Smart Construction and Intelligent Buildings)

17 pages, 3161 KB  
Article
Unpacking Online Discourse on Bioplastics: Insights from Reddit Sentiment Analysis
by Bernardo Cruz, Aimilia Vaitsi, Samuel Domingos, Catarina Possidónio, Sílvia Luís, Eliana Portugal, Ana Loureiro, Sibu Padmanabhan and Ana Rita Farias
Polymers 2025, 17(6), 823; https://doi.org/10.3390/polym17060823 - 20 Mar 2025
Viewed by 1193
Abstract
Bioplastics have been presented as a sustainable alternative to products derived from fossil sources. In response, industries have developed innovative products using biopolymers across various sectors, such as food, packaging, biomedical, and construction. However, consumer acceptance remains crucial for their widespread adoption. This study explores public sentiment toward bioplastics, focusing on emotions expressed on Reddit. A dataset of 5041 Reddit comments was collected using keywords associated with bioplastics, with the extraction process supported by Python libraries such as pandas, NLTK, and NumPy. Sentiment analysis was conducted using NRCLex, a widely used emotion lexicon. The overall findings suggest that trust, anticipation, and joy were the dominant emotions in the 2014–2024 time frame, indicating that the public emotional response toward bioplastics has been mostly positive. Negative emotions such as fear, sadness, and anger were less prevalent, although an intense response was noted in 2018. Findings also indicate a temporal co-occurrence between significant events related to bioplastics and changes in sentiment among Reddit users. Although the representativeness of the sample is limited, the results of this study support the need for real-time monitoring of the public’s emotional responses, making it possible to design communication campaigns better aligned with public needs. Full article
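A stripped-down stand-in for what an emotion lexicon such as NRCLex does: tally the emotion categories of matched words. The word-emotion entries below are toy values, not the actual NRC lexicon:

```python
# Toy emotion tally: each matched word contributes a count to every emotion
# category it is mapped to, yielding per-text emotion scores.
from collections import Counter

EMOTION_LEXICON = {
    "hope": ["anticipation", "joy", "trust"],
    "toxic": ["disgust", "fear"],
    "great": ["joy", "trust"],
}

def emotion_scores(text, lexicon):
    counts = Counter()
    for word in text.lower().split():
        counts.update(lexicon.get(word, []))
    return counts

scores = emotion_scores("Great hope for bioplastics, nothing toxic here",
                        EMOTION_LEXICON)
# "joy" is counted twice (from "great" and "hope")
```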

26 pages, 6629 KB  
Article
Named Entity Recognition in Track Circuits Based on Multi-Granularity Fusion and Multi-Scale Retention Mechanism
by Yanrui Chen, Guangwu Chen and Peng Li
Electronics 2025, 14(5), 828; https://doi.org/10.3390/electronics14050828 - 20 Feb 2025
Viewed by 591
Abstract
To enhance the efficiency of reusing massive unstructured operation and maintenance (O&M) data generated during routine railway maintenance inspections, this paper proposes a Named Entity Recognition (NER) method that integrates multi-granularity semantics and a Multi-Scale Retention (MSR) mechanism. The proposed approach effectively transforms expert knowledge extracted from manually processed fault data into structured triplet information, enabling the in-depth mining of track circuit O&M text data. Given the specific characteristics of railway domain texts, which include a high prevalence of technical terms, ambiguous entity boundaries, and complex semantics, we first construct a domain-specific lexicon stored in a Trie tree structure. A lexicon adapter is then introduced to incorporate these terms as external knowledge into the base encoding process of RoBERTa-wwm-ext, forming the lexicon-enhanced LE-RoBERTa-wwm model. Subsequently, a hidden feature extractor captures semantic representations from all 12 output layers of LE-RoBERTa-wwm, performing weighted fusion to fully leverage multi-granularity semantic information across encoding layers. Furthermore, in the downstream processing stage, two computational paradigms are designed based on the MSR mechanism and the Regularized Dropout (R-Drop) mechanism, enabling low-cost inference and efficient parallel training. Comparative experiments conducted on the public Resume and Weibo datasets demonstrate that the model achieves F1 scores of 96.75% and 72.06%, respectively. Additional experiments on a track circuit dataset further validate the model’s superior recognition performance and generalization capability. Full article
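The Trie storage for the domain lexicon can be sketched as follows, with a longest-match lookup of the kind used when injecting terms during encoding; the railway terms are toy examples:

```python
# Minimal Trie: each node is a dict of child characters, with "#" marking the
# end of a stored term. longest_match returns the longest lexicon term that
# starts at a given position in the text.

class Trie:
    def __init__(self):
        self.root = {}

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.setdefault(ch, {})
        node["#"] = True  # end-of-word marker

    def longest_match(self, text, start=0):
        node, best = self.root, None
        for i in range(start, len(text)):
            ch = text[i]
            if ch not in node:
                break
            node = node[ch]
            if "#" in node:
                best = text[start:i + 1]
        return best

trie = Trie()
for term in ["track", "track circuit", "relay"]:
    trie.insert(term)
print(trie.longest_match("track circuit fault"))  # prefers the longer term
```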
(This article belongs to the Section Artificial Intelligence)

18 pages, 3355 KB  
Article
Semi-Supervised Chinese Word Segmentation in Geological Domain Using Pseudo-Lexicon and Self-Training Strategy
by Bo Wan, Zhuo Tan, Deping Chu, Yan Dai, Fang Fang and Yan Wu
Appl. Sci. 2025, 15(3), 1404; https://doi.org/10.3390/app15031404 - 29 Jan 2025
Viewed by 868
Abstract
Chinese word segmentation (CWS), which involves splitting the sequence of Chinese characters into words, is a key task in natural language processing (NLP) for Chinese. However, the complexity and flexibility of geologic terms require that domain-specific knowledge be utilized in CWS for geoscience domains. Previous studies have identified several challenges that have an impact on CWS in the geoscience domain, including the absence of abundant labeled data and difficult-to-delineate complex geological word boundaries. To solve these problems, a novel semi-supervised deep learning framework, GeoCWS, is developed for CWS in the geoscience domain. The framework is designed with domain-enhanced features and an uncertainty-aware self-training strategy. First, n-grams are automatically constructed from the input text as a pseudo-lexicon. Then, a backbone model is suggested that learns domain-enhanced features by introducing a pseudo-lexicon-based memory mechanism to delineate complex geological word boundaries based on BERT. Next, the backbone model is fine-tuned with a small amount of labeled data to obtain the teacher model. Finally, we design a self-training strategy with joint confidence and uncertainty awareness to improve the generalization ability of the backbone model to unlabeled data. Our method outperformed the state-of-the-art baseline methods in extensive experiments, and ablation experiments verified the effectiveness of the proposed backbone model and self-training strategy. Full article
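The pseudo-lexicon step, collecting frequent character n-grams from the raw input text, can be sketched as below; the snippet and frequency threshold are toy choices:

```python
# Toy pseudo-lexicon builder: every character n-gram that recurs at least
# min_count times in the input text is kept as a candidate word.
from collections import Counter

def build_pseudo_lexicon(text, n_range=(2, 4), min_count=2):
    counts = Counter()
    for n in range(n_range[0], n_range[1] + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return {g for g, c in counts.items() if c >= min_count}

text = "花岗岩体与花岗岩脉"
lexicon = build_pseudo_lexicon(text)
# "花岗岩" occurs twice, so it survives the frequency cutoff
```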

18 pages, 1102 KB  
Article
Mezcal Characterization Through Sensory and Volatile Analyses
by Oxana Lazo, Ana Lidia García-Ortíz, Joaliné Pardo and Luis Guerrero
Foods 2025, 14(3), 402; https://doi.org/10.3390/foods14030402 - 26 Jan 2025
Viewed by 1008
Abstract
Mezcal is a traditional beverage with relevant cultural and economic importance in Mexico, with different Protected Designation of Origin locations. This study focuses on creating a sensory lexicon for Mezcal with local producers by means of Free Choice Profiling. A selection of the most relevant descriptors was made to construct a sensory wheel. Subsequently, a sensory panel evaluated a total of 10 Mezcal samples using the sensory categories defined in the sensory wheel. Additionally, gas chromatography with mass spectrometry was performed to analyze volatile components’ contribution to the aroma and flavor descriptors. A total of 87 terms were selected for the sensory wheel, using 41 descriptors within 10 categories for odor modality and 46 more within 13 categories for flavor modality. The main volatile compounds that were identified were 37 esters, 17 alcohols, 12 ketals and 9 terpenes, which were the foremost contributors to the presence of several sensory descriptors and were also found in most of the Mezcal samples. The quantitative analysis results exhibited a higher floral odor for Mezcal of the Angustifolia variety and the highest smoked odor for an earthenware distilled Mezcal, thus proving that the selection of the descriptors from the wheel was appropriate for differentiating Mezcal samples from different origins, agave species and distillation processes. Therefore, the sensory wheel developed in this study can be used both as a quality control tool and as a marketing tool that allows producers to differentiate their products in the market. Full article
(This article belongs to the Section Sensory and Consumer Sciences)

17 pages, 1985 KB  
Article
A Spine-Specific Lexicon for the Sentiment Analysis of Interviews with Adult Spinal Deformity Patients Correlates with SF-36, SRS-22, and ODI Scores: A Pilot Study of 25 Patients
by Ross Gore, Michael M. Safaee, Christopher J. Lynch and Christopher P. Ames
Information 2025, 16(2), 90; https://doi.org/10.3390/info16020090 - 24 Jan 2025
Cited by 1 | Viewed by 1007
Abstract
Classic health-related quality of life (HRQOL) metrics are cumbersome, time-intensive, and subject to biases based on the patient’s native language, educational level, and cultural values. Natural language processing (NLP) converts text into quantitative metrics. Sentiment analysis enables subject matter experts to construct domain-specific lexicons that assign a value of either negative (−1) or positive (1) to certain words. The growth of telehealth provides opportunities to apply sentiment analysis to transcripts of adult spinal deformity patients’ visits to derive a novel and less biased HRQOL metric. In this study, we demonstrate the feasibility of constructing a spine-specific lexicon for sentiment analysis to derive an HRQOL metric for adult spinal deformity patients from their preoperative telehealth visit transcripts. We asked each of twenty-five (25) adult patients seven open-ended questions about their spinal conditions, treatment, and quality of life during telehealth visits. We analyzed the Pearson correlation between our sentiment analysis HRQOL metric and established HRQOL metrics (the Scoliosis Research Society-22 questionnaire [SRS-22], 36-Item Short Form Health Survey [SF-36], and Oswestry Disability Index [ODI]). The results show statistically significant correlations (0.43–0.74) between our sentiment analysis metric and the conventional metrics. This provides evidence that applying NLP techniques to patient transcripts can yield an effective HRQOL metric. Full article
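The correlation analysis reduces to a plain Pearson computation; the numbers below are illustrative, not the study's patient data:

```python
# Pearson correlation between a sentiment-derived metric and a conventional
# HRQOL score: covariance of the two series divided by the product of their
# standard deviations.
import math

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

sentiment = [-0.4, -0.1, 0.2, 0.5]  # lexicon-based transcript scores (toy)
srs22 = [2.1, 2.8, 3.3, 4.0]        # matching SRS-22 totals (toy)
print(round(pearson(sentiment, srs22), 3))
```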
(This article belongs to the Special Issue Biomedical Natural Language Processing and Text Mining)

19 pages, 5136 KB  
Article
Classification of Multi-Value Chain Business Data Resources Based on Semantic Association
by Bo Fan, Linfu Sun, Dong Tan and Min Han
Electronics 2024, 13(24), 5035; https://doi.org/10.3390/electronics13245035 - 21 Dec 2024
Viewed by 709
Abstract
Focusing on the classification of business data resources within a unified business semantic environment is an important way to simplify the data environment and a crucial approach to studying data intelligence. A multi-value chain data space is a typical business-semantic heterogeneous, complex data environment. This paper summarizes the characteristics of multi-value chain business data resources and proposes classifying them using business semantic logic. By constructing a semantic-based relational model for multi-value chain business data resources and a multi-value chain business lexicon, this paper unifies the semantics of business data resources, creating the conditions for their classification according to business logic. Based on the feature transformation of business data resources, this paper proposes a clustering algorithm for multi-value chain business data resources (Business Data Resource Classification Algorithm for Multi-Value Chain Data Space, BDRCA4MVCDS) tailored to the data space, completing the classification of business data resources. Finally, comparative experiments with KMeans and KABSA demonstrate the clustering effectiveness of the proposed algorithm as well as its good stability and adaptability. Full article
(This article belongs to the Special Issue Secure Data Privacy and Encryption in Digital Networks)

25 pages, 1676 KB  
Article
Research of Chinese Entity Recognition Model Based on Multi-Feature Semantic Enhancement
by Ling Yuan, Chenglong Zeng and Peng Pan
Electronics 2024, 13(24), 4895; https://doi.org/10.3390/electronics13244895 - 12 Dec 2024
Cited by 1 | Viewed by 922
Abstract
Chinese Entity Recognition (CER) aims to extract key information entities from Chinese text data, supporting subsequent natural language processing tasks such as relation extraction, knowledge graph construction, and intelligent question answering. However, CER faces several challenges, including limited training corpora, unclear entity boundaries, and complex entity structures, which result in low accuracy and leave room for improvement. To address issues such as high annotation costs and ambiguous entity boundaries, this paper proposes the SEMFF-CER model, a CER model based on semantic enhancement and multi-feature fusion. The model employs character feature extraction algorithms, SofeLexicon semantic enhancement for vocabulary feature extraction, and deep semantic feature extraction from pre-trained models. These features are integrated into the entity recognition process via gating mechanisms, effectively leveraging diverse features to enhance contextual semantics and improve recognition accuracy. Additionally, the model incorporates several optimization strategies: an adaptive loss function to balance negative samples and improve the F1 score, data augmentation to enhance model robustness, and dropout and Adamax optimization algorithms to refine training. The SEMFF-CER model is characterized by a low dependence on training corpora, fast computation speed, and strong scalability. Experiments conducted on four Chinese benchmark entity recognition datasets validate the proposed model, demonstrating superior performance over existing models with the highest F1 score. Full article
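A gating mechanism for fusing two feature sources can be sketched as below; real models learn the gate parameters jointly with the network, and the vectors here are toy values:

```python
# Toy gated fusion: a sigmoid gate decides, per dimension, how much the
# character-level and lexicon-level features each contribute to the fused
# representation.
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_fusion(char_feat, lex_feat, w=1.0, b=0.0):
    fused = []
    for c, l in zip(char_feat, lex_feat):
        g = sigmoid(w * (c - l) + b)  # toy gate; w and b are learned in practice
        fused.append(g * c + (1 - g) * l)
    return fused

fused = gated_fusion([0.9, 0.1], [0.2, 0.8])
# each fused value is a convex combination of the two inputs
```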
(This article belongs to the Section Artificial Intelligence)

20 pages, 7741 KB  
Article
Upscaling Natural Materials in Construction: Earthen, Fast-Growing, and Living Materials
by Olga Beatrice Carcassi, Roberta Salierno, Pietro Augusto Falcinelli, Ingrid Maria Paoletti and Lola Ben-Alon
Sustainability 2024, 16(18), 7926; https://doi.org/10.3390/su16187926 - 11 Sep 2024
Cited by 6 | Viewed by 2663
Abstract
Despite the numerous advantages of using natural materials, such as fast-growing, living, and earthen materials, their widespread application in the construction industry remains limited. This research presents a perception survey, which investigates stakeholders’ perceptions regarding the market, regulatory barriers, and educational barriers, exploring experiences, motivations, and attitudes toward the adoption of natural materials in construction projects. The results capture variations in current practices and identify patterns for future directions, analyzed in a comparative manner to assess two geographical regions: Europe and North America. The results show that contractor availability, a lack of professional knowledge (mostly in Europe), and cost-to-value perceptions (mostly in the USA) are key barriers to adopting natural materials. The lack of awareness among construction professionals regarding technical aspects highlights the need for targeted training, while the lack of regulatory distinction between living and earth-based materials underscores the need for harmonized policies. By elucidating stakeholders’ perspectives and identifying key challenges, this research aims to inform policymaking, industry practices, and research initiatives aimed at promoting the use of a wider lexicon of construction materials. Ultimately, this study hopes to facilitate the development of strategies to overcome scalability challenges and accelerate the transition toward their implementation in mainstream projects. Full article
16 pages, 1928 KB  
Article
A New Chinese Named Entity Recognition Method for Pig Disease Domain Based on Lexicon-Enhanced BERT and Contrastive Learning
by Cheng Peng, Xiajun Wang, Qifeng Li, Qinyang Yu, Ruixiang Jiang, Weihong Ma, Wenbiao Wu, Rui Meng, Haiyan Li, Heju Huai, Shuyan Wang and Longjuan He
Appl. Sci. 2024, 14(16), 6944; https://doi.org/10.3390/app14166944 - 8 Aug 2024
Viewed by 1481
Abstract
Named Entity Recognition (NER) is a fundamental and pivotal stage in the development of various knowledge-based support systems, including knowledge retrieval and question-answering systems. In the domain of pig diseases, Chinese NER models encounter several challenges, such as the scarcity of annotated data, domain-specific vocabulary, diverse entity categories, and ambiguous entity boundaries. To address these challenges, we propose PDCNER, a Pig Disease Chinese Named Entity Recognition method leveraging lexicon-enhanced BERT and contrastive learning. Firstly, we construct a domain-specific lexicon and pre-train word embeddings in the pig disease domain. Secondly, we integrate lexicon information of pig diseases into the lower layers of BERT using a Lexicon Adapter layer, which employs char–word pair sequences. Thirdly, to enhance feature representation, we propose a lexicon-enhanced contrastive loss layer on top of BERT. Finally, a Conditional Random Field (CRF) layer is employed as the model’s decoder. Experimental results show that our proposed model demonstrates superior performance over several mainstream models, achieving a precision of 87.76%, a recall of 86.97%, and an F1-score of 87.36%. The proposed model outperforms BERT-BiLSTM-CRF and LEBERT by 14.05% and 6.8%, respectively, with only 10% of the samples available, showcasing its robustness in data scarcity scenarios. Furthermore, the model exhibits generalizability across publicly available datasets. Our work provides reliable technical support for the information extraction of pig diseases in Chinese and can be easily extended to other domains, thereby facilitating seamless adaptation for named entity identification across diverse contexts. Full article
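The char–word pair sequences fed to the Lexicon Adapter can be illustrated with a minimal sketch: each character is paired with every domain-lexicon word that covers it, found by scanning bounded-length substrings. This is a hypothetical illustration of the matching step only (function name, toy lexicon, and `max_word_len` are assumptions), not the authors' implementation.

```python
def build_char_word_pairs(sentence, lexicon, max_word_len=4):
    """For each character position, collect the lexicon words covering it."""
    pairs = [[] for _ in sentence]
    n = len(sentence)
    for start in range(n):
        # Try candidate words of length 2..max_word_len starting here.
        for length in range(2, max_word_len + 1):
            end = start + length
            if end > n:
                break
            word = sentence[start:end]
            if word in lexicon:
                # Attach the matched word to every character it spans.
                for i in range(start, end):
                    pairs[i].append(word)
    return list(zip(sentence, pairs))

# Toy domain lexicon with two hypothetical pig-disease terms.
lexicon = {"猪瘟", "猪瘟病毒"}
result = build_char_word_pairs("猪瘟病毒感染", lexicon)
# The first character "猪" is covered by both matched lexicon words,
# while characters outside any match get an empty word list.
```

In the full model, each matched word would be replaced by its pre-trained embedding and fused with the character's BERT representation inside the adapter layer.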
(This article belongs to the Special Issue Natural Language Processing (NLP) and Applications—2nd Edition)