Search Results (10)

Search Parameters:
Keywords = cross-domain named entity recognition

18 pages, 1606 KB  
Article
CLFF-NER: A Cross-Lingual Feature Fusion Model for Named Entity Recognition in the Traditional Chinese Festival Culture Domain
by Shenghe Yang, Kun He, Wei Li and Yingying He
Informatics 2025, 12(4), 136; https://doi.org/10.3390/informatics12040136 - 5 Dec 2025
Viewed by 628
Abstract
With the rapid development of information technology, there is an increasing demand for the digital preservation of traditional festival culture and the extraction of relevant knowledge. However, existing research on Named Entity Recognition (NER) for Chinese traditional festival culture lacks support from high-quality corpora and dedicated model methods. To address this gap, this study proposes a Named Entity Recognition model, CLFF-NER, which integrates multi-source heterogeneous information. The model operates as follows: first, Multilingual BERT is employed to obtain the contextual semantic representations of Chinese and English sentences. Subsequently, a Multiconvolutional Kernel Network (MKN) is used to extract the local structural features of entities. Then, a Transformer module is introduced to achieve cross-lingual, cross-attention fusion of Chinese and English semantics. Furthermore, a Graph Neural Network (GNN) is utilized to selectively supplement useful English information, thereby alleviating the interference caused by redundant information. Finally, a gating mechanism and Conditional Random Field (CRF) are combined to jointly optimize the recognition results. Experiments were conducted on the public Chinese Festival Culture Dataset (CTFCDataSet), and the model achieved 89.45%, 90.01%, and 89.73% in precision, recall, and F1 score, respectively—significantly outperforming a range of mainstream baseline models. Meanwhile, the model also demonstrated competitive performance on two other public datasets, Resume and Weibo, which verifies its strong cross-domain generalization ability. Full article
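
The fusion pipeline described above can be illustrated with a small PyTorch sketch: multi-kernel convolutions extract local entity structure, and a cross-attention block with a sigmoid gate mixes in the English signal. This is a minimal sketch under assumed dimensions and module names, not the authors' implementation; the GNN selection step and the CRF decoder are omitted.

```python
import torch
import torch.nn as nn

class MultiKernelConv(nn.Module):
    """Several convolution widths over token embeddings, capturing the local
    structural features of entities (the paper's MKN component, simplified)."""
    def __init__(self, dim, kernel_sizes=(2, 3, 4)):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv1d(dim, dim, k, padding=k // 2) for k in kernel_sizes)
        self.proj = nn.Linear(dim * len(kernel_sizes), dim)

    def forward(self, x):                        # x: (batch, seq, dim)
        x = x.transpose(1, 2)                    # -> (batch, dim, seq)
        feats = [conv(x)[..., : x.size(-1)] for conv in self.convs]
        return self.proj(torch.cat(feats, dim=1).transpose(1, 2))

class GatedCrossLingualFusion(nn.Module):
    """Chinese tokens attend over English tokens; a sigmoid gate decides how
    much of the English signal is mixed in. Illustrative stand-in for the
    paper's Transformer cross-attention + GNN selection + gating steps."""
    def __init__(self, dim=768, heads=8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, zh, en):                   # (batch, seq_zh, dim), (batch, seq_en, dim)
        attended, _ = self.cross_attn(query=zh, key=en, value=en)
        g = torch.sigmoid(self.gate(torch.cat([zh, attended], dim=-1)))
        return g * zh + (1 - g) * attended       # fused per-token representation for the CRF
```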

18 pages, 864 KB  
Article
Enhanced Semantic BERT for Named Entity Recognition in Education
by Ping Huang, Huijuan Zhu, Ying Wang, Lili Dai and Lei Zheng
Electronics 2025, 14(19), 3951; https://doi.org/10.3390/electronics14193951 - 7 Oct 2025
Viewed by 773
Abstract
To address the technical challenges of named entity recognition (NER) in the educational domain, such as ambiguous entity boundaries and difficulties with nested entity identification, this study proposes an enhanced semantic BERT model (ES-BERT). The model adopts an education-domain, vocabulary-assisted semantic enhancement strategy that (1) applies the term frequency–inverse document frequency (TF-IDF) algorithm to weight domain-specific terms, and (2) fuses the weighted lexical information with character-level features, enabling BERT to generate enriched, domain-aware character–word hybrid representations. A complete bidirectional long short-term memory-conditional random field (BiLSTM-CRF) recognition framework was established, and a focal loss-based joint training method was introduced to optimize the process. The experimental design employed a three-phase validation protocol: (1) in a comparative evaluation using 5-fold cross-validation on our proprietary computer-education dataset, the proposed ES-BERT model yielded a precision of 90.38%, higher than that of the baseline models; (2) ablation studies confirmed the contribution of domain-vocabulary enhancement to the performance improvement; (3) cross-domain experiments on the 2016 knowledge base question answering and resume benchmark datasets demonstrated precisions of 98.41% and 96.75%, respectively, verifying the model’s transfer-learning capability. These results substantiate that ES-BERT not only effectively resolves domain-specific NER challenges in education but also exhibits strong cross-domain adaptability. Full article
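
Two of the ingredients described above are easy to sketch in isolation: TF-IDF weighting of a domain lexicon and a focal loss for token classification. The snippet below is a hedged illustration using standard scikit-learn and PyTorch calls; the function names, the max-TF-IDF scoring choice, and the fusion details are assumptions, not the authors' exact scheme.

```python
import torch
import torch.nn.functional as F
from sklearn.feature_extraction.text import TfidfVectorizer

def domain_term_weights(corpus, domain_terms):
    """Score a domain vocabulary by TF-IDF over the training corpus; the scores
    can then scale the lexicon features fused with BERT's character-level
    representations. domain_terms are assumed to be single analyzer-compatible
    tokens here; Chinese text would need a language-appropriate tokenizer."""
    vec = TfidfVectorizer(vocabulary=domain_terms)
    tfidf = vec.fit_transform(corpus)                    # (docs, terms)
    scores = tfidf.max(axis=0).toarray().ravel()         # peak salience per term
    return dict(zip(vec.get_feature_names_out(), scores))

def focal_loss(logits, labels, gamma=2.0, ignore_index=-100):
    """Focal loss for token classification: down-weights easy tokens so
    training focuses on hard entity boundaries."""
    ce = F.cross_entropy(logits.view(-1, logits.size(-1)), labels.view(-1),
                         reduction="none", ignore_index=ignore_index)
    pt = torch.exp(-ce)                                   # model's prob. of the true tag
    mask = (labels.view(-1) != ignore_index).float()
    return (((1 - pt) ** gamma) * ce * mask).sum() / mask.sum().clamp(min=1)
```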

18 pages, 1578 KB  
Article
Leveraging Failure Modes and Effect Analysis for Technical Language Processing
by Mathieu Payette, Georges Abdul-Nour, Toualith Jean-Marc Meango, Miguel Diago and Alain Côté
Mach. Learn. Knowl. Extr. 2025, 7(2), 42; https://doi.org/10.3390/make7020042 - 9 May 2025
Cited by 2 | Viewed by 2761
Abstract
With the evolution of data collection technologies, sensor-generated data have become the norm. However, decades of manually recorded maintenance data still hold untapped value. Natural Language Processing (NLP) offers new ways to extract insights from these historical records, especially from short, unstructured maintenance texts often accompanying structured database fields. While NLP has shown promise in this area, technical texts pose unique challenges, particularly in preprocessing and manual annotation. This study proposes a novel methodology that integrates Failure Mode and Effect Analysis (FMEA), a reliability engineering tool, into the NLP pipeline to enhance Named Entity Recognition (NER) in maintenance records. By leveraging the structured and domain-specific knowledge encapsulated in FMEAs, the annotation process becomes more systematic, reducing the need for exhaustive manual effort. A case study using real-world data from a major electrical utility demonstrates the effectiveness of this approach. The custom NER model, trained using FMEA-informed annotations, achieves high precision, recall, and F1 scores, successfully identifying key reliability elements in maintenance text. The integration of FMEA not only improves data quality but also supports more informed asset management decisions. This research introduces a novel cross-disciplinary framework combining reliability engineering and NLP. It highlights how domain expertise can be used to streamline annotation, improve model accuracy, and unlock actionable insights from legacy maintenance data. Full article
(This article belongs to the Section Data)
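
The core idea, using FMEA entries as a structured lexicon to pre-annotate maintenance text, can be sketched as simple dictionary matching that emits weak NER spans. The terms, labels, and record below are invented for illustration; real entries would come from the utility's FMEA worksheets.

```python
import re

# Toy FMEA-derived lexicon: component and failure-mode terms (invented here)
# taken from FMEA worksheets.
FMEA_LEXICON = {
    "COMPONENT": ["circuit breaker", "transformer bushing", "relay"],
    "FAILURE_MODE": ["insulation breakdown", "contact wear", "oil leak"],
}

def fmea_annotate(text):
    """Project FMEA terms onto a maintenance record as weak NER labels
    (character spans), reducing the manual annotation burden."""
    spans = []
    for label, terms in FMEA_LEXICON.items():
        for term in terms:
            for m in re.finditer(re.escape(term), text, flags=re.IGNORECASE):
                spans.append((m.start(), m.end(), label))
    return sorted(spans)

record = "Replaced relay after contact wear was found during inspection."
print(fmea_annotate(record))
# [(9, 14, 'COMPONENT'), (21, 33, 'FAILURE_MODE')]
```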

14 pages, 420 KB  
Article
Cross-Domain Tibetan Named Entity Recognition via Large Language Models
by Jin Zhang, Fan Gao, Lobsang Yeshi, Dorje Tashi, Xiangshi Wang, Nyima Tashi and Gadeng Luosang
Electronics 2025, 14(1), 111; https://doi.org/10.3390/electronics14010111 - 30 Dec 2024
Cited by 1 | Viewed by 1771
Abstract
Large language models (LLMs) have demonstrated powerful capabilities across many downstream tasks. Existing Tibetan named entity recognition (NER) methods often suffer from a high degree of coupling between data and models, limiting them to identifying entities only within specific domain datasets and making cross-domain recognition difficult. Additionally, each dataset requires training a dedicated model, and when faced with new domains, retraining and redeployment are necessary. In practical applications, the ability to perform cross-domain NER is crucial to meeting real-world needs. To address this issue and decouple data from models, enabling cross-domain NER, this paper proposes a cross-domain joint learning approach based on large language models, which enhances model robustness by learning the shared underlying semantics across different domains. To reduce the significant computational costs incurred by LLMs during inference, we adopt an adaptive structured pruning method based on domain-dependent prompts, which effectively reduces the model’s memory requirements and improves inference speed while minimizing the impact on performance. The experimental results show that our method significantly outperformed the baseline model across cross-domain Tibetan datasets. In the Tibetan medicine domain, our method achieved an F1 score improvement of up to 27.26% compared with the baseline model at its best. Our method achieved an average F1 score of 95.17% across domains, outperforming the baseline Llama2 + Prompt model by 5.12%. Furthermore, our method demonstrates strong generalization capabilities in NER tasks for other low-resource languages. Full article
(This article belongs to the Section Artificial Intelligence)
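
A minimal sketch of the domain-conditioned prompting idea is shown below: all domains share one instruction format so the model can learn domain-invariant extraction behavior, while only the domain tag and entity-type list change at inference. The prompt wording, domain names, and entity types are illustrative assumptions, not the authors' templates, and the pruning step is not shown.

```python
# Illustrative domain-to-entity-type map; real domains and types would come
# from the cross-domain Tibetan datasets used in the paper.
DOMAINS = {
    "news":     ["PERSON", "LOCATION", "ORGANIZATION"],
    "medicine": ["HERB", "DISEASE", "TREATMENT"],
}

def build_prompt(domain, sentence):
    """One shared instruction format for every domain, so joint training can
    learn domain-invariant extraction behavior."""
    types = ", ".join(DOMAINS[domain])
    return (
        f"Task: named entity recognition in the {domain} domain.\n"
        f"Entity types: {types}.\n"
        f"Sentence: {sentence}\n"
        "Return each entity as <type>: <surface form>, one per line."
    )

# At inference only the domain tag and type list change.
print(build_prompt("medicine", "..."))
```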

27 pages, 1228 KB  
Article
Designing a Prototype Platform for Real-Time Event Extraction: A Scalable Natural Language Processing and Data Mining Approach
by Mihai-Constantin Avornicului, Vasile Paul Bresfelean, Silviu-Claudiu Popa, Norbert Forman and Calin-Adrian Comes
Electronics 2024, 13(24), 4938; https://doi.org/10.3390/electronics13244938 - 14 Dec 2024
Cited by 2 | Viewed by 2580
Abstract
In this paper, we present a modular, high-performance prototype platform for real-time event extraction, designed to address key challenges in processing large volumes of unstructured data across applications like crisis management, social media monitoring and news aggregation. The prototype integrates advanced natural language processing (NLP) techniques (Term Frequency–Inverse Document Frequency (TF-IDF), Latent Semantic Indexing (LSI), Named Entity Recognition (NER)) with data mining strategies to improve precision in relevance scoring, clustering and entity extraction. The platform handles real-time constraints efficiently by combining TF-IDF, LSI and NER into a hybrid pipeline. Unlike transformer-based architectures, which often struggle with latency, our prototype is scalable and flexible enough to support various domains like disaster management and social media monitoring. Initial quantitative and qualitative evaluations demonstrate the platform’s efficiency, accuracy, and scalability, validated by metrics such as F1-score, response time, and user satisfaction. Its design balances fast computation with precise semantic analysis, making it effective for applications that require rapid processing. This prototype offers a robust foundation for high-frequency data processing, adaptable and scalable for real-time scenarios. In our future work, we will further explore contextual understanding, scalability through microservices and cross-platform data fusion for expanded event coverage. Full article
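
The TF-IDF + LSI part of such a hybrid pipeline can be sketched with standard scikit-learn components: TF-IDF captures term salience, truncated SVD (LSI) folds the sparse vectors into a low-dimensional semantic space, and clustering groups candidate event reports before a NER pass extracts the who, where, and when fields. The documents, cluster count, and parameters below are illustrative, not the platform's configuration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.cluster import KMeans
from sklearn.pipeline import make_pipeline

docs = [
    "Flash flooding reported downtown after heavy overnight rain.",
    "Heavy rain brings flooding to the river district; evacuations ordered.",
    "Tech company announces quarterly earnings call for investors.",
]

# TF-IDF for term salience, truncated SVD as LSI, then clustering of
# candidate event reports; a NER pass over each cluster would follow.
lsi = make_pipeline(TfidfVectorizer(stop_words="english"),
                    TruncatedSVD(n_components=2, random_state=0))
X = lsi.fit_transform(docs)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)   # e.g., the two flood reports land in the same cluster
```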

16 pages, 886 KB  
Article
Exploring the Potential of Neural Machine Translation for Cross-Language Clinical Natural Language Processing (NLP) Resource Generation through Annotation Projection
by Jan Rodríguez-Miret, Eulàlia Farré-Maduell, Salvador Lima-López, Laura Vigil, Vicent Briva-Iglesias and Martin Krallinger
Information 2024, 15(10), 585; https://doi.org/10.3390/info15100585 - 25 Sep 2024
Cited by 6 | Viewed by 4884
Abstract
Recent advancements in neural machine translation (NMT) offer promising potential for generating cross-language clinical natural language processing (NLP) resources. There is a pressing need to foster the development of clinical NLP tools that extract key clinical entities in a comparable way across languages, for the many medical application scenarios that are hindered by a lack of multilingual annotated data. This study explores the efficacy of using NMT and annotation projection techniques with expert-in-the-loop validation to develop named entity recognition (NER) systems for an under-resourced target language (Catalan) by leveraging Spanish clinical corpora annotated by domain experts. We employed a state-of-the-art NMT system to translate three clinical case corpora. The translated annotations were then projected onto the target language texts and subsequently validated and corrected by clinical domain experts. The efficacy of the resulting NER systems was evaluated against manually annotated test sets in the target language. Our findings indicate that this approach not only facilitates the generation of high-quality training data for the target language (Catalan) but also demonstrates the potential to extend this methodology to other languages, thereby enhancing multilingual clinical NLP resource development. The generated corpora and components are publicly accessible, potentially providing a valuable resource for further research and application in multilingual clinical settings. Full article
(This article belongs to the Special Issue Machine Translation for Conquering Language Barriers)
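
A much-simplified sketch of annotation projection is shown below: the source text and each annotated span are translated, and the translated surface form is located in the target sentence. Real pipelines, including the one in the paper, rely on word alignment and expert post-editing; the `translate` callable and the toy Spanish-to-Catalan example are assumptions for illustration only.

```python
def project_annotations(src_text, src_entities, translate):
    """Project (start, end, label) character spans from a source-language
    clinical text onto its machine translation. `translate` is an assumed
    NMT callable; spans we cannot locate in the translation are dropped."""
    tgt_text = translate(src_text)
    projected = []
    for start, end, label in src_entities:
        surface_tgt = translate(src_text[start:end])
        pos = tgt_text.lower().find(surface_tgt.lower())
        if pos != -1:
            projected.append((pos, pos + len(surface_tgt), label))
    return tgt_text, projected

# Toy "translator" standing in for an NMT system (Spanish -> Catalan).
toy = {"dolor abdominal": "dolor abdominal",
       "El paciente refiere dolor abdominal.":
       "El pacient refereix dolor abdominal."}
text, spans = project_annotations(
    "El paciente refiere dolor abdominal.",
    [(20, 35, "SYMPTOM")], lambda s: toy.get(s, s))
print(text, spans)   # projected span points at "dolor abdominal" in Catalan
```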

18 pages, 3070 KB  
Article
Harnessing Causal Structure Alignment for Enhanced Cross-Domain Named Entity Recognition
by Xiaoming Liu, Mengyuan Cao, Guan Yang, Jie Liu, Yang Liu and Hang Wang
Electronics 2024, 13(1), 67; https://doi.org/10.3390/electronics13010067 - 22 Dec 2023
Viewed by 2123
Abstract
Cross-domain named entity recognition (NER) is a crucial task in various practical applications, particularly when faced with the challenge of limited data availability in target domains. Existing methodologies primarily depend on feature representation or model parameter sharing mechanisms to enable the transfer of entity recognition capabilities across domains. However, these approaches often ignore the latent causal relationships inherent in invariant features. To address this limitation, we propose a novel framework, the Causal Structure Alignment-based Cross-Domain Named Entity Recognition (CSA-NER) framework, designed to harness the causally invariant features within causal structures to enhance the cross-domain transfer of entity recognition competence. Initially, CSA-NER constructs a causal feature graph utilizing causal discovery to ascertain causal relationships between entities and contextual features across source and target domains. Subsequently, it performs graph structure alignment to extract causal invariant knowledge across domains via the graph optimal transport (GOT) method. Finally, the acquired causal invariant knowledge is refined and utilized through the integration of Gated Attention Units (GAUs). Comprehensive experiments conducted on five English datasets and a specific CD-NER dataset show a notable improvement in the average performance of the CSA-NER model compared with existing cross-domain methods. These findings underscore the significance of unearthing and employing latent causal invariant knowledge to effectively augment entity recognition capabilities in target domains, thereby contributing a robust methodology to the broader realm of cross-domain natural language processing. Full article
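
The graph-alignment step can be illustrated with a plain entropy-regularized optimal transport (Sinkhorn) solver between source- and target-domain node features. This is a simplified stand-in for the paper's graph optimal transport (GOT) alignment; the Euclidean cost, the regularization value, and the uniform node masses are illustrative choices.

```python
import numpy as np

def sinkhorn_alignment(Cs, Ct, reg=0.1, n_iter=200):
    """Entropy-regularized optimal transport between source- and target-domain
    feature nodes. Cs/Ct are node feature matrices; the returned plan says how
    much mass of each source node maps onto each target node."""
    cost = np.linalg.norm(Cs[:, None, :] - Ct[None, :, :], axis=-1)  # pairwise cost
    K = np.exp(-cost / reg)
    a = np.full(len(Cs), 1.0 / len(Cs))          # uniform node masses
    b = np.full(len(Ct), 1.0 / len(Ct))
    u = np.ones_like(a)
    for _ in range(n_iter):                      # Sinkhorn-Knopp scaling
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]           # transport plan, shape (|S|, |T|)

plan = sinkhorn_alignment(np.random.rand(4, 8), np.random.rand(5, 8))
print(plan.sum())                                # ~1.0: a valid coupling
```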

16 pages, 660 KB  
Article
Stylometric Fake News Detection Based on Natural Language Processing Using Named Entity Recognition: In-Domain and Cross-Domain Analysis
by Chih-Ming Tsai
Electronics 2023, 12(17), 3676; https://doi.org/10.3390/electronics12173676 - 31 Aug 2023
Cited by 20 | Viewed by 5660
Abstract
Nowadays, the dissemination of news information has become more rapid, liberal, and open to the public. People can find what they want to know more and more easily from a variety of sources, including traditional news outlets and new social media platforms. However, at a time when our lives are glutted with all kinds of news, we cannot help but doubt the veracity and legitimacy of these news sources; meanwhile, we also need to guard against the possible impact of various forms of fake news. To combat the spread of misinformation, more and more researchers have turned to natural language processing (NLP) approaches for effective fake news detection. However, in the face of increasingly serious fake news events, existing detection methods still need to be continuously improved. This study proposes a modified proof-of-concept model named NER-SA, which integrates natural language processing (NLP) and named entity recognition (NER) to conduct in-domain and cross-domain analysis of fake news detection with three existing datasets simultaneously. The named entities associated with any particular news event exist in a finite and available evidence pool. Therefore, the entities mentioned in any authentic news article should be recognizable within this entity bank, whereas a piece of fake news inevitably includes only some of the entities in the entity bank. False information is deliberately fabricated with fictitious, imaginary, and even implausible sentences and content. As a result, there must be differences in statements, writing logic, and style between legitimate news and fake news, which makes it possible to detect fake news. We developed a mathematical model and used the simulated annealing algorithm to find the optimal legitimate area. Comparing the detection performance of the NER-SA model with current state-of-the-art models proposed in other studies, we found that the NER-SA model indeed has superior performance in detecting fake news. For in-domain analysis, the accuracy increased by an average of 8.94% on the LIAR dataset and 19.36% on the fake or real news dataset, while the F1-score increased by an average of 24.04% on the LIAR dataset and 19.36% on the fake or real news dataset. In cross-domain analysis, the accuracy and F1-score for the NER-SA model increased by an average of 28.51% and 24.54%, respectively, across six domains in the FakeNews AMT dataset. The findings and implications of this study are further discussed with regard to their significance for improving accuracy, understanding context, and addressing adversarial attacks. The development of stylometric detection based on NLP approaches using NER techniques can improve the effectiveness and applicability of fake news detection. Full article
(This article belongs to the Special Issue Data Push and Data Mining in the Age of Artificial Intelligence)
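
Two pieces of the approach, the entity-bank overlap score and the simulated annealing search, can be sketched as follows. The single-threshold formulation, the toy scores, and the acceptance schedule are simplifying assumptions; the paper's mathematical model of the "legitimate area" is richer than this.

```python
import math
import random

def entity_overlap(article_entities, entity_bank):
    """Fraction of an article's named entities found in the evidence pool
    (entity bank) built from verified reporting on the same event."""
    if not article_entities:
        return 0.0
    ents = set(article_entities)
    return len(ents & entity_bank) / len(ents)

def anneal_threshold(scores, labels, steps=5000, t0=1.0):
    """Simulated annealing over a single decision threshold: articles whose
    overlap score falls below it are flagged as fake."""
    def accuracy(th):
        preds = [s >= th for s in scores]
        return sum(p == y for p, y in zip(preds, labels)) / len(labels)

    th, best = random.random(), None
    for step in range(steps):
        temp = t0 * (1 - step / steps) + 1e-9
        cand = min(1.0, max(0.0, th + random.gauss(0, 0.05)))
        delta = accuracy(cand) - accuracy(th)
        if delta > 0 or random.random() < math.exp(delta / temp):
            th = cand                              # accept better (or sometimes worse) moves
        if best is None or accuracy(th) > accuracy(best):
            best = th
    return best

random.seed(0)
scores = [0.9, 0.8, 0.7, 0.2, 0.3, 0.1]            # toy overlap scores (from entity_overlap in practice)
labels = [True, True, True, False, False, False]    # True = legitimate
print(anneal_threshold(scores, labels))              # a threshold separating the two classes
```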

16 pages, 625 KB  
Article
POISE: Efficient Cross-Domain Chinese Named Entity Recognization via Transfer Learning
by Jiabao Sheng, Aishan Wumaier and Zhe Li
Symmetry 2020, 12(10), 1673; https://doi.org/10.3390/sym12101673 - 13 Oct 2020
Cited by 6 | Viewed by 2963
Abstract
To improve the performance of deep learning methods when labeled data for entity annotation are scarce, this study proposes transfer learning schemes that combine character- and word-level representations to align low-resource data with high-resource data. We combine character embeddings, word embeddings, and embeddings of the label features from high- and low-resource data based on the BiLSTM-CRF model, and perform feature-transfer and parameter-sharing tasks across the two domains of the BiLSTM network to annotate with zero resources. Before transfer learning, we first calculate the label similarity between the two domains and select the label features with high similarity for feature-transfer mapping. All training parameters of the source domain are shared throughout the BiLSTM network and the CRF layer. In addition, we use the combination of characters and words to reduce word-segmentation problems across domains and to lower the error rate in label mapping. The experimental results show that, in terms of the overall F1 score, the proposed model without supervision outperformed the general parameter-sharing transfer learning method by 9.76 percentage points, and two recent high–low resource learning methods by 9.08 and 12.38 percentage points, respectively. The proposed scheme improves transfer learning between high- and low-resource data and can identify entities in the target domain. Full article
(This article belongs to the Section Computer)
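
The label-similarity step that precedes transfer can be sketched as a cosine comparison between per-label feature prototypes, keeping only confident matches for feature-transfer mapping. Treating prototypes as averaged feature vectors per label is an illustrative assumption, not necessarily the paper's exact measure.

```python
import numpy as np

def map_labels(src_protos, tgt_protos, min_sim=0.5):
    """Compare label feature prototypes across domains and keep only pairs
    similar enough to share features during transfer."""
    mapping = {}
    for tgt, tv in tgt_protos.items():
        sims = {src: float(np.dot(tv, sv) /
                           (np.linalg.norm(tv) * np.linalg.norm(sv)))
                for src, sv in src_protos.items()}
        src, sim = max(sims.items(), key=lambda kv: kv[1])
        if sim >= min_sim:                       # transfer only confident matches
            mapping[tgt] = src
    return mapping

rng = np.random.default_rng(0)
src = {"PER": rng.normal(size=16), "LOC": rng.normal(size=16)}
tgt = {"PERSON": src["PER"] + 0.1 * rng.normal(size=16),
       "GPE": src["LOC"] + 0.1 * rng.normal(size=16)}
print(map_labels(src, tgt))                      # expected: {'PERSON': 'PER', 'GPE': 'LOC'}
```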

17 pages, 1239 KB  
Article
Data-Augmented Hybrid Named Entity Recognition for Disaster Management by Transfer Learning
by Hung-Kai Kung, Chun-Mo Hsieh, Cheng-Yu Ho, Yun-Cheng Tsai, Hao-Yung Chan and Meng-Han Tsai
Appl. Sci. 2020, 10(12), 4234; https://doi.org/10.3390/app10124234 - 20 Jun 2020
Cited by 16 | Viewed by 4533
Abstract
This research aims to build a Mandarin named entity recognition (NER) module using transfer learning to facilitate damage information gathering and analysis in disaster management. The hybrid NER approach proposed in this research includes three modules: (1) data augmentation, which constructs a concise data set for disaster management; (2) reference model, which utilizes the bidirectional long short-term memory–conditional random field framework to implement NER; and (3) the augmented model built by integrating the first two modules via cross-domain transfer with disparate label sets. Through the combination of established rules and learned sentence patterns, the hybrid approach performs well in NER tasks for disaster management and successfully recognizes unfamiliar words. We applied the proposed NER module to disaster management, where it handled the NER tasks of our related work well and achieved the desired outcomes. With proper transfer, the results of this work can be extended to other fields, bringing valuable advantages in diverse applications. Full article
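
Cross-domain transfer with a disparate label set can be sketched in PyTorch: keep the learned embeddings and BiLSTM sentence patterns, replace only the tag projection for the new label set, and fine-tune on the smaller disaster-management corpus. The class layout below is a minimal sketch with illustrative sizes; the CRF layer of the reference model is omitted.

```python
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    """Simplified BiLSTM tagger; the reference model also adds a CRF layer
    on top, omitted here to keep the sketch short."""
    def __init__(self, vocab, dim, hidden, n_tags):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.lstm = nn.LSTM(dim, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_tags)

    def forward(self, tokens):
        h, _ = self.lstm(self.emb(tokens))
        return self.out(h)                       # per-token tag scores

def transfer_to_new_labels(pretrained, n_new_tags):
    """Keep the learned embeddings and BiLSTM, swap only the tag projection
    for the disparate target label set, then fine-tune on the target corpus."""
    model = pretrained
    model.out = nn.Linear(model.out.in_features, n_new_tags)  # new label space
    for p in model.emb.parameters():
        p.requires_grad = False                  # optionally freeze general-domain embeddings
    return model

source = BiLSTMTagger(vocab=20000, dim=128, hidden=256, n_tags=9)
disaster_model = transfer_to_new_labels(source, n_new_tags=13)
```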
