
Transfer Learning for Named Entity Recognition in Financial and Biomedical Documents

1 Department of Computer Science, Language Intelligence & Information Retrieval Lab (LIIR), KU Leuven, 3000 Leuven, Belgium
2 Contract.fit, 1000 Brussels, Belgium
* Author to whom correspondence should be addressed.
Information 2019, 10(8), 248; https://doi.org/10.3390/info10080248
Received: 30 May 2019 / Revised: 18 July 2019 / Accepted: 25 July 2019 / Published: 26 July 2019
(This article belongs to the Special Issue Natural Language Processing and Text Mining)
PDF [944 KB, uploaded 2 August 2019]

Abstract

Recent deep learning approaches have shown promising results for named entity recognition (NER). A reasonable assumption when training robust deep learning models is that a sufficient amount of high-quality annotated training data is available. In many real-world scenarios, however, labeled training data is scarce. In this paper we consider two use cases: generic entity extraction from financial documents and from biomedical documents. First, we develop a character-based model for NER in financial documents and a word- and character-based model with attention for NER in biomedical documents. We then analyze how transfer learning addresses the problem of limited training data in a target domain. We demonstrate through experiments that NER models trained on labeled data from a source domain can serve as base models and then be fine-tuned with a small amount of labeled data to recognize different named entity classes in a target domain. There is also growing interest in language models as a way of coping with limited labeled data; the currently most successful language model is BERT. Because of its success in state-of-the-art models, we integrate BERT-based representations into our biomedical NER model alongside word and character information. The results are compared with a state-of-the-art model applied to a benchmark biomedical corpus.
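The transfer-learning strategy the abstract describes — reuse a representation learned on a label-rich source domain, then fine-tune a fresh output layer on a few target-domain examples — can be sketched in miniature. The toy below is illustrative only and is not the authors' architecture: a frozen random projection stands in for the character-level layers learned on the source (financial) domain, and only a newly attached classification head is trained on a handful of target (biomedical) examples. All names here (`encode`, `W_src`, `W_tgt`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "encoder": a fixed projection standing in for the character-level
# layers learned on the source domain. In transfer learning it is kept
# frozen (or only lightly fine-tuned).
W_enc = rng.normal(size=(8, 16))

def encode(x):
    return np.tanh(x @ W_enc)  # shared representation

# Source-domain head (e.g. 3 financial entity classes); it is discarded
# when transferring to the target domain.
W_src = rng.normal(size=(16, 3))

# Transfer: keep the encoder, attach a fresh head for the target domain
# (e.g. 5 biomedical entity classes) and train only that head.
n_target_classes = 5
W_tgt = np.zeros((16, n_target_classes))

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# A handful of labeled target examples (the "few labeled data" setting).
X = rng.normal(size=(20, 8))
y = rng.integers(0, n_target_classes, size=20)

lr = 0.5
for _ in range(200):
    H = encode(X)                     # frozen encoder forward pass
    P = softmax(H @ W_tgt)            # target-head predictions
    G = P.copy()
    G[np.arange(len(y)), y] -= 1.0    # softmax cross-entropy gradient
    W_tgt -= lr * H.T @ G / len(y)    # update only the new head

acc = (softmax(encode(X) @ W_tgt).argmax(axis=1) == y).mean()
```

Because gradients only flow into `W_tgt`, the source-domain knowledge captured by the encoder is preserved while the model adapts its output space to the new entity classes — the essence of the fine-tuning setup the paper evaluates.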
Keywords: deep learning; entity extraction; named entity recognition; transfer learning; fine-tuning; minimum training data
This is an open access article distributed under the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Share & Cite This Article

MDPI and ACS Style

Francis, S.; Van Landeghem, J.; Moens, M.-F. Transfer Learning for Named Entity Recognition in Financial and Biomedical Documents. Information 2019, 10, 248.


Note that from the first issue of 2016, MDPI journals use article numbers instead of page numbers.

Information EISSN 2078-2489, published by MDPI AG, Basel, Switzerland