Open Access Article

An Improved Word Representation for Deep Learning Based NER in Indian Languages

Department of Computer Science, Cochin University of Science and Technology, Kochi 682022, India
* Author to whom correspondence should be addressed.
Information 2019, 10(6), 186; https://doi.org/10.3390/info10060186
Received: 27 April 2019 / Revised: 23 May 2019 / Accepted: 27 May 2019 / Published: 30 May 2019
(This article belongs to the Special Issue Natural Language Processing and Text Mining)
Named Entity Recognition (NER) is the process of identifying the elementary units in a text document and classifying them into predefined categories such as person, location and organization. NER plays an important role in many Natural Language Processing applications, including information retrieval, question answering and machine translation. Resolving the ambiguities of the lexical items in a text document is a challenging task, and NER in Indian languages is especially complex because of their morphological richness and agglutinative nature. Although many solutions have been proposed, NER remains an unsolved problem. Traditional approaches applied hand-crafted features to classical machine learning techniques such as the Hidden Markov Model (HMM), Support Vector Machine (SVM) and Conditional Random Field (CRF). The introduction of deep learning changed the scenario, and deep learning architectures now achieve state-of-the-art results on NER. In this paper, we address the problem of effective word representation for NER in Indian languages by capturing syntactic, semantic and morphological information. We propose a deep learning based entity extraction system for Indian languages that uses a novel combined word representation, including character-level, word-level and affix-level embeddings. We used the ‘ARNEKT-IECSIL 2018’ shared task data for training and testing. Our results show an improvement over existing pre-trained word representations.
Keywords: named entity recognition; Bi-LSTM-CRF; Indian languages; affix embedding; character-based word composition; agglutinative languages
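The abstract describes the combined word representation only at a high level. The following is a minimal Python sketch of how word-level, character-level and affix-level views of a token might be concatenated into one input vector for a sequence tagger such as a Bi-LSTM-CRF. It is not the authors' implementation. The random lookup tables, the max-pooling over character vectors (a stand-in for a trained character-level network), the affix length n = 3, the separate prefix and suffix tables, and the romanized example tokens are all assumptions made for illustration. Note also that Python slices strings by Unicode code point, so for Indic scripts a "character" here may be a vowel sign rather than a full grapheme cluster.

```python
import random
from typing import Dict, List, Sequence, Tuple

UNK = "<unk>"  # shared fallback key for out-of-vocabulary symbols


class Embedding:
    """Toy lookup table: each known symbol maps to a fixed random vector."""

    def __init__(self, symbols: Sequence[str], dim: int, seed: int) -> None:
        rng = random.Random(seed)  # seeded so the vectors are reproducible
        self.dim = dim
        self.table: Dict[str, List[float]] = {
            s: [rng.uniform(-0.5, 0.5) for _ in range(dim)]
            for s in list(symbols) + [UNK]
        }

    def __call__(self, symbol: str) -> List[float]:
        # Unknown symbols all share the <unk> vector.
        return self.table.get(symbol, self.table[UNK])


def affixes(word: str, n: int = 3) -> Tuple[str, str]:
    """Length-n prefix and suffix (the whole word if it is shorter than n)."""
    return word[:n], word[-n:]


def compose_chars(word: str, char_emb: Embedding) -> List[float]:
    """Element-wise max over character vectors.

    Assumption: a simple stand-in for a trained character-level BiLSTM/CNN.
    """
    if not word:
        return [0.0] * char_emb.dim
    vectors = [char_emb(c) for c in word]
    return [max(column) for column in zip(*vectors)]


def word_representation(word: str, word_emb: Embedding, char_emb: Embedding,
                        prefix_emb: Embedding, suffix_emb: Embedding,
                        n: int = 3) -> List[float]:
    """Concatenate word-, character- and affix-level vectors for one token."""
    prefix, suffix = affixes(word, n)
    return (word_emb(word) + compose_chars(word, char_emb)
            + prefix_emb(prefix) + suffix_emb(suffix))


# Usage with a tiny, hypothetical romanized vocabulary.
corpus = ["keralathil", "kochiyil", "sachin", "cusat"]
word_emb = Embedding(corpus, dim=4, seed=0)
char_emb = Embedding(sorted(set("".join(corpus))), dim=3, seed=1)
prefix_emb = Embedding(sorted({affixes(w)[0] for w in corpus}), dim=2, seed=2)
suffix_emb = Embedding(sorted({affixes(w)[1] for w in corpus}), dim=2, seed=3)

vec = word_representation("delhiyil", word_emb, char_emb, prefix_emb, suffix_emb)
print(len(vec))  # 4 + 3 + 2 + 2 = 11
```

Because "delhiyil" never appears in the toy vocabulary, its word-level part falls back to the shared `<unk>` vector, yet its suffix "yil" is shared with the known word "kochiyil" and so still maps to a learned affix vector. This kind of sub-word signal is what affix and character-level embeddings are meant to contribute for agglutinative languages, where many surface forms are rare or unseen.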

Figure 1

MDPI and ACS Style

A P, A.; K, M.; Mary Idicula, S. An Improved Word Representation for Deep Learning Based NER in Indian Languages. Information 2019, 10, 186.

