Open Access Article
Information 2019, 10(3), 94; https://doi.org/10.3390/info10030094

A Multilingual and Multidomain Study on Dialog Act Recognition Using Character-Level Tokenization

1
L2F—Spoken Language Systems Laboratory—INESC-ID, 1000-029 Lisboa, Portugal
2
Instituto Superior Técnico, Universidade de Lisboa, 1049-001 Lisboa, Portugal
3
Instituto Universitário de Lisboa (ISCTE-IUL), 1649-026 Lisboa, Portugal
*
Author to whom correspondence should be addressed.
This paper is an extended version of our paper published in The 18th International Conference on Artificial Intelligence: Methodology, Systems, Applications (AIMSA 2018).
Received: 21 January 2019 / Revised: 25 February 2019 / Accepted: 26 February 2019 / Published: 3 March 2019
(This article belongs to the Special Issue Artificial Intelligence—Methodology, Systems, and Applications)

Abstract

Automatic dialog act recognition is an important step for dialog systems, since it reveals the intention behind the words uttered by their conversational partners. Although most approaches to the task use word-level tokenization, there is information at the sub-word level that is related to the function of the words and, consequently, to the intention behind them. In this study, we therefore explored character-level tokenization as a way to capture that information. We used multiple character windows of different sizes to capture morphological aspects, such as affixes and lemmas, as well as inter-word information. We also assessed the importance of punctuation and capitalization for the task. To broaden the conclusions of our study, we performed experiments on dialogs in three languages (English, Spanish, and German) that have different morphological characteristics. The dialogs also cover multiple domains and are annotated with both domain-dependent and domain-independent dialog act labels. The results show not only that the character-level approach performs similarly to or better than state-of-the-art word-level approaches on the task, but also that the two approaches capture complementary information. Thus, the best results are achieved by combining tokenization at both levels.
Keywords: dialog act recognition; character-level; multilinguality; multidomain
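
To make the idea of character windows of different sizes concrete, the following is a minimal sketch in Python. It is not the authors' model: it only enumerates overlapping character windows from an utterance, and the function name `char_windows`, the default window sizes, and the `keep_case` and `keep_punct` switches are assumptions introduced here for illustration. The two switches mimic, in a simplified way, the kind of comparison the abstract mentions for capitalization and punctuation.

```python
import string
from collections import Counter


def char_windows(utterance, sizes=(3, 5, 7), keep_case=True, keep_punct=True):
    """Count overlapping character windows of each size in `sizes`.

    Hypothetical helper for illustration only; the window sizes and the
    two switches are assumptions, not values taken from the article.
    """
    text = utterance
    if not keep_case:
        # Discard capitalization information.
        text = text.lower()
    if not keep_punct:
        # Discard ASCII punctuation marks.
        text = "".join(ch for ch in text if ch not in string.punctuation)

    features = Counter()
    for n in sizes:
        # Slide a window of n characters over the text, spaces included,
        # so that windows can span word boundaries.
        for i in range(len(text) - n + 1):
            features[(n, text[i:i + n])] += 1
    return features


if __name__ == "__main__":
    for key, count in sorted(char_windows("Is it raining?", sizes=(3, 5)).items()):
        print(key, count)
```

Short windows tend to isolate affix-like fragments, while longer windows that include spaces cross word boundaries, which is one simple way to expose inter-word information. Feeding such features to a classifier, or replacing them with learned character embeddings, is left open here.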
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).
MDPI and ACS Style

Ribeiro, E.; Ribeiro, R.; de Matos, D.M. A Multilingual and Multidomain Study on Dialog Act Recognition Using Character-Level Tokenization. Information 2019, 10, 94.


Note that from the first issue of 2016, MDPI journals use article numbers instead of page numbers.

Information, EISSN 2078-2489, published by MDPI AG, Basel, Switzerland.