Using Complexity-Identical Human- and Machine-Directed Utterances to Investigate Addressee Detection for Spoken Dialogue Systems

1 Institute of Communications Engineering, Ulm University, Albert-Einstein-Allee 43, 89081 Ulm, Germany
2 ITMO University, Kronverksky Ave. 49, 197101 St. Petersburg, Russia
3 Institute for Information Technology and Communications, Otto-von-Guericke-University, Universitaetsplatz 2, 39016 Magdeburg, Germany
4 St. Petersburg Institute for Informatics and Automation of the Russian Academy of Sciences, 14th Line 39, 199178 St. Petersburg, Russia
* Authors to whom correspondence should be addressed.
Sensors 2020, 20(9), 2740; https://doi.org/10.3390/s20092740
Received: 31 March 2020 / Revised: 30 April 2020 / Accepted: 7 May 2020 / Published: 11 May 2020
(This article belongs to the Special Issue Multimodal Sensing for Understanding Behavior and Personality)
Human-machine addressee detection (H-M AD) is a current challenge in paralinguistics and dialogue research. It arises in multiparty conversations between several people and a spoken dialogue system (SDS), since the users may also talk to each other, and even to themselves, while interacting with the system. The SDS must determine whether or not it is being addressed. All existing studies on acoustic H-M AD were conducted on corpora in which the human addressee and the machine played different dialogue roles. This peculiarity influences speakers' behaviour and increases the vocal differences between human- and machine-directed utterances. In the present study, we consider the Restaurant Booking Corpus (RBC), which consists of complexity-identical human- and machine-directed phone calls and allows us to eliminate most of the factors that implicitly influence speakers' behaviour. The only remaining factor is the speakers' explicit awareness of their interlocutor (technical system or human being). Although complexity-identical H-M AD is considerably more challenging than the classical task, data augmentation allowed us to achieve an unweighted average recall (UAR) of 0.628, a significant improvement over both native listeners (UAR = 0.596) and the baseline classifier presented by the RBC developers (UAR = 0.539).
Keywords: addressee detection; human-computer interaction; computational paralinguistics; speaking style; data augmentation; mixup; speech classification
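
The abstract reports all results as unweighted average recall (UAR), that is, the mean of the per-class recalls, so each class counts equally however many utterances it contains. The following is a minimal illustrative sketch of that metric, not the authors' evaluation code; the function name and the toy labels "H" (human-directed) and "M" (machine-directed) are assumptions made for this example.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Return the mean of per-class recalls (UAR).

    Hypothetical helper for illustration; classes are taken from y_true.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    total = defaultdict(int)    # number of samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1

    # Recall of each class, then a plain (unweighted) average over classes.
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Toy, made-up labels: 8 human-directed and 2 machine-directed utterances,
# with a classifier that always predicts "H".
y_true = ["H"] * 8 + ["M"] * 2
y_pred = ["H"] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
uar = unweighted_average_recall(y_true, y_pred)
print(f"accuracy = {accuracy:.2f}, UAR = {uar:.2f}")
```

In this toy case the always-"H" classifier reaches a high accuracy but only chance-level UAR for two classes, which illustrates why UAR is a common choice when class sizes differ.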
MDPI and ACS Style

Akhtiamov, O.; Siegert, I.; Karpov, A.; Minker, W. Using Complexity-Identical Human- and Machine-Directed Utterances to Investigate Addressee Detection for Spoken Dialogue Systems. Sensors 2020, 20, 2740. https://doi.org/10.3390/s20092740
