Multimodal Conversational Interaction and Interfaces

A special issue of Multimodal Technologies and Interaction (ISSN 2414-4088).

Deadline for manuscript submissions: closed (30 April 2019) | Viewed by 22177

Special Issue Editors


Prof. Yukiko I. Nakano
Guest Editor
Department of Computer and Information Science, Faculty of Science and Technology, Seikei University, 3-3-1 Kichijoji-Kitamachi, Musashino-shi, Tokyo 180-8633, Japan
Interests: multimodal/multiparty interaction; intelligent user interfaces; conversational agents/robots; dialogue systems

Prof. Toyoaki Nishida
Guest Editor
Kyoto University, Kyoto 606-8501, Japan
Interests: conversational informatics; multimodal interaction; common ground; human-AI communication

Assoc. Prof. Gabriel Murray
Guest Editor
School of Computing, University of the Fraser Valley, Abbotsford, BC V2S 7M8, Canada
Interests: natural language processing; speech processing; small group interaction; AI for health and wellness

Dr. Catharine Oertel
Guest Editor
EPFL (École Polytechnique Fédérale de Lausanne), Lausanne, Switzerland
Interests: group interaction; social signal processing; social robot; educational technologies

Special Issue Information

Dear Colleagues,

In face-to-face interaction, conversation participants exchange multiple communicative behaviors, including verbal information and nonverbal signals such as gestures, facial expressions, and gaze. Participants also interpret the combination and co-occurrence of verbal and nonverbal behaviors to understand the conversation. In multiparty communication, where more than two people take part in the conversation, the patterns of multimodal information become even more complex. Aiming to shed light on this complex process of face-to-face communication, studies on multimodal interaction have employed a variety of machine learning techniques. From an application point of view, implementing aspects of multimodal interaction is indispensable for enhancing human-agent/robot communication and for supporting human-human communication in computer-mediated environments.

The purpose of this Special Issue is to solicit contributions from both theoretical and practical perspectives and to envision future directions for research on multimodal interaction and its application to multimodal conversational interfaces. We encourage authors to submit original research articles on topics including, but not limited to, the following:

  • Theoretical and computational models that shed light on the process and the characteristics of multimodal interaction
  • New data-driven methodologies for investigating large-scale multimodal interaction data
  • Virtual agents and humanoid robots with multimodal and/or multiparty conversational functionality
  • Communication support systems that facilitate multimodal and/or multiparty conversation in computer-mediated communication
  • Tools and platforms that contribute to research on multimodal interaction and building novel multimodal conversational interfaces

Prof. Yukiko I. Nakano
Prof. Toyoaki Nishida
Assoc. Prof. Gabriel Murray
Dr. Catharine Oertel
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Multimodal Technologies and Interaction is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss francs). Submitted papers should be well formatted and written in good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Verbal and nonverbal information
  • Multiparty interaction
  • Computational and statistical models
  • Conversational virtual agents
  • Communication robots
  • Multimodal interfaces for human-human communication
  • Social signal processing

Published Papers (6 papers)


Research

23 pages, 2483 KiB  
Article
Generation of Head Movements of a Robot Using Multimodal Features of Peer Participants in Group Discussion Conversation
by Hung-Hsuan Huang, Seiya Kimura, Kazuhiro Kuwabara and Toyoaki Nishida
Multimodal Technol. Interact. 2020, 4(2), 15; https://doi.org/10.3390/mti4020015 - 29 Apr 2020
Cited by 3 | Viewed by 2829
Abstract
In recent years, companies have been seeking communication skills in their employees. Increasingly, companies have adopted group discussions during their recruitment process to evaluate applicants’ communication skills. However, the opportunity to improve communication skills in group discussions is limited because of the lack of partners. To address this issue as a long-term goal, the aim of this study is to build an autonomous robot that can participate in group discussions, so that its users can repeatedly practice with it. This robot therefore has to perform humanlike behaviors with which the users can interact. In this study, the focus was on the generation of two of these behaviors regarding the head of the robot. One is directing its attention to one of the following targets: the other participants or the materials placed on the table. The second is determining the timing of the robot’s nods. These generation models are considered in three situations: when the robot is speaking, when the robot is listening, and when no participant, including the robot, is speaking. The research question is whether these behaviors can be generated end-to-end from, and only from, the features of the peer participants. This work is based on a data corpus containing 2.5 h of discussion sessions of 10 four-person groups. Multimodal features extracted from the corpus, including the attention of the other participants, voice prosody, head movements, and speech turns, were used to train support vector machine models for the generation of the two behaviors. The performance of the attentional-focus models was in an F-measure range between 0.4 and 0.6, and the nodding model had an accuracy of approximately 0.65; both experiments used leave-one-subject-out cross validation. To measure the perceived naturalness of the generated behaviors, a subject experiment was conducted in which the proposed data-driven models were compared with two baselines: (1) a simple statistical model based on behavior frequency and (2) raw experimental data. The evaluation was based on the observation of video clips in which one of the subjects was replaced by a robot performing head movements in the above-mentioned three conditions. The experimental results showed no significant difference from the original human behaviors in the data corpus, supporting the effectiveness of the proposed models. Full article
(This article belongs to the Special Issue Multimodal Conversational Interaction and Interfaces)
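To make the evaluation setup described in this abstract concrete, the following is a minimal, hypothetical sketch (not the authors' code): a support vector machine trained on stand-in multimodal features of peer participants and evaluated with leave-one-subject-out cross validation and F-measure. All feature names and data shapes are assumptions for illustration only.

```python
# Minimal sketch (assumed, not the paper's implementation): SVM over multimodal
# features with leave-one-subject-out cross validation, scored by F-measure.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import LeaveOneGroupOut, cross_val_predict
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)

# Placeholder data: rows are time frames; columns stand in for features such as
# peers' attention targets, voice prosody, head movements, and speech-turn state.
X = rng.normal(size=(1000, 12))
y = rng.integers(0, 2, size=1000)          # e.g. nod vs. no-nod at each frame
subjects = rng.integers(0, 10, size=1000)  # which subject each frame comes from

# Leave-one-subject-out: every fold holds out all frames of one subject.
logo = LeaveOneGroupOut()
clf = SVC(kernel="rbf", class_weight="balanced")
pred = cross_val_predict(clf, X, y, cv=logo, groups=subjects)
print("F-measure:", f1_score(y, pred))
```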

24 pages, 1532 KiB  
Article
Prediction of Who Will Be Next Speaker and When Using Mouth-Opening Pattern in Multi-Party Conversation
by Ryo Ishii, Kazuhiro Otsuka, Shiro Kumano, Ryuichiro Higashinaka and Junji Tomita
Multimodal Technol. Interact. 2019, 3(4), 70; https://doi.org/10.3390/mti3040070 - 26 Oct 2019
Cited by 13 | Viewed by 3594
Abstract
We investigated the mouth-opening transition pattern (MOTP), which represents the change in mouth-opening degree toward the end of an utterance, and used it to predict the next speaker and the utterance interval between the start time of the next speaker’s utterance and the end time of the current speaker’s utterance in a multi-party conversation. We first collected verbal and nonverbal data, including speech and the manually annotated degree of mouth opening (closed, narrow-open, wide-open) of participants, in four-person conversations. A key finding of the MOTP analysis is that the current speaker often keeps her mouth narrow-open during turn-keeping, and starts to close it after opening it narrowly, or continues to open it widely, during turn-changing. The next speaker often starts to open her mouth narrowly after closing it during turn-changing. Moreover, when the current speaker starts to close her mouth after opening it narrowly in turn-keeping, the utterance interval tends to be short. In contrast, when the current speaker and the listeners open their mouths narrowly after opening them narrowly and then widely, the utterance interval tends to be long. On the basis of these results, we implemented prediction models of the next speaker and utterance interval using MOTPs. As a multimodal-feature fusion, we also implemented models that add eye-gaze behavior, one of the most useful sources of information for predicting the next speaker and utterance interval according to our previous study, to the MOTPs. The evaluation of the models suggests that the MOTPs of the current speaker and listeners are effective for predicting the next speaker and utterance interval in multi-party conversation. Our multimodal-feature fusion model using MOTPs and eye-gaze behavior is more useful for predicting the next speaker and utterance interval than using either one alone. Full article
(This article belongs to the Special Issue Multimodal Conversational Interaction and Interfaces)
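As an illustration of the idea behind a mouth-opening transition pattern, the following hypothetical sketch (not the paper's implementation) collapses a per-frame mouth-opening sequence near the end of an utterance into a transition string that a next-speaker classifier could consume. The frame values and the `transition_pattern` helper are assumptions for illustration.

```python
# Hypothetical sketch: turning a mouth-opening sequence into a transition pattern.
# "C" = closed, "N" = narrow-open, "W" = wide-open (per video frame).
frames = ["N", "N", "W", "W", "N", "C", "C"]

def transition_pattern(frames):
    """Collapse consecutive repeats into a transition pattern, e.g. N-W-N-C."""
    pattern = [frames[0]]
    for state in frames[1:]:
        if state != pattern[-1]:
            pattern.append(state)
    return "-".join(pattern)

motp = transition_pattern(frames)
print(motp)  # "N-W-N-C": narrow, then wide, then narrow, then closed

# Such patterns (one per participant) could then be one-hot encoded and combined
# with gaze features for next-speaker / utterance-interval prediction.
```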

10 pages, 600 KiB  
Article
Graph-Based Prediction of Meeting Participation
by Gabriel Murray
Multimodal Technol. Interact. 2019, 3(3), 54; https://doi.org/10.3390/mti3030054 - 12 Jul 2019
Viewed by 3083
Abstract
Given a meeting participant’s turn-taking dynamics during one segment of a meeting, and their contribution to the group discussion up to that point, our aim is to automatically predict their activity level at a later point of the meeting. The predictive models use verbal and nonverbal features derived from social network representations of each small group interaction. The best automatic prediction models consistently outperform two baseline models at multiple time-lags. We analyze which interaction features are most predictive of later meeting activity levels, and investigate the efficacy of the verbal vs. nonverbal feature classes for this prediction task. At long time-lags, linguistic features become more crucial, but performance degrades compared with prediction at short time-lags. Full article
(This article belongs to the Special Issue Multimodal Conversational Interaction and Interfaces)
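To give a flavor of the graph-based features mentioned in this abstract, here is a hedged sketch (an assumption, not the paper's code): a weighted directed graph of who speaks after whom, a centrality score per participant, and a simple regressor predicting later activity. The turn sequence, target values, and model choice are placeholders.

```python
# Hedged sketch: turn-taking graph features feeding a simple activity regressor.
import networkx as nx
from sklearn.linear_model import LinearRegression

# Hypothetical turn sequence for one meeting segment (speaker labels).
turns = ["A", "B", "A", "C", "B", "B", "A", "D", "C", "A"]

# Nodes are participants; weighted edges count who speaks after whom.
G = nx.DiGraph()
for prev, nxt in zip(turns, turns[1:]):
    if G.has_edge(prev, nxt):
        G[prev][nxt]["weight"] += 1
    else:
        G.add_edge(prev, nxt, weight=1)

# One nonverbal-style feature per participant: out-degree centrality.
centrality = nx.out_degree_centrality(G)
X = [[centrality[p]] for p in sorted(G.nodes)]

# Placeholder target: each participant's word count in a later segment.
y = [120, 340, 90, 15]
model = LinearRegression().fit(X, y)
print(dict(zip(sorted(G.nodes), model.predict(X))))
```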

37 pages, 5883 KiB  
Article
Exploring Methods for Predicting Important Utterances Contributing to Meeting Summarization
by Fumio Nihei and Yukiko I. Nakano
Multimodal Technol. Interact. 2019, 3(3), 50; https://doi.org/10.3390/mti3030050 - 6 Jul 2019
Cited by 7 | Viewed by 4100
Abstract
Meeting minutes are useful, but creating meeting summaries is a time-consuming task. Aiming to support this task, this paper proposes prediction models for important utterances that should be included in the meeting summary, using multimodal and multiparty features. We tackle this issue with two approaches: handcrafted feature models and deep neural network models. The best handcrafted feature model achieved 0.707 in F-measure, and the best deep-learning-based verbal and nonverbal model (V-NV model) achieved 0.827 in F-measure. Based on the V-NV model, we implemented a meeting browser and conducted a user study. The results showed that the proposed meeting browser contributes better to the understanding of the content of the discussion and the participant roles in the discussion than the conventional text-based browser. Full article
(This article belongs to the Special Issue Multimodal Conversational Interaction and Interfaces)
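The following is a minimal, assumed sketch of the general pattern of fusing verbal and nonverbal utterance features in a small neural scorer; it is not the authors' V-NV architecture, and the layer sizes, feature dimensions, and class name are invented for illustration.

```python
# Assumed sketch: a small network that fuses verbal and nonverbal utterance
# features and outputs a logit for "include this utterance in the summary".
import torch
import torch.nn as nn

class VerbalNonverbalScorer(nn.Module):
    def __init__(self, verbal_dim=300, nonverbal_dim=20, hidden=64):
        super().__init__()
        self.verbal = nn.Sequential(nn.Linear(verbal_dim, hidden), nn.ReLU())
        self.nonverbal = nn.Sequential(nn.Linear(nonverbal_dim, hidden), nn.ReLU())
        self.head = nn.Linear(hidden * 2, 1)   # logit: important vs. not

    def forward(self, verbal_feats, nonverbal_feats):
        fused = torch.cat([self.verbal(verbal_feats),
                           self.nonverbal(nonverbal_feats)], dim=-1)
        return self.head(fused).squeeze(-1)

# Dummy batch: e.g. averaged word embeddings plus prosody/gaze statistics per utterance.
model = VerbalNonverbalScorer()
logits = model(torch.randn(8, 300), torch.randn(8, 20))
loss = nn.BCEWithLogitsLoss()(logits, torch.randint(0, 2, (8,)).float())
print(loss.item())
```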

17 pages, 471 KiB  
Article
Observing Collaboration in Small-Group Interaction
by Maria Koutsombogera and Carl Vogel
Multimodal Technol. Interact. 2019, 3(3), 45; https://doi.org/10.3390/mti3030045 - 28 Jun 2019
Cited by 8 | Viewed by 4018
Abstract
In this study, we define and test measures that capture aspects of collaboration in interaction within groups of three participants performing a task. The measures are constructed upon turn-taking and lexical features from a corpus of triadic task-based interactions, as well as upon demographic features and personality, dominance, and satisfaction assessments related to the corpus participants. These quantities were tested for significant effects and for correlations with each other. The findings indicate that determinants of collaboration lie in measures quantifying the differences among dialogue participants in the conversational mechanisms they employ, such as the number and frequency of words contributed, lexical repetitions, and conversational dominance, and in psychological and sentiment variables, i.e., the participants’ personality traits and expression of satisfaction. Full article
(This article belongs to the Special Issue Multimodal Conversational Interaction and Interfaces)
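As a toy illustration of the kind of quantity and correlation analysis this abstract describes, the sketch below (assumptions only, not the study's actual measures or data) computes how unevenly words are distributed within each triad and correlates that imbalance with a satisfaction score across sessions.

```python
# Illustrative sketch: contribution imbalance per triad vs. satisfaction rating.
import numpy as np
from scipy.stats import spearmanr

# Hypothetical data: word counts per participant for five triadic sessions,
# plus one group-level satisfaction score per session.
word_counts = np.array([
    [420, 380, 400],
    [700, 150, 120],
    [300, 310, 290],
    [500, 450, 100],
    [260, 240, 255],
])
satisfaction = np.array([4.5, 2.5, 4.8, 3.0, 4.6])

# Dispersion of contribution within each group (higher = less balanced talk).
imbalance = word_counts.std(axis=1) / word_counts.mean(axis=1)

rho, p = spearmanr(imbalance, satisfaction)
print(f"Spearman rho={rho:.2f}, p={p:.3f}")
```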

11 pages, 366 KiB  
Article
Websites with Multimedia Content: A Heuristic Evaluation of the Medical/Anatomical Museums
by Matina Kiourexidou, Nikos Antonopoulos, Eugenia Kiourexidou, Maria Piagkou, Rigas Kotsakis and Konstantinos Natsis
Multimodal Technol. Interact. 2019, 3(2), 42; https://doi.org/10.3390/mti3020042 - 12 Jun 2019
Cited by 8 | Viewed by 3651
Abstract
The internet and web technologies have radically changed the way users interact with museum exhibits. Websites and their related services play an important role in accessibility and interaction with the multimedia content of museums. The aim of the current research is to present a heuristic evaluation, by usability experts, of forty-seven medical and anatomy museum websites, in order to determine the key characteristics and issues for the effective design of a museum website. For homogeneity and comparison purposes, the websites of museums without English-language support were not included in the evaluation process. The methodology was structured around assessing the technologies and services of anatomy museum websites. The results of the statistical examination are subsequently analyzed and discussed. Full article
(This article belongs to the Special Issue Multimodal Conversational Interaction and Interfaces)
