Article
Peer-Review Record

Semantic Fusion for Natural Multimodal Interfaces using Concurrent Augmented Transition Networks

Multimodal Technol. Interact. 2018, 2(4), 81; https://doi.org/10.3390/mti2040081
by Chris Zimmerer, Martin Fischbach and Marc Erich Latoschik *
Reviewer 1:
Reviewer 2: Anonymous
Submission received: 14 October 2018 / Revised: 16 November 2018 / Accepted: 4 December 2018 / Published: 6 December 2018
(This article belongs to the Special Issue Multimodal User Interfaces Modelling and Development)

Round 1

Reviewer 1 Report

This paper presents an extension to an existing approach for the semantic-level fusion of multimodal inputs for interactive systems. The paper starts with a rather complete state of the art, before describing the requirements analysis done for the approach on the basis of a use case. The approach itself is then presented, and once again illustrated with the help of a use case. Some further examples of projects realised with the cATN approach are presented. The article closes with a discussion and a conclusion.

Overall, the work presented in this article is quite interesting and sound. Work on semantic fusion for multimodal interactive systems has become relatively sparse these past years, despite the numerous issues still open. As such, an extension of a relatively important past work is a welcome addition to the current state of the art.

The paper is also very clearly written, and well structured. This reviewer spotted only a handful of typos, which are listed at the bottom of this review.  

The state of the art is complete enough, with perhaps the omission of symbolic-statistical fusion approaches such as, e.g.:

- Wu, L., Oviatt, S. L., and Cohen, P. R. From members to teams to committee - a robust approach to gestural and multimodal recognition. IEEE Transactions on Neural Networks, vol. 13, no. 4, 2002, pp. 972–982.

- Chai, J., Hong, P., and Zhou, M. A probabilistic approach to reference resolution in multimodal user interfaces. In Proceedings of the 9th International Conference on Intelligent User Interfaces (IUI 2004), Funchal, Madeira, Portugal, January 2004, pp. 70–77.

- Dumas, B., Signer, B., and Lalanne, D. Fusion in multimodal interactive systems: an HMM-based algorithm for user-induced adaptation. In Proceedings of the 4th ACM SIGCHI Symposium on Engineering Interactive Computing Systems (EICS '12), 2012.

On the approach itself: several of the aspects presented are quite interesting. The direct integration of feedback to the user as part of the design of the fusion method is particularly worth mentioning, as is the ability to take context into account. One question on that topic: can the context be dynamic? E.g., can the approach presented by the authors handle information such as time, the position of the user, or environmental information such as the surrounding light level?
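To make the question concrete, here is a purely hypothetical sketch (in Scala, with invented names; this is not the cATN API) of dynamic context modelled as just another time-stamped source that a fusion step could query when evaluating its conditions:

    // Hypothetical sketch (not the cATN API): dynamic context such as user
    // position and ambient light exposed as a time-stamped source that a
    // fusion step could query when evaluating its conditions.
    case class ContextSample(userPosition: (Double, Double, Double),
                             ambientLightLux: Double,
                             timestamp: Long)

    class ContextProvider(capacity: Int = 256) {
      private val samples = scala.collection.mutable.ArrayDeque.empty[ContextSample]

      def push(sample: ContextSample): Unit = {
        samples += sample
        if (samples.size > capacity) samples.removeHead()
      }

      // Latest sample at or before the given time, e.g. the timestamp of a pointing gesture.
      def at(timestamp: Long): Option[ContextSample] =
        samples.reverseIterator.find(_.timestamp <= timestamp)
    }

A fusion condition could then, for instance, discount a camera-based pointing hypothesis if the ambient light level around the gesture's timestamp was too low.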

On page 3, lines 89-90, the authors mention that "there is currently no solution on how to fulfill two fundamental semantic fusion requirements (see Table 1): handling probabilistic and chronologically unsorted input". This is true for approaches based on finite-state automata, but approaches such as meaning frames or unification-based approaches have been able to handle such inputs from the start (see, e.g., ref. [30] from the article), albeit with the limitation that they have to "wait" for all input to be processed before integration can start. There is an interesting discussion to be had here, on which the authors could build.
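To make the contrast tangible, here is a purely illustrative sketch (in Scala, which the cATN itself builds on; all names are invented and this is not code from the manuscript) of a meaning-frame/unification-style fusion step: it accepts probabilistic, chronologically unsorted percepts without difficulty, but integration can only happen once every slot has been filled, i.e. it has to "wait" for all input:

    // Illustrative frame/unification-style fusion: percepts may arrive out of
    // order and carry confidences, but integration is deferred until all slots
    // of the frame are filled.
    case class Percept(slot: String, value: String, confidence: Double, timestamp: Long)

    case class Frame(slots: Set[String], filled: Map[String, Percept] = Map.empty) {
      // Accept percepts in any order; keep the highest-confidence candidate per slot.
      def observe(p: Percept): Frame =
        if (!slots.contains(p.slot)) this
        else filled.get(p.slot) match {
          case Some(old) if old.confidence >= p.confidence => this
          case _ => copy(filled = filled + (p.slot -> p))
        }

      // Integration only succeeds once every slot is filled ("wait for all input").
      def integrate: Option[Map[String, String]] =
        if (slots.subsetOf(filled.keySet)) Some(filled.map { case (s, p) => s -> p.value })
        else None
    }

    object FrameFusionSketch extends App {
      val putThatThere = Frame(Set("action", "object", "location"))
      val percepts = Seq(
        Percept("location", "table",  0.8, 30L), // arrives before the object percept
        Percept("action",   "put",    0.9, 10L),
        Percept("object",   "sphere", 0.6, 20L),
        Percept("object",   "cube",   0.7, 20L)  // higher-confidence alternative replaces "sphere"
      )
      println(percepts.foldLeft(putThatThere)(_ observe _).integrate)
      // Integration succeeds only after all three slots are filled; "cube" wins by confidence.
    }

Transition-network-based approaches, by contrast, can act as soon as a partial parse is available, which is where the trade-off discussed above lies.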

The authors mention that they plan to assess how well their approach works performance-wise. This is actually a point that this reviewer would have liked to see developed a bit further. In the authors' experience so far, how well does the approach cope with high-frequency data sources, such as the touch surface pictured in the middle of Figure 1? If both users place their full hands on the surface at the same time, what happens?
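For illustration, a purely hypothetical sketch (invented names, not the authors' implementation) of one way such a burst could be handled before it reaches the semantic fusion stage, namely by coalescing raw touch points into per-hand percepts:

    // Hypothetical pre-fusion coalescing: a burst of raw touch points from a
    // high-frequency surface is clustered into one percept per hand, so the
    // semantic fusion stage sees a few hand-level events instead of dozens of
    // individual contact points per frame.
    case class TouchPoint(x: Double, y: Double, timestamp: Long)
    case class HandPercept(centroidX: Double, centroidY: Double, contacts: Int, timestamp: Long)

    def coalesce(points: Seq[TouchPoint], maxHandSpan: Double = 0.15): Seq[HandPercept] = {
      // Greedy proximity clustering; a real tracker would use contact IDs or a better model.
      val clusters = points.foldLeft(List.empty[List[TouchPoint]]) { (acc, p) =>
        acc.indexWhere(c => math.hypot(c.head.x - p.x, c.head.y - p.y) <= maxHandSpan) match {
          case -1 => List(p) :: acc
          case i  => acc.updated(i, p :: acc(i))
        }
      }
      clusters.map { c =>
        HandPercept(
          c.map(_.x).sum / c.size,
          c.map(_.y).sum / c.size,
          contacts = c.size,
          timestamp = c.map(_.timestamp).max
        )
      }
    }

Whether, where, and at what cost the proposed system performs this kind of reduction is exactly the sort of detail this reviewer would like to see discussed.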

The main weakness of the article revolves around the lack of validation of the approach. Evaluation is mentioned in the future work at the end of the article, but mostly with respect to performance aspects. Beyond performance, a core question is the usability (understandability, expressivity) of the cATN approach and language with respect to developers. The authors are encouraged to explore this side of their work in further publications. This would also build on the (very good) argument the authors make on page 2, line 75, about typical interaction design techniques. Also on the topic of the understandability and expressivity of the proposed language, the fact that it relies on Scala is definitely a positive aspect (avoiding the need for developers to learn a brand new language). However, it seems to this reviewer that the proposed approach would be well suited to the creation of a visual tool for easier definition of cATNs. These last points are more general suggestions and do not have to be addressed in a revision of this article.

Misc. typos: 

- p. 6, line 176: word "fusion" repeated

- p. 9, line 304: method -> methods

- p. 13, lines 434 & 437: Cursor -> Cursors

- p. 17, line 582: to integrated -> to integrate


Author Response

Dear editor,
dear reviewers, 

We would like to take this chance to thank you all very much for the valuable feedback, and we appreciate that you found the time to evaluate our manuscript. We took your advice to heart to improve, refine, and polish our work and the manuscript. With this reply we would like to address your comments and point out how we have reflected them in the revised version, which we hope will now meet with your approval.

best

Author Response File: Author Response.pdf

Reviewer 2 Report

A concurrent Augmented Transition Network (cATN) is proposed to support the rapid prototyping of multimodal interfaces. While this paper is interesting and useful, the manuscript itself requires a major rewrite.


1. The methodology needs to be clearly specified, in particular how the research was conducted and why.

(1). The authors should explain how the confidence values are determined.

(2). The authors should explain how the value of n for the n-best guesses is determined.

2. Practical experimental results should be given and discussed in detail.

(1). The authors should give evaluation criteria for assessing the proposed method.

(2). The authors should give practical experimental results to evaluate the proposed method.

(3). The proposed method should be compared with other methods.

3. The literature review needs to be extended to discuss more of the referenced work and how it supports the design of the method. The authors should cite more refereed international journal articles published within the last three years, and the advantages and limitations of these studies should be discussed.

4. The introduction section needs to be rewritten with much better motivation, providing the context for this work. It should include:

(1). Contextualization

(2). Importance/Relevance of the Theme

(3). Research Question

(4). Objectives

(5). Structure of the Paper

5. In the last section, please focus on “Discussion, Implications, and Conclusions” and include:

(1). Discussion, Implications, and Conclusions

(2). A discussion of why the authors obtained these results and how they comply (or not) with the Literature Review

(3). Conclusions

(4). Managerial and Academic Implications

(5). Limitations of the paper

(6). Future Studies and Recommendations

6. The English of this paper should be polished and revised carefully. From the reviewer's point of view, the work should be written in a more objective and professional manner; phrasing such as “we identify” should be avoided.

7. The full names of abbreviations should be given:

(1). “SDKs” in Line 133

(2). “SGIM” in Line 225


Author Response

Dear editor,
dear reviewers, 

We would like to take this chance to thank you all very much for the valuable feedback, and we appreciate that you found the time to evaluate our manuscript. We took your advice to heart to improve, refine, and polish our work and the manuscript. With this reply we would like to address your comments and point out how we have reflected them in the revised version, which we hope will now meet with your approval.

best

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

The authors have addressed my review comments in the revised manuscript.
