Article
Peer-Review Record

The Influence of Disclosing the AI Potential Error to the User on the Efficiency of User–AI Collaboration

Appl. Sci. 2023, 13(6), 3572; https://doi.org/10.3390/app13063572
by Olga Lukashova-Sanz 1,2,*, Martin Dechant 1,2 and Siegfried Wahl 1,2
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 24 January 2023 / Revised: 17 February 2023 / Accepted: 3 March 2023 / Published: 10 March 2023
(This article belongs to the Special Issue Virtual Reality Technology and Applications)

Round 1

Reviewer 1 Report

In general, basing statistics on only 15 participants is unreliable and may lead to false conclusions. There is no summary table showing a comprehensive summary of the results. It is difficult to understand the features/numbers from a calculation perspective, for example, the results of the scoring system or the input/output of the two-way ANOVA. In addition, it should be noted that the accuracy of the model output can be reported with a confidence level that determines the level of decision confidence in the model (this approach is not mentioned in the paper). In general, from my perspective, the article is too general and the study is not well planned in terms of statistics to carry a large scientific aspect. I opt to reject this paper.

Author Response

We sincerely thank the reviewer for the comments on the manuscript as well as the suggestions for improvement. In the following, we address the comments one by one, where the reviewer's comments are denoted by "Reviewer" and the authors' responses by "Authors". All references to specific lines correspond to the updated version of the manuscript.

Reviewer:

In general, basing statistics on only 15 participants is unreliable and may lead to false conclusions. There is no summary table showing a comprehensive summary of the results. It is difficult to understand the features/numbers from a calculation perspective, for example, the results of the scoring system or the input/output of the two-way ANOVA. In addition, it should be noted that the accuracy of the model output can be reported with a confidence level that determines the level of decision confidence in the model (this approach is not mentioned in the paper). In general, from my perspective, the article is too general and the study is not well planned in terms of statistics to carry a large scientific aspect. I opt to reject this paper.

Authors

As suggested by the reviewer, in addition to the description in the main text in the Analysis and Results sections, we added ANOVA summary tables for each of the relevant dependent variables, which can be found in Tables 1-3. We hope this makes the results more comprehensible.

In line with the previous comment, we furthermore highlighted the limitation regarding the number of participants in the Discussion section (lines 618-626).

Reviewer 2 Report

This paper conducted a VR study in which a simulated AI predicted the user's intended action in a selection task. The experimental results show that conveying the potential AI error to the user enables the user to anticipate when it is more practical to actively participate in the final decision or to delegate it to the AI, and also ensures a faster decision regardless of the risk level.

The writing of this paper is clear, and the experiment design and results are well explained. We understand that such human-involved experiments are highly time-consuming. This paper sets a good example of setting up such an experiment and performing a valid analysis.

The following questions are not quite clear:

1. What is the inspiration for such a test flow design? Is there any reference?

2. Why is it necessary for this paper to implement the test in a VR environment instead of using a PC-based test, in which the participant can directly make decisions using a mouse and keyboard? What are the benefits of using a VR environment?

3. How long does it take a participant to complete the test? What are the main challenges in having more participants in the test? One interesting direction for future work could be recruiting more participants across different categories to allow further analysis by age, gender, work experience, personality, disability status, etc. One possible solution might be designing such an experiment, publishing it as an online game, and sharing it on social media. A reward could even be set once a particular goal is reached.

Author Response

We sincerely thank the reviewer for the comments on the manuscript as well as the suggestions for improvement. In the following, we address the comments one by one, where the reviewer's comments are denoted by "Reviewer" and the authors' responses by "Authors". All references to specific lines correspond to the updated version of the manuscript.

 

Reviewer

This paper conducted a VR study in which a simulated AI predicted the user's intended action in a selection task. The experimental results show that conveying the potential AI error to the user enables the user to anticipate when it is more practical to actively participate in the final decision or to delegate it to the AI, and also ensures a faster decision regardless of the risk level.

The writing of this paper is clear, and the experiment design and results are well explained. We understand that such human-involved experiments are highly time-consuming. This paper sets a good example of setting up such an experiment and performing a valid analysis.

The following questions are not quite clear:

  1. What is the inspiration for such a test flow design? Is there any reference?

Authors:

As mentioned in the Introduction section, the present study was developed within the framework of a larger project on AI-supported assistive technology. Existing research on communicating AI error to the user prior to system deployment focuses on rather remote scenarios that are not applicable to the intended context. For example, when it comes to user-AI collaboration, in prior work the typical user performance is worse than or similar to that of the AI model (e.g., predicting a person's income based on that person's demographic parameters). In contrast, in the case of a grasping task, the user knows his or her intention with the highest certainty, which in the present study is the intended-to-be-grasped object. Thus, the idea in the present work was to implement a paradigm representative of such an application, and that was the main motivation for the experimental paradigm. We comment on these aspects in the Introduction (lines 75-88).

Reviewer

  2. Why is it necessary for this paper to implement the test in a VR environment instead of using a PC-based test, in which the participant can directly make decisions using a mouse and keyboard? What are the benefits of using a VR environment?

Authors

In combination with our response to the first question, the motivation to use VR was to bring the paradigm closer to the potential application scenario in mind, which is a grasping task with AI-based assistance. As described in Methods section 2.2.1, participants were informed that an AI system made its prediction based on data collected from the participant and from the environment. For example, participants assumed that their eye movements were collected by the system and used as one of the input parameters for the AI prediction. To strengthen the participant's immersion in the paradigm, we implemented some interaction with the VR environment using eye movements, specifically, selecting whether to continue or to check the model's decision. Furthermore, VR enables natural head and body movement and a larger field of view compared to a traditional screen-based setup, which in turn increases the realism of the experimental paradigm. Based on the participants' comments, this approach worked well, and participants indeed seemed to assume there was an AI system even though it was in fact only simulated. We discuss the motivation for using VR in the Introduction (lines 75-88).

Reviewer

  3. How long does it take a participant to complete the test? What are the main challenges in having more participants in the test? One interesting direction for future work could be recruiting more participants across different categories to allow further analysis by age, gender, work experience, personality, disability status, etc. One possible solution might be designing such an experiment, publishing it as an online game, and sharing it on social media. A reward could even be set once a particular goal is reached.

Authors

The experiment was performed on two different days, with two sub-sessions on each day. Each sub-session lasted 15-20 min; therefore, the total net data-acquisition time per participant was approximately 2 x 2 x 20 min, i.e., about 80 min. We elaborate on the total number of trials as well as the detailed time frame of the sessions in Methods section 2.6.

As kindly noted by the reviewer, experiments using VR are resource-consuming: participants have to come to the lab in person, and the data collection time is limited so as not to overload the participant and to avoid discomfort due to prolonged headset wear. The present study served as the initial step in implementing a realistic scenario reproducing an AI-assisted grasping task, and a sample size of 15 participants was selected. Even though the findings of the study are fruitful, we acknowledge that a larger number of participants could strengthen the results of the paper; we additionally comment on the limitation of the sample size in the Discussion (lines 618-626).

We furthermore thank the reviewer for the kind suggestions for future studies. We hope that the findings of the present study will open new perspectives on AI error communication in the context of assistive technology and, therefore, support the design of better solutions for those in need.
