Virtual Reality as a Reflection Technique for Public Speaking Training

Video recording is one of the most commonly used techniques for reflection, because video allows people to see what they look like to others and how they could improve their performance. However, it is problematic because some people easily fall into negative emotions and worry about their performance, which lowers the benefit. In this study, the possibility of applying a simple VR-based reflection method was explored. This method uses virtual reality (VR) and a head-mounted display (HMD) to allow presenters to watch their own presentations from the audience’s perspective. It also uses an avatar that hides personal appearance, which has low relevance to the quality of the presentation, to help reduce self-awareness during reflection. An experimental study was carried out, considering four personal characteristics: gender, personal anxiety, personal confidence and self-bias. The goal of this study is to discuss which populations can benefit more from this system and to assess the impact of the avatar and HMD-based VR. According to the results, individuals with low self-confidence in their public speaking skills could benefit more in self-evaluation from VR reflection with an HMD, while individuals with negative self-bias could reduce their anxiety more by using an avatar.


Introduction
The act of speaking in public can trigger fear for speakers of all levels. People with conditions such as autism spectrum disorder (ASD) commonly have a pronounced tendency toward anxiety. However, according to statistics, approximately 77% of the general population are afraid to speak in public, which makes public speaking the most common social fear [1,2]. The fear and tension may compromise the quality of a presentation. For example, the inner state of a presenter has a significant influence on the presenter's cognitive resources, eye movements, body postures, vocalization and hand movements at the time of the presentation. In general, people try to practice repeatedly before the actual presentation in order to control their own inner state and behavior. However, even though the best training method is to rehearse the presentation in a similar environment, the cost of the preparation is generally high. Hence, a simpler method is desirable.
Recently, a number of presentation training programs that use virtual reality (VR) technology have been proposed, in which a realistic situation can be reproduced at a low cost with 3D models in a virtual environment [3,4]. By using a VR system, the users can improve their presentation skills as much as they want without worrying about feedback from others [5,6]. Many of these existing VR-based training systems evaluate a presenter's behavior during training and provide feedback during or after the presentation. It is proven that these systems have certain effects on improving presentation skills. While many of the existing studies apply VR to exposure training that aims to improve public speaking skills while reducing anxiety, this study emphasized self-reflection as an important technique. Video-reflection has been proven to be a useful pedagogical tool for basic training on public speaking. The nonverbal and verbal elements of speaking performances can be recorded for subsequent review and analysis. This technique helps individuals improve their skill acquisition, speech content, objective test performance, and the recall of the actual speech [7]. In addition, this strategy reduces the anxiety and perceived social costs of speakers, and increases the positive appraisals of the performance [8,9]. However, people with public speaking anxieties usually have low self-esteem and little self-confidence, and they are more likely to avoid videotaped feedback because they are unwilling to watch their own performance [10].
To overcome this disadvantage, a simple VR-based reflection system was explored in this study. This reflection system involves VR and a head-mounted display (HMD) to enable presenters to observe their own behavior from a third-person's perspective in the 3D VR environment, which could be reflected by an increase in subjective ratings of observer-like perspectives during remembering [11], and to provide a wider field of view (FOV) to change the spatial awareness compared to video, so that a better self-reflection could be expected. A 3D human model is used as the avatar of the presenter. It presents only the factors for assessing the quality of presentations without disturbing the mood of the presenter in the VR space. This paper presents an experimental study that uses this VR-based reflection system. This experimental study focused on which populations could benefit more from such a system and assessed the impact of the factors (avatar or other parts in HMD-based VR) on the reflection effect.
The paper is organized as follows. In the next section, related work on VR public speaking training systems and the individual differences in training results are presented. In Section 3, the main concepts that lead to the design of the system are described. The system itself is detailed in Section 4, and the experiment is described in Section 5, in which 24 graduate students were trained to observe their own presentations while they were using the system. The results are presented and discussed in Sections 6 and 7.

Related Work
At present, there are systems that use information technologies (IT), such as VR, to enhance people's public speaking skills. For the sake of simplicity and reality, public speaking training that uses VR has been proposed for decades [12], and several companies have introduced practical VR training software to the market in recent years [3,4]. Many recent studies have featured implementations of an automatic evaluation system, where users received assessment results. For example, Fung et al. developed a framework that provides feedback to users by combining the automatic nonverbal assessment in the presentation video with the system and human subjective feedback through crowdsourcing [13]. Schneider et al. propose a tool called Presentation Trainer to provide feedback on nonverbal communication aspects, which identified 131 nonverbal communication practices that affect the quality of a presentation [14], and they also extended it in VR [15]. Chen et al. proposed an algorithm to estimate the public speaking competence rubric (PSCR) [16] score, which involves using body movements, facial expressions and linguistic information [17]. Damian et al. proposed a method for presenting simple icon-based feedback to users who speak in front of an audience while wearing smart glasses [18]. Palmas et al. proposed a VR public speaking training system that is based on the idea of placing a VR character as an audience member, and the VR character can react in the same manner as real audience members [19].
We categorize public speaking training along two axes, information provision timing and information acquisition method, as shown in Figure 1. Information provision timing includes learning the information before, during and after the presentation, while the methods of information acquisition are divided into feedback-based and reflection-based (self-awareness-based) methods. In the former type, a trainee is taught by others or by the system what aspects he or she should improve (and how to improve them) [19]. In contrast, in the latter type, the speakers discover what aspects they should improve (and how to improve them) by themselves based on the provided information. Such reflection-based methods could help with the parts of a speech that have no precise rules and help speakers find a unique style for public speaking. This is also what this study focused on. Reflection is a cognitive process for learning from experiences [20,21]. The most common method of reflection is to record the audio of a presentation and then have the presenter listen to it afterwards. This is known to be effective for reflecting upon the prosodic information, involving the volume, inflection, timing and fluency of the voice as well as the content. Video recording has been proven to be a useful pedagogical tool for public speaking courses, as well as more general training [9]. Both verbal and nonverbal elements of the speaking performance can be recorded for subsequent review and analysis. Through video recording, speakers can recall their actual speeches and improve their skill acquisition methods, speech content, and objective test performance [7]. In addition, it can reduce the speakers' anxieties and perceived social costs, and increase the positive appraisals of the performance [8]. However, according to [10], people with public speaking anxiety are more likely to avoid watching their video recording because they are not willing to confirm their negative performance.
These presenters tend to be more aware of their personal appearance than their performance, which would affect their self-assessment and reflection.
To mitigate the disadvantages, while retaining the advantages, of video reflection, a system which applies VR with a 3D avatar and records a presenter's motion together with an HMD based on the following two ideas was proposed in this study. The first idea is to present only the information that is strongly related to the quality of the presentation while hiding other information from the user, so as not to interfere with the user's mood. The second is to enable users to observe their own presentations from the audience's point of view in a 3D VR space by using the HMD. This might enable the users to evaluate their presentations objectively rather than through self-evaluation only.
Individual differences also affect the results of VR training for public speaking. A series of differences were reported during and after training according to different levels of anxiety. In particular, a significant decrease in anxiety was observed after VR exposure therapy in individuals with high-level public speaking anxiety [22,23]. The personality traits of neuroticism, extraversion and conscientiousness also had an influence on the perceived challenge of training, which caused anxiety [23]. Gender differences were also found to affect the different training strategies: males could reduce anxiety more by watching a virtual avatar performing a successful speech, while females benefited more from imagining themselves giving a successful speech [24]. Following these works, one of the goals of this paper is to clarify which population could benefit more from VR reflection with an avatar.

Main Idea
The aim of this study is to explore a method that uses VR as a mechanism to help with reflection in public speaking training. The requirements for self-reflection in public speaking training include:
R1 Be aware of one's problems when delivering a presentation.
R2 Grasp one's visibility to others without a cognitive bias.
R3 Avoid excessive self-focused attention and negative emotions.
In order to achieve all of these requirements, (A) a unique VR characteristic in which information can be confirmed by placing one's body in a state close to the real world (i.e., in a 3D space), and (B) the IT characteristics that enable the selective display of only the necessary information were used in this study.

Reflections from a Third-Person Perspective
In the real world, people move their body and change the perceived visual information according to the position and direction of their body, head and eye gaze. Proprioception enables them to perceive where they are and what posture they are making, even with their eyes closed. The perceived position of the self and the physical position are usually consistent. However, when a human-shaped 3D avatar is displayed in front of a presenter who is using VR, the avatar's position may differ from his/her perceived position. It is believed that this difference will make the avatar seem like another individual while it is still recognized as a past self [25]. It is generally known that people evaluate another person more accurately than they evaluate themselves [26].
Besides, a wider FOV could change the spatial awareness of information in the periphery of the scene [27], and influence the types of details in the memory, so people could remember more about global information, which is closer to a real observer. For these reasons, it is expected that the same effect will be achieved by applying idea (A) in HMD-based VR.

Appearance Factor
Nonverbal communication is another means of transmitting the message and plays a role in public speaking. Normally, posture, gestures, facial expressions, gaze patterns and acoustic characteristics are considered aspects of measurement for effective communication [14,28]. The direct visualization of one's own appearance in a video involves psychological resistance, as described in the previous section [10]. In this study, the appearance factors of humans were categorized in terms of their relevance to the quality of the presentation and the degree of interference with one's mood, and then the necessary information was visualized. As a basic policy, we tried to exclude information about a person's unique appearance as much as possible, but it is difficult to change one's physical appearance while retaining the features that can be consciously controlled during the presentation (there are a few exceptions for technical reasons, as described below). The elements that were employed in this study are as follows.

Position and Body Posture
If a presenter shifts his or her position unnecessarily during presentation or stands at an inappropriate position in relation to the slides, it will convey to the audience that the presenter is acting unnaturally. Body posture, also referred to as stance, is also an important sign of general mental state. For example, standing straight indicates that the presenter is confident and respectful; however, if the presenter stands too straight, he or she would look stiff and tense, whereas being too relaxed may be perceived as neglect. The actual standing position and 3D posture can be easily evaluated in a 3D VR environment in comparison to a video recording.
The personal body form can also be consciously controlled during the presentation and would contribute a bit to the quality of the presentation. However, if the size of an avatar is far from the actual size, it might make it difficult to judge the appropriateness of body motion. Therefore, in this study, the body size was adjusted in accordance with the actual size (see Section 4.2 for more details).

Face Direction
During a presentation, the speaker should face the audience, instead of looking at the slides, PC, or the floor.

Hand/Arm Movement
Hand movements, which are sometimes unconscious behavior, may influence the audience's evaluation of a presentation. People can recognize emotions from hand movement alone [29], and hand movement has already been considered an expanded cue for automatic emotion recognition [30]. The audience could therefore feel the emotion of the presenter through their hand movements. For example, if a person crosses or stacks his/her arms in front of his/her body, it is an instinct of self-preservation and considered to be inappropriate during a presentation. Touching one's face is not only a sign of self-consciousness but also a sign of deprecated behavior. Conversely, appropriate gesturing during speaking may help convey the speaker's point [13]. Checking hand movements is therefore necessary for reflection on the presentation.

Voice
Voice information is one of the most important elements for public speaking training, and it was directly used in this study. However, some people may not want to listen to their own recorded voice. In such cases, certain modifications can be applied to the original voice, which is effective in alleviating the discomfort.
In this paper, the following factors were not used.

Eye Gaze
Eye contact is an important factor for general communication. It is known that a speaker loses his/her credibility the moment he/she stops looking at his/her audience. However, since the directions of the face and eyes are basically the same, it is believed that the approximate direction of the eyes can be estimated from the direction of the face. Therefore, the direction of the eyes was not used in this study.

Personal Face
It is known that personal facial appearance can disturb the minds of individuals interacting with the person. However, the HMD is required in the VR training system, and facial expression detection is not an easy task. Furthermore, if the avatar looks different but makes the same facial expressions, the user may feel uneasy. The personal facial appearance excluding facial expression was not considered in this study because it is impossible to consciously control this factor during a presentation, and it may have no significant contribution to the quality of the presentation.

Clothing
Clothing was not taken into account in this study because of its limited contribution to the quality of the presentation. However, clothing is an important cue for self-recognition and it is known to be less inhibitory to the mind. Therefore, clothing might be used as a control parameter in future investigations.

System Implementation
The general architecture of the experimental system is shown in Figure 2. This system was proposed in [31]. It has two working modes: presentation mode and playback mode.
When it works in the presentation mode, the users can practice their presentation in a virtual environment in a manner that is similar to the real environment. During the presentation, the system records the user's behaviors, including speech and body movement. After processing the user's behaviors, the system creates a sequential motion with a virtual avatar. When the system works in the playback mode, users can watch the VR presentation recording using the HMD or a normal PC monitor.

Hardware
The system consists of a PC, an HMD (HTC VIVE Pro Eye (Dual OLED, resolution: 1440 × 1600 pixel for each eye)) with two base stations for tracking, a depth camera that measures body information (Microsoft Azure Kinect), a stereo microphone (Panasonic RP-HX350) and the VIVE controller for the slide control. All system components run in Unity on a single PC (CPU: Core i7-5930K, 3.50 GHz, RAM: 16.0 GB, OS: Windows 10 Enterprise). The setting location of each component is as shown in Figure 3.

Presentation Mode
Similar to the studies that were previously described, the proposed system allows presenters to deliver presentations in a VR environment (note that this function was not used in the experiment described in Section 5, except for the recording function). A VR environment was constructed to imitate a small meeting room in an office. In the virtual environment, the presenter stood in front of a projector screen in a 4 × 6 m² conference room, walking around every now and then. Three virtual audience members were seated in front of the presenter, nodding and taking notes on their computers from time to time. The screen showed the presentation slides, and the presenter can use the VIVE controller in his/her hand to advance to the next slide or go back to the previous slide. In this environment, the presenter can repeatedly practice their presentations. The information specified in Section 4.3 was recorded during the presentation.

Playback Mode
This mode is the main function of this study. The user wearing the HMD can observe, from the observer's perspective, the following information that is recorded in the presentation mode. An example of the user's viewpoint in this mode is shown on the right of Figure 4. The user takes a seat as an audience member, and the viewpoint changes with their head movement. An avatar was prepared for each gender, and the avatar with the same gender as the participant was used (Figure 5). The height of the avatars was adjusted to match the participant's height, measured using the Kinect in the presentation phase. The information recorded, the recording method in the presentation mode, and how the recorded information is used in the playback mode are described in the following sections.
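The height adjustment mentioned above can be sketched as a uniform scale factor. This is only an illustrative sketch: the function name and the assumption that the avatar is scaled uniformly (rather than per limb) are not taken from the system description.

```python
def avatar_scale(participant_height_m: float, avatar_height_m: float) -> float:
    """Uniform scale factor that makes the avatar as tall as the participant.

    Hypothetical helper: assumes the avatar model has a known default height
    and that a single uniform scale is applied to the whole model.
    """
    if participant_height_m <= 0 or avatar_height_m <= 0:
        raise ValueError("heights must be positive")
    return participant_height_m / avatar_height_m


# Example: a 1.62 m participant and an avatar model that is 1.80 m by default.
scale = avatar_scale(1.62, 1.80)
```

A uniform scale keeps the avatar's proportions unchanged, which fits the stated aim of matching overall body size while hiding other personal appearance.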

Position and Posture
The posture was represented as the positions of 32 joints of the human body, which were recorded by Azure Kinect as 3D positions and postures at 30 frames per second. The 3D position of the spine was used as the human location in the environment.
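As a rough illustration of the recorded motion data, the sketch below stores one frame as 32 joint positions and looks up the frame for a given playback time. The class, the function and the spine joint index are hypothetical names chosen for this example; joint orientations are omitted for brevity, and the actual storage format of the system is not described in the text.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]

NUM_JOINTS = 32   # joints per body, as stated in the text
FRAME_RATE = 30   # frames per second, as stated in the text
SPINE_INDEX = 1   # hypothetical index of the spine joint used as the location


@dataclass
class PoseFrame:
    timestamp: float     # seconds since the start of the recording
    joints: List[Vec3]   # one 3D position per joint

    def location(self) -> Vec3:
        """Position of the presenter in the room, taken from the spine joint."""
        return self.joints[SPINE_INDEX]


def frame_at(frames: List[PoseFrame], t: float) -> PoseFrame:
    """Return the recorded frame nearest to playback time t (clamped to the recording)."""
    if not frames:
        raise ValueError("no frames recorded")
    index = round(t * FRAME_RATE)
    index = max(0, min(index, len(frames) - 1))
    return frames[index]
```

Indexing by time assumes the frames were captured at a constant rate with none dropped; a real recording would more safely search by timestamp.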

Hand/Arm Movement
The arm postures were also recorded using Azure Kinect. The arm contains four joint points in each posture. Owing to accuracy issues, the hand states were defined and recorded as either open or closed.

Voice
The voice was saved in a stereo form since it was the user's original voice. By combining the voice with the 3D standing position, a stereo sound that was generated from the spatially correct sound source position was presented to the user.

Slide Switching Timing
The slide switch timing was also recorded so that the slides can be viewed during the playback.
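One way to replay the slides from the recorded switch timing is to store each switch as a (time, slide number) event and look up the most recent event at the current playback time. This is a sketch under that assumption; the event format and the function name are illustrative and not taken from the system.

```python
from bisect import bisect_right
from typing import List, Tuple


def slide_at(events: List[Tuple[float, int]], t: float, initial_slide: int = 1) -> int:
    """Return the slide shown at playback time t.

    events: (time in seconds, slide number shown from that time on), sorted by time.
    Storing the slide number, not just the time, also covers going back a slide.
    """
    times = [time for time, _ in events]
    i = bisect_right(times, t)  # number of switches that happened at or before t
    return initial_slide if i == 0 else events[i - 1][1]


# Example: forward to slide 2 and 3, then back to slide 2.
events = [(12.0, 2), (30.5, 3), (41.0, 2)]
current = slide_at(events, 35.0)
```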

Method
The focus of the experiment was put on the effectiveness of the avatar and HMD-based VR in VR reflection as well as individual differences.
In the experiment, the participants were first asked to make a presentation in front of the evaluators using a set of slides on a predefined topic. During the presentation, the system recorded the information accordingly. Then, the participants were required to review their presentations in each condition and try to improve the points that needed attention. After that, the participants were asked to make a second presentation in front of the evaluators. The reflection method was evaluated on the basis of changes in self-reported anxiety level and self-evaluation before and after the playback, as well as based on the changes in self-evaluation or the experimenter's evaluation between the two presentations.

Participants
For this investigation, 24 participants were recruited by e-mail at a local university. All were graduate students in information science and materials science and non-native English speakers (13 males and 11 females, aged 22 to 42 years, M = 27.04, SD = 4.30). The Personal Report of Confidence as a Speaker (PRCS) score was used to assess the participants' fear of public speaking before the experiment. Then the participants were divided into three groups based on the PRCS scores to ensure an equal distribution of self-confidence in public speaking.
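The text does not specify how the PRCS-balanced groups were formed. One common approach, shown here purely as an assumed illustration, is matched-block randomization: rank participants by PRCS score, cut the ranking into blocks of three, and randomly assign the three conditions within each block. The function name, the participant identifiers and the seed are hypothetical.

```python
import random
from typing import Dict


def assign_conditions(prcs: Dict[str, float], n_conditions: int = 3, seed: int = 0) -> Dict[str, int]:
    """Assign each participant to a condition (1..n_conditions), balanced on PRCS.

    Assumed procedure, not the authors' documented one: participants with
    similar PRCS scores are spread across different conditions.
    """
    rng = random.Random(seed)
    ranked = sorted(prcs, key=lambda pid: prcs[pid])  # lowest to highest PRCS
    assignment: Dict[str, int] = {}
    for start in range(0, len(ranked), n_conditions):
        block = ranked[start:start + n_conditions]
        conditions = list(range(1, n_conditions + 1))
        rng.shuffle(conditions)  # random order of conditions within the block
        for pid, condition in zip(block, conditions):
            assignment[pid] = condition
    return assignment
```

With 24 participants and three conditions, this yields eight participants per condition, each group covering the full range of PRCS scores.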
Each participant was paid an honorarium of 2000 JPY after two hours of performing the experiment. In addition, three evaluators were hired to assess the quality of the participants' presentations (graduate students in information science, who were not presentation experts, one male, two females).

Experimental Condition
To evaluate the impact of the avatar and other parts of HMD-based VR on reflection, three conditions were set in the experiment, which was conducted as a between-subjects experiment. The participants were all asked to present without wearing the HMD to make sure they gave the presentation in the same environment under all conditions. Figure 6 provides an overview of these conditions.

Condition 1: Video + PC Monitor
This condition mimicked a common individual training method for public speaking, in which the participants reviewed a video that was captured by a web camera (Logitech C920 PRO HD Webcam) on a normal PC monitor (resolution: 1920 × 1200 pixels). The participants listened to the sound through a set of headphones that were connected to the monitor. General software (Microsoft Movies & TV) was used to watch the videos, and the participants could check the video repeatedly in the playback mode by using a mouse.

Condition 2: 3D Avatar + PC Monitor
In this condition, the participants viewed the VR environment and 3D avatar videos on a PC monitor, which were reconstructed from the recorded information as described in Section 4.3. The point of view was adjusted to the position of the camera so that it was as close to condition 1 as possible, and it could be changed in a similar manner to the normal video. The 3D avatar's motion was defined for the 3D positions of joint points that were acquired by Kinect at 30 fps to mimic the users' movements during the presentation. The interface for controlling the movie is also the same as in condition 1.

Condition 3: 3D Avatar + HMD
Similar to condition 2, the 3D avatar was used for condition 3 but the HMD was employed as a display device instead of a PC monitor. The participant was initially seated, as shown in Figure 3, which corresponds to the position of one of audiences in the virtual environment. Since the viewpoint changes depending on the observation position and the orientation of the participant wearing the HMD, he or she can observe their own presentation from any position or orientation.

Outcome Measurements
One of the aims of this experiment was to explore the benefits for presenters with different personality traits from the training.
The benefits of a reflection method for public speaking are considered in three main parts: changes in anxiety about public speaking, changes in subjective impression, and improvement of public speaking skills. The attention pattern in the playback and the willingness to use this system were also measured as a supplement to the performance of the method, since attention could influence the result of reflection and willingness could influence the dropout rate of the training.
The questionnaires used for this experiment are as follows. By using these questionnaires, the personal characteristics for presenter and training indicators of the training method were measured.

Anxiety (Anx)
To measure the changes of anxiety around public speaking, the PRCS [32,33] was employed. It is a 12-item questionnaire that evaluates fear on a scale of 1 to 10.

Presentation quality by self (SQu)
To measure changes of subjective impression on presentation performance, the PSCR score introduced in [16] considered 11 questions that focus on the content, voice, and body movement in order to provide self-assessment of the presentation quality. Four of the questions were related to the preparation and organization of the presentation material, but these four questions were not used since the presentation material was prepared in advance. Besides the PSCR score, a 100-point scale was added to survey the overall impression of the presentation.

Presentation quality by others (OQu)
The improvement of public speaking skills was measured according to the evaluators, so the PSCR score was also used as an index for the assessment by the evaluators. Two of the three evaluators who were randomly assigned made the evaluation immediately after they finished listening to the presentation. Although the evaluators were non-experts on the presentation content, the PSCR was designed such that the users can perform objective and quantitative assessments even for non-experts. Corresponding to SQu, there was a 100-point scale for overall impression of the presentation as well. The average of the evaluators' scores was used as an indicator.

Degree of concern (DoC) for each part in the playback
A mark sheet with 11 items was used to identify the areas that received attention during the playback. These items include: (1) vocal performance (i.e., pitch, loudness), (2) pauses, (3) meaningless actions and words, (4) vocabulary, (5) topic and content, (6) stance, (7) hand gestures, (8) eye contact, (9) facial expression, (10) dress, and (11) foot jittering. The participants were asked to score each item on a 20-point scale to indicate the DoC.

Willingness of playback (Wil)
Participants were asked to score the level of willingness to use the playback method in each condition based on a 5-point scale (1: strongly disagree, 5: strongly agree).

Presenter Characteristics
As we want to find who would be the target user of such a VR reflection system, personal confidence, self-bias and anxiety around public speaking were considered in this experiment according to their relationship to the performance of the presentation or the reflection as follows.
Though gender is not considered a factor of presentation quality nor reflection quality, there is controversy about its effect on other characteristics such as self-bias [34], as well as effect on public speaking training.

Gender
The role of gender in the control of public speaking anxieties remains controversial. Generally, gender differences do not have any significant effect on public speaking anxieties [35]. However, the effect of gender differences on some training results has been reported [24].

Personal confidence for public speaking skills
Confidence will engage people to accept the challenge and think about their approach as reflection [36], although it was still unclear whether the observer's perspective provided by our VR reflection method would help people with lower confidence to focus more on their approach to their presentation. The confidence in public speaking skills was surveyed on a 5-point scale (1: very poor, 5: excellent) before the experiment.

Self-bias
The gaps between SQu and OQu indicate biased self-assessment. It is hard for some people to make fair self-assessment due to biased cognition as mentioned previously, and this will lead to ineffective reflection. In this paper, how such a person is affected by the use of the avatar and VR during playback was also discussed. Since the PSCR score focuses more on each specific part for presentation, the 100-point scale was used in this study to measure the overall impression in personal characteristics.
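The self-bias characteristic can be written as a simple difference between the two overall-impression scores. The sign convention below (self minus others, so that a negative value means the presenter rates themselves lower than the evaluators do) is an assumption made for this sketch, as are the function names and the tolerance parameter.

```python
def self_bias(self_overall: float, others_overall: float) -> float:
    """Gap between the self-rated and evaluator-rated overall impression (100-point scale).

    Assumed convention: SQu minus OQu, so negative values indicate underestimation.
    """
    return self_overall - others_overall


def bias_direction(bias: float, tolerance: float = 0.0) -> str:
    """Label a bias value; `tolerance` is a hypothetical dead band around zero."""
    if bias > tolerance:
        return "positive"
    if bias < -tolerance:
        return "negative"
    return "neutral"


# Example: a presenter rates themselves 55 while evaluators average 70.
label = bias_direction(self_bias(55, 70))
```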

Personal anxiety for public speaking
Public speaking anxiety has been discussed in public speaking training in many works [22,23,37]. Normally, personal anxiety around public speaking is related to negative self-assessment according to the cognitive model, but there are also some reports revealing that people with social anxiety did not underestimate their performance [38]. So personal anxiety and self-bias were separated into two characteristics to see the result. The value was measured from the personal report using PRCS before the experiment and indicated general fear.

Training Indicators
First, for the question of how the anxiety for public speaking was reduced, the changes after the playback and after the whole experiment were measured:

Anxiety changing (p1-pb)
The change in Anx for Presentation 1 between the personal reports before and after playback.

Anxiety changing (p1-p2)
The change in Anx between the two presentations, indicating the anxiety reduction from the training.

For how the playback changed the subjective impression, the following indicators were measured:

Self-assessment changing (pb-p1)
The self-assessment changes were measured by the PSCR score in SQu after playback. Since the self-bias from presenter characteristics also uses the PSCR score in SQu before playback, we will not analyze the relationship between self-bias and self-assessment changes.

Bias changing (p1-pb)
The self-bias changing of assessment from reflection was measured as the changes in absolute value of gaps between SQu and OQu before and after playback. Similar to the self-assessment changes, the self-bias from presenter characteristics also uses SQu before playback, so we will not analyze the relationship between them.
Finally, for the question of public speaking skill improvement in the whole training, both subjective and objective indicators were measured:

Subjective improvement (p2-p1)
The personal feelings about the improvement in public speaking skills were measured by PSCR score changes in SQu between the two presentations.

Objective improvement (p2-p1)
More objective results for the presenter's improvement in public speaking skills were measured by PSCR score changes in OQu between the two presentations.
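The six training indicators defined in this section can be summarized as simple differences. The sketch below follows the order given in each indicator's label (for example, p1-pb is read as the value at Presentation 1 minus the value after playback), which is our reading of the labels rather than a stated formula. The field names are hypothetical, and the bias change assumes the evaluators' score for Presentation 1 is the reference at both time points.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Scores:
    anx_p1: float  # Anx (PRCS) after Presentation 1
    anx_pb: float  # Anx after playback
    anx_p2: float  # Anx after Presentation 2
    squ_p1: float  # SQu after Presentation 1
    squ_pb: float  # SQu after playback
    squ_p2: float  # SQu after Presentation 2
    oqu_p1: float  # OQu for Presentation 1
    oqu_p2: float  # OQu for Presentation 2


def training_indicators(s: Scores) -> Dict[str, float]:
    """Compute the training indicators; positive values mean less anxiety,
    less bias, or a higher rating, under the assumed reading of the labels."""
    return {
        "anxiety_changing_p1_pb": s.anx_p1 - s.anx_pb,
        "anxiety_changing_p1_p2": s.anx_p1 - s.anx_p2,
        "self_assessment_changing_pb_p1": s.squ_pb - s.squ_p1,
        # change in the absolute SQu/OQu gap before vs. after playback
        "bias_changing_p1_pb": abs(s.squ_p1 - s.oqu_p1) - abs(s.squ_pb - s.oqu_p1),
        "subjective_improvement_p2_p1": s.squ_p2 - s.squ_p1,
        "objective_improvement_p2_p1": s.oqu_p2 - s.oqu_p1,
    }
```

For instance, a participant whose self-rating moves from 50 to 58 after playback while the evaluators gave 62 would show a self-assessment change of 8 and a bias change of 8 (the gap shrinks from 12 to 4).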

Procedures
The process of the experiment is visualized in Figure 7. First, the experimenter provided an overview of the experiment to each participant and obtained informed consent. After that, each participant was taught how to use the system in each condition. The participants were then informed about the theme of the presentation, the slides, and the content of each slide. In this experiment, 10 slides were prepared on the topic "Introduce Your University" for a five-minute presentation. This topic was chosen to minimize the differences in knowledge between the participants. Each participant was given 10 min to prepare the first presentation.
Next, the participants stood in the presentation area as shown in Figure 2, and delivered the first presentation using the slides (Presentation 1). There were three audience members. Two of them were evaluators who were hired according to their schedule, and the third was an experimenter who was seated in the experimental room, because this arrangement can bring more social anxiety to the presenter [37]. In principle, only the ratings from the two evaluators were used in the study, but if these ratings were inconsistent, the experimenter's rating was also considered and the average value was used as the final score.
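The scoring rule in the last sentence can be sketched as follows. This is a hypothetical illustration: the text does not define when two ratings count as inconsistent, so the threshold `max_gap` is an assumed parameter, and averaging the two evaluators when they are consistent is likewise an assumption.

```python
from statistics import mean


def final_score(evaluator_a: float, evaluator_b: float,
                experimenter: float, max_gap: float = 2.0) -> float:
    """Combine ratings into one score.

    max_gap is a hypothetical consistency threshold, not a value
    taken from the study.
    """
    if abs(evaluator_a - evaluator_b) <= max_gap:
        # Consistent ratings: use the two evaluators only (assumed: mean).
        return mean([evaluator_a, evaluator_b])
    # Inconsistent ratings: also take the experimenter's rating into account.
    return mean([evaluator_a, evaluator_b, experimenter])


consistent = final_score(4.0, 5.0, 1.0)    # experimenter's rating is ignored
inconsistent = final_score(2.0, 5.0, 5.0)  # experimenter's rating is included
```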
After Presentation 1, the participants responded to the PSCR in order to assess the quality of the presentation and the PRCS to assess the anxieties. Simultaneously, the evaluators assessed the quality of their presentations in accordance with the PSCR.
Then, the participants were required to review the information recorded during Presentation 1 and reflect on it. They identified the parts that needed to be improved and practiced repeatedly using the method described in Section 4.3. Each participant was allowed to check the video recording once, filled out a questionnaire regarding the DoC for each part, and answered the PSCR and PRCS again to observe the changes in the self-assessment. They were then given an additional 10 min so that they could check the parts of concern one by one and pause each scene with the mouse (conditions 1 and 2) or the controller (condition 3). If the participants decided that they had checked and practiced sufficiently, this phase ended before the 10 min had elapsed.
Finally, the participants were asked again to deliver a 5-min presentation on the same topic (Presentation 2) based on what they had learned during the reflection phase. After Presentation 2, the participants once again responded to the PSCR and PRCS. Similarly, the evaluators made judgments on their presentations with the PSCR again.
After the experiment, a brief interview was conducted to collect the participants' feedback.

Attention Distribution in Each Condition
As some appearance factors of the presenter were excluded by the avatar, the attention distribution during playback was measured for each condition. After confirming homogeneity of variance and normality, an analysis of variance (ANOVA) was conducted and multiple comparisons were made with Tukey's HSD test.
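As a reminder of what the ANOVA step computes, the sketch below derives the one-way F statistic from per-condition score lists using only the Python standard library. The function name and the sample data are hypothetical, not taken from the study. Obtaining the p-value and running Tukey's HSD would additionally require the F and studentized range distributions from a statistics package, which is not shown here.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical degree-of-concern scores for three conditions.
f_stat, df_b, df_w = one_way_anova_f([
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0],
])
```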
In the playback phase, the scores for the focused parts are shown in Figure 8. The parts with more than a significant tendency represented the Topic and contents (ANOVA,

Figure 8. The degree of concern for each part in the playback phase. ** indicates p < 0.01. † indicates p < 0.1.

Personal Characteristics Differences
Four personal characteristics were mainly considered: gender, personal anxiety, personal confidence and self-bias.
An independent-samples t-test was conducted for gender differences. The result indicated no significant tendency between gender and any training indicator. In addition, there was no significant tendency between gender and the other personal characteristics.
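For reference, the statistic behind an independent-samples t-test can be sketched as below. The text does not say whether equal variances were assumed, so the pooled-variance (Student) form used here is an assumption, and the function name and data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(a, b):
    """Return (t, df) for Student's two-sample t-test with pooled variance."""
    n_a, n_b = len(a), len(b)
    df = n_a + n_b - 2
    # Pooled estimate of the common variance of both groups.
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / df
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(a) - mean(b)) / standard_error, df


# Hypothetical indicator values for two gender groups.
t_stat, t_df = pooled_t_statistic([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
```

The resulting t value would then be compared with the t distribution with `t_df` degrees of freedom to obtain a p-value.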
Then a bivariate correlation analysis was conducted for the other personal characteristics (Figure 9). The results in Figure 9a indicated that there was no significant correlation among the personal characteristics. As shown in Figure 9b, the results of condition 3 indicated significant correlations between personal confidence in public speaking skills and both self-assessment changing(pb-p1) and objectivity changing(p1-pb), which means that the participants with low personal confidence in their public speaking skills could become more confident and more objective in self-assessment after VR reflection.
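A bivariate correlation between a personal characteristic and a training indicator can be sketched as follows. The text does not state which coefficient was used, so Pearson's r is an assumption here, and the function name and data are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x, mean_y = mean(x), mean(y)
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    # Note: a constant sequence has zero spread and would divide by zero.
    return covariation / (spread_x * spread_y)


# Hypothetical data: lower confidence paired with a larger positive change.
confidence = [1, 2, 3, 4, 5]
assessment_change = [5, 4, 3, 2, 1]
r = pearson_r(confidence, assessment_change)
```

A negative r in data of this shape corresponds to the pattern reported for condition 3, where participants with lower confidence showed larger changes.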
In Figure 9c, the correlation between the gap between self- and others' assessment and anxiety changing was significant in condition 3 and in the combination of conditions 2 and 3, which suggests that the participants whose self-assessment was lower than others' assessment could reduce more anxiety in conditions 2 and 3.
Between the evaluators' scores, the agreement within a maximal distance of two was 99.4% for the seven aspects, which shows high overall agreement between the evaluators. However, no factor was correlated with objective improvement in any condition.
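One way to read "agreement within a maximal distance of two" is as the share of rating pairs whose absolute difference is at most two rating levels. The sketch below follows that reading, which is an assumption, and uses a hypothetical function name and data.

```python
def agreement_within(ratings_a, ratings_b, max_distance=2):
    """Share of paired ratings that differ by at most max_distance."""
    pairs = list(zip(ratings_a, ratings_b))
    if not pairs:
        raise ValueError("at least one pair of ratings is required")
    agreeing = sum(1 for a, b in pairs if abs(a - b) <= max_distance)
    return agreeing / len(pairs)


# Hypothetical ratings from two evaluators; the differences are 1, 0, 4 and 2.
share = agreement_within([1, 3, 5, 2], [2, 3, 1, 4])
```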
The Wil, measured with the question "Would you like to use this playback method for practice?", was also considered, as it may affect long-term training. The result showed a negative correlation between the personal anxiety level and the willingness to use playback in condition 2, as shown in Figure 9d.

Discussion
No significant or strong correlation was observed among the personal characteristics. Though some reports have shown that women tend to judge themselves more harshly [34], the experimental results showed no gender difference in self-bias or personal confidence. In this experiment, a few participants had great public speaking anxiety but showed high personal confidence, and a few participants had low confidence in their public speaking skills but showed high or positive self-assessment. Personal confidence in public speaking skills may also be affected by presentation ability, and the anxiety around public speaking does not only come from low confidence or self-bias. Therefore, the personal characteristics are discussed separately.
The experimental results showed that participants with low personal confidence in their public speaking skills could change their subjective impression more positively and bring it closer to the objective impression of other people after VR reflection. People with low self-confidence tended to avoid looking back at their own performance in the first place. One possible reason is that the avatar and the wider FOV in condition 3 let them observe themselves from an objective and interactive perspective, which led to objective evaluation and allowed them to make a fair self-assessment. On the other hand, people who were confident in their public speaking skills did not have negative impressions of their own appearance, so they were more likely to have few fears and to prefer reflecting in condition 1, where the most factual information was included.
A similar perspective also helped individuals with negative self-bias to reduce anxiety, but they benefited more from the use of an avatar. Though negative self-evaluation does not always have a negative effect, a relatively large self-bias could reflect negative self-images of the individuals, and many experimental studies have demonstrated the detrimental effects of negative self-image in both highly socially anxious and non-anxious participants [39]. The individuals with self-bias assessed themselves more negatively than others assessed them, and this was due to self-awareness. Self-awareness brings worries about how other people perceive them, and it has therefore been linked to increased social anxiety and negative self-assessment [40,41]. When people face an avatar with their self-representation, they can feel higher ownership of that avatar and increase their self-awareness [42,43]. Therefore, the use of an avatar that excluded as much information about a person's unique appearance as possible may have weakened the self-awareness during reflection and reduced the anxiety. However, a further evaluation of the change in perspective is needed to reveal the reason.
The results also showed that people with greater public speaking anxiety were less willing to watch the playback in condition 2. There were some limitations in both conditions 2 and 3: (1) natural expressions and eye contact information were lost, and (2) there were tracking errors in some cases, especially from occlusion of the body. Individuals with greater anxiety more easily fell into negative constructs [44] and blamed themselves for the negative outcomes. Watching the avatar's unnatural behavior, they felt that their own behavior was strange and supplemented the lost information with more negative imagination. This might be a reason for the lack of improvement in self-evaluation as well as the low willingness to watch the playback. However, in condition 3, the viewpoint can be moved by head movement to check the 3D human body and surroundings, which may reduce the feeling of unnatural behavior and distract from negative thoughts. Otherwise, no significant correlation was observed between personal anxiety and the training indicators. This is to be expected, because the causes of anxiety are complicated. Although the most common fear concerns the audience's responses and performance evaluation, other fears, such as the fear of being unable to self-regulate, disfluency and unpreparedness, are also abundant [45] and could be magnified especially when there is only limited time for preparation. Some participants did comment that they noticed their weak points but did not know how to improve them (e.g., "I know I should face the audience, but I cannot do this"). For such cases, targeted practice could be more effective than reflection alone.
The training as a whole did not show a substantive effect on public speaking skills in such a short session. A long-term intervention study is needed. In addition, some aspects of the performance were changed or hidden in the VR reflection method, which changed the attention paid to eye contact and facial expressions during playback. There were no significant differences among conditions for dress and foot jitters, most probably because little attention was paid to these parts in the first place.

Conclusions
In this paper, the design and evaluation of a VR public speaking training system with VR reflection are described. In this system, a virtual avatar is used and only selected information is displayed. Although video recording shows the most factual information and is preferred by individuals who are confident in their public speaking skills, the experimental result shows that individuals with low self-confidence in their public speaking skills could change their self-evaluation more positively and objectively in VR reflection with an HMD, and individuals with negative self-bias could reduce more anxiety by using an avatar for self-reflection. In addition, the system requires few devices, so it would not obstruct the presentation much and could easily be added to other VR training systems based on Virtual Reality Exposure Therapies (VRET), since reflection is a basic part of Cognitive Behavioural Therapy (CBT) [46], and both VRET [47,48] and CBT could help with public speaking anxiety and social anxiety.
However, some information was lost in the VR recordings, such as facial expressions and precise hand movements, and some people believe these are also important in presentations. Extra cameras and an additional motion capture system for hand and finger movement may help to improve the performance of the avatar as well as the system, but how much these devices obstruct the presentation should be considered carefully. The lost information also makes people think the recording is not completely true, so they may doubt how much they can improve from the VR reflection. As mentioned, there were tracking errors in some cases, especially from occlusion of the body, which also enhances the feeling of unnaturalness. Our idea for improving this is to increase the number of Kinects so that the tracking improves and occlusion is less of a concern. Considering all aspects of VR reflection, for individuals who are used to video recording and do not have problems with low self-confidence or negative self-bias, a video recording with cheaper cameras could still be better.
This study is a first step toward understanding how VR reflection can be effective for public speaking training based on individual differences in the training result. Though the result shows the possible target users of VR reflection, our experiment only shows the outward performance of the system for different kinds of individuals. We did not measure the psychological changes of the presenters during playback; for example, how much they thought the avatar was themselves or how long they took to recognize that the avatars were themselves. There are only two kinds of avatar, chosen according to gender, and the effects of changing the avatar and the voice are also unclear, so further evaluations are required. All of the participants in this study were graduate students, and the number of participants was limited to 24. Our original target users also included people who have great anxiety about public speaking, such as individuals with ASD. The same experiment with a broader range and a larger number of participants is required to generalize the findings from this study. In addition, since the current user study is based on only one trial, it is important to explore the efficacy of continuously using this system. The experimental system is designed to understand the state of the presentation and to identify the problems; however, solving the problems relies on each user's skills. We believe that it is important to combine this system in the future with an automatic, real-time assessment of presentation quality and with feedback provision to improve it in a VR setting.

Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.