1. Introduction
Effective personal and professional social interactions are of paramount importance in successfully achieving both psychological needs (e.g., developing and maintaining relationships with friends, family, and loved ones) and self-fulfilling needs (e.g., advancing one's career). A social interaction is a shared exchange between two people (a dyad), three people (a triad), or a larger social group of four or more people, occurring in person, remotely (technology-mediated), or in a mixture of both. Within these social groups, information is shared verbally (e.g., speech) and nonverbally (e.g., eye gaze, facial expressions, body language, hand gestures, interpersonal distance, social touch, the appearance of interaction partners, and the appearance/context of the setting in which the interaction occurs). The majority of the information exchanged (65% or more) is conveyed through nonverbal rather than verbal communicative cues [1].
With the exception of social touch (e.g., shaking hands, hugging), vision is needed to sense, perceive, and appropriately respond to the visual nonverbal cues present in dynamic social scenarios. For individuals who are blind or visually impaired, these cues are largely inaccessible. Hearing provides intermittent access to certain nonverbal cues; for example, while an interaction partner is speaking, his or her voice can be used to estimate how far away he or she is standing (interpersonal distance), and the intonation of the voice, together with the content of the speech, can be used, to some extent, to detect emotion and intent. Without vision, however, immediate emotional responses, the nuances of facial expressions, and the intent behind eye gaze and body language are lost. In a focus group with individuals who are blind or visually impaired, work environments were mentioned most frequently as the setting where inaccessible social interactions had the most negative impact. Participants noted that the most challenging type of social interaction is the large group meeting, particularly because questions are commonly directed through eye gaze, making it easy to answer a question out of turn and create a socially awkward situation. Individuals who are blind also noted that when passing people in hallways or elevators, they wonder whether it is a co-worker keeping quiet to avoid social interaction. Incomplete exchanges of information, as in the aforementioned examples from focus groups, may lead to feelings of embarrassment, which, in turn, may cause an individual to seek social avoidance and isolation, which may eventually result in psychological problems such as depression and social anxiety [2]. It is, therefore, important to explore technological solutions that break down these social barriers and provide individuals who are blind with access to social interactions comparable to that of their sighted counterparts.
To address the aforementioned problem, researchers have begun to explore social assistive aids for individuals with visual impairments. These technologies use sensory substitution algorithms to convert visual data into information perceivable through an alternative modality, such as touch or hearing. Researchers have targeted the nonverbal cues of eye gaze, interpersonal distance, and facial expressions. Qiu et al. [3] proposed a device for individuals who are blind consisting of a band of vibration motors worn around the head to map eye gaze information to vibrotactile stimulation. The device mapped a quick visual glance to a short vibrotactile burst, and a fixation to a repeating vibrotactile pattern. In our own previous work [4], we proposed a vibrotactile belt for communicating the direction and interpersonal distance of interaction partners using the dimensions of body site and tactile rhythm. The actuators were driven by a face detection algorithm run on frames from a video camera discreetly embedded in a pair of sunglasses. Direction was presented relative to the user; e.g., when an interaction partner stood in the user's field of view to the right, the right side of the belt was stimulated via embedded pancake motors. Interpersonal distances of intimate, personal, social, and public were mapped to tactile rhythms that felt like heartbeats, with a tempo varying by the distance (intimacy) of the interaction (a minimal sketch of this mapping is given below).
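The sketch below maps a hypothetical face bounding box to a belt motor and a proxemic rhythm tempo. The motor count, frame width, and face-height thresholds are illustrative assumptions for exposition, not the calibrated values of the system in [4].

```python
# Hypothetical mapping from a detected face to belt actuation, in the spirit of [4].
# The motor layout, frame width, and distance thresholds below are illustrative
# assumptions, not the calibrated values used in the original system.

FRAME_WIDTH = 640          # pixels in the camera frame
NUM_MOTORS = 5             # motors spaced across the front of the belt

# Larger apparent face height => closer interaction partner. Hall's proxemic
# zones, keyed by an assumed minimum face height in pixels (checked in order).
DISTANCE_ZONES = [(200, "intimate"), (120, "personal"), (60, "social"), (0, "public")]

# Tactile "heartbeat" tempo (beats per minute) per zone: closer feels more urgent.
ZONE_TEMPO_BPM = {"intimate": 120, "personal": 90, "social": 60, "public": 40}

def face_to_actuation(face_x, face_w, face_h):
    """Map a face bounding box (x, width, height) to (motor index, zone, tempo)."""
    center_x = face_x + face_w / 2
    motor = min(int(center_x / FRAME_WIDTH * NUM_MOTORS), NUM_MOTORS - 1)
    zone = next(name for min_h, name in DISTANCE_ZONES if face_h >= min_h)
    return motor, zone, ZONE_TEMPO_BPM[zone]

# Example: a face on the right of the frame, at roughly social distance.
print(face_to_actuation(face_x=450, face_w=80, face_h=80))  # -> (3, 'social', 60)
```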
Most of the work toward realizing social assistive aids for individuals who are blind has focused on the visual-to-tactile substitution of facial expressions and emotions. Buimer et al. [5] used a waist-worn vibrotactile belt to map the six basic emotions to body sites on the left and right sides of the user. Réhman et al. [6] proposed a novel vibrotactile display consisting of three axes of vibration motors embedded in the back of a chair, where each axis represented a different emotion, and the progression of the stimulation along the axis indicated the intensity of the respective emotion. Rahman et al. [7] conveyed behavioral expressions, such as yawning, smiling, and looking away, and their dimensions of affect (valence, dominance, and arousal) through speech output. Krishna et al. [8] proposed a novel vibrotactile glove with pancake motors embedded on the back of the fingers and hand to communicate the six basic emotions (happy, sad, surprise, anger, fear, and disgust) using visual emoticon representations. For example, to convey happiness, a spatiotemporal vibrotactile pattern was displayed that was perceived as a smile being “drawn” on the back of the user's hand. As these examples show, much of the previous work aims to solve the recognition problem for the user; i.e., it recognizes an interaction partner's basic emotion and presents this information to the user, with or without an intensity rating.
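To make the notion of a spatiotemporal vibrotactile pattern concrete, the following sketch plays a hypothetical “smile” stroke across a small row of motors; the motor layout, timings, and stub driver are illustrative assumptions rather than the actual design of the glove in [8].

```python
import time

def set_motor(motor_id, on):
    """Stub driver: replace with real PWM/actuator commands on actual hardware."""
    print(f"motor {motor_id} {'ON' if on else 'OFF'}")

# A hypothetical "smile" stroke: each tuple is (motor_id, onset_ms, duration_ms).
# Motors 0..4 are imagined as an arc across the back of the hand.
SMILE_PATTERN = [(0, 0, 150), (1, 100, 150), (2, 200, 150), (3, 300, 150), (4, 400, 150)]

def play_pattern(pattern):
    # Expand each stroke into on/off events and play them in time order.
    events = []
    for motor, onset, duration in pattern:
        events.append((onset, motor, True))
        events.append((onset + duration, motor, False))
    events.sort()
    start = time.monotonic()
    for t_ms, motor, on in events:
        # Sleep until the event's scheduled time, then fire it.
        delay = t_ms / 1000.0 - (time.monotonic() - start)
        if delay > 0:
            time.sleep(delay)
        set_motor(motor, on)

play_pattern(SMILE_PATTERN)
```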
Striving for a human-in-the-loop solution, we view social assistive aids as providing rich, complementary information to users so that they may draw their own conclusions about the facial expressions, emotions, and higher cognitive states of interaction partners. To meet this goal, in our early work [9], we proposed the first mapping of visual facial action units to vibrotactile stimulation patterns. As part of a pilot run to gather initial feedback and improve our design, we tested these patterns with sighted participants. Facial action units were chosen for the following reasons: First, any facial expression can be reliably broken down into its facial action units using the descriptive Facial Action Coding System (FACS) [10]. Importantly, we are not limiting the capabilities of the system to the six basic emotions previously described; instead, by focusing on facial action units, any facial expression can be communicated to the user. Second, it is well known which facial action units occur most frequently for each of the six basic emotions [11], which simplifies user training for studies exploring the perception of these units and their associated emotions. Third, we may focus our attention on questions surrounding the delivery and recognition of tactile facial action units, since their extraction from video is largely a solved problem: numerous facial action unit extraction software packages are freely and commercially available [12,13,14].
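As an illustration of the second point, the sketch below pairs each basic emotion with one commonly cited prototype set of action units and ranks emotions by overlap with a detected AU set. The prototype sets vary across sources, so both the sets and the matching heuristic should be read as assumptions for exposition, not as the mapping used in our studies.

```python
# One commonly cited set of prototypical action-unit (AU) combinations for the
# six basic emotions (in the spirit of FACS-based emotion prototypes); the exact
# combinations differ across sources, so treat these as illustrative.
EMOTION_PROTOTYPES = {
    "happiness": {6, 12},
    "sadness":   {1, 4, 15},
    "surprise":  {1, 2, 5, 26},
    "fear":      {1, 2, 4, 5, 20, 26},
    "anger":     {4, 5, 7, 23},
    "disgust":   {9, 15, 16},
}

def rank_emotions(detected_aus):
    """Rank emotions by Jaccard overlap between detected AUs and each prototype."""
    def jaccard(a, b):
        return len(a & b) / len(a | b)
    return sorted(((jaccard(detected_aus, aus), emo)
                   for emo, aus in EMOTION_PROTOTYPES.items()), reverse=True)

# Example: an extractor reports AU6 (cheek raiser) and AU12 (lip corner puller).
print(rank_emotions({6, 12})[0])   # -> (1.0, 'happiness')
```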
The design of the initial set of tactile facial action units, presented in [9], was improved based on participant performance and feedback, and retested in [15] with individuals who are blind. The study of [15], which we refer to from this point forward as “Study #1”, is presented here again, but with new results and analysis comparing the performance of individuals who are blind with that of sighted individuals. The design of the tactile facial action units from [15] was improved once more and used in a study [16] exploring how well individuals who are blind learn and recognize the emotions associated with these facial action units. The study of [16], which we will refer to as “Study #2”, is presented again in this work, but with new results and analysis exploring performance questions similar to those posed in Study #1. To summarize, the aim of the work presented here is to shed light on the following questions: (1) How well can individuals who are blind recognize tactile facial action units compared to those who are sighted? (2) How well can individuals who are blind recognize emotions from tactile facial action units compared to those who are sighted? While more experiments need to be conducted with much larger sample sizes, the preliminary pilot tests presented in the subsequent sections at least hint at the potential of using tactile facial action units in social assistive aids for individuals who are blind. Specifically, with very little training, individuals who are blind were able to learn to recognize tactile facial action units and their associated emotions. While the recognition performance of individuals who are blind was comparable to that of sighted individuals, larger sample sizes are needed for more conclusive results. In any case, these preliminary results are promising and encourage further exploration by researchers.
Our aim here is not to completely solve the problem of developing a wearable social assistive aid for individuals who are blind that is capable of extracting and presenting a myriad of nonverbal cues. This challenging problem requires advances in human-computer interaction, wearable computing, computer vision, and haptics. The present work focuses on one type of nonverbal cue: facial expressions. Before tactile facial action units can be incorporated into social assistive aids, a better understanding of recognition performance is required to improve their design. This effort assesses differences in recognition performance between sighted and blind participants. The remainder of this article is organized as follows: Section 2 presents the materials and methods of the work, including the Haptic Chair apparatus, the detailed design of the proposed tactile facial action units, and the experimental procedures. Section 3 presents the results of both studies, and Section 4 provides an analysis of these results with discussion. Finally, Section 5 outlines important directions for future work on social interaction assistants for individuals who are blind.
4. Discussion
4.1. Study #1 Discussion
Ten of the fourteen individuals who were blind or visually impaired passed the training portion of the study and moved on to testing, requiring an average of approximately two attempts at the training phase. This is impressive given the number of tactile facial action units, the variations in pulse width, and the short training period. In contrast, only half of the sighted participants passed training; analysis began with thirteen sighted participants, but three were excluded due to equipment malfunctions. On average, sighted participants required more training attempts; however, for those participants who passed training in four or fewer attempts, no significant difference was found in the number of training attempts between the two groups, t (13) = −2.255, p > 0.01, two-tailed. These results, however, are preliminary given the small sample size of each group, so further exploration with larger sample sizes is needed for more conclusive results.
In terms of performance, statistical analysis revealed no significant difference in mean recognition accuracy, averaged across patterns and durations, between the blind/VI group and sighted group, t (43) = −0.172, p = 0.865, two-tailed; nor were any significant differences found in mean recognition accuracy between the groups for any of the three variations in duration: 250 ms, t (13) = 0.332, p = 0.745, two-tailed; 500 ms, t (13) = −0.357, p = 0.727, two-tailed; and 750 ms, t (13) = −0.348, p = 0.733, two-tailed. In other words, individuals who self-identified as blind/VI performed just as well as participants who were sighted. While promising, these results should be viewed as preliminary and inconclusive until larger sample sizes are investigated. Moreover, it is important to note that the blind/VI group consisted of a spectrum of individuals who are legally blind including four individuals who were born blind, three individuals who became blind late in life, and three individuals who self-identify as visually impaired. Further research is needed with larger sample sizes to investigate any performance differences between these three sub-groups.
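For readers who wish to replicate this style of analysis, the following minimal sketch runs the same kind of two-tailed independent-samples t-test; the accuracy vectors are made-up placeholders, not the study data.

```python
import numpy as np
from scipy import stats

# Hypothetical per-participant mean recognition accuracies (fraction correct);
# these illustrative numbers are NOT the data from either study.
blind_vi = np.array([0.82, 0.75, 0.90, 0.68, 0.85])
sighted  = np.array([0.80, 0.78, 0.88, 0.70, 0.83])

# Two-tailed independent-samples t-test comparing the group means.
t_stat, p_value = stats.ttest_ind(blind_vi, sighted)
print(f"t({len(blind_vi) + len(sighted) - 2}) = {t_stat:.3f}, p = {p_value:.3f}")
```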
Our previous work [15] revealed significant differences in mean recognition accuracies across pattern types for the blind/VI group. Indeed, Figure 5 shows that some patterns are clearly more difficult to recognize than others, and Table 1 corroborates these recognition difficulties with participant feedback. In [15], we identified the cause of the misclassifications: similarities among a few of the tactile facial action units. Since then, the mapping has been improved, and Study #2 used the refined patterns. Comparing Figure 5 and Figure 6, it is interesting to note the similarity between the groups in terms of recognition performance on the individual tactile facial action units. Table 1 and Table 2 also show many similarities in the subjective assessments of how easy the patterns were to recognize and how natural they felt.
4.2. Study #2 Discussion
Six of the eight participants in the blind/VI group passed Part 2 training and moved on to testing, needing an average of 2.4 attempts at the training phase. Similarly, five of the eight participants in the sighted group passed Part 2 training, also needing an average of 2.4 attempts. As with Study #1, this is impressive given the short training period and the number of tactile facial action units, in addition to participants having to associate emotions with facial action unit combinations.
Comparing the blind/VI group with the sighted group, no significant differences were found in mean recognition accuracy for the complete multidimensional patterns, t (9) = −0.047, p = 0.964, two-tailed; for emotions, t (9) = −0.120, p = 0.907, two-tailed; or for facial action unit combinations, t (9) = −0.294, p = 0.796, two-tailed. This outcome hints at the possibility of similar recognition performance between the sighted and blind/VI groups, but as with Study #1, these results should be viewed as preliminary and inconclusive given the small sample sizes. Moreover, the blind/VI group consisted of five individuals who are late blind and one individual who is congenitally blind. A larger sample size would therefore allow comparisons to be made between congenitally blind, late blind, and VI subgroups.
Previous work [16] focused on exploring significant differences in mean recognition accuracy across pattern types within the blind/VI group. No significant differences were found, indicating that no particular pattern was more difficult to recognize in terms of its emotional content or facial movements. However, Figure 9 and Figure 11 clearly indicate that some participants struggled with Sad (AU1), and this is corroborated by the subjective ratings in Table 3. Interestingly, the sighted group seemed to have less of an issue with this emotion and its associated facial action unit. They did, however, seem to struggle with Fear (AU5) and Fear (AU5+AU20), as displayed in Figure 10 and Figure 12, and corroborated by the subjective ratings in Table 4. The blind/VI group also had difficulty recognizing emotions from the facial action unit combinations representing Fear as well as Anger. These difficulties warrant further attention with larger sample sizes.
Participants in the blind/VI group incorrectly classified Fear as Surprise eight times, whereas participants in the sighted group incorrectly classified Fear as Surprise six times, as Anger five times, and as Sadness four times. Expressions of Fear, Surprise, and Anger share AU5 (raising the eyelids) along with variations in mouth movement. Expressions of Fear and Sadness share subtle facial movements in the eyes and/or eyebrows/forehead. While these similarities most likely forced participants to rely on subtle variations to distinguish between emotions, we hypothesize that further training would improve participants' recognition of these nuances and increase their proficiency in identifying emotions. We also plan to redesign select patterns, such as AU1, to further enhance the distinctness of the tactile facial action units, thereby easing training.
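These shared movements can be made explicit by intersecting illustrative AU prototype sets (the same assumed prototypes as in the earlier sketch; an assumption for exposition, not our stimulus set):

```python
# Intersect illustrative AU prototype sets to see why Fear is so often confused
# with Surprise (and, to a lesser degree, with Anger and Sadness).
PROTOTYPES = {
    "fear":     {1, 2, 4, 5, 20, 26},
    "surprise": {1, 2, 5, 26},
    "anger":    {4, 5, 7, 23},
    "sadness":  {1, 4, 15},
}

for other in ("surprise", "anger", "sadness"):
    shared = PROTOTYPES["fear"] & PROTOTYPES[other]
    print(f"fear & {other}: shared AUs {sorted(shared)}")

# Under these assumed prototypes, fear & surprise share AU1, AU2, AU5, and AU26,
# i.e., nearly the whole surprise prototype, leaving only AU4 and AU20 to tell
# the two expressions apart.
```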
5. Conclusions and Future Work
The purpose of this work was to explore two questions: (1) How well can individuals who are blind recognize tactile facial action units compared to those who are sighted? (2) How well can individuals who are blind recognize emotions from tactile facial action units compared to those who are sighted?
Results from this preliminary pilot test are promising, hinting at similar recognition performance between individuals who are blind and those who are sighted, although further investigation is required given the limitations of the study, including (i) small sample sizes; (ii) age differences between the sighted and blind/VI groups; and (iii) small sample sizes per subgroup within the blind/VI group (i.e., congenitally blind, late blind, and VI), preventing exploration of the impact of visual experience on recognition. While these limitations make the results preliminary and inconclusive, the findings show potential for augmenting social interactions for individuals who are blind. Specifically, recognition performance among individuals who are blind was promising, demonstrating the potential of tactile facial action units in social assistive aids. Moreover, we hope that these findings will garner more interest from the research community in further investigating social assistive aids for individuals who are blind or visually impaired.
As part of future work, we aim to conduct the aforementioned pilot studies with much larger sample sizes. We are also investigating the recognition of multimodal cues (voice + tactile facial action units) by individuals who are blind. Such work would improve the ecological validity of the experiments, since real-world social interactions involve a verbal component. Other possible directions for future work include: (1) exploring the limits of pulse width reduction to further speed up communication while maintaining acceptable levels of recognition performance; (2) designs that cover more facial action units while maintaining distinctness and naturalness across patterns; and (3) longitudinal evaluations of tactile facial action units in the wild with individuals who are blind or visually impaired, potentially as part of case studies.