Article

Creating Non-Visual Non-Verbal Social Interactions in Virtual Reality

1 Computer Science, Georgia Institute of Technology, Atlanta, GA 30332, USA
2 Inclusive Design, Ontario College of Art and Design University, Toronto, ON M5T 1W1, Canada
3 Psychology, Georgia Institute of Technology, Atlanta, GA 30332, USA
* Authors to whom correspondence should be addressed.
Virtual Worlds 2025, 4(2), 25; https://doi.org/10.3390/virtualworlds4020025
Submission received: 1 April 2025 / Revised: 13 May 2025 / Accepted: 25 May 2025 / Published: 4 June 2025

Abstract

Although virtual reality (VR) was originally conceived of as a multi-sensory experience, most developers of the technology have focused on its visual aspects to the detriment of other senses such as hearing. This paper presents design patterns for making virtual reality fully accessible to non-visual users, including totally blind users, with a focus on non-verbal social interactions. Non-visual VR has been present in the blindness audio game community since the early 2000s, but the conventions from those interfaces have never been described to a sighted audience, outside of a few limited sonification interface papers. This paper presents non-visual design patterns created by five of the top English-speaking audio game developers through a three-round Delphi method, encompassing 29 non-verbal social interactions in VR grouped into 12 categories, including movement, emotes, and self-expression. The paper will be useful to developers of VR experiences who wish to represent non-verbal social information to their users through non-visual conventions. These methods have so far been rigorously tested only through the commercial market, not through scientific approaches. The design patterns can serve as the foundation for future investigation of non-visual non-verbal social interactions in VR.

1. Introduction

Virtual reality (VR) was initially defined by Ivan Sutherland in the 1960s as a multisensory, interactive simulated world, and has since evolved to incorporate immersion, perception, and interaction with a virtual environment (VE) [1,2,3]. VR technology uses position-tracking and real-time updates of visual, auditory, and other sensory displays to create a sense of presence in a VE. However, mainstream VR experiences are often visually focused, making them inaccessible to blind and low vision individuals (BLVIs) due to the lack of semantic interface elements, speech output, and alternative ways to access information [2,4,5,6]. Despite this, there is growing interest in how best to provide immersive haptic and auditory VR experiences, for a number of reasons.
Although the sound design of most video games lags behind their graphics in terms of realism, this is beginning to change ([7], p. 5). Even outside the world of gaming, the use of spatial audio to enhance the realism of soundscapes is increasing [8]. Apple has begun to make use of the head tracking and spatial audio features of high-end AirPods to improve the audio realism of FaceTime calling, as well as some offerings on Apple Music and Apple TV [9]. Teleconferencing platforms, including Microsoft Teams and Zoom, have also recently introduced features which allow the voices of meeting participants to sound as though they are coming from the location of their pictures on screen [10,11]. The sales of non-visual VR headsets that allow for head tracking and spatial audio were over ten times greater than the sales of those with a visual element in 2021 [12,13], which suggests that VR developers could reach a much larger market by making their VR interfaces fully accessible through audio. Although headphones are often not considered fully functional VR displays, for totally blind users, they are the only affordable and functional VR displays that can fully communicate symbolic language (e.g., speech, print, or braille).
For BLVI users of virtual environments, the potential benefits of incorporating spatial audio are enormous if the design of the audio is thoughtfully done. Tools powered by VR can enable users to experience environments, learn skills, and participate in activities where physical environments would create barriers [14,15]. VR has the potential to be beneficial in many spheres of the professional world including job training, education, remote assistance, and healthcare [16,17,18,19]. Many people with disabilities face barriers to many services and socialization experiences, especially those that take place outside the home [20,21,22,23]. VR has the potential to bring these experiences into the home and make them more accessible than they would be in the real world.
Even as the potential benefits of designing VR to be inclusive of those with disabilities are vast, the potential drawbacks of failing to do so are equally daunting. As we explore in the following sections, VR is being applied in a growing number of fields. People whose needs have not been considered in the development of VR, therefore, stand to see their participation in these fields reduced in proportion to the growth of VR. So, this tool can serve either to benefit those with disabilities, such as BLVIs, or to exclude them from a growing number of personal and professional activities, depending on how thoughtfully the technology is developed.

1.1. Job Training

There has long been interest in the potential of VR and/or augmented reality (AR) to enhance job training, and education more generally [24]. With the rapidly growing interest in artificial intelligence (AI), this trend is likely to accelerate. But even without significant integration of AI to make virtual training materials more interactive, evidence suggests that having access to immersive 3-dimensional (3D) demonstrations through head-mounted displays has significant training advantages over traditional text-based materials [25].
For BLVIs, VR has the potential to provide additional work-related benefits. VR can simulate any designed environment, including workspaces. BLVI employees are able to navigate through the space, familiarize themselves with the spatial location of equipment, and even carry out specific tasks without physically being present [26,27]. BLVIs may also benefit from “explore world” VR scenarios, allowing users to familiarize themselves with a workspace before starting a job [14,28].
As with other VR applications, however, failure to consider how to use virtual environments in accessible ways will lead to greater exclusion instead of realizing their potential benefits.

1.2. Remote Assist and Collaboration

Working collaboratively on shared documents of various types has become commonplace, especially since the COVID-19 pandemic of 2020. While major teleconferencing tools such as Zoom and Microsoft Teams have made efforts to meet accessibility standards, so that BLVIs can use most of the functionality these tools offer, their ease of use for those relying on assistive technology still has much room for improvement [29]. VR, however, opens up whole new avenues for collaborative work and interaction. For example, one team of researchers designed an activity in which users collaborate to optimize the placement of lamps in a virtual park. Their task was to make the paths and benches secure for humans by providing illumination, while minimizing power consumption and disturbance to nocturnal animals [30]. A different study sought to use VR to enable people attending an academic conference virtually to have casual interactions with attendees who were physically present [31]. While it should be possible to design these projects so that BLVI users could participate in them, it would take significant effort to achieve.
In physical workspaces, if help is needed on a particular task, such as describing a graphic showing geo-referenced climate data, someone must be physically present to assist [32,33]. Shared virtual spaces, however, provide more opportunities for employees to become well-versed in their position, since help can be outsourced and is not limited to the colleagues who are physically available [34,35]. Sighted and BLVI workers using a shared simulated interface would also minimize barriers present in the physical environment and foster more collaboration and productivity [36]. This is possible because VR can virtually overlay digital information onto physical job settings and help employees trade information without being in the same location, facilitating intergenerational and expert knowledge sharing [14,37].

1.2.1. Healthcare

The healthcare industry was disproportionately impacted by the COVID-19 pandemic, severely limiting access to key training tools such as medical equipment, labs, and even human participants [38,39]. With VR, simulated operating rooms, offices, and cadavers can be used to help users familiarize themselves with resources that are not readily available but critical to their success [40,41]. They can also use VR to visualize or remember the steps needed to complete procedures and diagnostics, as well as communicate and collaborate with their co-workers and patients [14,42]. BLVIs need to be able to use this VR technology both as medical providers and as patients; otherwise, the technology will be out of compliance in many jurisdictions.

1.2.2. Visual Virtual Reality

Part 1: Visuals

Traditional VR has shaped the way we perceive the utility of VR. Its equipment comprises a visual headset and a pair of joysticks [43]. The headset is placed on the head and over the eyes, displaying the color and dimensions of the simulated environment to its user [44]. The goggles present separate left and right visual fields, which, when viewed simultaneously, create a sense of depth in which the user can freely navigate and explore [4].

Part 2: Audio

Audio is integral to the visual virtual reality experience, as it complements and legitimizes what is seen, but can be used independently of the visuals (e.g., when visuals are obscured [45]). VR uses advanced technologies to replicate the dynamic range of sound that hearing people experience. Features such as 3D audio, surround sound, and head-tracked audio further immerse the user into the VE and improve accuracy in accomplishing navigation and localization tasks similar to real-life scenarios [46].

Part 3: Head Tracking

One of the differentiating features that distinguishes VR headsets from other kinds of displays is the use of head tracking. The headset has positional and rotational sensors that track the head [47], allowing the visual and/or audio field to be adjusted accordingly [48]. Head tracking adds a level of realism to both audio and visual experiences that would otherwise be lacking [49,50].
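To make the role of head tracking in audio concrete, the following minimal Python sketch shows how a fixed world-space sound source can be re-panned as the listener's head rotates, so the source stays anchored in the world rather than following the head. The simple sine-based pan model and the function name are illustrative assumptions, not the algorithm of any particular headset SDK.

```python
# Minimal sketch of head-tracked audio: as the head rotates, a fixed
# world-space source is re-panned so it stays anchored in the world.
# The pan model (sine of the head-relative angle) is an assumption for
# illustration, not a specific headset SDK's algorithm.
import math

def head_relative_pan(source_angle_deg: float, head_yaw_deg: float) -> float:
    """Return stereo pan in [-1, 1]: -1 = hard left, 0 = center, 1 = hard right."""
    relative = math.radians(source_angle_deg - head_yaw_deg)
    return math.sin(relative)

# A source 30 degrees to the right of north, heard while facing north...
print(round(head_relative_pan(30, 0), 2))    # ~0.5 (to the right)
# ...stays put in the world when the listener turns to face it.
print(round(head_relative_pan(30, 30), 2))   # 0.0 (now centered)
```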

Part 4: Tactile

Joysticks, handheld controllers, and other haptic devices usually complement the headset; they are typically responsible for the user's physical and geographic movement through the VE, as well as for haptic sensations [51,52]. Haptics technology provides a tactile response that is meant to simulate and reproduce pressure on the surface of the skin [53]. In VR systems, the controllers communicate these sensations via vibrations, shakes, and rumbles which correlate with the visual environment to add texture and realism to the experience [54]. More advanced systems, such as those in [55,56] and Wireality, attempt to add textures and force feedback to virtual environments to facilitate greater immersion and functionality of the VR environment.

1.3. Use of VR with BLVIs While Navigating

In the mainstream VR space, the primarily visual (or vision-required) interfaces have restricted BLVIs from fully participating in virtual experiences. With the setup of most VR systems, objectives and goals typically cannot be accomplished without the visual component. Since using vision is not an option for many BLVIs, auditory and tactile experiences have sometimes been explored. The first such approach is to provide users with more accurate audio and haptic responses [57]. BLVIs do not need visual feedback in order to feel immersed in a VE. VEs that rely on the perception of 3D sounds are designed to help BLVIs to gain a cognitive spatial representation of the environment and to develop orientation and mobility skills for safe navigation in unfamiliar settings [58].
The second approach is to create assistive VR technologies that facilitate navigation, primarily through navigation accessories such as canes and gloves [59]. There have been emerging VR technologies that place BLVIs as the primary consumer [60]. Though none of these systems are commercial yet, they exemplify great strides toward assisting in blind navigation through VR [61,62]. With direct feedback from BLVIs on design changes and improvements that would make VR more accessible, VR developers will be better able to support BLVIs through conscious design implementations [63,64].
The design of most VR experiences communicates information visually. Details, such as the amount of time left to complete a given task, health status information, and spatial location of players, may be only visually communicated and are absolutely necessary in order to complete a simulation or navigate effectively. This excludes BLVI populations from participating in these experiences.
However, it has been hypothesized that auditory VR can help BLVIs become more spatially aware of their surroundings [65,66]. Translating depth information into audio feedback has even been thought to help BLVIs identify 3D objects in space. Ref. [19] devised a system called the Virtual Reality Simulator for Sonification Studies (VRS3) that creates depth maps and determines the distance between a user and objects in their environment. The tracking device used for the VRS3 is a magnetic field-based tracker, the Polhemus Fastrak, which provides absolute position and orientation for up to four probes [19]. After a brief training period, users were able to move around walls and columns, identifying the layout of rooms.
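As a rough illustration of the depth-to-audio idea described above (and not VRS3's actual mapping), the sketch below maps the distance of a detected obstacle to the frequency and loudness of a sonification cue; the distance range and frequency band are assumed values chosen only for illustration.

```python
# Sketch of the general depth-to-audio idea described above (not VRS3's
# actual mapping): nearer obstacles produce higher-pitched, louder cues.
# The distance range and frequency band are assumed values.
def depth_to_cue(distance_m: float, max_range_m: float = 10.0):
    """Map obstacle distance to (frequency_hz, gain) for a sonification cue."""
    d = max(0.0, min(distance_m, max_range_m)) / max_range_m  # 0 = touching
    frequency_hz = 220 + (1 - d) * 660   # 220 Hz far away, 880 Hz up close
    gain = 0.2 + (1 - d) * 0.8           # quiet when far, loud when near
    return frequency_hz, gain

print(depth_to_cue(1.0))   # close wall: high, loud cue
print(depth_to_cue(9.0))   # distant column: low, quiet cue
```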
Incorporating VR in spaces of learning can be an impactful way to increase knowledge retention in sighted and BLVI individuals alike [67]. Educational techniques that depart from what is normal and commonplace have demonstrated better rates of engagement and recall [58]. Researchers have looked into embodied learning in VR, which is increasingly important in shaping how mathematics is taught [3]. Embodied learning has particular potential for BLVI students since it engages multiple senses, including proprioception, bodily action, touch, and hearing, rather than focusing primarily on visual resources [3]. Immersive virtual reality (iVR), a VR experience that aims to exactly mimic the real world, has the potential to foster embodied forms of teaching and learning mathematics through sensorimotor interaction. In the study by [3], a blind child interacted with the iVR experience with the help of a researcher and his teacher for one hour, and results demonstrated that iVR offers suitable navigation training, as it provides a controlled, customizable, and safe environment for this purpose [3]. Qualitative findings demonstrate that bodily movement and positioning can effectively foster BLVI children’s engagement with the Cartesian plane, and they highlight the role that sound and researcher facilitation played in this process. The majority of the children mentioned some difficulty in distinguishing the two sounds related to the x and y axes [3]. This study showed the need for specific spatial arrangements and for appropriate physical space devoted to the iVR activity. Identifying spaces in which to use iVR systems highlights spatial constraints on BLVI navigation that are relevant not only to educational practitioners but also to technology designers.

1.3.1. Non-Visual VR Navigation Technologies

Multiple sources of research have highlighted the significant benefit of VR canes, gloves, and devices for improved independence and accurate navigation [68]. A project by [69] studied the “virtual reality of moving” to determine whether haptic sensing through a cane could create a virtual environment. VirtualHand is a glove with sensors placed on various locations of the hand to measure movements and pressure points. A magnetic sensor attached to the CyberGrasp determined the position in space of the device worn on the hand [69]. The CyberGrasp sensed an item on the ground, sending vibrations that helped the user understand the shape and distance of the object. Navigation procedures, such as changing paths and preparing steps in advance, were easier with CyberGrasp than with traditional canes [69]. Ref. [6] discusses the effectiveness of the haptic controller Canetroller, showing it to be a promising tool to enable BLVIs to explore and navigate the virtual world by simulating the real experience of a white cane. Ref. [70] studied the efficacy of a smartphone-based device that increases the safety and reliability of mobility assistance for BLVIs; not only was it shown to accurately detect the presence of obstacles, but it also reduced costs by utilizing popular technologies such as Bluetooth and text-to-speech (TTS) features. Researchers in [71], who investigated HOMERE, a multimodal system dedicated to exploring VEs, found that BLVIs valued its obstacle detection, auditory cues, and active navigation features. In a study by [72], BlindAid, a system that provides BLVI users with non-visual tactile stimuli through the Phantom device, along with spatial auditory stimuli, was effective in enhancing and accelerating the user’s understanding of an unknown environment. In [73], the Virtual Cane System was a navigation tool that enabled users to walk and look around VEs. It included a “teleport” action command and “explore”/“look-around” modes. It supported VE participants in cognitive map construction and later assisted them in orienting themselves in physical environments [73].

1.3.2. Locomotive Navigation

VR locomotion is a technology that enables movement from one place to another within a virtual reality environment [74]. Locomotion through a VE is enabled by a variety of methods including head bobbing and arm swinging, as well as other natural movements that translate to in-game movements [75]. The paper [74] presents the first user study comparing four types of egocentric VR locomotion with seven blind and low-vision participants; it found that joystick-controlled locomotion was the safest method, and that ankle-mounted trackers were the least safe because of potential real-world collisions [74]. In the paper by [76], by combining audio and visual interfaces with force-feedback joystick movement, researchers were able to demonstrate that standard features of traditional video games could be replicated using a 3D audio environment [77]. Joystick-controlled locomotion performed best in terms of speed, and even topped the user ratings for precision and intuitiveness, while cognitive load was about equal across all four implementations and physical demand was higher with the treadmills than with the other devices [74]. The participants’ qualitative feedback further contextualized these results: treadmill-based locomotion felt “unnatural, space-demanding, and less precise,” whereas the joystick implementation was appreciated for its “simplicity and practicality” [74]. The study by [78] further highlights the importance of embodied interfaces, which require the user to physically manipulate a device [79]. They found that children significantly preferred being able to interact with systems to extract data, which underscores the value of personalization in giving users the ability to understand otherwise overwhelming data sets [78]. Similarly, BLVI users benefit from systems that are customizable to their strengths, weaknesses, and sensitivities, which expedites the learning curve and closes accessibility gaps [80].

1.4. Non-Verbal Social Interactions, Importance for BLVIs

People participate in social interactions every day with friends, family, co-workers, and strangers [81]. A strong set of social skills is critical in life; for example, such skills help us make new friends or make good first impressions at job interviews [82]. Social interactions are one of the nine parts of the expanded core curriculum, a curriculum teachers of the blind and visually impaired (TVIs) frequently use to instruct BLVIs in primary and secondary school (K-12) in the United States [83]. Good social skills begin to develop at an early age and are essential for social development and acceptance, helping individuals lead productive, healthy, successful lives. Social interactions involve both verbal and non-verbal (visual) communication cues [84]. Non-verbal communication consists of a variety of cues such as the physical environment, the appearance of the communicators, and physical movement. In an average conversation between two people in the real world, about 65% of the communication is non-verbal [85]. BLVIs cannot independently access this visual information, putting them at a disadvantage in daily social encounters [86]. BLVIs find it difficult to know when to speak because they cannot determine the direction of the questioner’s gaze or may not hear that their conversational partner has walked away [87]. BLVIs often do not feel comfortable asking others to interpret non-verbal information during social encounters because they do not want to burden friends and family [88]. These factors can lead BLVIs to become socially isolated, which is a major concern given the importance of social interaction [89]. Specialized training is available to help BLVIs learn to convey appropriate non-verbal cues [sacks2006teaching]. While this training is helpful, assistive technology is still required to allow BLVIs to independently perceive non-verbal cues [90]. While many assistive devices have been developed to meet a wide range of needs of BLVIs, not enough attention has been given to the development of assistive devices that satisfy the need for access to non-verbal communication in social interactions [91].
Ref. [92] investigated the importance of non-verbal social interaction in VR and found that many users communicated exclusively nonverbally and commented that non-verbal communication was possibly more important than verbal communication, in contrast to the real world [93]. The paper highlighted how people with physical disabilities may find it difficult to turn their avatar around to talk with someone, or how deaf individuals are unable to use sign language in VR. Some users, such as those with an accent, women, or individuals whose physical voice does not match their gender’s voice, prefer not talking to avoid harassment. This work shows the critical role non-verbal social interactions play in VR; if BLVIs are unable to access this information, it could lead to social exclusion, harassment, and an inability to communicate with some individuals [94].

1.5. Inventory of Non-Verbal Social Interactions in VR

Several papers have attempted to inventory non-verbal social interactions in VR. Ref. [95] created an inventory of non-verbal social interactions based on spending several hours exploring the social interfaces in 10 of the top VR worlds. They described 17 non-verbal social interactions, including different methods of movement, facial and body movements, multi-avatar contact, and avatar collisions. Ref. [96] investigated social interactions among large groups of avatars. They identified speeches and avatar appearance as two important factors, but also highlighted environment manipulation (e.g., adding non-player characters and landscape features) as significantly impacting social interactions. Ref. [97] investigated non-verbal social interactions in the virtual world “Second Life”. Although they found many interactions similar to those of previous researchers (e.g., avatar appearance), they further defined environmental manipulation as specific object manipulation (e.g., playing a guitar). They also observed that, rather than using the avatar body to react emotionally, avatars often flashed emojis above their heads. They also observed a unique subset of body gestures they called “iconic gestures”, which were meant to show concrete items with hand gestures (e.g., scissors cutting a piece of paper). Finally, Ref. [97] described how gestures could change based on avatar proximity (e.g., whispering was shown in a smaller font while the avatar pantomimed a whispering posture, in contrast to large bolded text and expansive movements for shouting).

1.6. Evaluations of Non-Visual Non-Verbal Social Interactions in VR

There have been two papers that have specifically focused on non-visual non-verbal social interactions in VR. The paper [98] investigated different methods of conveying the distance between an avatar and different characters in the space using spatial audio (e.g., proxemic acts), and found little to no use for this kind of information in VR for BLVIs. Although the 12 BLVI participants were able to identify avatar proximity, accuracy fell as the number of avatars increased. In the study presented in [99], eye contact, head shaking, and head nodding (e.g., gaze/eye fixation and iconic gestures) were co-designed with BLVIs to be represented using short musical phrases in spatial audio combined with vibration on a controller. The 16 BLVI participants rated the cues as helpful, and the cues significantly increased participants’ ability and confidence in detecting attention in conversational partners. These two evaluations suggest that non-verbal social interactions need to be optional and may not be useful for everyone, but that those who desire them strongly prefer having them.

1.7. Auditory Display Techniques Useful in Non-Visual VR

Both speech-based and non-speech-based cues should be utilized in the different types of auditory wayfinding and navigation systems to present the most effective user experience. There are two types of auditory interfaces: way-planning interfaces used for virtually investigating the space to build a mental model before travel, such as digital auditory maps (or mini maps), and wayfinding interfaces that give real-time directions and cues while traveling, such as turn-by-turn navigation systems (TBTNS) [100,101]. Way-planning systems, such as [32], are often interactive/query-based data exploration auditory displays that allow users to move a virtual avatar through a space and hear a mixture of sounds to indicate particular map features and speech messages to name and provide exact location information about features [45]. Wayfinding applications, such as [102], are continuous interactive/query-based status monitoring/data exploration auditory displays that track a user’s location as they move through space, and often present speech messages about when to turn, allow querying for information about the current environment, and sometimes present sound cues that move in 3D audio as the user moves through space [45,103]. Almost all auditory way-planning and wayfinding systems utilize speech, with many also including non-speech sounds as well [45].
Speech is useful for providing symbolic directions, exact numbers, names, and hierarchical structures, and for teaching users what non-speech audio sounds represent [104,105]. The cons of speech are that it is slow, can only be comprehended through one or two simultaneous channels, requires users to understand the language, and typically imposes a higher cognitive load when compared with alternative auditory methods [45,106,107,108,109]. Both VR-like interfaces in [32] and ref. [101] use speech messages as the user avatar moves over new features to communicate exact distance information to objects, to provide a scan of what features are around the user, and to present coordinates or other symbolic details about the environment. Ref. [102] presents an auditory interface that uses primarily non-speech audio, but employs some speech and spearcons (fast messages saying the name of the object or concept) [45] for representing menus and object names. The pros of non-speech auditory cues include presenting changes in patterns and time-series data, optimizing reaction times in high-stress environments, presenting warnings, and supporting the processing of simultaneous data streams [45]. Auditory-only interfaces sometimes have difficulty presenting intent to the user, struggle to represent hierarchical information, and may require their own language to be learned when presenting exact symbolic information to users, like “3.5 feet ahead” [45,102,107]. Sound design semiotics in VR applications can be placed along an analogic-symbolic representation continuum, with direct representations (e.g., a fountain), indirect ecological representations (e.g., thunder representing rain), and indirect metaphorical symbolic representations (e.g., a warning dog bark representing a pole). Sounds can also be categorized into auditory icons (sounds analogically similar to an action, requiring a shorter learning time; e.g., the sound of a cash register playing when entering a store), earcons (symbolic arbitrary sounds mapped to actions, concepts, or objects, requiring a longer learning time; e.g., a ding to tell the user to turn right), and spearcons (see above) [45]. Ref. [32] uses direct auditory icons to represent the terrain (e.g., a concrete footstep to represent a road), and looping indirect ecological auditory icons in 3D audio (e.g., the sound of clinking dishes to represent a restaurant). In general, speech should always be an option, but non-speech sounds should also be used to speed up the experience for users familiar with the interface [45].
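As a concrete illustration of one of these cue types, the sketch below shows one way a spearcon could be generated by time-compressing synthesized speech. It is a minimal example that assumes the open-source pyttsx3 and pydub Python libraries (and ffmpeg for pydub) are available; the label, file names, and speed factor are illustrative and are not taken from any of the systems cited above.

```python
# Minimal sketch: generating a spearcon (a time-compressed spoken label).
# Assumes pyttsx3 (offline TTS) and pydub (audio processing, needs ffmpeg)
# are installed; names and parameters are illustrative only.
import pyttsx3
from pydub import AudioSegment
from pydub.effects import speedup

def make_spearcon(label: str, wav_path: str = "spearcon.wav",
                  speed: float = 2.0) -> AudioSegment:
    # 1. Synthesize the label with a local text-to-speech engine.
    #    (Output format may vary by platform; WAV is assumed here.)
    engine = pyttsx3.init()
    engine.save_to_file(label, wav_path)
    engine.runAndWait()

    # 2. Time-compress the speech so it reads as a brief cue rather than
    #    a full spoken message (the defining property of a spearcon).
    speech = AudioSegment.from_wav(wav_path)
    return speedup(speech, playback_speed=speed)

if __name__ == "__main__":
    cue = make_spearcon("Restaurant")
    cue.export("restaurant_spearcon.wav", format="wav")
```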

1.8. Audio Games

Audio games (AGs) are games that can be played completely in audio, and are often made by and for BLVIs [110,111]. AGs are a “natural laboratory” where many non-visual non-verbal social interaction paradigms have been developed by and for BLVIs, and have been tested through the rigors of a commercial marketplace [112,113]. There has been little to no research presented in the academic literature to communicate audio game conventions. Ref. [113] describes the many different kinds of audio games that can be played (e.g., first-person shooters, strategy, multiplayer online, and role-playing games). Almost all games (AGs included) have some kind of communication element, whether it is communicating with other players or with non-player characters (NPCs) [65,114]. “Survive the Wild” [115] is a non-visual fully immersive multiplayer online role-playing game where users build their world through harvesting and manipulating resources, perform quests, and can build social groups with other players. Currently, players have the option of “verbal” communication either using various text channels or with voice chat positioned using 3D audio [116]. However, since Survive the Wild is played by people who speak many different languages, “verbal” communication is sometimes impractical. As such, other player-to-player communication typically happens nonverbally using 3D audio through headphones. Multi-avatar interactions (e.g., giving items to someone else, or splashing water on someone else) happen by: (1) turning your avatar to face the sound of the other avatar; (2) using the keyboard to walk forward, stopping when the sound of the other avatar is at maximum intensity; and (3) performing actions, such as moving your right hand, to splash the other player [2]. Almost all actions performed by a player have an associated sound; that is, if you are standing near another player, the sounds coming from that player’s position will clearly indicate what they are doing, such as what items they are handling. If a player chooses to check, they can also use keystrokes to view the items that other nearby players are holding, as well as other information about them. Players can press keystrokes to perform other basic non-verbal social functions that can be communicative as well, such as causing their avatar to play clapping, snapping, itching, gasping, yawning, laughing, or screaming sounds, among several others. Since AGs have created many conventions around non-verbal non-visual social interactions, it would be prudent, from the standpoint of familiarity, cost of iteration, and ease of implementation, to utilize these existing conventions in mainstream VR environments, as they have already been shown to work [117,118]. AGs traditionally use a keyboard or touchscreen for input, and commodity headphones for output.
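The “face the sound, then approach” pattern described above can be summarized in a short sketch. The code below is not taken from Survive the Wild; the coordinate convention, attenuation model, and function names are assumptions chosen purely for illustration.

```python
# Minimal sketch of the non-visual "face the sound, then approach" pattern
# described above. This is not code from Survive the Wild; positions,
# function names, and the attenuation model are illustrative assumptions.
import math
from dataclasses import dataclass

@dataclass
class Avatar:
    x: float
    y: float
    heading_deg: float  # 0 = north, clockwise positive

def relative_bearing(listener: Avatar, source: Avatar) -> float:
    """Angle (degrees, -180..180) the listener must turn to face the source."""
    dx, dy = source.x - listener.x, source.y - listener.y
    world_bearing = math.degrees(math.atan2(dx, dy))
    return ((world_bearing - listener.heading_deg + 180) % 360) - 180

def gain(listener: Avatar, source: Avatar, rolloff: float = 0.15) -> float:
    """Simple inverse-distance gain: louder as the listener walks closer."""
    dist = math.hypot(source.x - listener.x, source.y - listener.y)
    return 1.0 / (1.0 + rolloff * dist)

# Example: the other avatar is ahead and to the right; the player turns
# until the bearing is ~0, then walks forward until the gain stops rising.
me = Avatar(0, 0, 0)
other = Avatar(3, 4, 0)
print(round(relative_bearing(me, other), 1), round(gain(me, other), 2))
```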
Examples of AGs that are popular with BLVIs include Survive the Wild (STW), described above, in which all player-to-player communication happens through voice chat, text chat, or nonverbally using 3D audio [115]. Another is Swamp, a multiplayer online first-person shooter where players team up to kill zombies and complete missions to collect items; communication is done through text chat and a small number of preset verbal phrases players can select [119]. In A Hero’s Call, players perform quests, team up with NPCs, and battle monsters in a single-player first-person fantasy role-playing game; communication happens through text-based menus and sound effects [120]. Finally, Materia Magica is a text-based online multi-user domain (MUD) where players complete quests, team up, and participate in clans; communication is done through text, and non-verbal communication is done through preset or custom emotes [121].
There has also been some limited research from the academic community on non-visual VR games, although few of those games have made it onto audiogames.net, which is an indication of their lack of acceptance into the audio game community. In a VR game for the blind by [122], the main functioning mechanics use passive sonar, which detects acoustic signals and aids in navigation. Echolocation is also used, which involves understanding the location of objects by sensing the speed of their echo responses. These gaming systems try to reduce the amount of cognitive load, which is how much thinking one is able to do before losing the ability to process new information [123].
In his video, Virtual Reality in the Dark: VR Development for People Who Are Blind, Aaron Gluck sought to uncover ways in which BLVIs can engage in fast-paced, thrilling video games that require quick and precise navigation. In games that require split-second decisions, developers must figure out how to provide critical information quickly, before the user crashes, gets stuck, or loses the game. Vibrations through the game controllers were used to let players know whether they were approaching a wall, drifting out of their lane, entering a turn, or approaching another racer. Gluck tied the haptics to hand gestures: a gentle vibration prompts a slight hand raise to make a small turn, whereas a strong vibration alerts the user to quickly raise their hand, turning the wheel as hard as they can to make a sharp turn [123]. The distance between the two controllers, up to a maximum, determines the sharpness of the turn; if there is no difference between the heights of the controllers, the vehicle is driving straight. A lift of the head or a turn to the left or right also gives information about whether opponents are approaching from behind or from the side. Testing this system with blind players showed reduced cognitive load when compared to using traditional VR hardware [123].
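The controller-height mapping Gluck describes can be illustrated with a brief sketch. This is an interpretation of the described scheme rather than his implementation, and the maximum offset and vibration scaling are assumed values.

```python
# Sketch of the controller-height-to-steering mapping described above.
# This illustrates the idea, not Gluck's implementation; the maximum
# offset and vibration scale are assumed values.
def steering_from_controllers(left_height_m: float, right_height_m: float,
                              max_offset_m: float = 0.4) -> float:
    """Return steering in [-1.0, 1.0]; 0 means driving straight."""
    offset = right_height_m - left_height_m
    clamped = max(-max_offset_m, min(max_offset_m, offset))
    return clamped / max_offset_m

def vibration_strength(required_steering: float) -> float:
    """Stronger rumble signals a sharper turn is needed (0 = none, 1 = max)."""
    return min(1.0, abs(required_steering))

# Equal controller heights -> steering 0.0 (straight), no rumble.
print(steering_from_controllers(1.0, 1.0), vibration_strength(0.0))
# Right hand raised 0.3 m above the left -> steer right at 75% of maximum.
print(steering_from_controllers(1.0, 1.3), vibration_strength(0.75))
```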

1.9. Game and VR Accessibility Conventions

There have been several sets of conventions presented by accessible gaming groups that are applicable to making VR accessible to BLVIs. These include ensuring all gameplay and narrative elements are accessible, menus are narrated, navigational and combat audio cues are added, spatial audio is used, and installers and guides are screen-reader accessible [124,125]. There have been a few sets of guidelines for non-visual XR accessibility, but they are more ambiguous. These include making sure users can select and query information about objects, use the interface with a keyboard, and use menus; providing clear auditory landmarks; using spatial audio; supporting standard touchscreen gestures (e.g., swiping and double-tapping); allowing for second-device output (e.g., a braille display); providing text descriptions of audio content; offering fully functional voice controls; and ensuring hardware accessibility [126,127]. The only mention of non-verbal social interactions is in [127], where non-verbal communication was identified as a barrier, and BLVIs commented that they wanted a cane or some other identifier showing they are blind, to communicate to other players that their gameplay style may be different. The present paper expands the work presented in the aforementioned guidelines into a more comprehensive investigation of how to create accessible non-verbal social interactions in VR. Those guidelines provide the building blocks of the interactions (e.g., screen reader support and spatial audio), but give no detailed description of how, for example, a headshake should be displayed. This paper provides detailed designs for the 29 interactions in the non-verbal social interaction inventory using the above building blocks. Additionally, audio game developers have been absent from the creation of these guidelines, despite having a vast amount of experience creating games BLVIs appreciate. The existing guidelines provide the basic tools to create accessible non-visual interactions; they simply lack details for complex interactions.

1.10. Accessibility Barriers in Mainstream VR Platforms

Most traditional VR development platforms have few accessibility tools, and none of them are usable by BLVI developers without major modification. Unity recently created basic screen reader integration on mobile platforms, but there has been no indication of when desktop or other platforms will be supported [128]. The history of developers asking for Unity accessibility began in 2014 and is documented in a long-running post on the Unity forum [129]. The forum post [129] also details ways to create screen reader support for Unity tools, and there are some add-ons in the Unity store that allow screen reader support on more platforms [130]. The current Unity development interface is unusable with a screen reader; developers are required to use the interface for the Hierarchy panel, Inspector panel, settings, and build dialog, and scripts alone do not allow full development control [131].
Godot, an open-source game engine, has been the most welcoming to BLVIs, with the merge in April 2025 of initial screen reader support from a pull request that had been open for two years [132]. There has also been community support through the Godot Accessibility Plugin, which was last updated in 2020 [133]. This means that native screen reader support should be included in Godot by the end of 2025. Significant amounts of testing still need to be done, and there are several areas that are admittedly not yet usable by BLVIs, but this is a large step forward in accessibility for VR users and developers. It will allow BLVIs to use their own screen reader.
Unreal Engine also has basic support for a rudimentary screen reader that developers can enable, but there is no support for BLVI developers [134,135]. The screen reader appears to be built-in and only supports a handful of basic elements (e.g., buttons, text, checkboxes), but this, coupled with the ability to send messages to the screen reader, can be used to support many basic interfaces in games (e.g., menus and dialogues). It is unclear what level of text navigation the built-in screen reader supports.
Unless a game development platform explicitly adds a screen reader or accessibility support, the assumption should be that the platform is inaccessible. Many tools (e.g., Unity, Godot, and Unreal Engine) use what is called a “graphics layer” to render interfaces, which is essentially drawing pictures on the screen. This is in contrast to native applications, which provide semantic information designating the component type and functionality to assistive technology using UI automation, for example in [136]. The images drawn on a graphics layer lack semantic information by default [137].
There is a strong need for development platforms both to be made accessible to BLVI developers and to provide accessible interfaces for BLVI gamers. The 21st Century Communications and Video Accessibility Act (CVAA) in the United States requires digital communication tools to be usable by BLVIs in both VR and game environments [138]. Moreover, starting in April 2026, all educational institutions in the United States, including K-12 and higher education, will be subject to new ADA Title II regulations requiring that every website and application, VR included, be usable by everyone [139]. Although a small amount of progress has been made, there is still a long way to go before BLVI players and BLVI developers can use these platforms equally by default. Although much of this accessibility work deals with user interface elements, much of non-verbal communication is presented outside of traditional interface elements.

1.11. Self-Disclosure

The two lead authors of this paper are BLVIs who have been core members of the audio game community for the last fifteen years and are frustrated by the lack of accessibility in mainstream VR environments. Both authors have personally experienced the inaccessibility of major mainstream VR development tools for BLVI developers and have advocated for accessible interfaces for BLVI players. With the Metaverse attracting more investment than ever before, the authors felt it was imperative that the top audio game developers describe how to create VR interfaces that are accessible and legally compliant before platforms become too difficult to change. The authors were also frustrated that non-visual VR research from academia and XR Access was trying to reinvent non-visual conventions that have been refined over decades in commercial audio games. Although much of the past VR research has utilized participatory design, the lead researchers have almost never been BLVIs, let alone BLVIs who have spent thousands of hours playing hundreds of audio games. Following the six principles of Design Justice, the authors share their own expert experience, along with their appreciation for the expertise and designs of the top five audio game developers, in the hope of leading the conversation around non-visual VR interfaces by describing what is already working in the vibrant non-visual VR community [110,140]. The authors hope that companies will hire audio game developers to develop the non-visual experience of their platforms, whether by reaching out to game developers directly, recruiting on audiogames.net, or contacting experts in the audio game field [110,141].

2. Method for Data Collection

The Delphi method is a research method in which a group of experts gives feedback on provided content over a set of rounds (typically three) until a consensus is reached among the experts [142]. Delphi methods are best suited for improving knowledge about an area, in this case, non-verbal non-visual social interactions in audio games that could be employed in mainstream VR interfaces. Since the objective of this study was to develop a set of design patterns to make specific non-verbal social interactions accessible to BLVIs in VR, a Delphi method allowed all expert participants to reach consensus on optimal designs for specific examples.
The limitations of Delphi methods are that they lack empirical testing and a large sample size. Since a panel of experts is assembled to provide feedback over several rounds, the Delphi method, by design, lacks any kind of statistical testing; the results are purely qualitative. Since the objective of this evaluation was to set an initial design for the items in the non-verbal social interactions inventory, a panel of experts was deemed an appropriate method to accomplish this task. Future work should be done to iterate on, refine, and evaluate these results. The expertise of this group of game developers is such that they should all have built and commercialized interactions similar to those in this inventory in their own games. Because the number of audio game developers who fit the inclusion criteria is small, this could also skew results. One bias in this group of developers is an over-expertise on desktop platforms; few, if any, of the developers have built mobile or console games. Since most BLVIs use computers over phones, and since screen readers function similarly across platforms, the conventions should be transferable to a mobile or console interface (although the bias is toward computers) [143]. Additionally, because many of the participants are BLVIs, they may lack real-world experience of non-verbal social interactions, so they may underestimate their frequency, importance, or the impact they have on interactions. There were sighted and low-vision developers on the panel, so in theory they should have reviewed and counterbalanced any lack of experience the BLVIs had with non-verbal social interactions. Another major limitation is that all experts were male; none of the audio game developers who fit the selection criteria were female. Overdesign is also a risk in this study: participants may have proposed experimental designs that have not been tested in the real world, and the other panelists may not have thought through those proposed designs. Such designs are described below as “proposed” or “experimental”, or begin with “imagine”. Finally, generalizability outside the audio game community may be limited, similar to how results from game developers may be limited outside the gaming community. The audio game community is primarily composed of males under the age of 40. Other research projects using audio game conventions have shown generalizability outside the audio game community, so this limitation is deemed to be small [21,32,65,101,144]. Despite these limitations, the results of this evaluation provide a comprehensive set of initial designs for future VR developers and researchers to use in creating non-verbal non-visual social interactions in VR.
Five English-speaking male audio game developers who had released at least four audio games listed as ‘active’ on audiogames.net, at least one of which used some kind of first-person avatar movement, participated in a three-round Delphi method [110,142]. All participants stayed through the entire study. The audio game repository at [110] contains almost a thousand audio games with details on active status, author, description, and user ratings. “First-person avatar movement” is defined as movement where the user moves an audio listener around a Cartesian coordinate plane while looped “background” sounds (e.g., plates clinking to represent a restaurant) play around the user [106,113]. This first-person mode is what BLVIs experience as VR. Audiogames.net is the major platform for audio game distribution around the world [110]. Limiting the developers to “English-speaking” allowed them to fully participate and understand the designs by other participants. Requiring “at least four game releases” meant the developer had deep expertise in audio game development. It was important to focus on audio game developers because mainstream developers lack the focus on innovative non-visual interfaces (e.g., The Last of Us only uses simple audio game conventions [145]); audio game developers have never been interviewed in this way before; and academic audio game and VR researchers lack the number of active users for their games that these developers have. All the developers have had thousands of BLVIs playing their games, in some cases for tens of thousands of hours. This is a level of testing academic AG and non-visual VR researchers are unable to achieve. The “active” requirement ensured all participants were still developing audio games, and thus should be aware of current trends and technologies.
The participants were purposively sampled and recruited via email using the contact information on their website or their game’s readme. This study enrolled only five participants; however, there are only around ten audio game developers in the world who meet the study criteria. Several developers declined to participate because they had never experienced a mainstream VR environment and thought their expertise was inadequate, despite reassurance that their games met the definition of auditory VR. Participants were from North America and Europe and were each paid $60 for their participation.
A list of 29 non-verbal social interactions was compiled from reviewing existing inventories of non-verbal social interactions in VR and studies of non-verbal social interactions in VR [92,95,96,97,146,147]. All inventory items, along with their categories and a brief recommendation for each item, are presented in Appendix A. The interactions were split into twelve top-level categories: Locative Movement, Camera Positions, Facial Control, Multi-Avatar Interactions, Gesture and Posture, Avatar Mannerisms, Conventional Communication, Avatar Appearance, Avatar–Environment Interactions, Full-Body Interactions, Environment Appearance, and Non-Body Communication. Each of the 29 interactions was defined and placed into a spreadsheet with an additional row under each of the twelve top-level areas for additional comments. After the study, two additional areas were added. It was important to present participants with a clear example of each non-verbal interaction in order to focus their effort on designing a way to create an accessible experience for that interaction, and to create a comprehensive inventory (see Supplementary Materials).
This study was approved by the Institutional Review Board at the Georgia Institute of Technology. The study took place over email. Participants were given a spreadsheet with each social interaction and a description of that interaction. In the first round, participants were each assigned five interactions, and for each interaction they were asked to design an accessible experience and to provide examples (if any) from existing games. These responses were compiled into one master sheet, and in round two participants commented on the designs provided in the first round. When all participants had provided their comments, those comments were incorporated by the researcher into the design, and an “X” was placed next to the design to indicate it had been modified. In the third round of the Delphi method, participants were asked to provide another set of comments on the updated designs. All but one participant declined to provide new edits, and that one participant added only minor clarifications to one of the designs, so the researcher deemed consensus achieved.
It is important to note that a thematic analysis was intentionally not performed on the responses, for two reasons: (1) the participants provided clear, actionable guidance for VR developers that needed little modification, and context was provided after each inventory item with explanations and citations by the researcher; and (2) a critical abstract analysis would add no value for VR developers. If an abstracted set of principles is desired, the W3C provides principles for accessible XR development that align with the design patterns discussed below [126].

3. Results

Below are all 29 areas of the inventory, split into the twelve top-level categories and their subcategories. Within each subcategory there is a description of the category, the anonymized quoted “results” from the Delphi method, and a discussion from the authors about the specific results. Note that each results section is a direct participant quote, although quotation marks are not present; quotes are reproduced with only minor spelling and grammatical editing. Brackets are used to provide context from the researcher, and small changes were made to anonymize the comments.
It is important to note that BLVIs use “screen reader” software that renders text content on their computer or phone screens through speech and/or braille. When the participants describe using “text”, they mean the process of routing the text content through their screen reader so it can be rendered through both speech and braille. Non Visual Desktop Access (NVDA) is a free screen reader on Windows that is widely used by BLVIs and that developers can use for testing [148,149]. VoiceOver is the built-in screen reader on Apple devices [150,151]. There are many programming libraries that interface directly with screen readers, and these should be used whenever possible [152,153]. The alternative is to develop a mini screen reader or text-to-speech synthesis capability within an application. However, BLVIs dislike using built-in synthesizers and strongly prefer their own screen reader, as they have customized their screen reader’s voice, speed, and other settings [106]. It is useful, however, to have a built-in synthesizer for sighted individuals who wish to use the interface non-visually [152]. It is extremely difficult to recreate all the functionality present in a full screen reader (e.g., text navigation, element navigation, mouse control, and object navigation), so doing so is strongly discouraged. When semantic interfaces (e.g., menus, text fields, buttons, and other standard interface components) are incorporated into a VR interface, the native HTML or operating system elements should be used so that screen readers can render them. Screen readers are built to work out of the box with these native built-in elements [136,137]. If there is a pressing need to create a bespoke screen reader, following the conventions in the NVDA screen reader manual would be a good place to start to reduce the cognitive load for BLVIs using the system [149].
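As a sketch of the output priority recommended here, the following example routes text through the user's own screen reader when one is available and falls back to a built-in synthesizer otherwise. The screen_reader_lib module is a hypothetical stand-in for a real screen reader bridge library (see [152,153]); pyttsx3 is used only to illustrate the fallback path.

```python
# Sketch of the recommended output priority: route text through the user's
# own screen reader when one is running, and fall back to a built-in
# synthesizer only for sighted users exploring the interface non-visually.
# "screen_reader_lib" is a hypothetical stand-in for a real screen reader
# bridge library (see [152,153]); pyttsx3 is the illustrative fallback TTS.
import pyttsx3

try:
    import screen_reader_lib  # hypothetical bridge to NVDA, VoiceOver, etc.
except ImportError:
    screen_reader_lib = None

class SpeechOutput:
    def __init__(self):
        self._fallback = pyttsx3.init()

    def say(self, text: str, interrupt: bool = True) -> None:
        # Prefer the user's screen reader: it also renders to braille and
        # respects the user's customized voice and speech rate.
        if screen_reader_lib and screen_reader_lib.is_running():
            screen_reader_lib.speak(text, interrupt=interrupt)
        else:
            if interrupt:
                self._fallback.stop()
            self._fallback.say(text)
            self._fallback.runAndWait()

# Usage: announce a non-verbal cue when another avatar waves at the player.
SpeechOutput().say("Alex waves at you")
```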

3.1. Locative Movement

3.1.1. Direct Teleportation

Description

The user pushes a button, and a target appears on the ground where they are pointing; they may move the target around, and the user teleports on release. Related considerations include body orientation/facing, destination validation/understanding permitted areas, and vertical movement (moving up/down in space). This may affect how users travel to one another or together [95].

Results (Quotations)

There are two possible interactions (recall that these “results” are direct quotations from Delphi participants):
  • I press a button to begin selecting the target. I hear a small sound in the background that lets me know I’m in selection mode. Any movement now gets represented by the actual orientation of the listener in game, including position and orientation. The background sound changes slightly based on whether or not I’m able to teleport to this specific spot. Alternatively, it would include a mode which does not permit me to move to an area that I cannot teleport to, making the exploration a separate action from the actual target selection. Alternatively, exploration mode could be the default, and I press a button to select this spot as my target. If impossible, I get an error sound.
  • As an alternative to a free-floating destination target: I am in the lobby of the virtual hotel, hearing any number of sounds around me. I press my teleport key and hear “D4” softly spoken. I can now arrow around a 7 by 7 grid, which has placed itself around me. As I arrow, I have not officially moved my avatar, but I begin hearing sounds as though I were in that location. I arrow up and over a few times and hear “A3”, followed by splashing in the pool and the sounds of the waterslide. In my actual current location (D4 on this imaginary grid overlay) I wasn’t able to hear the pool because it was around a corner and down a hall. From A3 I am around that corner and at a direct enough path to be able to hear the pool a little ways off. My rotation is maintained as I get to “try out” each space on this 7 by 7 grid, so I won’t have to deal with the confusion of teleporting and being spun in some unexpected way. Some of the grids do say “unavailable” after speaking their coordinates, letting me know it is an invalid teleport location. I may escape out of this teleport menu without moving anywhere, though I will have quickly gained some information about my surroundings by looking around the grid. If I do choose to move, I press enter, hear some sort of sound/teleported statement, and am now actually in this new location. Dedicated background music could play any time someone is using the teleportation grid, just to help them always know the difference between actually being in a spot versus sampling it only through the grid.

Discussion

According to [95], this mode of navigation is popular in visual VR environments because it is fast and intuitive. For non-visual users, however, it is not faster than other forms of navigation: the user already has to move the target to a location in some way before teleporting, when their character could simply have travelled there directly. For non-visual users, this interface would only be useful if navigating around the VE in the “normal way” incurred some kind of penalty (e.g., energy drain or a monster encounter). The game A Hero’s Call uses a map similar to the grid described in the second quotation above, which allows users to safely explore the space before navigating their character there, but it does not allow teleportation, as that would conflict with the gaming objectives [120]. Instead, the user can place an audio beacon on the map, and the game then provides turn-by-turn directions from the user’s current location to that beacon.
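As a rough illustration of the grid-based preview described in the second quotation, the sketch below moves only the audio listener to the selected cell while the avatar stays in place; the cell name (e.g., “A3”) is spoken, and unavailable cells are flagged. The speak(), set_listener_position(), and can_teleport() helpers are hypothetical stand-ins for the host engine’s speech and audio APIs.

```python
# Sketch of a 7 by 7 teleport preview grid centred on the avatar.
# Arrow keys move a cursor; the listener "samples" each cell without
# the avatar actually moving. All engine calls are placeholders.
from dataclasses import dataclass

def speak(text: str) -> None:                      # placeholder: screen reader output
    print(text)

def set_listener_position(x: float, y: float) -> None:  # placeholder: audio engine
    pass

GRID_SIZE = 7
COLUMNS = "ABCDEFG"                                # spoken as "A3", "D4", ...

@dataclass
class TeleportGrid:
    avatar_x: float
    avatar_y: float
    cell_size: float = 1.0
    col: int = GRID_SIZE // 2                      # start on the centre cell ("D4")
    row: int = GRID_SIZE // 2

    def cell_position(self) -> tuple[float, float]:
        offset = GRID_SIZE // 2
        return (self.avatar_x + (self.col - offset) * self.cell_size,
                self.avatar_y + (self.row - offset) * self.cell_size)

    def move_cursor(self, d_col: int, d_row: int, can_teleport) -> None:
        self.col = max(0, min(GRID_SIZE - 1, self.col + d_col))
        self.row = max(0, min(GRID_SIZE - 1, self.row + d_row))
        x, y = self.cell_position()
        name = f"{COLUMNS[self.col]}{self.row + 1}"
        speak(name if can_teleport(x, y) else f"{name}, unavailable")
        # Preview only: the listener hears the world from the candidate
        # cell (rotation unchanged); the avatar has not moved yet.
        set_listener_position(x, y)

grid = TeleportGrid(avatar_x=10.0, avatar_y=4.0)
grid.move_cursor(-3, -1, can_teleport=lambda x, y: True)   # speaks "A3"
```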

3.1.2. Analog Stick Movement

Description

Similar to console games, a joystick and its buttons are used to move the character within the virtual environment. Movement can be performed on the ground or in the air, independently or around others.

Results (Quotations)

This is handled as in most traditional games, and thus should be performed in a similar manner, but non-visually. Movement sounds denote the actual movement of the character (steps, wing flaps, claw clicks, etc.) as well as collision sounds for running into other objects that differ based on the object that’s being collided with. This includes jumping, crouching, crawling, and so on.

Discussion

Recording 1: Locative movement Example—1 to 1 player movement—Swamp—run and walk on different surfaces (https://www.openicpsr.org/openicpsr/project/224701/version/V1/view?path=/openicpsr/224701/fcr:versions/V1/Locative-movement-Example---1-to-1-player-movement---Swamp---run-and-walk-on-different-surfaces.mp3&type=file, accessed on 1 April 2025). Recording 1 demonstrates player movement across different surfaces where footsteps indicate both the speed and material of the surface under the player.
Analog stick movement is the most common form of movement within first-person audio games, and is typically performed with a mouse or arrow keys [113,115,119,120]. Research from [74] found that, of the locomotion types in this locative movement category, this was the most popular among BLVI users. The game Survive the Wild is an example that supports multiple types of navigation [115]. To rotate, users can swipe left and right with the mouse, press and hold Ctrl (or another modifier key) along with the left and right arrow keys, or move a joystick left or right. To move in a direction, users either hold down the right mouse button, press the arrow key in the direction they wish to go, or push their joystick forward or back. Other configurations, such as WASD, are not commonly used. It is important to mention that most audio games have a widely used keyconfig file that allows users to reassign all the movement commands to different keys on the keyboard, mouse, or controller, in case someone prefers a configuration different from the default.
It is critical that audio feedback is provided, as described above, when users are walking or when they collide with an object; otherwise, they have no way to tell whether they are moving. Some VR environments have removed audio feedback from movement, and this instantly makes what would otherwise be a fairly usable environment inaccessible to BLVI participants [154]. It is also important to note that the volume and type of sound used to indicate movement need to be user-tested because, just as with visuals, not all sounds are created equal. Ideally, users should be able to adjust the volume of the movement and collision sounds. Movement and collision sounds are the most fundamental interface elements developers can use to make their VR application accessible to non-visual users.
One of the participants emphasized the need for constant movement feedback that is clear and understandable: “In [one of the games I developed], you can sit down, meaning you can’t walk in this position. If you tried to move while sitting down, the game would play a small rustling sound to indicate that movement was impossible. Because of the lack of a text hint though, it was found that many new players to the game either couldn’t understand why they weren’t moving, or worse, thought they were moving while actually doing absolutely nothing. This is why a text message, combined with swimming sounds, a variety of foot step speeds, flying, or avatar vocal sounds is needed to communicate if the user is really moving or not.”
There are several different verbosity modes present in audio games, as discussed in [113], but the two most applicable to VR interfaces are first-person and grid. In first-person mode, users can rotate, move forward and back, and hear a footstep sound as they move. When they enter an object (such as a road), their screen reader or text-to-speech will say “Entering Fillmore Street”, and when they leave the object, it will say “Exiting Fillmore Street”. This verbosity mode is probably what most VEs will utilize, and, if the interface is not multilingual, it is the mode that English as a Second Language users prefer [106]. In grid mode, pressing an arrow key moves the user one cell in that direction; after every key press, the screen reader announces the object (or “blank space” if there are no objects) and plays a short sound, such as a step, that represents the object (or objects) under the user’s avatar. Coordinates are often spoken after the object name (e.g., “Fillmore Street, (25, 38)”). This is used in the map experience in A Hero’s Call, and in strategy games (e.g., Tactical Battle) [120,155].
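The sketch below illustrates both announcement styles under stated assumptions: the set of object names under the avatar is supplied by the caller (in practice, by an engine query), and speak() stands in for screen reader output.

```python
# Minimal sketch of first-person and grid verbosity modes. speak() is a
# placeholder for screen reader output; the caller supplies the set of
# object names under the avatar after each step or key press.
def speak(text: str) -> None:          # placeholder
    print(text)

class FirstPersonVerbosity:
    """Announce 'Entering ...' / 'Exiting ...' as the player walks."""
    def __init__(self) -> None:
        self.current: set[str] = set()

    def on_step(self, objects_here: set[str]) -> None:
        for name in objects_here - self.current:
            speak(f"Entering {name}")
        for name in self.current - objects_here:
            speak(f"Exiting {name}")
        self.current = objects_here

class GridVerbosity:
    """Announce the object(s) and coordinates after every key press."""
    def on_move(self, x: int, y: int, objects_here: set[str]) -> None:
        label = ", ".join(sorted(objects_here)) or "blank space"
        speak(f"{label}, ({x}, {y})")

fp = FirstPersonVerbosity()
fp.on_step({"Fillmore Street"})        # "Entering Fillmore Street"
fp.on_step(set())                      # "Exiting Fillmore Street"
GridVerbosity().on_move(25, 38, {"Fillmore Street"})  # "Fillmore Street, (25, 38)"
```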

3.1.3. 1:1 Player Movement

Description

The relative position of the player’s body in their physical play space is mapped directly to the position of the avatar in VR, so that the user’s body moves exactly as it does in real life.

Results (Quotations)

This should be the easiest form to represent and should not require any additional sound cues (although sounds can add a lot), besides the obvious positioning and rotating of the listener’s position, mapped to the player’s head. If the player cannot move in the virtual environment for any reason, such as if their avatar is not standing, or has a broken leg, this must be clearly indicated to the player, usually with a text message as well as sound. The text message should indicate why movement is impossible. Note that blind players may have difficulty sensing or checking for obstacles in the real world if their ears and hands are busy in VR. This mode of interaction may not be the best idea.

Discussion

Constant and clear feedback is critical in this mode of movement (see the analog stick movement section for more details), but this mode will probably not be popular among BLVI participants. The baseline expectation is that analog stick movement, with its corresponding sound cues, is already available, and that 1:1 movement simply reuses those same cues. [74] found that BLVIs did not like using this mode of navigation. Some audio games (e.g., Swamp) use head tracking to make rotation faster, but physically reaching for objects or walking around in the real world has not taken off in the audio game community [119]. When haptic gloves with force feedback become more common, physically moving the hands, head, and torso may become acceptable, but there will always be some concern that physically walking while fully immersed in the VE will lead to unintended collisions in the real world. There are, however, numerous augmented reality navigation applications that use carefully crafted spatial audio and do require users to physically move through their space [102,156,157].

3.1.4. Third-Person Movement

Description

The player places a movable target in the environment using a teleportation arc. The user views the avatar from a third-person point of view (POV) instead of the usual first-person POV. The player’s POV stays in the third person while the avatar walks toward the arc in real time. Releasing the teleport button causes the player’s POV to jump back to first person. The avatar stops when the movement button is released, so it need not always arrive at the destination projected by the arc, and because the avatar moves at a fixed speed, it is possible to lead it around with the arc. The player can aim to land near others, or to drop into crowded or empty areas.

Results (Quotations)

Upon initiating this mode, the target begins to make a sound. This sound could potentially change based on rotation; however, rotation seems to play a secondary role here, so it doesn’t strike me as particularly important and can be neglected. The sound of the target should either play in short intervals, or preferably be a looping sound that plays continuously, like a hum or chord. Alternatively, you could take over the third person and temporarily control them as if they are your avatar. You could also do this when the third person needs to stay with you. You could define a position, such as 5 feet behind you, where the third-person avatar attempts to remain. But note: For completely blind players, I do not believe there is any distinction between first-person and third-person perspectives. Moving the camera back behind a player can be a visual benefit to sighted players (allows you to see yourself, possibly gain a little bit of increased field of view and possibly see around corners) but this does not translate once visuals are removed. To a fully blind player the change to third-person perspective is simply an adjustment to sound volumes, without anything gained.

Discussion

It is important to emphasize that there is little to no distinction between first-person and third-person perspective in audio, and BLVIs work best in a first-person egocentric interface [32,158]. There is an expectation that analog stick movement is part of this interface, and that is how the portal is placed. Survive the Wild does have an out-of-body experience, where one can move to hear ground textures around their character without physically moving, but this is to avoid danger to their avatar, not to navigate [115]. It is advised to avoid using this method of movement in non-visual interfaces.

3.1.5. Hotspot Selection

Description

Users can jump from hotspot to hotspot, but no other movement is supported (e.g., interactions take place around a table, with chairs for users, and users may only sit in the chairs).

Results (Quotations)

Each time a new hotspot is selected, the hotspot emits a singular sound placed at the position in 3D space of the selected object. This sound can change based on what object is being selected—chair, door, etc. It should also use speech, either pre-recorded or synthetic or both, to convey additional information about the selected hotspot object and what one is supposed to do with it.

Discussion

The assumption here is that the keyboard or other device can be used to select the different areas, and that the screen reader will read the information about that area. A Hero’s Call has a key command to cycle through targetable objects in the vicinity by pressing Ctrl+a and Ctrl+d, but this is used for picking up items and talking with characters rather than moving to a particular location. Here is an example: The user presses “m” to open the menu to select their location. As they press the up and down arrow keys to move through the locations, they hear a short sound in spatial audio at the object’s location, such as a wood clunk to represent a table, along with the name of that object (e.g., “meeting room table”). The user will press “enter” to select the object and move there.
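A minimal sketch of this menu-driven hotspot selection follows, assuming hypothetical speak() and play_sound_at() helpers for screen reader output and spatial audio; the hotspot names and sound file names are illustrative.

```python
# Sketch of hotspot selection: arrow keys cycle through hotspots, each
# announced by name with a short sound placed at its 3D position; Enter
# confirms and the engine moves the avatar there. All helpers are placeholders.
from dataclasses import dataclass

def speak(text: str) -> None:                                    # placeholder
    print(text)

def play_sound_at(sound: str, x: float, y: float, z: float) -> None:  # placeholder
    pass

@dataclass
class Hotspot:
    name: str
    sound: str                                # e.g., a wood clunk for a table
    position: tuple[float, float, float]

class HotspotMenu:
    def __init__(self, hotspots: list[Hotspot]) -> None:
        self.hotspots = hotspots
        self.index = 0

    def move_selection(self, delta: int) -> None:
        self.index = (self.index + delta) % len(self.hotspots)
        spot = self.hotspots[self.index]
        play_sound_at(spot.sound, *spot.position)   # spatial cue at the hotspot
        speak(spot.name)                            # e.g., "meeting room table"

    def confirm(self) -> tuple[float, float, float]:
        spot = self.hotspots[self.index]
        speak(f"Moved to {spot.name}")
        return spot.position                        # engine teleports avatar here

menu = HotspotMenu([
    Hotspot("meeting room table", "wood_clunk.ogg", (2.0, 0.0, 3.5)),
    Hotspot("door", "door_tap.ogg", (-1.0, 0.0, 6.0)),
])
menu.move_selection(+1)       # plays the door tap at its position, speaks "door"
destination = menu.confirm()
```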

3.1.6. Other Thoughts in “Locative Movement”

Results (Quotations)

If the game has automatic movement, such as in rail shooters, art, or sequences in moving vehicles, for example, this should be represented in an environment-friendly way, such as the following:
Cars have engine sounds, characters could walk and emit footsteps, wind sound could be used to represent movement without a particular movement constraint.

Discussion

The best way for non-visual users to know they are moving in VR is to be in control of that movement; otherwise, it is unclear whether objects are simply passing by or the users themselves are moving. This is why games such as Cyclepath, which have the user driving a motorcycle without holding down the gas, use different sounds for the motorcycle at rest and in motion [159]. The user starts at rest and is told to press space to set the motorcycle moving; they then hear the sound change and can associate each sound with its motion state.

3.2. Camera Positions

3.2.1. POV Shift

Description

Shifting the POV from first- to third-person view (e.g., watching an interaction from a first-person vs. third-person perspective).

Results (Quotations)

POV changes do not really exist for blind players and should be avoided when possible. If there is a strong need for a POV shift, it should be represented using either speech or a recognizable sound cue. The position of the listener should gradually change to represent this and not change abruptly. This assumes POV changes would affect my listening location, otherwise a change in POV would mean absolutely nothing. I am moving around, and the game changes me to a third person perspective POV. Perhaps I hear a sound letting me know this change has taken place, which is a good warning against the confusion that would come next. Even though I am not moving, all of my senses (sound in this case) tell me that I am being moved backwards. When the movement stops, I would assume I am now several steps farther away from the counter I was approaching moments ago. I start moving forward and collide with it much sooner than expected, in fact the sound of the bartender still makes it seem as though I’m too far away. I continue moving around the room but keep misjudging my surroundings. I overshoot things I wanted to interact with and struggle to navigate by sound, because my body is now invisibly floating out ahead of me somewhere, but I cannot visually see it lining up with things like my sighted counterparts can. I can no longer use sound to position myself correctly.

Discussion

As mentioned by the participants, this third-person perspective should not be used. As shown by [158], BLVIs think best using an egocentric interface. If a third-person style of perspective is desired (e.g., when playing chess or moving troops around a map), using a grid mode (discussed further in the analog stick movement section) would be appropriate [113]. A Hero’s Call uses this grid mode to show a map of the area, and Tactical Battle uses grid mode to allow users to move their troops around a map [120,155].

3.3. Facial Control

3.3.1. Expression Preset

Description

The user has selectable/templatized facial expressions to choose from in menus or interfaces. Presets manipulate the entire face, not individual features. Presets can be used dependent on other emote behaviors (not controlled independently by the user) or independently, where users choose an expression from the interface that they wish their avatar to display (e.g., smiling moves mouth and squints eyes, crying creates tears in eyes and turns mouth down).

Results (Quotations)

The main idea that comes to mind here is to simply display an emoticon or a one- to three-word text string next to the name of the avatar displaying the expression, wherever such names are shown in a list of nearby avatars or entities. A small sound sequence could be made to loosely indicate someone’s current expression as well either when it changes or when the avatar gets in range of sight of the listener. It is, I think, very possible to indicate expression or emotion with just a few well-produced and rapid musical notes, and as such if they are played in the right manner, it may at least communicate a subset of the information attempted. Listeners would be able to remember the UI sound associated with a common expression. On the next row [in Puppeteered Expressions] I describe a sort of audible facial description system using tonal audio waveforms, that would apply to this section as well. A text string that the listener can access at will, though, would be far more diverse and would require no learning curve. In response to pressing a button in the interface to indicate nearby avatars, my screen reader would announce instead of just the avatar’s name and distance from me, smiling George is off to the left. Frank is behind and off to the right. If I am supposed to involuntarily know people’s expressions, the same text string could be used to describe the avatar all the time. Angry face Joe has just entered the room, for example, could be spoken if Joe enters the room while being straight in front of the listener. This would fit into the method described under “Words on body” (C-40). This is essentially a form of avatar customization, and a description of the preset expression can be added to the description list.

Discussion

The best example of an accessible facial control system matching the above description can be found with any text multi-user domain (MUD) and is demonstrated in Recording 2 [160]. As mentioned, the screen reader will announce in real time what the user emotes (e.g., “John smiles”), and if the user desires, they can look at a character to see what emotion they have (e.g., target Joe with Ctrl+d and press “l” to look at him: “Joe is a smiling elf with long blond hair who is…”). Additionally, if socialization is important, the message when you target or read the character in a list could describe their facial expression. The most important thing is that the expression descriptions are short and can be turned on and off.
The quotation above also proposes short musical tones to indicate frequent expressions, which matches the co-designed experience in [99]. Participants proposed that smiling be represented by high-pitched chimes and a frown by a low-toned trumpet sound, both in spatial audio. The biggest difficulty participants had with these audio cues was learning what all of the sounds represent. Coupling the sounds with speech increases the learnability of these musical phrases [107].
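The sketch below couples a short earcon with an optional spoken description, as recommended above; the earcon file names, the EXPRESSION_EARCONS mapping, and the play_sound()/speak() helpers are all hypothetical.

```python
# Sketch: pair a short earcon with an on-demand speech description of
# preset expressions. Sound files and helper calls are placeholders.
def speak(text: str) -> None:              # placeholder: screen reader output
    print(text)

def play_sound(sound: str) -> None:        # placeholder: audio engine
    pass

EXPRESSION_EARCONS = {                     # hypothetical preset -> earcon mapping
    "smiling": "chime_high.ogg",
    "frowning": "trumpet_low.ogg",
}

def on_expression_change(avatar_name: str, expression: str, verbose: bool) -> None:
    # Always play the short cue; only speak it if the user has opted in.
    play_sound(EXPRESSION_EARCONS.get(expression, "neutral.ogg"))
    if verbose:
        speak(f"{avatar_name} is {expression}")

def describe_nearby(avatars: list[tuple[str, str, str]]) -> None:
    # avatars: (name, expression, relative direction), read on demand.
    for name, expression, direction in avatars:
        speak(f"{expression} {name} is {direction}")

on_expression_change("Joe", "frowning", verbose=True)
describe_nearby([("George", "smiling", "off to the left"),
                 ("Frank", "frowning", "behind and off to the right")])
```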

3.3.2. Puppeteered Expressions

Description

Users control and compose individual facial features (or linked constellations of features) through a range of possible expressions to varying degrees. A user might puppeteer an expression from slightly smiling to grinning broadly, and any point in-between extremes.

Results (Quotations)

Imagine if each controllable segment of the face (upper lip, lower lip, left eyelid, right eyelid, nostril flare, etc.) had a different and distinct audio waveform attached to it. To be clear, these would be constant tones. When the listener is interested in watching the face of another avatar, they focus on the avatar they want to watch, allowing these waveforms to become audible. The waveforms would change pitch based on the position of each facial element. The more curved down the lower lip, the lower the pitch of that particular waveform in the mix, the more curved up the upper lip is, the higher the pitch of the waveform indicating the state of the upper lip would go, etc. Volume could also be worked with here. A simple learning menu would of course exist that would allow the listener to learn what each waveform in the mix meant, preferably giving the listener the ability to change the controls on a sample face so that they could really learn the sounds. I’m pretty sure that enough listening to the output from this audible facial description method would cause the sounds that common expressions make to be remembered, so that by listening to just a fragment of the audio mix describing a face, the listener would understand the set of all features being displayed. Assuming you know how, you can convert between a couple hundred milliseconds of a chord to a few letters that can represent that chord, why not the same, but representing facial expression instead. If someone switches from a frown to a smile, you would even be able to hear the mix shift over time as the face visually changed. Alternatively, there probably is a list of facial expressions. The selected expression can be spoken.

Discussion

Note that no audio game utilizes this mode of representing facial expressions; instead, MUDs and other games that include some form of facial expression will say “John slightly smiles” or “John smiles broadly”. There is, however, an area of research called “sensory substitution” that does attempt to convey facial expressions through audio; see [161,162] for more information. Research in this area has been limited, and given the lack of adoption in the audio game community, there is little motivation for BLVIs to ascertain facial expressions beyond a simple text description.

3.3.3. Lip Sync

Description

The movement of the avatar’s lips or mouth synchronizes with the player’s voice audio (or with another voice track, such as a song).

Results (Quotations)

If the player is speaking, I think the listener can automatically assume that their lips are moving, and any additional information would be distracting from whatever the avatar is trying to say. If a user’s vision is too poor to see the lip movements of other avatars, I don’t believe it is important to devise an audio equivalent. The lack of lip movement might break the immersion of sighted users, because they are used to seeing that with every person they speak with in real life. If someone does not see lip movements ever, they won’t miss it once they enter the virtual world. There are times we can find a clever way to restore some social cue that blind users struggle to experience out of VR, but lip movement does not seem to be worth the effort. We must always ask ourselves if feature X even needs to be adapted. In the case of voice audio, I guess the real concern would be making sure that it is clearly indicated whether voice audio is coming from a source with lips, or something like a speaker system like a telephone or the voice of some robotic creature. This would simply be accounted for by applying an effect to the voice audio, but other than that the transmission of lip synchronicity I believe is already achieved in the talking itself in regards to voice audio. As for silent lip movement though, this is more tricky. An option may be the audible facial description idea I mentioned above [in the Puppeteered Expressions section], where the listener could focus or gaze at the person whose lips are moving, and the audio tones changing would indicate the movement of the lips. Alternatively, it may be possible to play generic audio of whispered fragments of human speech [phonemes] when a moving pair of lips passes the position required to emit such a sound. For example, if the lips aggressively part while the listener is interested in the facial activity of an avatar, the p sound would play. The same could be applied for as many lip positions as possible. When we think to ourselves it can often manifest in audible whisperings anyway, a similar concept could be used here to indicate silent lip movement.

Discussion

It is important to note that audio games in general lack facial expressions beyond a simple text description. The text description for someone mouthing words would be: “John mouths: hello”. There is an area of research called silent speech interfaces that aims to recognize facial muscle movement in order to detect speech, and this recognition could be used to generate text, although a private speech channel would probably be easier to implement [163,164]. In general, this is not a critical feature to have, unless it is a major part of the social interactions in the VE.

3.3.4. Gaze/Eye Fixation

Description

The ability for the avatar’s gaze to fixate on items or people in their environment. This can include the ability to fix the avatar’s gaze on an item or another player intentionally, track the nearest object in sight, move the head/gaze in the direction of movement, or set the avatar’s gaze to react to interactions (e.g., a new avatar joining the group).

Results (Quotations)

The most unobtrusive way to handle this mostly would involve the use of text strings, where the listener can press a key that would cause screen reading software to announce what entity or object the given avatar is currently looking at. Text strings would be the simplest way to keep from overwhelming a player with audio cues. However, I could think of a way to do this with just sound. I imagine if we’re talking about virtual reality, this includes a complete 3D environment, including the properties of audio. As such, imagine that if you press a key, a UI sound of a small laser beam would emanate from the position of the gazing avatar and quickly travel to the destination of the gaze. Almost like an inconsequential projectile, it would quickly travel from the source of the gaze to where the gaze lands, whatever the sound projectile impacted would be what the gaze was focused on. The sound could play automatically if the listener was focused on an avatar whose gaze changed. A less intrusive version of the same effect could happen if, for example, the listener is in a group of avatars who all suddenly turn to look at something, or even just when the gaze of an avatar you are not focused on changes. The only issue with sounds alone for this is if there are a cluster of entities in one area and an avatar gazes at just one of them. With sounds alone, it would be difficult to tell for sure, especially from more than a couple feet away, exactly what was being looked at. Combine sound with speech for this though? That would be cool.

Discussion

Eye gaze is not information BLVIs have access to in the real world, so the addition of this functionality does not seem to be a high priority, unless eye gaze is used as an important social cue or is a critical element in the game [92]. A simple text message (e.g., “John looks at you”) would be enough information to know someone is paying attention to what another character is saying. Using a targeting mechanism similar to what A Hero’s Call uses, with Ctrl+a and Ctrl+d cycling through the targetable elements in one’s proximity, would be enough to allow non-visual users to look at someone [120]. MUDs also allow a simple “l John” to represent “look at John”. The study that evaluated eye gaze [99] only investigated eye gaze that was focused on the user and did not indicate eye gaze towards other targets; it used controller vibration to indicate if someone was looking at a user. The recommendations by participants in this evaluation are to use spatial audio combined with speech messages instead of a vibrating controller. Additionally, the eye gaze indicator in [99] used real-time monitoring, whereas the recommendation here is to allow the user to query for the information. BLVI participants in [99] stated that having constant access to eye gaze was stressful, and although it allowed them to understand if they were being focused on, they did not think that was a positive aspect. The participants also indicated that although the vibrations were unobtrusive, they lacked nuance and would not be able to indicate eye gaze for other targets. Future work needs to investigate the above design of on-demand speech and audio cues alongside real-time vibration for eye gaze.
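A small sketch of the on-demand gaze query described above follows; the gaze table, the avatar names, and the speak() helper are hypothetical.

```python
# Sketch: the user presses a key while targeting an avatar and hears
# what that avatar is currently looking at. speak() is a placeholder.
def speak(text: str) -> None:              # placeholder: screen reader output
    print(text)

def query_gaze(observer: str, gaze_targets: dict[str, str],
               player_name: str = "you") -> None:
    """Announce what a targeted avatar is looking at, if anything."""
    target = gaze_targets.get(observer)
    if target is None:
        speak(f"{observer} is not looking at anything in particular")
    elif target == player_name:
        speak(f"{observer} looks at you")
    else:
        speak(f"{observer} looks at {target}")

query_gaze("John", {"John": "you", "Sally": "the whiteboard"})   # "John looks at you"
query_gaze("Sally", {"John": "you", "Sally": "the whiteboard"})  # "Sally looks at the whiteboard"
```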

3.3.5. Other Thoughts in “Facial Control”

Results (Quotations)

Generally, I’ve listed ideas I’d like to see in VR experiences, the only thing is that a lot of the stuff regarding facial expressions must be optional. For example, as noted above, I certainly don’t want lip sync indication when an avatar is talking. I know they’re talking. It must just be kept in mind that as a blind user in a VR experience, too much automatic reporting of expressions or gazes, etc., would serve to detract from the experience. Such things should be optional, or for example only trigger when you are in line of sight with the avatar’s face etc. Of course, the facial expression reporting should exist and would be awesome to see, but unless it is utterly unobtrusive I really don’t care to know, at least in some cases, about every expression change that the resident annoying troll in the corner is making. Basically, as far as developer implementation, just make sure we can look away from things at least to the degree a sighted person can.

3.4. Multi-Avatar Interactions

3.4.1. Physical Contact

Description

Interactions where two or more avatars interact physically (e.g., a high five, hugging, dancing, poking, shaking hands, or kissing). These can be intentional or unintentional (e.g., bumping into someone in a crowded room).

Results (Quotations)

Many MUDs, and at least Survive the Wild, I don’t know how many other audio games have socials. The user can select an action, such as laughing or whistling, and the associated sound will play from that player’s position. In a MUD with a soundpack, the effect is somewhat similar, enough to show the concept. In short [to represent character interactions, use] Foley sounds. Pretty much all of the examples here are not purely visual in nature anyway, in the real world they are just nearly silent. You can certainly hear kissing, the clasp of a handshake, certainly a high five, and unless little clothing, you can even sometimes tell that a hug is taking place. In a VR experience, these sounds would probably just be exaggerated. The Hollywood Edge Foley FX library [165] is a great example of one that contains many delicate human Foley sounds, anything from placing an arm on a chair to someone rubbing their belly to someone grabbing one hand with another to grabbing a shirt that’s being worn, etc. In addition, less obvious sounds could be described.

Discussion

As mentioned by the participants, Survive the Wild has a list of actions and emotes that can be performed individually and with other players, as in Recording 3 [115]. MUDs also have emotes that can be performed with other players (e.g., typing “poke John” will yield “You poke John”, as in Recording 2) [121]. Ideally, both a sound and a text message would be associated with each action. The sound should be played to all players involved in the action, and ideally in spatial audio to everyone else nearby.

3.4.2. Avatar–Avatar Collisions

Description

Avatars collide with one another. These are intentional collisions and may result either in moving through the other player’s avatar or bumping off of them.

Results (Quotations)

I think this can just be indicated with a sound that is more exaggerated than how it would actually sound in real life. Bang two folded towels together, maybe two books wrapped in cloth, two thick jackets or whatever, and there is your collision sound. If an avatar passes through another, a tiny cinematic or magical effect would probably handle the situation nicely to indicate a defiance of physics and the passthrough. Alternatively, a small shuffling sound to indicate that, for a brief moment, one avatar stepped aside and made room for another.

Discussion

In addition to a short sound, a speech message saying “John bumped into you” or “John walked through you” would help users learn what the sound signifies. This should also be something that can be turned on and off: [92] discussed how avatar collisions can be used to annoy or disrupt the experience for other players, but they can also signify social communication, so they should not be left out of the experience.

3.5. Gesture and Posture

3.5.1. Micro-Gestures

Description

Automatic gestures, such as eye blinking, shifting weight and other actions people perform without conscious effort.

Results (Quotations)

Set your avatar to randomly twirl their mustache. Rustling of cloth for body movement, and possible earcons like quick blink sounds for things like eye blinks. This is also not completely necessary. We make a lot of little micromovements, and if each one was accompanied by a sound, it would quickly get overwhelming, just like how sighted people stop focusing on something, this should only be audible if either you have this feature enabled or are consciously paying attention to a person. I do not wish to be notified of every eye blink of every avatar, that would quickly become an annoyance during a voice chat session, for example. I only wish to be automatically notified if such a blink is communicative in any way, for example the reaction of the eyes after being suddenly hit by sunlight might be interesting to hear/be notified about, but any micro-gesture that cannot be represented by a light Foley sound and that is also not communicative or important in any way, I feel, should not produce any notification other than that which the listener wishes to receive. If it is your own avatar, you should be able to select automatic body actions to happen randomly.

Discussion

This should be something that can be turned on and off. MUDs give participants a way to perform these types of gestures, and other games often have characters sigh or scratch, with an accompanying sound, every so often [160]. Because this could be abused and become rather annoying, it is important that users can disable the option.

3.5.2. Posable/Movable Appendages

Description

The avatar’s body movements that occur when the head/torso/arm/leg of the avatar moves. These movements change in response to the player’s head/torso/arm movement in space. They could be mapped from the headset, sequenced to move together based on actions (e.g., feet moving when legs walk), or can be independent of other movements (e.g., finger movements can be used to modify other gestures like waves or communicate specific number/approval information).

Results (Quotations)

If voice chat, adjusting relative position of voice. It may be interesting to consider positioning the sounds for an avatar using multiple HRTF [head-related transfer function] sources set near each other, one for each major appendage. If an avatar snaps with their left hand extended, and the listener is standing right behind that avatar, the sound of the snap playing slightly off to the left in conjunction with the position of the extended appendage would already communicate much. The constant updating of the positions of such sounds as the avatar moves combined with some method of a speech description (Fred with left arm extended) should be able to communicate all important information about the basic position of an avatar’s body, with the sound positioning and very slight sounds for each movement, it should be possible to get a clear picture. I do think that for particularly communicative gestures (thumbs up or raising five fingers), they should be detected and spoken as text strings to a listener if possible.

Discussion

Survive the Wild utilizes positional voice chat, but few games have separate sound sources for the hands, face, and each foot [115]. Because these interactions are performed independently of an emote system, text descriptions will be difficult to obtain other than through a gesture recognition system or a large language model with image recognition [166]. According to [92], gestures are often more important than verbal speech in VEs, and it is critical that players be able to recognize body movements quickly and accurately. Without access to this information, non-visual users will be left out of much of the communication and may seem rude or robotic to other players.

3.5.3. Mood/Status

Description

The way that the avatar’s movement may communicate the avatar’s general emotional state. When the mood/status changes, movements may change subtly to match the mood. These could be direct or indirect, where the interface has presets, allowing them to choose from templatized/selectable moods, or users could customize how mood affects movement.

Results (Quotations)

Textual descriptions of facial expressions. If some of this is customizable by the user, allowing the user to set some sort of custom mood string (just a few words long) should they wish to, which would then be announced by the screen reader of an observer, may also be beneficial here, at least among anyone who would wish to take advantage of such a feature.

Discussion

As with facial expressions, the mood should be included in the announcement when an avatar enters the room, or attached to the avatar’s description. An example of the first would be a user pressing Ctrl+d and Ctrl+a (or opening a menu) to cycle through players and hearing “Sally is here, looking glum”. If the user focuses on Sally, they could hear something like “Sally is a glum looking human…”. MUDs handle this kind of emotion in an effective non-visual way [160].
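The sketch below shows one way a short, user-set mood string could be folded into both announcements; the Avatar class, its fields, and the speak() helper are hypothetical.

```python
# Sketch: attach a short, user-set mood string to an avatar's
# announcements and description. speak() is a placeholder.
def speak(text: str) -> None:              # placeholder: screen reader output
    print(text)

class Avatar:
    def __init__(self, name: str, species: str = "human") -> None:
        self.name = name
        self.species = species
        self.mood = ""                     # a few words at most

    def set_mood(self, mood: str) -> None:
        self.mood = mood[:40]              # keep custom mood strings short

    def short_description(self) -> str:
        # Spoken when cycling through nearby players or on room entry.
        return (f"{self.name} is here, looking {self.mood}"
                if self.mood else f"{self.name} is here")

    def focused_description(self) -> str:
        # Spoken when the user focuses on ("looks at") this avatar.
        mood = f"{self.mood} looking " if self.mood else ""
        return f"{self.name} is a {mood}{self.species}"

sally = Avatar("Sally")
sally.set_mood("glum")
speak(sally.short_description())      # "Sally is here, looking glum"
speak(sally.focused_description())    # "Sally is a glum looking human"
```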

3.5.4. Proxemic Acts

Description

Movements or adjustments made in relation to how close or far away users should be able to perceive communication; for instance, increasing the voice volume of speech or the text size of written messages.

Results (Quotations)

Definitely HRTF audio. Volume [loudness] for distance and panning for position. Survive the Wild uses similar systems [115].

Discussion

Recording 3: Emotes Example—STW Socials. (https://www.openicpsr.org/openicpsr/project/224701/version/V1/view?path=/openicpsr/224701/fcr:versions/V1/Emotes-Example---STW-Socials.mp4&type=file, accessed on 1 April 2025). Recording 3 has an example of the proximity of a non-player character, and a brief example of a player character playing an instrument in the distance.
HRTF audio is also known as spatial audio. Survive the Wild (STW) allows users to verbally shout or whisper to players. Users can press a key to hear how far away someone is. Footsteps and other character sounds are played using spatial audio in STW, as shown in Recording 3. Many MUDs have some kind of prefix to a command to say the intensity (e.g., “Sally shouts I’m over here!” or “Fred loudly sneezes”). These are often coupled with a proximity check so users who are far away may hear a message, “You hear someone sneezing in the distance” [121,160]. Note that text size, casing, emphasis, and other formatting features are not typically reported by screen readers [149].
The study in [98] evaluated HRTF (or spatial audio) with a refined version of the above design, splitting proximity into three categories (Intimate, Conversation, and Social). Participants were able to customize each combination of proximity level and friend or stranger, and could choose between musical phrases, real-world sounds, and speech messages. The study found that with a small number of avatars the experience was useful and understandable, but participant distraction increased as more avatars entered the space. This evaluation only investigated real-time alerts rather than coupling the non-verbal cues with on-demand speech messages, so future work will need to compare on-demand messages with real-time messages.
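As a worked example of “volume for distance and panning for position”, the sketch below uses a simple inverse-distance rolloff and a stereo pan derived from the angle between the listener’s heading and the voice source. A real HRTF renderer would replace both formulas, and all parameter values are illustrative.

```python
# Sketch: loudness from distance, pan from relative angle.
import math

def distance_gain(distance: float, rolloff: float = 1.0,
                  min_distance: float = 1.0) -> float:
    """Inverse-distance attenuation, clamped to 1.0 inside min_distance."""
    if distance <= min_distance:
        return 1.0
    return min_distance / (min_distance + rolloff * (distance - min_distance))

def stereo_pan(listener_xy: tuple[float, float], listener_heading: float,
               source_xy: tuple[float, float]) -> float:
    """Return pan in [-1, 1]: -1 hard left, 0 centre, +1 hard right."""
    dx = source_xy[0] - listener_xy[0]
    dy = source_xy[1] - listener_xy[1]
    angle_to_source = math.atan2(dx, dy)        # 0 = straight ahead (+y)
    relative = angle_to_source - listener_heading
    return math.sin(relative)

# A voice roughly 8 metres away, slightly to the listener's right:
gain = distance_gain(8.0)                       # ~0.125 of full volume
pan = stereo_pan((0.0, 0.0), 0.0, (2.0, 7.7))   # ~0.25, a little to the right
```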

3.5.5. Iconic Gestures

Description

Use of gestures which have a clear and direct meaning related to communication. These include social gestures (e.g., waving, pointing) and representational gestures (e.g., miming actions, such as scissors with one’s hand).

Results (Quotations)

Textual descriptions of gestures via TTS [screen reader, or text-to-speech]. Any MUD with emotes is probably a good example.

Discussion

Survive the Wild provides a list of emotes, such as yawn, jump, scratch, etc., that play the sound of that gesture in spatial audio, as can be heard in Recording 3 [115]. MUDs often have a list of emotes that users can perform (e.g., “clap”, “dance”, “yawn”, etc.) that have a description for the user (e.g., “You yawn”), and a description for everyone else (e.g., “John yawns”), as demonstrated in Recording 2. There is also often an “emote” command that allows users to create a custom emote that will just paste the text that is written after the user’s name, so one would type “emote yawns”. This would show the player “You yawns” [sic], and show everyone else “John yawns” [121,160].
The study in [99] also evaluated head-shaking and head-nodding sound cues conveyed through musical phrases: head nodding was a high xylophone pitch followed by a low pitch, and a head shake was a low flute note followed by a lower flute note. Participants appreciated having access to these gestures, as it empowered them to understand whether the people they were talking with agreed with what was being said, and they found the musical notes unobtrusive. Future work needs to investigate how the short speech descriptions presented in this paper compare with the short musical phrases in [99]. Theoretically, short speech messages, especially if the speech is placed at a different spatial audio location than the user’s screen reader (giving the user a “cocktail party effect”), should be more functional and easier to learn than the musical phrases [107,108].

3.5.6. Other Thoughts in “Gesture and Posture”

Results (Quotations)

Make sure there is some sort of table of constantly updating information or keystroke a user can quickly press that prints a list of nearby avatars with some textual information about their posture. I imagine if someone enters a room, they can glance around to take note of the current posture of avatars; visually impaired users need this information as well, what use are posture updates if one cannot have known the previous posture being changed?

Discussion

This overview of other players could be offered both as a menu and as a cycling option. For example, a menu of players could be opened by pressing the “l” command, and the user could move through the menu with the up and down arrow keys. Each menu item would be a player’s name plus their emotional and expression state (e.g., “Fred is grinning widely, he’s feeling great” and “Sally has a stoically straight face, she is focused”). If the user is focused on the gameplay screen, they could instead use a command (e.g., Ctrl+d or Ctrl+a) to cycle through the focusable objects and hear each player’s name followed by that player’s state. This is very similar to how MUDs convey the emotions of characters and non-player characters [121].
It is important to note that the designs presented here are user controlled. A button is pressed, and information is provided. This user-controlled method of querying is the recommended technique as presented in [126]. This is in contrast to the cues evaluated in [99] and [98] that were real-time streams of information. Possibly both kinds of cues will be desired by BLVIs, but future work needs to refine and evaluate these different designs.

3.6. Avatar Mannerisms

3.6.1. Personal Mannerisms

Description

Smaller movements that avatars perform periodically and/or repetitively (e.g., rocking, swaying, tapping feet, shaking legs, twisting/pulling hair, playing with clothes/jewelry).

Results (Quotations)

If we are talking about pre-designed actions you can tell your avatar to perform, then it could be handled in the same way as other gestures (such as hand waving, smiling, or a thumbs up) with earcons or extra layered Foley sounds. Make sure that either sounds representing such actions are very well done or allow the report of such information to be optional. As described in previous sections, BLVIs don’t want to be notified of someone’s eye blinks any more aggressively than a sighted user would be. If a player drums their fingers on a table, make certain the volume of such a sound is quiet relative to how such a thing would sound in the real world, for example. If we are talking about the actual motions of other users, then I suspect it should be ignored. Depending on how accurately the VR headset/hand controls pick up motion, a sighted user may notice that someone in the room is gently rocking back and forth or has a nervous tick where they physically jerk without intending to. A blind user does not need to be specifically told about those movements, because the person making them would probably prefer that the sighted user doesn’t see it either. An involuntary or nervous movement is not often something someone wishes to announce to those around them, which is why they are involuntary movements and not an intentional hand wave or other form of expression. If I stutter and give a speech, I cannot help those in attendance hear my stuttering. If I dictate my speech to someone writing it in an email, it would frankly be insulting if they misspelled all of my words to relay my stuttering to the eventual reader. In the same way, involuntary/unintended motions may be picked up by sighted users, but going out of our way to translate that to those who wouldn’t otherwise notice is just rude.

Discussion

These types of gestures are often not conveyed in audio games, but they are sometimes conveyed in MUDs. In audio games (e.g., Swamp), a heartbeat sound starts playing as the player’s character gets close to death; it plays even when the user is out of combat, but there is no personal mannerism that lacks a gameplay purpose [119]. In some MUDs, users can either program custom mannerisms, or their state or racial choice will define a mannerism (e.g., “A young tyke climbs the walls!”) [121]. As described above, this kind of behavior should be minimal, and users should be able to ignore it or turn it off.

3.7. Conventional Communication

3.7.1. Visual Communication Support

Description

Visuals (e.g., mind maps, whiteboards) used to record and organize ideas while meeting, brainstorming, or collaborating with others.

Results (Quotations)

In regards to showing visual displays in HTML for screen readers. Consider addons for the Non-Visual Desktop Access screen reader that actually turn the screen into a 2-dimensional HRTF field for the user. If the user passes a link, a little click can be played, in HRTF, at the position on the screen where the item is visually located.
There are two possible displays: 1. A standard chat window: If one were to make an accessible display of such a board that followed the standards of an accessible chatting application (modified to suit the purposes of the board), a blind user should have no trouble viewing the information, even if not done through a virtual lens or gaze. Consider that a blind user of a chatting application does not typically get behind in the conversation. The keystroke convention examples below assume a Windows keyboard but could be adapted to whatever control methods are available. Such standards may include but are not limited to home and end for top or bottom item (sorted by date added usually), up and down arrows to scroll between messages (quickly announce ONLY prevalent info here—in a chatting app this would be name: message, sent at date), BEING ABLE TO HOLD ARROW KEYS TO SCROLL THROUGH ITEMS ON THE BOARD QUICKLY, and instantly announcing new changes to the board or visual display as they are made. Consider that many websites are accessible to a blind person that contain many complicated structures of information; showing the user a small browser window with an HTML representation of a visual display is probably the easiest for both developers to integrate, and for VI users to understand. That being said, being able to use a virtual gaze to understand the layout of the visual display would be really cool indeed if done correctly, as in a browser window, a blind person doesn’t know where an element they are focused on is visually located on the screen. 2. Virtual gaze: This one is trickier than my bulletin board approach below [in Public Postings], because it is likely the data will be actively changing as new messages are added to the board. I still think the bulletin board approach will help the user visualize the layout of the messages, but changes will not be immediately obvious without the user happening upon that location to read that the message is different. New messages being added would/could quickly force background music to be changed for existing entries, causing confusion to a user who had already begun to depend on them to navigate the board. Without an example put together to test this idea, I have no idea whether a blind user would be able to follow the changes, or if they would become hopelessly lost and left behind. The bulletin board approach might end up working fine for actively changing whiteboards, but it would need to be tested.

Discussion

Existing non-visual collaborative document conventions (e.g., up and down arrow keys to move by line, information about comments, and status messages when users are editing a line) are effective in most professional settings today, so they should be used, although they can be cognitively intense (see [167] for the existing conventions). There has been some research on finding more intuitive interfaces for these types of collaborative documents, but no standard has emerged [168]. The chat window example above is similar to other kinds of chat experiences in collaborative documents or chat clients [167].

3.7.2. Public Postings

Description

Areas that host messages meant for public viewing (e.g., bulletin boards).

Results (Quotations)

I am assuming such a bulletin board would not be neatly arranged in grids, but rather be messages thrown about in possibly different sizes and orientations. I imagine this like the map coloring puzzles, where algorithms can be used to determine the minimum number of colors needed to fill in a map where no adjacent states share the same color. Rather than color I’m imagining a handful of unobtrusive background sounds like background hums. More like drones instead of music. Maybe white noise with a set of triangle waves in a chord. Something soft and unobtrusive. The board itself would invisibly divide up all entries posted on it, assigning jagged borders equal distance between them. Using a “map coloring puzzle” technique, entries are assigned background sound so that no adjacent entries use the same one. This should always keep the number of required background sound files to a handful or less. As the user’s gaze moves from entry to entry, the background sound adjusts in real time, and the text at that location is spoken using text-to-speech. The user can perform a gesture to open the message in a standard text box so they can read it with their screen reader using the screen reader’s commands. It might even be useful to adjust some attribute of the voice based on the size/bold/underline of the text, since often on a whiteboard an important message may be written bold with an underline, while less important messages are smaller. A pitch adjustment to the voice could help convey emphasis, which may be important depending on the situation. This is only if the user has decided to use the built-in screen reader, but changing the voice is not possible when interfacing with the user’s own screen reader. The main purpose of the background sound is to help the user construct the 2D layout of the messages, and to help as “landmarks” when moving their focus around. Initially looking at a board of messages is going to take time slowly looking it over, which is expected. After moving around to read/have read several messages, going back to revisit a previous message is far easier if your brain has linked it to the background sound. As you turn your head back in the direction you believe it was, each change in sound clearly marks the borders between messages, letting you keep track of how far you’re moving. If the user chooses, the sound will play for a moment before the text has a chance to be read, letting someone quickly move four or five messages until they stop on the sound they are looking for, without needing to sit through several spoken words, checking to see if those are how their desired message should start. The reason for this is because the problem with a finite number of music tracks or sounds, particularly that play several seconds before the text of a post is read, is that this alone is not sufficient for quickly browsing between posts or locating one you are interested in. This is because, even if there is an assurance of different tracks on each border, two messages that are across the room or that are many feet apart from each other may end up using the same track, thus causing me to have to listen to it for seconds to determine that the track doesn’t correlate to the message I’m looking for. 
As such, in optional addition to the sounds/tracks that let one learn the physical layout, I think that every time your focus fixes on a message, some sort of post number, author username, subject slug or any other form of very short identifier should be announced to the user instantly upon message focus as well. Then I can simply remember something like 95, Marcus, July 30 public announcement, 09:31 PM, or literally anything else that is very quickly presented to relocate a post I am interested in with speed. If you come in view of a message you feel you may want to view again, you should be able to somehow highlight or bookmark the message, so that the UI can instantly alert you if you return your gaze to a message you have interest in. There will be times when a user gets a little lost in their search, but I do believe background sound will help tremendously and save time. Finally, if I know I have located a post, there needs to be a way to interrupt the delay between the start of the background sound and the text of the post, or such a delay needs to be optional, as I do not want to wait a couple of seconds after finding my post for it to begin reading.
Ensure, though, that when a message is viewed it is not just spoken, it should be imperative that such a message show up in an accessible and navigable read only edit box or other browsable element so that a blind user can scroll through the message with their screen reader, similar to a sighted user. If not this, a way to, for example, copy the focused message to a user’s clipboard in plain text, so that the user can optionally browse it in their own text editing application.
Alternatively, randomly placed messages could be collected and transformed into a list which can just be browsed with the user’s screen reader, similar to an email client.

Discussion

Survive the Wild has system messages and other collaborative boards presented in a menu, where players can use their arrow keys to navigate through the messages, and enter to select one for further review [115]. MUDs typically will have a bulletin board in a town square or other central location that users can look at to view notices, then specify the number of a particular notice to view in more detail [121]. In Swamp, text commands, like those in MUDs, can be entered by pressing “slash” or “/”, and typing the command [119].
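The “map colouring” idea in the quotation above can be sketched with a simple greedy graph colouring, shown below under stated assumptions: the adjacency between board entries is supplied by the layout engine, the drone file names are placeholders, and greedy colouring does not guarantee the minimum palette (it is merely usually small enough for a board-sized layout).

```python
# Sketch: assign background drones to bulletin board entries so that no
# two adjacent entries share the same drone (greedy colouring).
DRONES = ["drone_a.ogg", "drone_b.ogg", "drone_c.ogg", "drone_d.ogg"]

def assign_drones(adjacency: dict[str, set[str]]) -> dict[str, str]:
    """Map entry id -> drone file, with no adjacent pair sharing a drone."""
    assignment: dict[str, str] = {}
    # Colour the most-connected entries first to keep the palette small.
    for entry in sorted(adjacency, key=lambda e: -len(adjacency[e])):
        used = {assignment[n] for n in adjacency[entry] if n in assignment}
        # Raises StopIteration if the palette runs out; add drones if so.
        assignment[entry] = next(d for d in DRONES if d not in used)
    return assignment

# Entries whose regions on the board touch each other (symmetric adjacency):
board = {
    "post_95": {"post_96", "post_97"},
    "post_96": {"post_95", "post_97"},
    "post_97": {"post_95", "post_96"},
}
print(assign_drones(board))   # each post gets a distinct background drone
```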

3.7.3. Synchronous Communication

Description

Non-verbal communication between two or more avatars in real time (e.g., signing, writing).

Results (Quotations)

Old school chalk and whiteboards serve as a proof of concept here. The rubbing of the chalk and the squeak of the marker both provide this auditory feedback, starting and stopping to indicate how long the movement took place, and a pitch change tied to the speed of that movement. I wouldn’t be able to tell you what someone was drawing or what they were writing on the chalk/whiteboard, but from sound alone I would know if they were drawing or writing, and could let you know any time they switched from one to the other. Drawing and writing would sound very different, and as a human I can learn to tell the difference. No sound effect had to play to inform me that the person was writing, another that they switched to drawing, then back to writing, and finally that they’ve stopped. Each of those sound cues would have to be memorized independently, and you’d need a ton of them as possible movements added up. The chalk/whiteboard example shows that even a single sound (started and stopped correctly plus pitch adjusted for speed) lets me determine all of those actions on my own.
When enabled for accessibility, I think unobtrusive sounds could be linked to many avatar motions. As an example, if someone turns their head left it could play a soft creaking sound for only as long as their head was turning, and possibly pitch adjusted to express the speed (distance head actually turned). A different creaking sound would represent a head turning to the right. In combination, a person slowly shaking their head “no” would resemble a creaky door being moved open and closed a few times. With the pitch expressing speed and the sound stopping as the head motion stopped, hearing alone would easily be able to differentiate someone slowly shaking their head “no…?” in an unsure/questioning manner versus a strong “No!”. The sounds would need to be carefully selected to keep them from being distracting or annoying, and ideally paired so expected combinations would sound nice. The sound of a creaking door being opened and closed a few times would seem to fit together, much better than the mooing of a cow and the ring of an old telephone being alternated. Head nods up and down would have their own sounds. Without needing to capture every possible motion, some trial and error could be used to determine a minimum functional set of body movements to be given such sounds. I can imagine benefits to the sound of each wrist moving closer/farther from the body, also one for each hand being turned face-up or face-down. These eight sounds alone (close/away/face-up/face-down per arm) would help indicate if the other person was offering to hand you something or take something from you. A short moment of the right wrist turning toward face-up, plus the wrist moving away from the person, is likely them initiating a hand shake. Even without shoulders, both wrists moving to face-up is likely a questioning shrug. Depending on the size of the message being drawn out, a person moving their hand to draw should eventually be determinable through the sound of their wrist rotating and moving toward/away from the body. I believe it would sound different enough from a person “speaking with their hands”, which is of course itself worth capturing. After enough conversations with someone, the sounds of their body language softly playing in the background as they speak, should help add to the experience. Insight can be gained about a person’s excitement level by seeing how they move their hands as they speak, as much as hearing the words themselves. I’m hoping a few dozen carefully chosen background sounds could solve several problems at the same time. For training purposes, a user should be able to activate these sounds for themselves, so they can begin associating those sounds with their own movements.

Discussion

The physical action of writing was discussed above, but the message itself should be sent as text, similar to how other collaboration tools handle chat non-visually [167]. The gesture interface described above resembles body gesture sonification platforms (e.g., [169]), but these have never been used in any large deployment. Large language models and other machine learning models can often recognize basic gestures and sign language letters, yet most sign language recognition systems recognize only fingerspelled letters (the equivalent of people verbally spelling out each word rather than saying it) [170]. There has been little research on sonifying arm, hand, face, and body gestures for sign language, and this could be a promising direction. Because VR equipment already captures a significant amount of body movement, it presents an opportunity that is not available in the real world.
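To make the continuous-sonification idea concrete, the following minimal TypeScript sketch (using the Web Audio API) maps the angular speed of a tracked joint, here the head, to a tone that plays only while the joint moves and whose pitch rises with speed. The function name updateHeadTurnSound and all constants are illustrative assumptions, not taken from any existing audio game.

```typescript
// Minimal sketch: sonify head rotation with a continuous tone whose presence
// indicates motion and whose pitch tracks angular speed. Assumes the VR
// runtime reports the head's angular speed (rad/s) every frame.
const ctx = new AudioContext();
const osc = ctx.createOscillator();
const gain = ctx.createGain();

osc.type = "triangle";
gain.gain.value = 0; // silent until the head actually moves
osc.connect(gain).connect(ctx.destination);
osc.start();

const BASE_HZ = 220;       // pitch when barely moving
const HZ_PER_RAD_S = 120;  // pitch rise per rad/s of head turn
const MIN_SPEED = 0.05;    // ignore sensor jitter below this speed

// Call once per frame with the current angular speed of the head.
function updateHeadTurnSound(angularSpeed: number): void {
  const now = ctx.currentTime;
  if (angularSpeed < MIN_SPEED) {
    // Head is still: fade out quickly so silence reliably means "no motion".
    gain.gain.setTargetAtTime(0, now, 0.05);
    return;
  }
  // Pitch encodes speed; a slow, unsure head shake sounds lower and softer
  // than a sharp, emphatic one.
  osc.frequency.setTargetAtTime(BASE_HZ + HZ_PER_RAD_S * angularSpeed, now, 0.02);
  gain.gain.setTargetAtTime(0.15, now, 0.02);
}
```

A paired sound for the opposite direction, and similar mappings for wrists and hands, could be layered in the same way without introducing new sound effects to memorize.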

3.7.4. Asynchronous Direct Communication

Description

Emails/messages sent directly to someone, but which may not be received in real time.

Results (Quotations)

I don’t think anything needs to be changed from how sighted users receive emails/messages. For sighted users I assume some icon somewhere indicates they have unread emails/messages, and once opening that they are presented with the message and are able to reply. When a new message arrives while they are active, I assume there is some sort of sound in addition to a visual cue. For blind users they’ll know if a message arrives while they are active because of the same sound sighted users hear. When signing in, the same sound could be played to let them know existing messages are waiting from when they were away.

Discussion

Recording 4: Email Client Example. (https://www.openicpsr.org/openicpsr/project/224701/version/V1/view?path=/openicpsr/224701/fcr:versions/V1/Email-Client-Example.mp3&type=file, accessed on 1 April 2025). Recording 4 shows navigating through an email client with a screen reader.
Many MUDs allow users to send mail to other players either by sending a letter through a post office or by simply typing “mail Fred” [121,160,171]. Users can check the subject lines of their mail by typing “mail”, and view a particular message by typing “mail” followed by the message ID (e.g., “mail 1”). Moving this to a more interactive environment, the subject lines could be placed in a menu; pressing enter on a subject line opens a text box where the message can be read, as shown in Recording 4. Typical email commands (e.g., “Ctrl+r” or “r” for reply) can then be pressed to reply to the message. When the user replies, a multiline text field is presented in which the user can move through lines, characters, and words with their screen reader. This functionality should use standard HTML or operating system user interface elements to take advantage of all the built-in features a screen reader provides [136,137].
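As a sketch of this menu-plus-text-box pattern, the following TypeScript fragment builds an inbox from native HTML controls so that a screen reader’s built-in reading commands work without extra effort. The Message type, the “r” reply key, and the element structure are illustrative assumptions, not a specification from any existing client.

```typescript
// Minimal sketch: an asynchronous-message inbox built from native HTML
// controls so a screen reader gets its built-in behavior for free.
interface Message { id: number; subject: string; body: string; }

function renderInbox(messages: Message[], root: HTMLElement): void {
  const list = document.createElement("ul");
  for (const msg of messages) {
    const item = document.createElement("li");
    const open = document.createElement("button");
    open.textContent = msg.subject; // the screen reader announces the subject line
    open.addEventListener("click", () => {
      // Activating the subject (e.g., pressing Enter on the button) opens the
      // body in a read-only, multiline field the screen reader can review.
      const body = document.createElement("textarea");
      body.value = msg.body;
      body.readOnly = true;
      body.setAttribute("aria-label", `Message: ${msg.subject}`);
      item.appendChild(body);
      body.focus();
      // "r" starts a reply in an ordinary editable textarea.
      body.addEventListener("keydown", (e) => {
        if (e.key === "r") {
          const reply = document.createElement("textarea");
          reply.setAttribute("aria-label", `Reply to ${msg.subject}`);
          item.appendChild(reply);
          reply.focus();
        }
      });
    });
    item.appendChild(open);
    list.appendChild(item);
  }
  root.appendChild(list);
}
```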

3.8. Avatar Appearance

3.8.1. Avatar Customization

Description

Selecting personalized looks and voice, including gender expression, body features, and dress of avatars. Options may be pre-selected based on overarching features (e.g., clothing, character class) or fully customizable. This may also include adjusting body composition and type to mirror body differences in real life (e.g., missing arm, prosthetic leg, vitiligo).

Results (Quotations)

I don’t think anything needs to be changed from how sighted users select these options, so long as each option is given an adequate description. It may feel like a description isn’t going to help blind users recreate their appearance as precisely as sighted users can, and while that’s true, it doesn’t matter in the slightest. Sighted users who are looking to match their own face settle constantly with options that are “close enough”. Users who are very happy with the face they’ve made will not actually have eyes that perfectly match the avatar, face shape, hair style, eyebrow shape, etc. They’re assembling an approximation. For blind users this shouldn’t be any different, and they won’t notice or care if the description they selected for a hairstyle doesn’t actually make their avatar look exactly like their real hair. I believe users fully accepting “close enough” is going to be the same within the blind community as well.

Discussion

Recording 5: Equipment Example. (https://www.openicpsr.org/openicpsr/project/224701/version/V1/view?path=/openicpsr/224701/fcr:versions/V1/Equipment-Example.mp4&type=file, accessed on 1 April 2025). Recording 5 shows how objects can be worn and reviewed through an equipment list.
Clothing is very present in audio games and MUDs. Both A Hero’s Call and Entombed have very detailed clothing systems [120,172]. Each body part appears in a menu with information about what is equipped on that body part (e.g., “Ancient Elven helmet in head slot”), as shown in Recording 5. When the user hits enter, a menu of all equippable items in their inventory appears; the user arrows down through the list and hits enter on the object they would like to equip. MUDs have a similar system, where typing “equipment” shows a list of what the user has on their body, along with any description the user has set for themselves (e.g., “A short stout dwarf with gray hair and a long braided beard”) [121]. One can look at another player either by focusing on them and opening an actions menu, or by typing “look <player name>”. In VR worlds, avatar appearance is very important in helping other players build an identity or form a particular feeling about the user, and should not be thought of as a “nice to have” [92,147]. There should also be a message for those within a particular proximity to a player when they change their appearance (e.g., “John removed his elven helmet”).
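A minimal TypeScript sketch of the equipment-slot menu and proximity announcement pattern is shown below. The Player type, the 10-unit radius, and the speak stub are illustrative assumptions rather than details taken from A Hero’s Call, Entombed, or any particular MUD.

```typescript
// Minimal sketch of a MUD-style equipment list plus a proximity message
// when a player changes what they are wearing.
interface Player { name: string; x: number; y: number; equipment: Map<string, string>; }

// Text spoken when a slot is focused in the equipment menu.
function describeSlot(player: Player, slot: string): string {
  const item = player.equipment.get(slot);
  return item ? `${item} in ${slot} slot` : `Nothing in ${slot} slot`;
}

// Notify everyone within earshot that a player's appearance changed.
function announceEquipChange(actor: Player, slot: string, item: string | null,
                             players: Player[], radius = 10): void {
  const text = item
    ? `${actor.name} wears ${item} on their ${slot}`
    : `${actor.name} removes the item from their ${slot}`;
  for (const p of players) {
    const dist = Math.hypot(p.x - actor.x, p.y - actor.y);
    if (p !== actor && dist <= radius) {
      speak(p, text); // route to that player's speech/braille output
    }
  }
}

function speak(recipient: Player, text: string): void {
  // Placeholder output channel; a real client would queue this to the
  // player's text-to-speech or braille display.
  console.log(`[to ${recipient.name}] ${text}`);
}
```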

3.8.2. Words on Body

Description

These are words that are written on the avatar’s clothes or directly on the body (e.g., words on a T-shirt, tattoo, product logo).

Results (Quotations)

In an environment where people can be anything (custom 3D models), it will have to rely on player-provided descriptions, and some menu/key blind users can press to have that read aloud about the avatar they are facing. Far from perfect since it requires people to participate in something they won’t benefit from or see a need for (in 99.9% of situations). For text, particularly custom text that is not part of a simple list of clothing items for example, OCR can be performed on the image of the user that is being inspected. A screen reader’s optical character recognition function can read the screen so thoroughly that a virtual document can get created which a user can scroll through, and sometimes even click elements of. I think this proves that with refinement, using OCR to determine text on clothing items, furthermore, even telling the user which clothing item contained the text or where on the clothing item, is very possible. If the environment is more “sims”-like, where avatars are clothed using a large list of options, then descriptions could be relied upon. Developers/artists would add descriptions to the clothing/decorations and normal users wouldn’t be required to do anything extra. The safest approach is probably just have a key that can be held while facing someone’s avatar. Clothing/decoration descriptions should begin being listed in a specific order starting from most generic and moving to specifics. The longer they hold the key, the further down the list of descriptions they go (are read aloud to them). As I encounter someone new, I hold the key. Jim is wearing a red shirt, blue pants, a tan baseball cap, and I release the key to hear them better as he talks to me. In a short conversation pause I hold the key again. Jim is wearing white shoes, his shirt contains a logo on the chest, the pants are torn on the left knee, a black watch on his right wrist. I once again release the key to talk. As in real life, we quickly grab generalities about someone and begin filling in the details over time. Each clothing/decoration avatar can equip what would need a few descriptions entered that span a few levels of detail, and a lot of this could be automatic, where the logo placed on a shirt is automatically on such a list, and the logo’s description itself is further down. It would require some test groups to determine what order an avatar’s appearance should be described in, but I’m sure there is some order that would be ideal for most situations. In my example I didn’t include other features, but hair color and style, eyes, makeup, and basically everything else designed into the avatar can be handled in this way. Perhaps 2 different keys would be used to separate describing face/hair/body from clothing to save on time, that would depend on how many controls are free to dedicate to this, but I think traveling down a list from generic to specific descriptions goes a long way in reproducing how sighted people gather this same information (albeit slower). Five minutes after meeting someone, “Oh hey I like your earring! I hadn’t noticed that until now,” would literally be a thing in this approach, the same as it is for sighted users. Alternatively or additionally, a menu should be easily accessible that should allow you to scroll through all such items that would be automatically announced the longer a key is held down. This way the user can quickly scroll to the information that interests them, rather than relying on some predetermined order of elements.

Discussion

If clothing with text is created by the game developers, a simple text string can be used to describe the object, similar to how equipment is described in a MUD [121,171]. If there are too many objects for someone to describe manually, large language models (combined with OCR) can provide quality text descriptions relatively inexpensively [166]. If users are creating their own objects, a description can be required going forward. If that is not an option, a large language model can be used to describe the avatar’s appearance on command, as described above [166]. Note that large language models often make errors when describing images, and BLVIs are skeptical of their output, but there are possibilities with large language models if one is careful [173,174].

3.8.3. Other Thoughts in “Avatar Appearance”

Results (Quotations)

Something similar to this was mentioned above, but it’s very important that in addition to VI users receiving audio updates when someone’s appearance changes, there needs to be, whether by some sort of HTML table or keystroke that presents information, a way to get a summary of those near you and a basic description of what they look like, despite the more detailed features that would appear if your gaze fixates on a certain user or avatar.

Discussion

MUDs have a clear way of providing alerts when characters change appearance and also provide both brief and detailed descriptions of players [121]. When a character equips or unequips an object, a message says something like “John wears puffy pants on his legs”. When someone walks into a room, all the characters appear in a list with a short description (e.g., “John the elf is here, looking for something to do. Sally the drow is here, reading a book. Fred the adventurer is here, preparing for his next quest.”). If one looks at a character, they hear a detailed description of that character (e.g., “John is a young, bored elf with long blond hair and sparkling blue eyes. On his head he’s wearing an Ancient Elven Helmet, and on his legs he’s wearing Puffy Pants.”). More detail about a character’s equipment can sometimes be obtained by looking at a specific item (e.g., looking at John’s helm returns a description of the helm). In a VR interface, the list of characters can be a menu. When users hit enter on a character, a submenu opens that can include a “description” option. That description can be another menu with the basic description as the top item and the equipment as menu items beneath it. Hitting enter on any of these items provides a description of that object. Menus are one of the most useful conventions in non-semantic interfaces and allow different levels of detail to be obtained based on the user’s interest.
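The layered brief/detailed structure can be sketched in TypeScript as follows. The Character and WornItem shapes and the three functions are illustrative assumptions; a real game would attach this data to its own entity system.

```typescript
// Minimal sketch of the brief/detailed description layers discussed above.
interface WornItem { slot: string; name: string; description: string; }
interface Character { name: string; brief: string; detailed: string; equipment: WornItem[]; }

// Room summary: one short line per character (heard on entering a room).
function describeRoom(characters: Character[]): string {
  return characters.map((c) => `${c.name} is here, ${c.brief}.`).join(" ");
}

// "Look" at one character: detailed description plus equipment list.
function describeCharacter(c: Character): string {
  const worn = c.equipment
    .map((e) => `on their ${e.slot} they are wearing ${e.name}`)
    .join(", and ");
  return worn ? `${c.detailed} ${worn}.` : c.detailed;
}

// "Look" at a single worn item for the deepest level of detail.
function describeItem(c: Character, slot: string): string {
  const item = c.equipment.find((e) => e.slot === slot);
  return item ? item.description : `${c.name} is wearing nothing on their ${slot}.`;
}
```

In a menu-based VR interface, describeRoom would feed the top-level character list, describeCharacter the “description” submenu item, and describeItem the per-object entries beneath it.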

3.9. Avatar–Environment Interactions

3.9.1. Avatar–Environment Collisions

Description

Avatar behavior when they collide with objects or walls in the environment (e.g., hitting the wall, running into trees, and interacting with the environment without going through it).

Results (Quotations)

You would hear a body thump with the avatar exclaiming the appropriate vocal sound. For example, a knight walks into a tree. You would hear metal hitting wood with a randomly selected groan, curse or gasp.

Discussion

Recording 6: Avatar-environment interactions Examples—Swamp—Collisions with Objects. (https://www.openicpsr.org/openicpsr/project/224701/version/V1/view?path=/openicpsr/224701/fcr:versions/V1/Avatar-environment-interactions-Examples---Swamp---Collisions-with-Objects.mp3&type=file, accessed on 1 April 2025). Recording 6 has the user running into a number of different surfaces from sandbags to cars.
Without clear collision sounds, non-visual interfaces can be very difficult (if not impossible) to use, as users will have no idea if they are still moving or not. All audio games with avatar movement provide clear collision sounds that change based on the material, and that are often coupled with a message (e.g., “Bumped into North Wall”), as shown in Recording 6 [115,119,120,121]. Clear collision sounds and messages coupled with movement sounds and messages are some of the most critical elements required for non-visual accessibility.
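The following TypeScript sketch, using the Web Audio and Web Speech APIs, illustrates one way to couple material-specific collision sounds with a short speech message. The material names, file URLs, and the message wording are illustrative assumptions, not taken from any shipped audio game.

```typescript
// Minimal sketch: pick a collision sound by surface material and pair it
// with a speech message, as audio games do.
const ctx = new AudioContext();
const collisionSounds = new Map<string, AudioBuffer>();

// Load one collision sound per material (e.g., "wood", "metal", "sandbag").
async function loadCollisionSound(material: string, url: string): Promise<void> {
  const data = await (await fetch(url)).arrayBuffer();
  collisionSounds.set(material, await ctx.decodeAudioData(data));
}

function onCollision(material: string, obstacleName: string): void {
  const buffer = collisionSounds.get(material);
  if (buffer) {
    const src = ctx.createBufferSource();
    src.buffer = buffer;          // e.g., thud against wood, clang against metal
    src.connect(ctx.destination);
    src.start();
  }
  // Short text message so the user knows exactly what stopped them.
  speechSynthesis.speak(new SpeechSynthesisUtterance(`Bumped into ${obstacleName}`));
}
```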

3.9.2. Object Manipulation

Description

An avatar’s intentional interaction with objects in the environment (e.g., throwing a ball back and forth, playing the guitar), and/or indications that an object can be manipulated (e.g., showing that a ball can be picked up, or a button can be pushed).

Results (Quotations)

Use a unique directional (i.e., via HRTF) sound to represent an object’s location in relationship to the listener, and the sound of the object itself to indicate what it is and how it’s being interacted with. Pitch can be used to designate height for non-3D systems. You can also use sound qualities like pitch to convey further information, for example when hammering in a nail, the pitch of the hammering sounds can be three or four semitones lower or higher in pitch when starting than when performing the last stroke with the hammer.
A button on the wall can be directionally located by sound. For height, the sound’s pitch can slide repeatedly between a low tone to a tone representing its height. In Survive the Wild, for example, nobody has an issue figuring out what someone is doing with an object, as each major action attached to each object has a sound cue, which is positioned in [virtual space, via] HRTF, where the object is located. Different sounds for each action should make it very clear what’s happening with an object.

Discussion

There are two aspects to object interactions: the list of objects that can be interacted with, and the sounds of those objects. Since BLVIs do not click objects in their environment, it is important that they have a menu or another way to focus on interactable objects, coupled with the objects’ sounds, as shown in Recording 7. A Hero’s Call plays a particular beep in spatial audio when there is an interactable object in the player’s proximity. When the user hears that sound, they press Ctrl+a or Ctrl+d to cycle in the direction of the sound and focus on the object. When they focus on the object, they hear the object’s name, and they can press enter to open the object’s menu (which includes options such as take or look) [120]. Survive the Wild has a scan feature that brings up a list of the interactable objects around the player (e.g., “Mud is to the left, 25 tiles away. Sand is ahead and to the right, 30 tiles away.”), as shown in Recording 3. Pressing the enter key on an object plays a beacon to help the user find the object. A beacon is a beep in spatial audio in the direction of the object that gets faster as the user gets closer to the object. When the user reaches the object, the beacon stops [115].
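The beacon behavior described above can be sketched in TypeScript with the Web Audio API as follows. The distance thresholds, beep timing, and callback signatures are illustrative assumptions rather than the actual implementation in Survive the Wild; the target position is assumed to be expressed relative to the listener.

```typescript
// Minimal sketch of an audio beacon: a spatialized beep toward the target
// whose repetition rate rises as the listener gets closer, stopping on arrival.
const ctx = new AudioContext();

function startBeacon(getDistance: () => number,
                     getDirection: () => { x: number; y: number; z: number }): () => void {
  let stopped = false;

  const tick = () => {
    if (stopped) return;
    const distance = getDistance();
    if (distance < 1) return; // arrived: the beacon simply goes silent

    // One short beep, positioned toward the target via an HRTF panner
    // (listener assumed at the origin of the direction vector).
    const dir = getDirection();
    const osc = ctx.createOscillator();
    const gain = ctx.createGain();
    const panner = ctx.createPanner();
    panner.panningModel = "HRTF";
    panner.positionX.value = dir.x;
    panner.positionY.value = dir.y;
    panner.positionZ.value = dir.z;
    osc.frequency.value = 880;
    gain.gain.value = 0.2;
    osc.connect(gain).connect(panner).connect(ctx.destination);
    osc.start();
    osc.stop(ctx.currentTime + 0.05);

    // Closer target => shorter gap between beeps (clamped to stay audible).
    const interval = Math.min(1000, Math.max(100, distance * 30));
    setTimeout(tick, interval);
  };

  tick();
  return () => { stopped = true; }; // call to cancel the beacon early
}
```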

3.9.3. Other Thoughts in “Avatar–Environment Interactions”

Description

Orientation in a room.

Results (Quotations)

Use two or more 2D or 3D sounds to provide orientation. The volume of these sounds imparts direction. First-person perspective in audio games (like Swamp) try to place ambient sounds around maps so that players are always within earshot of at least three sounds at different positions, even if at a distance. Being aware of three sounds lets the player triangulate their location and the direction they are facing.
I find myself randomly placed in a large room I’m already familiar with. In my right ear I hear the radio softly playing, which I know is in the center of the room on a table. Based on how loud it is, I have a pretty good idea of how far from the table I am, and I know it’s to my right, but that means I could be anywhere in a clockwise circle around the radio. Now I notice some bubbling sounds from the fish tank elsewhere in the room. I can now immediately rule out all but two spots in the room I could be standing, to hear both sounds how I hear them. In one situation the fish tank is ahead of me a little to my left, and the radio is ahead of me a little to my right. The other situation is that the fish tank and radio are still to my left and right, but behind me. It is quite difficult to differentiate sounds ahead of you verses behind you based solely on the volumes of each sound in each ear. Finally, the final clue is the ticking of a distant clock on a wall. I know the clock should be on the wall behind me (through previously wandering around and exploring this place), and it is so quiet that I know I’m far from that wall. Of the two possible locations I could have been standing, I now know I am the one farther from the back wall. The fish tank and radio are actually behind me and to either side. From this point forward I am fully aware of my position in the room and can turn and move accordingly. This entire process may have only taken a few moments as I took in the sounds around me.

Discussion

This idea of having three sound sources to triangulate position is used in most first-person audio games (e.g., Swamp, A Hero’s Call, and Survive the Wild), and can be heard in Recording 3, but it deserves more research [115,119,120]. Beyond ambiance, the practical effect that three always-audible sound sources have on a user’s sense of position is unclear. Future research should investigate the effects of three sound sources on both ambiance and orientation.
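A minimal Web Audio sketch of the three-anchor approach is shown below in TypeScript. The positions, the distance model, and the per-frame update function are illustrative assumptions; a real game would tune attenuation curves for each ambient sound.

```typescript
// Minimal sketch: place three looping ambient sounds (radio, fish tank,
// clock) at fixed positions with HRTF panners so a listener can triangulate
// their own position and facing.
const ctx = new AudioContext();

function placeAmbientSource(buffer: AudioBuffer,
                            pos: { x: number; y: number; z: number }): void {
  const src = ctx.createBufferSource();
  const panner = ctx.createPanner();
  src.buffer = buffer;
  src.loop = true;                   // ambient anchors play continuously
  panner.panningModel = "HRTF";
  panner.distanceModel = "inverse";  // volume falls off with distance
  panner.positionX.value = pos.x;
  panner.positionY.value = pos.y;
  panner.positionZ.value = pos.z;
  src.connect(panner).connect(ctx.destination);
  src.start();
}

// Update the listener each frame as the avatar moves and turns; the relative
// loudness and direction of the three anchors reveal position and heading.
function updateListener(pos: { x: number; y: number; z: number },
                        forward: { x: number; y: number; z: number }): void {
  const l = ctx.listener;
  l.positionX.value = pos.x;
  l.positionY.value = pos.y;
  l.positionZ.value = pos.z;
  l.forwardX.value = forward.x;
  l.forwardY.value = forward.y;
  l.forwardZ.value = forward.z;
}
```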

3.10. Full-Body Interactions

3.10.1. Emotes

Description

Preset animations that involve a full-body demonstration of an emotion (e.g., jumping up and down when happy, waving arms to gain attention).

Results (Quotations)

Sounds are played to represent the action, with HRTF positioning the sounds based on the location of the object or player making them. In this example, the sound of jumping feet combined with an excited voice.

Discussion

Survive the Wild and most MUDs, like Materia Magica, have a list of preset emotes that can be quickly performed, as can be heard in both Recording 2 and Recording 3 [115,121]. In Survive the Wild, as heard in Recording 3, pressing “alt+g” opens a list of “socials”, preset actions one can perform, each with its own sound and name (e.g., laugh, jump, scratch head). As users arrow through the menu, they hear the name of the social along with its sound. Socials can be jumped to using first-letter navigation, and users can select a social by pressing enter. In MUDs (e.g., Materia Magica, as heard in Recording 2), typing “socials” provides users with a list of preset emotes that can be performed (although users can also create their own emotes).
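The socials menu with first-letter navigation can be sketched in TypeScript as follows. The Social type and the speak/playSound stubs are illustrative assumptions, not the actual Survive the Wild implementation.

```typescript
// Minimal sketch of a "socials" menu: arrowing speaks the name and previews
// the sound, typing a letter jumps to the next social starting with it.
interface Social { name: string; soundUrl: string; }

class SocialsMenu {
  private index = 0;
  constructor(private socials: Social[]) {}

  // Returns the social to perform when Enter is pressed, otherwise null.
  handleKey(key: string): Social | null {
    const n = this.socials.length;
    if (key === "ArrowDown") this.index = (this.index + 1) % n;
    else if (key === "ArrowUp") this.index = (this.index - 1 + n) % n;
    else if (key === "Enter") return this.socials[this.index];
    else if (/^[a-z]$/i.test(key)) {
      // First-letter navigation: jump to the next social after the current
      // one whose name starts with the typed letter, wrapping around.
      for (let step = 1; step <= n; step++) {
        const i = (this.index + step) % n;
        if (this.socials[i].name.toLowerCase().startsWith(key.toLowerCase())) {
          this.index = i;
          break;
        }
      }
    }
    const current = this.socials[this.index];
    speak(current.name);          // e.g., "laugh"
    playSound(current.soundUrl);  // preview the social's sound
    return null;
  }
}

function speak(text: string): void {
  speechSynthesis.speak(new SpeechSynthesisUtterance(text));
}
function playSound(url: string): void {
  new Audio(url).play(); // simple preview; a game engine would use its own mixer
}
```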

3.11. Environment Appearance

3.11.1. Customization

Description

Adding objects and NPCs into the world and making them look a particular way (e.g., decorating a room, changing the lighting).

Results (Quotations)

Include sounds that the object might make. There should of course be a hotkey that speaks or shows in a menu a list of objects and their textual descriptions, particularly those that don’t make sounds. Flapping drapes, a clock ticking, bird sounds coming from where a window might be. It should also be possible to explore these objects somehow, for example by interacting/touching them, with a description or similar.

Discussion

Recording 8: Object Creation Examples. (https://www.openicpsr.org/openicpsr/project/224701/version/V1/view?path=/openicpsr/224701/fcr:versions/V1/Object-Creation-Examples.mp4&type=file, accessed on 1 April 2025). Recording 8 shows how objects can be created, sized, and placed in the environment using speech and the keyboard.
There are a few audio games, MUDs, and audio interfaces that allow for player environment creation and manipulation [115,119,155,171,175,176,177]. One of the most advanced object and environment creation tools for audio games is Sable from Ebon Sky Studios, as heard in Recording 8 [177]. It allows a full “what you hear is what you get” experience when creating audio games. To create objects and their properties, it uses a menu system coupled with text boxes, and to place the objects on the map, users walk around the map and press enter on the squares where they want to place objects [178]. Users can also change the terrain and create buildings with multiple levels and sounds. The Builder Academy is a MUD that specifically trains users how to create objects, environments, shops, quests, and non-player characters that are descriptive and impactful [171]. Creating an object requires an initial command to initiate it; users can then edit its description and other properties by typing a command to enter an edit screen, pressing the number of the property they wish to edit, typing the desired text, and hitting enter. Other games (e.g., Survive the Wild) allow users to craft objects, drop them, and then interact with them (e.g., dropping logs, standing over them, and using a method of fire creation will create a fire). Users can then cook food over the fire [115].
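A minimal TypeScript sketch of the walk-and-place editing pattern is given below. The MapEditor class, tile grid, and announce callback are illustrative assumptions rather than Sable’s or The Builder Academy’s actual design.

```typescript
// Minimal sketch of "what you hear is what you get" object placement: the
// builder walks a tile grid and presses Enter to drop the currently selected
// object on the tile they are standing on.
interface PlacedObject { name: string; soundUrl: string; }

class MapEditor {
  private grid: (PlacedObject | null)[][];

  constructor(width: number, height: number,
              private announce: (text: string) => void) {
    this.grid = Array.from({ length: height },
      () => Array<PlacedObject | null>(width).fill(null));
  }

  // Called when the builder presses Enter on their current tile.
  placeAt(x: number, y: number, obj: PlacedObject): void {
    this.grid[y][x] = obj;
    this.announce(`${obj.name} placed at ${x}, ${y}`);
  }

  // Called as the builder walks, so they hear what is already on each tile.
  describeTile(x: number, y: number): void {
    const obj = this.grid[y][x];
    this.announce(obj ? `${obj.name} at ${x}, ${y}` : `Empty tile at ${x}, ${y}`);
  }
}
```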

3.12. Non-Body Communication

3.12.1. Emojis

Description

Emojis that appear above avatar heads to signal emotions (e.g., large happy face above a head, heart emojis surrounding avatar to signal love).

Results (Quotations)

Sounds and or speech messages are played to represent the emotion when the user presses a hotkey.
Soft romantic violin sounds to represent a heart, a jubilant trumpet sound to represent a good idea, or simple TTS speech to represent the emoji (e.g., “Face with heart shaped eyes”).

Discussion

Recording 9: Emoji Example. (https://www.openicpsr.org/openicpsr/project/224701/version/V1/view?path=/openicpsr/224701/fcr:versions/V1/Emoji-Example.mp4&type=file, accessed on 1 April 2025). Recording 9 shows how a screen reader reads Unicode emojis on applications (e.g., Microsoft Teams, Google Chat, Slack, Discord, etc.).
Mobile screen readers read emojis by default, but not all desktop screen readers read all emoji types, and sometimes the text that is read is not evocative of the emotion being conveyed, as can be heard in Recording 9 [179]. If a text label is attached to the emoji, the screen reader will always be able to read it. Adding sounds to emoji has been shown to increase sentiment comprehension in both BLVI and sighted users, but there are not yet conventions around what sound should be played for each emoji [180]. Audio games such as Swamp often play short speech messages (e.g., “Out of ammo” or “Over here”), whereas the emojis used in mobile interfaces typically appear only in text chats [119].
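The emoji-to-sound-and-speech mapping could be sketched in TypeScript using the Web Speech API as follows. The specific emoji, labels, and sound file paths are illustrative assumptions, since no conventions yet exist for which sound maps to which emoji.

```typescript
// Minimal sketch: pair each emoji with a short earcon and a spoken label so
// the emotion comes through even when a screen reader's default emoji text
// is flat or missing.
const emojiCues: Record<string, { label: string; soundUrl: string }> = {
  "😍": { label: "Face with heart-shaped eyes", soundUrl: "sounds/romantic-violin.ogg" },
  "💡": { label: "Good idea", soundUrl: "sounds/jubilant-trumpet.ogg" },
};

function presentEmoji(emoji: string): void {
  const cue = emojiCues[emoji];
  if (!cue) {
    // Fall back to a generic announcement for unmapped characters.
    speechSynthesis.speak(new SpeechSynthesisUtterance("emoji"));
    return;
  }
  new Audio(cue.soundUrl).play();                                  // the earcon
  speechSynthesis.speak(new SpeechSynthesisUtterance(cue.label));  // the spoken label
}
```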

3.13. Working with the BLVI Community (Message from One Sighted VR Developer to Another)

One participant who identified as a sighted individual wrote the following message to other sighted VR developers (quoted exactly):

3.13.1. Why Should You Make a Non-Visual Experience?

While VR has been around for decades, in some crude form or another, when the masses participate in the metaverse it is likely going to be considered the true start to VR. The first waves of developers are essentially our pioneers, blazing a trail that the world will follow. These people and their projects will be remembered in the same way people look back at the first computer games or the first “3D shooters”. By nature of being among the first, people can claim their little place in the history books.
Separate from being the first, the other way to be truly remembered is to set a new standard. Even if Blizzard’s Warcraft wasn’t the first real time strategy game, it is credited with game mechanics that will be copied far into the future. If you can’t be first, you can earn your place in the history books by re-imagining and polishing what already exists. This can be done at any time, but in the early days it is much easier to think of worthwhile improvements.
Audio games (games built specifically for the blind and low vision population) have been around for a while, but none in [mainstream] VR that I am aware of. The first generation of developers who bring accessibility to the metaverse are going to be remembered in the same way the mainstream world will remember Pong, Doom, or Mario. In 50 years we will almost certainly still have blind people all over our planet, and the percentage of them who play computer games should skyrocket as our culture continues to make gaming a part of everyday life. It is for a much smaller subset of gamers, but within that group there are going to be the games/programs that made history. You happen to be living at the right time to be a part of that, if you choose.
Philanthropy is, of course, a strong motivator toward adding accessibility to your projects. It feels good to help people. In a business type setting, philanthropy is probably not as much of a motivation, but company image is. Adding support for blind players may only increase your user count by a tiny percent (or fraction of a percent), but the public opinion of your product may increase significantly. Whether a company cares about accessibility, or only wishes to appear as though it cares about accessibility, the final result is good for the blind community either way. More software, more games, and more inclusion into the wider community.
Because the blind community has been somewhat starved for innovation, it grants some level of forgiveness on mistakes. The mainstream world expects a certain level of quality and usability, as each new product has to compete with the ones that came before it. If we examined a random item at a craft show, it might be perfectly acceptable as a hand-made “craft show item”, but would seem very out of place on a store shelf. In a similar way, the quality of audio games are often compared only to other audio games. An audio game of perfectly acceptable quality, would seem very out of place when viewed by sighted mainstream players.
This shouldn’t be taken advantage of, but the blind community has had less to work with and is therefore generally happier with what they can get. If you had a somewhat clunky (but accessible) menu, mainstream players would likely complain about it and list it as a fault of your game. The very same feature could bring praise from the blind players. Adding accessibility can be a bit of a safety net in that way. Let’s imagine your project has an unfortunate launch with a mediocre public rating and very little attention. With good accessibility, you may have hundreds (or thousands) of blind users who are likely happier with the product than their sighted counterparts. A few blind users posting positive messages will go a long way, plus online reviewers and podcasters understand that inclusion/accessibility sells.

3.13.2. What Impact Making My Games Non-Visual Has Had on Me

When I started making audio games I didn’t have any blind/visually impaired people in my life, so I found myself part of a whole new community. There is a different mindset to accessible projects, and it gave me something new to puzzle over (which I do enjoy).
While there are hundreds of audio games, there are only hundreds of audio games. By mainstream standards that is insanely low, and this total has been spread out over decades. I am nowhere near the beginning of the field, but with so few existing titles I get to introduce new mechanics and concepts to a whole community of players. There are entire genres that may only have a handful of accessible games, giving tons of room to break new ground on one of my own. It isn’t something I think about often, but I have a place in video game history, even though it’s a small corner of it. In the mainstream world perhaps I could have had some successful games, but none would have been remembered or written about as milestones in gaming. My accessible games have been written about in magazines (old school paper and modern digital), included on a magazine CD (some places still do those!), and have led to dozens of recorded interviews. Without feeling like I’ve actually earned it, I am listed among the experts of blind gaming and accessibility.
I never feel like that sort of attention is earned, but it all comes down to being in the right field at the right time. I wandered into the world of audio games at a time when I could introduce new ideas and set new standards. I think I’d like to stick around a while longer, to see how much more I can do.

4. Overall Discussion

There has been confusion around how to make VR accessible to BLVIs, which has led to almost no accessible mainstream VR environments [123,181]. This paper aims to give BLVI and audio game developers a voice in the VR accessibility conversation from which they have so far been largely excluded. The design patterns presented in this article have, for the most part, been rigorously tested in the commercial market and are in frequent use within the audio game community. Further research can and should be performed to refine these designs, but they are a baseline that VR interfaces should meet before experimenting with new interface conventions. The recommendations from other VR and game accessibility guidelines should also be followed, but those guidelines typically lack details for non-verbal, non-visual social interactions [124,125,126].
These designs can be rendered through commodity headphones and interacted with through a keyboard. Most BLVIs lack specialized hardware and prefer to use what they are familiar with. Even though headphone output cannot convey haptic, visual, olfactory, or gustatory information, BLVIs often lack the desire or resources to acquire interfaces that render these extra senses, and the value provided by vibration motors or other off-the-shelf hardware is often not worth the expense and trouble of adding another device to a setup. Audio games are played completely through audio, and headphones plus a keyboard are the baseline hardware setup most BLVIs have access to. As it currently stands, headphones are the only VR headsets for BLVIs that render linguistic content. Tactile gloves that can render braille are being developed, but they are not yet on the market [55,56,182].
Although existing XR accessibility and game accessibility guidelines offer many best practices, the complexity of certain interactions often requires additional support from experts in audio games and VR for disabled users. Mainstream VR developers should collaborate with audio game specialists and engage them to lead the design and implementation of non-visual interfaces. Developer contact information is usually available on audio game websites and in readme files; the forums on audiogames.net can help locate individual contributors; and companies run by BLVI VR experts can provide professional consultation [110,124,141].
BLVI VR developers often lack traditional credentials, so job requirements may need to be broadened to recognize their qualifications. Despite numerous laws mandating equal access to education, BLVIs have historically faced barriers to obtaining computer science degrees because university curricula remain largely inaccessible [183,184]. Many mainstream VR development platforms are completely unusable by BLVI developers [129,134], forcing them to build both their own game engines and their games from scratch. A BLVI developer created a Godot accessibility plugin that provided some access to the interface [133], and recently the Godot team integrated interface accessibility into the platform by default [132]. As a result, there exists a pool of exceptionally capable BLVI developers with nontraditional backgrounds who often do not meet the standard requirement of a computer science degree at large companies. Instead of requiring a computer science degree, a requirement could be having released a game title with over 1000 downloads or having published a game engine on GitHub, both of which are arguably stronger indicators of job performance than a degree.
There are fewer than a thousand audio game developers worldwide, and just 829 audio games exist, compared with over 750,000 games created by 230,000 developers using Unity alone [110,185]. If a company is serious about diversity and inclusion, it must adopt proactive and flexible recruitment strategies to reach BLVI developers. Naughty Dog, for example, collaborated directly with BLVI consultants during the development of The Last of Us Part II, resulting in one of the most accessible and beloved games among both BLVI and sighted players; it went on to win 39 awards and receive 36 nominations [145,186].
Many of the design patterns presented in this article are sighted-centric, and making them accessible is unlikely to benefit BLVI players; third-person movement and POV shift are two examples of visual interfaces that would likely create a negative experience for non-visual users. The evaluation in [99] found that although eye gaze information helped BLVIs understand whether the people they were talking with were paying attention, having this information increased their anxiety and stress. Similarly, auditory beacons and radars are interface elements that are very common in audio games but would be detrimental for sighted users. Other interface elements (e.g., footstep and collision sounds) would benefit both BLVI and sighted users. Although most interface elements could benefit both groups of users, there are unique conventions for each group as well.

5. Final Conclusions

The practical implication of this evaluation for mainstream VR developers is a set of design patterns that can be used as the foundation of an accessible VR interface, although professional consulting and user testing are still required before release. These designs should be detailed enough for mainstream VR developers to implement directly in their games, and the provided examples and referenced games offer a model to follow. One important note is that detailed training and documentation are required for any non-visual interface.
These patterns have also only been tested through the rigors of the commercial market, and academic research is still needed to evaluate their efficacy by comparing the presented patterns against a baseline. A recommended baseline for many interfaces could be the default experience without any modification (e.g., no indication of eye gaze versus the proposed interaction described earlier). The variables of performance, preference, and emotional affect could be measured. Performance could be the number of times users were able to identify an interaction or accomplish a task. Preference could be comparative preference between two conditions (e.g., “Which do you prefer, condition 1 or condition 2?”). Emotional affect and workload could be measured with instruments such as the System Usability Scale [187], the NASA Task Load Index [188], or Buzz (an auditory user interface scale) [189]. Although sighted participants can be used as initial pilot participants, or to compare visual and non-visual interfaces, BLVIs should participate in the design and evaluation of any optimizations to these designs. These interactions should provide a starting point for future researchers to improve upon.
Future work should also implement these design patterns in a mainstream VR development platform, either as a software development kit or as examples. Regrettably, due to the inaccessibility of Unity and other VR development platforms, the authors were unable to create these examples. Instead, existing audio games were cited that utilize these conventions (or something similar). Recordings of many of the examples can be found in the Supplementary Materials.
The evaluation in this paper is the first study focused on audio game developers and the first to present a large number of audio game conventions. Most audio game research has been performed by sighted individuals on BLVIs [66,122,144,190,191], which has meant that conventions built by and for BLVIs within the audio game community were missing from the early designs of these games. Although some designs may show efficacy against a baseline, they may be unpleasant or difficult to use for long periods of time in a game or VR environment. An advantage of working with audio game developers is that they all have BLVI players who spend thousands of hours playing their games and naturally provide feedback on suboptimal experiences. Audio games also have strong conventions that have not been collected or codified for developers outside the audio game community to use. This paper serves as an early collection of those conventions, although further research should systematically evaluate a collection of audio games across multiple dimensions. The assumption is that the designs presented in this paper can serve as the recommended experience to complement a systematic evaluation. Future work could also apply a similar Delphi method to game-specific conventions (e.g., communicating health, targeting, combat, etc.). Game developers fitting these criteria can be found by looking at audio game archives and forums and by reviewing the contact information present in audio game readme files [110,192].
There are active sighted audio game players as well as sighted audio game developers. Sighted VR developers who wish to embrace non-visual conventions should (1) hire audio game developers, and (2) explore the aforementioned audio games to experience the non-visual conventions that can make their games inclusive to an additional 285 million BLVIs and 400 million sighted users with auditory VR headsets, and more inclusive for everyone [13,193].

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/virtualworlds4020025/s1. Videos demonstrating different conventions taken from audio games, along with the raw replies from the final Delphi round, can be found in the following repository: Biggs, Brandon. Creating Non-Visual Non-Verbal Social Interactions in Virtual Reality. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], https://doi.org/10.3886/E224701V1, accessed on 29 March 2025. Table S1: Inventory of social interactions (https://www.openicpsr.org/openicpsr/project/224701/version/V1/view?path=/openicpsr/224701/fcr:versions/V1/Round-3-Inventory-of-Non-Verbal-Social-Interactions-in-VR---Blind-Users---Final-Combined-Interactions.csv&type=file, accessed on 1 April 2025).

Author Contributions

Conceptualization, B.B.; methodology, B.B. and B.N.W.; validation, B.B. and S.M.; formal analysis, B.B.; investigation, B.B.; resources, B.B. and S.M.; writing-original draft preparation, B.B.; writing-review and editing, B.B., S.M., B.N.W., and P.C.; supervision, B.N.W. and P.C.; project administration, B.N.W. and P.C.; funding acquisition, B.N.W. and P.C. All authors have read and agreed to the published version of the manuscript.

Funding

No external funding was used.

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki and approved by the Institutional Review Board of the Georgia Institute of Technology (protocol “Building Accessible Non-Verbal Social Interactions in Virtual Reality (VR)”, approved 8 February 2022).

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

All data are available in the Supplementary Materials as Table S1: Round 3 Inventory of Non-Verbal Social Interactions in VR - Blind Users - Final Combined Interactions (CSV file).

Acknowledgments

The authors would like to thank Hannah Agbaroji for assistance drafting the initial version of the background section, and Rachel Lowy for drafting the IRB protocol and brainstorming the initial study design.

Conflicts of Interest

Brandon Biggs is the CEO of XR Navigation, a company providing a web platform that makes digital spatial diagrams usable by everyone. The company also employs a number of audio game developers (although, at the time of this study, none of the participants were part of XR Navigation).

Abbreviations

VR: virtual reality
VE: virtual environment
BLVI: blind and low vision individual
HRTF: head-related transfer function (used for spatial audio); see 2D or 3D audio below. Technically, an HRTF describes how sounds are filtered by a listener's head and ears before reaching each eardrum, so that rendered audio is as close to real life as possible. Frequently, such as in web audio, a generic head is assumed and is not customizable by the user. In this paper, HRTF, 2D audio, 3D audio, and spatial audio all refer to the same thing: sounds positioned around the user in space.
2D or 3D audio: same as above; HRTF, 2D audio, 3D audio, binaural audio, and spatial audio all refer to sounds designed so that they seem to come from a specific location in space around the user's head.
IVR: immersive virtual reality
AR: augmented reality
AI: artificial intelligence
3D: three-dimensional (i.e., located in the space around the user)
2D: two-dimensional (i.e., located on a surface)
UI: user interface
LLM: large language model
ADA: Americans with Disabilities Act
POV: point of view

Appendix A. Social Interactions Inventory

Category or Subcategory | Short Description | Design Summary
Category: Locative Movement
Direct Teleportation | The user pushes a button and a target appears on the ground where they are pointing; they may move the target around. The user teleports on release. | Activate teleport mode, move to a free location indicated by speech message and sound, and activate teleport
Analog Stick Movement | Joystick and its buttons are used to move the character within the virtual environment. | As normal, but with movement and collision sounds
1:1 Player Movement | The relative position of the player’s body in their physical play space is mapped directly to the position of the avatar in VR, so that the user’s body moves exactly as it does in real life. | Map movement to player’s head, use movement and vocal sounds
Third-Person Movement | The player places a moveable target in the environment, using a teleportation arc to place it. The user views the avatar from a third-person point of view (POV) instead of the usual first-person POV. | Same as analog stick movement; there is no third person in audio
Hotspot Selection | Users can jump from hotspot to hotspot, but no other movement is supported (e.g., interactions take place around a table, with chairs for users, and users may only sit in the chairs). | Menu of hotspots that play a sound in spatial audio and a speech message when focused
Please list and describe any movements we have missed in the category of Locative Movement | If the game has automatic movement, such as in rail shooters, art, or sequences in moving vehicles, this should be represented in an environment-friendly way. | Indicate movement and collisions with sound
Category: Camera Positions
POV Shift | Shifting the POV from first- to third-person view (e.g., watching an interaction from a first-person vs. third-person perspective). | Do not POV shift, but if needed, indicate a gradual shift with speech messages and sound cues
Category: Facial Control
Expression Preset | The user has selectable/templatized facial expressions to choose from in menus or interfaces. Presets manipulate the entire face, not individual features. | Text message combined with a short musical phrase
Puppeteered Expressions | Users control and compose individual facial features (or linked constellations of features) through a range of possible expressions to varying degrees. A user might puppeteer an expression from slightly smiling to grinning broadly, and any point in between the extremes. | Experimental: multiple short musical phrases tracking facial elements simultaneously
Lip Sync | The movement of the avatar’s lips or mouth synchronizes with the player’s voice audio (or with another voice track, such as a song). | Play speech, or experimental: phonemes if only the lips are moving
Gaze/Eye Fixation | The ability for the avatar’s gaze to fixate on items or people in their environment. | Text messages, or experimental: moving sound from gazer to target combined with a speech message
Category: Multi-Avatar Interactions
Physical Contact | Interactions where two or more avatars interact physically (e.g., a high five, hugging, dancing, poking, shaking hands, or kissing). These can be intentional or unintentional (e.g., bumping into someone in a crowded room). | Exaggerated short sounds combined with a speech message
Avatar–Avatar Collisions | Avatars collide with one another. These are intentional collisions and may result either in moving through the other player’s avatar or bumping off of them. | Exaggerated short sounds combined with a speech message
Category: Gesture and Posture
Micro-Gestures | Automatic gestures, such as eye blinking, shifting weight, and other actions people perform without conscious effort. | This should be optional: short sounds and musical phrases combined with text messages
Posable/Movable Appendages | The avatar’s body movements that occur when the head/torso/arm/leg of the avatar moves. These movements change in response to the player’s head/torso/arm movement in space. | Speech message describing body position combined with sounds of body parts in spatial audio
Mood/Status | The way that the avatar’s movement may communicate the avatar’s general emotional state. When the mood/status changes, movements may change subtly to match the mood. | This should be optional: text message describing emotional state
Proxemic Acts | Movements which occur in relation to how close/far users should be able to perceive communication, for instance, increasing the voice volume of speaking or the text size of written messages. | Use spatial audio
Iconic Gestures | Use of gestures which have a clear and direct meaning related to communication. These include social gestures (e.g., waving, pointing) and representational gestures (e.g., miming actions, such as scissors with one’s hand). | Text messages
Category: Avatar Mannerisms
Personal Mannerisms | Smaller movements that avatars perform periodically and/or repetitively (e.g., rocking, swaying, tapping feet, shaking legs, twisting/pulling hair, playing with clothes/jewelry). | Short sounds or musical phrases
Category: Conventional Communication
Visual Communication Support | Visuals (e.g., mind maps, whiteboards) used to record and organize ideas while meeting, brainstorming, or collaborating with others. | Text messages, and experimental: short musical phrases based on different messages or changes
Public Postings | Areas that host messages meant for public viewing (e.g., bulletin boards). | Text messages in a menu, and experimental: short musical phrases based on different messages
Synchronous Communication | Non-verbal communication between two or more avatars in real time (e.g., signing, writing). | Short sounds tied to user actions, or experimental: short sounds tied to body parts that change speed based on user movement
Asynchronous Direct Communication | Emails/messages sent directly to someone, but which may not be received in real time. | Same as currently exists: a short sound and text message indicating new mail; messages show in a menu
Category: Avatar Appearance
Avatar Customization | Selecting personalized looks and voice, including gender expression, body features, and dress of avatars. | Describe each option and play a speech message with the description when a new option is selected
Words on Body | These are words that are written on the avatar’s clothes or directly on the body (e.g., words on a T-shirt, tattoo, product logo). | Create multiple detail levels describing the user, including AI-recognized text
Category: Avatar–Environment Interactions
Avatar–Environment Collisions | Avatar behavior when they collide with objects or walls in the environment (e.g., hitting the wall, running into trees, and interacting with the environment without going through it). | Short collision sounds
Object Manipulation | An avatar’s interaction with objects in the environment (e.g., throwing a ball back and forth, playing the guitar), and/or indications that an object can be manipulated (e.g., showing that a ball can be picked up, or a button can be pushed). | Short sounds in spatial audio that change based on state variables
Please list and describe any movements we have missed in the category of Avatar–Environment Interactions | Orientation in a room | Use spatial audio to help orient to a room
Category: Full-Body Interactions
Emotes | Preset animations that involve a full-body demonstration of an emotion (e.g., jumping up and down when happy, waving arms to gain attention). | Short sound in spatial audio
Category: Environment Appearance
Customization | Adding objects and NPCs into the world and making them look a particular way (e.g., decorating a room, changing the lighting). | Menu of objects with short sounds in spatial audio
Category: Non-Body Communication
Emojis | Emojis that appear above avatar heads to signal emotions (e.g., large happy face above a head, heart emojis surrounding avatar to signal love). | Text message combined with a short musical phrase

References

  1. Cipresso, P.; Giglioli, I.A.C.; Raya, M.A.; Riva, G. The past, present, and future of virtual and augmented reality research: A network and cluster analysis of the literature. Front. Psychol. 2018, 9, 2086. [Google Scholar] [CrossRef] [PubMed]
  2. Kreimeier, J.; Götzelmann, T. Two decades of touchable and walkable virtual reality for blind and visually impaired people: A high-level taxonomy. Multimodal Technol. Interact. 2020, 4, 79. [Google Scholar] [CrossRef]
  3. Yiannoutsou, N.; Johnson, R.; Price, S. Non visual virtual reality. Educ. Technol. Soc. 2021, 24, 151–163. [Google Scholar] [CrossRef]
  4. Desai, P.R.; Desai, P.N.; Ajmera, K.D.; Mehta, K. A review paper on oculus rift-a virtual reality headset. arXiv 2014, arXiv:1408.1173. [Google Scholar] [CrossRef]
  5. Picinali, L.; Afonso, A.; Denis, M.; Katz, B.F.G. Exploration of architectural spaces by blind people using auditory virtual reality for the construction of spatial knowledge. Int. J. Hum. Comput. Stud. 2014, 72, 393–407. [Google Scholar] [CrossRef]
  6. Zhao, Y.; Bennett, C.L.; Benko, H.; Cutrell, E.; Holz, C.; Morris, M.R.; Sinclair, M. Enabling people with visual impairments to navigate virtual reality with a haptic and auditory cane simulation. In Proceedings of the 2018 CHI Conference on Human factors in Computing Systems, Montreal, QC, Canada, 21–26 April 2018; pp. 1–14. [Google Scholar] [CrossRef]
  7. Sinclair, J.-L. Principles of Game Audio and Sound Design: Sound Design and Audio Implementation for Interactive and Immersive Media; CRC Press: Boca Raton, FL, USA, 2020. [Google Scholar]
  8. De Almeida, G.C.; de Souza, V.C.; Da Silveira Júnior, L.G.; Veronez, R. Spatial audio in virtual reality: A systematic review. In Proceedings of the 25th Symposium on Virtual and Augmented Reality, Rio Grande, Brazil, 6–9 November 2023; pp. 264–268. [Google Scholar] [CrossRef]
  9. Control Spatial Audio and Head Tracking. 2024. Available online: https://support.apple.com/en-ph/guide/airpods/dev00eb7e0a3/web (accessed on 1 April 2025).
  10. Using Directional Audio—Zoom Support. 2023. Available online: https://support.zoom.com/hc/en/article?id=zm_kb&sysparm_article=KB0058025 (accessed on 1 April 2025).
  11. Spatial Audio in Microsoft Teams Meetings—Microsoft Support. 2024. Available online: https://support.microsoft.com/en-gb/office/spatial-audio-in-microsoft-teams-meetings-547b5f81-1825-4ee1-a1cf-f02e12db4fdb (accessed on 1 April 2025).
  12. Stefan Campbell. VR Headset Sales and Market Share in 2023 (How Many Sold?). 2023. Available online: https://thesmallbusinessblog.net/vr-headset-sales-and-market-share/#:~:text=2019%20%E2%80%93%205.51%20million%20pieces%20of,been%20sold%20during%20the%20year (accessed on 1 April 2025).
  13. Ferjan, M. Interesting AirPods Facts 2023: AirPods Revenue, Release Date, Units Sold; HeadphonesAddict: 2023. Available online: https://headphonesaddict.com/airpods-facts-revenue/ (accessed on 1 April 2025).
  14. PEAT LLC. Inclusive XR & Hybrid Work Toolkit. 2022. Available online: https://www.peatworks.org/inclusive-xr-toolkit/ (accessed on 1 April 2025).
  15. Thévin, L.; Brock, A. How to move from inclusive systems to collaborative systems: The case of virtual reality for teaching o&m. In CHI 2019 Workshop on Hacking Blind Navigation; HAL: Lyon, France, 2019. [Google Scholar]
  16. Carruth, D.W. Virtual reality for education and workforce training. In Proceedings of the 2017 15th International Conference on Emerging Elearning Technologies and Applications (ICETA), Stary Smokovec, Slovakia, 26–27 October 2017; pp. 1–6. [Google Scholar] [CrossRef]
  17. Pillai, A.S.; Mathew, P.S. Impact of virtual reality in healthcare: A review. In Virtual and Augmented Reality in Mental Health Treatment; IGI Global: Hershey, PA, USA, 2019; pp. 17–31. [Google Scholar] [CrossRef]
  18. Striuk, A.; Rassovytska, M.; Shokaliuk, S. Using blippar augmented reality browser in the practical training of mechanical engineers. arXiv 2018, arXiv:1807.00279. [Google Scholar] [CrossRef]
  19. Torres-Gil, M.A.; Casanova-Gonzalez, O.; González-Mora, J.L. Applications of virtual reality for visually impaired people. WSEAS Trans. Comput. 2010, 9, 184–193. [Google Scholar] [CrossRef]
  20. The Journey Forward: Recovery from the COVID-19 Pandemic. 2022. Available online: https://www.afb.org/research-and-initiatives/covid-19-research/journey-forward/introduction (accessed on 1 April 2025).
  21. Biggs, B.; Agbaroji, H.; Toth, C.; Stockman, T.; Coughlan, J.M.; Walker, B.N. Co-designing auditory navigation solutions for traveling as a blind individual during the COVID-19 pandemic. J. Blind. Innov. Res. 2024, 14, 1. [Google Scholar] [CrossRef]
  22. Thomas, D.; Warwick, A.; Olvera-Barrios, A.; Egan, C.; Schwartz, R.; Patra, S.; Eleftheriadis, H.; Khawaja, A.; Lotery, A.; Müller, P.; et al. Estimating excess visual loss in people with neovascular age-related macular degeneration during the COVID-19 pandemic. BMJ Open. 2020, 12, e057269. [Google Scholar] [CrossRef]
  23. Williams, M.A.; Hurst, A.; Kane, S.K. “pray before you step out” describing personal and situational blind navigation behaviors. In Proceedings of the 15th International ACM SIGACCESS Conference on Computers and Accessibility, Bellevue, DC, USA, 21–23 October 2013; pp. 1–8. [Google Scholar] [CrossRef]
  24. Cliburn, D.C. Teaching and learning with virtual reality. J. Comput. Sci. Coll. 2023, 39, 19–27. [Google Scholar] [CrossRef]
  25. Hoffmann, C.; Büttner, S.; Prilla, M. Conveying procedural and descriptive knowledge with augmented reality. In Proceedings of the 15th International Conference on PErvasive Technologies Related to Assistive Environments, Corfu, Greece, 29 June–1 July 2022; pp. 40–49. [Google Scholar] [CrossRef]
  26. Damiani, L.; Demartini, M.; Guizzi, G.; Revetria, R.; Tonelli, F. Augmented and virtual reality applications in industrial systems: A qualitative review towards the industry 4.0 era. IFAC-Pap. 2018, 51, 624–630. [Google Scholar] [CrossRef]
  27. Nair, V.; Ma, S.-E.; Penuela, R.E.G.; He, Y.; Lin, K.; Hayes, M.; Huddleston, H.; Donnelly, M.; Smith, B.A. Uncovering visually impaired gamers’ preferences for spatial awareness tools within video games. In Proceedings of the 24th international ACM SIGACCESS Conference on Computers and Accessibility, Athens, Greece, 23–26 October 2022; pp. 1–16. [Google Scholar] [CrossRef]
  28. Fehling, C.D.; Müller, A.; Aehnelt, M. Enhancing vocational training with augmented reality. In Proceedings of the 16th International Conference on Knowledge Technologies and Data-Driven Business, Graz, Austria, 16–19 September 2016. [Google Scholar] [CrossRef]
  29. Leporini, B.; Buzzi, M.; Hersh, M. Video conferencing tools: Comparative study of the experiences of screen reader users and the development of more inclusive design guidelines. ACM Trans. Access. Comput. 2023, 16, 1–36. [Google Scholar] [CrossRef]
  30. Schröder, J.-H.; Schacht, D.; Peper, N.; Hamurculu, A.M.; Jetter, H.-C. Collaborating across realities: Analytical lenses for understanding dyadic collaboration in transitional interfaces. In Proceedings of the 2023 CHI conference on human factors in computing systems, Hamburg, Germany, 23–28 April 2023; pp. 1–16. [Google Scholar] [CrossRef]
  31. Le, K.-D.; Ly, D.-N.; Nguyen, H.-L.; Le, Q.-T.; Fjeld, M.; Tran, M.-T. HybridMingler: Towards mixed-reality support for mingling at hybrid conferences. In Proceedings of the Extended abstracts of the 2023 CHI Conference on Human Factors in Computing Systems, Hamburg, Germany, 23–28 April 2023; pp. 1–7. [Google Scholar] [CrossRef]
  32. Biggs, B.; Toth, C.; Stockman, T.; Coughlan, J.M.; Walker, B.N. Evaluation of a non-visual auditory choropleth and travel map viewer. In Proceedings of the 27th International Conference on Auditory Display, Virtually, 24–27 June 2022. [Google Scholar] [CrossRef]
  33. Wisotzky, E.L.; Rosenthal, J.-C.; Eisert, P.; Hilsmann, A.; Schmid, F.; Bauer, M.; Schneider, A.; Uecker, F.C. Interactive and multimodal-based augmented reality for remote assistance using a digital surgical microscope. In Proceedings of the 2019 IEEE Conference on Virtual Reality and 3D User Interfaces (VR), Osaka, Japan, 23–27 March 2019; pp. 1477–1484. [Google Scholar] [CrossRef]
  34. Aira. 2018. Available online: https://aira.io/ (accessed on 1 April 2025).
  35. Oda, O.; Elvezio, C.; Sukan, M.; Feiner, S.; Tversky, B. Virtual replicas for remote assistance in virtual and augmented reality. In Proceedings of the 28th Annual ACM Symposium on User Interface Software & Technology, Charlotte, NC, USA, 11–15 November 2015; pp. 405–415. [Google Scholar] [CrossRef]
  36. Mourtzis, D.; Siatras, V.; Angelopoulos, J. Real-time remote maintenance support based on augmented reality (AR). Appl. Sci. 2020, 10, 1855. [Google Scholar] [CrossRef]
  37. Cofano, F.; Di Perna, G.; Bozzaro, M.; Longo, A.; Marengo, N.; Zenga, F.; Zullo, N.; Cavalieri, M.; Damiani, L.; Boges, D.J.; et al. Augmented reality in medical practice: From spine surgery to remote assistance. Front. Surg. 2021, 8, 657901. [Google Scholar] [CrossRef]
  38. Lányi, S. Virtual reality in healthcare. In Intelligent Paradigms for Assistive and Preventive Healthcare; Springer: Berlin/Heidelberg, Germany, 2006; pp. 87–116. [Google Scholar] [CrossRef]
  39. Wedoff, R.; Ball, L.; Wang, A.; Khoo, Y.X.; Lieberman, L.; Rector, K. Virtual showdown: An accessible virtual reality game with scaffolds for youth with visual impairments. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, Glasgow, Scotland, UK, 4–9 May 2019; pp. 1–15. [Google Scholar] [CrossRef]
  40. Moline, J. Virtual reality for health care: A survey. Virtual Real. Neuro-Psycho-Physiol. 1997, 44, 3–34. [Google Scholar] [CrossRef]
  41. Shenai, M.B.; Dillavou, M.; Shum, C.; Ross, D.; Tubbs, R.S.; Shih, A.; Guthrie, B.L. Virtual interactive presence and augmented reality (VIPAR) for remote surgical assistance. Operat. Neurosurg. 2011, 68, ons200–ons207. [Google Scholar] [CrossRef] [PubMed]
  42. Riva, G. Virtual reality for health care: The status of research. Cyberpsychol. Behav. 2002, 5, 219–225. [Google Scholar] [CrossRef]
  43. Wilson, C.J.; Soranzo, A. The use of virtual reality in psychology: A case study in visual perception. Comput Math Methods Med. 2015, 2015, 151702. [Google Scholar] [CrossRef]
  44. Ruotolo, F.; Maffei, L.; Di Gabriele, M.; Iachini, T.; Masullo, M.; Ruggiero, G.; Senese, V.P. Immersive virtual reality and environmental noise assessment: An innovative audio–visual approach. Environ. Impact Assess. Rev. 2013, 41, 10–20. [Google Scholar] [CrossRef]
  45. Walker, B.N.; Nees, M.A. Chapter 2: Theory of sonification. In The Sonification Handbook; Hermann, T., Hunt, A., Neuhoff, J.G., Eds.; Logos Publishing House: Berlin, Germany, 2011; Available online: http://sonification.de/handbook/download/TheSonificationHandbook-chapter2.pdf (accessed on 1 April 2025).
  46. Brinkman, W.-P.; Hoekstra, A.R.D.; Van Egmond, R. The effect of 3D audio and other audio techniques on virtual reality experience. Annu. Rev. Cybertherapy Telemed. 2015, 219, 44–48. [Google Scholar] [CrossRef]
  47. LaValle, S.M.; Yershova, A.; Katsev, M.; Antonov, M. Head tracking for the oculus rift. In Proceedings of the 2014 IEEE international conference on robotics and automation (ICRA), Hong Kong, China, 31 May–7 June 2014; pp. 187–194. [Google Scholar] [CrossRef]
  48. Wu, T.L.Y.; Gomes, A.; Fernandes, K.; Wang, D. The effect of head tracking on the degree of presence in virtual reality. Int. J. Hum.–Comput. Interact. 2019, 35, 1569–1577. [Google Scholar] [CrossRef]
  49. Knigge, J.-K. Theoretical background: The use of virtual reality head-mounted devices for planning and training in the context of manual order picking. In Virtual Reality in Manual Order Picking: Using Head-Mounted Devices for Planning and Training; Springer: Berlin/Heidelberg, Germany, 2021; pp. 13–32. [Google Scholar] [CrossRef]
  50. Pamungkas, D.S.; Ward, K. Electro-tactile feedback system to enhance virtual reality experience. IJCTE 2016, 8, 465–470. [Google Scholar] [CrossRef]
  51. Buttussi, F.; Chittaro, L. Locomotion in place in virtual reality: A comparative evaluation of joystick, teleport, and leaning. IEEE Trans. Vis. Comput. Graph. 2019, 27, 125–136. [Google Scholar] [CrossRef]
  52. de Pascale, M.; Mulatto, S.; Prattichizzo, D. Bringing haptics to second life for visually impaired people. In Proceedings of the International Conference on Human Haptic Sensing and Touch Enabled Computer Applications, Madrid, Spain, 10–13 June 2008; pp. 896–905. [Google Scholar] [CrossRef]
  53. Bekrater-Bodmann, R.; Foell, J.; Diers, M.; Kamping, S.; Rance, M.; Kirsch, P.; Trojan, J.; Fuchs, X.; Bach, F.; Çakmak, H.K.; et al. The importance of synchrony and temporal order of visual and tactile input for illusory limb ownership experiences–an fMRI study applying virtual reality. PLoS ONE 2014, 9, e87013. [Google Scholar] [CrossRef]
  54. Kunz, A.; Miesenberger, K.; Zeng, L.; Weber, G. Virtual navigation environment for blind and low vision people. In Proceedings of the International Conference on Computers Helping People with Special Needs, Linz, Austria, 11–13 July 2018; pp. 114–122. [Google Scholar] [CrossRef]
  55. HaptX|haptic Gloves for VR Training, Simulation, and Design. 2019. Available online: https://haptx.com/ (accessed on 1 April 2025).
  56. Soviak, A.; Borodin, A.; Ashok, V.; Borodin, Y.; Puzis, Y.; Ramakrishnan, I.V. Tactile accessibility: Does anyone need a haptic glove? In Proceedings of the 18th International ACM SIGACCESS Conference on Computers and Accessibility, Reno, NV, USA, 24–26 October 2016; pp. 101–109. [Google Scholar] [CrossRef]
  57. Afonso-Jaco, A.; Katz, B.F.G. Spatial knowledge via auditory information for blind individuals: Spatial cognition studies and the use of audio-VR. Sensors 2022, 22, 4794. [Google Scholar] [CrossRef]
  58. White, G.R.; Fitzpatrick, G.; McAllister, G. Toward accessible 3D virtual environments for the blind and visually impaired. In Proceedings of the 3rd International Conference on Digital Interactive Media in Entertainment and Arts, Athens, Greece, 10–12 September 2008; pp. 134–141. [Google Scholar] [CrossRef]
  59. Schinazi, V.R.; Thrash, T.; Chebat, D.-R. Spatial navigation by congenitally blind individuals. Wiley Interdiscip. Rev. Cogn. Sci. 2016, 7, 37–58. [Google Scholar] [CrossRef] [PubMed]
  60. Siu, A.F.; Sinclair, M.; Kovacs, R.; Ofek, E.; Holz, C.; Cutrell, E. Virtual reality without vision: A haptic and auditory white cane to navigate complex virtual worlds. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, Honolulu, HI, USA, 25–30 April 2020; pp. 1–13. [Google Scholar] [CrossRef]
  61. Allain, K.; Dado, B.; Van Gelderen, M.; Hokke, O.; Oliveira, M.; Bidarra, R.; Gaubitch, N.D.; Hendriks, R.C.; Kybartas, B. An audio game for training navigation skills of blind children. In Proceedings of the 2015 IEEE 2nd VR workshop on Sonic Interactions for Virtual Environments (SIVE), Arles, France, 24 March 2015; pp. 1–4. [Google Scholar] [CrossRef]
  62. Kuriakose, B.; Shrestha, R.; Sandnes, F.E. Tools and technologies for blind and visually impaired navigation support: A review. IETE Tech. Rev. 2022, 39, 3–18. [Google Scholar] [CrossRef]
  63. Guerreiro, J.; Kim, Y.; Nogueira, R.; Chung, S.; Rodrigues, A.; Oh, U. The design space of the auditory representation of objects and their behaviours in virtual reality for blind people. IEEE Trans. Vis. Comput. Graph. 2023, 29, 2763–2773. [Google Scholar] [CrossRef]
  64. Magnusson, C.; Rassmus-Gröhn, K.; Sjöström, C.; Danielsson, H. Navigation and recognition in complex haptic virtual environments–reports from an extensive study with blind users. In Proceedings of the third International ACM Conference on Assistive Technologies, Marina del Rey, CA, USA, 15–17 April 1998. [Google Scholar] [CrossRef]
  65. Balan, O.; Moldoveanu, A.; Moldoveanu, F. Navigational audio games: An effective approach toward improving spatial contextual learning for blind people. Int. J. Disabil. Hum. Dev. 2015, 14, 109–118. [Google Scholar] [CrossRef]
  66. Podkosova, I.; Urbanek, M.; Kaufmann, H. A hybrid sound model for 3D audio games with real walking. In Proceedings of the 29th International Conference on Computer Animation and Social Agents (CASA ’16), Geneva, Switzerland, 23–25 May 2016; pp. 189–192. [Google Scholar] [CrossRef]
  67. Seki, Y.; Sato, T. A training system of orientation and mobility for blind people using acoustic virtual reality. IEEE Trans. Neural Syst. Rehabil. Eng. 2010, 19, 95–104. [Google Scholar] [CrossRef]
  68. Husin, M.H.; Lim, Y.K. InWalker: Smart white cane for the blind. Disabil. Rehabil. Assist. Technol. 2020, 15, 701–707. [Google Scholar] [CrossRef] [PubMed]
  69. Tatsumi, H.; Murai, Y.; Sekita, I.; Tokumasu, S.; Miyakawa, M. Cane walk in the virtual reality space using virtual haptic sensing. In Proceedings of the 2015 IEEE International Conference on Systems, Man, and Cybernetics, Hong Kong, China, 9–12 October 2015. [Google Scholar] [CrossRef]
  70. Tahat, A.A. A wireless ranging system for the blind long-cane utilizing a smart-phone. In Proceedings of the 2009 10th International Conference on Telecommunications, Zagreb, Croatia, 8–10 June 2009; pp. 111–117. [Google Scholar] [CrossRef]
  71. Lécuyer, A.; Mobuchon, P.; Mégard, C.; Perret, J.; Andriot, C.; Colinot, J.-P. HOMERE: A multimodal system for visually impaired people to explore virtual environments. In Proceedings of IEEE Virtual Reality 2003; IEEE: New York, NY, USA, 2003; pp. 251–258. [Google Scholar] [CrossRef]
  72. Schloerb, D.W.; Lahav, O.; Desloge, J.G.; Srinivasan, M.A. BlindAid: Virtual environment system for self-reliant trip planning and orientation and mobility training. In Proceedings of the 2010 IEEE Haptics Symposium, Washington, DC, USA, 25–26 March 2010; pp. 363–370. [Google Scholar] [CrossRef]
  73. Lahav, O. Virtual reality systems as an orientation aid for people who are blind to acquire new spatial information. Sensors 2022, 22, 1307. [Google Scholar] [CrossRef] [PubMed]
  74. Kreimeier, J.; Karg, P.; Götzelmann, T. BlindWalkVR: Formative insights into blind and visually impaired people’s VR locomotion using commercially available approaches. In Proceedings of the 13th ACM International Conference on PErvasive Technologies Related to Assistive Environments, Corfu, Greece, 30 June 2020–3 July 2020; pp. 1–8. [Google Scholar] [CrossRef]
  75. Strelow, E.R.; Brabyn, J.A. Locomotion of the blind controlled by natural sound cues. Perception 1982, 11, 635–640. [Google Scholar] [CrossRef]
  76. McCrindle, R.J.; Symons, D. Audio space invaders. In Proceedings of the 3rd international conference on disability, virtual reality and associated technologies, Alghero, Italy, 23–25 September 2000. [Google Scholar] [CrossRef]
  77. Fryer, L.; Freeman, J. Presence in those with and without sight: Audio description and its potential for virtual reality applications. J. CyberTherapy Rehabil. 2012, 5, 15–23. [Google Scholar] [CrossRef]
  78. Roberts, J.; Lyons, L.; Cafaro, F.; Eydt, R. Interpreting data from within: Supporting human–data interaction in museum exhibits through perspective taking. In Proceedings of the 2014 Conference on Interaction Design and Children, Aarhus, Denmark, 17–20 June 2014; pp. 7–16. [Google Scholar] [CrossRef]
  79. Fishkin, K.P.; Moran, T.P.; Harrison, B.L. Embodied user interfaces: Towards invisible user interfaces. In Proceedings of the Engineering for Human-Computer Interaction: IFIP TC2/TC13 WG2. 7/WG13. 4 Seventh Working Conference on Engineering for Human-Computer Interaction, Heraklion, Greece, 14–18 September 1998; pp. 1–18. [Google Scholar] [CrossRef]
  80. Zhang, X.; Zhang, H.; Zhang, L.; Zhu, Y.; Hu, F. Double-diamond model-based orientation guidance in wearable human–machine navigation systems for blind and visually impaired people. Sensors 2019, 19, 4670. [Google Scholar] [CrossRef] [PubMed]
  81. McDaniel, T.; Bala, S.; Rosenthal, J.; Tadayon, R.; Tadayon, A.; Panchanathan, S. Affective haptics for enhancing access to social interactions for individuals who are blind. In Proceedings of the Universal Access in Human-Computer Interaction. Design and Development Methods for Universal Access: 8th International Conference, UAHCI 2014, Heraklion, Greece, 22–27 June 2014; Held as Part of HCI International 2014, Proceedings, Part I; pp. 419–429. [Google Scholar] [CrossRef]
  82. Krishna, S.; Colbry, D.; Black, J.; Balasubramanian, V.; Panchanathan, S. A systematic requirements analysis and development of an assistive device to enhance the social interaction of people who are blind or visually impaired. In Workshop on Computer Vision Applications for the Visually Impaired; HAL: Lyon, France, 2008. [Google Scholar] [CrossRef]
  83. Sarfraz, M.S.; Constantinescu, A.; Zuzej, M.; Stiefelhagen, R. A multimodal assistive system for helping visually impaired in social interactions. Inform. Spektrum 2017, 40, 540–545. [Google Scholar] [CrossRef]
  84. McDaniel, T.; Tran, D.; Devkota, S.; DiLorenzo, K.; Fakhri, B.; Panchanathan, S. Tactile facial expressions and associated emotions toward accessible social interactions for individuals who are blind. In Proceedings of the 2018 Workshop on Multimedia for Accessible Human Computer Interface, Seoul, Republic of Korea, 22 October 2018; pp. 25–32. [Google Scholar] [CrossRef]
  85. McDaniel, T.; Krishna, S.; Balasubramanian, V.; Colbry, D.; Panchanathan, S. Using a haptic belt to convey non-verbal communication cues during social interactions to individuals who are blind. In Proceedings of the 2008 IEEE International Workshop on Haptic Audio Visual Environments and Games, Phoenix, AZ, USA, 16–17 October 2010; pp. 13–18. [Google Scholar] [CrossRef]
  86. Tao, Y.; Ding, L.; Ganz, A. Indoor navigation validation framework for visually impaired users. IEEE Access 2017, 5, 21763–21773. [Google Scholar] [CrossRef]
  87. de Almeida Rebouças, C.B.; Pagliuca, L.M.F.; de Almeida, P.C. Non-verbal communication: Aspects observed during nursing consultations with blind patients. Esc. Anna Nery 2007, 11, 38–43. [Google Scholar] [CrossRef]
  88. James, D.M.; Stojanovik, V. Communication skills in blind children: A preliminary investigation. Child Care Health Dev. 2007, 33, 4–10. [Google Scholar] [CrossRef]
  89. Collis, G.M.; Bryant, C.A. Interactions between blind parents and their young children. Child Care Health Dev. 1981, 7, 41–50. [Google Scholar] [CrossRef]
  90. Klauke, S.; Sondocie, C.; Fine, I. The impact of low vision on social function: The potential importance of lost visual social cues. J. Optom. 2023, 16, 3–11. [Google Scholar] [CrossRef]
  91. Pölzer, S.; Miesenberger, K. Presenting non-verbal communication to blind users in brainstorming sessions. In Proceedings of the Computers Helping People with Special Needs: 14th International Conference, ICCHP 2014, Paris, France, 9–11 July 2014; Proceedings, Part I; pp. 220–225. [Google Scholar] [CrossRef]
  92. Maloney, D.; Freeman, G.; Wohn, D.Y. “Talking without a voice”: Understanding non-verbal communication in social virtual reality. Proc. ACM Hum.-Comput. Interact. 2020, 4 (CSCW2), 1–25. [Google Scholar] [CrossRef]
  93. Argyle, M. Non-verbal communication in human social interaction. Non-Verbal Commun. 1972, 2, 1. [Google Scholar] [CrossRef]
  94. Astler, D.; Chau, H.; Hsu, K.; Hua, A.; Kannan, A.; Lei, L.; Nathanson, M.; Paryavi, E.; Rosen, M.; Unno, H.; et al. Increased accessibility to nonverbal communication through facial and expression recognition technologies for blind/visually impaired subjects. In Proceedings of the 13th International ACM SIGACCESS Conference on Computers and Accessibility, Dundee, Scotland, UK, 24–26 October 2011; pp. 259–260. [Google Scholar] [CrossRef]
  95. Tanenbaum, T.J.; Hartoonian, N.; Bryan, J. “How do I make this thing smile?”: An inventory of expressive nonverbal communication in commercial social virtual reality platforms. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, Honolulu, HI, USA, 25–30 April 2020; pp. 1–13. [Google Scholar] [CrossRef]
  96. Ke, F.; Im, T. Virtual-reality-based social interaction training for children with high-functioning autism. J. Educ. Res. 2013, 106, 441–461. [Google Scholar] [CrossRef]
  97. Wigham, C.R.; Chanier, T. A study of verbal and nonverbal communication in Second Life–the ARCHI21 experience. ReCALL 2013, 25, 63–84. [Google Scholar] [CrossRef]
  98. Ji, T.F.; Cochran, B.; Zhao, Y. VRBubble: Enhancing peripheral awareness of avatars for people with visual impairments in social virtual reality. In Proceedings of the 24th International ACM SIGACCESS Conference on Computers and Accessibility, Athens, Greece, 23–26 October 2022; pp. 1–17. [Google Scholar] [CrossRef]
  99. Jung, C.; Collins, J.; Penuela, R.E.G.; Segal, J.I.; Won, A.S.; Azenkot, S. Accessible nonverbal cues to support conversations in VR for blind and low vision people. In Proceedings of the 26th International ACM SIGACCESS Conference on Computers and Accessibility, St. John’s, NL, Canada, 27–30 October 2024; pp. 1–13. [Google Scholar] [CrossRef]
  100. Giudice, N.A.; Guenther, B.A.; Kaplan, T.M.; Anderson, S.M.; Knuesel, R.J.; Cioffi, J.F. Use of an indoor navigation system by sighted and blind travelers: Performance similarities across visual status and age. ACM Trans. Access. Comput. (TACCESS) 2020, 13, 1–27. [Google Scholar] [CrossRef]
  101. Loeliger, E.; Stockman, T. Wayfinding without visual cues: Evaluation of an interactive audio map system. Interact. Comput. 2014, 26, 403–416. [Google Scholar] [CrossRef]
  102. Walker, B.N.; Wilson, J. SWAN 2.0: Research and development on a new system for wearable audio navigation. In Proceedings of the 2007 11th IEEE International Symposium on Wearable Computers, Boston, MA, USA, 11–13 October 2007. [Google Scholar] [CrossRef]
  103. Ponchillia, P.E.; Jo, S.-J.; Casey, K.; Harding, S. Developing an indoor navigation application: Identifying the needs and preferences of users who are visually impaired. J. Vis. Impair. Blind. 2020, 114, 344–355. [Google Scholar] [CrossRef]
  104. Diaz-Merced, W.L.; Candey, R.M.; Brickhouse, N.; Schneps, M.; Mannone, J.C.; Brewster, S.; Kolenberg, K. Sonification of astronomical data. Proc. Int. Astron. Union 2011, 7, 133–136. [Google Scholar] [CrossRef]
  105. Lakoff, G. Women, Fire, and Dangerous Things: What Categories Reveal about the Mind; The University of Chicago Press: Chicago, IL, USA, 2012. [Google Scholar]
  106. Biggs, B.; Coughlan, J.; Coppin, P. Design and evaluation of an audio game-inspired auditory map interface. Proc. Int. Conf. Audit. Disp. 2019, 2019, 20–27. [Google Scholar] [CrossRef]
  107. Dingler, T.; Lindsay, J.; Walker, B.N. Learnability of sound cues for environmental features: Auditory icons, earcons, spearcons, and speech. In Proceedings of the 14th International Conference on Auditory Display, Paris, France, 24–27 June 2018. [Google Scholar] [CrossRef]
  108. O’Sullivan, J.A.; Power, A.J.; Mesgarani, N.; Rajaram, S.; Foxe, J.J.; Shinn-Cunningham, B.G.; Slaney, M.; Shamma, S.A.; Lalor, E.C. Attentional selection in a cocktail party environment can be decoded from single-trial EEG. Cereb. Cortex 2015, 25, 1697–1706. [Google Scholar] [CrossRef]
  109. Skantze, D.; Dahlback, N. Auditory icon support for navigation in speech-only interfaces for room-based design metaphors. In Proceedings of the 2003 International Conference on Auditory Display, Boston, MA, USA, 6–9 July 2003. [Google Scholar] [CrossRef]
  110. Audiogames.net. AudioGames, Your Resource for Audiogames, Games for the Blind, Games for the Visually Impaired! 2018. Available online: http://audiogames.net/ (accessed on 1 April 2025).
  111. Nair, V.; Karp, J.L.; Silverman, S.; Kalra, M.; Lehv, H.; Jamil, F.; Smith, B.A. NavStick: Making video games blind-accessible via the ability to look around. In Proceedings of the 34th annual ACM Symposium on User Interface Software and Technology, online, 10–14 October 2021; pp. 538–551. [Google Scholar] [CrossRef]
  112. Balan, O.; Moldoveanu, A.; Moldoveanu, F.; Dascalu, M.-I. Audio games-a novel approach towards effective learning in the case of visually-impaired people. In ICERI2014 Proceedings; IATED: Valencia, Spain, 2014; pp. 6542–6548. [Google Scholar] [CrossRef]
  113. Biggs, B.; Yusim, L.; Coppin, P. The Audio Game Laboratory: Building Maps from Games; OCAD University: Toronto, ON, Canada, 2018. [Google Scholar] [CrossRef]
  114. Jørgensen, K. Gameworld Interfaces; MIT Press: Cambridge, MA, USA, 2013. [Google Scholar]
  115. Sam Tupy. Survive the Wild. 2021. Available online: http://www.samtupy.com/games/stw/ (accessed on 1 April 2025).
  116. Max, M.L.; Gonzalez, J.R. Blind persons navigate in virtual reality (VR); hearing and feeling communicates “reality”. In Medicine Meets Virtual Reality; IOS Press: Amsterdam, The Netherlands, 1997; pp. 54–59. [Google Scholar] [CrossRef]
  117. Andrade, R.; Rogerson, M.J.; Waycott, J.; Baker, S.; Vetere, F. Playing blind: Revealing the world of gamers with visual impairment. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, Glasgow, UK, 4–9 May 2019; pp. 1–14. [Google Scholar] [CrossRef]
  118. Nielsen, J. 10 Usability Heuristics for User Interface Design. 1994. Available online: https://www.nngroup.com/articles/ten-usability-heuristics/ (accessed on 1 April 2025).
  119. Kaldobsky, J. Swamp. 2011. Available online: http://www.kaldobsky.com/audiogames/ (accessed on 1 April 2025).
  120. Out of Sight Games. A Hero’s Call. 2019. Available online: https://outofsightgames.com/a-heros-call/ (accessed on 1 April 2025).
  121. Materia Magica. 2017. Available online: https://www.materiamagica.com/ (accessed on 1 April 2025).
  122. Gluck, A.; Boateng, K.; Brinkley, J. Racing in the dark: Exploring accessible virtual reality by developing a racing game for people who are blind. In Proceedings of the Human Factors and Ergonomics society Annual Meeting, Baltimore, MD, USA, 3–8 October 2021; pp. 1114–1118. [Google Scholar] [CrossRef]
  123. Gluck, A. Virtual Reality in the Dark: VR Development for People Who are Blind. 2022. Available online: https://equalentry.com/virtual-reality-development-for-blind/ (accessed on 1 April 2025).
  124. Design for Every Gamer—DFEG. Available online: https://www.rnib.org.uk/living-with-sight-loss/assistive-aids-and-technology/tv-audio-and-gaming/design-for-every-gamer/ (accessed on 1 April 2025).
  125. Ellis, B.; Ford-Williams, G.; Graham, L.; Grammenos, D.; Hamilton, I.; Lee, E.; Manion, J.; Westin, T. Game Accessibility Guidelines. Available online: https://gameaccessibilityguidelines.com/full-list/ (accessed on 1 April 2025).
  126. Accessible Platform Architectures Working Group. XR Accessibility User Requirements. 2021. Available online: https://www.w3.org/TR/xaur/ (accessed on 1 April 2025).
  127. Creed, C.; Al-Kalbani, M.; Theil, A.; Sarcar, S.; Williams, I. Inclusive AR/VR: Accessibility barriers for immersive technologies. Univers. Access Inf. Soc. 2024, 23, 59–73. [Google Scholar] [CrossRef]
  128. Perdigueiro, J. A Look at Mobile Screen Reader Support in the Unity Engine. 2024. Available online: https://unity.com/blog/engine-platform/mobile-screen-reader-support-in-unity (accessed on 1 April 2025).
  129. Stealcase. UI Toolkit Screen Reader, Comment 2. Available online: https://discussions.unity.com/t/ui-toolkit-screen-reader/246795/2 (accessed on 1 April 2025).
  130. MetalPop Games. UI Accessibility Plugin (UAP). 2021. Available online: https://assetstore.unity.com/packages/tools/gui/ui-accessibility-plugin-uap-87935 (accessed on 1 April 2025).
  131. Aralan007. An Open Letter: Please Improve Screen Reader Support of the Unity Editor and Engine. 2023. Available online: https://discussions.unity.com/t/an-open-letter-please-improve-screen-reader-support-of-the-unity-editor-and-engine/882417 (accessed on 1 April 2025).
  132. Repiteo. Available online: https://github.com/godotengine/godot/pull/76829 (accessed on 1 April 2025).
  133. lightsoutgames. Godot Accessibility Plugin. 2020. Available online: https://github.com/lightsoutgames/godot-accessibility (accessed on 1 April 2025).
  134. Epic Games. Designing UI for Accessibility in Unreal engine|Unreal Engine 5.5 Documentation. Available online: https://dev.epicgames.com/documentation/en-us/unreal-engine/designing-ui-for-accessibility-in-unreal-engine (accessed on 1 April 2025).
  135. Epic Games. Blind Accessibility Features Overview. Available online: https://dev.epicgames.com/documentation/en-us/unreal-engine/blind-accessibility-features-overview-in-unreal-engine (accessed on 1 April 2025).
  136. Bridge, K.; Coulter, D.; Batchelor, D.; Satran, M. Microsoft Active Accessibility and UI Automation Compared. 2020. Available online: https://learn.microsoft.com/en-us/windows/win32/winauto/microsoft-active-accessibility-and-ui-automation-compared (accessed on 1 April 2025).
  137. Gorla, E. Foundations: Native Versus Custom Components. 2023. Available online: https://tetralogical.com/blog/2022/11/08/foundations-native-versus-custom-components/ (accessed on 1 April 2025).
  138. Federal Communications Commission. 21st Century Communications and Video Accessibility Act (CVAA). 2010. Available online: https://www.fcc.gov/consumers/guides/21st-century-communications-and-video-accessibility-act-cvaa (accessed on 1 April 2025).
  139. U.S. Department of Justice Civil Rights Division. Fact Sheet: New Rule on the Accessibility of Web Content and Mobile Apps Provided by State and Local Governments. 2024. Available online: https://www.ada.gov/resources/2024-03-08-web-rule/ (accessed on 1 April 2025).
  140. Costanza-Chock, S. Design Justice: Towards an intersectional feminist framework for design theory and practice. In Proceedings of the Design as a Catalyst for Change—DRS International Conference, Limerick, Ireland, 25–28 June 2018. [Google Scholar] [CrossRef]
  141. Welcome to XR Navigation! Available online: https://xrnavigation.io/ (accessed on 1 April 2025).
  142. Skulmoski, G.J.; Hartman, F.T.; Krahn, J. The delphi method for graduate research. J. Inf. Technol. Educ. Res. 2007, 6, 1–21. [Google Scholar] [CrossRef]
  143. Screen Reader User Survey 10 Results. 2024. Available online: https://webaim.org/projects/screenreadersurvey10/#intro (accessed on 1 April 2025).
  144. Bălan, O.; Moldoveanu, A.; Moldoveanu, F.; Nagy, H.; Wersényi, G.; Unnórsson, R. Improving the audio game-playing performances of people with visual impairments through multimodal training. J. Vis. Impair. Blind. 2017, 111, 148. [Google Scholar] [CrossRef]
  145. Naughty Dog. Accessibility Options for The Last of Us Part II. Available online: https://www.playstation.com/en-us/games/the-last-of-us-part-ii/accessibility/ (accessed on 1 April 2025).
  146. Bailenson, J.N.; Yee, N. Digital chameleons: Automatic assimilation of nonverbal gestures in immersive virtual environments. Psychol. Sci. 2005, 16, 814–819. [Google Scholar] [CrossRef] [PubMed]
  147. Peña, A.; Rangel, N.; Muñoz, M.; Mejia, J.; Lara, G. Affective behavior and nonverbal interaction in collaborative virtual environments. J. Educ. Technol. Soc. 2016, 19, 29–41. [Google Scholar] [CrossRef]
  148. Using NVDA to Evaluate Web Accessibility. Available online: https://webaim.org/articles/nvda/ (accessed on 1 April 2025).
  149. NV Access. NVDA 2017.4 User Guide. 2017. Available online: https://www.nvaccess.org/files/nvda/documentation/userGuide.html (accessed on 1 April 2025).
  150. Using VoiceOver to Evaluate Web Accessibility. Available online: https://webaim.org/articles/voiceover/ (accessed on 1 April 2025).
  151. Apple. Chapter 1. Introducing VoiceOver. 2020. Available online: https://www.apple.com/voiceover/info/guide/_1121.html (accessed on 1 April 2025).
  152. Kager, D.; Kelman, A. Tolk: Screen reader abstraction library. Available online: https://github.com/dkager/tolk (accessed on 1 April 2025).
  153. Mozilla. ARIA Live Regions. 2019. Available online: https://developer.mozilla.org/en-US/docs/Web/Accessibility/ARIA/ARIA_Live_Regions (accessed on 1 April 2025).
  154. Hubs—Private, Virtual 3D Worlds in Your Browser. Available online: https://hubs.mozilla.com/ (accessed on 1 April 2025).
  155. Ian Reed. Tactical Battle. 2013. Available online: https://blindgamers.com/Home/IanReedsGames (accessed on 1 April 2025).
  156. Zombies, Run! Available online: https://zrx.app/ (accessed on 1 April 2025).
  157. Wu, J. Voice Vista. 2023. Available online: https://drwjf.github.io/vvt/index.html (accessed on 1 April 2025).
  158. Iachini, T.; Ruggiero, G.; Ruotolo, F. Does blindness affect egocentric and allocentric frames of reference in small and large scale spaces? Behav. Brain Res. 2014, 273, 73–81. [Google Scholar] [CrossRef]
  159. DragonApps. Cyclepath. 2017. Available online: https://www.iamtalon.me/cyclepath/ (accessed on 1 April 2025).
  160. The Mud Connector. Getting Started: Welcome to Your First MUD Adventure. 2013. Available online: http://www.mudconnect.com/mud_intro.html (accessed on 1 April 2025).
  161. Tanveer, M.I.; Anam, A.S.I.; Rahman, A.K.M.; Ghosh, S.; Yeasin, M. FEPS: A sensory substitution system for the blind to perceive facial expressions. In Proceedings of the 14th International ACM SIGACCESS Conference on Computers and Accessibility, Boulder, CO, USA, 22–24 October 2012; pp. 207–208. [Google Scholar] [CrossRef]
  162. Valenti, R.; Jaimes, A.; Sebe, N. Sonify your face: Facial expressions for sound generation. In Proceedings of the 18th ACM International Conference on Multimedia, Florence, Italy, 25–29 October 2010; pp. 1363–1372. [Google Scholar] [CrossRef]
  163. Denby, B.; Schultz, T.; Honda, K.; Hueber, T.; Gilbert, J.M.; Brumberg, J.S. Silent speech interfaces. Speech Commun. 2010, 52, 270–287. [Google Scholar] [CrossRef]
  164. Kapur, A.; Kapur, S.; Maes, P. Alterego: A personalized wearable silent speech interface. In Proceedings of the 23rd International Conference on Intelligent User Interfaces, Tokyo, Japan, 7–11 March 2018; pp. 43–53. [Google Scholar] [CrossRef]
  165. Foley Sound Effect Libraries. Available online: https://www.hollywoodedge.com/foley.html (accessed on 1 April 2025).
  166. OpenAI. GPT-4 Technical Report. arXiv 2023. [Google Scholar] [CrossRef]
  167. Buzzi, M.C.; Buzzi, M.; Leporini, B.; Mori, G.; Penichet, V.M.R. Collaborative editing: Collaboration, awareness and accessibility issues for the blind. In On the Move to Meaningful Internet Systems: OTM 2014 Workshops. Proceedings of the Confederated International Workshops: OTM Academy, OTM Industry Case Studies Program, C&TC, EI2N, INBAST, ISDE, META4eS, MSC and OnToContent 2014, Amantea, Italy, 27–31 October 2014; Springer: Berlin/Heidelberg, Germany, 2014; pp. 567–573. [Google Scholar] [CrossRef]
  168. Fan, D.; Glazko, K.; Follmer, S. Accessibility of linked-node diagrams on collaborative whiteboards for screen reader users: Challenges and opportunities. In Design Thinking Research: Achieving Real Innovation; Springer: Berlin/Heidelberg, Germany, 2022; pp. 97–108. [Google Scholar] [CrossRef]
  169. Kolykhalova, K.; Alborno, P.; Camurri, A.; Volpe, G. A serious games platform for validating sonification of human full-body movement qualities. In Proceedings of the 3rd International Symposium on Movement and Computing, Thessaloniki, Greece, 5–6 July 2016; pp. 1–5. [Google Scholar] [CrossRef]
  170. Tiku, K.; Maloo, J.; Ramesh, A.; Indra, R. Real-time conversion of sign language to text and speech. In Proceedings of the 2020 Second International Conference on Inventive Research in Computing Applications (ICIRCA), Coimbatore, India, 15–17 July 2020; pp. 346–351. [Google Scholar] [CrossRef]
  171. Rumble. The Builder’s Tutorial. 2018. Available online: https://www.tbamud.com (accessed on 1 April 2025).
  172. Driftwood Games. Entombed. 2008. Available online: http://www.blind-games.com/ (accessed on 1 April 2025).
  173. Bergin, D.; Oppegaard, B. Automating media accessibility: An approach for analyzing audio description across generative artificial intelligence algorithms. Tech. Commun. Q. 2024, 34, 169–184. [Google Scholar] [CrossRef]
  174. Silverman, A.M.; Baguhn, S.J.; Vader, M.-L.; Romero, E.M.; So, C.H.P. Empowering or excluding: Expert insights on inclusive artificial intelligence for people with disabilities. Am. Found. Blind. 2025. [CrossRef]
  175. Audiom: The world’s Most Inclusive Map Viewer. 2021. Available online: https://audiom.net (accessed on 1 April 2025).
  176. Sketchbook (Your World). Available online: https://sbyw.games/index.php (accessed on 1 April 2025).
  177. Download Sable Proof of Concept Demo. Available online: https://ebonskystudios.com/download-sable-demo/ (accessed on 1 April 2025).
  178. Ebon Sky Studios. Sable demo—Part i (Map Creation). 2018. Available online: https://www.youtube.com/watch?v=wyAkqGlDIgY (accessed on 1 April 2025).
  179. Tigwell, G.W.; Gorman, B.M.; Menzies, R. Emoji accessibility for visually impaired people. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, Honolulu, HI, USA, 25–30 April 2020; pp. 1–14. [Google Scholar] [CrossRef]
  180. Cantrell, S.J.; Winters, R.M.; Kaini, P.; Walker, B.N. Sonification of emotion in social media: Affect and accessibility in facebook reactions. In Proceedings of the ACM on Human-Computer Interaction; ACM: New York, NY, USA, 2022; pp. 1–26. [Google Scholar] [CrossRef]
  181. Virtual Reality Accessibility: 11 Things we Learned from Blind Users. 2022. Available online: https://equalentry.com/virtual-reality-accessibility-things-learned-from-blind-users/ (accessed on 1 April 2025).
  182. Soviak, A. Haptic gloves for audio-tactile web accessibility. In Proceedings of the 12th Web for All Conference, Florence, Italy, 18–20 May 2015; p. 40. [Google Scholar] [CrossRef]
  183. Your Rights Under Section 504 of the Rehabilitation Act. 2006. Available online: https://www.hhs.gov/sites/default/files/ocr/civilrights/resources/factsheets/504.pdf (accessed on 1 April 2025).
  184. Rulings, Filings, and Letters. Available online: https://nfb.org/programs-services/legal-program/rulings-filings-and-letters#education (accessed on 1 April 2025).
  185. Unity. Unity Gaming Report. Unity Technologies. 2022. Available online: https://create.unity.com/gaming-report-2022 (accessed on 1 April 2025).
  186. Awards—The Last of Us: Part II. Available online: https://www.imdb.com/title/tt6298000/awards/ (accessed on 1 April 2025).
  187. System Usability Scale (SUS). 2021. Available online: https://www.usability.gov/how-to-and-tools/methods/system-usability-scale.html (accessed on 1 April 2025).
  188. NASA. NASA TLX. 2018. Available online: https://humansystems.arc.nasa.gov/groups/TLX/ (accessed on 1 April 2025).
  189. Tomlinson, B.J.; Noah, B.E.; Walker, B.N. Buzz: An auditory interface user experience scale. In Proceedings of the Extended Abstracts of the 2018 CHI Conference on Human Factors in Computing Systems, Montreal, QC, Canada, 21–26 April 2018; pp. 1–6. [Google Scholar] [CrossRef]
  190. Coleman, G.W.; Hand, C.; Macaulay, C.; Newell, A.F. Approaches to auditory interface design-lessons from computer games. In Proceedings of the 11th International Conference on Auditory Display, Limerick, Ireland, 6–9 July 2005. [Google Scholar] [CrossRef]
  191. Oren, M.A. Speed sonic across the span: Building a platform audio game. In CHI’07 Extended Abstracts on Human Factors in Computing Systems; ACM: New York, NY, USA, 2007; pp. 2231–2236. [Google Scholar] [CrossRef]
  192. Ian Reed. Blind Gamers Home. 2025. Available online: https://blindgamers.com/Home/ (accessed on 1 April 2025).
  193. World Health Organization. Global Data on Visual Impairments. 2010. Available online: https://www.who.int/publications-detail-redirect/world-report-on-vision (accessed on 1 April 2025).
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
