Two Decades of Touchable and Walkable Virtual Reality for Blind and Visually Impaired People: A High-Level Taxonomy

Although most readers associate the term virtual reality (VR) with visually appealing entertainment content, this technology also promises to be helpful to disadvantaged people such as blind or visually impaired people. Virtual objects and environments that can be spatially explored offer a particular benefit, as they overcome the limitations of physical objects and spaces. To give readers a complete, clear and concise overview of current and past publications on touchable and walkable audio-supplemented VR applications for blind and visually impaired users, this survey paper presents a high-level taxonomy to cluster the work done up to now from the perspective of technology, interaction and application. In this respect, we introduce a classification into small-, medium- and large-scale virtual environments to cluster and characterize related work. Our comprehensive table shows that especially grounded force feedback devices for haptic feedback ('small scale') were strongly researched in different application scenarios, mainly from an exocentric perspective, but there are also increasingly physically walkable ('medium scale') and avatar-walkable ('large scale') egocentric audio-haptic virtual environments. In this respect, novel and widespread interfaces such as smartphones or nowadays consumer-grade VR components represent a promising potential for further improvements. Our survey paper provides a database of related work to foster the creation process of new ideas and approaches for both technical and methodological aspects.


Introduction
Self-determined access to spatial information is much more difficult for blind and visually impaired people than for sighted people. In order to touch and understand unknown spatial objects or to get to know unknown buildings for route planning in advance, physical models like maps or miniature models must be made available or specially produced. If, for example, the physical environment cannot be explored for the first time together with a sighted assistant, 2.5D tactile maps are often used, whose benefits can be enhanced by additional interaction possibilities [1,2], e.g., contextual speech output when touched. However, such physical models have limitations when it comes to production costs and time as well as limited interaction possibilities, e.g., the limited resolution of a tactile map or the permanently fixed scale of a 3D printed object. In addition, real environments and sighted assistants are not always freely available, which limits the independence of blind and visually impaired people.
In order to overcome these limitations, among other things, research has been conducted on virtual reality (VR) in this context, and significant progress has been achieved through further technical development. By means of VR, it is possible not only to see virtual objects and environments, but also to hear them (in terms of spatial and semantic information) and to actually grasp and feel them through haptic feedback devices. Virtual content can be flexibly defined, distributed and scaled. The impairment or even absence of vision drastically reduces the overall sensory bandwidth [12], and the combination of haptic and audio feedback becomes especially important to blind and visually impaired users. According to the distinction of the dimensionality of tactile information by Reichinger et al. [13], VR can convey 2D, 2.5D and 3D spatial information. This depends, of course, on the particular application and the technical implementation of the haptic interface with no physical representation of the spatial information. Depending on the modality, there are also different perspectives: audio feedback or spatial hearing is by nature egocentric and allows the simultaneous perception of direction and distance to multiple audio sources. Haptic feedback, on the other hand, consists of spatial information gathered by one or more individual collision points using tactile and kinesthetic cues. It can be generated by force feedback or vibration, and thus is neither ego- nor allocentric, but hand- or body-centered. Depending on the conceptual design of the application and the technical implementation, specific characteristics are decisive. For example, haptic feedback can be implemented using grounded or wearable force feedback (see Figure 1). The former are expensive and rare devices that prevent the users from moving a hand inside a virtual object, which feels more intuitive. The latter are more lightweight and less expensive devices (e.g., data gloves), which limit only the fingers' motions to simulate grasping an object, but do not prevent the users from accidentally moving their wrist as a whole inside the virtual object. In terms of haptic feedback, the survey papers from Pacchierotti et al. [14] and Seifi et al. [15] provide a comprehensive overview regarding haptic rendering devices. In terms of audio rendering, the works from Picinali and Katz [16,17] are exemplary pointers to many other publications which extend the scope of this paper.
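To make this distinction more tangible, the following minimal sketch illustrates the penalty-based (spring) force model that grounded devices like the Phantom typically build on to keep the user's finger from entering a virtual object; the function, its sphere-shaped test object and the stiffness value are illustrative assumptions, not a specific device's implementation.

```python
import numpy as np

def penalty_force(probe_pos, sphere_center, sphere_radius, stiffness=800.0):
    """Illustrative penalty-based force model: when the haptic probe
    penetrates a virtual sphere, push it back along the surface normal
    proportionally to the penetration depth (Hooke's law, F = k * d * n)."""
    offset = probe_pos - sphere_center
    dist = np.linalg.norm(offset)
    penetration = sphere_radius - dist
    if penetration <= 0.0 or dist == 0.0:
        return np.zeros(3)              # no contact -> no restoring force
    normal = offset / dist              # outward surface normal
    return stiffness * penetration * normal

# Example: probe 5 mm inside a 5 cm sphere at the origin
print(penalty_force(np.array([0.0, 0.0, 0.045]), np.zeros(3), 0.05))
# -> roughly [0, 0, 4], i.e., 4 N pushing the probe back out
```

A wearable device without grounding can compute the same collision, but can only signal it (e.g., by vibration) instead of physically stopping the hand.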
In this context, unimodal concepts such as sonification (e.g., [18]) or haptification (e.g., [19]), providing only audio or haptic feedback, must of course also be mentioned, but they represent independent and separate fields of research. With regard to the above-outlined focus of this paper, however, we will concentrate in the following on the interactive spatial exploration aspects of touchable and walkable VR applications. Audio feedback certainly plays an important role here, but in the context of this work, it is considered a supplementary modality providing spatial and semantic information for the exploration process.

Figure 1. Example to distinguish between grounded and wearable haptic feedback following the differentiation from [14]. On the left side one can see a grounded force feedback device (Sensable Phantom), and on the right side a wearable force feedback device (SenseGlove). With the latter, the user's hand is more flexible, but is not prevented from accidentally moving inside a virtual object.

Integration and Differentiation from Existing Surveys
Research in VR for blind people is certainly not a new phenomenon, although technology and methodology are naturally evolving. There are already some works that have summarized and analyzed this development at its time [6][7][8][20][21][22][23][24][25][26]. The oldest survey, from Vincent Lévesque in 2005 [21], gives an overview of the state of knowledge at that time on how blind people can be supported by means of haptics. This work presents a very good overview of the special needs and possible applications of how touchable and walkable virtual environments can support blind and visually impaired people, but it is no longer up to date and nowadays key publications are missing. In 2007, Cobb and Sharkey [9] presented, among other things, a review of the previous decade of research and development of VR applications for blind and visually impaired people. This review gives an interesting and comprehensive look at the state of the art at that time, but it is not specifically focused on blind and visually impaired people and is also outdated nowadays. In their study of 2008, White et al. interviewed experts with practical experience to find out how multimodal approaches to navigation can offer the greatest possible added value, and they provide important design guidelines and assistance for developers of such systems [6]. Continuing on from there, Ghali et al. gathered in 2012 a number of methods and approaches on how VR can help (deaf and) blind people in several application scenarios like mobility, learning or games [8]. Later, in 2014, Orly Lahav looked back at the previous 14 years of "VEs that were developed to enable people who are blind to improve their O&M skills" (VE stands for virtual environment). In this extensive work, a differentiation between "systems that support the acquisition of a cognitive map" and "systems that are used as O&M rehabilitation aids" is made, and the existing works are analyzed in terms of a descriptive information dimension, a system dimension and a research dimension. According to this, most publications deal with complex prototypes whose handling is not trivial, and Orientation and Mobility (O+M) experts were only rarely involved in development and evaluation. Ideally, a VR system would adapt to the users and could also be used in situ and handheld. Similarly, the survey paper of Darin et al. [20] from 2015 proposes and discusses a classification of related work based on "Interface, Interaction, Cognition, and Evaluation"; in total, they analyzed 21 VEs. While this overview is very vivid, it is not complete nowadays and does not consider multimodal haptic environments in general, which is why some key publications are missing in this context. This is certainly also due to the fact that the last two publications cited were published several years ago. In 2018, Yasmin published an extensive survey on "Virtual Reality and Assistive Technologies" [24], which is certainly closer to today. One chapter summarizes "Haptic VEs for visual impairments", while the work as a whole deals with supporting several impairments through the use of VR and does not cover all approaches that are relevant in the context of the present survey paper. Especially on cognitive learning and methodological aspects, there are overview works by Mesquita et al. [22,23] from 2018 and 2019, which, however, pay less attention to technical aspects of the user interface and its implementation. The latest review, by Façanha et al.
in 2020 [26], focuses on virtual environments for orientation and mobility training purposes and provides an in-depth analysis regarding technical development as well as usability and cognitive evaluation aspects. However, due to its specific focus and emphasis, key publications regarding VR for blind and visually impaired users from 2019 and 2020 and outside this particular scope are missing. In addition, the authors did not introduce a novel taxonomy, nor did they consider a wider scope beyond orientation and mobility training, as is done in the work presented here.

Scientific Scope and Literature Search Methodology
The aim of this work was to give readers a complete, clear and concise overview of publications on the topic of conveying (generic) spatial information by means of touchable and walkable VR for blind and visually impaired users. We especially focus on classifying the particular implementation of the spatial exploration interface and its evaluation in different application scenarios. This thematic environment is in itself a very broad field that can be analyzed in almost any depth, but with this paper, we wish to give readers the opportunity to get a lucid and holistic overview of existing technical, evaluation-related and application-oriented aspects. In the following, we specify the content and interfaces published so far and classify related work by means of a literature review and a taxonomy. The results provide precise pointers to further work in each synoptic cluster. For this purpose, a systematic literature search was conducted using the publisher-independent and thorough search engine scholar.google.com as a starting point. Here, we identified initial relevant literature using the keywords 'virtual reality', 'virtual environment', 'blind' and 'visually impaired' and iteratively completed (and verified) our database by a backward snowball search [27] from the latest work to the earliest findable publication; see the complete workflow in Figure 2. Through the snowball search process, we also became aware of other formulations of relevant work, which most often concerned unimodal interaction approaches like sonification or haptification. If appropriate in the context of this paper (i.e., the user can explore the virtual content interactively using haptic feedback or locomotion), they were also considered in the further process.
We also included other keyword search hits between the earliest and latest hits to expand and verify the collection. We included any scientific work that presents a technical implementation towards a touchable and walkable VR for blind and visually impaired people and optionally also includes an evaluation. We excluded any non-scientific or non-English work as well as work outside the mentioned scope. To sharpen our contribution within this scope, we needed to exclude in-situ real-time navigation aids using augmented reality, as these are better addressed in a separate article specializing in this topic (e.g., see [28]). However, we did use existing review papers' references to re-check and expand our growing dataset.
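The backward snowball procedure outlined above can be summarized as a simple breadth-first traversal of the citation graph; the following sketch is only a schematic illustration, in which fetch_references and is_relevant stand in for the manual reference screening described in this section.

```python
from collections import deque

def backward_snowball(seed_papers, fetch_references, is_relevant):
    """Schematic backward snowball search: starting from seed papers
    found via keyword search, repeatedly follow the reference lists of
    relevant papers until no new relevant work is discovered."""
    collected = set(seed_papers)
    queue = deque(seed_papers)
    while queue:
        paper = queue.popleft()
        for ref in fetch_references(paper):      # reference list of this paper
            if ref not in collected and is_relevant(ref):
                collected.add(ref)               # new relevant hit...
                queue.append(ref)                # ...snowball from it as well
    return collected
```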
Herewith, we compiled a comprehensive overview and derived a novel taxonomy, which can classify the relevant prior work into meaningful and lucid groups to ease future works' search for related work. In the following, we explain these aspects in detail, discuss how this taxonomy is useful and how it can be used for future work.


Definition of the Feature 'Scale'
One essential characteristic of a three-dimensional world is that one can move one's person globally and the attached sensory system locally within it. This also applies to 1D or 2D information such as maps, planar surfaces or generic spatial information like shapes, graphs and geometries that are to be explored. This characteristic must be mapped in a virtual environment, for example to mentally integrate the shape of a virtual object by pointwise manual haptic exploration or to be able to explore a larger environment with means of locomotion. At this point, of course, separate research fields such as the non-visual exploration of information by sonification and haptification are also important. However, in the scope of this paper, unimodal audio or haptic feedback is understood as a supplementary modality when spatially exploring virtual content and is considered rather secondary. Inspired by the hapto-acoustic interaction metaphors of De Felice et al. [5], we consider the particular technical implementation of this interaction possibility to be an essential feature by which different levels of 'scale' can be defined, as we will explain in more detail in the following. An overview is listed in Table 1 and a schematic illustration can be seen in Figure 3.

Table 1. The three levels of 'scale' with common examples: small scale, exploring charts and graphs [30,31]; medium scale, training O+M skills in a certain urban scene [32,33]; large scale, exploring an unknown building [34] or learning a subway network [35].

Scale is thus to be interpreted as the size of the virtual content and not as the users' input space. Depending on this level of size (in the following referred to as scale), different interaction techniques are applied to implement the spatial exploration process (see Figure 3 and Table 1). Smaller (or miniaturized) content can be palpated within arm's reach, while larger environments can be explored on foot within the physical limits of real spaces. Even larger environments must be explored by relative control of an avatar via keyboard, gamepad or walk-in-place approaches. In each scale, there are of course various ways to implement the audio feedback, but as this is a field of its own, it cannot be discussed further in depth in the context of this paper and for the sake of clarity; it is explicitly addressed in dedicated papers like [17] and the work following it.
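As a compact illustration of this feature, the scale levels and their typical exploration techniques could be captured in a small data structure like the following; the numeric thresholds are purely illustrative assumptions and not part of the taxonomy itself.

```python
from enum import Enum

class Scale(Enum):
    """The three 'scale' levels with their typical exploration technique."""
    SMALL = "palpation within arm's reach (e.g., grounded force feedback)"
    MEDIUM = "physical walking inside a tracked space"
    LARGE = "relative avatar control (keyboard, gamepad, walk-in-place)"

def classify(content_extent_m, tracked_space_m):
    """Toy rule of thumb: content within arm's reach is small scale,
    content fitting the physically available tracked space is medium
    scale, and anything larger must be explored via an avatar."""
    if content_extent_m <= 0.8:              # roughly arm's reach
        return Scale.SMALL
    if content_extent_m <= tracked_space_m:
        return Scale.MEDIUM
    return Scale.LARGE

print(classify(25.0, tracked_space_m=5.0))   # Scale.LARGE, e.g., a whole building
```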

Small Scale: Touching Virtual Objects within Arm's Reach or Absolute Positioning of the Avatar
We propose to define the small scale as the interaction of the user with an interface within hand's reach, i.e., the user does not have to move their body in physical space. Palpating or interacting with a virtual object is achieved by absolute positioning of the user interface's haptic feedback point(s) of interaction within hand's reach, while the user stays in the same spot. A very common approach is using a grounded force feedback device from the Phantom series, providing force feedback to the user in a very limited workspace. This haptic feedback can certainly be supplemented by suitable semantic and/or spatial audio feedback, for example when touching a certain part or area of the virtual object that needs to be explored. In this context, the presence of force feedback is an advantage: it prevents the user from inadvertently reaching through the object and thus makes palpation more effective. As a disadvantage, it also entails a very limited workspace, so virtual objects have to be scaled to fit this area. Transferring this spatial information to reality (e.g., a miniature map to a real space) requires a certain imagination and mental capacity, while such devices are also mostly technically complex, expensive and thus rare.
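The scaling constraint mentioned above can be made concrete with a small sketch: a uniform scale factor that shrinks virtual content into the device's reachable volume. The default workspace dimensions below are rough, illustrative values, not the specification of any particular device.

```python
def workspace_scale(object_size_m, workspace_size_m=(0.16, 0.12, 0.12)):
    """Uniform scale factor that shrinks a virtual object so it fits the
    limited workspace of a grounded force feedback device. The default
    workspace dimensions are illustrative, not a device specification."""
    factors = [w / o for w, o in zip(workspace_size_m, object_size_m)]
    return min(1.0, *factors)       # only shrink, never enlarge

# Example: a 10 m x 8 m x 3 m floor plan in a ~16 cm wide workspace
print(workspace_scale((10.0, 8.0, 3.0)))  # 0.015 -> rendered at 1.5% of real size
```

This miniaturization is exactly what makes the later transfer to the real, full-scale space mentally demanding.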

Medium Scale: Physically Walking through VE, Restricted to Physically Available Space
The medium-scale level describes the interaction of users with VEs in larger environments that can be explored by physical walking. The position of the user is tracked by means of appropriate technology, and the user is given sensory feedback as if she or he were actually walking in this environment or around an object. Due to this freedom of physical movement, at least wearable haptic feedback must be used; grounded implementations are not possible without major, usually disproportionately large, technical effort. The freedom of movement is limited to the area of the tracking environment, which is why only sections of a larger environment can be displayed in real size. A common application is the audio-haptic exploration of a room or a section of a virtual outdoor space with a non-grounded haptic feedback white cane simulation. This allows real spatial structures to be simulated much more immersively and comprehensively than miniature models, but such approaches mostly exclude grounded haptic feedback and predetermine a limited physical area that can be used virtually.

Large Scale: Virtually Walking through VE Using an Avatar

At large scale, users can discover VEs or generic objects that are much larger than the physically available space by using an avatar. They perceive the VE as if they were the avatar in it and can control its motion by appropriate user input. The latter can be implemented either by digital or analogue movement with a game controller, the keyboard or a joystick providing passive haptic information. In addition, walk-in-place approaches, which mimic actual walking, have been used. Thus, a theoretically arbitrarily large VE can be explored, but the users must be able to put themselves in their avatar's place as well as possible through a realistic audio-haptic simulation. A common example is the exploration of an unknown, large real space using an avatar, which is translated and rotated stepwise in a grid-like manner inside a purely auditory VE.

Definition of the Feature 'Exploration Interaction'
There are several possibilities for the technical implementation of interactive spatial exploration. In the following, we show what these possibilities are and how they differ from each other. The semantic and conceptual borders between mere sensory input and output or devices might be knowingly blurred in order to better classify and distinguish the overall interaction concept.

Haptic Feedback
According to the definition from Pacchierotti et al. [14], we distinguish grounded and wearable haptic feedback. A good example of grounded haptic feedback is the extensively researched series of Phantom devices. Here, a mechanical arm generates force feedback when the user's finger collides with a virtual object and thus prevents the user from reaching into the virtual object. Such devices are usually very complex, still expensive today and rarely found outside laboratories [15]. Wearable devices are considerably handier and also more cost-effective, but in contrast to grounded force feedback, they do not prevent the user from reaching into a virtual object [36]. Thus, one has to palpate and mentally integrate the surface by monitoring the triggered haptic feedback in the absence of visual feedback. Most often, wearable haptic feedback is implemented as so-called vibrotactile signals. Depending on the particular haptic rendering algorithm, the onset of vibration indicates the collision with a virtual object, for example [37,38]. Combined with force feedback, vibrations can also convey different virtual textures [39,40] when using a virtual white cane.
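A minimal sketch of such an onset-based vibrotactile rendering rule follows; the linear depth-to-amplitude mapping and the saturation depth are assumptions for illustration, as the works cited above each use their own rendering algorithms.

```python
def vibration_amplitude(penetration_depth_m, max_depth_m=0.02):
    """Vibrotactile rendering sketch: vibration starts at the moment of
    collision (penetration > 0) and grows with penetration depth until
    the actuator's maximum amplitude is reached. The linear mapping and
    the 2 cm saturation depth are illustrative assumptions."""
    if penetration_depth_m <= 0.0:
        return 0.0                                   # no contact, no vibration
    return min(penetration_depth_m / max_depth_m, 1.0)

print(vibration_amplitude(0.005))  # 0.25 -> light contact, gentle vibration
```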

Audio Feedback
Spatial hearing is an important source of sensory information for blind and visually impaired people, regarding both contextual information about the environment and spatial information in general. Therefore, almost all known VR applications for blind and visually impaired users integrate spatial audio rendering, for example to give users a realistic and useful acoustic impression of the environment or virtual content [16] or to provide haptic-supplementing semantic information. For instance, this can be spatial hearing in combination with head tracking (e.g., the user can move their head to locate sound sources and acoustic information [41]) or audio rendering processes which are independent of the user's head rotation [42], using stereo speakers or even speaker arrays. Stereo speakers are often used when, for example, controlling an avatar with the keyboard through a purely auditory VE while the users hear through the speakers as if they were the controlled avatar inside the VE [43]. Of course, the intended (and technically possible) quality of this audio rendering also varies with the purpose and possibilities of the used or available hardware; most often the available computing power is decisive [16,32,39,44]. Thereby, (spatial) audio rendering is an interesting field of research in its own right. Since this work considers audio feedback only as a spatial and semantic supplement, we provide the reader with a few references to further, more in-depth work (e.g., [17]).
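To illustrate the head-rotation-independent end of this spectrum, the following sketch computes simple stereo gains from a source's position relative to the listener; this is a deliberately minimal model (constant-power panning with inverse-distance attenuation), far simpler than the HRTF-based rendering discussed in [16,17], and all constants are illustrative.

```python
import math

def stereo_gains(listener_pos, listener_yaw, source_pos):
    """Minimal stereo rendering sketch: attenuate by distance and pan by
    the source's azimuth relative to the listener's facing direction
    (constant-power panning). Positions are 2D (x, z) tuples."""
    dx = source_pos[0] - listener_pos[0]
    dz = source_pos[1] - listener_pos[1]
    dist = max(math.hypot(dx, dz), 0.1)           # clamp to avoid blow-up
    azimuth = math.atan2(dx, dz) - listener_yaw   # angle to source, radians
    pan = math.sin(azimuth)                       # -1 (left) ... +1 (right)
    gain = 1.0 / dist                             # inverse-distance attenuation
    left = gain * math.sqrt(0.5 * (1.0 - pan))
    right = gain * math.sqrt(0.5 * (1.0 + pan))
    return left, right

# A source 2 m away, directly to the right of a north-facing listener
print(stereo_gains((0.0, 0.0), 0.0, (2.0, 0.0)))  # left ~0.0, right ~0.5
```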

Locomotion in VE
A key aspect of the (mostly egocentric) exploration of large VEs is the movement of the users' perspective. For example, when exploring an audio-haptic VE which approximates a real place, the users certainly want to be able to 'walk' around in it (like sighted users [45]). Depending on the available interface, they can move their avatar by means of absolute or relative positioning. The former is most often achieved in a small-scale setting with a grounded haptic feedback device, whose haptic feedback point of interaction represents the users' avatar position in the VE and makes walls and obstacles perceivable. The latter is most often realized by relative movements of the users' avatar position in the VE. Mostly, this movement can be controlled in a grid-like manner horizontally across the VE with a common computer keyboard, with continuous controller input from a joystick, or with a walk-in-place approach mimicking actual walking [37]. Collisions or environment-related information are most often provided supplementarily by audio feedback.
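A minimal sketch of the grid-like, keyboard-driven relative locomotion described above follows; is_walkable and play_sound are placeholders for a concrete VE's collision map and audio feedback, and the step size is an illustrative assumption.

```python
import math

GRID_STEP_M = 1.0  # one key press moves the avatar one cell (illustrative)

def try_step(avatar_pos, avatar_yaw_deg, is_walkable, play_sound):
    """Grid-like avatar locomotion as used in many purely auditory VEs:
    a key press moves the avatar one cell in its facing direction; a
    blocked cell triggers a collision sound instead of movement."""
    rad = math.radians(avatar_yaw_deg)
    target = (avatar_pos[0] + GRID_STEP_M * math.sin(rad),
              avatar_pos[1] + GRID_STEP_M * math.cos(rad))
    if is_walkable(target):
        play_sound("footstep")        # confirm the step acoustically
        return target
    play_sound("collision")           # wall or obstacle ahead
    return avatar_pos                 # stay in place
```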

Definition of the Feature 'Perspective'
Another main aspect when classifying VEs is the type of perspective the user has on or in them. Following the definition of egocentric and exocentric interaction metaphors by De Felice et al. [5], some approaches enable users to perceive the VE as if they were actually inside this environment (i.e., egocentric). One might think of being a doll in a dollhouse and perceiving the environment in terms of audio-haptic feedback. To reach and explore every point in this VE, the users need to be able to interactively move their position; locomotion is needed here (see the previous section in this regard) and audio feedback provides spatial and/or semantic information. With an exocentric view, however, the users can reach and explore the whole VE or virtual object without having to move their physical position, by using a pointer, similar to pointwise scanning of a dollhouse or a 2D mathematical function with a pen. Here, likewise, audio-haptic feedback is used to perceive spatial and semantic information. Thus, an exocentric view is strongly, but not necessarily, connoted with a small- or medium-scale VE. In this context, a common example is the manual audio-haptic exploration of virtual objects like mathematical graphs or downscaled abstract maps of real spaces with grounded force feedback devices.
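In implementation terms, the two perspectives differ mainly in the reference frame in which spatial information is presented. As a hypothetical illustration, an egocentric rendering expresses world-fixed (exocentric) coordinates relative to the user's position and heading:

```python
import math

def world_to_egocentric(point, user_pos, user_yaw):
    """Express a world-fixed (exocentric) 2D point in the user's
    egocentric frame: translate by the user's position, then rotate by
    the inverse of the user's heading (yaw in radians, counterclockwise
    from the +x axis). In the result, +x is forward and +y is left."""
    dx = point[0] - user_pos[0]
    dy = point[1] - user_pos[1]
    c, s = math.cos(-user_yaw), math.sin(-user_yaw)
    return (c * dx - s * dy, s * dx + c * dy)

# A landmark 3 m north of a user facing east (yaw 0 along +x = east)
print(world_to_egocentric((0.0, 3.0), (0.0, 0.0), 0.0))  # (0.0, 3.0) -> 3 m left
```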

Definition of the Feature 'Evaluation and Metrics'
Whenever designing a user study, researchers face the problem of how to understand and measure what the study participants learn or think. Especially when spatial information needs to be conveyed by means of an audio-haptic VE, there are several approaches to answer this question and thereby evaluate the VR application in combination with the user interface. In the context of our taxonomy, this appears to be a valid criterion for clustering existing work. The applied methods range from subjective questionnaires measuring the usability [64] or orientation and mobility skills [65] to objective measurements like the identification performance for virtual objects [66] or the transfer of a trained cognitive map to a real-space navigation task [56]. In addition, some user studies were evaluated by physically rebuilding the cognitive map of the VE using physical props like LEGO bricks [16]. In the context of this paper, the last column of the following table lists the most widely used approaches for each cluster. Some works employ sighted, but blindfolded, participants [66], while other evaluations also involve blind participants [67]. The trend to date, however, is increasingly towards solely blind or visually impaired participants, in the almost double-digit range, fulfilling real-world navigation tasks [56,68].

Application and Discussion of Taxonomy
When applying the previously presented classification criteria to the currently available and relevant related work in this field, certain clusters can be created, which we show and discuss in the following. This brief taxonomy certainly cannot cover all publications in full depth, but it aims to provide a vivid and high-level overview of thematically appropriate connections and clusters. The chronological context and content of the survey presented in Table 2 is explained in more detail below. At first glance, one might notice that the number of citations per cluster is not consistent. Our aim was to reflect the current status as accurately as possible, and the meaningful distinctions in terms of content cause this imbalance. In the following, we briefly discuss the historical development towards the current state of the art and report on contextual milestones and trends at that time. This short summary can only mention a few milestones, which will hopefully motivate readers to conduct focused in-depth research along these contextual links.

Technical and Content Development over the Last Two Decades
The development of touchable and walkable VR for blind and visually impaired people started in the literature in 1997, when Max and Gonzales [69] reported on "Navigable Virtual Reality Worlds for Blind Users, using Spatialized Audio". In the further course of the late 1990s and early 2000s, extensive research was conducted with grounded force feedback devices like the Phantom series in an exocentric small-scale context, e.g., making virtual charts and diagrams [31], but also simple 3D geometric objects [66], graspable. From today's point of view, the technology of that time was very limited in terms of quality and quantity of haptic and audio feedback, which also narrowed the technical bandwidth of haptic and spatial information. Especially interesting is the combination of a force feedback data glove with a grounded force feedback mechanical arm [70,71], or of two mechanical arms providing force feedback [72], to improve the spatial perception, which led to sophisticated hardware and software engineering approaches [73]. On the other hand, these were also very complex as well as expensive and therefore not widely disseminated. Subsequently, towards the end of the 2000s, many experiments were carried out with devices such as the Phantom, and parameters such as haptic and audio [74] rendering were optimized so that more use cases like gaming could be implemented [47]. Some researchers also began analyzing the transfer of virtually trained knowledge to navigation in the corresponding real places [75] or used other devices like commercial off-the-shelf products [76,77]. Beginning with [78,79], the former was intensively researched by Orly Lahav and Jaime Sanchez throughout the mid-2010s (e.g., [34,58,62,68,80,81]). During this period, audio rendering also became much more powerful [82], making it possible to walk through and understand unknown environments in the absence of haptic feedback [16,55,63].
From the mid-2010s to today, complex and expensive simulation hardware like the Phantom was exchanged for commercial off-the-shelf products like the Nintendo Wii controller [58] or smartphones [43] to receive the users' input and provide them with audio-haptic sensory feedback. In addition, appropriate VR hardware and software equipment was available by then and could be used directly for haptic and audio feedback [83,84], or smartphones could be used for virtual explorations of VEs [32,56]. There are even approaches to simulate echolocation in VR [85]. These can be found in the large-scale category, as users can use a controllable avatar to walk through much larger environments than the physically available space would allow [37]. The research trends in the remaining category, medium scale, continue to develop away from complex and expensive laboratory experiments [33] towards relatively inexpensive, but nevertheless sophisticated, audio-haptic approaches like room-scale walkable audio-haptic white cane simulations [39,40].

Analysis of the Current State of the Art
Considering Table 2, it is noticeable that in the small-scale cluster, a particularly large number of publications have an exocentric perspective. This is probably due to the fact that application scenarios in this context can be implemented particularly efficiently and thus with as little cognitive load as possible during operation. Mostly grounded force feedback is used here, which particularly supports the associated pointwise iterative exploration of (virtual) content. To even be able to capture small-scale spatial information with vibrotactile, non-grounded data gloves, it appears reasonable to use a real plane (e.g., a table) as a reference; otherwise, the cognitive load when palpating 3D information without grounded force feedback is very high. Most of the work with exocentric small-scale VEs is done with two-dimensional content such as graphs and diagrams, but also with three-dimensional, simple geometric objects. The evaluation with users was mostly done by measuring how detailed the gained mental model was and how the users perceived the usability. With the other application scenarios like maps, games or proxies of real spaces, similar evaluation methods were used; games were oftentimes tested in a feasibility study, while the learned mental model of a map could be rebuilt and checked with physical objects or with navigation tasks in the real environment. With egocentric small-scale VEs, a similar range of evaluation possibilities applies.

Medium-scale VEs are all egocentric simulations providing audio and wearable haptic feedback or just interactive audio feedback. In addition to wearing appropriate VR glasses and headphones in a tracked environment, smartphones can also take over this function and use a real empty space for a walkable virtual environment. Some works focus on the integration of haptic feedback to simulate a virtual white cane for blind people in this virtual room, most often a proxy of a real space. Compared to small and large scale, there is relatively little work done in this area, since such laboratory equipment is not very common and the corresponding functionality of smartphones is not yet well known or used.

In the large-scale cluster, a relatively large number of publications are listed; these primarily contain the exploration of a purely auditory VE by relative movement of the user's avatar inside it. This procedure is certainly less common than the absolute positioning of an avatar in small or medium scale, but this approach can also represent VEs that are larger than the physically available space. It is also possible to simulate a virtual white cane with haptic feedback or to use a smartphone or computer to control the avatar's movement. The virtual environment explored here can be a replica of a real environment or intentionally a game to specifically train orientation and mobility skills. The resulting mental model or the improvement of skills can be measured subjectively and objectively through questionnaires or specific navigation tasks. In summary, Figure 4 provides an illustrative example for each scale.

Figure 4. Common examples for small, medium and large walkable and touchable VR applications following [40,54,57]. From left to right: exocentric exploration with grounded force feedback; egocentric exploration with a virtual white cane in a tracked environment; and egocentric exploration by controlling an avatar with the keyboard or a game controller. The haptic exploration process of spatial information is in each case supplemented by audio feedback; as examples of spatial information, a floor plan, an outdoor scene and simple geometric shapes are shown.
However, despite all these exciting and promising developments, VR has (to the authors' best knowledge) not yet arrived in the everyday life of blind and visually impaired people. For sighted people, there is currently great progress in the VR and AR (augmented reality) field, but visual perception has completely different requirements and possibilities. Considering the mostly purely acoustic and/or haptic perception in VR, it can be quite a challenge to identify even simple 3D geometric objects without modern haptic feedback [38,83,84]. The technical development and thus also the possibilities for implementation improvements are constantly advancing, which, together with the knowledge gained so far, represents a great potential that needs to be exploited. In particular, smartphones [32,56] and nowadays commercial data gloves [84] hold big potential. Such virtual environments could also be used to practice the use of in-situ navigation aids in a repeatable and safe training environment [86][87][88]. A functional connection to existing in-situ navigation aids (e.g., [89,90]) would also be desirable, and one could also think about realizing so far uncommon feature pairings, e.g., a medium-scale VE with an exocentric perspective. An application example might be a virtual map or a virtual object for whose complete exploration users must move in a trackable space, e.g., a true-scale horse or a map that can be explored by actually walking in (respectively, over) it. These are certainly only a few of the many possibilities for improvement that arise in view of the continuing development of technology. Instead of individual and rare special laboratory prototypes (as has often been the case to date), a common software and hardware platform should be targeted and used in the long term, which could be useful for sighted, blind, visually impaired and otherwise impaired people. While VR is undoubtedly a very modern and high-tech way to convey spatial information, it is often simply more practical to use simpler and less complex approaches. The high-level taxonomy presented here cannot yet make a definite statement in this context, but it is intended to support future work in this field by helping researchers to further develop their ideas as well as possible using existing knowledge.

The Taxonomy's Value for Future Work
To conclude the development and application of this taxonomy, we provide the readers with some inspiration as to how it can contribute to future work in this field. First, it would be very interesting to evaluate concrete application examples (e.g., orientation and mobility training) not only within but also between different scales in the context of the further developing technical possibilities. Up to now, all developments and evaluations have taken place in a prototype setting and have rarely been evaluated in a realistic application scenario. Such a comparison between different scales (e.g., how grasping a downscaled virtual map versus virtually walking in it influences the quality and quantity of the mental model and the cognitive load) could bring a noticeable information gain towards a real-world application.
Second, existing paradigms and well-researched characteristics could be rethought and optimized across scales to create novel approaches. For example, does the King Midas problem (i.e., the double assignment of the user's haptic sense to both navigation input and information output) also apply to non-small-scale and/or non-haptic modalities in virtual environments? Investigations in this area could help to make such VR systems more efficient and easier to use, which is especially important for blind and visually impaired people with a very limited (if any) visual sense.
Third, novel combinations of taxonomy features like the users' perspective, the application scenario and the evaluation metric could help to promote novel approaches. For example, in a medium- or large-scale setting, in what type of application and implementation would exocentric information be useful? Would it be useful if the users could interactively change their perspective, and how could this change best be presented to them?
These are certainly only a few thought-provoking ideas, but they do show how the presented taxonomy provides a framework to foster the creation process of new ideas and approaches regarding both technical and methodological aspects. Table 2 and Figure 4 are a helpful support for grasping existing work, but they are certainly also a shortened and content-wise optimized representation. Each individual citation represents a more or less extensive scientific contribution, so the mere quantity of citations in a cluster should not necessarily be understood as 'scientific weight'. At first glance, this presentation (without the accompanying text in Sections 3.1 and 3.2) also does not show any temporal links, which means that content-temporal conclusions are only possible to a limited extent. The authors worked to the best of their knowledge in order to minimize such effects, but due to the necessary content compression and classification, effects such as seeming visualization-related weightings cannot be completely excluded. In addition, due to the necessary content focus of a taxonomy, not every aspect of a virtual environment or the user's interaction with it can be fully covered. Beyond the aspects mentioned in this work, there are undoubtedly more and deeper classification and analysis possibilities. However, this work is intended to be a starting point to understand recent related work and to refer to further literature (including previous survey papers).

Conclusions and Outlook
This survey paper looked back over the past two decades and highlighted the developments that have led to the current state of the art when it comes to the application of VR for blind and visually impaired people. We proposed and applied a high-level taxonomy to cluster the work done up to now from the perspective of technology, interaction and application. Foremost, we introduced a classification into small-, medium- and large-scale virtual environments to characterize the interaction with and the content of the virtual environment. Our comprehensive table shows that especially grounded force feedback devices for haptic feedback were strongly researched in different application scenarios and mainly from an exocentric perspective. However, such devices have a very limited interaction area, which can be expanded with medium-scale (i.e., physically walkable) virtual environments and completely overcome with large-scale environments. The latter are virtually walkable with an avatar that can be controlled by the user, and these virtual environments are, for example, approximations of large unknown physical environments. The use of novel and widespread interfaces such as smartphones or nowadays commercial off-the-shelf VR components also represents a promising potential. This work contributes to such future work by summarizing previous knowledge and thus making it as comprehensible and usable as possible to foster future development in this field.
Author Contributions: Conceptualization, methodology, formal analysis, investigation, data curation, writing-original draft preparation, J.K.; writing-review and editing, supervision, T.G. All authors have read and agreed to the published version of the manuscript.