The Unfolding Space Glove: A Wearable Spatio-Visual to Haptic Sensory Substitution Device for Blind People

This paper documents the design, implementation and evaluation of the Unfolding Space Glove—an open source sensory substitution device. It transmits the relative position and distance of nearby objects as vibratory stimuli to the back of the hand and thus enables blind people to haptically explore the depth of their surrounding space, assisting with navigation tasks such as object recognition and wayfinding. The prototype requires no external hardware, is highly portable, operates in all lighting conditions, and provides continuous and immediate feedback—all while being visually unobtrusive. Both blind (n = 8) and blindfolded sighted participants (n = 6) completed structured training and obstacle courses with both the prototype and a white long cane to allow performance comparisons to be drawn between them. The subjects quickly learned how to use the glove and successfully completed all of the trials, though still being slower with it than with the cane. Qualitative interviews revealed a high level of usability and user experience. Overall, the results indicate the general processability of spatial information through sensory substitution using haptic, vibrotactile interfaces. Further research would be required to evaluate the prototype’s capabilities after extensive training and to derive a fully functional navigation aid from its features.


Navigation Challenges for Blind People
Vision is the modality of human sensory perception with the highest information capacity [1,2]. Being born blind or losing one's sight later in life involves great challenges. The ability to cope with everyday life independently, to be mobile in unfamiliar places, to absorb information and, as a result, to participate equally in social, public and economic life can be severely hampered, ultimately affecting one's quality of life [3][4][5][6]. Society can, of course, address this issue on many levels, for example by ensuring accessibility of information or by designing public spaces to meet the specific needs of blind individuals or, more generally, visually impaired people (VIPs). In addition, technical aids, devices and apps are constantly being developed to assist VIPs with certain tasks [7,8]. These aids can essentially be divided into three aspects: obtaining information of the surroundings (What is written here? What is the object ahead like?), interfacing with machines and computers (input and output) and navigation (How do I get there? Can I walk this way?). While there are certainly large overlaps between these three aspects, this paper exclusively focuses on the third: navigation.
Following the definition of Montello and Sas [9], navigation (often used synonymously with mobility, orientation or wayfinding [10,11]) is the ability to perform "coordinated and goal-directed movement[s] through the environment". It can be divided into two subcomponents: wayfinding (long-range navigation to a target destination, spatial orientation, knowledge about surroundings and landmarks outside the immediate environment) and locomotion (short-range navigation, moving in the intended direction without colliding or getting stuck, obstacle detection and development of strategies to overcome them). This paper will concentrate on the latter and discuss how VIPs can be supported in this by technical aids, that is, without the help of human companions.

Ways to Assist Locomotion
One theoretical solution would be to rehabilitate (rudimentary) vision: since the 1990s, research has been conducted into surgical measures in both retinal implants stimulating the optic nerve and brain implants attached directly to the visual cortex. The quality of vision restored by this kind of surgery varies widely and, even in the best cases, represents only a fraction of the visual acuity of people with ordinary eyesight. Together with high costs, this means that invasive measures of this kind are still far from widespread use and have so far only been tested on a small number of people. In the medium term, however, improving the quality of implants and simplifying surgical procedures could make the technology available to a wider public [12,13]. With regard to navigation, however, it is questionable whether the mere provision of visual information to the brain would adequately address the problem. Even if sighted individuals can rely on the visual modality for the most part, navigation and the acquisition of spatial knowledge required for it are by no means dedicated visual tasks. Blind people (but also sighted people to some extent) use multiple modalities and develop various strategies to master navigation and locomotion, which can be described as multimodal or even amodal task-specific functions. There is an ongoing discussion about how exactly VIPs (a very heterogeneous group with varying capacities for absorbing environmental information) obtain and cognitively process spatial knowledge [14,15].
Two well established and commercially available aids that do not attempt to assist locomotion through vision restoration alone are the white long cane (which provides basic spatial information about objects in close proximity) and the guide dog (which uses the dog's spatial knowledge to assist navigation).
Figures for prevalence of the white cane among VIPs vary greatly depending on the age of the study population, the severity of their visual impairment and other factors [16][17][18][19], but in one example (USA, 2008) it is as low as ~10% [16]. Even though the white cane, once mastered, has proven to be an extremely helpful (and probably the most popular) tool for VIPs, it comes with the drawback of having only a limited range (approx. 1 m radius) and not being able to recognise aerial obstacles (tree branches, protruding objects) [20]. Smart canes that give vibratory feedback, capable of recognising objects above waist height or further away, could offer a remedy, but have, to the best of our knowledge, not yet achieved widespread use. The reasons for this, as well as their advantages and disadvantages, have been discussed in various publications [20][21][22][23][24].
Guide dogs, on the other hand, are another promising option that can bring further advantages for blind people besides solving navigation tasks [25][26][27][28]. However, they are even less widespread (e.g., only 1% of the blind people in the USA, 2008 [16] or 2.4% of "the registered blind" in the U.K. [26] owned one). The reasons for this and the drawbacks of the guide dog as a mobility aid are manifold and have been frequently discussed in the literature.
Even among the users of these two aids, 40% reported head injuries at least once a year (18% once a month) and as a result, 34% leave their usual routes only once or several times a month, while 6% never do [19].
With an estimated worldwide total number of 295 million people with moderate or severe visual impairment in 2020, of which 36 million are totally blind [29], it can be assumed that the locomotion tasks addressed here (among others) still represent a global problem for VIPs and pose an important field of research.

Sensory Substitution as an Approach
The concept of Sensory Substitution (SS) offers a promising approach to address this shortcoming. The basic assumption of SS is that the function of a missing or impaired sensory modality can be replaced by stimulating another sensory modality using the missing information. This only works because the brain is so plastic that it learns to associate the new stimuli with the missing modality, as long as they fundamentally share the same characteristics [30]. Surgical intervention would not be necessary because existing modalities or sensory organs can be used instead.
There is a great amount of scientific work on the topic of SS, specifically dealing with the substitution of visual information. The research field was established in the 1960s by a research group around Paul Bach-y-Rita; they developed multiple variations of a Sensory Substitution Device (SSD) stimulating the sense of touch in order to replace missing vision, commonly called Tactile-Vision Substitution Systems (TVSS) [31][32][33]. Many influential publications followed this example and developed SSDs for the tactile modality [34][35][36][37], one even suggesting using a glove as tactile interface [38]; others addressed further modalities, such as the auditory with so-called Auditory-Vision Substitution Systems (AVSS) [39][40][41][42][43]. While there are summaries of existing approaches [44,45], there are also many other, smaller publications on the topic in the literature, often only looking at a sub-area of SS.
It should be noted that there is further work on so-called Electronic Travel Aids (ETAs) [24,44,46]. SSDs differ from them in the sense that they pass largely unprocessed information from the environment on to the substituting sensory modality and leave the interpretation to the user, while ETAs provide pre-interpreted abstract information (e.g., when to turn right or where a certain object is located).

Brain Plasticity and Sensory Substitution
The theoretical basis of SS is summarised under the term of brain plasticity. Although the focus of this paper is not on the neurophysiological discussion of this term, a brief digression is nevertheless helpful in order to understand some of the design decisions made in this project.
In general, brain plasticity describes the "adaptive capacities of the central nervous system" and "its ability to modify its own structural organization and functioning" [30]. While neuroscience has long assumed a fixed assignment of certain sensory and motor functions to specific areas of the brain, we today know that the brain is capable of reorganising itself, e.g., after brain damage [51] and, moreover, is capable of learning new sensory stimuli not only in early development but throughout life [52]. For sensory substitution to work and for the new neural correlate to be learned, a number of conditions are nevertheless necessary; Bach-Y-Rita et al. [34] point out that there has to be "(a) functional demand, (b) the sensor technology to fill that demand, and (c) the training and psychosocial factors that support the functional demand".
An SSD only needs a sensor that picks up the information, an interface that transmits it to human receptors, and finally and very importantly, the possibility for the user to modify the sensory stimulus by motor action in order to determine the initial origin of the information [34]. The importance of the latter close dependence between motor action and sensory perception has been emphasised in many publications [33,34] and is assumed to be the basis of any sensory experience [53].
There still is a vital discussion across disciplines about how cognitive processing of sensory stimuli is carried out by the brain. Worth mentioning here is the Theory of Sensorimotor Contingencies (SMCs) dismissing longstanding representational models and describing the supposedly passive perception of environmental cues as an active process that relies on regularities between action and reception that have to be learned [54]. The literature on SSDs and the SMC theory mutually refer to each other [54,55].

Pitfalls of Existing Substitution Systems
Despite the long tradition of research on the topic of SS and numerous publications with promising results, the concept has not yet achieved a real breakthrough. The exact reasons for the low number of available devices and users have often been discussed and made the subject of proposals for improvement [30,40,44,55,56].
Certain prerequisites that an SSD must meet in order to be used by a target group can be gathered from both existing literature and methods of interaction design. These aspects are of a very abstract nature and their implementation in practice is challenging and often only partially achievable. The following 14 aspects or prerequisites (the first ten of which were originally formulated as problems of existing SSDs by Chebat et al. [55]) were taken into account in the design and evaluation of the proposed SSD: learning, training, latency, dissemination, cognitive load, orientation of the sensor, spatial depth, contrast, resolution, costs, motor potential, preservation of sensory and motor habits, user experience and joy of use, and aesthetic appearance. See Appendix A for a discussion.
The motivation to deal with this field arose from the technological progress since Bach-y-Rita's early pioneering work and his analogue experimental set-ups; improvements in price, portability and processing power of modern digital computing devices, sensors and other components necessary for the development of an SSD have opened up many new possibilities in the field and facilitate the implementation of these 14 aspects, now more than ever. Recent literature on assistive technology for VIPs also suggests that the field is "gaining increasing prominence owing to an explosion of new interest in it from disparate disciplines" having a "very relevant social impact" [57].

Previous Work
The Unfolding Space Glove is an SSD; it translates spatio-visual depth images into information that can be sensed haptically. In a broader sense, it is a TVSS, with the original term tactile perception deliberately being changed to haptic because of the high level of integration of the motor system, and the term visual being changed to spatio-visual to describe the input more accurately.
It was first drafted in previous work by Kilian in 2018 [58,59] with a focus on Interaction Design (only available in German). However, the first prototypes of the glove were still a bit cumbersome, heavy, had higher latencies and were prone to errors. Nevertheless, they were able to prove the functional principle and demonstrate that more research on this device is worthwhile. Through the course of this project, the device was refined and the prototype tested in this study was ultimately able to deliver promising results in the empirical tests (see results and discussion section) and meets many of the previously defined prerequisites (see discussion section). For more details and background information on the project please also see the project website https://www.unfoldingspace.org (accessed on 26 January 2022). Code, hardware, documentation and building instructions are open source and available in the public repository https://github.com/jakobkilian/unfolding-space (accessed on 26 January 2022). Consider Release v0.2.1 for the stable version used in this study and consider more recent commits in which the content has been revised for better accessibility.

Structure and Technical Details
The Unfolding Space Glove (Figure 1) essentially consists of two parts: a USB power bank that is worn on the upper arm (or elsewhere) and the glove itself, holding the camera, actuators, computing unit and associated technology. The only connection between them is the power supply USB cable. A list of the required components and photographic documentation of the assembly process is attached in the Supplementary Materials S1 and can be found in the aforementioned GitHub repository. The mere material costs of the entire set-up are about $600, of which about two thirds go to the camera alone (see Appendix B).
Structurally, a TVSS typically consists of three components: the input (a camera that captures images of the environment), the computing unit (translating these images into patterns suitable for tactile perception), and finally the output (a tactile interface that provides this information to the user).
The selected input system gathering 3D information on the environment is the "Pico Flexx" Time of Flight (ToF) camera by pmdtechnologies [60]. ToF refers to a technique that determines the distance to objects by measuring the time that actively emitted light signals take to reach them (and bounce back to the sensor). For quite a while, SS research focused on conventional two-dimensional greyscale images, and it is only in the last few years that the use of 3D images or data, especially from ToF cameras, has been investigated. Due to the advantages of today's ToF technology compared to other 3D imaging methods (such as structured light or stereo vision) [61], it seemed reasonable to make efforts to explore exactly this combination. See Appendix B for a more detailed discussion of existing 3D SSDs and the choice of the ToF camera technology.
The computing unit, a Raspberry Pi Compute Module 4, is attached to the glove as part of the Unfolding Space Carrier Board which is described more closely in Appendix C.
The output is a 3 × 3 matrix of (vibrating) linear resonant actuators (LRAs) placed on the back of the hand in the glove. The choice of actuators and the reasons for positioning them on the back of the hand are described in Appendix D.

Algorithm
The SSD's translation algorithm takes the total of 224 × 171 (depth) pixels from the ToF camera and calculates the final values of the glove's 3 × 3 vibration pattern. Each motor represents the object that is closest to the glove within the corresponding part of the field of view of the camera. A detailed explanation of the algorithm can be found in Appendix E. The code files and their documentation are attached as zip files in the Supplementary Materials S2 but can also be accessed in the aforementioned GitHub repository.
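The mapping from depth image to vibration pattern can be illustrated with a minimal sketch. It is not the algorithm of Appendix E: it assumes that the image is split into 3 × 3 equal tiles, that the plain minimum of the valid depth values in a tile stands for the closest object, and that vibration intensity falls linearly from 1 at the near limit to 0 at the far limit. The function name depth_to_pattern and its parameters are hypothetical; only the image size, the 3 × 3 layout and the 0.1 m to 2 m range come from the text.

```python
def depth_to_pattern(depth, min_range=0.1, max_range=2.0, rows=3, cols=3):
    """Reduce a depth image (list of rows, values in metres) to a rows x cols
    grid of vibration intensities in [0, 1].

    Assumptions (not taken from the paper): equal-sized tiles, plain minimum
    per tile, linear distance-to-intensity mapping.
    """
    height = len(depth)
    width = len(depth[0])
    pattern = []
    for r in range(rows):
        # Integer tile boundaries, so that 224 columns split into 74/75/75.
        y0, y1 = r * height // rows, (r + 1) * height // rows
        row_out = []
        for c in range(cols):
            x0, x1 = c * width // cols, (c + 1) * width // cols
            # Keep only pixels inside the detection range.
            valid = [
                depth[y][x]
                for y in range(y0, y1)
                for x in range(x0, x1)
                if min_range <= depth[y][x] <= max_range
            ]
            if not valid:
                row_out.append(0.0)  # nothing in range: motor stays off
            else:
                nearest = min(valid)
                row_out.append((max_range - nearest) / (max_range - min_range))
        pattern.append(row_out)
    return pattern


# Example: an empty scene (everything beyond 2 m) with one close object top left.
frame = [[3.0] * 224 for _ in range(171)]
frame[0][0] = 0.1
pattern = depth_to_pattern(frame)
```

A plain minimum reacts to single noisy pixels, so a practical implementation would likely need some form of filtering before taking the closest value.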

Summary
The resulting system achieves a frame rate of 25 fps and a measured latency of about 50 ms. About 10 ms of this is due to the rise time of the LRAs, 3 ms to the image-processing on the Raspberry Pi and an unknown part to the operations of the Pico Flexx specific library. The system has a horizontal field of view of 62° and a vertical of 45° with a detection range from 0.1 m to 2 m [60]. The beginning of the detection range is determined by the limitations of the ToF method, while the 2 m maximum distance was just fixed for this study and could be adjusted in the future (maximum of the camera is 4 m [60] with decreasing quality in the far range). The glove by itself weighs 120 g with all its components, the power bank and bracelet weigh 275 g together, giving a total system weight of 395 g. The glove was produced in two sizes, each with a spare unit, in order to guarantee a good fit for all subjects and a smooth conduct of the study. Now that the physical system has been described in detail, the next section will explain the study design.

Ethics
In order to evaluate the SSD described, a quasi-experiment was proposed and approved by the ethics committee of the faculty of medicine at the university hospital of the Eberhard-Karls-University Tübingen in accordance with the 2013 Helsinki Declaration. All participants were informed about the study objectives, design and associated risks and signed an informed consent form to publish pseudonymous case details. The individuals shown in photographs in this paper have explicitly consented to the publication of those by signing an additional informed consent form.

Hypotheses
The study included training and testing of the white long cane. This was not done with the intention of pitting the two aids against each other or eventually replacing the cane with the SSD. Rather, the aim was to be able to discuss the SSD comparatively with respect to a controlled set of navigation tasks. In fact, the glove is designed in a way that it could be worn and used in combination with the cane in the other hand. Testing not only both aids but also the combination of both would, however, introduce new unknown interactions and confounding factors.
The main objective of this study thus reads: "The impact of the studied SSD on the performance of the population in both navigation and obstacle detection is comparable to that of the white long cane." The hypothesis derived from this is complex due to one problem: blind subjects usually have at least basic experience with the white cane or have been using it on a daily basis for decades. A newly learned tool such as the SSD can therefore hardly be experimentally compared with an internalised device like the cane. A second group of naive sighted subjects was therefore included to test two separate sub-hypotheses:
Sub-Hypothesis H1a. Non-Inferiority of SSD Performance: after equivalent structured training with both aids, sighted subjects (no visual impairment but blindfolded, no experience with the cane) achieve a non-inferior performance in task completion time (25 percentage points margin) with the SSD compared to the cane in navigating an obstacle course.
Sub-Hypothesis H1b. Equivalency of Learning Progress across Groups: at the same time, blind subjects who have received identical training (here only with the SSD) show equivalent learning progress with the SSD (25 percentage points margin) as the sighted group.
With both sub-hypotheses confirmed, one can therefore, in simple terms, make assumptions about the effect of the SSD on the navigation of blind people compared to the cane, if both had been learned similarly. In addition, two secondary aspects should be investigated:
Hypothesis 2. The device is easy to learn, simple to use, achieves a high level of user enjoyment and satisfaction and thus strong acceptance rates.

Hypothesis 3. Distal Attribution.
Users report unconscious processing of stimuli and describe the origin of these haptic stimuli distally in space at the actual location of the observed object.

Study Population
A total of 14 participants were recruited mainly through calls at the university and through local associations for blind and visually impaired people in Cologne, Germany. Appendix F contains a summary table with the subject data presented below. The complete data set is also available in the study data in Supplementary Materials S3.
Six of the subjects were normally sighted and had a visual acuity of 0.3 or higher; eight were blind (congenitally and late blind) and thus had a visual acuity of less than 0.05 and/or a visual field of less than 10° (category 3-5 according to ICD-10 H54.9 definition) in the better eye. Participants' self-reports about their visual acuity were confirmed with a finger counting test (1 m distance) and, if passed, with the screen based Landolt visual acuity test "FrACT" [62] (3 m distance) using a tactile input device (Figure 2A). Two subjects were excluded from the evaluation despite having completed the study: on average, subject f (cane = 0.75, SSD = 2.393) and subject z (cane = 1.214, SSD = 1.500) caused a remarkably higher number of contacts (two to three-fold) with both aids than the average of the remaining blind subjects (cane = 0.375, SSD = 0.643). For the former this can be explained by a consistently high level of nervousness when walking through the course. With both aids, the subject changed their course very erratically in the event of a contact, causing further contacts or even collisions right away. The performance of the latter worsened considerably towards the end of the study, again with both aids, so much so that the subject was no longer able to fulfil the task of avoiding obstacles at all, citing "bad form on the day" and fatigue as reasons. In order to not influence the available data by this apparent deviation, these two subjects were excluded from all further analysis.
The age of the remaining participants (six female, five male, one not specified) averaged 45 ± 16.65 years and ranged from 25 to 72 years. All were healthy (apart from visual impairments) and stated that they were able to assess and perform the physical effort of the task; none had prior experience with the Unfolding Space Glove or other visual SSDs.
All participants in the blind group had been using the white cane on a daily basis for at least five years and/or had completed an Orientation and Mobility (O&M) training. Some occasionally use technical aids like GPS-based navigation and one even had prior experience using the feelspace belt (for navigation reasons only, not for augmentation of the Earth's magnetic field). Two reported using Blind Square from time to time and one used a monocular.
None of the sighted group had prior experience with the white cane.

Experimental Setup
The total duration of the study per subject differs between the blind and the sighted group, as the sighted have to do the training with both aids and the blind with the SSD only (since one inclusion criterion was experience in using the cane). The total length thus was about 4.5 h in the blind group and 5.5 h in the sighted group.
In addition to paper work, introduction and breaks, participants of the sighted group received 10 min of an introductory tutorial on both aids, had 45 min of training with them, spent 60 min using them during the trials (varied slightly due to the time required for completion) and thus reached a total wearing time of about 2 h with each aid. In the blind group, the wearing time of the SSD was identical, while the wearing time of the cane is lower due to the absence of tutorial and training sessions with it.
The study was divided into three study sessions, which took place at the Köln International School of Design (TH Köln, Cologne, Germany) over the span of six weeks. In the middle of a 130 square meter room, a 4 m wide and 7 m long obstacle course was built (Figure 2B), bordered by 1.80 m high cardboard side walls and equipped with eight cardboard obstacles (35 × 31 × 171 cm) distributed on a 50 cm grid (Figure 2C) according to the predefined course layouts.

Procedure
Before the first test run (baseline), the participants received a 10-min Tutorial in which they were introduced to the handling of the devices. Directly afterwards, they had to complete the first Trial Session (TS). This was followed by a total of three Practice Sessions (PS), each of them being followed by another TS, making the Tutorial, four TS and three PS in total. The study concluded with a questionnaire at the end of the third study session after completion of the fourth and very last TS. An exemplary timetable of the study procedure can be found in the Supplementary Materials S4.

Tutorial
In the 10-min Tutorial, participants were introduced to the functional design of the device, its components and its basic usage such as body posture and movements while interacting with the device. At the end, the participants had the opportunity to experience one of the obstacles with the aid and to walk through a gap between two of these obstacles.

Trial Sessions
Each TS consisted of seven consecutive runs in the aid condition cane and seven runs in the condition SSD, with a flip of a coin in each TS deciding which condition to start with. The task given verbally after the description of the obstacle course read: "You are one meter from the start line. You are not centered, but start from an unknown position. Your task is to cross the finish line seven meters behind the start line by using the aid. There are eight obstacles on the way which you should not touch. The time required for the run is measured and your contacts with the objects are counted. Contacts caused by the hand controlling the aid are not counted. Time and contacts are equally weighted; do not solely focus on one. You are welcome to think out loud and comment on your decisions, but you won't get assistance with finishing the task." Contacts with the cane were not included in the statistics, as an essential aspect of its operation is the deliberate induction of contact with obstacles. In addition, for both aids, contacts caused by the hand guiding it were not included in the statistics either, in order to motivate the subjects to freely interact with the aids. There was a clicking sound positioned at the end of the course (centred and 2 m behind the finish line) to roughly guide the direction. There was no help or other type of interference while participants were performing the courses. Only when they accidentally turned more than 90 degrees away from the finish line were they reminded to pay attention to the origin of the clicking sound. Both task completion time and obstacle contacts (including a rating in mild/severe contacts) were entered into a macro-assisted Excel spreadsheet on a hand-held tablet by the experimenter, who was following the subjects at a non-distracting distance. The data of all runs can be found in the study data in Supplementary Materials S3.
A total of 14 different course layouts were used (Figure 3), seven of which were longitudinal axis mirror images of the other seven. The layout order within one aid condition (SSD/cane) over all TS was the same for all participants and predetermined by drawing all 14 possible variations for each TS without replacement. This means that all participants went through the 14 layouts four times each, but in a different order for each TS and with varying aids, so that a memory effect can be excluded.
The layouts were created in advance using an algorithm that distributed the obstacles over the 50 cm grid. A sequence of 20 of these layouts was then evaluated in self-tests and with pre-subjects, leaving the final seven equally difficult layouts (Figure 3).
The study design and the experimental setup were inspired by a proposal of a standardised obstacle course for assessment of "visual function in ultra low vision and artificial vision" [63] but were adapted due to spatial constraints and selected study objectives (e.g., testing with two groups and limited task scope only). There are two further studies suggesting a very similar setup for testing sensory substitution devices [64,65] that were not considered for the choice of this study design.
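The mirroring and the drawing of layout orders can be sketched as follows. This is only an illustration under stated assumptions: obstacle positions are written as hypothetical (column, row) cell indices on a grid that is eight cells wide (4 m at 50 cm spacing), the 14 layouts are numbered 0 to 13, and the first seven drawn layouts of a TS are assigned to the SSD and the remaining seven to the cane. The paper states neither the coordinate convention nor how the drawn layouts were split between the aids, and the function names are made up for this example.

```python
import random

GRID_COLUMNS = 8  # assumption: 4 m course width on a 50 cm grid, cell-indexed


def mirror_layout(layout, columns=GRID_COLUMNS):
    """Mirror obstacle cells about the longitudinal (start-to-finish) axis."""
    return sorted((columns - 1 - col, row) for col, row in layout)


def draw_layout_orders(n_layouts=14, n_sessions=4, seed=0):
    """Draw, for each trial session, all layouts once in random order.

    random.Random.sample draws without replacement, so every layout appears
    exactly once per session. The split into SSD and cane halves is an
    assumption made for this sketch.
    """
    rng = random.Random(seed)  # fixed seed: same order for every participant
    orders = []
    for _ in range(n_sessions):
        order = rng.sample(range(n_layouts), n_layouts)
        half = n_layouts // 2
        orders.append({"ssd": order[:half], "cane": order[half:]})
    return orders


orders = draw_layout_orders()
mirrored = mirror_layout([(0, 0), (3, 5)])
```

Because the order is fixed once in advance, every participant meets the same sequence, while the order still differs from one TS to the next.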

Practice Session
The practice sessions were limited to 15 min and followed a fixed sequence of topics and interaction patterns to be learned with the two aids (Figure 4A-C). In the training sessions obstacles were arranged in varying patterns by the experimenter. Subjects received support as needed from the experimenter and were not only allowed to touch objects in their surroundings, but were even encouraged to do so in order to compare the stimuli perceived by the aid with reality.
In the case of the SSD training, after initially learning the body posture and movement, the main objective was to understand exactly this relationship between stimuli and real object. For this purpose, the subjects went through, for example, increasingly narrow passages with the aim of maintaining a safe distance to the obstacles on the left and right. Later, the tasks increasingly focused on finding strategies to find ways through course layouts similar to the training layouts. While the training with the white cane (in the sighted group) took place in comparable spatial settings, here the subjects learned exercises from the cane programme within an O&M training (posture, swing of the cane, gait, etc.). The experimenter himself received a basic cane training from an O&M trainer in order to be able to carry it out in the study. The sighted subjects were therefore not trained by an experienced trainer, but all by the same person. At the same time, the SSD was not trained "professionally" either, as there are no standardised training methods specifically for the device yet.

Qualitative Approaches
In addition to the quantitative measurements, the subjects were asked to think aloud, to comment on their actions and to describe why they made certain decisions, both during training and during breaks between trials. These statements were written down by hand by the experimenter.
After completion of the last trial, the subjects were asked to fill out the final questionnaire. It consisted of three parts: firstly, the 10 statements of the System Usability Scale (SUS) test [66] on a 0-4 Likert agreement scale; secondly, 10 further custom statements on handling and usability on the same 0-4 Likert scale; and finally, seven questions on different scales and in free text about perception, suggestions for improvement and the possibility to leave a comment on the study. The questions of part one and two were always asked twice: once for the SSD and once for the cane. The subjects could complete this part of the questionnaire either by handwriting or with the help of an audio survey with haptic keys on a computer. This allowed both sighted and blind subjects to answer the questions without being influenced by the presence of the investigator. The third part, on the other hand, was read out to the blind subjects by the investigator, who noted down the answers.
Due to the small number of participants, the results of the questionnaire are not suitable for drawing statistically significant conclusions, but should rather serve the qualitative comparison of the SSD with the cane and support further developments on this or similar SSDs. In the study data in Supplementary Materials S3 there is a list with all questionnaire items and the Likert scale answers of items 1-20. In the Results section and in Appendix H relevant statements made in the free text questions are included.

Analysis and Statistical Methods
A total of 784 trials in 14 different obstacle course layouts were performed by the 14 subjects over all sessions. The dependent variables were task completion time (in short: time) and number of contacts (in short: contacts).
Fixed effects were:
• group: between-subject, binary (blind/sighted)
• aid: within-subject, binary (SSD/cane)
• TS: within-subject, numerical and discrete (the four levels of training)
Variables with random effects were layout as well as the subjects themselves, nested within their corresponding group.
The quantitative data of the dependent variable task completion time was analysed by means of parametric statistics using a linear mixed model (LMM). In order to check whether the chosen model corresponds to the established assumptions for parametric tests, the data were analysed according to the recommendation of Zuur et al. [67]. The time variable itself has been normalised in advance using a logarithmic function to meet those assumptions (referred to in the following as log time). With the assumptions met, all variables were then tested for their significance to the model and their interactions with each other. See Appendix G for details on the model, its fitting procedure, the assumption and interactions tests and corresponding plots.
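To make the structure of the model more tangible, the following is a minimal sketch of the additive part of such an LMM. The index names (i for subject, j for layout, k for trial) and the symbols are assumptions introduced here for illustration; the interaction terms and the exact specification are documented in Appendix G and are not reproduced.

```latex
\begin{align}
\log(\mathrm{time}_{ijk}) &= \beta_0
  + \beta_1\,\mathrm{group}_{i}
  + \beta_2\,\mathrm{aid}_{ijk}
  + \beta_3\,\mathrm{TS}_{ijk}
  + u_i + v_j + \varepsilon_{ijk}, \\
u_i &\sim \mathcal{N}(0, \sigma_u^2), \qquad
v_j \sim \mathcal{N}(0, \sigma_v^2), \qquad
\varepsilon_{ijk} \sim \mathcal{N}(0, \sigma^2)
\end{align}
```

Here u_i is the random intercept of subject i (nested within its group), v_j is the random intercept of layout j and the β terms are the fixed effects listed above.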
Most statistical methods only test for the presence of differences between two treatments and not for their degree of similarity. To test the sub-hypotheses of H1, a non-inferiority test (H1a) and an equivalence test (H1b) were thus carried out. These check whether the least squares (LS) means and corresponding confidence intervals (CI) of a selected contrast exceed a given range (here 25 percentage points in both sub-hypotheses) either in the lower or in the upper direction. In order to confirm the latter, equivalence, both directions must be significant; for non-inferiority, only the "worse" side (in this case the slower one) has to be significant, since it would not falsify the sub-hypothesis if the SSD were unequally faster [68].
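The decision logic of the two tests can be sketched in a few lines of Python. This is an illustration only and not the procedure used in the study: it assumes a normal approximation instead of the Kenward-Roger degrees of freedom, omits the Šidák correction, and assumes that the 25% tolerance is expressed as log(1.25) on the natural-log time scale. All function names are hypothetical.

```python
from math import log
from statistics import NormalDist

# Assumption: the 25% tolerance expressed on the natural-log time scale.
MARGIN = log(1.25)

def one_sided_p_values(estimate, se, margin=MARGIN):
    """Return (p_upper, p_lower) for a contrast estimate on the log scale.

    p_upper tests H0: estimate >= +margin (e.g., SSD slower beyond tolerance).
    p_lower tests H0: estimate <= -margin (e.g., SSD faster beyond tolerance).
    """
    z = NormalDist()
    p_upper = z.cdf((estimate - margin) / se)
    p_lower = 1.0 - z.cdf((estimate + margin) / se)
    return p_upper, p_lower

def is_non_inferior(estimate, se, margin=MARGIN, alpha=0.05):
    """Only the 'worse' (slower) side has to be significant."""
    p_upper, _ = one_sided_p_values(estimate, se, margin)
    return p_upper < alpha

def is_equivalent(estimate, se, margin=MARGIN, alpha=0.05):
    """Both sides have to be significant."""
    p_upper, p_lower = one_sided_p_values(estimate, se, margin)
    return max(p_upper, p_lower) < alpha

if __name__ == "__main__":
    # Hypothetical contrast: the first aid is much faster than the second.
    print(is_non_inferior(-0.5, 0.1), is_equivalent(-0.5, 0.1))
```

With these hypothetical inputs, an aid that is much faster than the reference counts as non-inferior but not as equivalent, which mirrors the asymmetry described above.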
No statistical tests were performed on the contacts data. Since the data structure is zero-inflated and Poisson distributed, more complex models such as a generalised linear mixed model would be required, resulting in low statistical power given the sample size. Nevertheless, descriptive statistical plots of these data are included in the next section alongside the analysis of the log time statistics.
All analyses were executed using the statistical computing environment R and the graphical user interface RStudio. The lme4 package was used to run LMMs. To calculate LS means, their CI and the non-inferiority/equivalence tests, the emmeans package (including its emtrends function) was used. In this paper, averages are shown as arithmetic mean with the corresponding standard deviation. For all statistical tests, an alpha level of 0.05 was chosen.

Overview
To give an impression of the study procedure, a series of videos was made available in high resolution at https://vimeo.com/channels/unfoldingspace (accessed on 26 January 2022). A selection of lower resolution clips is also attached to Supplementary Materials S5. The corresponding subject identifier and the trial number can be found in the opening credits and the descriptive text. All test persons shown here have explicitly agreed to the publication of these recordings.
To get an overview of the gathered data, Figure 5 shows the task completion times. Contacts show a similar picture (Figure 6): in the last TS, sighted subjects touched an average of 0.38 ± 0.73 objects per run with the SSD and only 0.12 ± 0.33 with the cane. Blind subjects also showed a comparable response in the last TS, touching an average of 0.45 ± 0.89 objects per run with the SSD, while touching only an average of 0.40 ± 0.63 objects with the cane. As mentioned, these differences cannot be reasonably tested.

Learning Effect
As a basic assumption for the subsequent hypothesis tests, the learning effect on performance has to be investigated. A significant effect of TS on log time was expected in all combinations except B&C, in which the subjects were already familiar with the aid. Still, with habituation to the task and familiarity with the conditions of the course (size, type of obstacles, etc.), a negligible effect could be expected in all four conditions, i.e., also in the case of B&C. The test was carried out by adjusting the base level of the LMM to the four different combinations. TS shows a statistically significant effect on log time in condition S&S (intercept = 4.37, slope = −0.13, SE = 0.04, p = 0.007), S&C (intercept = 4.03, slope = −0.15, SE = 0.04, p = 0.002) and B&S (intercept = 3.91, slope = −0.16, SE = 0.04, p = 0.001), but not in B&C (intercept = 3.08, slope = −0.06, SE = 0.04, p = 0.137). The expected learning progress was thus confirmed by the tests; the general habituation slope over the course of the study for all subjects and aids was around −0.06 s on log scale.
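Slopes on a log scale are hard to read directly, so a short back-transformation may help. The sketch below converts the reported slopes into an approximate percentage change in completion time per training session. It assumes that log time is the natural logarithm of the time in seconds, which the text does not state explicitly; the function names are hypothetical.

```python
from math import exp

# Slopes of TS on log time as reported in the text (per training session).
SLOPES = {"S&S": -0.13, "S&C": -0.15, "B&S": -0.16, "B&C": -0.06}

def percent_change_per_session(slope):
    """Convert a slope on the natural-log time scale into a percent change."""
    return (exp(slope) - 1.0) * 100.0

def relative_time_after(slope, sessions):
    """Fraction of the initial completion time remaining after n sessions."""
    return exp(slope * sessions)

if __name__ == "__main__":
    for condition, slope in SLOPES.items():
        change = percent_change_per_session(slope)
        print(f"{condition}: {change:.1f}% per training session")
```

Under this assumption, a slope of −0.13 corresponds to a reduction in completion time of roughly 12% per training session.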

Hypothesis H1|Performance
Given the knowledge of the significant effects of group, aid and TS and their interactions, the two sub-hypotheses H1a and H1b could be tested.

H1a|Non-Inferiority of SSD Performance
In order to accept H1a, two separate tests were carried out. Firstly, a pairwise comparison using Tukey's HSD test was made between conditions S&S and S&C, both under the condition of TS being 4 (after the last training): using the Kenward-Roger approximation, a significant difference (p < 0.001) was found, with the LS means indicating that the SSD was about 54% slower than the cane in completing one obstacle course run. Secondly, the test for non-inferiority between these two conditions (using the Kenward-Roger approximation and the Šidák correction) was performed and found to be non-significant (p = 1). This means that the SSD differs significantly from the cane and that non-inferiority within the predefined tolerance range of 25% could not be shown. H0 of H1a thus could not be rejected. The difference between SSD and cane under the condition investigated can also be observed in Figure 7A.
For contacts, as mentioned, a statistical analysis is not feasible. Still, the results can be compared descriptively in Figure 6. As already mentioned, sighted subjects in the last TS touched an average of 0.38 ± 0.73 objects per run with the SSD and only 0.12 ± 0.33 with the cane.

H1b|Equivalence of Learning Progress
To accept H1b, the learning progress of the SSD had to be compared between the two groups, again by using two tests: firstly, the estimated effect of TS on log time differed by only 0.04 s (SE = 0.06) between S&S (−0.12 s, SE = 0.04) and B&S (−0.16 s, SE = 0.04) condition, while not being significant (p value = 0.89). This means that there is no proof at this point that the learning progress between the groups is different. Secondly, to examine the degree of similarity of the given contrast an equivalence test was carried out (again using Kenward-Roger approximation and Šidák correction): a significant p value of 0.016 indicated the presence of equivalence of learning progress in both groups with the SSD (within the predefined tolerance range of 25%). H0 of H1b thus could be rejected. The learning progress of the SSD across the groups can also be observed in Figure 7B.
Again for contacts, a statistical analysis is not feasible. Nevertheless, it appears useful to compare the progress of the sighted and blind curve with the SSD in previous plot Figure 6. In particular, the running average described in this figure suggested a quite similar progress between those two.

Hypothesis H2|Usability & Acceptance
In Figure 8, one can find a tabular evaluation and a graphical representation of all Likert scale questions of the first and second part of the questionnaire (including the SUS). In general, one can see that the degree of coverage between SSD and cane was comparatively high. The discussion section therefore looks at the questions that show the greatest average deviations and discusses them in a classifying manner. There is no graphical representation of questions 21-27 as they were in free text or on other scales.

System Usability Score
The System Usability Score, which was queried in the first 10 questionnaire items, results from the addition of all item scores multiplied by 2.5 and thus ranges from 0 to a maximum of 100 possible points. The SSD achieved an average SUS of 50 in this study, while the cane scored quite similarly at 53. As expected, the cane performed slightly better in the blind group (54) than in the sighted (52), while the SSD performed better in the sighted (51) than in the blind (49). The differences are rather negligible due to the sample size but can be seen as an indicator of a quite comparable assessment of both systems.
[Figure 8: Likert scale results for the questionnaire items, including the SUS. Recoverable SUS item labels:]
Question 1: I think that I would like to use this system frequently.
Question 2: I found the system unnecessarily complex.
Question 3: I thought the system was easy to use.
Question 4: I think that I would need the support of a technical person to be able to use this system.
Question 5: I found the various functions in this system were well integrated.
Question 8: I found the system very cumbersome to use.
Question 9: I felt very confident using the system.
Question 10: I needed to learn a lot of things before I could get going with this system.
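The SUS computation described above can be sketched as follows. The text only states that the item scores are added and multiplied by 2.5; the reverse-coding of the even-numbered (negatively worded) items is an assumption based on the standard SUS scoring procedure, and the function name is hypothetical.

```python
def sus_score(responses):
    """Compute a SUS value (0-100) from ten responses on a 0-4 scale.

    Assumption: odd-numbered items are positively worded and count as given,
    even-numbered items are negatively worded and are reverse-coded (4 - r).
    """
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 responses")
    total = 0
    for item_number, response in enumerate(responses, start=1):
        if not 0 <= response <= 4:
            raise ValueError("responses must lie on the 0-4 scale")
        if item_number % 2 == 1:
            total += response          # positively worded item
        else:
            total += 4 - response      # negatively worded item
    return total * 2.5

if __name__ == "__main__":
    # A participant answering the neutral midpoint on every item.
    print(sus_score([2] * 10))
```

A participant who answers every item with the midpoint of the scale ends up at 50 points, which is close to the averages reported for both aids.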

Clustered Topics in Subjects' Statements
Statements expressed in the free interviews after each session, during the think-aloud reflection in the training sessions and in the free text questions of the questionnaires were grouped into five main topics (with the number of subjects mentioning them in brackets):
• Cognitive Processing of the Stimuli (9). From the statements it is quite clear that the use of the SSD at the beginning of the study required a considerably higher cognitive effort than the use of the cane (Appendix H, Table A3, ID 1-4). Towards the end of the study, the subjects still reported a noticeable cognitive effort. However, they often also noted that the experience felt different than it did at the beginning of the training and that they could imagine that further training would reduce the effort even more (Appendix H, Table A3, ID 5 & 6). The subjects' reports towards the end of the study also suggested that deeper and more far-reaching experiences might be possible with the SSD than with the cane (Appendix H, Table A3, ID 7-9).
• Perception of Space and Materiality (6). The topic of how subjects perceived space and its materiality is undoubtedly related to the previously cited statements about the processing of stimuli: it is noteworthy how often spatial or sometimes visual accounts were assigned to the experiences with the SSD, while the cane was rather described as a tool for warning of objects ahead (Appendix H, Table A4).
• Wayfinding Processes (5). It was mentioned as an advantage of the glove that, in contrast to the cane, an obstacle can already be detected before contact is made, i.e., earlier and from a greater distance; a different path can then be taken in advance in order to avoid a collision with this object. In addition, some described an unpleasant feeling of actively bumping into obstacles with the cane just to get information. However, these were mainly sighted people who were not yet used to handling the cane (Appendix H, Table A5).
• Enjoyment of Use (3). The cane was described by some as easy to learn but therefore less challenging and less fun (Appendix H, Table A6).
• Feeling Safe and Comfortable (3). On the other hand, subjects also reported that they felt safer and more comfortable with the cane (Appendix H, Table A7).

Advantages of the SSD
In question 25 (Q25), the subjects could name pros and cons of both devices. These were summarised into topics and described from the perspective of the advantages of the Unfolding Space Glove. An advantage of the cane, for example, was thus evaluated as a disadvantage of the SSD. The most frequently mentioned advantages of the SSD (by three or four subjects each) were the following: more spatial awareness is possible; one can survey a wider distance; the handling is more subtle and quiet. Frequent disadvantages were the higher learning effort for the SSD and the fact that one can obtain less information about the type of objects due to missing acoustic feedback.

Suggestions for Improvement from Subjects
In Q26, the subjects were encouraged to list their suggestions for improvement for a future version of the same SSD, even if these may not be technically feasible. The two biggest wishes addressed two well-known problems of the prototype: detection of objects close to the ground (e.g., steps, thresholds, unevenness) was requested by five subjects, and the detection of objects closer than 10 cm (where the prototype currently cannot measure and display anything) was requested by four of them. Both would probably have been mentioned even more frequently if they had not already been pointed out as well-known problems at the beginning of the study. Additionally, subjects (number in brackets) wished that they did not have to wear a battery on their arm (3) and that the device was generally more comfortable (2). Some individuals mentioned that they would like to customise the configuration (e.g., adjust the range). Some wished for the detection of certain objects (e.g., stairs) or characteristics of the room (brightness/darkness) to be communicated to them via vibration patterns or voice output.

Hypothesis H3|Distal Attribution
H3 had to be rejected, as there were no specific indications of distal attribution of perceptions in the subjects' statements. However, some of the statements strongly suggest that such patterns were already developing in some subjects (Appendix H, Table A8), which is why this topic will be addressed in the discussion.

Discussion
The results presented above demonstrate not only the perceptibility and processibility of 3D images by means of vibrotactile interfaces for the purpose of navigation, but also the feasibility, learnability and usefulness of the novel Unfolding Space Glove, a haptic spatio-visual sensory substitution system.
Before discussing the results, it has to be made explicitly clear that the study design and the experimental set-up do not yet allow generalisations to be made about real-life navigational tasks for blind people. In order to be able to define the objective of the study precisely, many typical, everyday hazards and problems were deliberately excluded. These include objects close to the ground (thresholds, tripping hazards and steps) or the recognition of approaching staircases. Furthermore, auditory feedback from the cane, which allows conclusions to be drawn about the material and condition of the objects in question, was omitted. In addition, there is the risk of a technical failure or error, the limit of a single battery charge and other smaller everyday drawbacks (waterproofness, robustness, etc.) that the prototype currently still suffers from. Of course, many of the points listed here could be solved technically and could be integrated into the SSD at a later stage. However, they would require development time, would have to be evaluated separately and therefore cannot simply be taken for granted in the present state.
With that being said, it is possible to draw a number of conclusions from the data presented. First of all, some technical aspects: the prototype withstood the entire course of the study with no technical problems and was able to meet the requirements placed on it that allow the sensory experience itself to be assessed as independently of the device as possible. These include, for example, intuitive operation of the available functions, sufficient wearing comfort, easy and quick donning and doffing, sufficient battery life and good heat management.
The experimental design is also worth noting: components such as data collection via tablet, the labelled grid system for placing the obstacles plus corresponding set-up index cards and the interface for real-time monitoring of the prototype enabled the sessions to be carried out smoothly with only one experimenter. An assistant helped to set up and dismantle the room, provided additional support (e.g., by reconfiguring the courses) and documented the study in photos and videos, but neither had to be nor was present at every session. Observations, ratings and participant communication were carried out exclusively by the experimenter.
Turning now to the sensory experience under study, it can be deduced that the 3D information of the environment conveyed by the device is very direct and easy to learn. Not only were the subjects able to successfully complete all 392 trials with the SSD, but they also showed good results as early as the first session and thus after only a few minutes of wearing the device. This is, to the best of our knowledge, in contrast to many other SSDs in the literature, which require several hours of training before the new sensory information can be used meaningfully (in return, usually offering higher information density).
Nevertheless, the cane outperforms the SSD in time and contacts, in both groups in the first TS and at every other stage of the study (also see Figures 5 and 6). Apart from the measurements, the fact that the cane seems to be even easier to access than the glove is also shown in the results of the questionnaire among the sighted subjects, for whom both aids were new: while many answers between the SSD and the cane only differed slightly on average (∆ ≤ 0.5), the deviations are greatest in questions about the learning progress. Sighted subjects thought the cane was easier to use (Q3, ∆ = 1.0), could imagine that "most people would learn to use this system" more quickly (Q7, ∆ = 0.8) and stated that they had to learn fewer things before they could use the system (Q10, ∆ = 0.7) compared to the SSD.

Hypothesis 1 and Further Learning Progress
At the end of the study and after about 2 h of wearing time, the test of H1a shows that the SSD is still about 54% slower than the cane. Even though this difference in walking speed could be acceptable if (and only if) other factors gave the SSD an advantage, H1a had to be rejected: the deviation exceeds the predefined 25% tolerance range and thus can no longer be understood as a "non-inferior performance". H1b, however, can be accepted, indicating that due to an "equivalent learning progress" between the groups these results would also apply to blind people who have not yet learned either device.
Left unanswered and holding potential for further research is the question of what the further progression of learning would look like. The fact that the cane (in comparison to the SSD) already reached a certain saturation in the given task spectrum at the end of the study is indicated by several aspects: looking at the performance of B&C, one can roughly estimate the time and average contacts that blind people need to complete the course with their well-trained aid. At the same time, the measurements of S&C are already quite close to those of B&C at the end of the study, so that it can be assumed that only a few more hours of training would be necessary for the sighted to align with them (within the mentioned limited task spectrum of the experimental set-up). The assumption that the learning curve of the SSD is less saturated than that of the cane at the end of the study is supported by sighted subjects stating that the cane required less concentration at the end of the study (Q18, 1.8) than the SSD (Q18, 2.7). Furthermore, they expected less learning progress for the cane with "another 3 h of training" (Q23, 2.2) in contrast to the SSD (Q23, 3.5) and also in contrast to the learning progress they already had with the cane during the study (Q22, 3.3). Therefore, a few exciting research questions are whether the learning progress of the SSD would continue in a similar way, at which threshold value it would come to rest and whether this value would eventually be equal to or better than that of the cane. Note that the training time of 2 h in this study is far below that of many other publications on SS; one often-cited study, for example, reached 20-40 h of training with most subjects and 150 h with one particular subject [32].
Another aspect that confounds the interpretation of the data is the presence of a correlation between the two dependent variables time/log time and contacts. The reason for this is quite simple: faster walking paces lead to a higher number of contacts. A slow walking pace, on the other hand, allows more time for the interpretation of information and, when approaching an obstacle, for coming to a halt in time or correcting the path so as not to collide with the obstacle. The subjects were asked to consider time and contacts as being equivalent. Yet these variables lack any inherent value that would allow comparing a potential contact with loss of time, for example. Personal preference may also play a role in the weighting of the two variables: fear of collisions with objects (possibly overrepresented in sighted people due to unfamiliarity with the task) may lead to slower speeds. At the same time, the motivation to complete the task particularly quickly may lead to faster speeds but higher collision rates. Several subjects reported that towards the end of the study, they felt that they had learned the device to the point where contacts were completely avoidable for them given some focus. This attitude may have led to a bias in the data, which can be observed in the fact that time increased in the sighted group with both aids towards the end of the study while the collisions continued to fall. It seems to be difficult to solve this problem only by changing the formulation of the task and without expressing the concrete value of an obstacle contact in relation to time (e.g., a collision would add 10 s to the time or lead to the exclusion of this trial from the evaluation).
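The idea of expressing a contact in units of time can be illustrated with a small sketch. It was not part of the study; the function name is hypothetical and the 10 s penalty is merely the example value mentioned above.

```python
def penalised_time(time_s, contacts, penalty_s=10.0):
    """Combine completion time and contacts into one score in seconds.

    Each contact adds a fixed time penalty (hypothetical value: 10 s).
    """
    if time_s < 0 or contacts < 0:
        raise ValueError("time and contacts must be non-negative")
    return time_s + contacts * penalty_s

if __name__ == "__main__":
    fast_run = penalised_time(30.0, contacts=2)   # quick, but two contacts
    careful_run = penalised_time(45.0, contacts=0)  # slower, no contact
    print(fast_run, careful_run)
```

With such a rule, a fast run with two contacts would score worse than a run that is 15 s slower but free of contacts, which makes the trade-off explicit for the subjects.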

Usability & Acceptance
While there is an ongoing debate about how to interpret SUS and what value the scoring system has in the first place, the scores of the SSD (50) and the cane (53) are comparably low. Therefore they can be interpreted as being "OK" only and, in the context of all the systems tested with this score, they tend to be in the 20th or even 10th percentile [69]. It should be noted, however, that the score is rarely used to evaluate assistive devices at all, which may partly explain a generally poorer performance in those. The score is, however, suitable to "compare two versions of an application" [69]: the presented results therefore indicate that usability in the two tested aids does not fundamentally differ in the somewhat small experimental group. This equivalence can also be assessed from other questionnaire items, most of which show very few deviations: looking at the Likert scale averages of the entire study population (blind & sighted), the biggest deviations (∆ ≥ 0.5) can be observed in the expected recognisability of a "visual impairment because of the aid", which is stated to be much lower (Q15, ∆ = 2.0) with the SSD (Q15, 1.8) than with the cane (Q15, 3.8). Just as in the sighted group, the average of both groups stated that the cane was easier to use (Q3, ∆ = 0.8) and required less concentration at the beginning of the study (Q17, ∆ = 0.8), whereas this difference becomes negligible by the end of the study (Q18, ∆ = 0.3). Last but not least, the two aids differed in Q11 ("I had a lot of fun using the aid"), in which the SSD scored ∆ = 0.7 points better.
Exemplary statements have already been presented in the results section, summarised and classified into topics. They support the theses that the Unfolding Space Glove achieves its goal of being easier and quicker to learn than many other SSDs while providing users with a positive user experience. However, the sample size and the survey methods are not sufficient for more in-depth analyses. The presentation should rather serve the purpose of completeness and provide insights into how the learning process was perceived by the test persons in the course of the study.

Distal Attribution and Cognitive Processing
The phenomenon of distal attribution (sometimes called externalisation of the stimulus) describes, in simplified terms, the situation in which users of an SSD report that they no longer consciously perceive the stimulus at the application site on the body (here, e.g., the hand), but instead refer to the perceived objects in space (distal/outside the body). This can also be observed, for example, when sighted people describe visual stimuli and do not describe the perception of stimuli on their retina, but instead the things they see at their position in space. Distal attribution was first mentioned in early publications of Paul Bach-y-Rita [31,33], in which participants received 20-40 h of training (one individual even 150 h) with a TVSS, and has also been described in other publications, e.g., about AVSS devices [40]. Ever since, it has been discussed in multifaceted and sometimes even philosophical discourses and has been the topic of many experimental investigations [70][71][72][73]. As already described in the results section, the statements do not indicate the existence of this specific attribution. However, they do show a high degree of spatio-motor coupling of the stimuli and suggest the emergence of distal-like localisation patterns. The wearing time of only 2 h, however, was comparatively short, and studies with longer wearing times would be of great interest on this topic.

Compliance with the Criteria Set
In the introduction to this paper, 14 criteria were defined that are important for the successful development of an SSD (see also Appendix A for a description of those). Chebat et al. [55], who collected most of them, originally did so to show problems of known SSD proposals. In the design and development process of the Unfolding Space Glove, these criteria played a crucial role from the very start. Now, with the findings of this study in mind, it is time to examine to what degree it can meet the list of criteria by classifying six of its key aspects (with numbers of the respective criteria in parentheses): [74], proved to be a suitable site of stimulation with regard to several aspects: a fairly natural posture of the hand when using the device enables a discreet body posture, does not interfere with the overall aesthetic appearance (14) and preserves sensory and motor habits (12). The orientation of the sensor (6) on the back of the hand is hoped to be quite accurate, as we can use our hands for detailed motor actions and have a high proprioceptive precision in our elbow and shoulder [75]. Last but not least, the hand has a high motor potential (11) (rotation and movement in three axes), facilitating the sensorimotor coupling process.
• Thorough Product & Interaction Design. A good design does not only consist of the visible shell. Functionality, interaction design and product design must be considered holistically and profoundly, and in the end they pay off in many aspects apart from the aesthetic appearance (14) itself, such as almost all of the key aspects discussed in this section as well as user experience and joy of use (13).

Conclusions
The Unfolding Space Glove, a novel wearable haptic spatio-visual sensory substitution system, has been presented in this paper. The glove transforms three-dimensional depth images from a time of flight camera into vibrotactile stimuli on the back of the hand. Blind users can thus haptically explore the depth of the space surrounding them and obstacles contained therein by moving their hand. The device, in its somewhat limited functional scope, can already be used and tested without professional support and without the need of external hardware or specific premises. It already is highly portable and offers a continuous and very immediate feedback, while its design is unobtrusive and discreet.
In a study with eight blind and six sighted (but blindfolded) subjects, the device was tested and evaluated in obstacle courses. It could be shown that all subjects were able to learn the device and successfully complete the courses presented to them. Handling has low entry barriers and can be learned almost intuitively in a few minutes, with the learning progress between blind and sighted subjects being fairly comparable. However, at the end of the study and after about 2 h of wearing the device, the sighted subjects were significantly slower (by about 54%) in solving the courses with the glove compared to the white long cane they had used and trained with for the same amount of time.
The device meets many basic requirements that a novel SSD has to fulfil in order to be accepted by the target group. This is also reflected in the fact that the participants reported a level of user satisfaction and usability that is, despite its different functions and complexity, quite comparable to that of the white long cane.
The results in the proposed experimental set-up are promising and confirm that depth information presented to the tactile system can be cognitively processed and used to strategically solve navigation tasks. It remains open how much improvement could be achieved in another two or more hours of training with the Unfolding Space Glove. On the other hand, the results are of limited applicability to real-world navigation for blind people: too many basic requirements for a navigation aid system (e.g., detection of ground level objects) are not yet included in the functional spectrum of the device and would have to be implemented and tested in further research.

Supplementary Materials:
The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/s22051859/s1, S1: GitHub hardware release 0.2.1; S2: GitHub software release 0.2.2; S3: Study data; S4: Exemplary timetable; S5: Exemplary videos (low resolution); S6: GitHub monitor release 1.4. Code, hardware, documentation and building instructions are also available in the public repositories https://github.com/jakobkilian/unfolding-space (accessed on 26 January 2022). Consider Release v0.2.1 for the stable version used in this study and consider more recent commits in which the content has been revised for better accessibility. High resolution video clips of subjects completing trials can also be found at: https://vimeo.com/channels/unfoldingspace (accessed on 26 January 2022). For more information and updates on the project please also see https://www.unfoldingspace.org (accessed on 26 January 2022). All content of the project, including this paper, is licensed under the Creative Commons Attribution (CC-BY-4.0) licence (https://creativecommons.org/licenses/by/4.0/ (accessed on 26 January 2022)). The source code itself is under the MIT licence. Please refer to the LICENSE file in the root directory of the GitHub repository for detailed information.

Funding:
Funding to conduct the study was received from University of Tübingen (ZUK 63) as part of the German Excellence initiative from the Federal Ministry of Education and Research-Germany (BMBF). This work was done in an industry-on-campus-cooperation between the University of Tübingen and Carl Zeiss Vision International GmbH. The authors received no specific funding for this work. The funder did not have any role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. In addition, the prototype construction was funded, also by the BMBF, as part of the Kickstart@TH Cologne project of the StartUpLab@TH Cologne programme ("StartUpLab@TH Cologne", funding reference 13FH015SU8). We furthermore acknowledge support by Open Access Publishing Fund of University of Tübingen.

Institutional Review Board Statement:
The study was conducted according to the guidelines of the Declaration of Helsinki, and approved by the Ethics Committee of the faculty of Medicine at the University Hospital of the Eberhard-Karls-University Tübingen (project number 248/2021BO2, approved on 10 May 2021).

Informed Consent Statement:
Written informed consent has been obtained from the subjects to publish this paper. Individuals shown in photographs in this paper have explicitly consented to the publication of the photographs by signing an informed consent form.

Data Availability Statement:
The data presented in this study are available in Supplementary Materials S3 at: https://www.mdpi.com/article/10.3390/s22051859/s1.

Acknowledgments:
The authors would like to thank Kjell Wistoff for his active support in setting up, dismantling and rebuilding the study room, organising the documents and documenting the study photographically; trainer Regina Beschta for a free introductory O&M course and the loan of the study long cane; Tim Becker and Matthias Krauß from Press Every Key for their open ear when giving advice on software and hardware; the Köln International School of Design (TH Köln) and the responsible parties for making the premises available over this long period of time; pmdtechnologies ag for providing a Pico Flexx camera; Munitec GmbH for providing glove samples; Connor Shafran for proofreading this manuscript; and all those who provided guidance in the development of the prototype over the past years and now in the implementation and evaluation of the study. Last but not least, we thank the participants for volunteering for this purpose.

Conflicts of Interest:
We declare that Siegfried Wahl is a scientist at the University of Tübingen and an employee of Carl Zeiss Vision International GmbH, as detailed in the affiliations. There was no conflict of interest regarding this study. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Abbreviations
The following abbreviations are used in this manuscript:

Appendix A. Design Requirements for New SSDs
The following 14 aspects are crucial for the development and evaluation of new SSDs. Points 1 to 10 originate from Chebat et al. [55] while points 11-14 were added by the authors.

1. Learning: since mastering an SSD often requires many hours of training, many prospective users are discouraged. In some cases, the motor functions learned by Visually Impaired People (VIPs) to orient themselves contradict the functions of SSDs. For example, blind people keep their heads straight in Orientation & Mobility (O&M) training, while using an SSD with a head-mounted camera requires turning the head to get the best scan of the scene. The resulting conflict and the fear of losing established and functioning systems through training with SSDs therefore represent a major obstacle.

2. Training: most SSD vendors offer no or too little training material for end users. There is also no standardised test procedure that would allow comparisons between systems. A standardised obstacle course proposed by Nau et al. [63] to test low vision or artificial vision is a promising approach, but has, to our knowledge, not yet reached widespread use.

3. Latency: some systems, especially AVSS, suffer from high latency between changes in the input image (e.g., by shifting the camera's perspective) and the generated sensory feedback. The resulting low immediacy between the user's motor action and his/her sensory perception hampers the coupling process with the substituted modality.

4. Dissemination: information on available SSDs is simply not very widespread yet. Scientific publications on the topic are often difficult to obtain or not accessible in a barrier-free way for VIPs.

5. Cognitive load: with many systems, especially those with high resolution or bandwidth and many output actuators, the interpretation of the stimuli requires a high level of attention and concentration, which is then lacking for other areas of processing, orientation or navigation.

6. Orientation of the Sensor: VIPs often find it difficult to determine the real position of objects in the room based on the images perceived with the help of the actuators. This is because the assignment between input and output is not always clear, or it is not apparent which part of the scene they are looking at.

7. Spatial depth: for many locomotion navigation tasks, the most important aspect is spatial depth. Extracting this feature from the greyscale image of a complex scene is time-consuming and only possible with good contrast and illumination. If the 3D information is provided directly, there is, in theory, no need for this extraction.

8. Contrast: in traditional TVSS systems using conventional passive colour or B/W cameras, illumination and contrast of the scene matter. Using depth information from an actively illuminating 3D camera (described later on) eliminates this dependence.

9. Resolution: by downsampling the resolution of the visual information to fit the physiological limits of the stimulated modality, a lot of acuity and therefore crucial information can be lost. A zoom function is one way to counter this loss.

10. Costs: as the development of SSDs can take many years of research and development, commercially available devices are often expensive, which can deter potential users. Costs can be reduced by using widespread electronic components instead of specialised parts. Another way to reduce costs is to use existing devices like smartphones and their features like the camera, the battery and the computing power. One example of this is the AVSS "the vOICe" [40].

11. Motor Potential: as mentioned in the brain plasticity section, a crucial factor for the success of sensorimotor coupling in learning SSDs is movement itself. If the potential for movements with the sensor (hence active motor influence on the sensory input) is low, this enactive process might be hindered. Positioning the sensor or camera on head or trunk, as seen in some examples, influences this potential considerably, as it only has limited rotation and very low potential for movement. If the head is kept straight, as commonly practised in O&M training, this potential might be reduced even more.

12. Preservation of Sensory and Motor Habits: there is a wide range of possible restrictions of everyday sensory and motor habits that body-worn devices can cause. The overall weight of the device, its form factor and positioning on the body need to be carefully considered and tested, taking into account those habits and needs of VIPs. Blocking the auditory sensory channel by using headphones can, for example, hinder danger-recognition, being addressed and orienting by auditory information. As the hands play a key role in object recognition when feeling objects at close range, their mobility should also be ensured.

13. User Experience and Joy of Use: the stimuli should provide a pleasant experience, should not exceed pain thresholds and should not lead to overstraining, irritating or disturbing side effects, even with prolonged use. In order to get used to a new device, not only the purely functional benefit must be convincing. The use of the device should also be enjoyable and trigger a positive user experience. Otherwise, the basic acceptance could be reduced and the learning process could be hindered.

14. Aesthetic Appearance: even if functionality is in the foreground, VIPs in particular should not be offered devices that do not meet basic aesthetic standards. A device with a discreet design, which possibly fits stylistically and colour-wise into the outer appearance of the VIP, leads to confidence in it and greater pleasure in using it. At the same time it prevents unnecessary stigmatisation.

Appendix B. Selection of the Input System
ToF cameras have been around since the early 2000s [76], but prices were many times higher than today due to their exclusive use in the industrial segment [77][78][79]. A few years ago, ToF cameras were increasingly integrated as back cameras into smartphones for AR applications [80,81]; more recently, they (alongside similar 3D cameras) have also become important as front cameras for more secure biometric face recognition to unlock the device [82].
While there certainly has been work addressing three-dimensional (x, y, depth) input from the environment to aid navigation, the topic is not very prevalent in SS research: some work dealt only with sub-areas of 3D input [83][84][85] or described ETAs that transmit some kind of simplified depth information without being SSDs in the proper sense [86][87][88][89][90]. In addition, there is work that did propose SSDs using depth data, yet the authors often had problems implementing or testing the systems in practice because the technology was not yet advanced enough (too slow, heavy and/or expensive) [91][92][93][94][95][96][97]. In recent years, however, papers have been published (most of them using sound as an output) that actually proposed and tested applied 3D systems [97][98][99][100][101][102][103][104][105][106][107]. This includes the Sound of Vision device, which today probably ranks among the most advanced and sophisticated and has been evaluated in several studies. The project started with auditory feedback only, but now also uses body-worn haptic actuators [101,102,106,108].
Overall, only very few projects experimented with ToF cameras [89,93] or an array of ToF sensors [103] at all, and to the best of our knowledge there is no project that uses a low-cost and comparatively fast new-generation ToF camera for this, let alone implements it in a tactile/haptic SSD.
Due to its price, size, form factor, frame rate and resolution, the "Pico Flexx" ToF camera development kit ($389 today [79]) was chosen for the prototype (Figure A1).

Appendix C. Details on the Setup
Both the bracelet (usually used to attach a smartphone for sporting activities) and the 10 Ah power bank are commercially available components. The attachment interface was glued to the power bank to be compatible with the bracelet. In the current configuration, the battery lasts about 8 h in ordinary use, with most of the energy being consumed by the computing unit, which has the potential to be made more power efficient.
The glove consists of two layers of fabric (modified commercially available gloves, mainly made of polyamide), between which the motors are glued and the cables are sewn. Each motor is covered with a protective sleeve made of heat-shrink tubing to prevent damage to the soldered joints exposed to heavy stress.
Finally, the Unfolding Space Carrier Board is attached to the outside using velcro (removable and exchangeable): a printed circuit board specifically made for the project, carrying the drivers for controlling the motors, the ToF camera (attached via a USB 3.0 Micro B connector and an elastic band) and the computing unit, a Raspberry Pi Compute Module 4, connected via a 100-pin mezzanine connector. The CM4101016 configuration of the Compute Module (16 GB Flash, 1 GB RAM, Wifi) that is currently in operation has very little load on flash and RAM, so a cheaper version could also be used.

Appendix D. Design of the Actuator System
Once the input side was set up, a suitable interface had to be found to pass on the processed information to a sensory modality using the predefined medium of vibration.
Conventional eccentric rotating mass (ERM) actuators are the first choice for vibratory output in many projects; they are easy to handle and affordable, but not very responsive (rise time starting at 50 ms, usually even higher), noisy and not very durable (100-600 h lifetime) due to the wear of parts [109]. A series of self-tests with different ERM motors, in different arrangements and on different parts of the body confirmed this and also revealed that the ERMs quickly become uncomfortably hot in continuous operation.
Linear resonant actuators (LRAs) provided a remedy in the following tests: with a rise time of only 10 ms [110] and a longer lifetime (833 h tested, "thousands of hours" possible) [109,110], they are much better suited for this purpose; they furthermore consume less power at the same acceleration (which is important for mobile devices) and can apply a higher maximum acceleration [109,110], while in tests remaining cool enough for direct application to the skin.
To keep complexity low, a 3 × 3 LRA matrix proved to be a good set-up in tests. Figure A2 shows this structure on an early prototype. At this point, it should be mentioned that there is relatively recent research on the perceived differences between these two actuator types when attached to the skin [111,112], which to some extent contradicts these assumptions and instead suggests ERM motors for haptic interfaces. It remains to be seen what further research in this area will reveal.
Also note that a new generation of brushless direct current (BLDC) ERM motors is available that was not taken into account in this summary; these outperform classic ERM motors while being more expensive and less energy efficient [109].

Appendix E. Details on the Algorithm
Each depth image is first checked for reliability and then divided into 3 × 3 tiles, each of which is used to create a histogram. Starting from the beginning of the measuring range (~10 cm), these histograms are scanned for objects at each level of depth (0-255) from near to far. If the number of pixels within a range of five depth steps (~4 cm) exceeds the threshold, the algorithm saves the current distance value of this tile. If, however, the number remains below the threshold, it is assumed that there is no object within this image tile at this depth level or that there is only image noise. In this case, the algorithm increments the depth step by one and performs the threshold comparison again until it finally arrives at the furthest depth step. The nine values of the resulting 3 × 3 vibration pattern are finally passed on to the vibratory actuators as the amplitude. Each motor thus represents the object that is closest to the camera within the corresponding tile and hence to the hand of the person using the camera.
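The scanning procedure described above can be illustrated with a minimal Python sketch. It is not the project's C++ implementation: the function names, the pixel threshold, the choice to let the five-step window end at the current depth level, and the linear mapping from distance to amplitude (nearer means stronger) are all assumptions made for illustration, and the reliability check on the depth image is omitted.

```python
DEPTH_LEVELS = 256    # depth quantised to 0-255, as in the text
WINDOW = 5            # five depth steps (~4 cm) are pooled per comparison
PIXEL_THRESHOLD = 20  # hypothetical pixel count above which an object is assumed


def nearest_object_level(tile_depths, window=WINDOW, threshold=PIXEL_THRESHOLD):
    """Scan one tile from near to far; return the first depth level whose
    window holds more than `threshold` pixels, or None if no level does."""
    histogram = [0] * DEPTH_LEVELS
    for depth in tile_depths:
        histogram[depth] += 1
    for level in range(DEPTH_LEVELS):
        # Assumption: the window covers the `window` steps ending at `level`.
        start = max(0, level - window + 1)
        if sum(histogram[start:level + 1]) > threshold:
            return level
    return None


def level_to_amplitude(level):
    """Hypothetical linear mapping: nearer objects give stronger vibration."""
    return 0 if level is None else (DEPTH_LEVELS - 1) - level


def vibration_pattern(depth_image, rows=3, cols=3,
                      window=WINDOW, threshold=PIXEL_THRESHOLD):
    """Split the depth image (list of rows of ints 0-255) into rows x cols
    tiles and return one amplitude per tile."""
    height, width = len(depth_image), len(depth_image[0])
    pattern = []
    for r in range(rows):
        pattern_row = []
        for c in range(cols):
            tile = [depth_image[y][x]
                    for y in range(r * height // rows, (r + 1) * height // rows)
                    for x in range(c * width // cols, (c + 1) * width // cols)]
            level = nearest_object_level(tile, window, threshold)
            pattern_row.append(level_to_amplitude(level))
        pattern.append(pattern_row)
    return pattern


if __name__ == "__main__":
    # A far wall at level 200 with a near object at level 40 in the top-left tile.
    image = [[200] * 6 for _ in range(6)]
    for y in range(2):
        for x in range(2):
            image[y][x] = 40
    print(vibration_pattern(image, threshold=3))
    # [[215, 55, 55], [55, 55, 55], [55, 55, 55]]
```

In this toy input only the top-left motor would vibrate strongly, while the other eight tiles report the distant wall. A tile that contains only scattered noise pixels never exceeds the threshold and therefore yields an amplitude of zero.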
In Table A1 one can find a summary of the translated modalities. For better illustration, the three-dimensional extension of the field of view of a ToF camera is described as a frustum (Figure A3).

Visual Information | Translation into Haptic Stimuli
x-axis of the frustum (horizontal extension) | x-axis of the motor matrix in 3 levels
y-axis of the frustum (vertical extension) | y-axis of the motor matrix in 3 levels
z-axis of the frustum (distance to the camera) | amplitude/vibration strength

To be as platform-independent as possible, a monitoring tool was developed in the Unity 3D game development environment. It receives data from the glove via the UDP protocol, as long as both are connected to the same (Wifi) network, and displays them visually. This includes the depth image and motor values in real time as well as various technical data such as processing speed, temperature of the Raspberry Pi core and others. It can also be used to control the glove, switch it off temporarily, or test the motors individually. The files for the Raspberry Pi Code (C++), the monitoring app (Unity 3D, C#) and respective documentation are open source and available on https://github.com/jakobkilian/unfolding-space (accessed on 26 January 2022). Consider Release v0.2.1 for the stable version used in this study and consider more recent commits in which the content has been revised for better accessibility.

Appendix F. Data on Subjects
The following table shows a summary of the subject data from the Supplementary Materials S3. Beyond the information given, none of the subjects used smart canes, none used a guide dog and none had experience with another SSD.

Appendix G. Testing Assumptions for Parametric Tests
The model was fitted according to the top-down procedure by Zuur et al. [113], starting from the full model below, which includes all variables that could be of reasonable relevance, and removing step by step those found to be non-significant.
logTime ∼ Group * Aid * TS + Order + (TS|Group : Subject) + (1|Layout)
Before the model was reduced, the following assumptions (Figures A4-A6) were tested in order to be able to use a linear mixed effects model in the first place and later apply parametric tests to it [67]:

Homoscedasticity of Residuals
Figure A6. Graphical representations of variance between different variables to show homoscedasticity/homogeneity of variance: (A) group and aid, (B) subjects and (C) layouts. Furthermore, the residuals vs. fitted scatterplot in (D) allows checking for non-linearity, unequal error variances and outliers. Even though the variances of the error terms seem to be slightly left-slanted, the data can be seen as reasonably homoscedastic and homogeneous.
With these assumptions met, all variables were next tested individually for their effect size on the model [113]. The Akaike Information Criterion (AIC), which is reported with Maximum Likelihood Estimates (MLE), was used to identify models that exclude certain variables and therefore show a better fit. Those candidates were then checked in direct comparisons with a corresponding null model using the Chi-Square p-value. First, the order effect (which aid was tested first in a TS) could be excluded from the full model (p = 0.886). Layout was found to be non-significant as well (p = 0.144); however, it remains in the model because of reasonable concern about differences in difficulty. Subjects being hierarchically nested within group (due to the unique assignment to one of the two groups) has a significant effect on the model (p < 0.001), as do group itself (p < 0.001), aid (p < 0.001) and TS (p < 0.001).
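As an illustration of this kind of model comparison, the following Python sketch computes the AIC of two nested models and the Chi-Square p-value of the corresponding likelihood-ratio test. The log-likelihoods and parameter counts are invented for the example and are not taken from the study; the function names are hypothetical, and the closed-form p-value only holds under the assumption that the two models differ by exactly one parameter (one degree of freedom).

```python
import math


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln(L); lower values indicate a better fit."""
    return 2 * n_params - 2 * log_likelihood


def likelihood_ratio_test_df1(loglik_full, loglik_reduced):
    """Likelihood-ratio test for nested models that differ by one parameter.

    Returns the test statistic and its p-value. For a chi-square distribution
    with one degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    """
    statistic = max(2.0 * (loglik_full - loglik_reduced), 0.0)
    return statistic, math.erfc(math.sqrt(statistic / 2.0))


if __name__ == "__main__":
    # Invented values: dropping one term barely changes the log-likelihood.
    loglik_full, k_full = -120.00, 12
    loglik_reduced, k_reduced = -120.01, 11

    print("AIC full model:   ", aic(loglik_full, k_full))
    print("AIC reduced model:", aic(loglik_reduced, k_reduced))
    statistic, p_value = likelihood_ratio_test_df1(loglik_full, loglik_reduced)
    print("LRT statistic:", statistic, "p-value:", p_value)
```

With these invented numbers the reduced model has the lower AIC and the test is far from significant, which is the pattern that would justify dropping a term such as order.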
Lastly, several interactions can be observed (Figure A7) in the data (e.g., the blind group already having experience with the cane). The interaction between all three fixed factors (group, aid and TS) has to be included (p < 0.001) as well as between TS and aid (p < 0.001), between group and aid (p < 0.001) and also between aid and TS (p = 0.007). With order being dropped and the other variables as well as the interactions being kept, the final model was this:
logTime ∼ Group * Aid * TS + (TS|Group : Subject) + (1|Layout)

Appendix H. Qualitative Statements
In the following appendix, the statements made by the subjects (verbally or in the final questionnaire) are listed in the form of tables, classified by general topics on which the statements are based. All these topics are referred to in the results section as well.

 | "I'm more occupied with the glove." | i | 1
3 | "The glove needs a training phase." | q | 1
4 | "With the glove I need even more time. The cane is easy to handle, but the glove takes longer." | u | 1
5 | "You could not yet talk, think or do anything else while using the SSD." | c | 3
6 | "Talking at the same time [when using the SSD] would still be too exhausting." | h | 4
7 | "You experience the size and distance of objects a bit like touching them. With the cane, on the contrary, I don't imagine myself touching the object." | h | 4
8 | "I could imagine that with more training it feels like a fabric that gets thicker and thicker the deeper you go. Then you just take the path of least resistance. I even made involuntary movements with my hand at the end of the study."

 | "The cane is more like hand-to-hand combat." | c | 1
7 | "The cane makes you feel clumsy. It's annoying that you make a loud noise with it" | u | 1

 | [Talking about the glove:] "That was fun!" | i | 3
3 | "I could imagine running only with the glove in certain situations. The glove is fun. You can really immerse yourself in it." | v | 4

Table A7. Abbreviations used in this table: ID = statement identifier; Subj = subjects; TS = Trial Session.

ID | Statements on Topic 5: Feeling Safe and Comfortable | Subj | TS
1 | "With the cane I'm not nervous, I walked fast, I'm very confident. In the last two runs with the glove I also felt more confident." | t | 3
2 | "Cane was always safe, glove got better towards the end and I was less afraid." | u | 1
3 | "With the cane I already felt comfortable the last time. Can run faster with stick without anything happening. More unsafe with glove." | i | 3

Table A8. Abbreviations used in this table: ID = statement identifier; Subj = subjects; TS = Trial Session.

ID | Statements on Topic 6: Distal Attribution | Subj | TS
1 | "I imagine the object and run my hand along it with the SSD to feel its corners, edges, shape and to know where it ends." | t | 4
2 | "Got more of a feeling of really seeing [With the SSD]." | j | 4
3 | "[With the SSD] I was able to estimate well how far away the objects were. Partially, a spatial idea of the space in front of me was also possible."