A Bibliometric Narrative Review on Modern Navigation Aids for People with Visual Impairment

Abstract: The innovations in the field of specialized navigation systems have become prominent research topics. As an applied science for people with special needs, navigation aids for the visually impaired are key sociotechnical tools that help users independently navigate and access needed resources indoors and outdoors. This paper adopts the informetric analysis method to assess the current research and explore trends in navigation systems for the visually impaired based on bibliographic records retrieved from the Web of Science Core Collection (WoSCC). A total of 528 relevant publications from 2010 to 2020 were analyzed. This work answers the following questions: What are the publication characteristics and most influential publication sources? Who are the most active and influential authors? What are their research interests and primary contributions to society? What are the featured key studies in the field? What are the most popular topics and research trends, described by keywords? Additionally, we closely investigate renowned works that use different multisensor fusion methods, which are believed to be the bases of upcoming research. The key findings of this work aim to help upcoming researchers quickly move into the field, as they can easily grasp the frontiers and the trends of R&D in the research area. Moreover, we suggest that researchers embrace smartphone-based agile development, and pay more attention to phone-based prominent frameworks such as ARCore or ARKit, to achieve fast prototyping of their proposed systems. This study also provides references for associated fellows by highlighting the critical junctures of modern assistive travel aids for people with visual impairments.


Introduction
Visual impairment refers to the congenital or acquired impairment of visual function, resulting in decreased visual acuity or an impaired visual field. According to the World Health Organization, approximately 188.5 million people worldwide suffer from mild visual impairment, 217 million from moderate to severe visual impairment, and 36 million people are blind, with the number estimated to reach 114.6 million by 2050 [1]. In daily life, it is challenging for people with visual impairments (PVI) to travel, especially in places they are not familiar with. Although there have been remarkable efforts worldwide toward barrier-free infrastructure construction and ubiquitous services, people with visual impairments have to rely on their own relatives or personal travel aids to navigate, in most cases. In the post-epidemic era, independent living and independent travel elevate in importance since people have to maintain a social distance from each other. Thus, there has been consistent research conducted that concentrates on coupling technology and tools with a human-centric design to extend the guidance capabilities of navigation aids.
CiteSpace is a graphical user interface (GUI) bibliometric analysis tool developed by Chen [2], which has been widely adopted to analyze co-occurrence networks. The data for bibliometric analysis were collected from the subset of Clarivate Analytics' Web of Science Core Collection, which includes entries indexed by SCI-EXPANDED, SSCI, A&HCI, CPCI-SSH, CPCI-S, and ESCI. The data retrieval strategy was as follows: TI = (((travel OR mobility OR navigation OR guidance OR guiding OR walking support OR mobile OR wayfinding) AND (system OR aid OR assist OR assistive OR assistant OR assistance OR aid OR aids OR prototype OR device)) AND (blind OR visually impair OR visually challenge OR visual impair OR visual challenge OR visually impaired OR visual impairment OR visual impairments OR vision impaired OR blindness)), where the time span = 2010-2020 (retrieved 23 February 2021). A total of 550 references were obtained.
The key parameters were applied as year per slice (1), term source (all selections), node type (choose one at a time), selection criteria (k = 25), and visualization (cluster view-static, show merged network). Publications on signal processing and control theory were excluded. Note that for most of the research works on travel aids for visually impaired people, the essential keywords always appear in the titles. Thus, the bibliometric search was carried out on the title (TI) rather than the topic (TS).
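The boolean retrieval strategy above can also be assembled programmatically, which makes it easier to audit or extend the term lists. A minimal sketch follows; the term groups mirror the query quoted above, while the helper functions are our own illustration and not part of any Web of Science API.

```python
# Build the Web of Science title (TI) query used for data retrieval.
# Term lists follow the retrieval strategy described in the text.

ACTIONS = ["travel", "mobility", "navigation", "guidance", "guiding",
           "walking support", "mobile", "wayfinding"]
CARRIERS = ["system", "aid", "assist", "assistive", "assistant",
            "assistance", "aids", "prototype", "device"]
POPULATION = ["blind", "visually impair", "visually challenge",
              "visual impair", "visual challenge", "visually impaired",
              "visual impairment", "visual impairments",
              "vision impaired", "blindness"]

def or_group(terms):
    """Join terms into a parenthesized OR group."""
    return "(" + " OR ".join(terms) + ")"

def build_ti_query():
    """Compose the full TI= query string from the three term groups."""
    return ("TI=((" + or_group(ACTIONS) + " AND " + or_group(CARRIERS)
            + ") AND " + or_group(POPULATION) + ")")

print(build_ti_query())
```

Keeping each concept group in its own list makes it straightforward to rerun the search with added synonyms and compare the resulting record counts.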

Bibliometric Analysis of Publication Outputs
The total number of publications increased over the period studied but with some fluctuations. As shown in Figure 1, the period studied can be divided into two stages: the first stage from 2010 to 2015 and the second stage from 2016 to 2020. The first stage was a rapid development period; publication output increased from 20 in 2010 to 63 in 2015. One of the most important reasons was the boom of modern smartphones, led by Apple's iPhone and Google's Android phones after 2008. This facilitated the development of ubiquitous and affordable travel aid solutions for people with visual impairment, a trend revealed in the international telecom market and in the later analysis in this work [7]. Publication output then developed steadily in the second stage.
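The pace of the first stage can be made concrete with a quick compound-growth calculation over the counts reported above (20 papers in 2010, 63 in 2015); the helper below is a generic sketch, not taken from the paper.

```python
# Compound annual growth rate of publication output in the first
# stage, using the counts reported above (20 in 2010, 63 in 2015).

def cagr(start, end, years):
    """Average annual growth rate over the given number of year-steps."""
    return (end / start) ** (1.0 / years) - 1.0

rate = cagr(20, 63, 5)  # 2010 -> 2015 spans five year-steps
print(f"{rate:.1%}")    # roughly 26% per year
```

A sustained growth rate of this magnitude is consistent with the "rapid development period" characterization of the first stage.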

Most Influential Journals by Co-Citation Journals Map
To further analyze the most popular journals in the development of PVI travel aids, a co-cited journals (proceedings) knowledge map that highlights influential journals was generated. Generating a co-citation journal map resulted in 419 nodes and 2442 links. From Figure 2, the top 10 co-cited journals were Lecture Notes in Computer Science (0.11), Sensors-Basel (0.05), Journal of Visual Impairment & Blindness (0.14), IEEE Transactions on Systems, Man, and Cybernetics-Part C (0.06), IEEE International Conference on Robotics and Automation (0.05), Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (0.07), IEEE Transactions on Biomedical Engineering (0.07), IEEE Transactions on Pattern Analysis and Machine Intelligence (0.05), Foundations of Orientation and Mobility (0.12) and THESIS (0.04). The journals Lecture Notes in Computer Science, Journal of Visual Impairment & Blindness and Foundations of Orientation and Mobility have larger node sizes and higher centrality, at 0.14, 0.15, and 0.12, respectively. Therefore, we suggest that these three journals have high influence in the field of navigation assistance for PVI. In addition, according to the primary focus of the abovementioned journals, it is obvious that travel aids for PVI possess interdisciplinary characteristics spanning both social and technological aspects.

Most Active and Influential Authors by Co-Authorship and Co-Citationship
Active authors are those who published more documents in the given period. As shown in Figure 3, a co-author map composed of 316 nodes and 197 links was derived. The size of a node indicates the number of articles published by the author, and the length of a link between two nodes is inversely proportional to the collaboration frequency of the author pair. From the co-author map, we observed that the connections between the most active authors are not intensive. The 10 most active authors were Dragan Ahmetovic, Chieko Asakawa, Akihiro Yamashita, Joao Guerreiro, Edwige Pissaloux, Katsushi Matsubayashi, Cang Ye, Bogdan Mocanu, Kei Sato, and Xiaochen Zhang.
According to the co-authorship map, the most active author in the field is Dragan Ahmetovic, who has been working on computer vision and human-machine interaction for assistive systems. As a leading author, he worked closely with Chieko Asakawa and Joao Guerreiro when he was a research fellow at Carnegie Mellon University. NavCog, the system he proposed, is a smartphone-based navigation system providing instructions for PVI both indoors and outdoors. It uses Bluetooth low energy (BLE) beacons to collect scene knowledge and instructs users verbally. A considerable number of experimental studies have been conducted in airports [8], campuses [9], and other venues. It is worth mentioning that its later version also displays guidance clues on the phone screen to enhance the instructions for users with residual vision [9,10]. Akihiro Yamashita, who worked closely with Katsushi Matsubayashi and Kei Sato, built a navigation system that uses radio frequency identification (RFID) devices and the quasi-zenith satellite system (QZSS) to reinforce localization. The system uses Microsoft HoloLens to learn the geometric layout of the surroundings, which is necessary to plan feasible paths [11][12][13]. Edwige Pissaloux recently used a framework based on deep convolutional neural networks (Deep CNN) to detect indoor targets [14,15], an essential component for intelligent assistive systems. Cang Ye focused on guiding robots [16,17], while Bogdan Mocanu mainly studied mobile facial recognition, which is supposed to support assistive systems [18,19]. The ANSVIP proposed by Xiaochen Zhang uses ARCore-based simultaneous localization and mapping (SLAM) to localize users when GPS is not available, and haptic feedback is designed to ensure instructions are understood by the users [20].
Author co-citations reveal the influential researchers who are leading the hot topics in research, as researchers are likely to cite their works while working on relevant studies. The author co-citation map shown in Figure 4 is composed of 409 nodes and 2365 links. Each node represents one author, and the node size denotes the corresponding citation counts. For better visualization of the graph, we adjusted the layout of the nodes and removed the anonymous nodes; thus, the lengths of links no longer indicate relationships between author nodes. According to the co-citation map, the top 10 influential authors included Dimitrios Dakopoulos (0.14) and Jack M. Loomis. Specifically, Jack M. Loomis specializes in spatial cognition, vision-free navigation, and augmented reality for assistive technology. He proposed a guided eye system using GPS, GIS, and virtual acoustics components. As the PVI with the guided eye system moves, illustrative speech is generated by a voice synthesizer. Furthermore, the subject is supposed to be capable of identifying the orientation of each described element, since the environmental information is conveyed by stereo or spatialized sound [21].

Key References by Co-Citation
Analyzing the citation frequency and centrality of co-cited references helps to clarify the key literature of state-of-the-art works in the field. As shown in Figure 5, the reference co-citation map has 455 nodes and 1518 links. Table 1 shows the details of the top 10 co-cited references ranked by citation frequency. In terms of centrality, the key works are by Dakopoulos (2010), Guerrero (2012), Pradeep (2010), and Rodriguez (2012). Dakopoulos categorized indoor navigation systems into audio feedback, tactile feedback, and without-interface categories according to their human-computer interaction, and evaluated a sample of each category based on its structure and operating specifications. He claimed that the human-computer interaction of travel aids should embrace features such as hands-free, non-auditory, and wearable operation [22]. The system proposed by Guerrero was composed of infrared cameras, infrared lights, a computer, and a smartphone. When the infrared camera, installed indoors, scans the infrared lights embedded in the cane, the vision system identifies the user's position with respect to the environmental coordinate system and guides the subject accordingly via voice messages cast by the smartphone [23].
Other key work by Pradeep introduced a polynomial system utilizing trifocal tensor geometry and quaternion representation of rotation matrices, from which camera motion parameters can be extracted in the presence of noise [24]. Rodriguez used stereovision to achieve obstacle detection. It is worth noting that RANSAC and bone conduction technology were applied to enhance the localization optimization and the user experience of human-machine interaction [25].

Primary Topics and Research Hotspots by Keyword Co-Occurrence
The keyword co-occurrence knowledge map reflects the primary topics and hotspots over time [31]. By setting the term source to author keywords/descriptors (DE), a keyword co-occurrence map was generated with 364 nodes and 682 links, where each node represents one keyword and its size indicates the corresponding co-occurrence frequency, as shown in Figure 6. Table 2 lists the top 20 keywords in terms of co-occurrence frequency and centrality. As shown in the table, the most frequent keywords were 'visually impaired', 'blind', 'assistive technology', 'navigation', 'ultrasonic sensor', 'accessibility', 'indoor navigation', 'obstacle detection', 'navigation system', 'blind navigation', 'computer vision', 'mobility', 'gps' (shown as 'gp' in Figure 6), 'mobile device', 'image processing', 'electronic travel aid', 'navigation aid', 'android', 'mobile application', and 'assistive device'. The element 'visually impaired' consisted of eight similar variants: 'visually impaired', 'visual impairment', 'visually impaired people', 'visualimpairmen', 'blind and visually impaired', 'blind and visually impaired', 'visual impaired', and 'visually impaired people (vip)'. The terms 'blind', 'blindness', 'blind people', and 'blind user' together formed the element 'blind'. Moreover, 'Assistive technology' and 'assistive technology' were combined as 'assistive technology'; 'Computer vision' and 'computervision' were combined as 'computer vision'; and 'ultrasonicsensor' and 'ultrasonic sensor' were combined as 'ultrasonic sensor'. We categorize the top 20 keywords into four topics, namely user, goal, requirement, and technology, as shown in Table 2. User refers to the targeted research population, goal emphasizes the outputs, requirement indicates the user needs supposed to be met during the process, and technology represents the specific scientific and engineering means used to achieve the goal.
According to the co-occurrence of keywords, it is evident that researchers have been taking advantage of interdisciplinary technologies to empower navigation aids for PVI, but there has been less work that considers socio-technical issues such as human-machine cooperation theory, usability engineering, feasibility analysis, and user experience.
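The keyword-merging step described above, folding case variants and near-duplicates into one canonical term before counting frequencies, can be sketched as follows. The alias table below reproduces only a subset of the merges reported in the text; a real pipeline would cover all variants.

```python
# Normalize raw author keywords into canonical forms before counting
# co-occurrence frequencies. The alias map mirrors a few of the
# merges described above (e.g. 'computervision' -> 'computer vision').
from collections import Counter

ALIASES = {
    "visual impairment": "visually impaired",
    "visually impaired people": "visually impaired",
    "visual impaired": "visually impaired",
    "visually impaired people (vip)": "visually impaired",
    "blindness": "blind",
    "blind people": "blind",
    "blind user": "blind",
    "computervision": "computer vision",
    "ultrasonicsensor": "ultrasonic sensor",
}

def canonical(keyword):
    """Lowercase, trim, and fold known variants into one term."""
    k = keyword.strip().lower()
    return ALIASES.get(k, k)

raw = ["Visually Impaired", "visual impairment", "Blind people",
       "computervision", "Assistive technology", "assistive technology"]
counts = Counter(canonical(k) for k in raw)
print(counts.most_common())
```

Lowercasing alone already merges pairs such as 'Assistive technology'/'assistive technology'; the explicit alias table handles the remaining spelling variants.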
The timezone map of the keywords requirement and technology is shown in Figure 7. Comprehensive analysis of Table 2 and Figure 7 helps to identify the research hotspots in navigation assistance for PVI over the past decade. In terms of technology, 'assistive technology' can be regarded as a general term, so the keyword appeared more frequently and stands as a root of the map in Figure 7. Subsequently, specific technologies emerged for services with explicit demands, such as ultrasonic sensors [32], infrared sensors [33], computer vision [34], ultrawideband (UWB) [35], and GPS [36] to collect environmental information; image processing [37], convolutional neural networks (CNN) [38], and deep learning [14] to achieve advanced intelligence; and stereo audio [39], audio-tactile maps [40], and haptic feedback [41] to bridge humans and machines. From the perspective of requirement, 'accessibility', 'mobility', and 'wearable' are among the most popular keywords. The specific requirements of the system include: (1) detecting, recognizing, and avoiding objects [42,43]; (2) situational awareness [44]; (3) smartphone-based [45] mobile applications [46]; and (4) audio [47] and haptic feedback [48].
Furthermore, it can be seen in Figure 7 that the research hotspots in the last three years are CNN, assistive wearable devices, facial recognition in video streaming, infrared sensors, object recognition, and deep learning. To summarize, works with artificial intelligence such as neural networks and deep learning have emerged as hot topics in navigation aids for PVI. The timezone map presents the development process over a period, thereby providing insights to predict the future trend of the studied research field. Specifically, for scene understanding during the period 2010 to 2015, systems mainly used Bluetooth, RFID, and ultrasonic sensors to obtain environmental data. After 2015, computer-vision-based approaches became more popular for situational awareness. Furthermore, with respect to functionalities, the primary focus shifted from simple obstacle avoidance to more complicated mixed functionalities borrowed from autonomous mobile robotics to achieve better navigation. Concerning hardware, since 2008, mobile phones have gradually become viable options, and the flexibility and compatibility of smartphone apps have injected vitality into the research community. As a result, the development of smartphone-based approaches increased remarkably over time, and its growth rate peaked by 2015. Smartphone-based approaches remain thriving and stable. This also explains why the publications studied are divided into two stages. According to Figure 7, more artificial intelligence approaches such as deep learning and neural networks appear as later nodes.
Therefore, we propose that the trend towards autonomy and intelligence is unstoppable, while the primary device chosen as the carrier of the navigation aids would still be smartphones. On the basis of ensuring safe roaming, new methodologies such as vision-based cognitive systems [49] and social semantics [50] are gaining more attention to comprehensively improve the quality of subjects' daily life.
According to the bibliometric study, we have answered the following questions: What are the most influential publication sources? Who are the most active and influential authors? What are their research interests and primary contributions to society? What are the featured key studies in the field? What are the most popular topics and research trends, described by keywords? However, bibliometric analysis has its shortcomings, since the results do not provide sufficiently detailed insights, for two reasons. First, the conclusions lack applicability, as they do not take a closer look at sample milestone works. Second, the method is weak at identifying pioneering works using epoch-making approaches, as the influence of citations and hotspots requires more time to emerge. Thus, we conducted a narrative study that facilitates further examination of works with unique multisensor combinations or representative multimodal interaction mechanisms.

Narrative Study of Navigation Aids for PVI
To further investigate the state-of-the-art research and development of navigation aids and to look closely into distinct architectures with multisensory patterns, 362 articles describing work toward reasonable assistive systems or subsystems were taken from the 550 studies in the literature. Among these, we list 19 remarkable works in a chronological table (Table 3) to better illustrate auxiliary solutions in PVI navigation aids. In the later narrative study, we focus on three pivotal aspects: hardware composition, scientific focus, and validation methods.

Year  Title of Reference
2012  NAVIG: augmented reality guidance system for the visually impaired [51]
2012  An indoor navigation system for the visually impaired [23]
2013  Multichannel ultrasonic range finder for blind people navigation [52]
2013  New indoor navigation system for visually impaired people using visible light communication [53]
2013  Blind navigation assistance for visually impaired based on local depth hypothesis from a single image [54]
2013  A system-prototype representing 3D space via alternative-sensing for visually impaired navigation [55]
2014  Navigation assistance for the visually impaired using RGB-D sensor with range expansion [56]
2015  Design, implementation and evaluation of an indoor navigation system for visually impaired people [57]
2015  An assistive navigation framework for the visually impaired [58]
2016  NavCog: turn-by-turn smartphone navigation assistant for people with visual impairments or blindness [10]
2016  ISANA: wearable context-aware indoor assistive navigation with obstacle avoidance for the blind [59]
2018  PERCEPT navigation for visually impaired in large transportation hubs [60]
2018  Safe local navigation for visually impaired users with a time-of-flight and haptic feedback device [61]
2019  An astute assistive device for mobility and object recognition for visually impaired people [62]
2019  Wearable travel aid for environment perception and navigation of visually impaired people [63]
2019  An ARCore based user centric assistive navigation system for visually impaired people [20]
2020  Integrating wearable haptics and obstacle avoidance for the visually impaired in indoor navigation: A user-centered approach [64]
2020  ASSIST: Evaluating the usability and performance of an indoor navigation assistant for blind and visually impaired people [65]
2020  V-eye: A vision-based navigation system for the visually impaired [66]

Hardware Composition
Early electronic aid tools used plain perceptive sensors, such as ultrasonic, infrared, or laser sensors, as the key to detecting obstacles. Subsequently, with the recent deep integration of Bluetooth, GPS, and vision sensors into navigation systems, the types of hardware equipment have become diversified, although the demands on the processing kernel have not changed. Thus, we categorize these works into three families: (1) computer as core, a system that applies computers in the process of navigation; (2) phone as core, a system with smartphones as the main carrier of information collection or information processing; and (3) standalone, a system using specific or ubiquitous chips to process and control the multisensor collaboration. Each equipment type contains two parts: 'with Infrastructure' means that pre-built physical facilities, such as RFID or WIFI beacons, exist at the site or in the smart city [67], whereas 'Alone' denotes otherwise. Figure 8 shows statistics relating to the hardware equipment types of navigation systems. In terms of numbers, standalones are the most popular; in these systems, distance sensors are widely used to achieve object detection and avoidance [62] with relatively simple functionalities. For example, Wojciech Gelmuda and Andrzej Kos used a multichannel ultrasonic approach to provide high-level safety and obstacle avoidance while achieving minimum size and cost [52]. Because safety is the primary concern for PVI, and price is the fundamental barrier to societal acceptance, the work described in [52] enlightens innovations regarding possible balances between the two factors. Some systems combine RFID tags to achieve environmental information collection and positioning [68]. Because smartphones contain numerous sensors, environmental information can be provided via BLE beacons, NFC tags, etc. [10,69,70], and scene data can also be acquired and processed through visual sensors [63]. Smartphones are compact and popular among PVI.
Navigation aids with smartphones as the core account for more than one-third of the total, with similar shares in the 'with Infrastructure' and 'Alone' classifications. Computer-based systems mainly use external vision sensors to collect the required data [56,64] and rarely use pre-set infrastructure. Due to the enhancement of the computing power, portability, durability, and affordability of smartphones, systems with smartphones as the core have shown a trend toward replacing computers in these tasks. Thus, we hypothesize that smartphone-based navigation aids will gain more favor and interest from the research community and, in the near future, will account for the greatest proportion of navigation systems.
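The classification scheme above amounts to a two-way tally over core type and infrastructure reliance. The sketch below illustrates the scheme only; the system records are placeholders, not the paper's actual counts.

```python
# Two-way tally of navigation-aid systems by processing core
# (computer / phone / standalone) and by reliance on pre-built
# infrastructure. Records are placeholders for illustration.
from collections import Counter

CORES = ("computer", "phone", "standalone")

systems = [
    {"name": "system A", "core": "phone", "infrastructure": True},
    {"name": "system B", "core": "standalone", "infrastructure": False},
    {"name": "system C", "core": "computer", "infrastructure": False},
    {"name": "system D", "core": "phone", "infrastructure": False},
]

tally = Counter((s["core"], s["infrastructure"]) for s in systems)
for core in CORES:
    print(f"{core:<10} with Infrastructure: {tally[(core, True)]}  "
          f"Alone: {tally[(core, False)]}")
```

Applied to the full set of 362 surveyed systems, such a tally yields the distribution summarized in Figure 8.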

Primary Scientific Focus
In this section, the abovementioned works are divided into three categories based on their primary focus: path planning, behavior and motion planning, and action control. Path planning denotes the generation of a reliable path for PVI pedestrians, in most cases casting the output as waypoints; behavior and motion planning indicates a more dedicated, machine-friendly planner that practically converts the waypoint-based path into behaviors and an executable track or trajectory; action control aims to guarantee that the reference trajectory can be followed, or the equivalent goal reached, through human-computer interaction [71]. To closely examine these tasks, 10 representative navigation aid solutions were chosen for illustration. For each solution, we present a detailed decomposition of the multisensor collaborations and hardware-supported multimodal interactions.

Path Planning
As a niche approach, Nakajima et al. proposed a navigation system that uses visible light communication technology and geomagnetic correction methods to help PVI travel indoors [53,72]. As shown in Figure 9, the system consists of a series of LED lights with visible light IDs and a smartphone that integrates a Bluetooth receiver and a headset. To realize positioning and navigation functionalities, the system needs to obtain the user's location information and indoor map data, wherein the location information can be directly embedded into the visible light ID system. In operation, after the user verbally instructs the system, the smartphone first receives the visible light ID through Bluetooth and obtains the geographical coordinates of the current position; then, it obtains the egocentric orientational information through the geomagnetic sensor. At the same time, the geographic coordinates of the next LED light are obtained accordingly, and the walking direction and distance are calculated. Finally, the location information and guidance content are fused as input to the voice synthesizer and delivered to the user via the headset.

ISANA [59] is a remarkable mobile wearable indoor PVI navigation system that emphasizes situation awareness. As shown in Figure 10, the system includes an indoor map editor and a mobile application on a Tango device, a smartphone with an RGB-depth camera. First, the indoor map editor processes the spatial semantic information by building the architectural model and semantic map. Second, the system uses the Tango-based area learning algorithm for positioning on the semantic map. Combined with the depth sensor to detect the obstacles ahead and the self-tracking function in Tango, the system automatically generates a smooth path to the destination. Finally, based on the priority of the collected information, ISANA uses speech-audio interaction to allow the user to input commands and to provide guidance and alert prompts, reducing the user's cognitive load. Bing Li et al. further optimized this system [73] by allowing ISANA to detect dynamic obstacles in real time and make timely path planning adjustments, thus improving safety during operation. In addition, building on the audio feedback, an electronic intelligent walking stick that provides tactile feedback was developed, thereby forming a multimodal human-machine interaction with speech, audio, and touch.
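The visible-light scheme described earlier derives the walking direction and distance from the coordinates of the current and next LED plus the geomagnetic heading. A minimal sketch of such a waypoint computation, assuming an equirectangular approximation (adequate at indoor scales; the original paper's exact formulas are not reproduced in this review):

```python
import math

def walk_instruction(cur, nxt, heading_deg):
    """Compute walking distance (m) and signed relative turn (deg) from the
    current LED position to the next one.

    cur, nxt: (latitude, longitude) in degrees; heading_deg: the user's
    heading from the geomagnetic sensor (0 = north, clockwise positive).
    """
    R = 6_371_000.0  # mean Earth radius in metres
    lat1, lon1 = map(math.radians, cur)
    lat2, lon2 = map(math.radians, nxt)
    dx = (lon2 - lon1) * math.cos((lat1 + lat2) / 2) * R  # east offset
    dy = (lat2 - lat1) * R                                # north offset
    distance = math.hypot(dx, dy)
    bearing = math.degrees(math.atan2(dx, dy)) % 360      # compass bearing
    turn = (bearing - heading_deg + 180) % 360 - 180      # turn in (-180, 180]
    return distance, turn

# e.g. user faces east (90 deg); next LED is ~11 m due north of the current one
dist, turn = walk_instruction((35.0000, 139.0000), (35.0001, 139.0000), 90.0)
# turn = -90 -> "turn left 90 degrees"
```

The signed turn can then be verbalized by the voice synthesizer as "turn left/right N degrees and walk M metres".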
SUGAR [57] uses UWB technology to achieve indoor positioning. The duration of the UWB signal is short, which not only gives the pulse timing a higher time resolution, but also makes it possible to distinguish a direct signal from the signal reflections that usually occur in an indoor environment, thus providing accurate positioning services for visually impaired users. The system obtains detailed information about obstacles, points of interest, etc., by constructing a spatial database of the building; combining UWB positioning, it uses the A* pathfinding algorithm, configured with the required path width and the minimum allowable distance to obstacles, to complete optimal path planning. The system adopts a wired Ethernet network for the communication between the UWB tag fixed on the headset and the UWB sensors inside the room, and uses a Wi-Fi network for the communication between the server and the mobile phone. It uses coded acoustic signals and voice commands for interaction, as shown in Figure 11. Compared with conventional indoor navigation systems using RFID or NFC technology, the cost of the SUGAR system is higher; however, as each UWB sensor has a radiation range of 50-60 m, less equipment needs to be deployed, making it more suitable for large rooms, such as meeting rooms.
Figure 11. SUGAR [57] uses UWB for positioning.
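SUGAR's planner can be approximated by a standard A* search on an occupancy grid in which obstacles are first inflated by the minimum allowable clearance. The sketch below assumes a 4-connected grid with unit step costs; the paper's actual data structures and parameters are not described here:

```python
import heapq

def astar_with_clearance(grid, start, goal, clearance=1):
    """A* on an occupancy grid, keeping `clearance` cells away from obstacles
    (a sketch in the spirit of SUGAR's configurable minimum obstacle distance).
    grid: list of lists, 1 = obstacle, 0 = free; start/goal: (row, col).
    """
    rows, cols = len(grid), len(grid[0])
    # Inflate every obstacle cell by the required clearance (Chebyshev ball).
    blocked = set()
    for r in range(rows):
        for c in range(cols):
            if grid[r][c]:
                for dr in range(-clearance, clearance + 1):
                    for dc in range(-clearance, clearance + 1):
                        rr, cc = r + dr, c + dc
                        if 0 <= rr < rows and 0 <= cc < cols:
                            blocked.add((rr, cc))

    def h(p):  # Manhattan heuristic: admissible on a 4-connected grid
        return abs(p[0] - goal[0]) + abs(p[1] - goal[1])

    frontier = [(h(start), start)]
    came, cost = {start: None}, {start: 0}
    while frontier:
        _, cur = heapq.heappop(frontier)
        if cur == goal:  # reconstruct the path back to start
            path = []
            while cur is not None:
                path.append(cur)
                cur = came[cur]
            return path[::-1]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (cur[0] + dr, cur[1] + dc)
            if not (0 <= nxt[0] < rows and 0 <= nxt[1] < cols) or nxt in blocked:
                continue
            new_cost = cost[cur] + 1
            if new_cost < cost.get(nxt, float("inf")):
                cost[nxt] = new_cost
                came[nxt] = cur
                heapq.heappush(frontier, (new_cost + h(nxt), nxt))
    return None  # no path satisfies the clearance constraint
```

Increasing `clearance` widens the corridor the planner demands, mirroring SUGAR's path-width parameter.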
The NAVIG system [51,74,75] was designed to complement assistive tools (such as canes and guide dogs). It combines the Global Navigation Satellite System (GNSS) and visual recognition to rapidly find the accurate location of users, thereby enhancing the mobility of visually impaired users both indoors and outdoors. As shown in Figure 12, the system is composed of three parts: data input, user communication, and internal system control. The data input component has a geographic positioning system, acceleration and direction sensors, a map database, an ultra-high-speed image recognition platform connected to a head-mounted camera, and a data fusion module. NAVIG uses multiple data inputs (satellite, azimuth, acceleration, and image recognition) combined with the location database to locate the user in real time. Considering that it is difficult to accurately locate features such as zebra crossings and road construction during navigation, the system introduced a visual recognition function to further improve the accuracy and safety of navigation. To minimize the masking of real-world ambient sounds, the system uses bone-conduction headsets to deliver prompt information to the user.

Behavior and Motion Planning
Similar to research on autonomous driving and autonomous mobile robots, the popular waypoint-based path requires dedicated planning to be converted into traces that are friendly to, and followable by, the human-machine system. Thus, behavior and motion planning are indispensable, whether considered independently or fused into the path planning. Here, 'behavior' means navigation behavior, with motion and human capabilities taken into consideration in planning.
Ganz et al. proposed PERCEPT, a milestone work of PVI indoor navigation. In practice, it interacts elaborately with sensory events and reflects the changes in the motion planners. The PERCEPT II system [69] deployed near field communication (NFC) tags on the existing signs of specific landmarks in the environment, such as doors, elevators, and staircases, as shown in Figure 13. When the user touches an NFC tag, the system obtains a unique ID, which, in turn, determines the user's location via Google Maps and provides navigation instructions in audio format. However, only obtaining navigation information at specific landmarks does not sufficiently meet the needs of users who wish to acquire detailed navigation instructions. Therefore, a later version of PERCEPT was developed [60] using BLE tags deployed in the environment, which allows for positioning users anywhere in the environment while providing user-friendly instructions. In addition, it introduces a loss-prevention function designed for inadvertent disorientation. Worth mentioning is its user-friendly 'where am I' function, which triggers the system to describe the surrounding landmarks, so as to enhance the situation awareness and confidence of the user, who subsequently adjusts their motion according to the instructions.
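The tag-to-instruction flow above can be sketched as a simple lookup: a touched tag resolves to a landmark record, and the 'where am I' command returns a description of the surroundings. The landmark database and field names below are hypothetical; PERCEPT's real data model is not public:

```python
# Hypothetical landmark database keyed by tag ID (illustrative only).
LANDMARKS = {
    "tag-17": {
        "name": "elevator, floor 2",
        "neighbors": ["stairwell B, 5 m to the left", "restroom, 8 m ahead"],
        "to_exit": "Turn right and walk 12 m to the main exit.",
    },
}

def on_tag_touch(tag_id, command):
    """Resolve an NFC/BLE tag ID into a spoken guidance string (sketch)."""
    lm = LANDMARKS.get(tag_id)
    if lm is None:
        return "Unknown location. Please find the nearest tagged sign."
    if command == "where am i":
        # Restores situation awareness after inadvertent disorientation.
        return (f"You are at the {lm['name']}. Nearby: "
                + "; ".join(lm["neighbors"]) + ".")
    return lm["to_exit"]

print(on_tag_touch("tag-17", "where am i"))
```

In the BLE version, the same lookup would be driven by proximity-estimated position rather than a physical touch.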

Bai et al. proposed a wearable intelligent navigation system aiming to assist PVI in roaming safely in unfamiliar indoor and outdoor areas [63]. The system combines and optimizes the multisensor-fusion-based obstacle avoidance algorithm proposed in previous research [76] and a novel dynamic-subgoal, selection-based virtual blind-road-following scheme [77], which can not only detect smaller and transparent obstacles, but also provides navigation information by using a lightweight CNN to identify objects. In the indoor environment, the system applies visual simultaneous localization and mapping (VSLAM) technology to build indoor maps, conducts scene perception via computer vision, and instructs the user with synthetic speech. This work was claimed to be more efficient than conventional approaches in complex situations, for example, in a scene where a road is blocked by two unexpected chairs, as shown in Figure 14. A PVI user would have to replan a new path (dotted lines), since the planned path is blocked. By taking advantage of object recognition, the user may instead continue to follow the original guiding path to the endpoint by shifting a chair, as illustrated by the solid lines. A smooth connection between the path planner and the motion planner is observed, especially in cases where the path must be significantly adjusted in the middle of a trial.
Figure 14. A wearable intelligent indoor navigation system proposed by Bai [63].
NavCane [62] is a low-cost, low-power obstacle detection and recognition embedded guide cane, which is an alternative to machine vision systems. The system can prioritize the obstacle information on the road so as not to overload the information delivered to users. NavCane contains ultrasonic sensors, a water sensor, an RFID reader, a GPS module, a GIS module, a vibration motor, a gyroscope, and a battery, as shown in Figure 15. The ultrasonic sensors are designed to help PVI detect and avoid obstacles at different heights, such as foot, knee, and waist level. The water sensor confirms whether there is water on the ground. The RFID reader recognizes objects in the environment and the color of clothes based on what is stored in the RFID tags. The GPS module obtains the user's current location information and assists the user in finding their way in an indoor or outdoor environment. In addition, the system utilizes a GIS module, allowing users to send alert messages and e-mails to pre-configured caretakers. NavCane provides dual feedback: audio and tactile. Tactile feedback is mainly used to remind users of obstacles around them, whereas audio feedback informs users of object types or colors and guides users indoors.
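NavCane's key idea of prioritizing obstacle information can be sketched as selecting only the most urgent event per cycle. The priority ordering below is an assumption for illustration; the actual ranking used in [62] may differ:

```python
# Hypothetical urgency ranking (lower = more urgent); not taken from [62].
PRIORITY = {
    "drop_off": 0,        # e.g. stairs or a curb edge
    "waist_obstacle": 1,
    "knee_obstacle": 2,
    "foot_obstacle": 3,
    "water": 4,
    "rfid_object": 5,     # informational: object type or clothing color
}

def select_alert(events):
    """Deliver only the most urgent event so the user is not overloaded."""
    if not events:
        return None
    return min(events, key=lambda e: PRIORITY[e])

print(select_alert(["water", "knee_obstacle", "rfid_object"]))
# knee-level obstacle outranks the water warning and RFID information
```

Lower-priority events can be queued or dropped, trading completeness for a manageable cognitive load.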

Action Control
ALVU (array of LiDARs and vibrotactile units) [61] is an intuitive, contactless wearable system that helps PVI to detect the location of obstacles in their environment and the boundary of the surrounding space to enable partial navigation in an indoor environment. The system is composed of two parts, a sensor belt and a haptic strap, which communicate via Bluetooth, as shown in Figure 16. The time-of-flight (ToF) sensor is tiny yet capable of accurately and reliably measuring the distance between the user and the surrounding obstacles or surfaces. The sensor belt consists of two vertical ToF sensors and an array of five horizontal ToF sensors, placed in front of the user's waist and used to detect the front, sides, ground, and ceiling ahead of the user. The haptic strap, worn on the user's upper abdomen, is composed of five horizontally arranged vibration motors that provide tactile feedback. Compared with a gradually changing vibration intensity indicating the measured distance, discrete intensity levels are more easily perceivable by the user; the haptic strap therefore uses five levels of vibration intensity, where the closer the measured distance, the greater the vibration intensity. In addition, five combinations of pulse rate and pulse intensity are set up for the haptic strap to convey the height of obstacles and to support going up and down stairs.
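The discrete distance-to-intensity mapping described above can be sketched as banding the ToF reading into five levels. The maximum range and band widths below are assumptions; ALVU's actual thresholds are not reproduced in this review:

```python
def vibration_level(distance_m, max_range_m=2.0, levels=5):
    """Map a ToF distance reading to one of `levels` discrete vibration
    intensities (sketch). Level `levels` = strongest (closest obstacle);
    level 0 = motor off (nothing within range).
    """
    if distance_m >= max_range_m:
        return 0
    band = max_range_m / levels              # width of each distance band
    return levels - int(distance_m // band)  # closer bands -> higher level

# 0.1 m away -> strongest level 5; 1.9 m away -> weakest level 1; beyond 2 m -> off
```

Each of the five horizontal motors would run this mapping on its own sensor's reading, giving the user a coarse tactile panorama of the space ahead.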

Aladren et al. proposed an indoor navigation system based on computer vision [56], as shown in Figure 17. The system obtains scene images through an RGB-D camera and uses distance and color information to perform high-level planning and segmentation of the scene. The distance information is mainly used to extract the main structural elements of the scene at close range (within 3 m). For long-range (greater than 3 m) image information, the system uses color information and, based on the scene type, detects and classifies the image according to the polygon layer segmentation or watershed layer segmentation method. The system contains two interaction mechanisms: voice commands and a sound map. Voice commands can provide free-path information to the PVI or alert users to possible hazards when the distance between the user and an obstacle is less than 2 m. Stereo beeps are used in the sound map to convey the distance of obstacles, and the frequency of the sound depends on the distance between the obstacle and the user. When the user is closer to an obstacle on their right-hand side rather than their left, the frequency of the sound received by the right ear is higher; when the system detects an obstacle directly in front of the user, the sound is heard in both ears.

ANSVIP [20] is a navigation system based on computer vision positioning. The system relies on a pre-built CAD map of the indoor scene, and by marking the areas of interest on the map, it helps the system understand the navigation request and plan the path accordingly. ANSVIP uses a smartphone that supports the ARCore module as the carrier and obtains real-time environmental information through vision sensors and inertial measurement units (IMUs). Based on SLAM and area learning, ANSVIP locates users on CAD maps and uses ARCore technology to complete real-time updates of the user's pose and location, as shown in Figure 18. The system introduces a dual-channel human-machine interaction mechanism. After the user gives instructions to the mobile terminal by voice, the mobile terminal parses the instructions and transmits the information to the wearable gloves via Bluetooth, while giving voice feedback to the user. The left-hand wearable glove is used to guide the user's moving direction, and the right-hand glove is used to remind the user of the distribution of obstacles. The system provides fluent and continuous navigation guidance, which was claimed to be superior to the conventional turn-by-turn audio-guiding method.
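The stereo sound map of Aladren et al. [56] described above can be sketched as a cue generator: beep rate rises as the obstacle gets closer, and laterality encodes which side it is on. The linear rate law below is a hypothetical placeholder; the actual frequency mapping is not given in this review:

```python
def sound_map_cue(distance_m, side):
    """Render an obstacle as a stereo beep cue (sketch of the sound map
    in [56]). side: 'left', 'right', or 'front'. Closer obstacles beep
    faster; frontal obstacles sound in both ears.
    """
    # Hypothetical mapping: rate rises as distance shrinks below ~3 m.
    rate_hz = max(1.0, 10.0 - 3.0 * distance_m)
    return {
        "rate_hz": rate_hz,
        "left_ear": side in ("left", "front"),
        "right_ear": side in ("right", "front"),
    }

cue = sound_map_cue(0.5, "right")  # near obstacle on the right:
# beeps only in the right ear, at a high rate
```

A separate voice channel would still handle the free-path descriptions and the under-2 m hazard alerts.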

Validation Methods
Validation of PVI navigation aids helps to objectively evaluate the effectiveness and value of a system. Simulations and experiments are the two primary approaches to confirming validity and performance. According to their validation method, the studied literature is categorized as shown in Figure 19.
The number of studies verified by experiments (251) is greater than the number of works using simulation (62). Among them, 17 works conducted both simulation and field tests. In simulations, unit tests are most popular to evaluate subsystems, such as mixed ANSVIP [20] is a navigation system based on computer vision positioning. The system relies on a pre-built CAD map of the indoor scene, and by marking the area of interest on the map, it helps the system understand the navigation request and plan the path accordingly. ANSVIP uses a smartphone that supports the ARCore module as the carrier and obtains real-time environmental information through vision sensors and inertial measurement units (IMUs). Based on SLAM and area learning, ANSVIP locates users on CAD maps and uses ARCore technology to complete real-time updates of the user's pose and location, as shown in Figure 18. The system introduces a dual-channel humanmachine interaction mechanism. After the user gives instructions to the mobile terminal by voice, the mobile terminal parses the instructions and transmits the information to the wearable glove via Bluetooth, while giving voice feedback to the user. The left-handed wearable glove is used to guide the user's moving direction, and the right-handed glove is used to remind the user of the distribution of obstacles. The system provides fluent and continuous navigating guidance which were claimed superior to the conventional turn-by-turn audio-guiding method. ANSVIP [20] is a navigation system based on computer vision positioning. The system relies on a pre-built CAD map of the indoor scene, and by marking the area of interest on the map, it helps the system understand the navigation request and plan the path accordingly. ANSVIP uses a smartphone that supports the ARCore module as the carrier and obtains real-time environmental information through vision sensors and inertial measurement units (IMUs). 
Based on SLAM and area learning, ANSVIP locates users on CAD maps and uses ARCore technology to complete real-time updates of the user's pose and location, as shown in Figure 18. The system introduces a dual-channel human-machine interaction mechanism. After the user gives instructions to the mobile terminal by voice, the mobile terminal parses the instructions and transmits the information to the wearable glove via Bluetooth, while giving voice feedback to the user. The left-handed wearable glove is used to guide the user's moving direction, and the right-handed glove is used to remind the user of the distribution of obstacles. The system provides fluent and continuous navigating guidance which were claimed superior to the conventional turnby-turn audio-guiding method.

Validation Methods
Validation of PVI navigation aids helps to objectively evaluate the effectiveness and value of a system. Simulations and experiments are the two primary approaches to confirming validity and performance. According to their validation method, the studied literature is categorized as shown in Figure 19.
The number of studies verified by experiments (251) is greater than the number of works using simulation (62). Among them, 17 works conducted both simulation and field tests. In simulations, unit tests are the most popular way to evaluate subsystems, such as mixed verification of camera calibration [78], the capacity of proposed novel ultrasonic signal processing [79], etc. In experiments, the results of unit tests (122) and integrated tests (97) are both numerous. Since a unit test mainly verifies partial functions, such as the accuracy of detecting obstacle distance [54], in most cases the subject type is not required or described.
Experiments usually demand the evaluation of the completion of a route by subjects in a real environment; almost 40% of the solutions invited PVI to participate, while the rest worked with blindfolded users. Compared with simulations, in which model parameters can be adjusted adaptively and the number of trials is not restricted, experiments can produce an authentic experience and gather feedback on the navigation solution through practice in real environments. Therefore, the experimental verification method is more popular, especially for validation and verification of human-centric metrics such as usability, feasibility, and user experience.
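The categorization behind Figure 19 amounts to tallying each publication by validation approach and test level. A toy sketch of that tally (with hypothetical records, not the actual 528-publication dataset) might look as follows:

```python
from collections import Counter

# Toy bibliographic records (hypothetical data, for illustration only).
# "validation" is the set of approaches used; "test_level" is unit or integrated.
records = [
    {"id": 1, "validation": {"experiment"}, "test_level": "unit"},
    {"id": 2, "validation": {"simulation"}, "test_level": "unit"},
    {"id": 3, "validation": {"simulation", "experiment"}, "test_level": "integrated"},
    {"id": 4, "validation": {"experiment"}, "test_level": "integrated"},
]

# Works validated only by experiment, and works that did both.
experiment_only = sum(1 for r in records if r["validation"] == {"experiment"})
both = sum(1 for r in records if r["validation"] >= {"simulation", "experiment"})

# Distribution of test levels across all works.
levels = Counter(r["test_level"] for r in records)
```

Applied to the real dataset, the same counts produce the 251/62/17 split and the 122 unit versus 97 integrated figures reported above.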

Discussion and Conclusions
In this work, a bibliometric study using CiteSpace was carried out for research works on PVI navigation aids. We clearly answered the following questions: What are the publication characteristics and most influential publication sources? Who are the most active and influential authors? What are their research interests and primary contributions to society? What are the featured key studies in the field? What are the most popular topics and research trends, described by keywords? Since bibliometric analysis suffers from latency of response to research-focus changes and lacks closer views of milestone and representative works, we conducted a narrative illustration of remarkable works with different system architectures and sensory combinations in achieving modern navigation aids. The narrative study categorizes the works by their primary focus and takes ten representative works as illustrative examples to show the distinct influential sensory combinations and mechanisms applied.
The main findings follow. First, as an applied science, navigation-aid research and development periodically reflects the status of frontier science and the economic rules of electronic products, accompanying the development of personal computation devices and services; the carriers of navigation aids have gradually shifted from specially manufactured equipment to ubiquitous personal devices, e.g., PDAs, computers, and, currently, smartphones. Second, the research and development works focus less on fundamental theories; the most influential journals of this field share the scope of sensory technology, assistive technology, robotics, and human-machine interaction. Third, the most active authors are affiliated with institutions in the USA and Japan, and cooperation between leading authors results in high publication output and quality. Fourth, the key references reflect the fact that wearable and compact aids, two essential means of usability engineering, draw more attention and adoption by researchers. Fifth, hybrid functionalities supported by computer vision and multimodal sensors appear to gain more focus than conventional obstacle avoidance. Moreover, the latest progress in science and technology, such as artificial intelligence, robotic simultaneous localization and mapping, and stereo multimodal feedback, allows navigation aids to become more powerful, compact, efficient, and user-friendly. Finally, there have been numerous attempts to realize navigation aids using different sensor combinations and distinct multimodal human-machine interactions.
However, a clear conclusion establishing the best way to achieve navigation aids, with comprehensive consideration of affordability, usability, and sufficient functionality, is still lacking. One reason is the absence of a recognized guideline or benchmark for these features. Thus, besides the scientific and engineering efforts, the next stage of research and development of PVI navigation aids may require additional effort on user-centric factors, to make systems user-friendly and practical in terms of affordability and usability, and on cooperation with leading research institutions and dominant organizations (e.g., the WHO, World Health Organization, or RESNA, the Rehabilitation Engineering and Assistive Technology Society of North America) to jointly propose recommended standards and unified evaluation indicators for the research field.
As can be seen, although significant potential exists in the benefits of interdisciplinary scientific advances for PVI travel aids, a large number of under-researched areas remain. Based on the literature, advanced sensing, multimodal perception, autonomous navigation planning, and human-machine interaction are among the most viable and practical supportive components of modern PVI travel aids. Based on our study of the literature and insights gained from users, a lightweight system is clearly preferred to a highly coupled complex system. Moreover, in addition to scientific contributions and advances, the human-centric principle is also an essential concern.
By comprehensively considering usability, feasibility, and development and technology adoption costs, we believe that the use of smartphones as the primary carrier, including the use of their onboard sensors with optional accessories, appears to be a clear trend for the upcoming research and development of navigation aids for PVI.
Specifically, for researchers and developers in this field, we subjectively but constructively suggest embracing the agile development principle in realizing navigation systems for PVI. That is, rather than undertaking all development from scratch, we believe a more sensible approach is to build a fast prototype with open interfaces, such as Google ARCore or Apple ARKit, in addition to other compatible and powerful technologies that encompass positioning, perception, navigation, and artificial intelligence in one shell. After the conceptual system has been verified by the function-equivalent rapid prototype, the team can further explore and expand the potential of the smartphone-based system or of other appropriate frameworks, rather than developing everything from scratch, which is considered less efficient in terms of time and effort, in addition to being commercially expensive.
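To make the rapid-prototyping suggestion concrete, the sketch below shows the shape of a guidance loop built on top of a pose stream such as ARCore or ARKit would supply. The pose source here is a hypothetical stand-in (a hard-coded generator), and the cue thresholds are illustrative assumptions, not values from any cited system.

```python
import math

# Hypothetical stand-in for a framework pose stream (e.g., ARCore/ARKit);
# each update carries a 2-D position (metres) and a heading in radians.
def fake_pose_stream():
    yield (0.0, 0.0, 0.0)          # at origin, facing +x
    yield (1.0, 0.0, 0.0)          # moved forward
    yield (1.0, 0.0, math.pi / 2)  # turned to face +y

def guidance(pose, waypoint):
    """Turn the current pose and the next waypoint into a coarse spoken cue."""
    x, y, heading = pose
    bearing = math.atan2(waypoint[1] - y, waypoint[0] - x)
    # Signed angular difference between bearing and heading, wrapped to (-pi, pi].
    diff = math.atan2(math.sin(bearing - heading), math.cos(bearing - heading))
    if abs(diff) < math.pi / 8:  # assumed tolerance of 22.5 degrees
        return "go straight"
    return "turn left" if diff > 0 else "turn right"

waypoint = (1.0, 2.0)
cues = [guidance(p, waypoint) for p in fake_pose_stream()]
```

In an actual prototype, only the pose source changes: the framework delivers tracked poses, while the cue logic, waypoint list, and speech output remain ordinary application code, which is what makes this style of prototyping fast.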
Although most papers do not discuss the stability and compatibility of their works, we understand the ease and frequency with which navigation-aid systems crash or fail in practice. We highly recommend frameworks such as ARCore and ARKit because they are engineering innovations that achieve stable mapping and positioning (repositioning) through the numerous scientific and engineering efforts of their R&D teams. These technologies were developed for multiplayer applications in a shared physical space, which is consistent with the needs of PVI navigation, and build on sensor-based positioning, sparse-feature SLAM, and human-machine systems. In addition, the development process is simpler than with conventional approaches; that is, more intuitively understandable and powerful cross-platform development environments are emerging, such as Unity3D and Xcode, which are user-friendly for researchers and college students.
Given the increasing popularity of smartphones, we believe that 'smartphone+' will be the dominant form of future PVI travel aids due to its various merits, such as integrated and well-calibrated sensors, powerful processing capability, ease of use, and on-the-move intelligent interfaces endorsed by prominent teams; lower development cost; and, primarily, high rates of ownership and a minimal additional learning burden.