Enabling Self-Practice of Digital Audio–Tactile Maps for Visually Impaired People by Large Language Models

Tran, Chanh Minh; Bach, Nguyen Gia; Tan, Phan Xuan; Kamioka, Eiji; Kanamaru, Manami

doi:10.3390/electronics13122395

Open AccessArticle

Enabling Self-Practice of Digital Audio–Tactile Maps for Visually Impaired People by Large Language Models

by

Chanh Minh Tran

^1,*

,

Nguyen Gia Bach

²

,

Phan Xuan Tan

²

,

Eiji Kamioka

^2,*

and

Manami Kanamaru

³

¹

College of Engineering, Shibaura Institute of Technology, Tokyo 135-8548, Japan

²

Graduate School of Engineering and Science, Shibaura Institute of Technology, Tokyo 135-8548, Japan

³

Division of Electronic Information and Biomedical Engineering, Tokyo Denki University, Saitama 350-0394, Japan

^*

Authors to whom correspondence should be addressed.

Electronics 2024, 13(12), 2395; https://doi.org/10.3390/electronics13122395

Submission received: 24 May 2024 / Revised: 14 June 2024 / Accepted: 17 June 2024 / Published: 19 June 2024

(This article belongs to the Special Issue Human-Robot Interaction and Collaboration for Effective Solutions to Real-World Problems)

Download

Browse Figures

Versions Notes

Abstract

Digital audio–tactile maps (DATMs) on touchscreen devices provide valuable opportunities for people who are visually impaired (PVIs) to explore the spatial environment for engaging in travel activities. Existing solutions for DATMs usually require extensive training for the PVIs to understand the feedback mechanism. Due to the shortage of human resources for training specialists, as well as PVIs’ desire for frequent practice to maintain their usage skills, it has become challenging to widely adopt DATMs in real life. This paper discusses the use of large language models (LLMs) to provide a verbal evaluation of the PVIs’ perception, which is crucial for the independent practice of DATM usage. A smartphone-based prototype providing DATMs of simple floor plans was developed for a preliminary investigation. The evaluation results have proven that the interaction with the LLM could help the participants better understand the DATMs’ content and could vividly replicate them by drawings.

Keywords:

audio–tactile map; LLM; visually impaired; practice

1. Introduction

Despite the public deployment of various mobility-accessible solutions like tactile paving, guide volunteers, guide dogs, etc., recent studies have revealed that people who are visually impaired (PVIs) still face difficulties in travel activities (e.g., visiting tourist attractions or entertainment centers) [1,2]. Traveling requires the spatial exploration ability of one’s self to understand the spatial settings of the destination or the surrounding areas for preparing travel plans or navigating independently on site [3]. People who are sighted can easily convey such information by directly observing the real-world environment or its digital replicates (e.g., Google Maps [4]) through their eyes. On the other hand, due to the absence or damage of the visual sense, PVIs must fully rely on other sensory systems, such as tactile or auditory [5]. Therefore, existing research has focused on interpreting the spatial information from visual to other sensory formats for assisting the PVIs with spatial exploration when traveling.

Tactile maps are an effective way of delivering the spatial information of an area to PVIs and have been widely adopted around the globe [6,7]. A tactile map represents the areas of interest in the environment as physical embossed patterns like dots, lines, or shapes that the PVIs can feel directly through the sense of touch. To supply even more detailed information, a tactile map can also integrate audio information (audio–tactile map) that is played by pressing the corresponding physical buttons [8,9]. By interacting with tactile or audio–tactile maps, PVIs can build a mental map, which is an imaginary representation of the spatial environment formed by the human brain, and refer to it for navigational or explorational actions [10,11].

Nevertheless, audio–tactile maps are hard copies (non-refreshable) and cannot provide real-time updated information [12,13]. Therefore, several recent works have been conducted to digitize the conventional audio–tactile maps by utilizing haptic and audio feedback on touchscreen devices such as smartphones or tablets to enable real-time refreshability [13,14,15,16,17,18,19]. For instance, audio–tactile feedback such as vibrations, beep sounds, or audio descriptions can be applied to outdoor streets [16], buildings [17,18], or indoor facilities [13] when the PVIs touch them on screen. It has been reported that such digital audio–tactile maps (DATMs) could support PVIs in constructing mental maps for spatial exploration as efficiently as the analogous hard copy counterparts [14,20].

Despite the promising results, existing DATMs require explicit usage practice of the PVIs in order to understand the feedback mechanism, especially the haptic feedback. Unlike the tactile sensation of physical tactile maps derived from the natural pressure-based perception of skin displacement [21], haptic feedback comprises vibration patterns designed for specific objects in specific interactions [22]. Hence, along with intuitive feedback mechanisms, intuitive training or practice procedures have also been investigated in previous works [19,23,24]. Additionally, a recent interview [25] has also pointed out that, apart from training on several first uses, PVIs also desire frequent practice by themselves to maintain or improve their usage skills. Also, there has been a shortage of human instructors to facilitate usage practice for PVIs [26,27], thus affecting the adoption of DATMs in real life. As a result, it is crucial to investigate and integrate self-practice modules into existing DATMs to help the PVIs practice their usage skills.

Regardless of the type of solution, usage practice for PVIs requires verbal communication between them and the instructors, from which the PVIs receive verbal evaluation to calibrate their understanding accordingly [28]. In the case of DATMs, it is necessary to help the PVIs validate and correct their constructed mental maps of the spatial environment, like the perception of the area shape or the layout of points of interest (POIs) in the area [25]. For example, in a typical practice session, a person who is visually impaired tries out a DATM of a floor plan and verbally describes the layout, number of rooms, turning points in the corridor, etc., to an instructor [19]. Then, the instructor evaluates the correctness of that information by observing the DATM and the person’s interactions and provides verbal feedback and usage instructions to help him/her improve his/her skills. For the case of a self-practice module, it is challenging to perform similarly to such behaviors of human instructors, including the ability to not only comprehend both the PVIs’ verbal inquiries and the detail of ongoing practice maps, but also to provide appropriate and intuitive responses.

Recently, large language models (LLMs), such as ChatGPT [29] or Gemini [30], have gained significant attention. In fact, the ability of LLMs aligns well with the requirements for the self-practice module of DATMs thanks to their robustness in handling contextual natural language processing (NLP) and natural language generation (NLG) tasks [31,32]. Thus, there is potential to utilize LLMs to provide PVIs verbal feedback during self-practice using DATMs.

In this manner, this work presents a conceptual design and preliminary investigation of an LLM-based self-practice assistant for DATMs for PVIs. A prototype smartphone application was developed, which provides experimental floor plans with three simple elements, namely the entrance, restroom, and user’s location. Then, an experiment was conducted with blindfolded participants, where they were asked to learn two DATMs provided by the prototype. During the experiment, half of the participants were allowed to verbally interact with a prototype LLM assistant to confirm the validity of their perception of the DATMs, while the other half had to perform this by themselves. Based on the participants’ drawings of their mental maps, it has been confirmed that the presence of the LLM assistant could help the participants understand the DATMs more correctly. In addition, some opinions of the participants regarding the functionality of the LLM assistant and the implications for the hallucination problem are summarized for future works.

The remainder of this paper is organized as follows: Section 2 summarizes the related works. Section 3 presents the conceptual DATM framework with the integration of an LLM-based self-practice module. The details of the prototype implementation and experiment procedures are provided in Section 4. Section 5 and Section 6 analyze and discuss the experiment results, respectively. Finally, Section 7 concludes this paper.

2. Related Work

Touchscreen devices such as smartphones or tablets have been exploited for providing DATMs in various existing works. Especially, recent surveys [33,34] have revealed that more and more PVIs use smartphones currently, and some PVIs are also highly confident in their usage skills. Thus, recent works have focused on smartphone-based DATMs.

To name a few, Ref. [16] was one of the first works to consider smartphone-based DATMs by utilizing the built-in haptic engine and audio speaker. The study converted outdoor street maps into DATMs and argued that knowledge of the streets was more important for the PVIs than POIs such as the location of a church. Thus, when the user touched a street, constant vibration was applied and the name of the street was read out loud. Built upon this concept, many follow-up works have been conducted on different use cases or implications for POIs. For example, the works in [17,18] also considered DATMs of street maps, but took into account the presence of POIs such as buildings or playgrounds in the feedback design. Meanwhile, the works in [13,15,19,35] targeted indoor environments like shopping malls or apartment buildings and provided DATMs of floor plans where audio–tactile feedback was assigned to the stores (e.g., [13,35]) or area boundary (e.g., [15,19]). Usability evaluations of the above works have shown the potential of smartphone-based DATMs in assisting PVIs with both outdoor and indoor exploration.

Nevertheless, while physical tactile maps allow PVIs to perceive directly throughhuman primitive tactile sensation [21], smartphone-based DATMs use haptic feedback to simulate the tactile feeling via vibration patterns, thus requiring extensive practice. In previous works, such a usage practice required the presence of a human instructor to provide verbal guidance or verbal confirmation of the user’s perception. For example, in [19], the subjects were given verbal instructions and verbal feedback by the experimenter about distinguishing haptic textures or following trajectories using the prototype DATM throughout an approximately 45 min practice session. In addition to such verbal instructions and feedback, the participants in [23,24] had to correctly answer some criterion questions from the experimenter to conclude their practice.

The aforementioned works have highlighted the importance of verbal interaction with human instructors during usage practice in the successful adoption of DATMs. Nevertheless, several recent surveys have reported the shortage of human instructors to conduct usage practice with DATMs for PVIs [26,27]. Such a problem heavily affects the adoption of DATMs in real life.

In this work, the use of an LLM as a self-practice assistant of smartphone-based DATMs is investigated to deal with the above problem. Thanks to the robust ability and contextual awareness to comprehend human inquiries and respond in natural language [31,32], it is expected that the LLM can replicate the role of a human instructor to provide verbal confirmation of the PVIs’ perception during usage practice with DATMs.

3. Conceptual Framework

Figure 1 depicts the general framework of the existing works on DATMs.

Typical DATMs allow PVIs to explore the surrounding environment in real time by directly touching the built-in touchscreen with their fingers. Through continuously matching the touch positions with the POIs, corresponding haptic and/or audio cues are provided by the feedback module. Such feedback can be as simple as a constant vibration or beep sounds or more complex like specially designed vibration patterns, sound effects, or audio descriptions. Yet, it should be noted that designing such a feedback mechanism is not the focus of this paper.

In this work, the focus lies on the self-practice module, which provides the PVIs with practice maps and an LLM-based self-practice assistant for self-evaluation of their usage skills, as shown in Figure 2. The practice DATMs should simulate the content and provide similar feedback to the actually working DATMs. During practice, the PVIs can verbally ask the LLM assistant to confirm their understanding of the current practice map based on the perceived feedback.

As shown in Figure 2, the verbal inquiries of the PVIs are passed through a speech-to-text module before being prompted to the LLM. The prompts are then analyzed by the LLM assistant based on the available information of the practice tasks in the knowledge base. Typically, the knowledge base should contain, but not be limited to, the context of the respective DATMs (e.g., outdoor or indoor environment), the detail of the encoding of the POIs (e.g., shape and color) in the DATMs, and the strategy to provide responses (e.g., how to respond to which type of inquiries). Finally, the provided text-based responses and explanations are converted into speech and played through the device’s built-in speaker. In fact, the speech-to-text and text-to-speech steps can be redundant as modern LLMs can process multimodal input and output. That means one can directly speak to the LLM and have it directly talk back.

4. Prototype Implementation and Experiment Procedure

4.1. Prototype Implementation

Figure 3 summarizes the technical details of the prototype used in this work. A prototype smartphone application providing self-practice DATMs was developed for experimental investigation. Utilizing the native Kotlin libraries, the prototype was deployed on a Google Pixel 7a running Android 13. It is expected that the self-practice LLM assistant should be integrated into the prototype. However, for the convenience of data logging, it was deployed on a separate Core i9 machine running Ubuntu 22.04 LTS, equipped with 64 GB of RAM and an NVIDIA GeForce RTX 3070 GPU. The LLM assistant used gpt-4-turbo [36] on OpenAI’s server, the latest model of ChatGPT [29] at the time of the study. Although gpt-4-turbo could process audio input and output directly, for data logging and evaluation purposes, the speech-to-text and text-to-speech processes were performed offline. The speech-to-text module utilized OpenAI’s Whisper [37], while the text-to-speech module used gTTS [38].

Based on the PVIs’ desired information of the area shape and layout of the elements within the area as found in [25], the prototype encoded the area boundary and the location of the elements as the POIs. In this work, it was assumed that the PVIs explored the DATMs of simple floor plans of some enclosed rooms; the POIs included the boundary of the room, one entrance, one restroom, and the assumed user’s location, as shown in Figure 4. Each POI was assigned several types of haptic and audio feedback, which are presented in the next subsection.

4.2. Feedback Design

In principle, when interacting with physical tactile maps, PVIs distinguish the difference in tactile feeling between the embossed POIs versus the plain surface to build mental maps [5,6]. To simulate such tactile feelings, in this experiment, when any POI was being touched, a constant vibration with moderate amplitude (Figure 5a) was applied. This vibration remained playing as long as at least one POI was touched.

In addition, the PVIs might move their fingers directly from one POI to another (e.g., move between the room’s edges at the corner). In such a case, the vibration stayed constant, and it would be hard to realize the existence of multiple POIs. Therefore, when any POI switched from the idle state (i.e., not being touched) to the touch state, an abrupt vibration with high amplitude (Figure 5b) was triggered to inform about its existence. Also, to distinguish among different POIs, each POI was assigned a specific sound effect and audio description. The details of the designed sound effects and audio descriptions are summarized in Table 1.

4.3. Prerequisite Information for the LLM Assistant

Since the LLM was pretrained to deal with general tasks, it was necessary to provide it with prerequisite information so that it could better play the role of a self-practice assistant for DATMs. The prerequisite information included descriptions of the role, details of the POIs, and desirable responses. Table 2 provides the details of the prerequisite information provided to the LLM in this work.

The PVIs may ask the LLM assistant to directly tell them the information about the POIs in the beginning, instead of confirming their understanding after trying out the DATMs. However, the typical orientation and mobility lessons for PVIs follow the Socratic questioning method: the PVIs are encouraged to experience first, then have the trainers confirm and explain to recalibrate their own understanding [28,39]. Such a way is believed to help build the confidence of the PVIs in using the DATMs, rather than a prescriptive approach. Therefore, in this work, when being inquired directly about the POI details, the LLM assistant was instructed to ask the PVIs to try feeling the POIs on the screen first and provide some example questions for them to follow.

4.4. Experiment Procedure

A preliminary experiment was conducted with 10 blindfolded participants to investigate the helpfulness of the LLM self-practice assistant in constructing mental maps for spatial exploration. The experiment was divided into 2 phases: the training phase and the practice phase.

The purpose of the training phase was for the participant to be familiarized with the prototype and its feedback mechanism. During this phase, each participant tried out 4 basic lines, namely horizontal line, vertical line, diagonal line, and curve line, and 3 basic geometry shapes, namely rectangle, triangle, and circle. The feedback applied to those lines and shapes was identical to the ones applied to the room’s boundary. For every line or shape, the participant was informed of its type in advance and was given unlimited time to try it out until he/she determined that he/she understood the content.

Upon completing the training phase, the participants entered the practice phase, where they were asked to explore 2 practice DATMs, each containing a set of POIs, as described in Section 4.1. The visualizations of the DATMs are shown in Figure 6. For the practice phase, half of the participants explored the DATMs by themselves without any assistance (denoted as Group A), while the other half were allowed to communicate with the LLM assistant (denoted as Group B) to inquire about their perception of the DATMs.

While the assumed participant’s location, entrance, and restroom were distributed randomly in the DATMs, the shapes of the rooms were decided on purpose. Figure 6a features a room shaped like a regular rectangle, which was analogous to what the participant was given during the training phase. On the other hand, the room shown in Figure 6b represents an irregular shape, which typically required more cognitive effort to recognize [40]. It was hypothesized that such an irregular shape could better highlight the influence of the LLM assistant on the mental map construction of the PVIs.

Similar to the training phase, each participant was given unlimited time to explore the DATMs. For each DATM, once the participant determined that he/she fully understood the content, he/she was asked to draw out the map based on his/her understanding using the Markup tools on an Apple iPad. Before drawing, the participant was informed of the visual details (colors and patterns) of the POIs. Then, the drawings were compared with the ground truth screenshots of the DATMs using the MobileNetV3 model of Google’s MediaPipe [41], which provided the cosine similarity score of their embedded features. The similarity score served as the quantitative evaluation measurement of the participant’s perception of the DATMs.

Finally, a qualitative subjective evaluation was conducted with the participants from Group B, who were given access to the LLM assistant during the practice phase. These participants were asked to evaluate the LLM assistant based on the following questions:

Q1.: Did the LLM assistant hear the inquiry correctly?
Q2.: Did the LLM assistant respond correctly?
Q3.: Whether the LLM assistant responded fast?
Q4.: Whether the LLM assistant was helpful for you to understand the map?

For Q1 and Q2, the participants were shown a list of all inquiries heard by and responses from the LLM assistant, respectively, throughout the practice phase and were asked to rate them as “correct” or “not correct”. Q3 and Q4 were rated on a 5-point Likert scale.

5. Results and Analysis

5.1. Similarity Scores between Ground Truth DATMs and Maps Drawn by the Participants

Table 3 provides the average similarity scores between the ground truth DATMs and the maps drawn by the participants during the practice phase, while Figure 7 provides their visualizations for qualitative comparison.

According to Table 3, for the rectangular DATM, there was no significant difference in the average similarity scores between the participants from both groups, and they were relatively high. Inferring from Figure 7a, almost all participants correctly understood the shape of the room boundary (except Participant A2). Although some participants misunderstood the room edge on which the entrance should be located (e.g., Participants A5, B3, and B5), the drawn locations were close to the ground truth.

For the irregularly shaped DATM, Table 3 shows that the average similarity scores of both groups deteriorated compared to the rectangular DATM. This demonstrates that the participants had more trouble understanding the irregularly shaped DATM and the regular rectangle one. According to Figure 7b, while all participants of both groups identified their locations, the entrance, and the restroom relatively correctly, the shape of the room boundary was perceived differently. Almost all participants from Group A misunderstood the room shape (except Participant A3). Thus, the similarity score of Group A deteriorated drastically (by 30.67%) compared to the rectangular DATM. On the contrary, with access to the LLM assistant, almost all participants from Group B understood the room shape relatively correctly (except Participant B5). As a result, the similarity score of Group B only slightly decreased (by 4.79%) compared to the rectangular DATM and was much higher than Group A (by 1.4 times). In the next subsection, the subjective evaluation of the LLM assistant is provided.

5.2. Subjective Evaluation of the LLM Assistant

Figure 8 shows the subjective evaluation conducted by the participants from Group B about their interactions with the LLM assistant during their practice.

According to Figure 8a, the participants determined that most of their inquiries were heard (Q1) and responded to (Q2) correctly by the LLM assistant. Interestingly, the number of correct responses was slightly higher than the correct prompts, indicating that, although the LLM assistant misheard a few prompts, it still managed to respond correctly. Table 4 shows the number of prompts the LLM assistant received for each DATM.

Based on Table 4, the rectangular DATM received only four inquiries, meaning that there was at least one participant who did not ask the LLM assistant anything. Meanwhile, the irregularly shaped DATM received significantly more inquiries. Such a result aligns well with the deteriorated similarity score of the irregularly shaped DATM reported earlier.

All participants also agreed that the LLM assistant was able to respond fast (Figure 8b). In fact, it could provide the response on average in 3.42 s. Finally, 3 out of 5 participants agreed that the LLM assistant was helpful to them in understanding the DATM (Figure 8c), while the rest neither agreed nor disagreed.

6. Discussion

This section discusses the influence of the LLM assistant in supporting the blindfolded participants in understanding the DATMs provided by the prototype.

6.1. Effectiveness and Usability

As presented in Section 5, the LLM assistant was inquired by the participants from Group B mostly about the irregularly shaped DATM, which was expected to require more cognitive effort. Figure 9 provides some extracted prompts and responses that the LLM assistant received and made during the practice phase.

Inferring from Figure 9, the LLM assistant could provide insightful feedback on the participant’s inquiries about not only the shape of the room (Figure 9d–f), but also the locations of the POIs (Figure 9b,c). In addition, it was able to recognize when the participant directly requested prescriptive information about the POIs and then provided guidance on how they should perform it and what they asked (Figure 9a). Specifically, for the irregularly shaped DATM (Figure 9d–f), the LLM assistant could confirm or explain vividly the visual details of the room shape, which was misunderstood by almost all participants in Group A. With such information, the participants from Group B could create much more similar drawings of this DATM than those from Group A. This highlighted the positive influence of the LLM assistant on improving the participant’s ability to understand the DATMs. Thus, it is highly promising to utilize the LLM assistant for the usage practice with existing DATMs for PVIs to promote real-life adoption.

Nevertheless, two participants (hereinafter referred to as P1 and P2) did not think the LLM assistant in this study was helpful for them when using the prototype, although not denying its accuracy and intuitiveness. P1 commented that the responses she received were indeed accurate and intuitive, but, “hard to capture the idea at only one time hearing”, and that she could, “fully understand only when reading them” (during evaluation for Q1 and Q2). “The way it answered was so polished… like reading [the answer] straight from a textbook than [how a human normally] answered”, P1 added. This suggests that the LLM assistant should be tailored to provide responses in a more compact and spoken language style, and should confirm whether the user understands its responses.

On the other hand, P2 found it hard to come up with questions to ask or to describe her understanding to confirm with the LLM assistant. She said, “I could imagine it in my head but to actually say it, it was hard […] Now looking at it with my eyes, I realize what I should have asked.” Thus, instead of reactively waiting for prompts and only providing confirmation of the user’s understanding, the LLM assistant should also proactively guide them on how to ask or express his/her thoughts.

Based on the above comments from P1 and P2, future works should focus on optimizing the behavior of the LLM assistant to make it perform more closely to a human trainer for the PVIs. Thus, it is desirable to involve human trainers for the PVIs in the design of the LLM assistant to construct proper expert data for efficient fine-tuning and behavior cloning [42]. For example, the human trainers can prepare a set of typical inquires and responses that align with the philosophy in the training documents for PVIs and fine-tune the LLM assistant based on them.

6.2. Concerns for Hallucination Problem

Despite the promising results, it is crucial to investigate the performance the LLM assistant in terms of hallucination, which is a critical problem of current LLM models and is receiving increasing attention lately. Hallucination refers to the phenomenon whereby the LLM responds to the user with false information [43]. In the case of the self-practice of DATM usage for PVIs in this study, hallucination describes the situation in which the LLM assistant provides a piece of information that does not align with the visual content of the DATM. Such a problem negatively affects the reliability of the self-practice module and the acquired usage skills of the PVIs. This subsection investigates two hallucination scenarios of the LLM assistant, as shown in Figure 10: when it was inquired about a non-existent POI (Figure 10a) and when the user insisted on false information about a POI (Figure 10b).

In the experiment of this study, almost all responses of the LLM assistant were correct (as graded by the subjective evaluation in Section 5.2). The only two incorrect responses were due to the mishearing of the participants’ verbal inquiries. Figure 10a shows an actual scenario happening during the experiment: The LLM assistant misheard the verbal inquiry of a participant and searched for a “Door 5” in the DATM. Still, it could realize that there was no such “Door 5” and responded reasonably. This was thanks to the prerequisite information about the set of considered POIs provided to the LLM assistant.

On the other hand, Figure 10b illustrates a simulated scenario where the experimenter insisted the LLM assistant provide false information about the location of the restroom. The LLM assistant correctly pointed out the false information and provided legitimate information in the first two responses. However, it finally hallucinated by complying with the false information in the third response. Actually, such a scenario did not happen at all during the experiment of this study. Also, considering the purpose of self-practice with DATMs, the PVIs tended to have the LLM assistant evaluate their perception rather than trying to convince it to listen to them. Thus, we speculated that such a hallucination scenario is less likely to happen in reality. Nevertheless, it is suggested that future studies about LLM-based self-practice modules for DATM should consider hallucination prevention practice such as fine-tuning or prompt engineering [44] to ensure the trustworthiness of the LLM self-practice assistant.

7. Conclusions

This paper discusses the potential of utilizing LLMs as a self-practice assistant for DATMs for PVIs. A prototype LLM assistant was developed for preliminary investigation with blindfolded participants. It has been proven that, thanks to the interaction with the LLM assistant, the participants could learn the spatial features from the DATMs, including the area boundary and locations of the elements within it, and could correctly recreate them with drawings. The subjective evaluation has also determined the helpfulness of the LLM assistant in aiding the participants’ understanding of the practice DATMs. Such a promising result is expected to push the wide adoption of DATMs in real life to promote sustainable travel activities for PVIs, contributing to an inclusive society. Future works will improve the behavior of the LLM assistant by fine-tuning it with respect to the training documents for the PVIs.

Author Contributions

Conceptualization, C.M.T.; methodology, C.M.T., N.G.B., E.K., and P.X.T.; software, C.M.T.; supervision, E.K.; writing—original draft, C.M.T.; writing—review and editing, N.G.B., P.X.T., E.K., and M.K.; project administration, E.K. and M.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by JSPS KAKENHI Grant Numbers JP23K11952 and JP23K17260.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of the data; in the writing of the manuscript; nor in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:

DATM	digital audio–tactile map
PVI	people who are visually impaired
LLM	large language model
POI	point of interest

References

Ito, Y.; Kiyohara, H.; Awamura, K.; Yamaoka, C. People with Visual Impairment Continue to Experience Difficulties in Their Daily Lives that Affect Their Health-related Quality of Life after the COVID-19 Pandemic. JMA J. 2024, 7, 114–119. [Google Scholar] [CrossRef]
Alves, J.P.; Eusébio, C.; Carneiro, M.J.; Teixeira, L.; Mesquita, S. Living in an untouchable world: Barriers to recreation and tourism for Portuguese blind people during the COVID-19 pandemic. J. Outdoor Recreat. Tour. 2023, 42, 100637. [Google Scholar] [CrossRef]
Engel, C.; Müller, K.; Constantinescu, A.; Loitsch, C.; Petrausch, V.; Weber, G.; Stiefelhagen, R. Travelling more independently: A Requirements Analysis for Accessible Journeys to Unknown Buildings for People with Visual Impairments. In Proceedings of the 22nd International ACM SIGACCESS Conference on Computers and Accessibility, ASSETS ’20, Virtual Event, 26–28 October 2020. [Google Scholar] [CrossRef]
Google Maps. Available online: https://www.google.com/maps/ (accessed on 24 May 2024).
Chebat, D.R.; Schneider, F.C.; Ptito, M. Spatial Competence and Brain Plasticity in Congenital Blindness via Sensory Substitution Devices. Front. Neurosci. 2020, 14, 815. [Google Scholar] [CrossRef]
Jakub Wabiński, A.M.; Touya, G. Guidelines for Standardizing the Design of Tactile Maps: A Review of Research and Best Practice. Cartogr. J. 2022, 59, 239–258. [Google Scholar] [CrossRef]
Hofmann, M.; Mack, K.; Birchfield, J.; Cao, J.; Hughes, A.G.; Kurpad, S.; Lum, K.J.; Warnock, E.; Caspi, A.; Hudson, S.E.; et al. Maptimizer: Using Optimization to Tailor Tactile Maps to Users Needs. In Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems, CHI ’22, New Orleans, LA, USA, 29 April–5 May 2022. [Google Scholar] [CrossRef]
Palivcová, D.; Macík, M.; Míkovec, Z. Interactive Tactile Map as a Tool for Building Spatial Knowledge of Visually Impaired Older Adults. In Proceedings of the Extended Abstracts of the 2020 CHI Conference on Human Factors in Computing Systems, CHI EA ’20, Honolulu, HI, USA, 25–30 April 2020; pp. 1–9. [Google Scholar] [CrossRef]
Wang, X.; Kayukawa, S.; Takagi, H.; Asakawa, C. BentoMuseum: 3D and Layered Interactive Museum Map for Blind Visitors. In Proceedings of the 24th International ACM SIGACCESS Conference on Computers and Accessibility, ASSETS ’22, Athens, Greece, 23–26 October 2022. [Google Scholar] [CrossRef]
Ottink, L.; van Raalte, B.; Doeller, C.F.; Van der Geest, T.M.; Van Wezel, R.J.A. Cognitive map formation through tactile map navigation in visually impaired and sighted persons. Sci. Rep. 2022, 12, 11567. [Google Scholar] [CrossRef]
Ottink, L.; Hoogendonk, M.; Doeller, C.F.; Van der Geest, T.M.; Van Wezel, R.J.A. Cognitive map formation through haptic and visual exploration of tactile city-like maps. Sci. Rep. 2021, 11, 15254. [Google Scholar] [CrossRef]
Holloway, L.; Ananthanarayan, S.; Butler, M.; De Silva, M.T.; Ellis, K.; Goncu, C.; Stephens, K.; Marriott, K. Animations at Your Fingertips: Using a Refreshable Tactile Display to Convey Motion Graphics for People who are Blind or have Low Vision. In Proceedings of the 24th International ACM SIGACCESS Conference on Computers and Accessibility, ASSETS ’22, Athens, Greece, 23–26 October 2022. [Google Scholar] [CrossRef]
Paratore, M.T.; Leporini, B. Exploiting the haptic and audio channels to improve orientation and mobility apps for the visually impaired. Univers. Access Inf. Soc. 2023, 23, 859–869. [Google Scholar] [CrossRef]
Palani, H.P.; Fink, P.D.S.; Giudice, N.A. Comparing Map Learning between Touchscreen-Based Visual and Haptic Displays: A Behavioral Evaluation with Blind and Sighted Users. Multimodal Technol. Interact. 2022, 6, 1. [Google Scholar] [CrossRef]
Feitl, S.; Kreimeier, J.; Götzelmann, T. Accessible Electrostatic Surface Haptics: Towards an Interactive Audiotactile Map Interface for People with Visual Impairments. In Proceedings of the 15th International Conference on PErvasive Technologies Related to Assistive Environments, PETRA ’22, Corfu, Greece, 29 June–1 July 2022; pp. 522–531. [Google Scholar] [CrossRef]
Poppinga, B.; Magnusson, C.; Pielot, M.; Rassmus-Gröhn, K. TouchOver map: Audio-tactile exploration of interactive maps. In Proceedings of the 13th International Conference on Human Computer Interaction with Mobile Devices and Services, MobileHCI ’11, Stockholm, Sweden, 30 August–2 September 2011; pp. 545–550. [Google Scholar] [CrossRef]
Kaklanis, N.; Votis, K.; Tzovaras, D. A mobile interactive maps application for a visually impaired audience. In Proceedings of the 10th International Cross-Disciplinary Conference on Web Accessibility, W4A ’13, Rio de Janeiro, Brazil, 13–15 May 2023. [Google Scholar] [CrossRef]
Darvishy, A.; Hutter, H.P.; Grossenbacher, M.; Merz, D. Touch Explorer: Exploring Digital Maps for Visually Impaired People. In Computers Helping People with Special Needs, Proceedings of the 17th International Conference, ICCHP 2020, Lecco, Italy, 9–11 September 2020; Proceedings, Part I; Springer: Berlin/Heidelberg, Germany, 2020; pp. 427–434. [Google Scholar] [CrossRef]
Tivadar, R.I.; Franceschiello, B.; Minier, A.; Murray, M.M. Learning and navigating digitally rendered haptic spatial layouts. Npj Sci. Learn. 2023, 8, 61. [Google Scholar] [CrossRef]
Giudice, N.A.; Guenther, B.A.; Jensen, N.A.; Haase, K.N. Cognitive Mapping Without Vision: Comparing Wayfinding Performance After Learning From Digital Touchscreen-Based Multimodal Maps vs. Embossed Tactile Overlays. Front. Hum. Neurosci. 2020, 14, 87. [Google Scholar] [CrossRef]
Johnson, K.O.; Phillips, J.R. Tactile spatial resolution. I. Two-point discrimination, gap detection, grating resolution, and letter recognition. J. Neurophysiol. 1981, 46, 1177–1192. [Google Scholar] [CrossRef] [PubMed]
Yau, J.M.; Kim, S.S.; Thakur, P.H.; Bensmaia, S.J. Feeling form: The neural basis of haptic shape perception. J. Neurophysiol. 2016, 115, 631–642. [Google Scholar] [CrossRef] [PubMed]
Robinson Moore, W.J.; Kalal, M.; Tennison, J.L.; Giudice, N.A.; Gorlewicz, J. Spatial Audio-Enhanced Multimodal Graph Rendering for Efficient Data Trend Learning on Touchscreen Devices. In Proceedings of the CHI Conference on Human Factors in Computing Systems, CHI ’24, Honolulu, HI, USA, 11–16 May 2024. [Google Scholar] [CrossRef]
Gorlewicz, J.L.; Tennison, J.L.; Uesbeck, P.M.; Richard, M.E.; Palani, H.P.; Stefik, A.; Smith, D.W.; Giudice, N.A. Design Guidelines and Recommendations for Multimodal, Touchscreen-based Graphics. ACM Trans. Access. Comput. 2020, 13, 1–30. [Google Scholar] [CrossRef]
Jain, G.; Teng, Y.; Cho, D.H.; Xing, Y.; Aziz, M.; Smith, B.A. “I Want to Figure Things Out”: Supporting Exploration in Navigation for People with Visual Impairments. Proc. ACM Hum.-Comput. Interact. 2023, 7, 1–28. [Google Scholar] [CrossRef]
Schles, R.A.; Chastain, M. Teachers of Students With Visual Impairments: Motivations for Entering the Field of Visual Impairment and Reflections on Pre-Service Training. J. Vis. Impair. Blind. 2023, 117, 62–73. [Google Scholar] [CrossRef]
Alhammadi, M.M. Availability of disability specialists for students with vision or hearing impairment in the United Arab Emirates: Current status and future needs. Disabil. Rehabil. Assist. Technol. 2024, 19, 1709–1717. [Google Scholar] [CrossRef] [PubMed]
Chundury, P.; Patnaik, B.; Reyazuddin, Y.; Tang, C.; Lazar, J.; Elmqvist, N. Towards Understanding Sensory Substitution for Accessible Visualization: An Interview Study. IEEE Trans. Vis. Comput. Graph. 2022, 28, 1084–1094. [Google Scholar] [CrossRef]
Chat GPT. Available online: https://openai.com/chatgpt/ (accessed on 24 May 2024).
Gemini. Available online: https://gemini.google.com/ (accessed on 24 May 2024).
Karanikolas, N.; Manga, E.; Samaridi, N.; Tousidou, E.; Vassilakopoulos, M. Large Language Models versus Natural Language Understanding and Generation. In Proceedings of the 27th Pan-Hellenic Conference on Progress in Computing and Informatics, PCI ’23, Lamia, Greece, 24–26 November 2023; pp. 278–290. [Google Scholar] [CrossRef]
Yang, J.; Jin, H.; Tang, R.; Han, X.; Feng, Q.; Jiang, H.; Zhong, S.; Yin, B.; Hu, X. Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond. ACM Trans. Knowl. Discov. Data 2024, 18, 1–32. [Google Scholar] [CrossRef]
Martiniello, N.; Eisenbarth, W.; Lehane, C.; Johnson, A.; Wittich, W. Exploring the use of smartphones and tablets among people with visual impairments: Are mainstream devices replacing the use of traditional visual aids? Assist. Technol. 2022, 34, 34–45. [Google Scholar] [CrossRef] [PubMed]
Senjam, S.S.; Manna, S.; Bascaran, C. Smartphones-Based Assistive Technology: Accessibility Features and Apps for People with Visual Impairment, and its Usage, Challenges, and Usability Testing. Clin. Optom. (Auckl.) 2021, 13, 311–322. [Google Scholar] [CrossRef] [PubMed]
Paratore, M.T.; Leporini, B. Haptic-Based Cognitive Mapping to Support Shopping Malls Exploration. In Smart Objects and Technologies for Social Goods; Pires, I.M., Zdravevski, E., Garcia, N.C., Eds.; Springer: Cham, Switzerland, 2023; pp. 54–62. [Google Scholar]
GPT-4 Turbo. Available online: https://platform.openai.com/docs/models/gpt-4-turbo-and-gpt-4 (accessed on 24 May 2024).
Whisper: Robust Speech Recognition via Large-Scale Weak Supervision. Available online: https://github.com/openai/whisper (accessed on 24 May 2024).
gTTS: Python Library and CLI Tool to Interface with Google Translate’s Text-to-Speech API. Available online: https://github.com/pndurette/gTTS (accessed on 24 May 2024).
Chamberlain, M.N. The ABCs of Structured Discovery Cane Travel for Children; Information Age Publishing: Charlotte, NC, USA, 2021. [Google Scholar]
Wu, X.L.; Li, J.; Zhou, F. An Experimental Study of Features Search under Visual Interference in Radar Situation-Interface. Chin. J. Mech. Eng. 2018, 31, 45. [Google Scholar] [CrossRef]
MediaPipe. Available online: https://ai.google.dev/edge/mediapipe/solutions/guide (accessed on 24 May 2024).
Chang, J.D.; Brantley, K.; Ramamurthy, R.; Misra, D.; Sun, W. Learning to generate better than your LLM. arXiv 2023. [Google Scholar] [CrossRef]
Bai, Z.; Wang, P.; Xiao, T.; He, T.; Han, Z.; Zhang, Z.; Shou, M.Z. Hallucination of Multimodal Large Language Models: A Survey. arXiv 2024. [Google Scholar] [CrossRef]
Tonmoy, S.M.T.I.; Zaman, S.M.M.; Jain, V.; Rani, A.; Rawte, V.; Chadha, A.; Das, A. A Comprehensive Survey of Hallucination Mitigation Techniques in Large Language Models. arXiv 2024. [Google Scholar] [CrossRef]

Figure 1. General framework of the existing works on DATMs.

Figure 2. Conceptual design of the LLM-based self-practice module.

Figure 3. Technical details of the prototype implementation.

Figure 4. Example DATM provided by the prototype.

Figure 5. Illustrations of the vibration patterns used by the prototype.

Figure 6. The practice DATMs used in the experiment.

Figure 7. Visual comparison between the ground truth DATMs and the maps drawn by the participants.

Figure 8. Subjective evaluation of participants from Group B about their interactions with the LLM assistant during their practice.

Figure 9. Extracted prompts made by the participants and the responses from the LLM assistant.

Figure 10. Investigated hallucination scenarios of the LLM assistant. The red underlined texts indicate the hallucinated (false) responses.

Table 1. The sound effects and audio descriptions of the POIs used in this work.

POI	Sound Effect	Audio Description
Room boundary	Standard keypress sound used in Android devices	None
Entrance	Sound of a doorbell	“Entrance”
Restroom	Sound of a drop of water	“Toilet”
User’s location	Sound of item collection in console game	“You are here”

Table 2. Prerequisite information provided to the LLM self-practice assistant.

Type	Content
Descriptions of the role	“You are a self-practice assistant of using digital audio-tactile maps for visually impaired user.You receive an image showing the map of a room where the user is at. The user asks you questions about the map to help him/her confirm their understanding. Typical questions are about the shape of the room or the locations of the user, the entrance, or the restroom.”
Details of the POIs	“The shape of the room is the outer black outlined shape. The black dot inside the room is the user’s location in the room.Assume that the user is facing north. The blue dot inside the room is the restroom. The green line overlapping with the room shape is the entrance.”
Desirable responses	“If what user says is correct, answer ’Correct’ and explain. Otherwise, answer ’Not correct’ and explain. As the targeted users are visually impaired people, when explaining, no need to indicate the black dot, blue dot, or green line, but directly referring to them as user’s location, restroom, or entrance. If user directly asks ’where’ something is or ’what’ the shape of the room is, tell them to try feeling it on the screen first and then ask the following type of question:Is the restroom on the right corner? or Is the room a rectangle shape? or Is the left edge straight?”

Table 3. Average similarity scores between ground truth DATMs and maps drawn by the participants.

DATM	Group A (Without LLM Assistant)	Group B (With LLM Assistant)
Rectangular shape	0.804 ± 0.138	0.802 ± 0.065
Irregular shape	0.558 ± 0.169	0.763 ± 0.118

Table 4. Number of prompts the LLM assistant received per DATM.

	Rectangular DATM	Irregularly Shaped DATM
Number of Prompts	4	13

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Tran, C.M.; Bach, N.G.; Tan, P.X.; Kamioka, E.; Kanamaru, M. Enabling Self-Practice of Digital Audio–Tactile Maps for Visually Impaired People by Large Language Models. Electronics 2024, 13, 2395. https://doi.org/10.3390/electronics13122395

AMA Style

Tran CM, Bach NG, Tan PX, Kamioka E, Kanamaru M. Enabling Self-Practice of Digital Audio–Tactile Maps for Visually Impaired People by Large Language Models. Electronics. 2024; 13(12):2395. https://doi.org/10.3390/electronics13122395

Chicago/Turabian Style

Tran, Chanh Minh, Nguyen Gia Bach, Phan Xuan Tan, Eiji Kamioka, and Manami Kanamaru. 2024. "Enabling Self-Practice of Digital Audio–Tactile Maps for Visually Impaired People by Large Language Models" Electronics 13, no. 12: 2395. https://doi.org/10.3390/electronics13122395

APA Style

Tran, C. M., Bach, N. G., Tan, P. X., Kamioka, E., & Kanamaru, M. (2024). Enabling Self-Practice of Digital Audio–Tactile Maps for Visually Impaired People by Large Language Models. Electronics, 13(12), 2395. https://doi.org/10.3390/electronics13122395

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Enabling Self-Practice of Digital Audio–Tactile Maps for Visually Impaired People by Large Language Models

Abstract

1. Introduction

2. Related Work

3. Conceptual Framework

4. Prototype Implementation and Experiment Procedure

4.1. Prototype Implementation

4.2. Feedback Design

4.3. Prerequisite Information for the LLM Assistant

4.4. Experiment Procedure

5. Results and Analysis

5.1. Similarity Scores between Ground Truth DATMs and Maps Drawn by the Participants

5.2. Subjective Evaluation of the LLM Assistant

6. Discussion

6.1. Effectiveness and Usability

6.2. Concerns for Hallucination Problem

7. Conclusions

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Conflicts of Interest

Abbreviations

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI