Next Article in Journal
Single-Step Radio Map Reconstruction with Multi-Feature Fusion via Mean Flow Matching
Previous Article in Journal
Zero-Shot 3D Asset Detection and Localisation Through Visual Grounding in Industrial Point Clouds
Previous Article in Special Issue
From Automation to Collaboration: Mapping AI–Human Interaction in Organizations Through Bibliometric Analysis
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

Designing Human-Centred Adaptive AI Navigation for Blind and Visually Impaired Individuals: A Cognitive Load-Aware Framework for Accessible Urban Mobility

by
Pilar Herrero-Martín
1,2,* and
Álvaro García-Ballestero
1
1
Escuela Técnica Superior de Ingenieros Informáticos, Universidad Politécnica de Madrid, 28660 Madrid, Spain
2
Center for Biomedical Technology (CTB-UPM), 28223 Madrid, Spain
*
Author to whom correspondence should be addressed.
AI 2026, 7(6), 206; https://doi.org/10.3390/ai7060206
Submission received: 9 April 2026 / Revised: 29 May 2026 / Accepted: 31 May 2026 / Published: 5 June 2026
(This article belongs to the Special Issue Human-Computer Interaction and Human-Centered AI)

Abstract

Artificial intelligence systems increasingly mediate high-stakes human activities, yet urban navigation remains highly challenging for blind and visually impaired individuals. Although digital navigation technologies have significantly improved route planning and accessibility, many existing systems still rely on generic interaction paradigms that insufficiently account for cognitive load, contextual uncertainty, and the adaptive needs of vulnerable users. This challenge highlights the importance of Human-Centred AI approaches capable of supporting not only functional accessibility, but also cognitively sustainable and trustworthy interaction. This paper introduces LAZAR, a human-centred adaptive AI framework for accessible urban mobility grounded in a user-centred design methodology and formalised through a structured Software Requirements Specification. Rather than focusing exclusively on route optimisation, LAZAR approaches assistive navigation as an adaptive human–AI interaction problem in which instructional granularity, interaction frequency, and feedback mechanisms are designed to support user autonomy and situational awareness whilst limiting unnecessary cognitive burden. The proposed framework integrates high-fidelity prototyping, accessibility-oriented interaction modelling, and a modular multi-agent architecture intended to support adaptive and personalised guidance. Central to the approach is a cognitive load-aware interaction layer designed to regulate the presentation and timing of navigational assistance according to user needs and contextual conditions. The proposed multi-agent architecture is presented as a modular design framework whose interaction principles and interface logic were partially operationalised in the evaluated prototype. The complete integration of all adaptive coordination mechanisms, together with large-scale real-world validation, remains part of ongoing and future development work. This work contributes a structured methodology for the design of adaptive assistive AI systems that integrates accessibility requirements, human-centred interaction principles, and cognitively informed guidance strategies. A formative usability evaluation involving eleven visually impaired participants provides preliminary empirical evidence regarding usability, accessibility, and perceived usefulness of the proposed interaction model. The framework establishes a foundation for future research on inclusive and adaptive AI-based navigation systems in urban environments.

1. Introduction

Severe visual impairment and blindness constitute significant barriers to independent urban mobility worldwide, strongly associated with reduced autonomy and impaired daily functioning. Briefly placing the study in this broad context, it is critical to note that historically, the landscape of assistive urban navigation has relied on orientation and mobility (O&M) training supplemented by physical tools. While white canes afford excellent immediate ground-level drop-off detection, they cannot anticipate mid-air obstacles or provide route awareness beyond the user’s immediate physical reach [1]. The current state of the research field has increasingly focused on digital interventions. Mainstream Global Positioning Systems (GPS) extend navigational reach; however, platforms designed for sighted users deliver static macroscopic instructions that disconnect from dynamic physical realities [2]. Specialised spatial-audio applications improve macro-awareness but suffer from signal degradation in urban canyons and lack the capacity to perceive unmapped, real-time physical obstacles [3]. Conversely, vision-centric applications utilising cloud-based Artificial Intelligence offer object recognition capabilities. However, a diverging hypothesis in the field debates the reliance on cloud computing versus edge computing for assistive mobility. While cloud architectures offer superior semantic accuracy, they inherently struggle with the critical latency necessary for real-time pedestrian evasion and pose severe reliability and privacy concerns in signal-deprived areas [4]. To address these limitations, we argue that assistive urban navigation should be approached not only as a technological problem, but also as an interactional and cognitive challenge requiring careful regulation of information delivery and user attention. Existing systems are primarily designed around functional routing and object detection, yet they rarely account for how visually impaired users perceive, process, and act upon information under real-world constraints. In dynamic urban environments, excessive or poorly timed feedback can lead to cognitive overload, masking critical environmental cues and increasing safety risks. Therefore, the design of assistive AI systems must move beyond functionality toward cognitively sustainable interaction models that adapt to user capabilities, contextual uncertainty, and situational demands [5,6]. From a Human-Centred AI perspective, this challenge extends beyond technical performance to encompass how systems communicate, adapt, and align with human cognitive processes under uncertainty. In high-stakes scenarios such as urban navigation for blind users, the failure of AI systems is not merely a matter of inefficiency, but of safety and trust. Consequently, designing AI systems that are cognitively aligned with users’ perceptual and decision-making capabilities becomes a central research problem [7].
The main aim of this work is to present Live Audio Zone-aware Assisted Routing (LAZAR), a human-centred design framework and prototype architecture for adaptive assistive navigation based on edge-computing principles and modular multi-agent coordination. The framework combines accessibility-driven interaction design, real-time perceptual feedback, and adaptive guidance concepts intended to support safer and more cognitively sustainable navigation for blind and visually impaired users in urban environments.
The proposed framework illustrates how human-centred design principles can be operationalised into an adaptive assistive navigation system designed to account for cognitive load, user variability, and environmental uncertainty. By integrating accessibility-driven interaction design with a modular multi-agent architecture, LAZAR provides a structured foundation for adaptive guidance mechanisms capable of regulating interaction according to user needs and situational context.
This paper makes the following contributions:
(1)
It proposes a human-centred design framework for adaptive navigation assistance tailored to blind and visually impaired users in urban environments.
(2)
It introduces a cognitive load-aware interaction model that regulates instructional granularity, feedback frequency, and system intervention to ensure cognitively sustainable guidance.
(3)
It presents a structured integration of user-centred design principles with a modular multi-agent architecture intended to support adaptive assistive interaction, illustrating how adaptive behaviour can be embedded into assistive AI systems.
(4)
It defines a formal Software Requirements Specification (SRS) that explicitly links accessibility constraints with system behaviour.
(5)
It presents a formative usability evaluation with eleven visually impaired participants, providing preliminary empirical evidence for the proposed interaction model and identifying directions for future controlled validation.

2. Related Work

Independent urban mobility for blind and visually impaired individuals has historically depended on orientation and mobility (O&M) training combined with physical aids. White canes remain the most widely adopted tool, providing immediate tactile detection of ground-level obstacles; their limitation; however, is well-documented: they cannot anticipate overhanging or mid-air hazards, impose continuous attentional demands, and offer no spatial awareness beyond the user’s immediate reach [1,7]. Guide dogs extend navigational fluency and support obstacle avoidance, yet their deployment requires extensive O&M training, is financially and logistically inaccessible to many users, and conveys no semantic environmental information—such as bus stop identifiers or points of interest [8]. Sensory substitution devices, which encode visual scenes into auditory or haptic signals, have extended detection range in laboratory conditions, but field evaluations consistently report steep learning curves and elevated cognitive load that limit real-world adoption [9,10].
GPS-based navigation applications have partially addressed these limitations by extending spatial awareness to the macro scale. Tools such as BlindSquare and Lazarillo use satellite positioning and mapping services to deliver turn-by-turn verbal guidance and announce nearby landmarks and have achieved meaningful adoption within blind communities [11]. Their structural shortcomings, however, are equally well established: dependence on third-party map databases produces outdated or incomplete pedestrian infrastructure entries, and the absence of real-time sensing means that transient obstacles—roadworks, displaced street furniture, and vehicles parked on pavements remain not perceived by the system in real time. Pundlik et al. [12] confirm in a field evaluation that these platforms support route planning and landmark discovery effectively, but that moment-to-moment path safety continues to depend on the white cane, creating a persistent functional gap between long-range guidance and immediate obstacle avoidance [12]. Audio-augmented reality applications have sought to bridge this gap by delivering spatial audio cues alongside GPS guidance, with evaluated deployments demonstrating improved environmental awareness but continued dependence on connectivity and static map data [13].
Computer vision services running on remote infrastructure have offered a different trade-off. Cloud-backed applications such as Microsoft Seeing AI and Google Lookout enable on-demand object recognition and scene description, and user studies report that they are valued for their descriptive richness [14,15]. The same studies, however, surface two critical concerns. First, uploading image frames to external servers introduces end-to-end latency incompatible with real-time navigation. Second, image transmission raises documented privacy risks: blind users frequently capture bystanders or personally sensitive content unintentionally, and recent work [16] demonstrates that cloud-based recognition services create complex trade-offs between functionality and the disclosure of intimate or location-identifying imagery. These constraints motivate a shift toward on-device processing that retains perceptual capability without exposing raw visual data to remote infrastructure. More recent mixed-reality approaches have extended cloud-based recognition toward spatially contextualised guidance overlays, though they inherit equivalent latency and connectivity constraints [14].
The latency argument for on-device processing is quantitatively supported. Reference [17] show that even under moderate network load, queuing and transmission delays frequently dominate end-to-end inference time, producing performance inversion where cloud processing is slower than local execution—a directly safety-relevant outcome for pedestrian obstacle avoidance, where system response must align with human walking dynamics. Edge AI pipelines combining lightweight depth estimation and object detection have been shown to sustain real-time obstacle tracking on mobile hardware [18], helping to reduce the micro-navigation limitations that GPS-only solutions leave unresolved and eliminating the connectivity dependency that makes cloud architectures unsuitable as a primary safety channel.
The feasibility of running perception models on mobile hardware has improved substantially with advances in model compression. Post-training quantisation, weight pruning and knowledge distillation now routinely reduce the memory footprint and computational cost of vision models with acceptable accuracy loss [19]. Lightweight detector families such as SSD-MobileNet and EfficientDet have been successfully deployed on embedded accelerators for real-time detection in assistive contexts. Further, ref. [20] further demonstrate that modality-aware quantisation with gradient-free test-time adaptation sustains robust performance under distribution shifts on resource-constrained edge hardware—a relevant property for assistive navigation, where lighting conditions, obstacle categories and urban layouts vary continuously. These results collectively establish that on-device execution of obstacle detection and depth estimation is increasingly feasible on contemporary mobile hardware.
A separate thread of evidence concerns the cognitive cost of assistive audio feedback. A naïve reading of the literature might suggest that richer sensory output is always preferable, but this is empirically incorrect. Measured electroencephalographic (EEG) markers in participants using the Sound of Vision multimodal navigation aid and found significantly elevated cognitive load and working memory demand relative to baseline cane use, during both object identification and obstacle avoidance [10]. Extend this finding to multi-sensor fusion systems, noting that continuous audio output through headphones degrades the user’s ability to exploit ambient auditory cues—traffic, echolocation, human voices—which blind users employ as primary environmental signals [21]. In [22], specific design guidelines for audio navigation cues in pedestrian environments, demonstrating that cue calibration, in terms of both frequency and spatial encoding, directly affects navigation performance and cognitive demand. These findings suggest that audio feedback for visually impaired navigation should remain selective, spatially encoded, and cognitively calibrated rather than excessively verbose.
Multi-agent systems (MASs) offer a coordination paradigm suited to this problem’s structure. Decoupling heterogeneous tasks, such as macro-routing, real-time perception, user modelling, into autonomous, cooperative agents avoids the brittleness of monolithic designs, where a failure in one module can halt the entire system. Oliveira et al. [23] demonstrate JADE-based MAS deployments in ambient assisted living that orchestrate sensor networks while maintaining modularity; Reference [24] document analogous healthcare applications where agent decentralisation enables flexible data integration across services. Pintea et al. [25] confirm that JADE agents are viable on Android, managing communication, storage and user interaction within mobile memory constraints. A common architectural characteristic across these deployments is the decentralisation of functionality across specialised agents rather than reliance on a single monolithic control structure: coordination emerges from asynchronous message exchange under the FIPA protocol, so that a failure or latency spike in one module, such as a cloud API timeout, does not interrupt locally executing safety processes.
Taken together, the literature reviewed suggests several persistent limitations in current assistive navigation systems related to physical aid range limits, GPS macro-navigation without real-time sensing, cloud latency and privacy costs, and cognitively uncalibrated audio feedback collectively define the design space that LAZAR addresses. Edge-executed perception closes the obstacle detection gap without connectivity dependence. A JADE-based MAS on Android decouples safety-critical perception threads from non-critical routing operations, ensuring that a cloud API timeout cannot compromise immediate physical safety. A dedicated PersonalisationAgent modulates audio verbosity against the user’s cognitive profile, directly responding to the overload risk quantified by Lupu et al. [10] and Hadjidimitriou et al. [21]. The following section details the system requirements and user experience design derived from this analysis.
Taken together, the literature reveals a persistent fragmentation in assistive navigation systems for blind and visually impaired users. Existing approaches tend to address individual aspects of the problem—such as macro-level route guidance, real-time perception, or multimodal feedback—but rarely integrate them within a unified, adaptive interaction framework. Moreover, while recent advances in Edge AI and multi-agent systems enable real-time environmental awareness, limited attention has been given to how such capabilities should be mediated through cognitively sustainable interaction models. In particular, the regulation of instructional density, timing, and modality in response to user-specific cognitive and contextual factors remains underexplored. This gap is not merely technical but conceptual: current systems fail to treat interaction, cognition, and adaptation as a unified design problem. Addressing this limitation requires a shift toward Human-Centred AI frameworks in which perception, decision-making, and interaction are co-designed to support cognitively sustainable and context-aware assistance. The framework proposed in this work addresses this need by integrating accessibility-driven design, cognitive load-aware interaction, and a conceptual multi-agent architecture into a coherent assistive navigation paradigm.

3. System Requirement and User Experience Design

The design of LAZAR is not driven by interface aesthetics but by the need to operationalize cognitive and accessibility principles into interaction mechanisms. In safety-critical contexts such as urban navigation for blind and visually impaired users, interface design directly influences user perception, decision-making, and situational awareness. Each design decision presented in this section is therefore derived from constraints related to cognitive load, perceptual limitations, and interaction uncertainty, rather than conventional usability heuristics. This approach treats interface design as a key component of Human-Centred AI in safety-critical assistive environments, where interaction design directly influences cognitive workload, situational awareness, and navigation usability.

3.1. Target Demographics and Software Requirements Specification (SRS)

The full Software Requirements Specification is presented in this article. This section summarises the requirements most directly related to accessibility, adaptation, and safety, as these constitute the primary normative constraints governing the design decisions described throughout this paper.
A monolithic approach to inclusive design is inadequate; therefore, LAZAR’s interface dynamically adapts to distinct segments of visual acuity and cognitive capability. For users with visual impairment (total blindness), the system acts as a spatial extension, providing high-fidelity environmental mapping through haptics and sonification without requiring active screen engagement. For users with severe to moderate low vision, the application highlights spatial boundaries and state changes using high-contrast visual cues. Furthermore, the system addresses cognitive load variance—such as the difference between late-onset and congenital blindness—by scaling instructional verbosity appropriately.
The requirement specification presented here is not intended to be a software engineering deliverable but a methodological instrument: it makes explicit the normative constraints—demographic, cognitive, and regulatory—that directly govern and justify the design decisions described throughout this section. In a human-centred framework, requirements serve as the traceable link between user needs and system behaviour; their inclusion therefore constitutes a research contribution rather than an engineering artefact.
Consequently, the Software Requirements Specification (SRS) mandates well-defined functional and non-functional parameters to serve this demographic safely. Functionally, the system is required to perform real-time obstacle detection and semantic classification via the device’s rear camera, execute precise distance and Z-axis height estimations, and proactively deliver intelligent evasion suggestions. The architecture is designed to support adaptive route planning based on the user’s configured experience profile. Non-functional constraints dictate a critical processing latency of under 300 ms to support safe pedestrian navigation under real-time conditions, offline availability for core perception models, and designed in accordance with an “eyes-free” interaction paradigm, ensuring that visual attention is minimised as a requirement for safe system interaction.
This segmentation is not merely descriptive but directly informs the system’s adaptive behaviour, enabling the modulation of guidance strategies according to both perceptual capability and cognitive load tolerance. By explicitly linking user characteristics to system behaviour, the design moves beyond static accessibility configurations toward adaptive, user-aware interaction models.

3.2. Low-Fidelity Prototyping

Low-fidelity prototyping was used as an early validation mechanism to explore interaction concepts prior to implementation. Rather than focusing on visual fidelity, this approach enabled rapid iteration and direct user involvement, particularly important in accessibility-driven design contexts. This method also supports early identification of cognitive and interaction constraints, ensuring that design decisions are aligned with user capabilities from the outset.
The paper prototyping process follows an iterative, participatory logic. Screens are sketched individually on separate physical cards representing discrete application states, with interface elements—buttons, labels, navigation bars, content areas—drawn at approximate scale. Transitions between states are simulated by physically swapping cards, allowing a facilitator to enact the system’s responses to user gestures in real time. This technique, established in the foundational work of Snyder [26] and later operationalized across human–computer interaction research, enables rapid design validation at negligible material cost and without any dependency on development tooling.

3.2.1. Overview of the Paper Prototyping Method

Paper prototyping constitutes an established and widely adopted method within user-centred design practice, enabling the rapid externalisation of interaction concepts without dependency on digital tooling or engineering infrastructure. The method’s central premise is that design failures are best discovered and corrected at the lowest cost, before any implementation investment has been made. A facilitator enacts the system’s responses to user gestures by physically swapping individual screen cards, simulating the transition logic of a real application. This approach was selected for LAZAR’s early design phase based on three criteria: its compatibility with iterative, participatory design; its suitability for accessibility-driven contexts where early user involvement is essential; and its negligible material cost, which permits rapid design revision cycles. The specific adaptations made to render this method accessible to users with visual impairment are described in Section 3.2.3.

3.2.2. Role of Low-Fidelity Prototyping in the Design Process

The functional value of low-fidelity prototyping resides in its position at the boundary between abstract specification and tangible interaction. A Software Requirements Specification (SRS), however detailed, cannot reveal the emergent usability failures that arise when a user attempts to navigate a concrete interface under real cognitive conditions. Paper prototypes externalise these failures before any engineering investment has been made, reducing the cost of design revision.
In the specific context of LAZAR, low-fidelity prototyping served three distinct validation purposes. First, it allowed early verification of the macro-gestural interaction paradigm: whether the spatial logic of full-screen tap targets, binary split-screen cards, and peripheral state borders was legible to evaluators without any colour or animation. Second, it enabled preliminary assessment of information hierarchy—whether the sequential ordering of elements on each screen produced an intelligible narrative when read aloud, approximating the TalkBack traversal order that the high-fidelity prototype would later need to satisfy. Third, the physical nature of the artefact made it uniquely suited to involving users with visual impairment directly in the evaluation process, as detailed in the following subsection.

3.2.3. Tactile Low-Fidelity Prototyping for Users with Visual Impairment

A fundamental methodological challenge confronts any researcher seeking to conduct participatory low-fidelity prototyping with blind or severely visually impaired users: the standard paper prototype is a purely visual medium. Showing a hand-drawn screen to a participant who cannot perceive it does not constitute inclusive evaluation—it merely generates proxy feedback from sighted observers on behalf of the intended user population, introducing a layer of interpretive distortion that limits the participatory potential of the method.
To resolve this challenge, the LAZAR low-fidelity prototypes were constructed using a specialised three-dimensional relief paint: the KREUL Pluster & Liner Pen, a puffy-paint medium that, upon air-drying, expands to produce a raised, tactilely perceptible ridge on the paper surface. Applied along the outlines of interactive elements, this material converts a flat two-dimensional sketch into a multi-modal artefact whose structural boundaries can be perceived through direct fingertip contact, without requiring visual acuity. The technique preserves the essential properties of paper prototyping—rapid fabrication, physical swappability, low material cost—while extending the method’s accessibility to the primary target demographic of the system under design.
This approach aligns with established principles in tactile graphic production for visually impaired users, where raised-line drawings and thermoformed textures are employed to communicate spatial and structural information non-visually. The use of puff paint as a rapid-fabrication substitute for specialised embossing equipment offers a practically accessible alternative for early-stage design research, requiring no specialist hardware beyond the pen itself. The dried relief ridges produced by the KREUL medium are sufficiently pronounced to allow confident haptic boundary detection, while remaining thin enough not to interfere with the stacking and swapping of screen cards during simulated interaction sessions.

3.2.4. Screen-by-Screen Analysis of the Low-Fidelity Prototype

The physical prototype comprises four paper cards, as seen in Figure 1, each representing a discrete application state. The selection of these four screens reflects the minimum viable interaction set required to validate LAZAR’s core navigational flow: system activation, active obstacle detection, destination selection, and cognitive profile configuration. The following analysis describes each card and explains the rationale for the application of tactile relief marking.
  • Screen 1—LazarFragment, Idle State (“TAP TO START”)
The first screen represents the system’s resting state and serves as the primary entry point for interaction. It presents a central button to initiate navigation within a deliberately minimal interface that avoids additional visual or interactive elements. This configuration minimises interaction ambiguity and reduces cognitive effort during task initiation, which is relevant in non-visual interaction contexts. By restricting the interface to a single actionable element, the system ensures immediate comprehensibility and lowers the risk of erroneous input, supporting safe and confident user engagement from the outset.
Tactile marking is applied exclusively to the central button, reinforcing its role as the sole interactive target and enabling reliable detection through touch. This design decision reflects a broader principle of accessibility-driven interaction: only elements that require user action are emphasised, preventing unnecessary sensory distraction. The bottom navigation bar is included as a spatial reference, allowing users to infer layout structure through touch exploration while maintaining a consistent interaction framework across screens.
  • Screen 2—LazarFragment, Active Perception State (“VISION ACTIVE”)
The second screen represents the system in its active operational state, where real-time guidance is delivered. A central directional indicator and distance readout are positioned at the vertical midpoint, providing immediate spatial feedback to support navigation decisions. This centralization of information reduces cognitive effort by presenting guidance in a predictable and easily interpretable location, minimising the need for exploratory interaction.
In the lower section of the interface, two distinct interactive elements—route termination and cancellation—are spatially separated to prevent accidental activation. This separation supports safe interaction under non-visual conditions by reducing the likelihood of input errors, particularly in time-sensitive situations. The design reflects principles related to motor control and interaction safety, ensuring that users can reliably differentiate between actions through touch alone without introducing additional cognitive burden.
  • Screen 3—Destination Picker (“Where do you want to go?”)
The destination selection interface is structured as a binary split-screen layout, presenting two primary navigation options occupying equal portions of the screen. This configuration simplifies decision-making by limiting the number of available choices and enabling rapid selection based on coarse spatial interaction.
The use of large, clearly bounded interaction areas allows users to reliably identify and activate options through touch without requiring precise targeting. By emphasising only actionable regions and omitting non-essential tactile elements, the design minimises sensory clutter and was perceived as reducing interaction effort. This approach ensures that users can efficiently resolve navigation intent while maintaining focus on their surrounding environment.
  • Screen 4—Cognitive Profile Selection (“Your Experience Profile”)
The cognitive profile selection screen presents three interaction levels—Beginner, Intermediate, and Expert—organised as vertically aligned options across the interface. This structure enables efficient exploration through a single directional gesture, supporting intuitive navigation under non-visual conditions.
The uniform distribution of selectable elements facilitates spatial predictability, allowing users to build a mental model of the interface and reducing cognitive effort during interaction. The inclusion of a distinct confirmation action at the base of the screen establishes a clear separation between selection and progression, minimising the risk of premature input.
By allowing users to define their experience level, this component plays a central role in calibrating system behaviour, enabling the subsequent adaptation of guidance intensity, instructional detail, and feedback frequency. This directly supports cognitive load regulation, ensuring that interaction remains aligned with user capabilities and situational demands.

3.3. High-Fidelity Prototyping and Legislative Compliance

The high-fidelity prototype reinforces the transition toward multimodal, non-visual interaction, ensuring that the application serves as a sensory prosthesis rather than a standard mobile interface.

3.3.1. Eyes-Free Interaction and Multimodal Feedback

To safely direct users without interrupting their auditory environmental awareness (e.g., listening to approaching traffic), LAZAR leverages multimodal feedback, designed in accordance with WCAG 1.4.1 (Use of Colour) [27], which mandates that information must never be conveyed by colour alone. Firstly, the system employs haptic signatures where safety alerts, such as immediate frontal obstacles, trigger high-amplitude, distinct asynchronous vibration patterns. This enables confident urban navigation even in environments with extreme auditory noise. Secondly, instead of verbosely narrating every obstacle, LAZAR utilises a proximity-based acoustic system known as spatial sonification or the “Geiger Effect”. As obstacle proximity decreases, the system emits progressively accelerated acoustic pulses (“tick”) intended to provide spatial awareness cues without requiring continuous verbal narration. Finally, macro-gestural navigation is implemented to substantially reduce the need for precision tapping. Interaction relies on gross-motor gestures applied anywhere on the screen, such as a full-screen Swipe Up for Safe Mode or a Double Tap to mute Text-to-Speech. This interaction model is informed by Fitts’s Law principles to accommodate impaired motor–visual coordination.
From a cognitive perspective, this multimodal strategy reduces reliance on a single sensory channel, mitigating the risk of cognitive masking—where excessive auditory information suppresses environmental cues. By distributing information across haptic, auditory, and spatial channels, the system preserves the user’s ability to perceive ambient signals, which are essential for safe navigation in real-world environments.

3.3.2. Visual Accessibility and Contrast Metrics

The graphical interface was designed to exceed WCAG AAA contrast recommendations for low-vision accessibility. The interface deliberately rejects the use of pure, saturated colours—which induce visual fatigue and impair colourblind users—in favour of a specialised, dark background (#373635) that minimises screen glare. Against this background, primary elements (#E7F434) achieve a contrast ratio of 9.97:1, exceeding the WCAG AAA requirement (7:1), while secondary elements (#C1DE79) are rendered at an 8.00:1 contrast ratio. Furthermore, a “Safe” state is conveyed redundantly via a specific secondary colour border, bold text, and corresponding haptic feedback. In terms of typography and spatial layout, core reading font sizes start at an absolute minimum of 22 sp and scale dynamically, aligning with WCAG advisories for low-vision readability. System typefaces utilise heavy, bold weights to support high legibility. Consequently, actionable zones drastically exceed standard guidelines; LAZAR implements a single full-screen actionable area for primary actions, or dual 50% split-screens for binary inputs, ensuring touch targets are substantially larger than the minimum 48 × 48 dp mandated by European legislation.

3.3.3. Regulatory and Legislative Compliance

LAZAR’s architectural and UI/UX was designed in accordance with current digital accessibility legislation governing public and assistive technologies. Primarily, it aligns with the Web Content Accessibility Guidelines (WCAG) 2.2 at Level AAA. By aligning with these requirements (specifically regarding contrast ratios, multimodal operability, and cognitive predictability), LAZAR establishes a normative design baseline intended to support usability for users with severe visual impairment through multimodal and non-visual interaction mechanisms. It should be noted that standard compliance constitutes a design-alignment criterion rather than empirical evidence of operational effectiveness; the latter requires independent evaluation under real-world conditions. Furthermore, the system complies with EN 301 549 [28], the designated standard for digital accessibility in Europe. This aligns the system design with hardware–software interface rules, specifically surrounding non-visual operability, biometric fallback, and minimal touch target dimensioning.

3.4. App Screens and Interface Flow Analysis

The onboarding sequence is designed not only as a configuration process but as a calibration phase in which the system establishes an initial user interaction profile intended to support adaptive guidance configuration. This phase enables the system to parameterize guidance strategies, balancing safety, efficiency, and cognitive load from the outset of interaction.
The following section provides a systematic, screen-by-screen analysis of LAZAR’s interaction architecture as materialised in its high-fidelity prototype. Each screen is examined not as an isolated visual artefact, but as a functional node within a broader non-visual interaction graph, in which every layout decision is traceable to a normative requirement or a cognitively grounded design principle established in the preceding sections.

3.4.1. Onboarding and Cognitive Profiling Sequence (Steps 1–5)

Welcome Screen (Step 1/5)
Figure 2 shows the onboarding sequence, which establishes the initial interaction relationship between the user and the system. The splash screen (Step 1 of 5) presents a deliberately sparse compositional grammar: the LAZAR logotype—rendered in the primary chromatic token (#E7F434) against the dark background (#373635)—is positioned in the vertical centre of the display, flanked above by a minimal step counter (“Step 1/5”) in subdued secondary typography. Below it, a two-line subtitle (“Your accessible mobility assistant. We’ll set up your profile in a few steps.”) functions as an immediate verbal disambiguation of the application’s purpose, reducing the cognitive overhead associated with first-contact uncertainty.
The sole actionable element on this screen is a full-width “Get Started” button anchored to the lower quarter of the display. This layout embodies the macro-gestural interaction paradigm established in Section 3.3.1: by rendering the call-to-action as a full-width pill spanning approximately 80% of the screen’s horizontal extent and occupying a touch height well exceeding the 48 dp minimum mandated by EN 301 549 clause 5.9 [28], the design eliminates any spatial precision requirement for activation. Fitts’s Law formalises this decision rigorously—target acquisition time is an inverse logarithmic function of target width and distance; a button of this scale effectively substantially reduces target acquisition effort, independent of the user’s degree of motor–visual impairment. The progressive step counter furthermore satisfies WCAG 2.4.8 (Location), providing orientation within the process without demanding active visual tracking.
Cognitive Profile Selection (Step 2/5—“Your Experience Profile”)
This screen operationalises one of the core interaction principles underlying LAZAR: that assistive verbosity and routing conservatism can be dynamically parameterised to the user’s navigational competence, rather than imposed as a monolithic configuration. The screen presents three mutually exclusive profiles—Beginner, Intermediate, and Expert—each rendered as a full-width, independently tappable card occupying approximately one-third of the screen’s vertical space.
Each card integrates three semantically redundant channels of information: a colour-coded circular indicator (green, amber, red), a bolded textual label rendered at no less than 22 sp in conformance with Section 3.3.2, and a brief functional descriptor (“Safe and straight routes. Maximum assistance.”; “Balance between speed and assistance.”; “Faster routes. Minimal assistance.”). This tripartite structure is directly mandated by WCAG 1.4.1 (Use of Colour) [27], which prohibits conveying information through chrominance alone; the colour of the indicator is therefore never the sole differentiating signal. For TalkBack users, each card’s content descriptor is exposed as a unified contentDescription attribute, allowing the screen reader to announce the complete semantic unit—label, colour meaning, and functional consequence—as a single traversal node.
The radio-button state indicator positioned on the right margin of each card provides an additional, non-colour confirmation of the selected state, as illustrated in Figure 3. Crucially, however, the entire card surface—not merely the radio button—constitutes the interactive target. This design choice eliminates the precision-tapping failure mode that would arise if selection were confined to the 20 dp radio indicator alone, which would violate both Fitts’s Law and the minimum touch target dimensions specified in EN 301 549 [28]. The “Next” call-to-action mirrors the full-width convention established in Step 1, maintaining compositional consistency and reinforcing the gesture schema the user has already internalised.
This profiling mechanism constitutes a key component of the system’s adaptive logic, allowing the regulation of instructional verbosity, routing complexity, and intervention thresholds. By embedding cognitive load considerations into the interaction model, the system transforms user input into dynamic behavioural constraints that guide subsequent decision-making processes.
Physical Limitations (Step 3/5)
Step 3 extends the profiling logic from cognitive capacity to physical mobility. The screen presents four multi-select capability cards—Wheelchair, Walker/Crutches, Joints, and Quick Fatigue—each structured identically to the experience-profile cards of Step 2. This deliberate compositional homogeneity exploits the principle of consistent navigation (WCAG 3.2.3), ensuring that the interaction schema does not require relearning between steps.
Each card presents a pictographic emoji icon on the left, a bold category label and a one-sentence descriptor at centre, and an unchecked box on the right. The square checkbox—distinctly different from the circular radio button employed in Step 2—communicates multi-select semantics without requiring the user to comprehend conventional UI metaphors, as the surrounding card boundary renders the entire region activatable. The information captured in this step feeds directly into LAZAR’s route planner, filtering out paths with architectural barriers (e.g., unramped kerbs, steep gradients) in compliance with the mobility-adaptation requirements implicit in EN 301 549’s clause 4.2.9 (Usage without physical effort) [28].
The dual affordance at the screen’s base—a primary “Next” full-width button and a secondary “SKIP QUESTION” text action, as demonstrated by Figure 4—deserves specific attention. This hierarchy resolves the tension between data completeness (desirable for routing accuracy) and interaction burden (hazardous for users with severe mobility limitations who may find extended interaction fatiguing). The skip affordance is rendered in a visually subdued style, signalling its secondary priority to sighted users while remaining designed to support accessibility to TalkBack, which will traverse it as a distinct focusable node after the “Next” button.

3.4.2. Main “Radar”/Navigation Dashboard (LazarFragment)

The LazarFragment constitutes the operational core of LAZAR’s perceptual cycle. Two distinct interface states are observable in the prototype: the Idle State (“TAP TO START”) and the Active Perception State (“VISION ACTIVE”), each of which presents a compositional logic calibrated to the imperatives of eyes-free, safety interaction.
Idle State
In its idle configuration, the fragment presents a near-total chromatic void: the dark background (#373635) occupies the entire screen canvas, perforated only by a centred pill-shaped status label (“TAP TO START”) in the upper quarter and a compact gesture legend beneath it (“1 tap: describe • 2 tap: mute • hold: pause”). A single oversized camera icon button is positioned in the lower central region, rendered in the primary accent (#E7F434) and sized to approximately 80 dp—nearly double the minimum mandated threshold.
The deliberate absence of additional UI elements on this screen is architecturally intentional, not a design omission. Cluttered navigational interfaces impose irreducible cognitive overhead on users whose attentional resources are already allocated to environmental perception tasks—traffic monitoring, pedestrian avoidance, auditory cue processing. By maintaining a pristine gestural canvas, the design ensures that the interface minimises interaction ambiguity by reducing the number of competing actionable elements. The gesture legend, rendered in a secondary chromatic tone at approximately 16 sp, provides just-in-time VUI scaffolding for sighted low-vision users without cluttering the primary interaction surface. This legend is read once by TalkBack upon screen focus and subsequently suppressed in subsequent traversals, reducing redundant verbosity in accordance with WCAG 3.3.2 (Labels or Instructions).
The interface, which can be checked in Figure 5, incorporates a green-tinted perimeter border intended to communicate system readiness through peripheral visual signalling. This mechanism targets residual peripheral vision commonly preserved in some low-vision users and is complemented by corresponding auditory feedback upon activation, satisfying the redundant non-colour communication requirement of WCAG 1.4.1.
Active Perception State (“VISION ACTIVE”)
Upon activation, the interface transitions into the operational perception state associated with the PerceptionAgent within the proposed architecture. The status pill updates to “VISION ACTIVE” and is rendered with a luminous border, providing both a visual state change and a TalkBack-readable content update. The navigational centre of the canvas is occupied by a single diamond-shaped directional indicator—a wayfinding glyph rendered in the primary accent colour—whose internal arrow rotates to indicate the recommended evasion vector in real time. Below this glyph, the system renders the distance estimate to the nearest detected obstacle (e.g., “1.0 m”) in a typography size that approaches 80–90 sp, ensuring legibility at arm’s length under degraded visual conditions.
Two co-existing safety state modalities are observable across the screenshots: a green-bordered frame (safe-to-proceed, no immediate hazard, as seen in Figure 6) and a red-bordered frame (hazard detected, immediate proximity alert, as illustrated in Figure 7). This peripheral border mechanism is a deliberate departure from modal dialogue-based alerting. Modal alerts would interrupt the gestural canvas, force a confirmatory interaction, and introduce a cognitive mode-switching cost precisely when the user’s cognitive resources are most constrained—during active pedestrian navigation. The border encoding instead delivers state information continuously and peripherally, consistent with the multi-channel feedback architecture described in Section 3.3.1. The red border state is accompanied by the corresponding haptic signature and accelerating acoustic signature (the Geiger Effect), providing triple-channel redundancy that satisfies WCAG 1.4.1 [27] without requiring the user to direct visual attention to any specific screen region.
A single red “✕” button (approximately 72 dp square) appears in the lower-right corner exclusively in the active state, permitting route cancellation. Its placement in the far corner—rather than a central location—is a deliberate Fitts’s Law inversion: the button’s peripheral location increases acquisition distance, which is intentional. A centrally placed cancel button would risk accidental activation via the undifferentiated tapping gestures used to trigger other functions. Its elevated size compensates for the increased distance, preserving reachability while minimising false-positive cancellations during safety navigation.
This design reflects a broader Human-Centred AI principle: interaction should remain peripheral and non-intrusive during safety tasks. Rather than demanding user attention, the system operates as a background cognitive support layer, intervening only when necessary. This approach aligns with the concept of mixed-initiative interaction, where control dynamically shifts between user and system depending on contextual risk.

3.4.3. Search and Destination Picker (Routes Fragment)

The destination selection interface instantiates LAZAR’s binary card architecture in its most reduced form. The screen presents two and only two mutually exclusive full-screen cards: Free Navigation (voice-commanded destination input) and Go Home (activation of the pre-configured home address).
According to Figure 8, each card occupies approximately 50% of the vertical canvas below the header, consistent with the dual 50% split-screen layout specification described in Section 3.3.2. This partition is not arbitrary: it maps directly to the binary decision the user is required to make—whether to verbalise a novel destination or to execute a pre-memorised routine navigation—and eliminates any intermediate selection step. Each card integrates an oversized pictographic icon (microphone for voice, pedestrian-with-sign for home navigation), a bold primary label, and a single explanatory subtitle. The touch surface for each option spans the full card boundary, producing an interaction target whose dimensions are measured in hundreds of density-independent pixels, rendering Fitts’s Law target-acquisition costs negligible.
The Go Home affordance is of particular ergonomic significance. Research in assistive navigation consistently identifies home-return tasks as the highest-frequency routine navigation event for users with visual impairment. By elevating this function to a first-class top-level card rather than burying it within a search history list, the design minimises the interaction depth (and thereby the street-exposure time) required to initiate the system’s most commonly invoked routing workflow. This design decision is coherent with EN 301 549 clause 4.2.2 (Usage without vision) [28], which requires that all core functions be reachable through non-visual mechanisms with minimal interaction complexity.
The voice input pathway (“Free Navigation”) leverages the VUI layer defined in Section 3.3.1. Upon activation, the system invokes a speech recogniser whose output is processed by LAZAR’s address resolution module, allowing the user to name a destination in natural language without any keyboard interaction, an accommodation for users whose tactile discrimination precludes reliable soft-keyboard use.

3.4.4. Route Preview and Transport Configuration

The Route Preview screen introduces LAZAR’s highest compositional density, warranting careful analysis of its information hierarchy. As outlined in Figure 9, the screen is architecturally divided into two vertical zones: an upper interactive map panel (occupying approximately 45% of the screen height) and a lower route specification panel.
The map panel renders the calculated route polyline—highlighted in the primary accent colour (#E7F434) over a standard OpenStreetMap base—with origin and destination pins marked by oversized finger-cursor glyphs that convey interactivity semantics to sighted users. The map itself is informational rather than primary-interactive; its principal function within this interface is to provide spatial orientation for users with residual vision, and it is not a required interaction surface for task completion.
The lower panel organises route information along a strict top-to-bottom semantic hierarchy: (1) a “← Back” navigation affordance, (2) the “Route type” heading, (3) the confirmed destination label, (4) a “Today I feel:” contextual profile override dropdown, and (5) the transport mode cards. This sequential ordering is not decorative; it reflects the exact traversal sequence that TalkBack will follow in its default linear focus order. The system thus constructs a screen-reader-native narrative arc: “You are going [destination]. Your assistance level today is [profile]. Your available options are [modes].” This linear semantic architecture satisfies WCAG 1.3.2 (Meaningful Sequence), ensuring that the programmatic reading order corresponds to the intended informational hierarchy.
The “Today I feel” dropdown override is a notable contextual adaptation mechanism. It permits real-time reconfiguration of the system’s verbosity and routing conservatism without requiring access to the Settings panel. A user whose cognitive or physical state on a given day differs from their baseline profile can recalibrate the system in a single interaction, immediately upstream of navigation commencement. This design is particularly sensitive to the episodic nature of disability—conditions such as fatigue, anxiety, or acute visual degradation can vary significantly within a single day.
Each transport mode (Walking, Bus, and potentially others accessible via scroll) is presented as a full-width accessibility card. Within each card, the information is structured in a two-column micro-layout: an icon and mode label on the left, and the total duration and estimated arrival time on the right, rendered in enlarged bold typography. Sub-step breakdowns (e.g., “5 min → Bus L.652 → 40 min → walk 3 min”) are provided in a secondary typographic weight below the primary duration figure. This nested hierarchy ensures that TalkBack users receive the high-salience information (total journey time and arrival) as the card’s primary announced content, with detail accessible on demand through a subsequent focus interaction—a pattern that directly reduces the cognitive parsing load imposed by complex multi-modal route data.
The “START NAVIGATION” call-to-action is positioned as a full-width anchored footer button, maintaining the compositional convention established throughout the application. Its prominence and spatial separation from the transport cards eliminate any risk of the user committing to navigation before having evaluated all available mode options.

3.4.5. Emergency/Veto and Priority State Screens

The red-bordered danger state observable in the LazarFragment active screens represents the safety-prioritisation behaviour associated with the proposed SecurityAgent. When the PerceptionAgent classifies an immediate proximity hazard, or when the device’s IMU detects a tilt angle consistent with the phone pointing groundward (proposed here as a design-level behavioural heuristic for inferring potential spatial disorientation; empirical validation of this proxy measure constitutes a priority for future instrumented field studies). The SecurityAgent is designed to deliver safety-critical feedback through coordinated multimodal alerts.
At the visual layer, the entire screen perimeter transitions from the green safe-state border to a saturated red border rendered at full-screen extent. This peripheral chromatic override is designed to be detectable through scotopic (low-light) and peripheral vision, targeting the residual spatial vision retained by many low-vision users even when central acuity is absent. The choice of red, while a chromatic signal alone, is never the sole communicative channel—it is always accompanied simultaneously by a distinct haptic burst pattern and an escalated acoustic cadence, ensuring compliance with WCAG 1.4.1.
The interface does not introduce modal overlays, confirmation dialogues, or new interactive elements during the danger state. This restraint is deliberate and normatively grounded: EN 301 549 clause 4.2.3 (Usage without hearing) and clause 4.2.2 (Usage without vision) collectively mandate that safety-critical state transitions must not introduce new interaction requirements as preconditions for hazard resolution [28]. A modal “Danger Detected—Confirm” overlay would impose a confirmatory interaction burden at precisely the moment the user’s motor and cognitive resources are most critically engaged in physical hazard avoidance. By restricting the interface to a passive state change—the border colour, haptic pulse, and acoustic signal—the SecurityAgent maximises the signal-to-noise ratio of the alert without introducing new interaction complexity.
The directional diamond glyph at screen centre continues to display the recommended evasion vector during the danger state, ensuring that the system’s navigational guidance is not interrupted by the alert condition. This simultaneous hazard signalling and continued navigational guidance operationalises the “sensory prosthesis” paradigm: the system does not merely alert the user to danger but actively prescribes a corrective spatial behaviour.

3.4.6. Auxiliary Screens: Accessibility Social Map and Live Support

Accessibility Social Map (Map Fragment)
The map fragment surfaces a community-contributed layer of accessibility intelligence, displaying a full-screen OpenStreetMap canvas annotated with user-reported obstacles and accessibility ratings. The user’s position is represented by an oversized human silhouette icon rendered in the primary accent, surrounded by a translucent blue radius indicator that communicates the GPS accuracy envelope without requiring the user to interpret numerical coordinate data (as detailed in Figure 10).
A single floating action button (FAB) anchored to the lower-right quadrant—rendered as an oversized (#E7F434) rounded-square icon with a “+” glyph at approximately 72 dp—provides the sole contribution affordance. Its placement in the ergonomic “thumb zone” of a right-handed grip, combined with its scale, applies Fitts’s Law optimally for a secondary-priority action that must remain reachable but must not compete with the primary map interaction surface. The map canvas itself is freely pannable and zoomable via standard multi-touch gestures, all of which are natively supported by TalkBack’s explore-by-touch mode.
Live Support Fragment
The Live Support screen, detailed in Figure 11, is architecturally the simplest in the application, by design. Its compositional logic derives directly from the single-function, maximum-target-size principle. Two interactive elements occupy the screen: a non-interactive information card displaying the assigned WebRTC Room ID, and a primary “CALL LIVE SUPPORT” full-width button rendered in the primary accent at a touch height exceeding 80 dp. The button integrates both a telephony icon and a text label, providing redundant iconographic and textual semantics for TalkBack traversal.
The generous whitespace allocation above and below the interactive elements serves a critical motor-accessibility function: it eliminates any risk of the user contacting adjacent interactive elements when reaching for the primary button, a failure mode particularly prevalent among users with upper-limb tremor or imprecise motor control. This spatial buffering strategy is consistent with the pointer target spacing recommendations of WCAG 2.5.8 (Target Size, Minimum) at Level AA, which LAZAR’s implementation exceeds considerably.

3.4.7. Settings Architecture (Settings Fragment)

Based on Figure 12, the Settings screen departs from the card-centric layout grammar of the navigational fragments, adopting a grouped-list architecture organised under four semantic section headings: Language Selection, Cognitive Profile, Physical Limitations, Addresses, and Simulator. Each section heading is rendered in the primary accent at approximately 24–26 sp bold typography, establishing a clear visual hierarchy that allows sighted low-vision users to perform efficient visual scanning without requiring exhaustive linear traversal.
Each Settings item presents a two-level text hierarchy: a bold primary label (e.g., “Home Address”) and a secondary descriptor or current value (e.g., “Calle Arco de Poniente 7, Majadahonda”) at a reduced weight. For TalkBack, each item is exposed as a single focusable node whose announced content concatenates label and current value, enabling efficient auditory scanning without requiring the user to activate each item to discover its current state. This pattern satisfies WCAG 4.1.3 (Status Messages) by ensuring that configuration state is communicable non-visually without focus acquisition.
The persistent five-tab bottom navigation bar—present across Support, Routes, LAZAR, Map, and Settings fragments—constitutes the application’s primary spatial orientation scaffold. The active tab is highlighted with the primary accent colour pill background and an elevated icon, while inactive tabs are rendered in a subdued secondary tone. Tab targets span the full bottom bar height, with each occupying precisely one-fifth of the screen width, producing touch targets of approximately 68–70 dp in width—substantially exceeding the 48 dp minimum. The consistent presence of this bar across all non-critical screens ensures that global navigation is always reachable without requiring the user to return to a home state, satisfying WCAG 2.4.1 (Bypass Blocks) and 3.2.3 (Consistent Navigation).

3.4.8. Synthesis: Interface Flow as a Coherent Accessibility Architecture

Taken together, the interface flow of LAZAR constitutes a coherent accessibility-driven interaction architecture in which every design decision is grounded in cognitive, perceptual, and regulatory constraints. Rather than treating accessibility as a set of compliance requirements, the system operationalizes it as a dynamic design principle that shapes interaction, adaptation, and system behaviour. This synthesis illustrates how Human-Centred AI can move beyond usability toward cognitively sustainable, context-aware assistive systems capable of supporting vulnerable users in high-stakes environments.
Across all interaction components, the system consistently enforces a principle of cognitive load regulation, where information is delivered progressively and only when contextually necessary. This prevents cognitive overload and preserves the user’s capacity to perceive environmental cues, which is critical in real-world navigation scenarios. By embedding cognitive constraints directly into the interaction model, the system ensures that assistance enhances rather than disrupts user awareness.

4. Design of Multi-Agent Architecture for Adaptive AI Navigation

The architectural design of LAZAR (Live Audio Zone-aware Assisted Routing) is grounded in the need to operationalize human-centred and cognitively adaptive interaction principles within a real-world assistive system. Rather than treating architecture as a purely technical layer, the system is conceived as an enabler of adaptive behaviour, where perception, decision-making, and interaction are continuously aligned with user capabilities and contextual demands. In this sense, architecture functions not only as a computational structure but as a mechanism for regulating cognitive load, uncertainty, and system intervention in dynamic urban environments.
To achieve this, LAZAR adopts a smart overlay approach in which adaptive intelligence is layered on top of existing navigation infrastructures. Instead of replacing conventional routing systems, the framework augments them by integrating real-time perception and user-aware adaptation mechanisms. This enables the reinterpretation of static route information in light of dynamic environmental conditions and user-specific constraints, resulting in more context-aware and cognitively sustainable guidance.
The following architecture is presented as a modular design framework illustrating how adaptive interaction principles, environmental perception, and user-aware guidance can be coordinated within an assistive navigation system. While several interaction and interface components were operationalised in the evaluated prototype, the complete integration of all agent coordination and adaptive mechanisms remains part of ongoing implementation work. The agents, communication protocols, and coordination strategies described below constitute a structured framework for future implementation and are not operational components of the prototype evaluated in Section 5. Their inclusion in this paper serves to establish the architectural foundations and design rationale for the adaptive behaviour that future engineering work will realise.

4.1. General System Architecture

The overarching system integrates the mobile client, Edge Artificial Intelligence (Edge AI) processing matrices, and External Cloud Services into a cohesive assistive environment. The architecture is deliberately designed to minimise cloud dependency for critical pedestrian tasks, pushing computational processes required for real-time perception and adaptation directly to the mobile edge device (the user’s Android smartphone).
Looking at Figure 13, at the macro level, the system is divided into three primary operational domains. The first is the Mobile Edge Environment, which houses the human–computer interaction (HCI) layer, the onboard hardware sensors (rear camera, Inertial Measurement Unit, and GPS receiver), and the core JADE-based multi-agent system [29]. The second domain comprises the External Cloud Services, which the mobile edge queries asynchronously for non-critical strategic data; this includes the Google Directions API for macroscopic route geometry [30], OpenStreetMap (Nominatim) for precise geocoding [31], and a proprietary Adaptive Learning Server for long-term route scoring. Finally, the third domain is the Digital Twin Simulator, a local network diagnostic tool that mirrors the device’s state for safe, remote debugging.
This architectural configuration is not defined solely by performance considerations, but by its role in enabling adaptive and cognitively sustainable interaction. By coordinating perception, decision-making, and user feedback, the system ensures that navigation assistance remains aligned with user capabilities and environmental complexity, supporting safe and effective interaction in real-world conditions.

4.2. Multi-Agent System

The LAZAR system is structured as a modular multi-agent architecture in which distinct agents are responsible for complementary aspects of perception, decision-making, interaction, and safety. Rather than operating as isolated technical components, these agents collectively implement the system’s adaptive and human-centred behaviour, enabling real-time coordination between environmental awareness and user-specific guidance. The full system is detailed in Figure 14.

4.2.1. PerceptionAgent

The PerceptionAgent is responsible for constructing a real-time representation of the user’s immediate environment. Its primary function is to identify obstacles, estimate spatial relationships, and continuously assess potential risks along the user’s trajectory. Rather than acting as a simple detection module, the agent transforms raw sensory input into structured environmental information that can be integrated into the system’s adaptive decision-making processes.
From a human-centred perspective, the PerceptionAgent plays a critical role in bridging the gap between physical reality and system reasoning, ensuring that guidance is grounded in the user’s immediate surroundings. This functionality is achieved through lightweight on-device perception models, enabling low-latency and privacy-preserving operation.

4.2.2. NavigationAgent

The NavigationAgent is responsible for managing route planning and progression throughout the navigation task. It operates by integrating global routing information with local environmental updates, dynamically adjusting the user’s trajectory in response to detected obstacles and contextual changes.
Importantly, this agent does not operate in isolation but continuously incorporates input from the PerceptionAgent and higher-level system constraints. Its role is therefore not limited to computing optimal paths, but to maintaining coherent navigation strategies that balance efficiency, safety, and contextual feasibility within dynamic urban environments. This contributes to reducing cognitive complexity for the user by ensuring that navigation remains predictable and consistent despite environmental variability.

4.2.3. PersonalisationAgent

The PersonalisationAgent constitutes the core of the system’s adaptive interaction layer. Its primary function is to regulate the form, frequency, and granularity of guidance based on the user’s cognitive profile, experience level, and situational context.
By transforming user characteristics into dynamic interaction constraints, this agent modulates instructional verbosity and timing to prevent cognitive overload while preserving situational awareness. For novice users, guidance is more explicit and redundant, prioritising safety and clarity; for experienced users, instructions are selectively reduced to minimise cognitive interference. In this sense, the PersonalisationAgent embodies the core principles of Human-Centred AI by aligning system behaviour with user cognitive and perceptual capabilities.
As a concrete illustration of profile-differentiated behaviour: in Beginner mode, every detected obstacle is announced with its semantic class, estimated distance, and a recommended evasion direction (e.g., “Person ahead, 1.5 metres, move right”), and route scoring strongly penalises junctions, steep gradients, and high pedestrian density; straight-ahead confirmation prompts are retained at each waypoint. In Intermediate mode, evasion direction suggestions are omitted when the obstacle is in a peripheral sector and clearance is sufficient; confirmation prompts are reduced but not eliminated. In Expert mode, straight-ahead confirmation prompts are suppressed entirely, only obstacles within the critical proximity threshold (1 m) trigger spoken alerts, and route scoring applies minimal conservatism penalties, allowing faster routes with greater infrastructure complexity. These behavioural distinctions are operationalised through the profile parameter set at onboarding and adjustable per-trip via the “Today I feel” override on the Route Preview screen.
This agent operationalizes cognitive load-aware interaction, ensuring that the system remains supportive without becoming intrusive, and directly addressing the challenge of cognitive masking identified in assistive navigation research. It should be noted that, at the current design stage, cognitive load regulation is operationalised through a profile-based interaction modulation mechanism rather than through real-time psychophysiological measurement. The transition to quantitative, sensor-driven cognitive load estimation, incorporating instruments such as NASA-TLX or physiological workload markers, constitutes a primary objective of forthcoming implementation and validation work.

4.2.4. SecurityAgent

The SecurityAgent functions as the system’s safety-critical control mechanism, continuously monitoring both environmental risks and interaction conditions. When predefined safety thresholds are exceeded, such as the detection of an imminent obstacle or a high-risk navigation scenario, the agent is capable of overriding standard navigation processes to enforce immediate corrective actions.
From a Human-Centred AI perspective, this agent embodies a mixed-initiative interaction paradigm in which control dynamically shifts between user and system according to contextual risk. By prioritising safety over efficiency, it ensures that system autonomy is exercised only when necessary to prevent hazardous situations.

4.2.5. InterfaceAgent

The InterfaceAgent is responsible for translating system decisions into multimodal feedback that can be effectively perceived by the user. It integrates auditory, haptic, and spatial cues to communicate guidance while preserving the user’s ability to perceive ambient environmental information.
This multimodal strategy ensures redundancy across sensory channels and reduces dependence on any single modality, which is critical in real-world environments where auditory or visual conditions may be compromised. By controlling how information is delivered—not just what is delivered, the agent plays a central role in maintaining cognitively sustainable interaction. This ensures that system feedback remains perceptible and interpretable under diverse real-world conditions, supporting robust human–AI interaction.

4.2.6. SystemAgent

The SystemAgent manages system-level coordination and supports monitoring and iterative refinement of the architecture. It enables the observation of system behaviour through a Digital Twin representation, allowing developers to analyse interactions between agents and evaluate system responses under controlled conditions.
This component is particularly relevant for the safe development and validation of assistive systems, as it allows testing and optimisation without exposing users to real-world risks. As such, it supports both system reliability and the iterative improvement of human-centred interaction strategies. This capability supports the continuous improvement of interaction strategies while maintaining alignment with user-centred design principles.
Together, these agents do not merely implement system functionality but collectively enable adaptive, human-centred behaviour. Their coordination ensures that perception, decision-making, and interaction remain aligned with user cognitive constraints, reinforcing the integration of system architecture and Human-Centred AI principles.

5. User Validation Study

5.1. Study Design and Participant Profile

The evaluation reported in this study focuses on the usability, accessibility, and perceived usefulness of the high-fidelity prototype. It does not constitute a full validation of the real-time AI, perception, or multi-agent execution components. Instead, the study assesses whether the proposed interaction model, cognitive profiling approach, and guidance logic are understandable, acceptable, and perceived as useful by blind and visually impaired users. Results should therefore be interpreted within the scope of a formative prototype evaluation rather than as evidence of operational system performance.
A structured user validation study was conducted with eleven participants recruited from the visually impaired community in Spain, all of whom evaluated the LAZAR high-fidelity prototype under facilitated, think-aloud conditions. Each session was preceded by the administration of a full informed-consent protocol, which all participants signed voluntarily, confirming their agreement to participate and to permit audio-visual recording of the session where applicable. The evaluation instrument comprised three standardised questionnaires applied sequentially: the System Usability Scale (SUS), the User Experience Questionnaire (UEQ), and a structured qualitative feedback protocol composed oed questions addressing identified problems, areas of difficulty, most appreciated features, and overall experience.
Participants interacted with a functional Android application in which the interface, navigation flow, and cognitive profiling mechanisms were fully operational. It is noted, however, that the multi-agent coordination layer and adaptive learning components were not yet integrated at the time of evaluation; the study therefore validates the interaction model and interface design, with the full adaptive architecture constituting future implementation work.
For clarity, the following components were fully operational in the evaluated prototype: the interface layer and navigation flow (corresponding to the functional scope of the InterfaceAgent and NavigationAgent); real-time obstacle detection feedback and spatial sonification (PerceptionAgent); and profile-based interaction modulation, whereby verbosity and routing conservatism adapt to the selected experience profile (PersonalisationAgent in its instruction-filtering role). The SecurityAgent veto mechanism, the full ACL-mediated coordination between agents, the SensorFusionAgent server-assisted priors, and the adaptive learning loop were not integrated as a complete multi-agent society at the time of evaluation. The prototype therefore validates the interaction model, individual component behaviours, and perceived usability of the guidance logic, rather than the emergent coordination properties of the full multi-agent architecture.
The participant cohort presented a demographically rich and clinically heterogeneous profile, providing a clinically heterogeneous sample reflecting several profiles within the target population. Ages ranged from 38 to 85 years, with a mean of 53.5 years (SD = 14.9), and the group was evenly balanced by gender, comprising six female and five male participants. This age range is particularly significant: the inclusion of an 85-year-old participant at the upper bound ensured that the evaluation was not artificially skewed towards younger, more technologically fluent users, and directly tested the system’s claimed accessibility mandate.
In terms of visual acuity, six participants presented total blindness, two presented functional low vision, two presented only light and bulk perception, and one participant presented monocular vision with 90% acuity in the right eye and complete blindness in the left. With respect to onset, nine participants had experienced congenital visual impairment, whilst two had acquired it more than five years prior to the study—a distribution that is relevant to the system’s cognitive load calibration, given the established differences in spatial cognition and technology adaptation strategies between congenital and late-onset visual impairment populations.
The primary autonomous mobility tools employed by participants further validated the study’s representativeness: five relied on a white cane, four on a guide dog, and two required no physical aid. Experience with screen readers was distributed across all competency levels, with six participants classified as advanced TalkBack or VoiceOver users, three as intermediate, and two as beginners—the latter group being of particular relevance for testing the system’s claimed accessibility for first-time users of screen reader technology. Familiarity with voice assistants was high across the cohort, with seven participants reporting daily use and four occasional uses, suggesting a baseline comfort with voice-mediated interaction paradigms. GPS navigation application usage was notably broad: participants collectively reported experience with Google Maps (v6.1), Apple Maps (iOS 26), Lazarillo (v3.5), BlindSquare (v5.92), VoiceVista (v2.2), GoodMaps Outdoor (v5.1), and SinAI (v2.4), providing a robust comparative baseline against which to contextualise their perceptions of LAZAR.
A particularly telling pre-study metric was participants’ self-reported confidence in navigating independently along a new or unfamiliar route, measured on a scale from 1 (very low) to 5 (very high). The cohort mean was 2.55 (SD = 0.82), with no participant rating their confidence above 4 and the modal response being 2. This result, illustrated in Table 1, supports the relevance of the central problem motivating LAZAR’s design: even experienced, technologically capable users of assistive navigation tools report a fundamentally low sense of safety and confidence when confronted with unfamiliar urban environments, confirming the existence of the functional gap that the system seeks to address.
To support the interpretation of the usability and user experience results, the evaluation instruments were grounded in established benchmark references. SUS scores were interpreted according to the adjective rating scale proposed by Bangor, Kortum, and Miller [32], and contextualised against the broader empirical distribution of SUS evaluations reported by Bangor, Kortum, and Miller [33]. UEQ results were interpreted using the evaluation framework and benchmark approach described by Schrepp, Hinderks, and Thomaschewski [34].

5.2. System Usability Scale (SUS) Results

The System Usability Scale produced a cohort mean of 91.36 (SD = 7.79), placing the evaluated prototype within the “Excellent” band of the widely cited SUS benchmark taxonomy established by Bangor, Kortum, and Miller [32], which designates scores above 90 as meriting an adjective rating of “Best Imaginable” and scores in the 85–90 range as “Excellent.” To contextualise this result: the average SUS score across published usability studies is approximately 68 [33], which places LAZAR’s mean score more than 23 points above the industry average—a margin that, at this scale, represents a qualitatively categorical difference in perceived usability rather than an incremental improvement. Given the sample size of eleven participants, these figures should be treated as indicative of strong preliminary acceptability rather than as stable population estimates.
Nine of the eleven participants returned SUS scores above 85, with six scoring 90 or above. Two participants achieved the maximum possible score of 100.0—one a 42-year-old woman with total congenital blindness and advanced TalkBack proficiency, the other an 85-year-old woman with monocular vision and beginner-level screen reader experience. The latter result is especially significant: it indicates that the system’s macro-gestural interaction paradigm, full-screen touch targets, and minimal cognitive burden design are sufficiently accessible to enable confident and independently assessable use by a user at the extreme upper end of the age distribution, with no prior familiarity with TalkBack. This finding provides preliminary support for the accessibility-oriented interaction principles described in Section 3 articulated in Section 3.
The single score below 80 (72.5, P03) warrants specific contextual analysis. This participant presented functional low vision—the only demographic profile for which the system’s visual layer is the primary interaction channel rather than a supplementary fallback—and identified concrete technical issues that explain the score reduction: insufficient contrast on the map panel, inadequate font size in certain interface elements, and magnifier incompatibility with the bottom navigation bar. Critically, even this participant awarded LAZAR a perfect five-star overall rating, indicating that the perceived usability deficit was attributed to specific, addressable rendering parameters rather than to the system’s fundamental interaction model. The issues identified are entirely consistent with the known limitations of the high-fidelity prototype stage and constitute actionable items for the subsequent engineering iteration.
The evaluation setup was designed to reflect realistic navigation scenarios, ensuring that user interaction with the system captured not only functional usability but also the cognitive demands associated with real-world mobility.

5.3. User Experience Questionnaire (UEQ) Results

The User Experience Questionnaire, scored on a standardised scale from −3 (maximally negative) to +3 (maximally positive), produced results across all six canonical UEQ dimensions that are consistently positive across all measured constructs, as observed in Table 2. Using the UEQ benchmark dataset compiled from 452 product evaluations [34], scores above 0.8 are considered “above average,” above 1.5 “good,” and values approaching +3 represent the upper bound of the scale. LAZAR’s results substantially exceeded the “good” threshold on every single dimension.
Attractiveness, which measures the overall impression and emotional reaction to the product, achieved a mean of +2.72 (SD = 0.50)—the highest score across all six scales and a result that, with the caveat of the present sample size, is positioned within the upper range of the UEQ benchmark database. This finding indicates that participants’ immediate affective response to the system was overwhelmingly positive, characterised by consistent ratings of the interface as pleasant, attractive, friendly, and supportive rather than obstructive. Perspicuity, measuring ease of understanding and learnability, scored +2.48 (SD = 0.82), reflecting participants’ strong agreement that the system was comprehensible, easy to learn, and clear in its communicative intent—a result of particular note given the range of TalkBack experience levels in the cohort, from beginner to advanced. Stimulation, which captures the degree to which the product is exciting, motivating, and interesting, returned a mean of +2.52 (SD = 0.82), indicating that LAZAR was perceived not merely as a functional tool but as a genuinely engaging and stimulating artefact.
Dependability (+2.43, SD = 1.05) and Efficiency (+2.36, SD = 0.74) both returned scores in the excellent range, though the slightly larger standard deviation on Dependability reflects the variability introduced by participants who experienced TalkBack-specific interaction inconsistencies—issues attributable to the screen reader layer rather than to LAZAR’s own interaction architecture, as several participants explicitly noted during qualitative debriefing. Novelty produced the lowest score of the six scales at +2.07 (SD = 1.21), but this remains well within the excellent band. The higher standard deviation on this dimension is consistent with the cohort’s broad experience of existing navigation applications: highly experienced GPS users (who reported familiarity with up to six competing tools) applied a more comparative frame of reference when assessing novelty, whilst less experienced users tended to rate the system as more distinctly original.
These tasks were specifically designed to capture variations in cognitive load, decision-making complexity, and environmental uncertainty, allowing the evaluation to assess not only usability but also the system’s ability to support cognitively sustainable interaction.
These results should be interpreted considering the study’s sample size (n = 11). Whilst the scores are internally consistent and exceed established benchmark thresholds, they cannot be considered stable population estimates. No confidence intervals or power analyses are reported, as the study is formative in nature and the cohort was recruited from a population that is genuinely difficult to access in larger numbers. The findings are therefore presented as preliminary, hypothesis-generating evidence in support of the proposed interaction model, rather than as conclusive empirical validation. A controlled study with a larger and geographically diverse cohort is required to substantiate the present observations.

5.4. Overall Satisfaction Rating

The overall five-star satisfaction rating, administered as a global summative item at the close of the evaluation session, returned a mean of 4.64 out of 5.00 (SD = 0.48). Seven of the eleven participants awarded the maximum rating of five stars; the remaining four awarded four stars. No participant rated the system below four stars. The complete absence of ratings below four, combined with a modal response of five, produces a distribution with a notably low ceiling effects—all participants converged on the uppermost region of the scale without a single dissenting low-satisfaction response, including from participants who had identified specific technical issues during the session.
These results indicate that the proposed interaction model was perceived as supporting reduced cognitive burden whilst maintaining user autonomy. The consistency observed across participants suggests that adaptive guidance plays a critical role in supporting decision-making under uncertainty, particularly in non-visual navigation contexts.

5.5. Qualitative Findings

5.5.1. Most Appreciated Features

Thematic analysis of the open-ended qualitative responses revealed four consistently recurrent positive themes. The feature most frequently praised, and the one that elicited the most substantive commentary, was the Accessibility Social Map and community incident reporting mechanism. P09, a 67-year-old advanced TalkBack user with extensive experience of six competing navigation applications, offered the most pointed evaluation: “The social part is something that has been attempted many times. The fact of socialising it gives it novelty—in this sector there are no solutions with this ‘live’ component of reporting incidents or speaking with someone. This would be the first solution with this option.” This assessment, from a user with deep comparative knowledge of the assistive navigation landscape, constitutes particularly strong evidence for the system’s claim to architectural differentiation.
The second most frequently cited positive feature was the adaptive cognitive profiling system, and specifically the “Today I feel” contextual profile override mechanism on the Route Preview screen. P02, a congenitally blind advanced user, highlighted the design’s sensitivity to the episodic nature of disability: “The profiles—especially the ‘how you feel today’ feature—because you vary by days. You’re not equally alert every day.” This comment directly validates the design rationale for the contextual profile override as a first-class interaction element rather than a buried Settings option.
Route search speed and directness emerged as a third consistent theme. P11 specifically praised the immediacy of route rendering: “It shows you the route immediately and without hesitation—very comfortable to use compared to other market applications.” P03, despite identifying some visual-layer issues, singled out the interface’s structural economy: “Design agility—it doesn’t put you through too many sub-tabs. Very direct.” This qualitative convergence on directness and speed affirms that the minimal interaction depth architecture, in which core tasks are reachable in a small number of steps from any system state, is perceptible and valued by users operating under real cognitive and attentional constraints.
The obstacle detection and perception model was specifically praised by P10, who noted the concept’s potential for extension into wearable form factors—citing the possibility of integration with devices such as smart glasses as a natural evolution of the system’s core perceptual capability. The Live Support video call feature was praised by P04, who described its utility in high-stress navigation scenarios as “super useful and faster” than alternative assistance-seeking strategies available to blind pedestrians.

5.5.2. Identified Usability Issues

The qualitative data revealed a clear and analytically important pattern in the classification of usability issues: the majority of difficulties reported during sessions were attributable to the Android TalkBack screen reader’s own behaviour rather than to LAZAR’s interface design. This distinction was made explicitly by multiple participants. P07 noted that “as far as the application itself is concerned, no problems were found; the difficulty was more in the use of TalkBack.” P08 similarly distinguished between application-level design and screen reader behaviour, attributing the main friction to the absence of selection-confirmation audio cues—a TalkBack configuration parameter rather than an interface element within LAZAR’s own scope.
The most technically specific issues identified concerned TalkBack verbosity and icon handling: P10 and P11 both noted that decorative icons were being read aloud by the screen reader, which constitutes a content description labelling issue resolvable through the assignment of empty contentDescription attributes to non-interactive visual elements. P11 additionally noted that certain interface elements on the LAZAR active perception screen were being announced in an unbroken sequence, suggesting that element grouping via importantForAccessibility attributes requires refinement in a subsequent development iteration.
The map panel accessibility issue, identified by P06, is the most substantive design-level finding. The Accessibility Social Map fragment, which requires users to navigate a pannable and zoomable cartographic canvas to place incident markers, was identified as incompatible with fully eyes-free operation in its current form. This limitation is acknowledged, and the development roadmap for subsequent iterations includes the implementation of a coordinate-based voice input mechanism as an alternative to direct map interaction for incident placement.
Two participants—P01 and P04—raised the issue of platform exclusivity. Both are iOS users accustomed to VoiceOver rather than TalkBack, and their unfamiliarity with Android gesture conventions introduced friction that was correctly attributable to platform context rather than to LAZAR’s design. P01 specifically requested an iOS port, and P04 noted that the absence of a double-tap activation confirmation—a VoiceOver paradigm—introduced momentary uncertainty during selection tasks. These observations motivate future cross-platform development as a priority for the system’s broader accessibility mandate.

5.5.3. Notable Participant Observations

Two individual participant responses merit specific highlighting for their evidential weight. P07—the 85-year-old monocular participant with beginner TalkBack proficiency—delivered the evaluation session’s most vivid summary assessment: “For a person who sees minimally, it is fabulous; but for someone who cannot see, I believe it is an indispensable tool.” The convergence of a maximum SUS score (100.0), a five-star rating, and this qualitative judgement from the cohort’s oldest and least technologically experienced participant constitutes compelling evidence that the system’s accessibility mandate is not merely aspirational but operationally achieved at the prototype stage.
P09, whose extensive comparative experience of six navigation applications positions him as an informed evaluator, offered the most analytically precise commendation of the prototype’s engineering quality: “Other prototypes always apologise for what doesn’t work; here everything works perfectly, it is easy to learn and to use.” This observation—from a user who has experienced the limitations of competing systems firsthand—validates not only the design but the implementation maturity of the high-fidelity prototype as an evaluation artefact.
User feedback further reinforces the importance of adaptive interaction, with participants consistently highlighting the clarity, predictability, and non-intrusiveness of the system. These observations suggest that the regulation of instruction frequency and modality contributes directly to user trust and perceived safety.

5.5.4. Comparative Observations Relative to Existing Navigation Tools

Whilst the present study did not incorporate a controlled experimental comparison with existing assistive navigation systems, the participant cohort collectively reported extensive prior experience with tools including BlindSquare, Lazarillo, Google Maps, Apple Maps, VoiceVista, GoodMaps Outdoor, and SinAI. This breadth of prior exposure affords a meaningful experiential reference frame against which participants’ qualitative assessments of LAZAR can be contextualised, even in the absence of matched task conditions.
The feature most clearly distinguished from participants’ prior experience was the community-driven Accessibility Social Map. P09—an advanced TalkBack user with documented familiarity with six competing applications—noted that whilst crowd-sourced accessibility data had been attempted in various forms, no existing solution in the assistive navigation landscape provided a real-time, live incident-reporting and social communication channel of the kind implemented in LAZAR. This observation is consistent with the literature reviewed in Section 2, which identifies the absence of real-time transient obstacle reporting as a persistent functional gap in GPS-based navigation tools such as BlindSquare and Lazarillo [11].
Interaction efficiency and structural directness were a second axis of comparative differentiation. P11 specifically contrasted LAZAR’s route rendering speed favourably against unnamed market alternatives, whilst P03 praised the application’s structural economy—its avoidance of excessive sub-navigation layers—as a meaningful distinction from other tools the participant had used. These observations align with established critiques of existing navigation applications, which impose non-trivial interaction depth for common tasks [12].
It is acknowledged that these comparative assessments are experiential and self-reported rather than empirically controlled. Participants’ familiarity with competing tools varied, and direct task-matched evaluation under equivalent conditions was not conducted. A rigorous comparative evaluation—administering LAZAR and at least one existing baseline tool (e.g., BlindSquare or Lazarillo) under matched navigation scenarios—constitutes a high-priority direction for future validation work, as discussed in the Limitations subsection below.

5.5.5. Limitations of the Validation Study

We acknowledge that the current evaluation has several limitations. The sample size of eleven participants does not support broad statistical generalisation, although it is consistent with formative usability studies involving difficult-to-recruit populations such as blind and visually impaired users. In addition, cognitive load was addressed in this work primarily as a design construct rather than as a directly measured experimental variable. Consequently, the findings should be interpreted as preliminary evidence of usability, accessibility, and perceived value rather than as definitive proof of effectiveness or cognitive load reduction. Future work will incorporate validated workload assessment instruments, such as the NASA Task Load Index (NASA-TLX), together with behavioural indicators including task completion time, navigation errors, intervention frequency, and route deviation, in order to provide a more rigorous evaluation of cognitive and navigational performance.

5.6. Summary of Validation Outcomes

The user validation study produces a coherent and mutually corroborating set of quantitative and qualitative findings. A mean SUS of 91.36 places LAZAR in the “Excellent” band of the standard usability taxonomy; UEQ scores exceeding +2.0 on all six experiential dimensions are consistent with the benchmark upper quartile; and a mean satisfaction rating of 4.64 out of 5.00 with no rating below 4 reflects a degree of user acceptance that is rarely achieved at prototype stage, particularly within a cohort whose daily experience of assistive technology gives them a well-calibrated comparative standard.
The qualitative data reveals that the dominant positive responses cluster around precisely the features that represent LAZAR’s principal architectural differentiators from existing solutions: the community-driven social map, the cognitively adaptive profile system, and the minimal interaction depth design. The issues identified are, with one substantive exception (map panel accessibility), either TalkBack configuration parameters external to LAZAR’s own design scope or minor labelling corrections addressable in a single development iteration. These findings collectively support the conclusion that the LAZAR prototype has achieved functional validation of its core accessibility design principles across a clinically diverse and technically experienced participant cohort.
Overall, the evaluation provides preliminary empirical support for the proposed human-centred and cognitive load-aware design approach. The findings suggest that aligning system behaviour with user cognitive constraints was perceived positively in terms of usability, acceptability, and perceived safety, and that adaptive interaction constitutes a promising design direction for assistive AI systems operating in high-stakes environments. These conclusions are appropriately interpreted as formative, motivating future controlled validation rather than establishing definitive effectiveness.

6. Conclusions

This work has addressed an important limitation in current assistive navigation systems for blind and visually impaired users: the limited integration between accessibility, adaptive interaction, and cognitively sustainable guidance in dynamic urban environments. Existing solutions have traditionally focused on isolated aspects of the problem, such as route planning, obstacle detection, or interface accessibility without sufficiently considering how these capabilities can be coordinated to support real-world interaction under uncertainty and cognitive demand.
In response, this paper introduced LAZAR, a human-centred adaptive AI framework that approaches assistive navigation as an interaction-driven problem rather than solely a routing or sensing task. The proposed framework combines accessibility-oriented design principles, cognitive load-aware interaction strategies, and a modular multi-agent architecture intended to support adaptive and context-aware assistance. Through the integration of user-centred design, formal requirement analysis, and interaction modelling, the work provides a structured foundation for the development of more adaptive and cognitively sustainable assistive navigation systems.
The formative evaluation conducted with visually impaired participants provided encouraging results regarding usability, accessibility, perceived usefulness, and overall acceptance of the proposed interaction model. Participants responded positively to the multimodal interaction approach, the simplified navigation flows, and the accessibility-oriented design decisions incorporated throughout the prototype. These findings support the relevance of treating interaction design, cognitive accessibility, and adaptive guidance as closely interconnected dimensions within assistive AI systems.
At the same time, the scope of the present evaluation remains formative and exploratory. The study does not constitute a full validation of real-time AI execution, adaptive multi-agent coordination, or objective cognitive load reduction under uncontrolled urban conditions. Larger-scale evaluations incorporating behavioural metrics, validated workload instruments, and controlled comparisons with existing assistive navigation tools will therefore be necessary to further assess the effectiveness of the proposed approach.
Beyond the specific navigation scenario addressed in this work, the proposed framework contributes to the broader discussion surrounding Human-Centred AI in safety-critical environments. In particular, it highlights the importance of designing AI systems that adapt not only to environmental conditions, but also to the perceptual, cognitive, and interactional characteristics of their users. Future work will address this through several planned directions: (a) integration of validated cognitive load measurement instruments, including the NASA Task Load Index (NASA-TLX) and EEG-based workload markers, to provide direct empirical evidence of the cognitive load regulation claimed at the design level; (b) controlled comparative evaluation against existing assistive navigation tools such as BlindSquare and Lazarillo under matched task conditions, generating the baseline contrast absent from the present formative study; (c) cross-platform development to support iOS and VoiceOver users, addressing the platform exclusivity limitation raised by participants during evaluation; (d) full integration and activation of the complete multi-agent coordination layer, including ACL-mediated communication across all six agents, the SecurityAgent veto mechanism, and the server-assisted Bayesian adaptation loop; and (e) large-scale field trials with blind and visually impaired participants under uncontrolled urban conditions, incorporating objective performance metrics including task completion time, navigation errors, route deviation, and obstacle avoidance rate, alongside a larger and more geographically diverse participant cohort.

Author Contributions

Conceptualization was led by P.H.-M. Methodology and Software implementation were conducted by Á.G.-B. Resources, Formal analysis, Data curation, and Validation were carried out by P.H.-M. Writing—Original Draft Preparation was undertaken by Á.G.-B. and P.H.-M., while Writing—Reviewnd Editing and administrative interoperability information and relevant use cases were managed by P.H.-M. Supervision, project administration and funding acquisition was led by P.H.-M. All authors approved the submitted version of the manuscript, including revisions made by journal staff. Each author agrees to be personally accountable for their contributions and ensures that any questions related to the accuracy or integrity of any part of the work are appropriately investigated, resolved, and documented. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Universidad Politécnica de Madrid (UPM), Spain, under project RP2510470096.

Institutional Review Board Statement

The Ethics Committee for R&D+I Activities at the Universidad Politécnica de Madrid (Ref. CE260302) determined that formal ethical review and approval were not required under the applicable institutional procedures.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The data presented in this study are available on reasonable request from the corresponding author. The questionnaire instruments employed in the evaluation (SUS and UEQ protocols and the structured qualitative interview guide) are described in this article.

Acknowledgments

The authors would like to thank the participants who contributed to the evaluation of the system.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:
ACLAgent Communication Language
AIArtificial Intelligence
APIApplication Programming Interface
dpDensity-independent Pixels
EEGElectroencephalography
FABFloating Action Button
FIPAFoundation for Intelligent Physical Agents
FSMFinite State Machine
GPSGlobal Positioning System
GUIGraphical User Interface
HCIHuman–Computer Interaction
IMUInertial Measurement Unit
IoTInternet of Things
JADEJava Agent DEvelopment framework
JSONJavaScript Object Notation
LAZARLive Audio Zone-aware Assisted Routing
MASMulti-Agent System
O&MOrientation and Mobility
RESTRepresentational State Transfer
spScale-independent Pixels
SRSSoftware Requirements Specification
TCPTransmission Control Protocol
UDPUser Datagram Protocol
UIUser Interface
UXUser Experience
VUIVoice User Interface
WCAGWeb Content Accessibility Guidelines
WebRTCWeb Real-Time Communication

References

  1. Roentgen, U.R.; Gelderblom, G.J.; Soede, M.; de Witte, L.P. Inventory of electronic mobility aids for persons with visual impairments: A literature review. J. Vis. Impair. Blind. 2008, 102, 661–674. [Google Scholar] [CrossRef]
  2. Kouroupetroglou, G. (Ed.) Disability Informatics and Web Accessibility for Motor Limitations; IGI Global: Hershey, PA, USA, 2013. [Google Scholar]
  3. Ahmetovic, D.; Manduchi, R.; Kurniawan, S.; Pece, A. NavCog: A navigational cognitive assistant for the blind. In Proceedings of the 18th International Conference on Human-Computer Interaction with Mobile Devices and Services, Florence, Italy, 6–9 September 2016. [Google Scholar]
  4. Bigham, J.P.; Jayant, C.; Ji, H.; Little, G.; Miller, A.; Miller, R.C.; Miller, R.; Tatarowicz, A.; White, B.; White, S.; et al. VizWiz: Nearly real-time answers to visual questions. In Proceedings of the 23rd Annual ACM Symposium on User Interface Software and Technology, “dblp: UIST”, New York, NY, USA, 3–6 October 2010. [Google Scholar]
  5. O’Hare, G.M.; Jennings, N.R. (Eds.) Foundations of Distributed Artificial Intelligence; John Wiley & Sons: Hoboken, NJ, USA, 1996; Volume 9. [Google Scholar]
  6. Bellifemine, F.; Caire, G.; Greenwood, D. Developing Multi-Agent Systems with JADE; John Wiley & Sons: Hoboken, NJ, USA, 2007. [Google Scholar]
  7. Williams, M.A.; Galbraith, C.; Kane, S.K.; Hurst, A. How the Blind and Sighted See Navigation Differently. In Proceedings of the 16th International ACM SIGACCESS Conference on Computers & Accessibility, Rochester, NY, USA, 20–22 October 2014. [Google Scholar]
  8. American Foundation for the Blind. AFB Announces Research Examining the Role of the Guide Dog in the Lives of People Who Are Blind or Visually Impaired; American Foundation for the Blind: Arlington, VA, USA, 2022. [Google Scholar]
  9. Paré, S.; Bleau, M.; Djerourou, I.; Malotaux, V.; Kupers, R.; Ptito, M. Spatial Navigation with Horizontally Spatialized Sounds in Early and late Blind Individuals. PLoS ONE 2021, 16, e0247448. [Google Scholar] [CrossRef]
  10. Lupu, R.-G.; Mitruț, O.; Stan, A.; Ungureanu, F.; Kalimeri, K.; Moldoveanu, A. Cognitive and Affective Assessment of Navigation and Mobility Tasks for the Visually Impaired via Electroencephalography and Behavioral Signals. Sensors 2020, 20, 5821. [Google Scholar] [CrossRef] [PubMed]
  11. Tachi, K. Lazarillo: A Free Accessible GPS App for the Blind and Visually Impaired; Perkins School for the Blind: Watertown, MA, USA, 2024. [Google Scholar]
  12. Pundlik, S.; Shivshanker, P.; Traut-Savino, T.; Luo, G. Field Evaluation of a Mobile App for Assisting Blind and Visually Impaired Travelers to Find Bus Stops. Transl. Vis. Sci. Technol. 2024, 13, 11. [Google Scholar] [CrossRef] [PubMed]
  13. Katz, B.F.G.; Kammoun, S.; Parseihian, G.; Gutierrez, O.; Brilhault, A.; Auvray, M.; Truillet, P.; Denis, M.; Thorpe, S.; Jouffrais, C. NAVIG: Augmented reality guidance system for the visually impaired: Combining transport and pedestrian navigation. Virtual Real. 2012, 16, 253–269. [Google Scholar] [CrossRef]
  14. India, G.; Robinson, S.; Pearson, J.; Morrison, C.; Jones, M. Exploring the Experiences of Individuals Who Are Blind or Low-Vision Using Object-Recognition Technologies in India. In Proceedings of the CHI Conference on Human Factors in Computing Systems; ACM: New York, NY, USA, 2025; pp. 1–11. [Google Scholar] [CrossRef]
  15. Pfitzer, N.; Zhou, Y.; Poggensee, M.; Kurtulus, D.; Dominguez-Dager, B.; Dusmanu, M.; Pollefeys, M.; Bauer, Z. Mixed-Reality Navigation Assistant for the Visually Impaired. arXiv 2025, arXiv:2506.05369. [Google Scholar] [CrossRef]
  16. Sharma, T.; Tseng, Y.-Y.; Zhang, L.; Ide, A.; Mack, K.A.; Findlater, L.; Gurari, D.; Wang, Y. ‘Before, I Asked My Mom, Now I Ask ChatGPT’: Visual Privacy Management with Generative AI for Blind and Low-Vision People. arXiv 2025, arXiv:2507.00286. [Google Scholar]
  17. Chen, Z.; Hu, W.; Wang, J.; Zhao, S.; Amos, B.; Wu, G.; Ha, K.; Elgazzar, K.; Pillai, P.; Klatzky, R.; et al. An Empirical Study of Latency in an Emerging Class of Edge Computing Applications for Wearable Cognitive Assistance. In Proceedings of the Second ACM/IEEE Symposium on Edge Computing, San Jose, CA, USA, 12–14 October 2024. [Google Scholar]
  18. Rajapandi, B.; Divya, M.A. Advanced Edge AI Navigation and Assistance System for Blind User. Int. J. Multidiscip. Res. 2024, 6, 28522. [Google Scholar] [CrossRef]
  19. Merenda, M.; Porcaro, C.; Iero, D. Edge Machine Learning for AI-Enabled IoT Devices: A Review. Sensors 2020, 20, 2533. [Google Scholar] [CrossRef] [PubMed]
  20. Wang, X.; Jia, H.; Zhou, H.; Wang, S.G.; Zhang, Y.; Dang, T.; Gu, T. LQA: A Lightweight Quantized-Adaptive Framework for On-Device Vision-Language Models. arXiv 2026, arXiv:2602.07849. [Google Scholar]
  21. Theodorou, P.; Tsiligkos, K.; Meliones, A. Multi-Sensor Data Fusion Solutions for Blind and Visually Impaired: Research and Commercial Navigation Applications for Indoor and Outdoor Spaces. Sensors 2023, 23, 5411. [Google Scholar] [CrossRef] [PubMed]
  22. Magnusson, C.; Rassmus-Gröhn, K.; Tollmar, K.; Deaner, J. Designing Parsimonious Audio Navigation Cues for Blind Pedestrians. In Proceedings of the 12th International Conference on Human-Computer Interaction with Mobile Devices and Services, Lisbon, Portugal, 7–10 September 2010; pp. 417–420. [Google Scholar]
  23. Oliveira, J.D.; Engelmann, D.C.; Kniest, D.; Vieira, R.; Bordini, R.H. Multi-Agent Interaction to Assist Visually Impaired and Elderly People. Int. J. Environ. Res. Public Health 2022, 19, 8945. [Google Scholar] [CrossRef] [PubMed]
  24. Thakur, C. Multi Agent System Applied in Healthcare; SSRN: Rochester, NY, USA, 2023. [Google Scholar]
  25. Pintea, C.-M.; Tripon, A.C.; Avram, A.; Crişan, G.-C. Multi-Agents Features on Android Platforms. arXiv 2018, arXiv:1807.05826. [Google Scholar] [CrossRef]
  26. Snyder, C. Paper Prototyping: The Fast and Easy Way to Design and Refine User Interfaces; Morgan Kaufmann: San Francisco, CA, USA, 2003. [Google Scholar]
  27. W3C. Web Content Accessibility Guidelines (WCAG) 2.2—Recommendation; World Wide Web Consortium: Cambridge, MA, USA, 2023. [Google Scholar]
  28. EN 301 549 V3.2.1; Accessibility Requirements for ICT Products and Services. European Telecommunications Standards Institute (ETSI): Sophia Antipolis, France, 2021.
  29. JADE Project. Java Agent DEvelopment Framework (JADE)—Official Website and Documentation; JADE Project: Houston, TX, USA, 2022. [Google Scholar]
  30. Google. Google Maps Directions API—Developer Guide; Google Maps Platform Documentation: Mountain View, CA, USA, 2024. [Google Scholar]
  31. OpenStreetMap Foundation. Nominatim: OpenStreetMap Search Service—Usage and API Documentation; OpenStreetMap Foundation: Cambridge, UK, 2023. [Google Scholar]
  32. Bangor, A.; Kortum, P.; Miller, J. Determining what individual SUS scores mean: Adding an adjective rating scale. J. Usability Stud. 2009, 4, 114–123. [Google Scholar]
  33. Bangor, A.; Kortum, P.T.; Miller, J.T. An empirical evaluation of the system usability scale. Int. J. Hum.-Comput. Interact. 2008, 24, 574–594. [Google Scholar] [CrossRef]
  34. Schrepp, M.; Hinderks, A.; Thomaschewski, J. Applying the user experience questionnaire (UEQ) in different evaluation scenarios. In Design, User Experience, and Usability. Theories, Methods, and Tools for Designing the User Experience; Marcus, A., Ed.; Springer: Cham, Switzerland, 2014; pp. 383–392. [Google Scholar]
Figure 1. Low-fidelity paper prototype of the LAZAR application constructed using hand-drawn ink sketches on individual paper cards. From left to right: (1) LazarFragment Idle State, displaying the “TAP TO START” status label and the oversized camera activation button as the sole interactive target; (2) LazarFragment Active Perception State (“VISION ACTIVE”), showing the central diamond-shaped directional glyph, the “1.5 m” proximity readout, and the spatially separated “STOP ROUTE” and “✕” cancellation affordances; (3) Destination Picker, instantiating the binary 50–split card architecture with “Free Navigation” (voice input) and “Go Home” (saved address) as mutually exclusive full-screen targets; (4) Cognitive Profile Selection, presenting the three-tier Beginner, Intermediate and Expert selection cards with individual radio indicators and a full-width “Next” footer button. Interactive boundaries and primary affordances are outlined using KREUL Pluster & Liner Pen puffy paint, whose raised relief upon drying enables haptic detection of interactive element boundaries by users with visual impairment, extending the standard paper prototyping method to the application’s target demographic.
Figure 1. Low-fidelity paper prototype of the LAZAR application constructed using hand-drawn ink sketches on individual paper cards. From left to right: (1) LazarFragment Idle State, displaying the “TAP TO START” status label and the oversized camera activation button as the sole interactive target; (2) LazarFragment Active Perception State (“VISION ACTIVE”), showing the central diamond-shaped directional glyph, the “1.5 m” proximity readout, and the spatially separated “STOP ROUTE” and “✕” cancellation affordances; (3) Destination Picker, instantiating the binary 50–split card architecture with “Free Navigation” (voice input) and “Go Home” (saved address) as mutually exclusive full-screen targets; (4) Cognitive Profile Selection, presenting the three-tier Beginner, Intermediate and Expert selection cards with individual radio indicators and a full-width “Next” footer button. Interactive boundaries and primary affordances are outlined using KREUL Pluster & Liner Pen puffy paint, whose raised relief upon drying enables haptic detection of interactive element boundaries by users with visual impairment, extending the standard paper prototyping method to the application’s target demographic.
Ai 07 00206 g001
Figure 2. LAZAR onboarding welcome screen (Step 1 of 5). The full-width “Get Started” call-to-action anchored to the lower quarter of the display exemplifies the macro-gestural interaction paradigm: a touch target spanning approximately 80% of the screen’s horizontal extent eliminates spatial precision requirements for activation, in direct application of Fitts’s Law. The LAZAR logotype rendered in the primary chromatic token (E7F434) against the dark background (373635) achieves the 9.97:1 contrast ratio exceeding WCAG 2.2 Level AAA (7:1) criteria.
Figure 2. LAZAR onboarding welcome screen (Step 1 of 5). The full-width “Get Started” call-to-action anchored to the lower quarter of the display exemplifies the macro-gestural interaction paradigm: a touch target spanning approximately 80% of the screen’s horizontal extent eliminates spatial precision requirements for activation, in direct application of Fitts’s Law. The LAZAR logotype rendered in the primary chromatic token (E7F434) against the dark background (373635) achieves the 9.97:1 contrast ratio exceeding WCAG 2.2 Level AAA (7:1) criteria.
Ai 07 00206 g002
Figure 3. Cognitive profile selection screen (Step 2 of 5—“Your Experience Profile”). Three full-width accessibility cards present the Beginner, Intermediate, and Expert profiles through tripartite semantic redundancy: a colour-coded circular indicator, a bold textual label (≥22 sp), and a functional descriptor. Conforming to WCAG 1.4.1 (Use of Colour) [27], chromatic differentiation (green/amber/red) is never the sole distinguishing signal; the entire card surface constitutes the interactive target, eliminating precision-tapping failure modes in compliance with EN 301 549 [28] minimum touch target specifications.
Figure 3. Cognitive profile selection screen (Step 2 of 5—“Your Experience Profile”). Three full-width accessibility cards present the Beginner, Intermediate, and Expert profiles through tripartite semantic redundancy: a colour-coded circular indicator, a bold textual label (≥22 sp), and a functional descriptor. Conforming to WCAG 1.4.1 (Use of Colour) [27], chromatic differentiation (green/amber/red) is never the sole distinguishing signal; the entire card surface constitutes the interactive target, eliminating precision-tapping failure modes in compliance with EN 301 549 [28] minimum touch target specifications.
Ai 07 00206 g003
Figure 4. Physical limitations selection screen (Step 3 of 5) full scroll view. Four multi-select accessibility cards (Wheelchair, Walker/Crutches, Joints, Quick Fatigue) present mobility limitation options through pictographic icons, bold category labels, and single-sentence functional descriptors. The square checkbox widget—semantically distinct from the radio button employed in Step 2—communicates multi-select affordance. The dual footer architecture (primary “Next” button and secondary “SKIP QUESTION” text affordance) resolves the tension between data completeness and interaction burden, with the skip affordance accessible as a distinct focusable node for TalkBack traversal.
Figure 4. Physical limitations selection screen (Step 3 of 5) full scroll view. Four multi-select accessibility cards (Wheelchair, Walker/Crutches, Joints, Quick Fatigue) present mobility limitation options through pictographic icons, bold category labels, and single-sentence functional descriptors. The square checkbox widget—semantically distinct from the radio button employed in Step 2—communicates multi-select affordance. The dual footer architecture (primary “Next” button and secondary “SKIP QUESTION” text affordance) resolves the tension between data completeness and interaction burden, with the skip affordance accessible as a distinct focusable node for TalkBack traversal.
Ai 07 00206 g004
Figure 5. LazarFragment—Idle State. The near-total chromatic void of the primary canvas—perforated only by the “TAP TO START” status pill, the gesture legend (“1 tap describe · 2 tap: mute · hold: pause”), and the oversized camera FAB—constitutes the operational gestural surface. The green perimeter border communicates system readiness through the peripheral visual field, targeting residual scotopic vision. The deliberate absence of additional UI elements minimises extraneous cognitive load (WCAG 2.4, Navigable) during the user’s environmental attention allocation phase.
Figure 5. LazarFragment—Idle State. The near-total chromatic void of the primary canvas—perforated only by the “TAP TO START” status pill, the gesture legend (“1 tap describe · 2 tap: mute · hold: pause”), and the oversized camera FAB—constitutes the operational gestural surface. The green perimeter border communicates system readiness through the peripheral visual field, targeting residual scotopic vision. The deliberate absence of additional UI elements minimises extraneous cognitive load (WCAG 2.4, Navigable) during the user’s environmental attention allocation phase.
Ai 07 00206 g005
Figure 6. LazarFragment—Active Perception State (Safe). Upon activation, the PerceptionAgent renders the directional diamond glyph and a right-turn evasion vector at the vertical centre of the canvas. The green perimeter border encodes the “Safe” system state through peripheral chromatic signalling, complemented by the corresponding haptic and acoustic channels to satisfy WCAG 1.4.1 (Use of Colour) [27]. The red “✕” cancellation button (≈72 dp square) is positioned in the lower-right periphery—a deliberate Fitts’s Law inversion to minimise false-positive navigation cancellations during active operation.
Figure 6. LazarFragment—Active Perception State (Safe). Upon activation, the PerceptionAgent renders the directional diamond glyph and a right-turn evasion vector at the vertical centre of the canvas. The green perimeter border encodes the “Safe” system state through peripheral chromatic signalling, complemented by the corresponding haptic and acoustic channels to satisfy WCAG 1.4.1 (Use of Colour) [27]. The red “✕” cancellation button (≈72 dp square) is positioned in the lower-right periphery—a deliberate Fitts’s Law inversion to minimise false-positive navigation cancellations during active operation.
Ai 07 00206 g006
Figure 7. LazarFragment—Active Perception State (Danger/Hazard Detected). The proposed SecurityAgent behaviour transitions the perimeter border from green to saturated red upon detection of an immediate proximity hazard (1.0 m). The directional glyph continues to prescribe an evasion vector, maintaining navigational guidance continuity during the alert condition. No modal overlay or confirmatory interaction is introduced, in direct compliance with EN 301 549 clauses 4.2.2 and 4.2.3 [28], which prohibit safety state transitions from imposing new interaction requirements upon the user.
Figure 7. LazarFragment—Active Perception State (Danger/Hazard Detected). The proposed SecurityAgent behaviour transitions the perimeter border from green to saturated red upon detection of an immediate proximity hazard (1.0 m). The directional glyph continues to prescribe an evasion vector, maintaining navigational guidance continuity during the alert condition. No modal overlay or confirmatory interaction is introduced, in direct compliance with EN 301 549 clauses 4.2.2 and 4.2.3 [28], which prohibit safety state transitions from imposing new interaction requirements upon the user.
Ai 07 00206 g007
Figure 8. Destination selection screen (Routes Fragment). The binary 50–split card architecture reduces the destination decision space to its cognitive minimum: Free Navigation (voice-commanded input) and Go Home (pre-configured address activation). Each card occupies approximately half the vertical canvas, producing touch targets whose dimensions render Fitts’s Law acquisition costs negligible. The elevation of the Go Home function to a first-class top-level affordance minimises interaction depth for the highest-frequency navigational task in the target demographic, consistent with EN 301 549 clause 4.2.2 (Usage without vision) [28].
Figure 8. Destination selection screen (Routes Fragment). The binary 50–split card architecture reduces the destination decision space to its cognitive minimum: Free Navigation (voice-commanded input) and Go Home (pre-configured address activation). Each card occupies approximately half the vertical canvas, producing touch targets whose dimensions render Fitts’s Law acquisition costs negligible. The elevation of the Go Home function to a first-class top-level affordance minimises interaction depth for the highest-frequency navigational task in the target demographic, consistent with EN 301 549 clause 4.2.2 (Usage without vision) [28].
Ai 07 00206 g008
Figure 9. Route Preview and Transport Configuration Screen. The screen is architecturally divided into an upper map panel (≈45% screen height, route polyline rendered in #E7F434) and a lower route specification panel. The sequential top-to-bottom information hierarchy—destination label → “Today I feel” contextual profile override → transport mode cards (Walking: 4 h 46 min; Bus: 1 h 1 min) → “START NAVIGATION” footer button—maps directly to TalkBack’s linear focus traversal order, constructing a screen-reader-native narrative arc in compliance with WCAG 1.3.2 (Meaningful Sequence). The “Today I feel” dropdown enables real-time profile reconfiguration upstream of navigation commencement, accommodating the episodic variability of disability.
Figure 9. Route Preview and Transport Configuration Screen. The screen is architecturally divided into an upper map panel (≈45% screen height, route polyline rendered in #E7F434) and a lower route specification panel. The sequential top-to-bottom information hierarchy—destination label → “Today I feel” contextual profile override → transport mode cards (Walking: 4 h 46 min; Bus: 1 h 1 min) → “START NAVIGATION” footer button—maps directly to TalkBack’s linear focus traversal order, constructing a screen-reader-native narrative arc in compliance with WCAG 1.3.2 (Meaningful Sequence). The “Today I feel” dropdown enables real-time profile reconfiguration upstream of navigation commencement, accommodating the episodic variability of disability.
Ai 07 00206 g009
Figure 10. Accessibility Social Map fragment. A full-screen OpenStreetMap canvas annotated with community-reported accessibility data. A single floating action button provides the contribution affordance. The map canvas supports TalkBack’s explore-by-touch mode, enabling non-visual spatial orientation through the native Android accessibility framework.
Figure 10. Accessibility Social Map fragment. A full-screen OpenStreetMap canvas annotated with community-reported accessibility data. A single floating action button provides the contribution affordance. The map canvas supports TalkBack’s explore-by-touch mode, enabling non-visual spatial orientation through the native Android accessibility framework.
Ai 07 00206 g010
Figure 11. Live Support screen. The screen’s compositional logic derives from the single-function, maximum-target-size principle: a non-interactive Room ID information card and a full-width “CALL LIVE SUPPORT” primary button (touch height > 80 dp, rendered in #E7F434) with redundant telephony icon and text label. Generous whitespace allocation above and below the interactive elements eliminates adjacency-activation errors for users with upper-limb tremor, exceeding WCAG 2.5.8 (Target Size, Minimum) spacing recommendations.
Figure 11. Live Support screen. The screen’s compositional logic derives from the single-function, maximum-target-size principle: a non-interactive Room ID information card and a full-width “CALL LIVE SUPPORT” primary button (touch height > 80 dp, rendered in #E7F434) with redundant telephony icon and text label. Generous whitespace allocation above and below the interactive elements eliminates adjacency-activation errors for users with upper-limb tremor, exceeding WCAG 2.5.8 (Target Size, Minimum) spacing recommendations.
Ai 07 00206 g011
Figure 12. Settings fragment. A grouped-list architecture organises configurable parameters under five semantic section headings (Language Selection, Cognitive Profile, Physical Limitations, Addresses, Simulator), each rendered in the primary accent at ≈24–26 sp bold. Each Settings item exposes label and current value as a unified TalkBack focusable node, satisfying WCAG 4.1.3 (Status Messages) by communicating configuration state non-visually without requiring item activation. The persistent five-tab bottom navigation bar—with each tab spanning one-fifth of the screen width (≈68–70 dp)—ensures global navigation reachability across all non-critical screens in compliance with WCAG 2.4.1 (Bypass Blocks) and WCAG 3.2.3 (Consistent Navigation).
Figure 12. Settings fragment. A grouped-list architecture organises configurable parameters under five semantic section headings (Language Selection, Cognitive Profile, Physical Limitations, Addresses, Simulator), each rendered in the primary accent at ≈24–26 sp bold. Each Settings item exposes label and current value as a unified TalkBack focusable node, satisfying WCAG 4.1.3 (Status Messages) by communicating configuration state non-visually without requiring item activation. The persistent five-tab bottom navigation bar—with each tab spanning one-fifth of the screen width (≈68–70 dp)—ensures global navigation reachability across all non-critical screens in compliance with WCAG 2.4.1 (Bypass Blocks) and WCAG 3.2.3 (Consistent Navigation).
Ai 07 00206 g012
Figure 13. High-level system architecture map. This diagram illustrates the macro-level connections between the Mobile Edge Environment, External Cloud Services, and the Digital Twin, highlighting the flow of Voice/Gestures, Macro-Routing JSON, and Telemetry.
Figure 13. High-level system architecture map. This diagram illustrates the macro-level connections between the Mobile Edge Environment, External Cloud Services, and the Digital Twin, highlighting the flow of Voice/Gestures, Macro-Routing JSON, and Telemetry.
Ai 07 00206 g013
Figure 14. MAS Topology Diagram. This diagram details the internal JADE framework, illustrating the proposed coordination between the six core agents (Perception, Navigation, Personalisation, Security, Interface, System) and the specific ACL message flows, such as “Obstacles & Depth”, “Veto Requests”, and “Raw Route” data between them.
Figure 14. MAS Topology Diagram. This diagram details the internal JADE framework, illustrating the proposed coordination between the six core agents (Perception, Navigation, Personalisation, Security, Interface, System) and the specific ACL message flows, such as “Obstacles & Depth”, “Veto Requests”, and “Raw Route” data between them.
Ai 07 00206 g014
Table 1. Individual participant profile, SUS score, and overall star rating.
Table 1. Individual participant profile, SUS score, and overall star rating.
ParticipantAgeGenderVision LevelTalkBack LevelSUS ScoreStar Rating
P0138MTotal blindnessIntermediate97.55
P0242FTotal blindnessAdvanced100.05
P0342FLow functional visionBeginner72.55
P0457FTotal blindnessAdvanced90.05
P0559MTotal blindnessIntermediate92.54
P0652MLow functional visionIntermediate85.05
P0785FMonocular (R: 90%, L: 0%)Beginner100.05
P0866FLight/bulk perceptionAdvanced97.55
P0967MLight/bulk perceptionAdvanced92.54
P1039MTotal blindnessAdvanced85.04
P1142FTotal blindnessAdvanced92.54
Mean53.5---91.364.64/5
Table 2. UEQ scale means across all participants (scale: −3 to +3).
Table 2. UEQ scale means across all participants (scale: −3 to +3).
UEQ ScaleMeanSDBenchmark Interpretation
Attractiveness+2.720.50Excellent
Perspicuity+2.480.82Excellent
Stimulation+2.520.82Excellent
Dependability+2.431.05Excellent
Efficiency+2.360.74Excellent
Novelty+2.071.21Excellent
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Herrero-Martín, P.; García-Ballestero, Á. Designing Human-Centred Adaptive AI Navigation for Blind and Visually Impaired Individuals: A Cognitive Load-Aware Framework for Accessible Urban Mobility. AI 2026, 7, 206. https://doi.org/10.3390/ai7060206

AMA Style

Herrero-Martín P, García-Ballestero Á. Designing Human-Centred Adaptive AI Navigation for Blind and Visually Impaired Individuals: A Cognitive Load-Aware Framework for Accessible Urban Mobility. AI. 2026; 7(6):206. https://doi.org/10.3390/ai7060206

Chicago/Turabian Style

Herrero-Martín, Pilar, and Álvaro García-Ballestero. 2026. "Designing Human-Centred Adaptive AI Navigation for Blind and Visually Impaired Individuals: A Cognitive Load-Aware Framework for Accessible Urban Mobility" AI 7, no. 6: 206. https://doi.org/10.3390/ai7060206

APA Style

Herrero-Martín, P., & García-Ballestero, Á. (2026). Designing Human-Centred Adaptive AI Navigation for Blind and Visually Impaired Individuals: A Cognitive Load-Aware Framework for Accessible Urban Mobility. AI, 7(6), 206. https://doi.org/10.3390/ai7060206

Article Metrics

Back to TopTop