Review

Perception and Monitoring of Sign Language Acquisition for Avatar Technologies: A Rapid Focused Review (2020–2025)

Mada Qatar Assistive Technology Center, Doha P.O. Box 24230, Qatar
* Author to whom correspondence should be addressed.
Multimodal Technol. Interact. 2025, 9(8), 82; https://doi.org/10.3390/mti9080082
Submission received: 3 July 2025 / Revised: 7 August 2025 / Accepted: 8 August 2025 / Published: 14 August 2025

Abstract

Sign language avatar systems have emerged as a promising solution to bridge communication gaps where human sign language interpreters are unavailable. However, the design of these avatars often fails to account for the diversity in how users acquire and perceive sign language. This study presents a rapid review of 17 empirical studies (2020–2025) to synthesize how linguistic and cognitive variability affects sign language perception and how these findings can guide avatar development. We extracted and synthesized key constructs, participant profiles, and capture techniques relevant to avatar fidelity. This review finds that delayed exposure to sign language is consistently linked to persistent challenges in syntactic processing, classifier use, and avatar comprehension. In contrast, early-exposed signers demonstrate more robust parsing and greater tolerance of perceptual irregularities. Key perceptual features, such as smooth transitions between signs, expressive facial cues for grammatical clarity, and consistent spatial placement of referents, emerge as critical for intelligibility, particularly for late learners. These findings highlight the importance of participatory design and user-centered validation in advancing accessible, culturally responsive human–computer interaction through next-generation avatar systems.

1. Introduction

According to the World Federation of the Deaf, approximately 70 million deaf people worldwide use sign languages as their first or preferred language [1]. These natural languages embody complex linguistic structures and cultural significance, with research indicating that, when children have access to fluent signers from birth, they acquire sign language on a timeline comparable to spoken language acquisition. Infants demonstrate similar sensitivity to visual linguistic information and achieve parallel developmental milestones [2]. However, access to fluent signers and structured sign language education remains inconsistent across regions and demographics, contributing to disparities in linguistic outcomes. Even where interpreter services are growing, real-time access frequently falls short; avatar-based sign language agents are, therefore, emerging as a scalable bridge.
Sign language acquisition is the critical process through which individuals, particularly those who are deaf or hard of hearing, develop proficiency in their primary means of communication [2]. It denotes how individuals, especially deaf children, learn the structure and use of sign language through interaction and exposure, mastering the phonological, morphological, and syntactic components of a visual–manual language; this process is strongly shaped by the age of first exposure and access to fluent models.
Avatar generation, the creation of signing animations on virtual agents, differs significantly from sign-recognition systems that translate video input into text, as it involves synthesizing linguistically coherent and visually natural movements rather than interpreting them. Research highlights that avatars provide a valuable alternative where human interpreters are unavailable, particularly for round-the-clock communication access, private on-device interaction, and barrier-free content consumption [3]. Studies show that realistic signing avatars require the integration not only of manual signs, but also of nonmanual signals such as facial expressions and torso movement, which are crucial for sign language comprehension [4,5]. This review, therefore, distills the linguistic and perceptual variables that must be captured or rendered to build perceptually faithful sign language avatars.
Within this context, current scholarship has increasingly converged on three interdependent research themes that shape the development and evaluation of signing avatars:
  • Acquisition and AoA Sensitivity: Research consistently shows that individuals exposed to sign language later in life face lasting challenges in syntactic processing, classifier use, and real-time sign comprehension. These effects are especially pronounced when avatars lack contextual cues or display abrupt motion patterns [6].
  • Perceptual Design and Realism: Recent work shows that intrinsic avatar properties, such as movement fluidity, facial expressions, and naturalness, all directly affect user comprehension and trust. Particularly, high-fidelity motion capture avatars are rated more favorably by early-exposed users [6,7,8].
  • Monitoring and Feedback Integration: New research emphasizes user-in-the-loop design using eye-tracking, rating scales, and iterative feedback loops to improve signing avatars. These approaches help identify comprehension bottlenecks and perceptual gaps, especially across a diverse signing population [9].
Building on this foundation, current work probes the perceptual complexity that different learner profiles bring to sign language acquisition, leveraging motion capture (digital tracking of human movement to animate avatars), eye-tracking, and neuroimaging to obtain fine-grained kinematic, attentional, and cortical evidence. Monitoring refers to longitudinal instruments and coding schemes that chronicle developmental milestones, whereas perception studies illuminate how signers visually and somatosensorily process input captured through the same multimodal sensors that drive avatar pipelines [10].
Sign language perception refers to the sensory and cognitive processes through which individuals visually interpret signed communication, including handshapes, movement, facial expressions, and spatial structure [11]. This perceptual ability is shaped by both linguistic experience and visual motion processing systems [12]. Research on sign language perception and acquisition has evolved through the intersection of linguistic, cognitive, and technological innovation. One of the most influential factors is the critical period for language acquisition, which highlights the profound impact of early exposure. Studies consistently demonstrate that the age at which a deaf individual first learns sign language determines their proficiency in phonology, morphology, and syntax, regardless of language modality [2,13,14,15,16]. Late learners, in contrast, often experience increased cognitive load and rely more on visual classifiers than syntactic structures for deriving meaning [13,14,15,16]. Delays in age of acquisition (AoA) likewise influence how avatars should prioritize syntactic cues over lexical glosses for late-exposed users.
Another key area of development lies in the exploration of perceptual and phonological complexity in visual–gestural modality. Research indicates that features such as handshape and movement complexity significantly influence how signs are perceived and produced, with novice signers and young children struggling more with complex signs [17]. Sign languages appear to be adapted for visual perception, with frequently used signs often articulated further from the face to align with the capacities and limitations of the human visual system. These findings reinforce the importance of considering perceptual factors in both pedagogy and interface design. Such perceptual constraints must also inform polygon density choices and handshape interpolation tolerances in signer avatars. Figure 1 showcases a full-sentence rendering in Qatari Sign Language by the avatar “BuHamad,” demonstrating the need for coordinated hand movement, facial expressions, and torso shifts to convey grammatical structure and sentence modality effectively.
Parallel to recognition, gesture-synthesis research now exploits deep-learning inverse kinematics to animate avatars, yet still lacks standard perceptual benchmarks [18,19]. Inverse kinematics refers to computing joint configurations from desired endpoint positions; deep-learning variants predict natural joint motion directly from such endpoints. Recent efforts focus not only on manual signs, but also on integrating non-manual features, such as facial expressions and body posture, which are critical for grammatical and affective content in sign languages. However, challenges persist, especially in developing robust datasets and achieving high accuracy across diverse sign languages and signing styles [18,19].
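To fix the idea of mapping endpoints to joint motion, the minimal sketch below solves the classical analytic inverse kinematics problem for a planar two-link arm; the deep-learning systems cited above learn this mapping from data instead, and the link lengths and target point here are illustrative assumptions.

import math

def two_link_ik(x, y, l1=0.30, l2=0.25):
    """Analytic inverse kinematics for a planar two-link arm.

    Given a hand endpoint (x, y) in metres relative to the shoulder,
    return shoulder and elbow angles in radians. Link lengths l1/l2
    are illustrative, not taken from any dataset in this review.
    """
    d2 = x * x + y * y
    # Law of cosines for the elbow angle; clamp for numeric safety.
    cos_elbow = (d2 - l1 * l1 - l2 * l2) / (2 * l1 * l2)
    cos_elbow = max(-1.0, min(1.0, cos_elbow))
    elbow = math.acos(cos_elbow)
    shoulder = math.atan2(y, x) - math.atan2(l2 * math.sin(elbow),
                                             l1 + l2 * math.cos(elbow))
    return shoulder, elbow

# Example: place the hand 40 cm ahead and 20 cm above the shoulder.
print(two_link_ik(0.40, 0.20))

A learned model replaces this closed-form solution with a network trained on signer motion data, which is what allows it to prefer natural-looking joint configurations among the many that reach the same endpoint.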
Beyond practical applications, research in this field continues to highlight broader linguistic and cognitive insights. Sign language studies contribute to our understanding of iconicity, neural mechanisms underlying language processing, and the emergence of new languages in deaf communities, which are often inaccessible in spoken language research [20]. These findings affirm the cognitive richness of sign languages and their value in linguistic theory and neuroscience.
In light of these developments, it is increasingly clear that monitoring and understanding sign language acquisition are vital not only for ensuring language development and academic success, but also for supporting social inclusion, health, and well-being [2,21,22]. Integrating sign language into educational systems promotes equity and linguistic justice [21,23], while assistive technologies, especially AI-enhanced platforms, offer new opportunities for tailored, inclusive learning experiences [24,25,26]. If left unaddressed, language deprivation can have long-lasting effects on cognitive development, access to services, and community participation [2,21]. Conversely, early acquisition supports bilingualism, enhancing literacy and enabling Deaf individuals to thrive in both sign and written/spoken language environments.
As the field of sign language technology matures, empirical findings from acquisition and perception research play a direct role in shaping avatar development. Movement fidelity, the timing of manual and nonmanual features, and user preferences rooted in age of acquisition or signing fluency all affect how avatars are perceived and accepted by Deaf communities [6]. Realistic facial expression and torso coordination have been repeatedly identified as essential to ensure grammatical accuracy and legibility [8]. Studies also show that small deviations in synthesized sign motion substantially degrade perceived intelligibility, informing the need for motion capture fidelity and biomechanically accurate joint constraints [27]. Furthermore, participatory design models, such as the EASIER project, illustrate how direct signer feedback helps translate linguistic insights into technological benchmarks [28]. Therefore, this review maps empirical findings in sign language research to those variables and capture methods that most directly feed avatar synthesis.
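As a hedged illustration of what biomechanically accurate joint constraints and smooth transitions can mean in an animation pipeline, the sketch below blends two joint-angle keyframes with an ease-in/ease-out curve and clamps each blended angle to a plausible range; the joint names, angles, and limits are hypothetical.

def smoothstep(t):
    """Cubic ease-in/ease-out, so transitions start and end with zero velocity."""
    return t * t * (3.0 - 2.0 * t)

def interpolate_pose(pose_a, pose_b, t, limits):
    """Blend two joint-angle keyframes (dicts of joint -> degrees).

    t runs from 0 to 1 across the transition; each blended angle is
    clamped to a biomechanical range so the avatar never bends a joint
    beyond what a human signer plausibly could.
    """
    s = smoothstep(t)
    blended = {}
    for joint, a in pose_a.items():
        b = pose_b[joint]
        lo, hi = limits[joint]
        blended[joint] = max(lo, min(hi, a + s * (b - a)))
    return blended

# Illustrative keyframes: the end of one sign and the start of the next.
pose_a = {"wrist_flex": 10.0, "elbow_flex": 45.0}
pose_b = {"wrist_flex": -20.0, "elbow_flex": 90.0}
limits = {"wrist_flex": (-70.0, 80.0), "elbow_flex": (0.0, 150.0)}
for step in range(5):
    print(interpolate_pose(pose_a, pose_b, step / 4, limits))

The easing curve avoids the abrupt velocity changes that participants in the reviewed studies repeatedly flagged as unnatural.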
Against this background, we conducted a rapid review (2020–2025) to map which empirical variables and capture techniques most directly feed sign language avatar development. Specifically, our aim was to focus on answering the research question: Which monitored constructs and perceptual metrics best inform intelligible, culturally faithful signer avatars?

2. Related Works

Prior reviews have explored sign language acquisition, sign synthesis, and avatar development, but they typically focus on computational pipelines or linguistic theory without connecting empirical signer data to avatar design variables. For example, [29] provided a 40-year systematic review of avatar systems, noting the lack of a standardized development framework and limited integration of perceptual benchmarks from signers themselves. Similarly, Reference [6] found that the acceptability of signing avatars varies based on the users’ age at sign language acquisition, with movement naturalness and expression fidelity being key perceptual factors; however, their findings have not been systematically translated into avatar evaluation metrics.
On the evaluation front, Reference [30] emphasized the need for user-centered assessment in avatar systems, and introduced data-driven techniques that were evaluated directly by Deaf signers. Reference [31] used motion synthesis and phonological recombination, but their work emphasized animation flexibility rather than perceptual fidelity from a user study perspective.
Our review targets empirical studies on sign language acquisition, perception, and neural/behavioral responses, focusing only on primary research (not technical pipelines or reviews). Our goal is to extract human-derived constructs, such as gaze, facial expression, classifier fluidity, and EEG signatures, that can inform avatar fidelity and perceptual realism benchmarks. In this way, we uniquely bridge psycholinguistics and assistive technology through evidence-based avatar design.

3. Materials and Methods

3.1. Rapid Review Methodology

This study utilizes a rapid review methodology to provide timely insights rather than comprehensive systematic coverage. While systematic reviews provide in-depth analyses, they typically require months or even years to complete, making them unsuitable for fast-paced advancements in this area. To address this, a rapid review approach was adopted. According to Tricco et al. [32], a rapid review is a condensed and focused form of knowledge synthesis where aspects of the systematic review process are streamlined or omitted, allowing for quicker insights. This rapid review adhered broadly to PRISMA-RR guidelines, explicitly prioritizing speed and timeliness of synthesis [33,34].

3.2. Search Strategy

A comprehensive search was conducted in early 2025, covering publications from January 2020 to March 2025. The search focused on journal articles, conference proceedings, and other scholarly sources published within the specified period. The following electronic databases were searched:
ScienceDirect;
IEEE Xplore;
Google Scholar;
ACM Digital Library;
Springer Link.
Search Terms and Syntax.
This review followed structured, transparent search practices, in line with rapid review methodology, as outlined by [35], prioritizing conceptual focus, reproducibility, and reduced screening volume. We used three core search terms:
  • “Sign Language Acquisition”;
  • “Sign Language Perception”;
  • “Sign Language Acquisition Monitoring”.
These terms were selected based on their direct relevance to the review’s aim: to identify empirically grounded perceptual and monitoring constructs in human sign language use that could inform the development and evaluation of signing avatars. A domain expert in sign language linguistics reviewed and validated the term selection to ensure conceptual fit. The search strategy employed a combination of keywords and Boolean operators to capture relevant studies. The search strategies were adapted to the specific functionalities and indexing of each database to ensure comprehensive coverage. These terms were used in various combinations across databases, tailored to each platform’s search capabilities, as seen in Table 1 below.

3.3. Inclusion and Exclusion Criteria

Studies were included if they met the following criteria:
  • Publication Date: Published between January 2020 and March 2025.
  • Language: English.
  • Content Focus: Addressed sign language acquisition and/or perception in various contexts.
  • Publication Type: Peer-reviewed academic sources, including journal articles and conference proceedings.
  • Avatar Relevance: Study must report at least one empirical metric that can feed avatar design.
The exclusion criteria encompassed the following:
  • Non-Scholarly Articles: Such as those from social media and popular press.
  • Not Peer Reviewed: Theses and books.
  • Age: Studies on infants and newborns were excluded because avatar work targets literate children and adults.
  • Language: Publications in languages other than English.
  • Literature Reviews: Used for contextual background, but excluded from final data synthesis to avoid redundancy.
Studies were excluded due to “participant mismatch” when participants were infants or newborns, as our review focuses on findings applicable to avatar systems intended for literate children and adults. “Ineligible publication types” included non-peer-reviewed sources such as theses and books, as well as non-scholarly content from the popular press or social media. Additionally, studies were excluded if their primary focus was medical rather than linguistic, cognitive, or educational, as this review does not address medical diagnostics or interventions.

3.4. Study Selection, Data Extraction, and Verification

A total of 274 records were identified using systematic searches in five major academic databases: ScienceDirect (n = 37), IEEE Xplore (n = 125), Google Scholar (n = 44), ACM Digital Library (n = 5), and SpringerLink (n = 63). No additional records were retrieved through hand-searching or citation chaining.
Duplicate records were identified and removed using Zotero reference management software (Version 6.0.22), resulting in 269 unique records. These were exported to Microsoft Excel for structured screening and tracking.
Titles and abstracts of the 269 records were screened based on the inclusion and exclusion criteria. Of the 269 records remaining after duplicate removal, 144 were excluded during title and abstract screening for not meeting inclusion criteria. An additional 48 records were excluded due to ineligible publication types. Full texts were retrieved for the remaining 77 records and were assessed for eligibility. Sixty articles were excluded at the full-text stage for the following reasons: review papers (n = 4), ineligible publication types (n = 5), participant mismatch (n = 4), non-English language (n = 1), out-of-scope content (n = 45), and inaccessible documents (n = 1). In total, 17 studies met all of the inclusion criteria and were retained for synthesis. During full-text screening, every study had to report at least one empirical metric that can feed avatar synthesis and discuss or imply how that metric maps onto visual–gestural rendering. While only 17 studies met the inclusion criteria from an initial 274 records, this reflects the narrow scope of this review, which prioritized primary empirical research on acquisition or perceptual processing in human sign language users. This exclusion rate is consistent with rapid review protocols, where focus and relevance are prioritized over volume.
Data from each included study were extracted using a structured Excel template. Extracted variables included author and year of publication, country of study, participant characteristics (age, hearing status, and signer profile), sign language studied, research design, monitoring or perception technologies used, and key findings. Because rapid reviews sacrifice some double-screening for speed, we adopted a light-touch reliability check: a second reviewer re-screened 10% of abstracts (n = 27) and 10% of full texts (n = 7). Agreement on the avatar relevance flag and inclusion decision reached 87% (simple percent agreement). All discrepancies were resolved through discussion within one hour, after which no further calibration was required.
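Since the reliability check hinges on simple percent agreement, a minimal sketch of that computation follows; the reviewer labels below are hypothetical, not the actual screening decisions (which covered 27 abstracts and 7 full texts).

def percent_agreement(labels_a, labels_b):
    """Simple percent agreement between two reviewers' include/exclude calls."""
    if len(labels_a) != len(labels_b):
        raise ValueError("reviewers must rate the same items")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)

# Hypothetical double-screened abstracts (True = include).
reviewer_1 = [True, True, False, False, True, False, True, True]
reviewer_2 = [True, False, False, False, True, False, True, True]
print(f"{percent_agreement(reviewer_1, reviewer_2):.0%}")  # 88%

Unlike chance-corrected statistics such as Cohen's kappa, simple percent agreement does not adjust for agreements expected by chance, which is why it is typically reserved for light-touch checks like the one used here.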

3.5. Quality Appraisal

Given the rapid nature and time-constrained methodology of this review, a formal, standardized quality appraisal of the included studies was not undertaken. This decision aligns with the scope and purpose of rapid reviews, which prioritize the timely synthesis of evidence to inform research, policy, or practice, often in contexts where immediate guidance is needed. While this approach limits the ability to rigorously assess the methodological robustness of each individual study using established appraisal tools, efforts were made to maintain critical awareness of study quality throughout the review process.

3.6. Data Synthesis

Due to methodological heterogeneity and the rapid nature of this review, data synthesis was conducted narratively rather than via meta-analysis.

4. Results

This section synthesizes the key findings from the seventeen empirical studies that examined the monitoring and perception of sign language acquisition, with attention to participant characteristics, methodological approaches, targeted linguistic features, evaluative metrics, and evidence of effectiveness across developmental and learner contexts. The complete study selection process is illustrated in Figure 2, which presents the flow diagram outlining the identification, screening, eligibility, and inclusion stages.
Table 2 below lists the included studies, their year of publication, and the venue in which each was published.
The purpose of this review was to explore various aspects of sign language acquisition, with a focus on the age of acquisition, the methodologies used, and the effectiveness of language learning techniques. The following subsections present the key findings from the reviewed papers.

4.1. Participant Groups and Age Ranges in Sign Language Acquisition Studies

To contextualize the findings of the 17 included studies, we first categorized participant populations based on their age of sign language acquisition and hearing status. This section, therefore, presents a comparative synthesis of the results for native signers, late signers, hearing second-language learners, and mixed control groups, highlighting how each group’s characteristics inform perceptual metrics and avatar design considerations.

4.1.1. Native Signers

Native signers are individuals exposed to sign language from birth, typically through Deaf parents. They serve as a crucial baseline in sign language acquisition studies due to their early and natural language input. Their performance often reflects native-like grammar, vocabulary, and discourse development, making them a benchmark for comparison across age-of-acquisition and modality studies. Such participants were included in studies involving German Sign Language [36], Austrian Sign Language [6], Turkish Sign Language [40], Polish Sign Language [45], American Sign Language [46,48], and Kazakh–Russian Sign Language (K-RSL) [30,49]. In Imashev et al. [30], native signers were included among the twelve Deaf participants, three of whom reported using sign language exclusively from birth at home. Benchmark groups of native signers were also featured in studies on the Sign Language of the Netherlands [39,44].

4.1.2. Late Signers

Late signers are individuals who acquire sign language after early childhood, often due to being born to hearing parents with limited or delayed access to signing. These participants illustrate the developmental consequences of delayed linguistic exposure, such as reduced syntactic fluency or lexical access. Studies featuring late signers include research on Bhutanese Sign Language [37], Turkish Sign Language [40,41], Austrian Sign Language [14,41], and American Sign Language [46,48]. Reference [49] also included late-exposed Deaf signers and adult learners from Kazakhstan and Russia, providing feedback on avatar performance and comprehension. Reference [30] specifies that most participants acquired K-RSL during kindergarten or school-age years, with several reporting no Deaf family members and limited signing exposure at home.

4.1.3. Hearing M2L2 Learners of Sign Language

M2L2 learners are hearing individuals who acquired sign language as a second language and in a second modality (spoken to visual–gestural). These learners are often interpreter trainees or students, and provide insight into cross-modal learning and second-language strategies. Studies involving this group examined learners of British Sign Language [38], Sign Language of the Netherlands [39,44,47], American Sign Language [43], Polish Sign Language [45], and Kazakh–Russian Sign Language [49]. Imashev et al. included certified interpreters and CODAs (children of Deaf adults) evaluating avatar technologies, providing cross-modal perception insights [49]. In Imashev et al. [30], three interpreters participated in the study: one as a human agent; one as a facilitator; and one as a co-researcher, contributing insights from both linguistic expertise and cross-modal familiarity. One human agent was also a CODA with extensive experience in Deaf education and broadcast interpretation.

4.1.4. Mixed and Control Groups

Several studies employed mixed or control groups to compare language performance across diverse exposure histories. For example, research on co-speech gesture in ASL included bilinguals, L2 signers, and non-signers [42], while benchmark groups of Deaf and hearing teachers were included in studies on the Sign Language of the Netherlands [39,44]. Imashev et al. implemented a mixed-group design in both its online and in-person studies, comparing feedback from Deaf signers, interpreters, and novice learners of K-RSL [49]. Imashev et al. [30] used a well-defined mixed group of 12 Deaf signers with varied age of exposure and education levels, alongside interpreter participation. This diversity enabled the team to explore avatar perception across age, modality, and linguistic backgrounds.

4.2. Summary of Methodological Approaches and Aspects Examined Across Studies

The 17 reviewed studies utilized a diverse range of methodologies to examine sign language acquisition and perception, including narrative elicitation, reaction-time experiments, neuroimaging, longitudinal tracking, and qualitative case studies. This section groups the methodologies into thematic categories and outlines the specific linguistic and cognitive aspects each approach targeted.
Narrative elicitation was the most frequently used method. Reference [36] developed the NaKom DGS-Test to assess narrative skills in DGS, while Reference [41] used Tom and Jerry clips to examine reference tracking strategies in Turkish Sign Language. Reference [40] employed the Spider Story video to study referent introduction in Deaf adults, and Reference [42] explored the influence of ASL acquisition on co-speech gesture production during English narration of cartoon clips. Reference [43] coded multiple types of depicting signs in retellings of cartoon stimuli in ASL. In Imashev et al. [30], brief narrative-style signing sequences were performed by avatars and a human signer using simple, everyday phrases in K-RSL. Sentence content was carefully balanced for handshape and structural complexity to ensure comparability across agent types.
Acceptability judgment tasks and reaction-time measures were key components in studies such as [13,14], where participants rated grammaticality and processing of syntactic structures in ÖGS. Reference [46] used lexical decision tasks to assess the recognition of ASL signs and non-signs, focusing on the influence of early exposure and phonological properties. Reference [48] used eye-tracking in ASL sentence comprehension to measure real-time lexical access based on degree and type of phonological similarity. Their methodology included a visual world paradigm with time-locked divergence point analysis and gaze ratio windowing to assess recognition efficiency across early and late signers.
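To make the divergence point analysis and gaze ratio windowing concrete, the minimal sketch below computes per-bin log gaze ratios and a simplified threshold-based divergence point; the fixation counts, bin width, threshold, and run length are illustrative assumptions (the cited study used bootstrap-based estimation rather than a fixed threshold).

import math

def log_gaze_ratio(target_fix, competitor_fix, eps=0.5):
    """Per-time-bin log ratio of target vs. competitor fixations.

    Positive values mean gaze favours the target; eps is additive
    smoothing so empty bins do not produce log(0).
    """
    return [math.log((t + eps) / (c + eps))
            for t, c in zip(target_fix, competitor_fix)]

def divergence_point(ratios, threshold=0.5, run=3):
    """First bin index where the ratio stays above threshold for `run`
    consecutive bins -- a simplified stand-in for bootstrap-based
    divergence point analysis."""
    for i in range(len(ratios) - run + 1):
        if all(r > threshold for r in ratios[i:i + run]):
            return i
    return None

# Hypothetical fixation counts per 50 ms bin after sign onset.
target     = [2, 3, 3, 5, 8, 9, 10, 11]
competitor = [3, 3, 4, 4, 3, 2, 1, 1]
print(divergence_point(log_gaze_ratio(target, competitor)))  # prints 4

A later divergence point for one participant group indicates slower target identification, which is how such analyses expose the processing delays reported for late signers.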
Imashev et al. also employed eye-tracking using Tobii Pro Glasses 2 to analyze gaze fixation and attention patterns during avatar interaction [30]. Participants’ focus areas (e.g., face vs. hands) were used to infer perceived fluency and comprehension across different agent types.
Longitudinal designs were employed by [38], who assessed BSL proficiency and cognitive skills across four testing sessions over three years in interpreting students, and by [44], who tracked classifier construction acquisition in learners of NGT through 15 sessions over two years. Reference [47] used both spontaneous and elicited data to study plurality strategies over a one-year NGT learning period.
Neuroimaging and neurostimulation techniques were used in [14], who utilized EEG to examine N400 responses to syntactic variation in ÖGS, and in [45], who combined fMRI and TMS to analyze lexical processing in learners of PJM.
Qualitative approaches were adopted by [37], who used in-depth semi-structured interviews, participant observation, and the collection of visual data, followed by thematic analysis, in a case study of delayed sign language acquisition’s effect on writing development. Reference [36] also implemented community-based participatory research methods and adhered to GDPR-compliant data protection protocols, including pseudonymization, encrypted storage, and layered consent.
Imashev et al. incorporated qualitative feedback through interpreter-facilitated debriefing during and after participant interaction [30]. Users shared detailed reflections on avatar comprehension, signing quality, emotional impact, and visual appearance (e.g., “robot-like,” “realistic,” or “unpleasant”), which were logged and thematically analyzed alongside quantitative data. Furthermore, Reference [30] introduced a multimodal, user-centered protocol combining the following:
  • The Godspeed Questionnaire adapted with thermometer-style visual analog scales and culturally relevant clip-art for accessibility.
  • The Funometer scale to measure emotional state shifts before and after exposure to each agent.
  • A sorting task using animated GIFs of agents for categorizing perceived animacy, anthropomorphism, intelligence, and likeability.
The inclusion of these visual, accessible, and linguistically appropriate instruments highlights methodological innovation in avatar perception studies, particularly for diverse Deaf populations.
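As one hedged illustration of how the Funometer’s pre/post comparison could be scored, the sketch below averages mood shifts per agent across participants; the agent labels, the 0–100 scale, and all ratings are invented for this example.

from statistics import mean

def mean_mood_shift(ratings):
    """Average pre/post Funometer change per agent across participants.

    ratings: agent -> list of (pre, post) pairs on a 0-100 scale.
    Negative values mean participants felt worse after the interaction.
    """
    return {agent: mean(post - pre for pre, post in pairs)
            for agent, pairs in ratings.items()}

# Hypothetical ratings for three signing agents.
ratings = {
    "human":            [(72, 78), (65, 70), (80, 82)],
    "mocap_avatar":     [(70, 66), (68, 69), (74, 71)],
    "synthetic_avatar": [(68, 51), (66, 58), (71, 60)],
}
print(mean_mood_shift(ratings))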

4.3. Measurement Constructs Across Studies

Across the reviewed studies, investigators employed a mix of indicators to track sign-language acquisition and processing. In [48], researchers used a combination of gaze-based divergence point analysis and log gaze ratios to quantify sentence-level phonological interference. Their approach enabled fine-grained analysis of how shared handshape, location, and movement features impacted lexical access for early and late ASL signers. Four studies augmented behavioral tasks with neurophysiological metrics (e.g., N400 event-related potentials, functional MRI, and transcranial magnetic stimulation) to capture on-line processing effort [13,14,45,46]. Eight studies relied primarily on fine-grained discourse coding, most often narrative retell tasks, to quantify grammatical accuracy, classifier use, referential cohesion, and spatial agreement marking [36,39,40,41,42,43,44,47]. Interpreter training work added visuospatial memory and mental rotation tests as predictive aptitude measures [38], whereas one clinical case study focused on tri-lingual writing outcomes following late first-language exposure [37]. Two studies used mixed-methods frameworks to evaluate avatar-based sign language presentation [30,49], combining subjective perception scales, comprehension tasks, and in-person eye-tracking or emotional state monitoring to assess intelligibility, human likeness, and usability among Deaf participants. Table 3 below presents a detailed breakdown of the methodologies, aspects examined, and key outcome indicators for each study.

4.4. Comprehensive Synthesis of Findings

Across the reviewed studies, several consistent patterns and notable differences were observed in relation to the timing of sign language acquisition, cognitive processing, linguistic development, and gesture use.

4.4.1. Age of Acquisition and Its Impact on Sign Language Perception

Late acquisition of sign language was associated with increased cognitive load, particularly during syntactic and information structure analysis [14]. Delayed exposure also negatively affected literacy-related linguistic competencies, especially in the development of written second languages [37]. Age-related effects were reported, with older learners responding more slowly and showing variations in grammatical acceptability judgments [13]. A study further revealed that late learners struggled more with identifying target signs when these were preceded by similar-looking signs (phonological primes) [48]. This delay occurred regardless of which phonological feature overlapped (handshape, location, or movement), although location overlap proved particularly disruptive. In contrast, studies on referent introduction showed that late learners performed comparably to native signers in using nominal and extension classifiers, with Deaf-of-Deaf signers displaying subtle advantages in spatial reference tracking [40,41]. These findings suggest that avatars intended for late-acquiring users should prioritize simplified syntactic cues and clearer visual–spatial representations to reduce cognitive load and facilitate comprehension.
Imashev et al. supported these findings by showing that late-exposed Deaf users reported difficulty interpreting avatars lacking facial expressions and smooth motion transitions, especially in the absence of contextual redundancy [49]. Deaf participants showed more skepticism toward avatar technologies than interpreters, reinforcing that early exposure may shape tolerance for imperfect signing agents. Imashev et al. further demonstrated that Deaf participants with later sign language acquisition histories were more likely to express discomfort, mistrust, or negative emotional responses toward avatars with abrupt or unnatural motion [30]. Eye-tracking data indicated heightened fixation on the hands over the face for these users, suggesting compensatory attention strategies when facial grammar was absent or unclear. Mood tracking (Funometer) also revealed greater negative emotional shifts in this group following avatar interaction, underscoring the perceptual and affective impact of age of acquisition.

4.4.2. Spatial Grammar and Classifier Constructions

Gesture production studies found that second-language signers produced more iconic manual gestures than native signers, suggesting differences in gesture systems based on experience [42]. Depicting signs (DS) were acquired with relative ease and correlated with sign language comprehension [43]. Learners demonstrated early use of plural referent expressions, using both familiar and novel strategies [47].
Studies on classifier acquisition reported early emergence of meaningful handshape representations, despite initial difficulties in orientation and handshape accuracy [39]. Persistent difficulties were noted in acquiring verb agreement, including errors in movement, orientation, and referent location [44].
In [49], participants evaluated the performance of avatar agents on sentence-level comprehension tasks involving K-RSL signs of varying complexity. Classifier accuracy, especially for plural referents and complex spatial constructions, was rated lower in avatars compared to human agents. Deaf participants consistently identified misarticulations in spatial verb agreement and pointed out the importance of facial grammar for disambiguating sentence types. Building on this, Imashev et al. confirmed that classifier constructions were among the most commonly flagged issues in avatar signing [30]. Participants emphasized that incorrect handshape transitions and movement paths disrupted sentence interpretation, especially when referencing multiple entities or describing spatial arrangements. Sorting tasks and qualitative interviews revealed that avatars lacking clear classifier structures were perceived as “robotic” or “unnatural.” Eye-tracking heatmaps also showed reduced facial fixation and increased attention to classifier articulation, particularly in synthetic agents with high motion jerkiness.
For effective avatar communication, precise rendering of classifier constructions, verb agreement, and spatial grammar features must be prioritized, ensuring linguistic accuracy and perceptual clarity.
The study by Wienholz and Lieberman also found that signers process phonological parameters with varying sensitivity [48]. Early signers exhibited faster recognition when signs shared location alone, while late signers showed inhibition not just for location, but also for the combined location and movement overlap. This reinforces the idea that perceptual salience and parameter complexity can interact differently with linguistic background, further emphasizing the need for careful visual–spatial design in sign avatars.

4.4.3. Methods for Monitoring and Perceiving Sign Acquisition

As summarized in Section 4.3, investigators tracked sign-language acquisition and processing through a mix of behavioral, neurophysiological, and discourse-based indicators. These empirical monitoring techniques, including motion capture, eye-tracking, and neuroimaging, provide quantifiable metrics that are critical for verifying the perceptual fidelity and realistic animation of avatars. Figure 3 illustrates this with a timelapse and motion trajectory of the avatar “BuHamad” signing the word “World” in Qatari Sign Language. This visualization captures the importance of smooth and accurate motion paths in enhancing sign clarity, particularly in the absence of human-like fluidity.
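Smoothness of motion paths like the one in Figure 3 is often quantified through jerk, the third derivative of position; the sketch below estimates mean jerk magnitude for a single joint by finite differences, assuming a fixed sampling rate, with a hypothetical wrist trajectory.

def mean_jerk(positions, dt):
    """Mean jerk magnitude (third derivative of position) for one joint.

    positions: list of (x, y, z) samples at a fixed interval dt seconds.
    Lower values indicate smoother motion; abrupt synthetic transitions
    show up as jerk spikes.
    """
    def diff(series):
        return [tuple((b[i] - a[i]) / dt for i in range(3))
                for a, b in zip(series, series[1:])]

    jerk = diff(diff(diff(positions)))
    mags = [sum(c * c for c in j) ** 0.5 for j in jerk]
    return sum(mags) / len(mags)

# Hypothetical wrist trajectory sampled at 100 Hz (dt = 0.01 s).
trajectory = [(0.00, 0.00, 0.00), (0.01, 0.00, 0.00), (0.03, 0.01, 0.00),
              (0.06, 0.03, 0.00), (0.10, 0.06, 0.01), (0.15, 0.10, 0.02)]
print(mean_jerk(trajectory, dt=0.01))

Comparing jerk between captured human motion and synthesized avatar motion is one plausible way to operationalize the “motion jerkiness” that participants flagged in the reviewed studies.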
Visuospatial working memory was strongly associated with sign language proficiency and performance in interpreter training programs [38]. Neurological studies in hearing learners identified differential hemisphere activation patterns following sign language instruction, linked to visuospatial and phonological processing [45].
Phonological processing studies indicated that lexical organization by form was present, with slower recognition times for signs in dense phonological neighborhoods, especially for low-frequency items [46]. Wienholz and Lieberman extended this line of inquiry using eye-tracking and found that phonological similarity, defined across six specific parameter combinations, had inhibitory or facilitative effects depending on the signer’s age of ASL acquisition [48]. Their methodology isolated both degree (number of shared features) and type (which features are shared) using naturalistic ASL sentences, providing new insights into sentence-level phonological activation.
Imashev et al. introduced new multimodal evaluation strategies specific to avatar-user interaction [49]. These included an adapted Funometer scale to track emotional responses before and after each signing agent interaction, and an eye-tracking setup to compare focus on manual versus non-manual features. Their results indicated strong attention toward facial regions when available, especially in determining sentence modality. Translation accuracy scores varied by agent, with data-driven avatars outperforming manually animated ones in some tasks, but still falling behind human agents. Reference [49] also reported that comprehension difficulty was higher for sign sequences lacking natural transitions and mouth movements, and participants often relied on interpreters for questionnaire clarification, demonstrating the need for Deaf-friendly multimodal data collection methods. Building on this, Reference [30] employed a more structured and culturally adapted protocol that integrated thermometer-style Godspeed visuals, customized Likert items in K-RSL, and a sorting task using animated GIFs to compare agents across dimensions like animacy and intelligence. In-person eye-tracking using Tobii Pro Glasses 2 provided high-resolution data on participant gaze behavior, revealing consistent shifts between facial and manual cues based on agent fluency. The study also tracked emotional changes using a modified Funometer, which captured pre/post interaction affective states and identified increased negative mood when participants interacted with avatars rated as unnatural. These tools, coupled with interpreter-supported debriefing and translated instructions, emphasized the importance of accessible, multimodal evaluation in Deaf-centric avatar studies.
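The face-versus-hands gaze comparisons described above can be summarized as fixation shares per area of interest (AOI); the sketch below assumes gaze samples have already been mapped to AOI labels, a step that real eye-tracking pipelines derive from raw gaze coordinates.

from collections import Counter

def aoi_proportions(gaze_samples):
    """Share of gaze samples falling on each area of interest (AOI).

    gaze_samples: list of AOI labels ("face", "hands", "other") assigned
    to successive eye-tracker samples during one signing clip.
    """
    counts = Counter(gaze_samples)
    total = sum(counts.values())
    return {aoi: counts[aoi] / total for aoi in counts}

# Hypothetical sample stream while watching a fluent agent.
samples = ["face"] * 62 + ["hands"] * 30 + ["other"] * 8
print(aoi_proportions(samples))  # {'face': 0.62, 'hands': 0.3, 'other': 0.08}

A drop in the face share for a given agent would be consistent with the compensatory hand-focused attention reported for late-exposed signers viewing avatars without clear facial grammar.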
Overall, these findings highlight how cognitive, linguistic, and neurocognitive factors interact in sign language learning, with critical implications for curriculum design and early intervention.

5. Discussion

This rapid review synthesizes contemporary empirical research on sign language acquisition and perception (2020–2025), with implications for avatar design. By examining seventeen studies published between January 2020 and March 2025, we observed several recurring patterns and innovations that contribute to a more nuanced understanding of sign language learning, particularly in relation to age of acquisition, modality effects, and instructional settings.

5.1. Age of Acquisition and Linguistic Development

As detailed in Section 4.4, AoA is a dominant predictor of linguistic outcomes, with late signers showing significant deficits in syntactic processing, classifier use, and metalinguistic judgments across languages [13,14,40]. Studies using EEG and behavioral measures have shown that late learners experience increased cognitive load and reduced fluency in both Austrian and Turkish Sign Languages [13,14,45]. These difficulties are especially pronounced in technologically mediated settings, where subtle timing and grammatical cues may be less salient.
Research using eye-tracking and behavioral paradigms reveals that late signers exhibit increased competition in lexical access, particularly when signs share phonological parameters, such as handshape or movement [30,48,49]. Early signers rapidly identify target signs, while late signers are delayed when there is phonological overlap, pointing to a persistent “phonological bottleneck” due to reduced automaticity in mapping form to meaning [49].
Studies evaluating avatar-based signing agents show that late-exposed signers face increased comprehension difficulties when avatars lack facial expressions or natural movement transitions [49]. Comprehension is particularly reduced for decontextualized or complex utterances, and late signers often report frustration, mistrust, and emotional discomfort, especially when multimodal cues are unclear [30]. Eye-tracking and mood tracking reveal that late signers compensate by shifting attention from the avatar’s face to the hands, highlighting a need for avatars to provide salient, redundant cues to support comprehension and engagement.
Despite overall challenges, some late signers attain discourse-level skills, such as referent tracking and narrative cohesion, when given consistent exposure, even later in life [37,38]. Some are able to interpret high-complexity avatar signing with appropriate support, suggesting that functional fluency is achievable for a subset of late learners [49].
While early exposure to sign language remains the most effective foundation for grammatical development, targeted support can help late learners achieve practical proficiency. User feedback and empirical findings emphasize the value of avatar designs with smooth motion, expressive facial features, and contextual scaffolding, especially for late-acquiring users [30]. Tailoring avatar interaction to user profiles, by adjusting syntax, visual cues, and facial grammar, can enhance both communication and equitable access.

5.2. Cognitive and Neurocognitive Correlates

Behavioral and neuroimaging studies show that lexical organization by phonological similarity slows sign recognition, particularly for late learners, and that these effects are parameter-specific and time-sensitive [46,48]. Early and late signers interpret shared perceptual features differently, reflecting the influence of language experience and pointing to modality-independent processing mechanisms.
Neuroimaging and behavioral evidence highlight the importance of visuospatial working memory and perceptual–motor skills in sign language learning [30,43,50]. Distinct patterns of brain activation are found in learners versus native signers, and greater cognitive effort is needed when visual cues are reduced or ambiguous. When avatars display abrupt transitions or omit facial markers, late learners report needing exaggerated spatial distinctions and slower signing speeds to maintain comprehension [51]. Eye-tracking and mood tracking further indicate increased cognitive allocation and reduced emotional positivity in these contexts.
These findings indicate that avatar design must reflect perceptual and cognitive variability through clarity of movement, precision in parameter rendering, and consistent timing cues. Avatar systems should also include options for user-controlled signing pace and facial expression intensity to accommodate different cognitive processing profiles.
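One hypothetical way such user-controlled options could be exposed in an avatar renderer is a small settings object, sketched below; the field names, defaults, and late-learner preset are illustrative assumptions rather than any cited system’s API.

from dataclasses import dataclass

@dataclass
class AvatarPlaybackSettings:
    """User-adjustable rendering options suggested by the reviewed findings.

    All field names and defaults are illustrative, not from any cited system.
    """
    signing_speed: float = 1.0     # 1.0 = recorded tempo; < 1.0 slows signing
    facial_intensity: float = 1.0  # scales nonmanual expression amplitude
    transition_ms: int = 200       # minimum blend time between signs
    exaggerate_space: bool = False # widen spatial contrasts for late learners

    def for_late_learner(self):
        """A conservative preset for late-exposed users, per Section 5.2."""
        return AvatarPlaybackSettings(signing_speed=0.8, facial_intensity=1.2,
                                      transition_ms=300, exaggerate_space=True)

print(AvatarPlaybackSettings().for_late_learner())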

5.3. Importance of Perceptual Fidelity and Spatial Grammar

Gestural and iconic strategies play a critical role in supporting sign language learning, particularly for hearing and L2 signers [42,43,51,52,53]. The use of depicting signs and iconic overlap between gestures and signs facilitates vocabulary acquisition and comprehension, while learners with prior experience in sign language or gestural modalities show faster progress.
Learners, especially L2 and late learners, often face persistent difficulties with classifier constructions and agreement verbs, including errors in handshape, orientation, and spatial marking [39,44,54]. Such challenges are found across different sign languages and are particularly evident in verb agreement contexts.
Avatar-generated signing is often rated as less intelligible when classifier transitions are abrupt, pace is too rapid, or facial grammar is limited [49]. Users may misinterpret referent roles or verb directionality when torso shifts and facial cues are missing, and persistent errors in spatial grammar can diminish both comprehension and emotional engagement [30,50]. Perceptual fidelity (smooth transitions, natural facial expressions, and clear spatial referencing) is therefore essential for both intelligibility and trust.
These findings point to the need for extended, form-focused training in morphophonological contrasts and spatial grammar to facilitate acquisition. Thus, perceptual fidelity, including smooth transition dynamics, parameter consistency, and facial movement rendering, must be prioritized in avatar development. Avatar design should explicitly incorporate validated spatial linguistic markers to ensure both linguistic intelligibility and user trust among Deaf and hard-of-hearing communities.

5.4. Implications for Educational Practice and Accessibility

The reviewed studies highlight several critical implications for pedagogy and policy. First, early exposure to natural sign language remains paramount for robust language development and academic success, as exemplified in literacy outcomes for late signers [37]. Although biomechanical outcomes, such as ergonomic signing patterns, may not significantly differ by age of exposure, the cognitive and linguistic benefits from early access remain well supported [55]. Second, interpreter education and L2 instruction should integrate visuospatial aptitude assessments to better tailor curricula to learner profiles [38,45]. Moreover, the findings reinforce the importance of accessibility-focused innovations in educational contexts. This includes the integration of sign language-specific assessment tools, gesture-based scaffolds, and technology-supported instruction. Such tools not only enhance learner engagement, but also help address structural gaps in resource-limited regions. Reference [49] adds to this conversation by highlighting the importance of user feedback in avatar evaluation. Deaf participants have emphasized the need for culturally congruent signing styles, natural pacing, and facial grammar integration. Late-exposed signers have reported difficulty interpreting fast-paced avatar signing, especially when markers such as eyebrow raises or mouth gestures were absent. These findings reinforce the value of participatory design in educational technology for Deaf learners. Imashev et al. further reinforce this point by demonstrating how integrating user-centered design elements, such as K-RSL-translated consent videos, culturally adapted rating scales, and visual Likert interfaces, enhanced the accessibility and inclusivity of the evaluation process. Feedback from late signers emphasized the importance of allowing for response time extensions and reduced signing speeds to support comprehension. The use of GIF-based sorting tasks and high-contrast mood tracking tools also allowed participants to express feedback more intuitively, reinforcing that multimodal, linguistically adaptive methods are key to inclusive educational technology.
Moreover, parameter-specific findings from [48] suggest that assessment and instructional tools should monitor not only comprehension accuracy, but also processing efficiency. Tools such as gaze tracking or response time analysis could be implemented in digital learning platforms to better capture subtle processing delays and guide personalized interventions.
In the Arab region, these insights highlight an urgent need for investments in early identification and support systems for Deaf children. Current policies often underrepresent the linguistic diversity of Arabic Sign Languages (ArSL) and their dialects, posing challenges for culturally responsive education. Research indicates a preference among local communities for maintaining national sign languages (e.g., Kuwaiti Sign Language), with resistance to imposed pan-Arab unification initiatives [56]. Supporting sign language research, interpreter training, and localized educational tools in these languages is critical to empowering Deaf communities across the region. Reference [49] also reinforces the importance of maintaining language-specific variation in avatar systems. When avatars used standardized or unfamiliar signs in place of regional ones, participants in the study experienced reduced engagement and comprehension. Imashev et al. echo this concern [30], documenting specific instances where K-RSL users rejected avatar outputs using unfamiliar or overly formal signs, perceiving them as disconnected from natural usage. Participants consistently preferred agents that employed vernacular signs and showed alignment with regional dialect preferences. These results call for avatar systems that are not only linguistically accurate, but also culturally situated.
Establishing standardized perceptual fidelity benchmarks and cross-linguistic datasets identified in this review is crucial for effective avatar evaluation and improvement. These resources will guide avatar developers in creating culturally responsive and educationally effective avatar-based platforms.

5.5. Limitations and Methodological Considerations

As this rapid review aimed for timely insights into the avatar-related implications of recent sign language acquisition studies, methodological and scope limitations were inherent to the review process. While it covers empirical research from January 2020 to March 2025, this review deliberately focused narrowly on the cognitive and linguistic acquisition outcomes relevant to avatar development. The high exclusion rate, while notable, reflects this deliberate narrowing of scope to empirical acquisition studies with direct relevance to cognitive and linguistic development. Studies focusing on robotics, assistive systems, and general sign recognition were outside the remit of this review. Research on underrepresented sign languages, particularly from the Global South and the Arab world, remains sparse. Despite the converging themes, several limitations reduce the overall strength of current evidence.
Lack of standardized measures: The seventeen studies used different narrative tasks, distinct classifier-coding schemes, and unrelated neuroimaging paradigms. No single validated measure was used across more than two papers. This heterogeneity complicates direct comparisons and meta-analytic synthesis.
Small and uneven samples: Participant groups ranged from a single case study [37] to a sample size of 487 [36], with most studies recruiting fewer than forty signers. European and North American sign languages dominated; very few papers addressed the sign languages of Africa, Asia, or South America.
Short follow-up periods: Only four studies followed learners for a year or longer, limiting our understanding of long-term change.
Minimal intervention research: Most work was descriptive. Few papers tested teaching methods systematically and rarely provided randomized or controlled designs to isolate instructional effects.
These methodological limitations directly impact the standardization and validation processes for avatar technologies. Without consistent metrics, large diverse samples, and extensive longitudinal data, the development of universally intelligible and culturally appropriate sign language avatars remains challenging.

5.6. Future Research Directions

To advance the development and evaluation of signing avatars, we recommend the following directions for future research:
  • Standardize participant reporting: Studies should consistently report age of acquisition (AoA), signing fluency, language dominance, and exposure context. These variables are essential for interpreting results and comparing across studies.
  • Stratify sampling across user groups: Future work should include balanced cohorts of native signers, late signers, and hearing L2 learners. Including multilingual and dialectal signers can enhance ecological validity and inform avatar adaptability across signing communities.
  • Expand linguistic coverage: Current research is heavily skewed toward a few well-documented sign languages (e.g., ASL, BSL). Including underrepresented languages, such as Kazakh–Russian SL, Qatari SL, or regional Arab variants, will support more inclusive design and prevent overgeneralization.
  • Develop unified evaluation protocols: Establishing common frameworks for avatar assessment, combining subjective ratings (e.g., trust and clarity) and objective measures (e.g., eye-tracking, EEG, and reaction time), will improve cross-study comparability and metric validity.
  • Adopt participatory research methods: Engaging Deaf users in co-design, usability testing, and feedback collection is critical. Tools like translated consent forms, visual Likert scales, and multimodal feedback tasks can improve accessibility and inclusivity in experimental design.

6. Conclusions

This rapid review highlights key empirical findings on age of acquisition, spatial grammar complexity, and cognitive load considerations, which are crucial for informing avatar-based sign language technologies. The evidence clearly indicates that early sign language exposure significantly enhances linguistic competence, whereas delayed acquisition poses persistent cognitive and linguistic challenges, particularly in mastering spatial grammar and complex classifier structures. Emerging research demonstrates the significant role of visuospatial aptitude and gesture-based strategies in facilitating sign language acquisition, bridging cognitive processing with motor learning. Additionally, recent methodological innovations—notably, ethical and community-engaged research practices—have improved validity and inclusivity within Deaf communities. While providing timely synthesis, the rapid review methodology naturally limits exhaustive inclusion of global sign language data, especially user-facing evaluations of avatar systems. References [30,49] provided the only studies in this dataset that directly assessed Deaf adults’ responses to avatar-generated sign language. Their insights reveal the importance of naturalistic pacing, culturally appropriate sign choices, and multimodal cues for clarity. Integrating such findings is essential to avoid perpetuating accessibility barriers through poorly designed signing agents. Thus, future research should prioritize developing standardized perceptual fidelity benchmarks and comprehensive cross-linguistic datasets, which are essential for effective avatar evaluation and design. Systematic explorations of instructional methods tailored specifically to avatar-assisted learning will be particularly beneficial. Implementing standardized perceptual benchmarks and cross-linguistic datasets identified in this review will directly support the creation of next-generation signing avatars, enabling around-the-clock, culturally faithful, and perceptually authentic communication access for Deaf and hard-of-hearing communities worldwide.

Author Contributions

Conceptualization, K.C. and A.O.; methodology, K.C. and A.O.; software, K.C.; validation, K.C. and A.O.; formal analysis, K.C.; investigation, K.C.; data curation, K.C.; writing—original draft preparation, K.C.; writing—review and editing, A.O.; supervision, A.O.; project administration, K.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Dataset available on request from the authors.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
DGS: Deutsche Gebärdensprache (German Sign Language)
ÖGS: Österreichische Gebärdensprache (Austrian Sign Language)
BhSL: Bhutanese Sign Language
NGT: Nederlandse Gebarentaal (Sign Language of the Netherlands)
BSL: British Sign Language
TİD: Türk İşaret Dili (Turkish Sign Language)
ASL: American Sign Language
PJM: Polski Język Migowy (Polish Sign Language)
EEG: Electroencephalography
fMRI: Functional Magnetic Resonance Imaging
TMS: Transcranial Magnetic Stimulation
M2L2: Modality 2, Language 2 (hearing learners learning a second language in a second modality)
L2: Second Language
L1: First Language
L2M1: Learners learning a second sign language (already know one sign language)
L2M2: Learners learning sign language as a second language and in a new modality
DS: Depicting Signs
AoA: Age of Acquisition
DoD: Deaf-of-Deaf (deaf individuals with deaf parents)
DoH: Deaf-of-Hearing (deaf individuals with hearing parents)
RT: Reaction Time
SPL: Superior Parietal Lobule
NaKom DGS-Test: A specific standardized assessment tool for German Sign Language narrative skills
PRISMA: Preferred Reporting Items for Systematic Reviews and Meta-Analyses
GDPR: General Data Protection Regulation

References

  1. World Federation of the Deaf. Home Page. Available online: https://wfdeaf.org (accessed on 19 January 2025).
  2. Lillo-Martin, D.; Henner, J. Acquisition of Sign Languages. Annu. Rev. Linguist. 2021, 7, 395–419. [Google Scholar] [CrossRef]
  3. Jemni, M.; Jaballah, K. A Review on 3D Signing Avatars: Benefits, Uses and Challenges. Int. J. Multimed. Data Eng. Manag. (IJMDEM) 2013, 4, 21–45. [Google Scholar] [CrossRef]
  4. Naert, L.; Larboulette, C.; Gibet, S. A Survey on the Animation of Signing Avatars: From Sign Representation to Utterance Synthesis. Comput. Graph. 2020, 92, 76–98. [Google Scholar] [CrossRef]
  5. Choudhury, S. Analysis of Torso Movement for Signing Avatar Using Deep Learning. In Proceedings of the 7th International Workshop on Sign Language Translation and Avatar Technology: The Junction of the Visual and the Textual: Challenges and Perspectives, Marseille, France, 24 June 2022. [Google Scholar]
  6. Quandt, L.C.; Willis, A.; Schwenk, M.; Weeks, K.; Ferster, R. Attitudes Toward Signing Avatars Vary Depending on Hearing Status, Age of Signed Language Acquisition, and Avatar Type. Front. Psychol. 2022, 13, 730917. [Google Scholar] [CrossRef] [PubMed]
  7. Lacerda, I.; Nicolau, H.; Coheur, L. Towards Realistic Sign Language Animations. In Proceedings of the 23rd ACM International Conference on Intelligent Virtual Agents, Würzburg, Germany, 19–22 September 2023. [Google Scholar] [CrossRef]
  8. McDonald, J. Considerations on Generating Facial Nonmanual Signals on Signing Avatars. Univers. Access Inf. Soc. 2024, 24, 19–36. [Google Scholar] [CrossRef]
  9. Othman, A.; Dhouib, A.; Chalghoumi, H.; Ghoul, O.E.; Al-Mutawaa, A. The Acceptance of Culturally Adapted Signing Avatars Among Deaf and Hard-of-Hearing Individuals. IEEE Access 2024, 12, 78624–78640. [Google Scholar] [CrossRef]
  10. Emmorey, K.; Bosworth, R.; Kraljic, T. Visual Feedback and Self-Monitoring of Sign Language. J. Mem. Lang. 2009, 61, 398–411. [Google Scholar] [CrossRef]
  11. Gimeno-Martínez, M.; Costa, A.; Baus, C. Influence of Gesture and Linguistic Experience on Sign Perception. J. Deaf. Stud. Deaf. Educ. 2020, 25, 80–90. [Google Scholar] [CrossRef]
  12. Krebs, J.; Malaia, E.; Wilbur, R.; Roehm, D. Neural Mechanisms of Event Visibility in Sign Languages. Lang. Cogn. Neurosci. 2023, 38, 1282–1301. [Google Scholar] [CrossRef]
  13. Krebs, J.; Roehm, D.; Wilbur, R.B.; Malaia, E.A. Age of Sign Language Acquisition Has Lifelong Effect on Syntactic Preferences in Sign Language Users. Int. J. Behav. Dev. 2021, 45, 397–408. [Google Scholar] [CrossRef]
  14. Malaia, E.A.; Krebs, J.; Roehm, D.; Wilbur, R.B. Age of Acquisition Effects Differ across Linguistic Domains in Sign Language: EEG Evidence. Brain Lang. 2020, 200, 104708. [Google Scholar] [CrossRef]
  15. Tomaszewski, P.; Krzysztofiak, P.; Morford, J.P.; Eźlakowski, W. Effects of Age-of-Acquisition on Proficiency in Polish Sign Language: Insights to the Critical Period Hypothesis. Front. Psychol. 2022, 13, 896339. [Google Scholar] [CrossRef]
  16. Morford, J.P.; Grieve-Smith, A.B.; Macfarlane, J.; Staley, J.; Waters, G.S. Effects of Language Experience on the Perception of American Sign Language. Cognition 2008, 109, 41–53. [Google Scholar] [CrossRef] [PubMed]
  17. Caselli, N.K.; Occhino, C.; Artacho, B.; Savakis, A.; Dye, M. Perceptual Optimization of Language: Evidence from American Sign Language. Cognition 2022, 224, 105040. [Google Scholar] [CrossRef] [PubMed]
  18. Alyami, S.; Luqman, H.; Hammoudeh, M. Reviewing 25 Years of Continuous Sign Language Recognition Research: Advances, Challenges, and Prospects. Inf. Process. Manag. 2024, 61, 103774. [Google Scholar] [CrossRef]
  19. Yu, M.; Jia, J.; Xue, C.; Yan, G.; Guo, Y.; Liu, Y. A Review of Sign Language Recognition Research. J. Intell. Fuzzy Syst. 2022, 43, 3879–3898. [Google Scholar] [CrossRef]
  20. Emmorey, K. Ten Things You Should Know About Sign Languages. Curr. Dir. Psychol. Sci. 2023, 32, 387–394. [Google Scholar] [CrossRef]
  21. Murray, J.J.; Hall, W.C.; Snoddon, K. Education and Health of Children with Hearing Loss: The Necessity of Signed Languages. Bull. World Health Organ. 2019, 97, 711–716. [Google Scholar] [CrossRef]
  22. Humphries, T.; Kushalnagar, P.; Mathur, G.; Napoli, D.; Padden, C.; Rathmann, C. Ensuring Language Acquisition for Deaf Children: What Linguists Can Do. Language 2014, 90, e31–e52. [Google Scholar] [CrossRef]
  23. Bowman-Smart, H.; Gyngell, C.; Morgan, A.R.; Savulescu, J. The Moral Case for Sign Language Education. Monash Bioeth. Rev. 2019, 37, 94–110. [Google Scholar] [CrossRef]
  24. Nedjar, I.; M’hamedi, M. Interactive System Based on Artificial Intelligence and Robotic Arm to Enhance Arabic Sign Language Learning in Deaf Children. Educ. Inf. Technol. 2024, 29, 24563–24580. [Google Scholar] [CrossRef]
  25. Deoghare, R.; Gaikwad, M.; Desale, V.; Mahajan, P. Interactive Sign Language Learning Platform Fostering Sign Language Literacy. In Proceedings of the 2024 8th International Conference on Computing, Communication, Control and Automation (ICCUBEA), Pune, India, 23–24 August 2024; pp. 1–5. [Google Scholar] [CrossRef]
  26. Berrett, B. Using Computer-Assisted Language Learning in an American Sign Language Course. Innov. Lang. Learn. Teach. 2012, 6, 29–43. [Google Scholar] [CrossRef]
  27. Brock, H.; Nishina, S. Quantifying Sign Avatar Perception: How Imperfect Is Insufficient? In Proceedings of the CHI EA '20: Extended Abstracts of the 2020 CHI Conference on Human Factors in Computing Systems, Honolulu, HI, USA, 25–30 April 2020. [Google Scholar] [CrossRef]
  28. Dimou, A.-L.; Papavassiliou, V.; McDonald, J.C.; Goulas, T.; Vasilaki, K.; Vacalopoulou, A.; Fotinea, S.-E.; Efthimiou, E.; Wolfe, R.J. Signing Avatar Performance Evaluation within EASIER Project. In Proceedings of the 7th International Workshop on Sign Language Translation and Avatar Technology: The Junction of the Visual and the Textual: Challenges and Perspectives, Marseille, France, 24 June 2022. [Google Scholar]
  29. Aziz, M.; Othman, A. Evolution and Trends in Sign Language Avatar Systems: Unveiling a 40-Year Journey via Systematic Review. Multimodal Technol. Interact. 2023, 7, 97. [Google Scholar] [CrossRef]
  30. Imashev, A.; Oralbayeva, N.; Baizhanova, G.; Sandygulova, A. Assessment of Comparative Evaluation Techniques for Signing Agents: A Study with Deaf Adults. J. Multimodal User Interfaces 2025, 19, 1–19. [Google Scholar] [CrossRef]
  31. Naert, L.; Larboulette, C.; Gibet, S. Motion Synthesis and Editing for the Generation of New Sign Language Content. Mach. Transl. 2021, 35, 405–430. [Google Scholar] [CrossRef]
  32. Tricco, A.C.; Antony, J.; Zarin, W.; Strifler, L.; Ghassemi, M.; Ivory, J.; Perrier, L.; Hutton, B.; Moher, D.; Straus, S.E. A Scoping Review of Rapid Review Methods. BMC Med. 2015, 13, 224. [Google Scholar] [CrossRef]
  33. Moher, D.; Liberati, A.; Tetzlaff, J.; Altman, D. Reprint-Preferred Reporting Items for Systematic Reviews and Meta-Analyses: The PRISMA Statement. Phys. Ther. 2009, 89, 873–880. [Google Scholar] [CrossRef]
  34. Stevens, A.; Garritty, C.; Hersi, M.; Moher, D. Developing PRISMA-RR, a Reporting Guideline for Rapid Reviews of Primary Studies (Protocol). Equat. Netw. 2018, 1–12. [Google Scholar] [CrossRef]
  35. Klerings, I.; Robalino, S.; Booth, A.; Escobar-Liquitay, C.M.; Sommer, I.; Gartlehner, G.; Devane, D.; Waffenschmidt, S. Rapid Reviews Methods Series: Guidance on Literature Search. BMJ Evid.-Based Med. 2023, 28, 412–417. [Google Scholar] [CrossRef]
  36. Kolbe, V. Open Science versus Data Protection–Challenges and Solutions in Sign Language Acquisition Studies. Hrvat. Rev. Za Rehabil. Istraz. 2022, 58, 109–120. [Google Scholar] [CrossRef]
  37. Choden, S.; Jigyel, K. Impact of Delay in Sign Language Acquisition on Writing Development: The Case of a Deaf Child. Int. J. Sci. Innov. Res. 2022, 3, 01000130IJESIR. [Google Scholar]
  38. Watkins, F.; Webb, S.; Stone, C.; Thompson, R.L. Language Aptitude in the Visuospatial Modality: L2 British Sign Language Acquisition and Cognitive Skills in British Sign Language-English Interpreting Students. Front. Psychol. 2022, 13, 932370. [Google Scholar] [CrossRef] [PubMed]
  39. Boers-Visker, E. On the Acquisition of Complex Classifier Constructions by L2 Learners of a Sign Language. Lang. Teach. Res. 2024, 28, 749–785. [Google Scholar] [CrossRef]
  40. Gür, C. Investigating the Effects of Late Sign Language Acquisition on Referent Introduction: A Follow-up Study. Poznan Stud. Contemp. Linguist. 2024, 60, 1–25. [Google Scholar] [CrossRef]
  41. Keleş, O.; Atmaca, F.; Gökgöz, K. Reference Tracking Strategies of Deaf Adult Signers in Turkish Sign Language. J. Pragmat. 2023, 213, 12–35. [Google Scholar] [CrossRef]
  42. Weisberg, J.; Casey, S.; Sehyr, Z.S.; Emmorey, K. Second Language Acquisition of American Sign Language Influences Co-Speech Gesture Production. Biling. Lang. Cogn. 2020, 23, 473–482. [Google Scholar] [CrossRef]
  43. Kurz, K.B.; Kartheiser, G.; Hauser, P.C. Second Language Learning of Depiction in a Different Modality: The Case of Sign Language Acquisition. Front. Commun. 2023, 7, 896355. [Google Scholar] [CrossRef]
  44. Boers-Visker, E.; Pfau, R. Space Oddities: The Acquisition of Agreement Verbs by L2 Learners of Sign Language of The Netherlands. Mod. Lang. J. 2020, 104, 757–780. [Google Scholar] [CrossRef]
  45. Banaszkiewicz, A.; Bola, Ł.; Matuszewski, J.; Szczepanik, M.; Kossowski, B.; Mostowski, P.; Rutkowski, P.; Śliwińska, M.; Jednoróg, K.; Emmorey, K.; et al. The Role of the Superior Parietal Lobule in Lexical Processing of Sign Language: Insights from FMRI and TMS. Cortex 2021, 135, 240–254. [Google Scholar] [CrossRef]
  46. Caselli, N.K.; Emmorey, K.; Cohen-Goldberg, A.M. The Signed Mental Lexicon: Effects of Phonological Neighborhood Density, Iconicity, and Childhood Language Experience. J. Mem. Lang. 2021, 121, 104282. [Google Scholar] [CrossRef]
  47. Boers-Visker, E. The Acquisition of Strategies to Express Plurality in Hearing Second Language Learners of Sign Language of The Netherlands. Lang. Learn. 2023, 73, 101–135. [Google Scholar] [CrossRef]
  48. Wienholz, A.; Lieberman, A.M. Tracking Effects of Age of Sign Language Acquisition and Phonology in American Sign Language Sentence Processing. Mem. Cogn. 2025, 1–19. [Google Scholar] [CrossRef]
  49. Imashev, A.; Oralbayeva, N.; Sandygulova, A. An Exploratory User Study towards Developing a Unified, Comprehensive Assessment Apparatus for Deaf Signers, Specifically Tailored for Signing Avatars Evaluation: Challenges, Findings, and Recommendations. Multimed. Tools Appl. 2024, 84, 30865–30902. [Google Scholar] [CrossRef]
  50. Boers-Visker, E.; van den Bogaerde, B. Learning to Use Space in the L2 Acquisition of a Signed Language: Two Case Studies. Sign Lang. Stud. 2019, 19, 410–452. [Google Scholar] [CrossRef]
  51. Karadöller, D.Z.; Peeters, D.; Manhardt, F.; Özyürek, A.; Ortega, G. Iconicity and Gesture Jointly Facilitate Learning of Second Language Signs at First Exposure in Hearing Nonsigners. Lang. Learn. 2024, 74, 781–813. [Google Scholar] [CrossRef]
  52. Ortega, G.; Özyürek, A.; Peeters, D. Iconic Gestures Serve as Manual Cognates in Hearing Second Language Learners of a Sign Language: An ERP Study. J. Exp. Psychol. Learn. Mem. Cogn. 2020, 46, 403–415. [Google Scholar] [CrossRef]
  53. Schönström, K.; Holmström, I. L2M1 and L2M2 Acquisition of Sign Lexicon: The Impact of Multimodality on the Sign Second Language Acquisition. Front. Psychol. 2022, 13, 896254. [Google Scholar] [CrossRef]
  54. Marshall, C.; Morgan, G. From Gesture to Sign Language: Conventionalisation of Classifier Constructions by Adult Hearing Learners of BSL. Top. Cogn. Sci. 2015, 7, 61–80. [Google Scholar] [CrossRef]
  55. Donner, A.; Marshall, M.; Mozrall, J.R. Effects of Early Exposure to Sign Language on the Biomechanics of Interpreting. J. Interpret. 2016, 25, 4. [Google Scholar]
  56. Almubayei, D.S. Sign Language Choice and Policy among the Signing Community in Kuwait. Dig. Middle East Stud. 2024, 33, 166–183. [Google Scholar] [CrossRef]
Figure 1. The avatar “BuHamad” signing the full sentence “Mada Center is a private institution for public benefit” (in Arabic: مركز مدى مؤسسة خاصة ذات نفع عام) in Qatari Sign Language. This example demonstrates how multi-sign sequencing and facial grammar are essential for conveying syntactic and semantic clarity.
Figure 2. Adapted PRISMA flow diagram illustrating the study selection process, used here to enhance transparency in this focused rapid review. The diagram outlines the number of records identified, screened, assessed for eligibility, and included in the final synthesis, along with reasons for exclusions at each stage. * List shows the databases searched and the number of records retrieved from each before deduplication.
Figure 3. Visualization of the Qatari Sign Language avatar “BuHamad” signing the word “World.” Left: The motion trajectory showing the spatial arc of hand movement. Right: Timelapse snapshots highlighting the dynamic stages of sign articulation.
Table 1. Database search strategies and terms used in the rapid review (2020–2025).
Database | Method | Search Terms
ScienceDirect | Utilized the advanced search feature with the query shown | ("Sign Language Acquisition" OR "Sign Language Perception" OR "Sign Language Acquisition Monitoring") AND publication_year: [2020 TO 2025]
IEEE Xplore | Employed the search string shown | (("All Metadata": "Sign Language Acquisition") OR ("All Metadata": "Sign Language Perception") OR ("All Metadata": "Sign Language Acquisition Monitoring")) AND (Publication Year: 2020–2025)
Google Scholar | Applied the search with a custom date range from 2020 to 2025 | allintitle: "Sign Language Acquisition" OR "Sign Language Perception" OR "Sign Language Acquisition Monitoring"
ACM Digital Library | Used the query with filters set for publication dates between 1 January 2020 and 31 March 2025 | All: "Sign Language Acquisition" OR "Sign Language Perception"
SpringerLink | Conducted separate searches for each term with publication years limited to 2020 through 2025 | "Sign Language Acquisition"; "Sign Language Perception"
Table 2. Summary of selected studies on sign language acquisition by year, venue, and language context.
Ref. | Publication Title | Publication Year | Type of Venue | Country/Language
[36] | Open science versus data protection–Challenges and solutions in sign language acquisition studies | 2022 | Journal | Germany/German Sign Language (Deutsche Gebärdensprache, DGS)
[14] | Age of acquisition effects differ across linguistic domains in sign language: EEG evidence | 2020 | Journal | Austria/Austrian Sign Language (Österreichische Gebärdensprache, ÖGS)
[37] | Impact of delay in sign language acquisition on writing development: The case of a Deaf child | 2022 | Journal | Bhutan/Bhutanese Sign Language (BhSL)
[13] | Age of sign language acquisition has a lifelong effect on syntactic preferences in sign language users | 2021 | Journal | Austria/ÖGS
[38] | Language aptitude in the visuospatial modality: L2 British Sign Language acquisition and cognitive skills in British Sign Language-English interpreting students | 2022 | Journal | United Kingdom/British Sign Language (BSL)
[39] | On the acquisition of complex classifier constructions by L2 learners of a sign language | 2024 | Journal | Netherlands/Sign Language of the Netherlands (Nederlandse Gebarentaal, NGT)
[40] | Investigating the effects of late sign language acquisition on referent introduction: A follow-up study | 2024 | Journal | Turkey/Turkish Sign Language (Türk İşaret Dili, TİD)
[41] | Reference tracking strategies of Deaf adult signers in Turkish Sign Language | 2023 | Journal | Turkey/TİD
[42] | Second language acquisition of American Sign Language influences co-speech gesture production | 2020 | Journal | USA/American Sign Language (ASL)
[43] | Second language learning of depiction in a different modality: The case of sign language acquisition | 2023 | Journal | USA/ASL
[44] | Space oddities: The acquisition of agreement verbs by L2 learners of Sign Language of the Netherlands | 2020 | Journal | Netherlands/NGT
[45] | The role of the superior parietal lobule in lexical processing of sign language: Insights from fMRI and TMS | 2021 | Journal | Poland/Polish Sign Language (Polski Język Migowy, PJM)
[46] | The signed mental lexicon: Effects of phonological neighborhood density, iconicity, and childhood language experience | 2021 | Journal | USA/ASL
[47] | The acquisition of strategies to express plurality in hearing second language learners of Sign Language of the Netherlands | 2023 | Journal | Netherlands/NGT
[48] | Tracking effects of age of sign language acquisition and phonology in American Sign Language sentence processing | 2025 | Journal | USA/ASL
[49] | An exploratory user study towards developing a unified, comprehensive assessment apparatus for Deaf signers, specifically tailored for signing avatars evaluation: Challenges, findings, and recommendations | 2024 | Journal | Kazakhstan-Russia/Kazakh-Russian Sign Language (K-RSL)
[30] | Assessment of comparative evaluation techniques for signing agents: A study with Deaf adults | 2024 | Journal | Kazakhstan-Russia/K-RSL
Table 3. Summary of methodological groups, key research approaches, and the aspects examined across 17 studies on sign language acquisition monitoring and perception.
Paper | Key Methodologies | Aspects Examined | Key Outcome Indicators

Community-Based Participatory and Standardized Assessment Adaptation:
[36] | Adaptation and administration of the NaKom DGS-Test; narrative elicitation; qualitative and quantitative video analysis; community-based participatory research; GDPR-compliant data protection (secure storage, pseudonymization, layered consent) | Narrative structure, fluency, and linguistic features in children's DGS narratives; reference measures for DGS development | Narrative fluency index; grammatical accuracy; Deaf-community validation loops

Narrative Elicitation and Discourse Coding:
[40] | Narrative elicitation ("Spider Story"); ELAN qualitative coding of lexical signs (LS) vs. classifier predicates (CL) | Strategies (lexical/classifier) used for first introductions of inanimate objects; late vs. native signers | Proportion of lexical signs vs. classifiers in referent introduction
[41] | Narrative elicitation (Tom and Jerry clips); ELAN annotation of discourse status and referring expressions; Bayesian regression analysis | Use of explicit vs. implicit referring expressions (nominal, zero anaphora) by DoD vs. DoH signers | Distribution of nouns, classifiers, and zero anaphora across discourse roles
[42] | Narrative elicitation (Canary Row); video recording; gesture coding (iconic, deictic, beat; handshape markedness and variety); coding of embedded ASL signs | Gesture rate/type, handshape variety, and ASL sign intrusions in English narrative retellings | Gesture rate; gesture type; handshape variety; ASL sign intrusions
[43] | Cross-sectional narrative elicitation; ELAN annotation; coding of depicting signs (Entity, Body-Part, Handling, Size-and-Shape Specifiers); ASL Comprehension Test | Acquisition/use of depicting signs (types, frequency, comprehension relation, gestural transfer) and ASL comprehension | Counts of four depicting-sign types; correlation with ASL-CT scores

Psycholinguistic and Neurocognitive Measures:
[14] | EEG (N400 response); grammaticality ratings; reaction times; acceptability ratings | N400 responses, acceptability ratings, and RTs to classifier signs, word order (SOV vs. OSV), and topic-marking signals | N400 amplitude; sentence-acceptability ratings; RTs
[13] | Seven-point Likert acceptability ratings; reaction-time measurement; mixed-effects statistical modeling | Age of Acquisition (AoA) impacts on processing of syntax (word order), pragmatics (topic marking), and semantics (classifier constructions) | Acceptability ratings and RTs for syntax/pragmatics
[45] | Longitudinal fMRI (5 sessions) plus Deaf comparison group (1 session); TMS to left/right SPL; explicit vs. implicit lexical decision tasks; accuracy measures | SPL role in lexical processing; hemispheric specialization; brain activation in PJM learners | Lexical-decision accuracy; SPL activation (fMRI); TMS interference effects
[46] | Unprimed lexical decision task with ASL-LEX 1.0 stimuli; reaction times; accuracy; demographic and exposure metrics (age of first ASL exposure, hearing status of parents, etc.) | Effects of phonological density, iconicity, and early language experience on lexical recognition | RT and accuracy in lexical decision; effects of frequency, neighborhood density, iconicity

Longitudinal Designs in L2 Acquisition:
[38] | Three-year longitudinal design (4 sessions); cognitive assessments (dual n-back, Corsi blocks, digit span; 2D/3D mental rotation; MLAT phonological encoding; KBIT-2 non-verbal reasoning); TED-talk summarization; BSL sign/sentence repetition; interpreting tasks (BSL↔English) | Working memory (multimodal, visuospatial, auditory), mental rotation, phonological encoding, reading, non-verbal reasoning, and summarization in relation to BSL proficiency and interpreting outcomes | Visuospatial memory and mental-rotation accuracy; BSL repetition; interpreting scores
[39] | Two-year longitudinal design (15 sessions); 180 visual prompts; video recording; ELAN transcription and coding of formational features and error types (handshape substitution, misorientation, etc.) | Acquisition of two-handed classifier predicates (handshape, orientation, movement, spatial coordination); developmental progression from gestures to conventional use | Frequency and accuracy of classifier predicates; orientation errors
[44] | Two-year longitudinal design (6 sessions); picture and sentence prompts; video recording; ELAN transcription; coding of agreement-verb strategies (fully/partly agreeing, constructed action, lexical strategies) | Acquisition of spatial verb agreement (inflection, localization), constructed action, and alternative strategies; error patterns (overgeneralization, omission) | % of verbs with correct spatial agreement; error typology
[47] | Spontaneous conversations (3 learners, every 10 weeks for 18 months) plus elicited sessions (11 learners, 6 sessions in Year 1); ELAN transcription; coding of plurality strategies (reduplication, numeral incorporation, classifier predicates); interrater reliability 86–93% | Learners' strategies to express plural referents; early use of plural markers and errors in expressing plurality | Frequency and appropriateness of plural strategies; teacher ratings

In-Depth Qualitative Case Study:
[37] | Qualitative case study; purposive sampling; semi-structured interviews; participant observation; photovoice; videotaping; data triangulation; thematic analysis | Effects of delayed SL acquisition on vocabulary, writing competency, and cross-linguistic transfer (BhSL ↔ English/Dzongkha); role of early intervention in literacy development | Vocabulary size; syntactic variety in writing; teacher rubric scores

Eye-Tracking with Phonological Priming:
[48] | Eye-tracking; ASL sentence processing; divergence-point analysis; windowed fixation | Sentence-level sign recognition under varying degrees and types of phonological similarity; group comparisons by AoA | Divergence points and gaze ratios by parameter; AoA sensitivity to phonological overlap

Evaluation of Avatar-Based Sign Language Presentation:
[49] | Mixed-methods design with online and in-person user studies; subjective assessment via Likert scales; adapted Godspeed and RoSAS questionnaires; video translation of consent and instructions into K-RSL; data-driven and manually animated avatars; eye-tracking and Funometer mood scale (in-person study) | Perceived naturalness, intelligibility, human-likeness, and usability of avatar-based SL output; translation accuracy; mood impact; attention distribution (manual vs. non-manual features) | Translation rate per sign; mood-change metrics; Godspeed dimensions (anthropomorphism, animacy, intelligence, likeability); gaze-fixation ratios (manual vs. facial regions)
[30] | In-person user study with Deaf signers; eye-tracking with Tobii Pro Glasses 2; Funometer emotional scale; adapted Godspeed thermometer visuals; sorting tasks using GIFs; evaluation of 3 avatar agents (rule-based, data-driven, symbolic) vs. a human signer | Visual attention patterns (face vs. hands); emotional response pre/post interaction; comprehension and perception of synthetic agents by age/exposure | Gaze-fixation ratios; mood-change metrics; Godspeed thermometer scores; participant sorting; qualitative feedback on signing accuracy and realism
Note. AoA = age of first language acquisition; DoD = Deaf-of-Deaf; DoH = Deaf-of-Hearing; RT = reaction time; SPL = superior parietal lobule.
