Article

In-Context Learning for Low-Resource Machine Translation: A Study on Tarifit with Large Language Models

by
Oussama Akallouch
* and
Khalid Fardousse
Faculty of Sciences Dhar EL Mahraz, Sidi Mohamed Ben Abdellah University, Fez 30050, Morocco
*
Author to whom correspondence should be addressed.
Algorithms 2025, 18(8), 489; https://doi.org/10.3390/a18080489
Submission received: 2 July 2025 / Revised: 30 July 2025 / Accepted: 2 August 2025 / Published: 6 August 2025
(This article belongs to the Section Evolutionary Algorithms and Machine Learning)

Abstract

This study presents the first systematic evaluation of in-context learning for Tarifit machine translation, a low-resource Amazigh language spoken by 5 million people in Morocco and Europe. We assess three large language models (GPT-4, Claude-3.5, PaLM-2) across Tarifit–Arabic, Tarifit–French, and Tarifit–English translation using 1000 sentence pairs and 5-fold cross-validation. Results show that 8-shot similarity-based demonstration selection achieves optimal performance. GPT-4 achieved 20.2 BLEU for Tarifit–Arabic, 14.8 for Tarifit–French, and 10.9 for Tarifit–English. Linguistic proximity significantly impacts translation quality, with Tarifit–Arabic substantially outperforming other language pairs by 8.4 BLEU points due to shared vocabulary and morphological patterns. Error analysis reveals systematic issues with morphological complexity (42% of errors) and cultural terminology preservation (18% of errors). This work establishes baseline benchmarks for Tarifit translation and demonstrates the viability of in-context learning for morphologically complex low-resource languages, contributing to linguistic equity in AI systems.

1. Introduction

1.1. Background

The vast majority of the world’s approximately 7000 languages are considered low-resource, lacking the extensive digital data and computational tools available for widely spoken languages [1]. This linguistic gap in natural language processing (NLP) research creates a significant technological divide, limiting digital inclusion and the preservation of linguistic diversity [2].
Among these underrepresented languages are the Amazigh (Berber) languages, spoken across North Africa by millions. Tarifit, a prominent Northern Berber variety, is spoken by approximately 5 million people, primarily in the Rif region of Morocco, and by a significant diaspora in various European countries [3]. Despite its substantial number of speakers and rich cultural heritage, Tarifit suffers from a profound lack of digital linguistic resources and computational tools [4,5].

1.2. Challenges and Motivations

Traditional neural machine translation approaches require large parallel corpora, which are virtually non-existent for Tarifit and most other low-resource languages. This limited resource landscape significantly impedes the development of robust NLP applications, such as machine translation (MT), for Tarifit. Recent work has explored transfer learning and multilingual models [6,7], but these approaches still require substantial training data and computational resources that remain inaccessible for most endangered languages.
The advent of Large Language Models (LLMs) has introduced a paradigm shift through in-context learning (ICL), where models can perform translation tasks using only a few demonstration examples provided in the prompt, without requiring parameter updates or extensive training data [8]. This capability offers unprecedented opportunities for low-resource machine translation, as it can leverage linguistic resources such as dictionaries, grammar books, and small sets of parallel examples that are often available for well-documented but digitally under-resourced languages like Tarifit.

1.3. Literature Survey

Recent studies have demonstrated the potential of in-context learning for machine translation of endangered languages. Zhang et al. [9] showed that incorporating linguistic descriptions in prompts significantly improves translation quality for critically endangered languages. Tanzer et al. [10] established benchmarks for learning translation from grammar books alone, while Hus and Anastasopoulos [11] explored the integration of grammatical knowledge in few-shot translation. Pei et al. [12] provided comprehensive analysis of in-context machine translation for low-resource languages through a case study on Manchu, demonstrating the effectiveness of ICL for morphologically complex languages.
For Tarifit specifically, El Ouahabi et al. [13] developed automatic speech recognition systems for Amazigh-Tarifit, while Boulal et al. [14] explored data augmentation techniques using convolutional neural networks for Amazigh speech recognition. However, systematic evaluation of ICL strategies for Berber languages, particularly regarding optimal shot selection, prompt engineering, and cross-lingual transfer patterns, remains unexplored.

1.4. Contributions and Novelties

This study addresses this critical gap by providing the first comprehensive investigation of in-context learning for Tarifit machine translation. We systematically evaluate the impact of various ICL components: shot selection strategies, prompt formulation techniques, and linguistic context integration across three language pairs (Tarifit–Arabic, Tarifit–French, Tarifit–English) that reflect the multilingual environment of Tarifit speakers. Through controlled experiments using state-of-the-art LLMs (GPT-4, Claude-3.5, PaLM-2) and a carefully curated dataset of 1000 sentences, we establish baseline performance metrics and identify optimal ICL configurations for this morphologically complex, low-resource language.
Our contributions include: (1) the first systematic evaluation of ICL for Tarifit translation, establishing baseline benchmarks across multiple language pairs; (2) comprehensive analysis of shot selection strategies and their impact on translation quality; (3) identification of optimal ICL configurations for morphologically rich, low-resource languages; and (4) practical recommendations for developing effective few-shot translation systems for endangered Berber languages. These findings advance our understanding of ICL capabilities for low-resource MT and provide a methodological framework for similar investigations in other underrepresented language families.
This study establishes baseline performance metrics for Tarifit ICL translation while acknowledging important methodological constraints. Our evaluation focuses on Latin script representation and unidirectional translation (Tarifit as source), with a dataset of 1000 sentences that, while substantial for low-resource language research, remains modest for comprehensive linguistic coverage. These limitations, detailed in Section 6, define the scope of our findings and highlight critical directions for future work in Berber language processing.
This study addresses three key research questions:
RQ1: What is the optimal number of demonstration examples (shot count) for Tarifit ICL translation across different target languages?
RQ2: How do different shot selection strategies (random, similarity-based, diversity-based) impact translation quality for morphologically complex low-resource languages?
RQ3: To what extent does linguistic proximity between Tarifit and target languages (Arabic, French, English) influence ICL translation performance?

1.5. Organization of This Paper

The remainder of this paper is organized as follows: Section 2 reviews related work in Amazigh language processing and in-context learning for low-resource translation. Section 3 provides essential background on the Tarifit language, including its geographic distribution, writing systems, and computational challenges. Section 4 details our methodology, including the ICL framework, dataset construction, and evaluation protocols. Section 5 presents comprehensive experimental results. Finally, Section 6 discusses findings and implications with future research directions.

2. Related Work

The computational processing of Amazigh languages has seen a notable increase in scholarly attention in recent years, though research efforts remain distributed across various Amazigh varieties and natural language processing (NLP) tasks. This section reviews existing literature on Amazigh language processing, machine translation, and the emerging field of in-context learning for low-resource languages, with a particular focus on studies relevant to Tarifit.

2.1. Amazigh Language Processing and Machine Translation

Early computational work on Amazigh languages laid essential groundwork by addressing foundational NLP tasks. Outahajala et al. [15,16] established initial text processing methodologies, while Boulaknadel and Ataa Allah [17] developed standardized Amazigh corpora. A significant milestone was the creation of the first parallel multilingual corpus of Amazigh by Ataa Allah and Miftah [18], providing critical infrastructure for subsequent machine translation (MT) research.
Despite these foundational efforts, MT research for Amazigh languages remains relatively limited compared to other NLP domains. Taghbalout et al. [6] introduced a UNL-based approach for Moroccan Amazigh, albeit with restricted vocabulary. A more recent advancement was presented by Maarouf et al. [7], who developed the first transformer-based English-to-Amazigh translation system, demonstrating the potential of neural approaches, although their evaluation was conducted on small parallel corpora. Diab et al. [19] explored guided back-translation techniques for Kabyle–French, representing one of the few studies addressing Berber–European language pairs.
The inherent morphological complexity of Amazigh languages poses substantial challenges for MT systems. To address this, Nejme et al. [20,21] developed finite-state morphological analyzers, and Ammari and Zenkoua [22] contributed specialized work on pronominal morphology. These developments have been crucial in building the computational foundations necessary for advanced NLP applications for Amazigh.

2.2. LLM-Based In-Context Machine Translation

The emergence of large language models has revolutionized machine translation for low-resource languages through in-context learning capabilities. Brown et al. [8] first demonstrated that large language models could perform translation tasks using only a few demonstration examples in the prompt, without requiring parameter updates or fine-tuning.
Building on this foundation, several studies have specifically investigated ICL for low-resource and endangered languages. Lin et al. [23] explored few-shot learning with multilingual generative models, demonstrating effectiveness across various language pairs. Vilar et al. [24] provided systematic evaluation of prompting strategies for translation, establishing best practices for prompt engineering and shot selection.
Dictionary-based approaches have shown particular promise for low-resource ICL. Ghazvininejad et al. [25] introduced dictionary-based phrase-level prompting, showing significant improvements when lexical information is incorporated into prompts. Elsner and Needle [26] demonstrated effective translation of a low-resource language using GPT-3 with human-readable dictionaries, highlighting the potential of combining linguistic resources with few-shot learning.
Recent work has specifically targeted endangered and critically low-resource languages. Zhang et al. [9] showed that incorporating linguistic descriptions in prompts significantly improves LLM performance on endangered languages, while Zhang et al. [27] explored teaching LLMs unseen languages through in-context examples. Tanzer et al. [10] established benchmarks for learning translation from grammar books alone, providing systematic evaluation of grammatical knowledge integration in ICL.
Grammar-based approaches have yielded mixed results. Hus and Anastasopoulos [11] explored translation using grammar books, finding that while grammatical information can be helpful, its integration requires careful prompt engineering. Merx et al. [28] conducted a comprehensive study on Mambai, demonstrating the effectiveness of retrieval-augmented prompting for extremely low-resource languages.
However, systematic evaluation of ICL strategies specifically for Berber languages remains absent from the literature. Most existing work has focused on individual language cases without cross-linguistic analysis, and none has addressed the particular challenges posed by the morphological complexity and multilingual context characteristic of Tarifit and related Berber languages.

2.3. Tarifit-Specific Research and Digital Linguistic Landscape

Computational research specifically targeting Tarifit is notably sparse. Recent pioneering work includes Awar.ai [3], which introduced the first automatic speech recognition system designed specifically for Tarifit. Most relevant to the current study, Tahiri [29] conducted an in-depth analysis of word boundaries in Standard Amazigh writing through the lens of Tarifit Facebook users. Her findings highlighted significant orthographic variation, inconsistent digital writing practices, mixed script usage (Latin, Tifinagh, Arabic), and irregular tokenization patterns. These observations are directly pertinent to translation evaluation, as they underscore the complexity of real-world Tarifit text and the challenges it presents to automated systems.
The broader sociolinguistic context of Tarifit is also crucial for understanding its computational challenges. Aissati et al. [4] examined Amazigh language policy in Morocco, revealing the complex multilingual environment where Tarifit speakers frequently engage in code-switching between Tarifit, Arabic, and French. Ait Laaguid and Khaloufi [5] further documented similar multilingual practices in social media contexts, illustrating dynamic linguistic mixing that poses considerable challenges for automated translation systems.
These issues are further exacerbated by the broader challenge of linguistic underrepresentation in NLP. Joshi et al. [1] highlighted the critical underrepresentation of languages like Tarifit in mainstream NLP systems. Ataa Allah and Boulaknadel [30] surveyed emerging trends in less-resourced language processing, identifying key challenges specific to various Amazigh varieties.
Our current study addresses critical gaps within this research landscape by providing the first systematic evaluation of in-context learning for Tarifit translation, establishing baseline performance metrics and optimal ICL configurations for this underrepresented but culturally significant Berber language.

3. Tarifit Language Background

Tarifit (ISO 639-3: rif), also known as Northern Berber or Rifian, is a Berber language belonging to the Afroasiatic language family. The Tarifit language community represents a significant yet underserved linguistic group in the digital age. With 5 million speakers worldwide, including 3 million in Northern Morocco and major communities across Europe—notably in Belgium (700,000), the Netherlands (600,000), France (300,000), and Spain (220,000)—this vibrant community lacks access to modern language technologies that many other languages take for granted [3].

3.1. Geographic Distribution and Multilingual Context

Tarifit is primarily spoken in the Rif region of Northern Morocco, encompassing provinces such as Al Hoceima, Nador, and parts of Taounate and Taza. However, the language extends far beyond Morocco’s borders through substantial diaspora communities established through decades of migration. As detailed in Table 1, the global distribution of Tarifit speakers creates a complex multilingual landscape with varying contact languages across different regions.
This geographic distribution creates a complex sociolinguistic landscape where Tarifit speakers regularly engage in code-switching between Tarifit, Arabic, and French in Morocco, or between Tarifit and European languages in diaspora communities. The multilingual competence of Tarifit speakers presents both opportunities and challenges for machine translation systems, as natural Tarifit discourse often contains lexical borrowings and code-mixed segments that automated systems must handle appropriately. The choice of Arabic, French, and English as target languages in our study directly reflects this multilingual reality, covering the primary contact languages across different Tarifit-speaking regions.

3.2. Writing Systems and Orthographic Variation

Tarifit can be written using multiple script systems: Tifinagh (the traditional Berber script), Latin script, Berber Latin script, or Arabic letters, reflecting the diverse literacy practices and historical influences within the community [3]. Table 2 illustrates this orthographic diversity through a simple greeting, demonstrating how the same linguistic content can appear in multiple written forms depending on the context and community practices.
Tifinagh represents the traditional indigenous script, increasingly promoted in educational and cultural contexts as part of Berber language revitalization efforts. The Latin-based orthographies are most common in digital contexts and educational materials, while Arabic script usage reflects the broader Arabic literacy in Morocco. This orthographic diversity poses significant challenges for computational processing, as the same linguistic content may appear in multiple scripts depending on the writer’s background, intended audience, and platform.
For the purposes of this study, we focus exclusively on Latin script representation of Tarifit, as it is the most prevalent form in digital communication and online resources from which our dataset is constructed. This methodological choice allows for consistent preprocessing and evaluation while avoiding the additional complexity of cross-script normalization, though we acknowledge that a comprehensive Tarifit NLP system would ultimately need to handle all script variants. The lack of standardized orthographic conventions across different Tarifit-speaking communities further complicates automated text processing, creating additional preprocessing challenges for machine translation systems, particularly when training data or evaluation metrics must account for multiple valid representations of the same linguistic content.

3.3. Linguistic Features and Computational Challenges

Tarifit exhibits the rich agglutinative morphology characteristic of Berber languages, with complex verbal inflection systems that mark person, number, gender, tense, aspect, and mood through prefixes, suffixes, and internal vowel alternations. This morphological complexity, combined with relatively free word order and extensive use of clitics, presents substantial challenges for automated parsing and translation. Traditional rule-based approaches struggle with the combinatorial complexity of morphological variations, while neural approaches require large training corpora that are unavailable for Tarifit.
The language demonstrates significant lexical borrowing from Arabic due to centuries of contact, creating cognates and shared vocabulary that may facilitate cross-lingual transfer in machine translation contexts, particularly for the Tarifit–Arabic language pair. However, this lexical overlap also introduces false friends and semantic shifts that can mislead automated systems. Additionally, the limited availability of digital corpora and standardized linguistic resources severely constrains the development of traditional data-driven NLP applications, making in-context learning approaches particularly valuable for this language community. The sparse digital presence and inconsistent writing practices documented in recent studies [29] further underscore the need for robust few-shot learning methodologies that can operate effectively with minimal training data.
Figure 1 illustrates the complex morphological structure characteristic of Tarifit through representative examples that demonstrate the agglutinative nature of the language and the computational challenges it presents for automated processing.

4. Methodology

4.1. In-Context Learning Framework

In-context learning enables large language models to perform translation tasks using only demonstration examples provided in the prompt, without parameter updates [8]. This paradigm is particularly promising for low-resource languages like Tarifit, where traditional neural machine translation approaches are hindered by the scarcity of parallel training data [7]. For a given Tarifit sentence x, the ICL translation process is formalized as:
$$\hat{y} = \mathrm{LLM}(\pi(D, x))$$
where $\hat{y}$ is the predicted translation, $\mathrm{LLM}(\cdot)$ represents the large language model, and $\pi(D, x)$ is the prompt construction function combining demonstration examples $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_k, y_k)\}$ with the input sentence $x$. Each $(x_i, y_i)$ pair consists of a Tarifit sentence and its corresponding translation.
The prompt construction follows the standard ICL framework where the context C contains task instructions and demonstration examples. Complete prompt templates for all language pairs are provided in Appendix A.1.
$$C = \{I, s(x_1, y_1), s(x_2, y_2), \ldots, s(x_k, y_k), x\}$$
Here, $I$ represents task instructions, $s(x_i, y_i)$ represents a formatted demonstration example, and $x$ is the input sentence to be translated. Figure 2 illustrates the complete ICL pipeline for Tarifit translation.
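As a concrete illustration, the prompt construction function $\pi$ can be sketched as a simple template filler. This is a minimal sketch under our own naming: the function `build_prompt`, the template strings, and the example sentences are illustrative, not the authors' exact prompts (their full templates are in Appendix A.1).

```python
def build_prompt(instructions, demonstrations, source_sentence):
    """Assemble an ICL prompt C = {I, s(x1,y1), ..., s(xk,yk), x}.

    demonstrations: list of (tarifit_sentence, translation) pairs.
    """
    parts = [instructions]
    for src, tgt in demonstrations:
        parts.append(f"Tarifit: {src}\nTranslation: {tgt}")
    # The final segment leaves the translation slot open for the model.
    parts.append(f"Tarifit: {source_sentence}\nTranslation:")
    return "\n\n".join(parts)

# Illustrative strings only; not drawn from the paper's dataset.
prompt = build_prompt(
    "Translate the following Tarifit sentences into English.",
    [("Azul fell-awen!", "Hello to you all!")],
    "Mamec tellid?",
)
```

The returned string is then passed verbatim to the LLM, which completes the open `Translation:` slot.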

4.2. Dataset Construction

We constructed a comprehensive dataset of 1000 Tarifit sentences with parallel translations in Arabic, French, and English. The dataset was carefully designed to capture the linguistic diversity and cultural richness of Tarifit across multiple domains and contexts (Table 3). The stratified sampling approach ensures representation of different linguistic phenomena, including morphological complexity, lexical borrowing from Arabic, and code-switching patterns characteristic of natural Tarifit discourse.
The dataset is available for research purposes upon request through our institutional ethics committee, with data collection methodology detailed in Appendix A.4.
All texts use Latin script representation following our methodological scope (Section 3.2). Reference translations were produced by three qualified native Tarifit speakers fluent in the target languages, emphasizing semantic accuracy and cultural appropriateness. Translation guidelines prioritized preserving cultural nuances and idiomatic expressions while maintaining natural target language fluency.

4.3. Shot Selection Strategies

The selection of appropriate demonstration examples is crucial for ICL performance. We evaluate three distinct shot selection approaches, each representing different strategies for optimizing the demonstration set composition.

4.3.1. Random Selection

Our baseline approach randomly samples k examples from the available parallel corpus:
$$D_{\mathrm{random}} = \mathrm{UniformSample}(P, k)$$
This strategy provides a control condition for evaluating the effectiveness of more sophisticated selection methods.
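A minimal sketch of this baseline, assuming the parallel corpus $P$ is held as a list of sentence pairs (the function name and seeding convention are ours, for reproducibility):

```python
import random

def random_selection(parallel_corpus, k, seed=None):
    """D_random = UniformSample(P, k): draw k demonstration pairs
    uniformly without replacement from the parallel corpus P."""
    return random.Random(seed).sample(parallel_corpus, k)
```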

4.3.2. Similarity-Based Selection

This approach selects demonstrations most semantically similar to the input sentence using cosine similarity between multilingual sentence embeddings:
$$\mathrm{sim}(x, x_i) = \frac{\mathrm{emb}(x) \cdot \mathrm{emb}(x_i)}{\lVert \mathrm{emb}(x) \rVert \, \lVert \mathrm{emb}(x_i) \rVert}$$
This similarity measure computes how semantically related two sentences are by comparing their vector representations in a high-dimensional space. Higher values (closer to 1) indicate more similar meaning, while lower values (closer to 0) suggest different semantic content. This approach ensures that demonstration examples share semantic characteristics with the input sentence, potentially improving the model’s ability to recognize relevant translation patterns.
$$D_{\mathrm{sim}} = \operatorname*{arg\,max}_{D \subseteq P,\, |D| = k} \sum_{(x_i, y_i) \in D} \mathrm{sim}(x, x_i)$$
We employ multilingual sentence embeddings to compute semantic similarity, enabling effective cross-lingual demonstration selection. The complete algorithm implementation can be found in Algorithm A1 (Appendix A.5).
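The selection can be sketched as follows, where `embed` stands in for a multilingual sentence-embedding model returning one vector per sentence; the function names here are ours, and ranking each pair independently is a greedy simplification of the argmax above (the authors' exact procedure is Algorithm A1).

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def similarity_selection(parallel_corpus, embed, x, k):
    """D_sim: the k (source, translation) pairs whose source
    sentences are most similar to the input x under the embedding."""
    ranked = sorted(parallel_corpus,
                    key=lambda pair: cosine(embed(pair[0]), embed(x)),
                    reverse=True)
    return ranked[:k]
```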

4.3.3. Diversity-Based Selection

This strategy ensures that demonstrations span different linguistic patterns while maintaining relevance to the input sentence, balancing similarity and coverage of morphological variations. The approach prioritizes examples that collectively cover diverse grammatical structures, vocabulary domains, and sentence lengths to provide comprehensive linguistic context for the model.
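The paper does not give a closed-form objective for this strategy, so the sketch below is one common instantiation of the similarity/coverage trade-off: a greedy, maximal-marginal-relevance-style pick in which each candidate is scored by its relevance to the input minus its redundancy with already-chosen demonstrations. All names and the mixing weight `lam` are our assumptions.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def diversity_selection(parallel_corpus, embed, x, k, lam=0.5):
    """Greedily pick k pairs, trading off similarity to the input x
    (weight lam) against redundancy with pairs already chosen."""
    remaining = list(parallel_corpus)
    chosen = []
    while remaining and len(chosen) < k:
        def score(pair):
            relevance = cosine(embed(pair[0]), embed(x))
            redundancy = max((cosine(embed(pair[0]), embed(c[0]))
                              for c in chosen), default=0.0)
            return lam * relevance - (1 - lam) * redundancy
        best = max(remaining, key=score)
        chosen.append(best)
        remaining.remove(best)
    return chosen
```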

4.4. Model Configuration

We evaluate three state-of-the-art large language models: GPT-4, Claude-3.5, and PaLM-2. These models span diverse architectures and parameter sizes, offering a representative comparison of modern LLM capabilities.
All models use temperature = 0 for deterministic outputs, ensuring reproducible results across experimental runs. We systematically vary the number of demonstration examples $k \in \{1, 3, 5, 8, 10, 15\}$ to identify optimal shot counts for each target language and model combination. This range covers the spectrum from few-shot to many-shot learning scenarios within typical context window constraints.
Table 4 presents the technical specifications of the three LLMs selected for this study. These models were chosen to represent different architectural approaches and parameter scales, providing comprehensive coverage of current state-of-the-art capabilities for in-context learning tasks.

4.5. Evaluation Framework

4.5.1. Automatic Metrics

We employ multiple automatic evaluation metrics to capture different aspects of translation quality:
BLEU: Standard n-gram precision metric measuring lexical overlap:
$$\mathrm{BLEU} = \mathrm{BP} \times \exp\!\left(\sum_{n=1}^{4} \frac{1}{4} \log p_n\right)$$
chrF: Character-level F-score particularly suitable for morphologically rich languages:
$$\mathrm{chrF} = \frac{(1 + \beta^2) \times \mathrm{chrP} \times \mathrm{chrR}}{\beta^2 \times \mathrm{chrP} + \mathrm{chrR}}$$
BERTScore: Semantic similarity using contextual embeddings:
$$\mathrm{BERTScore} = \frac{1}{|x|} \sum_{x_i \in x} \max_{y_j \in y} \mathrm{emb}(x_i) \cdot \mathrm{emb}(y_j)$$
The combination of these metrics provides complementary perspectives on translation quality, with BLEU capturing lexical fidelity, chrF addressing morphological variations, and BERTScore measuring semantic preservation.
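To make the chrF formula concrete, here is a deliberately simplified, single-reference, sentence-level sketch (whitespace stripped, character n-grams up to n = 6, β = 2); reported results would normally come from a standard implementation such as sacreBLEU rather than this toy version.

```python
from collections import Counter

def char_ngrams(text, n):
    """Multiset of character n-grams, ignoring spaces."""
    text = text.replace(" ", "")
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def chrf(hypothesis, reference, max_n=6, beta=2.0):
    """Simplified chrF: average char n-gram precision (chrP) and
    recall (chrR), combined with the F_beta formula above."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        h, r = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if sum(h.values()) == 0 or sum(r.values()) == 0:
            continue  # no n-grams of this order in one of the sides
        overlap = sum((h & r).values())
        precisions.append(overlap / sum(h.values()))
        recalls.append(overlap / sum(r.values()))
    if not precisions:
        return 0.0
    chr_p = sum(precisions) / len(precisions)
    chr_r = sum(recalls) / len(recalls)
    if chr_p + chr_r == 0:
        return 0.0
    return (1 + beta**2) * chr_p * chr_r / (beta**2 * chr_p + chr_r)
```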

4.5.2. Human Evaluation

A subset of 200 translations undergoes evaluation by qualified native speakers using 5-point Likert scales for adequacy and fluency assessment. Comprehensive evaluation guidelines, criteria, and evaluator qualifications are specified in Appendix A.2. Inter-annotator agreement is measured using Krippendorff’s alpha to ensure evaluation reliability.

4.5.3. Cross-Validation Protocol

We employ 5-fold cross-validation, where 1000 sentences are partitioned into 5 folds of 200 sentences each. For each fold, 800 sentences serve for shot selection and 200 for evaluation:
$$\mathrm{Performance}(M, S, k, L) = \frac{1}{5} \sum_{f=1}^{5} \mathrm{Evaluate}(M, S(D_f, k), T_f, L)$$
where $M$ is the model, $S$ is the shot selection strategy, $k$ is the number of shots, $L$ is the target language, $D_f$ is the demonstration set for fold $f$, and $T_f$ is the test set for fold $f$. This protocol ensures robust performance estimates while maximizing the use of our limited dataset.
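The fold partitioning can be sketched as follows (our own helper, assuming a simple shuffled round-robin split; the authors do not specify their exact partitioning code):

```python
import random

def five_fold_splits(sentences, n_folds=5, seed=0):
    """Yield (demonstration_pool, test_fold) index lists. Each fold
    serves once as the 200-sentence test set T_f, while the remaining
    800 sentences form the shot-selection pool D_f."""
    idx = list(range(len(sentences)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    for f in range(n_folds):
        held_out = set(folds[f])
        pool = [i for i in idx if i not in held_out]
        yield pool, sorted(held_out)
```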

5. Results

This section presents the systematic evaluation of in-context learning performance for Tarifit translation across three target languages using multiple large language models. All experiments were conducted following the 5-fold cross-validation protocol described in Section 4, with results averaged across folds to ensure robustness.

5.1. Model Performance and Cross-Lingual Analysis

Table 5 presents the translation performance across all model and language combinations using our optimal configuration (8-shot similarity-based selection). The results reveal substantial performance variations both across models and target languages.
Cross-Lingual Performance Patterns: The results demonstrate clear performance hierarchies across target languages. GPT-4 achieved the highest scores for Tarifit→Arabic translation (BLEU: 20.2), followed by Tarifit→French (BLEU: 14.8) and Tarifit→English (BLEU: 10.9). This pattern aligns with our hypothesis regarding linguistic proximity effects, where extensive lexical borrowing and shared vocabulary between Tarifit and Arabic facilitate cross-lingual transfer.
Model-Specific Analysis: GPT-4 emerged as the strongest performer overall, achieving the best scores across all three language pairs. The performance gap between the best and worst-performing models was most pronounced for Tarifit→English, with a difference of 2.8 BLEU points. All models consistently followed the Arabic > French > English performance hierarchy.
Linguistic Proximity Effects: The Tarifit→Arabic language pair consistently outperformed other directions across all models, with an average improvement of 8.4 BLEU points over Tarifit→English. This advantage can be attributed to shared Semitic substrate influences, extensive Arabic lexical borrowing in Tarifit, and similar morphological patterns.

5.2. In-Context Learning Optimization

Figure 3 illustrates the relationship between shot count and translation performance across different target languages and models. Our analysis identifies k = 8 as the optimal number of demonstration examples across most model–language combinations. Performance improvements plateau beyond this point, with slight degradation observed at k = 15, likely due to context window limitations.
Table 6 compares shot selection approaches across all evaluation metrics. Similarity-based selection consistently achieved the highest performance across all language pairs and metrics, with an average improvement of 2.1 BLEU points, 3.3 chrF points, and 3.1 BERTScore points over random selection. The effectiveness of this strategy is evident across lexical (BLEU), morphological (chrF), and semantic (BERTScore) dimensions of translation quality.
Figure 4 provides a visual comparison of these shot selection strategies, clearly illustrating the consistent superiority of similarity-based selection across all three target languages, with Arabic maintaining the highest performance levels followed by French and English.
Bootstrap confidence intervals (95% CI) confirm that performance differences between optimal and suboptimal configurations are statistically significant (p < 0.05) for all model–language combinations (see Appendix A.6 for detailed statistical analysis procedures).
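A paired bootstrap over sentence-level scores is the usual way such intervals are computed; the sketch below is our own minimal version (function name and defaults assumed, not taken from Appendix A.6): resample test sentences with replacement, recompute the mean score difference each time, and read the CI off the sorted differences.

```python
import random

def paired_bootstrap_ci(scores_a, scores_b, n_resamples=1000,
                        alpha=0.05, seed=0):
    """(1 - alpha) CI for the mean difference between two systems'
    sentence-level scores, via paired bootstrap resampling."""
    rng = random.Random(seed)
    n = len(scores_a)
    diffs = []
    for _ in range(n_resamples):
        sample = [rng.randrange(n) for _ in range(n)]
        diffs.append(sum(scores_a[i] - scores_b[i] for i in sample) / n)
    diffs.sort()
    lower = diffs[int(n_resamples * alpha / 2)]
    upper = diffs[int(n_resamples * (1 - alpha / 2)) - 1]
    return lower, upper
```

If the interval excludes zero, the difference between the two configurations is significant at the chosen level.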

5.3. Error Analysis and Human Evaluation

Human evaluation was conducted on 200 translations (67 per target language) using the optimal configuration. Three qualified native Tarifit speakers evaluated translations using 5-point Likert scales (Table 7).
Inter-annotator agreement measured by Krippendorff’s alpha was α = 0.69 for adequacy and α = 0.65 for fluency, indicating substantial agreement. The correlation between automatic metrics and human judgments was strongest for chrF (r = 0.84), while BLEU showed weaker correlation (r = 0.72) (Figure 5). Representative examples of human evaluation outcomes across different translation quality levels are provided in Appendix A.3, illustrating the range of translation challenges encountered and the evaluation criteria applied.
Systematic error analysis reveals distinct patterns across target languages (Figure 6):
Morphological Errors: 42% of errors involved incorrect handling of Tarifit's agglutinative morphology, most frequent in English translations (48%).
Lexical Errors: 28% of errors involved lexical choices. Arabic translations showed fewer lexical errors (19%) compared to French (31%) and English (39%).
Cultural Errors: 18% of errors involved mistranslation of culture-specific terms, relatively consistent across languages (15–22%).
Code-Switching: 12% of errors occurred in code-switched segments. Arabic translations handled these most effectively (28% success rate vs. 8% for French and 6% for English).
The error analysis reveals that while current LLMs demonstrate promising capabilities for Tarifit translation, systematic challenges remain in morphological processing and cross-linguistic transfer, particularly for typologically distant target languages. The increased dataset size of 1000 sentences provides more robust error pattern identification and demonstrates the persistent challenges in automated processing of morphologically complex low-resource languages.

6. Discussion and Conclusions

6.1. Discussion

This study presents the first systematic evaluation of in-context learning for Tarifit machine translation, establishing baseline performance metrics and identifying optimal ICL configurations for this underrepresented Berber language. Our key contributions include: (1) demonstration that linguistic proximity significantly enhances ICL performance, with Arabic translations outperforming French and English by substantial margins; (2) identification of k = 8 as the optimal shot count and similarity-based selection as the most effective demonstration strategy; (3) comprehensive error analysis revealing systematic challenges in morphological processing and cultural preservation; and (4) validation of chrF as a more appropriate evaluation metric than BLEU for morphologically complex languages.
Our findings have immediate practical implications for developing translation tools for the 5 million Tarifit speakers worldwide, particularly in multilingual contexts where Arabic–Tarifit translation can serve as a bridge for accessing digital resources. The demonstrated effectiveness of ICL approaches, despite performance limitations, provides a viable path forward for developing NLP tools for other low-resource Berber languages, circumventing the data scarcity challenges that have historically limited computational linguistics research in this language family.
The realistic performance benchmarks established in this study (20.2 BLEU for Arabic, 14.8 for French, 10.9 for English using GPT-4) provide a foundation for future research and set appropriate expectations for practical deployment. While these scores indicate that significant challenges remain, particularly in morphological processing and cross-linguistic transfer, they represent substantial progress for a language with virtually no prior computational resources.

6.2. Societal and Educational Implications

The development of effective machine translation for Tarifit has significant implications beyond computational linguistics. For the 5 million Tarifit speakers worldwide, access to translation technology can bridge communication gaps in multilingual contexts, particularly for diaspora communities maintaining connections with their linguistic heritage.
In educational settings, these tools can support Tarifit language preservation efforts by enabling content translation for educational materials and facilitating bilingual education programs. The demonstrated effectiveness of ICL approaches suggests viable pathways for developing similar technologies for other endangered Berber languages, contributing to broader linguistic diversity preservation in the digital age.
Furthermore, the accessibility of ICL methods—requiring minimal computational resources compared to traditional neural machine translation—makes this technology more democratically available to language communities that lack extensive technical infrastructure. This democratization of language technology represents a step toward more equitable representation in artificial intelligence systems. These findings align with patterns observed in ICL research for other morphologically complex languages like Manchu [12] while revealing the unique influence of historical language contact patterns in Berber language processing, where Arabic proximity effects (8.4 BLEU advantage) exceed typical related-language improvements in ICL studies.

6.3. Future Work

Several research directions emerge from our findings. First, extending our methodology to other Berber languages (Tamazight, Tashelhiyt, Kabyle) would validate the generalizability of our ICL optimization strategies and linguistic proximity effects. Second, investigating bidirectional translation capabilities, particularly Arabic→Tarifit, could provide insights into the asymmetries of cross-lingual transfer. Third, developing multilingual ICL approaches that simultaneously leverage multiple target languages could improve overall translation quality through cross-lingual knowledge sharing.
Technical extensions should explore the integration of morphological analyzers and cultural knowledge bases into ICL prompts, potentially addressing the systematic errors identified in our analysis. Additionally, investigating prompt engineering techniques specifically designed for morphologically rich languages could further improve performance. Finally, comprehensive comparison with fine-tuning approaches, when sufficient computational resources permit, would establish the relative merits of ICL versus traditional neural machine translation methods for low-resource languages.
Practical applications should focus on developing user-friendly interfaces that appropriately communicate translation confidence levels and limitations to end users. Given the performance constraints identified, hybrid approaches combining ICL with human post-editing or community-driven correction mechanisms may prove most effective for real-world deployment.
Our work contributes to the broader goal of linguistic equity in artificial intelligence systems, demonstrating that state-of-the-art language models can be effectively adapted for underrepresented languages through carefully designed in-context learning approaches, albeit with realistic performance expectations. As the field moves toward more inclusive NLP technologies, our methodology provides a template for developing translation capabilities for the thousands of low-resource languages that lack sufficient data for traditional machine translation approaches.

Author Contributions

Conceptualization, O.A. and K.F.; methodology, O.A.; investigation, O.A.; resources, O.A.; data curation, O.A.; writing—original draft preparation, O.A.; writing—review and editing, K.F.; visualization, O.A.; supervision, K.F.; project administration, O.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Experimental Reproducibility Details

Appendix A.1. Sample ICL Prompt Templates

 Tarifit-to-English Translation Prompt: 
Task: Translate the following Tarifit sentences to English.
Examples:
Tarifit: Azul, mlih cha?
English: Hello, how are you?
Tarifit: Ad yas qbar i thmadith
English: He will come before noon
Tarifit: Tamghart ni tsawar tamazight
English: That woman speaks Amazigh
[Additional examples selected based on similarity strategy…]
Now translate:
Tarifit: [INPUT SENTENCE]
English:
 Tarifit-to-French Translation Prompt: 
Task: Translate the following Tarifit sentences to French.
Examples:
Tarifit: Azul, mlih cha?
French: Bonjour comment allez-vous?
Tarifit: Ad yas qbar i thmadith
French: Il viendra avant midi
[Additional examples…]
Now translate:
Tarifit: [INPUT SENTENCE]
French:
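The templates above follow a fixed structure and can be assembled programmatically. This sketch reproduces the Appendix A.1 format using the demonstration pairs shown; the function name and interface are illustrative, not the authors' code.

```python
def build_prompt(demonstrations, source, target_language="English"):
    """Assemble a few-shot translation prompt in the Appendix A.1 format.
    `demonstrations` is a list of (tarifit, translation) pairs, e.g. the
    k most similar corpus sentences."""
    lines = [
        f"Task: Translate the following Tarifit sentences to {target_language}.",
        "Examples:",
    ]
    for src, tgt in demonstrations:
        lines.append(f"Tarifit: {src}")
        lines.append(f"{target_language}: {tgt}")
    lines.append("Now translate:")
    lines.append(f"Tarifit: {source}")
    lines.append(f"{target_language}:")  # model completes after this cue
    return "\n".join(lines)

demos = [("Azul, mlih cha?", "Hello, how are you?"),
         ("Ad yas qbar i thmadith", "He will come before noon")]
print(build_prompt(demos, "Tamghart ni tsawar tamazight"))
```

The trailing "English:" line leaves the completion slot open, matching the template's "[INPUT SENTENCE]" placeholder convention.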

Appendix A.2. Human Evaluation Protocol

Evaluator Qualifications:
  • Native Tarifit speakers
  • Fluent in target languages (Arabic/French/English)
  • Linguistic or translation background preferred
Evaluation Criteria:
  • Adequacy (1–5): Does the translation convey the meaning of the source text?
    • 5: Complete meaning preserved
    • 4: Most meaning preserved, minor gaps
    • 3: Essential meaning preserved
    • 2: Some meaning preserved
    • 1: Little or no meaning preserved
  • Fluency (1–5): Is the translation natural in the target language?
    • 5: Perfect fluency
    • 4: Good fluency, minor issues
    • 3: Acceptable fluency
    • 2: Disfluent but understandable
    • 1: Very disfluent

Appendix A.3. Sample Human Evaluation Examples

Example 1—High Quality Translation:
  • Tarifit Source: Azul, mlih cha?
  • GPT-4 Translation: Hello, how are you?
  • Reference Translation: Hello, how are you?
  • Evaluator Scores: Adequacy: 5/5, Fluency: 5/5
  • Comments: Perfect translation preserving greeting convention
Example 2—Good Translation with Minor Issues:
  • Tarifit Source: Tamghart ni tsawar tamazight
  • GPT-4 Translation: That woman speaks Amazigh
  • Reference Translation: That woman speaks Berber
  • Evaluator Scores: Adequacy: 4/5, Fluency: 5/5
  • Comments: Accurate but uses “Amazigh” instead of more common “Berber”
Example 3—Translation with Morphological Error:
  • Tarifit Source: Netta wa ditis cha
  • GPT-4 Translation: He will not coming
  • Reference Translation: He will not come
  • Evaluator Scores: Adequacy: 3/5, Fluency: 2/5
  • Comments: Meaning preserved but grammatical error in English
Example 4—Cultural Context Challenge:
  • Tarifit Source: Chha tsa3at?
  • GPT-4 Translation: How many hours?
  • Reference Translation: What time is it?
  • Evaluator Scores: Adequacy: 2/5, Fluency: 4/5
  • Comments: Literal translation misses idiomatic time-asking expression

Appendix A.4. Data Collection Methodology

  • Source Selection: Sentences collected from social media posts, traditional stories, and conversational recordings with speaker consent
  • Translation Process: Each sentence translated independently by three qualified native speakers
  • Quality Control: Disagreements resolved through consensus discussion
  • Cultural Sensitivity: All materials reviewed for cultural appropriateness before inclusion

Appendix A.5. Shot Selection Algorithm

Algorithm 1: Similarity-Based Shot Selection
Require: input sentence x, corpus C, shot count k
Ensure: selected demonstrations D
1: Compute embedding emb(x) for the input sentence
2: for each sentence s_i ∈ C do
3:   sim(x, s_i) ← (emb(x) · emb(s_i)) / (‖emb(x)‖ · ‖emb(s_i)‖)
4: end for
5: Sort sentences by similarity score (descending)
6: Select the top k sentences as demonstrations D
7: return D
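A minimal Python rendering of Algorithm 1, with the cosine similarity from step 3 written out explicitly. The toy three-dimensional vectors stand in for real sentence embeddings, whose source the algorithm leaves unspecified.

```python
import math

def cosine(a, b):
    """Cosine similarity: dot(a, b) / (||a|| * ||b||), as in Algorithm 1, step 3."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def select_shots(input_emb, corpus, k):
    """Return the k corpus sentences whose embeddings are most similar to
    the input embedding. `corpus` is a list of (sentence, embedding) pairs."""
    ranked = sorted(corpus, key=lambda item: cosine(input_emb, item[1]),
                    reverse=True)           # step 5: sort descending
    return [sentence for sentence, _ in ranked[:k]]  # step 6: top k

# Toy embeddings standing in for real sentence representations:
corpus = [("sent A", [1.0, 0.0, 0.0]),
          ("sent B", [0.9, 0.1, 0.0]),
          ("sent C", [0.0, 1.0, 0.0]),
          ("sent D", [0.0, 0.0, 1.0])]
print(select_shots([1.0, 0.05, 0.0], corpus, k=2))  # → ['sent A', 'sent B']
```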

Appendix A.6. Statistical Analysis Details

  • Significance Testing: Paired t-tests for performance comparisons
  • Confidence Intervals: Bootstrap sampling with 1000 iterations
  • Effect Size: Cohen’s d for practical significance assessment
  • Multiple Comparisons: Bonferroni correction applied where appropriate
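The paired t statistic and Cohen's d listed above can be computed from per-sentence scores of two systems as in this sketch. The score lists are illustrative; in practice the p-value would be read from the t distribution with n − 1 degrees of freedom (e.g. via scipy.stats.ttest_rel), and the Bonferroni-corrected threshold would divide α by the number of comparisons.

```python
import statistics

def paired_t_and_cohens_d(scores_a, scores_b):
    """Paired t statistic and Cohen's d for two systems scored on the
    same sentences."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)       # sample standard deviation of diffs
    t = mean_d / (sd_d / n ** 0.5)       # paired t statistic
    d = mean_d / sd_d                    # Cohen's d for paired data
    return t, d

# Hypothetical per-sentence scores for two configurations:
sim_scores  = [0.21, 0.19, 0.24, 0.18, 0.22]
rand_scores = [0.17, 0.16, 0.20, 0.15, 0.18]
t, d = paired_t_and_cohens_d(sim_scores, rand_scores)
print(f"t = {t:.2f}, d = {d:.2f}")
# Bonferroni: with, say, nine model-language comparisons, the corrected
# significance threshold would be 0.05 / 9.
```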

References

  1. Joshi, P.; Santy, S.; Budhiraja, A.; Bali, K.; Choudhury, M. The state and fate of linguistic diversity and inclusion in the NLP world. arXiv 2020, arXiv:2004.09095. [Google Scholar]
  2. Galla, C.K. Indigenous language revitalization, promotion, and education: Function of digital technology. Comput. Assist. Lang. Learn. 2016, 29, 1137–1151. [Google Scholar] [CrossRef]
  3. Awar.ai: First Speech Recognition for Tarifit. Available online: https://awar.ai (accessed on 5 May 2025).
  4. Aissati, A.E.; Karsmakers, S.; Kurvers, J. ‘We are all beginners’: Amazigh in language policy and educational practice in Morocco. Comp. A J. Comp. Int. Educ. 2011, 41, 211–227. [Google Scholar] [CrossRef]
  5. Ait Laaguid, B.; Khaloufi, A. Amazigh language use on social media: An exploratory study. J. Arbitrer 2023, 10, 24–34. [Google Scholar] [CrossRef]
  6. Taghbalout, I.; Allah, F.A.; Marraki, M.E. Towards UNL-based machine translation for Moroccan Amazigh language. Int. J. Comput. Sci. Eng. 2018, 17, 43–54. [Google Scholar] [CrossRef]
  7. Maarouf, O.; Maarouf, A.; El Ayachi, R.; Biniz, M. Automatic translation from English to Amazigh using transformer learning. Indones. J. Electr. Eng. Comput. Sci. 2024, 34, 1924–1934. [Google Scholar] [CrossRef]
  8. Brown, T.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.D.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; et al. Language models are few-shot learners. Adv. Neural Inf. Process. Syst. 2020, 33, 1877–1901. [Google Scholar]
  9. Zhang, K.; Choi, Y.; Song, Z.; He, T.; Wang, W.Y.; Li, L. Hire a linguist!: Learning endangered languages in LLMs with in-context linguistic descriptions. In Findings of the Association for Computational Linguistics: ACL 2024; Association for Computational Linguistics: Stroudsburg, PA, USA, 2024; pp. 15654–15669. [Google Scholar]
  10. Tanzer, G.; Suzgun, M.; Visser, E.; Jurafsky, D.; Melas-Kyriazi, L. A benchmark for learning to translate a new language from one grammar book. In Proceedings of the Twelfth International Conference on Learning Representations, Vienna, Austria, 7–11 May 2024. [Google Scholar]
  11. Hus, J.; Anastasopoulos, A. Back to school: Translation using grammar books. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, Miami, FL, USA, 12–16 November 2024; pp. 20207–20219. [Google Scholar]
  12. Pei, R.; Liu, Y.; Lin, P.; Yvon, F.; Schütze, H. Understanding In-Context Machine Translation for Low-Resource Languages: A Case Study on Manchu. arXiv 2025, arXiv:2502.11862. [Google Scholar]
  13. El Ouahabi, S.; Atounti, M.; Bellouki, M. Toward an automatic speech recognition system for amazigh-tarifit language. Int. J. Speech Technol. 2019, 22, 421–432. [Google Scholar] [CrossRef]
  14. Boulal, H.; Bouroumane, F.; Hamidi, M.; Barkani, J.; Abarkan, M. Exploring data augmentation for Amazigh speech recognition with convolutional neural networks. Int. J. Speech Technol. 2024, 28, 53–65. [Google Scholar] [CrossRef]
  15. Outahajala, M.; Zenkouar, L.; Rosso, P.; Martí, A. Tagging amazigh with ancorapipe. In Proceedings of the Workshop on Language Resources and Human Language Technology for Semitic Languages, Valletta, Malta, 26 January 2010; pp. 52–56. [Google Scholar]
  16. Outahajala, M. Processing Amazighe Language. In Natural Language Processing and Information Systems, Proceedings of the 16th International Conference on Applications of Natural Language to Information Systems, NLDB 2011, Alicante, Spain, 28–30 June 2011; Proceedings 16; Springer: Berlin/Heidelberg, Germany, 2011; pp. 313–317. [Google Scholar]
  17. Boulaknadel, S.; Ataa Allah, F. Building a standard Amazigh corpus. In Proceedings of the Third International Conference on Intelligent Human Computer Interaction (IHCI 2011), Prague, Czech Republic, 29–31 August 2011; Springer: Berlin/Heidelberg, Germany, 2012; pp. 91–98. [Google Scholar]
  18. Allah, F.A.; Miftah, N. The First Parallel Multi-lingual Corpus of Amazigh. J. Eng. Res. Appl. 2018, 8, 5–12. [Google Scholar]
  19. Diab, N.; Sadat, F.; Semmar, N. Towards Guided Back-translation for Low-resource languages—A Case Study on Kabyle-French. In Proceedings of the 2024 16th International Conference on Human System Interaction (HSI), Paris, France, 8–11 July 2024; pp. 1–4. [Google Scholar]
  20. Nejme, F.Z.; Boulaknadel, S.; Aboutajdine, D. Finite state morphology for Amazigh language. In Proceedings of the International Conference on Intelligent Text Processing and Computational Linguistics, Samos, Greece, 24–30 March 2013; Springer: Berlin/Heidelberg, Germany, 2013; pp. 189–200. [Google Scholar]
  21. Nejme, F.Z.; Boulaknadel, S.; Aboutajdine, D. AmAMorph: Finite state morphological analyzer for amazighe. J. Comput. Inf. Technol. 2016, 24, 91–110. [Google Scholar] [CrossRef]
  22. Ammari, R.; Zenkoua, A. APMorph: Finite-state transducer for Amazigh pronominal morphology. Int. J. Electr. Comput. Eng. 2021, 11, 699. [Google Scholar] [CrossRef]
  23. Lin, X.V.; Mihaylov, T.; Artetxe, M.; Wang, T.; Chen, S.; Simig, D.; Ott, M.; Goyal, N.; Bhosale, S.; Du, J.; et al. Few-shot learning with multilingual generative language models. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, Abu Dhabi, United Arab Emirates, 7–11 December 2022; pp. 9019–9052. [Google Scholar]
  24. Vilar, D.; Freitag, M.; Cherry, C.; Luo, J.; Ratnakar, V.; Foster, G. Prompting PaLM for translation: Assessing strategies and performance. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Toronto, ON, Canada, 9–14 July 2023; pp. 15406–15427. [Google Scholar]
  25. Ghazvininejad, M.; Gonen, H.; Zettlemoyer, L. Dictionary-based phrase-level prompting of large language models for machine translation. arXiv 2023, arXiv:2302.07856. [Google Scholar]
  26. Elsner, M.; Needle, J. Translating a low-resource language using GPT-3 and a human-readable dictionary. In Proceedings of the 20th SIGMORPHON workshop on Computational Research in Phonetics, Phonology, and Morphology, Toronto, ON, Canada, 14 July 2023; pp. 1–13. [Google Scholar]
  27. Zhang, C.; Liu, X.; Lin, J.; Feng, Y. Teaching large language models an unseen language on the fly. In Findings of the Association for Computational Linguistics: ACL 2024; Association for Computational Linguistics: Stroudsburg, PA, USA, 2024; pp. 8783–8800. [Google Scholar]
  28. Merx, R.; Mahmudi, A.; Langford, K.; de Araujo, L.A.; Vylomova, E. Low-resource machine translation through retrieval-augmented LLM prompting: A study on the Mambai language. In Proceedings of the 2nd Workshop on Resources and Technologies for Indigenous, Endangered and Lesser-resourced Languages in Eurasia (EURALI) @ LREC-COLING 2024, Turin, Italy, 20–24 May 2024; pp. 1–11. [Google Scholar]
  29. Tahiri, N. Word Boundaries in the Writing System of Standard Amazigh: Challenges from Tarifit Facebook Users. In The Handbook of Berber Linguistics; Springer: Berlin/Heidelberg, Germany, 2024; pp. 229–253. [Google Scholar]
  30. Ataa Allah, F.; Boulaknadel, S. New trends in less-resourced language processing: Case of Amazigh language. Int. J. Nat. Lang. Comput. (IJNLC) 2023, 12. [Google Scholar]
Figure 1. Tarifit morphological structure examples. The diagram illustrates key morphological patterns in Tarifit: (a) circumfix negation wa…cha surrounding the verb phrase in “he will not come”, (b) complex negation with locative and existential verb in “why doesn’t he exist here?” (c) feminine circumfix ta-…-t surrounding the root in “woman”, and (d) interrogative construction in “what time?” Color coding indicates grammatical markers (red), root words (green), verbs/nouns (blue), and question words (yellow). These circumfix patterns demonstrate the computational challenges posed by Tarifit’s discontinuous morphemes for machine translation systems.
Figure 2. In-context learning framework for Tarifit translation. The diagram shows the six-step process: (1) Tarifit input sentence, (2) shot selection strategy with corpus and target language information, (3) demonstration example selection, (4) prompt construction, (5) LLM processing, and (6) translation output. The evaluation framework employs automatic metrics (BLEU, chrF, BERTScore), human evaluation, and 5-fold cross-validation.
Figure 3. Translation Performance vs. Shot Count Across Models and Languages. The graphs demonstrate that performance consistently improves from 1-shot to 8-shot configurations across all model–language combinations, with diminishing returns beyond this point. Arabic translations (blue lines) show the steepest improvement curves and highest peak performance, while English translations (red lines) exhibit more modest gains. The plateau effect after 8 shots suggests optimal context utilization, with slight degradation at 15 shots indicating context window limitations. GPT-4 (solid lines) consistently outperforms other models, maintaining larger performance gaps for typologically distant language pairs.
Figure 4. Shot selection strategy performance.
Figure 5. Automatic vs. human evaluation correlation.
Figure 6. Error type distribution by language.
Table 1. Tarifit speaker distribution by region.
Region/Country | Speakers | Primary Contact Languages
Northern Morocco | 3,000,000 | Arabic, French
Belgium | 700,000 | Dutch, French
Netherlands | 600,000 | Dutch
France | 300,000 | French
Spain | 220,000 | Spanish, Catalan
Other Europe | 180,000 | Various
Total | 5,000,000 | -
Table 2. Example of Tarifit orthographic variation.
Script | Text | Usage Context
Tifinagh | [Tifinagh script image] | Cultural, digital, academic, and formal contexts
Latin | Azul, mlih cha? |
Berber Latin | Aẓul, mliḥ ca? |
Arabic | [Arabic script image] |
Translation: “Hello, how are you?”
Additional examples in context:
Latin | Yossid qbar i thmadith | He came before noon
Latin | Tamghart ni thsawar tamazight | That woman speaks Amazigh
Latin | Wanin bo awar ni | They don’t say that word
Table 3. Dataset composition.
Domain | Sentences | Avg. Length | Source
Conversational | 500 | 8.2 | Social media, interviews
Literary | 330 | 12.4 | Traditional stories, poetry
Cultural | 170 | 15.1 | Proverbs, oral traditions
Total | 1000 | 10.3 | -
Table 4. Large language model specifications and access details.
Specification | GPT-4 | Claude-3.5 | PaLM-2
Parameters | ∼1.7 T | ∼200 B | 540 B
Context Window | 8192 tokens | 200 K tokens | 8192 tokens
Access Method | OpenAI API (Paid) | Anthropic API (Paid) | Google API (Free tier)
Temperature | 0 | 0 | 0
Max Tokens | 1500 | 1500 | 1500
API Rate Limit | 10K RPM | 5K RPM | 1K RPM
Table 5. Translation performance across models and target languages. Results show clear performance hierarchies: (1) Arabic consistently outperforms French and English due to linguistic proximity and shared vocabulary, (2) GPT-4 achieves superior performance across all language pairs, with the largest advantage for distant languages, and (3) all models follow the same ranking pattern (Arabic > French > English), indicating robust cross-model linguistic proximity effects. BLEU scores represent lexical overlap, chrF captures morphological accuracy, and BERTScore measures semantic preservation.
Model | Tarifit→Arabic (BLEU / chrF / BERT) | Tarifit→French (BLEU / chrF / BERT) | Tarifit→English (BLEU / chrF / BERT)
GPT-4 | 20.2 / 38.7 / 69.4 | 14.8 / 32.1 / 61.2 | 10.9 / 27.8 / 56.8
Claude-3.5 | 18.6 / 36.3 / 67.1 | 13.1 / 29.6 / 58.9 | 9.4 / 25.2 / 54.3
PaLM-2 | 16.9 / 33.8 / 64.2 | 11.7 / 27.4 / 56.1 | 8.1 / 23.1 / 51.7
Table 6. Shot selection strategy performance across all metrics.
Strategy | Arabic (BLEU / chrF / BERT) | French (BLEU / chrF / BERT) | English (BLEU / chrF / BERT)
Random | 17.4 / 34.2 / 65.1 | 11.9 / 28.7 / 57.8 | 7.8 / 24.1 / 52.3
Similarity | 19.7 / 37.5 / 68.2 | 13.6 / 31.4 / 60.5 | 9.5 / 26.8 / 55.1
Diversity | 18.8 / 36.1 / 66.9 | 12.7 / 29.9 / 59.2 | 8.9 / 25.4 / 53.7
Table 7. Human evaluation results.
Language | Adequacy | Fluency | BLEU
Arabic | 3.4 ± 0.7 | 3.6 ± 0.6 | 20.2
French | 2.8 ± 0.8 | 3.0 ± 0.7 | 14.8
English | 2.5 ± 0.9 | 2.7 ± 0.8 | 10.9
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
