
2 July 2026

Documenting Environmental Knowledge in the Bahnar Language of Vietnam

1 UNESCO Chair in Environmental Leadership, Cultural Heritage, and Biodiversity, College of Arts and Sciences, VinUniversity, Hanoi 100000, Vietnam
2 Center for Plants, People, and Culture, New York Botanical Garden, Bronx, NY 10458, USA
3 School of Interdisciplinary Sciences and Arts, Vietnam National University, Hanoi 100000, Vietnam
4 Siebel School of Computing and Data Science, University of Illinois Urbana-Champaign, Champaign, IL 61820, USA

Abstract

Environmental knowledge encoded in Bahnar, an Indigenous language of Vietnam, is vital to the Bahnar community and contributes to broader understandings of biodiversity, climate resilience, and sustainable lifeways. We describe a collaborative approach to documenting Bahnar that integrates computational methods with ethnographic, lexicographic, and linguistic fieldwork. Because Bahnar knowledge is transmitted almost entirely through oral tradition rather than writing, effective documentation cannot rely solely on extractive corpus-based or NLP tools. Although three legacy bilingual Bahnar dictionaries exist, they are partially obsolete, uneven in coverage, and largely inaccessible to the community itself. Our corpus analysis of the Bahnar environmental vocabulary, complemented by intensive community-based fieldwork, reveals semantic patterns that closely link environmental knowledge with Bahnar lifeways, subsistence practices, and material culture. These patterns, we argue, are language-specific and may not emerge from analyses of environmental lexicons in languages such as English or Vietnamese. Bahnar semantic categories attribute aesthetic, medicinal, mythological, and spiritual agency to animals, plants, and landscapes, contrasting with classificatory frameworks common in post-industrial societies that emphasize biophysical, scientific, or economic properties. We propose that community-centered digital lexicography can strengthen Bahnar language vitality, enhance local access to cultural knowledge, and simultaneously advance comparative linguistic and environmental research.

1. The Bahnar of Vietnam

The Bahnar people (Combs, 1873; Nguyễn et al., 1937/2011)—Vietnamese: người Ba Na—comprise one of the main Indigenous groups in Tây Nguyên, the Central Highlands of Vietnam. They speak languages that belong to the Bahnaric group (Sidwell, 2002) within the Mon-Khmer (Austroasiatic) language family. Within the shared ethnicity of Bahnar, there are separate sub-groups such as Kontum, Jơlơng, Rơngao, Gơlar, Konkơdeh, Tơlô, Kriêm, and Bơnâm. The Bahnar live mainly in the provinces of Kon Tum (which in June 2025 was incorporated into Quảng Ngãi province) and Gia Lai in the Central Highlands, both of which encompass our fieldwork sites. With a population of ca. 300,000 in 2019, Bahnar is the largest ethnic group speaking a Mon-Khmer language in the Central Highlands (Bùi, 2006, p. 13). Most Bahnars are bilingual in one or more varieties of Bahnar [bdq] and in Vietnamese [vie], and many also understand Jrai [jra], a Malayo-Polynesian language spoken by the Jarai of the Central Highlands. The use of the Bahnar language in the community is vigorous among all age groups, a hopeful sign.

Biodiversity and Language

Vietnam is a biodiversity hotspot with over 13,000 species of land-based plants, more than 10,000 species of land-based animals, and more than 11,000 marine species. The Central Highlands, where the Bahnar live, are among the “most biologically diverse … [and] largest contiguous forest habitats in the world” (World Wildlife Fund, 2021). They are also recognized as part of a language hotspot (Anderson & Harrison, 2006; Harrison, 2007)—areas of concentration of a high degree of linguistic diversity, a high level of language endangerment, and low levels of documentation—being home to 29 of Vietnam’s 110 listed languages. Cultural diversity directly supports biodiversity, and the two are interlinked in ways that are widely acknowledged but not yet fully understood (Maffi & Woodley, 2010). As Pretty et al. (2009, p. 100) posit: “…any hope for saving biological diversity is predicated on a concomitant effort to appreciate and protect cultural diversity.”
Like most Indigenous peoples, the Bahnar have an intimate dependency on the environment, and they are stewards of forest biodiversity. They possess knowledge of local species and ecosystems that surpasses what is currently known to science (UNDP, 2024), with such Indigenous ecological knowledge embedded in their language. This knowledge is of potential benefit to all of humanity, but only if it can be documented and preserved. Crucially, both the Bahnar people’s language and their ancestral forest landscape are in a precarious situation (Lê, 2024) and are facing many challenges. A growing body of research posits the “intertwinement of language and land” (Ferguson & Weaselboy, 2020; Harrison, 2023) and shows that, from Indigenous perspectives, the revitalization of languages and the protection of endangered biodiversity must happen in tandem to succeed.
We explore the utility of an environmental linguistics (Harrison, 2023) approach to documenting the rich Bahnar lexicon, and thus to supporting the biodiversity that the Bahnar steward, with a focus on the structure and content of Bahnar environmental knowledge as expressed in the lexicon.

2. Framing the Project

2.1. Research Questions

Environmental linguistics examines how languages uniquely encode knowledge of the natural world at all levels of structure (Harrison, 2023). This includes: phonesthetic encoding of the natural world (Nuckolls, 1996; Harrison, 2004); affixal morphemes that denote biotic life forms, as in Burmese, where the morphemes /PiN/ and /KɔN/, meaning ‘plant’ and ‘animal’, respectively, function both as independent nouns and as classifying elements within compound nouns (Vittrant, 2002); the lexicon and its organization into taxonomies; phrases and metaphors that express natural phenomena; and nature-centric texts of all genres. In applying this framework to Bahnar, we demonstrate the depth and structural diversity of nature-encoded knowledge across multiple levels of the lexicon, including verbs, reduplicated forms, and specialized terminology. We also attempt to quantify and uncover structure in the nature-reflective lexicon by sorting it into culturally coherent categories, uncovering semantic hierarchies, and identifying Bahnar concepts that may lack direct equivalents in French or English, such as aploh (verb) ‘to travel downriver by boat following the current’. Finally, Bahnar provides evidence for a central claim within environmental linguistics: that standard ontological categories are insufficient to capture Indigenous environmental knowledge. Western-centric ontologies cannot accommodate the spiritual animacy and agency that are central to the Bahnar worldview (Conklin, 1962). The forest (rơng) encompasses not only flora, fauna, and topography, but spirits, omens, and the imputed sentience of the landscape itself, demonstrating that the Bahnar lexicon must be documented on its own ontological terms.
Bahnar is a language with speakers of all ages, who make limited use of literacy and do so mostly in the context of using Bahnar-language scriptures (United Bible Societies, 2008) and hymnals in Catholic churches. Bahnar has some legacy documentation, mostly in the form of bilingual dictionaries, discussed in further detail below. Recent community-led efforts have resulted in the publication of bilingual editions of Bahnar traditional epic tales (Y et al., 2025). Most Bahnar are literate in Vietnamese as a second language, while some also speak English or French.
Relying on Bahnar legacy dictionaries and on conversations and elicitation with native speakers, we explore the following questions: (1) How can we discover the Bahnar worldview by extracting all environmental lexemes from two Bahnar bilingual dictionaries (Atkins, 1996; Drouin et al., 2018; Eidshaug et al., 2024; Khishigsuren et al., 2024)? (2) What proportion of the attested Bahnar lexicon relates directly to the environment and thus reveals local biodiversity and adaptations? And (3) what are culturally relevant ways to sort the extracted terms into semantic categories?
These three questions lead us to further questions: (4) How does the Bahnar lexicon reflect adaptations to the environment, including through the use of “ancestral technologies” (Reynolds-Cuéllar, 2023) or Cultural Keystone Species (CKS) (Nabhan & Carr, 1994)? (5) How do Bahnar linguistic structures, such as concepts, idioms, lexemes, morphemes, metaphors, phrasal collocations, reduplication, and taxonomies, encode a unique view of the environment? And (6) how can corpus linguistics and NLP¹ serve the Bahnar community and advance the goals that they clearly articulated to us: for their language and culture to thrive within the multiethnic context of Vietnam, and for the biodiverse cultural landscape they steward to be protected?
Our work is informed by a large body of contemporary theory and practice in “community-driven documentation” (François, 2018) and “community-oriented models of linguistic research” (Berez-Kroeker & Henke, 2016). We are mindful of Repetti’s (2018, p. 128) advice that: “Tools for community-oriented initiatives need to be user-friendly and contain information for a wide range of interests” and should “make the data more widely accessible to two audiences: the community of speakers and the community of linguists.” We also fully agree with the premise articulated by François (2018, p. 285) that: “…communities can benefit from the efforts of language documentation carried out by linguists, through increased access to valuable recordings. Interestingly, these results can also inspire the speakers themselves to pursue the work of documentation on their own language.”
Based on five years of work with the Bahnar community (2022–2026), we explore the above questions, applying lessons learned from our prior experience in community-based digital lexicography (Harrison et al., 2019, 2020). A key premise in all our work is community ownership of language (Ting, 2024), and our practices align with the “non-extractive” model of “third wave NLP” described by Bird (2024). Our work on the Bahnar language serves as a proof of concept that NLP can not only be non-extractive but can also contribute substantially to language vitality when community agency and language ideologies are prioritized.
Four challenges presented themselves, which we have framed as opportunities: (1) The primary orality of Bahnar. Although Bahnar is written for some specific purposes, such as translated scriptures or hymnals used in Bahnar Catholic villages, oral transmission is robust and remains the overwhelming norm. (2) The mediating function of Vietnamese as the contact language in which most Bahnar adults are proficient (some Bahnar also know English or French as a third or fourth language). (3) The existence of a rich corpus of three legacy bilingual dictionaries, discussed herein, that is inaccessible and largely unknown to the community. (4) The opportunity to create a smartphone-based, community co-authored Bahnar Talking Dictionary and a video corpus of Bahnar narratives, both of which serve the community while respecting their ownership of all linguistic resources, past and present.

2.2. Methodology

As already indicated, our research adopted a collaborative approach to documenting Bahnar, which integrates computational methods with ethnographic, lexicographic, and linguistic fieldwork.
We began by utilizing two of the existing legacy bilingual Bahnar dictionaries. As Rosner and Sultana (2014, p. 3790) note: “Bilingual dictionaries define word equivalents from one language to another, thus acting as an important bridge between languages. No bilingual dictionary is complete since languages are in a constant state of change.” Recognizing both the incompleteness and value of these sources, we first digitized the dictionaries, then applied AI and NLP tools as well as manual coding: this allowed us to generate a comprehensive taxonomy of Bahnar’s environmental vocabulary, concepts, and cultural models.
Tools such as word clouds and vector maps were also used to categorize and visualize the Bahnar lexicon. These visualizations helped us identify relationships between terms more clearly, and so provided insights into the cultural and ecological significance of the knowledge they encode.
Our research was also ethnographic, involving fifteen field trips by our research team to the community between 2022 and 2026, as well as reciprocal visits by community members to meet us in Hanoi and in Ho Chi Minh City. With the community, we engaged in participant observation, joining in as many cultural activities as the community would allow, including basket weaving, shooting bows and arrows, textile weaving, preparing traditional foods, harvesting rice, and navigating the river in traditional canoes. To extend the lexicon and collect more in-depth and specialized terms, we conducted elicitation sessions and semi-structured interviews with Bahnar experts and collected stories related by Bahnar hunters and forest experts, covering topics such as canoe-building, basketry, foodways, local species, rice harvest, and textiles. We made digital audio-video recordings, hand sketches, photographs, and hand-written notes.
Recognizing that local species biodiversity is not adequately reflected in the legacy dictionaries—for example, many entries are simply defined as ‘a kind of plant’, or ‘a kind of bird’—we created our own local species checklists for Kon Tum province, based on identified geolocated sightings of 900 species reported on the citizen science platform iNaturalist (iNaturalist, 2025), a field guide Birds of Vietnam (Craik & Lê, 2018), and the taxidermized animal collections displayed in Bảo Tàng Kon Tum (Kon Tum Museum). We then used these checklists, with photos, to elicit local animal names in Bahnar (see Figure 1).
Figure 1. (upper left) Bahnar Chief A Bên talks with linguist David Harrison about Bahnar baskets; (upper right) Bahnar Talking Dictionary co-author A Xơm records words with VinUniversity researcher Bùi Cong Minh; (lower left) chief A Bên talks with VinUniversity researcher Nguyễn Duy Hoàng about the Bahnar canoe; (lower right) former Bahnar chief A Banh reviews a bird species checklist with VinUniversity researcher Hoàng Công Minh Khang.
Altogether, this triangulation of approaches proved highly productive in helping the research team identify and document environmental vocabulary beyond what we found in the legacy dictionaries. Each method contributed differently, and we briefly consider what each one revealed and what implications it carries for language documentation and revitalization more broadly. Our first method, creating and refining word clouds, provided a useful visualization of word frequency tables, revealing patterns we might otherwise have missed, and directly informed the thematic wordlists we used in community elicitation sessions, allowing our native speaker consultants to engage with the lexicon in culturally meaningful groupings rather than arbitrary lists (Conklin, 1962). Our next method, semantic clustering, revealed some novel and culturally coherent natural classes (e.g., “gong.drum.dance”) but also introduced noise in the form of classes (e.g., “gong.poison.blood”) that we were unable to interpret as semantically coherent even upon closer inspection. Our final method, AI-assisted tagging, was at first overly narrow, resulting in many mistakenly omitted forms. With successive prompts, however, this method approached, but did not equal, human coding ability. While imperfect, AI-assisted tagging points toward scalable tools that could support future community-led lexicographic work, particularly as large language models improve and become more accessible to under-resourced language communities. Crucially, none of these three methods is duplicative. Each reveals different structural patterns while remaining blind to others. Word clouds foreground frequency; semantic clustering reveals thematic affinity; AI tagging approximates semantic categorization but, as we will show, still requires human oversight and community validation to be reliable.

2.3. Ethics

The ethics application for human participants research in this project was submitted to, and approved by, the Ethics Review Board at VinUniversity. Equally important, from the early planning of the project we obtained permission from the leadership of the Bahnar village of Kon K’tu, including the current and former chiefs. Our approach to the community was facilitated by one co-author of this paper, who is himself a member of the community, and by a second co-author who has worked closely with the community for over a decade. We also requested and received permission from provincial authorities. Upon commencing our fieldwork, we recruited ca. 20 expert consultants via referral and purposive sampling. All consultants gave verbal and/or written informed consent to share their linguistic knowledge, audio and video recordings, and, in some cases, photographs in the context of our project. Please see Section 2.5 below for further elaboration on our community-centered research.

2.4. Subjectivity

Our research team is diverse, consisting of American, Singaporean, and Vietnamese nationals, including university faculty, undergraduate students, and native speakers. Our academic expertise spans anthropology, business management, cognitive science, computer science, education, ethnomusicology, and several subfields of linguistics. Collectively, we have extensive fieldwork and language documentation experience in Vietnam and many other locations. The expertise of our Bahnar consultants includes agriculture, architecture, basketry, botany, fishing, foodways, gong playing, hunting, navigation, oral traditions, political leadership, ritual, textiles, traditional medicine, translation (Bahnar-Vietnamese and Bahnar-English), wayfinding, and woodworking.
Our research is multi-sited. Most of our computational work was carried out at VinUniversity in Hanoi, with no Bahnar speakers present, but we could communicate with them via text message to answer questions. Our fieldwork, as well as some computational analysis, was carried out primarily in Kon K’tu village (population ca. 1000), in Kon Tum Province, during a dozen field trips by members of our team. Additional expert speaker consultation was carried out in non-academic settings in Hanoi and Ho Chi Minh City with native speakers (including a co-author of this paper) who were residing or visiting there.

2.5. Community-Based Lexicography

An important ethical principle of our work is that the Bahnar language is the cultural property of the Bahnar people. This rules out a data-extraction approach (Bird, 2024) and requires both collaboration with the community and clear benefits for them (Harrison et al., 2019; Cooper et al., 2024; Harrison et al., 2026). As the Bahnar language is both under-resourced and vulnerable, we first consulted with community leaders to learn about their efforts to preserve and revitalize it. We then aligned our project with their goals in three ways.
First, we obtained permission to work in the community and to collect and curate the knowledge that they shared. We also repatriated copies of our newly collected data to them, as well as photocopies of the three legacy dictionaries, which they had not previously seen.
Second, in creating the Bahnar Talking Dictionary and other published materials, we made sure that Bahnar speakers and learners were actively involved in the entire process, from data collection to definition writing to recording. Prioritizing the community also entails ensuring that the dictionary and all published materials reflect the cultural context and nuances of Bahnar, including its unique vocabulary, dialect variation, cultural knowledge, and, crucially, its ontological frameworks. Two concrete actions in this regard are elaborated below.
Sorting lexemes into locally familiar semantic categories proved a beneficial initial step, prior to data collection from the legacy dictionaries and community experts. For a native speaker, sitting and reading words aloud from a list can be an arduous task, but one which can be made more congenial to speakers when words are sorted into culturally meaningful categories (e.g., ‘animals’), as lexical recall is eased by semantic priming (Wagner & Koutstaal, 2002). The use of such “thematic mini-dictionaries” (Mosel, 2014) is well-established in documentary linguistics.
For a culturally appropriate categorization, we started with the language’s own ontological categories (e.g., animal, plant, forest), rather than imposing external frameworks. Indigenous ontologies for the natural world, including the Bahnar worldview, emphasize the interconnectedness of all living things and view nature as imbued with agency and spirits. This contrasts with Western ontologies that tend to separate humans from nature and which have produced very complex and top-down semantic taxonomies (e.g., Raskin & Pan, 2005). The Bahnar concept of rơng (‘forest’), for example, encompasses not only forest dwelling animals and plants, and topography, but also the many spirits which reside in the forest, human actions that relate to the forest (these may take place either inside or outside of the forest), bird sounds and behaviors, odors, omens, and the perceived agency of the forest itself (Trần et al., 2024b). Similarly, baskets, of which there are more than twenty distinct types that we have identified to date (our documentation of Bahnar basket typology continues to expand, cf. Trần et al., 2024a), are naturally grouped into the category of plants for the Bahnar. This combination of natural and supernatural entities and handicrafts is not captured by any single category in Western environmental ontologies, yet for the Bahnar, it represents a unified category of thought and behavior.
Finally, our third action supporting community-based lexicography involves fully attributing all collected data (whether audio, textual, photographic, or video) to its originators. This attribution can be seen in the Bahnar Talking Dictionary (Trần et al., 2023), both at the level of individual entries and top-level co-authorship, and in our published papers (Trần et al., 2024a, 2024b), at the level of co-authorship and/or acknowledgement.
The community has responded enthusiastically to the Bahnar Talking Dictionary, even in its current nascent stage, with 3876 lexical entries. As Ogilvie (2011, p. 389) notes: “For the endangered-language speech community, the most useful and relevant research outcome of field linguistics is usually the dictionary. Articles and books on syntax, morphology, or phonology have little relevance to Indigenous speech communities. Dictionaries, however, are not only useful and functional texts, but emblems and tools of prestige which many communities use to boost their sense of identity and their political profiles.” For this reason, we decided to prioritize the Bahnar Talking Dictionary, along with a video corpus hosted on YouTube; both are accessible to the community on smartphones, which many of them use daily.

3. Creating the Bahnar Environmental Corpus

3.1. Seeding the Bahnar Environmental Corpus from Legacy Dictionaries

The first step in our corpus-building process was to digitize the legacy dictionaries in our possession, a common practice in linguistics (Harrison & Sariahmed, 2014; Mosel, 2014). These comprised a Bahnar-English dictionary (J. Banker et al., 1979) and a Bahnar-French dictionary (Dourisboure, 1889); the digitization and analysis are elaborated below. A second Bahnar-French dictionary (Guilleminet & Alberty, 1959–1963) has not yet been digitized or analyzed.
Simultaneously, we undertook a process of elicitation to assemble our own corpus, which we would then cross-check with the legacy sources.

3.1.1. Bahnar-English Dictionary

The Bahnar-English dictionary (J. Banker et al., 1979; see Figure 2), once digitized, yielded 3804 headwords. Following the convention of this dictionary, headwords are shown in all lowercase in our corpus; they are also italicized. To estimate the percentage of environment-related terms, two randomly selected sets of 1200 words each were generated and hand-coded. In set #1, we found 44% (n = 528/1200) to be environment-related, and in set #2, we found 42% (n = 507/1200). These included nouns (nhot ‘vegetables’), modifiers (adrih ‘alive (plant life), raw, unripe’), verbs (leh ‘of trap to spring’), mimetic expressions (brăl-brăl ‘to designate noise and action of easily breaking stick’), and temporal expressions (bơlao ‘day before yesterday’). In coding which words we would count as “environmental” (assuredly a category with porous boundaries), we first established a working operational definition. We defined an environmental term as any lexeme whose primary or secondary meaning relates directly to the natural world, including animals, plants, landscapes, weather, natural phenomena, subsistence activities, and material culture derived from natural resources. General human actions such as ‘speak’ or ‘jump’ were excluded, while verbs meaning ‘to poison fish’ or ‘to harvest rice’ were included. We relied on two complementary sources of judgment: (a) native speaker judgments provided by a Bahnar-fluent co-author, and (b) collective team review involving five researchers with L1 and L2 competencies across Bahnar, Vietnamese, English, and French, and spanning multiple academic disciplines and cultural backgrounds. Rather than coding independently and then reconciling scores, we worked collectively, reviewing each other’s proposed classifications and resolving borderline cases by majority agreement.
We did not adopt a formal, quantifiable coding procedure, nor did we calculate inter-rater reliability using metrics such as Cohen’s kappa, which we acknowledge as a limitation of our study. However, the collective and iterative nature of our process, grounded in native speaker judgment, provides a form of consensual validation that we believe is appropriate for a community-based lexicographic project of this kind.
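For readers unfamiliar with the metric mentioned above, the following is a minimal sketch of how Cohen’s kappa could be computed for two coders who independently label the same headwords. It is not part of our procedure, and the labels shown are hypothetical placeholders rather than study data; the function and variable names are likewise our own illustrative choices.

```python
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders labelling the same items."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty item list")
    n = len(coder_a)
    # Observed agreement: share of items given the same label by both coders.
    p_observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: product of the two coders' marginal label frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_chance = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_chance == 1.0:
        return 1.0  # both coders used one identical label throughout
    return (p_observed - p_chance) / (1 - p_chance)

# Hypothetical labels for ten headwords (True = coded as environmental).
coder_a = [True, True, False, True, False, False, True, True, False, True]
coder_b = [True, False, False, True, False, True, True, True, False, True]
kappa = cohens_kappa(coder_a, coder_b)
print(f"kappa = {kappa:.2f}")
```

Because kappa corrects raw agreement for agreement expected by chance, it would require independent coding passes, which our collective review deliberately did not involve.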
Figure 2. Excerpt from J. Banker et al.’s (1979, p. 17) Bahnar-English dictionary showing headwords (left column) and English glosses (right column). The annotations “G, K”, “KG-1”, etc., indicate the original textual source or village dialect.
We then turned to an analysis of sub-themes, drawing from the 3804-word corpus. At the suggestion of our Bahnar experts, we first extracted plant-related words, finding them to comprise 3.8% (n = 145/3804) of the corpus. We manually coded for these, construing the category broadly to include not only plants (pơnan ‘a plant with long slender leaves used to make mats’) but also plant-related actions (trôih ‘to mend something that is woven with bamboo’, hrao ‘to wash rice’, ngơ’ngot ‘of trees to rub on each other’), and plant-derived things (krăl ‘poison from tree used to put on arrows’, grang ‘a basket used to catch fish’). Overall, plant-related terms were fewer than we anticipated, notwithstanding the highly plant-centered Bahnar lifeways. One possible explanation is that lexicographers are generally untrained in botany and may neglect plants or not pursue finer details. As a subcategory, baskets were also found to be underrepresented with just n = 20 terms. In contrast, our own field research shows baskets to be a Cultural Keystone Object (CKO) (Trần et al., 2024a), having a repertoire of (at our current stage of documentation) at least 20 named types currently produced, plus many names for basket components, parts, and weaving patterns. The fact that basket terms were underrepresented in the legacy dictionary corpus prompted us to collect them directly from speakers, which yielded a set of 36 terms, inclusive of those already found in the dictionary (Trần et al., 2024b).
The category ‘animal’ resulted in a count of 9.4% (n = 357/3804). This includes animals (tơpai ‘rabbit’, kiĕk ‘tiger’, ‘bih ‘snake’, chơdong ‘type of poisonous snake with black and white stripes’), animal-related actions (cheh ‘to hatch’, hrỡp ‘of birds to alight on a tree, perch’, tơgiot ‘of a rabbit to stick up its ears’), animal-related things (giot gơgiot ‘the hop of a rabbit’, hơdrỡp ‘a net bird trap’, phŭr ‘to describe a bird flying off suddenly when disturbed’), animal by-products (adrok ‘the skin that a snake sheds’), and mythical creatures (kiĕk wir ‘a tiger that is able to make itself into a man’). Animal terms reflect the importance of hunting and trapping among the Bahnar, which persists to the present day as an important source of food. As a subcategory of animals, birds account for 19% (n = 67/357), which reflects their symbolic importance in Bahnar animistic beliefs (Trần et al., 2024b).
We presented these thematic wordlists to native speaker consultants for validation and audio recording. Our speakers recognized and validated almost all (ca. 98%) of the listed forms, with some minor variations in pronunciation. They also added a number of their own entries during our elicitation sessions that focused on household gardens, baskets, canoes, and rice harvesting. In the case of differences between the dictionary entries and what our consultants told us, we privileged the latter but retained all attested versions in our corpus while noting the provenance of each alternate form.

3.1.2. Bahnar-French Dictionary

The Bahnar-French dictionary (Dourisboure, 1889; see Figure 3) comprised 4038 headwords. The headwords extracted, following the convention of this dictionary, are shown in our corpus in all uppercase letters. A sample entry reads: DEL ‘Vestige, trace des pieds.’ For ease of analysis, English glosses were added, using Google Translate, as well as our own knowledge of French, to give, e.g., DEL ‘Vestige, footprint.’ In a random 1200-word sample that we hand-coded, environmental terms made up 27% (n = 324/1200), a percentage considerably smaller than that in the Bahnar-English dictionary.
Figure 3. Excerpt from Dourisboure’s (1889, p. 61) Dictionnaire Bahnar-Français, which includes example sentences that will be useful for future corpus building. An example of an environmental phrase is CHƠRO XEM ‘Consult the birds, go listen to their song to know what they announce (supers.[tition])’. According to this dictionary, CHƠRO has no stand-alone lexical meaning except when collocated with the noun XEM ‘birds’. The phrase refers to the present-day Bahnar practice of considering bird sounds as omens that guide them in planning forest forays.
Using keyword searches, categories were extracted, with the following proportions: ‘animals’ 5.2% (n = 210), ‘plants’ 7.6% (n = 307), ‘landscapes’ (to include topographic features, placenames, forests, fields) 5.0% (n = 203), and ‘time’ (days, seasons, etc.) 1.3% (n = 52).
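The keyword-based extraction can be sketched roughly as follows. This is an illustration only: the keyword lists are hypothetical (the actual search terms are not reproduced here), and the toy entries simply reuse glosses quoted elsewhere in this article.

```python
import re

# Hypothetical keyword lists; the actual search terms are not listed in the text.
CATEGORY_KEYWORDS = {
    "animals": {"bird", "birds", "snake", "tiger", "fish"},
    "plants": {"tree", "trees", "plant", "rice", "bamboo"},
    "landscapes": {"forest", "field", "river", "mountain"},
    "time": {"day", "season", "yesterday", "night"},
}

def categorize(gloss):
    """Return every category whose keywords occur as whole words in a gloss."""
    words = set(re.findall(r"[a-z]+", gloss.lower()))
    return sorted(cat for cat, kws in CATEGORY_KEYWORDS.items() if words & kws)

def category_shares(entries):
    """Count entries per category and express each count as a percentage."""
    counts = {cat: 0 for cat in CATEGORY_KEYWORDS}
    for _headword, gloss in entries:
        for cat in categorize(gloss):
            counts[cat] += 1  # an entry counts once per category it matches
    total = len(entries)
    return {cat: (n, 100.0 * n / total) for cat, n in counts.items()}

# Toy entries built from glosses quoted in this article.
entries = [
    ("XEM", "birds"),
    ("DEL", "vestige, footprint"),
    ("krăl", "poison from tree used to put on arrows"),
    ("bơlao", "day before yesterday"),
]
shares = category_shares(entries)
for cat, (n, pct) in shares.items():
    print(f"{cat}: {pct:.1f}% (n = {n})")
```

Matching on whole words rather than substrings avoids false hits (for example, “street” matching “tree”), but a gloss such as ‘a kind of bird’ still reveals nothing about the species, which is one reason keyword counts need to be checked against speaker knowledge.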

3.2. Identifying Environmental Categories

3.2.1. Finding Environmental Lexemes

Using hand coding of a random sample of 1200 words, as described above, a significant percentage of the lexicon in both legacy dictionaries was identified as environmental (Table 1).
Table 1. Proportion of environmental vocabulary in the lexicon of the two bilingual legacy dictionaries.
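As a rough illustration of the sampling-based estimate, the sketch below draws a reproducible random sample of headwords and converts hand-coded counts into percentages. The headword list is a placeholder, the seed is arbitrary, and the helper names are ours; only the counts are taken from the figures reported in Section 3.1.

```python
import random

def draw_sample(headwords, sample_size, seed):
    """Draw a reproducible random sample without replacement."""
    rng = random.Random(seed)
    return rng.sample(headwords, sample_size)

def percent_environmental(n_environmental, n_coded):
    """Share of hand-coded items judged environmental, as a percentage."""
    return 100.0 * n_environmental / n_coded

# Placeholder list standing in for the 3804 digitized Bahnar-English headwords.
headwords = [f"headword_{i}" for i in range(3804)]
sample = draw_sample(headwords, 1200, seed=1)  # seed value is arbitrary

# Counts reported in Section 3.1 for the hand-coded samples.
reported = {
    "Bahnar-English, set 1": (528, 1200),
    "Bahnar-English, set 2": (507, 1200),
    "Bahnar-French": (324, 1200),
}
for name, (hits, total) in reported.items():
    print(f"{name}: {percent_environmental(hits, total):.1f}%")
```

Fixing the seed makes the sample repeatable, so that a second team could hand-code exactly the same 1200 headwords.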
Computational methods were then applied with the goal of discovering underlying environmental categories as well as identifying all environmental terms in the Bahnar-English and Bahnar-French dictionaries. Directly using the Bahnar terms for finding environmental terms is challenging due to the lack of data and effective techniques to encode the Bahnar word representation into its respective vector form. As a solution, the English definitions were used instead to represent the dictionary entries’ semantic meaning and take advantage of existing advances in text encoding methods, which are primarily available for English. Clustering methods and word cloud generation were also experimented with as a means to reveal any existing patterns.
Encoding English definitions into vector representations that capture their meaning is challenging because natural language is ambiguous. Nonetheless, advances in encoding text into dense vectors with transformer architectures now yield embeddings that capture semantic similarity. Using the SentenceTransformers library (Reimers & Gurevych, 2019), we employed the state-of-the-art transformer architecture MPNet (Song et al., 2020), as provided by that library, to encode the English definitions into embedding vectors. The resulting vectors captured the semantic meaning of the English definitions well enough to find neighbors with similar meanings, as demonstrated in Table 2.
Table 2. Top 5 similar entries for an entry from the Bahnar-English dictionary, using cosine similarity on the embedding vectors.
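The nearest-neighbor lookup behind Table 2 can be sketched as below. This is an illustrative sketch only: in the study the vectors come from the MPNet encoder, whereas here three-dimensional toy vectors (invented for the example) stand in for real embeddings, and the function names are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def top_neighbors(query, embeddings, n=5):
    """Rank every other definition by cosine similarity to the query definition."""
    q = embeddings[query]
    scored = [
        (cosine_similarity(q, vec), key)
        for key, vec in embeddings.items()
        if key != query
    ]
    scored.sort(reverse=True)
    return [(key, round(score, 3)) for score, key in scored[:n]]

# Toy vectors (made up); real embeddings would have hundreds of dimensions.
embeddings = {
    "to catch fish with a basket": [0.9, 0.1, 0.0],
    "to catch fish by use of a torch": [0.8, 0.2, 0.1],
    "to poison fish": [0.7, 0.0, 0.3],
    "spinning wheel": [0.0, 0.9, 0.2],
}
print(top_neighbors("to catch fish with a basket", embeddings, n=2))
```

With these toy values the two fishing definitions rank above ‘spinning wheel’, mirroring the kind of neighborhood that Table 2 reports for real entries.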

3.2.2. Frequency Tables and Word Clouds

A word cloud was constructed based on the English definitions. The first attempt, which tokenized each definition into word-level tokens and performed simple occurrence counting, did not yield clearly distinctive categories. For preprocessing, all punctuation, stop words, and non-English words were removed using the WordNet lexical database provided in the NLTK library (Python 3.11), and dictionary-specific processing was applied to eliminate noise introduced by digitization. Because many frequent tokens carry little semantic meaning, such as “make”, “one”, and “without”, the initial word cloud was noisy and skewed toward these words, as seen in Figure 4 and Table 3. Despite the noise, we found this visualization useful for later tasks.
Figure 4. Word clouds for Bahnar-French (left) and Bahnar-English (right) dictionaries, unfiltered.
Table 3. Top 20 words in the Bahnar-French and Bahnar-English dictionaries, unfiltered.
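The unfiltered counting step can be sketched as follows. This is a simplified stand-in, not the study's pipeline: the small `STOP_WORDS` set is an assumption replacing the NLTK resources, and the definitions are glosses quoted elsewhere in this article.

```python
import re
from collections import Counter

# Small illustrative stop list (an assumption); the study used NLTK resources.
STOP_WORDS = {"a", "an", "the", "of", "to", "in", "with", "by", "or", "and", "for"}

def tokenize(definition):
    """Lowercase a definition, drop punctuation, and remove stop words."""
    return [t for t in re.findall(r"[a-z]+", definition.lower()) if t not in STOP_WORDS]

def token_frequencies(definitions):
    """Count how often each remaining token occurs across all definitions."""
    counts = Counter()
    for definition in definitions:
        counts.update(tokenize(definition))
    return counts

definitions = [
    "to catch fish with a basket",
    "to catch fish by use of a torch",
    "to make a dam",
    "to split bamboo into small pieces, to make cord",
]
print(token_frequencies(definitions).most_common(3))
```

Even in this tiny sample the generic verb “make” is counted as often as “fish”, which is the kind of skew visible in the unfiltered word cloud.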
Instead of directly using all tokens from the English definition, we constructed a set of “semantic tokens”, that is, only those tokens that are semantically relevant to the definition. More specifically, for each English definition, we kept at most k tokens, those closest to the definition, from among the tokens lying within a cosine distance threshold d of it.2
Table 4. Examples of Bahnar entries and their associated semantic tokens, that is, the tokens within a cosine distance threshold d that are closest to the definition.
Limiting the number of tokens to k ensures that definitions with many similar tokens do not skew the word cloud, while limiting the maximum cosine distance ensures that the selected tokens are semantically close to the English definition. An occurrence count over these semantic tokens was performed to obtain the final word cloud, which shows much more distinctive categories, as in Figure 5 and Table 5.
Figure 5. Word cloud based on semantic tokens for Bahnar-French (left) and Bahnar-English (right) dictionaries, after limiting maximum number of tokens and maximum cosine distance.
Table 5. Top 20 most frequent semantic tokens in the Bahnar-French and Bahnar-English dictionaries, after limiting maximum number of tokens and maximum cosine distance.
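The selection of semantic tokens can be sketched as a filter on cosine distance followed by a cap at k tokens. The sketch below is illustrative only: the values of `k` and `d`, the function names, and the toy vectors are assumptions, since real vectors would come from the sentence encoder.

```python
import math

def cosine_distance(u, v):
    """One minus the cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def semantic_tokens(definition_vec, token_vecs, k=3, d=0.5):
    """Keep at most k tokens, nearest first, whose distance to the definition is <= d."""
    scored = sorted(
        (cosine_distance(definition_vec, vec), tok) for tok, vec in token_vecs.items()
    )
    return [tok for dist, tok in scored if dist <= d][:k]

# Toy vectors (made up) for the definition 'to catch fish with a basket'.
definition_vec = [0.9, 0.3, 0.1]
token_vecs = {
    "catch": [0.8, 0.4, 0.0],
    "fish": [1.0, 0.2, 0.1],
    "basket": [0.6, 0.6, 0.2],
    "with": [0.1, 0.2, 0.9],
}
print(semantic_tokens(definition_vec, token_vecs, k=2, d=0.5))
```

In this toy case the function word “with” lies beyond the distance threshold and is dropped, while the cap k trims the remaining candidates to the closest ones.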
The improved analysis (Figure 5 and Table 5), which limits both the maximum number of tokens and the maximum cosine distance, makes it evident that key environmental categories such as rice, tree, bamboo, wood, fish, and insects are prominent among the top 20 categories for both dictionaries.

3.2.3. Semantic Clustering

The word clouds, as illustrated in Figure 5, identify a large vocabulary of environmental topics, showing the potential for applying clustering techniques for topic discovery. Because the embedding vectors lie in a high-dimensional space, clustering them directly yielded no meaningful result. Instead, we first applied the UMAP algorithm (McInnes et al., 2020), which performs dimensionality reduction while preserving topological structure. The resulting compact vector representation can then be clustered using HDBSCAN (McInnes et al., 2017), which finds clusters of varying densities. However, since HDBSCAN can construct clusters of arbitrary shape, cluster centers do not carry the semantic meaning needed for topic assignment. Instead, each cluster has a representative exemplar set, in which each exemplar lies in a high-density region with numerous neighbors and can therefore be used for topic assignment. As each exemplar is an English definition, its semantic meaning can be represented by the previously constructed set of semantic tokens. We counted the tokens with the most occurrences within each cluster’s exemplar set to form the cluster topics.
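The final labeling step, counting semantic tokens across a cluster's exemplars, can be sketched as follows. This covers only the label construction, not UMAP or HDBSCAN themselves; the exemplar token lists are invented for illustration, and the function name and the choice of three terms per label are assumptions.

```python
from collections import Counter

def cluster_label(exemplar_tokens, n_terms=3):
    """Join the n most frequent semantic tokens across a cluster's exemplars with '.'."""
    counts = Counter(tok for tokens in exemplar_tokens for tok in tokens)
    return ".".join(tok for tok, _ in counts.most_common(n_terms))

# Hypothetical exemplar set: each inner list holds one exemplar's semantic tokens.
exemplars = [
    ["trap", "fish"],
    ["trap", "fish", "fishing"],
    ["trap", "net"],
    ["trap", "fish", "fishing"],
]
print(cluster_label(exemplars))
```

A dot-joined label of this kind has the same form as the cluster names reported for the two dictionaries, which makes mixed labels easy to spot when a cluster's exemplars share few tokens.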
The cluster diagrams in Figures 6 and 7 show that both dictionaries are rich in environmental lexemes, which the algorithm can sensibly group and label. Intuitive clusters are observed both for Bahnar-English, e.g., “gong.drum.dance”, “time.long.clock”, and “trap.fish.fishing”, and for Bahnar-French, e.g., “ropes.knot.cord”, “taste.desire.love”, “field.garden.forest”, “bird.pigeon.pheasant”, and “rice.grain.wheat”. A few labels seemed unintuitive or possibly erroneous, e.g., “gong.poison.blood” and “pulley.pull.rattan”; upon closer investigation, we concluded that these clusters could not be considered semantically coherent.
Figure 6. Clustering results for the Bahnar-English dictionary, where ‘x’ indicates a large cluster of 20 or more related terms, ‘.’ in any color denotes a small cluster, and a black dot represents a single outlier.
Figure 7. Clustering results for the Bahnar-French dictionary, where ‘x’ indicates a large cluster of 20 or more related terms, ‘.’ in any color denotes a small cluster, and a black dot represents a single outlier.
We did not, for this project, analyze the arrangement of the clusters, nor look closely at outliers to see what was not clustered, both issues that would be important in building functioning NLP tools for Bahnar. Moreover, WordNet, as applied here, serves as a powerful tool and a rich lexical resource for constructing ontologies based on English (Miller, 1995), but its use for mapping Bahnar semantic relations has limitations and requires adjustments and extensions. WordNet has been successfully used as a foundational resource for ontology construction in several English-language domains, including environmental and biological ones (Cuadros et al., 2010; Buttigieg et al., 2016). The limitation we identify is not with WordNet per se, but with its application to a language whose semantic categories reflect a fundamentally different ontological framework, one in which cultural, spiritual, and material domains are interconnected in ways that English-based conceptual hierarchies do not anticipate. A fundamental limitation, for example, is that WordNet does not capture certain cultural affinities in Bahnar that relate words and concepts: chickens and water buffaloes both have status as ritual sacrifices; houses and baskets are both made from plants and can be considered extensions of plants; trees and canoes are connected, as are insects and food; forests are linked with spirits and associated sentient yet invisible entities, and with birds which provide the crucial omens that guide people in the forest. While algorithmic semantic clustering is an interesting and potentially instructive exercise, we decided that, at this stage of the research, it does not significantly advance our research questions, and so we leave it for later analysis.

4. Key Findings About the Bahnar Environmental Lexicon

The previous sections have offered a detailed account of the triangulation of approaches undertaken in our research. By documenting the lexicon as represented in legacy dictionaries, we begin to discern a community’s traditional relationship to the environment (cf. Eidshaug et al., 2024). By consulting with native speakers and using participant observation methods, we are able to update and enrich this portrait, validate historical terms, collect additional contemporary terms, and directly observe the activities that give rise to the lexicon. Such community-based lexicography contributes to the language’s digitization—and potential continued vitality and generational transmission—and contributes to a more comprehensive, and, crucially, more localized and relevant understanding of biodiversity and sustainability.
Key findings from our lexicographical research are summarized here. They pertain to biodiversity, grammatical features, and other categories, including morals, time, and technologies.

4.1. Biodiversity

The existing Bahnar lexicographic sources offer a limited but promising glimpse of biodiversity. However, many species in the dictionaries are under-differentiated or unidentified, for example, KOPUNG ‘[a] kind of bland melon’, and XUT ‘bee’. The legacy lexicon nevertheless includes a wealth of animals, environmental concepts, plants, objects, qualities, action verbs, and topographic terms. Our project has taken the next logical step by compiling species checklists and eliciting local biological expertise to better identify species.

4.1.1. Community Species Elicitation

We compiled species checklists drawn from approximately 200 representative species selected from 900 geolocated sightings recorded on iNaturalist for Kon Tum province, as well as from the field guide Birds of Vietnam (Craik & Lê, 2018) and the taxidermized animal collections displayed in Bảo Tàng Kon Tum (Kon Tum Museum). Consultants were able to provide Bahnar names for approximately half of the species presented photographically. Cross-checking these elicited names against the legacy dictionaries revealed a significant gap: birds in particular are minimally identified by species name in those resources, appearing more typically as “a type of bird.” This finding underscores both the limitations of legacy lexicographic sources for biodiversity documentation and the irreplaceable value of direct community elicitation for species-level identification. It also confirms that the Bahnar possess species-level environmental knowledge that has not previously been documented, and which is not recoverable from legacy sources alone.
We found iNaturalist to be a useful elicitation tool, with some caveats. Many of the species photos on iNaturalist are amateur-grade and, in some cases, insufficiently clear for reliable identification, particularly for birds. Printed resources such as Birds of Vietnam (Craik & Lê, 2018) provide clearer, consistently detailed color illustrations, and our consultants frequently found these easier to work with, even though they are generally unaccustomed to identifying birds from printed illustrations. We recommend that future elicitation projects of this kind combine both digital and printed resources to maximize species recognition.

4.1.2. Cultural Keystone Species

A number of animals and plants have been identified in this project which may comprise Cultural Keystone Species (Nabhan & Carr, 1994; Coe & Gaoue, 2020).
BLO Long blo ‘Arbre dont le bois dur et pesant est très propre pour des constructions à l’abri de la pluie.’ [Long blo. Tree whose hard and heavy wood is very suitable for constructions sheltered from the rain.]
BLUNG ‘Name of a fish.’
BOJA ‘The weasel.’
We looked at a variety of factors to consider what species might have cultural keystone status, including word frequency, elaborate naming taxonomies, narratives, and community consensus. We did not apply a formal frequency cutoff, as our corpus is not yet large enough to make frequency alone a reliable criterion; instead, we relied on a combination of qualitative indicators. At this stage of our research, we did not score words according to an index of identified cultural influence (ICI), which relies on up to seven criteria, one of which is “naming and terminology” (Cristancho & Vining, 2004; Garibaldi & Turner, 2004). While a quantitative ICI score may provide a stronger argument for keystone candidacy, we leave this for further research and community validation. Birds collectively, rather than any single species, show high cultural significance: they are the only animal represented iconographically on the sacred gâng pole built for important ceremonies, and bird omens are carefully attended to when planning forest forays. Among plants, the hyam tree (Engelhardia roxburghiana Lindl.) has demonstrated keystone status: its bark is collected to ferment rice wine, and it is subject to community restrictions on use, suggesting both practical and ritual significance. Different criteria were thus applied across taxonomic categories, reflecting the different ways in which cultural salience is expressed for animals versus plants in Bahnar lifeways. Identifying a Cultural Keystone Species has practical value for both language documentation and biodiversity conservation. We intend to focus future documentation work on species to which Bahnar speakers themselves attach special significance.

4.1.3. Landscapes

Ways of conceptualizing, naming, and classifying salient topographic features are culturally determined and represent language-specific adaptations to the environment (Grenoble et al., 2019). Specialized topographic terms in Bahnar, for example, rởỏnöng (‘Places in the river where the water is deep and the current is very weak’), demonstrate the speakers’ longstanding connection to their land and contribute to the field of landscape linguistics (Mark et al., 2011). This represents a promising area for future research, as illustrated by our most recent fieldwork (in March 2026), during which we walked through the rice terraces and collected at least a dozen terms relating to natural and man-made hydrological features such as berms, bunds, pond fields, retaining walls, and diversion watercourses.

4.2. Grammatical Features

4.2.1. Reduplication

Bahnar has a rich and productive repertoire of reduplicated forms (E. Banker, 1964). These may express a speaker’s emotional stance (e.g., disgust, anger), consecutive action, shape, multiplicity, or intensity. To extract reduplicated forms from the Bahnar-English dictionary, we began with a regular expression search and supplemented it with a manual search. Significantly, more than 80% of attested reduplicated forms in Bahnar could be interpreted as relating to the environment; a selection is given below.
“blek-”blek ‘to designate movement of a shiny object’
“brĕng-”brĕnh ‘to designate the action and appearance of fumes’
bơbông ‘to designate appearance of a large hole’
bru-bra ‘to designate many small things coming out from a hole’
chŭk-lŭk ‘to describe someone falling to ground or in water’
dơdơng ‘to designate any long-shaped object hanging down’
đot-đot ‘to describe the waving of a bird’s tail up and down’
hlêp-hlap ~ hlêp-hlêp ‘to cut into big slices’
hochĕk-hơchăk ‘to describe many small footprints’
hơchỡk-hơchăk-măk ‘to describe many large footprints’
hơnĕch-hơhoch ‘to be ground or chopped very small’
klă klang ‘to be shining bright and beautiful’
kơla-kơlĕch ‘to have many scars’
krỡk krỡk ‘to describe the pain of fish bone in throat, thorn in foot’
lôk-sơlôk ‘to become bigger and as dirt in front of plow, waves piling up’
lŭk-lŭnh ‘to describe something turning over and over as [a] log [or] pencil’
ngêng-jơngêng ‘to designate a long, tall tree or log’
‘ngơ’ngêl ‘to shake head—of adults and large animals’
‘ngưl-tơ’ngưl ‘to gradually appear from the water’
prỡ prỡ ‘to describe blood running out of leg or water from a container’
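A regular-expression pass of the kind mentioned above might look like the following sketch. The patterns are assumptions for illustration, not the expression actually used in the study: they catch exact repetition and two-part forms that share an initial letter, and they deliberately show why a manual search is still needed.

```python
import re
import unicodedata

# Whole form repeated across a hyphen or space, e.g. "đot-đot".
FULL = re.compile(r"^(\w+)[- ]\1$")
# Two parts that share only their first letter, e.g. "bru-bra".
PARTIAL = re.compile(r"^(\w)\w*[- ]\1\w*$")

def classify_reduplication(form):
    """Return 'full', 'partial', or None for a dictionary headword."""
    # NFC normalization keeps accented letters as single code points for \w.
    form = unicodedata.normalize("NFC", form.strip().lower())
    if FULL.match(form):
        return "full"
    if PARTIAL.match(form):
        return "partial"
    return None

headwords = ["đot-đot", "bru-bra", "klă klang", "hrang", "chŭk-lŭk"]
for headword in headwords:
    print(headword, classify_reduplication(headword))
```

Rhyming pairs such as chŭk-lŭk and prefixing forms such as bơbông are not matched by these two patterns, so a heuristic of this kind can only seed the list that manual inspection then completes.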

4.2.2. Specialized Terms and Lexical Gaps

A candidate specialized term may be identified when it requires a whole phrase or sentence to define it in French or English, for example, APLOH (verb) ‘Go downriver in a boat, following the current’, and NOK (verb) ‘Go upriver in a boat, against the current’. However, we also recognize that, in the pre-Internet era, lexicographers faced unique challenges and may not have had access to certain sources. For example, while English lacks equivalent verbs, it does have the obscure nautical terms upbound (adj./adv.) ‘against the current’ and downbound (adj./adv.) ‘with a following current’ (United States Coast Guard, n.d.; Oxford English Dictionary, 2023).

4.2.3. Environmental Verbs

In analyzing 1483 verbs from the Bahnar-English dictionary, a category of environmental verbs (n = 240) was identified, which we defined as monolexical verbs describing a direct or specific action on the environment. General verbs with meanings such as ‘cook’, ‘drink’, and ‘weave’ were excluded, although these may also describe environmental actions. We recognize this category as having porous boundaries. A selection is presented below, with a full list provided in Appendix A.
achīt ‘to prepare wood for lighting fire’
hơgŏu ~ hơgâu ‘to be unsuccessful in hunting animals’
hrang ‘to throw a spear’
jơmo ‘to be successful in hunting animals’
kao ‘to chop wedges in a pole to make a ladder or in poles to make them weak in making a trap’
klang ‘to place bamboo tube so that water will run through it’
kơtǐ ‘to move circularly in threshing rice’
krŏu ‘to poison fish’
man ‘to shape form with mud cement to build with these materials’
mok ‘to eat grain-like objects by letting them roll into mouth from hand’
pĕnh ‘to stretch out cotton to make thread’
pỡr ‘to fly, restricted to a mythical snake only’
prung ‘to cook in bamboo tube’
sol ‘to catch fish by use of a torch’
yŏu ‘to catch fish with a basket’
In addition to these verbs being extracted from the dictionary, many of these activities were observed in the Bahnar community during our fieldwork, and the verbs were verified and recorded in situ with speakers.

4.3. Other Categories: Morals, Time, and Technologies

Several other features of language use in relation to the environment emerged clearly in our analyses.
Bahnar lexemes express moral and spiritual values, taboos, rituals, magical powers, and correct modes of behavior with respect to nature and to people, all components of Bahnar animism.
agăm ‘cannot marry each other because related’
ai ‘lucky, fortunate, also meaning of strength, authority in some mystical sense’
“băng ‘sorcerize’
deng ‘to bewitch, sorcerize’
CHƠRO XEM ‘Consult the birds, go listen to their song to know what they announce (supers.[tition])’
pơhnhŏng ‘to point crossbow gun at someone mistakenly’
pơyơh, pơma pơyơh ‘of young people to speak a special secret language so older people won’t understand’
sơk ‘to pull up shirt to keep from getting wet’
sơkat ‘to speak with magical power, to speak and have something magical happen’
sơnoh ‘to pay to spirits what has been promised or to actually buy something you say you will’
tơl ‘to answer, to promise the spirits a sacrifice’
ư-ŏ, pơgang ư-ŏ ‘type of medicine used to stop spirits from entering village’
yôk-yak ‘polite way to say one is a little older than another’
Numerous expressions of time are also prominent in the Bahnar lexicon, referencing the diurnal cycle, daylight, intervals of time, subdivisions of day/night, seasonal cycles, and environmental time indicators.
hiưp ‘twilight; to fall asleep’
ki ‘a while ago, the other day time, designates time from a few days ago up to about a year ago’
kơmăng ‘at night, nighttime’
kơsỡ ‘afternoon’
leng gong ‘early in the morning around two or three o’clock’
sỡ ‘distant past (from six months to a year in the past back to any time more distant)’
srê ‘palm-shaped, spoke-shaped as rays of sun at dawn’
tom ‘to be on time, in time’
tơdeh ‘to wait a long time’
tơdre ‘twilight, time of sun going down’
There is a mobile week centered around today, which displays greater precision for future days than past days.
bơlao ‘day before yesterday or at most a few days ago’
brei ‘yesterday’
drŏu, drâu ‘today’
dơning ‘tomorrow’
dơmônh ‘day after tomorrow’
dơmanh ‘three days after today’
tĭ, năr tĭ ‘fourth day after today’ (does not contain the word puăn ‘four’)
tong, năr tong ‘fifth day after today’ (does not contain the word pơđăm ‘five’)
Some lexemes point to the use of environmental calendars, although we have not been able to document any intact system.
TƠDĀP ‘Arbre à fleurs rouges, dont la floraison annonce la période de certains travaux des champs.’ [Tree with red flowers, the flowering of which heralds the period of certain work in the fields.]
Evidence of traditional technologies is also found. Some of these technologies are no longer in use, but some, such as architecture, basketry, fishing, hunting, and instrumental music, are still very much alive.
BỐK-DOP ‘A sort of leaf suit to protect yourself in rainy weather.’
BỎDỎNG ‘Fish in a particular way when the waters are swollen and turbid.’
chăr ‘to split bamboo into small pieces, to make cord’
chǐng klŏng‘bamboo xylophone’
chơduh ‘instrument used in weaving placed on one’s back for thread’
dŭk, hră dŭk ‘type of crossbow used by children that is simpler in form like Western bow’
gia ‘a tall grass used to make roofs’
gle ‘type of vine from which fish poison is made’
gông ‘trap that works by trigger which trap drops on animal’
hiơ ‘spinning wheel’
hmar ‘trap that has door for animals to enter but then shuts them in’
hơdrỡm ‘bamboo water system’
hơjỡm ‘to work with metals’
bơngai hơjỡm ‘blacksmith’
honhă ‘a fish net that is pulled up by cord’
hơnhuăl ‘type of fish net’
hră ‘crossbow’
hrok ‘small conical fish trap’
iaih ‘bird trap made by putting a sticky substance on branches’
jal ‘conical fish net that is thrown into the water’
jơmul ‘to plant rice using dibble stick to make holes’

5. Using AI Tools in Corpus Research

In using LLMs, our goal was not to conduct a systematic quantitative evaluation of model performance, but rather to assess the feasibility of using AI tools to support environmental lexicon extraction from a low-resource Indigenous language dictionary. Our approach was exploratory and iterative. We began with a zero-shot prompting strategy by asking ChatGPT 5 to perform the task directly without providing examples, as our primary goal was simply to assess whether the tool could extract an environmental lexicon at all. Upon reviewing the initial output, we identified a substantial number of false negatives in the excluded list and fed these back to ChatGPT 5 as examples of terms that should have been included. This iterative process of human-in-the-loop error correction effectively constitutes a form of few-shot prompting, arrived at organically through expert judgment rather than by design. We acknowledge that a more systematic few-shot strategy, implemented from the outset, might yield improved results, and we identify this as a productive direction for future work. We also acknowledge that testing only a single LLM is a limitation, though we note that systematic LLM evaluation was not our purpose. Our goal was to extract the Bahnar environmental lexicon using any available tools, including AI, via a hybrid approach that combines computational methods with native speaker judgment and community-based fieldwork. This section provides an account of our methodology and findings that we hope will be of practical value to others undertaking similar work with low-resource Indigenous language dictionaries.
Our first zero-shot prompt to ChatGPT 5 to “extract all environment-related terms” from the Bahnar-English dictionary resulted in it finding n = 744 (19.5%) environment-related items out of 3804 total entries. By way of explanation, ChatGPT 5 stated: “I used a conservative keyword approach over the English glosses—covering flora (trees, bamboo, leaves, fruit), fauna (birds, fish, mammals, insects), landscape/terrain (river, forest, mountain, field), weather and celestial terms (rain, wind, sun, moon, stars), and natural events (flood, drought, fire).”
An examination of ChatGPT’s initial tagged set of 744 lexemes found no false positives. However, 19.5% is a significantly lower proportion compared to that derived from our manually coded random samples (42% and 44%, respectively). In examining the 3060 lexemes that ChatGPT 5 classified as “non-environmental”, many false negatives were noted, which it had failed to tag, including:
adrok ‘the skin that a snake sheds’
aguăt ‘centipede’
akoh ‘honey-thick yellow stage’
amră ‘peacock’
amrĕ ‘red pepper’
atŏl ‘stalk of bananas’
bah ‘downriver, south’
bluh ‘to sprout out of the ground’
We then asked ChatGPT 5 to “broaden search to include foods, insects, and animal by-products”. The result was 27% (n = 1028), but many false negatives were still found in the list of excluded (“non-environmental”) terms, including:
arah ‘bedbug’
atŏl ‘stalk of bananas’
băk ‘nighthawk’
bih ‘poison’
bôm, ‘bih bôm ‘poisonous green snake’
brei ‘yesterday’
cheh ‘to hatch’
chữ-chă ‘house lizard’
dăm ‘male carabao, bull’
dơning ‘tomorrow’
đêl ‘footprints’
gŏ klăn ‘quicksand’
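The check against a hand-coded sample can be expressed as simple set arithmetic, as in the sketch below. The sets are toy data: the headwords come from this article, but which of them an automatic tagger returns is invented for illustration, and the function names are hypothetical.

```python
def false_negatives(tagged, gold):
    """Hand-coded environmental headwords that the automatic tagger left out."""
    return sorted(gold - tagged)

def recall(tagged, gold):
    """Share of hand-coded environmental headwords that the tagger recovered."""
    return len(tagged & gold) / len(gold) if gold else 0.0

# Toy data: 'gold' is a hand-coded sample, 'tagged' a hypothetical tagger output.
gold = {"adrok", "aguăt", "amră", "krŏu"}
tagged = {"krŏu"}

print(false_negatives(tagged, gold))
print(recall(tagged, gold))
```

Feeding the list of misses back into the next prompt corresponds to the iterative, human-in-the-loop correction described in this section.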
In a different approach, we used ChatGPT 5 to auto-tag our Bahnar-English corpus against three major ontologies: (1) ENVO, the Environment Ontology (Buttigieg et al., 2016), a community ontology for environments, habitats, and environmental processes and qualities, widely used across ecology and bio/eco-informatics; (2) AGROVOC, a multilingual thesaurus for agriculture, forestry, fisheries, food commodities, crops, and practices (Subirats-Coll et al., 2022); and (3) Darwin Core (DwC; Wieczorek et al., 2012), which covers taxa, observations, and specimens and often serves as the backbone for species observations in NLP pipelines. We tasked ChatGPT 5 with adding ontology IDs and producing a sheet of unmapped items for review. In this exercise, ChatGPT 5 was still unable to fully extract the relevant environmental terms. Overall, it tagged just 20.5% (n = 783/3804), well below our manually coded findings of 42% and 44%; for birds, it tagged just n = 40, whereas our manual analysis found n = 67.
We conclude that ChatGPT 5, even while cross-checking with standard ontologies, is not yet able to efficiently extract a complete environmental sub-lexicon. That said, one useful feature of ChatGPT 5 is that it can be directed to extract all items in a well-defined category, e.g., ‘animal’, and after completing its analysis, it will suggest ways for the user to broaden or narrow the scope. Overall, ChatGPT 5 has considerable potential for extracting specialized lexica, but its output must be carefully supervised and checked by knowledgeable human users before the data can be considered validated and useful.

6. Conclusions

Based on our collaborative experiences in the Bahnar community, several valuable insights can be offered regarding innovative methods in endangered language documentation, as well as Indigenous languages more broadly.
First, legacy Bahnar dictionaries, from both the 19th and 20th centuries, are valuable repositories of environmental knowledge that is uniquely encoded linguistically. We have demonstrated that we can use the dictionaries to seed new language revitalization projects that are desired by and beneficial to the community. We also highlight a cautionary note: too often, dictionaries are produced in an extractive manner that brings no benefits to the community (Harrison et al., 2026). In many cases we know of, the community has never seen copies of dictionaries that were produced by linguists with the assistance of earlier generations of that same community. Therefore, repatriation and attention to ethical handling of legacy sources are essential (O’Meara & Good, 2010). Reproducing copies (insofar as copyright law permits) and gifting them to the community is greatly appreciated and can foster language revitalization efforts while reaffirming cultural ownership (Ting, 2024).
Second, the scope and content of legacy lexicographic sources vary considerably, and often animals and plants are unidentified by species names. This may reflect a lack of training in natural sciences on the part of the lexicographers, alongside the state of technological tools to facilitate the inclusion of such information. It can also reflect the very different ways Indigenous (folk) taxonomies organize species knowledge, in contrast to a Linnaean one (Conklin, 1962).
Third, a computational approach to extracting the environmental content from a dictionary is a worthwhile endeavor, albeit one that is currently both dominant-language-skewed and still critically reliant on human-assisted analysis. Tellingly, the number of environmental terms, including verbs, that we were able to identify in the corpora exceeds what can be found by applying NLP algorithms and established ontologies based on English. Further, it is crucial for environmental terms identified by linguists to be validated by the community within the natural settings where they occur, which then also ensures the production of a contemporary, co-authored dictionary.
Finally, standard, often Western-centric ontologies for environmental terms do not capture the spiritual agency and animacy of landscapes, which are central to the Bahnar worldview. The Bahnar lexicon must be documented on its own terms, with categories and hierarchies that make sense to the Bahnar and emerge organically (Conklin, 1962).
Our community-based lexicographic approach serves the Bahnar community in several concrete ways: by repatriating legacy dictionary materials that were previously inaccessible to them; by co-producing a Talking Dictionary and video corpus accessible on smartphones; and by ensuring that all collected data is attributed to and owned by the community. Corpus linguistics and NLP tools—including word frequency analysis, semantic clustering, and AI-assisted tagging—have supported this work by revealing patterns in the environmental lexicon that would be difficult to identify through manual analysis alone, while remaining subordinate to native speaker judgment and community validation at every stage. We have shown that the repurposing of legacy data, while expanding it through direct elicitation with speakers, can benefit the community and yield new scientific findings, supporting both language revitalization and biodiversity protection in tandem and reflecting the language-land intertwinement that is foundational to Bahnar culture.

Author Contributions

Conceptualization, K.D.H. and H.T.; methodology, K.D.H., T.B., N.Đ.N. and H.T.; formal analysis, K.D.H., T.B., N.Đ.N., H.L.C., C.M.K.H. and H.T.; investigation, K.D.H., H.T., C.M.K.H. and X.A.; data curation, K.D.H., H.T., H.L.C., N.Đ.N., C.M.K.H. and X.A.; writing—original draft preparation, K.D.H., H.T., T.B., H.L.C. and N.Đ.N.; writing—review and editing, K.D.H. and L.L.; visualization, N.Đ.N. and H.L.C.; supervision, K.D.H., T.B. and H.T.; project administration, K.D.H., L.L. and M.L.L.; funding acquisition, K.D.H. and M.L.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by a VinUniversity “Grassroots grant” to M.L.L. and K.D.H., and by an Explorers Club “Discovery Expedition Grant” to K.D.H.

Institutional Review Board Statement

The ethics approval application for human participants research for this project was submitted to and received approval from the Ethics Review Board at VinUniversity, decision of 3 February 2023.

Data Availability Statement

Bahnar lexicographic data from this project, with associated field notes, photographs, and audio-video recordings are archived at Zenodo and publicly accessible (Harrison & Trần, 2026).

Acknowledgments

The authors have reviewed and edited the output and take full responsibility for the content of this publication.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Appendix A. Bahnar Environmental Verbs from the Bahnar-English Dictionary

achīt to prepare wood for lighting fire
achŭt to prepare wood for lighting fire
anhok to choke with string, rope
asai to fish with rod
atỡm to fasten thatch to roof frame
bluh to sprout out of the ground
bŏk to hollow out, make hole in wood large or small
bơbỡ to prepare meat for cooking
bơhur to be very hot at night
bỡng to cover pot with leaves and tie around
bỡt to make a dam
broch to harvest rice by stripping grains from plant
bru-bra to designate many small things coming out from a hole
brua to indicate the way rice, people come out of somewhere
“blek-‘blek to designate movement of a shiny object
“bơm to hit target
“brĕng-”brĕnh to designate the action and appearance of fumes
“brưch to designate rays of dawning sun, coming up of full moon
chăl to chop wood of small diameter
chăr to split bamboo into small pieces, to make cord
chăt to sprout
chhŏng, chhŏng ka to dip for fish
choh to hoe, plow, of fish to bite
chong to cut grass
chor to dig ditch
chơchoh to pound or chop meat to type
chơchrỡ to designate water running as blood
chơlă to peel sugarcane
chơmŏk to be always successful in fishing
chơng to lead by hand, of hen to lead and give food to chicks
chơpĕt to squeeze fruit
chỡng to cut in two (bones, wood, meat)
chre to cut off branches of tree
chrŭp to designate the way an object falls into water
chŭk-lŭk to describe someone falling to ground or in water
drang to warm oneself in sun
đăk to set a trap
đot-đot to describe the waving of a bird’s tail up and down
glip to become dark gradually
glǐp to become dark suddenly
gơlŏng to clean by shaking water inside
gơwŏk to be hooked, pick with a hook
gỡ to break something made of wood metal glass
hnhohnhễ to describe water leaking out, tears flowing
hoch to pull out, take out; of water to take something away
hon to grow fast (plants, hair, but not people)
hơ’bīu to describe swelling from sting
hơdah to be light sunshine light from fire lamp
hơgŏu/hơgâu to be unsuccessful in hunting animals
hơjỡm to work with metals
hơkôih to smooth wood by shaving
hơ’mơt to catch fish with
hơnguang to hunt animals
hơ’nhre to prune a tree
hơtŭk to cook potatoes or corn
hrao to wash rice
hrep to twist off fruit with forked stick
hring to string fish on a line
hrok to push rice into bamboo tube
hrỡp to catch many fish with hands
hueh to break wood
huỡr to go around as whirlpool, whirlwind; to whirl
iep to suck as leech, to suck on straw
jeh to pick out a sliver, thorn
jêr-tơjêr to describe flat-surfaced object in flight
jǐl to butt with the horns
jǐt to shave off wood or bamboo
jơmo to be successful in hunting animals
jơmul to plant rice using dibble stick to make holes
jơngong to hold something between teeth as bird or dog does
jrỡp to pound husked rice in order to polish it
kao to chop wedges in a pole to make a ladder or in poles to make them weak in making a trap
kăl to chop large trees
kĕch to harvest rice
kêng-kêu to describe bowlegged stride of a monkey
klang to place bamboo tube so that water will run through it
klă klang to be shining bright and beautiful
klăng to level off a field
klǐ to describe very soft, overcooked potato, banana
klỗ to dig out large clods of soil
klỡk to indicate noise and way something falls to the ground
klưp to describe drooping ears of dogs or goats; people’s ears sticking out
kôch to scoop rice sand with hands; dig with hands
kôih to scrape e.g., scrape pig skin
kôp to hang on to tree as orchid, vine; entwine
kơ’bŭl to be round shaped as full tree cabbage
kơdeh to break up sod with swinging motion of hoe
kơdôih to describe something with a bumpy surface such as a turtle
kơla-kơlĕch to have many scars
kơ’nguenh to curl up e.g., a dog
kơtǐ to move circularly in threshing rice
kỡt to tie animals so that they can feed
krĕo to castrate
krŏu to poison fish
kuaih to dig up, scratch around for
kuer to drill holes
kuǐp to describe the way a bird eats
lŏng to shed skin, to skin
lôk-sơlôk to become bigger and as dirt in front of plow, waves piling up
“lêng-kơ’lêng to describe a bird gliding with large wings
“lơ’lĕp-‘lơ’lônh-mônh to describe the flapping of the wings of many birds
man to shape form with mud cement, to build with these materials
ming yang to placate the spirits, term used when sacrifices are made
môch to submerge oneself in, dive in water
mu, măng mu to be very dark
muih to clear bush/jungle to make a field
‘mêu, ‘mê chơ’mêu to designate evil-looking face of tiger
‘mi to rain
ngêng-jơngêng to designate a long tall tree or log
‘ngơ’ngêl to shake head (of adults and large animals)
‘ngưl-tơ’ngưl to gradually appear from the water
nhŭp to dive completely under water
nhŭr to warm oneself by the fire
ôm to be rotten, stink
păt to fold, to make tail of arrow
peh to pound rice
pĕch to cut small wood bamboo in two
pĕnh to stretch out cotton to make thread
phur to roast bananas, tubers in fire
phŭk to describe the sound of something with shell breaking
phŭr to describe a bird flying off suddenly when disturbed
plêr-plar to shine brightly, light, fire, sun
plŏ to skin animal, peel fruit
pơdỡm to catch animals, trap animals
pơđe to describe an egg which does not produce a chick but rots
pơđông to make to float in water or air
pơjô to ripen already picked fruit
pơkăp to make dogs fight each other
pơkỡp to light a fire
pơleh to pick ferns, kernels of corn
pơliĕng to sort rice
pơngo to keep animal/person from eating, not give food to
popôi to make a pile of weeds
pơsah to clean over and over as in cleaning out a gourd, to cook bamboo sprouts over and over again to take away bitter taste
pơtăm to plant
pơtŏ to place by fire to warm
pơtuch to get one’s attention by calling, to tell dogs to bite prey
pơwir to cause self to change into an animal or vice versa
prach to splash on water noise and appearance of water splashing out
prăl to salt food
prĕl to knock down fruit
prỡ prỡ to describe blood running out of leg or water from a container
prung to cook in bamboo tube
rei to sow seed
ret to cut by knife against a rotating object, to saw with knife
rĕn to bite hard and pull flesh off
roch to take out inner organs of animals
romĕnh to sew up the end of something that has been woven
rơuơ to plow
rỡ to heat in order to boil
sa mơh to eat rice
seh to cut grass close to ground
sen to be unhusked
soh to light a fire
sok to pound corn or vegetables with mortar and pestle
sol to try to heal a person who has been sorcerized by taking out nail, worm from body
sol to hunt fish by use of torch
sông to burn
srang to throw spear, pole
sring to pierce fish and put on string
srok to pour grains into small opening
suăn to climb slope, mountain; of the sun to rise in the sky
tăr to be productive in thinking; to make a fence or wall of bamboo
teh to hit with a small stick
thi-thu to fan fire (probably the noise of the fanning or blowing on fire)
tong to soak in water, put into water
tŏt to pierce hole through
tôn to hit with a large stick
tơ’bŏu/tơ’bâu to make to look for by smelling e.g., dog after prey
tơguăt to tie a knot; to keep in the mind, remember and follow
tơjrao to be a little ripe
tơmŏk to be successful in hunting
tơ’nỡng to catch drips of water
tơ’ngum to cook in the coals with leaves around
tơplôch to straighten out wire; to let fish go from hand accidentally
tơtuh to flap wings, to shake clothes
tỡk to suck in on pipe to light, to hunt fish with a torch
tỡk to put out light or fire
trăm to place wood in water so that termites will not eat later on
tret to cut off sections of a log
trôih to mend something that is woven with bamboo
tuh to pour; of sweat to pour out; of frogs to lay eggs, of animals to give birth
um to winnow rice
wah to fish by rod, line
war to make fence as around garden
wĕch to twist to break off fruit from tree
wŏng to cut grass with a sickle
yeng to carry a back-basket with a strap on one shoulder only
yŏu to catch fish with a basket
yuă to cut hair or rice
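
The kind of word frequency analysis mentioned in this article can be illustrated on a handful of glosses from this appendix. The following is a minimal sketch and not the authors' pipeline: the stopword list and the function name `gloss_frequencies` are illustrative assumptions, and only seven entries are included.

```python
import re
from collections import Counter

# Illustrative stopword list (an assumption, not taken from the article).
STOPWORDS = {"to", "by", "with", "a", "from", "of", "the"}

# (headword, gloss) pairs copied from Appendix A.
ENTRIES = [
    ("broch", "to harvest rice by stripping grains from plant"),
    ("kĕch", "to harvest rice"),
    ("peh", "to pound rice"),
    ("hrao", "to wash rice"),
    ("asai", "to fish with rod"),
    ("wah", "to fish by rod, line"),
    ("yŏu", "to catch fish with a basket"),
]


def gloss_frequencies(entries, stopwords=STOPWORDS):
    """Count non-stopword English gloss tokens across dictionary entries."""
    counts = Counter()
    for _headword, gloss in entries:
        tokens = re.findall(r"[a-z]+", gloss.lower())
        counts.update(t for t in tokens if t not in stopwords)
    return counts


print(gloss_frequencies(ENTRIES).most_common(2))
# prints [('rice', 4), ('fish', 3)]
```

Raw counts of this kind conflate distinct senses: "fish" is a verb in the gloss for asai and a noun in the gloss for yŏu. This is one reason such counts need review by speakers rather than being read off directly.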

Notes

1
By corpus linguistics, we mean digital lexicography spanning both legacy textual sources and contemporary lexical resources contributed by our Bahnar language consultants. By NLP, we mean applying algorithms, AI tools, and large language models to construct semantic networks, word frequency counts, visualizations, and semantic categorization.
2
All hyperparameters used in the computational pipeline were as follows. Semantic token extraction: maximum tokens per definition k = 5 (as illustrated in Table 4) and maximum cosine distance threshold d = 0.7. SentenceTransformers/MPNet embedding model: all parameters set to default. UMAP dimensionality reduction: all parameters set to default. HDBSCAN clustering: minimum cluster size set to 5; all remaining parameters set to default.
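
The two token-extraction parameters in this note (k = 5 and d = 0.7) can be illustrated with a small sketch. It rests on an assumption not stated in the note: that each candidate token's embedding is compared with a reference embedding (here imagined as the embedding of the full definition), that tokens farther than d in cosine distance are dropped, and that at most the k closest are kept. The toy two-dimensional vectors and the function names are hypothetical; the MPNet, UMAP, and HDBSCAN stages are not reproduced here.

```python
import math

K_MAX_TOKENS = 5          # k in Note 2
D_MAX_COSINE_DIST = 0.7   # d in Note 2


def cosine_distance(u, v):
    """Return 1 - cosine similarity; both vectors are assumed non-zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def select_tokens(candidates, reference, k=K_MAX_TOKENS, d=D_MAX_COSINE_DIST):
    """Keep at most k tokens whose distance to the reference is <= d.

    candidates: dict mapping token -> embedding vector (hypothetical input).
    reference: embedding vector to compare against (assumed, see lead-in).
    """
    scored = [(cosine_distance(vec, reference), tok)
              for tok, vec in candidates.items()]
    kept = sorted(pair for pair in scored if pair[0] <= d)
    return [tok for _dist, tok in kept[:k]]


# Toy vectors standing in for real sentence embeddings.
reference = [1.0, 0.0]
candidates = {
    "rice": [1.0, 0.0],      # distance 0.0, kept
    "harvest": [1.0, 1.0],   # distance about 0.29, kept
    "plant": [0.0, 1.0],     # distance 1.0, dropped because it exceeds d
}
print(select_tokens(candidates, reference))
# prints ['rice', 'harvest']
```

Sorting by distance before truncating means that, when more than k tokens pass the threshold, the ones closest to the reference are retained.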

References

  1. Anderson, G. D. S., & Harrison, K. D. (2006). Language hotspots: Linking language extinction, biodiversity, and the human knowledge base. Living Tongues Institute for Endangered Languages Occasional Papers, 1, 1–11.
  2. Atkins, S. B. T. (1996). Bilingual dictionaries: Past, present and future. In M. Gellerstam, J. Järborg, S.-G. Malmgren, K. Norén, L. Rogström, & C. Röjder Papmehl (Eds.), Proceedings of the 7th EURALEX international congress (pp. 515–546). Novum Grafiska AB.
  3. Banker, E. (1964). Bahnar reduplication. Mon-Khmer Studies Journal, 1, 119–134.
  4. Banker, J., Banker, E., & Mở. (1979). Bahnar dictionary: Plei Bong–Mang Yang dialect = Ngữ-vựng Bahnar. Summer Institute of Linguistics.
  5. Berez-Kroeker, A. L., & Henke, R. H. (2016). A brief history of archiving in language documentation, with an annotated bibliography. Language Documentation & Conservation, 10, 411–457.
  6. Bird, S. (2024). Must NLP be extractive? In Proceedings of ACL 2024, long papers (pp. 14915–14929). Association for Computational Linguistics.
  7. Buttigieg, P. L., Pafilis, E., Lewis, S. E., Schildhauer, M. P., Walls, R. L., & Mungall, C. J. (2016). The Environment Ontology in 2016: Bridging domains with increased scope, semantic density, and interoperation. Journal of Biomedical Semantics, 7(1), 57.
  8. Bùi, M. Đ. (Ed.). (2006). Dân tộc Ba Na ở Việt Nam (The ethnic group of Bahnar in Vietnam). Nhà xuất bản Khoa học xã hội.
  9. Coe, M. A., & Gaoue, O. G. (2020). Cultural keystone species revisited: Are we asking the right questions? Journal of Ethnobiology and Ethnomedicine, 16, 70.
  10. Combs, J. P. (1873). Lettre de M. Combes sur les mœurs et coutumes des Ba-Hnars. In P.-X. Dourisboure (Ed.), Les sauvages Ba-Hnars (pp. 403–445). E. de Soye.
  11. Conklin, H. C. (1962). Lexicographical treatment of folk taxonomies. International Journal of American Linguistics, 28, 119–141.
  12. Cooper, N., Heldreth, C., & Hutchinson, B. (2024). It’s how you do things that matters: Attending to process to better serve Indigenous communities with language technologies. arXiv. Available online: https://arxiv.org/abs/2402.02639v2 (accessed on 27 May 2026).
  13. Craik, R., & Lê, Q. M. (2018). Birds of Vietnam. Lynx Editions and BirdLife International.
  14. Cristancho, S., & Vining, J. (2004). Culturally defined keystone species. Human Ecology Review, 11, 153–164.
  15. Cuadros, M., Laparra, E., Rigau, G., Vossen, P., & Bosma, W. (2010). Integrating a large domain ontology of species into WordNet. In Proceedings of the seventh international conference on language resources and evaluation (LREC’10) (pp. 2310–2317). European Language Resources Association (ELRA).
  16. Dourisboure, P.-X. (1889). Dictionnaire bahnar–français. Société des Missions Étrangères.
  17. Drouin, P., L’Homme, M.-C., & Robichaud, B. (2018). Lexical profiling of environmental corpora. In Proceedings of LREC 2018. ELRA.
  18. Eidshaug, J. S. P., Bjerck, H. B., Lohndal, T., & Risbøl, O. (2024). Words as archaeological objects: A study of marine lifeways, seascapes, and coastal environmental knowledge in the Yagan-English Dictionary. International Journal of Historical Archaeology, 28(3), 722–766.
  19. Ferguson, J., & Weaselboy, M. (2020). Indigenous sustainable relations: Considering land in language and language in land. Current Opinion in Environmental Sustainability, 43, 1–7.
  20. François, A. (2018). In search of island treasures: Language documentation in the Pacific. Language Documentation & Conservation, Special Publication, 15, 276–294.
  21. Garibaldi, A., & Turner, N. (2004). Cultural keystone species: Implications for ecological conservation and restoration. Ecology & Society, 9, 1.
  22. Grenoble, L. A., McMahan, H., & Petrussen, A. K. (2019). An ontology of landscape and seascape in Greenland: The linguistic encoding of land in Kalaallisut. International Journal of American Linguistics, 85(1), 1–43.
  23. Guilleminet, P., & Alberty, J. (1959–1963). Dictionnaire bahnar–français. École Française d’Extrême-Orient.
  24. Harrison, K. D. (2004). South Siberian sound symbolism. In E. Vajda (Ed.), Languages and prehistory of central Siberia (pp. 199–213). John Benjamins.
  25. Harrison, K. D. (2007). When languages die: The extinction of the world’s languages and the erosion of human knowledge. Oxford University Press.
  26. Harrison, K. D. (2023). Environmental linguistics. Annual Review of Linguistics, 9(1), 113–134.
  27. Harrison, K. D., Anderson, G. D. S., & Ondar, A. (2020). Tuvan talking dictionary. Available online: http://tuvan.talkingdictionary.org (accessed on 22 May 2026).
  28. Harrison, K. D., Lillehaugen, B. D., Fahringer, J., & Lopez, F. H. (2019, October 1–3). Zapotec language activism and talking dictionaries. Electronic Lexicography in the 21st Century: eLex 2019 (pp. 31–50), Sintra, Portugal.
  29. Harrison, K. D., & Sariahmed, K. (2014). Linguistic and audio-video collections in ethnobiology. In Curating biocultural collections: A handbook. Royal Botanic Gardens, Kew.
  30. Harrison, K. D., & Trần, H. (2026). Field materials from ethnographic research on Bahnar environmental knowledge and animism, Kon Tum, Vietnam [Dataset]. Zenodo. Available online: https://zenodo.org/records/18837888 (accessed on 27 May 2026).
  31. Harrison, K. D., Trần, H., & A, X. (2026). Dictionary as grift, or dictionary as gift? Indigenous lexicons, GenAI, and the ethics of extraction. Journal of American Folklore.
  32. iNaturalist. (2025). iNaturalist [Mobile application]. Available online: https://www.inaturalist.org (accessed on 22 May 2026).
  33. Khishigsuren, T., Regier, T., Vylomova, E., & Kemp, C. (2024). A computational analysis of lexical elaboration across languages. Proceedings of the National Academy of Sciences, 122, e2417304122.
  34. Lê, Q. (2024, April 10). Forests in Vietnam’s Central Highlands at risk as development projects take priority. Mongabay. Available online: https://news.mongabay.com/2024/04/forests-in-vietnams-central-highlands-at-risk-as-development-projects-take-priority/ (accessed on 22 May 2026).
  35. Maffi, L., & Woodley, E. (2010). Biocultural diversity conservation: A global sourcebook. Earthscan.
  36. Mark, D. M., Turk, A. G., Burenhult, N., & Stea, D. (Eds.). (2011). Landscape in language: An introduction. In Landscape in language: Transdisciplinary perspectives (pp. 1–24). John Benjamins.
  37. McInnes, L., Healy, J., & Astels, S. (2017). HDBSCAN: Hierarchical density based clustering. Journal of Open Source Software, 2(11), 205.
  38. McInnes, L., Healy, J., & Melville, J. (2020). UMAP: Uniform manifold approximation and projection for dimension reduction. arXiv.
  39. Miller, G. A. (1995). WordNet: A lexical database for English. Communications of the ACM, 38(11), 39–41.
  40. Mosel, U. (2014). Corpus linguistic and documentary approaches in writing a grammar of a previously undescribed language. Language Documentation & Conservation, Special Publication, 8, 135–157.
  41. Nabhan, G. P., & Carr, J. L. (1994). Ironwood: An ecological and cultural keystone of the Sonoran Desert. Conservation International.
  42. Nguyễn, K. C., Nguyễn, Đ. C., Nguyễn, V. K., & Hardy, A. (2011). Người Ba-Na ở Kon Tum = Les Bahnar de Kontum (A. Hardy, Ed.). Viện Viễn Đông Bác Cổ/Viện nghiên cứu văn hóa. (Original work published 1937).
  43. Nuckolls, J. B. (1996). Sounds like life: Sound-symbolic grammar, performance, and cognition in Pastaza Quechua. Oxford University Press.
  44. Ogilvie, S. (2011). Linguistics, lexicography, and the revitalization of endangered languages. International Journal of Lexicography, 24, 389–404.
  45. O’Meara, C., & Good, J. (2010). Ethical issues in legacy language resources. Language & Communication, 30(3), 162–170.
  46. Oxford English Dictionary. (2023). downbound (adj. & adv.), upbound (adj. & adv.). Available online: https://www.oed.com/dictionary/downbound_adj (accessed on 22 May 2026).
  47. Pretty, J., Adams, B., Berkes, F., De Athayde, S. F., Dudley, N., Hunn, E., Maffi, L., Milton, K., Rapport, D., Robbins, P., Sterling, E., Stolton, S., Tsing, A., Vintinnerk, E., & Pilgrim, S. (2009). The intersections of biological diversity and cultural diversity: Towards integration. Conservation & Society, 7(2), 100–112.
  48. Raskin, R. G., & Pan, M. J. (2005). Knowledge representation in the Semantic Web for Earth and Environmental Terminology (SWEET). Computers & Geosciences, 31(9), 1119–1125.
  49. Reimers, N., & Gurevych, I. (2019). Sentence-BERT: Sentence embeddings using Siamese BERT-networks. In Proceedings of EMNLP 2019. Association for Computational Linguistics.
  50. Repetti, L. (2018). Fieldwork and building corpora for endangered varieties. In W. Ayres-Bennett, & J. Carruthers (Eds.), Manual of Romance sociolinguistics (pp. 114–133). De Gruyter.
  51. Reynolds-Cuéllar, P. (2023). Ancestral technologies as cultural preservation. SocArXiv.
  52. Rosner, M., & Sultana, K. (2014). Automatic methods for the extension of a bilingual dictionary using comparable corpora. Proceedings of LREC, 2014, 3790–3797.
  53. Sidwell, P. (2002). Genetic classification of the Bahnaric languages: A comprehensive review. Mon-Khmer Studies Journal, 32, 1–24.
  54. Song, K., Tan, X., Qin, T., Lu, J., & Liu, T.-Y. (2020). MPNet: Masked and permuted pre-training for language understanding. Advances in Neural Information Processing Systems, 33, 16857–16867.
  55. Subirats-Coll, I., Kolshus, K., Turbati, A., Stellato, A., Mietzsch, E., Martini, D., & Zeng, M. (2022). AGROVOC: The linked data concept hub for food and agriculture. Computers and Electronics in Agriculture, 196, 105965.
  56. Ting, C. J. (2024). The discursive construction of language ownership and responsibility for Indigenous language revitalisation. Journal of Sociolinguistics, 28, 46–64.
  57. Trần, H., A, B., A, H., Huy, Đ. T., & Harrison, K. D. (2024a). Baskets of wisdom: Bahnar basketry, folk taxonomies, and the maintenance of upland environmental intelligence in Vietnam. Sociolinguistic Studies, 18(3–4), 433–466.
  58. Trần, H., Harrison, K. D., Duong, T. L., Hoàng, C. M. K., A, X., & Cao, H. L. (2023). Bahnar talking dictionary. Available online: http://www.talkingdictionary.org/bahnar (accessed on 25 November 2025).
  59. Trần, H., Harrison, K. D., Hoàng, K. M. C., & Cao, H. L. (2024b). Who eats the forest? Forest animacy among the Bahnar people of Vietnam. Journal of Ethnobiology, 44(4), 402–414.
  60. United Bible Societies. (2008). Hla Bo’ar ‘bok kei-dei pah ‘nao/Kinh Thánh Tân ước [Bahnar–Vietnamese bilingual New Testament]. United Bible Societies.
  61. United Nations Development Programme (UNDP). (2024). Indigenous knowledge is crucial in the fight against climate change—here’s why. Available online: https://climatepromise.undp.org/news-and-stories/indigenous-knowledge-crucial-fight-against-climate-change-heres-why (accessed on 22 May 2026).
  62. United States Coast Guard. (n.d.). Navigation rules: International—Inland. Available online: https://www.navcen.uscg.gov/sites/default/files/pdf/navRules/navrules.pdf (accessed on 15 August 2025).
  63. Vittrant, A. (2002). Classifier systems and noun categorization devices in Burmese. In Proceedings of the twenty-eighth annual meeting of the Berkeley linguistics society: Special session on Tibeto-Burman and Southeast Asian linguistics (Vol. 28S, pp. 129–148). Linguistic Society of America.
  64. Wagner, A. D., & Koutstaal, W. (2002). Priming. In Encyclopedia of the human brain (pp. 27–46). Academic Press.
  65. Wieczorek, J., Bloom, D., Guralnick, R., Blum, S., Döring, M., Giovanni, R., Robertson, T., & Vieglais, D. (2012). Darwin Core: An evolving community-developed biodiversity data standard. PLoS ONE, 7(1), e29715.
  66. World Wildlife Fund. (2021). New species discoveries in the Greater Mekong 2020. Available online: https://wwfasia.awsassets.panda.org/downloads/wwf_new_species_discoveries_2020_spreads_final_compressed.pdf (accessed on 22 May 2026).
  67. Y, K., Trần, P. L. H., Harrison, K. D., & Trần, H. (Eds.). (2025). A Bahnar epic tale: Giông befriends Glaih-Phang, volume 1: Claiming the land. VinUniversity: UNESCO Chair for Environmental Leadership, Cultural Heritage, and Biodiversity.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
