A Standardized Nomenclature Design for Systematic Referencing and Identification of Animal Cellular Material

Simple Summary

A permanent link between biological samples and the associated data is essential for their effective and long-term utilization. In order to enable clear identification and referencing of biosamples and to ensure comparability in research, explicit naming of such material by assigning unique and permanent identifiers is therefore necessary. This can be achieved by using explicit naming structures with a predefined pattern. These nomenclature structures have been developed for diverse biological materials but are lacking for animal cellular material, such as tissues and cell lines. Here, we present a first, standardized, human-readable nomenclature design, which generates clear and stable identifier names for such material with a focus on cellular material from wildlife species. Consistent application and central distribution and storage of these identifiers are required to ensure explicit identification and traceability of animal biosamples. This novel and globally applicable identification system adds standardization to the long-term storage of animal cell material in cryobanks and supports species conservation and research.

Abstract

The documentation, preservation and rescue of biological diversity increasingly uses living biological samples. Persistent associations between species, biosamples, such as tissues and cell lines, and the accompanying data are indispensable for using, exchanging and benefiting from these valuable materials. Explicit authentication of such biosamples by assigning unique and robust identifiers is therefore required to allow for unambiguous referencing, avoid identification conflicts and maintain reproducibility in research. A predefined nomenclature based on uniform rules would facilitate this process. However, such a nomenclature is currently lacking for animal biological material. We here present a first, standardized, human-readable nomenclature design, which is sufficient to generate unique and stable identifying names for animal cellular material with a focus on wildlife species. A species-specific human- and machine-readable syntax is included in the proposed standard naming scheme, allowing for the traceability of donated material and cultured cells, as well as data FAIRification. Only when it is consistently applied in the public domain, as publications and inter-institutional samples and data are exchanged, distributed and stored centrally, can the risks of misidentification and loss of traceability be mitigated. This innovative globally applicable identification system provides a standard for a sustainable structure for the long-term storage of animal biosamples in cryobanks and hence facilitates current as well as future species conservation and biomedical research.


Introduction

Background
As biodiversity is increasingly threatened, it must be studied with a view to its conservation and restoration. Besides nature conservation efforts, the collection and active preservation of living biological samples usable for research (e.g., for the generation of germ cells) also play an increasingly crucial role in species conservation [1]. The global exchange of such rare and valuable biosamples for research and species conservation requires traceability [2]. Standardized naming tools for generating unique identifiers (UIs) support such mandatory tracing, enable clear referencing and have been broadly discussed for human cells [3,4], human genomic data [5] and human gene products [6,7]. A widely used naming tool for human pluripotent stem cells was proposed by Luong et al. (2011) [3], further developed by Kurtz et al. (2018) [4] and implemented in the Human Pluripotent Stem Cell Registry (hPSCreg) [8]. This is the only available nomenclature specific to human pluripotent stem cells suitable for generating human-readable, i.e., interpretable UIs [9,10].
For referencing a specific entity, identifiers serve as a link with which metadata are associated. Multiple identifiers may co-exist to complement their particular features and utility, such as information content, coding length and global uniqueness. However, if different identifiers exist for one entity, they should be unambiguously linked and reference one another, e.g., to connect data repositories [11]. The Resource Identification Initiative (RII) [12] introduced the concept of Research Resource Identifiers (RRIDs) [13], unique alphanumerical identifiers for referencing published research materials such as reagents, tools, organisms and biological materials, to enable reproducible research. For vertebrate and invertebrate cell lines cited in the scientific literature, the Cellosaurus knowledge resource [14] assigns a short, persistent, unique and stable identifier, which is recognized as the RRID of these cell lines [15]. Furthermore, the BioSamples database at the European Bioinformatics Institute, part of the European Molecular Biology Laboratory (EMBL-EBI) [16], assigns unique accession numbers to registered research biosamples, including living cells and tissues of all kinds of human and non-human organisms used for sequencing [17,18]. However, without disclosure of the linked metadata, the mutually independent RRIDs and BioSamples identifiers are not directly informative or human-readable. Nor do they allow for the assessment of kinships or complex relationships between origin, donor species, biosample type and derivative, which are needed for many application cases of animal cells. Consequently, any newly established biosample identifier should fulfill the specific stakeholder need for human readability but also establish and maintain stable links to a respective RRID to enable traceability. A central platform is required to issue and register human-readable names directly attributed to the relevant data and recorded with the RRID authorities so that research resources can continue to be resolved by their RRIDs. In conclusion, no uniform, human-readable, informative nomenclature exists for living animal biosamples such as tissues, derived cell lines, gametes or embryos that enables traceability to their origins and legal provenance.
Research with living animal biosamples, especially cell lines, is the focus of various scientific fields, such as species conservation, basic research and comparative biological research, as well as veterinary and biomedical research [19], and the derivation and establishment of new animal cell lines are expanding [20]. Further, the publication of animal cell lines in general and derived pluripotent animal stem cell lines in particular (i.e., embryonic stem cells (ESCs) and induced pluripotent stem cells (iPSCs)) is constantly increasing. This applies to wildlife species threatened with extinction [21-26], as well as domesticated and livestock species [27-33] and model species, including non-human primates, mice, naked mole rats and others [34-39]. Also, the establishment of stem-cell-derived multicellular models such as organoids, assembloids and blastoids [40,41] is progressing for animal species, and they have been published both for model species, e.g., for mouse blastoids [42], and wildlife species, e.g., for rhinoceros cerebral organoids [26]. Research on the cellular material of domesticated model species has been conducted intensively in the last decades, with the result that more than 174,000 mouse (Mus musculus) ES cell lines have been established, registered by the RII and assigned an RRID so far [43], and more than 2 million mouse biosamples have been registered in the BioSamples database. In contrast, living cellular biosamples of wildlife, i.e., non-domesticated species [44], have been less strongly researched yet are steadily growing in number. As a result, animal biosamples are increasingly exchanged and processed by different research institutions worldwide [2]. This demands unambiguous identification to assure access to and the traceability and easy authentication of samples and cells. The absence of a standardized naming system at present has led to a wide range of inconsistent naming structures for animal cellular material (see Table 1). These irregular name schemes range from purely descriptive, alphabetical styles to short alphabetical and alphanumerical names and further to long alphanumerical names with or without additional structuring characters (see examples 1-5 in Table 1). The inconsistency in the naming of animal cellular material and the resulting irregular interpretability can be illustrated by examples for fibroblast cell lines such as "ENL-2" and "KCB 96008", both from Asian elephants (Elephas maximus); "KDF" and "SR-fibroblasts", both from Sumatran rhinoceroses (Dicerorhinus sumatrensis); and "Fish 80" and "pA03_wD06", both describing tissue samples of rainbow trout (Oncorhynchus mykiss). Moreover, allocated cell line names such as "UCLAi090-A" mimic and can be confused with published naming structures intended for human cell lines (see examples 6-12 in Table 1). If publicly accessible, most of these biosamples are assigned an RRID ("CVCL_xxxx") or BioSample ID ("SAMxxxxxxxxx") characterized by unique alphanumerical coding ("xxxxxx"). Thus, the identifier is not informative about any features of the biosample. These examples show existing ambiguities in the naming of biosamples and clearly demonstrate the necessity for a uniform nomenclature and informative identifier system in research with animal cells.
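The contrast between opaque accessions and free-text names can be illustrated with a short sketch. It assumes that the placeholder forms quoted above ("CVCL_xxxx" and "SAMxxxxxxxxx") translate into the two regular expressions shown; the patterns, the labels and the function name `classify_name` are illustrative assumptions, not an official specification of either registry.

```python
import re

# Assumed patterns derived from the placeholder forms quoted in the text.
# They are illustrative only and not an official registry specification.
PATTERNS = [
    ("RRID (Cellosaurus)", re.compile(r"CVCL_[0-9A-Z]{4}")),
    ("BioSample accession", re.compile(r"SAM[A-Z]{1,2}[0-9]+")),
]


def classify_name(name: str) -> str:
    """Return the label of the first pattern that matches the whole name."""
    for label, pattern in PATTERNS:
        if pattern.fullmatch(name):
            return label
    # Anything else follows no recognizable registry pattern.
    return "free text"


if __name__ == "__main__":
    for example in ["ENL-2", "KCB 96008", "SR-fibroblasts", "UCLAi090-A"]:
        print(example, "->", classify_name(example))
```

None of the published names cited above matches either accession form, and the accession forms themselves carry no information on species or biosample type.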

No. | Biosample Name | Publication | Characterization
1 | Snow leopard iPS | [45] | Snow leopard (Panthera uncia) iPSC line
2 | J9F2 | [46] | Japanese macaque (Macaca fuscata fuscata) iPSC line
3 | RNA-iPSC #1 | [47] | Common marmoset (Callithrix jacchus) iPSC lines
4 | CM421F B-0-12 iPSC | [31] | Common marmoset (Callithrix jacchus) iPSC lines
5 | BWHGLi001 | [48] | Naked mole rat (Heterocephalus glaber) iPSC line

The distribution of animal biosamples, such as tissues, cells and gametes, between research labs and other resources, for example, cryobanks, without a persistent standard identification system impedes explicit referencing and traceability, particularly when cells are modified, for example, by reprogramming them into iPSCs. Induced pluripotent stem cell lines are immortal, making them valuable tools for differentiation and further genetic modification [34]. Data related to the cell material, such as information on its derivation, cultivation and characterization, as well as ethical and legal provenance, are at risk of being disassociated from the cells over time, which hinders the ease of their global distribution [3,4]. In addition, inconsistent naming complicates literature searches for existing cell lines, hampers adherence to the findability, accessibility, interoperability and reusability (FAIR) principles [11,55] and increases the chances of gross misidentification of cell material and the subsequent irreproducibility of published results [56].
The international exchange and utilization of non-human genetic resources (i.e., "to conduct research and development on the genetic and/or biochemical composition of genetic resources, including through the application of biotechnology" [57]) are in many cases subject to regulations and control mechanisms on ethical and legal provenance based on international treaties. These include, for example, the "Washington Convention on International Trade in Endangered Species of Wild Fauna and Flora" [58] and the "Nagoya Protocol on Access to Genetic Resources and the Fair and Equitable Sharing of Benefits Arising from their Utilization (ABS) to the Convention on Biological Diversity" [59]. Compliance with these regulations includes strict administrative obligations and requires transparent traceability of the biological material.
A regulated, clear, robust and accepted designation of animal cell material provides for traceability and is therefore indispensable for consistent scientific work and credible research results [3,9]. This necessity becomes particularly evident in the context of the long-term storage of living animal biosamples in wildlife cryobanks. Such continuously evolving archives aim to preserve valuable cellular and genetic material to preserve biodiversity [20,60-63] in the context of accelerated anthropogenic species extinction rates [64]. They further aim to promote its broad application in different research fields, including conservation efforts [65], comparative cell and developmental biology research [66], (advanced) assisted reproduction technologies ((a)ART) and stem-cell-associated techniques (SCAT) [1,24,67-69]. These biorepositories are in need of joint data and process standardization [2,19]. Informative unique identifiers will ease compliance tracing within the relevant legal frameworks of animal biosamples stored in cryobanks and exchanged internationally.

Requirements of a Standardized Nomenclature
The standardized nomenclature was designed in consideration of the recommendations of the International Cell Line Authentication Committee (ICLAC) [9]. A standardized nomenclature must follow a formal pattern, i.e., a structured design characterized by predetermined rules, and be documented in a repository [11]. Ideally, such a nomenclature generates human-readable names so that users can readily recognize the features of the material.

Nomenclature Components
To establish a standardized and human-readable nomenclature for biological specimens which includes a wide range of species, it is imperative to take into account species name coding to contextualize the identifiers of the respective biosamples. Moreover, it is reasonable to include information on the biosample type, such as tissue or cell line. However, when including human-readable information on the species and the biosample, the number of characters in the identifier is likely to rapidly increase and even exceed a reasonable, intuitive and useful nomenclature length. Thus, a nomenclature design naturally faces a trade-off between information and length [3,4,11].

Proposed Standardized Nomenclature-Defined Formal Pattern
The present nomenclature design aims to provide a unique, stable and human-readable 17-digit alphanumerical identifier for viable animal biosamples at the species level. It follows a simple, predefined structure, which links two components: a unique and novel 10-digit alphabetical species code (component I), followed by a 1-digit prefix for biosample classification and a 5-digit ascending identification number for every new cellular biosample (component II) (see Table 2). The clarity and readability of the two components are strengthened by hyphens (see Figure 1). The identification number occupies positions 13-17 and ranges from 00001 to 99999. This design therefore allows for almost 10^5 (i.e., 99,999) possible standardized identifiers for each of the considered biosample types, tissue (T), cells (C), gametes (G) and embryos (E), within one species. It could be expanded for additional biosample types, such as multicellular models (M) (organoids, assembloids, blastoids, etc.).
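The formal pattern described above can be checked mechanically. The following is a minimal sketch, assuming the position rules given in Figure 1 (upper case at positions 1, 2, 3, 5, 8 and 12, lower case at positions 6, 7, 9 and 10, hyphens at positions 4 and 11, numbers at positions 13-17). The names `UI_PATTERN`, `ParsedUI` and `parse_ui` are hypothetical, and the prefix "M" is included only because the text mentions it as a possible extension.

```python
import re
from typing import NamedTuple

# Component I: 3-letter taxonomic prefix, hyphen, 6-letter species acronym.
# Component II: 1-letter biosample prefix and a 5-digit identification number.
UI_PATTERN = re.compile(
    r"(?P<taxon>[A-Z]{3})-"
    r"(?P<species>[A-Z][a-z]{2}[A-Z][a-z]{2})-"
    r"(?P<prefix>[TCGEM])"
    r"(?P<number>[0-9]{5})"
)

# "M" is only a possible extension according to the text.
BIOSAMPLE_TYPES = {
    "T": "tissue",
    "C": "cells",
    "G": "gametes",
    "E": "embryos",
    "M": "multicellular model",
}


class ParsedUI(NamedTuple):
    taxon: str
    species: str
    biosample_type: str
    number: int


def parse_ui(ui: str) -> ParsedUI:
    """Split a 17-character identifier into its elements or raise ValueError."""
    match = UI_PATTERN.fullmatch(ui)
    if match is None:
        raise ValueError(f"not a valid identifier: {ui!r}")
    number = int(match["number"])
    if number == 0:
        # Identification numbers run from 00001 to 99999.
        raise ValueError("identification numbers start at 00001")
    return ParsedUI(
        match["taxon"],
        match["species"],
        BIOSAMPLE_TYPES[match["prefix"]],
        number,
    )


if __name__ == "__main__":
    print(parse_ui("MPR-CerSim-T00020"))
```

Because every position has a fixed character class, a single anchored pattern is enough to reject names that merely resemble the scheme.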

Nomenclature Adaptation to Transformation Processes
Information such as derivation processes or hierarchy is not integrated into this nomenclature pattern. Any transformation of a cellular biosample (e.g., genetic modifications) will therefore result in an individual, newly distributed identifier through the assignment of a new identification number and, if necessary, a new prefix (see Figure 2). A precise feature definition for every single position within the nomenclature (see Figure 1) prevents the possible confusion caused by ambiguous characters such as an upper case "O" and "I" or lower case "l" and the numbers 0 and 1. Examples of the nomenclature design are summarized in Table 3.
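Since the identifier itself carries no derivation information, the parent-child relationships have to be kept in the associated metadata. The sketch below shows one way this could look, assuming a simple in-memory registry that issues ascending numbers per species code and prefix; the class `UIRegistry` and its methods are hypothetical and do not describe an existing platform.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

ALLOWED_PREFIXES = {"T", "C", "G", "E", "M"}


class UIRegistry:
    """Hypothetical in-memory registry; a real one would be a central database."""

    def __init__(self) -> None:
        # One ascending counter per (species code, biosample prefix).
        self._counters: Dict[Tuple[str, str], int] = defaultdict(int)
        # Derivation links live in the metadata, not in the identifier.
        self.parents: Dict[str, List[str]] = {}

    def issue(self, species_code: str, prefix: str,
              parents: Optional[List[str]] = None) -> str:
        """Assign the next free identifier and record its parent biosamples."""
        if prefix not in ALLOWED_PREFIXES:
            raise ValueError(f"unknown biosample prefix: {prefix!r}")
        key = (species_code, prefix)
        if self._counters[key] >= 99999:
            raise OverflowError("identification numbers exhausted")
        self._counters[key] += 1
        ui = f"{species_code}-{prefix}{self._counters[key]:05d}"
        self.parents[ui] = list(parents or [])
        return ui

    def lineage(self, ui: str) -> List[str]:
        """Return all recorded ancestors of a biosample, nearest first."""
        seen: List[str] = []
        queue = list(self.parents.get(ui, []))
        while queue:
            current = queue.pop(0)
            if current not in seen:
                seen.append(current)
                queue.extend(self.parents.get(current, []))
        return seen


if __name__ == "__main__":
    registry = UIRegistry()
    tissue = registry.issue("MPR-CerSim", "T")
    fibroblasts = registry.issue("MPR-CerSim", "C", [tissue])
    ipsc = registry.issue("MPR-CerSim", "C", [fibroblasts])
    print(ipsc, "derived from", registry.lineage(ipsc))
```

In this sketch the fibroblast line and the iPSC line share the prefix "C" and differ only in their number, mirroring the left-hand example of Figure 2; the numbers start at 00001 here, whereas the figure uses arbitrary example numbers.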


The naming scheme has been designed and is proposed in particular for the designation of the viable cellular material of non-domesticated species in order to set a standard for systematic referencing in expanding research on and with wildlife cells. Its applicability to the cell material of highly researched model species, such as mice, while maintaining human readability would require an extension of or change in the coding to cover high numbers of biosamples.

Scheme for a Species Identifier
A universally valid coding structure that cyphers scientific species names does not exist to date. Species identifier codes are usually generated by different taxonomic databases at the subspecies level as either numerical or alphanumerical codes. Whereas, for example, the Catalogue of Life (COL) [70] generates an alphanumerical code (COL Identifier), different numerical codes are generated by the Integrated Taxonomic Information System (ITIS) [71] (Taxonomic Serial Number, TSN) and the NCBI taxonomy database [72] (Taxonomy ID, txid). As an example, for the species African elephant (Loxodonta africana), this results in the different codes "3W9KV" (COL Identifier), "584939" (ITIS TSN) and "9785" (NCBI txid). Including one of these predefined species codes in the biosample nomenclature would exclude other taxonomy databases. Moreover, the resulting nomenclature would ultimately be dependent on a unique species identifier, whose durability and stability are difficult to predict. We furthermore argue that the human readability of the species component in the nomenclature increases its acceptance and overall utility. Hence, the species coding we propose here for wildlife species refers to the International Code of Zoological Nomenclature. This binominal nomenclature is issued to provide a public and permanent scientific record and is defined as a combination of one generic name followed by one specific name, both containing two or more letters [73]. These official species names are globally applied by the scientific community to unambiguously describe animal species and, because of their human readability, are expected to be more stable than the aforementioned coding. The herein presented design for a universal, stable and human-readable alphabetical species identifier includes two interlinked elements: a 3-digit taxonomic classification (element 1), followed by a 6-digit acronym of the scientific binominal species name (element 2). The former serves as a prefix to the species name acronym and increases the unambiguity and strength of the species identifier. Cyphering remains at the species level, as sub-species are not considered.
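How the two elements combine can be sketched in a few lines. The construction rules used here are assumptions inferred from the example "MPR-CerSim" for the White rhino (Ceratotherium simum): element 1 is taken as the initial letters of class, order and family, and element 2 as the first three letters of the generic and of the specific name. The function name `species_identifier` is hypothetical, and special cases such as very short names are deliberately not handled.

```python
def species_identifier(class_name: str, order_name: str, family_name: str,
                       binomial: str) -> str:
    """Build the 10-character species code (component I) under assumed rules."""
    parts = binomial.split()
    if len(parts) != 2:
        # Sub-species (trinomials) are not considered by the scheme.
        raise ValueError("expected a binominal name: 'Genus species'")
    genus, species = parts
    if len(genus) < 3 or len(species) < 3:
        # Very short names need the special handling mentioned in the text.
        raise ValueError("name too short for a regular 6-letter acronym")
    # Element 1 (assumed): initials of class, order and family.
    element_1 = (class_name[0] + order_name[0] + family_name[0]).upper()
    # Element 2 (assumed): first three letters of genus and of species,
    # each written with a leading upper case letter.
    element_2 = genus[:3].capitalize() + species[:3].capitalize()
    return f"{element_1}-{element_2}"


if __name__ == "__main__":
    code = species_identifier("Mammalia", "Perissodactyla", "Rhinocerotidae",
                              "Ceratotherium simum")
    print(code)
```

For the White rhino this reproduces the code "MPR-CerSim" used in the examples of Figure 2, which is the only check available here for the assumed rules.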

Adaptations and Limitations of the Nomenclature Design
The utility of a nomenclature to establish an identifier is measured by its ability to be adapted to new developments. For example, biosamples of reproductive, naturally occurring "inter-species hybrids", as well as species with an exceptionally short binominal zoological name (see Supplementary Material Table S1), may not fit the proposed nomenclature pattern. However, such short scientific species names are extremely rare and only apply to one mammal species (Great evening bat (Ia io)), two fish species (Weedy cardinalfish (Foa fo) and Betta pi) and six invertebrate species [74]. Our proposed nomenclature design is flexible and well adjustable to these special cases (see Supplementary Material Table S1). Moreover, in the unlikely, although not impossible, event of any duplication of the 10-digit species code (component I), adjustments to the generic and specific acronyms can be made (element 2) (Table S1). Lastly, if a species is taxonomically not clearly assigned to a class, order and/or family but to a subdivision of these ranks, the closest assigned subdivision of the respective rank should be used (e.g., suborder instead of order) to create the taxonomic classification (element 1).
In the infrequent event that a scientific species name is changed due to new scientific findings, the species identifier would need to be updated accordingly. Any resulting newly distributed identifiers for already named biosamples would have to be permanently linked to the outdated identifiers to maintain traceability.
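The permanent link between outdated and newly distributed identifiers can be sketched as a simple lookup that follows supersession records until the current name is reached. The mapping, the function name `resolve_current` and the identifiers in the example are hypothetical placeholders and do not refer to real species or biosamples.

```python
from typing import Dict


def resolve_current(ui: str, superseded_by: Dict[str, str]) -> str:
    """Follow 'outdated -> replacement' links until the current identifier."""
    seen = {ui}
    while ui in superseded_by:
        ui = superseded_by[ui]
        if ui in seen:
            # Guard against inconsistent records that point back to themselves.
            raise ValueError("cyclic supersession links")
        seen.add(ui)
    return ui


if __name__ == "__main__":
    # Placeholder identifiers for a species that was renamed twice.
    links = {
        "MXX-OldNam-C00007": "MXX-MidNam-C00003",
        "MXX-MidNam-C00003": "MXX-NewNam-C00001",
    }
    print(resolve_current("MXX-OldNam-C00007", links))
```

Keeping the outdated identifiers as keys rather than deleting them means that names cited in earlier publications remain resolvable.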

Conclusions
We herein propose a first, uniform, human-readable 17-digit alphanumerical nomenclature design that assigns standardized identifiers to animal-derived living cellular material such as tissues, cells, gametes, embryos and stem-cell-generated multicellular models. The predefined naming scheme is especially suggested for the designation of biosamples of wildlife species. It includes acronymized species and biosample information and allows simple adaptation according to the respective biosample type. Linking the nomenclature-based name to a body of data which (i) uniformly characterizes the cellular material and its derivation, (ii) demonstrates the genealogy, sex and ID of the donor animal and (iii) evidences the legal and ethical provenance is indispensable to ensure clear reference to and the unambiguous traceability of the biomaterial, especially when it is published and transferred worldwide for research. A centralized repository of stable biosample names could provide such a resource and also allow for machine-based linking to other central registries, specifically RRIDs. Such a platform is thus required to make these identifiers persistent and the associated data publicly accessible (FAIR). Ideally, these unique identifiers will be automatically generated using an API by the centralized repository. We therefore emphasize the need for a centralized repository to associate the standardized biosample name with its metadata. Such collection, standardization and FAIRification of data are powerful tools to support the visibility and international exchange of valuable wildlife-derived biomaterial, thereby facilitating globally consistent scientific work in wildlife conservation and biomedical research.

Need for a Standardized Nomenclature Design for Animal Biosamples

Free-Text Names Have Little or No Interpretability

Figure 1. Schematic presentation of the nomenclature design with its components and elements. The 17-digit unique identifiers (UIs) are composed of four descriptive elements in a predefined formal pattern with distinct order 1-4, providing information on the species (component I) and the biosample (component II). The combination of elements 1 and 2 results in a robust species identifier. Each of the 17 positions is assigned a characteristic feature: upper case letter (positions 1, 2, 3, 5, 8 and 12), lower case letter (positions 6, 7, 9 and 10), hyphen (positions 4 and 11) or a number between 0 and 9 (positions 13-17). Upper case letters indicate the first letter of a new word.



Animals 2024, 12


Figure 2. Alteration of biosample information coding (component II) according to downstream processing of biosamples. Example for White rhino (Ceratotherium simum) primary material. Left example: Skin tissue material of a White rhino, registered as, e.g., MPR-CerSim-T00020, is hypothetically processed into a somatic cell line (e.g., fibroblasts) and subsequently assigned a new UI, e.g., MPR-CerSim-C00015. Aliquots of the latter are then reprogrammed into an iPSC line, and this new cell line is registered as, e.g., MPR-CerSim-C00032. Right example: Gametes of a White rhino, which are assigned, e.g., the UIs MPR-CerSim-G00001 and MPR-CerSim-G00321, are used for in vitro fertilization, and one of the resulting embryos is assigned the UI MPR-CerSim-E00064. An ESC line derived from this embryo would subsequently be assigned a UI, e.g., MPR-CerSim-C00123. If stem-cell-derived biosamples such as somatic and germ cells, multicellular models or gametes are further developed, the prefix and subsequent number are adjusted accordingly, e.g., to "M" or "G".
supervision, A.K., N.M., T.B.H. and S.C.M.; funding acquisition, A.K., T.B.H. and A.P. All authors have read and agreed to the published version of the manuscript.

Table 1. Examples of assigned names for animal biosamples in absence of a standardized naming system.

Table 2. Summary and explanation of the nomenclature components and elements.

Table 3. Examples of the full nomenclature design. Summarized are combinations of diverse examples for component I (species information) and component II (biosample information).