Changes in Neuropeptide Prohormone Genes among Cetartiodactyla Livestock and Wild Species Associated with Evolution and Domestication

The impact of evolution and domestication processes on the sequences of neuropeptide prohormone genes that participate in cell–cell signaling influences multiple biological process that involve neuropeptide signaling. This information is important to understand the physiological differences between Cetartiodactyla domesticated species such as cow, pig, and llama and wild species such as hippopotamus, giraffes, and whales. Systematic analysis of changes associated with evolutionary and domestication forces in neuropeptide prohormone protein sequences that are processed into neuropeptides was undertaken. The genomes from 118 Cetartiodactyla genomes representing 22 families were mined for 98 neuropeptide prohormone genes. Compared to other Cetartiodactyla suborders, Ruminantia preserved PYY2 and lost RLN1. Changes in GNRH2, IAPP, INSL6, POMC, PRLH, and TAC4 protein sequences could result in the loss of some bioactive neuropeptides in some families. An evolutionary model suggested that most neuropeptide prohormone genes disfavor sequence changes that incorporate large and hydrophobic amino acids. A compelling finding was that differences between domestic and wild species are associated with the molecular system underlying ‘fight or flight’ responses. Overall, the results demonstrate the importance of simultaneously comparing the neuropeptide prohormone gene complement from close and distant-related species. These findings broaden the foundation for empirical studies about the function of the neuropeptidome associated with health, behavior, and food production.


Introduction
Domestication typically encompasses the separation of a species from its natural habitat and physiological and behavioral modifications through artificial selection and environments [1]. This interplay is uniquely demonstrated in the mammalian superorder Cetartiodactyla that includes domesticated livestock species as well as species adapted to diverse environmental conditions including alpine, desert, and marine environments, and management practices including precision management [2][3][4]. Out of the four Cetartiodactyla suborders, three suborders have at least one domesticated species: Ruminantia (sheep, cattle, goats, and deer), Suina (pigs and peccaries), and Tylopoda (camels, llamas, and alpaca). These domesticated terrestrial Cetartiodactyla species are managed for a wide variety of purposes including food, fiber, companion, show, and draught purposes. Whippomorpha (hippopotamuses and whales), the fourth Cetartiodactyla suborder, encompasses species abundant in the wild, and also present in zoos and aquariums participating in conservation and scientific studies [5].
Livestock domestication started approximately 11,000 years ago with domestic cattle (Bos taurus), domestic goats (Capra hircus), domestic sheep (Ovis aries), and domestic pigs (Sus scrofa) [2][3][4]. Tylopoda were domesticated 3000 to 7000 years ago [6] and reindeer  Different groupings of the Cetartiodactyla species were identified based on domestication status, suborder, family, and subfamily. Individual species were classified as domesticated (12 species) or wild (106 species). The wild Bactrian camel, Camelus ferus, was excluded from the previous classification due to uncertainty on the domestication status of the sequenced individuals [24]. The wild species from the suborders Ruminantia, Suina, and Tylopoda were grouped as terrestrial (nonCetacea) species (91 species), while Cetacea species (Whippomorpha species excluding the hippopotamus, Hippopotamus amphibious) are aquatic. Ruminantia species were also divided into domestic (8 species) and wild (74 species).
A catalog of neuropeptide gene products (prohormone proteins) was assembled and the predicted protein sequences for each species were obtained following the bioinformatic pipeline [15,16]. Since Cetartiodactyla species were unannotated, additional steps within the pipeline were taken to identify the neuropeptide prohormone genes. The longest prohormone protein isoforms were compiled from a curated list of 98 mammalian prohormone protein sequences identified in Cetartiodactyla [15,17,18,[25][26][27]. The most probable location for each neuropeptide prohormone gene was identified using TBLASTN [28] using default settings and with the low-complexity filtering disabled. An approximate 10,000 bp region surrounding the most probable gene location was extracted and the corresponding protein sequence was predicted from the extracted region using the gene prediction tool Genewise [29] with default settings. Individual sequence inspection ensured that all predicted sequences corresponded to the same longest known protein isoform. Variations in assembly depth and quality across the genomes could result in incomplete sequences or apparent sequence duplication. Incomplete, inconsistent, or potentially duplicate predictions were evaluated and additional searches using alternative locations, spanning intronic regions to accommodate genes that have short exons, and relaxing search criteria with and without composition-based statistics [30] were used to improve the prediction.
Following the compilation and validation of the neuropeptide gene protein sequences, the similarity between species was studied. The alignment of protein sequences across species by prohormone was implemented using the sequence aligner MAFFT [31]. Alignment accuracy was optimized using the L-INS-I parameterization that enables the alignment of sequences containing sequences flanking around one alignable domain [31]. Phylogenetic gene trees of protein sequences were computed for each neuropeptide prohormone gene using the software PhyML [32] with default parameters, and tree visualization used the software ASTRAL with default parameters [33][34][35][36]. The genetic distances between species trees and individual prohormone trees were estimated using Pearson correlation coefficients. The protein distances within prohormones and across the different groupings were estimated using the mean protein evolutionary distance approach [37].
The estimation of neuropeptide prohormone gene protein distances was ensued by the identification of the evolutionary/domestication paradigm best supported by the 118 species studied. An evolutionary model [38] accounting for information on amino acid aromaticity (A), composition (C), polarity (P), and side chain volume (V) [39][40][41] was fitted using the Jones-Taylor-Thornton (JTT) substitution model [42]. The final model best supported by the sequences and species studied was selected using Akaike information criterion [43].

Prohormone Identification across Species
All 98 prohormone protein sequences were identified across all 118 Cetartiodactyla species but varied across taxonomic groups. A notable finding from the comparison across species was that relaxin 1 (RLN1) has been lost in Ruminantia but is present in all other Cetartiodactyla suborders. Furthermore, prohormone peptide YY2 (PYY2) was only reliably detected across the Ruminantia suborder including complete predictions in the Giraffidae and Bovinae subfamilies and a partial PYY2 prediction in the Cervidae subfamily. Protein sequences from AVP, secretin (SCT), tachykinin precursor 4 (TAC4), and VGF nerve growth factor inducible (VGF) were completely or partially identified in at least 20% of the Cetartiodactyla species. Apelin (APLN), CART pre-propeptide (CARTPT), NPY, and somatostatin (SST) protein sequences were found to be highly conserved.
Within species, the genome assemblies of argali (Ovis ammon), Alpine ibex (Ammotragus lervia), and blue whale (Balaenoptera musculus) enabled the prediction of 56%, 53%, and 79% of neuropeptide prohormone genes, respectively. The hartebeest (Alcelaphus buselaphus) genome supported the complete or partial prediction of 69% of the neuropeptide prohormone genes. When compared to closely related species, the challenge in predicting the totality of the prohormone protein sequence completely was primarily due to depth and quality variations along the individual assemblies rather than species differences. Limitations in sequence coverage, contig assembly including frame orientation, and splice site prediction prevented an accurate census of neuropeptide prohormone gene gains and losses despite the multiple specifications of the predictive algorithms used in the present study.

Species Tree
The species tree followed the expected species relationships between and within the four Cetartiodactyla suborders (Figure 1). The species tree excluded the RLN1 and PYY2 gene trees because these genes were suborder-specific. Among the Whippomorpha species, the hippopotamus was more distant to the rest of the species in the suborder that correspond to the infraorder Cetacea which in turn is distributed into the parvorders Mysticeti (baleen whales) and Odontoceti (toothed whales). The tree also demonstrates that the comparative analysis of neuropeptide protein sequences resulted in the expected organization of species into known families and subfamilies within the Ruminantia suborder.
(baleen whales) and Odontoceti (toothed whales). The tree also demonstrates that the com parative analysis of neuropeptide protein sequences resulted in the expected organizatio of species into known families and subfamilies within the Ruminantia suborder.

Correlation of InterSpecies Distances Based on Individual and All Neuropeptide Prohormone Genes
To understand the variation in species distances across neuropeptide prohormon genes, the distances estimated within the neuropeptide prohormone gene tree were corr lated to the distances from the species gene tree. The correlation of interspecies distanc between individual and the species gene trees averaged 0.77 and ranged from 0.25 to 0.9 The species distance for most individual neuropeptide prohormone genes (69 genes) w highly correlated (between 0.8 and 0.9) with the distance estimate from the species gene tre Neuropeptide prohormone genes that are highly conserved tended to provide distance e timates less correlated with the estimates from the species gene tree, relatively with le conserved neuropeptide prohormone genes. This was reflected by the number of uniqu sequences and the length of the predicted protein sequence. The interspecies distance est mated from chromogranin A (CHGA), prodynorphin (PDYN), and the thyrotropin relea ing hormone (TRH) were highly correlated (correlation > 0.9) with the distance from th species gene tree. On the other hand, the interspecies distance estimated from AVP, insul (INS), natriuretic peptide C (NPPC), prokineticin 2 (PROK2), parathyroid hormone (PTH and SST had lower correlations (correlation < 0.50) with the distance from the species gen tree resulting from extremely highly conversed protein sequences across species.
The distance between Ruminantia species computed from individual neuropeptid prohormone genes had higher correlations with the distance computed from the specie

Correlation of InterSpecies Distances Based on Individual and All Neuropeptide Prohormone Genes
To understand the variation in species distances across neuropeptide prohormone genes, the distances estimated within the neuropeptide prohormone gene tree were correlated to the distances from the species gene tree. The correlation of interspecies distances between individual and the species gene trees averaged 0.77 and ranged from 0.25 to 0.92. The species distance for most individual neuropeptide prohormone genes (69 genes) was highly correlated (between 0.8 and 0.9) with the distance estimate from the species gene tree. Neuropeptide prohormone genes that are highly conserved tended to provide distance estimates less correlated with the estimates from the species gene tree, relatively with less conserved neuropeptide prohormone genes. This was reflected by the number of unique sequences and the length of the predicted protein sequence. The interspecies distance estimated from chromogranin A (CHGA), prodynorphin (PDYN), and the thyrotropin releasing hormone (TRH) were highly correlated (correlation >0.9) with the distance from the species gene tree. On the other hand, the interspecies distance estimated from AVP, insulin (INS), natriuretic peptide C (NPPC), prokineticin 2 (PROK2), parathyroid hormone (PTH), and SST had lower correlations (correlation <0.50) with the distance from the species gene tree resulting from extremely highly conversed protein sequences across species.
The distance between Ruminantia species computed from individual neuropeptide prohormone genes had higher correlations with the distance computed from the species gene tree relative to other species. This trend may reflect the higher proportion of ruminant species among all species used. A higher average distance between species was estimated in the second most represented suborder, Whippomorpha. The hippopotamus, sperm whale (Physeter catodon), and baiji (Lipotes vexillifer) species were more distant from other Whippomorpha species.

Evolutionary Model
The evolutionary model use accounts for information on amino acid aromaticity (A), composition (C), polarity (P), and side chain volume (V) to understand the impact of evolutionary and domestication changes within and across suborders. Table 2 summarizes the number of neuropeptide prohormone genes by evolutionary model specification. Most neuropeptide prohormone genes exhibited a nonzero estimate for at least one parameter; parameters A and V commonly had negative estimates, while parameter C typically had positive estimates, and P tended to have positive estimates (Table 2). This relationship between parameters is consistent with the correlations of individual amino acid coefficients for each parameter in the evolutionary model. Both the A and V parameters are positively correlated (~0.7) and negatively correlated with the C and P parameters (−0.35 to −0.47). The C and P parameters are positively correlated (0.4) which increased to 0.81 when cysteine is excluded. The domestic group of species had a higher proportion of neuropeptide prohormone genes (40%) than the wild grouping with nonzero parameter estimates especially involving the C parameter. A smaller proportion of parameter changes occurred between domesticated and wild Ruminantia species. There were seven neuropeptide prohormone genes where the modified parameters only occurred in domestic (1) or wild terrestrial (6) groups. These neuropeptide prohormone genes included apelin (APLN), adenylate cyclase activating polypeptide 1 (ADCYAP1), arginine vasopressin (AVP), cholecystokinin (CCK), growth hormone releasing hormone (GHRH), torsin family 2 member A (salusin-containing isoform, TOR2X), and VGF nerve growth factor inducible (VGF), and the averages of the parameter changes for these neuropeptide prohormone genes are summarized in Table 3. The positive A parameter estimate for APLN implies a preference for aromatic amino acids with domestication. In the wild groupings, the A and V parameters were negative estimates and the C parameter had positive estimates, implying a preference against aromatic amino acids with domestication.

Prohormone Complement
The integration of bioinformatics prohormone prediction, compilation, characterization, and analysis offered insights into changes of neuropeptide genes associated with evolutionary and domestication processes across Cetartiodactyla suborders and species. As expected of orders that encompass a wide range of species, the Cetartiodactyla assemblies differed in depth and quality across the genome. The Cetartiodactyla species tree was virtually identical to the expected tree [6,44,45]. The strategy of using evolutionary proximal species for gene prediction in weaker assemblies [46] enabled the recovery of sequences and the results indicated overall sequence consistency across taxonomic groups. The evolutionary proximal strategy minimized the identification of differences between species that could be a result of assembly limitations.
A remarkable finding was that Suina species contained the AUG initiation codon but all other Cetartiodactyla species in this study contained a nonAUG initiation codon of neuropeptide W (NPW) present in other mammalian species [47]. The absence of the AUG initiation codon can result in incorrect prediction of the start of the actual coding region of NPW. The unique feature of the neuropeptide NPW is particularly important because pigs are regularly used as biomedical models due to their higher genome similarity to humans than rodents and several health and behavioral processes are modulated by NPW.
Another finding from the bioinformatics pipeline centered on sequence variations in cortistatin (CORT). A 15-nucleotide insertion in signal peptide of CORT was detected in the domestic goat and does not impact the cortistatin neuropeptides. The potential evolutionary difference was detected in the assemblies of four domestic goat breeds currently available and this insertion was not present in other Capra or Caprinae species.
Comparisons of the predicted prohormone sequences highlighted and refined existing knowledge of the neuropeptide genes RLN1, PYY2, islet amyloid polypeptide (IAPP), and galaninlike peptide (GALP). Our predictions of RLN1 extended the loss of bovine RLN1 [17] to all Ruminantia species after the split from Whippomorpha. Oppositely, PYY2 was detected solely in Ruminantia species. Species from other suborders either lacked a complete prediction or had indeterminate matches (Whippomorpha) supporting that PYY2 is a pseudogene outside Ruminantia [48]. While GALP was detected in most species, all Cetartiodactyla suborders had different terminal regions and some Bovidae species lacked the initial initiation region. Tylopoda and Cetacean species presented IAPP sequences encompass-Vet. Sci. 2022, 9, 247 8 of 13 ing discrepancies consistent with pseudogenes including stop codons and lack of a signal peptide. Chacoan peccary (Catagonus wagneri) and all the camelid species studied, along with previous reports on the pig IAPP [18], lack the sequence corresponding to traditional cleavage sites, although analysis of RNA-seq data indicated that IAPP is expressed in pigs [49].
The comparative analysis of neuropeptide genes between Cetartiodactyla species identified sequence differences that could impact the levels of bioactive neuropeptides including TAC4, insulinlike 6 (INSL6), gonadotropin releasing hormone 2 (GNRH2), and prolactin releasing hormone (PRLH). Noticeably, TAC4 exhibited sequence differences among the Cetartiodactyla species studied, yet no sequence encompassed the cleavage motif reported in human and mouse [50]. Similarly, the start and terminal regions of INSL6 in pigs and camelids were different from the other Cetartiodactyla species that, in turn, were similar to the corresponding human and rodent sequences [51]. While the N-terminal cleavage site of the INSL6 A chain was conserved across species, the C-terminal cleavages sites of the INSL6 A and B chains remains uncharacterized among Cetartiodactyla species. Likewise, while the signal peptide for GNRH2 was six amino acids longer in all the species of the Bovinae subfamily, the expected cleavage motif for the gonadoliberin-2 peptide was missing from all the Whippomorpha species. Similarly prominent, all Cetacea species had an 8-amino acid deletion within the PRLH signal peptide that was not present in the Hippopotamus amphibius or other Cetartiodactyla species and this sequence difference resulted in a shorter predicted prolactin-releasing peptide 31 (PrRP31). On the other hand, a 10 bp-insert in all Ruminantia species results in a longer PRLH terminal region, albeit this region is not part of a known bioactive peptide.
The variation in POMC protein sequences across the Cetartiodactyla species could impact multiple neuropeptides. While the complete POMC sequence was predicted in Tylopoda species, the sequence lacked a cleavage site within the N-terminal peptide (NPP) necessary to produce the pro-γ-MSH, and in turn, the small amidated γ 1 -MSH. While no peptides associated in pro-γ-MSH region were detected by mass spectrometry [27], γ 1 -MSH has been characterized in human [52] and cattle [53]. The N-terminal cleavage site was detected in all the other Cetartiodactyla species studied. A similar phenomenon is observed among rodent species [54] with the cleavage site missing in Muridae yet observed in other Rodentia families. Remarkably, both Tylopoda and Muridae species include the sequence for the larger γ 3 -MSH peptide. The role of γ-MSH peptides in health and behavior processes encompasses sodium metabolism and blood pressure regulation [55][56][57] and the injection of γ 1 -MSH in the left ventral tegmental area of rats induced grooming [58].
Neuropeptides from calcitonin (CALC) genes have many different roles including the regulation of calcium, vasodilation, inflammation, migraine, pain, and hypothermia [59][60][61][62]. Related with domestication behaviors, an SNP in the calcitonin receptor-stimulating peptide (CRSP) gene cluster differentiated pure-breed (mating determined by human breeding practices) from free-breeding (mating is not artificially restricted) dogs [63]. Moreover, the calcitonin receptorlike receptor (CALCRL) was differentially expressed in the pituitary of tame and aggressive foxes [64]. In the present study, the identification of four calcitonin or CRSP genes across all Cetartiodactyla taxa was consistent with other mammals except primates and rodents [65]. While calcitonin-related genes were detected, the comparison of cleavage site locations indicated that the cleavage sites necessary to form all known neuropeptides such as a calcitonin gene-related peptide 1 are absent from both CRSP2 and CRSP3.

Evolutionary Model
The differences in the evolutionary model amino acid aromaticity (A), composition (C), polarity (P), and side chain volume (V) parameters enabled the identification of patterns of amino acid property changes in the prohormone sequences and thus neuropeptides across Cetartiodactyla taxa. The similarity in the number of changes between all species and all Ruminantia species was due to the high proportion of Ruminantia species and Vet. Sci. 2022, 9, 247 9 of 13 lower similarity was observed with family or subfamily groupings. The relatively high proportion of changes within Tylopoda was due to the limited number of species and the close relationship between Tylopoda species. Within taxonomic groupings, Whippomorpha exhibited a high proportion of neuropeptide prohormone genes changing in the A, C, and V parameters that was similarly represented in Mysticeti and Odontoceti families.
The majority of neuropeptide prohormone genes exhibited changes between Cetartiodactyla suborders, while 18% of neuropeptide prohormone genes including promelanin concentrating hormone (PMCH), growth hormone releasing hormone (GHRH), PROK2, AVP, platelet derived growth factor subunit B (PDGFB), and pancreatic polypeptide (PPY) exhibited no substantial change in parameters. The conservation of the previous prohormone sequences supported amino acid substitution rates that are consistent with the JTT substitution model.
Among the neuropeptide prohormone genes that presented parameter changes across species, most changes occurred in one (31%) or two parameters (34%). These changes were primarily negative estimates for parameters A and V and positive estimates for parameter C. For some neuropeptide prohormone genes, including hepcidin antimicrobial peptide (HAMP), insulinlike 3 (INSL3), motilin (MLB), and spexin hormone (SPX), the amino acid parameter changes were observed in most taxonomic groups.
A notable finding stemming from the parameter coefficient estimates implies that changes in prohormone sequence lower the likelihood of large and hydrophobic amino acids (i.e., phenylalanine, tryptophan and tyrosine). These amino acids may compromise the neuropeptide function and therefore the evolutionary forces favor other types of amino acids. Moreover, our results indicate that the cation-π interaction provided by the aromatic amino acids, a strong noncovalent binding interaction that is important in protein secondary structure and interactions with drugs and neurotransmitters such as serotonin [66], is generally undesirable in prohormone sequences.

Domestication
The study of changes in neuropeptide gene protein sequences associated with domestication accounted for factors such as the distribution of domestic and wild species. Inference was partitioned orthogonally from taxonomy to remove all confounding groups while addressing differences in the number of species and limited gene flow. The evolution model indicated differences with wild terrestrial and domestic groupings. The domesticated Ruminantia had a slightly higher proportion of changes than wild Ruminantia with 60% of the neuropeptide prohormone genes being equal or had at least one parameter in common between domesticated and wild Ruminantia. This finding suggests that part of the differences between parameter values can be attributed to taxonomic differences rather than domestication.
Further challenges to the assessment of the association between domestication and neuropeptide gene changes stem from the domestication classification of some species and the genome assembly quality of other species. With respect to domestication assignments, for example, the gayl (Bos frontalis) is often considered the domestic form of the wild gaur (Bos gaurus) [67], however the degree of domestication is highly variable among the former group. With regards to sequence quality, the virtually identical nucleotide or protein sequences predicted from all camelid genomes available in this study invalidates the conclusion of adaptive introgression of endothelin 3 (EDN3) in South American camelids [6]. The association between domestication and prohormone sequence changes is further obscured by the counteraction of ancestral hybridization and artificial selection [3,4,6,68]. Considering the distribution of species across taxa and domestication groups and adjusting for variable assembly quality within and across domesticated groups, the present study identified notable and consistent changes among neuropeptide prohormone genes associated with docile and herdlike behaviors.
The neuropeptide prohormone genes that presented differences in amino acid parameters between domestic and wild species produce neuropeptides that participate in a vast array of functions. Among the neuropeptides from neuropeptide prohormone genes that have distinct sequence across species, APLN, AVP, TOR2X (salusin-containing torsin family 2 member A isoform), and VGF modulate angiogenesis, vasodilation, and vasoconstriction. Likewise, neuropeptides from ADCYAP1, CCK, GHRH, and VGF are associated with feeding and energy homeostasis. VGF is also associated with circadian rhythm, pain, memory, and learning [69,70]. In the context of domestication, a commonality is that the previous functions are associated with the sympatho-adrenomedullary system that involves the 'fight or flight' response where blood pressure and glucose levels increase in response to stress [71].

Conclusions
A study of the association between evolutionary and domestication processes and changes in neuropeptide gene sequences was undertaken. A bioinformatics pipeline was developed to identify the prohormone complement of 98 sequences across 118 Cetartiodactyla wild and domesticated species distributed across 4 suborders. An exhaustive survey of prohormone sequences was compiled, the sequences were compared, and evolutionary models were used to assess the change in amino acid properties including aromaticity, composition, polarity, and volume parameters. Remarkable findings include sequence differences among Cetartiodactyla that disrupt cleavage motifs in particular neuropeptide prohormone genes (e.g., INSL6, GNRH2, PRLH, POMC, CALC), potentially compromising the ability of the prohormone to generate bioactive neuropeptides. Similarly notable, the parameter coefficient estimates suggest that the evolutionary process tends to disfavor large and hydrophobic amino acids in neuropeptide prohormone genes. Evolutionary modeling indicated that some neuropeptide prohormone genes associated with the sympatho-adrenomedullary system and the 'fight or flight' response may be impacted by domestication. The prohormone complement provides the foundation for neuropeptidomic studies of medically and economically important characteristics and Cetartiodactyla species.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/ 10.3390/vetsci9050247/s1, Table S1: Taxonomic names and National Center for Biotechnology Information identifiers for genomes of Cetartiodactyla species. Additionally, the protein sequences with limited multiple sequence alignments and phylogenetic tree data presented in this study are openly available in the Illinois Data Bank, Available online: https://doi.org/10.13012/B2IDB-2071 917_V1 (accessed on 18 May 2022).  Data Availability Statement: Publicly available datasets were analyzed in this study. This data can be obtained from the National Center for Biotechnology Information.