Identification of Caprine KRTAP28-1 and Its Effect on Cashmere Fiber Diameter

The keratin-associated proteins (KAPs) are constituents of cashmere fibers, and variation in many KAP genes (KRTAPs) has been found to be associated with fiber traits. The gene encoding the high-sulphur KAP28-1 has been described in sheep, but it has not been identified in the goat genome. In this study, a search for sequences similar to ovine KRTAP28-1 identified a 255-bp open reading frame on goat chromosome 1 that would, if transcribed and translated, encode a high-sulphur KAP. Polymerase chain reaction amplicons of this region from 385 Longdong cashmere goats in China were analyzed by single strand conformation polymorphism analysis, and five unique banding patterns were detected. These represented five DNA sequences (named variants A to E), which had the highest resemblance to KRTAP28-1 sequences from sheep, suggesting that A–E are variants of caprine KRTAP28-1. DNA sequencing revealed a 2- or 4-bp deletion and eleven nucleotide sequence differences, including four non-synonymous substitutions. Of the four common variants (A, B, C and D) found in these goats, the presence of variant A was associated with decreased mean fiber diameter, and this effect appeared to be additive. These results indicate that caprine KRTAP28-1 variation might have value as a molecular marker for reducing cashmere mean fiber diameter.


Introduction
Cashmere fiber is produced by the secondary fiber follicles of cashmere goats. As a consequence of its physical properties, the price of cashmere is generally much higher than that of wool. China is the largest producer of cashmere fiber, accounting for a third to a half of global production, and this fiber is widely exported. Gansu Province is the main cashmere goat breeding region in China, and it is home to well-known cashmere goat breeds including the Hexi cashmere goat, the Longdong cashmere goat and the Zhongwei goat [1].
The Longdong cashmere goat is a breed that has been created as a cross between the Liaoning cashmere goat, the Inner Mongolian cashmere goat and the Ziwuling black goat. It is used for both cashmere and meat production and is well adapted to harsh environments including desert and other arid regions. The number of Longdong cashmere goats in China is approximately 1.6 million [1].

Goats Investigated and Cashmere Data Collection
The animal work was approved by Gansu Agricultural University, and all experiments for these goats were conducted according to the guidelines for the care and use of experimental animals established by the Ministry of Science and Technology of the People's Republic of China (Approval number 2006-398).
A total of 385 Longdong cashmere goats were studied. These were farmed by the Yusheng Cashmere Goat Breeding Company, located in Huan County in Gansu Province, China. These goats were the progeny of eleven unrelated sires. At one year of age (the first combing to collect fiber), the weight of cashmere fiber obtained by combing was measured. Fiber samples were collected from the mid-side region of each goat to measure both mean fiber diameter and crimped fiber length using an Optical-based Fiber Length and Diameter Analyzer OFDA4000 (EPCO, Shanghai, CHN), an instrument for which the standard deviation of measurement is less than 0.07 µm. Blood samples from these goats were collected onto Munktell TFN paper (Munktell Filter AB, Falun, Sweden) and the DNA from the blood was purified for PCR-SSCP analysis by the method of Zhou et al. [16].

Search for the Caprine KAP28-1 Gene
Using the ovine KRTAP28-1 sequence [6], a BLASTN search in GenBank of the Caprine Genome Assembly GCF_001704415.1 was undertaken. Of all the sequences found by the BLASTN search, the sequence with the greatest similarity to the ovine KRTAP28-1 sequence was assumed to be caprine KRTAP28-1.

Polymerase Chain Reaction-Single Strand Conformation Polymorphism (PCR-SSCP) Analysis of Caprine KRTAP28-1
Using the goat sequence identified above, two primers (5′-TAGACAAGCCATTCTCTGTTG-3′ and 5′-CATTCCAGTATTCCTGCCTG-3′) were designed to amplify a 525-bp fragment containing the whole coding region of the putative caprine KRTAP28-1. These primers were synthesized by the Takara Biotechnology Company Limited (Dalian, China). Amplifications were carried out in a 20-µL reaction consisting of 0.25 µM of each primer, the genomic DNA on one 1.2-mm punch of TFN paper, 0.5 U of Taq DNA polymerase (Takara, Dalian, China), 150 µM of each dNTP (Takara), 2.5 mM Mg²⁺, 2.0 µL of 10× PCR buffer (supplied with the DNA polymerase enzyme) and deionized water to make up the volume to 20 µL. The thermal profile consisted of an initial denaturation for 2 minutes at 94 °C, followed by 35 cycles of 94 °C for 30 seconds, 58 °C for 30 seconds and 72 °C for 30 seconds, with a final extension of 5 minutes at 72 °C. The PCR amplifications were carried out in Bio-Rad S1000 thermal cyclers (Bio-Rad, Hercules, CA, USA).
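As a quick arithmetic check on the reaction set-up and thermal profile described above, the following Python sketch tallies the programmed hold time and the amount of each primer per reaction. The variable and function names are illustrative only, and ramp times between temperatures (which depend on the thermal cycler) are not included.

```python
# Thermal profile from the text, as (step name, temperature in deg C, hold in seconds).
INITIAL = ("initial denaturation", 94, 120)
CYCLE = [
    ("denaturation", 94, 30),
    ("annealing", 58, 30),
    ("extension", 72, 30),
]
N_CYCLES = 35
FINAL = ("final extension", 72, 300)


def total_hold_seconds(initial, cycle, n_cycles, final):
    """Sum the programmed hold times; ramping between temperatures is ignored."""
    return initial[2] + n_cycles * sum(step[2] for step in cycle) + final[2]


def amount_pmol(concentration_um, volume_ul):
    """Micromolar concentration times microlitre volume gives picomoles."""
    return concentration_um * volume_ul


total_s = total_hold_seconds(INITIAL, CYCLE, N_CYCLES, FINAL)
primer_pmol = amount_pmol(0.25, 20)  # each primer in the 20-uL reaction

print(f"Programmed hold time: {total_s} s ({total_s / 60:.1f} min)")
print(f"Each primer per reaction: {primer_pmol} pmol")
```

Under these assumptions the programmed hold time is just under one hour, and each reaction contains 5 pmol of each primer.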

Sequencing of KRTAP28-1 Variants and Sequence Analyses
The DNA sequencing approaches employed were different for goats that were homozygous versus heterozygous for particular PCR-SSCP patterns.
Amplicons that were identified as homozygous by PCR-SSCP analysis were directly sequenced in both directions using a Sanger sequencing approach at the Beijing Genomics Institute, Beijing, China. However, for those variants that were only present in goats that were identified as being heterozygous, samples were sequenced using the approach described by Gong et al. [18]. DNAMAN version 5.2.10 (Lynnon BioSoft, Vaudreuil, Canada) was used to align DNA and amino acid sequences, and to translate DNA. MEGA version 7.0 was used to construct a maximum parsimony phylogenetic tree based on the predicted amino acid sequences. The numbering of nucleotides and amino acids was in accordance with the HGVS nomenclature guidelines, and goat KAP gene sequences were obtained from GenBank and the Caprine Genome Assembly GCF_001704415.1.

Statistical Analyses
IBM SPSS Statistics version 24.0 (IBM, NY, USA) was used to perform the statistical analyses. For the common variants (with a frequency greater than 5%), general linear mixed-effects models (GLMMs) were used to assess the effect of the presence or absence (coded as 1 or 0 respectively) of these KRTAP28-1 variants on various cashmere traits in the 385 Longdong cashmere goats. Since sire and gender were found to affect all the cashmere fiber traits, they were fitted as a random and fixed factor, respectively. Birth rank was not found to affect any cashmere fiber traits, and accordingly it was not included in the models. Only the main effects were tested. Unless otherwise indicated, all p values were considered significant when p < 0.05.
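One way to write down a model of the kind described above is given below. This is a sketch of the general form only, not the exact specification fitted in SPSS. The symbols are assumptions introduced here for illustration: y is a cashmere trait, x is the 0/1 presence code for a variant, g is the fixed gender effect and s is the random sire effect.

```latex
y_{ijk} = \mu + \beta\, x_{ijk} + g_j + s_k + e_{ijk},
\qquad s_k \sim N(0, \sigma_s^2),
\qquad e_{ijk} \sim N(0, \sigma_e^2)
```

Here \beta is the estimated difference in the trait between goats with and without the variant, after accounting for gender and sire.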
A second set of analyses was performed with the number of variant copies present included (in place of presence/absence) to ascertain whether additive, dominant or recessive effects were present. These models were conducted in an identical manner to the GLMMs used for testing the presence/absence of each variant.
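The two codings used in these analyses (presence/absence and number of copies) can be illustrated with a short Python sketch. The genotype strings are hypothetical examples written as two variant letters, and the function names are not taken from the study.

```python
def presence(genotype: str, variant: str) -> int:
    """Code a variant as 1 if present in the genotype, otherwise 0."""
    return 1 if variant in genotype else 0


def copies(genotype: str, variant: str) -> int:
    """Number of copies (0, 1 or 2) of a variant in a two-letter genotype."""
    return genotype.count(variant)


# Hypothetical genotypes for three goats (not data from the study).
genotypes = ["AA", "AB", "CD"]

# (genotype, presence of A, copies of A) for each goat.
coded = [(g, presence(g, "A"), copies(g, "A")) for g in genotypes]

for genotype, present, n_copies in coded:
    print(genotype, present, n_copies)
```

The presence/absence coding cannot distinguish AA from AB, which is why a separate copy-number analysis is needed to separate additive from dominant or recessive effects.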

Figure 1.
Location of KRTAPs on caprine chromosome 1. The identified KRTAP28-1 sequence is shown in the box, together with eleven previously identified KRTAPs. The vertical bars represent the KAP genes and the names of these genes are identified below the bars (e.g., 28-1 represents KRTAP28-1).
The arrows indicate the direction of transcription. The spacing of these genes is only approximate and is based on the Caprine Genome Assembly, as are the nucleotide coordinates.
Five different banding patterns (A, B, C, D and E) were detected in the Longdong cashmere goats by PCR-SSCP analysis (Figure 2). Either one, or a combination of two different patterns, was observed for each goat, which is in accordance with them being either homozygous or heterozygous. DNA sequencing of the amplicons producing these patterns confirmed the occurrence of five unique nucleotide sequences. While all five sequences were different, they had over 97% similarity to the DNA sequence in the caprine genome assembly GCF_001704415.1. Phylogenetic analysis of the predicted amino acid sequences of the five caprine sequences, together with all of the high sulphur KAP genes identified in sheep, humans and goats to date (including the KRTAP28-1 sequence from sheep), revealed that these caprine sequences were different from all known caprine high sulphur KAP genes, but were most closely related to the ovine KRTAP28-1 sequence (Figure 3). This suggests that the five sequences identified in the study represent variants of the caprine orthologue of KRTAP28-1.

Figure 3.
The tree was constructed using the amino acid sequences (or predicted amino acid sequences). The numbers at the forks indicate the bootstrap confidence values, and only those equal to, or higher than, 70% are shown. The caprine, sheep and human KAPs are indicated with the prefixes "g", "s" and "h", respectively. The five newly identified goat KAP28-1 sequences are indicated with a red vertical line, and the GenBank/EMBL accession numbers for other HS-KAPs are:

The five caprine KRTAP28-1 sequences would all encode polypeptides of 84 amino acid residues. These included high levels of serine (19.05%) and threonine (13.10%), and moderate levels of tyrosine (7.14%), asparagine (7.14%), phenylalanine (5.95%-7.14%), glycine (4.76%-5.95%), leucine (4.76%-5.95%), cysteine (4.76%) and arginine (3.57%-4.76%) (Figure 4).
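The relationship between the open reading frame length, the polypeptide length and the reported amino acid percentages can be checked with a small Python sketch. The residue counts below are back-calculated from the percentages and the 84-residue length, so they are assumptions and not values read from the sequences. The function names are illustrative.

```python
def residues_from_orf(orf_bp: int) -> int:
    """Residues encoded by an ORF whose length includes the stop codon."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    return orf_bp // 3 - 1  # the final codon is the stop codon


def mol_percent(count: int, total: int) -> float:
    """Share of one amino acid among all residues, as a percentage."""
    return round(100 * count / total, 2)


n_residues = residues_from_orf(255)  # 255-bp ORF reported in the abstract

# Assumed residue counts, inferred from the percentages reported in the text.
counts = {"Ser": 16, "Thr": 11, "Tyr": 6, "Cys": 4}
composition = {aa: mol_percent(c, n_residues) for aa, c in counts.items()}

print(n_residues)
print(composition)
```

With these assumed counts, a 255-bp frame gives 84 residues, and the percentages agree with those quoted for serine, threonine, tyrosine and cysteine.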

Effect of Variation in KRTAP28-1 on Cashmere Traits
Of the five variants found in the cashmere goats, variant E was present at a frequency of less than 5%. It was therefore excluded from the association analyses given the potential for bias. Associations were accordingly only investigated for the four common variants (A, B, C and D).
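The 5% frequency filter described above can be sketched in Python as follows. The genotype list is a hypothetical example constructed for illustration and does not reproduce the frequencies observed in the 385 goats. The function names are likewise illustrative.

```python
from collections import Counter


def variant_frequencies(genotypes):
    """Variant (allele) frequencies from two-letter genotype strings."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {variant: n / total for variant, n in counts.items()}


def common_variants(freqs, threshold=0.05):
    """Variants with a frequency strictly greater than the threshold."""
    return sorted(v for v, f in freqs.items() if f > threshold)


# Hypothetical genotypes for 25 goats (50 variant copies in total).
genotypes = (
    ["AA"] * 10 + ["AB"] * 5 + ["AC"] * 3 + ["BD"] * 3 + ["CD"] * 3 + ["AE"] * 1
)

freqs = variant_frequencies(genotypes)
common = common_variants(freqs)

print({v: round(f, 2) for v, f in sorted(freqs.items())})
print(common)
```

In this toy data set, variant E occurs once in 50 copies (2%) and is therefore dropped, leaving A, B, C and D for the association analyses.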
Cashmere fibers from goats with two copies of A had lower mean fiber diameter than those from goats with one copy of A, whereas cashmere fibers from goats with one copy of A had lower mean fiber diameter than those from goats that did not carry A (Table 2). This suggested an additive effect of KRTAP28-1 variation on mean fiber diameter.

Discussion
Together with homology searching, PCR-SSCP has proved to be a useful method for finding and characterizing caprine KAP genes, including KRTAP15-1 [19], KRTAP20-1 [15], KRTAP20-2 [14] and KRTAP24-1 [11]. In this study, the identification of a new high sulphur KAP gene (called KRTAP28-1) has been described. The gene was clustered with eleven previously identified KAP genes on goat chromosome 1 and displayed the highest similarity with KRTAP28-1 from sheep when compared to other high sulphur KAP sequences that have been identified in humans, sheep and goats. The identification of KRTAP28-1 brings the total number of caprine KRTAPs described in the published literature from 14 to 15.
While the putative polypeptide encoded by the notional caprine KRTAP28-1 could be classified into the high sulphur KAP group, the content of cysteine (4.76 mol%) in the putative protein was lower than that of previously identified caprine high sulphur KAPs. In contrast, the caprine KAP28-1 would have a higher content of serine (19.05%), threonine (13.10%) and tyrosine (7.14%). Such variation in amino acid composition has been described previously for caprine KAP15-1 [19] and KAP24-1 [11], although its biological implications and function are unknown. While it has been reported that cysteine can form disulfide bond cross-links with the IFs [20], tyrosine residues in HGT-KAPs are thought to regulate the arrangement of IFs by cation-π interactions [21]. Furthermore, serine, threonine and tyrosine have been suggested to be phosphorylated in caprine KAP20-2 [14] and KAP20-1 [15], and phosphorylation has been reported to affect keratin assembly and organization [22]. Accordingly, the functional significance of the higher content of serine, threonine and tyrosine in caprine KAP28-1 deserves further study.
Despite the predicted amino acid sequences of caprine KRTAP28-1 having the highest similarity to ovine KRTAP28-1, there are some differences in the sequences between the two species. First, when compared to the goat sequences, there was a 1-bp deletion (c.249_251delA) in the sheep sequence. The deletion would cause the loss of a stop codon, and this putatively leads to a 47 amino acid increase in the length of the sheep protein compared to goat KAP28-1. Second, there is variability in the number of TG dinucleotide repeats in the putative KAP28-1 sequences. The sheep sequence includes eleven repeats of TG, while the goat KAP28-1 sequences have nine to eleven repeats. The third difference is the amino acid composition of KAP28-1. For example, the putative caprine proteins contain 4.76 mol% cysteine, which is lower than that reported for sheep (8.40 mol%). In contrast, the goat sequences have a higher content of serine (19.05 mol%) and tyrosine (7.14 mol%) than sheep (18.61 mol% and 5.46 mol%, respectively). It was further found that the higher cysteine content and the lower serine and tyrosine content in sheep, compared to the goat sequences, arise mainly from the additional length of the ovine KAP28-1 sequence, as the average levels of cysteine, serine and tyrosine in this additional ovine sequence were 15.55%, 6.68% and 2.00%, respectively.
Sequence variation in KRTAP28-1 has been studied in sheep [6]. Despite the observation that the KAP28-1 gene is polymorphic in both sheep and goats, the nature of the sequence variation detected in the two species appears to be different. With sheep, the vast majority of variation in sequence is located in the coding regions except for c.-30T/G [6]. In contrast, nucleotide sequence variations in the non-coding regions were found to predominate in goats. This suggests that the polymorphism observed in these two species may be derived by different mechanisms. It is also notable that for both species, all of the coding region sequence variation was non-synonymous.
Of the eleven sequence variations detected in the study, c.17G/A, c.129T/A, c.166C/T and c.190A/G were non-synonymous, and would lead to amino acid changes in the putative KAP28-1 protein.
The substitution c.129T/A would result in the gain or loss of phenylalanine. Because it possesses an aromatic side chain, phenylalanine has the ability to form stacking interactions with other residues that have aromatic side chains [23]. It is therefore possible that the gain of phenylalanine may contribute to interactions between KAP28-1 and the IFs, and thus affect fiber traits. The substitution c.166C/T could result in a change in the mol% content of serine, and the substitution c.190A/G could result in the gain or loss of serine and glycine. The absence or presence of serine and glycine may affect KAP structure, as these amino acids can affect helix formation, and thus potentially affect KAP interaction with the IFs. Although nucleotide sequence changes in the non-coding region would not result in amino acid changes, their potential function should not be ignored because they may be linked to other variation in critically important regions of caprine KRTAP28-1, and potentially affect transcription factor binding sites, transcription efficiency and mRNA stability, thereby resulting in altered function [24]. It should be noted that the coding sequence of variant B was identical to that of D, and thus, if expressed, the two variants would arguably produce an identical amino acid sequence.
Of the three cashmere fiber traits investigated in the study, variation in KRTAP28-1 was associated only with mean fiber diameter, and not with cashmere weight or crimped fiber length. This result was in accordance with what was observed in sheep, where KRTAP28-1 genotype was found to have an effect on mean fiber diameter in Southdown × Merino-cross lambs [6]. The observation that variant A was the most common variant in the 385 Longdong cashmere goats appears to be consistent with the results of the association analyses. Given that the presence of A was associated with a 'favorable' cashmere fiber trait (i.e., decreased mean fiber diameter), it might be concluded that purposeful selection for decreased mean fiber diameter in the Longdong cashmere goats has resulted in variant A becoming more common than variants B, C, D and E.
It is known that ovine KRTAPs are clustered together in specific chromosomal regions. The KRTAP28-1 sequence identified in the study was clustered with eleven previously identified KRTAPs on goat chromosome 1. When compared to the associations between variation in these KRTAPs and variation in cashmere fiber traits, the effect of variation in KRTAP28-1 on cashmere traits is similar to that described for KRTAP24-1 [11], but is different to that reported for KRTAP20-1 [15], KRTAP20-2 [14] and KRTAP13-1 [13]. Given that KRTAP28-1 is positioned closer to KRTAP24-1 than to KRTAP20-1, KRTAP20-2 and KRTAP13-1 on goat chromosome 1, it is possible that the effect described here for KRTAP28-1 may be due to its tight linkage to KRTAP24-1, instead of it having an independent effect.
While identifying genes like KRTAP28-1 is in itself reasonably straight-forward, it has other important implications. For example, high-throughput RNA sequencing (RNA-seq) technology has been widely used in the analysis of the genetic basis of complex and economically important traits in animals. However, the approach requires genes to have been identified and characterized, such that the RNA sequencing reads can be mapped to the reference genome of specific organisms. If there is incomplete annotation of the reference genome (i.e., the sequences have not been identified), then the approach is of limited value. In this context, while it is widely accepted that the KAPs are structural components of cashmere fibers and thus play a role in determining fiber properties, few of the caprine KAP genes have been identified, compared with humans and sheep. Accordingly, the use of RNA-seq technology is unlikely to be of much use in describing follicle activity and the production of cashmere fiber until all the KAPs and keratins have been identified.