Proteins Binding to the Carbohydrate HNK-1: Common Origins?

The human natural killer (HNK-1) carbohydrate plays important roles during nervous system development, regeneration after trauma and synaptic plasticity. Four proteins have been identified as receptors for HNK-1: the laminin adhesion molecule, high-mobility group box 1 and 2 (also called amphoterin) and cadherin 2 (also called N-cadherin). Because of HNK-1's importance, we asked whether additional receptors for HNK-1 exist and whether the four identified proteins share any similarity in their primary structures. A set of 40,000 sequences homologous to the known HNK-1 receptors was selected and used for large-scale sequence alignments and motif searches. Although there are conserved regions and highly conserved sites within each of these protein families, there was no sequence similarity or conserved sequence motifs found to be shared by all families. Since HNK-1 receptors have not been compared regarding binding constants and since it is not known whether the sulfated or non-sulfated part of HNK-1 represents the structurally crucial ligand, the receptors are more heterogeneous in primary structure than anticipated, possibly involving different receptor or ligand regions. We thus conclude that the primary protein structure may not be the sole determinant for a bona fide HNK-1 receptor, rendering receptor structure more complex than originally assumed.

Despite the importance of the HNK-1 glycan in the nervous system, only a few proteins have been proposed as receptors. These are the laminins [15,23], high-mobility group box (HMGB) proteins 1 and 2 [24,25] (also called amphoterins [26]) and cadherin-2 (also known as neural cadherin) [12]. The different groups of proteins have been associated with different roles of HNK-1. Binding to laminins is involved in the adhesion of neurons and glial cells to the extracellular matrix as well as neural cell migration and outgrowth of processes from neurons and astrocytes [12,15,23][27][28][29]. Cadherin-2 is known for its role in cell adhesion, but more specifically, in binding to the α-amino-3-hydroxy-5-methylisoxazole propionate (AMPA)-type glutamate receptor subunit 2 (GluR2). Binding of cadherin-2 to GluR2 was shown to depend on the presence of HNK-1 on GluR2. GluR2 on the cell surface is stabilized by HNK-1; thus, one can regard the HNK-1 glycan as a modulator of AMPA receptor trafficking and synaptic plasticity [12]. The high-mobility group box proteins are non-histone chromosomal proteins that bind HNK-1 on sulfated glycolipids and glycoproteins [30,31] and promote neurite outgrowth in an HNK-1-dependent manner [30].
Little is known about the mechanisms by which HNK-1 binds its receptors. Crystal structures are available for some regions of the HNK-1 receptors, but there are no structures of the receptors with the bound glycan and it is not even known whether the glycan binds in its sulfated or non-sulfated state. Hall et al. identified a 21-mer peptide in laminin, which inhibited binding of the full protein to HNK-1. This is consistent with competition between this peptide and the corresponding region of the G2 α-laminin domain for binding to HNK-1 [23,28].
Given this dearth of structural information about the binding domain/s, it was deemed interesting to see what can be found at the level of sequence analysis. Is there any evidence for a common ancestor protein or domain within the candidate HNK-1 receptors? A shared sequence motif could hint at a shared evolutionary history. This reasoning is based on the rationale that HNK-1 carrier proteins, receptors and enzymes generating related glycans are ancient proteins conserved in evolution. This is most clearly shown by some of the associated biochemical pathways. For example, the sulfotransferase responsible for catalysing the sulfation of glucuronic acid of HNK-1 has homologues in mammals, amphibians and bony fish [32][33][34]. Thus, the glycan, its synthesis pathway, or at least close evolutionary relatives have a long, conserved history.
If there are sequence signals (conservation of specific amino acids through evolution) connecting the putative HNK-1 receptor families, they are likely to be weak, so one needs an appropriate strategy to find them. Firstly, large families of sequences were gathered with a simple iterative psi-blast database search [35]. This approach runs the risk of "profile contamination" in which unrelated sequences become part of the set for the position-specific scoring matrix [36,37]. Rather than saving computer time, we allowed more than a dozen iterations, but only accepted homologues within extremely conservative expectation values. Given sets of related homologues, full-length protein sequences were downloaded and realigned using a progressive aligner.
Collecting and accurately realigning homologues is practical for some thousands of homologues, but not for the very large numbers encountered when combining sets of homologues together. Faced with these high numbers, the strategy was to reduce each subset while retaining the most even spread of representatives across sequence space. This means calculating a fast alignment for a family of proteins and saving the matrix of dissimilarities between pairs of sequences. The entries of this matrix were then sorted, and the list was visited, starting from the most similar pairs. From each pair of sequences, one member was removed. The net result is that one removes the redundancy from nearly identical sequences and for some number of representatives obtains the most informative set of representatives. This process is referred to as the reduction of a set of sequences in the methods, but the reduced sets still contained thousands of members as detailed under Methods.
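The reduction step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function name `reduce_set`, the nested-list dissimilarity matrix and the rule of always dropping the second member of a pair are hypothetical choices, since the text only states that one member of each pair was removed.

```python
from itertools import combinations


def reduce_set(names, dissim, target):
    """Greedy redundancy reduction.

    names  : list of sequence identifiers
    dissim : square matrix (list of lists) of pairwise dissimilarities
    target : number of representatives to keep
    """
    # Sort all pairs so that the most similar (least dissimilar) come first.
    pairs = sorted(
        combinations(range(len(names)), 2),
        key=lambda ij: dissim[ij[0]][ij[1]],
    )
    alive = set(range(len(names)))
    for i, j in pairs:
        if len(alive) <= target:
            break
        # Only act on pairs whose members are both still present.
        if i in alive and j in alive:
            alive.discard(j)  # assumption: drop the second member of the pair
    return [names[k] for k in sorted(alive)]


if __name__ == "__main__":
    names = ["a", "b", "c", "d"]
    dissim = [
        [0.00, 0.01, 0.90, 0.90],
        [0.01, 0.00, 0.90, 0.90],
        [0.90, 0.90, 0.00, 0.02],
        [0.90, 0.90, 0.02, 0.00],
    ]
    print(reduce_set(names, dissim, target=2))
```

Sorting all pairs costs memory quadratic in the number of sequences, which is one reason a sketch like this would only be applied to one family at a time.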
For motif searches, we did not want to be limited to recognized motifs reported in the literature. Instead, we ran a full expectation-maximization search within each family and then between each family of candidate receptors [38]. Given the scale and sensitivity of the methods, one would expect that even remote similarities would be found.

Alignment and Conservation of HNK-1 Receptors
The aim was not to study the individual putative HNK-1 receptors, but to find features common to the different receptor families. The first important result came from an attempt to align all three families (15,000 sequences, 5000 per family). One cannot show the whole alignment, since it is essentially a mass of unaligned, gapped sequences. A neighbor-joining tree was calculated and showed three unrelated groups of proteins. There is no similarity between the families, with the exception of HMGB-1 and HMGB-2. These can be aligned and from here are treated as just one group. Although not shown, the alignments were tried with local and global alignments, as well as various subsets. This changes how we regard the receptors. There are three distinct families. Although one can align the members of each family, there is no plausible alignment spanning the postulated HNK-1 receptor groups.
Conserved sites and regions can be seen within each group and compared to the literature. The overall conservation of laminin-α is shown in Figure 1. In the upper diagram, columns were removed if no residue was present in laminin-α. One can see that the closest several hundred homologues are readily alignable, but the more remote homologues are missing entire domains and there is a first domain that is always present. We then focused on residues 2454-2474 within which an HNK-1 binding site has been reported [23]. Looking at the whole sequence set, there are large gaps (white). This suggests that if this is an HNK-1 binding site, it only serves this role in mammals or it only involves the sites which are generally present in the molecule. One can further concentrate on the 700 closest sequences. They are mostly from mammals (50%), birds (22%) and ray-finned fish (20%) with the remainder from reptilians, amphibians and others. To return to the original search for similarities amongst the different sequence families, one should note that there is no similarity or meaningful alignment with residues outside of the laminin group. The lack of overall similarity between the groups of proteins does not preclude the existence of smaller shared sequence motifs.

Motifs Common to All Families
Given the cost of full motif searches, we began by reducing the sequence set to 3 × 400 = 1200 from the original 3 × 5000. A truly universal motif would appear in the full set of 1200 sequences, but a motif could be shared by two of the families, so we decided to run motif searches over the (3 × 2)/2 = 3 pairs of families. Motif searches within each family were calculated for completeness and comparison with the literature.
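The bookkeeping of these searches can be written out explicitly. The short sketch below only enumerates the sequence sets (all families, each pair, each single family) and their sizes; the family labels and the function name `motif_search_plan` are illustrative, and no motif search is actually run.

```python
from itertools import combinations


def motif_search_plan(families, per_family):
    """Return (families, number of sequences) for every planned motif search."""
    plan = [(tuple(families), per_family * len(families))]  # all families together
    plan += [(pair, per_family * 2) for pair in combinations(families, 2)]
    plan += [((family,), per_family) for family in families]  # within-family runs
    return plan


if __name__ == "__main__":
    for members, size in motif_search_plan(["laminin", "HMGB", "cadherin-2"], 400):
        print(" + ".join(members), size)
```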
For the full set of 1200 sequences, no statistically significant motifs were found over the complete set. For each pair of families (800 sequences), no motifs were found spanning both families of the pair. There are, however, motifs that are unique to each family and that are also found in the literature as in the case of the cadherin CA domains [39], the high-mobility group (HMG) box domains [40] or the laminin-1 G domains [41].
Since a 21-mer segment in laminin-α (residues 2454-2474) binding HNK-1 was described [15,23], it was studied in more detail. Within the laminins, there is a highly significant (e-value = 6.4 × 10^−323) 21 amino acid motif (residues 2457-2477), which overlaps the reported segment but extends three residues beyond it. This exact motif occurs five times within the laminins and corresponds to the C-terminus of the laminin G domains, extending for the five repetitions by four residues. The HNK-1 interacting sequence in the second laminin G domain [23] is not shared with the other putative HNK-1 receptors.
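The relation between the reported 21-mer and the motif is simple interval arithmetic on the residue numbers given above. The helper name `overlap_length` is illustrative; residue ranges are treated as inclusive.

```python
def overlap_length(a, b):
    """Number of residues shared by two inclusive residue ranges."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return max(0, end - start + 1)


reported = (2454, 2474)  # HNK-1 binding 21-mer from the literature
motif = (2457, 2477)     # 21-residue motif found in the laminins

shared = overlap_length(reported, motif)
outside = (motif[1] - motif[0] + 1) - shared

if __name__ == "__main__":
    print(shared, outside)
```

With these numbers, 18 of the 21 motif residues lie within the reported segment and three lie outside it.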
We found three motifs with similar e-values < 10^−40 in the different HNK-1 receptor families corresponding to cadherin-2 motif 11, laminin-1 motif 27 and HMGB-1 motif 2 (Figure 2). The three families share a common glycine in position 7 of the alignment depicted in the sequence logos (Figure 2) followed by non-polar residues at positions 9, 10, 13 and 16. Yet, this weak similarity was not found to be biologically meaningful.

Discussion
A motif shared by the different putative HNK-1 binding proteins would be consistent with the belief that several different protein families directly bind HNK-1. This would be a signal that one could look for in other proteins, which might have roles in neural functions. There are certain sites conserved within each family. There are even motifs repeated within some families. We, however, found absolutely no evidence for a motif common to the different families. The sequence searches, alignments and motif searches used here were rather large and should have been able to detect any plausible statistical signal.
Based on the finding on a specific HNK-1 binding site within a 21-mer stretch of mouse laminin-1 [ ). This sequence contains four basic amino acids, which might be involved in binding sulfated HNK-1, since basic residues are involved in binding of matriglycan, a polysaccharide on dystroglycan, to the fourth laminin G domain [42,43]. Continuing in this vein, one should not ignore the arguments made for the importance of interactions of aliphatic protons with π-electron systems, specifically in protein-oligo/polysaccharide systems [44][45][46]. In a protein as long as laminin, there are many 21-mer fragments with several basic residues or even one isolated individual tyrosine. We therefore proposed that it may be more useful to make a simple and testable prediction. There is a repeated motif, which overlaps the 21-mer sequence region and occurs five times (in each of the laminin G domains), but only the second laminin G domain was found to bind HNK-1. There are two possible explanations. Either the quaternary domain arrangement restricts accessibility in four copies, or the binding is sensitive to small changes in sequence. Both explanations are plausible, considering observations of different binding partners for different laminin G domains [47,48]. A previous study modelling the second laminin G domain reported no deep binding pocket that is characteristic of HNK-1 binding [49], which is in agreement with our results.
Looking for hidden similarities, we found top 10 motifs from each of the three possible receptor families when compared with a sliding window (Figure 2), but even in this case, no similarity could be seen within the self-imposed limitation to common motifs in primary structure.
Flexibility of oligomeric/polymeric glycans with their structural heterogeneity is problematic for the search of receptors. It is highly possible that different receptors bind to different conformations or different chemical states. From a biochemical point of view, it is not known whether HNK-1 acts in the sulfated or non-sulfated form. At a physiological pH, these forms are subject to considerable structural differences, particularly since the effects of sulfation are difficult to predict from antibody studies: some antibodies bind to one form of the glycan and some to both forms [9,50,51]. Given these considerations, one has to assume that one glycan may be bound by different environments in different proteins. Taking mannose/protein binding as an example, different constraints have been proposed for human immunodeficiency virus (HIV) [52,53], the mannose-binding lectin CD4 as well as dendritic cell-specific intercellular adhesion molecule 3-grabbing non-integrin (DC-SIGN) [54,55].
Although evidence of HNK-1 binding capacities has been based on antibody studies, there is no doubt that HNK-1 binds directly, for instance, to laminin, since the binding partners were purified molecules. However, some proteins considered to be HNK-1 binders may bind via a third molecule, which is directly bound to HNK-1. A precedent for this argument could be that glycan binding associated with noroviruses involves more than a single protein and a single well-defined glycan [56]. From our results, we have to conclude that each receptor protein family has its own way of binding to the HNK-1 glycan and, although the three broad families studied have strongly conserved regions, no striking sequence features were common to all families.
Since co-crystallography of proteins with glycans is a very difficult undertaking because of the high flexibility of glycans, one may invest in NMR studies, as exemplified by studies on the interaction of a lipopolysaccharide from Klebsiella pneumoniae with lysozyme [57]. Unfortunately, a single laminin domain is five times larger than lysozyme. Thus, chemical shift titration and saturation-transfer techniques can only yield overall binding in the same way as surface plasmon resonance or scanning microcalorimetry [58,59]. For the moment, the details of HNK-1 binding remain an area ripe for speculation. We expect that a realistic goal would be to document the different binding behavior of the sulfated and non-sulfated forms of the HNK-1 glycan using the identified HNK-1 binding domain of laminin as a first step to gain deeper insights into the remarkable functional versatility of HNK-1.

Materials and Methods
Sequences for each of the putative HNK-1 binding proteins are given in Table 1. Homologues were collected using psi-blast [35] with up to 30 iterations and accepting homologues with an expectation value of less than 10^−21, which was near the most conservative value possible, while allowing the calculations to finish in practical time. MAFFT [60] was used for all multiple sequence alignments. Initial sequence alignments for laminins and cadherin were run in default (fast) mode with a maximum of five iterations. For alignments for the HMGB family (HMGB1 and HMGB2) with fewer homologues, MAFFT was run in its most accurate mode with affine gap penalties.
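For orientation, the search and alignment settings above could translate into command lines of roughly the following form. This is a hedged sketch, not the invocation used in the study: the file and database names are placeholders, applying the threshold through both `-inclusion_ethresh` and `-evalue` is an assumption, and mapping "fast mode with five iterations" to `--maxiterate 5` and "most accurate mode" to MAFFT's L-INS-i settings (`--localpair --maxiterate 1000`) is likewise an assumption. The functions only build argument lists and do not run anything.

```python
def psiblast_command(query, db, out, iterations=30, ethresh=1e-21):
    """Argument list for an iterative psi-blast search (BLAST+ `psiblast`)."""
    return [
        "psiblast",
        "-query", query,
        "-db", db,
        "-num_iterations", str(iterations),
        "-inclusion_ethresh", str(ethresh),  # assumption: threshold for profile inclusion
        "-evalue", str(ethresh),             # assumption: same threshold for reported hits
        "-out", out,
    ]


def mafft_command(fasta, accurate=False):
    """Argument list for MAFFT; the alignment is written to standard output."""
    if accurate:
        # Assumed mapping of "most accurate mode" to L-INS-i.
        return ["mafft", "--localpair", "--maxiterate", "1000", fasta]
    # Assumed mapping of "default (fast) mode with a maximum of five iterations".
    return ["mafft", "--maxiterate", "5", fasta]


if __name__ == "__main__":
    # Placeholder file and database names.
    print(" ".join(psiblast_command("query.fasta", "nr", "hits.txt")))
    print(" ".join(mafft_command("family.fasta")))
    print(" ".join(mafft_command("hmgb.fasta", accurate=True)))
```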
After initial data collection and alignment, each of the HNK-1 receptor families was reduced to 5000 representative sequences [61]. These were combined to form a full set of 15,000 sequences covering all protein families and a full alignment was calculated, again in fast mode.
From each protein family, several thousand (Table 1) homologues were used for conservation calculations after a second cut-off was applied. Sequences were re-aligned, and variability was calculated using the Shannon entropy $S_i$ at each site $i$,

$$S_i = -\sum_{\alpha=1}^{20} p_{\alpha} \log p_{\alpha}$$

where $p_{\alpha}$ is the probability of amino acid type $\alpha$ in column $i$ of the alignment and the summation runs over the 20 amino acid types [62,63]. Literature domains were taken from SMART (Simple Modular Architecture Research Tool) [64,65]. MEME was used to search for sequence motifs using expectation maximization and to estimate probabilities of chance occurrence [38,66]. Because of the cost of the calculations, each aligned family of sequences was reduced to 400 representatives. The minimum and maximum lengths for motifs were set to 6 and 30 residues. The pairwise alignment function from the Biostrings R package [67] was used to display the relation between two sequences.
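The per-site entropy can be illustrated with a few lines of code. This is a sketch under assumptions the text does not settle: natural logarithms are used, probabilities are estimated as observed frequencies, and gaps or non-standard characters are ignored. The function names are illustrative.

```python
import math
from collections import Counter

AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard amino acid types


def column_entropy(column):
    """Shannon entropy of one alignment column (natural log, gaps ignored)."""
    residues = [c for c in column.upper() if c in AMINO_ACIDS]
    if not residues:
        return 0.0  # assumption: an all-gap column is reported as zero
    n = len(residues)
    counts = Counter(residues)
    # Adding 0.0 turns a possible -0.0 into 0.0 for fully conserved columns.
    return -sum((k / n) * math.log(k / n) for k in counts.values()) + 0.0


def alignment_entropy(rows):
    """Entropy for every column of an alignment given as equal-length strings."""
    return [column_entropy("".join(col)) for col in zip(*rows)]


if __name__ == "__main__":
    toy_alignment = ["AG-", "AG-", "AC-", "AC-"]
    print(alignment_entropy(toy_alignment))
```

A fully conserved column gives zero and a column split evenly between two residue types gives log 2, so low values mark conserved sites.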
Numbers are given as e.g., 10,000 as in the ACS Chicago layout regulations despite the guides for nomenclature of numbers in scientific documents [68,69].
The steps are summarized in Figure 3.
Data Availability Statement: All data were taken from public databanks.

Conflicts of interest:
The authors declare that they have no conflict of interest with the contents of this article.

Abbreviations
AMPA    α-amino-3-hydroxy-5-methyl-4-isoxazolepropionic acid
GluR2   glutamate receptor 2
HMGB    high-mobility group box
HNK     human natural killer

Figure 3. Flowchart of the steps involved in the study of the HNK-1 receptors. The workflow comprises two approaches using the results of the alignment of 20,000 proteins of each family (left branch). The 20,000 sequences were reduced to 5000 representative sequences and merged to form a group of 15,000 sequences to be aligned. In the second approach (right branch), after a second cut-off was applied (see also Table 1), families were realigned and then used to find conserved sites.
