Human Lectins, Their Carbohydrate Affinities and Where to Find Them

Lectins are a class of proteins responsible for several biological roles such as cell-cell interactions, signaling pathways, and several innate immune responses against pathogens. Since lectins are able to bind to carbohydrates, they are viable targets for targeted drug delivery systems. In fact, several lectins were approved by the Food and Drug Administration for that purpose. This review gathers information about specific carbohydrate recognition by lectin receptors, together with the specific organs where those lectins can be found within the human body.


Introduction
Lectins are an attractive class of proteins of non-immune origin that can either be free or linked to cell surfaces, and are involved in numerous biological processes, such as cell-cell interactions, signaling pathways, cell development, and immune responses [1]. Lectins selectively recognize carbohydrates and reversibly bind to them as long as the ligands are oriented in a specific manner. Some of the commonly occurring carbohydrates found in nature are D-fructose, D-galactose, L-arabinose, D-xylose, D-mannose, D-glucose, D-glucosamine, D-galactosamine, L-fucose, various uronic acids, sialic acid, and their combinations to form other di- and oligosaccharides, or other biomolecules (Figure 1) [2].

Lectins in vertebrates can be classified either by their subcellular location or by their structure. Division based on their location includes integral lectins, located in membranes as structural components, and soluble lectins, present in intra- and intercellular fluids, which can move freely.
Division according to lectin structure comprises several types of lectins, such as C-type lectins (binding is Ca²⁺-dependent), I-type lectins (the carbohydrate recognition domain is similar to immunoglobulins), the galectin family (or S-type, which are thiol-dependent), pentraxins (pentameric lectins) and P-type lectins (specific to glycoproteins containing mannose 6-phosphate) [3].
Different lectins have high similarity in the residues that bind to saccharides, most of which coordinate to metal ions and water molecules. Nearly all animal lectins possess several pockets that recognize molecules other than carbohydrates, meaning that they are multivalent and can present 2 to 12 sites of interaction, allowing the binding of several ligands simultaneously. The specificity and affinity of the lectin-carbohydrate complex depend on the lectin, which can be very sensitive to the structure of the carbohydrate (e.g., mannose versus glucose, Figure 1), to the orientation of the anomeric substituent (α versus β anomer, e.g., in Figure 2), or to both. Lectin-carbohydrate interactions are achieved mainly through hydrogen bonds, van der Waals (steric) interactions, and hydrophobic forces (an example is given in Figure 3) [3,4].
It has been shown that the majority of lectins are conserved through evolution, suggesting that these proteins play a crucial role in the sugar-recognition activities necessary for the living process and development [5,6].
Biomolecules 2021, 11, x
Although lectins are present in animals, plants, lichens, bacteria, and higher fungi [3], this review focuses only on human lectins for targeted drug delivery [7], their specificity towards carbohydrates, and the organs where they are expressed. Gene expression (or RNA expression) in an organ or cell means that the gene is transcribed there. If the transcript is translated, it produces the respective protein, and the protein is said to be expressed in that organ or cell. In this review, we focus only on protein expression, since that is the only information relevant to the development of targeted drug delivery systems. More information about carbohydrate-based nanocarriers for targeted drug delivery systems can be found elsewhere [8–10].
Since lectins are able to recognize and transport carbohydrates and their derivatives, lectin targeting can be relevant in the research and development of new medicines [7,11,12]. The metabolism of cancer cells, for example, differs from that of normal cells due to intense glycolytic activity (Warburg effect) [13]. Cancer cells require glutamine and/or glucose for cell growth, and glucose transporter isoforms 1 and 2 (gene symbols GLUT1 and GLUT2, respectively) showed increased activity in several tumors (gastrointestinal carcinoma, squamous cell carcinoma of the head and neck, breast carcinoma, renal cell carcinoma, gastric and ovarian cancer) [14,15].
The lectin nomenclature adopted herein follows the Human Genome Organisation (HUGO) Gene Nomenclature Committee. However, the most commonly used aliases (non-standard names) are also included (and appear first). The expression data for all lectin-coding genes were compiled from The Human Protein Atlas [16,17] and GeneCards [18] databases.

C-Type Lectins
C-type lectins recognize saccharides in a Ca²⁺-dependent manner but generally exhibit low affinities for carbohydrates, requiring multivalent carbohydrate ligands to mediate signaling pathways. One example is DC-SIGN2, whose gene symbol is CLEC4M (most genes carry the information to make proteins, and the gene name is often used when referring to the corresponding protein). MINCLE (gene symbol CLEC4E), on the other hand, shows high affinity and can detect small numbers of glycolipids on fungal surfaces [19,20]. Most of the lectin-like domains contain some of the conserved residues required to establish the domain fold, but do not present the residues required for carbohydrate recognition [21]. The amino acid motifs known to be involved in calcium-dependent sugar binding are the EPN motif (mannose binding), the QPD motif (galactose binding), and the WND motif (Ca²⁺ binding) [22]. More information about glycan affinity and binding to proteins can be found elsewhere [23]. A comprehensive list of C-type lectins is presented in Table 1, divided by subfamilies that differ in the architecture of the domain [22,24], along with the carbohydrates that they recognize and the human tissues where they are expressed.
… | … | Gal-β-(1-3 or 1-4)-GlcNAc-β-(1-2)-Man trisaccharides [30,31] | Adipose and soft tissue, bone marrow and lymphoid tissues, brain, endocrine tissues, female tissues, gastrointestinal tract, kidney and urinary bladder, lung, male tissues, muscle tissues, pancreas, proximal digestive tract, skin
… | COLEC12 | D-galactose, L- and D-fucose, N-acetylgalactosamine (internalizes specifically in nurse-like cells), sialyl Lewis X, or a trisaccharide and asialo-orosomucoid (ASOR). May also play a role in the clearance of amyloid-beta in Alzheimer disease [48] | Brain, lung, placenta
… | … | … | Bone marrow, brain, colon, kidney, lung, spleen
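As a rough illustration of the motif assignments given in this section for C-type lectins (EPN for mannose binding, QPD for galactose binding, WND for Ca²⁺ binding), the following Python sketch scans a protein sequence for these tripeptides. The function name `find_ctld_motifs` and the toy sequence are hypothetical, and a plain substring search is only an assumption made for illustration: the presence of a tripeptide does not by itself establish sugar specificity.

```python
# Motif-to-role mapping taken from the text; the scanning logic is an
# illustrative assumption, not a validated predictor of sugar specificity.
CTLD_MOTIFS = {
    "EPN": "mannose binding",
    "QPD": "galactose binding",
    "WND": "Ca2+ binding",
}


def find_ctld_motifs(sequence: str) -> dict[str, list[int]]:
    """Return the 1-based start positions of each motif found in a sequence."""
    seq = sequence.upper()
    hits: dict[str, list[int]] = {}
    for motif in CTLD_MOTIFS:
        positions = []
        start = seq.find(motif)
        while start != -1:
            positions.append(start + 1)  # convert to 1-based residue numbering
            start = seq.find(motif, start + 1)
        if positions:
            hits[motif] = positions
    return hits


# Hypothetical toy sequence, not a real lectin.
toy = "MKTEPNAAGWNDCCQPDLLWNDEPN"
for motif, positions in find_ctld_motifs(toy).items():
    print(motif, CTLD_MOTIFS[motif], positions)
```

In practice, motif hits would need to be checked against the position of the carbohydrate recognition domain and against experimental binding data before drawing any conclusion about ligand preference.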

Chitolectins (or Chilectins)
There are two types of proteins that are able to recognize chitin: chitinases and chitolectins. The former are active enzymes that bind and hydrolyze oligosaccharides, whereas the latter bind oligosaccharides but do not hydrolyze them [76,77]; the latter are presented in Table 2.
Table 2. Human chitolectins (also called chilectins), their carbohydrate ligands and protein expression in the organs.

F-Type Lectins
F-type lectins, also called fucolectins, are characterized by an α-L-fucose recognition domain and display unique carbohydrate- and calcium-binding sequence motifs [76]. F-type lectins are immune-recognition proteins and are presented in Table 3. Fucose is recognized by specific interactions with O5 (pyranose acetal oxygen), 3-OH and 4-OH [82], so these atoms must remain available for interaction in synthesized fucose derivatives.
… | … | … | Adipose and soft tissue, bone marrow and lymphoid tissues, brain, endocrine tissues, female tissues, gastrointestinal tract, kidney and urinary bladder, lung, male tissues, muscle tissues, pancreas, proximal digestive tract, skin
a) FDA-approved drug target.
b) Only RNA expression data available in The Human Protein Atlas [16,17] and GeneCards [18] databases.
c) Carbohydrate moieties recognized by this protein have not been discovered yet.

F-Box Lectins
F-box proteins are the substrate-recognition subunits of the SCF (Skp1-Cul1-F-box protein) complex. They have an F-box domain that binds to S-phase kinase-associated protein 1 (Skp1) [84]. The F-box proteins are divided into three classes: Fbws, which contain WD-40 domains; Fbls, which contain leucine-rich repeats; and Fbxs, which have either different protein-protein interaction modules or no recognizable motifs [85]. Although F-box proteins form a superfamily, only five are known to recognize N-linked glycoproteins [84], as presented in Table 4.
… | … | High-mannose glycoproteins [87] | Adipose and soft tissue, bone marrow and lymphoid tissues, brain, endocrine tissues, female tissues, gastrointestinal tract, kidney and urinary bladder, lung, male tissues, muscle tissues, pancreas, proximal digestive tract, skin
… | … | … | Adipose and soft tissue, bone marrow and lymphoid tissues, brain, endocrine tissues, female tissues, gastrointestinal tract, kidney and urinary bladder, lung, male tissues, muscle tissues, proximal digestive tract, skin
F-box protein 22 | … | … | Esophagus, kidney, oral mucosa, parathyroid gland, skin, stomach
a) Carbohydrate moieties recognized by this protein have not been discovered yet.
b) Only RNA expression data available in The Human Protein Atlas [16,17] and GeneCards [18] databases.

Ficolins
Ficolins play an important role in innate immunity by recognizing and binding to carbohydrates present on the surface of Gram-positive and Gram-negative bacteria [89]. There are three human ficolins and they are presented in Table 5.

I-Type Lectins
I-type lectins are a subset of the immunoglobulin superfamily that specifically recognizes sialic acids and other carbohydrate ligands. Most of the members of this group of lectins are siglecs, which are type I transmembrane proteins and can be divided into two groups: the CD33-related group, which includes CD33 (siglec3), siglecs 5–11 and siglec14, and a second group that includes siglec1, CD22 (siglec2), MAG (siglec4) and siglec15 [90,91]. CD33-related siglecs possess between 1 and 4 C-set domains and feature cytoplasmic tyrosine-based motifs involved in signaling and endocytosis. Siglec1 possesses 16 C-set domains, CD22 has 6 C-set domains and MAG has 4 C-set domains. MAG is the only siglec not found on cells of the immune system. Members of this I-type superfamily are presented in Table 6 along with their carbohydrate ligands and protein expression. An example of a drug delivery system targeting these lectins was reported by Spence, Greene and co-workers, who developed polymeric nanoparticles of poly(lactic-co-glycolic acid) decorated with sialic acid [92,93].
Table 6. Human I-type lectins, their carbohydrate ligands and protein expression in the organs.
… | … | … | Appendix, bone marrow, brain, endometrium, fallopian tube, kidney, lung, lymph node, spleen, testis, tonsil
L1 cell adhesion molecule | L1CAM | α-(2-3)-Sialic acid [113] | Adipose and soft tissue, bone marrow and lymphoid tissues, brain, female tissues, gastrointestinal tract, kidney and urinary bladder, lung, male tissues, muscle tissues, proximal digestive tract, skin
Myelin protein zero | MPZ | SO₄⁻-3GlucA-β-(1-3)-Gal-β-(1-4)-GlcNAc (HNK-1 antigen) [101] | Bronchus, esophagus, fallopian tube, small intestine, soft tissue, stomach, testis
Neural cell adhesion molecule 1 | NCAM1 | High N-linked D-mannose [114] | Brain, colon, heart muscle, pancreas, smooth muscle, soft tissue, thyroid gland
Neural cell adhesion molecule 2 | NCAM2 | c) | Brain, bronchus, colon, duodenum, gallbladder, ovary, rectum, small intestine, soft tissue, testis
a) FDA-approved drug target.
b) Only RNA expression data available in The Human Protein Atlas [16,17] and GeneCards [18] databases.
c) Carbohydrate moieties recognized by this protein have not been discovered yet.
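The siglec grouping described in this section can be written down as a small lookup, which may be convenient when annotating expression data. The sketch below is only an illustration: the group membership and C-set domain counts are those stated in the text, while the names `siglec_number` and `describe_siglec`, the alias handling, and the returned dictionary layout are assumptions made for this example.

```python
# Group membership and C-set domain counts as stated in the text; the data
# layout and function names are illustrative assumptions.
CD33_RELATED = {3, 5, 6, 7, 8, 9, 10, 11, 14}  # CD33 is siglec3
OTHER_GROUP = {1, 2, 4, 15}                     # siglec1, CD22, MAG, siglec15
C_SET_DOMAINS = {1: 16, 2: 6, 4: 4}             # count not stated for siglec15
ALIASES = {"CD22": 2, "CD33": 3, "MAG": 4}


def siglec_number(name: str) -> int:
    """Map a name such as 'CD22', 'siglec7' or 'Siglec-1' to its siglec number."""
    key = name.strip()
    if key.upper() in ALIASES:
        return ALIASES[key.upper()]
    lowered = key.lower().replace("-", "").replace(" ", "")
    if lowered.startswith("siglec") and lowered[6:].isdigit():
        return int(lowered[6:])
    raise ValueError(f"Unrecognized siglec name: {name!r}")


def describe_siglec(name: str) -> dict:
    """Return the group and the (min, max) C-set domain count, if stated."""
    number = siglec_number(name)
    if number in CD33_RELATED:
        return {"siglec": number, "group": "CD33-related", "c_set_domains": (1, 4)}
    if number in OTHER_GROUP:
        count = C_SET_DOMAINS.get(number)
        domains = (count, count) if count is not None else None
        return {"siglec": number, "group": "other", "c_set_domains": domains}
    raise ValueError(f"siglec{number} is not covered by the grouping in the text")


print(describe_siglec("CD33"))
print(describe_siglec("Siglec-1"))
```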

L-Type Lectins
L-type lectins are distinguished from other lectins on the basis of tertiary structure, not primary sequence, and are composed of antiparallel β-sheets connected by short loops and β-bends, usually lacking any α-helices [115]. Members of this family of lectins present different glycan-binding specificities, as presented in Table 7. The L-type superfamily includes pentraxins [116,117], which require Ca²⁺ ions for ligand binding. Both LMAN1 and LMAN2 also require Ca²⁺ ions for their binding activity [115].
Table 7. Human L-type lectins, their carbohydrate ligands and protein expression in the organs.

Common Name (HUGO Name if Different) | Gene Symbol | Carbohydrate Preferential Affinity | Protein Expression in the Organs
Adhesion G protein-coupled receptor D2 | ADGRD2 | a) | b)
Amyloid P component, serum | APCS | Heparin, dextran sulfate proteoglycans [123] | b)
C-reactive protein | CRP | … | Adipose and soft tissue, bone marrow and lymphoid tissues, brain, endocrine tissues, female tissues, gastrointestinal tract, kidney and urinary bladder, lung, male tissues, muscle tissues, pancreas
a) Carbohydrate moieties recognized by this protein have not been discovered yet.
b) Only RNA expression data available in The Human Protein Atlas [16,17] and GeneCards [18] databases.

M-Type Lectins
The M-type family of lectins consists of α-mannosidases, which are proteins involved in both the maturation and the degradation of Asn-linked oligosaccharides [127]. Members of this family, their binding affinities and protein expression are presented in Table 8.
Table 8. Human M-type lectins, their carbohydrate ligands and protein expression in the organs.

P-Type Lectins
P-type lectins constitute a two-member family of mannose-6-phosphate receptors (Table 9) that play an essential role in the generation of functional lysosomes. The phosphate group is key to high-affinity ligand recognition by these proteins. Furthermore, optimal ligand-binding ability of M6PR is achieved in the presence of divalent cations, particularly Mn²⁺ [130,131].
Table 9. Human P-type lectins, their carbohydrate ligands and protein expression in the organs.

R-Type Lectins
R-type lectins are protein-UDP acetylgalactosaminyltransferases that contain an R-type carbohydrate recognition domain, which is conserved between animal and bacterial lectins [135]. Members of this superfamily recognize Gal/GalNAc residues and are expressed in several tissues, as presented in Table 10.

S-Type Lectins
S-type lectins, nowadays known as galectins, are a superfamily of proteins that show a high affinity for β-galactoside sugars (Table 11). Originally named S-type lectins because of their sulfhydryl dependency, galectins are the most widely expressed class of lectins in all organisms. Human galectins have been classified into three major groups according to their structure: prototypical, chimeric and tandem-repeat [151–153].
Galectins play important roles in immune responses and in promoting inflammation. They are also known to have a crucial role in cancer, causing tumor invasion, progression, metastasis and angiogenesis [154–156].
Table 11. Human S-type lectins, their carbohydrate ligands and protein expression in the organs.

X-Type Lectins
Intelectins (Table 12) were classified as X-type lectins because they do not have a typical lectin domain; instead, they contain a fibrinogen-like domain and a unique intelectin-specific region [173].
Table 12. Human X-type lectins, their carbohydrate ligands and protein expression in the organs.

Orphans
Orphan lectins are those that do not belong to known lectin structural families [175]. Proteins that bind to sulfated glycosaminoglycans are usually not considered lectins [101]; however, the specific binding of these proteins to sulfated glycosaminoglycans can provide a valuable tool to develop targeted drug delivery systems. Glycosaminoglycan binding interactions with proteins were described in detail by Vallet, Clerc and Ricard-Blum [176], but that information is outside the scope of this review.