The AL Amyloid Fibril: Looking for a Link between Fibril Formation and Structure

Abstract: The formation and deposition of fibrils derived from immunoglobulin light chains is a hallmark of systemic AL amyloidosis. A particularly remarkable feature of the disease is the diversity and complexity of its pathophysiology and clinical manifestations. This is related to the variability of immunoglobulins, as virtually every patient has a variety of mutations resulting in their own unique AL protein and thus a unique fibril deposited in the body. Here, I review recent biochemical and biophysical studies that have expanded our knowledge of how versatile the structure of AL fibrils in patients is and highlight their implications for the molecular mechanism of fibril formation in AL amyloidosis.


Introduction
Systemic AL amyloidosis is a patient-specific disease caused by an underlying plasma cell dyscrasia in which a specific light chain is overexpressed by a proliferating plasma cell clone, leading to the deposition of amyloid fibrils [1]. The part of the light chain deposited in the fibril is called the AL protein. Interestingly, only 10% to 15% of all plasma cell diseases result in AL amyloidosis [1], meaning that only a small fraction of the theoretically possible light chains ever forms amyloid fibrils in the body. The molecular basis of this selectivity is not yet clear, and identifying common structural features between the disease-associated light chain proteins and AL fibrils of different patients is important to better understand and treat the disease. This review summarizes recent findings on the structure of AL amyloid fibrils and their implications for the molecular mechanism of fibril formation.

Structure and Variability of Light Chains
Light chains consist of about 220 amino acids and have a molecular weight of about 25 kDa. Each light chain is composed of two immunoglobulin domains, an N-terminal variable (V) domain and a C-terminal constant (C) domain [2]. Structurally, the C- and V-domains share conserved features. Both are composed of two β-sheets packed tightly against each other, forming a Greek-key β-sandwich (Figure 1). The V-domain comprises about 100 amino acids organized in a 4-stranded antiparallel sheet packed against a 5-stranded antiparallel sheet (ABED/CC′C″FG) [3,4]. The C-domain consists of 90 to 100 amino acids arranged in a 4-stranded antiparallel sheet packed against a 3-stranded antiparallel sheet, lacking the C′ and C″ strands present in the V-domain [3,4]. The fold of the domains is stabilized by a highly conserved disulfide bridge between the two β-sheets (connecting strands B and F) and by a hydrophobic core inside the β-sandwich, formed mainly by hydrophobic amino acids in strands B, C, E and F as well as a highly conserved tryptophan in strand C (Figure 1) [4]. While the C-domain shows relatively little sequence variation, the V-domain is characterized by considerable sequence variability. This variability is mainly caused by three hypervariable loops (complementarity-determining regions, CDRs) that form the antigen binding site [2].
Figure 1. The Greek-key β-sandwich is formed by two antiparallel β-sheets consisting of strands ABED (cyan) and CC′C″FG (brown). The three CDR loops (blue) cluster on the same side of the structure and form the antigen binding site. The conserved disulfide bridge (yellow) and tryptophan (orange) are highlighted. N: amino terminus; C: carboxy terminus.
The diversity of light chains is the reason why virtually every AL amyloidosis patient has a unique person-specific AL protein [5]. It is achieved through a complex maturation process that mainly involves two molecular events: (I) somatic recombination and (II) somatic hypermutation [2,4].
(I) Somatic recombination generates diversity in two ways. (a) Three gene segments (V, J and C) are recombined to generate a full-length light chain gene. The gene segments V and J together encode the V-domain, while the C segment encodes the C-domain. For each gene segment, multiple functional and non-functional copies exist, which are used stochastically in serial rearrangement events [6]. In addition, there are two separate light chain gene loci, each with its own set of gene segments, subdividing the light chain repertoire into two isotypes, κ (located on chromosome 2) and λ (located on chromosome 22). Allelic variations of the gene loci further increase the variability of recombined gene segments [6]. All theoretically possible rearrangements of functional gene segments amount to several hundred different light chain sequences, which is called combinatorial diversity. (b) When the V and J gene segments are joined, a random truncation of the 3′ end of the V segment and/or the 5′ end of the J segment by an exonuclease may occur, resulting in P nucleotides [4]. A random addition of nucleotides in the region of the 3′ end of the V segment and/or the 5′ end of the J segment may also occur before the segments are ligated, resulting in N nucleotides [4]. This so-called junctional diversity increases the variability in CDR3 in particular. However, the occurrence of P and N nucleotides can also lead to frameshifts, so that in purely mathematical terms only 1/3 of all rearrangements yield a functional open reading frame for a light chain.
(II) After successful somatic recombination, single point mutations are introduced into the rearranged light chain gene [2]. This process takes place mainly in the CDRs, with a frequency approximately 10⁶ times that of spontaneous gene mutation [7]. Somatic hypermutation potentially alters stability, specificity and binding affinity. However, it can also lead to genes that harbor premature stop codons or light chains that cannot fold correctly.
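The 1/3 figure quoted for junctional diversity follows from simple counting over reading frames. The sketch below is a minimal illustration, assuming that the number of nucleotides trimmed from each segment end and the number added at the junction are each equally likely to be anything from 0 to 5. This range and the function name `in_frame_fraction` are hypothetical and chosen only for illustration; stop codons created at the junction are ignored.

```python
from fractions import Fraction
from itertools import product


def in_frame_fraction(max_len: int = 5) -> Fraction:
    """Fraction of V-J junctions that keep the reading frame.

    Assumption (illustrative only): nucleotides trimmed from the V end,
    trimmed from the J end, and added at the junction are each drawn
    uniformly from 0..max_len.
    """
    lengths = range(max_len + 1)
    total = 0
    in_frame = 0
    for trim_v, trim_j, added in product(lengths, repeat=3):
        net_change = added - trim_v - trim_j  # net change in junction length
        total += 1
        if net_change % 3 == 0:  # frame is kept only for multiples of 3
            in_frame += 1
    return Fraction(in_frame, total)


print(in_frame_fraction())  # 1/3
```

Under these assumptions the net length change is evenly spread over the three possible reading frames, so exactly one third of the junctions stay in frame.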

Properties of Amyloidogenic Light Chains
There is evidence that the isotype (κ or λ) or the germline segments used for somatic recombination contribute decisively to the propensity of a light chain to form fibrils. Somatic rearrangement occurs first in the κ locus, and only if this is unsuccessful is the λ locus used [6]. This is the reason why about 2/3 of all B lymphocytes in the polyclonal repertoire of healthy individuals produce a κ light chain [8]. Interestingly, AL amyloidosis is more frequently associated with λ light chains (~75% of cases) than with κ light chains (~25% of cases) [1]. One explanation for such a bias could be genetic abnormalities associated with the AL clonal plasma cell, such as an increase or decrease in the number of genes or chromosomes, translocation of genes or dysregulation of transcription factors that increase the accessibility of certain light chain subfamilies or germline segments for somatic recombination [9,10]. Another explanation could be the intrinsic differences in structure and physicochemical properties between λ and κ light chains, which are thought to be encoded in the germline [11,12]. For example, λ light chains show CDR3 regions that are longer, more hydrophobic and more conformationally flexible [11], which potentially makes them more prone to aggregation. AL amyloidosis occurs preferentially in individuals with B lymphocytes expressing certain germline subfamilies, namely λ1, λ2, λ3, λ6 or κ1 [8]. Within these subfamilies, only 5 germline segments, IGLV1-44 (1c), IGLV2-14 (2a2), IGLV3-01 (3r), IGLV6-57 (6a) and IGKV1-33 (O18/O8), encode around 60% of the reported amyloidogenic light chains [13][14][15][16]. In contrast, other germline segments, such as IGLV1-40 (1e), IGLV3-10 (3p) or IGLV3-25 (3m), are frequently expressed in the polyclonal B cell repertoire but are rarely or never found in AL amyloidosis [15].
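The size of the λ bias can be made explicit with a short calculation. The sketch below compares the odds of a λ light chain in AL amyloidosis with the odds in the healthy polyclonal repertoire, using only the approximate proportions quoted above (about 1/3 λ in the repertoire, about 75% λ in AL cases). It is a rough descriptive ratio under these rounded figures, not an epidemiological estimate, and the helper name `odds` is introduced here for illustration.

```python
from fractions import Fraction


def odds(p: Fraction) -> Fraction:
    """Convert a proportion p into odds p / (1 - p)."""
    return p / (1 - p)


# Approximate proportions of lambda light chains quoted in the text.
lambda_in_repertoire = Fraction(1, 3)  # about 2/3 of B cells produce kappa
lambda_in_al = Fraction(3, 4)          # about 75% of AL cases are lambda

enrichment = odds(lambda_in_al) / odds(lambda_in_repertoire)
print(enrichment)  # 6
```

With these rounded inputs, the odds of finding a λ light chain are about six times higher among AL cases than in the normal repertoire.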
There is also evidence that the isotype or the germline segment may influence the tendency to deposit predominantly in certain organs, so-called organ tropism [1,8]. IGLV1-44 and IGLV2-14 are more likely to cause peripheral nerve and cardiac deposition [9,13,16], IGLV6-57 correlates with renal deposition [9,13,14,16] and IGKV1-33 is more likely to cause liver deposition [16]. However, these observations are based on a limited number of patients harboring a specific gene subfamily and may therefore be difficult to generalize. In summary, the vast majority of light chains in the body are non-amyloidogenic, and the ultimate contribution of the isotype or germline segment to the amyloidogenic properties of light chains is not clear.
Changes in primary structure introduced by somatic hypermutation into the germline sequence or by posttranslational modifications are also discussed as drivers of light chain aggregation [8,17]. The identification of critical amino acid positions and their concentration in specific secondary structure regions showed that the location and nature of the mutations, rather than their number, determine the amyloidogenicity of light chains [18][19][20][21][22]. Some mutations have been linked to a decrease in thermodynamic stability [18,23,24,25], an increase in conformational dynamics [17,20,21] or effects on interdomain interactions [26,27]. However, in most cases it is very difficult to pinpoint specific mutations, and the exact molecular mechanism by which certain mutations drive a light chain towards amyloid formation remains unclear. In the context of posttranslational modifications, light chain truncations are often found in the fibril but may vary depending on the primary structure or isotype. Reports identified V-domains with small parts of the C-domain as the main constituent of λ AL proteins [28][29][30][31][32], while κ AL proteins have been found to consist mainly of the C-domain [33,34]. The exact cleavage sites have not been determined in most cases. It remains an open question whether truncation occurs before fibril formation and is important for this process, or occurs after fibril formation and merely removes parts not included in the fibril core [35][36][37]. In this context, a recent study hypothesized that all light chains have comparable intrinsic amyloidogenicity and that the interplay of protein sequence, environmental conditions and the presence and action of proteases determines whether or not a particular light chain is deposited as an amyloid fibril in a particular patient [38].
Another recent hypothesis is that disease-associated amyloid fibrils are selected in the body by their proteolytic stability, a mechanism termed proteolytic selection [39,40]. That is, protease-stable fibrils have a greater chance of escaping endogenous clearance mechanisms, proliferating and accumulating in a native cellular environment. Regardless of whether truncation occurs before or after fibril formation, both hypotheses propose a central role for proteolysis in AL amyloidosis and suggest that amino acid sequence and proteolytic stability cannot be simply correlated. AL-associated light chains show a higher frequency of acquired glycosylation than non-amyloidogenic light chains, which suggests that glycosylation may also contribute to amyloidogenicity [18,41,42,43,44]. Sugar moieties may also affect interactions with other molecules or cell surfaces and could therefore have an impact on organ tropism [45]. Other posttranslational modifications, such as disulfide bridge formation [32,46] or pyroglutamate formation [47][48][49], have also been found in AL proteins. Both modifications are common to natural antibodies [2,50], and whether they play a role in fibril formation of light chains is unclear. It is worth noting that most studies on the fibrillogenesis of light chains have focused on V-domains; the involvement of the C-domain in fibril formation and the occurrence of mutations in this domain have therefore rarely been investigated and reported [17,37,51].

Structure of AL Fibrils
The structure of patient-derived AL fibrils and the molecular mechanism by which misfolded conformations are formed in the body are not well understood. Most initial knowledge of the structure of light chain-derived fibrils comes from in vitro studies of fibrils formed from recombinant V-domains or short fragments of light chains in the test tube and analyzed by nuclear magnetic resonance spectroscopy or cryo-electron microscopy [52][53][54][55][56][57]. These studies showed that light chain-derived fibrils consist of an in-register parallel β-sheet structure and that the polypeptide chain adopts a conformation with intermolecular contacts that is completely different from its native structure, suggesting that it undergoes dramatic structural rearrangements before fibril formation. However, it is crucial to point out that a growing number of studies show that in vitro fibrils differ in structure and stability from fibrils formed in the body [32,39,40,58,59,60,61,62]. This suggests that it is important to focus on patient-derived fibrils when investigating the structural basis of a disease. In this context, cryo-electron microscopy is currently the state of the art for structural analysis of patient-derived amyloid fibrils. Three recent cryo-electron microscopy studies shed new light on this topic by determining the structure of AL fibrils and their corresponding AL proteins. All fibrils were obtained from patients with cardiac amyloidosis and derived from the λ isotype but from different germlines, namely IGLV1-44*01 [63], IGLV3-19*01 [64] and IGLV6-57*02 [65].
The fibril structures possess some striking similarities. All are asymmetric and show a single protofilament architecture, meaning they consist of only one polypeptide chain per layer (Figure 2). As commonly observed in λ AL proteins [28][29][30][31][32], the polypeptide chain underlying the fibril consists of the V-domain and a short part of the C-domain (Figure 3). The highly ordered fibril core is formed solely by a major portion of the V-domain (Figure 3), suggesting that the V-domain is a prerequisite for building the fibril core of λ AL fibrils. The C-terminal remainder of the V-domain and the remaining C-domain form a conformationally flexible structure around the core (Figure 2). The monomers are compactly folded without crossovers of the polypeptide chain and stack on top of each other in a parallel, in-register structure (Figure 2). Superficially, the individual protein layers form a relatively flat structure. In detail, however, the polypeptide chain exhibits height differences of up to 13 Å along the backbone (Figure 2). These height differences allow the stacked polypeptide chains to spatially interlock with each other and interact via side chains, thus stabilizing the fibril structure [63][64][65]. Another consequence of these height differences is that the fibril tips do not form a flat surface, but rather grooves and ridges that present hydrophobic and hydrophilic regions that may serve as docking sites for monomers [65]. The fold of the polypeptide chain in the fibril core is substantially different from the native globular fold. Many of the side-chain interactions present in the globular state are not preserved in the fibril, and regions that are spatially adjacent in the globular structure, such as the CDRs (Figure 1), are widely separated in the fibril (Figure 2). Another example is the highly conserved tryptophan, which is buried both in the native state and in the fibril, but has different spatial environments (Figures 1 and 2).
These differences suggest that dramatic structural rearrangements and unfolding of the native state must occur [63][64][65]. Another remarkable feature is the persistence of the canonical disulfide bridge of the native fold in the fibril structure (Figures 1 and 2). Interestingly, the spatial orientation of the regions connected by the disulfide bridge is rotated by 180° between the native state and fibril, i.e., the backbone runs in the same N-to-C orientation in the native state (Figure 1) and in the opposite N-to-C orientation in the fibril (Figure 2). This suggests that the structural rearrangement must occur around the intact disulfide bridge that restricts the conformational freedom of the misfolded polypeptide chain [63].
Besides the above-mentioned similarities, the fibril structures also show clear structural differences. The fold of the monomer differs considerably between the structures and reflects the underlying sequence variability (Figure 2). The N-terminus can be disordered and surface exposed or hidden in the core of the fibril in a β-strand conformation (Figure 2). The number of β-strands within the polypeptide chain (between 9 and 12) as well as their length and position vary (Figure 2). The location of the CDRs in the fibril fold differs profoundly, but they mainly contribute to the rigid core region (Figure 2). The polypeptide chain may be highly ordered and structurally rigid throughout the entire core of the fibril or may have structurally flexible regions in between the highly ordered parts (Figure 2). The position and number of mutations compared to the germline sequence show a relatively large variance, although the vast majority are located inside the CDRs (Figure 3). The λ1 AL protein has 10 mutations (2 outside the CDRs), the λ3 AL protein has 6 mutations (1 outside the CDRs) and the λ6 AL protein has 10 mutations (4 outside the CDRs). This variance supports the idea that it is not the number of mutations but rather the position and type of mutation that is important for amyloid formation [19]. However, analysis of the mutations does not provide a conclusive explanation for their role in fibril formation [63][64][65]. Some are exposed on the surface while others are buried, some are in structurally well-ordered regions while others are part of highly flexible regions (Figure 3), and they do not interrupt any interactions that are obviously crucial for protein stability [63][64][65]. In the case of the λ3 fibril, an additional mutation was found outside the AL protein in the C-domain of the light chain [64], raising the question of whether such mutations play a role in fibril formation and whether their effects may be underestimated because they have hardly been studied so far.
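The mutation counts reported above for the three AL proteins can be tabulated to see how strongly the mutations concentrate in the CDRs. The following is a minimal sketch that uses only the numbers stated in the text; the dictionary keys and the helper name `cdr_fraction` are introduced here for illustration.

```python
from fractions import Fraction

# (total mutations, mutations outside the CDRs) as stated in the text.
mutations = {
    "lambda1": (10, 2),
    "lambda3": (6, 1),
    "lambda6": (10, 4),
}


def cdr_fraction(total: int, outside: int) -> Fraction:
    """Fraction of mutations that lie inside the CDRs."""
    return Fraction(total - outside, total)


for name, (total, outside) in mutations.items():
    print(name, cdr_fraction(total, outside))

pooled_total = sum(t for t, _ in mutations.values())
pooled_outside = sum(o for _, o in mutations.values())
print("pooled", cdr_fraction(pooled_total, pooled_outside))  # 19/26
```

Pooled over the three proteins, 19 of 26 mutations (about 73%) fall inside the CDRs, while the per-protein share ranges from 3/5 to 5/6. With only three structures, these ratios are descriptive rather than statistically meaningful.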
One particular position in the germline sequence is consistently mutated in all three AL proteins: the asparagine at position 53 in CDR2 (Figure 3). Whether this particular position could be critical for fibril formation remains unclear, as it has not been found to be particularly important [27]. None of the AL proteins found in the current fibril structures shows glycosylation or pyroglutamylation, which leaves the relevance of these modifications for fibril formation open. The AL proteins differ in length, which is mainly caused by the variability of the C-terminal truncation site (Figure 3A). The λ1 and λ3 AL proteins terminate mainly in the region around position 120 [63,64], while for the λ6 AL protein, different truncation sites around positions 130 and 150 have been reported [65]. The region around position 120 is in close proximity to the short linker connecting the V- and C-domains and is flexible and exposed in both the natively folded light chain [66] and the fibril (Figures 2 and 3). In addition, the regions around positions 130 and 150 are exposed in both states [65]. Whether these regions are common proteolytic sites involved in the generation of AL proteins, and whether truncation occurs before or after fibril formation, is a matter for further studies. Based on current knowledge, it is reasonable to assume that sequence variability leads to molecular interactions that stabilize a particular fibril structure. However, no definite conclusion can be drawn with regard to amino acid mutations and posttranslational modifications and their role in fibrillization, as the amount of structural information is currently limited.

Conclusions
Our understanding of how AL fibrils are structured and what the structural determinants of their formation are has improved significantly in recent years. Detailed knowledge of the molecular structure of AL fibrils supports the concept that light chain unfolding is required and shows that dramatic structural rearrangements occur around the canonical disulfide bridge of the immunoglobulin domain. However, as AL amyloidosis is a patient-specific disease, there is great heterogeneity in the clinical phenotypes, the associated polypeptide chains and the fibril structures. Therefore, further work is needed to identify common mechanistic and structural patterns and to open up possibilities for the development of new diagnostic or therapeutic strategies.