A New Structural Model of Apolipoprotein B100 Based on Computational Modeling and Cross Linking

ApoB-100 is a member of a large lipid transfer protein superfamily and is one of the main apolipoproteins found on low-density lipoprotein (LDL) and very low-density lipoprotein (VLDL) particles. Despite its clinical significance for the development of cardiovascular disease, there is limited information on apoB-100 structure. We have developed a novel method based on the “divide and conquer” algorithm, using PSIPRED software, by dividing apoB-100 into five subunits and 11 domains. Models of each domain were prepared using I-TASSER, DEMO, RoseTTAFold, Phyre2, and MODELLER. Subsequently, we used disuccinimidyl sulfoxide (DSSO), a new mass spectrometry-cleavable cross-linker, and the known positions of disulfide bonds to experimentally validate each model. We obtained 65 unique DSSO cross-links, of which 87.5% were within a 26 Å threshold in the final model. We also evaluated the positions of cysteine residues involved in the eight known disulfide bonds in apoB-100, and each pair was measured within the expected 5.6 Å constraint. Finally, multiple domains were combined by applying constraints based on detected long-range DSSO cross-links to generate five subunits, which were subsequently merged to achieve an uninterrupted architecture for apoB-100 around a lipoprotein particle. Moreover, the dynamics of apoB-100 during particle size transitions were examined by comparing VLDL and LDL computational models and using experimental cross-linking data. In addition, the proposed model of receptor-ligand binding of apoB-100 provides new insights into some of its functions.


Introduction
Lipoproteins are particles in the blood consisting of lipids and amphipathic proteins that carry various lipophilic components and are known to be causally linked to the development of atherosclerosis [1,2] and other pathophysiologic processes [3][4][5][6]. The outer layer of lipoproteins is formed from phospholipids, apolipoproteins, and unesterified cholesterol, and its hydrophobic core contains triglycerides (TG) and cholesterol esters (CE). Apolipoproteins are amphipathic proteins that stabilize lipoprotein particles and also serve as ligands for receptors and as modulators of lipid-modifying enzymes.
Human apoB-100 sequence (residues 28-1030) was entered into HHpred (https://toolkit.tuebingen.mpg.de/tools/hhpred) (accessed on 30 April 2021) for homology search. Templates with probability >95% were selected for multiple sequence alignment using HHpred-TemplateSelection. The model was created without intervention or manipulation by the experimental data; the DSSO cross-link data and cysteine disulfide bonds were used only to validate the computational model. Considering the DSSO spacer length (10.1 Å), the distance contributed by two lysine side chains (12.58 Å, 2 × 6.29 Å), and 3.0 Å for backbone flexibility and structural dynamics, the maximum Cα-Cα distance between DSSO cross-linked lysine residues is estimated to be ~26 Å [32,33]. The model was also evaluated for the seven known disulfide bonds located in subunit I. The sulfur atoms of the cysteine residues were correctly paired in the model and within the 5.6 Å constraint. Furthermore, the 18 cross-links detected within this subunit measured within the 26 Å distance limit (Figure 1). Next, the structure was energy minimized using UCSF Chimera to resolve any significant distortions [34,35].
The subunit I computational model contains 990 amino acids (E28-S1017) in three distinct domains (Figure 1A). It resembles a pyramidal funnel with dimensions of 80 Å (X), 110 Å (Y), and 60 Å (Z). Subunit I consists of an incomplete β-barrel at the top and two antiparallel β-sheets lining two sides of the pyramid, supported by two curved layers of α-helices on the outer side. This model is similar to previously proposed models [16,18]. Subunit I domain 1 (SID1, residues 28-320) consists of 11 β-strands that create an incomplete β-barrel, with two α-helices on the side and within the center of the barrel (Figure 2A). The last 30 residues of SID1 form a random coil that connects the barrel to the second domain. Subunit I domain 2 (SID2, residues 321-616) consists of 17 amphipathic α-helices (Figure 2B). The helices are arranged in a double layer outside the β-sheets, apparently stabilizing the core structure. Domain 2 mostly covers the right side, posterior outlet, and part of the left side of the pyramid (Figure 1A). Subunit I domain 3 (SID3, residues 617-1017) mostly forms the two sides of the pyramid and the base (Figure 2C). On the right side, residues 617-651 form a coil that connects the last α-helix of domain 2 to the first β-strand of domain 3. Five antiparallel β-strands make up the β-sheet on the right side. A long coil extends from residue 693 to residue 772, connecting strands three and four. After the fifth strand, residues 790-820, which form two short helices and coils, connect the right and left β-sheets and cover the pyramid's base. Eleven antiparallel β-strands make up the left side of the pyramid (Figure 1A).
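The ~26 Å limit quoted for DSSO cross-links is the sum of three contributions given in the text. The following minimal Python sketch reproduces that arithmetic and shows how a pair of Cα coordinates could be tested against the limit. The helper `within_threshold` and the example coordinates are hypothetical illustrations; the measurements in the study itself were made in UCSF Chimera.

```python
import math

# Contributions quoted in the text, all in angstroms.
DSSO_SPACER = 10.1      # DSSO spacer length
LYS_SIDE_CHAIN = 6.29   # one lysine side chain (two are involved)
FLEXIBILITY = 3.0       # allowance for backbone flexibility and dynamics


def max_ca_ca_distance():
    """Sum the three contributions to the maximum Calpha-Calpha distance."""
    return DSSO_SPACER + 2 * LYS_SIDE_CHAIN + FLEXIBILITY


def within_threshold(ca_a, ca_b, limit=26.0):
    """Hypothetical helper: True if two Calpha atoms are within the limit."""
    return math.dist(ca_a, ca_b) <= limit


estimate = max_ca_ca_distance()  # 25.68, rounded to ~26 in the text
print(f"Estimated maximum Calpha-Calpha distance: {estimate:.2f} A")

# Hypothetical coordinates, for illustration only.
print(within_threshold((0.0, 0.0, 0.0), (0.0, 0.0, 25.0)))  # inside the limit
print(within_threshold((0.0, 0.0, 0.0), (0.0, 0.0, 34.2)))  # outside the limit
```

The rounded limit of 26 Å is used as the default so that the check matches the threshold applied throughout the text.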

Int. J. Mol. Sci. 2022, 23, x FOR PEER REVIEW 4 of 24
Figure 1. (A) The anterior view of subunit I followed by a 180° rotation (posterior view). Domain 1 forms an incomplete β-barrel at the top (blue), domain 2 forms the α-helices on the right (green), and domain 3 forms the left side and the base (magenta). (B) DSSO cross-link map on the subunit I linear sequence, with the three domains colored according to A. (C) DSSO cross-link map on the subunit I ribbon representation with 18 intra-subunit cross-links (red lines) between lysine residues. (D) Distribution plot of identified DSSO cross-links in subunit I vs. the spatial distances of lysine residues in the computational model.
Subunit I was evaluated against local cross-link data and the known disulfide bonds in apoB using UCSF Chimera software [35,36]. For the 18 local cross-links identified by mass spectrometry, all Cα−Cα distances between DSSO cross-linked lysine residues were within the 26 Å maximum distance constraint (Figure 1D). Moreover, for the seven known disulfide bonds in subunit I, all the cysteines were in close enough proximity (<5.6 Å) to form disulfide bonds before energy minimization. After optimizing the model, disulfide bonds formed between the paired cysteines (Figure 2D-F).
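The disulfide validation described above amounts to measuring the distance between the sulfur atoms of each known cysteine pair and comparing it with the 5.6 Å constraint. A minimal sketch of such a check is given below, assuming the sulfur coordinates have already been extracted from the model. The function name, the residue labels, and the coordinates are hypothetical and are not taken from the study.

```python
import math

DISULFIDE_LIMIT = 5.6  # angstroms, the constraint quoted in the text


def check_disulfides(sg_coords, pairs, limit=DISULFIDE_LIMIT):
    """Return {pair: (distance, ok)} for each cysteine pair.

    sg_coords maps a residue label to the (x, y, z) position of its
    sulfur atom; pairs lists the cysteine pairs expected to be bonded.
    """
    report = {}
    for a, b in pairs:
        distance = math.dist(sg_coords[a], sg_coords[b])
        report[(a, b)] = (distance, distance <= limit)
    return report


# Hypothetical labels and coordinates, for illustration only.
sg = {
    "CysA": (0.0, 0.0, 0.0),
    "CysB": (2.0, 0.0, 0.0),
    "CysC": (0.0, 16.0, 0.0),
}
report = check_disulfides(sg, [("CysA", "CysB"), ("CysA", "CysC")])
for pair, (distance, ok) in report.items():
    print(pair, f"{distance:.1f} A", "ok" if ok else "too far")
```

A pair flagged as too far apart would be a candidate for re-examination after energy minimization rather than an automatic rejection of the model.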

Figure 2. The N and C termini of each domain are colored in dark blue and red, respectively, and labeled with a residue number. Domain 1 (blue) is an incomplete β-barrel that extends from E28 to K320. Domain 2 (green) is arranged in two layers of α-helices (Q321-K616). Domain 3 (magenta), with two β-sheets on the right and left sides and α-helices at the base, creates a cavity in the center. (D-F) Ribbon representation of subunit I domains, with disulfide bonds colored in yellow and cysteine residues colored in red in ball-and-stick representation. Four disulfide bonds are located in domain 1 (D), two in domain 2 (E), and one in domain 3 (F).

Subunit II Modeling
The human apoB-100 sequence (residues 1018-2072) was submitted to the I-TASSER server for structure prediction. Templates of the highest significance in the threading alignments included (PDB ID/chain): 4rm6A, 4acqA, 4u4jA, 7abiE, 2pffB, 1vt4A, 1kmiZ, 5n8pA, and 6ysaA [30,31]. Again, the model was generated without using the DSSO cross-link data. The model was evaluated against the 13 cross-links found in subunit II, and all of them were within the 26 Å distance limit (Figure 3A-C). Then, the structure was energy minimized with UCSF Chimera to find and resolve any steric clashes [34,35].
The subunit II model consists of 1055 residues (A1018-P2072) (Figure 3A). The subunit II coil (A1018-T1500) contains 483 residues and is almost entirely made of coils. This part can adopt various shapes in modeling and does not appear to have a preferred fixed structure (Figure 3A). Subunit II domain 1 (Y1501-P2072) is a boomerang-shaped β-helix with two wings. It is 572 residues long and is made of β-strands, turns, and coils. The larger wing of domain 1 consists of 36 β-strands that make 16 turns and is 90 Å long. The smaller wing is connected to the larger one at an angle of ~90°. It is made of 17 β-strands that make eight turns and is 60 Å long (Figure 3D). The coronal section of domain 1 is oval in most parts, except for a few triangular areas, with dimensions of 26 Å and 12 Å (Figure 3E). Sequence analysis of subunit II domain 1 with the MPI Bioinformatics Toolkit HHrepID shows that this segment is made of 15 repeats (~28 residues each) covering Q1528 to H1944 (p-value: 3.8 × 10⁻¹⁴) (Supplementary Figure S1) [26,27,37].

Figure 4. Human apoB-100 (residues 2073-2273) and (residues 2274-2550) were separately entered into HHpred for homology detection. The first three templates were selected for subunit III domain 1 and the last four for subunit III domain 2 modeling. The shorter form of domain 2 (green) begins at D2293 and runs through T2548. The entire structure is composed of helix-turn-helix motifs. Both domains were generated by MODELLER using apolipoproteins as templates. (C) DSSO cross-link map on the subunit III ribbon representation with five intra-subunit cross-links (red lines) between lysine residues. Two cross-links, K2100-K2387 and K2100-K2402, exceeded the expected 26 Å limit. (D) DSSO cross-link map on the subunit III linear sequence, with the two domains colored according to C, without considering the coil sequence that connects domain 1 to domain 2.

Subunit IV Modeling
The human apoB-100 sequence (K2591-K4055) was entered into the I-TASSER server for modeling. The first forty residues (S2551-I2590) were excluded due to the sequence length and computational memory limit. Top threading templates used by I-TASSER included (PDB ID/chain): 7abhE, 3b39A, 4o9xA, 7okqA, 4kncA, 3ljyA, 6ar6A, 6tgbA, and 2nbiA [30,31]. The model was generated without the intervention of DSSO cross-link data. The model was evaluated against the ten cross-links found in subunit IV. Seven cross-links were within the 26 Å distance limit (Figure 5B,C). Two cross-links were slightly above the expected limit (K3946-K3973: 26.8 Å; K2853-K2829: 28 Å), and one cross-link was ~34 Å (K2911-K2926: 34.2 Å) (Figure 5B,C). The model was also examined for the one disulfide bond located in subunit IV (C3194-C3324). The sulfur atoms of the cysteine residues were initially 16 Å apart. After two steps of atomic-level energy minimization, with ModRefiner within the I-TASSER server and with UCSF Chimera, the cysteine residues were correctly paired and within the 5.6 Å constraint (Figure 5D) [34,35,39].
The subunit IV model includes 1507 residues (S2551-N4057). It begins with 350 residues in coil form (S2551-F2900) (Figure 5A). The first forty and last two residues were excluded from analysis due to computational processing limits. After the coil, the protein sequence transitions into strands that form three doughnut-shaped β-propeller domains (Figure 5A). Each domain forms a toroidal β-propeller with seven symmetrical blades (β-sheets) around the central axis. Each blade typically has four antiparallel β-strands arranged so that the fourth strand is close to the center and perpendicular to the first strand (Figure 5A). All three domains have a similar diameter of around 46 Å and a thickness of 30 Å (Figure 5D). Domain 1 is 412 residues long and spans W2929 through D3340. Domain 2 is 338 residues long and spans K3380 through Y3717. Domain 3, with 407 residues, is more complex and is in close contact with domain 1. It is made from three separate sequence segments: S2901 to S2928, F3341 to Y3379, and S3718 to N4057 (Figure 5E). In the overall domain topology, the central axes of domains 1 and 2 are almost parallel, whereas the axis of domain 3 is perpendicular to those of domains 1 and 2 (Figure 5E,F). Sequence analysis of residues A2610-E4059 with the MPI Bioinformatics Toolkit HHrepID shows that this segment is made of three repeats with 100% probability, a length of 547 residues, and a p-value of 3.3 × 10⁻²²⁸ (Supplementary Figure S2) [26,27,37].
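Because domain 3 of subunit IV is assembled from three non-contiguous sequence segments, the residue bookkeeping is easy to get wrong. The short Python sketch below recomputes the segment lengths from the residue ranges quoted in the text and checks that the pieces tile S2551-N4057 without gaps or overlaps. The variable and function names are illustrative only.

```python
def span(start, end):
    """Number of residues in an inclusive residue range."""
    return end - start + 1


# Residue ranges quoted in the text for subunit IV (S2551-N4057).
segments = [
    ("coil", 2551, 2900),
    ("domain 3", 2901, 2928),
    ("domain 1", 2929, 3340),
    ("domain 3", 3341, 3379),
    ("domain 2", 3380, 3717),
    ("domain 3", 3718, 4057),
]

# Sum residues per region; domain 3 collects three separate segments.
lengths = {}
for name, start, end in segments:
    lengths[name] = lengths.get(name, 0) + span(start, end)

# Consecutive segments should tile the subunit without gaps or overlaps.
contiguous = all(
    segments[i][2] + 1 == segments[i + 1][1] for i in range(len(segments) - 1)
)

print(lengths)                # per-region residue counts
print(sum(lengths.values()))  # total residues in subunit IV
print(contiguous)
```

The counts obtained this way (350, 412, 338, and 407 residues, 1507 in total) agree with the values reported in the text.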

Subunit V Homology Modeling
Human apoB-100 sequences (residues 4016-4270 and residues 4260-4563) were submitted to the HHpred server for homology detection [26][27][28]. Apolipoproteins with >90% sequence homology probability were selected as templates (Table 3). The generated alignments were forwarded to MODELLER for structural prediction [29]. The main templates used for the modeling were apolipoprotein E, apolipoprotein A-I, apolipoprotein A-IV, and apolipophorin-III. Both models were generated without experimental data intervention (Figure 6A,B). One intra-domain cross-link was found in the models, and it was within the 26 Å limit. The structure was energy minimized with UCSF Chimera to resolve steric clashes. The resulting primary model was used as a template in I-TASSER to build the entire subunit V model (residues 4058-4563). Templates of the highest significance in the threading alignments included (PDB ID/chain): 4uxvA, 4iggA, 6thkA, 6z6fA, 6d03E, 7qj0H, and 1st6A [30,31]. The model was evaluated against the five cross-links found in subunit V: four were within the 26 Å distance limit, and one was slightly over the limit (28 Å) (Figure 6E). Then, the structure was energy minimized in two steps at the atomic level, with ModRefiner within the I-TASSER server and with UCSF Chimera, to find and resolve steric clashes [34,35,39].
Figure 6. Human apoB-100 (residues 4016-4270) and (residues 4260-4563) were separately entered into HHpred for homology detection. The first two templates were selected for subunit V domain 1 and the last five templates for subunit V domain 2 modeling. Subunit V primary structure generated by homology modeling. Domain 1 (blue) starts at W4058 and runs through I4190, and domain 2 (green) begins at E4271 and runs through E4558. (C) Entire subunit V generated by I-TASSER using the primary structure as a template. The subunit V structure is composed of helix-turn-helix motifs and short coils. (D) DSSO cross-link map on the subunit V linear sequence, with the two domains colored according to the structure. (E) DSSO cross-link map on the subunit V ribbon representation with five intra-subunit cross-links (red lines) between lysine residues. One cross-link, K4485-K4518, slightly exceeded the 26 Å limit (28 Å).

Domain Boundaries Prediction and Secondary Structure Determination
In the early stages, subunit and domain boundaries were first defined using DomPred within the PSIPRED Server V4.0 [24,25]. After several rounds of examining secondary and tertiary structures, the final boundaries between subunits and domains were determined. The apoB-100 subunit boundaries and secondary structure are summarized in Table 4. Subunit I, with 990 residues (21.8%), has almost equal percentages of α-helix, β-strand, and coil. Subunit II, with 1055 residues (23.2%), begins with 483 amino acids that are almost entirely in coil form. Domain 1 of this subunit is made of β-strands that create an oval-shaped β-helix. We counted the entire β-helix as β-strands. Subunit III, with 478 residues (10.5%), contains two domains homologous to exchangeable apolipoproteins. Both domains lack β-strands and have a similar secondary structure of α-helix and coil. Subunit IV, with 1507 residues (33.2%), is the longest part of apoB-100. Like subunit II, it starts with a long coil (350 residues) and then forms three seven-bladed β-propellers that mostly contain β-strands and turns. Most of these domains were counted as β-strands, except for two short α-helices at the end of domain 1 and coils at the beginning, at the end, and between the domains. Subunit V, with 506 residues (11.2%), contains two domains with α-helical secondary structure homologous to exchangeable apolipoproteins. In summary, 24% of the entire structure is α-helix, 41% is β-strand, and 35% is coil. These results differ in part from previous secondary structure studies. An infrared spectroscopy study of LDL at 37 °C estimated 24% α-helix, 23% β-sheet, 24% β-strand, 6% β-turns, and 24% unordered structure for human apoB-100 [40]. On the other hand, a circular dichroism study of LDL showed that the helical content of apoB-100 is 25% to 33% [41]. Another infrared spectroscopy evaluation of apoB-100 secondary structure in LDL reported a β-sheet content of 41% [42], identical to our findings.
Table 4. Human apoB-100 subunits and secondary structure prediction. The secondary structure was calculated from the final models using UCSF Chimera.
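The share of the protein contributed by each subunit can be recomputed from the residue counts given above. The sketch below assumes that the percentages are taken relative to the sum of the five subunits (4536 residues, E28 through L4563); this denominator is inferred from the residue ranges rather than stated explicitly in the text.

```python
# Residue counts per subunit, as reported in the text.
subunit_residues = {"I": 990, "II": 1055, "III": 478, "IV": 1507, "V": 506}

# Assumed denominator: the five subunits together (E28 through L4563).
total = sum(subunit_residues.values())

percent = {name: 100 * n / total for name, n in subunit_residues.items()}

for name, n in subunit_residues.items():
    print(f"Subunit {name}: {n} residues, {percent[name]:.1f}% of {total}")
```

The values obtained this way agree with the reported percentages to within 0.1 percentage points; the small difference for subunit II depends on whether the figure is rounded or truncated.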

Sequence and Structural Comparison to Lipovitellin
Lipovitellin is the predominant egg yolk lipoprotein in oviparous species. Electron microscopy reveals that lipovitellin particles with a 4-6 nm diameter form a ring with an average size of 25 nm [43]. Studies show that lipovitellin is evolutionarily and structurally closely related to MTP and apoB-100. Furthermore, homology models for the amino-terminal 1000 residues of apoB-100 based on the available crystal structure have been made [16,17]. In this study, we made a computational model from the sequences of silver lamprey lipovitellin and vitellogenin, the lipovitellin precursor, to fill in some of the gaps in the crystal structure and to compare the lipovitellin crystal structure and computational model with our apoB-100 model. Lipovitellin structure data and the vitellogenin sequence were acquired from the Protein Data Bank and UniProt, and the I-TASSER server was used to make the computational models (Supplementary Figure S3) [30,31,43]. The secondary and tertiary structures of the first 1000 amino-terminal residues of lipovitellin and apoB are very similar and resemble a pyramid (Supplementary Figure S3A,E). There are three breaks with no electron density in this part of the lipovitellin crystal structure (R708-W729, M759-A777, M949-F990) [44]. The computational model of lipovitellin shows that all three missing segments are coils (Supplementary Figure S3F). Furthermore, the part of lipovitellin that covers the base of the pyramid (K1306-H1355), which has no electron density, is also a coil in the computational model [44]. In addition, segments of the vitellogenin sequence that are missing from the lipovitellin crystal structure because of proteolytic cleavage or lack of electron density, including a serine-rich region of vitellogenin (P1074-F1305), were entirely coil in the computational model (Supplementary Figure S3G) [44][45][46][47].
According to the apoB computational model, the secondary structure of residues A1018-T1500 is a coil, comparable to the vitellogenin sequence P1074-H1355 (Supplementary Figure S3C,G). Moreover, the lipovitellin β-strands that form the β-sheet covering the base of the lipid cavity (S1356-F1529) are comparable to the secondary structure of apoB subunit II domain 1 (Y1501-P2072) (Supplementary Figure S3D,H).

ApoB-100 Architecture and Dynamics
The length of apoB-100 made it impossible to orient all subunits in one step using currently available bioinformatic tools. Therefore, different subunits and domains were extended and combined from the C-terminus and N-terminus, with and without cross-link data, to optimize long-distance cross-link coverage. Then, the large structures were examined and validated to see whether they could be fitted manually. Building large segments reduced the accuracy of the domains, especially at domain and subunit boundaries, but it was necessary for designing the whole architecture and validating subunit folding.
The first segment (E28-T1500) was generated by the I-TASSER server, applying the K763-K1324 and K1087-K1121 DSSO cross-links to orient the subunit II coil and estimate the coil location and dynamics (Figure 7A-C). The second segment, covering Y1501-Y2256, was generated with the same tool [30,31]. Then, the two components were oriented using cross-links C (K766-K2208) and E (K923-K1593) (Figure 7A-C). Both cross-link distances were <26 Å, and subunit II was positioned very close to subunit I without clashing. Since the two segments were generated independently without using the cross-link data (except for the coil part, 1018-1500) and all cross-links were within the 26 Å limit, we concluded that the geometric match was strong evidence that both models were accurate enough to be fitted by applying the cross-link data. Furthermore, we generated alternative folds for the second segment to compare with this model and to see whether they could fit with segment one using cross-links C and E. These models were generated by I-TASSER and RoseTTAFold without applying cross-link data. Even though the intra-domain cross-links of both models were within the 26 Å limit, neither model fit well with segment one using cross-links C and E (Supplementary Figure S4) [30,31,48]. The third (Q2257-T2548) and fifth (W4058-L4563) segments were positioned next using cross-links I (K2100-K2387 and K2100-K2402), A, and B (K1696-K4349 and K1958-K4207). The structures did not clash, and all cross-link distances were <26 Å (Figure 7D). The fourth segment was added using cross-links G and H (K2270-K3210, K4207-K3148, and K4207-K3159). The cross-links were <26 Å, and none of the five segments clashed (Figure 7E). Cross-link F (K2671-K1586 and K2671-K1593), which has a coil on one side, was adjusted by changing residue torsions in the coil region (K2591-F2900). Cross-link J (K3682-K4103) was not within the 26 Å limit (102 Å) (Figure 7E). It is possible that the subunit IV domain 2 model is the mirror image of the actual structure. Therefore, it was manually repositioned on the other side of subunit IV domain 3 to fill the gap between the two subunits, and cross-link J became <26 Å (Figure 7F). Twelve of the thirteen inter-subunit cross-links were within the 26 Å limit in the final model; cross-link D, which connects the subunit II coil to subunit I, was 37 Å.
A total of 65 unique cross-links were identified by mass spectrometry. The Cα−Cα distance between lysine residues was measured for 64 cross-links. Fifty-six (87.5%) cross-links were within the 26 Å threshold, three cross-links were slightly beyond the 26 Å limit (26.9 Å, 2 × 28.0 Å), and for five cross-links the Cα−Cα distance between lysine residues was >30 Å (33.9 Å, 34.2 Å, 36.3 Å, 37.3 Å, 42.5 Å). For the last two cross-links, at least one lysine residue was located within a coil region (Supplementary Table S1).
The apoB-100 model was arranged on the LDL particle according to the cross-link data from Figure 7 and published cryo-EM data (Figure 8A,B) [21]. The LDL lipid is represented as a disc (20 nm × 20 nm × 11 nm) that appears round from the top (Figure 8A) and oval from the side (Figure 8B). Segments one (E28-T1500) and two (Y1501-Y2256) were arranged on the top of the disc. The subunit II coil (A1018-T1500), which is part of segment one, is mostly located below subunit I and connects the latter to subunit II domain 1 (Y1501-P2072) (Figure 8A). Segment three (Q2257-T2548) is on the front side, segment four (K2591-K4055) is on the bottom and right side, and segment five (W4058-L4563) is on the back side of the disc (Figure 8A,B). The subunit IV coil (K2591-F2900), which connects subunit III to subunit IV domain 3, is located on the right and bottom sides.
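The cross-link tally reported above (56 of 64 measured cross-links within the threshold) can be reproduced from the out-of-range distances listed in the text. The sketch below is only a bookkeeping check; the 30 Å cut-off used to separate "slightly beyond" from "far beyond" follows the grouping in the text, and the variable names are illustrative.

```python
THRESHOLD = 26.0  # angstroms, DSSO Calpha-Calpha limit used in the text
MEASURED = 64     # cross-links with a measured Calpha-Calpha distance

# Distances (in angstroms) reported for cross-links exceeding the threshold.
over_limit = [26.9, 28.0, 28.0, 33.9, 34.2, 36.3, 37.3, 42.5]

within = MEASURED - len(over_limit)
slightly_over = [d for d in over_limit if THRESHOLD < d <= 30.0]
far_over = [d for d in over_limit if d > 30.0]

print(f"Within {THRESHOLD:.0f} A: {within} of {MEASURED} "
      f"({100 * within / MEASURED:.1f}%)")
print(f"Slightly over the limit: {len(slightly_over)}")
print(f"Over 30 A: {len(far_over)}")
```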
There is a gap of 42 residues (Y2549-I2590) which is part of the subunit IV coil. These residues were excluded from analysis because of the software processing limitation from the subunit IV long sequence. Cryo-EM studies show that the LDL receptor β-propeller domain binds to the linker part of apoB-100 on the right side of the LDL particle, which connects the top and bottom parts of the protein [21]. This finding suggests the β-propeller domain of the LDL receptor binds to a β-propeller domain of subunit IV located on the right side of the LDL particle ( Figure 8A,B). segment was added using cross-links G and H (K2270-K3210 and K4207-3148 and K4207-3159). The cross-links were <26 Å, and all five segments did not clash ( Figure 7E). Crosslink F, a coil on one side (K2671-K1586 and K2671-K1593), was adjusted by changing residues torsion in the coil region (K2591-F2900). Cross-link J (K3682-K4103) was not within the 26 Å limit (102 Å) ( Figure 7E). It is possible that the subunit IV domain 2 model is the mirror image of the actual structure. Therefore, it was manually repositioned on the other side of subunit IV domain 3 to fill the gap between the two subunits, and cross-link J became <26 Å ( Figure 7F). Twelve out of thirteen inter-subunit cross-links were within the 26 Å limit in the final model, and cross-link D, which connects subunit II coil to subunit I was 37 Å. A total number of 65 unique cross-links were identified by mass spectrometry. The distance between Cα−Cα of lysine residues was measured for 64 cross-links. Fifty-six (87.5%) cross-links were within the 26 Å threshold, three cross-links were slightly beyond the 26 Å limit (26.9 Å, 2 × 28.0 Å), and for five cross-links Cα−Cα of lysine residues was >30Å (33.9 Å, 34.2 Å, 36.3 Å, 37.3 Å, 42.5 Å). For the last two cross-links, at least one lysine residue was located within the coil region (Supplementary Table S1). 
Figure 7 (continued). (E) Segment four (K2591-K4055) was positioned last, using cross-links G and H. Cross-link F, which connects subunit II domain 1 to the subunit IV coil, was accommodated by adjusting torsion angles in the coil sequence (not shown). Cross-link J (K3682-K4103), which connects subunit IV domain 2 to subunit V, was the only inter-subunit cross-link that did not fit within the 26 Å limit (102 Å). (F) Subunit IV domain 2 was manually repositioned on the other side of subunit IV domain 3 to fill the gap between subunits IV and V. After repositioning, cross-link J fit within the 26 Å limit (25 Å).
The apoB-100 model on VLDL was designed according to the immunoelectron microscopy ribbon and bow model and the DSSO cross-link data (Figure 8C,D) [49]. This model is more hypothetical than the LDL model because fewer data are available. The VLDL lipid is represented as a sphere with a 30 nm diameter. Segments 3 and 5, which have α-helical structures, are extended rather than globular, and the subunit II coil is unrolled rather than folded as in the LDL model (Figure 8). The rest of the apoB-100 structure used for the VLDL model was the same as in the LDL model. In the proposed model, segments 1-4 (E28-K4055) form an incomplete ring around the spherical lipid particle, and segment 5 (W4058-L4563) runs in the opposite direction and crosses segment 4. The proposed VLDL model is almost entirely consistent with the immunoelectron microscopy results. Furthermore, the immunoelectron microscopy model supports the folding of subunit II domain 1 into a boomerang or V shape. This part starts around R1507 (apoB-32) and ends around M2358, with a "kink" in the middle at M1881. According to that study, the protein starts at apoB-2 and encircles the spherical lipid; then, at apoB-41 (the kink), it changes direction significantly through apoB-50 [49]. Interestingly, the "kink" region perfectly matches the apoB model: the kink start point (M1881) is where the two wings of the boomerang-shaped subunit II join (Figure 8C).
Several studies have suggested that the apoB-100 architecture is highly dynamic and flexible. An important potential cause of apoB conformational change is the alteration in surface pressure that accompanies the change in particle size during lipolysis [18,50]. Surface pressure at the lipoprotein particle interface progressively increases during the transformation of VLDL to LDL [50]. Thus, it was suggested that the α-helix-rich subunits (III and V) are expanded and in contact with the surface lipids in VLDL, whereas during the conversion of VLDL to LDL the pressure rises and the α-helix-rich subunits come off the surface, possibly to form a globular conformation.
Unlike the α-helix-rich subunits, the β-strand-rich subunits (II and IV) are predicted to be tightly anchored to the core lipid and to keep the protein bound to the lipid [18,50]. Furthermore, we propose that the coils in apoB-100, especially the subunit II and subunit IV coils, play a key role in the protein's flexibility and in adjusting its architecture to changes in size and surface pressure. Because there is less surface pressure on apoB-100 in a large lipoprotein, we hypothesize that the five subunits are spread over the lipid surface, far apart from each other. The subunit II coil (483 residues) and the subunit IV coil (351 residues) resemble uncoiled ropes on opposite sides of the particle. These two long coils connect subunit I to subunit II and subunit III to subunit IV, respectively (Figure 8C). We suggest that during VLDL to LDL conversion, the coils on both sides are remodeled from an open to a folded form and bring the five subunits into close proximity (Figure 8A-D). This structural alteration regulates and reduces the pressure on the entire structure to maintain its integrity.
According to the DSSO cross-link data, VLDL lacks all of the distal cross-links and many short-distance cross-links, except for those in the N-terminal β-barrel (Supplementary Figure S5). The comparison of VLDL and LDL DSSO cross-link data supports the computational models of the apoB-100 architecture on LDL and VLDL. Moreover, the disappearance of short-distance cross-links in VLDL might be due to the protein being more deeply submerged in the VLDL lipids than in the smaller LDL particles. In addition, negative-staining electron micrographs show LDL particles with various morphologies, such as circular, oval, and rectangular, whereas VLDL particles are reported to be more circular than rectangular [21,23].
In another attempt to determine the protein's hydrophilic sites, hydrolyzed DSSO cross-links and painted apoB-100 sequences in LDL particles were analyzed. The hydrolyzed DSSO cross-links were almost entirely located in the painted regions, so two different methods point to the same hydrophilic sites. Both methods show that the hydrophilic and hydrophobic patches of the protein are not uniform and lack a specific orientation (Supplementary Figures S6 and S7).

β-Propeller Folds and ApoB-100 Docking Region
β-propeller structures are all-β proteins made of four to twelve β-sheets called blades. Depending on the number of blades, they have a wide range of functions. For instance, β-propellers with six and seven blades, in addition to structural and ligand-binding functions, can operate as signaling proteins, lyases, hydrolases, or oxidoreductases [51][52][53][54][55][56][57]. Interestingly, several reports have mentioned possible roles for lipoproteins in various non-traditional activities, such as modulating signal transduction and exocytosis, DNA binding and transfection capacity, lyase activity, and oxidoreductase activity [3,5,6,58,59]. Moreover, the seven-bladed β-propellers of subunit IV are complementary to the six-bladed β-propeller domain of the LDL receptor YWTD-EGF (Figure 9) [60]. Additionally, as mentioned above, according to the cryo-EM studies and the theoretical LDL model, the LDL receptor β-propeller domain binds to a β-propeller domain of subunit IV (Figure 8A,B) [21]. According to one study, the epitopes of monoclonal antibodies specific for apoB-100 regions between P2953-K3057, near residue A3473, and between S4000-I4054 are inaccessible on receptor-bound LDL. In the proposed computational model, those regions are located on subunit IV domains 1-3, respectively. Conversely, a monoclonal antibody specific for the apoB-100 region F2808-I2895 could bind its epitope on receptor-bound LDL; in the proposed model, this region is part of the subunit IV coil, just before the start of the β-propellers at S2901. The apoB-100 docking architecture supports the idea that the binding site is multivalent and that the domains can act in concert or independently (Figure 5) [61]. In addition, the proposed apoB-100 β-propeller domain model can also create a pocket for calcium ions in its center, which helps explain the mechanism of LDL binding to the LDL receptor [62].

Sequences and Domain Prediction
The human apolipoprotein B100 reference sequence (P04114) was acquired from the UniProt database [63]. Protein templates used in the study are available from the Protein Data Bank. Domain boundaries were predicted with DomPred (Protein Domain Prediction) and PSIPRED Server V4.0, searching against the database with a PSI-BLAST e-value cutoff of 0.01 and five iterations [24,25]. Because the human apoB-100 sequence is 4563 residues long, it is not feasible to generate the model in one step with current software and computer processing limitations. On the other hand, breaking the protein sequence into many small pieces can significantly reduce the accuracy of the models and make it difficult to dock the parts together. Therefore, several strategies were applied to determine the domain boundaries, combining software prediction with examination of secondary and tertiary structures at every step. Because the maximum sequence length that PSIPRED can process is 1500 residues, the entire sequence was divided into three fragments of 1500 residues each, starting from the first amino acid (1-1500, 1501-3000, 3001-4500).
In addition, the entire sequence was examined for domain boundaries in segments of 500 residues, starting from the first residue and sliding to the next possible domain boundary. For instance, the first segment covered residues 1-500, and the second segment covered residues 321-820 because position 320 is a potential domain boundary. The results from each method, together with the predicted secondary and tertiary structures, were then compared throughout the modeling process to optimize the results.
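The two segmentation schemes described above can be illustrated with a short sketch. The function names are hypothetical, and the boundary list is supplied by hand rather than predicted; the sketch only reproduces the residue ranges quoted in the text. Note that a plain 1500-residue split of a 4563-residue sequence also yields a short tail (4501-4563), which the text does not list among its three fragments.

```python
def fixed_fragments(seq_len, size=1500):
    """Consecutive 1-based, inclusive fragments of at most `size` residues."""
    return [(start, min(start + size - 1, seq_len))
            for start in range(1, seq_len + 1, size)]


def boundary_windows(boundaries, seq_len, window=500):
    """Windows of `window` residues, each starting right after a boundary.

    The first window starts at residue 1; every later window starts at the
    residue following a (here hand-supplied) candidate domain boundary.
    """
    starts = [1] + [b + 1 for b in sorted(boundaries) if b < seq_len]
    return [(s, min(s + window - 1, seq_len)) for s in starts]


APOB100_LENGTH = 4563  # residues, as stated in the text

print(fixed_fragments(APOB100_LENGTH)[:3])
# [(1, 1500), (1501, 3000), (3001, 4500)]
print(boundary_windows([320], APOB100_LENGTH))
# [(1, 500), (321, 820)]
```

In practice each new boundary would come from the prediction on the previous window, so the list passed to `boundary_windows` would grow one entry at a time.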

Database Search and Modeling
Human apoB-100 is a structurally heterogeneous molecule. Therefore, the template search and modeling strategy for each part varied according to the structures available in the database. The most successful protein structure prediction methods rely on identifying homologous templates with known structures [26,27]. However, a database search does not provide homologous templates for all regions. In those cases, servers based on fold recognition by threading were used to generate models. Different software, such as the MPI Bioinformatics Toolkit, I-TASSER, DEMO, RoseTTAFold, and Phyre2 web portals, was used to develop accurate models [26,27,38,48,64,65,66]. Homologous template searches and selections are critical for optimal alignment; hence, different parameters, including probability, E-value, and identity, were adjusted for each sequence search to extend the target sequence coverage without reducing accuracy. Several models were generated from the sequence alignments with homologous templates. Each model's secondary and tertiary structures were then examined and validated against experimental data, such as disulfide bonds and DSSO cross-links. Finally, the high-ranked models for all domains were selected and assembled (Figure 10).

Plasma Lipoprotein Separation by Sequential Ultracentrifugation
To fractionate lipoproteins (VLDL, LDL, HDL) by ultracentrifugation, a single batch of fresh whole plasma from a healthy donor was provided by the NIH blood bank. Lipoproteins were isolated from the fresh plasma sample by sequential potassium bromide density ultracentrifugation according to a published procedure [67]. The lipoprotein fractions were collected carefully after every ultracentrifugation step. Harvested samples were dialyzed in 10K MW cassettes against PBS buffer at 4 °C to remove potassium bromide. Collected samples were stored at 4 °C and used for cross-linking.

Cross-Linking Assay
Disuccinimidyl sulfoxide (DSSO), a mass spectrometry-cleavable cross-linker, was used to test and validate individual computational models. As described above, VLDL and LDL fractions were obtained by density gradient ultracentrifugation of plasma from a healthy donor. Dimethyl sulfoxide (DMSO) and 20 mM HEPES (4-(2-hydroxyethyl)-1-piperazineethanesulfonic acid; pH 7.5) were added to the samples to optimize the reaction according to the manufacturer's protocol. DSSO (SA246004, ThermoFisher Scientific®, Rockford, IL, USA) was prepared immediately before use by dissolving a 1 mg vial in 50 µL of DMSO. DSSO was then added to the samples, which were incubated at room temperature for one hour. The reaction was quenched with 2 µL of 1 M Tris buffer for 15 min at room temperature to neutralize excess free cross-linker, and the samples were then stored at 4 °C. The reaction products were run on a 4-12% SDS-PAGE gel to verify cross-linking.

Sample Preparation for Mass Spectrometry with Double Trypsin Digestion
Protein concentration in the samples was measured with a NanoDrop 1000 spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA). A total of 50 µg of protein (100 µL) from each cross-linked sample was used for further processing. Trypsin (V5111, Promega, Madison, WI, USA) was added to each sample at a protease:protein ratio of 1:50 (w/w) and incubated at 37 °C for one hour. Partially digested samples were transferred to glass tubes for delipidation with chloroform:methanol (2:1, v/v) and incubated on ice for 30 min. After cold methanol was added, the samples were centrifuged at 4000× g for 20 min at 4 °C. Supernatants were discarded, and the pellets were resuspended in cold methanol and centrifuged as above. Delipidated protein precipitates were resuspended in 100 µL of 8 M urea and diluted with 50 mM NH4HCO3 (pH 7.8) to a urea concentration of 1 M. Samples were reduced by adding 40 µL of 200 mM dithiothreitol (DTT) and incubating at 37 °C for 30 min, and then carbamidomethylated by adding 40 µL of 800 mM iodoacetamide (IA) and incubating for 30 min at room temperature in the dark. For the second digestion step, trypsin was added at a protease:protein ratio of 1:50 (w/w), and the samples were digested overnight at 37 °C. The digestion was terminated by adding formic acid. Samples were concentrated in a Speed-Vac to a final volume of 100 µL. Peptides were desalted and purified using C18 resin ZipTip® (ZTC18S096, Millipore Sigma, Burlington, MA, USA) according to the manufacturer's protocol, dried in a Speed-Vac, and stored at −20 °C. Samples were reconstituted in 20 µL of 0.1% formic acid and transferred into sample vials (C1411-13, Thermo Fisher Scientific, Waltham, MA, USA) for mass spectrometry.
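The quantities in this protocol follow from two simple relations, sketched below as a worked check rather than as part of the published method. The helper names are hypothetical, and the dilution assumes ideal mixing (C1·V1 = C2·V2) while ignoring the small volumes of reagents added later.

```python
def diluent_volume(stock_conc, stock_vol, target_conc):
    """Volume of diluent to add so that stock_conc * stock_vol = target_conc * total."""
    if not 0 < target_conc <= stock_conc:
        raise ValueError("target must be positive and not exceed the stock concentration")
    total = stock_conc * stock_vol / target_conc
    return total - stock_vol


def protease_mass(protein_ug, ratio=50):
    """Mass of protease (µg) for a 1:ratio (w/w) protease:protein digestion."""
    return protein_ug / ratio


# 100 µL of 8 M urea diluted to 1 M with 50 mM NH4HCO3
print(diluent_volume(8.0, 100.0, 1.0))  # 700.0 (µL of buffer to add)
# 50 µg of protein digested at 1:50 (w/w)
print(protease_mass(50.0))  # 1.0 (µg of trypsin per digestion step)
```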

Mass Spectrometry
Desalted tryptic peptides were analyzed by nanoscale liquid chromatography tandem mass spectrometry (nLC-MS/MS) on an Ultimate 3000 nLC coupled online to an Orbitrap Fusion Lumos Tribrid mass spectrometer (Thermo Fisher Scientific, Waltham, MA, USA). Peptides were separated on an EASY-Spray C18 column (Thermo Fisher Scientific, Waltham, MA, USA; 75 µm inner diameter × 50 cm, 2 µm particle size, 100 Å pore size). Separation was achieved with a linear gradient of 6 to 35% acetonitrile + 0.1% formic acid over 60 min. An electrospray voltage of 1.9 kV was applied to the eluent via the EASY-Spray column electrode. Eluting peptides were analyzed using a hybrid CID-MS/MS and CID-MS3 fragmentation scheme, as previously described (1). Briefly, MS scans were acquired in data-dependent mode on the Orbitrap (resolution = 120,000, maximum injection time = 50 ms, AGC target = 4.0 × 10⁵, scan range (m/z) = 375-1500). The duty cycle was restricted to 5 s. For each selected MS1 precursor (z = 4-8, dynamic exclusion = 30 s, intensity ≥ 2.5 × 10⁻⁴), CID fragmentation (normalized collision energy of 25%) was performed in sequential scans. MS/MS scans were recorded at an Orbitrap resolution of 30,000. A mass difference of 31.79 amu (DSSO reporter ion doublets) detected in CID-MS/MS was used to trigger up to four CID-MS3 rapid ion-trap analyses (normalized collision energy of 35%).
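The doublet-triggered MS3 logic described above can be pictured with a small sketch. It is an illustration only and does not reproduce the instrument's acquisition software: the function names, the tolerance, and the example masses are hypothetical, the values are treated as charge-deconvoluted masses, and the default mass difference is simply the figure quoted in the text and can be overridden.

```python
def find_reporter_doublets(masses, delta=31.79, tol=0.02):
    """Return (lighter, heavier) mass pairs separated by `delta` ± `tol`."""
    ordered = sorted(masses)
    pairs = []
    for i, light in enumerate(ordered):
        for heavy in ordered[i + 1:]:
            gap = heavy - light
            if gap > delta + tol:
                break  # the list is sorted, so later gaps only grow
            if abs(gap - delta) <= tol:
                pairs.append((light, heavy))
    return pairs


def ms3_targets(pairs, max_targets=4):
    """Flatten doublets into a list of ions and keep at most `max_targets`."""
    ions = [mass for pair in pairs for mass in pair]
    return ions[:max_targets]


# Hypothetical deconvoluted fragment masses containing two doublets.
fragments = [500.00, 531.79, 700.25, 732.04, 900.00]
doublets = find_reporter_doublets(fragments)
print(doublets)
print(ms3_targets(doublets))
```

Capping the target list at four mirrors the "up to four CID-MS3" limit in the text; how the instrument actually prioritizes ions is not specified here.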

Mass Spectrometry Data Analysis
The .raw file was analyzed using Proteome Discoverer 2.3 with XlinkX 1 v2.0 nodes and searched against an in-house curated LDL human protein database (60 proteins known to associate with LDL). The XlinkX algorithm identifies MS-cleavable cross-linkers, and the detection settings were as follows:

Protein Painting
Painted and unpainted apoB samples were prepared in triplicate from a single batch of LDL particles isolated from a healthy donor. Protein concentration was 1.7 mg/mL. LDL particles (10 µg per sample) were pulsed with 50 µL of a 1 mg/mL solution of freshly prepared covalent paint molecule Fast Blue B (Cat# D9805, Sigma Aldrich, St. Louis, MO, USA) for 5 min. Unpainted control samples were pulsed with 50 µL of PBS. Following painting, samples were passed over a pre-washed Sephadex G-25 gel filtration column (Cat# 11814397001, Sigma Aldrich, St. Louis, MO, USA) according to the manufacturer's directions. The gel-filtered samples were added to a glass test tube containing 500 µL of 2:1 chloroform:methanol and incubated on ice for 30 min. Following incubation, an additional 400 µL of ice-cold methanol was added, and the samples were mixed by vigorous pipetting, transferred to a 2 mL Eppendorf microcentrifuge tube, and centrifuged at 4000× g for 20 min at 4 °C. The supernatant was removed, and the pellet was washed and resuspended in 1 mL of ice-cold methanol and then centrifuged at 4000× g for 20 min at 4 °C. The resulting protein pellet was resuspended in 75 µL of 8 M urea. Samples were subsequently reduced with 10 mM DTT, alkylated with 50 mM iodoacetamide, and digested with trypsin at a 1:10 protease:protein ratio for 1.5 h. Digested peptides were desalted using Pierce C18 Columns (Cat# 89873, ThermoFisher Scientific, Waltham, MA, USA) according to the manufacturer's directions. Samples were analyzed via mass spectrometry as previously reported (1). Briefly, LC-MS/MS experiments were performed on an Orbitrap Fusion (Thermo Fisher Scientific, Waltham, MA, USA) equipped with a nanospray EASY-LC 1200 HPLC system (Thermo Fisher Scientific, Waltham, MA, USA). Peptides were separated on a reversed-phase PepMap RSLC C18 column (75 µm inner diameter × 15 cm, 2 µm resin; Thermo Fisher Scientific, Waltham, MA, USA).
The mobile phase consisted of 0.1% aqueous formic acid (mobile phase A) and 0.1% formic acid in 80% acetonitrile (mobile phase B). After sample injection, the peptides were eluted using a linear gradient from 5 to 50% B over 30 min, followed by a ramp to 100% B over an additional 2 min. The flow rate was set at 300 nL/min. The mass spectrometer was operated in data-dependent mode, in which one full MS scan (60,000 resolving power) from 300 Da to 1500 Da using quadrupole isolation was followed by MS/MS scans in which the most abundant molecular ions were dynamically selected by Top Speed and fragmented by collision-induced dissociation using a normalized collision energy of 35%. "Peptide Monoisotopic Precursor Selection" and "Dynamic Exclusion" (8 s duration) were enabled, as was charge state dependence, so that only peptide precursors with charge states from +2 to +4 were selected and fragmented by collision-induced dissociation. Tandem mass spectra were searched using Proteome Discoverer version 2.1 with SEQUEST against the NCBI Escherichia coli and Saccharomyces cerevisiae databases and a custom database containing sequences of the recombinant proteins, using tryptic cleavage constraints. Mass tolerance was 5 ppm for precursor ions and 0.06 Da for fragment ions. Data were analyzed with oxidation of methionine (+15.9949 Da) as a variable post-translational modification and carbamidomethylation of cysteine (+57.0215 Da) as a fixed modification. A 1% false discovery rate was used as the cut-off for reporting peptide spectrum matches (PSMs) from the database. Regions of paint coverage were identified as peptides present in all three unpainted samples and absent in all three painted samples [69].
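The paint-coverage rule in the last sentence is a set operation, sketched below under stated assumptions: the function name and the peptide labels are hypothetical placeholders, and each replicate is reduced to the set of peptides that passed the 1% FDR filter.

```python
def painted_peptides(unpainted_runs, painted_runs):
    """Peptides seen in every unpainted replicate and in no painted replicate."""
    if not unpainted_runs or not painted_runs:
        raise ValueError("need at least one replicate of each kind")
    in_all_unpainted = set.intersection(*(set(r) for r in unpainted_runs))
    in_any_painted = set.union(*(set(r) for r in painted_runs))
    return in_all_unpainted - in_any_painted


# Hypothetical peptide labels for three unpainted and three painted replicates.
unpainted = [{"pepA", "pepB", "pepC"}, {"pepA", "pepB", "pepC"}, {"pepA", "pepB"}]
painted = [{"pepA"}, {"pepA", "pepD"}, {"pepA"}]

print(sorted(painted_peptides(unpainted, painted)))  # ['pepB']
```

Here `pepB` counts as paint-covered because it is detected in all unpainted replicates and in none of the painted ones, while `pepC` is excluded because it is missing from one unpainted replicate.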

Conclusions
We developed a "divide and conquer" approach that uses PSIPRED software to divide apoB-100 into five subunits and 11 domains. Models of each domain were prepared using different structure prediction software. Then, 64 DSSO cross-links and eight disulfide bonds were used to validate each model experimentally. A total of 56 cross-links (87.5%) were within the 26 Å threshold, and all eight known disulfide bonds were paired and within the expected 5.6 Å constraint. Finally, the domains were combined into subunits, and the subunits were merged into a complete model. We also examined the dynamics of apoB-100 as it transitions from VLDL to LDL. As first suggested by J. Segrest [70], our data also indicate a significant folding difference in some regions of lipovitellin compared to apoB. In the proposed model, the base of subunit I is covered by the first 1000 residues, and residues 1500-2072 fold as a β-helix instead of a simple β-sheet as in lipovitellin. A significant factor in this change could be the difference in lipid size between LDL and VLDL (18-95 nm) and lipovitellin (4-6 nm). Therefore, two β-sheets may have evolved to form a β-helix with enough strength and flexibility to stabilize the larger lipoprotein particle. We also showed that subunit IV is made from three repeats in the form of seven-bladed β-propeller domains, which may enable apoB-100 to bind to various sites. According to the suggested model, ~24% of the entire structure is α-helix, 41% is β-strand, and 35% is coil. The two long coils in subunit II and subunit IV may account for the high flexibility of apoB-100. In addition, a computational model of subunit II domain 1 as a simple β-sheet did not match the DSSO cross-link data (Supplementary Figure S4). Furthermore, the "kink" region in the immunoelectron microscopy ribbon and bow model perfectly matched the subunit II domain 1 model [49].
Studies have shown that LDL and other apoB-100-containing lipoproteins not only constitute a significant atherosclerosis risk factor but also induce signaling in different tissues, including vascular smooth muscle and lung alveolar cells [3,5,6]. Additionally, the structural similarity between lipoproteins and hepatitis C virus (HCV) raises the possibility that this molecular mimicry by HCV plays an essential role in its life cycle [4]. Finally, this study represents the first use of the "divide and conquer" computational algorithm integrated with cross-linking. This approach may be valuable for analyzing other large and complex proteins.