Differences among Unique Nanoparticle Protein Corona Constructs: A Case Study Using Data Analytics and Multi-Variant Visualization to Describe Physicochemical Characteristics

Abstract: Gold nanoparticles (AuNPs) used in pharmaceutical treatments have been shown to effectively deliver a payload, such as an active pharmaceutical ingredient or image contrast agent, to targeted tissues in need of therapy or diagnostics while minimizing exposure, availability, and accumulation in surrounding biological compartments. Data sets collected in this field of study include some toxico- and pharmacodynamic properties (e.g., distribution and metabolism), but many studies lack information about adsorption of biological molecules or absorption into cells. When nanoparticles are suspended in blood serum, a protein corona cloud forms around their surfaces. The extent of the applications and implications of this cloud is unknown. Some researchers have speculated that the successful use of nanoparticles in pharmaceutical treatments relies on a comprehensive understanding of the protein corona composition. The work presented in this paper uses a suite of data analytics and multi-variant visualization techniques to elucidate particle-to-protein interactions at the molecular level. Corona proteins were identified from large and complex mass spectrometry datasets. With such high-output analyses, complex datasets pose a challenge when visualizing and communicating nanoparticle-protein interactions. Thus, the creation of a streamlined visualization method is necessary. A series of user-friendly data informatics techniques were used to demonstrate the data flow of protein corona characteristics. Multi-variant heat maps, pie charts, tables, and three-dimensional regression analyses were used to improve results interpretation, facilitate an iterative data transfer process, and emphasize features of the nanoparticle-protein corona system that might be controllable. Data informatics successfully highlights the differences between protein corona compositions and how they relate to nanoparticle surface charge.


Introduction
When nanoparticles (NPs) enter a biological fluid, such as serum, proteins and peptides adsorb onto the particle's surface, forming a nanoparticle-protein corona (NPPC) complex [1][2][3]. The protein cloud creates a new surface on the nanoparticle, which affects the interactions between the nanoparticle and the surrounding environment. The NPPC complex assumes the characteristics of a core-shell particle system, where the pristine particle is the "core" and the corona is the "shell" [4]. This creates a new interface between the pristine particle and the surrounding biological fluid. The formation of a protein corona has been shown to change the particle's biological effects [5][6][7][8]. In addition to this change in the particle's characteristics, the adsorbed proteins have the potential to undergo conformational changes. Changes in either set of properties are especially important for nanomaterial drug products [9][10][11][12][13]. The cloak of proteins around particles in biological fluids has been shown to enhance or diminish a particle's specificity to a desired location, incite or inhibit its ability to permeate into a cell, and heighten or mitigate a biochemical response. When a protein corona forms, the particle is more likely to distribute and become available because of the nano-bio interface that forms around its surface. This protein cloud can change the biodistribution of the nanoparticle system depending on the biomolecular composition formed around the surface. Consequently, if the drug relies on specific proteins binding to the surface of the nanoparticle, the particle could be translocated to an unintended location if non-targeted proteins adsorb onto the particle.
Research efforts in the field of nanotoxicology have strived to relate (or correlate) particle properties to adverse health effects. There now exists a substantial amount of published literature on this topic; however, there is no general consensus on identifying physicochemical characteristics as predictors of toxicological responses. Many different nanoparticles, such as anatase titania, alpha-quartz silica, and gold colloids, have distinct crystal structures as opposed to amorphous materials of similar chemical composition. Notably, crystal structure has been associated with increased cytotoxicity in mammalian cell populations [14][15][16][17]. Nanoparticle size has been implicated as a main driver in systemic distribution [15][18][19][20]. Surface charge is related to cellular uptake mechanisms [21][22][23][24]. However, for every paper published that demonstrates a direct or proportional relationship between particle properties and an induced biological effect, there are others that report a lack of association [25][26][27][28].
Toxicologists who study engineered nanomaterials and nanomedicines have been interested in complex mixtures under both environmentally- and physiologically-relevant conditions in order to decipher the role of physicochemical properties in induced mechanisms of toxic action. These analytical interrogations require multi-level data collected on germane parameters. In the NPPC research space, some of these parameters include particle composition, size, and charge; corona source, composition, and mass; and NPPC activity, toxicity, and ADME properties. New data-gathering methods, such as mass spectrometry techniques, produce large data sets that are sometimes complicated, multifaceted, and often non-comparable. Scientific data visualization is a powerful methodological tool for enabling the understanding of multi-dimensional data sets. The purpose of this study was to demonstrate the utility of data visualization as a method for identifying trends in NPPC physicochemical properties among six unique samples. Data visualization has been shown to play a crucial role by helping scientists and practitioners interpret underlying patterns in their data and apply the new knowledge to pose the next set of research questions [29].
Specific to nanotoxicology research, understanding the composition of the protein corona associated with gold nanoparticles (AuNPs) has the potential to improve nano-enabled drug product development [30,31]. Gold nanoparticles are important because they offer the opportunity to advance pharmaceutical treatments by delivering payloads directly to target tissues. For example, in Ghosh et al., hybrid AuNP polymers were used as transfection vectors in kidney cells [9][32][33][34][35][36][37]. Quantitative and discovery proteomics of the gold nanoparticle protein corona have enhanced physicochemical characterization [38][39][40][41]. The results presented in this manuscript serve two purposes. First, the data sets expand upon the available data in the protein corona field to assist with its exploitation in pharmaceutical drug development. Second, the data visualization highlights the varying interpretations that can be gained after displaying the same data in different forms. Therefore, the visualization and presentation of results are of relevance to toxicologists and pharmacologists alike.

Materials and Methods
The approach to analyzing large sets of proteomic data relevant to nanotoxicological studies needs to be streamlined. The approach used in this study includes data collection, storage, and organization (Figure 1).
Figure 1. The flow of data from collection to interpretation. After the conclusion of an experiment, data are collected. Within the first stage, sets of data are stored and organized locally or on the cloud in spreadsheets. Second, data are normalized relative to controls. Third, descriptive statistics are used to demonstrate mean and standard deviation of data. Fourth, utilizing statistical analysis, patterns are identified and categorized. Fifth, data management software is used to represent patterns gathered from statistical analysis in the form of figures and charts. Lastly, data are communicated to scientists and non-scientists through publications.
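The normalization and descriptive-statistics stages of this data flow can be sketched in a few lines of Python. This is only an illustrative sketch, not the Excel workflow used in the study: it assumes that "normalized relative to controls" means dividing each raw value by a control value, and the function names (`normalize_to_control`, `describe`) and all numbers are hypothetical.

```python
from statistics import mean, stdev


def normalize_to_control(raw_values, control_value):
    """Express each raw measurement as a fraction of the control measurement.

    Assumption: normalization is a simple ratio to one control value.
    """
    if control_value == 0:
        raise ValueError("control value must be non-zero")
    return [value / control_value for value in raw_values]


def describe(values):
    """Return (mean, sample standard deviation) for a list of replicates."""
    return mean(values), stdev(values)


# Hypothetical triplicate abundances for one protein and a hypothetical control;
# these numbers are invented for illustration and are not study data.
replicates = [120.0, 100.0, 110.0]
control = 100.0

normalized = normalize_to_control(replicates, control)
avg, sd = describe(normalized)
print(f"normalized = {normalized}, mean = {avg:.2f}, SD = {sd:.2f}")
```

The sample standard deviation (n - 1 denominator) is used here because replicate counts in such experiments are typically small; a different convention may be appropriate for other designs.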

Protein Corona Sample Preparation
In this study, three different nanoparticle coatings were subjected to two different serum types and incubated for 24 hours. Gold nanoparticles of 30 nm were purchased from NanoHybrids (Austin, TX, USA). Particles were placed in aqueous dispersions with one of the following coatings: polyethylene glycol (PEG), carboxylic acid (COOH), or amine (NH3). The primary particle size of each nanoparticle was measured as 30 ± 2 nm using transmission electron microscopy (TEM). The coatings provided a range of surface charges, with PEG producing a neutrally charged surface, COOH producing a negatively charged surface, and NH3 producing a positively charged surface. This variance assisted the determination of the effects of surface charge on the formation of the protein corona. The serum types used included fetal bovine, termed serum-type 1 (ST1), and equine, termed serum-type 2 (ST2; Equitech-Bio Inc., Kerrville, TX, USA). Particles were incubated with serum at a 1:1 ratio at 37 °C for 24 hours. Coronas were separated from particles in two steps. First, the particle-corona suspension was centrifuged to remove excess proteins not adhered to particle surfaces. Second, the proteins were removed from the particle surface using a lysis buffer of 8 M urea at 95 °C for 5 minutes and then centrifuged to separate the proteins from the nanoparticles. Figure 2 shows a representative TEM image of the gold nanoparticles before serum incubation (Figure 2A) versus the gold nanoparticles after 24-hour incubation with serum-type 1 (Figure 2B). Briefly, samples were deposited on a 200-mesh carbon-coated copper grid (EMS, Hatfield, PA, USA), left to dry at room temperature for 5 minutes, and washed with PBS twice. TEM images were collected using a JEM-1010 (JEOL Inc., Peabody, MA, USA) microscope at 60 kV with a spot size of 2 [42,43].
To determine the average corona diameter, 100 nanoparticles imaged in TEM micrographs were measured using Cell Sens version 1.13 from Olympus Corp (Center Valley, PA, USA). The hydrodynamic diameter was determined through dynamic light scattering (DLS), along with the polydispersity index (PdI), where NPPC with serum-type 1 (ST1) was determined to be highly monodisperse in the particle population as indicated by a low PdI of 0.144, whereas NPPC with serum-type 2 (ST2) was determined to be moderately monodisperse in particle population as indicated by a PdI of 0.483.

Mass Spectrometry Data Collection
To identify and quantify the composition within each corona, proteins were removed from the particle surfaces after incubation. Once proteins were isolated, samples underwent tryptic digestion and were analyzed on liquid chromatography tandem mass spectrometry (LC-MS/MS) [44]. Proteins were identified and quantified using Progenesis QI for proteomics v4.1 (Nonlinear Dynamics, Waters Corporation, Milford, MA, USA).

Data Flow
Using tool sets available in Microsoft Excel (Redmond, WA, USA, 2016), the following method was used to analyze data. Excel was used to compute graphical representation formulas that produced a series of graphs, charts, and tables. Visualized data summarized the results obtained from human-defined clustering. Here, seven different graphical representations were used: conditional formatting, two-dimensional (2D) stacked column charts, 2D pie-of-pie charts, histograms, and combo charts (of scatter and bar). After LC-MS/MS raw data collection, empirical values were stored and organized into clusters based upon characteristics such as nanoparticle coating, NPPC size, serum-type, or protein abundance, mass, or identification. Subsequent analyses involved the clustering of protein data based on molecular or biological functions, such as transcription activity, actin-binding tendencies, or detoxification initiation.
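The human-defined clustering described above amounts to grouping records by a chosen characteristic. The study performed this step in Excel; the following Python sketch only illustrates the idea. The record layout, the helper name `cluster_by`, and every abundance value are hypothetical and are not study data.

```python
from collections import defaultdict

# Hypothetical records: (protein, coating, serum type, normalized abundance).
# Protein names come from the text; the abundance values are invented.
records = [
    ("CALL_DROME", "NH3", "ST1", 0.40),
    ("MED22_CHICK", "NH3", "ST1", 0.35),
    ("CALL_DROME", "COOH", "ST2", 0.60),
    ("MARCS_RAT", "COOH", "ST2", 0.05),
]


def cluster_by(rows, key_index):
    """Group records into clusters keyed by the chosen characteristic."""
    clusters = defaultdict(list)
    for row in rows:
        clusters[row[key_index]].append(row)
    return dict(clusters)


by_coating = cluster_by(records, 1)  # cluster on nanoparticle coating
by_serum = cluster_by(records, 2)    # cluster on serum type

for coating, members in by_coating.items():
    total = sum(row[3] for row in members)
    print(coating, [row[0] for row in members], round(total, 2))
```

The same helper can cluster on any other column (for example, a molecular-function label) once that column is added to each record.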

Results
The most commonly used type of figure in proteomic studies is a table that displays high outputs of data in a tabular format. Although this may be the most comprehensive method that lists the most parametric outputs, it is technically not a visualization, nor is it conducive to quick but meaningful interpretation. Interpreting the results of such tables requires time, explanations, and a substantial amount of previous experience in proteomic analyses. Table 1 presents a traditional tabular format of the proteomic data set collected in this study. Information from the table ranges from the number of peptides to the molecular function of each corona protein found. Any information can be added to the table, which increases the versatility of the data, but it is not a representation of the findings that is easily visualized. Trends and correlations are difficult to obtain quickly from a table, so a table may not be the best format to report data where multiple parameters are required to identify patterns. Tabular format is also limited by the number of rows and columns in a readable print-out. Overall, this method displays rapid information like sample name, type, and other simple parameters in the experiment.

Table 1. Proteins found in the corona of the AuNP-Amine-ST1 sample. The most common method to display high outputs of data in a figure is a traditional table. Although this may be the most comprehensive method, it is not a visualization, nor is it simple for non-scientists to interpret. For the functional data, only the most common function is noted.
Bar graphs are an essential figure used for comparisons between different samples or parameters within a sample set. There are a limitless number of parameters that can be compared using a bar graph, but this method is limited to plots using a single x-axis (independent variable) and a single y-axis (dependent variable). Often, double y-axes add an additional variable to the system, as long as the x-axis is constant between differing y-value parameters. Figure 3 shows two different bar charts that compare each protein from the corona against the normalized relative abundance. The chart presents the amount of each identified protein in each of the NPPC samples. With this visualization technique, each protein, serum-type, or nanoparticle surface coating is compared. For instance, the gold nanoparticle with the NH3 coating incubated in serum-type 2 had the most MARCS_RAT (Myristoylated alanine-rich C-kinase substrate) protein. In contrast, ACDA1_METTE (Acetyl-CoA decarbonylase/synthase complex subunit alpha 1) protein was the most abundant component in the corona of the COOH-coated gold nanoparticle incubated in the same serum-type 2. This method quantitatively compares protein abundance among multiple parameters.
Figure 3. Protein abundance percent in sample. (Top) Each protein is represented by its own normalized abundance percentages. For example, MARCS_RAT appears in all NPPC samples, but seems to be the most abundant protein in the amine-coated AuNP with serum-type 2 shell. (Bottom) Each NPPC is represented with its own normalized abundance percentages. For example, all NPPC samples contain multiple proteins, but amine-coated AuNP with serum-type 2 shell seems to be the most diverse.
Figure 4 includes six pie graphs and comprehensively illustrates the total protein composition within each sample. Samples are categorized by nanoparticle coating and serum type. This method distinguishes the most abundant proteins among each of the coating-serum combinations using the average normalized abundance. Simultaneously, the figures depict the variations among each of the coating-serum combinations. This enables the comprehensive comparison of the average normalized abundance of each sample and the contrast of the protein composition between each combination [45][46][47][48][49][50]. Beginning with the three pie graphs representing samples incubated in serum-type 1, the corona from the NH3-coated particle is primarily composed of CALL_DROME (Calmodulin-related protein 97A) and MED22_CHICK (Mediator of RNA polymerase II transcription subunit 22). Similarly, the most abundant proteins in the PEG-ST1 combination are CALL_DROME and MED22_CHICK; however, these proteins are less abundant compared with NH3-ST1. These first two pie graphs indicate a commonality among the protein corona compositions specifically for serum-type 1. However, in the COOH-ST1 sample, whereas MED22_CHICK is the most abundant protein in the sample, CALL_DROME is one of the least. This observation indicates that the COOH coating (negatively charged surface) may alter the protein composition to a higher degree than the NH3 (positively charged surface) and PEG (neutrally charged surface) coatings.
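The bar and pie charts discussed here both rest on expressing each protein as a percentage of the total abundance in its sample. A minimal Python sketch of that calculation follows. The function name `percent_composition` and the abundance values are hypothetical (the protein names are taken from the text), and the study itself computed these quantities in Excel.

```python
def percent_composition(abundances):
    """Convert raw abundances for one sample into percentages that sum to 100."""
    total = sum(abundances.values())
    if total <= 0:
        raise ValueError("sample has no measurable abundance")
    return {protein: 100.0 * value / total for protein, value in abundances.items()}


# Hypothetical abundances for a single coating-serum combination;
# the numbers are invented for illustration and are not study data.
sample = {"CALL_DROME": 50.0, "MED22_CHICK": 30.0, "MARCS_RAT": 20.0}

shares = percent_composition(sample)
top_protein = max(shares, key=shares.get)  # wedge with the largest share
print(shares, "most abundant:", top_protein)
```

Running the same function over each of the six coating-serum combinations would give the per-sample percentages that a pie chart displays, while plotting one protein's percentage across samples corresponds to the bar-chart view.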
The three pie graphs representing samples incubated in serum-type 2 illustrate a more distributed composition for NH3- and PEG-coated nanoparticles. In both the NH3-ST2 and PEG-ST2 samples, MED22_RAT, CALL_DROME, and MED22_CHICK proteins are equally abundant. This observation indicates that these proteins may play a significant role in particle absorption in serum-type 2. However, in the COOH-ST2 sample, the CALL_DROME protein is by far the most abundant, with MARCS_RAT present in a small amount. As with the ST1-incubated samples, the COOH coating (negatively charged surface) forms a different protein corona (in terms of composition) than the NH3 (positively charged surface) and PEG (neutrally charged surface) coatings.
The two most common proteins found in all coronas were CALL_DROME (calmodulin-related protein 97A) and MARCS_CHICK (myristoylated alanine-rich C-kinase substrate). CALL_DROME regulates a range of normal cellular functions such as inflammation, metabolism, apoptosis, and immune response. The protein specifically functions as a calcium-binding messenger that modifies interactions with different target proteins like phosphatases or kinases. When calcium binds to calmodulin, a large structural change occurs. Electrophiles are needed to sequester the readily available electrons that influence the binding affinity for the positively charged surface functionalization on the nanoparticles. MARCS is a common protein found in the cytoskeleton that performs a series of important functions including binding actin, calmodulin, and synapsin. This protein is in a class that strongly impacts cell motility, secretion, and shape. This class of proteins also participates in cell cycle regulation, neural development, and transmembrane transport. These proteins play a part in sequestering acidic membrane phospholipids like PIP2, which is vital in protein signaling.
Both proteins play important roles in the maintenance of normal cellular functions in eukaryotic cells. However, in the presence of nanoparticles, a large portion of these proteins adsorb onto particle surfaces. When these proteins adsorb onto nanoparticle surfaces, their structure changes and their function is lost. It is important to know which proteins bind to the surface and lose structural and functional capabilities, especially when nanoparticles are used in therapeutic applications.
Similar to the bar chart in Figure 3, the pie graphs in Figure 4 visually compare the abundance of each protein for all six nanoparticle-protein corona samples. The difference between the two figures is that the bar chart is used for quantitative analyses whereas the pie graph lends itself to qualitative analyses. The latter method best displays the different constituents of each sample so that trends in protein content with either serum-type or nanoparticle surface coating are noticeable. One trend demonstrated by the pie graphs is the increased abundance of protein MED22_CHICK in the samples incubated in ST1 compared to those incubated with ST2. A benefit of this method is the ability to easily see parts of the whole, i.e., the abundance of each protein in all samples. This method is limited by the sample size: if there are too many different proteins, it may be difficult to differentiate one protein from another. With a smaller sample size, this method works to compare individual samples.
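One common way to keep a pie chart readable when many proteins are present is to pool minor components into a single wedge. The sketch below illustrates this under stated assumptions: the 5% cutoff, the function name `pool_minor_slices`, and the placeholder protein labels and percentages are all hypothetical and do not come from the study.

```python
def pool_minor_slices(shares, threshold=5.0, label="other"):
    """Combine every slice below `threshold` percent into one pooled wedge."""
    major = {name: pct for name, pct in shares.items() if pct >= threshold}
    minor_total = sum(pct for pct in shares.values() if pct < threshold)
    if minor_total > 0:
        major[label] = major.get(label, 0.0) + minor_total
    return major


# Hypothetical percentage shares for one sample (placeholder protein labels).
protein_shares = {
    "protein_A": 60.0,
    "protein_B": 30.0,
    "protein_C": 4.0,
    "protein_D": 3.0,
    "protein_E": 3.0,
}

pooled = pool_minor_slices(protein_shares)
print(pooled)
```

The cutoff is a presentation choice: a higher threshold yields fewer, clearer wedges at the cost of hiding more of the minor constituents.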
The Pie-of-Pie graph, shown in Figure 5, enables the visualization of proteins identified by their molecular function and subsequent protein composition. This figure supports interpretation as it relates to biological relevance and assists in the design of future toxicological studies by narrowing the scope of pathway analyses. The first pie graph illustrates four primary functions categorized as transcription, actin-binding, toxin activity, and other. Presenting data in this manner does not compromise space by expending excessive detail on smaller groups that may not be relevant to the study (i.e., the use of the 'other' category combines lesser functions into a single wedge in the entire pie). Pie-of-Pie is not limited to just two graphs; the data can be extended into many other pie graphs that detail the composition of other functional groups noted in the molecular function graph, i.e., actin-binding and toxin activity. The second pie graph depicts the individual proteins that comprise the most abundant transcription functional group proteins identified in the NH3-ST1 combination. In this sample, the MED22_CAEEL protein is the most abundant among the transcription proteins (73%).
In some datasets, a figure needs to compare multiple parameters that are related in an experiment. In Figure 6, the abundance of protein is compared with the average NPPC complex diameter. Here, the x-axis is the serum-type and the y-axis is the normalized abundance of protein. The color blocks within each column represent the different nanoparticle surface functionalizations/charges. When comparing the columns, the gold nanoparticles coated with PEG seem to have adsorbed more proteins from serum-type 2 than from serum-type 1. Conversely, the gold nanoparticles coated with amine and carboxyl functional groups adsorbed more proteins from serum-type 1 than from serum-type 2.
In addition to assessing the amount of protein adsorbed onto particle surfaces, this plot indicates the differences in NPPC size relative to serum type (as indicated by the white diamond markers). On average, the diameter of NPPC complexes was larger when particles were incubated in serum-type 2 (109 nm) than in serum-type 1 (79 nm). Since the ST1-incubated particles contain more proteins but have a smaller diameter, we concluded that incubation with ST1 creates a more densely packed protein corona shell. TEM images shown in Figure S1 provide visual evidence of this phenomenon.

Figure 6. The hybridized abundancy figure demonstrates multiple parameter results, including coating composition, serum type, abundance of proteins for each sample, average corona size for that serum, and protein coating abundancy distribution in each serum type. This method is the most comprehensive because it displays multiple variables in one figure. It is recommended for summaries of bigger-picture data that compare multiple parameters.
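The layout of a hybrid column plot of this kind can be sketched as a small data-preparation step: within each serum type the coating segments are stacked by accumulating their abundances, and the mean diameter is kept as a separate series for a secondary axis. This is only an illustration under stated assumptions. The function name `stack_segments` and all abundance values are hypothetical; only the two mean diameters (79 nm and 109 nm) come from the text.

```python
def stack_segments(abundance_by_serum, coating_order):
    """Return (serum, coating, bottom, height) tuples for stacked columns."""
    segments = []
    for serum, by_coating in abundance_by_serum.items():
        bottom = 0.0
        for coating in coating_order:
            height = by_coating.get(coating, 0.0)
            segments.append((serum, coating, bottom, height))
            bottom += height  # the next color block starts where this one ends
    return segments


# Hypothetical normalized abundances (not values from the study).
abundance_by_serum = {
    "ST1": {"NH3": 0.5, "PEG": 0.125, "COOH": 0.5},
    "ST2": {"NH3": 0.25, "PEG": 0.25, "COOH": 0.25},
}
# Mean NPPC diameters reported in the text, for the secondary-axis markers.
mean_diameter_nm = {"ST1": 79, "ST2": 109}

segments = stack_segments(abundance_by_serum, ["NH3", "PEG", "COOH"])
```

Each tuple maps directly onto one color block of a stacked column, so the same list could be passed to whichever plotting tool is used.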
In bottom-up proteomics methodology, the use of high-performance liquid chromatography (HPLC) is essential to separate peptides before analysis by mass spectrometry. The most common separation method is reverse phase chromatography, where the HPLC separation column is packed with C18 (particles bonded with 18-carbon alkyl chains) or another non-polar stationary phase. The mobile phase is a gradient from polar to non-polar solvents. In Figure S2, the peptides analyzed in all samples were plotted against retention time to show the polarity distribution of the peptides. Most of the peptides eluted early in the gradient, i.e., ~36-56 min, indicating that most of the peptides that compose this protein corona are relatively polar. Plotted this way, the data can be used to compare the polarity of peptides in each corona. Sample number and peptide abundance are not limitations of this figure, but inconsistent histogram bins can introduce discrepancies: when analyzing multiple samples, the bins need to be identical for the histograms to be comparable. The data from this study show that most peptides in the protein coronas of all six samples are in the polar region.
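The requirement for consistent bins can be illustrated with a short sketch in which one set of bin edges is derived from the pooled retention times and then reused for every sample. The function names, the choice of equal-width bins, and the retention times below are assumptions made for illustration and are not data from the study.

```python
from bisect import bisect_right


def shared_bin_edges(samples, n_bins):
    """Equal-width bin edges spanning the pooled range of all samples."""
    pooled = [t for times in samples.values() for t in times]
    lo, hi = min(pooled), max(pooled)
    if hi == lo:
        raise ValueError("retention times must span a non-zero range")
    width = (hi - lo) / n_bins
    # Pin the last edge to the exact maximum to avoid floating-point drift.
    return [lo + i * width for i in range(n_bins)] + [hi]


def histogram(values, edges):
    """Count values per bin; the last bin includes its right edge."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        if v < edges[0] or v > edges[-1]:
            continue  # outside the shared range
        idx = min(bisect_right(edges, v) - 1, len(counts) - 1)
        counts[idx] += 1
    return counts


# Hypothetical retention times in minutes (not values from the study).
samples = {
    "NH3-ST1": [36.0, 41.5, 44.0, 52.0, 70.0],
    "PEG-ST2": [30.0, 38.0, 47.5, 55.0],
}
edges = shared_bin_edges(samples, n_bins=4)
counts = {name: histogram(times, edges) for name, times in samples.items()}
```

Because both samples are counted against the same edges, their histograms can be compared bin by bin.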
The last data visualization method used in this study is a heat map (Table 2). The table ranks the abundance in all six samples from lowest (red) to highest (green). Given the comprehensive nature of the heat map, this format enables the inclusion of multiple parameters, including protein abundance, nanoparticle coating, and serum-type. The color gradient creates a visual focus on the most abundant proteins in each sample type, while also comparing each protein against the other coating-serum combinations. Table 2 shows that CALL_DROME, MED22_CAEEL, and MED22_HUMAN can be quickly identified as the most abundant proteins among all the samples. Together, these three proteins may play a large role in the formation of the protein corona.

Table 2. Heat map of the average normalized abundance for each serum/coating combination, illustrated through a color scale with red as the least abundant and green as the most. The map conveys the abundance of a protein type or functional group in a sample across multiple coatings or sera. However, this method does not consider variations in incubation time. ST1 refers to serum-type 1 and ST2 refers to serum-type 2.

Abbreviation | NH3-ST2 | NH3-ST1 | PEG-ST2 | PEG-ST1 | COOH-ST2 | COOH-ST1
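The red-to-green ranking used in a heat map of this kind can be sketched as a linear color scale applied to every cell. This is a minimal illustration that assumes a two-color gradient scaled to the global minimum and maximum of the table; the software and exact scale used for Table 2 are not stated, and the function names and abundance values below are hypothetical.

```python
def abundance_to_rgb(value, vmin, vmax):
    """Map one value to a linear red (lowest) to green (highest) color."""
    if vmax == vmin:
        frac = 0.5  # flat table: use the midpoint color everywhere
    else:
        frac = (value - vmin) / (vmax - vmin)
    frac = max(0.0, min(1.0, frac))
    return (round(255 * (1 - frac)), round(255 * frac), 0)


def heat_map_colors(table):
    """Color every cell against the global minimum and maximum of the table."""
    cells = [v for row in table.values() for v in row.values()]
    vmin, vmax = min(cells), max(cells)
    return {
        protein: {sample: abundance_to_rgb(v, vmin, vmax)
                  for sample, v in row.items()}
        for protein, row in table.items()
    }


# Hypothetical normalized abundances (not values from Table 2).
table = {
    "protein_A": {"NH3-ST1": 0.0, "PEG-ST1": 0.5},
    "protein_B": {"NH3-ST1": 1.0, "PEG-ST1": 0.25},
}
colors = heat_map_colors(table)
```

Scaling against the whole table rather than each row keeps colors comparable across coating-serum combinations, which is what lets the most abundant proteins stand out at a glance.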

Discussion
The use of graphical methods for categorical data has permeated the peer-reviewed literature in many Science, Technology, Engineering, and Mathematics (STEM) related fields [51][52][53][54]. Some fields, such as those in the 'omics arena, require advanced computational methods to organize complex quantitative data sets [51]. Other analytical fields, such as statistical sciences, have contributed to the development of a data visualization industry; major players can be described as solution providers of either visual analytics or reporting tools [55,56]. Designing effective graphics is generally regarded as an art informed by large sets of empirical data [57][58][59].
When relating (or correlating) particle properties to adverse health effects, it is important to define the parametric space in which the experimental data set belongs. In toxicological data, the entire space can be defined as the biological (or biochemical) effects of an exogenous toxicant (or endogenous toxin) on an in vivo or in vitro test system. However, an investigator can also define a specific space, such as the composition of the NPPC construct. In the former, the mass spectrometry data used in this paper reveal only a small number of the critical attributes needed to define the possible effects of gold nanoparticle exposure on human health. In the latter, the mass spectrometry data visualized herein define the composition of the NPPC construct completely and lend themselves to interpretation that promotes the next set of pathway-specific toxicity studies.
Specific to 'omics data, modern nanotoxicology experimental designs incorporate genomics, proteomics, and metabolomics into the already tedious process of nanomaterial physicochemical characterization, dose-response relationships, exposure dosimetry, and mechanistic analyses [60][61][62][63]. For example, genomic data provide detailed information about specific cellular pathways triggered after exposure. Such data include peptide sequence, length, and density. Metabolomics data aid in assembling biotransformation reaction schemes of degraded xenobiotics and include not only metabolites, but also information about the parent molecules [64,65]. Similarly, proteomics data contribute to the characterization of normal cell and tissue physiological processes [66,67]. This paper focused on the different ways in which data collected from proteomic mass spectrometry analyses can be presented, visualized, and communicated. The proteins analyzed are those that form a protein corona shell around a gold nanoparticle core. Our results show that the composition, density, and size of the protein corona vary when the nanoparticle zeta potential changes, as well as when the serum-type changes. Therefore, it is reasonable to expect that the protein corona formed on the surface of nanomaterial drug products (i.e., nano-drugs, nano-additives, or nano-carriers) may differ between drug products and between individual patients.
A suite of visualization techniques was used to communicate the results. Data visualization refers to the techniques used to organize large data sets and communicate results [51,59]. Some of these massive data sets are measured in gigabytes (10^9 bytes) and terabytes (10^12 bytes) [65,68]. Until recently, most toxicological data sets have been held to a manageable sample size (with one variable measured over five or more doses, time points, and replicates) due to limitations in metrology, computational power, and resources [69]. Advanced data analytics and multi-variant visualization have not pervaded the toxicology literature. Today, proteomic data are becoming a routine endpoint in toxicological assessments [60][61][62][63][66,67,70]. Therefore, the inclusion of this type of data has created the opportunity for more comprehensive analysis, and thus the critical need to visualize the variability of multiple parameters and endpoints in a single graphical representation [71].
Adding multiple independent variables (such as dose, time, and sample-type) to toxicological experimental designs presents challenges when reporting and communicating data. This is not unique to input variables; increasing the number of output variables or endpoints also complicates the reporting process. The advent of 'omics techniques, such as multiplexing, microarray analyses, and RNAseq, facilitates the collection of multiple endpoints within a single biological test system. The plethora of data collected at the completion of these types of studies presents challenges with data storage, cleaning, normalization, organization, processing, statistical analyses, pattern recognition, modeling, and communication [72,73].

Conclusions
When descriptive analyses are related to functional ones, the health effects of the protein corona can be determined using mass spectrometry tools only if the results can be visualized in a way that supports meaningful interpretation. A successful flow of information that bridges data collection, visualization, and results communication is needed to enable a higher degree of user interaction. Further developments in this workflow will include extension to more sophisticated software and the use of novel statistical analysis tools.