Detailed Source-Specific Molecular Composition of Ambient Aerosol Organic Matter Using Ultrahigh Resolution Mass Spectrometry and 1H NMR

Organic aerosols (OA) are universally regarded as an important component of the atmosphere that have far-ranging impacts on climate forcing and human health. Many of these impacts are related to OA molecular characteristics. Despite the acknowledged importance, current uncertainties related to the source apportionment of molecular properties and environmental impacts make it difficult to confidently predict the net impacts of OA. Here we evaluate the specific molecular compounds as well as bulk structural properties of total suspended particulates in ambient OA collected from key emission sources (marine, biomass burning, and urban) using ultrahigh resolution mass spectrometry (UHR-MS) and proton nuclear magnetic resonance spectroscopy (1H NMR). UHR-MS and 1H NMR show that OA within each source is structurally diverse, and the molecular characteristics are described in detail. Principal component analysis (PCA) revealed that (1) aromatic nitrogen species are distinguishing components for these biomass burning aerosols; (2) these urban aerosols are distinguished by having formulas with high O/C ratios and fewer aromatic and condensed aromatic formulas; and (3) these marine aerosols are distinguished by lipid-like compounds of likely marine biological origin. This study provides a unique qualitative approach for enhancing the chemical characterization of OA necessary for molecular source apportionment.


Introduction
Organic matter (OM) comprises a significant portion of total aerosol mass, as much as 90% in certain areas [1,2], and is generated by a number of anthropogenic and biogenic emission sources. Organic aerosol (OA) compounds, once emitted into the atmosphere as primary OA or formed in situ as secondary OA (SOA) from gas-phase precursors, can undergo a myriad of atmospheric reactions forming new compounds that have different chemical structures and associated physical properties. The immense complexity of OA contributes to the difficulty in understanding the net impacts OA has on human health, biogeochemical cycling, and net radiative forcing.
The composition and relative concentrations of OA are expected to vary spatially and temporally due to differences in emission inputs and in the extent to which OA are transformed in the atmosphere by secondary aging processes [1]. The molecular composition of OA resulting from these emissions and aging processes will, in part, determine its impacts on, e.g., aerosol hygroscopicity [3,4], light absorption [5], and biogeochemical cycling upon deposition [6,7]. The ability to apportion molecular compositions and impacts to specific aerosol emission sources and aging processes is therefore an important goal of the atmospheric community. Many studies have addressed similarities and differences of OA molecular composition among various aerosol sources, as well as seasonal and diurnal variability [6,8-12], but considerable work remains for a full understanding of this complex problem.
Fortunately, the adoption of new powerful analytical techniques such as nuclear magnetic resonance spectroscopy (NMR) and ultrahigh resolution mass spectrometry (UHR-MS) has enabled the characterization of important OA molecular and structural details. Solution-state proton NMR (1H NMR) enables the characterization of soluble extracts of OA without extensive sample preparation and has been thoroughly reviewed [13,14]. A 1H NMR spectrum can be analyzed to determine the relative contributions of major proton groups (e.g., alkyl protons) within OA, and also to identify specific compounds such as acetate, methanesulfonic acid, and levoglucosan that feature as sharp peaks in a spectrum [6,15-18]. Where NMR succeeds in providing these general structural details, it is unable to provide specific molecular details due to the overlapping signals from hydrogens associated with atmospherically relevant functional groups, and spectral interpretations must be simplified.
Pairing NMR with UHR-MS, which provides detailed molecular composition but only limited structural information, is therefore an attractive approach. UHR-MS, particularly Fourier transform ion cyclotron resonance mass spectrometry (FTICR-MS), allows for the determination of molecular formulas for thousands of high-molecular weight (>200 Da) compounds present within a single sample, providing important chemical properties of OA. Its ultrahigh resolution (~500,000 over m/z 200-800) and mass accuracy (<1 ppm) can be used to obtain vital fingerprints, in the form of specific related molecular compound classes, that may be diagnostic for OA from specific emission sources and/or that have undergone molecular transformations. Numerous studies have used UHR-MS to reveal the molecular details of atmospheric OM [17,19-33]. UHR-MS techniques are generally regarded as non-quantitative, and structure can only be inferred from the molecular formula information, but when used in tandem with NMR, extensive molecular and structural information can be achieved with minimal sample preparation. Such a pairing can provide fingerprints for the qualitative apportionment of OA molecular features that determine OA impact in the atmosphere and depositional environments.
The work presented here pairs these two powerful techniques (UHR-MS and 1H NMR) to provide source-specific molecular characteristics for ambient aerosol samples from key anthropogenic and biogenic emission sources that can be used to aid more traditional source apportionment studies. The emission sources chosen here (marine, mixed source, biomass burning, urban) have significant regional and global quantitative importance on atmospheric aerosol loadings. Water-soluble and pyridine-soluble extracts for multiple samples from each source were evaluated for molecular characteristics using FTICR-MS and 1H NMR. Compositional differences were elucidated using the chemometric approach, principal component analysis (PCA), to identify characteristic molecular features for each source type. Though qualitative in nature, this study sheds light on the sources of OA present in the ambient atmosphere and will be valuable for assessing the source-specific environmental impacts of OA as relationships between molecular characteristics and environmental impacts are strengthened.

Aerosol Sample Collection
Ambient aerosol total suspended particulates (TSP, n = 14) were collected from four different locations to represent different emission source types. Air was drawn through pre-combusted (4 h, 475 °C) quartz microfiber filters (Whatman QM/A, 20.3 × 25.4 cm, 419 cm² exposed area, 0.6 µm effective pore size) using a TSP high-volume air sampler (model GS2310, Thermo Andersen, Smyrna, GA, USA) at flow rates ranging between 0.7 and 0.9 m³·min⁻¹. Air particles were collected for 8-29 h with total air volumes ranging between 410 and 1170 m³. The filters were transferred to combusted foil pouches immediately after collection and stored at 8 °C until analysis. Exact sampling dates and locations can be found in the supplemental methods. Briefly, marine TSP (n = 4) were collected aboard the R/V Knorr (Woods Hole, MA, USA) as part of the 2011 US GEOTRACES program cruise (GA03) [34] and as part of the 2014 second Western Atlantic Climate Study [35]. The US GEOTRACES samples (n = 3) are a subset of samples described in Wozniak et al. [7,31] where their WSOM FTICR-MS (negative ionization) and 1H NMR characteristics were described. Mixed source TSP samples (n = 3) were collected at sea level at the Virginia Institute of Marine Science in Gloucester Point, Virginia, USA. Data for these mixed source samples are also presented in Willoughby et al. [30], a study demonstrating the utility of pyridine as a method for extracting and analyzing water-insoluble aerosol OM. Biomass burning TSP samples (n = 2) were collected at sea level in Suffolk, VA downwind of heavy smoke pollution from a fire burning at the Great Dismal Swamp. Urban TSP samples (n = 5) were collected ~60 m above sea level on the roof of an academic building at Drexel University in downtown Philadelphia, PA, USA. A storage or field blank filter was analyzed for each respective aerosol sample.

Aerosol Mass and Carbon Measurements
The QM/A filters were weighed before and after sampling to determine the TSP mass loadings (Table 1). A portion of each aerosol filter was analyzed in triplicate for total carbon (TC) using a FlashEA 1112 elemental analyzer (Thermo Scientific, Waltham, Massachusetts, USA). Black carbon (BC) amounts were determined using the chemothermal oxidation at 375 °C method (CTO-375) [36] and measured on the same elemental analyzer. Solvent extracts of the aerosols and respective filter blanks were obtained by combining aerosol filter plugs of known OC masses with ultrapure water (Millipore Synergy Ultrapure Water System, Darmstadt, Germany) or pyridine (Sigma-Aldrich, St. Louis, Missouri, USA, ≥99.9%), and insoluble particles were removed using a syringe with a 0.45 µm PTFE filter cartridge. Percent water-soluble organic carbon (WSOC) for each water filtrate was determined by evaluating the non-purgeable organic carbon using a Shimadzu (Kyoto, Japan) TOC-VCPH analyzer. The pyridine-soluble organic carbon percentage (%PSOC) was determined by dissolving each of the aerosol samples into pyridine-D5 and comparing spectral signals determined by 1H NMR to that of a glucose standard [30]. These methods are described in greater detail in the supplementary methods. Due to limited sample availability, the marine aerosols were only measured for TC and WSOC.

FTICR-MS Analysis
For FTICR-MS analyses, the water extracts were desalted using an established procedure for Agilent PPL solid-phase extraction cartridges [37]. The desalted sample was eluted in methanol (Acros Organics, Geel, Belgium, 99.9%), and will be referred to as WSOM PPL to differentiate it from WSOM. All WSOM PPL samples were analyzed in both positive and negative electrospray ionization (ESI) mode (WSOM+ and WSOM−, respectively), and PSOM samples were analyzed in negative mode (PSOM−) only due to poor signal observed in the positive mode. A respective field blank extract was prepared identically and analyzed immediately prior to each of the sample extracts to obtain a representative experimental blank spectrum. Each of the samples was analyzed on a Bruker Daltonics (Bremen, Germany) 12 Tesla Apex Qe FTICR-MS with an Apollo II ESI source housed at the College of Sciences Major Instrumentation Cluster at Old Dominion University. Samples were infused to the ESI source at 120 nL·min⁻¹, and spray voltages were optimized for each sample. Ions were accumulated in the hexapole for 0.5-2.0 s before transfer into the ICR cell, where exactly 300 transients were co-added. The instrument was externally calibrated daily using a polyethylene glycol standard. Each spectrum was internally calibrated using the naturally occurring molecules including fatty acids and other homologous series of compounds containing only carbon, hydrogen and oxygen [38]. Peaks consistent with salts (mass defect 0.4-0.98 for m/z < 400, and mass defect 0.6-0.97 for m/z > 400), blank peaks (those found in the respective filter blank), and 13C isotopologue peaks were subtracted from the mass list and not considered for formula assignments.
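The salt-peak screen described above can be illustrated with a short sketch. This is not the authors' code; the function names are hypothetical, and the handling of a peak at exactly m/z 400 is an assumption because the text gives windows only for m/z < 400 and m/z > 400.

```python
import math


def mass_defect(mz: float) -> float:
    """Return the fractional part of an m/z value."""
    return mz - math.floor(mz)


def is_salt_like(mz: float) -> bool:
    """Apply the mass-defect windows quoted in the text.

    Assumption: a peak at exactly m/z 400 is screened with the low-mass
    window, since the text does not state which window applies there.
    """
    d = mass_defect(mz)
    if mz <= 400:
        return 0.4 <= d <= 0.98
    return 0.6 <= d <= 0.97


def remove_salt_peaks(peaks):
    """Keep only the peaks that fall outside the salt-like windows."""
    return [mz for mz in peaks if not is_salt_like(mz)]


# Illustrative m/z values only (not measured data).
example_peaks = [255.2330, 300.6500, 455.1200, 512.7500]
kept = remove_salt_peaks(example_peaks)
```

Blank-peak and 13C isotopologue removal would be separate steps and are not sketched here.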

Molecular Formula Assignments
A unique molecular formula was assigned to a majority (82% ± 9%) of the measured peaks having a S/N ratio of at least 3 using an in-house generated MatLab (The MathWorks Inc., Natick, MA, USA) code according to the criteria ) were used to verify ambiguous assignments. Each of the assigned formulas has a calculated mass within 1 ppm agreement with the measured m/z, where a large majority of the formulas (88% ± 7%) have less than 0.5 ppm error.
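As an illustration of the 1 ppm agreement criterion, the following sketch computes the mass error between a measured m/z and the calculated m/z of a candidate formula. It is a minimal example, not the in-house MatLab code: the helper names are hypothetical, the monoisotopic masses are rounded standard values, and only singly charged [M-H]− and [M+H]+ ions are assumed (other adducts are ignored).

```python
# Monoisotopic masses in Da (rounded standard values).
MONO = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.00307401,
    "O": 15.99491462,
    "S": 31.97207117,
    "P": 30.97376200,
}
PROTON = 1.00727647  # proton mass in Da


def neutral_mass(formula: dict) -> float:
    """Sum monoisotopic masses for a formula given as {element: count}."""
    return sum(MONO[el] * n for el, n in formula.items())


def ion_mz(formula: dict, mode: str = "neg") -> float:
    """m/z of a singly charged ion.

    Assumption: negative mode gives [M-H]- and positive mode gives [M+H]+.
    """
    m = neutral_mass(formula)
    return m - PROTON if mode == "neg" else m + PROTON


def ppm_error(measured: float, calculated: float) -> float:
    """Signed mass error in parts per million."""
    return (measured - calculated) / calculated * 1e6


# Palmitic acid (C16H32O2), a fatty acid of the kind used for internal calibration.
calc = ion_mz({"C": 16, "H": 32, "O": 2}, mode="neg")
error = ppm_error(255.2330, calc)   # hypothetical measured m/z
accepted = abs(error) <= 1.0        # the 1 ppm criterion from the text
```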

1 H NMR Spectroscopy
Each WSOM extract was diluted immediately before 1H NMR analysis using D2O (100% atom D, Acros Organics) at a ratio of 90:10 WSOM:D2O. The deuterated WSOM solutions were analyzed via 1H NMR spectroscopy using a Bruker Daltonics 400 MHz NMR with a BBI probe. Each sample was scanned 4000 (mixed source, biomass burning, and urban WSOM) or 8000 times (marine WSOM) using a standard Bruker water-suppression pulse program, where the 90° pulse and the transmitter offset were optimized individually for each sample. The signals obtained from 1H NMR spectra were integrated over the entire spectral range to obtain the total signal response, and were also integrated over four specific chemical shift ranges to determine contributions from major proton types [16,18]. The signal response was normalized to the total signal in these regions (i.e., total signal = Area(0.6-4.4 ppm) + Area(6.0-9.0 ppm)) to determine the average relative contributions for each region. The regions are defined based on the chemical environment of protons exhibiting signal at those chemical shifts: (1) aliphatic hydrogen (H-C, 0.6-1.8 ppm); (2) unsaturated alkyl hydrogen (H-C-C=, 1.8-3.2 ppm); (3) oxygenated aliphatic hydrogen (H-C-O, 3.2-4.4 ppm); and (4) aromatic hydrogen (Ar-H, 6.0-9.0 ppm). Aldehyde and carboxylic acid hydrogen (H-C=O and HO-C=O) would appear downfield of the aromatic protons (i.e., >9 ppm), but were not detected because these protons readily exchange with the deuterium in the D2O required for analysis.
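The region integration and normalization can be sketched as follows. This is an illustrative example only: the function name is hypothetical, the area is approximated by summing evenly spaced intensities, and region boundaries are treated as half-open intervals (an assumption) so that a point on a shared boundary is counted once.

```python
import numpy as np

# Chemical shift regions (ppm) as defined in the text.
REGIONS = {
    "H-C": (0.6, 1.8),
    "H-C-C=": (1.8, 3.2),
    "H-C-O": (3.2, 4.4),
    "Ar-H": (6.0, 9.0),
}


def region_fractions(ppm, intensity):
    """Return each region's share of the total signal in the four regions."""
    ppm = np.asarray(ppm, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    areas = {}
    for name, (lo, hi) in REGIONS.items():
        mask = (ppm >= lo) & (ppm < hi)  # half-open interval (assumption)
        areas[name] = float(intensity[mask].sum())
    total = sum(areas.values())
    return {name: area / total for name, area in areas.items()}


# Synthetic spectrum: the point at 5.0 ppm lies outside all four regions.
fractions = region_fractions([1.0, 2.0, 3.5, 7.0, 5.0], [2.0, 1.0, 1.0, 1.0, 10.0])
```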

Principal Component Analysis
Principal component analysis was applied separately to the molecular formulas assigned to FTICR mass spectra and peaks present in the 1H NMR spectra in order to reveal the components that describe the greatest amount of variance between the source types. The PCA was performed using an in-house MatLab script. The first PC (PC1) explains the largest amount of variance, and the second PC (PC2) is orthogonal to PC1 and explains the second largest portion of the variance. Each successive PC explains less variance until a point of diminishing returns is reached (i.e., <1% variance explained).
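A minimal NumPy sketch of a PCA of this kind is given below. It is not the in-house MatLab script; it assumes the eigenvector analysis of a correlation matrix mentioned for the FTICR-MS PCA, with samples in rows and variables in columns, and it requires every variable to have non-zero variance. The function name is hypothetical.

```python
import numpy as np


def pca_correlation(X):
    """PCA by eigendecomposition of the correlation matrix.

    X has shape (n_samples, n_variables). Returns the sample scores, the
    variable loadings (eigenvectors), and the fraction of variance explained
    by each component, ordered from largest to smallest.
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardize variables
    R = np.corrcoef(X, rowvar=False)                  # correlation matrix
    evals, evecs = np.linalg.eigh(R)                  # eigh returns ascending order
    order = np.argsort(evals)[::-1]
    evals, evecs = evals[order], evecs[:, order]
    explained = evals / evals.sum()
    scores = Z @ evecs
    return scores, evecs, explained


# Synthetic data for illustration only.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 4))
scores, loadings, explained = pca_correlation(X_demo)
```

With this ordering, `explained[0]` corresponds to PC1 and the loading vectors of different components are mutually orthogonal.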

FTICR-MS PCA
Similar to positive matrix factorization, PCA is a factor analysis technique for explaining the observed composition in complex mixtures. It uses an eigenvector analysis of a correlation matrix to calculate principal components and variable loadings. The molecular formulas from each FTICR mass spectrum for the WSOM−, WSOM+, and PSOM− were compiled into a master formula list containing the formulas present in 2-13 of the 14 samples (14,808 formulas). Formulas present in only one sample, and formulas present in all 14 samples were removed to avoid biasing the PCA toward rare formulas and to eliminate formulas that do not contribute to the sample variance, respectively [31,40,41]. A 14,808 × 14 matrix was created by using an input value of 1 if a formula is present and an input value of 0 for a formula not present within a given sample.
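The construction of the presence/absence matrix can be sketched as follows. The function name and the toy formula labels are hypothetical; the filter keeps formulas found in at least two samples but not in all of them, which for 14 samples corresponds to the 2-13 range given above.

```python
import numpy as np


def presence_matrix(sample_formulas):
    """Build a formulas x samples matrix of 1 (present) and 0 (absent).

    sample_formulas is a list with one set of formula strings per sample.
    Formulas found in only one sample or in every sample are dropped.
    """
    n = len(sample_formulas)
    all_formulas = sorted(set().union(*sample_formulas))
    counts = {f: sum(f in s for s in sample_formulas) for f in all_formulas}
    kept = [f for f in all_formulas if 2 <= counts[f] <= n - 1]
    rows = [[1 if f in s else 0 for s in sample_formulas] for f in kept]
    return kept, np.array(rows, dtype=int).reshape(len(kept), n)


# Toy example with three samples and placeholder formula labels.
samples = [{"A", "B", "C"}, {"A", "B"}, {"A", "D"}]
kept, M = presence_matrix(samples)
```

Here "A" is dropped because it occurs in every sample, and "C" and "D" are dropped because each occurs only once.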

1 H NMR PCA
The 1H NMR spectra were analyzed in a PCA following previous studies [6,7]. Briefly, all of the peaks in the 1H NMR spectrum for each of the aerosol WSOM extracts between 0.0 and 11.0 ppm were binned to a resolution of 1 data point per 0.005 ppm from an initial resolution of 0.0008 ppm between data points. The discrete signal (peak area) at each chemical shift was normalized to the total area in the given spectrum, and the normalized area was used as the data input variables (n = 2769) for the PCA. The 1H NMR spectra were evaluated initially using all of the aerosol samples (n = 14), and a second time after omitting the marine aerosols (n = 10).
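The binning and normalization step can be illustrated with the sketch below. It is an assumption-laden example rather than the procedure actually used: the function name is hypothetical, intensities are simply summed within each 0.005 ppm bin, and the number of bins produced here need not match the number of input variables reported, since the text does not give every detail of the binning.

```python
import numpy as np


def bin_and_normalize(ppm, intensity, lo=0.0, hi=11.0, width=0.005):
    """Sum intensities into fixed-width bins and normalize to the total area."""
    ppm = np.asarray(ppm, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    n_bins = int(round((hi - lo) / width))
    edges = np.linspace(lo, hi, n_bins + 1)
    # With weights, np.histogram sums the intensities falling in each bin.
    binned, _ = np.histogram(ppm, bins=edges, weights=intensity)
    return edges[:-1], binned / binned.sum()


# Synthetic points for illustration only.
left_edges, normalized = bin_and_normalize(
    [0.001, 0.002, 0.006, 10.999], [1.0, 1.0, 2.0, 4.0]
)
```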

Aerosol Loadings
The aerosol samples from each of the emission sources show OM characteristics that distinguish the sources from one another. TSP and TC concentrations and BC, WSOC and PSOC percentages (relative to TC) were determined for each of the aerosol samples (Table 1). TSP, %BC, and %PSOC were not determined for the marine samples as discussed in the methods section. TSP loadings were highest for the biomass burning samples (73.2 µg·m⁻³) followed by the urban (47.1 µg·m⁻³) and mixed source (24.1 µg·m⁻³) samples. The marine samples show TC loadings (0.5 µg·m⁻³) one order of magnitude lower than the mixed source (5.7 µg·m⁻³) and urban (6.3 µg·m⁻³) samples and two orders of magnitude lower than the biomass burning samples (24.8 µg·m⁻³), as one would expect for samples collected over the middle of the ocean far from major terrestrial and anthropogenic sources. In spite of the urban samples having TSP loadings that are approximately twice that of the mixed source samples, they show similar TC concentrations, indicating that the urban samples contain high amounts of inorganic materials. The biomass burning samples showed the highest %BC (6.5%) and lowest %WSOC (33.6%) values of the samples, which is expected for samples collected in proximity (<30 km) to biomass combustion processes that produce BC and have not been exposed to aging processes known to increase water solubility. The marine samples show low but variable %WSOC values (39.6 ± 25.1%). The urban aerosol samples contained a higher %BC (3.4%) and lower %WSOC (40.8%) than the mixed source aerosol (%BC = 1.9%, %WSOC = 50.5%) samples. Like all elemental and BC measurements, the CTO-375 method used here is subject to artifacts; it is thought to be selective for highly condensed soot BC and is susceptible to potential positive bias from the charring of melanoidin-like species [42], and the BC results should be viewed with this in mind. In a multi-laboratory comparison study, CTO-375 of aerosol particulate materials yielded lower %BC (3.7%-14.3%) relative to elemental carbon measured using thermal optical reflectance and thermal optical transmittance methods (16%-50%) frequently used in atmospheric studies, likely due to the latter two methods' inclusion of a portion of OC and the CTO-375 method's selectivity for highly condensed soot BC [42].
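The comparison between the urban and mixed source loadings can be made concrete by computing the carbon share of the particulate mass from the values quoted above. This is a back-of-the-envelope sketch with hypothetical names; the TC/TSP ratio is only a rough indicator, since organic matter also contains non-carbon mass.

```python
# Mean loadings quoted in the text (µg per cubic metre).
LOADINGS = {
    "biomass burning": {"TSP": 73.2, "TC": 24.8},
    "urban": {"TSP": 47.1, "TC": 6.3},
    "mixed source": {"TSP": 24.1, "TC": 5.7},
}


def tc_fraction(tsp: float, tc: float) -> float:
    """Fraction of the TSP mass that is total carbon."""
    return tc / tsp


tc_share = {src: tc_fraction(v["TSP"], v["TC"]) for src, v in LOADINGS.items()}

for src, share in tc_share.items():
    print(f"{src}: TC/TSP = {share:.1%}")
```

The urban TC share comes out markedly lower than the mixed source share, which is the arithmetic behind the statement that the urban samples carry more non-carbonaceous material.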
The %PSOC amounts were calculated from 1H NMR data (Supplemental Table S1) after Willoughby et al. [30] and are useful for understanding the amount of material analyzed in PSOM extracts for FTICR-MS analyses of water-insoluble OM. Because the pyridine extractions were conducted in parallel to the water extractions and some carbon compounds are soluble in both solvents and others are soluble in neither, the sum of these two percentages (%WSOC + %PSOC) may be more or less than 100%. The biomass burning samples can be expected to contain significant amounts of water-insoluble primary OA, and this is reflected in the high %PSOC and low %WSOC values. The calculation for PSOC percentage by the 1H NMR technique omits aromatic peaks due to interference by the exchanged pyridine protons. Because the biomass burning samples are expected to have high aromatic contributions, as indicated by the high %BC and increased signal in the aromatic region of the 1H NMR spectrum for the WSOM (discussed in Section 3.4), this %PSOC value may be considered a low estimate. The urban and mixed source aerosols have considerably lower %PSOC (urban = 44%; mixed = 45%), reflecting their higher water solubility and suggesting that these two sample types have more influence from secondary and aging reactions that produce OA insoluble in pyridine.

Mass Spectra and Average Source Molecular Characteristics
Averaged values calculated for properties determined using FTICR-MS molecular formulas for each of the sample types demonstrate some source-specific characteristics that differentiate them from one another. Each of the ESI-FTICR mass spectra for the ambient aerosol extracts average thousands of peaks across a broad range of 200-800 m/z (Figure 1). The average number of formulas found in each sample's master formula list (i.e., WSOM−, WSOM+, and PSOM− combined) followed the same trend as was observed for the TSP and TC concentrations (Table 2). The biomass burning aerosols showed the highest TC loads and averaged 6579 (±173) formulas assigned to each sample and a total of 7891 formulas. The urban aerosols were the next most molecularly abundant sample type, averaging 6527 (±173) formulas per sample and a total of 10,701 formulas in all samples. An average of 4104 (±467) formulas were assigned to each mixed source sample, and a total of 6134 formulas were identified. To go along with their lowest TC load (Table 1), the marine aerosol samples totaled just 4570 formulas, averaging 2569 (±736) formulas per aerosol sample. Higher quantities of organic carbon increase the overall number of compounds and the probability of a more diverse suite of compounds, and the similar trends in molecular formula abundances and TC loads are thus logical and likely related.
Atmosphere 2016, 7, 79
The marine aerosols share the lowest average O/C ratio (0.32 ± 0.19; Table 2) with the biomass burning aerosols, differentiating them from the urban and mixed source aerosols which had considerably higher average O/C ratios (0.45 ± 0.23 and 0.44 ± 0.23, respectively). The lower average O/C values for the biomass burning and marine aerosol samples suggest they are less oxidized than the mixed source and urban aerosols due to less post-emission atmospheric processing and/or lower characteristic O/C OM at emission. Atmospheric aging processes are known to increase the average O/C ratio of aerosol OM [43,44]. The average O/C ratio (0.45 ± 0.23) for the urban aerosols is the highest of all the samples, supporting previous work showing highly oxidized OM near urban regions due to active photochemistry and abundant inorganic oxidants [45,46] in spite of proximity to primary OA sources including vehicle exhausts that are expected to have a hydrocarbon-like (low O/C) profile. The similarly high average O/C ratio measured for the mixed source samples, however, demonstrates that the high O/C ratios and presumed high extent of atmospheric oxidation are not unique to urban environments.
The marine aerosols have the highest average H/C ratio (1.56 ± 0.39). The modified aromaticity index (AImod) was calculated for each molecular formula according to the formula proposed by Koch and Dittmar [47]. The average AImod for the marine aerosols (0.18) is equal to that of the mixed source and urban aerosols and indicative of the prevalence of olefinic/alicyclic (0 < AImod < 0.5) compounds. These ratios differentiate the marine aerosols from the biomass burning aerosols where the O/C ratios did not. The average H/C ratios for the biomass burning samples are the lowest (1.35 ± 0.39) of any of the aerosol sources, indicating a large number of unsaturated molecules, and this is verified by its average AImod (0.29 ± 0.27), which is much higher than those of the other three types of samples. As was the case for the O/C ratios, the average H/C ratio (1.46 ± 0.35) and AImod (0.18 ± 0.23) of the mixed source aerosols are very similar to those calculated for the urban samples (H/C ratio = 1.44 ± 0.37, AImod = 0.18 ± 0.24), making them almost indistinguishable based on these properties alone (Table 2). The biomass burning aerosols have substantially more aromatic (0.5 ≤ AImod < 0.67, 16.7%, 1318 formulas) and condensed aromatic (AImod ≥ 0.67, 6.6%, 523 formulas), and fewer aliphatic (AImod = 0, 23.6%, 1864 formulas) formulas than any of the other emission sources but a similar amount of olefinic/alicyclic formulas (53.0%; Supplemental Table S2). The urban, marine, and mixed source aerosols all showed similar distributions with regard to AImod classifications.
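A short sketch of the AImod calculation and the classification thresholds used above follows. The text cites Koch and Dittmar [47] without reproducing the equation, so the expression below is an assumption based on the commonly quoted form of the index, AImod = (1 + C − 0.5O − S − 0.5H) / (C − 0.5O − S − N − P); it should be checked against the cited reference. Function names are hypothetical, and setting the index to zero when the numerator or denominator is not positive is a common convention adopted here as a further assumption.

```python
def ai_mod(C, H, O=0, N=0, S=0, P=0):
    """Modified aromaticity index from element counts.

    Assumption: commonly quoted form of the Koch and Dittmar index; the
    source text does not state the equation. Non-positive numerators or
    denominators are mapped to 0.
    """
    numerator = 1 + C - 0.5 * O - S - 0.5 * H
    denominator = C - 0.5 * O - S - N - P
    if numerator <= 0 or denominator <= 0:
        return 0.0
    return numerator / denominator


def classify_ai_mod(ai):
    """Classes and thresholds as given in the text."""
    if ai == 0:
        return "aliphatic"
    if ai < 0.5:
        return "olefinic/alicyclic"
    if ai < 0.67:
        return "aromatic"
    return "condensed aromatic"


# Illustrative formulas (not taken from the data set).
examples = {
    "C16H32O2": classify_ai_mod(ai_mod(16, 32, O=2)),
    "C10H16O2": classify_ai_mod(ai_mod(10, 16, O=2)),
    "C7H6O2": classify_ai_mod(ai_mod(7, 6, O=2)),
    "C24H12": classify_ai_mod(ai_mod(24, 12)),
}
```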
The averaged contributions of elemental formula combinations to each master formula list also present a broad-brush method for distinguishing among the four sample types. Details regarding the properties for each ionization source can be found in the supplementary information (Supplemental Table S3), but will not be discussed. CHO formulas were always the most abundant (marine, mixed source) or second most abundant (biomass burning, urban) formula type for each group (Supplemental Table S2). The relative distributions of CHOS and CHONS formulas are similar among all the sources (CHOS = 17.8%-20.9%, CHONS = 12.2%-15.4%), with the mixed source aerosols having the highest contributions from both formula types. The marine aerosols show the largest relative contributions from P-containing molecular formulas. The van Krevelen diagram enables a representation of all of the molecular formulas assigned to the samples, including the variation of H/C and O/C ratios as well as a visualization of some of the differences and commonalities described by the averaged ratio values (Supplemental Figure S1), and can be used to characterize differences among the four sample types. However, any of several molecular formulas can plot at the same H/C and O/C ratios, and it is laborious and inefficient to evaluate similarities and differences among samples by looking at tens of van Krevelen diagrams. It is more efficient and statistically valid to identify the defining features among several samples with large molecular formula datasets using a factor analysis such as PCA.
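The elemental ratios plotted in a van Krevelen diagram and the formula classes discussed above can be derived from a formula string as sketched below. The function names are hypothetical, and grouping every P-containing formula under a single CHOP(N,S) label is an assumption made to mirror the class names used in the text.

```python
import re


def parse_formula(formula: str) -> dict:
    """Parse a simple formula string such as 'C16H32O2' into element counts."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def van_krevelen(counts: dict):
    """Return the (H/C, O/C) coordinates of a formula."""
    c = counts["C"]
    return counts.get("H", 0) / c, counts.get("O", 0) / c


def formula_class(counts: dict) -> str:
    """Assign the elemental class used in the text (CHO, CHON, CHOS, ...)."""
    def has(element):
        return counts.get(element, 0) > 0

    if has("P"):
        return "CHOP(N,S)"  # assumption: all P-containing formulas grouped together
    if has("N") and has("S"):
        return "CHONS"
    if has("N"):
        return "CHON"
    if has("S"):
        return "CHOS"
    return "CHO"


# Illustrative formulas only.
fatty_acid = parse_formula("C16H32O2")
h_c, o_c = van_krevelen(fatty_acid)
cls_cho = formula_class(fatty_acid)
cls_chons = formula_class(parse_formula("C10H17NO7S"))
```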

FTICR-MS PCA
The majority of the variance (66.0%) in the FTICR-MS formula identifications among the four aerosol sample types is explained by the first three principal components (PC1, 32.5%; PC2, 22.4%; PC3, 11.1%; Figure 2). The aerosol samples show distinctive PC1-PC3 values based on their source characterization. Each of the marine samples has a negative PC1 score, a positive PC2 score, and a positive PC3 score. The mixed source samples have a positive PC1, a positive PC2, and a negative PC3 score. The biomass burning samples have a negative PC1, a negative PC2, and a positive PC3. The urban samples have a positive PC1, a negative PC2, and a positive PC3. These PC score classifications were used to identify the loadings (the molecular formulas used as PCA input variables; Supplemental Figure S2) diagnostic for each source. For example, if a molecular formula has negative PC1 loadings, and positive PC2 and PC3 loadings, it is classified as a formula characteristic of marine sources. This resulted in the identification of 1078 formulas characteristic for marine aerosols, 693 formulas for mixed source aerosols, 4174 formulas for biomass burning aerosols, and 3484 formulas for urban aerosols. The remaining 5379 formulas contain characteristics that are represented by multiple sources, indicating that they are not diagnostic of a particular source and may be ubiquitous in aerosol OM or inconsistently present in a given source.
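The sign-pattern rule used to assign formulas to a source can be written compactly. The sketch below is illustrative only: the names are hypothetical, and treating a loading of exactly zero as non-diagnostic is an assumption.

```python
import numpy as np

# Sign patterns of (PC1, PC2, PC3) for each source, as described in the text.
SIGN_PATTERNS = {
    (-1, 1, 1): "marine",
    (1, 1, -1): "mixed source",
    (-1, -1, 1): "biomass burning",
    (1, -1, 1): "urban",
}


def classify_loadings(loadings):
    """Label each formula by the signs of its PC1-PC3 loadings.

    loadings has shape (n_formulas, >= 3). Patterns not listed above,
    including any loading equal to zero, are labelled 'not diagnostic'.
    """
    signs = np.sign(np.asarray(loadings, dtype=float)[:, :3]).astype(int)
    return [SIGN_PATTERNS.get(tuple(row.tolist()), "not diagnostic") for row in signs]


# Made-up loadings for illustration.
labels = classify_loadings([
    [-0.2, 0.1, 0.3],
    [0.5, 0.2, -0.1],
    [-0.1, -0.4, 0.2],
    [0.3, -0.2, 0.6],
    [0.1, 0.1, 0.1],
])
```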

Figure (caption fragment): PCA was performed using molecular formulas identified in FTICR mass spectra for aerosol OM extracts from marine, biomass burning, urban, and mixed source areas.

Marine Aerosols
In contrast with the average CHO contributions for the master formula list (41.1%), the PCA-identified formulas for the marine aerosols showed the lowest contributions from CHO formulas (31.6%, Table 3) of all the sample types. CHO formulas are still the most abundant formula type of the PC loadings associated with the marine aerosols, and the majority of these formulas are present at high H/C and low O/C ratios (Figure 3a). In fact, most of the formulas identified as specific to marine sources by PCA are localized to the upper-left region of the van Krevelen diagram (O/C ≤ 0.6, H/C ≥ 1.5), a region where many biologically relevant compounds (lipids, fatty acids, proteins) plot, suggesting that biological activity is an important source for marine aerosols (Figure 3a).

The PC loadings for the marine aerosols have the largest fraction of CHOS (27.4%), CHONS (19.0%), and CHOP(N,S) (8.6%) formulas of all the samples (Table 3). It thus appears that the heteroatom-containing formulas that are present in marine aerosols are very distinctive. The abundance of P-containing formulas in this region of the van Krevelen diagram is consistent with inputs of biologically-derived phospholipids as has been observed previously for marine aerosols [31]. Membrane phospholipids have characteristic fatty acid tails with hydrophobic alkyl chains and hydrophilic phosphate heads that can impart amphipathic characteristics, and these P-containing formulas are found in both the PSOM− and WSOM− (Supplemental Table S4; Figure S3) in agreement with this partial solubility. The large number of CHOS and CHONS formulas plotting in these regions have O/S ratios >4 and are suggestive of organosulfate compounds which have also been observed in marine aerosols [31,48,49]. Organosulfates are formed via photochemical aging reactions with the acid-catalyzed ring opening reactions of precursor molecules being the most kinetically favorable reaction mechanism [50-52]. In this instance, the sulfate available for reaction is likely to be marine-derived from biological emissions of dimethyl sulfide that is oxidized in the atmosphere or from sea salt sulfates emitted with sea spray. The low O/C ratios of the marine CHOS and CHONS molecular formulas suggest that any precursor organic compounds were lipid-like in nature and also had low O/C ratios.

Many studies have demonstrated the influence of biological activity on marine aerosols, citing the importance of carbohydrate-like and amino-acid-like compounds, for example [53-55]. Carbohydrates are not strongly ionizable compounds under electrospray and are not observed in these samples, and it is important to note that these characteristics represent only the polar, ionizable fraction of marine aerosols. The high H/C and low O/C CHO, CHOP(N,S), CHOS, and CHONS molecular formulas are characteristic of that polar, ionizable fraction, and studies apportioning the sources of OA to a coastal site, for example, can use them as evidence for a marine input to coastal OA. Likewise, studies apportioning sources to OA collected in the marine environment can take large contributions from aromatic or highly oxygenated compounds (O/C > 0.6) as evidence for terrestrial sources.
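The van Krevelen screening used in this section can be illustrated with a minimal Python sketch. It assumes molecular formulas are available as simple strings such as "C18H34O2"; the function names and the example formulas are hypothetical and are not taken from the study's data. The region limits (O/C ≤ 0.6, H/C ≥ 1.5) and the elemental classes are those quoted in the text.

```python
import re

def parse_formula(formula):
    """Return element counts for a simple formula string such as 'C18H34O2'."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts

def van_krevelen(formula):
    """Return the (O/C, H/C) coordinates of a formula on a van Krevelen diagram."""
    counts = parse_formula(formula)
    carbon = counts["C"]
    return counts.get("O", 0) / carbon, counts.get("H", 0) / carbon

def in_lipid_like_region(formula):
    """True if the formula plots in the upper-left region (O/C <= 0.6, H/C >= 1.5)."""
    oc, hc = van_krevelen(formula)
    return oc <= 0.6 and hc >= 1.5

def elemental_class(formula):
    """Assign the elemental class used in the text: CHO, CHON, CHOS, CHONS, or CHOP(N,S)."""
    counts = parse_formula(formula)
    if counts.get("P", 0) > 0:
        return "CHOP(N,S)"
    label = "CHO"
    if counts.get("N", 0) > 0:
        label += "N"
    if counts.get("S", 0) > 0:
        label += "S"
    return label

# Hypothetical examples (not formulas reported in this study)
for f in ["C18H34O2", "C12H26O4S", "C10H8O4"]:
    oc, hc = van_krevelen(f)
    print(f, elemental_class(f), round(oc, 2), round(hc, 2), in_lipid_like_region(f))
```

A fatty-acid-like formula such as C18H34O2 falls inside the lipid-like region, whereas a low H/C formula such as C10H8O4 does not.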

Biomass Burning Aerosols
The biomass burning aerosol PC loadings are characterized by having the lowest average O/C and H/C values, and the highest AImod values (Table 3). These average ratios are consistent with the bulk properties discussed in Section 3.2 (Supplemental Table S2) but exaggerate the extreme ratios in the biomass burning aerosols to even lower H/C (1.24 vs. 1.35) and O/C (0.26 vs. 0.32) ratios and a higher AImod (0.37 vs. 0.29). The PC loadings that define the biomass burning aerosols thus have low O/C and H/C and high AImod and show the largest fraction of aromatic and condensed aromatic formulas (32.4%) of all the sample types (Table 3). Additionally, more than 80% of the biomass burning PC loading formulas are CHON (45.4%) and CHO (34.7%), suggesting that these two elemental formula groups are most responsible for distinguishing biomass burning aerosols from the other sample groups. Interestingly, in the master list of formulas for the biomass burning aerosols, the CHOS and CHONS formulas made up nearly 20% and 12% of all assigned formulas, respectively, and spanned a very wide O/C (0.10-1.20) and H/C (1-2) range (Figure 2). However, CHOS and CHONS formulas accounted for just 12% and 7.8% of the PC loadings associated with the biomass burning aerosols and plot in two clusters of the van Krevelen diagram: (1) at H/C between 1.5 and 2.0 and O/C between 0.1 and 0.5; and (2) at H/C between 1.0 and 1.5 and O/C between 0.3 and 0.6. The much lower contributions of S-containing formulas attributable to biomass burning in the PC loadings suggest that the majority of the CHOS and CHONS formulas in the biomass burning samples (at high H/C and high O/C) are found in multiple sample types, and those particular CHOS and CHONS formulas may not be particularly useful as diagnostic of any one sample type. Studies of S-containing OA demonstrate the ability of existing OA and VOCs to form organosulfates via atmospheric oxidation reactions [17,56-58]. Emissions of inorganic sulfur are commonly associated with anthropogenic (e.g., fossil fuel emissions) [59,60] and marine sources (e.g., gas and aerosol-phase marine emissions) [61], but the abundance of CHOS and CHONS formulas in the mixed source and biomass burning aerosols demonstrates the ubiquity of organosulfur compounds in and beyond those environments.
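For readers unfamiliar with the modified aromaticity index, the following Python sketch shows how AImod values such as those quoted above could be computed from elemental counts. It assumes the widely used formulation AImod = (1 + C − 0.5O − S − 0.5H) / (C − 0.5O − S − N − P) and the common thresholds of 0.5 (aromatic) and 0.67 (condensed aromatic). Both the formula variant and the thresholds are assumptions here and should be checked against the classification cited in this paper. The function names and example compounds are hypothetical.

```python
def ai_mod(c, h, o=0, n=0, s=0, p=0):
    """Modified aromaticity index from elemental counts (assumed formulation).

    Returns 0.0 when the numerator or denominator is not positive,
    which is a common convention for saturated formulas.
    """
    numerator = 1 + c - 0.5 * o - s - 0.5 * h
    denominator = c - 0.5 * o - s - n - p
    if numerator <= 0 or denominator <= 0:
        return 0.0
    return numerator / denominator

def aromaticity_class(value):
    """Classify an AImod value with the assumed 0.5 and 0.67 thresholds."""
    if value >= 0.67:
        return "condensed aromatic"
    if value >= 0.5:
        return "aromatic"
    return "non-aromatic"

# Hypothetical examples: naphthalene (C10H8), a C8H8O3 formula, hexane (C6H14)
for name, counts in [("C10H8", dict(c=10, h=8)),
                     ("C8H8O3", dict(c=8, h=8, o=3)),
                     ("C6H14", dict(c=6, h=14))]:
    value = ai_mod(**counts)
    print(name, round(value, 3), aromaticity_class(value))
```

Under these assumptions, a polycyclic formula such as C10H8 is classified as condensed aromatic, while a saturated hydrocarbon returns an index of zero.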
The van Krevelen diagram shows the CHON and CHO formulas to make up a large portion of the aromatic and condensed aromatic PC loading formulas for the biomass burning aerosols, as indicated by their presence in the low O/C and low H/C regions (Figure 3b). Aromatic and condensed aromatic compounds have higher potential for absorbing light than more saturated molecules [5] and make up portions of the BC and brown carbon pools. The definitions of these two carbon pools are, by necessity, operational and remain a topic of debate [5,42,62]. Traditional definitions of BC assume it to be insoluble in water, but this has been challenged, and BC is recognized to exist on a continuum of solubility. Recent studies have shown that BC can become soluble upon oxidation [63,64] and have detected dissolved forms of BC in aerosols [20] and aquatic dissolved OM [65]. The definitive study relating dissolved BC determined by chemical techniques to thermal optically-defined BC has not, to our knowledge, been performed. However, the polyaromatic structures that are required for formulas defined as condensed aromatic using the AImod classification system [47] are consistent with structures proposed for BC [66]. The biomass burning PC loadings are distributed relatively evenly among the three extract-ionization pairings (Supplemental Figure S3). Interestingly, however, the majority of the aromatic and condensed aromatic formulas assigned as important to the biomass burning samples by the PCA were detected in the WSOM− and WSOM+ analyses (Supplemental Table S4). This may result because pyridine does not efficiently extract BC or because less oxygenated BC that is extracted into PSOM does not ionize efficiently in ESI, and it is unclear what fraction of the biomass burning BC is soluble.
A great many CHON formulas have very low O/C (0.05-0.40) and H/C (<1.0) ratios, indicating that they may contain reduced nitrogen functional groups (e.g., amines) or have heterocyclic rings. A subset of these compounds contributes to the black nitrogen compounds (heterocyclic aromatic nitrogen produced during biomass combustion, or derivatives of BC that have undergone reactions with inorganic nitrogen) that have recently been suggested to be important components of the nitrogen and carbon cycles in aquatic systems [65]. Indeed, the WSOM from these biomass burning aerosols does contain fluorophoric compounds, and the fluorescence intensity is substantially higher than in the mixed source and urban aerosols (Willoughby et al., unpublished data). The brown color of the sample filters suggests that these samples do contain significant amounts of brown carbon which, like BC, has no unequivocal chemical definition [5]. Brown carbon is formed alongside BC in combustion processes [5], and has also been formed through model reactions of aqueous SOA with NH3 [67,68]. Though the global importance of brown carbon is still a topic of debate, it has recently been suggested that biomass burning is the predominant source of brown carbon to the world's atmosphere [69], and the results presented here and elsewhere [70,71] suggest that aromatic N-containing compounds are major components of that brown carbon and diagnostic of biomass burning.

Urban Aerosols
The molecular formulas with PC loadings that identify them as important to the urban aerosols are also characterized by a large number of CHON (38.0%) and CHO (33.4%) formulas. The majority of these PC loadings are found in WSOM− measurements (Supplemental Figure S3) and have significantly higher O/C ratios than those found in the biomass burning aerosols (Figure 3c). The average O/C (0.55 ± 0.21) is double and the average AImod (0.18 ± 0.22) is half that of the biomass burning aerosols. The average O/C value for the urban aerosol PC loadings is much higher than the average value for the entire urban aerosol master list (0.45, Table 2), highlighting the distinguishing nature of those high O/C components. Many of the CHO and CHON formulas that the PCA associates as important to the urban aerosols have O/C ratios between 0.35 and 0.85 and H/C ratios between 0.8 and 1.8, an area of the van Krevelen diagram that overlaps with regions previously described for lignin-like or carboxyl-rich alicyclic molecules (CRAM) in the soil and aquatic literature (Figure 3c) [72,73]. In the atmosphere, formulas in this region are likely to be the result of atmospheric aging reactions, which tend to increase the oxidation state of carbon [44] and transform OA compounds into new compounds that plot downwards and to the right on a van Krevelen diagram [43].
Aliphatic CHOS and CHONS molecular formulas plotting at O/C between 0.3 and 1.2 and H/C between 1.3 and 2.0 were also abundant among the PC loadings for the urban aerosols. Fossil fuel combustion and biological emissions are major sources for aliphatic compounds in the atmosphere [11,74], and the presence of heavy traffic and heavy industrial activity in short and long range proximity of the urban sampling site suggests fossil fuel combustion as a strong candidate source for these compounds. The high O content in these formulas is suggestive of organosulfate (O/S > 4) and nitrooxyorganosulfate (O/S > 7, contains N) compounds as potential identities. The abundance of oxidized inorganic nitrogen and sulfur emitted in urban areas, combined with emissions of aliphatic biological and anthropogenic compounds, makes for an excellent environment for organosulfate and nitrooxyorganosulfate compound formation [75,76].
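The O/S criteria quoted above lend themselves to a simple screening rule, sketched here in Python. The thresholds (O/S > 4 for organosulfates; O/S > 7 with at least one N for nitrooxyorganosulfates) are those stated in the text, whereas the function name, the labels, and the example counts are hypothetical. Such a rule only flags candidate identities, since a molecular formula alone cannot confirm a functional group.

```python
def sulfur_candidate(o, s, n=0):
    """Suggest a candidate identity for an S-containing formula from its O/S ratio."""
    if s == 0:
        return "no sulfur"
    ratio = o / s
    if n > 0 and ratio > 7:
        # Enough oxygen for both a sulfate and a nitrooxy group
        return "nitrooxyorganosulfate candidate"
    if ratio > 4:
        return "organosulfate candidate"
    # Too little oxygen for a sulfate ester; more reduced sulfur is implied
    return "insufficient oxygen for organosulfate"

# Hypothetical element counts (O, S, N), not formulas reported in this study
examples = {"CHOS, O6 S1": (6, 1, 0),
            "CHONS, O9 S1 N1": (9, 1, 1),
            "CHONS, O5 S1 N1": (5, 1, 1),
            "CHOS, O2 S1": (2, 1, 0)}
for label, (o, s, n) in examples.items():
    print(label, "->", sulfur_candidate(o, s, n))
```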
Those two major distinguishing features (CHO and CHON formulas at 0.35 < O/C < 0.85 and CHOS and CHONS formulas at 0.3 < O/C < 1.2) can be used to apportion OA as anthropogenically-influenced urban aerosols. In fact, they have been used in a study of OA collected on a transect in the North Atlantic Ocean to apportion OA to North American terrestrial air mass influences [31]. The CHON, CHONS, and CHOS formulas identified as important to the urban OA have distinct characteristics from those identified as important to the marine and biomass burning aerosols. The carbon backbones to which N and S functional groups are bound (or incorporated) are very different, and the PCA has made this clear.

Mixed Source Aerosols
The mixed source aerosols contain the fewest formulas identified by PCA, indicating those samples have few compounds specific to a unique source and share many molecular features with the other aerosols. Given the potential influences to these aerosols, the lack of an abundance of truly defining features is perhaps expected. Nonetheless, the PCA did identify a few OA components specific to the mixed source aerosols. Like the urban aerosols, the majority of the PC loadings important to the mixed source samples were water-soluble. Unlike the urban aerosols, many of these PC loadings were in the WSOM+ measurements, suggesting relatively higher amounts of compounds with basic functional groups in the mixed source environment (Supplementary Table S4, Figure S3).
A cluster of CHOS formulas at low O/C and H/C ratios (O/C < 0.4, H/C < 1.2) is evident in the PC loadings (Figure 3d). Most of these formulas do not contain sufficient oxygen to be organosulfates, and therefore must represent more reduced forms of organic sulfur (e.g., thiols). Examination of the master list van Krevelen diagram shows that these types of compounds are also present in the urban and biomass burning aerosols but not nearly to the extent of the mixed source OA (Figure S1). Sulfonates are a common anthropogenic pollutant and are ubiquitous in personal care products, and have been previously identified in aerosol OM [75]. Non-sulfate aromatic S-containing compounds have been identified in fossil fuels [77,78]. The presence of these reduced sulfur compounds in North American continental relative to marine aerosols lends support to the idea that this group of compounds may be anthropogenically-derived. A tight cluster of P-containing formulas (0.35 < O/C < 0.45, 1.45 < H/C < 1.55; Figure 3d) is also evident in the mixed source aerosols. OA compounds containing P have been attributed to biological sources [31,79], and a distinct local biological source is speculated for these formulas in the mixed source OA. The remaining CHO, CHON, and CHOS formulas with PC loadings assigned to the mixed source OA are scattered at O/C < 0.6 and H/C > 1.0, less oxygenated than the urban aerosol PC loadings and less aromatic than the biomass burning aerosols. The defining features for the mixed source aerosols are, as expected, indicative of the multiple potential sources to the sampling site.
As was found for each of the other sources, a portion of the mixed source PC loading formulas can be defined as condensed aromatic. The presence of condensed aromatic species, which may also be characterized as BC and functionalized derivatives of BC, in all of these samples reflects their ubiquity in the atmosphere even over the North Atlantic Ocean. BC is traditionally regarded as a product of combustion, as in the combustion of fossil fuel or biomass. Numerous pyrogenic sources exist on the east coast of the US (vehicular exhaust, shipping, biomass burning, and long-range transport from various anthropogenic activities), and are probably contributors to BC even in the rural environment where the mixed source aerosols were sampled and over the marine environment. It has been recently shown that BC-like compounds can also be produced from non-pyrogenic sources [80,81], but the prevalence of this mechanism has yet to be established for atmospheric systems. Regardless of origin, these aromatic and condensed aromatic species are capable of absorbing ultraviolet radiation, resulting in a positive radiative forcing (i.e., climate warming) [5].
While ESI-FTICR-MS provides an immense amount of molecular information regarding complex OM mixtures, the limitations are well-established. The ability of a compound to be analyzed is highly dependent on its ability to ionize, and non-polar and non-ionizable compounds (e.g., hydrocarbons and carbohydrates) are largely omitted from this analysis. Additionally, this analysis is necessarily qualitative because a combination of charge competition, concentration, and ionization efficiency drives the peak intensities. However, 1H NMR does not have the same bias for the detection of OM, and provides a complementary set of information regarding the chemical makeup of these complex samples.

1H NMR Analysis
The WSOM for each of the aerosols displays an array of proton types spanning the spectral region between 0 and 11 ppm (Figure 4). The observed chemical shifts are related to the chemical environment of each proton, and provide clues to the structural connectivity of the atoms within each sample. The region between 0 and 4.4 ppm contains at least 90% of the signal in each of the spectra. Most of the spectra contain broad peaks, indicating similar proton types attached to varying carbon chain lengths or located in varying proximity to polar functional groups. These broad peaks exemplify the complexity of aerosol WSOM, and make it difficult to identify individual compounds. However, the region in which a proton signal is detected can indicate the general class of compounds to which that proton belongs. For example, the most intense peak in each spectrum is located at 1.2 ppm and is indicative of CH2 protons that are part of an alkyl chain. This peak is broadest in the urban aerosols, suggesting the alkyl chains are longer, and the CH2 protons in these aerosols are attached to the widest variety of structural entities. Conversely, the narrower CH2 peak in the marine, mixed source, and biomass burning aerosols suggests shorter chain lengths and less diversity regarding chemical environments among CH2 groups. Smaller chain lengths could indicate decomposition of larger molecules by photochemical degradation, or suggest that the molecules have not undergone substantial oligomerization reactions, such as those that add small volatile species like isoprene.
Dividing the spectrum into key proton regions and evaluating the relative contributions of the total spectral intensity can reveal important differences among the different sources (Table 4). The region where signal from aromatic protons is found (6-9 ppm) is the most variable among the sources. A broad signal is observed in the aromatic region of the biomass burning aerosol spectra (Figure 4b inset) and it makes up 9.2% of the total intensity, a percentage more than 4 times greater than in the other sources (Table 4). Protons in this region are attached to aromatic or condensed aromatic rings, and the broadness of the peak indicates a high degree of structural diversity among the aromatic protons in these aerosols. There is little signal in the aromatic region in the urban and mixed source aerosols, and essentially no signal in the marine aerosols, indicating there are either very few aromatic compounds in these aerosols or that only water-insoluble aromatic species are present. The observation of a much larger intensity in the aromatic region of the biomass burning aerosols supports the detection of more aromatic and condensed aromatic formulas in the FTICR mass spectra as well as the higher concentration of BC, and suggests that the biomass burning aerosols contain a larger quantity of chromophoric OM than the other aerosol sources investigated here.
As previously mentioned, the majority of the signal in each NMR spectrum falls in the region between 0.6 and 4.4 ppm, and the relative signal in each of the major proton regions does not vary greatly between the sources. At least 50% of the signal falls in the H-C region (0.6-1.8 ppm) in all cases, indicating that a majority of the protons are part of alkyl groups. The larger H-C fractions observed in the urban and mixed source aerosols suggest that their OM contains larger carbon backbones (linear or branched). Carbon can be added to existing OM via oligomerization reactions, and the larger carbon chains present in the mixed and urban aerosols suggest that they have undergone oligomerization reactions more extensively than the marine and biomass burning aerosols. Protons that are more downfield in this region (1.4-1.8 ppm) are often attributed to an H-C group that is β to a carbon attached to heteroatoms (H-C-C-C-X, where X = N, S, or O) [14]. The presence of a large number of heteroatomic compounds identified in each of the mass spectra (CHON, CHOS, and CHONS) indicates that some portion of the signal in this region represents these species, as can be expected for OA and was noted previously (Section 3.2).
Approximately one-third of the signal intensity falls in the unsaturated alkyl region (H-C-C=; 1.8-3.2 ppm), which includes carbonyl, carboxyl, alkenes, and also hydrogen attached to carbons adjacent to a nitrogen or sulfur (i.e., amines, thiols, etc.). The marine and urban aerosols have the highest relative percentage of proton signal in this region, but are also the most variable. On average, approximately 9% of the proton signals are found in the oxygenated aliphatic region (3.2-4.4 ppm) for the aerosols collected over terrestrial environments and a much higher percentage is observed in the marine aerosols (14.5%). The greater signal intensity in this region of the marine aerosols is surprising given the low O/C ratios of the molecules identified in the FTICR mass spectra, but this could be due to the presence of oxygenated species that are outside of the ESI-FTICR-MS analytical window (i.e., nonionizable compounds, compounds < 200 m/z) as suggested by the presence of a sharp, intense peak in this region. This region includes protons attached to carbons that are singly bound to an oxygen atom such as ethers, esters, alcohols, and carbohydrates. Carbohydrates are thought to be important components of aerosols produced via bubble bursting [54] but do not ionize efficiently via electrospray ionization. The complexity of aerosol OM limits the ability to identify each of the individual components within the mixture, but 1H NMR provides valuable information regarding the connectivity of the compounds present, and because of its complementarity to FTICR-MS we can gather complementary chemical features contained within these complex mixtures. The important and distinguishing features can be observed more clearly with the help of PCA.
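The regional comparison described in this section can be sketched in Python with NumPy. The chemical shift boundaries are those quoted in the text; everything else is an assumption for illustration: the function and variable names are hypothetical, the spectrum is synthetic (Gaussian peaks, not measured data), the ppm axis is assumed to be uniformly spaced so that a plain sum is proportional to peak area, and the percentages are expressed relative to the four listed regions only, which leaves out the water-suppressed window near 4.7 ppm.

```python
import numpy as np

# Proton regions and their ppm limits as quoted in the text
REGIONS = {
    "H-C (alkyl)": (0.6, 1.8),
    "H-C-C= (unsaturated alkyl)": (1.8, 3.2),
    "H-C-O (oxygenated aliphatic)": (3.2, 4.4),
    "Ar-H (aromatic)": (6.0, 9.0),
}

def region_fractions(ppm, intensity, regions=REGIONS):
    """Percent of the summed regional signal that falls in each proton region."""
    ppm = np.asarray(ppm, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    areas = {}
    for name, (low, high) in regions.items():
        mask = (ppm >= low) & (ppm < high)
        areas[name] = intensity[mask].sum()
    total = sum(areas.values())
    return {name: 100.0 * area / total for name, area in areas.items()}

# Synthetic spectrum for illustration only
ppm = np.linspace(0.0, 11.0, 2201)

def peak(center, width, height):
    return height * np.exp(-0.5 * ((ppm - center) / width) ** 2)

spectrum = (peak(1.2, 0.15, 10.0) + peak(2.1, 0.20, 5.0)
            + peak(3.7, 0.20, 2.0) + peak(7.5, 0.50, 0.5))

fractions = region_fractions(ppm, spectrum)
for name, percent in fractions.items():
    print(f"{name}: {percent:.1f}%")
```

Applied to real spectra, the same function would be called once per sample so that the regional percentages can be compared across sources.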

1H NMR PCA
PCA was performed on the whole 1H NMR spectra for each of the aerosol WSOM extracts, similar to previous studies [6,7,82]. The initial PCA results indicate that the marine aerosols are significantly different from the mixed source, biomass burning, and urban aerosols, and that a PCA including the marine aerosols does not adequately explain the variance between the three terrestrial sources (Figure 5a). The main differences include the fact that the marine aerosols contain a few sharp peaks (e.g., methanesulfonic acid at 2.7 ppm, and acetate at 1.8 ppm), whereas the other aerosols contain multiple broad signals throughout each spectrum. The PCA was evaluated a second time using only the mixed source, biomass burning, and urban aerosols, and key differences among those sources are discussed further. The key features of marine aerosol WSOM that distinguish them from continentally-influenced air masses have been discussed at length in previous studies [6,7]. Briefly, the marine aerosols differ from those influenced by the North American continent in that the marine aerosols have more saturated aliphatic chains and are less structurally diverse.
The first two principal components (PC1 and PC2) explain more than 80% of the variance between the mixed source, biomass burning, and urban aerosol WSOM (Figure 5b). The urban aerosols have lower PC1 scores than the mixed source and biomass burning aerosols, the biomass burning aerosols have lower PC2 scores than the urban and mixed source aerosols, and the mixed source aerosols do not have unique PC values (Figure 5b). Thus, PC1 shows the spectral characteristics that differentiate the urban aerosols from the biomass burning and mixed source aerosols, and PC2 shows the spectral characteristics that differentiate the biomass burning aerosols from the other aerosols.
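The relationship between PCA scores, variable loadings, and explained variance can be illustrated with a minimal Python sketch based on the singular value decomposition in NumPy. This is a generic PCA under stated assumptions, not the software or preprocessing used in this study: the spectra are synthetic, each spectrum is normalized to unit total intensity and then mean-centered, and all names are hypothetical.

```python
import numpy as np

def pca(spectra, n_components=2):
    """PCA of a (samples x ppm bins) matrix via SVD of the mean-centered data."""
    X = np.asarray(spectra, dtype=float)
    X_centered = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]   # one row per sample
    loadings = Vt[:n_components]                      # one row per component
    explained = (S ** 2) / np.sum(S ** 2)             # fraction of variance
    return scores, loadings, explained[:n_components]

# Synthetic spectra for illustration only: two replicates of three sample types
rng = np.random.default_rng(0)
ppm = np.linspace(0.0, 11.0, 221)

def peak(center, width):
    return np.exp(-0.5 * ((ppm - center) / width) ** 2)

broad = peak(1.2, 0.30) + 0.6 * peak(2.1, 0.30)           # broad alkyl signals
narrow = peak(1.2, 0.15) + 0.6 * peak(2.1, 0.15)          # narrow alkyl signals
narrow_aromatic = narrow + 0.3 * peak(7.5, 0.60)          # adds aromatic signal

spectra = np.array([base + 0.01 * rng.standard_normal(ppm.size)
                    for base in (broad, broad, narrow_aromatic,
                                 narrow_aromatic, narrow, narrow)])
spectra /= spectra.sum(axis=1, keepdims=True)  # normalize to unit total intensity

scores, loadings, explained = pca(spectra)
print("explained variance (PC1, PC2):", np.round(explained, 3))
print("scores:\n", np.round(scores, 4))
```

Each row of `loadings` plays the role of a variable loadings plot: positive and negative features along the ppm axis indicate which spectral regions drive samples toward high or low scores on that component. The sign of a component is arbitrary in PCA, so only the relative positions of samples and peaks are meaningful.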
The most intense peak in the variable loadings plot for PC1 (Figure 6a) is positive and represents a CH2 group (1.2 ppm), suggesting that alkyl chain length is important in distinguishing aerosol emission sources. The CH2 peak in the mixed source and biomass burning aerosols is narrower than in the urban aerosols, so the positive peak in the PC1 loadings may be indicative of shorter alkyl chains (i.e., less variability). A splitting of the CH3 peak (0.9 ppm) is observed, and the negative peak is more downfield than the positive peak. This splitting suggests that the urban aerosols (represented by negative PC1 variable loadings) contain terminal methyl groups in closer proximity to polar functional groups, which is supported by the presence of broad peaks in the region where polar functional groups appear (1.8-3.2 ppm). Another intense positive peak is found around 2.1 ppm, which can represent hydrogen bound to a carbon α to a carbonyl group (Figure 6a). There is an intense peak in this area in each of the whole sample spectra, but the width of the peaks varies (Figure 4), indicating varying structural diversity of similar functional groups. This peak is sharper in the mixed source and biomass burning aerosols, and broad in the urban aerosols. In fact, the edges of the peak at 2.1 ppm can be found as negative peaks in PC1, indicating that the broadness of that peak is characteristic of the urban aerosols. While protons bound to carbon adjacent to a carbonyl group are widespread among the aerosols, as indicated by a strong signal around 2.1 ppm in each 1H NMR spectrum, the urban aerosols contain protons that are in more diverse chemical environments.
The variable loadings plot for PC2 (Figure 6b) shows the spectral characteristics important to the biomass burning aerosols. A broad negative peak in the aromatic region (6-9 ppm) is apparent, and is equivalent to the aromatic signal observed in the individual spectra of the biomass burning aerosols. Peaks consistent with levoglucosan (3.36, 3.52, 3.75, and 3.93 ppm) [15] are identified in the negative PC2 loadings, indicating that they are important in distinguishing the biomass burning aerosols from the other sources. Levoglucosan is a common product of the combustion of cellulose material and is widely used as a tracer compound for biomass burning aerosols [83]. Levoglucosan does produce additional 1H NMR peaks, but they are obscured by the water-suppression pulse program due to their proximity to the peak generated by water (~4.7 ppm). A slight negative peak at 2.3 ppm is consistent with hydrogen attached to nitrogen. The peak is broad, indicating that there are many compounds with the same functional group attached to varying carbon structures. Without standards, it cannot be confirmed that this peak represents amino groups, but the large number of CHON formulas identified in the FTICR mass spectra supports the presence of a diverse suite of N-containing OM in the biomass burning aerosols. Positive peaks in the aliphatic region (0.8-1.5 ppm) and at 1.8 ppm, representing acetate or acetic acid, demonstrate that the urban and mixed source aerosols, which have higher PC2 scores, have a more oxygenated and aliphatic WSOM composition.
Overall, there is good consistency and complementarity between the 1H NMR and FTICR-MS techniques. Both techniques identify aromatic compounds and nitrogen-containing compounds that are not extensively oxidized as important for distinguishing the OM in biomass burning aerosols from that of the other sources. The lower degree of oxidation in the OM found in the biomass burning and also the marine aerosols suggests that these aerosols were collected close to their source and/or have not been exposed to conditions that promote extensive oxidative or oligomeric transformations. The urban aerosols are characteristically more polar, as indicated by 1H NMR peaks that are shifted slightly more downfield than in the other aerosols and by the higher O/C ratios of the formulas, and have more structurally diverse carbon chains. This structural diversity may be represented by longer and/or more branched carbon chains indicative of more oligomeric or other chain elongation reactions. While some of these features were readily apparent, the PCA was able to highlight and confirm some of these variations.
The combination of 1H NMR and FTICR-MS provides a large amount of molecular-level information regarding the chemical composition of aerosol OM. While 1H NMR overcomes some of the bias introduced by ESI-FTICR-MS, it comes with its own limitations. The 1H NMR analysis used here requires a liquid sample, so the water-insoluble OM is not analyzed by this method. Solid-state NMR techniques that analyze a whole solid sample do exist but require very large samples, which in turn require extremely large sample volumes that typically can only be obtained over weeks of sampling. Additionally, the water suppression removes all signal in the region of 4.7 ppm and may reduce the signal within ±0.5 ppm of it, including that of protons directly attached to carbon-carbon double bonds (H-C=C), which are typically found around 5 ppm.
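The extent of the affected window can be made concrete with a short sketch that flags data points within ±0.5 ppm of the water peak at 4.7 ppm. Whether and how this region was excluded from the analyses in this study is not stated here, so the masking step itself is an assumption, and the function names (`water_region_mask`, `exclude_water_region`) are hypothetical.

```python
import numpy as np

WATER_PPM = 4.7    # position of the water peak cited in the text
HALF_WIDTH = 0.5   # +/- window that may be attenuated, as cited in the text

def water_region_mask(ppm, center=WATER_PPM, half_width=HALF_WIDTH):
    """Boolean mask that is True for points inside the suppressed window."""
    ppm = np.asarray(ppm, dtype=float)
    return np.abs(ppm - center) <= half_width

def exclude_water_region(ppm, intensity):
    """Return the ppm axis and intensities with the suppressed window removed."""
    ppm = np.asarray(ppm, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    keep = ~water_region_mask(ppm)
    return ppm[keep], intensity[keep]

# Illustrative chemical shifts only, not measured data.
ppm = np.array([0.9, 2.3, 3.52, 4.3, 4.7, 5.0, 5.3, 7.2])
intensity = np.ones_like(ppm)
kept_ppm, kept_intensity = exclude_water_region(ppm, intensity)
print(kept_ppm)
```

A point at 5.0 ppm, where H-C=C protons are typically found, falls inside the window, which is why such protons may be underrepresented in the spectra.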

Conclusions
The 1H NMR and FTICR-MS results described here, while not fully quantitative in nature, provide detailed, source-specific distinguishing characteristics of the water- and pyridine-extractable and ionizable components of marine, urban, biomass burning, and mixed source OA. PCA efficiently highlights the molecular formulas and 1H NMR structural components that distinguish the OM contained within aerosols from each of these sources. Marine aerosols contain molecules consistent with biological inputs, including lipid- and phospholipid-like compounds as well as a large fraction of low O/C ratio organosulfur compounds, likely formed through reactions with sulfate derived from biological dimethyl sulfide emissions. The carbon backbones of these molecules are more aliphatic and less diverse than those of terrestrial OA, and the OM is overall less oxidized. The OM in biomass burning aerosols has more aromatic character and less oxygen content than the other OA groups. Aromatic, nitrogen-containing compounds are a defining feature of biomass burning aerosols, and nitrogen-incorporation reactions play a major role in the formation or transformation of these aerosols. The urban aerosols are characteristically more oxygenated and more structurally complex, suggesting they have been subject to more extensive atmospheric aging. Nitrogen-containing formulas of lesser aromaticity than described for the biomass burning OA are an important feature of urban aerosols. These compounds likely reflect the prevalence of OA transformations that incorporate inorganic NOx, either through NOx additions to aliphatic/olefinic compounds or ring-opening reactions of biomass burning OA. The combined results of the FTICR-MS and 1H NMR PCAs show the mixed source OA to be very similar to, but less chemically diverse than, the urban aerosols.
These defining OA molecular features can be used in future work to provide clues to the primary inputs and degree of atmospheric transformation. In environments impacted by multiple emission sources, these molecular features can help identify key sources. Further, as the relationships between molecular composition and atmospheric impacts (e.g., hygroscopicity, ice and cloud condensation formation, light absorption) and environmental fates (e.g., photochemical and microbial lability, metal complexing ability) are strengthened, these source-specific molecular characteristics can be used to partition the relative importance of each of these emission sources to these impacts and fates. The highly aliphatic nature of and biological source for the marine aerosols suggest that they are highly susceptible to microbial degradation, while the aromatic biomass burning OA is likely to be less microbially labile and more susceptible to photodegradation. Though the exact relationship is yet to be fully understood, the extent of oxidation is associated with hygroscopicity and the ability of aerosols to act as cloud condensation nuclei, and the high O/C content of urban and mixed source aerosols makes them candidates for high hygroscopicity. The additional molecular formula and structural details provided here may help explain deviations from the expected relationship between O/C ratio and hygroscopic behavior. The highly aromatic nitrogen compounds are likely an important component of atmospheric brown carbon, which makes a significant contribution to light-absorbing carbon in the atmosphere. Any extensive light absorption found for other source components may be due to higher amounts of inorganic components or olefinic or non-ionizable OA.
Aerosol FTICR-MS and 1H NMR analyses provide a complementary set of information regarding OA chemical composition, and PCA provides a useful tool for deconstructing the important components that define each of the aerosol sources. While this study presents several aerosols from some key sources, application of this method to a larger number of samples from more emission sources is needed for a more comprehensive inventory of atmospherically relevant OA. Additionally, pairing these analyses with highly time-resolved methods, such as aerosol mass spectrometry, would provide an excellent accounting of the inorganic ions and volatile OM that influence aerosol atmospheric and environmental impacts and fates.
Table S2: Total formulas and average elemental properties for aerosol WSOM PPL and PSOM from each emission source determined using FTICR mass spectra. The distribution of molecular formulas based on atomic content and AI mod structure type are listed as number of formulas with the percentage of total formulas in parentheses directly below.
Table S3: Total formulas and average elemental properties for aerosol WSOM−, WSOM+, and PSOM− from each emission source determined using FTICR-MS. Atomic content and structure type values are expressed as the number of formulas. The values in parentheses are the percentage of total molecular formulas in each sample, an average for each source.
Table S4: Total formulas and average elemental properties for aerosol WSOM−, WSOM+, and PSOM from each emission source identified by PCA. Distributions of formulas based on atomic content and AI mod structure type are listed as percentage of total formulas.
Figure S1: Van Krevelen diagrams for molecular formulas identified in the FTICR mass spectra for the marine, biomass burning, urban, and mixed source aerosols. Each row represents a different source, and each column represents only those formulas with a specific elemental makeup (CHO, CHON, or CHOS). Each "×" represents one or more molecular formulas.

ESI: electrospray ionization
FTICR-MS: Fourier transform ion cyclotron resonance mass spectrometry
H/C: hydrogen-to-carbon atomic ratio

Figure 1. Representative full ESI(−) FTICR mass spectra for WSOM PPL extracts of (a) marine; (b) biomass burning; (c) urban; and (d) mixed source aerosols between 200 and 800 m/z. Some intense peaks are shown off scale.

Figure 2. PCA score plots for PC2 versus PC1 (left) and PC3 versus PC1 (right). PCA was performed using molecular formulas identified in FTICR mass spectra for aerosol OM extracts from marine, biomass burning, urban, and mixed source areas.

Figure 3. Van Krevelen diagrams for molecular formulas identified by PCA for (a) marine; (b) biomass burning; (c) urban; and (d) mixed source aerosols. Each data point is colored according to the atomic content of the molecular formula.

Figure 4. 1H NMR spectra for WSOM of (a) marine; (b) biomass burning; (c) urban; and (d) mixed source aerosols, where each colored line represents a different sample spectrum. The region between 6 and 9 ppm represents aromatic protons, and is expanded in the inset of each spectrum.

Table 4. Average relative contributions of total spectral intensity for integrations of major proton regions in 1H NMR spectra for each of the aerosol sources. Standard deviations of the relative signal in each region among aerosols from the same source are provided.

Figure 5. Aerosol WSOM (a) PC1 and PC2 scores for PCA of full 1H NMR spectra of marine, mixed source, biomass burning, and urban aerosols and (b) PC1 and PC2 scores for PCA omitting marine aerosols. The amount of variation explained by each PC is indicated in parentheses on each axis.

Figure 6. Variable loadings plots for (a) PC1 and (b) PC2 resulting from PCA using full 1H NMR spectra of mixed source, biomass burning, and urban aerosol WSOM. Some peaks are labeled with the functional group region, and some are labeled with a single letter to indicate a specific compound, where A = acetic acid or acetate and L = levoglucosan.

Figure S2: The loadings for (a) PC1 and PC2 and (b) PC1 and PC3 from the PCA of the FTICR-MS molecular formulas.

Figure S3: Venn diagrams showing the relative distribution of PCA molecular formulas present in any of the three solvent/ionization methods (WSOM−, WSOM+, and PSOM) for each aerosol source. Areas of overlap represent percentages of molecular formulas that appear in two or more of those samples. Areas with no overlap represent the percentage of molecular formulas unique to that individual solvent/ionization method.

Table 1. Average total suspended particulates (TSP) and total carbon (TC) concentrations and carbon percentages (relative to TC) for each aerosol source type.

Table 2. Total formulas and average elemental properties for aerosol WSOM PPL and PSOM from each emission source determined using FTICR mass spectra.

Table 3. Total formulas and average elemental properties for molecular formulas identified by PCA. Distributions of formulas based on atomic content and AI mod structure type are listed as percentage of total formulas.