An Exploratory Approach Using Regression and Machine Learning in the Analysis of Mass Absorption Cross Section of Black Carbon Aerosols: Model Development and Evaluation

Mass absorption cross-section of black carbon (MACBC) describes the absorptive cross-section per unit mass of black carbon, and is, thus, an essential parameter to estimate the radiative forcing of black carbon. Many studies have sought to estimate MACBC from a theoretical perspective, but these studies require the knowledge of a set of aerosol properties, which are difficult and/or labor-intensive to measure. We therefore investigate the ability of seven data analytical approaches (including different multivariate regressions, support vector machine, and neural networks) in predicting MACBC for both ambient and biomass burning measurements. Our model utilizes multi-wavelength light absorption and scattering as well as the aerosol size distributions as input variables to predict MACBC across different wavelengths. We assessed the applicability of the proposed approaches in estimating MACBC using different statistical metrics (such as coefficient of determination (R2), mean square error (MSE), fractional error, and fractional bias). Overall, the approaches used in this study can estimate MACBC appropriately, but the prediction performance varies across approaches and atmospheric environments. Based on an uncertainty evaluation of our models and the empirical and theoretical approaches to predict MACBC, we preliminarily put forth support vector machine (SVM) as a recommended data analytical technique for use. We provide an operational tool built with the approaches presented in this paper to facilitate this procedure for future users.


Introduction
Black carbon (BC) aerosols are emitted as fine particles from incomplete combustion processes (e.g., fossil fuel and biomass burning) [1,2]. BC plays a major role in the climate system because it absorbs solar radiation and interacts with clouds [3-5]. Understanding the properties of BC and quantifying its mass concentration in the atmosphere are essential to estimate its impacts on climate change. One widely used method to determine BC mass concentration is to divide the light absorption coefficient (B abs) by the mass absorption cross-section (MAC) [6-8]. Although B abs can be measured using various instruments (such as in situ and filter-based optical instruments [7,9]), the value of MAC must be estimated or known in advance to calculate BC mass. The derived BC mass is therefore sensitive to the adopted MAC value. Improved estimation of MAC BC will thus improve estimates of BC mass concentration from ground-based networks that provide B abs, which can, in turn, improve chemistry-climate models and our understanding of the radiative impacts of BC.
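As a minimal illustration of the division described above, the sketch below computes BC mass from B abs and MAC. It assumes B abs is reported in Mm⁻¹ and MAC in m² g⁻¹, so that the ratio comes out directly in µg m⁻³. The function name `bc_mass_ug_m3`, the B abs value, and the MAC values looped over are hypothetical and chosen only to show how strongly the derived mass depends on the adopted MAC.

```python
def bc_mass_ug_m3(b_abs_inv_Mm, mac_m2_g):
    """Return BC mass concentration (ug m^-3) as B_abs / MAC.

    B_abs in Mm^-1 (1e-6 m^-1) divided by MAC in m^2 g^-1 gives g m^-3 scaled
    by 1e-6, i.e., ug m^-3, so no extra conversion factor is needed.
    """
    if mac_m2_g <= 0:
        raise ValueError("MAC must be positive")
    return b_abs_inv_Mm / mac_m2_g


# Hypothetical B_abs and illustrative MAC values (all at the same wavelength).
b_abs = 5.0
for mac in (5.0, 7.5, 10.0):
    print(f"MAC = {mac:4.1f} m2 g-1 -> BC mass = {bc_mass_ug_m3(b_abs, mac):.2f} ug m-3")
```

Because the mass scales as 1/MAC, halving the adopted MAC doubles the derived BC mass, so any error in MAC propagates directly into the BC mass estimate.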
Historically, a MAC of 7.5 ± 1.2 m² g⁻¹ (550 nm) has been recommended for "fresh," uncoated BC [6,10]. In a recent review of 10 published reports of MAC values for freshly emitted BC, Liu et al. [11] conclude that the average MAC is 8.0 ± 0.7 m² g⁻¹ (550 nm), generally agreeing with the values recommended in Bond and Bergstrom [6]. However, the MAC of atmospheric BC particles is more variable and more uncertain than that of fresh BC. For example, the MAC of atmospheric BC may be lower (e.g., Nordmann et al. [12] report values of 3.9 to 7.4 m² g⁻¹ at 550 nm) or higher (e.g., Gyawali et al. [13] and Kondo et al. [8] report values ranging from roughly 10 to 14 m² g⁻¹ at 550 nm) than the "typical" value of 7.5 m² g⁻¹. Similarly, the value of MAC used in different climate models spans a broad range (2.3 to 10.5 m² g⁻¹ at 550 nm; e.g., [14] and references therein). Differences in MAC values can arise from differences in the aerosol mixing state, the underlying size distribution of BC-containing particles, and the effective density of the BC-containing particles. This is further complicated by uncertainty in the complex refractive index of BC; for example, Bond and Bergstrom [6] provide a range of real and imaginary complex refractive indices that may be representative of BC. Moreover, atmospheric aging may result in MAC values that differ from the Bond and Bergstrom [6] value (e.g., [15]); therefore, MAC is likely a function of time rather than a constant value (e.g., [16]). Finally, although 550 nm is often the focus for reporting MAC, BC absorbs light across all wavelengths. To extend this value to other wavelengths of light (λ), a power-law relationship [3,17,18] is used:

$$\mathrm{MAC}(\lambda) = \mathrm{MAC}(550\,\mathrm{nm}) \left(\frac{\lambda}{550\,\mathrm{nm}}\right)^{-\mathrm{AAE}} \qquad (1)$$

where AAE is the absorption Ångström exponent. AAE is commonly assumed to be unity for BC, but reported values range from ~0.6 to ~1.6 [3,19-21]. The presence of other absorbing materials in the atmosphere further complicates the application of MAC to derive BC mass.
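The power-law scaling of MAC with wavelength can be checked numerically. The sketch below assumes MAC(λ) = MAC(550 nm)·(λ/550 nm)^(−AAE) exactly as stated above; the function name `mac_at_wavelength` is hypothetical. It scales the "standard" 7.5 m² g⁻¹ to 870 nm and loops over the literature range of AAE.

```python
def mac_at_wavelength(mac_550, wavelength_nm, aae=1.0):
    """Scale a 550-nm MAC to another wavelength: MAC(550) * (lambda/550)**(-AAE)."""
    return mac_550 * (wavelength_nm / 550.0) ** (-aae)


# "Standard" fresh-BC MAC of 7.5 m2 g-1 at 550 nm, assuming AAE = 1.
print(round(mac_at_wavelength(7.5, 870.0), 2))  # 4.74
# Sensitivity to the literature range of AAE (~0.6 to ~1.6).
for aae in (0.6, 1.0, 1.6):
    print(aae, round(mac_at_wavelength(7.5, 870.0, aae), 2))
```

With AAE = 1 this reproduces the "expected" 870-nm value of about 4.74 m² g⁻¹ quoted in the notes to Table 1. A larger AAE gives a smaller MAC at 870 nm.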
Brown carbon (BrC) has been found to absorb light across ultraviolet-visible wavelengths, with a MAC lower than that of BC [1,22,23]. Tar balls have recently attracted considerable attention as absorbers of light in the visible and near-infrared spectral regions (e.g., [24-26]). One important source of atmospheric BrC and tar balls is biomass burning [6,27,28]. Another source of absorbing aerosols is wind erosion of the Earth's surface, which releases mineral dust, a non-carbonaceous material that weakly absorbs light across all wavelengths [9,29-31]. Therefore, because light-absorbing materials differ in their spectral dependence, simply dividing B abs by MAC may introduce uncertainties into the derived BC mass. Even at longer wavelengths (for example, 870 nm), where light absorption by most BrC is negligible, the computed BC mass may be overestimated if tar balls and mineral dust contribute to the light absorption.
Theoretically, MAC can be calculated from formulations that take aerosol complex refractive indices, mixing state, size, and morphology as input variables [11,32-35], all of which are difficult and/or labor-intensive to measure. The assumptions required in these calculations can also add uncertainty to the derived MAC. Hence, alternative approaches, such as statistical methods and machine learning, may provide new insight into predicting MAC with comparable accuracy.
The goal of the present study is to explore and develop a straightforward yet accurate model for predicting temporal variations in the MAC of BC (MAC BC). This model does not require knowledge of aerosol composition or any assumptions about complex refractive indices and mixing state. The model inputs are in situ measurements, including multi-wavelength B abs and light scattering coefficients (B scat) as well as aerosol size distributions. One advantage of using these measurements is that they are likely to be available at the majority of long-term monitoring sites [36-39], whereas aerosol composition often is not. To build the model, we examine several popular data analytical approaches, including classical statistical techniques and machine learning techniques (see Section 2.4 for details). We also provide several modules related to data preprocessing (see Section 2.3). We present here the framework of the model, together with an evaluation of the model on aerosols from various environments using different statistical metrics. A forthcoming companion work will describe an extension of these models to predict MAC BC across different wavelengths using the model output. This future work will also include a sensitivity analysis of the models to different input variables (e.g., if only single-wavelength B abs and B scat measurements exist) and a discussion of the influence of varying chemical composition and related optical properties (e.g., AAE and single scattering albedo (SSA)).

Dataset Description
We applied the data analytical approaches to data from two publicly available U.S. Department of Energy (DOE) Atmospheric Radiation Measurement (ARM) field campaigns, the Two-Column Aerosol Project (TCAP) and the Cloud, Aerosol and Complex Terrain Interactions (CACTI) project, as well as to biomass burning emissions collected during the Fire Influence on Regional to Global Environments Experiment (FIREX) laboratory campaign sponsored by the U.S. National Oceanic and Atmospheric Administration (NOAA). A summary of aerosol properties obtained from each dataset is provided in Table 1. Values represent the mean across the entire sampling period along with the standard deviation across the period; the standard deviations indicate a temporal variability of ~50% in MAC BC within each dataset.
Notes to Table 1: a See Figure 1 for the temporal variability of mass absorption cross-section of black carbon (MAC BC); b The datasets presented here are reported on a per-minute basis; c When calculating the MAC BC in this table, we use the black carbon (BC) mass concentration from the Single Particle Soot Photometers (SP2) and the filter-based light absorption coefficient (B abs) corrected by either Li et al. [40] or Bond et al. [41], the latter as further modified in Ogren [42]. Hereafter, "Bond et al. [41]" refers to this correction method as modified in Ogren [42]; d The results are compared to the predicted MAC BC presented in Table S5; e Using the "standard" MAC BC from Bond and Bergstrom [6] and an assumed absorption Ångström exponent (AAE) of 1, the "expected" MAC BC at 870 nm is 4.74 ± 0.75 m² g⁻¹, suggesting that both correction equations may be imperfect; f SSA 528: single scattering albedo at 528 nm, calculated as B scat/(B abs + B scat); g D 50 and Dv 50: median diameters of the number and volume size distributions, respectively. The values are reported over the entire size distribution, which may have multiple modes, so D 50 and Dv 50 can differ from the geometric mean diameter of any individual mode; h During the FIREX campaign, only a Scanning Mobility Particle Sizer (SMPS) was used to measure the aerosol size distribution.
The TCAP (Cape Cod, Massachusetts, USA; roughly 60 miles southeast of Boston) and CACTI (mountainous area of north-central Argentina; roughly 50 miles west of the city of Córdoba) field campaigns reported ground measurements of B abs using a Particle Soot Absorption Photometer (PSAP), B scat using multi-wavelength nephelometers, aerosol size distributions using Scanning Mobility Particle Sizers (SMPS) and Aerodynamic Particle Sizers (APS) (spanning 11.3-461.4 nm for the SMPS and 444.6-18,747.4 nm for the APS, the latter converted to mobility diameter), and BC mass concentration using Single Particle Soot Photometers (SP2) [43-45]. For both campaigns, we randomly selected a period in which all measurements were available without bad or missing data (TCAP: 16 July 2012 to 14 August 2012; CACTI: 1 December 2018 to 20 December 2018). The major aerosol sources at the two observational sites are likely to differ. Kassianov et al. [45] found that both marine aerosols and aerosols transported from continental North America over the Atlantic Ocean contributed to the observations at TCAP. In contrast, the major aerosol sources at CACTI likely included local and secondary biomass burning emissions and wind-blown dust [46].
Atmosphere 2020, 11, 1185 4 of 21

The laboratory portion of the FIREX campaign, conducted in 2016 at the Fire Sciences Laboratory (Montana, USA), aimed to understand how wildfire smoke influences the atmosphere [47]. The instruments used at FIREX differed slightly from those used in the DOE campaigns. During FIREX, B abs was quantified using a PSAP "successor" (Continuous Light Absorption Photometer, CLAP), B scat was measured by two Photoacoustic Extinctiometers (PAXs) at 405 and 870 nm, and a narrower size range was covered by an SMPS (14.6-736.5 nm, mobility diameter); BC mass concentrations were again available from an SP2 [48].
In the present work, data preparation was performed in two steps: (1) observations of B abs and B scat were discarded if they fell below the estimated detection limits (PSAP: 0.30 Mm⁻¹ [49]; CLAP: 0.60 Mm⁻¹ [50]; nephelometer: 0.29, 0.11, and 0.17 Mm⁻¹ at 450, 550, and 700 nm, respectively [51]; PAX (B scat only): 0.60 and 0.66 Mm⁻¹ at 405 and 870 nm, respectively); and (2) all data were converted to a 1-min time interval. For example, B abs and B scat were averaged from 1-s data to 1-min values, and the time series of particle size distributions were linearly interpolated from 5-min values to 1-min values.
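The two preparation steps can be sketched with pandas, assuming the data are held in DataFrames indexed by timestamp. The column names (`b_abs_psap`, `b_scat_450`, ...) and helper names are hypothetical, and this is not the authors' module code. The detection limits are the PSAP and nephelometer values quoted above. Values below a limit are set to NaN, 1-s optical data are averaged to 1-min means, and 5-min size distributions are placed on a 1-min grid and linearly interpolated in time.

```python
import numpy as np
import pandas as pd

# Detection limits (Mm^-1) quoted in the text; column names are hypothetical.
DETECTION_LIMITS = {
    "b_abs_psap": 0.30,
    "b_scat_450": 0.29,
    "b_scat_550": 0.11,
    "b_scat_700": 0.17,
}


def mask_below_detection(df, limits):
    """Step 1: replace values below each column's detection limit with NaN."""
    out = df.copy()
    for col, limit in limits.items():
        if col in out.columns:
            out[col] = out[col].where(out[col] >= limit)
    return out


def optical_to_1min(df_1s):
    """Step 2a: average 1-s optical data into 1-min means (NaNs are skipped)."""
    return df_1s.resample("1min").mean()


def size_dist_to_1min(df_5min):
    """Step 2b: put 5-min size distributions on a 1-min grid and interpolate linearly."""
    return df_5min.resample("1min").asfreq().interpolate(method="time")


# Synthetic usage: two minutes of 1-s absorption data and three 5-min size records.
t0 = pd.Timestamp("2012-07-16 00:00:00")
idx_1s = t0 + pd.to_timedelta(np.arange(120), unit="s")
optical = pd.DataFrame({"b_abs_psap": np.linspace(0.1, 1.3, 120)}, index=idx_1s)
optical_1min = optical_to_1min(mask_below_detection(optical, DETECTION_LIMITS))

idx_5min = pd.date_range(t0, periods=3, freq="5min")
sizes = pd.DataFrame({"dN_100nm": [0.0, 5.0, 10.0]}, index=idx_5min)
sizes_1min = size_dist_to_1min(sizes)
```

Masking before averaging keeps sub-detection-limit noise out of the 1-min means instead of letting it bias them low.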

Construction of Model
In this section, we first present how we use the above datasets to build the model. We then summarize the input and output parameters of the model and present the calculation details. Although we focus on B abs quantified by the PSAP and CLAP in the present work, our model could also be applied to B abs measured by other instruments (such as the aethalometer or photoacoustic instruments). Similarly, particle size distributions measured by other instruments (e.g., the Ultra-High-Sensitivity Aerosol Spectrometer (UHSAS, Droplet Measurement Technologies) or the Laser Aerosol Spectrometer (TSI Inc.)) could be used as inputs to our model (see Section 2.3.1 for more details). To build the model, we follow a data partitioning approach common in the machine learning community [52,53]. Specifically, we first randomly split the TCAP data into a training set (80%) and a test set (20%). The data analytical techniques are first applied to the training set to derive model parameters and are then evaluated on the test set, which serves as new data for an independent assessment of goodness of fit. To further investigate the generalizability of the techniques, we apply the fitted models to another ambient observation site (CACTI) and to aerosols from laboratory biomass burning (FIREX). These latter two datasets are referred to as independent validation datasets in this work.
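A random 80/20 partition can be sketched with NumPy as follows. The helper name `random_split`, the seed, and the round-to-nearest rule for the training-set size are assumptions. The total of 9151 records is simply the sum of the 7321 training and 1830 test records reported in the Results; rounding 0.8 × 9151 to the nearest integer happens to reproduce those counts, but the authors' exact procedure is not stated here.

```python
import numpy as np


def random_split(n_records, train_fraction=0.8, seed=0):
    """Shuffle record indices and split them into training and test subsets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_records)
    n_train = int(round(train_fraction * n_records))
    return order[:n_train], order[n_train:]


train_idx, test_idx = random_split(9151)
print(len(train_idx), len(test_idx))  # 7321 1830
```

Because neighboring 1-min records are autocorrelated, a purely random split places near-duplicate records in both subsets. A blocked split (e.g., by day) is a stricter alternative when testing generalization within one site.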
Selecting appropriate input parameters for model development can affect the quality of the MAC BC prediction. Therefore, we use the underlying physics as inspiration in identifying candidate parameters for our statistical models. MAC BC (at 870 nm) can be calculated as

$$\mathrm{MAC_{BC}}(870\,\mathrm{nm}) = \frac{B_{abs}(870\,\mathrm{nm})}{M_{BC}} \qquad (2)$$

where M BC is the mass concentration of BC. Light scattering theories (such as Mie theory and the Rayleigh-Debye-Gans approximation) can be used to calculate B abs from an aerosol population's complex refractive index and particle number distribution, while calculations of M BC depend on the aerosol population's density and particle volume distribution. The complex refractive index and density are both related to aerosol composition. However, because complete aerosol speciation is rare at the majority of long-term monitoring sites, we use empirical observations of B abs and B scat as an alternative. The basis for this is that previous studies have used AAE and the scattering Ångström exponent (SAE) (i.e., an "AAE-SAE space" [30,54-56]) to categorize absorbing aerosols (e.g., BC-dominant, dust-dominant, BC/BrC mixtures). We also include parameterizations of both empirical particle number distributions and particle volume distributions as candidates for our models. Even though these are inherently related, we argue that including representations of both is somewhat analogous to the TwO-Moment Aerosol Sectional (TOMAS) model [57], which tracks both number and mass distributions and has been implemented in chemistry-climate models (e.g., [58,59]) to predict aerosol microphysics. Consequently, there are 20 candidate input variables in our statistical and machine learning models (Table 2). Pre-processing of these variables is described in Section 2.3, and the specific modeling approaches are described in Section 2.4.
Notes to Table 2: a d N1 and d N2 represent the user-specified boundaries between adjacent size classes. The sum of N 1, N 2, and N 3 equals N total. The same definitions and relationships apply to volume concentration. See Section 2.3.1 for more details; b Fractional number concentration is derived by dividing the number concentration within each size class by N total, e.g., F N1 = N 1/N total. The same definition and relationship applies to fractional volume concentration.
When developing the model, we select MAC BC at 870 nm as the desired output because this wavelength has traditionally been associated solely with BC (i.e., there is limited influence from other absorbing species), although we acknowledge that tar balls and mineral dust do absorb at this wavelength. To obtain B abs (870 nm), the B abs values from the filter-based photometers, corrected by either Li et al. [40] or Bond et al. [41], are extrapolated to 870 nm following B abs ∝ λ^−AAE, where AAE is first obtained from a power-law fit to B abs measured at multiple wavelengths. To derive the observation-based MAC BC (the true response of MAC BC), the M BC measured by the SP2 at the observational site is used.
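A sketch of the extrapolation and of the observation-based MAC BC at 870 nm follows. It assumes the power law B abs ∝ λ^−AAE is fitted by ordinary least squares in log-log space; the text says only that a power-law fit is used, so the fitting method is an assumption. The wavelengths, synthetic B abs values, M BC value, and helper names are hypothetical. With B abs in Mm⁻¹ and M BC in µg m⁻³, the ratio B abs/M BC is already in m² g⁻¹.

```python
import numpy as np


def fit_angstrom_exponent(wavelengths_nm, coefficients):
    """Fit B = C * lambda**(-AE) by least squares in log-log space; return (AE, ln C)."""
    slope, intercept = np.polyfit(np.log(wavelengths_nm), np.log(coefficients), 1)
    return -slope, intercept


def extrapolate_b_abs(target_nm, wavelengths_nm, b_abs):
    """Evaluate the fitted power law at target_nm; return (B_abs(target), AAE)."""
    aae, ln_c = fit_angstrom_exponent(wavelengths_nm, b_abs)
    return float(np.exp(ln_c) * target_nm ** (-aae)), float(aae)


# Synthetic corrected B_abs (Mm^-1) at three wavelengths, generated with AAE = 1.
wl = np.array([467.0, 528.0, 652.0])
b_abs = 2000.0 / wl
b_abs_870, aae = extrapolate_b_abs(870.0, wl, b_abs)

m_bc = 0.5  # hypothetical SP2 BC mass concentration, ug m^-3
mac_bc_870 = b_abs_870 / m_bc  # Mm^-1 / (ug m^-3) = m^2 g^-1
```

Fitting in log space turns the power law into a straight line, so the slope gives −AAE and a single fit serves any target wavelength.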

Data Preprocessing
In this section, we present a few preprocessing steps (implemented as three modules at https://doi.org/10.5281/zenodo.3967833 [60]) that are necessary to enable model implementation. These modules are optional for readers who already have data formatted as summarized in Table 2 or who wish to conduct their own data preparation. For example, we provide implementations of the correction algorithms described in Li et al. [40] and Bond et al. [41] for filter-based B abs, but various other correction techniques exist (e.g., Müller et al. [61] and Virkkula et al. [62] for the PSAP and CLAP; Collaud Coen et al. [63] and Weingartner et al. [64] for aethalometers). As the final step of data preprocessing, we normalize the raw data, whose variables span very different scales, into a specific range [65], which can significantly improve the prediction accuracy of the models (see the Supplementary Material for more details).
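The exact normalization is described in the Supplementary Material. As a minimal sketch, assume min-max scaling to [0, 1] with scikit-learn's `MinMaxScaler`. The three synthetic input columns and their ranges are hypothetical stand-ins for variables on very different scales (e.g., B abs, B scat, total number concentration).

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
low, high = [0.3, 5.0, 100.0], [10.0, 200.0, 20000.0]
X_train = rng.uniform(low, high, size=(500, 3))  # hypothetical training inputs
X_test = rng.uniform(low, high, size=(100, 3))   # hypothetical test inputs

scaler = MinMaxScaler(feature_range=(0.0, 1.0))
X_train_scaled = scaler.fit_transform(X_train)  # min/max learned from training data only
X_test_scaled = scaler.transform(X_test)        # training min/max reused for test data
```

Fitting the scaler on the training set alone keeps test and validation data from influencing the preprocessing. As a consequence, scaled test values can fall slightly outside [0, 1].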

Preprocessing Size Distribution Data
Both number and volume size distributions are selected as input variables in our model because they provide different information about aerosol properties. For example, large particles (such as mineral dust and sea salt) primarily affect the volume concentration without substantially affecting the number concentration. In this work, the aerosol volume distribution (N V (D P)) is calculated from the number size distribution (N N (D P)) assuming spherical particles:

$$N_V(D_P) = \frac{\pi}{6}\, D_P^{3}\, N_N(D_P) \qquad (3)$$

where D P is the aerosol mobility diameter (nm). When processing the ARM datasets, we first convert the APS data from aerodynamic to mobility diameters, assuming spherical particles with a density of 1.2 g cm⁻³, following the procedure used in Ondráček et al. [66] and Shen et al. [67].
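The sketch below computes the spherical-particle volume distribution, N V = (π/6)·D P³·N N, and a simplified aerodynamic-to-mobility conversion. The conversion D m = D a·√(ρ0/ρp) with ρp = 1.2 g cm⁻³ and ρ0 = 1 g cm⁻³ is an assumption: it neglects slip-correction differences between the two diameters, which the cited procedures may include. Function names are hypothetical.

```python
import numpy as np


def volume_distribution(d_p_nm, n_number):
    """N_V(D_P) = (pi / 6) * D_P**3 * N_N(D_P), assuming spherical particles.

    With D_P in nm and N_N in cm^-3 per bin, the result is nm^3 cm^-3 per bin;
    multiply by 1e-9 to obtain um^3 cm^-3.
    """
    d = np.asarray(d_p_nm, dtype=float)
    return np.pi / 6.0 * d**3 * np.asarray(n_number, dtype=float)


def aerodynamic_to_mobility(d_a_nm, particle_density=1.2, reference_density=1.0):
    """Simplified conversion for spheres: D_m = D_a * sqrt(rho_0 / rho_p).

    Slip-correction differences between the two diameters are neglected.
    """
    return np.asarray(d_a_nm, dtype=float) * np.sqrt(reference_density / particle_density)


# Hypothetical APS bins (aerodynamic diameter, nm) and number concentrations (cm^-3).
d_mobility = aerodynamic_to_mobility([600.0, 1000.0, 5000.0])
n_v = volume_distribution(d_mobility, [2.0, 1.0, 0.01])
```

The D P³ weighting is why a few coarse particles dominate the volume distribution while barely changing the number distribution.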
Because the size bins of the distributions may differ depending on pre-defined parameters (such as the selected size range and size resolution), we represent both the number and volume size distributions using three size classes (see Table 2). The aerosol concentration and the percentage of total aerosol concentration falling within each class are calculated using Module A and are later input to our model. Furthermore, the measured particle size range may vary from instrument to instrument. For example, the combination of SMPS and APS covers a broad size range from roughly 10 nm to 20 µm, while the UHSAS measures particles ranging from 55 nm to 1 µm. To account for the measured size range, the boundaries of the size classes (d N1 and d N2, d V1 and d V2) are input parameters to Module A. Our default size boundaries are d N1 = 50 nm, d N2 = 200 nm, d V1 = 1000 nm, and d V2 = 2500 nm, but users can specify the boundaries of each size class prior to running the module. For example, when preprocessing the FIREX dataset, we used d N1 = 80 nm, d N2 = 200 nm, d V1 = 200 nm, and d V2 = 800 nm, since only an SMPS was deployed during the campaign. The sensitivity of the predicted MAC BC to the selected size bins will be discussed elsewhere.
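A rough analogue of the Module A size-class summary, not its actual code, is sketched below. It assumes the input concentrations are already integrated over each bin (not dN/dlogD P) and uses the bin-edge convention class 1: D < d1, class 2: d1 ≤ D < d2, class 3: D ≥ d2. Both are assumptions, as is the function name. The example uses the default number-distribution boundaries quoted above.

```python
import numpy as np


def size_class_summary(d_p_nm, conc_per_bin, d1_nm, d2_nm):
    """Collapse a binned distribution into three classes split at d1_nm and d2_nm.

    Assumed convention: class 1 is D < d1, class 2 is d1 <= D < d2, class 3 is
    D >= d2, and conc_per_bin holds concentrations already integrated per bin.
    """
    if not d1_nm < d2_nm:
        raise ValueError("d1_nm must be smaller than d2_nm")
    d = np.asarray(d_p_nm, dtype=float)
    c = np.asarray(conc_per_bin, dtype=float)
    classes = [
        float(c[d < d1_nm].sum()),
        float(c[(d >= d1_nm) & (d < d2_nm)].sum()),
        float(c[d >= d2_nm].sum()),
    ]
    total = sum(classes)
    fractions = [x / total for x in classes] if total > 0 else [float("nan")] * 3
    return {"classes": classes, "total": total, "fractions": fractions}


# Default number-distribution boundaries: d_N1 = 50 nm, d_N2 = 200 nm.
summary = size_class_summary([20.0, 60.0, 100.0, 300.0], [10.0, 20.0, 30.0, 40.0], 50.0, 200.0)
```

The same function applies to a volume distribution with d V1 and d V2 as the boundaries. The three class values and three fractions correspond to the N 1 to N 3 and F N1 to F N3 style inputs of Table 2.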

Preprocessing Absorption and Scattering Measurements
Filter-based absorption photometers have known biases, largely due to the presence of the filter. To minimize these measurement biases, we include the correction algorithms developed by Li et al. [40] and Bond et al. [41] as an optional module (Module B). One advantage of the Li et al. [40] algorithm is that it updates the correction coefficients based on optical properties (i.e., SSA and AAE) computed from the input aerosol data. However, because the Bond et al. [41] correction method is still widely used, we also incorporate it in our module.
Furthermore, because filter-based absorption photometers and nephelometers may operate at different wavelengths, we provide another optional module (Module C) to convert absorption and scattering measurements to "standard" wavelengths (467, 528, and 652 nm in the present work). For example, we use the following equation to convert B abs and B scat from the measured wavelength (λ meas) to 652 nm:

$$B(652\,\mathrm{nm}) = B(\lambda_{meas}) \left(\frac{652\,\mathrm{nm}}{\lambda_{meas}}\right)^{-\mathrm{AE}} \qquad (4)$$

where B can be either B abs or B scat and AE can be either AAE or SAE. AAE and SAE are calculated by fitting power-law curves to the measured B abs and B scat, respectively [68,69].
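A sketch of the wavelength conversion B(target) = B(λ meas)·(target/λ meas)^(−AE), applied to nephelometer scattering, follows. Two steps are assumptions: each "standard" wavelength is converted from the nearest measured wavelength, and SAE comes from a least-squares power-law fit in log-log space. The synthetic B scat values (generated with SAE = 1.5) and the function name are hypothetical.

```python
import numpy as np


def convert_wavelength(b_meas, lambda_meas_nm, lambda_target_nm, ae):
    """Convert B from the measured to a target wavelength: B * (target/meas)**(-AE)."""
    return b_meas * (lambda_target_nm / lambda_meas_nm) ** (-ae)


# Synthetic nephelometer B_scat (Mm^-1) at 450, 550, 700 nm with SAE = 1.5.
neph_wl = np.array([450.0, 550.0, 700.0])
b_scat = 40.0 * (neph_wl / 550.0) ** (-1.5)
sae = -np.polyfit(np.log(neph_wl), np.log(b_scat), 1)[0]  # power-law fit for SAE

# Convert each "standard" wavelength from the nearest measured wavelength.
standard_wl = np.array([467.0, 528.0, 652.0])
nearest = np.abs(neph_wl[:, None] - standard_wl[None, :]).argmin(axis=0)
b_scat_standard = convert_wavelength(b_scat[nearest], neph_wl[nearest], standard_wl, sae)
```

Converting from the nearest measured wavelength keeps each extrapolation short, which limits the effect of any curvature that a single power law does not capture.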

Basic Principles of Data Analytical Techniques
Different data analytical approaches, including multiple linear regression using ordinary least squares (OLS), forward and backward stepwise regression, the least absolute shrinkage and selection operator (LASSO), support vector machine (SVM), artificial neural network (ANN), and convolutional neural network (CNN), are introduced and applied to our datasets to find the optimal model for MAC BC prediction. The basis, advantages, and limitations of each approach are provided in Table 3. A detailed description of the mechanics of each approach can be found in the Supplementary Material.
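The sketch below is a rough scikit-learn analogue of most of these approaches, not the authors' configuration. All hyperparameters are placeholders, and the data are synthetic stand-ins for the 20 candidate inputs. Stepwise regression is swapped for scikit-learn's `SequentialFeatureSelector`, which adds or removes features by cross-validated score, so its selections may differ from the stepwise criterion used in this work. The CNN is omitted because it requires a deep-learning framework.

```python
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVR


def build_models(n_selected=5, seed=0):
    """Return untrained regressors; every hyperparameter here is a placeholder."""
    def stepwise(direction):
        selector = SequentialFeatureSelector(
            LinearRegression(), n_features_to_select=n_selected, direction=direction
        )
        return make_pipeline(MinMaxScaler(), selector, LinearRegression())

    return {
        "OLS": make_pipeline(MinMaxScaler(), LinearRegression()),
        "forward": stepwise("forward"),
        "backward": stepwise("backward"),
        "LASSO": make_pipeline(MinMaxScaler(), Lasso(alpha=0.01)),
        "SVM": make_pipeline(MinMaxScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.1)),
        "ANN": make_pipeline(
            MinMaxScaler(),
            MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000, random_state=seed),
        ),
    }


# Synthetic stand-in for 20 candidate inputs and a partly nonlinear MAC_BC response.
rng = np.random.default_rng(0)
X = rng.uniform(size=(300, 20))
y = 5.0 + 3.0 * X[:, 0] + 2.0 * np.sin(3.0 * X[:, 1]) + rng.normal(0.0, 0.1, 300)
models = {name: model.fit(X, y) for name, model in build_models().items()}
```

Wrapping each estimator in a pipeline with the scaler keeps normalization tied to the training data whenever a model is refitted.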

Analytical Tools
The data analysis is performed in Python (version 3.7.5) for Windows within an "Anaconda" environment. Creating an "Anaconda" environment allows the installation and use of the libraries and packages needed for the data analytical techniques described in this work. We provide a "User's Guide" with a detailed description of the procedures to install packages, set up the working environment, and apply our modules and models at https://doi.org/10.5281/zenodo.3967833 [60].

Results
As presented in Table 1, B abs values corrected by different methods lead to observation-based MAC BC values whose means differ by roughly a factor of two. Therefore, we considered two versions of the model: one using B abs corrected by Li et al. [40] both as input variables and as the numerator in Equation (2) to calculate MAC BC, and another using B abs corrected by Bond et al. [41]. In the main text, we focus on the results from the former, as Li et al. [40] observed good agreement between corrected filter-based B abs and reference B abs from photoacoustic instruments. The evaluation of the latter model can be found in the Supplementary Material (Tables S9 and S10, Figures S10-S12). We also input B abs corrected by Bond et al. [41] into the first model (without retraining it) and compare the predicted MAC BC against our original results (Tables S11-S13).

Performance of Models on the Training and Test Datasets
The parameters of the seven data analytical techniques are trained using 7321 TCAP records and tested using the remaining 1830 TCAP records. The fitted model coefficients can be found in the Supplementary Material (Figure S6). The performance of the techniques is assessed using statistical criteria, including the coefficient of determination (R²) and mean square error (MSE). As seen in Table 4, all techniques capture the variations in MAC BC to some extent, although some models perform better than others according to these criteria. We have also included results obtained using the "standard" assumptions (MAC BC = 7.5 m² g⁻¹ at 550 nm and AAE BC = 1) to determine MAC BC at 870 nm; based on the statistical criteria, this is unequivocally the worst-performing approach to predict the hourly MAC BC for any dataset. During the training phase, ANN has the highest statistical accuracy (R² = 0.798 and MSE = 0.227), although the second and third most accurate techniques (SVM and CNN) are not very different from ANN. In contrast, the traditional statistical approaches (OLS, forward and backward regressions) and LASSO have lower accuracies (R² ≈ 0.450 and MSE ≈ 0.950). The more complex model structures of ANN, SVM, and CNN allow them to capture nonlinear relationships between MAC BC and the aerosol properties better than the traditional statistical models. Moreover, the slightly better performance of ANN relative to CNN is consistent with our expectation, since the ANN algorithm has a deeper network structure (more parameters in the network) than the CNN algorithm.
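The two criteria and the "standard"-assumption baseline can be sketched with scikit-learn's `r2_score` and `mean_squared_error`. The "true" MAC BC series and the model-like predictions are synthetic and hypothetical. The baseline is a constant, 7.5 m² g⁻¹ scaled from 550 to 870 nm with AAE = 1. A constant prediction can never reach an R² above zero, because its squared errors are at least those of predicting the sample mean. This is one reason a fixed MAC cannot track temporal variability.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# "Standard" assumption: 7.5 m2 g-1 at 550 nm scaled to 870 nm with AAE = 1.
STANDARD_MAC_870 = 7.5 * (870.0 / 550.0) ** (-1.0)


def evaluate(y_true, y_pred):
    """Return R^2 and MSE for a set of predictions."""
    return {"R2": r2_score(y_true, y_pred), "MSE": mean_squared_error(y_true, y_pred)}


rng = np.random.default_rng(0)
y_true = rng.normal(6.0, 1.5, size=1000)             # synthetic "true" MAC_BC (m2 g-1)
y_model = y_true + rng.normal(0.0, 0.7, size=1000)   # synthetic model with random error
y_standard = np.full_like(y_true, STANDARD_MAC_870)  # constant prediction

scores_model = evaluate(y_true, y_model)
scores_standard = evaluate(y_true, y_standard)
```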
During the testing phase, the techniques' performance does not differ markedly from that in the training phase (i.e., R² and MSE are similar for both phases). Overall, the similar performance of the models on the training and test datasets suggests that the models have achieved satisfactory generalization to new, "unseen" MAC BC data at the TCAP observational site. Figure 1a,b shows the time series of MAC BC predicted by different techniques for the training and test sets, respectively, along with the "true" MAC BC. We only present the results of OLS, LASSO, and SVM in this figure; results for the other models are provided in Figures S7 and S8. The figures use the same time resolution as the results in Table 4. For both the training and test sets, the predicted MAC BC closely follows the "true" MAC BC with no signs of systematic error.

Performance of Models on the Independent Validation Datasets
As the ultimate goal of comparing different techniques is to evaluate their generalizability on new datasets other than TCAP data, we apply all techniques to two independent validation sets: field data from the CACTI campaign and laboratory data from the FIREX campaign.
The CACTI campaign deployed the same instrumentation as the TCAP campaign but was conducted at a different observational site, resulting in different aerosol properties (Table 1). Hence, the CACTI data provide an opportunity to explore how the models may apply to different sampling locations and ambient aerosol sources. As seen in Figure 1c, Figure 2a, and Figure S9, the seven approaches are capable of predicting MAC BC within a factor of two of the "true" MAC BC (i.e., predicted errors < ±50%) for most of the sampling days, which suggests that our models may be valid for new datasets. However, during some periods our models are consistently biased high (e.g., 6-9 December) or low (e.g., a ~6 h window on 20 December). Yuan et al. [16] demonstrate that MAC BC can vary with BC mixing state, so these discrepancies could result from an aerosol mixing state that differs from that of the training data. We will explore the role of mixing state and other potential explanatory variables for these systematic biases in a forthcoming companion paper.
We quantitatively assess the model scatter by calculating the percentage of points that lie within a factor of two of the true response (Table S7). Overall, our models estimate over 80% of the MAC BC values from CACTI within a factor of two, and OLS yields the greatest percentage (92.4%). When comparing R² and MSE across the approaches for the CACTI dataset (Table 4), we find that the traditional statistical approaches tend to yield similar or even better results than the machine learning approaches, unlike for the TCAP data. However, for the CACTI data, the performance of all models is arguably similar (see also Figure 3 and the relevant discussion).
… Figure S9. The grey and brown shaded regions represent predicted errors < ±25% and < ±50% referenced to the line of perfect agreement (1:1).
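The scatter statistics can be sketched with NumPy as two simple fractions. The function names are hypothetical, and "within a factor of two" is taken to mean 0.5 ≤ P/T ≤ 2, which is an assumption about the exact criterion behind Table S7. The relative-error version corresponds to the ±25% and ±50% shaded bands. The two measures differ: a factor of two tolerates over-prediction of up to +100%, whereas a ±50% band does not.

```python
import numpy as np


def fraction_within_factor(predicted, true, factor=2.0):
    """Fraction of points with 1/factor <= predicted/true <= factor."""
    ratio = np.asarray(predicted, dtype=float) / np.asarray(true, dtype=float)
    return float(np.mean((ratio >= 1.0 / factor) & (ratio <= factor)))


def fraction_within_relative_error(predicted, true, tolerance=0.5):
    """Fraction of points with |predicted - true| / true < tolerance (e.g., 0.25 or 0.5)."""
    p = np.asarray(predicted, dtype=float)
    t = np.asarray(true, dtype=float)
    return float(np.mean(np.abs(p - t) / t < tolerance))


# Hypothetical predicted and "true" MAC_BC values (m2 g-1).
pred = [1.0, 2.0, 3.0, 10.0]
true = [1.0, 1.0, 1.0, 1.0]
share_factor_two = fraction_within_factor(pred, true)
```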
… dataset, falling along the "excellent"/"good" boundary with a fractional error of roughly 0.35 and a fractional bias of roughly 0.15. Near this boundary, the traditional statistical approaches applied to the CACTI data lie inside the "excellent" region, while the SVM and neural networks fall outside it (but within the "good" region), which is consistent with the R² and MSE results in Table 4. Performance begins to diverge on the FIREX dataset: while the fractional biases are within ±0.10, the fractional errors range from roughly 0.35 (SVM) to roughly 1.15 (OLS), with the majority falling around the "good"/"average" boundary. OLS has the largest fractional error because its predicted MAC BC values deviate most from the true values (e.g., Figures 1d and 2b). Although the other approaches perform worse on FIREX than on the training and test datasets (Table 4), their predictions of MAC BC for FIREX still seem reasonable (either "good" or "average").

Figure 3. Fractional bias against fractional error for different combinations of datasets and data analytical approaches. For the points with error bars, the "standard" MAC BC from Bond and Bergstrom [6] is used as the predicted MAC BC, and the "true" MAC BC is used as the desired output.
We also applied these models to our FIREX dataset (laboratory study of biomass burning) to determine how our models work for aerosols that are vastly different from the ambient datasets; e.g., the mean SSA 532 is 0.66 ± 0.32 for the FIREX dataset, whereas SSA 532 for the ambient datasets tends to be >0.95 (Table 1). Consequently, the FIREX data pose a challenge for some, but not all, of our models. This could, at least in part, be due to an inadequate representation of chemical composition by B abs and B scat for these biomass burning aerosols. However, it could also be related to differences in the mixing state of the aerosols, the presence of tar balls, or the fact that we are extrapolating the model away from the (presumably) aged BC in the training data to fresh BC in the emissions data. We will further investigate some of these factors in our forthcoming companion paper. Nevertheless, our SVM model does not appear to be very sensitive to the observed differences in aerosol optical properties or the likely differences in aerosol chemical composition (Figure 2).

Comparison across Datasets
We next adopt the model performance criteria proposed by Morris et al. [81] to quantitatively inter-compare our models across different datasets. According to these criteria, a model can be classified as "excellent", "good", "average", or "[having a] fundamental problem" based on its fractional error and fractional bias. The fractional error is defined as:
$$\mathrm{FE} = \frac{2}{N}\sum_{i=1}^{N}\frac{\left|P_i - T_i\right|}{P_i + T_i} \qquad (5)$$
while the fractional bias is defined as:
$$\mathrm{FB} = \frac{2}{N}\sum_{i=1}^{N}\frac{P_i - T_i}{P_i + T_i} \qquad (6)$$
where $P_i$ is the predicted MAC BC, $T_i$ is the "true" MAC BC, and $N$ is the number of data points.
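Equations (5) and (6) translate directly into code. The sketch below is a minimal NumPy illustration with hypothetical function names; it deliberately does not encode the Morris et al. [81] zone boundaries ("excellent", "good", and so on), because their numerical thresholds are not given in this section.

```python
import numpy as np

def fractional_error(pred, true):
    """Equation (5): FE = (2/N) * sum(|P_i - T_i| / (P_i + T_i)); between 0 and 2 for positive values."""
    p = np.asarray(pred, dtype=float)
    t = np.asarray(true, dtype=float)
    return 2.0 / p.size * np.sum(np.abs(p - t) / (p + t))

def fractional_bias(pred, true):
    """Equation (6): FB = (2/N) * sum((P_i - T_i) / (P_i + T_i)); between -2 and 2 for positive values."""
    p = np.asarray(pred, dtype=float)
    t = np.asarray(true, dtype=float)
    return 2.0 / p.size * np.sum((p - t) / (p + t))

# Over- and under-predictions of equal size cancel in FB but not in FE.
fe = fractional_error([6.0, 4.0], [4.0, 6.0])   # 0.4
fb = fractional_bias([6.0, 4.0], [4.0, 6.0])    # 0.0
```

Because each term is normalized by $P_i + T_i$, a prediction three times the true value contributes a bias of 1.0, so both metrics stay bounded even for large outliers.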
The results for our models are provided in Figure 3 (see Table S8 for the detailed results). All models applied to the training and test datasets meet the criteria for "excellent", with fractional bias close to zero and fractional error around 0.1. The models also perform similarly on the CACTI dataset, falling along the "excellent"/"good" boundary with fractional error of roughly 0.35 and fractional bias of roughly 0.15. Near this boundary, the traditional statistical approaches applied to the CACTI data lie inside the "excellent" region, while the SVM and neural networks fall outside it (but within the "good" region), which is consistent with the R 2 and MSE results in Table 4. Performance begins to diverge on the FIREX dataset: while the fractional biases are within ±0.10, the fractional errors range from roughly 0.35 (SVM) to roughly 1.15 (OLS), with the majority falling around the "good"/"average" boundary. The larger fractional error of the OLS approach arises because its predicted MAC BC values deviate most from the true values (e.g., Figures 1d and 2b). Although the other approaches perform worse on FIREX than on the training and test datasets (Table 4), their ability to predict MAC BC still seems reasonable for FIREX (either "good" or "average").
For the datasets included in Figure 3, we also compute the fractional error and fractional bias using the "standard" MAC BC value (4.74 ± 0.76 m 2 g −1 at 870 nm) as the "predicted" MAC BC. The "standard" MAC BC is derived from MAC BC (550 nm) = 7.5 ± 1.2 m 2 g −1 (recommended by Bond and Bergstrom [6]) and an AAE of 1, following Equation (1). The "standard" MAC BC results in greater fractional bias and fractional error than most combinations of our datasets and approaches. The better performance of our approaches could be explained by their ability to incorporate temporally varying input parameters in the calculation of MAC BC. Such variations in MAC BC cannot be captured by the "standard" MAC BC, since it is reported as a single value with a standard deviation and was recommended for fresh, uncoated BC particles.
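As a quick arithmetic check of the 4.74 ± 0.76 m 2 g −1 value, the sketch below assumes that Equation (1) is the usual power law MAC(λ) = MAC(550 nm)·(λ/550 nm)^(−AAE); the function name is hypothetical. With AAE fixed at 1, the scaling factor is a constant, so the standard deviation scales by the same factor as the mean.

```python
def mac_at_wavelength(mac_ref, wavelength_nm, ref_wavelength_nm=550.0, aae=1.0):
    """Scale MAC from a reference wavelength with an assumed power law:
    MAC(lambda) = MAC(lambda_ref) * (lambda / lambda_ref) ** (-AAE)."""
    return mac_ref * (wavelength_nm / ref_wavelength_nm) ** (-aae)

mac_870 = mac_at_wavelength(7.5, 870.0)   # ~4.74 m2 g-1
sd_870 = mac_at_wavelength(1.2, 870.0)    # ~0.76 m2 g-1 (linear scaling, AAE held fixed)
print(f"{mac_870:.2f} +/- {sd_870:.2f} m2 g-1 at 870 nm")  # 4.74 +/- 0.76 m2 g-1 at 870 nm
```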
We explore bias as a function of time in the Supplementary Materials. Specifically, we compare the time series of residuals (residual = predicted MAC BC − MAC BC,true) from our models with those from the "standard" MAC BC (see Table S15 and Figure S13). For all datasets, our models have mean residuals closer to zero than the standard approach. However, for the validation datasets, our models have a greater residual standard deviation than the standard approach, which may be related to the fact that the standard approach is generally biased high. By contrast, the SVM model for the CACTI dataset, for example, has periods where it is biased high (e.g., 6-9 December), biased low (e.g., 1-3 December), and practically unbiased (e.g., 15-17 December). Nevertheless, these results suggest that our models, while uncertain, may have the potential to capture temporal variability in MAC BC.
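The residual summary can be sketched as below with toy values (not campaign data) and a hypothetical helper name. One property is worth making explicit: for a constant prediction such as the "standard" MAC BC, the residual standard deviation is exactly the standard deviation of the true series, so it reflects only the variability of the observations, while a time-varying model can shrink the mean residual but adds its own scatter.

```python
import numpy as np

def residual_summary(pred, true):
    """Mean and sample standard deviation of residuals = pred - true."""
    r = np.asarray(pred, dtype=float) - np.asarray(true, dtype=float)
    return r.mean(), r.std(ddof=1)

# Toy hourly MAC_BC series in m2 g-1 (illustrative only)
mac_true = np.array([3.5, 4.0, 4.5, 4.0])
mac_standard = np.full_like(mac_true, 4.74)       # constant "standard" value at 870 nm
mac_model = np.array([3.7, 3.9, 4.6, 4.1])        # hypothetical time-varying prediction

mean_std, sd_std = residual_summary(mac_standard, mac_true)   # biased high by 0.74
mean_mod, sd_mod = residual_summary(mac_model, mac_true)
```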
Consequently, our results indicate that the proposed data analytical approaches work reasonably well on datasets that are not very different from our training dataset (e.g., ambient measurements from different environmental conditions), and the MAC BC estimated by these approaches is very likely to provide better predictions of BC mass than a constant value of MAC BC. Under these conditions, the different models also perform similarly. For datasets that are vastly different from our training dataset (e.g., aerosols with high BrC concentrations, as in biomass burning), the machine learning approaches (e.g., SVM and ANN) yield lower fractional errors and fractional biases closer to zero than the traditional statistical approaches and the "standard" MAC BC. Moreover, our FIREX dataset (laboratory emissions) has a mean SSA of 0.66, whereas field measurements of aerosol optical properties in biomass burning plumes tend to report SSA > 0.9 (e.g., [82][83][84][85] and references in [86]); thus, we may have used an "extreme" dataset that is not representative of real-world biomass burning plumes. In other words, our models may have limited utility for measurements of fresh emissions from combustion sources.

Comparison to a Different Correction for B abs
In this section, we briefly discuss how using the Bond et al. [41] correction for B abs affects the model outputs. We rebuilt the models using this correction and provide values of R 2 and MSE in Table S9 and of fractional error and fractional bias in Table S10. Compared to the models built using the Li et al. [40] correction, the rebuilt models have slightly higher R 2 for the training, test, CACTI, and some of the FIREX data, but their MSE is, on average, about a factor of 5 larger.
Likewise, the fractional error is, on average, roughly 0.03 greater for the rebuilt models in the hourly training and test data, whereas for the validation data the rebuilt models sometimes have lower fractional error (e.g., 0.418 for the rebuilt ANN model versus 0.484 for the original ANN model). Fractional bias shows a similar pattern: the original models have slightly lower fractional bias for the hourly training and test data, while the comparison for the validation datasets is more varied in both magnitude and direction. Interestingly, the rebuilt machine learning models all have slightly lower fractional bias for the CACTI data but slightly larger fractional bias for the FIREX data.
To understand how the choice of B abs correction affects model output when simply running the models, we corrected B abs values using Bond et al. [41] and used them as inputs to our original models (built using the Li et al. [40] correction); the resulting statistical metrics (R 2, MSE, fractional error, and fractional bias) are provided in Tables S11 and S12. Compared with the original model results presented in earlier sections, R 2 is typically lower when using the Bond et al. [41] correction. Interestingly, the Li et al. [40] correction nearly always results in better performance (lower MSE, lower fractional error, and fractional bias closer to zero) for the CACTI data, while the Bond et al. [41] correction nearly always yields lower error for the FIREX data (fractional bias varies by model).
As a further comparison between these two corrections, we conducted t-tests on the 1-min data to determine whether the mean MAC BC values predicted by the original models (built using the Li et al. [40] correction) differ significantly depending on which correction is applied to the input B abs; results are provided in Table S13. The means were significantly different (p < 0.05) for all models with the TCAP dataset, for two models with the CACTI dataset (OLS and backward stepwise), and for all but the SVM and ANN models with the FIREX dataset. Some of these statistical differences may be related to the relatively large sample sizes (>6700), because the relative percent difference between the means ranged only from 0.0 to 5.2%. Moreover, these differences are small compared to the estimated uncertainties in the model outputs (see Section 4).
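The point about sample size can be illustrated with synthetic data: with ~7000 points, a mean shift of only ~2% already gives a vanishingly small p-value. The sketch below is an assumption-laden illustration, not the authors' test: the paper does not state which t-test variant was used, so we show a two-sample Welch test whose p-value uses a normal approximation (reasonable at these sample sizes), and we define the relative percent difference against the average of the two means. Function and variable names are hypothetical.

```python
import math
import numpy as np

def welch_t_test_large_n(a, b):
    """Two-sided Welch t-test; the p-value uses a normal approximation,
    which is adequate only for large samples (thousands of points)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    se = math.sqrt(a.var(ddof=1) / a.size + b.var(ddof=1) / b.size)
    t = (a.mean() - b.mean()) / se
    p = math.erfc(abs(t) / math.sqrt(2.0))   # two-sided tail probability of N(0, 1)
    return t, p

def relative_percent_difference(a, b):
    """|mean(a) - mean(b)| relative to the average of the two means, in %."""
    ma, mb = float(np.mean(a)), float(np.mean(b))
    return 100.0 * abs(ma - mb) / ((ma + mb) / 2.0)

# Synthetic stand-in: ~2% shift between two sets of 7000 predictions
rng = np.random.default_rng(0)
mac_li = rng.normal(5.0, 1.0, 7000)
mac_bond = 1.02 * mac_li + rng.normal(0.0, 0.1, 7000)

t, p = welch_t_test_large_n(mac_li, mac_bond)
rpd = relative_percent_difference(mac_li, mac_bond)
print(f"t = {t:.1f}, p = {p:.1e}, relative difference = {rpd:.1f}%")
```

Because both series here come from the same time steps, a paired test would be even more sensitive; either way, statistical significance at this sample size says little about practical importance.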
Consequently, a holistic consideration of these metrics suggests that the Li et al. [40] correction results in better performance when building the models. However, there is no strong evidence that either correction is better when running the models to predict MAC BC.

Discussion
In this section, we present an uncertainty assessment of MAC BC quantified by our data analytical models. For comparison, we also compute the uncertainties of MAC BC derived by various empirical and theoretical approaches. At the end of this section, we provide some insights for the application of these approaches.
Empirical MAC BC determined from direct measurements of BC mass and absorption is probably the least controversial, as we can estimate a 24% uncertainty in MAC BC via quadrature, based on 10% in B abs [40] and 22% in SP2-measured BC mass concentrations [87]. However, for the data analytical approaches developed in this paper, the "standard" MAC BC values reviewed by Bond and Bergstrom [6], and the theoretical approaches, each of the contributing terms may propagate uncertainties to the derived MAC BC . Therefore, we provide the following uncertainty estimations using both quadrature (because all models will be affected by random error in the input variables) and Monte Carlo approaches [88][89][90].
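As a worked check of the 24% figure: since MAC BC = B abs / M BC, and assuming the errors in B abs and the SP2-measured BC mass are independent, their relative uncertainties combine in quadrature (the symbols below are our own notation):

```latex
\frac{\sigma_{\mathrm{MAC}}}{\mathrm{MAC}_{\mathrm{BC}}}
  = \sqrt{\left(\frac{\sigma_{B_{\mathrm{abs}}}}{B_{\mathrm{abs}}}\right)^{2}
        + \left(\frac{\sigma_{M_{\mathrm{BC}}}}{M_{\mathrm{BC}}}\right)^{2}}
  = \sqrt{(0.10)^{2} + (0.22)^{2}}
  = \sqrt{0.0584}
  \approx 0.24
```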

Uncertainty in Data Analytical Approaches
All models will have some uncertainty due to random error in the model inputs. To compare to the uncertainty of empirical MAC BC , we consider B abs to have 10% uncertainty [40] and B scat to have 10% uncertainty [91], and we assume that all of the parameters related to aerosol size have 10% uncertainty, based on [92] for SMPS and [93] for APS. Using quadrature, the models including all 20 input parameters (OLS, SVM, ANN, and CNN) have an estimated uncertainty of roughly 45%, while LASSO, with only 9 input parameters, has an estimated uncertainty of roughly 30%. This is likely an imperfect approach because the machine learning models, in particular, are very complex and because some of the model inputs are related to each other. For example, the total number (volume) concentration is equal to the sum of the number (volume) concentrations within the coarse bins, while the fractional number concentration is equal to the number (volume) concentration in a given bin divided by the total number (volume) concentration. Moreover, there are analogous equations to Equation (1) that describe the relationship between B scat and B abs across the different wavelengths.
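The ~45% and ~30% figures follow from a root-sum-square over the model inputs. The minimal sketch below (hypothetical function name) makes the implicit assumptions explicit: the input errors are independent, and each input's relative error passes one-to-one into MAC BC. As noted above, neither assumption strictly holds for these correlated inputs and nonlinear models.

```python
import math

def quadrature(rel_uncertainties_pct):
    """Root-sum-square of independent relative uncertainties (%), assuming each
    input propagates one-to-one into MAC_BC."""
    return math.sqrt(sum(u * u for u in rel_uncertainties_pct))

all_inputs = quadrature([10.0] * 20)   # OLS, SVM, ANN, CNN: 20 inputs at 10% each
lasso = quadrature([10.0] * 9)         # LASSO: 9 inputs at 10% each
print(f"{all_inputs:.1f}% vs {lasso:.1f}%")  # 44.7% vs 30.0%
```

With equal 10% terms, the result is simply 10% × √(number of inputs), which is why the sparser LASSO model has a lower propagated uncertainty.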
We also consider an uncertainty analysis using the bootstrap approach (a nonparametric Monte Carlo analysis [89,90]). In our analysis, bootstrap resampling mimics the process of obtaining new data from the TCAP site that have the same underlying distribution as our original TCAP dataset. We then rebuild the different data analytical models using the new datasets. The relative standard deviations (i.e., standard deviation/mean) of R 2 and MSE derived from the rebuilt models are presented in Table 5. For the validation datasets, the rebuilt SVM and CNN models tend to be more stable (i.e., lower relative standard deviations) than the traditional regression approaches and the other machine learning approaches, suggesting lower model uncertainty when applying those models to new datasets.
Table 5 notes: a The bootstrap resampling process is repeated five times; the mean and standard deviation of R 2 and MSE can be found in Table S14. b The result is not interpretable because extreme values of MSE are computed when the new datasets are used to rebuild the model; we have also excluded the R 2 value for these cases.
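The bootstrap procedure can be sketched as follows. This is a minimal illustration, not the authors' implementation: an ordinary least-squares fit (via numpy.linalg.lstsq) stands in for the seven data analytical approaches, the data are synthetic rather than TCAP measurements, and the function names are hypothetical. Following the text, the model is rebuilt five times; more resamples would give a more reliable estimate of the relative standard deviation.

```python
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns [intercept, slopes...]."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_ols(coef, X):
    return coef[0] + X @ coef[1:]

def bootstrap_rsd(X, y, X_val, y_val, n_boot=5, seed=0):
    """Rebuild the model on bootstrap resamples of the training data and return
    the relative standard deviation (std/mean) of validation R^2 and MSE."""
    rng = np.random.default_rng(seed)
    n = len(y)
    r2s, mses = [], []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)        # resample rows with replacement
        coef = fit_ols(X[idx], y[idx])
        resid = y_val - predict_ols(coef, X_val)
        mses.append(np.mean(resid ** 2))
        r2s.append(1.0 - np.sum(resid ** 2) / np.sum((y_val - y_val.mean()) ** 2))
    r2s, mses = np.array(r2s), np.array(mses)
    return r2s.std(ddof=1) / r2s.mean(), mses.std(ddof=1) / mses.mean()

# Synthetic stand-in data (not the TCAP measurements)
rng = np.random.default_rng(1)
w = np.array([0.5, -0.3, 0.2])
X_train = rng.normal(size=(300, 3))
y_train = 5.0 + X_train @ w + rng.normal(0.0, 0.3, 300)
X_val = rng.normal(size=(100, 3))
y_val = 5.0 + X_val @ w + rng.normal(0.0, 0.3, 100)

rsd_r2, rsd_mse = bootstrap_rsd(X_train, y_train, X_val, y_val)
print(f"RSD of R^2: {rsd_r2:.3f}, RSD of MSE: {rsd_mse:.3f}")
```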
In Figure 2 and Figure S9 and in Table S7, we demonstrate that all of our data analytical approaches can predict the majority (>63%) of the empirical MAC BC within ±50%, with one exception (the OLS model for the FIREX data). Considering that none of these models include aerosol composition or explicitly consider aerosol optical properties such as SSA and AAE, we argue that this performance is robust. Moreover, all of these models may capture temporal variability due to their use of time-resolved input data. Consequently, our models provide considerable improvements over the assumption of a constant value of MAC BC .
However, because our models have not been evaluated using, e.g., urban aerosols, their performance in some atmospheric environments is unknown. Because machine learning approaches inherently tend to perform worse for extrapolation than for interpolation problems [94], our models may only have utility for environments that are similar to the training data (e.g., highly aged/mixed BC).

Uncertainty in the "Standard Approach"
As discussed previously, the standard approach to calculating MAC BC is based on MAC BC at 550 nm = 7.5 m 2 g −1 and AAE = 1, although discrepancies with both values exist in the literature. Therefore, uncertainties may arise when extending MAC BC to 870 nm. We conduct a Monte Carlo simulation to quantify this uncertainty, assuming that MAC BC (550 nm) values are normally distributed about 7.5 m 2 g −1 with a standard deviation of 1.2 m 2 g −1 and that AAE is uniformly distributed between 0.6 and 1.6. Using 10,000 iterations, we estimate an uncertainty of roughly 21% in MAC BC at 870 nm based on the standard assumption. However, this represents the uncertainty for a constant value and therefore cannot capture any spatial or temporal variability in the "true" MAC BC, which may be around 50% (Table 1).
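The Monte Carlo estimate can be reproduced approximately with the sketch below. It assumes that Equation (1) is the power law MAC(λ) = MAC(550 nm)·(λ/550 nm)^(−AAE) and that "uncertainty" means the relative standard deviation of the simulated MAC BC at 870 nm; the random seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000                                   # iterations, as in the text

mac550 = rng.normal(7.5, 1.2, n)             # MAC_BC at 550 nm, m2 g-1
aae = rng.uniform(0.6, 1.6, n)               # absorption Angstrom exponent
mac870 = mac550 * (870.0 / 550.0) ** (-aae)  # assumed power-law form of Equation (1)

rel_unc = 100.0 * mac870.std(ddof=1) / mac870.mean()
print(f"mean = {mac870.mean():.2f} m2 g-1, relative uncertainty = {rel_unc:.1f}%")
```

The result (about 21%) combines the 16% spread in MAC BC at 550 nm with a roughly 13% spread from the AAE range. Note that the simulated mean (about 4.6 m 2 g −1) is slightly below 4.74 m 2 g −1, because the assumed AAE distribution is centred at 1.1 rather than 1.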

Uncertainty in Theoretical Approaches
It is difficult to quantify uncertainties in theoretical approaches because of the number of requisite assumptions. Any of the optical models that could be used to predict MAC BC (e.g., [9] and references therein) requires the BC mixing state, BC complex refractive index, and BC density, along with the aerosol size distribution and a representation of BC-containing particles across that distribution. Moreover, the choice of optical model itself depends on the BC morphology. In the literature, Curci et al. [95] estimate an uncertainty of ~30-35% in calculations of SSA and aerosol optical depth due to different model input assumptions, with mixing state playing a large role in these uncertainties. Similarly, Tuccella et al. [96] find that the BC direct radiative effect varies by 50% depending on the assumed mixing state.
We attempt to estimate the uncertainty in MAC BC from a theoretical approach as follows. We assume that the calculation incorporates an empirical aerosol size distribution with 10% uncertainty. We conservatively estimate that mixing state contributes ~25% uncertainty based on Curci et al. [95] and Tuccella et al. [96], and we consider the ranges of complex refractive index and density recommended by Bond and Bergstrom [6] to each contribute ~10% uncertainty. We have no a priori estimates for the uncertainty related to the apportionment of BC across the aerosol size distribution or to BC morphology, so we conservatively assign 10% uncertainty to each. Therefore, we estimate roughly 34% uncertainty (via quadrature) in MAC BC predicted from a theoretical approach.
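Written out, the quadrature sum for the six terms listed above (size distribution, mixing state, refractive index, density, BC apportionment, and morphology; the subscripts are our own labels) is:

```latex
u_{\mathrm{theory}}
  = \sqrt{u_{\mathrm{size}}^{2} + u_{\mathrm{mix}}^{2} + u_{n}^{2} + u_{\rho}^{2}
        + u_{\mathrm{appt}}^{2} + u_{\mathrm{morph}}^{2}}
  = \sqrt{10^{2} + 25^{2} + 10^{2} + 10^{2} + 10^{2} + 10^{2}}\,\%
  = \sqrt{1125}\,\%
  \approx 33.5\,\% \approx 34\,\%
```

The mixing-state term alone accounts for more than half of the summed variance (625 of 1125), so the overall estimate is most sensitive to that assumption.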

Recommendations
There are benefits and limitations to each of the approaches discussed above. Experimental MAC BC requires concurrent measurements of B abs and M BC, which are not often available at long-term monitoring sites. Our data analytical approaches may capture temporal variations in MAC BC using empirical observations of B abs, B scat, and aerosol size distributions, but they have the largest estimated uncertainties propagated from the input variables, and their performance for aerosol types other than those considered here is unknown. Conversely, the standard approach has the lowest estimated uncertainty, but because it uses a constant value, it can never capture temporal variability in MAC BC at a given location. Moreover, while the theoretical approaches are physics-based models, each requires major assumptions that may not be constrained by empirical observations. Considering all of these factors, we recommend the SVM model for further exploration in the calculation of MAC BC. Any of our models can predict MAC BC using data common to ground-based monitoring networks, but the SVM model appears to be the least sensitive to different aerosol types, based on a holistic consideration of R 2, MSE, fractional error, and fractional bias for both evaluation datasets, and its performance is relatively stable when the model is rebuilt using the bootstrap approach. Moreover, SVM is the least sensitive to limitations in the available measurements (e.g., when B abs and B scat are available at only a single wavelength, or when only an APS is available), which will be discussed in a forthcoming companion study. Finally, while an argument could be made for the ANN or CNN models, the SVM model is more computationally efficient than either of them, which suggests utility for processing large datasets from long-term field sites.
However, we add the caveat that when applying our model to aerosols whose properties are very different from our training datasets (see Table 1), the model performance is unknown and could result in substantial error. For example, the analysis of fresh aerosol emissions from fossil fuel combustion (such as diesel engines) may result in similar issues to those in some of our FIREX data; therefore, when the aerosols are likely dominated by BC (SSA < 0.5), using the standard assumptions (MAC BC = 7.5 m 2 g −1 ; AAE BC = 1) may be more appropriate than any of our models.

Conclusions
In this exploratory study, we apply OLS regression, forward stepwise regression, backward stepwise regression, LASSO, SVM, ANN, and CNN to three datasets (TCAP and CACTI ambient data and FIREX laboratory biomass burning data) to study their applicability for estimating MAC BC. The model inputs include only B abs, B scat, and the aerosol size distribution. The nonlinear approaches (SVM, ANN, and CNN) perform much better than the traditional regressions and the LASSO approach for the training and test sets (TCAP data). When the approaches are applied to independent validation datasets, the models capture the overall trends in MAC BC, although performance decreases for some of the data. Arguably, no model is unequivocally the best for the CACTI data when holistically considering R 2, MSE, fractional error, and fractional bias. However, SVM and CNN appear to perform the best for the FIREX data and therefore may be less sensitive to extreme variations in aerosol optical properties. Notably, for all models, there appears to be no obvious difference in model outputs when the input B abs data are corrected using a different approach.
Our uncertainty analysis suggests ~24% uncertainty in empirically derived MAC BC (via quadrature), compared to ~21% in the standard approach (via Monte Carlo), ~34% in a theoretical formulation (via quadrature), and up to ~45% in our statistical models (via quadrature) due to the random error of our 20 input variables. However, the standard approach cannot account for temporal variations in MAC BC, which are on the order of ±50% (Table 1), while any theoretical formulation requires numerous assumptions. Therefore, even though our statistical models have the highest propagated uncertainty, we recommend the SVM model for calculating MAC BC at long-term field monitoring sites because it requires none of these assumptions and incorporates temporally varying aerosol properties, and it therefore has the potential to capture temporal variations in MAC BC. Ultimately, we anticipate that the exploratory approaches in this work can pave the way for the design of data-driven models to estimate complex aerosol properties in future studies.
In a forthcoming paper, we will discuss extending the predicted MAC BC to different wavelengths. This extension requires underlying assumptions about the relationship among MAC BC, wavelength, and AAE BC (Equation (1)), where AAE BC can be assumed either to be constant or to vary with local aerosol properties. We will also explore how the predicted MAC BC is affected by varying atmospheric parameters and changing input variables (such as using single-wavelength rather than three-wavelength B abs) to quantify the extent to which our model results are sensitive and robust.

Acknowledgments:
The ambient data at TCAP and CACTI sites were obtained from the Atmospheric Radiation Measurement (ARM) user facility, a U.S. Department of Energy (DOE) Office of Science user facility managed by the Office of Biological and Environmental Research. The authors would like to thank Rongjun Qin (The Ohio State University) and Adam Varble (Pacific Northwest National Laboratory) for useful discussions on machine learning and the CACTI campaign, respectively. The authors also thank the anonymous reviewers for constructive comments.

Conflicts of Interest:
The authors declare no conflict of interest.