Variability of Major Aerosol Types in China Classified Using AERONET Measurements

Aerosol type is a critical piece of information in both aerosol forcing estimation and passive satellite remote sensing. However, the major aerosol types in China and their variability are still poorly understood. This work uses direct sun measurements and inversion-derived parameters from 47 sites within the Aerosol Robotic Network (AERONET) in China, with more than 39,000 records obtained between April 1998 and January 2017, to identify dominant aerosol types using two independent methods, namely, K means and Self Organizing Map (SOM). In total, we define four aerosol types, namely, desert dust, scattering mixed, absorbing mixed and scattering fine, based on their optical and microphysical characteristics. Seasonally, dust aerosols mainly occur in the spring and over North and Northwest China; scattering mixed aerosols are more common in the spring and summer, whereas absorbing aerosols mostly occur in the autumn and winter during the heating period, and scattering fine aerosols have their highest occurrence frequency in summer over East China. Based on their spatial and temporal distribution, we also generate seasonal aerosol type maps that can be used for passive satellite retrieval. Compared with the global models used in most satellite retrieval algorithms, the unique feature of East Asian aerosols is the curved single scattering albedo spectrum, which could be related to the mixing of black carbon with dust or organic aerosols.


Introduction
In recent decades, with its rapid economic development, China has been suffering from severe air pollution and has thus become a focus of global aerosol study. Aerosol properties in China are complex and highly variable because of the variety of sources, including industrial pollution, seasonal biomass burning and dust [1][2][3][4][5][6][7]. It is thus important to understand the major types of aerosols in China and their spatial-temporal variability, in order to better estimate their climate and environmental effects.
Information on aerosol type is also critical in aerosol optical depth (AOD) retrieval using passive satellite sensors, because the aerosol type must be assumed beforehand in the retrieval algorithms [8][9][10][11][12]. However, many satellite aerosol products, such as those from MODIS and VIIRS, still have large uncertainties in China. Li et al. [13] evaluated the VIIRS AOD product over mainland China and found an overall high bias of 0.13 compared with ground measurements. Tao et al. [14] compared MODIS aerosol retrievals over China with ground observations and showed that Dark Target (DT) retrievals tended to overestimate the aerosol loading, while Deep Blue (DB) retrievals exhibited obvious underestimation in northwestern and southern China. Zhang et al. [15] assessed the quality of the MODIS C6 AOD and found that the 3 km product performed better in suburban areas, whereas the 10 km product performed better in urban areas. Zhu et al. [16] found that the VIIRS AOD overestimation was larger over the North China Plain, with the largest AOD bias in Beijing. One of the reasons for these uncertainties is that the assumed aerosol type is not accurate.

Parameters | Variables
Angstrom Exponent (note 3) | Angstrom Exponent-Total, Absorption Angstrom Exponent, Alpha
Single Scattering Albedo | SSA440-T, SSA676-T, SSA869-T, SSA1020-T (notes 1, 2)
Note 1: 440, 675, 676, 869, 870, 1020 refer to the wavelength in nm.
Note 2: T refers to total. F refers to fine. C refers to coarse.
Note 3: Angstrom Exponent-Total and Absorption Angstrom Exponent are calculated from the AOD obtained by sky scattering light inversion. Alpha is the Angstrom index calculated from the AOD obtained by direct observation of solar radiation.
In total, the 47 sites selected in this paper have 39,249 aerosol observations (Level 2.0 AERONET data) from April 1998 to January 2017. Optical depths and alpha are retrieved by direct sun observation with higher accuracy. Therefore, these two parameters are first used as the basis of screening.
For this study, the range of acceptable optical depths is AOD_mean − σ to AOD_mean + 3σ, and the acceptable alpha (Angstrom Exponent) is greater than alpha_mean − 3σ (σ refers to one standard deviation) [17]. Data beyond these thresholds are considered abnormal and are removed. This brings the sample size down to 36,477. Records with missing values cannot participate in clustering, so we remove them, and 18,191 observations remain. Four sites with fewer data are removed in this process. To reduce the spatial bias of the data, two sites in North China (Xianghe and Beijing_CMA), where site density is high, were eliminated because they are too close to the Beijing site and their data quality is not as high. The remaining observations for spring, summer, autumn and winter number 4432, 1387, 3207 and 4244, respectively. The summer data volume is much smaller because of more frequent cloudy and rainy conditions. We therefore further reduce the temporal bias by evenly reducing the spring and winter data by 2/3 (removing 2 out of every 3 records) and the autumn data by 1/2 (removing 1 out of every 2). This results in 1478, 1387, 1603 and 1414 observations in the four seasons, respectively, so that the data entering the clustering algorithm are evenly distributed in time and space. After quality control and pre-processing, 5882 observations from 41 sites remain for further analysis. Figure 2 shows the data processing, and Table 2 indicates the number of stations and observations after each processing step over the four regions shown in Figure 1. All parameters are further standardized by their standard deviation before entering the clustering algorithm.
Remote Sens. 2019, 11, 2334
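The screening, seasonal thinning and standardization steps described above can be illustrated with a minimal sketch. It is not the authors' code: the record layout (dictionaries with hypothetical keys `aod` and `alpha`), the use of the population standard deviation, and the systematic "keep every n-th record" thinning are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def screen(records):
    """Keep records with AOD in [mean - sigma, mean + 3*sigma]
    and alpha greater than mean - 3*sigma (thresholds as stated in the text)."""
    aod = [r["aod"] for r in records]
    alpha = [r["alpha"] for r in records]
    aod_mu, aod_sd = mean(aod), pstdev(aod)        # population sigma: an assumption
    alpha_mu, alpha_sd = mean(alpha), pstdev(alpha)
    return [
        r for r in records
        if aod_mu - aod_sd <= r["aod"] <= aod_mu + 3 * aod_sd
        and r["alpha"] > alpha_mu - 3 * alpha_sd
    ]


def thin(records, keep_every):
    """Keep one record out of every `keep_every`
    (3 removes 2 of every 3; 2 removes 1 of every 2)."""
    return records[::keep_every]


def standardize(values):
    """Scale one parameter by its standard deviation."""
    sd = pstdev(values)
    return [v / sd for v in values]
```

Because the text does not say how the thinned records were selected or how remainders were rounded, the record counts produced by `thin` need not match the seasonal totals reported above.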
In this paper, the K means clustering method is first used to classify aerosols. K means clustering is a dynamic clustering method that groups objects into a given number of K classes. The steps are as follows.

1. Roughly divide all objects into K initial classes and determine their central points, where K is a given natural number.
2. Adjust the existing classes by assigning each object to the class whose center (such as the mean vector) is nearest to it.
3. Recalculate the center point of every class that has lost or gained objects, and repeat step 2 until a reasonable classification is obtained.
This clustering method is easy to understand and calculate, and it is a classical algorithm in clustering analysis.
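The three steps above can be sketched in a few lines of standard-library Python. This is a generic illustration rather than the implementation used in the study; the initialization (K randomly chosen observations), the stopping rule (labels no longer change) and the function name `kmeans` are assumptions.

```python
import math
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Plain K means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Step 1: choose K initial centers (here: K distinct random observations).
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [None] * len(points)
    for _ in range(n_iter):
        # Step 2: assign every object to the class with the nearest center.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:  # nothing moved: stop
            break
        labels = new_labels
        # Step 3: recompute each center as the mean vector of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a class is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers
```

In practice the input would be the standardized parameter vectors, and the routine would be run for each candidate K.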
We further use the Davies-Bouldin (DB) index [23] to define the "best" number of clusters. The DB index is defined as the ratio of within-cluster scatter (S_i) to between-cluster separation (d_ij). In these definitions, a_i represents the collection of class i elements, k_i represents the central value of class i, and M is the number of classes. S_i reflects the compactness within each category, and d_ij reflects the dispersion between different categories. DB is thus an evaluation of in-cluster distance relative to between-cluster distance, and the smaller the DB index is, the better the clustering result is.
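For reference, the commonly used form of the Davies-Bouldin index can be written with the symbols defined above. This is the standard textbook formulation with Euclidean distances, given here as an assumption; the exact equations printed in the original article may differ in detail (for example in the choice of norm).

```latex
S_i = \frac{1}{|a_i|} \sum_{x \in a_i} \lVert x - k_i \rVert , \qquad
d_{ij} = \lVert k_i - k_j \rVert , \qquad
\mathrm{DB} = \frac{1}{M} \sum_{i=1}^{M} \max_{j \neq i} \frac{S_i + S_j}{d_{ij}}
```

Here |a_i| denotes the number of elements in class i.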

SOM Clustering
In order to increase the robustness of the clustering results, we also use the Self Organizing Map (SOM) method to evaluate whether the clustered patterns are stable.
SOM is an unsupervised learning algorithm that also provides a good visualization of the data. The structure of the SOM network simulates the self-organizing feature mapping function of the human neural network. The network consists of an input layer and an output layer (competition layer). The number of neurons in the input layer is determined by the number of vectors fed into the network, and the neurons of the output layer are arranged in a two-dimensional node matrix. The neurons in the input layer and those in the output layer are connected through weights, which are constantly adjusted. The final stable network output then generates a natural feature map of the input pattern, thus achieving the goal of automatic clustering [24].
SOM automatically assigns similar data to the same node. The algorithm assumes that there is some topological structure or sequence in the input objects, and it realizes a dimensionality-reducing mapping from the input space (n dimensions) to the output plane (2 dimensions). K means, in contrast, determines which type each data point belongs to according to the closest distance between the data point and the centroids of the four types, as described in Section 2.2.1. The clustering effect of K means is that the points within a class are close enough and the points in different classes are far enough apart. In fact, SOM with a small number of nodes works similarly to K means, and the clustering criteria of both are based on the distance between a data point and a given point (the centroid for K means and the node for SOM). Although K means is a highly popular algorithm, its results can be quite sensitive to initialization and may differ considerably from run to run, whereas SOM does not have this drawback. Therefore, to increase the credibility of the clustering results, we used these two different methods in parallel.
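To make the comparison with K means concrete, the following is a minimal sketch of a SOM with a handful of nodes arranged on a line. It is illustrative only: the one-dimensional node layout, the linear decay of the learning rate and neighborhood width, and all parameter values and function names are assumptions, not the configuration used in the study.

```python
import math
import random


def train_som(points, n_nodes, n_iter=2000, lr0=0.5, sigma0=1.0, seed=0):
    """Train a small SOM whose nodes sit at grid positions 0..n_nodes-1."""
    rng = random.Random(seed)
    # Initialize node weights from randomly chosen observations.
    weights = [list(p) for p in rng.sample(points, n_nodes)]
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                    # decaying learning rate
        sigma = max(sigma0 * (1.0 - frac), 0.01)   # shrinking neighborhood
        x = rng.choice(points)
        # Competition: the best matching unit is the node closest to x.
        bmu = min(range(n_nodes), key=lambda j: math.dist(x, weights[j]))
        # Cooperation: the BMU and its grid neighbors move toward x.
        for j in range(n_nodes):
            h = math.exp(-((j - bmu) ** 2) / (2.0 * sigma ** 2))
            weights[j] = [w + lr * h * (xi - w) for w, xi in zip(weights[j], x)]
    return weights


def assign(points, weights):
    """Label each point with the index of its nearest node."""
    return [
        min(range(len(weights)), key=lambda j: math.dist(p, weights[j]))
        for p in points
    ]
```

The final assignment step is the same nearest-point rule as in K means, which is why a SOM with few nodes tends to give similar partitions; the difference lies in the neighborhood-coupled updates during training.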

Clustering Analysis
The aerosol data are classified into 2 to 8 clusters by the K means method, and the DB index is calculated for each case. The values are shown in Figure 3. The DB index is smallest when the cluster number is four, so we consider four to be the reasonable number of clusters. The sample size and centroids of the four-cluster results are listed in Table 3. For more robust results, we redo the clustering using an independent technique, SOM, and try different numbers of nodes. Using 3 nodes produces a slightly lower DB index than using 4 nodes, but the sample numbers of the three types are quite unbalanced (one type has the majority of the data points). When using 5 nodes or more, the DB index increases. We therefore consider 4 nodes as the optimum number. This choice also agrees best with the K means results. The SOM results are given in parentheses in Table 3, in parallel with the K means results. The results of the two techniques are almost identical, which increases the credibility of the clustering results.
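The model-selection step, computing a DB index for each candidate partition and keeping the smallest, can be sketched as follows. The function uses the standard Davies-Bouldin formulation with Euclidean distances, which is assumed here rather than taken from the article, and it requires at least two non-empty classes.

```python
import math


def davies_bouldin(points, labels):
    """Davies-Bouldin index of a labelled partition (lower is better)."""
    classes = sorted(set(labels))
    centers, scatter = {}, {}
    for c in classes:
        members = [p for p, lab in zip(points, labels) if lab == c]
        centers[c] = [sum(col) / len(members) for col in zip(*members)]
        # Within-cluster scatter: mean distance of members to their center.
        scatter[c] = sum(math.dist(p, centers[c]) for p in members) / len(members)
    total = 0.0
    for i in classes:
        # Worst-case ratio of summed scatter to center separation.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
            for j in classes if j != i
        )
    return total / len(classes)
```

Evaluating this function on the label sets obtained for 2 to 8 clusters and picking the minimum corresponds to the selection rule described above.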
When analyzing the spatial distribution and seasonal variability of aerosol types in the next section, we want the aerosol type information from all sites. We therefore use all data in this step, and the data not used in the clustering analysis must be classified separately according to their distance from the cluster centers. For the observations used in the clustering step, reclassification does not affect the results, because the reclassification criterion is the same as that used in K means: K means assigns each data point to the type whose centroid is closest, and the reclassification does the same. Therefore, no data point classified into one type by K means is later assigned to a different type by reclassification. All 39,249 data records are then classified into the four defined types according to the Euclidean distance between each record and the centroids of the four types, i.e., by selecting the centroid with the smallest distance and assigning the record to that type. In addition, we define a distance threshold for each type as the longest distance from its centroid, i.e., if the shortest distance between a data point and a centroid is still greater than the distance threshold of that type, the data point is not classified and is discarded. In this way, we classified the 39,249 data records, discarding only 47.
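The reclassification rule can be sketched as a nearest-centroid classifier with a per-type rejection threshold. One interpretation is assumed here: the threshold of a type is taken as the largest centroid distance among the observations that the clustering step assigned to that type. Function names are hypothetical.

```python
import math


def distance_thresholds(points, labels, centroids):
    """Per-type threshold: largest member-to-centroid distance (assumed reading)."""
    thresholds = [0.0] * len(centroids)
    for p, lab in zip(points, labels):
        thresholds[lab] = max(thresholds[lab], math.dist(p, centroids[lab]))
    return thresholds


def classify(x, centroids, thresholds):
    """Return the index of the nearest centroid, or None if x lies
    farther from it than that type's threshold (record discarded)."""
    dists = [math.dist(x, c) for c in centroids]
    best = min(range(len(centroids)), key=dists.__getitem__)
    return best if dists[best] <= thresholds[best] else None
```

With this rule, every observation used in the clustering step keeps its K means label, since it lies within its own type's threshold by construction.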
Moreover, to rule out the effect of sampling on the clustering results, we further perform a Monte Carlo type validation: we randomly select half of the original data as a subset, conduct the data processing and clustering analysis on it, and repeat the process ten times. Each time we compare the clustering results of the subset with those of all data. If an observation is classified as the same aerosol type in both cases, it is marked as "correct"; the accuracy rate of each of the four types is then obtained by counting the number of correct classifications. Figure 4 shows the accuracy rates for these ten experiments using the K means method. Most accuracy rates are above 80%, with more than half of them exceeding 90%. Therefore, we consider the four identified types to be stable.
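The bookkeeping behind this validation can be sketched as follows. The sketch covers only the random half-sampling and the per-type agreement count; it assumes that the cluster labels from the subset run have already been matched to the same four type names as the full run (for example by comparing centroids), and that the accuracy of a type is computed relative to the observations the full run placed in that type. Both points are assumptions.

```python
import random
from collections import Counter


def draw_half(n_records, rng):
    """Indices of a random half of the records, without replacement."""
    return sorted(rng.sample(range(n_records), n_records // 2))


def accuracy_by_type(full_labels, subset_indices, subset_labels):
    """Fraction of subset records whose type matches the full-data run,
    grouped by the type assigned in the full-data run."""
    total, correct = Counter(), Counter()
    for idx, lab in zip(subset_indices, subset_labels):
        ref = full_labels[idx]
        total[ref] += 1
        if lab == ref:
            correct[ref] += 1
    return {t: correct[t] / total[t] for t in total}
```

Repeating `draw_half`, the clustering, and `accuracy_by_type` ten times yields one accuracy rate per type and experiment.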
The four types identified by the analysis are tentatively named dust, scattering mixed, absorbing mixed and scattering fine, respectively, primarily based on their scattering/absorption parameters (single scattering albedo, refractive indices, absorption Angstrom Exponent) and size parameters (effective radius, fine mode fraction, extinction Angstrom Exponent). Because of the difficulty of high-dimensional visualization, Figure 5a-d shows the scatter plots of two representative parameters, single scattering albedo (SSA) and fine mode fraction (FMF). Type 3 has a 440 nm SSA of about 0.85, which indicates the strongest absorption. Type 2 and Type 4 have SSA above 0.9 at all four wavelengths, which implies relatively strong scattering. On the other hand, Type 1 has the lowest FMF, while Type 4 has the highest FMF. As indicated by Chen et al. [19], FMF < 0.4 indicates that coarse particles dominate the aerosol model, FMF > 0.6 indicates that fine particles dominate, and values in between indicate a mixed aerosol model. In Figure 5e, we further define a range for each type based on their scatter plots. Although the scattering mixed type slightly overlaps with the absorbing mixed and scattering fine types, overall the ranges of the four types are well separated. Dust has the lowest FMF and moderate SSA; scattering fine has the highest FMF and highest SSA; absorbing mixed has the lowest SSA and moderate FMF, while the properties of scattering mixed lie between those of scattering fine and absorbing mixed. These results support the validity of our clustering results.
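The FMF criterion cited from Chen et al. [19] amounts to a simple three-way rule, sketched below. The function name and the returned labels are hypothetical, and the treatment of the boundary values 0.4 and 0.6 (counted as mixed) follows directly from the strict inequalities in the text.

```python
def size_regime(fmf):
    """Classify the dominant particle size from the fine mode fraction."""
    if fmf < 0.4:
        return "coarse"   # coarse particles dominate
    if fmf > 0.6:
        return "fine"     # fine particles dominate
    return "mixed"        # 0.4 <= FMF <= 0.6
```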
In addition to single wavelength properties, the spectral dependencies of many parameters are also associated with aerosol composition. For example, as indicated by several previous studies [25][26][27][28], dust SSA increases with wavelength due to its UV absorption, whereas anthropogenic aerosols show the reverse behavior. We therefore examine the spectra of three parameters for the four types in Figure 6: (a) single scattering albedo (SSA), (b) asymmetry parameter (g), (c) aerosol optical depth (AOD). Figure 6a clearly shows that the SSA of desert dust increases with wavelength while that of the other types decreases with wavelength, which is due to the absorption of dust at UV wavelengths and is consistent with previously documented results. Moreover, the SSA spectra of absorbing mixed and scattering mixed aerosols both exhibit non-monotonic behavior, i.e., SSA first increases with wavelength from 440 to 675 nm and then decreases. This is an indication of the mixing of black carbon with dust or organic carbon and is a typical feature of East Asian aerosols [27]. According to the more quantitative method proposed by Li et al. [27], absorbing mixed (with greater SSA spectral curvature) is likely black carbon mixed more with dust, whereas scattering mixed likely contains more organic carbon. With respect to the asymmetry parameter, desert dust has a g factor above 0.7 that varies little with wavelength, whereas that of the other types decreases rapidly with wavelength (Figure 6b). Figure 6c also shows that dust AOD has a much flatter spectrum than the others. These clear differences between dust and the other aerosol types confirm that the former indeed primarily consists of larger particles.
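The three spectral behaviors of SSA discussed above (increasing, decreasing, and rising then falling) can be told apart with a simple check on successive differences. This is only an illustrative sketch: it is not the quantitative curvature method of Li et al. [27], and the function name and shape labels are hypothetical.

```python
def ssa_spectral_shape(ssa):
    """Describe the shape of an SSA spectrum.

    `ssa` holds SSA values ordered by increasing wavelength
    (for example 440, 675, 870 and 1020 nm).
    """
    diffs = [b - a for a, b in zip(ssa, ssa[1:])]
    if all(d >= 0 for d in diffs):
        return "increasing"   # dust-like behavior (a flat spectrum also lands here)
    if all(d <= 0 for d in diffs):
        return "decreasing"
    if diffs[0] > 0 and all(d <= 0 for d in diffs[1:]):
        return "curved"       # rises to the second wavelength, then falls
    return "other"
```

A spectrum labelled "curved" corresponds to the non-monotonic behavior described for the two mixed types.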

Spatial Distribution and Seasonal Variability of the Aerosol Types
With the clustering information in hand, it becomes possible to examine the spatial and temporal variability of the major aerosol types in China. Figure 7 shows the distribution of the frequency of occurrence of each type for each season. Note that although the clustering uses only the selected portion of AERONET data described in Section 2, Figure 7 is produced using all data. Each data point is classified into one of the four types according to the threshold mentioned in Section 3.1 and the Euclidean distance between the data point and the center of each type. Some sites do not appear on the map during certain seasons because they have data for some seasons but not for others. In addition, five sites representative of major pollution source regions in China (SACOL, Beijing, Taihu, Taiwan Cheng Kung University and Hong Kong) are further selected to examine the seasonal variability of aerosol types in detail (Figure 8).
Figure 7 shows that desert dust mainly occurs in northwest China in the spring (MAM), close to the major dust sources of the Taklimakan and Gobi deserts. The North China Plain also has significant amounts of dust in the spring due to transport by the prevailing west winds in this season. These results are consistent with previous studies [29][30][31]. Figure 8 also reveals that dust has a high frequency of occurrence in the spring at SACOL (Northwest China) and Beijing (North China Plain). Moreover, Taihu is also dominated by dust aerosol in spring. The scattering mixed type occurs mostly in the spring and summer (JJA) over East China and South China, and it also appears relatively frequently in South China all year round (Figures 7 and 8). The large number of such aerosols in South China is due to emissions from biomass burning, industrial activities, automobile exhaust and so on. Another reason may be that organic carbon aerosols released by extensive biomass burning activities in Southeast Asia are transported to South China by the monsoon [32]. The total amount of the absorbing mixed type is relatively small; it is mainly found over the North China Plain and East China and occurs most frequently in the fall (SON) and winter (DJF) (Figures 7 and 8). This seasonal feature is consistent with the heating period, when many carbonaceous aerosols are released from coal combustion and vehicle exhaust [33,34]. The scattering fine type (Figures 7 and 8) is mainly concentrated over the North China Plain and East China during the summer and fall. These two seasons feature high humidity and precipitation, and suspended water droplets can facilitate the conversion of gases to particles, leading to more fine mode aerosols with strong scattering (such as the conversion of sulfur dioxide to sulfate particles [33,35]). Moreover, hygroscopic growth of aerosols also increases their scattering efficiency. Both factors favor the higher scattering properties retrieved by AERONET. Nonetheless, due to the limitations of remote sensing data, we cannot isolate the relative humidity effect.
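The seasonal frequency of occurrence discussed above can be computed per site with a short tally. The sketch below assumes each classified observation is available as a hypothetical `(month, aerosol_type)` pair and uses the season abbreviations from the text; it is an illustration, not the procedure used to draw Figure 7.

```python
from collections import Counter, defaultdict

# Meteorological seasons as abbreviated in the text.
SEASON_OF_MONTH = {
    3: "MAM", 4: "MAM", 5: "MAM",
    6: "JJA", 7: "JJA", 8: "JJA",
    9: "SON", 10: "SON", 11: "SON",
    12: "DJF", 1: "DJF", 2: "DJF",
}


def seasonal_type_frequency(records):
    """Frequency of each aerosol type within each season for one site.

    `records` is an iterable of (month, aerosol_type) pairs.
    """
    counts = defaultdict(Counter)
    for month, aerosol_type in records:
        counts[SEASON_OF_MONTH[month]][aerosol_type] += 1
    freq = {}
    for season, c in counts.items():
        n = sum(c.values())
        freq[season] = {t: k / n for t, k in c.items()}
    return freq
```

Seasons without any observation at a site are simply absent from the result, which mirrors sites missing from the map in some seasons.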
Note that here we focus on identifying the optical and microphysical properties of different aerosol types, which are critical in aerosol forcing estimation and satellite aerosol retrievals. The remote sensing measurements that we used cannot give the detailed chemical composition of aerosols. By comparing Figure 7 with existing studies that report aerosol chemical composition in China [5,7,17,36,37], we can make some inferences about the composition of the four aerosol types. The dust type should mainly contain sand dust, urban fugitive dust and coal ash [7], which are the main components of aerosol in northwest China. Zhang et al. [7] also found that strongly absorbing black carbon has high mass concentrations in North China, especially during autumn and winter, consistent with the spatial and seasonal distribution of our absorbing mixed type. Many previous studies [5,17,36,37] showed that small particles with strong scattering properties, such as organic carbon, sulfate, nitrate and ammonium, all have high concentrations in China, especially in urban areas, due to various emission sources. They also account for a large proportion of aerosol composition in summer, which may be driven by aqueous processing and gas-phase photochemical production. Therefore, it is likely that the two scattering types mainly consist of organic carbon, ammonium sulfate and nitrate. Moreover, the scattering mixed type should contain more organic carbon, as suggested by its SSA spectrum. Besides, Figure 8
Figure 8. Multi-year mean monthly frequency of aerosol types at five sites. Type 1 is desert dust. Type 2 is scattering mixed type. Type 3 is absorbing mixed type. Type 4 is scattering fine type.
Briefly, dust aerosols are mainly found over Northwest China and the North China Plain in the spring, scattering mixed aerosols over Southeast China in the spring and summer, absorbing mixed type over East China in the fall and winter, and scattering fine type over East China in the summer and fall.
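The monthly occurrence frequencies summarized above (and plotted per site in Figure 8) can be illustrated with a minimal sketch. It assumes each classified record is reduced to a (month, type) pair for one site and that all years are pooled before computing the frequency, which is only one possible reading of "multi-year mean"; the function name and the sample records are hypothetical and not taken from the AERONET data.

```python
from collections import Counter, defaultdict


def monthly_type_frequency(records, n_types=4):
    """Return {month: [freq of type 1, ..., freq of type n_types]} for one site.

    records: iterable of (month, aerosol_type) pairs, with month in 1..12
    and aerosol_type in 1..n_types. All years are pooled (an assumption).
    """
    counts = defaultdict(Counter)
    for month, aerosol_type in records:
        counts[month][aerosol_type] += 1

    freq = {}
    for month, type_counts in sorted(counts.items()):
        total = sum(type_counts.values())
        # Counter returns 0 for types that never occur in this month.
        freq[month] = [type_counts[t] / total for t in range(1, n_types + 1)]
    return freq


# Hypothetical records for illustration only.
site_records = [(4, 1), (4, 1), (4, 2), (4, 3), (7, 4), (7, 4), (7, 2), (7, 4)]
monthly = monthly_type_frequency(site_records)
print(monthly[4])  # frequencies of types 1-4 in April
print(monthly[7])  # frequencies of types 1-4 in July
```

Averaging per-year frequencies instead of pooling would weight each year equally regardless of how many records it contributes; which choice is closer to the figure is not stated in the text.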
An advantage of sun photometers is their continuous measurements during the daytime, which allow the diurnal variability of aerosol type to be examined. We conduct a statistical analysis of the diurnal variation of aerosol types by calculating the average frequency of diurnal aerosol type change. Specifically, for each station, we count the number of days with at least four observations and, among these, the number of days on which more than one type is found; the ratio of the latter to the former represents the frequency of diurnal change of aerosol type. The results are shown in Figure 9. Except for Northwest China, diurnal type change is frequently observed (frequency > 0.5) at most East and South China sites. This is reasonable because aerosol composition in the Northwest is relatively simple (mainly dust, as discussed in the previous section), whereas in East and South China complicated mixtures are often found due to both local emission and remote transport. We also analyzed the frequency of occurrence of the four aerosol types in the morning and afternoon but did not find any clear patterns. Diurnal variability of aerosol type is therefore a common phenomenon and cannot be ignored. This has important implications for both climate modeling and satellite remote sensing. Current climate models generally use averaged optical parameters, whereas different aerosol types result in different radiative forcing. In addition, aerosol retrieval from geostationary platforms has become increasingly popular. Such retrievals also need to assume an aerosol model, and failing to account for the diurnal variability of aerosol type will inevitably introduce uncertainties in the results. We will present a detailed analysis of the effect of diurnal aerosol type change on radiative forcing and aerosol retrieval in a follow-up study.
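The diurnal change frequency described in this paragraph can be sketched as follows. This is a minimal illustration rather than the processing code used in the study: it assumes each classified observation is available as a (station, date, type) tuple, and the function name, the `min_obs` parameter and the sample records are hypothetical.

```python
from collections import defaultdict


def diurnal_change_frequency(records, min_obs=4):
    """Return {station: fraction of qualifying days with more than one type}.

    records: iterable of (station, date, aerosol_type), one per observation.
    A day qualifies if it has at least min_obs observations.
    """
    # Collect the aerosol types observed at each station on each day.
    types_by_day = defaultdict(list)
    for station, date, aerosol_type in records:
        types_by_day[(station, date)].append(aerosol_type)

    qualifying = defaultdict(int)  # days with at least min_obs observations
    changed = defaultdict(int)     # qualifying days with more than one type
    for (station, _date), types in types_by_day.items():
        if len(types) < min_obs:
            continue
        qualifying[station] += 1
        if len(set(types)) > 1:
            changed[station] += 1

    # Stations without any qualifying day are omitted from the result.
    return {s: changed[s] / n for s, n in qualifying.items()}


# Hypothetical records for illustration only (not AERONET data).
records = (
    [("A", "2015-04-01", t) for t in (1, 1, 2, 2)]       # type changes
    + [("A", "2015-04-02", t) for t in (3, 3, 3, 3, 3)]  # no change
    + [("A", "2015-04-03", t) for t in (1, 2)]           # too few observations
    + [("B", "2015-04-01", t) for t in (1, 1, 1, 1)]     # no change
)
freq = diurnal_change_frequency(records)
print(freq)
```

In this toy input, station A has two qualifying days of which one shows a type change, and the day with only two observations is excluded from both counts, mirroring the at-least-four-observations rule in the text.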

Aerosol Type Map for Satellite Remote Sensing
Another important practical use of aerosol classification is to inform the choice of aerosol model in passive satellite retrievals, such as those from the Moderate Resolution Imaging Spectroradiometer (MODIS) and the Visible Infrared Imaging Radiometer Suite (VIIRS). For example, the aerosol model used in the MODIS retrieval algorithm is based on the classification study using AERONET measurements by Omar et al. [17]. However, many of these global models are not representative of East Asian aerosols. In fact, only one station in China, Beijing, is used in the Omar et al. [17] work. On the one hand, East Asia is a global pollution hotspot with complicated emission sources and aerosol composition and deserves special attention in aerosol observation. On the other hand, current satellite aerosol products still have large uncertainties over this region, and the aerosol model assumption has been suggested as a major source of these uncertainties (e.g., [13]). It is therefore necessary to refine the representation of aerosol optical and microphysical properties in East Asia in the retrieval algorithms in order to improve the retrieval accuracy over this region.
The SSA and g spectra used by MODIS and VIIRS are shown in Figure 10; the aerosol model information comes from references [38,39]. Comparing Figures 6 and 10, our desert dust type clearly corresponds to the dust model in Figure 10, and we can infer that the absorbing mixed type should contain more black carbon. The scattering fine type may be dominated by sulfate and nitrate aerosols. However, because in reality aerosols are almost always mixed and sun photometers retrieve column-integrated properties, it is not possible to define their chemical composition accurately. We find that the major difference is the lack of clearly curved SSA spectra in the MODIS and VIIRS fine aerosol models, a characteristic that occurs frequently over East Asia. We further attempt to generate a seasonal map of aerosol type distribution in China based on the spatial and seasonal variability of the four aerosol types revealed by Figure 7, as shown in Figure 11. Although the resolution is coarse and many places are not covered due to the limited number of AERONET sites, this map provides more detailed information about aerosol type variability in China than that used in many satellite aerosol retrieval algorithms such as MODIS and VIIRS. With the ongoing ground observation efforts in China, this map could become more accurate in the future.
Figure 10. Aerosol models used by MODIS [38] and VIIRS [39].
Figure 11. Seasonal distribution of aerosol types recommended for satellite aerosol retrieval.

Conclusions and Discussion
In this study, we identify major aerosol types in China using cluster analysis of AERONET measurements. Two independent clustering techniques, K means and SOM, are used, and they produce highly consistent results. The main conclusions of this study are summarized below:
(1) We identify four types, namely desert dust, scattering mixed type, absorbing mixed type and scattering fine type, according to their optical and microphysical properties.
(2) Dust aerosols have the lowest FMF, the weakest wavelength dependence of AOD and asymmetry factor, and an SSA that increases with wavelength. The absorbing mixed type has the lowest overall SSA, while the scattering fine type has the highest SSA values and also the highest FMF. The absorbing mixed and scattering mixed types both have non-monotonic SSA spectra, i.e., their SSA first increases with wavelength from 440 to 675 nm and then decreases, which is an indication of the mixing of black carbon with dust or organic carbon aerosols.
(3) In terms of space-time variability, dust aerosols are typically found in the spring over Northwest China and the North China Plain. The scattering mixed type mostly occurs in the spring and summer over Southeast China. The absorbing mixed aerosols appear frequently in autumn and winter over most of East China, corresponding to the heating period. The scattering fine aerosols have their highest frequency of occurrence in the summer and fall over East China.
(4) Based on this spatial and temporal variability, we further produce a seasonal aerosol type map to facilitate aerosol retrieval from passive satellite sensors such as MODIS and VIIRS.
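The non-monotonic SSA criterion in conclusion (2) can be expressed as a simple shape test. The sketch below is illustrative only: it assumes SSA is available at the four standard AERONET inversion wavelengths (440, 675, 870 and 1020 nm; only the first two are named in the text), and the function name, the shape labels and the sample values are hypothetical.

```python
def ssa_spectral_shape(ssa_440, ssa_675, ssa_870, ssa_1020):
    """Label an SSA spectrum as 'curved', 'increasing', 'decreasing' or 'other'.

    'curved' means SSA rises from 440 to 675 nm and then falls towards
    1020 nm, the non-monotonic behaviour described in conclusion (2).
    """
    rises_first = ssa_675 > ssa_440
    falls_after = ssa_870 <= ssa_675 and ssa_1020 < ssa_675
    if rises_first and falls_after:
        return "curved"

    # Differences between adjacent wavelengths, shortest to longest.
    steps = [ssa_675 - ssa_440, ssa_870 - ssa_675, ssa_1020 - ssa_870]
    if all(d >= 0 for d in steps):
        return "increasing"
    if all(d <= 0 for d in steps):
        return "decreasing"
    return "other"


# Hypothetical SSA values for illustration only.
print(ssa_spectral_shape(0.88, 0.92, 0.91, 0.90))  # rises, then falls
print(ssa_spectral_shape(0.90, 0.95, 0.96, 0.97))  # rises throughout
print(ssa_spectral_shape(0.95, 0.94, 0.93, 0.92))  # falls throughout
```

A test of this kind only describes the spectral shape; it does not by itself identify the mixing state or composition, which the text infers from comparison with chemical measurements.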
The aerosol type classification presented in this paper can provide a useful reference for aerosol parameterization in climate models, as well as for the aerosol model assumption in satellite retrieval algorithms. To our knowledge, our study is the most comprehensive aerosol type classification work using ground-based remote sensing data. The results add to our understanding of aerosol composition and of aerosol optical and microphysical properties in China. The newly established aerosol model can also help improve satellite retrieval accuracy. In fact, in another paper [40], we have used this updated aerosol model to retrieve aerosol optical depth over the Beijing area and obtained very promising results.
Nonetheless, remote sensing data can only define aerosol types based on their optical properties, while a detailed study of aerosol composition requires in-situ chemical measurements. In the future, we plan to incorporate such data to validate and improve our classification results. We will also try to incorporate observations from domestic sun photometer networks. Moreover, the change of aerosol type on seasonal and diurnal scales has a significant impact on both aerosol forcing estimation and satellite retrieval, and we will investigate these effects in more detail.