Classification of Clouds in Satellite Imagery Using Adaptive Fuzzy Sparse Representation

Automatic cloud detection and classification from satellite cloud imagery have various meteorological applications, such as weather forecasting and climate monitoring, and cloud pattern analysis has recently become a research hotspot. Since satellites sense clouds remotely from space, and different cloud types often overlap and convert into each other, satellite cloud imagery inevitably contains fuzziness and uncertainty. Satellite observation is susceptible to noise, while traditional cloud classification methods are sensitive to noise and outliers, so it is hard for them to achieve reliable results. To deal with these problems, a satellite cloud classification method using adaptive fuzzy sparse representation-based classification (AFSRC) is proposed. Firstly, by defining adaptive parameters related to attenuation rate and critical membership, an improved fuzzy membership is introduced to accommodate the fuzziness and uncertainty of satellite cloud imagery; secondly, by effectively combining the improved fuzzy membership function with sparse representation-based classification (SRC), the atoms in the training dictionary are optimized; finally, an adaptive fuzzy sparse representation classifier for cloud classification is proposed. Experimental results on FY-2G satellite cloud images show that the proposed method not only improves the accuracy of cloud classification, but also has strong stability and adaptability with high computational efficiency.


Introduction
Meteorological satellites have advantages such as a wide range of spatial observations, high temporal resolution, and all-weather observation. Satellite cloud imagery has become one of the important means for weather forecasting and climate analysis [1][2][3], especially for forecasting and monitoring natural disasters such as typhoons, floods, snowstorms, and forest fires. Cloud classification is one of the fundamental tasks of satellite cloud image processing. The methods currently in common use for cloud classification are threshold-based methods (mainly the simple threshold method and the histogram-based method), statistical methods, and artificial intelligence-based methods [4]. Simple threshold methods [5,6] analyze the spectral characteristics of different channels to retrieve the brightness temperature of each pixel, and then combine the gray value and brightness temperature of each pixel with the brightness temperature differences between channels to determine a series of thresholds as a comprehensive criterion; however, it is very difficult to determine this series of thresholds. The histogram-based method [7] improves the simple threshold method by taking advantage of statistical properties of the partial or global histograms of satellite cloud images,

Satellite Data Feature Extraction
This paper takes FY-2G satellite cloud images as the data source. The scanning radiometer carried by the FY-2G satellite has five imaging channels: two infrared long wave channels (IR1, 10.3~11.3 µm; IR2, 11.5~12.5 µm), one water vapor channel (IR3, 6.3~7.6 µm), one infrared medium wave channel (IR4, 3.5~4.0 µm), and one visible spectrum channel (VIS, 0.55~0.75 µm). Since different imaging channels reflect atmospheric physics parameters such as top brightness temperature, albedo, and water vapor content in different aspects, the combined data are very convenient for various meteorological applications. Using the information from these channels, the inversion accuracy of clouds and the underlying surface can be greatly improved.
Because the data of the infrared channels (IR1, IR2, IR4) of FY-2G are closely related to the infrared radiation of clouds and the underlying surface, dark areas in infrared cloud images usually indicate the warmest areas, such as clear land and clear water (lakes and oceans). In contrast, bright white areas in infrared imagery often indicate different types of cloud, which usually have a relatively low temperature. Moreover, infrared channel data can not only indicate the height of clouds, but can also help in distinguishing different cloud types, land, and ocean from each other. The water vapor channel (IR3) is a special infrared band; its data indicate the absorption of infrared radiation by water vapor. The more water vapor the atmosphere contains, the more infrared radiation is absorbed, which makes the corresponding area in water vapor imagery whiter; thus, water vapor imagery can help to estimate the water vapor content and also help to classify different cloud types. As for the visible spectrum channel, VIS data indicate the albedo of solar radiation. Generally, darker areas in VIS cloud imagery indicate clear sky, and brighter areas indicate different cloud types because of their higher albedo. Furthermore, the albedo is positively related to cloud thickness, so the VIS cloud image often shows sharper contrast in brightness and thus also provides effective information for classifying different cloud types.
In order to classify different cloud types effectively, the gray values of each pixel in the cloud images from the five channels are selected as basic features. Although the gray values of infrared and VIS cloud images indicate the top brightness temperature or albedo of clouds to some extent, the gray value of a pixel is nonlinearly related to the top brightness temperature and albedo, so extracting brightness temperature and albedo explicitly may help in classifying different cloud types. According to the imaging properties of the scanning radiometer of FY-2G, the cloud top brightness temperature of the infrared images and the albedo of the VIS image are therefore extracted as extra features of the samples. Figure 1a shows the relationship of brightness temperature versus gray value of the IR1 cloud image provided by FY-2G, Figure 1b shows the gray value histogram of the IR1 cloud image, and Figure 1c shows the brightness temperature histogram of the IR1 cloud image. It can be seen that the brightness temperature and gray value have different distribution characteristics; brightness temperature is thus taken as a component of the feature vector, which strengthens the discriminative ability of the feature vector and makes the sample features more representative. For the other infrared channels and the VIS channel, brightness temperature and albedo are extracted by the same method. In addition, the brightness temperature differences between infrared channels indicate the radiation characteristics of different clouds [9]. For example, the brightness temperature difference between IR1 and IR2 can be used to identify cirrus and cumulonimbus, and there is a strong correlation between the top height of convective cloud and the brightness temperature difference between IR1 and IR3. Thus, in this paper, the brightness temperature differences IR1-IR2, IR1-IR3, IR1-IR4, and IR2-IR3 are extracted as additional features for cloud classification.
Table 1 lists the components of feature vectors for cloud classification; Table 2 briefly describes the main identification characteristics of different components.

Table 1. The components of feature vectors for cloud classification.

Component | Description
G1, G2, G3, G4, GV | Gray value of IR1, IR2, IR3, IR4, VIS
T1, T2, T3, T4 | Brightness temperature of IR1, IR2, IR3, IR4
A | Albedo of VIS
T1-T2, T1-T3, T1-T4, T2-T3 | Brightness temperature difference IR1-IR2, IR1-IR3, IR1-IR4, IR2-IR3

Table 2. The main identification characteristics of different components.

Component | Identification Characteristics
G1, G2, T1, T2 | Can be used to identify land, ocean, and clouds
G3, T3 | Can be used to measure the water vapor content of clouds
G4, T4 | Mainly represent the characteristics of under clouds over the ocean
GV, A | Mainly represent the thickness, height, and composition of clouds
T1-T2 | Mainly describe the characteristics of cirrus and cumulonimbus
T1-T3, T1-T4, T2-T3 | Indicate the height of clouds more precisely

The above-mentioned 14 features were used to construct the original samples, denoted as $x = (x_1, x_2, \ldots, x_{14})$, which were then normalized by the $\ell_2$-norm. The $\ell_2$-norm of $x$ is defined as $\mathrm{norm}(x) = \sqrt{\sum_{i=1}^{14} x_i^2}$, and the normalized sample vector has the components $x_i / \mathrm{norm}(x)$, $i = 1, 2, \ldots, 14$.
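The feature construction and normalization described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and the pixel values are hypothetical, and the component order simply follows Table 1.

```python
import math

def build_feature_vector(gray, bt, albedo):
    """Assemble the 14 components in Table 1 order:
    5 gray values, 4 brightness temperatures, 1 albedo, 4 differences."""
    g1, g2, g3, g4, gv = gray          # G1, G2, G3, G4, GV
    t1, t2, t3, t4 = bt                # T1, T2, T3, T4
    diffs = [t1 - t2, t1 - t3, t1 - t4, t2 - t3]
    return [g1, g2, g3, g4, gv, t1, t2, t3, t4, albedo] + diffs

def l2_normalize(x):
    """Divide every component by the l2-norm of the vector."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm == 0.0:                    # guard against an all-zero vector
        return list(x)
    return [v / norm for v in x]

# Hypothetical pixel values, for illustration only.
x = build_feature_vector(
    gray=[120.0, 118.0, 90.0, 130.0, 60.0],
    bt=[250.0, 248.5, 235.0, 255.0],
    albedo=0.35,
)
x_normalized = l2_normalize(x)
```

After normalization every sample has unit length, which is why the hypersphere radii reported later for the real data stay small.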

Cloud Classification System
The satellite cloud images, recorded by infrared and visible channel, mainly reflect the top brightness temperature and albedo information of clouds; it is difficult to grasp the generation procedure of clouds, and to analyze the cloud particles. Generally speaking, each pixel in cloud images is a comprehensive reflection of different clouds and underlying surface, thus it is hard to achieve an accurate classification of satellite cloud imagery. Now, following the international common practice of cloud classification, according to their height and vertical development, clouds are mainly divided into four families as high cloud, medium cloud, low cloud, and heap cloud [4]. The four families are subdivided into several categories: high cloud is subdivided into cirrus, cirrostratus, and cirrocumulus; medium cloud is subdivided into altostratus and altocumulus; low cloud is subdivided into cumulus, stratus, stratocumulus, and nimbostratus; and heap cloud mainly refers to cumulonimbus.
Following the specific requirements of satellite cloud image classification for meteorological services, in this paper each satellite cloud image pixel is classified as one of the following six types: clear water, clear land, heap cloud, low cloud, medium cloud, and high cloud. Since cumulus, stratus, nimbostratus, and stratocumulus are mainly made up of water drops and frequently bring continuous rain, they are all classified as low clouds. Similarly, altostratus and altocumulus are classified as medium clouds. Cirrus, cirrostratus, and cirrocumulus, which are generally made up of ice crystals, usually have a cloud base height of more than 5000 m, and generally do not bring precipitation, are classified as high clouds. For heap cloud, cumulonimbus is the focus: its cloud top may extend to the height of medium-level or even high-level cloud, which reflects a strong updraft, and it usually brings severe convective weather such as thunderstorms or heavy rainfall, so cumulonimbus attracts particular attention in meteorological monitoring.
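The grouping above amounts to a lookup from cloud genus to one of the six output types. A small sketch of that mapping is given below; the identifier names and the ordering of the class list are assumptions made for illustration.

```python
# Six output types used for per-pixel classification (order is an assumption).
CLASSES = ["clear water", "clear land", "heap cloud",
           "low cloud", "medium cloud", "high cloud"]

# Cloud genus -> output type, following the grouping described in the text.
GENUS_TO_CLASS = {
    "cirrus": "high cloud",
    "cirrostratus": "high cloud",
    "cirrocumulus": "high cloud",
    "altostratus": "medium cloud",
    "altocumulus": "medium cloud",
    "cumulus": "low cloud",
    "stratus": "low cloud",
    "stratocumulus": "low cloud",
    "nimbostratus": "low cloud",
    "cumulonimbus": "heap cloud",
}

def class_index(genus):
    """Return the integer label of the output type for a cloud genus."""
    return CLASSES.index(GENUS_TO_CLASS[genus])
```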

Fuzzy Membership for Cloud Classification
In actual weather systems, apart from clouds with distinguishing characteristics, there are also clouds with fuzzy characteristics; thus, it is difficult to assign a cloud to a single specific cloud type. Establishing a fuzzy membership to describe the relation between samples and specific cloud types is a helpful scheme. In fact, in applications of machine learning, fuzzy membership plays an important role in depicting the relations among training samples and specific classes. If training sample x i is assigned a fuzzy membership p i ∈ [0, 1], which denotes the degree to which it belongs to its class, a soft classification model can be established and the classification performance of machine learning algorithms can be improved. Since the traditional SVM is sensitive to noise and outliers, Lin et al. proposed the fuzzy support vector machine (FSVM), which assigns smaller memberships to noisy samples and outliers to eliminate their impact on the classification hyperplane, and thereby improved the performance of the SVM classifier effectively [12]. So, constructing a reasonable fuzzy membership function is the key to building a soft classification model.
The traditional linear and sigmoid membership functions cannot effectively depict the distribution characteristics of invalid samples such as noise and outliers, and they cannot reflect the uncertainty of samples. To solve this problem, Refs. [14,15] proposed a membership function based on affinity. A minimum hypersphere is constructed, which contains most of the valid samples, and the affinity among samples, defined by support vector data description (SVDD) [20], is represented by the radius of the minimum hypersphere. The membership function is then constructed as Equation (1) [14], where $d(x_i)$ is the distance between training sample $x_i$ and its class center, and $R$ is the radius of the minimum hypersphere. If $d(x_i) \le R$, which means that $x_i$ lies inside the hypersphere and is more likely to be a valid sample, the corresponding membership of the sample is calculated using the upper formula of Equation (1). Otherwise, if $d(x_i) > R$, which means that $x_i$ lies outside the hypersphere, the corresponding membership of the sample is calculated using the lower formula of Equation (1). According to the definition of the membership function in Equation (1), the membership is larger than 0.4 for samples inside the hypersphere and less than 0.4 for samples outside the hypersphere; here, 0.4 acts as a critical membership. By calculating the membership with different formulas for samples inside and outside the hypersphere, the membership function can better distinguish valid and invalid samples and decrease the membership of noisy samples and outliers, so as to eliminate their adverse impact on the classification hyperplane. Figure 2 shows a set of membership function curves of Equation (1) for different R.
Figure 2 shows that the critical membership depicts the boundary between the inner and outer samples of the minimum hypersphere, and the fuzzy memberships vary differently for different radii R. As R increases, the membership attenuation rate for samples outside the hypersphere (where $\mu_i < 0.4$ in Figure 2) accelerates a bit, while that for samples inside the hypersphere (where $\mu_i > 0.4$ in Figure 2) slows down. If R is too small (R < 3), the membership attenuation rate for samples outside the hypersphere is smaller than that for samples inside the hypersphere. In this paper, a 14-dimensional feature vector is extracted and then normalized. After normalization, for real data, the radii R of the minimum hyperspheres for the different cloud type sample sets are all less than 3. In practice, samples inside or outside the hypersphere are treated as valid or invalid samples, respectively. To show the different importance of valid and invalid samples in designing a classifier, the membership should attenuate slowly for valid samples, and should attenuate far more quickly for invalid samples. Hence, the membership function defined by Equation (1) is not consistent with the distribution characteristics of satellite cloud images, which will degrade the performance of cloud type identification. In addition, the critical membership in Equation (1) is fixed to 0.4, which is not good enough. In fact, different sample sets might have different distribution characteristics, and their corresponding critical membership should be different, too. In the following section, a membership function with adaptive parameters will be defined, and an adaptive fuzzy sparse representation classifier for cloud type identification will be constructed.
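The behaviour of a fixed-critical-membership function of this kind can be checked numerically. Equation (1) itself is not reproduced in this text, so the piecewise form below is an assumption: it is chosen because it gives membership 1 at the class center and 0.4 on the hypersphere surface, and because its initial attenuation rate inside the hypersphere (1.2/R) exceeds the maximum rate outside (0.4) exactly when R < 3, which matches the properties stated above. The function name is hypothetical.

```python
def affinity_membership(d, R, critical=0.4):
    """Assumed piecewise form of a fixed-critical-membership function.

    d: distance of a sample to its class center
    R: radius of the minimum hypersphere
    """
    if d <= R:
        t = d / R
        # Falls from 1 at the center to `critical` on the surface.
        return (1.0 - critical) * (1.0 - t) / (1.0 + t) + critical
    # Falls from `critical` on the surface toward 0 far outside.
    return critical / (1.0 + (d - R))

# With R = 2 (< 3), compare the membership drop over a step of 0.5
# just inside the center and just outside the surface.
R = 2.0
inside_drop = affinity_membership(0.0, R) - affinity_membership(0.5, R)
outside_drop = affinity_membership(R, R) - affinity_membership(R + 0.5, R)
```

Under this assumed form the inside drop is larger than the outside drop for R = 2, that is, valid samples lose membership faster than invalid ones, which is the shortcoming the adaptive parameters are meant to address.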

Adaptive Fuzzy Sparse Representation Classifier for Cloud Type Identification
In actual weather systems, cloud distribution is very complex: clouds may overlap each other and change all the time, and noise and data errors in acquisition and transmission make satellite cloud image processing more difficult. In order to achieve better classification performance, it is necessary to utilize the distribution characteristics of training samples. After analyzing the distribution characteristics of training samples for cloud classification, an adaptive fuzzy membership function is designed that suppresses the influence of outliers and noise and improves on the affinity-based fuzzy membership function. Combining the improved fuzzy membership function with the sparse representation classifier, an adaptive fuzzy sparse representation classifier is constructed for cloud type identification.

Adaptive Fuzzy Membership Function
As mentioned above, since there are many noisy samples and outliers in cloud training sample sets, it is difficult for the affinity-based fuzzy membership function to handle them. To make the membership function better suited to the classification of clouds in satellite cloud images, three new parameters are introduced. Parameters $\rho_I$ ($0 < \rho_I < 1$) and $\rho_o$ ($\rho_o > 1$) are used to control the membership attenuation rate for samples inside and outside the hypersphere, respectively, and $\bar{\mu}$ ($0 < \bar{\mu} \le 1$) is used to control the critical membership. The modified membership function is defined as Equation (2). As a demonstration, the modified membership function curves with $\rho_I = 0.5$, $\rho_o = 10$, and $\bar{\mu} = 0.4$ are shown in Figure 3. It is clear that the modified membership function not only inherits the merit of the affinity-based fuzzy membership function, namely that the minimum hypersphere gives a clear boundary between valid and invalid samples, but also shows the advantage of the sigmoid membership function, namely that the membership attenuation rate for valid samples is slower than that for invalid samples.

According to Equation (2), parameters $\rho_I$ ($0 < \rho_I < 1$), $\rho_o$ ($\rho_o > 1$), and $\bar{\mu}$ are involved in calculating the modified membership for each sample. Though these three parameters can be obtained by experiment, they can also be determined adaptively according to the distribution of actual samples, and in turn the fuzzy membership function constructed will be adaptive, too. The three parameters are determined as follows:

1. Membership attenuation rate $\rho_I$ for samples inside the hypersphere

Figure 4a,b show two different types of sample distribution inside the hypersphere. The distances between the sample x and its center in Figure 4a,b are the same. If parameter $\rho_I$ corresponding to sample x in Figure 4a,b were set to be the same, their memberships would be the same too. It is clear that the samples in Figure 4a are more concentrated around the center of the hypersphere, while those in Figure 4b are distributed more randomly over the hypersphere, so the membership of sample x in Figure 4b should be larger than that in Figure 4a. Therefore, using the same value of $\rho_I$ for Types 1 and 2 is not reasonable; $\rho_I$ should be related to the sample distribution inside the hypersphere. The closer to the hypersphere surface the samples are distributed, the more slowly the membership attenuates, and the smaller $\rho_I$ should be. Accordingly, the average radius $d_I$ ($d_I \le R$) between samples and their class center is defined to describe the overall distribution of samples inside the hypersphere,

$$d_I = \frac{1}{n_I} \sum_{d(x_i) \le R} d(x_i), \qquad (3)$$
where $n_I$ is the number of samples inside the hypersphere, and $\rho_I$ is defined as Equation (4). Figure 5a shows memberships for samples inside the hypersphere with $\rho_I$ as in Equation (4), $R = 2$, and $\bar{\mu} = 0.5$. It is clear that if more samples inside the hypersphere are close to the surface, the membership attenuates more slowly.

2. Membership attenuation rate $\rho_o$ for samples outside the hypersphere

In Figure 4c,d, the distances between sample x outside the hypersphere and its center are the same. If the distribution characteristics of samples outside the hypersphere were ignored and the same value of $\rho_o$ were used, the membership for sample x in Figure 4c,d would be the same. To distinguish the membership for sample x in Figure 4c from that in Figure 4d, the average radius $d_o$ ($d_o > R$) between samples and their class center is defined as Equation (5), which describes the overall distribution of samples outside the hypersphere,

$$d_o = \frac{1}{n_o} \sum_{d(x_i) > R} d(x_i), \qquad (5)$$
where $n_o$ is the number of samples outside the hypersphere, and $\rho_o$ is defined as Equation (6), which contains a constant $K$. After some tests, we found that $K = 5$ gave the best results. Figure 5b shows memberships for samples outside the hypersphere with $\rho_o$ as in Equation (6), $R = 2$, and $\bar{\mu} = 0.5$. It is clear that if most samples outside the hypersphere are close to the surface, the membership attenuates more slowly.

3. Critical membership $\bar{\mu}$

The critical membership $\bar{\mu}$ is the minimum membership for samples inside the hypersphere and the maximum membership for samples outside the hypersphere. Figure 4 shows that the closer to the surface the samples outside the hypersphere lie, the more likely it is that these samples belong to the class and the larger their membership will be, which means the larger the critical membership $\bar{\mu}$ should be, and vice versa. So, $\bar{\mu}$ is defined by Equation (7). By the analysis above, the modified fuzzy membership function is defined as Equation (8). In the next section, an adaptive fuzzy membership-based sparse representation classifier will be designed using the above modified fuzzy membership function for satellite cloud classification.
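The role of the three parameters can be illustrated with a short sketch. The equations of the modified membership function and of the parameter definitions are not reproduced in this text, so everything below is an assumption: the piecewise form is one that reduces to a fixed critical membership of 0.4 when both rates equal 1, and in which a smaller inside rate slows the attenuation for valid samples while a larger outside rate speeds it up for invalid samples. The two average radii are computed as mean distances of the inside and outside samples. How the rates and the critical membership are derived from those averages is left as plain inputs here, and all function and parameter names are hypothetical.

```python
def mean_distances(dists, R):
    """Average distance to the class center for samples inside (d <= R)
    and outside (d > R) the hypersphere; None if a group is empty."""
    inside = [d for d in dists if d <= R]
    outside = [d for d in dists if d > R]
    d_in = sum(inside) / len(inside) if inside else None
    d_out = sum(outside) / len(outside) if outside else None
    return d_in, d_out

def modified_membership(d, R, rho_in, rho_out, mu_crit):
    """Assumed parametrized membership.

    rho_in  (0 < rho_in < 1): attenuation rate inside the hypersphere
    rho_out (rho_out > 1):    attenuation rate outside the hypersphere
    mu_crit (0 < mu_crit <= 1): critical membership on the surface
    """
    if d <= R:
        t = d / R
        return (1.0 - mu_crit) * (1.0 - t) / (1.0 + rho_in * t) + mu_crit
    return mu_crit / (1.0 + rho_out * (d - R))

# Demonstration values quoted in the text: rho_I = 0.5, rho_o = 10, critical 0.4.
R = 2.0
m_inside = modified_membership(1.0, R, 0.5, 10.0, 0.4)
m_surface = modified_membership(2.0, R, 0.5, 10.0, 0.4)
m_outside = modified_membership(2.5, R, 0.5, 10.0, 0.4)
d_in, d_out = mean_distances([0.5, 1.5, 2.5, 3.5], R)
```

With these values a sample halfway to the surface keeps a higher membership than it would with both rates set to 1, while a sample 0.5 beyond the surface drops to a much lower membership, which is the intended asymmetry between valid and invalid samples.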

Classification of Clouds in Satellite Imagery Using Adaptive Fuzzy Sparse Representation
Sparsity is a common attribute of signals. Sparse representation means that, in an appropriate basis (dictionary), a natural signal can be represented as a sparse linear combination of dictionary atoms; it is a concise way to represent information. In sparse representation-based classification (SRC) algorithms [16][17][18], the sparse representation coefficients of a test sample are obtained by sparse coding over a dictionary, and the test sample is classified according to the sparsity and sparse concentration of the representation coefficients. This paper applies SRC to satellite cloud classification, aiming to depict the fuzziness and uncertainty of cloud samples and to eliminate the impact of outliers and noise on cloud classification. According to Equation (8), an adaptive membership of each training sample is calculated; based on the original feature vectors of the training samples, an adaptive fuzzy dictionary with adaptive feature vectors is then constructed, which enhances the performance of sparse representation-based classifiers for satellite cloud classification.
As mentioned before, a 14-dimensional feature vector is extracted for each pixel, that is, for each sample in the satellite cloud image. Denote $X = [X_1, X_2, \ldots, X_M]$ as the training sample set with $M$ cloud types. The samples $X_i = [x_{i,1}, x_{i,2}, \ldots, x_{i,n}]$ of the $i$-th type can be weighted by $U_i$ to construct an optimized cloud training sample subset $D_i = X_i U_i = [d_{i,1}, d_{i,2}, \ldots, d_{i,n}] \in \mathbb{R}^{m \times n}$. The same operation is performed on the samples of all $M$ types, and the optimized training set is

$$D = [D_1, D_2, \ldots, D_M] \in \mathbb{R}^{m \times l}. \qquad (10)$$

Here, $D$ serves as the dictionary for the sparse representation classifier. A test sample $y \in \mathbb{R}^m$ from the $i$-th type can be expressed as a linear combination of the atoms of $D_i$,

$$y = \alpha_{i,1} d_{i,1} + \alpha_{i,2} d_{i,2} + \cdots + \alpha_{i,n} d_{i,n},$$

where $\alpha_{i,j}$, $j = 1, 2, \ldots, n$, are the coding coefficients. According to sparse representation theory, if $y$ is represented as a linear combination of the entire dictionary $D$, only those coefficients corresponding to sub-dictionary $D_i$ will be nonzero. Thus, the above sparse representation can be modeled as Equation (11), where $E$ is a sparse threshold and $\alpha = [\alpha_{1,1}, \ldots, \alpha_{1,n}, \ldots, \alpha_{i,1}, \ldots, \alpha_{i,n}, \ldots, \alpha_{M,1}, \ldots, \alpha_{M,n}]^T$ is the sparse coefficient vector of $y$. Solving Equation (11) is an NP-hard problem, and it is usually approximated by the $\ell_1$-minimization of Equation (12), where $\varepsilon$ is an optional error tolerance. To calculate the sparse coding coefficient $\alpha$, Equation (12) can be rewritten as a general Lagrangian model, in which $\lambda$ is a positive constant, and a homotopy algorithm is used to solve the $\ell_1$-minimization problem [21]. A new operator $\delta_i(\alpha)$ is introduced to extract the entries in $\alpha$ that are associated with the $i$-th type, that is, $\delta_i(\alpha) = [\alpha_{i,1}, \alpha_{i,2}, \ldots, \alpha_{i,n}]^T$. The test sample $y$ can then be reconstructed by the sub-dictionary $D_i$ as $\bar{y}_i = D_i \delta_i(\alpha)$, and the reconstructed residual between $y$ and $\bar{y}_i$ is $r_i(y) = \| y - \bar{y}_i \|_2$, which indicates how well $y$ is represented by the $i$-th sub-dictionary $D_i$. The smaller the value of $r_i(y)$, the more likely it is that $y$ belongs to the $i$-th type. So, the test sample $y$ can be classified by seeking the minimum reconstructed residual.
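The classification rule described above can be sketched in a few lines of dependency-free Python. Several points are assumptions rather than the authors' implementation: the weighting is applied as a per-sample scalar membership (equivalent to a diagonal weighting matrix), the Lagrangian objective is taken in the common form 0.5·||y − Dα||² + λ·||α||₁, and a plain iterative soft-thresholding (ISTA) loop is swapped in for the homotopy solver used in the paper. All names and the toy data are hypothetical.

```python
import math

def soft_threshold(v, t):
    """Shrink v toward zero by t (proximal operator of the l1 norm)."""
    return math.copysign(max(abs(v) - t, 0.0), v)

def ista(atoms, y, lam=0.01, n_iter=500):
    """Approximately minimise 0.5*||y - D a||_2^2 + lam*||a||_1,
    where the columns of D are the given atoms."""
    # Squared Frobenius norm: an upper bound on the Lipschitz constant.
    L = sum(v * v for atom in atoms for v in atom) or 1.0
    a = [0.0] * len(atoms)
    for _ in range(n_iter):
        recon = [sum(a[j] * atoms[j][k] for j in range(len(atoms)))
                 for k in range(len(y))]
        resid = [yk - rk for yk, rk in zip(y, recon)]
        # Gradient step direction D^T (y - D a), then shrinkage.
        grad = [sum(atom[k] * resid[k] for k in range(len(y))) for atom in atoms]
        a = [soft_threshold(a[j] + grad[j] / L, lam / L) for j in range(len(atoms))]
    return a

def classify(samples, labels, memberships, y, lam=0.01):
    """Weight each training sample by its membership, sparse-code y over the
    weighted dictionary, and pick the class with the smallest residual."""
    atoms = [[mu * v for v in x] for x, mu in zip(samples, memberships)]
    a = ista(atoms, y, lam)
    residuals = {}
    for c in sorted(set(labels)):
        # Reconstruct y using only the coefficients of class c.
        recon = [sum(a[j] * atoms[j][k] for j in range(len(atoms)) if labels[j] == c)
                 for k in range(len(y))]
        residuals[c] = math.sqrt(sum((yk - rk) ** 2 for yk, rk in zip(y, recon)))
    return min(residuals, key=residuals.get), residuals

# Toy 3-dimensional example with two classes (values are illustrative only).
samples = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.1, 0.9, 0.0]]
labels = [0, 0, 1, 1]
memberships = [1.0, 0.8, 1.0, 0.3]   # a low membership down-weights a suspect sample
label, residuals = classify(samples, labels, memberships, [0.95, 0.05, 0.0])
```

Scaling an atom by a small membership shrinks its norm, so the l1 penalty makes it more expensive to use; this is one way to read how the membership weighting suppresses outliers in the dictionary.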
4. Construct an optimized cloud training sample subset $D_i$ as $D_i = X_i U_i$; conduct this processing for all the $M$ types, and then construct the optimized training set $D = [D_1, D_2, \ldots, D_M] \in \mathbb{R}^{m \times l}$;
5. For test sample $y \in \mathbb{R}^m$, take $D$ as the adaptive dictionary for the sparse representation classifier, and perform $\ell_1$-minimization by solving Equation (17);
6. Compute the residuals:

Simulation Results and Analysis
In this section, the performance of the proposed cloud classification system is compared with that of some existing methods. Experiments are conducted on a computer with a 2.4 GHz Intel CPU and 4 GB of RAM. The cloud image data come from the FY-2G satellite, which carries five channel sensors (IR1, IR2, IR3, IR4, VIS), and cover a period of 10 days (from 28 June to 7 July 2016). From 28 June to 6 July 2016, the satellite data were collected at 8 a.m., 10 a.m., 12 p.m., 2 p.m., 4 p.m., and 6 p.m. Beijing time each day, so 9 × 6 = 54 sets of satellite data were collected. Three meteorologists examined these cloud images carefully, then manually identified and selected 300 pixels (samples) for each of the six predefined types mentioned in Section 2.2. In total, 1800 samples are used in the experiments in Sections 5.1 and 5.2. Satellite data collected at 2 p.m. Beijing time on 7 July 2016 are used for the visual comparison experiment in Section 5.3. All experiments in Sections 5.1-5.3 use the same training samples. In order to effectively represent spectral information, feature vectors for each selected sample are extracted and normalized by the $\ell_2$-norm. In the proposed AFSRC, the centers and radii R of the hyperspheres used for the membership calculations were determined by SVDD with a Gaussian kernel function. The constant C, a penalty factor that controls the trade-off between the volume and the errors in SVDD, is set to C = 1.0. The classification accuracy of the proposed method is evaluated and compared with that of several existing methods, including affinity-based FSVM [14], SRC [16], and CCSI-ODSR [19]; the computational efficiency of the different methods, together with the time for training SVDD, is reported in Section 5.4.

Accuracy Evaluation of AFSRC for FY-2G
Among the 300 pixels (samples) of each cloud type, 100 randomly selected pixels are used as training samples, and the remaining 200 pixels are used as testing samples. With parameter K = 5 in Equation (6), the confusion matrix of the classification results is given in Table 3. Among the 1200 test samples of the six different cloud types, 1186 are correctly classified by the proposed AFSRC, so the overall classification accuracy is 98.83%. For clear water and clear land, the classification accuracies are 99.00% and 98.00%, respectively, which indicates that the proposed AFSRC identifies these cloud-free areas effectively and can be applied to cloud detection in satellite cloud image analysis. For heap cloud and low cloud, the classification accuracies are 99.00% and 99.50%, respectively. Heap clouds are often associated with strong convective weather and almost always lead to lightning, showers, gusts, or hail, while low clouds such as cumulus and nimbostratus are usually associated with continuous rainfall. These two cloud types are therefore key monitoring objects for weather services, and their high recognition rates show the application value of AFSRC. Table 4 shows the classification accuracy (%) of AFSRC for different values of K. The overall accuracy of AFSRC reaches its maximum at K = 5. As K varies, the classification accuracies of clear water and clear land remain almost unchanged, while those of the other cloud types vary over a large range. Generally speaking, the overall accuracy of AFSRC is relatively lower with a smaller K and slightly higher with a larger K, which indicates that a larger K strengthens the sparse representation ability of the adaptive dictionary and suppresses the effect of outliers and noise. The next subsection compares AFSRC with three competing methods.
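As a reminder of how such figures are obtained from a confusion matrix, the sketch below computes per-class and overall accuracy. It assumes that rows hold the true class and columns the predicted class; the 3 × 3 matrix is a made-up toy example and does not reproduce Table 3. Only the final check uses numbers stated in the text (1186 correct out of 1200).

```python
def per_class_accuracy(confusion):
    """Percentage of each true class (row) that was predicted correctly."""
    return [100.0 * row[i] / sum(row) for i, row in enumerate(confusion)]


def overall_accuracy(confusion):
    """Percentage of all samples that lie on the diagonal."""
    correct = sum(row[i] for i, row in enumerate(confusion))
    total = sum(sum(row) for row in confusion)
    return 100.0 * correct / total


# Hypothetical toy confusion matrix (rows: true class, columns: predicted).
toy_confusion = [
    [50, 0, 0],
    [2, 47, 1],
    [0, 3, 47],
]
toy_per_class = per_class_accuracy(toy_confusion)
toy_overall = overall_accuracy(toy_confusion)

# Overall accuracy reported in the text: 1186 correct out of 1200 samples.
reported_overall = round(100.0 * 1186 / 1200, 2)
```

Because every class contributes the same number of test samples (200), the overall accuracy here equals the mean of the six per-class accuracies.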

Comparisons with Existing Methods
Several other methods have recently been applied to satellite cloud image classification, each with its own advantages. In this experiment, AFSRC is compared with FSVM [14], SRC [16], and CCSI-ODSR [19]. As in the previous experiment, 200 samples of each cloud type are used for testing, giving 1200 testing samples in total. The classification accuracies of the six cloud types obtained by FSVM, SRC, CCSI-ODSR, and AFSRC are listed in Table 5. AFSRC achieves better results than the other methods for almost all cloud types; the only exception is clear land, where it is slightly worse than affinity-based FSVM. By introducing fuzzy membership, FSVM [14] can discriminate noise and outliers in the training samples, and its classification results are acceptable except for low and medium cloud. With SRC, testing samples of low cloud, medium cloud, and high cloud are often confused with each other, which leads to a poor result: the overall accuracy is less than 70%. This implies that SRC alone is not a reliable satellite cloud classification method. CCSI-ODSR, with the help of an over-complete dictionary, achieves a much better overall accuracy than SRC, although its results for low cloud and high cloud can still be improved. AFSRC introduces an adaptive fuzzy membership function to reduce the influence of noise and outliers and thereby constructs an optimized adaptive dictionary for the sparse representation classifier; its classification result is the best.

Benchmarks on FY-2G Satellite Data
In this experiment, the proposed AFSRC is benchmarked on the FY-2G satellite data obtained at 2 p.m. Beijing time on 7 July 2016. From the satellite image of each channel, a sub-image of 512 × 512 pixels, which covers the super typhoon "Nepartak" and parts of the southeast coast of China, is selected. The sub-image is used for the benchmark only; no testing sample is selected from it. The IR1 channel image and the labeled image, whose cloud types were identified by a meteorologist, are shown in Figure 6a,b, respectively. In Figure 6b, triangle (▲) indicates clear water, inverted triangle (▼) indicates clear land, star (★) indicates heap cloud, circle (•) indicates low cloud, square (■) indicates medium cloud, and cross (+) indicates high cloud. Color-coded cloud classification images by FSVM [14], CCSI-ODSR [19], and AFSRC are shown in Figure 7; SRC is not included in this comparison because of its poor performance. It can be seen from Figure 7 that, in most areas, the classification results of FSVM, CCSI-ODSR, and AFSRC agree with the meteorologist-labeled image.
For example, for clear water and clear land, all three methods give reasonable classification results, and all three correctly mark the spiral rain-band cloud of super typhoon "Nepartak". The upper left corner of Figure 6 is the Yangtze Plain area of China, where continuous heavy rain caused serious flooding in early to mid-July 2016. In Figure 7, AFSRC and FSVM correctly identify the low clouds gathering in this area, which is consistent with the flooding caused by the continuous heavy rain. On the left periphery of super typhoon "Nepartak", FSVM misclassified a small portion of the marine area as land, and some low and medium clouds around the typhoon body were misclassified as high clouds. In Figure 7b, the area of heap clouds in the typhoon body identified by CCSI-ODSR is much smaller than that identified by FSVM and the proposed AFSRC. According to the discussion above, AFSRC achieves better cloud classification results, and its higher recognition accuracy for the various cloud types indicates strong stability and adaptability.

Running Time
To evaluate the computational efficiency of the different methods, their training and testing times are recorded, as shown in Table 6. SRC needs no training, so its training time is omitted. For each method, the training time is the total time for all 6 × 100 training samples, and the testing time is the average over all 6 × 200 testing samples. Compared with FSVM [14] and CCSI-ODSR [19], the proposed AFSRC has the shortest training time. FSVM needs to calculate the fuzzy membership based on the affinity among samples, which leads to a longer training time. CCSI-ODSR spends a lot of time training the dictionary, and its high computational complexity results in the longest training time. AFSRC introduces an adaptive fuzzy membership function to optimize the training samples and eliminate the impact of noise and outliers, which also shortens its training time. Although the testing time of FSVM is much shorter than that of the other three methods, its relatively poor performance on low cloud and medium cloud and the poor adaptability and flexibility of its fuzzy membership function make traditional FSVM hard to apply in practice. Generally speaking, by constructing an adaptive fuzzy membership function, the proposed AFSRC achieves high classification accuracy with acceptable time efficiency and is reliable for practical cloud classification.
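The timing protocol described above (total training time, average testing time per sample) could be measured along the following lines. This is only a sketch: `time_method`, `toy_train`, and `toy_predict` are hypothetical names, and the toy threshold classifier merely stands in for any of the compared methods.

```python
import time


def time_method(train_fn, predict_fn, train_set, test_set):
    """Return (total training time, average testing time per sample, predictions)."""
    start = time.perf_counter()
    model = train_fn(train_set)
    train_time = time.perf_counter() - start

    start = time.perf_counter()
    predictions = [predict_fn(model, sample) for sample in test_set]
    avg_test_time = (time.perf_counter() - start) / len(test_set)
    return train_time, avg_test_time, predictions


# Hypothetical stand-in classifier: threshold at the mean of the training values.
def toy_train(values):
    return sum(values) / len(values)


def toy_predict(threshold, value):
    return int(value > threshold)


train_values = [float(i) for i in range(600)]   # stands in for 6 x 100 training samples
test_values = [float(i) for i in range(1200)]   # stands in for 6 x 200 testing samples
train_time, avg_test_time, predictions = time_method(
    toy_train, toy_predict, train_values, test_values
)
```

Using a monotonic high-resolution clock such as `time.perf_counter` avoids artifacts from system clock adjustments; repeated runs would normally be averaged to reduce measurement noise.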

Conclusions
Traditional fuzzy membership has poor flexibility and is inconsistent with the distribution characteristics of samples in cloud classification. Therefore, by defining adaptive parameters related to attenuation rate and critical membership, an adaptive fuzzy membership function is constructed and combined with the sparse representation classifier. The resulting adaptive fuzzy sparse representation-based classification (AFSRC) method for satellite cloud classification can eliminate the impact of noise and outliers. Experimental results on FY-2G satellite cloud images show that the proposed AFSRC not only improves the accuracy of cloud classification with acceptable time efficiency, but also has strong stability and adaptability. How to determine the parameters of AFSRC in a more principled way, and how to apply AFSRC to other satellite cloud classification tasks, will be addressed in future work.