Examining the Deep Belief Network for Subpixel Unmixing with Medium Spatial Resolution Multispectral Imagery in Urban Environments

Mixed pixels in medium spatial resolution imagery create major challenges in acquiring accurate pixel-based land use and land cover information. The deep belief network (DBN), which can provide joint probabilities in land use and land cover classification, may serve as an alternative tool to address this mixed pixel issue. Since DBN performs well in pixel-based classification and object-based identification, examining its performance in subpixel unmixing with medium spatial resolution multispectral imagery in urban environments would be of value. In this study, (1) we examined DBN's ability in subpixel unmixing with Landsat imagery, (2) explored the best-fit parameter setting for the DBN model and (3) evaluated its performance by comparing DBN with random forest (RF), support vector machine (SVM) and multiple endmember spectral mixture analysis (MESMA). The results illustrated that (1) DBN performs well in subpixel unmixing with a mean absolute error (MAE) of 0.06 and a root mean square error (RMSE) of 0.0077. (2) A larger sample size (e.g., greater than 3000) can provide stable and high accuracy, while two RBM layers and a batch size of 50 are the best parameters for DBN in this study. The number of epochs and the learning rate should be decided by specific applications since there is no consistent pattern in our experiments. Finally, (3) DBN can provide results comparable to those of RF, SVM and MESMA. We concluded that DBN can be viewed as an alternative method for subpixel unmixing with Landsat imagery, and this study provides references for other scholars to use DBN in subpixel unmixing in urban environments.


Introduction
Medium spatial resolution multispectral imagery, such as Landsat data, is widely used in geographical applications since it covers large spatial areas and has a short repeat period. However, mixed pixels, which contain more than one pure land cover class, are inevitable in medium spatial resolution imagery. For example, Landsat Thematic Mapper (TM) imagery has a spatial resolution of 30 m. A pixel covers 900 square meters on the ground, which is larger than the area of many individual land cover types in urban environments. Thus, assigning only one class label to a mixed pixel is inappropriate since it discards information about the other classes. Subpixel unmixing methods can provide more accurate results than traditional pixel-based classifications. Subpixel models, such as spectral mixture analysis [1], ...

We applied the vegetation-impervious surface-soil (V-ISA-S) model proposed by [42] as the land cover class category. Four land cover types, including vegetation (V), high albedo impervious surface area (ISAh), low albedo impervious surface area (ISAl) and soil (S), were utilized for subpixel unmixing. A scene of Landsat 5 Thematic Mapper (TM), acquired on 23 May 2010, was used for the subpixel unmixing. A scene of AISA images (wavelengths: 400-2500 nm, recorded in August 2008 in Milwaukee County, WI, USA) with a spatial resolution of 1 m, a spectral resolution of 6 nm and 366 channels was employed for the collection of training samples. Image preprocessing, such as geometric correction, radiance calibration and atmospheric correction, was applied to the Landsat 5 TM and AISA images to acquire corrected spectral reflectance values. High spatial resolution images from Google Earth (recorded in 2010) were utilized to verify the unmixing results.
Training samples are the key to successfully applying deep learning techniques in remote sensing classification. A pure spectrum vector is viewed as a training sample. However, training samples are limited in medium spatial resolution imagery. Only a small number of training samples can be collected from Landsat imagery. These samples, especially impervious surface area samples, are not adequate for the DBN training process. Thus, hyperspectral imagery with 1-m spatial resolution was used to assist the collection of training samples (Figure 2). The collected hyperspectral training samples were resampled to match the wavelengths of Landsat 5 TM imagery. The total numbers of training samples of V, ISAh, ISAl and S were 4512, 4899, 2857 and 5000, respectively.
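The resampling of hyperspectral spectra to Landsat 5 TM wavelengths can be sketched as follows. This is only an illustration, not the procedure used in the study: it assumes a flat (boxcar) response within the nominal TM band ranges, whereas an actual workflow would normally use the sensor's relative spectral response functions. The function name `resample_to_tm` and the synthetic spectrum are hypothetical.

```python
# Nominal Landsat 5 TM reflective band ranges in micrometres (bands 1-5 and 7).
TM_BANDS_UM = {
    "B1": (0.45, 0.52),
    "B2": (0.52, 0.60),
    "B3": (0.63, 0.69),
    "B4": (0.76, 0.90),
    "B5": (1.55, 1.75),
    "B7": (2.08, 2.35),
}


def resample_to_tm(wavelengths_um, reflectance):
    """Average the hyperspectral channels that fall inside each TM band.

    Assumption: a boxcar response per band (a simplification).
    """
    resampled = {}
    for name, (low, high) in TM_BANDS_UM.items():
        inside = [r for w, r in zip(wavelengths_um, reflectance) if low <= w <= high]
        if not inside:
            raise ValueError(f"no hyperspectral channel falls inside {name}")
        resampled[name] = sum(inside) / len(inside)
    return resampled


# Hypothetical spectrum: 366 channels spread evenly over 0.4-2.5 um,
# with a reflectance that increases linearly with wavelength.
n_channels = 366
wavelengths = [0.4 + i * (2.5 - 0.4) / (n_channels - 1) for i in range(n_channels)]
spectrum = [0.1 + 0.2 * w for w in wavelengths]

for band, value in resample_to_tm(wavelengths, spectrum).items():
    print(band, round(value, 4))
```

Each resampled training sample is then a six-value vector that can be compared directly with Landsat 5 TM pixels.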
Remote Sens. 2019, 10, x FOR PEER REVIEW 4 of 19

We also randomly collected 40 testing samples in the study area to evaluate the unmixing performance (Figure 3). Each sample has dimensions of 3 × 3 pixels (90 m × 90 m) in order to mitigate the effect of geometric registration.

Subpixel Unmixing with DBN
DBN is a probabilistic generative model that provides a joint probability distribution over observable data and labels [20]. It first uses an efficient layer-by-layer greedy learning strategy to initialize the deep network and then fine-tunes all weights jointly with the desired outputs [21]. A DBN is constructed from hierarchical sets of restricted Boltzmann machines (RBMs) [20,21]. Each RBM has a visible layer and a hidden layer with $I$ binary visible units ($v = \{v_1, v_2, \ldots, v_I\}$) and $J$ binary hidden units ($h = \{h_1, h_2, \ldots, h_J\}$), respectively. The energy of the joint configuration of visible and hidden units $(v, h)$ is described as follows [20,21]:

$$E(v, h; \theta) = -\sum_{i=1}^{I} \sum_{j=1}^{J} w_{ij} v_i h_j - \sum_{i=1}^{I} a_i v_i - \sum_{j=1}^{J} b_j h_j$$

where $\theta = \{w_{ij}, a_i, b_j\}$, $i = 1, 2, \ldots, I$; $j = 1, 2, \ldots, J$ forms the set of model parameters. An RBM defines a joint probability over the hidden units as follows:

$$p(v; \theta) = \frac{\sum_{h} e^{-E(v, h; \theta)}}{Z}$$

where $Z$ is the partition function, calculated as:

$$Z = \sum_{v} \sum_{h} e^{-E(v, h; \theta)}$$

Figure 3. Distribution of testing samples. The small red squares are the testing samples in our study area.

The conditional distributions p(h_j = 1|v) and p(v_i = 1|h) can be readily computed [20]. The output of the preceding RBM is used as input data for the next RBM. Two adjacent layers are fully connected, while no two units in the same layer are connected. The input can be a set of spectral signatures or the contextual features from neighboring pixels.
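To make the RBM quantities concrete, the sketch below evaluates them on a toy model with three visible and two hidden units. It assumes the standard binary RBM energy, E(v, h) = -Σ w_ij v_i h_j - Σ a_i v_i - Σ b_j h_j, for which the conditional p(h_j = 1|v) has the closed form sigmoid(b_j + Σ_i w_ij v_i). The parameter values and function names are hypothetical, and brute-force enumeration is feasible only for very small models.

```python
import itertools
import math


def energy(v, h, w, a, b):
    """Energy of a joint configuration (v, h) of a binary RBM."""
    interaction = sum(w[i][j] * v[i] * h[j]
                      for i in range(len(v)) for j in range(len(h)))
    visible_bias = sum(a_i * v_i for a_i, v_i in zip(a, v))
    hidden_bias = sum(b_j * h_j for b_j, h_j in zip(b, h))
    return -interaction - visible_bias - hidden_bias


def binary_states(n):
    """All 2**n binary configurations of n units."""
    return itertools.product((0, 1), repeat=n)


def partition_function(w, a, b):
    """Z: sum of exp(-E) over every visible and hidden configuration."""
    return sum(math.exp(-energy(v, h, w, a, b))
               for v in binary_states(len(a)) for h in binary_states(len(b)))


def p_v(v, w, a, b):
    """Probability of a visible vector: hidden units summed out, divided by Z."""
    unnormalised = sum(math.exp(-energy(v, h, w, a, b))
                       for h in binary_states(len(b)))
    return unnormalised / partition_function(w, a, b)


def p_h_given_v(j, v, w, b):
    """Closed form of p(h_j = 1 | v) for a binary RBM."""
    activation = b[j] + sum(w[i][j] * v[i] for i in range(len(v)))
    return 1.0 / (1.0 + math.exp(-activation))


def p_h_given_v_bruteforce(j, v, w, a, b):
    """Same conditional, obtained by enumerating all hidden states."""
    numerator = denominator = 0.0
    for h in binary_states(len(b)):
        weight = math.exp(-energy(v, h, w, a, b))
        denominator += weight
        if h[j] == 1:
            numerator += weight
    return numerator / denominator


# Hypothetical toy parameters: w[i][j] links visible unit i to hidden unit j.
W = [[0.5, -0.3], [0.2, 0.8], [-0.7, 0.1]]
A = [0.1, -0.2, 0.3]
B = [0.05, -0.1]
v_example = (1, 0, 1)

print(p_h_given_v(0, v_example, W, B))
print(p_h_given_v_bruteforce(0, v_example, W, A, B))
```

The two printed values should agree up to floating-point error, which is why the conditionals do not require the partition function during training.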
Parameter setting is important for deep learning techniques. To find the best-fit parameters, we examined different training sample sizes, numbers of RBM layers, numbers of epochs, batch sizes and learning rates.
We tested sample sizes from 5 to 5000 at intervals of 5 (sample sizes less than 100) and 50 (sample sizes larger than 100). Spectra in the spectral library were sorted by the sum of the band reflectance. After this, subsamples were extracted from the sorted spectral library at a constant interval. The interval was calculated as follows:

$$I = \frac{tss}{ss}$$

where $I$ is the interval, and $tss$ and $ss$ are the mean total sample size and the subsample size, respectively. When a class has fewer samples than the requested training sample size, all of its samples are used to train the DBN model. In addition to the sample size, we also examined different numbers of RBM layers, epochs and batch sizes as well as different learning rates by iteratively testing the DBN with different parameters.
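The subsample extraction can be sketched as below. This is a minimal illustration under two assumptions that the text does not spell out: the interval is taken as the number of available spectra divided by the requested subsample size, and the spectrum at each multiple of that interval (rounded down) is kept. The function name `subsample_by_interval` is hypothetical.

```python
def subsample_by_interval(library, ss):
    """Pick about `ss` spectra at a regular interval from a sorted library.

    `library` is a list of spectra (sequences of band reflectances) for one
    class. Spectra are sorted by the sum of their band reflectances first.
    """
    ordered = sorted(library, key=sum)
    tss = len(ordered)
    if tss <= ss:
        # The class has fewer spectra than requested: use all of them.
        return ordered
    interval = tss / ss  # assumed form of the interval
    return [ordered[int(k * interval)] for k in range(ss)]


# Hypothetical two-band library with ten spectra, given in unsorted order.
toy_library = [(i, 2 * i) for i in reversed(range(10))]
print(subsample_by_interval(toy_library, 5))
```

Because the library is sorted first, a regular interval keeps spectra from the dark end to the bright end of each class, so even a small subsample spans the full brightness range.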

Accuracy Assessment and Comparative Analysis
Fractions of the impervious surface area, obtained by combining the high albedo and low albedo impervious surface areas, were employed to verify the accuracy of the unmixing results. Reference fractions were manually digitized within the testing samples on high spatial resolution imagery. Probabilities estimated by DBN, RF and SVM were treated as approximate fractions in this study. They were compared directly to the digitized fractions using the mean absolute error (MAE) and root mean square error (RMSE). MAE and RMSE can be calculated as follows:

$$\mathrm{MAE} = \frac{1}{M} \sum_{i=1}^{M} \left| f_{e,i} - f_{r,i} \right|$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{M} \sum_{i=1}^{M} \left( f_{e,i} - f_{r,i} \right)^2}$$

where $f_{e,i}$ and $f_{r,i}$ are the estimated and reference probabilities of sample $i$, respectively, and $M$ is the number of samples. We also employed multiple endmember spectral mixture analysis (MESMA) [36], random forest (RF) and support vector machine (SVM) to unmix the same study area. Since different models have different requirements for training samples and parameters, we used the best performance of each model for comparison. The best performance of each model was acquired by repeatedly testing different parameters. The model with the lowest MAE was viewed as the best model.
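The accuracy assessment can be illustrated with a short sketch. It assumes the standard definitions of MAE and RMSE and treats the impervious surface fraction as the sum of the ISAh and ISAl probabilities. The function names and the sample values are hypothetical and are not results from this study.

```python
import math


def isa_fraction(probabilities):
    """Combine high and low albedo ISA probabilities into one ISA fraction."""
    return probabilities["ISAh"] + probabilities["ISAl"]


def mae(estimated, reference):
    """Mean absolute error between estimated and reference fractions."""
    return sum(abs(e - r) for e, r in zip(estimated, reference)) / len(reference)


def rmse(estimated, reference):
    """Root mean square error between estimated and reference fractions."""
    squared = sum((e - r) ** 2 for e, r in zip(estimated, reference))
    return math.sqrt(squared / len(reference))


# Hypothetical class probabilities for three testing samples.
predictions = [
    {"V": 0.50, "ISAh": 0.20, "ISAl": 0.20, "S": 0.10},
    {"V": 0.10, "ISAh": 0.50, "ISAl": 0.30, "S": 0.10},
    {"V": 0.70, "ISAh": 0.05, "ISAl": 0.15, "S": 0.10},
]
# Hypothetical digitized reference ISA fractions for the same samples.
reference_isa = [0.45, 0.75, 0.25]

estimated_isa = [isa_fraction(p) for p in predictions]
print("MAE:", mae(estimated_isa, reference_isa))
print("RMSE:", rmse(estimated_isa, reference_isa))
```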

Sample Size
We tested the DBN model 111 times with sample sizes (for each class) of 5-5000. All of these experiments used the same settings: 2 RBM layers (14 visible and 12 hidden nodes), 150 epochs, a batch size of 50 and a learning rate of 0.1. Figure 4 shows that MAEs drop as the sample size increases. MAE reaches 0.22 when there are only 5 training samples. When the sample size is in the range of 1-1200, MAEs fluctuate between 0.11 and 0.15. When the sample size is larger than 1200, MAEs decline slowly and approach 0.06. Although the MAE increases sharply at a sample size of 2850, it drops back to its previous level at a sample size of 2900. When the sample size is larger than 3000, MAEs become increasingly stable.

Number of RBM Layers
Six different numbers (10, 8, 6, 4, 3, 2) of RBM layers (Figure 5) were tested with a sample size of 3000. Other initial parameters were the same as in the previous experiments. Figure 5 shows that the DBN model with two RBM layers performs best (MAE of 0.06; visible nodes: 14; hidden nodes: 12). MAE rises quickly and reaches a peak of 0.15 when there are 3 RBM layers. MAEs have similar values of around 0.13 when there are 3-10 RBM layers.
Remote Sens. 2019, 11, 1566

Number of Epochs
We tested the number of epochs from 50 to 1000 at an interval of 50. The sample size was 3000 and the number of RBM layers was 2. Other initial parameters were the same as in the previous experiments. MAEs are similar, varying within a small range between 0.06 and 0.07 (Figure 6). No consistent trend emerges as the number of epochs increases from 50 to 1000.


Batch Size
Batch sizes between 50 and 1450 were tested with 150 epochs, 2 RBM layers, 3000 training samples and a learning rate of 0.1. The results illustrate that MAEs increase slowly (from 0.06 to 0.07) when the batch size is between 50 and 200 (Figure 7). After this, MAEs increase sharply from 0.07 to 0.13 when the batch size is in the range of 200-300. Finally, MAEs drop gradually from 0.136 to 0.128 when the batch size increases from 300 to 1450.


Learning Rate
We also tested learning rates of 0.1-2 (interval of 0.1) with a sample size of 3000, epoch of 50, batch size of 150 and 2 RBM layers. There appears to be no consistent pattern in how MAEs change as the learning rate increases (Figure 8). MAEs fluctuate widely between 0.146 and 0.06 with no significant trend. MAE reaches a peak of 0.15 at a learning rate of 0.9. The MAEs of the other learning rates vary from 0.06 to 0.1 without any consistent pattern.


Accuracy Assessment and Comparative Analysis
To evaluate the performance of DBN more objectively, we compared it with random forest (RF), support vector machine (SVM) and multiple endmember spectral mixture analysis (MESMA). We used 3000, 100, 5 and 10 samples (per class) for DBN, RF, SVM and MESMA, respectively. Histograms and scatterplots were employed for the numeric and visual comparisons.

Histograms of MAE and RMSE illustrate that the best DBN (MAE: 0.06, RMSE: 0.0077) outperforms the best RF (MAE: 0.08, RMSE: 0.0114) and MESMA (MAE: 0.14, RMSE: 0.0222), while SVM achieved the highest accuracy with an MAE of 0.03 and an RMSE of 0.0023 (Figures 9 and 10).

Moreover, we compared the time consumption of each model. The study area contains 6192 pixels (86 rows × 72 columns). We used a Dell Precision Tower 7920 workstation with a Xeon Gold 5122 CPU (4 cores, 3.6 GHz), 128 GB of RDIMM memory and an NVIDIA Quadro P2000 graphics card (5 GB) for calculation. The results (Table 1) demonstrate that SVM can predict the results within a second. DBN needs about 5 minutes (310.89 s) to complete the calculation, while RF needs more than 10 min to finish the unmixing process. Since there are 10^4 + 4 × 10^3 + 6 × 10^2 endmember combinations (10 spectra for each class) per pixel in the MESMA model, its calculation efficiency is very poor. MESMA takes about 25 days to finish the fraction estimate for 6192 pixels.

Scatterplots of estimated probabilities and reference fractions were employed to demonstrate the detailed accuracy of each testing sample. Figure 11 shows that the reference line lies almost in the center of the scattered points, meaning that the percentages of overestimates and underestimates are similar. Figure 11 also illustrates that DBN underestimates the testing samples with high ISA fractions while it overestimates the samples with low ISA fractions. RF has a scatterplot pattern similar to that of DBN (Figure 12). The scatter points of SVM lie almost along the reference line of y = x, implying a near-perfect match between estimated probabilities and reference fractions (Figure 13). All the fractions calculated by MESMA are underestimated since all scatter points are located on the right side of the reference line (Figure 14).

In addition to MAE, RMSE and scatterplots, we also added visual comparisons of DBN, RF, SVM and MESMA (Figure 15). Figure 15 illustrates that RF and SVM can present distinct road shapes in the V (dark shape) and ISAl (light shape) fraction maps, while DBN and MESMA perform poorly in representing the shape of roads in all fraction maps. In the vegetation fraction maps, high vegetation fractions/probabilities are shown in grassland areas (see Figure 1 ...
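The MESMA workload quoted in the timing comparison can be checked with a few lines of arithmetic. The sketch assumes that the expression 10^4 + 4 × 10^3 + 6 × 10^2 counts all two-, three- and four-class endmember models with one of 10 spectra chosen per class. That reading is inferred from the formula rather than stated in the text, and the function name is hypothetical.

```python
import math


def mesma_model_count(n_classes=4, spectra_per_class=10, min_endmembers=2):
    """Count endmember combinations that use one spectrum per selected class."""
    total = 0
    for k in range(min_endmembers, n_classes + 1):
        # Choose which k classes take part, then one spectrum for each of them.
        total += math.comb(n_classes, k) * spectra_per_class ** k
    return total


models_per_pixel = mesma_model_count()  # 600 + 4000 + 10000 = 14600
pixels = 86 * 72                        # 6192 pixels in the study area
print(models_per_pixel, pixels, models_per_pixel * pixels)
```

Under this assumption, every pixel requires 14,600 candidate models, or roughly 90 million model fits for the whole scene, which is consistent with the long run time reported for MESMA.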

Discussion
Currently, the applications of deep learning in remote sensing classification mainly focus on (1) pixel-based classification with hyperspectral imagery and (2) scene-based classification with high/very high spatial resolution aerial or satellite imagery. Few studies have utilized deep learning techniques to unmix medium spatial resolution multispectral imagery. Thus, this study examined the DBN model to unmix the mixed pixels in suburban areas with Landsat imagery. The results of DBN are not deterministic endmember fractions for a pixel, but probability distributions of each land cover class.

Figure 15. Visual comparisons between DBN, RF, SVM and MESMA. Column V represents vegetation, Column ISAh represents a high albedo impervious surface area, Column ISAl represents a low albedo impervious surface area and Column S represents soil.


Application of DBN in Landsat Imagery
Landsat series data have only six to seven multispectral channels available for classification. On the one hand, this reduces the data volume, which is one of the major challenges in remote sensing classification [20]. On the other hand, the limited bands also cause trouble for classification with deep learning techniques because they provide less information. It is a challenge for deep learning techniques, such as DBN, to learn the comprehensive characteristics of each land cover class with these limited bands, and this affects the models' performance. Sample size and model parameters are essential for successfully applying a DBN model to multispectral images. However, labeled training samples are another challenge in medium spatial resolution imagery [20]. Acquiring a large number of labeled training samples in urban and suburban areas is impossible in medium spatial resolution images because a pixel's corresponding ground area is larger than most independent objects on the ground. Therefore, mixed pixels, which contain more than one land cover type, are inevitable. It is difficult to use mixed pixels as training samples in DBN because reference fraction data are limited. Thus, it is more difficult to apply DBN to medium spatial resolution multispectral imagery. To address the training sample limitation, we collected training samples both from high spatial resolution hyperspectral imagery and from the Landsat imagery itself. The results demonstrated that these training samples can be utilized for the DBN and other machine learning methods. This may provide an alternative way of collecting training samples when deep learning is applied to medium spatial resolution multispectral imagery.

Application of DBN in Subpixel Unmixing
Similar to other machine learning based unmixing methods, DBN highlights the probabilities of spectral signatures, while MESMA assumes equal probabilities for all endmembers [43]. After learning the characteristics of each land cover class, DBN estimates the probabilities of the corresponding land cover types. Although other researchers have discussed using endmember probabilities to replace endmember fractions in subpixel unmixing [40,44,45], few studies have applied deep learning techniques, especially the deep belief network, to estimate land cover probabilities. The results of this study illustrate that the probabilities estimated by the DBN model are close to the land cover fractions in subpixel unmixing. Thus, it is appropriate to use probabilities calculated by DBN to estimate land cover fractions in a mixed pixel.
Parameter setting is an important step in achieving successful unmixing results. The experiments in this study have provided information on selecting a suitable sample size, number of RBM layers and batch size by iteratively comparing MAEs obtained with different parameter values. Accuracy assessments imply that the experiments with larger sample sizes have higher accuracies, which matches the common assumption about deep learning techniques [30]. However, we also recognized that at least 3000 samples are necessary for DBN in subpixel unmixing with Landsat imagery. DBN with a sample size larger than 3000 has stable performance. In contrast, MAE fluctuated considerably when the sample size was less than 1000. This may first be due to the selection of training sample subsets. In this study, we reordered the training samples according to the sum of the band reflectance. After this, the samples were selected at a fixed interval. A smaller sample size results in a larger interval. Thus, the within-class variability in the training samples will be larger with a small sample size. Second, when the sample size is small, fewer characteristics can be provided. Therefore, DBN cannot learn the comprehensive characteristics of each land cover class, leading to fluctuations in model performance. When the sample size is larger than 3000, the major characteristics of all four classes of training samples are represented in the selected subsets, so the subsets change less from one sample size to the next. Therefore, the model performance becomes more stable.
In terms of the number of RBM layers, there seems to be little room for adjustment, since the MAE of the two-layer model differs greatly from the MAEs of models with more than two layers. This may be due to the limited input wavebands in multispectral imagery. Compared to the experiment in reference [30], which applied 30 or 50 layers for RBM, the number of RBM layers used in this study is very small. However, reference [30] implemented the DBN on hyperspectral data, which have hundreds of channels. Although the number of RBM layers in DBN, also known as the depth, is important for classification accuracy, the setting of depth may be affected by the number of channels in the data set. Thus, it makes sense for the two-RBM-layer model to have higher accuracy in this study.
The number of epochs (150) in this study is much smaller than the 500-1000 epochs used in reference [30]. However, we cannot determine a consistent pattern for choosing the best-fit number of epochs since the MAEs are quite similar in all tests. The results from [46] were similar, as there was no significant trend in accuracy when the number of epochs changed from 50 to 200. Our results also match the conclusions of [47], which showed that accuracy becomes stable once the number of epochs is larger than 15.
The setting of the learning rate affects the rate of DBN learning. There is no significant correlation between learning rates and MAEs. A lower rate does not appear to mean higher accuracy. Reference [46] demonstrated similar results, as the accuracy fluctuated slightly when the learning rate changed from 0.2 to 0.7. Theoretically, a lower learning rate leads to more reliable results, with a corresponding increase in time consumption. However, this pattern did not occur in this study or in reference [46]. Other studies [23,47,48] all used the same learning rate of 0.1 for their applications.
We tested batch sizes of 50-1450. The results demonstrated that a smaller batch size results in better model performance. Generally, many studies only apply batch sizes in the range of 50-150 [24,47,48]. Their results illustrated that a batch size selected within this range could produce promising results.

Comparisons with Other Models
In the comparisons between SVM [40,44,49], RF [50] and MESMA [36], the performance of DBN [21] is close to that of SVM and RF. However, DBN requires significantly more training samples than the other two techniques. That may be the reason for the lack of discussion of DBN in subpixel unmixing. Besides, DBN, similar to spectral mixture analysis [1], still cannot address the between-class variability, since soil fractions are overestimated in dense built-up areas. However, its accuracy is higher than that of MESMA and its computational efficiency is better than that of MESMA in this study. Thus, DBN can still be viewed as an alternative method for subpixel unmixing in urban/suburban areas.
Soil fractions in DBN, SVM and MESMA are overestimated (Figure 15, column S). This may be due to the process of collecting soil samples. Most soil samples were collected from bare soil and sandy areas, such as beaches. These soil spectra are similar to those from high albedo impervious surface areas [51,52]. It is easy to mistake them for high albedo impervious surface areas in classification [53]. The results of this study provide references about different models' capabilities of distinguishing soil and high albedo impervious surface areas. SVM [44] is the best model to identify soil and impervious surfaces even though the training samples are limited.
In addition, we also tested the deep autoencoder network (DAEN) [54,55] and pixel-based convolutional neural networks (CNN) [10,16,56]. However, their performances are inferior to that of DBN (Figures 16 and 17). We tested the DAEN model with different sample sizes, learning rates, steps, batch sizes, regularizers and moving average decays. The results are similar to Figure 16, which shows that the ISA probabilities are almost the same across the different testing experiments. Similarly, we tested pixel-based CNN [10,16,56] with different parameters, such as testing sample sizes of 100-500, batch sizes of 50-500 and learning rates of 0.001-0.2. However, the probabilistic maps are all similar to Figure 17, which illustrates inaccurate probability distributions of vegetation. Figures 16 and 17 imply that DAEN and CNN may perform well in hyperspectral imagery [10,16,54-56] but not in medium spatial resolution multispectral imagery. The number of channels is the major difference between multispectral and hyperspectral imagery. The experiments with DAEN and CNN demonstrate that the number of channels is a key parameter determining their successful application. Therefore, we did not add these two models for comparison in this study.

Conclusions
This study examined the DBN model in subpixel unmixing with Landsat imagery. The results illustrated that the DBN model can perform well in subpixel unmixing. Different tests were performed to determine the best-fit parameters of the DBN model. Several conclusions can be drawn. (1) DBN can provide results comparable to those of RF, SVM and MESMA in medium spatial resolution multispectral imagery. (2) A larger training sample size, especially larger than 3000, yields better and more stable results in DBN, while two RBM layers and a batch size of 50 are better for Landsat imagery since they have the lowest MAE. (3) The setting of the learning rate and the number of epochs should be based on specific applications since there is no consistent pattern in our experiments. The major contribution of this study is the examination of the applicability of DBN in subpixel unmixing with Landsat imagery. The results of this research may serve as references for scholars to explore the use of DBN in subpixel unmixing with medium spatial resolution multispectral imagery.