Classifying Complex Mountainous Forests with L-Band SAR and Landsat Data Integration: A Comparison among Different Machine Learning Methods in the Hyrcanian Forest

OPEN ACCESS. Remote Sens. 2014, 6, 3625

Forest environment classification in mountain regions based on single-sensor remote sensing approaches is hindered by forest complexity and topographic effects. Temperate broadleaf forests in western Asia, such as the Hyrcanian forest in northern Iran, have already suffered from intense anthropogenic activities. In those regions, forests mainly extend over rough terrain and comprise different stand structures, which are difficult to discriminate. This paper explores the joint analysis of Landsat-7/ETM+ data, L-band SAR data and their derived parameters, as well as the effect of terrain correction, to overcome the challenges of discriminating forest stand age classes in mountain regions. We also verified the performance of three machine learning methods that have recently shown promising results with multi-source data, namely support vector machines (SVM), neural networks (NN) and random forest (RF), against one traditional classifier, maximum likelihood classification (MLC), as a benchmark. The non-topographically corrected ETM+ data failed to differentiate among the forest stand age classes (average overall accuracy (OA) = 65%). This confirms the need to reduce relief effects prior to data classification in mountain regions. SAR backscattering alone cannot properly differentiate among the forest stand age classes (OA = 62%). However, textures and PolSAR features are very efficient for the separation of forest classes (OA = 82%). The highest classification accuracy was achieved by the joint use of SAR and ETM+ (OA = 86%), although this is only a slight improvement over the ETM+ classification (OA = 84%). The machine learning classifiers proved to be more robust and accurate than MLC. SVM and RF produced statistically better classification results than NN in the exploitation of the considered multi-source data.


Introduction
The combination of multi-sensor data (e.g., optical and SAR) has become an active research topic for improving the discrimination of different land cover classes [1][2][3][4][5][6]. Optical sensors have been imaging the Earth continuously since the early 1970s and provide a unique source for observing land cover changes [7]. However, optical sensor data cannot capture forest stand structure information because optical signals do not penetrate the forest canopy. Therefore, vegetation classification based on optical data alone may yield misclassification among vegetation types [3]. In addition, optical measurements are strongly dependent on atmospheric conditions (e.g., haze and clouds) [8]. In contrast, SAR can penetrate the forest canopy, depending on frequency and polarization mode, and may capture more structural information than optical data [3,9]. Another advantage is that SAR measurements are largely independent of weather conditions. Thus, multi-source approaches using both optical and SAR data are suggested, since together they contain physical/chemical as well as geometrical information about the forest [7,10].
Polarimetric L-band SAR data, such as those from the Advanced Land Observing Satellite (ALOS) Phased Array L-band SAR (PALSAR), launched in early 2006 [11], have been used successfully for forest classification in combination with optical data [4,6,7,10,12,13]. L-band has high sensitivity to forest structure due to its strong interaction with tree trunks (boles) [9][10][11]. However, the canopy structure may affect the backscattering and attenuate the radar signal, so that different forest stands may have similar backscattering values [14]. Therefore, SAR data alone cannot effectively capture the differences in stand structure in heterogeneous forests [3,7].
The ability to discriminate among forest classes using multi-source remote sensing data has already been investigated worldwide [3,7,15-18]. Joint classification approaches involving both Landsat and ALOS/PALSAR data have also been suggested for complex environments [7,12,13]. These investigations showed that the multi-source approach improves forest discrimination significantly compared with using each data source independently. Authors also reported that the discrimination of forest classes is often challenging due to the lack of abrupt boundaries among classes [3,7,19]. Almost all of these studies were carried out in flat areas, probably because complex terrain strongly affects forest classification accuracy, especially when multi-source data are used [20][21][22]. Topographic effects cause similar surface features to have different reflectance values depending on slope and aspect, which can lead to misclassification [20][21][22].
The main objective of this research is to evaluate the potential of integrating dual-polarization ALOS/PALSAR data and Landsat-7/ETM+ data for the discrimination of different land cover classes in mountain areas. The study area is the Loveh forest, a part of the Hyrcanian forest in northern Iran. This mountainous forest has been subjected to different logging procedures, as a result of which three forest stand age classes can be found [23,24]. Discriminating among these forest classes is of great importance for identifying management activities and facilitating restoration plans in this particular forest. Previous studies conducted in this area were mainly restricted to optical images [24,25]. Therefore, a multi-source approach combining PALSAR and optical data is of great interest. The main contribution of this study is to overcome the challenges of heterogeneous forest classification in mountain regions, such as the spectral and backscattering similarities among different forest stand age classes and the limitations introduced by topography. To achieve this objective, we focused on four research questions: (1) Does the integration of Landsat/ETM+ and dual-polarimetric L-band SAR data significantly improve the overall classification accuracy? (2) What is the impact of terrain correction on the classification accuracy in the mountain area? (3) What is the role of additional SAR-derived parameters in improving the overall classification accuracy? (4) Which classification algorithm yields the best results for Landsat, SAR and their derived features?

Study Area
The study area is a subset of the Hyrcanian forest, locally known as the Loveh forest. The Hyrcanian forest stretches over the northern slopes of the Alborz Mountains and along the southern coast of the Caspian Sea. The natural vegetation is temperate deciduous broadleaved forest [26,27].
The Loveh forest was managed with the shelterwood method from 1963; this treatment was replaced with selective logging in 2003. As a result, tree density, species richness and the vertical structure of the forest have been modified. These logging activities have produced three different stand age classes [23,24] (Table 1). Preparatory and establishment cuts provided more light for new seedlings to grow in managed stands; however, some light-demanding species, such as Tilia begonifolia, Acer cappadocicum, Diospyros lotus and Parrotia persica, became more established than the dominant species [28]. Therefore, tree density increased in the managed forest compared with the natural forest. The maximum tree density is found in MF2, where the long treatment period allowed more seedlings to become established. Because some mature trees remain in the MF1 class, its tree diameter at breast height (DBH) and basal area values are higher than those of the MF2 class. However, the largest DBH, basal area and above-ground biomass (AGB) values are observed in the natural forest (NF) [23]. Agricultural land and flooded river areas (adjacent floodplain areas left by successive floods in 2001, 2002 and 2003 [29]) also occur and are well represented spatially in the study area.
Table 1. Description of the land cover classes.
Managed forest 1 (MF1): Forest area managed by the shelterwood method. Preparatory, seed and establishment cuts were carried out according to a 25-year plan. The removal cut has not yet been carried out, so some trees with large DBH can still be found.
Managed forest 2 (MF2) (25-45 years): Forest area also managed by the shelterwood method, for 45 years. Preparatory, seed, establishment and removal cuts were carried out. On average, tree density (number per hectare) is higher and tree DBH is smaller than in the other classes.
Agricultural land (AG): Agricultural areas purposely provided with water.
Flooded river (FA): Stream channel plus any adjacent floodplain areas.

Field and Remote Sensing Data
A field survey was carried out in the summer of 2004 in 99 plots (60 × 60 m) selected by systematic sampling. The sample plots were equally distributed among the three forest stand age classes in order to be representative of the forest over the study area [24]. The geographic center of each sample plot was recorded with a handheld GPS. Within each plot, DBH was measured and the number of trees and tree species were recorded (trees with DBH ≥ 7.5 cm were included). The field measurements were used only for the description of the forest (Table 2).
Notes to Table 2: AGB was calculated from DBH data as $\mathrm{AGB} = a\,\mathrm{DBH}^{b}$, with $a = 0.0566$ and $b = 2.663$ [30], where AGB is the total above-ground dry biomass of a tree (kg/tree) and DBH is in cm. We calculated the AGB of each tree in the plot, summed the values and converted the plot total to Mg/ha.
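The plot-level AGB computation described in the notes can be sketched as follows. This is a minimal sketch, assuming the coefficients a = 0.0566 and b = 2.663 given in the text and a plot area of 0.36 ha (60 × 60 m); the function and constant names are illustrative only.

```python
# Allometric AGB (kg/tree) and plot-level AGB density (Mg/ha).
# Coefficients a, b are those reported in the text; names are hypothetical.

A_COEF = 0.0566
B_COEF = 2.663
PLOT_AREA_HA = (60.0 * 60.0) / 10_000.0  # 60 m x 60 m plot = 0.36 ha
MIN_DBH_CM = 7.5                          # only trees with DBH >= 7.5 cm were recorded


def tree_agb_kg(dbh_cm: float) -> float:
    """Above-ground dry biomass of one tree in kg: AGB = a * DBH**b."""
    return A_COEF * dbh_cm ** B_COEF


def plot_agb_mg_per_ha(dbh_list_cm: list[float]) -> float:
    """Sum tree AGB over one plot, then convert kg/plot to Mg/ha."""
    total_kg = sum(tree_agb_kg(d) for d in dbh_list_cm if d >= MIN_DBH_CM)
    return total_kg / 1000.0 / PLOT_AREA_HA  # kg -> Mg, then divide by plot area
```

For example, `plot_agb_mg_per_ha([12.0, 35.5, 60.2])` returns the AGB density of a hypothetical three-tree plot.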
A Landsat-7/ETM+ scene acquired on 10 September 2007 was used as the reference optical scene in this investigation. Six reflective bands covering the visible, near-infrared and short-wave infrared wavelengths, with 30 m spatial resolution, were used. The thermal and panchromatic bands were not included.
ALOS/PALSAR data were acquired on 27 September 2007 in fine beam dual-polarization (FBD) mode, with HH and HV polarizations. The scene was delivered in slant range single look complex (SLC) format (level 1.1). Because the choice of acquisition date was driven by SAR data availability, there is an unavoidable three-year time shift between the field data (2004) and the remote sensing data (2007). Given our knowledge of forest growth in this area, this delay is not expected to significantly affect the classification results. We also used SRTM data (90 m) from the US Geological Survey (USGS) and resampled this DEM to 30 m resolution using cubic convolution interpolation for the further processing described below.

Landsat-7/ETM+
The Landsat-7/ETM+ scene was corrected for the gaps caused by the scan line corrector (SLC) failure using one subsequent scene (acquired on 12 October 2007). The scene was then converted from digital numbers (DN) to at-sensor radiance using the gain and bias of the sensor. Next, at-sensor radiance was converted to surface reflectance using the atmospheric/topographic correction model for satellite imagery in rugged terrain (ATCOR-3) [31] together with the SRTM DEM. The atmospheric definition area was set to rural, with a mid-latitude summer atmosphere. Visibility and adjacency range were set to 20 km and 1 km, respectively. To verify the impact of shadow and relief on surface reflectance, the ETM+ scene was also atmospherically corrected with identical parameter values using ATCOR-2 [31], which does not correct for topography. We then calculated different vegetation indices [32][33][34][35][36][37][38] (Table 3), principal component analysis (PCA) [39], tasseled cap transformation (TCT) [40,41] and gray level co-occurrence matrix (GLCM) [42] textures from the topographically compensated surface reflectance (Table 3). Vegetation indices, PCA, TCT and GLCM textures of optical data are widely used for retrieving forest structure as well as for land cover and forest stand age classification [24,43,44,45]. Window size affects the role of GLCM textures in land cover classification [46,47]. Small windows often exaggerate differences and increase the noise content of the texture image, while large windows cannot effectively extract texture information because they smooth texture variation [46][47][48][49]. Based on visual interpretation and the separability among land cover classes, we chose a window size of 11 × 11 pixels with horizontal and vertical offsets of one.
Table 3 note: in the GLCM texture formulas, $P_{i,j}$ is the value in cell $(i, j)$ (row $i$ and column $j$) of the moving window and $N$ is the number of rows or columns.
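The moving-window GLCM texture step can be illustrated with a small NumPy sketch. It is not the software used in the study; several choices are assumptions: 32 grey levels (the quantization actually used is not stated), a symmetric matrix, and a single (1, 1) row/column shift as one reading of "horizontal and vertical offset of one" (averaging separate (0, 1) and (1, 0) matrices is another reading). The four measures are common Haralick-type measures, not necessarily the exact Table 3 set, and all function names are hypothetical.

```python
import numpy as np


def quantize(band: np.ndarray, levels: int = 32) -> np.ndarray:
    """Linearly rescale a band to integer grey levels 0..levels-1."""
    lo, hi = float(band.min()), float(band.max())
    if hi == lo:
        return np.zeros(band.shape, dtype=int)
    q = ((band - lo) / (hi - lo) * levels).astype(int)
    return np.minimum(q, levels - 1)


def glcm(window: np.ndarray, levels: int, offset: tuple = (1, 1)) -> np.ndarray:
    """Symmetric, normalised co-occurrence matrix P[i, j] for a non-negative (row, col) shift."""
    dr, dc = offset
    ref = window[: window.shape[0] - dr, : window.shape[1] - dc]  # reference pixels
    nbr = window[dr:, dc:]                                          # their shifted neighbours
    counts = np.zeros((levels, levels))
    np.add.at(counts, (ref.ravel(), nbr.ravel()), 1.0)
    counts = counts + counts.T  # count each pixel pair in both directions
    return counts / counts.sum()


def glcm_features(P: np.ndarray) -> dict:
    """A few common texture measures from a normalised GLCM (entropy uses the natural log)."""
    i, j = np.indices(P.shape)
    nz = P[P > 0]
    return {
        "contrast": float(np.sum(P * (i - j) ** 2)),
        "homogeneity": float(np.sum(P / (1.0 + (i - j) ** 2))),
        "entropy": float(-np.sum(nz * np.log(nz))),
        "second_moment": float(np.sum(P ** 2)),
    }


def texture_image(band, feature, window=11, levels=32, offset=(1, 1)):
    """Slide a window x window kernel over a band; border pixels are left as NaN."""
    q = quantize(band, levels)
    half = window // 2
    out = np.full(band.shape, np.nan)
    for r in range(half, band.shape[0] - half):
        for c in range(half, band.shape[1] - half):
            win = q[r - half : r + half + 1, c - half : c + half + 1]
            out[r, c] = glcm_features(glcm(win, levels, offset))[feature]
    return out
```

The explicit double loop keeps the sketch readable; for full scenes a vectorised or compiled implementation would be far faster.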

ALOS/PALSAR
To enhance radiometric resolution and to obtain square pixels in ground range geometry at a spatial resolution similar to that of Landsat (30 m), the amplitude images of the dual-polarization scene were multi-looked with eight looks (four looks in azimuth and two looks in range) [50,51]. We then applied a refined Lee filter with a 7 × 7 window to minimize speckle noise [52]. The performance of the filter and the selection of the optimal window size were evaluated with the speckle suppression and mean preservation index (SMPI) [53].
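The 4 × 2 multi-looking and the SMPI check can be sketched in NumPy as below. This is a hedged illustration, not the processing chain used in the study: it assumes axis 0 is azimuth and axis 1 is range, averages intensity (|z|²) rather than amplitude (the usual practice), drops incomplete edge blocks, and uses SMPI as we understand its definition in [53], SMPI = Q · σ_filtered / σ_original with Q = 1 + |μ_original − μ_filtered|, where lower values are better. The refined Lee filter itself is not implemented here.

```python
import numpy as np


def multilook(slc: np.ndarray, looks_az: int = 4, looks_rg: int = 2) -> np.ndarray:
    """Average intensity |z|^2 over non-overlapping looks_az x looks_rg blocks.

    Assumes axis 0 is azimuth and axis 1 is range; incomplete edge blocks are dropped.
    """
    intensity = np.abs(slc) ** 2
    n_az = (intensity.shape[0] // looks_az) * looks_az
    n_rg = (intensity.shape[1] // looks_rg) * looks_rg
    blocks = intensity[:n_az, :n_rg].reshape(
        n_az // looks_az, looks_az, n_rg // looks_rg, looks_rg
    )
    return blocks.mean(axis=(1, 3))  # one value per 8-look block


def smpi(original: np.ndarray, filtered: np.ndarray) -> float:
    """Speckle suppression and mean preservation index (lower is better).

    SMPI = Q * std(filtered) / std(original), with Q = 1 + |mean(original) - mean(filtered)|.
    """
    q = 1.0 + abs(float(original.mean()) - float(filtered.mean()))
    return q * float(filtered.std()) / float(original.std())
```

In practice, SMPI would be computed for the filter output at each candidate window size, keeping the window with the lowest value.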
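The images were then calibrated to the radar backscattering coefficient σ° (in dB) according to Equation (1):
$$\sigma^{\circ} = 10\log_{10}\left(I^{2} + Q^{2}\right) + CF \qquad (1)$$
where CF (calibration factor) = −83 dB, and I and Q are the real and imaginary parts of the complex SAR image pixel value.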
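A minimal sketch of the calibration step, assuming the form σ° = 10·log10(I² + Q²) + CF with CF = −83 dB as stated in the text. The function works on intensity so that it can be applied either to single-look I² + Q² or to multi-looked mean intensity; check the product documentation for any additional constant offset, since none is included here. Function names are illustrative.

```python
import numpy as np

CF_DB = -83.0  # calibration factor (dB) quoted in the text


def intensity_from_iq(i_part, q_part):
    """Single-look intensity I^2 + Q^2 from the real and imaginary SLC parts."""
    i_arr = np.asarray(i_part, dtype=float)
    q_arr = np.asarray(q_part, dtype=float)
    return i_arr ** 2 + q_arr ** 2


def sigma0_db(intensity, cf_db: float = CF_DB):
    """sigma0 [dB] = 10 * log10(intensity) + CF; intensity may be multi-looked."""
    return 10.0 * np.log10(np.asarray(intensity, dtype=float)) + cf_db
```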
Since the study area is mountainous and a strong relief effect is observed, we performed radiometric terrain correction to compensate for the influence of ground topography on the radar backscattering coefficient. The corrected backscatter in gamma-nought (γ°) format can be obtained from the sigma-nought (σ°) value according to Equation (2) [56,57]:
$$\gamma^{\circ} = \sigma^{\circ}\,\frac{A_{flat}}{A_{slope}}\left(\frac{\cos\theta_{ref}}{\cos\theta_{loc}}\right)^{n} \qquad (2)$$
where γ° is the topographically normalized radar backscattering coefficient, σ° is the radar backscattering coefficient, A_flat is the PALSAR pixel area for a theoretical flat terrain, A_slope is the true local PALSAR pixel area in mountainous terrain, θ_loc is the local incidence angle and θ_ref is the radar incidence angle at the image center.
The exponent n accounts for the optical depth of the canopy and ranges between 0 and 1. It is a site-specific factor that is difficult to obtain in practice; it is therefore set to 1 [58][59][60][61].
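The terrain normalization can be sketched as below, assuming the common area-and-angle form γ° = σ° · (A_flat / A_slope) · (cos θ_ref / cos θ_loc)^n applied in linear power units (dB values are converted first) with n = 1 by default. Computing A_slope and θ_loc from the DEM and orbit geometry is outside the scope of this sketch, and shadow pixels (θ_loc ≥ 90°) are not handled. Function names are hypothetical.

```python
import numpy as np


def db_to_linear(x_db):
    return 10.0 ** (np.asarray(x_db, dtype=float) / 10.0)


def linear_to_db(x):
    return 10.0 * np.log10(np.asarray(x, dtype=float))


def gamma0_db(sigma0_db, a_flat, a_slope, theta_loc_deg, theta_ref_deg, n=1.0):
    """gamma0 [dB] = sigma0 * (A_flat / A_slope) * (cos(theta_ref) / cos(theta_loc))**n.

    The normalization is applied in linear units; angles are in degrees.
    """
    sigma0 = db_to_linear(sigma0_db)
    area_ratio = np.asarray(a_flat, dtype=float) / np.asarray(a_slope, dtype=float)
    cos_ratio = np.cos(np.radians(theta_ref_deg)) / np.cos(np.radians(theta_loc_deg))
    return linear_to_db(sigma0 * area_ratio * cos_ratio ** n)
```

On flat terrain (A_slope = A_flat and θ_loc = θ_ref) the function returns σ° unchanged, which is a quick sanity check for an implementation.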
We calculated the alpha angle (α), entropy (H) and anisotropy (A) (Table 4) according to the alpha-entropy decomposition proposed by Cloude and Pottier [62]. Cloude and Pottier proposed a method for extracting the mean scattering mechanism based on an eigenvalue/eigenvector decomposition of the coherency matrix, in order to characterize the scattering interactions of the radar beam with the targets [62]. High alpha values indicate volume or multiple scattering mechanisms, whereas low values are associated with surface scattering [62]. Entropy indicates the randomness, or statistical disorder, of the scattering [62]. We also used GLCM textures (Table 3) to extract textural features from both the HH and HV polarization channels. GLCM textures have been shown to be useful for discriminating different forest regeneration stages [7,63]. As mentioned in Section "Landsat-7/ETM+", selecting an appropriate window size for texture analysis is important. We chose a window size of 11 × 11 with horizontal and vertical offsets of one, based on visual interpretation and the separability among land cover classes on the texture layers.
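For dual-polarization (HH/HV) data, the eigen-decomposition operates on a 2 × 2 matrix. The sketch below is one possible dual-pol variant, not necessarily the implementation used in the study: it assumes the sample covariance of k = [S_HH, S_HV] over one averaging window, entropy with log base 2 (so H lies in [0, 1] for two eigenvalues), anisotropy defined as (λ1 − λ2)/(λ1 + λ2), and α_i taken from the first component of each eigenvector. The function name is hypothetical.

```python
import numpy as np


def dual_pol_h_a_alpha(hh: np.ndarray, hv: np.ndarray):
    """Entropy H, anisotropy A and mean alpha (degrees) for one averaging window.

    hh, hv: complex SLC samples of the same pixels (any shape).
    """
    k = np.vstack([hh.ravel(), hv.ravel()])             # 2 x N scattering vectors
    cov = (k @ k.conj().T) / k.shape[1]                  # sample 2x2 covariance <k k^H>
    eigvals, eigvecs = np.linalg.eigh(cov)               # ascending eigenvalues, Hermitian solver
    eigvals = np.clip(eigvals[::-1].real, 0.0, None)     # descending order, guard tiny negatives
    eigvecs = eigvecs[:, ::-1]                           # keep columns paired with eigenvalues
    p = eigvals / eigvals.sum()                          # pseudo-probabilities
    nz = p[p > 0]
    entropy = float(-np.sum(nz * np.log2(nz)))           # log base 2 -> H in [0, 1]
    anisotropy = float((eigvals[0] - eigvals[1]) / (eigvals[0] + eigvals[1]))
    alpha_i = np.degrees(np.arccos(np.clip(np.abs(eigvecs[0, :]), 0.0, 1.0)))
    alpha = float(np.sum(p * alpha_i))                   # probability-weighted mean alpha
    return entropy, anisotropy, alpha
```

Applied per moving window, this yields H, A and α layers; a pure-HH target gives H = 0, while equal-power uncorrelated HH and HV give H = 1 and α = 45°.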

Determination of the Land Cover Classification Scheme
To select homogeneous regions of each land cover class (Table 1), we made use of the in situ measurements and a previous land use/land cover map of the area [64]. We also used 15 historical Landsat scenes covering the period from 1986 to 2007. We performed unsupervised classification and visual interpretation of these scenes to verify the forest boundaries, map the different forest stand age classes over time and cross-check the previous land use/land cover map. The in situ measurements were useful for delineating the current status of the different forest stand age classes. Approximately 300 pixels of each land cover class were selected to represent the land cover types over the study area. We used 70% of each class (ca. 200 pixels) for training and 30% (ca. 100 pixels) for validation. Separability analyses were performed based on the transformed divergence (TD) index, a measure of separability between a pair of classes [65,66]. TD values range from 0 to 2 and indicate how well the selected pair of classes is statistically separable; higher values indicate better separation [65,67]. We divided the datasets into three major groups: (A) surface reflectance bands from the Landsat-7/ETM+ scene and pertinent derived features; (B) individual ALOS/PALSAR intensity backscattering and derived features; and (C) the combination of both Landsat-7/ETM+ and ALOS/PALSAR data. Group B is divided into five subgroups (B1-B5), whose details are given in Table 5.
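The transformed divergence index can be computed from the class means and covariance matrices of the training pixels. The sketch below uses the standard divergence D = ½ tr[(C_i − C_j)(C_j⁻¹ − C_i⁻¹)] + ½ tr[(C_i⁻¹ + C_j⁻¹)(μ_i − μ_j)(μ_i − μ_j)ᵀ] and TD = 2(1 − exp(−D/8)), scaled to the 0 to 2 range used in the text (some software reports 0 to 2000). The function name is illustrative, and the covariance matrices are assumed to be invertible.

```python
import numpy as np


def transformed_divergence(x_i: np.ndarray, x_j: np.ndarray) -> float:
    """TD between two classes; rows are training pixels, columns are features."""
    m_i, m_j = x_i.mean(axis=0), x_j.mean(axis=0)
    c_i = np.atleast_2d(np.cov(x_i, rowvar=False))
    c_j = np.atleast_2d(np.cov(x_j, rowvar=False))
    ci_inv, cj_inv = np.linalg.inv(c_i), np.linalg.inv(c_j)
    dm = (m_i - m_j).reshape(-1, 1)
    divergence = (
        0.5 * np.trace((c_i - c_j) @ (cj_inv - ci_inv))      # covariance difference term
        + 0.5 * np.trace((ci_inv + cj_inv) @ dm @ dm.T)      # mean difference term
    )
    return float(2.0 * (1.0 - np.exp(-divergence / 8.0)))  # saturates at 2
```

Evaluating this function for every pair of classes and every candidate feature set gives the pairwise separability matrix used to compare datasets.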
To maximize classification accuracy, it is necessary to identify the best combination of textural bands, indices and features. Not all derived features are informative for land cover classification, and some may contain similar information [68]. We initially selected the texture bands with high separability. Then, we checked the correlation among the different textural bands to reduce data redundancy [40,68]. The final selection of pertinent features was based on experimental classification results. We followed the same procedure for selecting among vegetation indices and other features.
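The separability-then-correlation screening can be expressed as a simple greedy filter. This is a sketch under stated assumptions: each candidate feature already has a separability score (e.g., a mean TD value), features are visited from highest to lowest score, and a feature is kept only if its absolute Pearson correlation with every already-kept feature does not exceed a threshold. The 0.9 threshold and the function name are hypothetical; the text does not state a cut-off, and the final selection in the study was based on experimental classification results.

```python
import numpy as np


def prune_correlated(features: np.ndarray, names: list[str],
                     scores: list[float], max_abs_r: float = 0.9) -> list[str]:
    """Greedy redundancy filter over feature columns, best-scoring features first."""
    r = np.corrcoef(features, rowvar=False)   # feature-by-feature correlation matrix
    order = np.argsort(scores)[::-1]          # highest separability score first
    kept: list[int] = []
    for idx in order:
        if all(abs(r[idx, k]) <= max_abs_r for k in kept):
            kept.append(int(idx))
    return [names[k] for k in kept]
```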
Because of the different nature of the data in the classification scheme (Table 5), we evaluated three non-parametric classifiers: support vector machines (SVM), neural networks (NN) and random forest (RF). We also performed maximum likelihood classification (MLC) to compare its performance with that of the non-parametric classifiers. MLC is a parametric classifier that assumes a normal or near-normal distribution for each feature of interest [68]. Despite the limitations due to this assumption of normally distributed class signatures [69], it is perhaps one of the most widely used classifiers [70][71][72]. Non-parametric approaches are suggested for the classification of multi-source data in complex environments [73]. SVM is a supervised non-parametric statistical learning technique [74] based on the principle of structural risk minimization. SVM is particularly appealing in remote sensing because it can handle small training datasets successfully [75][76][77][78]. SVM minimizes classification error on unseen data without prior knowledge of the probability distribution of the data [75][76][77]. It creates a hyperplane in the n-dimensional feature space that separates the classes, based on a user-defined kernel function and parameters such as the penalty parameter. The hyperplane is optimized to maximize the margin between it and the closest training points. The penalty parameter controls how much misclassification of training data (e.g., due to possible data errors) is tolerated when optimizing the hyperplane [79]. Linear, polynomial, radial basis function and sigmoid kernels are the four common kernels available in remote sensing packages. A careful selection of parameter settings can improve the performance of SVM [80]. We applied SVM with a radial basis function kernel and a penalty parameter of 100, which Yang [80] also found to be the best kernel and parameter setting for land cover classification [79]. NN is also a non-parametric classifier, with the capability of forming arbitrary decision boundaries, easy adaptation to different data types and input structures, fuzzy output values and good generalization for use with multiple images [81]. It benefits from parallel computation, the ability to estimate non-linear relationships between the input data and the desired outputs, and fast generalization capability [82,83]. The parameter settings were based on experimental results: the logistic function was selected as the activation function, with one hidden layer and 1000 training iterations. RF is an ensemble machine learning approach that uses multiple self-learning decision trees to parameterize models and uses them to estimate categorical or continuous variables [84,85]. RF can learn complex non-linear relationships, such as those arising from variable vertical forest structure; it is therefore very efficient for classifying complex and heterogeneous landscapes [85]. The number of variables is a user-defined parameter; because the layers in each dataset had already been selected, we used all selected layers. Non-parametric classifiers often produce higher classification accuracy than traditional parametric classifiers [75,76,77,81,82,84,85].
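As a reference point for the parametric benchmark, a Gaussian maximum likelihood classifier can be written in a few lines of NumPy. This is a minimal sketch, assuming equal class priors and one multivariate normal distribution per class; it is not the implementation of any particular remote sensing package, and the class name is hypothetical. The normality assumption built into the discriminant function is exactly the limitation discussed above for SAR features.

```python
import numpy as np


class GaussianMLC:
    """Maximum likelihood classifier: one multivariate normal per class, equal priors."""

    def fit(self, x: np.ndarray, y: np.ndarray) -> "GaussianMLC":
        self.classes_ = np.unique(y)
        self.params_ = []
        for c in self.classes_:
            xc = x[y == c]
            cov = np.atleast_2d(np.cov(xc, rowvar=False))
            # store mean, inverse covariance and log-determinant for the discriminant
            self.params_.append((xc.mean(axis=0), np.linalg.inv(cov), np.linalg.slogdet(cov)[1]))
        return self

    def decision_function(self, x: np.ndarray) -> np.ndarray:
        """Gaussian log-likelihood up to a constant: -0.5 ln|C| - 0.5 Mahalanobis^2."""
        scores = []
        for mean, cov_inv, logdet in self.params_:
            d = x - mean
            mahal = np.einsum("ij,jk,ik->i", d, cov_inv, d)
            scores.append(-0.5 * logdet - 0.5 * mahal)
        return np.column_stack(scores)

    def predict(self, x: np.ndarray) -> np.ndarray:
        return self.classes_[np.argmax(self.decision_function(x), axis=1)]
```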
We then calculated the producer's accuracy (PA), user's accuracy (UA) and overall accuracy (OA) from the classification results. Producer's accuracy measures omission error for a given class; it is the probability that a reference site is correctly classified. User's accuracy measures commission error; it is the probability that a pixel classified as a given class on the image actually represents that class on the ground [86]. Overall accuracy is the percentage of correctly classified pixels in the validation dataset [86]. We used a Z-test to evaluate the statistical significance of differences in classification accuracy [87]. Figure 2 provides an overview of the entire approach adopted in this investigation.
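The three accuracy measures follow directly from the confusion matrix of the validation pixels. The sketch below uses plain Python, with rows as reference classes and columns as predicted classes (a convention we assume here); PA is the diagonal divided by the row total, UA the diagonal divided by the column total, and OA the trace divided by the number of validation pixels. The function name is illustrative.

```python
def accuracy_report(reference, predicted, classes):
    """Confusion matrix (rows = reference, columns = predicted) with PA, UA and OA."""
    index = {c: k for k, c in enumerate(classes)}
    n = len(classes)
    cm = [[0] * n for _ in range(n)]
    for ref, pred in zip(reference, predicted):
        cm[index[ref]][index[pred]] += 1
    diag = [cm[k][k] for k in range(n)]
    row_tot = [sum(cm[k]) for k in range(n)]                       # reference pixels per class
    col_tot = [sum(cm[r][k] for r in range(n)) for k in range(n)]  # pixels labelled as each class
    nan = float("nan")
    pa = {c: (diag[k] / row_tot[k] if row_tot[k] else nan) for c, k in index.items()}  # 1 - omission
    ua = {c: (diag[k] / col_tot[k] if col_tot[k] else nan) for c, k in index.items()}  # 1 - commission
    oa = sum(diag) / sum(row_tot)
    return cm, pa, ua, oa
```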

Spectral, Backscattering and Polarimetric Characterization
Landsat surface reflectances of the training dataset without and with topographic correction are shown in Figure 3. The topographic effects tend to decrease the surface reflectance in the green, NIR and short-wave infrared (SWIR) regions, due to the shadowing introduced by relief and slope orientation (Figure 3).
After terrain correction, MF2 has the highest reflectance in the NIR and the lowest reflectance in the red. This forest class, subjected to intensive logging treatments, shows only one well-structured canopy layer and fewer shadow effects among canopies. On the other hand, less logging intervention in MF1 tends to preserve the vertical structure of the forest, which makes the discrimination between NF and MF1 a challenging task [24]. Different forest stand age classes may present similar canopy structures despite differences in age, species composition and biomass; it is therefore difficult to classify them based on surface reflectance [19]. Nevertheless, the spectral behavior and separability agree with a previous investigation in this study area using Landsat ETM+ data [24].
Figure note: the results are based on the training dataset; see Table 1 for the descriptions of classes.
Table 6 presents the average values of HH and HV intensity backscattering, alpha and entropy for each land cover class. The backscattering values in both the HH and HV bands (Table 6) tend to decrease from NF to both managed forest classes, due to a clearer forest floor. The lower tree density per hectare in NF might enhance forest scattering (Table 6). Comparing the co-polarized (HH) and cross-polarized (HV) bands shows that all forest classes have higher backscattering in HH. This occurs because of the higher sensitivity of HH to volume scattering, which is influenced by the random distribution of branches, twigs and leaves [7]. The use of dual-polarization data rather than quad-polarization data (which were not available for the study area) could also affect the results [7]. Forest classes show higher alpha values than the other classes (Table 6). In all datasets, the separability index for the pair NF-MF2 is greater than that for NF-MF1. This could be the effect of the long-term treatment of the managed stand over 45 years (MF2), which has led to a forest structure different from that of the natural forest [23,24]. In most datasets, the separability index of MF1-MF2 is the lowest, which makes separating the two managed forest classes difficult.

Effect of Terrain Correction on Classification Accuracies
To show the effect of topography on classification results, we classified dataset C with non-topographically corrected ETM+ data. Figure 5 highlights the effect of different slope orientations on the classification results: it displays a subset of the classification results with (Figure 5a) and without (Figure 5b) topographic correction, together with the aspect map (Figure 5c). Points 1, 2 and 3 belong to the MF2 class and are located on slopes with different orientations. In Figure 5a, they are classified correctly. In Figure 5b, point 2 is misclassified as MF1 because its spectral values differ due to the illumination difference between opposite-facing slopes (Figure 5c). The same effect is observed for points 4 and 5, both of which belong to the MF2 class: in Figure 5b, point 5 is wrongly labeled as MF1. See Table 1 for the descriptions of classes.
The average overall accuracy of the classification with non-topographically corrected ETM+ data is 65% (averaged over the non-parametric classifiers). Figure 6 illustrates the SVM classification accuracies of the three forest classes for dataset C without and with topographic correction. Our results demonstrate that the high relief of the mountainous area reduces classification accuracy. The same trend has been reported by others [22,88]; we therefore focused on the classification of terrain-corrected datasets.

Performance Comparison of MLC, SVM, RF and NN Classifiers
Table 8 presents the classification results of the four classifiers for each group. The performances of the four classifiers within each group are compared at the 5% significance level. Table 9 presents the user's and producer's accuracies for each class. All classifiers show balanced PA and UA, as the differences between user's and producer's accuracies are not large [89]. The classification results for Landsat and its derived features show that MLC as well as the three machine learning classifiers perform well (average OA = 84%), with no statistically significant difference (at the 5% significance level) among them. In group B, MLC performs worse than the non-parametric classifiers. In this group, except for the datasets in which SAR textures are used (B2 and B3), SVM and RF perform equally well, with no statistically significant difference at the 95% confidence level. Also, in almost all datasets, SVM and RF produce better classification results than NN at the 95% confidence level; B3 is the only dataset for which NN and SVM yield similar results. In group C, MLC provides the poorest accuracy of all classifiers, and SVM and RF perform better at the 5% significance level. Because of the relatively poor performance of MLC compared with the non-parametric classifiers for ALOS/PALSAR classification, it was not considered in further analysis.
Table 8 note: identical superscripts indicate that the differences between the compared cases are not statistically significant (5% significance level).
Table 9. User's and producer's accuracies. See Tables 1 and 5 for the descriptions of land cover classes and groups.

Assessment of the Classification Accuracy of ALOS/PALSAR
Here, we evaluated the contribution of polarimetric features and of textures of the polarized bands to the overall classification accuracy at the 95% confidence level (Table 10). For each classifier, we performed a Z-test between the classification results of B1 and each of the other datasets in group B. The critical absolute Z value at the 5% significance level is 1.96; comparison cases with an absolute Z value greater than 1.96 are statistically different at the 95% confidence level. Table 10 summarizes the results of the Z-test in group B. In subgroup B1, we classified the HH and HV PolSAR bands. We added textures of HV to subgroup B2 and textures of HH to subgroup B3. In subgroup B4, we classified alpha, entropy and anisotropy together with the HH and HV polarimetric bands. In subgroup B5, the PolSAR bands and all their derived features were entered into the classification algorithms. B1 has the lowest overall accuracy (average OA = 59%; Table 8). Considering the UA and PA of each class, the order from highest to lowest is FA, AG, NF, MF2 and MF1 (Table 9). In B2 and B3, the average overall accuracies reach 70% and 73%, respectively (Table 8). Class-level accuracies are higher in subgroup B3 than in B2 (Table 9), which could be explained by the higher sensitivity of HH polarization to volume scattering. In B4, the average overall accuracy increases significantly to 75% (Table 8), and the PA and UA values, especially for the forest classes, increase significantly compared with B1, B2 and B3 (Table 9). In B5, we obtained the best classification results based on SAR data (average OA = 78%; Table 8). Regardless of the classifier, we conclude that HH and HV cannot properly separate the different forest stand age classes, whereas textures of HH and HV as well as polarimetric features significantly (at the 95% confidence level) increase the overall classification accuracy.
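The text does not state which form of Z-test [87] describes. As one common option when all classifiers are evaluated on the same validation pixels, the sketch below uses McNemar's test in its Z form, z = (f12 − f21) / sqrt(f12 + f21), where f12 counts pixels classified correctly by classifier A but not by B, and f21 the reverse. This choice is an assumption (kappa-based Z-tests are another widely used alternative), and the function names are illustrative. The 1.96 critical value matches the 5% significance level used in the text.

```python
import math

Z_CRITICAL_5PCT = 1.96  # two-sided critical value at the 5% significance level


def mcnemar_z(reference, pred_a, pred_b) -> float:
    """McNemar Z comparing two classifiers on the same validation pixels."""
    f12 = sum(1 for r, a, b in zip(reference, pred_a, pred_b) if a == r and b != r)
    f21 = sum(1 for r, a, b in zip(reference, pred_a, pred_b) if a != r and b == r)
    if f12 + f21 == 0:
        return 0.0  # the classifiers never disagree in correctness
    return (f12 - f21) / math.sqrt(f12 + f21)


def significantly_different(z: float, z_crit: float = Z_CRITICAL_5PCT) -> bool:
    return abs(z) > z_crit
```

A positive Z indicates that classifier A is more often correct where the two disagree.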

Classification of ETM+ and ALOS/PALSAR
Table 8 summarizes the classification results of the four classifiers for each dataset. We performed a Z-test at the 95% confidence level between the classification results of group A and each other dataset to investigate the effect of the different datasets on classification accuracy. We compared the results of each classifier separately in order to exclude the effect of classifier performance. In group A, the accuracies of classes AG and FA are higher than those of the forest classes for all classifiers. Among the forest classes, MF2 has the highest PA and UA values, and MF1 in most cases has the lowest accuracies. Classification of the Landsat spectral bands and derived features (group A) produces high overall accuracy with all classification algorithms (average OA = 85%; Table 8). These results indicate that the medium-resolution spectral bands of ETM+ can classify land cover efficiently.
In group B, FA has the highest PA and UA values. The forest classes do not follow the same trend in the different subgroups; in most cases, their order from highest to lowest PA and UA is NF, MF2 and MF1. The overall classification accuracy varies substantially depending on the input data. HH and HV cannot properly separate the different forest stand age classes (average OA = 59%). However, alpha, entropy and anisotropy as well as GLCM textures increase the average overall accuracy to 78%. Although this overall accuracy does not reach that of the ETM+ dataset (OA = 84%) at the 95% confidence level, it shows that in the absence of optical data (e.g., Landsat ETM+), SAR data can be used as an alternative for classification purposes. This result is also in accordance with previous studies [3,7]. In the last dataset, C, Landsat-7/ETM+ and SAR data are classified jointly. Considering the UA and PA of each class, the order from highest to lowest is FA, AG, MF2, NF and MF1 (Table 9). The classification accuracy of dataset C (OA = 86%) is significantly different (at the 95% confidence level) from those of datasets A and B. However, the joint processing of ETM+ and ALOS/PALSAR data does not greatly improve the overall classification accuracy, which remains very close to that of the original ETM+ classification.

Discussion
In this study, Landsat and ALOS/PALSAR data have been classified, separately and jointly, to map complex mountainous forests. The good classification results obtained from the medium-resolution spectral bands of ETM+ show its high applicability for mapping heterogeneous forest. The ETM+ classification result is consistent with the results of a previous investigation conducted in this study area [24]. Forest stand age classification results based on HH and HV backscattering are not satisfactory. Different forest classes have similar HH and HV backscattering, which makes separating them a challenging task. The saturation of backscattering values that occurs in such high-biomass forests (i.e., >100 Mg/ha) may cause this backscattering similarity in the dual-polarized bands [7,90,91]. Separability values among forest classes increase when textures and polarimetric features are added, but acceptable separability is observed only when all PolSAR features are used. The classification result of this dataset is satisfactory; however, it does not exceed the Landsat classification result. It indicates that in the absence of optical data (e.g., Landsat ETM+), SAR data can be used as an alternative for classification purposes. This result is also in accordance with previous studies [3,7]. Integration of Landsat and ALOS/PALSAR data slightly increases the classification accuracy. However, the improvement is not substantial and the result is very close to that of the original Landsat classification. One probable reason for this result is the structural similarity of the different forest classes as seen by dual-polarimetric SAR.
A comparison of the performance of the parametric and non-parametric classifiers showed that all classifiers perform well for Landsat classification. MLC cannot effectively classify SAR data. This is probably because MLC assumes that the features are normally or near-normally distributed, which may not be the case for forest SAR backscattering. In almost all SAR datasets, SVM and RF produce better classification accuracies than NN at the 95% confidence level. The classifiers follow the same pattern for the joint dataset: SVM and RF perform better than NN and MLC at the 5% significance level. Parametric classification algorithms such as MLC are typically not suitable for multi-source data [73,84]. The better performance of SVM and RF compared to NN could be because both of these classifiers can handle high-dimensional data [77,84].
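The normality assumption behind MLC can be made concrete. The following is a minimal sketch of Gaussian maximum likelihood classification with two features and equal class priors. The two-feature restriction keeps the covariance inverse in closed form. The feature pair (HH and HV in dB), the class names and the training samples are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple

Vector = Tuple[float, float]
Covariance = Tuple[float, float, float]  # (sxx, sxy, syy)
Model = Dict[str, Tuple[Vector, Covariance]]


def fit_gaussian(samples: Sequence[Vector]) -> Tuple[Vector, Covariance]:
    """Sample mean and covariance of two-feature training samples."""
    n = len(samples)
    mx = sum(s[0] for s in samples) / n
    my = sum(s[1] for s in samples) / n
    sxx = sum((s[0] - mx) ** 2 for s in samples) / (n - 1)
    syy = sum((s[1] - my) ** 2 for s in samples) / (n - 1)
    sxy = sum((s[0] - mx) * (s[1] - my) for s in samples) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def log_likelihood(x: Vector, mean: Vector, cov: Covariance) -> float:
    """Log density of a bivariate normal distribution at x."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Mahalanobis distance via the closed-form inverse of a 2 x 2 covariance matrix
    mahalanobis = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return -0.5 * (math.log(det) + mahalanobis) - math.log(2.0 * math.pi)


def train(training: Dict[str, Sequence[Vector]]) -> Model:
    return {name: fit_gaussian(samples) for name, samples in training.items()}


def classify(x: Vector, model: Model) -> str:
    """Assign x to the class with the highest likelihood (equal priors assumed)."""
    return max(model, key=lambda name: log_likelihood(x, *model[name]))


# Hypothetical training samples: (HH dB, HV dB) for two classes.
training = {
    "class_a": [(-8.0, -15.0), (-7.5, -14.0), (-8.5, -15.5), (-7.0, -14.5)],
    "class_b": [(-6.0, -11.0), (-5.5, -10.0), (-6.5, -11.5), (-5.0, -10.5)],
}
model = train(training)
print(classify((-7.8, -14.6), model), classify((-5.7, -10.8), model))
```

Each class is summarized only by a mean vector and a covariance matrix. A skewed or multimodal backscatter distribution is therefore approximated by a single ellipse, which is the weakness discussed above. SVM and RF do not rely on such a distributional model.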

Recommendations and Conclusions
The high classification accuracy achieved by including SAR textures and PolSAR decompositions shows that SAR data can be used as an alternative for classification in complex mountainous forest when optical data are unavailable (due to the frequent cloud coverage in this region). The joint processing of dual-polarimetric L-band SAR and ETM+ data slightly improves the classification accuracy. The classification results are very close to those of the original ETM+ classification, not only in overall accuracy but also in the accuracies of individual classes. Our results also confirm that terrain correction is essential prior to data classification in such regions. The outcomes of SVM, NN and RF demonstrated the robustness of non-parametric classifiers (at the 5% significance level) for forest stand age classification in mountainous regions. All classifiers have similar overall accuracy for ETM+ classification. However, SVM and RF are considered more effective for the joint classification of ETM+ and SAR data. MLC was as powerful as the non-parametric classifiers in ETM+ classification; however, it did not perform well for SAR data classification. Nevertheless, the selection of a suitable classifier depends on trade-offs among classification accuracy, processing time and computing resources.
While these results for the joint application of optical and SAR data for classification in mountain areas are promising, several important points should be taken into account in further investigations. Backscattering similarity in dual-polarimetric mode is one of the reasons for the relatively low overall accuracy obtained from SAR backscattering. Therefore, it is recommended to investigate fully polarimetric L-band SAR for forest stand age classification. However, the ALOS/PALSAR mission ended in 2011, so no spaceborne L-band SAR is currently in operation. Several spaceborne L-band SAR missions are planned, such as ALOS/PALSAR-2, TanDEM-L, MAPSAR and DESDynI. These prospective missions have some advantages over ALOS/PALSAR, such as more consistent multi-annual coverage and shorter repeat intervals for improved interferometric applications [10]. Because SAR backscattering is sensitive to soil and vegetation moisture [7], future research should also consider precipitation events prior to data acquisition.

Figure 1. (a) Approximate extent of the Hyrcanian forest (green rectangle); (b) location of the study area (red rectangle) in northern Iran, with the land cover map reclassified from the 500 m MODIS land cover map; (c) true color composite (3R2G1B) of the Landsat-7/ETM+ image of the study area acquired on 10 September 2007.

Figure 2. Flowchart of the proposed classification methodology.

Figure 3. Landsat surface reflectance of forest stand age classes without and with topographic correction.

Figure 4. (a) Alpha-Entropy decomposition (Z6 and Z9 are dominated by surface scattering, Z2, Z5 and Z8 by volume scattering, and Z1, Z4 and Z7 by multiple scattering; Z3 is a non-feasible region). The dashed red polygon shows the extent of Figure 4b; (b) zoomed view of the distribution of the training dataset on the Alpha-Entropy plane. The results are based on the training dataset. See Table 1 for the descriptions of classes.

Figure 5. A subset of the RF classification result (a) with and (b) without topographic correction, and (c) aspect map. Points 1, 2 and 3 belong to the MF2 class (Figure 5a). However, in Figure 5b, Point 2 is misclassified as MF1 because its spectral values are altered by the different illumination of opposite-facing slopes (Figure 5c). The same reason explains the misclassification of Point 5 in Figure 5b. See Table 1 for the descriptions of classes.
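The illumination effect illustrated in Figure 5 can be expressed through the cosine of the local solar incidence angle. This section does not specify which topographic correction method was applied. The following is therefore only a sketch of one widely used option, the C-correction, for a single band. The sun angles, slopes, aspects and reflectance values are hypothetical, and self-shadowed pixels (cos i ≤ 0) are not handled.

```python
import math
from typing import List, Sequence


def cos_incidence(sun_zenith: float, sun_azimuth: float, slope: float, aspect: float) -> float:
    """Cosine of the local solar incidence angle on a tilted surface (angles in degrees)."""
    sz, sa = math.radians(sun_zenith), math.radians(sun_azimuth)
    sl, asp = math.radians(slope), math.radians(aspect)
    return math.cos(sz) * math.cos(sl) + math.sin(sz) * math.sin(sl) * math.cos(sa - asp)


def c_parameter(reflectance: Sequence[float], cos_i: Sequence[float]) -> float:
    """Empirical c = b / m from the least-squares fit reflectance = m * cos_i + b."""
    n = len(reflectance)
    mean_x, mean_y = sum(cos_i) / n, sum(reflectance) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(cos_i, reflectance))
    sxx = sum((x - mean_x) ** 2 for x in cos_i)
    m = sxy / sxx
    b = mean_y - m * mean_x
    return b / m


def c_correct(reflectance: Sequence[float], cos_i: Sequence[float], sun_zenith: float, c: float) -> List[float]:
    """Normalize reflectance to a horizontal surface: r * (cos(sz) + c) / (cos(i) + c)."""
    cos_sz = math.cos(math.radians(sun_zenith))
    return [r * (cos_sz + c) / (ci + c) for r, ci in zip(reflectance, cos_i)]


# Hypothetical pixels of one cover type on slopes facing towards and away from the sun.
sun_zenith, sun_azimuth = 35.0, 150.0
terrain = [(25.0, 150.0), (25.0, 330.0), (10.0, 60.0), (15.0, 240.0)]  # (slope, aspect)
cos_i = [cos_incidence(sun_zenith, sun_azimuth, s, a) for s, a in terrain]
reflectance = [0.1 * ci + 0.02 for ci in cos_i]  # illumination-dependent values of the same cover

c = c_parameter(reflectance, cos_i)
corrected = c_correct(reflectance, cos_i, sun_zenith, c)
print(round(c, 3), [round(v, 4) for v in corrected])
```

After correction, pixels of the same cover on sunlit and shaded slopes receive the same value. Misclassifications such as that of Point 2 arise when this step is skipped. With real imagery, c is estimated per band from image pixels rather than from an exact linear relation.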

Figure 6. Comparison of the classification accuracies obtained with the SVM classifier for different forest classes without and with topographic correction. See Table 1 for the descriptions of classes.

Table 1. Characteristics of main land use classes.

Table 2. Summary of the field plot measurements. Average values per plot are indicated; standard deviation values are given in parentheses.

Table 5. Proposed scenarios for the classification scheme.

Table 6. Average values of HH and HV backscattering intensity, alpha and entropy for different land cover classes. The results are based on the training dataset. See Table 1 for the descriptions of classes.

Cover Classes HH Backscattering (dB) HV Backscattering (dB) Alpha (°) Entropy
Different forest stand age classes overlap each other, showing predominantly surface scattering with moderate alpha values and relatively high entropy values in dual-polarization mode. The range of alpha and entropy values for the different stand age classes in dual-polarimetric mode is not wide enough to separate them. Agricultural land and flooded river show surface scattering with relatively low alpha and entropy values. Figure 4 shows the training datasets plotted on the Alpha-Entropy segmentation plane.
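The alpha and entropy values in Table 6 come from an eigen-decomposition of the dual-polarization (HH/HV) covariance matrix. The following is a minimal sketch for a single averaged 2 × 2 Hermitian matrix. It assumes c11 = <|S_HH|²>, c22 = <|S_HV|²> and c12 = <S_HH S_HV*> in linear units (not dB). Entropy is taken in base 2, and alpha is defined from the first component of each unit eigenvector. This section does not describe the processing software or the averaging window used in the study.

```python
import math
from typing import Tuple


def dual_pol_h_alpha(c11: float, c22: float, c12: complex) -> Tuple[float, float]:
    """Entropy (0-1) and mean alpha angle (degrees) of a 2 x 2 Hermitian covariance matrix.

    Matrix layout: [[c11, c12], [conj(c12), c22]], with c11 and c22 real and non-negative.
    """
    mean = (c11 + c22) / 2.0
    root = math.sqrt(((c11 - c22) / 2.0) ** 2 + abs(c12) ** 2)
    eigenvalues = [mean + root, mean - root]  # largest first

    if abs(c12) < 1e-12:
        # Diagonal matrix: the eigenvectors are the HH and HV axes themselves.
        alphas = [0.0, 90.0] if c11 >= c22 else [90.0, 0.0]
    else:
        alphas = []
        for lam in eigenvalues:
            # (c12, lam - c11) solves the first row of (C - lam I) v = 0
            norm = math.sqrt(abs(c12) ** 2 + (lam - c11) ** 2)
            alphas.append(math.degrees(math.acos(min(1.0, abs(c12) / norm))))

    total = sum(eigenvalues)
    probs = [max(lam, 0.0) / total for lam in eigenvalues]  # pseudo-probabilities
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    mean_alpha = sum(p * a for p, a in zip(probs, alphas))
    return entropy, mean_alpha


# Hypothetical averaged covariance elements (linear power).
print(dual_pol_h_alpha(1.0, 0.0, 0j))          # one dominant scattering mechanism
print(dual_pol_h_alpha(1.0, 1.0, 0j))          # two equal, uncorrelated channels
print(dual_pol_h_alpha(2.0, 1.0, 0.5 + 0.5j))  # partially correlated channels
```

With only two eigenvalues, entropy and alpha are confined to a narrower range than in the quad-pol case. This is consistent with the limited spread of forest classes on the Alpha-Entropy plane.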

Table 7 compares the separability index for different combinations of forest class pairs. Groups A and C have high separability indices, indicating good separation between forest classes. In group B, except in subgroup B5, the separability indices between forest classes are low, showing that different forest stand age classes are difficult to separate with dual-polarimetric SAR. Including SAR textures and derived features as well as PolSAR bands (subgroup B5) significantly enhances the separability index (TD ranges from 1.45 to 1.83).
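The TD index measures how far apart two class distributions are on a 0 to 2 scale, with 1.8 used here as the threshold for good separability. The following is a minimal sketch of the standard transformed divergence between two classes modelled as multivariate normal distributions: D = ½ tr[(Ci − Cj)(Cj⁻¹ − Ci⁻¹)] + ½ tr[(Ci⁻¹ + Cj⁻¹)(μi − μj)(μi − μj)ᵀ] and TD = 2[1 − exp(−D/8)]. For brevity, the sketch is restricted to two features so that the 2 × 2 inverse can be written in closed form. The means and covariances below are illustrative, not the study's training statistics.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def inverse_2x2(m: Sequence[Sequence[float]]) -> Matrix:
    a, b = m[0]
    c, d = m[1]
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul_2x2(x: Sequence[Sequence[float]], y: Sequence[Sequence[float]]) -> Matrix:
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def transformed_divergence(mean_i: Sequence[float], cov_i: Sequence[Sequence[float]],
                           mean_j: Sequence[float], cov_j: Sequence[Sequence[float]]) -> float:
    """Transformed divergence (0-2) between two Gaussian classes with two features."""
    inv_i, inv_j = inverse_2x2(cov_i), inverse_2x2(cov_j)
    cov_diff = [[cov_i[r][c] - cov_j[r][c] for c in range(2)] for r in range(2)]
    inv_diff = [[inv_j[r][c] - inv_i[r][c] for c in range(2)] for r in range(2)]
    inv_sum = [[inv_i[r][c] + inv_j[r][c] for c in range(2)] for r in range(2)]
    # 0.5 * tr[(Ci - Cj)(Cj^-1 - Ci^-1)]: difference in covariance structure
    prod = matmul_2x2(cov_diff, inv_diff)
    term_cov = 0.5 * (prod[0][0] + prod[1][1])
    # 0.5 * d^T (Ci^-1 + Cj^-1) d, which equals 0.5 * tr[(Ci^-1 + Cj^-1) d d^T]
    d = [mean_i[0] - mean_j[0], mean_i[1] - mean_j[1]]
    term_mean = 0.5 * sum(d[r] * inv_sum[r][c] * d[c] for r in range(2) for c in range(2))
    divergence = term_cov + term_mean
    return 2.0 * (1.0 - math.exp(-divergence / 8.0))


# Illustrative classes sharing an identity covariance.
identity = [[1.0, 0.0], [0.0, 1.0]]
td_same = transformed_divergence([0.0, 0.0], identity, [0.0, 0.0], identity)
td_close = transformed_divergence([0.0, 0.0], identity, [2.0, 0.0], identity)
td_far = transformed_divergence([0.0, 0.0], identity, [8.0, 0.0], identity)
print(round(td_same, 3), round(td_close, 3), round(td_far, 3))
```

Because of the exponential, TD saturates near 2 for well-separated pairs. It therefore discriminates mainly among poorly and moderately separable class pairs such as the forest classes in group B.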

Table 7. Transformed divergence (TD) index for different forest class pairs; TD values at or above the threshold (TD ≥ 1.8) are in bold.

Table 8. Overall classification accuracy of the datasets. See Table 5 for the descriptions of groups.

Table 10. Z-test* results for the comparison of overall accuracies of B1 versus B2-B5. See Table 5 for the descriptions of subgroups.
Notes: * The critical absolute Z value at the 5% significance level is 1.96. Comparisons with a Z value greater than this critical value are statistically different at the 95% confidence level.
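The note above compares overall accuracies through a Z statistic checked against the two-sided critical value of 1.96. The following is a minimal sketch that assumes the two overall accuracies come from independent validation samples and treats each as a binomial proportion. This section does not give the exact variance formula behind Table 10; for example, a kappa-based variance may have been used instead. The accuracies and sample sizes below are hypothetical.

```python
import math
from typing import Tuple

Z_CRITICAL_5_PERCENT = 1.96  # two-sided critical value quoted in the table note


def z_test_overall_accuracy(oa1: float, n1: int, oa2: float, n2: int) -> Tuple[float, bool]:
    """Z statistic for two overall accuracies from independent validation samples.

    Each OA is treated as a binomial proportion with variance OA * (1 - OA) / n.
    """
    variance = oa1 * (1.0 - oa1) / n1 + oa2 * (1.0 - oa2) / n2
    z = abs(oa1 - oa2) / math.sqrt(variance)
    return z, z > Z_CRITICAL_5_PERCENT


# Hypothetical accuracies and sample sizes (not the values behind Table 10).
z, significant = z_test_overall_accuracy(0.80, 400, 0.75, 400)
print(round(z, 2), significant)
```

When both classifications are evaluated on the same validation pixels, the samples are not independent, and a paired test such as McNemar's is usually preferred.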