Unsupervised and Supervised Classification through Band Ratios and DEM in a Mountainous Landscape in Nepal

Modification of the original bands and integration of ancillary data in digital image classification have been shown to improve land use land cover classification accuracy. Few studies have demonstrated such techniques in the context of the mountains of Nepal. The objective of this study was to explore and evaluate the use of modified bands and ancillary data in Landsat and IRS image classification, and to produce a land use land cover map of the Galaudu watershed of Nepal. Classification of land uses was explored using supervised and unsupervised classification for 12 feature sets containing the Landsat MSS, TM and IRS original bands, ratios, normalized difference vegetation index, principal components and a digital elevation model. Overall, the supervised classification method produced higher accuracy than the unsupervised approach. The combination of band ratios 4/3, 5/4 and 5/7 ranked the highest in terms of accuracy (82.86%), while the combination of bands 2, 3 and 4 ranked the lowest (45.29%). Inclusion of the DEM as a component band shows promising results.


Introduction
Satellite remote sensing has become an important tool for monitoring and management of natural resources and the environment. Remotely sensed data are widely used in land use/land cover classification. Several techniques have been reported to improve classification results in terms of land use discrimination and accuracy of the resulting classes while processing remotely sensed data [1]. However, obtaining highly accurate land use data remains difficult, particularly in the mountainous landscapes of tropical humid climates. One of the main problems when generating land use maps from digital images is the confusion of spectral responses from different features. Discrimination of land use land cover types, including vegetation types, through remote sensing techniques in the mountainous areas of Nepal is a very difficult task because of the complex structure and composition of vegetation communities. In many cases, old secondary growth areas are too small to be detected by Landsat and IRS. In addition, the mountain topography leads to a significant shadowing effect, which becomes a particular problem in digital image processing [1][2][3][4][5].
One of the techniques for improving digital classification is the incorporation of ancillary data, such as a digital elevation model (DEM) [1,6]. There are several approaches for incorporating ancillary data such as DEM, slope and aspect in the image classification process, and earlier studies have demonstrated the use of a DEM for a number of purposes [1]. DEM integration in image classification has helped increase the classification accuracy of digital data [7][8][9][10][11][12]. A DEM has also been used to describe the distribution of terrain components which contribute to spectral response [13], identify sites for fieldwork [14], geographically stratify training areas or homogeneous regions [17], and provide topographic normalization of Landsat TM digital imagery [16].
Eiumnoh and Shrestha [1] mentioned in their paper that in areas of rugged terrain, integration of other ancillary information together with elevation data has improved classification accuracy. For example, Franklin [15] reported that classification accuracy using Landsat MSS increased from 58 to 79 percent with the addition of elevation data. Furthermore, the addition of four other geomorphometric variables (relief, convexity, slope, and incidence) yielded 87 percent classification accuracy. Cibula and Nyquist [18] used elevation along with slope and aspect variables to improve MSS classification. A DEM has also been used with microwave data for surface pattern discrimination [19], forestry studies [20], and snow monitoring [21], among others.
Agricultural applications of remote sensing are time critical, and, hence, careful selection of image dates is important because land-use/land-cover class separations are usually related to plant phenology [1,18]. Experience shows that, in the tropics, high-resolution visible and infrared satellite sensors cannot always provide the desired information due to constraints related to cloud cover and revisit schedules [22]. Langford and Bell [23] have also identified the problem of frequent cloud cover, as well as topographic effects and mixed pixels, in the tropical environment of Colombia. Cloud-free data are generally acquired during the dry season, when most of the agricultural fields are fallow. Such a situation makes discriminating land use/land cover a difficult task.
Franklin [17] mentioned that topography exerts a powerful influence on the spectral response observed by polar orbiting satellites. This influence, together with the lack of studies on digital image classification using band ratios and ancillary information in Nepal, provided the main impetus for this work. The objectives of the study were to evaluate the use of band ratios and elevation data in Landsat and IRS image classification, and to produce a thematic map of land use of the Galaudu Watershed of Nepal. In this study, a DEM was prepared from elevation data using the topographic contours of 1:25,000 scale maps. The DEM was used with standard digital image processing operations as a component band during the image classification process. Improvement of the classification of different land use classes was explored using supervised and unsupervised classification techniques for several feature sets of Landsat and IRS data.
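Using the DEM as a component band amounts to appending an elevation value to each pixel's feature vector. The following is a minimal sketch of that idea, not the procedure used in the study: the linear rescaling to a 0-255 range and the names `rescale` and `add_dem_band` are assumptions made for illustration, and the default elevation limits are simply the lowest and highest points reported for the watershed.

```python
def rescale(value, low, high, out_max=255.0):
    """Linearly map value from [low, high] to [0, out_max] (assumed scaling)."""
    return (value - low) / (high - low) * out_max


def add_dem_band(bands, elevation_m, low=217.0, high=1960.0):
    """Append a rescaled elevation to a pixel's band values.

    The default limits are the lowest and highest elevations (m a.s.l.)
    reported for the watershed; the rescaling itself is an assumption.
    """
    return tuple(bands) + (rescale(elevation_m, low, high),)


# Hypothetical pixel with three ratio-band values and an elevation of 1,000 m.
pixel = add_dem_band((4.0, 0.625, 3.0), 1000.0)
print(len(pixel))
```

Rescaling keeps the elevation band on a range comparable to the image bands, so that a distance-based clusterer is not dominated by raw elevation values in metres.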

Study Area
The study area is a mountainous watershed named Galaudu/Pokhare Khola sub-watershed (hereafter referred to as the Galaudu watershed), situated in the Dhading district of Nepal (Figure 1). The topography of the watershed is mountainous, with an average slope exceeding 30 percent, and shows features which are often found in the mountain zones of Asia. Most of the watershed is a mountainous region under hill forest and upland cultivation. The soils of the watershed are loam, sandy loam, clay loam, silt loam and sandy clay loam. The area has a sub-tropical climate, with a mean annual rainfall of 1,404 mm. The elevations of the highest and lowest points are 1,960 m and 217 m a.s.l., respectively. The watershed can be divided into fertile, relatively flat valleys along the rivers and surrounding uplands with medium to steep slopes (as shown in Figure 1). Agricultural lands in the valleys are under intensive management with multiple cropping systems and are mostly irrigated. Paddy, potato, wheat and vegetables are the major crops cultivated in the valleys.

Spatial data
The basic data used to prepare the land use map of the study area were Landsat Multispectral Scanner (MSS) and Thematic Mapper (TM) digital data and Indian Remote Sensing (IRS) digital data. Black-and-white aerial photographs at 1:50,000 scale from 1978 and 1992 were used as "ground-truth" information for classification and accuracy estimation of the classified MSS and TM images, respectively. Topographic maps at 1:25,000 scale published by the Nepal Department of Survey and digital topographic data with a contour interval of 20 m produced by the same agency were also used.

Image processing of the satellite data
Digital image processing was carried out to obtain the land use and land cover maps from the remote sensing (RS) data. Processing the RS data to generate a land use map involved several steps, as shown in Figure 3. RS data pre-processing was designed to remove undesirable image characteristics produced by the sensor: it calibrates the image radiometry, removes noise and corrects geometric distortions [24]. The obtained image, which had been geometrically corrected at a smaller scale, was further corrected by registering it to the 1:25,000 scale topographic map sheets using selected ground control points (GCPs). A root mean square (RMS) error of less than 1 pixel was accepted, using a first-order transformation with nearest neighbor resampling. The study area was clipped with a vector boundary layer.
Classification of the remote sensing data was performed on different feature sets extracted using band ratios, the normalized difference vegetation index (NDVI) and principal component analysis (PCA), which are among the standard image processing techniques [25]. These are essentially data reduction techniques that are carried out to enhance an image, as the quantity of information carried by satellite data is not necessarily the same as the amount of data. Six different feature sets for MSS, twelve for TM and six for IRS (Table 1), containing original bands, derived bands and the DEM, were defined before running the classification.
Of the unsupervised classification techniques, the Iterative Self-Organizing Data Analysis Technique (ISODATA) was used. This technique repeatedly performs an entire classification and recalculates statistics, requires minimal user input for locating clusters, is relatively simple and has considerable intuitive appeal. However, its output can be affected by the choice of initial parameters and their interactions with each other [26]. The parameters assigned for each classification scheme were kept the same, including the maximum number of clusters to be formed (40 clusters). The clusters formed were regrouped with Ward's method of hierarchical clustering, which is designed to minimize the variance within clusters [27]. This technique calculates the mean of each variable within each cluster and the squared Euclidean distance to the cluster means for each case [23]. The distances are summed over all cases, and at each step the two clusters merged are those that result in the smallest increase in the overall sum of squared within-cluster distances. The resulting classes were identified on the basis of knowledge drawn from the field survey and a previous land use map. The results of the classification schemes were compared by creating error matrices and computing the overall classification accuracy.
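The Ward regrouping step described above can be sketched as follows. This is an illustrative sketch only, assuming each cluster is held as a list of pixel feature vectors; the function names and the sample values are hypothetical. It uses the standard identity that merging clusters A and B raises the within-cluster sum of squares by n_A n_B / (n_A + n_B) times the squared Euclidean distance between their means.

```python
def centroid(points):
    """Mean vector of a list of equal-length feature vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    ca, cb = centroid(a), centroid(b)
    sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * sq_dist


def ward_regroup(clusters, n_classes):
    """Greedily merge clusters until only n_classes remain."""
    clusters = [list(c) for c in clusters]
    while len(clusters) > n_classes:
        # Choose the pair whose merge increases the total
        # within-cluster sum of squares the least.
        pairs = (
            (i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        i, j = min(pairs, key=lambda ij: ward_cost(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Four hypothetical single-pixel clusters in a two-band feature space.
seeds = [[(1.0, 1.0)], [(1.2, 0.9)], [(5.0, 5.1)], [(5.2, 4.9)]]
groups = ward_regroup(seeds, 2)
print(groups)
```

In practice the 40 ISODATA clusters would be merged from their cluster statistics rather than from raw pixel lists; the pairwise search here is quadratic per step, which is acceptable for a few dozen clusters.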
Finally, only the classification scheme that showed the best result during unsupervised classification was selected. Supervised classification was performed on the selected scheme using the Bayesian maximum likelihood classifier (MLC). MLC, a parametric decision rule, is a well-developed method from statistical decision theory that has been applied to the problem of classifying image data [28,29]. First, training signatures for identifiable classes were established using field knowledge and evaluated for their ability to discriminate individual classes. Once signature evaluation indicated satisfactory discrimination between the classes, the final classification was run to produce the land use map. Training areas corresponding to each classification item (hereafter, land use class) were chosen, in the case of the IRS image, from among the training samples collected in the field; in the case of the MSS and TM images they were generated from the interpretation of aerial photographs of the study area from 1978 and 1992. Although the dates of the aerial photographs used as reference information do not exactly match the dates of the satellite images, they were used on the assumption that land use in the watershed did not change substantially between the time of aerial photography and the satellite observation dates. Moreover, this was the best feasible option available to this research.
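The maximum likelihood decision rule can be illustrated with a small sketch. It is not the ERDAS implementation used in the study: for brevity it assumes uncorrelated bands (a diagonal covariance matrix), whereas a full MLC uses the complete covariance matrix of each class, and the training values below are invented for illustration.

```python
import math


def train_signature(samples):
    """Per-band mean and sample variance of one class from training pixels."""
    n = len(samples)
    bands = range(len(samples[0]))
    mean = [sum(s[b] for s in samples) / n for b in bands]
    var = [sum((s[b] - mean[b]) ** 2 for s in samples) / (n - 1) for b in bands]
    return mean, var


def log_likelihood(pixel, signature):
    """Gaussian log-likelihood, assuming uncorrelated bands (simplification)."""
    mean, var = signature
    return sum(
        -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        for x, m, v in zip(pixel, mean, var)
    )


def classify(pixel, signatures, priors=None):
    """Assign the class with the highest log-likelihood plus log-prior."""
    def score(name):
        prior = priors[name] if priors else 1.0
        return log_likelihood(pixel, signatures[name]) + math.log(prior)
    return max(signatures, key=score)


# Hypothetical training pixels (three ratio bands) for two classes.
training = {
    "forest": [(1.8, 0.6, 2.1), (1.9, 0.5, 2.3), (2.0, 0.7, 2.2)],
    "upland agriculture": [(1.1, 1.2, 1.4), (1.0, 1.3, 1.5), (1.2, 1.1, 1.3)],
}
signatures = {name: train_signature(px) for name, px in training.items()}
print(classify((1.85, 0.6, 2.2), signatures))
```

With equal priors the rule reduces to picking the class whose Gaussian signature gives the pixel the highest likelihood; unequal priors shift the decision toward classes expected to cover more of the scene.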
To produce land use maps for 1976, 1990, 2000 and 2002 and to investigate the changes that occurred between these periods, the following five land use land cover (hereafter, land use) classes were considered in image classification: forestland, scrubland, lowland agriculture, upland agriculture and vegetables. The choice of these classes was guided by: i) the objective of the research, ii) the degree of accuracy expected in image classification, and iii) the ease of identifying the classes on aerial photographs. A brief description of each of the land use classes is given in Table 2.

| Land use classes | General description |
| --- | --- |
| Forest | Forest areas with an estimated 75 percent or more crown cover by trees. |
| Scrublands | Land covered by shrubs, bushes and young regeneration. Degraded forest areas with an estimated < 10% tree crown cover are also included. |
| Lowland agriculture | Irrigated, level-terraced agricultural lands in river valleys, used for multiple cropping including winter crops. Wheat and potato are the two major winter crops cultivated on these lands after the harvest of paddy rice in November-December. |
| Upland agriculture | Non-irrigated agricultural lands with or without sloping terraces, barren lands, settlements, roads, construction sites and other built-up areas. |
| Vegetables | Irrigated agricultural land under vegetables during the winter season. |

Among all the land use classes, "upland agriculture" is the most complex. In fact, it includes all combinations of land uses that are not covered by the remaining classes. During winter, uplands in the study area, like most of the Middle Hills, are mostly barren and have spectral values similar to those of barren lands such as non-vegetated hills and riverbeds [30]. Moreover, at the time the satellite images were acquired (particularly the IRS image), many upland terraces had exposed soil due to fresh ploughing by farmers in preparation for the next summer crop. This condition of the cultivated uplands made it impossible to distinguish them from rough roads, new construction sites and other built-up areas. This justifies combining settlements, barren lands and built-up areas with upland agricultural lands in this study, a grouping that may not be acceptable at other times of the year.
Post-classification processing was performed after selectively combining classes: the classified images were sieved, clumped and filtered before producing the final output. Sieving removes isolated classified pixels using blob grouping, while clumping helps maintain spatial coherency by removing unclassified black pixels (speckle or holes) in classified images. Finally, a 3 × 3 median filter was applied to smooth the classified images. All image processing was performed in ERDAS Imagine version 8.7 [31]. The classified images were then exported from ERDAS to Arc View GIS Version 3.2 [32], and the rest of the analyses were performed in GIS environments. The classified images were first converted to grid and then to shape format in Arc View. The polygon themes so generated were exported to Arc Info GIS Version 3.5.1 [32], where polygons of <0.5 ha in size were "eliminated". This elimination was necessary to minimize the effects of classification errors arising from resolution differences among the three satellite images, without significantly altering the area under each land use class. The resultant polygon themes were used in further analyses.
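The smoothing effect of the 3 × 3 median filter on a classified image can be sketched as below. This is a simplified illustration rather than the ERDAS Imagine routine: border pixels are left unchanged (an assumption), and the small grid of class codes is hypothetical.

```python
from statistics import median_low


def median_filter_3x3(grid):
    """Replace each interior cell by the median of its 3 x 3 neighbourhood.

    Border cells are copied unchanged, which is an assumption made here
    to keep the sketch short.
    """
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            window = [
                grid[r + dr][c + dc]
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
            ]
            out[r][c] = median_low(window)
    return out


# Hypothetical class codes; the isolated 3 is a single misclassified pixel.
classified = [
    [1, 1, 1, 1],
    [1, 3, 1, 1],
    [1, 1, 1, 2],
    [1, 1, 2, 2],
]
smoothed = median_filter_3x3(classified)
print(smoothed)
```

Because a median depends on the numeric order of the class codes, a majority (mode) filter is often preferred for categorical maps; the median is shown here only because it is the filter named in the text.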

Detection of land use changes
The land use polygon themes for 1976, 1990 and 2000 obtained from the digital classification of satellite data and subsequent GIS analyses were overlaid two at a time in Arc View GIS and the area converted from each of the classes to any of the other classes was computed.
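The overlay of two land use maps amounts to a cross-tabulation of "from" and "to" classes. The study overlaid polygon themes in Arc View GIS; the sketch below swaps in a raster equivalent that counts co-located cells, which is an assumption made for brevity. The class lists and the function names are hypothetical.

```python
from collections import Counter


def change_counts(earlier, later):
    """Count cells for every (class in earlier map, class in later map) pair."""
    if len(earlier) != len(later):
        raise ValueError("maps must cover the same cells")
    return Counter(zip(earlier, later))


def percent_unchanged(counts, land_use):
    """Share of the earlier area of a class that kept the same class."""
    total = sum(n for (before, _), n in counts.items() if before == land_use)
    return 100.0 * counts[(land_use, land_use)] / total


# Hypothetical flattened class maps for two dates.
map_early = ["forest", "forest", "forest", "forest", "upland", "upland"]
map_late = ["forest", "forest", "upland", "lowland", "upland", "upland"]

counts = change_counts(map_early, map_late)
print(counts[("forest", "upland")], percent_unchanged(counts, "forest"))
```

Multiplying each count by the cell area (for example 0.09 ha for a 30 m cell) converts the table from cell counts to hectares.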

Land Use/Land Cover Assessment through Image Classification
The mean spectral values for the different land cover classes show that classification schemes containing only original bands do not attain high accuracy, as only bare soil and forests were distinguishable. A mixed class situation was observed between forest and lowland agriculture. Band 2 of TM and IRS in particular was not very useful, as its response was similar for all classes. Band 3 responded more strongly for bare soil areas. Band 4 of TM, the near infrared band, responded the highest for vegetation but did not discriminate among these classes because of their similar response patterns in other bands. Feature sets containing NDVI gave higher accuracy for vegetation because NDVI responded the highest for vegetation and the lowest for bare soil and burnt fields. Some of the bare soil and burnt fields were clearly discriminated, but others were found mixed with forest area. One unique class was discriminated upon the addition of the DEM together with NDVI: scrubland with relatively scattered trees and no big trees. The forest types were easily distinguishable, unlike in the earlier mixed situation.
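The behaviour of NDVI described above follows from its definition, (NIR - red) / (NIR + red), which for Landsat TM uses band 4 (near infrared) and band 3 (red). A minimal sketch with hypothetical digital numbers:

```python
def ndvi(nir, red):
    """Normalized difference vegetation index; 0.0 where both bands are 0."""
    total = nir + red
    return (nir - red) / total if total else 0.0


# Hypothetical TM band 4 (NIR) and band 3 (red) digital numbers.
vegetation = ndvi(nir=80, red=20)   # strong NIR response, low red
bare_soil = ndvi(nir=30, red=40)    # red response exceeds NIR
print(vegetation, bare_soil)
```

Dense vegetation reflects strongly in the near infrared and absorbs red light, so its NDVI is high, while bare soil and burnt fields give values near or below zero.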
The former mixed class situation between scrub and forest was improved by the ratio bands of 5 and 2, 5 and 4, and 5 and 7. The major land cover classes, including forest types, were thus differentiated by the classification schemes containing these kinds of ratio bands. Vegetation classes obtained from these combinations gave the highest accuracy. Overall classification accuracy ranged from 45 to 68 percent for the unsupervised techniques. Among the twelve classification schemes for TM and the six each for MSS and IRS, inclusion of the DEM as a component band showed very promising results, as it helped to differentiate vegetation types, such as forest types, which could not otherwise be differentiated with NDVI because of their similar signatures. Vegetation types, bare soil and upland cultivation areas were discriminated using the combination of three different ratio bands (4/3, 5/4, 5/7), which gave the highest accuracy. This is because ratio bands substantially reduce the effect of shadow, which is a particular problem for satellite image processing and interpretation in mountain areas.
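Why ratio bands suppress shadow can be seen in a small sketch. It assumes an idealized model in which shading scales every band by the same illumination factor; real imagery also contains additive effects such as path radiance, so the cancellation is only approximate in practice. The digital numbers and the function names are hypothetical.

```python
def band_ratio(numerator, denominator):
    """Ratio of two band values; 0.0 where the denominator is 0."""
    return numerator / denominator if denominator else 0.0


def ratio_features(pixel):
    """The three TM ratio bands 4/3, 5/4 and 5/7 for one pixel."""
    return (
        band_ratio(pixel["b4"], pixel["b3"]),
        band_ratio(pixel["b5"], pixel["b4"]),
        band_ratio(pixel["b5"], pixel["b7"]),
    )


# Hypothetical digital numbers for the same cover type on a sunlit slope.
sunlit = {"b3": 24.0, "b4": 96.0, "b5": 60.0, "b7": 20.0}
# Idealized shadow: every band receives half the illumination.
shaded = {band: value * 0.5 for band, value in sunlit.items()}

print(ratio_features(sunlit), ratio_features(shaded))
```

The original band values differ by a factor of two between the two slopes, yet the ratio features are identical, which is why the same cover type on sunlit and shaded slopes falls into one cluster more readily.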
The combination of ratio bands was selected for the supervised classification because this scheme gave the highest accuracy in the unsupervised classification. Supervised classification was run on three bands: the ratio of bands 4 and 3, the ratio of bands 5 and 4, and the ratio of bands 5 and 7. Trisurat [33], in a study conducted in the Khao Yai National Park of Thailand, which has a bio-climate similar to that of the Galaudu watershed, reported that the ratio 5/7 was useful for discriminating forest classes.
This set yielded more classes than any other set during unsupervised classification, which was the basis for testing it in supervised classification. Mixed deciduous forest, which was misclassified as either scrub or other vegetation in unsupervised classification, could be discriminated through a knowledge-based selection of training areas using elevation as a criterion. The spectral signatures of the training areas for the individual classes indicated quite satisfactory separability of the selected classes in each of the ratio 4/3, ratio 5/4 and ratio 5/7 bands. An overall classification accuracy (OCA) of 94.08 percent was obtained from the error matrix of the training signatures. In the post-classification accuracy evaluation, an OCA of 82.86 percent and a kappa coefficient of agreement (KHAT) of 0.74 (Table 3) were obtained for the classified image.
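Both accuracy measures are derived from the error matrix: the overall accuracy is the share of samples on the diagonal, and KHAT corrects that share for the agreement expected by chance from the row and column totals. A minimal sketch, using a hypothetical two-class matrix rather than the values of Table 3:

```python
def overall_accuracy(matrix):
    """Share of reference samples that fall on the diagonal."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def kappa(matrix):
    """Kappa coefficient (KHAT) computed from a square error matrix."""
    total = sum(sum(row) for row in matrix)
    observed = overall_accuracy(matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    # Chance agreement expected from the marginal totals alone.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical error matrix: rows are classified data, columns are reference data.
errors = [
    [45, 5],
    [10, 40],
]
print(round(overall_accuracy(errors), 2), round(kappa(errors), 2))
```

KHAT is always lower than the overall accuracy when some agreement is expected by chance, which is consistent with the pairing of 82.86 percent and 0.74 reported above.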

Land Use/ Land Cover Change Detection
The land use maps for 1976, 1990, 2000 and 2002 are presented in Figure 4, and the area under the four land use classes during the three periods is shown in Table 4. The results show that forest area decreased while agricultural area increased continuously over the study period. Scrublands (degraded forest land) were mapped in 1976, but by 1990 they had already been converted into forest in good condition. Lowland agricultural area expanded considerably during the first period, whereas upland agricultural area expanded considerably during the latter period (Tables 4 and 5). Among the major land use groups, around 65 percent of the upland agriculture, 52 percent of the lowland agriculture and 45 percent of the forest area in 1976 remained unchanged until 2000. Forest lands shrank by about 55 percent of their 1976 area between 1976 and 2000 (Table 5).
The observed trends of decreasing forest and increasing agricultural areas in the watershed can be explained by three main reasons. First, a substantial proportion of the agricultural lands in the study area are on steeper slopes, where slope stability and soil erosion are of critical concern [34,35]. Those steep agricultural fields suffer from rapid soil erosion and nutrient depletion, which forces farmers to find new land to meet their growing food needs, so farmers may have moved towards the forest area for their agricultural activities. Other studies in the mountain regions of Southeast Asia have found that many households practice shifting cultivation [36][37][38]. There is also evidence from the hills of Thailand [39] and Honduras [40] that declining soil productivity and increased weed competition lead farmers to find new land to fulfill their basic needs. Second, most of the settlements in the upland area do not have good quality land (less productive, outward-facing sloping terraces, no irrigation facilities) compared to lower elevation areas. A higher level of human-forest interaction can therefore be expected in these areas, bringing about more pronounced forest losses than in the lower elevation areas. Third, most of the community forestry activities that are expected to have a positive influence on the balance of forest cover were concentrated at lower elevations. The smaller forest loss in lower elevation zones suggests that forest conservation efforts by the local communities and concerned agencies played an important role in bringing about positive outcomes in the balance of forestry land use in the watershed. The same could not happen at higher elevations (highland) because community-based forest management programs were unable to cover those areas and forest monitoring by the forest department was virtually non-existent, leading to an open access condition of the high altitude forests.
The existing model of community forestry systems was unable to bring high elevation forests under management. This was probably because of difficulties in identifying users and use patterns arising from a continuous, extensive forest accessed by widespread settlements. Although there was a net loss in forest area, a substantial proportion of the degraded forests (scrubland) in the lowlands was converted into forest in good condition, especially in the first period (1976-1990). This might have been due to the success of the community forestry program at lower elevations and near roads and market centers, because of the accessibility of those areas. The continuous loss of forest area over time, despite various forestry development programs, challenges the forest conservation and development efforts of the forest department and the donor agency. A combined investment from multiple actors at various levels is indeed one of the important conditions for successful outcomes from collective action at a local level [41].
The expansion of lowland agricultural area during the first period at the expense of upland agriculture indicates increased agricultural intensification and diversification during that period. Conversations with local farmers revealed that there was indeed a big shift in the use pattern of lowlands during this period because of farmers' attraction to winter cropping, mainly of wheat and potato, on irrigated lands. More recently, potato cultivation for commercial purposes has gained momentum on the lowlands, mainly because of improved access to local markets and higher profitability compared to wheat and other cereal crops.

Conclusions
Satellite remote sensing has become an important tool for monitoring and management of natural resources and the environment. One of the main problems when generating land cover maps from digital images is the confusion of spectral responses from different features. Discrimination of land use land cover types, including vegetation types, through remote sensing techniques in the mountainous areas of Nepal is a very difficult task because of the complex structure and composition of vegetation communities. In many cases, old secondary growth areas are too small to be detected by Landsat and IRS. In addition, the mountain topography leads to a significant shadowing effect, which becomes a particular problem in digital image processing. Various techniques for improving accuracy in digital image classification are found in practice. Geo-spatial data also have great potential for enhancing digital image classification, apart from their other uses. In this study, a DEM was prepared from elevation data using the topographic contours of 1:25,000 scale maps. The DEM was incorporated with standard digital image processing operations as a component band during the image classification process, which helped to improve the classification result.
In this study, improvement of the classification of different vegetation types, including the upland agriculture areas in the Galaudu watershed, Nepal, was explored using supervised and unsupervised classification techniques for several feature sets of Landsat and IRS data. Overall, the supervised classification method produced higher accuracy than the unsupervised approach. The combination of ratio bands R4/3, R5/4 and R5/7 ranked the highest in terms of accuracy (82.86%), while the combination of bands 2, 3 and 4 ranked the lowest (45.29%). This result may be influenced by the shadowing effects, which were substantially reduced in the ratio band combinations. It can be inferred that digital classification can help to improve vegetation mapping, including that of upland agriculture areas, compared to visual classification, because digital classification allows the interpretation of smaller mapping units and offers greater detail than the visual technique, in which the mapping units are generalized.

Figure 1. Location of the study area.
Extensive ground truthing was carried out to collect the training samples for the TM image of 2000 and the IRS image of 2002. The spatial database consisted of Landsat MSS data acquired on 10th October 1976 (Path 152, Row 41) (Figure 2a) and TM data acquired on 4th February 1990 and 13th March 2000 (path/row-141/41) (Figures 2b and 2c) in digital format. Similarly, IRS data were acquired on 16th February 2002 (Path Row 104/052) (Figure 2d) in digital format. RS data acquired in the dry season show less spectral response from vegetation, as most of the agricultural areas are in a harvested condition. This situation makes such data difficult to interpret. The use of a DEM or elevation data is one of the techniques that has been used successfully in temperate and alpine regions to interpret remote sensing data.

Figure 3. Steps of digital image processing to produce land use land cover map.

Table 1. Feature sets prepared and tested.

Table 3. Error Matrix of the Supervised Classification map of Galaudu Watershed, Nepal.

Table 5. Percent change in land use during the three periods, Galaudu watershed, Nepal.