Covering 6.0 million ha, or 40.4% of the total land area of Nepal [1], forests play a significant role in the country's social and economic development. The reported contribution of forestry to national GDP, 4.4% during 1990–2000, is considered an underestimate; another study suggests that the contribution can be as high as 15.0% [2
]. Sal (Shorea robusta) is usually the dominant tree where it occurs, is the sole dominant species in Sal forest, and remains one of the most important species in Nepali forests. Sal forest occurs across a range of altitudes, from lowlands to hills, and is called Terai Sal in the southern plains and Hill Sal at higher altitudes. Nationally, the proportion of Shorea robusta is the highest (15.3%), followed by Pinus roxburghii (8.5%). Of the remaining forest, 60.0% is dominated by other species found in the mix forest [1
]. The above-ground carbon stock of Sal was estimated at 80.0 t/ha [3], indicating that Sal forest, if properly managed, is important for climate change mitigation and for potential benefits from the REDD+ scheme (Reducing Emissions from Deforestation and Forest Degradation) of the United Nations Framework Convention on Climate Change [4]. Currently, Nepal defines forest as an area of land of at least 0.5 ha, with a minimum width/length of 20 m, tree crown cover of more than 10%, and tree heights of 5 m at maturity. Moreover, the definition of “forest type” for the purpose of this study is adapted from the forest resource assessment (FRA) of Nepal [1
] and is defined as a forest in which at least 60% of the basal area is occupied by a particular species; if no species accounts for at least 60% of the basal area, the forest is defined as mix forest. Sal forest provides varied products, including timber; wood for tools and furniture; carvings for historical, religious and architectural structures; utensils; firewood; plates and bowls; gum; green manure; medicines and resin; and livestock browse [6
]. Moreover, Sal forest is one of the favorite food sources of Asian elephants [9] and a suitable habitat for endangered Bengal tigers [10]. Sal is also important in terms of cultural beliefs. National forest inventories are updated infrequently because each one takes multiple years and a large budget. In contrast, the approach in this study has the potential to map Sal forest at national, subnational, and local levels and to produce high resolution Sal maps economically and in a timely manner.
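The forest type rule described above (a species must hold at least 60% of the basal area, otherwise the stand is mix forest) can be sketched as a small function. This is only an illustration of the rule as worded in this section; the function name `forest_type`, the input layout, and the example basal areas are assumptions and do not come from the FRA.

```python
from typing import Dict


def forest_type(basal_area_by_species: Dict[str, float],
                threshold: float = 0.60) -> str:
    """Return the dominant species name, or "mix" if no species
    reaches the basal-area threshold (0.60 as described in the text)."""
    total = sum(basal_area_by_species.values())
    if total <= 0:
        raise ValueError("total basal area must be positive")
    # Species with the largest basal area and its share of the total.
    species, area = max(basal_area_by_species.items(), key=lambda kv: kv[1])
    return species if area / total >= threshold else "mix"


# Hypothetical plot-level basal areas in m^2/ha (made-up numbers).
plot_a = {"Shorea robusta": 18.0, "Terminalia": 6.0, "other": 4.0}
plot_b = {"Shorea robusta": 10.0, "Pinus roxburghii": 9.0}
print(forest_type(plot_a), forest_type(plot_b))
```

In the first hypothetical plot Sal holds about 64% of the basal area and the plot is labelled as Sal forest; in the second no species reaches 60%, so the plot is labelled mix.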
Forest mapping is needed for proper forest management because different forest types require different management regimes [11]. Mapping Sal forest is important for better forest policy formulation and management in Nepal, and it can also be used for carbon monitoring under the REDD+ scheme. Information about the distribution of Sal forest can be very useful to forest managers for the sustainable use of forest resources and the environment. To date, no Sal forest maps have been generated for Nepal by the government, nor are any found in the literature. Developing such maps solely by field assessment over a large area is extremely difficult and expensive, so improved methods and the latest technologies are needed to obtain explicit species information about forests.
Analysis of remotely sensed data is a modern approach that has been popular in mapping and monitoring forests over the past few decades. With rapid advances in remote sensing technology, many satellite systems now acquire earth observation data at higher temporal, spectral, and spatial resolution than in the past. Hyperspectral and Lidar [12], laser scanner [15], and fine resolution multispectral data [16] have been used for tree detection and species classification. However, because of the high cost of acquisition and processing and the limited availability and coverage of such data, their operational use is limited to a handful of organizations with sufficient resources. Hence, for mapping and classification over larger areas, MODIS and Landsat have been popular because they are free, have global coverage, and have high temporal availability [17
]. One major drawback of using single-date imagery for forest type classification, even at higher resolution, is that multiple species might have similar spectral signatures. Moreover, successive high-resolution data are not available. Even with Landsat, cloud-free data for the desired locations are not always available throughout the year. For example, out of 23 images a year (one every 16 days), only 8 reasonably clear images (with clouds <10%) were available for our study area, and no useful data were acquired for four months: July, August, September, and November. At times, multiple scenes of the same study area acquired on different dates further complicate the issue, as the vegetation structure of the same species changes over time. Hence, MODIS 16-day composite products developed from daily observations are a reasonable choice for observing vegetation change.
Past phenological studies of different species have shown differences in the timing and duration of the phenophases of individual species. Bajpai et al. [20] performed an in-situ experiment to investigate the phenophases of two dominant species, Shorea robusta and Ficus hispida, in deciduous natural forests along the Indo-Nepal border. Data collected from 160 twigs showed clear differences between the species. This variation in vegetation phenology opens the possibility of differentiating species using spectral signatures over time. Time series analysis with vegetation indices is popular in Land Use Land Cover (LULC) classification as well as in agricultural and forestry applications. Yan et al. [21] utilized normalized difference vegetation index (NDVI) and enhanced vegetation index (EVI) time series data to classify vegetation cover types in China. MODIS time series images have been used for particular crop identification [22
], land cover classification [26
In the recent past, some studies have been conducted on Sal in Nepal, India, and the region. Most of them relate to carbon stock and above-ground biomass estimation. A study by Pandey et al. [27] in Chitwan, Nepal shows that carbon density is highest in Sal, followed by mix forest. Patel et al. [28] estimated the biomass of Shorea robusta using principal components of vegetation indices from Landsat images. Chitale et al. [29] characterized different Sal species in Northern India based on visual interpretation of size, tone, shape, and texture generated from optical images. However, the applicability of remote sensing data to forest type mapping has been little explored in Nepal [30]. Forest type mapping in Nepal has usually been carried out as multi-year projects updated every decade. Reliable methods and economical technology are critically needed.
Maximum likelihood and minimum distance classifiers are popular conventional methods for classifying multispectral data. Recently, machine learning algorithms such as support vector machines (SVM), random forest (RF), and neural networks have drawn notable attention [14]. For supervised techniques, class separability can be assessed with metrics such as the Jeffries-Matusita (J-M) distance and Transformed Divergence (TD). Based on the score between each pair of classes, classes can be merged, further partitioned, or deleted to improve separability. Some researchers prefer unsupervised techniques like K neighborhood for clustering [26
]. Remote sensing images are larger than ordinary images, and many of them are needed to cover a large area, which demands considerable processing cost and time. At times, multiple images carry redundant information, which is pure processing overhead. So, for similar performance, fewer images are preferred. Identifying a minimum number of images that collectively carry maximum information with the least redundancy is not a simple task. However, there are reduction techniques that select the most important images or otherwise reduce the dimensionality of the input, thereby reducing the processing time. The Boruta algorithm [33], principal component analysis, and minimum noise fraction (MNF) [12] are very popular in the literature. Using a reduced set of images shortens processing time and sometimes even yields better accuracy.
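As an illustration of the pairwise separability scores mentioned above, the following minimal sketch computes the Jeffries-Matusita distance between two classes under the common assumption that each class is multivariate Gaussian. The function name `jeffries_matusita` and the synthetic samples are hypothetical, and the sketch is not necessarily the implementation used in this study.

```python
import numpy as np


def jeffries_matusita(x1, x2):
    """J-M distance between two classes (rows = samples, cols = features),
    assuming Gaussian class distributions. Ranges from 0 to 2."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    m1, m2 = x1.mean(axis=0), x2.mean(axis=0)
    s1 = np.atleast_2d(np.cov(x1, rowvar=False))
    s2 = np.atleast_2d(np.cov(x2, rowvar=False))
    s = (s1 + s2) / 2.0
    diff = m1 - m2
    # Bhattacharyya distance: mean-difference term plus covariance term.
    term_mean = diff @ np.linalg.solve(s, diff) / 8.0
    _, logdet_s = np.linalg.slogdet(s)
    _, logdet_s1 = np.linalg.slogdet(s1)
    _, logdet_s2 = np.linalg.slogdet(s2)
    term_cov = 0.5 * (logdet_s - 0.5 * (logdet_s1 + logdet_s2))
    b = term_mean + term_cov
    return 2.0 * (1.0 - np.exp(-b))


# Synthetic two-feature samples (made up for illustration only).
rng = np.random.default_rng(0)
class_a = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
class_b = rng.normal(loc=10.0, scale=1.0, size=(200, 2))
print(jeffries_matusita(class_a, class_b))  # close to 2: well separated
print(jeffries_matusita(class_a, class_a))  # 0: identical classes
```

Values near 2 indicate classes that are easy to separate, while values near 0 point to class pairs that are candidates for merging or further partitioning.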
This study aims to produce a Sal forest map of Southern Nepal using the optimum number of vegetation index (VI) products from a MODIS time series. We focus on the following two research questions: (1) Can Sal forest be mapped in a coexisting heterogeneous forest environment using MODIS time series data? (2) Which period of the year provides optimum performance in terms of accuracy and processing cost?
The agricultural area had multiple, shorter phenological cycles: one during April–June and the next between June and November. The earlier cycle corresponded to vegetables and short cash crops, like mustard, while the later one was a paddy crop cycle. Pure Sal forest had a cycle that started in March and reached its peak in July. An earlier in-situ phenological study [20] found that the flowering of Shorea robusta started in March, shortly after the dry season, and took 3–4 months to reach full maturity in June. The trend is similar; however, there is some time difference. This gap might be due to other factors, such as temperature, rainfall, soil type, and aspect, which are not considered in this study. Moreover, the MODIS products used were 16-day composites, so the timing of phenological events can only be resolved to within the compositing period. The rise in the EVI values of all types from March is plausible, since day length starts increasing around this time, shortly after the cold winter in Nepal that runs from December to February. Little difference was seen between the EVI curves of all types from the end of December until February. This is typically the dry season, when all deciduous trees either lose their leaves or photosynthesize very little. The statistical analysis also revealed that the most significant months for Sal detection were June and July (X19–X29) and October and November (X36–X42) for the year 2015 in the study area. This could differ slightly for other years and locations due to the effect of climatic and geological factors on phenology.
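For readers unfamiliar with the indices behind these curves, the sketch below computes NDVI and EVI from surface reflectance using the standard formulas and the coefficients commonly used for the MODIS EVI (gain 2.5, C1 = 6, C2 = 7.5, L = 1). The MODIS VI products deliver these values precomputed, so this is only an illustration; the function names and the reflectance values are assumptions.

```python
import numpy as np


def ndvi(nir, red):
    """Normalized difference vegetation index."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    return (nir - red) / (nir + red)


def evi(nir, red, blue, gain=2.5, c1=6.0, c2=7.5, canopy_l=1.0):
    """Enhanced vegetation index with the commonly used MODIS coefficients."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    blue = np.asarray(blue, dtype=float)
    return gain * (nir - red) / (nir + c1 * red - c2 * blue + canopy_l)


# Hypothetical reflectances for a dense canopy pixel (made-up values).
nir, red, blue = 0.50, 0.10, 0.05
print(ndvi(nir, red), evi(nir, red, blue))
```

The blue-band term and the canopy background adjustment in the EVI denominator are what reduce its sensitivity to aerosols and delay saturation over dense vegetation.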
Both the statistical analysis and the machine learning classification suggested that EVI performs better than NDVI. The EVI results were superior to the NDVI results with SVM as well as with RF. For the EVI data set, the overall accuracy (OA) and kappa value were 78.4% and 0.69 with SVM and 70.6% and 0.59 with RF. For the NDVI data set of the same time series, the values were lower: OA of 68.6% and kappa of 0.57 with SVM, and OA of 65.4% and kappa of 0.53 with RF. Visual interpretation of the time series graph agreed with this result. A likely reason is that EVI was introduced to minimize the effect of aerosols and to overcome the tendency of NDVI to saturate over high biomass and high leaf area index (LAI), conditions that apply to the study area. In this study, the support vector machine (SVM) performed better than the random forest (RF) for Sal mapping. Nevertheless, both classifiers are widely used, and each has outperformed the other in different past studies. The accuracy obtained was comparable to that of past researchers: Chitale et al. [29], Peter Burai et al. [41], and Mathus Pinheiro et al. [42]. Field observation of many forests with varying species and interaction with local people, forest officers, and experts were also helpful in adding implicit knowledge of species distribution.
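The overall accuracy and kappa values reported above are both derived from a confusion matrix. The following sketch shows the standard calculation; the function name and the two-class matrix are hypothetical and are not the confusion matrix of this study.

```python
import numpy as np


def overall_accuracy_and_kappa(confusion):
    """Return (overall accuracy, Cohen's kappa) for a square confusion
    matrix with reference classes in rows and predictions in columns."""
    cm = np.asarray(confusion, dtype=float)
    n = cm.sum()
    observed = np.trace(cm) / n  # share of correctly classified samples
    # Agreement expected by chance from the row and column totals.
    expected = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n ** 2
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Made-up two-class example, for illustration only.
example = [[40, 10],
           [5, 45]]
oa, kappa = overall_accuracy_and_kappa(example)
print(f"OA = {oa:.2f}, kappa = {kappa:.2f}")
```

Kappa is lower than the overall accuracy because it discounts the agreement that would be expected by chance, which is why both figures are usually reported together.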
The Boruta algorithm suggested that all the features were relevant for classification, so the best result can be expected when all 46 images of the year are used. However, using fewer resources without significantly compromising performance is always justifiable, as a bigger set of input data is always costlier to process. The statistical properties of the VIs showed that the growing (June–July) and post-monsoon (October–November) seasons were the most critical periods for Sal mapping. During June and July, the early phase of the monsoon season, trees gain new leaves and hence VI values increase. Sal was found to have higher values than other trees during this season. During the post-monsoon phase, shrub and understory vegetation cover is generally high. However, Sal-dominated forest has less undergrowth, which might be why mix forest has higher VI values than Sal in this phase. This selection of crucial seasons also agrees with the Boruta results, as 10 of the 12 most important images suggested by Boruta were among the 21 images that were statistically selected.
As the number of input images was reduced, the processing time also decreased. Most importantly, the subsets of 8, 12, 16, 20 and 24 images suggested by the Boruta algorithm yielded results in 20, 24, 30, 40 and 45 h, respectively. The 12 most important images selected by the Boruta algorithm outperformed all the other reduced subsets. The 21 statistically selected images provided almost equal performance. The F-score for Sal was 0.79 with 12 images and 0.77 with 21 images, which were 90% and 87%, respectively, of the best result, obtained when the full set of images was used. Increasing the number of images to 16, 20 and 24 gave no better result, although processing time increased with the number of images. Different images might carry duplicate information, which would explain the similar performance despite the larger number of input images. Using the importance scores of the Boruta algorithm, the researcher is free to choose subsets with varying numbers of images and to determine the optimum set, considering the tradeoff between accuracy and time.
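The tradeoff described above can be made concrete with a short calculation on the reported figures. The sketch below derives the processing time per image for each Boruta subset and the full-set F-score implied by the reported percentage, and it shows how a subset could be picked from importance scores. The importance values and the helper `top_k_images` are hypothetical; only the hours and F-score figures come from the text, and the percentages are rounded, so the implied best score is approximate.

```python
# Reported processing time (hours) for Boruta subsets of a given size.
reported_hours = {8: 20, 12: 24, 16: 30, 20: 40, 24: 45}

# Hours per image for each subset size.
hours_per_image = {k: h / k for k, h in reported_hours.items()}

# F-score for Sal with 12 images and its reported share of the best result.
f_score_12, share_of_best = 0.79, 0.90
implied_best_f_score = f_score_12 / share_of_best  # roughly 0.88


def top_k_images(importance, k):
    """Return the k image names with the highest importance scores."""
    return sorted(importance, key=importance.get, reverse=True)[:k]


# Hypothetical importance scores keyed by image name (illustration only).
importance = {"X20": 0.9, "X05": 0.1, "X38": 0.7, "X25": 0.8}

print(hours_per_image)
print(round(implied_best_f_score, 2))
print(top_k_images(importance, 2))
```

On the reported numbers, the cost per image stays at roughly two hours for the subsets of 12 to 24 images, so each additional image adds processing time without the accuracy gain that would justify it.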
Major misclassification occurred between the Sal and mix classes, as well as between Agriculture and RVN. This was consistently indicated by the graphical representation, the JM scores, and the confusion matrix. In the field, it was observed that the mix forest also contained Sal in many places. Moreover, most species in the mix forest were broad-leaved, like Sal. Whereas Sal was found mostly in homogeneous stands covering a large spatial extent, the case was different for mix and RVN: in certain areas, they were found amid agricultural fields, on river banks, or in the middle of settlements, covering comparatively small areas. So, the coarse 250 m spatial resolution of the MODIS images was certainly a limitation. Obtaining accurate location measurements in deep forest with a handheld GPS device was another challenge and is one possible source of error. The varying age of trees and the varying canopy density might also have contributed some errors. Geological, geographical and climatic factors, such as soil type, aspect, elevation, rainfall and surface temperature, were not investigated in this research. In further studies, these factors could be considered in order to improve the understanding of species phenology and increase the accuracy of Sal mapping. Since this study has identified the major seasons of the year for image acquisition, the spatial resolution of the classification can be improved by using higher resolution satellite images from other sensors, like Landsat or Sentinel, if cloud-free images are available.