1. Introduction
Oil palm trees are important economic crops. In addition to their main use in producing palm oil, oil palms are also used to make a variety of products such as plywood, paper, and furniture [1]. Information about the locations and the number of oil palm trees in a plantation area is important in several respects. First, it is essential for predicting the yield of palm oil, which is the most widely used vegetable oil in the world. Second, it provides vital information for understanding the growing situation of palm trees after planting, such as the age or the survival rate of the palm trees. Moreover, it informs the development of irrigation processes and helps maximize productivity [2].
Remote sensing has played an important role in various studies on oil palm productivity, the age of oil palm trees, and oil palm mapping [3,4,5,6,7,8]. In recent years, high-resolution remote sensing images have become increasingly popular and important for many applications, including automatic palm tree detection. Previous palm tree or tree crown detection research has usually been based on traditional methods from the computer vision domain. For instance, a tree detection–delineation algorithm was designed for tree crown detection in high-resolution digital imagery, based on the local maximum filter and the analysis of local transects extending outward from a potential tree apex [9]. Shafri et al. [10] presented an approach for oil palm tree extraction and counting from high spatial resolution airborne imagery, which is composed of several parts including spectral analysis, texture analysis, edge enhancement, segmentation, morphological analysis, and blob analysis. Ke et al. [11] reviewed various methods for automatic individual tree-crown detection and delineation from passive remote sensing, including local maximum filtering, image binarization, scale analysis, and template matching. Srestasathiern et al. [12] used semi-variogram computation and non-maximal suppression for palm tree detection from high-resolution multi-spectral satellite images.
Moreover, some researchers have also applied machine learning-based methods to palm tree detection. Malek et al. [2] used a scale-invariant feature transform (SIFT) and a supervised extreme learning machine classifier to detect palm trees from unmanned aerial vehicle (UAV) images. Manandhar et al. [13] used circular autocorrelation of the polar shape matrix representation of an image as the shape feature and a linear support vector machine to standardize and reduce dimensions of the feature. This study also used a local maximum detection algorithm on the spatial distribution of standardized features to detect palm trees. Previous palm tree or tree crown detection studies have focused on detecting trees that are not very crowded and have achieved good detection results for their study areas. However, the performance of some of these methods would deteriorate when detecting palm trees in some of the regions of our study area. For instance, the local maximum filter based method [9] cannot detect palm trees correctly in regions where the trees are very young and small, as the local maximum of each filter is not located near the apex of young palm trees. The template matching method [10] is not suitable for regions where palm trees are very crowded and where their crowns overlap.
The convolutional neural network (CNN), a widely used deep learning model, has achieved great performance in many studies in the computer vision field, such as image classification [14,15], face recognition [16,17], and pedestrian detection [18,19]. In recent years, deep learning based methods have also been applied to hyperspectral image classification [20,21], large-scale land cover classification [22], scene classification [23,24,25], and object detection [26,27] in the remote sensing domain and achieved better performance than traditional methods. For instance, Chen et al. [20] introduced the concept of deep learning and applied the stacked autoencoder method to hyperspectral remote sensing image classification for the first time. Li et al. [22] built a classification framework for large-scale remote sensing image processing and African land cover mapping based on the stacked autoencoder. Zou et al. [24] proposed a deep belief network based feature selection method for remote sensing scene classification. Chen et al. [26] proposed a hybrid deep convolutional neural network for vehicle detection in high-resolution satellite images. Vakalopoulou et al. [27] proposed an automated building detection framework from very high-resolution remote sensing data based on deep convolutional neural networks.
In this paper, we introduce a deep learning based method to oil palm tree detection for the first time. We propose a CNN based framework for the detection and counting of oil palm trees using high-resolution remote sensing images from Malaysia. The detection and counting of oil palm trees in our study area is more difficult than in the previous palm detection research mentioned above, as the trees are very crowded and their crowns often overlap. In our proposed method, we first collect a number of manually interpreted training and test samples for training the convolutional neural network and calculating the classification accuracy. Second, we optimize the convolutional neural network by tuning its main parameters to obtain the best CNN model. Third, we use this best CNN model to predict the labels for all the samples in an image dataset that are collected through the sliding window technique. Finally, we merge the predicted palm tree coordinates corresponding to the same palm tree (spatial distance less than a certain threshold) into one coordinate, and obtain the final palm tree detection results. Compared with the manually interpreted ground truth, more than 96% of the oil palm trees in our study area can be detected correctly, which is higher than the accuracies of the other three tree detection methods used in this study. The detection accuracy of our proposed method is affected, to some extent, by the limited number of our manually interpreted samples. In our future work, more manually interpreted samples will be collected to further improve the overall performance of our proposed method.
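The final merging step can be illustrated with a short sketch. This is not the authors' implementation: the function name `merge_detections`, the greedy centroid-based grouping, and the threshold value used in the example are all assumptions made for illustration.

```python
import math


def merge_detections(points, threshold):
    """Greedily merge detections that lie closer than `threshold`
    to an existing cluster centroid (hypothetical helper)."""
    clusters = []  # each cluster is stored as [sum_x, sum_y, count]
    for x, y in points:
        for cluster in clusters:
            cx = cluster[0] / cluster[2]
            cy = cluster[1] / cluster[2]
            if math.hypot(x - cx, y - cy) < threshold:
                # Same tree: fold this detection into the cluster.
                cluster[0] += x
                cluster[1] += y
                cluster[2] += 1
                break
        else:
            # No nearby cluster: start a new one.
            clusters.append([x, y, 1])
    # One averaged coordinate per cluster.
    return [(c[0] / c[2], c[1] / c[2]) for c in clusters]


# Three overlapping detections of one tree plus one distant detection.
merged = merge_detections([(10, 10), (11, 10), (12, 11), (40, 40)], threshold=5)
print(len(merged))
```

A greedy pass like this depends on the order of the input points; it is used here only because it is short, not because the paper specifies it.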
The rest of this paper is organized as follows. Section 2 presents the study area and the datasets of this research; Section 3 describes the flowchart and the details of our proposed method; Section 4 provides the detection results of our proposed method and the performance comparison with other methods; and Section 5 presents some important conclusions of this research.
5. Discussion
To further evaluate our proposed palm tree detection method, we implemented three other representative existing palm tree or tree crown detection methods (i.e., Artificial Neural Network (ANN), template matching, and local maximum filter) and compared their detection results with those of our proposed method. The procedure of the ANN based method is the same as that of our proposed method, including the ANN training, parameter optimization, image dataset label prediction, and sample merging.
The local maximum filter based method [9] and the template matching based method [11] are two traditional tree crown detection methods. For the template matching based method, we used 5000 manually labeled palm tree samples as the template dataset and slid a 17 × 17 window across the whole image. We chose the CV_TM_SQDIFF_NORMED method provided by OpenCV [30] as our matching method. A sliding window is detected as a palm tree if it matches any sample in the template dataset, that is, if the difference between the sliding window and the template calculated by the CV_TM_SQDIFF_NORMED method is less than a threshold. In this study, the threshold is set to five through experimental tests.
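The matching rule can be sketched in a few lines. The code below is a pure-Python illustration of the normalized squared difference that OpenCV documents for this matching mode (sum of squared differences divided by the square root of the product of the two patch energies); it does not call OpenCV itself. The function names `sqdiff_normed` and `matches_any`, the tiny 2 × 2 patches, and the threshold values are assumptions for illustration only.

```python
import math


def sqdiff_normed(window, template):
    """Normalized squared difference between two equally sized 2-D patches."""
    num = sum(
        (w - t) ** 2
        for w_row, t_row in zip(window, template)
        for w, t in zip(w_row, t_row)
    )
    energy_w = sum(w * w for row in window for w in row)
    energy_t = sum(t * t for row in template for t in row)
    denom = math.sqrt(energy_w * energy_t)
    # An all-zero patch has no defined score; treat it as "no match".
    return num / denom if denom > 0 else float("inf")


def matches_any(window, templates, threshold):
    """True if the window is closer than `threshold` to at least one template."""
    return any(sqdiff_normed(window, t) < threshold for t in templates)


template = [[1, 2], [3, 4]]
window = [[1, 2], [3, 5]]
print(sqdiff_normed(window, template))
print(matches_any(window, [template], threshold=0.05))
```

In practice the window would be 17 × 17 and the template list would hold the labeled palm samples; a smaller score means a closer match, which is why the rule uses "less than" the threshold.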
For the local maximum filter based method, we first applied a non-overlapping 10 × 10 local maximum filter to the absolute difference image of the NIR and red spectral bands. Then, we applied transect sampling and a scaling scheme to obtain potential tree apexes, and adjusted the locations of tree apexes to the new local maximum positions.
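The first step of this procedure can be sketched as follows. This is a minimal illustration under stated assumptions, covering only the non-overlapping block maximum on the band difference image, not the transect sampling or scaling scheme. The function names `abs_difference` and `block_local_maxima`, the list-of-lists image representation, and the toy 4 × 4 data with a block size of 2 are hypothetical.

```python
def abs_difference(nir, red):
    """Pixel-wise absolute difference of two equally sized 2-D bands."""
    return [
        [abs(n - r) for n, r in zip(n_row, r_row)]
        for n_row, r_row in zip(nir, red)
    ]


def block_local_maxima(image, size):
    """Return (row, col, value) of the maximum pixel in each
    non-overlapping size x size block of the image."""
    rows, cols = len(image), len(image[0])
    peaks = []
    for r0 in range(0, rows, size):
        for c0 in range(0, cols, size):
            # Blocks at the image border may be smaller than size x size.
            value, r, c = max(
                (
                    (image[r][c], r, c)
                    for r in range(r0, min(r0 + size, rows))
                    for c in range(c0, min(c0 + size, cols))
                ),
                key=lambda item: item[0],
            )
            peaks.append((r, c, value))
    return peaks


nir = [[5, 1, 0, 0], [1, 1, 0, 9], [0, 0, 2, 2], [7, 0, 2, 3]]
red = [[0, 0, 0, 0] for _ in range(4)]
print(block_local_maxima(abs_difference(nir, red), size=2))
```

With a 10 × 10 block on real imagery, each returned position is only a candidate apex; the refinement steps described above would then adjust or discard it.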
Finally, the outputs of the template matching based method and the local maximum filter based method are post-processed (described in Section 3.4) to obtain the final palm tree detection results.
Figure 6, Figure 7 and Figure 8 show the detection images of each method for extracted areas of regions 1, 2 and 3, respectively. Each red circle denotes a detected palm tree. Each green square denotes a palm tree in ground truth that cannot be detected correctly. Each blue square denotes a background sample that is detected as a palm tree by mistake.
Table 2, Table 3 and Table 4 show the detection results of ANN, template matching (TMPL), and local maximum filter (LMF), respectively. Table 5 summarizes the performance of all four methods in terms of the number of correctly detected palm trees. Table 6 summarizes the performance of all four methods in terms of precision, recall and overall accuracy (OA). The proposed method (CNN) performs better than any of the other three methods in the number of correctly detected palm trees and in OA. Generally, machine learning based approaches (i.e., CNN and ANN) perform better than traditional tree crown detection methods (i.e., TMPL and LMF) in our study area, especially in region 1 and region 2. For example, the local maximum filter based method cannot detect palm trees correctly in regions where palm trees are very young and small (see Figure 7d), as the local maximum of each filter is not located near the apex of young palm trees. The template matching method is not suitable for regions where the palm trees are very crowded and the canopies often overlap (see Figure 6c).
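For readers who want to reproduce such a comparison, the metrics can be computed from detection counts as in the sketch below. Precision and recall follow their standard definitions. The OA formula used here (the mean of precision and recall) is an assumption, since this section does not define it, and the counts in the example are made-up values, not results from the tables.

```python
def detection_scores(true_positives, false_positives, false_negatives):
    """Precision, recall and an assumed overall accuracy from detection counts.

    true_positives:  detected trees that match the ground truth
    false_positives: background samples detected as trees
    false_negatives: ground truth trees that were missed
    """
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    # Assumption: OA is taken as the mean of precision and recall.
    overall_accuracy = (precision + recall) / 2
    return precision, recall, overall_accuracy


# Hypothetical counts for illustration only.
precision, recall, oa = detection_scores(90, 10, 30)
print(precision, recall, oa)
```

In terms of the figures, the true positives correspond to the red circles, the false negatives to the green squares, and the false positives to the blue squares.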