Automatic Counting of Large Mammals from Very High Resolution Panchromatic Satellite Imagery

Abstract: Estimating animal populations by direct counting is an essential component of wildlife conservation and management. However, conventional approaches (i.e., ground survey and aerial survey) have intrinsic constraints. Advances in image data capture and processing provide new opportunities for using applied remote sensing to count animals. Previous studies have demonstrated the feasibility of using very high resolution multispectral satellite images for animal detection, but to date, the practicality of detecting animals from space using panchromatic imagery has not been proven. This study demonstrates that it is possible to detect and count large mammals (e.g., wildebeests and zebras) from a single, very high resolution GeoEye-1 panchromatic image in open savanna. A novel semi-supervised object-based method that combines a wavelet algorithm and a fuzzy neural network was developed. To discern large mammals from their surroundings and discriminate between animals and non-targets, we used the wavelet technique to highlight potential objects. To make full use of geometric attributes, we carefully trained the classifier, using the adaptive-network-based fuzzy inference system. Our proposed method (with an accuracy index of 0.79) significantly outperformed the traditional threshold-based method (with an accuracy index of 0.58) in detecting large mammals in open savanna.


Introduction
Global biodiversity loss is a pressing environmental issue [1]. Populations of a number of wild animals have been reduced by half over the past four decades [2,3]. Counting wild animals to determine population size is an essential element of wildlife conservation and environmental management [4]. However, accurate population estimation using ground-based methods remains challenging, requiring considerable investment in resources and time [5]. Aerial surveys have been used as an alternative approach to detect large mammal populations and generate statistical estimates of their abundance in open areas [6]. In developed countries, wildlife such as caribou, elk, deer and moose have been monitored using aerial surveys [7][8][9]. For developing nations, where scores of endangered and threatened fauna are found, such an alternative is not always feasible due to limitations in access, technology, aircraft availability and skilled human resources [10,11]. It is therefore desirable to develop alternative approaches for conducting wildlife population counts in such regions.
Advances in satellite technology have provided new avenues in remote sensing for environmental applications, including the remote counting and mapping of animal populations. Lower spatial resolution satellite images have proven inadequate to detect and count individual animals [12], but the availability of commercial satellite images with a spatial resolution of one meter or less (e.g., IKONOS, Remote Sens. 2017, 9, 878 2 of 16 QuickBird, GeoEye and WorldView) has made such an undertaking more feasible [13]. As a result, studies have been undertaken utilizing satellite remote sensing data to detect animals. For example, Fretwell et al. [14] successfully estimated the abundance of penguins from fecal staining of ice by using a combination of medium resolution (15-30 m) Landsat-7 ETM+ and very high resolution (0.6-2.5 m) QuickBird satellite images, but they did not attempt to count individual birds. Stapleton et al. [15] used different very high resolution (VHR) satellite images (i.e., QuickBird, WorldView-1 and WorldView-2) to track the distribution and abundance of polar bears. Although their findings demonstrated the potential of remote sensing applications for wildlife detection and monitoring, they also revealed the need for more automated detection processes to expedite analysis. Yang et al. [16] explored mammal detection in open savanna country from VHR (0.5-2 m) GeoEye-1 satellite images, using a hybrid image classification approach. Through a two-step process of pixel-based and object-based image classification, they were able to demonstrate the feasibility of automated detection and counting of large wild animals in vast open spaces. However, the method they proposed requires the input by an expert of a number of parameters, and therefore this method remains subjective and labor-intensive. Fretwell et al. [17] compared a number of classification techniques endeavoring to automatically detect whale-like objects. 
They found that a simple thresholding technique of the panchromatic and coastal band delivered the best results. Neither Stapleton et al. [15] nor Fretwell et al. [17] made full use of the multispectral band, while the panchromatic band played an important role in their research. To our knowledge, there has been no substantial exploration of the feasibility of using a single panchromatic (black and white) band for wildlife detection. The typical panchromatic band data obtained from airborne platforms have a much wider spectral range than is utilized by multispectral bands (red, green, blue) [18], and also have a higher radiometric resolution (number of bits per pixel). Moreover, panchromatic satellite images have a higher spatial resolution than multispectral images [19].
Object counting can also be achieved with computer vision techniques, such as local feature-based subspace clustering algorithms [20,21] and global feature-based saliency detection approaches [22][23][24][25]. The conventional clustering method, such as the K-means clustering algorithm, has been used to extract local features, but its performance relies on finding "similar" records in the training data and could therefore be highly influenced by noise [21]. Data in a specific category can also be well-represented by low-dimensional subspace where noise can be reduced [26]. To achieve a good result by eliminating the influence of errors (e.g., noise, outliers), Peng et al. [20] proposed a graph-oriented learning method, which applied the L2-Graph for subspace learning and subspace clustering, for facial recognition and moving-vehicle detection [26]. However, studies on subspace clustering mainly concentrate on high-dimensional data clustering, such as facial recognition and motion image segmentation. Saliency detection is a well-researched problem in computer vision. It aims at indicating the saliency likelihood of each pixel by generating bounding boxes, binary foreground and background segmentation, or saliency maps [27]. The aforementioned methods have proven to be useful for multi-level features with multi-band images, but are difficult to apply to a single-band image where the object consists of few pixels.
Aerial photographs have been used for bird censuses since the 1980s, counting image points falling below an established threshold [28,29]. Bajzak and Piatt [29] studied the greater snow goose, contrasting its white plumage against the surrounding mud flats by size and tonal class. Similarly, a panchromatic image can use thresholding as a simple image segmentation method that divides an image into objects and background [30][31][32]. It works well when targets contrast sharply with their background. However, thresholding methods have their limitations: (1) targets cannot be separated from ground elements with similar brightness values; (2) gray value thresholding does not make full use of geometric information; and (3) threshold values are defined manually and depend heavily on the user's expertise.
Animal detection using remote sensing then predominantly switched to a two-step process [33]: (1) highlighting suspected targets; and then (2) classifying them, using geometric information. Groom et al. [33] proposed a scheme using geometric feature (object-size) filters to count birds against a monochromatic background. As targets were visually small and dim, they were not easily discerned against their background [34]. Using filters and image processing techniques, targets embedded in the scene could be visualized and detected [35][36][37][38]. However, the performance of such filters remains dependent on the brightness contrast between the target and background [34]. Several studies have employed wavelet-based techniques to address this concern [39][40][41]. The discernibility of targets from the background may vary at different scales, which can be problematic for object detection [19,42]. Wavelet analysis can transform signals into multiple resolutions, using an adaptive window [43], and thereby latently detect targets in cluttered backgrounds.
After highlighting the targets, the major challenge becomes how to make full use of geometric features to help separate a target from its surroundings. Spectral characteristics, cluster size, shape and other spatial features have been used in rule sets for image segmentation [44]. McNeill et al. [45] analyzed potential regions using shapes, by rejecting those with a compactness greater than a specified threshold value. Descamps et al. [46] counted large birds by fitting suspected objects (birds) into bright ellipses surrounded by a darker background. Expert knowledge can also play a critical role in image classification [16,47,48]. For example, Yang et al. [16] developed a specific rule set using expert knowledge to remove misclassified objects generated by object-based analysis. In another study, Wang et al. [47] proposed a hybrid neural network and expert system to quantify understory bamboo from satellite imagery, and they concluded that integration of a neural network and expert system appeared to be more efficient than when using either a neural network or an expert system alone. However, these methods rely on experts' subjective experience and knowledge, which can be challenging for practical applications.
An alternative approach to using an expert system is machine learning: a data analysis technique that automates model building through algorithms that iteratively learn from a given dataset. Though different classifiers based on machine learning generate varying levels of accuracy for different datasets [49], the most recent machine-learning techniques have a proven ability to solve complex problems [50]. For example, convolutional neural networks (CNNs) [51] have emerged as state-of-the-art models for image classification and object detection [52][53][54][55][56][57]. Local connections, shared weights, pooling and multiple layers are four architectural factors that make CNNs excel in processing natural signals [58]. However, the human involvement level is high when tailoring the CNN algorithm to a specific task [59], and large data sets are required for training purposes to ensure a high quality output [60]. Another major limitation of CNNs is their intrinsic black-box nature: their internal workings are hidden and not easily understood [61], so the models they generate are difficult to explain [62]. The fuzzy neural network (FNN) is an alternative model that incorporates both the explicit knowledge representation of a fuzzy inference system (FIS) and the learning ability of an artificial neural network [63,64]. The McCulloch-Pitts model [65] was one of the earliest applications to use fuzzy sets with a neural network concept. Since the 1990s, Takagi and others have developed a solid foundation for the fuzzy neural network [66]. In 1993, Jang proposed the adaptive-network-based fuzzy inference system (ANFIS) [67]. This algorithm has been widely employed in applied mathematics [68][69][70][71], and, unlike traditional expert systems, does not require a high level of expert knowledge when developing decision rules.
This study aims to detect and count large mammals in open spaces from a single, VHR GeoEye-1 panchromatic image, using a novel semi-supervised object-based scheme that combines a wavelet algorithm and a fuzzy neural network.

Study Area and Animal Species
The study area is located in the Maasai Mara National Reserve (also known as Maasai Mara or the Mara), a large game reserve in the Great Rift Valley in the southern part of Kenya (Figure 1). The reserve's topography is mainly open savanna (grassland) with clusters of acacia trees along the southeastern area of the park [72]. The reserve not only protects the habitat of resident species, but also preserves a critical part of the route used by wildebeests and zebras during the great migration that traverses the Maasai Mara via the Serengeti National Park. The wildebeest is the dominant species of the Maasai Mara, and herd sizes can range from a few individuals to many thousands [73]. Serengeti wildebeests migrate seasonally, and are seen intermittently in the Mara between August and November [74]. The sheer numbers of animals that congregate during migration make the wildebeest an ideal candidate species to map through the use of satellite technology.

Satellite Images
We acquired two GeoEye-1 satellite images of part of the Maasai Mara National Reserve through the DigitalGlobe Foundation (www.digitalglobefoundation.org/), each covering an area of 25 km². Both images are cloud free, and include one panchromatic (0.5 m) and four multispectral (2 m) bands. The image captured on 11 August 2009 depicts large numbers of animals. The other image, without any large animals present, was captured on 10 August 2013. To address our research objective, we carefully selected three small pilot study areas from the first image, each covering an area of 120 × 120 m (Figure 2). These pilot study areas were chosen to represent different levels of complexity regarding three criteria: (a) complexity of the landscape; (b) abundance of animals; and (c) feasibility and reliability of the visual interpretation of target animals. Pilot area No. 1 represents low complexity, with a few dozen animals viewed against a uniform background; Pilot area No. 2 represents moderate complexity, with more than one hundred animals viewed against a slightly less uniform background; and Pilot area No. 3 represents high complexity, with several hundred animals viewed against a non-uniform background.

Visual Interpretation to Establish Ground Truth for Large Animals Discerned on GeoEye-1 Imagery
Ground truth is required to calibrate the model, as well as to validate the classification result. In the panchromatic band of the GeoEye-1 image, large mammals (e.g., wildebeests and zebras) appear as objects 3-4 pixels long and 1-2 pixels wide [16]. Due to their similarity in size, large animals can be confused with small ground features such as bushes and termite mounds [75]. To facilitate the visual interpretation of target animals and reduce subjectivity, we used one pan-sharpened GeoEye-1 image with, and one without, the presence of large animals (Figure 3). We invited two experienced wildlife researchers from Africa as independent visual interpreters. Together we visually compared the two images, acquired on different dates, of the three pilot study locations at multiple scales in the ArcGIS 10.3.1 environment (ESRI Inc., Redlands, CA, USA). After the observers had discussed their interpretation results, especially regarding uncertain objects, and had agreed which identified objects were indeed large mammals, the agreed objects were recorded as confirmed animal ground truth points. In total, we identified 50, 128 and 426 large mammals in the pilot study areas 1, 2 and 3, respectively.

Semi-Automatic Animal Detection Algorithm
Large mammals were identified by a series of multistage, semiautomatic techniques in VHR panchromatic satellite images. Our proposed scheme includes four principal steps (Figure 4): image preprocessing, preclassification, reclassification and accuracy assessment. Visual interpretation was incorporated for the purpose of reclassification and accuracy assessment.

Figure caption (partially recovered): August 2009, with large animals (bottom). The three pilot study areas represent the complexity of the landscape and the abundance of animals appearing in these images, from left to right: low, moderate and high.

Image Preprocessing
To highlight large mammals in the panchromatic imagery, we applied a histogram stretch in ENVI 5.2 (Exelis Visual Information Solutions, Inc., Boulder, CO, USA). Due to the limited resolution of the panchromatic band of VHR satellite images, an individual animal is represented by a cluster of no more than eight pixels. To make full use of the animals' geometric information, we resampled the original image. Bicubic interpolation, which uses weighted arithmetic means, was chosen, as it maintains the quality of detailed information through antialiasing [76]. Taking the wavelet decomposition performance, memory and computation time into account, the image was resized to eight times its original size. The original image is written as

$$I = \left(a_{i,j}\right)_{m \times n} \qquad (1)$$

which is a matrix with m rows and n columns. The resampled (new) image is obtained from the original image using λ, the diagonal matrix of the resize scale, and f, the bicubic interpolation function.
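The resampling step can be illustrated with a minimal pure-Python sketch of bicubic (cubic convolution) upscaling. It is not the authors' implementation: the function names, the kernel parameter a = -0.5 and the edge handling (clamping to the border pixel) are assumptions made for illustration, and no antialiasing filter is applied because the image is only enlarged.

```python
import math

def keys_kernel(x, a=-0.5):
    """Cubic convolution kernel; a = -0.5 is an assumed, commonly used value."""
    x = abs(x)
    if x <= 1:
        return (a + 2) * x**3 - (a + 3) * x**2 + 1
    if x < 2:
        return a * x**3 - 5 * a * x**2 + 8 * a * x - 4 * a
    return 0.0

def resize_1d(samples, scale):
    """Upscale a 1-D sequence by an integer factor using four neighbours."""
    n = len(samples)
    out = []
    for k in range(n * scale):
        x = (k + 0.5) / scale - 0.5          # output pixel centre in input coordinates
        base = math.floor(x)
        value = 0.0
        for i in range(base - 1, base + 3):  # the four nearest input samples
            clamped = min(max(i, 0), n - 1)  # repeat border pixels at the edges
            value += samples[clamped] * keys_kernel(x - i)
        out.append(value)
    return out

def resize_bicubic(image, scale):
    """Separable bicubic upscaling: rows first, then columns."""
    rows = [resize_1d(row, scale) for row in image]
    cols = [resize_1d(list(col), scale) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]

# Hypothetical 2 x 2 patch enlarged eight times, as in the resize factor above.
patch = [[1.0, 1.0], [1.0, 1.0]]
enlarged = resize_bicubic(patch, 8)
print(len(enlarged), len(enlarged[0]))
```

With a factor of eight, an animal that spans 3-4 pixels in the 0.5 m panchromatic band spans roughly 24-32 pixels in the resampled image, which gives the later geometric features more samples to work with.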

Wavelet-Based Preclassification
Based on the generally accepted methodology of image decomposition and reconstruction, we used a wavelet-based method to highlight suspected large mammals, enhancing their contrast against the immediate surroundings and suppressing irrelevant background [77,78]. The wavelet transform (WT) builds on the theory of the short-time Fourier transform (STFT) [79]. The WT differs from the STFT in that it replaces infinite trigonometric basis functions with finitely decaying wavelet bases. These wavelet bases, which are stretched (or squeezed) and translated versions of the mother wavelet, have an average value of 0 [80]. The WT of a continuous signal s(t) is defined as

$$WT(a,b) = w(a)\int_{-\infty}^{\infty} s(t)\,\psi^{*}\!\left(\frac{t-b}{a}\right)\,dt$$

where a is the scale, b is the position parameter, w(a) is a weighting function and $\psi^{*}\!\left(\frac{t-b}{a}\right)$ is the wavelet base [81]. If the wavelet base corresponds closely to the input signal, the WT coefficient at this position is high [82]. The optimal mother wavelet and parameters were selected by comparing how well mainstream wavelet families maintained the geometric features of suspected targets in our experimental imagery. A Haar wavelet (or db1 wavelet) was selected, as it is discontinuous and is therefore able to detect signals containing a sudden transition [83].
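To illustrate why a discontinuous wavelet responds to sudden transitions, the following minimal sketch applies one level of a discrete, orthonormal Haar transform to a hypothetical 1-D step signal. The function name and the signal values are illustrative assumptions, not taken from the study.

```python
import math

def haar_step(signal):
    """One level of the orthonormal Haar transform on an even-length sequence.

    Returns (approximation, detail): scaled pairwise sums and differences.
    """
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail

# Hypothetical brightness profile with one abrupt jump (e.g., background to target).
profile = [0, 0, 0, 5, 5, 5, 5, 5]
approx, detail = haar_step(profile)
print(detail)
```

The detail coefficients are zero wherever the profile is flat and non-zero only for the pair of samples that straddles the jump, which is the behavior the selection of the Haar wavelet relies on.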
The image was transformed into a series of sub-images: A1 (low-frequency image), H1 (high-frequency image in the horizontal direction) and V1 (high-frequency image in the vertical direction); the same procedure was then applied to the low-frequency image (A1). This method permits multiresolution processing in both directions. After three transformation iterations, nine sub-images were generated, containing details as well as background. To highlight suspected targets and suppress background information, a weighted fusion algorithm was used. We calculated the mean-square error (MSE) [84] between each sub-image (resized to the original size) and the original image. Sub-images containing more high-frequency information yielded higher MSE values. The weight of each sub-image was derived from these MSE values, where i and j are the serial numbers of the sub-images, $\sigma_{i(j)}$ is the MSE of the current sub-image, and n is the total number of calculated sub-images. The weighted fusion algorithm creates an image with a high signal-to-noise ratio (SNR). We then used Otsu's method [85] in MATLAB (The Mathworks Inc., Natick, MA, USA) to discriminate between each suspected animal blob and the background.
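The fusion and thresholding steps can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the weights are assumed to be each sub-image's MSE divided by the sum of all MSEs (one plausible normalization consistent with the symbols defined above), pixel values are assumed to be integers in 0-255 for the threshold search, and all function names are hypothetical.

```python
def mse(image_a, image_b):
    """Mean-square error between two equally sized 2-D lists."""
    diffs = [(a - b) ** 2
             for row_a, row_b in zip(image_a, image_b)
             for a, b in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)

def fusion_weights(mse_values):
    """Assumed weighting: each MSE as a share of the total, so weights sum to 1."""
    total = sum(mse_values)
    return [m / total for m in mse_values]

def fuse(sub_images, weights):
    """Pixel-wise weighted sum of equally sized sub-images."""
    rows, cols = len(sub_images[0]), len(sub_images[0][0])
    return [[sum(w * img[r][c] for w, img in zip(weights, sub_images))
             for c in range(cols)] for r in range(rows)]

def otsu_threshold(pixels, levels=256):
    """Otsu's method: pick the gray level that maximizes between-class variance."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w_bg, sum_bg, best_t, best_var = 0, 0, 0, -1.0
    for t in range(levels):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue                      # no background pixels yet
        w_fg = total - w_bg
        if w_fg == 0:
            break                         # no foreground pixels left
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t                         # pixels with value > best_t are foreground

# Hypothetical fused image flattened to a list: dark background, a few bright blobs.
pixels = [10] * 90 + [200] * 10
t = otsu_threshold(pixels)
print(t, sum(p > t for p in pixels))
```

Weighting by MSE share gives more influence to sub-images that carry more high-frequency detail, which matches the stated goal of emphasizing suspected targets over background.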

Selecting Geometric Features
The next concern was how to identify which suspected large mammals were true large mammals. This entailed deciding which geometric features to use, typically length and area. We also considered pixel gray values (hue). We used cross-validation (a model assessment technique) to verify the performance of classifiers [86]. This involves grouping the raw data: one group is used as the training set and the other for validation. K-fold cross-validation (K-CV) is a commonly used validation technique in object detection [86,87]. We divided the data into ten groups, and used each group once as the training dataset while the other nine groups acted as the validation dataset. We determined the most suitable feature combination for this experiment by calculating the average training and checking errors on this dataset for different feature combinations. After employing K-fold cross-validation multiple times, we decided that a combination of area, major axis length, minor axis length and bounding box area was most suitable for this experiment.
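The grouping used in K-fold cross-validation can be sketched as below. The function names, the random seed and the sample count are illustrative assumptions; which side of each split serves as the training set is left to the caller, so the sketch works with the assignment described above.

```python
import random

def k_fold_groups(n_samples, k=10, seed=0):
    """Shuffle sample indices and deal them into k roughly equal groups."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def splits(groups):
    """Yield (single_group, all_other_groups) pairs, one per group."""
    for i, group in enumerate(groups):
        rest = [idx for j, g in enumerate(groups) if j != i for idx in g]
        yield group, rest

# Hypothetical dataset of 100 blobs split into ten groups.
groups = k_fold_groups(100, k=10)
for single, others in splits(groups):
    # Train and evaluate a classifier here for one feature combination,
    # then average the training and checking errors over the ten rounds.
    pass
print([len(g) for g in groups])
```

Repeating the loop for each candidate feature combination and comparing the averaged errors is one way to arrive at a selection such as the four features named above.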

ANFIS-Based Reclassification
A total of 100 blobs (or unknown objects) were randomly selected from the database to train the final model. The distribution of the training data was comparable to the distribution of the whole dataset. Before we trained these data using ANFIS, the number of rules had to be decided upon. Fuzzy C-Means (FCM, or Fuzzy ISODATA), which was originally designed by Dunn [88], is a well-accepted clustering algorithm ideally suited to solving a natural problem [89,90]. As shown in Figure 5, this algorithm generated 10 cluster centers (corresponding to 10 membership functions for each variable). To limit the number of feature fields, we used expert knowledge to eliminate redundant classes. Finally, we input the 100 randomly selected blobs to train ANFIS in MATLAB. With the function 'genfis2', we built an initial fuzzy inference system (FIS) structure. We then loaded the initial FIS structure into the function 'anfis' to train the ANFIS and develop the model. A hybrid method, combining least-squares and backpropagation gradient descent, was applied to optimise the model. The ANFIS model was evaluated using the 'evalfis' function. The required parameters of the 'anfis' function (training error goal, initial training step size, step size decrease rate and step size increase rate) were set to their default values (0, 0.01, 0.9 and 1.1, respectively), which have proven adequate for most situations [91]. To avoid overfitting, we set the epoch number to 75 by considering both the training error and the checking error (see Appendix A). The adaptive tuning stops when the least-squares error falls below the training error goal, or when the epoch number has been reached. By loading all the datasets containing feature values into the model, all suspected blobs were classified by the inference system into targets and non-targets.
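The kind of model that ANFIS tunes can be illustrated with the forward pass of a first-order Sugeno fuzzy inference system. The sketch below only evaluates a fixed rule base; it does not reproduce the MATLAB functions or the hybrid training procedure, and every rule parameter, feature value and name in it is a hypothetical placeholder rather than a value from the study.

```python
import math

def gaussian_mf(x, centre, sigma):
    """Gaussian membership function value in (0, 1]."""
    return math.exp(-0.5 * ((x - centre) / sigma) ** 2)

def sugeno_predict(features, rules):
    """First-order Sugeno inference.

    Each rule holds one Gaussian membership function per feature
    ('centres', 'sigmas') and a linear consequent ('coeffs', 'bias').
    """
    strengths, outputs = [], []
    for rule in rules:
        strength = 1.0
        for x, c, s in zip(features, rule["centres"], rule["sigmas"]):
            strength *= gaussian_mf(x, c, s)          # product T-norm
        linear = rule["bias"] + sum(k * x for k, x in zip(rule["coeffs"], features))
        strengths.append(strength)
        outputs.append(linear)
    total = sum(strengths)
    if total == 0.0:                                   # no rule fires: fall back to the mean
        return sum(outputs) / len(outputs)
    return sum(w * y for w, y in zip(strengths, outputs)) / total

# Hypothetical rules over (area, major axis, minor axis, bounding box area).
rules = [
    {"centres": [20, 7, 3, 28], "sigmas": [5, 2, 1, 8],
     "coeffs": [0, 0, 0, 0], "bias": 1.0},            # "animal-like" blob
    {"centres": [60, 15, 8, 120], "sigmas": [15, 4, 2, 30],
     "coeffs": [0, 0, 0, 0], "bias": 0.0},            # "non-target" blob
]
score = sugeno_predict([20, 7, 3, 28], rules)
print("target" if score >= 0.5 else "non-target")
```

In ANFIS, the membership function parameters and the linear consequents would be fitted to the training blobs rather than set by hand; a 0.5 cut on the output is an assumed decision rule for separating targets from non-targets.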

Accuracy Assessment
We assessed the accuracy of the classification results by comparing the number of large mammals detected by the computer model with the ground truth, and then calculated the omission error and the commission error [92]. Detection accuracy (DA), the most commonly used metric, is the complement of the omission error (DA + omission error = 1) [93]. The values of both the omission error and the commission error always lie between 0 and 1; the closer they are to 0, the better the result.
The accuracy index (AI), which was devised by Pouliot et al. [94], was computed as:

$$AI = \frac{N - FN - FP}{N}$$

where TP (true positive) denotes the number of targets occurring in both the ground truth and our processing result; FN (false negative) denotes the number of targets that appear in the ground truth, but not in our processing result; FP (false positive) denotes the number of targets occurring in our processing result, but not in the ground truth data; and N is the number of ground truth targets in the study area. The higher the value of the accuracy index, the better the result.
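A small worked example may help relate these measures. The sketch assumes the usual definitions, omission error = FN / (TP + FN) and commission error = FP / (TP + FP), and an accuracy index of the form (N - FN - FP) / N; these formulas and the counts below are assumptions for illustration, not values reported in the study.

```python
def omission_error(tp, fn):
    """Share of ground-truth targets that were missed (assumed definition)."""
    return fn / (tp + fn)

def commission_error(tp, fp):
    """Share of detections that are not real targets (assumed definition)."""
    return fp / (tp + fp)

def accuracy_index(n, fn, fp):
    """Accuracy index in the assumed form (N - FN - FP) / N."""
    return (n - fn - fp) / n

# Hypothetical counts: 200 ground-truth animals, 180 found, 20 missed, 10 false alarms.
tp, fn, fp = 180, 20, 10
n = tp + fn
print(omission_error(tp, fn), commission_error(tp, fp), accuracy_index(n, fn, fp))
```

Because the index subtracts both missed animals and false alarms from the ground-truth count, it penalizes a detector that finds every animal only by also flagging many non-targets.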

Results
In Figure 6, the visual results of our semi-automated ANFIS-wavelet approach to detecting large mammals are compared with the results obtained with the thresholding method. The accuracy index of the proposed method for the low complexity study area (No. 1) was as high as 0.86 (Table 1). For the higher-complexity sites, the results also yielded acceptable accuracy indices: 0.79 and 0.72, respectively, for the moderately (No. 2) and highly (No. 3) complex sites. As shown in Table 2, the thresholding method produced accuracy indices of 0.64, 0.56 and 0.54, respectively, for the low, moderate and high complexity areas, with an average accuracy index of 0.58. The average accuracy index of our proposed method, reported in Table 1, is 0.79, which is 0.21 higher than that of the thresholding method.
Also, the calculated omission and commission errors of our approach (0.09 and 0.12, respectively) are lower than those of the thresholding method (0.15 and 0.24, respectively). It should also be noted that a more complex study area does not necessarily yield less accurate detection. As shown in Figure 6, specific ground features can introduce inaccuracies, such as the errors appearing in this study close to roads and edges of forests. In absolute terms of detected targets, the thresholding technique and our semi-automated ANFIS-wavelet approach showed different accuracies for each pilot study area. The statistical results for this study area show that the ANFIS-wavelet method achieves a higher detection accuracy than the threshold-based method.
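The averages quoted above can be checked directly from the per-site accuracy indices. The short Python sketch below uses only the values reported in the text; the variable names are illustrative.

```python
from statistics import mean

# Per-site accuracy indices reported in the text
# (low, moderate and high complexity areas).
ai_anfis_wavelet = [0.86, 0.79, 0.72]
ai_threshold = [0.64, 0.56, 0.54]

mean_anfis = mean(ai_anfis_wavelet)
mean_threshold = mean(ai_threshold)
gain = mean_anfis - mean_threshold

print(f"ANFIS-wavelet mean AI: {mean_anfis:.2f}")
print(f"Thresholding mean AI:  {mean_threshold:.2f}")
print(f"Difference:            {gain:.2f}")
```

Rounded to two decimals, the means are 0.79 and 0.58, and their difference is 0.21.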

Discussion
The results from this study demonstrate that it is feasible to use VHR panchromatic satellite imagery to detect and count large mammals in extensive open areas. In comparison with the traditional thresholding technique, our ANFIS-wavelet method produced a higher accuracy index and lower commission and omission errors.
Although the thresholding method performs adequately when the targets share similar gray values and are dissimilar to their background, it is less accurate in more complex areas. There are two main reasons for the higher errors found when using the thresholding method. Firstly, when the gray values of suspected objects (animals) are similar to those of the surroundings, the objects may be missed by the threshold-based segmentation. In the ANFIS-wavelet method, the representation of the target is considered at different spatial scales. Suspected animals that do contrast with their immediate background, once different spatial scales are considered, contribute to a higher weighted value in the preclassification results. Secondly, when animal objects and terrain have similar gray values, they cannot be separated simply by using thresholds: more information is required before further processing can be undertaken [32]. We therefore statistically selected four geometric features to distinguish non-target objects from large mammals in the feature space. This approach proved more accurate than merely using a simple threshold value.
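The first limitation can be illustrated with a toy example. The Python sketch below compares a single global threshold with a simple multiscale local-contrast filter on a one-dimensional gray-value transect. The filter is only a simplified stand-in for the wavelet step, not the wavelet algorithm used in this study, and the transect, the scales, the cut-off values and the assumption that the target is darker than its surroundings are all hypothetical.

```python
def global_threshold(profile, t):
    """Indices of pixels darker than one global gray-value threshold."""
    return [i for i, v in enumerate(profile) if v < t]


def multiscale_contrast(profile, scales=(2, 4, 8)):
    """For each pixel, the smallest (local mean - pixel value) over all scales."""
    n = len(profile)
    response = []
    for i, value in enumerate(profile):
        per_scale = []
        for s in scales:
            # Window of half-width s, clipped at the ends of the transect.
            window = profile[max(0, i - s):min(n, i + s + 1)]
            per_scale.append(sum(window) / len(window) - value)
        response.append(min(per_scale))
    return response


# Hypothetical transect: the background brightens from 100 to 158.5, and a
# two-pixel target is 20 gray levels darker than its immediate surroundings.
profile = [100 + 1.5 * i for i in range(40)]
target = [30, 31]
for i in target:
    profile[i] -= 20

low_t = global_threshold(profile, 110)   # misses the target (omission)
high_t = global_threshold(profile, 130)  # finds it, plus dark background (commission)
multi = [i for i, c in enumerate(multiscale_contrast(profile)) if c > 8]

print("threshold 110:", low_t)
print("threshold 130:", high_t)
print("multiscale   :", multi)
```

In this toy transect no single gray-value threshold isolates the target, because the background itself spans the target's gray values, whereas the pixels that contrast with their neighbourhood at every scale are exactly the two target pixels.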
The commission error derived from our method was three percentage points greater than the omission error, meaning that more non-target objects were incorrectly classified as large mammals than large mammals were incorrectly omitted. Further analysis revealed that commission errors always appeared near roads and vegetation. Bushes were confused with large mammals because of similarities in geometric features. Rough road surfaces or vehicles may produce discontinuous blobs, which may then also be recognized as large mammals by our method. The two main causes of omission were targets that are not clearly distinguishable from the background and targets that are too close to each other.
The geometric features chosen to distinguish an animal from its background were area, major axis length, minor axis length and bounding area. These features differ between target animals and non-targets such as shrubs or boulders. Even though some features were highly correlated, each still contributes to detecting animals. For example, defining both major and minor axis length helps to eliminate objects that do not have a plausible length-width ratio.
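As an illustration of how such features can be derived from a segmented object, the following Python sketch computes the four features for a single blob. It is a sketch under stated assumptions rather than the procedure used in this study: the axis lengths are taken from the ellipse with the same second central moments as the blob, "bounding area" is interpreted as the area of the axis-aligned bounding box, and the example blob is hypothetical.

```python
import math


def blob_features(pixels):
    """Geometric features of one blob given as a list of (row, col) pixels."""
    area = len(pixels)
    rows = [r for r, _ in pixels]
    cols = [c for _, c in pixels]
    # Assumption: "bounding area" is the axis-aligned bounding-box area.
    bbox_area = (max(rows) - min(rows) + 1) * (max(cols) - min(cols) + 1)

    # Second central moments of the pixel coordinates.
    r0 = sum(rows) / area
    c0 = sum(cols) / area
    mu_rr = sum((r - r0) ** 2 for r in rows) / area
    mu_cc = sum((c - c0) ** 2 for c in cols) / area
    mu_rc = sum((r - r0) * (c - c0) for r, c in pixels) / area

    # Eigenvalues of the 2 x 2 moment matrix give the squared axis scales.
    mean_var = (mu_rr + mu_cc) / 2
    spread = math.sqrt(((mu_rr - mu_cc) / 2) ** 2 + mu_rc ** 2)
    lam_max = mean_var + spread
    lam_min = max(mean_var - spread, 0.0)

    return {
        "area": area,
        # Assumption: axis lengths of the moment-equivalent ellipse.
        "major_axis_length": 4 * math.sqrt(lam_max),
        "minor_axis_length": 4 * math.sqrt(lam_min),
        "bounding_area": bbox_area,
    }


# Hypothetical elongated blob of 2 x 5 pixels.
blob = [(r, c) for r in range(2) for c in range(5)]
features = blob_features(blob)
print(features)
```

The ratio of major to minor axis length then gives the length-width ratio mentioned above, which can be used to reject objects that are too round or too elongated.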
The ANFIS-wavelet method has proved to be a feasible method for detecting animals in open savanna landscapes. This method is based on wavelet preclassification followed by ANFIS reclassification. The wavelet-based classification is able to highlight objects and maintain their geometric features. This is critical because the targets are dim and small, and as much useful information as possible needs to be retained. By using multiscale analysis, targets can be precisely located in poorer quality (i.e., low SNR) imagery without information loss. The ANFIS, which combines the advantages of machine learning and a fuzzy system, makes it possible to learn from data and concomitantly use existing expert knowledge, resulting in a method that is both efficient and stable.

Conclusions
We developed a novel semi-supervised object-based method that combines a wavelet algorithm and a fuzzy neural network for detecting and counting large mammals (e.g., wildebeests and zebras) from a single, very high resolution GeoEye-1 panchromatic image in open savanna. To discern large mammals from their surroundings and discriminate between animals and non-targets, we used the wavelet technique to highlight potential objects. To make full use of geometric attributes, we carefully trained the classifier, using the adaptive-network-based fuzzy inference system. We then compared our method with the traditional threshold-based method. The results showed that our proposed method (with an accuracy index of 0.79) significantly outperformed the traditional threshold-based method (with an accuracy index of 0.58) in detecting large mammals in open savanna. The greater availability of VHR images, and the advances in image segmentation techniques, mean that animal detection by means of remote sensing technology is a pragmatic alternative to direct animal counting. Further developments in image processing should eventually make it feasible to detect and monitor medium-sized and small animals remotely from space as well.
epoch number until around 75 epochs, and increases rapidly before around 120 epochs. According to this quantitative analysis, we found that it is appropriate to set the epoch number to around 75.
Figure A1. Identification of optimum epoch number based on the root-mean-square error of both training error and checking error.
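The epoch selection described above can be sketched as choosing the epoch at which the checking error is lowest. The Python example below is illustrative only: it assumes that the checking error is the curve that reaches its minimum near 75 epochs, and both error curves are synthetic, not the values plotted in Figure A1.

```python
import math


def best_epoch(checking_rmse):
    """1-based epoch at which the checking RMSE is lowest."""
    lowest = min(checking_rmse)
    return checking_rmse.index(lowest) + 1


# Synthetic curves (hypothetical): the training error keeps falling, while
# the checking error falls and then rises again as the model overfits.
epochs = range(1, 151)
training_rmse = [0.20 + 0.50 * math.exp(-e / 40) for e in epochs]
checking_rmse = [0.30 + 0.00002 * (e - 75) ** 2 for e in epochs]

chosen = best_epoch(checking_rmse)
print("selected epoch number:", chosen)
```

Training beyond the selected epoch continues to reduce the training error in this synthetic case, but the checking error rises, which is the usual sign of overfitting.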