Modelling Niche Differentiation of Co-Existing, Elusive and Morphologically Similar Species: A Case Study of Four Macaque Species in Nakai-Nam Theun National Protected Area, Laos

Simple Summary We investigated the niche separation of four macaque species (Macaca arctoides, M. assamensis, M. leonina, M. mulatta) occurring within Nakai-Nam Theun National Protected Area, central-eastern Laos using the environmental niche modelling software MaxEnt. The respective suitable habitat predicted for each species reveals niche segregation between the four species with a gradual geographical distribution following an environmental gradient of, notably, temperature, precipitation, elevation and slope within the study area. This means that the four species seem adapted to different ecological conditions within the area. This information has implications for future research on these species and for their management and conservation. Abstract Species misidentification often occurs when dealing with co-existing and morphologically similar species such as macaques, making the study of their ecology challenging. To overcome this issue, we use reliable occurrence data from camera-trap images and transect survey data to model their respective ecological niche and potential distribution locally in Nakai-Nam Theun National Protected Area (NNT NPA), central-Eastern Laos. We investigate niche differentiation of morphologically similar species using four sympatric macaque species in NNT NPA, as our model species: rhesus Macaca mulatta (Taxonomic Serial Number, TSN 180099), Northern pig-tailed M. leonina (TSN not listed); Assamese M. assamensis (TSN 573018) and stump-tailed M. arctoides (TSN 573017). We examine the implications for their conservation. We obtained occurrence data of macaque species from systematic 2006–2011 camera-trapping surveys and 2011–2012 transect surveys and model their niche and potential distribution with MaxEnt software using 25 environmental and topographic variables. The respective suitable habitat predicted for each species reveals niche segregation between the four species with a gradual geographical distribution following an environmental gradient within the study area. Camera-trapping positioned at many locations can increase elusive-species records with a relatively reduced and more systematic sampling effort and provide reliable species occurrence data. These can be used for environmental niche modelling to study niche segregation of morphologically similar species in areas where their distribution remains uncertain. Examining unresolved species’ niches and potential distributions can have crucial implications for future research and species’ management and conservation even in the most remote regions and for the least-known species.


Introduction
Environmental niche models are increasingly used to understand species' habitat use, to study their evolutionary biogeography, and to predict their geographic distribution at regional or national scales in relation to their environment e.g., [1][2][3][4][5][6]. These models constitute useful tools to predict suitable habitat of species in areas where their distribution is little known.
Species misidentification is a common issue during field surveys, especially so when dealing with morphologically similar species that flee when sighted e.g., [7][8][9]. Camera-traps, in contrast, can capture clear images that allow detailed examination of morphological difference between such species and result in reliable species occurrence data (necessary for environmental niche modelling). In addition, camera-traps increase the probability of records of elusive species, compared to field surveys e.g., [10,11] making the technique particularly valuable when direct data collection in the field is logistically challenging, financially restricted and time consuming. Such data are furthermore rarely readily available to the scientific community. Although the importance of camera-trap data is being acknowledged for understanding species' ecology [12][13][14], these data have rarely been used in combination with species niche modelling e.g., [15]. Taking this further, combining camera-trapping data and species niche modelling provides a powerful technique to explore the ecological relationships of co-existing, cryptic and morphologically similar species (e.g., macaques, small cats, civets, ungulates).
In this study, we focus on macaque species. Four macaque species occur in our study area, Nakai-Nam Theun National Protected Area (NNT NPA), Laos: rhesus (Macaca mulatta), Northern pig-tailed (M. leonina), Assamese (M. assamensis) and stump-tailed (M. arctoides) macaques. Each has been previously confirmed from field sightings [16] and camera-traps (WMPA unpubl. data). Field identification confusion however can occur between species, and this is further emphasized by the unreliable and misidentification of macaques by villagers living alongside the species' habitat [9]. Fooden [17] predicted ecological segregation across their range amongst these species, but until now this has rarely been studied in the field, and never in Laos, where all four taxa are under threat from hunting. The predictive distribution of macaque species in NNT NPA will help us understand the difference in ecological niche between these four species. We use confirmed identified camera-trap photos from 2006-2011 and field records from 2011-2012 to model their distribution using the maximum entropy (MaxEnt) statistical method.

Study Area
Nakai-Nam Theun National Protected Area (NNT NPA), ~3,500 km 2 , is located in the central-east of Laos within the Annamite mountain range (Figure 1). The area remains largely forested [18] with mixed semi-evergreen/coniferous, upper montane, dry-evergreen and wet evergreen forests [16]. Elevation throughout the NPA ranges from ~200 m to ca. 2,300 m a.s.l. Annual precipitation ranges from 1,865 mm to 2,620 mm and annual mean temperatures from 14 °C to 24 °C, with extremes of 4 °C to 32 °C [19].

Biological and Environmental Data
Occurrence data for macaque species come from camera-trap and transect survey data. From 2006 to 2011 systematic camera-trapping was carried out in the area by the Nam Theun 2-Watershed Management and Protection Authority [20,21]. Camera-trap sampling was designed as to be as representative as possible of the different habitats in NNT NPA for long-term monitoring of its wildlife [20]. We examined all photos taken over the study period and selected all photos of macaque species to identify them at species level. We identified species in light of their respective morphology, clearly observed on photos, and validated our identifications with expert opinion. In addition, CNZC conducted transect surveys in 2011-2012 within the area during which she recorded all diurnal primate species [22]. Given the difficulty to identify macaque species in the field due to poor visibility and fleeting behaviour of the animals, we only use confirmed species records (i.e., involving observation of species-typical characteristics) for the analysis. Ten different sites were sampled with camera-traps from 2006 to 2011, with a total of 20,216 camera-trapping-days and ten sites were visited for transect surveys in 2011-2012, totalling 310 km walked (Table 1; Figure 1). We obtained a total of 48   Of a total of 114 independent images of macaque species, only two remained unidentified, while of the 55 sightings of macaques during our transect surveys, 33 remained unidentified. Species were identified by examining respective specific morphological characteristics.
We used 25 environmental variables in our models, of which 19 are derived from monthly min/max temperature and precipitation data averaged as annual trends for the period 1950-2000 (30 arc-second/1 km 2 resolution; (http://www.worldclim.org/current/) [19]). We retrieved elevation layer from the CGIAR Consortium for Spatial Information (http://srtm.csi.cgiar.org/). Land cover data were generated by the Lao National Geographic Department in 2002 and includes 24 categories from which 14 (upper evergreen, upper mixed deciduous, pine, mixed broadleaf-coniferous, unstocked, bamboo, ray, savannah, scrub, rice paddy, rock, grassland, swamp, water bodies) are found within the boundary of NNT NPA. We obtained vegetation continuous field from the Global Land Cover Facility (http://glcf.umiacs.umd.edu/data/vcf/). We derived slope from elevation and distance from water and villages from villages and watercourse features using Euclidean distances in ArcGIS 9.3. The same geographic extent, cell size (90 m; this resolution is considered adequate given in general a >1 km 2 home range size of macaque species; [23]) and projected coordinate system (WGS 1984 UTM Zone 48N) were selected for each layer and species localities.  Table 1); (b) transect surveys from 2011-2012, total km walked, including replications are presented in Table 1; (c) Macaques species occurrence localities used for the models.

Species Distribution Modelling
To model the potential distribution and niche of macaque species in NNT NPA, we used the maximum entropy general purpose machine learning method, which has been adapted and developed as a software (MaxEnt) for species distribution modelling [24,25] and identified as one of the best methods for species niche modelling [26]. MaxEnt version 3.3.3k is used to perform the analysis (http://www.cs.princeton.edu/~schapire/MaxEnt/). The method combines biological data of species occurrence (presence-only data) with environmental characteristics (e.g., GIS layers with a grid of values for the geographic area considered) to estimate the probability distribution of maximum entropy (i.e., closest to uniform), subject to the set of constraints provided (i.e., environmental characteristics where the species occurs) [25].

Model Building
For each species, we used a ten-fold cross-validation replication run type [27] where a sample of the occurrence localities is randomly allocated for test data (for model evaluation) at each of the ten replications. We set all other settings as default as they perform well and are robust for a large range of datasets tested, which our data can compare to [28].

Model Evaluation
We assessed the performance of our models using four methods: (i) the area under curve (AUC) of the receiver-operating characteristic (ROC) (ii) the use of a null-model [29], (iii) the Boyce Index method [30], (iv) and the jackknife validation method [31] for our smallest sample only. In addition, results outputs are interpreted in term of known ecology of the study species at National scale, published by experts (cf. Discussion section).
The AUC of ROC is provided in the model outputs. It is obtained by plotting sensitivity on the x-axis and 1-specificity on the y-axis, with sensitivity representing the proportion of correct prediction of actual presence (true-positive, or absence of omission), whereas 1-specificity is the proportion of falsely predicted presence (false-positive, or commission error) for all possible thresholds of the probability (threshold independent evaluation). In presence-only models, the AUC value represents the probability that the model scores a presence site (test locality) higher than a random background site [25]. The value ranges from 0.5 to 1ía/2, where a is the fraction of pixels covered by the species' distribution that remains unknown, thus renders inadequate the interpretation of AUC [25,29,32]. Nevertheless, it remains the most commonly used evaluation parameter and is presented here. An AUC value closer to 1 indicates that the model predicts better than random, while a value of 0.5 indicates that the prediction is no better than random [25].
Given the recent critics of using AUC for presence-only model evaluation [29,32], we used other methods to evaluate the performance of the model outputs. A recently developed alternative is the null-model, which was introduced by Raes and Steege [29]. The method tests the AUC value of the model against a null distribution of expected AUC values based on random occurrence data from the geographic area considered. More concretely, it tests if the relations between species occurrence and environmental variables at these locations are stronger than expected by chance [29]. The exact same number of occurrence points available for each species (48, 38, 34 and 14; cf. Table 2) were randomly drawn across NNT NPA and replicated 999 times. These null-distributions were run separately in MaxEnt with the same settings detailed above. This resulted in 999 AUC values of the null-model for each of the four species. AUCs of the null-models were ranked and the position of the real species' AUC was tested against the 95% confidence interval (CI) of the null-model's AUC values. The species' model is considered performing better than expected by chance if its AUC value is 95% CI null-model's AUCs [29].
In addition, we apply the Boyce Index validation method [30,33] to our models. The continuous habitat suitability scores obtained from the outputs (0 to 1) are reclassified into a number i of bins (classes). For each class, two frequencies of pixels are calculated: the Predicted Frequency and the Expected Frequency. The Predicted Frequency is the number of occurrence points predicted by the model falling into in the class i divided by the total number of occurrence points. The Expected Frequency is the number of grid cells included in class i, divided by the total number of grid cells in the whole geographic area considered. A Predicted-to-Expected ratio is calculated for each class and a Spearman rank correlation coefficient rho (1-tailed test) evaluates if the ratio significantly increases as suitability increases (p < 0.05) [33]. Our models' outputs were reclassified into 100 continuous classes of equal interval and we calculated a P/E ratio for each class. Statistical tests were performed with SPSS.v.17.
Finally, for the smallest sample (M. mulatta; n = 14), we followed the jackknife validation method for samples n < 25 described by Pearson et al. [31], in which it is assessed if the model successfully predicts the n left-out localities (one locality at each of the 14 replications) within the area of suitability (under the minimum training presence threshold chosen). This is assessed with a pValue based on the test statistic D; D = ȈX i (1 í P i ), where Xi is the success-failure variable indicating if the ith left-out locality is included or not in the predicted area and Pi is the probability of success [31]. The pValue is computed with pValue compute program [31].

Output Analysis
To produce a binary map (i.e., suitable vs. non-suitable habitat), we used the "minimum training presence logistic" threshold, which has been commonly used when occurrence data are highly reliable, such as here, given the confirmed species identification and records in their primary habitat. This threshold has the advantage to maintain zero omission error and include all areas that are at least as suitable as those where the species is known to occur [31,34]. The final potential distribution for each species was projected (WGS 1984 UTM Zone 48N) in ArcGIS 9.3.
We calculated mean values of environmental variables within the predicted distribution of the four species. Predicted suitable habitat of the four species is also tested pair-wise for their similarity using two different statistical tests that compare the logistic habitat suitability scores provided in MaxEnt's model outputs: Schoener's D [35] and I statistic [36]. Their value ranges from 0 (no overlap in habitat suitability) to 1 (complete overlap in habitat suitability); they are calculated using the ENMTool [36]. Range overlap is also quantified with ENMTool with the formulae (N x,y /min[N x , N y ]), where N x,y is the number of grid cells where both species x and y are predicted to occur and N x and N y are the number of grid cells where respectively species x and y are predicted [36]. We apply a threshold at which habitat is considered suitable, using the average of all four species' "Minimum training Presence logistic" threshold (=0.171).

Model Validation
The four species' models obtained AUC values ranged between 0.8 to 0.9. When tested against a null-model, two of our models (M. arctoides and M. leonina) performed significantly higher than random (p < 0.05). Comparatively, the Boyce Index validation test indicated a significant model prediction for the four species. The model for M. mulatta also showed a significant predicting success rate (p < 0.01) when evaluated with Pearson et al.'s jackknife method (Table 3).

Species Distribution Models Outputs
Predicted geographical distribution varies between the four macaque species but shows areas of overlap, in accordance with occurrence records collected ( Table 4). The four species have a general gradual predicted distribution with M. arctoides, M. assamensis, M. leonina and M. mulatta, respectively along a geographic gradient (Figure 2). Both M. assamensis and M. arctoides are predicted predominantly in evergreen and mixed-deciduous forests and the farthest from village settlements (average of 13 and 16 km, respectively). M. leonina and M. mulatta are predicted mainly in evergreen and mixed broadleaf-coniferous forest. The latter two species are predicted the closest to villages (average of 6 and 8 km, respectively). As per respective potential species distribution, there is a gradual change throughout the NPA in predicted mixed-species range overlap. Using the average of the four species' minimum training presence threshold, the largest overlap is between M. arctoides and M. assamensis (overlap index: 0.77), and the smallest between M. arctoides with M. mulatta and M. leonina (0.12) ( Table 5). The predicted distribution of Macaca assamensis overlaps the most (average of 0.54) with all other species, resulting from a widespread predicted distribution across the area (Table 5). Table 5. Range overlap index between the four macaque species using a threshold of presence of 0.171; range overlap between species x and y = (N x,y /min[N x , N y ]), where N x,y is the number of grid cells where both species x and y are predicted and N x and N y are the number of grid cells where respectively species x and y are predicted. In accordance with the predicted distribution overlaps between species, the niche similarity statistical tests indicate the niche of M. arctoides and M. assamensis as the most closely comparable, while the niche of M. arctoides as the most distant from the ones of M. leonina and M. mulatta (Table 6). Further to the respective gradual geographic distribution predicted, environmental characteristics within their potential distribution differ between the four species with M. mulatta and M. arctoides at the two extremes. While M. mulatta is modelled predominantly in a wetter and hotter niche at lower elevations and on flatter terrain, M. arctoides is predicted to inhabit drier and colder areas, at the highest elevations and terrain with steepest slopes (Figure 3).   (c) (d)

Macaques' Predicted Distribution and Niches
Our potential distributions obtained from the models indicate a niche differentiation between the four species occurring in NNT NPA. The four species' distributions alternate in parapatry and sympatry throughout the NPA as species are constrained along a gradient variation in environmental parameters.
Potential distribution of Macaca mulatta is mostly predicted in lowland (mean elevation 645 m a.s.l.) and flat terrain in mixed broadleaf-coniferous and evergreen habitat, and closely associated with water sources. It is the main species for which distribution is predicted around village clusters and main rivers, which is also the case across the country [37]. Macaca leonina is restrained by higher elevations and colder climate further east from its predicted distribution. Macaca assamensis is the most widespread species across the NPA. As a result, its habitat range overlaps and shares the most similarities with all other taxa. In Laos, it is the most common macaque species in mountainous areas and is also found at lower elevations in limestone formations [9]. Macaca arctoides is distributed the furthest east in NNT NPA, along the Lao-Vietnam border. Its niche is characterised by the highest elevations (mean 1,098 m. a.s.l.) and the most rugged terrain. It inhabits mostly dense evergreen and mixed-deciduous forests at colder annual temperatures.
Macaques are the most widespread group of monkeys in Southeast Asia [38]. Most species from the genus have only diverged phylogenetically during the Pleistocene (~2 mya), resulting in similar morphologies observed today (brown colour, similar size; ~6.5 to 12.2 kg for our study species [23,39]). Identification success of macaques species during field surveys was relatively low, as previous researchers have found [9]. Thus, camera traps proved essential here for our modelling of their ecological niches. Despite morphological similarity, however, Fooden [17] suggested that throughout Southeast Asia macaque species are ecologically and/or geographically segregated, resulting from interspecific competition. Ever since, macaques' ecological segregation at a local scale had rarely been studied [40í42] and never using a habitat suitability model.
Due to their proximity to human settlements, M. mulatta and M. leonina are at risk from hunting: macaques are hunted for food in NNT NPA and macaque bones are commonly encountered in poacher camps in the NPA [22]. As wildlife is mainly hunted using ground snares, set throughout the area [22], terrestrial M. arctoides is also threatened by hunting, despite being predicted as the most distant from human villages. Distance form villages is not necessarily a good predictor of hunting pressure in this case, as Vietnamese poachers commonly cross the international border for hunting [22]. Two of the four species are classified as globally threatened in the IUCN Red List (VU M. arctoides and VU M. leonina) [43]. Although species distribution models are effective to explore species' niche, these can be subject to prediction uncertainties and should be used as baseline studies to further investigate local species' habitat that can be validated and refined post hoc with additional field data [44]. Anthropogenic factors are, for instance, rarely well represented into models: species may have been extirpated from areas that have been deforested or under severe pressure of hunting. It is also important to note that the respective predicted distributions of the four species may have been influenced by the sampling bias across the area, resulting in unpredicted regions where the species may however occur [45]. Nonetheless, the use of reliable biological data for our models makes our predictions and analysis highly informative [46].

Model Performance
We demonstrate the value of MaxEnt in ecological studies [5,47,48] and in this case for examining the ecological nice of sympatric taxa. MaxEnt has been demonstrated as one of the best methods to handle small sample sizes [34,49]. Although the method has been improved in the past few years [28], new issues have been identified and assessed to improve the predictions of MaxEnt's models e.g., [4,6,45,50]. We note that methods both for model selection and model validation still need to be refined. No consensus exists on variable selection methods; in any case, variables should be selected in terms of the species' ecology and represent the complexity of the ecosystem, avoiding over-or under-parameterization [51]. We remark that the AUC value is not reliable on its own for model validation. Our four models were significantly better than random as indicated by the Boyce Index and by the jackknife method for M. mulatta, but only indicated a significant predictive power for two species when tested against a null-model. This highlights the uncertainties of model predictions and the need of using several validation methods in species niche modelling. The smallest AUC values for M. assamensis and M. mulatta, not validated by the null-model method, may be due to their ecological and geographical wider range which is commonly observed for such species [24,52].

Implications
In Laos and elsewhere in Southeast Asia, several species have rarely been directly observed in the field or are difficult to identify and differentiate from other taxa from signs only or when briefly sighted. Some of these species have mostly been recorded with camera traps only (e.g., saola Pseudoryx nghetinhensis [53]; large-antlered muntjac Muntiacus vuquangensis [20]; Annamite striped rabbit Nesolagus timminsi [54]; Sumatran striped rabbit N. netscheri [55]; Owston's civet Chrotogale owstoni [56]; several small to medium-sized cats [57,58]). Of these species, all are classified by the IUCN [43] as globally threatened or remaining Data Deficient. Potential species distributions are crucial for species management planning of threatened species, especially in the context of biodiversity crisis well known in Southeast Asia [59,60], where "protected" status of most forested areas does not imply long-term viability and protection of species [61,62].

Conclusion
Camera-trapping can increase elusive-species records with a relatively reduced and more systematic (less biased) sampling effort [63]. These accumulated occurrence data, when combined with species distribution modelling, can be used to examine unresolved species' niches and potential distributions, which will have crucial implications for future research and species management and conservation. Our data can be used as such by relevant management authorities and biologists. These techniques can be applied to even the most remote regions and the least known species and call for wider availability and open access.