Exploration in Mapping Kernel-Based Home Range Models from Remote Sensing Imagery with Conditional Adversarial Networks

Kernel-based home range models are widely used to estimate animal habitats and develop conservation strategies. They provide a probabilistic measure of animal space use instead of assuming uniform utilization within an outer boundary. However, these models estimate home ranges from animal relocations, and inadequate locational data often prevents scientists from applying them in long-term and large-scale research. In this paper, we propose an end-to-end deep learning framework to simulate kernel home range models. We use a conditional adversarial network as a supervised model to learn the home range mapping from time-series remote sensing imagery. Our approach enables scientists to eliminate the persistent dependence on locational data in home range analysis. In experiments, we illustrate our approach by mapping the home ranges of Bar-headed Geese in the Qinghai Lake area. The proposed framework outperforms all baselines in both qualitative and quantitative evaluations, achieving visually recognizable results and high mapping accuracy. The experiments also show that learning the mapping between images is a more effective way to map such complex targets than traditional pixel-based schemes.


Introduction
The basic concept of the home range is defined as the area traversed by an animal during its normal activities of food gathering, mating, and caring for young [1]. Estimating home ranges is an important part of investigating species status, analyzing habitat selection, and developing conservation strategies [2]. With the development of Geographic Information Systems (GIS), home ranges are now often estimated from locational data obtained with radio-tracking techniques [3]. The Minimum Convex Polygon (MCP) is a simple but popular method that assumes uniform use of space within the outer boundary of animal locations [4]. However, animals are unlikely to use their home ranges in a uniform manner in the real world. Therefore, a series of kernel-based probabilistic methods have shown advantages in habitat studies [5,6], especially in understanding the internal structure of spatially heterogeneous environments. This type of model [2,7,8] produces a two-dimensional probability density map representing the probability of the animal occurring at each location in the defined area. Ref. [9] demonstrates that kernel home range models can enhance the study of animal movements, species interactions, and resource selection.
Kernel home range models are based on locational data: either the density of locations or the link distance between locations. Data for mapping home ranges used to be gathered by careful observation, but nowadays such data are usually collected automatically using GPS collars placed on animals. In practice, scientists usually capture and mark a certain number of target animals and collect GPS data within the validity period of the vulnerable tracking device [10,11]. The whole process is costly and, in most cases, a one-time job. However, as introduced in [12], home ranges change dynamically over time, and estimating them with relocations requires sufficient and timely GPS records. Inadequate GPS data often prevents scientists from applying kernel home range models in long-term and large-scale research.
To solve this problem, we look for alternatives. Habitat mapping studies [13-15] have demonstrated the strong connection between animal activities and environmental variables. These studies effectively leverage remote sensing imagery to map different types of habitat characteristics, which inspired us to map home ranges from remote sensing imagery with its long-term data support. However, most habitat mapping studies employ traditional classification or regression models on each independent pixel. This pixel-based scheme ignores the structural information in remote sensing imagery, which is an obvious defect when mapping our image-based target, the probabilistic home range map.
In this paper, we train an end-to-end deep learning framework to achieve this goal. The well-trained deep convolutional network can effectively produce home range maps from image-based source data without the need for animal relocations. Our main contributions can be summarized as follows:

• We propose a general-purpose framework for mapping kernel-based home range models from time-series remote sensing imagery. We innovatively use the adversarial network as a supervised model to learn the mapping between the image-based data and the target (Figure 1). Our method enables scientists to carry out home range analysis even when the GPS data are insufficient for long-term and large-scale research. To our knowledge, this is the first exploration of mapping home range models using an image-based strategy.

• We illustrate our method in a real-world scenario by mapping the home ranges of Bar-headed Geese in the Qinghai Lake area. We build a specific dataset for training the mapping model and elaborate on each stage of the experiment. Our experience will assist researchers in extending the scale of various wildlife analyses.

• We qualitatively and quantitatively compare our method against several baseline models. We analyze the strengths and drawbacks of the selected baselines and further discuss why our method is suitable for this specific task.

Kernel-Based Home Range Models
The original concept of the home range was introduced by [1] in 1943, who constructed a map delineating the outer boundary of an animal's movement during the course of its activities. A more formal definition is the Utilization Distribution (UD) [16], which takes the form of a bivariate probability density function representing the probability of finding an animal in a defined area [17]. The kernel UD [7,18] (also called the bivariate Gaussian kernel method) is the best-known home range model for constructing the UD. It employs Kernel Density Estimation (KDE) [19] on animal relocations, using a Gaussian kernel to calculate the probability at each location in the defined area. Several recent studies have extended the kernel approach by using the movement patterns of wildlife, such as the Brownian bridge movement model [20], which takes the time dependence between locations into account. In summary, these kernel-based home range models produce a two-dimensional probability density map that represents the probability of the animal's occurrence.
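As a concrete illustration, the bivariate Gaussian KDE underlying the kernel UD can be sketched in a few lines. This is a minimal sketch with a fixed bandwidth `h` chosen by the caller; production tools such as adehabitatHR additionally implement bandwidth selection (e.g. the reference bandwidth or least-squares cross-validation), which is omitted here.

```python
import numpy as np

def kernel_ud(locations, grid_x, grid_y, h):
    """Estimate a utilization distribution on a regular grid from animal
    relocations, using a bivariate Gaussian kernel of bandwidth h.
    `locations` is an (n, 2) array of (x, y) fixes."""
    xx, yy = np.meshgrid(grid_x, grid_y)
    density = np.zeros_like(xx, dtype=float)
    # Sum one Gaussian bump per relocation over the whole grid.
    for x0, y0 in locations:
        density += np.exp(-((xx - x0) ** 2 + (yy - y0) ** 2) / (2 * h ** 2))
    density /= 2 * np.pi * h ** 2 * len(locations)
    # Normalize the grid cells so the map is a discrete probability map.
    density /= density.sum()
    return density
```

Evaluating the kernel sum over every grid cell yields exactly the two-dimensional probability density map described above.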

Habitat Mapping
Habitat mapping [21,22] is a well-studied topic in the literature on remote sensing applied to ecology. Compared to its great success in the vegetation field [23,24], its application to animals is limited by more complicated correspondences [25-27]. Some studies [13,14,28,29] leverage remote sensing imagery to map the quality and extent of wildlife habitats. These studies mostly focus on specific species and differ in source data, mapping models, and final targets. Technically, they mainly employ classification or regression models on the independent multi-dimensional vector contained in each pixel of remote sensing images to predict discrete habitat categories [30] or continuous habitat indices [31]. This pixel-based scheme successfully identifies, verifies, and explains habitat characteristics at the pixel level. However, it fails to consider the strong dependencies between pixels in highly structured remote sensing imagery.

Image-to-Image Translation
Image-to-image translation is a class of problems emerging in the computer vision literature, which aims to learn the mapping between an input image and an output image using a training set of aligned image pairs [32]. This technique has many successful applications, such as generating photographs from sketches [33], image style transfer [34], and image inpainting [35]. Fully Convolutional Networks (FCN) [36] can be seen as the embryonic form of this work, removing the last fully connected layers of the traditional Convolutional Neural Network (CNN) [37] to make dense predictions at the full image level. Later, deep generative models [38] in a conditional setting showed promise in this field, such as the Conditional Variational Autoencoder (CVAE) [39] and the Conditional Generative Adversarial Network (CGAN) [40]. Notably, [41] proposed a general-purpose solution for the image-to-image translation problem. Their "pix2pix" framework extends CGAN and leads to a substantial boost in the quality of translated images. Several studies [32,42] have built on this work and further discussed multi-modal and unpaired image-to-image translation. This series of studies greatly inspired our work.

Data and Target
We annotate the time-series remote sensing imagery with the corresponding home range maps to build image pairs. These image pairs are used to train the end-to-end mapping framework. The target, the home range map, is calculated from GPS data with a kernel-based estimator; it is technically an image-based probability density map. The data are time-series remote sensing imagery, whose type is determined by each specific wildlife home range analysis. We align the image-based data and target at both the spatial and temporal levels. More details of the pre-processing procedure are described in a specific example in Section 4.3.

Mapping Model
Assuming that we have produced a set of aligned data-target pairs, we seek the mapping X → Y from multi-layer remote sensing images X ∈ R^{H×W×B} (where B is the number of layers) to home range maps Y ∈ R^{H×W×1}. Both X and Y have the same spatial size and continuous pixel values. We achieve this goal with the following formulations.

Formulation
The basic idea of Generative Adversarial Nets (GANs) [38] is to simultaneously train a pair of adversarial networks: a generator G and a discriminator D. The generator G produces samples G(z) under a distribution p_g from a random noise variable z, while D tries to distinguish generated samples from real data; the adversarial training drives p_g toward the real data distribution p_data. The objective of the GAN can be expressed as follows:

$$\min_G \max_D V(D, G) = \mathbb{E}_{y \sim p_{\mathrm{data}}}[\log D(y)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]$$

Original GANs can be extended to a conditional mode (CGAN [40]) in which both the generator and discriminator are conditioned on some extra information. This modification enables researchers to use an input image as conditional information to generate the corresponding output image. The representative pix2pix model took advantage of CGAN and extended the adversarial loss with an L1 loss balanced by λ. Technically, pix2pix learns a mapping from an image A and random noise z to another image B: {A, z} → B. The objectives can be expressed as:

$$\mathcal{L}_{cGAN}(G, D) = \mathbb{E}_{A,B}[\log D(A, B)] + \mathbb{E}_{A,z}[\log(1 - D(A, G(A, z)))]$$
$$\mathcal{L}_{L1}(G) = \mathbb{E}_{A,B,z}[\lVert B - G(A, z) \rVert_1]$$
$$G^* = \arg\min_G \max_D \mathcal{L}_{cGAN}(G, D) + \lambda \mathcal{L}_{L1}(G)$$

In this paper, the required mapping is from the remote sensing imagery X to the home range maps Y. We adapt the adversarial loss from pix2pix to fit our scenario and apply the least-squares loss [43] to stabilize the training procedure and expedite convergence. The final objectives are:

$$\min_D \mathcal{L}(D) = \tfrac{1}{2}\,\mathbb{E}_{X,Y}[(D(X, Y) - 1)^2] + \tfrac{1}{2}\,\mathbb{E}_{X,z}[D(X, G(X, z))^2]$$
$$\min_G \mathcal{L}(G) = \tfrac{1}{2}\,\mathbb{E}_{X,z}[(D(X, G(X, z)) - 1)^2] + \lambda\,\mathbb{E}_{X,Y,z}[\lVert Y - G(X, z) \rVert_1]$$
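Under these least-squares objectives, each training step reduces to two scalar losses. The computation can be sketched in plain NumPy as follows (the function name and `lam` are illustrative; `lam` stands in for the balancing weight λ):

```python
import numpy as np

def lsgan_losses(d_real, d_fake, y_true, y_fake, lam=10.0):
    """Least-squares adversarial losses with an L1 reconstruction term,
    in the pix2pix style used here.
    d_real / d_fake: discriminator scores on real / synthesized pairs.
    y_true / y_fake: real and generated home range maps."""
    # Discriminator: push real-pair scores toward 1 and fake-pair scores toward 0.
    loss_d = 0.5 * np.mean((d_real - 1.0) ** 2) + 0.5 * np.mean(d_fake ** 2)
    # Generator: push fake-pair scores toward 1, plus the weighted L1 term.
    loss_g = 0.5 * np.mean((d_fake - 1.0) ** 2) + lam * np.mean(np.abs(y_fake - y_true))
    return loss_d, loss_g
```

A discriminator that scores real pairs at 1 and fake pairs at 0 attains zero discriminator loss, while the generator's loss is then driven entirely by fooling D and by its L1 distance to the real map.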

Network Architectures
Working with the above loss functions, two deep convolutional neural networks implement the adversarial framework: a generative network G and a discriminative network D. We adapt the network architectures from those in [41]. As seen in Figure 2, the generator G is a deep convolutional encoder-decoder [44]. The encoder extracts high-level features from the remote sensing layers, while the decoder interprets these features and upsamples them into a home range map. Convolutional layers help to extract features while taking structural information into account. U-Net connections [45] are used to share low-level information between encoder and decoder layers [41].
The discriminator D is a traditional CNN classifier that helps G learn a more accurate mapping during adversarial training. It is worth noting that the input to D is a data-target pair; the job of D is to determine whether the input pair is real or synthetic. A synthetic data-target pair is formed from the real satellite image and the generated home range map. The architecture of D is shown in Figure 3.
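Assembling D's input pairs can be sketched as follows. Stacking the conditioning image and the home range map along the channel axis is an assumed implementation detail carried over from pix2pix-style conditional discriminators, not something the text specifies:

```python
import numpy as np

def discriminator_pairs(x_image, y_real, y_fake):
    """Build the real and synthetic data-target pairs fed to D.
    x_image: (H, W, B) remote sensing tile.
    y_real / y_fake: (H, W, 1) real and generated home range maps.
    Both pairs share the same conditioning image X."""
    real_pair = np.concatenate([x_image, y_real], axis=-1)
    fake_pair = np.concatenate([x_image, y_fake], axis=-1)
    return real_pair, fake_pair
```

Because both pairs contain the same X, the discriminator can only separate them by judging whether the home range map is plausible for that imagery.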

Experiment
In this section, we illustrate our method in a real-world scenario by mapping the home ranges of Bar-headed Geese in the Qinghai Lake area. We also compare our method against several baselines with both qualitative and quantitative evaluations. The experiment was conducted in a desktop environment (Intel i5-6600K, NVIDIA GeForce GTX 1070). The home range maps were estimated using the R package adehabitatHR [18] in R v3.5.1. The deep learning networks were implemented with TensorFlow [46] v1.7 in a Python 2.7 environment.

Study Area and Field Knowledge
As shown in Figure 4, the study area (96.6°-102.4° E, 34.2°-38.8° N) mainly covers Qinghai Lake, Gyaring Lake, Ngoring Lake, and Donggi Conag Lake in Qinghai Province, China. These lakes, as well as the surrounding wetlands and estuaries, serve as a critical breeding ground and migratory staging area for many kinds of migratory waterfowl, especially the Bar-headed Goose (Anser indicus). This species gained political and scientific attention following the large outbreak of highly pathogenic H5N1 avian influenza in the Qinghai Lake area in the spring of 2005 [47,48]. This single event, the first large-scale outbreak of H5N1 among wild birds, caused the death of nearly 5% of the global population of Bar-headed Geese [49] and sparked a global debate on the role that wild birds play in the spread of H5N1 [50].

GPS Data
We select five Bar-headed Geese captured and GPS-collared in the Qinghai Lake area in 2007. These water birds were equipped with 45 g solar-powered portable transmitter terminals. We recorded the GPS locations of each bird during the breeding seasons in 2007 and 2008. The details of the radio-tracking data are shown in Table 1.

Remote Sensing Data
The MODIS [51] Land Products are used in this application. We select environmental factors based on a field survey and a review of the literature (Table 2). The Normalized Difference Vegetation Index (NDVI) and Enhanced Vegetation Index (EVI) are used to determine food availability. The Normalized Difference Water Index (NDWI) is used to determine access to water. The MODIS land cover type is used to examine shelter conditions. We use the involved MODIS reflectance bands instead of the original factor maps as input, leveraging the deep neural network to approximate the band math and avoid overlapping bands. We obtained the MODIS land reflectance bands from the MOD09Q1 and MOD09A1 8-day L3 products.

Preprocessing
In the pre-processing stage, we annotate the MODIS reflectance data with the home range maps calculated from the kernel UD [7,18] estimator. To build the image-based data-target pairs, we align the home range maps with the remote sensing images at both the spatial and temporal levels. On the temporal dimension, we group the bird GPS data by every 8 days to match the time interval of the selected MODIS Land Products. On the spatial dimension, all raster data are transformed into the same geographic coordinate system (EPSG:4326, WGS84) and resampled to 250 m resolution. Then, we use the R package adehabitatHR [18] to estimate the probability in each pixel of the remote sensing image. We slice the MODIS image as well as the probability map into numerous 256 × 256 tiles and pair them, as shown in Figure 5. We produce a total of 832 image pairs with the GPS data from 2007 and 2008. We randomly select 100 pairs as the test set, 132 pairs as the validation set, and the remaining 600 pairs as the training set.

Table 2. The selected environmental factors and corresponding MODIS land reflectance bands used in this application. We also list the relevant waterfowl studies that used the same factors. For the MODIS Land Bands, RED covers the wavelength of 620-670 nm, NIR covers 841-876 nm, BLUE covers 459-479 nm, and GREEN covers 545-565 nm.
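The slicing-and-pairing step above can be sketched as follows. This is a minimal sketch; the paper does not state how tile borders were handled, so edge remainders smaller than one tile are simply dropped here:

```python
import numpy as np

def slice_pairs(modis_stack, ud_map, tile=256):
    """Slice an aligned MODIS stack (H, W, B) and its home range map
    (H, W) into non-overlapping tile x tile data-target pairs."""
    pairs = []
    h, w = ud_map.shape
    for i in range(0, h - tile + 1, tile):
        for j in range(0, w - tile + 1, tile):
            x = modis_stack[i:i + tile, j:j + tile, :]  # data tile
            y = ud_map[i:i + tile, j:j + tile]          # target tile
            pairs.append((x, y))
    return pairs
```

Because the stacks are already co-registered, identical row/column slices produce spatially aligned data-target pairs.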

Training Details
We initialize all convolutional kernels with a Gaussian distribution N(0, 0.02). The ReLU units are leaky with slope 0.2, and the balancing parameter λ is set to 10. We search for the best values of these hyperparameters on the validation set. Following the experience in [43], in which non-momentum optimizers perform better on very nonstationary problems, we use the RMSprop optimizer [55] with a learning rate of 0.0002 instead of the default Adam optimizer [56]. The improvement in training stability can be found in Section 4.5. Considering that our self-made dataset is relatively small compared to common datasets, we employ data augmentation in the training procedure. Mirroring and rotation of the input image are applied before each training batch. Random jitter [41] is also applied by resizing the 256 × 256 input to 275 × 275 resolution and then randomly cropping back to the original size.
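The random jitter and mirroring can be sketched as follows. Nearest-neighbour resizing is used here for brevity (the actual implementation presumably interpolates, as in [41]); the crucial point is that data and target receive the identical transform so the pair stays aligned:

```python
import numpy as np

def random_jitter(x, y, up=275, out=256, rng=np.random):
    """Upsample an aligned (x, y) tile pair to up x up, randomly crop
    back to out x out, and randomly mirror, as described in the text."""
    # Nearest-neighbour resize via index mapping (applied to both arrays).
    idx = np.arange(up) * x.shape[0] // up
    x_up, y_up = x[idx][:, idx], y[idx][:, idx]
    # Random crop back to the original size, same offset for both.
    i = rng.randint(0, up - out + 1)
    j = rng.randint(0, up - out + 1)
    x_c, y_c = x_up[i:i + out, j:j + out], y_up[i:i + out, j:j + out]
    # Random horizontal mirror, again applied to both.
    if rng.rand() < 0.5:
        x_c, y_c = x_c[:, ::-1], y_c[:, ::-1]
    return x_c, y_c
```

Applying the same index mapping, crop offsets, and flip to both arrays is what keeps the data-target correspondence intact under augmentation.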

Training Stability
Unstable training is a common problem [41] for adversarial frameworks. Because the two adversarial components have opposing objectives in the simultaneous training procedure, their loss curves usually fluctuate strongly. As mentioned in Section 4.2, we use the least-squares loss and the RMSprop optimizer to improve training stability. Here, we present the loss curves for both the generator and discriminator of our model in Figure 6. We also compare our model against two different combinations.
We find that the loss curves of the Cross-Entropy (CE) loss with the Adam optimizer fluctuate strongly, especially for the discriminator, and converge more slowly than the two least-squares models. The second combination shows that applying the Adam optimizer to the least-squares (LS) loss is not a good choice, even though the LS loss still leads to fast convergence. Our model (LS loss + RMSprop) achieves fast convergence and more stable training for both the generator and discriminator, which confirms the previous report [43] that optimizers without momentum perform better with the LS loss.

Results
We select six representative test samples that have obvious and differently shaped home ranges, as shown in Figure 7. We observe that the synthesized home range maps from our model successfully capture the primary distribution of the real target, even though some artifacts and noise remain. This result demonstrates that our model has the ability to implement the mapping between remote sensing imagery and kernel-based home range models.

Baselines
Although we focus on a specific scenario, we investigate several potential solutions from both the habitat mapping and computer vision literature.
• kNN: The k-Nearest Neighbors algorithm [57] is a non-parametric method used for both classification and regression. In regression mode, the output value is the average of the values of the k nearest neighbors.
• Decision Tree: The Decision Tree [58] is a non-parametric supervised learning method used for both classification and regression. Classification and Regression Trees (CART) have been used to map the extent and quality of wildlife habitat in many studies [31,59]. We first test the decision tree as a regression model to map our target using a pixel-based scheme.

• Random Forest: The Random Forest regressor [60] fits a number of decision trees on various sub-samples of the dataset and uses averaging to improve predictive accuracy and control over-fitting. It has also been used in place of the single decision tree in several studies [61,62]. Here, we examine it as a pixel-based baseline to investigate whether improvement at the model level can overcome the limitations of the pixel-based scheme.

• CNN + L2 loss: A CNN trained with the L2 loss is the most straightforward way to predict a continuous target using deep learning. The L2 loss is a common choice in image processing tasks [63,64]. Here, we use the same encoder-decoder as our model to avoid the impact of network architecture. This baseline effectively trains the proposed generator network with the L2 loss alone.

• Conditional VAE: Deep generative models have achieved good performance in image-to-image translation. Besides CGAN, another well-established generative model, the CVAE [65], has also shown promise in similar studies [65,66]. Different from GANs, the VAE makes strong assumptions about the posterior and prior distributions of the hidden variables and the target data, and approximates these distributions with neural networks.
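The pixel-based scheme shared by the first three baselines can be made concrete with a tiny kNN regressor: every pixel is treated as an independent B-dimensional sample, which is precisely what discards structural information. This plain-NumPy sketch stands in for the library regressors actually used in the experiments:

```python
import numpy as np

def knn_pixel_regress(train_X, train_y, test_X, k=5):
    """Pixel-based kNN regression: each row of train_X / test_X is one
    pixel's band vector; predict each test pixel's probability as the
    mean target value of its k nearest training pixels."""
    preds = np.empty(len(test_X))
    for i, x in enumerate(test_X):
        d = np.sum((train_X - x) ** 2, axis=1)   # squared distances
        nn = np.argsort(d)[:k]                    # k nearest neighbors
        preds[i] = train_y[nn].mean()
    return preds
```

Note that nothing in this model sees the neighboring pixels of a sample, which is exactly the limitation the image-based models are designed to overcome.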

Metrics
• Regression Metrics: To quantitatively evaluate the prediction of continuous values on home range maps, we employ Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and R² to measure the mapping accuracy from the perspective of regression.

• SSIM Loss: Considering that our target is a structured image like natural images, we use the Structural Similarity Index (SSIM) [67] to measure the structural similarity between the synthesized home range maps and the real target. SSIM is an established metric for measuring quality and similarity between images [68,69]. The SSIM for pixel y_i is:

$$\mathrm{SSIM}(y_i) = l(y_i) \cdot cs(y_i), \quad l(y_i) = \frac{2\mu_{\hat{y}_i}\mu_{y_i} + C_1}{\mu_{\hat{y}_i}^2 + \mu_{y_i}^2 + C_1}, \quad cs(y_i) = \frac{2\sigma_{\hat{y}_i y_i} + C_2}{\sigma_{\hat{y}_i}^2 + \sigma_{y_i}^2 + C_2}$$

where ŷ_i and y_i are the predicted and real values of the ith pixel, respectively; l(y_i) is the luminance comparison, cs(y_i) is the contrast and structure comparison, and the constants C_1 and C_2 avoid instability. The means and standard deviations are computed with a Gaussian filter centered on y_i. The SSIM loss for an image with N pixels can be expressed as:

$$\mathcal{L}_{SSIM} = 1 - \frac{1}{N}\sum_{i=1}^{N} \mathrm{SSIM}(y_i)$$

Qualitative Evaluation
As shown in Figure 8, we observe that the kNN regressor fails to produce recognizable home range maps. The other two pixel-based regression models (DT and RF) produce relatively noisy results compared to the other baselines. Although they successfully predict the high-probability area in the right place, they also scatter substantial noise across the full image field. Examining the results reveals that pixel-based models are limited when it comes to mapping an image-based target. Concerning the two deep learning models, we find that both CNN + L2 and CVAE eliminate random noise but suffer from blurring. The CNN + L2 produces fuzzy images and a significantly larger positive area than the ground truth. Nevertheless, compared with the pixel-based methods, the image-based models produce more explicit and recognizable home range maps. Our model achieves the clearest and most visually realistic result among all baselines.
Figure 8. We compare the performance of our model and the baselines on two test samples. We colorized the original probability map with the hot colormap, which represents probability values from low to high using dark-to-bright colors.

Quantitative Evaluation
We quantify the mapping accuracy of our model and the baselines using two types of metrics. As seen in Table 3, the three pixel-based models have higher RMSE and MAE and lower R² on the test dataset, which confirms our previous assessment in the qualitative evaluation. The random forest regressor outperforms the decision tree on both RMSE and MAE but has nearly the same SSIM loss. This result reveals that the defect of the pixel-based scheme in image-based tasks cannot be overcome by improving the regression model. The image-based baselines generally achieve better performance than the pixel-based ones. The CNN + L2 achieves a relatively lower RMSE because its objective is also to minimize the L2 norm of the errors. The blurry samples from the CNN and CVAE lead to higher SSIM loss. In general, our model shows promising results on both the three regression metrics and the structural similarity, which demonstrates that the adversarial loss and the convolutional network architecture make a significant contribution to producing accurate and high-quality results.
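The three regression metrics can be computed per map as follows; this is a straightforward reading of MAE, RMSE, and R² evaluated over all pixels of a synthesized/real pair:

```python
import numpy as np

def mapping_accuracy(y_pred, y_true):
    """MAE, RMSE, and R^2 between a synthesized and a real home range
    map, computed over all pixels."""
    err = y_pred - y_true
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err ** 2))
    # R^2: 1 minus residual sum of squares over total sum of squares.
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return mae, rmse, r2
```

A perfect prediction yields MAE = RMSE = 0 and R² = 1; noisy pixel-based outputs inflate the error terms even when the high-probability region is roughly in the right place.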

Discussion
In this paper, we proposed a novel end-to-end deep learning framework to simulate kernel home range models by learning a mapping between image pairs. Instead of defining a new habitat model and explaining its ecological meaning, we focused on extending the applicability of existing home range estimators. This work explores a novel way to solve a specific problem in animal ecology. We hope that the proposed approach can benefit both the remote sensing and ecology communities.
Let us review the traditional habitat mapping studies [14,29,62], which virtually assume that each pixel in a remote sensing image is an independent vector in a multi-dimensional environmental space. They mainly employ traditional classification or regression models from the data mining and machine learning fields to make predictions at the pixel level. This assumption is reasonable when scientists want to identify, verify, and explain habitat characteristics pixel by pixel. However, remote sensing images are highly structured, and their pixels exhibit strong dependencies. Neglecting this structural information significantly reduces the mapping accuracy, as our experiment confirms. So far, structural information has received little consideration in habitat mapping studies.
Next, we would like to delve into the substance of our end-to-end deep learning framework. We interpret our model in two parts: the convolutional encoder-decoder and the adversarial framework. Briefly, the convolutional encoder-decoder is the main body implementing the mapping, and the adversarial framework is a superior training strategy. Similar to the original CNN [37], which combines feature extraction, feature selection, and the classifier into one network, the convolutional encoder-decoder merges feature extraction, feature selection, latent code interpretation, and image reconstruction into one end-to-end model and trains them together. In contrast, traditional habitat studies often carry out every stage separately. End-to-end deep learning models have achieved great success in computer vision [70,71] and remote sensing applications [72]. We believe that this type of model can provide a new solution for habitat mapping as well. Regarding the adversarial framework, its key feature is the adversarial loss constructed by both the generator and discriminator. The adversarial loss can be viewed as a high-level goal that covers many low-level losses [41], and therefore yields a better result. In our supervised model, the adversarial loss is effectively a superior objective for training the convolutional encoder-decoder. We implement the image-based mapping by combining an advanced network architecture with a high-level training objective, which leads to success in mapping complicated targets from remote sensing imagery.

Conclusions
In conclusion, we propose a general-purpose framework to simulate kernel-based home range estimators. The experiment demonstrates that our framework can produce visually recognizable and highly accurate results from remote sensing imagery. Our approach could be generalized to map other types of habitat models as well, such as habitat suitability models [73] and habitat potential models [29]. The deep neural network could help to discover the relationship between animal habitat and environmental factors, replacing the GPS data used in these models. Our approach still has some limitations. One important issue is that the selection of input layers mainly relies on expert knowledge; our framework can hardly provide an explicit ranking of input layers due to its deep convolutional architecture. In future work, we will attempt to incorporate more effective selection strategies into our framework to improve the mapping performance.

Acknowledgments: For logistics and field support, we are grateful to the following groups and individuals: the Institute of Automation, Chinese Academy of Sciences (S. Xiang); the Qinghai Lake National Nature Reserve staff (Z. Xing, D. Zhang); and the Qinghai Forestry Bureau (S. Li).

Figure 1 .
Figure 1. Using the adversarial framework to simulate the kernel-based home range estimator. X is the time-series remote sensing image, and Y is the corresponding home range map. The generator G learns the mapping function X → Y. The discriminator D tries to classify the real and synthetic data-target pairs. Both G and D are deep convolutional neural networks.

Figure 2 .
Figure 2. Architecture of the generator G in the adversarial framework. G generates samples G(X, z) from auxiliary information X and random noise z. "CONV/DCONV, stride = 2" denotes a convolutional/deconvolutional layer with a stride of two. The number k under each tensor and the number d on top indicate that the tensor has size d × d × k.

Figure 3 .
Figure 3. Architecture of the discriminator D in the adversarial framework.The number under each tensor stands for the number of features.

Figure 4 .
Figure 4. The study area includes Qinghai Lake, Ngoring Lake, Gyaring Lake, Donggi Conag Lake, and several wetlands and estuaries. These places serve as a critical breeding ground and migratory staging area for Bar-headed Geese.

Figure 5 .
Figure 5. The pre-processing procedure for building the image-based data-target pairs from source data. The GPS data are divided into time-series groups, which are used to estimate home range maps via the kernel UD estimator. Then we pair each home range map (H_n) with the corresponding remote sensing image (R_n) to form the image pairs.

Figure 6 .
Figure 6. The loss curves of our model and two different combinations (loss + optimizer). The first row presents the loss of the discriminator, and the second row presents the loss of the generator. The X axis represents the number of training steps, and the Y axis represents the loss values. Our model shows the fastest convergence and the smoothest loss curves for both the generator and discriminator among all combinations.

Figure 7 .
Figure 7. The mapping results of the selected samples in the test dataset. In each set, the first image is the true color composite (bands 1, 4, 3) of the MODIS land products, representing the input remote sensing imagery. The last image (ground truth) is estimated by the kernel UD estimator with GPS data. The middle one is the synthesized home range map, mapped directly from remote sensing images using our end-to-end model. We colorized the original probability map with the hot colormap.

Table 1 .
ID number, sex, capture time, and number of GPS locations of five selected Bar-headed Geese.

Table 3 .
The quantitative evaluation of our model and the baselines, with the metrics RMSE, MAE, R², and L_SSIM.