A Small UAV Based Multi-Temporal Image Registration for Dynamic Agricultural Terrace Monitoring

Terraces are the major land-use type of agriculture and support the main agricultural production in southeast and southwest China. However, due to smallholder farming, complex terrains, natural disasters and illegal land occupations, a light-weight and low cost dynamic monitoring of agricultural terraces has become a serious concern for smallholder production systems in the above area. In this work, we propose a small unmanned aerial vehicle (UAV) based multi-temporal image registration method that plays an important role in transforming multi-temporal images into one coordinate system and determines the effectiveness of the subsequent change detection for dynamic agricultural terrace monitoring. The proposed method consists of four steps: (i) guided image filtering based agricultural terrace image preprocessing, (ii) texture and geometric structure features extraction and combination, (iii) multi-feature guided point set registration, and (iv) feature points based image registration. We evaluated the performance of the proposed method by 20 pairs of aerial images captured from Longji and Yunhe terraces, China using a small UAV (the DJI Phantom 4 Pro), and also compared against four state-of-the-art methods where our method shows the best alignments in most cases.


Introduction
China has more than 70% of its land made up of mountains and hills, while the main agricultural terraces are located in southwest and southeast China.Therefore, agricultural terraces have become the most important agricultural land-use type and support the main agricultural production in these areas.Furthermore, it also plays an important role in reducing flood runoff, changing terrain slope and maintaining water, soil and conservation fertilizer.
However, due to Chinese agricultural policy and regional characteristics, most agricultural terraces in southwest and southeast China are farmed by smallholders, and have small sizes, scattered distributions, complex terrains and other characteristics.Such issues have increased difficulty in land use management and planting management of local governments, as well as regular cropping and planting area monitoring of smallholder farmers.Meanwhile, illegal land occupations, natural disasters, water and wind erosion are also causing a drastic decrease in agricultural terraces, and these further aggravate the soil erosion and endanger the smallholder production.Consequently, a light-weight and low cost dynamic monitoring of agricultural terraces has become a serious concern for smallholder production systems in China.
Agricultural terrace monitoring includes two major aspects: (i) crop monitoring and (ii) planting area monitoring.In this work, we mainly focus on the planting area monitoring, which generally depends on aerial remote sensing and image processing techniques.In parallel to the technological advances of aerial remote sensing, achievements in image processing have promoted the development of sophisticated algorithms using aerial images, such as the landform classification, which is one of the effective ways for dynamic monitoring of land-use changes.Traditional landform classification methods can be divided into non-supervised classification methods and supervised classification methods [1].Basically, non-supervised methods consist of manual classification methods and automated classification methods [2].Manual classification methods are relatively time-consuming and the results depend on subjective decisions of the interpreter and are, therefore, neither transparent nor reproducible [3,4].Automated classification methods [5,6] make use of the unsupervised nature and automation of the change analysis process.However, they are unfavored by difficulties in identifying and labeling change trajectories [7], and the lack of information on calibration of the ground [7].Artificial neural network (ANN) constitutes a key component of supervised classification methods [8,9].It is a non-parametric method that is capable of estimating the properties of data based on the training samples.However, ANN suffers from the long training time, the sensitivity to the amount of training data used, as well as the applicability of ANN functions in the common image processing softwares [7].
Subsequent to the landform classification, classified images can be adopted to monitor land use changes.Over the last few years, in order to analyze agricultural terraces, different and new technologies have been used.Satellite data of high spatial resolution and advanced image processing techniques, have opened up a new insight for mapping landscape features, such as terraces.This has given the opportunity for quantitative assessment of farming practices as an indicator in water pollution risk assessment [10], soil erosion risk assessment [11] and landslide boundary monitoring [11][12][13][14].In addition, airborne LiDAR (light detection and ranging), which has been developed to collect and subsequently characterize vertically distributed attributes [15], and the derived digital elevation model (DEM) [16][17][18][19][20] or digital terrain model (DTM) [12,14] are becoming standard practices in spatial related areas.Recently, the use of unmanned aerial vehicle (UAV) for civil applications has emerged as an attractive and flexible option for the monitoring of various aspects of agriculture and environment [21].For example, Diaz-Varela et al. [21] proposed an automatic identification of agricultural terraces through object-oriented analysis of high resolution digital surface models and multi-spectral images obtained from UAVs.Deffontaines et al. [22] monitored the active inter-seismic shallow deformation of the Pingting terraces by using UAV high resolution topographic data combined with InSAR time series.Yang et al. [23] proposed a multi-viewpoint remote sensing image registration method that provided an accurate mapping between different viewpoint images for ground change detections.
Compared with satellite and other aerial remote sensing, using small UAVs for agricultural terrace monitoring has a strong mobility, high efficiency, low cost and other advantages.However, the following issues still exist: (i) Due to the payload capacity, small UAVs usually can only carry a light-weight visible light camera, such as CCD or CMOS cameras that limit available image information while increasing difficulty in monitoring algorithms compared with using multi-spectral imaging.(ii) When collecting multi-temporal images for the same location (e.g., a planting area in terraces), the imaging perspective of small UAVs is often easily affected by wind speed/direction, complex terrain, battery capacity (e.g., flying distance), aircraft posture (pitch, roll, yaw), flying height and other human factors.These factors cause the captured scenes (i.e., the same location in a pair of multi-temporal images) to not be in the same coordinate system, while image geometric distortions, low image overlapping, brightness changes and color changes may also be produced in such multi-temporal images.
The above issues have led to the fact that multi-temporal images of the same scene captured by small UAVs may not be directly used to detect changes for dynamic agricultural terrace monitoring, and a reliable multi-temporal image registration, which can transform the images into one coordinate system, is necessary in order to be able to subsequently compare or integrate the data obtained from the multi-temporal images.
In this work, we focus on planting areas of agricultural terraces, and present a small UAV based multi-temporal image registration method for dynamic agricultural terrace monitoring.The major contributions of the proposed method includes: (i) the guided image filtering for agricultural terrace image preprocessing is first designed to enhance terrace ridges in multi-temporal images, (ii) the multi-feature descriptor is then applied to combine the texture feature and the geometric structure feature of terrace images for improving the description of feature points and rejecting outliers, (iii) the multi-feature guided model provides an accurate guiding for feature point set registration, and (iv) the feature points based image registration finally registers the terrace images accurately.

Methodology
The proposed small UAV based multi-temporal image registration method has four major sequential processes: (i) image preprocessing; (ii) feature extraction and combination; (iii) feature point set registration; and (iv) image registration.In this section, we first introduce the proposed method followed by analyzing the computational complexity and discussing the implementation details.

Guided Image Filtering Based Agricultural Terrace Image Preprocessing
Given a gray agricultural terrace image I with x × y pixels and intensity I = 0.3R + 0.59G + 0.11B from the colorized image captured by a small UAV.We first define a preprocessing method to strengthen the identifiability of terrace ridges in multi-temporal images, and then extract salient features of terrace images along the enhanced ridges in the second step.The main reason is that accuracy and validity of an image registration are not only controlled by the performance of feature point set registration, but also determined by the number and the distribution density of feature points, because of the image transformation constructed by abundant feature points.
In this work, the preprocessing method improves the contrast ratio between terraces and their ridges using the Guided Image Filtering (GIF) [24].In order to extract large and quality feature points from terrace ridges, we first adopt the GIF, which has the edge-preserving smoothing and the gradient preserving, to preprocess input images.The GIF employs a guidance image to construct a spatially variant kernel and is also related to the matting Laplacian matrix [25].
Firstly, a linear translation-variant guided filtering process in a square window π k centered at a pixel k is defined by: where i is a pixel index, Λ is the output image, parameters of the minimal cost function arg min Here, g is the input image that is identical to the guidance image I, γ k and δ 2 k are the mean and the variance of I in π k , |π| is the number of pixels in π k , ε is a regularization parameter preventing α k from being too large, and Secondly, we apply the linear model to all local windows in the entire image: where An example of agricultural terrace image enhancement (250 × 150 pixels) using GIF is given in Figure 1.

Before filter
After filter An agricultural terrace is constructed by cropland and terrace ridges.Ridges can be considered as a salient feature that provides the features of geometric contours and surface textures for feature based terrace image registration.However, the flat color of terrace ridge is similar with croplands and detecting cropland and terrace ridges becomes difficult when crops are growing in the early stage.Thus, the extracted feature points along ridges are more helpful than points distributed on the cropland.Mathematically, the exponential function can change the density of the data distribution; therefore, we expand the gray value distribution of the common terrace via a natural exponential function formed as: Finally, we can obtain the preprocessed gray agricultural terrace image I new with a high contrast ratio, smoothing edges, and prominent ridges.The preprocessing step gives an opportunity to extract quality features from these preprocessed terrace images.

Features Extraction and Combination
Feature points are selected by using the good-feature-to-track criterion [26], similar to the Harris detector, based on the second moment matrix [27].The selection specifically maximizes the quality of tracking, and is therefore optimized by construction, as opposed to more ad hoc measures of texturedness.The selected feature point set P = {p t } T t=1 belongs with the geometric coordinate of a input agricultural terrace image pixel, where {p t } T t=1 ∈ Z + .There is an example of feature points extraction from a preprocessed agricultural terrace image (500 × 300 pixels) as shown in Figure 2.

Local Texture Feature Descriptor
A local texture (LT) feature descriptor is designed to describe the texture features around each feature point according to the dominant rotated local binary patterns (DRLBP) proposed by Mehta [28].Given a gray image I with x × y pixels.The DRLBP operates in a local circular region by taking the difference of the central pixel with respect to its neighbors.It is defined as: where The mod operator circularly shifts the weights with respect to the dominant direction because of the weight term 2 mod(l−D,L) depends on D in the above definition.Therefore, the DRLBP is a rotation invariance and computationally efficient texture descriptor.
Before giving the LT for each feature point, the image is first weighted for each feature point based on its geometric coordinates, the weighting matrix for each point p t is defined as: where I xy is the geometric coordinate of the pixel I(x, y), p t is the tth point of P, and τ is a parameter that controls the window size of LT. t is a x × y weighting matrix of the tth feature point, and the tth weighting gray image is obtained by: where the weighting gray image I t has the same size with the source gray image.Let I t (x, y) = 0 when I t (x, y) ≤ 10 −4 .Figure 3 shows an example of weighting gray image.We define the LT of the tth feature point via DRLBP in the weighting gray image: where LT (P) R,L is a LT feature set with T vectors both size of 1 × 2 L .

Local Geometric Structure Feature Descriptor
A local geometric structure (LGS) feature is designed for each feature point in P by a local vector weighting method defined by: LGS where t k is the index of kth neighbor of p t , and K is the number of neighbors.The LGS descriptor employs the K neighbors to give a LGS description for each feature point p t but ignores outlier points.Hence, the value of K and the weight term η tk play a crucial role for the performance of LGS.We use an outlier score to define the weight term of −−→ p t p t k : LGS ), where σ 2 LGS is the variance of {∆ LT t } T t=1 , and the outlier score ∆ LT t k is computed by the LT distance between p t k and the point that has the most similar LT feature with p t k as:

Multi-Feature Descriptor
Different types of feature descriptors have their own advantages and limitations.This motivates us to make the respective advantages of LT and LGS descriptors complementary to each other.The multi-feature (MF) is designed to combine the local texture information and the local geometric structure information for improving the identifiability of each feature point.However, a fixed falseness does no good for guiding point registration throughout the iterations.Thus, the MF descriptor is defined as: where T 1 and T 2 are annealing parameters for the LGS and the LT features, respectively.The instantiation of Equation ( 11) is given in the implementation details section.

Multi-Feature Guided Point Set Registration Model
For feature based image registration, two sets of feature points are extracted from a pair of multi-temporal agricultural terrace images (i.e., a sensed image and a reference image), respectively.The extracted feature points contain a large number of outliers that limit the performance of current non-rigid point set registration algorithms [29][30][31].For this issue, a robust multi-feature guided model is designed-given two point sets A = {a n } N n=1 (i.e., the source point set) and B = {b m } M m=1 (i.e., the target point set) which are extracted from the sensed image and the reference image, respectively.The proposed point set registration model is first (i) to estimate correspondences between A and B by the proposed MF descriptor at each iteration, and then (ii) to update the location of A using a non-rigid transformation built by the recovered correspondences.The steps (i) and (ii) are iterated such that the A can gradually and continuously approach the target point set B , and finally match the exact corresponding points in B .

Correspondence Estimation
In the first step, the Gaussian mixture model (GMM) is applied to estimate correspondences by measuring the similarity of the MF between two point sets, and the correspondence estimation problem is considered as a GMM probability density estimation problem.Let the MF of a n be the centroid of the nth Gaussian component, and the MF of b m be the mth data.The GMM probability density function (PDF) is therefore obtained as: where ) with the equal isotropic covariances σ 2 of MF, P mn = 1 N are non-negative equal quantity with ∑ N n=1 P mn = 1, which are called the priors of GMM.ω M is an additional uniform distribution with a weighting parameter ω, 0 < ω < 1 for outlier dealing.
Once we have the PDF of GMM that is guided by the similarity of MFs, we can estimate correspondence by the posterior probability of GMM via Bayes' rule: by which we obtain an one-to-many fuzzy correspondence matrix S N×M guided by the similarity of MFs.Meanwhile, the corresponding target point set is obtained by: The proposed correspondence estimation method imitates the process of human practice, which measures the similarities of geometric structure feature and local texture feature.Generally, the process for humans to estimate the corresponding point of the source point a in terrace image consists of two parts: (i) searching for a region in a reference image that has a similar geometrical location and structure compared to the region that surrounds source point a, and (ii) finding a point within this region that has similar color features (LT in this paper) to the source point set.An example is shown in Figure 4, where only 100 feature points are shown for visual convenience.The two feature points a and b both have the similar pattern of LT histogram and the similar geometric feature.

Transformation Estimation
We model the non-rigid displacement function f by requiring it to lie within a specific functional space, namely a vector-valued reproducing kernel Hilbert space (RKHS) [32,33].The Gaussian kernel, which is in the form G(a n1 , a n2 ) = exp(− 1 2α 2 a n1 − a n2 2 ) and of size N × N, is chosen to be the associated kernel for the RKHS, where α is a constant to control the spatial smoothness.The function f can be defined by: Thus, the transformation estimation boils down to finding a finite parameter matrix W. Before a direct parameter estimation, we first illustrate a rule by which a reliable transformation parameter is obtained in the estimation process.
• Regularizing the transformation estimation process.The adopted Tikhonov regularization framework [34][35][36][37] is one of the most common forms of regularization.It minimizes an energy function in an RKHS H to regularize a function f , and can be written as: In this paper, the function f is defined in Equation (15).
As shown in Figure 5a, the regularized function (denoted by black line) is more reasonable than its non-regularized counterpart (denoted by blue curve).The transformation of an iterative registration method is such a procedure that slowly displaces the source point set so that the correspondence estimation is easier and more reliable.In other words, regularizing the transformation is necessary to accomplish the iterative registration.Figure 5b,c indicate that the ill-posed problem will exist if the transformation is not regularized.Note that as the number of points increases, the increasing arbitrariness of the transformation will lead to more severe ill-posed problems.The multi-feature guided fuzzy correspondence matrix S contains N × M probabilities, hence a reliable transformation will produce a larger expectation of probabilities.Therefore, the solution of transformation estimation is detected by maximizing a likelihood function that is formed as Π M m=1 s(MF (b m )), or equivalent to minimizing the negative log-likelihood function, which is formed as: We use the maximizing expectation (M-step) of the expectation maximization (EM) algorithm [29] to estimate the transformation.The idea of the EM algorithm is first to guess the values of parameters ("old" parameter values) via computing the posterior probability by Equation ( 13) (E-step), and then to find the "new" parameter values via minimizing the expectation of the complete negative log-likelihood function (M-step), which is formed as: where N S = ∑ M m=1 ∑ N n=1 s nm ≤ M (with M = N S only if ω = 0), R is the regularization of the transformation, and µ is a weighting parameter controlling the strength of the regularization.Furthermore, with an initialized deterministic annealing parameter σ 2 , the parameter W is obtained by arg min W Q. The mathematical solution is detailed in Section 2.5.

Feature Points Based Image Registration
Let I and I t be the sensed and reference images, where the source point set A and target point set B are extracted from I and I t , respectively.x(•) × y(•) denotes the size of one image.Our goal is to obtain the transformed image Î.
After the transformed source point set Â is obtained, a mapping function can be estimated based on the corresponding set that is constructed by C = {A, Â}, and then the image registration can be realized.There are two types of mapping: (i) forward approach: directly transforming the sensed image I using the mapping function, and (ii) backward approach: determining the transformed image Î from I using the grid of the reference image I t and the inverse of the mapping.Due to the discretization and rounding, (i) is complicated to implement, as it can produce holes and/or overlaps in the output image, and we use the backward approach for image transformation.We employ the TPS (thin plate spline) transformation model which obtained by: where the TPS model E TPS is of size (N + 3) × 3, O is a 3 × 3 matrix of zeros and Φ is the N × 3 matrix with the nth denoting (1, ân ), and the T is obtained by a pixel-by-pixel indexing process on the reference image I t , where Z = X(I t ) × Y(I t ).Letting grid Θ t be the source point set, and E TPS the TPS transformation model, the transformed grid is obtained by first computing then restoring the dimension of the grid to 2 by Θt ← Θt (•,1) , where the Z × N kernel K ij = θ t i − âj 2 log θ t i − âj , Φ is the Z × 3 matrix with the zth row denotes (1, θ t z ) and Θt (•,i) denotes the ith column of Θt .Let Θ be the grid obtained on I, we have Finally, the transformed image Î is obtained by resampling intensities from the sensed image I based on Θ, setting the rest of pixels to black.Note that the bicubic interpolation is used to improve the smoothness of Î; to be more precise, the intensities of each pixel in Î is determined by summing the weighted neighbor pixel intensities within a 4 × 4 window.

Implementation Details
The instantiation of Equation ( 11) is impeded by LT descriptor LT , which is constructed by a 2 L dimension histogram.Actually, the goal of Equation ( 11) is to measure the distance of MFs, which means that instantiating is equivalent to instantiating Equation (11).Analogically, )) has two-dimensional terms and one 2 L dimension term, hence, respectively instantiating and is equivalent to instantiating Equation (11).The distance of geometric and similarity of LT are denoted by Equations ( 22) and ( 23), in which the instantiation of G can be commonly realized by Euclidean distance computation.Furthermore, in general, a human similarity estimation of LT is to differentiate the pattern of the histogram, as we discussed in Section 2.3.1.Therefore, we instantiate Ψ by firstly normalizing each histogram in [0, 1], and then computing the quadratic distance sum of each dimension in the histogram, which is formed as: where LT (a n , i) denotes the ith column of the histogram of a n .Once Equation ( 11) is instantiated, we can respectively rewrite Equations ( 13) and ( 18) by: ) and thus, we can complete the M-step of EM algorithm for implementing the agricultural terrace image registration.The matrix form of Equation ( 26) to simplify the derivative is written as: where Tr denote trace operate, S A = dig(S1), S B = dig(S T 1), 1 is a column vector with all ones.G P = P + U T (P)P, where operator U(P) is defined by with P = {p i } I i=1 denoting a non-representational point set containing I points, i k is the index of the kth neighbor of the ith point, K is the number of neighbors, and η ik is the weight of LGS defined by Equation ( 9).The partial derivative of Equation ( 27) with respect to the parameter W is obtained by: Setting Equation (28) to zero, the parameter W is obtained by: Parameter Setting For evaluating image features, four groups of parameters-(i) R, the radius of the circular neighborhood in DRLBP; (ii) L, the number of the neighbors in DRLBP; (iii) τ, the window size controlled parameter in LT; and (iv) T 1 and T 2 , two annealing parameters for LGS and LT-are used.We set R = 1, L = 8, τ = 10, T 1 = exp(−iter/10) and T 2 = exp(−iter/50).
For registering extracted feature points, six parameters-(i) ω, outlier weighting parameter; (ii) α, a constant to control the spatial smoothness; (iii) σ 2 , the equal isotropic covariances of MF; (iv) W, the parameter of point set transformation; (v) µ, the weighting parameter of regularization and (vi) iter max , the max number of iteration-are used.We set ω = 0.7, α = 2 and iter max = 50, and initialize W as a matrix with all zeros.Moreover, σ 2 and µ are first initialized by and µ = 8, then annealed as iter max × µ, respectively.
The pseudo-code of our feature points based agricultural terrace image registration is outlined in Algorithm 1.

Computational Complexity
The computational complexity of each part in our feature points based agricultural terrace image registration is as follows: where X 1 , X 2 , Y 1 , Y 2 are the widths and heights of the source image and sensed image, respectively; N, M are numbers of the feature points extracted from the source and sensed image.Overall, the time complexity of the proposed method is O(NX

Experiments Design
CPD (coherent point drift) [29], GLMDTPS (global and local mixture distance with thin plate spline transformation) [30], SIFT (scale invariant feature transform) [38] and SURF (speeded-up robust features) [39], four state-of-the-art methods, are compared against our method in the following experiments.SIFT and SURF methods used the open source VLFeat toolbox with the threshold 1 and the Matlab open source OpenSURF function with the default setting, respectively.We design two series of experiments: (i) due to the employment of the same feature point sets, the quantitative comparison on feature point matching is carried out on CPD, GLMDTPS and our method using the precision ratio (PR) [40]; (ii) quantitative comparison and qualitative demonstration on image registration are carried out on all the methods using the root of mean square error (RMSE), mean absolute error (MAE) and standard deviation (SD).The experimental dataset includes 20 pairs multi-temporal (4-5 month time interval) and multi-viewpoint agricultural terrace images (500 × 300 pixels) captured from Longji and Yunhe terraces, China (see Figure 6).All agricultural terrace images were obtained by a small UAV (the DJI Phantom 4 Pro (SZ, China), store homepage: [41]) with a CMOS camera.The small UAV basically maintained the same flight height (around 50-70 m) for collecting multi-temporal images of the same locations, but appropriately changed the imaging perspective for generating different geometric distortions, and different overlapping degrees of image pairs.All experiments are tested on a PC with 2.60 GHz Intel CPU and 16 GB memory.

Evaluation Criterion
The PR is usually employed to estimate the accuracy of feature point matching and defined as where TP and FP denote the true positive and the false positive, respectively.The positive indicates inliers and the negative indicates outliers.The RMSE, MAE and SD are usually used to quantify image registration accuracy.We manually determine 20 pairs of landmarks between the sensed image and the reference image as ground truth, and all the landmarks are well-distributed and selected in the easily identified places around agricultural terraces.The related formulations and the definitions in statistics are as follows: where a l n and b l n are the nth pair of corresponding landmarks in the sensed image and the reference image, respectively.N l is the total number of selected landmarks, and the operator d(•, •) denotes the distance.

Results of Feature Matching
All agricultural terrace image pairs contain viewpoint changes and were captured with a 4-5 month time interval.By using the proposed preprocessing method, each image pair has 624 to 1513 feature points.For the quantitative comparison and visual demonstration, we evenly selected 100 feature points with more than 50% outlier (negative) to data rate for calculating the PR and visualizing the matching results as shown in Table 1 and Figure 7. CPD and GLMDTPS gave poor performances since they estimated correspondences only using the Euclidean distance and the mixed geometric features, respectively, although CPD applied the motion coherent based geometric constraint to regularize the displacement field and GLMDTPS applied the annealing scheme to gradually change the transformation from rigid to non-rigid during registration.Our method gave the best matching performance in all image pairs since the local texture feature and the local geometric structure feature are combined and very complementary.

Table 1. Experimental results on series (i). Quantitative comparisons on the mean PR (precision ratio).
Bold fonts indicate the best results.All units are in percentages.CPD (coherent point drift) denotes the method [29] and GLMDTPS (global and local mixture distance with thin plate spline transformation) denotes the method [30].

Results of Image Registration
For CPD, GLMDTPS and our method, all extracted feature points were used for image registration.Feature points for SIFT and SURF were extracted by their default setting.The quantitative comparison using the mean RMSE, MAE and SD are shown in Table 2.The transformed images and the checkboards in four typical image registration examples are shown in Figure 8. Registering images using sparse correspondence has the advantages of being potentially faster and easily maintains a high feature point matching ratio.However, the image registration accuracy might not be high since a desired image transformation with acceptable accuracy can only be established based upon more dense correspondence (e.g., more feature points).This can be explained since the image transformation is basically interpolated from correspondences of feature points, thereby an adequate number of correspondences play a crucial role in yielding detailed transformation.
In this experiment, SIFT and SURF, which extract a relative small number of feature points, failed all and 17 registrations, respectively, although SURF performed well in the other three registrations (see Table 2).The reason was that the image registrations with a small number of feature points are sensitive to mismatching.Although CPD and GLMDTPS employed the same number of feature points with our method, the geometric features used in CPD and GLMDTPS were sensitive to outliers and similar neighborhood structures in multi-temporal images.Therefore, they also gave the relatively poor registration accuracies (e.g., GLMDTPS failed five registrations).In our method, the local texture feature and the local geometric structure feature of terrace images are combined well to improve the feature description of points, while helping to reject outliers.The outliers were rejected in the point matching step, but used to yield detailed transformation in the image registration step.Therefore, our method gave the best registration performance.
Table 2. Experimental results on series (ii).Quantitative comparisons on image registration measured using the mean RMSE (root of mean square error), MAE (mean absolute error) and SD (standard deviation) are carried out.0 ≤ θ ≤ 20 denotes the number of the failed registrations (can not manually identify landmarks).Bold fonts indicate the best results, and all units are in pixels.CPD (coherent point drift) denotes the method [29], GLMDTPS (global and local mixture distance with thin plate spline transformation) denotes the method [30], SIFT (scale invariant feature transform) denotes the method [38] and SURF (speeded-up robust features) denotes the method [39].

Conclusions
Due to soil erosion, illegal land occupations, agricultural land management practices and low purchasing power in China's southeast and southwest mountain area, smallholder farmers as well as local governments require a light-weight and low cost technology such as DJI Phantom UAVs to monitor their planting area in terraces.However, multi-temporal images of the same planting area captured by small UAVs have only visible image information and are always accompanied by viewpoint changes, image geometric distortions, low image overlapping, brightness changes and color changes such that the images may not be directly used for dynamic agricultural terrace monitoring.Thus, transforming multi-temporal images into one coordinate system is necessary in order to be able to subsequently compare or integrate planting area information for dynamic agricultural terrace monitoring.
In this work, we have presented a small UAV based multi-temporal image registration method.The proposed method first designed a guided image filtering to enhance terrace ridges in multi-temporal images, and a multi-feature descriptor was applied to combine the texture feature and the geometric structure feature of terrace images for improving the description of feature points and rejecting outliers.The multi-feature guided model then provided an accurate guiding for feature point set registration, and the feature points based image registration finally gave an accurate image registration.Experiments on 20 pairs of multi-temporal terrace images captured by a DJI Phantom 4 Pro demonstrated that our method gives the best registration performance, and outperforms four state-of-the-art methods.To fully realize the dynamic agricultural terrace monitoring, future work will focus on core decision rules of automatic change detection algorithms between the registered images.For registering other landscape elements, the image enhancement preprocessing, and multi-feature extraction and combination steps (as described in Sections 2.1 and 2.2), should focus on object-specific features that are invariant to various imaging perspectives, color and brightness changes.

Figure 1 .
Figure 1.An example of guided image filtering.The images before and after filtering are shown in the first row, and the enhanced ridges are marked by the red windows.The image gradients before and after filtering are shown in the last row, exhibiting the gradient preserving by the GIF.

Figure 2 .
Figure 2.An example of feature points extraction.There are 1270 feature points extracted and denoted by blue circles.

,
i(x, y) and i(a l (x), b l (y)) are the gray values of central pixel and its neighbor in image I, respectively, (x, y) and (a l (x), b l (y)) are the geometric coordinate of the central pixel and its lth neighbor.Note that a l (x) = x + Rcos(2πl/L) and b l (y) = y − Rsin(2πl/L), R is the radius of the circular neighborhood and L is the number of the neighbors.The mod indicates the modulus operator, and D = arg max l∈{0,1,...,L−1} |i(a l (x), b l (y)) − i(x, y)|.In this paper, the {DRLBP I R,L (x, y)} X−1,Y−1 x=0,y=0 are held in binary form.The DRLBP descriptor DRLBP I R,N of I is denoted by the DRLBP histogram H

Figure 3 .
Figure 3.An example of weighting gray image that is based on a feature point p t .The left one is the original terrace gray image, and the right is the weighting gray image.The effect of the weight operator is shown in a red window, and the size of the window is controlled by the parameter τ 2 .

Figure 4 .
Figure 4. Example on estimating correspondence between two terrace images.In the first row, histograms of the points a and b are provided, respectively.In the second row, the points a and b are classified as the corresponding pair, which are connected by the red line.

Figure 5 .
Figure 5. Two examples for demonstrating the importance of regularization.(a) two different results for a function estimation problem.The black line and blue curve denote the estimated functions with or without being regularized, respectively.In addition, the red asterisks denote 11 data points; (b) point set transformation and its velocity field in regularized scenario; (c) point set transformation and its velocity field in a non-regularized scenario.In (b,c), the blue and red points denote the source and target point, respectively.

Algorithm 1 : 2 Graying image I and I t ; 3 4 5 end 6 12 Compute 20 Anneal T 1 , 21 Anneal σ 2 ;
Local texture feature and geometric structure feature guided agricultural terrace image registration input : Sensed agricultural terrace image I and reference agricultural terrace image I t output : Transformed point set Â, correspondence matrix S and transformed image Î parameter : R, L, τ, T 1 , T 2 , ω, α, σ 2 , W, µ and iter max 1 Image preprocessing: Strengthen gray images by Equation (2); Expand the gray value distribution by Equation (3); Select feature point set A and B from I new and I t new , respectively; 7 Initialize σ 2 and W; 8 Compute LT R,L (A) and LT R,L (B) in I and I t ; 9 Construct Gaussian kernel GLGS(A) and LGS(B) by Equation (8), respectively; 13 Compute the posterior probability matrix S by Equation (25); 14 Compute the corresponding target point set B by Equation (14); W by Equation (29); 18 Update the source point set by A ← f (A); 19 end T 2 and µ;

Figure 7 .
Figure 7. Feature matching demonstrations on four typical agricultural terrace image pairs.In (a,b), the first to the fourth rows are: the image pairs, the results on feature matching of CPD (coherent point drift), GLMDTPS (global and local mixture distance with thin plate spline transformation), and ours, respectively.Red circles denote feature points extracted by our method.Blue lines indicate the false positive and the false negative, yellow lines indicate the true positive, and red crosses indicate the true negative.

Figure 8 .
Figure 8. Agricultural terrace image registration examples on four typical image pairs.In (a,b), the first to the fourth rows are: the image pairs, transformed images and checkboards built by CPD (coherent point drift), GLMDTPS (global and local mixture distance with thin plate spline transformation), and ours, respectively.Yellow crosses denote 20 pairs landmarks.In (c,d), the first rows show the image pairs and the second rows show the transformed images and the checkboards built by SIFT (scale invariant feature transform) and SURF (speeded-up robust features), respectively.Red circles denote the extracted feature points.