RAMC: A Rotation Adaptive Tracker with Motion Constraint for Satellite Video Single-Object Tracking

Single-object tracking (SOT) in satellite videos (SVs) is a promising and challenging task in the remote sensing community. Regarding the object itself and the tracking algorithm, respectively, the rotation of small-sized objects and tracking drift are common problems, caused by the nadir view combined with complex backgrounds. This article proposes a novel rotation adaptive tracker with motion constraint (RAMC) that exploits the combination of angle and motion information to boost SV object tracking through two branches: rotation and translation. We decouple the rotation and translation motion patterns. In the rotation branch, rotation is recast as a translation problem to achieve adaptive rotation estimation. In the translation branch, appearance and motion information are combined to enhance the object representations and address the tracking drift issue. Moreover, an internal shrinkage (IS) strategy is proposed to optimize the evaluation process of trackers. Extensive experiments are conducted on spaceborne SV datasets captured by the Jilin-1 satellite constellation and the International Space Station (ISS). The results demonstrate the superiority of the proposed method over other algorithms. With an area under the curve (AUC) of 0.785 and 0.946 in the success and precision plots, respectively, the proposed RAMC achieves the best performance among the compared trackers while running at real-time speed.


Introduction
Single-object tracking (SOT), a fundamental but challenging task, establishes object correspondences across the frames of a video [1]. It is applied in diverse scenarios, such as surveillance, human-computer interaction, and augmented reality [2,3]. Given only the initial state of an arbitrary object, the tracker aims to estimate its states in the subsequent frames [4]. Many studies, both deep-learning-based [5][6][7][8][9] and correlation-filter-based [10][11][12][13], have been conducted to improve tracking performance. Following the success of the convolutional neural network (CNN), researchers introduced CNNs into object tracking. CNN-SVM [14] combines a CNN with a support vector machine (SVM) [15] to achieve tracking. TCNN [16] and MDNet [6] have demonstrated strong performance in object tracking. Many trackers are also based on the Siamese network, such as SiamRPN [7], SiamRPN++ [8], and SiamMask [17]. Deep-SRDCF [18], C-COT [19], and ECO [20] employ deep features extracted from CNNs to enhance the object representations, but at the cost of high computational complexity. Correlation-filter-based methods emerged with MOSSE [21], which trains the filter by minimizing the output sum of squared errors. The CSK [22] tracker improves upon MOSSE by introducing the circulant matrix and the kernel trick. However, despite its gains in accuracy and speed, CSK still relies on simple raw pixel features. The kernelized correlation filter (KCF) [10] extends CSK by incorporating multichannel histogram of oriented gradients (HOG) [23] features and different kernel functions, and it has shown outstanding performance in tracking objects without rotation. Only a few correlation filters [24][25][26] consider rotation. In addition, correlation filters are prone to tracking drift, in which the sample gradually drifts away from the object. Some algorithms (e.g., SRDCF [27] and CSR-DCF [13]) have been proposed to prevent tracking drift, at the expense of high computational cost.
Remote sensing observation capabilities have broadened from static images to dynamic videos. In 2013, the SkySat-1 video satellite captured panchromatic videos with a ground sample distance (GSD) of 1.1 m and a frame rate of 30 frames per second (FPS) [28]. In 2016, the International Space Station (ISS) released an ultra-high-definition RGB video with a GSD of 1.0 m and a frame rate of 3 FPS. Since 2015, members of the Jilin-1 satellite constellation, produced by Chang Guang Satellite Technology Co., Ltd. (Changchun, China), have been launched. Currently, Jilin-1 can capture 30 FPS RGB videos with a GSD of 0.92 m. Video satellites in orbit deliver rich, dynamic information on the Earth's surface and have been successfully used for SOT [29][30][31], traffic analysis [32], stereo mapping [33], and river velocity measurement [34]. However, compared with SOT in natural videos (NVs), SOT in satellite videos (SVs) poses many challenges and is an emerging subject [35]. The main difficulties are twofold:
(1) The nadir view makes in-plane rotation of the object a common phenomenon, as shown in Figure 1b-d. Rotation can induce non-rigid deformation of the object and change its spatial layout, degrading the performance of tracking algorithms [36].
(2) The complex background and the low contrast between small-sized objects and the background can cause tracking drift [37], as shown in Figure 1f-h.
The KCF [10] has shown promising performance for SOT in SVs [36][37][38][39]. However, the HOG-based KCF inherently cannot handle object rotation [40]. Its axis-aligned bounding box contains more background information, which may cause tracking drift [36,39]. Moreover, compared with a rotating bounding box, it cannot express accurate semantic information, such as the real size and orientation of the object, as shown in Figure 2. To address the object rotation and tracking drift issues of SOT in SVs, this paper proposes a rotation adaptive tracker with motion constraint (RAMC) consisting of a rotation branch and a translation branch. We perform quantitative and qualitative experiments on spaceborne SV datasets. The experimental results demonstrate that the RAMC tracker outperforms state-of-the-art algorithms and runs at over 40 FPS. The major contributions are summarized as follows:

Satellite Video Single-Object Tracking
As mentioned previously, the main challenges of SOT in SVs are object rotation and tracking drift. A few methods have been proposed to solve the object rotation problem in SVs. Guo et al. [37] detect object orientations from slope information and output rotating bounding boxes. Xuan et al. [40] rotate the extracted patch with a fixed-angle pool to handle object rotation and output axis-aligned bounding boxes. These methods may be insensitive to slight rotations. To address tracking drift, some approaches [29–31,36,37,41,42] build motion models based on the relatively stable motion patterns of objects in SVs. In [30,36,37,41,42], the authors use a Kalman filter [43] to predict the object position when tracking confidence is low, which attenuates tracking drift. In [29,31], motion smoothness and centroid inertia models are embedded into the tracking framework to reduce tracking drift. However, most of these methods [29,31,36,37,41,42] place high demands on positioning accuracy during the initial stage. Other methods [30,38,39,44] extract the motion features contained in adjacent frames to prevent tracking drift. Du et al. [44] combine the three-frame-difference approach with the KCF tracker to obtain the object's position. Shao et al. [30] construct a refining branch based on Gaussian mixture models (GMMs) to reduce the risk of drifting. In [37,38], the authors use Lucas-Kanade sparse optical flow [45] features for SOT in SVs. However, they ignore the directional information of the optical flow, and sparse optical flow can hardly represent pixel-level motion information [46].
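As a rough illustration of the Kalman-filter strategy attributed to [30,36,37,41,42], the sketch below runs a constant-velocity Kalman filter on the object center and falls back on the prediction when the tracker's confidence is low. The constant-velocity model, the noise levels `Q` and `R`, and the confidence threshold `tau` are assumptions chosen for illustration, not the settings of the cited works.

```python
import numpy as np

# State [x, y, vx, vy] under an assumed constant-velocity motion model.
F = np.array([[1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)
H = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)   # only the center (x, y) is measured
Q = 1e-3 * np.eye(4)   # process noise (assumed)
R = 1e-2 * np.eye(2)   # measurement noise (assumed)


def kalman_step(state, P, measurement, confidence, tau=0.3):
    """Predict, then update only if confidence >= tau (hypothetical threshold)."""
    state = F @ state
    P = F @ P @ F.T + Q
    if confidence >= tau:
        innovation = measurement - H @ state
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        state = state + K @ innovation
        P = (np.eye(4) - K @ H) @ P
    return state, P


# Usage: an object moving (1, 2) px per frame; frame 20 has a drifted, low-confidence response.
state = np.zeros(4)
P = 100.0 * np.eye(4)
for t in range(1, 21):
    truth = np.array([1.0 * t, 2.0 * t])
    conf = 0.9 if t < 20 else 0.1
    meas = truth if t < 20 else np.array([50.0, -50.0])  # unreliable measurement is ignored
    state, P = kalman_step(state, P, meas, conf)

print(state[:2])  # predicted center, close to the true (20, 40)
```

Skipping the update at low confidence is what lets the motion model "attenuate" drift: the erroneous measurement at frame 20 never enters the state.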
To achieve precise angle estimation and localization for SOT in SVs, this study designs a rotation adaptive tracking framework with motion constraint. It decouples the rotation and translation motion patterns by recasting the rotation problem as a translation problem. In addition, it combines appearance and motion information to enhance the localization performance of SOT in SVs. This design enables the proposed method to estimate slight angle differences of objects and to mitigate tracking drift.

Kernelized Correlation Filter
KCF [10] has shown promising performance for SOT in SVs [36][37][38][39]; we therefore build on it and improve it to address object rotation and tracking drift and to obtain accurate semantic representations. This section introduces the KCF [10] framework in terms of its training and detection processes. KCF applies dense cyclic samples to explore the structural information of an object. A circulant matrix is used to model this structure, so that correlation is transformed into element-wise products by the fast Fourier transform (FFT).
Let an M × N patch x denote a base sample that is centered on the object and more than twice the size of the object. All cyclic shifts $\{x_{m,n}\}$, $(m, n) \in \{0, \dots, M-1\} \times \{0, \dots, N-1\}$, are regarded as dense samples of the base sample. They are labeled by a Gaussian function y, so that y(m, n) is the label of $x_{m,n}$.
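The circulant structure is what makes dense sampling cheap. The NumPy sketch below (an illustration, not the authors' code) checks on a random single-channel patch that the inner products between z and every cyclic shift of x, computed by brute force with `np.roll`, equal a single inverse FFT of $\hat{x}^* \odot \hat{z}$. It also builds a Gaussian label map y peaking at shift (0, 0); the label bandwidth `sigma` is an assumed value.

```python
import numpy as np


def brute_force_correlation(x, z):
    """c[m, n] = sum_{u,v} x[u, v] * z[u + m, v + n], looping over all M*N cyclic shifts."""
    M, N = x.shape
    c = np.empty((M, N))
    for m in range(M):
        for n in range(N):
            c[m, n] = np.sum(x * np.roll(z, shift=(-m, -n), axis=(0, 1)))
    return c


def fft_correlation(x, z):
    """The same M*N values from one element-wise product in the Fourier domain."""
    return np.real(np.fft.ifft2(np.conj(np.fft.fft2(x)) * np.fft.fft2(z)))


def gaussian_labels(M, N, sigma=2.0):
    """Gaussian labels y(m, n) peaking at shift (0, 0), with circular distances."""
    dm = np.minimum(np.arange(M), M - np.arange(M))[:, None]
    dn = np.minimum(np.arange(N), N - np.arange(N))[None, :]
    return np.exp(-(dm ** 2 + dn ** 2) / (2 * sigma ** 2))


rng = np.random.default_rng(0)
x = rng.standard_normal((16, 20))
z = np.roll(x, shift=(3, 5), axis=(0, 1))        # z is x cyclically shifted by (3, 5)

c_slow = brute_force_correlation(x, z)
c_fast = fft_correlation(x, z)
print(np.allclose(c_slow, c_fast))               # True
peak = np.unravel_index(np.argmax(c_fast), c_fast.shape)
print(int(peak[0]), int(peak[1]))                # 3 5
y = gaussian_labels(16, 20)
```

The brute-force loop costs O((MN)^2) while the FFT route costs O(MN log MN), which is why KCF can afford to treat every cyclic shift as a training sample.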
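In the training process, the solution ω is obtained by minimizing the ridge regression error [10]:
$$\min_{\omega} \sum_{m,n} \left| \langle \phi(x_{m,n}), \omega \rangle - y(m,n) \right|^2 + \lambda \lVert \omega \rVert^2,$$
where ϕ is the mapping to the Hilbert space induced by the kernel κ, with the inner product $\langle \phi(x), \phi(g) \rangle = \kappa(x, g)$. The constant λ ≥ 0 is a regularization parameter that prevents overfitting. With the kernel trick, which enables powerful nonlinear regression, the solution ω is expressed as a linear combination of the mapped samples:
$$\omega = \sum_{m,n} \alpha(m,n)\, \phi(x_{m,n}).$$
The discrete Fourier transform (DFT) of a vector is denoted by a hat (ˆ). For commonly used kernel functions, the kernel matrix is circulant [10]. Thus, the dual-space coefficient α is obtained as
$$\hat{\alpha} = \frac{\hat{y}}{\hat{k}^{xx} + \lambda},$$
where $k^{xx}$ is the kernel correlation with elements $\kappa(x_{m,n}, x)$. A Gaussian kernel is employed to compute the kernel correlation with element-wise products in the frequency domain. For a patch with C feature channels, the base sample is $x = [x_1, x_2, \dots, x_C]$. Therefore, we have
$$k^{xx'} = \exp\left(-\frac{1}{\sigma^2}\left(\lVert x \rVert^2 + \lVert x' \rVert^2 - 2\,\mathcal{F}^{-1}\left(\sum_{i=1}^{C} \hat{x}_i^{*} \odot \hat{x}'_i\right)\right)\right),$$
where $\mathcal{F}^{-1}$ is the inverse Fourier transform (IFT), ⊙ denotes element-wise products, * denotes the complex conjugate, σ is the kernel bandwidth, and i is the index of the feature channels; during training, $x' = x$.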
In the detection process, a patch z of the same size as x is cropped from the new frame, centered on the object position in the previous frame. The response map f(z) is computed as
$$f(z) = \mathcal{F}^{-1}\left(\hat{k}^{xz} \odot \hat{\alpha}\right).$$
The object position is then given by the location of the maximum of f(z). To adapt to changes in the object, the template x and the coefficients α are updated [10].
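Putting training and detection together, the NumPy sketch below trains on a random multichannel patch standing in for HOG features, detects on a cyclically shifted copy, and recovers the shift from the peak of f(z). The normalization of the exponent by the element count, the values of `sigma`, `lam`, and `eta`, and the linear-interpolation model update follow common KCF practice and are assumptions here, not values from this paper.

```python
import numpy as np


def gaussian_correlation(x, z, sigma=0.5):
    """Gaussian kernel correlation k^{xz} for (M, N, C) patches via the Fourier domain."""
    xf = np.fft.fft2(x, axes=(0, 1))
    zf = np.fft.fft2(z, axes=(0, 1))
    # Sum over channels i of conj(x_i_hat) * z_i_hat, then return to the spatial domain.
    xz = np.real(np.fft.ifft2(np.sum(np.conj(xf) * zf, axis=2)))
    d = np.sum(x ** 2) + np.sum(z ** 2) - 2.0 * xz
    # Dividing by the element count makes sigma independent of patch size (assumed practice).
    return np.exp(-np.maximum(d, 0.0) / (sigma ** 2 * x.size))


def gaussian_labels(M, N, s=2.0):
    dm = np.minimum(np.arange(M), M - np.arange(M))[:, None]
    dn = np.minimum(np.arange(N), N - np.arange(N))[None, :]
    return np.exp(-(dm ** 2 + dn ** 2) / (2 * s ** 2))


def train(x, y, lam=1e-4):
    """alpha_hat = y_hat / (k_hat^{xx} + lambda)."""
    return np.fft.fft2(y) / (np.fft.fft2(gaussian_correlation(x, x)) + lam)


def detect(alphaf, x, z):
    """f(z) = IFFT(k_hat^{xz} * alpha_hat); the peak is the cyclic shift of z relative to x."""
    response = np.real(np.fft.ifft2(np.fft.fft2(gaussian_correlation(x, z)) * alphaf))
    return response, np.unravel_index(np.argmax(response), response.shape)


def update(old, new, eta=0.02):
    """Linear-interpolation model update (assumed form and rate)."""
    return (1.0 - eta) * old + eta * new


rng = np.random.default_rng(1)
x = rng.standard_normal((32, 32, 4))           # stand-in for a 4-channel feature patch
y = gaussian_labels(32, 32)
alphaf = train(x, y)

z = np.roll(x, shift=(4, -3), axis=(0, 1))     # the object moved by (+4, -3) pixels
response, peak = detect(alphaf, x, z)
dy, dx = [p if p <= 16 else p - 32 for p in peak]   # unwrap circular indices
print(dy, dx)  # 4 -3

x_new = np.roll(z, shift=(-dy, -dx), axis=(0, 1))   # re-crop centered on the new position
alphaf = update(alphaf, train(x_new, y))
x = update(x, x_new)
```

Because the kernel correlation of z with the template peaks at the true shift, and α is close to the Gaussian label, the response map is approximately y translated by that shift.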

Methodology
Figure 3 shows the overall framework of the proposed method, which consists of a rotation branch and a translation branch. The rotation branch calculates the rotation angle α(t) of the object to provide an accurate orientation representation. This angle is also passed to the translation branch to yield a stable response map. The translation branch combines the appearance and motion information contained in adjacent frames for localization, preventing tracking drift. These two complementary branches are unified for SOT in SVs.

Rotation Branch for Adaptive Angle Estimation
Object rotation is common in SVs because of the nadir view. It changes the spatial layout between the object and the background, challenging the accuracy and robustness of object tracking. Since log-polar conversion turns a rotation into a translation, we propose an adaptive angle-estimation method based on Fourier-Mellin registration [47]. The rotation branch consists of three main parts, namely log-polar conversion, feature extraction, and phase correlation, which are briefly introduced below.
For the log-polar conversion, $I(u, v)$ and $I'(\rho, \theta)$ denote the patches in Cartesian and log-polar coordinates, respectively. Given the pivot point $(u_0, v_0)$ and the reference axis (the u axis), we obtain
$$\rho = \lg \sqrt{(u - u_0)^2 + (v - v_0)^2}, \qquad \theta = \arctan\frac{v - v_0}{u - u_0},$$
where ρ is the log distance between the original point (u, v) and the pivot point $(u_0, v_0)$, lg denotes $\log_{10}$, and θ denotes the angle between the reference axis and the line through the pivot and the original point.
Let $I_2(u, v)$ be a rotated replica of $I_1(u, v)$ with rotation $\theta_0$. Their correspondence in Cartesian coordinates is
$$I_2(u, v) = I_1(u\cos\theta_0 + v\sin\theta_0,\; -u\sin\theta_0 + v\cos\theta_0),$$
and in log-polar coordinates it becomes
$$I'_2(\rho, \theta) = I'_1(\rho, \theta - \theta_0).$$
Thus, the rotation between $I_2$ and $I_1$ reduces to a translation along the θ axis between $I'_2$ and $I'_1$. By calculating the offset $\theta_0$, the angle difference between the two patches can be obtained.
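The rotation-to-translation property can be checked numerically. The short Python sketch below maps a Cartesian point to (ρ, θ) with the conventions just defined (ρ uses log10, θ is measured from the u axis), rotates the point about the pivot by $\theta_0$, and confirms that ρ is unchanged while θ shifts by exactly $\theta_0$. Using `math.atan2` instead of a bare arctangent (so θ covers the full circle) and the sample coordinates are our choices.

```python
import math


def to_log_polar(u, v, u0, v0):
    """Map (u, v) to (rho, theta) about pivot (u0, v0); theta in degrees in [0, 360)."""
    du, dv = u - u0, v - v0
    rho = math.log10(math.hypot(du, dv))
    theta = math.degrees(math.atan2(dv, du)) % 360.0
    return rho, theta


def rotate_about(u, v, u0, v0, angle_deg):
    """Rotate (u, v) counterclockwise by angle_deg around the pivot (u0, v0)."""
    a = math.radians(angle_deg)
    du, dv = u - u0, v - v0
    return (u0 + du * math.cos(a) - dv * math.sin(a),
            v0 + du * math.sin(a) + dv * math.cos(a))


pivot = (50.0, 40.0)
point = (62.0, 45.0)
theta0 = 17.0

rho1, th1 = to_log_polar(*point, *pivot)
rho2, th2 = to_log_polar(*rotate_about(*point, *pivot, theta0), *pivot)
print(abs(rho2 - rho1) < 1e-9, abs((th2 - th1) % 360.0 - theta0) < 1e-9)  # True True
```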
For feature extraction, the HOG feature is sensitive to object rotation and can therefore be used to discriminate angle differences [40]. In this study, the HOG feature is extracted in log-polar coordinates and used to calculate the offset $\theta_0$ by phase correlation.
Phase correlation matches images that are translated relative to each other [47], so it is employed to solve for $\theta_0$. Let $f_1(u, v)$ and $f_2(u, v)$ denote two 2D patches, where $f_2$ is displaced from $f_1$ by $(u_0, v_0)$ along the u and v axes:
$$f_2(u, v) = f_1(u - u_0, v - v_0).$$
Their Fourier transforms are related by
$$F_2(\alpha, \beta) = F_1(\alpha, \beta)\, e^{-j2\pi(\alpha u_0 + \beta v_0)},$$
where $F_1(\alpha, \beta)$ and $F_2(\alpha, \beta)$ denote the DFTs of $f_1(u, v)$ and $f_2(u, v)$, respectively. The cross-power spectrum of $F_1$ and $F_2$ is defined as
$$C(\alpha, \beta) = \frac{F_2(\alpha, \beta)\, F_1^{*}(\alpha, \beta)}{\left| F_2(\alpha, \beta)\, F_1^{*}(\alpha, \beta) \right|} = e^{-j2\pi(\alpha u_0 + \beta v_0)},$$
where $e^{-j2\pi(\alpha u_0 + \beta v_0)}$ represents the translation along the u and v axes in the Fourier domain. Applying the 2D IFT to $C(\alpha, \beta)$ yields the phase correlation function of the spatial domain:
$$\phi(u, v) = \mathcal{F}^{-1}\{C(\alpha, \beta)\} = \delta(u - u_0, v - v_0).$$
The peak of ϕ is located at $(u_0, v_0)$, which corresponds to the offset between the two patches:
$$(u_0, v_0) = \arg\max_{(u, v)} \phi(u, v).$$
Through phase correlation, we finally obtain the offset $\theta_0 = v_0$, which is the angle difference between $I_2(u, v)$ and $I_1(u, v)$.
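The phase-correlation steps translate almost line by line into NumPy. The sketch below (illustrative only) shifts a random patch circularly by a known $(u_0, v_0)$, normalizes the cross-power spectrum, and reads the offset from the peak of ϕ, mapping indices beyond half the patch size back to negative offsets. The small `eps` that guards against division by zero is our addition.

```python
import numpy as np


def phase_correlation(f1, f2, eps=1e-12):
    """Return (u0, v0) such that f2(u, v) ~= f1(u - u0, v - v0) for circular shifts."""
    F1 = np.fft.fft2(f1)
    F2 = np.fft.fft2(f2)
    cross = F2 * np.conj(F1)
    C = cross / (np.abs(cross) + eps)        # normalized cross-power spectrum
    phi = np.real(np.fft.ifft2(C))           # ideally a delta at (u0, v0)
    u0, v0 = np.unravel_index(np.argmax(phi), phi.shape)
    M, N = phi.shape
    # Indices past half the size correspond to negative offsets.
    u0 = u0 - M if u0 > M // 2 else u0
    v0 = v0 - N if v0 > N // 2 else v0
    return int(u0), int(v0)


rng = np.random.default_rng(2)
f1 = rng.standard_normal((64, 90))
f2 = np.roll(f1, shift=(7, -12), axis=(0, 1))   # f2(u, v) = f1(u - 7, v + 12)
print(phase_correlation(f1, f2))                # (7, -12)
```

Normalizing by the magnitude discards amplitude and keeps only phase, which is why the peak stays sharp even when the two patches differ in contrast.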
During tracking, the rotation template temp(1) for the first frame is obtained by extracting the HOG feature $z_{hog}(1)$ of the log-polar patch. In each subsequent frame t, the extracted feature $z_{hog}(t)$ is used to compute the angle difference θ(t) between temp(t − 1) and $z_{hog}(t)$, as shown in Figure 3. To adapt to object changes (e.g., illumination and deformation), the rotation template temp(t) at frame t is updated with the learning rate $\eta_\theta$. The pseudocode of the rotation branch is shown in Algorithm 1. Compared with methods [37,40] that estimate the rotation angle of the object from slope information or a fixed-angle pool, the proposed method may achieve more accurate angle estimation and orient the bounding boxes to match the real object state in SVs. Stable response maps can also be obtained under object rotation, thereby enhancing the localization of the translation branch.
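Algorithm 1 itself is not reproduced in this text, so the following is only a sketch of two bookkeeping steps under stated assumptions: that the template update is the linear interpolation $temp(t) = (1-\eta_\theta)\,temp(t-1) + \eta_\theta\, z_{hog}(t)$ commonly used by correlation-filter trackers, and that the log-polar feature map has `n_theta` angular bins spanning 360°, so a phase-correlation offset of $v_0$ bins converts to degrees. The function names and the bin resolution are hypothetical.

```python
import numpy as np

ETA_THETA = 0.01  # learning rate of the rotation template (value given in Implementation Details)


def bins_to_degrees(v0, n_theta):
    """Convert an offset of v0 angular bins into degrees in (-180, 180]."""
    deg = (360.0 * v0 / n_theta) % 360.0
    return deg - 360.0 if deg > 180.0 else deg


def update_rotation_template(template, z_hog, eta=ETA_THETA):
    """Assumed linear-interpolation update: temp(t) = (1 - eta) temp(t-1) + eta z_hog(t)."""
    return (1.0 - eta) * template + eta * z_hog


n_theta = 180                           # 2 degrees per bin (assumed resolution)
print(bins_to_degrees(5, n_theta))      # 10.0
print(bins_to_degrees(175, n_theta))    # -10.0 (wrapped offsets become negative angles)

template = np.zeros((32, n_theta))      # toy template and feature map
z_hog = np.ones((32, n_theta))
template = update_rotation_template(template, z_hog)
print(template[0, 0])                   # 0.01
```

A small η_θ keeps the template close to its history, so a single noisy frame barely changes the reference used for the next angle estimate.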

Translation Branch with Motion Constraint
The object angle α(t) in frame t is estimated by the rotation branch. The next stage is to determine the object position (x, y) given α(t). In the translation branch, the input patch of frame t is first rotated by α(t), and the rotated patch is then fed into the KCF [10] and optical flow remapping (OFR) modules for accurate localization. In the KCF module, the effect of object rotation has been removed, so tracking can rely on the appearance information of the object, as described in Section 2.2. In the OFR module, the optical flow represents the apparent motion of brightness patterns and captures the motion magnitude and direction between adjacent frames [48]. Therefore, the motion state of the object in the previous frame can be remapped to the current frame using the optical flow. To obtain per-pixel motion constraint information, we employ Farneback dense optical flow [49] to remap previous response maps into the current frame.
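Dense optical flow approximates the neighborhood of each pixel with a quadratic polynomial. Given this local signal model, the local coordinate system E(X) of pixel $X = (x, y)^T$ can be expressed as
$$E(X) = X^T A X + B^T X + C,$$
where $A \in \mathbb{R}^{2 \times 2}$ (symmetric), $B \in \mathbb{R}^{2}$, and C are the coefficients of the quadratic polynomial, estimated by weighted least squares. The polynomial coefficients vary from pixel to pixel across the frame. When a pixel is moved by the displacement D, a new local system is constructed as follows:
$$E_2(X) = E_1(X - D) = X^T A_1 X + (B_1 - 2A_1 D)^T X + D^T A_1 D - B_1^T D + C_1 = X^T A_2 X + B_2^T X + C_2.$$
Equating coefficients gives $A_2 = A_1$ and $B_2 = B_1 - 2A_1 D$. In practice, we let $A = \frac{1}{2}(A_1 + A_2)$ and solve for D by minimizing the objective function
$$\lVert A D - \Delta B \rVert^2,$$
where $\Delta B = -\frac{1}{2}(B_2 - B_1)$. To suppress the excessive noise caused by single-point optimization [49], the neighborhood δ of pixel X is integrated, and the solution is obtained from
$$\min_{D(X)} \sum_{\Delta X \in \delta} \omega(\Delta X) \left\lVert A(X + \Delta X)\, D(X) - \Delta B(X + \Delta X) \right\rVert^2,$$
where $\omega(\Delta X)$ denotes the 2D Gaussian weight function of the neighborhood points. The displacement of each pixel is then solved by
$$D(X) = \left(\sum_{\Delta X \in \delta} \omega\, A^T A\right)^{-1} \sum_{\Delta X \in \delta} \omega\, A^T \Delta B.$$
The resulting optical flow represents the motion direction and magnitude of each pixel in the frame. The response map containing the historical states of the object is then remapped to the current frame for localization.
Remote Sens. 2022, 14, 3108 8 of 19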
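The displacement estimate can be checked with a small numerical example. The NumPy sketch below builds a quadratic model $(A_1, B_1)$, shifts it by a known D using the coefficient relations above, and recovers D first from a single point ($AD = \Delta B$) and then from a Gaussian-weighted 5 × 5 neighborhood. The neighborhood coefficients are synthetic perturbations made up for illustration; a real implementation (e.g., OpenCV's `calcOpticalFlowFarneback`) estimates them from image data by polynomial expansion.

```python
import numpy as np

rng = np.random.default_rng(3)
D_true = np.array([1.5, -0.75])                 # known displacement (x, y)


def shifted_coefficients(A1, B1, D):
    """Coefficients of E2(X) = E1(X - D): A2 = A1 and B2 = B1 - 2 A1 D."""
    return A1.copy(), B1 - 2.0 * A1 @ D


# Single-point solution of A D = Delta_B with A = (A1 + A2) / 2.
A1 = np.array([[2.0, 0.3], [0.3, 1.0]])         # symmetric quadratic term
B1 = np.array([0.5, -1.0])
A2, B2 = shifted_coefficients(A1, B1, D_true)
A = 0.5 * (A1 + A2)
delta_B = -0.5 * (B2 - B1)
D_point = np.linalg.solve(A, delta_B)
print(np.allclose(D_point, D_true))             # True

# Neighborhood solution: Gaussian-weighted least squares over a 5 x 5 window.
weights = [np.exp(-(dx ** 2 + dy ** 2) / (2 * 1.5 ** 2))
           for dx in range(-2, 3) for dy in range(-2, 3)]
G = np.zeros((2, 2))
h = np.zeros(2)
for w in weights:
    # Synthetic per-neighbor coefficients that share the same displacement D_true.
    A1_n = A1 + 0.1 * np.diag(rng.random(2))
    B1_n = B1 + 0.1 * rng.standard_normal(2)
    A2_n, B2_n = shifted_coefficients(A1_n, B1_n, D_true)
    A_n = 0.5 * (A1_n + A2_n)
    dB_n = -0.5 * (B2_n - B1_n)
    G += w * A_n.T @ A_n                        # accumulates sum(w A^T A)
    h += w * A_n.T @ dB_n                       # accumulates sum(w A^T Delta_B)
D_window = np.linalg.solve(G, h)
print(np.allclose(D_window, D_true))            # True
```

With real, noisy coefficients the single-point solution fluctuates, whereas the weighted sum over the window averages that noise out, which is the motivation given in [49].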
Remapping is the process of transferring each pixel of the original image g(x, y) of size M × N to the target image G(x, y):
$$G\big(x_i + R_x(x_i, y_j),\; y_j + R_y(x_i, y_j)\big) = g(x_i, y_j),$$
where $R = (R_x, R_y)$ denotes the mapping relationship that specifies the motion direction and magnitude of each pixel in the original image. Therefore, the optical flow at frame t can be regarded as the mapping relationship, $R(x_i, y_j) = D(x_i, y_j)$, and the previous response map at frame t − 1 is remapped to the current frame t, as shown in Figure 4.
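A minimal NumPy sketch of this forward remapping, assuming the flow is stored as an (M, N, 2) array of per-pixel (dx, dy) displacements, that displaced positions are rounded to the nearest pixel, that pixels landing outside the map are dropped, and that the largest response wins when several pixels land on the same target. These choices are ours; a backward warp (e.g., OpenCV's `remap` with an inverted flow) is a common alternative.

```python
import numpy as np


def forward_remap(g, flow):
    """Move every pixel of g by its flow vector: G(x + dx, y + dy) = g(x, y).

    g: (M, N) response map from frame t-1, indexed g[y, x].
    flow: (M, N, 2) per-pixel displacement (dx, dy) from frame t-1 to frame t.
    """
    M, N = g.shape
    G = np.full(g.shape, -np.inf)
    ys, xs = np.mgrid[0:M, 0:N]
    xt = np.rint(xs + flow[..., 0]).astype(int)
    yt = np.rint(ys + flow[..., 1]).astype(int)
    inside = (xt >= 0) & (xt < N) & (yt >= 0) & (yt < M)
    # np.maximum.at keeps the strongest response when several pixels collide.
    np.maximum.at(G, (yt[inside], xt[inside]), g[inside])
    G[np.isneginf(G)] = 0.0          # pixels that receive nothing get zero response
    return G


response_prev = np.zeros((8, 10))
response_prev[2, 3] = 1.0                       # peak at (y=2, x=3) in frame t-1
flow = np.zeros((8, 10, 2))
flow[..., 0] = 2.0                              # everything moves 2 px right
flow[..., 1] = 1.0                              # and 1 px down
remapped = forward_remap(response_prev, flow)
peak = np.unravel_index(np.argmax(remapped), remapped.shape)
print(int(peak[0]), int(peak[1]))               # 3 5
```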
We implement object tracking with motion constraint for SVs in the translation branch. To comprehensively exploit motion and appearance information, the OFR module works with the KCF module to obtain the object position pos(t), thereby addressing tracking drift.
We conduct extensive experiments on eight high-resolution SVs. Seven SVs were obtained from the Jilin-1 satellite constellation launched by Chang Guang Satellite Technology Co., Ltd. (Changchun, China); the remaining one, the Vancouver dataset, was acquired by the high-resolution Iris camera installed on the ISS. These datasets are divided into two groups, those with rotation (i.e., Dubai, Muharraq, Hong Kong, and Boston) and those without rotation (i.e., San Diego, Vancouver, Minneapolis, and San Francisco), to demonstrate the tracking effectiveness of the proposed method for rotating and non-rotating objects. Tracked targets include cars, planes, trains, and ships. These objects are annotated with rotating bounding boxes given by four corner coordinates. One region of each SV is cropped for clear visualization. Table 1 provides detailed information about the datasets. Figure 5 shows the first frames, cropped regions, and tracked objects.

Evaluation Metrics
To measure the performance of the tracking algorithms, two protocols of the online tracking benchmark [50,51], the success plot and the precision plot, are used. The success plot displays the percentage of frames in which the overlap between the estimated bounding box $b_e$ and the ground truth $b_g$ is larger than the threshold $t_s \in [0, 1]$:
$$S = \frac{|b_e \cap b_g|}{|b_e \cup b_g|} > t_s,$$
where ∩ and ∪ denote the intersection and union operators, respectively, and |·| denotes the number of pixels in a region. The precision plot records the percentage of frames in which the center location error (CLE) between the estimated location and the ground truth is smaller than the threshold $t_p \in [1, 50]$ pixels. The area under the curve (AUC) of the success and precision plots is used to rank all trackers, avoiding unfair comparisons caused by specific thresholds. We rank trackers mainly by the AUC of the success plot because of its representativeness in evaluation [52]. The FPS is used to evaluate tracking speed.
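A small Python sketch of both protocols for axis-aligned boxes (x, y, w, h), using pixel areas for |·|. The threshold grids (21 evenly spaced overlap thresholds in [0, 1] and integer CLE thresholds from 1 to 50 pixels) and computing the AUC as the mean success or precision rate over the grid follow common OTB practice and are assumptions rather than details stated in this paper.

```python
import math


def iou(a, b):
    """Overlap |a ∩ b| / |a ∪ b| for boxes given as (x, y, w, h)."""
    ix = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def center_error(a, b):
    """Euclidean distance between box centers (CLE), in pixels."""
    return math.hypot(a[0] + a[2] / 2 - b[0] - b[2] / 2,
                      a[1] + a[3] / 2 - b[1] - b[3] / 2)


def success_auc(pred, gt):
    overlaps = [iou(p, g) for p, g in zip(pred, gt)]
    thresholds = [i / 20 for i in range(21)]                # t_s = 0, 0.05, ..., 1
    rates = [sum(o > t for o in overlaps) / len(overlaps) for t in thresholds]
    return sum(rates) / len(rates)


def precision_auc(pred, gt):
    errors = [center_error(p, g) for p, g in zip(pred, gt)]
    thresholds = range(1, 51)                                # t_p = 1, ..., 50 pixels
    rates = [sum(e < t for e in errors) / len(errors) for t in thresholds]
    return sum(rates) / len(rates)


gt = [(10, 10, 20, 10), (12, 11, 20, 10)]
pred = [(10, 10, 20, 10), (22, 11, 20, 10)]    # second frame is 10 px off horizontally
print(round(iou(pred[1], gt[1]), 4))             # 0.3333
print(round(precision_auc(pred, gt), 4))         # 0.9
```

Averaging over the whole threshold grid is what makes the AUC insensitive to any single, possibly favorable, threshold choice.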
To narrow the gap between rotating and axis-aligned bounding boxes in the evaluation, we propose an internal shrinkage (IS) strategy, as shown in Figure 6. Most algorithms receive (at initialization) and yield (as output) axis-aligned bounding boxes. A straightforward option is to initialize and evaluate the algorithms using the external rectangle of the rotating box, as shown in Figure 6a. In this way, however, the initial bounding box contains much background, and even a small angle deviation may greatly affect the overlap between the estimated external rectangle and the ground-truth rectangle, as shown in Figure 6b. Considering that objects in SVs exhibit an ellipse-like distribution pattern, we compute the internal (inscribed) ellipse of the rotating box followed by the external rectangle of that ellipse, as shown in Figure 6c. Finally, we use this internal rectangle to initialize trackers that can only receive axis-aligned labels. Likewise, for evaluation, the estimated rotating bounding boxes are converted into the corresponding rectangles using the proposed IS strategy.
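Under the reading that the internal ellipse is the ellipse inscribed in the rotating box (semi-axes w/2 and h/2, same orientation φ), its axis-aligned external rectangle has a closed form with half-extents $\sqrt{a^2\cos^2\varphi + b^2\sin^2\varphi}$ and $\sqrt{a^2\sin^2\varphi + b^2\cos^2\varphi}$. The Python sketch below compares it with the external rectangle of the rotating box itself, starting from four corners listed in order around the box; the function names and the corner ordering are assumptions for illustration.

```python
import math


def rotated_corners(cx, cy, w, h, phi):
    """Four corners, in order around the box, of a w x h box rotated by phi (radians)."""
    c, s = math.cos(phi), math.sin(phi)
    pts = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    return [(cx + x * c - y * s, cy + x * s + y * c) for x, y in pts]


def corners_to_rotated_box(corners):
    """(cx, cy, w, h, phi) from four corners ordered around the box."""
    (x0, y0), (x1, y1), (x2, y2), _ = corners
    cx = sum(p[0] for p in corners) / 4.0
    cy = sum(p[1] for p in corners) / 4.0
    return (cx, cy, math.hypot(x1 - x0, y1 - y0), math.hypot(x2 - x1, y2 - y1),
            math.atan2(y1 - y0, x1 - x0))


def external_rectangle(cx, cy, w, h, phi):
    """Axis-aligned (x, y, w, h) box enclosing the rotating box itself."""
    c, s = abs(math.cos(phi)), abs(math.sin(phi))
    hx, hy = (w * c + h * s) / 2.0, (w * s + h * c) / 2.0
    return cx - hx, cy - hy, 2 * hx, 2 * hy


def internal_shrinkage(cx, cy, w, h, phi):
    """Axis-aligned (x, y, w, h) box enclosing the ellipse inscribed in the rotating box."""
    a, b = w / 2.0, h / 2.0
    c, s = math.cos(phi), math.sin(phi)
    hx = math.sqrt((a * c) ** 2 + (b * s) ** 2)
    hy = math.sqrt((a * s) ** 2 + (b * c) ** 2)
    return cx - hx, cy - hy, 2 * hx, 2 * hy


# A 20 x 10 object rotated by 45 degrees around (50, 50).
box = corners_to_rotated_box(rotated_corners(50.0, 50.0, 20.0, 10.0, math.radians(45)))
ext = external_rectangle(*box)
shr = internal_shrinkage(*box)
print([round(v, 2) for v in ext])   # [39.39, 39.39, 21.21, 21.21]
print([round(v, 2) for v in shr])   # [42.09, 42.09, 15.81, 15.81]
```

For an unrotated box the two rectangles coincide; at 45° the IS box is noticeably tighter, so it carries less background into initialization and evaluation.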

Implementation Details
Considering that object appearance (e.g., illumination and deformation) changes only slightly over short periods, the learning rate of the rotation template is set to $\eta_\theta = 0.01$. The HOG cell size and the number of orientation bins are set to 4 × 4 and 9, respectively, as commonly used in [36,40]. The remaining KCF parameters follow [10]. The trackers are executed on a workstation with a 3.20 GHz Intel(R) Xeon(R) Gold 6134 CPU (32-core) and an NVIDIA GeForce RTX 2080 Ti GPU.
To select the fusion weight ω for optical flow remapping, we randomly selected three of the eight SVs for experiments. Table 2 presents the experimental details, and Figure 7 shows the AUC of the success and precision plots. Both AUCs tend to increase and then decrease as the weight increases, and the best results are obtained at a weight of approximately 0.36. Therefore, ω = 0.36 is used in subsequent experiments.
Table 2. AUC of the success and precision plots on the randomly selected datasets.
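The exact rule that fuses the KCF response with the remapped response is not reproduced in this text. Purely as an illustration of what the weight ω controls, the sketch below assumes a convex combination $(1-\omega)\, f_{KCF} + \omega\, f_{OFR}$ with ω = 0.36 and takes the peak of the fused map as pos(t); the function name and the toy response maps are hypothetical.

```python
import numpy as np


def fuse_and_locate(kcf_response, remapped_response, omega=0.36):
    """Hypothetical linear fusion of appearance and motion responses; returns the (row, col) peak."""
    fused = (1.0 - omega) * kcf_response + omega * remapped_response
    row, col = np.unravel_index(np.argmax(fused), fused.shape)
    return int(row), int(col)


# Toy maps: the appearance response peaks on a distractor at (5, 5), while the
# remapped history peaks at (6, 7), where the appearance response is still 0.8.
kcf = np.zeros((12, 12))
kcf[5, 5], kcf[6, 7] = 0.9, 0.8
motion = np.zeros((12, 12))
motion[6, 7] = 1.0
print(fuse_and_locate(kcf, motion))        # (6, 7)
print(fuse_and_locate(kcf, motion, 0.0))   # (5, 5), appearance only
```

Under this assumed rule, ω = 0 reduces to plain KCF, while a very large ω would let motion history override appearance, which is consistent with the rise-then-fall trend reported for the weight sweep.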

Ablation Experiments
The proposed method incorporates two major improvements: (1) angle estimation (AE) and (2) optical flow remapping (OFR). To validate their contributions, three variants of RAMC are tested: Baseline (KCF [10] only), Base_AE (Baseline with AE), and Base_OFR (Baseline with OFR). Table 3 lists the components of these variants and the experimental results, and Figure 8 shows the success and precision plots. Adding AE to the Baseline (Base_AE) yields gains of 13.9% and 18% in the AUC of the success and precision plots, respectively. Conversely, removing AE from RAMC (i.e., Base_OFR) reduces the AUC of the success and precision plots by 14.5% and 3.6%, respectively, because without AE, object rotation degrades tracking performance. Adding OFR to the Baseline (Base_OFR) yields gains of 34.4% and 34.2% in the AUC of the success and precision plots, respectively, and adding OFR to Base_AE (i.e., the full RAMC) yields gains of 35% and 19.8%. This is because OFR exploits the motion information in adjacent frames to prevent tracking drift. Owing to the synergy of AE in the rotation branch and OFR in the translation branch, the proposed RAMC achieves the best performance.

We compare the proposed RAMC with 13 competing algorithms: KCF [10], SAMF [53], Staple [11], BACF [12], ECO [20], SiamRPN [7], STRCF [54], SiamRPN++ [8], ASRCF [55], LDES [26], GFS-DCF [56], AutoTrack [57], and CFME [36]. Table 4 summarizes the characteristics and experimental results of these trackers, sorted by the AUC of the success plot, and Figure 9 shows the average success and precision plots. The proposed RAMC performs best, with an AUC of 0.785 in the success plot and 0.946 in the precision plot. Algorithms that cannot cope with object rotation (such as KCF, SAMF, AutoTrack, and CFME) generally perform worse. The baseline KCF performs worst owing to the limited representation capability of HOG features. AutoTrack outperforms SAMF by 23.3% and 21.4% in the success and precision plots, respectively, by exploiting local and global information. CFME obtains AUCs of 0.613 and 0.855 in the success and precision plots, respectively, because it uses a motion model to mitigate tracking drift. However, none of these trackers can adapt to object rotation. Algorithms that can cope with rotation but not with tracking drift (such as SiamRPN, GFS-DCF, LDES, ECO, and SiamRPN++) generally achieve better tracking performance. SiamRPN, GFS-DCF, ECO, and SiamRPN++ use rotation-invariant deep features to achieve satisfactory performance. However, these algorithms ignore the motion information in adjacent frames and therefore suffer from tracking drift. Compared with ECO, the champion of the VOT2017 challenge, RAMC achieves gains of 13.4% and 0.7% in the success and precision plots, respectively. Compared with SiamRPN++, which uses deep networks and a multi-layer aggregation mechanism, RAMC achieves a 6.2% higher success rate owing to its use of angle and motion information. These results suggest that RAMC synergizes the AE of the rotation branch and the OFR of the translation branch to cope with object rotation and tracking drift, yielding superior tracking. Meanwhile, it runs at over 40 FPS. Since the frame rate of SVs is usually 10 FPS and a tracker faster than 20 FPS can be considered real-time [38,39], RAMC satisfies the real-time requirement.
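The success and precision AUCs reported above follow the one-pass evaluation (OPE) protocol. The following is a minimal sketch of how such AUCs are commonly computed, assuming the OTB-style convention (21 IoU thresholds in [0, 1], centre-error thresholds of 0-50 px) and axis-aligned (x, y, w, h) boxes. The threshold ranges and helper names are assumptions; the sketch does not implement the rotated boxes or the internal shrinkage (IS) strategy used in this paper.

```python
import numpy as np

def iou_xywh(pred, gt):
    """Per-frame IoU of axis-aligned boxes stored as (N, 4) arrays of (x, y, w, h)."""
    x1 = np.maximum(pred[:, 0], gt[:, 0])
    y1 = np.maximum(pred[:, 1], gt[:, 1])
    x2 = np.minimum(pred[:, 0] + pred[:, 2], gt[:, 0] + gt[:, 2])
    y2 = np.minimum(pred[:, 1] + pred[:, 3], gt[:, 1] + gt[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    union = pred[:, 2] * pred[:, 3] + gt[:, 2] * gt[:, 3] - inter
    return inter / np.maximum(union, 1e-12)

def center_error(pred, gt):
    """Per-frame Euclidean distance (pixels) between box centres."""
    c_pred = pred[:, :2] + pred[:, 2:] / 2.0
    c_gt = gt[:, :2] + gt[:, 2:] / 2.0
    return np.linalg.norm(c_pred - c_gt, axis=1)

def success_auc(pred, gt, thresholds=np.linspace(0.0, 1.0, 21)):
    """Mean success rate over IoU thresholds (fraction of frames with IoU > t)."""
    ious = iou_xywh(pred, gt)
    return float(np.mean([(ious > t).mean() for t in thresholds]))

def precision_auc(pred, gt, thresholds=np.arange(0, 51)):
    """Mean precision over centre-error thresholds (fraction of frames with error <= t px)."""
    err = center_error(pred, gt)
    return float(np.mean([(err <= t).mean() for t in thresholds]))

# Hypothetical two-frame sequence: every prediction is shifted 4 px to the right.
gt = np.array([[10.0, 10.0, 20.0, 20.0], [30.0, 30.0, 10.0, 10.0]])
pred = gt + np.array([4.0, 0.0, 0.0, 0.0])
print(success_auc(pred, gt), precision_auc(pred, gt))
```

Because the success rate uses a strict "IoU > t" comparison, even a perfect tracker scores 20/21 under this convention; the gains quoted in the text are read off curves of this kind.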
To further evaluate all the algorithms, we conducted two sets of experiments on datasets with rotation (Rotation) and without rotation (Nrotation). Table 5 summarizes the experimental results, and Figure 10 shows the success and precision plots. On the Rotation dataset, the proposed RAMC achieves higher accuracy than KCF, SAMF, AutoTrack, and CFME because it explicitly handles rotation. LDES can estimate the rotation angle but ignores inter-frame motion information; in addition, slight background jitter can affect its angle estimates for small-sized objects. Compared with ECO, ASRCF, and GFS-DCF, which use rotation-invariant deep features, RAMC exceeds them by 18.9%, 17.5%, and 14% in the success plot, respectively. RAMC has the highest success-plot AUC (0.796), followed by SiamRPN++ (0.732) and SiamRPN (0.716). This is because it exploits inter-frame motion information on top of AE. In the precision plot, RAMC is only 0.6% below SiamRPN, because the complex background affects the direction of the optical flow vectors and biases the motion constraint.

Table 4. Details of trackers and experimental results on the datasets. The top three results for each metric are in bold. "MR" = Mechanisms for Rotation; "MTD" = Mechanisms for Tracking Drift. For trackers, "TGRS" = IEEE Transactions on Geoscience and Remote Sensing. For frameworks, "KCF" = Kernelized Correlation Filter, "DCF" = Discriminative Correlation Filter, "SiameseFC" = Fully Convolutional Siamese Network, "II" = Integral Image, and "CCF" = Continuous Convolution Filter. For features, "HOG" = Histogram of Oriented Gradients, "CN" = Color Names, "ConvFeat" = Convolutional Features, "CH" = Color Histogram, and "OF" = Optical Flow.

On the Nrotation dataset, RAMC obtains top-ranked results, with AUCs of 0.774 and 0.928 in the success and precision plots, respectively. SiamRPN++ also performs satisfactorily; however, it uses only the appearance information of objects and ignores motion information, so small-sized objects with similar surroundings may cause tracking drift and degrade tracking. Compared with SiamRPN++, ECO, and CFME, RAMC improves the success-plot AUC by 5.9%, 7.8%, and 6%, respectively. LDES achieves promising performance, with an AUC of 0.724 that ranks second in the success plot, because it employs a block coordinate descent (BCD) solver to find the best state when coping with illumination changes and deformations. Nevertheless, RAMC improves the success-plot AUC by 5.0% over LDES by extracting the motion information contained in adjacent frames. Overall, the proposed RAMC synergizes the AE of the rotation branch and the OFR of the translation branch to achieve accurate and robust tracking.

Compared with ECO, RAMC obtains more semantic representations, including the size and motion direction. The bounding boxes of SiamRPN++ tend to be smaller and drift away from the train. This is because motion blur deteriorates the tracking template of the train, which prevents the RPN module (adopted from Faster R-CNN [58]) from regressing to the correct position and scale. LDES incorrectly estimates the scale owing to motion blur. In the other cases shown in Figure 11, RAMC is better at estimating the rotation angle and position of the objects. These results further confirm its ability to estimate the angle and prevent tracking drift.

Remote Sens. 2022, 14, x FOR PEER REVIEW 15 of 19

Discussion
The experimental results demonstrate the effectiveness of the proposed approach in tracking rotating and non-rotating objects in SVs. Compared with trackers that address the rotation issue, the proposed method can perceive small angle deviations and provide more accurate orientation and size information. For example, the method of [37] also detects the orientation of objects by computing the slope of object centroids; however, this slope may struggle to represent the orientations of objects with obvious angle changes. The method of [40] uses a fixed-angle pool to handle object rotation and outputs axis-aligned bounding boxes, which discards accurate semantic information (e.g., the real size and orientation of the object). Compared with [37,40], the proposed approach may be better suited to representing the orientations of objects with both tiny and obvious angle changes, and it yields rotating bounding boxes owing to the precise angle estimation of the rotation branch. Compared with trackers that handle the tracking drift issue, it achieves precise localization assisted by the hybridization of angle and motion information. The methods of [38,39] also use optical flow for tracking. However, in [38,39], the Lucas-Kanade sparse optical flow [45] is used as a feature for representation, which may make it difficult to represent pixel-level motion information. In addition, [38,39] consider only the magnitudes of the optical flow and ignore its directional information. The proposed approach instead exploits dense optical flow, which can represent pixel-level motion information, and incorporates both the magnitudes and the directions of the optical flow to prevent tracking drift and improve tracking performance.
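To illustrate why direction matters in addition to magnitude, the sketch below decomposes a dense flow field into per-pixel magnitude and direction. It assumes the dense flow (dx, dy) has already been computed by some dense optical flow method (not specified here); the helper names, the object mask, and the `min_mag` threshold are hypothetical. Two object motions in opposite directions produce identical magnitude maps but opposite mean directions.

```python
import numpy as np

def flow_polar(flow):
    """Split an (H, W, 2) dense flow field of (dx, dy) into magnitude and direction (radians)."""
    dx, dy = flow[..., 0], flow[..., 1]
    return np.hypot(dx, dy), np.arctan2(dy, dx)

def mean_direction(flow, mask, min_mag=0.5):
    """Circular mean direction (radians) of the flow inside mask, skipping near-static pixels."""
    mag, ang = flow_polar(flow)
    keep = mask & (mag > min_mag)  # directions of tiny vectors are dominated by noise
    return float(np.arctan2(np.sin(ang[keep]).mean(), np.cos(ang[keep]).mean()))

h, w = 8, 8
mask = np.zeros((h, w), dtype=bool)
mask[2:6, 2:6] = True  # hypothetical object region

east = np.zeros((h, w, 2))
east[..., 0] = 2.0     # object moving right
west = np.zeros((h, w, 2))
west[..., 0] = -2.0    # object moving left

mag_e, _ = flow_polar(east)
mag_w, _ = flow_polar(west)
print(np.allclose(mag_e, mag_w))  # magnitudes alone cannot tell the two motions apart
print(mean_direction(east, mask), mean_direction(west, mask))
```

Averaging sines and cosines (a circular mean) rather than raw angles avoids the wrap-around error that occurs when directions straddle plus or minus pi.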

The advantages of the proposed method are attributed to its two branches (i.e., the rotation and translation branches), which estimate the rotation angle and prevent tracking drift, respectively. For rotation angle estimation (i.e., the rotation branch), this study attempts to reveal the relationship between rotation and translation. The rotation phenomenon is decomposed into a translation solution to achieve adaptive rotation estimation, so the angle difference between adjacent frames can be obtained precisely by solving a translation problem. Figure 12 shows the effect of rotation angle estimation with a sample of the original (Figure 12a) and rotated patches (Figure 12b). In Figure 12b, the original patches are rotated to the initial orientation of the first frame by the estimated angle. This keeps adjacent patches as spatially consistent as possible, allowing stable response results. Furthermore, in the translation branch with motion constraint, the appearance and motion information contained in adjacent frames are synergized to enhance the object representations and address the tracking drift issue. To achieve the per-pixel motion constraint, the motion state of the object in the previous frame is remapped to the current frame using the dense optical flow. Moreover, the proposed method can orient the bounding box to a more realistic object state with precise angle, size, and location information, and the results could serve a variety of scenarios (e.g., 2-D pose estimation of moving objects in videos and precise representation of dense objects in remote-sensing images).

This paper verifies the significance of angle estimation and motion constraints for SOT in SVs. This work will help exploit the potential of satellite videos for applications such as traffic analysis, disaster response, and military target surveillance.

Conclusions
SOT in SVs has great potential for remote-sensing ground surveillance. To address the object rotation and tracking drift problems, we analyzed the task from a new perspective in which angle and motion information are hybridized for SOT in SVs. To this end, we built the RAMC tracker, which consists of rotation and translation branches. By decomposing the rotation issue into a translation solution, it decouples the rotation and translation motion patterns and achieves adaptive angle estimation. We then extracted latent motion information and synergized it with the appearance information to prevent tracking drift. Moreover, an IS strategy was proposed to optimize the evaluation of trackers. Quantitative and qualitative experiments were conducted on space-borne SV datasets. The results demonstrate that the proposed method yields state-of-the-art performance and runs at real-time speed. Future work will focus on solving the angle jitter problem.

Figure 1. Visualization of object rotation and tracking drift in SVs. The symbol # is the prefix of the frame number, and the current frame is shown in the upper-left corner of each image. The yellow arrow indicates the orientation of the object. (a) and (e) show the original frames, with the selected objects enlarged. (b-d) show object rotation. (f-h) show tracking drift caused by the complex background and the low contrast between the ship and its wake.

In this paper, we propose a rotation adaptive tracker with motion constraint (RAMC) consisting of a rotation branch and a translation branch. We performed quantitative and qualitative experiments on space-borne SV datasets. The experimental results demonstrate that the RAMC tracker outperforms state-of-the-art algorithms and runs at over 40 FPS. The major contributions are summarized as follows:
(1) We analyze the relationship between the intuitive rotation and the potential translation, and we decouple the rotation and translation motion patterns by decomposing the rotation phenomenon into a translation solution. This enables adaptive rotation estimation when applied to SOT in SVs.
(2) The appearance and motion information contained in adjacent frames are synergized in the framework, which constructs a motion constraint term on the appearance model to prevent tracking drift and ensure precise localization.
(3) An internal shrinkage strategy is proposed to narrow the gap between rotating and axis-aligned bounding boxes in the evaluation benchmark. It models axis-aligned rectangles with ellipse-like distributions to optimize the evaluation process.

Figure 2. Visualization of different kinds of bounding boxes. The red area in the original frame (a) is enlarged; (b) shows an axis-aligned bounding box marked in yellow, whereas (c) shows a rotating bounding box marked in green.

to prevent tracking drift. These two complementary branches are unified for SOT in SVs.

Figure 3. Overall framework of the proposed RAMC algorithm. pos(t − 1) and α(t − 1) denote the position and angle of the object at frame t − 1, respectively. The color code visualizes the optical flow: the color denotes the displacement direction, and the saturation represents the displacement magnitude. The appearance and motion lines represent the localization processes based on appearance and motion information, respectively. Two frames separated by T are selected as adjacent frames to illustrate the overall tracking framework, which makes the rotation and translation of the object easier to see.

Figure 4. Visualization of the optical flow remapping process. In the last row, each pixel of the target response map is traversed to calculate its corresponding position in the previous response map. If this position does not fall on an integer pixel location, the values of its neighboring pixels are interpolated to determine the pixel value in the target response map.

Figure 5. The SV datasets used in the experiments. In each dataset, a region marked by a yellow rectangle is cropped out, and the tracked object, marked by a green rectangle, is shown enlarged.

Figure 6. Visualization of different bounding boxes. (a) shows the external rectangle of the rotating bounding box. (b) shows the effect of angle deviation on the external rectangle. (c) presents the internal shrinkage strategy for evaluation.
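Figure 6 contrasts a rotating bounding box with its external (enclosing) axis-aligned rectangle. The sketch below, with hypothetical helper names, computes that rectangle from the four annotated corners and shows how it inflates with the angle: for a w × h box rotated by θ, the enclosing rectangle has width w|cos θ| + h|sin θ| and height w|sin θ| + h|cos θ|. It only illustrates the gap that motivates the internal shrinkage (IS) strategy; it does not implement IS.

```python
import numpy as np

def rotated_box_corners(cx, cy, w, h, theta_deg):
    """Corners (4, 2) of a w x h box centred at (cx, cy) and rotated by theta_deg."""
    t = np.deg2rad(theta_deg)
    half = np.array([[-w, -h], [w, -h], [w, h], [-w, h]], dtype=float) / 2.0
    rot = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
    return half @ rot.T + np.array([cx, cy])

def external_rectangle(corners):
    """Axis-aligned (x, y, w, h) rectangle enclosing the corner points."""
    x_min, y_min = corners.min(axis=0)
    x_max, y_max = corners.max(axis=0)
    return float(x_min), float(y_min), float(x_max - x_min), float(y_max - y_min)

# Area of the enclosing rectangle relative to the rotated box itself.
for theta in (0.0, 30.0, 35.0):
    _, _, ew, eh = external_rectangle(rotated_box_corners(100.0, 100.0, 40.0, 10.0, theta))
    print(theta, round(ew, 2), round(eh, 2), round(ew * eh / (40.0 * 10.0), 2))
```

For an elongated 40 × 10 box rotated by 30°, the enclosing rectangle already covers about 2.8 times the box area, so most of an axis-aligned ground truth can be background; a small angle deviation (30° versus 35°) also changes the rectangle noticeably.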

Figure 7. Effects of the optical flow remapping weights on tracking performance. The legend presents the maximum AUC values and their indexes.

Figure 8. Success plot (a) and precision plot (b) of the variant trackers on the datasets. The values in the legends are the AUCs. "OPE" = one-pass evaluation, which initializes a tracker in the first frame and lets it run to the end of the sequence.

Figure 9. Success plot (a) and precision plot (b) of all algorithms.

Figure 10. Success and precision plots of all algorithms on the Rotation and Nrotation datasets. (a) and (b) show the success and precision plots on the Rotation dataset, respectively. (c) and (d) show the success and precision plots on the Nrotation dataset, respectively.

4.3.2. Qualitative Evaluation
Figure 11 presents visual comparisons of the top four trackers. In the Dubai data, only RAMC and SiamRPN++ successfully track the car; however, SiamRPN++ outputs

Figure 11. Tracking examples of the top four trackers. The symbol × denotes tracking failure. The bounding boxes are drawn with thin lines to better show the differences between trackers; best viewed zoomed in.

Figure 12. Visual comparison of original patches (a) and rotated patches (b).
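The rotation branch turns rotation into a translation problem. One classical way to do this, shown here only as an illustrative sketch and not necessarily the formulation used in RAMC, is to resample each patch in polar coordinates around its centre: an in-plane rotation then becomes a circular shift along the angle axis, which FFT-based cross-correlation can recover. The bilinear sampler, the radius range, the 1° angular resolution, and the synthetic test pattern are all assumptions.

```python
import numpy as np

def bilinear(img, rows, cols):
    """Sample img at fractional (rows, cols) with bilinear interpolation."""
    h, w = img.shape
    r0 = np.clip(np.floor(rows).astype(int), 0, h - 2)
    c0 = np.clip(np.floor(cols).astype(int), 0, w - 2)
    dr, dc = rows - r0, cols - c0
    return (img[r0, c0] * (1 - dr) * (1 - dc) + img[r0 + 1, c0] * dr * (1 - dc)
            + img[r0, c0 + 1] * (1 - dr) * dc + img[r0 + 1, c0 + 1] * dr * dc)

def to_polar(patch, n_angles=360, n_radii=32, r_min=5.0):
    """Resample a square patch onto a (radius, angle) grid around its centre."""
    h, w = patch.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    radii = np.linspace(r_min, min(cy, cx) - 1.0, n_radii)
    angles = np.arange(n_angles) * (2.0 * np.pi / n_angles)
    rr, aa = np.meshgrid(radii, angles, indexing="ij")
    return bilinear(patch, cy + rr * np.sin(aa), cx + rr * np.cos(aa))

def estimate_rotation(prev_patch, curr_patch, n_angles=360):
    """Angle (degrees) that rotates prev_patch onto curr_patch about the patch centre."""
    p1, p2 = to_polar(prev_patch, n_angles), to_polar(curr_patch, n_angles)
    # In polar coordinates a rotation is a circular shift along the angle axis,
    # so the angle is recovered as a translation via FFT cross-correlation.
    spec = (np.fft.fft(p2, axis=1) * np.conj(np.fft.fft(p1, axis=1))).sum(axis=0)
    shift = int(np.argmax(np.fft.ifft(spec).real))
    angle = shift * 360.0 / n_angles
    return angle - 360.0 if angle > 180.0 else angle

def synthetic_patch(theta_deg, size=101):
    """Hypothetical test pattern rotated by theta_deg (angles measured from +col toward +row)."""
    c = (size - 1) / 2.0
    rows, cols = np.mgrid[0:size, 0:size].astype(float)
    y, x = rows - c, cols - c
    phi = np.arctan2(y, x) - np.deg2rad(theta_deg)
    pattern = 1.0 + 0.8 * np.cos(phi) + 0.5 * np.sin(3 * phi) + 0.3 * np.cos(5 * phi + 1.0)
    return np.exp(-(np.hypot(x, y) / 25.0) ** 2) * pattern

print(estimate_rotation(synthetic_patch(0.0), synthetic_patch(30.0)))
```

The estimated angle could then be used to rotate the current patch back to the initial orientation, as in Figure 12b. In this sketch the resolution of the estimate is limited to 360/n_angles degrees.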

Table 1. Details of the SV datasets. "px" = pixels.
Muharraq, Hong Kong, and Boston) and those without rotation (i.e., San Diego, Vancouver, Minneapolis, and San Francisco) to demonstrate the tracking effectiveness of the proposed method for rotating and non-rotating objects. Tracked targets include cars, planes, trains, and ships. These objects are represented by rotating bounding boxes annotated with four corner coordinates. One region of each SV is cropped for clear visualization. Table

Table 2. AUC of the success and precision plots on a randomly picked dataset.

Table 3. Components and results of the ablation experiments.

Table 5. Results of all algorithms on the Rotation and Nrotation datasets. The top three results for each metric are in bold.