Digital Image Stabilization Method Based on Variational Mode Decomposition and Relative Entropy

Cameras mounted on vehicles frequently suffer from image shake due to the vehicles’ motions. To remove jitter motions and preserve intentional motions, a hybrid digital image stabilization method is proposed that uses variational mode decomposition (VMD) and relative entropy (RE). In this paper, the global motion vector (GMV) is first decomposed into several narrow-banded modes by VMD. REs, which measure the difference between the probability distributions of two modes, are then calculated to identify the intentional and jitter motion modes. Finally, the summation of the jitter motion modes constitutes the jitter motions, whereas subtracting the resulting sum from the GMV yields the intentional motions. The proposed stabilization method is compared with several known methods, namely, the medium filter (MF), Kalman filter (KF), wavelet decomposition (WD) method, empirical mode decomposition (EMD)-based method, and enhanced EMD-based method, to evaluate stabilization performance. Experimental results show that the proposed method outperforms the other stabilization methods.


Introduction
Digital cameras are frequently used to record video information. However, cameras mounted on vehicles frequently suffer from image shake caused by the vehicles' motion [1,2]. In particular, serious image shake occurs in complex terrains or under strenuous motions, thereby blurring the video sequences captured by cameras. Image shake not only reduces the accuracy of observation but also increases users' eye strain. To solve this problem, image stabilization has been widely studied in recent years [3][4][5].
Recent image stabilization systems can be generally classified into four categories: (1) optical image stabilization systems, which feature a mechanism that stabilizes video sequences by optical computing with high accuracy and speed [6,7]; (2) electronic image stabilization systems, which use accelerometers or motion gyroscopes to detect camera motion and then compensate for the jitter motion [8]; (3) orthogonal transfer charge-coupled device (CCD) stabilization systems, which use CCDs to measure image displacement and shift the deviation according to the motion of bright stars [9]; (4) digital image stabilization (DIS), which estimates the global motion vector (GMV) and removes unintentional motion components from the GMV to generate stable video sequences using image processing algorithms [10][11][12]. DIS methods outperform other image stabilization methods because they are more flexible and hardware-independent.
Motion separation is the most critical step in DIS. From a signal processing perspective, separating jitter motion from the GMV can be treated as a noise removal problem: intentional motions are the useful signal, whereas jitter motions are the noise. Therefore, various traditional filtering methods can be used to remove jitter motions. The medium filter (MF) has a simple mathematical model and is widely used [13,14]. In this method, the intentional motion vector is obtained by averaging GMVs within a window. However, MF performance depends heavily on window size. Another traditional method is the Kalman filter (KF), which estimates intentional motions using a dynamic motion model [15][16][17]. The KF uses the current observation and the previously estimated state to generate the intentional motion. The KF is easy to design; however, it is unsuitable for nonlinear conditions [18]. The wavelet decomposition (WD) method was proposed to handle nonlinear conditions [19]. However, the WD method requires a proper wavelet basis function to be chosen in advance, which becomes very difficult in complex conditions. Recently, many empirical mode decomposition (EMD)-based DIS algorithms have been proposed [20,21]. These techniques can adaptively separate jitter and intentional motions from the GMV. However, EMD-based methods have several defects, such as the lack of a precise mathematical model, sensitivity to noise and sampling, and mode mixing, which may result in inaccurate separation [22,23].
In the current study, a hybrid DIS method is proposed that uses variational mode decomposition (VMD) and relative entropy (RE). First, the GMV of the video sequence is estimated using the scale-invariant feature transform (SIFT) feature matching algorithm. Then, the GMV is decomposed into several band-limited modes via VMD. Intentional motions have low frequency and high amplitude because they are much slower than the frame rate, whereas jitter motions exhibit the opposite nature [13,20]; thus, jitter and intentional motions usually exhibit different statistical properties. Therefore, the RE value between two jitter motion modes is low, whereas the RE value between an intentional and a jitter motion mode is high. Based on this fact, the jitter motion modes can be identified. The summation of the jitter motion modes constitutes the jitter motion vector, while subtracting the resulting sum from the GMV yields the intentional motion vector. Several algorithms are then compared, and the experimental results show that the proposed method performs better than the other algorithms.
The main contributions of this work are as follows: (1) A VMD-based motion separation method is proposed. VMD divides the GMV into several narrow-banded modes with different center frequencies, and these modes together reproduce the original GMV. (2) An RE method is proposed to identify relevant modes. The proposed method uses statistical information to represent the internal relationship between different modes. Thus, compared with existing criteria (Hausdorff distance [24], power of amplitude [20], and correlation coefficients [25]), the proposed method can better differentiate the intentional and jitter motions.
The rest of this paper is organized as follows: Section 2 introduces the related work, including the mathematical model of jitter motions, the VMD theory, and the RE theory. Section 3 illustrates the proposed DIS framework. Section 4 provides the experimental results of the proposed method compared with other methods. Finally, conclusions are drawn in Section 5.

Mathematical Model of Jitter Motion
In a vehicle-mounted camera system, irregular pavement, the engine, the transmission system, and tire vibration all cause random jitter in the camera holder, which makes the video sequences unstable. Among them, irregular pavement is the most serious factor. The jitter level of the camera pan is strongly related to road roughness (RR) [26]. The statistical characteristics of RR can be described by the power spectral density of pavement displacement:
$$G_d(n) = G_d(n_0)\left(\frac{n}{n_0}\right)^{-W} \qquad (1)$$
where $n$ denotes the spatial frequency; $n_0$ is the reference spatial frequency, which can be set as 0.1 m$^{-1}$; $G_d(n_0)$ is the coefficient of RR; and $W$ is the frequency index, which is set as 2.
Aside from RR, vehicle speed affects the frequency of jitter motions [27], as expressed as follows:
$$f = u\,n \qquad (2)$$
$$J(f) = \frac{1}{u}\,G_d(n) = G_d(n_0)\,n_0^{2}\,\frac{u}{f^{2}} \qquad (3)$$
where $f$ is the time frequency; $J(f)$ is the time spectrum of RR, which reflects the frequency of jitter motions; and $u$ represents the vehicle speed. Equation (3) indicates that the frequency of jitter motions depends only on vehicle speed and RR. In general, the sampling frequency of the video is considerably higher than the frequency of the motion vector. Thus, RR and vehicle speed can be assumed constant within a short time, and the frequency of jitter motions is then band-limited with high probability within that time.
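As a rough numerical illustration of this model, the following sketch evaluates the spatial spectrum and its time-domain counterpart. It assumes the standard power-law form of the road spectrum and the conversion $f = un$, $J(f) = G_d(n)/u$; the function names and the roughness coefficient value are illustrative assumptions, not settings taken from this paper.

```python
def spatial_psd(n, g_n0, n0=0.1, w=2.0):
    """Displacement PSD G_d(n) at spatial frequency n (1/m), power-law model."""
    return g_n0 * (n / n0) ** (-w)


def temporal_psd(f, u, g_n0, n0=0.1, w=2.0):
    """Time spectrum J(f) at frequency f (Hz) for vehicle speed u (m/s).

    Assumes f = u * n and J(f) = G_d(n) / u.
    """
    n = f / u
    return spatial_psd(n, g_n0, n0, w) / u


# Illustrative (assumed) roughness coefficient in m^3; not a value from the paper.
g_n0 = 64e-6
for speed in (10.0, 20.0):
    print(speed, temporal_psd(2.0, speed, g_n0))
```

With $W = 2$, the time spectrum at a fixed frequency grows linearly with speed, which is why speed and RR together determine the jitter band.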

VMD Theory
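Unlike traditional recursive decomposition models, VMD searches for the modes and their center frequencies concurrently. By performing VMD, the signal is decomposed into several band-limited modes $u_k$ ($k = 1, 2, \ldots, K$), where $K$ is the number of modes. Each mode is compact around its center frequency $\omega_k$ ($k = 1, 2, \ldots, K$). Therefore, the constrained variational problem can be constructed as shown in Equation (4) [28]:
$$\min_{\{u_k\},\{\omega_k\}} \left\{ \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 \right\} \quad \text{s.t.} \quad \sum_{k=1}^{K} u_k(t) = f(t) \qquad (4)$$
where $f$ is the input signal, $\delta$ is the Dirac distribution, $t$ is the time variable, and $*$ denotes convolution.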
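To solve Equation (4), a quadratic penalty term weighted by $\alpha$ and a Lagrangian multiplier $\lambda$ are used to transform the constrained variational problem into the following unconstrained variational problem:
$$L(\{u_k\},\{\omega_k\},\lambda) = \alpha \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 + \left\| f(t) - \sum_{k=1}^{K} u_k(t) \right\|_2^2 + \left\langle \lambda(t),\, f(t) - \sum_{k=1}^{K} u_k(t) \right\rangle \qquad (5)$$
Then, using the alternating direction method of multipliers (updating $u_k^{n+1}$, $\omega_k^{n+1}$, and $\lambda^{n+1}$ alternately), the solution of the optimization problem is obtained by searching for the saddle point of Equation (5) [29]. VMD is implemented as follows:
(1) Initialize the modes $u_k$, the center frequencies $\omega_k$, the Lagrangian multiplier $\lambda$, and the maximum number of iterations $N$ (5000 in this paper). The cycle index is set to $n = 0$.
(2) Start the cycle, $n = n + 1$.
(3) Execute the first inner loop and update $u_k$ according to the following function:
$$u_k^{n+1} = \arg\min_{u_k} L\left(\{u_{i<k}^{n+1}\}, \{u_{i \ge k}^{n}\}, \{\omega_i^{n}\}, \lambda^{n}\right)$$
(4) Execute the second inner loop and update $\omega_k$ according to the following function:
$$\omega_k^{n+1} = \arg\min_{\omega_k} L\left(\{u_i^{n+1}\}, \{\omega_{i<k}^{n+1}\}, \{\omega_{i \ge k}^{n}\}, \lambda^{n}\right)$$
(5) Update $\lambda$ according to the following:
$$\lambda^{n+1} = \lambda^{n} + \tau \left( f - \sum_{k=1}^{K} u_k^{n+1} \right)$$
(6) Repeat Steps (2)-(5) until convergence, as follows:
$$\sum_{k=1}^{K} \frac{\left\| u_k^{n+1} - u_k^{n} \right\|_2^2}{\left\| u_k^{n} \right\|_2^2} < \varepsilon$$
where $\tau$ is an update parameter and $\varepsilon$ is a small number (0.00001 in this paper). The updates of $\hat{u}_k$ and $\omega_k$ can be solved in the spectral domain, as follows:
$$\hat{u}_k^{n+1}(\omega) = \frac{\hat{f}(\omega) - \sum_{i \neq k} \hat{u}_i(\omega) + \hat{\lambda}(\omega)/2}{1 + 2\alpha(\omega - \omega_k)^2}$$
$$\omega_k^{n+1} = \frac{\int_0^{\infty} \omega \left| \hat{u}_k(\omega) \right|^2 d\omega}{\int_0^{\infty} \left| \hat{u}_k(\omega) \right|^2 d\omega}$$
Then, the obtained modes $\hat{u}_k(\omega)$ in the frequency domain are transformed into the time domain via the inverse Fourier transform.
According to Dragomiretskiy's theory, two parameters have an important influence on the result: the penalty parameter $\alpha$ and the mode number $K$ [28]. First, Dragomiretskiy suggested that if the principal frequencies of the sub-components are not known a priori, then a low $\alpha$ is preferable because $\omega_k$ gains the freedom to move toward the appropriate modes [28]. In the proposed method, a low $\alpha$ (100) is used because no prior frequencies of the sub-components are given. Second, when $\alpha$ is small, choosing too few modes causes one of the modes to be shared by the neighboring modes (underbinning), whereas choosing too many modes produces additional modes that generally have little structure (overbinning). In the first case, the intentional motion and jitter motion may fall in the same mode, which degrades the separation. In the second case, the excess modes do not significantly improve performance but increase the computation. In our simulation, $K = 5$ meets the requirements of most tests.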
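The spectral-domain updates can be illustrated with a minimal sketch. This is a simplified stand-in and not the implementation used in this paper: it works on the non-negative half of a plain DFT (no signal mirroring), uses normalized frequencies in cycles per sample, initializes the center frequencies uniformly, and defaults to `tau = 0`. The function names `half_dft` and `vmd_sketch` and all parameter values are assumptions made for illustration.

```python
import cmath
import math


def half_dft(x):
    """DFT of a real sequence on the non-negative bins 0..N/2."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * b * t / n) for t in range(n))
            for b in range(n // 2 + 1)]


def vmd_sketch(x, K=2, alpha=2000.0, tau=0.0, eps=1e-5, max_iter=500):
    """Return (center frequencies, mode spectra) from simplified VMD updates."""
    n = len(x)
    f_hat = half_dft(x)
    bins = len(f_hat)
    freqs = [b / n for b in range(bins)]          # cycles per sample, 0..0.5
    u_hat = [[0j] * bins for _ in range(K)]
    omega = [0.5 * k / K for k in range(K)]       # uniform initialization (assumed)
    lam = [0j] * bins
    for _ in range(max_iter):
        change = 0.0
        for k in range(K):
            old = u_hat[k]
            new = []
            for b in range(bins):
                others = sum(u_hat[i][b] for i in range(K) if i != k)
                numerator = f_hat[b] - others + lam[b] / 2
                new.append(numerator / (1 + 2 * alpha * (freqs[b] - omega[k]) ** 2))
            u_hat[k] = new
            # Center frequency = power-weighted mean frequency of the mode.
            weights = [abs(v) ** 2 for v in new]
            total = sum(weights)
            if total > 0:
                omega[k] = sum(w * p for w, p in zip(freqs, weights)) / total
            diff = sum(abs(p - q) ** 2 for p, q in zip(new, old))
            change += diff / max(sum(abs(q) ** 2 for q in old), 1e-12)
        # Dual ascent on the reconstruction constraint (inactive when tau == 0).
        for b in range(bins):
            lam[b] += tau * (f_hat[b] - sum(u_hat[k][b] for k in range(K)))
        if change < eps:
            break
    return omega, u_hat


N = 128
signal = [math.cos(2 * math.pi * 8 * t / N) + 0.5 * math.cos(2 * math.pi * 32 * t / N)
          for t in range(N)]
centers, spectra = vmd_sketch(signal, K=2)
print([round(c, 4) for c in centers])
```

For this two-tone test signal, the estimated center frequencies settle near the two tone frequencies (8/128 and 32/128 cycles per sample). Recovering time-domain modes would additionally require an inverse transform of each mode spectrum.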

RE Theory
In mathematical statistics, RE measures the difference between two probability distributions [30]. For discrete probability distributions $P$ and $Q$, the RE from $Q$ to $P$ is defined as follows:
$$D_{\mathrm{KL}}(P \,\|\, Q) = \sum_{i} P(i) \log \frac{P(i)}{Q(i)}$$
The RE between two modes reflects the difference between their probability distributions. In most cases, jitter and intentional motions exhibit different statistical properties. The jitter motion vector is wide-sense stationary or approximately Gaussian distributed, whereas the intentional motion vector is arbitrary. Thus, the RE value between two jitter motion modes is low, whereas that between an intentional and a jitter motion mode is high.
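A minimal sketch of how the RE between two sampled modes might be computed is given below. The paper does not state how the probability distributions of the modes are estimated, so the shared-bin histogram, the bin count, and the small smoothing constant (which avoids division by zero and the logarithm of zero) are all assumptions.

```python
import math


def histogram_pmf(x, lo, hi, bins=20, smooth=1e-6):
    """Smoothed histogram estimate of the distribution of x over [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins or 1.0          # guard against a zero-width range
    for v in x:
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    total = len(x) + smooth * bins
    return [(c + smooth) / total for c in counts]


def relative_entropy(x, y, bins=20):
    """RE from the distribution of y to that of x, using shared bin edges."""
    lo = min(min(x), min(y))
    hi = max(max(x), max(y))
    p = histogram_pmf(x, lo, hi, bins)
    q = histogram_pmf(y, lo, hi, bins)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


# Toy signals: a slow ramp (intentional-like) and a small fast oscillation (jitter-like).
slow = [t / 10 for t in range(100)]
fast = [0.2 * math.sin(1.7 * t) for t in range(100)]
print(relative_entropy(slow, slow), relative_entropy(slow, fast))
```

The RE of a signal with itself is zero, and it grows as the two amplitude distributions diverge, which is the property the mode selection relies on.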

Proposed Digital Image Stabilization Framework
There are three key procedures in the proposed DIS framework, namely, motion estimation, motion separation, and intentional motion vector reconstruction. During the first step, GMV is estimated using the SIFT feature point matching algorithm. Subsequently, VMD is applied to decompose GMV into different modes. RE is used in determining the intentional and jitter motion modes to separate them. Finally, the summation of the jitter motion modes constitutes the jitter motion vector, whereas the subtraction of the resulting sum from the GMV represents the intentional motion vector. The framework of the proposed DIS method is shown in Figure 1.

Motion Estimation
Lowe proposed SIFT in 1999 [31]. The SIFT feature is robust against rotation, scaling, and illumination changes and is considered one of the best feature extraction methods. SIFT searches for extreme values in the scale space and generates 128-dimensional descriptors. Figures 2 and 3 show the SIFT feature points and the feature point matching results of two test images, respectively. The SIFT feature significantly reduces the probability of mismatch. Nevertheless, false matches can still occur among candidate points, as presented in Figure 3. Matching results may also represent a local motion vector instead of the GMV when SIFT feature points lie on foreground objects. In general, random sample consensus (RANSAC) is used to solve the mismatching problem [32]. Finally, the GMV between two consecutive frames is calculated by averaging the displacements of the different feature points. The motion vector between two arbitrary frames can be obtained as follows: designate a frame as the reference frame, and calculate the motion vectors between the reference frame and the current frames.
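The final averaging step can be sketched as follows, assuming that feature matching and outlier rejection (for example, by RANSAC) have already produced a list of inlier point pairs. The function name and the input layout are illustrative assumptions.

```python
def global_motion_vector(matches):
    """Average displacement over matched point pairs.

    matches: list of ((x_ref, y_ref), (x_cur, y_cur)) inlier pairs.
    """
    if not matches:
        raise ValueError("no matched feature points")
    dx = sum(cur[0] - ref[0] for ref, cur in matches) / len(matches)
    dy = sum(cur[1] - ref[1] for ref, cur in matches) / len(matches)
    return dx, dy


# Toy inlier matches between a reference frame and a current frame.
pairs = [((0, 0), (2, 1)), ((5, 5), (7, 7)), ((1, 3), (3, 4))]
gmv_x, gmv_y = global_motion_vector(pairs)
print(gmv_x, gmv_y)
```

Applying this to every current frame against the same reference frame yields the GMV sequence that is later decomposed.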

Motion Separation
Although the GMV contains translation, rotation, and scaling motions, these motions can be analyzed independently [20]. We take 1D translation displacement as an example in this study.
After conducting motion estimation, the obtained GMV sequence can be considered a time-varying variable $G$. The amplitude of $G$ can be regarded as the motion displacement of the camera. Consider the following typical GMV:
$$G(t) = I(t) + J(t)$$
where $G(t)$ represents the GMV, $I(t)$ is the intentional motion vector, and $J(t)$ is the jitter motion vector.
To separate the jitter and intentional motion components, the GMV is decomposed via VMD. For a test GMV (Figure 4), the generated modes, arranged from low to high frequency, are shown in Figure 5.
Entropy 2017, 19, 623
On the basis of VMD theory, the relationship between the obtained GMV and its modes is as follows:
$$G(t) = \sum_{k=1}^{K} M_k(t) = \sum_{i \in I_M} M_i(t) + \sum_{j \in J_M} M_j(t)$$
where $M$ represents the modes, and $I_M$ and $J_M$ are the indexes of the intentional and jitter motion modes, respectively.

Intentional Motion Vector Reconstruction
In the current study, RE is used to identify the relevance among modes. The first mode is an intentional motion mode because it has the lowest frequency and largest amplitude [21]. Then, the REs between the first mode and the other modes are calculated in sequence (denoted as $RE_i$, $i = 1, 2, \ldots, K$).
As the intentional motion is much slower than the frame rate, it shows a smooth transition with high amplitude and low frequency between frames. On the other hand, jitter motion is characterized by low amplitude and high frequency. In general, jitter motions can be considered to approximately obey a Gaussian distribution [10,15]. Therefore, the RE value remains low when the two modes are both intentional motion components; otherwise, the RE value remains high. The modes that exhibit low RE values with respect to the first mode are dominated by intentional motion, whereas the remaining modes are dominated by jitter motion.
The corresponding REs for the modes presented in Figure 5 are shown in Figure 6. $RE_1$ is the smallest, and $RE_2$ stays at a low level. However, a sudden increase is observed at $RE_3$, and the subsequent REs all stay at a high level. From the preceding analysis, the third and subsequent modes correspond to jitter motions, whereas the first and second modes comprise intentional motions.
The following procedures describe the DIS steps:
(1) Calculate the GMV by the SIFT point matching algorithm.
(2) Decompose the GMV into $K$ modes via VMD.
(3) Calculate the REs between the first mode and the other modes.
(4) If $RE_i$ is smaller than a threshold $T$ (usually, $T = \frac{1}{2} \times \max(RE_i)$ can meet the demands of most situations), then the mode $M_i$ is considered an intentional motion mode.
(5) Obtain the reconstructed intentional motion by summing the intentional motion modes as follows:
$$I(t) = \sum_{i \in I_M} M_i(t)$$
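The thresholding and reconstruction steps can be sketched as follows, assuming the modes and their RE values with respect to the first mode are already available. The function name, the data layout, and the toy numbers are illustrative assumptions; the jitter motion is taken here as the GMV minus the reconstructed intentional motion.

```python
def separate_motion(gmv, modes, re_values):
    """Split a GMV into intentional and jitter parts using T = 0.5 * max(RE_i)."""
    threshold = 0.5 * max(re_values)
    intentional_idx = [i for i, value in enumerate(re_values) if value < threshold]
    intentional = [sum(modes[i][t] for i in intentional_idx)
                   for t in range(len(gmv))]
    jitter = [g - s for g, s in zip(gmv, intentional)]
    return intentional, jitter, intentional_idx


# Toy modes ordered from low to high frequency; RE values are made up for illustration.
modes = [[1.0, 2.0, 3.0], [0.5, 0.5, 0.5], [0.1, -0.1, 0.1], [0.05, -0.05, 0.05]]
gmv = [sum(column) for column in zip(*modes)]
re_values = [0.0, 0.2, 1.5, 1.8]
intentional, jitter, chosen = separate_motion(gmv, modes, re_values)
print(chosen, intentional, jitter)
```

In this toy case the threshold is 0.9, so the first two modes are kept as intentional motion and the last two are treated as jitter.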

Performance of the RE
To illustrate the effectiveness of RE, three different tests are performed to evaluate mode separation performance. Given a known clean signal $f_h(t)$, the signal is contaminated with different kinds of noise (Gaussian noise, office noise, and factory noise) as follows:
$$f(t) = f_h(t) + n(t)$$
where $n(t)$ is the noise signal at different input SNRs.
The probability density function of Gaussian noise obeys the Gaussian distribution with a fixed mean and variance. Office noise consists of many signals with different frequencies and high amplitude. Factory noise is caused by mechanical shock, rub impact, and air disturbance and includes numerous intermittent and impulse noises. For signals with different noises, we compare several selection criteria, including the Hausdorff distance [24], power of amplitude [20], and correlation coefficient [25]. The noises are downloaded from the NOISEX-92 database, and the signal length is set as 200. The evaluation steps are as follows.
(1) Noises are added to the original clean signal $f_h(t)$, with the input SNR ranging from −8 dB to 8 dB at intervals of 2 dB.
(2) Noisy signals are decomposed into several modes via VMD.
(3) The relevant modes are selected with each selection criterion, the signal is reconstructed from them, and the output SNR is calculated as follows:
$$SNR_{out} = 10 \log_{10}\left(P / \bar{P}\right)$$
where $P$ and $\bar{P}$ correspond to the powers of the original and reconstructed signals, respectively.
The plots of input SNR ($SNR_{in}$) versus output SNR ($SNR_{out}$) for the different noisy signals are shown in Figures 7-9. These figures show that the output SNRs of the RE selection criterion are higher than those of the other selection criteria, which indicates that the RE selection criterion outperforms the others.
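Step (1) of this evaluation, contaminating a clean signal at a prescribed input SNR, can be sketched as follows. The sketch scales a noise record so that the ratio of clean-signal power to noise power matches the target in dB. A seeded Gaussian sequence stands in for the NOISEX-92 recordings, and the clean test signal is an arbitrary sinusoid; both are assumptions for illustration.

```python
import math
import random


def power(x):
    """Mean power of a sequence."""
    return sum(v * v for v in x) / len(x)


def add_noise_at_snr(clean, noise, snr_db):
    """Scale noise so that 10*log10(P_clean / P_noise) equals snr_db, then add it."""
    scale = math.sqrt(power(clean) / (power(noise) * 10 ** (snr_db / 10)))
    return [c + scale * n for c, n in zip(clean, noise)]


rng = random.Random(0)
clean = [math.sin(2 * math.pi * 3 * t / 200) for t in range(200)]   # length 200
noise = [rng.gauss(0.0, 1.0) for _ in range(200)]                   # stand-in noise

for snr_in in range(-8, 9, 2):
    noisy = add_noise_at_snr(clean, noise, snr_in)
    added = [a - b for a, b in zip(noisy, clean)]
    print(snr_in, round(10 * math.log10(power(clean) / power(added)), 6))
```

Each noisy signal produced this way would then be decomposed and reconstructed under the selection criterion being evaluated.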

Performance of the VMD-RE Method in DIS
Several simulation tests are performed to verify the effectiveness of the proposed VMD-RE method. A camera mounted on a holder mechanism is used to capture the video sequences, as shown in Figure 10. In this paper, we use the SNR and root mean square error (RMSE) [21]:
$$\mathrm{SNR} = 10 \log_{10}\left(P / \bar{P}\right)$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{n=1}^{N} \left(x_n - \bar{x}_n\right)^2}$$
where $P$ and $\bar{P}$ are the powers of the ground truth and the resulting intentional motion, respectively; $N$ is the number of sample points; and $x_n$ and $\bar{x}_n$ are the amplitudes of each point in the ground truth and reconstructed motion vectors, respectively.
Four typical unstable scouting video sequences are tested. For Test 1, the intentional motion is approximately linear, and the jitter motion obeys a Gaussian distribution with fixed mean and variance. For Test 2, the intentional motion contains multi-frequency components, and the jitter motion obeys a Gaussian distribution. For Test 3, the level of jitter motion varies: the variance is low in the early frames and increases with time. For Test 4, the amplitude of the jitter motion remains high compared with that of the intentional motion, and the level of jitter motion is time-varying.
Experimental tests are performed using MATLAB® R2013a running on a PC equipped with a 2.60 GHz Intel Core i7-6700HQ CPU with 8 GB RAM. As shown in Figures 11-14, four pairs of images are extracted from different video sequences. The displacement between two frames can be obtained using the SIFT feature point matching algorithm. The first picture in each group is the reference frame, whereas the second picture is the current frame. The blue lines show the image matching results. The actual GMV, ground truth intentional motions, and retrieved intentional motions are shown in Figures 15-18. Tables 1 and 2 show the RMSE and SNR values obtained using six different DIS algorithms, including the MF [11], KF [15], wavelet decomposition (WD) method [19], EMD-based method [20], enhanced EMD-based (E-EMD) method [21], and the proposed method.
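As a small illustration of the RMSE metric reported in Table 1, the following sketch computes the root mean square error between a ground truth motion vector and a reconstructed one. The function name and the toy inputs are assumptions.

```python
import math


def rmse(truth, estimate):
    """Root mean square error between two equal-length motion vectors."""
    if len(truth) != len(estimate):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(truth, estimate)) / len(truth))


ground_truth = [0.0, 0.0, 0.0, 0.0]
retrieved = [1.0, -1.0, 1.0, -1.0]
print(rmse(ground_truth, retrieved))
```

A lower RMSE indicates that the retrieved intentional motion follows the ground truth more closely.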
First, Tables 1 and 2 show that the MF generates the poorest results in Tests 1 and 2, the KF in Test 4, and the WD method in Test 3. The performance of these three methods is unstable. MF performance depends strongly on window size [11]: a larger window generates a smoother intentional motion vector, and a smaller window a less smooth one. In this paper, the window size is set to 5, which suits some conditions but not others. The KF does not adapt to the changing jitter levels in Tests 3 and 4, so its stabilization results are insufficiently accurate. The KF requires that the observation and transition noises obey Gaussian distributions with constant variances. However, in many cases the transition variance is time-varying, which causes the KF to generate poor results in Test 4.
For the WD method, it is difficult to select a wavelet basis function that suits all conditions [19]; performance improves when the basis function is well chosen and degrades otherwise. These three traditional methods cannot adapt to changing conditions and are therefore unsuitable for complex vehicle-mounted DIS systems. Second, comparing the mode decomposition methods with the traditional methods, we conclude that the mode decomposition methods generally perform better. Nevertheless, the EMD method performs well in Tests 3 and 4 but poorly in Tests 1 and 2. This result can be attributed to the difficulty of determining the relevant modes in complex conditions, because the frequency content of intentional and jitter motions may overlap (mode mixing). The E-EMD method generates better results than the traditional EMD method, because mode mixing is alleviated by adding white noise series to the target data and averaging the corresponding intrinsic mode functions. However, it still performs worse than the proposed method. By contrast, the proposed method calculates jitter motion variance and generates considerably better results than the other methods. The proposed method produces the lowest RMSE values and the highest SNR values in all tests.
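The window-size dependence noted for the MF can be illustrated with a small sliding-window filter. The sketch below reads MF as a median filter and truncates the window at the sequence edges; both choices are assumptions, since the text gives only the window size of 5. The function name `sliding_median` and the sample signal are hypothetical.

```python
import statistics


def sliding_median(signal, window=5):
    """Centered sliding-window median; the window is truncated at the edges."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    n = len(signal)
    smoothed = []
    for k in range(n):
        lo = max(0, k - half)          # first index inside the window
        hi = min(n, k + half + 1)      # one past the last index
        smoothed.append(statistics.median(signal[lo:hi]))
    return smoothed


if __name__ == "__main__":
    # Hypothetical GMV: a slow ramp with two isolated jitter spikes.
    gmv = [0.5 * k for k in range(30)]
    gmv[10] += 8.0
    gmv[20] -= 8.0
    for w in (5, 11):
        print("window", w, "->", sliding_median(gmv, window=w)[8:13])
```

A fixed window is a single smoothing scale chosen in advance, which is consistent with the observation that a setting suited to one jitter level may not suit another.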

Conclusions
This study proposed a DIS method based on VMD and RE. The GMV is estimated using a SIFT feature point matching algorithm and then decomposed via VMD. The modes belonging to intentional and jitter motions are identified according to the RE values between modes. The proposed method is compared with several state-of-the-art methods, and the simulation results show that it outperforms them in quantitative comparisons of RMSE and SNR values.
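The last two steps summarized above can be sketched as follows, assuming the VMD modes are already available. The relative entropy is estimated here from histograms over a shared range with a small smoothing constant; the bin count, the smoothing, and all function names are assumptions for illustration. The rule that turns RE values into a jitter/intentional decision is not reproduced, so the indices of the jitter modes are passed in by the caller.

```python
import math


def histogram_pmf(values, bins, lo, hi, eps=1e-9):
    """Smoothed histogram estimate of a probability distribution on [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins or 1.0  # guard against a zero-width range
    for v in values:
        idx = min(bins - 1, max(0, int((v - lo) / width)))
        counts[idx] += 1
    total = len(values) + eps * bins
    return [(c + eps) / total for c in counts]


def relative_entropy(mode_a, mode_b, bins=16):
    """Kullback-Leibler divergence D(P_a || P_b) between two modes."""
    lo = min(min(mode_a), min(mode_b))
    hi = max(max(mode_a), max(mode_b))
    p = histogram_pmf(mode_a, bins, lo, hi)
    q = histogram_pmf(mode_b, bins, lo, hi)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def split_motion(gmv, modes, jitter_indices):
    """Sum the jitter modes, then subtract that sum from the GMV."""
    jitter = [sum(modes[j][k] for j in jitter_indices) for k in range(len(gmv))]
    intentional = [g - j for g, j in zip(gmv, jitter)]
    return intentional, jitter


if __name__ == "__main__":
    # Hypothetical two-mode decomposition of a short GMV.
    modes = [[4.0, 6.0, 8.0, 10.0], [0.5, -0.5, 0.5, -0.5]]
    gmv = [a + b for a, b in zip(*modes)]
    print("RE(mode0 || mode1):", relative_entropy(modes[0], modes[1]))
    print(split_motion(gmv, modes, jitter_indices=[1]))
```

Relative entropy is not symmetric, so `relative_entropy(a, b)` and `relative_entropy(b, a)` generally differ; any selection rule built on it needs to fix the order of its arguments.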