A Comprehensive Motion Estimation Technique for the Improvement of EIS Methods Based on the SURF Algorithm and Kalman Filter

Video stabilization is an important technology for removing undesired motion in videos. This paper presents a comprehensive motion estimation method for electronic image stabilization techniques, integrating the speeded up robust features (SURF) algorithm, modified random sample consensus (RANSAC), and the Kalman filter, and also taking camera scaling and conventional camera translation and rotation into full consideration. Using SURF in sub-pixel space, feature points were located and then matched. Falsely matched points were removed by modified RANSAC. Global motion was estimated by using the feature points and modified cascading parameters, which reduced the accumulated errors in a series of frames and improved the peak signal to noise ratio (PSNR) by 8.2 dB. A specific Kalman filter model was established by considering the movement and scaling of scenes. Finally, video stabilization was achieved with filtered motion parameters using the modified adjacent frame compensation. The experimental results proved that the target images were stabilized even when the vibrating amplitudes of the video became increasingly large.


Introduction
Photographic jitter, caused by the vibration of a moving camera, often produces undesirable effects, which video stabilization methods are designed to mitigate or eliminate. Image stabilization technologies such as mechanical image stabilization (MIS), optical image stabilization (OIS) [1], and most recently electronic image stabilization (EIS) [2] are widely applied in areas such as camera capture, vehicle monitoring, and airborne and shipboard observations. Mechanical and optical image stabilization methods usually involve adjusting the spatial positions of an element or group of elements. Compared with MIS and OIS, EIS as a software-based approach has the advantage of lower cost and easier integration, though there are limitations to the software's accuracy and speed and the hardware's image performance. EIS may also be used as the fine stabilization frame in a coarse/fine combination two-level stabilization approach [3][4][5]. For an in-vehicle camera, EIS is relatively cost-efficient and may be the best option.
Research into this approach has focused on developing accurate, high-speed EIS with image blur alleviation. EIS systems generally have three main components: global motion estimation, motion smoothing, and motion compensation [6][7][8][9][10][11][12][13][14][15][16][17][18][19]. These involve (1) extracting high-precision features and obtaining precise positioning points, and (2) separating the camera's intentional scanning movement and random noise vibration using discrete filters based on motion estimation results.
First, in the SURF algorithm, the potential feature points are distinguished with a Hessian matrix and non-maximum suppression, in which box filters are used to approximate Gaussian derivatives to simplify the Hessian matrix calculation, as shown in Figure 2a. At the same time, the original image is transformed into coordinates using the multi-resolution pyramid technique. Thus a copy of the image is obtained with the same size but with reduced pixel bandwidth, achieving the space of different sub-pixel scales in parallel, as shown in Figure 2b. The feature points are selected by locating extreme points by means of the gradient value around the points. The Haar wavelet responses in both the x- and y-directions around the point of interest are then computed to set a multidimensional vector as the SURF feature descriptor.
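The box-filter approximation mentioned above is cheap largely because the sum over any rectangle can be read from a summed-area (integral) image in constant time. The text does not name this data structure, so the following is only a minimal sketch of the standard trick used by SURF-style detectors; the function names `integral_image` and `box_sum` are illustrative and not taken from the paper.

```python
def integral_image(img):
    """Return a summed-area table with one extra row and column of zeros.

    img is a list of rows of numbers; ii[r][c] holds the sum of all
    pixels in img[0:r] x img[0:c].
    """
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for r in range(h):
        row_sum = 0
        for c in range(w):
            row_sum += img[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii


def box_sum(ii, top, left, bottom, right):
    """Sum of pixels in rows top..bottom-1 and columns left..right-1.

    Uses four table lookups regardless of the box size, which is what
    makes large box filters as cheap as small ones.
    """
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]
```

A box-filter response is then a weighted combination of a few such `box_sum` calls, so the cost does not grow with the filter scale.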
Second, the affine model between two adjacent frames, the $n$th and the $(n-1)$th, is described for the process of global motion estimation. $I_n(p) = (x_n, y_n)$ and $I_{n-1}(p) = (x_{n-1}, y_{n-1})$ refer to the corresponding points, as shown in Formula (1). The cascading parameters are defined in Formula (2), in which $A$ refers to rotation and scaling, and $B$ refers to translation. Figure 3 shows how the adjacent affine matrix is achieved subsequent to the previous frame through cascading parameters, which describe the current frame's motion relative to the reference frame. To reduce accumulated errors in a series of frames, the modified cascading parameters $A_{n-1}$ and $B_{n-1}$ are deduced from the adjacent cascading parameters, as defined in Equation (2).
Sensors 2016, 16
Third, false matching points are removed with the modified RANSAC method. Iteratively, the foreground and background scenes are distinguished and the false matching points are removed by using a local optimal motion estimation model, along with matched points in the foreground. The data are thus preprocessed to remove noise, or false matching points.
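The modified RANSAC step is not described in enough detail here to reproduce, so the following is only a sketch of plain RANSAC, shown to illustrate how false matches are rejected by consensus. It assumes a translation-only motion model for brevity (the paper uses an affine model), and the function name `ransac_translation`, the tolerance `tol`, and the iteration count are all illustrative choices.

```python
import random


def ransac_translation(matches, tol=1.0, iterations=50, rng=None):
    """Plain RANSAC for a translation-only motion model (illustrative).

    matches: list of ((x1, y1), (x2, y2)) point pairs between two frames.
    Returns (best_translation, inlier_matches).
    """
    if rng is None:
        rng = random.Random(0)
    best_t, best_inliers = None, []
    for _ in range(iterations):
        # Hypothesis: the motion implied by one randomly chosen match.
        (x1, y1), (x2, y2) = rng.choice(matches)
        t = (x2 - x1, y2 - y1)
        # Consensus set: matches whose displacement agrees with t.
        inliers = [
            m for m in matches
            if abs((m[1][0] - m[0][0]) - t[0]) <= tol
            and abs((m[1][1] - m[0][1]) - t[1]) <= tol
        ]
        if len(inliers) > len(best_inliers):
            best_t, best_inliers = t, inliers
    return best_t, best_inliers
```

Matches outside the largest consensus set are treated as false matches and dropped before the motion parameters are re-estimated from the inliers.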
The Kalman filter is then applied to remove high-frequency vibration from the trajectory curves by distinguishing the camera's intentional motion from vibrations or jitters. The Kalman filter model [20] tracks the estimated state and its variance or uncertainty, as shown in Figure 4. The state variable $\hat{X}_{t|t-1}$ is estimated by using the Kalman state transition model $\varphi_{t|t-1}$ and the filtered result $\hat{X}_{t-1}$ at time $(t-1)$. The observation data $\hat{Y}_{t|t-1}$ are estimated by using $\hat{X}_{t|t-1}$ and the observation model $C_t$. The error variance forecast $P_{t|t-1}$ at time $t$ is defined by using $\varphi_{t|t-1}$, the error variance $P_{t-1}$ at time $(t-1)$, and the process noise covariance matrix $Q_{t-1}$. At time $t$, the Kalman gain $K_t$ is then calculated from $P_{t|t-1}$, $C_t$, and the observation noise covariance matrix $R_t$. The error variance $P_t$ at time $t$ is defined by using the unit matrix $I$, $K_t$, $C_t$, and $P_{t|t-1}$. At time $t$, the filtered or expected state $\hat{X}_t$ is then updated using $\hat{X}_{t|t-1}$, $K_t$, $\hat{Y}_{t|t-1}$, and the actual observation $Y_t$ at time $t$. The parameters of each frame in the videos are then recursively calculated.
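The recursion described above follows the standard Kalman predict and update steps. As a minimal sketch, assuming a scalar state (one motion parameter) rather than the paper's specific state model, it can be written as follows; the default values for `phi`, `c`, `q`, and `r` are placeholders chosen for illustration, not the authors' settings.

```python
def kalman_step(x_prev, p_prev, y, phi=1.0, c=1.0, q=1e-3, r=1e-1):
    """One scalar Kalman recursion.

    x_prev, p_prev: filtered state and error variance at time t-1.
    y: actual observation at time t.
    phi, c: state transition and observation models.
    q, r: process and observation noise variances (illustrative values).
    """
    x_pred = phi * x_prev                  # state prediction X(t|t-1)
    y_pred = c * x_pred                    # predicted observation Y(t|t-1)
    p_pred = phi * p_prev * phi + q        # error variance forecast P(t|t-1)
    k = p_pred * c / (c * p_pred * c + r)  # Kalman gain K(t)
    p = (1.0 - k * c) * p_pred             # updated error variance P(t)
    x = x_pred + k * (y - y_pred)          # filtered state X(t)
    return x, p


def smooth_trajectory(observations, x0=0.0, p0=1.0, **model):
    """Run the recursion over a whole sequence of observed motion values."""
    x, p = x0, p0
    filtered = []
    for y in observations:
        x, p = kalman_step(x, p, y, **model)
        filtered.append(x)
    return filtered
```

A smaller `q` relative to `r` makes the filter trust its motion model more, which suppresses jitter more strongly at the cost of a slower response to intentional camera motion.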
Finally, the motion is compensated by using the filtered affine matrices $\hat{A}_n$ and $\hat{B}_n$ at frame $n$.
The position compensation parameters $A^c_{n-1} = \hat{A}_{n-1}(A_{n-1})^{-1}$ and $B^c_{n-1} = -\hat{A}_{n-1}(A_{n-1})^{-1}B_{n-1} + \hat{B}_{n-1}$ are applied according to Equation (3), where $I_n(p)$ and $\hat{I}_n(p)$ refer to the initial and stabilized $n$th frames, respectively. The modified cascading parameters $A_{n-1}$ and $B_{n-1}$ are deduced for smaller accumulated errors by using the selected reference frame (the first frame in a continuous sequence) instead of the adjacent frame. The reference frame is then refreshed periodically: the current frame is set as the reference frame when the suppression variabilities in the scenes, e.g., linear correlation coefficients for the trajectory curves (stated in Section 2.3), are smaller than a certain value, i.e., 0.9, for the consideration of the efficiency assessment of vibration suppression.
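The compensation parameters quoted above are what one obtains by undoing the estimated motion and re-applying the filtered motion. The sketch below assumes the convention that a point in the current frame is $A p + B$ for the corresponding reference-frame point $p$ (the body of Formula (1) is not recoverable here, so this direction is an assumption), and all helper names are illustrative. The `cascade` function shows the conventional adjacent-frame composition, whose accumulated error the modified parameters are meant to reduce.

```python
def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [
        [a[0][0] * b[0][0] + a[0][1] * b[1][0], a[0][0] * b[0][1] + a[0][1] * b[1][1]],
        [a[1][0] * b[0][0] + a[1][1] * b[1][0], a[1][0] * b[0][1] + a[1][1] * b[1][1]],
    ]


def matvec(a, v):
    """Product of a 2x2 matrix and a 2-vector."""
    return (a[0][0] * v[0] + a[0][1] * v[1], a[1][0] * v[0] + a[1][1] * v[1])


def inverse(a):
    """Inverse of a 2x2 matrix (assumes a nonzero determinant)."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]


def apply_affine(A, B, p):
    """Map point p through the affine model A p + B."""
    q = matvec(A, p)
    return (q[0] + B[0], q[1] + B[1])


def cascade(A_prev, B_prev, a, b):
    """Compose an adjacent-frame motion (a, b) onto cascaded (A_prev, B_prev)."""
    shifted = matvec(a, B_prev)
    return matmul(a, A_prev), (shifted[0] + b[0], shifted[1] + b[1])


def compensation(A, B, A_hat, B_hat):
    """A_c = A_hat A^-1 and B_c = -A_hat A^-1 B + B_hat, as in the text."""
    A_c = matmul(A_hat, inverse(A))
    shift = matvec(A_c, B)
    return A_c, (-shift[0] + B_hat[0], -shift[1] + B_hat[1])
```

With an identity filtered motion, applying the compensation to a point of the current frame returns the reference-frame point, which is a quick sanity check on the signs.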

Selection of Feature Point Detection Algorithms
In this section, the SURF algorithm is investigated in sub-pixel space and compared with two widely used methods, the Scale Invariant Feature Transform (SIFT) [23,24] and ORB [25] algorithms, to find the best algorithm for fast and accurate feature point detection.
We test the algorithms on a series of different images with a resolution of 640 × 480. For purposes of accuracy evaluation, 81 extra feature points are placed on the original image in a regular two-dimensional grid. Pixel intensity is interpolated at sub-pixel accuracies of 0.5, 0.3 and 0.1 pixels. Three sample images with different features are illustrated in Figure 5, where image (a) has a dark scene, image (b) has a bright object in the scene, and image (c) has several cars against a clear green outdoor scene. Results of speed and accuracy tests are listed in Figure 6. The accuracy for feature point detection is defined as the ratio of the number of detected points to 81; a point is recognized as being detected when its Euclidean distance, as calculated from the initial position, is smaller than the sub-pixel accuracy. The speed of point detection is derived from the calculation time for one image, measured in milliseconds (ms). For comparison, we give an example of the outdoor scene results at 0.1 pixels (Figure 7); the other tests have similar results. At the same time, the algorithms are tested on sample images in the video sequences. Figure 8 shows video frames from the sample sequences in MATLAB's image processing toolbox, where (a) and (b) are the indoor and outdoor scenes, respectively. Average values of the results of speed and accuracy tests are listed in Figure 9, for ten frames starting with the presented one in the sequences. These tests also have similar results. The results show that the SURF algorithm accurately detects the feature points in the sub-pixel space; an improvement in speed is expected from mapping onto FPGA substrates in future work.
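The accuracy measure defined above (the fraction of placed grid points that a detector recovers to within the sub-pixel tolerance) can be sketched as follows. The function name `detection_accuracy` and the nearest-detection matching rule are illustrative assumptions; the text does not say how detections are paired with grid points.

```python
import math


def detection_accuracy(grid_points, detected_points, tol):
    """Fraction of placed grid points that have a detection within tol pixels.

    grid_points: known (x, y) positions placed on the image (81 in the paper).
    detected_points: (x, y) positions reported by the detector.
    tol: sub-pixel accuracy threshold, e.g. 0.5, 0.3 or 0.1 pixels.
    """
    hits = 0
    for gx, gy in grid_points:
        # A grid point counts as detected if any detection lies closer
        # than tol in Euclidean distance.
        if any(math.hypot(dx - gx, dy - gy) < tol for dx, dy in detected_points):
            hits += 1
    return hits / len(grid_points)
```

Tightening `tol` from 0.5 to 0.1 pixels can only lower or preserve the score for a given set of detections, which is why accuracy is reported separately for each tolerance.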

Quality Assessment by Using PSNR and Trajectory Tracking
Peak signal to noise ratio (PSNR) and trajectory tracking are applied when assessing the quality of image stabilization. The PSNR is defined in Equation (4):
$$\mathrm{PSNR}(I_m, I_n) = 10 \log_{10} \frac{255^2}{\mathrm{MSE}(I_m, I_n)} \qquad (4)$$
where $I_m$ and $I_n$ refer to two frames, and $\mathrm{MSE}(I_m, I_n) = \frac{1}{MN} \sum_{i=1}^{N} \sum_{j=1}^{M} \left( I_m(i,j) - I_n(i,j) \right)^2$ refers to the mean square error of the two frames, with the values calculated by scanning through one image with $N$ rows and $M$ columns. Trajectory curves are described by tracking a point in the image sequence as shown in Figure 10. The effect of the stabilization is then determined by comparing the values of the correlation coefficients $r_{XY}(C1)$ and $r_{XY}(C2)$. The correlation coefficient $r_{XY}$ of two trajectory curves is calculated by Equation (5), where $(X, Y)$ is the coordinate data of a point on a curve, $N$ is the total number of points, and $\bar{X}$ and $\bar{Y}$ are the mean values of $X$ and $Y$, respectively. The curves' similarity is therefore greater as the coefficient $|r_{XY}|$ approaches 1, and its values are used in the following section for discussing experimental results quantitatively with a series of vibration videos of differing predefined amplitudes. Thus, the values of the correlation coefficients are calculated to adjust the reference frame in dynamic scenes. In our experiments, the threshold values are set at 0.9 for robust vibration suppression.
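Both quality measures are straightforward to compute. The sketch below implements PSNR for 8-bit grayscale frames as in Equation (4) and, on the assumption that Equation (5) is the usual Pearson linear correlation coefficient (its body is not recoverable from the text, though the stated use of the mean values suggests it), a correlation function for two coordinate sequences. Function names are illustrative.

```python
import math


def mse(img_a, img_b):
    """Mean square error of two equally sized grayscale frames (lists of rows)."""
    n_rows, n_cols = len(img_a), len(img_a[0])
    total = sum(
        (img_a[i][j] - img_b[i][j]) ** 2
        for i in range(n_rows)
        for j in range(n_cols)
    )
    return total / (n_rows * n_cols)


def psnr(img_a, img_b):
    """PSNR in dB for 8-bit frames; infinite when the frames are identical."""
    err = mse(img_a, img_b)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(255.0 ** 2 / err)


def correlation(xs, ys):
    """Pearson correlation of two equal-length sequences (assumed form of Eq. 5).

    Assumes neither sequence is constant, otherwise the denominator is zero.
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

In the reference-frame refresh rule described in the text, a value of `abs(correlation(...))` below 0.9 would be the trigger for selecting a new reference frame.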

Experimental Results and Discussion
This section describes the experimental results. All experiments were performed on a PC with a 3.3 GHz CPU and 4.0 GB of memory, and the software was written in C++. The size of the experimental picture was 640 × 480 pixels and the size of the experimental video was 320 × 240 pixels. The process of the algorithm was applied as shown in Figure 11. Module performance testing made use of video clips of scenes of prairie and sky and sample videos captured by the in-vehicle camera. Different kinds of vibration videos captured using the mobile in-vehicle camera are discussed in the accuracy evaluations and performance assessments. Vibration video sequences of 30 fps are investigated, with increasing vehicle speeds of 20 km/h, 40 km/h and 60 km/h on stable concrete road, bumpy sand aggregate road, and soft mud road. The experimental results are provided here for mobile in-vehicle videos, in which the values of PSNR are used to estimate the quality of the image stabilization.
Figure 11. Application of the stabilization method.

Module Performance Testing
The modules of the stabilization method are tested here. First, the mismatched point removal module was verified by using different kinds of video clips. Two consecutive frames in the sample video clips of the scenes of prairie and sky are shown in Figure 12, where the positions of the tank and the flight vehicle are changed, respectively. In Figures 13a and 14a, green matching pairs refer to true matching, and blue pairs on the target refer to false matching on the foreground. As illustrated in Figures 13b and 14b, local motion vectors between two frames were used to indicate the matching pairs, as the motion vector for false matching went in a different direction. To quantify the repeatability of the module, the affine matrices are calculated 10 times, and the results show that stable feature point matching is achieved. For the prairie video, the value of PSNR increased from 26.76 dB to 29.61 dB. For the sky video, the value of PSNR increased from 29.20 dB to 32.25 dB. The corresponding relative increases in the values of PSNR were both about 10%.
The Kalman filter is applied to the videos in connection with the modified cascading parameters by using the selected reference frame proposed in Section 2.1. One sample frame from the original video captured using an in-vehicle camera in a moving car is shown in Figure 15. The results of two different algorithm models are compared, depending on whether the modified cascading parameters were used. Figure 16 shows the x-direction motion values between two frames in the same image sequences. The models indicate that the trajectory and values of the curves in (a) and (b) varied as the reference frame changed; the blue line indicates the motion values of the video sequence and the red line is that of the trajectory curve. The values of PSNR for the region of interest in the image, as indicated by the green box, were evaluated. As shown in Figure 17, using the modified cascading parameters gave better image quality, with a PSNR larger by up to 8.2 dB.

Accuracy Evaluation with Vibration Videos of Predefined Amplitudes
The accuracy of the image stabilization method is evaluated with respect to suppression variabilities. Four sample frames from the vibration videos, captured from a mobile in-vehicle camera, are shown in Figure 18. Accuracy is quantitatively calculated with the correlation coefficients, the image stabilization improving as the coefficient value approaches one. First, the background video with no vibration is captured. Second, five segments of vibration videos are captured at a vibration frequency of 10 Hz, with the amplitude gradually increasing from segment to segment as indicated by the parameter $P$, which is calculated from the ratio of the maximum vibration amplitude to the diagonal size of the image. Third, the videos are stabilized with the comprehensive algorithm. Finally, the correlation coefficients $r_{XY}(C1)$ and $r_{XY}(C2)$ are calculated, as listed in Table 1, which shows that the parameter $r_{XY}(C2)$ has the stable value of 0.9964 for the stabilized video even when the values of $P$ increase to 2.09%. The results also show that the PSNR values become larger for the stabilized video in comparison to the source videos.
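As a rough sense of scale for the parameter $P$, and assuming (this is not stated in the text) that these vibration videos use the 320 × 240 frame size given for the experimental video and that the amplitude is measured in pixels, the largest value of $P$ corresponds to a displacement of only a few pixels:

```latex
\[
d = \sqrt{320^2 + 240^2} = 400 \ \text{pixels}, \qquad
P = \frac{a_{\max}}{d}
\;\Rightarrow\;
a_{\max} = 0.0209 \times 400 \approx 8.4 \ \text{pixels}.
\]
```

Here $d$ is the image diagonal and $a_{\max}$ the maximum vibration amplitude; both symbols are introduced only for this illustration.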

Performance Assessment Using a Vibration Video Sequence
The comprehensive module of the algorithm is applied in this section, in which rotation and translation motions were included in the video. The quality of image stabilization is assessed using the values of PSNR. Consecutive frames extracted from the original and the resulting stabilized sequences are shown in Figure 19. As indicated by the red crossed lines, the images in (a) vibrate violently in the original sequence, whereas the target images in (b) are stabilized in the new sequence. In Figure 20, the inter-frame difference images (IDIs) between frames are extracted, which shows that the profile in the stabilized sequences (b) and (d) is clearer than in the original images (a) and (c). The IDIs also help calculate the vibration amplitude, and the parameter $P$ is 2.04%. Figure 21 shows the quality of the experimental video as the average values of PSNR become increasingly large. The values of PSNR for a reference method are also calculated where the ORB module is applied. The average values of PSNR for the proposed and the reference methods are 28.02 dB and 27.57 dB, respectively. The average processing time for one image in the video clips is about 210 ms with the current experimental platform, with the SURF algorithm occupying about 88% of the computation time.

Performance Assessment Using the Video Sequences with Increasing Vehicle Speed
In this section, the algorithm performance is assessed by using the vibration video sequences at 30 fps, with increasing vehicle speeds of 20 km/h, 40 km/h and 60 km/h. Six sample frames in the video sequences are shown in Figure 22. The videos were captured from a mobile in-vehicle camera when the vehicle was on stable concrete road, bumpy sand aggregate road, and soft mud road. The average processing time for one image in the video clips is about 230 ms in the current experimental platform. The experimental results proved that the target images were stabilized and the values of PSNR increased as the vehicle speed increased, as shown in Table 2.
However, the feature points in a single frame could not be distinguished when captured on the bumpy sand aggregate road and the soft mud road at the speed of 60 km/h. It is expected in future work that videos captured using a high-speed camera will have better results at 60 km/h or higher.

Conclusions and Future Work
This paper has proposed a comprehensive motion estimation technique for an improved EIS method that can be applied to a mobile in-vehicle camera. In the image sequences, correct points were extracted based on SURF and used to solve for the affine parameters. Modified RANSAC was used to purify the matching points. The Kalman filtering processes were applied to correctly compensate for motion by using modified cascade parameters. High-frequency vibration in the video sequences was effectively removed as translation, rotation, and scene scaling were taken into account. The experimental results show that the target images were stabilized using the proposed image stabilization algorithm, and the average PSNR values became increasingly large. The algorithm performance was assessed by using video sequences from the mobile in-vehicle camera, which showed the target images stabilized as the vehicle speed increased. It is expected that a high-speed camera would help achieve better results in future work. As the algorithm possesses the inherent characteristic of structural pipeline models, it can be integrated into FPGA substrates. The high-speed and super-zoom requirements of the vehicle platform will also be analyzed and integrated in future work.