Self-Calibration Spherical Video Stabilization Based on Gyroscope

Abstract: With the development of handheld video capture devices, video stabilization has become increasingly important. Gyroscope-based video stabilization methods show promising performance, since they provide more reliable three-dimensional (3D) camera rotation estimates, especially when scenes contain many moving objects or suffer from severe motion blur or illumination changes. However, gyroscope-based methods depend on the camera intrinsic parameters to perform stabilization. Therefore, a self-calibrated spherical video stabilization method is proposed. It builds a virtual sphere, whose radius is calibrated automatically, and projects each frame of the video onto the sphere. By inversely rotating the spherical image according to the rotational jitter component, the dependence on the camera intrinsic parameters is relaxed. The experimental results show that the proposed method does not need camera calibration and can suppress camera jitter simply by binding a gyroscope to the camera. Moreover, compared with other state-of-the-art methods, the proposed method improves the peak signal-to-noise ratio, the structural similarity metric, the cropping ratio, the distortion score, and the stability score.


Introduction
In recent years, with the development and popularization of handheld cameras and mobile phones, videos have been widely used to record interesting and important moments. However, in the moving environment, due to the instability of the carrier, the captured video is often accompanied by different degrees of image jitter [1,2]. This kind of jittered video will not only reduce the quality of the video, resulting in poor perception, but also affect the subsequent processing of the video [3]. Therefore, video stabilization is significant.
The purpose of video stabilization is to suppress or weaken the impact of camera jitter on video quality, both for human perception and for subsequent processing. Video stabilization methods can be mainly divided into three categories, i.e., mechanical image stabilization, optical image stabilization, and electronic image stabilization [4-6]. Mechanical image stabilization detects the jitter of the camera platform through gyro sensors and other devices and then adjusts a servo system to stabilize the image [7]. Optical image stabilization uses active optical components to adaptively adjust the optical path to compensate the image motion caused by the shaking of the camera platform, so as to achieve image stabilization [8]. Electronic image stabilization computes motion estimation between consecutive images, and then motion smoothing and motion compensation are performed on each frame of the video to obtain a stable video [9]. Although mechanical image stabilization and optical image stabilization obtain better performance, they still suffer from problems such as large volume. The proposed method consists of the following steps:
Input data: when a jittered video is captured, the gyroscope data are collected at the same time.
Motion estimation: the 3D rotation of the camera attitude between adjacent time instants is calculated from the gyroscope data, and the camera rotation path is obtained cumulatively.
Motion smoothing: smoothing of the camera rotation path is transformed into a constrained regression problem on a Riemannian manifold, and the optimal solution is calculated to obtain a smooth rotation path.
Motion compensation: the image is projected onto a spherical surface, and the jitter rotation component is compensated by rotating the sphere. Then, the image is projected inversely onto a plane to obtain a stable video.

Methodology
The proposed self-calibration spherical video stabilization based on a gyroscope includes three main steps: motion estimation, motion smoothing, and motion compensation.

Motion Estimation and Smoothing
In the 3D rotation estimation module, the gyroscope data are used to estimate the 3D rotation of the camera. The rotation angular velocity ω = (ω_α, ω_β, ω_γ)^T of the camera in the 3D coordinate system is obtained from the gyroscope, and the rotation angle θ = (α, β, γ)^T = ∆t·ω is calculated by integrating the angular velocity over time, where ∆t is the sampling interval of the gyroscope. Subsequently, the inter-frame rotation matrix R corresponding to the rotation angle can be obtained by Equation (1):

R = \begin{bmatrix} \cos\beta\cos\gamma & \cos\beta\sin\gamma & -\sin\beta \\ \sin\alpha\sin\beta\cos\gamma - \cos\alpha\sin\gamma & \sin\alpha\sin\beta\sin\gamma + \cos\alpha\cos\gamma & \sin\alpha\cos\beta \\ \cos\alpha\sin\beta\cos\gamma + \sin\alpha\sin\gamma & \cos\alpha\sin\beta\sin\gamma - \sin\alpha\cos\gamma & \cos\alpha\cos\beta \end{bmatrix}. (1)

The camera path Path_n is represented by Equation (2):

Path_n = \prod_{i=1}^{n-1} R(i, i+1), (2)

where R(i, i+1) is the rotation matrix between the ith frame and the (i+1)th frame. All rotation matrices constitute the special orthogonal group SO(3), in which any element R satisfies the constraints RR^T = I and det(R) = 1; this group can also be considered an embedded Riemannian submanifold. The metric on this Riemannian manifold is the geodesic distance shown in Equation (3):

d(R_1, R_2) = \| \mathrm{logm}(R_1^T R_2) \|_F, (3)

where logm(·) is the matrix logarithm operator and ‖·‖_F is the Frobenius norm of a matrix. The motion smoothing can be formulated as an objective function, and the smoothed trajectory is obtained by solving the following minimization problem, shown in Equation (4):

Path^{cur} = \arg\min \sum_n \left[ d^2(Path^{cur}_n, Path^{pre}_n) + \alpha\, d^2(Path^{cur}_n, Path^{cur}_{n-1}) \right], (4)

where Path^{cur}_n is the smoothed trajectory, Path^{pre}_n is the original trajectory, and α is the weight controlling the smoothness of the stabilized trajectory. For each video sequence, the camera's rotation in 3D space can be mapped to a curve on a Riemannian manifold. The stable 3D camera rotation can be obtained by optimizing the geodesic distance along the curve in this space.
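As an illustration, the gyro integration of Equation (1) and the geodesic distance of Equation (3) can be sketched in code. This is a minimal sketch, not the authors' implementation; the geodesic distance uses the identity ‖logm(R)‖_F = √2·θ for a rotation by angle θ, which avoids an explicit matrix logarithm.

```python
import numpy as np

def rotation_from_angles(alpha, beta, gamma):
    """Inter-frame rotation matrix of Equation (1), with the angles
    (alpha, beta, gamma) = dt * (w_alpha, w_beta, w_gamma) integrated
    from the gyroscope's angular velocity."""
    ca, sa = np.cos(alpha), np.sin(alpha)
    cb, sb = np.cos(beta), np.sin(beta)
    cg, sg = np.cos(gamma), np.sin(gamma)
    return np.array([
        [cb * cg,                cb * sg,                -sb],
        [sa * sb * cg - ca * sg, sa * sb * sg + ca * cg,  sa * cb],
        [ca * sb * cg + sa * sg, ca * sb * sg - sa * cg,  ca * cb],
    ])

def geodesic_distance(R1, R2):
    """d(R1, R2) = ||logm(R1^T R2)||_F of Equation (3), computed from the
    rotation angle via trace(R) = 1 + 2*cos(theta) and
    ||logm(R)||_F = sqrt(2) * theta."""
    R = R1.T @ R2
    theta = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    return np.sqrt(2.0) * theta
```

For example, `rotation_from_angles(0.5, 0, 0)` is a pure rotation about the x axis by 0.5 rad, and its geodesic distance to the identity is √2·0.5.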
As shown in Figure 2, the rotation matrix is transformed into Euler angles to describe the original path and the smoothed path using Equation (5):

α = arctan(R_{23}/R_{33}), β = −arcsin(R_{13}), γ = arctan(R_{12}/R_{11}), (5)

where R_{ij} is the element in the ith row and jth column of the rotation matrix R. The solid line is the original path, and the dotted line is the smoothed path. The smoothed path suppresses the jitter and retains the intentional motion.
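A minimal sketch of the smoothing step, assuming the manifold objective of Equation (4) is approximated per Euler angle in Euclidean space (the paper optimizes the geodesic distance on the manifold itself). Under this approximation the quadratic objective reduces to a tridiagonal linear system:

```python
import numpy as np

def smooth_path(path, alpha=50.0):
    """Minimize sum_n ||p_n - path_n||^2 + alpha * sum_n ||p_n - p_{n-1}||^2,
    a per-angle Euclidean approximation of Equation (4). Setting the gradient
    to zero gives the tridiagonal normal equations (I + alpha * L) p = path,
    where L is the path-graph Laplacian."""
    n = len(path)
    A = np.eye(n)
    for i in range(1, n):
        A[i, i] += alpha
        A[i - 1, i - 1] += alpha
        A[i, i - 1] -= alpha
        A[i - 1, i] -= alpha
    return np.linalg.solve(A, np.asarray(path, dtype=float))

# A jittery 1D angle path: a slow intentional drift plus alternating jitter.
t = np.arange(100)
path = 0.01 * t + 0.05 * (-1.0) ** t
smooth = smooth_path(path, alpha=50.0)
```

The smoothed path keeps the low-frequency drift (the intentional motion) while strongly damping the frame-to-frame jitter, mirroring the behavior described for Figure 2.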

Motion Compensation
Motion compensation is an important operation of video stabilization. The virtual sphere is first established by taking the optical center of the camera as the sphere center, and then each frame image is projected on a virtual spherical surface [16]. Next, the component causing the camera jitter will be obtained according to the difference between the smoothed camera path and the original path. Finally, the motion compensation is carried out by reversely rotating the spherical surface to compensate the jittered images.


Spherical Projection
According to the pinhole camera model, a 2D image coordinate system can be converted into a 3D spherical coordinate system. In this paper, the angle-based spherical projection method is used, and the model is shown in Figure 3. The resolution of the image collected by the camera is W × H, where W is the image width and H is the image height. In the spherical model of the right-handed coordinate system, the origin O is the optical center of the pinhole camera; the y axis is the optical axis and passes through the central point o of the image; Oo is the radius of the sphere, plotted as a red line in Figure 3; the projection of point P in the world coordinate system is denoted as p in the image coordinate system (marked as u−v), which is centered at (u_0, v_0); the projection of point P on the virtual sphere is denoted as P_S, which can be represented by angular coordinates; ϕ is the angle between Op_xoy and the y axis, and θ is the angle between Op_yoz and the z axis. Thus, (ϕ, θ) can be calculated by Equation (6), where r is the radius of the sphere and (u, v) are the pixel coordinates of the image. The spherical coordinates of point P_S are then calculated by Equation (7). In this way, the points in the image plane are converted to the corresponding points on the 3D sphere, which realizes the conversion of 2D plane images to 3D spherical images. As shown in Figure 4, taking the data published by Jia [12] as an example, the camera's focal length of f = 649 was used as the radius for the spherical projection. Figure 4a is the 2D plane image, and Figure 4b is the corresponding spherical projection.
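Since Equations (6) and (7) are not reproduced here, the following sketch shows one plausible realization of the described geometry, assuming the y axis is the optical axis and the image plane sits at distance r from the optical center O; the function name and the exact angle conventions are illustrative assumptions, not the paper's formulas:

```python
import numpy as np

def project_to_sphere(u, v, r, u0, v0):
    """Project image pixel (u, v) onto a virtual sphere of radius r
    centered at the optical center O. With the image plane at distance r
    along the y axis, the ray through pixel (u, v) has direction
    (u - u0, r, v0 - v) in the right-handed O-xyz frame."""
    d = np.array([u - u0, r, v0 - v], dtype=float)
    P_s = r * d / np.linalg.norm(d)      # point on the sphere of radius r
    phi = np.arctan2(P_s[0], P_s[1])     # angle from the y axis in the x-o-y plane
    theta = np.arctan2(P_s[1], P_s[2])   # angle from the z axis in the y-o-z plane
    return P_s, phi, theta

# The image center projects onto the sphere along the optical axis,
# using the focal length f = 649 of the data published by Jia [12].
P_s, phi, theta = project_to_sphere(640, 360, r=649, u0=640, v0=360)
```

By construction every projected point lies exactly on the sphere (|P_S| = r), which is the property the spherical rotation compensation in Section 3 relies on.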


Self-Calibration of the Spherical Radius
As shown in Figure 5, the projected points of point P on two adjacent frames I_1 and I_2 are P_1 and P_2, respectively. The relative position deviation of P_1 and P_2 in the two frames can be regarded as a jitter component. θ_1 and θ_2 are the angles between the corresponding spherical projected points and the optical axis. The rotation angle of I_2 relative to I_1 is denoted by θ, which conforms to the spherical rotation model with radius r_b. Since the gyroscope is bound to the camera, θ can be obtained from the gyroscope data. The key problem is how to determine the spherical radius value r_b, which determines the stabilization effectiveness. Take the example in Figure 5. When the radius of the sphere r′ ≠ r_b, the rotation angle θ′ is not equal to the rotation angle θ obtained by the gyroscope, which does not conform to the rotation mode of the gyroscope; when the radius of the sphere is r_b, the rotation angle θ′ is equal to the rotation angle θ obtained by the gyroscope, which conforms to the imaging model of the camera and the rotation mode of the gyroscope.
Therefore, it is necessary to calibrate the spherical radius r in accordance with the gyroscope rotation. In theory, the spherical radius should equal the focal length of the camera. To achieve self-calibration of the spherical radius over candidate values of r, the mean square error (MSE), which is used in image processing to measure the difference between two images, is used to select the optimal spherical radius value. The MSE of the stable video corresponding to a spherical radius value is defined in Equation (8):

r_b = \arg\min_r \frac{1}{N-1} \sum_{k=1}^{N-1} \frac{1}{WH} \sum_{x,y} \left[ I_r^{k+1}(x, y) - I_r^{k}(x, y) \right]^2, (8)

where N is the total number of frames in the video and I_r^k(x, y) is the kth stable frame stabilized with radius r. Thus, finding the optimal spherical radius r_b is transformed into solving the optimization problem of Equation (8).
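The radius self-calibration can be sketched as a grid search over candidate radii. Here `stabilize` is a stand-in for the full spherical stabilization pipeline, and `fake_stabilize` is a purely synthetic stand-in whose residual jitter grows away from a hypothetical true radius of 656 pixels; both names are illustrative assumptions:

```python
import numpy as np

def interframe_mse(frames):
    """Mean squared difference between consecutive frames, as in Equation (8)."""
    frames = np.asarray(frames, dtype=float)
    return np.mean((frames[1:] - frames[:-1]) ** 2)

def calibrate_radius(video, stabilize, radii):
    """Pick the spherical radius whose stabilized video has the lowest
    inter-frame MSE. `stabilize(video, r)` must return the stabilized
    frame stack for radius r."""
    scores = {r: interframe_mse(stabilize(video, r)) for r in radii}
    return min(scores, key=scores.get)

# Synthetic stand-in: residual jitter grows with the distance from the
# (hypothetical) true radius, mimicking the trend in Figures 7 and 9.
def fake_stabilize(video, r, r_true=656):
    jitter = abs(r - r_true) / 1000.0
    rng = np.random.default_rng(0)
    return video + jitter * rng.standard_normal(np.shape(video))

video = np.zeros((10, 4, 4))
best = calibrate_radius(video, fake_stabilize, radii=range(200, 3001, 4))
```

The search returns the radius with zero residual jitter, matching the paper's observation that the PSNR peaks and the SSIM is closest to 1 at the calibrated radius.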

Spherical Rotation Compensation
The existing motion compensation methods transform a 3D rotation into a 2D image coordinate transformation through the camera intrinsic parameter matrix [13]. Assume that the camera intrinsic matrix is K, which contains five intrinsic parameters: f_x and f_y represent the focal lengths in pixels, s represents the skew coefficient between the x axis and the y axis, and c_x and c_y represent the principal point. K can be written as:

K = \begin{bmatrix} f_x & s & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}.

The pixel (u′_ij, v′_ij) in any stable frame can be calculated by the following equation under a pure 3D camera rotation:

[u′_ij, v′_ij]^T = g\left( K \, Path^{cur}_n (Path^{pre}_n)^{-1} K^{-1} [u_ij, v_ij, 1]^T \right),

where Path^{cur}_n is the smoothed rotation matrix, Path^{pre}_n is the original rotation matrix, g([x, y, z]^T) = [x/z, y/z]^T, and (u_ij, v_ij) is a pixel point in the original image. The existing motion compensation methods therefore depend on the intrinsic parameter matrix.
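The intrinsic-matrix warp above can be sketched as follows; the intrinsic values are hypothetical examples for a 1280 × 720 camera, not calibrated parameters from the paper:

```python
import numpy as np

# Hypothetical intrinsics: f_x = f_y = 649, zero skew, centered principal point.
K = np.array([[649.0,   0.0, 640.0],
              [  0.0, 649.0, 360.0],
              [  0.0,   0.0,   1.0]])

def warp_pixel(u, v, path_cur, path_pre, K):
    """Map an original pixel into the stabilized frame under pure rotation:
    [u', v']^T = g(K * Path_cur * Path_pre^{-1} * K^{-1} * [u, v, 1]^T),
    with g([x, y, z]) = [x/z, y/z]."""
    H = K @ path_cur @ np.linalg.inv(path_pre) @ np.linalg.inv(K)
    x, y, z = H @ np.array([u, v, 1.0])
    return x / z, y / z

# With identical smoothed and original rotations the warp is the identity.
u2, v2 = warp_pixel(100.0, 50.0, np.eye(3), np.eye(3), K)
```

This makes explicit why these methods need K: the homography H couples the rotation correction to the intrinsic parameters, which the spherical method below avoids.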
Different from the existing methods, the proposed method does not need the camera intrinsic parameter matrix to realize video stabilization. The method proposed in this paper projects an image onto a sphere and uses a rotation matrix to rotate the sphere to compensate the image. The rotation component causing the jitter can be calculated from the difference between the smoothed rotation path and the original rotation path. A relatively stable spherical image can be obtained by using this rotation component to reversely rotate the spherical surface, as shown in Equation (10):

[x′_ij, y′_ij, z′_ij]^T = R_{res} [x_ij, y_ij, z_ij]^T, (10)

where [x_ij, y_ij, z_ij]^T is the original spherical point, [x′_ij, y′_ij, z′_ij]^T is the spherical point after rotation, and R_{res} = Path^{cur}_n × (Path^{pre}_n)^{-1} is the rotational jitter component. Finally, the stable image sequence can be obtained by expanding the spherical surface. In Figure 6, taking the target point P of the kth frame and the (k+1)th frame as the reference, point P is dislocated between these two adjacent frames. In the third sphere, the spherical image of the (k+1)th frame is rotated to make its position relatively consistent, so as to achieve the effect of image stabilization.
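Equation (10) amounts to one matrix-vector product per spherical point. A minimal sketch with a synthetic 2° jitter about the z axis (the rotation values are illustrative, not from the paper's data):

```python
import numpy as np

def compensate_sphere(points, path_cur, path_pre):
    """Rotate spherical points by the residual jitter rotation
    R_res = Path_cur @ Path_pre^{-1} (Equation (10)).
    `points` is an (N, 3) array of points on the sphere."""
    R_res = path_cur @ np.linalg.inv(path_pre)
    return points @ R_res.T

# Synthetic jittered (original) attitude: a 2-degree rotation about z.
a = np.deg2rad(2.0)
path_pre = np.array([[np.cos(a), -np.sin(a), 0.0],
                     [np.sin(a),  np.cos(a), 0.0],
                     [0.0,        0.0,       1.0]])
path_cur = np.eye(3)                 # smoothed attitude
pts = np.array([[0.0, 649.0, 0.0]])  # a point on the sphere (r = 649)
stable = compensate_sphere(pts, path_cur, path_pre)
```

Because R_res is a rotation, the compensated points stay on the sphere of the same radius, so the stabilized frame can be recovered by the inverse spherical projection without ever using K.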

Figure 6. Spherical rotation compensation model.

Experiment and Result Analysis
In this section, two groups of experiments are illustrated. The first experiment compared different spherical radius values to prove the optimality of the proposed self-calibration method, and the second compared different motion compensation methods to demonstrate the effectiveness of the proposed spherical motion compensation method.

Experiment Setting and Videos
In order to verify the effectiveness of the proposed method, Visual Studio 2015 was used for programming on a PC (Intel Core i5-8500 CPU, 3.00 GHz, 8 GB RAM). To test the general applicability of the proposal, we collected the experimental data from different cameras, each bound with a gyroscope. The joint calibration of the camera and the gyroscope proposed by Fang Ming [17] was used to align the images with the gyroscope data.
To evaluate the video stabilization effect quantitatively, the peak signal-to-noise ratio (PSNR), the structural similarity index (SSIM) [18], the cropping ratio, the distortion score, and the stability score [19] were used. The PSNR and the SSIM are commonly used metrics in image processing to evaluate the degree of registration between image sequences. The principle of the PSNR is that if the relative change between two adjacent frames is fully compensated, the pixel difference between the two stable frames should be zero. The SSIM is widely used in video stability estimation. It considers brightness, contrast, and structure information to measure the similarity of two given images. The larger the PSNR value is and the closer the SSIM value is to 1, the better the image stabilization effect is [18]. The PSNR and the SSIM are defined in Equation (11):

PSNR(I_k, I_{k+1}) = 10 × log_{10}\left( \frac{255^2}{MSE(I_k, I_{k+1})} \right),
SSIM(I_k, I_{k+1}) = \frac{(2 \mu_{I_k} \mu_{I_{k+1}} + c_1)(2 \sigma_{I_k I_{k+1}} + c_2)}{(\mu_{I_k}^2 + \mu_{I_{k+1}}^2 + c_1)(\sigma_{I_k}^2 + \sigma_{I_{k+1}}^2 + c_2)}, (11)

where I_k and I_{k+1} are the gray images of two adjacent frames; μ_{I_k} and μ_{I_{k+1}} are the gray averages of I_k and I_{k+1}, respectively; σ²_{I_k} and σ²_{I_{k+1}} are the gray variances of I_k and I_{k+1}, respectively; σ_{I_k I_{k+1}} is the gray covariance of I_k and I_{k+1}; and c_1 = (k_1 L)², c_2 = (k_2 L)², k_1 = 0.01, k_2 = 0.03, and L = 255 are constants used to maintain stability. The cropping ratio measures what remains after cropping away empty regions. The distortion score is estimated from the affine part of the homography. The stability score measures the smoothness of the stabilized videos [19].
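The two metrics can be sketched as follows; note that `ssim_global` evaluates the SSIM statistic once over the whole frame, whereas standard implementations average it over local windows:

```python
import numpy as np

def psnr(a, b, L=255.0):
    """Peak signal-to-noise ratio between two gray frames (Equation (11))."""
    mse = np.mean((a.astype(float) - b.astype(float)) ** 2)
    return 10.0 * np.log10(L ** 2 / mse)

def ssim_global(a, b, k1=0.01, k2=0.03, L=255.0):
    """Single-window SSIM with the constants from Equation (11)."""
    a, b = a.astype(float), b.astype(float)
    c1, c2 = (k1 * L) ** 2, (k2 * L) ** 2
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    num = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    den = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return num / den

frame = np.tile(np.arange(16.0), (16, 1))  # synthetic gray "frame"
shifted = frame + 1.0                      # a slightly different adjacent frame
```

Identical frames give SSIM = 1, and a uniform one-gray-level difference gives an MSE of 1 and hence a PSNR of 10·log₁₀(255²) ≈ 48.1 dB, illustrating how full compensation drives the PSNR up.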

Comparison of Different Spherical Radius Values
In order to verify the effectiveness of spherical radius self-calibration, two groups of experimental data were implemented: (1) public video data and gyroscope data; (2) the video collected from the cascaded camera and gyroscope.
Firstly, public video data and gyroscope data [12] were used, and the focal length of the camera was f 1 = 649 pixels. The optimal spherical radius value was located at r = 656 pixels, which was calibrated by the proposed method in Section 3.2.2. The deviation between the computed value and the focal length value was small and acceptable, since the centers of the gyroscope and the camera could not completely coincide and the computed value also conformed to the imaging model basically. We ranged the spherical radius values from 200 to 3000 pixels. The PSNR and SSIM values of different spherical radius values are shown in Figure 7. The optimal values were obtained at r = 656 pixels, which indicated that the calibration result of the spherical radius is reliable. In addition, the larger the distance from the optimal radius was, the worse the video stabilization effect was.
Secondly, the video data were collected from the cascaded camera and gyroscope, where the image data and gyroscope data were registered. The relationship between the camera and the gyroscope is shown in Figure 8. The focal length of f = 1468 pixels was computed through the camera calibration. The spherical radius value obtained by the proposed self-calibration method was r = 1450 pixels. Three groups of data, including video data and gyroscope data, were collected to verify the validity of the calibration results. We ranged the spherical radius values from 200 to 3000 pixels. Figure 9 exhibits the PSNR and SSIM values of the three videos corresponding to different spherical radii, respectively. It can be found that the PSNR values of the three videos are the maximum at r = 1450 and the SSIM values closest to 1 are at r = 1450, which indicates that the calibration result of the spherical radius is reliable.
Therefore, the optimal spherical radius was basically consistent with the focal length of the camera, and the results obtained by the proposed self-calibration method conformed to the rotation model of the camera.

Comparison with the Intrinsic Parameter Matrix Method
In this paper, videos from three different cameras were used to compare the image stabilization effect of the spherical motion compensation and the intrinsic parameter matrix compensation methods [12,13,15]. Figure 10 shows thumbnails of the three videos, where video 1 is an indoor scene, video 2 is an outdoor scene, and video 3 is a feature-deficient scene. The resolution of all three videos was 1280 pixels × 720 pixels. We used the PSNR, the SSIM, the cropping ratio, the distortion score, and the stability score to compare the video stabilization effect. The results of these methods are shown in Tables 1-3. It can be seen that the proposed method achieves better indices. Meanwhile, we compared the runtime, as shown in Table 4. The runtime of the proposed method is slower than those of the methods in [13,15], but it is promising for achieving real-time processing with the best stabilization effect. Moreover, the proposed method does not need calibration in advance.


Discussion
The proposed stabilization method uses a gyroscope to suppress random jitter effectively through a self-calibration spherical compensation model. At present, the representative classical gyroscope-based video stabilization methods include [12-15]. We carried out a comparative analysis with references [12,13,15], which demonstrates that the proposed method has advantages in video stabilization effect and convenience. The method described in [14] designs a special optical flow sensor to assist video stabilization. Since there are no public sensor data and video data, it is difficult to compare our results with those reported in [14].
In addition, a key characteristic of this method is that it avoids extra calibration work. In practical applications, in any scene, the image can be stabilized simply by fixing a gyroscope on the camera, which is more flexible and still ensures the video stabilization effect. However, the runtime of the proposed method is not the fastest among the compared methods, although it is promising for achieving real-time processing with the best stabilization effect. In the next stage, we will optimize the method to reduce the runtime.

Conclusions
In this paper, a self-calibration spherical compensation image stabilization method based on a gyroscope has been proposed. The camera motion trajectory is obtained from the gyroscope, and the trajectory is smoothed on a Riemannian manifold to obtain the jitter component; the virtual sphere is established at the optical center of the camera, and an objective function over the spherical radius is built from the mean square error of the stabilized video. The optimal spherical radius is determined by solving this objective function, completing the spherical radius calibration. Then, the image is projected onto the spherical surface, and the sphere is rotated reversely according to the jitter component for motion compensation. Finally, the spherical image is expanded to obtain a stable video sequence. The experimental results showed that the stability metrics, i.e., the PSNR, the SSIM, the cropping ratio, the distortion score, and the stability score, were improved, demonstrating that the proposed method outperforms traditional intrinsic parameter matrix compensation methods. Moreover, the proposed method not only maintains the effectiveness of video stabilization but also relaxes the dependence on camera calibration.

Data Availability Statement:
The data used to support this study's findings are available from the author upon request.

Conflicts of Interest:
The authors declare no conflict of interest.