Rotation Estimation: A Closed-Form Solution Using Spherical Moments †

Photometric moments are global descriptors of an image that can be used to recover motion information. This paper uses spherical photometric moments for a closed form estimation of 3D rotations from images. Since the used descriptors are global and not of the geometrical kind, they allow to avoid image processing as features extraction, matching, and tracking. The proposed scheme based on spherical projection can be used for the different vision sensors obeying the central unified model: conventional, fisheye, and catadioptric. Experimental results using both synthetic data and real images in different scenarios are provided to show the efficiency of the proposed method.


Introduction
Rotation estimation from images is important for many application as for motion estimation and registration in image processing [1], pattern recognition [2], localization and control of ground/aerial vehicles [3], and computer vision [4]. Feature based and direct (appearance based) are the two main approach categories proposed in the literature. The first method type uses Epipolar geometry applied to geometrical features as as points [5], lines [6], or contours [7] to estimate the fundamental matrix (uncalibrated vision sensors), the essential matrix, or homography matrix for the calibrated case. Examples of such methods using conventional cameras are Agrawal et al. [8] and Malis et al. [9] and using omnidirectional vision sensors is [10][11][12]. The direct approaches, on the other hand, avoid geometric features extraction by using grey level information of the whole image to define the descriptors to be used. Those methods are referred to in the literature as dense, direct, or global. Methods within this category can avoid the need of some higher-level image processing steps such as feature extraction, matching, and tracking. For instance, Reference [13] exploits optical flow estimation. However, a consequence of using optical flow in References [13,14] is to be limited to small camera displacements. To deal with larger motion, in Reference [4], Makadia and Daniilidis proposed a method using correlations between the two sets of spherical harmonic coefficients.
To address even larger camera displacements, motion estimation techniques based on global image representation is utilized in the literature. Makadia and Daniilidis [4] represented the spherical omnidirectional images in the frequency domain to recover rotation using the maximum correlation between the two sets of spherical harmonic coefficients. In Reference [15], the authors proposed to estimate the rotation based on 3D-mesh representation of spherical image intensities. However, these methods are not often computationally efficient due to the computation load of spherical harmonic coefficients.
The moments as descriptors provide useful characteristics such as the center of gravity of an object, orientation, disparity, and volume. As Fourier descriptors, they can be used as a compact representation of the image. In the case of 2D images, in addition to the classical geometrical moments, other kinds of moments have been defined in the literature as Zernike Moments [16] or complex moments [17]. Invariant characteristics to some transformations that an object can undergo, commonly called moment invariants, found many applications such as for pattern recognition. Thanks to their decoupling properties, they also have been used to define a minimal number of features for visual servoing and pose estimation as in References [18,19] and for visual servoing as in References [20,21].
Spherical projections can be obtained from images if the used camera obeys the unified central model. By using such projections, the authors in Reference [22] proposed a decoupled visual servoing scheme using spherical moments computed for a set of matched points. The proposed features allow to control the translational motions independently from the rotational ones. The current work exploits the spherical projection to estimate 3D rotations using spherical moments. The proposed scheme provides an analytical solution to rotation. An early version of this work was published in Reference [23]. In the next section, the central camera model and the definition of spherical moments are first recalled. The proposed approach is subsequently presented. In Section 3, experimental results of rotation estimation using synthetic data and real images acquired from a catadioptric camera mounted on a manipulator robot (Staubli) and a mobile robot (Pioneer) are presented to validate our approach.

Spherical Moments
The projection onto a unitary sphere t P s = [x s y s z s ] of a 3D point P is defined by the following: Actually, the projection onto the unit sphere can be obtained using the unified central model for conventional, fisheye, or catadioptric [24]. This makes the proposed scheme in this work valid for a wide range of cameras.
From a spherical image (or Gaussian sphere [25]) of the environment, the photometric moments can be defined by the following: where S is defined as the sphere surface. Let R be a rotation matrix and ω be the instantaneous rotational speed applied to the sensor frame. First, it can be easily shown that a rotational motion defined by a matrix R on 3D point P (P = R P, where P is the same 3D point after rotation) induces the same motion on its projection onto the unit sphere (P s = R P s ). In tangent space, the effect of a rotation speed ω = [ω x ω y ω z ] on a projected point on the sphere is given by its time derivativeṖ s : where L P s is the interaction matrix and [ ] × defines the skew-symmetric matrix of a vector. On the other side, by taking the time derivative of Equation (2), the time variation of the moment m ijk is linked to rotational speed ω by the following [26]: In the following, a scheme is proposed to define triples x m = x(m ijk ) y(m ijk ) z(m ijk ) ∈ R 3×1 that behave as a projected point onto the sphere with respect to rotations.

Closed-Form Solution of Rotation Estimation
From moments of order 1 and by using Equation (4), the interaction matrix that links time derivatives of the triple x m 1 = [m 100 m 010 m 001 ] to the rotational velocities is obtained by the following:ẋ As it can be seen, the interaction matrix given by Equation (5) has an identical form to the one obtained for a projected point onto the sphere. This implies a 3D rotation results in the same motion on x m 1 . Let us now consider triples x m defined from moments of orders larger than 1. In order to have x m = R x m , the interaction matrix related to x m = [x(m ijk ), y(m ijk ), z(m ijk )] must have the same form as Equation (3), which means the following: Under the assumption of constant speed (i.e., [ω] × is a constant matrix), solving for Equation (6) leads to the following: where x m (0) and x m (t) are respectively the initial value and at the value time t of the triple (which defines the relation for a couple of images). Any rotational motion between two images can be obtained by applying a constant speed during time t. This means Equation (6) implies x m (t) = Rx m (0) for any couple of images.
Let us now explain the procedure to obtain those triples using as an example: the moment vector w 23 = [m 300 m 200 m 300 m 110 · · · m 003 m 002 ] . The latter is composed of all the possible products between moments of order 2 and those of order 3. There are 6 possible moments of order equal to 2 and 10 of order equal to 3. This implies that w 23 is of size equal to 60. Let us define triples where α x , α y , and α z are vectors of coefficients that define the triple coordinates as linear combinations of the entries of w 23 . Our goal is to find the conditions on those vectors of coefficients such that the time derivativeẋ w 23 is of the same form as Equation (6).
As an example, let us consider the time derivative of the moment product m 030 m 200 , which is a component of w 23 . Using Equation (4), we have the following: As it can be also concluded from Equation (4), the time variation of a moment is a function of moments of the same order and the rotational speed. From Equation (8), we obtain the time derivative of the product m 030 m 200 by the following: Note that the coefficients of ω x , ω y , and ω z are nothing but linear combinations of products between moments of orders 2 and 3. Actually, the time derivative of any component of the vector w 23 will be a linear combination of the other components. Based on this fact,ẇ 23 can be written under the following form:ẇ where L w 23 /ω x , L w 23 /ω y and L w 23 /ω z are 60 × 60 matrices. Using Equation (10), the time derivative of the triple x w 23 = [α x w 23 α y w 23 α z w 23 ] is obtained: 23 /ω x w 23 α x L w 23 /ω y w 23 α x L w 23 /ω z w 23 α y L w 23 /ω x w 23 α y L w 23 /ω y w 23 α y L w 23 /ω z w 23 α z L w 23 /ω x w 23 α z L w 23 /ω y w 23 α z L w 23 If Equation (11) is of the same form as Equation (6), this leads to the 9 following conditions: 3)α x L w 23 /ω z w 23 = α y w 23 ; 4)α y L w 23 /ω x w 23 = α z w 23 ; 5)α y L w 23 /ω y w 23 = 0; The 9 constraints given by Equation (12) must be valid for any value of w 23 . Therefore, they can be simplified as follows: The constraint set of Equation (13) can be written under the following form: where I and 0 are respectively the identity and zero matrices of size 60 × 60. In practice, L w 23 /ω x , L w 23 /ω y and L w 23 /ω z are very sparse integer matrices. Solving Equation (14) for the case of the moment vector w 23 leads to three solutions for the triple x w 23 (refer to Equations (A1)-(A3) in the Appendix A). Such conditions can be solved using a symbolic computation software as Matlab Symbolic Toolbox. The procedure used to obtain triples from w 23 can be extended straightforwardly to a different moment vector as follows: • Define the moment vector w similarly as for w 23 . The vector can be built by moment products of two different orders or more. For instance, products such as m 200 m 220 m 300 which combine moments of order 2, 3, and 4 can be also used. the vector w has to be built from moment products of the same nature and has to also include all of them. • Using Equation (4), compute the matrices L w 23 /ω x , L w 23 /ω y , and L w 23 /ω z such that we obtain the following:ẇ = (L w 23 /ω x w)ω x + (L w 23 /ω y w)ω y + (L w 23 /ω z w)ω z .
• Solve the following system: where I and 0 are respectively the identity and zero matrices of the same size as L w 23 /ω x , L w 23 /ω y and L w 23 /ω z .
In practice, two different triples x w are enough to obtain a closed-form solution of the rotational motion since they can define an orthonormal basis. Let us consider, for instance, the two triples P 1 and P 2 computed from the moments of a first image using Equations (A1) and (A2) and P 1 and P 2 , their corresponding ones, computed from the image after rotation. Therefore, we have P 1 = RP 1 and P 2 = RP 2 , where R is the rotation matrix. Let us define the following base: where P n1 = P 1 P 1 and P n2 = P 2 P 2 . First, it can be shown that the vectors v 1 , v 2 and v 3 form an orthonormal base (i.e., v 1 = v 2 = v 3 = 1 and v 1 3 .v 2 = 0). Second, let us now consider a second base v 1 , v 2 and v 3 computed from P 1 and P 2 using Equation (17). Since P 1 = RP 1 and P 2 = RP 2 , it can be proven using Equation (17) that v 1 = Rv 1 , v 2 = Rv 2 and v 3 = Rv 3 . The last equations can be written under the following form: . Finally, we obtain: where the rotation matrix can be estimated straightforwardly. Solving Equation (16) ensures that all possible triples are obtained from a given moment vector. Obtaining a mathematical proof on the existence and their number is out of the the scope of this contribution.

Rotation Estimation and Scene Symmetry
As explained in the previous paragraph, an orthonormal base can be built from the triples P 1 and P 2 to estimate the rotation matrix. In practice, if the scene has no symmetry, extensive tests (not included in the paper for brevity) show that using the triples P 1 and P 2 could be enough. If the scene has a symmetry with respect to a plane (for instance xy plane), the z-coordinates of the triples P i are null. This means that all P i will belong to the xy plane. If the scene is symmetrical with respect to xy and xz planes, for instance, then the triples P i will form a line in the direction of the x-axis.

Validation Results
In the following section, simulation results and real experiments are shown. For all the validation results, the spherical moments are computed directly using the pixel's grey levels without warping the images to the sphere space. The used formulas are given in References [26,27] for the perspective and fisheye cases respectively: where ξ is the distortion parameter and (x, y) are the coordinates of a point in the metric image. The formulas of x s , y s , and z s as functions of (x, y) can be found in Reference [28]. Using Matlab with a computer equipped with a processor Intel(R) Core(TM) i5-7600 CPU @ 3.50 GHz 3.50 GHz and 16,0 Go RAM, the whole process including the computation of spherical moments of an image of size 480 × 640 pixels until the rotation estimation takes around 20 ms. This leaves room for reducing the time cost using other programming tools than Matlab to few milliseconds.

Simulation Results
The objective of this part is threefold: • To show the validity of the proposed method to cameras obeying unified model, two different kinds of camera model are used to compute the images. The first corresponds to a simulated fisheye camera with the parameters chosen as focal scaling factors Fx = Fy = 960 pixels, coordinates of the principal point u x = 240 and u y = 320 pixels, and distortion parameter ξ = 1.6. The second model corresponds a conventional camera with focal scaling factors Fx = Fy = 600 pixels and coordinates of the principal point u x = 240 and u y = 320; • To Validate our approach for large rotational motion; • To test the effect of translational motion on the accuracy of the estimated rotations.
In the first results, a fisheye camera model is used to generate the images. More precisely, from the reference image shown on Figure 1a, three sets of 100 new images are computed based on randomly generated displacements. An example of the used images is shown on Figure 1b. All the three image sets were obtained with the same random rotations shown on Figure 2b but with translational motion of norms equal to 0 for the first set and 10 cm and 20 cm for the two others. For the two last sets, the same random translation directions are used but with different norms equal to 10 cm and 20 cm, respectively. To assess the estimated rotations, the norm of the vector composed by the errors on those Euler angles is used. The obtained accuracy results are shown on Figure 2a. It can be seen that, for the case where the camera displacement does not include translational motion, the rotation motion is well estimated (refer to the red plot of the Figure). The results obtained in the case where the camera displacement involves translational motion of norm 10 cm and 20 cm are shown respectively by the blue and the green plots of the Figure 2a. From the obtained plots, it can be seen that a maximum error of 4 • (respectively 8 • ) and an average error less then 3 • (respectively 5 • ) are obtained in the case of translation norm equal to 10 cm (respectively 20 cm). The second simulation results are based on the same previous setup, but this time, it is a conventional perspective camera that is used to generate the images. As for the first simulations, we consider three sets of 100 images corresponding to transnational motions of different norms (0 cm, 10 cm, and 20 cm). Figure 3a,b shows the reference image and an example of the used rotated image for which the rotation has to be estimated. The ground truth of the rotational motions is shown on Figure 4b, while the accuracy of the rotation estimation is depicted on Figure 4a. From this Figure, it can be seen that the rotations are well estimated in the case where pure rotational motions are considered (refer to the red plot). It can also be noticed from the same figure (the blue and the green plots) that maximum rotation errors around 3.5 • and 7.5 • are observed respectively for the cases where the camera displacement involves a translational motion of norms equal to 10 cm and 20 cm, respectively.

Real Experiments
The following experimental results use the OVMIS (Omnidirectional Vision Dataset of the MIS laboratory) dataset (http://mis.u-picardie.fr/~g-caron/pub/data/OVMIS_dataset.zip). Based on the same database, the preliminary work [23] has provided several experimental results where the proposed method is compared to the SPHARM (SPherical HARmonic analyses) method [15] and to the phase correlation (COR) and photometric (PHO) methods [29]. In the following, the proposed method is validated on a larger set of scenarios from the dataset. Moreover, we compare the performance of the proposed method using the spherical moments computed from the image intensities and from its gradient. We also test the effect of introducing the weighting function on the spherical moment computation of image acquired by catadioptric camera. The used weighting is a function of the pixel coordinates (x, y) of the following form: where r = (x − u x ) 2 + (y − u y ) 2 is the distance of the pixel to the principal point (u x , u y ) and σ is a scalar that tunes the wide of the image zone to be used. Note that W(x, y) has its maximum value 1 for r = r m . Figure 5a shows the shape of the used weight function for this validation results. As can be seen from weighting function shape, the points close to the image border or to the blind zone in the mage center are given small weights to decrease the effect of appearance/disappearance of image parts. In all these experimental parts, the rotation motion is obtained by accumulating the estimated rotations between each two consecutive images. The first validation result shown in the video (Supplementary Materials) is based on Dataset 1 of OVMIS with the Staubli robot arm. The considered sequence composed of 144 images has been acquired by applying regular pure rotations around the optical axis by a catadioptric camera mounted on the end effector of a Staubli in an indoor environment which corresponds to a rotation of 2.5 • between two consecutive images. First, the proposed method is tested by considering all the image sequences. Then, one image from 10 and from 20 are considered, which corresponds to rotations of 25 • and 50 • , respectively, between tow consecutive images. As it can be seen from the video that the obtained results show that the proposed method estimates accurately the rotations in the three considered cases.
The second validation result shown as well in the video (Supplementary Materials) is based on the Scenario 1 of the Pioneer database acquired using a catadioptric camera. A similar setup to the previous case is used: a pure rotational motion around the z-axis in an indoor environment. The proposed method is tested by considering all the image sequences, one image from 10 and one from 20. The obtained results show again that the proposed method estimates accurately the rotations in the three considered cases.
The last experiment tests the proposed method on the more challenging Scenario 3 of the OVMIS dataset of a pioneer robot moving according to the trajectory described in Figure 5b. The achieved path is of length 25.27 m, in the form of 8 motions that include both rotational and translation motions. It has been achieved by the mobile robot in 160 s, which corresponds to 0.187 m/s average speed. The ground truth is obtained by incorporating gyroscopic corrections into the odometry data based on wheel encoders. From the video (Supplementary Materials), it can be seen that the mobile robot has traversed uneven terrain, with changing illumination conditions due to moving cast shadows of the trees and with appearance/disappearance of some image parts. Figure 6 compares the estimated rotation with/without using weighting function and based on moments computed from the image grey level or its gradient. From the obtained plots, it can be seen that using weighting function improves slightly the results obtained using the moment computed from the image intensities (compare the blue curve to the red one). On the other hand, it can be seen that the most accurate rotation estimation is obtained using the image gradient spherical moments and weighting function (refer to the green plot). The last experiments of the video (Supplementary Materials) describes the obtained results using the image gradient spherical moments by considering all the image sequences, one image from 10 and one from 20. From the obtained results, it can be seen that the rotation estimation is satisfactory for camera motion involving transnational and rotational motions.

Discussion
The triples computed from moment vectors provide a closed-form estimation to rotational motion. Compared to the classical method based on the alignment of principal axis, the proposed solution has no ambiguities on rotation formula. Recall that the alignment of principal axis method usually uses moments of higher orders to remove the ambiguities. If the estimated rotation is used as a feature in visual servoing or in active vision, the closed-form formula has a second advantage. For instance, as it has been proposed in References [30,31], the rotation vector is computed from the rotation matrix and used to control the rotational speed of a camera. For accurate visual servoing and active vision, most strategies are based on the analytical form of the visual features dynamics. Independently from the used method to estimate the rotation matrix, the relation between the time variation of a rotation vector and the rotational speed keeps the same form. On the other hand, its dynamic with respect to the translational speed becomes complex to obtain without a closed-form formula. Using the proposed solution in this paper, the dynamics of the rotation vector with respect to the whole degree can be obtained from those of the spherical moments [26,30,31].
A natural question arises about which are the best two triples to be used for building an orthonormal base. For instance, a base can be built from the two triples from P 1 to P 3 as well as by using two others from P 4 to P 7 . Although it is well accepted that the sensitivity to noise increases with the moment orders, there is not much difference between results obtained using moments of orders 2 and 3 and those obtained using moments of orders 4 and 3. Moreover, an orthonormal base can be obtained using more than only two triples as it is achieved in Reference [31]. Finally, recall also that the rotation matrix can be obtained using all the triples P i as a generalized solution of the orthogonal Procrustes problem [32].

Conclusions
In this paper, a new method to estimate rotational motions using photometric moments has been provided. The obtained theoretical results have been validated using simulations and real experiments using onboard omnidirectional cameras and two different robots (robot arm and ground mobile robot). The proposed method has been also tested in different scenarios with appearance/disappearance of some image parts and lightening changes. Future works will be devoted to using photometric spherical moments for the control of the 6 degrees of freedom of a camera.

Conflicts of Interest:
The authors declare no conflict of interest.