Non-Rigid Structure Estimation in Trajectory Space from Monocular Vision

In this paper, the problem of non-rigid structure estimation in trajectory space from monocular vision is investigated. As in the Point Trajectory Approach (PTA), the trajectories of characteristic points are described by a predefined Discrete Cosine Transform (DCT) basis, and the structure matrix is calculated by a factorization method. To further optimize the estimate, a rank minimization problem on the structure matrix is formulated by introducing the basic low-rank condition. The Accelerated Proximal Gradient (APG) algorithm is then used to solve the rank minimization problem, taking the structure matrix calculated by the PTA method as its initial value. The APG algorithm converges quickly and clearly reduces the reconstruction error. Reconstruction results on real image sequences indicate that the proposed approach runs reliably and effectively improves the accuracy of non-rigid structure estimation from monocular vision.


Introduction
Recently, non-rigid structure estimation from monocular vision, which recovers the time-varying 3D coordinates of points on a non-rigid object from their 2D positions in a video sequence, has become a popular research topic. Two major families of methods, the trajectory basis method and the shape basis method, are generally used to solve non-rigid structure estimation problems. The factorization method was first proposed by Tomasi and Kanade [1] to recover rigid structure, and it was extended to the non-rigid structure problem in the seminal paper by Bregler et al. [2]. The core idea is that shapes observed from motion can be represented by a linear combination of a compact set of basis shapes. Each instantaneous structure, such as a running person, can be expressed as a point in the linear space spanned by the shape basis. A large number of methods have subsequently been developed [3][4][5], which improved the performance of the shape basis. However, the shape basis has some limitations, since it is specific to the object and cannot be applied in general to all non-rigid bodies. The shape basis of a dancer, for example, cannot be reused to compactly represent a person running. As an alternative to a shape space, Akhter et al. [6,7] proposed to represent the time-varying structure of a non-rigid object by a linear combination of a set of basis trajectories, which is called the Point Trajectory Approach (PTA). The primary advantage of PTA is that the trajectory basis can be predefined to be close to many real trajectories, which results in a significant reduction in unknowns and a corresponding improvement in the stability of the estimate. Zhu et al. [8] pointed out the importance of selecting the number of trajectory bases: using more bases is not always better, and the appropriate number varies between models.
On this basis, Gotardo and Martinez [9] combined the shape method and the trajectory method, which further improved the reconstruction performance. Recently, Rehan et al. [10] proposed a novel constraint in the form of local rigidity, which gave stable results in challenging realistic scenarios with small camera motions and shorter sequences. Minsk et al. [11] introduced new constraints that were more effective for non-rigid structure estimation; these constrained the motion parameters so that the 3D shapes were most closely aligned to each other, making the rank constraints unnecessary. They then proposed a new probabilistic model in [12], which incorporated the smoothness constraint without requiring any prior knowledge. This approach regarded the sequence of 3D shapes as a simple stationary Markov process with Procrustes alignment, whose parameters were learned during the fitting process. Antonio et al. [13] proposed an online solution to estimate non-rigid structure, which modeled non-rigid deformations as a linear combination of mode shapes obtained using modal analysis from continuum mechanics. Nevertheless, the underlying principle behind most approaches is to model deformations with a low-rank shape model [2,9,14,15], which improves the accuracy of the non-rigid structure estimation.
To further improve the accuracy of non-rigid structure estimation, the low-rank condition of the structure matrix is also investigated in this paper, and the APG algorithm is used to optimize the structure matrix, which converges quickly to an efficient solution. Many trajectory bases can be used to recover the structure [16], such as the Discrete Cosine Transform (DCT) basis, the Walsh Hadamard Transform (WHT) basis and the Discrete Wavelet Transform (DWT) basis. In this paper, the predefined DCT basis is used to recover the motion and structure of the non-rigid object. For a 2D signal of $N \times M$ sample points, the DCT formula weights each cosine term by a coefficient $\mu(k)$ that depends on the frequency index $k$.
In this paper, the APG method is used to solve the problem of non-rigid structure estimation. A new constraint, called the trace minimization constraint of the rectification matrix, is introduced to narrow the solution space and improve the computational speed of the algorithm. The proposed method can effectively estimate both the 3D structure of non-rigid objects and the camera motion. Experimental results on real image sequences indicate that the proposed approach effectively improves the accuracy of non-rigid structure estimation from monocular vision. This paper is organized as follows: the problem is formally described in Section 2, and Section 3 briefly introduces how to obtain the initial structure matrix $S$ by using the PTA method. In Section 4, the APG algorithm is introduced to optimize $S$. Experimental results are presented in Section 5. Finally, a summary and future work are discussed.
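The DCT trajectory basis mentioned above can be illustrated numerically. The following is a minimal sketch, assuming the standard orthonormal DCT-II form with normalization coefficient mu(k); the function name `dct_basis` and the zero-based indexing are choices made for this example and are not notation from the paper.

```python
import numpy as np

def dct_basis(num_frames: int, num_basis: int) -> np.ndarray:
    """Return a (num_frames, num_basis) matrix whose columns are the first
    num_basis orthonormal DCT-II basis vectors (assumed form)."""
    t = np.arange(num_frames)[:, None]   # frame index 0..F-1 (column vector)
    k = np.arange(num_basis)[None, :]    # frequency index 0..K-1 (row vector)
    # mu(k): 1/sqrt(F) for the constant vector, sqrt(2/F) for all others
    mu = np.where(k == 0, np.sqrt(1.0 / num_frames), np.sqrt(2.0 / num_frames))
    return mu * np.cos(np.pi * (2 * t + 1) * k / (2 * num_frames))

basis = dct_basis(num_frames=50, num_basis=5)
gram = basis.T @ basis  # orthonormal columns give the 5x5 identity
```

Because the columns are orthonormal, truncating to the first few of them gives a compact, smooth representation of a point trajectory, which is the property the trajectory-space formulation relies on.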

The Problem of Non-Rigid Structure Estimation
In fact, 3D reconstruction of non-rigid motion from monocular vision is equivalent to the decomposition of the measurement matrix $W$ into the camera rotation matrix $R$ and the structure matrix $S$ of the non-rigid object. This problem can be reduced to estimating the rectification matrix $Q$. The PTA method estimates the corresponding unknown parameters under a series of constraints and recovers the structure $S$ of the non-rigid object. Then, the APG algorithm refines the structure matrix $S$ calculated by PTA, which further improves the accuracy of the non-rigid structure estimation from monocular vision.
After feature point correspondence, the measured 2D trajectories are collected in the measurement matrix $W$, containing the locations of $N$ image points across $M$ frames:

$$W = \begin{bmatrix} x_{11} & \cdots & x_{1N} \\ \vdots & & \vdots \\ x_{M1} & \cdots & x_{MN} \end{bmatrix} \in \mathbb{R}^{2M \times N},$$

where $x_{tj}$ stacks the two image coordinates of point $j$ in frame $t$. The measurement matrix $W$ can be decomposed as

$$W = RS, \qquad R = \mathrm{diag}(R_1, \dots, R_M),$$

where each $R_t$ is a $2 \times 3$ orthogonal (orthographic) projection matrix. The structure matrix $S$ is a $3M \times N$ matrix, and the structure at a time instant $t$, that is, rows $3t-2$ to $3t$ of $S$, can be represented as follows:

$$S_t = \Theta_t A,$$

where $\Theta_t$ is the $3 \times 3K$ block of DCT basis values at frame $t$, $A$ is the $3K \times N$ matrix of trajectory coefficients, and $K$ is the size of the DCT basis. Stacking all frames gives $S = \Theta A$ with $\Theta \in \mathbb{R}^{3M \times 3K}$. If $K$ is chosen too small, the trajectory is poorly represented; if it is chosen too large, the system is ill-conditioned and the reconstruction error grows without bound, so choosing a suitable $K$ is very important.
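The layout of the measurement, rotation and structure matrices can be made concrete with a small synthetic example. This is only a sketch under stated assumptions: the camera is taken to be orthographic and to orbit about the vertical axis at 5° per frame, the structure is random, and the helper name `orthographic_camera` is hypothetical.

```python
import numpy as np

def orthographic_camera(angle_rad: float) -> np.ndarray:
    """First two rows of a rotation about the vertical (y) axis: a 2x3 matrix."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0]])

M, N = 6, 4                                # frames, points (toy sizes)
rng = np.random.default_rng(0)
S = rng.standard_normal((3 * M, N))        # rows 3t..3t+2 hold x, y, z of frame t

# Block-diagonal R (2M x 3M): one 2x3 camera per frame.
R = np.zeros((2 * M, 3 * M))
for t in range(M):
    R[2 * t:2 * t + 2, 3 * t:3 * t + 3] = orthographic_camera(np.deg2rad(5.0 * t))

W = R @ S                                  # measurement matrix, 2M x N
frame1_projection = orthographic_camera(np.deg2rad(5.0)) @ S[3:6]
```

Each pair of rows of `W` is simply the per-frame projection of the corresponding three rows of `S`, which is why the block-diagonal form of `R` is used.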
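Accordingly, the measurement matrix $W$ is decomposed as

$$W = R\Theta A = \Lambda A, \qquad \Lambda = R\Theta.$$

Factorizing $W$ with the Singular Value Decomposition (SVD) method gives

$$W = \hat{\Lambda}\hat{A}.$$

However, the matrices $\hat{\Lambda}$ and $\hat{A}$ will in general not be equal to $\Lambda$ and $A$ respectively, because this factorization is not unique: any non-singular $3K \times 3K$ matrix $Q$ yields another valid factorization, $W = (\hat{\Lambda}Q)(Q^{-1}\hat{A})$. The rectification matrix $Q$ is therefore constrained through the orthonormality of the camera rotations, imposed frame by frame on $\hat{\Lambda}_{2i-1:2i}$, which denotes the two rows of matrix $\hat{\Lambda}$ at positions $2i-1$ and $2i$. Due to the inherent ambiguity of the orthogonality constraint, Xiao et al. [18] found that the above method could not obtain a unique solution of the rectification matrix $Q$. However, Akhter et al. [14] showed that the inherent ambiguity does not necessarily lead to an ambiguous shape. Experimental results showed that the unique structure $S$ can be recovered by using the orthonormality constraints alone.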
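The non-uniqueness of the SVD-based factorization can be checked on random data. The sketch below assumes generic (random) factors rather than real camera and trajectory matrices, and the variable names are illustrative only; it shows that any invertible matrix inserted between the two factors leaves the product unchanged, and that some such matrix maps the SVD factor onto the true one.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, K = 20, 15, 2
r = 3 * K                                      # rank of the factorization

Lam = rng.standard_normal((2 * M, r))          # stand-in for Lambda = R * Theta
A = rng.standard_normal((r, N))                # stand-in for trajectory coefficients
W = Lam @ A

# Rank-r factorization of W from the SVD.
U, s, Vt = np.linalg.svd(W, full_matrices=False)
Lam_hat = U[:, :r] * np.sqrt(s[:r])
A_hat = np.sqrt(s[:r])[:, None] * Vt[:r]

# Any invertible Q gives another valid factorization of the same W.
Q = rng.standard_normal((r, r))
W_alt = (Lam_hat @ Q) @ (np.linalg.inv(Q) @ A_hat)

# The particular Q that maps Lam_hat onto the true Lam (the rectification).
Q_true = np.linalg.pinv(Lam_hat) @ Lam
```

In the actual method `Lam` is unknown, so the rectification has to be found from constraints on the rotations rather than from this direct least-squares solve.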
The rectification matrix $Q_k$ can be estimated precisely by using the trace minimization constraint on $Q_k Q_k^T$. Once the matrix $Q_k$ has been computed, the matrix $R$ can be estimated by using a nonlinear minimization routine. According to Equation (8), the structure matrix $S$ can then be calculated by the pseudo-inverse method: because $W = \Lambda A$ with $\Lambda = R\Theta$, the coefficient matrix is obtained as $A = \Lambda^{+} W$, and the structure matrix is calculated by the equation $S = \Theta A$. This $S$ is set as the initial iterate of the APG algorithm.
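Once the rotations are known, the pseudo-inverse step can be sketched as follows. This is an illustrative example on noise-free synthetic data, assuming an orthonormal DCT-II basis, an orthographic camera orbiting at 5° per frame, and hypothetical helper names (`dct_basis`, `trajectory_basis`, `camera_matrix`); it is not the authors' implementation.

```python
import numpy as np

def dct_basis(num_frames, num_basis):
    """Orthonormal DCT-II vectors as columns (assumed form)."""
    t = np.arange(num_frames)[:, None]
    k = np.arange(num_basis)[None, :]
    mu = np.where(k == 0, np.sqrt(1.0 / num_frames), np.sqrt(2.0 / num_frames))
    return mu * np.cos(np.pi * (2 * t + 1) * k / (2 * num_frames))

def trajectory_basis(num_frames, num_basis):
    """Theta (3M x 3K): frame t contributes the block kron(I_3, theta_t^T)."""
    B = dct_basis(num_frames, num_basis)
    return np.vstack([np.kron(np.eye(3), B[t:t + 1]) for t in range(num_frames)])

def camera_matrix(num_frames, degrees_per_frame=5.0):
    """Block-diagonal R (2M x 3M) for a camera orbiting the vertical axis."""
    R = np.zeros((2 * num_frames, 3 * num_frames))
    for t in range(num_frames):
        a = np.deg2rad(degrees_per_frame * t)
        R[2 * t:2 * t + 2, 3 * t:3 * t + 3] = [[np.cos(a), 0.0, np.sin(a)],
                                               [0.0, 1.0, 0.0]]
    return R

M, N, K = 40, 12, 3
rng = np.random.default_rng(0)
Theta = trajectory_basis(M, K)
A_true = rng.standard_normal((3 * K, N))   # synthetic trajectory coefficients
S_true = Theta @ A_true                    # structure lying exactly in the DCT span
R = camera_matrix(M)
W = R @ S_true                             # noise-free measurements

Lam = R @ Theta                            # Lambda = R * Theta
A_est = np.linalg.pinv(Lam) @ W            # A = Lambda^+ W
S_est = Theta @ A_est                      # S = Theta A
```

With noise-free data and a well-conditioned `Lam`, the structure is recovered exactly; increasing `K` makes `Lam` progressively worse conditioned, which matches the remark that too large a basis makes the system ill-conditioned.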

The Trace-Minimization Problem
The goal of this paper is to solve for the structure matrix $S$ through the equation $W = RS$, where the measurement matrix $W$ is known and the rotation matrix $R$ has been calculated. Because $S = \Theta A$, the rank of the matrix should meet the requirement of the low-order linear model:

$$\mathrm{rank}(S) \le \min\big(\mathrm{rank}(\Theta), \mathrm{rank}(A)\big) \le 3K.$$

The size of the DCT basis $K$ is a small constant, so the structure matrix $S$ is a low-rank matrix. The low-rank condition is then relaxed to a rank-minimization problem [19,20], so that the structure matrix $S$ is a solution to

$$\min_{S} \ \mathrm{rank}(S) \quad \text{s.t.} \quad W = RS. \tag{14}$$

According to Dai et al. [21], the rank function itself is not numerically stable, and rank minimization is an NP-hard problem in general. The rank minimization is therefore relaxed to a nuclear-norm minimization [22,23], that is, $\min \|S\|_*$. In principle, the nuclear-norm minimization may be solved by a standard SDP solver [15]. However, when the size of $S$ is large, the SDP technique does not work well.
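The rank bound and the nuclear norm used in the relaxation can be checked numerically. In this sketch a random matrix stands in for the DCT basis matrix (an assumption made only to keep the example short); the point is that a product of a $3M \times 3K$ and a $3K \times N$ matrix has rank at most $3K$, and that the nuclear norm is the sum of the singular values.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, K = 30, 25, 2

Theta = rng.standard_normal((3 * M, 3 * K))   # stand-in for the DCT basis matrix
A = rng.standard_normal((3 * K, N))           # stand-in for trajectory coefficients
S = Theta @ A                                 # 3M x N structure matrix

rank_S = np.linalg.matrix_rank(S)             # bounded by 3K

# Nuclear norm: sum of singular values, the convex surrogate for rank.
singular_values = np.linalg.svd(S, compute_uv=False)
nuclear_norm = singular_values.sum()
```

The rank counts the non-zero singular values, while the nuclear norm sums them, which is why minimizing the latter tends to drive small singular values to zero.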

The Application of the APG Algorithm
Many efficient convex optimization algorithms could be used to solve this problem. In this paper, an effective iterative algorithm, the APG algorithm [24,25], is used to optimize the non-rigid structure estimate from monocular vision. With this algorithm, a closed-form update for the following Equation (15) can be obtained. In Equation (8), $W$ is a measurement matrix of the signal $S$, obtained by using the calculated matrix $R$. The minimization in Equation (14) can be rewritten in Lagrangian form as follows:

$$\min_{S} \ \mu \|S\|_* + \frac{1}{2}\|RS - W\|_F^2, \tag{15}$$

where $\mu > 0$ is the regularization parameter. The APG algorithm stops once the distance between the successive iterates $S_K$ and $S_{K+1}$ falls below $tol$, where $K$ here denotes the iteration index and $tol$ is a moderately small positive number: when $S_K$ gets close to an optimal solution $S$, the distance between $S_K$ and $S_{K+1}$ should become very small. If $tol$ is too large, the non-rigid structure may not be calculated accurately; if $tol$ is too small, the running time will be too long, so a suitable $tol$ should be chosen.
The Lipschitz constant $L_f$ is simply the square of the operator norm of the linear map $S \mapsto RS$:

$$L_f = \|R\|_2^2.$$

The iterative formula follows from Equation (19), and the detailed steps of the APG algorithm are summarized in Algorithm 1.

Algorithm 1. The steps of the APG algorithm.
Step 8. Output $S_{K+1}$, namely the reconstructed structure matrix $S$.
In Algorithm 1, $G$ is factorized with the SVD method.
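Since only the final step of Algorithm 1 survives in the text, the following is a hedged sketch of a generic accelerated proximal gradient iteration for nuclear-norm regularized least squares, not a reconstruction of the authors' exact steps. It assumes the objective $\mu\|S\|_* + \frac{1}{2}\|RS - W\|_F^2$, a relative-change stopping test, and illustrative parameter values; all function names are hypothetical.

```python
import numpy as np

def singular_value_threshold(G, tau):
    """Shrink the singular values of G by tau (proximal map of tau * nuclear norm)."""
    U, s, Vt = np.linalg.svd(G, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def objective(S, W, R, mu):
    """mu * ||S||_* + 0.5 * ||R S - W||_F^2."""
    return mu * np.linalg.norm(S, "nuc") + 0.5 * np.linalg.norm(R @ S - W, "fro") ** 2

def apg(W, R, S0, mu=1e-2, tol=1e-6, max_iter=500):
    """Accelerated proximal gradient iteration started from S0 (assumed form)."""
    lipschitz = np.linalg.norm(R, 2) ** 2          # squared operator norm of R
    S_prev, S = S0.copy(), S0.copy()
    t_prev, t = 1.0, 1.0
    for _ in range(max_iter):
        Y = S + ((t_prev - 1.0) / t) * (S - S_prev)        # extrapolation point
        G = Y - (R.T @ (R @ Y - W)) / lipschitz            # gradient step on the data term
        S_next = singular_value_threshold(G, mu / lipschitz)  # SVD of G, shrink, rebuild
        change = np.linalg.norm(S_next - S) / max(1.0, np.linalg.norm(S_next))
        S_prev, S = S, S_next
        t_prev, t = t, (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        if change < tol:                                   # successive iterates are close
            break
    return S

# Synthetic demo: low-rank structure, orbiting orthographic camera, noisy start.
rng = np.random.default_rng(0)
M, N, rank = 30, 20, 6
R = np.zeros((2 * M, 3 * M))
for f in range(M):
    a = np.deg2rad(5.0 * f)
    R[2 * f:2 * f + 2, 3 * f:3 * f + 3] = [[np.cos(a), 0.0, np.sin(a)],
                                           [0.0, 1.0, 0.0]]
S_true = rng.standard_normal((3 * M, rank)) @ rng.standard_normal((rank, N))
W = R @ S_true
S0 = S_true + 0.1 * rng.standard_normal(S_true.shape)   # stand-in for a PTA estimate
S_apg = apg(W, R, S0)
```

In this sketch the noisy `S0` plays the role of the PTA initial value; in the paper's setting it would be the structure matrix computed by PTA, and `mu` and `tol` would need to be tuned for the data.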

The Yoga Sequence Experiment
The experimental dataset consists of a 307-frame sequence of a human practicing yoga, which comes from http://cvlab.lums.edu.pk/non-rigid structure estimation. The sequence is observed by a perspective camera orbiting the subject in a horizontal plane at a speed of 5° per frame. The reconstruction results of Akhter et al.'s PTA approach and of the APG approach are presented in Figures 1 and 2, respectively. In the figures, the blue dots are the ground truth 3D points, and the red circles show the recovered points.
As shown in Figures 1 and 2, the APG reconstruction is in general better than that of the PTA algorithm. In particular, the reconstruction precision of the APG method increases significantly when K = 9.
Figure 3 shows that the APG method yields smaller structure reconstruction errors than the PTA method for different values of K.
Figure 4 shows that the APG method also yields smaller rotation reconstruction errors than the PTA method for different values of K. The reconstruction results on the real yoga sequence therefore indicate that both the structure reconstruction and the rotation reconstruction are clearly improved by using the APG algorithm.

The Pickup Sequence Experiment
In addition, a second experimental dataset is used to test the proposed APG method for non-rigid structure estimation. The dataset consists of a 357-frame human pickup sequence, which comes from http://cvlab.lums.edu.pk/non-rigid structure estimation. The sequence is observed by a perspective camera orbiting the subject in a horizontal plane at a speed of 5° per frame. The proposed APG algorithm and the PTA algorithm are compared in terms of reconstruction result figures and reconstruction error curves for different values of K. In the following figures, the blue dots are the ground truth 3D points, and the red circles show the recovered points.
From Figures 5 and 6, the APG reconstruction is in general better than that of the PTA algorithm. When K = 3, the reconstruction precision of the APG method increases significantly. Figure 7 shows that the APG method yields smaller structure reconstruction errors than the PTA method for different values of K, and Figure 8 shows the same for the rotation reconstruction errors. Figures 7 and 8 together show that the proposed APG method runs reliably and effectively improves the accuracy of non-rigid structure estimation. The reconstruction results on the real pickup sequence indicate that the APG method outperforms the PTA method in terms of reconstruction accuracy.

The Comparison of APG Method and Block Matrix Method
The reconstruction results of the APG algorithm and the Block Matrix Method (the method of Dai et al. [20]) on the shark sequence are compared in the following figures. The shark sequence contains 240 frames and 91 features. The reconstruction errors of the APG algorithm and the Block Matrix Method are compared in Table 1, where the number in brackets is the best value of K, chosen by exhaustively trying different values between 2 and 13. In our paper, the best value of K is confirmed from the tracked positions of a sequence of non-rigid shapes by using the rank analysis method [26]. As can be seen in Figures 9 and 10 and Table 1, the proposed APG method outperforms the Block Matrix Method on the shark and drink sequences. On the other sequences, the APG method is not as good as the Block Matrix Method, but the difference is small. All structure errors reported in our paper refer to the mean error over frames, which we call the mean structure error. In its computational formula, $t = 1, 2, \dots, M$ indexes the frames, and $\sigma_{tx}$, $\sigma_{ty}$ and $\sigma_{tz}$ are respectively the standard deviations of the $x$, $y$ and $z$ coordinates of the points in the $t$-th frame of the 3D structure. $e_{tj}$ represents the reconstruction error of the $j$-th 3D point of the $t$-th frame. $\hat{S}$ is the structure matrix we reconstructed, while $S$ is the actual structure matrix.
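The description above can be turned into a small computation. The sketch below assumes the commonly used normalized form, in which the average per-point Euclidean error is divided by the average of the per-frame coordinate standard deviations; the exact formula is not recoverable from the text, so this form, the function name `mean_structure_error`, and the omission of any alignment step are all assumptions.

```python
import numpy as np

def mean_structure_error(S_est: np.ndarray, S_true: np.ndarray) -> float:
    """Normalized mean 3D error between two (3M, N) structure matrices.

    Rows 3t..3t+2 of each matrix hold the x, y, z coordinates of frame t.
    """
    num_frames = S_true.shape[0] // 3
    est = S_est.reshape(num_frames, 3, -1)
    true = S_true.reshape(num_frames, 3, -1)
    # e[t, j]: Euclidean error of point j in frame t
    e = np.linalg.norm(est - true, axis=1)
    # sigma: average over frames of (sigma_tx + sigma_ty + sigma_tz) / 3
    sigma = true.std(axis=2).sum(axis=1).mean() / 3.0
    return float(e.mean() / sigma)

rng = np.random.default_rng(0)
S_ref = rng.standard_normal((3 * 4, 5))                # 4 frames, 5 points
S_noisy = S_ref + 0.1 * rng.standard_normal(S_ref.shape)
err = mean_structure_error(S_noisy, S_ref)
err_scaled = mean_structure_error(2.0 * S_noisy, 2.0 * S_ref)
```

Dividing by the coordinate spread makes the measure independent of the overall scale of the sequence, so errors on different datasets can be compared in one table.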

The Comparison of the APG Method and Existing Methods
Moreover, the reconstruction errors of the APG algorithm and of the existing EM-PPCA [17], MP [27], PTA [6] and CSF [9] methods are compared in Figure 11 and Table 2. The abscissa of Figure 11 shows five sequences: drink, yoga, pickup, shark and dance. The ordinate indicates the structure errors of the five methods. The reconstruction results of the APG method are clearly better than those obtained with the other methods. From Figure 11 and Table 2, the results on most real image sequences indicate that the reconstruction accuracy is significantly improved by the APG method, which clearly reduces the structure errors. The APG method further optimizes the structures calculated by the PTA method and proves to be an effective approach for non-rigid structure estimation from monocular vision. However, for image sequences with dramatic movement, such as the shark sequence, the proposed approach cannot effectively improve the accuracy of the non-rigid structure estimation. In our algorithm, the PTA method is first used to calculate the structure matrix, and then the APG method is used to optimize it. The percentage of the total execution time of our algorithm that is taken by the APG algorithm is shown in Table 3. Table 3 shows that the APG method runs quickly and converges in little time.

Conclusions
In this paper, the APG algorithm is proposed to solve the trace-minimization problem of the structure matrix. The initial value of the APG algorithm is calculated by using the PTA method. The proposed APG method further improves the structure estimate and converges to the optimal solution. The experimental datasets above were used to test the proposed APG method for non-rigid structure estimation from monocular vision. The experimental results show that the proposed method improves the reconstruction quality of non-rigid structure estimation, with less reconstruction error than the PTA method. The APG algorithm also converges quickly, so the time consumption of the proposed method is close to that of the PTA method.
However, the APG method is not well suited to reconstructing sequences with dramatic movement, so in future work the method will be improved to handle such sequences. Moreover, the selection of the initial value plays an important role in the reconstruction efficiency of the proposed algorithm, and how to select the best initial value is another topic for future work. The structure matrix S can also be obtained as a solution to the rank minimization problem, so other optimization algorithms, such as the Singular Value Thresholding (SVT) algorithm, will be considered to further reduce the reconstruction error of non-rigid structure estimation from monocular vision.