Single-Shot Dense Depth Sensing with Color Sequence Coded Fringe Pattern

A single-shot structured light method is widely used to acquire dense and accurate depth maps for dynamic scenes. In this paper, we propose a color sequence coded fringe depth sensing method. To overcome the phase unwrapping problem encountered in phase-based methods, the color-coded sequence information is embedded into the phase information. We adopt the color-encoded De Bruijn sequence to denote the period of the phase information and assign the sequence into two channels of the pattern, while the third channel is used to code the phase information. Benefiting from this coding strategy, the phase information distributed in multiple channels can improve the quality of the phase intensity by channel overlay, which results in precise phase estimation. Meanwhile, the wrapped phase period assists the sequence decoding to obtain a precise period order. To evaluate the performance of the proposed method, an experimental platform is established. Quantitative and qualitative experiments demonstrate that the proposed method generates a higher precision depth, as compared to a Kinect and larger resolution ToF (Time of Flight) camera.


Introduction
With the rapid development of computer vision and its increased use in industrial applications, depth sensing is witnessing increasing use in various fields such as biomedical testing [1,2], reverse engineering [3] and human-computer interaction. Among the numerous depth sensing methods, the structured light illumination (SLI) technique has attracted more attention owing to its advantages of fast speed, high accuracy, simplicity, and non-contact nature [4][5][6].
Based on the coding strategy, the SLI techniques [7] are generally categorized into two classes: the temporal encoding method and the spatial encoding method. The former method performs the encoding process by projecting multiple illumination patterns. Using time division multiplexing, this method can achieve a high accuracy depth map. However, it is not suitable for the dynamic scene. The representative temporal encoding methods are binary coding [8] and phase shifting [9]. The spatial method is based on the encoding of the neighborhood's features, such as the pixel values and colors. All the coded information is integrated within one pattern, which averts the synchronization of camera and projector that is critical in the temporal encoding method. Therefore, this method is suitable for the depth sensing of moving objects. The common patterns of this method mainly include the De Bruijn coding pattern [10,11], stripe pattern [12], random pattern [13], and the M-array pattern [14]. M-array is a square pseudorandom array [15]. The structured light patterns with stripes or spots created based on a unique code are also used in the spatial encoding method. However, this method cannot be used to obtain dense depth maps because of the sparse patterns. To increase the resolution of the depth map, fringe pattern profilometry (FPP) introduces the phase measurement technique into the structured light. Two typical phase extraction methods have been widely applied to obtain the wrapped phase map in a fringe pattern: phase-shifting profilometry (PSP) [9] and Fourier transform profilometry (FTP) [16].
Because it works using a pixel-by-pixel measurement, the PSP method is insensitive to the vast variation in reflectivity on the surface of objects and can acquire a high resolution and accuracy depth [17]. In the PSP method, multiple fringe stripes with the same wavelength are usually utilized [18]. The application of pattern sequences with different periods avoids the ambiguity, which results from the fringe projection in classical phase shifting. Combined with the Gray code and the phase shifting, this approach can measure discontinuous surfaces accurately [19]. Yu et al. [20] introduce unequal period fringes to avoid the period jump error from the traditional combination of the Gray code and phase shifting. However, at least three shifted grating images are needed, which limits the application in case of dynamic scenes. A real-time measurement system based on the phase-shifting method is described in [21], which helps acquire a three dimensional (3D) shape at 30 fps, with 266 K points per frame. Zhang et al. [22] propose 3D shape measurement at 667 Hz by using a digital-light-processing (DLP) technology to switch binary structured patterns. The ambiguity introduced by high pattern frequencies has been relieved by embedding a period cue into the projected pattern [23]. Although a phase-shifting method can achieve real-time measurement by improving the frame rate, synchronization between the projector and the camera is necessary.
The unique advantage of using FTP is that it only requires a one-shot image and no synchronism for dynamic scenes. A Fourier transform (FT) is usually used to obtain a wrapped phase of single fringe patterns on smooth objects. However, it is difficult for FT to acquire the correct phase information at the edges owing to the spectral leakage in the neighborhood of discontinuities or at the areas with a large surface slope [24]. Adopting windowed Fourier transform (WFT) or wavelet transform (WT) to calculate local phase information can reduce the leakage errors [25].
In single-shot methods, it is critical to obtain the absolute phase of each pixel in the modulated pattern because of the periodicity of the projected pattern. Guo and Huang [26] spatially unwrapped the phase from FTP by embedding a cross-shaped marker in the single fringe pattern. The position of the marker that is utilized to calculate the absolute phase map is detected and restored before the forward. Xiao et al. [27] and Budianto et al. [28] embedded special markers and marker strips into the sinusoidal grating. However, these marker-based approaches cannot obtain the absolute phase when no encoded marker falls on the object. Moreover, the performance is degraded in the unwrapped phase area covered by the markers. Without any additional marker, Li et al. [29] performed single-shot absolute phase recovery for the FTP method by using geometric constraints.
A major group of approaches defines color-coded multi-slit or stripe patterns with a special sequence and locates intensity peaks or edges, respectively, in order to obtain dense reconstructions. Pagès et al. [30] designed colored stripe patterns with a De Bruijn sequence in which both intensity peaks and edges can be located without loss of accuracy while reducing the number of hue levels included in the pattern. Wu [31] adopted binary stripes to identify the local fringe order, while the colorful grids provide an additional degree of freedom to identify the stripes. However, this encoding scheme fails when an isolated object of a pure color is located in a sequence period of a similar color.
This study proposes a single-shot sensing method with a color sequence coded fringe to acquire precise and dense depth. Firstly, in order to design a reasonable sequence for phase period distinction, a mathematical model is established to prove the suitability of the De Bruijn sequence. Secondly, two colors are used to code the De Bruijn sequence. Different from other color-coded patterns, the phase information of each point is located in two channels, which helps to obtain a more precise phase distribution. Thirdly, a Gabor filter is used to extract the wrapped phase from the intensity information. Benefiting from the De Bruijn sequence, the phase unwrapping is easily achieved by color decoding. Meanwhile, based on the wrapped phase period, erroneous sequence orders are checked and corrected against the neighboring phase periods on both sides to obtain a precise period order. Finally, stereo matching is accomplished to acquire the depth. Compared with the authors of [31], we use the De Bruijn sequence to code the fringe and prove its suitability, which can improve the robustness to color scenes and different materials in a complex scene. Experimental results show that the performance of the proposed method exceeds that of the Kinect and the ToF camera.
The rest of this paper is organized as follows. The mathematical model is given in Section 2. Section 3 provides a system overview. Color sequence coded fringe pattern generation is introduced in Section 4. The phase decoding and the stereo matching are depicted in Section 5. Experiments conducted to verify the proposed method are shown in Section 6. Section 7 provides a conclusion.

Mathematical Model for Sequence Encoding
For the fringe pattern depth sensing method, the critical issue is to distinguish the period order of the wrapped phase. Here, we use the color information to code the sequence of the phase order. Because a color pattern contains three channels, the intensity values of the blue channel vary as a cosine function of a certain frequency and the remaining two channels are used for sequence encoding. A sequence-coded fringe $Y_i$ is defined by the triple $(\alpha_i, \beta_i, x_i)$, where $\alpha_i$, $\beta_i$ and $x_i$ are the intensity values of the red, green, and blue channels respectively, and $i$ is the phase period order of the current fringe. The sequence is used to code the period order. To ensure that the sequence coding only contains two colors, $\alpha_i$ and $\beta_i$ are restricted to $\{0, x_i\}$. Because $x_i$ carries the phase information, which is set in advance and is not used for color coding, the distinguishable color is actually decided by the state of the pair $(\alpha_i, \beta_i)$. Among the possible states, $S_2$ and $S_3$ are eliminated because the state $S_2$ results in a gray fringe and the state $S_3$ leaves the phase information in the blue channel only.
In a sequence-coded fringe pattern, $Y_i$ can make full use of its neighboring fringes in the sequence to stand out from other fringes. Assuming that only the adjacent fringes $Y_{i-1}$ and $Y_{i+1}$ are combined with the current fringe, the subsequence is $P_i = (Y_{i-1}, Y_i, Y_{i+1})$. The cross-correlation $\mathrm{Cor}(P_i, P_j)$ is calculated between any two subsequences $P_i$ and $P_j$ in the proposed pattern. For every subsequence to be unique within the sequence, the cross-correlation $\mathrm{Cor}(P_i, P_j)$ needs to reach its minimum, which defines the mathematical model of the problem in Equation (5). Equation (5) can be simplified by quantization to acquire an explicit solution, Equation (6). Equation (6) means that the cross-correlation of any two subsequences $P_i$ and $P_j$ can achieve the minimum if $P_i = P_j$.
The De Bruijn sequence has this property and can therefore be used as the sequence coding strategy. In this study, two colors are used to encode the sequence. Considering the difficulty of sequence decoding, the length of the subsequence is set to three. Therefore, one cycle of the De Bruijn sequence contains $2^3 = 8$ elements. A longer subsequence also meets the requirement, provided that the subsequences form a De Bruijn sequence.
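As an illustration of the uniqueness property discussed above, the following minimal Python sketch generates a binary De Bruijn sequence with subsequence length three and checks that every cyclic window of three elements occurs exactly once. The function names `de_bruijn` and `cyclic_windows` are hypothetical helpers introduced here, and the recursive construction is a standard textbook one, not necessarily the procedure used by the authors.

```python
from itertools import product


def de_bruijn(k, n):
    """Return one cyclic De Bruijn sequence B(k, n) over the alphabet {0, ..., k-1}."""
    a = [0] * (k * n)
    sequence = []

    def db(t, p):
        if t > n:
            # Emit the current prefix only when its length divides n.
            if n % p == 0:
                sequence.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return sequence


def cyclic_windows(seq, n):
    """List all length-n windows of seq, wrapping around at the end."""
    m = len(seq)
    return [tuple(seq[(i + j) % m] for j in range(n)) for i in range(m)]


seq = de_bruijn(2, 3)
windows = cyclic_windows(seq, 3)
# Every one of the 2**3 possible triples should appear exactly once.
all_unique = set(windows) == set(product((0, 1), repeat=3)) and len(windows) == 8
print(seq, all_unique)
```

Because each triple of neighboring stripes is unique within a cycle, the position of a stripe inside the cycle can be recovered from its own code and the codes of its two neighbors.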

Overview of the System
The depth sensing system proposed in this study consists of a camera and a projector as shown in Figure 1. The dotted line and straight line mean the camera and the projector are mounted on the same horizontal plane and their optical axes are parallel. The matching points are on the same row owing to the epipolar constraint. The calibrations are accomplished in advance to obtain the intrinsic and extrinsic parameters of the camera and the projector respectively. The procedure followed for the proposed method is shown in Figure 2. First, the color sequence coded fringe pattern is projected on the target object and the camera captures the modulated image. Second, the intensity information and color information are extracted from the captured image. The intensity phase distribution of the captured image is calculated from the intensity information with a Gabor filter. Third, the phase unwrapping is decoded by the De Bruijn sequence in color information. The absolute phase is acquired by the phase distribution and the period. Finally, the depth is acquired by the correspondence determination of the camera and the projector by phase stereo matching.

Color Sequence Coded Fringe Pattern
Based on the mathematical analysis in Section 2, we designed a color sequence coded fringe pattern. In this pattern, the phase encoding is adopted based on the intensity information and the De Bruijn encoding is used for the color information. The pattern generation includes two steps: in the first step, the intensity of the fringe pattern in a period varies as a cosine function of a certain frequency, which will be used for phase distribution extraction; in the second step, the De Bruijn code demonstrated by the color information is embedded into the fringe pattern to eliminate the phase ambiguity. Meanwhile, the trip point of the wrapped phase assists in overcoming the measurement sensitivity caused by the De Bruijn coding. The detailed pattern generation is as follows.

Phase-Coding Based on the Intensity Information
In the proposed cosine fringe pattern, the stripe direction is perpendicular to the direction of cosine coding. The intensity information $I(x, y)$ is coded periodically in the horizontal direction. In the vertical direction, all the intensity values are the same. The intensity value $I(x, y)$ at the coordinate $(x, y)$ is defined as a cosine function with period $T$, where $(x, y)$ represents the row and column coordinates in the pattern, $A$ is a constant DC value, $B$ is the amplitude and $\phi_0$ is the initial phase of the cosine signal. In practice, we set the initial phase $\phi_0$ to $\pi/2$. In this case, the wrapped phase is consistent with the period of the De Bruijn code, which is convenient for De Bruijn decoding. The cosine fringe pattern is shown in Figure 4.
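The fringe generation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the explicit form $A + B\cos(2\pi u / T + \phi_0)$, with $u$ the horizontal pixel index, is a common way to write such a fringe and is assumed here rather than taken from the paper; the function name `cosine_fringe` and the default values of `dc` and `amplitude` are likewise hypothetical.

```python
import numpy as np


def cosine_fringe(height, width, period, dc=0.5, amplitude=0.5, phi0=np.pi / 2):
    """Build a fringe image that varies as a cosine horizontally and is constant vertically.

    Assumed form: I = dc + amplitude * cos(2*pi*u/period + phi0), u = horizontal pixel index.
    """
    u = np.arange(width)
    row = dc + amplitude * np.cos(2 * np.pi * u / period + phi0)
    # Repeat the same row so that all intensity values in a column are identical.
    return np.tile(row, (height, 1))


# Eight periods of 21 pixels each (21 pixels is the fringe period reported in the experiments).
fringe = cosine_fringe(height=4, width=168, period=21)
print(fringe.shape)
```

With `dc = amplitude = 0.5` the values stay inside the normalized range [0, 1], so the image can be scaled to the projector's bit depth afterwards.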

De Bruijn Coding Based on the Color Information
To distinguish between the period numbers of the fringe pattern, the De Bruijn sequence is used to generate the stripe pattern $C(x, y)$, which only contains two values. The intensity $I_{max}$ and the intensity $I_{min}$ are labeled as 0 and 1. The De Bruijn sequence for the alphabet {0, 1} and subsequences of length 3 is 00010111, as shown in Figure 3. In this stripe pattern, the width of a stripe equals the period of the fringe pattern $T$. Each cycle of the De Bruijn sequence consists of eight stripes. The sequence length can be set to values such as 4T, 8T, 16T, 32T, and so on. A larger sequence length benefits the correspondence search, but the decoding complexity rises significantly. We empirically choose 8T for a good balance between correspondence accuracy and computational complexity. For color coding, the red and green channels are adopted to represent stripes 1 and 0, respectively. Meanwhile, to ensure that the projected pattern only contains two colors, the nonzero values in these two channels must be the same as the value of the blue channel at the same position. Considering that the blue channel is used for phase coding, the composite color pattern assigns the fringe intensity to the blue channel and, depending on the code value, to either the red or the green channel. Here, $C(x, y)$ is the code value at the coordinate $(x, y)$; 1 corresponds to the red channel and 0 corresponds to the green channel. This procedure is shown in Figure 4. In this pattern, two colors are used to code the De Bruijn sequence. In effect, the color coding strategy attaches the color information to the intensity information. Unlike other color-coded patterns, the phase information of each point is located in two channels: the blue channel and either the red or the green channel. This helps to obtain a more precise phase distribution. Indeed, the red channel and the green channel together compose a new fringe pattern like that of the blue channel.
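To make the channel assignment concrete, here is a hedged Python sketch of a composite pattern in which the blue channel carries the cosine fringe and each stripe copies that fringe into the red channel (code 1) or the green channel (code 0). The function name `composite_pattern`, the normalized intensity range, and the explicit cosine form are assumptions made for illustration.

```python
import numpy as np

DE_BRUIJN = [0, 0, 0, 1, 0, 1, 1, 1]  # one cycle, subsequence length 3


def composite_pattern(height, width, period, dc=0.5, amplitude=0.5, phi0=np.pi / 2):
    """Return an (height, width, 3) RGB pattern: blue = fringe, red/green = fringe gated by the code."""
    u = np.arange(width)
    intensity = dc + amplitude * np.cos(2 * np.pi * u / period + phi0)  # assumed fringe form
    # One code value per stripe; the stripe width equals the fringe period.
    code = np.array(DE_BRUIJN)[(u // period) % len(DE_BRUIJN)]
    red = np.where(code == 1, intensity, 0.0)
    green = np.where(code == 0, intensity, 0.0)
    blue = intensity
    row = np.stack([red, green, blue], axis=-1)  # shape (width, 3)
    return np.tile(row, (height, 1, 1))


pattern = composite_pattern(height=2, width=168, period=21)
# Red and green never overlap, and their sum reproduces the blue fringe.
channels_agree = np.allclose(pattern[..., 0] + pattern[..., 1], pattern[..., 2])
print(pattern.shape, channels_agree)
```

The final check mirrors the remark in the text that the red and green channels together form a fringe like the blue one, which is what later allows the channels to be summed before phase extraction.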

Projector-Camera Stereo Matching
After the designed pattern is projected onto the objects, the camera acquires the captured image. We first need to extract the wrapped phase from the intensity information. Then, the phase unwrapping is conducted by decoding the De Bruijn color information. Finally, the stereo matching of the projector and the camera is accomplished by the correspondence determination based on the unwrapped phase. The depth is obtained by the triangulation principle.

Phase Estimation
Considering that the phase information is distributed over multiple channels by the pattern design strategy, we can improve the quality of the phase information by channel overlay. The intensity information $\hat{I}(x, y)$ is obtained from the captured image by summing the intensities of all channels, so that $\hat{I}(x, y)$ varies as a cosine in the horizontal direction.
In the proposed method, a Gabor filter is adopted to calculate the intensity phase distribution. The Gabor filter is a special case of the short-time FT with a local window function and specializes in extracting local spatial and frequency-domain information. A two-dimensional Gabor transform, a complex exponential function modulated by a Gaussian kernel, is usually used to extract the phase along a specific direction. The Gabor filter is applied to the sum of the intensities of all the channels because the phase information is distributed in all channels. Let $G(x, y)$ denote the response of $\hat{I}(x, y)$ after convolution with the two-dimensional Gabor filter. Then $G(x, y) = |R(x, y)|\, e^{j(\omega x + \phi(x, y))}$, where $|R(x, y)|$ is the amplitude of the Gabor filter response, and $\omega$ and $\phi(x, y)$ represent the frequency and phase at the coordinate $(x, y)$, respectively. The phase information $\phi(x, y)$ is calculated as $\phi(x, y) = \arctan\left(G_i(x, y) / G_r(x, y)\right)$, where $G_r(x, y)$ and $G_i(x, y)$ represent the real and imaginary components of $G(x, y)$, respectively. In Equation (11), $\phi(x, y)$ is a periodic wrapped phase and $\phi(x, y) \in (-\pi, \pi)$. To obtain the unwrapped phase, the period of the fringe should be calculated. The unwrapped phase is obtained by adding $2\pi n$ to the wrapped phase $\phi(x, y)$, where $n$ denotes the period number, which is determined by the De Bruijn coding information.
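The phase extraction step can be illustrated with a simplified Python sketch. It replaces the two-dimensional Gabor filter of the paper with a one-dimensional complex Gabor kernel applied row by row, which is an assumption made to keep the example short; the names `gabor_kernel_1d` and `wrapped_phase`, the choice of the Gaussian width, and the truncation at three standard deviations are also hypothetical. The function returns the wrapped argument of the complex response.

```python
import numpy as np


def gabor_kernel_1d(period, sigma=None):
    """Complex 1D Gabor kernel tuned to the fringe frequency 2*pi/period."""
    sigma = float(period) if sigma is None else float(sigma)  # assumed width: one fringe period
    half = int(3 * sigma)
    t = np.arange(-half, half + 1)
    omega = 2 * np.pi / period
    return np.exp(-t**2 / (2 * sigma**2)) * np.exp(1j * omega * t)


def wrapped_phase(intensity, period):
    """Filter each row with the Gabor kernel and return the wrapped phase in (-pi, pi]."""
    kernel = gabor_kernel_1d(period)
    # Rows must be longer than the kernel so that mode="same" keeps the image width.
    response = np.array([np.convolve(row, kernel, mode="same") for row in intensity])
    # arctan2 of imaginary and real parts resolves the quadrant of the phase.
    return np.arctan2(response.imag, response.real)


# Synthetic summed-channel intensity: 16 periods of 21 pixels.
x = np.arange(336)
image = np.tile(0.5 + 0.5 * np.cos(2 * np.pi * x / 21 + np.pi / 2), (3, 1))
phase = wrapped_phase(image, period=21)
print(phase.shape)
```

Away from the image borders, where the kernel fully overlaps the signal, the result follows the wrapped carrier phase of the fringe. Adding $2\pi n$ with the decoded period number $n$ then gives the unwrapped phase.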

Color Decoding
De Bruijn coding is based on the color information. For the image $\hat{I}(x, y)$ captured by the camera, the code value $\hat{C}(x, y)$ is obtained from the color components, where $\hat{I}_r(x, y)$ is the red channel intensity and $\hat{I}_g(x, y)$ is the green channel intensity. However, this direct color decoding method is sensitive to the color of the target surface. To obtain reliable De Bruijn code values, we adopt a voting mechanism to adjust the decoding result. The De Bruijn coding is distributed in the horizontal direction, and the code values within each stripe are the same. Thus, the correct code value is the one that receives the majority of the votes within a stripe. After the adjustment, the code values in each stripe are made uniform and the error caused by the local color is corrected.
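A rough Python sketch of such a voting scheme is given below. The per-pixel rule (code 1 where the red intensity exceeds the green intensity) is an assumption consistent with the channel assignment of the pattern, not the paper's exact decoding equation, and the function name `decode_stripe_codes` and the `boundaries` argument (stripe borders, for example taken from the jumps of the wrapped phase) are hypothetical.

```python
import numpy as np


def decode_stripe_codes(red, green, boundaries):
    """Decode one code value per stripe by majority vote.

    red, green : 2D arrays with the captured red and green channel intensities.
    boundaries : increasing column indices delimiting the stripes.
    """
    # Assumed per-pixel rule: code 1 where red dominates, code 0 otherwise.
    raw = (red > green).astype(int)
    codes = []
    for start, stop in zip(boundaries[:-1], boundaries[1:]):
        votes = raw[:, start:stop]
        # The stripe takes the value held by the majority of its pixels.
        codes.append(int(votes.mean() > 0.5))
    return raw, codes


# Three stripes with codes 1, 0, 1 and one pixel disturbed by the surface color.
red = np.zeros((4, 9))
green = np.zeros((4, 9))
red[:, 0:3] = 1.0
green[:, 3:6] = 1.0
red[:, 6:9] = 1.0
red[0, 4] = 2.0  # a reddish surface patch inside the green stripe
raw, codes = decode_stripe_codes(red, green, [0, 3, 6, 9])
print(codes)
```

In this toy example the disturbed pixel decodes to the wrong value on its own, but the stripe-level vote restores a uniform code for the whole stripe.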

Phase Unwrapping Based on De Bruijn Sequence
In the De Bruijn sequence pattern obtained after the color decoding, two adjacent stripes cannot be distinguished from each other when their code values are the same. By the encoding principle, the initial phase $\phi_0$ is set to $\pi/2$ to ensure that the wrapped phase period coincides with the period of the De Bruijn coding stripe. For phase unwrapping, the range of each De Bruijn stripe is therefore obtained from the width of the wrapped phase period at the same position. Let $W$ denote the order of the stripe within one cycle of the De Bruijn sequence, where $W$ is an integer from 1 to 8. Owing to the uniqueness property of the De Bruijn sequence, an erroneous code order can be checked against its neighbors. The period number $n$ is calculated from $W$ and $k$, where $k$ is the cycle number of the De Bruijn sequence.
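The lookup of the stripe order $W$ from neighboring code values can be sketched as follows. This is a minimal illustration under assumptions: the window is taken as (previous, current, next) stripe, the helper names are hypothetical, and the convention `n = 8 * k + (W - 1)` used in `period_number` is one plausible way to combine $W$ and $k$ rather than the paper's exact formula.

```python
DE_BRUIJN = (0, 0, 0, 1, 0, 1, 1, 1)

# Map each cyclic window (previous, current, next) to the order W (1..8) of its middle stripe.
WINDOW_TO_ORDER = {
    (DE_BRUIJN[(w - 1) % 8], DE_BRUIJN[w], DE_BRUIJN[(w + 1) % 8]): w + 1
    for w in range(8)
}


def stripe_orders(codes):
    """Return W for every interior stripe of a decoded code list (None at both ends)."""
    orders = [None] * len(codes)
    for i in range(1, len(codes) - 1):
        orders[i] = WINDOW_TO_ORDER[tuple(codes[i - 1:i + 2])]
    return orders


def orders_consistent(orders):
    """Neighboring stripes should have consecutive orders modulo 8."""
    known = [o for o in orders if o is not None]
    return all((b - a) % 8 == 1 for a, b in zip(known[:-1], known[1:]))


def period_number(order, cycle, cycle_length=8):
    """Assumed convention: periods are counted from 0 and `cycle` is the number of completed cycles."""
    return cycle * cycle_length + (order - 1)


codes = [0, 0, 0, 1, 0, 1, 1, 1, 0, 0]
orders = stripe_orders(codes)
print(orders, orders_consistent(orders))
```

A decoding error in one stripe changes the windows of that stripe and of its neighbors, so the consecutive-order check fails there; this is one simple way to flag the stripe for correction.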

Phase Based Stereo Matching
In the proposed method, a reference plane technique is adopted to acquire the depth of the scene. The reference plane is a captured pattern that is projected by the projector at a given depth. The stereo matching is conducted between the reference plane and the modulated image. The geometry of the reference plane and the object is shown in Figure 5, where $O_p$ and $O_c$ are the projector optical center and the camera optical center, respectively. The point $(i, j)$ in the projected pattern is a matching point of $(x, y_r)$ in the camera when there is no object in front of the reference plane. In practice, the point $(i, j)$ is a matching point of $(x, y)$, which is reflected from the point A on the object. Owing to the epipolar constraint and the relative position between the projector and the camera, the phase of point $(x, y)$ exhibits a shift to the left relative to point $(x, y_r)$. Considering the similar triangles in Figure 5, the depth can be calculated from the following quantities: $f$ is the focal length, $B$ is the baseline between the camera and the projector, $d_c$ is the distance between the current pixel and the left border in the camera image, $d_p$ is the distance between the matching point and the left border in the pattern, and $d = d_c - d_p$ is the disparity.
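For orientation, the quantities defined above can be combined with the standard triangulation relation for a rectified pair, depth = f · B / d. The sketch below assumes this relation, a focal length expressed in pixels, and a positive disparity; it is not claimed to be the exact equation of the paper, and the function name `depth_from_disparity` is hypothetical.

```python
def depth_from_disparity(focal_px, baseline_mm, d_c, d_p):
    """Depth in millimetres from the standard rectified-stereo relation Z = f * B / d.

    focal_px    : focal length in pixels (assumed unit).
    baseline_mm : baseline between camera and projector in millimetres.
    d_c, d_p    : distances (in pixels) of the camera pixel and of its matching
                  pattern point from the left border.
    """
    disparity = d_c - d_p
    if disparity <= 0:
        raise ValueError("disparity must be positive for a valid depth")
    return focal_px * baseline_mm / disparity


# Example with the 93 mm baseline of the experimental platform and made-up pixel values.
depth_mm = depth_from_disparity(focal_px=1000.0, baseline_mm=93.0, d_c=500.0, d_p=407.0)
print(depth_mm)
```

Since the disparity appears in the denominator, a fixed matching error produces a depth error that grows with distance, in line with the observation in the quantitative analysis that precision decreases with increasing distance.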

Simulation Experiments of the Proposed Method
In this section, the simulation experiments are conducted to demonstrate the procedure of the proposed method. We use the 3ds Max software to simulate the SLI system. The experiments of the real scenes are given in Section 6.
The whole procedure of the proposed method is shown in Figure 6. In Figure 6, (a) is the captured image; (b) illustrates the intensity information acquired by Equation (7); (c) shows the wrapped phase extracted from the intensity information by the Gabor filter; (d) is the De Bruijn stripe sequence from color decoding; (e) is the unwrapped phase; (f) and (g) are the final calculated depth and the 3D reconstruction of the proposed method. The proposed method acquires a dense and accurate depth map in the simulation experiments. Considering the ambient light in real scenes, experiments on planes of different colors are conducted to evaluate the accuracy achievable on colored objects with respect to a white object. The experiments are shown in Figure 7. The plane is placed at a distance of 1.0 m from the system. A plane is fitted as the reference plane to evaluate the mean of absolute errors. In Table 1, RGB denotes the values of the red, green, and blue channels. From Table 1, the errors of the red plane and the blue plane are slightly larger than those of the other planes. Nevertheless, the results are acceptable.

Experiment Results in Practice
To verify the feasibility of the proposed method in practice, a series of experiments for different scenarios have been conducted. The experimental platform is established as shown in Figure 8. The camera is a FL3-U3-13E4C-C image sensor (Point Grey Flea, Richmond, BC, Canada) with 1280 × 960 resolution. The projector is a DMD (Digital Micromirror Device) Light Commander instrument (Light Craft 4500 Component) (Texas Instruments, Dallas, TX, USA) with 1824 × 1140 resolution. The baseline distance between the camera and the projector is 93 mm. The optical axes of the camera and the projector are parallel. In our experiment, we reduce the influence of misalignment as far as possible. In the designed pattern, the period of the fringe is 21 pixels and the period of the De Bruijn sequence is 8 stripes. The experimental platform is aligned vertically in advance so that, according to the epipolar constraint, the epipolar lines are along the vertical direction. The projector-camera platform is calibrated by the plane-based calibration method [32]. This method is implemented as an extension of the Bouguet Camera calibration toolbox [33]. The intrinsic and extrinsic parameters are shown in Table 2. The point clouds of the recovered scenes are reconstructed with the MeshLab software [34]. Quantitative and qualitative experiments are employed to evaluate the performance of the proposed method.

Quantitative Analysis
Firstly, we calculate the root mean square error (RMSE) for a series of planes placed at different depths ranging from 0.9 to 1.4 m. A Kinect and a ToF camera SwissRanger 4000 (Mesa Imaging, Zürich, Switzerland) are used as the competitors. Each position of the plane is measured more than 10 times. The quantitative results of the comparative experiment are shown in Figure 9, where the measurement unit is mm. The tendency of the RMSE adheres to the rule that the measurement precision decreases with increasing distance. From this figure, it can be observed that the performance of our proposed method is better than that of the Kinect and the ToF camera. In addition, the measurement of a discontinuous surface is used as another metric to evaluate the performance of our method. In this scene, a cuboid next to a cube is placed at a different distance from the camera so that the junction of the two objects forms a discontinuous surface. We adjust the three systems as closely as possible to have the same depth to the object and adopt the relative errors as the metric in place of the absolute errors. The performance of our method is shown in Figure 10a,b. The results of the Kinect (Figure 10c,d) and the ToF camera (Figure 10e,f) are also used as benchmarks to evaluate the performance. Figure 10b,d,f are the cross-section plots for the same position in Figure 10a,c,e, respectively. The red dotted lines are the actual depth obtained by the Least Squares Fitting method at that position. Table 3 provides the mean of absolute errors for the three competitors. Our method generates smaller errors than the Kinect and the ToF camera, which validates the precision of the proposed method.
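The plane-based error metrics used in this analysis can be computed along the following lines. The sketch fits a plane $z = ax + by + c$ to measured points by ordinary least squares and reports the RMSE and the mean absolute error of the residuals along $z$; the function name `plane_fit_errors` and the choice of measuring residuals along the depth axis (rather than perpendicular to the plane) are assumptions made for illustration.

```python
import numpy as np


def plane_fit_errors(points):
    """Fit z = a*x + b*y + c to an (N, 3) array and return (coefficients, rmse, mae).

    Residuals are measured along the z (depth) axis, which is an assumed convention.
    """
    design = np.column_stack([points[:, 0], points[:, 1], np.ones(len(points))])
    coeffs, *_ = np.linalg.lstsq(design, points[:, 2], rcond=None)
    residuals = points[:, 2] - design @ coeffs
    rmse = float(np.sqrt(np.mean(residuals**2)))
    mae = float(np.mean(np.abs(residuals)))
    return coeffs, rmse, mae


# Synthetic check: points sampled exactly on a slightly tilted plane about 1000 mm away.
xs, ys = np.meshgrid(np.arange(10.0), np.arange(10.0))
zs = 0.1 * xs - 0.2 * ys + 1000.0
points = np.column_stack([xs.ravel(), ys.ravel(), zs.ravel()])
coeffs, rmse, mae = plane_fit_errors(points)
print(coeffs, rmse, mae)
```

For measured depth maps, `points` would hold the reconstructed 3D coordinates of the plane region, and the residual statistics would then play the role of the RMSE and mean absolute error values reported in the figures and tables.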

Qualitative Results
For visualization of the results obtained by the proposed method, especially the recovery of the edges of objects, some plaster geometries are placed at a distance of about 1 m from our platform. The actual scene and the acquired images are shown in Figure 11a,b. In Figure 11, the bottleneck of the vase is concave downward and the body of the vase is convex upward. The last two geometries contain smooth areas with sharp edges. The depth map and the point cloud of the proposed method are shown in Figure 11c,d, respectively. The results of the Kinect and the ToF camera are given in Figure 10e,f and Figure 11c,d. Benefiting from the accurate phase unwrapping procedure, our proposed method can not only recover the depth of smooth surfaces and clear edges but also acquire curved areas such as the surface of the vase. In the recovery by the Kinect and the ToF camera, the edges are blurred and the surfaces are coarse because of the low precision and resolution. In addition, some sculptures of human body parts are selected to demonstrate the feasibility in case of variations in the surface texture. The depth maps acquired by the proposed method, the Kinect and the ToF camera are shown in Figure 12b-d, respectively. From the depth maps, we can see that some details such as the recovery of ear and figure are lost and the profiles are blurred in the depth map obtained by the Kinect. Although the objects recovered by the ToF camera are clear, the resolution of the ToF camera is only 176 × 144. The ToF camera results are highly granular, which affects the 3D reconstruction significantly. In contrast to these blurred and granular results, the depth maps of our proposed method are clear, especially for the hair and mustache of the man sculpture. This experiment reflects the high accuracy of our method. In this experiment, the period of 21 pixels is kept the same in all of the measurements.
The fringe period in Figure 12a,b appears different from that in (c) because the objects are zoomed at different ratios for better presentation of the results.
Color and complex scenes present a challenge because the surface color of objects may lead to errors in the color decoding process. Moreover, the optical absorption varies with different materials, which makes the depth sensing sensitive to the material. To validate that the proposed method is robust to color scenes and different materials in a complex scene, we select two scenes with multiple objects and rich colors as shown in Figure 13. The surfaces of the bookrack, pot, and book are smooth and made of specular material, while the surfaces of the pear and the straw hat are diffuse and made of rough material. The results of the Kinect and the ToF camera are also shown in this figure. The pink and cyan colors in the first scene are similar to those in the proposed pattern. However, benefiting from our pattern design strategy, the details of the depth maps are clear and dense, which shows that the proposed method outperforms the Kinect and ToF cameras in both precision and resolution.

Conclusions
In this paper, a single-shot sensing method with a color sequence coded fringe is proposed to acquire precise and dense depth. Color-coded sequence information is embedded into the phase information to ease the phase unwrapping. On the one hand, the phase information of each point is located in multiple channels, which helps to obtain a more precise phase distribution. On the other hand, the wrapped phase period assists the sequence decoding to obtain a precise period order. We have established a theoretical model to prove the suitability of the De Bruijn sequence and constructed an experimental platform to verify the performance of the proposed method. The results show that our method demonstrates excellent performance in terms of precision, as well as resolution, as compared to off-the-shelf devices.