Adaptive Color Calibration Based One-Shot Structured Light System

In one-shot color structured light systems, the colors of the stripe patterns are typically distorted by color crosstalk, ambient light and the albedo of the scanned objects, leading to mismatches in the correspondence of color stripes between the projected and captured images. In this paper, an adaptive color calibration and a Discrete Trend Transform algorithm are presented to achieve high-resolution 3D reconstructions. The adaptive color calibration, based on the relative albedo in the RGB channels, improves the accuracy of stripe labeling by alleviating the effect of albedo and ambient light while decoding the color. Furthermore, the Discrete Trend Transform in the M channel makes the color calibration an effective method for detecting weak stripes caused by uneven surfaces or the reflectance characteristics of the scanned objects. With this approach, the presented system is suitable for scanning moving objects and generating high-resolution 3D reconstructions without the need for a dark laboratory environment.


Introduction
3D shape acquisition has received considerable research interest in the last decade. There are several groups of techniques for 3D shape acquisition, such as stereo vision, Time Of Flight (TOF) and structured light [1]. Compared with TOF and stereo vision techniques, the main advantages of structured light techniques are the simple image processing involved and the high accuracy achieved in the 3D reconstruction [2]. Structured light techniques are based on the triangulation principle. First, designed patterns are projected onto the scanned objects, and a camera captures the scene. Second, the correspondence between the projected and captured patterns is established. Finally, the 3D coordinates of the objects are derived from the correspondence between the projected and detected patterns, combined with the geometric calibration parameters of the projector and camera.
In the development of structured light techniques, time-multiplexing techniques, also called multi-shot techniques, were the first to appear. These techniques can achieve a high level of accuracy and resolution by projecting a sequence of patterns to encode points. However, there is a trade-off between scanning speed and resolution. In recent years, fast time-multiplexing techniques have been developed that allow the shape of a moving object to be acquired. However, such techniques require specialized hardware. To achieve high resolution and high speed at the same time, the authors in [3,4] propose a one-shot structured light system for rapid range acquisition. For one-shot structured light systems, the final reconstruction result is significantly affected by three vital processes: the design of the projected patterns, the detection of coded cells in the captured image and the establishment of correspondence between the projected and captured coded cells. In recent years, as more research effort has been devoted to the field, these techniques have generated higher-quality 3D reconstruction results, making them useful and widespread in industry and cultural heritage.
The projected pattern has a strong influence on 3D shape acquisition. Clearly, the pattern itself should be easy to detect, and it should be possible to establish correspondence between the projected and captured coded cells. Different patterns can be found in [5,6], including color stripes [7], unique shapes [8], modulated patterns [9] and color-coded grids [10]. In [11], a 2D M-array pattern with a Hamming distance is suggested for robustness. The matrix is composed of many symbols, and this technique enhances the tolerance to failures in recovering symbols. In this pattern, each symbol contains several pixels; thus, the reconstruction resolution is limited. In [12], the authors propose a self-adaptive system for real-time range acquisition that uses pattern color, geometry, tracking and graph cut to solve the correspondence problem. A system using coded colors can obtain denser results than one using shapes or modulated patterns. However, certain factors, such as the reflectance characteristics of the scanned object, ambiguity due to uneven surfaces and ambient light, have a significant effect on color classification and the detection of coded cells. To avoid the influence of these factors, Zhang et al. [13] present a method using multi-pass dynamic programming and edge-based reconstruction. This method alleviates the influence of albedo, but edge-based reconstruction cannot locate the edge with accurate sub-pixel precision. Pages et al. [2] propose a peak-based coding strategy that can improve the resolution without loss of accuracy; however, the albedo of the illuminated object is modeled by a static matrix, which limits the application of the method in dynamic scenes. For robustness under ambient light, Benveniste et al. [14] develop a structured light system that decodes the color using a color invariant and optimizes the projected patterns by flexibly changing the stripe colors for differently colored objects.
To address the impact of these factors, in this paper we focus on a one-shot color structured light system and propose an adaptive color calibration and a Discrete Trend Transform (DTT) algorithm to obtain high-resolution 3D point clouds without the need for a dark laboratory environment. First, a relative albedo function between two color channels is proposed to calibrate color adaptively. With this method, color distorted by different albedos in the RGB channels is calibrated adaptively, and white ambient light in the scene can be canceled during color classification. Second, considering the different intensities caused by uneven surfaces or viewpoints, a novel M channel, DTT and a sub-pixel peak localization algorithm are proposed to segment and detect the stripes. These techniques significantly improve the accuracy of locating and labeling the stripes. In the literature, the goal of Fechteler's work is the most similar to ours; there, k-means clustering is used to classify the color adaptively [15].

System Setup and Framework
To realize high-resolution 3D reconstruction, a DLP projector with a WUXGA resolution (1,920 × 1,080) is used to project structured color light, and a Nikon D200 camera with a resolution of 5.2 megapixels (2,560 × 2,048) is used to capture the scene. The higher resolutions of the camera and projector allow us to achieve denser and more accurate point clouds. The relative direction angle between the projector and camera is 17°.
To acquire 3D shapes in a natural rather than dark environment, the following steps are performed:
1. Project stripe patterns onto the surface of the target objects and capture an image under ambient light.
2. Calibrate the geometric distortion of the camera and projector. We refer to the method proposed by Pages [2].
3. Adjust the color by static and adaptive color calibration. The color crosstalk is approximated in the static color calibration, and then the adaptive color calibration is applied to adjust the color with respect to the object's color and ambient light.
4. Extract the stripes and locate the peaks of the stripes. The proposed DTT algorithm is applied to segment the stripes. Furthermore, the sub-pixel accurate locations of the peaks are derived.
5. Classify the colors of the stripes and determine the correspondence between the projected and captured patterns by the dynamic programming algorithm [16] with a cost function based on the hue value.
6. Generate the 3D point clouds and reconstruct the surfaces of the objects through 3D Delaunay triangulation.

Projected Coding Patterns
We use a modified De Bruijn sequence as the arrangement of color stripes to design the projected patterns. Suppose that D(k, n) denotes a k-ary, n-order De Bruijn sequence; here, the element set is E = {0, 1, 2, 3, 4, 5}. Each element is assigned a color, with a spacing of 60° in hue value between colors, as listed in Table 1. We require that neighboring stripes differ in at least two of the (R, G, B) channels. With this constraint, only three nodes are allowed to follow any given node. For element 0, for example, the next node can only be 1, 2 or 3. In this manner, a De Bruijn sequence with a length of 162 is generated. Figure 1 shows a cut-out of the generated stripe patterns.

Figure 1. The cut-out of generated stripe patterns.
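The construction can be sketched as an Eulerian circuit over valid color words. This is only an illustrative sketch under stated assumptions: the six colors are taken in hue order (red, yellow, green, cyan, blue, magenta), which may not match the numbering in Table 1, and the order n = 4 is an assumption chosen because 6 · 3³ = 162 windows match the stated sequence length. The function names are hypothetical.

```python
from itertools import product

# Assumed hue-ordered colors, 60 degrees apart (numbering may differ from Table 1).
COLORS = [(1, 0, 0), (1, 1, 0), (0, 1, 0), (0, 1, 1), (0, 0, 1), (1, 0, 1)]


def allowed(a, b):
    """Neighboring stripes must differ in at least two of the R, G, B channels."""
    return sum(x != y for x, y in zip(COLORS[a], COLORS[b])) >= 2


def constrained_de_bruijn(order=4):
    """Return a cyclic sequence in which every valid window of length `order` occurs once."""
    k = len(COLORS)
    successors = {a: [b for b in range(k) if allowed(a, b)] for a in range(k)}
    # Graph nodes are valid words of length order - 1; each edge appends one symbol.
    nodes = [
        w for w in product(range(k), repeat=order - 1)
        if all(allowed(w[i], w[i + 1]) for i in range(order - 2))
    ]
    unused = {w: list(successors[w[-1]]) for w in nodes}
    # Hierholzer's algorithm for an Eulerian circuit.
    stack, circuit = [nodes[0]], []
    while stack:
        v = stack[-1]
        if unused[v]:
            stack.append(v[1:] + (unused[v].pop(),))
        else:
            circuit.append(stack.pop())
    circuit.reverse()
    # Each traversed edge contributes the last symbol of its target node.
    return [w[-1] for w in circuit[1:]]


sequence = constrained_de_bruijn(4)
print(len(sequence))
```

Under these assumptions each color has exactly three permitted successors, so the graph is balanced and the circuit visits every valid window of four stripes exactly once.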

Static and Adaptive Color Calibration
A robust color structured light system relies on two key processes: the recovery of the projected color from the color observed on the object's surfaces in the captured image, and the accurate establishment of correspondence between the projected and captured patterns. The color is distorted for the following reasons:
• The color crosstalk between the projector and camera.
• The ambient light in the environment.
• Differently colored objects have different albedos in the RGB channels.
• The stripes in the captured image typically vary in both amplitude and width due to the uneven surfaces or reflectance characteristics of the scanned objects.

Static Color Calibration
The color distortion caused by projector-camera color crosstalk is adjusted by a static color calibration, in which the parameters are fixed for a given projector-camera pair and measured in advance. The camera captures the reflected light through its RGB channel sensors as an image. The model of this process was formulated as Equation (1) by Caspi et al. [17].
In Equation (1), $I_c$ is the color observed through the camera, and $I_p$ denotes the corresponding color projected by the projector. $O$ represents the ambient light illumination, $X$ is the projector-camera color crosstalk matrix, and $A$ is the albedo matrix of the object surface. During the static calibration process, the color crosstalk matrix $X$ in Equation (1) is approximated by projecting solid color stripe patterns onto a white planar board.
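As a rough illustration of how the crosstalk matrix might be approximated on a white board, the sketch below assumes that the board's albedo is absorbed into $X$ and that the projector response is linear, so a captured color behaves like $X I_p + O$. Projecting black gives $O$, and projecting full-intensity red, green and blue gives the three columns of $X$. The four-capture protocol and the function names are assumptions made for illustration, not the procedure used in the paper.

```python
def estimate_crosstalk(black, red, green, blue):
    """Estimate a 3x3 crosstalk matrix from mean camera RGB triples on a white board.

    `black` is the capture with the projector dark (ambient only); the other
    three are captures of full-intensity red, green and blue projections.
    """
    # Subtracting the ambient capture leaves one column of X per projected primary.
    columns = [[c - o for c, o in zip(capture, black)] for capture in (red, green, blue)]
    # Transpose the list of columns into row-major matrix form.
    return [[columns[col][row] for col in range(3)] for row in range(3)]


def simulate_capture(x, ambient, projected):
    """Hypothetical forward model used only to exercise the estimator."""
    return [sum(x[r][c] * projected[c] for c in range(3)) + ambient[r] for r in range(3)]


X_TRUE = [[0.90, 0.10, 0.05], [0.15, 0.80, 0.10], [0.05, 0.20, 0.85]]
AMBIENT = [0.1, 0.1, 0.1]
x_estimated = estimate_crosstalk(
    simulate_capture(X_TRUE, AMBIENT, [0, 0, 0]),
    simulate_capture(X_TRUE, AMBIENT, [1, 0, 0]),
    simulate_capture(X_TRUE, AMBIENT, [0, 1, 0]),
    simulate_capture(X_TRUE, AMBIENT, [0, 0, 1]),
)
```

In practice each capture would be averaged over many pixels of the board to reduce sensor noise.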

Adaptive Color Calibration
To recover the stripe's color with respect to the different albedos in the RGB channels and the ambient light in the scene, the relative albedo is defined to calibrate the stripe color adaptively, as shown in Equation (2).
In Equation (2), $a_r$ is the red channel reflectance ratio, $\alpha$ is the green channel reflectance ratio relative to the red channel, and $\beta$ is the blue channel reflectance ratio relative to the red channel.
In the stripe segmentation and color classification processes, $I_p$ should be used as the input. Unfortunately, $I_p$ is distorted by the crosstalk effect and the albedo matrix. Given the camera output $I_c$, we can compute the calibrated color through color calibration, which can be expressed as Equation (3).
In Equation (3), $\tilde{A}$, composed of $\tilde{\alpha}$ and $\tilde{\beta}$, is the estimate of $A$. The calibrated color $I_a$, defined in Equation (4), is then the input for stripe segmentation and color classification.
The color crosstalk matrix $X$ can be obtained from the static color calibration. In the following section, we show that $\tilde{\alpha}$ and $\tilde{\beta}$ can be estimated from the relative albedo estimation. The influence of the ambient light $O$ and of $a_r$ can both be canceled in the color classification and stripe segmentation processes.

Relative Albedo Estimation
We estimate $\tilde{\alpha}$ by defining a relative albedo function between the red and green channels. Similarly, $\beta$ is estimated by comparing the red and blue channels. A histogram for each channel is produced. For example, the histogram of the red channel is expressed as Equation (5), where $H^r_i$ $(1 \le i \le n)$ is the histogram bin and $W^r_i$ $(1 \le i \le n)$ is the bin value. The superscripts r, g and b denote the red, green and blue channels, respectively. To match $H^r$, $H^g$ is transformed into $\tilde{H}^g$, and a flow matrix $f = \{f_{ij}\}$ $(1 \le i \le n, 1 \le j \le n)$ is defined to represent the transition process. The flow $f$ is chosen to minimize the overall cost function, which is expressed as Equation (6).
In Equation (6), $f_{ij}$ denotes the flow from $H^g_i$ to $H^r_j$, and $|j - i|$ denotes the flow distance, which implies that we encourage a flow from one bin in the histogram to another bin at a shorter distance. Equations (7)-(10) are the constraints of the process.
By adopting the Earth Mover's Distance (EMD) algorithm [18], which measures the least amount of work needed to match two histograms through linear programming, a flow matrix $f = \{f_{ij}\}$ from one histogram to another can be obtained. Each $f_{ij}$ is the number of pixels moved from bin $i$ in one histogram to bin $j$ in the other. Then, we propose a relative albedo function between the green and red channels as Equation (11) and obtain the estimated relative reflectance ratio $\tilde{\alpha}(i)$ for each grey level.
The relative reflectance ratio $\tilde{\beta}(i)$ can be estimated in a similar manner. After calibration, $I_a$ serves as the input for stripe segmentation and color classification.
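The flow computation can be illustrated with a small sketch. For one-dimensional histograms of equal total mass and a cost of $|j - i|$, the monotone (north-west corner) transport plan is known to be optimal, so no linear programming solver is needed here. The ratio function is a hypothetical stand-in and is not claimed to be Equation (11): it assumes that a green grey level $i$ relates to the flow-weighted mean red level it is matched to. All names are illustrative.

```python
def emd_flow_1d(source, target):
    """Monotone transport plan between two 1D histograms with equal total mass.

    Returns a dict mapping (i, j) to the number of pixels moved from
    source bin i to target bin j.
    """
    src, dst = list(source), list(target)
    flow, i, j = {}, 0, 0
    while i < len(src) and j < len(dst):
        moved = min(src[i], dst[j])
        if moved > 0:
            flow[(i, j)] = flow.get((i, j), 0) + moved
        src[i] -= moved
        dst[j] -= moved
        # Advance whichever bin has been exhausted.
        if src[i] == 0:
            i += 1
        else:
            j += 1
    return flow


def transport_cost(flow):
    """Total work: flow amount times bin distance |j - i|."""
    return sum(f * abs(j - i) for (i, j), f in flow.items())


def relative_ratio(flow, n_bins):
    """Assumed ratio per source level: i divided by the mean matched target level."""
    weighted = [0.0] * n_bins
    mass = [0.0] * n_bins
    for (i, j), f in flow.items():
        weighted[i] += f * j
        mass[i] += f
    return [
        i / (weighted[i] / mass[i]) if mass[i] > 0 and weighted[i] > 0 else None
        for i in range(n_bins)
    ]


# Toy example: the green channel is half as bright as the red channel.
green = [0] * 256
red = [0] * 256
green[50], green[100] = 10, 20
red[100], red[200] = 10, 20
flow = emd_flow_1d(green, red)
alpha = relative_ratio(flow, 256)
```

In this toy case every occupied green level is matched to a red level twice as large, so the assumed ratio is 0.5 at both levels.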

Stripe Segmentation and Peak Localization
After static and adaptive color calibration, the color of the stripe can be classified correctly, as described in Section 5. However, the pixel-wise ambient term $O$ and the red channel reflectance ratio $a_r$ still remain in $I_a$, making effective stripe segmentation and accurate peak localization challenging. Thus, we propose a novel M channel and the DTT algorithm to derive robust stripe segmentation results.

M Channel Definition
An M channel, which is a function of the RGB channels in $I_a$, is proposed in Equation (12) to suppress the ambient light $O$.
In Equation (13), $C^r_{ij}$ is the red channel value of pixel $(i, j)$ in $I_a$, and the superscripts r, g and b denote the red, green and blue channels, respectively.
Given the assumption that ambient light is mostly white light, i.e., $o^r \approx o^g \approx o^b$, the M channel can be simplified as Equation (14).
Note that there is at least one invalid color channel in the designed stripe patterns. Thus, $\min(p^r_{ij}, p^g_{ij}, p^b_{ij})$ is always zero. As a result, the M channel suppresses white ambient light. However, the red channel reflectance ratio $a_r$ still interferes.
Another advantage of the M channel is that it integrates the RGB color channels into one channel while maintaining their original characteristics, which facilitates further processing. Figure 2 shows the M channel of Figure 3.
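The suppression of white ambient light can be checked numerically. The sketch below assumes that the M channel is the difference between the largest and smallest calibrated channel values, which is consistent with the description above and with the method's name (Max-Min Weighted Average), and that a calibrated pixel behaves like $a_r p + o$ per channel with the same offset $o$ in all three channels. Both are assumptions made for illustration.

```python
def m_channel(pixel):
    """Assumed M channel: maximum minus minimum of the calibrated RGB values."""
    return max(pixel) - min(pixel)


def observe(projected, reflectance, white_offset):
    """Hypothetical calibrated pixel: scaled projected color plus white ambient light."""
    return tuple(reflectance * p + white_offset for p in projected)


red_dark_room = observe((1.0, 0.0, 0.0), reflectance=0.5, white_offset=0.0)
red_ambient = observe((1.0, 0.0, 0.0), reflectance=0.5, white_offset=0.25)
yellow_ambient = observe((1.0, 1.0, 0.0), reflectance=0.5, white_offset=0.25)

# The white offset cancels; only the reflectance ratio remains as a scale factor.
print(m_channel(red_dark_room), m_channel(red_ambient), m_channel(yellow_ambient))
```

Because every designed stripe color has at least one channel at zero, the minimum carries only the ambient offset, which is why subtracting it removes white ambient light while leaving the reflectance scale in place.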

Stripe Segmentation
The stripe pattern is vertical, and we process the M channel horizontally. A stripe in a captured image corresponds to a peak in the M channel. Instead of segmenting stripes by illuminance value, we detect stripes by locating the rising and falling intensity edges in the M channel. The rising-edge and falling-edge information is derived by applying DTT as in Equations (15) and (16), where $T_{ij}$ indicates the local trend of pixel $(i, j)$ in the M channel. The local trend measures the probability of an increasing or decreasing trend in a given window. The size of the window is controlled by $N$, which is typically assigned a value of less than half of the stripe width in the captured image.
In Equation (15), $|k - h| \le N$. Because $N$ is a small number, we assume that the reflectance ratio is locally continuous, i.e., $a^r_{ik} \approx a^r_{ih}$. Then, we obtain Equation (17), where $p_{ik} = (p^r_{ik}, p^g_{ik}, p^b_{ik})^T$ and $p_{ih} = (p^r_{ih}, p^g_{ih}, p^b_{ih})^T$. Thus, Equation (15) can be rewritten as Equation (18), in which $T_{ij}$ is only related to the projected pattern and is independent of the reflectance ratio and ambient light.
The DTT in Equation (18) works in the following manner:
• At the falling edge in the M channel, $T$ reaches its maximum, $N(N+1)/2$.
• At the rising edge in the M channel, $T$ reaches its minimum, $-N(N+1)/2$.
• Otherwise, the value of $T$ lies between the maximum and minimum.
For a signal $f(n)$, values of $T$ between the maximum and minimum denote that $f(n)$ is neither monotonically increasing nor decreasing. The transition from the maximum to minimum values in T indicates a peak in the M channel. Thus, the max-to-min transition in T is searched for to locate the area of stripes. The rising edge in the M channel is marked as the start of a stripe area, and the falling edge in the M channel is marked as the end of a stripe area. The result of peak detection using DTT is illustrated in Figure 5. It can be seen that DTT is more robust than the local adaptive threshold method [19] for weak stripes.
With the help of the M channel and DTT, there is no need to cut the foreground from the background. The criterion that DTT uses to detect color stripes is a strict rising edge and a strict falling edge in the M channel. The criterion is so strict that very few points in the background are taken as candidate stripes after DTT. The robustness is enhanced by excluding the influence of the background.
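A trend statistic with the stated extremes can be sketched as a sum of pairwise signs over a window of $N + 1$ samples, which has exactly $N(N+1)/2$ pairs. The precise window placement and the form of Equations (15) and (16) are assumptions here (a forward window starting at each pixel is used), and the stripe search simply pairs a strict rising edge with the next strict falling edge, following the start and end rule described above. Function names are hypothetical.

```python
def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def dtt(row, n):
    """Assumed local trend: sum of sign(M[k] - M[h]) over pairs k < h in a window.

    Strictly falling windows give +n(n+1)/2, strictly rising ones -n(n+1)/2.
    """
    trends = []
    for j in range(len(row) - n):
        window = row[j:j + n + 1]
        trends.append(sum(
            sign(window[k] - window[h])
            for k in range(n + 1) for h in range(k + 1, n + 1)
        ))
    return trends


def find_stripes(row, n):
    """Return (start, end) index pairs: a strict rise followed by a strict fall."""
    t_max = n * (n + 1) // 2
    stripes, start, end = [], None, None
    for j, value in enumerate(dtt(row, n)):
        if value == -t_max:
            if start is not None and end is not None:
                stripes.append((start, end))
                start = end = None
            if start is None:
                start = j
        elif value == t_max and start is not None:
            end = j + n  # last sample covered by the falling window
    if start is not None and end is not None:
        stripes.append((start, end))
    return stripes


# A strong stripe followed by a stripe ten times weaker.
row = [0, 1, 2, 3, 4, 3, 2, 1, 0, 0, 0.1, 0.2, 0.3, 0.4, 0.3, 0.2, 0.1, 0, 0]
print(find_stripes(row, 2))
```

Because only the signs of differences are used, scaling the row by a positive reflectance factor or adding a constant offset leaves the trend values unchanged, which is why the weak stripe is found as reliably as the strong one in this toy row.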

Peak Localization
Accurate sub-pixel peak localization is required to derive the 3D point cloud. Some existing sub-pixel peak localization algorithms are detailed and compared in [20]. To estimate the peak position accurately, the maximum value $M_{max}$ of the M channel is searched for in each stripe area, and its position is labeled as $I_{max}$. The estimated sub-pixel peak position $I_{estimated}$ is given by Equation (19), where $I_i$ is the horizontal offset, $M_i$ is the M channel's intensity value, and $\alpha$ is the related ratio, which defines the pixels around $I_{max}$ that are related to $M_{max}$. In the following experiments, $\alpha$ is set to 0.8.
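One way to read this description is as an intensity-weighted average of pixel positions, restricted to the contiguous pixels around $I_{max}$ whose M value is at least $\alpha M_{max}$. The sketch below implements that reading; the exact form of Equation (19) may differ, and the function name and arguments are illustrative assumptions.

```python
def subpixel_peak(m_values, start, end, ratio=0.8):
    """Weighted-average peak position inside the stripe area [start, end].

    Only contiguous pixels around the maximum with M >= ratio * M_max
    contribute. Assumes the stripe area contains a positive maximum.
    """
    segment = m_values[start:end + 1]
    m_max = max(segment)
    i_max = start + segment.index(m_max)
    lo = hi = i_max
    # Grow the support to the left and right while values stay above the threshold.
    while lo > start and m_values[lo - 1] >= ratio * m_max:
        lo -= 1
    while hi < end and m_values[hi + 1] >= ratio * m_max:
        hi += 1
    total = sum(m_values[lo:hi + 1])
    return sum(m_values[i] * i for i in range(lo, hi + 1)) / total


symmetric = [0, 1, 4, 9, 10, 9, 4, 1, 0]
flat_top = [0, 2, 10, 10, 2, 0]
print(subpixel_peak(symmetric, 0, 8), subpixel_peak(flat_top, 0, 5))
```

Restricting the average to pixels near the maximum keeps the low, noise-dominated tails of the stripe from pulling the estimate away from the peak.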

Color Classification
The color is classified in HSV space. Assuming that ambient light is mostly white light and taking the case $C_r > C_g > C_b$ as an example, we calculate the hue value from Equation (20).
In Equation (20), $(C_r, C_g, C_b)$ is the color intensity in $I_a$. In other cases, $h$ can be calculated by equations similar to Equation (20). Equation (20) demonstrates that $O$ and $a_r$ are canceled out, and the $h$ of the colors in $I_a$ is only related to the projected patterns. Thus, the classification process is independent of the reflectance characteristics of the scanned objects and of white ambient light. The correspondence between the detected and projected patterns is established by applying dynamic programming [16]. We compare each group of three neighboring detected stripes with three neighboring projected stripes. The sum of the hue value differences is defined as the score function. By considering neighboring conditions, we can accurately handle the edge conditions when occlusion occurs.
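The cancellation can be verified with the standard hue formula, which for $C_r > C_g > C_b$ reduces to $h = 60° \cdot (C_g - C_b)/(C_r - C_b)$; a common offset cancels in both differences and a common scale cancels in the ratio. This sketch assumes that Equation (20) takes this standard form. The class indices follow hue order (0 for red through 5 for magenta), which may not match the numbering in Table 1.

```python
def hue_degrees(color):
    """Standard hue in degrees, or None for a grey pixel with no defined hue."""
    r, g, b = color
    hi, lo = max(color), min(color)
    if hi == lo:
        return None
    span = hi - lo
    if hi == r:
        h = 60.0 * (g - b) / span
    elif hi == g:
        h = 60.0 * (b - r) / span + 120.0
    else:
        h = 60.0 * (r - g) / span + 240.0
    return h % 360.0


def classify(color):
    """Nearest of six hues spaced 60 degrees apart (assumed hue-ordered labels)."""
    h = hue_degrees(color)
    return None if h is None else round(h / 60.0) % 6


def observe(projected, reflectance, white_offset):
    """Hypothetical calibrated pixel: scaled projected color plus white ambient light."""
    return tuple(reflectance * p + white_offset for p in projected)


orange = (1.0, 0.5, 0.0)
print(hue_degrees(orange), hue_degrees(observe(orange, 0.5, 0.25)))
```

The same hue is obtained for the projected color and for its dimmed, offset observation, which is the property the classification relies on.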

Adaptive Color Calibration Performance
Adaptive color calibration is crucial for ensuring the accuracy of color classification. This method can adjust a stripe's color adaptively with respect to the different albedos in the RGB channels. Figure 6(a,c) illustrates that the gain of the R channel is stronger than those of the G and B channels. After adaptive color calibration, the gains of the three channels are almost the same, as shown in Figure 6(b,d). Some quantitative comparisons between the adaptive albedo calibration and non-adaptive calibration have been performed. The results in Figure 7 clearly show that adaptive albedo calibration increases the number of correctly labeled points (by more than 15%).

Figure 7. A comparison of color classification results.

Peak Localization Performance
In this subsection, we analyze the error of the estimated sub-pixel peak position. The performance of traditional methods is compared with that of our method, which is referred to as the Max-Min Weighted Average Method (MMWA). The traditional methods are as follows:
• Max Method (MAX) [2]. Pages et al. use the maximum intensity value to define the M channel and choose the location where the M channel reaches its maximum value as the estimated peak position: center = $x$, where $I_x = \max(I_i)$.
• Weighted Average Method (WA) [21]. This method calculates the average intensity value of the RGB channels and uses a weighted algorithm over the entire area of the stripe to derive the peak position.
• Midpoint Method (MID) [12]. This method simply uses the midpoint of the stripe as the feature point.
• Probability Method (PM) [15]. Fechteler et al. use a probability method to estimate the peak location. They detect the maximum in each RGB channel and assign each color peak a probability of being a valid stripe. The peak location is estimated from these probabilities, where $P_i$ is the probability of $C_i$ being a valid stripe.
To simulate the captured stripes, we generate a designed color stripe pattern. As shown in Equation (23), the intensity of a valid color channel follows a Gaussian distribution and is corrupted with noise. Each RGB channel is additionally polluted by an offset $o$ simulating ambient light. For example, a red stripe comprises a valid red channel generated by Equation (23), where $A$ denotes the amplitude of the color intensity, $n$ is the measured pixel, $\beta A$ is the noise amplitude and the random noise factor lies in (0, 1). We consider different noise levels: SNR = 25 dB, 20 dB and 18 dB. $c$ is the stripe peak position, $\sigma$ controls the stripe width and $o$ is the intensity offset. The RMS error of the peak localization $c$, defined in Equation (24), is measured for each method by analyzing 10,000 samples. The average RMS errors at the different noise levels are listed in Table 2, as $\sigma$ changes from 0.3 to 0.6.
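The simulation can be sketched as follows. The assumed signal model is a Gaussian of amplitude $A$ centered at $c$, plus uniform noise in (0, 1) scaled by $\beta A$, plus an offset $o$ on every channel; the exact form of Equation (23), the units of $\sigma$ and the parameter values used in the paper are not reproduced here, so every number below is illustrative. Only the MAX estimate and a max-minus-min weighted average in the spirit of MMWA are compared, and the RMS error is taken as the root of the mean squared deviation from the true peak position.

```python
import math
import random


def simulate_red_stripe(length, center, sigma, amplitude, noise_ratio, offset, rng):
    """Red stripe: Gaussian red channel, empty green and blue, all noisy and offset."""
    pixels = []
    for n in range(length):
        gauss = amplitude * math.exp(-((n - center) ** 2) / (2.0 * sigma ** 2))
        pixels.append(tuple(
            v + noise_ratio * amplitude * rng.random() + offset
            for v in (gauss, 0.0, 0.0)
        ))
    return pixels


def peak_max(pixels):
    """MAX-style estimate: position of the largest single-channel intensity."""
    values = [max(p) for p in pixels]
    return float(values.index(max(values)))


def peak_mmwa(pixels, ratio=0.8):
    """Weighted average of positions near the maximum of the max-minus-min channel."""
    m = [max(p) - min(p) for p in pixels]
    m_max = max(m)
    lo = hi = m.index(m_max)
    while lo > 0 and m[lo - 1] >= ratio * m_max:
        lo -= 1
    while hi < len(m) - 1 and m[hi + 1] >= ratio * m_max:
        hi += 1
    return sum(m[i] * i for i in range(lo, hi + 1)) / sum(m[lo:hi + 1])


def rms_error(estimates, truth):
    """Root mean square deviation of the estimates from the true peak position."""
    return math.sqrt(sum((e - truth) ** 2 for e in estimates) / len(estimates))


def compare(samples=200, noise_ratio=0.05, offset=0.2, seed=0):
    """Return the RMS error of each estimator over `samples` simulated stripes."""
    rng = random.Random(seed)
    center = 10.3
    estimates = {"MAX": [], "MMWA": []}
    for _ in range(samples):
        pixels = simulate_red_stripe(21, center, 2.0, 1.0, noise_ratio, offset, rng)
        estimates["MAX"].append(peak_max(pixels))
        estimates["MMWA"].append(peak_mmwa(pixels))
    return {name: rms_error(values, center) for name, values in estimates.items()}


print(compare())
```

Raising `noise_ratio` or `offset` in `compare` gives a quick way to explore how each estimator reacts to noise and ambient light under these assumptions.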

RMS Error
For all of the methods, the RMS error increases as the noise level changes from 25 dB to 18 dB, while MMWA yields the lowest RMS error under the same noise conditions. Figure 8 shows that WA and MID are sensitive to light offsets. Nevertheless, our method obtains a high level of accuracy, even in a strong ambient light environment, because the offset is suppressed when deriving the M channel. From Table 2 and Figure 8, we can conclude that our method outperforms the other methods in terms of RMS error with respect to noise, stripe width and ambient light.
Figure 3(a,c) shows the hand and face illuminated with stripe patterns in a natural environment. With the presented methods, the color of the stripes could be classified correctly, the weak stripes could be detected effectively and the surfaces of the objects could be reconstructed robustly, even under such strong ambient light, as depicted in Figure 3(b,d). Meanwhile, the reconstructed surfaces of the hand and face are rich in detail. Figure 3(b,d) contains 85,687 and 75,371 vertices, respectively. Note that there is a hole near the nose in Figure 3(d), due to occlusion in the captured image. Furthermore, Figure 3(e,f) demonstrates that the 3D shapes of multiple objects with different albedos in the same scene could be reconstructed simultaneously because the color was calibrated adaptively according to the gray level and the stripes were segmented by DTT.

Conclusions
We have presented a color structured light system for 3D shape acquisition that is robust to the reflectance characteristics of the scanned object, ambiguity due to uneven surfaces and white ambient light. Our contributions lie in two aspects of the proposed approach. First, we proposed a novel method for calibrating color adaptively according to the colored objects and white ambient light in the scene, as discussed in Sections 3.2 and 3.3, thus enhancing the robustness of the system and widening the range of potential applications. The second contribution lies in the effectiveness with which weak stripes caused by uneven surfaces of the scanned object can be found by using an M channel and a DTT algorithm. Furthermore, we proposed an algorithm to locate the sub-pixel peaks of stripes. Through experimental evaluations, we demonstrated that a structured light system employing the proposed techniques could obtain high-resolution 3D reconstructions without the need for a dark laboratory environment.