Color Photometric Stereo Using Multi-Band Camera Constrained by Median Filter and Occluding Boundary

One of the main problems faced by the photometric stereo method is that several measurements are required, as this method needs illumination from light sources from different directions. A solution to this problem is the color photometric stereo method, which conducts one-shot measurements by simultaneously illuminating lights of different wavelengths. However, the classic color photometric stereo method only allows measurements of white objects, while a surface-normal estimation of a multicolored object using this method is theoretically impossible. Therefore, it is necessary to add some constraints to estimate the surface normal of a multicolored object using the framework of the color photometric stereo method. In this study, a median filter is employed as the constraint condition of albedo, and the surface normal of the occluding boundary is employed as the constraint condition of the surface normal. By employing a median filter as the constraint condition, the smooth distribution of the albedo and normal is calculated while the sharp features at the boundary of different albedos and normals are preserved. The surface normal at the occluding boundary is propagated into the inner part of the object region, and forms the abstract shape of the object. Such a surface normal gives a great clue to be used as an initial guess to the surface normal. To demonstrate the effectiveness of this study, a measurement device that can realize the multispectral photometric stereo method with seven colors is employed instead of the classic color photometric stereo method with three colors.


Introduction
To reproduce a detailed surface shape, normal information is necessary. To obtain this information, the photometric stereo method was proposed, which estimates the normal by transitioning the brightness levels of several pictures by changing the direction of the light source. However, as it requires multiple photoshoots, the photometric stereo method is not suitable for modeling a moving object. To measure the shape of a moving object, the color photometric stereo method, which employs several colored light sources, was developed. This method involves placing light sources of red, green, and blue colors in three different directions, which simultaneously illuminate the target object. This paper proposes a technique that employs some constraints so that it can be applied to colored objects, which is impossible for conventional color photometric stereo. Unlike the common color photometric stereo method, we use seven narrow-band lights with different peak wavelengths while observing the target object with a seven-band multispectral camera.

Related Work
The photometric stereo method [1,2] estimates the normal of the surface of an object by illuminating the object and analyzing the resulting shadings on the object's surface. In this method, light is illuminated on the object from one white parallel light source (an infinity point light source) to obtain a picture. Then, two more pictures are captured with different light source directions. In other words, it requires capturing three pictures with different light source directions. Therefore, it is impossible to measure a dynamic object. This problem can be resolved using the color photometric stereo method. In this method, lights are simultaneously illuminated from red, green, and blue light sources, and one picture photographed with an RGB color camera is captured. Such one-shot photograph enables the measurement of a dynamic object.
The color photometric stereo method [3][4][5] (also known as shape-from-color) was developed in the 1990s. Since then, various studies [4,[6][7][8][9][10][11][12][13][14][15][16][17][18][19][20][21][22] have been conducted in this regard. However, many problems are inherent in the color photometric stereo method. Many researchers in the past have struggled with this method, and even until recently it has been an ongoing problem. The principle problem of the color photometric stereo method is the fact that it can only be used with white objects. This is an inevitable problem as long as lights are illuminated from three colored light sources to estimate the surface normal.
Recently, various techniques have been proposed to apply the color photometric stereo method to multicolored objects. Roubtsova et al. [20] applied the color photometric stereo method to objects with arbitrary BRDF (bidirectional reflectance distribution function) by incorporating the Helmholtz Stereo method. However, the principle of this method does not allow for real-time measurement. Therefore, an optical flow is required to measure a dynamic object. Kim et al. [16] and Gotardo et al. [23] also tracked dynamic objects using optical flow, and estimated the surface shape of objects by utilizing several images taken at different times. Fyffe et al. [12] proposed a color photometric stereo method that employs six band cameras and three white color sources. All three light sources used in their method appear white to the human eye. However, all of them possess different spectral distributions. Furthermore, this method pre-measures the reflectance of various objects to prepare a database, and calculated four bases. Using this technique, it is possible to obtain an analytic solution, as there are four unknown numbers in relation to albedo (four base coefficients) and two in relation to the normal (because the three-dimensional vector is normalized), and six equations are obtainable. Anderson et al. [6] estimated the object color using the normal of multi-view stereo. However, owing to the low accuracy of the normal of multi-view stereo, they improved the estimation accuracy of object color based on the hypothesis that an object is composed of a limited number of colors. Their technique incorporates the framework of region segmentation, where the number of the regions is automatically determined based on the Bayesian information criterion. Chakrabarti et al. [8] calculated the candidates of object color by approximating the shape inside the patch of neighboring areas using a polynomial. They calculated the histogram of the object color candidates, chose only the limited number of colors that gained most votes, and evaluated the normal by postulating that the object is composed of these limited number of colors. Jiao et al. [15] divided a picture into super pixel regions and estimated the normal by postulating that the object color inside each region is uniform.
In this paper, the problem faced by the color photometric stereo method is solved using a different approach from those used in previous studies. Our proposed technique employs a median filter as the constraint condition of the albedo and surface normal. We also use the occluding boundary constraint for the surface normal. Thanks to this constraint, we have a good estimate from the initial state of surface normal, which results in robust estimation.
The techniques of Gotardo et al. [23], Kim et al. [16], and Roubtsova et al. [20] need to employ optical flow to measure a dynamic object, while the technique of Fyffe et al. [12] requires a reflectance database to be prepared prior to the measurement. Our proposed technique does not require a shape obtained from other sensors such as multi-view stereo or laser sensor, unlike the technique of Anderson et al. [6] Moreover, unlike the techniques of Chakrabarti et al. [8] and Jiao et al. [15] our proposed method does not require region segmentation. Previous color photometric stereo methods used three lights with red, green, and blue colors and observed the object with an RGB color camera. In our study, seven lights with different wavelengths are used to illuminate the object, which is then observed by a seven-band multispectral camera. This paper demonstrates that multi-spectral cameras and multi-spectral light sources are also effective for the color photometric stereo method.

Image Formulation
A photometric stereo method that employs independent colored light is called the color photometric stereo method. A characteristic of this method is that it enables the estimation of the surface normal with one photoshoot. The widespread color photometric stereo method is conducted with three types of colored lights. While the conventional photometric stereo method results in several grayscale images, the color photometric stereo method results in a multi-spectral image.
Although the fundamental theory is given in several sources in the literature [24,25], we briefly explain the formulation of the problem. The spectral sensitivity of a camera is denoted as Q c (λ), the spectral distribution of the light source is E(λ), and the spectral reflectance of the object is S(λ). Moreover, c denotes the channel. In this case, the brightness obtained from each channel of the camera can be attained from Equation (1).
Suppose that we use single light E(λ) whose spectral distribution can be represented as a delta function δ(·) whose peak wavelength is λ c .
where e c represents the brightness of the light. Suppose that the channel c is only sensitive to the wavelength λ c , and suppose that other channels cannot detect the wavelength λ c .
where q c represents the sensitivity at wavelength λ c . Suppose that we lit a single parallel light source (infinite-far point light source) whose spectral distribution is represented as delta function and its peak wavelength is λ c , the pixel brightness I c can be represented as follows using the formulation that the diffuse reflection is represented as S(λ c ) =s c max(n · l c , 0).
wheres c represents the reflectance. n is a normal vector and l c is the light source direction vector of channel c. Denoting as A c = q c e csc , Equation (4) becomes as follows.
Hereinafter, we call A c albedo. Note that the camera sensitivity and light source brightness are included in A c .
As shown in Figure 1, this study conducts a photoshoot of a multicolored object using seven channels ( Figure 2). Following Equation (5), the brightness is obtained from this photoshoot as follows.
The surface normal n is a 3D vector; however, the degree-of-freedom is two because it is constrained to be a unit vector (such constraint reduces one degree-of-freedom). Albedo A c is represented by seven parameters. There are seven equations, as shown in Equation (6), and nine unknown parameters (A 0 , A 1 , . . . , A 6 , n x , n y , n z , s.t., n 2 x + n 2 y + n 2 z = 1, namely seven for albedo and two for surface normal). Therefore, color photometric stereo, without any assumption or constraint, is an ill-posed problem.
The most commonly used assumption is to limit the color of the target objects to white (A 0 = A 1 = · · · = A 6 ). If we set s = A c n and if we ignore the shadow, the surface normal s (scaled with albedo) can be directly solved.
As is shown above, the color photometric stereo for white objects, or in other words, the conventional photometric stereo can directly solve the surface normal, without iterative optimization nor additional constraints such as smoothness constraints. However, this paper analyzes the methods with multi-colored objects. Therefore, we add smoothness constraints and iteratively solved the problem formulated as Equation (6).  The proposed technique estimates the surface normal through an iteration process. The cost function that is minimized through the iteration process is explained in Section 3.2. Each term of the cost function is explained in Sections 3.3-3.6. The initial value required in the iteration process is explained in Sections 3.6 and 3.7, and the update rule for each iteration is shown in Section 3.8. Detection of specular reflection is explained in Section 3.9. A method to integrate the surface normal to obtain the geometrical structure of the object surface is shown in Section 3.10, and Section 3.11 explains how to cancel the channel crosstalk.

Cost Function
The cost function Fdxdy is expressed through the following four terms: Equation (8) is minimized under the condition that surface normal n should be an unit vector, n = 1.
Here, A = (A 0 (x, y), A 1 (x, y), · · · , A 6 (x, y)) , L = (l 0 , l 1 , · · · , l 6 ) , and I = (I 0 (x, y), I 1 (x, y), · · · , I 6 (x, y)) T . K 2 is the Lagrange multiplier. The area where the target object is observed is denoted as P, and the occluding boundary is denoted as ∂P. The first three terms F 1 , F 2 , and F 3 are the soft constraints defined inside the object region P \∂P, and the fourth term F 4 is the hard constraint defined at the occluding boundary ∂P. Orthographic projection is assumed in this paper for camera model.
Following are the four terms of cost functions, where K 11 and K 12 are the Lagrange multipliers.
Sections 3.3-3.6 explain F 1 , F 2 , F 3 , and F 4 , respectively. F 1 expresses the residual of Lambertian reflectance and the input image brightness. I is the input image brightness, A is the albedo, l is the light source direction, and n is the surface normal. Here, c is the index that identifies the channel, and max(l n, 0) represents the shading. F 2 is the smoothing term of the surface normal, and indicates that the gradient of the surface normal should be small; F 3 is the smoothing term of albedo, and indicates that the gradient of albedo should be small; and F 4 is the constraint condition of the surface normal at the occluding boundary. The surface normal at the occluding boundary n b can be derived from differential geometry. F 4 indicates that the surface normal at the occluding boundary should be equal to n b . The reason why F 2 use both L1 norm and L2 norm is discussed in Section 3.4.
As we will explain in Sections 3.3-3.6, we do not minimize Equation (8) at once but minimize F 1 , F 2 , F 3 , and F 4 separately. Although we cannot mathematically prove that such piecewise minimization results in global minimum, it is empirically known that piecewise minimization make the cost function smaller through the iteration. Since Equation (8) is a non-linear equation with several number of constraints, convergence speed is low. On the other hand, our approach is robust, stable, and speedy since we can minimize the cost function with closed form solution as is shown in Sections 3.3 and 3.6 (F 1 and F 4 ) and minimizing it with straightforward filtering as is shown in Sections 3.4 and 3.5. Section 3.3 explains that F 1 solely cannot solve the problem. In order to solve the problem, we have to add F 2 or F 3 as it will be explained in Sections 3.4 and 3.5. The surface normal will be smooth if we add F 2 , and the albedo will be smooth if we add F 3 . If we add both F 2 and F 3 , the surface normal and the albedo becomes relatively sharper than adding either F 2 or F 3 . Since we want to suppress the surface normal and the albedo to be smooth, we add not only F 2 and F 3 but also F 4 .

Determining Surface Normal and Albedo
If we ignore the influence of the shadow, the first term F 1 shown in Equation (9) can be represented as Equation (13).
The solution obtained by minimizing Equation (13) is expressed as Equation (14).
When albedos A 0 , A 1 , · · · , A 6 are known, the surface normal n can be obtained by calculating the pseudo-inverse matrix L + of matrix L, as shown in Equation (15).
Here, ε 1 is a small positive constant introduced to prevent division-by-zero. As the surface normal n is expressed as a unit vector ( n = 1), it is normalized after calculating Equation (15). The unit vectorn of the surface normal n can be calculated by dividing its length n asn = n/ n . Shadow has a low brightness, and thus, thresholding the brightness results in detecting the shadow, as is shown in Section 3.9. As for the channel which is detected as a shadow using the procedure shown in Section 3.9, Equation (15) cannot be used for surface normal estimation. To avoid this, n is calculated by weighting the c'th row of L by a small value d in relation to channel c, which is a shadow. This situation is expressed as follows.
As usual, the surface normal n is normalized after calculating Equation (16). When the surface normal is known, albedo can be calculated as shown in Equation (17) derived from Equation (14).
To prevent division-by-zero, Equation (17) is calculated when l c n > ε 2 holds, where ε 2 is a small positive constant. In addition, if the pixel is detected as an outlier (Section 3.9), Equation (17) is also not calculated.
There are seven constraint condition equations in Equation (14). The input brightness I 0 , I 1 , · · · , I 6 and the unit vector that expresses the light source directions l 0 , l 1 , · · · , l 6 are known. Albedos A 0 , A 1 , · · · , A 6 and normal vectors n x , n y , n z are unknown parameters. Because the 3D normal vector is conditioned to be the unit vector, its degree-of-freedom is two. Therefore, the total number of unknown parameters is nine, with seven albedos and two surface normal components. At this point, calculations are not possible because the number of the unknown numbers is larger than the number of equations. Thus, the smoothing of the surface normal, smoothing of albedos, and constraint condition of the surface normal at the occluding boundary are introduced to the cost function.

Smoothness Constraint for Surface Normal
As explained in Section 3.3, surface normal and albedo cannot be calculated because there are too many unknowns. Therefore, the smoothing of the surface normal is conducted as a constraint condition. The second term F 2 of cost function F, which expresses the smoothing term of the normal, is expressed as Equation (10).
The discretization of the first term of Equation (10) results in Equation (18) and that of the second term results in Equation (19).
In our software, Equation (18) is implemented as a Gaussian filter, and Equation (19) is implemented as a median filter. Convolving Equation (18) multiple times can be approximated by a Gaussian filter. Therefore, instead of applying Equation (18) multiple times, we applied a Gaussian filter once. We first apply a median filter before the Gaussian filter. After the surface normal is smoothed, it is normalized to be a unit vector.
The fastest way to calculate Equation (19) is to calculate the median for each element as follows.
n y = median{n y (x + 1, y), n y (x − 1, y), n y (x, y + 1), n y (x, y − 1)} , After that the vector is normalized to be a unit vector. This procedure calculates the median in Euclidean distance, not in Riemannian distance (geodesic distance). However, this difference does not matter in practice since the surface is assumed to be smooth: namely, since the angle between neighboring pixels is small, the Euclidean distance of two vectors can be approximated as the Riemannian distance.
In order to keep the sharp feature of surface normal, median filter (Equation (19)) is used. The median filter will not change surface normal over neighboring pixels at sharp features. Although a median filter is preferable to keep the sharp features, we also use a Gaussian filter (Equation (18)) to constrain the surface normal to be smooth. The median filter does not change the surface normal at shape features, and such pixels may be stuck in local minima. The Gaussian filter (Equation (18)) can modify the surface normal even for such edges. We empirically found beneficial to use both a median filter and Gaussian filter since these filters can find a good balance between smooth normals and sharp features.
As shown in Equation (6), there are nine unknown parameters and seven equations. Although Equation (18) or Equation (19) comprises three equations, the surface normal should be constrained as a unit vector; thus, Equation (18) or Equation (19) has two degrees-of-freedom. Now, we have nine unknown parameters and nine equations per pixel. The problem is now well-posed, but an over-smoothed surface normal will be obtained if we solely use this constraint. We add another constraint F 3 , as shown in Section 3.5, in order to relatively reduce the influence of F 2 .

Smoothness Constraint for Albedo
As discussed in Section 3.4, smoothing of the surface normal alone is insufficient as a constraint condition. Therefore, albedo smoothing is also conducted. The third term F 3 in the cost function, which expresses the albedo smoothing, is shown in Equation (11). Equation (11) is discretized as Equation (21).
Namely, we applied median filter to the albedo. As shown in Equation (6), there are nine unknown parameters and seven equations. Equation (21) implies seven equations because there are seven channels. Now, we have 9 unknown parameters and 14 equations per pixel, which results in a well-posed problem. However, an over-smoothed albedo will be obtained if we solely use this constraint. We add another constraint F 2 as shown in Section 3.4 in order to relatively reduce the influence of F 3 .

Occluding Boundary Constraint and Initial Value of Surface Normal
The target objects of this study are smooth and closed surfaces. Here, the occluding boundary is the border region where the surface normal of the object begins to turn toward the rear just before it becomes invisible. The angle between the observation direction vector and the normal vector is 90 • since we assume orthographic projection for camera model. It means that it is possible to correctly estimate the surface normal at the occluding boundary, which is orthogonal to the object area contour. This is incorporated into the cost function as F 4 . The occluding boundary normal is defined as n b (Equation (12)). Now, the solution that minimizes F 4 is n(x, y) = n b (x, y). At the occluding boundary, n b is used as the estimation of the surface normal.
Although F 2 or F 3 are enough for solving Equation (8), also using F 4 is beneficial. The function F 2 itself has no boundary condition, and if we minimize F 2 only the surface normal will be extraordinarily smooth. In order to restrict the surface normal to be smooth, we will add F 4 as the boundary condition.
In addition, the pixel brightness close to the occluding boundary is unreliable, since it contains shadow in most of the channels. Since the reliability of the data term F 1 is small at the occluding boundary, adding F 4 is reasonable.
To conduct the iteration process using the cost function, initial values are required for the surface normal and albedo. As follows, the initial value of surface normal is calculated from the surface normal at the occluding boundary ( Figure 3). As is done in previous work [26], we also inflated the silhouette to make the approximate shape. Our approach is shown as follows. First, we calculate the distance from each pixel to the nearest occluding boundary pixel, and next, we sort the distance. As for initial guess, we assume that the probability distribution of the height of the target object is the same as that of the hemisphere. Let us denote the maximum of the distance as D max . The number of the pixels in object region is |P |. The order of the sorted pixel (x, y) is denoted as o. If we assume that the object is hemisphere whose radius is r, then r is calculated from |P | = πr 2 . The area o whose length from the center of the circle is denoted as l can be represented as o = πl 2 . Therefore, l can be calculated from o. The height h is represented as r 2 = h 2 + l 2 where the distance from the center of the circle is l. Therefore, h can be calculated. Height of the unit hemisphere is calculated by dividing r from h. Multiplying D max results in the height of the hemisphere where its maximum height is D max . After that, the height field is slightly smoothed.
The initial height ( Figure 3) is obtained by above procedure. Differentiating the height and normalizing it as follows results in the surface normaln.
n = n x , n y , n z Finally, the smoothed and normalized surface normal is used as the initial value.

Initial Value of Albedo
It is better to use an initial value of albedo which is close to the true albedo as much as possible, in order to speed up the convergence. However, since we do not use additional sensors or data, we have to calculate the initial albedo solely from input image. The input image is a single seven-channel image, whose light source direction is different. We calculate the average of seven channels, and this average image I avg works well for initial albedo.
I avg (x, y) = 1 7 (I 0 (x, y) + I 1 (x, y) + · · · + I 6 (x, y)) , This is the sole image we can obtain from seven input images closest to the true albedo. If an infinite number of light sources illuminate the object uniformly from the surroundings, the observation of the object becomes the same as that of the albedo with constant scaling. This is the reason why the average image can be a good estimate of albedo. As shown in Figure 4, the true albedo value and brightness of the average image I avg are similar; therefore, the average image can be used as the initial guess. In order to decrease the effect of the shadow, the bilateral filter is applied to the average image.  The albedo A is highly correlated with the input image brightness I. The initial albedo A c (x, y) is set to be an image where the brightness of the average image I avg is scaled to be the same as the brightness of each channel.
In order to robustly calculate the ratio I c /I avg , the median of the ratio is used.

Update Rule
After the initial values for the normal n and albedo A are calculated, as shown in Section 3.6 and Section 3.7, the calculations are iterated several times. First, the surface normal is calculated according to the procedure shown in Section 3.3. The calculated surface normal is denoted as n , and the surface normal of the previous step is denoted as n . Instead of using n , the new surface normal n for the next step is calculated as Equation (27).
The constant α n stabilizes the update of the surface normal, resulting in robust optimization. Actually, instead of Equation (27), we implemented our software as follows.
Here, the surface normal of the previous iteration is represented as (ñ x ,ñ y ,ñ z ) and the updated surface normal to be taken over to the next iteration is represented as (n x , n y , n z ). After solving this equation, the obtained surface normal is normalized.
Here, we have employed additional weight r. This weight depends on the number of valid channels for each pixel. If there are no shadows and speculars in all seven channels, we set r as a large number, so that the surface normal calculated by photometric stereo equation becomes much more important than the surface normal of the previous iteration (ñ x ,ñ y ,ñ z ). If there are many invalid channels, the surface normal calculated by photometric stereo equation becomes unreliable, thus we set r small so that surface normal will be unchanged. We define r as follows using the parameter w.
Here, v is the number of valid channels. We found empirically that w > 1 is good for stable computation.
Next, albedo is calculated according to the procedure shown in Section 3.3. The calculated albedo is denoted as A , and the albedo of the previous step is denoted as A . The update rule for albedo is shown in Equation (30).
The constant value α a stabilizes the optimization. Instead of using Equation (30), we implemented this process as follows.
This is a weighted sum of Equation (17) and the previously calculated albedoÃ c with the weight α a . Note that Equation (31) is calculated if channel c is marked as valid through the process shown in Section 3.9, and A c =Ã c is used if it is invalid.

Outlier Detection
Detecting specular reflection in color photometric stereo problems is difficult. One of the common approaches for detecting specular reflection is to use color. The colors of diffuse reflection and specular reflection are usually different; thus, the diffuse reflection and specular reflection can be separated when the scene is illuminated by a nearly white light source. However, the color photometric stereo illuminates the object with three different colors, and thus, the color-based approach cannot solve the problem. Another approach is to use principal component analysis or singular value decomposition, which represents the image with three orthonormal bases. However, the color of each light is different in color photometric stereo approach, and thus, the images cannot be represented by a linear sum of three bases. As a result, the remaining approach is to use the strong brightness change caused at specular reflection. Therefore, we have no choice but to use thresholding approach for outlier (specular / shadow) detection. Suppose that the maximum brightness of the object for all channels is I max and the minimum is I min . We use T max = I max − t max and T min = I min + t min as thresholds, where t max and t min are small positive constants. Outlier map N, which is 1 for outlier and 0 for valid pixel, is designed as follows.
Here, "dilation" is an operator which dilates the "1" pixels, which is a well-known operator in binary image processing, which we skip to explain.

Calculating Height from Surface Normal
In this section, we briefly introduce the procedure to calculate the height from surface normal. Here, we assume orthographic projection, and the perspective projection case is shown in the literature [27]. More details are given in the literature [27][28][29].
The shape is represented as the height H set for each pixel. The partial derivatives of the heights with respect to x and y are called gradient, and represented as p and q, respectively.
The surface normal n is represented by these gradients as shown below.
The cost function that relates the surface normal to the height is shown below. (H The Euler equation (Euler-Lagrange differential equation) that minimizes the equation can be expressed as As for H, the Euler equation that minimizes Equation (36) is derived as follows: Here, H xx and H yy can be discretized as follows: Thus, substituting Equations (40) and (41) into Equation (39) yields the following equation.
As is shown in Equation (35), the gradients p and q are calculated from the surface normal n. The partial differentiation of gradients used for Equation (42) is discretized as follows.
After computing Equation (43), we solve Equation (42) to determine the height H. In this paper, we solve Equation (42) using the successive over-relaxation method, but any other methods are also applicable, such as Fourier transform [30] or preconditioned conjugate gradient [31].

Channel Crosstalk
In an instrument that independently uses signals of two or more channels, signal leaking from one channel to another is called crosstalk. Our experiment uses a multi-band camera that has seven channels and detects undesired colors of other channels. The undesired effect of a color camera is called channel crosstalk [11,[32][33][34]. Figure 5 is an example of a three-band RGB camera that detects 550 nm green light as (R, G, B) = (63, 255, 63). This signal should be (R, G, B) = (0, 255, 0) since the observed green light wavelength is 550 nm. As shown in Figure 5, the bandwidth of each spectral sensitivity is wide, and thus, has some overlaps; therefore, the R and B channels also detect the color of green light. Color photometric stereo assumes that the sensor has no channel crosstalk, as shown in Figure 6; thus, we must remove channel crosstalk.  To detect the channel crosstalk, we use a diffuse white reflectance standard, which has flat spectral reflectance for each wavelength. The seven-band camera captures seven images of the diffuse white reflectance standard illuminated by one of the seven light sources, which are lit one-by-one. A single channel is sensitive to each light; thus, the signals of other channels are caused by the crosstalk.
Channel crosstalk can be represented by a color mixing matrix X. Since we use a seven-band camera, the size of matrix X is 7 × 7. Let us denote the ideal signal without channel crosstalk as d i . This seven-dimensional column vector d i becomes d o because it is affected by channel crosstalk. The relation between these signals and the color mixing matrix is as follows.
The original signal d i can be recovered from the captured signal d o as follows.
The color mixing matrix X should be obtained prior to the measurement, and the input image should be converted by the inverse of the color mixing matrix X −1 before applying the proposed algorithm.
Suppose that we look at the 0th channel of the diffuse white reflectance standard illuminated by the 0th light with narrow-band wavelength. Ideally, the signal should be zero for each channel, except the 0th channel. We define the value of the 0th channel as 1. Namely, the ideal signal d i = (1, 0, 0, 0, 0, 0, 0) T becomes d o = (w 0,0 , w 1,0 , · · · , w 6,0 ) T after observation.
Similarly, the diffuse white reflectance standard illuminated by the 1st light is expressed as follows.
w 0,1 w 1,1 w 2,1 · · · w 6,1 This procedure is repeated until the 6th light. The following equation expresses the whole measurement, which is conducted seven times. As a result, the color mixing matrix X is obtained as follows.
The inverse of the color mixing matrix X −1 can cancel the channel crosstalk of the observed signal. The output ideal signal d i is calibrated such that the signal of the diffuse white reflectance standard would be (1, 1, · · · , 1).

Experimental Setup
The camera used for this experiment is an FD-1665 3CCD multi-spectral camera by FluxData, Inc., USA, as shown in Figure 7. It comprises two color sensors and a near-infrared (NIR) sensor. Each sensor is sensitive to its respective wavelength, i.e., each color sensor can record the components from three channels, and the NIR sensor can record the components from one channel. Figure 8 shows the spectral sensitivity of the camera. As shown in Figure 8, channel crosstalk occurred among all camera channels. Therefore, the method shown in Section 3.11 is used to remove the channel crosstalk in the photographed input image. The diffuse white reflectance standard is used to obtain the color mixing matrix shown in Figure 9, where the row denotes the channel number and the column denotes the light number. The color mixing matrix is created using the average value of the diffuse white reflectance standard.    The light source directions were determined prior to the experiment by photographing a mirrored ball. The locations of the light sources and the camera were then left unchanged.
The experiment was conducted in a darkroom. To increase the amount of supplementary information obtained for objects with narrow-wavelength regions, light sources of close wavelength were positioned opposite to each other. The NIR light source was placed next to the camera. Figure 10 shows a diagram of the experiment. Each point on the object's surface must always be illuminated by more than three light sources for the photometric stereo method. If there are six light sources, any point on the surface can be illuminated by at least three light sources [35]. Additionally, when specular reflection occurs, one picture that can be used for the photometric stereo method is eliminated. Therefore, the NIR light source is placed next to the camera so that each point is illuminated by at least four light sources. Figure 11 is Gaussian sphere representation of the surface normal, where the number of each region represents the number of light sources illuminated. In the photometric stereo method, precision increases when the angle between the light sources is widened, i.e., the baseline is lengthened, because it increases the shading contrast. However, when the baseline is lengthened, the shadow area increases. The locations of the light sources must, therefore, be limited to a certain solid angle. When seven points are placed within a fixed circle, the placement of the points must be as far from each other as possible to comprise the vertices of a regular hexagon and its center, as shown in Figure 12. Therefore, when placing seven light sources within a limited area for the photometric stereo method, it is optimal to place them at the vertices of a regular hexagon and its center.
However, when three of the light sources selected from these seven lights are placed on the same straight line, or more precisely, when the three light source vectors are coplanar, the surface normal cannot be estimated by combining the three light sources. This is because combining these three light sources causes the light source matrix to degenerate. Suppose that the surface normal n is illuminated by light sources l 0 , l 1 , and l 2 , and is observed as the pixel brightnesses I 0 , I 1 , and I 2 , respectively, while ignoring the shadow. If the light source directions are known, the surface normal can be obtained from following equation if there is an inverse of 3 × 3 light source matrix (l 0 , l 1 , l 2 ) .
The determinant of (l 0 , l 1 , l 2 ) is the scalar triple product l 0 · (l 1 × l 2 ). If l 0 , l 1 , and l 2 are coplanar, the vector l 1 × l 2 becomes orthogonal to the vector l 0 , thus the determinant becomes zero. Although two-light photometric stereo exists [36], it is better to avoid three lights to be coplanar if we have more than two lights. Therefore, the NIR light source is placed at a small distance from the center of the regular hexagon so that no three light sources are on the same straight line. The camera is placed at the center of the regular hexagon.

Experimental Result
The computation time of the main part of the algorithm (i.e., excluding the computation time of calculating the initial value) is about ten seconds for ordinary object and ordinary computer with single thread and without any fine tuning to the source code.
As for all experimental results shown in this section, we usedα n = 0.1 andα a = 0.99. These two parameters are the most important parameters which affect the final result, and other parameters are relatively less influential in comparison to these parameters. We used 4 for the standard deviation of Gaussian filter for smoothing the surface normal, and 15 × 15 and 11 × 11 for the window size of median filter of surface normal and albedo, respectively. The iteration number was set to be 2. We used w = 16, d = 0.0001, 1 = 0.001, and 2 = 0.1. The abovementioned parameters are the all parameters used in the main process.
As for calculating the initial albedo, we used 2 for the standard deviation of spatial parameter, and 20 for the standard deviation of intensity parameter for the bilateral filter. When applying the bilateral filter, the pixel brightness of outlier is scaled by 0.1 when calculating the weighted sum. The iteration number of the bilateral filter is set to be 10. As for calculating the scale, in order to avoid division-by-zero error, I avg ≤ 0.1 is not used for calculating Equation (26). As for calculating the initial normal, smoothing filter is applied twice: First it is applied to the height data and next it is applied to the surface normal. As for smoothing, 3 × 3 box filter is used, and the iteration number was set to be 100, for both the height and the normal. As for outlier detection, t max = 15 and t min = 5 are used. The number of dilation is set to be 1. The abovementioned parameters are the all parameters used in calculating the initial values.
First, we measured a plastic sphere to evaluate our system. The spherical object shown in Figure 13 consists of two types of albedos. Figure 14 shows the error map with pseudo-color representation. The error is evaluated as an angle between the estimated surface normal and the true surface normal. We measured a sphere because its true surface normal can be obtained from the mathematical expression of the sphere. We compared our method with the so-called "naive color photometric stereo." In this paper, we define the color photometric stereo that assumes white objects as target as naive color photometric stereo. The generalized color photometric stereo problem shown in Equation (6) has nine unknown parameters; however, naive color photometric stereo has three unknown parameters: single albedo value (one parameter) and 3D surface normal (two parameters since it is constrained to be a unit vector). Therefore, naive color photometric stereo directly solves the linear equation even if the image is captured by a three-band color camera. Naive color photometric stereo robustly estimates the surface normal of white shirts, white dresses, and so on. The mean error of naive color photometric stereo (Figure 14a) were 0.343 [rad]. Our method overwhelms the previous approach, and our mean error (Figure 14b) was 0.148 [rad]. Figure 15 shows the seven-channel image of an owl figurine (Figure 16a). The captured image shown in Figure 15a is contaminated by channel crosstalk, and thus, we cancelled it, which resulted in Figure 15b. The surface normal estimated by naive color photometric stereo is shown in Figure 16b and that estimated by our method is shown in Figure 16c. As usual, the x, y, and z axes of the surface normal are linearly converted to R, G, and B for the pseudo-color representation of the surface normal. The estimated albedo is shown in Figure 17. The shapes obtained by naive color photometric stereo and by our method are shown in Figure 18a,b, respectively.   If you look carefully, you may notice that the channel crosstalk is removed. However, the difference is difficult to recognize since the crosstalk is small as is shown in Figure 9.         The results for buddha object, which causes strong specular reflection. Estimated geometry [Buddha]: (a) Naive color photometric stereo and (b) our method. The proposed method can smooth the surface normal of the scarf whose surface normal is unreliable due to black paint.
The advantage of color photometric stereo is that the surface normal of dynamic objects can be obtained. Most existing color photometric stereo methods measure white shirts, white dresses, etc., to verify that these methods can be applied to dynamically deforming objects. Due to the small size of the darkroom, we measured a glove instead of clothes. Figures 25-27 show the measurement results, and Figures 28-30 show the results of the same object but differently deformed.        Figure 31b. Kinect measures the depth and photometric stereo measures the surface normal. These two sensors are fundamentally different, however, since Kinect is a well-known commercial product of shape measurement, we think beneficial to show Figure 31 for the readers.    Figure 32c, since the smoothness constraint of surface normal is stronger. Figure 32a is smoother than Figure 32c, and Figure 32b is smoother than Figure 32d, since the albedo is not smooth, which means that the surface normal becomes relatively smooth. Although Figures 17,20,23,26,and 29 show over-smoothed result of albedo, it is an adequate way to smooth the albedo in order to obtain sharp features of surface normal.   As shown in Figure 14, our method is robust to multiple types of albedos. On the other hand, as shown in Figures 16-30, our method over-smoothens the detailed surface structure. The generalized color photometric stereo problem shown in Equation (6) has nine unknown parameters; however, naive color photometric stereo has three unknown parameters, as stated in Section 4.2. Naive color photometric stereo robustly estimates the surface normal of white shirts, white dresses, etc. For multiple albedos, we have to tackle the ill-posed problem shown in Equation (6). Before starting this project, we had planned to use other constraints such as a so-called "integrability constraint." However, we have chosen the smoothness constraint for constraining the problem since the integrability constraint solely cannot solve the problem. Surface normal n can be expressed as the gradients p and q (Equation (35)). Equation (6) can be rewritten as follows.
Namely, we have 9 unknowns (A 0 , . . . , A 6 , p, and q) and 7 equations per pixel. Smoothness constraint for p and q can be represented as follows.
Since there are additional two constraints per pixel which results in 9 equations per pixel, we can solve the problem. Integrability constraint can be represented as follows.
Since only one constraint is added per pixel, we cannot determine 9 parameters from 8 equations. This is the reason why we use smoothness constraint rather than integrability constraint.
The over-smoothing problem is an unavoidable effect in the current approach, which relies on Equation (6). Our future work is to drastically change our approach such that it does not depend on Equation (6). We have to fundamentally consider the basic theory in order to improve the performance of color photometric stereo.

Conclusions
In this study, surface normal estimation of multicolored objects was conducted by the multi-spectral color photometric stereo method using the median filter and occluding boundary. Note that the conventional color photometric stereo method is an ill-posed problem. Constraining the surface normal and albedo using the median filter sucessfully solved this problem. In addition, we used the approximate shape calculated from the occluding boundary as the initial guess for the surface normal. Finally, we assembled measurement hardware that illuminates the object with seven different spectra and captured the image by a seven-band multispectral camera.
As discussed in Section 4.3, our method faces several problems in terms of both hardware and software. These problems cannot be solved with a minor update, so we need a drastic change for further improvement. In the future, we will disassemble the current measurement hardware and create a more useful system. For example, in order to make the hardware robust to shadow, it is better to add more lights and observe the scene with a multispectral camera with more than 7 channels. The current method used one point light per channel, however, using area light is one choice for improvement in order to avoid the shadows. A polarization filter is also useful to remove the specular reflection. Additional future work is to reconsider the basic theory and fundamentally reorganize the approach of the algorithm. In order to apply the method to non-Lambertian BRDF, it is useful to measure the database of actual object with proposed system and train them using deep learning or other machine learnings. A database of spectral reflectance of various object decreases the number of unknowns which can make the problem well-posed. Using additional sensors such as RGB-D camera is also interesting.