1. Introduction
Color constancy ensures that the perceived color of an object remains relatively stable under different illumination conditions; it is a characteristic of the human color perception system [1,2,3]. For example, whether a piece of white paper is in outdoor sunlight or in dim indoor candlelight, we can always restore its original white color in our minds. With the development of optics and material technology [4,5,6], digital cameras are becoming ever more widely used. In the digital world, color constancy plays a vital role in areas such as object recognition and tracking, scene analysis, and image-based localization. For example, in the field of autonomous driving, color constancy algorithms ensure that objects in a scene captured under different illumination conditions have the same appearance, thereby improving the robustness of target (pedestrian, vehicle, etc.) recognition and tracking.
Prevailing color constancy methods are mainly divided into two categories: learning-based methods and statistics-based methods. Learning-based color constancy methods in turn fall into two groups: (1) gamut mapping color constancy, and (2) methods that learn a color constancy model from training datasets. The gamut mapping [7,8] color constancy method is based on the property that only a limited number of colors can be observed in a natural image under a given illumination. The canonical gamut is the set of all RGB values observable under a canonical light source (typically a white light source); in RGB space, this canonical gamut was proved to be a convex hull [7]. This approach computes the transformations that map the recorded gamut into the canonical gamut, allowing the color of the light source to be determined. Barnard et al. [8] demonstrated that the gamut mapping method outperforms Grey-World, which assumes that the average reflectance of a natural scene is a constant value close to "grey". Finlayson improved the gamut mapping algorithm by limiting the transformations to chromaticity space, which means that only mappings corresponding to plausible illuminants are allowed [9]. This improved algorithm, called GCIE, can be regarded as a robust extension that removes the limitation of the diagonal model of illumination variation. The illumination estimation accuracy of gamut mapping color constancy methods is highly dependent on the underlying assumptions; once the assumptions do not match the actual application scenario, the color constancy performance degrades severely.
The learning-based color constancy method obtains an illumination estimation model through continuous iterative learning from a large amount of training data. Learning-based methods usually first extract intrinsic properties of natural images as features (e.g., edges, histograms, chromaticity of the brightest colors, semantic information, etc.), and then model the complex relationship between features and illumination. Color Cat [10] utilizes a linear regression relationship between the illumination and the histogram to achieve illumination estimation. Corrected Moments [11] likewise shows that color moments used as features provide satisfactory illumination estimation performance with least-squares training. Learning-based models [12,13], Bayesian color constancy [14,15], exemplar-based methods [16], biologically inspired models [17,18], high-level information-based methods [19,20], and physics-based models [21,22] are commonly used examples of learning-based color constancy. In recent years, with the rapid development of deep neural network technology, the performance of color constancy methods based on convolutional neural network models has been continuously improved [23]. However, their practical application is limited by the large number of parameters and redundant features.
Statistics-based methods pay greater attention to the correlation between illumination and surface reflectance. Buchsbaum [24] introduced the Grey-World hypothesis, which assumes that the average reflectance in a scene under a neutral light source is achromatic. A color constancy method based on this assumption can take the mean value of the three RGB channels and eliminate the influence of ambient light as far as possible. Grey-World works well when the color components of the image are relatively uniform; however, once the color distribution of the image is uneven, its performance drops sharply. White-Patch [25] assumes that a perfect reflection leads to a maximum response in the RGB channels, i.e., the maximum value of each RGB channel is taken as the value of white. However, White-Patch-based color constancy methods fail when the scene is flooded with large monochromatic regions. Grey-Edge is another popular color constancy method [26,27], which assumes that the average of the reflectance differences in a scene is achromatic. The low-level statistical methodologies listed above can all be merged into a single framework:
$$\left( \int \left| \frac{\partial^{n} f^{\sigma}(\mathbf{x})}{\partial \mathbf{x}^{n}} \right|^{p} d\mathbf{x} \right)^{1/p} = k\, e^{n,p,\sigma}, \qquad (1)$$

where the image $f$ was captured by the camera, $f^{\sigma} = f \otimes G^{\sigma}$, and $G^{\sigma}$ denotes a Gaussian filter with standard deviation $\sigma$. The constant $k$ functions as a scaling coefficient that varies according to the scene observed, and each component of the normalized illuminant estimate lies between 0 and 1. Color constancy based on this equation indicates that the $p$-th Minkowski norm of the $n$-th order derivative in a scene is achromatic; $e^{n,p,\sigma}$ is the estimated illuminant.
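As an illustration, Equation (1) can be instantiated in a few lines of NumPy. This is a minimal sketch of the classical framework, not the authors' implementation; the function name, the use of a mean instead of an integral (the difference is a constant factor removed by normalization), and the repeated-gradient-magnitude approximation of higher-order derivatives are our own choices:

```python
import numpy as np

def grey_edge_framework(img, order=0, p=1, sigma=0):
    """Sketch of the Equation (1) family of illuminant estimators.

    img:   HxWx3 float array in linear RGB.
    order: derivative order n (0 gives pixel-based methods).
    p:     Minkowski norm (np.inf with order=0 gives White-Patch).
    sigma: standard deviation of the Gaussian pre-filter.
    """
    est = np.zeros(3)
    for c in range(3):
        ch = img[..., c].astype(float)
        if sigma > 0:
            from scipy.ndimage import gaussian_filter  # only needed when smoothing
            ch = gaussian_filter(ch, sigma)
        # Approximate the n-th order derivative by repeated gradient magnitudes.
        for _ in range(order):
            gy, gx = np.gradient(ch)
            ch = np.sqrt(gx**2 + gy**2)
        ch = np.abs(ch)
        if np.isinf(p):
            est[c] = ch.max()
        else:
            est[c] = (ch**p).mean() ** (1.0 / p)
    return est / np.linalg.norm(est)  # unit-length illuminant estimate
```

With `order=0, p=1` this reduces to Grey-World, with `order=0, p=np.inf` to White-Patch, and with `order>=1` to Grey-Edge variants.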
In this paper, we propose a novel low-level statistics-based method as a further extension of the framework described in Equation (1). Inspired by locally normalized reflectance estimation [28] and the Grey-Edge hypothesis, to obtain the locally normalized reflectance differences we partition the reflectance-difference image into non-overlapping patches of equal size and divide each reflectance difference by the maximum value inside its local patch. Given a color-biased image, the proposed algorithm is first used to determine the light source estimate; then, color constancy is achieved by transforming the image according to this illumination estimate into an image as it would appear under a canonical light source.
The main contributions of the paper can be summarized as follows:
(1) The relationship between the sum of reflectance differences and the sum of locally normalized reflectance differences is exploited. After analyzing the statistics of three datasets containing different lighting conditions and scenes, we found that the ratio of the global sum of reflectance differences to the global sum of locally normalized reflectance differences is achromatic. Based on this finding, we propose a more accurate color constancy method for recovering the true color of the scene.
(2) We propose a new framework that incorporates statistics-based color constancy methods, and we show that Grey-World, White-Patch (maximum RGB), Grey-Edge, and Local-Surface [29] can all be incorporated into the proposed framework.
(3) The experiments demonstrate the feasibility and effectiveness of the proposed method when facing scenes with a single illuminant or multiple illuminants. In particular, the experimental results on the HDR test set show that the proposed color constancy method is superior to the comparison algorithms, indicating that it can restore the actual colors of different scenes more accurately. We also incorporate a clustering algorithm to improve the results under multiple illuminants.
The rest of the paper is organized as follows: Section 2 presents the proposed algorithm in detail, Section 3 tests the performance of the proposed algorithm on four commonly used datasets, and finally, Section 4 summarizes and discusses future research work.
2. Proposed Method
Assuming the scene is illuminated uniformly by a single light source $e(\lambda)$, such as outdoor lighting, the image $f = (f_R, f_G, f_B)^T$ captured by the camera pipeline model is represented in the following form:

$$f_c(\mathbf{x}) = \int_{\omega} e(\lambda)\, s(\mathbf{x}, \lambda)\, \rho_c(\lambda)\, d\lambda, \qquad (2)$$

where $c \in \{R, G, B\}$ indexes the color channels of the camera sensor, $\mathbf{x}$ is the spatial coordinate, and the wavelength of the light is represented as $\lambda$; $s(\mathbf{x}, \lambda)$ denotes the surface reflectance, $\omega$ is the visible spectrum, and $\rho_c(\lambda)$ is the camera sensitivity. Under the diagonal transform assumption [7], the observed light source color $e = (e_R, e_G, e_B)^T$ can be calculated as:

$$e_c = \int_{\omega} e(\lambda)\, \rho_c(\lambda)\, d\lambda. \qquad (3)$$
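Under the diagonal assumption, correcting an image once the illuminant has been estimated amounts to a per-channel (von Kries) scaling. A minimal sketch, with a hypothetical function name and a max-normalization chosen so the corrected image stays within the original intensity range:

```python
import numpy as np

def correct_image(img, e_est):
    """Apply the diagonal (von Kries) transform: divide each channel of the
    HxWx3 image by the estimated illuminant so the scene appears as if taken
    under a canonical white light."""
    e = np.asarray(e_est, dtype=float)
    e = e / e.max()   # normalize so the strongest channel is left unchanged
    return img / e    # broadcasting divides each RGB channel by its factor
```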
Figure 1 is a flowchart of the proposed color constancy approach, the details of which are described in the following sections.
2.1. Local Normalized Surface Reflectance Differences
This section explains the meaning of locally normalized surface reflectance differences. Following [26], we can calculate the difference image $f_d$ of the image $f$ as the magnitude of its spatial derivatives:

$$f_{d,c}(\mathbf{x}) = \left| \frac{\partial f^{\sigma}_{c}(\mathbf{x})}{\partial \mathbf{x}} \right|, \qquad c \in \{R, G, B\}. \qquad (4)$$
The entire region of the difference image $f_d$ is divided into $n$ equal-sized non-overlapping patches $P_1, \ldots, P_n$. Let

$$\mathbf{x}_{\max,i} = \arg\max_{\mathbf{x} \in P_i} f_{d,c}(\mathbf{x}), \qquad (5)$$

where $\mathbf{x}_{\max,i}$ denotes the spatial location of the pixel with the maximum intensity in the $i$-th local region, so that

$$\bar{f}_{d,c}(\mathbf{x}) = \frac{f_{d,c}(\mathbf{x})}{f_{d,c}(\mathbf{x}_{\max,i})}, \qquad \mathbf{x} \in P_i, \qquad (6)$$

is the edge intensity of the pixel at position $\mathbf{x}$, normalized by the maximum edge value in its local image patch. The reflectance differences in a scene can be represented as $s_{d,c}(\mathbf{x})$; the locally normalized reflectance differences $\bar{s}_{d,c}(\mathbf{x})$ can then be represented as:

$$\bar{s}_{d,c}(\mathbf{x}) = \frac{s_{d,c}(\mathbf{x})}{s_{d,c}(\mathbf{x}_{\max,i})} = \bar{f}_{d,c}(\mathbf{x}), \qquad (7)$$

where the second equality holds because the diagonal illuminant factor $e_c$ cancels in the ratio, so the locally normalized reflectance differences can be computed directly from the color-biased image.
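The patch-wise normalization described above can be sketched as follows. The helper name and the zero-patch guard are our own choices, and the image sides are assumed to be divisible by the patch grid; note that scaling the input by a per-channel illuminant factor leaves the output unchanged:

```python
import numpy as np

def local_normalize(diff, patches_per_side):
    """Divide each value of a 2-D difference image by the maximum value
    inside its non-overlapping local patch. The diagonal illuminant factor
    cancels in this ratio, making the result illuminant-invariant."""
    h, w = diff.shape
    ph, pw = h // patches_per_side, w // patches_per_side
    out = np.zeros_like(diff, dtype=float)
    for i in range(patches_per_side):
        for j in range(patches_per_side):
            patch = diff[i*ph:(i+1)*ph, j*pw:(j+1)*pw]
            m = patch.max()
            # Guard against all-zero patches to avoid division by zero.
            out[i*ph:(i+1)*ph, j*pw:(j+1)*pw] = patch / m if m > 0 else 0.0
    return out
```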
2.2. Hypothesis Validation
In this section, the validation process of our hypothesis is presented: it is shown that the illuminant estimate can be accurately calculated by dividing the sum of edges over the whole image by the sum of the locally normalized reflectance differences, where $\Omega$ denotes the overall image area, $n$ represents the total number of local regions inside the image, and $P_i$ indicates the $i$-th local area.
Let $K_c$ represent the ratio of the sum of edges $E_c$ to the sum of locally normalized reflectance differences $S_c$:

$$K_c = \frac{E_c}{S_c}, \qquad (8)$$

where

$$E_c = \sum_{\mathbf{x} \in \Omega} f_{d,c}(\mathbf{x}), \qquad S_c = \sum_{i=1}^{n} \sum_{\mathbf{x} \in P_i} \bar{s}_{d,c}(\mathbf{x}). \qquad (9)$$
In order to exploit the relationship between $E_c$ and $S_c$, we used the reprocessed Gehler-Shi dataset [30] containing 568 images, the NUS dataset [31] of 1737 images taken with eight cameras for color checking, and the SFU dataset [13] containing 105 high dynamic range images of indoor and outdoor areas. The above-mentioned datasets provide the ground-truth illumination of each raw image, so that the color-unbiased version of each color-biased image can be obtained; we can then compute the values of $\sum_{\mathbf{x}} s_{d,c}(\mathbf{x})$ and $\sum_{i} \sum_{\mathbf{x} \in P_i} \bar{s}_{d,c}(\mathbf{x})$, respectively.
As shown in Figure 2, columns (a), (b) and (c) show these statistics on the three datasets NUS, Gehler-Shi and SFU, respectively. The scatter points in each sub-graph represent the pair of sums for one of the three RGB channels of a picture. For example, the three plots in the first column of Figure 2 depict the 1737 ratios of $\sum_{\mathbf{x}} s_{d,c}(\mathbf{x})$ and $\sum_{i} \sum_{\mathbf{x} \in P_i} \bar{s}_{d,c}(\mathbf{x})$ in each of the three color channels in the NUS dataset, where each discrete point corresponds to a picture in the NUS dataset and $c \in \{R, G, B\}$; in detail, the point for the red channel plots the ratio of the two sums in the red color channel of the image. As we can see from Figure 2, it is obvious that most of the scattered points are distributed along the diagonal line, i.e., the ratio is approximately the same constant across images and channels. Therefore, we can get the formula:

$$\frac{\sum_{\mathbf{x} \in \Omega} s_{d,c}(\mathbf{x})}{\sum_{i=1}^{n} \sum_{\mathbf{x} \in P_i} \bar{s}_{d,c}(\mathbf{x})} = k, \qquad (10)$$

where $k$ is a constant that is the same for all three channels. For uniform illumination, since $f_{d,c}(\mathbf{x}) = e_c\, s_{d,c}(\mathbf{x})$ under the diagonal model, the light source color can be computed by:

$$e_c = \frac{1}{k} \cdot \frac{E_c}{S_c} = \frac{K_c}{k}. \qquad (11)$$
Based on the above inference, given a color-biased image as input, $K_c$ can be determined by Equation (8), and $k$ functions as a scaling coefficient that varies according to the scene observed. Given that $k$ is the same for all color channels $c \in \{R, G, B\}$, based on Equations (11) and (3), we do not have to find the actual value of $k$, because it can be eliminated by using the normalized form of $(K_R, K_G, K_B)$ as the final result of the illumination estimation.
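Putting the pieces together, the estimation step can be sketched as below. This reflects our reading of the ratio $K_c = E_c / S_c$ and its normalization; the function name, the first-order differences, and the patch-grid layout are illustrative assumptions, not the authors' released code:

```python
import numpy as np

def estimate_illuminant(img, patches_per_side=4):
    """Per channel, K_c is the global sum of first-order differences divided
    by the global sum of their locally normalized version; the unit-length
    K vector is the illuminant estimate, so the constant k drops out."""
    h, w, _ = img.shape
    ph, pw = h // patches_per_side, w // patches_per_side
    K = np.zeros(3)
    for c in range(3):
        gy, gx = np.gradient(img[..., c].astype(float))
        diff = np.sqrt(gx**2 + gy**2)              # difference image f_d
        nd = np.zeros_like(diff)                   # locally normalized version
        for i in range(patches_per_side):
            for j in range(patches_per_side):
                blk = diff[i*ph:(i+1)*ph, j*pw:(j+1)*pw]
                m = blk.max()
                nd[i*ph:(i+1)*ph, j*pw:(j+1)*pw] = blk / m if m > 0 else 0.0
        K[c] = diff.sum() / max(nd.sum(), 1e-12)   # K_c = E_c / S_c
    return K / np.linalg.norm(K)                   # normalization removes k
```

For a scene whose channels are scaled by a diagonal illuminant, the locally normalized denominator is illuminant-invariant, so the estimate recovers the illuminant direction.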
2.3. Expanded into a Unified Framework
In this section, we show that the proposed method can be extended into a new framework that incorporates several important statistics-based color constancy algorithms (Grey-World, maximum RGB, Grey-Edge, etc.).
Like the previous framework in Equation (1), our proposed method can be modified to incorporate the Minkowski norm:

$$e_c^{n,j,p,\sigma} \propto \frac{\left( \sum_{\mathbf{x} \in \Omega} \left| \frac{\partial^{j} f^{\sigma}_{c}(\mathbf{x})}{\partial \mathbf{x}^{j}} \right|^{p} \right)^{1/p}}{\left( \sum_{i=1}^{n} \sum_{\mathbf{x} \in P_i} \bar{s}_{d,c}(\mathbf{x})^{p} \right)^{1/p}}, \qquad (13)$$

where $\partial^{j} f^{\sigma}_{c} / \partial \mathbf{x}^{j}$ denotes the spatial derivative of order $j$. The Gaussian filter with standard deviation $\sigma$ is again introduced in order to exploit the local correlation, and the Minkowski norm $p$ determines the relative weights of the multiple measures used to estimate the final illuminant color.
What is more, Table 1 demonstrates that the Grey-World, White-Patch, Grey-Edge, and Local-Surface methods are all extreme cases of Equation (13). For example, for the 2nd-order Local-Edge color constancy method, $n$ represents the total number of local regions inside the image, the order of the spatial derivative is $j = 2$, $p$ is the Minkowski norm, and $\sigma$ is the standard deviation of the Gaussian filter. Similarly, the remaining color constancy methods in Table 1 can also be incorporated into the unified framework we propose.
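Two of the reductions can be checked directly in the derivative-order-0 case: with a single patch covering the whole image, the ratio of the channel sum to its locally normalized sum collapses to the channel maximum (White-Patch), while single-pixel patches collapse it to the channel mean (Grey-World). The following is a simplified sketch under those assumptions, not the paper's exact Table 1 parameterization:

```python
import numpy as np

def unified_k(img, patch=None):
    """K_c = sum(channel) / sum(locally normalized channel), with square
    non-overlapping patches of side `patch` (None = one whole-image patch).
    Derivative order 0 is used to keep the limiting cases easy to verify."""
    h, w, _ = img.shape
    if patch is None:
        patch = max(h, w)  # a single patch covering the whole image
    K = np.zeros(3)
    for c in range(3):
        ch = img[..., c].astype(float)
        nd = np.zeros_like(ch)
        for a in range(0, h, patch):
            for b in range(0, w, patch):
                blk = ch[a:a+patch, b:b+patch]
                m = blk.max()
                nd[a:a+patch, b:b+patch] = blk / m if m > 0 else 0.0
        K[c] = ch.sum() / max(nd.sum(), 1e-12)
    return K

# One whole-image patch: K_c = sum / (sum / max) = channel maximum (White-Patch).
# One pixel per patch:   K_c = sum / (number of pixels) = channel mean (Grey-World).
```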
3. Experimental Results
The previous section provided a generic formulation of color illuminant estimation using low-level image features. In this section, the proposed approach is evaluated on four benchmark datasets: one indoor light source dataset [32], two real-world datasets (the Gehler-Shi dataset and the SFU Grey-Ball dataset) [13,30], and one HDR light source dataset [33]. The ground-truth light source color of the scene is provided with each dataset.
The angular error is used as the color constancy error metric [34]:

$$\varepsilon = \cos^{-1} \left( \frac{e_{gt} \cdot e_{est}}{\left\| e_{gt} \right\| \left\| e_{est} \right\|} \right),$$

where $e_{gt}$ indicates the actual light source and $e_{est}$ indicates the estimated light source. Smaller angular errors indicate more accurate color constancy results. In order to evaluate the proposed algorithm more objectively, five metrics, namely the median, mean, trimean, best-25% and worst-25% angular error, are used to measure the color constancy accuracy of each method.
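For reference, the angular error can be computed as follows; this is a standard implementation, with clipping added to guard against floating-point round-off outside the domain of arccos:

```python
import numpy as np

def angular_error(e_gt, e_est):
    """Angular error in degrees between the ground-truth and estimated
    illuminant vectors; 0 means a perfect estimate."""
    e_gt = np.asarray(e_gt, dtype=float)
    e_est = np.asarray(e_est, dtype=float)
    cos = np.dot(e_gt, e_est) / (np.linalg.norm(e_gt) * np.linalg.norm(e_est))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))
```

The metric is scale-invariant, so the estimated illuminant need not be normalized before comparison.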
The proposed method is a low-level-based method, so we compared it to the following low-level-based methods: Inverse-Intensity Chromaticity Space (IICS) [22], Grey-World [24], White-Patch [25], Shades of Grey [35], General Grey-World [34], 1st-order Grey-Edge, 2nd-order Grey-Edge [26], and Local Surface Reflectance [29]. The comparison also includes state-of-the-art methods such as pixel-based Gamut Mapping [7], edge-based Gamut Mapping [36], Spatio-Spectral Statistics [37], Weighted Grey-Edge [27], SVR Regression [38], Natural Image Statistics [18], the Exemplar-based method [16], Bayesian [15], and Thin-plate Spline Interpolation [39].
3.1. Parameter Setting and Analysis
As previously stated in Section 2, the proposed method requires selecting the best parameter values. Our model has a total of four variables; because of computational limitations, only 1st-order and 2nd-order derivatives are considered, which leaves three free variables: the Minkowski norm $p$, the scale $\sigma$, and the number of local regions $n$. We empirically traverse the indoor image dataset to determine the optimal parameters. The final selected parameters are shown in Table 2. As shown in Table 2, both the Minkowski norm $p$ and the number of local regions $n$ adopt the same setting in the 1st-order and 2nd-order Local Edge methods, while the scale $\sigma$ is set to a different value in each of the two methods.
3.2. Indoor Dataset
The angular error results of various models on the SFU indoor dataset are presented in Table 3. The SFU indoor dataset [32] comprises a total of 321 linear photos captured in the laboratory under 11 different illumination conditions.
It can be seen from Table 3 that our model performs well on a variety of measures when compared to other models. Specifically, the proposed 2nd-order Local Edge color constancy method achieves the smallest angular error on the four metrics of median, mean, trimean and best-25%, while the proposed 1st-order Local Edge method achieves better results than the comparison algorithms on the worst-25% metric. Table 3 shows that the proposed color constancy method can achieve more accurate illumination color estimation than the comparison algorithms in indoor lighting environments.
3.3. Real-World Dataset
The Gehler-Shi dataset includes 568 linear natural photos [15,30], all of which were captured in RAW format with a DSLR camera, with no color correction. As in many prior studies, the 24-patch color checkerboard present in every image of the dataset was masked out for illuminant estimation.
Our approaches were then tested on the SFU Grey-Ball dataset [13], which comprises 11,346 non-linear photos. This dataset has been processed in-camera with a complicated pipeline, making it impossible to derive an exact illuminant estimate. Before the experiment, we masked out the grey ball in each photo for unbiased evaluation.
Table 4 shows the results on the color checker dataset, and Table 5 lists the results on the SFU Grey-Ball dataset. In general, among all models, our method shows the best color constancy accuracy. As can be seen from Table 4, the proposed 2nd-order Local Edge method achieves the lowest values on the five metrics of median, mean, trimean, best-25% and worst-25%; lower values on these metrics indicate more accurate color constancy. Therefore, on the Gehler-Shi test set, the proposed 2nd-order Local Edge method is superior to the Grey-World, White-Patch, Shades of Grey, Grey-Edge, Local Surface Reflectance, pixel-based Gamut, edge-based Gamut, SVR Regression, Bayesian, Exemplar-based and NIS color constancy methods. Furthermore, it can be seen from Table 5 that on the SFU Grey-Ball dataset, the proposed 2nd-order Local Edge method achieves optimal results on the three metrics of mean, best-25% and worst-25%, and it is only 0.36 degrees higher than the best-performing Exemplar-based method on the median metric.
Figure 3 demonstrates the results on sample photos from the color checker dataset. It can be seen from Figure 3 that the proposed color constancy method restores the real colors of the scene well on the Gehler-Shi test set. For the scenes shown in the first and third rows of Figure 3, the White-Patch, Shades of Grey and Grey-Edge methods can hardly restore the actual colors of the scene, and their results show no obvious improvement over the original input. Grey-World achieves better color constancy results than the above methods in all scenes, but still exhibits a certain degree of color cast. Our proposed method achieves the smallest angular error on all test scenes.
3.4. SFU HDR Dataset
Our approach was then tested on the SFU HDR dataset [33], which includes 105 high-quality images captured under indoor and outdoor light sources.
The performance statistics of several methods on the SFU HDR dataset are shown in Table 6. Compared with the other models on this dataset, our model performs well on a variety of metrics. It can be seen from Table 6 that on the SFU HDR image dataset, the proposed 2nd-order Local Edge method achieves the best results on the three metrics of median, mean and worst-25%, and the proposed 1st-order Local Edge algorithm achieves results as good as the 2nd-order Local Edge on the median metric. The proposed methods are significantly better than the second-ranked Grey-Edge algorithm. Table 6 shows that the proposed color constancy method can also accurately restore the true colors of HDR scenes with rich indoor and outdoor light sources.