Green Stability Assumption: Unsupervised Learning for Statistics-Based Illumination Estimation

In the image processing pipeline of almost every digital camera there is a part dedicated to computational color constancy, i.e. to removing the influence of illumination on the colors of the image scene. Some of the best known illumination estimation methods are the so-called statistics-based methods. They are less accurate than the learning-based illumination estimation methods, but they are faster and simpler to implement in embedded systems, which is one of the reasons for their widespread usage. Although in the relevant literature it often appears as if they require no training, this is not true because they have parameter values that need to be fine-tuned in order to be more accurate. In this paper it is first shown that the accuracy of statistics-based methods reported in most papers was not obtained by means of the necessary cross-validation, but by using the whole benchmark datasets for both training and testing. After that the corrected results are given for the best known benchmark datasets. Finally, the so-called green stability assumption is proposed that can be used to fine-tune the values of the parameters of the statistics-based methods by using only non-calibrated images without known ground-truth illumination. The obtained accuracy is practically the same as when using calibrated training images, but the whole process is much faster. The experimental results are presented and discussed. The source code is available at http://www.fer.unizg.hr/ipg/resources/color_constancy/.


I. INTRODUCTION
REGARDLESS of the influence of the scene illumination, the human visual system can recognize object colors through its ability known as color constancy [1]. In the image processing pipeline of almost every digital camera there is also a part dedicated to computational color constancy [2]. It first estimates the scene illumination and then uses it to chromatically adapt the image, i.e. to correct the colors. For a more formal problem statement, an often used image formation model written under the Lambertian assumption is [3]

f_c(x) = ∫_ω I(λ, x) R(λ, x) ρ_c(λ) dλ, (1)

where c ∈ {R, G, B} is a color channel, x is a given image pixel, λ is the wavelength of the light, ω is the visible spectrum, I(λ, x) is the spectral distribution of the light source, R(λ, x) is the surface reflectance, and ρ_c(λ) is the camera sensitivity of color channel c. Assuming uniform illumination for the sake of simplicity makes it possible to remove x from I(λ, x) and then the observed light source color is given as

e = (e_R, e_G, e_B)^T = ∫_ω I(λ) ρ(λ) dλ. (2)

The direction of e provides enough information for successful chromatic adaptation [4]. Still, calculating e is an ill-posed problem because only the image pixel values f are given, while both I(λ) and ρ(λ) are unknown. The solution to this problem is to make additional assumptions. Different assumptions have given rise to numerous illumination estimation methods that can be divided into two main groups. The first of these groups contains low-level statistics-based methods such as White-patch [5], [6] and its improvements [7], [8], [9], Gray-world [10], Shades-of-Gray [11], Gray-edge (1st and 2nd order) [12], Weighted Gray-edge [13], using bright pixels [14], using bright and dark colors [15]. The second group includes learning-based methods such as gamut mapping (pixel, edge, and intersection based) [16], using neural networks [17], using high-level visual information [18], natural image statistics [19], Bayesian learning [20], spatio-spectral learning (maximum likelihood estimate, and with gen.
prior) [21], simplifying the illumination solution space [22], [23], [24], using color/edge moments [25], using regression trees with simple features from color distribution statistics [26], performing various kinds of spatial localizations [27], [28], using convolutional neural networks [29], [30], [31].
Statistics-based illumination estimation methods are less accurate than the learning-based ones, but they are faster and simpler to implement in embedded systems, which is one of the reasons for their widespread usage [32]. Although in the relevant literature it often appears as if they require no training, this is not true because they have parameter values that need to be fine-tuned in order to give higher accuracy. In this paper it is first shown that in most papers on illumination estimation the accuracy of statistics-based methods was not obtained by means of the necessary cross-validation, but by using the whole benchmark datasets for both training and testing, which leads to an unfair comparison between the methods. After that the corrected results are given for the best known benchmark datasets by performing the same cross-validation framework as for other learning-based methods. Finally, the so-called green stability assumption is proposed that can be used to fine-tune the values of the parameters of the statistics-based methods by using only non-calibrated images without known ground-truth illumination. The obtained accuracy is practically the same as when using calibrated training images, but the whole process is much faster and it can be directly applied in practice.
The paper is structured as follows: Section II briefly describes the best known statistics-based methods, Section III shows that their accuracy data should be revisited, Section IV proposes the green stability assumption, Section V presents the results, and finally, Section VI concludes the paper.

II. BEST KNOWN STATISTICS-BASED METHODS
Some of the best known statistics-based illumination estimation methods are centered around the Gray-world assumption and its extensions. Under this assumption the average scene reflectance is achromatic [10] and e is therefore calculated as

∫ f(x) dx / ∫ dx = k e, (3)

where k ∈ [0, 1] is the reflectance amount with 0 meaning no reflectance and 1 meaning total reflectance. By adding the Minkowski norm p to Eq. (3), the Gray-world method is generalized into the Shades-of-Gray method [11]:

( ∫ (f(x))^p dx / ∫ dx )^(1/p) = k e. (4)

Having p = 1 results in Gray-world, while p → ∞ results in White-patch [5], [6]. In [12] Eq. (4) was extended to the general Gray-world by introducing local smoothing:

( ∫ (f_σ(x))^p dx / ∫ dx )^(1/p) = k e, (5)

where f_σ = f * G_σ and G_σ is a Gaussian filter with standard deviation σ. Another significant extension is the Gray-edge assumption, under which the scene reflectance differences calculated with derivative order n are achromatic [12] so that

( ∫ |∂^n f_σ(x) / ∂x^n|^p dx )^(1/p) = k e. (6)

The described Shades-of-Gray, general Gray-world, and Gray-edge methods have parameters and the methods' accuracy depends on how the values of these parameters are tuned. Nevertheless, in the literature it often appears as if they require no training [3], [15], which is then said to be an advantage. It may be argued that the parameter values are in most cases the same, but this is easily disproved. In [15] the best fixed parameter values for the methods mentioned in this section were given for ten different datasets. These values are similar for some datasets, but overall they span two orders of magnitude. With such high differences across different datasets in mind, it is obvious that the parameter values have to be learned.
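The three parameterized methods above can be sketched compactly. The following is a minimal illustration, not the original authors' implementation: it assumes a linear RGB image as a NumPy array, and a simple box filter stands in for the Gaussian G_σ of Eqs. (5) and (6) to keep the sketch dependency-free.

```python
import numpy as np

def _box_smooth(channel, k=5):
    # Box filter standing in for the Gaussian G_sigma; a faithful
    # implementation would use a proper Gaussian kernel.
    kernel = np.ones(k) / k
    channel = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode='same'), 0, channel)
    return np.apply_along_axis(lambda r: np.convolve(r, kernel, mode='same'), 1, channel)

def shades_of_gray(img, p=6):
    # Eq. (4): Minkowski p-norm of each channel; p = 1 gives Gray-world,
    # a large p approaches White-patch. Only the direction of e matters.
    flat = img.astype(np.float64).reshape(-1, 3)
    e = np.power(np.mean(np.power(flat, p), axis=0), 1.0 / p)
    return e / np.linalg.norm(e)

def general_gray_world(img, p=6, k=5):
    # Eq. (5): Shades-of-Gray applied to a locally smoothed image.
    smoothed = np.stack([_box_smooth(img[..., c].astype(np.float64), k)
                         for c in range(3)], axis=-1)
    return shades_of_gray(smoothed, p)

def gray_edge(img, p=6, k=5, n=1):
    # Eq. (6): Minkowski norm of the n-th order derivative magnitudes
    # of the smoothed image, computed per channel.
    e = np.zeros(3)
    for c in range(3):
        d = _box_smooth(img[..., c].astype(np.float64), k)
        for _ in range(n):
            gy, gx = np.gradient(d)
            d = np.hypot(gy, gx)
        e[c] = np.power(np.mean(np.power(d, p)), 1.0 / p)
    return e / np.linalg.norm(e)
```

On a scene lit by a single uniform illuminant, all three estimators return a unit vector whose direction is the quantity that the angular error later compares against the ground truth.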

III. REVISITING THE ACCURACY OF STATISTICS-BASED METHODS

A. Angular error
Before recalculating the accuracy of the methods from the previous section, some introduction to the used measures is needed. Of the various proposed illumination estimation accuracy measures [33], [34], [35], the angular error is the most commonly used. It represents the angle between the illumination estimation vector and the ground-truth illumination.
All angular errors obtained for a given method on a chosen dataset are usually summarized by different statistics. Because of the non-symmetry of the angular error distribution, the most important of these statistics is the median angular error [36]. Angular errors below 3° are considered acceptable [37], [38].
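The angular error described above can be computed directly from the two vectors; this short sketch (our own helper, not code from the paper) assumes NumPy arrays:

```python
import numpy as np

def angular_error_deg(est, gt):
    # Angle in degrees between the estimated and the ground-truth
    # illumination; only the directions matter, so both are normalized.
    cos = np.dot(est, gt) / (np.linalg.norm(est) * np.linalg.norm(gt))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))
```

A dataset's errors would then be summarized with `np.median` rather than the mean, since the error distribution is non-symmetric.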
The ground-truth illuminations of benchmark dataset images are obtained by reading off calibration objects put in the image scene, e.g. a gray ball or a color checker. When a method is tested, these objects are masked out to prevent possible bias.

B. The need for cross-validation
When it comes to accuracy obtained on benchmark datasets, the results available in [3], at [39], and in [27] are the most widely copied and referenced. If, for example, the results obtained for Shades-of-Gray on the GreyBall dataset [40] are checked [39], the reported mean and median angular errors of 6.1° and 5.3°, respectively, are obtained by setting p to 12 on all 15 folds. However, performing cross-validation, i.e. looking for the best p on 14 training folds, applying it to the test fold, and repeating this all 15 times, clearly shows that p differs for various training sets. Overall, the mean and median errors for the combined results of all test folds are 7.8° and 7.2°, respectively, which differs from the reported results. Similar differences can be shown for other methods from Section II as well. For the sake of fair comparison with other illumination estimation methods, these accuracies are properly recalculated and they are provided in Section V together with other results.
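The fold-wise protocol just described can be sketched as follows. This is a schematic illustration under our own naming, not the paper's code: `folds` holds per-fold images with ground truths, `estimate(img, p)` is any method from Section II, and for each test fold the parameter is chosen on the remaining folds only.

```python
import numpy as np

def angular_err(est, gt):
    # Angle in degrees between two illumination vectors.
    cos = np.dot(est, gt) / (np.linalg.norm(est) * np.linalg.norm(gt))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def cross_validate(folds, estimate, candidate_ps):
    # folds: list of (images, ground_truths); estimate(img, p) -> 3-vector.
    # The test fold never influences the choice of p.
    pooled = []
    for t in range(len(folds)):
        train = [f for i, f in enumerate(folds) if i != t]
        def median_err(p):
            errs = [angular_err(estimate(img, p), gt)
                    for imgs, gts in train for img, gt in zip(imgs, gts)]
            return np.median(errs)
        best_p = min(candidate_ps, key=median_err)  # tuned on training folds
        imgs, gts = folds[t]
        pooled += [angular_err(estimate(img, best_p), gt)
                   for img, gt in zip(imgs, gts)]
    return np.mean(pooled), np.median(pooled)
```

Pooling the per-fold test errors before taking the mean and median reproduces the "combined results of all test folds" reported above.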

IV. THE GREEN STABILITY ASSUMPTION

A. Practical application
The methods mentioned in Section II are some of the most widely used illumination estimation methods [32] and this means that their parameters should preferably be appropriately fine-tuned before putting them in production. The best way to do this is to use a benchmark dataset, but because of dependence of Eq. (2) on ρ(λ), a benchmark dataset would be required for each used camera sensor. Since putting the calibration objects into image scenes and later extracting the ground-truth illumination is time consuming, it would be better, if possible, to perform some kind of unsupervised learning on non-calibrated images without known ground-truth illumination. This would save time and be of practical value.

B. Motivation
When for a dataset the ground-truth illuminations are unknown, an alternative is to make assumptions about the nature of illumination estimations produced by statistics-based methods when their parameters are fine-tuned and then to meet the conditions of the assumptions. When considering the nature of illumination estimations, a good starting point is the observation that some statistics-based illumination estimations appear "to correlate roughly with the actual illuminant" [25]. Fig. 1 shows this for the images of the GreyBall dataset [40].
The points in Fig. 1 can be considered to occupy a space around a line in the rb-chromaticity plane [22], which is connected to the fact that the green chromaticity of the ground-truth illuminations is relatively stable and similar for all illuminations.
For the GreyBall dataset the standard deviations of the red, green, and blue chromaticity components of the ground-truth illuminations are 0.0723, 0.0106, and 0.0750, respectively, and similar results are obtained for all other datasets. For the Shades-of-Gray illumination estimations shown in Fig. 1 the standard deviations of the red, green, and blue chromaticity components are 0.0842, 0.0253, and 0.0770, respectively, which means that there is also a trend of green chromaticity stability, although the standard deviation is greater than in the case of the ground-truth illuminations. This means that if a set of illumination estimations is to resemble the set of ground-truth illuminations, the estimations' green chromaticity standard deviation should also be smaller and closer to the one of the ground-truth. As a matter of fact, if for example the Shades-of-Gray illumination estimations for p = 2 and p = 15 shown in Fig. 2 are compared, the standard deviations of their green chromaticities are 0.0253 and 0.0158, respectively, while their median angular errors are 6.2° and 5.3°, respectively. Similar behaviour, where a lower green chromaticity standard deviation is to some degree followed by a lower median angular error, can be seen on all datasets and for all methods from Section II.

Fig. 2: The rb-chromaticities of different Shades-of-Gray illumination estimations for GreyBall dataset images [40] (best viewed in color).
For a deeper insight into this behaviour, another experiment was conducted on the GreyBall dataset [40]. First, for each method M ∈ M, where M contains all methods from Section II, the Cartesian product of discrete sets of evenly spread values for the individual parameters of M was calculated to get n tuples p_M^(i), i ∈ {1, 2, ..., n}. Gray-world and White-patch have no parameters, but they were implicitly included as special cases of Shades-of-Gray. Second, each p_M^(i) was used to set the parameter values of M and then M was applied to all images of the GreyBall dataset to obtain an illumination estimation for each of them. Third, for these illumination estimations the standard deviation of their green chromaticities σ_i and their median angular error m_i were calculated. Fourth, for each of the n(n−1)/2 possible pairs of indices i, j ∈ {1, 2, ..., n} such that i < j a new difference pair (Δσ_k, Δm_k) was calculated such that Δσ_k = σ_i − σ_j and Δm_k = m_i − m_j. Finally, all such difference pairs created for all M ∈ M were put together into the set of pairs P. If the members of the pairs in P are interpreted as coordinates, their plot is shown in Fig. 3.
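The pairwise-difference construction above, together with the correlation analysed in the next subsection, can be sketched in a few lines. This is an illustrative reimplementation under assumed inputs: `sigmas[i]` and `medians[i]` are the σ_i and m_i already computed for each parameter setting of one method.

```python
import numpy as np
from itertools import combinations

def difference_pairs(sigmas, medians):
    # For every index pair i < j build the difference pair
    # (Δσ, Δm) = (σ_i − σ_j, m_i − m_j).
    return np.array([(sigmas[i] - sigmas[j], medians[i] - medians[j])
                     for i, j in combinations(range(len(sigmas)), 2)])

def pearson_r(pairs):
    # Pearson's linear correlation between the Δσ and Δm coordinates.
    return np.corrcoef(pairs[:, 0], pairs[:, 1])[0, 1]
```

Pooling the pairs of all methods into one set and computing `pearson_r` on it corresponds to the coefficient reported for Fig. 3.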

C. Green stability assumption
The value of Pearson's linear correlation coefficient for the points in Fig. 3 is 0.7408, which indicates a strong positive linear relationship [41]. In other words, the difference between the standard deviations of the green chromaticities of illumination estimations produced by the same method when using different parameter values is strongly correlated with the difference between the median angular errors of these illumination estimations. The same correlations for the NUS datasets [15] are given in Table I. Based on these empirical results and observations, it is possible to introduce the green stability assumption: the parameter values for which a method's illumination estimations' green chromaticity standard deviation is lower simultaneously lead to lower illumination estimation errors. Like many other assumptions, this assumption does not always hold, but it can still be useful in cases when the ground-truth illuminations for a set of images taken with a given sensor are not available. These images should also be taken under illuminations similar to those of the mentioned datasets that were used for the empirical results.
For the specific case when the parameter values of a chosen method are fine-tuned and only non-calibrated images are available, the green stability assumption can be expressed more formally. If n is the number of images in the training set, p_i is the i-th vector of parameter values, e_{i,j} is the method's illumination estimation obtained for the j-th image when p_i is used for the parameter values, e_{i,j,G} is the green component of e_{i,j}, and ē_{i,G} is the mean green component of the illumination estimations for all images obtained with parameters p_i, then under the green stability assumption the index i* of the p_{i*} that should result in minimal angular errors is obtained as

i* = argmin_i √( (1/n) Σ_{j=1}^{n} ( e_{i,j,G} − ē_{i,G} )² ). (7)

Since Eq. (7) performs a minimization of the standard deviation, it can also be written without the square root and the denominator.
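The selection rule of Eq. (7) reduces to one `argmin` over the per-setting standard deviations. The sketch below is our own minimal rendering, assuming the estimations are first normalized so that the green component in Eq. (7) is the green chromaticity discussed above:

```python
import numpy as np

def select_by_green_stability(estimates):
    # estimates[i][j] is the illumination estimation (a 3-vector) for
    # image j obtained with parameter tuple p_i. Returns the index i*
    # of Eq. (7): the setting whose estimations have the most stable
    # green chromaticity.
    stds = []
    for est_i in estimates:
        e = np.asarray(est_i, dtype=np.float64)
        green = e[:, 1] / e.sum(axis=1)  # green chromaticity per image
        stds.append(np.std(green))
    return int(np.argmin(stds))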

V. EXPERIMENTAL RESULTS

A. Experimental setup
The following benchmark datasets have been used to demonstrate the difference between the previously reported and the newly calculated accuracy results for the methods mentioned in Section II and to test the effectiveness of the proposed green stability assumption: the GreyBall dataset [40], its approximated linear version, and the eight linear NUS datasets [15]. The ColorChecker dataset [20], [42] was not used because of its confusing history of wrong usage despite warnings from leading experts [43]. Except for the original GreyBall dataset, all others contain linear images, which is preferred because illumination estimation is in cameras usually performed on linear images [2] similar to the model described by Eq. (1). The tested methods include all the ones from M. During cross-validation on all datasets the same folds were used as in other publications. The source code for recreating the numerical results given in the following subsection is publicly available at http://www.fer.unizg.hr/ipg/resources/color_constancy/.

TABLE: Angular error statistics of the compared methods.

Originally reported results:
Shades-of-Gray [11]: 11.55, 9.70, 10.23
General Gray-World [4]: 11.55, 9.70, 10.23
1st-order Gray-Edge [12]: 10.58, 8.84, 9.18
2nd-order Gray-Edge [12]: 10.68, 9.02, 9.40

Revisited results:
Shades-of-Gray [11]: 13.32, 11.57, 12.10
General Gray-World [4]: 13.69, 12.11, 12.55
1st-order Gray-Edge [12]: 11.06, 9.54, 9.81
2nd-order Gray-Edge [12]: 10.73, 9.21, 9.49

Green stability assumption results:
Shades-of-Gray [11]: 12.68, 10.50, 11.25
General Gray-World [4]: 12.68, 10.50, 11.25
1st-order Gray-Edge [12]: 13.41, 11.04, 11.87
2nd-order Gray-Edge [12]: 12.83, 10.70, 11.44

B. Accuracy
Tables II, III, and IV show the previously reported accuracies, the newly recalculated accuracies, and the accuracies obtained by using the green stability assumption. The results clearly confirm the potential and the practical applicability of the green stability assumption. This also demonstrates the success of unsupervised learning for illumination estimation.

VI. CONCLUSIONS AND FUTURE RESEARCH
In most relevant papers the accuracy results for some of the most widely used statistics-based methods were calculated without cross-validation. Here it was shown that cross-validation is needed and the accuracy results were revisited. When statistics-based methods are fine-tuned, the best way to do this is by using images with known ground-truth illumination. Based on several observations and empirical evidence, the green stability assumption has been proposed, which can be successfully used to fine-tune the parameters of statistics-based methods when only non-calibrated images without ground-truth illumination are available. This makes the whole fine-tuning process much simpler, faster, and more practical. It is also an unsupervised learning approach to color constancy. In the future, other similar bases for further assumptions for unsupervised learning for color constancy will be researched.