Which Features Are More Correlated to Illuminant Estimation: A Composite Substitute

Abstract: Computational color constancy (CCC) aims to endow computers or cameras with the capability to remove the color bias caused by different scene illuminations. The first procedure of CCC is illuminant estimation, i.e., calculating the illuminant color of a given image scene. Recently, methods that directly map image features to illuminant estimates have provided an effective and robust solution to this issue. Nevertheless, given the diversity of image features, it is unclear which features should be selected to model the illuminant color. In this research, a series of artificial features woven into a mapping-based illuminant estimation framework is extensively investigated. This framework employs a multi-model structure and integrates kernel-based fuzzy c-means (KFCM) clustering, non-negative least square regression (NLSR), and fuzzy weighting. By comparing the performance achieved with different features, the features more correlated to illuminant estimation are identified within the candidate feature set. Furthermore, composite features are designed to achieve outstanding illuminant estimation performance. Extensive experiments are performed on typical benchmark datasets and the effectiveness of the proposed method is validated. The proposed method makes illuminant estimation an explicit transformation of suitable image features with regressed and fuzzy weights, which has significant potential for both competitive performance and fast implementation against state-of-the-art methods.


Introduction
The human vision system possesses an innate "color constancy" capability, i.e., the ability to discount effects from the illuminant color and perceive the true color of an object [1,2]. Unfortunately, computers and imaging signal processors in vision applications do not have this capability unless specific algorithms are performed. To this end, a variety of computational color constancy (CCC) algorithms have been developed over the past decades to compensate for the effects of illumination on the perceived color of objects. Once an estimated illuminant color is obtained for a captured image, the color deviation can be corrected using the von Kries model [1][2][3]. Therefore, illuminant estimation is the key to CCC.
To date, many approaches have been proposed by researchers to estimate the illuminant color. These approaches can be roughly classified into three groups. Gray world (GW) [4,5], White patch (WP) [6], Shades of gray (SoG) [7], Gray edge (GE) [8], etc., belong to the first group, statistics-based methods, which exploit image statistical features that remain consistent in images captured under canonical illuminating environments. Natural Image Statistics [9], machine learning [10], deep learning [11-16], etc., are included in the second group, learning-based methods, which require a learning procedure that utilizes various image information to pre-train models and estimate the illuminant color. The third group combines the first two groups [11,17-19].

Imaging Model
The most commonly used imaging model for CCC is the one under the assumption of Lambertian reflection [9]. According to this model, the camera's raw intensity response f_c(x) is given by [9,16]:

f_c(x) = ∫_Ω I(λ, x) R(λ, x) ρ_c(λ) dλ,

where c is an RGB color channel, c ∈ {R, G, B}, x is a pixel location in the given image, λ is the wavelength of the light, Ω is the spectrum range of visible light, λ ∈ Ω, I(λ, x) denotes the spectral distribution of the light source, R(λ, x) denotes the surface reflectance of the imaged objects, and ρ_c(λ) denotes the camera sensitivity function of color channel c.
To make the problem straightforward, it is commonly assumed that the image is uniformly illuminated by a single light source. Considering the case where the image scene is an ideal white surface, R(λ, x) = 1, the observed illuminant color ê = [ê_R, ê_G, ê_B]^T is obtained as:

ê_c = ∫_Ω I(λ) ρ_c(λ) dλ.
Once we know ê, the image can be successfully chromatically adapted with the help of the von Kries diagonal transformation [3]. However, with only the image pixel responses f_c given, and both I(λ) and ρ_c(λ) unknown, the estimation of ê is an under-determined problem, which requires additional assumptions.
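The spectral integral above can be illustrated with a discretized (Riemann-sum) sketch; the spectra below are made-up toy numbers for illustration, not real camera data:

```python
# Discretized sketch of the Lambertian imaging model:
# f_c(x) = ∫_Ω I(λ, x) R(λ, x) ρ_c(λ) dλ, approximated by a Riemann sum.

def camera_response(I, R, rho, d_lambda=10.0):
    """Riemann-sum approximation of the spectral integral for one channel."""
    return sum(i * r * p for i, r, p in zip(I, R, rho)) * d_lambda

# Toy spectra sampled at five wavelengths (hypothetical values).
I   = [0.2, 0.8, 1.0, 0.7, 0.3]         # illuminant spectral power I(λ)
rho = {"R": [0.0, 0.1, 0.3, 0.8, 0.9],  # camera sensitivities ρ_c(λ)
       "G": [0.1, 0.6, 0.9, 0.4, 0.1],
       "B": [0.9, 0.7, 0.2, 0.0, 0.0]}
R_white = [1.0] * 5                      # ideal white surface: R(λ, x) = 1

# For a white surface the response reduces to the observed illuminant ê_c.
e_hat = {c: camera_response(I, R_white, rho[c]) for c in "RGB"}
```

For a non-white surface, the same function is called with the surface's reflectance spectrum in place of `R_white`.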

Illuminant Estimation
This section briefly introduces the three typical categories of methods (statistics-based, learning-based, and combinational) for illuminant estimation and, additionally, discusses some methods most related to the topic of this paper.
Statistics-based methods. These algorithms rely on certain assumptions or physical observations and directly compute statistical metrics of a given image to obtain the illuminant color. The advantage of these methods lies in their low computational effort, although the estimation accuracy occasionally cannot be assured. Weijer et al. [24] developed a GE framework to unify most of these algorithms, which includes n-order partial derivatives and the Minkowski p-norm, given as:

e_{n,p,σ} = k ( ∫ | ∂^n f_σ(x) / ∂x^n |^p dx )^{1/p},

where f_σ(x) = f(x) ⊗ G_σ(x) is the image f(x) convolved with a Gaussian filter G_σ(x) with scale parameter σ, k is a scaling constant, and e_{n,p,σ} is the estimated illuminant color. The framework is denoted as GE_{n,p,σ} for the choices of n, p, and σ. Some special instantiations of this framework [18,19,24-26] include GW (GE_{0,1,0}), WP (GE_{0,∞,0}), SoG (GE_{0,6,0}), 1st-order gray edge (GE1) (GE_{1,1,6}), 2nd-order gray edge (GE2) (GE_{2,1,5}), General gray world (GGW) (GE_{0,13,2}), etc.
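For the zero-derivative, no-smoothing case (n = 0, σ = 0), the framework reduces to a per-channel Minkowski p-norm of the pixel values, so GW, WP, and SoG fall out of a single function. A minimal sketch on toy pixel data:

```python
# GE_{n,p,σ} framework, restricted to n = 0, σ = 0: the per-channel
# illuminant estimate is the Minkowski p-norm of the pixel values.

def minkowski_estimate(channel, p):
    """Per-channel illuminant estimate via the Minkowski p-norm (n = 0, σ = 0)."""
    if p == float("inf"):               # White Patch: maximum response
        return max(channel)
    n = len(channel)
    return (sum(v ** p for v in channel) / n) ** (1.0 / p)

# Toy per-channel pixel values (illustrative only).
pixels = {"R": [0.2, 0.4, 0.9], "G": [0.1, 0.5, 0.6], "B": [0.3, 0.3, 0.3]}

gw  = {c: minkowski_estimate(v, 1) for c, v in pixels.items()}             # GE_{0,1,0}
wp  = {c: minkowski_estimate(v, float("inf")) for c, v in pixels.items()}  # GE_{0,∞,0}
sog = {c: minkowski_estimate(v, 6) for c, v in pixels.items()}             # GE_{0,6,0}
```

As p grows, the estimate moves from the channel mean (GW) toward the channel maximum (WP), with SoG (p = 6) in between.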
Learning-based methods. These methods require a training process, after which a pre-trained model is used to estimate the illuminant. Commonly, they can be further classified into two types: low-level and high-level [16]. Typical low-level methods are the gamut mapping algorithm [27] and its extensions [28]. Other examples include methods using probabilistic models [29], classification-based algorithm selection [17], and machine learning algorithms [10,30]. Some regression-based methods [19,20,22,23] are also included in this group. These methods learn a mapping from image features to illuminant ground truths from massive training data. Following the recent blooming of deep learning, high-level methods based on convolutional neural networks (CNN) [11,12,14-16,31-35] have achieved state-of-the-art performance on various color constancy datasets. Although these methods are commonly more accurate than statistics-based algorithms, their structural complexity and the computational effort of model training hinder their application to fast and real-time tasks.
Combination methods. Most unitary algorithms use a single strategy that usually cannot work well for all types of images. To this end, many algorithms use multiple-model strategies in which a set of illuminant estimates is acquired from several unitary algorithms. A final estimate is then obtained by weighting the multiple estimates. According to whether the weights are fixed or not, combination methods can be grouped into static weight combination and dynamic weight combination. Static weights are not suitable for diverse image features [18,36] and commonly cannot ensure the robustness of illuminant estimation. Conversely, dynamic weights are more effective, but they need to tune the combination of unitary estimates against varying image features [11,17,18]. There are many approaches [9,17,37-39] to producing the dynamic weights and finding the best combination.
Related methods. In [40], Zeinab Abedini provided a new explanation of some statistics-based methods from the perspective of the percentage of pixels involved in illuminant estimation. In these methods, all pixels participate equally in illuminant estimation (e.g., GW), or some pixels are radically selected for processing and the others are ignored (e.g., WP, Bright Pixel [41], the PCA-based method [21], Gray Pixels [42], Gray Index [43]). Naturally, this explanation can be extended to a weight-based technique, i.e., all pixels are used, but the weights for different pixels are not the same, and the weights can be implicitly or explicitly presented. GGW, GE, and SoG are included in this explanation since they can be interpreted as having implicit weights through the structure of the p-th Minkowski norm. Local surface reflectance (LSR) [44], which uses WP to weight GW, can also be seen as having implicit weights [40]. According to this explanation, illuminant estimation becomes a regression function of pixel distribution features and the corresponding weights. Similarly, among learning-based methods, some direct regression-based methods have been proposed in recent years [20,21,23,40,45], which simply map image features to the illuminant. One remarkable method is based on the Corrected Moment (CM) [20]. The CM-based method provides an efficient methodology to yield the illuminant estimate using a polynomial regression from multiple low-level methods. Other techniques, e.g., regression trees [30], fuzzy logic [19], and deep learning [46,47], can also be regarded as regression-based methods with complex structures. Although these algorithms can achieve outstanding performance, their disadvantage is that illuminant estimation works as a black box without further considering the specific features of different images. These methods can be improved if the regression or weighting model is adjusted with intrinsic image features.
It is therefore important to assess the impact of selected image features on illuminant estimation and to determine which features are more correlated to it. An optimal combination of selected features will make the illuminant estimation regression more efficient.

Color Correction
For a given image, once the illuminant color is estimated as an illumination vector ê = (ê_R, ê_G, ê_B), the image can be corrected by the von Kries diagonal transformation [3]. All colors of the color-biased image under the unknown light source (u) are mapped to the colors under a canonical illuminant (c) as follows:

(R_c, G_c, B_c)^T = diag(1/ê_R, 1/ê_G, 1/ê_B) (R_u, G_u, B_u)^T,

where (R_u, G_u, B_u)^T and (R_c, G_c, B_c)^T denote the image colors under the unknown light source and a canonical illuminant, respectively.
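The von Kries correction amounts to a per-channel division by the estimated illuminant. A minimal sketch with illustrative values:

```python
# Von Kries diagonal correction: each channel of the color-biased pixel is
# divided by the corresponding component of the estimated illuminant ê.

def von_kries_correct(pixel_u, e_hat):
    """Map a pixel under the unknown light to the canonical illuminant."""
    return tuple(v / e for v, e in zip(pixel_u, e_hat))

e_hat = (0.8, 1.0, 0.5)      # estimated illuminant (toy values)
pixel = (0.4, 0.5, 0.25)     # observed color of a gray surface under that light
corrected = von_kries_correct(pixel, e_hat)   # a neutral gray after correction
```

A gray surface observed under the biased illuminant maps back to equal R, G, B values once the estimate is accurate.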

Proposed Method
In the proposed method, illuminant estimation is achieved by a series of linear or nonlinear transformations of suitable image features. The proposed method can also be easily used to assess illuminant estimation performance under different features and to construct more efficient feature substitutes. The flowchart of the proposed framework is illustrated in Figure 1.
In the training phase, the training dataset is partitioned into different clusters using the KFCM algorithm. For each cluster, an illuminant estimation model M_1, M_2, · · · , M_K is obtained using the non-negative least square regression (NLSR) method. For a given test image, from which the dark level and saturated pixels have been removed in the linear RGB space, the corresponding image features are extracted, and the final estimated illuminant is produced by fuzzy weighting the outputs of all regression models.

Features Extraction
Image features can serve as representatives of an original image with a compact data length. Since it is not clear which features affect illuminant estimation the most, in this paper a series of features is gathered into a candidate set to explore suitable feature selections and better illuminant estimation. The features in this set are derived from up-to-date literature on effective learning-based illuminant estimation methods. These features are low-level or middle-level [38]; we do not use high-level or semantic features, since semantic features are commonly specific to different images and may have small effects on illuminant estimation [38]. The following image features are extracted for each image.
Corrected moments (CM) [20,48]. These are polynomial expansions of the 2nd, 3rd, and 4th degree, described by the following vectors:

CM9 = (r, g, b, r², g², b², rg, gb, rb),
CM19 = (r, g, b, r², g², b², rg, gb, rb, r³, g³, b³, rg², gb², rb², gr², bg², br², rgb),
CM34 = CM19 extended with the fifteen 4th-degree monomials (r⁴, g⁴, b⁴, r³g, r³b, g³r, g³b, b³r, b³g, r²g², g²b², r²b², r²gb, g²rb, b²rg).

Here r, g, and b are the illuminant components estimated by some general methods, such as GW, SoG, etc. According to the above expressions, the numbers of elements in a CM feature are 9, 19, and 34, respectively.
RGB-uv histogram (RGBuv). The paper [49] extracted a histogram from the log-chrominance space to represent the bulk of the information in a color image. This feature, called the RGB-uv histogram, describes the color distribution of an image as an m × m × 3 tensor. The work in [49] validated the effectiveness of this feature for illuminant estimation. To shorten this feature, a PCA-based dimensionality reduction step is applied to obtain a compact representation of each RGBuv histogram. As a result, the RGBuv of any image I can be expressed by a series of principal component coefficients v_i, (i = 1, 2, · · · , p), where p is the number of principal components and p ≪ m × m × 3.
Color moments (ClrM) characterize the color distribution in an image. Commonly, three central moments, i.e., the first-order moment (mean), second-order moment (standard deviation), and third-order moment (skewness), are sufficient to express the main color distribution of an image. Considering the size of the images, each image is evenly divided into 3 × 3 sub-blocks in this study; therefore, 81 moments can be obtained for an image I if the moments are calculated for each RGB channel.
Cheng's simple features (ChengSF) [30]. These features are a group of intensity-invariant features based on normalized chromaticity.
There are four color chromaticity features extracted from an image, i.e., the average color chromaticity [r_a, g_a], brightest color chromaticity [r_b, g_b], dominant color chromaticity [r_d, g_d], and chromaticity mode of the color palette [r_m, g_m]. Here r and g are the normalized chromaticity components of the RGB color [30]. For an image I, ChengSF collects these eight components.
Color distribution features (CDF). In [19], Luo proposed a compact color distribution feature for illuminant estimation, which combines the number of image colors n_c, the chromaticity features above, and the color moments above. For an image I, the obtained CDF vector contains these components, where two adjustable factors, λ_1 and λ_2, control the influence of the different components on the illuminant estimation.
Initial illuminant features (IIF). In [19], some unitary algorithms, i.e., GW, WP, SoG, GE1, GE2, GGW, PCA, and LSR, are applied to obtain eight initial illuminant estimates for an image. These initial estimates are integrated into a vector, i.e., the IIF, as a metric of illuminant characteristics, together with terms γ(e_i, e_j), defined as the Euclidean distance between initial illuminant estimates e_i and e_j.
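Two of the simpler features above can be sketched concretely. The block below computes the three central color moments for one channel and the ChengSF average chromaticity on toy pixel data; it is an illustrative sketch, not the authors' MATLAB code:

```python
# Sketch of two hand-crafted features: per-channel central color moments
# (mean, standard deviation, skewness) and ChengSF average chromaticity.
import math

def color_moments(channel):
    """First three central moments of one color channel."""
    n = len(channel)
    mean = sum(channel) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in channel) / n)
    third = sum((v - mean) ** 3 for v in channel) / n
    # sign-preserving cube root of the third central moment
    skew = math.copysign(abs(third) ** (1 / 3), third)
    return mean, std, skew

def average_chromaticity(pixels):
    """ChengSF average chromaticity [r_a, g_a]: normalized r, g of the mean RGB."""
    mr = sum(p[0] for p in pixels) / len(pixels)
    mg = sum(p[1] for p in pixels) / len(pixels)
    mb = sum(p[2] for p in pixels) / len(pixels)
    s = mr + mg + mb
    return mr / s, mg / s

pixels = [(0.6, 0.3, 0.1), (0.4, 0.3, 0.3), (0.5, 0.3, 0.2)]  # toy RGB pixels
```

In the paper's setting, `color_moments` would be applied per RGB channel in each of the 3 × 3 sub-blocks, yielding the 81-element ClrM vector.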

KFCM Clustering
In the training stage, a better classification can achieve appropriate divisions of the training dataset. In each division, the features will be close to each other and fit for illuminant modeling. We use some of the above-mentioned features to classify the training dataset with the KFCM method.
KFCM is derived from fuzzy c-means (FCM). FCM is a classification algorithm based on fuzzy sets, which yields clustering partitions with the maximum similarity among samples in the same cluster and the minimum similarity among different clusters [50].
As an improved version of the common c-means algorithm, FCM provides flexible fuzzy partitions since the training samples are clustered in terms of the membership functions.
For the standard FCM, a cost function based on the Euclidean distance is minimized to solve for the cluster centroids and the membership degrees of all samples:

J = Σ_{i=1}^{c} Σ_{k=1}^{n} μ_ki^m ‖x_k − ν_i‖², subject to Σ_{i=1}^{c} μ_ki = 1,

where ‖·‖ is the Euclidean norm measuring the similarity between sample data and a cluster centroid; μ_ki is the degree of membership of the sample x_k in cluster i, U = [μ_ki]; ν_i is the centroid of cluster i, V = [ν_i]; n is the total number of samples; m is the exponent of the membership; and c is the number of clusters. Furthermore, a kernel function can be introduced into FCM via a nonlinear mapping Φ : x → Φ(x), which maps the low-dimensional vector x into a high-dimensional space. The kernel-based FCM, i.e., KFCM, then minimizes the following cost function:

J = Σ_{i=1}^{c} Σ_{k=1}^{n} μ_ki^m ‖Φ(x_k) − Φ(ν_i)‖²,

where

‖Φ(x_k) − Φ(ν_i)‖² = K(x_k, x_k) − 2K(x_k, ν_i) + K(ν_i, ν_i).

Here K(x, ν) is an inner-product kernel function, K(x, ν) = Φ(x)^T Φ(ν). In this study, we adopt the Gaussian function as the kernel,

K(x, ν) = exp(−‖x − ν‖² / σ²),

where σ is the kernel width; then K(x, x) = 1 and K(ν, ν) = 1, and the cost function can be rewritten as:

J = 2 Σ_{i=1}^{c} Σ_{k=1}^{n} μ_ki^m (1 − K(x_k, ν_i)).

The cost function is minimized when the first-order partial derivatives with respect to μ_ki and ν_i are both zero. From these two necessary conditions, the final clusters and their centroids are iteratively obtained as follows:

μ_ki = (1 / (1 − K(x_k, ν_i)))^{1/(m−1)} / Σ_{j=1}^{c} (1 / (1 − K(x_k, ν_j)))^{1/(m−1)},

ν_i = Σ_{k=1}^{n} μ_ki^m K(x_k, ν_i) x_k / Σ_{k=1}^{n} μ_ki^m K(x_k, ν_i),

with i = 1, 2, · · · , c and k = 1, 2, · · · , n. It is worth mentioning that the clustering performance of KFCM is influenced by the choice of kernel function and its parameters. Although the above update rules are deduced using the Gaussian kernel, other kernel functions satisfying K(x, x) = 1 can also be used here. For the Gaussian kernel, the kernel width is commonly the key tuning parameter. Since the Gaussian kernel is a typical local kernel function, its value is largely determined by the data close to the test point.
Hence, we can determine the kernel width according to the average distance between the sample data and the cluster centroid. This ensures that the data in the same cluster play a dominant role in the corresponding kernel function values, while the data outside this cluster have only a slight impact. In addition, unlike single-model methods, the multi-model approach used in this study can select different kernel parameters for different clusters.
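The alternating membership and centroid updates can be sketched compactly. The block below is a minimal one-dimensional KFCM with a Gaussian kernel on toy data; the initialization, parameters (m = 2, σ = 2), and iteration count are illustrative choices, not the paper's settings:

```python
# Minimal kernel fuzzy c-means (KFCM) with a Gaussian kernel, 1-D toy version.
import math

def kfcm(data, centroids, sigma, m=2.0, iters=50):
    """Alternate the KFCM membership and centroid updates."""
    K = lambda x, v: math.exp(-((x - v) ** 2) / sigma ** 2)
    c = len(centroids)
    for _ in range(iters):
        # Membership update: mu_ki ∝ (1 / (1 − K(x_k, v_i)))^(1/(m−1))
        U = []
        for x in data:
            terms = [(1.0 / max(1.0 - K(x, v), 1e-12)) ** (1.0 / (m - 1.0))
                     for v in centroids]
            s = sum(terms)
            U.append([t / s for t in terms])
        # Centroid update: kernel-weighted mean of the samples
        for i in range(c):
            w = [U[k][i] ** m * K(data[k], centroids[i])
                 for k in range(len(data))]
            centroids[i] = sum(wk * x for wk, x in zip(w, data)) / sum(w)
    return centroids, U

data = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]   # two well-separated 1-D clusters
centroids, U = kfcm(data, [0.0, 5.2], sigma=2.0)
```

On this toy data the two centroids settle near the two cluster means, and each sample's membership is dominated by its own cluster.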

Model Regression
In this study, to avoid the complicated structures of deep neural networks, we simply employ a least-square-based method, since we focus on finding more suitable features for illuminant estimation rather than aiming for the highest performance. Some precision of model regression is sacrificed, but substantial speed and robustness of illuminant estimation are attained.
We construct the regression relationship directly from image features to illuminant ground truths as follows:

F M = L,

where L = [e_1, e_2, · · · , e_N]^T is the matrix of illumination ground truths of the training samples, M is the regression model matrix, and F is the feature matrix of all sample images. In general, the solution of this system is M = F^+ L, where F^+ is the Moore-Penrose inverse of the feature matrix F. However, some elements of the resulting M might be negative, which may deteriorate the robustness of the model. To address this problem, we seek the solution with every element of M no less than zero. Therefore, we train the regression model to minimize the L_2-norm loss between the reconstructed and ground-truth illumination by solving the non-negative least squares problem:

min_M ‖F M − L‖²_2, subject to M ≥ 0.

Therefore, for all partitions of the training set from KFCM, we obtain a series of models and construct a model set {M_1, M_2, · · · , M_K}, where K is the number of models obtained for the training dataset.
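The non-negative least squares step can be illustrated with a simple projected-gradient solver; this is a stand-in sketch for the NLSR solver (one output column, toy matrices), not the paper's implementation:

```python
# Solve min_M ||F M − L||² with M ≥ 0 by projected gradient descent
# (one output column of the regression model, for illustration).

def nnls(F, L, lr=0.01, iters=5000):
    n, d = len(F), len(F[0])
    M = [0.0] * d
    for _ in range(iters):
        # residual r = F M − L, gradient g = Fᵀ r
        r = [sum(F[k][j] * M[j] for j in range(d)) - L[k] for k in range(n)]
        g = [sum(F[k][j] * r[k] for k in range(n)) for j in range(d)]
        # gradient step followed by projection onto the constraint M ≥ 0
        M = [max(0.0, M[j] - lr * g[j]) for j in range(d)]
    return M

# Toy system whose true coefficients (2, 0.5) are non-negative and recoverable.
F = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
L = [2.0, 0.5, 2.5]
M = nnls(F, L)
```

In practice a dedicated solver (e.g., an active-set NNLS routine) would replace the gradient loop, and the full M has one such column per RGB channel.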

Fuzzy Weighting
For a test image I_in, the final output of the illuminant estimation is synthesized by fuzzy weighting the outputs of all models:

ê = Σ_{i=1}^{K} w_i · M_i(F_{I_in}),

where w_i is the weight of the i-th model M_i, and F_{I_in} denotes the extracted features of the test image I_in. For an image sample in the training set, w_i is known (it is the degree of membership for this sample); for a new test image not included in the training dataset, w_i must be computed by determining to what degree the test sample belongs to each cluster obtained via KFCM. That is, the possibility that the image falls into each cluster must be calculated. To this end, we define the variable d_{f,i} = ‖f − ν_i‖² as the squared Euclidean distance between the test image's feature vector f and the cluster centroid ν_i, (i = 1, 2, · · · , K). Then a radial basis function is constructed to represent the probability of the test image belonging to cluster i:

w_i = exp(−d_{f,i} / (2σ_f²)) / Σ_{j=1}^{K} exp(−d_{f,j} / (2σ_f²)),

where σ_f denotes the radial fall-off factor. In this way the weights are obtained, and the illuminant color of the test image can be predicted by the weighted combination above.
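The weighting step can be sketched as follows: normalized RBF weights from the squared distances to the cluster centroids, then a weighted sum of the per-model estimates. All values (centroids, per-model outputs, σ_f) are illustrative, and the model outputs are given directly as toy vectors rather than computed from regression models:

```python
# Fuzzy weighting: RBF weights from feature-to-centroid distances,
# then a weighted combination of the K per-model illuminant estimates.
import math

def fuzzy_weights(f, centroids, sigma_f=0.25):
    """Normalized radial-basis weights for each cluster."""
    d2 = [sum((a - b) ** 2 for a, b in zip(f, v)) for v in centroids]
    p = [math.exp(-d / (2 * sigma_f ** 2)) for d in d2]
    s = sum(p)
    return [x / s for x in p]

def fuse(estimates, weights):
    """Weighted combination of the per-cluster RGB illuminant estimates."""
    return tuple(sum(w * e[c] for w, e in zip(weights, estimates))
                 for c in range(3))

centroids = [(0.0, 0.0), (1.0, 1.0)]              # toy cluster centroids
estimates = [(0.9, 1.0, 0.6), (0.5, 1.0, 0.9)]    # toy outputs of M_1, M_2
w = fuzzy_weights((0.05, 0.0), centroids)         # test feature near cluster 1
e_final = fuse(estimates, w)
```

A test feature close to one centroid drives its weight toward 1, so the final estimate essentially follows that cluster's model.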

Experimental Set-Up
Dataset. The proposed framework should be verified on a large number of image examples with a wide range of color distributions and diverse illuminant characteristics. In this study, two datasets (Gehler-Shi [51] and Cube+ [15,52]) are employed as the training and testing datasets. Both datasets are suitable for the evaluation of illuminant estimation algorithms because they include typical real-world images under a variety of illuminations. In each image of the Gehler-Shi dataset, a color-checker board is placed in the lower right corner for calculating the ground-truth illumination, while in the Cube+ dataset, a SpyderCube color target is used in each scene for the same purpose. These color-checker boards and color targets are masked when the images are used in our experiments. The proposed method uses standard 3-fold cross-validation for the training and testing procedures, as is common in the learning-based illuminant estimation literature.
Implementation details. We conducted three types of experiments to test the proposed approach: (1) Experiment A, training and testing on individual image datasets from different cameras; (2) Experiment B, training and testing on an image set from an identical camera; and (3) Experiment C, training and testing on different image datasets or a combination of several image datasets. Since an image from a certain camera or dataset might contain information unique to that camera or dataset, it is necessary to perform Experiments A and B. Both experiments help investigate the selection of suitable features for illuminant estimation with our method under circumstances that rule out the influence of the camera or dataset. On the other hand, considering that training/testing across datasets is the most common situation for learning-based illuminant estimation, Experiment C is employed to validate our method's effectiveness on a combination of multiple datasets.
Our MATLAB implementation is available at https://github.com/yunhuiluo/if2ic (accessed on 28 December 2021). In the training stage, the training dataset is clustered according to the selected image features and the regression models are constructed for each cluster. Since the KFCM-based classification algorithm is not mathematically complex, it takes a very short time compared with other learning methods such as deep learning. For example, it requires approximately 12.6 s to classify and train 6 models on the Cube+ dataset. Once training is finished, the illuminant estimation process costs an average of 0.93 s for a new test image of 2601 × 1732 pixels, 48-bit depth, and PNG format; this includes feature extraction, multiple model estimation, weight computation, and the fuzzy-weighted output. All of the above runtimes were obtained on a laptop with an Intel Core i5-2450M @ 2.50 GHz CPU.
By default, we set the cluster number K = 4 when using the combination of Gehler-Shi and Cube+, and we set the fall-off factor σ_f = 0.25 mentioned in Section 3.4. We utilize the KFCM algorithm to perform image clustering based on the IIF features, with the kernel width σ set to 140. Our method requires 6.7 MB to store the classification parameters, the multiple model matrices, and some configurations. In our experiments, each RGBuv feature was represented by 53 principal component scores (the same as in [49]). Since each cluster iteration of the KFCM algorithm begins from random initial center positions, the performance of each experiment with the same configuration might differ slightly.
Evaluation Metrics. Two performance metrics widely used in the color constancy literature are employed in this paper. The first is the recovery angular error (recovery AE) [9,18], defined as the angle in degrees between the illumination's actual 3D chromaticity e_a and its estimated 3D chromaticity e_e: γ(e_a, e_e) = cos⁻¹((e_a · e_e) / (‖e_a‖ ‖e_e‖)). Here · denotes the scalar (dot) product and ‖·‖ the Euclidean norm of a vector. The second metric is the reproduction angular error (reproduction AE) [53,54], which is usually more effective than the traditional recovery AE. The reproduction AE is defined as the angle in degrees between the true color of a white surface and the reproduced color of a white surface: γ(e_c, e_u) = cos⁻¹((e_c · e_u) / (‖e_c‖ ‖e_u‖)). Here e_c denotes the white light source (true color of a white surface), e_c = [1, 1, 1]^T, and e_u indicates the reproduced color of the white surface, e_u = e_a / e_e, i.e., the actual 3D chromaticity e_a divided component-wise by the estimated 3D chromaticity e_e.
Since both the recovery and reproduction AEs are not normally distributed, the Mean and Median values, along with the Trimean, are used to assess the statistical performance of color constancy algorithms in this study. The Trimean is defined as the weighted average Trimean = (Q_1 + 2Q_2 + Q_3)/4, where Q_1 (Best 25%), Q_2, and Q_3 (Worst 25%) are the first, second, and third quartiles of the recovery or reproduction AE, respectively. In addition, the maximums of both AEs are used as extra metrics to assess the robustness of the illuminant models, along with the Q_3 (Worst 25%) metric.
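The two error metrics and the Trimean statistic can be written directly from the definitions above; a straightforward sketch with toy inputs:

```python
# Recovery AE, reproduction AE, and the Trimean statistic as defined above.
import math

def recovery_ae(e_a, e_e):
    """Angle in degrees between actual and estimated illuminant chromaticities."""
    dot = sum(a * b for a, b in zip(e_a, e_e))
    na = math.sqrt(sum(a * a for a in e_a))
    ne = math.sqrt(sum(b * b for b in e_e))
    # clamp guards against tiny floating-point overshoot past 1.0
    return math.degrees(math.acos(min(1.0, dot / (na * ne))))

def reproduction_ae(e_a, e_e):
    """Angle between white and the reproduced white e_u = e_a / e_e."""
    e_u = [a / b for a, b in zip(e_a, e_e)]
    return recovery_ae([1.0, 1.0, 1.0], e_u)

def trimean(q1, q2, q3):
    """Weighted average of the three quartiles of the error distribution."""
    return (q1 + 2 * q2 + q3) / 4.0
```

A perfect estimate gives both angular errors of zero, and orthogonal chromaticities give a recovery AE of 90 degrees.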

Experiment A: Individually Testing on Different Datasets (Within Dataset)
In this experiment, our method is independently applied to the different datasets. Tables 1 and 2 compare performance metrics between our method and some conventional algorithms on the Gehler-Shi dataset and the Cube+ dataset, respectively. To illustrate the impact of different features on illuminant estimation performance, all types of features given in Section 3.1 are investigated with the proposed method. Table 1 shows that our results (except those using CDF features) outperform most unitary algorithms on the Gehler-Shi dataset. For example, the Mean of the recovery AE is 4.66 for GGW and 4.17 for our method using ChengSF; the Trimean of the recovery AE is 3.81 for GGW and 3.41 for our method using ChengSF. This is a significant performance improvement of nearly 10% from GGW to our method using ChengSF. Table 1 also shows that our method with some combined features (i.e., RGBuv+ChengSF, RGBuv+ChengSF+IIF, and RGBuv+ChengSF+IIF+CM34) is superior to some learning-based methods, e.g., Bayesian [29], Natural Image Statistics [46], and WCS with partitioning [40]. Moreover, the proposed method with the combined features is better than most previous methods in terms of the Best 25% and Worst 25% metrics. This validates that the proposed method can achieve outstanding illuminant estimation across diverse images. Table 2 gives the results for the Cube+ dataset. Most metrics of our results in Table 2 are better than those in Table 1, since the Cube+ dataset contains more images than the Gehler-Shi dataset. The statistics of our method with the combined features are almost all better than those of one of the learning-based methods, Attention CNN [56]. These results further indicate that the proposed method can achieve better regression accuracy using optimally selected feature combinations.
It should be noted that the composite feature RGBuv+ChengSF+IIF obtains outstanding performance with our method. More comparisons of statistical metrics are provided in our GitHub repositories.

Experiment B: Testing on Image Set from Identical Camera (Within Camera)
This experiment uses images from an identical camera as the training and testing data, to verify the estimation accuracy within a camera using the proposed method. Tables 3 and 4 compare the performance metrics of some conventional methods and our method with different features within an identical camera. Here, the Canon 5D is one of the cameras used for the Gehler-Shi dataset, and the Canon EOS 550D was used for the Cube+ dataset. For the Canon 5D, 482 images are picked from the Gehler-Shi dataset, and for the Canon EOS 550D, 853 images are randomly selected from the Cube+ dataset. Standard 3-fold cross-validation is independently applied to the constructed datasets to train and test the proposed method. Tables 3 and 4 show that the proposed method using image feature regression achieves better performance than traditional methods such as WP, GW, the PCA-based method, and LSR. Among the individual features in the proposed method, the RGBuv feature achieves the best results. According to the discussion in [49], the RGBuv feature can effectively capture the color distribution of an image; the results in Tables 3 and 4 also validate this viewpoint. It can be seen that compositing several features provides better results, e.g., RGBuv+ChengSF, RGBuv+ChengSF+IIF, and RGBuv+ChengSF+IIF+CM34. For the Canon 5D, the composite RGBuv+ChengSF+IIF yields a 25.9% improvement in recovery AE Mean and a 25.4% improvement in recovery AE Trimean compared with WP. For the Canon EOS 550D, the same improvements reach 42.3% and 32.3% (ours versus GGW), respectively. In our experiments, the optimal feature combination is RGBuv+ChengSF+IIF, which obtains outstanding performance compared with other feature selections.
The combination of image features affects the illuminant estimation, since the proposed method directly maps the extracted features to the illuminant color. Beyond the results listed in Tables 3 and 4, we conducted many other experiments with more feature combinations. We find that more combined features do not necessarily mean better performance; only the correlated features are effective for illuminant estimation. Thus, it is important to find which features are more correlated to the illuminant estimation. In this study, we investigated only several features and their combinations. From the experimental results, there is reason to believe that if more effective features are found, the performance of the proposed method can be improved even further.

Experiment C: Cross-Dataset, Cross-Camera Testing
To investigate the robustness of the proposed method, cross-dataset testing should be performed on multiple datasets. In this experiment, the combination of the Gehler-Shi and Cube+ datasets is used for this testing through the 3-fold cross-validation technique. Table 5 provides the statistical metrics of the experimental results.
Since the combined dataset covers a wider range of image features and illuminant regions, the classification and models obtained by the proposed method commonly generalize better in illuminant estimation. Table 5 validates this viewpoint. It can be seen that the statistical metrics obtained for all images in the Gehler-Shi and Cube+ datasets reach good performance; e.g., the Mean and Maximum of the recovery AE are 2.13 and 14.83, respectively, for the selected optimal features RGBuv+ChengSF+IIF. Figure 2 shows the distributions of the recovery and reproduction AEs for some images from the Gehler-Shi and Cube+ datasets using some conventional methods and the proposed method. These results indicate that the proposed method works well and reaches excellent performance. It should be pointed out that the selected optimal features RGBuv+ChengSF+IIF are a prerequisite for these results.
In real life, photographers usually cannot capture images under perfect or standard illumination. A complex lighting environment results in changing brightness and contrast, and a robust color constancy algorithm should be able to cope with these challenges. Our algorithm is essentially a weight-based method in which the features more related to the illuminant color are highlighted and used to derive dynamic weights. Figure 3 shows some visual results on challenging image scenes. As shown in the figure, our algorithm with the feature combination RGBuv+ChengSF+IIF achieves better performance than several conventional methods. Consequently, the proposed method has great potential for the color constancy of a wide range of images. For more examples, please refer to the related materials in our GitHub repositories.
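Once an illuminant estimate is available, the visual results in Figure 3 are obtained by correcting the image with the von Kries diagonal model mentioned in the Introduction. A minimal sketch of that correction step (normalization convention is our assumption) is:

```python
import numpy as np

def von_kries_correct(img, est_illum):
    """Von Kries diagonal correction: divide each channel of a linear RGB
    image (H x W x 3) by the estimated illuminant component.
    The illuminant is normalized to mean 1 to roughly preserve brightness."""
    e = np.asarray(est_illum, dtype=float)
    e = e / e.mean()                      # keep the average channel gain at 1
    corrected = img / e[None, None, :]    # per-channel diagonal scaling
    return np.clip(corrected, 0.0, None)
```

For example, a scene lit by a reddish illuminant is mapped toward its appearance under neutral light; a pixel whose color equals the illuminant becomes achromatic after correction.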

Additional Experiments
In the proposed method, many factors affect the performance of illuminant estimation. Some experiments are conducted to explore these influences.
Clustering methods and cluster numbers. For the proposed method, we perform KFCM clustering by default to classify the training data. For comparison, two other clustering algorithms, i.e., K-means and the two-step method [19], are also added to the proposed framework. The reason for selecting K-means and the two-step method is that the former is one of the most commonly used clustering methods, and the latter is an effective color constancy algorithm based on cluster models. For each clustering algorithm, we can manually set different cluster numbers in order to obtain better results. Table 6 gives some statistical metrics using different clustering methods and cluster numbers. Comparing the results under different configurations, the proposed method with KFCM appears more effective than with K-means or the two-step method. One possible reason is that KFCM provides a flexible division of the training set, which enhances the robustness and generalization of the proposed method. We also find that K = 4 or 5 is an appropriate choice for our method under KFCM clustering on the combined Gehler-Shi and Cube+ dataset. Naturally, the cluster number should increase for better modeling performance if another dataset with more image samples is used.
Kernel width of KFCM. In the proposed method, the kernel width affects the clustering divisions, which results in different performances. Table 7 provides some statistical metrics on the Gehler-Shi and Cube+ datasets to show the influence of the kernel width in KFCM clustering. We find that the optimal choice of kernel width for our experiments is σ = 100~180. Commonly, the optimal values are obtained by trial and error over various tests. It should be noted that the optimal kernel width varies with the training dataset. Therefore, as an option, Particle Swarm Optimization can be used to search for the optimal kernel width.
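To make the role of the kernel width σ concrete, the following is a minimal KFCM sketch with a Gaussian kernel, where the kernel-induced distance is d²(x, v) = 2(1 − K(x, v)) because K(x, x) = 1 for the Gaussian kernel. This is an illustrative implementation, not the paper's code; the deterministic center initialization and the default σ = 140 (inside the 100~180 range above) are our choices:

```python
import numpy as np

def kfcm(X, K=4, sigma=140.0, m=2.0, iters=50):
    """Minimal kernel fuzzy c-means sketch with a Gaussian kernel of width
    sigma. X: samples (n x d). Returns fuzzy memberships U (n x K) and
    cluster centers V (K x d). Centers start from evenly spaced samples."""
    n, _ = X.shape
    V = X[np.linspace(0, n - 1, K).astype(int)].astype(float)
    for _ in range(iters):
        sq = ((X[:, None, :] - V[None, :, :]) ** 2).sum(-1)
        Kxv = np.exp(-sq / (2.0 * sigma ** 2))        # Gaussian kernel K(x, v)
        dist = np.maximum(2.0 * (1.0 - Kxv), 1e-12)   # kernel-induced d^2
        inv = dist ** (-1.0 / (m - 1.0))
        U = inv / inv.sum(1, keepdims=True)           # fuzzy membership update
        W = (U ** m) * Kxv                            # kernel-weighted center update
        V = (W.T @ X) / W.sum(0)[:, None]
    return U, V
```

A small σ makes the kernel distance saturate quickly, sharpening cluster boundaries; a very large σ makes KFCM behave like ordinary fuzzy c-means, which is consistent with the sensitivity observed in Table 7.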

Discussion and Analysis
In this paper, the experimental results show that the performance of the proposed method is on par with many statistics-based and learning-based approaches, and we have tentatively identified the feature combinations that are more correlated to illuminant estimation based on a specific feature set. In the meantime, we can draw some useful observations.
(1) The methods based on composite features generally work better than most methods based on unitary features. The key to combining features is to find a compact representation that covers a much wider range of image features and illuminant characteristics. The dimension of the features is not an obstacle: if the feature dimension is low with useful information compressed, we can use different kernel functions to map the features into a higher-dimensional space [20,48]; conversely, if the dimensionality is high with much redundant information, we can use the PCA technique to reduce it [49]. To date, experimental trial and error is still the major route to constructing artificial features for regression models of illuminant estimation. Thus, choosing an appropriate feature combination is significant for accurate and robust illuminant estimation.
(2) The methods based on feature regression clearly depend not only on the optimal feature combinations but also on the scale and characteristics of the training sets. If the training examples come from an identical camera, or from a benchmark dataset using multiple cameras, and the testing examples are taken from the same camera or dataset, the illuminant estimation accuracy is usually very high. However, if the training images come from a certain camera or multi-camera dataset while the testing examples come from another camera or dataset, i.e., cross-camera or cross-dataset testing, the accuracy is commonly worse than expected. As the scale of the training dataset increases and its illuminant range extends, a single model is no longer the preferred option for regression modeling; a multiple-model strategy with efficient classification (or clustering) becomes preferable, like the KFCM clustering and regression modeling associated with fuzzy weighting in this paper.
(3) The type of feature, i.e., features from different cues, generally affects and limits the performance of illuminant estimation. In this paper, we considered only two types of features (low-level CDF and middle-level IIF [18,38,58]). We found that the performance improvement is limited even though we seem to have an optimal feature combination. High-level scene category or semantic features, e.g., Weibull parameters with content-based features [17], 3D scene geometry [37], indoor/outdoor classification [59], etc., can also be exploited in the proposed method.
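The multiple-model strategy in observation (2) can be sketched as per-cluster non-negative least-squares regressors blended by fuzzy memberships. The following is a minimal illustration using SciPy's NNLS solver; the membership-weighted fitting and the normalization of the prediction are our assumptions, not necessarily the paper's exact formulation:

```python
import numpy as np
from scipy.optimize import nnls

def fit_cluster_models(F, L, U):
    """Fit one NNLS regressor per cluster.
    F: features (n x d), L: illuminant colors (n x 3),
    U: fuzzy memberships (n x K). Training rows are weighted by
    sqrt-membership so each cluster model emphasizes its own samples."""
    models = []
    for k in range(U.shape[1]):
        w = np.sqrt(U[:, k])
        A = w[:, None] * F
        # Solve one non-negative column of the mapping per color channel.
        M = np.column_stack([nnls(A, w * L[:, c])[0] for c in range(3)])
        models.append(M)  # d x 3, all entries >= 0
    return models

def predict(f, u, models):
    """Fuzzy-weighted prediction: blend per-cluster estimates f @ M_k by
    the test sample's memberships u (length K); return a unit vector,
    since the illuminant color is only defined up to scale."""
    est = sum(u[k] * (f @ M) for k, M in enumerate(models))
    return est / np.linalg.norm(est)
```

With a single cluster and exact linear data, the NNLS fit recovers the underlying non-negative mapping; with several clusters, the fuzzy weights interpolate smoothly between cluster-specific models instead of making a hard assignment.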

Conclusions
For regression-based illuminant estimation methods, it is uncertain which features are more efficient for regressing the illuminant color. In this research, a flexible framework is developed to integrate KFCM, NLSR, and fuzzy weighting, which can produce outstanding illuminant estimation performance. Through extensive experiments with a series of artificial features woven into this framework, it is clarified, to some extent, which features are more correlated to illuminant estimation. Moreover, we exploited an optimal feature combination based on the given feature set, which yields a significant performance improvement for illuminant estimation.
The proposed method and its framework can be used to test and compare the influences of image features on illuminant estimation. From this, we can optimally deploy the appropriate image features for illuminant estimation in a more objective way, rather than in an empirical, arbitrary manner. If the feature sets contain more feature candidates, the performance of illuminant estimation is expected to be further enhanced.
This study focuses on determining which features are more correlated to illuminant estimation within a limited feature set. Therefore, more related image features, including high-level semantic features, can be further taken into account. In addition, considering that different camera sensors carry different intrinsic and implicit cues, pre-selecting images according to camera sensor characteristics and then constructing multiple models may be another pathway to improving the proposed method.