ARiRTN: A Novel Learning-Based Estimation Model for Regressing Illumination

In computational color constancy, regressing illumination is one of the most common approaches to manifesting the original color appearance of an object in a real-life scene. However, this approach struggles with the challenge of accuracy arising from label vagueness, which is caused by unknown light sources, different reflection characteristics of scene objects, and extrinsic factors such as various types of imaging sensors. This article introduces a novel learning-based estimation model, an aggregate residual-in-residual transformation network (ARiRTN) architecture, built by combining the inception model with the residual network and embedding residual networks into a residual network. The proposed model has two parts: the feature-map group and the ARiRTN operator. In the ARiRTN operator, all splits perform transformations simultaneously, and the resulting outputs are concatenated into their respective cardinal groups. Moreover, the proposed architecture develops multiple homogeneous branches for high cardinality and an increased size of the set of transformations, which extends the network in width and in depth. As a result of experiments with the four most popular datasets in the field, the proposed architecture makes a compelling case that complexity increases accuracy. In other words, the combination of the two complicated networks, residual and inception, helps reduce overfitting, gradient distortion, and vanishing-gradient problems, and thereby contributes to improving accuracy. Our experimental results demonstrate this model's outperformance over its most advanced counterparts in terms of accuracy, as well as its robustness in illuminant invariance and camera invariance.


Introduction
Colors in a scene image tend to be biased due to unknown light sources, different reflection characteristics of scene objects, and the spectral sensitivity of diverse imaging sensors. Surprisingly, colors are perceived as constant by the human visual perception system (HVPS) despite unexpected interactions between different light sources. Color constancy is a key attribute of the HVPS that enables consistency in perceiving the original color appearance of an object under any illuminant. This feature has long drawn much attention from researchers in the computational color constancy community, as it serves as the underlying mechanism for a wide range of computer vision fields and applications. In computer vision, color constancy primarily deals with estimating the illumination color of a scene and reproducing the canonical color of scene objects. An array of approaches [1][2][3][4][5][6] rely on estimation accuracy to regress the illuminant and use the simple but effective von Kries model [7] to render the scene image. A network is designed to learn a regression mapping from a consistent illuminant label, or ground-truth dataset, and thereby perform illumination estimation. To enable networks to perform the most accurate estimation possible, it is also critical to formulate the best possible hypothesis about the illuminant [4]. This is a tough task that requires coping with appearance contradiction and label vagueness. The color appearance of a captured scene object varies significantly depending on the sensitivity of the sensor and the illumination spectrum. To reduce such influences, networks are trained as camera-specific predictors; however, this is deemed ineffective due to the challenge of data demands. Some approaches attempt to make camera-agnostic illuminant predictions and accomplish robust performance. Other approaches, as in ref. 
[5,6,8,9], have been suggested to address appearance contradiction. In the inception approaches [10][11][12][13], it is worth noting that theoretical complexity is the basis for building a highly sophisticated architecture and thereby improving estimation accuracy. Inception networks have been evolving over time [11,12]. Behind those networks is a split-transform-merge strategy. With a fixed set of receptive field sizes, the network blocks in the architecture perform transformations simultaneously, and the resultant outputs merge in a concatenating manner. They have made progress in estimation accuracy, which is attributable to the complexity of the architecture. With the number of receptive fields and their sizes tailored for transformation, the architecture handles data step by step. In this way, constructing a more sophisticated architecture has brought incremental progress in accuracy, but not innovation. This begs the question of the applicability of the network to a new or broader range of tasks or datasets. Inspired to seek the answer and make meaningful enhancements to the color constancy system, a novel learning-based estimation model is introduced, an aggregate residual-in-residual transformation network (ARiRTN) architecture, built by combining the inception model with the residual network and embedding residual networks into a residual network. The proposed model has two parts: the feature-map group and the ARiRTN operator. In the ARiRTN operator, all splits perform transformations simultaneously, and the resulting outputs are concatenated into their respective cardinal groups. The proposed architecture develops multiple homogeneous branches for high cardinality and an increased size of the set of transformations, which extends the network in width and in depth.
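For concreteness, the von Kries correction referenced above amounts to a per-channel diagonal scaling of the image by the estimated illuminant. A minimal NumPy sketch follows; the illuminant vector and image values are assumed example inputs, not values from the article:

```python
import numpy as np

def von_kries_correct(image, illuminant):
    """Apply a von Kries diagonal correction.

    image: H x W x 3 array in linear RGB, values in [0, 1].
    illuminant: length-3 estimated illuminant color.
    Each channel is divided by its illuminant component, scaled by the
    illuminant mean so overall brightness is roughly preserved.
    """
    illuminant = np.asarray(illuminant, dtype=np.float64)
    gains = illuminant.mean() / illuminant  # diagonal gain per channel
    return np.clip(image * gains, 0.0, 1.0)

# Example: a reddish illuminant estimate is neutralized.
img = np.full((2, 2, 3), 0.5)
corrected = von_kries_correct(img, [0.8, 0.5, 0.4])
```

The same three gains are applied to every pixel, which is what makes the model simple yet effective for rendering once the illuminant has been regressed.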
This article makes three key contributions, as summarized below.
(1) Creating a novel learning-based estimation model, an aggregate residual-in-residual transformation network (ARiRTN), by combining the inception model with the residual network and embedding residual networks into a residual network.
(2) Experimenting with and demonstrating the applicability of the inception model to new tasks and datasets.
(3) Achieving next-level estimation accuracy, as verified by experiments on standard, public datasets, and making a meaningful contribution to the field of computer vision color constancy.

Previous Works
The Gray-world hypothesis is at the center of traditional color constancy approaches such as GW [14] and its extended versions in refs. [15,16]. These approaches assume that a real-life scene has an achromatic mean reflectance under a neutral illuminant. The hypothesis uses low-level statistics that describe scene reflectance for achromatic scene color. It is related to the perfect-reflectance assumption [17,18], which has been used to develop White-Patch (WP) approaches. These approaches feature fast computing speeds and require a small number of free parameters. However, they are too dependent on their hypotheses to cope with situations outside the conditions of the hypothesis. Some approaches use Bayesian theory [19] to calculate the posterior distribution for the estimation of the illuminant color and scene surfaces. Bayesian theory was developed to compute the prior distribution of illuminant colors and surface reflectance. The prior distribution is the analytical result of a multivariate truncated normal distribution of the weights of a linear approach. Other approaches [20,21] classify the illuminant color space through the Bayesian framework and train the networks on the histogram frequencies of real-life scenes to generate the surface reflectance prior distributions. For illumination estimation, the approach in ref. [20] uses a prior distribution that is uniform across a subset of illuminant colors, whereas that of ref. [21] uses the empirical distribution of the training illuminant colors.
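The Gray-world estimate described above reduces to a single statistic: the illuminant is taken to be the mean color of the scene, normalized to unit length. A short NumPy sketch for reference:

```python
import numpy as np

def gray_world_illuminant(image):
    """Estimate the illuminant as the normalized mean RGB of the image.

    Under the Gray-world hypothesis, the average reflectance of a scene
    is achromatic, so any color cast in the mean is attributed to the
    illuminant.
    """
    mean_rgb = image.reshape(-1, 3).mean(axis=0)
    return mean_rgb / np.linalg.norm(mean_rgb)

# A scene with a uniform reddish cast yields a reddish estimate.
scene = np.ones((4, 4, 3)) * np.array([0.6, 0.4, 0.2])
est = gray_world_illuminant(scene)
```

This is the kind of low-level statistic that makes GW-style methods fast and parameter-free, and also what makes them fragile when the achromatic-mean assumption fails.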
In fully supervised works, learning-based approaches [22,23] encompass combinational and direct methods, and their dependence on hand-crafted image features results in performance constraints. Recently, color constancy approaches based on fully supervised convolutional neural networks (CNNs) have made remarkable progress in estimation accuracy. They use either local patches [23,24] or the entire image as input [6,[25][26][27][28][29][30]. From a color classification perspective, some approaches, including convolutional color constancy [24] and its extended version, the fast Fourier color constancy approach [9], use a color space on which a histogram shift is used to verify image re-illumination. As a result, they achieve successful and efficient estimation of diverse illumination candidates. The approach in ref. [31] employs K-means clustering to gather illuminations from datasets and adopts a CNN to perform a classification task. Here, the input is a single pre-white-balancing image and the output is a K-class probability, with each class of illuminants predicted by the K-means cluster accounting for the rendered image.
Finally, the approach in ref. [32] adopts two CNNs for multi-device training: one carries out a sensor-independent linear transformation with a 3 × 3 receptive field size and maps the RGB color images into a canonical color space, while the other produces the estimated illumination. This approach trains on a variety of datasets, excluding those captured by the test imaging device, and arrives at a successful result. Ref. [33] achieves imaging device invariance by using samples across diverse imaging devices and datasets in a meta-learning framework. The approach in ref. [34] assumes that standard RGB images gathered from websites are well white-balanced images. They undergo de-gamma correction for inverse tone mapping, and a CNN is used to pick achromatic pixels for illumination estimation. These images were taken with unknown imaging devices and processed with diverse ISP pipelines; therefore, they might have already been manipulated by unknown software. Nevertheless, this approach makes incremental progress rather than an innovative leap. To take estimation accuracy to the next level, this article introduces a novel learning-based estimation model, an aggregate residual-in-residual transformation network (ARiRTN) architecture, built by combining the inception model with the residual network and embedding residual networks into a residual network. The proposed model has two parts: the feature-map group and the ARiRTN operator. In the ARiRTN operator, all splits perform transformations simultaneously, and the resulting outputs are concatenated into their respective cardinal groups. The proposed architecture develops multiple homogeneous branches for high cardinality and an increased size of the set of transformations, which extends the network in width and in depth. The next section elaborates on the proposed approach in more detail.

The Proposed Method
Over the last several decades, the inception approach has demonstrated that complexity increases accuracy when architectures are carefully designed. Inception networks have evolved over time, and their key feature is a split-transform-merge strategy. In their architecture, the network blocks perform transformations simultaneously with a set of specialized receptive fields, and the resulting outputs merge in a concatenating manner. As a result, inception networks bring improved accuracy, which is attributable to their structural complexity. Inspired by the inception network, and to take estimation accuracy to the next level, a novel learning-based estimation model is introduced, an aggregate residual-in-residual transformation network (ARiRTN) architecture, built by combining the inception model with the residual network and embedding residual networks into a residual network. The proposed model has two parts: the feature-map group and the ARiRTN operator. The subsections that follow discuss the proposed architecture in detail.
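The split-transform-merge strategy can be sketched in a few lines: the same input feeds several parallel transformations, and their outputs are concatenated. The toy 1-D branches below are illustrative stand-ins for convolutional branches with different receptive fields:

```python
import numpy as np

def split_transform_merge(x, transforms):
    """Apply each transformation to the input in parallel (as inception
    blocks do with branches of different receptive fields), then merge
    the branch outputs by concatenation along the feature axis."""
    outputs = [t(x) for t in transforms]
    return np.concatenate(outputs, axis=-1)

rng = np.random.default_rng(0)
x = rng.standard_normal(8)
# Stand-ins for convolutional branches of different receptive fields.
branches = [lambda v: v * 2.0, lambda v: v + 1.0, np.tanh]
y = split_transform_merge(x, branches)
```

The merged output carries every branch's view of the input, which is the structural source of the accuracy gains the text attributes to inception-style complexity.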

Cardinal Groups of ARiRTN
Cardinal groups are formed by separating features into feature-map groups using a cardinality hyper-parameter K, as in ResNeXt [35]. A radix hyper-parameter, R, expresses the number of splits within a cardinal group. Hence, the total number of feature-map groups is G = KR. Supposing that the cardinal groups have a series of transformations {F_1, F_2, F_3, ..., F_G}, each split is represented as

U_i = F_i(X), for i ∈ {1, 2, 3, ..., G}.

As in refs. [36,37], an integral representation of a cardinal group is obtained through an element-wise summation across its splits. Suppose that Û^k ∈ R^{H×W×C/K}, for k ∈ {1, 2, 3, ..., K}, with H, W, and C referring to the output feature-map sizes; the k-th cardinal group is represented as

Û^k = Σ_{j=R(k−1)+1}^{Rk} U_j.

With channel-wise statistics embedded in the architecture, global contextual information s^k ∈ R^{C/K} is obtained through a global average pooling operation over the spatial dimensions. Hence, the c-th component is computed as follows [38]:

s^k_c = (1 / (H × W)) Σ_{i=1}^{H} Σ_{j=1}^{W} Û^k_c(i, j).

Each feature-map channel is then created as a weighted combination over the splits, so the weighted representation of each cardinal group, V^k ∈ R^{H×W×C/K}, is gathered via channel-wise soft attention. Let a^k_i(c) represent a soft assignment weight, and let the mapping g^c_i decide the weight of each split for the c-th channel based on the global context s^k; the c-th channel is described as follows:

V^k_c = Σ_{i=1}^{R} a^k_i(c) U_{R(k−1)+i}(c),

where

a^k_i(c) = exp(g^c_i(s^k)) / Σ_{j=1}^{R} exp(g^c_j(s^k)).

The cardinal groups are then concatenated along the channel dimension as V = concat{V^1, V^2, V^3, ..., V^K}. Supposing that the input and output feature maps have the same shape, the proposed architecture ultimately generates the output, Y, using a skip connection, described as Y = V + X. Further, a transformation, T, can be adopted to modify the shortcut as follows: Y = T(X) + V.
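The aggregation above can be made concrete in a short NumPy sketch following the split-attention formulation of ref. [38]. The linear mappings standing in for g^c_i, and all shapes and values, are illustrative assumptions rather than the article's actual layers:

```python
import numpy as np

def softmax(z, axis):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def split_attention(splits, K, R, dense_weights):
    """Aggregate G = K * R split feature maps into K cardinal groups.

    splits: array of shape (G, H, W, Ck), the transformed splits U_i.
    dense_weights: list of K arrays of shape (Ck, R, Ck), a linear
        stand-in for the learned mappings g_i^c that turn the pooled
        descriptor s^k into per-split attention logits.
    Returns the concatenated output V of shape (H, W, K * Ck).
    """
    G, H, W, Ck = splits.shape
    assert G == K * R
    groups = []
    for k in range(K):
        U = splits[k * R:(k + 1) * R]          # the R splits of group k
        U_hat = U.sum(axis=0)                  # element-wise summation
        s = U_hat.mean(axis=(0, 1))            # global average pooling, (Ck,)
        logits = np.einsum('c,crd->rd', s, dense_weights[k])   # (R, Ck)
        a = softmax(logits, axis=0)            # soft assignment per channel
        V_k = np.einsum('rhwc,rc->hwc', U, a)  # channel-wise weighted sum
        groups.append(V_k)
    return np.concatenate(groups, axis=-1)     # concat along channels

rng = np.random.default_rng(1)
K, R, H, W, Ck = 2, 2, 4, 4, 3
splits = rng.standard_normal((K * R, H, W, Ck))
weights = [rng.standard_normal((Ck, R, Ck)) for _ in range(K)]
V = split_attention(splits, K, R, weights)
```

With zero attention logits the softmax becomes uniform and each group reduces to the plain average of its splits, which is a useful sanity check on the formulation.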

Efficient Implementation of the Proposed ARiRTN Architecture
The previous subsection described the layout of cardinality-major implementation, in which the feature-map groups with the same cardinality index are placed next to one another. Cardinality-major implementation is simple and intuitive, but it is challenging to modularize and accelerate with standard CNN operators. To address this challenge, radix-major implementation is adopted for the proposed architecture. Figure 1 presents the proposed ARiRTN architecture with radix-major implementation. The feature map is separated into RK groups, each carrying a cardinality index and a radix index, and the groups with the same radix index are placed next to one another. An add operation is conducted across the splits. Finally, the feature-map groups are concatenated in the order of their cardinal numbers; feature-map groups with identical cardinality indexes merge through the concatenation operation, but not those with different radix indexes. Following the global pooling layer, the K successive cardinal groups are summed to estimate the attention weights for each split, as shown in Figure 1.
Figure 2 illustrates the Bottleneck Residual Block (BR-Block) and the Dense Selective Kernel Block (DSKB) shown in Figure 1. The BR-Block is a variant of the residual block that uses a 1 × 1 convolution to create a bottleneck intended to reduce the number of parameters; the 1 × 1 convolution amounts to a per-pixel matrix multiplication. The DSKB, introduced in ref. [30], is composed of Selective Kernel Convolutional Blocks (SKCBs). The input of the l-th SKCB is made up of the feature maps of all its preceding SKCBs, which have undergone the split, fuse, and select operations. The SKCB has the advantage of adjusting the receptive field size to the changing intensity of the input stimuli. Accordingly, the proposed ARiRTN architecture is expected to obtain stability and robustness in regressing the illuminant. Furthermore, the architecture has potential for broader use in a variety of deep learning applications, as it keeps up with the latest network configuration trends.
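As a rough illustration of the BR-Block idea, the sketch below implements the 1 × 1 convolutions as per-pixel matrix multiplications around a skip connection. The spatial convolution, batch normalization, and all weight values of a full bottleneck block are omitted or assumed here:

```python
import numpy as np

def conv1x1(x, w):
    """A 1 x 1 convolution is a per-pixel matrix multiplication across
    channels: (H, W, C_in) x (C_in, C_out) -> (H, W, C_out)."""
    return np.einsum('hwc,cd->hwd', x, w)

def bottleneck_residual_block(x, w_down, w_up):
    """Sketch of a BR-Block: squeeze channels with a 1 x 1 convolution,
    apply a nonlinearity, restore channels with another 1 x 1
    convolution, and add the skip connection (Y = T(X) + V in the
    text's notation, with T the identity here)."""
    v = np.maximum(conv1x1(x, w_down), 0.0)  # squeeze + ReLU
    v = conv1x1(v, w_up)                     # expand back to C_in
    return x + v                             # skip connection

rng = np.random.default_rng(2)
x = rng.standard_normal((8, 8, 16))
w_down = rng.standard_normal((16, 4)) * 0.1  # 16 -> 4 bottleneck
w_up = rng.standard_normal((4, 16)) * 0.1
y = bottleneck_residual_block(x, w_down, w_up)
```

With 16 channels squeezed to 4, the two 1 × 1 layers use 16 × 4 + 4 × 16 = 128 weights versus 256 for a single 16 × 16 channel mixing, which is the parameter reduction the bottleneck targets.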

Experimental Results and Evaluations
This section discusses the experimental results and evaluations. The proposed ARiRTN architecture was evaluated on public, standard datasets containing a great number of diverse images taken under a multitude of illumination conditions: the Gehler and Shi illuminant dataset [21] of 568 images capturing a considerable variety of indoor and outdoor scenes; the Gray-ball dataset [39] of 11,340 illuminant images of diverse scenes; and the Cube+ illuminant dataset [40] of 1365 images of different scenes, with their illuminant colors known and additional semantic data used to improve the training process towards greater progress in estimation accuracy.
The proposed ARiRTN architecture is implemented with machine learning code in TensorFlow [41] and runs on an NVIDIA TITAN RTX (24 GB). The total training time is 1 day and 11 h for 10 K epochs. In addition to resizing each image to 227 × 227 pixels, the network is set up with an input batch size of 16. The parameters are optimized through several experiments on the Gehler and Shi illuminant dataset. Figure 3 shows that the proposed ARiRTN architecture tends to converge to zero training loss. Here, with a weight decay of 5 × 10^−5 and a momentum of 0.9, several training loss curves are compared to determine the optimal initial learning rate for the proposed ARiRTN architecture. As highlighted in the previous section, a prominent feature of the proposed architecture is its use of the BR-Block and DSKB, as opposed to their counterparts in conventional network structures, the plain CNN and Dense network, both built from 1 × 1 convolutional layers. The proposed ARiRTN architecture employs the BR-Block and DSKB to grow in complexity and increase in width and in depth. As a result, the proposed architecture makes meaningful improvements in estimation accuracy. Figures 4 and 5 depict the comparisons between the BR-Block and CNN, and between the DSKB and Dense network, by calculating their median and average angular errors on a logarithmic scale; the Gehler and Shi dataset is used for both training and cross-validation, and the errors are recorded at intervals of 20 epochs.
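The reported optimizer settings (momentum 0.9, weight decay 5 × 10^−5) correspond to a classical SGD-with-momentum update. A minimal sketch of one such update follows; the learning rate is an assumed placeholder, since the article selects it experimentally (Figure 3):

```python
import numpy as np

def sgd_momentum_step(w, grad, velocity, lr=1e-3, momentum=0.9,
                      weight_decay=5e-5):
    """One SGD step with momentum 0.9 and L2 weight decay 5e-5,
    matching the hyper-parameters reported for training ARiRTN.
    The learning rate is an illustrative placeholder."""
    velocity = momentum * velocity + grad + weight_decay * w
    w = w - lr * velocity
    return w, velocity

# A single update on a toy 2-parameter weight vector.
w = np.array([1.0, -2.0])
v = np.zeros_like(w)
w, v = sgd_momentum_step(w, grad=np.array([0.5, -0.5]), velocity=v)
```

Weight decay adds a small pull towards zero on every step, one of the regularization mechanisms behind the reduced overfitting the article attributes to its training setup.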

The next experiments use several standard datasets, the Cube+, Gray-ball, and MultiCam datasets, to compare the performance of the proposed ARiRTN architecture against its most advanced counterparts [42][43][44][45][46][47][48][49][50][51]. In recent decades, the CNN architecture has played an integral role in advanced computer vision tasks, including regressing illuminants; however, this approach has struggled with the challenge of accuracy arising from label vagueness caused by unknown light sources, different reflection characteristics of scene objects, and extrinsic factors such as various types of imaging sensors. As discussed above, the proposed ARiRTN architecture is designed to overcome this limitation of the conventional CNN architecture.
Figure 6 displays the resulting images at each step delivered by the proposed ARiRTN architecture with the Gehler and Shi illuminant dataset: (a) the original input image, (b) the estimated illuminant image, (c) the ground truth image, and (d) the rendered image, which manifests the real-scene colors without an undesired illuminant cast. Table 1 summarizes the comparative analysis between multiple conventional approaches and the proposed ARiRTN architecture in terms of the mean, median, trimean, best 25%, and worst 25% angular errors; the results highlight the extraordinary performance of the proposed ARiRTN architecture compared to its latest counterparts. Table 2 summarizes the test results that evaluate the proposed ARiRTN architecture against its conventional counterparts, and highlights that the proposed architecture significantly outperforms them, topping the state-of-the-art approaches in terms of estimation accuracy. Tables 1 and 2 demonstrate the robust illuminant invariance of the proposed ARiRTN architecture. Table 3 summarizes the test results that evaluate the proposed ARiRTN architecture against its conventional counterparts in terms of the angular errors of inter-camera estimation, using the MultiCam dataset, which consists of 1365 outdoor images captured using a Canon 550D camera. The results demonstrate that the proposed ARiRTN architecture surpasses its conventional counterparts in inter-camera estimation, proving its robustness in terms of illuminant and imaging device invariance. Bold numbers denote the results of the proposed method.
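The tables summarize the recovery angular error between estimated and ground-truth illuminants using the field's standard statistics. A hedged NumPy sketch of how these metrics are commonly computed follows; the trimean convention and the sample vectors are assumptions for illustration:

```python
import numpy as np

def angular_error_deg(est, gt):
    """Angular error (degrees) between estimated and ground-truth
    illuminant vectors, computed row-wise after normalization."""
    est = est / np.linalg.norm(est, axis=-1, keepdims=True)
    gt = gt / np.linalg.norm(gt, axis=-1, keepdims=True)
    cos = np.clip((est * gt).sum(axis=-1), -1.0, 1.0)
    return np.degrees(np.arccos(cos))

def summarize(errors):
    """Mean, median, trimean, best-25% mean, worst-25% mean,
    the statistics reported in Tables 1-3."""
    e = np.sort(np.asarray(errors))
    q1, med, q3 = np.percentile(e, [25, 50, 75])
    n4 = max(1, len(e) // 4)
    return {
        'mean': e.mean(),
        'median': med,
        'trimean': (q1 + 2 * med + q3) / 4,
        'best25': e[:n4].mean(),
        'worst25': e[-n4:].mean(),
    }

est = np.array([[1.0, 1.0, 1.0], [1.0, 0.0, 0.0]])
gt = np.array([[1.0, 1.0, 1.0], [0.0, 1.0, 0.0]])
errs = angular_error_deg(est, gt)
```

Because the error depends only on the direction of the illuminant vectors, lower values mean higher accuracy, as noted in the table captions.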

Conclusions
In computational color constancy, regressing illumination is a classical approach to manifesting the original color appearance of an object in a real-life scene. However, this approach has struggled with the challenge of accuracy arising from label vagueness, which is caused by unknown light sources, different reflection characteristics of scene objects, and extrinsic factors such as various types of imaging sensors. This article introduced a novel learning-based estimation model, an aggregate residual-in-residual transformation network (ARiRTN) architecture, built by combining the inception model with the residual network and embedding residual networks into a residual network. The proposed model has two parts: the feature-map group and the ARiRTN operator. In the ARiRTN operator, all splits perform transformations simultaneously, and the resulting outputs are concatenated into their respective cardinal groups. Moreover, the proposed architecture develops multiple homogeneous branches for high cardinality and an increased size of the set of transformations, which extends the network in width and in depth. Comparative experiments were conducted using the four most popular datasets in the field: Shi's dataset, the Cube+ dataset, the Gray-ball dataset, and the MultiCam dataset. The proposed architecture makes a compelling case that complexity increases accuracy by demonstrating outstanding progress in estimation accuracy compared with its previous counterparts. The combination of the two complicated networks, residual and inception, is shown to help reduce overfitting, gradient distortion, and vanishing-gradient problems, thereby improving accuracy. These experimental results support this model's outperformance over its most advanced counterparts in terms of accuracy, as well as its robustness in illuminant invariance and camera invariance. Nevertheless, it remains meaningful and worthwhile to continue striving towards more advanced learning-based illuminant estimation models and to take color constancy to new heights.

Figure 1. The proposed ARiRTN architecture with radix-major implementation; the feature-map groups, grouped by radix index and cardinality, sit next to one another.

Figure 2. The Bottleneck Residual Block (BR-Block) and the Dense Selective Kernel Block (DSKB) used in the proposed ARiRTN architecture of Figure 1.


Figure 3. Comparison of initial learning rates by calculating their training losses to find the one that best fits the proposed architecture.


Figure 5. Performance comparison between the DSKB and Dense network by calculating (a) median angular errors and (b) average angular errors.


Figure 6. The resulting images at each step delivered by the proposed ARiRTN architecture: (a) the original input image, (b) the estimated illuminant image, (c) the ground truth image, and (d) the rendered image.

Table 1. Comparison of angular errors between multiple conventional approaches and the proposed ARiRTN architecture on the Cube+ dataset (lower values mean higher accuracy).

Table 2. Comparison of angular errors between the proposed ARiRTN architecture and the conventional approaches on the Gray-ball dataset.

Table 3. Comparison of inter-camera estimation angular errors between the proposed ARiRTN architecture and the conventional approaches on the MultiCam dataset.