Image colorization methods can be divided into traditional methods and deep learning methods. Traditional methods [4,5] suffer from poor colorization quality, heavy computation, and considerable manual interaction. In recent years, with the development of artificial intelligence technology, deep learning methods [6,7] driven by large-scale data have provided a new way to overcome these drawbacks. Colorization methods can also be divided into scribble-based methods, exemplar-based (reference image-based) methods, and automatic methods.
2.1. Scribble-Based Methods
2.1. Scribble-Based Methods
Scribble-based image colorization uses simple user-drawn scribbles to guide the process of adding color to black-and-white images. This allows greater control and precision, as users can directly indicate which areas should receive specific hues. Levin et al. [8] proposed the first scribble-based colorization method. Huang et al. [9] proposed an adaptive edge detection colorization method based on Sobel filters to prevent color overflow at edges. Yatziv et al. [10] proposed a chrominance fusion colorization method, which computes the distance between each pixel and multiple scribbles and then determines the pixel color as a distance-weighted sum of the scribble colors. Compared with the method of Levin et al. [8], Yatziv et al. [10] achieved lower time and computational complexity, although color overflow may still occur in regions with weak image edges. Kim et al. [11] improved upon Yatziv et al.'s method by introducing a data-driven distance measure based on a novel restart random walk [12], ensuring more consistent edge colors. Scribble-based methods require manual participation and demand a certain level of color perception from users, imposing relatively strict preconditions on their use. Therefore, as deep learning-based colorization methods have emerged, research on scribble-based methods has gradually declined.
2.2. Exemplar-Based Methods
Exemplar-based image colorization methods utilize reference images to add color to black-and-white or grayscale images. These methods rely on finding similar exemplars in a database of color images and using their color information to guide the colorization process. Ironi et al. [13] incorporated image segmentation information into the colorization process and used domain matching algorithms to assign colors from the reference image to each pixel; however, when lighting conditions differ significantly between the reference and target images, the colorization effect is poor. To address this issue, Liu et al. [4] proposed an intrinsic colorization method: an image is first decomposed into reflectance and illumination components, the reflectance from the reference image is combined with the illumination from the grayscale image to generate a preliminary color image, and finally a subset of pixels is extracted from this color image as color scribbles, to which the method of Levin et al. [8] is applied. Xu et al. [14] proposed a fast instance colorization network based on stylization to achieve spatial consistency and improve colorization quality. Welsh et al. [15] transferred color information from a reference image to a gray image by matching brightness and texture features; however, this local matching often lacks spatial coherence, resulting in suboptimal colorization. Compared with scribble-based methods, exemplar-based methods reduce manual involvement by introducing reference images, but their results depend heavily on the reference image: if there is a significant visual discrepancy between the two images, colorization quality may degrade. Additionally, the number of reference images is a key factor; using too few may lead to overfitting.
2.3. Automatic Image Colorization Methods
Automatic image colorization methods aim to add color to black-and-white or grayscale images without user guidance. These methods utilize algorithms and machine learning techniques to analyze the content of an image and predict appropriate colors for different objects, backgrounds, and lighting conditions. By leveraging large datasets of color images, they learn patterns and relationships between the elements of an image, allowing them to apply color accurately with minimal human intervention. With the advent of deep learning, various techniques have been proposed for automatic colorization; in particular, methods based on well-designed deep convolutional neural networks (CNNs) have largely surpassed traditional approaches [16,17,18,19,20]. Cheng et al. [21] proposed the first neural network-based colorization method, extracting feature information from different regions of the image as input to the network and then using joint bilateral filtering to eliminate image artifacts. Wu et al. [22] proposed a method for colorizing remote sensing images based on a deep convolutional generative adversarial network. Wang et al. [23] proposed an automatic colorization framework for Thangka sketches that responds accurately to user selections. The model of Cheng et al. [21] relies on manually designed features, which prevents end-to-end training. Therefore, in the method of Iizuka et al. [1], grayscale images are used directly as inputs to the neural network and the predicted chrominance channels as outputs; a second network extracts global information from the image, which is fused with the chrominance branch to give the model a better understanding of the overall semantics, improving the colorization effect and alleviating color overflow. Because this method takes grayscale images directly as input, colorization time is also reduced. Deshpande et al. [3] used a variational autoencoder to learn a low-dimensional embedding of colors and combined it with a mixture density network to produce diverse colorization results. Isola et al. [24] proposed an image-to-image translation method based on the pix2pix network, which is also suitable for colorization tasks. Yoo et al. [25] proposed a memory-augmented colorization model focused on small-sample colorization, which maintains high-quality results when datasets are limited. Xia et al. [26] proposed a dual-branch colorization network comprising a color modeler that predicts anchor-point colors to represent the color distribution and a color generator that predicts pixel colors by referencing the sampled anchor points. Zhong et al. [27] proposed a grayscale enhancement colorization network (GECNet) that bridges the modality gap by retaining the structure of the colorized image, which contains rich information.
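The CNN-based methods above share a common data flow in CIE Lab color space: the grayscale image serves directly as the L channel, a learned model predicts the two chrominance channels, and the three channels are reassembled into a color image. The sketch below shows only this flow; the `predict_ab` argument and the neutral stand-in model are illustrative placeholders for a trained network, not any cited method's architecture:

```python
def colorize(gray_L, predict_ab):
    """Assemble a Lab image from a grayscale input and a chrominance
    predictor: gray_L is an H x W grid of luminance values, and
    predict_ab maps it to a pair of H x W grids (a, b). In practice
    predict_ab is a trained CNN; here any callable works.
    """
    a_img, b_img = predict_ab(gray_L)
    h, w = len(gray_L), len(gray_L[0])
    return [[(gray_L[y][x], a_img[y][x], b_img[y][x])
             for x in range(w)] for y in range(h)]

def neutral_ab(L_img):
    """Stand-in 'model': predicts zero (neutral) chrominance everywhere,
    which reproduces the grayscale image in Lab form."""
    zeros = [[0.0] * len(row) for row in L_img]
    return zeros, [row[:] for row in zeros]
```

Predicting only two channels from one, rather than three RGB channels, is what makes the luminance of the output exactly match the input and confines the learning problem to chrominance.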
In recent years, several studies on colorization using residual U-Nets have achieved considerable performance [28,29,30,31]. Sharma et al. [28] proposed Robust Image Colorization using a Self-attention-based Progressive Generative Adversarial Network (RICSPGAN), which cascades a residual encoder-decoder (RED) network with a self-attention-based progressive GAN (SP-GAN) to perform denoising and colorization. Kumar et al. [29] presented a parallel GAN-based framework that uses GANs tailored to colorize the foreground (using object-level features) and background (using full-image features) independently and performs unbalanced GAN training. Guo et al. [30] designed a novel bilateral Res-U-Net-based GAN whose generator transfers color features across both sides of the encoder. Liu et al. [31] proposed an efficient anime sketch colorization method using a swish-gated residual U-Net (SGRU) and a spectrally normalized GAN (SNGAN) to address low-quality colorization results. These residual U-Net-based methods have made remarkable progress in image colorization, but they still have limitations in feature fusion and detail preservation. Our proposed method differs from these works in its strategic integration of a shallow feature extraction module (SFEB) with a residual attention U-Net and in its feature fusion mechanism, which strengthens shallow features to enhance detail preservation, effectively addressing the shortcomings of existing residual U-Net colorization methods.
In addition, with the rapid development of deep learning, transformer-based and diffusion-based colorization methods have become new research hotspots and achieved excellent performance. For example, DDColor [32] proposed a dual-decoder structure to achieve photo-realistic colorization; BigColor [33] utilized a generative color prior to improve colorization quality; and L-CAD [34] introduced language-based colorization with diffusion priors. These methods perform well in some scenarios but often have higher computational complexity and require more computing resources. Our residual attention U-Net-based method achieves a better balance between colorization quality and computational efficiency while remaining competitive with these recent advanced methods.
Although deep learning methods can colorize images automatically, models are strongly affected by the color tones of the training dataset, which can lead to unsatisfactory results such as detail loss, color dimming, and unrealistic colors. To ensure robust and generalizable models, it is important to consider the diversity of color tones in the dataset, employ data augmentation, normalize or standardize the color distribution, and carefully select and tune the model. Addressing these aspects improves performance and reliability across real-world scenarios. This paper proposes a deep learning colorization method based on a residual attention U-Net, which preserves the details of the colorized image and improves the colorization effect.