Comparative Analysis of Color Space and Channel, Detector, and Descriptor for Feature-Based Image Registration

The current study aimed to quantify the value of color spaces and channels as potential replacements for standard grayscale images, as well as the relative performance of open-source detectors and descriptors for general feature-based image registration, based on a large benchmark dataset. The public dataset UDIS-D, with 1106 diverse image pairs, was selected. In total, 21 color spaces or channels, including RGB, XYZ, Y′CrCb, HLS, and L*a*b* and their corresponding channels in addition to grayscale; nine feature detectors, including AKAZE, BRISK, CSE, FAST, HL, KAZE, ORB, SIFT, and TBMR; and 11 feature descriptors, including AKAZE, BB, BRIEF, BRISK, DAISY, FREAK, KAZE, LATCH, ORB, SIFT, and VGG, were evaluated according to reprojection error (RE), root mean square error (RMSE), structural similarity index measure (SSIM), registration failure rate, and feature number, based on 1,950,984 image registrations. No meaningful benefits from color spaces or channels were observed, although the XYZ and RGB color spaces and the L* color channel outperformed grayscale by a very minor margin. Per the dataset, the best-performing color spaces or channels, detectors, and descriptors were XYZ/RGB, SIFT/FAST, and AKAZE. The most robust color space, detector, and descriptor were L*a*b*, TBMR, and VGG. The color channels, detector, and descriptor with the most initial detector features and final homography features were Z/L*, FAST, and KAZE. In terms of the best overall unfailing combinations, XYZ/RGB + SIFT/FAST + VGG/SIFT seemed to provide the highest image registration quality, while Z + FAST + VGG provided the most image features.


Introduction
Image registration is the computer vision task of aligning images of a common scene that differ due to their geometry or photometry conditions. Commonly, image registration is regarded as a component of a closely related, if not interchangeable, concept, image stitching, which also involves image blending to create seamless stitched panoramas [1]. The core objective of image registration is to establish spatial correspondences between different images, allowing for the fusion of data from various sources or time points. Commonly, image registration algorithms are categorized into area-based and feature-based methods, although alternative classifications based on registered image types or image transformation types exist [2], given the complexity and diversity of different image registration approaches. Area-based methods rely on comparing and correlating pixel intensity patterns or statistical properties between corresponding image regions for optimization [3], while feature-based methods rely on detecting and matching landmarks or keypoints to estimate geometrical image transformations [4]. As researchers across all disciplines collectively embark on a new era of artificial intelligence, deep learning techniques have also been successfully applied to various image registration tasks such as feature extraction, descriptor matching, homography estimation, etc. [2,5,6]. Several prior studies have compared classical features empirically. One evaluation compared SIFT, SURF, KAZE, AKAZE, ORB, and BRISK, treating each as both detector and descriptor. It found that ORB detected the greatest number of features, while KAZE detected the fewest; ORB had the lowest computation cost for feature detection and description, while KAZE had the highest; and SIFT was the most accurate feature under scale, rotation, and affine image variations overall. Sharma et al. 
[4] analyzed 85 detector and descriptor combinations based on three pairs of images, involving GFTT, SIFT, MSER, FAST, SURF, CSE, BRIEF, DAISY, AGAST, BRISK, ORB, FREAK, KAZE, AKAZE, and MSD. Their evaluation metrics included peak signal-to-noise ratio (PSNR), SSIM, the feature similarity indexing method (FSIM), and the visual saliency-induced index (VSI). The AKAZE detector with the AKAZE descriptor was identified as the best combination, outperforming all the other combinations. Wu et al. [57] compared SIFT with its variants PCA-SIFT, GSIFT, CSIFT, SURF, and ASIFT under scale and rotation, blur, illumination, and affine changes based on four pairs of images. They qualitatively concluded that SIFT and CSIFT performed the best under scale and rotation change; GSIFT performed the best under blur and illumination changes; and ASIFT performed the best under affine change. Ihmeida and Wei [49] created two datasets out of the same three remote sensing image pairs, and analyzed SIFT, SURF, ORB, BRISK, KAZE, and AKAZE as simultaneous feature detectors and descriptors based on inlier match number, computation cost, and feature inlier ratio. They discovered that SIFT provided the highest accuracy, while ORB was the fastest algorithm.
In addition to the inconsistent conclusions on the best-performing feature detectors and descriptors, several knowledge gaps still exist in the current literature regarding the general feature-based image registration procedure. First, before feature detection, color images are typically converted into grayscale for computation efficiency [58]. However, the value of the color information contained in various color spaces pertaining to image registration has never been examined and quantified. Second, image registration quality is evaluated using diverse methods in the current literature, e.g., subjective visual inspection and rating of parallax error, perspective distortion, viewing ease and comfort, etc. [59][60][61][62][63], as well as objective indices such as FSIM [4], mutual information (MI) [64], normalized cross-correlation (NCC) [64], PSNR [4], RE [55], root mean square error (RMSE) [65], SSIM [4,55], stereoscopic stitched image quality assessment (S-SIQA) [66], the universal image quality index (UIQI) [67], VSI [4], etc. Yet, the relationships or agreements between different evaluation metrics have not been investigated. Third, existing studies usually concentrate on quantifying image registration accuracy and speed when comparing various image features; however, feature robustness, i.e., reliability in successfully registering multiple image pairs without failure, is a rarely discussed aspect. Finally, an effort comparing multiple image feature detectors and descriptors based on a large dataset is simply missing in current computer vision research, as many prior comparative studies tended to rely on only a few image pairs, potentially leading to biased conclusions.
To address the aforementioned lacunae in the existing knowledge base, the current study leveraged a dedicated open-source dataset and library for image registration, with consideration given to practicality and replicability for future researchers. In terms of image registration accuracy, robustness, and feature numbers, the performance of selected color spaces and their corresponding color channels, feature detectors, and feature descriptors was quantified and presented in this article. Recommendations on the best overall combinations of color space or color channel, feature detector, and feature descriptor are also provided at the end of the study.

Dataset
The public dataset UDIS-D [68] was selected for the study due to its accessibility, substantial size, and image diversity. UDIS-D, proposed by Nie et al., is the first large real-world benchmark dataset for image registration, and it includes diverse scene conditions such as indoor, outdoor, night, dark, snow, and zooming, with different levels of image overlap and parallax. In particular, UDIS-D includes two subsets: a training subset containing 10,440 image pairs and a testing subset containing 1106 image pairs, all at a 512 × 512 resolution. Only the testing subset was used in the current study, as it has a sufficiently large dataset size while preserving the same data diversity as the training subset (Figure 1).


Selected Color Spaces and Channels
Color space refers to a specific organization of colors, which allows for the representation of colors in a numerically and visually meaningful way. In feature-based image registration, image color information, most commonly represented in the red-green-blue (RGB) color space, is usually discarded by converting RGB images into grayscale images for more efficient image feature detection and description. For the current study, five color spaces widely adopted in computer vision research were chosen to examine the value of color information in the context of image registration: RGB, XYZ, Y′CrCb, HLS, and L*a*b* [69]. For each color space, in addition to utilizing all matched features from all three color channels, the usefulness of individual color channels as potential superior replacements for standard grayscale conversion was also investigated, as color channels are grayscale images themselves. In total, each pair of raw images from the dataset corresponded to 1 grayscale + 5 three-channel color spaces + 5 × 3 single color channels = 21 versions of color space or channel for feature detection, description, matching, and filtering during registration (Figure 2). All image color space conversion operations were completed using the open-source library OpenCV version 4.9.0 without any additional image processing steps before or after the conversions; the mathematical expressions of the conversions can be found in [70].


Selected Detectors and Descriptors
The selection of image feature detectors and descriptors for the study was based on the following considerations, aimed at facilitating the replication of the study results and ensuring practical benefits from the study conclusions: the chosen detectors and descriptors should be implemented in open-source libraries; the chosen detector and descriptor functions should be stable across consecutive executions without raising fatal errors; the chosen detectors and descriptors should be freely available for use without patent protections; the chosen detector and descriptor functions should require no arguments to initialize the features; and the outputs of the chosen detector functions should be compatible with the inputs of the chosen descriptor functions. Accordingly, nine feature detectors and 11 feature descriptors were selected (Tables 1 and 2). Out of all the possible detector and descriptor function combinations, 15 were never compatible (Table A1). In total, 9 detectors × 11 descriptors − 15 incompatible combinations = 84 detector and descriptor combinations were investigated in the study. All image feature detection and description operations were completed using OpenCV version 4.9.0.


Image Registration Procedure
Before registration, each pair of raw images from the dataset was first converted to grayscale or the specified color space. Depending on whether the current registration involved the single grayscale channel, a single color channel, or all three color channels, the feature detection, description, matching, and filtering processes described below were repeated either once or three times. On the target image channel, image features were first detected by the specified feature detector and then described by the specified feature descriptor. The described image features were matched using the brute-force (BF) descriptor matcher and filtered by cross-checking, as explained in the introduction. The binary descriptors, including AKAZE, BRISK, and ORB, were matched based on Hamming distance, while the non-binary descriptors, including BB, BRIEF, DAISY, FREAK, KAZE, LATCH, SIFT, and VGG, were matched based on Euclidean distance. If the current registration involved all three color channels, the three sets of filtered feature matches were combined into one set. Finally, a projective homography was estimated from the filtered feature matches through RANSAC, which robustly filters out outlier feature matches that survived cross-checking but did not agree with the majority (Figure 3). In total, 1106 raw image pairs × 21 color spaces or channels × 84 detector and descriptor combinations = 1,950,984 image registrations were performed in the study. All failed registrations or failed registration code executions were recorded; failures could be due to intermittent detector and descriptor incompatibility, fewer than four inlier matched features after RANSAC for homography estimation, excessive registered source image distortion surpassing computer memory capacity under extreme homography transformations, etc. All image registration operations were completed using OpenCV version 4.9.0 in a Python 3.11.5 environment with default function argument values, unless specified otherwise above.


Registration Quality Evaluation
As ground truth registrations do not exist for the UDIS-D dataset, the current study evaluated image registration quality based on the similarities between the overlapping areas of the target and source images, since perfect registrations should result in identical overlapping areas. Because the image registrations were executed on multiple computers with different hardware specifications, registration speed was not considered an essential aspect to evaluate in the study. Before any evaluation metrics could be properly calculated, preprocessing of the registered target and source images was necessary to ensure unbiased objective registration quality assessment. By default, OpenCV blends source image edge pixels with black background pixels during image warping to avoid artifacts and jagged edges, which, however, compromises the original source image pixel values. When extracting the target and source image overlapping regions, such edge pixels were specifically not counted as overlapping pixels. Additionally, OpenCV by default fills empty spaces in the registered images with black background pixels, which could affect certain metric calculations, although by a minimal amount. During overlapping region extraction, the black borders around the overlapping regions were removed as much as possible without sacrificing any valid overlapping pixels (Figure 4).

Three commonly used metrics were selected to objectively quantify the image registration quality in the study. The first is the mean reprojection error:

RE = (1/F) Σ_{i=1}^{F} √((x_i − x_i′)² + (y_i − y_i′)²)

where F is the number of inlier matched features after RANSAC homography estimation, (x_i, y_i) are the coordinates of the ith feature in the registered target image, and (x_i′, y_i′) are the coordinates of the ith feature in the registered source image. RE ranges from 0 to positive infinity.
The second metric is the root mean square error:

RMSE = √( (1/(3N)) Σ_{x=1}^{W} Σ_{y=1}^{H} [ (R_{x,y} − R_{x,y}′)² + (G_{x,y} − G_{x,y}′)² + (B_{x,y} − B_{x,y}′)² ] )

where N is the number of pixels in the overlapping area between the registered target and source images excluding black background pixels, W is the overlapping area width, H is the overlapping area height, (x, y) are the overlapping area pixel coordinates, (R_{x,y}, G_{x,y}, B_{x,y}) are the R, G, B values at pixel location (x, y) in the registered target image, and (R_{x,y}′, G_{x,y}′, B_{x,y}′) are the R, G, B values at pixel location (x, y) in the registered source image. RMSE ranges from 0 to 255 for typical 24-bit images.
The third metric is the mean structural similarity index measure, computed by averaging local SSIM values over sliding windows:

SSIM = (1/N) Σ_{i=1}^{N} [ (2μ_i μ_i′ + c₁)(2σ_{c,i} + c₂) ] / [ (μ_i² + μ_i′² + c₁)(σ_i + σ_i′ + c₂) ]

where N is the number of image patches where local SSIM is calculated within a 7 × 7 sliding window, μ_i is the mean of the ith patch in the registered grayscale target image, μ_i′ is the mean of the ith patch in the registered grayscale source image, σ_{c,i} is the covariance of the ith patches of the registered grayscale target and source images, σ_i is the variance of the ith patch in the registered grayscale target image, σ_i′ is the variance of the ith patch in the registered grayscale source image, and c₁ and c₂ are small constants that stabilize the division. SSIM ranges from −1 to 1. All SSIM values were calculated using scikit-image [71] version 0.20.0 with default function argument values.
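The three metrics, as defined above, can be computed with NumPy and scikit-image; the helper names below are illustrative, not the study's code.

```python
import numpy as np
from skimage.metrics import structural_similarity

def reprojection_error(pts_t, pts_s):
    """Mean Euclidean distance between matched inlier feature coordinates
    (F x 2 arrays of target-frame and reprojected source-frame points)."""
    return float(np.mean(np.linalg.norm(pts_t - pts_s, axis=1)))

def rmse_rgb(target, source):
    """Root mean square difference over all R, G, B values (0-255 scale)."""
    diff = target.astype(np.float64) - source.astype(np.float64)
    return float(np.sqrt(np.mean(diff ** 2)))

def mean_ssim(target_gray, source_gray):
    """Mean local SSIM; scikit-image defaults to a 7x7 sliding window."""
    return structural_similarity(target_gray, source_gray, data_range=255)
```

In practice, these would be evaluated only over the extracted overlap regions described above, with black background pixels excluded.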

Registration Quality Comparison
As shown in the following sections, RE tended to produce extremely large values when a low-quality registration was performed, unlike RMSE and SSIM, whose values were distributed within finite ranges. Perfect RE values such as 0 were achieved in the study; however, they were usually the result of low numbers of inlier feature matches after RANSAC filtering and hence could be misleading. For example, a homography estimated from only four feature matches will achieve a perfect feature reprojection, which is not necessarily equivalent to a high-quality registration, as the homography can represent an overfitted image transformation. Likewise, the absence of large RE values did not necessarily indicate high-quality registrations either, as the registrations simply could have failed. RMSE represents the average pixel value difference between the registered target and source images. No perfect RMSE values such as 0 were achieved, which was anticipated, as the registration process generally warps source images and distorts their pixel values to some degree. All SSIM values in the study were larger than 0, indicating that the overlapping areas between the registered target and source images were always somewhat similar in luminance, contrast, or texture. Similar to RMSE, no perfect SSIM values such as 1 were achieved either. The large value ranges of the metrics reflected the diversity of registration difficulty within UDIS-D as an appropriate benchmarking dataset, including both easy registrations, which led to low RE and RMSE values and high SSIM values, and difficult registrations, which led to high RE and RMSE values and low SSIM values.

Color Space
Figure 5, supplemented by Table A2, shows the boxplots of the three registration quality metrics achieved by each three-channel color space for all the image registrations, in comparison to one-channel grayscale (referred to as GS in the following figures). Overall, no color space differentiated itself from the others in a substantial way, regardless of the evaluation metric, indicating that utilizing image features from all three color channels did not bring obvious registration quality benefits. Based on the median values of the distributions, grayscale had a lower RE (1.0005) than any color space, likely due to its lower number of matched features coming from only one channel instead of three, an RMSE of 8.1125, and an SSIM of 0.7363. For both RMSE and SSIM, RGB and XYZ consistently outperformed grayscale marginally, with RMSEs of 8.0981 and 8.0936 and SSIMs of 0.7410 and 0.7409, while Y′CrCb, HLS, and L*a*b* consistently underperformed grayscale marginally, with RMSEs of 8.1190, 8.1360, and 8.1159 and SSIMs of 0.7349, 0.7287, and 0.7354. Among the five color spaces, HLS seemed to be the least ideal one, with the largest RE and RMSE and the smallest SSIM.

Figure 6, supplemented by Table A3, shows the boxplots of the registration quality relative changes achieved by each color space over grayscale, when their raw image pairs, feature detectors, and feature descriptors were identical. Based on the median values of the distributions, no color space was able to improve RE over grayscale, likely for the reason mentioned above. However, RGB and XYZ were both able to improve RMSE and SSIM over grayscale marginally, by 0.05% and 0.1%, indicating an expected image registration quality benefit when switching from grayscale to RGB or XYZ as input image channels. Y′CrCb, as well as L*a*b* to a lesser degree, achieved an almost identical performance to grayscale, with zero or near-zero RE, RMSE, and SSIM relative changes. HLS again showed a consistently lower performance than grayscale, with a median 10.57% RE increase, 0.09% RMSE increase, and 0.25% SSIM decrease.

Focusing on the outliers of the distributions in Figure 6, with the right raw image pair, feature detector, and feature descriptor combinations, all color spaces were able to either reduce the image registration quality of grayscale by up to 9343% to 501,135% for RE, 103% to 253% for RMSE, and 76% to 95% for SSIM, or improve the image registration quality of grayscale by up to 100% for RE, 37% to 73% for RMSE, and 252% to 1240% for SSIM. In that sense, no color space, including grayscale, is superior to the others at all times; the outcome depends on the input image characteristics. Such large relative change ranges indicated the necessity of large benchmarking datasets for comparative image registration studies, as investigations based on only a few pairs of outlier images could very likely result in misleading observations and conclusions.

Color Channel
Figure 7, supplemented by Table A4, shows the boxplots of the three registration quality metrics achieved by each individual color channel for all the image registrations, in comparison to grayscale. Overall, no color channel differentiated itself from the others in a meaningful positive way, although apparent inferior performances were observed for certain channels. Based on RMSE and SSIM, Cr, Cb, H, S, a*, and b* were noticeably less accurate than the remaining color channels, all of which performed similarly to each other. In terms of RE median values, Cr, Cb, a*, and b* were much lower than the other color channels. Their zero or near-zero first-quartile RE values also indicated that they were not able to provide rich image features. Y′ and L* were the only two channels that outperformed grayscale based on RMSE and SSIM; however, their median values were almost identical to grayscale's, with 8.1120 and 8.1122 versus 8.1125 for RMSE and 0.7364 and 0.7364 versus 0.7363 for SSIM. Note that the calculation of the Y′ channel should be the same as grayscale in theory, yet the function implementations in OpenCV occasionally resulted in minor image pixel value differences due to internal code base issues, leading to the trivial registration quality metric distribution differences between them.

Figure 8, supplemented by Table A5, shows the boxplots of the registration quality relative changes achieved by each color channel over grayscale, when their raw image pairs, feature detectors, and feature descriptors were identical. Based on the median values of the distributions, X, Cr, Cb, a*, and b* were able to achieve lower REs than grayscale, with a 0.43% to 54.24% reduction. L* was the only color channel that attained superior RMSE and SSIM to grayscale, with a marginal 0.01% improvement for both metrics. Cr, Cb, H, S, a*, and b* again showed substantially inferior performance to grayscale according to RMSE and SSIM, with an increase of 2.33% to 16.11% for RMSE and a decrease of 6.77% to 33.28% for SSIM.

Focusing on the outliers of the distributions in Figure 8, with the right raw image pair, feature detector, and feature descriptor combinations, similar to the color space observations, all color channels were able to either reduce the image registration quality of grayscale by up to 2609% to 524,210% for RE, 27% to 297% for RMSE, and 64% to 96% for SSIM, or improve the image registration quality of grayscale by up to 89% to 100% for RE, 16% to 72% for RMSE, and 119% to 1196% for SSIM. Again, no color channel, including grayscale, is superior to the others at all times; the outcome depends on the input image characteristics. Relatively speaking, three-channel color spaces seemed to provide slight advantages over single-channel color channels in terms of improving the quality of the outlier grayscale registrations.

Feature Detector
Figure 9, supplemented by Table A6, shows the boxplots of the three registration quality metrics achieved by each feature detector for all the image registrations. Based on the median RE values, from the best to the worst, the detectors ranked as AKAZE, SIFT, CSE, KAZE, HL, ORB, BRISK, FAST, and TBMR, with REs from 0.88 to 1.12. Based on the median RMSE and SSIM values, however, which mostly agreed with each other, from the best to the worst, the detectors ranked as FAST/SIFT, SIFT/FAST, BRISK, KAZE, AKAZE, HL, TBMR, CSE, and ORB, with RMSEs from 8.14 to 8.55 and SSIMs from 0.63 to 0.74. SIFT stood out as the most consistently performing detector across all three metrics, securing second place in RE and RMSE and first place in SSIM.

3.1.3. Feature Detector
Figure 9, supplemented by Table A6, shows the boxplots of the three registration quality metrics achieved by each feature detector for all the image registrations. Based on the median RE values, from the best to the worst, the detectors ranked as AKAZE, SIFT, CSE, KAZE, HL, ORB, BRISK, FAST, and TBMR, with REs from 0.88 to 1.12. Based on the median RMSE and SSIM values, however, which mostly agreed with each other, from the best to the worst, the detectors ranked as FAST/SIFT, SIFT/FAST, BRISK, KAZE, AKAZE, HL, TBMR, CSE, and ORB, with RMSEs from 8.14 to 8.55 and SSIMs from 0.63 to 0.74. SIFT stood out as the most consistently performing detector across all three metrics, securing second place in RE and RMSE and first place in SSIM.

3.1.4. Feature Descriptor
Figure 10, supplemented by Table A7, shows the boxplots of the three registration quality metrics achieved by each feature descriptor for all the image registrations. The descriptors did not differentiate themselves from each other as much as the detectors. Based on the median RE values, from the best to the worst, the descriptors ranked as AKAZE, KAZE, FREAK, BRISK, ORB, DAISY, BB, BRIEF, VGG, SIFT, and LATCH, with REs from 0.88 to 1.08. Based on the median RMSE values, from the best to the worst, the descriptors ranked as AKAZE, DAISY, VGG, BRIEF, SIFT, BB, KAZE, BRISK, LATCH, ORB, and FREAK, with RMSEs from 8.21 to 8.33. Based on the median SSIM values, from the best to the worst, the descriptors ranked as AKAZE, KAZE, DAISY, VGG, BRIEF, BB, BRISK, SIFT, ORB, LATCH, and FREAK, with SSIMs from 0.70 to 0.72. AKAZE consistently stood out as the best-performing descriptor across all three metrics. However, as shown in the upcoming section, AKAZE was one of the two descriptors with poor detector compatibility and hence a high registration failure rate. The observed superior performance of AKAZE could be due to the lower influence from fewer low-performing detectors.

Registration Quality Metric Agreement
Figure 11 shows the scatter plots between the three registration quality metrics of all the image registrations. RMSE and SSIM were poorly correlated with RE, with coefficients of determination (R²) of merely 0.0016 and 0.0019, respectively. This once again suggested the downside of RE as an image registration quality metric, potentially being extremely large for inaccurate registrations, unlike RMSE and SSIM, whose values fluctuated within narrow ranges. Additionally, as mentioned before, low REs could simply be the result of low matched feature numbers and did not guarantee accurate image registrations. RMSE and SSIM were better correlated, demonstrating a general negative correlation with an R² of 0.4844. In the context of this study, which employed the unimodal dataset UDIS-D, RMSE seemed to be a superior and more reliable metric than SSIM. When RMSEs were low, such as near 2, the corresponding SSIMs were also high, such as near 1. However, when SSIMs were high, such as near 1, the corresponding RMSEs were distributed over the entire data range, such as between 2 and 11. In that regard, high-quality registrations identified through their RMSEs would likely also have high SSIMs, while high-quality registrations identified through their SSIMs would not necessarily have low RMSEs. Nevertheless, for multimodal image registrations where image pixel values differ significantly, such as registrations between magnetic resonance imaging (MRI), computed tomography (CT), single-photon emission computed tomography (SPECT), positron emission tomography (PET), and ultrasound (US) images [72-74], or between optical, infrared, SAR, depth, map, day, and night images [75-77], SSIM might provide an advantage over RMSE to better quantify the similarity between registered target and source images.
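The disagreement between RE and the intensity-based metrics can be sketched numerically. The following numpy-only example uses a hypothetical near-identity homography, hypothetical matched points, and synthetic intensities (none of these values come from the study); it shows that RE measures geometric residuals over surviving matches, while RMSE measures intensity error over the registered overlap.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical estimated homography (near-identity) and matched keypoints.
H = np.array([[1.02, 0.01, 5.0],
              [-0.01, 0.98, -3.0],
              [1e-6, 2e-6, 1.0]])
src = rng.uniform(0, 500, size=(100, 2))
src_h = np.column_stack([src, np.ones(len(src))])   # homogeneous coordinates
proj = (H @ src_h.T).T
proj = proj[:, :2] / proj[:, 2:3]                   # projected into target frame
dst = proj + rng.normal(0, 1, size=proj.shape)      # matches with ~1 px noise

# RE: mean distance between projected source points and matched target points.
# It only sees the surviving matches, so few matches can yield a deceptively low RE.
re = float(np.mean(np.linalg.norm(proj - dst, axis=1)))

# RMSE: pixel-intensity error over the registered overlap region.
target = rng.integers(0, 256, size=(64, 64)).astype(float)
source = np.clip(target + rng.normal(0, 8, size=target.shape), 0, 255)
rmse = float(np.sqrt(np.mean((target - source) ** 2)))

print(f"RE = {re:.2f} px, RMSE = {rmse:.2f}")
```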

Registration Failure Rate
Figure 12 shows the registration failure rates of the investigated color channels and spaces, feature detectors, and feature descriptors, respectively, for all the image registrations. Cr, Cb, a*, and b* were the four color channels with very high failure rates, ranging from 66% to 82%. Figure 2 demonstrates a clear example showing their lack of image contrast relative to the other color channels and spaces, which could cause low numbers of detectable features. H and S also had noticeably high failure rates of 6% and 4%. From the best to the worst, the remaining color channels and spaces ranked as L*a*b*, Y′CrCb, HLS, RGB, XYZ, grayscale, R, Z, L*, G, L, Y′, Y, X, and B, with failure rates varying from 2% to 3%. Interestingly, even though by a marginal difference, the five color spaces were more robust than any single image channel, including grayscale. From the best to the worst, the feature detectors ranked as TBMR, FAST, SIFT, ORB, BRISK, HL, CSE, KAZE, and AKAZE. Aside from TBMR, which had an unusually low failure rate of 2%, the failure rates of the remaining detectors ranged from 14% to 23%. In terms of feature descriptor, AKAZE and KAZE were the two with abnormally high failure rates of 57% and 54%, mostly due to their frequent incompatibility with most feature detectors. From the best to the worst, the remaining descriptors ranked as VGG, BB, SIFT, DAISY, LATCH, ORB, BRIEF, BRISK, and FREAK, with failure rates ranging from 14% to 16%.

Figure 13, supplemented by Table A8, shows the distributions of the initial feature numbers in the target and source images detected by the feature detectors, as well as the inlier matched feature numbers in the target or source images after RANSAC, which were used for homography estimation, achieved by each color channel for all the image registrations. Based on the distribution median values, as expected, Cr, Cb, a*, and b* had very low numbers of initial detectable features and final inlier features, with 24 to 31 detector features and 6 to 7 homography features. H and S also had considerably fewer features than most color channels, with 390 and 575 detector features and 10 and 53 homography features. From the most to the least, the rest of the color channels ranked as Z, L*, R, G, Y, Y′, grayscale, L, B, and X, with 399 to 449 detector features, and L*, G/Y/Y′/grayscale, Z/R, L, X, and B, with 144 to 162 homography features. As the best-performing color channel in terms of registration quality, L* also ranked at the top in terms of feature numbers, with the second-most initial detectable features and the most
final inlier features. On the other hand, Cr, Cb, H, S, a*, and b* not only attained the lowest registration quality, but also had the lowest detector and homography features. This observation indicated the potential positive association between image feature number and image registration quality. In that sense, artificial intelligence-based image contrast enhancement and resolution upscaling might be a future research direction for improving image registration accuracy. Figure 14, supplemented by Table A9, shows the distributions of the initial feature numbers in the target and source images detected by the feature detectors, as well as the inlier matched feature numbers in the target or source images after RANSAC, which were used for homography estimation, achieved by each feature detector for all the image registrations. Substantial feature number differences were observed for the detectors. Based on the distribution median values, the detector rankings for the initial and final feature numbers mostly agreed with each other, being FAST, BRISK, SIFT/KAZE, KAZE/SIFT, AKAZE, HL, ORB, TBMR/CSE, and CSE/TBMR, from the most to the least. The initial detector feature numbers ranged from 3988 to 244, while the final homography feature numbers ranged from 600 to 33. FAST, as one of the two best-performing detectors based on RMSE and SSIM, provided significantly more features than the remaining detectors, surpassing the second-place detector, BRISK, by 80% and 83% in terms of initial detector and final homography features. Again, the potential association between image feature number and image registration quality was observed. Aside from image registration, image features also have applications in object recognition [78], object detection [79], image retrieval [80],
3D reconstruction [81], etc., which all might benefit from the large image feature numbers identified by detectors such as FAST, allowing for richer representations of objects and more potential feature correspondences.
Figure 15, supplemented by Table A10, shows the distributions of the inlier matched feature numbers in the target or source images after RANSAC, which were used for homography estimation, achieved by each feature descriptor for all the image registrations. Unlike the detectors, no significant feature number differences between the descriptors were observed, indicating that the feature detector was potentially a bigger factor than the feature descriptor in influencing the numbers of final inlier homography features. Based on the distribution median values, from the most to the least, the descriptors ranked as KAZE, AKAZE, DAISY, VGG, SIFT, BRIEF, LATCH, BB, ORB, BRISK, and FREAK, with homography feature numbers ranging from 87 to 215.

Best Color Space or Channel, Detector, and Descriptor Combination
Due to the large number of combinations, the selection of the best color space or color channel, feature detector, and feature descriptor combinations was based on one prerequisite: the selected combinations shall never fail for any image registration. Out of the 21 color spaces or channels × 84 detector and descriptor combinations = 1764 total combinations, each of which performed 1106 registrations, 302, or 17%, of them successfully registered all the images in the dataset without failure. Figure 16 shows the composition pie charts of the 302 unfailing combinations in terms of color space or channel, feature detector, and feature descriptor. Among the 21 investigated color spaces or channels, Cr, Cb, H, a*, and b* always failed at least once when registering the entire dataset, regardless of their paired detectors and descriptors. Interestingly, HLS had a higher proportion of unfailing combinations than any other color space or channel. Among the nine investigated feature detectors, only CSE was not able to register all the dataset images without failure. On the other hand, with the right color spaces or channels and detectors, all 11 investigated feature descriptors were able to achieve successful registrations for all the dataset images. In terms of average values over the entire dataset, for the 302 unfailing combinations, REs ranged from 0.86 to 1.60, RMSEs ranged from 7.80 to 8.30, and SSIMs ranged from 0.63 to 0.75. The top 10 combinations for each metric can be found in Table A11. As the combinations with consecutive placings according to any of the three registration quality metrics often had very small differences, the following color space or channel, detector, and descriptor combination recommendations were strictly based on and confined by the UDIS-D dataset:

Conclusions
The following conclusions were made strictly based on the UDIS-D dataset and are only applicable to the investigated color spaces and channels, feature detectors, and feature descriptors, without considering the incompatible detector and descriptor combinations.
From an atomistic point of view, two color spaces, XYZ and RGB, as well as one color channel, L*, provided very minor image registration quality improvements over grayscale. SIFT, and potentially FAST, were the best-performing detectors. AKAZE was the best-performing descriptor. L*a*b* was the most robust color space, and grayscale was the most robust color channel. TBMR was the most robust detector. VGG was the most robust descriptor. The Z channel allowed the most initial detector features, while the L* channel allowed the most final homography features. The FAST detector provided the most detector and homography features, while the KAZE descriptor provided the most homography features.
From a holistic point of view, the color spaces XYZ and RGB, the detectors SIFT and FAST, and the descriptors VGG and SIFT seemed to optimize RMSE and SSIM the most. The KAZE detector and BRISK descriptor combination seemed to provide special benefits for optimizing RE. The Z channel, FAST detector, and VGG descriptor combination allowed for the detection of the most initial detector features as well as the preservation of the most final homography features.

Feature Acronym
The extended forms of the image feature detectors and descriptors mentioned in this article include:
• AGAST: adaptive and generic accelerated segment test

Figure 1. Sample images from the testing subset of the dataset UDIS-D [68].

Figure 2. The investigated 21 versions of color space or channel of a sample image from the dataset.

Figure 3. Schematic diagram of the employed image registration pipeline.

Figure 4. Image preprocessing steps before registration quality evaluation.

Figure 5, supplemented by Table A2, shows the boxplots of the three registration quality metrics achieved by each three-channel color space for all the image registrations, in comparison to one-channel grayscale (referred to as GS in the following figures). Overall, no color space differentiated itself from the others in a substantial way, regardless of the evaluation metric, indicating that the utilization of image features from all three color channels did not bring obvious registration quality benefits. Based on the median values of the distributions, grayscale had a lower RE of 1.0005 than any other color space, which was likely due to its lower number of matched features coming from only one channel instead of all three channels, an RMSE of 8.1125, and an SSIM of 0.7363. For both RMSE and SSIM, RGB and XYZ consistently outperformed grayscale marginally, with RMSEs of 8.0981 and 8.0936 and SSIMs of 0.7410 and 0.7409, while Y′CrCb, HLS, and L*a*b* consistently underperformed grayscale marginally, with RMSEs of 8.1190, 8.1360, and 8.1159 and SSIMs of 0.7349, 0.7287, and 0.7354. Among the five color spaces, HLS seemed to be the least ideal one, with the largest RE and RMSE and the smallest SSIM.


Figure 5. Boxplots of reprojection error (RE), root mean square error (RMSE), and structural similarity index measure (SSIM) achieved by each color space for all image registrations.


Figure 6, supplemented by Table A3, shows the boxplots of the registration quality relative changes achieved by each color space over grayscale, when their raw image pairs, feature detectors, and feature descriptors were identical. Generally, based on the median values of the distributions, no color space was able to improve RE over grayscale, likely for the reason mentioned above. However, again, RGB and XYZ were both able to improve RMSE and SSIM over grayscale marginally, by 0.05% and 0.1%, indicating an expected image registration quality benefit when switching from grayscale to RGB or XYZ as input image channels. Overall, Y′CrCb, as well as L*a*b* to a lesser degree, achieved an almost identical performance to grayscale, with 0 or near-0 RE, RMSE, and SSIM relative changes. HLS again showed a consistently lower performance than grayscale, with a median 10.57% RE increase, 0.09% RMSE increase, and 0.25% SSIM decrease.


Figure 6. Boxplots of RE, RMSE, and SSIM relative change achieved by each color space over grayscale for all image registrations.

Figure 7, supplemented by Table A4, shows the boxplots of the three registration quality metrics achieved by each individual color channel for all the image registrations, in comparison to grayscale. Overall, no color channel differentiated itself from the others in a meaningful positive way, although apparent inferior performances were observed for certain color channels. Based on RMSE and SSIM, Cr, Cb, H, S, a*, and b* were noticeably less accurate than the remaining color channels, all of which had similar performances to each other. In terms of RE median values, Cr, Cb, a*, and b* were much lower than the other color channels. Their 0 or near-0 first-quartile RE values also indicated that they were not able to provide rich image features. Y′ and L* were the only two channels that outperformed grayscale based on RMSE and SSIM; however, their median values are almost identical to grayscale's, with 8.1120 and 8.1122 versus 8.1125 for RMSE and 0.7364 and 0.7364 versus 0.7363 for SSIM. Note that the calculation of the Y′ channel should be the same as grayscale in theory, yet the function implementations in OpenCV occasionally resulted in minor image pixel value differences due to internal code base issues, leading to the trivial registration quality metric distribution differences between them.

Figure 7. Boxplots of RE, RMSE, and SSIM achieved by each color channel for all image registrations.




Figure 8. Boxplots of RE, RMSE, and SSIM relative change achieved by each color channel over grayscale for all image registrations.


Figure 9. Boxplots of RE, RMSE, and SSIM achieved by each feature detector for all image registrations.

3.1.4. Feature Descriptor
Figure 10, supplemented by Table A7, shows the boxplots of the three registration quality metrics achieved by each feature descriptor for all the image registrations. The descriptors did not differentiate themselves from each other as much as the detectors. Based on the median RE values, from best to worst, the descriptors ranked as AKAZE, KAZE, FREAK, BRISK, ORB, DAISY, BB, BRIEF, VGG, SIFT, and LATCH, with REs from 0.88 to 1.08. Based on the median RMSE values, the descriptors ranked as AKAZE, DAISY, VGG, BRIEF, SIFT, BB, KAZE, BRISK, LATCH, ORB, and FREAK, with RMSEs from 8.21 to 8.33. Based on the median SSIM values, the descriptors ranked as AKAZE, KAZE, DAISY, VGG, BRIEF, BB, BRISK, SIFT, ORB, LATCH, and FREAK, with SSIMs from 0.72 down to 0.70. AKAZE consistently stood out as the best-performing descriptor across all three metrics. However, as shown in the upcoming section, AKAZE was one of the two descriptors with poor detector compatibility and hence a high registration failure rate. Its observed superior performance could therefore be partly due to the reduced influence of low-performing detectors.

Figure 10. Boxplots of RE, RMSE, and SSIM achieved by each feature descriptor for all image registrations.

J. Imaging 2024, 10, x FOR PEER REVIEW

… [77], SSIM might provide an advantage over RMSE to better quantify the similarity between registered target and source images.
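The complementary behavior of the two similarity metrics can be sketched in a few lines of NumPy. The sketch below uses a simplified single-window SSIM rather than the sliding-window variant common libraries implement; the constants follow the standard SSIM definition for a given data range, and the function names are ours, chosen for illustration only:

```python
import numpy as np

def rmse(a, b):
    """Root mean square error between two registered images."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(np.sqrt(np.mean((a - b) ** 2)))

def global_ssim(a, b, data_range=255.0):
    """Simplified single-window SSIM (no sliding window)."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    c1 = (0.01 * data_range) ** 2  # stabilizes the luminance term
    c2 = (0.03 * data_range) ** 2  # stabilizes the contrast/structure term
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    )
```

Because the contrast and structure terms depend only on deviations from the mean, a uniform brightness offset between the registered images inflates RMSE while leaving SSIM close to 1, which is one way SSIM can better reflect perceived similarity.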

Figure 11. Scatter plots between RE, RMSE, and SSIM of all image registrations.

Figure 12. Bar charts for registration failure rates of the investigated color spaces and channels, feature detectors, and feature descriptors.

Figure 13, supplemented by Table A8, shows the distributions of the initial feature numbers in the target and source images detected by the feature detectors, as well as the inlier matched feature numbers in the target or source images after RANSAC used for homography estimation, achieved by each color channel for all the image registrations. Based on the distribution median values, as expected, Cr, Cb, a*, and b* had very low numbers of initial detectable features and final inlier features, with 24 to 31 detector features and 6 to 7 homography features. H and S also had considerably fewer features than most color channels, with 390 and 575 detector features and 10 and 53 homography features. From most to least, the rest of the color channels ranked as Z, L*, R, G, Y, Y′, grayscale, L, B, and X, with 449 down to 399 detector features, and, for homography features, as L*, G/Y/Y′/grayscale, Z/R, L, X, and B, with 162 down to 144 features. As the best-performing color channel in terms of registration quality, L* also ranked at the top in terms of feature numbers, with the second-most initial detectable features and the most final inlier features. On the other hand, Cr, Cb, H, S, a*, and b* not only attained the lowest registration quality, but also had the fewest detector and homography features. This observation indicated a potential positive association between image feature number and image registration quality. In that sense, artificial intelligence-based image contrast enhancement and resolution upscaling might be a future research direction for improving image registration accuracy.
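As an illustration of what separates these channels, the following NumPy sketch reproduces two of the standard conversions such a pipeline performs: the published sRGB (D65) RGB-to-XYZ matrix and the BT.601 luma weights that underlie both grayscale and the Y′ channel of Y′CrCb. The function names are ours; OpenCV's `cvtColor` applies the same coefficients directly to the stored pixel values:

```python
import numpy as np

# Standard sRGB (D65) linear-RGB -> CIE XYZ matrix.
RGB2XYZ = np.array([[0.412453, 0.357580, 0.180423],
                    [0.212671, 0.715160, 0.072169],
                    [0.019334, 0.119193, 0.950227]])

# BT.601 luma weights, used for grayscale and for Y' of Y'CrCb.
LUMA = np.array([0.299, 0.587, 0.114])

def to_xyz(rgb):
    """rgb: (..., 3) array scaled to [0, 1]; returns (..., 3) XYZ."""
    return np.asarray(rgb, float) @ RGB2XYZ.T

def to_gray(rgb):
    """Weighted luma grayscale of an (..., 3) RGB array."""
    return np.asarray(rgb, float) @ LUMA
```

The chroma-only channels (Cr, Cb, a*, b*) discard this luma structure entirely, which is consistent with their very low detector and homography feature counts above.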

Figure 13. Boxplots of initial detector feature numbers and final homography feature numbers achieved by each color channel for all image registrations.

3.4.2. Feature Detector

Figure 14, supplemented by Table A9, shows the distributions of the initial feature numbers in the target and source images detected by the feature detectors, as well as the inlier matched feature numbers in the target or source images after RANSAC used for homography estimation, achieved by each feature detector for all the image registrations. Substantial feature number differences were observed for the detectors. Based on the distribution median values, the detector rankings for the initial and final feature numbers mostly agreed with each other, being FAST, BRISK, SIFT/KAZE, KAZE/SIFT, AKAZE, HL, ORB, TBMR/CSE, and CSE/TBMR, from most to least. The initial detector feature numbers ranged from 3988 to 244, while the final homography feature numbers ranged from 600 to 33. FAST, as one of the two best-performing detectors based on RMSE and SSIM, provided significantly more features than the remaining detectors, surpassing the second-place detector, BRISK, by 80% and 83% in terms of initial detector and final homography features, respectively. Again, the potential association between image feature number and image registration quality was observed. Aside from image registration, image features also have applications in object recognition [78], object detection [79], image retrieval [80], 3D reconstruction [81], etc., which might all benefit from large image feature numbers.
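The inlier counts above come from the RANSAC homography-estimation stage. Below is a minimal, self-contained NumPy sketch of that stage (a plain 4-point DLT inside a fixed-iteration RANSAC loop; the names are ours, and a production pipeline would use the equivalent but more refined `cv2.findHomography` with the RANSAC flag):

```python
import numpy as np

def dlt_homography(src, dst):
    """Estimate a 3x3 homography from >= 4 point pairs via direct linear transform."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.asarray(rows, float))
    h = vt[-1].reshape(3, 3)          # null-space vector of the DLT system
    return h / h[2, 2]

def project(h, pts):
    """Apply homography h to an (N, 2) array of points."""
    p = np.hstack([pts, np.ones((len(pts), 1))]) @ h.T
    return p[:, :2] / p[:, 2:3]

def ransac_homography(src, dst, iters=500, thresh=3.0, seed=0):
    """Return the best homography and a boolean inlier mask (reprojection test)."""
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    rng = np.random.default_rng(seed)
    best_h, best_mask = None, np.zeros(len(src), bool)
    for _ in range(iters):
        idx = rng.choice(len(src), 4, replace=False)   # minimal 4-point sample
        h = dlt_homography(src[idx], dst[idx])
        err = np.linalg.norm(project(h, src) - dst, axis=1)
        mask = err < thresh
        if mask.sum() > best_mask.sum():
            best_h, best_mask = h, mask
    return best_h, best_mask
```

In this sketch, `mask.sum()` plays the role of the "final homography feature number" reported per detector and descriptor combination.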

Figure 14. Boxplots of initial detector feature numbers and final homography feature numbers achieved by each feature detector for all image registrations.

3.4.3. Feature Descriptor

Figure 15, supplemented by Table A10, shows the distributions of the inlier matched feature numbers in the target or source images after RANSAC used for homography estimation, achieved by each feature descriptor for all the image registrations. Unlike the detectors, the descriptors showed no significant feature number differences between one another, indicating that the feature detector was potentially a bigger factor than the feature descriptor in influencing the numbers of final inlier homography features. Based on the distribution median values, from most to least, the descriptors ranked as KAZE, AKAZE, DAISY, VGG, SIFT, BRIEF, LATCH, BB, ORB, BRISK, and FREAK, with homography feature numbers ranging from 215 down to 87.

Figure 15. Boxplots of initial detector feature numbers and final homography feature numbers achieved by each feature descriptor for all image registrations.

Figure 16. Composition pie charts of the 302 color space or channel, feature detector, and feature descriptor combinations that registered the entire dataset without failure.

Table 1. Selected image feature detectors for the study.

Table 2. Selected image feature descriptors for the study.
Figure 3. Schematic diagram of the employed image registration pipeline.

• Lowest RE combinations: for color space, XYZ+KAZE+BRISK ranked in 2nd place, with an RE of 0.86, an RMSE of 7.85 (102nd place), and an SSIM of 0.73 (102nd place); for color channel, L+KAZE+BRISK ranked in 1st place, with an RE of 0.86, an RMSE of 7.88 (166th place), and an SSIM of 0.73 (153rd place).

Table A2. Image registration quality metric distributions of the investigated color spaces.

Table A3. Image registration quality relative change distributions of the investigated color spaces over grayscale.

Table A4. Image registration quality metric distributions of the investigated color channels.

Table A5. Image registration quality relative change distributions of the investigated color channels over grayscale.

Table A6. Image registration quality metric distributions of the investigated feature detectors.

Table A7. Image registration quality metric distributions of the investigated feature descriptors.

Table A11. Top 10 color space or channel, feature detector, and feature descriptor combinations in terms of average RE, RMSE, SSIM, detector feature number, and homography feature number over the entire dataset.