1. Introduction
Vehicle recognition technology impacts a wide range of applications, from enhancing security measures to improving traffic management and providing insights for business strategies. Some applications need recognition at the basic level of classification, while others demand a more advanced level of re-identification. Vehicle classification, the more abstract operation, categorizes vehicles based on make and model and possibly other parameters such as color, model year, or trim level. On the other hand, vehicle re-identification achieves a higher level of precision, identifying a specific vehicle as one that was previously observed. Insurance and traffic flow analyses, for example, stand to benefit even from broad categorizations, whereas toll collection requires precise vehicle identification.
The development of models for vehicle recognition generally aims to push capabilities as close toward re-identification as possible, because such models support more applications. Certain use cases, though, concern vehicle targets that do not cooperate in being identified. These vehicles will not, for instance, carry transponders in support of electronic toll collection. Moreover, license plate information in such cases must be assumed unreliable or unavailable. As an example, in the realm of security, surveillance, or law enforcement, one cannot discount deliberate deception by the vehicle operator. Plates can be obscured, blocked, removed, or switched with those from another vehicle. In other applications, plate reading may not be practical or even permissible: Motorists entering toll roads and controlled parking implicitly agree to the capture of identifying information, but collection in other contexts may violate privacy conventions. While possibly helpful in establishing ground truth for training, a license tag has no role in inference within these applications.
Absent features such as plates that offer a one-to-one mapping to specific objects, the task of vehicle recognition becomes more challenging. Indeed, there are many similar-looking vehicles in circulation. In light of such ambiguities, models rely on vehicle features such as wheel designs and body contours. In [1], the authors proposed a part-regularized discriminative feature-preserving method to enhance the ability to perceive subtle discrepancies. Their approach employed three groups of vehicle parts for detection: lights (front and back), windows (front and back), and vehicle brand. An end-to-end RNN-based hierarchical attention (RNN-HA) classification model for vehicle re-identification was introduced in [2]. The proposed RNN-based module can effectively capture subtle visual appearance cues, such as paint and windshield stickers. In [3], a local feature-aware model for vehicle re-identification was proposed to focus on learning discriminating parts that differ among vehicles. However, the model did not perform well under dim illumination conditions. In [4], a co-occurrence attention network (CAN) was introduced to extract consistent global features and local details with viewpoint information. The model was trained with a partition-and-reunion-based loss to narrow the intra-class distance and increase the inter-class distance. Color also plays an important role, being represented mathematically in a multidimensional color space, where each dimension corresponds to a color component. The choice of color space has potentially important ramifications for the accuracy and robustness of vehicle recognition algorithms based on digital images. Traditional color spaces like RGB, native to optical cameras, exhibit both strengths and weaknesses in facilitating vehicle recognition across diverse lighting conditions. The current work was commissioned to evaluate the effect of RGB transformations on model performance.
Prior research on color spaces (e.g., HSV [5], LAB [6], YCbCr [7]) demonstrates the influence of color space selection on the performance of recognition models, including in the context of vehicle recognition [8,9]. This body of work includes both simple linear transformations [10] and feature synthesis. Feature synthesis, by creating complex features through nonlinear operations, enables the capture of intricate relationships within the data that linear transformations might miss, thereby enhancing model performance by leveraging a more sophisticated understanding of data patterns [11,12]. Advances in machine learning, particularly through deep learning techniques [13,14] and the integration of domain knowledge [15], have also proved useful in optimizing color space selection to improve model performance. A study of the differential effects of various color spaces on convolutional neural networks (CNNs) [16] identifies the LUV color space as a viable alternative to RGB in achieving comparable results on the CIFAR10 dataset. Furthermore, research into feeding multiple color spaces into individual dense networks [17] found that certain color spaces more effectively represent specific classes. Results of this work suggest that the strategic choice and combination of color spaces might substantially influence the efficacy of vehicle recognition models, as well.
Despite their successes in image processing and computer vision, common color spaces have constraints that can limit their effectiveness at vehicle recognition. First, sensitivity to lighting changes can lead to significant variations in color representation, reducing recognition accuracy under variable outdoor conditions. Second, high correlation among channels can make it hard to isolate specific color information, negatively impacting the performance of recognition algorithms. Third, mixing of color and intensity information makes channel values more dependent on lighting and shadows. The lack of an invariant quantity in turn makes a recognition algorithm more sensitive to environmental factors and reduces its accuracy. Finally, non-uniform perceptual color representation can make recognition algorithms overly sensitive to minor, perceptually insignificant color changes, complicating algorithms and reducing performance. As RGB suffers many of these drawbacks, we explore alternative color spaces in Section 2. We will investigate the role of color spaces in distinguishing vehicles, identifying unique features, and ensuring robustness under different lighting conditions. In addition, we will evaluate other representations with progressively reduced color resolution, extending to the level of binary imagery. Our proposed approach will aid in narrowing down the input variables to identify those with the highest predictive value, demonstrating the importance of judicious feature selection in enhancing model efficiency and accuracy. Our objective is to investigate further simplifications of visual sensors and network architecture and more efficient training given the reduced information content being processed. The potential gains in processing efficiency from reducing color depth can be substantial. By focusing on the most critical features, we can also observe improved model generalization and robustness, as the model becomes less prone to overfitting to noise or irrelevant details in the data. An additional contribution of our work is the emergence of a hypothesis for improving model accuracies.
3. Results
Each experiment involves a two-step process: First, RGB images from the dataset (Section 2.2) are converted into another color space (Section 2.1). Following this conversion, an inference model is trained and validated on the transformed data. The outcome of the experiments is a series of predictions, each assessing whether a pair of vehicle images from the dataset represents the same object.
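As a rough illustration of the first step, the sketch below converts a tiny RGB image to HSV and to 8-bit gray, pixel by pixel, using only the Python standard library. The helper names, the list-of-rows image layout, and the BT.601 luma weights for the gray conversion are assumptions made for this example; the transformations actually used are those defined in Section 2.1.

```python
import colorsys


def rgb_to_hsv_pixel(r, g, b):
    """Convert one 8-bit RGB pixel to HSV with all components in [0, 1]."""
    return colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)


def rgb_to_gray_pixel(r, g, b):
    """Collapse one 8-bit RGB pixel to 8-bit gray.

    The BT.601 luma weights are an assumption for this sketch.
    """
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def convert_image(pixels, pixel_fn):
    """Apply a per-pixel conversion to an image stored as rows of (r, g, b) tuples."""
    return [[pixel_fn(*px) for px in row] for row in pixels]


# Hypothetical 2x2 test image: red, green, blue, and white pixels.
image = [[(255, 0, 0), (0, 255, 0)], [(0, 0, 255), (255, 255, 255)]]
hsv_image = convert_image(image, rgb_to_hsv_pixel)
gray_image = convert_image(image, rgb_to_gray_pixel)
```

In practice an array library would replace the nested lists, but the per-pixel view keeps the mapping explicit.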
3.1. Model Accuracy
Accuracy, in the context of a binary classifier, is the fraction of model predictions that are correct. The number of correct predictions is the sum of true positives ($TP$) and true negatives ($TN$). The total number of predictions includes both correct and incorrect predictions:
$$n = TP + TN + FP + FN,$$
where $FP$ and $FN$ are the numbers of false positives and false negatives, respectively. The standard metric of accuracy, as the observed proportion of correct predictions,
$$p = \frac{TP + TN}{n},$$
ranges from 0 (completely inaccurate) to 1 (perfect accuracy) and provides a straightforward measure of a model’s overall correctness in its predictions. In Table 1, Table 2, Table 3 and Table 4, we report the highest accuracy achieved during training and validation.
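The definition above can be sketched in a few lines of Python. The function name and the confusion-matrix counts are hypothetical, chosen only to illustrate the arithmetic; they are not values from the experiments.

```python
def accuracy(tp, tn, fp, fn):
    """Return the fraction of correct predictions, p = (TP + TN) / n."""
    n = tp + tn + fp + fn  # total number of predictions
    if n == 0:
        raise ValueError("at least one prediction is required")
    return (tp + tn) / n


# Hypothetical counts for 10,000 image pairs (illustrative only).
p = accuracy(tp=4700, tn=4765, fp=235, fn=300)
```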
3.2. Color Space Performance
The effectiveness of vehicle re-identification models, measured through their accuracy, serves as a critical indicator of their potential for real-world applicability. To maintain consistency and ensure fair comparisons among models, the iterative training process starts by using the same randomly selected training and validation sets for all color spaces. Models based on data encoded in the native RGB color space of the cameras serve as a baseline for evaluating the use of alternate color spaces.
Models are categorized based on the subsets of the imagery dataset used in their development. Four categories are identified, and results are accordingly organized into distinct tables and associated figures. The daytime data, representing information collected during the day before sunset, were divided into training and validation sets, with the training set comprising 435,153 images and the validation set containing 76,203 images. The night-time data, composed of images taken after sunset and before sunrise, were also split into training and validation sets, with the night-time training set containing 27,315 images, constituting around 6% of the total training data, and the night-time validation set comprising 4982 images. Experiments vary in their use of day/night data, including those that use day and night data in equal proportions (group I), those that use only one type of data (day only in II; night only in III), and those that utilize different types of data for distinct purposes (in group IV, selecting training data from all imagery, but validation data only from night-time collections). A given group consists of trials that assess the impact of each color mapping and change in color resolution in Section 2.1 on re-identification performance.
In the first group of training and validation runs, which selects examples from both daytime and night-time data, as shown in Table 1 and Figure 20, the YUV color space demonstrated superior performance over other color spaces for both training and validation under varied illumination conditions. Notably, 12-bit RGB performed comparably to 24-bit RGB, indicating that little is lost at this reduced color depth. Contrary to expectations, no other transformations into 3D color spaces managed to surpass the performance of RGB. Remarkably, 4-bit Gray not only outperformed RGB but also achieved results that were, within the margin of error, comparable to those of YUV, underscoring its efficiency despite its reduced information content. Furthermore, even 2-bit Gray surpassed nRGB and c1c2c3 in terms of performance. It was only when the information content was reduced to a binary format that a noticeable drop in accuracy was observed.
In the second set of experiments, which focused exclusively on daytime data for training and validation of the vehicle re-identification network, as detailed in Table 2 and Figure 21, the baseline RGB color space exhibited the top performance during the training phase. Interestingly, during validation, 12-bit RGB achieved the highest score overall, underscoring its effectiveness in conditions dominated by daylight. Furthermore, monochrome Red, Gray, 4-bit Gray, and 2-bit Gray also demonstrated commendable performance, indicating their utility even in the absence of diverse color information. This pattern suggests a nuanced relationship between color depth and recognition accuracy in vehicle re-identification tasks, especially under consistent lighting conditions.
Table 3 and Figure 22 summarize the performance of models that use night-time data for training and validation. As in the second set of experiments, which used exclusively daytime data, the RGB color space achieved the best performance during training, while 12-bit RGB stood out during validation. Notably, a greater difference between training and validation accuracies was observed across all color spaces when compared with the daytime data results, indicating a pronounced challenge in model generalization under low-light conditions. Despite these variations, the disparity in reported accuracies across different color spaces, including Binary, was less pronounced for a given set of image pairs, whether during training or validation. However, nRGB and c1c2c3 consistently yielded the lowest accuracies during validation, highlighting the particular difficulties these color spaces face in accurately re-identifying vehicles with night-time imagery.
Finally, Table 4 and Figure 23 cover experiments that explore the effects of training on all data and validating on night-time data. This adjustment to the training was made in response to the challenges observed in group III, which used exclusively night-time data. In group IV, by training the network with both daytime and night-time imagery, we sought to provide a more diversified learning experience and achieve better generalization when validating against the more challenging night-time data. RGB, HSV, and YUV were the standout performers in both training and validation phases, with HSV marginally in the lead but within the margin of error, indicating closely competitive results among these color spaces. Consistent with findings from the other experimental groups, nRGB and c1c2c3 color spaces lagged behind, underscoring their relatively lower effectiveness in the specialized task of cross-illumination vehicle re-identification.
Table 1, Table 2, Table 3 and Table 4 show overlap among the accuracy confidence intervals from 24-bit color spaces down to the 4-bit Gray level for multiple training and validation datasets. Because model development is an inherently stochastic process, variation in accuracy is observed even when training is repeated using the same color space. Furthermore, this variation can mask milder trends that are a function of color space. Loss of color resolution is only observed to materially degrade accuracy when we consider cases with less color resolution than 4-bit Gray.
Table 1. Accuracy of vehicle ID using both daytime and night-time data to train and validate the network.
| | RGB | HSV | YUV | LUV | nRGB | c1c2c3 | 12-Bit RGB | 8-Bit Red | 8-Bit Gray | 4-Bit Gray | 2-Bit Gray | Binary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Training | 95.32% | 94.61% | 95.74% | 95.22% | 92.80% | 92.28% | 95.56% | 95.02% | 95.10% | 95.51% | 93.72% | 88.18% |
| Validation | 94.65 ± 0.44% | 93.75 ± 0.47% | 95.25 ± 0.41% | 94.51 ± 0.44% | 91.95 ± 0.53% | 90.05 ± 0.58% | 94.62 ± 0.44% | 93.87 ± 0.47% | 94.67 ± 0.44% | 94.97 ± 0.42% | 92.78 ± 0.50% | 88.45 ± 0.62% |
Table 2. Accuracy of vehicle ID using only daytime data to train and validate the network.
| | RGB | HSV | YUV | LUV | nRGB | c1c2c3 | 12-Bit RGB | 8-Bit Red | 8-Bit Gray | 4-Bit Gray | 2-Bit Gray | Binary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Training | 96.32% | 95.87% | 96.28% | 96.14% | 94.20% | 94.27% | 96.15% | 95.73% | 96.24% | 95.96% | 93.53% | 88.18% |
| Validation | 96.17 ± 0.37% | 95.80 ± 0.39% | 96.35 ± 0.36% | 96.32 ± 0.37% | 94.34 ± 0.45% | 93.07 ± 0.49% | 96.48 ± 0.36% | 96.09 ± 0.38% | 96.38 ± 0.36% | 95.70 ± 0.39% | 93.66 ± 0.47% | 89.60 ± 0.59% |
Table 3. Accuracy of vehicle ID using only night-time data to train and validate the network.
| | RGB | HSV | YUV | LUV | nRGB | c1c2c3 | 12-Bit RGB | 8-Bit Red | 8-Bit Gray | 4-Bit Gray | 2-Bit Gray | Binary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Training | 97.85% | 97.74% | 97.75% | 97.74% | 96.97% | 96.68% | 97.59% | 97.43% | 97.71% | 97.68% | 96.82% | 95.13% |
| Validation | 92.11 ± 0.52% | 92.31 ± 0.52% | 91.90 ± 0.53% | 91.93 ± 0.53% | 86.93 ± 0.65% | 85.09 ± 0.69% | 92.87 ± 0.50% | 91.41 ± 0.54% | 92.74 ± 0.50% | 92.66 ± 0.50% | 90.70 ± 0.56% | 87.18 ± 0.65% |
Table 4. Accuracy of vehicle ID using daytime and night-time data to train the network but only night-time data for validation.
| | RGB | HSV | YUV | LUV | nRGB | c1c2c3 | 12-Bit RGB | 8-Bit Red | 8-Bit Gray | 4-Bit Gray | 2-Bit Gray | Binary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Training | 95.61% | 95.62% | 95.45% | 95.21% | 92.56% | 91.86% | 95.13% | 94.28% | 95.04% | 95.28% | 92.80% | 87.69% |
| Validation | 94.20 ± 0.45% | 95.09 ± 0.42% | 95.09 ± 0.42% | 94.12 ± 0.46% | 91.08 ± 0.55% | 88.74 ± 0.61% | 93.97 ± 0.46% | 92.97 ± 0.50% | 93.62 ± 0.47% | 93.99 ± 0.46% | 90.08 ± 0.58% | 86.18 ± 0.67% |
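The overlap among validation intervals can be checked mechanically. The sketch below does so for Table 1, comparing each color space against the RGB baseline. The accuracy values and half-widths are copied from Table 1; the function name and the overlap criterion are choices made for this example. Overlapping intervals are only a rough screen for comparable performance, not a formal significance test.

```python
# Validation accuracy (%) and 95% half-width (%) copied from Table 1.
TABLE1_VALIDATION = {
    "RGB": (94.65, 0.44),
    "HSV": (93.75, 0.47),
    "YUV": (95.25, 0.41),
    "LUV": (94.51, 0.44),
    "nRGB": (91.95, 0.53),
    "c1c2c3": (90.05, 0.58),
    "12-bit RGB": (94.62, 0.44),
    "8-bit Red": (93.87, 0.47),
    "8-bit Gray": (94.67, 0.44),
    "4-bit Gray": (94.97, 0.42),
    "2-bit Gray": (92.78, 0.50),
    "Binary": (88.45, 0.62),
}


def intervals_overlap(a, b):
    """Return True if two (center, half_width) intervals share any point."""
    (center_a, half_a), (center_b, half_b) = a, b
    return abs(center_a - center_b) <= half_a + half_b


baseline = TABLE1_VALIDATION["RGB"]
overlapping = [
    name
    for name, interval in TABLE1_VALIDATION.items()
    if name != "RGB" and intervals_overlap(baseline, interval)
]
```

Under this criterion, the intervals for nRGB, c1c2c3, 2-bit Gray, and Binary do not overlap the RGB baseline in Table 1, while the remaining color spaces do.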
3.3. Confidence Intervals and Statistical Significance
The process of training and evaluating a model is inherently stochastic, driven by the unpredictable nature of the update step in stochastic gradient descent (SGD). Repeating the process therefore results in different values of model metrics. Because model development is resource-intensive, and because we explored twelve color spaces in combination with different subsets of the dataset, it was not possible to perform each evaluation multiple times. Nonetheless, it is straightforward to quantify confidence in the validation accuracy under reasonable assumptions.
We calculate a confidence interval, $CI$, for model accuracy during validation based on a normal approximation of the binomial distribution, for which we make the following assumptions about the observations. Each observation is independent and identically distributed (i.i.d.), with each having the same probability of successfully determining whether or not a pair of images depicts the same vehicle. It is important to note that during a model’s training phase, 'observations' (pair comparisons) cannot be considered to be drawn from the same distribution due to the ongoing development of the model. However, during the validation phase, the model is stable, making the i.i.d. assumption reasonable.
The confidence interval for the model accuracy, $p$, is as follows:
$$CI = p \pm Z \sqrt{\frac{p(1 - p)}{n}},$$
where $p$ and $n$ are as defined in (13) and (14), respectively. The central limit theorem is fundamental here, ensuring that the normal approximation is reliable given the large sample size in our experiments. Additionally, the expression for $CI$ presupposes random sampling from the population and assumes that the number of trials, $n$, is fixed in advance, with no dependence on the outcomes of the observations. The Z-score is $Z = 1.96$ for the 95% confidence interval we display for validation results in all tables and corresponding figures.
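A minimal sketch of this calculation follows. The function name is hypothetical, and the sample size of 10,000 validation pairs is an assumption made for illustration, since this section does not state $n$. With that assumed $n$ and the RGB validation accuracy from Table 1, the half-width comes out close to the reported ±0.44%.

```python
import math

Z_95 = 1.96  # Z-score for a two-sided 95% confidence interval


def accuracy_confidence_interval(p, n, z=Z_95):
    """Normal-approximation (Wald) interval for a binomial proportion p over n trials."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if n <= 0:
        raise ValueError("n must be positive")
    half_width = z * math.sqrt(p * (1.0 - p) / n)
    return p - half_width, p + half_width


# Assumed n = 10,000 pairs; p is the RGB validation accuracy from Table 1.
low, high = accuracy_confidence_interval(0.9465, 10_000)
half_width = (high - low) / 2
```

The normal approximation is least reliable when $p$ is very close to 0 or 1 or when $n$ is small, neither of which applies to the accuracies reported here under the assumed sample size.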
4. Analysis
General trends are apparent from a cursory inspection of the results. While color space and lighting conditions impact model performance, the effects are relatively modest. Only when the loss of color information approaches the point of producing a binary image is there a marked loss in accuracy.
4.1. Selection of Color Space
Evidence regarding the superiority of one color space over another for vehicle re-identification is inconclusive. Reported differences in accuracy among 24-bit color spaces are modest. No single color space consistently outperforms the others, though there is a marginal preference for YUV in scenarios where all data are utilized for both training and validation. Performance notably declines only when color content is significantly reduced, as seen when a model uses nRGB, c1c2c3, 2-bit Gray, or Binary features. Empirically, the models are often successful at distinguishing vehicles whose only apparent visually distinguishing feature is hue. Nevertheless, the sensitivity of validation accuracy to the choice of color space is at most marginally greater than the 95% confidence intervals. Ultimately, the incremental benefits do not strongly justify the additional processing required to convert from the native RGB format to another 24-bit color space.
The success of models in distinguishing between objects suggests a reliance on factors beyond color differences. In the absence of readily apparent alternatives, we hypothesize that geometrical features assume outsized importance in the re-identification process in our models. Geometry here refers not just to the projections of body contours onto the focal plane, which offer a silhouette or shape that can be distinctive from one vehicle to another, but also to the finer details observed in specific parts of the vehicles, such as the wheels. It is hard to know for certain what features a model keys on as it learns, but shape and structure, alongside color, would seem important for visual recognition here.
Validation accuracy closely matches training accuracy in the majority of cases, but a notable divergence occurs in experiments confined to night-time data (Table 3, Figure 22). This divergence indicates that the model may have become too closely attuned to the specific characteristics of the training data, failing to generalize effectively to new, unseen data in the validation set. The crux of the problem appears to lie in the relatively small size of the night-time dataset, which exacerbates the model’s tendency to overfit as it learns and relies on idiosyncrasies of the night-time data that do not represent broader data trends. This observation underscores the critical need for a balanced dataset to prevent overfitting and ensure that models retain their generalization capabilities across different lighting conditions. On the other hand, while color information is less pronounced in night-time imagery, its diminished presence alone does not account for the broader difficulties in generalization. This assertion is supported by the observation that deliberately choosing color spaces with limited color depth in daytime data does not significantly weaken the validation results compared with the training results in that case. Other factors, such as reduced visibility of vehicle contours at night, may play a larger role in the observed performance discrepancies relating to models trained on only night-time data.
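The size of the divergence can be read directly from the tables. As a small worked example, the sketch below computes the gap between training and validation accuracy for the RGB models in Table 2 (day only) and Table 3 (night only). The dictionary layout and function name are illustrative choices; the numbers are copied from those tables.

```python
# (training accuracy %, validation accuracy %) for RGB, copied from Tables 2 and 3.
RGB_ACCURACY = {
    "day only (Table 2)": (96.32, 96.17),
    "night only (Table 3)": (97.85, 92.11),
}


def generalization_gap(train_acc, val_acc):
    """Return training accuracy minus validation accuracy, in percentage points."""
    return train_acc - val_acc


gaps = {
    group: round(generalization_gap(train_acc, val_acc), 2)
    for group, (train_acc, val_acc) in RGB_ACCURACY.items()
}
```

For RGB, the gap is 0.15 percentage points with daytime data and 5.74 percentage points with night-time data.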
4.2. Resilience against Loss of Color Information
A notable and perhaps surprising result is how gradually model performance degrades with reduced color resolution. Figure 20 shows that training and validation accuracy are comfortably above 90% even as color resolution is systematically reduced to 2 bits per pixel. Significant changes in performance become evident only when the information loss approaches the level of yielding a binary image. Indeed, the confidence intervals for accuracy using all data in Figure 20 overlap from 24-bit RGB to 4-bit Gray.
The resilience of model accuracy despite the loss of information aligns with the principles of the curse of dimensionality and its impact on feature selection and model simplification. This concept posits that in many cases, the true intrinsic dimensionality of data—essentially, the minimum number of features required for adequate representation—is significantly lower than the initially available feature set. In the context of our study, the vehicle images in our dataset possess a multitude of potential discriminative features. However, not all these features are equally critical for the model’s predictive capability. In our controlled campus environment, it emerges that a few key features, with color information being of limited importance, are paramount in influencing model decisions. By selectively altering the color information in image data inputs to a classifier, we engage in a form of implicit feature selection. This approach aids in narrowing down the input variables to identify those with the highest predictive value, demonstrating the importance of judicious feature selection in enhancing model efficiency and accuracy.
4.3. Implications for Vehicle Re-Identification
Given that model accuracy is only modestly affected by color space, the potential gains in processing efficiency from reducing color depth could be substantial and worthwhile. For example, converting imagery from 24-bit RGB to 2-bit Gray would mean a 12-fold reduction in the amount of information to be processed. Such a reduction could potentially lower computational complexity, leading to quicker training and inference times beneficial across various applications. By focusing on the most critical features, we could also see improved model generalization and robustness, as the model becomes less prone to overfitting to noise or irrelevant details in the data. This streamlined approach could enhance deployment on resource-constrained edge devices, offering faster real-time processing and decision-making capabilities.
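The 12-fold figure follows from the ratio of bits per pixel, 24 / 2 = 12. The sketch below illustrates both that ratio and one simple way to reduce an 8-bit gray value to fewer bits. Keeping the most significant bits is an assumption made for this example and may differ from the reduction defined in Section 2.1; the function names are hypothetical.

```python
def quantize_gray(value, bits):
    """Reduce an 8-bit gray value (0-255) to `bits` bits by keeping the top bits."""
    if not 1 <= bits <= 8:
        raise ValueError("bits must be between 1 and 8")
    if not 0 <= value <= 255:
        raise ValueError("value must be an 8-bit intensity")
    return value >> (8 - bits)


def reduction_factor(source_bits, target_bits):
    """Ratio of bits per pixel before and after reducing color depth."""
    return source_bits / target_bits


# 2-bit Gray keeps four intensity levels (0 to 3).
levels = [quantize_gray(v, 2) for v in (0, 63, 64, 128, 255)]
factor = reduction_factor(24, 2)  # 24-bit RGB to 2-bit Gray
```

The same ratio applied to 4-bit Gray gives a 6-fold reduction relative to 24-bit RGB.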
The vehicle re-identification system could be significantly simplified and streamlined by tailoring the visual collection equipment to directly capture images in 2-bit format. Equipment designed with this focus could be inherently simpler, lighter, and less expensive, as it would bypass the need for high-resolution sensors and the subsequent data reduction processing steps. This direct approach to data acquisition would not only reduce the computational load on the system but also enhance its adaptability and ease of deployment across various environments. The resulting system could offer enhanced utility in a wide array of applications, providing a cost-effective solution that maintains essential performance capabilities while leveraging the benefits of reduced data complexity.
Our results further suggest that dataset augmentation could enhance model accuracies. By increasing the number of example pairs with subtle color variations among similar vehicles, the model could be trained to more accurately distinguish vehicles based on color nuances. Although the study indicates that simplified data collection and processing might achieve comparable accuracies for broader applications, prioritizing color differentiation through dataset augmentation presents a promising avenue for applications where correctness is important. This approach not only underlines the potential for improving model accuracies but also highlights the adaptability of our methodologies to meet specific accuracy requirements in vehicle re-identification.
In an ideal vehicle re-identification scenario, each class would exclusively contain images of a single, specific vehicle. However, attaining this level of accuracy in a classifier system that omits license plate data is virtually impossible. This limitation results from vehicles of identical make, model, trim, and color typically lacking visible distinguishing features in their images. This inherent limitation directly restricts the maximum achievable accuracy in vehicle re-identification systems. Nonetheless, there is a positive aspect to consider. An inference model’s accuracy, when gauged in a controlled environment where the observed objects are drawn from a subpopulation, tends to be lower than the accuracy of that model when trained and used against a larger parent population. Consequently, we anticipate higher accuracy when the system currently under development is deployed against a larger pool of potentially observable vehicles.
5. Conclusions
In this paper, we explored the impact of various color spaces and color resolutions by identifying when degradation starts to represent a liability in the specific area of vehicle re-identification. Additionally, we examined the impact of environmental conditions on these findings. We trained and validated models for vehicle re-identification on a large dataset of images captured under diverse environmental conditions and categorized by day and night collection. The curated dataset was gathered within a controlled campus setting. The target color spaces in our study included diverse 24-bit representations of imagery as well as representations with progressively reduced color resolution, extending to the level of binary imagery. Conversions to these color spaces variously involved linear and nonlinear transformations.
We found that color spaces such as YUV, LUV, 12-bit RGB, 8-bit Gray, and 4-bit Gray are consistently competitive with RGB. Because conversion of imagery to 4-bit Gray did not appreciably affect accuracies, we infer that conversion to a lower-resolution space would streamline the analysis without sacrificing model performance. The potential gains in processing efficiency from reducing color depth are substantial. A reduction in resolution from RGB to 2-bit Gray would mean a 12-fold reduction in the amount of information to be processed, which would lower computational complexity, leading to quicker training and inference times that are beneficial across various applications. By focusing on the most critical features, we would also see improved model generalization and robustness, as the model becomes less prone to overfitting to noise or irrelevant details in the data.
Future investigations of the impacts of color space are planned using a multi-perspective vehicle re-identification system now under development.