Article

An Analysis of Partitioned Convolutional Model for Vehicle Re-Identification

by
Rajsekhar Kumar Nath
*,† and
Debjani Mitra
Indian Institute of Technology (ISM), Dhanbad 826004, India
*
Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Electronics 2025, 14(18), 3634; https://doi.org/10.3390/electronics14183634
Submission received: 15 July 2025 / Revised: 19 August 2025 / Accepted: 3 September 2025 / Published: 14 September 2025
(This article belongs to the Special Issue Deep Learning for Computer Vision, 2nd Edition)

Abstract

Local feature generation for vehicle re-identification is a challenging research area that is not yet well investigated. The part-based convolutional baseline model with refined part pooling (PCB-RPP), an architecture commonly applied to person re-identification problems, was experimented with over two standard vehicle image datasets (VeRi and VehicleId) to establish that RPP over uniform partitions does not work well. To address this limitation, we propose a novel approach, Overlapped-PCB, which overlaps portions of two adjacent parts to generate new parts for training the classifiers. The results are concatenated to generate the feature set, and this improves re-identification accuracy in comparison to the RPP approach. Performance comparison results from extensive testing are also presented using re-ranking and ensembling in the evaluation stage. Our proposed model has been ensembled over three architectures, ResNet50, ResNet101, and ResNext50, to show the extent of performance improvement over existing works. The re-ranking process is shown to be strongly dataset-dependent, for which the conventionally used k-reciprocal neighbors method has been improved by augmenting it with a new, simple score-based algorithm for obtaining the best mix of component distances. This can be used as a generalized tool to fine-tune re-ranking for different datasets.

1. Introduction

Monitoring behavior, activities, and information for the purpose of gaining insights, managing, influencing, or controlling is known as surveillance. CCTV cameras serve as a widely utilized tool for surveillance, with applications in law enforcement, crime scene investigation, and the observation of events and processes through visual data. Surveillance systems leveraging such images and videos can concentrate on objects, individuals, or vehicles to perform tasks such as recognition, identification, re-identification, tracking, or retrieval. This paper specifically addresses the re-identification of vehicles. Vehicle re-identification (VReId) refers to the process of retrieving images of a specific vehicle from a database based on a query image. VReId finds applications in smart surveillance, traffic management, and fleet management systems [1]. Some of the significant practical challenges faced in VReId include the following: (i) differences in lighting between the query image and the images in the database, (ii) occlusion caused by speed, (iii) varying perspectives from multiple cameras, (iv) low-resolution images, (v) differing camera resolutions, and (vi) lack of available license plate information [2]. Person re-identification, a subject that has been extensively researched, is closely related to VReId. The successful outcomes in person re-identification, predominantly grounded in deep learning techniques, have motivated further investigation into VReId. Most recent works in re-identification adopt the approach of pre-training an existing network architecture on an image classification task using the ImageNet dataset [2,3]. This is because image classification also involves feature extraction to generate discriminative properties. Such networks are referred to as backbone networks in the literature, and person re-id or VReId models are built upon them by the addition and deletion of convolutional layers. Any re-identification problem involves two steps. First, discriminative features are identified in the subject images. Then, the distinctive features are used for finding matches from a collection of such images. It has been observed that various methods which have worked well in person re-identification also work well in the case of VReId. While there exist many visually discriminative features for person re-identification in traffic surveillance data, far fewer such features are available in the case of VReId, which makes the problem quite challenging. The features can be broadly classified into attribute-based and general re-id-based features. Attributes such as color, make, model, and viewpoint serve as global features based on specific characteristics. Local features are certain parts of the vehicle like headlights, mirrors, number plates, backlights, etc. A separate module/network is used to learn attributes, and the labels are mapped to learned attributes, which increases the complexity of such methods. For general re-id feature extraction, there exist two approaches, global and local. The global features help in obtaining an overall view of the image, and the local ones provide insight into different local information. Another method that has been used quite successfully in person re-identification is learning part-based features, which is essentially a general re-id method of learning local features.
In these approaches, the convolutional tensor produced by the backbone model is divided into separate sections to obtain local representations. In this paper, we take inspiration from a high-performing supervised learning method, the part-based convolutional baseline (PCB) used in person re-identification, to solve VReId [4]. The principal advantage of PCB over most other re-identification methods is its relative simplicity, owing to its single-branch model and its data-independent approach, which makes it an easy, ready-to-use model for re-identification tasks. In a supervised learning framework, this technique employs a generalized uniform partitioning of the convolutional tensor generated after the image is processed by a backbone network. Our research thoroughly examines the influence of the backbone network through experiments conducted with multiple advanced pre-trained network architectures. The parts obtained after partitioning ensure that we have localized information from different parts of the image, which can have higher discriminative value. We also use refined part pooling as described in [4] for removing within-part inconsistencies. Inspired by speech and signal analysis, we also propose a novel overlapped part-based approach for solving part inconsistencies over neighboring parts by using an overlap of the original parts. The approach used is, however, different from the conventional windowing technique used in signal processing, as we utilize a two-step process of first generating overlapped parts and then using the classifier weights of the original parts, with a new set of parts generated by combining the information from neighboring parts. Finally, in the testing phase, we explore a re-ranking process using k-reciprocal neighbor encoding, which uses a composite mixture of two distances between image features: the Euclidean distance and the Jaccard distance calculated from k-reciprocal nearest features. To the best of our knowledge, we present the first detailed analysis, on two VReId datasets, of the effect of the optimal mix of the two distances for the re-ranking process. We then propose an ensemble model based on feature learnings across different backbone networks. Experiments are conducted on two large-scale datasets, VeRi and VehicleId, and are shown to have comparable results with the state of the art. Our present work is closely related to the part-based network (PCB) analyzed over person re-identification datasets [4]. The main contributions of this paper can be summarized below:
  • Our work is an exhaustive extension of Sun et al., who use PCB with refined part pooling (PCB-RPP) over person re-identification (PReid) datasets [4]. To the best of our knowledge, the application of this method to vehicle re-identification (VReId) has not been reported until now. Our analysis revealed that the performance behavior of RPP is different when applied to the vehicle re-identification problem. RPP improves performance in the case of the VeRi dataset while it degrades performance in the case of the VehicleId dataset. The results of Sun et al. over three PReid datasets, however, showed consistent performance for all of them. The reason identified was that RPP fails to solve within-part inconsistencies over neighboring parts in the case of images in the VehicleId dataset. This motivated further investigations to improve the PCB-RPP architecture.
  • We propose a novel Overlapped Part-based Convolutional Baseline method (OPCB) which can take care of within-part inconsistencies. OPCB overlaps portions of two adjacent parts to generate new parts. The new parts are then trained using the classifiers corresponding to the original parts, and the results are concatenated to generate the feature set. This method is found to work consistently well across both the VeRi and VehicleId datasets and even outperforms the refined part pooling approach.
  • CNN-based models reported in the literature are usually tested over multiple architectures to identify and compare performance variations, if any. Our work therefore investigates three residual networks as backbone networks, viz. ResNet50, ResNet101, and ResNext50. ResNet101 is deeper than ResNet50, while ResNext50 adds an extra cardinality dimension. It was found that increasing the number of layers or the cardinality does not necessarily increase accuracy in solving the vehicle re-identification problem. The approach of ensembling different architectures/models is commonly reported in the CNN literature; however, in the VReId application domain, there are very few such references. In this work, we present a model ensembled over the three architectures which significantly improves re-identification results compared to the state of the art.
  • The widely used re-ranking scheme based on k-reciprocal nearest neighbors has been improved and incorporated in our work. This process uses a mix of Euclidean and Jaccard distances. We have augmented it with an algorithm that uses a score to obtain the best mix. This can be used as a generalized tool to fine-tune re-ranking for different datasets. Our results demonstrate that this has a significant impact in improving re-identification accuracy.
We have demonstrated that the PCB-based model, together with the overlapped-parts and ensembled approaches over different backbone networks, is a strong model for the VReId problem in light of comparable results with the state of the art.
The subsequent sections of the paper are structured in the following manner: Section 2 focuses on the review of the existing literature and the rationale behind the study and Section 3 outlines the methodology employed. In Section 4, the experimental results are examined, analyzed, and contrasted with current standards, while Section 5 summarizes the findings and draws conclusions.

2. Related Work and Motivation

In this section, we review some of the existing approaches used in VReId. We also review the existing datasets used in our work.

2.1. Vehicle Re-Id Datasets

Several datasets have been created for vehicle re-identification [5,6,7,8,9,10,11]. The majority of existing studies present their findings primarily on the VeRi and VehicleId datasets. We therefore utilize these two datasets for our experiments; they are detailed below:
  • VeRi Dataset: The dataset consists of 49,357 images of 776 vehicles captured by 20 different cameras. These cameras are distributed across an area of 1 square kilometer. The images are sourced from unconstrained real-world surveillance footage, and each vehicle is photographed by multiple cameras. The images vary in terms of viewpoint, resolution, lighting conditions, and occlusion. The training set contains 575 unique vehicle identities, while the remainder are found in the test set. The test set includes both a query set and a gallery set. In the query set, there is only one image per vehicle identity, which is utilized to locate the corresponding identity in the gallery set.
  • VehicleId Dataset: The collection consists of 221,763 pictures of 26,267 vehicles. Each vehicle is depicted from two distinct angles: front and back. The training dataset includes 110,178 images representing 13,134 different vehicle identities. The test set has been divided into three parts. These parts include 800, 1600, and 2400 vehicles, containing 7332, 12,995, and 20,038 images, respectively. For every one of the three parts, the query set consists of a single vehicle image for each identity, while the remainder of the vehicle images are included in the gallery set, where matches need to be identified.

2.2. Vehicle Re-Id Methods

Vehicle re-id methods in the literature may be divided broadly into four categories:
  • Single modal methods comprise a pre-trained backbone network that is used to extract features which are then used for re-identification [3,10].
  • Models with separate convolutional networks for extracting global/local attribute features which are then combined with the output features of a pre-trained network for re-identification [12,13,14]. This two-pronged approach adds to their complexity.
  • CNN-based models for detecting global/local re-id features without using any pre-trained backbone model [15,16,17,18].
  • Part-based models with single [4] or multiple branches [19] for extracting features. Single-branch models are inherently simpler than multi-branch models because they have fewer convolutional layers.
Early works on VReId propose methods based on the fusion of texture-based handcrafted features with deeply learned features [5,20]. However, they fail to achieve significant improvements in accuracy. Subsequent methods have used models pre-trained on ImageNet, such as AlexNet [5,21], VGG [6,22], GoogLeNet [20,23], ResNet [24,25], and MobileNet [26], for feature extraction. However, all of these are single-modal methods focusing on the simple use of the backbone architecture to extract the features. Wang et al. suggest a combined method that incorporates learning deep features alongside attributes like camera angle, vehicle color, and type [13]. Qian et al. utilize a dual-branch framework in which local characteristics are gathered through horizontal stripes from a backbone layer's output, merging this with attribute-related information to generate a comprehensive feature [27]. Zhao et al. presented a single-stage Single Shot Detector aimed at pinpointing areas of interest in vehicle images, utilizing deep features as local characteristics [14]. Another work adopts a dual strategy that initially identifies local areas of interest and subsequently aligns them with the overall feature map, guaranteeing a combination of local attribute-based features and global re-identification characteristics [12]. Typically, various types of custom object detectors are employed to extract specific parts or attributes [28]. A significant drawback of such models is the necessity of a separate module for attribute extraction, which substantially increases complexity. Wang et al. utilized a convolutional neural network to extract region feature vectors from various segmentation outcomes and acquired appearance feature vectors of the target vehicle by combining them with global feature vectors [15]. Liu et al. use a region-aware model for finding local features [29]. Zhu et al. implemented a novel architecture without a backbone network, utilizing shortly and densely connected convolutional units for feature learning. This architecture was later applied with a Siamese network-like structure to enhance feature detection, and also incorporated a modified quadruple directional pooling layer for feature production [16,17,18]. Peng et al. utilize a feature learning approach that incorporates a Spatial Transformer Network-based localization model to combine global and local features into a representation for vehicle ReID [30]. However, these works do not employ any pre-trained backbones and only propose independent network architectures, thereby limiting the discriminative ability of the models, which is reflected in lower accuracy. Spatio-temporal features of vehicles in city traffic surveillance scenarios, which are also used for traffic prediction, have been utilized for solving VReId [31,32]. However, this imposes the limitation that such temporal information must be known at retrieval time. Another method that has been used quite successfully in person re-identification and in general image classification is learning part-based features. In a method used for person re-identification where horizontal part-based features are used as local features [4], a refined part pooling is also used to reduce in-part inconsistencies. However, it does not analyze the effect of using different backbones or independently using partitions in other dimensions. Another work used in person re-identification with good accuracy uses a multi-branch mix of part-based and global feature-based networks [3]. Wang et al. employ a dual-branch part-based network that incorporates both horizontal and vertical segmentation, along with an external memory to store the features from each branch [19]. It employs a blend of two losses for metric learning. Such multi-branch networks are naturally more complex than single-branch networks.
The problem of within-part inconsistency in part-based methods has been studied in the literature, and methods like RPP have been proposed to solve it [4]. Works like PAN target the problem of misalignment of parts across vehicle images due to differing views by using a novel Part Alignment Network; the method uses cross-correlation for the alignment of vehicle parts [33]. For maintaining consistency between parts in person re-identification, a model was proposed which aims at aligning the predicted distributions between parts using a KL loss; the parts are trained following the global feature in a multi-stage format [34]. Another work in VReId uses a cross-part interaction module to enhance the correlation between vehicle parts obtained by rigid partitioning to generate local information; the architecture follows a three-branch structure with global and part features [35]. In speech and signal analysis, windowing is used to analyze sections of the signal or speech data [36]. An overlap approach is used in such cases to ensure continuity and context, which is akin to the problem of within-part inconsistency in the present case. Taking inspiration from such methods, we propose a part-based overlap method for solving within-part inconsistency. To the best of our knowledge, single-branch part-based models for VReId have not been thoroughly studied. Also, by using augmentations like refined part pooling, part overlapping, re-ranking, and ensembling, high accuracy can be achieved.

3. Methodology

The framework of the part-based model (PCB) is as illustrated in Figure 1. The steps and methods are detailed below.

3.1. Data Preprocessing

The unprocessed images undergo the following alterations during the pre-processing phase (a code sketch follows the list).
  • First, the images are resized to a fixed size of (384, 192). This is done keeping in view the pre-trained backbone model and the input it expects.
  • Random Horizontal Flip—This is a technique for augmenting image data where an input image is flipped along the horizontal axis based on a specified probability.
  • Random Erasing—This method involves randomly selecting a portion of an image and removing its pixels, which aids in training the model to be more robust with the provided data.
  • Finally, the images are transformed into tensors and standardized to the same mean (0.485, 0.456, 0.406) and standard deviation (0.229, 0.224, 0.225) that are applied to typical ImageNet images. This standardization is essential for attaining the best outcomes when utilizing backbone architectures that have been pre-trained on ImageNet.
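A minimal sketch of this pipeline using torchvision is given below. The transforms are standard torchvision APIs; the flip and erasing probabilities are illustrative assumptions, as they are not specified above.

```python
import torchvision.transforms as T

# Sketch of the pre-processing pipeline described above. The p=0.5
# probabilities are illustrative assumptions, not values from the paper.
train_transform = T.Compose([
    T.Resize((384, 192)),                      # fixed size expected by the backbone
    T.RandomHorizontalFlip(p=0.5),             # augmentation: random horizontal flip
    T.ToTensor(),                              # PIL image -> C x H x W float tensor
    T.Normalize(mean=[0.485, 0.456, 0.406],    # ImageNet statistics, matching the
                std=[0.229, 0.224, 0.225]),    # pre-trained backbone
    T.RandomErasing(p=0.5),                    # augmentation: erase a random patch
])
```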

3.2. Backbone Model

Any model that is appropriate for image classification can be utilized as the backbone model. For this work, we have selected ResNet50, ResNet101, and ResNext50, all pre-trained on the ImageNet dataset. ResNet, or residual network, is used in many computer vision applications. It uses skip (identity) connections between layers in order to mitigate the problem of vanishing gradients, particularly in very deep networks. The building block of ResNet is defined as follows:
$$y = F(x, \{w_i\}) + x$$
Here, x and y are, respectively, the input and output vectors of the layers. The function $F(x, \{w_i\})$ represents the layers in between, where $w_i$ are the weights. The output is arrived at by adding the output of these layers to the original input, which helps preserve information from the input layer and is, in effect, the essence of ResNet. ResNet has been used in similar part-based networks [3,4,25], which inspired us to use it as one of the base models. ResNext [37] is a residual network, a modified form of ResNet with an extra dimension, cardinality. The structure of ResNet and ResNext is shown in Figure 2.
This addition of the extra dimension reduces the number of hyperparameters compared to conventional ResNet without compromising accuracy. In a typical neural network, convolutional layers are usually succeeded by a final pooling layer and then a fully connected layer to produce the outputs. This is also true for our ResNet50 backbone network: its last two layers consist of a global pooling layer followed by a fully connected layer. We remove these last two layers of the backbone network to continue constructing our model.
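This truncation can be sketched in PyTorch as follows; the weights argument and output shapes follow standard torchvision behavior, and the batch size is illustrative.

```python
import torch
import torch.nn as nn
from torchvision import models

# Load an ImageNet-pretrained ResNet50 and drop its last two layers
# (global average pooling and the fully connected classifier), keeping
# only the convolutional trunk that yields the 3D activation tensor T.
resnet = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
backbone = nn.Sequential(*list(resnet.children())[:-2])

x = torch.randn(32, 3, 384, 192)   # a batch of pre-processed images
T_tensor = backbone(x)             # shape: (32, 2048, 12, 6) for stride-32 ResNet50
```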

3.3. Part-Based Convolutional Baseline (PCB) Model

On passing an image through the backbone network, we obtain a 3D tensor, T, as shown in Figure 1. The tensor has three axes, viz. horizontal, vertical, and channel, and the batched activation is of dimension $N \times C \times H \times V$, where N is the batch size, C is the number of channels, H is the height-wise dimension, and V is the width-wise dimension. A feature vector is a column vector along the channel axis. Several parts (say, p) of equal size are extracted uniformly from the convolutional tensor T by passing it through a 2D pooling layer of output size $p \times 1$, which converts the tensor dimension to $N \times C \times p \times 1$. Each of the p parts thus has dimension $N \times C \times 1 \times 1$, with the channel dimension the same as that of the original convolutional tensor. All the feature vectors in a particular part are pooled into a single part-level column vector $g_i$ $(i = 1, 2, \ldots, p)$ of dimension 2048; we use average pooling as the pooling method. Each column vector g is then transformed into a dimension-reduced column vector h of dimension 256 using a convolutional layer. Each vector h is then fed into a classifier comprising a fully connected (FC) layer with Softmax activation for identity prediction. For every training image, the cross-entropy losses over the parts are summed, and the model is optimized by minimizing this loss. In the testing phase, the concatenation of the part vectors g or h is used as the vehicle image descriptor, denoted by G or H, where $G = [g_1, g_2, \ldots, g_p]$ and $H = [h_1, h_2, \ldots, h_p]$. In our experiments, we use the vector H due to its smaller size, which makes it computationally less expensive.
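A minimal PyTorch sketch of this head is given below, assuming the truncated backbone from Section 3.2. The module and variable names are ours, and the number of identities is illustrative (e.g., 575 for the VeRi training set).

```python
import torch
import torch.nn as nn

class PCBHead(nn.Module):
    """Sketch of the PCB head: p horizontal parts, average pooling per part,
    per-part reduction from 2048 to 256 dimensions, and one identity
    classifier (FC layer trained with softmax/cross-entropy) per part."""
    def __init__(self, p=6, in_channels=2048, reduced=256, num_ids=575):
        super().__init__()
        self.p = p
        self.pool = nn.AdaptiveAvgPool2d((p, 1))   # N x C x H x V -> N x C x p x 1
        self.reducers = nn.ModuleList(
            [nn.Conv2d(in_channels, reduced, kernel_size=1) for _ in range(p)])
        self.classifiers = nn.ModuleList(
            [nn.Linear(reduced, num_ids) for _ in range(p)])

    def forward(self, T):
        g = self.pool(T)                               # part-level vectors g_i
        logits, h_parts = [], []
        for i in range(self.p):
            g_i = g[:, :, i:i + 1, :]                  # N x C x 1 x 1
            h_i = self.reducers[i](g_i).flatten(1)     # reduced vector h_i, N x 256
            h_parts.append(h_i)
            logits.append(self.classifiers[i](h_i))    # per-part identity logits
        H = torch.cat(h_parts, dim=1)                  # descriptor H = [h_1, ..., h_p]
        return logits, H
```

During training, the cross-entropy losses from the p classifiers are summed; at test time only the descriptor H is used.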

3.4. Refined Part Pooling

Partitioning of the convolutional tensor aims to direct attention towards various localized parts of the image which may be specific identifiers of the image. However, in the case of uniform partitioning, there is a high chance of within-part inconsistency, i.e., some portions of a part might actually be more similar to a neighboring part. Although these might essentially be outliers, reassigning them to their respective nearest parts can provide respite from such inconsistencies. To this end, Sun et al. proposed a refined part pooling method that uses the trained parts from the earlier stage, i.e., new parts are derived which are closest to the original parts [4]. Initially, the part-based model with uniform parts is trained to convergence. Then, a classifier with p part categories is applied to the convolutional tensor T to generate a new set of parts to replace the original ones. In this process, the other learned layers of the part-based model are kept fixed and only the part classifier is retrained on the training data. This step ensures that the classifier generates parts which are close to the original parts. The entire model is then trained again for a few epochs for fine-tuning and optimization.
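A sketch of this refinement step is given below, assuming the PCB head above has already converged. The soft-assignment form of the part classifier follows our reading of [4], and all names are illustrative.

```python
import torch.nn as nn
import torch.nn.functional as F

class RefinedPartPooling(nn.Module):
    """Sketch of RPP: a p-way classifier softly assigns every column vector
    of the tensor T to a part, and each refined part vector is the
    assignment-weighted average of all column vectors. Only this classifier
    is retrained at first; the rest of the model is kept frozen."""
    def __init__(self, in_channels=2048, p=6):
        super().__init__()
        self.part_classifier = nn.Conv2d(in_channels, p, kernel_size=1)
        self.p = p

    def forward(self, T):                              # T: N x C x H x V
        w = F.softmax(self.part_classifier(T), dim=1)  # N x p x H x V assignments
        parts = []
        for i in range(self.p):
            w_i = w[:, i:i + 1]                        # weights for part i
            g_i = (T * w_i).sum(dim=(2, 3)) / w_i.sum(dim=(2, 3)).clamp(min=1e-6)
            parts.append(g_i)                          # refined part vector, N x C
        return parts
```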

3.5. Proposed Overlapped Part-Based Convolutional Baseline (OPCB)

As noted in the previous subsection, uniform partitioning carries a high chance of within-part inconsistency, i.e., some portion of a part might actually be more similar to a neighboring part, and reassigning such outlying portions to their nearest parts can provide respite from these inconsistencies. We aim to achieve this by introducing the concept of overlapped parts. We train a separate set of uniform parts which are essentially overlapped parts corresponding to the original parts. The process is illustrated in Figure 3 and Figure 4 and described below:
Let the original parts be $p = \{p_0, p_1, \ldots, p_{n-1}\}$.
Any overlapped part is essentially an overlap between $p_{i-1}$ and $p_i$, for $0 < i < n$.
To achieve this, each $p_i$ is partitioned into two parts $p_{(i,0)}$ and $p_{(i,1)}$ such that $p_{(i,0)}, p_{(i,1)} \in p_i$.
The new overlapped part $q_i$ is formed such that

$$q_i = \mathrm{concat}\big(p_{(i-1,1)}, p_{(i,0)}\big), \quad 0 < i < n$$

$$q = \{q_1, q_2, \ldots, q_{n-1}\}$$

So, if there are $n$ original parts, there must be $n-1$ overlapped parts, $q$. Each new part $q_i$ is composed partly of the original parts $p_{i-1}$ and $p_i$, for $0 < i < n$.
In addition to the $n-1$ overlapped parts, there are two residual parts, $p_{(0,0)}$ and $p_{(n-1,1)}$.
So, there are a total of $n+1$ new parts: $n-1$ overlapped parts and 2 residual parts. Let the original fully connected linear layers/classifiers be $c = \{c_0, c_1, \ldots, c_{n-1}\}$, corresponding to each of the original parts in $p$.
Each overlapped part is passed separately through two fully connected layers/classifiers, namely the ones corresponding to the two original parts it is built from. So, we obtain $2(n-1)$ features for the $n-1$ overlapped parts. The two residual parts, $p_{(0,0)}$ and $p_{(n-1,1)}$, are passed through $c_0$ and $c_{n-1}$, respectively, to obtain two more features. So, we have a total of $2n$ features, which are concatenated pairwise to obtain $n$ final features. The final feature set is given by the following:

$$m_i = \mathrm{concat}\big(c_{i+1}(p_{(i,0)}),\; c_{i+1}(p_{(i,1)})\big), \quad i \in \{0, 1, \ldots, 5\}$$
The process is described in the following steps, and a code sketch follows the list:
  • Step 1: Train the PCB model.
  • Step 2: Generate overlapped and residual parts.
  • Step 3: Train overlapped and residual parts through original classifiers.
  • Step 4: Concatenate the output pairwise to generate final features.
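Under the stated assumptions (six original parts, each spanning an equal number of tensor rows and split into an upper and a lower half), Steps 2-4 can be sketched as follows. The helper and layer names are ours, and the linear reducers/classifiers stand in for the trained PCB layers.

```python
import torch
import torch.nn as nn

def opcb_features(T, reducers, classifiers, p=6):
    """Sketch of OPCB feature generation. Each original part is split into
    two halves; overlapped part q_i joins the lower half of p_(i-1) with
    the upper half of p_i (averaging the two pooled halves equals pooling
    their concatenation, since the halves are equal-sized)."""
    N, C, H, V = T.shape
    # average-pool each of the 2p half-parts into an N x C vector
    halves = T.reshape(N, C, 2 * p, H // (2 * p), V).mean(dim=(3, 4))
    feats = [classifiers[0](reducers[0](halves[:, :, 0]))]   # residual p_(0,0) via c_0
    for i in range(1, p):
        q_i = 0.5 * (halves[:, :, 2 * i - 1] + halves[:, :, 2 * i])  # overlapped q_i
        feats.append(classifiers[i - 1](reducers[i - 1](q_i)))       # through c_(i-1)
        feats.append(classifiers[i](reducers[i](q_i)))               # through c_i
    feats.append(classifiers[p - 1](reducers[p - 1](halves[:, :, -1])))  # p_(n-1,1) via c_(n-1)
    # 2n features, concatenated pairwise into n final features m_0, ..., m_(n-1)
    return [torch.cat(feats[2 * j:2 * j + 2], dim=1) for j in range(p)]

p, C, num_ids = 6, 2048, 575
reducers = [nn.Linear(C, 256) for _ in range(p)]        # stand-ins for trained layers
classifiers = [nn.Linear(256, num_ids) for _ in range(p)]
m = opcb_features(torch.randn(4, C, 12, 6), reducers, classifiers, p)
```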

3.6. Generalized Training and Evaluation Procedure

The model is trained on the VeRi and VehicleId datasets. Most state-of-the-art methods are trained on these datasets, which gives us an opportunity for direct comparison.
For the RPP method, the training is completed in three phases. First, the model is trained in the PCB phase for 60 epochs in the case of the VeRi dataset and 100 epochs in the case of the VehicleId dataset. After that, it is trained in the RPP phase for 5 epochs for learning and realigning the parts. Finally, the overall network is trained for 10 epochs for final optimization. The learning rates are set differently for the three phases. After that, evaluation is performed using a query and a gallery dataset. The images in the query dataset are used to search for images of the same vehicles in the gallery dataset. The trained model is used to extract the features, and the Euclidean distance between the probe and gallery features is then computed and used to find the match.
For the overlapped method, the training is performed in two phases. First, the model is trained for the part-based network, PCB, as described above, and then for the overlapped parts, OPCB, for 20 epochs for both the VeRi and VehicleId datasets. The learning rates are kept the same for the two phases, as we use the same classifiers as in the PCB stage and there is no separate trainable part classifier. After this, evaluation is performed using a query and a gallery dataset. The images in the query dataset are used to search for images of the same vehicles in the gallery dataset. The features are extracted from the trained models of the overlapped phase to generate the feature vectors, and the Euclidean distance between the probe and gallery features is then computed and used to find the match.
Evaluation is performed for mean average precision (mAP), Rank-1 accuracy, Rank-5 accuracy, and Rank-10 accuracy. The metrics are defined below. Average precision is given by the following:
$$AP = \frac{\sum_{k=1}^{n} p(k)\, g(k)}{N_g}$$
where $n$ is the number of test images and $N_g$ is the number of reference (ground-truth) images, while $p(k)$ is the precision at the $k$-th position. $g(k)$ is an indicator function whose value is 1 if a match is found at the $k$-th position and 0 otherwise. The mean average precision (mAP) is formulated as follows:
$$mAP = \frac{\sum_{q=1}^{Q} AP(q)}{Q}$$
where $Q$ is the total number of queries. Rank measures the similarity (in percentage) of a test to its class; e.g., if test1 corresponds to class1 and is found in the top-1 results, then it is called rank@1; if found in the top-5 results, it is called rank@5, and so on.
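A worked sketch of these two definitions (the helper names are ours):

```python
def average_precision(match_flags, num_gt):
    """AP for one query: match_flags[k-1] is g(k), i.e., 1 if the k-th
    ranked gallery image is a true match; num_gt is N_g."""
    hits, ap = 0, 0.0
    for k, g in enumerate(match_flags, start=1):
        if g:
            hits += 1
            ap += hits / k          # precision p(k), counted only where g(k) = 1
    return ap / num_gt

# two example queries:
ap1 = average_precision([1, 0, 1, 0], num_gt=2)  # (1/1 + 2/3) / 2 ≈ 0.833
ap2 = average_precision([0, 1, 0], num_gt=1)     # (1/2) / 1 = 0.5
mAP = (ap1 + ap2) / 2                            # mean over Q = 2 queries
```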

3.7. Re-Ranking Using k-Reciprocal Nearest Neighbours

During the evaluation stage, given a probe image, a ranked list is generated which contains images in descending order of their similarity to the probe image. In our evaluation mechanism, we normally find the distance between the probe image and the images in the gallery set, generally a Euclidean distance, and then use it to create an ordered list of similar items. Re-ranking is used as a post-processing step in order to increase accuracy. Inspired by [38], we create a new distance which is a combination of the original Euclidean distance and the Jaccard distance, which in turn is calculated from the k-reciprocal features of two images. A new ranking list, or re-ranked list, is then created from the new distance, hence the name re-ranking. If two images are in the k-nearest neighbors of each other, then the images can be said to be in a k-reciprocal neighborhood of each other. Assume a probe image p and a gallery set with N images, $G = \{g_i \mid i = 1, 2, \ldots, N\}$. If $N(p, k)$ denotes the k-nearest neighbors of the probe p, the k-reciprocal nearest neighbors $R(p, k)$ are defined as follows:
$$R(p, k) = \{\, g_i \mid (g_i \in N(p, k)) \wedge (p \in N(g_i, k)) \,\}$$
The k-reciprocal neighbor set of probe and gallery images is encoded in a vector form to form a k-reciprocal feature. A few nearest neighbors of the probe are then added to the feature vector to improve performance. The k-reciprocal features are then used to calculate the Jaccard distance as in Equation (6). The final distance is calculated as the weighted combination of the original distance (Euclidean distance between probe and gallery) and the Jaccard distance. It is subsequently used to acquire the re-ranking list as described in [38]. The final distance in the case of re-ranking is given by the following:
$$FD = (1 - \lambda)\, JD + \lambda\, ED$$
where FD is the final distance, JD the Jaccard distance, and ED the Euclidean distance, with the value of $\lambda$ lying between 0 and 1. The Jaccard distance is given by the following:
$$JD = 1 - \frac{|R(p) \cap R(g_i)|}{|R(p) \cup R(g_i)|}$$
where $R(p)$ and $R(g_i)$ indicate the k-reciprocal feature sets of the probe and gallery, respectively. A low value of $\lambda$ increases the contribution of the Jaccard distance, and a high value of $\lambda$ increases the contribution of the original Euclidean distance. Selection of an appropriate value of $\lambda$ is therefore a design choice.
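The distance mix can be sketched as follows. The plain set-based forms of the k-reciprocal neighborhood and the Jaccard distance are shown for clarity, whereas the full method of [38] additionally expands the neighbor sets and uses soft, vector-encoded features. All names are ours.

```python
import numpy as np

def k_reciprocal(dist, idx, k):
    """R(idx, k): items that have idx among their own k-nearest neighbours.
    dist is an all-pairs distance matrix over probe and gallery images."""
    knn = np.argsort(dist, axis=1)[:, :k]   # k-nearest neighbours of every image
    return {int(j) for j in knn[idx] if idx in knn[j]}

def jaccard_distance(R_p, R_g):
    union = len(R_p | R_g)
    return 1.0 - len(R_p & R_g) / union if union else 1.0

def final_distance(ed, jd, lam):
    return (1.0 - lam) * jd + lam * ed      # FD = (1 - λ)·JD + λ·ED
```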

3.8. Proposed Algorithm for Best Distance Mix Calculation

If we consider a parameter $p$ and the corresponding metrics $(m_1, m_2, \ldots)$, and we need to find the value of $p$ for which we obtain the highest values of the metrics, we calculate an error score, $S$, for each value of $p$, such that
$$S = \sum_{\text{all metrics}\ (m_1, m_2, \ldots)} \big(\max(V_{\text{metric}}) - V\big)$$
where $V_{\text{metric}}$ is the set of values of a metric across all values of $p$, and $V$ is the value of that metric for a particular $p$. The lowest value of $S$ corresponds to the best value of $p$ for the dataset. The detailed process is described in Algorithm 1 and sketched in code after it.
Algorithm 1 Algorithm for distance mix calculation for a dataset
  • Step 1: Compute the values of $(m_1, m_2, \ldots)$ for different values of $p$.
  • Step 2: Determine the highest value of each metric across the different values of $p$, i.e., $\max(V_{\text{metric}})$.
  • Step 3: Calculate $(\max(V_{\text{metric}}) - V)$ for each metric for all selected values of $p$.
  • Step 4: Calculate the score $S$, as per Equation (10), from the quantities computed in Step 3 for each value of $p$.
  • Step 5: The smallest value of $S$ corresponds to the best value of the parameter $p$ for the dataset.
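A compact sketch of Algorithm 1 (names are ours): each row of `metrics` holds the metric values (mAP, Rank-1, ...) measured for one candidate value of the parameter, here the mix weight λ.

```python
import numpy as np

def best_parameter(p_values, metrics):
    """Return the candidate p with the smallest score S, where S sums each
    metric's gap to its best value over all candidates (Equation (10))."""
    metrics = np.asarray(metrics, dtype=float)  # shape: (num_candidates, num_metrics)
    gaps = metrics.max(axis=0) - metrics        # max(V_metric) - V, per metric
    S = gaps.sum(axis=1)                        # score S for each candidate
    return p_values[int(np.argmin(S))], S

# e.g., three λ candidates scored over (mAP, Rank-1), values from the VeRi
# rows of Table 1:
best_lam, scores = best_parameter(
    [0.3, 0.5, 0.9],
    [[80.50, 95.77], [80.00, 95.77], [77.99, 95.95]])
```

With these example values, the smallest S again falls at λ = 0.3, consistent with the selection reported in Section 4.1.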

4. Experimental Results and Analysis

We carried out a number of thorough experiments on the VeRi and VehicleId datasets utilizing PCB, refined part pooling, and OPCB with re-ranking during the evaluation phase, and the results of these experiments are presented in this section. Three backbone networks were used, viz. ResNet50, ResNet101, and ResNext50. The images during both the training and testing phases were resized to 384 × 192. The output tensor after the backbone layers has a channel dimension of 2048. The training batch size is set at 32. The weights are learned by an SGD optimizer using cross-entropy loss as the loss criterion. We perform uniform partitioning in the horizontal dimension with six parts. In the PCB stage, the learning rate is fixed at 0.1, while during the RPP phase and final training it is fixed at 0.01. During OPCB, however, we keep the learning rate the same as in the PCB stage, because the same classifiers used in PCB are reused for OPCB. In the OPCB stage, a mixture of triplet loss and cross-entropy loss is used as the loss function, with the ratio of triplet to cross-entropy being 1:2. The re-ranking process was evaluated to find the best final distance with respect to $\lambda$ using Equation (9). To the best of our knowledge, this type of analysis has not been investigated in the literature until now. We have also investigated the performance of an ensemble model using all three backbone networks over both the VeRi and VehicleId datasets.

4.1. Distance Calculation for Re-Ranking

Using Equation (5), we experimented with the VeRi and VehicleId datasets to find the value of $\lambda$ for which we obtain the best values of the accuracy metrics. We varied the value of $\lambda$ from 0 to 1 and calculated the re-ranked mAP, Rank-1, Rank-5, and Rank-10 accuracies for the part-based convolutional model. The results are shown in Table 1. Following the process outlined in Section 3.8, we present the accuracy values for different values of $\lambda$, which is our parameter in this case, and compute the corresponding values of the score, S.
From Table 1, we can clearly see that the best accuracies for the VeRi and VehicleId datasets occur at different values of $\lambda$. For the VeRi dataset, the highest values occur at lower values of $\lambda$, while for the VehicleId dataset, they occur at higher values of $\lambda$. The lowest value of S is found at $\lambda = 0.3$ for the VeRi dataset, while it is found at $\lambda = 0.9$ for the VehicleId dataset.
The major conclusion from the above results is that the effect of the k-reciprocal neighbor feature-based Jaccard distance is significantly greater for the VeRi dataset than for VehicleId. The reason may be that there are as many as 20 camera viewpoints in the VeRi dataset in contrast to only 2 in the VehicleId dataset. With only two viewpoints of diversity, the Euclidean distance is sufficient for the VehicleId dataset, and using a nearest reciprocal neighbor extrapolation does not provide any additional advantage. The result also highlights the strong impact of the type of dataset on the value of $\lambda$. For the Market1501 dataset, with six camera viewpoints, the best value was reported to be 0.3, which further supports our proposition [38].

4.2. Impact of Backbone and Ensembled Model

In the earlier section, the results tabulated were all on the ResNet50 (R50) backbone. Here, we investigate the comparative performance with two other backbones, ResNet101 (R101) and ResNext50 (RN50). Table 2 and Table 3, respectively, present the accuracy metrics on the VeRi and VehicleId datasets.
From Table 2, it can be observed that PCB along with refined part pooling and re-ranking, with ResNet50 as the backbone, gives the best results in terms of mAP. Hence, we find that increasing the number of layers does not necessarily increase the accuracy for the VeRi dataset.
From Table 3, it can be observed that PCB with re-ranking gives the best results across all metrics. Also, the backbones ResNet50, ResNet101, and ResNext50 perform at par in the case of PCB, and a clear winner cannot be judged in terms of accuracy. However, ResNet50 has lower computational complexity than the other two backbones. It can also be seen that refined part pooling, although quite effective in the case of the VeRi dataset, fails to improve performance in the case of the VehicleId dataset. The reason is that the unique information in parts is less likely to be effective for datasets with a lower number of viewpoints; when the number of viewpoints is large, the feature vector of each part for each vehicle identity is more likely to contain unique information. This is a major issue in generalizing such refinement methods from person re-identification to vehicle re-identification, as the parts are naturally more unique in person images, irrespective of viewpoint, while this is not the case in vehicle images.
During the testing phase, the query and gallery features are computed separately for each backbone for a particular dataset. The distances between the query and gallery features are then computed and averaged to arrive at new distances, which are used for testing. If $S_{b1}$, $S_{b2}$, and $S_{b3}$ are the sets of distances corresponding to the backbones b1, b2, and b3, then the combined average set of distances is
$$F = \mathrm{avg}(S_{b1}, S_{b2}, S_{b3})$$
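A sketch of this averaging (the matrix names are illustrative):

```python
import numpy as np

def ensemble_distances(S_b1, S_b2, S_b3):
    """Element-wise average of the query-to-gallery distance matrices
    computed independently for each backbone: F = avg(S_b1, S_b2, S_b3)."""
    return (S_b1 + S_b2 + S_b3) / 3.0

# each matrix has shape (num_queries, num_gallery)
F = ensemble_distances(np.random.rand(3, 5), np.random.rand(3, 5), np.random.rand(3, 5))
ranking = np.argsort(F, axis=1)   # ascending distance: best match first
```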
The ensembled results with backbones ResNet50, ResNet101, and ResNext50 are computed in this way for the VeRi and VehicleId datasets and are shown in Table 4 and Table 5, respectively. The results show considerable improvement over the standalone backbone networks.

4.3. Performance of OPCB

The accuracy results of OPCB for the VeRi and VehicleId datasets are shown in Table 6 and Table 7, respectively. From the tables, the following points can be deduced:
  • OPCB improves PCB consistently across the VeRi and VehicleId datasets. This is in contrast with RPP, which improves PCB only for the VeRi dataset and fails to do so in the case of the VehicleId dataset.
  • OPCB performs at par with RPP in the case of the VeRi dataset and largely improves over RPP and PCB in the case of the VehicleId dataset.
OPCB attempts to bring similar portions of neighboring parts together to resolve part inconsistency. RPP trains the part classifier over the entire feature set, which might bring similar features from distant parts together; however, some contextual information is lost in the process. This is not the case for OPCB, which preserves contextuality.

4.4. Comparison with State of the Art

The analysis of our results gives two major takeaways: our proposed ensembled model with refined part pooling and re-ranking works best on the VeRi dataset, but for the VehicleId dataset, only the re-ranking scheme is better suited; RPP in fact slightly degrades the performance of the latter. Accordingly, we compare with the state-of-the-art results on the VeRi and VehicleId datasets in Table 8 and Table 9, respectively.
The following facts come to light through the above comparison:
  • In the case of the VeRi dataset, our proposed ensembled part model with horizontal partitioning along with refined part pooling gives significantly better results than all others in terms of mAP. The R-1 and R-5 performances are as good as most of the reported works. The proposed OPCB and ensembled models also outperform transformer-based models like the Swin Transformer and TransReID.
  • In the case of the VehicleId dataset, most works have not reported their mean average precision, and hence these values could not be compared. On comparison with the state of the art in terms of Rank-1 and Rank-5 accuracies, our ensembled model performs at par with most of the reported results and outperforms transformer-based models like TransReID.

5. Conclusions

In this work, we have analyzed the PCB model with refined part pooling for VReId over two standard datasets using three different backbone networks. An ensemble over these three was proposed, which was found to significantly outperform the standalone backbones. A limitation of refined part pooling identified was that its performance on the VehicleId dataset was not as good as on the VeRi dataset. To solve this, we proposed a novel method, OPCB, which was found to perform consistently across datasets and improves upon both PCB and RPP. The scope of re-ranking in improving performance was evaluated and extensively analyzed. The best mix of Jaccard and Euclidean distances was found to be dataset-specific, and, therefore, a generalized method for arriving at the best mix was proposed. Finally, the proposed ensembled model with refined part pooling and re-ranking, together with the OPCB method, was found to have great application potential in VReId systems, matching closely with recent related works. The study could be expanded by exploring how various factors, such as the number of parts, pre-processing settings, and overlap percentage, influence re-identification accuracy. Future research could also examine the problem of feature redundancy in overlapping parts, as well as the efficiency of the computations involved.

Author Contributions

Conceptualization and methodology, R.K.N. and D.M.; software R.K.N.; validation, R.K.N. and D.M.; formal analysis, R.K.N. and D.M.; investigation, R.K.N. and D.M.; data curation, R.K.N. and D.M.; writing—original draft preparation, R.K.N.; writing—review and editing, R.K.N. and D.M.; supervision D.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

We used the publicly available VeRi and VehicleId datasets in this paper.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Amiri, A.; Kaya, A.; Keceli, A.S. A Comprehensive Survey on Deep-Learning-based Vehicle Re-Identification: Models, Data Sets and Challenges. arXiv 2024, arXiv:2401.10643. [Google Scholar] [CrossRef]
  2. Wang, H.; Hou, J.; Chen, N. A Survey of Vehicle Re-Identification Based on Deep Learning. IEEE Access 2019, 7, 172443–172469. [Google Scholar] [CrossRef]
  3. Wang, G.; Yuan, Y.; Chen, X.; Li, J.; Zhou, X. Learning Discriminative Features with Multiple Granularities for Person Re-Identification. arXiv 2018, arXiv:1804.01438. [Google Scholar] [CrossRef]
  4. Sun, Y.; Zheng, L.; Li, Y.; Yang, Y.; Tian, Q.; Wang, S. Learning Part-based Convolutional Features for Person Re-Identification. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 43, 902–917. [Google Scholar] [CrossRef] [PubMed]
  5. Liu, X.; Liu, W.; Ma, H.; Fu, H. Large-scale vehicle re-identification in urban surveillance videos. In Proceedings of the IEEE International Conference on Multimedia and Expo (ICME), Seattle, WA, USA, 11–15 July 2016. [Google Scholar]
  6. Liu, H.; Tian, Y.; Wang, Y.; Pang, L.; Huang, T. Deep Relative Distance Learning: Tell the Difference between Similar Vehicles. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 2167–2175. [Google Scholar] [CrossRef]
  7. Kanacı, A.; Zhu, X.; Gong, S. Vehicle Re-identification in Context. In Pattern Recognition; Brox, T., Bruhn, A., Fritz, M., Eds.; Springer: Cham, Switzerland, 2019; pp. 377–390. [Google Scholar]
  8. Lou, Y.; Bai, Y.; Liu, J.; Wang, S.; Duan, L. VERI-Wild: A Large Dataset and a New Method for Vehicle Re-Identification in the Wild. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–19 June 2019; pp. 3230–3238. [Google Scholar] [CrossRef]
  9. Tang, Z.; Naphade, M.; Liu, M.; Yang, X.; Birchfield, S.; Wang, S.; Kumar, R.; Anastasiu, D.; Hwang, J. CityFlow: A City-Scale Benchmark for Multi-Target Multi-Camera Vehicle Tracking and Re-Identification. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Los Alamitos, CA, USA, 15–19 June 2019; pp. 8789–8798. [Google Scholar] [CrossRef]
  10. Yan, K.; Tian, Y.; Wang, Y.; Zeng, W.; Huang, T. Exploiting Multi-grain Ranking Constraints for Precisely Searching Visually-similar Vehicles. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 562–570. [Google Scholar] [CrossRef]
  11. Zhou, Y.; Liu, L.; Shao, L. Vehicle Re-Identification by Deep Hidden Multi-View Inference. IEEE Trans. Image Process. 2018, 27, 3275–3287. [Google Scholar] [CrossRef] [PubMed]
  12. He, B.; Li, J.; Zhao, Y.; Tian, Y. Part-Regularized Near-Duplicate Vehicle Re-Identification. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–19 June 2019; pp. 3992–4000. [Google Scholar] [CrossRef]
  13. Wang, H.; Peng, J.; Chen, D.; Jiang, G.; Zhao, T.; Fu, X. Attribute-Guided Feature Learning Network for Vehicle Reidentification. IEEE MultiMedia 2020, 27, 112–121. [Google Scholar] [CrossRef]
  14. Zhao, Y.; Shen, C.; Wang, H.; Chen, S. Structural Analysis of Attributes for Vehicle Re-Identification and Retrieval. IEEE Trans. Intell. Transp. Syst. 2020, 21, 723–734. [Google Scholar] [CrossRef]
  15. Wang, Z.; Tang, L.; Liu, X.; Yao, Z.; Yi, S.; Shao, J.; Yan, J.; Wang, S.; Li, H.; Wang, X. Orientation Invariant Feature Embedding and Spatial Temporal Regularization for Vehicle Re-identification. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 379–387. [Google Scholar] [CrossRef]
  16. Zhu, J.; Du, Y.; Hu, Y.; Zheng, L.; Cai, C. VRSDNet: Vehicle re-identification with a shortly and densely connected convolutional neural network. Multimed. Tools Appl. 2019, 78, 29043–29057. [Google Scholar] [CrossRef]
  17. Zhu, J.; Zeng, H.; Lei, Z.; Liao, S.; Zheng, L.; Cai, C. A Shortly and Densely Connected Convolutional Neural Network for Vehicle Re-identification. In Proceedings of the 2018 24th International Conference on Pattern Recognition (ICPR), Beijing, China, 20–24 August 2018; pp. 3285–3290. [Google Scholar] [CrossRef]
  18. Zhu, J.; Huang, J.; Zeng, H.; Ye, X.; Li, B.; Lei, Z.; Zheng, L. Object Reidentification via Joint Quadruple Decorrelation Directional Deep Networks in Smart Transportation. IEEE Internet Things J. 2020, 7, 2944–2954. [Google Scholar] [CrossRef]
  19. Wang, H.; Peng, J.; Jiang, G.; Xu, F.; Fu, X. Discriminative feature and dictionary learning with part-aware model for vehicle re-identification. Neurocomputing 2021, 438, 55–62. [Google Scholar] [CrossRef]
  20. Liu, X.; Liu, W.; Mei, T.; Ma, H. PROVID: Progressive and Multimodal Vehicle Reidentification for Large-Scale Urban Surveillance. IEEE Trans. Multimed. 2018, 20, 645–658. [Google Scholar] [CrossRef]
  21. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. In Advances in Neural Information Processing Systems; Pereira, F., Burges, C., Bottou, L., Weinberger, K., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2012; Volume 25. [Google Scholar]
  22. Liu, S.; Deng, W. Very deep convolutional neural network based image classification using small training sample size. In Proceedings of the 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR), Kuala Lumpur, Malaysia, 3–6 November 2015; pp. 730–734. [Google Scholar] [CrossRef]
  23. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Los Alamitos, CA, USA, 11–12 June 2015; pp. 1–9. [Google Scholar] [CrossRef]
  24. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar] [CrossRef]
  25. Chen, H.; Lagadec, B.; Bremond, F. Partition and Reunion: A Two-Branch Neural Network for Vehicle Re-identification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, Long Beach, CA, USA, 16–20 June 2019. [Google Scholar]
  26. Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv 2017, arXiv:1704.04861. [Google Scholar] [CrossRef]
  27. Qian, J.; Jiang, W.; Luo, H.; Yu, H. Stripe-based and attribute-aware network: A two-branch deep model for vehicle re-identification. Meas. Sci. Technol. 2020, 31, 095401. [Google Scholar] [CrossRef]
  28. Yan, L.; Li, K.; Gao, R.; Wang, C.; Xiong, N. An Intelligent Weighted Object Detector for Feature Extraction to Enrich Global Image Information. Appl. Sci. 2022, 12, 7825. [Google Scholar] [CrossRef]
  29. Liu, X.; Zhang, S.; Huang, Q.; Gao, W. RAM: A Region-Aware Deep Model for Vehicle Re-Identification. In Proceedings of the 2018 IEEE International Conference on Multimedia and Expo (ICME), Los Alamitos, CA, USA, 23–27 July 2018; pp. 1–6. [Google Scholar] [CrossRef]
  30. Peng, J.; Wang, H.; Zhao, T.; Fu, X. Learning multi-region features for vehicle re-identification with context-based ranking method. Neurocomputing 2019, 359, 427–437. [Google Scholar] [CrossRef]
  31. Zhou, Y.; Li, J.; Chen, H.; Wu, Y.; Wu, J.; Chen, L. A spatiotemporal attention mechanism-based model for multi-step citywide passenger demand prediction. Inf. Sci. 2020, 513, 372–385. [Google Scholar] [CrossRef]
  32. Kim, H.G.; Na, Y.; Joe, H.W.; Moon, Y.H.; Cho, Y.J. Vehicle Re-identification with Spatio-temporal Information. In Proceedings of the 2023 14th International Conference on Information and Communication Technology Convergence (ICTC), Jeju Island, Republic of Korea, 11–14 October 2023; pp. 1825–1827. [Google Scholar] [CrossRef]
  33. Chen, Y.; Ma, B.; Chang, H. Part alignment network for vehicle re-identification. Neurocomputing 2020, 418, 114–125. [Google Scholar] [CrossRef]
  34. Li, Z.; Lv, J.; Chen, Y.; Yuan, J. Person re-identification with part prediction alignment. Comput. Vis. Image Underst. 2021, 205, 103172. [Google Scholar] [CrossRef]
  35. Pang, X.; Tian, X.; Nie, X.; Yin, Y.; Jiang, G. Vehicle re-identification based on grouping aggregation attention and cross-part interaction. J. Vis. Commun. Image Represent. 2023, 97, 103937. [Google Scholar] [CrossRef]
  36. Bäckström, T.; Räsänen, O.; Zewoudie, A.; Zarazaga, P.P.; Koivusalo, L.; Das, S.; Mellado, E.G.; Mansali, M.B.; Ramos, D.; Kadiri, S.; et al. Introduction to Speech Processing, 2nd ed.; Springer: Berlin/Heidelberg, Germany, 2022. [Google Scholar] [CrossRef]
  37. Xie, S.; Girshick, R.; Dollár, P.; Tu, Z.; He, K. Aggregated Residual Transformations for Deep Neural Networks. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 5987–5995. [Google Scholar] [CrossRef]
  38. Zhong, Z.; Zheng, L.; Cao, D.; Li, S. Re-ranking Person Re-identification with k-Reciprocal Encoding. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 3652–3661. [Google Scholar] [CrossRef]
  39. Kuma, R.; Weill, E.; Aghdasi, F.; Sriram, P. Vehicle Re-identification: An Efficient Baseline Using Triplet Embedding. In Proceedings of the 2019 International Joint Conference on Neural Networks (IJCNN), Budapest, Hungary, 14–19 July 2019; pp. 1–9. [Google Scholar] [CrossRef]
  40. Xu, Y.; Jiang, N.; Zhang, L.; Zhou, Z.; Wu, W. Multi-scale Vehicle Re-identification Using Self-adapting Label Smoothing Regularization. In Proceedings of the ICASSP 2019—2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, 12–17 May 2019; pp. 2117–2121. [Google Scholar] [CrossRef]
  41. Lin, W.; Li, Y.; Yang, X.; Peng, P.; Xing, J. Multi-View Learning for Vehicle Re-Identification. In Proceedings of the 2019 IEEE International Conference on Multimedia and Expo (ICME), Shanghai, China, 8–12 July 2019; pp. 832–837. [Google Scholar] [CrossRef]
  42. Chu, R.; Sun, Y.; Li, Y.; Liu, Z.; Zhang, C.; Wei, Y. Vehicle Re-Identification With Viewpoint-Aware Metric Learning. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 8281–8290. [Google Scholar] [CrossRef]
  43. Zhang, X.; Zhang, R.; Cao, J.; Gong, D.; You, M.; Shen, C. Part-Guided Attention Learning for Vehicle Instance Retrieval. IEEE Trans. Intell. Transp. Syst. 2022, 23, 3048–3060. [Google Scholar] [CrossRef]
  44. Li, J.; Yu, C.; Shi, J.; Zhang, C.; Ke, T. Vehicle Re-identification method based on Swin-Transformer network. Array 2022, 16, 100255. [Google Scholar] [CrossRef]
  45. Qian, J.; Pan, M.; Tong, W.; Law, R.; Wu, E.Q. URRNet: A Unified Relational Reasoning Network for Vehicle Re-Identification. IEEE Trans. Veh. Technol. 2023, 72, 11156–11168. [Google Scholar] [CrossRef]
  46. He, S.; Luo, H.; Wang, P.; Wang, F.; Li, H.; Jiang, W. TransReID: Transformer-Based Object Re-Identification. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 15013–15022. [Google Scholar]
  47. Wu, F.; Yan, S.; Smith, J.S.; Zhang, B. Vehicle re-identification in still images: Application of semi-supervised learning and re-ranking. Signal Process. Image Commun. 2019, 76, 261–271. [Google Scholar] [CrossRef]
  48. Huang, F.; Lv, X.; Zhang, L. Coarse-to-fine sparse self-attention for vehicle re-identification. Knowl.-Based Syst. 2023, 270, 110526. [Google Scholar] [CrossRef]
  49. Quispe, R.; Lan, C.; Zeng, W.; Pedrini, H. AttributeNet: Attribute enhanced vehicle re-identification. Neurocomputing 2021, 465, 84–92. [Google Scholar] [CrossRef]
  50. Pang, X.; Zheng, Y.; Nie, X.; Yin, Y.; Li, X. Multi-axis interactive multidimensional attention network for vehicle re-identification. Image Vis. Comput. 2024, 144, 104972. [Google Scholar] [CrossRef]
  51. Taufique, A.M.N.; Savakis, A. LABNet: Local graph aggregation network with class balanced loss for vehicle re-identification. Neurocomputing 2021, 463, 122–132. [Google Scholar] [CrossRef]
  52. Sun, K.; Pang, X.; Zheng, M.; Nie, X.; Li, X.; Zhou, H.; Yin, Y. Heterogeneous context interaction network for vehicle re-identification. Neural Netw. 2024, 169, 293–306. [Google Scholar] [CrossRef]
  53. Li, B.; Liu, P.; Fu, L.; Li, J.; Fang, J.; Xu, Z.; Yu, H. VehicleGAN: Pair-flexible Pose Guided Image Synthesis for Vehicle Re-identification. In Proceedings of the 2024 IEEE Intelligent Vehicles Symposium (IV), Jeju Island, Republic of Korea, 2–5 June 2024; pp. 447–453. [Google Scholar] [CrossRef]
  54. Bai, L.; Rong, L. Vehicle re-identification with multiple discriminative features based on non-local-attention block. Sci. Rep. 2024, 14, 31386. [Google Scholar] [CrossRef]
Figure 1. Framework of part-based model (PCB).
Figure 2. ResNet (Left) vs. ResNeXt (Right) with cardinality of 32.
Figure 3. Framework of OPCB network.
Figure 4. Schema of the overlapping procedure.
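To make the overlapping procedure of Figures 3 and 4 concrete, the following is a minimal PyTorch-style sketch of how Overlapped-PCB (OPCB) can derive its parts: the backbone feature map is first split into p uniform horizontal stripes as in PCB, and every pair of adjacent stripes then contributes one extra stripe straddling their shared boundary. The stripe count p and the half-stripe overlap extent are illustrative assumptions, not the paper's exact configuration.

    import torch

    def overlapped_parts(feat: torch.Tensor, p: int = 6) -> list:
        """Split a backbone feature map of shape (N, C, H, W) into p uniform
        horizontal stripes plus (p - 1) overlapped stripes, each spanning the
        lower half of stripe i and the upper half of stripe i + 1 (an assumed
        overlap extent). Each part is average-pooled to an (N, C) vector and
        would feed its own part classifier during training."""
        n, c, h, w = feat.shape
        step = h // p
        # p uniform (non-overlapping) stripes, as in the plain PCB baseline
        parts = [feat[:, :, i * step:(i + 1) * step, :].mean(dim=(2, 3))
                 for i in range(p)]
        # p - 1 extra stripes, one per boundary between adjacent stripes
        half = max(step // 2, 1)
        for i in range(1, p):
            parts.append(feat[:, :, i * step - half:i * step + half, :]
                         .mean(dim=(2, 3)))
        return parts

At evaluation, concatenating the pooled part vectors yields the feature set whose results are reported as OPCB in Tables 6 and 7.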
Table 1. Comparison of re-ranking accuracies. The lowest value of S and the corresponding λ for each dataset are marked with an asterisk (*).

Dataset             λ     mAP    Rank-1  Rank-5  Rank-10  Score (S)
VeRi                0     60.00  95.41   96.96   98.03    5.84
                    0.1   80.92  95.65   97.20   98.45    0.39
                    0.3*  80.50  95.77   97.44   98.57    0.37*
                    0.5   80.00  95.77   97.56   98.81    0.41
                    0.7   79.30  95.71   97.62   98.75    0.60
                    0.9   77.99  95.95   97.91   98.87    0.76
                    1     76.24  95.05   97.79   98.99    1.43
VehicleId (Small)   0     82.88  80.45   91.64   92.64    5.12
                    0.1   85.16  82.09   96.71   98.99    1.29
                    0.3   85.81  82.82   97.40   97.91    1.04
                    0.5   86.44  83.54   97.63   98.95    0.39
                    0.7   86.80  83.89   97.75   98.93    0.18
                    0.9*  87.11  84.25   97.73   98.82    0.05*
                    1     86.59  83.61   97.61   98.79    0.38
VehicleId (Medium)  0     79.24  76.69   88.40   90.26    5.64
                    0.1   81.15  77.88   93.32   96.66    2.04
                    0.3   81.81  78.55   94.20   97.20    1.35
                    0.5   82.48  79.22   94.54   97.35    0.89
                    0.7   83.22  80.09   94.54   97.39    0.48
                    0.9*  84.05  81.17   94.45   97.27    0.05*
                    1     83.84  80.98   94.20   97.18    0.24
VehicleId (Large)   0     77.74  75.20   87.09   89.81    4.85
                    0.1   79.48  76.48   90.44   95.00    1.96
                    0.3   80.01  76.99   91.23   95.72    1.33
                    0.5   80.77  77.82   91.73   95.96    0.74
                    0.7   81.48  78.64   92.14   96.09    0.23
                    0.9*  81.82  79.01   92.33   96.07    0.01*
                    1     81.26  78.32   92.18   95.95    0.39
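In Table 1, λ weights the mix of component distances in the k-reciprocal re-ranking of Zhong et al. [38]: λ = 1 keeps only the original feature distance (no re-ranking, which is why those rows match the PCB baselines of Table 2), while λ = 0 keeps only the Jaccard distance of the k-reciprocal sets. A minimal sketch of the mixing step and of a score-based λ selection follows; the gap-to-best score used here is an assumed stand-in that only illustrates the shape of such a rule and does not reproduce the exact S values of Table 1.

    import numpy as np

    def mixed_distance(d_orig: np.ndarray, d_jaccard: np.ndarray,
                       lam: float) -> np.ndarray:
        """Final distance of k-reciprocal re-ranking [38]:
        lam * original distance + (1 - lam) * Jaccard distance."""
        return lam * d_orig + (1.0 - lam) * d_jaccard

    def select_lambda(metrics: dict) -> float:
        """Score-based selection (illustrative stand-in, not the paper's
        exact S): for each lambda, S is the sum of gaps between its metrics
        (mAP, Rank-1, ...) and the best value of each metric over the sweep;
        the lambda with the lowest S is chosen."""
        names = next(iter(metrics.values())).keys()
        best = {m: max(row[m] for row in metrics.values()) for m in names}
        score = {lam: sum(best[m] - row[m] for m in names)
                 for lam, row in metrics.items()}
        return min(score, key=score.get)

    # e.g., select_lambda({0.1: {"mAP": 80.92, "R1": 95.65}, ...})

Because the best λ differs between VeRi and the three VehicleId variants, the sweep and score must be recomputed per dataset, which is what makes the re-ranking stage dataset-dependent.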
Table 2. Comparison of backbone networks ResNet50, ResNet101, and ResNeXt50 on the VeRi dataset.

Model    mAP (R50)  mAP (R101)  mAP (RNext50)  R1 (R50)  R1 (R101)  R1 (RNext50)  R5 (R50)  R5 (R101)  R5 (RNext50)
PCB      76.24      76.34       75.82          95.05     94.99      94.10         97.79     97.44      97.32
PCB-RR   80.50      80.04       79.97          95.76     95.71      96.07         97.43     96.96      97.20
RPP      77.08      76.15       76.47          94.99     94.46      93.74         97.97     97.38      97.38
RPP-RR   81.42      79.96       80.69          96.24     94.93      95.59         97.50     96.78      96.78
Table 3. Comparison of backbone networks ResNet50, ResNet101, and ResNeXt50 on the VehicleId dataset.

Variant  Model    mAP (R50)  mAP (R101)  mAP (RNext50)  R1 (R50)  R1 (R101)  R1 (RNext50)  R5 (R50)  R5 (R101)  R5 (RNext50)
Small    PCB      86.59      87.38       86.57          83.61     84.70      83.63         97.61     97.14      96.99
         PCB-RR   87.11      87.87       87.20          84.25     85.25      84.38         97.73     97.24      97.22
         RPP      83.61      84.91       85.95          80.43     81.98      83.04         95.33     95.31      96.11
         RPP-RR   84.18      85.74       86.60          81.05     82.93      83.89         95.70     95.73      96.34
Medium   PCB      83.82      83.56       82.19          80.97     80.74      79.12         94.17     94.19      93.74
         PCB-RR   84.05      84.03       82.31          81.17     81.27      79.17         94.45     94.45      94.08
         RPP      81.45      81.67       81.57          78.62     78.90      78.52         91.70     91.83      92.90
         RPP-RR   81.76      82.38       81.96          78.90     79.67      78.94         92.13     92.25      93.21
Large    PCB      81.26      81.24       80.87          78.32     78.27      77.86         92.18     91.86      92.13
         PCB-RR   81.82      81.61       81.23          79.01     78.71      78.25         92.33     92.09      92.29
         RPP      79.09      79.67       80.39          76.23     76.96      77.42         89.39     89.21      91.31
         RPP-RR   79.72      80.33       80.85          76.92     77.69      77.96         89.74     89.60      91.56
Table 4. Performance of the ensembled model on the VeRi dataset.

Model                mAP (R50)  R1 (R50)  R5 (R50)  R10 (R50)
Ensembled (PCB)      79.83      95.29     98.09     98.92
Ensembled (PCB)-RR   83.13      97.02     97.61     98.56
Ensembled (RPP)      80.37      95.95     98.09     99.04
Ensembled (RPP)-RR   83.63      96.60     97.61     98.45
Table 5. Performance of the ensembled model on the VehicleId dataset.

Variant  Model                mAP    R1     R5     R10
Small    Ensembled (PCB)      88.24  85.52  98.21  99.07
         Ensembled (PCB)-RR   88.51  85.80  98.19  99.17
         Ensembled (RPP)      87.34  84.49  97.41  98.87
         Ensembled (RPP)-RR   87.81  85.01  97.72  99.02
Medium   Ensembled (PCB)      84.63  81.82  95.08  97.77
         Ensembled (PCB)-RR   84.59  81.73  95.32  97.82
         Ensembled (RPP)      84.24  81.49  94.71  97.12
         Ensembled (RPP)-RR   84.53  81.80  95.00  97.20
Large    Ensembled (PCB)      82.33  79.32  93.65  96.90
         Ensembled (PCB)-RR   82.81  79.92  93.73  96.88
         Ensembled (RPP)      82.15  79.27  92.88  96.26
         Ensembled (RPP)-RR   82.59  79.83  92.86  96.35
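The "Ensembled" rows of Tables 4 and 5 combine the ResNet50, ResNet101, and ResNeXt50 backbones at the evaluation stage. The fusion rule sketched below is an assumption, one common and simple choice: L2-normalise each backbone's descriptor and concatenate the results, which for normalised features is equivalent to summing the per-backbone squared Euclidean distances.

    import torch
    import torch.nn.functional as F

    def ensemble_descriptor(feats):
        """Evaluation-stage fusion of per-backbone descriptors, each of shape
        (N, D_k): L2-normalise each along the feature axis and concatenate.
        An assumed fusion scheme, shown for illustration only."""
        return torch.cat([F.normalize(f, p=2, dim=1) for f in feats], dim=1)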
Table 6. Performance of OPCB on the VeRi dataset.

Model       mAP (R50)  R1 (R50)  R5 (R50)  R10 (R50)
PCB (H)     76.24      95.05     97.79     98.98
PCB (H)-RR  80.50      95.76     97.43     98.56
RPP         77.08      94.99     97.97     98.75
RPP-RR      81.42      96.24     97.50     98.51
OPCB        76.74      95.23     97.91     98.98
OPCB-RR     80.70      95.59     96.96     98.27
Table 7. Performance of OPCB on the VehicleId dataset.

Variant  Model    mAP (R50)  R1 (R50)  R5 (R50)  R10 (R50)
Small    PCB      86.59      83.61     97.61     98.75
         PCB-RR   87.11      84.25     97.73     98.82
         RPP      83.61      80.43     95.33     98.33
         RPP-RR   84.18      81.05     95.70     97.61
         OPCB     87.49      84.71     97.55     98.87
         OPCB-RR  87.94      85.20     97.87     98.98
Medium   PCB      83.82      80.97     94.17     97.15
         PCB-RR   84.05      81.17     94.45     97.27
         RPP      81.45      78.62     91.70     95.18
         RPP-RR   81.76      78.90     92.13     95.39
         OPCB     84.03      81.20     94.42     97.40
         OPCB-RR  84.35      81.49     94.82     97.52
Large    PCB      81.26      78.32     92.18     95.95
         PCB-RR   81.82      79.01     92.33     96.07
         RPP      79.09      76.23     89.39     93.48
         RPP-RR   79.72      76.92     89.74     93.68
         OPCB     81.81      78.90     92.50     96.17
         OPCB-RR  82.59      79.84     92.84     96.23
Table 8. Comparison with the state of the art on the VeRi dataset. NR = Not Reported.

Method and Reference                   mAP (%)  R-1 (%)  R-5 (%)
Batch sample [39]                      67.55    90.23    96.42
SLSR [40]                              65.13    91.24    NR
MRL + Softmax Loss [41]                78.50    94.30    98.70
VANet [42]                             66.34    89.78    95.99
MRM [14]                               68.55    91.77    95.82
Part-regularized near duplicate [12]   74.30    94.30    98.70
PGAN [43]                              79.30    96.50    98.30
SAN [14]                               72.50    93.30    97.10
TCPM [23]                              74.59    93.98    97.13
Swin Transformer [44]                  78.60    97.30    NR
URRNet [45]                            72.20    93.10    97.10
TransReid [46]                         78.20    96.50    NR
SSL + re-ranking [47]                  69.90    89.69    95.41
CFSA [48]                              79.89    94.99    98.81
AttributeNet [49]                      80.10    97.10    98.60
MIMANet [50]                           79.89    94.99    98.81
OPCB + RR (Ours)                       80.77    95.59    96.96
PCB + RPP + RR (Ensembled) (Ours)      83.63    97.02    98.45
Table 9. Comparison with the state of the art on the VehicleId dataset. NR = Not Reported.

Method                                 Rank-1 (%)               Rank-5 (%)
                                       Small   Medium  Large    Small   Medium  Large
MRM [14]                               76.64   74.20   70.86    92.34   88.54   84.82
SLSR [40]                              75.10   71.80   68.70    89.70   86.10   83.10
Part-regularized near duplicate [12]   78.40   75.00   74.20    92.30   88.30   86.40
Batch sample [39]                      78.80   73.41   69.33    96.17   92.57   89.45
SAN [14]                               79.70   78.40   75.60    94.30   91.30   88.30
MRL + Softmax Loss [41]                84.80   80.90   78.40    96.90   94.10   92.10
VANet [42]                             88.12   83.17   80.35    97.29   95.14   92.97
PGAN [43]                              NR      NR      77.80    NR      NR      92.10
LABNet [51]                            84.02   80.18   77.20    NR      NR      NR
TransReid [46]                         83.60   NR      NR       97.10   NR      NR
HCI-Net [52]                           83.80   79.40   76.40    96.50   92.70   91.20
VehicleGAN [53]                        83.50   78.20   75.70    96.50   93.20   90.60
AttributeNet [49]                      86.00   81.90   79.60    97.40   95.10   92.70
MIMANet [50]                           83.28   80.14   77.72    96.31   93.71   91.29
URRNet [45]                            76.50   73.70   68.20    96.50   92.00   89.60
MDFENet [54]                           83.66   80.78   77.88    NR      NR      NR
OPCB + RR (Ours)                       85.20   81.49   79.84    97.87   94.82   92.84
PCB + RR (Ensemble) (Ours)             85.25   81.72   79.92    97.24   95.32   93.73
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
