Article

Dynamic Re-Weighting and Cross-Camera Learning for Unsupervised Person Re-Identification

1 School of Computer Science and Engineering, Nanjing University of Science and Technology, No. 200 Xiaolingwei Street, Nanjing 210094, China
2 Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
* Author to whom correspondence should be addressed.
Mathematics 2022, 10(10), 1654; https://doi.org/10.3390/math10101654
Submission received: 24 March 2022 / Revised: 10 May 2022 / Accepted: 11 May 2022 / Published: 12 May 2022
(This article belongs to the Special Issue Mathematical Methods in Image Processing and Computer Vision)

Abstract:
Person Re-Identification (ReID) has witnessed tremendous improvements with the help of deep convolutional neural networks (CNNs). Nevertheless, because each data domain has its own characteristics, most existing methods generalize poorly to unseen persons. To address this problem, we propose a robust and effective training strategy named temporal smoothing dynamic re-weighting and cross-camera learning (TSDRC), which exploits the relationship between temporal information and camera position. It transfers valuable knowledge from an existing labeled source domain to an unlabeled target domain. To improve the discernibility of the CNN model in the source domain, generally shared person attributes and a margin-based softmax loss are adopted to train the source model. In the target domain training stage, TSDRC iteratively clusters the samples into several centers and dynamically re-weights the unlabeled samples of each center with a temporal smoothing score. Then, a cross-camera triplet loss is proposed to fine-tune the source domain model. Comprehensive experiments on the Market-1501 and DukeMTMC-reID datasets demonstrate that the proposed method vastly improves the performance of unsupervised domain adaptation.

1. Introduction

Person re-identification (ReID), the basis of surveillance video analysis, is a cross-camera retrieval task. Cross-camera means retrieving images of the same pedestrian captured by different cameras. Lately, owing to the wide use of deep learning architectures [1], the performance of ReID has been significantly improved by adopting supervised learning [2,3,4,5]. Typical existing supervised person ReID methods focus on challenges such as occlusion, pose variations, background clutter and viewpoint variations when matching images [6,7,8,9,10]. For instance, Liu et al. [11] proposed a generative adversarial network (GAN)-based deep learning model [12] with transferable poses to tackle pedestrian pose variations. Chen et al. [7] combined deep neural networks with multi-scale similarity metrics and conditional random fields. Some attention-based approaches [13,14,15] have been developed to alleviate the influence of background clutter by learning discriminative image features. Although promising results have been obtained, these methods still struggle with cross-dataset ReID because they perform poorly at suppressing the visual differences between datasets. The cross-dataset testing problem refers to the setting where the training and test sets of a model come from different datasets (e.g., trained on Market-1501 and tested on DukeMTMC-reID), which results in a significant degradation of model performance. The domain gap between different datasets is a vital issue due to the diversity of data collection. In addition, manually tagging pedestrian labels is a time-consuming process that costs considerable labor and material resources.
To solve cross-dataset ReID problems, methods based on hand-crafted features have been developed; they can perform ReID in an unsupervised manner [16,17,18,19] in the target domain. Domain adaptation methods [20,21] have also been used [22,23] to adapt and exploit visual information across different data domains. Nevertheless, because of the huge gap in body poses, viewpoints, background clutter and identities between different datasets, no label information can be utilized for supervised learning in the target domain. Under this situation, the performance of the algorithms is limited. For instance, Fan et al. [23] developed an advanced unsupervised learning algorithm that performs CNN fine-tuning and K-means clustering iteratively. Li et al. [24] proposed applying tracklet association to ReID, learned from temporal and spatial information. Wang et al. [25] exploited auxiliary attribute knowledge to learn discriminative feature representations. Deng et al. [22] utilized CycleGAN [26] to transfer image features from the source to the target domain and generate data with label information. Zhong et al. [27] proposed a StarGAN [28]-based method to extract camera-invariant features. Lin et al. [29] minimized distribution discrepancies by introducing the maximum mean discrepancy (MMD) distance.
Traditional unsupervised ReID research has primarily concentrated on feature engineering [17,18,30,31]. Feature engineering uses prior human knowledge to create hand-crafted features suited to the unsupervised learning paradigm. Because these methods cannot extract suitable semantic features from poor data distributions, they work efficiently only on small datasets rather than large ones. However, large datasets are commonly used in person ReID, so those traditional unsupervised ReID methods are not suitable for current common datasets. Furthermore, because of the diversity of data collection, the domain gap between different datasets has to be taken into consideration. For instance, a person in Market-1501 [32] generally wears short, thin clothes, while a person in DukeMTMC-reID [33] usually wears long, thick clothes. These characteristics can be observed in Figure 1. As we can see, due to the diversity of the images' capture dates and locations, the backgrounds and foregrounds (including pedestrians' appearances) differ significantly. The factors behind the significant background gap between the two datasets also cause a sharp drop in performance when a model trained on the labeled source domain is applied directly to the unlabeled target domain. Therefore, to reduce the negative influence of these factors, more attention should be paid to domain adaptation.
Lately, unsupervised ReID methods have mainly focused on domain adaptation [25,34,35,36], which aims to narrow the gap between the target and source domains. In the training stage, the main strategy is to continuously transfer the knowledge learned from the source domain to the target domain to promote the learning process. For instance, Lin et al. [29] optimized alignment and classification losses jointly to develop a feature alignment algorithm that can align the target and source data in feature space. Deng et al. [22] proposed a model named SPGAN, which integrates model learning and image translation while preserving similarities between the source and target domains.
Numerous approaches have been proposed to solve the domain gap issue. Some methods regard this problem as a style transfer issue. The related methods in [22,27] focus on eliminating the domain gap between labeled and unlabeled images by transferring the styles of unlabeled target images to those of labeled source images using generative adversarial nets [12]. Nevertheless, as reflected by their limited Rank-1 accuracy, discriminative information may be lost during the transfer. Other methods treat it as an unsupervised domain transfer issue. Approaches in [23,24,25,34,35,37,38] reinforce the ability of the model by adopting unsupervised methods on the unlabeled target dataset. An unsupervised transfer method named TJ-AIDL, proposed by Wang et al. [25], learns identity-discriminative and attribute-semantic feature spaces to transfer source domain knowledge to the target domain. Lv et al. [38] use the spatial-temporal relationship to mine training samples and re-rank target-domain retrieval results. A progressive unsupervised learning approach (PUL) was proposed by Fan et al. [23], which fine-tunes the source domain model according to the obtained pseudolabels. Nevertheless, PUL is ineffective because of fixed-threshold sampling and the poor generalization of the source domain model.
However, these methods depend on the hypothesis that the source and target domains have approximate distributions, and their effectiveness cannot be guaranteed when there is a large discrepancy between the two domains. Other effective methods for unsupervised person ReID are clustering-based approaches [23,39,40,41,42,43,44]. These approaches utilize generated pseudolabels to train the model in a supervised manner, where the pseudolabels are obtained by clustering image features. Fan et al. [23] proposed an advanced approach based on a clustering algorithm that can transfer the pre-learned knowledge of feature representation to an unseen target domain; feature representation learning and clustering are performed iteratively in an EM-style manner. Lin et al. [40] proposed a bottom-up clustering-based method that jointly optimizes the relationship between individual image samples and the convolutional neural network. Recently, Yang et al. [42] proposed a novel clustering-based method built on an asymmetric co-teaching tactic. For clustering-based unsupervised models, the key factor is the quality of the data clustering.
A novel temporal smoothing dynamic re-weighting and cross-camera learning (TSDRC) scheme is proposed to solve the cross-domain unsupervised adaptation task. For the source domain, we use both ID knowledge and attribute knowledge to train the source domain model, which is supervised by the angular softmax loss (A-Softmax [45]). Ordinarily, person identities differ across person datasets, but attributes [3] (e.g., clothing color, hats, bags, backpacks and gender) are generally shared. Thus, we improve the source domain model's generalizability by attribute classification.
For the target domain, we generate pseudolabels with an unsupervised clustering algorithm. In a real-scenario ReID system, person images captured by the same camera often share a similar background, which tends to cause person features from the same camera to be assigned to the same cluster by the clustering model. These pseudolabels are then used to fine-tune the source domain model. Nevertheless, the clustering results contain considerable noise, which misleads fine-tuning in the target domain. According to our analysis, the noise can be divided into three cases, as shown in Figure 2:
  • Case 1 (top row): different people captured by the same camera, since pictures from the same camera share the same background.
  • Case 2 (middle row): pictures belonging to the same person and the same camera. These pictures are easily merged into the same center. They are safe for training but cannot improve the cross-camera retrieval ability.
  • Case 3 (bottom row): different people with similar appearances.
These samples are likely to be merged into one center by the clustering algorithm. To overcome this issue, we propose a dynamic re-weighting (DRW) strategy and a cross-camera triplet loss (CCT). This paper is an extended version of [46], and the extensions include these two proposed methods. The former reduces the damage of case 3 noise, while the latter avoids the damage of case 1 and case 2 noise by enhancing cross-camera training. Our contributions are summarized as follows:
  • We propose a novel temporal smoothing dynamic re-weighting and cross-camera learning (TSDRC) scheme to improve the training of the target domain with a person re-identification temporal smoothing constraint.
  • We design a dynamic re-weighting (DRW) strategy to achieve a trade-off of selecting safe clustering samples and cross-view samples. To further improve the cross-view retrieval ability, we propose cross-camera triplet loss (CCT) for the target domain training.
  • Comprehensive experiments on the Market-1501 and DukeMTMC-reID datasets demonstrate that the proposed method vastly outperforms existing unsupervised person ReID methods.
The remainder of this paper is structured as follows: Section 2 elaborates upon the proposed framework and algorithm. Experiments and discussions are presented in Section 3. Section 4 summarizes the conclusions.

2. Materials and Methods

This section introduces a method that learns knowledge from the source domain through supervised learning and applies it to the target domain to achieve unsupervised learning. As Figure 3 shows, the softmax cross-entropy loss and the angular softmax loss (A-Softmax [45]) were separately utilized to develop the attribute and identity knowledge of people for source domain learning. For target domain learning, we developed clustering learning, temporal smoothing dynamic re-weighting and a cross-camera triplet loss to generate samples with less noisy pseudolabels. This pushes inter-cluster samples away while minimizing the intra-cluster variance. In addition, it also alleviates the poor discernibility in the complex situation of cross-camera person ReID.

2.1. Source Domain Training

Attribute Knowledge. Images of different people in different datasets may exhibit considerable diversity. As shown in Figure 4, we compare how the model's attention changes when attribute knowledge is applied across datasets. For the DukeMTMC-reID dataset, the first row represents the condition with attribute knowledge learning, in which attention is paid to person attributes such as the backpack or the body (circled in green). The second row represents the condition without attribute knowledge learning, in which attention falls on background clutter such as a bicycle or car (circled in red). The same observation also holds for Market-1501. Therefore, we add attribute knowledge learning so that the extracted features focus on the person rather than on other noise. Person attributes are generally shared, while identities are commonly non-overlapping across different ReID datasets. For example, as shown in Figure 5, the attributes red upper-body, backpack and female are shared by DukeMTMC-reID and Market-1501. Empirically speaking, identity knowledge is harder to transfer than attribute knowledge.
ID Knowledge. Softmax loss is widely used to train person ReID models. However, we prefer feature representations that are more discriminative and easier to transfer. Therefore, we utilize the angular softmax loss [45] to obtain more compact features. Figure 6 demonstrates its benefit. The angular softmax loss is formulated as follows:
$$
\mathcal{L}_{\mathrm{angular}} = -\frac{1}{N}\sum_{i=1}^{N} \log \frac{e^{z_{y_i}}}{e^{z_{y_i}} + \sum_{j \neq y_i}^{C} e^{z_j}}, \qquad
z_{y_i} = \lVert x_i \rVert \bigl( (-1)^{k} \cos(m\,\theta_{y_i,i}) - 2k \bigr), \qquad
z_{j} = \lVert x_i \rVert \cos(\theta_{j,i}),
\tag{1}
$$
where $k \in [0, m-1]$ and $m$ is a hyperparameter acting as the margin factor. Please refer to the work [45] for more details.
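The following is a minimal PyTorch-style sketch of the angular-margin logit in Equation (1); it assumes unit-normalized classifier weights, and the function and variable names are illustrative rather than taken from the paper's implementation.

```python
import torch
import torch.nn.functional as F

def a_softmax_logits(features, weight, labels, m=3):
    """Sketch of the A-Softmax (angular softmax) logit in Equation (1).

    features: (N, D) embeddings; weight: (C, D) classifier weights;
    labels: (N,) ground-truth ids; m: integer margin factor (m = 3 in Section 3.2).
    """
    w = F.normalize(weight, dim=1)                  # unit-norm class weights, so z_j = ||x_i|| cos(theta_j,i)
    x_norm = features.norm(dim=1, keepdim=True)     # ||x_i||
    cos = F.normalize(features, dim=1) @ w.t()      # cos(theta_j,i), shape (N, C)
    logits = x_norm * cos                           # plain logits z_j

    # margin branch for the target class: psi(theta) = (-1)^k cos(m*theta) - 2k, k in [0, m-1]
    cos_y = cos.gather(1, labels.view(-1, 1)).clamp(-1.0, 1.0)
    theta = torch.acos(cos_y)
    k = torch.floor(m * theta / torch.pi)
    sign = 1.0 - 2.0 * (k % 2)                      # (-1)^k without a negative-base power
    psi = sign * torch.cos(m * theta) - 2.0 * k
    return logits.scatter(1, labels.view(-1, 1), x_norm * psi)

# usage (names illustrative): loss = F.cross_entropy(a_softmax_logits(feat, fc.weight, ids), ids)
```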

2.2. Target Domain Training

For target domain training, we used the clustering algorithm to generate pseudolabels. The dynamic re-weighting strategy was used to reduce the damage of the noise sample. The cross-camera triplet loss was used for cross-camera retrieval learning.
Clustering. In the target domain, the K-means algorithm clusters the unlabeled samples into several centers. The formulation is summarized as follows:
$$
\min_{\hat{Y}^t,\, c_1, c_2, \ldots, c_K} \sum_{k=1}^{K} \sum_{\hat{y}_i^t = k} \bigl\lVert \phi_\theta(x_i^t) - c_k \bigr\rVert^2
\tag{2}
$$
where K, pre-defined empirically, represents the number of clusters, and $c_k$, $k \in \{1, 2, \ldots, K\}$, are the corresponding cluster centers. $\hat{Y}^t$ denotes the predicted pseudolabels.
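As a concrete illustration of Equation (2), the following sketch generates pseudolabels with scikit-learn's K-means on L2-normalized features; K = 900 follows the ablation in Section 3.3, and the function name is ours, not from the paper's code.

```python
import numpy as np
from sklearn.cluster import KMeans

def generate_pseudolabels(features, n_clusters=900):
    """Cluster target-domain features into K centers (Equation (2)) and
    return the pseudolabels and the corresponding cluster centers."""
    feats = features / np.linalg.norm(features, axis=1, keepdims=True)  # L2-normalize, as in Algorithm 1
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(feats)
    return km.labels_, km.cluster_centers_
```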
Temporal Smoothing Dynamic Re-weighting (TSDRW). A common method for re-weighting pseudolabels is dynamic sampling (DS), which can be represented by the following formula:
$$
w_i^t =
\begin{cases}
1, & S_i^t > \lambda^t \\
0, & \text{otherwise}
\end{cases}
\tag{3}
$$
where $S_i^t$ is the similarity. The weight is 1 (i.e., the sample is used without any weighting) if the similarity is larger than the threshold; otherwise, the weight is 0 and the sample does not contribute to the loss. However, Equation (3) cannot achieve smooth re-weighting, which may introduce certain kinds of noise. Therefore, temporal smoothing dynamic re-weighting (TSDRW) was developed to assign each pseudolabeled sample a real-valued weight, based on its similarity, instead of an all-or-nothing weight. Reliable samples receive higher weights and are exploited more deeply, whereas unreliable samples receive lower weights and are used only slightly, reducing the influence of noise.
The noise in the clustering results limits the performance of target domain training. To reduce the damage caused by the pseudolabels of outliers, we define a temporally smoothed weight $w_i^t$ and a dynamic threshold $\lambda^t$ to re-weight the clustering results at the $t$-th epoch:
$$
w_i^t =
\begin{cases}
S_i^t, & S_i^t > \lambda^t \\
0, & \text{otherwise}
\end{cases}
\tag{4}
$$
$$
S(x_i^t, c_{x_i}^t) = \cos\bigl(\phi_\theta(x_i^t), c_{x_i}^t\bigr) + \alpha \cos\bigl(\phi_\theta(x_i^{t-1}), c_{x_i}^t\bigr)
\tag{5}
$$
where $c_{x_i}^t$ denotes the corresponding cluster center of the image $x_i^t$, $\phi_\theta(x_i^t)$ is the feature of $x_i$ at the $t$-th epoch, $\phi_\theta(x_i^{t-1})$ is the feature of $x_i$ at the $(t-1)$-th epoch, and $\alpha$ is the temporal smoothing hyperparameter. If the temporally smoothed cosine similarity $\cos(\phi_\theta(x_i^t), c_{x_i}^t) + \alpha \cos(\phi_\theta(x_i^{t-1}), c_{x_i}^t)$ is larger than the threshold $\lambda^t$ of the $t$-th epoch, the sample $x_i^t$ and its pseudolabel $\hat{y}_i^t$ are temporarily selected and re-weighted as $w_i^t = \cos(\phi_\theta(x_i^t), c_{x_i}^t) + \alpha \cos(\phi_\theta(x_i^{t-1}), c_{x_i}^t)$ for unsupervised training. Otherwise, the sample is abandoned and re-weighted as $w_i^t = 0$.
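A minimal NumPy sketch of Equations (4) and (5) follows, assuming the current-epoch and previous-epoch features are already L2-normalized; the default value of alpha is purely illustrative, as it is not fixed in this passage.

```python
import numpy as np

def tsdrw_weights(feat_t, feat_prev, centers, labels, lam, alpha=0.5):
    """Temporal smoothing dynamic re-weighting (Equations (4) and (5)).

    feat_t, feat_prev: (N, D) L2-normalized features at epochs t and t-1;
    centers: (K, D) cluster centers; labels: (N,) pseudolabels;
    lam: current threshold lambda^t; alpha: temporal smoothing factor (value illustrative).
    """
    c = centers[labels]                                   # assigned center of each sample
    c = c / np.linalg.norm(c, axis=1, keepdims=True)      # normalize for cosine similarity
    cos_t = np.sum(feat_t * c, axis=1)                    # cos(phi(x_i^t), c_{x_i}^t)
    cos_prev = np.sum(feat_prev * c, axis=1)              # cos(phi(x_i^{t-1}), c_{x_i}^t)
    score = cos_t + alpha * cos_prev                      # temporally smoothed similarity S_i^t
    return np.where(score > lam, score, 0.0)              # w_i^t = S_i^t if above threshold, else 0
```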
After that, we use the temporally smoothed, re-weighted samples to fine-tune the source domain model by optimizing the following objective:
$$
\min_{\theta, w} \sum_{i=1}^{N} w_i\, \mathcal{L}\bigl(\hat{y}_i^t, f_w(\phi_\theta(x_i^t))\bigr)
\tag{6}
$$
where $f_w$ is the classifier of the target domain and $\mathcal{L}$ is the angular softmax loss function.
In terms of the threshold $\lambda^t$, a large value tends to select samples of the same identity from the same camera; the selected samples are more reliable but probably not useful for retrieving a person across cameras. Conversely, a small value generally introduces outliers or noisy samples which may mislead the model during fine-tuning. Figure 7 illustrates the influence of the selected samples under different settings of the threshold. In Figure 7, noise refers to images that do not belong to the same pedestrian but are grouped into one category by the clustering algorithm due to a similar outlook, occlusion or appearance/background.
To address this dilemma, the TSDRW strategy was designed. It starts with a larger threshold, cautiously selecting reliable samples, which tend to belong to the same ID and the same camera. As the discriminability of the model becomes more stable, samples involving more informative cross-camera pairs are selected by a smaller threshold to enhance the cross-camera retrieval ability of the model. To this end, the sampling threshold $\lambda^t$ is dynamically decreased with a decreasing rate $\eta$ as follows:
$$
\lambda^t = \lambda^{t-1} - \eta\,(U - L),
\tag{7}
$$
where U, a large threshold, represents the upper bound and L, a small threshold, represents the lower bound. As the discriminative ability of the model gradually increases, the threshold $\lambda^t$ decreases from the upper bound to the lower bound. The TSDRC scheme can be found in Algorithm 1.
Algorithm 1: TSDRC
Require: source-dataset pre-trained model $\phi_\theta$; unlabeled dataset X; target-dataset classifier $f_w$; threshold lower bound L and upper bound U; decreasing rate $\eta$.
Ensure: optimized model $\phi_\theta$.
  1:  Initialize $\lambda^0 = U$;
  2:  repeat
  3:        Extract features: $f_i = \phi_\theta(x_i^t)$ for all $x_i^t$;
  4:        $L_2$-normalize each feature $f_i$;
  5:        Run K-means clustering;
  6:        Update the cluster centers C and pseudolabels $\hat{y}^t$;
  7:        for i = 1 to N do
  8:              Find the nearest cluster center $c_{x_i}^t$ of $x_i^t$;
  9:              Re-weight each clustered sample $x_i^t$ as in Equation (4);
  10:      end for
  11:      Train $\langle \phi_\theta, f_w \rangle$ with the selected samples as in Equation (10);
  12:      Update $\lambda^t$: $\lambda^t = \lambda^{t-1} - \eta\,(U - L)$;
  13:  until $\lambda^t < L$
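For readability, the following Python skeleton mirrors the control flow of Algorithm 1, reusing the generate_pseudolabels and tsdrw_weights sketches above; extract_and_l2_normalize and fine_tune are placeholder helpers standing in for feature extraction and the Equation (10)-based training step, and all default values follow Section 3.2 except alpha, which is illustrative.

```python
def tsdrc_target_training(model, target_data, U=0.8, L=0.7, eta=1.5e-3,
                          epochs_per_iter=10, n_clusters=900, alpha=0.5):
    """Control-flow skeleton of Algorithm 1 (helper names are placeholders)."""
    lam = U                                   # step 1: lambda^0 = U
    feat_prev = None
    while lam >= L:                           # loop until lambda^t < L (step 13)
        feat = extract_and_l2_normalize(model, target_data)                   # steps 3-4
        labels, centers = generate_pseudolabels(feat, n_clusters)             # steps 5-6
        if feat_prev is None:
            feat_prev = feat                  # first iteration: no previous-epoch features yet
        weights = tsdrw_weights(feat, feat_prev, centers, labels, lam, alpha)  # steps 7-10
        fine_tune(model, target_data, labels, weights, epochs=epochs_per_iter)  # step 11
        feat_prev = feat
        lam -= eta * (U - L)                  # step 12: Equation (7)
    return model
```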
Cross-Camera Triplet Loss (CCT). To avoid noise from the same camera and improve cross-camera retrieval ability, we propose cross-camera triplet loss as follows:
$$
\mathcal{L}_{tri}(x_{c_i}^a) = \Bigl[\, \bigl\lVert x_{c_i}^a - x_{c_j}^p \bigr\rVert_2^2 - \bigl\lVert x_{c_i}^a - x_{c_k}^n \bigr\rVert_2^2 + m \Bigr]_+, \quad c_j \neq c_i \ \&\ c_k \neq c_i
\tag{8}
$$
It mainly focuses on pulling potential cross-camera positive samples towards the anchor. $x_{c_i}^a$ is the anchor belonging to the $i$-th camera, $x_{c_j}^p$ is a positive sample of the anchor $x_{c_i}^a$ coming from the $j$-th camera, and $x_{c_k}^n$ is a negative sample from the $k$-th camera. Neither the positive nor the negative sample belongs to the same camera as the anchor. For each anchor, a corresponding positive sample and a negative sample are selected to form a triplet $\langle positive, anchor, negative \rangle$ for computing the triplet loss. More details are given in the work [47].
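Below is a hedged PyTorch sketch of the cross-camera triplet loss in Equation (8); the batch-hard mining rule and the margin value are common choices assumed here, not details stated in the paper.

```python
import torch
import torch.nn.functional as F

def cross_camera_triplet_loss(feats, labels, cams, margin=0.3):
    """Cross-camera triplet loss sketch (Equation (8)).

    feats: (N, D) features; labels: (N,) pseudolabels; cams: (N,) camera ids.
    Positives and negatives are restricted to cameras different from the anchor's.
    """
    dist = torch.cdist(feats, feats) ** 2                  # squared Euclidean distances
    same_id = labels.unsqueeze(0) == labels.unsqueeze(1)
    diff_cam = cams.unsqueeze(0) != cams.unsqueeze(1)
    losses = []
    for a in range(feats.size(0)):
        pos = same_id[a] & diff_cam[a]                     # same id, different camera
        neg = (~same_id[a]) & diff_cam[a]                  # different id, different camera
        if pos.any() and neg.any():
            d_ap = dist[a][pos].max()                      # hardest cross-camera positive
            d_an = dist[a][neg].min()                      # hardest cross-camera negative
            losses.append(F.relu(d_ap - d_an + margin))
    return torch.stack(losses).mean() if losses else feats.new_zeros(())
```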
To further reduce the influence of noisy cluster outliers, we apply temporal smoothing to the features used in the triplet loss. The feature temporal smoothing is as follows:
$$
x_{c_i} = \phi_{\theta^t}\bigl((x_{c_i})^t\bigr) + \alpha\, \phi_{\theta^{t-1}}\bigl((x_{c_i})^t\bigr)
\tag{9}
$$
where $\phi_{\theta^t}((x_{c_i}^a)^t)$ is the feature of $(x_{c_i}^a)^t$ extracted by the model at the $t$-th epoch, while $\phi_{\theta^{t-1}}((x_{c_i}^a)^t)$ is the feature of $(x_{c_i}^a)^t$ extracted by the model at the $(t-1)$-th epoch.
The total loss is as follows:
$$
\mathcal{L}_{total}(x_i^{c_j}) = \mathcal{L}_{angular}(x_i^{c_j}) + \gamma\, \mathcal{L}_{tri}(x_i^{c_j})
\tag{10}
$$
where $\gamma$ is the weight of the CCT, which is set to 0.5 in this paper.

3. Results

3.1. Datasets

Comprehensive experiments were conducted on Market-1501 [32] and DukeMTMC-reID [33] to evaluate our approach. They are large-scale datasets and widely used for evaluating ReID algorithms. We used Rank-1 accuracy and mean average precision (mAP) as performance measurements to evaluate our proposed algorithm.

3.2. Implementation Details

PyTorch was used to implement our approach. The model structure consists of a ResNet-50 [1] backbone, a batch normalization (BN) layer, a global average pooling (GAP) layer, a fully connected (FC) layer and another batch normalization (BN) layer. The margin m of the angular softmax was set to 3 in the experiments. All person images were resized to 288 × 188. For the temporal smoothing dynamic re-weighting, the dynamic decreasing rate was set to 1.5 × 10⁻³. The lower bound L and upper bound U were set to 0.7 and 0.8, respectively. We adopted a P × C × K sampling strategy for the cross-camera triplet loss: we sampled P IDs for each mini-batch and, for each ID, sampled C cameras and K images from each camera.
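The sampling strategy can be sketched as follows, assuming a list of (index, person id, camera id) records; the default values of P, C and K are placeholders, since the text does not specify them.

```python
import random
from collections import defaultdict

def pck_sample(records, P=16, C=2, K=2):
    """Build one P x C x K mini-batch: P ids, C cameras per id, K images per camera.

    records: iterable of (index, person_id, camera_id) tuples; P, C, K are placeholders.
    """
    by_pid = defaultdict(lambda: defaultdict(list))
    for idx, pid, cam in records:
        by_pid[pid][cam].append(idx)

    batch = []
    pids = [p for p in by_pid if len(by_pid[p]) >= C]      # keep ids seen by at least C cameras
    for pid in random.sample(pids, min(P, len(pids))):
        for cam in random.sample(list(by_pid[pid]), C):
            batch.extend(random.choices(by_pid[pid][cam], k=K))  # sample with replacement if short
    return batch
```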

3.3. Ablation Experiments

This section analyzes the impact of the three main strategies (attribute knowledge, temporal smoothing dynamic re-weighting and cross-camera triplet loss), the cluster number K, the iteration frequency and the dynamic decreasing rate η used in our TSDRC.
Attribute Knowledge. Attribute knowledge provides a better starting point for subsequent fine-tuning. Table 1 shows that training with attributes improves the Rank-1 accuracy by 6.2% when the source domain is Market-1501 and by 2.8% when it is DukeMTMC-reID.
Temporal Smoothing Dynamic Re-weighting (TSDRW). Following the method PUL [23], four fixed thresholds were set: 0.85, 0.8, 0.7 and 0.6. As we can see, λ = 0.8 obtains the best performance among them. We compare TSDRW with DS and fixed-threshold sampling for selecting reliable training samples; the impact of the re-weighting strategy is shown in Table 1. The TSDRW strategy is the best: it achieves 0.3% gains on mAP and 1.1% gains on Rank-1 when the source domain is Market-1501, and 0.4% gains on mAP and 0.4% gains on Rank-1 when the source domain is DukeMTMC-reID.
Cross-Camera Triplet Loss (CCT). CCT was used to improve the cross-camera retrieval ability. The impact of the CCT is shown in Table 1. According to the results, the cross-camera triplet loss achieves 1.6% gains on mAP and 3.1% gains on Rank-1 when the source domain is Market-1501. It also achieves 1.3% gains on mAP and 1.9% gains on Rank-1 when the source domain is DukeMTMC-reID.
Cluster Number. Figure 8 shows the influence of the number of clusters K. We varied K from 700 to 1500 with a step of 200. K = 900 obtains the best performance on both DukeMTMC-reID and Market-1501. Therefore, we set K = 900 in the subsequent experiments. The fluctuations are relatively small, indicating that the proposed TSDRC is not very sensitive to K within a certain range.
Iteration Frequency. Table 2 shows the influence of different iteration frequencies. We set 30 iterations in total and 10 epochs per iteration in our experiments, which means we fine-tuned the pre-trained source model using the optimized pseudolabels every 10 epochs. As the comparisons show, the model fine-tuned with one epoch per iteration obtains the worst performance: 5.2% on Rank-1 and 1.9% on mAP when the source domain is Market-1501, and 6.6% on Rank-1 and 1.7% on mAP when the source domain is DukeMTMC-reID. This is because the pseudolabels change after each round of clustering, causing pseudolabel collisions, and the resulting unreliable samples mislead model training. As the number of epochs per iteration increases, performance rises significantly. Setting the iteration frequency to 10 epochs achieves 65.1% more gains on Rank-1 and 42.4% more on mAP than one epoch when the source domain is Market-1501, and 66.9% more on Rank-1 and 39.5% more on mAP when the source domain is DukeMTMC-reID. However, a larger number of epochs per iteration results in a longer training time. Therefore, we set the iteration frequency to 10 epochs, which obtains results similar to 20 epochs with less training time.
Dynamic Decreasing Rate η. In dynamic re-weighting, we set a sampling threshold λ^t to select reliable samples, controlled by a dynamic decreasing rate η. An upper bound and a lower bound limit the range of the threshold, which decreases from the upper bound U to the lower bound L with step length η. As Table 3 shows, we compared different settings of the dynamic decreasing rate to illustrate its influence. Here, U and L were set to 0.8 and 0.7, respectively. According to the results, a decreasing rate of 1.5 × 10⁻³ obtains the best performance, achieving 35.1% more gains on Rank-1 and 22.4% more on mAP than 1.5 × 10⁻¹ when the source domain is Market-1501, and 36.9% more on Rank-1 and 21.5% more on mAP when the source domain is DukeMTMC-reID. The rate 1.5 × 10⁻¹ obtains roughly half the performance of 1.5 × 10⁻³, while 1.5 × 10⁻² obtains values similar to 1.5 × 10⁻³. The reason is that an overly large step length makes the threshold drop too coarsely, so not enough reliable samples are introduced for training at the beginning. The rate 1.5 × 10⁻⁴ results in a relative drop of 1.3% on mAP and 1.2% on Rank-1 when the source domain is Market-1501 and 2% on mAP and 1.4% on Rank-1 when the source domain is DukeMTMC-reID, and its slow decrease also requires a longer training time.
The different selections of the bound range result in different confidence levels of the selected samples. As Figure 9 shows, we selected some ranking-list results and compared them in three groups. Group 1 shows that the selected samples share the same identity and the same camera when the similarity is larger than 0.8, and that the cameras of the samples become varied when the similarity is between 0.7 and 0.8. Group 2 shows that the selected samples share the same camera but have different identities when the similarity is between 0.6 and 0.7. Group 3 shows that the selected samples have different identities and cameras when the similarity is smaller than 0.6. According to this observation, we set 0.8 as the upper bound and 0.7 as the lower bound to select reliable samples during training.

3.4. Comparing with the State-of-the-Art Approaches

Table 4 and Table 5 display the experimental results. Compared with two clustering approaches, PUL [23] and CAMEL [48]; two unsupervised domain transfer approaches, TJ-AIDL [25] and ARN [34]; three image-style transfer approaches [22,27,36] (i.e., SPGAN+LMP, HHL and PTGAN); two domain generalization approaches, MixStyle [49] and DSU [50]; and the hand-crafted-feature method LOMO [16], our proposed TSDRC outperforms all of these methods in mAP, Rank-1 and Rank-5 on both DukeMTMC-reID and Market-1501.
Compared with the unsupervised dynamic training method CDS [46], our proposed TSDRC is better by 1.3% on mAP and 1.9% on Rank-1 on Market-1501 and by 3.1% on Rank-1 and 1.6% on mAP on DukeMTMC-reID. Compared with the fixed-sampling method PUL, TSDRC surpasses it by 20.7% on mAP and 28% on Rank-1 on Market-1501 and by 40.3% on Rank-1 and 27.9% on mAP on DukeMTMC-reID. Compared with CAMEL, TSDRC is better by 14.9% on mAP and 19% on Rank-1 on Market-1501. Compared with TJ-AIDL, TSDRC is better by 14.7% on mAP and 15.3% on Rank-1 on Market-1501 and by 21.3% on mAP and 26% on Rank-1 on DukeMTMC-reID. Compared with ARN, TSDRC surpasses it by 1.8% on mAP and 3.2% on Rank-1 on Market-1501 and by 10.1% on Rank-1 and 10.9% on mAP on DukeMTMC-reID. Compared with PTGAN, TSDRC surpasses it by 34.9% on Rank-1 on Market-1501 and 42.9% on Rank-1 on DukeMTMC-reID. Compared with SPGAN+LMP, TSDRC surpasses it by 14.5% on mAP and 15.8% on Rank-1 on Market-1501 and by 23.9% on Rank-1 and 18.1% on mAP on DukeMTMC-reID. Compared with HHL, TSDRC surpasses it by 9.8% on mAP and 11.3% on Rank-1 on Market-1501 and by 23.4% on Rank-1 and 17.1% on mAP on DukeMTMC-reID. Compared with MixStyle, TSDRC surpasses it by 13.1% on mAP and 16.9% on Rank-1 on Market-1501 and by 23.6% on Rank-1 and 16.1% on mAP on DukeMTMC-reID. Compared with DSU, TSDRC surpasses it by 8.8% on mAP and 9.8% on Rank-1 on Market-1501 and by 18.3% on Rank-1 and 12.3% on mAP on DukeMTMC-reID. Compared with LOMO, TSDRC surpasses it by 33.2% on mAP and 46.3% on Rank-1 on Market-1501 and by 58% on Rank-1 and 39.5% on mAP on DukeMTMC-reID.
Combined with the ablation experiments, we may infer that the proposed TSDRC achieves noticeable improvements for cross-domain person ReID thanks to dynamic re-weighting of the clusters, attribute-guided training and the cross-camera triplet loss.

4. Discussion and Future Work

In this work, we analyze the safety and effectiveness of unlabeled samples in the target domain. To balance safety and effectiveness, we propose a novel TSDRC method to solve the cross-domain person re-identification problem. We mainly adopt three strategies: using person attributes to enhance the discriminability of the model, constructing the temporal smoothing dynamic re-weighting strategy to select informative samples from the clustering results and designing the cross-camera triplet loss to improve cross-camera retrieval ability. TSDRC achieves competitive performance on the DukeMTMC-reID and Market-1501 datasets.
However, our method uses K-means to cluster the unlabeled samples, and K-means requires the number of cluster centers to be specified manually. Similarly, other clustering algorithms require manually specified clustering hyperparameters. In practical applications, these hyperparameters are unknown, which limits the scalability of our method.
Therefore, how to remove our method's dependence on clustering algorithms will be studied further. In addition, we will extend the cross-camera triplet loss proposed in this paper to further improve the performance of the unsupervised ReID task using positive and negative sample relations.

Author Contributions

Formal analysis, G.W.; Methodology, J.W.; Software, Z.T.; Writing—original draft, Q.Y.; Writing—review & editing, H.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  2. Subramanyam, A.V.; Gupta, V.; Ahuja, R. Robust Discriminative Subspace Learning for Person Reidentification. IEEE Signal Process. Lett. 2019, 26, 154–158. [Google Scholar] [CrossRef]
  3. Lin, Y.; Zheng, L.; Zheng, Z.; Wu, Y.; Hu, Z.; Yan, C.; Yang, Y. Improving person re-identification by attribute and identity learning. Pattern Recognit. 2019, 95, 151–161. [Google Scholar] [CrossRef] [Green Version]
  4. Zhang, S.; Zhang, L.; Wang, W.; Wu, X. AsNet: Asymmetrical Network for Learning Rich Features in Person Re-Identification. IEEE Signal Process. Lett. 2020, 27, 850–854. [Google Scholar] [CrossRef]
  5. Zhao, Y.; Li, Y.; Wang, S. Open-World Person Re-Identification with Deep Hash Feature Embedding. IEEE Signal Process. Lett. 2019, 26, 1758–1762. [Google Scholar] [CrossRef]
  6. Chang, X.; Hospedales, T.M.; Xiang, T. Multi-level Factorisation Net for Person Re-identification. In Proceedings of the Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 2109–2118. [Google Scholar]
  7. Chen, D.; Xu, D.; Li, H.; Sebe, N.; Wang, X. Group Consistent Similarity Learning via Deep CRF for Person Re-identification. In Proceedings of the Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 8649–8658. [Google Scholar]
  8. Chen, Y.; Li, Y.; Du, X.; Wang, Y. Learning resolution-invariant deep representations for person re-identification. Proc. AAAI 2019, 33, 8215–8222. [Google Scholar] [CrossRef] [Green Version]
  9. Cheng, D.; Gong, Y.; Zhou, S.; Wang, J.; Zheng, N. Person Re-identification by Multi-Channel Parts-Based CNN with Improved Triplet Loss Function. In Proceedings of the Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 1335–1344. [Google Scholar]
  10. Kalayeh, M.; Basaran, E.; Gokmen, M.; Kamasak, M.; Shah, M. Human Semantic Parsing for Person Re-identification. In Proceedings of the Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 1062–1071. [Google Scholar]
  11. Liu, J.; Ni, B.; Yan, Y.; Zhou, P.; Cheng, S.; Hu, J. Pose Transferrable Person Re-identification. In Proceedings of the Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 4099–4108. [Google Scholar]
  12. Jean, G.I.J.P.; Mehdi, M.; Xu, B.; David, W.; Sherjil, O.; Yoshua, B. Generative Adversarial Nets. In Proceedings of the NIPS, Montreal, QC, Canada, 8–13 December 2014; pp. 2672–2680. [Google Scholar]
  13. Chen, Y.; Hsu, W. Saliency Aware: Weakly Supervised Object Localization. In Proceedings of the 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, 12–17 May 2019; pp. 1907–1911. [Google Scholar]
  14. Chen, Y.; Huang, P.; Yu, L.; Huang, J.; Yang, M.; Lin, Y. Deep Semantic Matching with Foreground Detection and Cycle-Consistency. In Proceedings of the Asian Conference on Computer Vision (ACCV), Perth, Australia, 2–6 December 2018. [Google Scholar] [CrossRef]
  15. Chen, Y.; Lin, Y.; Yang, M.; Huang, J. Show, Match and Segment: Joint Weakly Supervised Learning of Semantic Matching and Object Co-Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 43, 3632–3647. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  16. Liao, S.; Hu, Y.; Zhu, X.; Li, S.Z. Person re-identification by local maximal occurrence representation and metric learning. In Proceedings of the Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 2197–2206. [Google Scholar]
  17. Farenzena, M.; Bazzani, L.; Perina, A.; Murino, V.; Cristani, M. Person re-identification by symmetry-driven accumulation of local features. In Proceedings of the Computer Vision and Pattern Recognition, San Francisco, CA, USA, 13–18 June 2010; pp. 2360–2367. [Google Scholar]
  18. Gray, D.; Tao, H. Viewpoint Invariant Pedestrian Recognition with an Ensemble of Localized Features. In Proceedings of the ECCV, Marseille, France, 12–18 October 2008; pp. 262–275. [Google Scholar]
  19. Ma, B.; Su, Y.; Jurie, F. Covariance Descriptor based on Bio-inspired Features for Person Re-identification and Face Verification. Image Vis. Comput. 2014, 32, 379–390. [Google Scholar] [CrossRef] [Green Version]
  20. Hoffman, J.; Tzeng, E.; Park, T.; Zhu, J.; Isola, P.; Saenko, K.; Efros, A.; Darrell, T. CyCADA: Cycle-Consistent Adversarial Domain Adaptation. In Proceedings of the ICML, Stockholm, Sweden, 10–15 July 2018. [Google Scholar]
  21. Chen, Y.; Lin, Y.; Yang, M.; Huang, J. CrDoCo: Pixel-level Domain Transfer with Cross-Domain Consistency. In Proceedings of the Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019. [Google Scholar]
  22. Deng, W.; Zheng, L.; Kang, G.; Yang, Y.; Ye, Q.; Jiao, J. Image-image domain adaptation with preserved self-similarity and domain-dissimilarity for person reidentification. In Proceedings of the Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 994–1003. [Google Scholar]
  23. Fan, H.; Zheng, L.; Yan, C.; Yang, Y. Unsupervised person re-identification: Clustering and fine-tuning. TOMM 2018, 14, 83. [Google Scholar] [CrossRef]
  24. Li, M.; Zhu, X.; Gong, S. Unsupervised person re-identification by deep learning tracklet association. In Proceedings of the ECCV, Munich, Germany, 8–10 September 2018; Volume 11208. [Google Scholar]
  25. Wang, J.; Zhu, X.; Gong, S.; Li, W. Transferable Joint Attribute-Identity Deep Learning for Unsupervised Person Re-identification. In Proceedings of the Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 2275–2284. [Google Scholar]
  26. Zhu, J.; Park, T.; Isola, P.; Efros, A. Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. In Proceedings of the ICCV, Venice, Italy, 22–29 October 2017. [Google Scholar]
  27. Zhong, Z.; Zheng, L.; Li, S.; Yang, Y. Generalizing a person retrieval model hetero-and homogeneously. In Proceedings of the ECCV, Munich, Germany, 8–10 September 2018; pp. 172–188. [Google Scholar]
  28. Choi, Y.; Choi, M.; Kim, M.; Ha, J.-W.; Kim, S.; Choo, J. StarGAN: Unified Generative Adversarial Networks for Multi-domain Image-to-Image Translation. In Proceedings of the Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 8789–8797. [Google Scholar]
  29. Lin, S.; Li, H.; Li, C.; Kot, A. Multi-task Mid-level Feature Alignment Network for Unsupervised Cross-Dataset Person Re-Identification. In Proceedings of the BMVC 2018, Newcastle, UK, 3–6 September 2018. [Google Scholar]
  30. Kodirov, E.; Xiang, T.; Gong, S. Dictionary Learning with Iterative Laplacian Regularisation for Unsupervised Person Re-identification. In Proceedings of the BMVC, Swansea, UK, 7–10 September 2015; Volume 3, p. 8. [Google Scholar]
  31. Zhao, R.; Ouyang, W.; Wang, X. Unsupervised Salience Learning for Person Re-identification. In Proceedings of the Computer Vision and Pattern Recognition, Portland, OR, USA, 23–28 June 2013; pp. 3586–3593. [Google Scholar]
  32. Zheng, L.; Shen, L.; Tian, L.; Wang, S.; Wang, J.; Tian, Q. Scalable Person Re-identification: A Benchmark. In Proceedings of the ICCV, Santiago, Chile, 7–13 December 2015. [Google Scholar]
  33. Zheng, Z.; Zheng, L.; Yang, Y. Unlabeled Samples Generated by GAN Improve the Person Re-identification Baseline in Vitro. In Proceedings of the ICCV, Venice, Italy, 22–29 October 2017. [Google Scholar]
  34. Li, Y.; Yang, F.; Liu, Y.; Yeh, Y.; Du, X.; Wang, Y. Adaptation and Re-Identification Network: An Unsupervised Deep Transfer Learning Approach to Person Re-Identification. In Proceedings of the CVPRW, Salt Lake City, UT, USA, 18–22 June 2018; pp. 285–2856. [Google Scholar]
  35. Peng, P.; Xiang, T.; Wang, Y.; Massimiliano, P.; Gong, S.; Huang, T.; Tian, Y. Unsupervised cross-dataset transfer learning for person re-identification. In Proceedings of the Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 1306–1315. [Google Scholar]
  36. Wei, L.; Zhang, S.; Gao, W.; Tian, Q. Person Transfer GAN to Bridge Domain Gap for Person Re-identification. In Proceedings of the Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 79–88. [Google Scholar]
  37. Chen, S.; Fan, Z.; Yin, J. Pseudo Label Based on Multiple Clustering for Unsupervised Cross-Domain Person Re-Identification. IEEE Signal Process. Lett. 2020, 27, 1460–1464. [Google Scholar] [CrossRef]
  38. Lv, J.; Chen, W.; Li, Q.; Yang, C. Unsupervised Cross-dataset Person Re-identification by Transfer Learning of Spatial-Temporal Patterns. In Proceedings of the Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 7948–7956. [Google Scholar]
  39. Fu, Y.; Wei, Y.; Wang, G.; Zhou, Y.; Shi, H.; Huang, S. Self-Similarity Grouping: A Simple Unsupervised Cross Domain Adaptation Approach for Person Re-Identification. In Proceedings of the ICCV, Seoul, Korea, 27 October–3 November 2019. [Google Scholar]
  40. Lin, Y.; Dong, X.; Zheng, L.; Yan, Y.; Yang, Y. A bottom-up clustering approach to unsupervised person re-identification. In Proceedings of the AAAI, Honolulu, HI, USA, 27 January–1 February 2019; Volume 2, pp. 1–8. [Google Scholar]
  41. Song, L.; Wang, C.; Zhang, L.; Du, B.; Zhang, Q.; Huang, C.; Wang, X. Unsupervised Domain Adaptive Re-Identification: Theory and Practice. arXiv 2018, arXiv:1807.11334. [Google Scholar] [CrossRef] [Green Version]
  42. Yang, F.; Li, K.; Zhong, Z.; Luo, Z.; Sun, X.; Cheng, H.; Guo, X.; Huang, F.; Ji, R.; Li, S. Asymmetric Co-Teaching for Unsupervised Cross-Domain Person Re-Identification. In Proceedings of the AAAI, New York, NY, USA, 7–12 February 2020; pp. 12597–12604. [Google Scholar]
  43. Zhang, X.; Cao, J.; Shen, C.; You, M. Self-training with progressive augmentation for unsupervised cross-domain person re-identification. In Proceedings of the ICCV, Seoul, Korea, 27 October–3 November 2019; pp. 8222–8231. [Google Scholar]
  44. Yin, Q.; Wang, G.; Ding, G.; Gong, S.; Tang, Z. Multi-View Label Prediction for Unsupervised Learning Person Re-Identification. IEEE Signal Process. Lett. 2021, 28, 1390–1394. [Google Scholar] [CrossRef]
  45. Liu, W.; Wen, Y.; Yu, Z.; Li, M.; Bhiksha, R.; Song, L. Sphereface: Deep hypersphere embedding for face recognition. In Proceedings of the Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; Volume 1, p. 1. [Google Scholar]
  46. Wu, J.; Liao, S.; Lei, Z.; Wang, X.; Yang, Y.; Li, S.Z. Clustering and Dynamic Sampling Based Unsupervised Domain Adaptation for Person Re-Identification. In Proceedings of the ICME, Shanghai, China, 8–12 July 2019; pp. 886–891. [Google Scholar]
  47. Schroff, F.; Kalenichenko, D.; Philbin, J. Facenet: A unified embedding for face recognition and clustering. In Proceedings of the Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 815–823. [Google Scholar]
  48. Yu, H.; Wu, A.; Zheng, W. Cross-View Asymmetric Metric Learning for Unsupervised Person Re-Identification. In Proceedings of the ICCV, Venice, Italy, 22–29 October 2017; pp. 994–1002. [Google Scholar]
  49. Zhou, K.; Yang, Y.; Qiao, Y.; Xiang, T. Domain generalization with mixstyle. In Proceedings of the ICLR, Virtual, 25–29 April 2022. [Google Scholar]
  50. Li, X.; Dai, Y.; Ge, Y.; Liu, J.; Shan, Y.; Duan, L. Uncertainty Modeling for Out-of-Distribution Generalization. arXiv 2022, arXiv:2202.03958. [Google Scholar]
Figure 1. Comparison of different characteristics between the two datasets.
Figure 2. Cluster noise analysis.
Figure 3. The framework of the proposed temporal smoothing dynamic re-weighting and cross-camera learning (TSDRC) method: we first trained our source domain model with both identity classification and attribute classification. Seven attributes, including gender, hat, backpack, bag, handbag, upper-body clothing color and lower-body clothing color, were employed to enhance the training of the source domain model. Then, we iteratively clustered the target-domain samples and dynamically re-weighted the informative samples in the target domain to fine-tune the source domain model. The circles with different colors represent different classes; the arrows indicate that samples belonging to the same class are pulled closer, while samples not belonging to the same class are pushed apart. Here, GAP represents global average pooling, BN represents batch normalization, and FC represents fully connected.
Figure 4. Comparison of the attention change caused by using attribute knowledge on the two datasets. Here, AK represents attribute knowledge.
Figure 5. Generally shared attributes (e.g., upper-body color, backpack and gender) on the DukeMTMC-reID and Market-1501 datasets.
Figure 6. (a,c) denote representation learning using softmax as the loss function, and (b,d) denote representation learning using angular softmax as the loss function. The different colors shown in the four sub-figures represent different classes. (a,b) show binary classification, and (c,d) show multi-class classification. As shown in the figure, angular softmax lets features belonging to the same class be distributed more compactly in the feature space, making samples of different classes easier to distinguish from each other and thus achieving stronger generalization performance. The learned representations are also more transferable.
Figure 7. The motivation of dynamic re-weighting. When a higher threshold is used to filter the clustering results, the images grouped into the same class are consecutive frames of the same person; these samples are safe but not helpful for cross-camera retrieval. When a lower threshold is used, positive sample pairs across cameras are retained, but noise is also introduced; these samples are effective but unsafe.
Figure 8. The influence of the number of cluster centers. The yellow line denotes mAP, and the blue line denotes Rank-1 accuracy.
Figure 9. Retrieval ranking lists. Comparison of the reliability of samples under different similarity values used as upper and lower bounds.
Table 1. Ablation experiments. In re-weighting, TSDRW and DS denote temporal smoothing dynamic re-weighting and dynamic sampling, while the numeric settings denote fixed-threshold sampling. CCT denotes whether the cross-camera triplet loss was used.

| Strategy | Setting | Market->Duke mAP | Market->Duke Rank-1 | Duke->Market mAP | Duke->Market Rank-1 |
|---|---|---|---|---|---|
| Attributes | no | 15.8 | 31.0 | 19.4 | 47.3 |
| Attributes | yes | 19.9 | 37.2 | 22.4 | 50.1 |
| Re-weighting | 0.85 | 33.0 | 57.3 | 32.9 | 64.6 |
| Re-weighting | 0.8 | 39.6 | 64.3 | 37.3 | 67.3 |
| Re-weighting | 0.7 | 35.6 | 58.8 | 34.4 | 64.8 |
| Re-weighting | 0.6 | 29.71 | 49.8 | 36.12 | 63.18 |
| Re-weighting | DS | 42.7 | 67.2 | 39.9 | 71.8 |
| Re-weighting | TSDRW | 43.0 | 68.3 | 40.3 | 72.2 |
| CCT | no | 42.7 | 67.2 | 39.9 | 71.6 |
| CCT | yes | 44.3 | 70.3 | 41.2 | 73.5 |
Table 2. Iteration frequency. We compare different iteration frequencies, such as 10 epochs per iteration, which means the pre-trained source model is fine-tuned with the optimized clustering results every 10 epochs.

| Frequency | Market->Duke mAP | Market->Duke Rank-1 | Duke->Market mAP | Duke->Market Rank-1 |
|---|---|---|---|---|
| One epoch | 1.9 | 5.2 | 1.7 | 6.6 |
| Two epochs | 7.8 | 12.7 | 7.2 | 16.7 |
| Five epochs | 31.0 | 55.1 | 30.2 | 52.1 |
| 10 epochs | 44.3 | 70.3 | 41.2 | 73.5 |
| 20 epochs | 46.1 | 75.2 | 43.6 | 77.3 |
Table 3. Dynamic decreasing rate η.

| η | Market->Duke mAP | Market->Duke Rank-1 | Duke->Market mAP | Duke->Market Rank-1 |
|---|---|---|---|---|
| 1.5 × 10⁻¹ | 21.9 | 35.2 | 19.7 | 36.6 |
| 1.5 × 10⁻² | 41.8 | 68.7 | 37.2 | 70.7 |
| 1.5 × 10⁻³ | 44.3 | 70.3 | 41.2 | 73.5 |
| 1.5 × 10⁻⁴ | 43.0 | 69.1 | 39.2 | 72.1 |
Table 4. Comparing TSDRC with other unsupervised domain-adaptive person ReID approaches in the Market-1501 -> DukeMTMC-reID setting.

| Methods | mAP | Rank-1 | Rank-5 | Rank-10 |
|---|---|---|---|---|
| LOMO [16] | 4.8 | 12.3 | 21.3 | 26.6 |
| UMDL [35] | 7.3 | 18.5 | 31.4 | 37.4 |
| PTGAN [36] | - | 27.4 | - | 50.7 |
| PUL [23] | 16.4 | 30.0 | 43.4 | 48.5 |
| CAMEL [48] | - | - | - | - |
| SPGAN+LMP [22] | 26.2 | 46.4 | 62.3 | 68.0 |
| TJ-AIDL [25] | 23.0 | 44.3 | 59.6 | 65.0 |
| HHL [27] | 27.2 | 46.9 | 61.0 | 66.7 |
| ARN [34] | 33.4 | 60.2 | 73.9 | 79.5 |
| CDS [46] | 42.7 | 67.2 | 75.9 | 79.4 |
| MixStyle [49] | 28.2 | 46.7 | - | - |
| DSU [50] | 32.0 | 52.0 | - | - |
| TSDRC | 44.3 | 70.3 | 79.7 | 82.2 |
Table 5. Comparing TSDRC with other unsupervised domain-adaptive person ReID approaches in the DukeMTMC-reID -> Market-1501 setting.

| Methods | mAP | Rank-1 | Rank-5 | Rank-10 |
|---|---|---|---|---|
| LOMO [16] | 8.0 | 27.2 | 41.6 | 49.1 |
| UMDL [35] | 12.4 | 34.5 | 52.6 | 60.3 |
| PTGAN [36] | - | 38.6 | - | 66.1 |
| PUL [23] | 20.5 | 45.5 | 60.7 | 66.7 |
| CAMEL [48] | 26.3 | 54.5 | - | - |
| SPGAN+LMP [22] | 26.7 | 57.7 | 75.8 | 82.4 |
| TJ-AIDL [25] | 26.5 | 58.2 | 74.8 | 81.1 |
| HHL [27] | 31.4 | 62.2 | 78.8 | 84.0 |
| ARN [34] | 39.4 | 70.3 | 80.4 | 86.3 |
| CDS [46] | 39.9 | 71.6 | 81.2 | 84.7 |
| MixStyle [49] | 28.1 | 56.6 | - | - |
| DSU [50] | 32.4 | 63.7 | - | - |
| TSDRC | 41.2 | 73.5 | 83.1 | 86.5 |