Multi-Level Fusion Model for Person Re-Identification by Attribute Awareness

Abstract: Existing person re-identification (Re-ID) methods usually suffer from poor generalization and over-fitting caused by insufficient training samples. We find that high-level attributes, semantic information, and part-based local information alignment are useful for person Re-ID networks. In this study, we propose a person re-identification network with part-based attribute-enhanced features. The model includes a multi-task learning module, a local information alignment module, and a global information learning module. A ResNet backbone with non-local blocks and instance batch normalization (IBN) learns more discriminative feature representations. The multi-task, local, and global modules extract features in parallel. To better prevent over-fitting, the local information alignment module transforms pedestrian pose alignment into local information alignment to assist attribute recognition. Extensive experiments on the Market-1501 and DukeMTMC-reID datasets demonstrate that the proposed method outperforms most current algorithms.


Introduction
Given a pedestrian image, the purpose of person re-identification is to retrieve images of the pedestrian from cross-camera devices. Person re-identification is an image-retrieval technology designed to compensate for the visual limitations of fixed cameras. It can be combined with pedestrian-detection and tracking technologies for use in fields such as intelligent pedestrian detection and intelligent security [1][2][3][4].
Pedestrians have both rigid and flexible characteristics. Owing to differences between camera devices, their appearance is easily affected by factors such as clothing, posture, weather, and occlusion, making person re-identification one of the most challenging research topics in computer vision.
The main idea of traditional image-based person re-identification is to compare the similarity of two identities: the similarity of different identities should be small, and the similarity of the same identity should be large. Although the supervised person re-identification problem has labeled information, for most person Re-ID datasets the bounding boxes are detected by outdated detectors. These limitations make it necessary to improve discriminative performance in person Re-ID. Pedestrian attributes have many commonalities between identities, and there are many differences among pedestrian images with the same identity. As a result, the features extracted by the network often cannot accurately measure similarity, so simply relying on labeled information to determine pedestrian distance can easily bias the features the network attends to. For example, assume three pedestrian images have identities (x_1, y_1), (x_2, y_2), and (x_3, y_3), where y_1 = y_2 = y_3. Although x_1 and x_2 may be very similar, the similarities S(x_1, x_2), S(x_1, x_3), and S(x_2, x_3) can be far apart, and the network pays attention to other regional feature information. When attributes are introduced, (x_1, y_1) can be expressed as (x_1, y_1, A_1). The main contributions of this paper are as follows:
1. Our model includes a multi-task learning module, a local information alignment module, and a global information learning module. The local information alignment module transforms pedestrian pose alignment into local information alignment to assist in inferring pedestrian attributes.

2. We design an improved network based on non-local blocks and instance batch normalization (IBN) to learn more discriminative feature representations.

3. The proposed method outperforms recent state-of-the-art person re-identification methods.
The remainder of this paper is arranged as follows. Section 2 introduces related work on image-based person re-identification. Section 3 introduces the proposed method. Section 4 discusses experiments with the proposed method on two image-based person re-identification datasets. Section 5 summarizes the proposed method.

Related Work
Person re-identification solves the problem of matching pedestrian images between unrelated cameras. It faces challenges caused by different perspectives, postures, occlusion, and other issues. To solve these problems, we need to increase the inter-class distance and reduce the intra-class distance. Traditional methods depend on metric learning [10][11][12] and deep learning [13,14].
Learning each attribute independently is an intuitive idea, but it makes person attribute recognition redundant and inefficient. Therefore, researchers tend to evaluate all attributes in a network model, and evaluate each attribute as its own task. Due to the high efficiency of multi-task learning, researchers have been paying increased attention to it [15][16][17][18].
Lin et al. [6,7] proposed a network framework combining pedestrian ID labels and attributes, breaking through the traditional limitation of learning only from pedestrian ID labels. They constructed a multi-task network that simultaneously learns pedestrian ID labels and predicts attribute labels by introducing pedestrian attribute labels. Person re-identification is challenging because pedestrians are both flexible and rigid, and camera devices differ and are greatly affected by the environment. These multi-task networks achieve high accuracy in attribute recognition but do not perform person re-identification reliably. Unlike these networks, to improve the accuracy of person re-identification, this paper uses pedestrian attribute labels as an aid and discusses their function in person re-identification.
Lin et al. [6] and Yin et al. [7] build a multi-task network by introducing pedestrian attribute labels, enabling the network to learn identity and attributes. Lin et al. labeled pedestrian attributes and combined attribute learning with global learning in a person re-identification method. However, they did not quantitatively analyze the attributes or consider the problem of hard samples. Yin et al. added hard-sample learning to force the network to extract higher-level semantic features, but the network lacks local learning capability, which limits the improvement. In this paper, we design a multi-level fusion model with joint local and global learning and introduce various tricks that improve the discriminative ability of the network as a whole. Figure 1 shows an example. Figure 1a shows the role of attributes in person re-identification: through ID label learning and attribute learning, the network can learn the semantic information of pedestrian attributes. However, pedestrians are both flexible and rigid, which makes re-identification difficult, as in Figure 1b,c. Figure 1b shows two different identities, each with two pictures that differ in details: in the first identity, the pedestrian sometimes carries a bag and sometimes does not; in the second identity, the pedestrian sometimes rides a bike and sometimes walks. Figure 1c shows three different identities whose jackets are all yellow with similar characteristics. Using only ID labels for learning, it is hard to tell these identities apart. The first and second identities are male and the third is female, so attribute learning can separate them; moreover, the second identity wears a backpack and rides a bike, which attribute learning can also separate.
Zhu et al. [18] used attributes to assist a person re-identification network, fusing the low-level feature distance and attribute-based distance as the final distance to distinguish whether a given image has the same identity.
Attributes have been introduced to video-based person re-identification because of their role in detection and recognition. Zhao et al. [19] proposed an attribute-driven method for feature decomposition and frame weighting. The sub-features are re-weighted by the confidence of attribute recognition and integrated over the time dimension as the final representation. Through this strategy, the most informative area of each frame is enhanced, which contributes to a more discriminative sequence representation. Song et al. [20] proposed the partial attribute-driven network (PADNet). Methods such as this are based on global-level feature representation: pedestrians are automatically divided into multiple body parts, and a four-branch multi-label network is used to explore the spatiotemporal cues of the video. The bulk of person re-identification is based on static images. Although the approaches differ, the aim is to retrieve the most similar image. In the training phase, the distances within the same class should be as close as possible, and the distances between different classes should be separated as much as possible. In the testing phase, we compare all pedestrian images in the gallery and select the one with the closest distance. Once translated into the problem of retrieving the most similar image, the construction of features becomes particularly important. From a traditional human perspective, we judge the identity of pedestrians by criteria such as clothing, age, and body shape.

Network Structure
We describe the network structure in detail. Figure 2 shows the proposed network framework, which has two parts, attribute recognition and identity recognition, corresponding to A and B, respectively. A and B have parts in common. The features of attribute recognition and the global branch are derived from "Feature Aggregation 1", and the features of the local branch are derived from "Feature Aggregation 2". Unlike the global branch, the local branch needs to produce features of six different parts. We separate the identity network from the entire framework to show the structure more clearly, as in Figure 3. Figure 3 is a multi-level feature-fusion network focusing only on identity recognition; it is a person Re-ID network framework without attribute recognition. It contains the global feature-extraction module and the local feature-extraction module. The goal of the global module is to extract global information on pedestrians, while the local module aligns the pedestrian within the bounding boxes. After the local average pooling layer, six different part-level feature representations are generated. In the training phase, the attribute-recognition branch and the results of the global and local branches calculate the loss values through the loss functions and complete backpropagation. The "C" in module A of Figure 2 represents a concat function that combines the classification results of all attributes. BCE loss is used to calculate the difference between the predicted and real attributes and to learn the relationship between pedestrian attributes and aggregated features through backpropagation.
In the testing phase, the results of the global and local branches are fused, and the required results are output by an evaluation function. Points "1" and "2" in Figure 3 represent the output results of person re-identification with attributes in the testing phase. The metrics at "1" in Figure 3 are commonly used for person re-identification with attributes; they include the F1 score, recall, and accuracy.
To present the proposed backbone more clearly, Figure 3 shows a person re-identification network framework without attribute recognition. The framework describes the workflow of the global and local branches in more detail. "G" in the figure represents the global branch and "L" represents the local branch.
In the training phase, we preprocess the input images for data augmentation. The preprocessing includes five operations: resize, random horizontal flip, pad, random crop, and random erasing. These help prevent the network from falling into local extrema during training, which would lead to overfitting.
Preprocessing also diversifies the input images and helps to better train the network. The preprocessed data are then fed into the ResNet backbone used in this paper. Two modules, instance batch normalization and the non-local network (Section 3.1), endow ResNet with stronger feature-extraction performance.
We introduce generalized mean pooling [21], whose function is similar to adaptive average pooling, to aggregate the feature embedding after the backbone:

f_i = ( (1/|X_i|) Σ_{x ∈ X_i} x^{p_i} )^{1/p_i},  f = [f_1, ..., f_K],

where f_i represents the i-th dimension of the aggregated feature f, X_i is the i-th feature map, and p_i is the pooling parameter. In particular, when p_i → ∞, generalized mean pooling evolves into max pooling, and when p_i = 1, it evolves into average pooling. In this paper, we set p_i = 3. The aggregated features are sent to modules A and B in Figure 2, respectively. Module A learns the relationship between pedestrian attributes and features and can learn the correlation between attributes and features. Module B learns the relationship between pedestrian identities and features; it contains a global and a local branch and can learn the global and part-level features of pedestrian images. Together, modules A and B focus the network on pedestrian attributes as well as identities and can prevent overfitting.
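Generalized mean pooling can be sketched in a few lines of PyTorch. This is an illustrative implementation of the formula above (with a single shared exponent p rather than a learnable per-dimension p_i, and a small epsilon clamp to keep the fractional power stable); it is not the authors' exact layer.

```python
import torch
import torch.nn.functional as F

def gem_pool(x: torch.Tensor, p: float = 3.0, eps: float = 1e-6) -> torch.Tensor:
    """Generalized mean (GeM) pooling over an NCHW feature map.

    p -> infinity recovers max pooling, p = 1 recovers average pooling;
    the paper fixes p = 3. The eps clamp is a numerical-stability assumption.
    """
    x = x.clamp(min=eps).pow(p)                      # element-wise x^p
    x = F.avg_pool2d(x, kernel_size=x.shape[-2:])    # mean over each feature map
    return x.pow(1.0 / p).flatten(1)                 # p-th root -> (N, C) vector
```

With p = 1 the result coincides with plain average pooling, which is a convenient sanity check.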
To obtain the relationship between each attribute and the aggregated features, a batch-normalization (BN-1) module is set in module A for each attribute. The BN-1 module, shown in Figure 4a, is followed by a binary classifier that determines whether the current pedestrian feature contains that attribute.
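Module A's per-attribute heads can be sketched as follows, assuming each head is a 1D batch norm followed by a single linear binary classifier whose logits are concatenated (the "C" function in Figure 2) and trained with BCE loss. The hidden sizes and the single-linear-layer choice are assumptions for illustration, not the paper's exact layers.

```python
import torch
import torch.nn as nn

class AttributeHeads(nn.Module):
    """One BN-1 + binary classifier per attribute, sketching module A."""

    def __init__(self, feat_dim: int, num_attributes: int):
        super().__init__()
        # Each attribute gets its own BN + linear head (an assumption;
        # the paper only specifies BN-1 followed by a binary classifier).
        self.heads = nn.ModuleList(
            nn.Sequential(nn.BatchNorm1d(feat_dim), nn.Linear(feat_dim, 1))
            for _ in range(num_attributes)
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # Concatenate per-attribute logits, as the "C" concat in Figure 2.
        return torch.cat([head(feats) for head in self.heads], dim=1)

# BCE between predicted and real attribute labels, as described in Section 3.
bce = nn.BCEWithLogitsLoss()
```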
In module B of Figure 2, to obtain the relationship between each identity and the aggregated features, a batch-normalization (BN-2) module (Figure 4b) is designed. It has a similar function to BN-1 in Figure 4a, but uses only a simple one-dimensional batch-normalization operation. Finally, triplet, softmax, and center losses are used to calculate the difference between the predicted and real identities. By learning the relationship between pedestrian identities and aggregated features through backpropagation, the network brings pedestrians of the same identity as close as possible and keeps pedestrians of different identities farther apart.
The testing phase differs from the training phase. The network does not perform data augmentation on the input images but, to adapt them to the network, performs a resizing operation. After the aggregated features are obtained, the attributes of the current input images (query and gallery) can be judged through the attribute classifiers. The network can also output the embedded features of each input image (query) through the BN-2 module in Figure 4b and find the images with the best rank score from the gallery through distance matching. The "1" and "2" in Figure 3 represent the output results of the double-stream network in the testing phase. The metrics at "2" are commonly used for person re-identification problems, while the embedded features at "1" are used for output visualization.

Non-Local Residual Network (ResNet) of Instance Batch Normalization (IBN)
We add attention-like non-local [22] and instance batch normalization (IBN) [23] modules to learn more robust features.
The generalized non-local operation can be defined as

y_i = (1/C(x)) Σ_{∀j} f(x_i, x_j) g(x_j),

where i is the output location, j enumerates all possible locations, x is the input feature, and f(x_i, x_j) is a function that calculates the affinity between i and j. g is a unary function representing the input at location j, and C(x) is a normalization factor. We apply non-local operations to the ResNet with instance batch normalization, making some changes to adapt it. Specifically, C(x) = N, the number of locations of x, and f(x_i, x_j) uses a dot product to calculate the affinity between i and j. In this paper, the non-local block acts on Layer 2, Layer 3, and Layer 4 of ResNet.
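The non-local operation with dot-product affinity and C(x) = N can be sketched as a PyTorch module. The 1x1 embeddings, halved inner channel count, and residual form follow the common non-local block design; they are assumptions for illustration, not a re-implementation of the paper's exact network.

```python
import torch
import torch.nn as nn

class NonLocalBlock(nn.Module):
    """Dot-product non-local block: y_i = (1/N) * sum_j f(x_i, x_j) g(x_j)."""

    def __init__(self, channels: int):
        super().__init__()
        inter = max(channels // 2, 1)                 # assumed inner width
        self.theta = nn.Conv2d(channels, inter, 1)    # embeds x_i
        self.phi = nn.Conv2d(channels, inter, 1)      # embeds x_j
        self.g = nn.Conv2d(channels, inter, 1)        # unary function g
        self.out = nn.Conv2d(inter, channels, 1)      # restore channel count

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, h, w = x.shape
        theta = self.theta(x).flatten(2).transpose(1, 2)   # (n, hw, c')
        phi = self.phi(x).flatten(2)                       # (n, c', hw)
        g = self.g(x).flatten(2).transpose(1, 2)           # (n, hw, c')
        affinity = torch.bmm(theta, phi) / (h * w)         # f / C(x), C(x) = N
        y = torch.bmm(affinity, g).transpose(1, 2).reshape(n, -1, h, w)
        return x + self.out(y)                             # residual connection
```

Inserting such blocks after Layers 2-4 of ResNet leaves the feature-map shapes unchanged, since the block is shape-preserving.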

Loss Function
In a person re-identification dataset, we can use D = {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)} to represent the set with identity labels, where x_i and y_i are the input image and label, respectively, of identity i, and n is the total number of input images [24,25]. In a person re-identification dataset with attributes, we can use A to represent the attribute set of all identities, where A_i = (A_i^1, A_i^2, ..., A_i^m) is the attribute subset of identity i, and m is the number of attributes of each identity. Therefore, we can use E = {(x_1, A_1), (x_2, A_2), ..., (x_n, A_n)} to represent the set with attribute labels. For the two sets D and E, we use a bidirectional parallel approach to solve the person re-identification problem [26,27]. Accordingly, we define the following three functions.
For the set D with identity labels, we define two functions, both based on the objective of identity labels. The first is a classification function based on identity labels, where q_i = e^{z_i} / Σ_{j=1}^{n} e^{z_j} is the confidence probability of the predicted label. φ(θ, x_i) is the feature-embedding function of the i-th identity, and θ is its training parameter. f_Id(w_Id, φ) is the classification function of the feature embedding for the identity label, and w_Id is its training parameter. L_Id(f_Id, y_i) is the image-label loss function of identity i. The purpose of F_Id is to find an appropriate image feature embedding so that the identity obtained by training is as consistent as possible with the real label.
The second is a metric-learning function based on identity labels,

L_Tri(φ, y_i) = max(d_p − d_n + α, 0),

where d_p is the Euclidean distance of the positive pairs, d_n is the Euclidean distance of the negative pairs, and α is the decision boundary (margin) of the triplet loss. L_Tri(φ, y_i) is the metric loss function of the feature embedding for identity labels. The purpose of F_Tri is to find an appropriate image feature embedding so that identities with the same label are as close as possible and those with different labels are separated as much as possible. We combine classification learning and metric learning to find an image feature embedding that better solves the person re-identification problem. For the set E with attribute labels, we define a function, where y_m is the target class of the m-th attribute and p(m|x) is the confidence probability of the predicted attribute. f_Att^j(w_Att^j, φ) is the classification function of the feature embedding for the j-th attribute of an identity, and L_Att(f_Att^j, A_i^j) is the j-th classification loss function of identity i. We integrate all the attributes of identity i to obtain its attribute set. The purpose of F_Att is to find a suitable image feature embedding so that the attribute set obtained by training is as consistent as possible with the real attribute set.
In the testing phase, we use the feature-embedding function φ(θ, x_i) to embed the query and all gallery images in the feature space. The identity label of each query image is judged according to the Euclidean distance between the query image and all gallery images, and f_Att^j(w_Att^j, φ(θ, x_i)) calculates all the attributes of each query identity.
To better adapt these classifiers (f_Id and f_Att), we normalize the features after executing the feature-embedding function φ(θ, x_i). In particular, in the testing phase, we normalize them before calculating the Euclidean distance between each query image and all gallery images.
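The testing-phase matching described above (normalize, then rank the gallery by Euclidean distance) can be sketched as follows. The function and variable names are ours, introduced only for illustration.

```python
import torch
import torch.nn.functional as F

def ranked_gallery(query: torch.Tensor, gallery: torch.Tensor) -> torch.Tensor:
    """L2-normalize embeddings, then rank gallery images by Euclidean distance.

    query:   (num_query, dim) embeddings from phi(theta, x_i)
    gallery: (num_gallery, dim) embeddings
    Returns per-query gallery indices sorted from closest to farthest.
    """
    q = F.normalize(query, dim=1)     # normalization before distance, as in the text
    g = F.normalize(gallery, dim=1)
    dist = torch.cdist(q, g)          # (num_query, num_gallery) Euclidean distances
    return dist.argsort(dim=1)
```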

Experiment
We conducted experiments to verify the effectiveness of Algorithm 1. To distinguish it from others, we call the proposed method the multi-level model for person re-identification by attribute awareness (MLAReID).

Algorithm 1: MLAReID training
Input: training images
1: for each training batch do
2:   Extract feature vectors from the input images with the model
3:   Predict labels and attributes from the input images with the model
4:   Update the ID loss with Equation (3)
5:   Update the triplet loss with Equation (5)
6:   Update the attribute loss with Equation (9)
7: end for
Output: F1 score, Recall, Accuracy, CMC, mAP, mINP

1. Market-1501 [28]: This dataset was collected by six cameras in front of a supermarket at Tsinghua University. It has 1501 identities and 32,668 annotated bounding boxes. Each annotated identity appears in at least two cameras. The dataset is divided into 751 training identities and 750 testing identities, corresponding to 12,936 and 19,732 images, respectively. Attributes are annotated by pedestrian identity, and each image has 30 attributes. Note that although the upper- and lower-body clothing have seven and eight color attributes, respectively, each identity has only one color marked "yes".

2. Duke Multi-Target, Multi-Camera (DukeMTMC-reID) [29,30]: This dataset from Duke University contains 1812 identities and 34,183 annotated bounding boxes. It is divided into 702 training identities and 1110 testing identities, corresponding to 16,522 and 17,661 images, respectively. Attributes are annotated by pedestrian identity, and each image has 23 attributes.
For fairness of comparison, each image is resized to a width of 128 pixels and a height of 256 pixels.

Evaluation Metrics
To measure the performance of the algorithm, we used standard metrics, including the cumulative matching characteristic (CMC) curve, mean average precision (mAP), mean inverse negative penalty (mINP), and the receiver operating characteristic (ROC) curve.
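For a single query, these retrieval metrics reduce to simple operations on the boolean match vector over the ranked gallery; averaging over all queries then yields mAP, mINP, and the CMC rank-1 score. The sketch below follows the standard definitions (AP as the mean of precision at each hit; INP as the number of matches divided by the rank of the hardest match); the function name is ours.

```python
import numpy as np

def single_query_metrics(ranked_match: np.ndarray):
    """Per-query metrics from a boolean vector over the ranked gallery.

    ranked_match[r] is True when the r-th ranked gallery image shares the
    query's identity. Returns (AP, INP, rank-1 hit).
    """
    hits = np.flatnonzero(ranked_match)                  # ranks of true matches (0-based)
    ap = float(np.mean((np.arange(len(hits)) + 1) / (hits + 1)))
    inp = float(len(hits) / (hits[-1] + 1))              # inverse negative penalty
    return ap, inp, bool(ranked_match[0])
```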

Datasets and Settings
The algorithm in this paper uses data-augmentation methods such as random horizontal flip, pad, random crop, and random erasing to preprocess the input images. For the triplet loss function, in the training phase, four identities are fed into the network in each batch, and each identity has eight images, for a total of 32 pedestrian images.
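The P x K batch construction required by the triplet loss (here P = 4 identities, K = 8 images, 32 images per batch) can be sketched with a simple identity-grouped sampler. This is a simplified illustration, not the paper's exact sampler: identities with fewer than K images are drawn with replacement, and leftover identities that cannot fill a batch are dropped.

```python
import random
from collections import defaultdict

def pk_batches(labels, p=4, k=8, seed=0):
    """Yield batches of P identities x K images each (4 x 8 = 32 here)."""
    rng = random.Random(seed)
    by_id = defaultdict(list)
    for idx, pid in enumerate(labels):          # group image indices by identity
        by_id[pid].append(idx)
    pids = list(by_id)
    rng.shuffle(pids)
    for i in range(0, len(pids) - p + 1, p):    # take P identities per batch
        batch = []
        for pid in pids[i:i + p]:
            pool = by_id[pid]
            # K images per identity; sample with replacement if too few
            picks = rng.sample(pool, k) if len(pool) >= k else rng.choices(pool, k=k)
            batch.extend(picks)
        yield batch
```

Each yielded batch then contains guaranteed positive pairs (same identity) and negative pairs (different identities) for mining triplets.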

Comparison with the State-of-the-Art
We compare our method with methods published in recent years. Tables 1 and 2 compare the Rank1, Rank5, Rank10, and mAP evaluation metrics on the Market-1501 and DukeMTMC-reID datasets, respectively, where "-" means there is no record. For the Market-1501 dataset, the compared methods are GAN-based, part-based, and combined. From Table 1, our proposed method achieves the best results on Rank1, Rank5, Rank10, and mAP compared with current state-of-the-art methods. For the DukeMTMC-reID dataset, the proposed method likewise achieves the best performance on Rank1, Rank5, Rank10, and mAP.
For the Market-1501 dataset, we list the accuracy of 12 attributes for comparison. "age" has four values: young, teenager, adult, and old. "upcolor" has eight values: upblack, upwhite, upred, uppurple, upyellow, upgray, upblue, and upgreen. "downcolor" has nine values: downblack, downwhite, downpink, downpurple, downyellow, downgray, downblue, downgreen, and downbrown. "Upcolor" and "Downcolor" in Figure 5b represent the average accuracy of these two groups of related attributes. From Figure 5, the recognition rates of "Backpack", "Bag", "Hat", and "Up" are not as good as those of the method proposed by Yin [7], but the recognition rates of the other attributes are significantly improved. For the DukeMTMC-reID dataset, we list the accuracy of 10 attributes for comparison. "upcolor" has eight values: upblack, upwhite, upred, uppurple, upgray, upblue, upgreen, and upbrown. "downcolor" has seven values: downblack, downwhite, downred, downgray, downblue, downgreen, and downbrown. "Upcolor" and "Downcolor" in Figure 6b represent the average accuracy of these related attributes. From Figure 6, except for the "Backpack", "Bag", and "Top" attributes, the recognition rates are greatly improved compared with the method proposed by Yin [7]. To further verify the feature-extraction performance of person re-identification with attributes, we discuss its cross-domain capability, as shown in Table 3. M→D indicates that the source domain is Market-1501 and the target domain is DukeMTMC-reID, and D→M indicates the opposite. Table 3 shows that the proposed method has advantages in Rank1 and mAP compared with other methods. This verifies that the algorithm effectively extracts pedestrian features after fully learning the relationships between pedestrian attribute labels and features, and between pedestrian identities and features.

Ablation Study
To better illustrate the effectiveness of the proposed method, we carried out ablation experiments for three modules: non-local, instance batch normalization, and attributes. Using the Rank1, mAP, and mINP metrics, we evaluated the experimental results with and without each of these modules.
Figures 7 and 8 show our ablation experiments on Market-1501 and DukeMTMC-reID for the multi-level fusion module, where "MLAReID" represents the multi-level fusion model based on ResNet and "without multi-level" represents the model without the multi-level fusion module. As can be seen from Figures 7 and 8, compared with "without multi-level", both the Rank1 and mAP of MLAReID improve more noticeably during the iteration process on both Market-1501 and DukeMTMC-reID. This shows that the proposed MLAReID attends to pedestrian images more comprehensively and abstractly, and can better extract the semantic information of pedestrian images. In Table 4, "√" means a module is used, and a blank means it is not. Without the three modules of non-local, instance batch normalization, and attributes, we obtained 94.1% Rank1, 85.0% mAP, and 57.1% mINP on the Market-1501 dataset. When the attribute module was applied, Rank1 rose by 0.1%, mAP by 1%, and mINP by 2.1%. Applying all three modules produced much better results than the model with no modules: Rank1 improved by 2%, mAP by 5.3%, and mINP by 13.9%. For the DukeMTMC-reID dataset, without the three modules, Rank1 was 85.9%, mAP was 74.8%, and mINP was 36.4%. After applying the three modules, Rank1 increased by 5.5%, mAP by 6.6%, and mINP by 11.5%. The ablation experiments show that these three modules improve network performance and verify the effectiveness of the proposed algorithm.

Visualization
We used a variety of visualization experiments to analyze the performance of the proposed method. Figures 9 and 10 show the ROC and CMC curves on the two datasets. To better verify the effectiveness of the proposed algorithm, we compared the visualization results of the two networks. "ID" in Figures 11-14 represents the query label, and serial numbers 1 to 10 represent the ranking results from most to least similar. Red numbers indicate an incorrect match, and green numbers indicate a correct match. The first line shows the visualization results of the baseline without attributes, and the second line shows the visualization results of the proposed method.
From Figure 11, we can find that the baseline (without the local branch, non-local, IBN, and attributes), which uses no attributes, has more mismatches. For example, for the input image with ID 94, many of the top 10 images are matching errors: the baseline errs on the "backpack" attribute, regarding a pedestrian without a backpack as an exact match, and also errs on the "clothing" attribute. For the pedestrian with ID 934, the baseline (without non-local, IBN, and attributes) does not correctly match the "hair" attribute. The method proposed in this paper accurately recognizes the key attributes of the pedestrian; the network has learned the relationships between key pedestrian attributes and features, and between pedestrian identity labels and features. For Figure 12, we selected pedestrian images with IDs 47 and 288 from the query. The baseline without attributes again had more incorrect matches: for the pedestrian with ID 47, it made errors on the "backpack" and "bag" attributes, and for the pedestrian with ID 288, it did not correctly match the "hair" attribute. We used GradCAM to generate heat maps of the input pedestrians for comparison. For the two pedestrians from the Market-1501 dataset, the baseline (without the local branch, non-local, IBN, and attributes) was compared with the proposed method. From Figure 13, we can find that the method in this paper focuses on part-level features and is more accurate than the baseline. The proposed method accurately recognizes the key attributes of pedestrians, which demonstrates their important role in network parameter learning. Figure 14 shows two examples for each dataset, one positive and one negative. Positive examples indicate that our proposed method makes correct predictions.
Negative examples show that the proposed method correctly predicts the attributes a pedestrian possesses, but can err on attributes the pedestrian does not possess. For example, for ID 94, our network correctly predicts the "teenager", "backpack", "clothes", "up", "upblack", and "downblack" attributes, but also predicts "downblue", which ID 94 does not have. For ID 329 of Market-1501 and ID 98 of DukeMTMC-reID, our network predicts both completely accurately. The network predicts not only the attributes a pedestrian image has, but also those it does not have.

Time-Complexity Analysis
We provide the time complexity of the proposed method according to the network structure:

T_a = T[A_g(L4)] + N × (T[C(E_g)] + T[L_a(C(E_g))]),  (10)
T_g = T[A_g(L4)] + T[C(E_g)] + T[L(E_g)],  (11)
T_l = T[A_l(L4)] + 6 × (T[C(E_l)] + T[L(E_l)]),  (12)
T = T_a + T_g + T_l,  (13)

where T_a, T_g, and T_l are the time complexities of the attribute, global, and local branches, respectively; T is their sum; L4 is the output of ResNet50; A_g is the aggregation function of the global branch; A_l is the aggregation function of the local branch; C is the classification function; E_g is the feature embedding of the global branch; E_l is the feature embedding of the local branch; and N is the number of attributes. The image is cut into six equal parts in this paper, hence the factor of six in the local branch.

Conclusions
Each pedestrian has different attributes, and the number of attributes varies, leading to an unbalanced data distribution. In addition, pedestrians walk, so the images captured by cameras are usually blurred, making it challenging to identify pedestrian attributes. This paper introduces a local information alignment module that focuses on specific regions. It combines a multi-task learning module and a global module by learning the relationship between pedestrian attribute semantics and pedestrian identity. Our method solves, to a certain extent, the low attribute-recognition precision caused by unbalanced data distribution and blurred pedestrian images. Experiments on two datasets show that the multi-level network proposed in this paper can improve the precision of attribute recognition and enhance person Re-ID performance.
In the future, we will first generate images through generative adversarial networks (GANs) to deal with unbalanced data distribution and low-quality pedestrian images; pedestrian images can also be generated according to given attributes to increase the training data. Second, we can introduce graph neural networks (GNNs) and attention mechanisms to learn pedestrian posture information. We will focus on specific areas through alignment of pedestrian posture to identify pedestrian attributes more accurately.
Author Contributions: S.P. designed the algorithm, analyzed the experimental data, and wrote the manuscript. X.F. provided supervision, funding, and experimental equipment. All authors have read and agreed to the published version of the manuscript.

Conflicts of Interest:
The authors declare no conflict of interest.