Article

Part-Attention-Based Pseudo-Label Refinement Reciprocal Compact Loss for Unsupervised Cattle Face Recognition

School of Automation and Electrical Engineering, Inner Mongolia University of Science & Technology, Baotou 014010, China
*
Author to whom correspondence should be addressed.
Electronics 2025, 14(12), 2343; https://doi.org/10.3390/electronics14122343
Submission received: 28 April 2025 / Revised: 4 June 2025 / Accepted: 5 June 2025 / Published: 7 June 2025

Abstract

Cattle face recognition is a feasible way to identify cattle for information management on large farms and for identity verification in commercial farm insurance. Recent cattle face recognition approaches, based on supervised learning, depend heavily on annotation, which is both labor-intensive and time-consuming. Unsupervised learning for cattle face recognition aims to learn discriminative representations for cattle retrieval from unlabeled data. However, the inherent noise in pseudo-labels significantly hinders performance. Thus, we propose an unsupervised learning framework with part-attention-based pseudo-label refinement reciprocal compact loss (USL-PARC), which enhances the reliability of the pseudo-labels using fine-grained local context derived via an attention mechanism, while obtaining separable and discriminative features through contrastive learning with the compact loss. Firstly, we propose a part-attention-based pseudo-label refinement framework that refines the pseudo-labels of global features by dynamically supplementing local fine-grained information, thereby mitigating the effects of pseudo-label noise. Secondly, a ResNet-Sim network, augmented with the SimAM attention mechanism, is constructed to strengthen the ability to capture more informative localized supplementary information. Finally, we introduce a compact loss to tighten the clustering of feature points from the same identity in the feature space. USL-PARC achieves 97.4% accuracy, outperforming the state-of-the-art unsupervised learning models on our CattleFace2025 dataset. These results demonstrate the effectiveness of the proposed USL-PARC in mitigating the impact of pseudo-label noise and enhancing the learning of separable and discriminative features.

1. Introduction

Cattle face recognition paves a feasible way for identification of cattle in all aspects of information management for large-scale pasture, such as individual record establishment, epidemic prevention registration, and pedigree information maintenance. It also provides an effective method for identity verification in procedures in commercial insurance that needs cattle ID. Thus, it is becoming a topic of great interest with regard to computer vision in agriculture. With the extraordinary progress of deep learning, cattle face recognition based on Deep Convolutional Neural Networks (DCNNs) has developed rapidly due to its remarkable feature extraction capability.
For cattle face recognition based on DCNNs, approaches built on Deep Metric Learning (DML) are the most common. They mainly involve two tasks: the construction of the convolutional network structure and the design of an effective loss function. ResNet or Inception-based networks [1,2], trained with a distance constraint such as triplet loss [3], are typical frameworks in current cattle face recognition tasks. The network structure directly affects the quality of the extracted cattle face features, which in turn affects the accuracy and robustness of the recognition. Alongside a well-designed architecture, an effective loss function is equally important for model training. Researchers have primarily adopted ArcFace [4,5], which has demonstrated strong performance in face recognition, and refined the conventional cross-entropy loss [6]. A well-designed loss function can enhance the learning capability of convolutional neural networks and boost feature discriminability, thereby facilitating individual identification.
However, current cattle face recognition algorithms using DML are mainly based on supervised learning and rely heavily on a large amount of labeled data for model training. The annotation of cattle images is both time-consuming and labor-intensive. Moreover, in practical livestock farming scenarios, there are very large numbers of unlabeled individuals, including newborn animals and animals on newly established farms.
Unsupervised learning provides an effective approach to training models using unlabeled data, addressing the reliance on annotations. Pseudo-label-based unsupervised learning methods primarily consist of two stages: (1) generating pseudo-labels for unlabeled data based on unsupervised clustering [7,8,9], and (2) utilizing the generated pseudo-labels to guide model training. Unsupervised learning with the above stages has achieved great success in human recognition tasks. However, pseudo-labels obtained by clustering inevitably include noise, and the inferior quality of these pseudo-labels directly impairs the learning ability of the model.
To mitigate the impact of noise in pseudo-labels, extensive efforts have been made to increase label reliability, including the use of auxiliary networks [10,11] and the enhancement of clustering algorithms [12,13]. However, these approaches neglect the importance of localized fine-grained cues. Research [14] has shown that fine-grained information in local features contributes to identity recognition. The pseudo-label refinement approach proposed in [15] also explicitly demonstrates the effectiveness of improving label reliability with refined local information, but it does not strengthen the model’s ability to extract informative local characteristics.
Thus, we propose an unsupervised learning framework with part-attention-based pseudo-label refinement reciprocal compact loss (USL-PARC) for cattle face recognition. Our USL-PARC framework refines pseudo-labels by incorporating discriminative local information to mitigate the impact of pseudo-label noise, while the SimAM attention mechanism [16] is integrated into our ResNet-Sim network to enhance fine-grained feature extraction. Additionally, compact loss, including smoothed cross-entropy loss, triplet loss, and density loss, is introduced to enlarge the inter-class separability and enhance intra-class compactness.
The contributions of this work are as follows:
(1)
Since local features remain relatively stable despite variations in viewpoint and cattle posture, a ResNet-Sim network based on SimAM is constructed to capture fine-grained details. A cross agreement score is employed to assess the reliability of this local information, and the local context is adaptively supplemented according to that score to refine the pseudo-labels.
(2)
Compact loss is adopted to strengthen the discriminability of both global and local features by increasing the distances between features of different individuals and compressing the feature cluster of the same class.
(3)
The CattleFace2025 dataset is created to support unsupervised learning tasks for cattle face recognition. It is encouraging to find that our proposed USL-PARC framework outperforms the state-of-the-art unsupervised learning models on CattleFace2025. CattleFace2025 will be available publicly after the paper is accepted.
The rest of the paper is organized as follows: related work is reviewed in Section 2. The CattleFace2025 dataset is introduced in Section 3. The USL-PARC framework is proposed in Section 4. The experimental details and results are provided in Section 5. Finally, conclusions and future work are presented in Section 6.

2. Related Work

2.1. Cattle Face Recognition Based on DML

Current cattle face recognition approaches based on deep metric learning that utilize well-designed network architecture [1,2,3] and effective loss function [4,5,17] have demonstrated significant competitiveness in cattle identification tasks. Xu et al. [18] proposed the EEM and EOM modules to enhance the MobileFaceNet, addressing suboptimal accuracy on images with low discriminability and poor robustness to pose variations. TB-CNN [19] combines two feature extraction networks with the SE block to extract features from different angles, mitigating the effects of angle variations and partial feature loss. To effectively leverage local features, GPN [20] adaptively exploits the global information and fine-grained local details based on feature maps of different hidden layers to learn more discriminative features. From a practical application perspective, Li et al. [21] developed a lightweight neural network with six convolutional layers, overcoming the challenge of deploying models to resource-constrained embedded systems.
An effective loss function serves to supervise the model in learning separable and discriminative features. CattleFaceNet [4] adopted Additive Angular Margin Loss (ArcFace) to strengthen the within-class compactness and between-class discrepancy during training. Meng et al. [6] employed the additive margin softmax (AM-Softmax) loss, where a margin m is added to the decision boundary to increase the separability of the classes. Additionally, cross-entropy loss [5,21], triplet loss [20], and contrastive loss [17] are also widely utilized in cattle face recognition tasks. However, the aforementioned DML approaches, which are based on supervised learning, require a large amount of labeled data, but annotating cattle is labor-intensive and time-consuming.

2.2. Pseudo-Label Refinement

Training with pseudo-labels in unsupervised learning provides an alternative approach to guiding the learning process, eliminating the reliance on annotations. However, pseudo-labels generated through clustering [22,23] often contain noise, which can significantly degrade the performance. To address this issue, recent techniques have been proposed to improve the accuracy of pseudo-labels by optimizing clustering algorithms [12,13,24], introducing auxiliary networks [10,11], or employing pseudo-label refinement methods [15,25,26].
Considering that the performance of pseudo-labels is directly determined by the effectiveness of clustering, Zhai et al. [13] proposed an AD-Cluster technique to enhance the intra-cluster diversity. Ge et al. [12] proposed a self-paced contrastive learning strategy with a clustering reliability criterion to identify unreliable clusters by measuring the independence and compactness. To reduce the “sub and mixed” clustering errors, ISE [24] generates support samples from actual samples and their neighboring clusters in the embedding space through a progressive linear interpolation (PLI) strategy.
Meanwhile, auxiliary networks are introduced to reduce label noise, which leverage the model ensemble in a peer-teaching manner by using predictions from auxiliary networks as refined labels for the target network. Specifically, MMT [11] is performed using off-line-refined hard pseudo-labels and on-line-refined soft pseudo-labels in an alternative training manner, while MEB-Net [10] introduces an authority regularization to accommodate the heterogeneity of experts learned with different architectures.
In addition, pseudo-label refinement methods have also demonstrated effectiveness in improving label quality. Lin et al. [25] explored the similarity between images based on image features and auxiliary information, identifying a reliable set of samples for each instance to refine the labels. Zhang et al. [26] utilized clustering consensus to assess the similarity of pseudo-labels used in iterative training and improve the pseudo-labels through temporal propagation and pseudo-label aggregation. PPLR [15] designed a cross agreement score to capture the complementary relationship between global and part features, thereby utilizing representations with rich local contexts to enhance label applicability.

3. Materials: CattleFace2025 Dataset

With the development of cattle face recognition methods based on DCNNs, a series of datasets for cattle face recognition have been created. The details of the current cattle face datasets are listed and shown in Table 1.
The CattleFace2025 dataset was created with 11,880 cattle face images of 574 Holstein Friesian and Simmental individuals from several ranches in the Inner Mongolia Autonomous Region. All the images were captured using Panasonic DC-GH5S cameras and smartphones against different backgrounds with varying viewpoints and illumination. All images were cropped to 500 × 500 pixels with the face in the middle. Compared with other datasets, CattleFace2025 covers two breeds (Holstein and Chinese Simmental) with a relatively large population and high environmental diversity. Some of the samples are shown in Figure 1.

4. Methods: Our Proposed USL-PARC Framework

We propose a part-attention-based pseudo-label refinement reciprocal compact loss (USL-PARC) framework, which mitigates the noise of the pseudo-label by enhancing its reliability using attention-guided fine-grained local context. Furthermore, compact loss is utilized to improve the feature distribution by minimizing intra-class distance while maintaining inter-class separability.
Following existing pseudo-label-based unsupervised learning methods, USL-PARC comprises two alternating phases: clustering and training. In the clustering phase, global and local features are extracted using the ResNet-Sim network. Then, pseudo-labels are generated by clustering global features, while the cross agreement score [15] is calculated to evaluate similarity between part and global features. In the training phase, pseudo-labels are refined by aggregating local fine-grained information based on the cross agreement score. The model is trained using compact loss guided by the refined pseudo-labels. The framework of USL-PARC is illustrated in Figure 2.

4.1. ResNet-Sim Network

The ResNet-Sim network focuses on fine-grained local details to extract global and part features. Given an unlabeled cattle face dataset denoted by $X = \{x_i\}_{i=1}^{N}$, where $N$ is the number of samples, global features $\{f_i^{g}\}_{i=1}^{N}$ and corresponding local features $\{f_i^{p_m}\}_{m=1}^{P}$, where $P$ represents the number of local features, are extracted using the ResNet-Sim network.
Specifically, we use ResNet50 as the backbone, embedding SimAM to capture more discriminative representations. The SimAM is a parameter-free attention mechanism that enhances the model’s representational learning ability without increasing computational overhead. It evaluates the feature discriminability using active parameters. The detailed structure of SimAM is shown in Figure 3.
Given the $i$-th element $e_i^{c}$ within the $c$-th channel of the feature map, the active parameter $a_i^{c}$ is represented as follows:
$$a_i^{c} = \frac{(e_i^{c} - \mu_c)^2 + 2(\sigma_c^2 + \lambda)}{4(\sigma_c^2 + \lambda)} \quad (1)$$
where $\mu_c = \frac{1}{N_c} \sum_{i=1}^{N_c} e_i^{c}$ denotes the mean value of elements within the $c$-th channel, $\sigma_c^2 = \frac{1}{N_c} \sum_{i=1}^{N_c} (e_i^{c} - \mu_c)^2$ represents the variance of elements, $N_c$ refers to the total number of elements in the channel, and $\lambda$ is the regularization coefficient. The activity level of element $e_i^{c}$ is expressed by $a_i^{c}$, based on which the adjusted element $\tilde{e}_i^{c}$ on the feature map is represented as follows:
$$\tilde{e}_i^{c} = \mathrm{sigmoid}(a_i^{c}) \times e_i^{c} \quad (2)$$
In our ResNet-Sim, the SimAM attention mechanism is embedded after the outputs of Layer3 and Layer4 in the ResNet50 backbone. The global feature map is evenly divided into three horizontal regions to generate local feature maps. Global and corresponding local features are obtained via average pooling and max pooling, respectively. ResNet-Sim effectively enhances the ability to extract local features, which in turn provides reliable complementary information for the refinement of pseudo-labels.
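To make the gating concrete, the following is a minimal pure-Python sketch of the activity parameter and sigmoid reweighting described above, applied to a single channel. The function name `simam_channel`, the flat-list input, and the default `lam=1e-4` are assumptions made for illustration; they are not taken from the authors' implementation.

```python
import math


def simam_channel(values, lam=1e-4):
    """Reweight the elements of one channel with a SimAM-style activity score.

    values: flat list of the N_c activations in a single channel.
    lam:    regularization coefficient lambda (1e-4 is an assumed default).
    """
    n = len(values)
    mu = sum(values) / n                           # channel mean
    var = sum((v - mu) ** 2 for v in values) / n   # channel variance (1/N_c form)
    out = []
    for v in values:
        # activity: a = ((e - mu)^2 + 2(var + lam)) / (4(var + lam))
        a = ((v - mu) ** 2 + 2 * (var + lam)) / (4 * (var + lam))
        gate = 1.0 / (1.0 + math.exp(-a))          # sigmoid(a)
        out.append(gate * v)                       # adjusted element
    return out


flat = simam_channel([1.0, 1.0, 1.0, 1.0])
peaked = simam_channel([1.0, 1.0, 1.0, 5.0])
```

In this sketch an element equal to the channel mean receives the smallest possible activity (0.5), while elements that deviate from the mean, such as the value 5.0 in `peaked`, receive a larger gate and are therefore attenuated less.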

4.2. Part-Attention-Based Pseudo-Label Refinement

4.2.1. Cross Agreement Score

Local information of a specific part of the face maintains relative invariance to variations in viewpoint and posture, making it a reliable basis for identification. Inspired by this, we aggregate local predictions to refine the global pseudo-labels. However, local features from the same image may capture identity-related information that differs from the global features and may also be susceptible to the effects of noise. Accordingly, the cross agreement score [15] is introduced to evaluate the reliability of local features, with the aim of reducing interference from irrelevant regions.
The cross agreement score $S_i^{p_m}$ is defined as the similarity between the global feature of $x_i$ and the local feature of its $m$-th part. Firstly, a k-nearest neighbor search is conducted separately on the global feature space and on each local feature space to generate ranked lists of indices containing the $k$ most similar samples for $f_i^{g}$ and $f_i^{p_m}$. The cross agreement score is then calculated as the Jaccard similarity coefficient between the two sets, and is represented as follows:
$$S_i^{p_m} = \frac{|R(f_i^{g}, k) \cap R(f_i^{p_m}, k)|}{|R(f_i^{g}, k) \cup R(f_i^{p_m}, k)|} \quad (3)$$
where $R(f_i^{g}, k)$ and $R(f_i^{p_m}, k)$ represent the sets of indices from the ranked lists computed by $f_i^{g}$ and $f_i^{p_m}$, respectively, and $|\cdot|$ denotes the cardinality of a set.
According to Equation (3), $S_i^{p_m} \in [0, 1]$. A strong complementary relationship between $f_i^{p_m}$ of image $x_i$ and $f_i^{g}$ leads to an increase in $S_i^{p_m}$, indicating that the local features provide reliable supplementary information. Conversely, a decrease in $S_i^{p_m}$ suggests that the local features may introduce misleading information into pseudo-label refinement. The cross agreement score is utilized in both pseudo-label refinement and smoothed cross-entropy loss.
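The score can be illustrated with a small pure-Python sketch: build the top-k neighbor set of one sample in the global feature space and in one part feature space, then take the Jaccard coefficient of the two sets. The helper names, the use of Euclidean distance, and the inclusion of the query itself in its own neighbor set are assumptions for illustration; the paper does not specify these details.

```python
import math


def knn_indices(features, query_idx, k):
    """Set of indices of the k samples closest to features[query_idx].

    Euclidean distance is assumed; the query itself is included (distance 0).
    """
    q = features[query_idx]
    dists = sorted((math.dist(q, f), j) for j, f in enumerate(features))
    return {j for _, j in dists[:k]}


def cross_agreement(global_feats, part_feats, i, k):
    """Jaccard similarity between the global and part top-k neighbor sets of sample i."""
    rg = knn_indices(global_feats, i, k)
    rp = knn_indices(part_feats, i, k)
    return len(rg & rp) / len(rg | rp)


# Toy 1-D features for five samples (hypothetical values).
g = [[0.0], [1.0], [2.0], [10.0], [11.0]]
p_agree = [[0.0], [1.0], [2.0], [10.0], [11.0]]      # same neighborhood structure
p_disagree = [[0.0], [10.0], [11.0], [1.0], [2.0]]   # different neighbors for sample 0

s_high = cross_agreement(g, p_agree, i=0, k=3)
s_low = cross_agreement(g, p_disagree, i=0, k=3)
```

Here `s_high` is 1.0 because both spaces return the same three neighbors, while `s_low` is 0.2 because only the query itself is shared among five distinct indices.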

4.2.2. Part-Based Pseudo-Label Refinement

The pseudo-label for sample $x_i$, generated by DBSCAN clustering, is defined as $\hat{y}_i \in \mathbb{R}^{C}$, which represents the one-hot encoding of the hard assignment with $C$ clusters. Since the cross agreement score quantifies the reliability of local features, a weighted sum of local predictions based on the corresponding score is used for refinement. The refined pseudo-label $\tilde{y}_i$, leveraging reliable fine-grained local context, is represented as follows:
$$\tilde{y}_i = \alpha \hat{y}_i + (1 - \alpha) \sum_{m=1}^{P} \varphi_i^{p_m} y_i^{p_m} \quad (4)$$
where $y_i^{p_m}$ represents the prediction vector of $f_i^{p_m}$, $P$ is the number of local features (set to three), $\varphi_i^{p_m}$ denotes the value obtained by applying the softmax function to the three cross agreement scores of image $x_i$, and $\alpha$ is the strength coefficient for label refinement.
The pseudo-labels obtained by clustering global features lack the learning of local information, and the one-hot labels generated via label scattering disregard inter-class relationships. In contrast, the refined pseudo-labels derived by dynamically incorporating local predictions effectively address these problems and enhance the applicability of pseudo-labels in model training. The specific structure of the part-based pseudo-label refinement is shown in Figure 4.
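The refinement rule can be sketched in a few lines of pure Python: the cross agreement scores of the parts are passed through a softmax, the part predictions are averaged with those weights, and the result is blended with the one-hot label using α. The function names and the toy numbers below are hypothetical and serve only to illustrate the weighted sum.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def refine_pseudo_label(one_hot, part_preds, scores, alpha=0.5):
    """Blend a one-hot pseudo-label with score-weighted part predictions.

    one_hot:    list of length C (hard cluster assignment).
    part_preds: P probability vectors of length C, one per local part.
    scores:     P cross agreement scores for this image.
    alpha:      refinement strength (the paper reports alpha = 0.5).
    """
    weights = softmax(scores)
    num_classes = len(one_hot)
    mixed = [sum(w * p[c] for w, p in zip(weights, part_preds))
             for c in range(num_classes)]
    return [alpha * one_hot[c] + (1 - alpha) * mixed[c]
            for c in range(num_classes)]


y_hat = [1.0, 0.0, 0.0]
preds = [[0.8, 0.1, 0.1], [0.6, 0.3, 0.1], [0.2, 0.7, 0.1]]
refined = refine_pseudo_label(y_hat, preds, scores=[0.5, 0.5, 0.5], alpha=0.5)
only_global = refine_pseudo_label(y_hat, preds, scores=[0.5, 0.5, 0.5], alpha=1.0)
```

With equal scores the three parts are weighted equally, and the refined label remains a probability vector. Setting `alpha=1.0` recovers the original one-hot label, which matches the behavior described for the extreme values of α.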

4.3. Compact Loss

During model training, compact loss, which comprises smoothed cross-entropy loss, triplet loss, and density loss, is employed to improve the feature distribution by enlarging inter-class distances and minimizing intra-class variation.

4.3.1. Smoothed Cross-Entropy Loss

Smoothed cross-entropy loss consists of dynamic label smoothing loss for calibrating local predictions and adaptive label refinement loss for enhancing global feature similarity, calculated as follows:
$$L_{smoothed\text{-}ce} = L_{dls} + L_{alr} \quad (5)$$
Dynamic label smoothing loss aims to discard fine-grained information with low reliability in local predictions. Based on the cross agreement score, it is computed by flexibly adjusting the weights of the cross-entropy loss and Kullback–Leibler (KL) divergence loss, which is represented as follows:
$$L_{dls} = \frac{1}{P} \sum_{i=1}^{N} \sum_{m=1}^{P} \left( S_i^{p_m} H(\hat{y}_i, y_i^{p_m}) + (1 - S_i^{p_m}) D_{KL}(u \,\|\, y_i^{p_m}) \right) \quad (6)$$
where $P$ is the number of local features (set to three), $\hat{y}_i$ is the pseudo-label, $y_i^{p_m}$ is the prediction vector of $f_i^{p_m}$, $S_i^{p_m}$ is the cross agreement score, $u$ is a uniform vector with equal probability for each class, $H(\cdot, \cdot)$ denotes the cross-entropy, and $D_{KL}(\cdot \,\|\, \cdot)$ represents the KL divergence.
As S i p m approaches one, the cross-entropy loss calibrates the local prediction for higher confidence aligned with pseudo-labels. Conversely, as S i p m converges to zero, the KL divergence loss is employed to guide the prediction towards the uniform vector, thereby weakening this component. Compared with label smoothing by a constant value, our approach adjusts the smoothing strength based on feature reliability, preserving valid information while suppressing noisy interference.
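The interpolation between the two terms can be sketched as follows. This is a minimal pure-Python illustration of the dynamic label smoothing loss as defined above, with hypothetical function names and a small `eps` added inside the logarithms for numerical safety (an assumption, not part of the formula).

```python
import math


def cross_entropy(target, pred, eps=1e-12):
    """H(target, pred) = -sum_c target_c * log(pred_c)."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, pred))


def kl_from_uniform(pred, eps=1e-12):
    """D_KL(u || pred), with u the uniform distribution over the classes."""
    u = 1.0 / len(pred)
    return sum(u * math.log(u / (p + eps)) for p in pred)


def dls_loss(pseudo_labels, part_preds, scores):
    """Dynamic label smoothing loss over a batch.

    pseudo_labels[i]: one-hot pseudo-label of sample i.
    part_preds[i][m]: prediction vector of part m of sample i.
    scores[i][m]:     cross agreement score of that part, in [0, 1].
    """
    num_parts = len(part_preds[0])
    total = 0.0
    for y_hat, preds, sample_scores in zip(pseudo_labels, part_preds, scores):
        for pred, s in zip(preds, sample_scores):
            total += s * cross_entropy(y_hat, pred) + (1 - s) * kl_from_uniform(pred)
    return total / num_parts


# One sample, one part, two classes; the part predicts the uniform vector.
trusted = dls_loss([[1.0, 0.0]], [[[0.5, 0.5]]], [[1.0]])    # pure cross-entropy
untrusted = dls_loss([[1.0, 0.0]], [[[0.5, 0.5]]], [[0.0]])  # pure KL to uniform
```

For a fully trusted part the loss reduces to the cross-entropy (log 2 for this uniform prediction), while for a fully untrusted part a uniform prediction already minimizes the KL term and the loss is approximately zero.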
Adaptive label refinement loss is defined as the cross-entropy between the refined pseudo-labels and the prediction vector of the global feature, as given by the following:
$$L_{alr} = -\sum_{i=1}^{N} \tilde{y}_i \log(y_i^{g}) \quad (7)$$
where $\tilde{y}_i$ denotes the refined pseudo-label, and $y_i^{g}$ represents the prediction vector of $f_i^{g}$. The refined pseudo-labels effectively reduce the impact of label noise while encouraging the global features to learn rich fine-grained information.

4.3.2. Triplet Loss

The traditional triplet loss employs a predefined margin $m$ to ensure that the distance from an anchor to its hard negative sample exceeds the distance to its hard positive sample by at least $m$. To alleviate the sensitivity to margin selection, a softmax smoothing technique is introduced, which is defined as follows:
$$L_{triplet} = -\sum_{i=1}^{N} \log\left( \frac{\exp(\| f_i^{g} - f_i^{g-} \|)}{\exp(\| f_i^{g} - f_i^{g+} \|) + \exp(\| f_i^{g} - f_i^{g-} \|)} \right) \quad (8)$$
where $f_i^{g+}$ and $f_i^{g-}$ denote the hard positive and hard negative samples corresponding to $f_i^{g}$, respectively, and $\| \cdot \|$ is the Euclidean norm. The triplet loss contributes to increasing inter-class separability, thereby reinforcing the reliability of class boundaries.
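A short sketch may help show why this form needs no margin. The per-sample term equals log(1 + exp(d_pos − d_neg)), a softplus of the distance gap, so it decays smoothly to zero as the hard negative moves farther away than the hard positive. The function names below are hypothetical, and the hard positive and hard negative are assumed to have been mined already.

```python
import math


def softplus(x):
    """Stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def softmax_triplet_loss(anchors, hard_positives, hard_negatives):
    """Sum over anchors of -log(exp(d_neg) / (exp(d_pos) + exp(d_neg))).

    Algebraically this equals softplus(d_pos - d_neg) for each anchor.
    """
    total = 0.0
    for a, p, n in zip(anchors, hard_positives, hard_negatives):
        d_pos = math.dist(a, p)   # Euclidean distance to the hard positive
        d_neg = math.dist(a, n)   # Euclidean distance to the hard negative
        total += softplus(d_pos - d_neg)
    return total


tie = softmax_triplet_loss([[0.0, 0.0]], [[1.0, 0.0]], [[0.0, 1.0]])
separated = softmax_triplet_loss([[0.0, 0.0]], [[1.0, 0.0]], [[0.0, 100.0]])
```

When the positive and negative are equally far from the anchor the term is log 2, and it approaches zero once the negative is much farther away than the positive.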

4.3.3. Density Loss

Density loss promotes clustering compactness by minimizing the distance between anchors and hard positive samples, aligning them with the average intra-class distance. The optimization schematic of density loss is shown in Figure 5. It consists of global and local density losses, formulated as follows:
$$L_{density} = \beta L_{density}^{g} + \gamma L_{density}^{p} \quad (9)$$
where $\beta$ and $\gamma$ represent the weight coefficients for global and local density losses, respectively, and their values are experimentally validated in Section 5.
Specifically, global density loss $L_{density}^{g}$ is formulated as follows:
$$L_{density}^{g} = \frac{1}{N} \sum_{i=1}^{N} \max\left( 0,\; d(f_i^{g}, f_i^{g+}) - \frac{1}{|N_i|} \sum_{j,k}^{|N_i|} d(f_j^{g}, f_k^{g}) \right) \quad (10)$$
where $d(f_i^{g}, f_i^{g+})$ represents the distance between $f_i^{g}$ and its hard positive sample $f_i^{g+}$, $|N_i|$ is the total number of samples in the same class as $f_i^{g}$, and $\frac{1}{|N_i|} \sum_{j,k}^{|N_i|} d(f_j^{g}, f_k^{g})$ denotes the average intra-class distance. The triplet loss improves inter-class separability of global features, while the application of density loss further strengthens intra-class compactness, thereby ensuring reliable class boundaries.
Local density loss $L_{density}^{p}$ is given by the following:
$$L_{density}^{p} = \frac{1}{P \times N} \sum_{i=1}^{N} \sum_{m=1}^{P} \max\left( 0,\; d(f_i^{p_m}, f_i^{p_m+}) - \frac{1}{|N_i|} \sum_{j,k}^{|N_i|} d(f_j^{p_m}, f_k^{p_m}) \right) \quad (11)$$
where $\frac{1}{|N_i|} \sum_{j,k}^{|N_i|} d(f_j^{p_m}, f_k^{p_m})$ denotes the average intra-class distance of the $m$-th part, and $P$ represents the number of local features (set to three). Enhancing the compactness of local features provides more reliable supplementary information for pseudo-label refinement.
Specifically, the density loss in our work uses the hard positive sampling strategy, while, in [29], it operates on all the positive samples. Additionally, it is applied to both global and local features here, which is different from the strategy in [29] applied to only global features.
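The hinge structure of the density loss can be sketched in pure Python as below. Several details are assumptions made for illustration: the hard positive is taken as the farthest same-class sample, Euclidean distance is used for d, and the average intra-class distance is computed as the mean over unordered pairs in the class, which may differ from the exact normalization intended in the formula. The function names are hypothetical.

```python
import math
from itertools import combinations


def density_loss(features, labels):
    """Mean hinge on (hard-positive distance - mean intra-class distance)."""
    by_class = {}
    for f, y in zip(features, labels):
        by_class.setdefault(y, []).append(f)

    # Mean pairwise distance per class (assumed normalization: unordered pairs).
    mean_intra = {}
    for y, members in by_class.items():
        pairs = list(combinations(members, 2))
        if pairs:
            mean_intra[y] = sum(math.dist(a, b) for a, b in pairs) / len(pairs)

    total = 0.0
    for f, y in zip(features, labels):
        if y not in mean_intra:
            continue  # singleton class: no positive sample available
        hard_pos = max(math.dist(f, g) for g in by_class[y])
        total += max(0.0, hard_pos - mean_intra[y])
    return total / len(features)


def combined_density_loss(global_feats, part_feats, labels, beta=0.1, gamma=0.1):
    """beta * global term + gamma * average of the per-part terms."""
    local = sum(density_loss(p, labels) for p in part_feats) / len(part_feats)
    return beta * density_loss(global_feats, labels) + gamma * local


feats = [[0.0], [1.0], [3.0]]        # one class, pairwise distances 1, 2, 3
loss_g = density_loss(feats, [0, 0, 0])
loss_all = combined_density_loss(feats, [feats, feats, feats], [0, 0, 0])
```

In the toy class the mean pairwise distance is 2, so only the two outer points, whose hard positive lies at distance 3, contribute to the loss. The sample in the middle already sits within the average radius and is left untouched.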

4.3.4. Compact Loss

The overall loss function of USL-PARC is as follows:
$$L_{compact} = L_{smoothed\text{-}ce} + L_{triplet} + L_{density} \quad (12)$$
Compact loss ensures inter-class separability while effectively enhancing the compactness of the feature space. Therefore, the pseudo-labels obtained by clustering global features can have higher confidence, and the local information aggregated for pseudo-label refinement is more reliable.

5. Results

5.1. Implementation Details

We used a GeForce GTX 1070 GPU as the main hardware for training and testing, and the algorithm platform was built on Ubuntu 18.04 and PyTorch 1.12.0. The backbone of the model is the ResNet-Sim network initialized with ImageNet pre-trained weights. The input cattle face images were resized to 224 × 224. Random horizontal flipping (p = 0.5), 10-pixel padding followed by random cropping, and random erasing (p = 0.5) were applied for data augmentation. The mini-batch size was 32, consisting of 8 pseudo-classes with 4 images each, ensuring a balanced number of samples per class in each batch. Adam with a weight decay of $5 \times 10^{-4}$ was employed for training. The initial learning rate was set to $3.5 \times 10^{-4}$ and decreased by a factor of 10 after every 20 epochs. The model was trained for a total of 50 epochs, with each epoch consisting of 200 iterations. In the pseudo-label generation phase, DBSCAN based on Jaccard distance with k-reciprocal encoding was employed for clustering. The DBSCAN parameters were set with a maximum distance of 0.5 and a minimum cluster size of 4. During testing, only global features were utilized for retrieval, and their dimensionality was 2048.
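The step schedule and the balanced batch composition described above can be sketched as follows. This is an illustrative pure-Python version, not the authors' training code: the function names are hypothetical, epochs are assumed to be zero-indexed, samples labeled −1 are assumed to be clustering outliers that are skipped, and classes with fewer than four images are assumed to be sampled with replacement.

```python
import random


def learning_rate(epoch, base_lr=3.5e-4, step=20, factor=0.1):
    """Step decay: divide the rate by 10 after every 20 epochs (zero-indexed)."""
    return base_lr * factor ** (epoch // step)


def sample_balanced_batch(pseudo_labels, num_classes=8, per_class=4, rng=random):
    """Pick 8 pseudo-classes and 4 image indices from each (batch of 32)."""
    by_class = {}
    for idx, label in enumerate(pseudo_labels):
        if label >= 0:  # assumption: -1 marks outliers, which are excluded
            by_class.setdefault(label, []).append(idx)
    batch = []
    for c in rng.sample(sorted(by_class), num_classes):
        pool = by_class[c]
        if len(pool) >= per_class:
            batch.extend(rng.sample(pool, per_class))
        else:
            batch.extend(rng.choices(pool, k=per_class))  # with replacement
    return batch


labels = [i // 5 for i in range(50)]   # toy data: 10 pseudo-classes, 5 images each
batch = sample_balanced_batch(labels)
schedule = [learning_rate(e) for e in (0, 19, 20, 40, 49)]
```

Under these assumptions the 50-epoch run uses a rate of 3.5e-4 for epochs 0 to 19, 3.5e-5 for epochs 20 to 39, and 3.5e-6 for epochs 40 to 49.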
The proposed method was evaluated on the CattleFace2025 dataset. The CattleFace2025 dataset was split into training and testing sets in a 1:1 ratio based on identity categories, with no overlap of image data. The training set consisted of 5898 cattle face images from 287 individuals. For the query set, 5 images were randomly selected from each identity in the test set, resulting in a total of 1435 images from 287 individuals. The remaining images in the test set formed the gallery set, comprising 4547 images from the same 287 individuals. We evaluated the performance using mean average precision (mAP), cumulative matching characteristic (CMC) at Rank-1, Rank-5, and Rank-10, as well as the accuracy of k-NN (k = 5) classifier.
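For readers less familiar with retrieval metrics, the sketch below shows how average precision and a Rank-k hit could be computed for a single query from the gallery identities sorted by similarity. The function names and the toy ranking are hypothetical. mAP is the mean of the per-query average precision, and the CMC value at Rank-k is the fraction of queries with a hit in the top k.

```python
def average_precision(ranked_labels, query_label):
    """Average precision of one query over a gallery ranking (best match first)."""
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label == query_label:
            hits += 1
            precision_sum += hits / rank   # precision at each correct match
    return precision_sum / hits if hits else 0.0


def rank_k_hit(ranked_labels, query_label, k):
    """True if a correct identity appears among the top-k gallery results."""
    return query_label in ranked_labels[:k]


ranking = ["cow_a", "cow_b", "cow_a"]   # hypothetical gallery identities, sorted
ap = average_precision(ranking, "cow_a")
hit_at_1 = rank_k_hit(["cow_b", "cow_a"], "cow_a", k=1)
hit_at_2 = rank_k_hit(["cow_b", "cow_a"], "cow_a", k=2)
```

In the toy ranking the correct identity appears at ranks 1 and 3, so the average precision is (1/1 + 2/3) / 2.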

5.2. Ablation Study

In this subsection, a comprehensive ablation study was conducted to validate the effectiveness of ResNet-Sim in capturing key information and to examine the contribution of each component of the compact loss in supervising the learning of separable feature distributions. The experimental results are reported in Table 2.
As shown in Table 2, adding the adaptive label refinement loss to dynamic label smoothing raises the accuracy of “ResNet-Sim + $L_{dls}$ + $L_{alr}$” by 6.6% over “ResNet-Sim + $L_{dls}$”, which shows the effectiveness of refining pseudo-labels with supplementary local fine-grained information. Triplet loss and density loss further improve the accuracy by 1.2% and 0.7%, respectively, demonstrating their effectiveness in maintaining inter-class separability and minimizing intra-class distance. In addition, the accuracy of “ResNet-Sim + $L_{dls}$ + $L_{alr}$ + $L_{triplet}$ + $L_{density}$” with the SimAM attention mechanism is 1.3% higher than that of “ResNet + $L_{dls}$ + $L_{alr}$ + $L_{triplet}$ + $L_{density}$” without SimAM, demonstrating that SimAM improves the learning of fine-grained information. Meanwhile, the average inference time increases only marginally from 3.65 ms to 3.83 ms per image, indicating that the integration of SimAM introduces negligible computational overhead.

5.3. Parameter Analysis of k and α

In the pseudo-label refinement strategy, k-NN is used to obtain the top-k ranked lists of global and corresponding local features. Then, the cross agreement score is calculated with the top-k lists to quantify the reliability of local features.
The value of parameter k controls the number of retrieved nearest neighbors, which determines the effectiveness of the reliability assessment for local features. To identify the optimal value of k, we conduct experiments by incrementally increasing k with a step size of 5, while keeping other parameters fixed. The results are shown in Table 3.
According to the results in Table 3, as k increases, the k-nearest neighbor search retrieves more diverse samples, which leads to an overall decrease in the calculated cross agreement scores. Conversely, a small k limits the search range and reduces the generality and robustness of the retrieved samples, making the computed scores more susceptible to randomness. Thus, a proper value of k can provide reliable weighting for pseudo-label refinement and smoothed cross-entropy loss. Based on the experimental results, we set k to 20.
The parameter α determines the strength of the complementary local information in pseudo-label refinement. To analyze the effect of α on performance, we conducted experiments with α varying from 0 to 1, in steps of 0.1, and results are presented in Table 4.
When α is set to 0, pseudo-label refinement relies exclusively on local predictions, while with α set to 1, it excludes local fine-grained information and depends entirely on labels derived from global feature clustering. As presented in Table 4, optimal model performance is achieved when the one-hot labels are refined by an appropriate proportion of local predictions. Consequently, we set α to 0.5.

5.4. Experiment on β and γ

In the loss computation, density loss is utilized to enhance the compactness of both global and local features, with β and γ controlling the strength of intra-class distance constraints for the global and local levels, respectively. To determine the optimal combination of the two parameters, we conducted the following experiment. We first fixed β at 0.1 and varied γ from 0.05 to 0.4. As shown in Table 5, the results indicate that the model achieves the best performance when γ = 0.1.
Then, we fixed γ at 0.1 and conducted experiments with β varying from 0.05 to 0.4. The results presented in Table 6 demonstrate that the model achieves the best performance when β is set to 0.1, and maintains relatively stable accuracy across various values of β and γ.

5.5. Effect of Local Feature Quantity

Pseudo-labels are refined by supplementing local fine-grained context. Different strategies for partitioning local regions directly impact the effectiveness of the supplementary information, thereby influencing the quality of the refined pseudo-labels. To verify the optimal composition of local regions, we conducted experiments to compare the effects of different scales of the local features by dividing the image into r rows and c columns. The experimental results for various partitioning strategies are shown in Table 7.
Table 7 shows that the way the image is divided significantly affects pseudo-label refinement, because it changes the scale of the local information. With a (1 × 1) division, performance is limited by the lack of fine-grained information for refining pseudo-labels. Performance improves once local features are incorporated. Specifically, dividing the image into (3 × 1) parts improves the accuracy by more than 0.5%, which confirms the effectiveness of local fine-grained information. However, finer divisions with smaller local regions, such as (5 × 1) parts, may yield incomplete local information that is insufficient for unique identity representation. The experimental results show that a proper division helps the model learn more informative, fine-grained local features, which in turn enhances the reliability of the pseudo-labels.

5.6. Comparison with State-of-the-Art Models

In this subsection, we conduct comparative experiments to validate the effectiveness of the proposed framework. Since supervised methods rely on labeled datasets and this study focuses on training with unlabeled data, USL-PARC is compared only with recent state-of-the-art fully unsupervised person re-identification methods on the CattleFace2025 dataset, including MMCL [30], ICE [9], RLCC [26], PPLR [15], ISE [24], STDA [31], and DCSG [32]. MMCL utilizes a memory-based non-parametric classifier, integrating multi-label and single-label classification into a unified framework. ICE employs pairwise similarity scoring between instances to enhance contrastive learning performance. RLCC and PPLR mitigate label noise via pseudo-label refinement. ISE improves clustering reliability by generating supportive samples around real instances. STDA refines pseudo-labels by mining spatial-level connections among positive instances. DCSG complements source image data using information from multiple augmented views, based on which the pseudo-labels are optimized. The results of the comparison are shown in Table 8.
Compared to the above methods, our approach focuses on the complementary relationship between global and local features after attention processing, along with feature space compactness. The experimental results demonstrate that USL-PARC achieves the best performance on all evaluation metrics. Specifically, in terms of k-NN accuracy, the proposed method outperforms MMCL, ICE, RLCC, PPLR, ISE, STDA, and DCSG by 8.1%, 6.0%, 4.8%, 2.3%, 2.3%, 1.8%, and 2.4%, respectively.
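The margins quoted above follow directly from the k-NN ACC column of Table 8. The short check below recomputes them; the variable names are illustrative only.

```python
# k-NN accuracy (%) of the compared methods, copied from Table 8.
knn_acc = {
    "MMCL": 89.3, "ICE": 91.4, "RLCC": 92.6, "PPLR": 95.1,
    "ISE": 95.1, "STDA": 95.6, "DCSG": 95.0,
}
ours = 97.4  # k-NN accuracy (%) of USL-PARC in Table 8

# Margin of USL-PARC over each method, rounded to one decimal place
# to remove floating-point noise.
margins = {name: round(ours - acc, 1) for name, acc in knn_acc.items()}
```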

6. Conclusions and Future Work

In this paper, a part-attention-based pseudo-label refinement reciprocal compact loss is proposed for fully unsupervised cattle face recognition. The USL-PARC provides more discriminative global and local features through the ResNet-Sim network. Meanwhile, fine-grained local contexts with high similarity to global features are utilized for pseudo-label refinement, effectively mitigating label noise. Finally, compact loss is employed to guide the model in learning a separable and discriminative feature space. The proposed method eliminates reliance on annotations during training and effectively mitigates the impact of pseudo-label noise. The effectiveness of USL-PARC has been validated through extensive experiments on the CattleFace2025 dataset.
Although pseudo-label refinement based on local fine-grained context demonstrates significant competitiveness, it still has limitations to overcome. In existing local feature extraction methods, local regions are obtained by uniformly dividing the feature map. To ensure that each local feature corresponds to the same facial region, the images used for training are aligned with the cattle face. However, in practical scenarios, the captured images are not always well aligned, and misalignment can negatively affect the reliability of local information. In our future work, we will explore semantic matching techniques to construct feature spaces that better represent semantically corresponding regions of the cattle face, thereby alleviating the need for manual face alignment.
In addition, the cross agreement score provides an intuitive method for evaluating the reliability of local information by calculating the Jaccard similarity coefficient between the k-nearest neighbors of global and local features. However, Jaccard similarity focuses on the intersection and union of two sets, neglecting the order of elements. When the k-nearest neighbors of two local features contain the same elements but in reversed order, the computed reliability remains the same, which may result in local information with significant semantic differences being unrecognized. We will further consider the order of elements to study a more precise method for reliability assessment in subsequent research. Moreover, we will explore replacing DBSCAN with other clustering algorithms for pseudo-label generation, such as fuzzy clustering [33], to enhance the reliability of pseudo-labels.
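The order-insensitivity of the Jaccard coefficient can be seen in a small sketch. The neighbor lists are made-up sample indices. The second function, `average_overlap`, is one possible order-aware measure (the mean overlap of the two ranked lists at every prefix depth); it is shown only as an illustration of the idea and is not the method used or proposed in this article.

```python
def jaccard(neighbors_a, neighbors_b):
    """Jaccard similarity of two neighbor lists, treated as sets."""
    a, b = set(neighbors_a), set(neighbors_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def average_overlap(ranked_a, ranked_b):
    """Mean of |top-d(a) & top-d(b)| / d over depths d = 1..k.
    Unlike Jaccard, this value depends on the ranking order."""
    depth = min(len(ranked_a), len(ranked_b))
    if depth == 0:
        return 0.0
    total = 0.0
    for d in range(1, depth + 1):
        total += len(set(ranked_a[:d]) & set(ranked_b[:d])) / d
    return total / depth


# Hypothetical k-nearest-neighbor index lists (k = 4).
global_knn = [3, 7, 1, 9]
local_knn_same_order = [3, 7, 1, 9]
local_knn_reversed = [9, 1, 7, 3]
local_knn_partial = [3, 7, 4, 8]

score_same = jaccard(global_knn, local_knn_same_order)
score_reversed = jaccard(global_knn, local_knn_reversed)
score_partial = jaccard(global_knn, local_knn_partial)
```

Here `score_same` and `score_reversed` are identical, whereas `average_overlap` assigns the reversed list a lower value than the identically ordered one, which is the kind of distinction an order-aware reliability score would need to make.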
At present, the CattleFace2025 dataset satisfies the requirements for cattle identification in small- to medium-sized ranches, but the environmental diversity of its images remains insufficient. Generative Adversarial Networks (GANs) [34] and diffusion models [35] are widely used for image synthesis. In subsequent work, we will explore synthesizing images under blur, fog, or noise conditions using GANs or diffusion models. Moreover, we will try to adapt GAN-based human face generation techniques to enlarge the dataset with artificial cattle faces, in order to evaluate the generalization ability of the model.

Author Contributions

Conceptualization, P.L. and J.Z.; methodology, P.L. and J.Z.; software, P.L. and J.Z.; validation, P.L. and J.Z.; formal analysis, P.L. and J.Z.; investigation, P.L. and J.Z.; resources, P.L. and J.Z.; data curation, P.L.; writing—original draft preparation, P.L.; writing—review and editing, P.L. and J.Z.; visualization, P.L.; supervision, P.L. and J.Z.; project administration, P.L. and J.Z.; funding acquisition, J.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded in part by the National Natural Science Foundation of China (32460858), in part by the Natural Science Foundation of Inner Mongolia Autonomous Region (2023MS06014), and in part by Inner Mongolia Autonomous Region Science and Technology Plan Project (2021GG0224).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author(s).

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
DCNNs: Deep Convolutional Neural Networks
DML: Deep Metric Learning
k-NN: k-Nearest Neighbor

References

  1. Weng, Z.; Fan, L.; Zhang, Y.; Zheng, Z.; Gong, C.; Wei, Z. Facial recognition of dairy cattle based on improved convolutional neural network. IEICE Trans. Inf. Syst. 2022, 105, 1234–1238.
  2. Gong, H.; Pan, H.; Chen, L.; Hu, T.; Li, S.; Sun, Y.; Mu, Y.; Guo, Y. Facial Recognition of Cattle Based on SK-ResNet. Sci. Program. 2022, 2022, 5773721.
  3. Yang, L.; Xu, X.; Zhao, J.; Song, H. Fusion of retinaface and improved facenet for individual cow identification in natural scenes. Inf. Process. Agric. 2024, 11, 512–523.
  4. Xu, B.; Wang, W.; Guo, L.; Chen, G.; Li, Y.; Cao, Z.; Wu, S. CattleFaceNet: A cattle face identification approach based on RetinaFace and ArcFace loss. Comput. Electron. Agric. 2022, 193, 106675.
  5. Bergman, N.; Yitzhaky, Y.; Halachmi, I. Biometric identification of dairy cows via real-time facial recognition. Animal 2024, 18, 101079.
  6. Meng, Y.; Yoon, S.; Han, S.; Fuentes, A.; Park, J.; Jeong, Y.; Park, D.S. Improving known–Unknown cattle’s face recognition for smart livestock farm management. Animals 2023, 13, 3588.
  7. Fu, Y.; Wei, Y.; Wang, G.; Zhou, Y.; Shi, H.; Huang, T.S. Self-similarity grouping: A simple unsupervised cross domain adaptation approach for person re-identification. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 6112–6121.
  8. Lin, Y.; Dong, X.; Zheng, L.; Yan, Y.; Yang, Y. A bottom-up clustering approach to unsupervised person re-identification. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; pp. 8738–8745.
  9. Chen, H.; Lagadec, B.; Bremond, F. Ice: Inter-instance contrastive encoding for unsupervised person re-identification. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 14960–14969.
  10. Zhai, Y.; Ye, Q.; Lu, S.; Jia, M.; Ji, R.; Tian, Y. Multiple expert brainstorming for domain adaptive person re-identification. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part VII. pp. 594–611.
  11. Ge, Y.; Chen, D.; Li, H. Mutual mean-teaching: Pseudo label refinery for unsupervised domain adaptation on person re-identification. arXiv 2020, arXiv:2001.01526.
  12. Ge, Y.; Zhu, F.; Chen, D.; Zhao, R. Self-paced contrastive learning with hybrid memory for domain adaptive object re-id. Adv. Neural Inf. Process. Syst. 2020, 33, 11309–11321.
  13. Zhai, Y.; Lu, S.; Ye, Q.; Shan, X.; Chen, J.; Ji, R.; Tian, Y. Ad-cluster: Augmented discriminative clustering for domain adaptive person re-identification. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 9021–9030.
  14. Zheng, F.; Deng, C.; Sun, X.; Jiang, X.; Guo, X.; Yu, Z.; Huang, F.; Ji, R. Pyramidal person re-identification via multi-loss dynamic training. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 8514–8522.
  15. Cho, Y.; Kim, W.J.; Hong, S.; Yoon, S.-E. Part-based pseudo label refinement for unsupervised person re-identification. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 7308–7318.
  16. Yang, L.; Zhang, R.-Y.; Li, L.; Xie, X. Simam: A simple, parameter-free attention module for convolutional neural networks. In Proceedings of the International Conference on Machine Learning, Online, 18–24 July 2021; pp. 11863–11874.
  17. Bakhshayeshi, I.; Erfani, E.; Taghikhah, F.R.; Elbourn, S.; Beheshti, A.; Asadnia, M. An intelligence cattle reidentification system over transport by Siamese neural networks and yolo. IEEE Internet Things J. 2023, 11, 2351–2363.
  18. Xu, X.; Deng, H.; Wang, Y.; Zhang, S.; Song, H. Boosting cattle face recognition under uncontrolled scenes by embedding enhancement and optimization. Appl. Soft Comput. 2024, 164, 111951.
  19. Weng, Z.; Meng, F.; Liu, S.; Zhang, Y.; Zheng, Z.; Gong, C. Cattle face recognition based on a Two-Branch convolutional neural network. Comput. Electron. Agric. 2022, 196, 106871.
  20. Chen, X.; Yang, T.; Mai, K.; Liu, C.; Xiong, J.; Kuang, Y.; Gao, Y. Holstein cattle face re-identification unifying global and part feature deep network with attention mechanism. Animals 2022, 12, 1047.
  21. Li, Z.; Lei, X.; Liu, S. A lightweight deep learning model for cattle face recognition. Comput. Electron. Agric. 2022, 195, 106848.
  22. Fan, H.; Zheng, L.; Yan, C.; Yang, Y. Unsupervised person re-identification: Clustering and fine-tuning. ACM Trans. Multimed. Comput. Commun. Appl. (TOMM) 2018, 14, 1–18.
  23. Si, T.; He, F.; Zhang, Z.; Duan, Y. Hybrid contrastive learning for unsupervised person re-identification. IEEE Trans. Multimed. 2022, 25, 4323–4334.
  24. Zhang, X.; Li, D.; Wang, Z.; Wang, J.; Ding, E.; Shi, J.Q.; Zhang, Z.; Wang, J. Implicit sample extension for unsupervised person re-identification. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 7369–7378.
  25. Lin, Y.; Xie, L.; Wu, Y.; Yan, C.; Tian, Q. Unsupervised person re-identification via softened similarity learning. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 3390–3399.
  26. Zhang, X.; Ge, Y.; Qiao, Y.; Li, H. Refining pseudo labels with clustering consensus over generations for unsupervised object re-identification. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 3436–3445.
  27. Weng, Z.; Liu, S.; Zheng, Z.; Zhang, Y.; Gong, C. Cattle facial matching recognition algorithm based on multi-view feature fusion. Electronics 2022, 12, 156.
  28. Ruchay, A.; Kolpakov, V.; Guo, H.; Pezzuolo, A. On-barn cattle facial recognition using deep transfer learning and data augmentation. Comput. Electron. Agric. 2024, 225, 109306.
  29. Zhao, J.-M.; Lian, Q.-S. Compact loss for visual identification of cattle in the wild. Comput. Electron. Agric. 2022, 195, 106784.
  30. Wang, D.; Zhang, S. Unsupervised person re-identification via multi-label classification. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 10981–10990.
  31. He, Q.; Wang, Z.; Zheng, Z.; Hu, H. Spatial and temporal dual-attention for unsupervised person re-identification. IEEE Trans. Intell. Transp. Syst. 2023, 25, 1953–1965.
  32. Han, Q.; Chen, J.; Min, W.; Li, J.; Zhan, L.; Li, L. DCSG: Data complement pseudo-label refinement and self-guided pre-training for unsupervised person re-identification. Vis. Comput. 2024, 40, 7235–7248.
  33. Li, T.; Liu, Y.; Ren, W.; Shiri, B.; Lin, W. Single Image Dehazing Using Fuzzy Region Segmentation and Haze Density Decomposition. IEEE Trans. Circuits Syst. Video Technol. 2025.
  34. Zhang, S.; Zhang, X.; Wan, S.; Ren, W.; Zhao, L.; Shen, L. Generative adversarial and self-supervised dehazing network. IEEE Trans. Ind. Inform. 2023, 20, 4187–4197.
  35. Wang, T.; Zhang, K.; Zhang, Y.; Luo, W.; Stenger, B.; Lu, T.; Kim, T.-K.; Liu, W. LLDiffusion: Learning degradation representations in diffusion models for low-light image enhancement. Pattern Recognit. 2025, 166, 111628.
Figure 1. CattleFace2025 dataset examples. The CattleFace2025 dataset includes 11,880 cattle face images of 574 Holstein Friesian and Simmental individuals from several ranches in the Inner Mongolia Autonomous Region. Best viewed in color.
Figure 2. Structure of USL-PARC. The mechanism of USL-PARC includes a two-stage training scheme. In the clustering phase, pseudo-labels are assigned using DBSCAN clustering based on global features, and the cross agreement score is computed to obtain the correlation between local and global features. In the training phase, local predictions are aggregated based on the cross agreement score to refine pseudo-labels. With these refined pseudo-labels, compact loss, which consists of smoothed cross-entropy loss, triplet loss, and density loss, is used to train the model. The two phases are alternated, and pseudo-labels are updated using features extracted from the trained model for the next training phase.
Figure 3. Principle of SimAM. The target elements in each channel are adjusted based on the computed activation parameters to obtain the attention-weighted feature map.
Figure 4. Part-based pseudo-label refinement. Top-k ranked lists of global and corresponding local features are obtained through k-nearest neighbor search for each feature space, from which the cross agreement score is calculated to evaluate the reliability of each local feature. The local predictions are dynamically aggregated based on the cross agreement score to refine the pseudo-labels.
Figure 5. Optimization schematic of density loss. On the left is the original feature distribution, which is relatively scattered. On the right is the distribution after density loss supervision, where intra-class compactness is enhanced.
Table 1. Details of datasets for cattle face.
AuthorYearIdentityImages
Xu et al. [4]2022902318
Weng et al. [1]2022504548
Li et al. [21]202210310,239
Weng et al. [19]202213018,200
Chen et al. [20]20223000130,000
Weng et al. [27]2023414406
Bakhshayeshi et al. [17]2023502500
Bergman et al. [5]2024777032
Xu et al. [18]202411810,137
Ruchay et al. [28]202491315
Table 2. Results of the ablation experiments.

| Method | mAP (%) | Rank-1 (%) | Rank-5 (%) | Rank-10 (%) | k-NN ACC (%) |
|---|---|---|---|---|---|
| ResNet-Sim + $L_{dls}$ | 57.6 | 94.3 | 96.9 | 97.8 | 88.9 |
| ResNet-Sim + $L_{dls}$ + $L_{alr}$ | 73.7 | 97.5 | 98.7 | 99.0 | 95.5 |
| ResNet-Sim + $L_{dls}$ + $L_{alr}$ + $L_{triplet}$ | 78.0 | 98.0 | 98.8 | 99.3 | 96.7 |
| ResNet-Sim + $L_{dls}$ + $L_{alr}$ + $L_{triplet}$ + $L_{density}$ | 79.8 | 98.6 | 99.2 | 99.4 | 97.4 |
| ResNet + $L_{dls}$ + $L_{alr}$ + $L_{triplet}$ + $L_{density}$ | 76.3 | 97.9 | 99.0 | 99.1 | 96.1 |
Table 3. Influence of k on model performance.

| k | mAP (%) | Rank-1 (%) | Rank-5 (%) | Rank-10 (%) | k-NN ACC (%) |
|---|---|---|---|---|---|
| k = 5 | 78.2 | 98.3 | 99.3 | 99.4 | 96.6 |
| k = 10 | 78.2 | 98.1 | 99.0 | 99.3 | 96.2 |
| k = 15 | 79.0 | 98.5 | 99.4 | 99.4 | 97.4 |
| k = 20 | 79.8 | 98.6 | 99.2 | 99.4 | 97.4 |
| k = 25 | 77.7 | 98.3 | 99.1 | 99.3 | 96.9 |
| k = 30 | 77.9 | 98.0 | 99.2 | 99.4 | 95.9 |
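Table 4. Influence of α on model performance.

| α | mAP (%) | Rank-1 (%) | Rank-5 (%) | Rank-10 (%) | k-NN ACC (%) |
|---|---|---|---|---|---|
| α = 0.0 | 76.4 | 98.1 | 99.1 | 99.3 | 95.1 |
| α = 0.1 | 76.9 | 98.6 | 99.2 | 99.3 | 96.4 |
| α = 0.2 | 78.8 | 98.5 | 99.2 | 99.4 | 96.7 |
| α = 0.3 | 79.2 | 98.5 | 99.3 | 99.4 | 96.8 |
| α = 0.4 | 78.0 | 98.3 | 99.0 | 99.1 | 96.9 |
| α = 0.5 | 79.8 | 98.6 | 99.2 | 99.4 | 97.4 |
| α = 0.6 | 77.8 | 98.3 | 99.1 | 99.3 | 96.2 |
| α = 0.7 | 78.1 | 98.3 | 99.2 | 99.3 | 96.4 |
| α = 0.8 | 77.9 | 98.4 | 99.1 | 99.4 | 96.7 |
| α = 0.9 | 77.7 | 98.3 | 99.2 | 99.3 | 96.5 |
| α = 1.0 | 76.1 | 98.3 | 99.0 | 99.2 | 96.0 |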
Table 5. Different settings of γ with β = 0.1.

| γ | mAP (%) | Rank-1 (%) | Rank-5 (%) | Rank-10 (%) | k-NN ACC (%) |
|---|---|---|---|---|---|
| γ = 0.05 | 79.1 | 98.3 | 99.0 | 99.4 | 96.7 |
| γ = 0.1 | 79.8 | 98.6 | 99.2 | 99.4 | 97.4 |
| γ = 0.2 | 78.4 | 98.4 | 99.3 | 99.4 | 96.0 |
| γ = 0.3 | 78.3 | 98.3 | 99.1 | 99.3 | 96.9 |
| γ = 0.4 | 76.7 | 98.0 | 99.1 | 99.2 | 95.8 |
Table 6. Different settings of β with γ = 0.1.

| β | mAP (%) | Rank-1 (%) | Rank-5 (%) | Rank-10 (%) | k-NN ACC (%) |
|---|---|---|---|---|---|
| β = 0.05 | 77.8 | 98.2 | 99.0 | 99.4 | 96.0 |
| β = 0.1 | 79.8 | 98.6 | 99.2 | 99.4 | 97.4 |
| β = 0.2 | 78.6 | 98.3 | 99.2 | 99.4 | 96.9 |
| β = 0.3 | 77.7 | 98.0 | 99.2 | 99.5 | 96.4 |
| β = 0.4 | 77.7 | 98.0 | 99.2 | 99.5 | 96.2 |
Table 7. Comparison of segmentation methods.

| Segmentation Strategy | mAP (%) | Rank-1 (%) | Rank-5 (%) | Rank-10 (%) | k-NN ACC (%) |
|---|---|---|---|---|---|
| (1*1) * | 78.9 | 98.4 | 99.2 | 99.2 | 96.9 |
| (2*1) | 79.1 | 98.6 | 99.4 | 99.4 | 96.9 |
| (3*1) | 79.8 | 98.6 | 99.2 | 99.4 | 97.4 |
| (4*1) | 77.5 | 98.0 | 99.2 | 99.4 | 96.2 |
| (5*1) | 77.6 | 97.8 | 98.7 | 99.2 | 96.0 |
| (2*2) | 77.8 | 98.2 | 98.8 | 99.2 | 96.0 |

* (r*c) represents that the image is divided into r rows and c columns, resulting in r × c local regions.
Table 8. Comparison with the state-of-the-art methods.

| Method | mAP (%) | Rank-1 (%) | Rank-5 (%) | Rank-10 (%) | k-NN ACC (%) |
|---|---|---|---|---|---|
| MMCL [30] | 57.2 | 94.8 | 98.0 | 98.8 | 89.3 |
| ICE [9] | 64.5 | 95.5 | 97.7 | 98.5 | 91.4 |
| RLCC [26] | 69.3 | 96.0 | 98.3 | 98.8 | 92.6 |
| PPLR [15] | 74.5 | 98.0 | 98.8 | 99.0 | 95.1 |
| ISE [24] | 74.7 | 97.6 | 98.5 | 99.0 | 95.1 |
| STDA [31] | 77.8 | 98.0 | 99.0 | 99.2 | 95.6 |
| DCSG [32] | 74.1 | 97.4 | 98.5 | 98.8 | 95.0 |
| Ours | 79.8 | 98.6 | 99.2 | 99.4 | 97.4 |

Share and Cite

MDPI and ACS Style

Liu, P.; Zhao, J. Part-Attention-Based Pseudo-Label Refinement Reciprocal Compact Loss for Unsupervised Cattle Face Recognition. Electronics 2025, 14, 2343. https://doi.org/10.3390/electronics14122343

Chicago/Turabian Style

Liu, Peng, and Jianmin Zhao. 2025. "Part-Attention-Based Pseudo-Label Refinement Reciprocal Compact Loss for Unsupervised Cattle Face Recognition" Electronics 14, no. 12: 2343. https://doi.org/10.3390/electronics14122343
