Strategic Feature Integration for Superior Person Re-ID: A Part-Based Approach

Hussein, Ghaith; Smith, Jeremy S.; Al-Nuaimy, Waleed

doi:10.3390/ai7060210

by Ghaith Hussein *, Jeremy S. Smith and Waleed Al-Nuaimy

Department of Electrical Engineering and Electronics, University of Liverpool, Brownlow Hill, Liverpool L69 3GJ, UK

* Author to whom correspondence should be addressed.

AI 2026, 7(6), 210; https://doi.org/10.3390/ai7060210 (registering DOI)

Submission received: 30 March 2026 / Revised: 1 June 2026 / Accepted: 2 June 2026 / Published: 9 June 2026

(This article belongs to the Topic State-of-the-Art Object Detection, Tracking, and Recognition Techniques)

Abstract

Person Re-identification (Person Re-ID) is essential in surveillance and security. Traditional image processing methods often struggle to identify individuals accurately because of their sensitivity to occlusion and the limited discriminative capability of global feature representations. To address these challenges, this study proposes a deep-learning architecture for Person Re-ID, termed Dynamic Part-Based Fusion (DPBF), which integrates the Salient Part Discrimination (SPD) and the Adaptive Feature Integration and Contextual Fusion (AFICF) frameworks within a unified pipeline. The SPD module enhances representation learning by emphasizing discriminative body regions through a part-based attention mechanism guided by human parsing information. The AFICF component performs correlation-aware integration of localized part-specific features and global contextual features, reducing redundancy and improving the discriminability of the fused representation. Together, part-level feature extraction and correlation-aware local–global integration improve robustness under occlusion and appearance variations. The method was evaluated on several Person Re-ID datasets, including Occluded-ReID, Market-1501, DukeMTMC-ReID, Occluded-Duke, P-DukeMTMC-ReID, and CUHK03-Labeled. The experimental results demonstrate competitive performance compared with existing methods, and additional reproducibility, computational-complexity, and occlusion-stability analyses show stable performance across independent training runs, competitive computational cost, and robustness under severe occlusion. Specifically, DPBF achieves a 10.6% increase in Rank-1 accuracy and a 16% improvement in mAP over the closest competitor on the Occluded-ReID dataset.

Keywords:

Salient Part Discrimination; Adaptive Feature Integration and Contextual Fusion; Person Re-ID; attention mechanism; feature fusion

1. Introduction

Person Re-ID is a challenging task due to the complexities arising from variations in human appearance, apparel, posture, and potential occlusions. Its primary objective is to recognize individuals across different images or video frames by leveraging their visual attributes. Although the field has witnessed significant advances, there are still considerable opportunities to improve the robustness and generalizability of these systems. Models such as the Part-based Convolutional Baseline (PCB) [1] have explored part-based feature extraction, but they remain limited, mainly by their part-localization strategy and by the difficulty of learning detailed part-level feature representations. PCB spatially partitions CNN feature maps into distinct parts, which can lead to problems with within-part consistency [1]: the method assumes that pixels within a well-located part should be similar to each other and dissimilar from pixels in other parts, so a pixel-wise feature vector that is more similar to another part indicates inappropriate partitioning. The refined part pooling (RPP) technique, proposed alongside PCB, reassigns these outliers to their closest parts to improve within-part consistency; it requires no part labels, and the model is trained in a weakly supervised manner. Even so, this strategy may still suffer from part-localization inaccuracies, as it does not always capture the discriminative characteristics needed to distinguish individuals based on standard body structures, such as the shape of the arms or legs.

Similarly, the Harmonious Attention Network (HA-CNN) [2] employs spatial attention maps to focus on specific areas within an image but does not explicitly model human body parts. While the network can highlight informative image regions, it may not effectively isolate and analyze distinct body parts, which are crucial for Person Re-ID in scenarios with high variability in human posture and appearance. The Pose-Driven Deep Convolutional (PDC) model [3] advances the handling of pose variations by isolating different body parts and learning distinctive features for each; however, it depends on additional pose data, which introduces an extra layer of complexity. These limitations underscore the need for a framework that tightly combines attention mechanisms with part-specific analysis.

It is important to note that the novelty of this work does not lie in introducing entirely new individual components such as part attention, human parsing, correlation analysis, or hybrid loss functions, as these have been explored in prior studies. Instead, the contribution of this work lies in the coordinated DPBF framework, which integrates parsing-guided SPD, cross-part spatial modulation, correlation-aware feature integration, and hybrid supervision into a unified pipeline for occluded Person Re-ID.

This paper introduces a parsing-guided architecture for Person Re-ID that combines attention-guided strategies and part-based feature representations through a coordinated local–global integration framework. Unlike heuristic part-based approaches, all components of the proposed architecture are optimized jointly through backpropagation. The system is trained end to end and includes a module named SPD that employs parsing-guided semantic attention maps for localized body-region discrimination. SPD builds on part-based feature extraction [1], which provides a coarse segmentation based on expected body parts, and then refines this segmentation with weakly supervised human parsing labels generated by a pretrained SCHP model, without requiring manual part annotations [4], while adaptively emphasizing informative body regions to improve identity-preserving discrimination under occlusion.

Despite these advantages, training such models is challenging, particularly because variability in pose and occlusion can degrade the generation of accurate human parsing labels: different poses or partial occlusions can lead to incorrect segmentation, which reduces parsing accuracy and, in turn, the overall Re-ID performance.

Therefore, the SPD integrates an attention mechanism with a global feature extractor, producing a spatial feature map that refines the segmentation. This architecture generates attention weights tailored for Person Re-ID and renders a heatmap for each body part, enabling the model to identify and emphasize informative body regions during feature learning. Consequently, the SPD enhances part representations without relying solely on existing parsing label annotations.

Following part-level feature extraction, the local and global features extracted from human body images must be integrated. Local or foreground features, which capture fine-grained details such as clothing texture, complement global features that provide an overall representation of appearance. Relying solely on either local or global features may reduce robustness against viewpoint, illumination, and occlusion variations.

Despite recent progress in feature fusion and aggregation techniques, challenges related to feature overlap and the balanced integration of feature representations remain, as identified in studies such as [5]. To address these limitations, this study introduces the AFICF. AFICF begins with a correlation-aware refinement process that uses statistical techniques such as Pearson correlation coefficients (PCC) and Principal Component Analysis (PCA) to evaluate feature complementarity and redundancy prior to adaptive feature integration. The proposed framework demonstrates competitive performance in localizing discriminative body regions and improving ranking performance under occlusion and appearance variations across multiple Person Re-ID benchmarks.

2. Related Work

2.1. Part-Based Models and Attention Mechanisms: Evolution

The development of Person Re-ID systems has been significantly shaped by the integration of part-based approaches and attention mechanisms. This evolution marks a shift from conventional frameworks towards techniques better able to address the complexities of identity recognition in varied surveillance contexts. Early contributions in this domain were rooted in part-based methodologies, exemplified by Sun et al.'s Part-based Convolutional Baseline (PCB) [1], which established a foundational architecture for partitioning CNN feature maps to derive distinct part-specific features. This strategy demonstrated that localizing representation learning can improve robustness against occlusion and capture finer details of the human body. Nevertheless, PCB, along with subsequent contributions [6,7,8], highlighted the need for more effective mechanisms to address part-localization inaccuracies and enhance discriminative part-level feature representation.

Parallel to these advancements, the integration of attention-guided strategies with part-based models, as observed in works like [2,6,9], marked a significant paradigm shift. These architectures, leveraging spatial attention layers, have demonstrated notable improvements in feature discriminability and overall Re-ID performance, although they were not explicitly designed for detailed body part modeling.

Recent developments have introduced transformer-based architectures, such as TransReID [10], which leverage global self-attention to capture long-range dependencies and improve identity-aware feature learning in Person Re-ID tasks. The Part-Aware Transformer (PAT) further enhances robustness to occlusion by dynamically modeling part-level relationships with transformer-based attention, extending traditional part-based representations toward more flexible global–local interaction learning [11]. Other transformer-based approaches improve occlusion robustness by integrating self-supervised learning and occlusion simulation strategies, as demonstrated in SSSC-TransReID [12]. Vision-language models, such as CLIP-based frameworks, have also been explored to enhance semantic feature alignment and generalization in Person Re-ID tasks [13].

Although these methods improve feature representation, many approaches still do not effectively integrate explicit body-part segmentation with adaptive attention-guided strategies, particularly under severe posture and appearance variations. Our research therefore aims to combine the strengths of part-based techniques with the precision of attention mechanisms, motivated by the need for an architecture that couples detailed structural analysis with adaptive attention-guided feature learning. In doing so, we aim to advance Person Re-ID by providing a coordinated framework for improving discriminative feature learning in complex identification scenarios.

2.2. Salient Feature Detection and Fusion Advances

Salient feature detection, notably through attention mechanisms, concentrates on identifying and amplifying the most informative regions of an image. This has been instrumental in addressing the complex challenges that Person Re-ID faces, such as pose variations, occlusions, and environmental changes. HA-CNN [2] and DuATM [14] exemplify attention-guided feature refinement, leveraging spatial and dual attention mechanisms, respectively, to enhance identity-aware representations across multiple scales. Such methods not only enhance identity discrimination but also improve the robustness of feature representations against common identification challenges.

Parallel to the advancements in salient feature detection, the field has seen significant strides in the development of feature fusion techniques, aiming to create a comprehensive representation by integrating spatial, temporal, and appearance-based features. Early fusion frameworks often relied on direct concatenation, which, despite its simplicity, frequently overlooks the relative importance of different feature types. Recent studies, such as the Multi-level Feature Aggregation Network [15], have explored hierarchical feature integration by aggregating representations from different network layers to improve discriminative capability in Person Re-ID.

Adaptive feature integration methods, such as those employed in HA-CNN [2], further enhance feature robustness by selectively emphasizing informative features. Similarly, the Jointly Learning Multi-Loss (JLML) approach strengthens multi-scale feature learning by integrating complementary local and global representations, improving feature separability under challenging conditions [16]. More recent work has also explored joint CNN–Transformer fusion with explicit occlusion-aware learning, such as orthogonal fusion frameworks that enhance global–local feature integration under occlusion [17].

Despite these innovations, gaps remain, particularly in the seamless integration of salient feature detection with advanced fusion techniques and in balancing feature diversity against computational demands. Our research addresses these gaps with a unified architecture that incorporates the AFICF, which is designed to adaptively integrate part-specific features with a global contextual representation while limiting redundant feature incorporation. By removing redundant feature information, AFICF addresses a limitation of conventional fusion methods and improves computational efficiency without compromising feature quality. Global features provide an overall appearance representation but may overlook detailed regional cues that are valuable in occluded or challenging views [2,16]. Local features capture finer information but can become redundant or noisy when used in isolation. Simply concatenating the two can produce high-dimensional vectors with overlapping or contradictory information, reducing model efficacy. AFICF addresses this by calculating inter-feature correlations using PCC and reducing redundancy using PCA. More importantly, AFICF dynamically learns the relative importance of local and global features from contextual cues, enabling adaptive feature integration. This adaptive fusion and weighting procedure produces a more compact and semantically consistent representation, which is vital for distinguishing identities under conditions such as occlusion, pose change, and viewpoint change.

2.3. Human Parsing and Pose Estimation

The integration of human parsing and pose estimation into Person Re-ID represents a significant step toward addressing the complexities of human identification in diverse scenarios. Human parsing is the task of segmenting a human image into different fine-grained semantic parts such as the head, torso, arms, and legs [18,19,20]. Techniques such as the Self-Correction for Human Parsing (SCHP) framework [4] have improved parsing accuracy through iterative adjustment processes.

These advancements facilitate extracting part-based features crucial for identifying individuals across different environments and body poses. Concurrently, pose estimation provides contextual cues about an individual’s posture and orientation, offering valuable information for overcoming occlusions and viewpoint changes. The fusion of pose estimation with Re-ID systems aids in distinguishing individuals based on their movement patterns and posture, even in crowded or complex scenes. Visibility-aware learning strategies have also been proposed to explicitly model occluded regions and improve identification accuracy under partial visibility conditions [21].

Although these techniques improve semantic understanding, integrating human parsing and pose estimation into Person Re-ID systems remains challenging. The accuracy of these techniques is contingent on the input data quality and the model’s capacity to handle variations in human appearance due to occlusions, pose changes, and environmental factors. This research aims to address the gaps identified in current methodologies by proposing a framework that leverages the combination of human parsing and pose estimation.

SPD exemplifies this strategy, employing an attention mechanism to refine feature learning using human parsing and pose estimation cues [4]. The SPD departs from traditional strategies by incorporating attention-guided feature refinement that dynamically focuses on the most salient regions of the human image, as informed by human parsing labels and pose estimation data. This process is reinforced by an iterative label-refinement mechanism, which supports more stable and identity-aware feature representations for Person Re-ID.

3. Proposed Method

The proposed DPBF framework consists of two coordinated functional modules: SPD and AFICF. The SPD module generates enhanced part-level representations by emphasizing discriminative body regions through attention-guided mechanisms, while AFICF adaptively integrates part-specific and global contextual features to reduce redundancy and improve feature complementarity. Although SPD and AFICF are inspired by existing concepts such as part-based attention, feature integration, and statistical correlation analysis [1,5], their contribution lies in how they are coupled: the discriminative representations generated by SPD directly guide the context-aware aggregation process in AFICF, enabling more effective handling of occlusion and redundant descriptors for robust Person Re-ID.

3.1. Salient Part Discrimination (SPD)

The SPD module is developed to generate discriminative and identity-aware feature embeddings for each person instance, building upon PCB [1]. This is realized through a Part Attention Mechanism (PAM) that computes part-specific attention maps for $K$ predefined body parts, as illustrated in Figure 1. The overall attention-guided design is inspired by the attention mechanisms proposed in [22,23]. This mechanism is executed in two primary stages:

3.1.1. Generating Attention Maps for Each Body Part

A Grad-CAM-based strategy [24] is employed to generate attention maps for each of the $K$ predefined body parts. These parts are defined using human parsing annotations, allowing the SPD to initialize attention maps aligned with semantically consistent body regions. Figure 1 illustrates this process and the final attention map generation procedure. By dividing the human body into $K$ parts, the objective is to capture unique features from each region, improving the model's feature separability as proposed in [1]. However, relying solely on predefined parts has its challenges: because these regions are static, they may not accommodate variations caused by different poses or occlusions. To make this process more robust, the two-step sequential strategy described below was followed. Certain limitations may nevertheless persist: body-part segmentation errors may occur under extreme pose variation or occlusion, leading to the loss or misalignment of certain components, which impairs the model's ability to extract reliable identity-aware features, particularly for anatomical regions such as the legs and feet.

Step 1: The SPD module serves as the primary parsing-guided discrimination stage within DPBF, focusing on semantic body-region localization to improve identity-aware feature extraction under occlusion and appearance variations. The input feature map $F$ is first transformed into $F'$ using a 1 × 1 convolutional layer. A lightweight neural network generates attention weights $W$, which are reshaped into $K$ attention maps $\{A_1, A_2, \dots, A_K\}$, where each attention map $A_k$ highlights the most salient regions of the corresponding body-part representation $P_k$, for $k \in \{1, \dots, K\}$. These weights are subsequently restructured into spatial attention maps representing the predefined body parts, consistent with the methodology proposed in [25]. The effectiveness of SPD is demonstrated qualitatively in Figure 1, where the attention maps highlight informative and discriminative regions, allowing the model to focus on identity-relevant visual attributes [26].

Subsequently, a sequence of convolutional layers followed by a global average pooling (GAP) layer and a fully connected (FC) layer with $K$ output nodes forms the attention-generation pipeline. The GAP computes the average response of each feature channel across the spatial dimensions, producing a compact feature vector of length $C'$. To generate spatially localized attention maps, a softmax function is applied over the spatial dimensions $H$ and $W$, following the attention-guided strategy in [25] and the spatial attention principle used in Spatial Transformer Networks (STNs) [27]. STNs introduce a learnable module that enables CNNs to focus on informative image regions while improving transformation invariance.
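The spatial normalization described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the attention head has already produced one $H \times W$ logit map per body part (the 1 × 1 convolution, GAP, and FC layers are not reproduced), and the helper names spatial_softmax and part_attention_maps are hypothetical. Each map is normalized independently so that its values sum to 1 over all spatial positions.

```python
import math
from typing import List

Map = List[List[float]]  # one H x W map stored as nested lists (rows of floats)


def spatial_softmax(logits: Map) -> Map:
    """Normalize one H x W logit map so that its entries sum to 1 over all positions."""
    # Subtract the global maximum before exponentiating for numerical stability.
    peak = max(max(row) for row in logits)
    exps = [[math.exp(v - peak) for v in row] for row in logits]
    total = sum(sum(row) for row in exps)
    return [[v / total for v in row] for row in exps]


def part_attention_maps(part_logits: List[Map]) -> List[Map]:
    """Apply the spatial softmax independently to each of the K part logit maps."""
    return [spatial_softmax(m) for m in part_logits]


if __name__ == "__main__":
    # Hypothetical logits for K = 2 parts on a 2 x 3 grid.
    logits = [
        [[0.0, 1.0, 2.0], [0.0, 0.0, 0.0]],  # part 1 peaks at the top-right cell
        [[0.0, 0.0, 0.0], [3.0, 0.0, 0.0]],  # part 2 peaks at the bottom-left cell
    ]
    for k, a in enumerate(part_attention_maps(logits), start=1):
        print(f"part {k}: total attention = {sum(sum(r) for r in a):.6f}")
```

Subtracting the per-map maximum before exponentiation leaves the result unchanged but avoids overflow when the logits are large.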

Step 2: Human parsing labels generated by SCHP are used to refine the segmentation. This phase concentrates on areas affected by pose and occlusion variations. Because SCHP is computationally demanding, it is applied to refine particular regions rather than to re-evaluate the whole image. SCHP [4], derived from an external pose estimation model and improved through self-correction, provides a nuanced understanding of the human form, concentrates on information-rich areas, and generates the parsing labels. However, generating accurate human parsing labels remains challenging under pose variations and occlusions.

This necessitates robust methods to improve parsing accuracy for effective part-discriminative feature extraction. To address these challenges, SPD incorporates an attention-guided adjustment process based on SCHP. This component employs an attention mechanism and a global feature extractor to create a distinct attention map for each body part. These maps are initially informed by labels derived from SCHP. When human parsing labels, denoted as $Y \in \mathbb{R}^{H \times W}$, are available, they help SPD generate precise part-specific attention maps. These labels typically correspond to discrete body parts, such as the head, torso, arms, and legs; the exact set depends on the dataset, its annotations, and the framework strategy.

Given an input image $X$ and its corresponding human parsing label $Y$, obtained from the pose estimation model, the SPD computes a part-specific attention map $A_k$ for each body part:

A_{k} = f(X, Y)

where $f(\cdot)$ is a learned function that generates attention maps based on the input image $X$ and the human parsing labels $Y$. To generate the initial labels, the SCHP framework is integrated within the SPD attention mechanism. This process can be represented as follows:

Y_{0} = P(X)

where $P$ denotes the human parsing model, $X$ is the input image, and $Y_0$ is the initial human parsing label. The correction network $C$ learns to correct the initial predictions by minimizing the error between the corrected labels $C(Y_0)$ and the ground-truth labels $Y_{GT}$. This process can be formulated as follows:

C^{*} = \arg\min_{C} \mathcal{L}_{CL}\left(C(Y_{0}), Y_{GT}\right)

(1)

where $\mathcal{L}_{CL}$ denotes the correction loss used to optimize the correction network, and $C(Y_0)$ represents the corrected labels produced by the correction network $C$. After training, the refined labels are obtained as $Y_1 = C^{*}(Y_0)$. These refined labels are subsequently used to retrain the human parsing model iteratively until convergence.
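The outer loop of this iterative refinement, written here as $Y_{t+1} = C(Y_t)$ repeated until the labels stop changing, can be illustrated with a toy fixed-point iteration. The sketch makes strong simplifying assumptions: the labels are a short 1-D strip of part IDs rather than an $H \times W$ map, the learned correction network $C$ is replaced by a hypothetical majority filter (majority_filter), and retraining the parsing model $P$ is not modeled. It therefore shows only the iterate-until-convergence structure, not SCHP itself.

```python
from collections import Counter
from typing import Callable, List, Tuple

Labels = List[int]  # toy 1-D strip of part IDs (0 = background)


def majority_filter(labels: Labels) -> Labels:
    """Hypothetical stand-in for the correction network C: relabel each position with
    the most frequent label in its 3-element neighbourhood (ties keep the original)."""
    corrected = []
    for i, lab in enumerate(labels):
        window = labels[max(0, i - 1): i + 2]
        best, count = Counter(window).most_common(1)[0]
        corrected.append(best if count > window.count(lab) else lab)
    return corrected


def refine_until_converged(y0: Labels, correct: Callable[[Labels], Labels],
                           max_iters: int = 10) -> Tuple[Labels, int]:
    """Repeat Y_{t+1} = correct(Y_t) until no label changes or max_iters is reached.
    Returns the final labels and the number of correction passes performed."""
    y = y0
    for it in range(1, max_iters + 1):
        y_next = correct(y)
        if y_next == y:  # fixed point: this pass changed nothing
            return y, it
        y = y_next
    return y, max_iters


if __name__ == "__main__":
    y0 = [1, 1, 2, 1, 1, 3, 3, 0, 3, 3]  # noisy initial labels Y_0 (toy values)
    y_final, passes = refine_until_converged(y0, majority_filter)
    print(y_final, passes)
```

In this toy run, the isolated labels 2 and 0 are absorbed by their neighbours on the first pass, and the second pass confirms that nothing changes, which is the stopping condition.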

3.1.2. Cross-Part Spatial Feature Modulation Map (CP-SFMM)

The CP-SFMM module focuses on enhancing identity discrimination. This is achieved through element-wise multiplication (E-WM) with the original feature map, which strengthens the distinctiveness of body-part representations in various contexts, especially under occlusion. This technique extends the attention-guided strategy introduced in [2] to spatially modulated features, ensuring that attention is paid to the most relevant areas for accurate identification. The process involves two main steps:

Informative regions are first enhanced through E-WM, and the refined feature maps are then aggregated into a unified representation. The attention maps $A_k$ generated in the previous step are used to compute part-specific feature maps by E-WM with the original feature map $F$ [28]:

P_{k} = A_{k} \odot F

The E-WM operation calculates the product of the corresponding elements of the attention map and the input feature map, as illustrated in Figure 2. The resulting part-specific feature maps have the same dimensions as the input feature map $F$ (spatially, $W \times H$).

After the part-specific feature maps have been computed for each body part, they are concatenated along the channel dimension to form $P_{concat}$, with dimensions 32 × 32 × 512 ($C'$ = 128 × 3), yielding a combined feature representation that captures the spatial information of all body parts, as illustrated in Figures 2 and 3. This combined representation is then processed by the feature fusion and aggregation module and is expressed as follows:

P_{concat} = \mathrm{Concat}(P_{1}, P_{2}, \dots, P_{K})

In summary, the CP-SFMM step merges the part-specific feature maps generated by the SPD into a single feature map. This unified feature map contains discriminative information from all predefined body parts, which is crucial for creating a robust representation of the person’s appearance for identification tasks.
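The two CP-SFMM operations, $P_k = A_k \odot F$ followed by channel-wise concatenation, can be sketched as follows. The sketch assumes a channel-first $C \times H \times W$ layout stored as nested lists and assumes that each $H \times W$ attention map $A_k$ is broadcast across all channels of $F$; the helper names modulate and cp_sfmm are hypothetical, and the toy sizes do not attempt to match the 32 × 32 × 512 configuration reported above.

```python
from typing import List

Map2D = List[List[float]]   # H x W attention map
Tensor3D = List[Map2D]      # C x H x W feature map (channel-first nested lists)


def modulate(attn: Map2D, feat: Tensor3D) -> Tensor3D:
    """P_k = A_k (element-wise) F: scale every channel of F by the same H x W map A_k."""
    return [
        [[a * f for a, f in zip(a_row, f_row)] for a_row, f_row in zip(attn, channel)]
        for channel in feat
    ]


def cp_sfmm(attn_maps: List[Map2D], feat: Tensor3D) -> Tensor3D:
    """Modulate F with each of the K maps, then concatenate the results along channels.
    The output has K * C channels and the same H x W spatial size as F."""
    p_concat: Tensor3D = []
    for a_k in attn_maps:
        p_concat.extend(modulate(a_k, feat))  # append the C channels of P_k
    return p_concat


if __name__ == "__main__":
    F = [[[1.0, 2.0]], [[3.0, 4.0]]]   # toy F with C = 2, H = 1, W = 2
    A = [[[1.0, 0.0]], [[0.0, 0.5]]]   # K = 2 hypothetical attention maps
    P = cp_sfmm(A, F)
    print(len(P), P)
```

Under this reading, the channel count of the concatenated map grows linearly with $K$, which is why a subsequent fusion and aggregation stage is needed to keep the representation compact.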

It should be noted that the SPD builds upon existing part-based attention and parsing-guided techniques. Compared with conventional parsing-guided Re-ID methods that primarily use parsing labels for static body-region supervision, the SPD integrates parsing-guided attention refinement with cross-part spatial feature modulation to strengthen discriminative feature learning under occlusion and pose variations. Therefore, the contribution of SPD in this work lies in its coordinated role within the overall DPBF pipeline rather than as a standalone novel attention mechanism.

For notation consistency throughout the manuscript, the semantic body-part feature representations generated by SPD are denoted as $P_k$, where $k$ corresponds to a predefined body region. In the experimental analysis, visualizations, and part-wise performance evaluations, the corresponding labels (e.g., p0–p8) are used to identify individual body-part embeddings. Unless otherwise stated, both notations refer to the same semantic body-part feature representation at different stages of analysis and presentation.

3.2. Adaptive Feature Integration and Contextual Fusion (AFICF)

Multiple feature representations are extracted from human body images, including foreground local features and global features. Relying solely on global or local features may reduce robustness against viewpoint, illumination, and occlusion variations. Previous studies [5,29,30,31] have focused on improving the interaction and aggregation of such features, fusing local and global features to enhance discriminative representation learning. However, other recent studies indicate that feature fusion and aggregation may introduce redundancy when the integrated descriptors are highly correlated, reducing the model's efficiency. In addition, maintaining the balance between complementary local and global representations remains challenging, as overemphasizing one can lead to underutilization of the other [5,29,30,31,32].

Two strategies are therefore combined: correlation-aware filtering and coordinated fusion. AFICF adaptively calibrates the interactions between local body-part cues and holistic contextual embeddings prior to fusion, thereby reducing redundant feature incorporation and improving contextual feature consistency under occlusion. Rather than directly concatenating local and global representations, AFICF evaluates inter-feature correlations to preserve informative identity cues and reduce unnecessary redundancy during aggregation. Pearson Correlation Coefficient (PCC) analysis and Principal Component Analysis (PCA) are incorporated as a statistical adjustment process within the integration stage to support feature decorrelation and compact representation learning. These operations are applied as statistical feature refinement procedures rather than as standalone trainable layers or independent post-processing operations. This strategy was motivated by preliminary observations that directly fusing local and global representations may introduce redundant information and inconsistent semantic scaling, which can negatively affect discriminative feature learning.

PCC measures the linear correlation between two variables. The correlation between each pair of features, $X$ and $Y$, is estimated as follows:

r_{X Y} = \frac{\sum_{i = 1}^{n} (X_{i} - \bar{X}) (Y_{i} - \bar{Y})}{\sqrt{\sum_{i = 1}^{n} {(X_{i} - \bar{X})}^{2}} \sqrt{\sum_{i = 1}^{n} {(Y_{i} - \bar{Y})}^{2}}}

(2)

where $r_{XY}$ denotes the correlation coefficient between feature representations $X$ and $Y$, $X_i$ and $Y_i$ represent the corresponding feature samples, $\bar{X}$ and $\bar{Y}$ are the means of the $X$ and $Y$ samples, and $n$ is the number of samples. A high absolute value of $r_{XY}$ (close to 1 or −1) indicates a strong linear relationship between two features, meaning that they carry similar information; such features are therefore candidates for elimination. The AFICF retains one feature from each group of highly correlated features and eliminates the rest to reduce redundancy, as shown in Figure 4.
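A minimal sketch of this correlation-based filtering is given below. It computes Eq. (2) for pairs of feature columns and greedily keeps a column only if its absolute correlation with every column already kept stays at or below a threshold. The threshold value (0.9), the first-come greedy ordering, and the function names pearson and drop_correlated are illustrative assumptions; the section does not specify how the representative of each correlated group is chosen.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Eq. (2): Pearson correlation coefficient between two equal-length feature columns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0.0 or sy == 0.0:
        return 0.0  # r is undefined for a constant column; treat it as uncorrelated
    return cov / (sx * sy)


def drop_correlated(columns: List[Sequence[float]], threshold: float = 0.9) -> List[int]:
    """Greedily keep a column unless |r| with an already-kept column exceeds threshold.
    Returns the retained indices; every dropped column is strongly correlated
    with at least one retained column."""
    kept: List[int] = []
    for j, col in enumerate(columns):
        if all(abs(pearson(col, columns[i])) <= threshold for i in kept):
            kept.append(j)
    return kept


if __name__ == "__main__":
    # Each inner list holds one feature dimension observed over 4 samples (toy data).
    cols = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1], [1, -1, 1, -1]]
    print(drop_correlated(cols))
```

In the toy data, the second column is a scaled copy of the first (r = 1) and the third is its mirror image (r = −1), so both are dropped, while the alternating fourth column (|r| ≈ 0.45) is kept.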

Following correlation refinement, PCA is applied to generate a compact feature representation while preserving the dominant discriminative variance of the integrated feature space. PCA is utilized here as a dimensionality-reduction mechanism to suppress residual redundancy and improve feature compactness before the final feature fusion. The covariance relationship between feature dimensions can be represented as follows:

\mathrm{Cov}(X_{j}, X_{k}) = \frac{1}{n - 1} \sum_{i = 1}^{n} (X_{ij} - \bar{X}_{j}) (X_{ik} - \bar{X}_{k})

(3)

The principal components, that is, the eigenvectors of this covariance matrix associated with the largest eigenvalues, are retained to construct the final compact feature representation used during adaptive feature integration.
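A minimal NumPy sketch of this step follows, assuming that features are stacked as an (n_samples, n_dims) matrix and that the projection uses the eigenvectors of the Eq. (3) covariance matrix with the largest eigenvalues. It is illustrative only: the paper reduces features to 512 dimensions, whereas the toy example uses six input dimensions and three components to stay small.

```python
import numpy as np


def pca_project(features: np.ndarray, n_components: int) -> np.ndarray:
    """Project (n_samples, n_dims) features onto the top principal components."""
    centered = features - features.mean(axis=0)
    # Sample covariance with the 1 / (n - 1) normalisation of Eq. (3).
    cov = centered.T @ centered / (features.shape[0] - 1)
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    top = np.argsort(eigvals)[::-1][:n_components]
    return centered @ eigvecs[:, top]


# Toy data with correlated dimensions.
rng = np.random.default_rng(1)
F = rng.normal(size=(100, 6)) @ rng.normal(size=(6, 6))
Z = pca_project(F, 3)
print(Z.shape)  # (100, 3)
```

Because the retained eigenvectors are orthonormal, the projected components are mutually uncorrelated, and their variances equal the retained eigenvalues in descending order.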

3.3. Hybrid Loss Function (HLF)

The DPBF framework employs a Hybrid Loss Function (HLF) that combines cross-entropy and triplet loss. Unlike conventional implementations of these loss functions, DPBF uses HLF not merely as a training objective but also as a component supporting the integration of local and global representations. Studies such as [8,32,33] have demonstrated the advantages of HLF in enhancing the spatial feature distribution and improving Re-ID accuracy. These findings are consistent with DPBF's performance, which leverages HLF to reduce feature-level discrepancies and optimize identity-aware representations for complex occluded scenarios. First, cross-entropy loss is commonly used for identity classification. It works well with holistic embeddings because it encourages the model to learn features that discriminate between individuals. The cross-entropy loss for a classification task is typically written as follows:

L_{CE} = - \sum_{i = 1}^{N} Y_{i} \log \hat{Y}_{i}

(4)

where N is the number of classes, Y_{i} is the true label (1 for the correct class and 0 otherwise), and \hat{Y}_{i} is the predicted probability for class i. Second, the triplet loss is applied to the part-based embeddings to reduce the distance between the anchor and positive embeddings while increasing the separation from the negative embeddings [26]. This is especially useful when the person is partially occluded or only parts of the body are visible. The triplet loss is formulated as follows:

L_{triplet} = \max(0, D_{pos} - D_{neg} + \text{margin})

(5)

where D_{pos} is the distance between the anchor sample and the positive sample (same person), D_{neg} is the distance between the anchor sample and the negative sample (different person), and margin is a predefined constant that enforces a minimum separation between positive and negative pairs. The HLF combines these two objectives through a weighted sum:

L_{Hybrid} = \alpha L_{CE} + \beta L_{triplet}

(6)

where α and β are weighting coefficients controlling the relative contributions of the cross-entropy and triplet loss components during optimization. In this work, both components were equally weighted (α = β) to balance global identity discrimination and part-level feature separability.
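Eqs. (4) to (6) can be checked numerically with the per-sample NumPy sketch below. It is an illustration, not the DPBF training code: it omits batching and triplet mining, assumes Euclidean distances for D_{pos} and D_{neg}, uses the 0.3 margin reported in Section 4.1, and reads "equally weighted" as α = β = 1.0, which is an assumption.

```python
import numpy as np


def cross_entropy(probs: np.ndarray, label: int) -> float:
    """Eq. (4) for one sample: with a one-hot target only the true-class term remains."""
    return float(-np.log(probs[label]))


def triplet_loss(anchor: np.ndarray, positive: np.ndarray,
                 negative: np.ndarray, margin: float = 0.3) -> float:
    """Eq. (5) with Euclidean distances (assumed) and the 0.3 margin from Section 4.1."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return float(max(0.0, d_pos - d_neg + margin))


def hybrid_loss(ce: float, tl: float, alpha: float = 1.0, beta: float = 1.0) -> float:
    """Eq. (6); alpha = beta = 1.0 is one assumed reading of 'equally weighted'."""
    return alpha * ce + beta * tl


probs = np.array([0.7, 0.2, 0.1])               # softmax output, true class 0
ce = cross_entropy(probs, 0)                    # -ln(0.7)
anchor = np.array([0.0, 0.0])
positive = np.array([0.0, 1.0])                 # D_pos = 1.0
negative = np.array([1.0, 0.0])                 # D_neg = 1.0, so the margin is violated
tl = triplet_loss(anchor, positive, negative)   # max(0, 1 - 1 + 0.3) = 0.3
print(round(hybrid_loss(ce, tl), 4))            # 0.6567
```

Moving the negative farther away than D_{pos} + margin drives the triplet term to zero, so only the cross-entropy term would then contribute.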

Combining triplet loss and cross-entropy in the HLF is essential to strengthen the interaction between local and global features during training. The cross-entropy term optimizes global identity-level features by encouraging the network to distinguish between individuals based on their overall embeddings. The triplet loss, in turn, emphasizes part-level discrimination, so that semantically consistent body parts cluster together for the same identity while remaining separated across different identities. This twofold objective jointly optimizes global and local features in a complementary manner. Consequently, when these representations are passed to AFICF, they are already well aligned and discriminative, enabling the adaptive integration mechanism to eliminate redundancy and capture complementary information more effectively. The effectiveness of this strategy is examined further in the experimental analysis.

AFICF does not claim novelty in the individual use of correlation analysis or dimensionality-reduction techniques. Instead, its contribution lies in integrating correlation-aware feature refinement within a coordinated local and global feature integration strategy designed to improve feature complementarity and robustness under occlusion conditions within the DPBF framework.

Algorithm 1 summarizes the end-to-end training and inference workflow of the proposed DPBF framework.

Algorithm 1. End-to-End Training and Inference Procedure of DPBF

Input:

Training images X, identity labels Y, parsing masks y, and the number of predefined semantic body-part representations K.

Output:

Optimized DPBF model and final Person Re-ID ranking results.

Training Stage

Extract backbone feature maps from each input image using the CNN backbone.
Generate parsing-guided semantic body-part representations using the Salient Part Discrimination (SPD) module.
Construct local body-part embeddings and global contextual embeddings.
Refine local-global feature interactions using the Adaptive Feature Integration and Contextual Fusion (AFICF) module.
Suppress redundant feature components using Pearson Correlation Coefficient (PCC)-based refinement.
Generate compact discriminative feature embeddings using Principal Component Analysis (PCA)-based refinement.
Fuse the refined local and global feature representations.
Optimize the DPBF framework using the Hybrid Loss Function (HLF).

Inference Stage

Extract normalized feature embeddings from query and gallery images using the trained DPBF model.
Compute Euclidean distances between query and gallery embeddings.
Rank gallery samples according to ascending distance values.
Return the final Person Re-ID retrieval results.
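The inference stage of Algorithm 1 can be sketched in NumPy as follows. This is a minimal sketch, assuming L2 normalization (the text states only that embeddings are normalized) and embeddings already extracted by the trained model; with unit-norm vectors, the squared Euclidean distance reduces to 2 − 2 q·g.

```python
import numpy as np


def rank_gallery(query: np.ndarray, gallery: np.ndarray) -> np.ndarray:
    """Return, per query, gallery indices sorted by ascending Euclidean distance.

    query: (n_q, d) and gallery: (n_g, d) embeddings; L2 normalization is assumed.
    """
    q = query / np.linalg.norm(query, axis=1, keepdims=True)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    # For unit vectors ||q - g||^2 = 2 - 2 q.g; clip tiny negative rounding errors.
    sq_dist = np.clip(2.0 - 2.0 * q @ g.T, 0.0, None)
    dist = np.sqrt(sq_dist)
    return np.argsort(dist, axis=1, kind="stable")


query = np.array([[1.0, 0.0]])
gallery = np.array([[0.0, 1.0], [2.0, 0.0], [1.0, 1.0]])
print(rank_gallery(query, gallery))  # [[1 2 0]]
```

Gallery item 1 ranks first because, after normalization, it points in exactly the same direction as the query, regardless of its original magnitude.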

4. Experiments and Results

This section presents the datasets, implementation details, evaluation protocols, and experimental results used to assess the performance of the proposed DPBF framework.

4.1. Implementation Details

The implementation used ResNet-50 [34] and HRNet-w32 [35] as the backbone networks. For HRNet-w32, the architecture was modified by setting the stride of the final convolutional layer to 1 and removing the global average pooling (GAP) layer. ResNet-50 was adapted by removing the final fully connected layer together with the GAP layer to preserve the spatial feature details essential for the Re-ID task. The overall training strategy follows the methodology of [36], as adopted in BPB Re-ID [8]. The model was trained on an NVIDIA GeForce RTX 4090 GPU (24 GB), and additional experiments were conducted on two NVIDIA TITAN Xp GPUs.

Reproducibility and Training Pipeline

To provide precise reproducibility details beyond the general implementation description, the full training configuration and processing pipeline are summarized as follows. The model was trained using the Adam optimizer [37] with an initial learning rate of 3.5 \times 10^{-4}, reached through a warm-up phase that starts from a learning rate of 3.5 \times 10^{-5}, with a momentum of 0.9 and a weight decay of 0.0005. A warm-up multi-step learning-rate schedule was adopted, with the learning rate reduced at epochs 40 and 70. The model was trained for 150 epochs with a RandomIdentitySampler, where each batch contains multiple identities with a fixed number of instances per identity (4 instances per identity). The effective batch size ranged from 16 to 64 depending on dataset characteristics, backbone architecture, and GPU memory constraints. Input images were resized to 384 × 128 for HRNet-w32 and 256 × 128 for ResNet-50, and data augmentation included random cropping, normalization, and random erasing. For the hybrid loss, the identity and triplet loss components were equally weighted, and the triplet margin was set to 0.3. A pixel-level cross-entropy loss with a weight of 0.35 was applied for part-based supervision. In AFICF, PCC is used to identify highly correlated feature components based on correlation magnitude, and redundant features are filtered accordingly. PCA is then applied to reduce feature dimensionality, yielding a compact 512-dimensional feature representation. The number of semantic body-part representations K was adapted to the parsing configuration and dataset characteristics: configurations with K = 5 and K = 8 body-part representations were used depending on the visibility conditions and semantic partitioning strategy associated with each dataset and experimental setting. This choice is intended to keep the feature representation efficient while preserving discriminative information. During inference, query and gallery images are processed by the trained network to extract feature embeddings, which are then normalized and ranked by Euclidean distance. For fair and reproducible evaluation, all comparisons follow standard dataset protocols and commonly adopted evaluation metrics, and the reported baseline results were taken from their respective original publications.
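One plausible reading of the warm-up multi-step schedule described above is sketched below. The base rate, warm-up start rate, and milestones (epochs 40 and 70) follow this section; the 10-epoch linear warm-up and the decay factor of 0.1 are hypothetical values, since they are not reported here, and epochs are counted from 0.

```python
def learning_rate(epoch: int,
                  base_lr: float = 3.5e-4,
                  warmup_start_lr: float = 3.5e-5,
                  warmup_epochs: int = 10,        # hypothetical: not reported
                  milestones: tuple = (40, 70),
                  gamma: float = 0.1) -> float:   # hypothetical: not reported
    """Learning rate for a 0-indexed epoch: linear warm-up, then multi-step decay."""
    if epoch < warmup_epochs:
        # Linear ramp from warmup_start_lr towards base_lr.
        frac = epoch / warmup_epochs
        return warmup_start_lr + frac * (base_lr - warmup_start_lr)
    # Multiply by gamma once for every milestone already passed.
    n_decays = sum(epoch >= m for m in milestones)
    return base_lr * gamma ** n_decays


for epoch in (0, 5, 10, 39, 40, 70, 149):
    print(epoch, f"{learning_rate(epoch):.2e}")
```

A linear ramp is only one common warm-up shape; the same milestone logic applies unchanged if a different warm-up curve is used.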

The DPBF model was trained and evaluated using the standard dataset splits and evaluation protocols of each benchmark, and all comparative results are reported under each benchmark's commonly adopted Person Re-ID evaluation setting. Unless explicitly stated otherwise in the cited studies, re-ranking techniques were not incorporated into the reported comparisons. Baseline results were primarily collected from their original publications rather than fully reproduced within a unified implementation framework. Consequently, competing methods may differ in backbone architectures, input resolutions, data augmentation strategies, optimization settings, and auxiliary training techniques. To reduce comparison bias, only results reported under standard evaluation settings were included. These comparisons are therefore intended to provide a consistent empirical reference rather than a fully unified benchmark reproduction.
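The identity-balanced batching of the RandomIdentitySampler described in this section (4 instances per identity) can be approximated in plain Python. This is a simplified, hypothetical stand-in, not the sampler implementation the authors used: each identity contributes at most one group per epoch, identities with fewer than four images are padded by sampling with replacement, and the default of 8 identities per batch (a batch of 32, within the reported 16 to 64 range) is an illustrative choice.

```python
import random
from collections import defaultdict


def pk_batches(labels, num_instances=4, ids_per_batch=8, seed=1):
    """Yield lists of dataset indices, each holding num_instances images per identity."""
    rng = random.Random(seed)
    by_id = defaultdict(list)
    for idx, pid in enumerate(labels):
        by_id[pid].append(idx)
    pids = list(by_id)
    rng.shuffle(pids)
    # Only complete groups of ids_per_batch identities form a batch.
    for start in range(0, len(pids) - ids_per_batch + 1, ids_per_batch):
        batch = []
        for pid in pids[start:start + ids_per_batch]:
            pool = by_id[pid]
            if len(pool) >= num_instances:
                batch.extend(rng.sample(pool, num_instances))     # without replacement
            else:
                batch.extend(rng.choices(pool, k=num_instances))  # pad small identities
        yield batch


labels = [i // 5 for i in range(80)]  # 16 toy identities with 5 images each
batches = list(pk_batches(labels, num_instances=4, ids_per_batch=4))
print(len(batches), [len(b) for b in batches])  # 4 [16, 16, 16, 16]
```

Guaranteeing several images per identity in every batch is what makes in-batch positives available for the triplet term of the hybrid loss.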

4.2. Baseline Comparisons

Comparative analyses were conducted on two types of benchmark datasets: holistic datasets, namely DukeMTMC-ReID [38], CUHK03-Labeled [39], and Market-1501 [40], and occluded datasets, namely Occluded-Duke [6], Occluded-ReID [41], and P-DukeMTMC-ReID [42]. To ensure fair comparison, all results are reported using standard evaluation protocols on publicly available benchmarks. Two evaluation metrics were used: Rank-1 accuracy (R-1) and mean Average Precision (mAP). For clarity, the comparison tables group the studies into three categories: Part-Based methods (PB), which analyze individual body parts; Global methods (G), which consider the global features of a person; and combined Part-Based and Global strategies (PB + G). To highlight the top-performing methods in the comparative evaluations (Table 1 and Table 2), a ranking scheme is used: (¹) for the best performance, (²) for the second-best, and (³) for the third-best. The comparative results obtained on the holistic benchmark datasets are presented in Table 1.
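Both metrics can be computed from a query-by-gallery distance matrix, as in the NumPy sketch below. It is a simplified illustration: standard protocols for benchmarks such as Market-1501 also discard gallery images of the query identity taken by the same camera and exclude queries without any valid match. Both steps are omitted here, so a query with no match simply receives AP = 0.

```python
import numpy as np


def rank1_and_ap(dist_row: np.ndarray, gallery_ids: np.ndarray, query_id: int):
    """Rank-1 hit (0 or 1) and average precision for a single query."""
    order = np.argsort(dist_row, kind="stable")
    matches = gallery_ids[order] == query_id
    rank1 = float(matches[0])
    hits = np.flatnonzero(matches)  # 0-based ranks of the true matches
    if hits.size == 0:
        return rank1, 0.0
    # Precision at each true-match rank: (matches so far) / (rank + 1).
    precisions = np.arange(1, hits.size + 1) / (hits + 1)
    return rank1, float(precisions.mean())


def evaluate(dist: np.ndarray, query_ids, gallery_ids):
    """Mean Rank-1 accuracy and mAP over all queries (dist has shape n_q x n_g)."""
    gallery_ids = np.asarray(gallery_ids)
    scores = [rank1_and_ap(dist[i], gallery_ids, qid) for i, qid in enumerate(query_ids)]
    r1 = np.mean([s[0] for s in scores])
    m_ap = np.mean([s[1] for s in scores])
    return float(r1), float(m_ap)


dist = np.array([[0.1, 0.5, 0.3, 0.9],
                 [0.1, 0.4, 0.2, 0.8]])
r1, m_ap = evaluate(dist, query_ids=[1, 2], gallery_ids=[1, 2, 1, 2])
print(r1, round(m_ap, 4))  # 0.5 0.7083
```

In this toy case the second query's true matches appear at ranks 3 and 4, giving AP = (1/3 + 2/4) / 2, which illustrates why mAP penalizes late matches that Rank-1 alone would not reveal.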

As shown in Table 1, DPBF achieves consistent improvements in this highly competitive field, with R-1 gains of 0.3% on Market-1501 and 1.0% on DukeMTMC-ReID over the strongest competing methods under the same evaluation protocol. Its advantage becomes more evident on CUHK03-Labeled, where it outperforms the nearest competitors by 4.8% in R-1 accuracy and 1.6% in mAP.

The comparative results obtained on the occluded benchmark datasets are presented in Table 2.

Table 2 illustrates the efficacy of the DPBF model on datasets that simulate real-world conditions. On Occluded-ReID, the model outperforms the closest competing methods under the same evaluation protocol by a substantial 10.6% in R-1 and 16% in mAP. On P-DukeMTMC-ReID, it achieves 0.6% higher R-1 and 0.4% higher mAP, further demonstrating its competitive advantage.

4.3. Part-Based Feature Discrimination and Model Efficacy in Challenging Identification Scenarios

As illustrated in Figure 5a–c, the individual body parts have varied impacts on the model's performance. The p0 embedding achieves stronger identification performance than most individual body parts, reflecting the benefit of integrating complementary semantic body-region information.

In contrast, individual body parts show different levels of reliability, with upper-body regions generally contributing more strongly than lower-body regions. This can be attributed to the greater visibility and denser discriminative cues of upper-body regions, while lower-body regions are more frequently affected by pose variation, occlusion, and missing visual information.

Figure 5a visualizes the confidence response of individual part-based embeddings across several challenging identification samples. Red-bordered samples indicate lower confidence scores and highlight difficult cases where occlusion, pose variation, or missing body-region information weakens the part-specific feature representation. The quantitative evaluations in Figure 5b,c further reveal substantial performance degradation for several lower-body embeddings, with p7 consistently exhibiting the weakest performance across the evaluated occlusion-oriented benchmarks, which is consistent with the greater vulnerability of lower-body regions noted above.

4.4. Feature Integration Strategy Evaluation

Table 3 presents a comparative analysis of multiple feature-integration configurations within the DPBF framework on the Occluded-ReID dataset, using global pooled features g_{f} and local body-part feature representations {Bp}_{(1-5)f}, with PCC- and PCA-based filtering stages incorporated during local–global feature integration.

Each index in the table represents a different feature-integration configuration, evaluated with established performance metrics to analyze its contribution to identification performance within the DPBF framework. Table 3 lists the feature components and loss functions included in each configuration alongside the resulting mAP and rank accuracies. The HLF strategy, which combines cross-entropy and triplet loss with the full set of features, achieves the strongest overall performance among the evaluated configurations. The results indicate that combining global and local feature representations with PCC- and PCA-based refinement improves identification performance across multiple configurations. For instance, the configuration relying primarily on global pooled features g_{f} with PCC and PCA calibration (Index 1) performs worse than configurations that incorporate both global and local body-part representations. Performance degrades progressively as more feature-integration components are removed.

5. Ablation Studies

5.1. Effectiveness of DPBF Model Under Supervised and Unsupervised Learning on the Market-1501 Dataset

To further analyze the impact of individual components and design choices, a series of ablation studies was conducted to investigate the contribution and effectiveness of SPD in different DPBF configurations, focusing on three learning strategies: supervised, unsupervised, and semi-supervised learning. The supervised strategy relies on parsing labels for training, whereas the unsupervised strategy employs a PAM without additional parsing-label information. The semi-supervised strategy combines both approaches to improve learning efficiency. The DPBF model's adaptability and performance were analyzed across these scenarios using the Market-1501 dataset as a benchmark. The analysis centers on the model's ability to leverage body-part-based attention mechanisms, specifically examining the effect of training the model with different numbers of human body parts. Table 4 outlines the DPBF model's performance.

This analysis illustrates not only the model's effectiveness in leveraging detailed part-based features but also the benefits of integrating supervised and unsupervised learning techniques to improve performance.

To further contextualize the DPBF model’s efficacy, Table 5 presents a comparison with the PCB model on the DukeMTMC-ReID benchmark, focusing on configurations involving 3, 5, and 6 parts.

This comparison aims to highlight the DPBF model’s advancements over existing methodologies. Although Table 4 and Table 5 primarily evaluate the impact of part granularity and compare the proposed approach with baseline strategies, the observed performance improvements suggest that the effectiveness of DPBF arises from the coordinated use of parsing-guided body-region discrimination and correlation-aware integration. This indicates that the model benefits from the interaction between its components rather than relying on a single isolated design element.

5.2. Visualisation of Occluded Learned Features Based on Individual Body-Part-Based Embeddings

As shown in Figure 6, the proposed model preserves strong activation responses in visible discriminative body regions under partial occlusion, whereas severe occlusion and missing lower-body information lead to noticeably weaker local body-region representations. Although contextual upper-body activations remain relatively stable, heavily occluded body regions exhibit weaker and less reliable feature localization. Figure 6a presents several query–gallery matching cases, where each identity is represented by two corresponding visualization rows; the heatmaps show that the model generally retains strong activations over visible discriminative body regions despite partial occlusion and viewpoint variations.

Certain scenarios still pose notable challenges and yield less informative details, especially under severe occlusion where entire limbs or body regions are obscured, which affects the model's overall performance. For example, several cases in Figure 6a exhibit weaker responses around the foot and lower-leg regions, indicating reduced localization reliability when only limited visual information is available. Similar degradation patterns appear in other examples, where heavily occluded lower-body regions generate weaker activations and less informative feature responses. These observations suggest that the most formidable challenge for the model is not the occluded parts as such, but the absence or severe degradation of discriminative visual information under extreme partial-visibility conditions. As illustrated in Figure 6b, local embeddings associated with heavily occluded or weakly visible body regions exhibit substantially weaker activation responses and reduced feature reliability compared with dominant contextual embeddings. These visual observations are consistent with the quantitative embedding degradation analysis reported in Table 8, where heavily occluded body-region embeddings exhibit reduced R-1 accuracy and mAP.

5.3. Comparison with Other Feature-Learning Techniques Under Different Image Resolutions

The current research extends the evaluation of feature learning with a comparative analysis against DenseNet [54], which provides additional insight into feature-learning efficacy across different image resolutions and backbone architectures. ResNet-50 is integrated into the DPBF model with an input resolution of 256 × 128 to maintain computational efficiency while capturing the detail required for Person Re-ID. Conversely, HRNet-w32 is designed for higher-resolution inputs, in line with its design philosophy of maintaining detailed feature maps [35]. The DPBF model therefore uses an input size of 384 × 128 with HRNet-w32, taking advantage of its ability to maintain high-resolution representations throughout the network.

The inclusion of DenseNet in the comparative study serves as a reference point, providing insights into the performance of densely connected networks against the chosen architectures of the DPBF model. DenseNet’s feature propagation and parameter efficiency offer a valuable perspective in the context of Person Re-ID despite its absence from the DPBF framework. Figure 7 summarizes the comparative performance obtained across the evaluated backbone architectures and input resolutions.

In summary, this analysis illustrates the rationale behind the design choices of the DPBF model and the balance it strikes between feature-learning capability and computational efficiency.

5.4. Computational Efficiency and Scalability Analysis

Computational efficiency and scalability are critical for deep learning models, especially in applications requiring real-time analytics or long training periods. A comparative analysis was conducted to assess the computational efficiency of the DPBF model against well-established methods such as PCB + RPP and BPB Re-ID.

5.4.1. Assessing Processing Time and Memory Usage

The study extends beyond simple comparisons of model efficiency to an ablation study on hardware performance. Figure 8 encapsulates this by presenting the differences in computational times when utilizing dual NVIDIA TITAN Xp GPUs against a single NVIDIA GeForce RTX 4090 GPU. The benchmarks—Market-1501, DukeMTMC-ReID, P-DukeMTMC-ReID, and Occluded-Duke—serve as a testing ground for this comparison, representing a diverse range of challenges in the Person Re-ID domain.

The substantial gain in computational efficiency observed with the GeForce RTX 4090 can be attributed to its hardware specifications. This single-GPU configuration has a pronounced impact on feature extraction and optimizer steps, which are the most computation-intensive phases of training. Compared with the dual TITAN Xp setup, the RTX 4090 offers substantially more CUDA cores (16,384 vs. 3840 per TITAN Xp, or 7680 across both cards), a larger memory capacity (24 GB GDDR6X vs. 12 GB GDDR5X per TITAN Xp), and higher memory bandwidth (1008 GB/s vs. 547.7 GB/s per TITAN Xp). The measurements in Figure 8 quantify the RTX 4090's advantage over the dual TITAN Xp setup and highlight the role of its newer architecture in improving computational efficiency.

Table 6 further presents the computational efficiency and scalability of the proposed DPBF framework. Although DPBF introduces additional contextual fusion operations, the model maintains competitive processing efficiency while improving robustness under occlusion. Compared with PCB + RPP, DPBF shows improved batch-processing efficiency under identical GPU configurations and maintains acceptable scalability despite a moderate increase in memory consumption. Computational complexity was further analyzed with PyTorch (version 2.0.1)-based model-complexity evaluation utilities to estimate the parameter count and FLOPs of the proposed model.
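Parameter counts and per-batch timings of the kind discussed above can be obtained with plain PyTorch calls, as in the sketch below. The toy model is a stand-in, not DPBF, and FLOP counting requires a separate profiling utility that is not shown. The `torch.cuda.synchronize()` calls matter on GPU because kernels run asynchronously, so timing without them would under-report the forward-pass cost.

```python
import time

import torch
import torch.nn as nn


def count_parameters(model: nn.Module) -> int:
    """Total number of parameters, trainable and frozen."""
    return sum(p.numel() for p in model.parameters())


@torch.no_grad()
def mean_batch_time(model: nn.Module, batch: torch.Tensor, repeats: int = 10) -> float:
    """Average forward-pass time in seconds for one batch, in eval mode."""
    model.eval()
    model(batch)  # warm-up pass, excluded from timing
    if batch.is_cuda:
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(repeats):
        model(batch)
    if batch.is_cuda:
        torch.cuda.synchronize()  # wait for queued GPU kernels before stopping the clock
    return (time.perf_counter() - start) / repeats


toy = nn.Sequential(nn.Linear(10, 5), nn.ReLU(), nn.Linear(5, 2))
print(count_parameters(toy))  # 67
print(f"{mean_batch_time(toy, torch.zeros(4, 10)):.2e} s per batch")
```

The toy count is 10 × 5 + 5 weights and biases for the first layer plus 5 × 2 + 2 for the second, i.e., 55 + 12 = 67.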

5.4.2. Reproducibility, Computational Efficiency, and Occlusion Stability Analysis

To further evaluate experimental reproducibility and computational consistency, additional independent training and evaluation runs were conducted on multiple benchmark datasets under identical evaluation protocols and controlled training configurations. The repeated runs were executed in different GPU environments, namely NVIDIA TITAN Xp and RTX 4090, while keeping the model architecture, training strategy, random seed initialization (seed = 1), and evaluation procedure (without re-ranking) fixed. The results showed stable R-1 and mAP performance across independent training runs and hardware environments, with only minor variations despite the substantial differences in computational hardware, supporting the robustness and reproducibility of the proposed DPBF framework. A summary of the cross-hardware performance and computational analysis is presented in Table 7.
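The fixed-seed setting can be reproduced with a helper like the one below. It is a sketch of common PyTorch practice rather than the authors' code; the cuDNN flags are an assumption, since only the seed value is reported. Even with identical seeds, PyTorch does not guarantee bitwise-identical results across different GPU models or library versions, which is consistent with the minor cross-hardware variations described above.

```python
import random

import numpy as np
import torch


def set_seed(seed: int = 1) -> None:
    """Fix the Python, NumPy and PyTorch random number generators (seed = 1 as reported)."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # silently ignored when CUDA is unavailable
    # Assumed settings: prefer deterministic cuDNN kernels, at some speed cost.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False


set_seed(1)
first = torch.rand(3)
set_seed(1)
second = torch.rand(3)
print(torch.equal(first, second))  # True
```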

The repeated evaluation results demonstrate stable identification performance across independent training runs and different hardware environments. In addition, the reported parameter counts, FLOPs, and evaluation batch processing times provide a practical computational-efficiency reference for large-scale Person Re-ID deployment scenarios. The RTX 4090 platform also showed a substantially reduced evaluation processing time compared with the TITAN Xp configuration while maintaining consistent identification performance across all evaluated datasets.

To further quantify the influence of severe occlusion on individual body-part embeddings, the embedding degradation analysis across occlusion-oriented benchmarks is presented in Table 8.

Table 8 further demonstrates the influence of severe lower-body and partial-body occlusion on the preservation of local discriminative features. Across all evaluated occlusion-oriented benchmarks, p0 consistently achieved the highest identification performance, whereas p7 exhibited severe degradation, with R-1 gaps ranging from 66.6% to 90.2% and mAP gaps ranging from 55.4% to 80.3%. This pattern indicates that lower-body or weakly visible semantic regions are more vulnerable to missing visual information, severe occlusion, and viewpoint variation, and that severe lower-body occlusion remains one of the principal challenges in part-based Person Re-ID systems. Although DPBF does not eliminate the impact of severe occlusion entirely, the maintained overall identification performance suggests that adaptive local–global feature integration helps mitigate these effects by preserving complementary discriminative cues from visible body regions.

Future work will therefore investigate stronger cross-part contextual reasoning, occlusion-aware feature reconstruction, and transformer-based global–local interaction mechanisms to improve robustness under extreme partial visibility.

6. Conclusions

The proposed DPBF framework demonstrates an improved capability to model variations in human appearance, enhancing robustness against challenges such as occlusion, pose variation, and illumination changes. Table 1 and Table 2 demonstrate the effectiveness of DPBF across both holistic and occlusion-oriented benchmarks, while Figure 5 illustrates the robustness and relative contribution of individual body-part embeddings under challenging visibility conditions. Moving beyond traditional methods that rely on global feature representations, DPBF coordinates SPD and AFICF, enabling complementary interaction between local body-region representations and global contextual information. Its AFICF module merges part-specific features with global context, enhancing feature complementarity and reducing the limitations commonly observed in conventional feature-fusion methods. Although DPBF performs strongly in Person Re-ID, several areas for further improvement have been identified, including performance degradation under extreme occlusion and low-resolution conditions, as illustrated in Figure 5 and Figure 6b and quantitatively confirmed by the embedding degradation analysis reported in Table 8. Additional reproducibility, computational-efficiency, and occlusion-stability analyses showed stable performance across independent training runs, competitive computational complexity and evaluation efficiency, and the effective mitigation of severe occlusion effects through adaptive local–global feature integration.

Future work will investigate more stable parsing strategies and adaptive feature handling mechanisms to improve discrimination under severe occlusion, low-resolution conditions, and visually similar appearances while maintaining computational efficiency. In particular, the aim is to investigate adaptive part-weighting mechanisms and occlusion-aware feature refinement to enhance the model’s ability to extract meaningful information from lower-body regions, even when they are partially obscured or truncated due to camera limitations.

Author Contributions

Conceptualization, G.H.; methodology, G.H.; software, G.H.; validation, G.H. and J.S.S.; formal analysis, G.H.; investigation, G.H. and J.S.S.; writing—original draft preparation, G.H.; writing—review and editing, J.S.S. and W.A.-N.; supervision, W.A.-N. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Ethical review and approval were waived for this study because all data were obtained from publicly available datasets (Occluded-ReID, Market-1501, DukeMTMC-ReID, Occluded-Duke, P-DukeMTMC-ReID, and CUHK03-Labeled). These datasets were collected and released by their original authors in accordance with relevant ethical guidelines. No new data collection or interaction with human participants was involved in this study.

Informed Consent Statement

Informed consent was waived because the study utilized publicly available datasets containing human images, which were previously collected and released by the original dataset providers under the appropriate ethical procedures. The authors did not conduct any new data collection or have direct interaction with human participants.

Data Availability Statement

The datasets used in this study are publicly available. The Market-1501 dataset is available at https://www.kaggle.com/datasets/pengcw1/market-1501 (accessed on 1 June 2026), the DukeMTMC-ReID dataset is available at https://www.kaggle.com/datasets/whurobin/dukemtmcreid (accessed on 1 June 2026), the CUHK03-Labeled dataset is available at https://www.kaggle.com/datasets/priyanagda/cuhk03 (accessed on 1 June 2026), the Occluded-Duke dataset is available at https://www.kaggle.com/datasets/baofengz/occluded-duke (accessed on 1 June 2026), and the Occluded-ReID and P-DukeMTMC-ReID datasets are available at https://github.com/tinajia2012/ICME2018_Occluded-Person-Reidentification_datasets (accessed on 1 June 2026).

Conflicts of Interest

The authors declare no conflicts of interest.

References

Sun, Y.; Zheng, L.; Yang, Y.; Tian, Q.; Wang, S. Learning Part-Based Convolutional Features for Person Re-Identification. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 43, 902–917.
Li, W.; Zhu, X.; Gong, S. Harmonious Attention Network for Person Re-Identification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 2285–2294.
Su, C.; Li, J.; Zhang, S.; Xing, J.; Gao, W.; Tian, Q. Pose-Driven Deep Convolutional Model for Person Re-Identification. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 3960–3969.
Li, P.; Xu, Y.; Wei, Y.; Yang, Y. Self-Correction for Human Parsing. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 3260–3271.
Tao, H.; Bao, W.; Duan, Q.; Hu, Z.; An, J.; Xie, C. An Improved Interaction and Aggregation Network for Person Re-Identification. Multimed. Tools Appl. 2023, 82, 44053–44069.
Miao, J.; Wu, Y.; Liu, P.; DIng, Y.; Yang, Y. Pose-Guided Feature Alignment for Occluded Person Re-Identification. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 542–551. [Google Scholar]
Gao, S.; Wang, J.; Lu, H.; Liu, Z. Pose-Guided Visible Part Matching for Occluded Person Re-Identification. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 11741–11749. [Google Scholar]
Somers, V.; De Vleeschouwer, C.; Alahi, A. Body Part-Based Representation Learning for Occluded Person Re-Identification. In Proceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 2–7 January 2023; pp. 1613–1623. [Google Scholar]
Suh, Y.; Wang, J.; Tang, S.; Mei, T.; Lee, K. Part-Aligned Bilinear Representations for Person Re-Identification. In Proceedings of the European Conference on Computer Vision (ECCV); Springer: Cham, Switzerland, 2018; Volume 11218 LNCS, pp. 418–437. [Google Scholar]
He, S.; Luo, H.; Wang, P.; Wang, F.; Li, H. TransReID: Transformer-Based Object Re-Identification. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 15013–15022. [Google Scholar] [CrossRef]
Li, Y.; He, J.; Zhang, T.; Liu, X.; Zhang, Y.; Wu, F. Diverse Part Discovery: Occluded Person Re-Identification with Part-Aware Transformer. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 2897–2906. [Google Scholar] [CrossRef]
Ji, Z.; Cheng, D.; Feng, K. Exploring Stronger Transformer Representation Learning for Occluded Person Re-Identification. Multimed. Syst. 2025, 31, 394. [Google Scholar] [CrossRef]
Yan, S.; Dong, N.; Zhang, L.; Tang, J. CLIP-Driven Fine-Grained Text-Image Person Re-Identification. IEEE Trans. Image Process. 2023, 32, 7492–7505. [Google Scholar] [CrossRef] [PubMed]
Si, J.; Zhang, H.; Li, C.G.; Kuen, J.; Kong, X.; Kot, A.C.; Wang, G. Dual Attention Matching Network for Context-Aware Feature Sequence Based Person Re-Identification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 5363–5372. [Google Scholar] [CrossRef]
Xu, Z.; Geng, X. Multi-Level Feature Aggregation Network for Person Re-Identification. In Proceedings of the International Conference on Big Data and Artificial Intelligence and Software Engineering (ICBASE), Nanjing, China, 25–27 August 2023; pp. 168–171. [Google Scholar]
Li, W.; Zhu, X.; Gong, S. Person Re-Identification by Deep Joint Learning of Multi-Loss Classification. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), Melbourne, Australia, 19–25 August 2017; pp. 2194–2200. [Google Scholar]
Ferdous, S.N.; Li, X. Robust Ensemble Person Reidentification via Orthogonal Fusion with Occlusion Handling. Image Vis. Comput. 2024, 146, 105010. [Google Scholar] [CrossRef]
Zhu, K.; Guo, H.; Liu, Z.; Tang, M.; Wang, J. Identity-Guided Human Semantic Parsing for Person Re-Identification. In Proceedings of the European Conference on Computer Vision (ECCV); Springer: Cham, Switzerland, 2020; Volume 12348, pp. 346–363. [Google Scholar]
Chen, W.; Xu, X.; Jia, J.; Luo, H.; Wang, Y.; Wang, F.; Jin, R.; Sun, X. Beyond Appearance: A Semantic Controllable Self-Supervised Learning Framework for Human-Centric Visual Tasks. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 15050–15061. [Google Scholar] [CrossRef]
Yang, L.; Song, Q.; Wang, Z.; Jiang, M. Parsing R-CNN for Instance-Level Human Analysis. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 364–373. [Google Scholar] [CrossRef]
Habermann, M.; Xu, W.; Zollhofer, M.; Pons-Moll, G.; Theobalt, C. DeepCap: Monocular Human Performance Capture Using Weak Supervision. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 5052–5063. [Google Scholar]
Xu, K.; Ba, J.; Kiros, R.; Cho, K.; Courville, A.; Salakhutdinov, R.; Zemel, R.; Bengio, Y. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. Int. Conf. Mach. Learn. (ICML) 2016, 37, 2048–2057. [Google Scholar]
Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.; Kaiser, Ł.; Polosukhin, I. Attention Is All You Need. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Long Beach, CA, USA, 4–9 December 2017; Volume 30, pp. 5998–6008. [Google Scholar]
Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 618–626. [Google Scholar]
Viola, P.; Jones, M. Rapid Object Detection Using a Boosted Cascade of Simple Features. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), Kauai, HI, USA, 8–14 December 2001; Volume 1, pp. I-511–I-518. [Google Scholar]
Schroff, F.; Kalenichenko, D.; Philbin, J. FaceNet: A Unified Embedding for Face Recognition and Clustering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 815–823. [Google Scholar]
Jaderberg, M.; Simonyan, K.; Zisserman, A.; Kavukcuoglu, K. Spatial Transformer Networks. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Montreal, QC, Canada, 7–12 December 2015; pp. 2017–2025. [Google Scholar]
Zhao, L.; Li, X.; Zhuang, Y.; Wang, J. Deeply-Learned Part-Aligned Representations for Person Re-Identification. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 3239–3248. [Google Scholar]
Xu, L.; Fu, X. Multiscale Reference-Aided Attentive Feature Aggregation for Person Re-Identification. IEEE Access 2021, 9, 141667–141677. [Google Scholar] [CrossRef]
Yan, G.; Wang, Z.; Geng, S.; Yu, Y.; Guo, Y. Part-Based Representation Enhancement for Occluded Person Re-Identification. IEEE Trans. Circuits Syst. Video Technol. 2023, 33, 4217–4231. [Google Scholar] [CrossRef]
Wang, M.; Tao, X.; Han, F. A New Method for Redundancy Analysis in Feature Selection. In Proceedings of the 2020 3rd International Conference on Algorithms, Computing and Artificial Intelligence; Association for Computing Machinery: New York, NY, USA, 2020; pp. 407–411. [Google Scholar]
Zhang, Z.; Huang, M. Person Re-Identification Based on Heterogeneous Part-Based Deep Network in Camera Networks. IEEE Trans. Emerg. Top. Comput. Intell. 2020, 4, 51–60. [Google Scholar] [CrossRef]
Chen, S.; Zhang, H.; Lei, Z. Person Re-Identification Based on Attention Mechanism and Context Information Fusion. Future Internet 2021, 13, 72. [Google Scholar] [CrossRef]
He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar] [CrossRef]
Sun, K.; Xiao, B.; Liu, D.; Wang, J. Deep High-Resolution Representation Learning for Human Pose Estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 5686–5696. [Google Scholar] [CrossRef]
Luo, H.; Gu, Y.; Liao, X.; Lai, S.; Jiang, W. Bag of Tricks and a Strong Baseline for Deep Person Re-Identification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Long Beach, CA, USA, 16–17 June 2019; pp. 1487–1495. [Google Scholar]
Kingma, D.P.; Ba, J.L. Adam: A Method for Stochastic Optimization. In Proceedings of the International Conference on Learning Representations (ICLR), San Diego, CA, USA, 7–9 May 2015; pp. 1–15. [Google Scholar]
Ristani, E.; Solera, F.; Zou, R.; Cucchiara, R.; Tomasi, C. Performance Measures and a Data Set for Multi-Target, Multi-Camera Tracking. In European Conference on Computer Vision (ECCV) Workshops; Springer: Cham, Switzerland, 2016; pp. 17–35. [Google Scholar] [CrossRef]
Li, W.; Zhao, R.; Xiao, T.; Wang, X. DeepReID: Deep Filter Pairing Neural Network for Person Re-Identification. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 152–159. [Google Scholar] [CrossRef]
Zheng, L.; Shen, L.; Tian, L.; Wang, S.; Wang, J.; Tian, Q. Scalable Person Re-Identification: A Benchmark. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 1116–1124. [Google Scholar]
Zheng, K.; Lan, C.; Zeng, W.; Liu, J.; Zhang, Z.; Zha, Z. Pose-Guided Feature Learning with Knowledge Distillation for Occluded Person Re-Identification. In Proceedings of the 29th ACM International Conference on Multimedia, Virtual Event, 20–24 October 2021; pp. 4537–4545. [Google Scholar] [CrossRef]
Zhuo, J.; Chen, Z.; Lai, J.; Wang, G. Occluded Person Re-Identification. In Proceedings of the IEEE International Conference on Multimedia and Expo (ICME), San Diego, CA, USA, 23–27 July 2018; pp. 1–6. [Google Scholar]
Chang, X.; Hospedales, T.M.; Xiang, T. Multi-Level Factorisation Net for Person Re-Identification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 2109–2118. [Google Scholar]
Fan, B.; Wang, L.; Zhang, R.; Guo, Z.; Zhao, Y.; Li, R.; Gong, W. Contextual Multi-Scale Feature Learning for Person Re-Identification. In Proceedings of the ACM International Conference on Multimedia (ACM MM), Seattle, WA, USA, 12–16 October 2020; pp. 655–663. [Google Scholar]
Quispe, R.; Pedrini, H. Top-DB-Net: Top Dropblock for Activation Enhancement in Person Re-Identification. In Proceedings of the International Conference on Pattern Recognition (ICPR), Milan, Italy, 10–15 January 2021; pp. 2980–2987. [Google Scholar]
Gautam, V.; Prasad, S.; Sinha, S. AaP-ReID: Improved Attention-Aware Person Re-Identification. In Proceedings of the International Conference on Image Information Processing (ICIIP), Solan, India, 22–24 November 2023; pp. 1–6. [Google Scholar]
Zheng, F.; Deng, C.; Sun, X.; Jiang, X.; Guo, X.; Yu, Z.; Huang, F.; Ji, R. Pyramidal Person Re-Identification via Multi-Loss Dynamic Training. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 8506–8514. [Google Scholar]
Zhang, Z.; Lan, C.; Zeng, W.; Chen, Z. Densely Semantically Aligned Person Re-Identification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 667–676. [Google Scholar]
Chen, X.; Fu, C.; Zhao, Y.; Zheng, F.; Song, J.; Ji, R.; Yang, Y. Salience-Guided Cascaded Suppression Network for Person Re-Identification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 3297–3307. [Google Scholar]
Tan, H.; Liu, X.; Yin, B.; Li, X. MHSA-Net: Multihead Self-Attention Network for Occluded Person Re-Identification. IEEE Trans. Neural Netw. Learn. Syst. 2023, 34, 8210–8224. [Google Scholar] [CrossRef] [PubMed]
Yang, J.; Zhang, J.; Yu, F.; Jiang, X.; Zhang, M.; Sun, X.; Chen, Y.; Zheng, W. Learning to Know Where to See: A Visibility-Aware Approach for Occluded Person Re-Identification. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 11–17 October 2021; pp. 11865–11874. [Google Scholar]
Zang, X.; Li, G.; Gao, W.; Shu, X. Learning to Disentangle Scenes for Person Re-Identification. Image Vis. Comput. 2021, 116, 104330. [Google Scholar] [CrossRef]
Kiran, M.; Praveen, R.G.; Nguyen-Meidine, L.T.; Belharbi, S.; Blais-Morin, L.A.; Granger, E. Holistic Guidance for Occluded Person Re-Identification. In Proceedings of the British Machine Vision Conference (BMVC), Virtual, 22–25 November 2021; pp. 115.1–115.13. [Google Scholar]
Huang, G.; Liu, Z.; Van der Maaten, L.; Weinberger, K.Q. Densely Connected Convolutional Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 2261–2269. [Google Scholar]

Figure 1. Visualization of attention maps across $K$ predefined body regions in Person Re-ID. Orange arrows indicate the correspondence between the predefined body regions and the generated attention responses, while green arrows denote the resulting body-part feature representations used for subsequent feature extraction.


Figure 2. Cross-part-specific feature modulation maps.

Figure 3. Illustration of the SPD attention-map generation and part-specific feature extraction process.
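As a rough illustration of how $K$ attention maps can turn one backbone feature map into $K$ part-specific descriptors, the following sketch uses attention-weighted average pooling. It is not the authors' SPD implementation: the function name `part_features`, the 256 × 24 × 8 feature-map size, and the random $K = 8$ attention maps are hypothetical, whereas the actual module derives its maps from human parsing information.

```python
import numpy as np

def part_features(feature_map: np.ndarray, attention_maps: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Attention-weighted average pooling of a C x H x W map into K part vectors.

    attention_maps: K x H x W array of non-negative weights, one map per body region.
    Returns a K x C array whose k-th row summarises region k.
    """
    c, h, w = feature_map.shape
    k = attention_maps.shape[0]
    if attention_maps.shape != (k, h, w):
        raise ValueError("attention maps must match the spatial size of the feature map")
    feats = feature_map.reshape(c, h * w)      # C x HW
    attn = attention_maps.reshape(k, h * w)    # K x HW
    pooled = attn @ feats.T                    # K x C: attention-weighted sums
    mass = attn.sum(axis=1, keepdims=True)     # K x 1: total attention per region
    return pooled / (mass + eps)               # eps guards against an empty (fully occluded) region

rng = np.random.default_rng(0)
F = rng.standard_normal((256, 24, 8))   # hypothetical backbone output (C=256, H=24, W=8)
A = rng.random((8, 24, 8))              # hypothetical K=8 attention maps in [0, 1)
parts = part_features(F, A)
print(parts.shape)  # (8, 256)
```

With a single all-ones attention map, the same function reduces to global average pooling, which is one way to see how local part descriptors and a global descriptor can come from the same feature map.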

Figure 4. End-to-End Person Re-ID framework illustrating SPD-based part discrimination, AFICF-based feature integration, PCC/PCA refinement, and hybrid-loss optimization. Different colours and arrow styles are used to visually distinguish the major processing stages, feature flows, and intermediate representations within the framework.
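The hybrid-loss optimization named in Figure 4 pairs an identity-classification term with a metric-learning term (Table 3 lists cross-entropy and triplet losses). The NumPy sketch below shows one common form of such a combination. Batch-hard triplet mining, the 0.3 margin, and the equal weighting of the two terms are assumptions for illustration, not values reported in this section, and the helper names are hypothetical.

```python
import numpy as np

def cross_entropy(logits: np.ndarray, labels: np.ndarray) -> float:
    # Numerically stable log-softmax, then mean negative log-likelihood.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())

def batch_hard_triplet(embeddings: np.ndarray, labels: np.ndarray, margin: float = 0.3) -> float:
    # Pairwise Euclidean distances between all embeddings in the batch.
    sq = (embeddings ** 2).sum(axis=1)
    dist = np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2.0 * embeddings @ embeddings.T, 0.0))
    same = labels[:, None] == labels[None, :]
    # Hardest positive: farthest sample sharing the anchor's identity.
    hardest_pos = np.where(same, dist, 0.0).max(axis=1)
    # Hardest negative: closest sample with a different identity.
    hardest_neg = np.where(same, np.inf, dist).min(axis=1)
    return float(np.maximum(hardest_pos - hardest_neg + margin, 0.0).mean())

def hybrid_loss(logits, embeddings, labels, triplet_weight: float = 1.0, margin: float = 0.3) -> float:
    # Hypothetical weighting: CE + triplet_weight * triplet.
    return cross_entropy(logits, labels) + triplet_weight * batch_hard_triplet(embeddings, labels, margin)

rng = np.random.default_rng(0)
labels = np.array([0, 0, 1, 1, 2, 2])        # 3 identities x 2 images each
logits = rng.standard_normal((6, 3))         # identity-classifier outputs
embeddings = rng.standard_normal((6, 128))   # e.g. a fused global/part descriptor
loss = hybrid_loss(logits, embeddings, labels)
print(round(loss, 4))
```

In a framework with several feature branches, a term of this kind would typically be evaluated per branch and summed; which branches receive which loss is exactly what the Table 3 configurations vary.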

Figure 5. Part-wise embedding analysis and performance degradation under occlusion conditions. (a) Visualization of attention-guided body-part embeddings and confidence responses across challenging identification samples. (b) Part-wise performance analysis on the Occluded-ReID dataset. (c) Part-wise performance analysis on the P-DukeMTMC-ReID dataset.

Figure 6. Occlusion-aware visualization of part-based feature embeddings: (a) feature localization under partial occlusion; and (b) severe occlusion examples highlighting degraded local body-region responses. Green borders indicate successful feature localization or matching, whereas red borders indicate weak or inaccurate representations.

Figure 7. Comparative analysis of feature-learning techniques.

Figure 8. Comparative analysis of computational performance across different GPU configurations.

Table 1. Comprehensive performance analysis on holistic datasets.

Methods	Market-1501 R-1 (%)	Market-1501 mAP (%)	DukeMTMC-ReID R-1 (%)	DukeMTMC-ReID mAP (%)	CUHK03-Labeled R-1 (%)	CUHK03-Labeled mAP (%)
HA-CNN [2] (PB + G)	91.2	75.7	80.5	63.8	41.7	38.6
MLFN [43] (G)	92.3	82.4	81.0	62.8	54.7	49.2
PCB + RPP [1] (PB)	93.8	81.6	83.3	69.2	63.7	57.5
OSNet [44] (PB)	94.8	84.9	88.6	73.5	72.3	67.8
Top-DB-Net [45] (PB + G)	94.9	85.8	87.5	73.5	79.4 ³	75.4
BPB Re-ID [8] (PB)	95.6 ³	86.1	87.1	77.5	-	-
AaP-Re-ID [46] (PB + G)	95.6 ³	93.9 ¹	90.6 ³	88.6 ¹	84.7 ²	82.4 ²
Pyramid [47] (PB + G)	95.7 ²	88.2	89.0	79.0 ³	78.9	76.9
DSA-Re-ID [48] (PB + G)	95.7 ²	87.6	86.2	74.3	78.9	75.2
SCSN [49] (PB + G)	95.7 ²	88.5 ³	91.0 ²	79.0 ³	84.7 ²	81.0 ³
DPBF (PB + G)	96.0 ¹	89.3 ²	92.0 ¹	82.8 ²	89.5 ¹	82.6 ¹

Note: ¹ indicates the best result, ² indicates the second-best result, and ³ indicates the third-best result. Equal values share the same ranking.
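The ranking rule in the note (higher is better, equal values share a rank, and only the top three ranks are marked) is a dense ranking. The sketch below illustrates that rule; it is not the authors' tooling, the function name `rank_marks` is hypothetical, and entries reported as "-" are passed as `None`.

```python
def rank_marks(values, top=3):
    """Dense ranking, higher is better; equal values share a rank.

    Entries reported as '-' in the table are passed as None and never ranked.
    Returns the rank (1..top) for each entry, or None if it falls outside the top ranks.
    """
    distinct = sorted({v for v in values if v is not None}, reverse=True)
    rank_of = {v: i + 1 for i, v in enumerate(distinct[:top])}
    return [None if v is None else rank_of.get(v) for v in values]

# Market-1501 R-1 column of Table 1, in row order (HA-CNN ... DPBF)
market_r1 = [91.2, 92.3, 93.8, 94.8, 94.9, 95.6, 95.6, 95.7, 95.7, 95.7, 96.0]
print(rank_marks(market_r1))
# [None, None, None, None, None, 3, 3, 2, 2, 2, 1]
```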

Table 2. Comprehensive performance analysis on occluded datasets.

Methods	Occluded-Duke R-1 (%)	Occluded-Duke mAP (%)	Occluded-ReID R-1 (%)	Occluded-ReID mAP (%)	P-DukeMTMC-ReID R-1 (%)	P-DukeMTMC-ReID mAP (%)
PVPM + AUG [7] (PB)	47.0	37.7	70.4	60.1	51.5	29.2
BOT [36] (G)	51.4	44.7	58.4	52.3	87.0	74.9
PGFA [6] (PB)	51.4	37.3	57.1	56.2	44.2	23.1
MHSA-Net [50] (G)	58.2	43.1	-	-	69.6	37.6
VGT_ReID [51] (PB)	62.2	46.1	81.0	71.0	-	-
PGFL-KD [41] (PB)	63.0	54.1	80.7	70.3	81.1	64.2
PAT [11] (PB)	64.5	53.6	81.6	72.1 ³	-	-
LDS [52] (G)	64.3	55.7 ³	-	-	91.9 ³	82.9 ³
HG [53] (G)	65.1 ³	54.7	82.8 ³	72.0	-	-
BPB Re-ID [8] (PB)	75.1 ¹	62.5 ¹	82.9 ²	75.2 ²	93.0 ²	83.2 ²
DPBF (PB + G)	74.7 ²	60.7 ²	93.5 ¹	91.2 ¹	93.6 ¹	83.6 ¹

Note: ¹ indicates the best result, ² indicates the second-best result, and ³ indicates the third-best result.
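The Occluded-ReID margins quoted in the Abstract (10.6% in Rank-1 and 16% in mAP) are absolute percentage-point differences between DPBF and the strongest competitor in Table 2, BPB Re-ID, on both metrics. A small check, with methods that report "-" for Occluded-ReID left out and a hypothetical helper name:

```python
# (R-1, mAP) on Occluded-ReID from Table 2; MHSA-Net and LDS report "-" and are omitted
occluded_reid = {
    "PVPM + AUG": (70.4, 60.1), "BOT": (58.4, 52.3), "PGFA": (57.1, 56.2),
    "VGT_ReID": (81.0, 71.0), "PGFL-KD": (80.7, 70.3), "PAT": (81.6, 72.1),
    "HG": (82.8, 72.0), "BPB Re-ID": (82.9, 75.2), "DPBF": (93.5, 91.2),
}

def margin_over_best_competitor(results, method, metric_index):
    """Percentage-point gap between `method` and the best other method on one metric."""
    ours = results[method][metric_index]
    best = max(v[metric_index] for name, v in results.items() if name != method)
    return round(ours - best, 1)

print(margin_over_best_competitor(occluded_reid, "DPBF", 0))  # 10.6 (R-1)
print(margin_over_best_competitor(occluded_reid, "DPBF", 1))  # 16.0 (mAP)
```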

Table 3. Performance comparison of feature integration configurations on the Occluded-ReID dataset.

Index	CE: $g_{f}$	CE: Bp	CE: PCC	CE: PCA	Triplet: $g_{f}$	Triplet: Bp	Triplet: PCC	Triplet: PCA	DPBF mAP (%)	DPBF R-1 (%)	DPBF R-5 (%)	PCB mAP (%)	PCB R-1 (%)	PCB R-5 (%)
HLF	✓	✓	✓	✓		✓	✓	✓	73.7	86.5	89.4	55.6	73.3	84.8
1	✓		✓	✓		✓	✓	✓	68.7	79.9	84.3	51.5	71.4	84.4
2			✓	✓	✓	✓	✓	✓	62.9	72.9	77.8	50.2	71.6	84.9
3	✓		✓	✓			✓	✓	57.5	66.3	71.5	47.6	68.9	83.4
4			✓	✓	✓	✓	✓	✓	55.2	62.6	69.1	43.5	65.1	80.5
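The tables in this section report Rank-k accuracy (R-1, R-5) and mAP. For reference, the sketch below shows the standard per-query computation; dataset-level R-k and mAP are the means over all queries. It assumes a precomputed query-to-gallery distance list and that invalid gallery entries (for example, same-camera images of the query identity under the Market-1501 protocol) have already been removed. The function name `cmc_and_ap` and the toy distances are hypothetical.

```python
def cmc_and_ap(distances, gallery_ids, query_id, ks=(1, 5)):
    """Rank-k hits and average precision (AP) for one query.

    distances[i] is the query-to-gallery distance (smaller = more similar).
    Assumes junk gallery entries were already removed and at least one
    gallery image shares query_id.
    """
    order = sorted(range(len(distances)), key=lambda i: distances[i])
    matches = [gallery_ids[i] == query_id for i in order]
    first_hit = matches.index(True)               # 0-based position of the first true match
    hits = {k: first_hit < k for k in ks}         # Rank-k: a true match appears in the top k
    precisions, num_rel = [], 0
    for rank, is_match in enumerate(matches, start=1):
        if is_match:
            num_rel += 1
            precisions.append(num_rel / rank)     # precision at each true-match position
    return hits, sum(precisions) / len(precisions)

hits, ap = cmc_and_ap([0.2, 0.9, 0.4, 0.1, 0.7], [5, 7, 7, 3, 7], query_id=7)
print(hits, round(ap, 4))  # {1: False, 5: True} 0.4778
```

Because AP rewards every correctly retrieved gallery image, not only the first one, mAP can move differently from R-1, which is why both are reported side by side.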

Table 4. Ablation study on DPBF model performance with varied part configurations.

Method	Supervised (Parsing Label) R-1	Supervised (Parsing Label) mAP	Unsupervised (PAM) R-1	Unsupervised (PAM) mAP	Semi-Supervised (Parsing + PAM) R-1	Semi-Supervised (Parsing + PAM) mAP
DPBF, 3 parts (3 labels)	86.5	80.1	81.3	74.2	92.5	85.3
DPBF, 5 parts (5 labels)	89.1	82.5	83.5	75.9	93.6	87.2
DPBF, 8 parts (8 labels)	92.6	85.3	86.7	77.8	96.0	89.3

Table 5. Comparative performance analysis of DPBF and PCB on part-based configurations.

Method	DPBF R-1 (%)	DPBF mAP (%)	PCB [1] R-1 (%)	PCB [1] mAP (%)
3 parts	87.4	80.2	76.4	58.3
5 parts	90.5	82.7	82.4	67.5
6 parts	91.0	83.8	82.6	68.8

Table 6. Processing time and memory usage comparison of the evaluated methods.

Method	Processing Time (s/Batch)	Memory Usage (MB)
PCB + RPP [1]	129.8272	20,250
BPB Re-ID [8]	122.7825	19,018
DPBF	126.2159	21,250
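To put Table 6 in relative terms, the sketch below computes the percentage change of DPBF against each baseline. It is simple arithmetic on the reported values; the helper name `relative_change` is hypothetical.

```python
# Processing time (s/batch) and memory (MB) from Table 6
timing = {
    "PCB + RPP": (129.8272, 20250),
    "BPB Re-ID": (122.7825, 19018),
    "DPBF": (126.2159, 21250),
}

def relative_change(ours, ref):
    """Percentage change of `ours` relative to `ref` (negative = lower than ref)."""
    return 100.0 * (ours - ref) / ref

dpbf_t, dpbf_m = timing["DPBF"]
for name in ("PCB + RPP", "BPB Re-ID"):
    t, m = timing[name]
    print(f"{name}: time {relative_change(dpbf_t, t):+.1f}%, memory {relative_change(dpbf_m, m):+.1f}%")
# PCB + RPP: time -2.8%, memory +4.9%
# BPB Re-ID: time +2.8%, memory +11.7%
```

On these figures, DPBF's processing time lies between the two baselines (about 2.8% either way), while its memory footprint is the highest of the three.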

Table 7. Reproducibility, computational complexity, and evaluation consistency analysis across independent DPBF training runs.

Dataset	Parameters	FLOPs	GPU	R-1 (%)	mAP (%)	Evaluation Batch Time (s)
Market-1501	39,847,494	8,005,197,312	TITAN Xp	96.08	89.32	0.444
			RTX4090	95.67	89.49	0.126
DukeMTMC-ReID	39,521,350	8,004,871,168	TITAN Xp	91.88	82.87	0.454
			RTX4090	92.06	82.87	0.130
P-DukeMTMC-ReID	41,329,865	8,027,322,368	TITAN Xp	92.09	82.13	0.410
			RTX4090	93.62	83.67	0.128
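Each GPU row in Table 7 corresponds to a separate training run, so the differences in R-1 and mAP between rows reflect run-to-run variation (possibly together with hardware-dependent numerics), while the ratio of evaluation batch times gives the inference speed-up. A small sketch of both quantities, using a hypothetical helper `compare_gpus`:

```python
# Table 7: (R-1 %, mAP %, evaluation batch time in s) per GPU
runs = {
    "Market-1501": {"TITAN Xp": (96.08, 89.32, 0.444), "RTX4090": (95.67, 89.49, 0.126)},
    "DukeMTMC-ReID": {"TITAN Xp": (91.88, 82.87, 0.454), "RTX4090": (92.06, 82.87, 0.130)},
    "P-DukeMTMC-ReID": {"TITAN Xp": (92.09, 82.13, 0.410), "RTX4090": (93.62, 83.67, 0.128)},
}

def compare_gpus(gpus):
    """Return (RTX4090 speed-up over TITAN Xp, |R-1 difference|, |mAP difference|)."""
    r1_a, map_a, t_a = gpus["TITAN Xp"]
    r1_b, map_b, t_b = gpus["RTX4090"]
    return t_a / t_b, abs(r1_a - r1_b), abs(map_a - map_b)

for dataset, gpus in runs.items():
    speedup, d_r1, d_map = compare_gpus(gpus)
    print(f"{dataset}: {speedup:.2f}x faster, |dR-1| = {d_r1:.2f}, |dmAP| = {d_map:.2f}")
# Market-1501: 3.52x faster, |dR-1| = 0.41, |dmAP| = 0.17
# DukeMTMC-ReID: 3.49x faster, |dR-1| = 0.18, |dmAP| = 0.00
# P-DukeMTMC-ReID: 3.20x faster, |dR-1| = 1.53, |dmAP| = 1.54
```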

Table 8. Quantitative embedding degradation analysis under severe occlusion conditions across occlusion-oriented Person Re-ID benchmarks.

Dataset	Strongest Embedding (R-1/mAP, %)	Weakest Embedding (R-1/mAP, %)	R-1 Gap	mAP Gap
Occluded-ReID	p0 (91.5/84.8)	p7 (2.5/6.9)	89.0	77.9
Occluded-Duke	p0 (69.5/56.8)	p7 (2.9/1.4)	66.6	55.4
P-DukeMTMC-ReID	p0 (92.6/81.8)	p7 (2.4/1.5)	90.2	80.3
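The gap columns in Table 8 are the differences between the strongest (p0) and weakest (p7) part embeddings, taken separately for R-1 and mAP. The values can be reproduced as follows (the helper name `gaps` is hypothetical):

```python
# (R-1, mAP) of the strongest (p0) and weakest (p7) part embeddings, Table 8
embeddings = {
    "Occluded-ReID": ((91.5, 84.8), (2.5, 6.9)),
    "Occluded-Duke": ((69.5, 56.8), (2.9, 1.4)),
    "P-DukeMTMC-ReID": ((92.6, 81.8), (2.4, 1.5)),
}

def gaps(strong, weak):
    """Per-metric difference between two (R-1, mAP) pairs, rounded to one decimal."""
    return tuple(round(s - w, 1) for s, w in zip(strong, weak))

for name, (strong, weak) in embeddings.items():
    print(name, gaps(strong, weak))
# Occluded-ReID (89.0, 77.9)
# Occluded-Duke (66.6, 55.4)
# P-DukeMTMC-ReID (90.2, 80.3)
```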

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Hussein, G.; Smith, J.S.; Al-Nuaimy, W. Strategic Feature Integration for Superior Person Re-ID: A Part-Based Approach. AI 2026, 7, 210. https://doi.org/10.3390/ai7060210

AMA Style

Hussein G, Smith JS, Al-Nuaimy W. Strategic Feature Integration for Superior Person Re-ID: A Part-Based Approach. AI. 2026; 7(6):210. https://doi.org/10.3390/ai7060210

Chicago/Turabian Style

Hussein, Ghaith, Jeremy S. Smith, and Waleed Al-Nuaimy. 2026. "Strategic Feature Integration for Superior Person Re-ID: A Part-Based Approach" AI 7, no. 6: 210. https://doi.org/10.3390/ai7060210

APA Style

Hussein, G., Smith, J. S., & Al-Nuaimy, W. (2026). Strategic Feature Integration for Superior Person Re-ID: A Part-Based Approach. AI, 7(6), 210. https://doi.org/10.3390/ai7060210

