Person Re-Identification Enhanced by Super-Resolution Technology
Abstract
1. Introduction
1.1. Background
1.2. Motivation and Contributions
2. Related Works
2.1. SR Enhancement
- (1) Traditional Interpolation-Based SR (a minimal bicubic example follows this list)
- (2) Deep Learning-Based SR
- (3) Semantic-Guided Adaptive SR Optimization
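To make the contrast between category (1) and the learned categories concrete, here is a minimal bicubic upsampling sketch in PyTorch; the tensor size and the ×4 scale factor are illustrative assumptions. Learned SR methods (categories (2) and (3)) replace this fixed, content-agnostic kernel with parameters fitted to LR-HR training pairs.

```python
# Traditional interpolation-based SR, category (1): a fixed bicubic kernel
# with O(W x H) cost and no learned semantics. Shapes here are illustrative.
import torch
import torch.nn.functional as F

lr = torch.rand(1, 3, 32, 32)           # dummy 32 x 32 LR pedestrian crop
hr_bicubic = F.interpolate(
    lr, scale_factor=4, mode="bicubic", align_corners=False
)                                       # 128 x 128, but no semantic detail recovery
print(hr_bicubic.shape)                 # torch.Size([1, 3, 128, 128])
```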
2.2. Person ReID
- (1) Traditional ReID relies on handcrafted features and statistical learning.
- (2) Deep learning-based ReID dominates current research and can be categorized by supervision type and modality.
2.3. Integration of SR and ReID
- (1) SR as Preprocessing: Most prior works treat SR as a standalone preprocessing step: LR images are first enhanced via SR and then fed into a ReID model. Wu et al. [29] developed SR-DSFF (2022), a two-stage approach for cross-resolution person ReID that first enhances LR images via SR and then fuses multi-scale features for ReID. SR-DSFF improved mAP by 5–8% on LR datasets but suffered from information loss in the feature transfer between the SR module and the ReID module. Wang et al. [30] proposed the cascaded SR-GAN (2018), which combined SRGAN-style generators with a ResNet-based ReID model. However, SRGAN's inherent mode collapse led to inconsistent detail generation, which limited overall ReID performance. A minimal sketch of this two-stage handoff follows this list.
- (2) End-to-End Integration: Few works have explored the joint end-to-end optimization of SR and ReID. One joint SR-ReID framework integrated a lightweight SR CNN, in the spirit of SRCNN by Dong et al. [18], with a ReID Transformer in an end-to-end manner. Nevertheless, this framework lacked semantic guidance for pedestrian-specific regions, which restricted its performance in LR person ReID scenarios.
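The following is a minimal sketch of the two-stage pattern criticized in (1). `SRNet` and `ReIDNet` are illustrative stand-ins, not the cited SR-DSFF or cascaded SR-GAN architectures; the point is the pixel-space handoff between two separately trained stages.

```python
# Two-stage pattern from (1): the SR model is trained (and here frozen)
# separately, so no ReID gradient ever reaches it, and the handoff happens
# in pixel space. SRNet / ReIDNet are illustrative stand-ins.
import torch
import torch.nn as nn

class SRNet(nn.Module):                       # stand-in SR model (x4 upscaling)
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3 * 16, 3, padding=1), nn.PixelShuffle(4))
    def forward(self, x):
        return self.body(x)

class ReIDNet(nn.Module):                     # stand-in ReID embedder
    def __init__(self, dim=256):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, dim))
    def forward(self, x):
        return self.backbone(x)

sr, reid = SRNet().eval(), ReIDNet().eval()
lr = torch.rand(1, 3, 32, 32)
with torch.no_grad():
    hr = sr(lr).clamp(0, 1)                   # SR output re-encoded as an image:
    emb = reid(hr)                            # identity cues the ReID model might
print(emb.shape)                              # have used are already discarded here
```

Because the SR stage is optimized only for reconstruction quality, nothing pushes it to preserve identity-discriminative details; the end-to-end designs in (2) and in Section 3.4 address exactly this gap.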
3. Methodology
3.1. Model Paradigm Definitions
3.2. SR Enhancement
3.3. Person ReID
3.4. Model Improvements
- (1) Feature Extraction and Enhancement: The LR input image first passes through a shallow convolutional layer to extract initial features $F_0$. These features are then fed into the integrated HAT module.
- (2) Attention-Driven Feature Refinement: Inside HAT, the hybrid attention mechanism (combining channel attention and window self-attention) processes $F_0$. Channel attention guides global enhancement, while window self-attention focuses on local detail recovery. This generates detail-rich features $F_{\mathrm{att}}$.
- (3) Feature Fusion: The initial features $F_0$ and the enhanced features $F_{\mathrm{att}}$ are fused via a residual connection to form the high-quality feature map $F_{\mathrm{fused}} = F_0 + F_{\mathrm{att}}$, as shown in Equation (1).
- (4) Semantic-Guided Attention: The fused feature map is then processed by SOLIDER-REID's semantic controller, which dynamically analyzes the feature map, amplifying the weights of features in pedestrian regions while suppressing background noise, thereby focusing the model on identity-discriminative semantics.
- (5) Backbone Processing: The refined feature map is subsequently passed to the deeper layers of the Swin Transformer for further feature extraction and ReID matching. A hedged sketch of this five-step pipeline follows this list.
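The sketch below walks through steps (1)–(5) with heavily simplified stand-ins: `ChannelAttention`/`HybridAttentionBlock` approximate HAT's hybrid attention, and `SemanticGate` approximates SOLIDER-REID's semantic controller. All module names, channel counts, and window sizes are our assumptions, not the authors' implementation.

```python
# Minimal sketch of steps (1)-(5); simplified stand-ins, not the authors' code.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):            # step (2): global channel weighting
    def __init__(self, c, r=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(c, c // r, 1), nn.ReLU(),
            nn.Conv2d(c // r, c, 1), nn.Sigmoid())
    def forward(self, x):
        return x * self.fc(x)

class HybridAttentionBlock(nn.Module):        # step (2): channel + window self-attention
    def __init__(self, c=64, heads=4, win=8):
        super().__init__()
        self.ca = ChannelAttention(c)
        self.win = win
        self.attn = nn.MultiheadAttention(c, heads, batch_first=True)
    def forward(self, x):
        x = self.ca(x)
        b, c, h, w = x.shape
        # partition into non-overlapping windows and self-attend inside each
        t = (x.reshape(b, c, h // self.win, self.win, w // self.win, self.win)
              .permute(0, 2, 4, 3, 5, 1)
              .reshape(-1, self.win * self.win, c))
        t, _ = self.attn(t, t, t)
        t = (t.reshape(b, h // self.win, w // self.win, self.win, self.win, c)
              .permute(0, 5, 1, 3, 2, 4).reshape(b, c, h, w))
        return t

class SemanticGate(nn.Module):                # step (4): amplify pedestrian regions
    def __init__(self, c=64):
        super().__init__()
        self.mask = nn.Sequential(nn.Conv2d(c, 1, 1), nn.Sigmoid())
    def forward(self, x):
        return x * self.mask(x)

shallow = nn.Conv2d(3, 64, 3, padding=1)
hat, gate = HybridAttentionBlock(), SemanticGate()

lr = torch.rand(1, 3, 32, 32)
f0 = shallow(lr)                              # step (1): initial features F_0
f_att = hat(f0)                               # step (2): detail-rich features F_att
f_fused = f0 + f_att                          # step (3): Equation (1), residual fusion
f_out = gate(f_fused)                         # step (4): semantic-guided weighting
print(f_out.shape)                            # step (5) would feed this to the Swin backbone
```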
- (1) Optimization Objective: Task-Driven vs. Reconstruction-Driven
- (2) Information Flow: Feature Enhancement vs. Pixel Reconstruction
- (3) Gradient Flow: Synergistic vs. Interrupted (a notational sketch of this contrast follows this list)
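In notation, the contrast in (1)–(3) can be sketched as follows; the weighted-sum form and the trade-off weight $\lambda$ are notational assumptions on our part, not the paper's exact combined loss.

```latex
% Two-stage: SR and ReID are optimized independently (gradient flow interrupted).
\[
\theta_{\mathrm{SR}}^{*} = \arg\min_{\theta_{\mathrm{SR}}} \mathcal{L}_{\mathrm{rec}},
\qquad
\theta_{\mathrm{ReID}}^{*} = \arg\min_{\theta_{\mathrm{ReID}}} \mathcal{L}_{\mathrm{ReID}}
\]
% End-to-end: one joint objective, so ReID gradients reach the SR module
% (assumed weighted-sum form with trade-off weight \lambda).
\[
(\theta_{\mathrm{SR}}^{*},\, \theta_{\mathrm{ReID}}^{*})
  = \arg\min_{\theta_{\mathrm{SR}},\, \theta_{\mathrm{ReID}}}
    \left( \mathcal{L}_{\mathrm{ReID}} + \lambda\, \mathcal{L}_{\mathrm{rec}} \right)
\]
```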
4. Experiments and Results
4.1. Datasets
4.2. Parameter Settings
4.3. Evaluation Indicators
4.3.1. Image SR Reconstruction
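For reference, the two reconstruction metrics reported in Section 4.4.1 have their standard definitions: PSNR in dB over $N$ pixels with peak value $\mathrm{MAX}$, and SSIM with the usual stabilizing constants $c_1, c_2$.

```latex
\[
\mathrm{PSNR} = 10 \log_{10} \frac{\mathrm{MAX}^{2}}{\mathrm{MSE}},
\qquad
\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} (x_i - y_i)^2
\]
\[
\mathrm{SSIM}(x, y) =
  \frac{(2\mu_x \mu_y + c_1)(2\sigma_{xy} + c_2)}
       {(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)}
\]
```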
4.3.2. Person ReID
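The retrieval metrics reported in Section 4.4.2 likewise follow their standard definitions, with $Q$ the number of queries and $\mathrm{AP}(q)$ the average precision of query $q$:

```latex
% Rank-k: fraction of queries with a correct gallery match in the top-k results.
\[
\text{Rank-}k = \frac{1}{Q} \sum_{q=1}^{Q}
  \mathbb{1}\!\left[ \text{query } q \text{ has a correct match in its top-}k \right]
\]
% mAP: mean of per-query average precision.
\[
\mathrm{mAP} = \frac{1}{Q} \sum_{q=1}^{Q} \mathrm{AP}(q)
\]
```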
4.4. Experimental Results and Analysis
4.4.1. SR Enhancement
4.4.2. Person ReID
4.5. Discussion and Analysis
- (1) Improvements with SR: Models relying on multi-scale features (e.g., SOLIDER, light-REID) show slight mAP improvements with SR (HAT, PiSA-SR), as the restored details strengthen fine-grained features, although Rank-1 dips marginally.
- (2) Accuracy Degradation with SR on Global Relational Models: Models built on global relational modeling (e.g., RGA) may perform worse than expected, as SR artifacts can disrupt the global relationships their attention depends on.
- (3) Advantage of the End-to-End Framework: HAT-SOLIDER excels even at extreme LR (32 × 32), gaining 19.2% Rank-1 and 19.5% mAP over the SOLIDER baseline. This superiority stems from its task-driven, end-to-end design, which enables synergistic collaboration between the SR module and the semantic controller. Unlike the two-stage HAT + SOLIDER model, which suffers from information loss during the separate image reconstruction and re-encoding steps, the integrated model performs feature-level enhancement directly within the ReID backbone. The semantic controller thus provides immediate feedback, guiding the HAT module via backpropagation to recover identity-discriminative details in pedestrian regions while suppressing irrelevant background noise. Joint optimization via the combined loss function ensures that super-resolution is explicitly tailored to the re-identification task, yielding significantly more robust feature representations for low-resolution inputs.
- (4) Why End-to-End Excels at Extreme LR: At 32 × 32 resolution, most high-frequency details are permanently lost. Our HAT-SOLIDER framework therefore does not “recover” lost pixels but learns to augment discriminative features in the latent space: the semantic controller guides the HAT module to emphasize pedestrian regions, while the joint loss ensures that the enhanced features align with ReID objectives. A minimal joint-update sketch follows this list.
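As a final illustration of points (3) and (4), the sketch below performs one joint update in which the ReID loss backpropagates into the enhancement stage. The modules, identity count, loss weight, and learning rate are all illustrative assumptions, and the stand-in enhancer keeps the input resolution for brevity where a real SR module would upscale.

```python
# One joint training step: a single optimizer spans both modules, so the
# ReID loss backpropagates into the enhancement stage. Everything here is
# an illustrative stand-in, not the paper's configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_IDS = 100                                     # illustrative identity count

enhancer = nn.Conv2d(3, 3, 3, padding=1)          # stand-in SR/enhancement stage
backbone = nn.Sequential(                         # stand-in ReID backbone + classifier
    nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, NUM_IDS))
opt = torch.optim.SGD(
    list(enhancer.parameters()) + list(backbone.parameters()), lr=0.01)

lr_img = torch.rand(4, 3, 32, 32)                 # dummy LR batch
target = torch.rand(4, 3, 32, 32)                 # dummy reconstruction target
labels = torch.randint(0, NUM_IDS, (4,))          # dummy identity labels

enhanced = enhancer(lr_img)                       # enhancement kept at input size for brevity
loss = F.cross_entropy(backbone(enhanced), labels) \
       + 0.1 * F.mse_loss(enhanced, target)       # assumed joint loss, lambda = 0.1
opt.zero_grad()
loss.backward()                                   # ReID gradient reaches the enhancer
opt.step()                                        # one step updates BOTH modules
print(float(loss))
```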
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| ReID | Person Re-Identification |
| SR | Super-Resolution |
| LR | Low-Resolution |
| HR | High-Resolution |
| HAT | Hybrid Attention Transformer |
| PiSA-SR | Pixel-Level and Semantic-Level Adjustable Super-Resolution |
| Omni-SR | Omni Aggregation Networks for Lightweight Image Super-Resolution |
| SOLIDER-REID | Semantic Controllable Self-Supervised Learning Framework for ReID |
| RGA | Relation-Aware Global Attention |
| PSNR | Peak Signal-to-Noise Ratio |
| SSIM | Structural Similarity Index |
| mAP | Mean Average Precision |
| CNN | Convolutional Neural Network |
| GAN | Generative Adversarial Network |
| LoRA | Low-Rank Adaptation |
| LPIPS | Learned Perceptual Image Patch Similarity |
| CSD | Classifier Score Distillation |
| SGD | Stochastic Gradient Descent |
| SOTA | State-of-the-Art |
References
- Yu, Z.; Cai, Y.; Xu, H.; Chen, L.; Yang, M.; Sun, H.; Zhao, X. An Attention-Enhanced Network for Person Re-Identification via Appearance–Gait Fusion. Electronics 2025, 14, 4142.
- Chen, Y.C.; Zhu, X.T.; Zheng, W.S.; Lai, J.-H. Person Re-Identification by Camera Correlation Aware Feature Augmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 40, 392–408.
- Liu, H.Z.; Liu, J.; Zhang, Y.H.; Chen, M.; Liu, J.; He, H.H. Progressive Feature Refining with the Deformable-Guidance Describer for Dense Video Captioning. Expert Syst. Appl. 2025, 294, 128778.
- Wang, Q.; Feng, G.; Li, Z. A Lightweight Person Detector for Surveillance Footage Based on YOLOv8n. Sensors 2025, 25, 436.
- Tang, Y.; Yang, X.; Jiang, X.; Wang, N.; Gao, X. Dually Distribution Pulling Network for Cross-Resolution Person Reidentification. IEEE Trans. Cybern. 2022, 52, 12016–12027.
- Yan, L.C.; Wang, F.; Leng, L.; Teoh, A.B.J. Toward Comprehensive and Effective Palmprint Reconstruction Attack. Pattern Recognit. 2024, 155, 110655.
- Zheng, L.; Shen, L.Y.; Tian, L.; Wang, S.; Wang, J.; Tian, Q. Scalable Person Re-Identification: A Benchmark. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; IEEE Press: Piscataway, NJ, USA, 2015; pp. 1116–1124.
- Wei, W.Y.; Yang, W.Z.; Zuo, E.G.; Qian, Y.; Wang, L. Person Re-Identification Based on Deep Learning—An Overview. J. Vis. Commun. Image Represent. 2022, 82, 103418.
- Wang, Z.H.; Chen, J.; Hoi, S.C.H. Deep Learning for Image Super-Resolution: A Survey. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 43, 3365–3387.
- Chen, X.Y.; Wang, X.T.; Zhou, J.T.; Qiao, Y.; Dong, C. Activating More Pixels in Image Super-Resolution Transformer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; IEEE Press: Piscataway, NJ, USA, 2023; pp. 22367–22377.
- Sun, L.C.; Wu, R.Y.; Ma, Z.Y.; Liu, S.; Yi, Q.; Zhang, L. Pixel-Level and Semantic-Level Adjustable Super-Resolution: A Dual-LoRA Approach. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 11–15 June 2025; IEEE Press: Piscataway, NJ, USA, 2025; p. 33357.
- Wang, H.; Chen, X.H.; Ni, B.B.; Liu, Y.; Liu, J. Omni Aggregation Networks for Lightweight Image Super-Resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; IEEE Press: Piscataway, NJ, USA, 2023; pp. 22378–22387.
- Chen, W.H.; Xu, X.Z.; Jia, J.; Luo, H.; Wang, Y.; Wang, F.; Jin, R.; Sun, X. Beyond Appearance: A Semantic Controllable Self-Supervised Learning Framework for Human-Centric Visual Tasks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; IEEE Press: Piscataway, NJ, USA, 2023; pp. 15050–15061.
- Wang, G.A.; Huang, X.W.; Gong, S.G.; Zhang, J.; Gao, W. Faster Person Re-Identification: One-Shot-Filter and Coarse-to-Fine Search. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 46, 3013–3030.
- Zhang, Z.Z.; Lan, C.L.; Zeng, W.J.; Jin, X.; Chen, Z. Relation-Aware Global Attention for Person Re-Identification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; IEEE Press: Piscataway, NJ, USA, 2020; pp. 3183–3192.
- Xing, E.P.; Ng, A.Y.; Jordan, M.I.; Russell, S. Distance Metric Learning with Application to Clustering with Side-Information. In Proceedings of the International Conference on Neural Information Processing Systems (NeurIPS), Vancouver, BC, Canada, 9–14 December 2002; MIT Press: Cambridge, MA, USA, 2002; pp. 521–528.
- Lin, W.J.; Chu, J.; Leng, L.; Miao, J.; Wang, L.F. Feature Disentanglement in One-Stage Object Detection. Pattern Recognit. 2024, 145, 109878.
- Dong, C.; Loy, C.C.; He, K.; Tang, X.O. Learning a Deep Convolutional Network for Image Super-Resolution. In Proceedings of the European Conference on Computer Vision (ECCV), Zurich, Switzerland, 6–12 September 2014; Springer: Cham, Switzerland, 2014; pp. 184–199.
- Kim, J.W.; Lee, J.K.; Lee, K.M. Accurate Image Super-Resolution Using Very Deep Convolutional Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; IEEE Press: Piscataway, NJ, USA, 2016; pp. 1646–1654.
- Ledig, C.; Theis, L.; Huszár, F.; Caballero, J.; Cunningham, A.; Acosta, A.; Aitken, A.P.; Tejani, A.; Totz, J.; Wang, Z.; et al. Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; IEEE Press: Piscataway, NJ, USA, 2017; pp. 105–114.
- Lim, B.; Son, S.; Kim, H.; Nah, S.; Lee, K.M. Enhanced Deep Residual Networks for Single Image Super-Resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Honolulu, HI, USA, 21–26 July 2017; IEEE Press: Piscataway, NJ, USA, 2017; pp. 136–144.
- Sun, L.; Dong, J.X.; Tang, J.H.; Pan, J. Spatially-Adaptive Feature Modulation for Efficient Image Super-Resolution. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 2–8 October 2023; IEEE Press: Piscataway, NJ, USA, 2023; pp. 13144–13153.
- Li, B.C.; Li, X.; Zhu, H.X.; Jin, Y.; Feng, R.; Zhang, Z.; Chen, Z. SeD: Semantic-Aware Discriminator for Image Super-Resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 17–24 June 2024; IEEE Press: Piscataway, NJ, USA, 2024; pp. 25784–25795.
- Zhang, X.L.; Liu, J.; Chen, C.C.; Gong, P.Z.; Wu, Z.D.; Guo, L. Modeling Temporal Continuity of Spatial Interactions for Vessel Trajectories Prediction in Maritime Transportation Systems. Eng. Appl. Artif. Intell. 2025, 158, 111378.
- Sun, Y.; Zheng, L.; Li, Y.; Yang, Y.; Tian, Q.; Wang, S. Learning Part-Based Convolutional Features for Person Re-Identification. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 43, 902–917.
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; IEEE Press: Piscataway, NJ, USA, 2021; pp. 9992–10002.
- Zheng, K.C.; Liu, W.; He, L.X.; Mei, T.; Luo, J.; Zha, Z.-J. Group-Aware Label Transfer for Domain Adaptive Person Re-Identification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; IEEE Press: Piscataway, NJ, USA, 2021; pp. 5306–5315.
- Ren, K.J.; Zhang, L. Implicit Discriminative Knowledge Learning for Visible-Infrared Person Re-Identification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 17–24 June 2024; IEEE Press: Piscataway, NJ, USA, 2024; pp. 393–402.
- Wu, Z.; Yu, X.; Zhu, D.; Pang, Q.; Shen, S.; Ma, T.; Zheng, J. SR-DSFF and FENet-ReID: A Two-Stage Approach for Cross Resolution Person Re-Identification. Comput. Intell. Neurosci. 2022, 2022, 4398727.
- Wang, Z.; Ye, M.; Yang, F.; Bai, X.; Satoh, S.I. Cascaded SR-GAN for Scale-Adaptive Low Resolution Person Re-Identification. In Proceedings of the 27th International Joint Conference on Artificial Intelligence (IJCAI), Stockholm, Sweden, 13–19 July 2018; AAAI Press: Palo Alto, CA, USA, 2018; pp. 3891–3897.
Summary of SR enhancement approaches (Section 2.1):

| Category | Method Examples | Key Characteristics |
|---|---|---|
| Traditional Interpolation | Nearest Neighbor, Bilinear, Bicubic | Fast computation (O(W × H)) but causes blockiness or blurring; lacks semantic learning. |
| Deep Learning-based SR | SRCNN, VDSR, SRGAN, EDSR | Learns LR-to-HR mappings; improves PSNR/MOS but computationally costly and data-dependent. |
| Semantic-Guided Adaptive SR | SAFM, SeD, PiSA-SR | Uses semantic cues (e.g., pedestrian regions) to guide detail recovery; improves ReID relevance. |

Taxonomy of person ReID methods (Section 2.2):

| Category | Supervision or Modality | Method Examples | Key Characteristics |
|---|---|---|---|
| Traditional | - | Color (RGB/HSV), Shape (HOG), Texture (Gabor) | Handcrafted features; computationally cheap but limited robustness in complex scenes. |
| Deep Learning | Supervised | PCB, Swin-ReID | Uses identity labels; employs part-based or transformer architectures for robustness. |
| Deep Learning | Unsupervised | GLT, SOLIDER-REID | Uses clustering for pseudo-labels; reduces annotation cost; adapts to LR via semantics. |
| Deep Learning | Single-Modal | - | Uses only visible light images. |
| Deep Learning | Cross-Modal | IDKL | Bridges modalities (e.g., visible infrared) for all-condition ReID. |

Comparison of the two-stage (A + B) and end-to-end (A-B) integration paradigms (Section 3.4):

| Characteristic | Two-Stage Model (A + B) | End-to-End Model (A-B) |
|---|---|---|
| Structural Relation | Sequential; modules independent | Parallel; module embedded |
| Gradient Flow | Interrupted; no joint backpropagation | Continuous; joint backpropagation |
| Optimization Target | Independent SR quality (PSNR/SSIM) and ReID accuracy | Unified joint loss; SR serves ReID |
| Information Flow | Image space (pixel reconstruction) | Feature space (feature enhancement) |
| Advantage | Simple implementation; modular | Avoids information loss; task-driven; superior performance |

SR reconstruction quality of the three SR models (Section 4.4.1):

| Method | PSNR (dB) | SSIM |
|---|---|---|
| HAT | 34.36 | 0.940 |
| PiSA-SR | 34.95 | 0.945 |
| Omni-SR | 34.12 | 0.900 |

ReID accuracy with SOLIDER as the base model (Section 4.4.2):

| Method | mAP | Rank-1 | Rank-5 | Rank-10 |
|---|---|---|---|---|
| SOLIDER | 91.6% | 96.5% | 98.8% | 99.2% |
| HAT + SOLIDER | 91.9% | 96.1% | 98.8% | 99.4% |
| PiSA-SR + SOLIDER | 92.0% | 96.3% | 98.8% | 99.4% |
| Omni-SR + SOLIDER | 91.8% | 96.3% | 98.8% | 99.4% |

ReID accuracy with light-REID as the base model (Section 4.4.2):

| Method | mAP | Rank-1 | Rank-5 | Rank-10 |
|---|---|---|---|---|
| light-REID | 89.0% | 94.0% | 97.5% | 98.5% |
| HAT + light-REID | 89.2% | 93.7% | 97.5% | 98.7% |
| PiSA-SR + light-REID | 89.3% | 93.8% | 97.6% | 98.8% |
| Omni-SR + light-REID | 89.1% | 93.9% | 97.5% | 98.6% |

ReID accuracy with RGA as the base model (Section 4.4.2):

| Method | mAP | Rank-1 | Rank-5 | Rank-10 |
|---|---|---|---|---|
| RGA | 85.0% | 88.0% | 95.0% | 97.0% |
| HAT + RGA | 85.2% | 87.3% | 94.7% | 97.1% |
| PiSA-SR + RGA | 85.3% | 87.5% | 95.3% | 97.2% |
| Omni-SR + RGA | 84.8% | 87.2% | 94.6% | 97.1% |

ReID accuracy on extreme LR (32 × 32) inputs (Section 4.5):

| Method | mAP | Rank-1 | Rank-5 | Rank-10 |
|---|---|---|---|---|
| SOLIDER | 40.3% | 61.2% | 82.4% | 88.4% |
| HAT-SOLIDER | 59.8% | 80.4% | 92.7% | 95.8% |
| HAT + SOLIDER | 58.6% | 78.6% | 91.6% | 94.3% |