A Step-Wise Domain Adaptation Detection Transformer for Object Detection under Poor Visibility Conditions
Abstract
1. Introduction
- (1) The SDA-DETR, a step-wise domain adaptation pipeline, is proposed. The step-wise design gradually reduces the discrepancy between the source and target domains at several levels, providing a simple and effective solution to the cross-domain object detection (CDOD) problem (a minimal sketch of the feature-alignment step follows this list).
- (2) Our method eases the burden of manually annotating low-light and adverse-weather images, reducing reliance on label-rich training data and thereby improving development efficiency in the self-driving field.
- (3) Empirically, we demonstrate the effectiveness of the SDA-DETR on three challenging public datasets; it outperforms several popular methods, confirming its applicability under various illumination conditions as well as in adverse weather.
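To make the step-wise idea concrete, the sketch below illustrates the middle step, domain-invariant feature learning, as it is commonly realized for DETR-style detectors: a gradient reversal layer [68] feeding a small per-token domain discriminator, so that features which fool the discriminator become domain-invariant. This is a minimal sketch under stated assumptions; the names (`GradReverse`, `DomainDiscriminator`, `adversarial_alignment_loss`) and the binary cross-entropy objective are illustrative, not the paper's exact implementation.

```python
# Minimal sketch (assumption, not the paper's code): adversarial
# domain-invariant feature learning for DETR-style token features.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GradReverse(torch.autograd.Function):
    """Gradient reversal layer (Ganin & Lempitsky): identity on the forward
    pass; the backward pass negates (and scales) the gradient, so the
    detector learns features that fool the domain discriminator."""

    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_out):
        return -ctx.lam * grad_out, None


class DomainDiscriminator(nn.Module):
    """Small per-token domain classifier attached behind gradient reversal."""

    def __init__(self, dim=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, tokens, lam=1.0):
        # tokens: (B, N, dim) features from one encoder or decoder layer
        return self.net(GradReverse.apply(tokens, lam)).squeeze(-1)


def adversarial_alignment_loss(disc, src_tokens, tgt_tokens):
    """The discriminator separates source (label 0) from target (label 1);
    the reversed gradient drives both token sets toward a shared space."""
    d_src, d_tgt = disc(src_tokens), disc(tgt_tokens)
    return (F.binary_cross_entropy_with_logits(d_src, torch.zeros_like(d_src))
            + F.binary_cross_entropy_with_logits(d_tgt, torch.ones_like(d_tgt)))


if __name__ == "__main__":
    disc = DomainDiscriminator(dim=256)
    src = torch.randn(2, 300, 256)  # e.g. 300 decoder query tokens
    tgt = torch.randn(2, 300, 256)
    adversarial_alignment_loss(disc, src, tgt).backward()
```

In the step-wise pipeline, such a loss would act on encoder/decoder tokens after source images have first been translated toward the target style (the image-level step), with the token-masked autoencoder applied as the final step.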
2. Related Work
2.1. Object Detection
2.2. Cross-Domain Object Detection
2.3. Masked Autoencoder
3. Method
3.1. Preliminary
3.2. Step-Wise Domain Adaptation DETR
3.2.1. Image-Level Adaptation
3.2.2. Domain-Invariant Feature Learning
3.2.3. Token-Masked Autoencoder
3.2.4. Full Objective
4. Experiments and Results
4.1. Datasets
4.2. Implementation Details
4.3. Results
4.3.1. Real-World Daytime to Nighttime Adaptation
4.3.2. Simulated-Environment Daytime to Nighttime Adaptation
4.3.3. Normal to Foggy Weather Adaptation
4.4. Ablation Study and Analysis
5. Discussion
- Figure 14 shows that the highest mAP is reached at a mask ratio of 0.8 (a simplified sketch of this masking step follows this list).
- Table 6 demonstrates that the highest mAP (47.3) is achieved when both coefficients are set to 1.0.
- It can be observed from Figure 15 that adaptively aligning the first and sixth layers of the encoder and decoder yields the highest mAP; aligning more layers may lead to over-alignment.
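As a companion to the mask-ratio observation above, here is a minimal sketch of a token-masking and reconstruction step at ratio 0.8. `TokenMAEHead` and `token_mae_loss` are illustrative names, and the design (a single transformer layer, MSE over the full sequence) is a simplification rather than the paper's token-masked autoencoder.

```python
# Minimal sketch (assumption, not the paper's code): mask 80% of the tokens
# at random and reconstruct the full sequence.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TokenMAEHead(nn.Module):
    """Illustrative reconstruction head: a learned mask token fills the masked
    positions, one transformer layer mixes the sequence, and a linear layer
    predicts the original token values."""

    def __init__(self, dim=256, nhead=8):
        super().__init__()
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.block = nn.TransformerEncoderLayer(dim, nhead=nhead, batch_first=True)
        self.proj = nn.Linear(dim, dim)

    def forward(self, visible, keep_idx, seq_len):
        B, _, D = visible.shape
        # Start from mask tokens everywhere, then paste the visible tokens back.
        full = self.mask_token.expand(B, seq_len, D).clone()
        full.scatter_(1, keep_idx.unsqueeze(-1).expand(-1, -1, D), visible)
        return self.proj(self.block(full))


def token_mae_loss(tokens, head, mask_ratio=0.8):
    """Keep a random (1 - mask_ratio) subset of tokens and reconstruct the
    whole sequence; 0.8 mirrors the best ratio reported in Figure 14."""
    B, N, D = tokens.shape
    n_keep = max(1, int(N * (1 - mask_ratio)))
    keep = torch.rand(B, N, device=tokens.device).argsort(dim=1)[:, :n_keep]
    visible = torch.gather(tokens, 1, keep.unsqueeze(-1).expand(-1, -1, D))
    return F.mse_loss(head(visible, keep, N), tokens)


if __name__ == "__main__":
    head = TokenMAEHead(dim=256)
    tokens = torch.randn(2, 100, 256)  # stand-in for encoder tokens
    token_mae_loss(tokens, head, mask_ratio=0.8).backward()
```

A high ratio such as 0.8 forces reconstruction from sparse context, consistent with the high optimal masking ratios reported for image MAEs [79].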
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Hu, X.; Xu, X.; Xiao, Y.; Chen, H.; He, S.; Qin, J.; Heng, P.A. SINet: A scale-insensitive convolutional neural network for fast vehicle detection. IEEE Trans. Intell. Transp. Syst. 2018, 20, 1010–1019.
- Everingham, M.; Van Gool, L.; Williams, C.K.; Winn, J.; Zisserman, A. The PASCAL Visual Object Classes (VOC) challenge. Int. J. Comput. Vis. 2010, 88, 303–338.
- Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common objects in context. In Proceedings of the Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, 6–12 September 2014; Proceedings, Part V 13. Springer: Berlin/Heidelberg, Germany, 2014; pp. 740–755.
- Cui, X.; Ma, L.; Ma, T.; Liu, J.; Fan, X.; Liu, R. Trash to treasure: Low-light object detection via decomposition-and-aggregation. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 20–27 February 2024; Volume 38, pp. 1417–1425.
- Hui, Y.; Wang, J.; Li, B. WSA-YOLO: Weak-supervised and adaptive object detection in the low-light environment for YOLOv7. IEEE Trans. Instrum. Meas. 2024, 73, 2507012.
- Neumann, L.; Karg, M.; Zhang, S.; Scharfenberger, C.; Piegert, E.; Mistr, S.; Prokofyeva, O.; Thiel, R.; Vedaldi, A.; Zisserman, A.; et al. NightOwls: A pedestrians at night dataset. In Proceedings of the Computer Vision–ACCV 2018: 14th Asian Conference on Computer Vision, Perth, Australia, 2–6 December 2018; Revised Selected Papers, Part I 14. Springer: Berlin/Heidelberg, Germany, 2019; pp. 691–705.
- Yang, W.; Yuan, Y.; Ren, W.; Liu, J.; Scheirer, W.J.; Wang, Z.; Zhang, T.; Zhong, Q.; Xie, D.; Pu, S.; et al. Advancing image understanding in poor visibility environments: A collective benchmark study. IEEE Trans. Image Process. 2020, 29, 5737–5752.
- Makihara, Y.; Takizawa, M.; Shirai, Y.; Shimada, N. Object recognition under various lighting conditions. In Proceedings of the Image Analysis: 13th Scandinavian Conference, SCIA 2003, Halmstad, Sweden, 29 June–2 July 2003; Proceedings 13. Springer: Berlin/Heidelberg, Germany, 2003; pp. 899–906.
- Kvyetnyy, R.; Maslii, R.; Harmash, V.; Bogach, I.; Kotyra, A.; Grądz, Ż.; Zhanpeisova, A.; Askarova, N. Object detection in images with low light condition. In Proceedings of the Photonics Applications in Astronomy, Communications, Industry, and High Energy Physics Experiments 2017, Wilga, Poland, 28 May–6 June 2017; SPIE: Bellingham, WA, USA, 2017; Volume 10445, pp. 250–259.
- Yin, W.; Yu, S.; Lin, Y.; Liu, J.; Sonke, J.J.; Gavves, E. Domain Adaptation with Cauchy-Schwarz Divergence. arXiv 2024, arXiv:2405.19978.
- Chen, Y.; Li, W.; Sakaridis, C.; Dai, D.; Van Gool, L. Domain adaptive Faster R-CNN for object detection in the wild. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 3339–3348.
- Li, X.; Li, Y.; Du, Z.; Li, F.; Lu, K.; Li, J. Split to Merge: Unifying Separated Modalities for Unsupervised Domain Adaptation. arXiv 2024, arXiv:2403.06946.
- Wang, C.; Pan, J.; Wang, W.; Fu, G.; Liang, S.; Wang, M.; Wu, X.M.; Liu, J. Correlation Matching Transformation Transformers for UHD Image Restoration. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 20–27 February 2024; Volume 38, pp. 5336–5344.
- Lu, X.; Yuan, Y.; Liu, X.; Wang, L.; Zhou, X.; Yang, Y. Low-Light Salient Object Detection by Learning to Highlight the Foreground Objects. IEEE Trans. Circuits Syst. Video Technol. 2024.
- Han, J.; Liang, X.; Xu, H.; Chen, K.; Hong, L.; Mao, J.; Ye, C.; Zhang, W.; Li, Z.; Liang, X.; et al. SODA10M: A large-scale 2D self/semi-supervised object detection dataset for autonomous driving. arXiv 2021, arXiv:2106.11118.
- Sun, T.; Segu, M.; Postels, J.; Wang, Y.; Van Gool, L.; Schiele, B.; Tombari, F.; Yu, F. SHIFT: A synthetic driving dataset for continuous multi-task domain adaptation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 21371–21382.
- Cordts, M.; Omran, M.; Ramos, S.; Rehfeld, T.; Enzweiler, M.; Benenson, R.; Franke, U.; Roth, S.; Schiele, B. The Cityscapes dataset for semantic urban scene understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 3213–3223.
- Sakaridis, C.; Dai, D.; Van Gool, L. Semantic foggy scene understanding with synthetic data. Int. J. Comput. Vis. 2018, 126, 973–992.
- Redmon, J.; Farhadi, A. YOLOv3: An incremental improvement. arXiv 2018, arXiv:1804.02767.
- Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. YOLOv4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934.
- Wang, A.; Chen, H.; Liu, L.; Chen, K.; Lin, Z.; Han, J.; Ding, G. YOLOv10: Real-Time End-to-End Object Detection. arXiv 2024, arXiv:2405.14458.
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single shot multibox detector. In Proceedings of the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Proceedings, Part I 14. Springer: Berlin/Heidelberg, Germany, 2016; pp. 21–37.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 1137–1149.
- Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-end object detection with transformers. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part I 16. Springer: Berlin/Heidelberg, Germany, 2020; pp. 213–229.
- Zhu, X.; Su, W.; Lu, L.; Li, B.; Wang, X.; Dai, J. Deformable DETR: Deformable Transformers for End-to-End Object Detection. In Proceedings of the International Conference on Learning Representations, Virtual, 3–7 May 2021.
- Zhang, S.; Wang, X.; Wang, J.; Pang, J.; Lyu, C.; Zhang, W.; Luo, P.; Chen, K. Dense Distinct Query for End-to-End Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 7329–7338.
- Wang, Y.; Ha, J.E. Improved Object Detection with Content and Position Separation in Transformer. Remote Sens. 2024, 16, 353.
- Li, G.; Ji, Z.; Qu, X. Stepwise domain adaptation (SDA) for object detection in autonomous vehicles using an adaptive CenterNet. IEEE Trans. Intell. Transp. Syst. 2022, 23, 17729–17743.
- Oza, P.; Sindagi, V.A.; Sharmini, V.V.; Patel, V.M. Unsupervised domain adaptation of object detectors: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 46, 4018–4040.
- Ganin, Y.; Ustinova, E.; Ajakan, H.; Germain, P.; Larochelle, H.; Laviolette, F.; Marchand, M.; Lempitsky, V. Domain-adversarial training of neural networks. J. Mach. Learn. Res. 2016, 17, 2030–2096.
- Li, G.; Ji, Z.; Qu, X.; Zhou, R.; Cao, D. Cross-domain object detection for autonomous driving: A stepwise domain adaptative YOLO approach. IEEE Trans. Intell. Veh. 2022, 7, 603–615.
- Saito, K.; Ushiku, Y.; Harada, T.; Saenko, K. Strong-weak distribution alignment for adaptive object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 6956–6965.
- Wang, W.; Cao, Y.; Zhang, J.; He, F.; Zha, Z.J.; Wen, Y.; Tao, D. Exploring sequence feature alignment for domain adaptive detection transformers. In Proceedings of the 29th ACM International Conference on Multimedia, Virtual Event, China, 20–24 October 2021; pp. 1730–1738.
- Huang, W.J.; Lu, Y.L.; Lin, S.Y.; Xie, Y.; Lin, Y.Y. AQT: Adversarial Query Transformers for Domain Adaptive Object Detection. In Proceedings of the 31st International Joint Conference on Artificial Intelligence, IJCAI 2022, Vienna, Austria, 23–29 July 2022; pp. 972–979.
- Gong, K.; Li, S.; Li, S.; Zhang, R.; Liu, C.H.; Chen, Q. Improving Transferability for Domain Adaptive Detection Transformers. In Proceedings of the 30th ACM International Conference on Multimedia, Lisboa, Portugal, 10–14 October 2022; pp. 1543–1551.
- He, L.; Wang, W.; Chen, A.; Sun, M.; Kuo, C.H.; Todorovic, S. Bidirectional Alignment for Domain Adaptive Detection with Transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 1–6 October 2023; pp. 18775–18785.
- Jiang, Z.; Zhang, Y.; Wang, Z.; Yu, Y.; Zhang, Z.; Zhang, M.; Zhang, L.; Cheng, B. Inter-Domain Invariant Cross-Domain Object Detection Using Style and Content Disentanglement for In-Vehicle Images. Remote Sens. 2024, 16, 304.
- Zhu, J.Y.; Park, T.; Isola, P.; Efros, A.A. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2223–2232.
- Arruda, V.F.; Berriel, R.F.; Paixão, T.M.; Badue, C.; De Souza, A.F.; Sebe, N.; Oliveira-Santos, T. Cross-domain object detection using unsupervised image translation. Expert Syst. Appl. 2022, 192, 116334.
- Huang, X.; Belongie, S. Arbitrary style transfer in real-time with adaptive instance normalization. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 1501–1510.
- He, K.; Chen, X.; Xie, S.; Li, Y.; Dollár, P.; Girshick, R. Masked autoencoders are scalable vision learners. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 16000–16009.
- Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv 2018, arXiv:1810.04805.
- Bao, H.; Dong, L.; Piao, S.; Wei, F. BEiT: BERT Pre-Training of Image Transformers. In Proceedings of the International Conference on Learning Representations, Virtual, 25–29 April 2022.
- Chen, M.; Radford, A.; Child, R.; Wu, J.; Jun, H.; Luan, D.; Sutskever, I. Generative pretraining from pixels. In Proceedings of the International Conference on Machine Learning, PMLR, Virtual Event, 13–18 July 2020; pp. 1691–1703.
- Tong, Z.; Song, Y.; Wang, J.; Wang, L. VideoMAE: Masked autoencoders are data-efficient learners for self-supervised video pre-training. Adv. Neural Inf. Process. Syst. 2022, 35, 10078–10093.
- Dai, Z.; Cai, B.; Lin, Y.; Chen, J. UP-DETR: Unsupervised pre-training for object detection with transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 1601–1610.
- Jiang, L.; Zhang, C.; Huang, M.; Liu, C.; Shi, J.; Loy, C.C. TSIT: A simple and versatile framework for image-to-image translation. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part III 16. Springer: Berlin/Heidelberg, Germany, 2020; pp. 206–222.
- Ganin, Y.; Lempitsky, V. Unsupervised domain adaptation by backpropagation. In Proceedings of the International Conference on Machine Learning (PMLR), Lille, France, 6–11 July 2015; pp. 1180–1189.
- Johnson, J.; Alahi, A.; Fei-Fei, L. Perceptual losses for real-time style transfer and super-resolution. In Proceedings of the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Proceedings, Part II 14. Springer: Berlin/Heidelberg, Germany, 2016; pp. 694–711.
- Wang, T.C.; Liu, M.Y.; Zhu, J.Y.; Tao, A.; Kautz, J.; Catanzaro, B. High-resolution image synthesis and semantic manipulation with conditional GANs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 8798–8807.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Fei-Fei, L. ImageNet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255.
- Li, Y.J.; Dai, X.; Ma, C.Y.; Liu, Y.C.; Chen, K.; Wu, B.; He, Z.; Kitani, K.; Vajda, P. Cross-domain adaptive teacher for object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 7581–7590.
- Kennerley, M.; Wang, J.G.; Veeravalli, B.; Tan, R.T. 2PCNet: Two-Phase Consistency Training for Day-to-Night Unsupervised Domain Adaptive Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 11484–11493.
- Deng, J.; Li, W.; Chen, Y.; Duan, L. Unbiased mean teacher for cross-domain object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 4091–4101.
- Cai, Q.; Pan, Y.; Ngo, C.W.; Tian, X.; Duan, L.; Yao, T. Exploring object relation in mean teacher for cross-domain detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 11457–11466.
- Chen, C.; Zheng, Z.; Ding, X.; Huang, Y.; Dou, Q. Harmonizing transferability and discriminability for adapting object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 8869–8878.
- Chen, M.; Chen, W.; Yang, S.; Song, J.; Wang, X.; Zhang, L.; Yan, Y.; Qi, D.; Zhuang, Y.; Xie, D.; et al. Learning Domain Adaptive Object Detection with Probabilistic Teacher. In Proceedings of the International Conference on Machine Learning (PMLR), Baltimore, MD, USA, 17–23 July 2022; pp. 3040–3055.
- Zhao, L.; Wang, L. Task-specific inconsistency alignment for domain adaptive object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 14217–14226.
- He, M.; Wang, Y.; Wu, J.; Wang, Y.; Li, H.; Li, B.; Gan, W.; Wu, W.; Qiao, Y. Cross domain object detection by target-perceived dual branch distillation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 9570–9580.
- Liu, X.; Li, W.; Yang, Q.; Li, B.; Yuan, Y. Towards robust adaptive object detection under noisy annotations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 14207–14216.
- Liu, D.; Zhang, C.; Song, Y.; Huang, H.; Wang, C.; Barnett, M.; Cai, W. Decompose to adapt: Cross-domain object detection via feature disentanglement. IEEE Trans. Multimed. 2022, 25, 1333–1344.
- Jiang, J.; Chen, B.; Wang, J.; Long, M. Decoupled adaptation for cross-domain object detection. arXiv 2021, arXiv:2110.02578.
- Liu, Y.; Wang, J.; Wang, W.; Hu, Y.; Wang, Y.; Xu, Y. CRADA: Cross Domain Object Detection with Cyclic Reconstruction and Decoupling Adaptation. IEEE Trans. Multimed. 2024, 26, 6250–6261.
- Tian, Z.; Shen, C.; Chen, H.; He, T. FCOS: Fully convolutional one-stage object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 9627–9636.
- Li, W.; Liu, X.; Yao, X.; Yuan, Y. SCAN: Cross domain object detection with semantic conditioned adaptation. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, 22 February–1 March 2022; Volume 36, pp. 1421–1428.
- Li, W.; Liu, X.; Yuan, Y. SCAN++: Enhanced Semantic Conditioned Adaptation for Domain Adaptive Object Detection. IEEE Trans. Multimed. 2022, 25, 7051–7061.
- Li, W.; Liu, X.; Yuan, Y. SIGMA: Semantic-complete graph matching for domain adaptive object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 5291–5300.
- Li, W.; Liu, X.; Yuan, Y. SIGMA++: Improved Semantic-complete Graph Matching for Domain Adaptive Object Detection. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 9022–9040.
- Yu, X.; Lu, X. Domain Adaptation of Anchor-Free object detection for urban traffic. Neurocomputing 2024, 582, 127477.
- Guo, Y.; Yu, H.; Xie, S.; Ma, L.; Cao, X.; Luo, X. DSCA: A Dual Semantic Correlation Alignment Method for domain adaptation object detection. Pattern Recognit. 2024, 150, 110329.
- Mattolin, G.; Zanella, L.; Ricci, E.; Wang, Y. ConfMix: Unsupervised domain adaptation for object detection via confidence-based mixing. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 2–7 January 2023; pp. 423–433.
- Yu, J.; Liu, J.; Wei, X.; Zhou, H.; Nakata, Y.; Gudovskiy, D.; Okuno, T.; Li, J.; Keutzer, K.; Zhang, S. MTTrans: Cross-domain Object Detection with Mean Teacher Transformer. In Proceedings of the Computer Vision–ECCV 2022: 17th European Conference, Tel Aviv, Israel, 23–27 October 2022; Proceedings, Part IX. Springer: Berlin/Heidelberg, Germany, 2022; pp. 629–645.
- Zhang, J.; Huang, J.; Luo, Z.; Zhang, G.; Zhang, X.; Lu, S. DA-DETR: Domain Adaptive Detection Transformer With Information Fusion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 23787–23798.
- Jia, P.; Liu, J.; Yang, S.; Wu, J.; Xie, X.; Zhang, S. PM-DETR: Domain Adaptive Prompt Memory for Object Detection with Transformers. arXiv 2023, arXiv:2307.00313.
- Zhang, G.; Wang, L.; Zhang, Z.; Chen, Z. CPLT: Curriculum Pseudo Label Transformer for Domain Adaptive Object Detection in Foggy Weather. IEEE Sens. J. 2023, 23, 29857–29868.
- Van der Maaten, L.; Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605.
Coefficient | SODA10M (D→N) | SHIFT (D→N) | Cityscapes (N→F)
---|---|---|---
 | 0.01 | 1.0 | 1.0
 | 0.01 | 1.0 | 1.0
 | 0.1 | 0.1 | 0.1
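The three coefficients above presumably weight the individual adaptation terms when they are combined with the detection loss in the full objective (Section 3.2.4). Purely as an illustration, with placeholder symbols (the pairing of each λ with a specific loss term is an assumption, not taken from the paper), such an objective takes the form:

```latex
% Illustrative form only; \lambda_i and the grouping of adaptation terms are placeholders.
\mathcal{L}_{\mathrm{total}}
  = \mathcal{L}_{\mathrm{det}}
  + \lambda_{1}\,\mathcal{L}_{\mathrm{adapt}}^{(1)}
  + \lambda_{2}\,\mathcal{L}_{\mathrm{adapt}}^{(2)}
  + \lambda_{3}\,\mathcal{L}_{\mathrm{adapt}}^{(3)}
```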
Method | Detector | Car | Cyclist | Pedestrian | Tram | Truck | mAP |
---|---|---|---|---|---|---|---|
Faster RCNN (source only) [23] | FRCNN | 50.6 | 22.5 | 22.0 | 24.7 | 27.5 | 29.5 |
AT [53] | FRCNN | 75.3 | 33.6 | 43.6 | 46.6 | 43.8 | 48.6 |
2PCNet [54] | FRCNN | 79.7 | 35.8 | 37.6 | 52.4 | 46.2 | 50.3 |
Deformable DETR (source only) [25] | D-DETR | 66.6 | 39.4 | 41.1 | 37.2 | 41.4 | 45.1 |
SFA [33] | D-DETR | 68.9 | 36.5 | 43.4 | 36.9 | 36.7 | 44.5 |
O2net [35] | D-DETR | 58.5 | 30.5 | 31.3 | 22.6 | 19.1 | 32.4 |
AQT [34] | D-DETR | 64.8 | 20.6 | 36.7 | 32.3 | 35.2 | 37.9 |
SDA-DETR (ours) | D-DETR | 76.9 | 47.7 | 48.4 | 48.1 | 45.1 | 53.2 |
Method | Detector | Bicycle | Bus | Car | Motorcycle | Pedestrian | Truck | mAP |
---|---|---|---|---|---|---|---|---|
Faster RCNN (source only) [23] | FRCNN | 46.7 | 53.7 | 44.5 | 14.3 | 40.4 | 49.9 | 41.6
AT [53] | FRCNN | 52.3 | 49.5 | 33.0 | 20.7 | 25.8 | 54.7 | 38.9 |
UMT [55] | FRCNN | 49.2 | 46.8 | 47.5 | 16.6 | 7.7 | 18.4 | 31.1 |
DAFR [11] | FRCNN | 55.8 | 52.1 | 48.8 | 19.9 | 43.0 | 47.8 | 43.7 |
2PCNet [54] | FRCNN | 54.2 | 56.6 | 54.6 | 23.9 | 51.4 | 54.8 | 49.1 |
Deformable DETR (source only) [25] | D-DETR | 33.1 | 44.3 | 37.1 | 9.7 | 41.1 | 41.9 | 34.5 |
SFA [33] | D-DETR | 53.5 | 56.9 | 52.9 | 13.8 | 53.0 | 46.9 | 46.2 |
O2net [35] | D-DETR | 40.8 | 43.1 | 36.3 | 12.9 | 40.6 | 42.0 | 36.0 |
AQT [34] | D-DETR | 52.7 | 54.9 | 48.3 | 18.1 | 51.5 | 48.6 | 45.7 |
SDA-DETR (ours) | D-DETR | 54.6 | 58.7 | 59.0 | 21.1 | 59.7 | 54.2 | 51.2 |
Method | Detector | Bicycle | Bus | Car | Motorcycle | Person | Train | Rider | Truck | mAP |
---|---|---|---|---|---|---|---|---|---|---|
Faster RCNN (source only) [23] | FRCNN | 28.6 | 32.4 | 35.6 | 25.8 | 26.9 | 9.6 | 38.2 | 18.3 | 26.9
SWDA [32] | FRCNN | 35.3 | 36.2 | 43.5 | 30.0 | 29.9 | 32.6 | 42.3 | 24.5 | 34.3 |
MTOR [56] | FRCNN | 35.6 | 38.6 | 44.0 | 28.3 | 30.6 | 40.6 | 41.4 | 21.9 | 35.1 |
HTCN [57] | FRCNN | 37.1 | 47.4 | 47.9 | 32.3 | 33.2 | 40.9 | 47.5 | 31.6 | 39.8 |
UMT [55] | FRCNN | 37.3 | 56.5 | 48.6 | 30.4 | 33.0 | 46.8 | 46.7 | 34.1 | 41.7 |
PT [58] | FRCNN | 44.5 | 51.8 | 59.7 | 35.4 | 40.2 | 30.6 | 48.8 | 30.7 | 42.7 |
TIA [59] | FRCNN | 38.1 | 52.1 | 49.7 | 37.7 | 34.8 | 48.6 | 46.3 | 31.1 | 42.3 |
TDD [60] | FRCNN | 41.4 | 47.6 | 55.7 | 37.0 | 39.6 | 42.1 | 47.5 | 33.8 | 43.1 |
DAF [61] | FRCNN | 39.6 | 49.9 | 54.8 | 29.9 | 37.0 | 43.5 | 46.9 | 32.1 | 41.8 |
DDF [62] | FRCNN | 40.8 | 43.9 | 51.9 | 33.5 | 37.2 | 34.2 | 46.3 | 24.7 | 39.1 |
D-adapt [63] | FRCNN | 42.0 | 36.8 | 58.1 | 32.2 | 43.1 | 14.6 | 51.8 | 26.3 | 38.1 |
CRADA [64] | FRCNN | 38.9 | 51.0 | 61.8 | 34.2 | 45.6 | 52.1 | 43.8 | 30.7 | 44.8 |
FCOS (source only) [65] | FCOS | 31.9 | 29.3 | 44.1 | 20.3 | 36.9 | 8.4 | 36.3 | 18.6 | 28.2 |
SCAN [66] | FCOS | 37.3 | 48.6 | 57.3 | 31.0 | 41.7 | 48.7 | 43.9 | 28.7 | 42.1 |
SCAN++ [67] | FCOS | 39.5 | 48.1 | 57.9 | 30.1 | 44.2 | 51.2 | 43.9 | 28.2 | 42.8 |
SIGMA [68] | FCOS | 41.4 | 50.7 | 63.7 | 34.7 | 46.9 | 35.9 | 48.4 | 27.1 | 43.5 |
SIGMA++ [69] | FCOS | 39.9 | 52.2 | 61.0 | 34.8 | 46.4 | 44.6 | 45.1 | 32.1 | 44.5 |
DAAF [70] | FCOS | 39.2 | 39.4 | 58.8 | 28.3 | 43.7 | 28.9 | 41.5 | 26.7 | 38.3 |
DSCA [71] | FCOS | 36.6 | 46.9 | 59.8 | 29.1 | 43.7 | 46.1 | 41.5 | 28.6 | 41.5 |
ConfMix [72] | YOLOv5 | 36.9 | 43.1 | 57.8 | 26.3 | 42.4 | 28.2 | 39.4 | 24.2 | 37.3 |
SDA [28] | CenterNet | 37.1 | 45.0 | 52.1 | 27.0 | 37.1 | 32.1 | 47.4 | 25.5 | 37.9 |
Deformable DETR (source only) [25] | D-DETR | 35.5 | 26.8 | 44.2 | 21.6 | 37.7 | 5.8 | 39.1 | 17.2 | 28.5 |
SFA [33] | D-DETR | 44.0 | 46.2 | 62.6 | 28.3 | 46.5 | 29.4 | 48.6 | 25.1 | 41.3 |
MTTrans [73] | D-DETR | 46.5 | 45.9 | 65.2 | 32.6 | 47.4 | 33.8 | 49.9 | 25.8 | 43.4 |
DA-DETR [74] | D-DETR | 46.3 | 45.8 | 63.1 | 31.6 | 49.9 | 37.5 | 50.0 | 24.0 | 43.5 |
PM-DETR [75] | D-DETR | 46.1 | 47.2 | 64.7 | 32.4 | 47.8 | 39.6 | 50.2 | 26.5 | 44.3 |
CPLT [76] | D-DETR | 52.6 | 42.5 | 67.8 | 40.9 | 51.5 | 30.5 | 57.4 | 27.9 | 46.4 |
SDA-DETR (ours) | D-DETR | 48.2 | 53.6 | 66.3 | 33.2 | 50.6 | 41.3 | 54.9 | 30.5 | 47.3 |
Method | Image-Level Adaptation | DIFs | t-MAE | mAP
---|---|---|---|---
Deformable DETR [25] | | | | 45.1
 | ✓ | | | 51.5
 | ✓ | ✓ | | 52.8
 | ✓ | ✓ | ✓ | 53.2
Cityscapes (N → F) | | | |
---|---|---|---|---
 | 0.01 | 0.1 | 1.0 | 2.0
 | 0.01 | 0.1 | 1.0 | 2.0
mAP | 40.2 | 42.7 | 47.3 | 44.2