Few-Shot Open-Set Object Detection with a Synthesized Monument Guided by Contrastive Distilled Prompts
Abstract
1. Introduction
- First, the contrastive distilled prompts (CDP) module addresses the base–novel optimization conflict through teacher–student prompt decoupling with learnable fusion. It improves base-class detection while preserving generalization to novel and unknown classes, and it reduces high-confidence unknown-to-known errors.
- Second, the synthesized monument module (SMM) reduces seen–unseen overlap at the distribution level via class-centered memory and cross-domain comparison. Non-parametric fusion at inference improves robustness, yielding higher unknown recall and more stable rejection in cluttered, co-occurring, low-supervision settings.
- Third, experiments under the FS-OSOD setting show improvements over competitive baselines, particularly on unknown detection and the related aggregate metrics.
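The non-parametric inference fusion named in the second contribution can be illustrated with a minimal sketch. It assumes that fusion is a convex combination of the detector's softmax scores and a softmax over cosine similarities to per-class memory vectors; the function names, the fusion weight `alpha`, and the temperature `tau` are hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_with_memory(logits, feature, prototypes, alpha=0.5, tau=0.1):
    """Blend detector scores with memory similarities; nothing here is learned.

    logits:     detector scores, one per class
    feature:    region feature of the proposal being scored
    prototypes: one memory vector per class, in the same order as logits
    alpha, tau: assumed fusion weight and similarity temperature
    """
    sims = [cosine(feature, p) / tau for p in prototypes]
    p_det = softmax(logits)
    p_mem = softmax(sims)
    return [(1 - alpha) * d + alpha * m for d, m in zip(p_det, p_mem)]


# The detector slightly prefers class 0, but the feature matches the
# class-1 memory vector, so the fused scores favor class 1.
fused = fuse_with_memory(
    logits=[2.0, 1.9, 0.1],
    feature=[0.0, 1.0],
    prototypes=[[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]],
)
```

Setting `alpha=0.0` recovers the detector's own scores, which makes it easy to check how much the memory term changes a decision.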
2. Related Works
2.1. Few-Shot Open-Set Recognition
2.2. Few-Shot Object Detection
2.3. Open-Set Object Detection
2.4. Prompt Learning
3. Method
3.1. Preliminaries
3.2. Overall
3.3. Contrastive Distilled Prompts
3.4. Synthesized Monument Module
3.5. Overall Optimization
Algorithm 1. Training and inference workflow of GPMN (pseudo-code aligned with the workflow in Figure 3).
- (A) FS-OSOD setting and batch sampling
- (B) Shared detector and VLM space
- (C) CDP: Prompt-level semantic stabilization
- (D) SMM Stage 1: Conditional feature synthesis
- (E) SMM Stage 2: Monument memory regularization
- (F) Overall training schedule and joint optimization
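Step (E) relies on a class-centered memory, described elsewhere in this article as holding momentum-updated prototypes. The sketch below shows one common way to maintain such a memory, an exponential moving average over L2-normalized features. The class name `PrototypeMemory`, the momentum value, and the re-normalization step are assumptions for illustration, not the authors' stated update rule.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


class PrototypeMemory:
    """One prototype per class, refreshed by an exponential moving average."""

    def __init__(self, momentum=0.9):
        self.momentum = momentum  # assumed value, not from the paper
        self.prototypes = {}      # class label -> unit-norm prototype

    def update(self, label, feature):
        """Fold a new region feature into the prototype of its class."""
        feature = l2_normalize(feature)
        old = self.prototypes.get(label)
        if old is None:
            # First sample of a class initializes its prototype.
            self.prototypes[label] = feature
        else:
            m = self.momentum
            mixed = [m * o + (1 - m) * f for o, f in zip(old, feature)]
            self.prototypes[label] = l2_normalize(mixed)
        return self.prototypes[label]


memory = PrototypeMemory(momentum=0.9)
memory.update("dog", [1.0, 0.0])
proto = memory.update("dog", [0.0, 1.0])  # moves only slightly toward the new feature
```

A high momentum keeps each prototype stable when only a few noisy samples per class are available, which is the regime the few-shot setting implies.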
4. Experiments
4.1. Overview
4.2. Experimental Detail
4.2.1. Datasets
4.2.2. Setup
4.2.3. Evaluation Metrics
4.2.4. Baselines
4.3. Experimental Analysis
4.3.1. Analysis of Experimental Results Under Different Shot Settings on VOC-COCO
4.3.2. Analysis of Experimental Results on VOC10-5-5 Under Different Shot Settings
4.3.3. Analysis of Computational Resources on VOC10-5-5
4.4. Ablation Studies
4.5. Visualized Results
4.5.1. Visualization of Test Results in Figure 8

4.5.2. CAM Maps for Different Detectors in Figure 9

5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster r-cnn: Towards real-time object detection with region proposal networks. In Proceedings of the Annual Conference on Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015; pp. 91–99.
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
- Su, B.; Zhang, H.; Li, J.; Zhou, Z. Toward generalized few-shot open-set object detection. IEEE Trans. Image Process. 2024, 33, 1389–1402.
- Su, B.; Zhang, H.; Zhou, Z. Hsic-based moving weight averaging for few-shot open-set object detection. In Proceedings of the 31st ACM International Conference on Multimedia, Ottawa, ON, Canada, 29 October–3 November 2023; pp. 5358–5369.
- Radford, A.; Kim, J.W.; Hallacy, C.; Ramesh, A.; Goh, G.; Agarwal, S.; Sastry, G.; Askell, A.; Mishkin, P.; Clark, J.; et al. Learning transferable visual models from natural language supervision. In Proceedings of the International Conference on Machine Learning. PMLR, Virtual, 18–24 July 2021; pp. 8748–8763.
- Li, J.; Li, D.; Xiong, C.; Hoi, S. Blip: Bootstrapping language-image pre-training for unified vision-language understanding and generation. In Proceedings of the International Conference on Machine Learning. PMLR, Baltimore, MD, USA, 17–23 July 2022; pp. 12888–12900.
- Zhong, Y.; Yang, J.; Zhang, P.; Li, C.; Codella, N.; Li, L.H.; Zhou, L.; Dai, X.; Yuan, L.; Li, Y.; et al. Regionclip: Region-based language-image pretraining. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 16793–16803.
- Li, L.H.; Zhang, P.; Zhang, H.; Yang, J.; Li, C.; Zhong, Y.; Wang, L.; Yuan, L.; Zhang, L.; Hwang, J.N.; et al. Grounded language-image pre-training. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 10965–10975.
- Zhou, X.; Girdhar, R.; Joulin, A.; Krähenbühl, P.; Misra, I. Detecting twenty-thousand classes using image-level supervision. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; Springer: Cham, Switzerland, 2022; pp. 350–368.
- Minderer, M.; Gritsenko, A.; Stone, A.; Neumann, M.; Weissenborn, D.; Dosovitskiy, A.; Mahendran, A.; Arnab, A.; Dehghani, M.; Shen, Z.; et al. Simple open-vocabulary object detection. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; Springer: Cham, Switzerland, 2022; pp. 728–755.
- Zhou, K.; Yang, J.; Loy, C.C.; Liu, Z. Learning to prompt for vision-language models. Int. J. Comput. Vis. 2022, 130, 2337–2348.
- Zhou, K.; Yang, J.; Loy, C.C.; Liu, Z. Conditional prompt learning for vision-language models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 16816–16825.
- Khattak, M.U.; Rasheed, H.; Maaz, M.; Khan, S.; Khan, F.S. Maple: Multi-modal prompt learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 19113–19122.
- Liu, B.; Kang, H.; Li, H.; Hua, G.; Vasconcelos, N. Few-shot open-set recognition using meta-learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 8798–8807.
- Pal, D.; Bundele, V.; Sharma, R.; Banerjee, B.; Jeppu, Y. Few-shot open-set recognition of hyperspectral images with outlier calibration network. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 4–8 January 2022; pp. 3801–3810.
- Song, N.; Zhang, C.; Lin, G. Few-shot open-set recognition using background as unknowns. In Proceedings of the 30th ACM International Conference on Multimedia, Lisboa, Portugal, 10–14 October 2022; pp. 5970–5979.
- Huang, S.; Ma, J.; Han, G.; Chang, S.F. Task-adaptive negative envision for few-shot open-set recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 7171–7180.
- Wu, A.; Chen, D.; Deng, C. Deep feature deblurring diffusion for detecting out-of-distribution objects. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–3 October 2023; pp. 13381–13391.
- Isaac-Medina, B.K.; Breckon, T.P. Dream-Box: Object-wise Outlier Generation for Out-of-Distribution Detection. arXiv 2025, arXiv:2504.18746.
- Wu, Z.; Su, B.; Geng, Q.; Zhang, H.; Zhou, Z. Boosting Few-Shot Open-Set Object Detection via Prompt Learning and Robust Decision Boundary. arXiv 2024, arXiv:2406.18443.
- Kang, B.; Liu, Z.; Wang, X.; Yu, F.; Feng, J.; Darrell, T. Few-shot object detection via feature reweighting. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 8420–8429.
- Yan, X.; Chen, Z.; Xu, A.; Wang, X.; Liang, X.; Lin, L. Meta r-cnn: Towards general solver for instance-level low-shot learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 9577–9586.
- Zhang, G.; Luo, Z.; Cui, K.; Lu, S.; Xing, E.P. Meta-DETR: Image-level few-shot detection with inter-class correlation exploitation. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 12832–12843.
- Wang, X.; Huang, T.E.; Darrell, T.; Gonzalez, J.E.; Yu, F. Frustratingly simple few-shot object detection. arXiv 2020, arXiv:2003.06957.
- Sun, B.; Li, B.; Cai, S.; Yuan, Y.; Zhang, C. Fsce: Few-shot object detection via contrastive proposal encoding. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 7352–7362.
- Qiao, L.; Zhao, Y.; Li, Z.; Qiu, X.; Wu, J.; Zhang, C. Defrcn: Decoupled faster r-cnn for few-shot object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 8681–8690.
- Joseph, K.; Khan, S.; Khan, F.S.; Balasubramanian, V.N. Towards open world object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 5830–5840.
- Gupta, A.; Narayan, S.; Joseph, K.; Khan, S.; Khan, F.S.; Shah, M. Ow-detr: Open-world detection transformer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 9235–9244.
- Han, J.; Ren, Y.; Ding, J.; Pan, X.; Yan, K.; Xia, G.S. Expanding low-density latent regions for open-set object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 9591–9600.
- Zhang, S.; Ni, Y.; Du, J.; Xue, Y.; Torr, P.; Koniusz, P.; van den Hengel, A. Open-World Objectness Modeling Unifies Novel Object Detection. In Proceedings of the Computer Vision and Pattern Recognition Conference, Jammu, India, 16–18 July 2025; pp. 30332–30342.
- Khattak, M.U.; Wasim, S.T.; Naseer, M.; Khan, S.; Yang, M.H.; Khan, F.S. Self-regulating prompts: Foundational model adaptation without forgetting. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–3 October 2023; pp. 15190–15200.
- Li, Z.; Li, X.; Fu, X.; Zhang, X.; Wang, W.; Chen, S.; Yang, J. Promptkd: Unsupervised prompt distillation for vision-language models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 26617–26626.
- Zhang, W.; Wang, Y.X. Hallucination improves few-shot object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 13008–13017.
- Liu, W.; Wang, X.; Owens, J.; Li, Y. Energy-based out-of-distribution detection. Adv. Neural Inf. Process. Syst. 2020, 33, 21464–21475.
- Hendrycks, D.; Gimpel, K. A baseline for detecting misclassified and out-of-distribution examples in neural networks. arXiv 2016, arXiv:1610.02136.
- Liang, S.; Li, Y.; Srikant, R. Enhancing the reliability of out-of-distribution image detection in neural networks. arXiv 2017, arXiv:1706.02690.
- Dhamija, A.; Gunther, M.; Ventura, J.; Boult, T. The overlooked elephant of object detection: Open set. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Snowmass Village, CO, USA, 1–5 March 2020; pp. 1021–1030.
- Jia, C.; Yang, Y.; Xia, Y.; Chen, Y.T.; Parekh, Z.; Pham, H.; Le, Q.; Sung, Y.H.; Li, Z.; Duerig, T. Scaling up visual and vision-language representation learning with noisy text supervision. In Proceedings of the International Conference on Machine Learning. PMLR, Virtual, 18–24 July 2021; pp. 4904–4916.
- Li, J.; Selvaraju, R.; Gotmare, A.; Joty, S.; Xiong, C.; Hoi, S.C.H. Align before fuse: Vision and language representation learning with momentum distillation. Adv. Neural Inf. Process. Syst. 2021, 34, 9694–9705.
- Alayrac, J.B.; Donahue, J.; Luc, P.; Miech, A.; Barr, I.; Hasson, Y.; Lenc, K.; Mensch, A.; Millican, K.; Reynolds, M.; et al. Flamingo: A visual language model for few-shot learning. Adv. Neural Inf. Process. Syst. 2022, 35, 23716–23736.
- Li, X.L.; Liang, P. Prefix-tuning: Optimizing continuous prompts for generation. arXiv 2021, arXiv:2101.00190.
- Lis, K.; Nakka, K.; Fua, P.; Salzmann, M. Detecting the unexpected via image resynthesis. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 2152–2161.
- Everingham, M.; Van Gool, L.; Williams, C.K.; Winn, J.; Zisserman, A. The pascal visual object classes (voc) challenge. Int. J. Comput. Vis. 2010, 88, 303–338.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Miller, D.; Nicholson, L.; Dayoub, F.; Sünderhauf, N. Dropout sampling for robust object detection in open-set conditions. In Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Australia, 21–25 May 2018; pp. 3243–3249.
- Zhou, D.W.; Ye, H.J.; Zhan, D.C. Learning placeholders for open-set recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 4401–4410.







| Framework | Core Idea | Key Limitation |
|---|---|---|
| PEELER [14] | Open-set meta-learning: Samples novel classes per episode, maximizes posterior entropy for unseen classes, and uses a Mahalanobis metric to improve open-set separability. | Designed for image-level FSOSR; lacks region-level localization/background handling and does not address detection-specific seen–unseen overlap under clutter. |
| OCN [15] | Threshold-free outlier rejection via an auxiliary calibration network that takes distances to class prototypes; uses generated samples to learn seen vs. outlier separation. | Built for hyperspectral recognition; prototype-distance calibration is sensitive under few-shot noise and is not tailored to region-level unknown rejection in detection. |
| ProCAM [16] | Progressive CAM separates foreground/background; background features are treated as pseudo-unseen classes to reserve open space and train additional background/unknown capacity. | Background-as-unknown is an imperfect proxy for truly out-of-set objects; the formulation is recognition-centric and does not directly stabilize detection boundaries. |
| TANE [17] | Learns task-adaptive negative prototypes to obtain a threshold-free rejection boundary; integrates rejection calibration into learning. | Recognition setting; performance depends on negative prototype quality and does not directly model region-level ambiguity/background clutter in detection. |
| DFDD [18] | Feature-space OOD synthesis: Forward blurring produces virtual OOD features and reverse deblurring recovers features for augmentation to improve OOD object detection. | Not few-shot specific; does not address base/novel optimization conflict or semantic drift, and may be brittle under severe class imbalance. |
| Dream-Box [19] | Pixel-space diffusion generates object-wise outliers to train detectors for in-distribution detection and OOD rejection with visualizable outliers. | Not tailored to scarce supervision; outlier coverage can vary and does not explicitly stabilize region-level seen–unseen margins in FS-OSOD. |
| FOOD [3] | Decouples unknown optimization from known classes via an unknown decoupling learner (UDL) and mitigates overfitting using a class weight sparsification classifier (CWSC) that sparsifies normalized classifier weights to reduce co-adaptation. | Does not leverage vision–language priors for semantic transfer; rejection relies on classifier heuristics and can remain fragile when unknown regions are visually close to seen classes under clutter and co-occurrence. |
| Ours | Two complementary controls: (i) Teacher–student prompt decoupling to stabilize semantic transfer under few-shot data. (ii) Monument memory with momentum-updated prototypes and non-parametric inference fusion to compress region-level seen–unseen overlap. | Jointly targets base-class bias and region-level overlap, yielding a more stable rejection margin for unknown objects under clutter/co-occurrence. |
| Level | Module | Core Operation | Effect |
|---|---|---|---|
| Prompt level | CDP | Uses a teacher–student prompt design with teacher-guided hard negatives for contrastive optimization | Reduces base-class bias while preserving transfer to novel/unknown semantics |
| Representation level | SMM | Builds unseen anchors, synthesizes unseen-like features, and regularizes them with monument memory | Compresses seen–unseen overlap and enlarges the rejection margin for unknowns |
| Inference level | Memory fusion | Fuses detector logits with monument-memory similarities | Reduces unknown-to-known confusion and improves rejection stability |
| Overall | GPMN | Sequentially connects CDP and SMM | Achieves a better balance between known/novel detection and unknown rejection |
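The CDP row above mentions teacher-guided hard negatives for contrastive optimization. As a minimal sketch of that idea, the code below lets a frozen teacher's class embeddings pick the `k` classes most confusable with a region feature, then applies an InfoNCE-style loss to the student's embeddings of the target class and those negatives. The exact loss used in the paper is not given in this section, so the function name, `k`, and the temperature `tau` are assumptions.

```python
import math


def dot(u, v):
    """Dot product; equals cosine similarity for unit-norm inputs."""
    return sum(a * b for a, b in zip(u, v))


def hard_negative_contrastive_loss(feature, target, teacher, student, k=2, tau=0.1):
    """InfoNCE over the target class and k teacher-selected hard negatives.

    feature: unit-norm region embedding
    target:  ground-truth class label of the region
    teacher: dict mapping class -> unit-norm embedding from the teacher prompts
    student: dict mapping class -> unit-norm embedding from the student prompts
    k, tau:  assumed number of hard negatives and temperature
    """
    # The teacher ranks the non-target classes by similarity to the region.
    negatives = sorted(
        (c for c in teacher if c != target),
        key=lambda c: dot(feature, teacher[c]),
        reverse=True,
    )[:k]
    # The student is scored on the positive and the selected negatives only.
    pos = dot(feature, student[target]) / tau
    scores = [pos] + [dot(feature, student[c]) / tau for c in negatives]
    m = max(scores)
    log_denom = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_denom - pos, negatives


teacher = {"cat": [1.0, 0.0], "dog": [0.8, 0.6], "car": [0.0, 1.0], "bus": [-1.0, 0.0]}
student = dict(teacher)  # the student starts from the teacher's embeddings
loss, negatives = hard_negative_contrastive_loss([1.0, 0.0], "cat", teacher, student)
```

Restricting the denominator to the hardest classes concentrates the gradient on the confusions the teacher considers most likely, which is one plausible reading of "teacher-guided" in the table.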
| Stream (Related Works) | Scope, Gap and Relevance to FS-OSOD |
|---|---|
| Few-shot detection: FSRW [21], Meta-RCNN [22], Meta-DETR [23]; TFA [24], FSCE [25], DeFRCN [26] | Scope: Train transferable detectors from scarce annotations via meta-learning, two-stage transfer, or pseudo-sample augmentation under closed-set evaluation. Gap: These formulations do not explicitly model out-of-set unknowns at the region level; under background clutter and multi-class co-occurrence, base-driven gradients can distort decision boundaries and induce overconfident unknown-to-known errors. Relevance: Motivates explicit unknown handling and stable semantic transfer for generalized FS-OSOD. |
| Open-set detection: outlier synthesis [18,19]; background mining [27,28]; proxy unknown selection [29,30] | Scope: Acquires “unknown evidence” through synthetic outliers, mined background proposals, proxy unknowns, or uncertainty thresholds and shapes a rejection region during training or inference. Gap: Many OSOD methods implicitly assume abundant known-class supervision; in few-shot regimes, the overlap among seen, unknown and background proposal distributions increases, making rejection boundaries fragile and sensitive to proxy quality or threshold calibration. Relevance: Calls for representation-level regularization that enlarges the seen–unseen margin under clutter. |
| Prompt learning: CoOp/CoCoOp/MaPLe [11,12,13]; PromptSRC/PromptKD [31,32] | Scope: Leverages aligned vision–text priors and parameter-efficient prompts to improve semantic transfer without full fine-tuning. Gap: Under scarce supervision, prompt parameters may drift toward base classes, degrading transfer to novel/unknown semantics and destabilizing region-level boundaries in detection. Relevance: Requires prompt-level decoupling that preserves transferability while maintaining base discrimination. |
| Our work | Contribution: CDP stabilizes semantic adaptation via teacher–student prompt decoupling, while SMM reduces region-level seen–unseen overlap using memory-anchored regularization and non-parametric fusion. Together, they address base-class bias and distributional overlap, supporting reliable unknown rejection under clutter/co-occurrence with scarce supervision. |
| Method | 1-Shot | | | 5-Shot | | |
|---|---|---|---|---|---|---|
| TFA [24] | / | / | / | / | / | / |
| DS [45] | / | / | / | / | / | / |
| ORE [27] | / | /– | / | / | /– | / |
| PROSER [46] | / | / | / | / | / | / |
| OPENDET [29] | / | / | / | / | / | / |
| FOOD [3] | / | / | / | / | / | / |
| FOODv2 [4] | / | / | –/– | / | / | –/– |
| CED-FOOD [20] | / | / | / | / | /17.91 | 2.99/ |
| CED-FOOD* | / | 38.87/16.97 | 4.41/594.10 | / | / | / |
| Ours | 24.30/13.89 | 36.58/15.19 | 4.75/677.90 | 26.93/19.86 | 40.59/17.48 | /788.20 |

| Method | 10-Shot | | | 30-Shot | | |
|---|---|---|---|---|---|---|
| TFA [24] | / | / | / | / | / | / |
| DS [45] | / | / | / | / | / | / |
| ORE [27] | / | /– | / | / | /– | / |
| PROSER [46] | / | / | / | / | / | / |
| OPENDET [29] | / | / | / | / | / | / |
| FOOD [3] | / | / | / | / | / | / |
| FOODv2 [4] | / | / | –/– | / | / | –/– |
| CED-FOOD [20] | / | /17.06 | / | / | / | / |
| CED-FOOD* | / | / | 2.46/ | / | / | 2.43/ |
| Ours | 28.60/22.67 | 38.71/ | /843.50 | 28.74/24.47 | 40.80/18.11 | 3.09/1233.50 |

| Method | 1-Shot | | | 3-Shot | | |
|---|---|---|---|---|---|---|
| TFA [24] | / | / | / | / | / | / |
| DS [45] | / | / | / | / | / | / |
| ORE [27] | / | /– | / | / | /– | / |
| PROSER [46] | / | / | / | / | / | / |
| OPENDET [29] | / | / | / | / | / | / |
| FOOD [3] | / | / | / | / | / | / |
| FOODv2 [4] | / | / | –/– | / | / | –/– |
| CED-FOOD [20] | / | /38.12 | 4.12/459.60 | / | 80.55/39.53 | 3.72/451.20 |
| CED-FOOD* | / | / | / | / | / | / |
| Ours | 58.73/40.95 | 79.97/37.23 | 5.68/770.30 | 57.48/34.14 | 78.14/37.34 | 5.81/558.20 |

| Method | 5-Shot | | | 10-Shot | | |
|---|---|---|---|---|---|---|
| TFA [24] | / | / | / | / | / | / |
| DS [45] | / | / | / | / | / | / |
| ORE [27] | / | /– | / | / | /– | / |
| PROSER [46] | / | / | / | / | / | / |
| OPENDET [29] | / | / | / | / | / | / |
| FOOD [3] | / | / | / | / | / | / |
| FOODv2 [4] | / | / | –/– | / | / | –/– |
| CED-FOOD [20] | / | /40.32 | 3.78/ | / | 79.39/39.79 | / |
| CED-FOOD* | / | / | / | / | / | / |
| Ours | 59.90/43.20 | 81.68/38.50 | 4.13/509.60 | 61.78/47.40 | 78.36/38.16 | 3.36/535.20 |

| Computational Resource | DS | PROSER | ORE | OpenDet* | CED-FOOD | Ours |
|---|---|---|---|---|---|---|
| Training time (s/iter) | 0.1963 | 0.1925 | 0.2071 | 0.1941 | 0.1963 | 0.1957 |
| Inference time (s/iter) | 0.3979 | 0.03981 | 0.03984 | 0.03985 | 0.03989 | 0.03995 |
| Model parameters (KB) | 224219 | 224125 | 231290 | 223645 | 225687 | 226945 |
| CDP | SMM | | | | | | |
|---|---|---|---|---|---|---|---|
| Baseline (CED-FOOD) | | 25.72 | 21.16 | 39.43 | 17.52 | 2.46 | 1339.30 |
| ✔ | | 29.78 | 25.56 | 40.17 | 18 | 2.98 | 2083 |
| | ✔ | 24.28 | 19.96 | 40.88 | 18.08 | 3.16 | 1096 |
| ✔ | ✔ | 28.74 | 24.47 | 40.80 | 18.11 | 3.09 | 1233.50 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Chen, H.; Chen, Y. Few-Shot Open-Set Object Detection with a Synthesized Monument Guided by Contrastive Distilled Prompts. Appl. Sci. 2026, 16, 3474. https://doi.org/10.3390/app16073474

