- Article
M2Former: Multiscale Patch Selection for Fine-Grained Visual Recognition
- Jiyong Moon
- Seongsik Park
Recently, Vision Transformers (ViTs) have been actively applied to fine-grained visual recognition (FGVR). ViTs can effectively model the interdependencies between patch-divided object regions through their inherent self-attention mechanism. In addition,...