Computer Vision and Pattern Recognition with Applications

A special issue of Mathematics (ISSN 2227-7390). This special issue belongs to the section "Mathematics and Computer Science".

Deadline for manuscript submissions: closed (31 March 2023) | Viewed by 31718

Special Issue Editor


Prof. Dr. Teng Li
Guest Editor
School of Electrical Engineering and Automation, Anhui University, Hefei 230601, China
Interests: computer vision; pattern recognition; multimedia computing

Special Issue Information

Dear Colleagues,

Computer vision and pattern recognition are fundamental problems in artificial intelligence and are also natural application areas for mathematical theory and tools. Computer vision enables computers and systems to derive meaningful information from digital images, videos, and other visual inputs, and to take actions or make recommendations based on that information. Pattern recognition is the process of recognizing patterns in data, typically using machine learning algorithms. Recent years have witnessed the rapid expansion of computer vision and pattern recognition, and a wide range of applications based on them can be seen everywhere, e.g., object detection, recognition, segmentation, classification, content generation, and multimedia analysis. In this Special Issue, we aim to assemble recent advances in computer vision, pattern recognition, and related extended applications.

Prof. Dr. Teng Li
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Mathematics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • pattern classification and clustering
  • machine learning, neural network, and deep learning
  • theory in computer vision and pattern recognition
  • low-level vision, image processing, and machine vision
  • 3D computer vision and reconstruction
  • object detection, tracking, recognition, and action recognition
  • data mining and signal processing
  • multimedia/multimodal analysis and applications
  • biomedical image processing and analysis
  • medical image analysis and applications
  • graph theory and its applications
  • vision analysis and understanding
  • vision for robots and autonomous driving
  • vision applications and systems
  • vision and language


Published Papers (18 papers)


Research

16 pages, 776 KiB  
Article
Redesigning Embedding Layers for Queries, Keys, and Values in Cross-Covariance Image Transformers
by Jaesin Ahn , Jiuk Hong, Jeongwoo Ju and Heechul Jung
Mathematics 2023, 11(8), 1933; https://doi.org/10.3390/math11081933 - 19 Apr 2023
Viewed by 1319
Abstract
Several attempts have been made in vision transformers to reduce the quadratic time complexity in the number of tokens to linear time complexity. Cross-covariance image transformers (XCiT) are one of the techniques used to address this issue. However, despite these efforts, increasing the token dimension still results in quadratic growth in time complexity, and the dimension is a key parameter for achieving superior generalization performance. In this paper, a novel method is proposed to improve the generalization performance of XCiT models without increasing token dimensions. We redesigned the embedding layers of queries, keys, and values into three variants: separate non-linear embedding (SNE), partially shared non-linear embedding (P-SNE), and fully shared non-linear embedding (F-SNE). Finally, the proposed structure with different model size settings achieved 71.4%, 77.8%, and 82.1% on ImageNet-1k, compared with the 69.9%, 77.1%, and 82.0% acquired by the original XCiT models, namely XCiT-N12, XCiT-T12, and XCiT-S12, respectively. Additionally, the proposed model achieved 94.8% in transfer learning experiments, on average, for CIFAR-10, CIFAR-100, Stanford Cars, and STL-10, which is superior to the baseline model XCiT-S12 (94.5%). In particular, the proposed models demonstrated considerable improvements on the out-of-distribution detection task compared to the original XCiT models.
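As a rough illustration of the embedding-layer redesign described in this abstract, the sketch below contrasts a separate and a fully shared non-linear embedding for queries, keys, and values in PyTorch. The two-layer MLP, its width, and the GELU activation are assumptions for illustration, not the authors' exact design.

```python
import torch
import torch.nn as nn

class NonLinearQKVEmbedding(nn.Module):
    """Hedged sketch of redesigned Q/K/V embeddings for a cross-covariance
    attention block. 'separate' uses one small non-linear MLP per stream
    (roughly the SNE idea); 'shared' reuses a single MLP for all three
    (roughly the F-SNE idea). Depths, widths, and activations are
    illustrative assumptions, not the authors' exact layers."""

    def __init__(self, dim: int, mode: str = "separate"):
        super().__init__()

        def mlp():
            return nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))

        if mode == "shared":
            shared = mlp()
            self.q_embed = self.k_embed = self.v_embed = shared  # one MLP, three uses
        else:
            self.q_embed, self.k_embed, self.v_embed = mlp(), mlp(), mlp()

    def forward(self, x: torch.Tensor):
        # x: (batch, num_tokens, dim) -> q, k, v of the same shape
        return self.q_embed(x), self.k_embed(x), self.v_embed(x)
```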

15 pages, 1396 KiB  
Article
VRR-Net: Learning Vehicle–Road Relationships for Vehicle Trajectory Prediction on Highways
by Tingzhang Zhan, Qieshi Zhang, Guangxi Chen and Jun Cheng
Mathematics 2023, 11(6), 1293; https://doi.org/10.3390/math11061293 - 08 Mar 2023
Cited by 1 | Viewed by 1298
Abstract
Vehicle trajectory prediction is an important basis for the decision-making and planning of autonomous driving systems, enabling them to drive safely and efficiently. To accurately predict vehicle trajectories, the complex representations and dynamic interactions among the elements in a traffic scene must be abstracted and modelled. This paper presents the vehicle–road relationships network (VRR-Net), a deep learning network that extracts features from vehicle–road relationships and models the effects of traffic environments on vehicles. The introduction of geographic highway information and the calculation of spatiotemporal distances with respect to a reference not only unify heterogeneous vehicle–road relationship representations into a time-series vector but also reduce the requirement for sensing transient changes in the surrounding area. A hierarchical long short-term memory network extracts environmental features from two perspectives, social interactions and road constraints, and predicts the future trajectories of vehicles by their manoeuvre categories. Accordingly, VRR-Net fully exploits the contributions of historical trajectories and integrates the effects of road constraints, achieving performance that is comparable to or better than that of state-of-the-art methods on the Next Generation Simulation (NGSIM) dataset.
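The following sketch illustrates, under stated assumptions, the kind of hierarchical LSTM described in this abstract: one LSTM encodes social-interaction features, another encodes road-constraint features, and simple heads predict a manoeuvre class and per-class future trajectories. The feature dimensions, the number of manoeuvre classes, and the linear heads are hypothetical choices, not the authors' architecture.

```python
import torch
import torch.nn as nn

class HierarchicalTrajectoryEncoder(nn.Module):
    """Hedged sketch of a two-branch LSTM encoder for trajectory prediction."""

    def __init__(self, social_dim=16, road_dim=8, hidden=64,
                 horizon=25, num_manoeuvres=3):
        super().__init__()
        self.social_lstm = nn.LSTM(social_dim, hidden, batch_first=True)
        self.road_lstm = nn.LSTM(road_dim, hidden, batch_first=True)
        self.manoeuvre_head = nn.Linear(2 * hidden, num_manoeuvres)
        # one (x, y) trajectory head per manoeuvre class
        self.traj_head = nn.Linear(2 * hidden, num_manoeuvres * horizon * 2)
        self.horizon, self.k = horizon, num_manoeuvres

    def forward(self, social_seq, road_seq):
        # social_seq: (B, T, social_dim), road_seq: (B, T, road_dim)
        _, (hs, _) = self.social_lstm(social_seq)
        _, (hr, _) = self.road_lstm(road_seq)
        h = torch.cat([hs[-1], hr[-1]], dim=-1)          # (B, 2*hidden)
        manoeuvre_logits = self.manoeuvre_head(h)        # (B, K)
        trajs = self.traj_head(h).view(-1, self.k, self.horizon, 2)
        return manoeuvre_logits, trajs
```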

12 pages, 1431 KiB  
Article
Reconstructing a 3D Medical Image from a Few 2D Projections Using a B-Spline-Based Deformable Transformation
by Hui Yan and Jianrong Dai
Mathematics 2023, 11(1), 69; https://doi.org/10.3390/math11010069 - 25 Dec 2022
Viewed by 1354
Abstract
(1) Background: There is a need for 3D image reconstruction from a series of 2D projections in medical applications. However, additional exposure to X-ray projections may harm human health. Minimizing the number of projections reduces X-ray exposure, but it also causes significant image noise and artifacts. (2) Purpose: In this study, a method was proposed for the reconstruction of a 3D image from a minimal set of 2D X-ray projections using a B-spline-based deformable transformation. (3) Methods: The inputs of this method were a 3D image acquired in a previous treatment, used as a prior image, and a minimal set of 2D projections acquired during the current treatment. The goal was to reconstruct a new 3D image for the current treatment from these two inputs. The new 3D image was deformed from the prior image via displacement matrices interpolated from B-spline coefficients. The B-spline coefficients were solved with an objective function defined as the mean square error between the reconstructed and ground-truth projections. In the optimization process, the gradient of the objective function was calculated, and the B-spline coefficients were then updated. For acceleration, the 2D and 3D image reconstructions and the B-spline interpolation were implemented on a graphics processing unit (GPU). (4) Results: When the scan angles were more than 60°, the image quality was significantly improved, and the reconstructed image was comparable to the ground-truth image. When the scan angles were less than 30°, the image quality was significantly degraded. The influence of the scan orientation on image quality was minor. With GPU acceleration, the reconstruction efficiency improved roughly a hundredfold compared with a conventional CPU implementation. (5) Conclusions: The proposed method was able to generate a high-quality 3D image using a few 2D projections, amounting to ~20% of the total projections required for a standard image. The introduction of the B-spline-interpolated displacement matrix was effective in suppressing noise in the reconstructed image. This method could significantly reduce the imaging time and the radiation exposure of patients under treatment.
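A minimal sketch of the optimisation loop described in this abstract is given below, assuming differentiable placeholder functions project_fn (forward projector) and bspline_warp (B-spline interpolation plus warping) are available; the optimiser, grid size, and iteration count are likewise assumptions.

```python
import torch

def reconstruct(prior_volume, measured_projections, project_fn, bspline_warp,
                grid_shape=(8, 8, 8), iters=200, lr=0.1):
    """Hedged sketch: B-spline coefficients define a displacement field that
    deforms the prior 3D image; the loss is the MSE between projections of
    the deformed volume and the measured 2D projections. `project_fn` and
    `bspline_warp` are assumed, user-provided differentiable functions."""
    coeffs = torch.zeros((3, *grid_shape), requires_grad=True)  # displacement coefficients
    opt = torch.optim.Adam([coeffs], lr=lr)
    for _ in range(iters):
        opt.zero_grad()
        warped = bspline_warp(prior_volume, coeffs)            # deform prior image
        loss = torch.mean((project_fn(warped) - measured_projections) ** 2)
        loss.backward()                                        # gradient of the objective
        opt.step()                                             # update B-spline coefficients
    return bspline_warp(prior_volume, coeffs).detach()
```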

15 pages, 16165 KiB  
Article
Deep Multi-Task Learning for an Autoencoder-Regularized Semantic Segmentation of Fundus Retina Images
by Ge Jin, Xu Chen and Long Ying
Mathematics 2022, 10(24), 4798; https://doi.org/10.3390/math10244798 - 16 Dec 2022
Cited by 2 | Viewed by 1408
Abstract
Automated segmentation of retinal blood vessels is necessary for the diagnosis, monitoring, and treatment planning of retinal disease. Although current U-shaped models have achieved outstanding performance, some challenges remain due to the nature of this problem and of mainstream models. (1) There is no effective framework to obtain and incorporate features with different spatial and semantic information at multiple levels. (2) Fundus retina images with high-quality blood vessel segmentations are relatively rare. (3) The information in edge regions, which are the most difficult parts to segment, has not received adequate attention. In this work, we propose a novel encoder–decoder architecture based on the multi-task learning paradigm to tackle these challenges. The shared image encoder is regularized by conducting a reconstruction task in the VQ-VAE (Vector Quantized Variational AutoEncoder) branch to improve its generalization ability. Meanwhile, hierarchical representations are generated and integrated to complement the input image. An edge attention module is designed to make the model capture edge-focused feature representations via deep supervision, focusing on the target edge regions that are most difficult to recognize. Extensive evaluations on three publicly accessible datasets demonstrate that the proposed model outperforms current state-of-the-art methods.
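As a hedged illustration of the multi-task objective sketched in this abstract, the snippet below combines a segmentation loss, an autoencoder reconstruction loss that regularises the shared encoder, and a deeply supervised edge loss. The specific loss functions and weights are assumptions; the authors' VQ-VAE branch would additionally include codebook and commitment terms.

```python
import torch
import torch.nn.functional as F

def multitask_loss(seg_logits, seg_target, recon, image, edge_logits, edge_target,
                   w_recon=0.5, w_edge=0.5):
    """Hedged sketch of a segmentation + reconstruction + edge-supervision loss."""
    seg_loss = F.binary_cross_entropy_with_logits(seg_logits, seg_target)  # vessel mask
    recon_loss = F.mse_loss(recon, image)                                  # encoder regularizer
    edge_loss = F.binary_cross_entropy_with_logits(edge_logits, edge_target)  # deep supervision
    return seg_loss + w_recon * recon_loss + w_edge * edge_loss
```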

17 pages, 3377 KiB  
Article
Semantic Segmentation of UAV Images Based on Transformer Framework with Context Information
by Satyawant Kumar, Abhishek Kumar and Dong-Gyu Lee
Mathematics 2022, 10(24), 4735; https://doi.org/10.3390/math10244735 - 13 Dec 2022
Cited by 5 | Viewed by 2679
Abstract
With advances in Unmanned Aerial Vehicle (UAV) technology, aerial images with huge variations in the appearance of objects and complex backgrounds have opened a new direction of work for researchers. The task of semantic segmentation becomes more challenging when capturing inherent features in the global and local context of UAV images. In this paper, we propose a transformer-based encoder–decoder architecture to address this issue for the precise segmentation of UAV images. The inherent feature representation of the UAV images is exploited in the encoder network using a self-attention-based transformer framework to capture long-range global contextual information. A Token Spatial Information Fusion (TSIF) module is proposed to take advantage of a convolution mechanism that can capture local details. It fuses the local contextual details of neighboring pixels with the encoder network and produces semantically rich feature representations. We also propose a decoder network that processes the output of the encoder network for the final semantic-level prediction of each pixel. We demonstrate the effectiveness of this architecture on the UAVid and Urban Drone datasets, where we achieved mIoU of 61.93% and 73.65%, respectively.
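The sketch below illustrates one plausible form of the token/spatial fusion idea described here: local details from a depth-wise convolution are fused with the transformer token features. The 3x3 depth-wise convolution and concatenation-plus-projection fusion are assumptions, not the TSIF module's exact design.

```python
import torch
import torch.nn as nn

class TokenSpatialFusion(nn.Module):
    """Hedged sketch of fusing convolutional local details with transformer tokens."""

    def __init__(self, dim: int):
        super().__init__()
        self.local = nn.Conv2d(dim, dim, kernel_size=3, padding=1, groups=dim)  # depth-wise conv
        self.proj = nn.Linear(2 * dim, dim)

    def forward(self, tokens: torch.Tensor, h: int, w: int):
        # tokens: (B, h*w, dim) -> reshape to a feature map for the conv branch
        b, n, c = tokens.shape
        fmap = tokens.transpose(1, 2).reshape(b, c, h, w)
        local = self.local(fmap).flatten(2).transpose(1, 2)       # (B, h*w, dim)
        return self.proj(torch.cat([tokens, local], dim=-1))      # fused tokens
```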

19 pages, 8565 KiB  
Article
Learning Adaptive Spatial Regularization and Temporal-Aware Correlation Filters for Visual Object Tracking
by Liqiang Liu, Tiantian Feng, Yanfang Fu, Chao Shen, Zhijuan Hu, Maoyuan Qin, Xiaojun Bai and Shifeng Zhao
Mathematics 2022, 10(22), 4320; https://doi.org/10.3390/math10224320 - 17 Nov 2022
Viewed by 931
Abstract
Recently, discriminative correlation filter (DCF)-based trackers have gained much attention and achieved remarkable results thanks to their high efficiency and outstanding performance. However, undesirable boundary effects occur when DCF-based trackers face challenging situations, such as occlusion, background clutter, and fast motion. To address these problems, this work proposes a novel adaptive spatial regularization and temporal-aware correlation filters (ASTCF) model to deal with the boundary effects that occur in correlation filter tracking. Firstly, the ASTCF model learns a more robust correlation filter template by introducing spatial regularization and temporal-aware components into the objective function. The adaptive spatial regularization provides a more robust appearance model to handle large appearance changes over time, while the temporal-aware constraint enhances the temporal continuity and consistency of the model. Together they make the correlation filter model more discriminative and reduce the influence of boundary effects during tracking. Secondly, the objective function can be transformed into three sub-problems with closed-form solutions and solved efficiently via the alternating direction method of multipliers (ADMM). Finally, we compare our tracker with representative methods on three benchmarks, OTB2015, VOT2018, and LaSOT, where the experimental results demonstrate the superiority of our tracker on most performance criteria compared with existing trackers.
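For orientation, a generic objective from this family of trackers (in the spirit of spatially regularized, temporal-aware DCFs, not necessarily the authors' exact formulation) can be written as:

```latex
\min_{\mathbf{f}} \; \frac{1}{2}\Bigl\lVert \mathbf{y} - \sum_{d=1}^{D} \mathbf{x}^{d} \ast \mathbf{f}^{d} \Bigr\rVert_{2}^{2}
  + \frac{1}{2}\sum_{d=1}^{D}\bigl\lVert \mathbf{w} \odot \mathbf{f}^{d} \bigr\rVert_{2}^{2}
  + \frac{\mu}{2}\,\bigl\lVert \mathbf{f} - \mathbf{f}_{t-1} \bigr\rVert_{2}^{2}
```

Here y is the desired response, x^d the d-th feature channel, w an adaptive spatial weight map, f_{t-1} the filter learned in the previous frame, and mu the temporal weight; objectives of this form split naturally into sub-problems that ADMM can solve with closed-form updates.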

18 pages, 3842 KiB  
Article
Deep Learning-Based Plant Classification Using Nonaligned Thermal and Visible Light Images
by Ganbayar Batchuluun, Se Hyun Nam and Kang Ryoung Park
Mathematics 2022, 10(21), 4053; https://doi.org/10.3390/math10214053 - 01 Nov 2022
Cited by 2 | Viewed by 1690
Abstract
Various studies have been conducted on plant images. Machine learning algorithms are usually used in visible light image-based studies, whereas in thermal image-based studies, the acquired thermal images tend to be analyzed by naked-eye visual examination. However, visible light cameras are sensitive to light and cannot be used in environments with low illumination. Although thermal cameras do not suffer from these drawbacks, they are sensitive to atmospheric temperature and humidity. Moreover, previous thermal camera-based studies relied on time-consuming manual analyses. Therefore, in this study, we conducted a novel study that simultaneously uses thermal images and corresponding visible light images of plants to solve these problems. The proposed network extracted features from each thermal image and the corresponding visible light image of a plant through residual block-based branch networks, and combined the features to increase the accuracy of multiclass classification. Additionally, a new database was built in this study by acquiring thermal images and corresponding visible light images of various plants.
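A minimal PyTorch sketch of the two-branch idea is shown below: one residual backbone per modality, with features concatenated before a classifier. Using torchvision ResNet-18 backbones and plain concatenation is an assumption; the paper designs its own residual-block branch networks.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class DualModalityPlantClassifier(nn.Module):
    """Hedged sketch of a thermal + visible light two-branch classifier."""

    def __init__(self, num_classes: int):
        super().__init__()

        def branch():
            net = models.resnet18(weights=None)
            net.fc = nn.Identity()          # keep the 512-d pooled features
            return net

        self.thermal_branch, self.visible_branch = branch(), branch()
        self.classifier = nn.Linear(512 * 2, num_classes)

    def forward(self, thermal, visible):
        # assumes both inputs are 3-channel tensors (replicate the thermal channel if needed)
        f = torch.cat([self.thermal_branch(thermal),
                       self.visible_branch(visible)], dim=1)
        return self.classifier(f)
```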

15 pages, 1430 KiB  
Article
Interactive Learning of a Dual Convolution Neural Network for Multi-Modal Action Recognition
by Qingxia Li, Dali Gao, Qieshi Zhang, Wenhong Wei and Ziliang Ren
Mathematics 2022, 10(21), 3923; https://doi.org/10.3390/math10213923 - 22 Oct 2022
Viewed by 1197
Abstract
RGB and depth modalities contain abundant and mutually interactive information, and convolutional neural networks (ConvNets) based on multi-modal data have made successful progress in action recognition. Due to the limitations of a single stream, it is difficult to improve recognition performance by learning multi-modal interactive features. Inspired by multi-stream learning mechanisms and spatial-temporal information representation methods, we construct dynamic images using the rank pooling method and design an interactive learning dual-ConvNet (ILD-ConvNet) with a multiplexer module to improve action recognition performance. Built on rank pooling, the constructed visual dynamic images capture the spatial-temporal information of entire RGB videos. We extend this method to depth sequences to obtain richer multi-modal spatial-temporal information as the inputs of the ConvNets. In addition, we design a dual ILD-ConvNet with multiplexer modules to jointly learn interactive two-stream features from the RGB and depth modalities. The proposed recognition framework has been tested on two benchmark multi-modal datasets, NTU RGB + D 120 and PKU-MMD. The proposed ILD-ConvNet with a temporal segmentation mechanism achieves accuracies of 86.9% and 89.4% for Cross-Subject (C-Sub) and Cross-Setup (C-Set) on NTU RGB + D 120, and 92.0% and 93.1% for Cross-Subject (C-Sub) and Cross-View (C-View) on PKU-MMD, which are comparable with the state of the art. The experimental results show that the proposed ILD-ConvNet with a multiplexer module can extract interactive features from different modalities to enhance action recognition performance.
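The following sketch shows one common way to build a dynamic image from a clip by approximate rank pooling; the weighting formula is a widely used approximation and may differ from the rank pooling variant used in the paper.

```python
import torch

def approximate_rank_pooling(frames: torch.Tensor) -> torch.Tensor:
    """Hedged sketch of summarising a clip into a single 'dynamic image'.

    frames: (T, C, H, W) -> (C, H, W). Each frame is weighted by
    alpha_t = 2t - T - 1 and summed, which emphasises temporal ordering;
    this is a common approximation of rank pooling, not necessarily the
    paper's exact formulation."""
    t = frames.shape[0]
    alphas = 2 * torch.arange(1, t + 1, dtype=frames.dtype) - t - 1   # (T,)
    return (alphas.view(-1, 1, 1, 1) * frames).sum(dim=0)
```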

22 pages, 6138 KiB  
Article
Deep-Learning-Based Complex Scene Text Detection Algorithm for Architectural Images
by Weiwei Sun, Huiqian Wang, Yi Lu, Jiasai Luo, Ting Liu, Jinzhao Lin, Yu Pang and Guo Zhang
Mathematics 2022, 10(20), 3914; https://doi.org/10.3390/math10203914 - 21 Oct 2022
Cited by 3 | Viewed by 1994
Abstract
With the advent of smart cities, text information in an image can be accurately located and recognized and then applied in fields such as instant translation, image retrieval, card surface information recognition, and license plate recognition, making people's lives and work more convenient. Owing to the varied orientations, angles, and shapes of text, identifying textual features in images is challenging. Therefore, we propose an improved EAST detector algorithm for detecting and recognizing slanted text in images. The proposed algorithm uses reinforcement learning to train a recurrent neural network controller. The optimal fully convolutional neural network structure is selected, and multi-scale features of text are extracted. After importing this information into the output module, the Generalized Intersection over Union algorithm is used to enhance the regression of the text bounding box. Next, the loss function is adjusted to ensure a balance between positive and negative sample classes before outputting the improved text detection results. Experimental results indicate that the proposed algorithm can address the problem of category homogenization and improve the low recall rate in target detection. Compared with other image detection algorithms, the proposed algorithm better identifies slanted text in natural scene images and also performs well on text in complex environments.
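For reference, the sketch below computes the Generalized Intersection over Union for axis-aligned boxes; slanted text boxes, as handled in the paper, would require polygon intersection instead.

```python
def generalized_iou(box_a, box_b):
    """GIoU = IoU - |C \ (A U B)| / |C|, with C the smallest enclosing box.
    Boxes are (x1, y1, x2, y2) with x1 < x2 and y1 < y2."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    iou = inter / union if union > 0 else 0.0
    # smallest axis-aligned box enclosing both inputs
    c_area = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))
    return iou - (c_area - union) / c_area if c_area > 0 else iou
```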

30 pages, 5193 KiB  
Article
Ocular Biometrics with Low-Resolution Images Based on Ocular Super-Resolution CycleGAN
by Young Won Lee, Jung Soo Kim and Kang Ryoung Park
Mathematics 2022, 10(20), 3818; https://doi.org/10.3390/math10203818 - 16 Oct 2022
Cited by 3 | Viewed by 1214
Abstract
Iris recognition, which is known to have outstanding performance among conventional biometric techniques, requires a high-resolution camera and sufficient lighting to capture images containing various iris patterns. To address these issues, research is actively being conducted on ocular recognition, which includes the periocular region in addition to the iris region; this, however, also requires a high-resolution camera, limiting its applications due to cost and size constraints. Accordingly, this study proposes an ocular super-resolution cycle-consistent generative adversarial network (OSRCycleGAN) for ocular super-resolution reconstruction, together with a method to improve recognition performance when ocular images are acquired at low resolution. Experiments conducted on open databases, namely CASIA-Iris-Distance and CASIA-Iris-Lamp v4, and the IIT Delhi iris database, showed that the equal error rates of recognition of the proposed method were 3.02%, 4.06%, and 2.13%, respectively, outperforming state-of-the-art methods.
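A hedged sketch of the cycle-consistency term typically used when training a CycleGAN-style super-resolution model is given below; the adversarial and identity terms of the full objective are omitted, and the L1 loss and weight are assumptions. g_lr_to_hr and g_hr_to_lr are hypothetical generator callables.

```python
import torch
import torch.nn.functional as F

def cycle_consistency_loss(low_res, high_res, g_lr_to_hr, g_hr_to_lr, lam=10.0):
    """Mapping LR -> HR -> LR (and HR -> LR -> HR) should recover the input.
    The generators are assumed, user-provided networks."""
    rec_lr = g_hr_to_lr(g_lr_to_hr(low_res))     # reconstructed low-resolution image
    rec_hr = g_lr_to_hr(g_hr_to_lr(high_res))    # reconstructed high-resolution image
    return lam * (F.l1_loss(rec_lr, low_res) + F.l1_loss(rec_hr, high_res))
```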

17 pages, 2016 KiB  
Article
Deep Spatial-Temporal Neural Network for Dense Non-Rigid Structure from Motion
by Yaming Wang, Minjie Wang, Wenqing Huang, Xiaoping Ye and Mingfeng Jiang
Mathematics 2022, 10(20), 3794; https://doi.org/10.3390/math10203794 - 14 Oct 2022
Cited by 1 | Viewed by 1177
Abstract
Dense non-rigid structure from motion (NRSfM) has long been a challenge in computer vision because of the vast number of feature points. As neural networks develop rapidly, a novel solution is emerging. However, existing methods ignore the significance of spatial–temporal data and the strong learning capacity of neural networks. This study proposes a deep spatial–temporal NRSfM framework (DST-NRSfM) and introduces a weighted spatial constraint to further optimize the 3D reconstruction results. Layer normalization is applied in dense NRSfM tasks to prevent vanishing gradients and speed up neural network convergence. Our DST-NRSfM framework outperforms both classical approaches and recent advancements, achieving state-of-the-art performance across commonly used synthetic and real benchmark datasets.

19 pages, 3094 KiB  
Article
Multi-Level Cross-Modal Semantic Alignment Network for Video–Text Retrieval
by Fudong Nian, Ling Ding, Yuxia Hu and Yanhong Gu
Mathematics 2022, 10(18), 3346; https://doi.org/10.3390/math10183346 - 15 Sep 2022
Cited by 1 | Viewed by 1396
Abstract
This paper strives to improve the performance of video–text retrieval. To date, many algorithms have been proposed to facilitate the similarity measurement of video–text retrieval, moving from a single global semantic to multi-level semantics. However, these methods may suffer from the following limitations: (1) they largely ignore relationship semantics, so the modelled semantic levels are insufficient; (2) constraining the real-valued features of different modalities to lie in the same space only through feature distance measurement is incomplete; (3) they fail to handle the problem that the distributions of attribute labels at different semantic levels are heavily imbalanced. To overcome these limitations, this paper proposes a novel multi-level cross-modal semantic alignment network (MCSAN) for video–text retrieval by jointly modeling video–text similarity on the global, entity, action, and relationship semantic levels in a unified deep model. Specifically, both video and text are first decomposed into global, entity, action, and relationship semantic levels by carefully designed spatial–temporal semantic learning structures. Then, we utilize KLDivLoss and a cross-modal parameter-sharing attribute projection layer as statistical constraints to ensure that representations from different modalities at different semantic levels are projected into a common semantic space. In addition, a novel focal binary cross-entropy (FBCE) loss function is presented, which is the first effort to model the imbalanced attribute distribution problem for video–text retrieval. MCSAN effectively takes advantage of the complementary information among the four semantic levels. Extensive experiments on two challenging video–text retrieval datasets, MSR-VTT and VATEX, show the viability of our method.
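As an illustration of a focal variant of binary cross-entropy, in the spirit of the FBCE loss named above (the authors' exact weighting may differ), consider:

```python
import torch
import torch.nn.functional as F

def focal_binary_cross_entropy(logits, targets, gamma=2.0, alpha=0.25):
    """Hedged sketch: hard, misclassified attribute labels receive larger
    weights, easing the imbalance across attribute labels."""
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)              # probability of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)  # class-balance weight
    return (alpha_t * (1 - p_t) ** gamma * bce).mean()
```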

31 pages, 3633 KiB  
Article
Boosting Unsupervised Dorsal Hand Vein Segmentation with U-Net Variants
by Szidónia Lefkovits, Simina Emerich and László Lefkovits
Mathematics 2022, 10(15), 2620; https://doi.org/10.3390/math10152620 - 27 Jul 2022
Cited by 2 | Viewed by 1506
Abstract
The identification of vascular network structures is one of the key fields of research in medical imaging. The segmentation of dorsal hand vein patterns from NIR images is not only the basis for reliable biometric identification but would also provide a significant tool for assisting medical interventions. Precise vein extraction would help medical workers to determine the exact needle entry point to gain intravenous access efficiently for different clinical purposes, such as intravenous therapy, parenteral nutrition, and blood analysis. It would also eliminate repeated needle pricks and could even facilitate an automatic injection procedure in the near future. In this paper, we present a combination of unsupervised and supervised dorsal hand vein segmentation from near-infrared images in the NCUT database. This approach is convenient given the lack of expert annotations in publicly available vein image databases. The novelty of our work is the automatic extraction of the veins in two phases. First, a geometrical approach identifies tubular structures corresponding to veins in the image. This step is considered gross segmentation and provides labels (Label I) for the second, CNN-based segmentation phase. We visually observed that different CNNs obtain better segmentations on the test set, which is why we built an ensemble segmentor based on majority voting over nine network architectures (U-Net, U-Net++ and U-Net3+, each trained with BCE, Dice and focal losses). The segmentation result of the ensemble is considered the second label (Label II). In our opinion, the new Label II is a better annotation of the NCUT database than the Label I obtained in the first step. The performance of computer vision algorithms based on artificial intelligence is determined by the quality and quantity of the labeled data used. We support this statement by training ResNet–UNet in the same manner with the two different label sets. In our experiments, the Dice scores, sensitivity and specificity of ResNet–UNet trained on Label II are superior to those of the same classifier trained on Label I. The measured Dice score of ResNet–UNet on the test set increases from 90.65% to 95.11%. It is worth mentioning that this article is one of very few in the domain of dorsal hand vein segmentation; moreover, it presents a general pipeline that may be applied to different medical image segmentation tasks.
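A minimal sketch of pixel-wise majority voting over the nine segmentors' binary masks:

```python
import numpy as np

def majority_vote(masks: np.ndarray) -> np.ndarray:
    """Combine binary vein masks by pixel-wise majority voting.
    masks: (n_models, H, W) array of {0, 1} predictions -> (H, W) mask."""
    votes = masks.sum(axis=0)                                # number of models voting "vein"
    return (votes > masks.shape[0] / 2).astype(np.uint8)     # strict majority
```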

18 pages, 2575 KiB  
Article
TransMF: Transformer-Based Multi-Scale Fusion Model for Crack Detection
by Xiaochen Ju, Xinxin Zhao and Shengsheng Qian
Mathematics 2022, 10(13), 2354; https://doi.org/10.3390/math10132354 - 05 Jul 2022
Cited by 8 | Viewed by 2097
Abstract
Cracks are widespread in infrastructure closely related to human activity. Using artificial intelligence to detect cracks automatically, known as crack detection, has become very popular. Background noise in crack images, the discontinuity of cracks, and other problems make crack detection a huge challenge. Although many approaches have been proposed, two challenges remain: (1) cracks are long and complex in shape, making it difficult to capture long-range continuity; (2) most images in crack datasets contain noise, and it is difficult to detect only the cracks while ignoring the noise. In this paper, we propose a novel method called the Transformer-based Multi-scale Fusion Model (TransMF) for crack detection, comprising an Encoder Module (EM), a Decoder Module (DM) and a Fusion Module (FM). The Encoder Module uses a hybrid of convolution blocks and Swin Transformer blocks to model the long-range dependencies of different parts of a crack image from local and global perspectives. The Decoder Module has a structure symmetrical to that of the Encoder Module. In the Fusion Module, the outputs of each layer of the Encoder Module and Decoder Module, at their unique scales, are fused by convolution, which reduces the effect of background noise and strengthens the correlations between relevant context to enhance crack detection. Finally, the outputs of each layer of the Fusion Module are concatenated to produce the crack detection result. Extensive experiments on three benchmark datasets (CrackLS315, CRKWH100 and DeepCrack) demonstrate that the proposed TransMF exceeds the best performance of present baselines.
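One plausible form of the per-scale fusion step is sketched below: same-scale encoder and decoder feature maps are concatenated and fused by convolution. The kernel sizes and single-channel output are assumptions, not the exact Fusion Module design.

```python
import torch
import torch.nn as nn

class LayerFusion(nn.Module):
    """Hedged sketch of fusing same-scale encoder and decoder features by convolution."""

    def __init__(self, channels: int):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, 1, kernel_size=1),   # per-scale crack response map
        )

    def forward(self, enc_feat: torch.Tensor, dec_feat: torch.Tensor):
        return self.fuse(torch.cat([enc_feat, dec_feat], dim=1))
```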

17 pages, 680 KiB  
Article
Multimedia Applications Processing and Computation Resource Allocation in MEC-Assisted SIoT Systems with DVS
by Xianwei Li, Guolong Chen, Liang Zhao and Bo Wei
Mathematics 2022, 10(9), 1593; https://doi.org/10.3390/math10091593 - 07 May 2022
Cited by 1 | Viewed by 1310
Abstract
Due to advancements in information technologies and the Internet of Things (IoT), the number of distributed sensors and IoT devices in social IoT (SIoT) systems is proliferating. This has led to various multimedia applications, such as face recognition and augmented reality (AR). These applications are computation-intensive and delay-sensitive and have become popular in our daily life. However, IoT devices are well known for their constrained computational resources, which hinders the execution of these applications. Mobile edge computing (MEC) has emerged as a promising paradigm to solve this issue. Migrating the applications of IoT devices to the edge cloud for execution not only provides computational resources to process these applications but also lowers the transmission latency between the IoT devices and the edge cloud. In this paper, computation resource allocation and multimedia application offloading in MEC-assisted SIoT systems are investigated. We aim to optimize resource allocation and application offloading by jointly minimizing the execution latency of multimedia applications and the energy consumed by IoT devices. The studied problem is formulated as a total computation overhead minimization problem that optimizes the computational resources of the edge servers. In addition, as dynamic voltage scaling (DVS) offers more flexibility for MEC system design, we incorporate it into the application offloading. Since the studied problem is a mixed-integer nonlinear programming (MINP) problem, an efficient method is proposed to address it. Theoretical analysis and simulation results demonstrate that, compared with baseline schemes, the proposed multimedia application offloading method improves the performance of MEC-assisted SIoT systems in most cases.

13 pages, 1609 KiB  
Article
A Novel Method of Chinese Herbal Medicine Classification Based on Mutual Learning
by Meng Han, Jilin Zhang, Yan Zeng, Fei Hao and Yongjian Ren
Mathematics 2022, 10(9), 1557; https://doi.org/10.3390/math10091557 - 05 May 2022
Cited by 2 | Viewed by 3231
Abstract
Chinese herbal medicine classification is an important research task in intelligent medicine and has been applied widely in fields such as smart medicinal material sorting and medicinal material recommendation. However, most current mainstream methods are semi-automatic, with low efficiency and poor performance. To tackle this problem, a novel Chinese herbal medicine classification method based on mutual learning is proposed. Specifically, two small student networks are designed for collaborative learning, each of which collects knowledge learned from the other. Consequently, the student networks obtain rich and reliable features, which further improves the performance of Chinese herbal medicine classification. To validate the performance of the proposed model, a dataset with 100 Chinese herbal classes (about 10,000 samples) was utilized and extensive experiments were performed. Experimental results verify that the proposed method is superior to the latest models with equivalent or even fewer parameters, obtaining a 3∼5.4% higher accuracy rate and a 13∼37% lower loss. Moreover, the mutual learning model achieves 80.8% Chinese herbal medicine classification accuracy.
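The sketch below illustrates a deep-mutual-learning objective in the spirit of this abstract: each student minimises its own cross-entropy plus a KL term toward the other student's softened predictions. The temperature, equal weighting, and use of detach are assumptions.

```python
import torch.nn.functional as F

def mutual_learning_losses(logits_a, logits_b, targets, t=1.0):
    """Hedged sketch of mutual learning between two student networks; returns one loss per student."""
    ce_a = F.cross_entropy(logits_a, targets)
    ce_b = F.cross_entropy(logits_b, targets)
    # each student mimics the other's (detached) prediction distribution
    kl_a = F.kl_div(F.log_softmax(logits_a / t, dim=1),
                    F.softmax(logits_b.detach() / t, dim=1), reduction="batchmean")
    kl_b = F.kl_div(F.log_softmax(logits_b / t, dim=1),
                    F.softmax(logits_a.detach() / t, dim=1), reduction="batchmean")
    return ce_a + kl_a, ce_b + kl_b
```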

17 pages, 4168 KiB  
Article
A Medical Endoscope Image Enhancement Method Based on Improved Weighted Guided Filtering
by Guo Zhang, Jinzhao Lin, Enling Cao, Yu Pang and Weiwei Sun
Mathematics 2022, 10(9), 1423; https://doi.org/10.3390/math10091423 - 23 Apr 2022
Cited by 9 | Viewed by 2335
Abstract
In clinical surgery, the quality of endoscopic images is degraded by noise. Blood, illumination changes, specular reflection, smoke, and other factors contribute to noise, which reduces image quality in occluded areas, affects doctors' judgment, prolongs the operation, and increases operative risk. In this study, we propose an improved weighted guided filtering algorithm to enhance tissue in endoscopic images. An unsharp mask algorithm and an improved weighted guided filter are used to enhance vessel details and contours in endoscopic images. The complete endoscopic image processing scheme, which includes detail enhancement, contrast enhancement, brightness enhancement, and highlight-area removal, is presented. Compared with other algorithms, the proposed algorithm maintains edges and reduces halos efficiently, and its effectiveness is demonstrated experimentally. The peak signal-to-noise ratio and structural similarity of endoscopic images obtained using the proposed algorithm were the highest, and the foreground–background detail variance–background variance measure also improved. The proposed algorithm has a strong ability to suppress noise and maintains the structure of the original endoscopic images, improving the details of tissue blood vessels. The findings of this study can provide guidelines for developing endoscopy devices.
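A minimal sketch of the unsharp-mask step is shown below, using a Gaussian blur as the smooth base layer; the paper instead uses an improved weighted guided filter as the edge-preserving smoother, which better maintains edges and reduces halos.

```python
import cv2
import numpy as np

def unsharp_mask(image: np.ndarray, blur_ksize=(5, 5), amount=1.5) -> np.ndarray:
    """Hedged sketch of unsharp masking for detail enhancement."""
    base = cv2.GaussianBlur(image, blur_ksize, 0)                 # smooth base layer
    # output = image + amount * (image - base), with saturation handled by OpenCV
    return cv2.addWeighted(image, 1.0 + amount, base, -amount, 0)
```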

17 pages, 5410 KiB  
Article
Scale and Background Aware Asymmetric Bilateral Network for Unconstrained Image Crowd Counting
by Gang Lv, Yushan Xu, Zuchang Ma, Yining Sun and Fudong Nian
Mathematics 2022, 10(7), 1053; https://doi.org/10.3390/math10071053 - 25 Mar 2022
Viewed by 1650
Abstract
This paper addresses two challenging problems of image-based crowd counting: scale variation and complex backgrounds. To that end, we present a novel crowd counting method, called the Scale and Background aware Asymmetric Bilateral Network (SBAB-Net), which handles scale variation and background noise in a unified framework. Specifically, the proposed SBAB-Net contains three main components: a pre-trained backbone convolutional neural network (CNN) as the feature extractor and two asymmetric branches that generate a density map. The two asymmetric branches have different structures and use features from different semantic layers. One branch is a densely connected stacked dilated convolution (DCSDC) sub-network with different dilation rates, which relies on a deep feature layer and can handle scale variation. The other branch is a parameter-free densely connected stacked pooling (DCSP) sub-network with various pooling kernels and strides, which relies on shallow features and can fuse features with several receptive fields to reduce the impact of background noise. The two sub-networks are fused by an attention mechanism to generate the final density map. Extensive experimental results on three widely used benchmark datasets demonstrate the effectiveness and superiority of the proposed method: (1) we achieve competitive counting performance compared to state-of-the-art methods; (2) compared with the baseline, the MAE and MSE are decreased by at least 6.3% and 11.3%, respectively.
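The sketch below shows one plausible form of a densely connected stack of dilated convolutions in the spirit of the DCSDC branch; the dilation rates, growth width, and ReLU activations are assumptions.

```python
import torch
import torch.nn as nn

class DenseDilatedBranch(nn.Module):
    """Hedged sketch: each layer sees the concatenation of all previous outputs
    and uses a growing dilation rate to enlarge the receptive field."""

    def __init__(self, in_channels: int, growth: int = 64, rates=(1, 2, 4, 8)):
        super().__init__()
        self.layers = nn.ModuleList()
        channels = in_channels
        for r in rates:
            self.layers.append(nn.Sequential(
                nn.Conv2d(channels, growth, kernel_size=3, padding=r, dilation=r),
                nn.ReLU(inplace=True),
            ))
            channels += growth   # dense connectivity grows the input width

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = [x]
        for layer in self.layers:
            feats.append(layer(torch.cat(feats, dim=1)))
        return torch.cat(feats[1:], dim=1)   # concatenated multi-scale features
```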
