This is an early access version, the complete PDF, HTML, and XML versions will be available soon.
Article

DS2 Attention: Dual-Stream Segmented Information Propagating Linear Attention for Vision Transformers

Department of Computer Science and Engineering, University of Bridgeport, Bridgeport, CT 06604, USA
*
Author to whom correspondence should be addressed.
AI 2026, 7(6), 188; https://doi.org/10.3390/ai7060188
Submission received: 6 April 2026 / Revised: 13 May 2026 / Accepted: 18 May 2026 / Published: 24 May 2026

Abstract

While Vision Transformers (ViTs) have achieved state-of-the-art (SOTA) results in visual recognition, their scalability remains fundamentally constrained by the quadratic complexity of global self-attention. To address this, we present a linear-complexity attention design that employs dual-stream information propagation to enhance representational efficiency and structured feature aggregation. Our proposed DS2 attention acts as a versatile replacement for standard attention in various SOTA designs, such as Tokens-to-Token (T2T) and FasterViT. In our design, half of the attention heads perform left-to-right segmented information propagation in a Perceiver-style manner, while the other half perform right-to-left propagation. This bidirectional structured attention enables efficient long-range dependency modeling without the overhead of full global attention. To improve classification performance, we introduce a segment-level classification strategy in which each segment is associated with a summary token. The final prediction is produced via cross-attention between image tokens and these summary tokens, enabling hierarchical semantic comprehension. Extensive experiments demonstrate that the proposed attention design achieves, on average, 0.3% higher accuracy on the ImageNet-1K dataset, while offering improved information flow and higher efficiency across SOTA Vision Transformer designs.
Keywords: computer vision; vision transformer; efficient attention; information propagation
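The abstract describes the mechanism only at a high level, so the following is a minimal illustrative sketch and not the authors' implementation. It assumes a single carried summary vector per stream, identity query/key/value projections, and two "heads" obtained by splitting the feature dimension in half; the names `attend`, `propagate`, and `dual_stream_attention` are hypothetical. Each token attends only to its own segment plus the carried summary, so the cost grows linearly with the number of tokens for a fixed segment size.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


def propagate(tokens, segment_size, reverse=False):
    """One stream: visit segments in order and carry a summary vector along.

    Assumption: the carried state is one vector that reads from each segment
    (Perceiver-style latent) and is exposed to that segment's tokens as an
    extra key/value. No learned projections are used in this sketch.
    """
    n = len(tokens)
    segments = [list(range(s, min(s + segment_size, n)))
                for s in range(0, n, segment_size)]
    if reverse:
        segments.reverse()  # right-to-left stream
    carry = [0.0] * len(tokens[0])
    out = [None] * n
    for idx in segments:
        context = [tokens[i] for i in idx] + [carry]
        # Tokens see their own segment plus the summary of earlier segments.
        for i in idx:
            out[i] = attend(tokens[i], context, context)
        # The summary is refreshed from the current segment before moving on.
        carry = attend(carry, context, context)
    return out


def dual_stream_attention(tokens, segment_size):
    """Half of the features go left-to-right, the other half right-to-left."""
    half = len(tokens[0]) // 2
    fwd = propagate([t[:half] for t in tokens], segment_size, reverse=False)
    bwd = propagate([t[half:] for t in tokens], segment_size, reverse=True)
    return [f + b for f, b in zip(fwd, bwd)]


if __name__ == "__main__":
    toy = [[math.sin(i + j) for j in range(4)] for i in range(8)]
    result = dual_stream_attention(toy, segment_size=4)
    print(len(result), len(result[0]))
```

In this sketch, information from a late segment can reach an early token only through the right-to-left stream, and vice versa, which is the directional behavior the abstract attributes to the two groups of heads.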

Share and Cite

MDPI and ACS Style

Mahmood, R.; Patel, S.; Elleithy, K. DS2 Attention: Dual-Stream Segmented Information Propagating Linear Attention for Vision Transformers. AI 2026, 7, 188. https://doi.org/10.3390/ai7060188

AMA Style

Mahmood R, Patel S, Elleithy K. DS2 Attention: Dual-Stream Segmented Information Propagating Linear Attention for Vision Transformers. AI. 2026; 7(6):188. https://doi.org/10.3390/ai7060188

Chicago/Turabian Style

Mahmood, Rigel, Sarosh Patel, and Khaled Elleithy. 2026. "DS2 Attention: Dual-Stream Segmented Information Propagating Linear Attention for Vision Transformers" AI 7, no. 6: 188. https://doi.org/10.3390/ai7060188

APA Style

Mahmood, R., Patel, S., & Elleithy, K. (2026). DS2 Attention: Dual-Stream Segmented Information Propagating Linear Attention for Vision Transformers. AI, 7(6), 188. https://doi.org/10.3390/ai7060188
