DS² Attention: Dual-Stream Segmented Information Propagating Linear Attention for Vision Transformers

Mahmood, Rigel; Patel, Sarosh; Elleithy, Khaled

doi:10.3390/ai7060188

This is an early access version; the complete PDF, HTML, and XML versions will be available soon.

Open Access Article


by Rigel Mahmood, Sarosh Patel and Khaled Elleithy *

Department of Computer Science and Engineering, University of Bridgeport, Bridgeport, CT 06604, USA

* Author to whom correspondence should be addressed.

AI 2026, 7(6), 188; https://doi.org/10.3390/ai7060188

Submission received: 6 April 2026 / Revised: 13 May 2026 / Accepted: 18 May 2026 / Published: 24 May 2026


Abstract

While Vision Transformers (ViTs) have achieved state-of-the-art (SOTA) results in visual recognition, their scalability remains fundamentally constrained by the quadratic complexity of global self-attention. To address this, we present a linear-complexity attention design that employs dual-stream information propagation to enhance representational efficiency and structured feature aggregation. Our proposed DS² attention acts as a versatile replacement for standard attention in various SOTA designs, such as Tokens-to-Token (T2T) and FasterViT. In our design, half of the attention heads perform left-to-right segmented information propagation in a Perceiver-style manner, while the remaining half perform right-to-left propagation. This bidirectional structured attention enables efficient long-range dependency modeling without the overhead of full global attention. To improve classification performance, we introduce a segment-level classification strategy in which each segment is associated with a summary token. The final prediction is produced via cross-attention between image tokens and these summary tokens, enabling hierarchical semantic comprehension. Extensive experiments demonstrate that the proposed attention design achieves, on average, 0.3% higher accuracy on the ImageNet-1K dataset, while offering improved information flow and higher efficiency across SOTA Vision Transformer designs.
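The abstract describes the dual-stream head split only at a high level. The following NumPy sketch is one hypothetical reading of it, not the authors' implementation. It assumes fixed-length segments, a small set of carried latent tokens that are refreshed Perceiver-style by cross-attending to each segment, and random projection matrices standing in for learned weights. The function and parameter names (`propagate`, `ds2_like_attention`, `seg_len`, `num_latents`) are illustrative only. Because each segment attends to at most L + M keys (L tokens per segment, M carried latents), the cost of one layer grows as O(N(L + M)) in the number of tokens N, which is linear for fixed L and M.

```python
import numpy as np


def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def attend(q, k, v):
    # Scaled dot-product attention: q (a, d), k and v (b, d) -> (a, d).
    return softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v


def propagate(q, k, v, seg_len, latents, reverse=False):
    """One head of a HYPOTHETICAL segmented propagation stream.

    Tokens in each segment attend to their own segment plus a few carried
    latent tokens; the latents are then refreshed by cross-attending to that
    same context (Perceiver-style) and handed to the next segment.
    """
    starts = list(range(0, q.shape[0], seg_len))
    if reverse:  # right-to-left stream visits segments in reverse order
        starts = starts[::-1]
    out = np.zeros_like(v)
    mem = latents
    for s in starts:
        seg = slice(s, s + seg_len)
        k_ctx = np.concatenate([mem, k[seg]])  # (M + L, d) keys
        v_ctx = np.concatenate([mem, v[seg]])  # (M + L, d) values
        out[seg] = attend(q[seg], k_ctx, v_ctx)
        mem = attend(mem, k_ctx, v_ctx)  # carried state for the next segment
    return out


def ds2_like_attention(x, num_heads, seg_len, num_latents=4, seed=0):
    """HYPOTHETICAL dual-stream layer: the first half of the heads runs
    left-to-right, the second half right-to-left. Random projections and
    random latents stand in for learned parameters (illustration only)."""
    rng = np.random.default_rng(seed)
    dim = x.shape[1]
    dh = dim // num_heads
    wq, wk, wv = (rng.standard_normal((dim, dim)) / np.sqrt(dim) for _ in range(3))
    q, k, v = x @ wq, x @ wk, x @ wv
    heads = []
    for h in range(num_heads):
        cols = slice(h * dh, (h + 1) * dh)
        latents = rng.standard_normal((num_latents, dh))
        reverse = h >= num_heads // 2
        heads.append(propagate(q[:, cols], k[:, cols], v[:, cols], seg_len, latents, reverse))
    return np.concatenate(heads, axis=1)


# 196 tokens (a 14 x 14 patch grid) with 64 channels, split into 4 segments of 49.
x = np.random.default_rng(1).standard_normal((196, 64))
y = ds2_like_attention(x, num_heads=4, seg_len=49)
print(y.shape)  # (196, 64)
```

In this reading, the first segment of a left-to-right head never sees later segments, while the same segment in a right-to-left head receives information carried back from the end of the sequence, so combining both streams gives every token access to context on both sides. The sketch omits the segment-level summary tokens and the cross-attention classification head described in the abstract, and its sequential loop over segments is written for clarity rather than speed.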

Keywords: computer vision; vision transformer; efficient attention; information propagation


MDPI and ACS Style

Mahmood, R.; Patel, S.; Elleithy, K. DS² Attention: Dual-Stream Segmented Information Propagating Linear Attention for Vision Transformers. AI 2026, 7, 188. https://doi.org/10.3390/ai7060188

