TLFormer: Scalable Taylor Linear Attention in Transformer for Collaborative Filtering
Abstract
1. Introduction
2. Preliminaries
2.1. Problem Formulation
2.2. Transformer Architecture
3. The Proposed TLFormer
3.1. Representation Vector Lookup
3.2. Spatial Structural Information
3.3. Taylor Linear Attention
- Matrix Notation for Linear Attention. For practical implementation, we express our approach in matrix notation. The token embeddings of all users and items are concatenated into a single embedding matrix. Following standard attention mechanisms, we compute the query and key matrices through linear projections, each followed by Frobenius normalization (i.e., division by the matrix's Frobenius norm). Motivated by the empirical findings of LightGCN [17], we omit feature transformation on node embeddings and use the embeddings directly as the value matrix. This not only simplifies the model architecture but also improves performance empirically. The linear attention is then computed in two steps: the keys and values are first aggregated into a key–value product whose size depends only on the embedding dimension, and this product is then multiplied by the queries. Because the full attention matrix over all users and items is never materialized, this ordering keeps both time and space complexity linear in the number of users and items.
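As a rough illustration of the two-step computation, the NumPy sketch below assumes a generic first-order Taylor expansion of the exponential attention kernel, exp(q·k) ≈ 1 + q·k, with the value matrix equal to the input embeddings. The exact TLFormer equations are not reproduced in this text, so the function names and the random projection matrices `W_q` and `W_k` are hypothetical. A quadratic reference version is included only to check that reordering the products gives the same result.

```python
import numpy as np


def frobenius_normalize(M):
    # Divide the whole matrix by its Frobenius norm (not row-wise).
    return M / np.linalg.norm(M, "fro")


def taylor_linear_attention(X, W_q, W_k):
    """Linear-time attention with exp(q.k) approximated by 1 + q.k (assumed form).

    X        : (n, d) stacked user and item embeddings
    W_q, W_k : (d, d) query/key projections (hypothetical, for illustration)
    The value matrix is X itself, i.e. no value projection.
    """
    n = X.shape[0]
    Q = frobenius_normalize(X @ W_q)
    K = frobenius_normalize(X @ W_k)
    V = X
    # Step 1: aggregate keys with values once; cost O(n d^2), result is (d, d)
    kv = K.T @ V
    k_sum = K.sum(axis=0)                 # (d,)
    # Step 2: every query reads the aggregate; still O(n d^2)
    numer = V.sum(axis=0) + Q @ kv        # row i: sum_j (1 + q_i.k_j) v_j
    denom = n + Q @ k_sum                 # row i: sum_j (1 + q_i.k_j)
    return numer / denom[:, None]


def quadratic_reference(X, W_q, W_k):
    """Same attention with an explicit n x n weight matrix, O(n^2 d)."""
    Q = frobenius_normalize(X @ W_q)
    K = frobenius_normalize(X @ W_k)
    A = 1.0 + Q @ K.T                     # first-order Taylor weights
    return (A @ X) / A.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
n_users, n_items, d = 5, 7, 4
X = rng.normal(size=(n_users + n_items, d))
W_q, W_k = rng.normal(size=(d, d)), rng.normal(size=(d, d))
print(np.allclose(taylor_linear_attention(X, W_q, W_k),
                  quadratic_reference(X, W_q, W_k)))  # True
```

For n users and items, computing `K.T @ V` first costs O(n d²) time and O(d²) extra memory. The reference version instead builds an n × n weight matrix at O(n² d) cost. Frobenius normalization also bounds every row of Q and K to norm at most 1. As a result, |q_i·k_j| ≤ 1 and every first-order weight 1 + q_i·k_j is nonnegative.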
3.4. Model Training
3.5. Complexity Analysis
| Algorithm 1. Pseudocode of TLFormer |
|---|
| Input: user–item interaction matrix, embedding dimension d, balance weight, query matrix, key matrix |
| Output: node representation matrix |
4. Experiment
4.1. Experimental Settings
- Pinterest [56]: A large-scale image recommendation dataset collected from the Pinterest social media platform, containing implicit user–image interactions derived from user behavior.
- Amazon Datasets (https://jmcauley.ucsd.edu/data/amazon/ accessed on 15 March 2024): Three large-scale product review datasets from Amazon’s e-commerce platform. We select the Movies, Kindle, and CDs categories for their substantial size and diverse interaction patterns.
- Baselines. We compare TLFormer against a diverse set of established recommendation algorithms, including traditional matrix factorization approaches, generative models, GNN-based models, and recent self-supervised learning methods.
- BPRMF [42] is a classic matrix factorization model trained with negative sampling; it optimizes a pairwise ranking loss derived from maximum a posteriori estimation.
- RecVAE [43] builds upon variational autoencoders (VAEs) and enhances Mult-VAE with a new approach to setting the β hyperparameter of β-VAE, a composite prior for the latent codes, and an alternating update training strategy.
- LightGCN [17] is a simplified GCN that removes activation functions and feature transformation to facilitate large-scale recommender systems.
- CLRec [58] is a contrastive learning method that adopts InfoNCE loss to reduce exposure bias through inverse propensity weighting in the recommender systems.
- SGL [26] applies contrastive learning to recommender systems by generating multiple graph views through edge dropping, node dropping, and random walks, thereby enhancing the robustness of learned user and item representations.
- CONet [59] captures both local and nonlocal messages in graphs by performing k-means clustering on nodes’ GNN embeddings to obtain graph-level representations (e.g., centroids).
- DirectAU [55] introduces the concepts of alignment and uniformity into recommender systems and removes the need for negative sampling. It pulls the representations of interacting user–item pairs closer (alignment), while making the distributions of users and of items within the same batch more uniform (uniformity).
- MGFormer [21] captures all-pair interactions among nodes with linear complexity and incorporates learnable relative degree information to reweight the attention scores.
- BIGCF [13] revisits user–item interactions from a causal perspective by disentangling them into collective and individual intents, and reconstructs the interaction graph through a bilateral intent modeling framework grounded in generative graph learning.
- Evaluation Metrics. We employ two standard metrics in recommender systems: Recall and Normalized Discounted Cumulative Gain (NDCG) at cutoffs K ∈ {10, 20, 50}. NDCG@20 serves as the validation metric for model selection during training [57]. Each dataset is split into training, validation, and test sets with a ratio of 8:1:1. To ensure a comprehensive evaluation, we adopt the full-ranking protocol, in which the model ranks all items for each user rather than a sampled subset. For a fair comparison, all baselines are tuned to their best performance over multiple experimental runs, and their best results are reported.
- Experimental Setup. For a fair comparison, we implement all methods with the RecBole [57] framework. We optimize with Adam [60] at a learning rate of 1 × 10⁻³ for at most 300 epochs. To prevent overfitting, we use early stopping with a patience of 10 epochs. All models use an embedding dimension of 64 or 128 and a batch size of 1024. For TLFormer, the default encoder is a linear transformation that maps user/item IDs to their vectors in the embedding table. We also extensively tune TLFormer's trade-off weight between the uniformity and alignment objectives. For the baselines, we follow the hyperparameter settings specified in the original papers [13,17,21,26,42,43,55,58,59]. All parameters are initialized with Xavier initialization [61].
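The full-ranking metrics can be sketched for a single user as follows. This assumes binary relevance, the common practice of excluding training interactions from the ranking, and the usual 1/log2(rank + 1) discount. RecBole's actual implementation may differ in details such as tie handling, and the function name is hypothetical.

```python
import numpy as np


def recall_ndcg_at_k(scores, test_items, train_items, k):
    """Recall@K and NDCG@K for one user under full ranking (illustrative sketch).

    scores      : (num_items,) predicted scores for every item
    test_items  : set of held-out positive item indices
    train_items : set of item indices seen in training (masked out)
    """
    scores = np.asarray(scores, dtype=float).copy()
    # Full ranking: score every item, but never recommend already-seen items
    if train_items:
        scores[list(train_items)] = -np.inf
    top_k = np.argsort(-scores)[:k]
    hits = np.array([item in test_items for item in top_k], dtype=float)

    recall = hits.sum() / len(test_items)
    # Binary-relevance DCG: a hit at 1-based rank r contributes 1 / log2(r + 1)
    discounts = 1.0 / np.log2(np.arange(1, k + 1) + 1)
    dcg = float((hits * discounts[: len(hits)]).sum())
    # Ideal DCG places all held-out items at the top of the list
    idcg = float(discounts[: min(len(test_items), k)].sum())
    return float(recall), dcg / idcg
```

For example, with item 0 seen in training, items {2, 3} held out, and K = 3, the top-3 list becomes [2, 4, 3]. This gives Recall@3 = 1.0 and NDCG@3 = 1.5 / (1 + 1/log2 3).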
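The trade-off weight between alignment and uniformity suggests an objective in the style of DirectAU [55]. The sketch below assumes that form: alignment is the mean squared distance between L2-normalized embeddings of interacting user–item pairs, and uniformity is the log of the mean Gaussian potential exp(−2‖x − y‖²) over distinct pairs in a batch. Whether TLFormer uses exactly this form is an assumption, and the function names and `gamma` are hypothetical.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    # Project each row onto the unit hypersphere
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def alignment(user_emb, item_emb):
    # Row i of user_emb and item_emb form one observed (positive) interaction
    u, i = l2_normalize(user_emb), l2_normalize(item_emb)
    return float(np.mean(np.sum((u - i) ** 2, axis=1)))


def uniformity(emb, t=2.0):
    # log E[exp(-t * ||x - y||^2)] over all distinct pairs in the batch
    x = l2_normalize(emb)
    sq_dist = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    upper = np.triu_indices(len(x), k=1)
    return float(np.log(np.mean(np.exp(-t * sq_dist[upper]))))


def au_loss(user_emb, item_emb, gamma):
    # gamma: hypothetical name for the alignment/uniformity trade-off weight
    unif = (uniformity(user_emb) + uniformity(item_emb)) / 2
    return alignment(user_emb, item_emb) + gamma * unif
```

As a sanity check, orthonormal embeddings have pairwise squared distance 2, so their uniformity is log(exp(−4)) = −4. Identical user and item embeddings give zero alignment loss.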
4.2. Accuracy Evaluation
4.3. Efficiency Comparison
4.4. Convergence Analysis
4.5. Noise Robustness Analysis
4.6. Effectiveness in Long-Tail Recommendation
4.7. Ablation Study
4.8. Parameter Sensitivity
5. Related Work
5.1. GNNs for Recommender Systems
5.2. Graph Transformers
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Peng, J.; Gong, J.; Zhou, C.; Zang, Q.; Fang, X.; Yang, K.; Yu, J. Kgcfrec: Improving collaborative filtering recommendation with knowledge graph. Electronics 2024, 13, 1927.
- Li, P.; Zhan, W.; Gao, L.; Wang, S.; Yang, L. Multimodal Recommendation System Based on Cross Self-Attention Fusion. Systems 2025, 13, 57.
- Wang, L.; Jin, D. A time-sensitive graph neural network for session-based new item recommendation. Electronics 2024, 13, 223.
- Lu, H.; Chen, Z. SocialJGCF: Social Recommendation with Jacobi Polynomial-Based Graph Collaborative Filtering. Appl. Sci. 2024, 14, 12070.
- Duan, Z.; Wang, C.; Zhong, W. Ssgcl: Simple social recommendation with graph contrastive learning. Mathematics 2024, 12, 1107.
- Chen, H.; Lin, Y.; Pan, M.; Wang, L.; Yeh, C.C.M.; Li, X.; Zheng, Y.; Wang, F.; Yang, H. Denoising self-attentive sequential recommendation. In Proceedings of the 16th ACM Conference on Recommender Systems, Seattle, WA, USA, 18–23 September 2022; pp. 92–101.
- Chen, H.; Li, J. Adversarial tensor factorization for context-aware recommendation. In Proceedings of the 13th ACM Conference on Recommender Systems, Copenhagen, Denmark, 16–20 September 2019; pp. 363–367.
- Bawack, R.E.; Wamba, S.F.; Carillo, K.D.A.; Akter, S. Artificial intelligence in E-Commerce: A bibliometric study and literature review. Electron. Mark. 2022, 32, 297–338.
- Goyani, M.; Chaurasiya, N. A review of movie recommendation system: Limitations, Survey and Challenges. ELCVIA Electron. Lett. Comput. Vis. Image Anal. 2020, 19, 18–37.
- Schedl, M.; Knees, P.; McFee, B.; Bogdanov, D. Music recommendation systems: Techniques, use cases, and challenges. In Recommender Systems Handbook; Springer: Berlin/Heidelberg, Germany, 2021; pp. 927–971.
- Li, Z.; Liu, F.; Wei, Y.; Cheng, Z.; Nie, L.; Kankanhalli, M.S. Attribute-driven Disentangled Representation Learning for Multimodal Recommendation. In Proceedings of the 32nd ACM International Conference on Multimedia; ACM: New York, NY, USA, 2024; pp. 9660–9669.
- Lin, Z.; Tian, C.; Hou, Y.; Zhao, W.X. Improving graph collaborative filtering with neighborhood-enriched contrastive learning. In Proceedings of the ACM Web Conference 2022, Lyon, France, 25–29 April 2022; pp. 2320–2329.
- Zhang, Y.; Sang, L.; Zhang, Y. Exploring the individuality and collectivity of intents behind interactions for graph collaborative filtering. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, Washington, DC, USA, 14–18 July 2024; pp. 1253–1262.
- Liu, F.; Zhao, S.; Cheng, Z.; Nie, L.; Kankanhalli, M. Cluster-based graph collaborative filtering. ACM Trans. Inf. Syst. 2024, 42, 1–24.
- Kipf, T.N.; Welling, M. Semi-supervised classification with graph convolutional networks. arXiv 2016, arXiv:1609.02907.
- Wang, X.; He, X.; Wang, M.; Feng, F.; Chua, T.S. Neural graph collaborative filtering. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, Paris, France, 21–25 July 2019; pp. 165–174.
- He, X.; Deng, K.; Wang, X.; Li, Y.; Zhang, Y.; Wang, M. Lightgcn: Simplifying and powering graph convolution network for recommendation. In Proceedings of the SIGIR, Xi’an, China, 25–30 July 2020.
- Yang, X.; Chen, H.; Yan, Y.; Tang, Y.; Zhao, Y.; Xu, E.; Cai, Y.; Tong, H. SimCE: Simplifying Cross-Entropy Loss for Collaborative Filtering. arXiv 2024, arXiv:2406.16170.
- Cheng, Z.; Han, S.; Liu, F.; Zhu, L.; Gao, Z.; Peng, Y. Multi-behavior recommendation with cascading graph convolution networks. In Proceedings of the ACM Web Conference 2023, Austin, TX, USA, 30 April–4 May 2023; pp. 1181–1189.
- Huang, J.; Cao, Q.; Xie, R.; Zhang, S.; Xia, F.; Shen, H.; Cheng, X. Adversarial learning data augmentation for graph contrastive learning in recommendation. In Proceedings of the International Conference on Database Systems for Advanced Applications, Tianjin, China, 17–20 April 2023; Springer: New York, NY, USA, 2023; pp. 373–388.
- Chen, H.; Xu, Z.; Yeh, C.C.M.; Lai, V.; Zheng, Y.; Xu, M.; Tong, H. Masked Graph Transformer for Large-Scale Recommendation. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, Washington, DC, USA, 14–18 July 2024; pp. 2502–2506.
- Jain, K.; Jindal, R. Sampling and noise filtering methods for recommender systems: A literature review. Eng. Appl. Artif. Intell. 2023, 122, 106129.
- Wang, S.; Zhang, X.; Wang, Y.; Ricci, F. Trustworthy recommender systems. ACM Trans. Intell. Syst. Technol. 2024, 15, 84.
- Sreepada, R.S.; Patra, B.K. Enhancing long tail item recommendation in collaborative filtering: An econophysics-inspired approach. Electron. Commer. Res. Appl. 2021, 49, 101089.
- Xu, Z.; Chai, Z.; Xu, C.; Yuan, C.; Yang, H. Towards effective collaborative learning in long-tailed recognition. IEEE Trans. Multimed. 2023, 26, 3754–3764.
- Wu, J.; Wang, X.; Feng, F.; He, X.; Chen, L.; Lian, J.; Xie, X. Self-supervised graph learning for recommendation. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual, 11–15 July 2021; pp. 726–735.
- Rampášek, L.; Galkin, M.; Dwivedi, V.P.; Luu, A.T.; Wolf, G.; Beaini, D. Recipe for a general, powerful, scalable graph transformer. In Proceedings of the NeurIPS, New Orleans, LA, USA, 28 November–9 December 2022.
- Geisler, S.; Li, Y.; Mankowitz, D.J.; Cemgil, A.T.; Günnemann, S.; Paduraru, C. Transformers meet directed graphs. In Proceedings of the International Conference on Machine Learning, Honolulu, HI, USA, 23–29 July 2023.
- Ma, L.; Lin, C.; Lim, D.; Romero-Soriano, A.; Dokania, P.K.; Coates, M.; Torr, P.H.; Lim, S.N. Graph inductive biases in transformers without message passing. In Proceedings of the ICML, Honolulu, HI, USA, 23–29 July 2023.
- Shirzad, H.; Velingker, A.; Venkatachalam, B.; Sutherland, D.J.; Sinop, A.K. Exphormer: Sparse transformers for graphs. In Proceedings of the ICML, Honolulu, HI, USA, 23–29 July 2023.
- Ying, C.; Cai, T.; Luo, S.; Zheng, S.; Ke, G.; He, D.; Shen, Y.; Liu, T.Y. Do transformers really perform badly for graph representation? In Proceedings of the NeurIPS, Virtual, 6–14 December 2021.
- Liu, C.; Yao, Z.; Zhan, Y.; Ma, X.; Pan, S.; Hu, W. Gradformer: Graph Transformer with Exponential Decay. arXiv 2024, arXiv:2404.15729.
- Xing, Y.; Wang, X.; Li, Y.; Huang, H.; Shi, C. Less is more: On the over-globalizing problem in graph transformers. arXiv 2024, arXiv:2405.01102.
- Wu, Q.; Zhao, W.; Li, Z.; Wipf, D.P.; Yan, J. Nodeformer: A scalable graph structure learning transformer for node classification. In Proceedings of the NeurIPS, New Orleans, LA, USA, 28 November–9 December 2022.
- Liu, J.; Mao, Q.; Jiang, W.; Li, J. KnowFormer: Revisiting Transformers for Knowledge Graph Reasoning. arXiv 2024, arXiv:2409.12865.
- Dwivedi, V.P.; Luu, A.T.; Laurent, T.; Bengio, Y.; Bresson, X. Graph Neural Networks with Learnable Structural and Positional Representations. In Proceedings of the ICLR, Vienna, Austria, 3–7 May 2021.
- Choromanski, K.M.; Likhosherstov, V.; Dohan, D.; Song, X.; Gane, A.; Sarlos, T.; Hawkins, P.; Davis, J.Q.; Mohiuddin, A.; Kaiser, L.; et al. Rethinking Attention with Performers. In Proceedings of the ICLR, Vienna, Austria, 3–7 May 2021.
- Vaswani, A. Attention is all you need. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017.
- Wu, Q.; Zhao, W.; Yang, C.; Zhang, H.; Nie, F.; Jiang, H.; Bian, Y.; Yan, J. Simplifying and empowering transformers for large-graph representations. Adv. Neural Inf. Process. Syst. 2024, 36, 2826.
- Li, C.; Xia, L.; Ren, X.; Ye, Y.; Xu, Y.; Huang, C. Graph transformer for recommendation. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, Taipei, Taiwan, 23–27 July 2023; pp. 1680–1689.
- Koren, Y.; Bell, R.; Volinsky, C. Matrix factorization techniques for recommender systems. Computer 2009, 42, 30–37.
- Rendle, S.; Freudenthaler, C.; Gantner, Z.; Schmidt-Thieme, L. BPR: Bayesian personalized ranking from implicit feedback. arXiv 2012, arXiv:1205.2618.
- Shenbin, I.; Alekseev, A.; Tutubalina, E.; Malykh, V.; Nikolenko, S.I. RecVAE: A new variational autoencoder for top-N recommendations with implicit feedback. In Proceedings of the 13th International Conference on Web Search and Data Mining, Houston, TX, USA, 3–7 February 2020; pp. 528–536.
- Li, Z.; Guo, Y.; Wang, K.; Chen, X.; Nie, L.; Kankanhalli, M.S. Do Vision-Language Transformers Exhibit Visual Commonsense? An Empirical Study of VCR. In Proceedings of the 31st ACM International Conference on Multimedia; ACM: New York, NY, USA, 2023; pp. 5634–5644.
- Li, Z.; Guo, Y.; Wang, K.; Liu, F.; Nie, L.; Kankanhalli, M.S. Learning to Agree on Vision Attention for Visual Commonsense Reasoning. IEEE Trans. Multimed. 2024, 26, 1065–1075.
- OpenAI. GPT-4 Technical Report. arXiv 2024, arXiv:2303.08774.
- Koresh, E.; Gross, R.D.; Meir, Y.; Tzach, Y.; Halevi, T.; Kanter, I. Unified CNNs and transformers underlying learning mechanism reveals multi-head attention modus vivendi. Phys. A Stat. Mech. Its Appl. 2025, 666, 130529.
- Li, Z.; Guo, Y.; Wang, K.; Wei, Y.; Nie, L.; Kankanhalli, M.S. Joint Answering and Explanation for Visual Commonsense Reasoning. IEEE Trans. Image Process. 2023, 32, 3836–3846.
- Ba, J.L.; Kiros, J.R.; Hinton, G.E. Layer Normalization. arXiv 2016, arXiv:1607.06450.
- Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Hussain, M.S.; Zaki, M.J.; Subramanian, D. Global self-attention as a replacement for graph convolution. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, 14–18 August 2022; pp. 655–665.
- Liu, F.; Huang, X.; Chen, Y.; Suykens, J.A. Random features for kernel approximation: A survey on algorithms, theory, and beyond. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 7128–7148.
- Wang, T.; Isola, P. Understanding contrastive representation learning through alignment and uniformity on the hypersphere. In Proceedings of the International Conference on Machine Learning, PMLR, Virtual, 13–18 July 2020; pp. 9929–9939.
- Wang, C.; Yu, Y.; Ma, W.; Zhang, M.; Chen, C.; Liu, Y.; Ma, S. Towards representation alignment and uniformity in collaborative filtering. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, 14–18 August 2022; pp. 1816–1825.
- He, X.; Liao, L.; Zhang, H.; Nie, L.; Hu, X.; Chua, T.S. Neural collaborative filtering. In Proceedings of the 26th International Conference on World Wide Web, Perth, Australia, 3–7 April 2017; pp. 173–182.
- Zhao, W.X.; Mu, S.; Hou, Y.; Lin, Z.; Chen, Y.; Pan, X.; Li, K.; Lu, Y.; Wang, H.; Tian, C.; et al. Recbole: Towards a unified, comprehensive and efficient framework for recommendation algorithms. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management, Virtual, 1–5 November 2021; pp. 4653–4664.
- Zhou, C.; Ma, J.; Zhang, J.; Zhou, J.; Yang, H. Contrastive learning for debiased candidate generation in large-scale recommender systems. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, Singapore, 14–18 August 2021; pp. 3985–3995.
- Chen, H.; Yeh, C.C.M.; Wang, F.; Yang, H. Graph neural transport networks with non-local attentions for recommender systems. In Proceedings of the ACM Web Conference 2022, Lyon, France, 25–29 April 2022; pp. 1955–1964.
- Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv 2017, arXiv:1412.6980.
- Glorot, X.; Bengio, Y. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics. JMLR Workshop and Conference Proceedings, Sardinia, Italy, 13–15 May 2010; pp. 249–256.
- Luo, D.; Cheng, W.; Xu, D.; Yu, W.; Zong, B.; Chen, H.; Zhang, X. Parameterized explainer for graph neural network. Adv. Neural Inf. Process. Syst. 2020, 33, 19620–19631.
- Luo, D.; Zhao, T.; Cheng, W.; Xu, D.; Han, F.; Yu, W.; Liu, X.; Chen, H.; Zhang, X. Towards inductive and efficient explanations for graph neural networks. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 5245–5259.
- Chen, H.; Wang, L.; Lin, Y.; Yeh, C.C.M.; Wang, F.; Yang, H. Structured graph convolutional networks with stochastic masks for recommender systems. In Proceedings of the SIGIR, Virtual, 11–15 July 2021.
- Yu, J.; Yin, H.; Xia, X.; Chen, T.; Cui, L.; Nguyen, Q.V.H. Are graph augmentations necessary? simple graph contrastive learning for recommendation. In Proceedings of the 45th international ACM SIGIR conference on research and development in information retrieval, Padua, Italy, 13–18 July 2022; pp. 1294–1303.
- Wu, Y.; Zhang, L.; Mo, F.; Zhu, T.; Ma, W.; Nie, J.Y. Unifying Graph Convolution and Contrastive Learning in Collaborative Filtering. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Barcelona, Spain, 25–29 August 2024; pp. 3425–3436.
- Wu, W.; Wang, C.; Shen, D.; Qin, C.; Chen, L.; Xiong, H. AFDGCF: Adaptive Feature De-correlation Graph Collaborative Filtering for Recommendations. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, Washington, DC, USA, 14–18 July 2024; pp. 1242–1252.
- Zhao, J.; Wenjie, W.; Xu, Y.; Sun, T.; Feng, F.; Chua, T.S. Denoising diffusion recommender model. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, Washington, DC, USA, 14–18 July 2024; pp. 1370–1379.
- Chen, D.; O’Bray, L.; Borgwardt, K. Structure-aware transformer for graph representation learning. In Proceedings of the International Conference on Machine Learning; PMLR: Cambridge, MA, USA, 2022; pp. 3469–3489.
- Wu, Q.; Yang, C.; Zhao, W.; He, Y.; Wipf, D.; Yan, J. Difformer: Scalable (graph) transformers induced by energy constrained diffusion. arXiv 2023, arXiv:2301.09474.
- Qiu, Y.; Zhang, K.; Wang, C.; Luo, W.; Li, H.; Jin, Z. Mb-taylorformer: Multi-branch efficient transformer expanded by taylor formula for image dehazing. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 1–6 October 2023; pp. 12802–12813.
- Babiloni, F.; Marras, I.; Deng, J.; Kokkinos, F.; Maggioni, M.; Chrysos, G.; Torr, P.; Zafeiriou, S. Linear Complexity Self-Attention with 3rd-Order Polynomials. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 12726–12737.
- Katharopoulos, A.; Vyas, A.; Pappas, N.; Fleuret, F. Transformers are rnns: Fast autoregressive transformers with linear attention. In Proceedings of the International Conference on Machine Learning; PMLR: Cambridge, MA, USA, 2020; pp. 5156–5165.
- Nauen, T.C.; Palacio, S.; Dengel, A. Taylorshift: Shifting the complexity of self-attention from squared to linear (and back) using taylor-softmax. In Proceedings of the International Conference on Pattern Recognition; Springer: Berlin/Heidelberg, Germany, 2025; pp. 1–16.
- Wei, Y.; Liu, W.; Liu, F.; Wang, X.; Nie, L.; Chua, T.S. Lightgt: A light graph transformer for multimedia recommendation. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, Taipei, Taiwan, 23–27 July 2023; pp. 1508–1517.







| Dataset | #User | #Item | #Inter. | Density |
|---|---|---|---|---|
| Pinterest | 55.2k | 9.9k | 1500.8k | 0.274% |
| Amazon-CDs | 43.2k | 35.6k | 777.4k | 0.051% |
| Amazon-Kindle | 138.9k | 98.7k | 1910.0k | 0.014% |
| Amazon-Movies | 44.4k | 25.9k | 1070.9k | 0.096% |
| Dataset | Metric | BPRMF | RecVAE | LightGCN | CLRec | SGL | GONet | DirectAU | MGFormer | BIGCF | TLFormer (Ours) | Improv. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Pinterest | Recall@10 | 0.0840 | 0.1099 | 0.1028 | 0.1050 | 0.1070 | 0.1114 | 0.1170 | 0.1173 | 0.1163 | 0.1220 ** | 4.01% |
| | Recall@20 | 0.1393 | 0.1703 | 0.1616 | 0.1687 | 0.1687 | 0.1767 | 0.1834 | 0.1820 | 0.1831 | 0.1899 ** | 3.54% |
| | Recall@50 | 0.2568 | 0.2921 | 0.2800 | 0.3021 | 0.2948 | 0.3113 | 0.3136 | 0.3118 | 0.3135 | 0.3181 ** | 1.43% |
| | NDCG@10 | 0.0551 | 0.0755 | 0.0685 | 0.0704 | 0.0708 | 0.0746 | 0.0796 | 0.0754 | 0.0783 | 0.0818 ** | 2.76% |
| | NDCG@20 | 0.0730 | 0.0951 | 0.0876 | 0.0910 | 0.0908 | 0.0958 | 0.1011 | 0.0962 | 0.0999 | 0.1039 ** | 2.76% |
| | NDCG@50 | 0.1029 | 0.1261 | 0.1178 | 0.1249 | 0.1229 | 0.1310 | 0.1343 | 0.1340 | 0.1332 | 0.1365 ** | 1.64% |
| Amazon-CDs | Recall@10 | 0.0677 | 0.0830 | 0.0886 | 0.1154 | 0.0980 | 0.1165 | 0.1097 | 0.1152 | 0.1093 | 0.1253 ** | 7.55% |
| | Recall@20 | 0.1037 | 0.1189 | 0.1319 | 0.1634 | 0.1455 | 0.1655 | 0.1594 | 0.1615 | 0.1599 | 0.1741 ** | 5.20% |
| | Recall@50 | 0.1764 | 0.1865 | 0.2088 | 0.2489 | 0.2275 | 0.2525 | 0.2440 | 0.2465 | 0.2471 | 0.2547 ** | 0.87% |
| | NDCG@10 | 0.0391 | 0.0502 | 0.0526 | 0.0706 | 0.0587 | 0.0714 | 0.0661 | 0.0711 | 0.0654 | 0.0776 ** | 8.68% |
| | NDCG@20 | 0.0486 | 0.0598 | 0.0641 | 0.0832 | 0.0713 | 0.0843 | 0.0792 | 0.0828 | 0.0789 | 0.0904 ** | 7.24% |
| | NDCG@50 | 0.0642 | 0.0743 | 0.0806 | 0.1015 | 0.0889 | 0.1029 | 0.0974 | 0.1012 | 0.0976 | 0.1078 ** | 4.76% |
| Amazon-Kindle | Recall@10 | 0.0453 | 0.1613 | 0.1416 | 0.1486 | 0.1348 | 0.1360 | 0.1597 | 0.1800 | 0.1457 | 0.1982 ** | 10.11% |
| | Recall@20 | 0.0720 | 0.2037 | 0.1859 | 0.1995 | 0.1810 | 0.1881 | 0.2154 | 0.2314 | 0.2023 | 0.2496 ** | 7.87% |
| | Recall@50 | 0.1261 | 0.2692 | 0.2553 | 0.2818 | 0.2549 | 0.2667 | 0.3027 | 0.3112 | 0.2846 | 0.3270 ** | 5.08% |
| | NDCG@10 | 0.0252 | 0.1081 | 0.0891 | 0.0945 | 0.0828 | 0.0856 | 0.0994 | 0.1202 | 0.0868 | 0.1323 ** | 10.07% |
| | NDCG@20 | 0.0322 | 0.1195 | 0.1009 | 0.1080 | 0.0950 | 0.0953 | 0.1142 | 0.1302 | 0.1012 | 0.1460 ** | 12.14% |
| | NDCG@50 | 0.0438 | 0.1337 | 0.1158 | 0.1257 | 0.1108 | 0.1084 | 0.1330 | 0.1488 | 0.1192 | 0.1629 ** | 9.48% |
| Amazon-Movies | Recall@10 | 0.0532 | 0.0668 | 0.0654 | 0.0837 | 0.0708 | 0.0805 | 0.0789 | 0.0875 | 0.0866 | 0.0929 ** | 6.17% |
| | Recall@20 | 0.0857 | 0.1008 | 0.1003 | 0.1251 | 0.1084 | 0.1211 | 0.1194 | 0.1281 | 0.1267 | 0.1346 ** | 5.07% |
| | Recall@50 | 0.1525 | 0.1653 | 0.1656 | 0.2031 | 0.1812 | 0.1988 | 0.1944 | 0.2004 | 0.2023 | 0.2081 ** | 2.46% |
| | NDCG@10 | 0.0313 | 0.0407 | 0.0394 | 0.0514 | 0.0424 | 0.0487 | 0.0480 | 0.0538 | 0.0522 | 0.0573 ** | 6.51% |
| | NDCG@20 | 0.0400 | 0.0499 | 0.0488 | 0.0624 | 0.0526 | 0.0596 | 0.0589 | 0.0673 | 0.0661 | 0.0685 ** | 1.78% |
| | NDCG@50 | 0.0547 | 0.0641 | 0.0633 | 0.0796 | 0.0686 | 0.0766 | 0.0755 | 0.0801 | 0.0781 | 0.0847 ** | 5.74% |
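The Improv. column appears to report the relative gain of TLFormer over the strongest baseline in each row. The text does not state this definition, so it is inferred from the numbers. A small Python check under that assumption reproduces, for example, the Pinterest and Amazon-CDs Recall@10 entries (the function name is illustrative).

```python
def relative_improvement(ours, baselines):
    """Percentage gain of `ours` over the strongest baseline in the row."""
    best = max(baselines)
    return 100.0 * (ours - best) / best


# Pinterest, Recall@10: BPRMF through BIGCF, copied from the table
pinterest_r10 = [0.0840, 0.1099, 0.1028, 0.1050, 0.1070,
                 0.1114, 0.1170, 0.1173, 0.1163]
print(f"{relative_improvement(0.1220, pinterest_r10):.2f}%")  # 4.01%
```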
| Models | Amazon-Movies R@20 | Amazon-Movies N@20 | Pinterest R@20 | Pinterest N@20 |
|---|---|---|---|---|
| NoALL | 0.1159 | 0.0593 | 0.1757 | 0.0973 |
| NoSVD | 0.1253 | 0.0627 | 0.1814 | 0.0998 |
| NoFormer | 0.1247 | 0.0625 | 0.1772 | 0.0975 |
| TLFormer | 0.1346 | 0.0685 | 0.1899 | 0.1039 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Hao, D.; Yu, D.; Hou, X. TLFormer: Scalable Taylor Linear Attention in Transformer for Collaborative Filtering. Electronics 2026, 15, 759. https://doi.org/10.3390/electronics15040759
Hao D, Yu D, Hou X. TLFormer: Scalable Taylor Linear Attention in Transformer for Collaborative Filtering. Electronics. 2026; 15(4):759. https://doi.org/10.3390/electronics15040759
Chicago/Turabian Style: Hao, Dongdong, Dongxiao Yu, and Xiaowen Hou. 2026. "TLFormer: Scalable Taylor Linear Attention in Transformer for Collaborative Filtering" Electronics 15, no. 4: 759. https://doi.org/10.3390/electronics15040759
APA Style: Hao, D., Yu, D., & Hou, X. (2026). TLFormer: Scalable Taylor Linear Attention in Transformer for Collaborative Filtering. Electronics, 15(4), 759. https://doi.org/10.3390/electronics15040759

