Foveated Retinotopy Improves Classification and Localization in Convolutional Neural Networks
Abstract
1. Introduction
1.1. Biological Inspiration: Foveated Retinotopy
1.2. Eye Movements and the Sequential Analysis of the Visual Scene
1.3. The Log-Polar Model in Computer Vision
1.4. Convolutional Neural Networks and Translational Invariance
1.5. Paper Contributions
- Demonstrating that conventional CNNs are vulnerable to geometric transformations (e.g., rotations, zooms), highlighting a fundamental limitation in current architectures.
- Introducing a biologically inspired foveated input layer based on a log-polar transformation, which turns image rotations and scalings into translations and thereby provides rotation and scale invariance to the subsequent convolutional layers (a minimal sketch follows this list).
- Evaluating our approach on standard off-the-shelf architectures and benchmarks, and quantifying the impact of foveated vision on classification performance.
- Showing how the foveated input layer enhances CNN robustness to geometric transformations and provides insights into the relationship between network architecture and invariance properties.
- Demonstrating that analyzing the network output in probabilistic terms reveals class-likelihood variations across multiple viewpoints, enabling effective object localization without any additional training (see the likelihood-map sketch after this list).
- Establishing deeper connections between computational and biological vision, paving the way for improvements in machine learning architectures and a better understanding of natural visual processing.
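As a concrete illustration of the foveated input layer, the Python sketch below resamples an image onto a log-polar grid: under this mapping, a rotation about the fixation point becomes a circular shift along the angular axis and a zoom becomes a shift along the radial axis, both of which ordinary convolutions treat as translations. This is a minimal nearest-neighbour sketch under simplifying assumptions (central fixation, square sampling window), not the authors' exact implementation; the function name and parameters are illustrative.

```python
import numpy as np

def log_polar_transform(img, n_rho=224, n_theta=224, r_min=1.0):
    """Resample `img` (H x W [x C] array) onto a log-polar grid centred on
    the image midpoint. Rows index log-spaced radii, columns index angles,
    so a rotation about the centre becomes a circular column shift and a
    zoom becomes a row shift."""
    h, w = img.shape[:2]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    r_max = min(cy, cx)                              # stay inside the frame
    # log-spaced radii and uniformly spaced angles
    rho = np.exp(np.linspace(np.log(r_min), np.log(r_max), n_rho))
    theta = np.linspace(0.0, 2.0 * np.pi, n_theta, endpoint=False)
    rr, tt = np.meshgrid(rho, theta, indexing="ij")  # shape (n_rho, n_theta)
    # Cartesian sampling coordinates (nearest neighbour for brevity)
    ys = np.clip(np.rint(cy + rr * np.sin(tt)).astype(int), 0, h - 1)
    xs = np.clip(np.rint(cx + rr * np.cos(tt)).astype(int), 0, w - 1)
    return img[ys, xs]
```

Feeding `log_polar_transform(frame)` to an off-the-shelf CNN in place of the raw frame captures the essence of the approach: the foveated resampling is a fixed preprocessing step, so no architectural change or retraining from scratch is required.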
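The class-likelihood maps used for localization can likewise be sketched as follows: the classifier is evaluated at a grid of candidate fixation points, and the softmax probability of the target class is recorded at each one. This assumes a PyTorch classifier over fixed-size inputs; in the paper, the input at each fixation would first pass through the foveated transform, which a plain crop stands in for here, and the function name and grid parameters are illustrative.

```python
import torch
import torchvision.transforms.functional as TF

@torch.no_grad()
def likelihood_map(model, image, class_idx, grid=16, crop=224):
    """Evaluate `model` (assumed already in eval mode) at a grid x grid set
    of fixation points over `image` (C x H x W tensor) and return the
    softmax probability of `class_idx` at each fixation as a coarse map."""
    _, h, w = image.shape
    heat = torch.zeros(grid, grid)
    centers_y = torch.linspace(crop / 2, h - crop / 2, grid)
    centers_x = torch.linspace(crop / 2, w - crop / 2, grid)
    for i, cy in enumerate(centers_y):
        for j, cx in enumerate(centers_x):
            # a plain crop stands in for the foveated transform at (cy, cx)
            patch = TF.crop(image, int(cy) - crop // 2,
                            int(cx) - crop // 2, crop, crop)
            probs = model(patch.unsqueeze(0)).softmax(dim=-1)
            heat[i, j] = probs[0, class_idx]
    return heat
```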
2. Methods and Materials
2.1. The Log-Polar Transform
2.2. Convolutional Neural Networks (CNNs)
2.3. Datasets
2.4. Dataset Transformations and Transfer Learning
2.5. Attacking Classical CNNs with Geometric Rotations
2.6. Localization Tools and Evaluation
2.7. Visual Object Localization: Protocol
3. Results
3.1. Training on Transformed Images
3.2. Robustness of CNNs
3.3. Visual Object Localization: Likelihood Maps
3.3.1. Mean Likelihood Maps
3.3.2. Comparing Likelihood Maps and Intersection over Union (IoU)
3.3.3. Pointing Game
3.3.4. Impact of Network Depth on Localization and Classification
3.4. Beyond ImageNet: The Animal-10k Dataset
Multi-Label and Multi-Task Extension
4. Discussion
4.1. Retina-Inspired Mapping Enhances the Robustness of CNNs
4.2. From Foveation to Pre-Attentive Mechanisms
4.3. Perspectives and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
Localization and classification metrics for Cartesian versus retinotopic (foveated) inputs across three ResNet depths:

| Metric | ResNet 18 Cartesian | ResNet 18 Retinotopic | ResNet 50 Cartesian | ResNet 50 Retinotopic | ResNet 101 Cartesian | ResNet 101 Retinotopic |
|---|---|---|---|---|---|---|
| In/Out Likelihood Ratio | 6.50 | 9.02 | 5.02 | 6.40 | 4.63 | 6.06 |
| Pointing Game | 83.07% | 87.40% | 80.81% | 85.13% | 78.74% | 85.14% |
| Central fixation | 0.62 | 0.52 | 0.76 | 0.71 | 0.77 | 0.73 |
| With saccade | 0.92 | 0.86 | 0.95 | 0.94 | 0.96 | 0.95 |
| Accuracy increase | +30% | +34% | +21% | +23% | +19% | +22% |
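For reference, the first two rows of the table can be computed from a class-likelihood map and a ground-truth bounding box roughly as follows. This is a minimal sketch following the standard definitions of these metrics, not the authors' evaluation code; the bounding-box convention (x_min, y_min, x_max, y_max) in map coordinates is an assumption.

```python
import numpy as np

def in_out_likelihood_ratio(heat, bbox):
    """Mean likelihood inside the ground-truth box divided by the mean
    likelihood outside it (the In/Out Likelihood Ratio row above)."""
    x_min, y_min, x_max, y_max = bbox
    mask = np.zeros(heat.shape, dtype=bool)
    mask[y_min:y_max + 1, x_min:x_max + 1] = True
    return heat[mask].mean() / heat[~mask].mean()

def pointing_game_hit(heat, bbox):
    """True if the map's maximum falls inside the box; the pointing-game
    score is the fraction of hits over a dataset."""
    y, x = np.unravel_index(np.argmax(heat), heat.shape)
    x_min, y_min, x_max, y_max = bbox
    return x_min <= x <= x_max and y_min <= y <= y_max
```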