The Comparison of Human and Machine Performance in Object Recognition
Abstract
1. Introduction
Current Study
2. Experiment 1
2.1. Experiment 1: Method (Humans)
2.1.1. Stimuli
2.1.2. Participants
2.1.3. Apparatus
2.1.4. Procedure
2.1.5. Assessing Potential Carryover Effects
2.2. Experiment 1: Method (Machines)
2.3. Experiment 1: Results
2.3.1. Humans
2.3.2. Human–Machine Comparison (Category-Level Accuracy)
2.3.3. Trial-by-Trial Agreement Analysis
2.4. Experiment 1: Discussion
3. Experiment 2
3.1. Experiment 2: Method (Humans)
3.1.1. Stimuli
3.1.2. Participants
3.1.3. Apparatus and Procedure
3.2. Experiment 2: Method (Machines)
3.3. Experiment 2: Results
3.3.1. Humans
3.3.2. Human–Machine Comparison (Overall Accuracy)
3.3.3. Human–Machine Comparison (Category-Level Accuracy)
3.3.4. Trial-by-Trial Agreement Analysis
3.4. Experiment 2: Discussion
4. Experiment 3
4.1. Experiment 3: Method (Humans)
4.1.1. Stimuli
4.1.2. Participants
4.1.3. Apparatus
4.1.4. Procedure
4.2. Experiment 3: Method (Machines)
4.3. Experiment 3: Results
4.3.1. Humans
4.3.2. Human–Machine Comparison (Overall Accuracy)
4.3.3. Human–Machine Comparison (Category-Level Accuracy)
4.3.4. Test–Retest Reliability and Trial-by-Trial Agreement Analysis
4.3.5. Results of Ordinal Discretisation
4.4. Experiment 3: Discussion
5. General Discussion
5.1. Explaining Variability in Overall Model Accuracy Across Experiments
5.2. Future Research Directions
6. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Meaning |
|---|---|
| AI | Artificial Intelligence |
| BF | Bayes Factor |
| DNNs | Deep Neural Networks |
| DCNNs | Deep Convolutional Neural Networks |
| IID | Independent and Identically Distributed |
| OOD | Out-of-Distribution |
| SOTA | State-of-the-Art |
| ViTs | Vision Transformers |
Appendix A. Model Selection
Appendix B
Appendix B.1. Analytical Methods
Accuracy Coding and Response Correction in the Free-Naming Task
Appendix B.2. Discretisation

Appendix B.3. Implementation and Results of Ordinal Discretisation
References

| | CoCa | ViT-B8 | ViT-L16 | ResNet152 | InceptionV3 |
|---|---|---|---|---|---|
| ViT-B8 | 0.87 | ||||
| ViT-L16 | 0.87 | 0.88 | |||
| ResNet152 | 0.76 | 0.78 | 0.78 | ||
| InceptionV3 | 0.74 | 0.75 | 0.75 | 0.74 | |
| VGG19 | 0.66 | 0.67 | 0.68 | 0.67 | 0.66 |
| Condition | CoCa | ViT-B8 | ViT-L16 | ResNet152 | InceptionV3 |
|---|---|---|---|---|---|
| Isolated | |||||
| ViT-B8 | 0.65 | ||||
| ViT-L16 | 0.63 | 0.76 | |||
| ResNet152 | 0.46 | 0.54 | 0.52 | ||
| InceptionV3 | 0.43 | 0.46 | 0.42 | 0.51 | |
| VGG19 | 0.28 | 0.34 | 0.34 | 0.32 | 0.39 |
| Full | |||||
| ViT-B8 | 0.62 | ||||
| ViT-L16 | 0.55 | 0.68 | |||
| ResNet152 | 0.50 | 0.53 | 0.50 | ||
| InceptionV3 | 0.39 | 0.43 | 0.40 | 0.45 | |
| VGG19 | 0.25 | 0.30 | 0.31 | 0.39 | 0.32 |
| | CoCa | ViT-B8 | ViT-L16 | ResNet152 | InceptionV3 |
|---|---|---|---|---|---|
| ViT-B8 | 0.89 | ||||
| ViT-L16 | 0.90 | 0.90 | |||
| ResNet152 | 0.77 | 0.79 | 0.79 | ||
| InceptionV3 | 0.73 | 0.76 | 0.76 | 0.72 | |
| VGG19 | 0.65 | 0.67 | 0.69 | 0.63 | 0.70 |
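The section headings above refer to trial-by-trial agreement analyses, and the appendix tables report pairwise model–model agreement values. Assuming these are unweighted Cohen's kappa coefficients (the specific agreement statistic is not restated here, so this is an illustrative assumption), a minimal sketch of how one such pairwise value could be computed from two models' hypothetical trial-by-trial labels:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Unweighted Cohen's kappa between two raters' categorical labels."""
    assert len(labels_a) == len(labels_b) and len(labels_a) > 0
    n = len(labels_a)
    # Observed agreement: proportion of trials where the two raters match.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement: sum over categories of the product of
    # each rater's marginal probability for that category.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / n**2
    return (observed - expected) / (1 - expected)

# Hypothetical trial-by-trial predictions from two models (not the study's data).
m1 = ["dog", "cat", "dog", "bird", "cat", "dog"]
m2 = ["dog", "cat", "cat", "bird", "cat", "dog"]
print(round(cohens_kappa(m1, m2), 2))  # → 0.74
```

Applied to every pair of models, this yields a lower-triangular matrix of the kind tabulated above; weighted kappa (Cohen, 1968) additionally requires a disagreement-weight matrix.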
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Kul, G.; Wills, A.J. The Comparison of Human and Machine Performance in Object Recognition. Behav. Sci. 2026, 16, 109. https://doi.org/10.3390/bs16010109