Comparative Evaluation of Multimodal Large Language Models for No-Reference Image Quality Assessment with Authentic Distortions: A Study of OpenAI and Claude.AI Models
Abstract
1. Introduction
- Motion blur from camera shake or subject movement, characterized by directional smearing and varying intensity along the motion path;
- Poor focus, manifesting as both global and local blur patterns, often with depth-dependent characteristics;
- Exposure issues including highlight clipping, shadow noise, and dynamic range limitations;
- Sensor noise varying with ISO settings and showing color-channel-dependent characteristics;
- Optical aberrations including chromatic aberration, barrel/pincushion distortion, and vignetting;
- Complex interactions between multiple distortion types, such as noise amplification in post-processed underexposed regions.
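Although this study evaluates authentic distortions captured in the wild rather than synthetic ones, a rough synthetic approximation helps make the visual signatures listed above concrete. The following Python sketch is purely illustrative and not part of the evaluation pipeline; it assumes OpenCV and NumPy, and the file name and parameter values are hypothetical.

```python
import cv2
import numpy as np

def motion_blur(img: np.ndarray, length: int = 21, angle_deg: float = 30.0) -> np.ndarray:
    """Directional smearing: convolve with a line kernel rotated along the motion path."""
    kernel = np.zeros((length, length), dtype=np.float32)
    kernel[length // 2, :] = 1.0                      # horizontal line of ones
    center = ((length - 1) / 2.0, (length - 1) / 2.0)
    rot = cv2.getRotationMatrix2D(center, angle_deg, 1.0)
    kernel = cv2.warpAffine(kernel, rot, (length, length))
    kernel /= max(kernel.sum(), 1e-6)                 # keep overall brightness unchanged
    return cv2.filter2D(img, -1, kernel)

def underexpose_with_noise(img: np.ndarray, gain: float = 0.4, sigma: float = 8.0) -> np.ndarray:
    """Underexposure plus sensor-like Gaussian noise, which dominates the darkened image."""
    dark = img.astype(np.float32) * gain
    noise = np.random.normal(0.0, sigma, img.shape).astype(np.float32)
    return np.clip(dark + noise, 0, 255).astype(np.uint8)

img = cv2.imread("example.jpg")                       # hypothetical input image
degraded = underexpose_with_noise(motion_blur(img))   # stacked distortions interact
```

Chaining the two functions mimics the last item above: noise injected after darkening dominates exactly those regions where the signal has been attenuated.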
1.1. Contributions
- We investigate the frequency of invalid responses across different models and datasets, identifying systematic weaknesses.
- We compare the performance of multiple multimodal LLMs, including ChatGPT-4o-Latest, GPT-4o-2024-11-20, GPT-4-Turbo-2024-04-09, Claude-3-Haiku-20240307, Claude-3-Opus-20240229, and Claude-3-Sonnet-20240229, highlighting variations in their robustness and accuracy. Our results reveal that LLMs struggle more with the CLIVE [34] dataset than with KonIQ-10k [1], suggesting dataset-specific challenges that affect multimodal LLM performance.
1.2. Structure of the Paper
2. Related Works
3. Materials and Methods
3.1. Materials
3.1.1. IQA Benchmark Databases
3.1.2. Multimodal LLMs
- Vision Encoder: LLaVA-1.5-7B’s image encoder is CLIP ViT-L/14, which transforms an input image into a sequence of visual embeddings. CLIP (Contrastive Language–Image Pre-training) was trained on 400 million image–text pairs to produce aligned visual and textual representations.
- Language Model: The text processing backbone is Vicuna-7B v1.5, a fine-tuned version of LLaMA-2 7B optimized for instruction-following dialogue through supervised fine-tuning on high-quality multi-turn conversation data.
- Projection Layer: A trainable linear layer maps the output of the vision encoder to the language model’s embedding space. This enables the integration of visual information into the language model’s input stream.
- Training Paradigm: The model was trained using instruction-tuning on a mixture of datasets, including complex visual reasoning, image captioning, and visual question answering (VQA). The fine-tuning process includes aligning the model to follow multimodal prompts using image and text inputs jointly.
- Tokenization and Input Format: Images are encoded into visual tokens and prepended to the textual input as special tokens. During inference, a multimodal prompt (e.g., “Describe the quality of this image”) is concatenated with the visual tokens and fed into the model for autoregressive decoding.
- Model Availability: The “HF” suffix denotes the Hugging Face-compatible implementation, which includes pre-trained weights, configuration files, and inference scripts. It can be run on consumer-grade GPUs and integrated into custom pipelines.
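To make this input format concrete, the minimal sketch below runs the Hugging Face checkpoint on a single image. It assumes the transformers and torch packages are installed; the model id llava-hf/llava-1.5-7b-hf denotes the public HF release, and the prompt wording is an illustrative assumption rather than the exact prompt used in this study.

```python
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"  # Hugging Face-compatible LLaVA-1.5-7B release
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

image = Image.open("test_image.jpg").convert("RGB")   # hypothetical test image
# The <image> placeholder is replaced by the visual tokens produced by the CLIP encoder.
prompt = "USER: <image>\nDescribe the quality of this image. ASSISTANT:"

inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```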
3.2. Methods
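As an illustration of how the proprietary models compared in this study can be queried, the sketch below requests a 1-to-5 quality rating from one of the evaluated OpenAI checkpoints through the official Python client. The prompt wording and file name are hypothetical assumptions; the study's exact protocol is described in this section.

```python
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

with open("test_image.jpg", "rb") as f:               # hypothetical test image
    b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o-2024-11-20",
    temperature=0,
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Rate the overall quality of this image on a scale from 1 (bad) "
                     "to 5 (excellent). Reply with a single number only."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```

Replies that cannot be parsed as a numeric score would be logged as invalid responses, the failure mode whose frequency is analyzed across models and datasets in Section 4.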
4. Results
4.1. Evaluation Metrics
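The evaluation relies on three standard correlation criteria, PLCC, SROCC, and KROCC (see the Abbreviations list), all available directly in SciPy. A minimal sketch with hypothetical score vectors:

```python
import numpy as np
from scipy.stats import kendalltau, pearsonr, spearmanr

# Hypothetical example: ground-truth MOS values vs. predicted quality scores.
mos  = np.array([3.1, 4.5, 2.2, 3.8, 1.9, 4.1])
pred = np.array([3.0, 4.2, 2.5, 3.5, 2.1, 4.4])

plcc, _  = pearsonr(mos, pred)    # linearity of the prediction-MOS relationship
srocc, _ = spearmanr(mos, pred)   # monotonicity (rank order)
krocc, _ = kendalltau(mos, pred)  # pairwise rank agreement
print(f"PLCC={plcc:.3f}  SROCC={srocc:.3f}  KROCC={krocc:.3f}")
```

In IQA practice, PLCC is often reported after fitting a monotonic logistic mapping from predictions to MOS values; that calibration step is omitted here for brevity.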
4.2. Performance Comparison of Multimodal LLMs
4.3. Comparison to the State-of-the-Art
5. Discussion
6. Conclusions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
Abbreviation | Definition |
---|---|
BRISQUE | Blind/Referenceless Image Spatial Quality Evaluator |
CLIP | Contrastive Language–Image Pre-training |
CNN | convolutional neural network |
DCT | discrete cosine transform |
EMD | earth mover’s distance |
GAN | generative adversarial network |
GGD | generalized Gaussian distribution |
IQA | image quality assessment |
JPEG | Joint Photographic Experts Group |
KADID | Konstanz artificially distorted image quality database |
KROCC | Kendall’s rank order correlation coefficient |
LBP | local binary patterns |
LIVE | Laboratory for Image and Video Engineering |
LLM | large language model |
MOS | mean opinion score |
MSCN | mean subtracted contrast normalized |
NiN | network in network |
NIQE | Naturalness Image Quality Evaluator |
NLP | natural language processing |
NR-IQA | no-reference image quality assessment |
NSS | natural scene statistics |
PLCC | Pearson’s linear correlation coefficient |
SROCC | Spearman’s rank order correlation coefficient |
SVR | support vector regressor |
TID | Tampere Image Database |
ViT | Vision Transformer |
VQA | visual question answering |
YFCC100M | Yahoo Flickr Creative Commons 100 Million |
References
- Lin, H.; Hosu, V.; Saupe, D. KonIQ-10k: Towards an ecologically valid and large-scale IQA database. arXiv 2018, arXiv:1803.08489. [Google Scholar]
- Götz-Hahn, F.; Hosu, V.; Lin, H.; Saupe, D. KonVid-150k: A dataset for no-reference video quality assessment of videos in-the-wild. IEEE Access 2021, 9, 72139–72160. [Google Scholar] [CrossRef]
- Yang, P.; Sturtz, J.; Qingge, L. Progress in blind image quality assessment: A brief review. Mathematics 2023, 11, 2766. [Google Scholar] [CrossRef]
- Oura, D.; Sato, S.; Honma, Y.; Kuwajima, S.; Sugimori, H. Quality assurance of chest X-ray images with a combination of deep learning methods. Appl. Sci. 2023, 13, 2067. [Google Scholar] [CrossRef]
- Nam, W.; Youn, T.; Ha, C. No-Reference Image Quality Assessment with Moving Spectrum and Laplacian Filter for Autonomous Driving Environment. Vehicles 2025, 7, 8. [Google Scholar] [CrossRef]
- Hao, Y.; Pei, H.; Lyu, Y.; Yuan, Z.; Rizzo, J.R.; Wang, Y.; Fang, Y. Understanding the impact of image quality and distance of objects to object detection performance. In Proceedings of the 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Detroit, MI, USA, 1–5 October 2023; pp. 11436–11442. [Google Scholar]
- Dodge, S.; Karam, L. Understanding how image quality affects deep neural networks. In Proceedings of the 2016 Eighth International Conference on Quality of Multimedia Experience (QoMEX), Lisbon, Portugal, 6–8 June 2016; pp. 1–6. [Google Scholar]
- Pednekar, G.V.; Udupa, J.K.; McLaughlin, D.J.; Wu, X.; Tong, Y.; Simone, C.B.; Camaratta, J.; Torigian, D.A. Image quality and segmentation. Proc. SPIE Int. Soc. Opt. Eng. 2018, 10576, 105762N. [Google Scholar]
- Chiasserini, C.F.; Magli, E. Energy consumption and image quality in wireless video-surveillance networks. In Proceedings of the 13th IEEE International Symposium on Personal, Indoor and Mobile Radio Communications, Lisboa, Portugal, 15–18 September 2002; Volume 5, pp. 2357–2361. [Google Scholar]
- Winkler, S.; Campos, R. Video quality evaluation for Internet streaming applications. In Proceedings of the Human Vision and Electronic Imaging VIII, Santa Clara, CA, USA, 20 January 2003; Volume 5007, pp. 104–115. [Google Scholar]
- Li, J.; Zhang, X.; Ge, J.; Bai, C.; Feng, G.; Mu, H.; Wang, L.; Liu, C.; Kang, Z.; Jiang, X. Astronomical Image Quality Assessment Based on Deep Learning for Resource-constrained Environments. Publ. Astron. Soc. Pac. 2025, 137, 034502. [Google Scholar] [CrossRef]
- Babic, M.; Farahani, M.A.; Wuest, T. Image based quality inspection in smart manufacturing systems: A literature review. Procedia CIRP 2021, 103, 262–267. [Google Scholar] [CrossRef]
- Li, S.; Yang, Z.; Li, H. Statistical evaluation of no-reference image quality assessment metrics for remote sensing images. ISPRS Int. J. Geo-Inf. 2017, 6, 133. [Google Scholar] [CrossRef]
- Chowdhery, A.; Narang, S.; Devlin, J.; Bosma, M.; Mishra, G.; Roberts, A.; Barham, P.; Chung, H.W.; Sutton, C.; Gehrmann, S.; et al. PaLM: Scaling language modeling with pathways. J. Mach. Learn. Res. 2023, 24, 1–113. [Google Scholar]
- Schuhmann, C.; Beaumont, R.; Vencu, R.; Gordon, C.; Wightman, R.; Cherti, M.; Coombes, T.; Katta, A.; Mullis, C.; Wortsman, M.; et al. LAION-5B: An open large-scale dataset for training next generation image-text models. Adv. Neural Inf. Process. Syst. 2022, 35, 25278–25294. [Google Scholar]
- Chen, X.; Wang, X.; Changpinyo, S.; Piergiovanni, A.; Padlewski, P.; Salz, D.; Goodman, S.; Grycner, A.; Mustafa, B.; Beyer, L.; et al. PaLI: A jointly-scaled multilingual language-image model. arXiv 2022, arXiv:2209.06794. [Google Scholar]
- Sharma, P.; Ding, N.; Goodman, S.; Soricut, R. Conceptual captions: A cleaned, hypernymed, image alt-text dataset for automatic image captioning. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Melbourne, Australia, 15–20 July 2018; pp. 2556–2565. [Google Scholar]
- Mittal, A.; Moorthy, A.K.; Bovik, A.C. No-reference image quality assessment in the spatial domain. IEEE Trans. Image Process. 2012, 21, 4695–4708. [Google Scholar] [CrossRef] [PubMed]
- Galdran, A.; Araújo, T.; Mendonça, A.M.; Campilho, A. Retinal image quality assessment by mean-subtracted contrast-normalized coefficients. In Proceedings of VipIMAGE 2017: VI ECCOMAS Thematic Conference on Computational Vision and Medical Image Processing, Porto, Portugal, 18–20 October 2017; Springer: Berlin/Heidelberg, Germany, 2018; pp. 844–853. [Google Scholar]
- Mittal, A.; Soundararajan, R.; Bovik, A.C. Making a “completely blind” image quality analyzer. IEEE Signal Process. Lett. 2012, 20, 209–212. [Google Scholar] [CrossRef]
- Talebi, H.; Milanfar, P. NIMA: Neural image assessment. IEEE Trans. Image Process. 2018, 27, 3998–4011. [Google Scholar] [CrossRef] [PubMed]
- Murray, N.; Marchesotti, L.; Perronnin, F. AVA: A large-scale database for aesthetic visual analysis. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012; pp. 2408–2415. [Google Scholar]
- Levina, E.; Bickel, P. The earth mover’s distance is the Mallows distance: Some insights from statistics. In Proceedings of the Eighth IEEE International Conference on Computer Vision, Vancouver, BC, Canada, 7–14 July 2001; Volume 2, pp. 251–256. [Google Scholar]
- Ke, J.; Wang, Q.; Wang, Y.; Milanfar, P.; Yang, F. MUSIQ: Multi-scale image quality transformer. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Virtual Event, 11–17 October 2021; pp. 5148–5157. [Google Scholar]
- Su, S.; Yan, Q.; Zhu, Y.; Zhang, C.; Ge, X.; Sun, J.; Zhang, Y. Blindly assess image quality in the wild guided by a self-adaptive hyper network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 3667–3676. [Google Scholar]
- Sheikh, H.R.; Sabir, M.F.; Bovik, A.C. A statistical evaluation of recent full reference image quality assessment algorithms. IEEE Trans. Image Process. 2006, 15, 3440–3451. [Google Scholar] [CrossRef]
- Ponomarenko, N.; Ieremeiev, O.; Lukin, V.; Jin, L.; Egiazarian, K.; Astola, J.; Vozel, B.; Chehdi, K.; Carli, M.; Battisti, F.; et al. A new color image database TID2013: Innovations and results. In Proceedings of the Advanced Concepts for Intelligent Vision Systems: 15th International Conference, ACIVS 2013, Poznań, Poland, 28–31 October 2013; Springer: Berlin/Heidelberg, Germany, 2013; pp. 402–413. [Google Scholar]
- Lin, H.; Hosu, V.; Saupe, D. KADID-10k: A large-scale artificially distorted IQA database. In Proceedings of the 2019 Eleventh International Conference on Quality of Multimedia Experience (QoMEX), Berlin, Germany, 5–7 June 2019; pp. 1–3. [Google Scholar]
- Men, H.; Lin, H.; Saupe, D. Empirical evaluation of no-reference VQA methods on a natural video quality database. In Proceedings of the 2017 Ninth International Conference on Quality of Multimedia Experience (QoMEX), Erfurt, Germany, 31 May–2 June 2017; pp. 1–3. [Google Scholar]
- Men, H.; Lin, H.; Jenadeleh, M.; Saupe, D. Subjective image quality assessment with boosted triplet comparisons. IEEE Access 2021, 9, 138939–138975. [Google Scholar] [CrossRef]
- Lin, H.; Men, H.; Yan, Y.; Ren, J.; Saupe, D. Crowdsourced quality assessment of enhanced underwater images—A pilot study. In Proceedings of the 2022 14th International Conference on Quality of Multimedia Experience (QoMEX), Lippstadt, Germany, 5–7 September 2022; pp. 1–4. [Google Scholar]
- Su, S.; Lin, H.; Hosu, V.; Wiedemann, O.; Sun, J.; Zhu, Y.; Liu, H.; Zhang, Y.; Saupe, D. Going the extra mile in face image quality assessment: A novel database and model. IEEE Trans. Multimed. 2023, 26, 2671–2685. [Google Scholar] [CrossRef]
- Yang, H.; Fang, Y.; Lin, W. Perceptual quality assessment of screen content images. IEEE Trans. Image Process. 2015, 24, 4408–4421. [Google Scholar] [CrossRef]
- Ghadiyaram, D.; Bovik, A.C. Massive online crowdsourced study of subjective and objective picture quality. IEEE Trans. Image Process. 2015, 25, 372–387. [Google Scholar] [CrossRef]
- Xin, L.; Yuting, K.; Tao, S. Investigation of the Relationship between Speed and Image Quality of Autonomous Vehicles. J. Min. Sci. 2021, 57, 264–273. [Google Scholar] [CrossRef]
- Xia, W.; Yang, Y.; Xue, J.H.; Xiao, J. Domain fingerprints for no-reference image quality assessment. IEEE Trans. Circuits Syst. Video Technol. 2020, 31, 1332–1341. [Google Scholar] [CrossRef]
- Zeng, Z.; Yang, W.; Sun, W.; Xue, J.H.; Liao, Q. No-reference image quality assessment for photographic images based on robust statistics. Neurocomputing 2018, 313, 111–118. [Google Scholar] [CrossRef]
- Li, J.; Qiao, S.; Zhao, C.; Zhang, T. No-reference image quality assessment based on multiscale feature representation. IET Image Process. 2021, 15, 3318–3331. [Google Scholar] [CrossRef]
- Stansbury, D.E.; Naselaris, T.; Gallant, J.L. Natural scene statistics account for the representation of scene categories in human visual cortex. Neuron 2013, 79, 1025–1034. [Google Scholar] [CrossRef]
- Zhu, K.; Asari, V.; Saupe, D. No-reference quality assessment of H.264/AVC encoded video based on natural scene features. In Proceedings of the Mobile Multimedia/Image Processing, Security, and Applications 2013, Baltimore, MD, USA, 29 April–1 May 2013; Volume 8755, pp. 25–35. [Google Scholar]
- Sheikh, H.R.; Bovik, A.C.; Cormack, L. Blind quality assessment of JPEG2000 compressed images using natural scene statistics. In Proceedings of the Thirty-Seventh Asilomar Conference on Signals, Systems & Computers, Pacific Grove, CA, USA, 9–12 November 2003; Volume 2, pp. 1403–1407. [Google Scholar]
- Moorthy, A.K.; Bovik, A.C. A two-step framework for constructing blind image quality indices. IEEE Signal Process. Lett. 2010, 17, 513–516. [Google Scholar] [CrossRef]
- Daubechies, I. Ten Lectures on Wavelets; Society for Industrial and Applied Mathematics: Philadelphia, PA, USA, 1992. [Google Scholar]
- Awad, M.; Khanna, R. Support vector regression. In Efficient Learning Machines: Theories, Concepts, and Applications for Engineers and System Designers; Springer: Berlin/Heidelberg, Germany, 2015; pp. 67–80. [Google Scholar]
- Moorthy, A.K.; Bovik, A.C. Blind image quality assessment: From natural scene statistics to perceptual quality. IEEE Trans. Image Process. 2011, 20, 3350–3364. [Google Scholar] [CrossRef]
- Wainwright, M.J.; Simoncelli, E. Scale mixtures of Gaussians and the statistics of natural images. Adv. Neural Inf. Process. Syst. 1999, 12, 855–861. [Google Scholar]
- Saad, M.A.; Bovik, A.C. Blind quality assessment of videos using a model of natural scene statistics and motion coherency. In Proceedings of the 2012 Conference Record of the Forty Sixth Asilomar Conference on Signals, Systems and Computers (ASILOMAR), Pacific Grove, CA, USA, 4–7 November 2012; pp. 332–336. [Google Scholar]
- Saad, M.A.; Bovik, A.C.; Charrier, C. A DCT statistics-based blind image quality index. IEEE Signal Process. Lett. 2010, 17, 583–586. [Google Scholar] [CrossRef]
- Gabarda, S.; Cristóbal, G. Blind image quality assessment through anisotropy. J. Opt. Soc. Am. A 2007, 24, B42–B51. [Google Scholar] [CrossRef]
- Lasmar, N.E.; Stitou, Y.; Berthoumieu, Y. Multiscale skewed heavy tailed model for texture analysis. In Proceedings of the 2009 16th IEEE International Conference on Image Processing (ICIP), Cairo, Egypt, 7–10 November 2009; pp. 2281–2284. [Google Scholar]
- Ruderman, D.L.; Bialek, W. Statistics of natural images: Scaling in the woods. Phys. Rev. Lett. 1994, 73, 814. [Google Scholar] [CrossRef] [PubMed]
- Humeau-Heurtier, A. Texture feature extraction methods: A survey. IEEE Access 2019, 7, 8975–9000. [Google Scholar] [CrossRef]
- Xue, W.; Mou, X.; Zhang, L.; Bovik, A.C.; Feng, X. Blind image quality assessment using joint statistics of gradient magnitude and Laplacian features. IEEE Trans. Image Process. 2014, 23, 4850–4862. [Google Scholar] [CrossRef]
- Rahim, M.A.; Hossain, M.N.; Wahid, T.; Azam, M.S. Face recognition using local binary patterns (LBP). Glob. J. Comput. Sci. Technol. 2013, 13, 1–8. [Google Scholar]
- Song, K.-C.; Yan, Y.-H.; Chen, W.-H.; Zhang, X. Research and perspective on local binary pattern. Acta Autom. Sin. 2013, 39, 730–744. [Google Scholar] [CrossRef]
- Garcia Freitas, P.; Da Eira, L.P.; Santos, S.S.; Farias, M.C.Q.d. On the application LBP texture descriptors and its variants for no-reference image quality assessment. J. Imaging 2018, 4, 114. [Google Scholar] [CrossRef]
- Li, Q.; Lin, W.; Xu, J.; Fang, Y. Blind image quality assessment using statistical structural and luminance features. IEEE Trans. Multimed. 2016, 18, 2457–2469. [Google Scholar] [CrossRef]
- Rajevenceltha, J.; Gaidhane, V.H. An efficient approach for no-reference image quality assessment based on statistical texture and structural features. Eng. Sci. Technol. Int. J. 2022, 30, 101039. [Google Scholar] [CrossRef]
- Bosse, S.; Maniry, D.; Müller, K.R.; Wiegand, T.; Samek, W. Deep neural networks for no-reference and full-reference image quality assessment. IEEE Trans. Image Process. 2017, 27, 206–219. [Google Scholar] [CrossRef]
- Ma, Y.; Cai, X.; Sun, F.; Hao, S. No-reference image quality assessment based on multi-task generative adversarial network. IEEE Access 2019, 7, 146893–146902. [Google Scholar] [CrossRef]
- Kang, L.; Ye, P.; Li, Y.; Doermann, D. Convolutional neural networks for no-reference image quality assessment. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 1733–1740. [Google Scholar]
- Bare, B.; Li, K.; Yan, B. An accurate deep convolutional neural networks model for no-reference image quality assessment. In Proceedings of the 2017 IEEE International Conference on Multimedia and Expo (ICME), Hong Kong, China, 10–14 July 2017; pp. 1356–1361. [Google Scholar]
- Zhang, L.; Zhang, L.; Mou, X.; Zhang, D. FSIM: A feature similarity index for image quality assessment. IEEE Trans. Image Process. 2011, 20, 2378–2386. [Google Scholar] [CrossRef] [PubMed]
- Li, Y.; Po, L.M.; Feng, L.; Yuan, F. No-reference image quality assessment with deep convolutional neural networks. In Proceedings of the 2016 IEEE International Conference on Digital Signal Processing (DSP), Beijing, China, 16–18 October 2016; pp. 685–689. [Google Scholar]
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90. [Google Scholar] [CrossRef]
- Lin, M.; Chen, Q.; Yan, S. Network in network. arXiv 2013, arXiv:1312.4400. [Google Scholar]
- Bianco, S.; Celona, L.; Napoletano, P.; Schettini, R. On the use of deep learning for blind image quality assessment. Signal Image Video Process. 2018, 12, 355–362. [Google Scholar] [CrossRef]
- Zhang, R.; Isola, P.; Efros, A.A.; Shechtman, E.; Wang, O. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 586–595. [Google Scholar]
- Ryu, J. Improved image quality assessment by utilizing pre-trained architecture features with unified learning mechanism. Appl. Sci. 2023, 13, 2682. [Google Scholar] [CrossRef]
- Gao, F.; Yu, J.; Zhu, S.; Huang, Q.; Tian, Q. Blind image quality prediction by exploiting multi-level deep representations. Pattern Recognit. 2018, 81, 432–442. [Google Scholar] [CrossRef]
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16 × 16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
- Shaw, P.; Uszkoreit, J.; Vaswani, A. Self-attention with relative position representations. arXiv 2018, arXiv:1803.02155. [Google Scholar]
- Keshari, A.; Subudhi, B. Multi-scale features and parallel transformers based image quality assessment. arXiv 2022, arXiv:2204.09779. [Google Scholar]
- Yang, S.; Wu, T.; Shi, S.; Lao, S.; Gong, Y.; Cao, M.; Wang, J.; Yang, Y. MANIQA: Multi-dimension attention network for no-reference image quality assessment. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 1191–1200. [Google Scholar]
- Wang, J.; Chan, K.C.; Loy, C.C. Exploring clip for assessing the look and feel of images. In Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA, 7–14 February 2023; Volume 37, pp. 2555–2563. [Google Scholar]
- Radford, A.; Kim, J.W.; Hallacy, C.; Ramesh, A.; Goh, G.; Agarwal, S.; Sastry, G.; Askell, A.; Mishkin, P.; Clark, J.; et al. Learning transferable visual models from natural language supervision. In Proceedings of the International Conference on Machine Learning, Virtual, 18–24 July 2021; pp. 8748–8763. [Google Scholar]
- Manap, R.A.; Shao, L. Non-distortion-specific no-reference image quality assessment: A survey. Inf. Sci. 2015, 301, 141–160. [Google Scholar] [CrossRef]
- Xu, S.; Jiang, S.; Min, W. No-reference/blind image quality assessment: A survey. IETE Tech. Rev. 2017, 34, 223–245. [Google Scholar] [CrossRef]
- Zhai, G.; Min, X. Perceptual image quality assessment: A survey. Sci. China Inf. Sci. 2020, 63, 1–52. [Google Scholar] [CrossRef]
- Yang, X.; Li, F.; Liu, H. A survey of DNN methods for blind image quality assessment. IEEE Access 2019, 7, 123788–123806. [Google Scholar] [CrossRef]
- Xu, L.; Lin, W.; Kuo, C.C.J. Visual Quality Assessment by Machine Learning; Springer: Berlin/Heidelberg, Germany, 2015. [Google Scholar]
- Jenadeleh, M. Blind Image and Video Quality Assessment. Ph.D. Dissertation, Universität Konstanz, Baden-Württemberg, Germany, 2018. [Google Scholar]
- Men, H. Boosting for Visual Quality Assessment with Applications for Frame Interpolation Methods. Ph.D. Dissertation, Universität Konstanz, Baden-Württemberg, Germany, 2022. [Google Scholar]
- Thomee, B.; Shamma, D.A.; Friedland, G.; Elizalde, B.; Ni, K.; Poland, D.; Borth, D.; Li, L.J. YFCC100M: The new data in multimedia research. Commun. ACM 2016, 59, 64–73. [Google Scholar] [CrossRef]
- Saupe, D.; Hahn, F.; Hosu, V.; Zingman, I.; Rana, M.; Li, S. Crowd workers proven useful: A comparative study of subjective video quality assessment. In Proceedings of the QoMEX 2016: 8th International Conference on Quality of Multimedia Experience, Lisbon, Portugal, 6–8 June 2016. [Google Scholar]
- Shahriar, S.; Lund, B.D.; Mannuru, N.R.; Arshad, M.A.; Hayawi, K.; Bevara, R.V.K.; Mannuru, A.; Batool, L. Putting GPT-4o to the sword: A comprehensive evaluation of language, vision, speech, and multimodal proficiency. Appl. Sci. 2024, 14, 7782. [Google Scholar] [CrossRef]
- Islam, R.; Moushi, O.M. GPT-4o: The cutting-edge advancement in multimodal LLM. Authorea Prepr. 2024. [Google Scholar] [CrossRef]
- Priyanshu, A.; Maurya, Y.; Hong, Z. AI Governance and Accountability: An Analysis of Anthropic’s Claude. arXiv 2024, arXiv:2407.01557. [Google Scholar]
- Zhao, F.F.; He, H.J.; Liang, J.J.; Cen, J.; Wang, Y.; Lin, H.; Chen, F.; Li, T.P.; Yang, J.F.; Chen, L.; et al. Benchmarking the performance of large language models in uveitis: A comparative analysis of ChatGPT-3.5, ChatGPT-4.0, Google Gemini, and Anthropic Claude3. Eye 2024, 39, 1132–1137. [Google Scholar] [CrossRef]
- Liu, H.; Li, C.; Wu, Q.; Lee, Y.J. Visual instruction tuning. Adv. Neural Inf. Process. Syst. 2023, 36, 34892–34916. [Google Scholar]
- Yu, L.; Li, J.; Pakdaman, F.; Ling, M.; Gabbouj, M. MAMIQA: No-Reference Image Quality Assessment Based on Multiscale Attention Mechanism With Natural Scene Statistics. IEEE Signal Process. Lett. 2023, 30, 588–592. [Google Scholar] [CrossRef]
- Min, X.; Zhai, G.; Gu, K.; Liu, Y.; Yang, X. Blind image quality estimation via distortion aggravation. IEEE Trans. Broadcast. 2018, 64, 508–517. [Google Scholar] [CrossRef]
- Liu, L.; Dong, H.; Huang, H.; Bovik, A.C. No-reference image quality assessment in curvelet domain. Signal Process. Image Commun. 2014, 29, 494–505. [Google Scholar] [CrossRef]
- Chen, X.; Zhang, Q.; Lin, M.; Yang, G.; He, C. No-reference color image quality assessment: From entropy to perceptual quality. EURASIP J. Image Video Process. 2019, 2019, 77. [Google Scholar] [CrossRef]
- Li, Q.; Lin, W.; Fang, Y. No-reference quality assessment for multiply-distorted images in gradient domain. IEEE Signal Process. Lett. 2016, 23, 541–545. [Google Scholar] [CrossRef]
- Zhang, L.; Zhang, L.; Bovik, A.C. A feature-enriched completely blind image quality evaluator. IEEE Trans. Image Process. 2015, 24, 2579–2591. [Google Scholar] [CrossRef]
- Ou, F.Z.; Wang, Y.G.; Zhu, G. A novel blind image quality assessment method based on refined natural scene statistics. In Proceedings of the 2019 IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan, 22–25 September 2019; pp. 1004–1008. [Google Scholar]
- Venkatanath, N.; Praneeth, D.; Bh, M.C.; Channappayya, S.S.; Medasani, S.S. Blind image quality evaluation using perception based features. In Proceedings of the 2015 Twenty First National Conference on Communications (NCC), Mumbai, India, 27 February–1 March 2015; pp. 1–6. [Google Scholar]
- Liu, L.; Hua, Y.; Zhao, Q.; Huang, H.; Bovik, A.C. Blind image quality assessment by relative gradient statistics and adaboosting neural network. Signal Process. Image Commun. 2016, 40, 1–15. [Google Scholar] [CrossRef]
- Mittal, A.; Moorthy, A.K.; Bovik, A.C. Making image quality assessment robust. In Proceedings of the 2012 Conference Record of the Forty Sixth Asilomar Conference on Signals, Systems and Computers (ASILOMAR), Pacific Grove, CA, USA, 4–7 November 2012; pp. 1718–1722. [Google Scholar]
- Liu, L.; Liu, B.; Huang, H.; Bovik, A.C. No-reference image quality assessment based on spatial and spectral entropies. Signal Process. Image Commun. 2014, 29, 856–863. [Google Scholar] [CrossRef]
- Zhang, W.; Ma, K.; Zhai, G.; Yang, X. Uncertainty-aware blind image quality assessment in the laboratory and wild. IEEE Trans. Image Process. 2021, 30, 3474–3486. [Google Scholar] [CrossRef]
- Madhusudana, P.C.; Birkbeck, N.; Wang, Y.; Adsumilli, B.; Bovik, A.C. Image Quality Assessment using Contrastive Learning. arXiv 2021, arXiv:2110.13266. [Google Scholar] [CrossRef]
- Zhang, W.; Ma, K.; Yan, J.; Deng, D.; Wang, Z. Blind image quality assessment using a deep bilinear convolutional neural network. IEEE Trans. Circuits Syst. Video Technol. 2018, 30, 36–47. [Google Scholar] [CrossRef]
- Lin, H.; Hosu, V.; Saupe, D. DeepFL-IQA: Weak supervision for deep IQA feature learning. arXiv 2020, arXiv:2001.08113. [Google Scholar]
- Hosu, V.; Lin, H.; Sziranyi, T.; Saupe, D. KonIQ-10k: An ecologically valid database for deep learning of blind image quality assessment. IEEE Trans. Image Process. 2020, 29, 4041–4056. [Google Scholar] [CrossRef] [PubMed]
- Su, S.; Hosu, V.; Lin, H.; Zhang, Y.; Saupe, D. KonIQ++: Boosting No-Reference Image Quality Assessment in the Wild by Jointly Predicting Image Quality and Defects. In Proceedings of the 32nd British Machine Vision Conference, Virtual, 22–25 November 2021. [Google Scholar]
- Hosu, V.; Goldlucke, B.; Saupe, D. Effective aesthetics prediction with multi-level spatially pooled features. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 9375–9383. [Google Scholar]
- Ying, Z.; Niu, H.; Gupta, P.; Mahajan, D.; Ghadiyaram, D.; Bovik, A. From patches to pictures (PaQ-2-PiQ): Mapping the perceptual space of picture quality. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 3575–3585. [Google Scholar]
- Zeng, H.; Zhang, L.; Bovik, A.C. A probabilistic quality representation approach to deep blind image quality prediction. arXiv 2017, arXiv:1708.08190. [Google Scholar]
- Pan, Z.; Yuan, F.; Lei, J.; Fang, Y.; Shao, X.; Kwong, S. VCRNet: Visual compensation restoration network for no-reference image quality assessment. IEEE Trans. Image Process. 2022, 31, 1613–1627. [Google Scholar] [CrossRef]
- Su, Y.; Korhonen, J. Blind Natural Image Quality Prediction Using Convolutional Neural Networks And Weighted Spatial Pooling. In Proceedings of the 2020 IEEE International Conference on Image Processing (ICIP), Abu Dhabi, United Arab Emirates, 25–28 October 2020; pp. 191–195. [Google Scholar]
- Miyata, T. ZEN-IQA: Zero-Shot Explainable and No-Reference Image Quality Assessment With Vision Language Model. IEEE Access 2024, 12, 70973–70983. [Google Scholar] [CrossRef]
| | CLIVE [34] | KonIQ-10k [1] |
|---|---|---|
| Publication year | 2015 | 2018 |
| Number of images | 1169 | 10,073 |
| Subjective framework | Crowdsourcing | Crowdsourcing |
| Number of annotators | 8000 | 1400 |
| Number of annotations | 350,000 | 1,200,000 |
| Resolution | 500 × 500 | 1024 × 768 |
| Quality score range | 0–100 | 1–5 |
| | CLIVE [34] | KonIQ-10k [1] |
|---|---|---|
| PLCC | 0.098 | 0.111 |
| SROCC | 0.082 | 0.098 |
| KROCC | 0.061 | 0.069 |
Method | CLIVE [34] PLCC | CLIVE SROCC | CLIVE KROCC | KonIQ-10k [1] PLCC | KonIQ-10k SROCC | KonIQ-10k KROCC
---|---|---|---|---|---|---
BIQI [42] | 0.519 | 0.488 | 0.329 | 0.688 | 0.662 | 0.471 |
BLIINDS-II [47] | 0.473 | 0.442 | 0.291 | 0.574 | 0.575 | 0.414 |
BMPRI [92] | 0.541 | 0.487 | 0.333 | 0.637 | 0.619 | 0.421 |
BRISQUE [18] | 0.524 | 0.497 | 0.345 | 0.707 | 0.677 | 0.494 |
CurveletQA [93] | 0.636 | 0.621 | 0.421 | 0.730 | 0.718 | 0.495 |
DIIVINE [45] | 0.617 | 0.580 | 0.405 | 0.709 | 0.693 | 0.471 |
ENIQA [94] | 0.596 | 0.564 | 0.376 | 0.761 | 0.745 | 0.544 |
GM-LOG-BIQA [53] | 0.607 | 0.604 | 0.383 | 0.705 | 0.696 | 0.501 |
GWH-GLBP [95] | 0.584 | 0.559 | 0.395 | 0.723 | 0.698 | 0.507 |
IL-NIQE [96] | 0.487 | 0.415 | 0.280 | 0.463 | 0.447 | 0.306 |
NBIQA [97] | 0.629 | 0.604 | 0.427 | 0.771 | 0.749 | 0.515 |
NIQE [20] | 0.328 | 0.299 | 0.200 | 0.319 | 0.400 | 0.272 |
PIQE [98] | 0.172 | 0.108 | 0.081 | 0.208 | 0.246 | 0.172 |
OG-IQA [99] | 0.545 | 0.505 | 0.364 | 0.652 | 0.635 | 0.447 |
Robust BRISQUE [100] | 0.522 | 0.484 | 0.330 | 0.718 | 0.668 | 0.477 |
SSEQ [101] | 0.487 | 0.436 | 0.309 | 0.589 | 0.572 | 0.423 |
ChatGPT-4o-Latest | 0.758 | 0.732 | 0.576 | 0.805 | 0.760 | 0.610 |
GPT-4o-2024-11-20 | 0.775 | 0.745 | 0.588 | 0.807 | 0.765 | 0.618 |
GPT-4-Turbo-2024-04-09 | 0.696 | 0.677 | 0.520 | 0.729 | 0.694 | 0.535 |
Claude-3-Haiku-20240307 | 0.484 | 0.474 | 0.353 | 0.458 | 0.370 | 0.291 |
Claude-3-Opus-20240229 | 0.557 | 0.529 | 0.396 | 0.683 | 0.602 | 0.459 |
Claude-3-Sonnet-20240229 | 0.610 | 0.586 | 0.437 | 0.717 | 0.664 | 0.517 |
Method | CLIVE [34] PLCC | CLIVE SROCC | CLIVE KROCC | KonIQ-10k [1] PLCC | KonIQ-10k SROCC | KonIQ-10k KROCC
---|---|---|---|---|---|---
BLIINDER [70] | 0.782 | 0.763 | 0.576 | 0.876 | 0.864 | 0.668 |
UNIQUE [102] | 0.891 | 0.855 | 0.633 | 0.900 | 0.897 | 0.664 |
CONTRIQUE [103] | 0.857 | 0.845 | - | 0.906 | 0.894 | - |
DB-CNN [104] | 0.869 | 0.851 | - | 0.884 | 0.875 | - |
DeepFL-IQA [105] | 0.769 | 0.734 | - | 0.887 | 0.877 | - |
HyperIQA [25] | 0.882 | 0.859 | - | 0.917 | 0.906 | - |
KonCept512 [106,107] | 0.848 | 0.825 | - | 0.937 | 0.921 | - |
MAMIQA [91] | 0.895 | 0.874 | - | 0.937 | 0.926 | - |
MLSP [105,108] | 0.769 | 0.734 | - | 0.887 | 0.877 | - |
PaQ-2-PiQ [109] | 0.850 | 0.840 | - | 0.880 | 0.870 | - |
PQR [110] | 0.882 | 0.857 | - | 0.884 | 0.880 | - |
VCRNet [111] | 0.865 | 0.856 | - | 0.909 | 0.894 | - |
WSP [112] | - | - | - | 0.931 | 0.918 | - |
ZEN-IQA [113] | 0.664 | 0.672 | - | 0.796 | 0.776 | - |
ChatGPT-4o-Latest | 0.758 | 0.732 | 0.576 | 0.805 | 0.760 | 0.610 |
GPT-4o-2024-11-20 | 0.775 | 0.745 | 0.588 | 0.807 | 0.765 | 0.618 |
GPT-4-Turbo-2024-04-09 | 0.696 | 0.677 | 0.520 | 0.729 | 0.694 | 0.535 |
Claude-3-Haiku-20240307 | 0.484 | 0.474 | 0.353 | 0.458 | 0.370 | 0.291 |
Claude-3-Opus-20240229 | 0.557 | 0.529 | 0.396 | 0.683 | 0.602 | 0.459 |
Claude-3-Sonnet-20240229 | 0.610 | 0.586 | 0.437 | 0.717 | 0.664 | 0.517 |