Algorithms for Image Processing and Machine Vision
1. Introduction
2. Foundational Algorithmic and Compression Advances
- Determining Thresholds for Optimal Adaptive Discrete Cosine Transformation (Khanov et al. [11]) presents a method for finding optimal thresholds in the adaptive discrete cosine transform (DCT). By tailoring thresholds to an image's tonal distribution, the algorithm achieves compression ratios up to 66% higher than traditional DCT, with applications in bandwidth-intensive urban video surveillance.
- Synthetic Face Discrimination via Learned Image Compression (Iliopoulou et al. [12]) addresses the urgent challenge of distinguishing real from synthetic (GAN/diffusion-generated) faces. Their compression-based forensic approach exploits differences in post-compression quality, achieving efficient and generalized deepfake detection.
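As a rough illustration of the coefficient thresholding that adaptive DCT schemes build on, the sketch below transforms a 1D block with an orthonormal DCT-II, zeroes coefficients whose magnitude falls below a threshold, and reconstructs the block. It is not the threshold search of Khanov et al. [11]; the function names (`dct`, `idct`, `apply_threshold`), the sample block, and the fixed threshold value are assumptions chosen for illustration.

```python
import math


def dct(block):
    """Orthonormal 1D DCT-II of a list of samples."""
    n = len(block)
    coeffs = []
    for k in range(n):
        total = sum(
            block[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n)
        )
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        coeffs.append(scale * total)
    return coeffs


def idct(coeffs):
    """Inverse of dct() (orthonormal DCT-III)."""
    n = len(coeffs)
    samples = []
    for i in range(n):
        total = coeffs[0] * math.sqrt(1.0 / n)
        for k in range(1, n):
            total += (
                coeffs[k]
                * math.sqrt(2.0 / n)
                * math.cos(math.pi * (i + 0.5) * k / n)
            )
        samples.append(total)
    return samples


def apply_threshold(coeffs, threshold):
    """Zero every coefficient whose magnitude is below the threshold."""
    return [c if abs(c) >= threshold else 0.0 for c in coeffs]


# Hypothetical 8-sample block of pixel intensities (illustrative values only).
block = [52, 55, 61, 66, 70, 61, 64, 73]
coeffs = dct(block)
kept = apply_threshold(coeffs, threshold=5.0)  # threshold value is an assumption
nonzero = sum(1 for c in kept if c != 0.0)
restored = idct(kept)
max_error = max(abs(a - b) for a, b in zip(block, restored))
print(f"kept {nonzero} of {len(coeffs)} coefficients, max abs error {max_error:.2f}")
```

Raising the threshold zeroes more coefficients, which generally compresses better at the cost of a larger reconstruction error. An adaptive scheme chooses the threshold from the image content rather than fixing it in advance.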
3. Advances in Recognition, Detection, and Scene Understanding
- Human Action Representation Learning Using an Attention-Driven Residual 3DCNN Network (Ullah and Munir [13]) introduces the Dual-Attentional Residual 3D Convolutional Neural Network (DA-R3DCNN), a spatiotemporal attention model that delivers an 11% accuracy improvement and 74× faster inference, making it well suited to real-time human activity recognition.
- Indoor Scene Recognition with Transfer Learning and Liquid State Machines (Surendran et al. [14]) combines DenseNet201 features with fuzzy-based selection and liquid state machine classifiers, reaching 96% accuracy on NYU datasets, particularly benefiting assistive technology applications.
- DDL R-CNN: Dynamic Direction Learning R-CNN for Rotated Object Detection (Su and Jing [15]) tackles the challenge of detecting arbitrarily oriented targets in remote sensing. Their dynamic direction module pre-extracts orientation features, enabling anchor box alignment and surpassing prior rotation detectors on UCAS-AOD and HRSC2016 datasets.
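Rotated object detectors describe each target with an oriented box rather than an axis-aligned one. The sketch below shows one common parameterization (center, width, height, angle) and how to recover the four corner points. The `RotatedBox` class and the angle convention (radians, counterclockwise in a y-up frame) are assumptions for illustration and are not taken from DDL R-CNN [15].

```python
import math
from dataclasses import dataclass


@dataclass
class RotatedBox:
    """Oriented box; field names and angle convention are illustrative."""

    cx: float
    cy: float
    width: float
    height: float
    angle: float  # radians, counterclockwise in a y-up frame (assumed convention)

    def corners(self):
        """Return the four corners after rotating about the box center."""
        cos_a, sin_a = math.cos(self.angle), math.sin(self.angle)
        half_w, half_h = self.width / 2.0, self.height / 2.0
        offsets = [
            (-half_w, -half_h),
            (half_w, -half_h),
            (half_w, half_h),
            (-half_w, half_h),
        ]
        return [
            (self.cx + dx * cos_a - dy * sin_a, self.cy + dx * sin_a + dy * cos_a)
            for dx, dy in offsets
        ]

    def axis_aligned_bounds(self):
        """Smallest axis-aligned box (x_min, y_min, x_max, y_max) around the corners."""
        xs, ys = zip(*self.corners())
        return min(xs), min(ys), max(xs), max(ys)


# Hypothetical elongated target rotated by 45 degrees.
box = RotatedBox(cx=0.0, cy=0.0, width=4.0, height=2.0, angle=math.pi / 4)
x_min, y_min, x_max, y_max = box.axis_aligned_bounds()
enclosing_area = (x_max - x_min) * (y_max - y_min)
print(f"rotated area {box.width * box.height:.1f}, enclosing area {enclosing_area:.1f}")
```

The enclosing axis-aligned box covers more area than the rotated one, which is one reason oriented boxes are preferred for elongated targets such as ships in remote sensing imagery.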
4. Safety, Healthcare, Textile, and Inclusive Education Applications
- YOLOv7 for Real-Time Car Safety Belt Detection (Nkuzo et al. [16]) develops a high-performance seatbelt usage detection system with a mean average precision (mAP) of 99.6% and real-time deployment potential.
- Detecting Motorcyclists’ Helmet Violations via YOLOv8 + DCGAN (Shoman et al. [17]) enhances training data diversity using generative augmentation, improving helmet violation detection F1-scores from 0.91 to 0.96.
- Continuous Recognition of Teachers’ Hand Signals for Students with Attention Deficits (Chen et al. [18]) applies MediaPipe BlazePose to classroom settings, achieving 88% F1-score and providing real-time non-verbal cues for inclusive education.
- Improved U2Net-Based Surface Defect Detection for Blister Tablets (Zhou et al. [19]) advances pharmaceutical quality assurance, combining large-kernel attention with Gaussian–Laplacian loss for defect detection at 99% accuracy within 50 ms per image.
- Enhanced Curvature-Based Fabric Defect Detection with Gabor Transform and Deep Learning (Erdogan and Dogan [20]) proposes a curvature algorithm, integrated with the Gabor transform, achieving performance comparable to convolutional neural networks (CNNs) while requiring minimal storage and processing, making it ideal for real-time textile quality control.
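Several of the results above are reported as F1-scores. For readers less familiar with the metric, the sketch below computes precision, recall, and F1 from detection counts. The function name and the counts are hypothetical and are not taken from any of the cited papers.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts for a violation detector (illustrative values only).
precision, recall, f1 = precision_recall_f1(tp=90, fp=10, fn=10)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

Because F1 is a harmonic mean, it stays low unless both precision and recall are high, which makes it a stricter summary than accuracy when violations are rare.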
5. Generative and Diffusion-Based Models
- Denoising Diffusion Models on Model-Based Latent Space (Scribano et al. [21]) replaces learned autoencoders with model-based lossy compression schemes, reducing computational cost while preserving generative quality.
- GDUI: Guided Diffusion for Unlabeled Images (Xie and Zhao [22]) integrates Contrastive Language–Image Pretraining (CLIP)-based semantic alignment with diffusion, enabling label-free clustering and semantically guided synthesis.
- Lester: Rotoscope Animation via Segmentation and Tracking (Tous [23]) leverages the Segment Anything Model (SAM) and Decoupling Features in Hierarchical Propagation for Video Object Segmentation (DeAOT) to generate temporally consistent 2D animation, offering a deterministic alternative to diffusion pipelines in creative industries.
- GAGAN: Hybrid Genetic Algorithm-Optimized DCGANs (Konstantopoulou et al. [24]) demonstrates how evolutionary strategies can stabilize GAN training and improve image quality, showing the promise of hybrid evolutionary–deep learning models for generative synthesis.
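The diffusion-based entries above rest on a forward process that gradually mixes an image (or latent) with Gaussian noise. The sketch below computes the cumulative signal-retention factor used in the widely adopted DDPM formulation, x_t = sqrt(a_t) * x_0 + sqrt(1 - a_t) * eps, where a_t is the running product of (1 - beta), and applies it to a toy vector. The linear beta schedule, its endpoints, the step count, and the function names are assumptions for illustration; they are not taken from Scribano et al. [21] or Xie and Zhao [22].

```python
import math
import random


def linear_betas(steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (schedule endpoints are assumed values)."""
    if steps == 1:
        return [beta_start]
    return [
        beta_start + (beta_end - beta_start) * t / (steps - 1) for t in range(steps)
    ]


def cumulative_alphas(betas):
    """Running product of (1 - beta): the fraction of signal variance retained."""
    result, product = [], 1.0
    for beta in betas:
        product *= 1.0 - beta
        result.append(product)
    return result


def add_noise(x0, alpha_bar, rng):
    """Sample x_t from x_0 in one step using the closed-form forward process."""
    signal = math.sqrt(alpha_bar)
    noise = math.sqrt(1.0 - alpha_bar)
    return [signal * value + noise * rng.gauss(0.0, 1.0) for value in x0]


alpha_bars = cumulative_alphas(linear_betas(1000))
rng = random.Random(0)  # fixed seed so the sketch is repeatable
x0 = [1.0, -0.5, 0.25]  # toy "latent" (illustrative values only)
x_mid = add_noise(x0, alpha_bars[499], rng)
print(f"signal retained at the final step: {alpha_bars[-1]:.6f}")
```

The retained-signal factor shrinks monotonically toward zero, so late timesteps are nearly pure noise; the generative model is trained to reverse this corruption step by step.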
6. Emotion, Anomaly, and Surveillance
- Histogram Equalization with VGG Models for Facial Emotion Recognition (Chowdhury et al. [25]) achieves perfect classification on CK+ and near-perfect results on KDEF datasets, showing that simple preprocessing dramatically boosts deep model performance.
- Video Anomaly Detection with Vision Transformers and Spatiotemporal Attention (Habeb et al. [26]) develops a hybrid deep learning model that combines a vision transformer (ViT) with a convolutional spatiotemporal relationship (STR) attention block, achieving a 95.6% area under the receiver operating characteristic curve (AUC ROC) on the UCSD-Ped2 dataset and scaling effectively to very large datasets such as ShanghaiTech and CHAD.
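To make the preprocessing step in the facial emotion recognition entry concrete, the sketch below implements textbook global histogram equalization on a flat list of gray levels. It is a generic version of the technique, not the exact pipeline of Chowdhury et al. [25]; the function name and the sample pixel values are assumptions for illustration.

```python
def equalize_histogram(pixels, levels=256):
    """Histogram-equalize a flat list of integer gray levels in [0, levels - 1]."""
    histogram = [0] * levels
    for value in pixels:
        histogram[value] += 1

    # Cumulative distribution of gray levels.
    cdf, running = [], 0
    for count in histogram:
        running += count
        cdf.append(running)

    total = len(pixels)
    cdf_min = next((c for c in cdf if c > 0), 0)
    if total == 0 or total == cdf_min:
        return list(pixels)  # empty or constant image: nothing to stretch

    # Map each level so the occupied range spreads across [0, levels - 1].
    scale = (levels - 1) / (total - cdf_min)
    lookup = [max(0, round((c - cdf_min) * scale)) for c in cdf]
    return [lookup[value] for value in pixels]


# Hypothetical low-contrast patch (illustrative values only).
patch = [100, 100, 110, 120, 120, 130]
equalized = equalize_histogram(patch)
print(f"input range {min(patch)}-{max(patch)}, output range {min(equalized)}-{max(equalized)}")
```

The mapping preserves the ordering of gray levels while stretching a narrow intensity range across the full scale, which is the contrast boost a downstream classifier benefits from.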
7. Vision for Infrastructure, Hazards, and the Built Environment
- Application Framework for UAV-Based Earthquake-Induced Structural Displacement Monitoring (Ji et al. [27]) introduces an unmanned aerial vehicle (UAV) vision-based system validated on a six-story mass timber building under shake-table tests, achieving millimeter-level displacement accuracy compared with ground-truth sensors.
8. Conclusions
- Algorithmic efficiency is being redefined by hybrid architectures, lightweight preprocessing, and compression-aware methods.
- Societal impact is evident in transportation safety, inclusive education, pharmaceutical and textile quality control, and disaster resilience.
- Generative frontiers are rapidly advancing, with GAN-evolutionary hybrids and diffusion-guided methods opening creative and practical applications.
Conflicts of Interest
References
- Golan, I.; El-Yaniv, R. Deep anomaly detection using geometric transformations. In Proceedings of the 32nd International Conference on Neural Information Processing Systems (NIPS’18), Montréal, QC, Canada, 3–8 December 2018; Curran Associates Inc.: Red Hook, NY, USA, 2018; pp. 9781–9791.
- Velastegui, R.; Tatarchenko, M.; Karaoglu, S.; Gevers, T. Image semantic segmentation of indoor scenes: A survey. Comput. Vis. Image Underst. 2024, 248, 104102.
- del Olmo, J.J.L.; López-de-Teruel, P.E.; Ruiz, A.; García-Clemente, F.J. Computer Vision on the Edge: A Scalable Auto-ID Solution for Industrial Logistics. Procedia Comput. Sci. 2025, 265, 276–284.
- Cheng, Z.; Wu, Y.; Li, Y.; Cai, L.; Ihnaini, B. A Comprehensive Review of Explainable Artificial Intelligence (XAI) in Computer Vision. Sensors 2025, 25, 4166.
- Wang, T.-C.; Liu, M.-Y.; Zhu, J.-Y.; Tao, A.; Kautz, J.; Catanzaro, B. High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 8798–8807.
- Bermudez, L.; Dabby, N.; Lin, Y.A.; Hilmarsdottir, S.; Sundararajan, N.; Kar, S. A Learning-Based Approach to Parametric Rotoscoping of Multi-Shape Systems. In Proceedings of the 2021 IEEE Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 3–8 January 2021; pp. 776–785.
- Goodfellow, I.J.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative Adversarial Nets. In Proceedings of the Advances in Neural Information Processing Systems 27 (NIPS 2014): Annual Conference on Neural Information Processing Systems 2014, Montréal, QC, Canada, 8–13 December 2014; Volume 27.
- Khare, S.K.; Blanes-Vidal, V.; Nadimi, E.S.; Acharya, U.R. Emotion recognition and artificial intelligence: A systematic review (2014–2023) and research recommendations. Inf. Fusion 2024, 102, 102019.
- Nassif, A.B.; Talib, M.A.; Nasir, Q.; Dakalbab, F.M. Machine Learning for Anomaly Detection: A Systematic Review. IEEE Access 2021, 9, 78658–78700.
- Munir, A.; Kwon, J.; Lee, J.H.; Kong, J.; Blasch, E.; Aved, A.; Muhammad, K. FogSurv: A Fog-Assisted Architecture for Urban Surveillance Using Artificial Intelligence and Data Fusion. IEEE Access 2021, 9, 111938–111959.
- Khanov, A.; Shulzhenko, A.; Voroshilova, A.; Zubarev, A.; Karimov, T.; Fahmi, S. Determining Thresholds for Optimal Adaptive Discrete Cosine Transformation. Algorithms 2024, 17, 366.
- Iliopoulou, S.; Tsinganos, P.; Ampeliotis, D.; Skodras, A. Synthetic Face Discrimination via Learned Image Compression. Algorithms 2024, 17, 375.
- Ullah, H.; Munir, A. Human Action Representation Learning Using an Attention-Driven Residual 3DCNN Network. Algorithms 2023, 16, 369.
- Surendran, R.; Chihi, I.; Anitha, J.; Hemanth, D.J. Indoor Scene Recognition: An Attention-Based Approach Using Feature Selection-Based Transfer Learning and Deep Liquid State Machine. Algorithms 2023, 16, 430.
- Su, W.; Jing, D. DDL R-CNN: Dynamic Direction Learning R-CNN for Rotated Object Detection. Algorithms 2025, 18, 21.
- Nkuzo, L.; Sibiya, M.; Markus, E.D. A Comprehensive Analysis of Real-Time Car Safety Belt Detection Using the YOLOv7 Algorithm. Algorithms 2023, 16, 400.
- Shoman, M.; Ghoul, T.; Lanzaro, G.; Alsharif, T.; Gargoum, S.; Sayed, T. Enforcing Traffic Safety: A Deep Learning Approach for Detecting Motorcyclists’ Helmet Violations Using YOLOv8 and Deep Convolutional Generative Adversarial Network-Generated Images. Algorithms 2024, 17, 202.
- Chen, I.D.S.; Yang, C.-M.; Wu, S.-S.; Yang, C.-K.; Chen, M.-J.; Yeh, C.-H.; Lin, Y.-H. Continuous Recognition of Teachers’ Hand Signals for Students with Attention Deficits. Algorithms 2024, 17, 300.
- Zhou, J.; Huang, J.; Liu, J.; Liu, J. Improved U2Net-Based Surface Defect Detection Method for Blister Tablets. Algorithms 2024, 17, 429.
- Erdogan, M.; Dogan, M. Enhanced Curvature-Based Fabric Defect Detection: A Experimental Study with Gabor Transform and Deep Learning. Algorithms 2024, 17, 506.
- Scribano, C.; Pezzi, D.; Franchini, G.; Prato, M. Denoising Diffusion Models on Model-Based Latent Space. Algorithms 2023, 16, 501.
- Xie, X.; Zhao, J. GDUI: Guided Diffusion Model for Unlabeled Images. Algorithms 2024, 17, 125.
- Tous, R. Lester: Rotoscope Animation through Video Object Segmentation and Tracking. Algorithms 2024, 17, 330.
- Konstantopoulou, D.; Zacharia, P.; Papoutsidakis, M.; Leligou, H.C.; Patrikakis, C. GAGAN: Enhancing Image Generation Through Hybrid Optimization of Genetic Algorithms and Deep Convolutional Generative Adversarial Networks. Algorithms 2024, 17, 584.
- Chowdhury, J.H.; Liu, Q.; Ramanna, S. Simple Histogram Equalization Technique Improves Performance of VGG Models on Facial Emotion Recognition Datasets. Algorithms 2024, 17, 238.
- Habeb, M.H.; Salama, M.; Elrefaei, L.A. Enhancing Video Anomaly Detection Using a Transformer Spatiotemporal Attention Unsupervised Framework for Large Datasets. Algorithms 2024, 17, 286.
- Ji, R.; Sorosh, S.; Lo, E.; Norton, T.J.; Driscoll, J.W.; Kuester, F.; Barbosa, A.R.; Simpson, B.G.; Hutchinson, T.C. Application Framework and Optimal Features for UAV-Based Earthquake-Induced Structural Displacement Monitoring. Algorithms 2025, 18, 66.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Munir, A. Algorithms for Image Processing and Machine Vision. Algorithms 2025, 18, 750. https://doi.org/10.3390/a18120750
