Editorial

Digital Image Processing: Advanced Technologies and Applications

Department of Electrical and Computer Engineering, COMSATS University Islamabad, Abbottabad Campus, Tobe Camp, Abbottabad 22060, Pakistan
Appl. Sci. 2024, 14(14), 6051; https://doi.org/10.3390/app14146051
Submission received: 6 July 2024 / Accepted: 8 July 2024 / Published: 11 July 2024
(This article belongs to the Special Issue Digital Image Processing: Advanced Technologies and Applications)

1. Introduction

A few decades ago, conventional image processing methods mostly focused on basic tasks such as image enhancement, registration, and edge detection. Early attempts at these tasks mostly utilized grayscale images. Over time, simple methods designed for grayscale images suffered performance degradation on RGB images [1]. Consequently, RGB image processing received more attention and subsequent advancements were made, including color preservation and fusion-based processing [2]. Currently, deep learning is used extensively in various fields, such as speech recognition and healthcare, with encouraging outcomes in image processing tasks such as image classification and segmentation [3]. A recent study showed that deep learning-based approaches significantly improve the performance of many image-related tasks, such as object detection, recognition, and segmentation, compared to conventional methods.
With the evolution of convolutional neural networks (CNNs), supervised learning techniques were used to train CNNs to extract features that met their gold-label requirements [4]. The performance of these methods relied strictly on the available training data. Consequently, models trained on limited annotated data failed to capture fine image details. Since supervised learning approaches learn a nonlinear mapping, they tended to overfit the limited training data. As a result, the trained models struggled to yield encouraging results on unseen image data [5].
The domain of digital image processing has experienced remarkable advancements, particularly through the evolution of deep learning-based algorithms, which have enhanced capabilities in many real-life applications, such as object detection [6], recognition [7], segmentation [8], edge detection [9], and restoration [10]. Deep learning models also process high-dimensional data with great ease and efficiency [11]. Despite these advances, critical gaps remain in research and knowledge, especially regarding the robustness of deep learning models in challenging situations.
This Special Issue entitled “Digital Image Processing: Advanced Technologies and Applications” addresses these challenges by collecting 15 state-of-the-art research contributions that reinforce current methodologies and offer inventive solutions and novel perspectives. Looking ahead, future research will likely focus on developing more robust and explainable AI models to enhance the feasibility of image processing systems.
Future research can also explore the potential of quantum computing for processing increasingly complex image data, complementing current deep learning models. These directions will not only fill existing knowledge gaps but also open new possibilities for advanced applications and technologies in digital image processing.

2. An Overview of Published Articles

During the past three decades, a large number of diverse methods have appeared in computer vision and machine learning. Many of them utilize conventional machine learning. However, the recent trend in deep learning has yielded encouraging results. This section gives a brief overview of the works collected in this Special Issue.
In contribution one, researchers proposed an AI-enabled setup to analyze animal behavior, with the objective of providing the flexibility and scalability that make the setup more practical. One of the interesting aspects of this work is that users can conveniently plug in different behavior recognition algorithms to recognize animal behaviors and benefit from convenient human–computer interaction through natural language descriptions. A case study is discussed that evaluates behavioral variations between sick and healthy animals in a medical laboratory.
License plate recognition (LPR) is a key part of modern intelligent systems that locate and identify varying license plates (LPs). LPR is a challenging task due to the diverse designs of LPs, the lack of standard LP templates, unconventional outlines, and angle variations and occlusion. These factors alter the appearance of the LP and degrade the detection and recognition abilities of algorithms. Recent advances in machine learning have prompted the authors of the second contribution in this Special Issue to address this problem. Specifically, this contribution presents a novel LPR algorithm composed of three interconnected steps: first, a vehicle is detected using the Faster R-CNN algorithm; next, the LP is localized by applying morphological image processing operations; and lastly, LPR is accomplished using a deep learning network. Experiments on three different datasets indicate a mean LPR accuracy of over 96%.
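The three-step pipeline described above can be sketched in outline. The snippet below is a minimal illustration, not the authors' implementation: the vehicle detector and plate reader are stand-in callables, and the morphological localization stage is replaced by a simple geometric filter that keeps candidate boxes with license-plate-like aspect ratios.

```python
def filter_plate_candidates(boxes, min_aspect=2.0, max_aspect=6.0, min_area=400):
    """Keep (x, y, w, h) boxes whose geometry resembles a license plate.

    Stands in for the morphological localization stage: a real system would
    derive candidates from morphological operations on the vehicle crop.
    """
    kept = []
    for (x, y, w, h) in boxes:
        aspect = w / h
        if min_aspect <= aspect <= max_aspect and w * h >= min_area:
            kept.append((x, y, w, h))
    return kept

def recognize_plates(image, detect_vehicles, propose_regions, read_plate):
    """Three interconnected stages: detect vehicles, localize the plate, read it."""
    results = []
    for vehicle_crop in detect_vehicles(image):                              # stage 1: e.g., Faster R-CNN
        candidates = filter_plate_candidates(propose_regions(vehicle_crop))  # stage 2: localization
        results.extend(read_plate(vehicle_crop, box) for box in candidates)  # stage 3: recognition
    return results
```

The filter rejects square and extremely elongated regions, which is why template-free localization can still work across varied LP designs.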
The third contribution in this Special Issue addresses Urdu numeral classification and recognition. Urdu is one of the most complex languages, as it combines elements of several languages; therefore, its character recognition is a difficult task. It is also a bidirectional language, which induces further complexities during recognition. This contribution uses a CNN and its variants to extract features, which are then classified by a Softmax activation function and an SVM classifier. The obtained results are compared with GoogLeNet and the residual network (ResNet). This contribution reports 98.41% accuracy with the Softmax classifier and 99.0% with the SVM classifier, compared to 95.61% for GoogLeNet and 96.4% for ResNet.
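The decoupling this contribution exploits, i.e., a CNN as feature extractor feeding a separate classifier, can be illustrated independently of any particular network. In the hypothetical sketch below, pre-computed feature vectors are classified by a nearest-centroid rule, a deliberately simple stand-in for the SVM used in the paper:

```python
import numpy as np

def fit_centroids(features, labels):
    """Compute one centroid per class in the CNN feature space."""
    classes = np.unique(labels)
    centroids = np.stack([features[labels == c].mean(axis=0) for c in classes])
    return classes, centroids

def predict_nearest(features, classes, centroids):
    """Assign each feature vector to the class of its nearest centroid."""
    dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
    return classes[dists.argmin(axis=1)]

# Toy 2D feature vectors standing in for CNN activations of two numeral classes.
feats = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
labs = np.array([0, 0, 1, 1])
classes, cents = fit_centroids(feats, labs)
preds = predict_nearest(feats, classes, cents)
```

Swapping in a Softmax head, an SVM, or a nearest-centroid rule changes only the final stage, which is why the paper can compare classifiers on identical features.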
Unmanned aerial vehicle (UAV) imaging is a promising means of acquiring geospatial data. However, ensuring even and consistent quality in UAV images is difficult due to the use of low-cost steering devices and non-surveying cameras, and no standard procedures exist for quantitative quality tests on UAV images. Hence, in the fourth contribution, the authors conducted a modulation transfer function (MTF) investigation using a slanted-edge target, together with a ground sample distance (GSD) analysis, to verify the applicability of MTF analysis in assessing UAV image quality.
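GSD, one of the quantities involved in this analysis, follows from the camera's similar-triangles geometry: the ground footprint of one sensor pixel is the pixel pitch scaled by the ratio of flying height to focal length. A minimal sketch (the parameter values in the example are illustrative, not taken from the paper):

```python
def ground_sample_distance(pixel_pitch_mm, focal_length_mm, altitude_m):
    """Ground footprint of one pixel, in metres: GSD = pitch * (altitude / focal length)."""
    return pixel_pitch_mm * (altitude_m / focal_length_mm)

# e.g., 4 um pixels, an 8.8 mm lens, flying at 100 m -> roughly 4.5 cm per pixel
gsd = ground_sample_distance(0.004, 8.8, 100.0)
```

A coarser GSD than the MTF suggests (or vice versa) is exactly the kind of inconsistency such a quantitative test is meant to expose.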
The accurate extraction and matching of features in multi-view and multi-modal datasets is a difficult topic. In the fifth contribution, researchers present PhotoMatch, an open-source tool for multi-view and multi-modal feature-based image matching. The software contains several recently developed methods to process, extract, and match features. It also offers tools for a thorough assessment and comparison of the numerous methods, allowing the user to select the best combination of methods for each modality in the dataset. A set of thirteen case studies, which included six multi-view and six multi-modal image datasets, was processed following different methodologies.
In recognition of the importance of the video classification task and to summarize the success of deep learning models, contribution six is a concise review of this topic. In particular, this work highlights several major findings based on existing deep learning algorithms, emphasizing the types of architectures used, the evaluation criteria, and the datasets experimented on. Moreover, a fair comparison between recently reported deep learning methods and traditional approaches is provided. Furthermore, the important tasks based on the targets are highlighted to gauge the technical advancement of these systems.
In the seventh contribution, researchers addressed multiple-object tracking (MOT) in complex scenarios involving missed detections, false alarms, and frequent target switching. This contribution has clear potential in security applications, including public safety and fire prevention, where crucial targets must be tracked. The researchers proposed a multi-object tracking approach with an identity validity discrimination module. They introduced the KC-YOLO detection model for tracking, optimized detection frames, and implemented adaptive feature refinement to address challenges such as incomplete pedestrian features caused by occlusion. The proposed method improves pedestrian tracking accuracy while preserving pedestrian characteristics. In experiments on the MOT16, MOT17, and MOT20 datasets, the method yielded encouraging results.
The eighth contribution in this Special Issue studies the recognition of handwritten Arabic characters. Given the fundamental complexities of Arabic characters, which encompass semi-cursive styles, varying character forms, and the insertion of diacritical marks, this area of research has great potential. This work focuses on children’s handwritten Arabic text, an area recognized for its particular challenges, such as variations in writing and distortions. The researchers also collected a dataset, referred to as “Dhad”. Their investigation employs a tri-fold experimental approach covering pre-trained deep learning models, custom-designed ConvNet architectures, and established classifiers. The findings clarify the efficiency of fine-tuned models, the potential of custom ConvNet designs, and the trade-offs associated with several classification paradigms. The best pre-trained model yields a test accuracy of 93.59% on the authors’ collected dataset. Moreover, the researchers proposed the idea of a novel application, specifically for children younger than 13, aimed at improving their handwriting skills.
The ninth contribution in this Special Issue analyzes mammography images using a multi-branch attentional ConvNet. In this work, a research team proposed a method based on the multi-label classification of two-view mammography images. It leverages the correlation between a lesion type and its different states and classifies mammograms by density, anomaly type, and difficulty level. The method takes two-view mammograms as input, analyzes them using ConvNeXt and a channel attention mechanism, and integrates this information. Finally, the combined information is fed into multiple branches, which learn pattern representations to predict the appropriate state. This algorithm was evaluated on two public-domain benchmark datasets, INbreast and the Curated Breast Imaging Subset of DDSM. The developed CAD method assesses the holistic state of a patient and guides radiologists in the analysis of mammograms, with the facility to prepare a complete report of a patient’s condition with high confidence.
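The channel attention mechanism mentioned above is commonly realized as a squeeze-and-excitation block: feature maps are average-pooled to per-channel descriptors, passed through a small bottleneck MLP, and the resulting sigmoid gates re-weight each channel. A generic PyTorch sketch of that pattern, not the authors' exact architecture:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention: global average pool,
    bottleneck MLP, then sigmoid gates that re-weight each channel."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        w = self.fc(x.mean(dim=(2, 3)))   # (N, C) per-channel descriptors -> gates
        return x * w[:, :, None, None]    # broadcast gates over spatial dims

x = torch.randn(2, 32, 8, 8)
att = ChannelAttention(32, reduction=4)
out = att(x)
```

Because the gating preserves tensor shape, such a block can be dropped between any two convolutional stages of a backbone like ConvNeXt.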
The tenth contribution in this Special Issue concerns the detection and classification of vehicles in publicly available datasets using YOLO-v5. The authors apply a transfer learning method to dense traffic patterns. The datasets were made thorough by including varied conditions, for example, high- and low-density traffic images and distinct weather environments, so that the improved YOLO-v5 algorithm generalizes to diverse traffic examples. By fine-tuning the pre-trained system, the authors showed that the proposed YOLO-v5 surpasses various traditional vehicle detection algorithms in terms of accuracy and complexity. Experiments were conducted on three different datasets to demonstrate its effectiveness in varying real-life conditions.
The eleventh contribution in this Special Issue discusses instance segmentation in X-ray computed tomography (CT) data for non-destructive testing (NDT) by combining the segment anything model (SAM) with tile-based flood-filling networks (FFNs). The authors evaluate the performance of the SAM on volumetric NDT datasets and demonstrate its effectiveness in segmenting instances in challenging imaging scenarios. They implemented different strategies to adapt the image-based SAM algorithm to volumetric datasets, enabling the segmentation of 3D objects using the FFN’s spatial flexibility. The tile-based approach lets the SAM leverage the FFN’s ability to segment objects of various sizes. This research demonstrates the considerable potential of merging the SAM with FFNs for volumetric instance segmentation, particularly for large objects.
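The tile-based strategy can be illustrated with a toy harness that applies an arbitrary 2D segmenter tile by tile across a volume and stitches the masks back together. This is only a schematic of the idea, assuming non-overlapping tiles and a pointwise segmenter, not the authors' SAM/FFN pipeline:

```python
import numpy as np

def segment_volume_tiled(volume, tile, segment_patch):
    """Run a 2D segmenter over fixed-size tiles of every slice and stitch the masks."""
    mask = np.zeros(volume.shape, dtype=bool)
    for z in range(volume.shape[0]):                      # iterate axial slices
        for y in range(0, volume.shape[1], tile):
            for x in range(0, volume.shape[2], tile):
                patch = volume[z, y:y + tile, x:x + tile]
                mask[z, y:y + tile, x:x + tile] = segment_patch(patch)
    return mask

# Sanity check: with a pointwise segmenter (thresholding), the tiled result
# must exactly match a single global pass over the volume.
vol = np.arange(2 * 8 * 8, dtype=float).reshape(2, 8, 8)
tiled = segment_volume_tiled(vol, tile=4, segment_patch=lambda p: p > 60.0)
```

Real tile-based segmenters additionally need overlap and merging logic at tile borders, which is precisely where the flood-filling component contributes spatial consistency.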
The twelfth contribution in this Special Issue presents a novel methodology that combines bidirectional feature learning and generative networks to approach the domain adaptation problem. The study shows that merging bidirectional feature learning and generative networks is an effective solution for domain adaptation; through various evaluations, the authors verify that the combined approach outperforms existing works.
The thirteenth contribution proposes a fruit freshness classification method based on deep learning. After the fruit data were gathered, they were preprocessed, including augmentation and labeling. The AlexNet model was then used, with transfer learning and fine-tuning of the CNN, and the Softmax classifier performed the final classification. Experiments were performed on three commonly available datasets, and the proposed model achieved highly favorable results on all three, yielding over 98% classification accuracy. In addition, the method is computationally efficient and works in real time to yield the final classification result.
The fourteenth contribution is a survey of optical character recognition (OCR). OCR is the process of extracting handwritten or printed text from a scanned or photographed image and converting it to a machine-readable form for further data processing. OCR technology helps digitize documents for improved productivity and accessibility and is currently also useful for preserving historical documents. The authors briefly discuss recent OCR methods and identify the best-performing approaches that researchers could utilize in their applications. This contribution also covers research gaps and systematically presents future directions for Arabic-language OCR.
In the fifteenth contribution, the authors present a method for transferring style patterns while fusing the local style structure with the local content arrangement. First, multiple levels of coarse stylized features are reconstructed at low resolution using a coarse network; in this stage, the color distribution is transferred, and the content structure is combined with the initial style structure. Then, both the reconstructed features and the content features are adopted to produce high-quality, structure-aware stylized images at high resolution, using a fine network with three structural selective fusion (SSF) modules. The method proves robust, generating high-quality stylization outcomes.

3. Conclusions

The contributions listed in this Special Issue can be organized into three major groups with the following key attributes.
Group 1: Object detection: In this category, contributions 7 and 10 investigate object detection methods. In particular, contribution 7 addresses multi-pedestrian detection and tracking, whereas contribution 10 addresses vehicle detection in real-life images in an open environment.
Group 2: Object recognition: This category gathers several state-of-the-art contributions, namely contributions 1, 2, 3, 6, 8, 13, and 14, which use AI or deep learning methods to recognize various objects. This group is the most prominent in this Special Issue and collects significant scientific findings in the object recognition domain.
Group 3: Image Manipulations: This group gathers contributions 4, 5, 9, 11, 12, and 15. Specifically, contribution 4 evaluates the quality of aerial images, whereas contributions 5, 9, 11, 12, and 15 perform image manipulations through the various methods listed therein. Among these, contribution 11 is particularly related to image segmentation, which is currently a challenging task in various real-life scenarios.
After thoroughly analyzing the gathered contributions, the following important points are highlighted:
  • With the rapid advancements in AI and machine learning, the use of deep learning has become widespread across industries due to its ability to process complex patterns and make reliable predictions. Deep learning algorithms have therefore found their place in crucial fields, including object detection and recognition, natural language processing, and medical imaging. For instance, CNNs are extensively employed for tasks such as image classification, object detection, segmentation, and medical imaging, while recurrent neural networks (RNNs) and transformers have advanced applications such as real-time translation, sentiment analysis, and conversational agents. The development of GPUs and large-scale datasets has further driven deep learning’s adoption for solving complex problems with exceptional accuracy and efficiency.
  • With the evolution of RGB imaging, several state-of-the-art algorithms have appeared in the literature. Most of the image-related papers collected in this Special Issue address RGB images using intelligent combinations of machine learning-based methods to achieve the desired outcomes.
Final Remarks: Digital Image Processing: Advanced Technologies and Applications will serve as a fundamental resource for researchers and practitioners. It will also assist students who aim to orient their careers toward machine learning and deep learning, as it not only imparts basic knowledge but also stimulates advanced thinking and exploration of recent technological advancements. As the digital imaging domain continues to grow, the insights and methodologies collected in this Special Issue will provide resources and applications for newcomers. The following are a few major takeaways from the collection presented in this Special Issue:
Technological Integration: The collection presents a combination of digital image processing with advanced technologies, such as machine learning and deep learning, and demonstrates their potential for solving complex, real-world problems.
Algorithmic Development: This collection emphasizes the development and optimization of recent algorithms that process images, extract features, and deliver efficiently processed results.
Innovative Applications: A variety of applications in several domains are collected, including traffic images, medical imaging, and aerial images. Each manuscript gathered here underscores its practical relevance to modern-day technology.
Future Directions: This Special Issue also points toward future directions in several domains, such as color image processing, image analysis, and the development of more robust procedures capable of handling a variety of datasets. Finally, conventional object detection, recognition, and segmentation methods [12] can be integrated with recent deep learning algorithms to build more accurate and feasible systems for deployment in various scenarios.

Conflicts of Interest

The authors declare no conflicts of interest.

List of Contributions

  • Chen, Y.; Jiao, T.; Song, J.; He, G.; Jin, Z. AI-Enabled Animal Behavior Analysis with High Usability: A Case Study on Open-Field Experiments. Appl. Sci. 2024, 14, 4583. https://doi.org/10.3390/app14114583.
  • Sultan, F.; Khan, K.; Shah, Y.A.; Shahzad, M.; Khan, U.; Mahmood, Z. Towards Automatic License Plate Recognition in Challenging Conditions. Appl. Sci. 2023, 13, 3956. https://doi.org/10.3390/app13063956.
  • Bhatti, A.; Arif, A.; Khalid, W.; Khan, B.; Ali, A.; Khalid, S.; Rehman, A.U. Recognition and Classification of Handwritten Urdu Numerals Using Deep Learning Techniques. Appl. Sci. 2023, 13, 1624. https://doi.org/10.3390/app13031624.
  • Kim, J.H.; Sung, S.M. Quality Analysis of Unmanned Aerial Vehicle Images Using a Resolution Target. Appl. Sci. 2024, 14, 2154. https://doi.org/10.3390/app14052154.
  • Ruiz de Oña, E.; Barbero-García, I.; González-Aguilera, D.; Remondino, F.; Rodríguez-Gonzálvez, P.; Hernández-López, D. PhotoMatch: An Open-Source Tool for Multi-View and Multi-Modal Feature-Based Image Matching. Appl. Sci. 2023, 13, 5467. https://doi.org/10.3390/app13095467.
  • Belhaouari, A.R.S.B.; Kabir, M.A.; Khan, A. On the Use of Deep Learning for Video Classification. Appl. Sci. 2023, 13, 2007. https://doi.org/10.3390/app13032007.
  • Li, J.; Wu, W.; Zhang, D.; Fan, D.; Jiang, J.; Lu, Y.; Gao, E.; Yue, T. Multi-Pedestrian Tracking Based on KC-YOLO Detection and Identity Validity Discrimination Module. Appl. Sci. 2023, 13, 12228. https://doi.org/10.3390/app132212228.
  • AlMuhaideb, S.; Altwaijry, N.; AlGhamdy, A.D.; AlKhulaiwi, D.; AlHassan, R.; AlOmran, H.; AlSalem, A.M. Dhad—A Children’s Handwritten Arabic Characters Dataset for Automated Recognition. Appl. Sci. 2024, 14, 2332. https://doi.org/10.3390/app14062332.
  • Al-Mansour, E.; Hussain, M.; Aboalsamh, H.A.; Al-Ahmadi, S.A. Comprehensive Analysis of Mammography Images Using Multi-Branch Attention Convolutional Neural Network. Appl. Sci. 2023, 13, 12995. https://doi.org/10.3390/app132412995.
  • Farid, A.; Hussain, F.; Khan, K.; Shahzad, M.; Khan, U.; Mahmood, Z. A Fast and Accurate Real-time Vehicle Detection Method Using Deep Learning for Unconstrained Environments. Appl. Sci. 2023, 13, 3059. https://doi.org/10.3390/app13053059.
  • Gruber, R.; Rüger, S.; Wittenberg, T. Adapting the Segment Anything Model for Volumetric X-ray Data-Sets of Arbitrary Sizes. Appl. Sci. 2024, 14, 3391. https://doi.org/10.3390/app14083391.
  • Han, C.; Choo, H.; Jeong, J. Bidirectional-Feature-Learning-Based Adversarial Domain Adaptation with Generative Network. Appl. Sci. 2023, 13, 11825. https://doi.org/10.3390/app132111825.
  • Amin, U.; Shahzad, M.I.; Shahzad, A.; Shahzad, M.; Khan, U.; Mahmood, Z. Automatic Fruits Freshness Classification Using CNN and Transfer Learning. Appl. Sci. 2023, 13, 8087. https://doi.org/10.3390/app13148087.
  • Faizullah, S.; Ayub, M.S.; Hussain, S.; Khan, M.A. A Survey of OCR in Arabic Language: Applications, Techniques, and Challenges. Appl. Sci. 2023, 13, 4584. https://doi.org/10.3390/app13074584.
  • Liu, K.; Yuan, G.; Wu, H.; Qian, W. Coarse-to-Fine Structure-Aware Artistic Style Transfer. Appl. Sci. 2023, 13, 952. https://doi.org/10.3390/app13020952.

References

  1. Zhang, X.; Wang, X.; Yan, C.; Sun, Q. EV-fusion: A novel infrared and low-light color visible image fusion network integrating unsupervised visible image enhancement. IEEE Sens. J. 2024, 73, 5020911. [Google Scholar] [CrossRef]
  2. Yin, M.; Du, X.; Liu, W.; Yu, L.; Xing, Y. Multiscale fusion algorithm for underwater image enhancement based on color preservation. IEEE Sens. J. 2023, 23, 7728–7740. [Google Scholar] [CrossRef]
  3. Qi, Y.; Guo, Y.; Wang, Y. Image Quality Enhancement Using a Deep Neural Network for Plane Wave Medical Ultrasound Imaging. IEEE Trans. Ultrason. Ferroelectr. Freq. Control 2021, 68, 926–934. [Google Scholar] [CrossRef] [PubMed]
  4. Ye, T.; Qin, W.; Zhao, Z.; Gao, X.; Deng, X.; Ouyang, Y. Real-time object detection network in UAV-vision based on CNN and transformer. IEEE Trans. Instrum. Meas. 2023, 72, 2505713. [Google Scholar] [CrossRef]
  5. Mahmood, Z.; Ullah, A.; Khan, T.; Zahir, A. Miscellaneous Objects Detection Using Machine Learning Under Diverse Environments. Adv. Deep Gener. Models Med. Artif. Intell. Stud. Comput. Intell. 2023, 1124, 201–223. [Google Scholar]
  6. Alassafi, M.O.; Ibrahim, M.S.; Naseem, I.; AlGhamdi, R.; Alotaibi, R.; Kateb, F.A.; Oqaibi, H.M.; Alshdadi, A.A.; Yusuf, S.A. A novel deep learning architecture with image diffusion for robust face presentation attack detection. IEEE Access 2023, 11, 59204–59216. [Google Scholar] [CrossRef]
  7. Tan, Z.; Liu, A.; Wan, J.; Liu, H.; Lei, Z.; Guo, G.; Li, S.Z. Cross-batch hard example mining with pseudo large batch for id vs. spot face recognition. IEEE Trans. Image Process. 2022, 31, 3224–3235. [Google Scholar] [CrossRef]
  8. Sheikhjafari, A.; Krishnaswamy, D.; Noga, M.; Ray, N.; Punithakumar, K. Deep learning based parameterization of diffeomorphic image registration for cardiac image segmentation. IEEE Trans. NanoBiosci. 2023, 22, 800–807. [Google Scholar] [CrossRef] [PubMed]
  9. Felt, V.; Kacker, S.; Kusters, J.; Pendergrast, J.; Cahoy, K. Fast ocean front detection using deep learning edge detection models. IEEE Trans. Geosci. Remote Sens. 2023, 61, 4204812. [Google Scholar] [CrossRef]
  10. Zhang, Q.; Dong, Y.; Yuan, Q.; Song, M.; Yu, H. Combined deep priors with low-rank tensor factorization for hyperspectral image restoration. IEEE Geosci. Remote Sens. Lett. 2023, 20, 5500205. [Google Scholar] [CrossRef]
  11. Wu, R.; Zheng, F.; Li, M.; Huang, S.; Ge, X.; Liu, L.; Liu, Y.; Ni, G. Toward ground-truth optical coherence tomography via three-dimensional unsupervised deep learning processing and data. IEEE Trans. Med. Imaging 2024, 43, 2395–2407. [Google Scholar]
  12. Mahmood, Z.; Muhammad, N.; Bibi, N.; Ali, T. A Review on state-of-the-art Face Recognition Approaches. Fractals Complex Geom. Patterns Scaling Nat. Soc. 2017, 25, 1750025. [Google Scholar] [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
