

Novel Research on Image and Video Processing Technology

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: 20 May 2025 | Viewed by 16025

Special Issue Editor


Dr. Samuel Cheng
Guest Editor
School of Electrical and Computer Engineering, University of Oklahoma, Norman, OK 73019-1102, USA
Interests: information theory; signal and image processing

Special Issue Information

Dear Colleagues,

MDPI Applied Sciences is pleased to announce a Call for Papers for an upcoming Special Issue on "Novel Research on Image and Video Processing Technology". We invite authors from academia, industry, and research institutions globally to contribute their high-quality original research and review articles for this Issue.

The rapidly evolving field of image and video processing technology has established new paradigms across various scientific and technological fields, including computer science, engineering, telecommunications, robotics, and artificial intelligence. This Special Issue seeks to publish research articles and reviews that report significant advances and breakthroughs in topics including, but not limited to, the following:

  • Advanced algorithms for image and video processing;
  • Machine learning and AI in image and video processing;
  • Deep learning techniques in video and image recognition;
  • Augmented Reality and Virtual Reality image processing;
  • Computational photography and videography;
  • 3D image and video processing and analysis;
  • Image and video compression, coding, and encryption;
  • Real-time image and video processing;
  • Biometric image processing;
  • Medical image and video analysis.

Dr. Samuel Cheng
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • image and video processing algorithms
  • AI/ML in image and video processing
  • AR/VR image and video processing
  • 3D image and video processing
  • medical image and video analysis

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information on MDPI's Special Issue policies is available on the MDPI website.

Published Papers (9 papers)


Research

Jump to: Review

15 pages, 2580 KiB  
Article
Self-Attention (SA)-ConvLSTM Encoder–Decoder Structure-Based Video Prediction for Dynamic Motion Estimation
by Jeongdae Kim, Hyunseung Choo and Jongpil Jeong
Appl. Sci. 2024, 14(23), 11315; https://doi.org/10.3390/app142311315 - 4 Dec 2024
Viewed by 1438
Abstract
Video prediction, which is the task of predicting future video frames based on past observations, remains a challenging problem because of the complexity and high dimensionality of spatiotemporal dynamics. To address the problems associated with spatiotemporal prediction, which is an important decision-making tool in various fields, several deep learning models have been proposed. Convolutional long short-term memory (ConvLSTM) can capture space and time simultaneously and has shown excellent performance in various applications, such as image and video prediction, object detection, and semantic segmentation. However, ConvLSTM has limitations in capturing long-term temporal dependencies. To solve this problem, this study proposes an encoder–decoder structure using self-attention ConvLSTM (SA-ConvLSTM), which retains the advantages of ConvLSTM and effectively captures long-range dependencies through the self-attention mechanism. The effectiveness of the encoder–decoder structure using SA-ConvLSTM was validated through experiments on the Moving MNIST and KTH datasets. Full article
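To make the recurrence concrete, here is a minimal NumPy sketch of a single ConvLSTM step, the building block the paper extends with self-attention (the self-attention module itself is omitted). The kernel names, the single-channel shapes, and the naive "same"-padded correlation are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def conv2d_same(x, k):
    """Naive single-channel 2-D cross-correlation with 'same' zero padding."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def convlstm_step(x, h, c, kernels):
    """One ConvLSTM step: every gate is a convolution over input and hidden state."""
    i = sigmoid(conv2d_same(x, kernels['wxi']) + conv2d_same(h, kernels['whi']))
    f = sigmoid(conv2d_same(x, kernels['wxf']) + conv2d_same(h, kernels['whf']))
    o = sigmoid(conv2d_same(x, kernels['wxo']) + conv2d_same(h, kernels['who']))
    g = np.tanh(conv2d_same(x, kernels['wxg']) + conv2d_same(h, kernels['whg']))
    c_next = f * c + i * g        # cell state mixes old memory with new content
    h_next = o * np.tanh(c_next)  # hidden state: the spatial map passed onward
    return h_next, c_next
```

Because the gates are convolutions rather than dense layers, spatial structure is preserved through time, which is why ConvLSTM suits frame prediction; the paper's SA-ConvLSTM additionally applies self-attention to the hidden states to capture long-range dependencies.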
(This article belongs to the Special Issue Novel Research on Image and Video Processing Technology)

21 pages, 7944 KiB  
Article
A Method for All-Weather Unstructured Road Drivable Area Detection Based on Improved Lite-Mobilenetv2
by Qingyu Wang, Chenchen Lyu and Yanyan Li
Appl. Sci. 2024, 14(17), 8019; https://doi.org/10.3390/app14178019 - 7 Sep 2024
Cited by 1 | Viewed by 1332
Abstract
This paper presents an all-weather drivable area detection method based on deep learning, addressing the challenges of recognizing unstructured roads and achieving clear environmental perception under adverse weather conditions in current autonomous driving systems. The method enhances the Lite-Mobilenetv2 feature extraction module and integrates a pyramid pooling module with an attention mechanism. Moreover, it introduces a defogging preprocessing module suitable for real-time detection, which transforms foggy images into clear ones for accurate drivable area detection. The experiments adopt a transfer learning-based training approach, training an all-road-condition semantic segmentation model on four datasets that include both structured and unstructured roads, with and without fog. This strategy reduces computational load and enhances detection accuracy. Experimental results demonstrate a 3.84% efficiency improvement compared to existing algorithms. Full article
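The paper's defogging preprocessing module is not detailed in this abstract; as a hedged illustration of one standard building block for fog removal, here is a NumPy sketch of the dark-channel-prior transmission estimate. The `omega` and `patch` values are conventional defaults, not parameters from the paper:

```python
import numpy as np

def dark_channel(img, patch=3):
    """Dark channel: per-pixel minimum over RGB, then a local minimum filter.
    Haze-free regions tend toward 0; fog lifts the dark channel."""
    mins = img.min(axis=2)                 # minimum over color channels
    h, w = mins.shape
    p = patch // 2
    padded = np.pad(mins, p, mode='edge')
    out = np.empty_like(mins)
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + patch, j:j + patch].min()
    return out

def estimate_transmission(img, atmosphere, omega=0.95, patch=3):
    """Transmission map t(x) = 1 - omega * dark_channel(I / A); the defogged
    image is then recovered as J = (I - A) / max(t, t0) + A."""
    norm = img / np.maximum(atmosphere, 1e-6)
    return 1.0 - omega * dark_channel(norm, patch)
```

A real-time module such as the one described would likely replace the nested loops with vectorized or hardware-accelerated filtering, but the estimate itself is this cheap per pixel.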

11 pages, 2725 KiB  
Article
Methods for Reducing Ring Artifacts in Tomographic Images Using Wavelet Decomposition and Averaging Techniques
by Paweł Lipowicz, Marta Borowska and Agnieszka Dardzińska-Głębocka
Appl. Sci. 2024, 14(16), 7292; https://doi.org/10.3390/app14167292 - 19 Aug 2024
Cited by 2 | Viewed by 1261
Abstract
Computed tomography (CT) is one of the fundamental imaging modalities used in medicine, allowing for the acquisition of accurate cross-sectional images of internal body tissues. However, during the acquisition and reconstruction process, various artifacts can arise, and one of them is ring artifacts. These artifacts result from the inherent limitations of CT scanner components and the properties of the scanned material, such as detector defects, non-uniform distribution of radiation from the source, or the presence of metallic elements within the scanning region. The purpose of this study was to identify and reduce ring artifacts in tomographic images using image decomposition and average filtering methods. In this study, tests were conducted on the effectiveness of identifying ring artifacts using wavelet decomposition methods for images. The test was performed on a Shepp–Logan phantom with implemented artifacts of different intensity levels. The analysis was performed using different wavelet families, and linear approximation methods were used to filter the image in the identified areas. Additional filtering was performed using moving average methods and empirical mode decomposition (EMD) techniques. Image comparison methods, i.e., RMSE, SSIM and MS-SSIM, were used to evaluate performance. The results of this study showed a significant improvement in the quality of tomographic phantom images. The authors obtained more than 50% improvement in image quality with reference to the image without any filtration. The different wavelet families had different efficiencies with relation to the identification of the induction regions of ring artifacts. The Haar wavelet and Coiflet 1 showed the best performance in identifying artifact induction regions, with comparative RMSE values for these wavelets of 0.1477 for Haar and 0.1469 for Coiflet 1. 
The additional moving-average filtering and EMD further improved image quality, as confirmed by the image comparison results. The obtained results allow us to assess how the applied methods reduce ring artifacts in phantom images with induced artifacts. Full article
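As a simplified illustration of the averaging idea (not the authors' wavelet-based pipeline), ring artifacts correspond to stripe-like per-detector offsets in the projection domain, which a moving average can isolate and subtract. A NumPy sketch, assuming the sinogram is laid out as rows = projection angles and columns = detector elements:

```python
import numpy as np

def remove_stripes(sinogram, window=5):
    """Suppress detector-gain stripes (which reconstruct as rings):
    estimate the per-column bias as the column mean, smooth it with a
    moving average, and subtract the high-frequency residual."""
    col_mean = sinogram.mean(axis=0)
    kernel = np.ones(window) / window
    smooth = np.convolve(col_mean, kernel, mode='same')
    stripe = col_mean - smooth           # sharp per-detector offsets
    return sinogram - stripe[None, :]
```

The wavelet decomposition used in the paper serves a similar role more selectively: it separates the artifact-carrying subbands before filtering, so genuine image detail is less affected than with this global subtraction.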

21 pages, 8984 KiB  
Article
Multi-Modal Low-Data-Based Learning for Video Classification
by Erol Citak and Mine Elif Karsligil
Appl. Sci. 2024, 14(10), 4272; https://doi.org/10.3390/app14104272 - 17 May 2024
Viewed by 1674
Abstract
Video classification is a challenging task in computer vision that requires analyzing the content of a video to assign it to one or more predefined categories. However, due to the vast amount of visual data contained in videos, the classification process is often computationally expensive and requires a significant amount of annotated data. Because of these reasons, the low-data-based video classification area, which consists of few-shot and zero-shot tasks, is proposed as a potential solution to overcome traditional video classification-oriented challenges. However, existing low-data area datasets, which are either not diverse or have no additional modality context, which is a mandatory requirement for the zero-shot task, do not fulfill the requirements for few-shot and zero-shot tasks completely. To address this gap, in this paper, we propose a large-scale, general-purpose dataset for the problem of multi-modal low-data-based video classification. The dataset contains pairs of videos and attributes that capture multiple facets of the video content. Thus, the new proposed dataset will both enable the study of low-data-based video classification tasks and provide consistency in terms of comparing the evaluations of future studies in this field. Furthermore, to evaluate and provide a baseline for future works on our new proposed dataset, we present a variational autoencoder-based model that leverages the inherent correlation among different modalities to learn more informative representations. In addition, we introduce a regularization technique to improve the baseline model’s generalization performance in low-data scenarios. Our experimental results reveal that our proposed baseline model, with the aid of this regularization technique, achieves over 12% improvement in classification accuracy compared to the pure baseline model with only a single labeled sample. Full article
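The variational-autoencoder baseline relies on the standard reparameterization trick and a KL regularizer; a minimal NumPy sketch of those two pieces follows. The latent dimensions and the toy late-fusion of modalities are illustrative assumptions, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(42)

def reparameterize(mu, log_var):
    """z = mu + sigma * eps keeps sampling differentiable w.r.t. (mu, sigma)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """KL(q(z|x) || N(0, I)) for a diagonal Gaussian, summed over dimensions."""
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var)

def fuse_modalities(mu_video, mu_attr):
    """Toy late fusion: average the per-modality latent means."""
    return 0.5 * (mu_video + mu_attr)
```

The regularization technique the paper adds on top of this baseline is what yields the reported 12% single-shot accuracy gain; the sketch shows only the shared VAE machinery.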

16 pages, 524 KiB  
Article
Non-Iterative Cluster Routing: Analysis and Implementation Strategies
by Huong Pham and Samuel Cheng
Appl. Sci. 2024, 14(5), 1706; https://doi.org/10.3390/app14051706 - 20 Feb 2024
Cited by 2 | Viewed by 1272
Abstract
In conventional routing, a capsule network employs routing algorithms for bidirectional information flow between layers through iterative processes. In contrast, the cluster routing technique utilizes a non-iterative process and can outperform state-of-the-art models with fewer parameters, while preserving the part–whole relationship and demonstrating robust generalization to novel viewpoints. This paper aims to further analyze and clarify this concept, providing insights that allow users to implement the cluster routing technique efficiently. Additionally, we expand the technique and propose variations based on the routing principle of achieving consensus among votes in distinct clusters. In some cases, these variations have the potential to enhance the cluster routing performance while utilizing similar memory and computing resources. Full article
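A toy NumPy sketch of the non-iterative idea: score fixed clusters of votes by their agreement in a single pass, with no iterative refinement. The equal-size clustering and the softmax-over-negative-spread scoring are illustrative assumptions, not the paper's exact routing rule:

```python
import numpy as np

def cluster_route(votes, n_clusters):
    """Non-iterative routing sketch: partition the votes into fixed clusters,
    score each cluster by vote agreement (low spread = high agreement), and
    emit the agreement-weighted mean of the cluster centroids."""
    clusters = np.split(votes, n_clusters)           # fixed partition, one pass
    centroids = np.stack([c.mean(axis=0) for c in clusters])
    spread = np.array([((c - c.mean(axis=0)) ** 2).sum() for c in clusters])
    agreement = np.exp(-spread) / np.exp(-spread).sum()   # softmax over -spread
    return (agreement[:, None] * centroids).sum(axis=0), agreement
```

The appeal of the non-iterative formulation is exactly what the single pass above shows: the cost is one centroid and one spread computation per cluster, with no dynamic-routing loop.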

13 pages, 860 KiB  
Article
Sample-Based Gradient Edge and Angular Prediction for VVC Lossless Intra-Coding
by Guojie Chen and Min Lin
Appl. Sci. 2024, 14(4), 1653; https://doi.org/10.3390/app14041653 - 18 Feb 2024
Cited by 1 | Viewed by 1590
Abstract
Lossless coding is a compression method in the Versatile Video Coding (VVC) standard, which can compress video without distortion. Lossless coding has great application prospects in fields with high requirements for video quality. Since the current VVC standard is mainly designed for lossy coding, the compression efficiency of VVC lossless coding makes it hard to meet people’s needs. In order to improve the performance of VVC lossless coding, this paper proposes a sample-based intra-gradient edge detection and angular prediction (SGAP) method. SGAP utilizes the characteristics of lossless intra-coding to employ samples adjacent to the current sample as reference samples and performs prediction through sample iteration. SGAP aims to improve the prediction accuracy for edge regions, smooth regions and directional texture regions in images. Experimental results on the VVC Test Model (VTM) 12.3 reveal that SGAP achieves 7.31% bit-rate savings on average in VVC lossless intra-coding, while the encoding time is only increased by 5.4%. Compared with existing advanced sample-based intra-prediction methods, SGAP can provide significantly higher coding performance gain. Full article
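SGAP's exact prediction rules are not reproduced here; as a hedged sketch of the same family of ideas, the classic median edge detector (MED) predictor from JPEG-LS predicts each sample from its causal neighbours using a simple gradient/edge test:

```python
import numpy as np

def med_predict(img):
    """Median edge detector (MED) predictor, the classic sample-based
    gradient/edge rule from JPEG-LS; SGAP refines this kind of rule with
    angular modes. Each pixel is predicted from its left (a), top (b),
    and top-left (c) neighbours, with 0 assumed outside the image."""
    h, w = img.shape
    pred = np.zeros_like(img)
    for i in range(h):
        for j in range(w):
            a = img[i, j - 1] if j > 0 else 0
            b = img[i - 1, j] if i > 0 else 0
            c = img[i - 1, j - 1] if i > 0 and j > 0 else 0
            if c >= max(a, b):
                pred[i, j] = min(a, b)    # likely edge above or to the left
            elif c <= min(a, b):
                pred[i, j] = max(a, b)
            else:
                pred[i, j] = a + b - c    # smooth region: planar prediction
    return pred
```

Because prediction uses only already-decoded samples, the decoder can reproduce it exactly, which is what makes sample-by-sample iteration viable for lossless intra-coding.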

15 pages, 8918 KiB  
Article
A Fast Algorithm for VVC Intra Coding Based on the Most Probable Partition Pattern List
by Haiwu Zhao, Shuai Zhao, Xiwu Shang and Guozhong Wang
Appl. Sci. 2023, 13(18), 10381; https://doi.org/10.3390/app131810381 - 17 Sep 2023
Cited by 3 | Viewed by 1680
Abstract
Compared with High-Efficiency Video Coding (HEVC), Versatile Video Coding (VVC) has more flexible partitioning and higher compression efficiency, but it also has higher computational complexity. In order to reduce the coding complexity, a fast algorithm based on the most probable partition pattern list (MPPPL) and pixel content similarity is proposed. Firstly, the MPPPL is constructed by using the average texture complexity difference of the sub-coding units under different partition modes. Then, the sub-block pixel mean difference is used to decide the best partition mode or shorten the MPPPL. Finally, the selection rules of the reference lines in the intra-prediction process are counted, and unnecessary reference lines are skipped by using pixel content similarity. The experimental results show that, compared with VTM-13.0, the proposed algorithm can save 52.26% of the encoding time, while the BDBR (Bjontegaard delta bit rate) only increases by 1.23%. Full article
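A toy NumPy sketch of the kind of texture-complexity ordering an MPPPL is built from: rank split directions by the complexity gap between the resulting sub-blocks, on the intuition that a split following the content separates a flat half from a textured half. The complexity measure and the two-way split set are illustrative simplifications, not the VTM partition set:

```python
import numpy as np

def texture_complexity(block):
    """Mean absolute gradient as a simple texture-complexity measure."""
    gx = np.abs(np.diff(block, axis=1)).mean()
    gy = np.abs(np.diff(block, axis=0)).mean()
    return gx + gy

def rank_partitions(cu):
    """Order binary split directions by the complexity gap between the two
    halves: a large gap suggests the split boundary follows the content
    (a toy MPPPL ordering, not the full VVC QTMT decision)."""
    h, w = cu.shape
    scores = {
        'horizontal': abs(texture_complexity(cu[:h // 2, :]) -
                          texture_complexity(cu[h // 2:, :])),
        'vertical':   abs(texture_complexity(cu[:, :w // 2]) -
                          texture_complexity(cu[:, w // 2:])),
    }
    return sorted(scores, key=scores.get, reverse=True)
```

An encoder using such a list would rate-distortion-test only the top-ranked modes, which is where the reported encoding-time savings come from.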

Review

Jump to: Research

42 pages, 2122 KiB  
Review
A Review Toward Deep Learning for High Dynamic Range Reconstruction
by Gabriel de Lima Martins, Josue Lopez-Cabrejos, Julio Martins, Quefren Leher, Gustavo de Souza Ferreti, Lucas Hildelbrano Costa Carvalho, Felipe Bezerra Lima, Thuanne Paixão and Ana Beatriz Alvarez
Appl. Sci. 2025, 15(10), 5339; https://doi.org/10.3390/app15105339 - 10 May 2025
Viewed by 216
Abstract
High Dynamic Range (HDR) image reconstruction has gained prominence in a wide range of fields; not only is it implemented in computer vision, but industries such as entertainment and medicine also benefit considerably from this technology due to its ability to capture and reproduce scenes with a greater variety of luminosities, extending conventional levels of perception. This article presents a review of the state of the art of HDR reconstruction methods based on deep learning, ranging from classical approaches that are still expressive and relevant to more recent proposals involving the advent of new architectures. The fundamental role of high-quality datasets and specific metrics in evaluating the performance of HDR algorithms is also discussed, as well as emphasizing the challenges inherent in capturing multiple exposures and dealing with artifacts. Finally, emerging trends and promising directions for overcoming current limitations and expanding the potential of HDR reconstruction in real-world scenarios are highlighted. Full article
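As a minimal illustration of the multi-exposure side of HDR reconstruction (a classical baseline that the deep methods surveyed here aim to outperform), a NumPy sketch of Mertens-style well-exposedness weighting; the single-scale weighted average below stands in for the multi-scale pyramid fusion used in practice, and `sigma` is a conventional default:

```python
import numpy as np

def well_exposedness(img, sigma=0.2):
    """Mertens-style weight: pixels near mid-gray (0.5) count most."""
    return np.exp(-((img - 0.5) ** 2) / (2 * sigma ** 2))

def fuse_exposures(stack):
    """Per-pixel weighted average of an exposure stack (a naive,
    single-scale stand-in for multi-scale exposure fusion)."""
    weights = np.stack([well_exposedness(im) for im in stack])
    weights /= weights.sum(axis=0, keepdims=True) + 1e-12
    return (weights * np.stack(stack)).sum(axis=0)
```

Deep HDR methods replace both the hand-crafted weight and the fusion step with learned components, which is also how they handle the ghosting artifacts from misaligned exposures that this static fusion ignores.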

42 pages, 1068 KiB  
Review
Personalized Video Summarization: A Comprehensive Survey of Methods and Datasets
by Michail Peronikolis and Costas Panagiotakis
Appl. Sci. 2024, 14(11), 4400; https://doi.org/10.3390/app14114400 - 22 May 2024
Cited by 1 | Viewed by 4086
Abstract
In recent years, scientific and technological developments have led to an explosion of available videos on the web, increasing the necessity of fast and effective video analysis and summarization. Video summarization methods aim to generate a synopsis by selecting the most informative parts of the video content. The user’s personal preferences, often involved in the expected results, should be taken into account in the video summaries. In this paper, we provide the first comprehensive survey on personalized video summarization, covering the techniques and datasets used. In this context, we classify and review personalized video summarization techniques based on the type of personalized summary, on the criteria, on the video domain, on the source of information, on the time of summarization, and on the machine learning technique. Depending on the type of methodology used by the personalized video summarization techniques for the summary production process, we classify the techniques into five major categories: feature-based video summarization, keyframe selection, shot-selection-based approaches, video summarization using trajectory analysis, and personalized video summarization using clustering. We also compare personalized video summarization methods and present 37 datasets used to evaluate them. Finally, we analyze opportunities and challenges in the field and suggest innovative research lines. Full article
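One of the survey's five categories, personalized summarization via clustering, can be sketched in a few lines: cluster per-frame feature vectors and keep the frame nearest each centroid as a keyframe. The k-means details and the feature choice below are illustrative assumptions, not any specific surveyed method:

```python
import numpy as np

def select_keyframes(features, k, iters=10, seed=0):
    """Toy clustering-based summarization: k-means over per-frame feature
    vectors (e.g., color histograms), then one representative frame per
    cluster, the frame nearest each centroid."""
    rng = np.random.default_rng(seed)
    centroids = features[rng.choice(len(features), size=k, replace=False)]
    for _ in range(iters):
        d = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centroids[c] = features[labels == c].mean(axis=0)
    d = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    return sorted(set(d.argmin(axis=0)))   # frame indices of the keyframes
```

Personalization enters by weighting or filtering the features toward the user's stated preferences before clustering, which is one of the axes along which the survey classifies methods.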
