

Novel Research on Image and Video Processing Technology

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: 20 May 2025 | Viewed by 16025

Special Issue Editor


Dr. Samuel Cheng
Guest Editor
School of Electrical and Computer Engineering, University of Oklahoma, Norman, OK 73019-1102, USA
Interests: information theory; signal and image processing

Special Issue Information

Dear Colleagues,

MDPI Applied Sciences is pleased to announce a Call for Papers for an upcoming Special Issue on "Novel Research on Image and Video Processing Technology". We invite authors from academia, industry, and research institutions globally to contribute their high-quality original research and review articles for this Issue.

The rapidly evolving field of image and video processing technology has established new paradigms across various scientific and technological fields, including computer science, engineering, telecommunications, robotics, and artificial intelligence. This Special Issue seeks to publish research articles and reviews that report significant advances and breakthroughs in topics including, but not limited to, the following:

  • Advanced algorithms for image and video processing;
  • Machine learning and AI in image and video processing;
  • Deep learning techniques in video and image recognition;
  • Augmented Reality and Virtual Reality image processing;
  • Computational photography and videography;
  • 3D image and video processing and analysis;
  • Image and video compression, coding, and encryption;
  • Real-time image and video processing;
  • Biometric image processing;
  • Medical image and video analysis.

Dr. Samuel Cheng
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • image and video processing algorithms
  • AI/ML in image and video processing
  • AR/VR image and video processing
  • 3D image and video processing
  • medical image and video analysis

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information on MDPI's Special Issue policies is available on the MDPI website.

Published Papers (9 papers)


Research

Jump to: Review

15 pages, 2580 KiB  
Article
Self-Attention (SA)-ConvLSTM Encoder–Decoder Structure-Based Video Prediction for Dynamic Motion Estimation
by Jeongdae Kim, Hyunseung Choo and Jongpil Jeong
Appl. Sci. 2024, 14(23), 11315; https://doi.org/10.3390/app142311315 - 4 Dec 2024
Viewed by 1438
Abstract
Video prediction, which is the task of predicting future video frames based on past observations, remains a challenging problem because of the complexity and high dimensionality of spatiotemporal dynamics. To address the problems associated with spatiotemporal prediction, which is an important decision-making tool in various fields, several deep learning models have been proposed. Convolutional long short-term memory (ConvLSTM) can capture space and time simultaneously and has shown excellent performance in various applications, such as image and video prediction, object detection, and semantic segmentation. However, ConvLSTM has limitations in capturing long-term temporal dependencies. To solve this problem, this study proposes an encoder–decoder structure using self-attention ConvLSTM (SA-ConvLSTM), which retains the advantages of ConvLSTM and effectively captures long-range dependencies through the self-attention mechanism. The effectiveness of the encoder–decoder structure using SA-ConvLSTM was validated through experiments on the Moving MNIST and KTH datasets. Full article
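To make the recurrence concrete, here is a minimal NumPy sketch of a single ConvLSTM step, the building block the paper extends with self-attention (the self-attention module itself is omitted). The kernel names, the single-channel shapes, and the naive "same"-padded correlation are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def conv2d_same(x, k):
    """Naive single-channel 2-D cross-correlation with 'same' zero padding."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def convlstm_step(x, h, c, kernels):
    """One ConvLSTM step: every gate is a convolution over input and hidden state."""
    i = sigmoid(conv2d_same(x, kernels['wxi']) + conv2d_same(h, kernels['whi']))
    f = sigmoid(conv2d_same(x, kernels['wxf']) + conv2d_same(h, kernels['whf']))
    o = sigmoid(conv2d_same(x, kernels['wxo']) + conv2d_same(h, kernels['who']))
    g = np.tanh(conv2d_same(x, kernels['wxg']) + conv2d_same(h, kernels['whg']))
    c_next = f * c + i * g        # cell state mixes old memory with new content
    h_next = o * np.tanh(c_next)  # hidden state: the spatial map passed onward
    return h_next, c_next
```

Because the gates are convolutions rather than dense layers, spatial structure is preserved through time, which is why ConvLSTM suits frame prediction; the paper's SA-ConvLSTM additionally applies self-attention to the hidden states to capture long-range dependencies.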
(This article belongs to the Special Issue Novel Research on Image and Video Processing Technology)

21 pages, 7944 KiB  
Article
A Method for All-Weather Unstructured Road Drivable Area Detection Based on Improved Lite-Mobilenetv2
by Qingyu Wang, Chenchen Lyu and Yanyan Li
Appl. Sci. 2024, 14(17), 8019; https://doi.org/10.3390/app14178019 - 7 Sep 2024
Cited by 1 | Viewed by 1332
Abstract
This paper presents an all-weather drivable area detection method based on deep learning, addressing the challenges of recognizing unstructured roads and achieving clear environmental perception under adverse weather conditions in current autonomous driving systems. The method enhances the Lite-Mobilenetv2 feature extraction module and integrates a pyramid pooling module with an attention mechanism. Moreover, it introduces a defogging preprocessing module suitable for real-time detection, which transforms foggy images into clear ones for accurate drivable area detection. The experiments adopt a transfer learning-based training approach, training an all-road-condition semantic segmentation model on four datasets that include both structured and unstructured roads, with and without fog. This strategy reduces computational load and enhances detection accuracy. Experimental results demonstrate a 3.84% efficiency improvement compared to existing algorithms. Full article
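The paper's defogging preprocessing module is not detailed in this abstract; as a hedged illustration of one standard building block for fog removal, here is a NumPy sketch of the dark-channel-prior transmission estimate. The `omega` and `patch` values are conventional defaults, not parameters from the paper:

```python
import numpy as np

def dark_channel(img, patch=3):
    """Dark channel: per-pixel minimum over RGB, then a local minimum filter.
    Haze-free regions tend toward 0; fog lifts the dark channel."""
    mins = img.min(axis=2)                 # minimum over color channels
    h, w = mins.shape
    p = patch // 2
    padded = np.pad(mins, p, mode='edge')
    out = np.empty_like(mins)
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + patch, j:j + patch].min()
    return out

def estimate_transmission(img, atmosphere, omega=0.95, patch=3):
    """Transmission map t(x) = 1 - omega * dark_channel(I / A); the defogged
    image is then recovered as J = (I - A) / max(t, t0) + A."""
    norm = img / np.maximum(atmosphere, 1e-6)
    return 1.0 - omega * dark_channel(norm, patch)
```

A real-time module such as the one described would likely replace the nested loops with vectorized or hardware-accelerated filtering, but the estimate itself is this cheap per pixel.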

11 pages, 2725 KiB  
Article
Methods for Reducing Ring Artifacts in Tomographic Images Using Wavelet Decomposition and Averaging Techniques
by Paweł Lipowicz, Marta Borowska and Agnieszka Dardzińska-Głębocka
Appl. Sci. 2024, 14(16), 7292; https://doi.org/10.3390/app14167292 - 19 Aug 2024
Cited by 2 | Viewed by 1261
Abstract
Computed tomography (CT) is one of the fundamental imaging modalities used in medicine, allowing for the acquisition of accurate cross-sectional images of internal body tissues. However, during the acquisition and reconstruction process, various artifacts can arise, and one of them is ring artifacts. These artifacts result from the inherent limitations of CT scanner components and the properties of the scanned material, such as detector defects, non-uniform distribution of radiation from the source, or the presence of metallic elements within the scanning region. The purpose of this study was to identify and reduce ring artifacts in tomographic images using image decomposition and average filtering methods. In this study, tests were conducted on the effectiveness of identifying ring artifacts using wavelet decomposition methods for images. The test was performed on a Shepp–Logan phantom with implemented artifacts of different intensity levels. The analysis was performed using different wavelet families, and linear approximation methods were used to filter the image in the identified areas. Additional filtering was performed using moving average methods and empirical mode decomposition (EMD) techniques. Image comparison methods, i.e., RMSE, SSIM and MS-SSIM, were used to evaluate performance. The results of this study showed a significant improvement in the quality of tomographic phantom images. The authors obtained more than 50% improvement in image quality with reference to the image without any filtration. The different wavelet families had different efficiencies with relation to the identification of the induction regions of ring artifacts. The Haar wavelet and Coiflet 1 showed the best performance in identifying artifact induction regions, with comparative RMSE values for these wavelets of 0.1477 for Haar and 0.1469 for Coiflet 1. 
The additional moving-average filtering and EMD further improved image quality, as confirmed by the image comparison results. The obtained results allow us to assess how the applied methods reduce ring artifacts in phantom images with induced artifacts. Full article
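As a simplified illustration of the averaging idea (not the authors' wavelet-based pipeline), ring artifacts correspond to stripe-like per-detector offsets in the projection domain, which a moving average can isolate and subtract. A NumPy sketch, assuming the sinogram is laid out as rows = projection angles and columns = detector elements:

```python
import numpy as np

def remove_stripes(sinogram, window=5):
    """Suppress detector-gain stripes (which reconstruct as rings):
    estimate the per-column bias as the column mean, smooth it with a
    moving average, and subtract the high-frequency residual."""
    col_mean = sinogram.mean(axis=0)
    kernel = np.ones(window) / window
    smooth = np.convolve(col_mean, kernel, mode='same')
    stripe = col_mean - smooth           # sharp per-detector offsets
    return sinogram - stripe[None, :]
```

The wavelet decomposition used in the paper serves a similar role more selectively: it separates the artifact-carrying subbands before filtering, so genuine image detail is less affected than with this global subtraction.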

21 pages, 8984 KiB  
Article
Multi-Modal Low-Data-Based Learning for Video Classification
by Erol Citak and Mine Elif Karsligil
Appl. Sci. 2024, 14(10), 4272; https://doi.org/10.3390/app14104272 - 17 May 2024
Viewed by 1674
Abstract
Video classification is a challenging task in computer vision that requires analyzing the content of a video to assign it to one or more predefined categories. However, due to the vast amount of visual data contained in videos, the classification process is often computationally expensive and requires a significant amount of annotated data. Because of these reasons, the low-data-based video classification area, which consists of few-shot and zero-shot tasks, is proposed as a potential solution to overcome traditional video classification-oriented challenges. However, existing low-data area datasets, which are either not diverse or have no additional modality context, which is a mandatory requirement for the zero-shot task, do not fulfill the requirements for few-shot and zero-shot tasks completely. To address this gap, in this paper, we propose a large-scale, general-purpose dataset for the problem of multi-modal low-data-based video classification. The dataset contains pairs of videos and attributes that capture multiple facets of the video content. Thus, the new proposed dataset will both enable the study of low-data-based video classification tasks and provide consistency in terms of comparing the evaluations of future studies in this field. Furthermore, to evaluate and provide a baseline for future works on our new proposed dataset, we present a variational autoencoder-based model that leverages the inherent correlation among different modalities to learn more informative representations. In addition, we introduce a regularization technique to improve the baseline model’s generalization performance in low-data scenarios. Our experimental results reveal that our proposed baseline model, with the aid of this regularization technique, achieves over 12% improvement in classification accuracy compared to the pure baseline model with only a single labeled sample. Full article
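The variational-autoencoder baseline relies on the standard reparameterization trick and a KL regularizer; a minimal NumPy sketch of those two pieces follows. The latent dimensions and the toy late-fusion of modalities are illustrative assumptions, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(42)

def reparameterize(mu, log_var):
    """z = mu + sigma * eps keeps sampling differentiable w.r.t. (mu, sigma)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """KL(q(z|x) || N(0, I)) for a diagonal Gaussian, summed over dimensions."""
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var)

def fuse_modalities(mu_video, mu_attr):
    """Toy late fusion: average the per-modality latent means."""
    return 0.5 * (mu_video + mu_attr)
```

The regularization technique the paper adds on top of this baseline is what yields the reported 12% single-shot accuracy gain; the sketch shows only the shared VAE machinery.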

16 pages, 524 KiB  
Article
Non-Iterative Cluster Routing: Analysis and Implementation Strategies
by Huong Pham and Samuel Cheng
Appl. Sci. 2024, 14(5), 1706; https://doi.org/10.3390/app14051706 - 20 Feb 2024
Cited by 2 | Viewed by 1272
Abstract
In conventional routing, a capsule network employs routing algorithms for bidirectional information flow between layers through iterative processes. In contrast, the cluster routing technique utilizes a non-iterative process and can outperform state-of-the-art models with fewer parameters, while preserving the part–whole relationship and demonstrating robust generalization to novel viewpoints. This paper aims to further analyze and clarify this concept, providing insights that allow users to implement the cluster routing technique efficiently. Additionally, we expand the technique and propose variations based on the routing principle of achieving consensus among votes in distinct clusters. In some cases, these variations have the potential to enhance the cluster routing performance while utilizing similar memory and computing resources. Full article
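A toy NumPy sketch of the non-iterative idea: score fixed clusters of votes by their agreement in a single pass, with no iterative refinement. The equal-size clustering and the softmax-over-negative-spread scoring are illustrative assumptions, not the paper's exact routing rule:

```python
import numpy as np

def cluster_route(votes, n_clusters):
    """Non-iterative routing sketch: partition the votes into fixed clusters,
    score each cluster by vote agreement (low spread = high agreement), and
    emit the agreement-weighted mean of the cluster centroids."""
    clusters = np.split(votes, n_clusters)           # fixed partition, one pass
    centroids = np.stack([c.mean(axis=0) for c in clusters])
    spread = np.array([((c - c.mean(axis=0)) ** 2).sum() for c in clusters])
    agreement = np.exp(-spread) / np.exp(-spread).sum()   # softmax over -spread
    return (agreement[:, None] * centroids).sum(axis=0), agreement
```

The appeal of the non-iterative formulation is exactly what the single pass above shows: the cost is one centroid and one spread computation per cluster, with no dynamic-routing loop.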

13 pages, 860 KiB  
Article
Sample-Based Gradient Edge and Angular Prediction for VVC Lossless Intra-Coding
by Guojie Chen and Min Lin
Appl. Sci. 2024, 14(4), 1653; https://doi.org/10.3390/app14041653 - 18 Feb 2024
Cited by 1 | Viewed by 1590
Abstract
Lossless coding is a compression method in the Versatile Video Coding (VVC) standard, which can compress video without distortion. Lossless coding has great application prospects in fields with high requirements for video quality. Since the current VVC standard is mainly designed for lossy coding, the compression efficiency of VVC lossless coding makes it hard to meet people’s needs. In order to improve the performance of VVC lossless coding, this paper proposes a sample-based intra-gradient edge detection and angular prediction (SGAP) method. SGAP utilizes the characteristics of lossless intra-coding to employ samples adjacent to the current sample as reference samples and performs prediction through sample iteration. SGAP aims to improve the prediction accuracy for edge regions, smooth regions and directional texture regions in images. Experimental results on the VVC Test Model (VTM) 12.3 reveal that SGAP achieves 7.31% bit-rate savings on average in VVC lossless intra-coding, while the encoding time is only increased by 5.4%. Compared with existing advanced sample-based intra-prediction methods, SGAP can provide significantly higher coding performance gain. Full article
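SGAP's exact prediction rules are not reproduced here; as a hedged sketch of the same family of ideas, the classic median edge detector (MED) predictor from JPEG-LS predicts each sample from its causal neighbours using a simple gradient/edge test:

```python
import numpy as np

def med_predict(img):
    """Median edge detector (MED) predictor, the classic sample-based
    gradient/edge rule from JPEG-LS; SGAP refines this kind of rule with
    angular modes. Each pixel is predicted from its left (a), top (b),
    and top-left (c) neighbours, with 0 assumed outside the image."""
    h, w = img.shape
    pred = np.zeros_like(img)
    for i in range(h):
        for j in range(w):
            a = img[i, j - 1] if j > 0 else 0
            b = img[i - 1, j] if i > 0 else 0
            c = img[i - 1, j - 1] if i > 0 and j > 0 else 0
            if c >= max(a, b):
                pred[i, j] = min(a, b)    # likely edge above or to the left
            elif c <= min(a, b):
                pred[i, j] = max(a, b)
            else:
                pred[i, j] = a + b - c    # smooth region: planar prediction
    return pred
```

Because prediction uses only already-decoded samples, the decoder can reproduce it exactly, which is what makes sample-by-sample iteration viable for lossless intra-coding.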

15 pages, 8918 KiB  
Article
A Fast Algorithm for VVC Intra Coding Based on the Most Probable Partition Pattern List
by Haiwu Zhao, Shuai Zhao, Xiwu Shang and Guozhong Wang
Appl. Sci. 2023, 13(18), 10381; https://doi.org/10.3390/app131810381 - 17 Sep 2023
Cited by 3 | Viewed by 1680
Abstract
Compared with High-Efficiency Video Coding (HEVC), Versatile Video Coding (VVC) has more flexible partitioning and higher compression efficiency, but it also has higher computational complexity. In order to reduce the coding complexity, a fast algorithm based on the most probable partition pattern list (MPPPL) and pixel content similarity is proposed. Firstly, the MPPPL is constructed by using the average texture complexity difference of the sub-coding units under different partition modes. Then, the sub-block pixel mean difference is used to decide the best partition mode or shorten the MPPPL. Finally, the selection rules of the reference lines in the intra-prediction process are counted, and unnecessary reference lines are skipped by using pixel content similarity. The experimental results show that, compared with VTM-13.0, the proposed algorithm can save 52.26% of the encoding time, while the BDBR (Bjontegaard delta bit rate) only increases by 1.23%. Full article
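A toy NumPy sketch of the kind of texture-complexity ordering an MPPPL is built from: rank split directions by the complexity gap between the resulting sub-blocks, on the intuition that a split following the content separates a flat half from a textured half. The complexity measure and the two-way split set are illustrative simplifications, not the VTM partition set:

```python
import numpy as np

def texture_complexity(block):
    """Mean absolute gradient as a simple texture-complexity measure."""
    gx = np.abs(np.diff(block, axis=1)).mean()
    gy = np.abs(np.diff(block, axis=0)).mean()
    return gx + gy

def rank_partitions(cu):
    """Order binary split directions by the complexity gap between the two
    halves: a large gap suggests the split boundary follows the content
    (a toy MPPPL ordering, not the full VVC QTMT decision)."""
    h, w = cu.shape
    scores = {
        'horizontal': abs(texture_complexity(cu[:h // 2, :]) -
                          texture_complexity(cu[h // 2:, :])),
        'vertical':   abs(texture_complexity(cu[:, :w // 2]) -
                          texture_complexity(cu[:, w // 2:])),
    }
    return sorted(scores, key=scores.get, reverse=True)
```

An encoder using such a list would rate-distortion-test only the top-ranked modes, which is where the reported encoding-time savings come from.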

Review

Jump to: Research

42 pages, 2122 KiB  
Review
A Review Toward Deep Learning for High Dynamic Range Reconstruction
by Gabriel de Lima Martins, Josue Lopez-Cabrejos, Julio Martins, Quefren Leher, Gustavo de Souza Ferreti, Lucas Hildelbrano Costa Carvalho, Felipe Bezerra Lima, Thuanne Paixão and Ana Beatriz Alvarez
Appl. Sci. 2025, 15(10), 5339; https://doi.org/10.3390/app15105339 - 10 May 2025
Viewed by 216
Abstract
High Dynamic Range (HDR) image reconstruction has gained prominence in a wide range of fields; not only is it implemented in computer vision, but industries such as entertainment and medicine also benefit considerably from this technology due to its ability to capture and reproduce scenes with a greater variety of luminosities, extending conventional levels of perception. This article presents a review of the state of the art of HDR reconstruction methods based on deep learning, ranging from classical approaches that are still expressive and relevant to more recent proposals involving the advent of new architectures. The fundamental role of high-quality datasets and specific metrics in evaluating the performance of HDR algorithms is also discussed, as well as emphasizing the challenges inherent in capturing multiple exposures and dealing with artifacts. Finally, emerging trends and promising directions for overcoming current limitations and expanding the potential of HDR reconstruction in real-world scenarios are highlighted. Full article
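As a minimal illustration of the multi-exposure side of HDR reconstruction (a classical baseline that the deep methods surveyed here aim to outperform), a NumPy sketch of Mertens-style well-exposedness weighting; the single-scale weighted average below stands in for the multi-scale pyramid fusion used in practice, and `sigma` is a conventional default:

```python
import numpy as np

def well_exposedness(img, sigma=0.2):
    """Mertens-style weight: pixels near mid-gray (0.5) count most."""
    return np.exp(-((img - 0.5) ** 2) / (2 * sigma ** 2))

def fuse_exposures(stack):
    """Per-pixel weighted average of an exposure stack (a naive,
    single-scale stand-in for multi-scale exposure fusion)."""
    weights = np.stack([well_exposedness(im) for im in stack])
    weights /= weights.sum(axis=0, keepdims=True) + 1e-12
    return (weights * np.stack(stack)).sum(axis=0)
```

Deep HDR methods replace both the hand-crafted weight and the fusion step with learned components, which is also how they handle the ghosting artifacts from misaligned exposures that this static fusion ignores.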

42 pages, 1068 KiB  
Review
Personalized Video Summarization: A Comprehensive Survey of Methods and Datasets
by Michail Peronikolis and Costas Panagiotakis
Appl. Sci. 2024, 14(11), 4400; https://doi.org/10.3390/app14114400 - 22 May 2024
Cited by 1 | Viewed by 4086
Abstract
In recent years, scientific and technological developments have led to an explosion of available videos on the web, increasing the necessity of fast and effective video analysis and summarization. Video summarization methods aim to generate a synopsis by selecting the most informative parts of the video content. The user’s personal preferences, often involved in the expected results, should be taken into account in the video summaries. In this paper, we provide the first comprehensive survey on personalized video summarization, covering the techniques and datasets used. In this context, we classify and review personalized video summarization techniques based on the type of personalized summary, on the criteria, on the video domain, on the source of information, on the time of summarization, and on the machine learning technique. Depending on the type of methodology used by the personalized video summarization techniques for the summary production process, we classify the techniques into five major categories: feature-based video summarization, keyframe selection, shot-selection-based approaches, video summarization using trajectory analysis, and personalized video summarization using clustering. We also compare personalized video summarization methods and present 37 datasets used to evaluate them. Finally, we analyze opportunities and challenges in the field and suggest innovative research lines. Full article
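One of the survey's five categories, personalized summarization via clustering, can be sketched in a few lines: cluster per-frame feature vectors and keep the frame nearest each centroid as a keyframe. The k-means details and the feature choice below are illustrative assumptions, not any specific surveyed method:

```python
import numpy as np

def select_keyframes(features, k, iters=10, seed=0):
    """Toy clustering-based summarization: k-means over per-frame feature
    vectors (e.g., color histograms), then one representative frame per
    cluster, the frame nearest each centroid."""
    rng = np.random.default_rng(seed)
    centroids = features[rng.choice(len(features), size=k, replace=False)]
    for _ in range(iters):
        d = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centroids[c] = features[labels == c].mean(axis=0)
    d = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    return sorted(set(d.argmin(axis=0)))   # frame indices of the keyframes
```

Personalization enters by weighting or filtering the features toward the user's stated preferences before clustering, which is one of the axes along which the survey classifies methods.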
