Advances in Image and Video Processing: Techniques and Applications

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: closed (31 December 2023) | Viewed by 15206

Special Issue Editors

School of Computer Science, Faculty of Engineering and Information Technology, University of Technology Sydney, Sydney, NSW 2007, Australia
Interests: machine learning; optimization; data mining; image processing; computational biology
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Interests: deep learning; electrode implantation robot for brain-computer interface; industrial vision detection
Department of Civil Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan
Interests: bridge engineering; infrastructure maintenance; artificial intelligence; image processing

Special Issue Information

Dear Colleagues,

The rapid advancement of technology in recent years has led to a significant increase in the amount of image and video data generated and captured. This has made image and video processing an important area of research, with a wide range of applications across industries such as surveillance and security systems, drones and unmanned aerial vehicles, medical imaging and diagnosis, automotive and transportation systems, and entertainment. This Special Issue aims to highlight recent advancements in image and video processing techniques and their practical applications.

This Special Issue will be of interest to researchers, engineers, and practitioners working in the field of image and video processing.

Topics of interest for this Special Issue include, but are not limited to, the following:

  • Image enhancement and restoration
  • Image segmentation
  • Object recognition and tracking
  • Image and video compression
  • Motion estimation and compensation
  • Image and video watermarking
  • Face detection and recognition
  • Image and video super-resolution
  • Generative adversarial networks for image and video synthesis
  • 3D reconstruction from 2D images and videos
  • Multi-view image and video processing
  • Image and video summarization
  • Automated image and video annotation
  • Adversarial attacks and defences in image and video processing

Dr. Mukesh Prasad
Dr. Ali Braytee
Dr. Xian Tao
Dr. Pang-jo Chun
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Published Papers (13 papers)


Research

17 pages, 1716 KiB  
Article
Pathological Gait Classification Using Early and Late Fusion of Foot Pressure and Skeleton Data
by Muhammad Tahir Naseem, Haneol Seo, Na-Hyun Kim and Chan-Su Lee
Appl. Sci. 2024, 14(2), 558; https://doi.org/10.3390/app14020558 - 09 Jan 2024
Cited by 1 | Viewed by 628
Abstract
Classifying pathological gaits is crucial for identifying impairments in specific areas of the human body. Previous studies have extensively employed machine learning and deep learning (DL) methods, using various wearable (e.g., inertial sensors) and non-wearable (e.g., foot pressure plates and depth cameras) sensors. This study proposes early and late fusion methods through DL to categorize one normal and five abnormal (antalgic, lurch, steppage, stiff-legged, and Trendelenburg) pathological gaits. Initially, single-modal approaches were utilized: first, foot pressure data were augmented for transformer-based models; second, skeleton data were applied to a spatiotemporal graph convolutional network (ST-GCN). Subsequently, a multi-modal approach using early fusion by concatenating features from both the foot pressure and skeleton datasets was introduced. Finally, multi-modal fusions, applying early fusion to the feature vector and late fusion by merging outputs from both modalities with and without varying weights, were evaluated. The foot pressure-based and skeleton-based models achieved 99.04% and 78.24% accuracy, respectively. The proposed multi-modal approach using early fusion achieved 99.86% accuracy, whereas the late fusion method achieved 96.95% accuracy without weights and 99.17% accuracy with different weights. Thus, the proposed multi-modal models using early fusion methods demonstrated state-of-the-art performance on the GIST pathological gait database. Full article
(This article belongs to the Special Issue Advances in Image and Video Processing: Techniques and Applications)
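The two fusion schemes compared above can be sketched as follows; the feature dimensions, class count, and branch weight below are illustrative assumptions, not values from the paper:

```python
import numpy as np

def early_fusion(pressure_feat, skeleton_feat):
    """Early fusion: concatenate per-sample feature vectors from the two
    modalities into one joint vector before a shared classifier head."""
    return np.concatenate([pressure_feat, skeleton_feat], axis=-1)

def late_fusion(pressure_probs, skeleton_probs, w=0.5):
    """Late fusion: merge per-class probabilities from the two single-modal
    classifiers; w weights the foot-pressure branch (w=0.5 is unweighted)."""
    return w * pressure_probs + (1.0 - w) * skeleton_probs

# Toy example: 4 samples, assumed 128-dim pressure and 64-dim skeleton features.
rng = np.random.default_rng(0)
joint = early_fusion(rng.normal(size=(4, 128)), rng.normal(size=(4, 64)))
print(joint.shape)  # (4, 192)
```

In the paper's terms, the early-fused vector feeds a single classifier, while late fusion merges the outputs of the foot-pressure and skeleton models with and without weights.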

15 pages, 5213 KiB  
Article
A Multiscale Deep Encoder–Decoder with Phase Congruency Algorithm Based on Deep Learning for Improving Diagnostic Ultrasound Image Quality
by Ryeonhui Kim, Kyuseok Kim and Youngjin Lee
Appl. Sci. 2023, 13(23), 12928; https://doi.org/10.3390/app132312928 - 03 Dec 2023
Viewed by 724
Abstract
Ultrasound imaging is widely used as a noninvasive lesion detection method in diagnostic medicine. Improving the quality of these ultrasound images is very important for accurate diagnosis, and deep learning-based algorithms have gained significant attention. This study proposes a multiscale deep encoder–decoder with phase congruency (MSDEPC) algorithm based on deep learning to improve the quality of diagnostic ultrasound images. The MSDEPC algorithm takes low-resolution (LR) images and edges as inputs and constructs a multiscale convolution and deconvolution network. Simulations were conducted using the Field 2 program, and real experimental data were obtained from five clinical datasets containing images of the carotid artery, liver hemangiomas, breast malignancy, thyroid carcinomas, and obstetric nuchal translucency. LR images, bicubic interpolation, and super-resolution convolutional neural networks (SRCNNs) were used as comparison groups. Through visual assessment, the image processed using the MSDEPC was the clearest, and the lesions were clearly distinguished. The structural similarity index metric (SSIM) value of the simulated ultrasound image using the MSDEPC algorithm improved by approximately 38.84% compared to LR. In addition, the peak signal-to-noise ratio (PSNR) and SSIM values of clinical ultrasound images using the MSDEPC algorithm improved by approximately 2.33 times and 88.58%, respectively, compared to LR. In conclusion, the MSDEPC algorithm is expected to significantly improve the spatial resolution of ultrasound images. Full article
(This article belongs to the Special Issue Advances in Image and Video Processing: Techniques and Applications)
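For reference, the PSNR figure of merit quoted above has a standard definition; this is the textbook formula, not code from the paper:

```python
import numpy as np

def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between a reference image and a
    processed image; higher means the processed image is closer to the reference."""
    mse = np.mean((reference.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

ref = np.full((8, 8), 100.0)
noisy = ref + 10.0  # constant error of 10 -> MSE = 100
print(round(psnr(ref, noisy), 2))  # 28.13
```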

13 pages, 4732 KiB  
Article
Optimization of Median Modified Wiener Filter for Improving Lung Segmentation Performance in Low-Dose Computed Tomography Images
by Sewon Lim, Minji Park, Hajin Kim, Seong-Hyeon Kang, Kyuseok Kim and Youngjin Lee
Appl. Sci. 2023, 13(19), 10679; https://doi.org/10.3390/app131910679 - 26 Sep 2023
Cited by 1 | Viewed by 796
Abstract
In low-dose computed tomography (LDCT), lung segmentation effectively improves the accuracy of lung cancer diagnosis. However, excessive noise is inevitable in LDCT, which can decrease lung segmentation accuracy. To address this problem, it is necessary to derive an optimized kernel size when using the median modified Wiener filter (MMWF) for noise reduction. Incorrect application of the kernel size can result in inadequate noise removal or blurring, degrading segmentation accuracy. Therefore, various kernel sizes of the MMWF were applied in this study, followed by region-growing-based segmentation and quantitative evaluation. In addition to evaluating the segmentation performance, we conducted a similarity assessment. Our results indicate that the greatest improvement in segmentation performance and similarity occurred at a kernel size of 5 × 5. Compared with the noisy image, the accuracy, F1-score, intersection over union, root mean square error, and peak signal-to-noise ratio using the optimized MMWF improved by factors of 1.38, 33.20, 64.86, 7.82, and 1.30, respectively. In conclusion, we have demonstrated that by applying the MMWF with an appropriate kernel size, the optimization of noise and blur reduction can enhance segmentation performance. Full article
(This article belongs to the Special Issue Advances in Image and Video Processing: Techniques and Applications)
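A minimal sketch of the MMWF idea, assuming the common formulation in which the Wiener filter's local-mean estimate is replaced by a local median; the noise-power estimate and window handling here are illustrative, not the authors' implementation:

```python
import numpy as np
from scipy.ndimage import median_filter, uniform_filter

def mmwf(img, k=5):
    """Median modified Wiener filter sketch: the Wiener filter's local-mean
    estimate is replaced by a k x k local median; noise power is estimated
    as the mean of the local variances (a common heuristic)."""
    x = img.astype(np.float64)
    med = median_filter(x, size=k)               # median replaces the local mean
    mean = uniform_filter(x, size=k)
    var = uniform_filter(x * x, size=k) - mean * mean
    noise = var.mean()
    gain = np.maximum(var - noise, 0.0) / np.maximum(var, 1e-12)
    return med + gain * (x - med)
```

Sweeping k over 3 × 3, 5 × 5, 7 × 7, and so on, and scoring the downstream segmentation, reproduces the kernel-size search described above.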

24 pages, 4916 KiB  
Article
Computer Vision Based Planogram Compliance Evaluation
by Julius Laitala and Laura Ruotsalainen
Appl. Sci. 2023, 13(18), 10145; https://doi.org/10.3390/app131810145 - 08 Sep 2023
Viewed by 1231
Abstract
Arranging products in stores according to planograms, optimized product arrangement maps, is an important sales enabler and necessary for keeping up with the highly competitive modern retail market. Key benefits of planograms include increased efficiency, maximized retail store space, increased customer satisfaction, visual appeal, and increased revenue. The planograms are realized into product arrangements by humans, a process that is prone to mistakes. Therefore, for optimal merchandising performance, the planogram compliance of the arrangements needs to be evaluated from time to time. We investigate utilizing a computer vision problem setting—retail product detection—to automate planogram compliance evaluation. Retail product detection comprises product detection and classification. The detected and classified products can be compared to the planogram in order to evaluate compliance. In this paper, we propose a novel retail product detection pipeline combining a Gaussian layer network product proposal generator and domain invariant hierarchical embedding (DIHE) classifier. We utilize the detection pipeline with RANSAC pose estimation for planogram compliance evaluation. As the existing metrics for evaluating the planogram compliance evaluation performance assume unrealistically that the test image matches the planogram, we propose a novel metric, called normalized planogram compliance error (EPC), for benchmarking real-world setups. We evaluate the performance of our method with two datasets: the only open-source dataset with planogram evaluation data, GP-180, and our own dataset collected from a large Nordic retailer. Based on the evaluation, our method provides an improved planogram compliance evaluation pipeline, with accurate product location estimation when using real-life images that include entire shelves, unlike previous research that has only used images with few products. Our analysis also demonstrates that our method requires less processing time than the state-of-the-art compliance evaluation methods. Full article
(This article belongs to the Special Issue Advances in Image and Video Processing: Techniques and Applications)
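As a much simpler illustration of comparing detections against a planogram, one can count slot-by-slot label agreement; this hypothetical slot-matching score is not the paper's EPC metric, which accounts for pose estimation and realistic mismatches:

```python
def slot_compliance(planogram, detected):
    """Fraction of planogram slots whose detected product label matches the
    planned label; None marks a slot with no confident detection."""
    assert len(planogram) == len(detected)
    hits = sum(p == d for p, d in zip(planogram, detected))
    return hits / len(planogram)

# Hypothetical four-slot shelf: two slots match the plan.
plan = ["cola", "cola", "juice", "water"]
seen = ["cola", "juice", "juice", None]
print(slot_compliance(plan, seen))  # 0.5
```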

17 pages, 20089 KiB  
Article
Attention-Based Mechanism and Adversarial Autoencoder for Underwater Image Enhancement
by Gaosheng Luo, Gang He, Zhe Jiang and Chuankun Luo
Appl. Sci. 2023, 13(17), 9956; https://doi.org/10.3390/app13179956 - 03 Sep 2023
Viewed by 853
Abstract
To address the phenomenon of color shift and low contrast in underwater images caused by wavelength- and distance-related attenuation and scattering when light propagates in water, we propose a method based on an attention mechanism and adversarial autoencoder for enhancing underwater images. Firstly, the pixel and channel attention mechanisms are utilized to extract rich discriminative image information from multiple color spaces. Secondly, the above image information and the original image reverse medium transmittance map are feature-fused by a feature fusion module to enhance the network response to the image quality degradation region. Finally, the encoder learning is guided by the adversarial mechanism of the adversarial autoencoder, and the hidden space of the autoencoder continuously approaches the hidden space of the pre-trained model. In experiments on images acquired from the Beihai Bay area of China on the HYSY-163 platform, compared with the unprocessed real underwater images, the average value of the Natural Image Quality Evaluator is reduced by 27.8%, the average value of the Underwater Color Image Quality Evaluation is improved by 28.8%, and the average values of the Structural Similarity and Peak Signal-to-Noise Ratio are improved by 35.7% and 42.8%, respectively; the enhanced underwater images have improved clarity and more realistic colors. In summary, our network can effectively improve the visibility of underwater target objects, especially the quality of images of submarine pipelines and marine organisms, and is expected to be applied in the future with underwater robots for sea-life cleaning of offshore wellhead platform pile legs and large ship bottoms. Full article
(This article belongs to the Special Issue Advances in Image and Video Processing: Techniques and Applications)

17 pages, 4909 KiB  
Article
Noise-Assessment-Based Screening Method for Remote Photoplethysmography Estimation
by Kunyoung Lee, Seunghyun Kim, Byeongseon An, Hyunsoo Seo, Shinwi Park and Eui Chul Lee
Appl. Sci. 2023, 13(17), 9818; https://doi.org/10.3390/app13179818 - 30 Aug 2023
Cited by 1 | Viewed by 831
Abstract
Remote vital signal estimation has been researched for several years. There are numerous studies on rPPG, which utilizes cameras to detect cardiovascular activity. Most of the research has concentrated on obtaining rPPG from a complete video. However, excessive movement or changes in lighting can cause noise, and it will inevitably lead to a reduction in the quality of the obtained signal. Moreover, since rPPG measures minor changes that occur in the blood flow of an image due to variations in heart rate, it becomes challenging to capture in a noisy image, as the impact of noise is larger than the change caused by the heart rate. Using such segments in a video can cause a decrease in overall performance, but it can only be remedied through data pre-processing. In this study, we propose a screening technique that removes excessively noisy video segments as input and only uses signals obtained from reliable segments. Using this method, we were able to boost the performance of the current rPPG algorithm from 50.43% to 62.27% based on PTE6. Our screening technique can be easily applied to any existing rPPG prediction model and it can improve the reliability of the output in all cases. Full article
(This article belongs to the Special Issue Advances in Image and Video Processing: Techniques and Applications)
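The screening idea, keeping only segments whose signal is not dominated by noise, can be sketched as below; the band-power quality score, 30 fps assumption, and threshold are illustrative stand-ins for the paper's noise assessment:

```python
import numpy as np

def screen_segments(signals, snr_db_threshold=0.0):
    """Keep only segments whose crude SNR (spectral power in a plausible
    heart-rate band vs. the rest of the spectrum) clears the threshold."""
    kept = []
    for seg in signals:
        spec = np.abs(np.fft.rfft(seg - np.mean(seg))) ** 2
        freqs = np.fft.rfftfreq(len(seg), d=1 / 30.0)  # assume 30 fps video
        band = (freqs >= 0.7) & (freqs <= 4.0)         # ~42-240 bpm
        snr = 10 * np.log10(max(spec[band].sum(), 1e-12)
                            / max(spec[~band].sum(), 1e-12))
        if snr >= snr_db_threshold:
            kept.append(seg)
    return kept
```

A segment whose dominant power lies outside the plausible pulse band (e.g., motion or lighting drift) is discarded before the rPPG model sees it.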

20 pages, 18746 KiB  
Article
Corrosion Damage Detection in Headrace Tunnel Using YOLOv7 with Continuous Wall Images
by Shiori Kubo, Nobuhiro Nakayama, Sadanori Matsuda and Pang-jo Chun
Appl. Sci. 2023, 13(16), 9388; https://doi.org/10.3390/app13169388 - 18 Aug 2023
Viewed by 1296
Abstract
Infrastructure that was constructed during the high economic growth period of Japan is starting to deteriorate; thus, there is a need for the maintenance and management of these structures. The basis of maintenance and management is the inspection process, which involves finding and recording damage. However, in headrace tunnels, the water supply is interrupted during inspection; thus, it is desirable to comprehensively photograph and record the tunnel wall and detect damage using the captured images to significantly reduce the water supply interruption time. Given this background, the aim of this study is to establish an investigation and assessment system for deformation points in the inner walls of headrace tunnels and to perform efficient maintenance and management of the tunnels. First, we develop a mobile headrace photography device that photographs the walls of the headrace tunnel with a charge-coupled device line camera. Next, we develop a method using YOLOv7 for detecting chalk marks at the damage locations made during cleaning of the tunnel walls that were photographed by the imaging system, and these results are used as a basis to develop a system that automatically accumulates and plots damage locations and distributions. For chalking detection using continuous wall surface images, a high accuracy of 99.02% is achieved. Furthermore, the system can evaluate the total number and distribution of deteriorated areas, which can be used to identify the causes of change over time and the occurrence of deterioration phenomena. The developed system can significantly reduce the duration and cost of inspections and surveys, and the results can be used to select priority repair areas and to predict deterioration through data accumulation, contributing to appropriate management of headrace tunnels. Full article
(This article belongs to the Special Issue Advances in Image and Video Processing: Techniques and Applications)

16 pages, 3242 KiB  
Article
Multi-Scale Feature Fusion and Structure-Preserving Network for Face Super-Resolution
by Dingkang Yang, Yehua Wei, Chunwei Hu, Xin Yu, Cheng Sun, Sheng Wu and Jin Zhang
Appl. Sci. 2023, 13(15), 8928; https://doi.org/10.3390/app13158928 - 03 Aug 2023
Viewed by 1011
Abstract
Deep convolutional neural networks have demonstrated significant performance improvements in face super-resolution tasks. However, many deep learning-based approaches tend to overlook the inherent structural information and feature correlation across different scales in face images, making the accurate recovery of face structure in low-resolution cases challenging. To address this, this paper proposes a method that fuses multi-scale features while preserving the facial structure. It introduces a novel multi-scale residual block (MSRB) to reconstruct key facial parts and structures from spatial and channel dimensions, and utilizes pyramid attention (PA) to exploit non-local self-similarity, improving the details of the reconstructed face. Feature Enhancement Modules (FEMs) are employed in the upscale stage to refine and enhance current features using multi-scale features from previous stages. The experimental results on the CelebA, Helen, and LFW datasets provide evidence that our method achieves superior quantitative metrics compared to the baseline: the Peak Signal-to-Noise Ratio (PSNR) outperforms the baseline by 0.282 dB, 0.343 dB, and 0.336 dB, respectively. Furthermore, our method demonstrates improved visual performance on two additional no-reference datasets, Widerface and Webface. Full article
(This article belongs to the Special Issue Advances in Image and Video Processing: Techniques and Applications)

20 pages, 3075 KiB  
Article
Fuzzy Logic with Deep Learning for Detection of Skin Cancer
by Sumit Kumar Singh, Vahid Abolghasemi and Mohammad Hossein Anisi
Appl. Sci. 2023, 13(15), 8927; https://doi.org/10.3390/app13158927 - 03 Aug 2023
Cited by 8 | Viewed by 1969
Abstract
Melanoma is the deadliest type of cancerous cell, which develops when melanocytes, the melanin-producing cells, start growing uncontrollably. If not detected and treated in situ, it can decrease patients' chances of survival. The diagnosis of a melanoma lesion is still a challenging task due to its visual similarities with benign lesions. In this paper, a fuzzy logic-based image segmentation along with a modified deep learning model is proposed for skin cancer detection. The highlight of the paper is its dermoscopic image enhancement using pre-processing techniques, infusion of mathematical logics, standard deviation methods, and the L-R fuzzy defuzzification method to enhance the results of segmentation. These pre-processing steps are designed to improve the visibility of the lesion by removing artefacts such as hair follicles, dermoscopic scales, etc. Thereafter, the image is enhanced by the histogram equalization method, and it is segmented by the proposed method prior to the detection phase. The modified model employs a deep neural network algorithm, You Only Look Once (YOLO), which is based on a deep convolutional neural network (DCNN), for detection of melanoma lesions from digital and dermoscopic lesion images. The YOLO model is composed of a series of DCNN layers; we add depth with an additional convolutional layer and residual connections. Moreover, we introduce feature concatenation at different layers, which combines multi-scale features. Our experimental results confirm that YOLO provides a better accuracy score and is faster than most of the pre-existing classifiers. The classifier is trained with 2000 and 8695 dermoscopic images from the ISIC 2017 and ISIC 2018 datasets, whereas the PH2 dataset along with both of the previously mentioned datasets is used for testing the proposed algorithm. Full article
(This article belongs to the Special Issue Advances in Image and Video Processing: Techniques and Applications)

18 pages, 9836 KiB  
Article
Simulation and Experimental Studies of Optimization of σ-Value for Block Matching and 3D Filtering Algorithm in Magnetic Resonance Images
by Minji Park, Seong-Hyeon Kang, Kyuseok Kim and Youngjin Lee, on behalf of the Alzheimer’s Disease Neuroimaging Initiative
Appl. Sci. 2023, 13(15), 8803; https://doi.org/10.3390/app13158803 - 30 Jul 2023
Viewed by 775
Abstract
In this study, we optimized the σ-values of a block matching and 3D filtering (BM3D) algorithm to reduce noise in magnetic resonance images. Brain T2-weighted images (T2WIs) were obtained using the BrainWeb simulation program and Rician noise with intensities of 0.05, 0.10, and 0.15. The BM3D algorithm was applied to the optimized BM3D algorithm and compared with conventional noise reduction algorithms using Gaussian, median, and Wiener filters. The clinical feasibility was assessed using real brain T2WIs from the Alzheimer’s Disease Neuroimaging Initiative. Quantitative evaluation was performed using the contrast-to-noise ratio, coefficient of variation, structural similarity index measurement, and root mean square error. The simulation results showed optimal image characteristics and similarity at a σ-value of 0.12, demonstrating superior noise reduction performance. The optimized BM3D algorithm showed the greatest improvement in the clinical study. In conclusion, applying the optimized BM3D algorithm with a σ-value of 0.12 achieved efficient noise reduction. Full article
(This article belongs to the Special Issue Advances in Image and Video Processing: Techniques and Applications)
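The σ-value optimization described above amounts to a grid search over the denoiser's strength parameter, scored against a reference image. A generic sketch, where `denoise` and `score` are placeholders for BM3D and a similarity metric such as SSIM (neither is reimplemented here):

```python
import numpy as np

def best_sigma(noisy, reference, denoise, score,
               sigmas=(0.08, 0.10, 0.12, 0.14)):
    """Pick the sigma whose denoised output scores highest against the
    reference (higher score = better). `denoise(img, sigma)` and
    `score(ref, img)` are caller-supplied callables."""
    results = {s: score(reference, denoise(noisy, s)) for s in sigmas}
    return max(results, key=results.get)
```

With BM3D as `denoise` and SSIM as `score`, sweeping the candidate σ-values mirrors the optimization procedure the study performs.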

16 pages, 9138 KiB  
Article
Comparative Analysis of AI-Based Facial Identification and Expression Recognition Using Upper and Lower Facial Regions
by Seunghyun Kim, Byeong Seon An and Eui Chul Lee
Appl. Sci. 2023, 13(10), 6070; https://doi.org/10.3390/app13106070 - 15 May 2023
Cited by 2 | Viewed by 1340
Abstract
The COVID-19 pandemic has significantly impacted society, having led to a lack of social skills in children who became used to interacting with others while wearing masks. To analyze this issue, we investigated the effects of masks on face identification and facial expression recognition, using deep learning models for these operations. The results showed that when using the upper or lower facial regions for face identification, the upper facial region allowed for an accuracy of 81.36%, and the lower facial region allowed for an accuracy of 55.52%. Regarding facial expression recognition, the upper facial region allowed for an accuracy of 39% compared to 49% for the lower facial region. Furthermore, our analysis was conducted for a number of facial expressions, and specific emotions such as happiness and contempt were difficult to distinguish using only the upper facial region. Because this study used a model trained on data generated from human labeling, it is assumed that the effects on humans would be similar. Therefore, this study is significant because it provides engineering evidence of a decline in facial expression recognition; however, wearing masks does not cause difficulties in identification. Full article
(This article belongs to the Special Issue Advances in Image and Video Processing: Techniques and Applications)

14 pages, 24574 KiB  
Article
Dense-HR-GAN: A High-Resolution GAN Model with Dense Connection for Image Dehazing in Icing Wind Tunnel Environment
by Wenjun Zhou, Xinling Yang, Chenglin Zuo, Yifan Wang and Bo Peng
Appl. Sci. 2023, 13(8), 5171; https://doi.org/10.3390/app13085171 - 21 Apr 2023
Cited by 1 | Viewed by 1182
Abstract
To address the issue of blurred images generated during icing wind tunnel tests, we propose a high-resolution dense-connection GAN model, named Dense-HR-GAN. This issue is caused by attenuation due to scattering and absorption when light passes through cloud and fog droplets. Dense-HR-GAN is specifically designed for this environment. The model utilizes an atmospheric scattering model to dehaze images with a dense network structure for training. First, sub-pixel convolution is added to the network structure to remove image artifacts and generate high-resolution images. Second, we introduce instance normalization to eliminate the influence of batch size on the model and improve its generalization performance. Finally, PatchGAN is used in the discriminator to capture image details and local information, and then drive the generator to generate a clear, high-resolution dehazed image. Moreover, the model is jointly constrained by multiple loss functions during training to restore the texture information of the hazy image and reduce color distortion. Experimental results show that the proposed method achieves state-of-the-art performance on image dehazing in the icing wind tunnel environment. Full article
(This article belongs to the Special Issue Advances in Image and Video Processing: Techniques and Applications)
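Instance normalization, introduced above to remove the batch-size dependence, normalizes each sample and channel over its own spatial extent; a minimal NumPy sketch (the learnable affine parameters used in practice are omitted):

```python
import numpy as np

def instance_norm(x, eps=1e-5):
    """Instance normalization over an (N, C, H, W) tensor: each sample's
    each channel is normalized by its own spatial mean and variance, so
    the statistics are independent of the batch size N."""
    mean = x.mean(axis=(2, 3), keepdims=True)
    var = x.var(axis=(2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)
```

Because no statistic is shared across the batch dimension, the output for one image is identical whether it is processed alone or inside a large batch, which is the generalization benefit the abstract cites.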

20 pages, 15761 KiB  
Article
EnsembleVehicleDet: Detection of Faraway Vehicles with Real-Time Consideration
by Seunghyun Yu, Seungwook Son, Hanse Ahn, Hwapyeong Baek, Kijeong Nam, Yongwha Chung and Daihee Park
Appl. Sci. 2023, 13(6), 3939; https://doi.org/10.3390/app13063939 - 20 Mar 2023
Cited by 1 | Viewed by 1167
Abstract
While detecting surrounding vehicles in autonomous driving is possible with advances in object detection using deep learning, there are cases where small vehicles are not detected accurately. Additionally, real-time processing requirements must be met for implementation in autonomous vehicles. However, detection accuracy and execution speed have an inversely proportional relationship. To improve the accuracy–speed tradeoff, this study proposes an ensemble method. An input image is downsampled first, and the vehicle detection result is acquired for the downsampled image through an object detector. Then, warping or upsampling is performed on the Region of Interest (RoI) where the small vehicles are located, and the small vehicle detection result is acquired for the transformed image through another object detector. If the input image is downsampled, the effect on the detection accuracy of large vehicles is minimal, but the effect on the detection accuracy of small vehicles is significant. Therefore, the detection accuracy of small vehicles can be improved by making small vehicles occupy more pixels in the transformed image than in the given input image. To validate the proposed method’s efficiency, experiments were conducted with the Argoverse vehicle data used in an autonomous vehicle contest, and the accuracy–speed tradeoff improved by up to a factor of two using the proposed ensemble method. Full article
(This article belongs to the Special Issue Advances in Image and Video Processing: Techniques and Applications)
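The two-pass ensemble described above can be sketched as follows; the detector callables, RoI coordinates, and nearest-neighbour resampling are placeholders for the paper's object detectors and RoI selection:

```python
import numpy as np

def ensemble_detect(image, detect_large, detect_small, roi, scale=2):
    """Two-pass detection sketch: run one detector on a downsampled frame
    (large vehicles are barely affected), then upsample the far-field RoI
    so small vehicles occupy more pixels for the second detector."""
    small_img = image[::2, ::2]                  # cheap 2x nearest-neighbour downsample
    y0, y1, x0, x1 = roi
    patch = image[y0:y1, x0:x1]
    upsampled = patch.repeat(scale, axis=0).repeat(scale, axis=1)
    # Merge both detectors' outputs (box coordinate remapping omitted here).
    return detect_large(small_img) + detect_small(upsampled)
```

A real pipeline would also map the boxes from the resampled coordinate frames back to the original image and apply non-maximum suppression across the two passes.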
