Artificial Intelligence in Image Processing and Computer Vision

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Computer Science & Engineering".

Deadline for manuscript submissions: closed (15 August 2024) | Viewed by 15848

Special Issue Editors

Guest Editor
ICB Laboratory, UMR CNRS 6303, University of Burgundy, 21078 Dijon, France
Interests: computer vision; affective computing; approximate computing; embedded AI

Guest Editor
ICB Laboratory, UMR CNRS 6303, University of Burgundy, 21078 Dijon, France
Interests: computer vision; multi-modal data fusion

Guest Editor
Laboratoire Hubert Curien, UMR CNRS 5516, 18 rue Benoit Lauras 42000 Saint Etienne, France
Interests: embedded AI; federated learning; distillation; image and video processing

Guest Editor
State Key Laboratory of Acoustics, Institute of Acoustics in the Chinese Academy of Sciences, 100190 Beijing, China
Interests: underwater acoustic information processing; underwater acoustic imaging

Guest Editor
Institut Jean Lamour, UMR 7198, University of Lorraine, 54011 Nancy, France
Interests: energy-harvesting circuits; neuromorphic architectures; reconfigurable networks-on-chip; algorithm-architecture matching for real-time signal processing

Special Issue Information

Dear Colleagues,

Artificial Intelligence (AI) has been a game-changer in the field of image processing and computer vision, enabling the development of intelligent systems that can perform complex image analysis tasks with high accuracy and efficiency. With the recent surge in machine learning and deep learning techniques, AI has significantly impacted low-level image processing tasks, such as HDR imaging, super-resolution, and noise reduction. At the same time, high-level image analysis tasks, such as object segmentation, detection, and tracking, have also witnessed remarkable advancements with the advent of deep-learning-based models. One of the most exciting recent developments in AI is the emergence of generative models, which has opened new possibilities for image synthesis, style transfer, and image inpainting. Moreover, the fusion of visual images with other modalities, such as depth, thermal, event, radar, lidar, acoustic signals, and text prompts, has made image analysis even more powerful and effective.

This Special Issue aims to provide a platform for researchers and practitioners to share their latest research findings, innovations, and applications in the field of AI for image processing and computer vision. We welcome contributions that address both theoretical and practical aspects of AI for all image analysis tasks. We also encourage submissions that explore the use of generative AI models and multimodal fusion techniques in image processing and computer vision. In addition, we strongly encourage submissions that address real-time applications of AI in image processing and computer vision for edge systems. With the increasing demand for real-time image analysis in various domains, including medical imaging, surveillance, agriculture, ocean engineering, and autonomous systems, there is a pressing need for AI-based solutions that can provide a fast and accurate analysis of visual data in real-time scenarios.

We invite the submission of high-quality, original contributions that address theoretical or practical issues related to the theme of the Special Issue. The scope of the Special Issue encompasses a wide range of topics, including but not limited to:

  • Advances in imaging techniques;
  • Advances in scene understanding;
  • Multi-modal data fusion;
  • Vision for robotics;
  • Acoustic imaging and applications;
  • Real-time image processing in embedded systems.

Overall, we believe that this Special Issue will provide an excellent opportunity for researchers and practitioners to exchange ideas, share their latest findings, and advance the state of the art in AI for image processing and computer vision, with a particular focus on real-time applications.

Prof. Dr. Fan Yang
Dr. Zongwei Wu
Dr. Virginie Fresse
Dr. Chao Li
Dr. Slaviša Jovanović
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • image processing
  • computer vision
  • multi-modal data fusion
  • real-time application

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad-scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (15 papers)


Research

17 pages, 2194 KiB  
Article
A Multidimensional Framework Incorporating 2D U-Net and 3D Attention U-Net for the Segmentation of Organs from 3D Fluorodeoxyglucose-Positron Emission Tomography Images
by Andreas Vezakis, Ioannis Vezakis, Theodoros P. Vagenas, Ioannis Kakkos and George K. Matsopoulos
Electronics 2024, 13(17), 3526; https://doi.org/10.3390/electronics13173526 - 5 Sep 2024
Viewed by 226
Abstract
Accurate analysis of Fluorodeoxyglucose (FDG)-Positron Emission Tomography (PET) images is crucial for the diagnosis, treatment assessment, and monitoring of patients suffering from various cancer types. FDG-PET images provide valuable insights by revealing regions where FDG, a glucose analog, accumulates within the body. While regions of high FDG uptake include suspicious tumor lesions, FDG also accumulates in non-tumor-specific regions and organs. Identifying these regions is crucial for excluding them from certain measurements, or calculating useful parameters, for example, the mean standardized uptake value (SUV) to assess the metabolic activity of the liver. Manual organ delineation from FDG-PET by clinicians demands significant effort and time, which is often not feasible in real clinical workflows with high patient loads. For this reason, this study focuses on automatically identifying key organs with high FDG uptake, namely the brain, left cardiac ventricle, kidneys, liver, and bladder. To this end, an ensemble approach is adopted, where a three-dimensional Attention U-Net (3D AU-Net) is employed for robust three-dimensional analysis, while a two-dimensional U-Net (2D U-Net) is utilized for analysis in the coronal plane. The 3D AU-Net demonstrates highly detailed organ segmentations, but also includes many false positive regions. In contrast, 2D U-Net achieves higher reliability with minimal false positive regions, but lacks the 3D details. Experiments conducted on a subset of the public AutoPET dataset with 60 PET scans demonstrate that the proposed ensemble model achieves high accuracy in segmenting the required organs, surpassing current state-of-the-art techniques, and supporting the potential utilization of the proposed methodology in accelerating and enhancing the clinical workflow of cancer patients.
(This article belongs to the Special Issue Artificial Intelligence in Image Processing and Computer Vision)

22 pages, 10596 KiB  
Article
Development of a Seafloor Litter Database and Application of Image Preprocessing Techniques for UAV-Based Detection of Seafloor Objects
by Ivan Biliškov and Vladan Papić
Electronics 2024, 13(17), 3524; https://doi.org/10.3390/electronics13173524 - 5 Sep 2024
Viewed by 282
Abstract
Marine litter poses a significant global threat to marine ecosystems, primarily driven by poor waste management, inadequate infrastructure, and irresponsible human activities. This research investigates the application of image preprocessing techniques and deep learning algorithms for the detection of seafloor objects, specifically marine debris, using unmanned aerial vehicles (UAVs). The primary objective is to develop non-invasive methods for detecting marine litter to mitigate environmental impacts and support the health of marine ecosystems. Data was collected remotely via UAVs, resulting in a novel database of over 5000 images and 12,000 objects categorized into 31 classes, with metadata such as GPS location, wind speed, and solar parameters. Various image preprocessing methods were employed to enhance underwater object detection, with the Removal of Water Scattering (RoWS) method demonstrating superior performance. The proposed deep neural network architecture significantly improved detection precision compared to existing models. The findings indicate that appropriate databases and preprocessing methods substantially enhance the accuracy and precision of underwater object detection algorithms.

25 pages, 26769 KiB  
Article
SIDGAN: Efficient Multi-Module Architecture for Single Image Defocus Deblurring
by Shenggui Ling, Hongmin Zhan and Lijia Cao
Electronics 2024, 13(12), 2265; https://doi.org/10.3390/electronics13122265 - 9 Jun 2024
Viewed by 633
Abstract
In recent years, with the rapid developments in deep learning and graphics processing units, learning-based defocus deblurring has achieved favorable results. However, current methods are not effective in processing blurred images with a large depth of field. The greater the depth of field, the blurrier the image; that is, the image contains large blurry regions and suffers from severe blur. The fundamental reason for the unsatisfactory results is that it is difficult to extract effective features from blurred images with large blurry regions. For this reason, a new FFEM (Fuzzy Feature Extraction Module) is proposed to enhance the encoder’s ability to extract features from images with large blurry regions. After using the FFEM during encoding, the PSNR (Peak Signal-to-Noise Ratio) is improved by 1.33% on the DPDD (Dual-Pixel Defocus Deblurring) dataset. Moreover, images with large blurry regions often cause current algorithms to generate artifacts in their results. Therefore, a new module named ARM (Artifact Removal Module) is proposed in this work and employed during decoding. After utilizing the ARM during decoding, the PSNR is improved by 2.49% on the DPDD dataset. After using the FFEM and the ARM simultaneously, the PSNR of our method is improved by 3.29% on the DPDD dataset compared to the latest algorithms. Following previous research in this field, qualitative and quantitative experiments are conducted on the DPDD and RealDOF (Real Depth of Field) datasets, and the experimental results indicate that our method surpasses the state-of-the-art algorithms in three objective metrics.

23 pages, 4542 KiB  
Article
Active Learning in Feature Extraction for Glass-in-Glass Detection
by Jerzy Rapcewicz and Marcin Malesa
Electronics 2024, 13(11), 2049; https://doi.org/10.3390/electronics13112049 - 24 May 2024
Viewed by 531
Abstract
In the food industry, ensuring product quality is crucial due to potential hazards to consumers. Though metallic contaminants are easily detected, identifying non-metallic ones like wood, plastic, or glass remains challenging and poses health risks. X-ray-based quality control systems offer deeper product inspection than RGB cameras, making them suitable for detecting various contaminants. However, acquiring sufficient defective samples for classification is costly and time-consuming. To address this, we propose an anomaly detection system that requires only non-defective samples and automatically classifies anything not recognized as good as defective. Our system, employing active learning on X-ray images, efficiently detects defects like glass fragments in food products. By fine-tuning a feature extractor and autoencoder based on non-defective samples, our method improves classification accuracy while minimizing the need for manual intervention over time. The system achieves a 97.4% detection rate for foreign glass bodies in glass jars, offering a fast and effective solution for real-time quality control on production lines.

20 pages, 9286 KiB  
Article
A Selective Multi-Branch Network for Edge-Oriented Object Localization and Classification
by Kai Su, Yoichi Tomioka, Qiangfu Zhao and Yong Liu
Electronics 2024, 13(8), 1472; https://doi.org/10.3390/electronics13081472 - 12 Apr 2024
Viewed by 778
Abstract
This study introduces a novel selective multi-branch network architecture designed to speed up object localization and classification on low-performance edge devices. The concept builds upon the You Only Look at Interested Cells (YOLIC) method, which we proposed earlier. In this approach, we categorize cells of interest (CoIs) into distinct regions of interest (RoIs) based on their locations and urgency. We then employ expert branch networks for detailed object detection in each of the RoIs. To steer these branches effectively, a selective attention unit is added to the detection process. This unit can locate RoIs that are likely to contain objects of concern and trigger the corresponding expert branch networks. Inference can be more efficient because only part of the feature map is used to make decisions. Through extensive experiments on various datasets, the proposed network demonstrates its ability to reduce the inference time while still maintaining competitive performance levels compared to current detection algorithms.

22 pages, 4688 KiB  
Article
Multi-Scale Adaptive Feature Network Drainage Pipe Image Dehazing Method Based on Multiple Attention
by Ce Li, Zhengyan Tang, Jingyi Qiao, Chi Su and Feng Yang
Electronics 2024, 13(7), 1406; https://doi.org/10.3390/electronics13071406 - 8 Apr 2024
Viewed by 899
Abstract
Drainage pipes are a critical component of urban infrastructure, and their safety and proper functioning are vital. However, haze problems caused by humid environments and temperature differences seriously affect the quality and detection accuracy of drainage pipe images. Traditional repair methods struggle to meet the requirements when dealing with complex underground environments. To solve this problem, we propose a dehazing method for drainage pipe images based on multi-attention multi-scale adaptive feature networks. By designing multiple attention and adaptive modules, the network is able to capture global features with multi-scale resolution in complex underground environments, thereby achieving end-to-end dehazing processing. In addition, we constructed a large drainage pipe dataset containing tens of thousands of clear/hazy image pairs of drainage pipes for network training and testing. Experimental results show that our network exhibits excellent dehazing performance in various complex underground environments, especially in the real scene of urban underground drainage pipes. The contributions of this paper are mainly reflected in the following aspects: first, a novel multi-scale adaptive feature network based on multiple attention is proposed to effectively solve the problem of dehazing drainage pipe images; second, a large-scale drainage pipe dataset is constructed, providing a valuable resource for related research work; finally, the effectiveness and superiority of the proposed method are verified through experiments, and it provides an efficient solution for dehazing work in scenes such as urban underground drainage pipes.

19 pages, 24713 KiB  
Article
Color Face Image Generation with Improved Generative Adversarial Networks
by Yeong-Hwa Chang, Pei-Hua Chung, Yu-Hsiang Chai and Hung-Wei Lin
Electronics 2024, 13(7), 1205; https://doi.org/10.3390/electronics13071205 - 25 Mar 2024
Viewed by 988
Abstract
This paper focuses on the development of an improved Generative Adversarial Network (GAN) specifically designed for generating color portraits from sketches. The construction of the system involves using a GPU (Graphics Processing Unit) computing host as the primary unit for model training. The tasks that require high-performance calculations are handed over to the GPU host, while the user host only needs to perform simple image processing and use the model trained by the GPU host to generate images. This arrangement reduces the computer specification requirements for the user. This paper conducts a comparative analysis of various types of generative networks, which serves as a reference point for the development of the proposed Generative Adversarial Network. The application part of the paper focuses on the practical implementation and utilization of the developed Generative Adversarial Network for the generation of multi-skin tone portraits. By constructing a face dataset specifically designed to incorporate information about ethnicity and skin color, this approach can overcome a limitation associated with traditional generation networks, which typically generate only a single skin color.

13 pages, 1366 KiB  
Article
An Extended Method for Reversible Color Tone Control Using Data Hiding
by Daichi Nakaya and Shoko Imaizumi
Electronics 2024, 13(7), 1204; https://doi.org/10.3390/electronics13071204 - 25 Mar 2024
Viewed by 687
Abstract
This paper proposes an extended method for reversible color tone control for blue and red tones. Our previous method has an issue in that there are cases where the intensity of enhancement cannot be flexibly controlled. In contrast, the proposed method can gradually improve the intensity by increasing the correction coefficients, regardless of the image features. This is because the method defines one reference area where the correction coefficients are determined, one each for blue and red tones, while the previous method defines a common reference area for both tones. Owing to this, the method also provides independent control for blue and red tones. In our experiments, we clarify the above advantages of the method. We also discuss the influence of the data-embedding process, which is necessary to store recovery information, on the output image quality.

17 pages, 2841 KiB  
Article
Advancing Cough Classification: Swin Transformer vs. 2D CNN with STFT and Augmentation Techniques
by Malak Ghourabi, Farah Mourad-Chehade and Aly Chkeir
Electronics 2024, 13(7), 1177; https://doi.org/10.3390/electronics13071177 - 22 Mar 2024
Viewed by 935
Abstract
Coughing, a common symptom associated with various respiratory problems, is a crucial indicator for diagnosing and tracking respiratory diseases. Accurate identification and categorization of cough sounds, especially distinguishing between wet and dry coughs, are essential for understanding underlying health conditions. This research focuses on applying the Swin Transformer for classifying wet and dry coughs using short-time Fourier transform (STFT) representations. We conduct a comprehensive evaluation, including a performance comparison with a 2D convolutional neural network (2D CNN) model, and exploration of two distinct image augmentation methods: time mask augmentation and classical image augmentation techniques. Extensive hyperparameter tuning is performed to optimize the Swin Transformer’s performance, considering input size, patch size, embedding size, number of epochs, optimizer type, and regularization technique. Our results demonstrate the Swin Transformer’s superior accuracy, particularly when trained on classically augmented STFT images with optimized settings (320 × 320 input size, RMS optimizer, 8 × 8 patch size, and an embedding size of 128). The approach achieves remarkable testing accuracy (88.37%) and ROC AUC values (94.88%) on the challenging crowdsourced COUGHVID dataset, marking improvements of approximately 2.5% and 11% in testing accuracy and ROC AUC, respectively, compared to previous studies. These findings underscore the efficacy of Swin Transformer architectures in disease detection and healthcare classification problems.

17 pages, 1833 KiB  
Article
Fuzzy Inference Systems to Fine-Tune a Local Eigenvector Image Smoothing Method
by Khleef Almutairi, Samuel Morillas and Pedro Latorre-Carmona
Electronics 2024, 13(6), 1150; https://doi.org/10.3390/electronics13061150 - 21 Mar 2024
Viewed by 725
Abstract
Image denoising is a fundamental research topic in colour image processing, analysis, and transmission. Noise is an inevitable byproduct of image acquisition and transmission, and its nature is intimately linked to the underlying processes that produce it. Gaussian noise is a particularly prevalent type of noise that necessitates effective removal while ensuring the preservation of the original image’s quality. This paper presents a colour image denoising framework that integrates fuzzy inference systems (FISs) with eigenvector analysis. This framework employs eigenvector analysis to extract relevant information from local image neighbourhoods. This information is subsequently fed into the FIS, which dynamically adjusts the intensity of the denoising process based on local characteristics. This approach recognizes that homogeneous areas may require less aggressive smoothing than detailed image regions. Images are converted from the RGB domain to an eigenvector-based space for smoothing and then converted back to the RGB domain. The effectiveness of the proposed methods is established through the application of various image quality metrics and visual comparisons against established state-of-the-art techniques.

17 pages, 1633 KiB  
Article
Incremental Scene Classification Using Dual Knowledge Distillation and Classifier Discrepancy on Natural and Remote Sensing Images
by Chih-Chang Yu, Tzu-Ying Chen, Chun-Wei Hsu and Hsu-Yung Cheng
Electronics 2024, 13(3), 583; https://doi.org/10.3390/electronics13030583 - 31 Jan 2024
Cited by 1 | Viewed by 868
Abstract
Conventional deep neural networks face challenges in handling the increasing amount of information in real-world scenarios where it is impractical to gather all the training data at once. Incremental learning, also known as continual learning, provides a solution for lightweight and sustainable learning with neural networks. However, incremental learning encounters issues such as “catastrophic forgetting” and the “stability–plasticity dilemma”. To address these challenges, this study proposes a two-stage training method. In the first stage, dual knowledge distillation is introduced, including feature map-based and response-based knowledge distillation. This approach prevents the model from excessively favoring new tasks during training, thus addressing catastrophic forgetting. In the second stage, an out-of-distribution dataset is incorporated to calculate the discrepancy loss between multiple classifiers. By maximizing the discrepancy loss and minimizing the cross-entropy loss, the model improves the classification accuracy of new tasks. The proposed method is evaluated on the CIFAR100 and RESISC45 benchmark datasets and compared with existing approaches. Experimental results demonstrate an overall accuracy improvement of 6.9% and a reduction of 5.1% in the forgetting rate after adding nine consecutive tasks. These findings indicate that the proposed method effectively mitigates catastrophic forgetting and provides a viable solution for image classification in natural and remote sensing images.

20 pages, 1070 KiB  
Article
High-Throughput MPSoC Implementation of Sparse Bayesian Learning Algorithm
by Jinyang Wang, El-Bay Bourennane, Mahdi Madani, Jun Wang, Chao Li, Yupeng Tai, Longxu Wang, Fan Yang and Haibin Wang
Electronics 2024, 13(1), 234; https://doi.org/10.3390/electronics13010234 - 4 Jan 2024
Viewed by 974
Abstract
In the field of sparse signal reconstruction, sparse Bayesian learning (SBL) has excellent performance, which is accompanied by extremely high computational complexity. This paper presents an efficient SBL hardware and software (HW&SW) co-implementation method using the ZYNQ series MPSoC (multiprocessor system-on-chip). Firstly, considering the inherent challenges in parallelizing iterative algorithms like SBL, we propose an architecture in which the iterative calculations are implemented on the PL side (FPGA) and the iteration control and input management are handled by the PS side (ARM). By adopting this structure, we can take advantage of task-level pipelines on the FPGA side, effectively utilizing time and space resources. Secondly, we utilize LDL decomposition to perform the inversion of the Hermitian matrix, which not only exhibits the lowest computational complexity and requires fewer computational resources but also achieves a higher level in the parallel pipeline mechanism compared with other alternative methods. Furthermore, the algorithm conducts iterations sequentially, utilizing the parameters derived from the previous dataset as prior information for initializing the subsequent dataset’s initial values. This approach helps to reduce the number of iterations required. Finally, with the help of Vitis HLS 2022.2 and Vivado tools, we completed the development of a hardware design language and its implementation on the ZYNQ UltraScale+ MPSoC ZCU102 platform. We also solved a direction of arrival (DOA) estimation problem using horizontal line arrays to verify the practical feasibility of the method.
(This article belongs to the Special Issue Artificial Intelligence in Image Processing and Computer Vision)
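To illustrate the LDL-based inversion of a Hermitian matrix mentioned in the abstract above, the following is a minimal software sketch in pure Python. It is not the authors' FPGA implementation: the function names (`ldl_decompose`, `ldl_solve`, `ldl_inverse`) are hypothetical, and the sketch assumes a Hermitian positive-definite input so that no pivoting is needed.

```python
def ldl_decompose(a):
    """Factor a Hermitian positive-definite matrix as A = L D L^H.

    Returns (l, d): l is unit lower triangular (list of lists),
    d is the real diagonal of D (list of floats).
    """
    n = len(a)
    l = [[complex(i == j) for j in range(n)] for i in range(n)]
    d = [0.0] * n
    for j in range(n):
        # Diagonal entry: subtract contributions of earlier columns.
        d[j] = (a[j][j] - sum(abs(l[j][k]) ** 2 * d[k] for k in range(j))).real
        for i in range(j + 1, n):
            s = sum(l[i][k] * l[j][k].conjugate() * d[k] for k in range(j))
            l[i][j] = (a[i][j] - s) / d[j]
    return l, d


def ldl_solve(l, d, b):
    """Solve (L D L^H) x = b by forward substitution, scaling, back substitution."""
    n = len(b)
    y = [0j] * n
    for i in range(n):  # L y = b
        y[i] = b[i] - sum(l[i][k] * y[k] for k in range(i))
    z = [y[i] / d[i] for i in range(n)]  # D z = y
    x = [0j] * n
    for i in reversed(range(n)):  # L^H x = z
        x[i] = z[i] - sum(l[k][i].conjugate() * x[k] for k in range(i + 1, n))
    return x


def ldl_inverse(a):
    """Invert A column by column, solving A x = e_j for each unit vector e_j."""
    n = len(a)
    l, d = ldl_decompose(a)
    cols = [ldl_solve(l, d, [complex(i == j) for i in range(n)]) for j in range(n)]
    return [[cols[j][i] for j in range(n)] for i in range(n)]


# Example: a 2x2 Hermitian positive-definite matrix (illustrative values only).
A = [[4 + 0j, 1 + 2j],
     [1 - 2j, 3 + 0j]]
A_inv = ldl_inverse(A)
```

The factorization avoids the square roots that a Cholesky decomposition would need, and the two triangular solves are independent per column, which is one plausible reason such a scheme suits a pipelined hardware design.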

19 pages, 8762 KiB  
Article
Real-Time Object Detection and Tracking for Unmanned Aerial Vehicles Based on Convolutional Neural Networks
by Shao-Yu Yang, Hsu-Yung Cheng and Chih-Chang Yu
Electronics 2023, 12(24), 4928; https://doi.org/10.3390/electronics12244928 - 7 Dec 2023
Cited by 3 | Viewed by 3513
Abstract
This paper presents a system for unmanned aerial vehicles (UAVs) based on the Robot Operating System (ROS). The study addresses the challenges of efficient object detection and real-time target tracking on UAVs. The system uses a pruned YOLOv4 architecture for fast object detection and the SiamMask model for continuous target tracking. A Proportional Integral Derivative (PID) module adjusts the flight attitude, enabling stable, automatic target tracking in indoor and outdoor environments. The contributions of this work include exploring the feasibility of systematically pruning existing models to construct a real-time detection and tracking system for drone control with very limited computational resources. Experiments validate the system's feasibility, demonstrating efficient object detection, accurate target tracking, and effective attitude control. This ROS-based system contributes to advancing UAV technology in real-world environments.
(This article belongs to the Special Issue Artificial Intelligence in Image Processing and Computer Vision)
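As a rough illustration of how a PID module can turn a tracking error into a control command, here is a minimal discrete-time PID sketch in Python. The abstract does not give the controller's gains, inputs, or outputs, so everything here is an assumption: the class name `PID`, the gain values, and the interpretation of the error as the tracked target's horizontal offset from the image center are all hypothetical.

```python
class PID:
    """Minimal discrete PID controller (illustrative sketch only)."""

    def __init__(self, kp, ki, kd):
        self.kp = kp
        self.ki = ki
        self.kd = kd
        self.integral = 0.0
        self.prev_error = None  # no derivative term on the first sample

    def update(self, error, dt):
        """Return the control output for the current error and time step dt (seconds)."""
        self.integral += error * dt
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


# Hypothetical usage: the error is the target's horizontal offset from the
# image center (in normalized units), and the output is read as a yaw-rate command.
yaw_pid = PID(kp=1.0, ki=0.5, kd=0.1)
first_command = yaw_pid.update(error=2.0, dt=0.1)
second_command = yaw_pid.update(error=1.0, dt=0.1)
```

In a setup like this, one controller per axis (for example yaw, pitch, and throttle) would typically run at the tracker's frame rate, with the gains tuned on the actual vehicle.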

18 pages, 9286 KiB  
Article
Combination of Fast Finite Shear Wave Transform and Optimized Deep Convolutional Neural Network: A Better Method for Noise Reduction of Wetland Test Images
by Xiangdong Cui, Huajun Bai, Ying Zhao and Zhen Wang
Electronics 2023, 12(17), 3557; https://doi.org/10.3390/electronics12173557 - 23 Aug 2023
Cited by 1 | Viewed by 868
Abstract
Wetland experimental images are often affected by factors such as waves, weather conditions, and lighting, which introduce severe noise. To improve the quality and accuracy of wetland experimental images, this paper proposes a denoising method based on the fast finite shearlet transform (FFST) and a deep convolutional neural network model. The FFST decomposes the wetland experimental images, capturing image features at different frequencies and directions. The network model has a deep structure and powerful feature extraction capabilities; once trained, it learns the relevant features of the wetland experimental images and thereby denoises them. The experimental results show that, compared to traditional denoising methods, the proposed method effectively removes noise from wetland experimental images while preserving image details and textures, which is of great significance for improving the quality and accuracy of such images.
(This article belongs to the Special Issue Artificial Intelligence in Image Processing and Computer Vision)

16 pages, 3152 KiB  
Article
Unified Object Detector for Different Modalities Based on Vision Transformers
by Xiaoke Shen and Ioannis Stamos
Electronics 2023, 12(12), 2571; https://doi.org/10.3390/electronics12122571 - 7 Jun 2023
Cited by 1 | Viewed by 1188
Abstract
Traditional systems typically require different models for processing different modalities, such as one model for RGB images and another for depth images. Recent research has demonstrated that a single model for one modality can be adapted for another using cross-modality transfer learning. In this paper, we extend this approach by combining cross/inter-modality transfer learning with a vision transformer to develop a unified detector that achieves superior performance across diverse modalities. Our research envisions an application scenario for robotics, where the unified system seamlessly switches between RGB cameras and depth sensors in varying lighting conditions. Importantly, the system requires no model architecture or weight updates to enable this smooth transition. Specifically, the system uses a depth sensor in low-light conditions (night time) and either both an RGB camera and a depth sensor or an RGB camera only in well-lit environments. We evaluate our unified model on the SUN RGB-D dataset and demonstrate that it achieves a similar or better performance in terms of the mAP50 compared to state-of-the-art methods in the SUNRGBD16 category and a comparable performance in point-cloud-only mode. We also introduce a novel inter-modality mixing method that enables our model to achieve significantly better results than previous methods. We provide our code, including training/inference logs and model checkpoints, to facilitate reproducibility and further research.
(This article belongs to the Special Issue Artificial Intelligence in Image Processing and Computer Vision)
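The mAP50 metric cited in the abstract above counts a predicted box as a correct detection when its intersection over union (IoU) with a ground-truth box is at least 0.5. The sketch below shows only that matching criterion, under the assumption of axis-aligned 2D boxes given as `(x1, y1, x2, y2)`; the function names are hypothetical, and the full metric additionally averages precision over recall levels and over classes, which is not shown here.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap width and height, clamped at zero when the boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred_box, gt_box, threshold=0.5):
    """Apply the IoU >= 0.5 criterion that underlies mAP50."""
    return iou(pred_box, gt_box) >= threshold


# Illustrative boxes only; not taken from any dataset.
good_match = is_true_positive((0, 0, 4, 4), (0, 0, 4, 3))
poor_match = is_true_positive((0, 0, 2, 2), (1, 0, 3, 2))
```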