
Topical Collection "Advances in Deep-Learning-Based Sensing, Imaging, and Video Processing"

A topical collection in Sensors (ISSN 1424-8220). This collection belongs to the section "Sensing and Imaging".

Editors

Prof. Dr. Yun Zhang
Collection Editor
Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
Interests: video coding; image processing
Prof. Dr. KWONG Tak Wu Sam
Collection Editor
Department of Computer Science, City University of Hong Kong, 83 Tatchee Ave., Kowloon, Hong Kong, China
Interests: image processing; video processing; image segmentation; machine learning
Prof. Dr. Xu Long
Collection Editor
National Astronomical Observatories, Chinese Academy of Sciences, Beijing 100012, China
Interests: image processing; machine learning; solar radio astronomy; aperture synthesis imaging
Prof. Dr. Tiesong Zhao
Collection Editor
College of Physics and Information Engineering, Fuzhou University, Fuzhou 350108, China
Interests: multimedia signal processing; coding and transmission; computer vision

Topical Collection Information

Dear Colleagues,

Deep learning techniques are capable of discovering knowledge from massive unstructured data and providing data-driven solutions. They have driven significant technical advances in many research fields and applications, such as audio-visual signal processing, computer vision, and pattern recognition. Moreover, deep learning and its derived techniques are expected to be embedded in future sensors and imaging systems.

Today, with the rapid development of advanced deep learning models and techniques, such as GANs, DNNs, RNNs, and LSTMs, and the increasing demand for effective visual signal processing, new opportunities are emerging in deep-learning-based sensing, imaging, and video processing. This Topical Collection aims to promote cutting-edge research in this direction and to offer a timely collection of works for researchers. We welcome high-quality original submissions related to advances in deep-learning-based sensing, imaging, and video processing.

Topics of interest include, but are not limited to:

  • Deep learning theory, framework, database, and learning optimization;
  • Deep-learning-based remote sensing, multispectral, and/or hyperspectral sensing;
  • Deep-learning-based computational imaging and pre-processing;
  • Deep-learning-based visual perceptual model and quality assessment metrics;
  • Deep-learning-based image/video compression and communication;
  • Deep-learning-based 3D/multiview sensing, imaging, and video processing;
  • Deep-learning-based depth sensing and estimation;
  • Deep-learning-based image/video rendering, reconstruction, and enhancement;
  • Deep-learning-based visual object detection, tracking, and understanding;
  • Low-complexity optimizations on deep-learning-based sensing, imaging, and video processing;
  • Other advanced deep-learning-based visual sensing and signal processing.

Prof. Dr. Yun Zhang
Prof. Dr. KWONG Tak Wu Sam
Prof. Dr. Xu Long
Prof. Dr. Tiesong Zhao
Collection Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All papers will be peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the collection website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Sensors is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • deep learning
  • neural network
  • video compression
  • visual quality assessment
  • image and video enhancement
  • multispectral sensing and imaging

Published Papers (10 papers)

2021

Article
Small Object Detection in Traffic Scenes Based on YOLO-MXANet
Sensors 2021, 21(21), 7422; https://doi.org/10.3390/s21217422 - 08 Nov 2021
Abstract
In terms of small objects in traffic scenes, general object detection algorithms suffer from low detection accuracy, high model complexity, and slow detection speed. To solve these problems, an improved algorithm, named YOLO-MXANet, is proposed in this paper. Complete Intersection over Union (CIoU) is utilized to improve the loss function, promoting the positioning accuracy of small objects. To reduce the complexity of the model, we present a lightweight yet powerful backbone network, named SA-MobileNeXt, that incorporates channel and spatial attention. Our approach extracts expressive features more effectively by applying the Shuffle Channel and Spatial Attention (SCSA) module to the SandGlass Block (SGBlock) module while adding only a small number of parameters. In addition, a data augmentation method combining Mosaic and Mixup is employed to improve the robustness of the trained model. A Multi-scale Feature Enhancement Fusion (MFEF) network is proposed to fuse the extracted features better. Furthermore, the SiLU activation function is utilized to optimize the Convolution-Batchnorm-Leaky ReLU (CBL) and SGBlock modules to accelerate the convergence of the model. Ablation experiments on the KITTI dataset show that each improvement is effective: the improved algorithm reduces model complexity while improving both detection speed and object detection accuracy. Comparative experiments against other algorithms on the KITTI and CCTSDB datasets show that our algorithm also has certain advantages.
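
For readers unfamiliar with the CIoU loss used above, here is a minimal Python sketch of its textbook formulation for two axis-aligned boxes; the three terms penalize low overlap, center offset, and aspect-ratio mismatch. This is an illustration under standard definitions, not the authors' implementation.

```python
import math

def ciou_loss(box_a, box_b):
    """Complete IoU (CIoU) loss between two boxes (x1, y1, x2, y2).

    L = 1 - IoU + d^2 / c^2 + alpha * v, where d is the center distance,
    c the enclosing-box diagonal, and v an aspect-ratio consistency term.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Intersection and union areas.
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / (union + 1e-9)

    # Squared distance between box centers.
    d2 = ((ax1 + ax2) - (bx1 + bx2)) ** 2 / 4 + ((ay1 + ay2) - (by1 + by2)) ** 2 / 4
    # Squared diagonal of the smallest enclosing box.
    c2 = (max(ax2, bx2) - min(ax1, bx1)) ** 2 + (max(ay2, by2) - min(ay1, by1)) ** 2

    # Aspect-ratio consistency term and its trade-off weight.
    v = (4 / math.pi ** 2) * (math.atan((ax2 - ax1) / (ay2 - ay1 + 1e-9))
                              - math.atan((bx2 - bx1) / (by2 - by1 + 1e-9))) ** 2
    alpha = v / (1 - iou + v + 1e-9)

    return 1 - iou + d2 / (c2 + 1e-9) + alpha * v
```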

Article
Deepfake Detection Using the Rate of Change between Frames Based on Computer Vision
Sensors 2021, 21(21), 7367; https://doi.org/10.3390/s21217367 - 05 Nov 2021
Abstract
Recently, artificial intelligence has been successfully used in fields such as computer vision, voice, and big data analysis. However, various problems, such as security, privacy, and ethics, also arise from the development of artificial intelligence. One such problem is deepfakes. Deepfake, a portmanteau of "deep learning" and "fake", refers to a fake video created using artificial intelligence technology, or to the production process itself. Deepfakes can be exploited for political abuse, pornography, and fake information. This paper proposes a method to determine the integrity of digital content by analyzing its computer vision features. The proposed method extracts the rate of change of the computer vision features between adjacent frames and then checks whether the video has been manipulated. In tests, the method achieved a detection rate of 97%, the highest compared with existing and machine learning methods. It also maintained a detection rate of 96%, the highest even in a test that manipulates the image matrix to evade convolutional-neural-network-based detection.
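
The abstract does not specify which computer vision features are used, so as a hedged illustration of the "rate of change between adjacent frames" idea, the sketch below uses a grayscale histogram as a stand-in feature (OpenCV; the file name input.mp4 and the 3-sigma threshold are hypothetical choices):

```python
import cv2
import numpy as np

def feature_change_rates(video_path, bins=64):
    """Per-frame rate of change of a simple vision feature
    (here: a normalised grayscale histogram) between adjacent frames."""
    cap = cv2.VideoCapture(video_path)
    prev_hist, rates = None, []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        hist = cv2.calcHist([gray], [0], None, [bins], [0, 256]).ravel()
        hist /= hist.sum() + 1e-9  # normalise so rates are scale-free
        if prev_hist is not None:
            rates.append(np.abs(hist - prev_hist).sum())  # L1 change rate
        prev_hist = hist
    cap.release()
    return np.array(rates)

# Frames whose change rate deviates strongly from the video's norm are
# candidate manipulation points.
rates = feature_change_rates("input.mp4")
suspect = rates > rates.mean() + 3 * rates.std()
```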

Article
Improving the Ability of a Laser Ultrasonic Wave-Based Detection of Damage on the Curved Surface of a Pipe Using a Deep Learning Technique
Sensors 2021, 21(21), 7105; https://doi.org/10.3390/s21217105 - 26 Oct 2021
Abstract
With the advent of the Fourth Industrial Revolution, the economic, social, and technological demands for pipe maintenance are increasing due to the aging of infrastructure caused by growing industrial development and the expansion of cities. Accordingly, an automatic pipe damage detection system was built using a laser-scanned pipe's ultrasonic wave propagation imaging (UWPI) data and convolutional neural network (CNN)-based object detection algorithms. The algorithm used in this study was EfficientDet-d0, a CNN-based object detector trained with transfer learning. The resulting mean average precision (mAP) was measured to be 0.39, higher than the mAP of EfficientDet-d0 on COCO, which is expected to enable the efficient maintenance of piping used in construction and many other industries.
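
The paper fine-tunes the EfficientDet-d0 detector; as a rough sketch of the transfer-learning recipe it relies on, the following uses torchvision's classification EfficientNet-B0 as a stand-in on a hypothetical damage/no-damage task (not the authors' pipeline):

```python
import torch
import torchvision

# Start from ImageNet-pretrained weights and fine-tune on the new task.
model = torchvision.models.efficientnet_b0(weights="IMAGENET1K_V1")

# Freeze the pretrained feature extractor.
for p in model.features.parameters():
    p.requires_grad = False

# Replace the classification head for a two-class damage / no-damage task.
model.classifier[1] = torch.nn.Linear(model.classifier[1].in_features, 2)

# Only the new head's parameters are updated during fine-tuning.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4)
```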

Article
Compressed Video Quality Index Based on Saliency-Aware Artifact Detection
Sensors 2021, 21(19), 6429; https://doi.org/10.3390/s21196429 - 26 Sep 2021
Abstract
Video coding technology reduces the storage and transmission bandwidth required by video services by lowering the bitrate of the video stream. However, compressed video signals may involve perceivable information loss, especially when the video is overcompressed. In such cases, viewers can observe visually annoying artifacts, namely Perceivable Encoding Artifacts (PEAs), which degrade their perceived video quality. To monitor and measure these PEAs (including blurring, blocking, ringing, and color bleeding), we propose an objective video quality metric named Saliency-Aware Artifact Measurement (SAAM), which requires no reference information. The SAAM metric first introduces video saliency detection to extract regions of interest and further splits these regions into a finite number of image patches. For each image patch, a data-driven model is utilized to evaluate the intensities of PEAs. Finally, these intensities are fused into an overall metric using Support Vector Regression (SVR). In the experimental section, we compare the SAAM metric with other popular video quality metrics on four publicly available databases: LIVE, CSIQ, IVP, and FERIT-RTRK. The results reveal the promising quality prediction performance of the SAAM metric, which is superior to most popular compressed video quality evaluation models.
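
As a sketch of the final fusion stage described above, the following fits a Support Vector Regression model mapping per-patch PEA intensities to a quality score; the data are random stand-ins, not the paper's measurements, and the mean pooling is a naive placeholder:

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)

# Hypothetical training data: per-patch intensities of the four PEAs
# (blurring, blocking, ringing, color bleeding) and subjective scores.
X_train = rng.random((200, 4))   # stand-ins, not real PEA measurements
y_train = rng.random(200)        # stand-ins, not real MOS labels

# SAAM-style final stage: fuse PEA intensities into one score with SVR.
fusion = SVR(kernel="rbf", C=1.0, epsilon=0.1)
fusion.fit(X_train, y_train)

patch_scores = fusion.predict(rng.random((10, 4)))
video_quality = patch_scores.mean()   # naive pooling over patches
print(video_quality)
```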

Article
MSF-Net: Multi-Scale Feature Learning Network for Classification of Surface Defects of Multifarious Sizes
Sensors 2021, 21(15), 5125; https://doi.org/10.3390/s21155125 - 29 Jul 2021
Abstract
In the field of surface defect detection, the scale difference of product surface defects is often huge. Existing defect detection methods based on Convolutional Neural Networks (CNNs) are inclined to express macro and abstract features, and their ability to express local and small defects is insufficient, resulting in an imbalance of feature expression capability. In this paper, a Multi-Scale Feature Learning Network (MSF-Net) based on a Dual Module Feature (DMF) extractor is proposed. The DMF extractor is mainly composed of optimized Concatenated Rectified Linear Units (CReLUs) and optimized Inception feature extraction modules, which increase the diversity of feature receptive fields while reducing the amount of computation. The feature maps of middle layers with different receptive field sizes are merged to enrich the receptive fields of the last layer of feature maps, and residual shortcut connections, a batch normalization layer, and an average pooling layer replace the fully connected layer to improve training efficiency while keeping the multi-scale feature learning ability balanced. Experiments on two representative multi-scale defect datasets verify the effectiveness and superiority of the proposed MSF-Net in detecting surface defects with multi-scale features.
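
A minimal PyTorch sketch of the Concatenated ReLU idea mentioned above (the paper's optimized variant differs in detail): CReLU keeps both the positive and the negated-negative responses, doubling the channel count of its output.

```python
import torch
import torch.nn as nn

class CReLU(nn.Module):
    """Concatenated ReLU: concatenates relu(x) and relu(-x) along the
    channel dimension, so C input channels become 2*C output channels."""
    def forward(self, x):
        return torch.cat([torch.relu(x), torch.relu(-x)], dim=1)

# A conv layer followed by CReLU yields 2*C output channels from C,
# so the next layer sees double the width for the same filter count.
block = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), CReLU(),
                      nn.Conv2d(32, 32, 3, padding=1))
x = torch.randn(1, 3, 64, 64)
print(block(x).shape)  # torch.Size([1, 32, 64, 64])
```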

Article
Wheat Ear Recognition Based on RetinaNet and Transfer Learning
Sensors 2021, 21(14), 4845; https://doi.org/10.3390/s21144845 - 16 Jul 2021
Cited by 1
Abstract
The number of wheat ears is an essential indicator for wheat production and yield estimation, but accurately counting wheat ears requires expensive manual labor and time. Meanwhile, wheat ears offer few distinguishing characteristics and their color is consistent with the background, which makes obtaining the required counts challenging. In this paper, the performance of Faster regions with convolutional neural networks (Faster R-CNN) and RetinaNet in predicting the number of wheat ears at different growth stages and under different conditions is investigated. The results show that, using the Global WHEAT dataset for recognition, the RetinaNet and Faster R-CNN methods achieve average accuracies of 0.82 and 0.72, respectively, with the RetinaNet method obtaining the highest recognition accuracy. Secondly, using the collected image data for recognition, the R2 of RetinaNet and Faster R-CNN after transfer learning is 0.9722 and 0.8702, respectively, indicating that the recognition accuracy of the RetinaNet method is higher across different datasets. We also tested wheat ears at both the filling and maturity stages; the proposed method proved very robust (R2 above 0.90). This study provides technical support and a reference for automatic wheat ear recognition and yield estimation.
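
The R2 values above measure how well predicted counts track manual counts. For reference, a minimal Python sketch of the coefficient of determination follows, with hypothetical per-image ear counts (not the paper's data):

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination between manual counts and
    model-predicted wheat-ear counts."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = ((y_true - y_pred) ** 2).sum()   # residual sum of squares
    ss_tot = ((y_true - y_true.mean()) ** 2).sum()  # total sum of squares
    return 1 - ss_res / ss_tot

# Hypothetical per-image ear counts: manual vs. detector output.
print(r_squared([52, 61, 48, 70], [50, 63, 47, 69]))
```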

Communication
Bionic Birdlike Imaging Using a Multi-Hyperuniform LED Array
Sensors 2021, 21(12), 4084; https://doi.org/10.3390/s21124084 - 14 Jun 2021
Abstract
Digital cameras obtain color information of the scene using a chromatic filter, usually a Bayer filter, overlaid on a pixelated detector. However, the periodic arrangement of both the filter array and the detector array introduces frequency aliasing in sampling and color misregistration during the demosaicking process, which degrades image quality. Inspired by the biological structure of avian retinas, we developed a chromatic LED array with a multi-hyperuniform geometric arrangement, which exhibits irregularity on small length scales but quasi-uniformity on large scales, to suppress frequency aliasing and color misregistration in full-color image retrieval. Experiments were performed with a single-pixel imaging system using the multi-hyperuniform chromatic LED array to provide structured illumination, and a frame rate of 208 fps was achieved at 32 × 32 pixel resolution. Comparing the experimental results with images captured by a conventional digital camera demonstrates that the proposed imaging system forms images with fewer chromatic moiré patterns and color misregistration artifacts. The concept proposed and verified here could provide insights for the design and manufacturing of future bionic imaging sensors.
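
As a toy illustration of single-pixel imaging with structured illumination (not the paper's multi-hyperuniform optics or reconstruction), the scene can be recovered from one-number-per-pattern measurements by solving a linear system; the random patterns and noiseless model below are simplifying assumptions:

```python
import numpy as np

# Single-pixel imaging toy model: each illumination pattern (a row of
# `patterns`) lights the scene and a single photodiode records one number.
rng = np.random.default_rng(0)
n = 32 * 32                          # scene resolution, as in the paper
patterns = rng.random((2 * n, n))    # hypothetical illumination patterns
scene = rng.random(n)                # unknown scene (ground truth here)
readings = patterns @ scene          # simulated photodiode measurements

# Recover the scene by least squares; practical systems use structured
# pattern sets (e.g., Hadamard) or compressive-sensing solvers instead.
recovered, *_ = np.linalg.lstsq(patterns, readings, rcond=None)
print(np.allclose(recovered, scene, atol=1e-6))
```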

Article
Attention Networks for the Quality Enhancement of Light Field Images
Sensors 2021, 21(9), 3246; https://doi.org/10.3390/s21093246 - 07 May 2021
Abstract
In this paper, we propose a novel filtering method based on deep attention networks for the quality enhancement of light field (LF) images captured by plenoptic cameras and compressed using the High Efficiency Video Coding (HEVC) standard. The proposed architecture was built using efficient complex processing blocks and novel attention-based residual blocks. The network takes advantage of the macro-pixel (MP) structure specific to LF images and processes each reconstructed MP in the luminance (Y) channel. The input patch is represented as a tensor that collects, from an MP neighbourhood, four Epipolar Plane Images (EPIs) at four different angles. Experimental results on a common LF image database show substantial improvements over HEVC in terms of the structural similarity index (SSIM), with average Y-Bjøntegaard Delta (BD)-rate savings of 36.57% and an average Y-BD-PSNR improvement of 2.301 dB. Increased performance was achieved when the HEVC built-in filtering methods were skipped. The visual results illustrate that the enhanced image contains sharper edges and more texture details. An ablation study provides two robust solutions that reduce the inference time by 44.6% and the network complexity by 74.7%. The results demonstrate the potential of attention networks for the quality enhancement of LF images encoded by HEVC.
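
The BD-rate figures above compare rate-distortion curves at equal quality. A textbook Bjøntegaard-delta sketch in Python (cubic fit in log-rate over the overlapping quality range; not the paper's evaluation script) follows, with hypothetical RD points in the example call:

```python
import numpy as np

def bd_rate(rates_ref, psnr_ref, rates_test, psnr_test):
    """Bjoentegaard delta-rate: average bitrate difference (%) between
    two RD curves at equal quality, via cubic fits in log-rate."""
    lr_ref, lr_test = np.log(rates_ref), np.log(rates_test)
    p_ref = np.polyfit(psnr_ref, lr_ref, 3)
    p_test = np.polyfit(psnr_test, lr_test, 3)

    # Integrate both fits over the overlapping quality interval.
    lo = max(min(psnr_ref), min(psnr_test))
    hi = min(max(psnr_ref), max(psnr_test))
    int_ref = np.polyval(np.polyint(p_ref), [lo, hi])
    int_test = np.polyval(np.polyint(p_test), [lo, hi])
    avg_diff = ((int_test[1] - int_test[0]) - (int_ref[1] - int_ref[0])) / (hi - lo)
    return (np.exp(avg_diff) - 1) * 100  # negative = bitrate savings

# Example with four hypothetical RD points per codec (kbps, PSNR in dB):
print(bd_rate([1000, 2000, 4000, 8000], [32.0, 35.1, 38.2, 41.0],
              [ 900, 1800, 3600, 7200], [32.2, 35.4, 38.5, 41.3]))
```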

Article
DNet: Dynamic Neighborhood Feature Learning in Point Cloud
Sensors 2021, 21(7), 2327; https://doi.org/10.3390/s21072327 - 26 Mar 2021
Abstract
Neighborhood selection is very important for local region feature learning in point cloud learning networks, and different neighborhood selection schemes may lead to quite different results in point cloud processing tasks. Existing point cloud learning networks mainly customize the neighborhood without considering whether the selected neighborhood is reasonable. To solve this problem, this paper proposes a new point cloud learning network, denoted Dynamic neighborhood Network (DNet), to dynamically select the neighborhood and learn the features of each point. The proposed DNet has a multi-head structure with two important modules: the Feature Enhancement Layer (FELayer) and the masking mechanism. The FELayer enhances the manifold features of the point cloud, while the masking mechanism removes neighborhood points with low contribution. DNet can learn the manifold and spatial geometric features of a point cloud and, through the masking mechanism, obtain the relationship between each point and its effective neighborhood points, so that the dynamic neighborhood features of each point can be obtained. Experimental results on three public datasets demonstrate that, compared with state-of-the-art learning networks, the proposed DNet is superior and competitive in point cloud processing tasks.
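
A hedged PyTorch sketch of the dynamic-neighborhood idea: select k nearest neighbors per point, then mask out low-contribution neighbors before aggregating. The paper learns the mask; the inverse-distance score below is a stand-in heuristic, and the threshold is a hypothetical parameter:

```python
import torch

def dynamic_neighborhood(points, k=16, threshold=0.5):
    """Select k nearest neighbors per point, then mask out neighbors
    whose (stand-in) contribution score falls below a threshold."""
    # points: (N, 3). Pairwise squared distances, (N, N).
    d2 = torch.cdist(points, points) ** 2
    knn_idx = d2.topk(k, largest=False).indices           # (N, k)
    neighbors = points[knn_idx]                           # (N, k, 3)

    # Stand-in contribution score: inverse distance, normalised per point.
    scores = 1.0 / (1.0 + d2.gather(1, knn_idx))          # (N, k)
    scores = scores / scores.max(dim=1, keepdim=True).values
    mask = (scores > threshold).unsqueeze(-1)             # (N, k, 1)

    # Masked aggregation of neighborhood features (here: coordinates).
    return (neighbors * mask).sum(1) / mask.sum(1).clamp(min=1)

feats = dynamic_neighborhood(torch.randn(1024, 3))
```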

Article
NRA-Net—Neg-Region Attention Network for Salient Object Detection with Gaze Tracking
Sensors 2021, 21(5), 1753; https://doi.org/10.3390/s21051753 - 04 Mar 2021
Cited by 1
Abstract
In this paper, we propose a method for detecting salient objects, those on which human gaze concentrates, in a single image, without requiring a gaze-tracking device. A network was constructed using Neg-Region Attention (NRA), which predicts the objects on which sight concentrates using deep learning techniques. Existing deep-learning-based methods have an autoencoder structure, which causes feature loss during the encoding process of compressing and extracting features from the image and the decoding process of expanding and restoring them. As a result, features are lost in the object area of the detection results, or another area is falsely detected as an object. The proposed NRA reduces feature loss and emphasizes object areas in the encoder. After separating positive and negative regions using the exponential linear unit activation function, converted attention is performed for each region. The attention, applied without using a backbone network, emphasizes the object area and suppresses the background area. In the experiments, the proposed method showed better detection results than conventional methods.
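
A minimal PyTorch sketch of the positive/negative region separation step described above; the subsequent per-region attention is omitted, and this is an illustration of the separation idea rather than the authors' code:

```python
import torch
import torch.nn as nn

class NegRegionSplit(nn.Module):
    """Split an ELU activation map into positive and negative regions
    so each region can receive its own attention weighting."""
    def __init__(self):
        super().__init__()
        self.elu = nn.ELU()

    def forward(self, x):
        a = self.elu(x)
        pos = torch.clamp(a, min=0.0)   # positive region (object-like)
        neg = torch.clamp(a, max=0.0)   # negative region, values in (-1, 0]
        return pos, neg

pos, neg = NegRegionSplit()(torch.randn(1, 8, 16, 16))
```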
