Topical Collection "Advances in Deep-Learning-Based Sensing, Imaging, and Video Processing"

A topical collection in Sensors (ISSN 1424-8220). This collection belongs to the section "Sensing and Imaging".

Viewed by 38145

Editors

School of Electronics and Communication Engineering, Sun Yat-Sen University, Shenzhen 518017, China
Interests: video coding; image processing
Department of Computer Science, City University of Hong Kong, 83 Tatchee Ave., Kowloon, Hong Kong, China
Interests: image processing; video processing; image segmentation; machine learning
State Key Laboratory of Space Weather, National Space Science Center, Chinese Academy of Sciences, Beijing 100190, China
Interests: image processing; machine learning; solar radio astronomy; aperture synthesis imaging
College of Physics and Information Engineering, Fuzhou University, Fuzhou 350108, China
Interests: multimedia signal processing; coding and transmission; computer vision

Topical Collection Information

Dear Colleagues,

Deep learning techniques are capable of discovering knowledge from massive unstructured data and providing data-driven solutions. They have driven significant technical advances in many research fields and applications, such as audio-visual signal processing, computer vision, and pattern recognition. Additionally, deep learning and its improved variants are expected to be incorporated into future sensors and imaging systems.

Today, with the rapid development of advanced deep learning models and techniques, such as GANs, DNNs, RNNs, and LSTMs, and the increasing demands on the effectiveness of visual signal processing, new opportunities are emerging in deep-learning-based sensing, imaging, and video processing. This Topical Collection aims to promote cutting-edge research in this direction and to offer a timely collection of works for researchers. We welcome high-quality original submissions related to advances in deep-learning-based sensing, imaging, and video processing.

Topics of interest include, but are not limited to:

  • Deep learning theory, framework, database, and learning optimization;
  • Deep-learning-based remote sensing, multispectral, and/or hyperspectral sensing;
  • Deep-learning-based computational imaging and pre-processing;
  • Deep-learning-based visual perceptual model and quality assessment metrics;
  • Deep-learning-based image/video compression and communication;
  • Deep-learning-based 3D/multiview sensing, imaging, and video processing;
  • Deep-learning-based depth sensing and estimation;
  • Deep-learning-based image/video rendering, reconstruction, and enhancement;
  • Deep-learning-based visual object detection, tracking, and understanding;
  • Low-complexity optimizations for deep-learning-based sensing, imaging, and video processing;
  • Other advanced deep-learning-based visual sensing and signal processing.

Prof. Dr. Yun Zhang
Prof. Dr. KWONG Tak Wu Sam
Prof. Dr. Xu Long
Prof. Dr. Tiesong Zhao
Collection Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to the website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the collection website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Sensors is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • deep learning
  • neural network
  • video compression
  • visual quality assessment
  • image and video enhancement
  • multispectral sensing and imaging

Published Papers (25 papers)

2023

Article
Pixel Intensity Resemblance Measurement and Deep Learning Based Computer Vision Model for Crack Detection and Analysis
Sensors 2023, 23(6), 2954; https://doi.org/10.3390/s23062954 - 08 Mar 2023
Viewed by 465
Abstract
This research article is aimed at improving the efficiency of a computer vision system that uses image processing for detecting cracks. Images are prone to noise when captured using drones or under various lighting conditions. To analyze this, the images were gathered under various conditions. To address the noise issue and to classify the cracks based on the severity level, a novel technique is proposed using a pixel-intensity resemblance measurement (PIRM) rule. Using PIRM, the noisy images and noiseless images were classified. Then, the noise was filtered using a median filter. The cracks were detected using VGG-16, ResNet-50 and InceptionResNet-V2 models. Once the crack was detected, the images were then segregated using a crack risk-analysis algorithm. Based on the severity level of the crack, an alert can be given to the authorized person to take the necessary action to avoid major accidents. The proposed technique achieved a 6% improvement without PIRM and a 10% improvement with the PIRM rule for the VGG-16 model. Similarly, it showed 3% and 10% improvements for ResNet-50, 2% and 3% for Inception ResNet, and 9% and 10% for the Xception model. When the images were corrupted by a single type of noise alone, 95.6% accuracy was achieved using the ResNet-50 model for Gaussian noise, 99.65% accuracy was achieved through Inception ResNet-v2 for Poisson noise, and 99.95% accuracy was achieved by the Xception model for speckle noise. Full article
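As a rough illustration of the noise-screening step described above, the sketch below flags an image as noisy when many pixels deviate strongly from the median of their local neighbourhood and, only then, applies a median filter before crack classification. The scoring rule, threshold, and cutoff are assumptions for illustration; the paper's actual PIRM rule is not reproduced here.

```python
import cv2
import numpy as np

def noise_score(gray: np.ndarray, diff_threshold: int = 40) -> float:
    """Fraction of pixels whose intensity deviates strongly from the median of
    their 3x3 neighbourhood; a crude stand-in for a pixel-intensity
    resemblance measurement (assumed, not the paper's rule)."""
    median = cv2.medianBlur(gray, 3)
    deviation = cv2.absdiff(gray, median)
    return float((deviation > diff_threshold).mean())

def denoise_if_noisy(gray: np.ndarray, score_cutoff: float = 0.02) -> np.ndarray:
    """Apply a median filter only to images classified as noisy, leaving
    noiseless images untouched before they are fed to the crack classifier."""
    return cv2.medianBlur(gray, 5) if noise_score(gray) > score_cutoff else gray
```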

Article
Blind Video Quality Assessment for Ultra-High-Definition Video Based on Super-Resolution and Deep Reinforcement Learning
Sensors 2023, 23(3), 1511; https://doi.org/10.3390/s23031511 - 29 Jan 2023
Viewed by 650
Abstract
Ultra-high-definition (UHD) video has brought new challenges to objective video quality assessment (VQA) due to its high resolution and high frame rate. Most existing VQA methods are designed for non-UHD videos—when they are employed to deal with UHD videos, the processing speed will be slow and the global spatial features cannot be fully extracted. In addition, these VQA methods usually segment the video into multiple segments, predict the quality score of each segment, and then average the quality score of each segment to obtain the quality score of the whole video. This breaks the temporal correlation of the video sequences and is inconsistent with the characteristics of human visual perception. In this paper, we present a no-reference VQA method, aiming to effectively and efficiently predict quality scores for UHD videos. First, we construct a spatial distortion feature network based on a super-resolution model (SR-SDFNet), which can quickly extract the global spatial distortion features of UHD videos. Then, to aggregate the spatial distortion features of each UHD frame, we propose a time fusion network based on a reinforcement learning model (RL-TFNet), in which the actor network continuously combines multiple frame features extracted by SR-SDFNet and outputs an action to adjust the current quality score to approximate the subjective score, and the critic network outputs action values to optimize the quality perception of the actor network. Finally, we conduct large-scale experiments on UHD VQA databases and the results reveal that, compared to other state-of-the-art VQA methods, our method achieves competitive quality prediction performance with a shorter runtime and fewer model parameters. Full article
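The actor's role of repeatedly nudging a running quality estimate toward the subjective score, as described above, can be sketched as follows. The feature dimension, step size, and network shape are placeholders rather than the RL-TFNet design, and the critic that would train this actor is omitted.

```python
import torch
import torch.nn as nn

class ActorScoreUpdater(nn.Module):
    """Toy actor: given the current quality score and one frame's spatial
    distortion features, emit a bounded adjustment to the score."""
    def __init__(self, feat_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim + 1, 64), nn.ReLU(),
            nn.Linear(64, 1), nn.Tanh())

    def forward(self, score: torch.Tensor, frame_feat: torch.Tensor) -> torch.Tensor:
        delta = self.net(torch.cat([score, frame_feat], dim=-1))
        return (score + 0.1 * delta).clamp(0.0, 1.0)

actor = ActorScoreUpdater()
score = torch.full((1, 1), 0.5)                # initial quality estimate
for frame_feat in torch.randn(8, 1, 128):      # placeholder features of 8 frames
    score = actor(score, frame_feat)           # refined after each frame
```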

Article
Privacy Preserving Image Encryption with Optimal Deep Transfer Learning Based Accident Severity Classification Model
Sensors 2023, 23(1), 519; https://doi.org/10.3390/s23010519 - 03 Jan 2023
Cited by 1 | Viewed by 583
Abstract
Effective accident management acts as a vital part of emergency and traffic control systems. In such systems, accident data can be collected from different sources (unmanned aerial vehicles, surveillance cameras, on-site people, etc.) and images are considered a major source. Accident site photos and measurements are the most important evidence. Attackers will steal data and breach personal privacy, causing untold costs. The massive number of images commonly employed poses a significant challenge to privacy preservation, and image encryption can be used to accomplish cloud storage and secure image transmission. Automated severity estimation using deep-learning (DL) models becomes essential for effective accident management. Therefore, this article presents a novel Privacy Preserving Image Encryption with Optimal Deep-Learning-based Accident Severity Classification (PPIE-ODLASC) method. The primary objective of the PPIE-ODLASC algorithm is to securely transmit the accident images and classify accident severity into different levels. In the presented PPIE-ODLASC technique, two major processes are involved, namely encryption and severity classification (i.e., high, medium, low, and normal). For accident image encryption, the multi-key homomorphic encryption (MKHE) technique with lion swarm optimization (LSO)-based optimal key generation procedure is involved. In addition, the PPIE-ODLASC approach involves YOLO-v5 object detector to identify the region of interest (ROI) in the accident images. Moreover, the accident severity classification module encompasses Xception feature extractor, bidirectional gated recurrent unit (BiGRU) classification, and Bayesian optimization (BO)-based hyperparameter tuning. The experimental validation of the proposed PPIE-ODLASC algorithm is tested utilizing accident images and the outcomes are examined in terms of many measures. The comparative examination revealed that the PPIE-ODLASC technique showed an enhanced performance of 57.68 dB over other existing models. Full article

2022

Article
Lightweight Super-Resolution with Self-Calibrated Convolution for Panoramic Videos
Sensors 2023, 23(1), 392; https://doi.org/10.3390/s23010392 - 30 Dec 2022
Cited by 1 | Viewed by 653
Abstract
Panoramic videos are shot by an omnidirectional camera or a collection of cameras, and can display a view in every direction. They can provide viewers with an immersive feeling. The study of super-resolution of panoramic videos has attracted much attention, and many methods have been proposed, especially deep learning-based methods. However, due to complex architectures of all the methods, they always result in a large number of hyperparameters. To address this issue, we propose the first lightweight super-resolution method with self-calibrated convolution for panoramic videos. A new deformable convolution module is designed first, with self-calibration convolution, which can learn more accurate offset and enhance feature alignment. Moreover, we present a new residual dense block for feature reconstruction, which can significantly reduce the parameters while maintaining performance. The performance of the proposed method is compared to those of the state-of-the-art methods, and is verified on the MiG panoramic video dataset. Full article
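A self-calibrated convolution in the general spirit referenced above can be sketched as below: a low-resolution branch builds a sigmoid gate from a down-sampled view of the input and modulates the full-resolution response. The layer sizes and pooling factor are assumptions, not the paper's exact module.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfCalibratedConv(nn.Module):
    """Two-branch convolution: a down-sampled branch produces a gating map
    that calibrates the full-resolution convolution response."""
    def __init__(self, channels: int, pool: int = 4):
        super().__init__()
        self.down = nn.AvgPool2d(pool, pool)
        self.conv_low = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv_main = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv_out = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gate = torch.sigmoid(
            x + F.interpolate(self.conv_low(self.down(x)), size=x.shape[2:]))
        return self.conv_out(self.conv_main(x) * gate)

out = SelfCalibratedConv(32)(torch.randn(1, 32, 64, 64))  # same spatial size as input
```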

Article
Automatic Recognition of Road Damage Based on Lightweight Attentional Convolutional Neural Network
Sensors 2022, 22(24), 9599; https://doi.org/10.3390/s22249599 - 07 Dec 2022
Viewed by 1161
Abstract
An efficient road damage detection system can reduce the risk of road defects to motorists and road maintenance costs to traffic management authorities, for which a lightweight end-to-end road damage detection network is proposed in this paper, aiming at fast and automatic accurate identification and classification of multiple types of road damage. The proposed technique consists of a backbone network based on a combination of lightweight feature detection modules constituted with a multi-scale feature fusion network, which is more beneficial for target identification and classification at different distances and angles than other studies. An embedded lightweight attention module was also developed that can enhance feature information by assigning weights to multi-scale convolutional kernels to improve detection accuracy with fewer parameters. The proposed model generally has higher performance and fewer parameters than other representative models. According to our practice tests, it can identify many types of road damage based on the images captured by vehicle cameras and meet the real-time detection required when piggybacking on mobile systems. Full article
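The idea of assigning attention weights to multi-scale convolutional kernels can be sketched roughly as below, in a selective-kernel style; the kernel sizes and the pooled attention branch are illustrative assumptions, not the module proposed in the paper.

```python
import torch
import torch.nn as nn

class MultiScaleKernelAttention(nn.Module):
    """Weights parallel convolutions of different kernel sizes with a
    lightweight, globally pooled attention branch."""
    def __init__(self, channels: int, kernel_sizes=(3, 5, 7)):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(channels, channels, k, padding=k // 2) for k in kernel_sizes])
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, len(kernel_sizes), 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = torch.stack([b(x) for b in self.branches], dim=1)  # (B, K, C, H, W)
        weights = torch.softmax(self.attn(x), dim=1)               # (B, K, 1, 1)
        return (feats * weights.unsqueeze(2)).sum(dim=1)           # (B, C, H, W)
```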

Article
Industrial Anomaly Detection with Skip Autoencoder and Deep Feature Extractor
Sensors 2022, 22(23), 9327; https://doi.org/10.3390/s22239327 - 30 Nov 2022
Cited by 1 | Viewed by 606
Abstract
Over recent years, with the advances in image recognition technology for deep learning, researchers have devoted continued efforts toward importing anomaly detection technology into the production line of automatic optical detection. Although unsupervised learning helps overcome the high costs associated with labeling, the accuracy of anomaly detection still needs to be improved. Accordingly, this paper proposes a novel deep learning model for anomaly detection to overcome this bottleneck. Leveraging a powerful pre-trained feature extractor and the skip connection, the proposed method achieves better feature extraction and image reconstructing capabilities. Results reveal that the areas under the curve (AUC) for the proposed method are higher than those of previous anomaly detection models for 16 out of 17 categories. This indicates that the proposed method can realize the most appropriate adjustments to the needs of production lines in order to maximize economic benefits. Full article
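One way to read the combination of a reconstruction network and a pre-trained feature extractor is sketched below: anomalies are scored by the feature-space distance between an image and its reconstruction. The extractor choice and the autoencoder are placeholders; the paper's architecture, including its skip connections, is not reproduced here.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

# Frozen ImageNet features as a stand-in for a powerful pre-trained extractor.
extractor = nn.Sequential(
    *list(resnet18(weights="IMAGENET1K_V1").children())[:-2]).eval()

def anomaly_score(image: torch.Tensor, autoencoder: nn.Module) -> torch.Tensor:
    """Per-image anomaly score: mean squared distance between deep features of
    the input and of its reconstruction (autoencoder trained on normal data)."""
    with torch.no_grad():
        recon = autoencoder(image)
        f_in, f_out = extractor(image), extractor(recon)
    return ((f_in - f_out) ** 2).mean(dim=(1, 2, 3))
```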

Article
Caps Captioning: A Modern Image Captioning Approach Based on Improved Capsule Network
Sensors 2022, 22(21), 8376; https://doi.org/10.3390/s22218376 - 01 Nov 2022
Cited by 1 | Viewed by 714
Abstract
In image captioning models, the main challenge in describing an image is identifying all the objects by precisely considering the relationships between the objects and producing various captions. Over the past few years, many methods have been proposed, from an attribute-to-attribute comparison approach to handling issues related to semantics and their relationships. Despite the improvements, the existing techniques suffer from inadequate positional and geometrical attributes concepts. The reason is that most of the abovementioned approaches depend on Convolutional Neural Networks (CNNs) for object detection. CNN is notorious for failing to detect equivariance and rotational invariance in objects. Moreover, the pooling layers in CNNs cause valuable information to be lost. Inspired by the recent successful approaches, this paper introduces a novel framework for extracting meaningful descriptions based on a parallelized capsule network that describes the content of images through a high level of understanding of the semantic contents of an image. The main contribution of this paper is proposing a new method that not only overrides the limitations of CNNs but also generates descriptions with a wide variety of words by using Wikipedia. In our framework, capsules focus on the generation of meaningful descriptions with more detailed spatial and geometrical attributes for a given set of images by considering the position of the entities as well as their relationships. Qualitative experiments on the benchmark dataset MS-COCO show that our framework outperforms state-of-the-art image captioning models when describing the semantic content of the images. Full article

Article
Deep Learning-Based Synthesized View Quality Enhancement with DIBR Distortion Mask Prediction Using Synthetic Images
Sensors 2022, 22(21), 8127; https://doi.org/10.3390/s22218127 - 24 Oct 2022
Cited by 1 | Viewed by 833
Abstract
Recently, deep learning-based image quality enhancement models have been proposed to improve the perceptual quality of distorted synthesized views impaired by compression and the Depth Image-Based Rendering (DIBR) process in a multi-view video system. However, due to the lack of Multi-view Video plus Depth (MVD) data, the training data for quality enhancement models is small, which limits the performance and progress of these models. Augmenting the training data to enhance the synthesized view quality enhancement (SVQE) models is a feasible solution. In this paper, a deep learning-based SVQE model using more synthetic synthesized view images (SVIs) is suggested. To simulate the irregular geometric displacement of DIBR distortion, a random irregular polygon-based SVI synthesis method is proposed based on existing massive RGB/RGBD data, and a synthetic synthesized view database is constructed, which includes synthetic SVIs and the DIBR distortion mask. Moreover, to further guide the SVQE models to focus more precisely on DIBR distortion, a DIBR distortion mask prediction network which could predict the position and variance of DIBR distortion is embedded into the SVQE models. The experimental results on public MVD sequences demonstrate that the PSNR performance of the existing SVQE models, e.g., DnCNN, NAFNet, and TSAN, pre-trained on NYU-based synthetic SVIs could be greatly promoted by 0.51-, 0.36-, and 0.26 dB on average, respectively, while the MPPSNRr performance could also be elevated by 0.86, 0.25, and 0.24 on average, respectively. In addition, by introducing the DIBR distortion mask prediction network, the SVI quality obtained by the DnCNN and NAFNet pre-trained on NYU-based synthetic SVIs could be further enhanced by 0.02- and 0.03 dB on average in terms of the PSNR and 0.004 and 0.121 on average in terms of the MPPSNRr. Full article
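To make the random irregular-polygon idea concrete, a rough mask generator is sketched below; the polygon counts, vertex counts, and radii are arbitrary assumptions, not the sampling scheme used to build the paper's synthetic database.

```python
import numpy as np
import cv2

def random_polygon_mask(height: int, width: int, n_polygons: int = 5,
                        max_radius: int = 40) -> np.ndarray:
    """Binary mask of a few random irregular polygons, loosely mimicking the
    irregular geometric regions of DIBR-style distortion."""
    rng = np.random.default_rng()
    mask = np.zeros((height, width), dtype=np.uint8)
    for _ in range(n_polygons):
        cx, cy = rng.integers(0, width), rng.integers(0, height)
        n_vertices = int(rng.integers(3, 9))
        angles = np.sort(rng.uniform(0.0, 2.0 * np.pi, n_vertices))
        radii = rng.uniform(5.0, max_radius, n_vertices)
        pts = np.stack([cx + radii * np.cos(angles),
                        cy + radii * np.sin(angles)], axis=1).astype(np.int32)
        cv2.fillPoly(mask, [pts], 1)
    return mask
```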

Article
Semi-Supervised Defect Detection Method with Data-Expanding Strategy for PCB Quality Inspection
Sensors 2022, 22(20), 7971; https://doi.org/10.3390/s22207971 - 19 Oct 2022
Viewed by 777
Abstract
Printed circuit board (PCB) defect detection plays a crucial role in PCB production, and the popular methods are based on deep learning and require large-scale datasets with high-level ground-truth labels, in which it is time-consuming and costly to label these datasets. Semi-supervised learning (SSL) methods, which reduce the need for labeled samples by leveraging unlabeled samples, can address this problem well. However, for PCB defects, the detection accuracy on small numbers of labeled samples still needs to be improved because the number of labeled samples is small, and the training process will be disturbed by the unlabeled samples. To overcome this problem, this paper proposed a semi-supervised defect detection method with a data-expanding strategy (DE-SSD). The proposed DE-SSD uses both the labeled and unlabeled samples, which can reduce the cost of data labeling, and a batch-adding strategy (BA-SSL) is introduced to leverage the unlabeled data with less disturbance. Moreover, a data-expanding (DE) strategy is proposed to use the labeled samples from other datasets to expand the target dataset, which can also prevent the disturbance by the unlabeled samples. Based on the improvements, the proposed DE-SSD can achieve competitive results for PCB defects with fewer labeled samples. The experimental results on DeepPCB indicate that the proposed DE-SSD achieves state-of-the-art performance, which is improved by 4.7 mAP at least compared with the previous methods. Full article

Article
Low-Light Image Enhancement Using Photometric Alignment with Hierarchy Pyramid Network
Sensors 2022, 22(18), 6799; https://doi.org/10.3390/s22186799 - 08 Sep 2022
Viewed by 785
Abstract
Low-light image enhancement can effectively assist high-level vision tasks that often fail in poor illumination conditions. Most previous data-driven methods, however, implemented enhancement directly from severely degraded low-light images that may provide undesirable enhancement results, including blurred detail, intensive noise, and distorted color. In this paper, inspired by a coarse-to-fine strategy, we propose an end-to-end image-level alignment with pixel-wise perceptual information enhancement pipeline for low-light image enhancement. A coarse adaptive global photometric alignment sub-network is constructed to reduce style differences, which facilitates improving illumination and revealing under-exposure area information. After the learned aligned image, a hierarchy pyramid enhancement sub-network is used to optimize image quality, which helps to remove amplified noise and enhance the local detail of low-light images. We also propose a multi-residual cascade attention block (MRCAB) that involves channel split and concatenation strategy, polarized self-attention mechanism, which leads to high-resolution reconstruction images in perceptual quality. Extensive experiments have demonstrated the effectiveness of our method on various datasets and significantly outperformed other state-of-the-art methods in detail and color reproduction. Full article

Editorial
Advances in Deep-Learning-Based Sensing, Imaging, and Video Processing
Sensors 2022, 22(16), 6192; https://doi.org/10.3390/s22166192 - 18 Aug 2022
Viewed by 772
Abstract
Deep learning techniques have shown their capabilities to discover knowledge from massive unstructured data, providing data-driven solutions for representation and decision making [...] Full article
Article
A Timestamp-Independent Haptic–Visual Synchronization Method for Haptic-Based Interaction System
Sensors 2022, 22(15), 5502; https://doi.org/10.3390/s22155502 - 23 Jul 2022
Cited by 1 | Viewed by 746
Abstract
The booming haptic data significantly improve the users’ immersion during multimedia interaction. As a result, the study of a Haptic-based Interaction System has attracted the attention of the multimedia community. To construct such a system, a challenging task is the synchronization of multiple sensorial signals that is critical to the user experience. Despite audio-visual synchronization efforts, there is still a lack of a haptic-aware multimedia synchronization model. In this work, we propose a timestamp-independent synchronization for haptic–visual signal transmission. First, we exploit the sequential correlations during delivery and playback of a haptic–visual communication system. Second, we develop a key sample extraction of haptic signals based on the force feedback characteristics and a key frame extraction of visual signals based on deep-object detection. Third, we combine the key samples and frames to synchronize the corresponding haptic–visual signals. Without timestamps in the signal flow, the proposed method is still effective and more robust in complicated network conditions. Subjective evaluation also shows a significant improvement of user experience with the proposed method. Full article
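A minimal sketch of the alignment step, assuming the key samples and key frames have already been reduced to binary indicator sequences on a common time grid, is given below; extracting those keys from force feedback and deep object detection is the paper's actual contribution and is not shown.

```python
import numpy as np

def estimate_lag(haptic_keys: np.ndarray, visual_keys: np.ndarray,
                 max_lag: int = 30) -> int:
    """Return the shift (in samples) of the haptic key sequence that best
    overlaps the visual key sequence, usable as a synchronization offset."""
    best_lag, best_overlap = 0, -1.0
    for lag in range(-max_lag, max_lag + 1):
        overlap = float(np.sum(np.roll(haptic_keys, lag) * visual_keys))
        if overlap > best_overlap:
            best_lag, best_overlap = lag, overlap
    return best_lag
```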

Article
Inspection of Underwater Hull Surface Condition Using the Soft Voting Ensemble of the Transfer-Learned Models
Sensors 2022, 22(12), 4392; https://doi.org/10.3390/s22124392 - 10 Jun 2022
Cited by 3 | Viewed by 839
Abstract
In this study, we propose a method for inspecting the condition of hull surfaces using underwater images acquired from the camera of a remotely controlled underwater vehicle (ROUV). To this end, a soft voting ensemble classifier comprising six well-known convolutional neural network models was used. Using the transfer learning technique, the images of the hull surfaces were used to retrain the six models. The proposed method exhibited an accuracy of 98.13%, a precision of 98.73%, a recall of 97.50%, and an F1-score of 98.11% for the classification of the test set. Furthermore, the time taken for the classification of one image was verified to be approximately 56.25 ms, which is applicable to ROUVs that require real-time inspection. Full article
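The soft-voting step itself is straightforward and can be sketched as below: average the class probabilities of the transfer-learned classifiers and take the most probable class. The model list, input preprocessing, and class ordering are assumptions.

```python
import torch

def soft_voting_predict(models, images: torch.Tensor) -> torch.Tensor:
    """Average softmax probabilities over an ensemble and return class indices."""
    probs = []
    with torch.no_grad():
        for model in models:
            model.eval()
            probs.append(torch.softmax(model(images), dim=1))
    return torch.stack(probs).mean(dim=0).argmax(dim=1)
```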

Article
A Hybrid Deep Learning and Visualization Framework for Pushing Behavior Detection in Pedestrian Dynamics
Sensors 2022, 22(11), 4040; https://doi.org/10.3390/s22114040 - 26 May 2022
Cited by 3 | Viewed by 1563
Abstract
Crowded event entrances could threaten the comfort and safety of pedestrians, especially when some pedestrians push others or use gaps in crowds to gain faster access to an event. Studying and understanding pushing dynamics leads to designing and building more comfortable and safe entrances. Researchers—to understand pushing dynamics—observe and analyze recorded videos to manually identify when and where pushing behavior occurs. Despite the accuracy of the manual method, it can still be time-consuming, tedious, and hard to identify pushing behavior in some scenarios. In this article, we propose a hybrid deep learning and visualization framework that aims to assist researchers in automatically identifying pushing behavior in videos. The proposed framework comprises two main components: (i) Deep optical flow and wheel visualization; to generate motion information maps. (ii) A combination of an EfficientNet-B0-based classifier and a false reduction algorithm for detecting pushing behavior at the video patch level. In addition to the framework, we present a new patch-based approach to enlarge the data and alleviate the class imbalance problem in small-scale pushing behavior datasets. Experimental results (using real-world ground truth of pushing behavior videos) demonstrate that the proposed framework achieves an 86% accuracy rate. Moreover, the EfficientNet-B0-based classifier outperforms baseline CNN-based classifiers in terms of accuracy. Full article

Article
Color-Dense Illumination Adjustment Network for Removing Haze and Smoke from Fire Scenario Images
Sensors 2022, 22(3), 911; https://doi.org/10.3390/s22030911 - 25 Jan 2022
Cited by 1 | Viewed by 1628
Abstract
The atmospheric particles and aerosols from burning usually cause visual artifacts in single images captured from fire scenarios. Most existing haze removal methods exploit the atmospheric scattering model (ASM) for visual enhancement, which inevitably leads to inaccurate estimation of the atmosphere light and transmission matrix of the smoky and hazy inputs. To solve these problems, we present a novel color-dense illumination adjustment network (CIANet) for joint recovery of the transmission matrix, illumination intensity, and the dominant color of aerosols from a single image. Meanwhile, to improve the visual effects of the recovered images, the proposed CIANet jointly optimizes the transmission map, atmospheric optical value, the color of aerosol, and a preliminary recovered scene. Furthermore, we designed a reformulated ASM, called the aerosol scattering model (ESM), to smooth out the enhancement results while keeping the visual effects and the semantic information of different objects. Experimental results on both the proposed RFSIE and NTIRE’20 demonstrate that our method performs favorably against state-of-the-art dehazing methods regarding PSNR, SSIM and subjective visual quality. Furthermore, when concatenating CIANet with Faster R-CNN, we observe a large improvement in object detection performance. Full article
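For reference, the atmospheric scattering model mentioned above is I(x) = J(x)·t(x) + A·(1 − t(x)); the sketch below simply inverts it to recover the scene radiance J once the transmission t and atmospheric light A are known. Estimating t, A, and the aerosol colour is the part the network learns and is not shown here.

```python
import numpy as np

def recover_scene(hazy: np.ndarray, transmission: np.ndarray,
                  atmosphere: np.ndarray, t_min: float = 0.1) -> np.ndarray:
    """Invert I = J * t + A * (1 - t) for an HxWx3 image, clamping the
    transmission to avoid amplifying noise in dense haze or smoke."""
    t = np.clip(transmission, t_min, 1.0)[..., None]   # (H, W, 1)
    return (hazy - atmosphere * (1.0 - t)) / t
```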

2021

Article
Small Object Detection in Traffic Scenes Based on YOLO-MXANet
Sensors 2021, 21(21), 7422; https://doi.org/10.3390/s21217422 - 08 Nov 2021
Cited by 9 | Viewed by 2160
Abstract
In terms of small objects in traffic scenes, general object detection algorithms have low detection accuracy, high model complexity, and slow detection speed. To solve the above problems, an improved algorithm (named YOLO-MXANet) is proposed in this paper. Complete-Intersection over Union (CIoU) is utilized to improve the loss function, promoting the positioning accuracy of small objects. In order to reduce the complexity of the model, we present a lightweight yet powerful backbone network (named SA-MobileNeXt) that incorporates channel and spatial attention. Our approach can extract expressive features more effectively by applying the Shuffle Channel and Spatial Attention (SCSA) module into the SandGlass Block (SGBlock) module while increasing the parameters by a small number. In addition, the data augmentation method combining Mosaic and Mixup is employed to improve the robustness of the training model. The Multi-scale Feature Enhancement Fusion (MFEF) network is proposed to fuse the extracted features better. In addition, the SiLU activation function is utilized to optimize the Convolution-Batchnorm-Leaky ReLU (CBL) module and the SGBlock module to accelerate the convergence of the model. The ablation experiments on the KITTI dataset show that each improvement is effective. The improved algorithm reduces the complexity and inference time of the model while improving the object detection accuracy. The comparative experiments on the KITTI dataset and CCTSDB dataset with other algorithms show that our algorithm also has certain advantages. Full article
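Complete-IoU, referenced above, augments the ordinary IoU with a centre-distance term and an aspect-ratio consistency term; a minimal reference implementation for boxes in (x1, y1, x2, y2) format is sketched below and is not taken from the paper's code.

```python
import math
import torch

def complete_iou(box1: torch.Tensor, box2: torch.Tensor, eps: float = 1e-7) -> torch.Tensor:
    """Complete-IoU of boxes given as (x1, y1, x2, y2); the CIoU loss is 1 minus this value."""
    inter_w = (torch.min(box1[..., 2], box2[..., 2]) - torch.max(box1[..., 0], box2[..., 0])).clamp(0)
    inter_h = (torch.min(box1[..., 3], box2[..., 3]) - torch.max(box1[..., 1], box2[..., 1])).clamp(0)
    inter = inter_w * inter_h
    w1, h1 = box1[..., 2] - box1[..., 0], box1[..., 3] - box1[..., 1]
    w2, h2 = box2[..., 2] - box2[..., 0], box2[..., 3] - box2[..., 1]
    union = w1 * h1 + w2 * h2 - inter + eps
    iou = inter / union
    # squared centre distance over squared diagonal of the smallest enclosing box
    cw = torch.max(box1[..., 2], box2[..., 2]) - torch.min(box1[..., 0], box2[..., 0])
    ch = torch.max(box1[..., 3], box2[..., 3]) - torch.min(box1[..., 1], box2[..., 1])
    rho2 = ((box1[..., 0] + box1[..., 2] - box2[..., 0] - box2[..., 2]) ** 2 +
            (box1[..., 1] + box1[..., 3] - box2[..., 1] - box2[..., 3]) ** 2) / 4
    c2 = cw ** 2 + ch ** 2 + eps
    # aspect-ratio consistency term
    v = (4 / math.pi ** 2) * (torch.atan(w2 / (h2 + eps)) - torch.atan(w1 / (h1 + eps))) ** 2
    alpha = v / (1 - iou + v + eps)
    return iou - rho2 / c2 - alpha * v
```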

Article
Deepfake Detection Using the Rate of Change between Frames Based on Computer Vision
Sensors 2021, 21(21), 7367; https://doi.org/10.3390/s21217367 - 05 Nov 2021
Cited by 3 | Viewed by 3159
Abstract
Recently, artificial intelligence has been successfully used in fields such as computer vision, voice, and big data analysis. However, various problems, such as security, privacy, and ethics, also occur owing to the development of artificial intelligence. One such problem is deepfakes. Deepfake is a compound word for deep learning and fake. It refers to a fake video created using artificial intelligence technology, or to the production process itself. Deepfakes can be exploited for political abuse, pornography, and fake information. This paper proposes a method to determine integrity by analyzing the computer vision features of digital content. The proposed method extracts the rate of change in the computer vision features of adjacent frames and then checks whether the video is manipulated. The test demonstrated the highest detection rate of 97% compared to the existing method or machine learning method. It also maintained the highest detection rate of 96%, even for the test that manipulates the matrix of the image to avoid the convolutional neural network detection method. Full article
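A toy version of the rate-of-change idea is sketched below using a colour histogram as the per-frame computer-vision feature; the paper's actual feature set and decision rule are richer, so this is only illustrative.

```python
import cv2
import numpy as np

def frame_change_rates(video_path: str) -> np.ndarray:
    """Rate of change of a simple per-frame feature (normalized colour
    histogram) between adjacent frames of a video."""
    cap = cv2.VideoCapture(video_path)
    rates, prev = [], None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hist = cv2.calcHist([frame], [0, 1, 2], None, [8, 8, 8],
                            [0, 256, 0, 256, 0, 256]).flatten()
        hist /= hist.sum() + 1e-7
        if prev is not None:
            rates.append(float(np.abs(hist - prev).sum()))
        prev = hist
    cap.release()
    return np.asarray(rates)   # unusually abrupt changes may indicate manipulation
```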

Article
Improving the Ability of a Laser Ultrasonic Wave-Based Detection of Damage on the Curved Surface of a Pipe Using a Deep Learning Technique
Sensors 2021, 21(21), 7105; https://doi.org/10.3390/s21217105 - 26 Oct 2021
Cited by 5 | Viewed by 1743
Abstract
With the advent of the Fourth Industrial Revolution, the economic, social, and technological demands for pipe maintenance are increasing due to the aging of the infrastructure caused by the increase in industrial development and the expansion of cities. Owing to this, an automatic pipe damage detection system was built using a laser-scanned pipe’s ultrasonic wave propagation imaging (UWPI) data and convolutional neural network (CNN)-based object detection algorithms. The algorithm used in this study was EfficientDet-d0, a CNN-based object detection algorithm which uses the transfer learning method. As a result, the mean average precision (mAP) was measured to be 0.39. The result was higher than the COCO EfficientDet-d0 mAP, which is expected to enable the efficient maintenance of piping used in construction and many industries. Full article

Article
Compressed Video Quality Index Based on Saliency-Aware Artifact Detection
Sensors 2021, 21(19), 6429; https://doi.org/10.3390/s21196429 - 26 Sep 2021
Cited by 2 | Viewed by 5267
Abstract
Video coding technology reduces the storage and transmission bandwidth required by video services by reducing the bitrate of the video stream. However, the compressed video signals may involve perceivable information loss, especially when the video is overcompressed. In such cases, the viewers can observe visually annoying artifacts, namely, Perceivable Encoding Artifacts (PEAs), which degrade their perceived video quality. To monitor and measure these PEAs (including blurring, blocking, ringing and color bleeding), we propose an objective video quality metric named Saliency-Aware Artifact Measurement (SAAM) without any reference information. The SAAM metric first introduces video saliency detection to extract regions of interest and further splits these regions into a finite number of image patches. For each image patch, a data-driven model is utilized to evaluate the intensities of PEAs. Finally, these intensities are fused into an overall metric using Support Vector Regression (SVR). In the experiments section, we compared the SAAM metric with other popular video quality metrics on four publicly available databases: LIVE, CSIQ, IVP and FERIT-RTRK. The results reveal the promising quality prediction performance of the SAAM metric, which is superior to most of the popular compressed video quality evaluation models. Full article
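The final fusion stage can be sketched with scikit-learn as below, assuming each video has already been reduced to four PEA intensities (blurring, blocking, ringing, colour bleeding); the random arrays stand in for real training features and subjective scores.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
pea_intensities = rng.random((200, 4))   # placeholder per-video PEA intensities
subjective_mos = rng.random(200)         # placeholder subjective quality scores

# Fit the SVR fusion model and predict an overall quality score for one video.
fusion = SVR(kernel="rbf", C=1.0, epsilon=0.1).fit(pea_intensities, subjective_mos)
quality = fusion.predict(np.array([[0.20, 0.10, 0.05, 0.30]]))
```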

Article
MSF-Net: Multi-Scale Feature Learning Network for Classification of Surface Defects of Multifarious Sizes
Sensors 2021, 21(15), 5125; https://doi.org/10.3390/s21155125 - 29 Jul 2021
Cited by 3 | Viewed by 1527
Abstract
In the field of surface defect detection, the scale difference of product surface defects is often huge. The existing defect detection methods based on Convolutional Neural Networks (CNNs) are more inclined to express macro and abstract features, and the ability to express local and small defects is insufficient, resulting in an imbalance of feature expression capabilities. In this paper, a Multi-Scale Feature Learning Network (MSF-Net) based on Dual Module Feature (DMF) extractor is proposed. DMF extractor is mainly composed of optimized Concatenated Rectified Linear Units (CReLUs) and optimized Inception feature extraction modules, which increases the diversity of feature receptive fields while reducing the amount of calculation; the feature maps of the middle layer with different sizes of receptive fields are merged to increase the richness of the receptive fields of the last layer of feature maps; the residual shortcut connections, batch normalization layer and average pooling layer are used to replace the fully connected layer to improve training efficiency, and make the multi-scale feature learning ability more balanced at the same time. Two representative multi-scale defect data sets are used for experiments, and the experimental results verify the advancement and effectiveness of the proposed MSF-Net in the detection of surface defects with multi-scale features. Full article
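For readers unfamiliar with Concatenated ReLU, a minimal sketch is given below: it keeps both the positive responses and the negated negative responses, doubling the channel count so fewer convolution filters are needed. The optimized variant used in MSF-Net may differ.

```python
import torch
import torch.nn as nn

class CReLU(nn.Module):
    """Concatenated ReLU: concatenates ReLU(x) and ReLU(-x) along the channel axis."""
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.cat([torch.relu(x), torch.relu(-x)], dim=1)

out = CReLU()(torch.randn(1, 16, 32, 32))   # output shape: (1, 32, 32, 32)
```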

Article
Wheat Ear Recognition Based on RetinaNet and Transfer Learning
Sensors 2021, 21(14), 4845; https://doi.org/10.3390/s21144845 - 16 Jul 2021
Cited by 17 | Viewed by 2219
Abstract
The number of wheat ears is an essential indicator for wheat production and yield estimation, but accurately counting wheat ears requires expensive manual labor and time. Meanwhile, wheat ears provide little distinguishing information and their color is consistent with the background, which makes it challenging to obtain an accurate count. In this paper, the performance of Faster regions with convolutional neural networks (Faster R-CNN) and RetinaNet in predicting the number of wheat ears at different growth stages under different conditions is investigated. The results show that, using the Global WHEAT dataset for recognition, the RetinaNet method and the Faster R-CNN method achieve average accuracies of 0.82 and 0.72, respectively, with the RetinaNet method obtaining the highest recognition accuracy. Secondly, using the collected image data for recognition, the R2 of RetinaNet and Faster R-CNN after transfer learning is 0.9722 and 0.8702, respectively, indicating that the recognition accuracy of the RetinaNet method is higher on different data sets. We also tested wheat ears at both the filling and maturity stages; our proposed method has proven to be very robust (the R2 is above 0.90). This study provides technical support and a reference for automatic wheat ear recognition and yield estimation. Full article

Communication
Bionic Birdlike Imaging Using a Multi-Hyperuniform LED Array
Sensors 2021, 21(12), 4084; https://doi.org/10.3390/s21124084 - 14 Jun 2021
Cited by 1 | Viewed by 1890
Abstract
Digital cameras obtain color information of the scene using a chromatic filter, usually a Bayer filter, overlaid on a pixelated detector. However, the periodic arrangement of both the filter array and the detector array introduces frequency aliasing in sampling and color misregistration during the demosaicking process, which causes degradation of image quality. Inspired by the biological structure of avian retinas, we developed a chromatic LED array with a geometric arrangement of multi-hyperuniformity, which exhibits irregularity on small length scales but quasi-uniformity on large scales, to suppress frequency aliasing and color misregistration in full color image retrieval. Experiments were performed with a single-pixel imaging system using the multi-hyperuniform chromatic LED array to provide structured illumination, and a 208 fps frame rate was achieved at 32 × 32 pixel resolution. By comparing the experimental results with the images captured with a conventional digital camera, it has been demonstrated that the proposed imaging system forms images with fewer chromatic moiré patterns and color misregistration artifacts. The concept proposed and verified here could provide insights for the design and manufacturing of future bionic imaging sensors. Full article

Article
Attention Networks for the Quality Enhancement of Light Field Images
Sensors 2021, 21(9), 3246; https://doi.org/10.3390/s21093246 - 07 May 2021
Cited by 1 | Viewed by 1334
Abstract
In this paper, we propose a novel filtering method based on deep attention networks for the quality enhancement of light field (LF) images captured by plenoptic cameras and compressed using the High Efficiency Video Coding (HEVC) standard. The proposed architecture was built using efficient complex processing blocks and novel attention-based residual blocks. The network takes advantage of the macro-pixel (MP) structure, specific to LF images, and processes each reconstructed MP in the luminance (Y) channel. The input patch is represented as a tensor that collects, from an MP neighbourhood, four Epipolar Plane Images (EPIs) at four different angles. The experimental results on a common LF image database showed high improvements over HEVC in terms of the structural similarity index (SSIM), with an average Y-Bjøntegaard Delta (BD)-rate savings of 36.57%, and an average Y-BD-PSNR improvement of 2.301 dB. Increased performance was achieved when the HEVC built-in filtering methods were skipped. The visual results illustrate that the enhanced image contains sharper edges and more texture details. The ablation study provides two robust solutions to reduce the inference time by 44.6% and the network complexity by 74.7%. The results demonstrate the potential of attention networks for the quality enhancement of LF images encoded by HEVC. Full article

Article
DNet: Dynamic Neighborhood Feature Learning in Point Cloud
Sensors 2021, 21(7), 2327; https://doi.org/10.3390/s21072327 - 26 Mar 2021
Cited by 3 | Viewed by 1310
Abstract
Neighborhood selection is very important for local region feature learning in point cloud learning networks. Different neighborhood selection schemes may lead to quite different results for point cloud processing tasks. The existing point cloud learning networks mainly adopt the approach of customizing the neighborhood, without considering whether the selected neighborhood is reasonable or not. To solve this problem, this paper proposes a new point cloud learning network, denoted as Dynamic neighborhood Network (DNet), to dynamically select the neighborhood and learn the features of each point. The proposed DNet has a multi-head structure which has two important modules: the Feature Enhancement Layer (FELayer) and the masking mechanism. The FELayer enhances the manifold features of the point cloud, while the masking mechanism is used to remove the neighborhood points with low contribution. The DNet can learn the manifold features and spatial geometric features of point cloud, and obtain the relationship between each point and its effective neighborhood points through the masking mechanism, so that the dynamic neighborhood features of each point can be obtained. Experimental results on three public datasets demonstrate that compared with the state-of-the-art learning networks, the proposed DNet shows better superiority and competitiveness in point cloud processing task. Full article

Article
NRA-Net—Neg-Region Attention Network for Salient Object Detection with Gaze Tracking
Sensors 2021, 21(5), 1753; https://doi.org/10.3390/s21051753 - 04 Mar 2021
Cited by 2 | Viewed by 1585
Abstract
In this paper, we propose a method for detecting the salient objects on which human gaze is focused, from a single image and without requiring a gaze-tracking device. A network was constructed using Neg-Region Attention (NRA), which predicts objects with a concentrated line of sight using deep learning techniques. Existing deep learning-based methods have an autoencoder structure, which causes feature loss during the encoding process of compressing and extracting features from the image and the decoding process of expanding and restoring them. As a result, feature loss occurs in the area of the object in the detection results, or another area is detected as an object. The proposed method, that is, NRA, can be used to reduce feature loss and emphasize object areas in the encoder. After separating positive and negative regions using the exponential linear unit activation function, converted attention was performed for each region. The attention method, applied without using a backbone network, emphasized the object area and suppressed the background area. In the experimental results, the proposed method showed higher detection results than the conventional methods. Full article
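The positive/negative-region separation can be sketched roughly as below, where an ELU activation splits the feature map and each region receives its own learned weighting; the 1×1-convolution attention used here is an assumption for illustration, not the NRA module itself.

```python
import torch
import torch.nn as nn

class PosNegRegionAttention(nn.Module):
    """Split ELU-activated features into positive and negative regions and
    re-weight each region separately before recombining them."""
    def __init__(self, channels: int):
        super().__init__()
        self.elu = nn.ELU()
        self.weight_pos = nn.Conv2d(channels, channels, 1)
        self.weight_neg = nn.Conv2d(channels, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        a = self.elu(x)
        pos, neg = a.clamp(min=0), a.clamp(max=0)
        return (torch.sigmoid(self.weight_pos(pos)) * pos +
                torch.sigmoid(self.weight_neg(neg)) * neg)
```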
