
Sensor Fusion for Object Detection, Classification and Tracking

A special issue of Sensors (ISSN 1424-8220). This special issue belongs to the section "Sensor Networks".

Deadline for manuscript submissions: closed (31 July 2021) | Viewed by 37778

Special Issue Editor


Prof. Hanseok Ko
Guest Editor
School of Electrical Engineering and Intelligent Signal Processing Center, Korea University, Anam-dong, Seongbuk-gu, Seoul 02841, South Korea
Interests: computer vision; acoustic signal processing; multi-sensor fusion; deep learning; big data analytics

Special Issue Information

Dear Colleagues,

Audio-visual object detection and tracking is a fundamental problem in computer vision and signal processing. Advances in sensor fusion technology, combining more powerful and lower-cost computing platforms with novel methods, particularly those relying on deep learning, are revolutionizing the computer vision field and providing new opportunities for research with larger and more diverse data sets. Recently, sensor fusion tasks for object detection, recognition, and tracking have been enabled by more flexible acoustic and vision sensors and their networking schemes.

This call for papers invites technical contributions to the Sensors Special Issue on “Sensor Fusion for Object Detection, Classification and Tracking”. The Special Issue aims to publish original technical papers and review papers on recent technologies focusing on object detection and tracking, knowledge extraction, distributed sensor networks, sensor fusion, and applications. Potential topics include, but are not limited to, the following:

  • Detection and tracking of objects using various sensors;
  • Intelligent object detection algorithms;
  • Visual sensor network architectures for object detection and tracking;
  • Real-time visual object tracking in vision sensor networks;
  • Intelligent machine learning mechanisms for object detection and recognition;
  • Deep learning for real-time object detection and tracking;
  • Computational photography for object detection and tracking;
  • Development of non-visual sensors and their applications to video analysis and tracking;
  • Acoustic and vision sensor fusion schemes.

Prof. Hanseok Ko
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Sensors is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Artificial general intelligence
  • Sensor fusion
  • Deep learning
  • Big data analytics
  • Human–robot interactions
  • Human–computer integration
  • Data fusion
  • Acoustic sensors (microphone array)
  • Camera
  • Visual object recognition
  • Visual object tracking
  • Motion estimation
  • Video analytics
  • Visual surveillance and monitoring

Published Papers (11 papers)


Research


14 pages, 7608 KiB  
Article
DBSCAN-Based Tracklet Association Annealer for Advanced Multi-Object Tracking
by Jongwon Kim and Jeongho Cho
Sensors 2021, 21(17), 5715; https://doi.org/10.3390/s21175715 - 25 Aug 2021
Viewed by 2106
Abstract
Recently, as the demand for technological advancement in the fields of autonomous driving and smart video surveillance has gradually increased, considerable progress in multi-object tracking using deep neural networks has been achieved, and its application field is also expanding. However, various problems have not been fully addressed owing to the inherent limitations of video cameras, such as the tracking of objects in occluded environments. Therefore, in this study, we propose a density-based object tracking technique redesigned based on DBSCAN, which has high robustness against noise and is excellent for nonlinear clustering. The method mitigates the noise vulnerability inherent to multi-object tracking, reduces the difficulty of trajectory separation, and facilitates real-time processing through simple structural expansion. Performance evaluation confirmed that the proposed technique improves several performance indices compared with existing tracking techniques. In particular, when added as a post-processor to an existing tracker, tracking performance improved by more than 10% owing to noise suppression. Thus, the proposed method can be applied in industrial environments, such as real pedestrian analysis and surveillance security systems. Full article
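
As a rough illustration of the density-based association idea described in this abstract, the sketch below clusters tracklet descriptors with scikit-learn's DBSCAN. The descriptor design (an (x, y, t) endpoint per tracklet) and the eps/min_samples values are illustrative assumptions, not the paper's configuration.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def associate_tracklets(descriptors, eps=30.0, min_samples=2):
    """Group tracklets whose descriptors lie close together in feature space.

    descriptors: (N, D) array, e.g. one (x, y, t) endpoint per tracklet.
    Returns a dict mapping cluster label -> list of tracklet indices;
    label -1 (noise) is left unassociated.
    """
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(descriptors)
    clusters = {}
    for idx, label in enumerate(labels):
        if label == -1:          # treat noise points as unmatched tracklets
            continue
        clusters.setdefault(label, []).append(idx)
    return clusters

# Example: three tracklet endpoints, two of them close enough to merge
ends = np.array([[10.0, 12.0, 1.0], [12.0, 13.0, 2.0], [200.0, 50.0, 3.0]])
print(associate_tracklets(ends))   # {0: [0, 1]}
```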

19 pages, 1959 KiB  
Article
Centered Multi-Task Generative Adversarial Network for Small Object Detection
by Hongfeng Wang, Jianzhong Wang, Kemeng Bai and Yong Sun
Sensors 2021, 21(15), 5194; https://doi.org/10.3390/s21155194 - 31 Jul 2021
Cited by 8 | Viewed by 2744
Abstract
Despite the breakthroughs in accuracy and efficiency of object detection using deep neural networks, the performance of small object detection is far from satisfactory. Gaze estimation has developed significantly due to the development of visual sensors. Combining object detection with gaze estimation can significantly improve the performance of small object detection. This paper presents a centered multi-task generative adversarial network (CMTGAN), which combines small object detection and gaze estimation. To achieve this, we propose a generative adversarial network (GAN) capable of image super-resolution and two-stage small object detection. We exploit a generator in CMTGAN for image super-resolution and a discriminator for object detection. We introduce an artificial texture loss into the generator to retain the original feature of small objects. We also use a centered mask in the generator to make the network focus on the central part of images where small objects are more likely to appear in our method. We propose a discriminator with detection loss for two-stage small object detection, which can be adapted to other GANs for object detection. Compared with existing interpolation methods, the super-resolution images generated by CMTGAN are more explicit and contain more information. Experiments show that our method exhibits a better detection performance than mainstream methods. Full article
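
One ingredient of this approach that is easy to illustrate is the centered mask: a soft weight map that emphasizes the central part of the image, where small objects are assumed to appear most often in this setup. The Gaussian form and sigma below are assumptions for illustration and do not reproduce CMTGAN's exact mask.

```python
import torch

def centered_mask(height, width, sigma=0.3):
    """Soft 2D mask, largest at the image centre, for weighting feature maps."""
    ys = torch.linspace(-1.0, 1.0, height).view(-1, 1)
    xs = torch.linspace(-1.0, 1.0, width).view(1, -1)
    return torch.exp(-(xs ** 2 + ys ** 2) / (2.0 * sigma ** 2))   # (H, W), values in (0, 1]

# Usage inside a generator: emphasise central activations of a feature map
features = torch.randn(1, 16, 64, 64)                  # (B, C, H, W)
weighted = features * centered_mask(64, 64)            # broadcasts over B and C
```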

15 pages, 921 KiB  
Article
Locating Ships Using Time Reversal and Matrix Pencil Method by Their Underwater Acoustic Signals
by Daniel Chaparro-Arce, Sergio Gutierrez, Andres Gallego, Cesar Pedraza, Felix Vega and Carlos Gutierrez
Sensors 2021, 21(15), 5065; https://doi.org/10.3390/s21155065 - 26 Jul 2021
Cited by 1 | Viewed by 2913
Abstract
This paper presents a technique, based on the matrix pencil method (MPM), for the compression of underwater acoustic signals produced by boat engines. The compressed signal, represented by its complex resonance expansion, is intended to be sent over a low-bit-rate wireless communication channel. We demonstrate that the method can provide data compression greater than 60%, ensuring a correlation greater than 93% between the reconstructed and the original signal, at a sampling frequency of 2.2 kHz. Once the signal was reconstituted, a localization process was carried out with the time reversal method (TR) using information from four different sensors in a simulation environment. This process sought to achieve the identification of the position of the ship using only passive sensors, considering two different sensor arrangements. Full article
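
For readers unfamiliar with the matrix pencil method, the following minimal NumPy sketch fits a signal as a sum of damped complex exponentials, which is the representation used for compression here. The pencil parameter and the omission of SVD-based noise filtering are simplifications relative to a full implementation.

```python
import numpy as np

def matrix_pencil(y, M):
    """Fit y[n] ~= sum_i R_i * z_i**n with a basic matrix pencil method.

    y: real or complex 1D signal of length N; M: pencil parameter (model order).
    Returns poles z and residues R.
    """
    N = len(y)
    # Hankel data matrix, (N - M) rows x (M + 1) columns
    Y = np.array([y[i:i + M + 1] for i in range(N - M)])
    Y1, Y2 = Y[:, :-1], Y[:, 1:]
    # Poles are the eigenvalues of the pencil pinv(Y1) @ Y2
    z = np.linalg.eigvals(np.linalg.pinv(Y1) @ Y2)
    # Residues from a least-squares Vandermonde fit
    V = np.vander(z, N, increasing=True).T            # (N, M)
    R, *_ = np.linalg.lstsq(V, y, rcond=None)
    return z, R

def reconstruct(z, R, N):
    """Rebuild the signal from its complex resonance expansion."""
    n = np.arange(N)
    return np.sum([r * zi ** n for zi, r in zip(z, R)], axis=0)
```

Keeping only the poles and residues instead of the raw samples is what enables the large compression ratios reported in the abstract.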

24 pages, 5061 KiB  
Article
A Hybrid Visual Tracking Algorithm Based on SOM Network and Correlation Filter
by Yuanping Zhang, Xiumei Huang and Ming Yang
Sensors 2021, 21(8), 2864; https://doi.org/10.3390/s21082864 - 19 Apr 2021
Cited by 1 | Viewed by 2140
Abstract
To meet the challenge of video target tracking, a long-term visual tracking algorithm based on a self-organizing map (SOM) network and correlation filters is proposed. Objects in different videos or images often have completely different appearances; therefore, the self-organizing map neural network, whose characteristics resemble the signal processing mechanism of human brain neurons, is used to perform adaptive and unsupervised feature learning. At the same time, a reliable and robust target tracking method based on multiple adaptive correlation filters with a memory function of target appearance is proposed. The filters in our method have different updating strategies and carry out long-term tracking cooperatively. The first is the displacement filter, a kernelized correlation filter that combines contextual characteristics to precisely locate and track targets. Second, scale filters are used to predict the changing scale of a target. Finally, the memory filter is used to maintain the appearance of the target in long-term memory and to judge whether tracking of the target has failed. If tracking fails, an incremental learning detector is used to recover the target in a sliding-window manner. Several experiments show that our method can effectively handle tracking problems such as severe occlusion, target loss, and scale change, and it is superior to state-of-the-art methods in terms of efficiency, accuracy, and robustness. Full article
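
A minimal sense of the correlation filter component can be given by a MOSSE-style filter trained in closed form in the Fourier domain; this sketch omits the SOM feature learning, kernelization, scale filters, and memory filter described in the paper, and the regularization value is an assumption.

```python
import numpy as np

def train_filter(patch, target_response, lam=1e-2):
    """Closed-form correlation filter (MOSSE-style) in the Fourier domain.

    patch: (H, W) grayscale template; target_response: (H, W) desired output,
    typically a Gaussian peaked at the object centre.
    """
    F = np.fft.fft2(patch)
    G = np.fft.fft2(target_response)
    # Returned spectrum plays the role of conj(H); applied directly at detection
    return (G * np.conj(F)) / (F * np.conj(F) + lam)

def detect(filt, patch):
    """Correlate a new patch with the filter and return the response peak location."""
    response = np.real(np.fft.ifft2(np.fft.fft2(patch) * filt))
    return np.unravel_index(np.argmax(response), response.shape)
```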

18 pages, 2454 KiB  
Article
Manipulation Planning for Object Re-Orientation Based on Semantic Segmentation Keypoint Detection
by Ching-Chang Wong, Li-Yu Yeh, Chih-Cheng Liu, Chi-Yi Tsai and Hisasuki Aoyama
Sensors 2021, 21(7), 2280; https://doi.org/10.3390/s21072280 - 24 Mar 2021
Cited by 17 | Viewed by 3032
Abstract
In this paper, a manipulation planning method for object re-orientation based on semantic segmentation keypoint detection is proposed for a robot manipulator, enabling it to detect and re-orient randomly placed objects to a specified position and pose. There are two main parts: (1) a 3D keypoint detection system; and (2) a manipulation planning system for object re-orientation. In the 3D keypoint detection system, an RGB-D camera is used to obtain information about the environment and to generate 3D keypoints of the target object as inputs representing its position and pose. This process simplifies the 3D model representation so that manipulation planning for object re-orientation can be executed in a category-level manner by adding various training data of the object in the training phase. In addition, 3D suction points in both the object’s current and expected poses are generated as inputs for the next stage. In that stage, the Mask Region-based Convolutional Neural Network (Mask R-CNN) algorithm is used for preliminary object detection and to obtain object images. The image with the highest confidence index is selected as the input to the semantic segmentation system, which classifies each pixel in the picture into the corresponding pack unit of the object. After the convolutional neural network performs semantic segmentation, the Conditional Random Fields (CRFs) method is applied over several iterations to obtain a more accurate object recognition result. Once the target object is segmented into pack units in the image, the center position of each pack unit can be obtained. Then, a normal vector at each pack unit’s center point is generated from the depth image information and the pose of the object, which is obtained by connecting the center points of the pack units. In the manipulation planning system for object re-orientation, the pose of the object and the normal vector of each pack unit are first converted into the working coordinate system of the robot manipulator. Then, according to the current and expected poses of the object, the spherical linear interpolation (Slerp) algorithm is used to generate a series of movements in the workspace for object re-orientation on the robot manipulator. In addition, the pose of the object is adjusted about the z-axis of the object’s geodetic coordinate system based on image features on the surface of the object, so that the pose of the placed object approaches the desired pose. Finally, a robot manipulator and a laboratory-made vacuum suction cup are used to verify that the proposed system can complete the planned object re-orientation task. Full article
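
The Slerp step mentioned above is standard and can be sketched with SciPy: orientations are interpolated on the unit quaternion sphere while positions are interpolated linearly. The pose dictionary layout and the step count below are illustrative assumptions, not the paper's planner interface.

```python
import numpy as np
from scipy.spatial.transform import Rotation, Slerp

def reorientation_waypoints(current, target, n_steps=10):
    """Generate intermediate waypoints between two poses.

    current/target: dicts with "pos" (3-vector) and "quat" (x, y, z, w).
    """
    key_rots = Rotation.from_quat([current["quat"], target["quat"]])
    slerp = Slerp([0.0, 1.0], key_rots)
    times = np.linspace(0.0, 1.0, n_steps)
    rots = slerp(times)                                    # interpolated orientations
    positions = np.linspace(np.asarray(current["pos"]),
                            np.asarray(target["pos"]), n_steps)
    return [{"pos": positions[i], "quat": rots[i].as_quat()}
            for i in range(n_steps)]
```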

21 pages, 7771 KiB  
Article
Assessing MMA Welding Process Stability Using Machine Vision-Based Arc Features Tracking System
by Wojciech Jamrozik and Jacek Górka
Sensors 2021, 21(1), 84; https://doi.org/10.3390/s21010084 - 25 Dec 2020
Cited by 7 | Viewed by 4745
Abstract
Arc length is a crucial parameter of the manual metal arc (MMA) welding process, as it influences the arc voltage and the resulting welded joint. In the MMA method, process stability is mainly controlled by the skills of the welder. Accordingly, giving the welder feedback about the arc length as well as the welding speed is valuable both at the stage of weld training and in the production of welded elements. The proposed solution is based on the application of relatively cheap Complementary Metal Oxide Semiconductor (CMOS) cameras to track the welding electrode tip and to estimate the geometrical properties of the welding arc. All measured parameters vary during welding. To validate the results of image processing, arc voltage was measured as a reference value that partly describes the process stability. Full article
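
A very rough flavour of the vision step can be given with OpenCV: threshold the bright arc region in a grayscale frame and take an extreme point of the largest blob as an electrode tip estimate. The threshold value and the topmost-point heuristic are assumptions for illustration; the paper's tracking pipeline is more elaborate.

```python
import cv2
import numpy as np

def electrode_tip_estimate(frame_gray, thresh=200):
    """Return an (x, y) estimate of the electrode tip, or None if no blob is found.

    frame_gray: single-channel uint8 image of the welding scene.
    """
    _, binary = cv2.threshold(frame_gray, thresh, 255, cv2.THRESH_BINARY)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    blob = max(contours, key=cv2.contourArea)   # largest bright region (the arc)
    tip_idx = blob[:, :, 1].argmin()            # topmost point of that blob
    return tuple(blob[tip_idx][0])
```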

19 pages, 970 KiB  
Article
Fusion-ConvBERT: Parallel Convolution and BERT Fusion for Speech Emotion Recognition
by Sanghyun Lee, David K. Han and Hanseok Ko
Sensors 2020, 20(22), 6688; https://doi.org/10.3390/s20226688 - 23 Nov 2020
Cited by 20 | Viewed by 4985
Abstract
Speech emotion recognition predicts the emotional state of a speaker based on the person’s speech. It brings an additional element for creating more natural human–computer interactions. Earlier studies on emotional recognition have been primarily based on handcrafted features and manual labels. With the advent of deep learning, there have been some efforts in applying the deep-network-based approach to the problem of emotion recognition. As deep learning automatically extracts salient features correlated to speaker emotion, it brings certain advantages over the handcrafted-feature-based methods. There are, however, some challenges in applying them to the emotion recognition problem, because data required for properly training deep networks are often lacking. Therefore, there is a need for a new deep-learning-based approach which can exploit available information from given speech signals to the maximum extent possible. Our proposed method, called “Fusion-ConvBERT”, is a parallel fusion model consisting of bidirectional encoder representations from transformers and convolutional neural networks. Extensive experiments were conducted on the proposed model using the EMO-DB and Interactive Emotional Dyadic Motion Capture Database emotion corpus, and it was shown that the proposed method outperformed state-of-the-art techniques in most of the test configurations. Full article
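
The parallel-fusion idea can be sketched in PyTorch as two branches, a small CNN and a transformer encoder, whose pooled features are concatenated before classification. All layer sizes here are illustrative assumptions and do not reproduce the actual Fusion-ConvBERT architecture or its pretrained BERT components.

```python
import torch
import torch.nn as nn

class ParallelFusionSER(nn.Module):
    """Toy parallel CNN/transformer fusion head for log-mel spectrogram input."""
    def __init__(self, n_mels=64, d_model=128, n_classes=4):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)), nn.Flatten(),
            nn.Linear(16 * 4 * 4, d_model),
        )
        self.proj = nn.Linear(n_mels, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=2)
        self.classifier = nn.Linear(2 * d_model, n_classes)

    def forward(self, spec):                      # spec: (B, 1, n_mels, T)
        conv_feat = self.cnn(spec)                # (B, d_model) local/spatial features
        seq = spec.squeeze(1).transpose(1, 2)     # (B, T, n_mels) token sequence
        trans_feat = self.transformer(self.proj(seq)).mean(dim=1)   # (B, d_model)
        return self.classifier(torch.cat([conv_feat, trans_feat], dim=-1))

logits = ParallelFusionSER()(torch.randn(2, 1, 64, 200))   # -> shape (2, 4)
```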

17 pages, 4363 KiB  
Article
A Multitask Cascading CNN with MultiScale Infrared Optical Flow Feature Fusion-Based Abnormal Crowd Behavior Monitoring UAV
by Yanhua Shao, Wenfeng Li, Hongyu Chu, Zhiyuan Chang, Xiaoqiang Zhang and Huayi Zhan
Sensors 2020, 20(19), 5550; https://doi.org/10.3390/s20195550 - 28 Sep 2020
Cited by 15 | Viewed by 2662
Abstract
Visual-based object detection and understanding is an important problem in computer vision and signal processing. Due to their advantages of high mobility and easy deployment, unmanned aerial vehicles (UAVs) have become a flexible monitoring platform in recent years. However, visible-light-based methods are often greatly influenced by the environment. As a result, a single type of feature derived from aerial monitoring videos is often insufficient to characterize variations among different abnormal crowd behaviors. To address this, we propose combining two types of features to better represent behavior, namely, a multitask cascading CNN (MC-CNN) and multiscale infrared optical flow (MIR-OF), which capture the crowd density and the average speed of the crowd, respectively. First, an infrared (IR) camera and an Nvidia Jetson TX1 were chosen as the infrared vision system. Since there are no published infrared-based aerial abnormal-behavior datasets, we provide a new infrared aerial dataset named the IR-flying dataset, which includes sample pictures and videos of different scenes in public areas. Second, MC-CNN was used to estimate the crowd density. Third, MIR-OF was designed to characterize the average speed of the crowd. Finally, considering two typical abnormal crowd behaviors, crowd aggregating and crowd escaping, the experimental results show that the monitoring UAV system can detect abnormal crowd behaviors in public areas effectively. Full article
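
The average-speed feature can be approximated with dense optical flow; the OpenCV Farneback call below is a generic, single-scale stand-in for the paper's multiscale infrared optical flow, and its parameters are illustrative assumptions.

```python
import cv2
import numpy as np

def average_motion_magnitude(prev_gray, curr_gray):
    """Mean optical-flow magnitude between two consecutive grayscale frames."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        pyr_scale=0.5, levels=3, winsize=15,
                                        iterations=3, poly_n=5, poly_sigma=1.2,
                                        flags=0)
    magnitude = np.linalg.norm(flow, axis=2)    # per-pixel speed in pixels/frame
    return float(magnitude.mean())
```

A sudden rise or fall of this value over consecutive frames is the kind of cue that, combined with the estimated crowd density, indicates aggregating or escaping behavior.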

18 pages, 3575 KiB  
Article
A High-Speed Low-Cost VLSI System Capable of On-Chip Online Learning for Dynamic Vision Sensor Data Classification
by Wei He, Jinguo Huang, Tengxiao Wang, Yingcheng Lin, Junxian He, Xichuan Zhou, Ping Li, Ying Wang, Nanjian Wu and Cong Shi
Sensors 2020, 20(17), 4715; https://doi.org/10.3390/s20174715 - 21 Aug 2020
Cited by 6 | Viewed by 3502
Abstract
This paper proposes a high-speed low-cost VLSI system capable of on-chip online learning for classifying address-event representation (AER) streams from dynamic vision sensor (DVS) retina chips. The proposed system executes a lightweight statistic algorithm based on simple binary features extracted from AER streams and a Random Ferns classifier to classify these features. The proposed system's multi-level pipelines and parallel processing circuits achieve a high throughput of up to 1 spike event per clock cycle for AER data processing. Thanks to the nature of the lightweight algorithm, our hardware system is realized in a low-cost memory-centric paradigm. In addition, the system is capable of on-chip online learning to flexibly adapt to different in-situ application scenarios. The extra overheads for on-chip learning in terms of time and resource consumption are quite low, as the training procedure of the Random Ferns is quite simple and requires few auxiliary learning circuits. An FPGA prototype of the proposed VLSI system was implemented with 9.5~96.7% memory consumption and <11% computational and logic resources on a Xilinx Zynq-7045 chip platform. Running at a clock frequency of 100 MHz, it achieved a peak processing throughput of up to 100 Meps (mega events per second), with an estimated power consumption of 690 mW, leading to a high energy efficiency of 145 Meps/W, or 145 events/μJ. We tested the prototype system on the MNIST-DVS, Poker-DVS, and Posture-DVS datasets, and obtained classification accuracies of 77.9%, 99.4%, and 99.3%, respectively. Compared to prior works, our VLSI system achieves higher processing speeds, higher computing efficiency, comparable accuracy, and lower resource costs. Full article
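
The Random Ferns classifier at the heart of this system is simple enough to sketch in a few lines: each fern hashes a random subset of binary features into an integer code and keeps per-class counts, and the ferns' log-posteriors are summed at prediction time. The fern count, fern size, and Laplace smoothing below are illustrative assumptions, and this software sketch of course ignores the hardware pipeline.

```python
import numpy as np

class RandomFerns:
    """Minimal Random Ferns classifier over binary (0/1) feature vectors."""
    def __init__(self, n_features, n_classes, n_ferns=10, fern_size=8, seed=0):
        rng = np.random.default_rng(seed)
        self.idx = rng.integers(0, n_features, size=(n_ferns, fern_size))
        self.counts = np.ones((n_ferns, 2 ** fern_size, n_classes))  # Laplace prior
        self.n_ferns = n_ferns
        self.weights = 2 ** np.arange(fern_size)

    def _codes(self, X):
        # Each fern maps its subset of binary features to one integer code
        return (X[:, self.idx] * self.weights).sum(axis=2).astype(int)  # (N, n_ferns)

    def fit(self, X, y):
        """X: (N, n_features) binary array; y: (N,) integer class labels."""
        codes = self._codes(X)
        for f in range(self.n_ferns):
            np.add.at(self.counts[f], (codes[:, f], y), 1)
        return self

    def predict(self, X):
        codes = self._codes(X)
        probs = self.counts / self.counts.sum(axis=2, keepdims=True)
        # Semi-naive Bayes: sum log-posteriors over ferns, pick the best class
        log_post = sum(np.log(probs[f][codes[:, f]]) for f in range(self.n_ferns))
        return np.argmax(log_post, axis=1)
```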

Review


61 pages, 11909 KiB  
Review
3D Recognition Based on Sensor Modalities for Robotic Systems: A Survey
by Sumaira Manzoor, Sung-Hyeon Joo, Eun-Jin Kim, Sang-Hyeon Bae, Gun-Gyo In, Jeong-Won Pyo and Tae-Yong Kuc
Sensors 2021, 21(21), 7120; https://doi.org/10.3390/s21217120 - 27 Oct 2021
Cited by 7 | Viewed by 3558
Abstract
3D visual recognition is a prerequisite for most autonomous robotic systems operating in the real world. It empowers robots to perform a variety of tasks, such as tracking, understanding the environment, and human–robot interaction. Autonomous robots equipped with 3D recognition capability can better perform their social roles through supportive task assistance in professional jobs and effective domestic services. For active assistance, social robots must recognize their surroundings, including objects and places to perform the task more efficiently. This article first highlights the value-centric role of social robots in society by presenting recently developed robots and describes their main features. Instigated by the recognition capability of social robots, we present the analysis of data representation methods based on sensor modalities for 3D object and place recognition using deep learning models. In this direction, we delineate the research gaps that need to be addressed, summarize 3D recognition datasets, and present performance comparisons. Finally, a discussion of future research directions concludes the article. This survey is intended to show how recent developments in 3D visual recognition based on sensor modalities using deep-learning-based approaches can lay the groundwork to inspire further research and serves as a guide to those who are interested in vision-based robotics applications. Full article

Other


13 pages, 1615 KiB  
Letter
Seismic Data Augmentation Based on Conditional Generative Adversarial Networks
by Yuanming Li, Bonhwa Ku, Shou Zhang, Jae-Kwang Ahn and Hanseok Ko
Sensors 2020, 20(23), 6850; https://doi.org/10.3390/s20236850 - 30 Nov 2020
Cited by 13 | Viewed by 3398
Abstract
Realistic synthetic data can be useful for data augmentation when training deep learning models to improve seismological detection and classification performance. In recent years, various deep learning techniques have been successfully applied in modern seismology. Because the performance of deep learning depends on a sufficient volume of data, data augmentation is widely utilized as a data-space solution. In this paper, we propose a Generative Adversarial Network (GAN)-based model that uses conditional knowledge to generate high-quality seismic waveforms. Unlike the existing method of generating samples directly from noise, the proposed method generates synthetic samples based on the statistical characteristics of real seismic waveforms in embedding space. Moreover, a content loss relating high-level features extracted by a pre-trained model is added to the objective function to enhance the quality of the synthetic data. The classification accuracy increased from 96.84% to 97.92% after mixing in a certain amount of synthetic seismic waveforms, and evaluation of the seismic characteristics of the generated waveforms in a representative experiment shows that the proposed model provides an effective structure for generating high-quality synthetic seismic waveforms. Thus, the proposed model is experimentally validated as a promising approach to realistic high-quality seismic waveform data augmentation. Full article
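
The content-loss idea, matching high-level features of real and synthetic waveforms under a frozen pre-trained encoder, can be sketched as follows in PyTorch. The encoder, the loss weighting, and how this term enters the GAN objective are assumptions here and differ in detail from the paper.

```python
import torch
import torch.nn.functional as F

def content_loss(encoder, real_wave, fake_wave):
    """Feature-matching term between real and generated seismic waveforms.

    encoder: a frozen, pre-trained feature extractor (e.g. a CNN over waveforms);
    real_wave/fake_wave: (B, C, T) tensors.
    """
    with torch.no_grad():
        target_feat = encoder(real_wave)          # no gradients through the target
    return F.mse_loss(encoder(fake_wave), target_feat)

# Typical use inside the generator update (lambda_content is a tunable weight):
# g_loss = adversarial_loss + lambda_content * content_loss(encoder, real, fake)
```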
