Semantic Representations for Behavior Analysis in Robotic System

A special issue of Sensors (ISSN 1424-8220). This special issue belongs to the section "Intelligent Sensors".

Deadline for manuscript submissions: closed (30 June 2019) | Viewed by 45985

Special Issue Editors


Dr. Baochang Zhang
Guest Editor
School of Automation Science and Electrical Engineering, Beihang University, Beijing, China
Interests: robotic vision; path planning of UAVs; pattern recognition; machine learning; face recognition; wavelets

Dr. Jungong Han
Guest Editor
School of Computing and Communications, Lancaster University, Lancaster, UK
Interests: deep learning; video analysis; object tracking; wavelets

Special Issue Information

Dear Colleagues,

Vision sensors and smart cameras are increasingly used in a variety of applications, such as surveillance, manufacturing, entertainment, and robotics. Robotic vision plays a significant role in the era of artificial intelligence.

Among the related topics, motion and behavior analysis in robotic systems has witnessed tremendous progress in the last twenty years. Recently, researchers in robotic vision have shifted their attention from monitoring the behavior of a single subject in a relatively simple environment to analyzing the behavior of multiple subjects in crowded environments. In contrast to single-subject behavior, multi-subject behavior analysis faces more challenges, such as complex interactions, diverse semantics and varied expressions, due to the gap between the information directly extracted from videos and the semantic interpretations made by human beings.

To bridge this gap, a number of feature representation approaches (e.g., Cuboids, HOG/HOF, HOG3D, and eSURF) have been reported to improve the coherence between the extracted features and the semantic interpretations. Unfortunately, due to their redundancy and complexity, these hand-crafted features may lead to large variations in semantic representations for social behavior analysis.
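As a concrete illustration of this kind of hand-crafted descriptor, the sketch below computes a HOG feature vector for a single frame with scikit-image; the frame and parameter values are placeholders rather than settings taken from any of the cited approaches.

```python
# Minimal sketch: extracting a HOG descriptor from one video frame with
# scikit-image, illustrating the hand-crafted features mentioned above.
# All parameter values are illustrative, not taken from the cited works.
import numpy as np
from skimage.feature import hog

frame = np.random.rand(128, 64)  # stand-in for a grayscale video frame

descriptor = hog(
    frame,
    orientations=9,            # number of gradient-orientation bins
    pixels_per_cell=(8, 8),    # spatial cell size
    cells_per_block=(2, 2),    # block used for local contrast normalization
    block_norm="L2-Hys",
)
print(descriptor.shape)        # fixed-length feature vector for this frame
```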

In recent years, deep semantic representations have proven to be an effective tool for complicated behavior analysis. Such high-level semantic representations achieve the desired performance even in crowded environments. In addition, statistical, syntactic, and description-based approaches have also gained increasing attention in the computer vision community.

On the other hand, as space exploration continues to deepen, the research and development of space robot systems is becoming increasingly important. By analyzing the information obtained from the sensor and vision systems carried by a space robot, we can gain more knowledge of, and make better judgments about, space. Therefore, how to process the information obtained by space robots and how to transmit it to ground stations are also very important questions.

The primary purpose of this Special Issue is to organize a collection of recently developed semantic representations for behavior analysis in complex environments, spanning object detection, tracking, motion trajectory acquisition and analysis, semantic feature extraction, social behavior analysis and applications. This Special Issue is intended to be an international forum for researchers to report recent developments in this field. Topics include, but are not limited to:

  • All aspects of robotic vision and UAVs
  • Real-time moving object detection and tracking in crowded environments
  • 3D scene reconstruction and occlusion handling
  • Long-term trajectory clustering and analysis for crowd behaviors
  • Probabilistic statistical models for local semantic representation
  • Context model for global semantic representation
  • Event recognition in crowded environments
  • Abnormal behavior detection in crowded environments
  • Real-time algorithms for large scale social behavior analysis
  • Signal processing and communication systems between space robots and ground stations
  • Deep learning for resource-constrained embedded vision sensor applications
Dr. Baochang Zhang
Dr. Jungong Han
Dr. Chen Chen
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Sensors is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Published Papers (10 papers)


Research

20 pages, 6759 KiB  
Article
Multi-View Fusion-Based 3D Object Detection for Robot Indoor Scene Perception
by Li Wang, Ruifeng Li, Jingwen Sun, Xingxing Liu, Lijun Zhao, Hock Soon Seah, Chee Kwang Quah and Budianto Tandianus
Sensors 2019, 19(19), 4092; https://doi.org/10.3390/s19194092 - 21 Sep 2019
Cited by 19 | Viewed by 5003
Abstract
To autonomously move and operate objects in cluttered indoor environments, a service robot requires the ability of 3D scene perception. Though 3D object detection can provide an object-level environmental description to fill this gap, a robot always encounters incomplete object observations, recurring detections of the same object, errors in detection, or intersections between objects when conducting detection continuously in a cluttered room. To solve these problems, we propose a two-stage 3D object detection algorithm which fuses multiple views of 3D object point clouds in the first stage and eliminates unreasonable and intersecting detections in the second stage. For each view, the robot performs a 2D object semantic segmentation and obtains 3D object point clouds. Then, an unsupervised segmentation method called Locally Convex Connected Patches (LCCP) is utilized to segment the object accurately from the background. Manhattan Frame estimation is then applied to calculate the main orientation of the object, from which the 3D object bounding box can be obtained. To deal with the detected objects in multiple views, we construct an object database and propose an object fusion criterion to maintain it automatically. Thus, the same object observed in multiple views is fused together and a more accurate bounding box can be calculated. Finally, we propose an object filtering approach based on prior knowledge to remove incorrect and intersecting objects from the object database. Experiments are carried out on both the SceneNN dataset and a real indoor environment to verify the stability and accuracy of 3D semantic segmentation and bounding box detection of the object with multi-view fusion.
(This article belongs to the Special Issue Semantic Representations for Behavior Analysis in Robotic System)
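As a rough, hypothetical sketch of the multi-view fusion idea summarized above (not the authors' code), the snippet below maintains an object database in which detections from different views are merged whenever their axis-aligned 3D boxes overlap; the IoU criterion and threshold stand in for the fusion criterion proposed in the paper.

```python
# Hedged sketch (not the authors' exact criterion): maintaining an object
# database across views by fusing 3D detections whose axis-aligned bounding
# boxes overlap. Box format: (xmin, ymin, zmin, xmax, ymax, zmax).
import numpy as np

def iou_3d(a, b):
    """Intersection-over-union of two axis-aligned 3D boxes."""
    lo = np.maximum(a[:3], b[:3])
    hi = np.minimum(a[3:], b[3:])
    inter = np.prod(np.clip(hi - lo, 0.0, None))
    vol_a = np.prod(a[3:] - a[:3])
    vol_b = np.prod(b[3:] - b[:3])
    return inter / (vol_a + vol_b - inter + 1e-9)

class ObjectDatabase:
    def __init__(self, iou_threshold=0.3):
        self.boxes = []            # fused boxes, one per physical object
        self.counts = []           # how many views supported each object
        self.iou_threshold = iou_threshold

    def add_detection(self, box):
        box = np.asarray(box, dtype=float)
        for i, existing in enumerate(self.boxes):
            if iou_3d(existing, box) > self.iou_threshold:
                # Running average keeps the fused box stable over views.
                n = self.counts[i]
                self.boxes[i] = (existing * n + box) / (n + 1)
                self.counts[i] += 1
                return
        self.boxes.append(box)
        self.counts.append(1)

db = ObjectDatabase()
db.add_detection([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
db.add_detection([0.05, 0.0, 0.0, 1.05, 1.0, 1.0])  # same object, new view
print(len(db.boxes))  # -> 1
```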

20 pages, 14425 KiB  
Article
Intelligent 3D Perception System for Semantic Description and Dynamic Interaction
by Marco Antonio Simoes Teixeira, Rafael de Castro Martins Nogueira, Nicolas Dalmedico, Higor Barbosa Santos, Lucia Valeria Ramos de Arruda, Flavio Neves-Jr, Daniel Rodrigues Pipa, Julio Endress Ramos and Andre Schneider de Oliveira
Sensors 2019, 19(17), 3764; https://doi.org/10.3390/s19173764 - 30 Aug 2019
Cited by 5 | Viewed by 3147
Abstract
This work proposes a novel semantic perception system based on computer vision and machine learning techniques. The main goal is to identify objects in the environment and extract their characteristics, allowing a dynamic interaction with the environment. The system is composed of a GPU processing source and a 3D vision sensor that provides RGB images and PointCloud data. The perception system is structured in three steps: Lexical Analysis, Syntax Analysis and, finally, an Analysis of Anticipation. The Lexical Analysis detects the actual position of the objects (or tokens) in the environment through the combination of the RGB image and the PointCloud, surveying their characteristics. All information extracted from the tokens is used to retrieve relevant features such as object velocity, acceleration and direction during the Syntax Analysis step. The Anticipation step predicts future behaviors for these dynamic objects, enabling interaction with them in terms of collisions, pull, and push actions. As a result, the proposed perception source can provide mobile robots with relevant information, not only about distances, as traditional sensors do, but also about other environment characteristics and object behaviors. This novel perception source introduces a new class of skills to mobile robots. Experimental results obtained with a real robot are presented, showing the efficacy and potential of the proposed perception source.
(This article belongs to the Special Issue Semantic Representations for Behavior Analysis in Robotic System)
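The Syntax and Anticipation steps described above lend themselves to a simple kinematic illustration. The sketch below is an assumption-laden simplification, not the authors' implementation: it estimates velocity and acceleration from tracked centroids by finite differences and predicts a future position with a constant-acceleration model.

```python
# Hedged sketch of the "Syntax" and "Anticipation" ideas described above:
# finite-difference velocity/acceleration from tracked 3D positions and a
# constant-acceleration prediction of the next position.
import numpy as np

def kinematics(positions, dt):
    """positions: (N, 3) array of tracked object centroids, sampled every dt s."""
    velocity = np.gradient(positions, dt, axis=0)
    acceleration = np.gradient(velocity, dt, axis=0)
    return velocity, acceleration

def anticipate(positions, dt, horizon):
    """Predict the centroid `horizon` seconds ahead of the last observation."""
    v, a = kinematics(positions, dt)
    return positions[-1] + v[-1] * horizon + 0.5 * a[-1] * horizon ** 2

track = np.array([[0.0, 0.0, 0.0],
                  [0.1, 0.0, 0.0],
                  [0.3, 0.0, 0.0],
                  [0.6, 0.0, 0.0]])   # an object accelerating along x
print(anticipate(track, dt=0.1, horizon=0.2))
```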

16 pages, 4728 KiB  
Article
Dual-Resolution Dual-Path Convolutional Neural Networks for Fast Object Detection
by Jing Pan, Hanqing Sun, Zhanjie Song and Jungong Han
Sensors 2019, 19(14), 3111; https://doi.org/10.3390/s19143111 - 14 Jul 2019
Cited by 5 | Viewed by 3532
Abstract
Downsampling input images is a simple trick to speed up visual object-detection algorithms, especially on robotic vision and applied mobile vision systems. However, this trick comes with a significant decline in accuracy. In this paper, dual-resolution dual-path Convolutional Neural Networks (CNNs), named DualNets, are proposed to bump up the accuracy of those detection applications. In contrast to previous methods that simply downsample the input images, DualNets explicitly take dual inputs in different resolutions and extract complementary visual features from them using dual CNN paths. The two paths in a DualNet are a backbone path and an auxiliary path that accepts larger inputs and then rapidly downsamples them to relatively small feature maps. With the help of the carefully designed auxiliary CNN paths in DualNets, auxiliary features are extracted from the larger input with controllable computation. Auxiliary features are then fused with the backbone features using a proposed progressive residual fusion strategy to enrich feature representation. This architecture, as the feature extractor, is further integrated with the Single Shot Detector (SSD) to accomplish latency-sensitive visual object-detection tasks. We evaluate the resulting detection pipeline on the Pascal VOC and MS COCO benchmarks. Results show that the proposed DualNets can raise the accuracy of those CNN detection applications that are sensitive to computation payloads.
(This article belongs to the Special Issue Semantic Representations for Behavior Analysis in Robotic System)
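To make the dual-path idea concrete, here is a hedged PyTorch sketch of a backbone path on a small input fused with an auxiliary path that rapidly downsamples a larger input; the layer configuration is illustrative and is not the DualNet architecture evaluated in the paper.

```python
# Hedged PyTorch sketch of the dual-resolution, dual-path idea: a backbone
# path on a downsampled input plus an auxiliary path that quickly reduces a
# larger input, fused by residual addition. Layer sizes are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualPathBlock(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.backbone = nn.Sequential(             # operates on the small input
            nn.Conv2d(3, channels, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
        )
        self.auxiliary = nn.Sequential(             # rapidly downsamples the large input
            nn.Conv2d(3, channels, 3, stride=4, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, stride=2, padding=1), nn.ReLU(),
        )

    def forward(self, small, large):
        feat = self.backbone(small)
        aux = self.auxiliary(large)
        aux = F.interpolate(aux, size=feat.shape[-2:], mode="bilinear",
                            align_corners=False)
        return feat + aux                           # residual-style fusion

net = DualPathBlock()
small = torch.randn(1, 3, 160, 160)   # downsampled input
large = torch.randn(1, 3, 320, 320)   # original-resolution input
print(net(small, large).shape)        # torch.Size([1, 64, 80, 80])
```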

19 pages, 24475 KiB  
Article
Learning the Cost Function for Foothold Selection in a Quadruped Robot
by Xingdong Li, Hewei Gao, Fusheng Zha, Jian Li, Yangwei Wang, Yanling Guo and Xin Wang
Sensors 2019, 19(6), 1292; https://doi.org/10.3390/s19061292 - 14 Mar 2019
Cited by 2 | Viewed by 3216
Abstract
This paper focuses on designing a cost function for selecting a foothold for a physical quadruped robot walking on rough terrain. The quadruped robot is modeled with Denavit–Hartenberg (DH) parameters, and a default foothold is then defined based on the model. A Time-of-Flight (TOF) camera is used to perceive terrain information and construct a 2.5D elevation map, on which the terrain features are detected. The cost function is defined as the weighted sum of several elements, including terrain features and features of the relative pose between the default foothold and other candidates. It is nearly impossible to hand-code the weight vector of the function, so the weights are learned using Support Vector Machine (SVM) techniques, and the training data set is generated from the 2.5D elevation map of a real terrain under the guidance of experts. Four candidate footholds around the default foothold are randomly sampled, and the expert ranks these four candidates, rotating and scaling the view to see the terrain clearly. Lastly, the learned cost function is used to select a suitable foothold and drive the quadruped robot to walk autonomously across rough terrain with wooden steps. Compared to the approach with the original standard static gait, the proposed cost function shows better performance.
(This article belongs to the Special Issue Semantic Representations for Behavior Analysis in Robotic System)
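A minimal sketch of learning such a weighted-sum cost from expert orderings is given below, using a pairwise ranking reduction with scikit-learn's linear SVM; the candidate features and training pairs are invented for illustration and do not come from the paper's 2.5D elevation maps.

```python
# Hedged sketch of learning a weighted-sum cost from expert preferences with a
# linear SVM (a pairwise "ranking SVM" reduction). Feature names and data are
# made up for illustration only.
import numpy as np
from sklearn.svm import LinearSVC

# Each candidate foothold -> feature vector, e.g. (slope, roughness,
# distance to default foothold). Pairs (better, worse) come from the expert.
expert_pairs = [
    (np.array([0.1, 0.2, 0.05]), np.array([0.6, 0.5, 0.30])),
    (np.array([0.2, 0.1, 0.10]), np.array([0.5, 0.7, 0.40])),
    (np.array([0.0, 0.3, 0.20]), np.array([0.4, 0.6, 0.50])),
]

# Pairwise reduction: classify the sign of the feature difference.
X = np.vstack([b - w for b, w in expert_pairs] + [w - b for b, w in expert_pairs])
y = np.array([1] * len(expert_pairs) + [-1] * len(expert_pairs))

svm = LinearSVC(C=1.0).fit(X, y)
weights = svm.coef_.ravel()            # learned weight vector of the cost

def cost(features):
    # Lower cost means a more expert-preferred foothold.
    return -float(weights @ features)

candidates = [np.array([0.3, 0.3, 0.2]), np.array([0.1, 0.1, 0.1])]
best = min(candidates, key=cost)
print(best)
```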

14 pages, 6474 KiB  
Article
Assistive Grasping Based on Laser-point Detection with Application to Wheelchair-mounted Robotic Arms
by Ming Zhong, Yanqiang Zhang, Xi Yang, Yufeng Yao, Junlong Guo, Yaping Wang and Yaxin Liu
Sensors 2019, 19(2), 303; https://doi.org/10.3390/s19020303 - 14 Jan 2019
Cited by 10 | Viewed by 4528
Abstract
As the aging of the population becomes more severe, wheelchair-mounted robotic arms (WMRAs) are gaining an increased amount of attention. Laser pointer interactions are an attractive method enabling humans to unambiguously point out objects to be picked up. In addition, as an intuitive interaction mode, they bring about a greater sense of participation in the interaction process. However, the issue of human–robot interaction remains to be properly tackled, and traditional laser point interactions still suffer from poor real-time performance and low accuracy amid dynamic backgrounds. In this study, combining an advanced laser point detection method and an improved pose estimation algorithm, a laser pointer is used to facilitate the interactions between humans and a WMRA in an indoor environment. Assistive grasping using laser selection consists of two key steps. In the first step, the images captured using an RGB-D camera are pre-processed and then fed to a convolutional neural network (CNN) to determine the 2D coordinates of the laser point and objects within the image. Meanwhile, the centroid coordinates of the selected object are also obtained using the depth information. In this way, the object to be picked up and its location are determined. The experimental results show that the laser point can be detected with almost 100% accuracy in a complex environment. In the second step, a compound pose-estimation algorithm aiming at a sparse use of multi-view templates is applied, which consists of both coarse and precise matching of the target to the template objects, greatly improving the grasping performance. The proposed algorithms were implemented on a Kinova Jaco robotic arm, and the experimental results demonstrate their effectiveness. Compared with commonly accepted methods, the time consumption of pose generation can be reduced from 5.36 to 4.43 s, while the pose estimation error is significantly reduced from 21.31% to 3.91%.
(This article belongs to the Special Issue Semantic Representations for Behavior Analysis in Robotic System)
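One step of this pipeline, recovering a 3D grasp target from the detected laser-point pixel and its depth, can be sketched with a standard pinhole back-projection; the camera intrinsics below are placeholders, not values from the paper.

```python
# Hedged sketch of one step described above: back-projecting the detected
# laser-point pixel into a 3D target position using the RGB-D camera's pinhole
# intrinsics. The intrinsic values below are placeholders.
import numpy as np

def pixel_to_point(u, v, depth_m, fx, fy, cx, cy):
    """Convert a pixel (u, v) with metric depth into camera-frame XYZ."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return np.array([x, y, depth_m])

# Hypothetical detection: laser point at pixel (400, 260) with depth 0.85 m.
target = pixel_to_point(400, 260, 0.85, fx=615.0, fy=615.0, cx=320.0, cy=240.0)
print(target)   # 3D point the arm would be commanded to approach
```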

25 pages, 1423 KiB  
Article
Computational Assessment of Facial Expression Production in ASD Children
by Marco Leo, Pierluigi Carcagnì, Cosimo Distante, Paolo Spagnolo, Pier Luigi Mazzeo, Anna Chiara Rosato, Serena Petrocchi, Chiara Pellegrino, Annalisa Levante, Filomena De Lumè and Flavia Lecciso
Sensors 2018, 18(11), 3993; https://doi.org/10.3390/s18113993 - 16 Nov 2018
Cited by 54 | Viewed by 6833
Abstract
In this paper, a computational approach is proposed and put into practice to assess the capability of children diagnosed with Autism Spectrum Disorder (ASD) to produce facial expressions. The proposed approach is based on computer vision components working on sequences of images acquired by an off-the-shelf camera in unconstrained conditions. Action unit intensities are estimated by analyzing local appearance, and then both temporal and geometrical relationships, learned by Convolutional Neural Networks, are exploited to regularize the gathered estimates. To cope with stereotyped movements and to highlight even subtle voluntary movements of facial muscles, a personalized and contextual statistical model of the non-emotional face is formulated and used as a reference. Experimental results demonstrate how the proposed pipeline can improve the analysis of facial expressions produced by ASD children. A comparison of the system's outputs with the evaluations performed by psychologists on the same group of ASD children makes evident how the performed quantitative analysis of children's abilities helps to go beyond the traditional qualitative ASD assessment/diagnosis protocols, whose outcomes are affected by human limitations in observing and understanding multi-cue behaviors such as facial expressions.
(This article belongs to the Special Issue Semantic Representations for Behavior Analysis in Robotic System)
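The personalized non-emotional baseline can be illustrated with a simple statistical sketch: action-unit intensities of a frame are compared against per-child neutral-face statistics, and only significant deviations are treated as voluntary activation. The data, AU count and threshold below are assumptions for illustration, not the paper's model.

```python
# Hedged sketch of the "personalized non-emotional baseline" idea: z-scoring
# action-unit (AU) intensities of a frame against per-child statistics of the
# neutral face, so only deviations from that baseline count as expression.
import numpy as np

neutral_frames = np.random.normal(loc=1.0, scale=0.2, size=(200, 17))  # 17 AUs
baseline_mean = neutral_frames.mean(axis=0)
baseline_std = neutral_frames.std(axis=0) + 1e-6

def expression_score(au_intensities, z_threshold=2.0):
    """Return which AUs deviate significantly from the child's neutral face."""
    z = (au_intensities - baseline_mean) / baseline_std
    return np.where(np.abs(z) > z_threshold)[0]

frame_aus = baseline_mean.copy()
frame_aus[11] += 1.5        # a strong, deliberate activation of one AU
print(expression_score(frame_aus))   # -> [11]
```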

15 pages, 13970 KiB  
Article
HoPE: Horizontal Plane Extractor for Cluttered 3D Scenes
by Zhipeng Dong, Yi Gao, Jinfeng Zhang, Yunhui Yan, Xin Wang and Fei Chen
Sensors 2018, 18(10), 3214; https://doi.org/10.3390/s18103214 - 23 Sep 2018
Cited by 2 | Viewed by 3645
Abstract
Extracting horizontal planes in heavily cluttered three-dimensional (3D) scenes is an essential procedure for many robotic applications. Aiming at the limitations of general plane segmentation methods on this subject, we present HoPE, a Horizontal Plane Extractor that is able to extract multiple horizontal planes in cluttered scenes with both organized and unorganized 3D point clouds. In the first stage, it transforms the source point cloud to the reference coordinate frame using the sensor orientation acquired either by pre-calibration or an inertial measurement unit, thereby leveraging the inner structure of the transformed point cloud to ease the subsequent processes, which use two concise thresholds to produce the results. A revised region-growing algorithm named Z clustering and a principal component analysis (PCA)-based approach are presented for point clustering and refinement, respectively. Furthermore, we provide a nearest neighbor plane matching (NNPM) strategy to preserve the identities of extracted planes across successive sequences. Qualitative and quantitative evaluations of both real and synthetic scenes demonstrate that our approach outperforms several state-of-the-art methods under challenging circumstances, in terms of robustness to clutter, accuracy, and efficiency. We release our algorithm as an off-the-shelf toolbox, which is publicly available.
(This article belongs to the Special Issue Semantic Representations for Behavior Analysis in Robotic System)
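A heavily simplified sketch of this pipeline is shown below: the cloud is rotated into the reference frame with the sensor orientation, points are grouped by height, and clusters whose PCA normal is nearly vertical are kept as horizontal planes. The thresholds and synthetic cloud are illustrative; this is not the released HoPE toolbox.

```python
# Hedged sketch of the pipeline described above: rotate the cloud into the
# reference frame, group points by height ("Z clustering"), and keep clusters
# whose PCA normal is close to vertical.
import numpy as np

def horizontal_planes(points, R, z_bin=0.02, min_points=50, normal_tol=0.95):
    """points: (N, 3) cloud in sensor frame; R: 3x3 sensor-to-reference rotation."""
    pts = points @ R.T                              # align gravity with +z
    bins = np.round(pts[:, 2] / z_bin).astype(int)  # coarse height clustering
    planes = []
    for b in np.unique(bins):
        cluster = pts[bins == b]
        if len(cluster) < min_points:
            continue
        centered = cluster - cluster.mean(axis=0)
        # PCA refinement: the smallest-variance direction is the plane normal.
        _, _, vt = np.linalg.svd(centered, full_matrices=False)
        normal = vt[-1]
        if abs(normal[2]) > normal_tol:             # nearly vertical normal
            planes.append(cluster)
    return planes

cloud = np.random.rand(2000, 3)
cloud[:1000, 2] = 0.70                              # a synthetic table top
found = horizontal_planes(cloud, R=np.eye(3))
print(len(found), "horizontal plane(s) found")
```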

22 pages, 5501 KiB  
Article
A Scene Recognition and Semantic Analysis Approach to Unhealthy Sitting Posture Detection during Screen-Reading
by Weidong Min, Hao Cui, Qing Han and Fangyuan Zou
Sensors 2018, 18(9), 3119; https://doi.org/10.3390/s18093119 - 16 Sep 2018
Cited by 22 | Viewed by 7214
Abstract
Behavior analysis through posture recognition is an essential research topic in robotic systems. Maintaining an unhealthy sitting posture for a long time seriously harms human health and may even lead to lumbar disease, cervical disease and myopia. Automatic vision-based detection of unhealthy sitting posture, as an example of posture detection in robotic systems, has become a hot research topic. However, the existing methods focus only on extracting features of the humans themselves, lack an understanding of the relevancies among objects in the scene, and hence fail to recognize some types of unhealthy sitting postures in complicated environments. To alleviate these problems, a scene recognition and semantic analysis approach to unhealthy sitting posture detection during screen-reading is proposed in this paper. The key skeletal points of the human body are detected and tracked with a Microsoft Kinect sensor. Meanwhile, a deep learning method, Faster R-CNN, is used in the scene recognition of our method to accurately detect objects and extract relevant features. Then our method performs semantic analysis through Gaussian-Mixture behavioral clustering for scene understanding. The relevant features in the scene and the skeletal features extracted from the human are fused into semantic features to discriminate various types of sitting postures. Experimental results demonstrated that our method accurately and effectively detected various types of unhealthy sitting postures during screen-reading and avoided erroneous detections in complicated environments. Compared with the existing methods, our proposed method detected more types of unhealthy sitting postures, including those that the existing methods could not detect. Our method can potentially be applied and integrated as a medical-assistance component in robotic systems for health care and treatment.
(This article belongs to the Special Issue Semantic Representations for Behavior Analysis in Robotic System)
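As an illustration of the feature-fusion step, the sketch below concatenates skeletal features with a scene-level relation (head-to-screen distance) and trains a generic classifier on a toy dataset; the features, labels and the random-forest classifier are placeholders and differ from the Gaussian-Mixture behavioral clustering used in the paper.

```python
# Hedged sketch of the feature-fusion step: skeletal angles from the Kinect
# and scene-level relations from the object detector are concatenated into one
# semantic feature vector and classified. Feature definitions and training
# data are illustrative placeholders, not the paper's method.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def fuse_features(neck_angle_deg, torso_angle_deg, head_screen_dist_m):
    # Skeletal features + scene-relation feature in one semantic vector.
    return np.array([neck_angle_deg, torso_angle_deg, head_screen_dist_m])

# Tiny illustrative training set: 0 = healthy posture, 1 = unhealthy.
X = np.array([fuse_features(10, 5, 0.60), fuse_features(12, 8, 0.55),
              fuse_features(40, 25, 0.30), fuse_features(35, 30, 0.25)])
y = np.array([0, 0, 1, 1])

clf = RandomForestClassifier(n_estimators=20, random_state=0).fit(X, y)
print(clf.predict([fuse_features(38, 28, 0.28)]))   # -> [1] (unhealthy)
```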

13 pages, 2639 KiB  
Article
Pixel-Wise Crack Detection Using Deep Local Pattern Predictor for Robot Application
by Yundong Li, Hongguang Li and Hongren Wang
Sensors 2018, 18(9), 3042; https://doi.org/10.3390/s18093042 - 11 Sep 2018
Cited by 56 | Viewed by 5124
Abstract
Robotic vision-based crack detection in concrete bridges is an essential task to preserve these assets and their safety. The conventional human visual inspection method is time consuming and cost inefficient. In this paper, we propose a robust algorithm to detect cracks in a pixel-wise manner from real concrete surface images. In practice, crack detection remains challenging in the following aspects: (1) detection performance is disturbed by noise and clutter in the environment; and (2) the requirement of high pixel-wise accuracy is difficult to meet. To address these limitations, three steps are considered in the proposed scheme. First, a local pattern predictor (LPP) is constructed using convolutional neural networks (CNN), which can extract discriminative features of images. Second, each pixel is efficiently classified into crack or non-crack categories by the LPP, using a patch centered on the pixel as context. Lastly, the output of the CNN, i.e., the confidence map, is post-processed to obtain the crack areas. We evaluate the proposed algorithm on samples captured from several concrete bridges. The experimental results demonstrate the good performance of the proposed method.
(This article belongs to the Special Issue Semantic Representations for Behavior Analysis in Robotic System)
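The patch-based, pixel-wise scheme can be sketched as follows: each pixel is scored from a patch centered on it, and the scores form a confidence map that is then thresholded. A simple darkness heuristic stands in for the trained CNN local pattern predictor, so this is an illustration of the scheme rather than the proposed detector.

```python
# Hedged sketch of the patch-based scheme described above: every pixel is
# scored from a patch centered on it, producing a confidence map that is
# post-processed by thresholding. A darkness heuristic replaces the CNN.
import numpy as np

def confidence_map(gray, patch=15, predictor=None):
    """gray: 2D float image in [0, 1]; returns per-pixel crack confidence."""
    if predictor is None:
        # Placeholder for the CNN LPP: dark patch centers suggest a crack.
        predictor = lambda p: 1.0 - p[p.shape[0] // 2, p.shape[1] // 2]
    r = patch // 2
    padded = np.pad(gray, r, mode="reflect")
    conf = np.zeros_like(gray)
    for i in range(gray.shape[0]):
        for j in range(gray.shape[1]):
            conf[i, j] = predictor(padded[i:i + patch, j:j + patch])
    return conf

image = np.ones((64, 64))
image[30:34, :] = 0.1                      # synthetic dark crack
crack_mask = confidence_map(image) > 0.5   # post-process by thresholding
print(crack_mask.sum(), "pixels flagged as crack")
```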

19 pages, 9695 KiB  
Article
Statistic Experience Based Adaptive One-Shot Detector (EAO) for Camera Sensing System
by Xiaoning Zhu, Bojian Ding, Qingyue Meng, Lize Gu and Yixian Yang
Sensors 2018, 18(9), 3041; https://doi.org/10.3390/s18093041 - 11 Sep 2018
Cited by 2 | Viewed by 2649
Abstract
Object detection in a camera sensing system has been addressed by researchers in the field of image processing. Highly developed techniques provide researchers with great opportunities to recognize objects by applying different algorithms. This paper proposes an object recognition model, named Statistic Experience-based Adaptive One-shot Detector (EAO), based on convolutional neural networks. The proposed model makes use of spectral clustering to build the detection dataset, generates prior boxes for object bounding, and assigns prior boxes based on multiple resolutions. The model is constructed and trained to improve detection precision and processing speed. Experiments are conducted on classical image datasets, and the results demonstrate the superiority of EAO in terms of effectiveness and efficiency. The performance of EAO is verified by comparing it with several state-of-the-art approaches, which makes it a promising method for the development of camera sensing techniques.
(This article belongs to the Special Issue Semantic Representations for Behavior Analysis in Robotic System)
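One ingredient mentioned above, generating prior boxes by clustering, can be sketched by clustering the widths and heights of ground-truth boxes with scikit-learn's SpectralClustering and taking per-cluster means as prior shapes; the synthetic data and cluster count are assumptions, not the EAO configuration.

```python
# Hedged sketch: generating prior-box shapes by spectral clustering of
# ground-truth (width, height) pairs. Data and parameters are illustrative.
import numpy as np
from sklearn.cluster import SpectralClustering

rng = np.random.default_rng(0)
# Synthetic ground-truth (width, height) pairs: tall, wide and square objects.
wh = np.vstack([rng.normal([0.2, 0.6], 0.03, (50, 2)),
                rng.normal([0.6, 0.2], 0.03, (50, 2)),
                rng.normal([0.4, 0.4], 0.03, (50, 2))])

labels = SpectralClustering(n_clusters=3, gamma=100.0,
                            random_state=0).fit_predict(wh)

priors = np.array([wh[labels == k].mean(axis=0) for k in range(3)])
print(np.round(priors, 2))   # one prior (w, h) shape per cluster
```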
