Special Issue "Computer Vision and Machine Learning in Human-Computer Interaction"

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Artificial Intelligence".

Deadline for manuscript submissions: 30 June 2021.

Special Issue Editor

Prof. Dr. Włodzimierz Kasprzak
Guest Editor
Warsaw University of Technology, Faculty of Electronics and Information Technology, Institute of Control and Computation Engineering, Nowowiejska 15/19, 00-665 Warsaw, Poland
Interests: computational techniques in pattern recognition, artificial intelligence, and machine learning, and their application to image and speech analysis; robot vision; biometric techniques

Special Issue Information

Dear Colleagues,

The rapid development of imaging sensor technology has, among other factors, been responsible for the recent improvement and technological readiness of various human-computer interaction (HCI) systems, especially those taking the form of human-machine interfaces and human assistance systems. HCI techniques have already found numerous application fields, such as car-driver assistance systems, service and social robots, medical and healthcare systems, sport training assistance, and special communication modes for handicapped and elderly people. The price, size, and power requirements of image sensors and digital cameras are steadily falling, creating new opportunities for machine learning techniques applied in computer vision systems. The miniaturisation of vision sensors and the improved design of high-resolution and high-speed RGB-D cameras significantly stimulate the collection of huge volumes of digital image data. Computer vision algorithms benefit greatly from this process since, alongside classic signal processing and pattern recognition techniques, machine learning techniques can now be realistically applied, leading to new, robust solutions to human-centred image analysis tasks.

In this Special Issue, we are particularly interested in system architectures and computational techniques applied to human-computer interaction that benefit from modern vision sensors and cameras. From the methodological point of view, the focus is on combining classical pattern recognition and deep learning techniques to create new computational paradigms for typical tasks in visual human-machine interaction, such as human pose detection, dynamic gesture recognition, hand and body sign recognition, eye attention tracking, and face emotion recognition. On the practical side, we are looking for hardware and software components, prototypes, and demonstrators of smart human-computer interaction systems in various application fields. Topics of interest include but are not limited to the following:

  • Human-machine interfaces;
  • Human assistance;
  • Imaging sensors;
  • RGB-D cameras;
  • Image data collection and annotation;
  • Human pose detection;
  • Human gesture recognition;
  • Eye tracking;
  • Face emotion recognition;
  • Sign and body language recognition;
  • Vision-based human-computer interactions (VHCI);
  • Signal processing and pattern recognition in VHCI;
  • Deep learning techniques in VHCI;
  • Computational paradigms and system architectures for smart VHCI;
  • Hardware and software of smart VHCI;
  • Prototypes and demonstrators of smart VHCI;
  • Applications of smart VHCI.

Prof. Dr. Włodzimierz Kasprzak
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. The Special Issue runs on a continuous submission model. Authors may submit their papers at any time. All papers will be peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Published Papers (6 papers)


Research

Open Access Article
Context Aware Video Caption Generation with Consecutive Differentiable Neural Computer
Electronics 2020, 9(7), 1162; https://doi.org/10.3390/electronics9071162 - 17 Jul 2020
Cited by 1 | Viewed by 866
Abstract
Recent video captioning models aim at describing all events in a long video. However, their event descriptions do not fully exploit the contextual information included in a video because they lack the ability to remember information changes over time. To address this problem, we propose a novel context-aware video captioning model that generates natural language descriptions based on improved video context understanding. We introduce an external memory, the differentiable neural computer (DNC), to improve video context understanding. The DNC naturally learns to use its internal memory for context understanding and also provides the contents of its memory as an output for additional connections. By sequentially connecting DNC-based caption models (DNC-augmented LSTMs) through this memory information, our consecutively connected DNC architecture can understand the context in a video without explicitly searching for event-wise correlation. Our consecutive DNC is sequentially trained with its language model (LSTM) for each video clip to generate context-aware captions with superior quality. In experiments, we demonstrate that our model provides more natural and coherent captions which reflect previous contextual information. Our model also shows superior quantitative performance on video captioning in terms of BLEU-4 (4.37), METEOR (9.57), and CIDEr-D (28.08). Full article
(This article belongs to the Special Issue Computer Vision and Machine Learning in Human-Computer Interaction)
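The key idea above, letting the memory state of one clip's captioner seed the captioner for the next clip, can be sketched in a few lines. Below is a minimal, hypothetical PyTorch illustration in which a plain LSTM stands in for the full differentiable neural computer; the feature dimension, vocabulary size, and clip count are illustrative assumptions, not the authors' settings.

```python
import torch
import torch.nn as nn

class ClipCaptioner(nn.Module):
    """Per-clip captioner whose recurrent state can be handed to the next clip."""
    def __init__(self, feat_dim=512, hidden_dim=256, vocab_size=1000):
        super().__init__()
        self.encoder = nn.Linear(feat_dim, hidden_dim)    # project clip features
        self.lstm = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        self.decoder = nn.Linear(hidden_dim, vocab_size)  # word logits per time step

    def forward(self, clip_feats, memory=None):
        # clip_feats: (batch, frames, feat_dim); memory: (h, c) carried over from the previous clip
        x = self.encoder(clip_feats)
        out, memory = self.lstm(x, memory)
        return self.decoder(out), memory

captioner = ClipCaptioner()
clips = [torch.randn(1, 20, 512) for _ in range(3)]  # three consecutive clips of one video
memory = None
for clip in clips:
    logits, memory = captioner(clip, memory)          # memory carries earlier context forward
```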

Open Access Article
Woven Fabric Pattern Recognition and Classification Based on Deep Convolutional Neural Networks
Electronics 2020, 9(6), 1048; https://doi.org/10.3390/electronics9061048 - 24 Jun 2020
Cited by 3 | Viewed by 1193
Abstract
The weave pattern (texture) of woven fabric is considered an important factor in the design and production of high-quality fabric. Traditionally, the recognition of woven fabric poses many challenges because it relies on manual visual inspection. Moreover, approaches based on early machine learning algorithms depend directly on handcrafted features, which are time-consuming and error-prone to extract. Hence, an automated system is needed for the classification of woven fabric to improve productivity. In this paper, we propose a deep learning model based on data augmentation and a transfer learning approach for the classification and recognition of woven fabrics. The model uses a residual network (ResNet), in which fabric texture features are extracted and classified automatically in an end-to-end fashion. We evaluated the results of our model using metrics such as accuracy, balanced accuracy, and F1-score. The experimental results show that the proposed model is robust and achieves state-of-the-art accuracy even when the physical properties of the fabric are changed. We compared our results with other baseline approaches and a pretrained VGGNet deep learning model, which showed that the proposed method achieved higher accuracy when rotational orientations in the fabric and proper lighting effects were considered. Full article
(This article belongs to the Special Issue Computer Vision and Machine Learning in Human-Computer Interaction)
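For readers unfamiliar with the transfer-learning setup described above, the following is a hedged sketch of one common way to realise it: a pretrained ResNet with its final layer replaced for the fabric classes, plus augmentation covering rotation and lighting changes. The class count, transforms, and choice of ResNet-50 are assumptions, not the paper's exact configuration.

```python
import torch.nn as nn
from torchvision import models, transforms

num_fabric_classes = 3  # e.g., plain, twill, and satin weaves (assumed)

train_transform = transforms.Compose([
    transforms.RandomRotation(30),           # tolerate different rotational orientations
    transforms.ColorJitter(brightness=0.2),  # tolerate lighting changes
    transforms.RandomResizedCrop(224),
    transforms.ToTensor(),
])

model = models.resnet50(pretrained=True)                        # ImageNet weights
model.fc = nn.Linear(model.fc.in_features, num_fabric_classes)  # new classification head
# The model is then fine-tuned with a standard cross-entropy loop on the augmented fabric images.
```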

Open Access Feature Paper Article
Utilisation of Embodied Agents in the Design of Smart Human–Computer Interfaces—A Case Study in Cyberspace Event Visualisation Control
Electronics 2020, 9(6), 976; https://doi.org/10.3390/electronics9060976 - 11 Jun 2020
Viewed by 843
Abstract
The goal of the research reported here was to investigate whether the design methodology utilising embodied agents can be applied to produce a multi-modal human–computer interface for cyberspace events visualisation control. This methodology requires that the designed system structure be defined in terms of cooperating agents having well-defined internal components exhibiting specified behaviours. System activities are defined in terms of finite state machines and behaviours parameterised by transition functions. In the investigated case, the multi-modal interface is a component of the Operational Centre, which is a part of the National Cybersecurity Platform. Embodied agents have been successfully used in the design of robotic systems. However, robots operate in physical environments, while cyberspace events visualisation involves cyberspace; thus, the applied design methodology required a different definition of the environment. It had to encompass the physical environment in which the operator acts and the computer screen where the results of those actions are presented. Smart human–computer interaction (HCI) is a time-aware, dynamic process in which two parties communicate via different modalities, e.g., voice, gesture, and eye movement. The use of computer vision and machine intelligence techniques is essential when the human is carrying out an exhausting and concentration-demanding activity. The main role of this interface is to support security analysts and operators controlling the visualisation of cyberspace events, such as incidents or cyber attacks, especially when manipulating graphical information. Visualisation control modalities include visual gesture- and voice-based commands. Full article
(This article belongs to the Special Issue Computer Vision and Machine Learning in Human-Computer Interaction)
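The agent-and-FSM specification style mentioned above can be illustrated with a small, hypothetical sketch: an interface agent whose activity is a finite state machine, with a transition function mapping (state, event) pairs to a next state and a behaviour. The states, events, and behaviours below are illustrative only and are not taken from the paper.

```python
class InterfaceAgent:
    def __init__(self):
        self.state = "idle"
        # transition function: (state, event) -> (next_state, behaviour)
        self.transitions = {
            ("idle", "wake_word"):     ("listening", self.start_speech_recognition),
            ("idle", "hand_detected"): ("tracking",  self.start_gesture_tracking),
            ("listening", "command"):  ("idle",      self.execute_visualisation_command),
            ("tracking", "gesture"):   ("idle",      self.execute_visualisation_command),
        }

    def handle(self, event, payload=None):
        key = (self.state, event)
        if key in self.transitions:
            self.state, behaviour = self.transitions[key]
            behaviour(payload)

    def start_speech_recognition(self, _):
        print("listening for a voice command")

    def start_gesture_tracking(self, _):
        print("tracking hand gestures")

    def execute_visualisation_command(self, cmd):
        print(f"visualisation control: {cmd}")

agent = InterfaceAgent()
agent.handle("wake_word")
agent.handle("command", "zoom in on incident map")
```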

Open Access Article
Deep Neural Network Based Ambient Airflow Control through Spatial Learning
Electronics 2020, 9(4), 591; https://doi.org/10.3390/electronics9040591 - 31 Mar 2020
Viewed by 1448
Abstract
As global energy regulations are strengthened, improving the energy efficiency of electronic appliances while maintaining their performance is becoming more important. Especially in air conditioning, energy efficiency can be maximized by adaptively controlling the airflow based on detected human locations; however, several limitations, such as the detection area, the installation environment, sensor quantity, and real-time performance, which stem from the constraints of the embedded system, make this a challenging problem. In this study, using a low-resolution, cost-effective vision sensor, the environmental information of living spaces and the real-time locations of humans are learned through a deep learning algorithm to identify the living area within the entire indoor space. Based on this information, we improve the performance and the energy efficiency of the air conditioner by smartly controlling the airflow over the identified living area. In experiments, our deep-learning-based spatial classification algorithm shows an error of less than ±5°. In addition, the target temperature can be reached 19.8% faster, and up to 20.5% of power consumption can be saved by the time the target temperature is achieved. Full article
(This article belongs to the Special Issue Computer Vision and Machine Learning in Human-Computer Interaction)
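A hedged sketch of the kind of pipeline the abstract describes is given below: a small CNN maps a low-resolution sensor image to a discretised airflow direction, which the air conditioner can then aim at the detected living area. The input resolution, the number of direction bins, and the network itself are assumptions rather than the paper's model.

```python
import torch
import torch.nn as nn

class AirflowDirectionNet(nn.Module):
    def __init__(self, num_direction_bins=36):  # 10-degree bins over 360 degrees (assumed)
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 8 * 8, num_direction_bins)

    def forward(self, x):            # x: (batch, 1, 32, 32) low-resolution frame
        x = self.features(x).flatten(1)
        return self.classifier(x)    # logits over discretised airflow directions

net = AirflowDirectionNet()
frame = torch.randn(1, 1, 32, 32)         # one frame from the vision sensor
direction_bin = net(frame).argmax(dim=1)  # bin index to convert into a vane angle
```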

Open Access Article
A Multi-Feature Representation of Skeleton Sequences for Human Interaction Recognition
Electronics 2020, 9(1), 187; https://doi.org/10.3390/electronics9010187 - 19 Jan 2020
Cited by 1 | Viewed by 900
Abstract
Inspired by the promising performance achieved by recurrent neural networks (RNNs) and convolutional neural networks (CNNs) in skeleton-based action recognition, this paper presents a deep network structure which combines a CNN for classification with an RNN that realises an attention mechanism for human interaction recognition. Specifically, the attention module in this structure is used to assign different levels of attention to different frames through different weights, and the CNN is employed to extract the high-level spatial and temporal information of the skeleton data. These two modules seamlessly form a single network architecture. In addition, to eliminate the impact of different locations and orientations, a coordinate transformation is conducted from the original coordinate system to a human-centric coordinate system. Furthermore, three different features are extracted from the skeleton data as the inputs of three subnetworks, respectively. Eventually, these subnetworks, fed with different features, are fused into an integrated network. The experimental results show the validity of the proposed approach on two widely used human interaction datasets. Full article
(This article belongs to the Special Issue Computer Vision and Machine Learning in Human-Computer Interaction)
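The human-centric coordinate transformation mentioned above is a standard preprocessing step and can be sketched as follows: joints are translated so that the body centre becomes the origin and rotated so that the hip axis aligns with the x-axis, removing the influence of global location and orientation. The joint indices and the 25-joint skeleton below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def to_human_centric(joints, left_hip=0, right_hip=1):
    # joints: (num_joints, 3) array of 3D coordinates in the camera frame
    centre = joints.mean(axis=0)
    centred = joints - centre                   # remove global location
    hip_vec = centred[right_hip] - centred[left_hip]
    angle = np.arctan2(hip_vec[1], hip_vec[0])  # rotation about the z-axis
    c, s = np.cos(-angle), np.sin(-angle)
    rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return centred @ rot.T                      # remove global orientation

skeleton = np.random.rand(25, 3)                # e.g., a 25-joint Kinect skeleton
normalised = to_human_centric(skeleton)
```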

Open Access Article
Fusion of 2D CNN and 3D DenseNet for Dynamic Gesture Recognition
Electronics 2019, 8(12), 1511; https://doi.org/10.3390/electronics8121511 - 9 Dec 2019
Cited by 3 | Viewed by 1157
Abstract
Gesture recognition has been applied in many fields as it is a natural human–computer communication method. However, recognition of dynamic gestures is still a challenging topic because of complex disturbance information and motion information. In this paper, we propose an effective dynamic gesture recognition method that fuses the prediction results of a two-dimensional (2D) motion representation convolutional neural network (CNN) model and a three-dimensional (3D) dense convolutional network (DenseNet) model. Firstly, to obtain a compact and discriminative gesture motion representation, the motion history image (MHI) and a pseudo-coloring technique were employed to integrate the spatiotemporal motion sequences into a single frame image before it was fed into a 2D CNN model for gesture classification. Next, the proposed 3D DenseNet model was used to extract spatiotemporal features directly from Red, Green, Blue (RGB) gesture videos. Finally, the prediction results of the proposed 2D and 3D deep models were blended together to boost recognition performance. The experimental results on two public datasets demonstrate the effectiveness of our proposed method. Full article
(This article belongs to the Special Issue Computer Vision and Machine Learning in Human-Computer Interaction)
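Two of the ingredients described above lend themselves to a short illustration: compressing a gesture clip into a motion history image for the 2D CNN, and blending the score vectors of the 2D and 3D models. The sketch below uses OpenCV; the decay parameter, motion threshold, colour map, and equal-weight fusion are assumptions, not the paper's settings.

```python
import cv2
import numpy as np

def motion_history_image(frames, tau=15):
    # frames: list of grayscale uint8 frames of one gesture clip
    mhi = np.zeros(frames[0].shape, dtype=np.float32)
    for prev, curr in zip(frames, frames[1:]):
        motion = cv2.absdiff(curr, prev) > 25                 # simple motion mask
        mhi = np.where(motion, tau, np.maximum(mhi - 1, 0))   # recent motion stays brightest
    mhi_uint8 = cv2.normalize(mhi, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    return cv2.applyColorMap(mhi_uint8, cv2.COLORMAP_JET)     # pseudo-colouring for the 2D CNN

def fuse_predictions(scores_2d, scores_3d, alpha=0.5):
    # late fusion of the softmax scores from the 2D CNN and the 3D DenseNet
    return alpha * scores_2d + (1 - alpha) * scores_3d
```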

Planned Papers

The list below represents only planned manuscripts. Some of these manuscripts have not been received by the Editorial Office yet. Papers submitted to MDPI journals are subject to peer review.

Planned Paper I

Title: Embodied Agent Framework for Designing Smart Human-Machine Interaction - A Case Study in Cyberspace Events Visualisation Control

Authors: Wojciech Szynkiewicz, Cezary Zieliński, Włodzimierz Kasprzak, Wojciech Dudek, Maciej Stefanczyk and Maksym Figat

Affiliation: Warsaw University of Technology, Institute of Control and Computation Engineering, ul. Nowowiejska 15/19, 00-665 Warszawa, Poland

*Correspondence: [email protected]; Tel.: +48-22-234-7632, +48-22-234-7397

Abstract: Smart human-computer interaction (HCI) is a time-aware, dynamic process in which two parties communicate via different modalities, e.g., voice, gesture, and eye movement. The use of computer vision and machine intelligence techniques is essential when the human is carrying out an exhausting and concentration-demanding activity. Smart HCI is a typical requirement for robotic systems, especially for social robots that act autonomously while communicating with human users. Thus, similarities between robot control system design and smart HCI design can be sought. The goal of this paper is to apply the embodied agent framework to HCI system design. The system's structure is defined in terms of cooperating agents having well-defined internal components and behaviours. System activities are defined in terms of finite state machines and transition functions. In social robotics, this approach has proved very useful in the control system specification phase and has supported the system implementation stage well. The case study deals with a multimodal human-computer interface for cyberspace events visualisation control. The multimodal interface is a component of the Operational Centre, which is a part of the National Cybersecurity Platform. Cyberspace and its underlying infrastructure are vulnerable to a broad range of risks stemming from diverse cyber threats. The main role of this interface is to support security analysts and operators controlling the visualisation of cyberspace events, such as incidents or cyber attacks, especially when manipulating graphical information. The main visualisation control modalities are visual gesture-based and voice-based commands. Thus, the design and implementation of the gesture recognition and speech recognition functions are presented. The security requirements of the Operational Centre allow particular commands to be issued only by trusted, registered users. Thus, two additional functions for human identification are implemented: face recognition and speaker identification.

Keywords: embodied agent framework; gesture/face recognition; speech/speaker recognition; event visualisation control

 

Planned Paper II

Title: Painting quality monitoring for fine-grained motor skill assessment

Authors: Marcin Grzegorzek; Yoji Ochi

Email: [email protected]; [email protected]

Abstract: The purpose of this study is to contribute to the design of applications that provide objective metrics, using computer vision technology, in the examination of fine-grained motor skills of young children. Specifically, we focused on painting, which is one of the fine-grained motor skills, and developed a system that monitors the painting state in real time and supports evaluation of its quality. The system targets any colouring material on real paper so that it can be introduced into a kindergarten test, and it provides metrics of the learner's painting quality by camera monitoring and mathematical morphology.
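As a rough illustration of the camera-monitoring and mathematical-morphology metrics the abstract mentions, the hypothetical sketch below binarises a frame of the colouring sheet, cleans it with a morphological opening, and estimates how much of the target region is filled and how much paint strays outside it. The thresholds, kernel size, and metric definitions are assumptions, not the authors' method.

```python
import cv2
import numpy as np

def painting_metrics(frame_gray, region_mask):
    # frame_gray: uint8 camera image; region_mask: uint8 mask (255 inside the target outline)
    _, painted = cv2.threshold(frame_gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    kernel = np.ones((3, 3), np.uint8)
    painted = cv2.morphologyEx(painted, cv2.MORPH_OPEN, kernel)  # remove small specks
    inside = cv2.bitwise_and(painted, region_mask)
    outside = cv2.bitwise_and(painted, cv2.bitwise_not(region_mask))
    coverage = inside.sum() / max(region_mask.sum(), 1)          # fraction of the region filled
    overshoot = outside.sum() / max(painted.sum(), 1)            # fraction of paint outside the region
    return coverage, overshoot
```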
