Article

Person Localization Model Based on a Fusion of Acoustic and Visual Inputs

Leon Koren, Tomislav Stipancic, Andrija Ricko and Luka Orsag

Faculty of Mechanical Engineering and Naval Architecture, University of Zagreb, Ivana Lucica 5, 10000 Zagreb, Croatia

* Author to whom correspondence should be addressed.
Academic Editor: Cheng Siong Chin
Electronics 2022, 11(3), 440; https://doi.org/10.3390/electronics11030440
Received: 30 December 2021 / Revised: 25 January 2022 / Accepted: 29 January 2022 / Published: 1 February 2022
(This article belongs to the Special Issue Neural Networks in Robot-Related Applications)
PLEA is an interactive, biomimetic robotic head with non-verbal communication capabilities. PLEA reasoning is based on a multimodal approach combining video and audio inputs to determine the current emotional state of a person. PLEA expresses emotions using facial expressions generated in real time, which are projected onto a 3D face surface. In this paper, a more sophisticated computation mechanism is developed and evaluated. The model for audio-visual person separation can locate a talking person in a crowded place by combining input from a ResNet network with input from a hand-crafted algorithm. The first input is used to find human faces in the room, and the second input is used to determine the direction of the sound and to focus attention on a single person. After an information fusion procedure is performed, the face of the person speaking is matched with the corresponding sound direction. As a result of this procedure, the robot can initiate an interaction with that person based on non-verbal signals. The model was tested and evaluated under laboratory conditions through interactions with users. The results suggest that the methodology can be used efficiently to focus a robot's attention on a localized person.
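The abstract does not specify how the visual and acoustic estimates are fused, so the following is a minimal illustrative sketch in Python of one plausible scheme: a two-microphone GCC-PHAT direction-of-arrival estimate is matched against the azimuth of each face bounding box (assumed to come from an upstream ResNet-based detector). All constants and names here (MIC_DISTANCE, CAMERA_HFOV, match_speaker, the (x, y, w, h) bounding-box format) are assumptions for illustration, not the authors' implementation.

import numpy as np

SPEED_OF_SOUND = 343.0  # m/s
MIC_DISTANCE = 0.2      # m, assumed spacing of the microphone pair
SAMPLE_RATE = 16000     # Hz, assumed audio sampling rate
CAMERA_HFOV = 60.0      # degrees, assumed horizontal field of view
IMAGE_WIDTH = 640       # px, assumed camera frame width

def gcc_phat(sig_l, sig_r):
    """Estimate the inter-microphone time delay (s) with GCC-PHAT."""
    n = len(sig_l) + len(sig_r)
    SL = np.fft.rfft(sig_l, n=n)
    SR = np.fft.rfft(sig_r, n=n)
    R = SL * np.conj(SR)
    R /= np.abs(R) + 1e-12               # phase transform weighting
    cc = np.fft.irfft(R, n=n)
    max_shift = n // 2
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (np.argmax(np.abs(cc)) - max_shift) / SAMPLE_RATE

def doa_azimuth(delay):
    """Convert a time delay to a sound-source azimuth in degrees."""
    sin_az = np.clip(delay * SPEED_OF_SOUND / MIC_DISTANCE, -1.0, 1.0)
    return np.degrees(np.arcsin(sin_az))

def face_azimuth(bbox):
    """Map a face bounding box (x, y, w, h) in pixels to an azimuth."""
    cx = bbox[0] + bbox[2] / 2.0
    return (cx / IMAGE_WIDTH - 0.5) * CAMERA_HFOV

def match_speaker(face_bboxes, sig_l, sig_r, max_error_deg=10.0):
    """Return the index of the face closest to the acoustic direction."""
    sound_az = doa_azimuth(gcc_phat(sig_l, sig_r))
    errors = [abs(face_azimuth(b) - sound_az) for b in face_bboxes]
    best = int(np.argmin(errors))
    return best if errors[best] <= max_error_deg else None

Under these assumptions, match_speaker(faces, left_channel, right_channel) would return the index of the detected face whose visual azimuth best agrees with the acoustic estimate, or None when no face lies within the angular tolerance, at which point the robot could direct its attention to that person.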
Keywords: spatial location; residual neural network; digital filter; person separation; cognitive robotics; multimodal signal processing; sensors; HRI
MDPI and ACS Style

Koren, L.; Stipancic, T.; Ricko, A.; Orsag, L. Person Localization Model Based on a Fusion of Acoustic and Visual Inputs. Electronics 2022, 11, 440. https://doi.org/10.3390/electronics11030440

AMA Style

Koren L, Stipancic T, Ricko A, Orsag L. Person Localization Model Based on a Fusion of Acoustic and Visual Inputs. Electronics. 2022; 11(3):440. https://doi.org/10.3390/electronics11030440

Chicago/Turabian Style

Koren, Leon, Tomislav Stipancic, Andrija Ricko, and Luka Orsag. 2022. "Person Localization Model Based on a Fusion of Acoustic and Visual Inputs" Electronics 11, no. 3: 440. https://doi.org/10.3390/electronics11030440
