This paper presents a multimodal emotion recognition method that uses a feature-level combination of three-dimensional (3D) geometric features (coordinates, distances, and angles of joints), kinematic features such as velocity and displacement of joints, and features extracted from daily behavioral patterns, such as the frequency of head nods, hand waves, and body gestures that represent specific emotions. Head, face, hand, body, and speech data were captured from 15 participants using an infrared sensor (Microsoft Kinect). The 3D geometric and kinematic features were derived from raw feature data in the visual channel. Human emotional behavior-based features were developed using inter-annotator agreement and commonly observed expressions, movements, and postures associated with specific emotions. The features from each modality and the behavioral pattern-based features (e.g., head shake, arm retraction, and forward body movement depicting anger) were combined to train the multimodal classifier for the emotion recognition system. The classifier was trained with a support vector machine (SVM) under 10-fold cross-validation to predict six basic emotions. The results showed improved emotion recognition accuracy (precision increased by 3.28% and recall by 3.17%) when the 3D geometric, kinematic, and human behavioral pattern-based features were combined for multimodal emotion recognition using supervised classification.
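The fusion pipeline described above can be sketched in a minimal form: feature-level combination here means concatenating the per-sample geometric, kinematic, and behavioral feature vectors into a single vector before classification. The sketch below uses scikit-learn with synthetic placeholder data; the feature dimensions, sample counts, and label layout are illustrative assumptions, not the authors' actual dataset or pipeline.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
n_samples = 90  # hypothetical: 15 participants x 6 emotions

# Hypothetical per-sample feature vectors for each feature group
geometric = rng.normal(size=(n_samples, 12))   # 3D joint coordinates, distances, angles
kinematic = rng.normal(size=(n_samples, 6))    # joint velocities and displacements
behavioral = rng.integers(0, 5, size=(n_samples, 4)).astype(float)  # gesture frequencies

# Feature-level fusion: concatenate all feature groups into one vector per sample
X = np.hstack([geometric, kinematic, behavioral])
y = np.repeat(np.arange(6), 15)  # six basic emotion labels, 15 samples each

# SVM classifier evaluated with 10-fold cross-validation, as in the paper
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
scores = cross_val_score(clf, X, y, cv=10)
print(f"mean accuracy: {scores.mean():.3f}")
```

With real data, the random arrays would be replaced by features computed from Kinect skeleton tracking, and per-class precision and recall could be reported via `sklearn.metrics.classification_report`.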
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.