The purpose of this work is to develop computational intelligence models based on neural networks (NN), fuzzy models (FM), support vector machines (SVM), and long short-term memory (LSTM) networks to predict human pose and activity from image sequences, using computer vision approaches to gather the required features. To obtain the human pose semantics (the output classes) from a set of 3D points that describe the human body model (the input variables of the predictive model), prediction models were learned from acquired data such as video images. In the same way, to predict the semantics of the atomic activities that compose an activity, based again on the human body model extracted at each video frame, prediction models were learned using LSTM networks. In both cases, the best learned models were implemented in an application to test the systems. The SVM model achieved 95.97% correct classification across the six human poses tackled in this work, in tests conducted under conditions different from those of the training phase. The implemented LSTM model achieved an overall accuracy of 88%, likewise in tests conducted under conditions different from those of the training phase. These results demonstrate the validity of both approaches for predicting human pose and activity from image sequences. Moreover, the system is capable of identifying the atomic activities and quantifying the time interval in which each activity takes place.
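As an illustration of the last point, the following minimal sketch shows one way per-frame activity predictions could be turned into timed intervals. It is not the implementation used in this work: the function name `activity_intervals`, the activity labels, and the frame rate of 30 frames per second are all assumptions chosen for the example.

```python
from itertools import groupby


def activity_intervals(frame_labels, fps=30.0):
    """Collapse per-frame labels into (label, start_s, end_s) segments.

    frame_labels: one predicted label per video frame (hypothetical input).
    fps: assumed frame rate used to convert frame indices to seconds.
    """
    intervals = []
    start = 0  # index of the first frame of the current run
    for label, run in groupby(frame_labels):
        length = sum(1 for _ in run)  # number of consecutive frames with this label
        end = start + length
        intervals.append((label, start / fps, end / fps))
        start = end
    return intervals


# Hypothetical per-frame output of a classifier; the labels are placeholders.
labels = ["walk"] * 60 + ["sit"] * 30 + ["walk"] * 15
for label, t0, t1 in activity_intervals(labels, fps=30.0):
    print(f"{label}: {t0:.2f} s to {t1:.2f} s")
```

Consecutive frames with the same label are merged into one segment, so a label that recurs later in the sequence yields a separate interval.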
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.