The past decade has witnessed rapid growth in the number of motion capture applications, ranging from sports science and motion analysis to motion-based video games and movies [1]. Broadly defined, motion capture is the process of recording human movement; in computer animation, it refers to recording the actions of human actors and using that information to animate digital character models in 2D or 3D sequences. Over the same period, Korean pop (K-pop) music has become popular throughout the world. K-pop is a musical genre originating in South Korea that is characterized by a wide variety of audiovisual elements. Although the term covers all genres of popular music in South Korea, it is more often used in a narrower sense to describe a modern form of South Korean pop music spanning a range of styles, including dance-pop, pop ballads, electro-pop, rock, jazz, and hip-hop. One possible reason for the global popularity of K-pop is that aspiring dancers may view skilled young K-pop dancers as role models and copy their dance styles. Such copying can lead to plagiarism issues in both dance and music, which is our main motivation for classifying K-pop dance movements, with a view to developing both video-based retrieval systems and dance training systems.
There are three main types of motion capture systems: optical systems, non-optical systems, and markerless systems. Optical systems use data captured by optical sensors to detect the 3D positions of a subject located between two or more cameras that are calibrated to provide overlapping projections. Data acquisition is traditionally implemented by attaching special markers to the actor. Optical capture systems are used with several types of markers, including passive markers, active markers, time-modulated active markers, and semi-passive imperceptible markers. Non-optical capture systems include inertial systems, mechanical motion systems, and magnetic systems. Among these, inertial motion capture is the best known. Inertial motion capture technology combines inertial sensors, biomechanical models, and sensor fusion algorithms. Inertial motion-sensor data are often transmitted wirelessly to a computer, where the motion is recorded or viewed. Finally, markerless approaches to motion capture are developing rapidly in the area of computer vision. Markerless systems do not require subjects to wear special equipment for tracking. Several studies related to markerless systems have analyzed motion data obtained from the well-known Kinect sensor [6
In this paper, we focus on a markerless capture method that uses the skeletal joint data of human motion, acquired with a Kinect camera in a motion-capture studio environment, to classify K-pop dance movements. Previous work has focused on ballet analysis [16], video recommendation based on dance styles [18], dance pose estimation [19], dance animation [21], and e-learning of dance [22]. While some ballet movements and dance pose estimation have been studied from various perspectives [16], no study has yet used Kinect sensors to analyze K-pop dance movements with the aim of addressing dance plagiarism. To this end, we construct a K-pop dance database from the motions of professional dancers. The process of dance movement classification comprises feature extraction, dimensionality reduction, and, finally, the classification itself. In the first step, features are extracted from the 25 joint positions (markers) of the skeletal data. We use six features representing the important motion angles in each frame. These per-frame features are concatenated across all of the frames to form a single feature vector. Next, a combination of principal component analysis (PCA) [27] and linear discriminant analysis (LDA) [28], referred to in this paper as “fisherdance”, is applied to reduce the dimensionality of the dance movements. In the last step, an extreme learning machine classifier (ELMC) is designed with a rectified linear unit (ReLU) activation function. The ReLU-based ELMC offers high accuracy, requires little user intervention, and learns in real time, within seconds or milliseconds. Conventional ELMs have homogeneous architectures for compression, feature learning, clustering, regression, and classification. Research has been conducted on the use of ELMs in various applications, including image super-resolution [29], real operation of wind farms [30], electricity price forecasting [31], remote control of a robotic hand [32], human action recognition [33], and 3D shape segmentation and labeling [34]. A considerable number of studies have been conducted on ELM variants [35]. The results of experiments performed on the constructed database demonstrate that the proposed method achieves better classification performance than the methods employed in these studies.
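As a rough illustration of the first and last steps described above, the following sketch computes per-frame joint angles, concatenates them over a sequence, and trains a small ELM with a ReLU hidden layer. It is a minimal sketch under stated assumptions, not the implementation used in this paper: the joint triplets, the hidden-layer size, the ridge term, and all function names (`joint_angle`, `sequence_features`, `train_elm`, `predict_elm`) are hypothetical, the output weights are obtained with a commonly used ridge-regularized least-squares formulation, and the PCA plus LDA ("fisherdance") reduction step is omitted for brevity.

```python
import math
import random


def joint_angle(a, b, c):
    """Angle in degrees at joint b between segments b->a and b->c.
    Assumes the three 3D points are distinct (non-zero segment lengths)."""
    u = [a[i] - b[i] for i in range(3)]
    v = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(x * x for x in v))
    cos = max(-1.0, min(1.0, dot / norm))  # clamp against rounding error
    return math.degrees(math.acos(cos))


def sequence_features(frames, triplets):
    """Concatenate the angles of every frame into one feature vector.
    `triplets` lists hypothetical (i, j, k) joint indices, one per angle."""
    return [joint_angle(f[i], f[j], f[k]) for f in frames for (i, j, k) in triplets]


def hidden(x, W, b):
    """ReLU hidden-layer activations for one input vector."""
    return [max(0.0, sum(wi * xi for wi, xi in zip(w, x)) + bj)
            for w, bj in zip(W, b)]


def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [x / p for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0.0:
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]


def train_elm(X, labels, n_classes, n_hidden=20, ridge=1e-3, seed=0):
    """Random, fixed hidden layer; only the output weights are fitted."""
    rng = random.Random(seed)
    d = len(X[0])
    W = [[rng.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(n_hidden)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]
    H = [hidden(x, W, b) for x in X]
    T = [[1.0 if k == y else 0.0 for k in range(n_classes)] for y in labels]
    # Normal equations: (H^T H + ridge * I) beta = H^T T
    HtH = [[sum(h[i] * h[j] for h in H) + (ridge if i == j else 0.0)
            for j in range(n_hidden)] for i in range(n_hidden)]
    HtT = [[sum(h[i] * t[k] for h, t in zip(H, T)) for k in range(n_classes)]
           for i in range(n_hidden)]
    beta = solve(HtH, HtT)
    return W, b, beta


def predict_elm(model, x):
    """Return the index of the class with the largest output score."""
    W, b, beta = model
    h = hidden(x, W, b)
    scores = [sum(h[i] * beta[i][k] for i in range(len(h)))
              for k in range(len(beta[0]))]
    return max(range(len(scores)), key=scores.__getitem__)
```

In practice, the angle features would typically be rescaled (for example, divided by 180) and reduced in dimensionality before being passed to `train_elm`; because the hidden weights are random and fixed, training reduces to a single linear solve, which is what makes ELM learning fast.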
This paper is organized as follows. Section 2 describes the generation of the concatenated vectors from the six core angles of each frame, as well as the dimensionality reduction method used in this study. Section 3 describes the techniques used for dance movement classification with the ReLU-ELMC. Section 4 covers the results of simulations performed on the K-pop dance databases available at the Electronics and Telecommunications Research Institute (ETRI). Finally, Section 5 presents our concluding comments.