1. Introduction
Exercise and physical activity have well-documented mental and physical health benefits [1,2]. People who engage in regular physical activity are healthier, have a better mood, are less prone to several chronic diseases (e.g., cardiovascular disease, diabetes, cancer, hypertension, obesity, and depression), and live longer than those with a sedentary lifestyle. Consequently, active daily living is recommended for people of all ages [2]. Unfortunately, despite these numerous benefits, most people find it challenging to stay motivated and to adhere to a regular workout schedule [3]. Indeed, people easily lose self-motivation, and, at least for beginners, proper physical exercise requires training.
Researchers and exercise therapists have proposed numerous strategies to improve adherence to a regular exercise schedule [4]. These include, among other things, encouraging people to be physically active and creating an environment that makes it easier for people to be physically active in their homes. For example, the authors of [5] used smartphone data to develop a fitness assistant framework that automatically generates a fitness schedule. The framework also incorporates social interaction to increase user engagement. The most advanced state-of-the-art technology aims to serve as a substitute for a personal trainer. For instance, FitCoach [6] is a virtual fitness coach that uses wearable devices to assess the motion patterns and position of its users during workouts, in order to help them achieve an effective workout and to prevent workout injuries. Extensive experiments, in both indoor and outdoor conditions, have shown that FitCoach can assess its users' workouts and provide adequate recommendations with good accuracy.
In our previous work, we introduced a method that provides accurate real-time segmentation, classification, and counting of both indoor and outdoor physical exercises from the signal of a single inertial measurement unit (IMU) worn on the chest [7]. Targeting five types of exercises, the proposed segmentation algorithm achieved 98% precision and 94% recall, while the proposed classification method achieved 97% precision and 93% recall. We demonstrated the flexibility of the proposed method by developing a virtual reality dodgeball application [8]. The application uses a wrist-mounted IMU and a head-mounted display (HMD), and implements the ExerSense algorithm to detect a ball-throwing gesture toward a target in virtual space (Figure 1). This paper is an extension that improves the motion detection method and demonstrates its robustness to various sensor-wearing positions.
Recently, IMU sensors have become more widely adopted for physical activity recognition [9,10,11]. Some IMU-based systems (e.g., [9]) are used for step counting and walking detection to encourage users to increase their ambulatory physical activity. Other methods (e.g., [10]) automatically recognize various walking workouts (e.g., walking and brisk walking). Finally, advanced IMU-based systems (e.g., [12,13]) aim to bypass the need for personal trainers altogether: they monitor users during exercise, classify their exercise technique, and provide feedback to improve their workout. Compared to existing research, the proposed approach provides the three following practical enhancements. First, most existing approaches have practical limitations. For example, methods for outdoor physical activity recognition are usually based on frequency analysis; since the number of cycles is large, a few misclassifications are tolerable, but such errors are not tolerable for plyometric exercises. The proposed method works well both for short-term cyclic movement exercises (e.g., push-ups) and for long-term cyclic quick-movement exercises (e.g., running and walking). Second, unlike comparable machine-learning-based approaches that need large amounts of training data, the proposed method needs only one sample of motion data for each target exercise and still performs reasonably well. Finally, although not yet validated, the proposed approach also has the potential to evaluate the quality of the workout.
2. Related Work
2.1. Behavior Recognition and Step Counting from Wearables
Step counting has been extensively studied in the ubiquitous computing community. Many works have proposed algorithms to accurately count walking and running steps from a smartphone carried in a trouser pocket or worn on the upper arm [14,15,16], as well as from a smartwatch [17,18]. Step counting is now a standard functionality in most smartphones and smartwatches. Still, false positives remain an unsolved issue, mainly because motion noise produces the same signals as walking.
However, when IMUs are worn at the ear, many lower-body motions are naturally "filtered out", i.e., these noisy motions do not propagate up to the ear. Hence, an earphone IMU detects a bounce produced only by walking. Prakash et al. demonstrated the advantages of eSense for counting walking steps [9]. While head movement can still pollute this bouncing signal, they developed methods to alleviate the problem. Results show 95% step-count accuracy even in the most difficult test case (very slow walking), where smartphone and wristband-type systems falter. Importantly, their system, STEAR (STep counting from EARables), is robust to changes in walking patterns and scales well across different users. Additionally, they demonstrated that STEAR also brings opportunities for effective jump analysis, which is often crucial for exercise and injury-related rehabilitation.
Bayat et al. [19] proposed a machine-learning-based system that recognizes certain types of human physical activity from acceleration data generated by a user's smartphone, reaching an overall accuracy of 91%. Similarly, Balli et al. [20] classified eight different daily human activities with high accuracy from smartwatch sensor data using a hybrid of principal component analysis and a random forest algorithm. More recently, Teng et al. [21] demonstrated on several open datasets that convolutional neural network (CNN) models can further improve performance across a variety of human activity recognition (HAR) tasks.
While many researchers and developers have built applications based on smartphones and smartwatches, Kawsar et al. [22] proposed and developed a new wearable platform called "eSense" (see Figure 2). The eSense platform consists of a pair of wireless earbuds augmented with kinetic, audio, and proximity sensing. The left earbud has a six-axis IMU with an accelerometer and a gyroscope, as well as a Bluetooth Low Energy (BLE) interface used to stream sensor data to a paired smartphone. Both earbuds are also equipped with microphones to record external sounds.
Listening to music through earphones while exercising is widespread, and though the eSense platform is still recent, it has already attracted the attention of many research teams. It can simultaneously monitor behavior by analyzing the sensory information and provide feedback through the acoustic interface without intruding on the user's visual field. Indeed, repeatedly checking visual feedback on a smartphone or smartwatch screen may be dangerous and cause accidents during exercises that involve motion. For example, Prakash et al. developed an algorithm that performs robust step counting and jump analysis from the inertial signals streamed by the eSense earbuds [9]. In their study, they also showed that the ear position is advantageous for collecting motion signals, since it naturally filters out noisy lower-body motions. In addition, Radhakrishnan et al. proposed using the eSense platform to improve user engagement during indoor weight-based gym exercises [23].
2.2. Vision-Based Exercise Recognition
There exist many studies that quantitatively evaluate performance in sports and physical exercise. These studies are often based on three-dimensional (3-D) image analysis, whether for baseball [24,25,26,27,28,29], tennis [30,31,32,33], or games [34]. Typically, the evaluation is based on the kinematics and dynamics of the joint motions of the shoulder, elbow, forearm, wrist, and fingers during pitching. For example, Antón et al. [35] introduced a Kinect-based algorithm for monitoring physical rehabilitation exercises. The algorithm recognizes the main components of the exercises, postures, and movements in order to assess their quality of execution. Moreover, this game-like immersive framework makes the rehabilitation sessions more enjoyable and motivates users to complete them. Despite using only a few samples in the training step, the algorithm is capable of recognizing the exercises in real time and achieved a monitoring accuracy of 95.16% in a real scenario when evaluated on 15 users.
In general, vision-based approaches are more accurate than wearable-sensor-based approaches for exercise recognition. Although they achieve good performance, vision-based sports/exercise recognition systems are limited to dedicated locations. Moreover, 3-D image analysis is complex and computationally intensive, although this limitation can be mitigated by performing some preprocessing at the sensor level.
2.3. Skill Science
To date, much research has aimed to evaluate sports skills quantitatively. For a long time, such studies were principally based on three-dimensional image analysis, whether for baseball [24,29] or tennis [30,33]. With the widespread use of wearable sensor devices, research and techniques for analyzing the movement of bodies and tools from data acquired by sensors attached to the body and equipment are also progressing in sports and related fields. In the field of skill science, some works attach a sensor to a tennis racket and analyze its behavior [36], while others focus on estimating baseball pitching speed using a wrist-mounted acceleration sensor and a laser apparatus [37]. However, most accurate solutions are based on dedicated sensors ("Smart Tennis Sensor" by Sony Corporation [38]) or wrist-worn devices ("Babolat Play" by Babolat [39]) and require computer postprocessing, such that there is no real-time or onsite feedback to improve skills.
With the popularity of smartwatches and other smart wearable devices that integrate multiple sensors, there is less need for exercise-specific hardware development. Smartwatches generally have built-in microelectromechanical system (MEMS) IMUs and pulse rate (PR) sensors. Therefore, these devices only need software applications to be developed for each targeted sport or exercise. In their extensive review of technologies available for tennis serve evaluation, Tubez et al. highlighted the great prospects offered by markerless systems based on inertial measurement units for real-situation evaluation [40]. Examples are the applications developed by Lopez et al. [41] for supporting an athlete or a beginner with the baseball pitching action and the tennis serve action. Their personal sport-skill improvement support application runs on Sony's SmartWatch SWR50 and does not even need to communicate with the paired smartphone to perform onsite movement analysis and feedback. Comparative research using the proposed smartwatch applications for sport-skill improvement support achieved encouraging results.
2.4. Recognition of Movement-Repetition-Based Exercises
One relevant previous work is that of Morris et al. [42], who introduced RecoFit, a system for automatically tracking repetitive exercises, such as weight training and calisthenics, via an arm-worn inertial sensor. They addressed three challenges: segmenting, recognizing, and counting several repetitive exercises. They achieved precision and recall greater than 95% in segmenting exercise periods; recognition accuracies of 99%, 98%, and 96% for 4, 7, and 13 exercises, respectively; and 93% repetition-counting accuracy. However, RecoFit needs five seconds of data to segment and recognize an exercise, so for a small number of repetitions it cannot identify the correct exercise and count. It also requires a dedicated device attached to the forearm, which implies a supplementary cost for users, who have to buy a device for a particular and limited usage, as well as the burden of attaching a device to an unusual part of the body.
Viana et al. [43] proposed an application called GymApp, similar to the system mentioned above but applied to workout exercise recognition. It also runs on Android OS smartwatches and monitors physical activities, for example, in fitness. It has two modes of operation: training mode and practice mode. In training mode, an athlete is advised to perform an exercise (e.g., a biceps curl) with a lighter weight and under the supervision of a fitness instructor to guarantee the correctness of the performed exercise. The application then gathers sensory data and builds a model of the performed exercise using supervised machine learning techniques. Then, in practice mode, the recorded sensory data are compared with the previously acquired data. The application calculates the similarity distance and, from the result, estimates how many repetitions of the exercise were performed correctly.
More recently, Skawinski et al. [44] considered four different types of workout (push-ups, sit-ups, squats, and jumping jacks) and proposed a workout-type recognition and repetition-counting method based on machine learning with a convolutional neural network. Their evaluation, with data from 10 subjects wearing a Movesense sensor on the chest during workouts, resulted in 89.9% average workout detection accuracy and 97.9% average repetition-counting accuracy.
Although the studies described above are promising, they are based on machine learning techniques. This implies a necessary preliminary step of collecting data to train a model for each type of targeted movement, as well as for each type of sensor or sensor position (wrist, chest, arm, head, etc.). This training step is a burden for users and a disadvantage for deploying the technology.
2.5. Summary
Most works on detailed exercise recognition achieve around 95% accuracy for each defined exercise, but only under the condition of either indoor-only workouts or outdoor-only exercises such as walking and running. Thus, in this research, we aim to recognize both indoor and outdoor exercises while keeping the same accuracy. We define indoor exercises as physical activities performed on the spot, such as push-ups and sit-ups, usually performed at home or in a sports gym. Conversely, we define outdoor exercises as physical activities involving displacement of the whole body, such as running and walking, usually performed outdoors (though running machines can be used indoors).
Many of these works are based on machine learning techniques, which often require a new dataset for each new user. Thus, this research also aims to propose a method that provides accurate real-time segmentation, classification, and counting of physical exercises without needing recalibration for each user.
3. Methods
In this section, we introduce the proposed method. In Section 3.1, the outline of ExerSense is presented. Then, in Section 3.2 and Section 3.3, we describe the details of segmentation and classification, respectively. Finally, we briefly explain how counting is performed in Section 3.4.
3.1. Outline of ExerSense
Figure 3 presents a broad schematic of the architecture of the proposed recognition method, ExerSense. It is separated into two phases: a preprocessing phase and a runtime phase. As described later, the proposed method works independently of the kind of device used.
In the preprocessing phase, acceleration data are collected by the target devices for at least one motion of each target exercise. Because the method uses a correlation-based algorithm to classify each motion, only one single motion sample of each target exercise is needed in advance. That is a significant advantage of the correlation-based approach over approaches based on machine learning. For tasks such as image classification and natural language processing, data are extensively available on the Internet and easy to collect. In the case of exercise recognition, however, it is difficult to collect training data for machine learning.
The runtime phase starts with the segmentation of the streamed acceleration signal into single motions by finding peaks in the synthetic acceleration signal. The next section explains the segmentation process in detail. Then, each segmented 3-D acceleration signal is classified by comparing it, using a correlation-based algorithm, with each exercise's motion template produced in the preprocessing phase, and the count of the classified exercise is incremented.
3.2. Segmentation Algorithm for Single Motion Extraction
Hereafter, we describe the segmentation algorithm using a 3-D acceleration signal collected at the chest during a push-up exercise. First, the synthetic acceleration of the streamed inertial sensor data, i.e., the norm of the 3-D acceleration signal, is calculated. In the case of push-ups, peak detection and motion segmentation could be performed using only the longitudinal component of the raw acceleration. However, this is not a suitable general solution, since this research targets not only push-ups but also other types of exercise, including those that do not involve movement in the longitudinal direction. Therefore, the synthetic acceleration is more appropriate, though it has the disadvantage of reducing the differences between movements that are similar but along different axes.
The norm signal contains much noise. Applying short-term energy not only emphasizes significant signal variations but also smooths them. Smoothing is important for easily detecting only the motion start and end peaks.
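These two steps can be sketched as follows. This is a minimal illustration, not the authors' implementation; the 25-sample window length for the short-term energy is an assumption, since the paper does not specify it.

```python
import numpy as np

def synthetic_acceleration(acc_xyz):
    """Norm of the 3-D acceleration signal; acc_xyz has shape (N, 3)."""
    return np.sqrt((acc_xyz ** 2).sum(axis=1))

def short_term_energy(signal, win=25):
    """Sliding-window mean of squared samples: emphasizes bursts, smooths noise.

    win (samples) is a hypothetical parameter; tune it to the sampling rate.
    """
    kernel = np.ones(win) / win
    return np.convolve(signal ** 2, kernel, mode="same")
```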
Then, we use a sliding window of 0.25 s to detect peaks. Running steps have the fastest tempo among regular exercises; after observing various persons running, we found the fastest tempo to be more than three but fewer than four steps per second. Hence, to avoid having two steps in a single sliding window, we chose 0.25 s as the window size. If the center value of the window is the maximum value in the window, it is identified as a peak. The fourth plot shows the detected peaks plotted on the smoothed norm of the acceleration signal collected during a push-up exercise.
Finally, the synthetic acceleration signal is segmented by extracting the data between two consecutive peaks. Thus, we define a "segment of exercise" as the raw acceleration data in the time interval between two consecutive peaks extracted from the smoothed synthetic acceleration signal, containing a single motion of an exercise (e.g., one step, one jump, one push-up).
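The peak detection and segmentation steps can be sketched as below. This is a hedged illustration under the stated 0.25 s rule, not the authors' code; function names and the strict-positivity guard are our own.

```python
import numpy as np

def detect_peaks(energy, fs, win_s=0.25):
    """A sample is a peak if it is the maximum of the 0.25 s window centered on it.

    energy: smoothed short-term energy signal; fs: sampling rate in Hz.
    """
    half = int(win_s * fs / 2)
    peaks = []
    for i in range(half, len(energy) - half):
        window = energy[i - half : i + half + 1]
        if energy[i] > 0 and energy[i] == window.max():
            peaks.append(i)
    return peaks

def segment_motions(acc_xyz, peaks):
    """Slice the raw 3-D acceleration between consecutive peaks: one motion per segment."""
    return [acc_xyz[a:b] for a, b in zip(peaks[:-1], peaks[1:])]
```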
In most cases, one peak is detected per motion. However, in the case of sit-ups, multiple peaks are detected for each motion (see Figure 4). To deal with this case, one of the peak-to-peak periods (yellow-colored in Figure 4) is defined as the sit-up base motion. This yellow-colored peak-to-peak period represents the "wake-up" motion during a sit-up. Because "wake-up" is the most important movement in sit-up training, we selected this area.
3.3. Classification of Extracted Motion Segments
Figure 5 shows the processing flow of the proposed classification method. After extracting the 3-D acceleration signal corresponding to a single motion through the segmentation process, the dynamic time warping (DTW) algorithm (Algorithm 1) is applied to calculate the distance between every template signal and the extracted signal. DTW can calculate the distance between two time series of different lengths. This is a crucial property, since it offers the capability to deal with the shape of signals produced by one identical exercise regardless of the speed at which the motion is performed. Finally, the proposed method classifies the exercise with the minimum DTW score as the ongoing exercise.
In our previous work [7], artificial coefficients were applied to the DTW score to increase performance. These coefficients were determined from the variances of the three axes, which are affected by the influence of the body, the direction of maximum movement, and the intensity of movement. However, these coefficients were predefined by the authors based on experience and only for the chest-mounted sensor. In this work, we removed the coefficients to enable comparison across multiple device positions.
Algorithm 1 Dynamic Time Warping
Require: time series a of length n, time series b of length m
for i = 1 to n do D[i, 0] ← ∞ end for
for j = 1 to m do D[0, j] ← ∞ end for
D[0, 0] ← 0
for i = 1 to n do
    for j = 1 to m do
        cost ← d(a[i], b[j])
        D[i, j] ← cost + min(D[i−1, j], D[i, j−1], D[i−1, j−1])
    end for
end for
return D[n, m]
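A minimal runnable sketch of DTW-based classification follows. The quadratic dynamic program is standard DTW; the choice of Euclidean distance between 3-D samples as the local cost d(a[i], b[j]), and the function names, are our assumptions rather than details taken from the paper.

```python
import numpy as np

def dtw_distance(a, b):
    """DTW distance between two sequences of 3-D acceleration samples.

    a, b: arrays of shape (n, 3) and (m, 3); the lengths may differ.
    Local cost is the Euclidean distance between individual samples.
    """
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def classify(segment, templates):
    """Return the exercise label whose template has the minimum DTW score."""
    return min(templates, key=lambda label: dtw_distance(segment, templates[label]))
```

Because DTW warps the time axis, a template recorded at one pace still matches the same motion performed faster or slower, which is why a single sample per exercise suffices.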
3.4. Counting
After the classification step, counting each exercise is easy: we simply increment the counter of the classified exercise by one. However, in the case of sit-ups, the proposed method divides one motion into three segments. Only one of the three segments is similar to the template data; the others are unlikely to match. Thus, the combination of segmentation and classification allows correct counting.
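The counting step can be sketched as a simple tally over classified segments. The `reject_above` threshold for discarding segments that match no template well (e.g., the spurious sit-up sub-segments) is a hypothetical mechanism of our own, not one described in the paper.

```python
from collections import Counter

def count_exercises(segments, templates, distance, reject_above=None):
    """Increment one count per segment for the best-matching template.

    distance(segment, template) -> float is the matcher (DTW in ExerSense).
    reject_above is an assumed cutoff: segments whose best score exceeds it
    are ignored instead of being counted.
    """
    counts = Counter()
    for seg in segments:
        label, score = min(
            ((lbl, distance(seg, tpl)) for lbl, tpl in templates.items()),
            key=lambda pair: pair[1],
        )
        if reject_above is None or score <= reject_above:
            counts[label] += 1
    return counts
```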
6. Conclusions
In this research, we proposed ExerSense, a method that segments, classifies, and counts multiple physical exercises in real time. ExerSense is based on a correlation method because it needs only one motion sample in advance. When it is difficult to collect data for a physical exercise, the correlation method is more advantageous than a machine learning method.
We collected acceleration data of five exercises with sensors at four different positions. To validate the proposed segmentation method, we counted the correctly extracted segments. It recalled more than 91% of segments at all positions except the ear, where recall was 84%. Using the accurately extracted segments, we then validated the classification method. The most accurate configuration was the Movesense sensor mounted on the chest, which achieved 99% accuracy. The smartphone mounted on the upper arm was a close second with 94%. The smartwatch on the wrist was third with 86%, and the worst of the four was the eSense mounted at the ear with 76%.
The proposed method, ExerSense, accurately segments and classifies multiple exercises using general-usage devices. We found that ExerSense works at various sensor positions. Though sit-up segmentation leaves room for improvement, the method achieved high accuracy for most regular exercises. In future work, we will improve the ExerSense algorithm and test it on other exercises and sensor positions.