ExerSense: Physical Exercise Recognition and Counting Algorithm from Wearables Robust to Positioning †

Wearable devices are currently popular for fitness tracking. However, these general-purpose devices can only track a limited set of prespecified exercises. In our previous work, we introduced ExerSense, which segments, classifies, and counts multiple physical exercises in real time based on a correlation method. It can also track user-specified exercises for which only one motion has been collected in advance. This paper is an extension of that work. We collected acceleration data for five types of regular exercises using four different wearable devices. To find the most accurate device and position for multiple-exercise recognition, we conducted 50 random validations. Our results show that ExerSense is robust, working well with various devices. Among the four general-purpose devices, the chest-mounted sensor is the best for our target exercises, and the upper-arm-mounted smartphone is a close second. The wrist-mounted smartwatch is third, and the ear-mounted sensor is the worst.


Introduction
Exercise and physical activity have well-documented mental and physical health benefits [1,2]. People who engage in regular physical activity are healthier and have a better mood. They are also less prone to several chronic diseases (e.g., cardiovascular disease, diabetes, cancer, hypertension, obesity, and depression) and live much longer than those with a sedentary lifestyle. Consequently, an active daily life is recommended for people of all ages [2]. Unfortunately, despite the numerous benefits of regular physical activity, most people find it challenging to stay motivated and adhere to a regular workout schedule [3]. Indeed, people easily lose self-motivation. Additionally, at least for beginners, performing physical exercise properly requires training.
Researchers and exercise therapists have proposed numerous strategies to improve adherence to a regular exercise schedule [4]. These include, among other things, encouraging people to be physically active and creating an environment that makes it easier for people to be physically active at home. For example, the authors of [5] used smartphone data to develop a fitness assistant framework that automatically generates a fitness schedule. The framework also incorporates social interaction to increase user engagement. The most advanced state-of-the-art technologies aim to serve as a substitute for a personal trainer. For instance, FitCoach [6] is a virtual fitness coach that uses wearable devices to assess the exercise patterns and positions of its users during workouts, helping them achieve an effective workout and preventing workout injuries. Extensive experiments, in both indoor and outdoor conditions, have shown that FitCoach can assess its users' workouts and provide adequate recommendations with an accuracy above 90%.
In our previous work, we introduced a method that provides accurate real-time segmentation, classification, and counting of physical exercises practiced both indoors and outdoors from the signal of a single inertial measurement unit (IMU) worn on the chest [7]. Targeting five types of exercises, the proposed segmentation algorithm achieved 98% precision and 94% recall, while the proposed classification method achieved 97% precision and 93% recall. We demonstrated the flexibility of the proposed method by developing a virtual reality dodgeball application [8]. The application uses a wrist-mounted IMU and an HMD (head-mounted display), and it implements the ExerSense algorithm to detect a ball-throwing gesture toward the target in virtual space (Figure 1). This paper is an extension that improves the motion detection method and demonstrates its robustness to various sensor wearing positions.
Recently, IMU sensors have become more widely adopted for physical activity recognition [9][10][11]. Some IMU-based systems (e.g., [9]) are used for step counting and walking detection to encourage their users to increase their ambulatory physical activity. Other methods (e.g., [10]) automatically recognize various walking workouts (e.g., walking and brisk walking). Finally, advanced IMU-based systems (e.g., [12,13]) aim to bypass the need for personal physical trainers altogether. They monitor their users during exercise, classify their exercise technique, and provide feedback to improve their workout.
Compared with existing research, the proposed approach provides the following three practical enhancements. First, most existing approaches have practical limitations. For example, methods for outdoor physical activity recognition are usually based on frequency analysis; because the number of cycles is large, a few misclassifications are tolerable, but such errors are not tolerable for plyometric exercises.
The proposed method works well both for short-term cyclic movement exercises (e.g., push-ups) and for long-term cyclic quick-movement exercises (e.g., running and walking). Second, unlike comparable machine-learning-based approaches that need a lot of training data, the proposed method needs only one sample of motion data for each target exercise and yet performs reasonably well (accuracy > 95%). Finally, although not yet validated, the proposed approach also has the potential to evaluate the quality of the workout.

Behavior Recognition and Step Counting from Wearables
Step counting has been extensively studied in the ubiquitous computing community. Many works have proposed algorithms to count walking and running steps accurately from a smartphone worn in a trousers pocket or on the upper arm [14][15][16], as well as from a smartwatch [17,18].
Step counting is now a standard functionality in most smartphones and smartwatches. However, false positives remain an unsolved issue, mainly because of motion noise that produces signals similar to those of walking.
However, Prakash et al. [9] found that when IMUs are placed at the ear, many lower-body motions are naturally "filtered out", i.e., these noisy motions do not propagate up to the ear. Hence, an earphone IMU detects a bounce produced only by walking. On this basis, they demonstrated the advantages of eSense for counting walking steps. While head movements can still pollute this bouncing signal, they developed methods to alleviate the problem. Their results show 95% step count accuracy even in the most difficult test case (very slow walking), where smartphone- and wristband-type systems falter. Importantly, their system STEAR (STep counting from EARables) is robust to changes in walking patterns and scales well across different users. Additionally, they demonstrated how STEAR also enables effective jump analysis, which is often crucial for exercise and injury-related rehabilitation.
Bayat et al. [19] proposed a machine-learning-based system to recognize certain types of human physical activities using acceleration data generated by a user's smartphone and reached an overall accuracy of 91%. Similarly, Balli et al. [20] classified eight different daily human activities with high accuracy from smartwatch sensor data using a hybrid of principal component analysis and the random forest algorithm. More recently, Teng et al. [21] demonstrated on several open datasets that convolutional neural network (CNN) models could further improve performance across a variety of HAR (human activity recognition) tasks.
While many researchers and developers have been developing applications based on smartphones and smartwatches, Kawsar et al. [22] proposed and developed a new wearable platform called "eSense" (see Figure 2). The eSense platform consists of a pair of wireless earbuds augmented with kinetic, audio, and proximity sensing. The left earbud contains a six-axis IMU (an accelerometer and a gyroscope) and a Bluetooth Low Energy (BLE) interface used to stream sensor data to a paired smartphone. Both earbuds are also equipped with microphones to record external sounds.
The use of earphones to listen to music while exercising is widespread, and although the eSense platform is still recent, it has already attracted the attention of many research teams. It can simultaneously monitor behavior by analyzing sensor data and provide feedback through the acoustic interface without occupying the user's visual field. Indeed, repeatedly checking visual feedback on a smartphone or smartwatch screen during exercises involving motion may be dangerous and cause accidents. For example, Prakash et al. developed an algorithm that performs robust step counting and jump analysis from the inertial signals streamed by the eSense earbuds [9]. In their study, they also showed that the ear position is advantageous for collecting motion signals, since it naturally filters out noisy lower-body motions. In addition, Radhakrishnan et al. proposed using the eSense platform to improve user engagement during indoor weight-based gym exercises [23].

Vision-Based Exercise Recognition
There exist many studies that quantitatively evaluate the performance of sports and physical exercises. These studies are often based on three-dimensional (3-D) image analysis, whether for baseball [24][25][26][27][28][29], tennis [30][31][32][33], or games [34]. Typically, the evaluation is based on the kinematics and dynamics of the joint motions of the shoulder, elbow, forearm, wrist, and fingers during pitching. For example, Antón et al. [35] introduced a Kinect-based algorithm for monitoring physical rehabilitation exercises. The algorithm recognizes the main components of the exercises, postures, and movements in order to assess their quality of execution. Moreover, its game-like immersive framework makes the rehabilitation sessions more enjoyable, which motivates patients to perform them. Despite using only a few training samples, the algorithm recognizes the exercises in real time and achieved a monitoring accuracy of 95.16% in a real scenario when evaluated on 15 users.
In general, vision-based approaches are more accurate than wearable sensor-based approaches for exercise recognition. Despite their good performance, however, vision-based sports/exercise recognition systems can only be used at dedicated locations. Moreover, 3-D image analysis is complex and computationally intensive, although this limitation can be mitigated by performing some preprocessing at the sensor level.

Skill Science
Up to now, many studies have proposed to evaluate sports skills quantitatively. For a long time, such evaluations have mainly been based on three-dimensional image analysis, whether for baseball [24,29] or tennis [30,33]. With the widespread use of wearable sensor devices, research and techniques that analyze the movements of the body and equipment from data acquired by sensors attached to them are progressing in sports and related fields. In the field of skill science, some studies attach a sensor to a tennis racket and analyze its behavior [36], while others estimate baseball pitching speed using a wrist-mounted acceleration sensor and a laser apparatus [37]. However, most of the proposed accurate solutions are based on dedicated sensors ("Smart Tennis Sensor" by Sony Corporation [38]) or on sensors worn at the wrist ("Babolat Play" by Babolat [39]) and require computer postprocessing, so that neither real-time nor onsite feedback is available to improve skills.
With the popularity of smartwatches and other smart wearable devices that integrate multiple sensors, there is less need to develop exercise-specific hardware. Smartwatches generally have a built-in microelectromechanical systems (MEMS) IMU and a pulse rate (PR) sensor. Therefore, only software applications need to be developed for each targeted sport or exercise. In their extensive review of technologies available for tennis serve evaluation, Tubez et al. highlight the great prospects offered by markerless systems based on inertial measurement units for evaluation in real situations [40]. Examples are the applications developed by Lopez et al. [41] to support athletes and beginners with the baseball pitching action and the tennis serve action. These personal sport skill improvement support applications run on Sony's SmartWatch SWR50 and do not even need to communicate with the paired smartphone to perform onsite movement analysis and feedback. A comparative study using these smartwatch applications for sport skill improvement support achieved encouraging results.

Recognition of Movement-Repetition-Based Exercises
One relevant previous work is that of Morris et al. [42], who introduced RecoFit, a system for automatically tracking repetitive exercises such as weight training and calisthenics via an arm-worn inertial sensor. They addressed three challenges: segmenting, recognizing, and counting several repetitive exercises. They achieved precision and recall greater than 95% in segmenting exercise periods; recognition accuracies of 99%, 98%, and 96% for 4, 7, and 13 exercises, respectively; and a counting accuracy of 93% within ±1 repetition. However, RecoFit needs five seconds to segment and recognize an exercise. When an exercise is repeated only a few times, it cannot correctly recognize and count it. It also requires a dedicated device attached to the forearm, which implies an additional cost for users, who have to buy a device for a particular and limited usage, as well as the burden of attaching a device to an unusual part of the body.
Viana et al. [43] proposed an application called GymApp, which is similar to the system mentioned above but applied to workout exercise recognition. It also runs on Android OS smartwatches and monitors physical activities, for example, in fitness. It has two modes of operation: training mode and practice mode. In training mode, an athlete is advised to perform an exercise (e.g., biceps curl) with a lighter weight and under the supervision of a fitness instructor to guarantee that the exercise is performed correctly. The application then gathers sensor data and builds a model of the performed exercise using supervised machine learning techniques. Then, in practice mode, the recorded sensor data are compared with the previously acquired data. The application calculates the similarity distance and, from the result, estimates how many repetitions of the exercise were performed correctly.
More recently, Skawinski et al. [44] considered four different types of workouts (push-ups, sit-ups, squats, and jumping jacks) and proposed a workout type recognition and repetition counting method based on machine learning with a convolutional neural network. Their evaluation with data from 10 subjects wearing a Movesense sensor on their chest during their workout resulted in an average workout detection accuracy of 89.9% and an average repetition counting accuracy of 97.9%.
Although the above-described studies are promising, they are based on machine learning techniques. This implies a necessary preliminary step of collecting data to train a model for each type of targeted movement, as well as for each type of sensor or sensor position (wrist, chest, arm, head, etc.). This training step is a burden for users and an obstacle to deploying the technology.

Summary
Most of the works related to detailed exercise recognition achieve around 95% accuracy for each defined exercise, under the condition of only indoor workouts or only outdoor exercises such as walking and running. Thus, in this research, we aim to recognize both indoor and outdoor exercises while maintaining the same accuracy. We define indoor exercises as physical activities performed on the spot, such as push-ups and sit-ups, usually at home or in a gym. Conversely, we define outdoor exercises as physical activities involving displacement of the whole body, such as running and walking, usually performed outdoors (although they can also be performed indoors on a treadmill).
Many of these works are based on machine learning techniques, which often require a new dataset for each new user. Thus, this research also aims to propose a method that provides accurate real-time segmentation, classification, and counting of physical exercises without requiring recalibration for each user.

Methods
In this section, we introduce the proposed system. Section 3.1 presents an outline of ExerSense. Then, Sections 3.2 and 3.3 describe the details of segmentation and classification, respectively. Finally, Section 3.4 briefly explains how counting is performed. Figure 3 shows a broad schematic of the architecture of the proposed recognition method, ExerSense. It is divided into two phases: a preprocessing phase and a runtime phase. As described later, the proposed method works independently of the type of device. In the preprocessing phase, acceleration data covering at least one motion of each target exercise are collected with the target devices. Because the method uses a correlation-based algorithm to classify each motion, only a single motion sample of each target exercise is needed in advance. This is a significant advantage of the correlation-based approach over approaches based on machine learning. In fields such as image classification and natural language processing, data are abundantly available on the Internet and easy to collect. However, for exercise recognition, collecting training data for machine learning is difficult.

Outline of ExerSense
The runtime phase starts by segmenting the streamed acceleration signal into single motions, which is done by finding the peaks in the synthetic acceleration signal. The next subsection explains the segmentation process in detail. Then, each segmented 3-D acceleration signal is classified by comparing it, using a correlation-based algorithm, with the motion template of each exercise produced in the preprocessing phase, and the count of the classified exercise is incremented.

Segmentation Algorithm for Single Motion Extraction
Hereafter, we describe the segmentation algorithm using a 3-D acceleration signal collected at the chest during push-ups. First, the synthetic acceleration of the streamed inertial sensor data, that is, the norm of the 3-D acceleration signal, √(x² + y² + z²), is calculated. In the case of push-ups, peak detection and motion segmentation could be performed using only the longitudinal acceleration of the raw data. However, this is not an appropriate solution, since this research targets not only push-ups but also other types of exercise, including some that do not involve movements in the longitudinal direction. Therefore, the synthetic acceleration is more appropriate, although it has the disadvantage of reducing the differences between movements that are similar but performed along different axes.
The resulting norm contains much noise. Computing its short-term energy both emphasizes significant signal variations and smooths them. Smoothing is important so that only the peaks marking the start and end of each motion are easily detected.
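The two steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `synthetic_acceleration` and `short_term_energy` are hypothetical, and the short-term energy is assumed to be the sum of squared norm values over a centered window whose length (`window`, in samples) the text does not specify.

```python
import math


def synthetic_acceleration(samples):
    """Norm of each 3-D acceleration sample given as an (x, y, z) tuple."""
    return [math.sqrt(x * x + y * y + z * z) for x, y, z in samples]


def short_term_energy(signal, window):
    """Sum of squared values over a window centered on each sample.

    Assumption: the window length (in samples) is a free parameter, and the
    window is simply truncated at the beginning and end of the signal.
    """
    half = window // 2
    n = len(signal)
    energy = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        energy.append(sum(v * v for v in signal[lo:hi]))
    return energy


# Usage: a single burst of motion between two resting samples.
acc = [(0.0, 0.0, 1.0), (0.0, 3.0, 4.0), (0.0, 0.0, 1.0)]
norm = synthetic_acceleration(acc)    # [1.0, 5.0, 1.0]
energy = short_term_energy(norm, 3)   # [26.0, 27.0, 26.0]
```

Summing over a window acts like an unnormalized moving-average filter on the squared signal, which is why the energy curve is smoother than the raw norm while still rising where the motion is strong.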
Then, we used a sliding window 0.25 s long to detect peaks. Running has the fastest tempo among regular exercises. After observing various people running, we found that the fastest tempo was more than three but less than four steps per second, that is, a step period between 0.25 s and about 0.33 s. Hence, to avoid having two steps in a sliding window, we chose 0.25 s as the optimal window size. If the center value of the window is the maximum value of the window, it is detected as a peak. The fourth plot shows the detected peaks plotted on the smoothed norm of the acceleration signal collected during push-ups.
Finally, the raw 3-D acceleration signal is segmented by extracting the data between two consecutive peaks. Thus, we define a "segment of exercise" as the raw acceleration data in the time interval between two consecutive peaks extracted from the smoothed synthetic acceleration signal, containing a single motion of an exercise (e.g., one step, one jump, one push-up).
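The peak rule and the segmentation step can be sketched as below, again as a hypothetical illustration (`detect_peaks` and `segment_between_peaks` are not names from the paper). The sampling rate `fs` is an assumption, since the text does not state it; the 0.25 s window is converted to an odd number of samples so that it has a single center sample. Resolving ties on a flat top by keeping its first sample is our own choice.

```python
def detect_peaks(signal, fs, window_s=0.25):
    """Indices whose value is the maximum of the window centered on them.

    fs: sampling rate in Hz (assumed known); window_s: window length in s.
    """
    width = max(3, int(round(window_s * fs)))
    if width % 2 == 0:
        width += 1  # odd width, so the window has exactly one center sample
    half = width // 2
    peaks = []
    for i in range(half, len(signal) - half):
        window = signal[i - half:i + half + 1]
        # The center must be the window maximum; requiring it to exceed all
        # earlier samples in the window keeps only the first sample of a flat top.
        if signal[i] == max(window) and signal[i] > max(signal[i - half:i]):
            peaks.append(i)
    return peaks


def segment_between_peaks(raw_xyz, peaks):
    """Raw 3-D samples between each pair of consecutive peaks."""
    return [raw_xyz[start:end] for start, end in zip(peaks, peaks[1:])]


# Usage with a toy smoothed signal sampled at 12 Hz (0.25 s -> 3 samples).
smoothed = [0, 1, 3, 1, 0, 0, 2, 5, 2, 0, 1, 4, 1]
peaks = detect_peaks(smoothed, fs=12)          # [2, 7, 11]
raw = [(float(i), 0.0, 0.0) for i in range(len(smoothed))]
segments = segment_between_peaks(raw, peaks)   # two segments: 5 and 4 samples
```

With this tie rule, two accepted peaks are always more than half a window apart, so a single step cannot produce two nearby cut points.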
In most cases, one peak is detected per motion. However, for sit-ups, multiple peaks are detected per motion (see Figure 4). To handle this case, one of the peak-to-peak periods (highlighted in yellow in Figure 4) is defined as the sit-up base motion. This yellow peak-to-peak period represents the "wake-up" motion of the sit-up.
We selected this period because the "wake-up" is the most important movement in sit-up training.

Classification
Figure 5 shows the processing flow of the proposed classification method. After the 3-D acceleration signal corresponding to a single motion is extracted by the segmentation process, the dynamic time warping (DTW) algorithm (Algorithm 1) is applied to calculate the distance between each template signal and the extracted signal. DTW can calculate the distance between two time series of different lengths. This property is crucial because it allows signals issued from the same exercise to be compared by their shape, independently of the speed at which the exercise motion is performed. Finally, the proposed method classifies the ongoing exercise as the exercise whose template has the minimum DTW score. In our previous work [7], artificial coefficients were applied to the DTW score to increase performance. These coefficients were determined from the variances of the three axes, which are affected by the body influence, the direction of maximum movement, and the intensity of movement. However, these coefficients were predefined by the authors based on experience and only for the chest-mounted sensor. In this work, we removed the coefficients so that multiple device positions could be compared.
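Algorithm 1 itself is not reproduced in this text, so the following is only a sketch of textbook DTW with nearest-template classification; it may differ from the authors' Algorithm 1 in details such as the local cost, step pattern, or any warping-window constraint. We assume that each segment is a list of (x, y, z) samples and that the local cost between two samples is their Euclidean distance; `dtw_distance` and `classify` are hypothetical names.

```python
import math


def dtw_distance(a, b):
    """Classic DTW distance between two sequences of 3-D samples.

    Local cost: Euclidean distance between samples (an assumption).
    Allowed steps: match, insertion, deletion (no warping window).
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b repeats
                                 cost[i][j - 1],      # b advances, a repeats
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


def classify(segment, templates):
    """Label of the template with the minimum DTW distance to the segment."""
    return min(templates, key=lambda label: dtw_distance(segment, templates[label]))


# Usage: the same toy motion performed at two speeds.
slow = [(0, 0, 0), (0, 0, 0), (1, 1, 1), (1, 1, 1), (2, 2, 2)]
fast = [(0, 0, 0), (1, 1, 1), (2, 2, 2)]
templates = {"push-up": slow, "jump": [(5, 5, 5)] * 3}
label = classify(fast, templates)   # "push-up"
```

Because the warping path may repeat samples of either sequence, the slow and fast versions of this toy motion align with zero cost, which illustrates the speed invariance described above. The cost table makes each comparison O(n·m) in the segment lengths.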

Counting
After the classification step, counting each exercise is straightforward: we only need to increment the counter of the classified exercise by one. However, in the case of sit-ups, the proposed method divides one motion into three segments. Only one of the three segments is expected to be similar to the template data; the other two are unlikely to be. Thus, the combination of segmentation and classification allows sit-ups to be counted correctly.

Experimental Results
This section presents the experimental datasets in Section 4.1 and describes the proposed method's accuracy in Section 4.2.

Datasets
The experimental conditions are described in Section 4.1.1, the targeted exercises are defined in Section 4.1.2, and the collected segments are discussed in Section 4.1.3.

Conditions
• Experimental circuit: As mentioned in the introduction, this research targets exercises including indoor workouts and outdoor activities. A circuit of five exercises was created to evaluate the proposed method. The order of the five exercises, which are explained in the next section, was determined randomly and systematically.
• Participants: Fifteen university students were recruited. Participants weighed from 58 kg to 80 kg and self-assessed as exercising "at least once a week," with an average of four times a week. Each participant performed all exercises once under the conditions described above. Because data were missing for three participants, we used valid data from 12 participants to validate the proposed method.
• Sensors: This research aims to develop an exercise recognition and counting method that is deployable on various commercially available general-use wearable devices (e.g., smartwatches, smart glasses, chest bands). Such a method needs to be robust to the device and its positioning. One can assume that chest movement, like head movement, contains less noise than that of other body parts. Chest sensors are also commonly used by people who exercise several times a week to monitor their heart rate. Accordingly, our previous study [7] demonstrated the validity of the proposed method using the signal of an IMU mounted on the chest. As the chest-mounted sensor, we used the Suunto Movesense Sensor HR+ (Movesense), which combines a nine-axis motion sensor, a heart rate sensor, and Bluetooth in a device weighing within 10 g [45]. In this work, in addition to the Movesense mounted on the chest, we used three other wearable devices often worn by people when practicing physical exercises: a smartwatch attached to the left wrist, a smartphone attached to the left upper arm, and a wearable device (Nokia Bell Labs eSense) attached to the left ear (see Figure 6). All four wearable devices integrate an IMU with a three-axis accelerometer (nine-axis IMUs for Movesense, the smartwatch, and the smartphone; a six-axis IMU for eSense). The smartwatch and smartphone have internal storage, so they recorded data by themselves. The chest-mounted and ear-mounted devices do not have storage, so they were connected to a smartphone via Bluetooth and streamed the acceleration data to it.

Definition of Exercises
The proposed method was evaluated and validated on the following five exercises: running, walking, jumping, push-ups, and sit-ups. We chose these five exercises because we assume that exercise consists of indoor workouts and outdoor running or walking. Additionally, these five exercises can be performed on flat ground without any equipment. Participants ran and walked more than 20 steps each, regardless of whether they started with the right or left foot. They performed jumps, push-ups, and sit-ups about ten times each. Because each of these exercises can be performed in various ways, the movements of jumping, push-ups, and sit-ups were predefined and explained using demonstration photos (see Figure 7).
For the ear and the chest, it does not matter whether the running or walking step is taken with the right or left foot. By contrast, the upper arm and wrist movements differ between right and left steps. Accordingly, we separated the upper arm and wrist templates into right and left ones. Thus, while there are five ear and chest templates, there are seven upper arm and wrist templates.
In previous work [7], the author performed the exercises to produce the templates for all five exercises, which are necessary for real-time classification. This time, we chose the templates randomly from the participants' data and calculated the classification accuracy on the remaining segments, excluding the templates. We repeated this validation process 50 times and report the mean accuracy.
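The random template-selection protocol can be sketched as follows. This is a hypothetical harness (`random_validation` is our name, not the paper's): it assumes that segments are grouped by template label, draws one template per label independently in each run (the actual study may have drawn templates differently, e.g., from a single subject), classifies every remaining segment with a caller-supplied classifier such as a nearest-template DTW rule, and returns the mean and sample standard deviation of the accuracy over the runs. Separate left and right step templates for the upper arm and wrist are not handled here.

```python
import random
import statistics


def random_validation(segments_by_label, classify, runs=50, seed=0):
    """Mean and standard deviation of accuracy over repeated random template draws.

    segments_by_label: dict mapping each label to its list of segments.
    classify: function (segment, templates_dict) -> predicted label.
    Requires runs >= 2 and at least one label with two or more segments.
    """
    rng = random.Random(seed)
    accuracies = []
    for _ in range(runs):
        # Draw one template index per label for this run.
        chosen = {label: rng.randrange(len(segs))
                  for label, segs in segments_by_label.items()}
        templates = {label: segs[chosen[label]]
                     for label, segs in segments_by_label.items()}
        correct = total = 0
        for label, segs in segments_by_label.items():
            for idx, seg in enumerate(segs):
                if idx == chosen[label]:
                    continue  # templates are excluded from the evaluation
                total += 1
                if classify(seg, templates) == label:
                    correct += 1
        accuracies.append(correct / total)
    return statistics.mean(accuracies), statistics.stdev(accuracies)


# Usage with toy scalar "segments" and a nearest-value classifier.
def nearest(seg, tpl):
    return min(tpl, key=lambda lab: abs(seg - tpl[lab]))


toy = {"walk": [0.0, 0.1, 0.2], "run": [10.0, 10.1, 10.2]}
mean_acc, std_acc = random_validation(toy, nearest)   # (1.0, 0.0)
```

Fixing `seed` makes the 50 draws reproducible, which is convenient when comparing sensor positions on the same template draws.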

Number of Segments
Under the conditions mentioned above, we collected segments of each of the five exercises using four different sensors. Table 1 shows the number of collected segments. Although some sensors have several missing values, we generally used all collected segments for validation.

Performances
The recall of segmentation and the performance metrics of classification are described in Sections 4.2.1 and 4.2.2, respectively.

Recall of Segmentation
We counted the segments correctly extracted by the proposed algorithm. Table 2 shows the recall of segmentation against the ground-truth counts.

Performance Metrics of Classification
From all collected segments, we randomly chose the template segments for each exercise and classified the other segments. We repeated this random validation process 50 times to reduce the dependence of the results on any particular template selection. Table 3 shows the classification accuracy for all five exercises listed by sensor position. As shown in Table 3, the chest was the most accurate position (97.2%) with a small standard deviation (4.4%), as expected from our previous work [7]. Next came the upper arm and the wrist (93.1% and 83.5%), with relatively low standard deviations (3.1% and 5.6%, respectively). The ear was the least accurate position, with an average accuracy of 78.4% and a large standard deviation of 10%. Regarding the classification performance per exercise, jumping and push-ups had the lowest F1 scores, which also tended to have large standard deviations, especially with the earable device, which is prone to being loosely attached. Note also that with the wrist-worn device, the lowest F1 score was for push-ups (74.5% ± 14.4%), owing to the small wrist motion during push-ups.
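For reference, the per-exercise F1 score mentioned above is the harmonic mean of precision P and recall R, F1 = 2PR / (P + R). A minimal sketch of how it can be computed from true and predicted labels follows; `per_class_f1` is a hypothetical helper, and obtaining a mean ± standard deviation over the 50 runs would mean applying it once per run and aggregating the results.

```python
def per_class_f1(y_true, y_pred):
    """F1 score for each label, from parallel lists of true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        scores[label] = 2 * precision * recall / total if total else 0.0
    return scores


# Usage: one running step misclassified as a jump.
y_true = ["run", "run", "jump", "jump"]
y_pred = ["run", "jump", "jump", "jump"]
f1 = per_class_f1(y_true, y_pred)   # run: about 0.667, jump: about 0.8
```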

Comparison with Machine Learning Method
The proposed method extracts one motion sample of each target exercise from a single subject's data and uses it to recognize exercise data collected from unknown users. To compare its accuracy with that of a conventional machine learning method, we used both leave-one-subject-out and leave-other-subjects-out cross-validation. Leave-one-subject-out cross-validation uses data from multiple subjects for training, whereas the proposed method uses only one subject's data as a reference. In leave-other-subjects-out cross-validation, the model is trained with only one subject's data and tested with all the others. We repeated both validation methods with each user in turn as the held-out (or sole training) subject and calculated the average confusion matrix of a linear support vector machine (SVM) (see Tables 4 and 5).
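The two cross-validation schemes differ only in which side of the split receives the single subject. The sketch below generates the subject splits in plain Python; the function names are hypothetical, and the features and SVM configuration are not shown because the text does not specify them. In scikit-learn, `LeaveOneGroupOut` from `sklearn.model_selection` yields the leave-one-subject-out splits when the subject ID is passed as the group, and swapping its train and test indices gives the leave-other-subjects-out scheme.

```python
def leave_one_subject_out(subjects):
    """Train on all subjects but one; test on the held-out subject."""
    for held_out in subjects:
        train = [s for s in subjects if s != held_out]
        yield train, [held_out]


def leave_other_subjects_out(subjects):
    """Train on a single subject; test on all the other subjects."""
    for only in subjects:
        test = [s for s in subjects if s != only]
        yield [only], test


# Usage with three subjects.
subjects = ["S1", "S2", "S3"]
loso = list(leave_one_subject_out(subjects))
# [(['S2', 'S3'], ['S1']), (['S1', 'S3'], ['S2']), (['S1', 'S2'], ['S3'])]
loso_other = list(leave_other_subjects_out(subjects))
# [(['S1'], ['S2', 'S3']), (['S2'], ['S1', 'S3']), (['S3'], ['S1', 'S2'])]
```

With the 12 valid participants, each scheme yields 12 folds; only the leave-other-subjects-out folds match the single-reference-subject setting of the proposed method.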
Compared with the proposed method, for most exercise types and sensor positions, the machine learning method gives better accuracy when trained with multiple users (leave-one-subject-out) but lower accuracy when trained with only one user (leave-other-subjects-out). These results confirm that the proposed method is advantageous over conventional machine learning methods when retraining for each new user is not affordable.
Table 4. Leave-one-subject-out cross-validation accuracy of conventional machine learning methods (linear SVM model).

[Table 4 columns: Running, Walking, Jumping, Push-Ups, Sit-Ups (each exercise), Total]

Discussion about the Segmentation
As shown in Table 2, the chest-mounted IMU, the arm-mounted smartphone, and the wrist-mounted smartwatch achieved at least 91% recall. Even the worst device, the ear-mounted one, achieved 84% recall. This indicates that the proposed segmentation algorithm works well at various positions.
However, recall differs markedly between exercises. While walking and push-ups achieved more than 90% recall, sit-ups achieved only 52% to 61%. The proposed segmentation method missed many sit-up segments because the movement axes change considerably within a single sit-up motion. As shown in Figure 8, the sit-up motion is circular, which causes the largest changes in the movement axes. As a result, multiple peaks appear when the norm of the three axes is calculated. Small peaks are removed by smoothing; however, in some cases, the invalid peaks are large and remain after smoothing, and they are then detected as segment boundaries.

Discussion about the Classification
Table 6 shows that the lengths of the collected exercise segments vary widely for each device (position) and exercise type, which means that each exercise was performed at various speeds. Because the DTW (dynamic time warping) algorithm is robust to differences in data length, the proposed method was not affected by different execution speeds of the same exercise. Figure 9 illustrates the box plot of the 50 repeated validations using random exercise segment template selection, with the mean accuracy marked for each exercise. Although the medians and quartile limits for the ear and wrist positions partially overlap, the mean accuracies of all device positions differ significantly at the 5% significance level, as summarized in Table 7. As described in Section 4.2.2, the ear-mounted sensor's average classification accuracy was significantly lower than those of the other positions. Head movements are less restricted and more prone to noisy motions than trunk and hand movements during physical exercises. Hence, the accuracy is more affected by the quality of the selected exercise template, as the large standard deviation also confirms.
However, since the proposed method uses one template segment per exercise and position, this result also shows the importance of the quality of the template exercise segment. Considering this, the maximum accuracy should also be examined to fairly evaluate the potential of the proposed approach. The maximum accuracy is the accuracy obtained when the optimal template exercise segments are selected. In that case, the classification accuracy is 99.8%, 97.1%, 94.2%, and 93.4% for the chest, the upper arm, the wrist, and the ear, respectively (see Table 8). Although the proposed method uses only one exercise segment template to recognize the exercises of unknown users, this performance is comparable to that of a machine learning model evaluated by leave-one-out cross-validation. Hence, with optimal template exercise segments, the proposed method is robust to various wearable device positions.
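Because classification relies on a single template per exercise and on DTW's tolerance of different execution speeds, a minimal sketch of nearest-template DTW classification may help. It assumes a 1-D input signal (for example, the acceleration norm), an absolute-difference local cost, and a plain nearest-template rule without rejection; the exact distance, input axes, and decision rule used by ExerSense may differ. The template shapes and class labels are synthetic and hypothetical.

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D sequences.

    Classic O(len(a) * len(b)) recurrence with an absolute-difference local
    cost; the two sequences may have different lengths.
    """
    n, m = len(a), len(b)
    acc = np.full((n + 1, m + 1), np.inf)  # accumulated cost matrix
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Cheapest of: repeat b[j-1], repeat a[i-1], or advance both.
            acc[i, j] = local + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return float(acc[n, m])

def classify(segment, templates):
    """Label of the (single) template with the smallest DTW distance."""
    return min(templates, key=lambda label: dtw_distance(segment, templates[label]))

# Hypothetical 1-D templates, one per class, 50 samples each.
x50 = np.linspace(0.0, 1.0, 50)
templates = {
    "single_peak": np.sin(np.pi * x50),
    "double_peak": np.abs(np.sin(2 * np.pi * x50)),
}

# The same single-peak motion performed more slowly: 80 samples instead of 50.
query = np.sin(np.pi * np.linspace(0.0, 1.0, 80))
print(classify(query, templates))  # single_peak
```

A sample-by-sample Euclidean distance could not even be computed here without resampling, because the query and templates have different lengths. DTW instead warps the time axis, so the slower execution still matches its template closely. The cost is quadratic in segment length, which remains small for single exercise repetitions.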

Figure 9. Box plot of the accuracy of each random validation (n = 50).

Conclusions
In this research, we proposed ExerSense, a method to segment, classify, and count multiple physical exercises in real time. ExerSense is based on a correlation method because it needs only one recorded motion in advance. When data for a physical exercise are difficult to collect, the correlation method is more advantageous than machine learning methods.
We collected acceleration data of five exercises with four sensors mounted at different positions. To validate the proposed segmentation method, we counted the correctly extracted segments. It recalled at least 91% of the segments at every position except the ear, which reached 84%. Using the correctly extracted segments, we validated the classification method. The most accurate device was the Movesense sensor mounted on the chest, which achieved 99% accuracy. The smartphone mounted on the upper arm was a close second with 94%. The smartwatch mounted on the wrist was third with 86%, and the worst of the four was the eSense mounted on the ear with 76%.
The proposed method, ExerSense, segments and classifies multiple exercises accurately on general usage devices. We found that ExerSense works at various positions. Although sit-up segmentation leaves room for improvement, it achieved high accuracy for most of the regular exercises. In future work, we will improve the ExerSense algorithm and test it on other exercises and positions.
Institutional Review Board Statement: "The study was conducted according to the guidelines of the Declaration of Helsinki, and approved by the Institutional Ethics Committee of Aoyama Gakuin University (approval number ."
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.
Data Availability Statement: Data collected or analyzed in this study are not available for sharing.