Recognition and Repetition Counting for LME Exercises in Exercise-based CVD Rehabilitation: A Comparative Study using Artificial Intelligence Models

Abstract: Exercise-based cardiac rehabilitation requires patients to perform a set of prescribed exercises a specific number of times. Local muscular endurance (LME) exercises are an important part of the rehabilitation program. Automatic exercise recognition and repetition counting from wearable sensor data is an important technology to enable patients to perform exercises independently in remote settings, e.g. their own home. In this paper we first report on a comparison of traditional approaches to exercise recognition and repetition counting, corresponding to supervised machine learning and peak detection from inertial sensing signals respectively, with more recent machine learning approaches, specifically Convolutional Neural Networks (CNNs). We investigated two different types of CNN: one using the AlexNet architecture, the other operating directly on time-series arrays. We found that the performance of the CNN-based approaches was better than that of the traditional approaches. For the exercise recognition task, the AlexNet-based single CNN model outperformed the other methods with an overall F1-score of 97.18%. For exercise repetition counting, again the AlexNet-based single CNN model outperformed the other methods, correctly counting repetitions in 90% of the performed exercise sets within an error of ±1. To the best of our knowledge, our approach of using a single CNN method for both recognition and repetition counting is novel. In addition to reporting our findings, we also make the dataset we created, the INSIGHT-LME dataset, publicly available to encourage further research.


Introduction
Cardiovascular disease (CVD) is the leading cause of premature death and disability in Europe and worldwide [1]. Exercise-based cardiac rehabilitation is a secondary prevention program which has been shown to be effective in lowering the recurrence rate of CVD and reducing the need for medicines [2,3]. Cardiac rehabilitation is organised in three phases, with the third phase targeting long-term exercise maintenance by attending community-based rehabilitation programs or through home-based exercise self-monitoring programs. However, a significant challenge is that the uptake of, and adherence to, community-based cardiac rehabilitation is very low, with only 14% to 43% of cardiac patients participating in rehabilitation programs [4,5]. Key reasons for this include lack of disease-specific rehabilitation programmes, long travel time to such programmes, scheduling issues, and low self-efficacy associated with a perception of poor 'body image' or poor exercise technique [6]. Part of the solution is delivering an alternative, such as a technological platform that can motivate users to engage with exercise-based cardiac rehabilitation and enable them to do so in any environment. In an ideal scenario, people would undertake a variety of exercise programs, either specifically prescribed or based on personal preference, that suit their goals and allow them to avoid exercises associated with comorbidities, for example arthritis of the shoulder. In this scenario of "exercising anywhere", where people may be exercising in their own homes on their own, it is extremely important that they receive feedback on the exercises in order to help them track their progress and stay motivated. A technological approach may facilitate this by integrating a single wearable sensor for assessing exercise movement into an appropriate smartphone application (i.e. eHealth and eRehabilitation). However, this approach presents two key challenges.
Firstly, it is important to automatically recognize which exercises are being completed and secondly, once an exercise is recognised, to provide the number of repetitions as quantitative feedback on the amount of exercise performed, building the user's confidence. This would also allow people to complete elements of their own training program dispersed over the day in any environment. For example, someone could complete different exercises at home or in the workplace [7,8].
In recent years, wearable sensors have been used in the assessment of human movement in many domains including health [9], wellness [10], musculoskeletal injury and sport [11]. Research on CVD patients found that most (67%-68%) patients showed interest in mobile phone-based CR [5]. Accelerometers and gyroscopes, along with magnetometers, are frequently found in wearable sensor units and are accurate in measuring translatory and rotary movements. Multiple wearable sensors have been deployed on the body to recognize day-to-day activities like walking, jogging, climbing stairs, sitting and standing [12][13][14][15], to provide exercise feedback [16], and for qualitative evaluation of human movements [17], gym activities [18], and rehabilitation [19]. However, keeping in mind the likelihood of technology adoption combined with the aesthetics of wearing multiple sensors, it is more likely that individuals would be willing to wear a single sensor, with the wrist being the ideal wear location. Single wearable sensor studies, either phone-based or using inertial measurement units, have been conducted in recent years. Applications include recognising day-to-day activities [20][21][22][23][24][25], recognition of multiple complex exercises [26] or single exercises like lunges [16] and squats [27], as well as repetition counting [8,26,28]. Related work is discussed in detail in Section 2.
In this work, we focus on exercise recognition and repetition counting using a single wrist-worn inertial sensor for 10 local muscular endurance (LME) exercises that are specifically prescribed in exercise-based CVD rehabilitation. We present a comparative analysis between different traditional supervised machine learning algorithms and current state-of-the-art deep learning architectures. We claim the following novel contributions. Firstly, we propose the use of a single CNN model for the repetition counting task across different exercises. Secondly, the dataset we created, the INSIGHT-LME dataset, is made publicly available to encourage further research on this topic (https://drive.google.com/open?id=124ugOPzzoXFTmSJv9K37NC18Lz1FnFPO).

Related Work
Machine learning and deep learning are artificial intelligence methods that employ statistical techniques to learn underlying hidden distributions from observed data. Advances in sensor manufacturing and micro-miniaturization have resulted in low-cost micro-sensor devices such as wearables, capable of losslessly streaming and/or storing translatory and rotary movement information for further processing. The application of machine learning methods to study data from human movements and activities, with a view to detecting and understanding these activities, is referred to as human activity recognition (HAR).
Within the HAR literature, many machine learning and deep learning-based models have been used to study day-to-day routine activities like walking, jogging, running, sitting, drinking, and watching TV [9,13,14,[29][30][31][32], and to assess sporting movements [11] and indoor strength-training exercises [33]. Machine learning methods have been predominantly used for exercise recognition using multiple wearable sensors [17,18,34,35], and specifically in the areas of free-weight exercise monitoring [36], lunge performance evaluation [16], limb movement rehabilitation [19], and intensity recognition in strength training [37]. However, the use of multiple sensors is non-ideal in practice because of cost, negative aesthetics and limited user uptake [31]. Therefore, in our research, we use only a single wrist-worn inertial sensor for exercise recognition and repetition counting.
Several studies [13,14,20,21,28,31] use a single wrist-worn wearable or phone-based sensor for recognition of various day-to-day activities with machine learning models. However, very few studies have addressed exercise recognition, and even then only with a limited number of exercises: lunge evaluation [16], correct squat measurement [8,16,38], and selecting a single best axis for exercise repetition counting [28]. A very recent article on recognition and repetition counting for complex exercises uses deep learning [26]: it recognizes a set of CrossFit exercises using a single CNN for exercise recognition, but separate CNN models for repetition counting of individual exercises. No other studies appear to have covered a wide range of exercises, and none specifically targets CVD rehabilitation through LME exercises.
To date, the vast majority of the HAR studies detailed above have used traditional machine learning approaches such as decision trees, Naive Bayes, random forests, perceptron neural networks, k-nearest neighbors, and support vector machines. There is, however, a growing interest in the potential of deep learning methods in the field of activity recognition, mainly using CNNs [26,[39][40][41][42] and recurrent models [40,43]. A small number of studies [39,40,42,44] have shown the significant advantage of using deep learning models in the general area of human activity recognition. However, very few studies appear to have used deep learning models for exercise recognition and repetition counting. In addition, to date no studies have compared traditional machine learning methods with state-of-the-art CNN methods to identify the best possible method for exercise recognition and repetition counting. Furthermore, to the best of our knowledge, there are no reported works using a single deep CNN model for exercise recognition and for repetition counting following recognition. The use of a single model for repetition counting is attractive as it eliminates the need for an exercise-specific repetition counter and reduces the resources required for repetition computation.

Data Set
Currently, there exists no publicly available data set with a single wrist-worn sensor for endurance-based exercises that are commonly prescribed in cardiovascular disease (CVD) rehabilitation programs. Therefore, we collected a new data set of local muscular endurance exercises for this purpose (Table 1). In the data collection process, consenting participants performed the ten LME exercises in two sets (a constrained set and an unconstrained set), together with some common movements that an exerciser might perform between two exercises. The constrained set involved participants performing the exercises while observing demonstrative videos and following the limb movement actions relatively synchronously with the demonstrator in the video. The unconstrained set involved participants performing the LME exercises without the assistance of demonstrative videos. Inclusion of the non-exercise movements was essential so that the built models could distinguish exercise movements from non-exercise movements. The data set was then used for training, validating and testing the different machine learning and deep neural network models.

Sensor Calibration
Sensor calibration is a method of improving a sensor unit's performance to obtain precise and accurate measurements. The Shimmer3 (Figure 1(a)) inertial measurement unit (IMU) is a light-weight wearable sensor unit from Shimmer Sensing (http://www.shimmersensing.com/products/shimmer3). Each IMU comprises a 3 MHz MSP430 CPU, two 3D accelerometers, a 3D magnetometer and a 3D gyroscope. A calibrated Shimmer3 IMU, when firmly attached to the limb, is capable of collecting precise and accurate data. Each Shimmer3 has a microSD card to store data locally and can stream data over Bluetooth.
Shimmer3 inertial measurement units were used in the exercise data collection process and were calibrated using Shimmer's 9DoF Calibration application (https://www.shimmersensing.com/products/shimmer-9dof-calibration). The IMUs were used at a sampling frequency of 512 Hz, with a calibration range of ±16 g for the 3D low-noise accelerometer and ±2000 dps for the 3D gyroscope. An elastic wristband was used to firmly place the Shimmer3 IMU on the right wrist of each participant during the data collection process. The sensor orientation and the unit's attachment on the right wrist are shown in Figure 1(a) and Figure 1(b), respectively.

LME Exercise set and Experimental Protocol
Ten local muscular endurance (LME) exercises prescribed in cardiovascular disease rehabilitation, as listed in Table 1, were selected for data collection. The ten exercises comprise six upper-body exercises: bicep curls (BC), frontal raise (FR), lateral raise (LR), triceps extension-right arm (TER), pec dec (PD), and trunk twist (TT); along with four lower-body exercises: squats (SQ), lunges - alternating sides (L), leg lateral raise (LLR), and standing bicycle crunches (SBC). The representative postures for the execution of the six upper-body LME exercises are shown in Figure 2 and those of the four lower-body LME exercises in Figure 3. A pair of 1 kg dumbbells was used by each participant while performing the BC, FR, LR, and PD exercises. A single 1 kg dumbbell was used during TER, TT, L, and SQ. Exercises LLR and SBC were performed without dumbbells. The data from these exercises correspond to ten different classes of exercise. The ten exercises used in CVD rehabilitation involve either a single joint movement (BC, FR, LR, PD, TER, and LLR) or multiple joint movements (TT, L, SQ, and SBC). Some of these exercises have significantly similar arm movements, and hence it was of interest to investigate how well the models could distinguish between them. It was also of interest to see how robust the models were in distinguishing the exercise actions from limb movements commonly observed between exercises. The common limb movements selected for inclusion were side bending, sit-to-stand and stand-to-sit, leaning down to lift a water bottle or dumbbell from the floor, stretching the arms out straight in front, lifting folded arms upward, and stretching the body upward with a calf raise for relaxation.
These common actions show significant similarity in limb movement to the exercises. The data corresponding to these common actions together constitute an eleventh class of movement.

Participants
A total of seventy-six volunteers (47 males, 29 females; age range: 20-54 years; median age: 27 years) participated in the data collection process. All participants were healthy, and none had a recent musculoskeletal injury that would affect exercise performance. Prior knowledge of exercise was not a criterion in volunteer recruitment. The study protocols used in data collection were approved by the university research ethics committee [REC Reference: DCUREC/2018/101].

Data Capture Process
The exercise protocol was explained to the participants on their arrival at the laboratory. Each participant underwent a few minutes of warm-up with arm-stretching, leg-stretching and basic body-bending exercises. We developed a dedicated MATLAB GUI module (Figure 4a) to collect the data from the participants wearing IMUs via Bluetooth streaming. The "Exercise Data Capture Assist Module" was designed to select a particular exercise, play demo videos illustrating how to perform the exercise, initialize and disconnect Shimmer IMUs remotely, start and stop recording exercise data, and set a storage path for the streamed data. The streamed data were stored automatically with the participant ID and the exercise type in the filename, completely anonymizing the participants' details. The Shimmer-MATLAB Instrument Driver interface was used to connect to and collect data from multiple Shimmer units, so the module was capable of recording from multiple participants at any given time. All consenting participants performed the ten exercises in two sets plus the common movements, as described in Section 3.2. During the constrained set of exercising, the participants performed the LME exercises while observing demonstrative videos on the screen and following the limb movement actions relatively synchronously with the demonstrator in the video. Participants were told to pay particular attention to the following during the demo video: the initial limb resting position, how to grip the dumbbells (where the exercise required dumbbells), the limb movement plane, and the speed of limb movement. The constrained setup minimised variations in the collected data in terms of plane and speed, ensuring participants performed the exercises at a similar tempo. Participants were asked to perform each exercise for 30 seconds, which resulted in approximately 7 to 8 repetitions.
After each exercise, participants were given sufficient time to rest before moving on to the next exercise.
During the unconstrained set of exercising, a timer was displayed on the screen. Participants performed the exercises by recalling what they had learned during the constrained performance and were free to execute them for 30 seconds. The data collected during the unconstrained set exhibit a range of variations relative to the data collected during the constrained set of execution. The variations observed were in the plane of limb movement, the speed, and the rest position of the limb; these variations mimic the macro variations that would typically occur during home-based exercising.
In addition to the constrained and unconstrained sets of data collection, participants were instructed to perform the common movements described in Section 3.2. Inclusion of these non-exercise movements was essential so that the built models could distinguish exercise movements from non-exercise movements. Participants were asked to perform each of these actions repeatedly for about 30 seconds. A 5-second instance of each action represents almost one full action, and collectively these instances constitute the eleventh class.
All IMUs used in the process of data capture were calibrated as stated in section 3.1. The IMU was securely placed on the right-wrist, as shown in Figure 1(b), with the help of an elastic band and demo videos of all the exercises were shown to each participant. Data collected from both the constrained set and the unconstrained set were class labelled and stored in ten different exercise folders. An eleventh class labelled as "others" was created to store the data from all of the common movements.
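The class-labelled data were later partitioned participant-wise into training, validation and test sets (46, 15 and 15 participants respectively, as detailed in the next paragraph), so that no participant's data appears in more than one set. A minimal sketch of such a leakage-free split is shown below; the participant IDs and random seed are illustrative assumptions, not the actual assignment used in this study.

```python
import numpy as np

def split_by_participant(participant_ids, n_train=46, n_val=15, seed=0):
    """Partition unique participant IDs into train/validation/test groups
    so that no participant's data appears in more than one set."""
    rng = np.random.default_rng(seed)
    unique = rng.permutation(np.unique(participant_ids))
    train = set(unique[:n_train])
    val = set(unique[n_train:n_train + n_val])
    test = set(unique[n_train + n_val:])
    return train, val, test

ids = np.arange(1, 77)  # hypothetical IDs for the 76 participants
train, val, test = split_by_participant(ids)
print(len(train), len(val), len(test))  # 46 15 15
```

Splitting by participant rather than by window avoids the same subject's signal patterns leaking from the training set into evaluation.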
Among the 76 participants, 75 participated in both the constrained and unconstrained sets of data collection; one participant performed only the constrained set. A few participants did not perform all of the exercises. Overall, however, the collected data set was well balanced, and Table 2 summarises the participation for each exercise under the constrained and unconstrained sets of data capture. The data set was then segregated into three sets: the training set, the validation set and the test set, which were used in all model building. The data from 46 participants were used in the training set, and the data from 15 participants each were used in the validation set and the test set. The entire data set is termed the INSIGHT-LME dataset.

Methodology: Overall Framework
Figure 5 represents the overall framework with three major processing blocks. The comparative study aims to find the best method among the different AI models for each task in automatic exercise recognition and repetition counting. The first block represents INSIGHT-LME data set processing and data preparation in terms of filtering, segmentation, 6D vector generation and/or 2D image creation. Data preparation requirements differed for each specific method used in the two comparative studies, and hence the data processing specifics pertaining to each method are discussed alongside each model below.

The second block represents the comparative study for the exercise recognition task. The exercise recognition task was treated as a multi-class classification task. We compared traditional approaches to exercise recognition, namely supervised machine learning approaches, with two recent deep CNN-based approaches. The first deep CNN model used the AlexNet architecture and the second was developed from scratch to use the time-series array information. For the supervised machine learning (ML) models, different models were constructed using four supervised algorithms: SVM, RF, kNN and MLP. The eight models from these four ML algorithms were studied with and without dimensionality reduction using principal component analysis (PCA). The best model from the supervised ML set was then compared with the two deep CNN models to find the best possible method for the exercise recognition task.
The third block represents the comparative study for the repetition counting task. The repetition counting task was treated as a binary classification task followed by a counter to count the repetitions. Again, three different methods were used for repetition counting and their performances were compared to find the single best method. We compared traditional signal processing models based on peak detection with two deep CNN approaches. As in the exercise recognition task, the first deep CNN model used the same AlexNet architecture as previously, whilst the second deep CNN model was developed from scratch to work with the time-series array information.

Exercise Recognition with Supervised Machine Learning Models
Figure 6 illustrates the end-to-end pipeline adopted for supervised machine learning (ML) model-based exercise recognition. As discussed in Section 4, a total of eight supervised machine learning models were studied using this framework to classify the eleven activity classes, in which ten classes correspond to the ten LME exercises and the eleventh class, "others", to the common movements observed during exercising. The eight supervised machine learning models were constructed using four algorithms, support vector machine (SVM), random forest (RF), k-nearest neighbors (kNN), and multi-layer perceptron (MLP), either with or without dimensionality reduction using principal component analysis (PCA). Raw data from the INSIGHT-LME dataset (Section 3.4), corresponding to the ten LME exercises prescribed for CVD rehabilitation and the common movements, were subjected to data segmentation. For each exercise, 25 seconds of 3D accelerometer and 3D gyroscope data, excluding the initial few seconds of recording, were segmented from each participant, retaining class-label information. The segmentation was carried out on all three sets of the INSIGHT-LME dataset: the training set, the validation set and the test set. The segmented 3D accelerometer and 3D gyroscope signals corresponding to the bicep curl exercise are shown in Figure 7; 3D sensor plots for all ten LME exercises are given in Appendix A. The 25 seconds of 6D segmented data consist of approximately five or six repetitions of an exercise, with each repetition lasting approximately 4 seconds. The segmented data with retained class labels were used for feature extraction in the next stage.

Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 26 July 2020 doi:10.20944/preprints202007.0634.v1
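As a concrete illustration of the traditional peak-detection baseline for repetition counting mentioned in the framework above, the sketch below counts repetitions as prominent peaks in a single inertial signal using `scipy.signal.find_peaks`. The axis choice, smoothing window, and thresholds are illustrative assumptions, not the exact pipeline used in this study.

```python
import numpy as np
from scipy.signal import find_peaks

def count_repetitions(signal, fs, min_period_s=2.0):
    """Count repetitions as prominent peaks in a 1D inertial signal.

    signal: 1D array from the most informative sensor axis (an assumption;
            per-exercise axis selection is itself a design choice).
    fs: sampling frequency in Hz.
    min_period_s: assumed minimum duration of one repetition.
    """
    # Light moving-average smoothing to suppress sensor noise.
    win = max(1, int(0.25 * fs))
    smoothed = np.convolve(signal, np.ones(win) / win, mode="same")
    # Peaks must be separated by at least one assumed repetition period
    # and rise above a data-driven prominence threshold.
    peaks, _ = find_peaks(
        smoothed,
        distance=int(min_period_s * fs),
        prominence=0.5 * np.std(smoothed),
    )
    return len(peaks)

# Synthetic check: a 0.25 Hz oscillation over 30 s gives 8 "repetitions",
# matching the roughly 4 s repetition period reported for these exercises.
fs = 50
t = np.arange(0, 30, 1 / fs)
sig = np.sin(2 * np.pi * 0.25 * t)
print(count_repetitions(sig, fs))  # → 8
```

In practice the smoothing and prominence parameters would need tuning per exercise, which is exactly the dependency that motivates the single-model CNN counter investigated later in the paper.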

Feature Extraction
Time- and frequency-domain features were extracted from the 6D segmented data using an overlapping sliding-window method. Three window lengths of 1 second, 2 seconds, and 4 seconds were used, each with 50% overlap, to find the optimum window length for the classifier design. The maximum window length was restricted to 4 seconds because one complete repetition of an exercise lasts approximately 4 seconds.
From each sliding window, statistical time-domain features (mean, standard deviation, root mean square, minimum and maximum) were computed from each axis of the 3D accelerometer and 3D gyroscope data. Auto-correlation values were computed from the XY, XZ and YZ axis pairs of both the accelerometer and gyroscope data. Frequency-domain features, namely Fourier coefficients and energy, were computed from each axis of the 3D accelerometer and 3D gyroscope data.
A vector of 48 time- and frequency-domain features was thus computed for each sliding window. Class-label information was retained for the feature vectors corresponding to each exercise class and the "others" class. A combined feature set, referred to as the "training feature set", was formed by combining feature vectors from all the exercise classes and the "others" class in the training set. The training feature set was computed for each of the 1 s, 2 s, and 4 s window lengths on the training set of the INSIGHT-LME data set. Similarly, the "validation feature set" and the "test feature set" were computed for each window length from the validation set and the test set of the INSIGHT-LME data set, respectively.
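A minimal sketch of the sliding-window feature extraction described above, assuming a NumPy array of shape (samples, 6) for the 3D accelerometer plus 3D gyroscope data. For brevity it computes only the five statistical time-domain features and the spectral energy per axis (36 values), omitting the autocorrelation and Fourier-coefficient features that complete the paper's 48-dimensional vector.

```python
import numpy as np

def sliding_windows(data, fs, win_s=4.0, overlap=0.5):
    """Yield overlapping windows from a (samples, channels) array."""
    win = int(win_s * fs)
    hop = int(win * (1 - overlap))
    for start in range(0, data.shape[0] - win + 1, hop):
        yield data[start:start + win]

def window_features(window):
    """Per-axis statistical and spectral-energy features for one window."""
    feats = []
    for axis in window.T:
        spectrum = np.abs(np.fft.rfft(axis))
        feats.extend([
            axis.mean(),
            axis.std(),
            np.sqrt(np.mean(axis ** 2)),        # root mean square
            axis.min(),
            axis.max(),
            np.sum(spectrum ** 2) / len(axis),  # spectral energy
        ])
    return np.array(feats)

def extract_features(data, fs, win_s=4.0, overlap=0.5):
    return np.array([window_features(w)
                     for w in sliding_windows(data, fs, win_s, overlap)])

# 10 s of fake 6D data at an illustrative 10 Hz: windows of 40 samples, hop 20.
X = extract_features(np.random.randn(100, 6), fs=10)
print(X.shape)  # (4, 36)
```

Each row of `X` corresponds to one sliding window and would carry the class label of its source recording.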
The feature sets computed for each sliding window length were then used for training, validating and testing the four supervised machine learning algorithms (SVM, RF, kNN, and MLP), forming a total of 12 classifiers.

Feature Reduction using PCA
To study the effect of dimensionality reduction, principal component analysis (PCA) was applied to the feature sets computed in Section 4.1.2 to reduce the overall dimensionality of the input vectors to the ML models. The significant principal components, which together accounted for an accumulated variance greater than 99%, were retained. New feature sets corresponding to the training feature set, validation feature set and test feature set were computed using PCA for each of the 1 s, 2 s, and 4 s window-length cases. These PCA-reduced feature sets were then used in the training, validation and testing of additional machine learning models using the ML algorithms (SVM, RF, kNN, and MLP) for each window-length case, forming an additional 12 classifiers.
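The variance-threshold selection described above can be sketched with scikit-learn, which accepts a fractional `n_components` and retains just enough components to reach the target accumulated variance. The synthetic 48-dimensional data below is illustrative only, constructed so that a handful of components carry nearly all the variance.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Synthetic 48-D feature vectors driven by 5 latent factors, so most
# variance concentrates in a few principal components.
latent = rng.normal(size=(200, 5))
X = latent @ rng.normal(size=(5, 48)) + 0.01 * rng.normal(size=(200, 48))

# Retain the smallest number of components whose accumulated
# explained variance exceeds 99%.
pca = PCA(n_components=0.99)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape[1], pca.explained_variance_ratio_.sum())
```

In deployment, the PCA transform is fitted on the training feature set only and then applied unchanged to the validation and test feature sets.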
(a) First three significant principal components plot; (b) accumulated variance plot.

Classifiers for Exercise Recognition
Exercise recognition from single wrist-worn inertial sensor data for a set of exercises prescribed for cardiovascular disease rehabilitation is a classic classification task for machine learning or deep learning methods. A total of twenty-four classifiers were constructed from the feature vectors explained in Sections 4.1.2 and 4.1.3 and were analysed for exercise recognition. Each classifier model was constructed using the training-set feature vectors, with 10-fold cross-validation using the grid-search method to ensure the models had optimal hyper-parameters (for the SVM models, the kernel choice between rbf and linear and the parameters C and gamma; for the kNN models, the best k value, i.e. the number of nearest neighbors; for the RF models, n_estimators, the number of trees in the forest; for the MLP models, the value of α).
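The grid search with 10-fold cross-validation described above might look as follows in scikit-learn; the grid values and the synthetic data are illustrative placeholders, not the exact ranges or features used in this study, and only the SVM grid is shown.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
from sklearn.datasets import make_blobs

# Illustrative data standing in for the window-level feature vectors.
X, y = make_blobs(n_samples=300, centers=4, n_features=8, random_state=0)

# One grid per algorithm; only the SVM grid is sketched here.
svm_grid = {
    "kernel": ["rbf", "linear"],
    "C": [1, 10, 100],
    "gamma": [0.001, 0.01, 0.1],
}
search = GridSearchCV(SVC(), svm_grid, cv=10, scoring="accuracy")
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

`search.best_estimator_` is then refitted on the full training data and evaluated on the held-out validation and test feature sets.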
All models were first evaluated using the validation-set feature vectors in order to: firstly, determine the optimum sliding window length among the selected windowing methods based on the validation accuracy measure; secondly, assess the effect of dimensionality reduction on ML model performance; and finally, select the single best supervised ML model for recognizing the exercises.

Experimental Results of Exercise Recognition with Supervised Machine Learning Models
A total of twenty-four classifiers were constructed using the three sliding-window lengths and the four supervised machine learning algorithms, with and without dimensionality reduction using PCA. Among these models, the SVM models were constructed using a one-vs-rest multi-class classifier and were tuned to optimal hyper-parameters using a grid search with 10-fold cross-validation. The values C = 100, gamma = 0.01 and the rbf kernel were found to be the optimal hyper-parameters for all six SVM classifiers. For all six kNN models, k = 1 was found to be optimal, and for all six RF models, n_estimators = 10 was found to be optimal. Similarly, for all six MLP classifiers, the value α = 1 was optimal over a range of 1.0E-5 to 1.0E+3 on a logarithmic scale.
Selection of a suitable sliding window length was based on the validation results using the validation feature set. While the training score indicates the self-classifying ability of a model, the validation score helps in assessing the model's suitability for deployment on unseen data. The training and validation scores for all twenty-four classifiers, segregated by window length, are shown in Table 3, and a pictorial comparison of the validation scores is shown in Figure 9. The values in Table 3 indicate that the models with a window length of 2 s or 4 s were better self-trained and had better validation measures than the models built using a 1 s window length. A few more observations can be drawn from the validation scores in Table 3 and Figure 9. Dimensionality reduction from the 48-D time-frequency vector to a 30-D transformed vector retaining an accumulated variance greater than 99% using PCA had no significant effect on the performance of the models: validation scores for the ML models with PCA were comparable to, but not better than, those for the ML models without PCA. Also, the supervised ML models built using the 4 s window length performed better, in terms of validation score, than the models built with the 2 s window length. Therefore, the eight supervised ML models constructed using the 4 s sliding window length were retained for further comparison using the test-set feature data, to determine whether a single best classifier for exercise recognition could be identified. The SVM model without PCA was found to be the single best performing model, with a test score of 96.07%; the SVM model with PCA was the second best, with a test score of 95.96%.
The best ML model, SVM without PCA, was further evaluated on a per-exercise basis, and the statistical measures precision, recall and F1-score for each exercise are tabulated in Table 4. The precision with which each exercise was recognized varied from a low of 91.42% (FR) to a high of 100% (BC and TER). Recall ranged from 93.33% (LR and SBC) to 100% (TER), and F1-scores ranged from 92.52% (FR) to 100% (TER). These results are significant in that they indicate that, with a single wrist-worn inertial sensor, it is possible to recognize the LME exercises used in the CVD rehabilitation process with a very good overall score of 96.07%. This finding is important because the set of LME exercises used in this study includes not only single-joint upper-body exercises but also multi-joint lower-body exercises. Lower-body exercises like standing bicycle crunches (SBC), squats (SQ), leg lateral raise (LLR), and lunges (L) were also accurately detected with a single wrist-worn device, as shown in the confusion matrix in Figure 11. Confusions among exercises with similar wrist-movement actions are evident from the confusion matrix plot and are discussed here. The first observed confusion was between two upper-body LMEs, the frontal raise (FR) and the lateral raise (LR): 6.36% of FR instances were confused with LR, while 6.67% of LR instances were confused with FR. In both FR and LR, raising the hands straight was commonly observed, with significant movement in the plane of the accelerometer x-axis; the wrist-movement actions differed between FR and LR only during the movement from the initial resting position. The second observed confusion was between pec dec (PD) and standing bicycle crunches (SBC): 3.94% of SBC instances were confused with PD, whereas 5.76% of PD instances were confused with SBC.
The wrist rotary movements in the plane of the gyroscope y-axis direction were similar for these SBC and PD exercises. The third observation was for the lower-body LME exercise Lunges were getting confused with the common movements (others) and a 3.64% confusion was observed. However, the common movements (others) were confused with Lunges with a 5.8% confusion. Performance measurement or the capability of the classifier models were represented using the area under the curve plot or also known as the receiver operating characteristic (AUC-ROC) curve plots. The AUC-ROC curves are plotted with the true positive rate (TPR) on the y-axis against the false positive rate (FPR) the x-axis. Figure 12 represents the AUC-ROC plot for the SVM classifier without PCA and a minimum AUC value of 99.67% for FR to a maximum of 100% for BC, TT and TER .

Exercise Recognition with CNN_Model1 using AlexNet Architecture
The second method used in the comparative study of the exercise recognition task ( Figure 5) was a deep convolutional neural network (CNN) model using the AlexNet architecture ( Figure 13) [45]. The AlexNet model is an eight layer model with five convolutional layers, three fully-connected maximum pooling layers and a rectified linear unit (ReLU) as activation function in each layer. Batch normalization is used before passing the ReLU output to the next layer. A 0.4 dropout is applied in the fully connected layers to prevent overfitting of the data. This eight layered architecture generates a trainable feature map which is capable of classifying a maximum of 1000 different classes. Figure 14 represents an end-to-end pipeline structure of an optimized CNN model, using the AlexNet architecture, for the LME exercise recognition task. The LME exercise recognition task was an 11-class classification task and hence we used a final output layer, a fully connected dense layer, with a softmax activation function for the classification of 11 classes. An optimum CNN model with best learning rate, optimizer function and loss function was trained using the training data and was further validated using the validation data before testing the model with the test data from the INSIGHT-LME dataset (Section 3.4).   The CNN model with the AlexNet architecture (CNN_Model1) requires an input data in the form of 2D images of size 227×227. Data segmentation and processing methods were used to convert the 6D time-series data from the input INSIGHT-LME dataset to 2D images. To compare the results of CNN_Model1 with the ML models discussed in Section 4.1.5, a 4-second windowing method with an overlap of 1-second was used to segment the 6D (3D accelerometer and 3D gyroscope) time-series data and an image of size 576 x 576 with plots of all 6 axes were plotted. A 4-second, 6D time-series data segment of Bicep Curls was 2D plotted into a 576 x 576 image and is shown in Figure 15. 
Top three plots: red, green and blue, represents the plots corresponding to the 3D accelerometer x-axis, y-axis and z-axis; and the bottom three plots: cyan, yellow and magenta, corresponding to the plots of 3D gyroscope x-axis, y-axis and Z-axis.
An image dataset was generated, by data segmentation and processing method, from the entire time-series raw data of the INSIGHT-LME dataset using the 4-second windowing method with a 1-second overlap. The image dataset comprises of 11-classes of image data, among which, ten classes were from the ten LME exercises and the eleventh class from the common movements observed during the exercises. The training set was formed with a total of 43306 images from eleven class of data from 46 participants. Similarly, the validation set was formed with 13827 images from 15 participants and the test set was formed with 14340 images from 15 participants. Downsampling of images to 227 x 227 image was further achieved by data augmentation method in the input layer during the model implementation.

CNN_Model1 for the Exercise Recognition Task
An optimum model, termed CNN_Model1 here, was developed using python sequential modelling along with the Keras API, a high-end API for TensorFlow. The model constructed here was an optimum model with the best possible optimizer function, good learning rate to achieve better accuracy and with a very good loss function. The model was constructed with the choice of optimizer function among stochastic gradient descent (SGD), Adam, and RMSprop and the model was trained with varied learning rates ranging from 1e-03 to 1e-6 values. Also, the model was trained with loss functions such as categorical cross-entropy (CCE) and Kullback-Leibler divergence (KLD). The best model parameters were selected with an iterative evaluation using a varied number of epochs.
Data augmentations, like resizing of input dataset images and shuffling of input images were achieved using flow_from_directory method in ImageDataGenerator class from Keras image processing. Since the input images correspond to time-series data, augmentation operations such as shearing, flipping, and rotation tasks were not performed. CNN models were constructed using the training image dataset and validated using the validation image dataset while monitoring the validation loss. A model with a minimum validation loss was saved for each combination of network parameters. The model parameters such as training accuracy, validation accuracy, training loss and validation loss against the number of iterations were obtained and were plotted. A best model having the highest validation accuracy was selected and tested with the test image dataset and the resulting evaluation parameters such as test accuracy and loss measures were recorded. The best model, CNN_Model1, was then compared with the best model selected from supervised machine learning model. A complete list of the architecture parameters can be found in Table A1 in Appendix A.

Experimental Results of CNN_Model1
The CNN_Model1 having Adam optimizer, a learning rate 1e-4 with KLD loss function was the best model with a training score of 99.96% and a validation score of 94.01%. The model was further evaluated using the test set image dataset and an overall F1-score of 96.895% was recorded and was almost 1% better in comparison with the SVM model, the best performing supervised ML model (Section 4.1.5). The performance of CNN_Model1 for the individual exercises were evaluated and the statistical parameters measures like precision, recall and F1-score for each exercise were tabulated in Table 5. These test score measures of the individual exercise recognition of the CNN model with AlexNet architecture were comparably better with that of the SVM model (Table 4).  for upper-body and lower-body exercises is shown in Figure 17. The CNN_Model1 outperformed the SVM model in both the upper-body LME exercises and the lower-body LME exercises.

Exercise Recognition with CNN_Model2 using Time-Series Arrays
The third method, termed CNN_Model2, used in the comparative study of the exercise recognition task ( Figure 5) was the model that we designed and built from scratch using the deep CNN concept. The motivation to build CNN_Model2 was to process the raw time-series data from the wearable sensor through deep network without converting to images. Though the results observed from CNN_Model1 were very good for the exercise recognition task, it was interesting to to test a time-series data classification model for exercise recognition. Figure 18 represents the pipeline used in the development of a CNN model from scratch which uses wearable sensor data in the form of six 1-D time-series vector arrays. The CNN_Model2 was constructed using the time-series array data from the INSIGHT_LME data set (Section 3.4). Data segmentation and processing, model construction and the obtained results are discussed in the following sections.

. Data Segmentation and Prepossessing
Data segmentation was achieved using the sliding window method with a window length of 4seconds having an overlap of 0.5second. A 6D time-series data array of size (2048 x 1 x 6) was formed from every segmented frame. A 6D time-series array comprises six 1D time-series arrays corresponding to data from each axis of 3D accelerometer and 3D gyroscope. 6D time-series data set for the training, validation and testing were prepared from the training set, validation set and test set of the INSIGHT_LME data set and the class label information were retained in each data set. The data set was then used by CNN_Model2 for the exercise recognition task. Figure 19 illustrates the base architecture of the CNN_Model2 classifier for the recognition task. The deep network consists of seven 2D convolutional layers in addition to an input layer, two fully connected layers and a dropout layer. In each convolution layer, the convolution operation was followed by batch normalization, activation and max-pooling operations. The output of the seventh convolution layer was flattened and applied to fully connected layers with a ReLU activation function. A dropout of 0.25 was used, to prevent overfitting, before connecting to an output layer. A fully-connected output layer with a softmax activation function was used which was capable of classifying output into 11 classes. The number of filters used in seven convolution layers were 16, 16, 32, 32, 64, 64 and 96 respectively. The selection of the number of convolutional layers and the number of filters in each layer of the CNN_Model2 architecture were arrived after the initial few trials with different configurations. Table A2 of Appendix A lists the complete architecture parameters for the CNN_Model2.

CNN_Model2 Architecture for Exercise Recognition
Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 26 July 2020 doi:10.20944/preprints202007.0634.v1 The CNN_Model2 was constructed using Keras API with the TensorFlow back end with the choice of optimizer function among SGD, Adam and RMSprop. Similarly, a best learning rate was selected by training the model over a range of 1e-03 to 1e-10 with a decay of 1e-01. The models were optimized using the loss functions categorical cross-entropy and Kullback-Leibler divergence to have lower losses. Early stopping while the model building by monitoring validation loss and learning rate scheduler using the ReduceOnPlateau function from Keras was effectively used. Data augmentations like shearing, resizing, flipping, rotation were not performed on the time-series data. Models were trained and validated using 6D array data set from training dataset and validation data set (Section 4.3.1). A model with a minimum validation loss and with the best validation accuracy was selected as CNN_Model2 and this model was further tested using the test dataset.

Experimental Results of CNN_Model2
CNN_Model2 having Adam optimizer with a learning rate 1e-7 with KLD loss function was found to be the best model recorded with an overall training score of 96.89% and a validation score of 88.97%. The model recorded an overall test accuracy of 95.61% with an overall F1-score measure of 96% for the test dataset. The CNN_Model2 performance was very much comparable with both SVM model from supervised ML model(Section 4.1) and CNN_Model1(Section 4.2). The performance of CNN_Model2, in terms of statistical parameter measurements such as precision, recall and F1-score, for individual exercise were measured and tabulated in Table 6. A comparative representation of all three models with relative statistical measures for the upper-body and lower-body LME exercises were shown in Figure 20. These statistical evaluation measures obtained from the CNN_Model2 were compared with the results obtained from the SVM model (the best performed supervised ML model) and CNN_Model1. The overall performance of the CNN_Model2 was found to be better than the SVM model performance. However, the overall performance of CNN_Model1 was found to be better compared to CNN_Model2.

Summary of Comparative Study of Models for Exercise Recognition
The following conclusions can be drawn from the model comparison. Among the supervised ML models, models that were constructed with the dimensionality reduction using PCA were observed with inferior performance compared to the models without PCA. Among the four ML models without PCA, the SVM model performed the best. The deep CNN model, the CNN_Model1, using the AlexNet architecture found to be the single best model when compared with the SVM and CNN_Model2, and outperformed both SVM and CNN_Model2 in terms of overall test accuracy, precision, recall and F1-score values. The second deep CNN model, the CNN_Model2, was found to be the second best and outperformed the SVM model in terms of overall performance over test accuracy, precision, recall land F1-score measures. The models were compared for their performances on the upper-body Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 26 July 2020 doi:10.20944/preprints202007.0634.v1 Figure 20. F1-score comparison of SVM, CNN_Model1 and CNN_Model1 for Upper-Body and Lower-Body LME exercises LME exercise recognition and lower-body LME exercise recognition and observed that CNN_Model1 outperforms the other two models

Exercise Repetition Counting with Peak Detection Method
The first method, among three, investigated for exercise repetition counting was a signal processing method based on peak detection. The concept of the peak detection method lies in the identification of the peaks corresponding to maximum or minimum signal strength of any periodic time-series data. Figure 21 represents the end-to-end pipeline used for peak detection and counting repetitions using peak information. Raw data from the INSIGHT-LME data set corresponds to 3D accelerometer and 3D gyroscope recordings for limb movement having for each of the exercises. Each exercise type exhibits different signal patterns on the different sensor axes and the signal strengths on any given axes is proportional to the plane of limb movement. The periodicity of the signal observed on any significant axis of the sensor was used in the peak detection after completion of the exercise recognition task. Hence, 10 peak detectors were used, one for each exercise. The raw data from all the participants from the INSIGHT-LME dataset was used here to count the number of repetitions for each of the exercises. Data processing, filtering, peak detection and counting are discussed in the following section. direction. However, these signals were affected with the inherent noise introduced by the sensor. To understand and retrieve these signal variations and to calculate repetitions, the raw data was first processed and filtered. The first step is to identify a dominant sensor axis for individual exercise and use this signal in peak detection. The dominant sensor axis in the plane of limb movement was evaluated using the mean square values of acceleration measurements from all the three axes of the accelerometer and the mean square values of the rotation rate from all the three axes of the gyroscope.  Table 7 summarizes the sensor and the dominant axis information used in repetition counting for repetition counting for for individual LME exercises from the mean square value evaluation. 
For each exercise, the observed plane of movement of the right-wrist of the participant exercising were matched with the calculated dominant sensor axis using the mean square method. Signal plots of 3D accelerometer and 3D gyroscope for all the exercises are shown in Figure A1 of Appendix B and Figure A2 of Appendix C respectively.
Dominant axis signals were smoothed to remove the possible noise using a low pass Savitzky-Golay filter. The Savitzky-Golay filter removes high-frequency noise and has the advantage of preserving the original shape and features of the time-series signal. A window of 1023 samples and a filter order 4 was used.

Peak Detection and Repetition Counting
The peak detector detects both positive peak and negative peak values from the input time-series signal. The peak detector uses a threshold value based on the exercise type and from the input dominant signal and then using this threshold value it calculates two threshold points, an upper threshold point and a lower threshold point. Depending upon initial signal slope direction, a first max value or a first min value was determined. With an initial max/min value was determined, the detector next finds the next min/max value using the lower/upper threshold point and the search continues for the next max/min with the help of upper/lower threshold point. A pair of max-min value was used as an increment counter in the repetition counting. Figure 22 represents the filtered dominant accelerometer x-axis signal plotted with both positive and negative peaks using peak detector. A total of ten different peak detectors were used, one for each individual exercise.

Experimental Results of repetition counting using peak detectors
All the input data signals from the INSIGHT-LME dataset were used in testing to evaluate the overall performance of the peak detectors. The number of error counts, i.e. the difference between the actual number of repetition counts and the number of detected counts, was recorded in each case. Table 8 shows the results of repetition counting for individual LME exercise in terms of the number of errors with that of the actual count using peak detection method.
The table also indicates the total number of subjects that were used in testing each exercise. The Error Count, or 'e|X|', was used to show the number of participants having '|X|' number of error Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 26 July 2020 doi:10.20944/preprints202007.0634.v1 Figure 22. An example of repetition counting for Bicep Curls on the filtered dominant signal from the x-axis of the accelerometer sensor count/s, where '|X|' was used to for absolute error count in terms of 0, 1, 2 or more than 2 errors. The peak detector method used for the repetition counting performed better for upper-body exercises like, BC, FR, LR and TER in comparison to the repetition counting of the lower-body exercises. Figure 23 indicates the performance and the percentage of error in the repetition counts for each exercise. For example, from Table 8, for Bicep Curls an upper-body LME exercise, repetition counting without any error were reported for 144 instances among 151 subject trials. However, for 7 subject trials ±1 error count was reported.  FR  151  140  11  0  0  Lateral Raises  LR  150  141  9  0  0  Triceps Extension Right  TER  152  143  9  0  0  Pec Dec  PD  149  120  8  3  18  Trunk Twist  TT  151  128  14  5 4

Lower-Body LME Exercises
Standing Bicycle Crunch  SBC  149  132  8  4  5  Squats  SQ  146  63  11  6  66  Leg Lateral Raise  LLR  149  73  10  18  48  Lunges  L  147  11  9 13 114 The second approach investigated for repetition counting was a deep CNN model, using the same CNN architecture as previously i.e. CNN_Model1. We compare a single deep CNN model for the Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 26 July 2020 doi:10.20944/preprints202007.0634.v1 repetition counting task of all the exercises as opposed to the the use of multiple CNN models as used in [26]. Figure 24 illustrates the pipeline used for the repetition counting task using the CNN_Model1 as a binary classifier along with an additional repetition counter block. Inspired by the signal processing approach to repetition counting, CNN_Model1 uses the peak information from the signals. However, the CNN_Model1 uses a binary classifier for the repetition counting instead of 11 class classifier as in the case of the exercise recognition task (Figure 14). The output of the binary classifier using the CNN_Model1 was given to a repetition counter which counts the total repetitions for any given exercise. We used the image dataset created with a 4-second sliding window from section 4.2.1 and created new target label information from the dominant axis information from section 4.5.1. New labels for "Peak" and "NoPeak" were appended to the each exercise class image data using dominant signal information. Binary target class label information was generated using a grid of 50% width of the image and if the peak of the dominant axis signal plot in the image lies on the left half of the vertical-axis of the grid then the image was labelled with "Peak" ("1") otherwise the image was labelled with "NoPeak" ("0"). Figure 25 explains the labelling process by observing dominant axis signal plot information for the Bicep Curls exercise. 
in Figure 25, two 2D image plots corresponding to two different 4-second segments of 6D sensor signals of Bicep curls were labelled using the dominant axis information. Exercise data of 25 seconds corresponding to one exercise, when segmented using a 4-second window having an overlap of 0.5 seconds, resulted in a sequence of 43 images. The top most plot refers to the accelerometer x-axis which was the dominant axis for the Bicep Curls exercise. The training image data set with binary class label information was formed to have all images corresponding to all the exercise data from the training set of INSIGHT-LME dataset. Similarly, the validation and the test image data-sets are generated for the binary classification.

CNN_Model1 as a Repetition Counter
Models were trained with the training dataset of the newer image dataset with binary class label information and validated with the validation results. Models were built to have optimum parameters with variation in learning rate and selection of optimizer as discussed in section 4.2.2. We used a binary cross-entropy loss function while training all models and the best model was selected based on the validation score evaluation.
Repetition counting was done by testing a sequence of 43 images corresponding to a 25 second exercise data. The predicted result, from the model, on each image of the sequence was recorded and used in the repetition counter. A repetition counter counts the total number of transitions from "Peak" to "NoPeak" ("1" to "0") and from "NoPeak" to "Peak" ("0" to "1"). The total repetition count corresponds to half the number of total transitions and Figure 26 illustrates the counting from the prediction labels . The optimization of parameters were selected based on lowest validation loss measures and the optimum CNN_Model1 for the repetition counting task was with Adam optimizer and with a learning rate of 1e-5. The model was further tested with the test dataset images. The test data set corresponds to the data from fifteen participants and each exercise was performed twice by each participant resulted with a total of 30 exercises data for each exercise. Table 9 shows the result of the repetition counting for individual LME exercises in terms of the number of errors with that of the actual count. The overall performance of the model in the repetition counting using a single AlexNet architecture based CNN model was very accurate for most of the upper-body LME exercises. However, for the lower-body exercises the repetition count performance for LLR was 80% and was better compared to the performance with other lower-body exercises. For the lunges, the model performance was poorest in the repetition counting. Figure 27 shows the percentage-wise result of the model in terms of the repetition count errors for the individual exercises. The performance of the model for the upper-body LMEs like: FR and LR, it was 100%. For other upper-body LMEs like: BC, TER, PD correct counting was 96.67%. In the case of lower-body LME exercises, for LLR the correct counting was 80.0% . For other exercises the performances of the model with zero error count was poor. 
However, the overall count performance of CNN_Model1 was better for most of the exercises, when compared to the repetition counting using the signal processing model (Figure 23).
A significant portion of the error counting was related to the single count error. Ground truth and predicted count patterns were studied to understand the reasoning for most of the single count errors. Figure 28(a) and Figure 28(b) depicts 25 seconds of Bicep curls exercise data representation with the ground truth compared with that of predicted output from the model. Most of the single count error from the predicted signal were due to the binary classifier misclassifying the transition state peaks, Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 26 July 2020 doi:10.20944/preprints202007.0634.v1

Exercise Repetition Counting with CNN_Model2 using Time-Series Arrays
The third model used in the repetition counting task was again a deep CNN model using the CNN_Model2 architecture which uses 6D time-series array data from the raw signal for binary classification. Figure 29 represents the pipeline using the CNN_Model2 (Figure 19 architecture in the repetition counting task. This method used was again uses a single CNN model for the repetition counting instead of the multiple CNN models used by [26].
The CNN_Model2 used here includes a binary classifier for repetition counting instead of an 11 class classifier model used in the case of the exercise recognition task (section 4.3. The data processing method uses the 6D time-series array data set from section 4.3.1 along with the new binary labelling information generated from section 4.6.1. The data processing and segmentation associated is further discussed in data segmentation and processing section below. The output of the CNN_Model2 was given to a repetition counter to count the total repetition counts for any given exercise. Additional binary target class label information from section 4.6.1 was appended on the 6D time-series array dataset generated with a 4-second sliding window from section 4.3.1. 25 seconds of sensor data, corresponding to the individual exercises, when segmented using a 4-second windowing method with 0.5 second overlap results into a total of 43 6D time-series array segments. Newly labeled data sets for the training, the validation and the testing data sets from the INSIGHT-LME dataset were used in the training, validation and testing the binary classifier.

CNN_Model2 as a binary classifier and Repetition Counter
The binary classifier models with CNN_Model2 architecture were trained with the training dataset of the newer 6D time-series array dataset with the binary label information and were validated with the validation results. The models were built to have optimum parameters with variation in learning rate and selection of optimizers among SGD, Adam and RMSprop along with the binary cross-entropy loss function. Testing the model with exercise data corresponding to a single participant was about testing the sequence of all 43 6D time-series array segments. The predicted results from the model, for these sequence of arrays from the test dataset was used in the repetition counting by a repetition counter. Similar to the section 4.6.2, the repetition counting was done by calculating the number of transition from "Peak" to "NoPeak"("1" to "0") and from "NoPeak" to "Peak" ("0" to "1"). The total number of repetition counts were equal to half the number of total transitions. The total repetition count corresponds to half the number of total transitions and Figure 26 illustrates the counting from the prediction labels .

Experimental Results of repetition counting using CNN_Model2
The optimum model was selected based on the validation score and was with an Adam optimizer, and a learning rate of 1e-06. The optimum model was further tested with the test data set. The test data set consists of total 30 exercise data from each exercise type corresponding to the fifteen participants performing each exercise twice. Table 10 shows the results of repetition counting for individual LME exercise in terms of the number of errors with that of the actual count. Overall performance of repetition counter using a single CNN_model2 of 6D time-series array data is good for upper-body LME exercises. However, for lower-body body exercises the repetition count performances are average one. For lunges, a lower-body exercise, the repetition counting is the poorest. Figure 30 represents the plot of error portions of error counts for each exercise. we could achieve 100% correct counting only in the case of LR exercise trials. However, the count performances were better in comparison with the signal processing based peak detection method. However, the repetition counting was very poor in the case of Lunges.  A significant amount of error count for upper-body LME exercises was with one count error. As discussed in section 4.6.3 the single error count cases were mainly observed with the signal transitions at the beginning or the end of the 25 seconds signal samples. With a tolerance of one count error (i.e. blue + yellow, Figure 30) performance of the model were almost comparable to that of CNN_Model1. In comparison to the CNN_Model1, CNN_Model2 performance was inferior but comparable.

Summary of Comparative study of models for Exercise Repetition Counting
We studied three different methods for the exercise repetition counting task: the peak detector, CNN_Model1 and CNN_Model2. The peak detector method, was a signal processing approach where we designed ten different peak detectors based on the dominant sensor-axis signal information, one for each exercise. The method was a dependent model on the exercise recognition task and therefore was used only after completion of the exercise recognition task as a follow-up block processing method. The fact that the requirement of ten different peak detectors one for each exercise recognition and a follow-up sequential block processing makes it a weak contender though it has an overall good repetition counting output.
The second model, the CNN_Model1 was built based on AlexNet architecture, and was a single deep CNN model used in the repetition counting in comparison with any state of the art CNN models proposed in repetition counting for a set of exercises. The CNN_Model1 found to have extremely good performance for seven out of ten exercises and also the overall a single best deep CNN model studied here, capable of counting repetitions with all the exercises. Overall performance of CNN_Model1 was extremely better for the upper-body LMEs.
The third model, CNN_Model2, which processes raw time-series 6D signals was a single deep CNN model capable of counting repetition of all the exercises. The CNN_Model2 was also a second single deep CNN model that we are proposing as the single model which performs the repetition counting task of all exercises from a set of exercises. This model performance was the overall second best in the repetition counting task and performed better for the upper-body LMEs similar to the CNN_Model1. However, the CNN_Model1 was the single best model that could be used for the repetition counting task among all three models.

Conclusions and Future Work Scope
Exercise detection and repetition counting task using a single wearable device with the state of the art algorithms is a key technological demand in the area of e-Health, e-Rehabilitaion methods. In this paper, we presented the exercise recognition and repetition counting of LME exercises used in cardiovascular disease rehabilitation using different methods based on machine learning and deep learning. No public dataset was available with a single sensor wearable device specifically for the exercises used in cardiovascular disease rehabilitation which can be used on eHealth, mHealth platforms. We created the INSIGHT-LME dataset using a single wrist-worn wearable devices under the supervision of health experts from the sports clinic and with the guidance from clinical staff. The new dataset will encourage further research in the field of application using single wrist-worn inertial sensor in exercise-based rehabilitation.
Our study aimed to find the single best machine learning or deep learning method for each of the two tasks: exercise recognition and repetition counting. In the first comparative study, on exercise recognition, we evaluated supervised machine learning models (SVM, RF, kNN and MLP), with and without dimensionality reduction using PCA. The SVM was the best-performing supervised ML model, with an overall accuracy of 96.07%. In addition to the supervised models, two deep CNN models were studied: CNN_Model1, based on the AlexNet architecture, and CNN_Model2, a custom model built to handle time-series array data; both were compared against the SVM. CNN_Model1 was the single best-performing model for the exercise recognition task, with an overall F1-score of 96.89%.
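The classical recognition pipeline described above can be sketched with scikit-learn. This is a minimal illustration, not the study's implementation: the feature count, window count, and random data are hypothetical, and the PCA dimensionality and SVM kernel are assumptions.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.svm import SVC

# Hypothetical data: one handcrafted feature vector per sliding window,
# with an integer label for each of the 10 LME exercise classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 36))       # 200 windows x 36 features (assumed)
y = rng.integers(0, 10, size=200)

# Standardise, optionally reduce with PCA, then classify with an SVM,
# mirroring the SVM + PCA variant compared in the study.
clf = make_pipeline(StandardScaler(), PCA(n_components=10), SVC(kernel="rbf"))
clf.fit(X, y)
pred = clf.predict(X[:5])            # one predicted exercise label per window
print(pred.shape)                    # -> (5,)
```

Dropping the `PCA` step from the pipeline gives the "without dimensionality reduction" variant, and swapping `SVC` for `RandomForestClassifier`, `KNeighborsClassifier`, or `MLPClassifier` reproduces the other baselines.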
In the second comparative study, on repetition counting, we again compared three methods: a signal processing (peak detection) method, CNN_Model1, and CNN_Model2. The signal processing method was the most accurate at counting repetitions. However, it can only run sequentially after exercise recognition, and it requires a different algorithm for each of the ten exercise types. CNN_Model1, using the AlexNet architecture, is the second-best repetition counting method, counting accurately for most upper-body LME exercises. With a tolerance of ±1 count, CNN_Model1 was accurate in 90% of the repetition sets across almost all exercises, the exception being Lunges, a lower-body LME exercise.
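The per-exercise nature of the signal processing baseline can be seen in a minimal peak-detection sketch using SciPy. The synthetic signal, sampling rate, and thresholds below are all assumptions for illustration; in practice the height and spacing parameters must be tuned separately for each exercise, which is why this baseline needs ten different algorithms.

```python
import numpy as np
from scipy.signal import find_peaks

# Hypothetical accelerometer magnitude for one exercise set:
# 8 repetitions at ~0.5 Hz, sampled at 50 Hz, plus sensor noise.
fs = 50
t = np.arange(0, 16, 1 / fs)
sig = np.sin(2 * np.pi * 0.5 * t) \
      + 0.05 * np.random.default_rng(1).normal(size=t.size)

# Count one peak per repetition: require a minimum height and at least
# one second between peaks (both thresholds are exercise-specific).
peaks, _ = find_peaks(sig, height=0.5, distance=fs)
print(len(peaks))  # -> 8 repetitions counted
```

Because the thresholds encode the movement pattern of a specific exercise, the exercise must already be recognised before the right counting algorithm can be selected, hence the sequential structure noted above.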
In future work, this study can be extended to provide qualitative feedback based on the variations observed in the exercise data. We studied exercise recognition and repetition counting in offline mode, with a windowing method using the smallest overlap of 0.5 seconds; this invites further study of real-time recognition and counting on wearables, with on-device computation under limited memory. The two deep learning approaches studied here could also be analysed for hardware and time complexity if they are to be implemented on miniaturised wearable devices.
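The windowing step underlying both tasks can be sketched as follows. This is a generic illustration: the window length and sampling rate are assumptions, and the 0.5 s value is used here as the hop between consecutive windows.

```python
import numpy as np

def sliding_windows(signal, fs, win_s, step_s):
    """Segment a (samples, channels) signal into fixed-length windows.

    step_s is the hop between window starts; a small hop (e.g. 0.5 s)
    gives heavily overlapping windows, as used in the offline study.
    """
    win, step = int(win_s * fs), int(step_s * fs)
    starts = range(0, signal.shape[0] - win + 1, step)
    return np.stack([signal[s:s + win] for s in starts])

fs = 50                                   # assumed sampling rate
data = np.zeros((fs * 10, 6))             # 10 s of 6-channel inertial data
windows = sliding_windows(data, fs, win_s=5.0, step_s=0.5)
print(windows.shape)                      # -> (11, 250, 6)
```

For real-time on-device use, the same segmentation would be run incrementally over a ring buffer rather than over the full recording, which is where the memory constraints mentioned above become relevant.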

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analysis, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Appendix A. Model architecture for CNN_Model1 and CNN_Model2
The CNN model architectures used in the exercise recognition task are given in Table A1 for CNN_Model1 and in Table A2 for CNN_Model2, listing the layers and parameters used. The same models, with only the output layer varied, are used in the repetition counting task.