Machine Learning Approach for Fatigue Estimation in Sit-to-Stand Exercise

Physical exercise (PE) has become an essential tool for different rehabilitation programs. High-intensity exercises (HIEs) have been demonstrated to provide better results in general health conditions, compared with low and moderate-intensity exercises. In this context, monitoring the patient's condition is essential to avoid extreme fatigue conditions, which may cause physical and physiological complications. Different methods have been proposed for fatigue estimation, such as monitoring the subject's physiological parameters and using subjective scales. However, there is still a need for practical procedures that provide an objective estimation, especially for HIEs. In this work, considering that the sit-to-stand (STS) exercise is one of the most implemented in physical rehabilitation, a computational model for estimating fatigue during this exercise is proposed. A study with 60 healthy volunteers was carried out to obtain a data set to develop and evaluate the proposed model. According to the literature, this model estimates three fatigue conditions (low, moderate, and high) by monitoring 32 STS kinematic features and the heart rate from a set of ambulatory sensors (Kinect and Zephyr sensors). Results show that a random forest model composed of 60 sub-classifiers presented an accuracy of 82.5% in the classification task. Moreover, results suggest that the movement of the upper body part is the most relevant feature for fatigue estimation. Movements of the lower body and the heart rate also contribute essential information for identifying the fatigue condition. This work presents a promising tool for physical rehabilitation.


Introduction
Physical exercise (PE) is defined as any activity performed by the muscles that requires more energy than a resting state [1]. According to the World Health Organization, PE is a fundamental tool to prevent and treat many non-communicable diseases [2], such as cardiovascular diseases, cancer, stroke, and diabetes. Therefore, to help patients and clinical staff to achieve specific rehabilitation aims, PE has been incorporated into different rehabilitation programs [3,4]. On the one hand, PE is used for improving the patient's cardiovascular and respiratory capabilities in cardiac [5][6][7] and pulmonary [8,9] rehabilitation sessions. Furthermore, in oncology rehabilitation, the PE helps mitigate the pathological fatigue effects [10,11], which is a common symptom presented in patients with cancer [12]. This means that patients are easily exhausted when performing activities of daily living [13]. In addition, the PE is implemented in neuromuscular [14][15][16] and musculoskeletal rehabilitation [17,18] to restore joint mobility and muscle strength of affected limbs.
The main aim of PE in rehabilitation is to develop the health-related physical fitness (HRPF) state of the patients [3], which refers to the components that are required to have a healthy life. In general, these components are focused on preventing illness or improving functional health, rather than on sports performance [19]. The HRPF capabilities can be divided into three groups [3,19]: body composition, which considers the distribution of the different body tissues (water, fat, muscle, and bone) [19]; musculoskeletal, which refers to the strength, endurance, power and flexibility of the muscles [20]; and cardiorespiratory or aerobic capability, related to the ability of the circulatory and pulmonary systems to provide oxygen for creating energy during long periods of activity. The cardiorespiratory capabilities are usually assessed with the maximal oxygen uptake (VO2MAX) [21]. Nevertheless, studies have shown that the anaerobic capability is also relevant for a good quality of life because it reflects the body's ability to produce energy without oxygen, which is the metabolic pathway used for sudden, short movements commonly executed in daily life [3,20,22,23].
Essentially, stretching, endurance and resistance exercises with external loads or human body weight are used for developing the musculoskeletal group [20]. On the other hand, aerobic activities (e.g., walking, jogging or riding) are implemented for the cardiorespiratory elements [19]. Finally, the anaerobic capability is amplified by short-duration powerful activities (e.g., vertical jumps, sit to stand or running short distances) [22]. Therefore, various activities can be implemented to achieve different goals [3].
Despite the benefits of PE, several considerations must be kept in mind for its implementation in rehabilitation, because taking patients to extreme exercise conditions and high fatigue states might lead them to suffer physical or physiological complications [24]. Considering this, a personalized exercise plan must be designed when PE is used as a clinical tool to achieve the different objectives of each rehabilitation program. This plan must be prescribed by a specialized health care professional, according to the unique conditions of each patient (e.g., weight, height, age, injuries, medication, pathologies [25]). Commonly, the training plan considers the activities, frequency, time, and intensity of training [25]. Studies have shown that the intensity is the most relevant feature when prescribing PE [26,27] because it determines the amount of energy expenditure and can be seen as the "dose" of the prescription [3]. Moreover, it is used to allocate the exercises into three groups: low-intensity exercises (LIEs), moderate-intensity exercises (MIEs) and high-intensity exercises (HIEs) [3,23].
The LIEs are composed of soft activities that demand a low energy cost (e.g., slow walking on a flat surface), and they are used for patients with extreme risk conditions [3]. The MIEs comprise continuous activities with a long duration that require a low effort (e.g., walking on a slope for 20 to 60 min) [3]. At first, the LIEs and MIEs were the only classes implemented in rehabilitation, especially in cardiac rehabilitation, because they let the clinical staff manage the intensity easily and had been shown to be sufficient to reduce chronic disease risk factors [28,29]. In contrast, HIEs are forceful activities with a short duration (from 15 s to 5 min, depending on the intensity), which can be divided into recovery and training periods [30]. Since several studies have demonstrated that HIE is more effective at increasing the VO2MAX [20][21][22][28][31][32][33][34][35][36], this training technique has been widely used in physical rehabilitation [37]. Furthermore, the American Heart Association has incorporated the HIE into its recommendation manual for patients with heart diseases [38]. Nevertheless, the countless possible interval variations and the difficulty of managing the intensity make HIE prescription a complex task [23].
Although many activities can be used as HIEs, it is highly recommended to perform exercises that reflect daily life motions (e.g., jumping, carrying loads, or climbing stairs), because they elicit the metabolic pathways and the muscular groups required for a healthy life [22]. Moreover, they are easy to implement in clinical scenarios [22]. Thus, bearing in mind that sitting and standing are some of the most common activities, the sit-to-stand (STS) test is widely implemented in physical rehabilitation [39]. This test consists of sitting and standing from a chair as fast as possible during a determined period (between 30 and 120 s), and it is considered one of the hardest exercises [40]. Moreover, studies have demonstrated that it is indispensable for increasing VO2MAX and assessing the patients' physical state [39,41]. However, due to its high intensity, it requires special monitoring compared to the other HIEs [42].
Hence, considering the importance of the STS test in the rehabilitation programs and the risks of taking patients to high fatigue conditions during sessions, there is a need to develop methods that allow managing the exercise intensity [42].

Exercise Intensity and Fatigue Regulation Background
Intensity can be defined in two ways: absolute, which refers to the complete quantity of energy used during the whole training, and relative, which contemplates the rate of energy implemented in the activity [43]. Hence, the more energy required for an activity, the harder it will be. Because the relative intensity allows obtaining a real-time metric, it is the most used in physical rehabilitation sessions for monitoring the patient's condition [43]. However, quantifying the amount of metabolic energy expended is a complex task due to the fact that the human body executes too many jobs at the same time and has different ways of producing energy [3]. Therefore, several techniques have been explored to get an indirect estimation of the exercise intensity.
One approximation consists of estimating the relative intensity by using the metabolic equivalent (MET), a unit that represents the ratio between the rate of energy expended in physical activity and the rate of energy expended in a resting state, commonly measured in kcal·kg⁻¹·h⁻¹ [44]. In this way, the LIE is lower than two METs, the MIE is between two and six METs, and the HIE is higher than six METs [45]. Because obtaining the rate of energy expended is not an easy task, many exercises have been classified by global organizations according to some standards of healthy people [46]. However, this implementation of the MET unit has been widely criticized for exercise intensity regulation in rehabilitation sessions because it does not allow monitoring the patient's condition during the session [47]. Hence, it is more commonly used to get a general idea about the type of exercise implemented in the training plan [44].
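The MET thresholds quoted above can be illustrated with a minimal sketch. The function names are hypothetical, and treating exactly two and exactly six METs as moderate intensity is an assumption, since the text does not say which group the boundary values belong to.

```python
def met_value(activity_energy_rate, resting_energy_rate):
    """Ratio between the energy rate during activity and at rest.

    Both rates must be expressed in the same unit (e.g., kcal/kg/h).
    """
    if resting_energy_rate <= 0:
        raise ValueError("resting energy rate must be positive")
    return activity_energy_rate / resting_energy_rate


def intensity_from_met(met):
    """Map a MET value to LIE / MIE / HIE.

    Assumption: the boundary values 2 and 6 are counted as MIE.
    """
    if met < 2:
        return "LIE"
    if met <= 6:
        return "MIE"
    return "HIE"
```

For instance, an activity that costs four times the resting energy rate would fall in the moderate-intensity group under these assumptions.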
As continuous monitoring is essential for patients with chronic diseases during physical sessions, it is preferred to estimate the exercise intensity based on metrics obtained directly from the patients [3]. Thus, other methods that consist of measuring physiological parameters related to the energy expended have been proposed, for example, monitoring the patient's breathing rate, blood lactate level, oxygen saturation, or blood pressure [48][49][50].
The oxygen uptake (VO2, usually measured in mL/min/kg) is considered one of the best ways to determine the exercise intensity, because of its linear relationship with the energy cost [51]. Furthermore, the VO2 can be converted easily to MET units [52]. Nevertheless, it requires complex instrumentation, and it is difficult to measure directly. Hence, health professionals prefer not to apply this technique in their sessions [3]. A similar case is presented for the blood lactate level, blood pressure, and oxygen saturation, where a static position is required to obtain reliable measurements [3].
On the other hand, the heart rate (HR) can be easily estimated during exercise and has shown a linear relation with the VO2 [50]. Therefore, HR is the most used physiological parameter in physical rehabilitation, especially for aerobic training [50]. Bearing in mind that each person may present different HR values during resting or training conditions, it is better to implement the heart rate reserve (HRR) for intensity monitoring [3].
The HRR represents the safe range of a person's HR values, calculated as the difference between the maximum HR (estimated by an exercise test or from the age) and the resting HR of the person [3]. This metric can be provided as a percentage, which represents the part of the range that is being covered. Hence, a value of 100% means that the person reaches his/her maximum HR. In contrast, 0% means that the person is at his/her resting HR. It can be used to divide the exercise intensities as follows: the LIE is between 20-39%, the MIE is between 40-59%, and the HIE is between 60-84% of the HRR [3]. Nevertheless, studies have shown that the HR stops rising linearly when reaching its maximum values and loses its relationship to the VO2. Thus, it is not recommended to use only this indicator for monitoring HIE [50].
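As an illustration of the percentage described above, the following sketch computes the covered part of the HRR and maps it to the three intensity groups. The function names and the labels returned outside the 20-84% interval are hypothetical, and placing fractional values such as 39.5% in the lower group is an assumption.

```python
def percent_hrr(hr_bpm, resting_hr_bpm, max_hr_bpm):
    """Percentage of the heart rate reserve covered by the current HR.

    0 % corresponds to the resting HR and 100 % to the maximum HR.
    """
    reserve = max_hr_bpm - resting_hr_bpm
    if reserve <= 0:
        raise ValueError("maximum HR must be greater than resting HR")
    return 100.0 * (hr_bpm - resting_hr_bpm) / reserve


def intensity_from_hrr(percent):
    """Map %HRR to the groups quoted in the text (20-39, 40-59, 60-84).

    Assumption: fractional values are assigned with half-open intervals,
    and values outside 20-84 % get explicit 'below'/'above' labels.
    """
    if percent < 20:
        return "below LIE"
    if percent < 40:
        return "LIE"
    if percent < 60:
        return "MIE"
    if percent < 85:
        return "HIE"
    return "above HIE"
```

With a resting HR of 60 bpm and a maximum HR of 200 bpm, a current HR of 130 bpm covers half of the reserve, which this sketch assigns to the moderate-intensity group.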
Other methods consist in estimating the patient's fatigue because it is understood as a lack of energy to keep performing an activity [53]. Fatigue has been considered as a subjective experience [54], which describes a decrease in physical performance associated with an increase in a task or exercise's real/perceived difficulty [55].
Electromyography (EMG) is considered the gold standard to detect muscular fatigue because it directly measures the bio-electrical function of the muscles [56,57]. However, EMG processing is a complex task to execute in real time, since it requires power and frequency analysis [58]. It is also necessary to consider the noise generated by external factors and the location of the electrodes, especially as they are not always placed in the same position [59]. Furthermore, some exercises involve many muscular groups, which makes it necessary to use several electrodes [59].
Other techniques implement subjective methods where the patients are asked about the level of perceived exertion or fatigue, according to established ordinal numeric scales [60]. The ten-point Borg scale (Borg CR10) is composed of 11 levels (from 0 to 10), and it is one of the most applied in physical rehabilitation [61]. On this scale, the LIE corresponds to values 0-3, the MIE to values 4-6, and the HIE to values 7-9, while a value of 10 represents the maximum effort and means that the patient is not able to continue with the exercise [3]. Despite its ease of application, studies have demonstrated that, due to its subjectivity, it does not always represent the real intensity compared to the physiological parameters (specifically, to the VO2) [62].
Finally, the last technique is based on the idea that fatigue can be seen as a decrease in the performance of the user [63]. Current studies have shown that some temporal, kinematic, and dynamic features of the activity executed may change according to the exhaustion level of the user [64][65][66]. In general, ambulatory sensors are used to estimate the related performance features (e.g., accelerometers, gyroscopes, pressure sensors, and force platforms) because they allow physical metrics of the user to be measured easily in real scenarios [67,68]. Furthermore, computer models have been developed through the application of machine learning techniques to estimate whether the user is in a fatigued or non-fatigued state, i.e., only two states of fatigue. Studies have proposed models for different exercises, such as vertical jump [69], lower limb endurance training [66], and walking [70], showing an accuracy between 85% and 95%.
Although this novel technique presents a significant potential for clinical scenarios because it provides an objective indicator of the user's fatigue condition [70], having only the estimation of two fatigue states limits more accurate monitoring of the user's performance during therapy. However, these systems implement sensors that are easy to adapt and use in rehabilitation environments, providing a practical and useful tool for the health staff [67,68]. Moreover, due to the global health emergency caused by the coronavirus disease 2019 (COVID19), the need for home clinical tools has increased lately [71]. Therefore, this type of technology presents a significant potential for telemedicine in rehabilitation applications. Nevertheless, this method is highly dependent on the exercise type, and each activity performance is assessed with different features [72], which requires adapting the whole system to the corresponding activity.
Regarding the STS test, to the authors' knowledge, two studies have explored this novel technique for this exercise. Aguirre et al. [65] determined which STS features present a relation with fatigue, implementing a Kinect depth sensor and the Borg scale, with twenty healthy volunteers. Results showed that two temporal and three kinematic STS features present a significant linear relation to the exhaustion level. However, a model to estimate fatigue was not developed.
In turn, Jiménez et al. [42] presented a case study for detecting fatigue employing EMG signals and a smartphone accelerometer, with an obese and sedentary volunteer who performed eight STS tests. Results show that the relative energy acceleration of the movement increases and the number of repetitions decreases when the person is physically exhausted. Nevertheless, an estimation model was not presented, and it was concluded that future work should use these characteristics to develop robust models [42].
This work takes into account the advantages and disadvantages, described above, of the novel fatigue estimation method based on a decrease in the performance of the user and on machine learning models. Specifically, the machine learning methods developed so far only consider two states of fatigue, i.e., a fatigued or non-fatigued state, which limits accurate monitoring of the user's exhaustion during therapy and thus the possibility of improving the user's performance during therapy. We also consider the relevant information that the HR provides about exercise intensity, the importance of monitoring the patients' fatigue condition during exercise to avoid injuries or negative effects on the rehabilitation process, and the wide use of the STS exercise in physical rehabilitation. Therefore, this work aims to carry out a study with 60 healthy volunteers to develop and evaluate a machine learning model, based on the evaluation of the participant's exercise performance, that classifies three fatigue levels (low, moderate, and high) in order to monitor the user's fatigue state more specifically during the STS exercise. For this purpose, the HR of the participants during the execution of the STS exercise was monitored, and kinematic and temporal characteristics of the movement were obtained through the Kinect. This device was chosen for its ease of adaptation and use in rehabilitation settings, providing a practical and helpful tool for healthcare personnel.

Materials and Methods
Bearing in mind the motivation and the related works mentioned in Section 1, this section presents the methodology applied to develop the proposed model. In general, the model estimates three levels of fatigue in the STS exercise by monitoring 32 kinematic/temporal features and the user's heart rate. To this end, this model is based on machine learning techniques, developed with 660 STS registers obtained from 60 healthy people and Borg's scale. Therefore, the first step consisted of an experimental study with 60 healthy participants to obtain the corresponding data to develop and assess the proposed fatigue estimation model.

Subjects Recruitment
A total of 30 females and 30 males were recruited to perform a 2 min sit-to-stand test, according to the following criteria:
Inclusion criteria: healthy adult subjects between 18 and 30 years old and with a weight between 50 and 75 kg were considered. Furthermore, volunteers must have been in a non-fatigued condition, according to the "multi-dimensional fatigue inventory". This tool is a 20-item questionnaire used for measuring the user's fatigue condition, according to 5 different classifications: general fatigue, physical fatigue, mental fatigue, reduced motivation, and reduced activity.
Exclusion criteria: subjects with physical impairments that prevent them from sitting down and standing up, cognitive impairments that do not allow them to follow instructions, or conditions that put them at risk in a fatigued state, as well as subjects who use prostheses or orthoses in their limbs, were excluded from the study.
Finally, the volunteers signed an informed consent form stating that they voluntarily agreed to participate in this study. The ethics committee of the university "Colombian School of Engineering Julio Garavito" (Bogota, Colombia) approved this protocol. The mean and standard deviation (M ± SD) of the volunteers' demographic data for the female and male groups are shown in Table 1.

Materials
To analyze the STS kinematics, the heart rate and the fatigue level of each volunteer, a multitasking application (i.e., a multi-threaded application not affected by the sampling rate of each sensor utilized) was developed to incorporate and synchronize the following tools in a single computer process:
• Kinect V2 (Microsoft, USA): This sensor uses depth and RGB images to segment the human body. In this work, the second version of this sensor was used with the Windows SDK, which provides 25 body points. It can measure the 3D position and orientation of each body point at a sample rate of 30 Hz. Moreover, because this activity is normally executed in the same plane, this Kinect has been widely used to analyze the STS movement, showing great accuracy and performance [73]. The sensor was placed on a tripod at 1 m from the floor and 4 m from the subject, as suggested for proper use [73].
• Zephyr HxM BT (Medtronic, Ireland): This is a wearable sensor that has been used to extract heart rate information from patients requiring continuous monitoring. In this study, data were collected through a Bluetooth communication channel with a sample rate of 1 Hz. It was placed on the volunteer's chest with an elastic band. Moreover, it was used to measure the resting heart rate of each subject. The selection of the Zephyr BT sensor was made based on accuracy, reliability, cost, availability, and comfort [74][75][76].

Procedure
Initially, each volunteer's maximum heart rate (MHR) was estimated with the "Tanaka equation", shown in Equation (1), which uses the user's age (AGE, in years) to approximate the MHR:
$$\mathrm{MHR} = 208 - 0.7 \times \mathrm{AGE} \tag{1}$$
It is essential to highlight that the Tanaka equation is recommended for healthy individuals, such as those involved in this study, because this equation significantly overpredicts maximal heart rate. Therefore, for people who present some diseases, it is recommended to adapt this method for exercise testing [77,78].
The volunteers were informed about the use of the Borg CR10 scale and instructed to perform a 5 min warm-up, composed of stretching movements and a 3 min treadmill walk. Afterward, the participants were instrumented with the Zephyr sensor. For the test, a 40 cm high chair without armrests was used, which was placed 4 m in front of the Kinect V2 sensor.
Before starting the test, participants were asked to stand with their hands on their shoulders and were instructed to look straight forward during the entire test. When the participants heard the command "forward", they began to perform the exercise. The exercise consisted of sitting down and getting up from the chair as fast as possible for 120 s (2 min) without stopping. Nevertheless, if the heart rate exceeded 90% of the MHR, or a Borg value of 10 was reported, the test was immediately concluded. Finally, the volunteers were instructed to perform a 5 min cool-down. The STS exercise representation and the study set-up can be seen in Figure 2.
The Kinect V2 provides 25 body markers. Figure 2B illustrates the sitting position, the sensor locations, the reference system of the Kinect V2 (X, Y, and Z), and the orientation of some Kinect points (Xp, Yp, and Zp).
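The safety rule of the procedure can be sketched as follows. The sketch assumes the commonly cited form of the Tanaka equation, MHR = 208 - 0.7 × AGE; the function names and the `hr_fraction` parameter are hypothetical and only illustrate the two stop conditions described in the protocol (HR above 90% of the MHR, or a reported Borg CR10 value of 10).

```python
def tanaka_mhr(age_years):
    """Age-predicted maximum heart rate in bpm.

    Assumption: commonly cited Tanaka form, MHR = 208 - 0.7 * AGE.
    """
    return 208 - 0.7 * age_years


def should_stop_test(hr_bpm, borg_cr10, age_years, hr_fraction=0.90):
    """Return True when either stop condition of the protocol is met.

    hr_fraction is the share of the MHR that must not be exceeded
    (0.90 in the protocol described above).
    """
    hr_limit = hr_fraction * tanaka_mhr(age_years)
    return hr_bpm > hr_limit or borg_cr10 >= 10
```

For a 20-year-old participant, the estimated MHR is about 194 bpm, so in this sketch the test would be stopped above roughly 174.6 bpm or as soon as a Borg value of 10 is reported.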

Data Processing
Considering the metrics mentioned above, Figure 3 presents an example of a test register. In Figure 3A, the movement of the M_hip marker on the Y-axis (M_hip_y) is shown, where it is possible to appreciate the sit-to-stand movement as a harmonic signal. This is because the STS test consisted of performing a repetitive activity, which creates a repetitive behavior in the position signals, especially for the vertical movements. Figure 3B illustrates the heart rate register and how it increases during the test. Finally, Figure 3C contains the 4 Borg CR10 values reported by the volunteer, one every 30 s.

Kinect Features
Taking advantage of the repetitive behavior of M_hip on the Y-axis, an automated procedure was implemented to detect each stand-to-stand cycle. Essentially, the process consisted of subtracting the mean value of the whole M_hip signal and detecting the minimum and maximum values of each cycle. The maximum values were considered as the moments when the volunteer was standing, and the minimum values as the moments when the subject was sitting. Hence, these values allow estimating the two phases of the STS activity, stand-to-sit and sit-to-stand. Figure 4 presents an example of the M_hip signal on the Y-axis (M_hip_y) of one test register, where it is possible to see the maximum values (Max_val) and minimum values (Min_val) of the corresponding signal. Furthermore, Figure 4 shows a zoom of one part of the signal, where a stand-to-stand cycle and its phases can be appreciated.
According to these stand-to-sit and sit-to-stand phases, the following 32 kinematic and temporal features were estimated for each stand-to-stand cycle, where "Fn" represents the feature number "n", and the symbol "*" indicates that the corresponding feature was estimated with the mean value of both sides, left and right:
• [...] Figure 2B). Hence, this feature was estimated with the difference between the maximum and minimum value of the corresponding signal during the sit-to-stand phase.
• F31: Spine abduction-adduction max velocity (°/s), estimated by differentiating the spine abduction-adduction signal and obtaining its maximum value during the stand-to-sit phase.
• F32: Spine abduction-adduction min velocity (°/s), estimated by differentiating the spine abduction-adduction signal and obtaining its minimum value during the sit-to-stand phase.
Figure 5 illustrates an example of the estimation of some features in two consecutive stand-to-stand cycles.
The dashed lines delimit the stand-to-stand cycles, the superscript symbol " ' " represents the derivative operation of the corresponding signal, the dark dots represent the maximum values (Max_val) of each signal, and the gray polygons represent the minimum values (Min_val). Figure 5A graphically presents the estimation of the stand-to-stand time (F1), sit-to-stand time (F2), stand-to-sit time (F3) and M_hip vertical range (F4). Figure 5B shows the derivative of the M_hip_y signal (M_hip_y') and the M_hip max and min vertical velocity (F6 and F7). Figure 5C shows the knee flexion-extension signal (Knee_fle-ext) and the estimation of the knee flexion-extension range (F10). Finally, Figure 5D presents the derivative of the knee flexion-extension signal (Knee_fle-ext') and the knee flexion-extension max and min velocity (F11 and F12).
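The cycle detection described in this section (mean subtraction followed by the search for one maximum and one minimum per cycle) can be sketched as below. This is a minimal illustration, not the procedure actually used in the study: grouping the centred signal into same-sign runs is an assumed way of isolating each extremum, the velocity is approximated with finite differences, and all function names and dictionary keys are hypothetical. The synthetic cosine signal only stands in for a recorded M_hip vertical trajectory.

```python
import math


def find_extrema(signal):
    """Return (maxima, minima) sample indices of a repetitive signal.

    The mean is subtracted, the centred signal is split into runs of
    equal sign, and one extremum is kept per run: the maximum of each
    positive run (standing) and the minimum of each other run (sitting).
    """
    offset = sum(signal) / len(signal)
    centred = [v - offset for v in signal]
    maxima, minima = [], []
    start = 0
    for i in range(1, len(centred) + 1):
        # A run ends at the end of the signal or when the sign flips.
        if i == len(centred) or (centred[i] > 0) != (centred[start] > 0):
            run = range(start, i)
            if centred[start] > 0:
                maxima.append(max(run, key=lambda k: centred[k]))
            else:
                minima.append(min(run, key=lambda k: centred[k]))
            start = i
    return maxima, minima


def cycle_features(signal, fs):
    """Temporal and range features for each stand-to-stand cycle.

    fs is the sampling rate in Hz (30 Hz for the Kinect V2).
    """
    maxima, minima = find_extrema(signal)
    cycles = []
    for stand_a, stand_b in zip(maxima, maxima[1:]):
        sits = [m for m in minima if stand_a < m < stand_b]
        if len(sits) != 1:
            continue  # skip cycles without a single sitting instant
        sit = sits[0]
        rise = signal[sit:stand_b + 1]  # sit-to-stand phase
        velocity = [(b - a) * fs for a, b in zip(rise, rise[1:])]
        cycles.append({
            "stand_to_stand_time": (stand_b - stand_a) / fs,
            "stand_to_sit_time": (sit - stand_a) / fs,
            "sit_to_stand_time": (stand_b - sit) / fs,
            "vertical_range": max(rise) - min(rise),
            "max_vertical_velocity": max(velocity),
        })
    return cycles


def synthetic_hip_signal(n_cycles=3, period_s=2.0, fs=30):
    """Hypothetical hip height (m) oscillating between 0.5 and 0.9 m."""
    n = int(n_cycles * period_s * fs) + 1
    return [0.7 + 0.2 * math.cos(2 * math.pi * k / (period_s * fs))
            for k in range(n)]
```

On real recordings, partial runs at the beginning and end of the signal, as well as noise around the mean, would likely require additional filtering before this kind of segmentation is reliable.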

Borg Interpolation, Features Relation and Heart Rate Incorporation
The 60 volunteers were able to finish the 2 min test; therefore, at the end of the data collection, there were 240 Borg values, 4 for each volunteer, as illustrated in Figure 3C. However, only 8 subjects reported a Borg CR10 value of 10 at the end of the test, which means that they reached the maximum fatigue level. Bearing in mind that the study aims to develop a computational model based on a data set and machine learning techniques, the Borg CR10 values were interpolated every 10 s employing linear estimation [79,80].
On the other hand, it is essential to highlight that the test performance is strongly dependent on each subject's physical capability. Thus, the number of stand-to-stand cycles executed may differ, as well as the number of STS features and their values. The lowest number of cycles obtained was 71, and the highest was 127. Consequently, to relate the fatigue level to each performance feature (F1 to F32), the five stand-to-stand cycles closest to each Borg value were used to estimate an average of each STS feature. This number of cycles was obtained by analyzing the ten registers with the fewest stand-to-stand cycles. Therefore, no cycle was repeated for the Borg values, except for the last one, since the final part of the test is where the lowest cycle rate is presented, which does not allow the non-repeated cycle requirement to be met for all participants. Figure 7 illustrates an example of these nearest cycle selections, where the dashed lines with the light gray background contain the selected sit-to-stand cycles for each Borg value.
Furthermore, the white background rectangles show which cycles were not used, and it can be seen that the final Borg value was not related to any cycle.
Considering the importance of the heart rate for monitoring the patient's fatigue in rehabilitation programs, this parameter was incorporated into the data set as feature number 33 (F33), in a similar way to the other features. As the Zephyr sample rate is 1 Hz, each test register contains 120 heart rate records. Hence, aiming to get an average value without repeating records, the mean values of the five heart rate measurements closest to each Borg value (except for the last one) were used to relate the fatigue level with this physiological parameter. Figure 8 presents an example of these heart rate record selections, represented by the dashed lines and the light gray background. Therefore, at the end of this process, the interpolated and original Borg values are related to the average of the corresponding 32 kinematic/temporal features (F1 to F32) and the average heart rate (F33).
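The two steps of this section, the linear interpolation of the Borg values every 10 s and the averaging of the five closest measurements, can be sketched as follows. The sketch assumes a Borg value of 0 at the start of the test (consistent with the zero initial fatigue level assumed for normalization) and reports every 30 s; the function names are hypothetical, and the rule that avoids reusing cycles between neighboring Borg values is not enforced here.

```python
def interpolate_borg(reported, step_s=10, report_period_s=30):
    """Linearly interpolate reported Borg values onto a regular grid.

    reported: Borg CR10 values given every report_period_s seconds.
    Assumption: the Borg value at t = 0 s is 0.
    Returns a list of (time_s, borg) pairs.
    """
    times = [0] + [report_period_s * (i + 1) for i in range(len(reported))]
    values = [0.0] + [float(v) for v in reported]
    result = []
    for t in range(0, times[-1] + 1, step_s):
        for k in range(len(times) - 1):
            t0, t1 = times[k], times[k + 1]
            if t0 <= t <= t1:
                v0, v1 = values[k], values[k + 1]
                result.append((t, v0 + (v1 - v0) * (t - t0) / (t1 - t0)))
                break
    return result


def mean_of_closest(samples, t, k=5):
    """Average the k samples whose timestamps are closest to time t.

    samples: list of (time_s, value) pairs, e.g. one value per
    stand-to-stand cycle or one heart rate record per second.
    """
    closest = sorted(samples, key=lambda s: abs(s[0] - t))[:k]
    return sum(value for _, value in closest) / len(closest)
```

With four reported values, the grid contains 13 points (0 to 120 s); discarding the first and the last leaves the 11 fatigue levels per participant that are used for the data set.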

Data Normalization
Feature variability caused by each subject's physical condition makes it difficult to directly compare the volunteers' registers, which requires normalizing the data according to each subject's initial performance [81][82][83]. Hence, considering that all the volunteers were at a zero fatigue level at the beginning of the test, which is where the best performance should be presented, all extracted features were normalized by dividing them by the corresponding initial value (see Equation (2)). Figure 9 presents an example of one volunteer for three different features normalized. Figure 9A shows the Borg values reported and interpolated. Figure 9B exhibits the behavior of the sit-to-stand time (F1) and how this feature tends to increase. Figure 9C displays the behavior of the knee flexion-extension max velocity (F11) and how this feature tends to decrease. Figure 9D shows the behavior of the hip flexion-extension range (F13) and how this feature does not present a continuous increment or decrement. However, it shows consistent behavior in some parts of the test (e.g., at the end of the test, where this feature takes higher values than at the beginning). Finally, Figure 9E shows that the mean normalized values of the heart rate increase.
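The normalization by the initial value can be illustrated with a short sketch. The function name is hypothetical; it takes the sequence of averaged values of one feature for one participant, ordered in time, and divides each of them by the first one.

```python
def normalize_by_initial(feature_values):
    """Divide every value of one feature by its initial value.

    feature_values: averaged values of a single feature for a single
    participant, ordered in time; the first entry corresponds to the
    beginning of the test (zero fatigue level).
    """
    initial = feature_values[0]
    if initial == 0:
        raise ValueError("initial value is zero; feature cannot be normalized")
    return [value / initial for value in feature_values]
```

After this step, a value above 1 means that the feature has grown with respect to the start of the test (as described for the cycle time), and a value below 1 means that it has dropped (as described for the knee maximum velocity).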

Data Set Construction
After Borg interpolation and considering that the first Borg value was used to normalize the features, and the last Borg value was not used, 11 STS performance-related fatigue levels were finally obtained for each participant. Thus, at the end of the process, a total of 660 (60 participants × 11 fatigue levels) Borg registers related to the performance features were obtained for the data set.
To determine the target for all registers, each one was labeled with one of 3 fatigue states (low, moderate and high, as illustrated in Figure 1) according to the corresponding Borg value. In such a way, registers with a related Borg value between 0 and 3 were considered as low fatigue (LF); between 4 and 6, as moderate fatigue (MF); and between 7 and 10, as high fatigue (HF). Thus, each register is composed of 33 normalized features (32 STS kinematic/temporal and 1 of the heart rate) and 1 target. The representation of the data set can be seen in Figure 10, which shows how the 660 registers contain their corresponding features and targets.
Finally, to analyze in general how the features change according to the 3 fatigue categories, the mean and standard deviation were calculated for each feature regarding the fatigue condition. Therefore, it is possible to observe if the features, in general, present statistically different values and how those features behave concerning the fatigue.
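The labeling rule and the per-class statistics can be sketched as below. Because interpolated Borg values are not necessarily integers and the text does not state how values such as 3.5 were assigned, rounding half up to the nearest integer before applying the 0-3 / 4-6 / 7-10 ranges is an assumption of this sketch. The function names and the register layout (a dictionary with a "borg" key plus feature keys) are also hypothetical.

```python
import math
from statistics import mean, stdev


def fatigue_label(borg_value):
    """Map a Borg CR10 value to LF, MF or HF.

    Assumption: non-integer (interpolated) values are rounded half up
    before the ranges 0-3, 4-6 and 7-10 are applied.
    """
    level = math.floor(borg_value + 0.5)
    if level <= 3:
        return "LF"
    if level <= 6:
        return "MF"
    return "HF"


def summarize_by_label(registers, feature):
    """Mean and standard deviation of one feature per fatigue state.

    registers: list of dicts holding a 'borg' value and feature values.
    Returns {label: (mean, sd)}; sd is 0.0 for single-register groups.
    """
    groups = {}
    for register in registers:
        label = fatigue_label(register["borg"])
        groups.setdefault(label, []).append(register[feature])
    return {
        label: (mean(values), stdev(values) if len(values) > 1 else 0.0)
        for label, values in groups.items()
    }
```

A usage example would pass the 660 registers and a feature key such as "F1" to `summarize_by_label` to obtain one (mean, SD) pair for each of the three fatigue states.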

Fatigue Estimation Model
To develop and evaluate a computational model that estimates the three fatigue states from the 33 features, different machine learning methods were explored on the obtained data set. Overall, machine learning model development is divided into two phases: the training phase and the testing phase [84]. In the training phase, a large part of the data set (usually between 70% and 90%) is used to train the model, so that it can process the features and find patterns according to the training algorithm [84]. In the testing phase, the remaining part of the data set is used to assess the trained model by comparing its estimated outputs with the targets, so that the model is evaluated with data that were not used for training [84].
In this case, the training and testing of the classifiers were conducted with a technique called "cross-validation". The classifiers' model parameters were trained through leave-one-out cross-validation, which involves partitioning a sample of size "N" into a calibration sample of size "N-1" and a validation sample of size 1, and repeating the process N times. In this context, each model is trained with "N-1" different data groups and assessed with the remaining group [84]. Here, cross-validation is applied multiple times for different values of the tuning parameter, and the parameter that minimizes the cross-validated error is then used to build the final model. Thereby, cross-validation addresses the problem of overfitting [85]. In the end, this technique provides a general performance metric called "accuracy", which is the ratio between the total number of correct estimations obtained across the testing processes, or true positives (TP), and the total amount of data (N), as shown in Equation (3).
Considering the size of the data set, 6 folds were selected for this validation process; hence, each fold consists of 110 registers. Figure 11 illustrates this process, where "TPn" represents the number of true positives of the corresponding "n" iteration, and "Acc" represents the final accuracy metric.

Figure 11. Cross-validation process for the machine learning model training and assessment.
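The fold-wise accuracy described above (the true positives of every iteration summed and divided by N) can be illustrated with the following sketch. It assumes contiguous, equally sized folds without shuffling, which the text does not specify; the names `k_fold_indices`, `cross_validated_accuracy`, and `always_moderate`, as well as the class counts, are hypothetical.

```python
def k_fold_indices(n_samples, n_folds):
    """Split range(n_samples) into n_folds contiguous, equally sized folds."""
    if n_samples % n_folds != 0:
        raise ValueError("this sketch assumes equally sized folds")
    fold_size = n_samples // n_folds
    return [list(range(i * fold_size, (i + 1) * fold_size))
            for i in range(n_folds)]


def cross_validated_accuracy(targets, fit_predict, n_folds=6):
    """Sum the correct estimations (TPn) of every fold and divide by N."""
    n = len(targets)
    total_true_positives = 0
    for test_idx in k_fold_indices(n, n_folds):
        held_out = set(test_idx)
        train_idx = [i for i in range(n) if i not in held_out]
        # fit_predict trains on train_idx and returns one label per test index.
        predictions = fit_predict(train_idx, test_idx)
        total_true_positives += sum(
            pred == targets[i] for pred, i in zip(predictions, test_idx)
        )
    return total_true_positives / n


# Hypothetical targets: 660 registers with made-up class counts.
targets = ["LF"] * 220 + ["MF"] * 240 + ["HF"] * 200


def always_moderate(train_idx, test_idx):
    """Trivial baseline that ignores the training data."""
    return ["MF"] * len(test_idx)


accuracy = cross_validated_accuracy(targets, always_moderate)
```

With 660 registers and 6 folds, each held-out fold contains 110 registers, matching the fold size stated above. A real classifier would replace the trivial baseline passed as `fit_predict`.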
Moreover, the false negatives (FN), which represent the number of registers of a given fatigue group that were estimated as belonging to other groups, and the false positives (FP), which refer to the number of registers that belong to other groups and were wrongly estimated as belonging to the given group, were calculated to obtain 3 more performance metrics known as "Precision" (Equation (4)), "Recall" (Equation (5)), and "F-Score" (Equation (6)).
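A short sketch of how these three metrics can be computed for one fatigue class is given below. It assumes the standard definitions (precision = TP/(TP + FP), recall = TP/(TP + FN), and the F-score as their harmonic mean), since the paper's Equations (4) to (6) are not reproduced in this section; the function name and the toy labels are hypothetical.

```python
def class_metrics(targets, predictions, label):
    """Return (precision, recall, f_score) for one class label."""
    pairs = list(zip(targets, predictions))
    tp = sum(t == label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return precision, recall, f_score


# Toy example, not study data: two MF registers are found correctly,
# and two registers from other groups are wrongly estimated as MF.
targets = ["LF", "LF", "MF", "MF", "HF", "HF"]
predictions = ["LF", "MF", "MF", "MF", "HF", "MF"]
precision, recall, f_score = class_metrics(targets, predictions, "MF")
```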
From our framework's perspective, it is impossible to predetermine which methods will work best for fatigue prediction because these methods are data-driven and, thus, application-dependent, i.e., dependent on the exercise, extracted features, sensors, and scenarios, among others. Therefore, several methods were applied during our preliminary analysis of the data to develop the fatigue prediction model. The models evaluated included logistic regression (LR), decision trees (DT), k-nearest neighbors (KNN), support vector machine (SVM), naive Bayes (NB), linear discriminant analysis (LDA), artificial neural network (ANN), and random forest (RF). The open-source Python library "scikit-learn" [86] was used to execute a quick general training of these classifiers. Afterward, according to the accuracy metric, DT, LDA, and NB were eliminated due to their relatively poor performance. Hence, our case study focused on the best five classification models (LR, KNN, SVM, RF, and ANN), which were adjusted and retrained by modifying their training parameters automatically through computational iterations. The theoretical approach of the machine learning models used is summarized below.
A statistical model such as LR attempts to build a relationship between the input variables and the response by employing parametric methods. In other words, it uses a logistic function to model a conditional probability. Hence, LR is a supervised learning technique in which the probability of a dichotomous outcome is a function of the predictors/features [87,88]. Although LR is a simple yet very effective classification algorithm, its performance can vary significantly with sparse data [88]. In contrast, non-parametric approaches such as KNN, SVM, and ANN are commonly used in human performance modeling applications [89][90][91][92]. KNN is a simple, easy-to-implement supervised machine learning algorithm that can solve both classification and regression problems. The algorithm assumes that similar things are near to each other; therefore, it requires computing the distance from the unlabeled object to all the labeled objects in the training set [93]. The SVM classifier is a supervised learning method that uses kernel functions for data classification and regression analysis. Its methodology consists of mapping the given labeled data set into a high-dimensional space [94,95] and identifying the optimal hyperplane that classifies the data with minimum error [95].
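To make the distance-based idea behind KNN concrete, the following sketch classifies one unlabeled point by a majority vote among its nearest labeled neighbors using the Euclidean distance. It is a toy illustration with hypothetical two-feature points and a hypothetical function name, not the implementation used in the study.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k):
    """Label the query by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical normalized feature pairs for five labeled registers.
train_points = [(1.0, 1.0), (1.1, 1.0), (1.5, 1.4), (1.6, 1.5), (1.55, 1.45)]
train_labels = ["LF", "LF", "HF", "HF", "HF"]

# The two closest points are LF, so the vote among 3 neighbors yields LF.
prediction = knn_predict(train_points, train_labels, (1.05, 1.0), k=3)
```

Because every prediction requires the distance to all labeled registers, the cost of this approach grows with the size of the training set.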
The ANN classifier is a supervised machine learning classifier that seeks to classify an observation as belonging to some discrete class based on its inputs. This classifier is a set of connected input-output units in which a weight is associated with each connection. It consists of one input layer, one or more intermediate layers, and one output layer. Neural networks learn by adjusting the weights of the connections; by updating the weights iteratively, the performance of the network is improved [96]. Finally, the RF model is an ensemble classification algorithm that uses trees as base classifiers to generate many classifiers and aggregate their results via voting. That is, each tree in the random forest outputs a class prediction, and the class with the most votes becomes the model's prediction. The premise of this method is that combining a large number of single classifiers allows for a more diverse representation of the data and, consequently, a more accurate prediction [97][98][99].

Table 2 shows the descriptive data of the number of stand-to-stand cycles obtained in the 60 registers, specifically, the mean, median, standard deviation, maximum, and minimum cycle number. On average, the subjects executed 97.24 stand-to-stand cycles, which means that, in general, the cycle rate was 0.803 cycles per second. Furthermore, the table shows that the minimum number of stand-to-stand cycles achieved was 71 and the maximum was 127, a difference of 56 cycles.

Table 3 presents the number of registers for the three fatigue states, according to the labeling process presented at the end of Section 2.4.4: low fatigue (LF), moderate fatigue (MF), and high fatigue (HF). It can be seen that the MF group contains most of the registers, followed by the LF group. Hence, the HF group has the lowest value, showing a difference of 57 registers with respect to the MF group, which corresponds to 8.6% of the total data.

Figure 12 displays the mean (bars) and standard deviation (black lines) of each normalized feature for each fatigue state, where the light gray bars correspond to LF, the gray bars to MF, and the black bars to HF. Furthermore, the features are split into 3 different bar graphs; Figure 12A contains features 1 to 11, and the remaining features are distributed across the other two graphs.

Figure 13 presents four examples of the data distribution regarding two specific features, always using the sit-to-stand time (F2) as the horizontal axis. The black triangles represent the high fatigue registers; the gray squares, the moderate fatigue registers; and the light gray circles, the low fatigue registers. Specifically, Figure 13A shows the distribution according to the stand-to-stand time (F1); Figure 13B, according to the heart rate (F33); Figure 13C, according to the M_shoulder depth range (F23); and Figure 13D, according to the M_hip max depth velocity (F8). Essentially, these plots display some patterns that can be found in the data set, where it is possible to see how some features are related (Figure 13A) and others are not (Figure 13D). Considering the number of features, there are 33 scatter plot options; hence, Figure 13 shows only the most relevant features, which were chosen considering Figure 12.

The Uniform Manifold Approximation and Projection (UMAP) was implemented to provide a 2D representation of how the data are distributed among the three classes. The UMAP algorithm represents the features with a reduced number of components, which are used for visualizing and estimating possible clusters among the classes [100]. Hence, Figure 14 presents the 2D representation obtained by reducing the 33 features to two components, where the LF registers are easily separated from the MF and HF registers.
However, the UMAP technique does not display a clear separation between the MF and HF registers.

The confusion matrices obtained from the best five classifier models, after a grid search exploration with the data obtained in Section 2.4.5, are shown in Figure 15, where the true class labels are listed along the x-axis and the class predictions along the y-axis. The correct classifications lie along the main diagonal, whereas all the other entries show misclassifications. Likewise, Table 4 reports the parameters and the performance of the five classifiers implemented. The k-nearest neighbor (KNN) method, using the Euclidean distance, classified the registers by a majority vote of their 12 nearest neighbors (K = 12). The logistic regression (LR) classifier uses the limited-memory BFGS optimization algorithm (solver = lbfgs) and a value of 1000 for the inverse of the regularization strength (C = 1000). The artificial neural network (ANN) uses a stochastic gradient-based optimizer (solver = adam) and hidden layer sizes of 100, 20, and 100 (hls = (100, 20, 100)). The support vector machine (SVM) has a radial basis function kernel (kernel = rbf) and a regularization parameter of 2 (C = 2). Finally, the best model is a random forest classifier with 60 estimators (n_estimators = 60), which means that the model integrates 60 decision trees and merges their predictions. Moreover, Table 4 provides the mean values of the performance metrics (accuracy, recall, precision, and F-score), where the random forest (RF) model presents the highest values.

Taking into account that the values in Table 4 are the mean values obtained after the six tests of the cross-validation process (Figure 11), Figure 16 presents the box plot of each reliability metric for the five implemented machine learning methods. Each method contains four box plots, where the middle horizontal line represents the median value, the boxes and the vertical lines span the quartiles, and the black dots are atypical data. It can be seen that the RF method always presents the highest values with the lowest dispersion and, therefore, the lowest variance.

Table 4. Performance of the five best fatigue estimation models, listing each model with its main parameters and its performance metrics, including the overall accuracy (%). Bold values show the best score for each performance metric.

Considering the above results, the data were sorted to systematically evaluate the performance of the random forest classifier in the fatigue condition prediction task (Figure 17). The classifier reported an outstanding response without showing problems related to overfitting or underfitting. To observe whether there is any gender effect on the RF classifier performance, the data were split by gender and analyzed individually in the fatigue condition prediction task, as reported in Figure 18. The two individual RF classifiers were trained separately for the male (a) and female (b) genders and showed high agreement between the fatigue condition and the RF predictions. Finally, the feature importance property of the random forest was explored to quantify the importance of each feature for the corresponding model. This property assigns a relative weight to each feature, which is directly related to the importance of that feature for this classic machine learning model. Figure 19 presents, as a bar graph, the relative importance values obtained for each feature, sorted from the highest to the lowest. The F23 (M_shoulder depth range), F1 (stand-to-stand time), and F33 (heart rate) features present the highest values for our experimental set-up.
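For readers who want to reproduce a comparable set-up, the five model configurations reported above and the random forest feature ranking could be expressed with scikit-learn roughly as follows. This is a sketch under stated assumptions: only the hyperparameters named in the text are set and every other parameter is left at the library default; the data are a synthetic stand-in generated with `make_classification`, not the study data set; and `random_state=0` is an arbitrary choice for repeatability.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# Hyperparameters as reported in the text; all other settings are defaults.
models = {
    "KNN": KNeighborsClassifier(n_neighbors=12, metric="euclidean"),
    "LR": LogisticRegression(solver="lbfgs", C=1000),
    "ANN": MLPClassifier(solver="adam", hidden_layer_sizes=(100, 20, 100)),
    "SVM": SVC(kernel="rbf", C=2),
    "RF": RandomForestClassifier(n_estimators=60, random_state=0),
}

# Synthetic stand-in with the same shape as the data set (660 x 33, 3 classes).
X, y = make_classification(n_samples=660, n_features=33, n_informative=10,
                           n_classes=3, random_state=0)

# Fit the random forest and rank feature indices by relative importance,
# from the most to the least important.
rf = models["RF"].fit(X, y)
ranking = sorted(range(X.shape[1]),
                 key=lambda i: rf.feature_importances_[i],
                 reverse=True)
```

The `feature_importances_` values are relative weights that sum to 1, so they indicate how the forest distributes importance across features rather than an absolute effect size.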

Discussion
Compared with previous works [101] that have studied the reference number of cycles in a 1 min STS test for healthy people, the results obtained in Table 2 are lower. Specifically, the authors in [101] reported that subjects between 20 and 24 years old have an average stand-to-stand rate of 1.183, whereas the participants in our study presented an average of 0.803 (Table 2). This might be attributed to the fact that the STS test performed by the participants in this work was twice as long (i.e., 2 min), which makes the test harder; hence, the general performance decreases when the participants start to feel exhausted. Therefore, these results suggest that people decrease their performance when they begin to feel fatigued, i.e., their speed/intensity in performing the test decreases.
Similarly, this behavior was observed in all participants (regardless of their physical condition), where the rate of the cycle was not constant, and it tended to decrease during the test due to induced fatigue. This means that at the end of the test the number of cycles decreases. Therefore, regardless of whether the lowest number of cycles was executed (71) or the highest (127), the behavior was the same. Hence, five cycles were used to get the average for each feature for the data set and to ensure homogeneity in the data.
Although every volunteer started in a low fatigue condition, the results in Table 3 show that most of the registers belong to moderate fatigue (MF). In contrast, the high fatigue (HF) group has the lowest number of registers. Considering that reaching a Borg value in the HF band requires first passing through the LF and MF groups, this result was expected. Nevertheless, a difference of 8.6% (57 registers) is acceptable for data analysis and for training computational models [84]. Moreover, these results show that the data set registers are distributed similarly among the three fatigue groups. In general, the volunteers experienced the three fatigue states during the test.
Previous studies have demonstrated that the times of the sit-to-stand phases are the most relevant exercise performance features [40]. The results in Figure 12 are consistent with the literature: an increment of approximately 20% can be observed for these types of features (F1 and F2) between the mean values of the LF and HF groups. Moreover, although the use of heart rate for managing the patient's fatigue condition during HIE has been criticized [50], the results showed a direct relation between heart rate and fatigue level. The heart rate (F33) has a difference of 21.7% between the LF and HF groups. Thus, the heart rate provides relevant information related to the individual's fatigue state.
Nonetheless, other features present the opposite behavior. Specifically, the M_hip depth range (F5), the knee flexo-extension max velocity (F11), the knee flexo-extension min velocity (F12), the hip flexo-extension min velocity (F15), the M_shoulder vertical range (F22), and the M_shoulder depth range (F23) present the highest decrements. Because these features are related to the phase times and the movement of the spine, it is expected that the lower limb angular velocities decrease, especially the minimum velocities that correspond to the sit-to-stand phase.
However, features that come from the upper part of the body, specifically the M_shoulder movement ranges (F22 and F23), also decrease. This behavior may be due to the fact that the volunteers tried to change the exercise execution technique in order to continue the activity as fast as possible and to relieve the load on the main lower limb muscles. Furthermore, leaning the chest and the back forward makes the exercise easier [102], which causes the upper movement range features to decrease. Thus, although these features do not change as much as the phase time features and the heart rate, they also provide important information about the fatigue condition. They also contribute information about any possible variation in the correct execution of the exercise, which is essential for avoiding injury.
The relationship between the features themselves and the fatigue condition can be better analyzed in the 2D plots of Figure 13. In Figure 13A, the data distribution follows a straight line because the sit-to-stand time is part of the stand-to-stand time; hence, both features are closely related. It can be seen that the HF samples are clustered around the highest values. In contrast, the LF samples tend to be grouped in the lowest values. However, this graph does not present a clear difference between the LF and MF samples. Moreover, some irregular HF registers are in the lowest values, which makes it difficult to differentiate them from the other fatigue categories with these two features alone, showing that a single parameter is insufficient for a good classification.

Figure 13B exhibits the data distribution regarding two features that are not related: the sit-to-stand time and the heart rate. Hence, the samples are more dispersed and do not follow a precise equation. As above, the LF samples tend to be grouped in the lowest values; however, some LF registers reach heart rate values of about 1.6. This means that, during the test, the heart rate reached values approximately 60% higher than the resting heart rate of the corresponding volunteer, while the sit-to-stand time did not exceed a value of 1.5. This represents exercise conditions where the volunteers required more energy to do the exercise but did not feel fatigued and, thus, were able to keep a similar performance. Taking into account that these heart rate values are acceptable in some rehabilitation scenarios (e.g., oncology rehabilitation), this case may be optimal for physical training [3,12]. Nevertheless, by monitoring just the heart rate, it would be difficult to distinguish this optimal training condition from the cases where moderate or high fatigue levels are reached.
On the other hand, it is possible to see HF samples that do not exceed a heart rate value of 1.4 and are in the highest values of the sit-to-stand time. These registers represent cases where the cardiac system was not able to adapt as fast as the exercise required, which might happen in high-intensity exercises and depends strongly on the subject's cardiorespiratory capability [50]; hence, the volunteers felt exhausted and were not able to keep executing the exercise with similar performance. However, it is also possible to see the opposite case, where some HF and MF samples are in the highest heart rate values and in the lowest values of the sit-to-stand time. This case shows conditions where the volunteers felt compelled to adapt their execution technique to keep performing the activity quickly. Thus, it is important to monitor other exercise performance features where these changes can be appreciated.
The change in the exercise execution technique and its influence on the sit-to-stand time can be appreciated in Figure 13C, where many HF samples are grouped in the lowest values of the M_shoulder depth range. Considering that leaning the back forward facilitates the execution of the exercise and reduces the upper body displacement on the Z-axis [102], the exercise phase times tend to decrease, which suggests a better performance. In reality, however, this pattern reflects that, owing to the fatigue condition, the volunteers may modify their posture to reduce the load on the lower limbs. Accordingly, the LF samples are clustered in the highest values of the M_shoulder depth range. In Figure 13B,C, it is possible to see how the fatigue distribution changes along both axes. In contrast, Figure 13D presents the data distribution of the M_hip max depth velocity, which does not provide a clear visual pattern. Thus, it is not possible to determine data groups clustered on the vertical axis, despite the fact that the feature changed its values over a range similar to that of the sit-to-stand time.
In addition, Figure 14 shows that, in general, the LF registers are easier to classify because they tend to be clustered according to the UMAP feature reduction technique. Although Figure 14 does not present a clear separation between the MF and HF groups, this separation can be appreciated better in Figure 13, where the data tend to be clustered in specific ranges of the features. Specifically, the HF registers are normally found in the extreme values.
Regarding the different patterns that can be presented and the number of features, one of the best ways to analyze the data set is by employing computational models capable of determining these and other behaviors. It can be seen in Figures 15 and 16 and Table 4 that the machine learning model with the lowest reliability values is the KNN, which is based on distance techniques for classifying. Hence, considering the data distribution presented in the scatter plots (Figures 13 and 14), it is possible to see that this is not the recommended method for this type of data. Despite the fact that the SVM and the ANN present a better performance estimation, these models based on estimating curves for classifying do not present the best performance because the groups are not quite separated. Hence, the RF model has the best reliability results. Considering the different cases that may be presented, this result suggests that the best method consists of merging different estimators that analyze the entire data to provide a consensual result.
We considered the common problems related to overfitting or underfitting in classifiers and the consequences of the resulting incorrect estimations, which can be divided into false positives and false negatives. Such errors are not desirable in clinical scenarios given the problems involved (overtraining, injury, and disruption of the patient's rehabilitation process, among others). Therefore, the performance of the RF classifier in the fatigue state prediction task was evaluated, as illustrated in Figure 4. This result showed an outstanding classifier response without problems related to underestimation or overestimation of fatigue.
On the other hand, the RF classifier performance was analyzed as a function of the participant's gender (male or female), as shown in Figure 18. The results showed a high agreement between the fatigue conditions and the RF predictions, i.e., the classifier performance does not decrease, and, apparently, taking gender into account could improve the RF performance. However, to confirm this hypothesis, it is necessary to test the classifier on at least 100 patients. Therefore, this is considered future work, in which we contemplate analyzing the relationship between gender, the classifier performance, and the fatigue condition.
We used the feature importance obtained for each variable to indicate which features of the data are the most relevant in the training of the random forest model, as illustrated in Figure 19. Our proposed model identifies the most important parameters related to the fatigue condition in STS exercises, namely the M_shoulder depth range (F23), the stand-to-stand time (F1), and the heart rate (F33). These results are consistent with the STS study by Jimenez et al. [42], which reported that the acceleration of the chest is strongly related to the fatigue condition, considering that people try to move their upper body to make the STS execution easier [102]. Moreover, Aguirre et al. [65] reported that, because the stand-to-stand time contains information about both STS phases, it is the feature that presents the strongest linear relationship with the fatigue level. In the same way, Figure 19 shows the features that do not provide any relevant information about the individual's fatigue state during the execution of the STS exercise. These features correspond to those representing movements in the frontal plane, such as the abduction-adduction movement. This is because the STS exercise is performed primarily in the sagittal plane; therefore, these features either barely change or change randomly. This information may give clinicians a better understanding of which parameters should be analyzed to monitor the patient's fatigue state in the STS exercise with a limited number of parameters.
On the other hand, regarding the performance of machine learning models, it would be possible to perform a better feature selection, which could equalize or even increase the performance metrics of the classifiers by removing unnecessary features from the data. In future real-time applications, the dimensionality of data is vital to optimizing computational costs and running time.
Compared with other similar studies [42,65], and to the authors' knowledge, this work is the first to present a model for fatigue estimation with three states (low, moderate, and high) during the STS exercise execution by monitoring kinematic/temporal features and the heart rate. In the study by Jimenez et al. [42], the authors demonstrated that chest acceleration in vertical motion is related to fatigue, using an accessible and practical device, the IMU of a smartphone. However, that study presents one case study and analyzes only one kinematic feature, which may change its behavior if the subject modifies the execution technique. On the other hand, in the research by Aguirre et al. [65], the authors carried out an analysis methodology to determine which STS features are significantly linearly related to the fatigue level, measured with the Borg CR10 scale. However, it only presents a linear analysis and does not analyze the different patterns and behaviors that can appear.
Similar studies that proposed fatigue estimation models employing IMUs during different exercises or activities, such as walking [70], vertical jumps [69], lower limb endurance [66], and activities related to manufacturing tasks [99], have shown accuracy values between 85% and 95%. Therefore, contrasting the proposed ensemble model performance with the literature, its result (83.2%) is in the lowest part of the range. However, it must be considered that these similar studies only considered two fatigue conditions, fatigued and non-fatigued. In contrast, this work contemplates three states, which increases the probability of failing in the estimation but provides a clear separation of the LF records with respect to MF and HF (Figure 14). Moreover, this model allows more concrete monitoring of the individual's fatigue level during the rehabilitation process and, with it, the possibility of improving the individual's performance during therapy.
Even though the proposed model is not based on IMUs, it uses a KinectV2 to obtain the exercise features, which is an affordable sensor that has been shown to be helpful in clinical scenarios and allows more STS features to be measured [73]. Considering the different patterns that may appear in the lower and upper body parts, this sensor has the advantage of being able to extract relevant STS features from different body parts. This allows constant monitoring of the person's exercise execution technique and, thus, helps to avoid injury. Furthermore, owing to the relevant information that the heart rate provides about the fatigue condition and the ease with which it can be measured, this model also integrates an affordable heart rate sensor.
One limitation of this work is related to the study population: all the volunteers were healthy people, and the features may show different behaviors and patterns with patients or other groups with different physical conditions. However, the normalization process means that the model compares the user's state with his/her initial condition, reducing the feature variability among the volunteers. Furthermore, other similar studies also recruited healthy subjects [42,65,66,69,70]; hence, as a first approximation to a complete clinical tool, this work presents relevant results.
Finally, owing to the global confinement caused by COVID-19, the need for clinical tools for telemedicine has significantly increased [71]. Hence, keeping in mind the importance of fatigue monitoring in physical rehabilitation and the practical tools that it implements, this work presents the initial development of a potential clinical tool for estimating fatigue during one of the most implemented HIEs in rehabilitation programs.

Conclusions
First of all, a study was carried out to obtain a data set of 660 sit-to-stand registers. Each register was composed of 32 kinematic/temporal exercise features and the heart rate, and was labeled with a fatigue condition (low, moderate, or high) based on the Borg scale values provided by the participants.
An analysis process was carried out to determine the most relevant features related to the fatigue condition. For this purpose, the behavior and pattern of each extracted characteristic were analyzed. Results suggest that the most important feature is the depth displacement of the upper body part, followed by the stand-to-stand time and the heart rate. Therefore, it is possible to suggest that the user's physiological condition, the upper body features, and the lower body features contain relevant information regarding fatigue estimation during the STS exercise.
Finally, a fatigue estimation model is proposed to show that these features can be used for estimating fatigue with an accuracy of 82.5% using accessible and practical sensors, which, according to similar studies, is in the acceptable range. Furthermore, this model classifies three fatigue conditions: low, moderate, and high. This allows for improved monitoring of the individuals' fatigue state, thereby optimizing their performance and, consequently, the execution of the exercises. Hence, this work presents the development of a potential tool for physical rehabilitation scenarios and telemedicine applications, which have become an important area during the global emergency caused by COVID-19.

Institutional Review Board Statement:
All participants gave their informed consent for inclusion before they participated in the study, i.e., informed consent was obtained from the participants to publish this paper. The study was conducted in accordance with the Declaration of Helsinki, and the protocol was approved by the Ethics Committee of the Colombian School of Engineering Julio Garavito (Project identification code: 813-2017; Protocol code: 05-2019; date: 12 November 2019).

Informed Consent Statement:
Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
Publicly available datasets were analyzed in this study. These data can be found here: https://figshare.com/articles/dataset/STS_fatigue_data_zip/15001362.

Conflicts of Interest:
The authors declare no conflict of interest. The funding sponsors had no role in the design of the study, in the collection, analyses, or interpretation of data, in the writing of the manuscript, and in the decision to publish the results.

Abbreviations
The following abbreviations are used in this manuscript: