Muscle Mass Measurement Using Machine Learning Algorithms with Electrical Impedance Myography

Sarcopenia is a widespread chronic disease among elderly people. Although it is not life-threatening, it increases the risk of adverse outcomes through the associated unsteady gait, falls, fractures, and functional disability. The important factors in diagnosing sarcopenia are muscle mass and strength. Muscle mass must be examined in a clinic; however, the loss of muscle mass can be improved by rehabilitation performed in non-medical environments. Electrical impedance myography (EIM) can measure muscle parameters that correlate with muscle mass and strength. The goal of this study is to use machine learning algorithms to estimate the total mass of thigh muscles (MoTM) from EIM parameters and body information. We examined seven major muscles of the lower limbs. Feature selection methods, including recursive feature elimination (RFE) and feature combination, were used to select the optimal features for ridge regression (RR) and support vector regression (SVR) models. The optimal features were the resistance of the rectus femoris normalized by the thigh circumference, the phase of the tibialis anterior combined with gender, and the body information height and weight. Ninety-six subjects participated in this study. Estimation performance was evaluated with the regression coefficient (r²) and root-mean-square error (RMSE): r² was 0.800 and 0.929, and RMSE was 1.432 kg and 0.980 kg, for the RR and SVR models, respectively. Thus, the proposed method has the potential to help people examine their muscle mass in non-medical environments.


Introduction
The lifespan of the world's population is increasing, and society is gradually aging. According to a United Nations report, the number of elderly people (over 65 years of age) worldwide was 703 million in 2019, and this number is estimated to double to 1.5 billion by 2050 [1]. In Taiwan, the National Development Council reported that the population over 65 years of age will exceed 20% of the national population in 2026 [2]. For healthy adults, aging results in a progressive loss of muscle mass and strength. According to Kim and Choi, people over 40 years of age lose about 8% of their muscle mass per decade, and after the age of 70 the loss rises to 15% per decade [3]. Although the loss of muscle mass and strength is […] Mahajan et al. used logistic regression and random forest to evaluate heart failure [38]. Kwon et al. applied a one-dimensional convolutional neural network to estimate the change in stroke volume from the blood pressure waveform [39]. Although ML methods have been widely used for clinical prediction, traditional statistical analysis methods have also attracted renewed interest in these fields [40,41].
This study aims to estimate the total mass of thigh muscles (MoTM) by EIM with ML algorithms. Seven muscles of the lower limbs were measured: the rectus femoris, vastus lateralis, medial femoris, tibialis anterior, semitendinosus, biceps femoris, and gastrocnemius. The EIM parameters were impedance, resistance, reactance, and phase, and the body information included age, weight, body mass index (BMI), gender, thigh circumference, and calf circumference. Thus, the total number of parameters was thirty-seven. Recursive feature elimination (RFE) was used to select the important parameters as the features for ML input. Two ML models, namely ridge regression (RR) and support vector regression (SVR), were used, and their performance was verified with data from ninety-six subjects. Figure 1 shows the framework of this study. An EIM measurement system was developed, comprising an impedance measurement module and a data acquisition board. A graphical user interface (GUI) was also designed to display and record the EIM signals. Following the guideline for the diagnosis of sarcopenia [11], we recruited 96 subjects to evaluate the skeletal muscle mass of their lower limbs. The optimal parameters were determined by RFE. Finally, two ML models used these parameters to estimate the total MoTM.

Materials and Methods
Figure 1. The framework of this study. A measurement system is used to measure the EIM parameters of the lower-limb muscles. According to the experiment protocol, we recruited ninety-six subjects. RFE is used to select the important features for estimating the total MoTM with the ML models.

EIM Measurement System
An impedance measurement module (BIOPAC EP 100, BIOPAC® System, Goleta, CA, USA) was used to measure four parameters of the skeletal muscles of interest. A DAQ board (NI DAQ USB-6361, National Instruments, Austin, TX, USA) was used to acquire the EIM signals, i.e., phase and impedance.

Calibration of EIM Measurement System
The sampling rate of the EIM measurement system was 500 Hz. The input alternating current had a frequency of 50 kHz and an amplitude of 0.4 mA (root mean square, RMS). The impedance sensitivity was 100 Ω/V, and the phase sensitivity was 9°/V. The cutoff frequency of the low-pass filter was 10 Hz. Figure 2 shows the placement of the four electrodes and the distribution of the electric field. The four electrodes comprise two current electrodes (positive and negative terminals, HC and LC) and two voltage electrodes (positive and negative terminals, HP and LP). The parameters of EIM are resistance (R), reactance (Z), phase (P), and impedance (I). The relations among R, Z, P, and I are defined below:

$$I = \sqrt{R^{2} + Z^{2}} \qquad (1)$$

$$P = \tan^{-1}\left(\frac{Z}{R}\right) \qquad (2)$$

The output signals of the EP 100 module are the impedance I and phase P. According to Equations (1) and (2), R and Z can be calculated from the measured I and P as R = I cos P and Z = I sin P. We calibrated the impedance measurement module with a resistor box and a capacitor box, respectively. Figure 3a shows the calibration of resistance. The dots indicate the measured points, the red line is the practical calibrated line (regression line) fitted to the measured points, and the blue line is the designed ideal line. The mean square error was 0.052 Ω, and the square of the correlation coefficient, r², was 1.00. Equation (3) gives the calibration function for resistance, where x is the actual resistance (Ω) and y is the measured resistance (Ω). Figure 3b shows the calibration of reactance. The red line is the calibrated line, and the blue line is the designed ideal line. The mean square error was 0.042 Ω, and r² was 1.00. Equation (4) gives the calibration function for reactance, where x is the actual reactance (Ω) and y is the measured reactance (Ω).
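The conversion from the EP 100 outputs to R and Z, and the straight-line fit behind the calibration plots, can be sketched as follows. This is a minimal NumPy sketch, not the authors' software: it assumes the output voltages scale linearly with the sensitivities quoted above (100 Ω/V and 9°/V), that the phase is expressed in degrees, and that the calibration line is an ordinary least-squares fit. The function names and sample readings are hypothetical.

```python
import numpy as np

# Sensitivities quoted in the text; the function names below are hypothetical.
IMPEDANCE_SENSITIVITY_OHM_PER_V = 100.0
PHASE_SENSITIVITY_DEG_PER_V = 9.0


def volts_to_impedance_phase(v_impedance, v_phase):
    """Scale output voltages to impedance I (ohm) and phase P (degrees),
    assuming a linear relation with the quoted sensitivities."""
    return (IMPEDANCE_SENSITIVITY_OHM_PER_V * np.asarray(v_impedance, float),
            PHASE_SENSITIVITY_DEG_PER_V * np.asarray(v_phase, float))


def impedance_phase_to_rz(impedance, phase_deg):
    """Invert Equations (1) and (2): R = I cos P, Z = I sin P."""
    phase_rad = np.deg2rad(phase_deg)
    return impedance * np.cos(phase_rad), impedance * np.sin(phase_rad)


def fit_calibration_line(actual, measured):
    """Least-squares line measured = slope * actual + intercept, together with
    the mean squared error of the fit and the squared correlation r^2."""
    actual = np.asarray(actual, float)
    measured = np.asarray(measured, float)
    slope, intercept = np.polyfit(actual, measured, deg=1)
    residuals = measured - (slope * actual + intercept)
    mse = float(np.mean(residuals ** 2))
    r2 = float(np.corrcoef(actual, measured)[0, 1] ** 2)
    return float(slope), float(intercept), mse, r2


if __name__ == "__main__":
    I, P = volts_to_impedance_phase(0.5, 1.0)  # 50 ohm, 9 degrees
    R, Z = impedance_phase_to_rz(I, P)
    print(f"R = {float(R):.3f} ohm, Z = {float(Z):.3f} ohm")
    # Hypothetical resistor-box readings, not the study's data.
    print(fit_calibration_line([10, 20, 30, 40], [10.1, 20.0, 30.2, 40.1]))
```

A slope close to 1 and an intercept close to 0 mean the measured line coincides with the ideal line, which is what an r² of 1.00 in Figure 3 suggests.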

Placement of Electrodes
Sanchez et al. reported that errors in electrode placement affect the reproducibility of EIM, as assessed by the intraclass correlation coefficient (ICC) [30]: the smaller the error in the inter-electrode distance, the larger the ICC. Moreover, the four electrodes must be aligned. In this study, we defined two electrode-placement schemes, 5 cm and 7 cm in length, to fit the larger and smaller muscles, respectively. We used translucent tape to mark the electrode positions, as shown in Figure 4. The left tape is 5 cm long, and the right one is 7 cm long.

Experiment Protocol
Potential subjects underwent hand-grip strength and walking-speed tests before participating in the experiment. Participants had to achieve grip strengths of 28 kg and 18 kg for male and female subjects, respectively, and walking speeds above 0.8 m/s. Ninety-six subjects participated in this study: 42 male and 54 female. The subject information is shown in Table 1, including age, height, weight, BMI, and thigh and calf circumferences.

The experiment protocol was approved by the Institutional Review Board […].

According to previous studies, the masses of lower-limb muscles decline more readily than those of upper-limb muscles in people with sarcopenia [5,6,42]. Thus, we measured the tibialis anterior and gastrocnemius among the calf muscles, and the vastus lateralis, rectus femoris, medial femoris, biceps femoris, and semitendinosus among the thigh muscles. Table 2 shows the landmarks of each muscle. The vastus lateralis, medial femoris, and tibialis anterior are classified as small muscles, and the other muscles as large muscles. Each subject was asked to lie comfortably supine (face up) on a table, and the thigh and calf circumferences were measured. Then, the total MoTM was measured by the InBody S10 (InBody Co., Ltd., Korea) as the reference. Next, with the subject maintaining the same posture, four BIOPAC EP 100 modules were used synchronously to measure the EIM parameters of the vastus lateralis, rectus femoris, medial femoris, and tibialis anterior for approximately 60 s. Finally, the subject was asked to turn to the prone position (face down), and three BIOPAC EP 100 modules were used synchronously to measure the biceps femoris, semitendinosus, and gastrocnemius.

Figure 5 shows the flowchart of feature extraction. RFE is used to search for the optimal parameters [43,44] that serve as the features for estimating the MoTM. The parameters of 85 subjects randomly selected from the 96 subjects constitute the raw data.
To reduce problems such as overfitting and selection bias, RFE used five-fold cross-validation to evaluate the candidate parameter sets. Table 3 lists the parameters used, which include not only the EIM parameters but also the body information of the subjects, for a total of 34 parameters. RFE fits a model and removes the weakest features until the specified number of features is reached: in each loop, the feature with the lowest coefficient is eliminated, and the model is evaluated by r². Removing a feature with low impact produces only a small change in the coefficients. Thus, RFE can eliminate features that are dependent on or collinear with other features in the model.
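The RFE loop with five-fold cross-validation can be sketched as follows. This is a minimal NumPy sketch, not the authors' implementation, and several choices are assumptions: ridge regression with λ = 0.1 (the value used for RR in this study) serves as the estimator, features are standardized so that coefficient magnitudes are comparable, the weakest feature is the one with the smallest absolute coefficient, and the retained subset is the one with the best mean cross-validated coefficient of determination. The feature names and data are synthetic placeholders.

```python
import numpy as np


def ridge_fit(X, y, lam=0.1):
    """Closed-form ridge fit on standardized features (intercept not penalized)."""
    x_mean, x_std = X.mean(axis=0), X.std(axis=0)
    x_std[x_std == 0] = 1.0  # guard against constant columns
    Xs = (X - x_mean) / x_std
    y_mean = y.mean()
    beta = np.linalg.solve(Xs.T @ Xs + lam * np.eye(Xs.shape[1]), Xs.T @ (y - y_mean))
    return beta, x_mean, x_std, y_mean


def ridge_predict(model, X):
    beta, x_mean, x_std, y_mean = model
    return ((X - x_mean) / x_std) @ beta + y_mean


def r2_score(y, y_hat):
    """Coefficient of determination 1 - SS_res / SS_tot."""
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)


def cv_r2(X, y, n_folds=5, lam=0.1, seed=0):
    """Mean r^2 over n_folds shuffled folds."""
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, n_folds)
    scores = []
    for k, test in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != k])
        model = ridge_fit(X[train], y[train], lam)
        scores.append(r2_score(y[test], ridge_predict(model, X[test])))
    return float(np.mean(scores))


def rfe_cv(X, y, names, n_folds=5, lam=0.1):
    """Drop the feature with the smallest |coefficient| one at a time and
    return (best subset by CV r^2, its score, elimination order)."""
    remaining = list(range(X.shape[1]))
    best_score, best_subset, eliminated = -np.inf, list(remaining), []
    while remaining:
        score = cv_r2(X[:, remaining], y, n_folds, lam)
        if score > best_score:
            best_score, best_subset = score, list(remaining)
        if len(remaining) == 1:
            break
        beta = ridge_fit(X[:, remaining], y, lam)[0]
        weakest = remaining[int(np.argmin(np.abs(beta)))]
        eliminated.append(names[weakest])
        remaining.remove(weakest)
    return [names[i] for i in best_subset], best_score, eliminated


if __name__ == "__main__":
    # Synthetic stand-in for the 85 training subjects: only the first two
    # columns carry signal; the other four are pure noise.
    rng = np.random.default_rng(42)
    X = rng.normal(size=(85, 6))
    y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=85)
    names = ["height", "weight", "RF_R", "RF_Z", "TA_P", "GT_P"]
    subset, score, order = rfe_cv(X, y, names)
    print("selected:", subset, "CV r2 = %.3f" % score)
    print("elimination order:", order)
```

On this synthetic data the four noise columns are eliminated first, which illustrates why low-impact (or collinear) features tend to be removed early. Library implementations such as scikit-learn's RFECV follow the same idea with more options.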

Machine Learning Models
Traditional regression usually relies on multiple linear regression, which fits the regression curve as closely as possible to the training data. If the input variables are highly correlated, this can lead to large errors on the testing data, a problem known as overfitting. Therefore, the regression curve should not follow the training data too closely, so that the predictions generalize better. In this study, we used two machine learning models that trade some training-set fit for better generalization, RR, which penalizes large coefficients, and SVR, which tolerates errors within a margin, to estimate the MoTM.
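
Ridge Regression

Multiple linear regression can be written as

$$y = \hat{y} + \varepsilon = \beta_0 + \sum_{j=1}^{p} \beta_j x_j + \varepsilon \qquad (5)$$

where ŷ is the estimated value, x_j is the independent variable, β_j is the coefficient (β_0 is the intercept), p is the number of independent variables, and ε is the residual error. A loss function (L) is defined on the regression function; the sum of squared errors (SSE) is usually used:

$$L = \sum_{i=1}^{N} \left(y_i - \hat{y}_i\right)^{2} \qquad (6)$$

and the objective is to minimize the loss function to estimate the β_j:

$$\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{N} \Bigl(y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij}\Bigr)^{2} \qquad (7)$$

where y_i is the observed value and N is the number of observed values. Ridge regression (RR) adds a penalty term to the objective function [45]:

$$\hat{\beta}^{\text{ridge}} = \arg\min_{\beta} \left[ \sum_{i=1}^{N} \Bigl(y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij}\Bigr)^{2} + \lambda \sum_{j=1}^{p} \beta_j^{2} \right] \qquad (8)$$

Because this term is a second-order penalty on the coefficients, it is also called the L2 penalty. Its strength is controlled by λ. When λ approaches 0, Equation (8) reduces to Equation (7); as λ approaches infinity, all coefficients approach 0. In this study, λ is set to 0.1.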
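The closed-form solution of the ridge objective, and its two limiting cases, can be checked numerically. This is a minimal NumPy sketch on synthetic data; it assumes the usual treatment in which the intercept β_0 is not penalized, which is obtained by centering X and y before solving. The function name is hypothetical.

```python
import numpy as np


def ridge_coefficients(X, y, lam):
    """Minimize sum_i (y_i - b0 - x_i . beta)^2 + lam * ||beta||^2.

    Centering X and y removes the unpenalized intercept b0, leaving the
    closed-form solution beta = (Xc^T Xc + lam I)^(-1) Xc^T yc."""
    X = np.asarray(X, float)
    y = np.asarray(y, float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    p = X.shape[1]
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)
    b0 = y_mean - x_mean @ beta
    return b0, beta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 3))  # synthetic data, not the study's measurements
    y = 1.5 + X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=50)
    for lam in (0.0, 0.1, 10.0, 1e6):
        b0, beta = ridge_coefficients(X, y, lam)
        print(f"lambda={lam:g}: b0={b0:.3f}, ||beta||={np.linalg.norm(beta):.4f}")
```

With λ = 0 the result matches ordinary least squares (Equation (7)); as λ grows, the coefficient norm shrinks toward 0, as stated above.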

Support Vector Regression
Unlike linear regression, SVR accepts an error margin when finding an appropriate model to fit the data. In Equation (9), the error term is instead handled in the constraints, which require the absolute error to be less than or equal to a specified margin ε, called the maximum error:

$$\min_{w}\ \frac{1}{2}\lVert w \rVert^{2} \quad \text{subject to} \quad \lvert y_i - f(x_i) \rvert \le \varepsilon \qquad (9)$$

Thus, the loss function (L) is defined by the regression function (f) with the added constraint of a specified margin. However, this margin cannot contain all of the data; some data points still fall outside it. Therefore, a slack variable is defined: for any data point falling outside ε, its deviation from the margin is denoted ξ_i. The objective function with the slack variable becomes

$$\min_{w}\ \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{N} \lvert \xi_i \rvert \qquad (10)$$

and the margin constraint changes to

$$\lvert y_i - f(x_i) \rvert \le \varepsilon + \lvert \xi_i \rvert \qquad (11)$$

Moreover, SVR can use different kernel functions, linear or nonlinear, to map nonlinearly distributed data into a space where a linear fit is possible [46]. In this study, the kernel function is a second-order polynomial, C is set to 0.1, and the margin ε is set to 0.3.
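The role of the slack variable and of C in the SVR objective can be made concrete with a small calculation. This is a minimal NumPy sketch for a linear f(x) = w·x + b; it evaluates the objective for given w and b rather than solving the optimization problem. The second-order polynomial kernel is written in the common form (γ x·z + r)², which is the parameterization scikit-learn uses for kernel="poly"; the values of γ and r are assumptions because the study does not report them. All function names are hypothetical.

```python
import numpy as np


def slack_variables(y, y_pred, epsilon):
    """xi_i = max(0, |y_i - f(x_i)| - epsilon): deviation beyond the margin."""
    return np.maximum(0.0, np.abs(np.asarray(y, float) - np.asarray(y_pred, float)) - epsilon)


def linear_svr_objective(w, b, X, y, C, epsilon):
    """Primal objective 1/2 ||w||^2 + C * sum(xi_i) for f(x) = w . x + b."""
    w = np.asarray(w, float)
    y_pred = np.asarray(X, float) @ w + b
    xi = slack_variables(y, y_pred, epsilon)
    return 0.5 * float(w @ w) + C * float(xi.sum())


def poly2_kernel(x, z, gamma=1.0, coef0=1.0):
    """Second-order polynomial kernel K(x, z) = (gamma * x . z + coef0)^2;
    gamma and coef0 defaults are assumptions, not values from the study."""
    return (gamma * float(np.dot(x, z)) + coef0) ** 2


if __name__ == "__main__":
    # C = 0.1 and epsilon = 0.3 as in the text; the data are made up.
    X = np.array([[1.0], [2.0], [3.0]])
    y = np.array([1.0, 2.5, 2.0])
    print(slack_variables(y, X @ np.array([1.0]), 0.3))
    print(linear_svr_objective([1.0], 0.0, X, y, C=0.1, epsilon=0.3))
```

Points inside the ±ε tube contribute nothing to the objective; a small C, such as 0.1, makes points outside the tube cheap, favoring a flatter function.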

Results
The results of this study include the optimal feature sets and the MoTM estimation. In searching for the optimal features, we not only studied the impact of each parameter but also combined complementary parameters to yield a more informative feature set. Then, the performance of MoTM estimation by the RR and SVR models was compared.

Optimal Feature Sets
As described in Section 2.3, the impact of each feature depends on the regression model. After the RFE process, we chose only the parameters with positive weight coefficients. Table 4 shows the ranks and weight coefficients of these parameters for RR and SVR: there are nine parameters for RR and eight for SVR. Using these parameters as the feature sets to train the RR and SVR models yielded regression coefficients (r²) of 0.812 and 0.831, respectively. The gender parameter is a categorical variable, and the thigh circumference (TC) and calf circumference (CC) are geometric variables affected by bone, tissue, fat, and muscle. Thus, we combined these parameters with EIM parameters to yield a more informative feature set. From Table 4, the major thigh muscles are the rectus femoris and vastus lateralis, and the major calf muscles are the gastrocnemius and tibialis anterior. The new EIM features for the thigh muscles were the RF_R, RF_Z, and VL_Z parameters normalized by TC, each of which was added individually to the original feature vectors to train the RR and SVR models. Table 5 shows the resulting r² for the RR and SVR models. RF_R/TC performs best for both models, increasing r² to 0.816 and 0.840, respectively. The new EIM features for the calf muscles combined the TA_P and GT_P parameters with the gender parameter: the TA_P and GT_P values of the subjects in each gender group were multiplied by the mean TA_P and GT_P of that group, respectively. These new features were added individually to the original feature vectors to train the RR and SVR models. Table 6 shows the resulting r² for the RR and SVR models. TA_P_Gender performs best for both models, increasing r² to 0.825 and 0.840, respectively.
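The two combined features can be sketched from the verbal description above. This is a minimal NumPy sketch; the function names are hypothetical, and the optional group_means argument is an assumption, since the text does not state whether the gender-group means were computed on the training subjects only (doing so would avoid leaking test data into the feature).

```python
import numpy as np


def normalize_by_circumference(resistance, thigh_circumference):
    """RF_R/TC: rectus femoris resistance divided by thigh circumference."""
    return np.asarray(resistance, float) / np.asarray(thigh_circumference, float)


def gender_combined(phase, gender, group_means=None):
    """TA_P_Gender-style feature: each subject's phase multiplied by the mean
    phase of that subject's gender group.

    If group_means (dict: gender -> mean) is given, those means are used,
    e.g. means computed on the training subjects only (an assumption);
    otherwise the means are computed from the values passed in."""
    phase = np.asarray(phase, float)
    gender = np.asarray(gender)
    if group_means is None:
        group_means = {g: phase[gender == g].mean() for g in np.unique(gender)}
    return np.array([p * group_means[g] for p, g in zip(phase, gender)])


if __name__ == "__main__":
    # Made-up values: male mean phase 11, female mean phase 21.
    print(gender_combined([10.0, 12.0, 20.0, 22.0], ["M", "M", "F", "F"]))
    print(normalize_by_circumference([50.0, 60.0], [50.0, 40.0]))
```

Multiplying by the group mean scales each gender's values by a different factor, which widens the gap between the male and female groups in the combined feature.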

Performance of Regression Models
In Table 4, the body information consists of height, weight, and gender. However, the gender parameter has already been combined with an EIM parameter. Thus, we chose height and weight as features. Moreover, in Tables 5 and 6, the RF_R/TC and TA_P_Gender features perform best, so we also chose these two features. The final features were therefore height, weight, RF_R/TC, and TA_P_Gender, which were used to retrain the RR and SVR models. The 11 subjects not used for training served as the testing data. The regression coefficients (r²) of the RR and SVR models were 0.800 and 0.929, and the RMSEs were 1.432 kg and 0.980 kg, respectively.
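The two reported metrics can be computed on the test subjects as sketched below. The text calls r² a "regression coefficient" without giving its formula, so this minimal NumPy sketch shows both common readings, the coefficient of determination and the squared Pearson correlation, which generally differ. The function names and example values are hypothetical.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root-mean-square error, in the units of y (kg for the MoTM)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def coefficient_of_determination(y_true, y_pred):
    """1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def squared_pearson(y_true, y_pred):
    """Square of the Pearson correlation between reference and estimate."""
    return float(np.corrcoef(y_true, y_pred)[0, 1] ** 2)


if __name__ == "__main__":
    # Made-up reference (InBody-style) and estimated masses in kg.
    ref = [10.0, 12.0, 14.0, 16.0]
    est = [11.0, 12.0, 13.0, 17.0]
    print(rmse(ref, est), coefficient_of_determination(ref, est), squared_pearson(ref, est))
```

For these made-up values the RMSE is about 0.866 kg, the coefficient of determination is 0.85, and the squared Pearson correlation is about 0.870, so the definition used should be stated when comparing with other studies.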

Discussion
According to Chen et al., the diagnosis of sarcopenia requires not only measurement of skeletal muscle mass and strength but also tests of physical performance [11]. Measuring skeletal muscle mass incurs significant costs in clinical practice. Although some commercial apparatus based on bioelectrical impedance analysis (BIA) can measure global muscle mass, their price and size make them impractical for home care. The benefits of EIM are that it can measure the electrical parameters of a single muscle and that it is much easier to operate than BIA. Thus, when an injured muscle improves through rehabilitation, EIM can measure the actual change in that muscle [47]. In this study, a BIA apparatus, the InBody S10, was used to measure the total MoTM. In contrast, we used only the RF_R and TA_P parameters from EIM, together with body information, to estimate the MoTM with the RR and SVR models. The regression coefficients (r²) between the two methods were 0.800 and 0.929 for the RR and SVR models, respectively.
According to Janssen et al., the parameters for estimating skeletal muscle mass were height, age, gender, and the BIA resistance, with regression coefficients (r²) of approximately 0.86 [48]. In our study, we considered 34 parameters, comprising the body information of the subjects (excluding age) and the EIM parameters, as shown in Table 3. Different machine learning algorithms may perform best with different feature sets, even when they use the same training set [49]. Therefore, we used RFE to rank these parameters separately for the RR and SVR models. In Table 4, height, weight, gender, and RF_R are the parameters common to both models. This result is consistent with the findings of Janssen et al.
In traditional regression estimation, categorical parameters such as gender, country, or race are difficult to use because they can only be encoded as labels. Therefore, separate regression functions are built for the different categories to increase the estimation accuracy [5,50]. In Table 4, the gender parameter ranks only second in impact for both the RR and SVR models. Thus, building separate RR and SVR models for the different genders could not improve their performance. The external direct product in group theory is a general method for processing data from two groups [51]. Thus, we used this method to reinforce the differences in EIM parameters between the genders.
Theoretically, more features should result in better discriminating performance, but practical experience with machine learning algorithms shows that this does not hold in many cases [37,52–54]. A regression model with more features may reduce modeling bias; however, its prediction accuracy may decrease, and its computational complexity increases. Hence, we used RFE to select the optimal feature set, which reduced the number of features for the RR and SVR models to ten and nine, respectively. Another approach to dimensionality reduction, known as feature extraction, usually forms linear combinations of the given features to reduce the size of the feature space while retaining the information of the original feature space [55]. In this study, we combined the EIM parameters with body information, namely thigh circumference and gender. In Tables 5 and 6, the RF_R/TC and TA_P_Gender features increase the performance of the RR and SVR models. Thus, we selected only four features, namely height, weight, RF_R/TC, and TA_P_Gender, to estimate the total MoTM. The SVR model performed better, with a regression coefficient (r²) of 0.929 and an RMSE of 0.980 kg.
Aaron et al. [56] and Tarulli [57] reported relationships between the EIM phase of the TA and GT muscles and both age and muscle atrophy. In Table 4, the TA_P and GT_P parameters are important features for the RR and SVR models. Moreover, Kortman et al. found that the resistance and reactance of skeletal muscle measured by EIM are affected by the age and gender of the subjects [5]. In Table 4, the RF_R and RF_Z parameters are also important features.
There are three causes of muscle mass loss: a reduction in the number of muscle fibers, shrinkage of the muscle fibers, and transformation of muscle fibers into type I fibers. Each condition changes different EIM parameters. When the number of muscle fibers decreases, the resistance increases and the phase decreases. When the muscle fibers shrink, the intracellular fluid of the muscle decreases, which causes a drop in the phase. In this study, the subjects, excluding those with injured lower limbs or muscle weakness, generally had one of the three conditions listed above. The proposed method used only the resistance and phase parameters of EIM to estimate the muscle mass of the lower limbs; this is a limitation of this study.
Moreover, the BIA method encompasses EIM and IPG, which measure not only the impedance of muscles but also that of other tissue components, such as fat, vessel walls, and skin, as well as blood flow. Wróbel et al. proposed a pulse-dynamics analysis to compensate for these potential sources of error [58]. The pulse signal can be measured by IPG [29]. Thus, in the future, the EIM signal could be calibrated with the dynamic pulse signal measured by IPG to reduce the distortions induced by lipids in skin layers of varying thickness in individual patients.

Conclusions
Sarcopenia is a prevalent disease among elderly people, especially when their limbs or vertebrae are injured. Moreover, muscle mass usually decreases with age, and physical fitness training can help people avoid this loss. A measurement system for use in non-medical environments could help people at latent risk of sarcopenia to check their muscle mass every day. The contribution of this study is to use data mining techniques to extract important features, namely the resistance and phase parameters of EIM and body information, and to use two machine learning algorithms, RR and SVR, to estimate the total MoTM. The SVR model achieved an r² of 0.929, higher than the value of approximately 0.86 reported by Janssen et al. [48]. Thus, the proposed method has the potential for screening skeletal muscle mass in non-medical environments in the future.

Abbreviations
SSE: sum of squared errors
r²: regression coefficient