Heart Rate Estimated from Body Movements at Six Degrees of Freedom by Convolutional Neural Networks

Cardiac activity has been monitored continuously in daily life by virtue of advanced medical instruments with microelectromechanical system (MEMS) technology. Seismocardiography (SCG) has been considered to be free from the burden of measurement for cardiac activity, but its application in daily life has been limited. The most important issue regarding SCG is to overcome the limitation of motion artifacts caused by the sensitivity of the motion sensor. Although novel adaptive filters for noise cancellation have been developed, they depend on the researcher's subjective decisions. Convolutional neural networks (CNNs) can extract significant features from data automatically without such subjective decisions, so signal processing has recently been replaced by CNNs. Thus, this study aimed to develop a novel method to enhance heart rate estimation from thoracic movement by CNNs. Thoracic movement was measured as six-axis accelerometer and gyroscope signals using a wearable sensor that can be worn by simply clipping it onto clothes. The dataset was collected from 30 participants (15 males, 15 females) in 12 measurement conditions according to two physical conditions (i.e., relaxed and aroused conditions), three body postures (i.e., sitting, standing, and supine), and six movement speeds (i.e., 3.2, 4.5, 5.8, 6.4, 8.5, and 10.3 km/h). The motion data (i.e., six-axis accelerometer and gyroscope) and heart rate (i.e., electrocardiogram (ECG)) were used as the input data and labels in the dataset, respectively. The CNN model was developed based on VGG Net and optimized by testing according to network depth and data augmentation. The ensemble network of the VGG-16 without data augmentation and the VGG-19 with data augmentation was determined as the optimal architecture for generalization. As a result, the proposed method showed higher accuracy than the previous SCG method using signal processing in most measurement conditions.
The three main contributions are as follows: (1) the CNN model enhanced heart rate estimation with the benefits of automatic feature extraction from the data; (2) the proposed method was compared with the previous SCG method using signal processing; (3) the method was tested in 12 measurement conditions related to daily motion for a more practical application.


Introduction
Cardiac activity has been monitored continuously in daily life by virtue of advanced medical instruments with microelectromechanical systems (MEMS) technology. Seismocardiography (SCG) is one of the vital components that allows the possibility of this monitoring system. SCG is noninvasively measured from thoracic movements caused by both contraction of the heart and ejection of the that the significant features relevant to cardiac activity can be extracted from the motion data better by CNNs than by previous methods based on signal processing determined by the researcher's subjective decision.
This study was conducted to develop a novel method to enhance the heart rate estimation from thoracic movement by CNNs. Thoracic movement was measured by six-axis accelerometer and gyroscope signals using a wearable sensor that can be worn by simply clipping on clothes. The dataset for training the CNN model was collected from 30 persons in 12 measurement conditions according to body postures, physical conditions, and movement speeds. The CNN model was developed based on VGG Net [26] and optimized by testing according to network depth and data augmentation. It was evaluated by calculating accuracy from ECG measured as ground truth and was compared with the previous SCG method using signal processing. The contributions of this study can be summarized as follows: (1) the CNN model enhanced heart rate estimation with the benefits of automatic feature extraction from the data; (2) the proposed method was compared with the previous SCG method using signal processing; (3) the method was tested in 12 measurement conditions related to daily motion for a more practical application.

Experiment
This study was an experiment designed to collect a dataset for training and evaluating the CNN model. The participants consisted of 30 persons (15 males, 15 females) aged 27.7 ± 3.3 years. They had no medical history related to cardiovascular disease and were healthy enough to perform physical exercise. All participants were instructed to get sufficient sleep and were asked to abstain from caffeine, alcohol, and cigarettes before the experiments. They provided written informed consent before the experiment and were paid an incentive after the experiment.
One of the challenges for SCG is that the accelerometer signals used to determine SCG are sensitive to body postures and motion artifacts. In addition, it is necessary to test SCG for a wide range of heart rates to ensure clinical relevance. Thus, this study verified the method in 12 measurement conditions according to body postures, physical conditions, and movement speeds by considering the mentioned challenges, as shown in Figure 1. First, in six measurement conditions according to body postures and physical conditions, all participants were asked to maintain three body postures (i.e., sitting, standing, and supine) in two physical conditions (i.e., relaxed and aroused conditions). To evoke the physically relaxed condition, the participants closed their eyes and maintained the body postures for 3 min. To evoke the physically aroused condition, they exercised and then maintained the body postures for 3 min. The exercise, running on the treadmill at a speed of 8.5 km/h for 3 min, was intended to make the heart beat faster. Second, in six measurement conditions according to movement speeds, they were requested to walk and to run at six speeds (i.e., 3.2, 4.5, 5.8, 6.4, 8.5, and 10.3 km/h) for 3 min each. The experiment lasted for a total of 63 min, and sufficient rest was given between tasks. The participants were given longer rest times after exercise tasks than after nonexercise tasks to account for the time required for cardiac activity to be restored to a steady condition. This protocol was approved by the Institutional Review Board of the Sangmyung University, Seoul, South Korea (BE2016-14).
This study employed a portable and wearable apparatus, which had been developed in our previous study [8], to measure thoracic movement using the MEMS accelerometer and gyroscope sensors. The apparatus was worn by the participants by simply clipping it onto their clothing around the left side of the chest, as shown in Figure 2. The tri-axis accelerometer and tri-axis gyroscope signals were measured at sampling rates within the range of around 100 and 200 Hz using the apparatus. ECG was simultaneously measured using an ECG measurement system with Lead-I. The system consisted of an ECG 100C amplifier system and a MP150 data acquisition system (BIOPAC Systems Inc., Goleta, CA, USA). ECG was measured at a sampling rate of 512 Hz and served as a ground truth for the evaluation of SCG. In order to reduce noise while measuring ECG, the participants were asked to minimize the movement of their arms during the experiment.

Formatting
It is necessary to collect a large-scale dataset to train CNNs well. Thus, this study augmented the dataset by dividing the motion data and ECG with a sliding window (window size = 5 s, interval size = 1 s). Then, the motion data and ECG were preprocessed to transform them into a data structure for CNNs. The six-axis motion data were measured from the accelerometer and gyroscope at a sampling rate between approximately 100 and 200 Hz, as described in Section 2.1. The motion data were resampled to 256 Hz by cubic spline interpolation [27] because CNNs take fixed-size data as input. The motion-induced noise was estimated by the Savitzky-Golay filter with an order of 2 and a window size of 31 samples [28] and was subtracted from each accelerometer or gyroscope signal to remove large-range motion, as shown in Figure 3. This method preserves higher order moments around inflection points and overcomes the limit of the simple digital filter [10]. The accelerometer and gyroscope signals were normalized to have a mean of zero and a variance of one. ECG was measured at a sampling rate of 512 Hz to serve as ground truth. The R-R intervals (RRIs) were calculated from ECG by the QRS detection algorithm of Pan and Tompkins, which detects the R peaks [29]. The heart rate was calculated from the average of the RRIs. Finally, the motion data and heart rate were used as the input data and labels in the dataset, respectively.
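The preprocessing steps above can be sketched in Python with NumPy and SciPy. This is a minimal illustration, not the implementation used in the study: the function names are hypothetical, and the order of operations (filtering and normalizing a whole recording before windowing) and the use of the population standard deviation are assumptions, since the text does not specify them.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import savgol_filter

FS = 256       # target sampling rate in Hz (from the text)
WINDOW_S = 5   # window size in seconds (from the text)
STEP_S = 1     # sliding interval in seconds (from the text)


def resample_to_256hz(t, x, fs=FS):
    """Resample x (n_samples, n_axes), sampled at times t in seconds, to fs Hz."""
    t_new = np.arange(t[0], t[-1], 1.0 / fs)
    return t_new, CubicSpline(t, x, axis=0)(t_new)


def remove_motion_trend(x, window=31, order=2):
    """Subtract the Savitzky-Golay estimate of large-range motion, then z-score."""
    trend = savgol_filter(x, window_length=window, polyorder=order, axis=0)
    residual = x - trend
    # Assumption: normalization is applied per axis over the whole array.
    return (residual - residual.mean(axis=0)) / residual.std(axis=0)


def sliding_windows(x, fs=FS, window_s=WINDOW_S, step_s=STEP_S):
    """Cut x (n_samples, n_axes) into overlapping windows of window_s seconds."""
    size, step = window_s * fs, step_s * fs
    starts = range(0, x.shape[0] - size + 1, step)
    return np.stack([x[s:s + size] for s in starts])


def heart_rate_from_r_peaks(r_peak_times):
    """Heart rate in beats per minute from the mean R-R interval (times in s)."""
    rri = np.diff(r_peak_times)
    return 60.0 / rri.mean()
```

Each window produced this way has 5 s × 256 Hz = 1280 samples on six axes, which matches the input size (1 × 1280 × 6) of the network. R-peak detection itself (Pan and Tompkins) is not shown; `heart_rate_from_r_peaks` assumes the peak times are already available.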
The dataset contains a total of 58,561 samples. A sample was excluded from the dataset if its heart rate was lower than 60 or higher than 200 because the ECG might have been measured incorrectly. The dataset was shuffled and then divided into training data (70%), validation data (10%), and test data (20%) for the cross-validation method.
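The exclusion rule and the 70/10/20 split can be illustrated as follows. This is a sketch under assumptions: the function name, the fixed random seed, and the rounding of the split sizes are not taken from the study.

```python
import numpy as np


def filter_and_split(x, hr, seed=0, fractions=(0.7, 0.1, 0.2)):
    """Drop samples outside 60-200 bpm, shuffle, and split into three subsets.

    Returns [(x_train, hr_train), (x_val, hr_val), (x_test, hr_test)].
    """
    # Samples with a heart rate lower than 60 or higher than 200 are excluded.
    keep = (hr >= 60) & (hr <= 200)
    x, hr = x[keep], hr[keep]

    # Shuffle once with a fixed seed (the seed is an assumption, for repeatability).
    order = np.random.default_rng(seed).permutation(len(hr))

    n_train = int(fractions[0] * len(hr))
    n_val = int(fractions[1] * len(hr))
    parts = np.split(order, [n_train, n_train + n_val])  # remainder is the test set
    return [(x[idx], hr[idx]) for idx in parts]
```

For example, `train, val, test = filter_and_split(windows, heart_rates)` yields three `(input, label)` pairs whose sizes follow the stated proportions.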

Baseline Architecture
This study proposed a CNN architecture for estimating the heart rate from motion data measured on the chest, as shown in Figure 4. The baseline network architecture with eight convolutional layers and three fully connected layers was developed based on VGG-11 [26] and tuned to accept the motion data. Three modifications were made for the motion data. First, although the input of CNNs was originally a square structure proposed for image data, it was reshaped as a rectangular structure (1 × 1280 × 6) in this study, that is, 5 s of data at 256 Hz on six axes. Second, this study employed convolution (1 × 3) and pooling (1 × 2) operations with a 1-D rectangular shape instead of the 2-D square shape. Third, the network had one output node to solve a regression problem because it estimated continuous data (i.e., heart rate). In the next section, several network architectures are examined according to additional convolutional layers (e.g., VGG-13, VGG-16, and VGG-19) and data augmentation (e.g., permutation, jittering, and scaling).
There are several hyperparameters to be determined for training and optimization: activation functions, loss functions, optimizer, learning rate, accuracy, etc. Rectified linear units [30] were employed as the activation function to reduce the vanishing gradient problem. The optimization was performed with the L2-loss function and the Adam optimizer [31] with a learning rate of 0.0001. Additionally, dropout [32] with a probability of 0.5 was applied to each fully connected layer to avoid overfitting. The network was initialized by Xavier initialization [33] and was trained with minibatches of 128 samples. The accuracy was calculated to evaluate the performance of the network from the true label $y$, the predicted label $\hat{y}$, and the number of samples $n$ in the dataset. The network architecture was tested according to network depth and data augmentation. The network was implemented with TensorFlow [34], an open source deep-learning library, using a computer equipped with 3.6 GHz quad-core processors and 4 NVIDIA GeForce GTX 1080 GPUs. The dataset used in the experiments consisted of 40,986 training data, 5858 validation data, and 11,717 test data.
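To make the effect of the 1-D convolution and pooling on the input concrete, the following sketch traces the tensor shape through a VGG-11 style layout. The channel widths are those of the original VGG-11 and are an assumption here, since the text does not list the widths of the tuned network. Stride-2 pooling and length-preserving ("same") padding in the convolutions are also assumptions.

```python
# VGG-11 layout: integers are output channels of a convolution, "M" is pooling.
# Assumption: widths follow the original VGG-11; the study's tuned widths may differ.
VGG11 = [64, "M", 128, "M", 256, 256, "M", 512, 512, "M", 512, 512, "M"]


def trace_shapes(length=1280, channels=6, layout=VGG11):
    """Return the (height, length, channels) shape after every layer."""
    shapes = [(1, length, channels)]
    for layer in layout:
        if layer == "M":
            length //= 2      # 1 x 2 pooling with stride 2 halves the time axis
        else:
            channels = layer  # 1 x 3 convolution with "same" padding keeps the length
        shapes.append((1, length, channels))
    return shapes


shapes = trace_shapes()
n_conv = sum(1 for layer in VGG11 if layer != "M")
flattened = shapes[-1][1] * shapes[-1][2]
print(n_conv, shapes[-1], flattened)  # prints: 8 (1, 40, 512) 20480
```

Under these assumptions, the eight convolutional layers and five pooling steps reduce the 1280-sample time axis to 40 positions before the three fully connected layers and the single regression output.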

Effects of CNN Depth
This study investigated the effect of network depth on its accuracy in our dataset. Note that depending on the network depth, the invariance and discriminative feature maps can be represented in a higher layer [26]. However, since an excessively deep network is difficult to generalize because of overfitting, an appropriate depth needed to be determined for our dataset. There were four network architectures according to the depth: VGG-11, VGG-13, VGG-16, and VGG-19. Each network was tuned to apply the motion data and to estimate the heart rate as described above (Baseline architecture). Then, it was trained for 100 epochs with training data, including all measurement conditions (see Figure 5a). The model's parameters were saved when the validation cost was lowest.

Effects of Data Augmentation
Although deeper networks can represent invariant and discriminative feature maps, a large-scale dataset is necessary to train them well without overfitting. However, it was difficult to collect a large-scale dataset in our experiments, which involved human subjects. Thus, this study employed data augmentation to create a large-scale dataset and examined the effect of data augmentation on the network's accuracy in our dataset. For data augmentation, domain knowledge should be considered so that the labels are preserved after the transformations. For example, image processing methods such as jittering, scaling, cropping, distorting, or rotating are well known as data augmentation in CNN studies for vision. This study employed data augmentation methods that were developed to generate new motion data from existing motion data, namely permutation, jittering, and scaling, as shown in Figure 6 [35]. Permutation creates new data by moving the temporal location of the motion data x of length n within a window of size α. It can represent the invariant features for temporal location. Jittering distorts the motion data x by adding noise drawn from a Gaussian distribution, which is characterized by its mean m and standard deviation σ; here, α denotes the standard deviation of the noise distribution. Scaling increases or decreases the amplitude of the motion data x by multiplying it by a random scaling ratio α. Jittering and scaling can represent the invariant features for noise. Each network was trained for 100 epochs with the augmented training data, and its parameters were saved when the validation cost was lowest (see Figure 5b).
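The three augmentations can be sketched as below. The exact formulas are not recoverable from the text, so this is one plausible reading rather than the study's definition: permutation is implemented as a circular time shift of at most α samples, jittering adds zero-mean Gaussian noise with standard deviation α, and scaling multiplies by a ratio α. All function names are hypothetical.

```python
import numpy as np


def permute(x, alpha, rng):
    """Move the temporal location by a random shift of up to alpha samples.

    Assumption: a circular shift along the time axis (axis 0).
    """
    shift = int(rng.integers(-alpha, alpha + 1))
    return np.roll(x, shift, axis=0)


def jitter(x, alpha, rng):
    """Add Gaussian noise with standard deviation alpha (zero mean assumed)."""
    return x + rng.normal(0.0, alpha, size=x.shape)


def scale(x, alpha):
    """Increase or decrease the amplitude by the scaling ratio alpha."""
    return x * alpha


rng = np.random.default_rng(0)
window = rng.normal(size=(1280, 6))            # one 5 s window on six axes
augmented = [
    permute(window, alpha=64, rng=rng),
    jitter(window, alpha=0.05, rng=rng),
    scale(window, alpha=rng.uniform(0.9, 1.1)),
]
```

The parameter values in the usage lines (64 samples, 0.05, and a ratio between 0.9 and 1.1) are placeholders chosen for illustration. None of the transformations changes the heart rate label of the window, which is the property the text requires of an augmentation.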

Evaluation of Structural Risks
The structural risk of network architectures was defined as the instability of the method using the A-Test proposed in [36], based on the multiple use of z-fold validation. Because this study solved a regression problem, the regression error $\tau_{n,z}$ of a network $n$ was defined as the mean absolute error (MAE). The metric $\bar{\tau}_n$ for estimating the structural risk of a network $n$ was calculated as the average of the regression errors $\tau_{n,z}$, where $Z_{max}$ was set to 10-fold in this study. Each fold group was divided to distribute heart rates uniformly. A low value of $\bar{\tau}_n$ corresponds to a low structural risk, and the minimum value of $\bar{\tau}_n$ indicates the potential of the network to achieve better performance with a larger dataset.

Optimal Architecture
Table 1 presents the accuracy of heart rate estimation according to the CNN depth and the data augmentation. The VGG-16 without data augmentation shows the highest accuracy and lowest structural risk in nonmovement conditions, whereas the VGG-19 with data augmentation shows the highest accuracy and lowest structural risk in movement conditions. For generalization, heart rate estimation should be accurate for all measurement conditions, with or without movement. Thus, this study determined the ensemble network of the VGG-16 without data augmentation and the VGG-19 with data augmentation as the optimal architecture, where $HR_{vgg16(noaug)}$ is the heart rate estimated by the VGG-16 without data augmentation, $HR_{vgg19(aug)}$ is the heart rate estimated by the VGG-19 with data augmentation, and $HR_{ensemble}$ is the heart rate finally estimated by the ensemble network.
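As an illustration of the fold-based error averaging used for the structural risk, the sketch below builds 10 fold groups with heart rates spread evenly across them and averages the held-out MAE. It is a simplified reading of the procedure, not the A-Test itself: a single 10-fold split is assumed, and `fit_predict` is a hypothetical callable that trains a model on the training folds and returns predictions for the held-out fold.

```python
import numpy as np


def stratified_folds(hr, n_folds=10):
    """Deal samples, sorted by heart rate, round-robin into n_folds groups.

    Assumption: this is one way to distribute heart rates uniformly over folds.
    """
    order = np.argsort(hr)
    return [order[z::n_folds] for z in range(n_folds)]


def mean_fold_mae(x, hr, fit_predict, n_folds=10):
    """Average the held-out mean absolute error over all folds.

    fit_predict(x_train, hr_train, x_test) -> predicted heart rates (hypothetical).
    """
    folds = stratified_folds(hr, n_folds)
    errors = []
    for z, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for i, f in enumerate(folds) if i != z])
        pred = fit_predict(x[train_idx], hr[train_idx], x[test_idx])
        errors.append(np.mean(np.abs(hr[test_idx] - pred)))
    return float(np.mean(errors))
```

A trivial baseline such as `lambda x_tr, hr_tr, x_te: np.full(len(x_te), hr_tr.mean())` can be passed as `fit_predict` to check the plumbing before plugging in a trained network.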

Results
The three CNN models (i.e., VGG-16 (No Aug), VGG-19 (Aug), and ensemble network) were evaluated by comparing them with the heart rate of ECG by mean absolute error (MAE), standard deviation of absolute error (SDAE), root mean squared error (RMSE), and Pearson's correlation coefficients (CC). In addition, they were compared to each other by the Bland-Altman plot [37], which is represented by assigning the mean (x-axis) and difference (y-axis) between the two measurements with the 95% limits of an agreement calculated by mean difference and the ±1.96 standard deviation of the difference. Finally, the ensemble network, which was determined to be the optimal architecture for generalization, was compared with the previous SCG method [8], which employed ensemble averaging of the six-axis of the accelerometer and gyroscope using signal processing.
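The evaluation metrics named above can be computed as in the following sketch. The function name is hypothetical, and the choice of the population standard deviation for SDAE and the sample standard deviation for the limits of agreement is an assumption, since the text does not state which estimator was used.

```python
import numpy as np


def evaluate(hr_ecg, hr_est):
    """Compare estimated heart rates with ECG heart rates (both in bpm)."""
    diff = hr_est - hr_ecg
    abs_err = np.abs(diff)
    sd_diff = diff.std(ddof=1)
    return {
        "MAE": float(abs_err.mean()),
        "SDAE": float(abs_err.std()),
        "RMSE": float(np.sqrt(np.mean(diff ** 2))),
        "CC": float(np.corrcoef(hr_ecg, hr_est)[0, 1]),
        # Bland-Altman 95% limits of agreement: mean difference +/- 1.96 SD.
        "LoA": (float(diff.mean() - 1.96 * sd_diff),
                float(diff.mean() + 1.96 * sd_diff)),
        # Bland-Altman x-axis values: mean of the two measurements per sample.
        "BA_mean": (hr_ecg + hr_est) / 2.0,
    }


result = evaluate(np.array([60.0, 70.0, 80.0, 90.0]),
                  np.array([62.0, 68.0, 83.0, 89.0]))
```

In a Bland-Altman plot, `BA_mean` is drawn on the x-axis, the per-sample difference on the y-axis, and the two `LoA` values as horizontal lines.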

Estimation of Heart Rate for Running
The heart rates for running at movement speeds of 6.4, 8.5, and 10.3 km/h were evaluated as shown in Table 5. The errors at a movement speed of 6.4 km/h were lower with the VGG-19 (Aug) than with the other networks (MAE = 5.24, SDAE = 6.72, RMSE = 8.52, CC = 0.917). Similarly, the errors of the VGG-19 (Aug) were lower than the errors of the other networks at a movement speed of 8.5 km/h. Table 6 shows the heart rates evaluated by signal processing (previous method) and CNN (proposed method).

Discussion
This study developed the CNN model to replace traditional signal processing and to enhance heart rate estimation. The networks were evaluated on effects of CNN depth and data augmentation according to 12 measurement conditions. The VGG-16 without data augmentation was better than the other networks in nonmovement conditions, whereas the VGG-19 with data augmentation was better than the other networks in movement conditions. Their ensemble network was determined as the optimal architecture for generalization in this study.
Overall, this study has drawn six significant findings. First, for supine posture in the relaxed condition, the signal processing-based method showed high error (MAE = 18.05, SDAE = 17.27, RMSE = 24.78) and low correlation (CC = −0.084), as shown in Table 6. It indicated that the motion data did not sufficiently reflect the thoracic movement and that the apparatus may not be in close contact with the body for supine posture. Note that, nevertheless, the CNN-based method showed low error (MAE = 2.17, SDAE = 3.12, RMSE = 3.80) and high correlation (CC = 0.977). It indicated that the CNN-based method can extract features that cannot be extracted by signal processing. Thus, the CNN-based method can increase the possibility of heart rate estimation in daily life when the apparatus is not fixed.
Second, the accuracy was lower as the movement speed increased in most movement conditions. Faster movement speed leads to larger motions and induces the motion data to include more noise. However, the accuracy for walking at a movement speed of 3.2 km/h was lower than that for walking at movement speeds of 4.5 and 5.8 km/h. It may be interpreted that the motion for walking at a movement speed of 3.2 km/h causes more noise in the frequency components associated with the heartbeat. Note that noise cancellation should therefore focus on the noise associated with the frequency components rather than the time components.
Third, before data augmentation, the average accuracy for all measurement conditions was higher with the deeper networks (e.g., VGG-16 or VGG-19) than with the shallower networks (e.g., VGG-11 or VGG-13). However, the effect of the depth differed depending on whether the test data included movement conditions or not. The deeper networks were better than the shallower networks in nonmovement conditions as well as in all measurement conditions, but the shallower networks were better than the deeper networks in movement conditions. The deeper networks can extract more invariant features than the shallower networks. However, a large-scale dataset is necessary to train the deeper networks without overfitting. The motion data in movement conditions have larger variation than in nonmovement conditions; thus, it is difficult to train the deeper networks without overfitting. Note that in the results after data augmentation, the deeper networks were better than the shallower networks in most nonmovement conditions as well as movement conditions. This indicates that data augmentation, which allows for the extraction of invariant features for temporal location and noise, is important in training the network with the noisy motion data.
Fourth, data augmentation improved heart rate estimation in movement conditions, but not in nonmovement conditions. It can be interpreted that our network models are insufficient for extracting the invariant features from both the nonmovement and movement conditions. For generalization, it is important not only to apply data augmentation and to optimize hyperparameters, but also to improve the network architectures. CNNs have been recently developed with a focus on structural improvements to increase the number of layers and to reduce the number of parameters [21,38,39]. If our network is structurally improved in the future, it is expected that the heart rate estimation will be improved in both nonmovement and movement conditions.
Fifth, the accuracy of heart rate estimation was higher with the single networks (i.e., VGG-16 (No Aug) and VGG-19 (Aug)) than with the ensemble network, but the single networks showed higher accuracy only in certain conditions. For example, although the accuracy of the VGG-16 (No Aug) was high in nonmovement conditions, it was low in movement conditions. On the other hand, the accuracy of the VGG-19 (Aug) was low in nonmovement conditions but high in movement conditions. These results indicate that single networks are difficult to employ for general-purpose applications. In order to be employed for general-purpose applications in daily life, reasonable performance should be ensured in most measurement conditions. Thus, this study suggests that the ensemble network is the optimal network architecture for general-purpose applications of heart rate estimation. However, the ensemble network is more complex and requires more capacity than a single network or traditional signal processing, which makes it difficult to load our method onto an embedded system. As a solution, this study proposes developing a distillation method [40] to preserve the performance of an ensemble network by using single networks.
Sixth, the proposed method showed better performance than the previous method using signal processing in most measurement conditions, especially in movement conditions. However, the machine-learning methods have shown limited ability in unexpected conditions. Because daily conditions are so diverse that they cannot all be considered in the experiment, it is necessary to develop a method to improve performance even in unexpected conditions. This study proposed solutions to develop additional data augmentation for invariant features and to create virtual data in unexpected conditions by generative models, such as variational auto-encoder (VAE) [41] and generative adversarial networks (GANs) [42].
This study explored the issues regarding SCG in measurement conditions according to measurement location, axis selection, and motion artifacts. First, the possibility for a more comfortable measurement location was shown by the apparatus being simply worn on clothes. Second, the six-axis of the accelerometer and gyroscope were integrated to extract significant features related to thoracic movement. Third, the proposed method using CNN was demonstrated to better reduce motion artifacts than traditional signal processing and to estimate more accurately the heart rate in movement conditions. Consequently, our findings represent a significant step towards ensuring the enhanced development of SCG.

Conclusions
This study estimated the heart rate in 12 measurement conditions according to body postures, physical conditions, and movement speeds. The proposed method using CNNs was compared with the previous SCG method using traditional signal processing. As a result, the proposed method estimated the heart rate more accurately than the traditional SCG method, which employs ensemble averaging of the six axes of the accelerometer and gyroscope. Specifically, CNNs demonstrated the ability to overcome the motion artifacts problem for SCG by replacing traditional signal processing. The findings are a significant step towards ensuring the enhanced development of SCG. This study is expected to help estimate the heart rate more accurately by overcoming the motion artifacts problem and, consequently, to improve the monitoring environment of comfortable wearable devices in daily life.