Deep Neural Network for the Detections of Fall and Physical Activities Using Foot Pressures and Inertial Sensing

Fall detection and physical activity (PA) classification are important health maintenance issues for the elderly and people with mobility dysfunctions. The literature review showed that most studies concerning fall detection and PA classification addressed these issues individually, and many were based on inertial sensing from the trunk and upper extremities. While shoes are common footwear in daily off-bed activities, most of the aforementioned studies did not focus much on shoe-based measurements. In this paper, we propose a novel footwear approach to detect falls and classify various types of PAs based on a convolutional neural network and recurrent neural network hybrid. The footwear-based detections using deep-learning technology were demonstrated to be efficient based on the data collected from 32 participants, each performing simulated falls and various types of PAs: fall detection with inertial measures had a higher F1-score than detection using foot pressures; the detections of dynamic PAs (jump, jog, walks) had higher F1-scores while using inertial measures, whereas the detections of static PAs (sit, stand) had higher F1-scores while using foot pressures; the combination of foot pressures and inertial measures was most efficient in detecting fall, static, and dynamic PAs.


Introduction
Increasingly, research and discussion have focused on the recognition of physical activity (PA) and fall detection in recent years [1,2]. PA interventions have been evidenced to reduce the risks of cardiometabolic syndrome, falls, depression, anxiety, and dementia [3]. PA interventions have also attracted increasing interest for their potential health benefits in various diagnostic populations [4]. Several studies have even developed disease-related PA markers to differentiate the distribution of activity levels in individuals with chronic cardiovascular disease from healthy individuals [5]. On the other hand, fall detection provides a practical solution for fall scenarios, calling for help automatically if they occur. In particular, falls are a debilitating problem among the elderly [6] and individuals with Parkinson's disease [7], multiple sclerosis [8], and stroke [9]. Usually, a fear of falling restricts an individual's participation in daily activities [10][11][12]. The incorporation of fall detectors into mobile assistive technology may enable older people to live independently at home [13,14].
Currently, vision-based detection [15,16], inertial sensing-based detection [1,2], and hybrid frameworks [17,18] are the major approaches for PA monitoring and fall detection. The vision-based approaches use single or multiple cameras to capture human postures or movements [19]. Rapid changes in human posture are linked with fall incidence [20,21].
Methods in the first block use hand-crafted features that are obtained from foot pressures (FP) and inertial signals measured by an accelerometer (A), gyroscope (G), and magnetometer (M). Features including acceleration amplitude (AM), acceleration cross-product (AC), differential acceleration (DA), vertical velocity (VV), acceleration cubic-product-root magnitude (ACM), angular velocity cubic-product-root magnitude (AVCM), angular velocity, angular acceleration, and posture are used to distinguish falls from activities of daily life (ADL) based on empirical rules, a support vector machine (SVM), or a decision tree (DT). Methods in the second block use convolutional neural networks (CNNs), recurrent neural networks (RNNs), long short-term memory (LSTM), and their combination (CNN + LSTM) to learn discriminant features from inertial signals and distinguish falls from ADL. † indicates that the accuracy (acc) is calculated as sensitivity × specificity.
Shoes are commonly worn during off-bed activities, including most outdoor activities. However, footwear-based sensing is rarely adopted for PA classification and fall detection in the literature. In this work, we focus on the detection of falls and various types of PAs in parallel via footwear sensing and deep-learning technology. A novel footwear sensing system was developed in the form of instrumented shoes, each equipped with 11 force-sensitive resistors (FSRs) in the insole to measure foot pressures and an inertial measurement unit (IMU) with an accelerometer, a gyroscope, and a magnetometer to measure foot inertial dynamics.
A novel deep-learning neural network based on a hybrid CNN and RNN model was used to learn the discernible features from the inertial signals and foot pressures and detect simulated falls and various types of PAs. To validate the feasibility of the proposed footwear approach, we define four kinds of simulated falls and seven types of activities of daily living, including jumping, jogging, level walking, walking downstairs, walking upstairs, sitting, and standing. The detections based on various combinations of signals and parameters were validated and compared based on a dataset collected in this study.

Subjects and Experimental Activities
A sample of 32 healthy participants (16 males and 16 females; age of 21.5 ± 2.0 yrs; height of 167.2 ± 7.1 cm; weight of 62.4 ± 11.3 kg) performed simulated falls and physical activities of daily life. Every participant performed four kinds of simulated falls and seven other types of PAs. Each simulated fall and PA was repeated three times by each participant. The protocol of this study was approved by the Research Ethics Committee of the Chang Gung Medical Foundation (IRB#201802118B0) in accordance with the Helsinki Declaration. All of the participants provided written informed consent.
The participants were protected by a thick mattress while undertaking the simulated falls. The four kinds of simulated falls were defined as follows:
(1) A backward fall during stand-to-sit.
(2) A forward fall, a lateral fall toward the left side, and a lateral fall toward the right side during walking.
(3) A forward fall, a lateral fall toward the left side, and a lateral fall toward the right side from standing.
(4) A forward fall, a lateral fall toward the left side, and a lateral fall toward the right side during sit-to-stand.
The seven types of PAs were defined as follows: (1) A single jump and a continuous jump.

Sensing and Recording System
An ARM Cortex-M4 microcontroller (M451RG6AE, Nuvoton Tech. Corp., Hsinchu, Taiwan) received the digital data from an IMU (LSM9DS1, STMicroelectronics, Geneva, Switzerland) comprising a tri-axial accelerometer with a full-scale range of ±4 g, a tri-axial gyroscope with a full-scale range of ±500 degrees/s (dps), and a tri-axial magnetometer with a full-scale range of ±12 Gauss, via a serial peripheral interface bus at a sampling rate of 100 Hz. As shown in Figure 1, a customized insole with 11 FSRs (UNEO Incorporated, New Taipei City, Taiwan) was used to capture the foot pressures at the big toe, little toe, metatarsus (medial, middle, lateral), arches (medial, lateral), fore heels (medial, lateral), and heels (medial, lateral). The FSRs were constructed of a resistance-type piezo-resistive polymer composite made using processing and printing-based micromachining technology. Each FSR had a sensing range of 1 to 5 kg/cm² and was individually calibrated using elastic-film pressurization to reduce the resistance variance between the sensors. The microcontroller digitized the transformed voltages from these FSRs through a built-in 12-bit analog-to-digital converter at a sampling rate of 100 Hz. All of the acquired samples were wirelessly transmitted to a notebook computer through a BLE 4.2 Bluetooth module (JDY-18, Shenzhen Innovation Technology, Shenzhen, China). The customized insole had a height of 260 mm, a metatarsus width of 850 mm, a heel width of 550 mm, and a thickness of 0.63 mm. Therefore, we included only participants whose foot size matched the size of the customized insole as closely as possible.
The sensing devices were fixed at the lateral aspect or insole of each shoe in parallel to measure the foot inertial data and foot pressures, respectively. A graphical user interface developed in the PyQt Designer (Riverbank Computing, Dorchester, UK) collected these data in parallel with a video recording from a Kinect V2 camera (Microsoft Corp., Redmond, WA, USA) at a rate of 30 frames/sec.

Figure 1. Instrumented shoes with inertial sensing and foot pressure measurements at the big toe, little toe, metatarsus (medial, middle, lateral), arches (medial, lateral), fore heels (medial, lateral), and heels (medial, lateral).

Fall and PA Detection Network
A network to detect falls and various types of PAs was constructed based on a CNN and an RNN, implemented as a deep residual network (DRN) and a bidirectional long short-term memory (LSTM) network, respectively. The inputs can be the foot inertial data and/or the three parameterized data from a single foot or both feet in a 3-s window. The inputs can also be the foot pressures and/or the center of pressure (CoP) from a single foot or both feet. Combinations of these data and parameters are also used.
The foot inertial data and three parameterized data are described as follows:
(1) Tri-axial accelerations: $a_x(i)$, $a_y(i)$, $a_z(i)$
(2) Tri-axial angular velocities: $\omega_x(i)$, $\omega_y(i)$, $\omega_z(i)$
(3) Acceleration amplitude (AM), defined as the square root of the sum of the squared tri-axial accelerations [32]:
$$\mathrm{AM}(i) = \sqrt{a_x(i)^2 + a_y(i)^2 + a_z(i)^2}$$
(4) Acceleration cubic-product-root magnitude (ACM), defined as the cube root of the product of the tri-axial absolute accelerations [44]:
$$\mathrm{ACM}(i) = \sqrt[3]{|a_x(i)|\,|a_y(i)|\,|a_z(i)|}$$
(5) Angular velocity cubic-product-root magnitude (AVCM), defined as the cube root of the product of the tri-axial absolute angular velocities [44]:
$$\mathrm{AVCM}(i) = \sqrt[3]{|\omega_x(i)|\,|\omega_y(i)|\,|\omega_z(i)|}$$
Figure 2 shows the tri-axial accelerations, tri-axial angular velocities, and the corresponding inertial parameters (AM, ACM, and AVCM) while a participant was undertaking a forward fall, a jump, a forward walk, and a forward jog. Both falling and jumping produced abrupt changes in accelerations and angular velocities. In particular, two distinct acceleration peaks were generated when jumping off the ground and landing back on it. Both walking and jogging produced repeated changes in accelerations and angular velocities.
Figure 2. Tri-axial accelerations, tri-axial angular velocities, and the corresponding inertial parameters including acceleration amplitude (AM), acceleration cubic-product-root magnitude (ACM), and angular velocity cubic-product-root magnitude (AVCM) while a subject was undertaking a forward fall, a jump, a forward walk, and a forward jog.
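As an illustration of the three inertial parameters, the following Python sketch computes AM, ACM, and AVCM for a single sample. It assumes that AM is the Euclidean norm of the tri-axial accelerations; the function name `inertial_parameters` and the sample values are hypothetical and are not taken from the study.

```python
import math

def inertial_parameters(acc, gyr):
    """Return (AM, ACM, AVCM) for one sample.

    acc: (ax, ay, az) tri-axial accelerations
    gyr: (wx, wy, wz) tri-axial angular velocities
    """
    ax, ay, az = acc
    wx, wy, wz = gyr
    # AM: assumed here to be the Euclidean norm of the acceleration vector
    am = math.sqrt(ax * ax + ay * ay + az * az)
    # ACM: cube root of the product of the absolute accelerations
    acm = (abs(ax) * abs(ay) * abs(az)) ** (1.0 / 3.0)
    # AVCM: cube root of the product of the absolute angular velocities
    avcm = (abs(wx) * abs(wy) * abs(wz)) ** (1.0 / 3.0)
    return am, acm, avcm

# Hypothetical sample: accelerations in g, angular velocities in dps
am, acm, avcm = inertial_parameters((0.3, 0.4, 0.0), (1.0, -8.0, 1.0))
```

Because ACM and AVCM are products, they drop to zero whenever any single axis reads zero, whereas AM stays positive as long as one axis is non-zero.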
The foot pressures and their CoP are described as follows:
(1) Eleven individual foot pressures, denoted here as $p_n(i)$, $n = 1, \ldots, 11$
(2) CoPx and CoPy of the left or right foot, respectively defined as the sum of the products of the eleven individual foot pressures and their x and y positions divided by the sum of the eleven foot pressures:
$$\mathrm{CoP}_x(i) = \frac{\sum_{n=1}^{11} p_n(i)\,x_n}{\sum_{n=1}^{11} p_n(i)}, \qquad \mathrm{CoP}_y(i) = \frac{\sum_{n=1}^{11} p_n(i)\,y_n}{\sum_{n=1}^{11} p_n(i)}$$
where $x_n$ and $y_n$ indicate the location of the centroid of the n-th FSR relative to the local reference frame. The CoP of both feet is defined as the weighted sum of the left-side CoP and right-side CoP. When no ground contact is detected, the CoP is set to the center of a single foot or the center of both feet.
Figure 3 shows the foot pressures at the metatarsus and heel areas, and the CoP trajectories during the same activity events. The foot pressures were close to zero after the fall incidence. There were also two distinct peak pressures at the metatarsus area when jumping off the ground and landing back on it. The foot pressures at the metatarsus and heel areas were interlaced while walking or jogging, which was linked to toe-off and heel-strike. Moreover, various CoP trajectory patterns were presented in falling, jumping, walking, and jogging. In particular, both walking and jogging produced butterfly patterns.
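The single-foot CoP definition can be sketched in a few lines of Python. The sketch below is illustrative only: the function name `center_of_pressure`, the three-sensor layout, and the rule that a zero pressure sum means no ground contact are assumptions made for the example, not details from the study.

```python
def center_of_pressure(pressures, positions, fallback=(0.0, 0.0)):
    """CoP of one foot: pressure-weighted mean of the sensor centroids.

    pressures: foot pressures of the FSRs at one time point
    positions: (x_n, y_n) centroid of each FSR in the local frame
    fallback:  value returned when no ground contact is detected
    """
    total = sum(pressures)
    if total <= 0.0:
        # Assumed contact rule: no load on any sensor means no ground contact,
        # so return a fixed point such as the center of the foot
        return fallback
    cop_x = sum(p * x for p, (x, _) in zip(pressures, positions)) / total
    cop_y = sum(p * y for p, (_, y) in zip(pressures, positions)) / total
    return cop_x, cop_y

# Hypothetical three-sensor layout (mm) to keep the example short
positions = [(0.0, 0.0), (40.0, 0.0), (20.0, 200.0)]
print(center_of_pressure([1.0, 1.0, 2.0], positions))  # (20.0, 100.0)
```

With the heavier load on the third sensor, the CoP moves toward that sensor along the y axis, which is the behavior the weighted mean is meant to capture.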
The inputs, therefore, have a size of M (channels) × 300 (points). As shown in Figure 4, the DRN is constructed from a stack of three residual units (RUs). Each RU is composed of six convolutional layers and a skip layer. In each convolutional layer, a number of 1-D filters equal to the number of input channels captures the temporal pattern of each input channel individually while preserving the temporal dimension (stride 1, same padding); the temporal patterns are then summed along the channel dimension and passed through a ReLU activation function. Each convolutional layer has multiple sets of 1-D filters and outputs one feature channel per set. A max-pooling layer subsamples the outputs of the last convolutional layer to halve the temporal length of the features. The skip layer adds the inputs of the RU to the halved output features through a convolutional layer with a stride of 2 and a matching number of output channels. The details of the DRN are listed in Table 3. The kernel size is the same within each RU but differs across RUs.
The benefits of the residual network have been shown through its easier optimization and the accuracy gained from increased depth [66]. The purpose of using three RUs is to extract the low-, middle-, and high-level features sequentially from the signals. The kernel sizes of the three RUs were 11, 9, and 5, respectively. With each level, the number of feature maps increases and the temporal length of the features is halved.
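The feature-map sizes can be checked with a short calculation. The sketch below assumes that every convolutional layer outputs six more channels than the one before it (6, 12, ..., 36 in the first RU, continuing to 108 in the third), as the channel counts of Table 3 suggest, and that max pooling rounds an odd length up. Both are readings of the text rather than stated facts, and `drn_output_shape` is a hypothetical helper.

```python
import math

def drn_output_shape(length=300, n_units=3, convs_per_unit=6, growth=6):
    """Trace (sequence length, channels) through the stacked residual units.

    Assumes each convolution outputs `growth` more channels than the
    previous one (the very first outputs `growth` channels) and that
    pooling rounds odd lengths up.
    """
    channels = 0
    for _ in range(n_units):
        # Six convolutions per unit keep the length (stride 1, same padding)
        channels += convs_per_unit * growth
        # Max pooling halves the temporal length: 300 -> 150 -> 75 -> 38
        length = math.ceil(length / 2)
    return length, channels

print(drn_output_shape())  # (38, 108)
```

Under these assumptions the result agrees with the sequence of 38 feature vectors of size 108 that is passed to the bidirectional LSTM network.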
Table 3. Details of the deep residual network: input channels, output channels, and temporal length of the six convolutional layers in each residual unit (RU). M is the number of input channels.
| Layer | RU 1 in | RU 1 out | RU 1 length | RU 2 in | RU 2 out | RU 2 length | RU 3 in | RU 3 out | RU 3 length |
|---|---|---|---|---|---|---|---|---|---|
| Conv1d | M | 6 | 300 | 36 | 42 | 150 | 72 | 78 | 75 |
| Conv1d | 6 | 12 | 300 | 42 | 48 | 150 | 78 | 84 | 75 |
| Conv1d | 12 | 18 | 300 | 48 | 54 | 150 | 84 | 90 | 75 |
| Conv1d | 18 | 24 | 300 | 54 | 60 | 150 | 90 | 96 | 75 |
| Conv1d | 24 | 30 | 300 | 60 | 66 | 150 | 96 | 102 | 75 |
| Conv1d | 30 | 36 | 300 | 66 | 72 | 150 | 102 | 108 | 75 |
The output of the deep residual network can be viewed as a sequence of feature vectors (38 vectors, each of size 108), which is fed to a bidirectional LSTM network. At each step, the current input vector and the previous short-term state vector are fed into a fully connected layer, from which the LSTM cell generates a short-term vector and a long-term vector (each of size 64) for the next state. The bidirectional LSTM network traverses the sequence in the forward direction and then in the backward direction. The short-term vector of the last state is used as the input to a terminal fully connected network, which contains one hidden layer of 32 neurons and one output layer of 8 neurons corresponding to a fall and the 7 types of PAs. The hidden layer uses the ReLU activation function, whereas the output layer uses a softmax function to represent a categorical probability distribution.
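To make the role of the softmax output layer concrete, the following Python sketch converts eight output activations into class probabilities and picks the most probable class. The activation values are invented for the example, and the ordering of the seven PAs after the fall class is an assumption; the text only fixes falls as class 0.

```python
import math

# Class 0 is a fall; the order of the seven PAs is an assumption
CLASSES = ["fall", "jump", "jog", "level walk", "walk downstairs",
           "walk upstairs", "sit", "stand"]

def softmax(logits):
    """Convert output activations into a categorical distribution."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict(logits):
    """Return the most probable class label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASSES[best], probs[best]

# Hypothetical output-layer activations for one 3-s window
label, prob = predict([4.0, 0.5, 0.1, 0.0, -1.0, -0.5, 0.2, 0.3])
print(label)  # fall
```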
A batch containing 96 randomly chosen instances was fed into the deep neural network, and the corresponding output vectors were calculated via forward propagation. The cross-entropy loss was then computed from the forward output vectors and the one-hot target vectors. Backpropagation of the errors was subsequently applied to update the network weights using the Adam optimization algorithm, which estimates the updates using running averages of the first and second moments of the gradient [67].
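The loss and update rule described above can be sketched in plain Python. This is a minimal illustration of cross-entropy with a one-hot target and of a single Adam step; the learning rate and decay constants are the commonly used defaults, not values reported in this study, and the parameter and gradient values are hypothetical.

```python
import math

def cross_entropy(probs, target_index):
    """Cross-entropy between predicted probabilities and a one-hot target."""
    return -math.log(probs[target_index])

def adam_step(params, grads, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update in place and return the updated parameters.

    m, v: running averages of the gradient and of the squared gradient
    t:    1-based step counter used for bias correction
    """
    for k, g in enumerate(grads):
        m[k] = beta1 * m[k] + (1.0 - beta1) * g        # first moment
        v[k] = beta2 * v[k] + (1.0 - beta2) * g * g    # second moment
        m_hat = m[k] / (1.0 - beta1 ** t)              # bias-corrected
        v_hat = v[k] / (1.0 - beta2 ** t)
        params[k] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return params

# Hypothetical two-parameter example, first update step
params, m, v = [1.0, -2.0], [0.0, 0.0], [0.0, 0.0]
adam_step(params, grads=[0.5, -0.25], m=m, v=v, t=1)
```

On the first step the bias-corrected ratio has magnitude close to one, so each parameter moves by roughly the learning rate against the sign of its gradient, independent of the gradient's scale.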
The hyper-parameter settings, with a brief description of the forward processing and the considerations behind them, are summarized as follows. These settings were determined through various tests on our dataset and achieved the best prediction performance.
(1) A batch size of 96 to achieve a balance between the robustness of stochastic gradient descent and the efficiency of batch gradient descent.
(2) Three RUs to extract low-, middle-, and high-level features from signals: 1D convolutions with a kernel size of 11 in the first RU to capture the temporal patterns of signals (~110 ms), 9 in the second RU to capture the low-level temporal features, and 5 in the third RU to aggregate the middle-level features.

Network Training and Testing
In order to generate the data for network training and testing, foot pressures and foot inertial data with a duration of 4.5 s that covered a fall or a PA were selected as a basis for data augmentation.
(1) Several 3-s fall segments were obtained by choosing various onset times at 0, 0.5, 1, and 1.5 s and four additional multiples by randomly shifting the onset time within the three intervals (0-0.5 s, 0.5-1 s, 1-1.5 s). Therefore, a total of 3 × 10 × 16 segments (10 falls, each fall had 3 trials, 16-time data augmentation) were obtained from each participant and labeled as falls (class 0).
(2) Several 3-s PA segments were obtained by choosing various onset times at 0, 0.5, 1, and 1.5 s and four additional multiples by randomly shifting the onset time within the three intervals (0-0.5 s, 0.5-1 s, 1-1.5 s). The number of augmented segments depended on the type of PA such that a total of 3 × 160 segments (each movement had 3 trials) were obtained from each PA for each participant and labeled as a specific PA (class 1 to class 7).
Therefore, an evenly balanced dataset between falls and various types of PA was generated.
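The augmentation scheme can be sketched as follows. The sketch reads the description as four fixed onsets plus four random onsets in each of the three intervals, which gives the 16-fold augmentation quoted for falls; this reading, the helper names, and the stand-in recording are assumptions made for illustration.

```python
import random

FS = 100          # sampling rate (Hz)
WINDOW_S = 3.0    # segment duration (s)

def onset_times(rng, n_random=4):
    """Four fixed onsets plus n_random random onsets in each interval."""
    fixed = [0.0, 0.5, 1.0, 1.5]
    intervals = [(0.0, 0.5), (0.5, 1.0), (1.0, 1.5)]
    shifted = [rng.uniform(lo, hi) for lo, hi in intervals for _ in range(n_random)]
    return fixed + shifted

def augment(recording, rng):
    """Cut 3-s segments out of a 4.5-s recording (a list of samples)."""
    n = int(WINDOW_S * FS)
    segments = []
    for onset in onset_times(rng):
        start = int(round(onset * FS))
        segments.append(recording[start:start + n])
    return segments

rng = random.Random(0)
recording = list(range(450))            # stand-in for 4.5 s of one channel
segments = augment(recording, rng)
print(len(segments), len(segments[0]))  # 16 300
```

With 10 falls and 3 trials each, 16 segments per trial give the 3 × 10 × 16 = 480 fall instances per participant stated above.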
The dataset, therefore, contained 32 × 3840 instances (32 participants, 480 instances per class in 8 classes related to fall and seven types of PAs). Each instance was constituted by 34 channels × 300 samples (3-s segment with 3-axial accelerations, 3-axial angular velocities, and 11 foot pressures on each shoe).
The foot pressures were normalized by the sum of the eleven standing foot pressures collected from each participant. The inertial data, the normalized foot pressures, and their parameterized data were separately standardized between 0 and 1 across participants.
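The two scaling steps can be sketched as below. The sketch interprets "standardized between 0 and 1" as min-max scaling over the pooled values of a channel, which is an assumption; the function names and numbers are hypothetical.

```python
def normalize_pressures(pressures, standing_pressures):
    """Divide each foot pressure by the summed standing foot pressures."""
    reference = sum(standing_pressures)
    return [p / reference for p in pressures]

def min_max_scale(values):
    """Rescale one channel linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant channel: avoid division by zero
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical participant: eleven sensors, each reading 2.0 while standing
standing = [2.0] * 11
print(normalize_pressures([11.0, 0.0, 5.5], standing))  # [0.5, 0.0, 0.25]
print(min_max_scale([2.0, 4.0, 6.0]))                   # [0.0, 0.5, 1.0]
```

Dividing by the standing pressure sum removes most of the dependence on body weight before the channels are rescaled to a common range.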
Leave-one-out cross-validation (LOOCV) was used to validate the neural network. The data of one participant were chosen to validate the network determined by the data from the remaining participants (training set). The LOOCV was repeated to allow each participant's combination of falls and PAs to be the validation data once.
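The participant-wise splitting can be sketched with a small generator. The participant identifiers 1 to 32 and the function name are hypothetical; the point is only that each fold holds out all data of one participant and trains on the remaining 31.

```python
def leave_one_participant_out(participant_ids):
    """Yield (held_out, training_ids) once for every participant."""
    unique_ids = sorted(set(participant_ids))
    for held_out in unique_ids:
        training = [pid for pid in unique_ids if pid != held_out]
        yield held_out, training

folds = list(leave_one_participant_out(range(1, 33)))
print(len(folds), len(folds[0][1]))  # 32 31
```

Splitting by participant rather than by instance keeps the augmented segments of one person from appearing in both the training and validation sets.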
A total of 50 iterations were applied to train the networks using an Adam optimization algorithm based on the cross-entropy loss function. A fraction of the hidden units in the terminal fully connected network was randomly dropped at every iteration with a probability of 0.5 to force the network to learn general and robust patterns from the data to prevent overfitting.
All of the validation data were fed forward to the trained model. The output values were then compared with the labeled values. The recall, precision, and F1-score for the fall or each PA were calculated based on all LOOCVs. The recall was defined as the percentage of true positives (fall or PA correctly classified) among all validation data of that class, while the precision was defined as the number of true positives divided by the sum of true positives and false positives (instances misclassified as the fall or as the PA). The F1-score was computed as 2 × recall × precision/(recall + precision).
Table 4 lists the results of the detection of falls and various types of PA based on foot pressures, CoP, and their combination obtained from both feet. We used the F1-score as an index to evaluate the detection performance. Falls were well detected, particularly when using the combination of foot pressures and CoP. Static PAs (sitting, standing) were also best detected using this combination. However, foot pressures were of limited use in detecting walking upstairs and downstairs; this poor detection improved when CoP was used alone. Similarly, the other dynamic PAs (jumping, jogging, level walking) were best detected using CoP only.
Table 5 lists the results of the detection of falls and various types of PA based on the inertial data, inertial parameters, and their combination obtained from both feet.

Results
The use of inertial data provided better detection (higher F1-scores) for falls and most PAs except jogging. The combination of inertial data and inertial parameters improved the detection of sitting, jumping, jogging, and walking downstairs. Compared to the foot pressure-based measurements (Table 4), the inertial-based measurements detected falls and dynamic PAs better but detected static PAs (sitting and standing) worse. As listed in Table 6, falls and various types of PA were well detected when the foot pressure-based and inertial-based measurements were used in combination.
Table 7 summarizes the detection performance based on foot pressures, CoP, and the inertial-based measures obtained from the left foot, right foot, or both feet. Using the right-foot data performed best for detecting falls and various types of PA.
Table 7. Results on the detections of fall and physical activities based on foot pressures, center of pressure, inertial data and inertial parameters obtained from a single foot or both feet.
Table 8 lists the results of the classifications of falls and various types of PAs using data from both feet. The number of each true class is 480 for each participant. The listed values are the average misclassifications over 32 participants when the inertial data and inertial parameters (1st row), CoP (2nd row), and inertial data, inertial parameters, foot pressures, and CoP (3rd row) are used, respectively. Overall, 14.7% (70.63/480) and 19.8% (94.84/480) of the results were misclassifications between sitting and standing when the inertial data and inertial parameters were used. These misclassifications were reduced when the inertial data, inertial parameters, CoP, and foot pressures were used in combination. The use of inertial data and inertial parameters resulted in more than 2% misclassifications from jogging and upstairs walking to jumping.
The use of CoP alone led to more than 2% misclassifications from falling to jumping, from jumping to falling or standing, from jogging to downstairs walking, from upstairs walking to falling, and among level walking, upstairs walking, and downstairs walking. These misclassifications were reduced when the inertial data, inertial parameters, CoP, and foot pressures were used in combination.
The number of each true class is 480 for each participant. The listed values are the average over 32 participants when inertial data and inertial parameters (1st row), center of pressure (2nd row), and foot pressures, center of pressure, inertial data, and parameters (3rd row) were, respectively, used. LW, level walking; US, upstairs walking; DS, downstairs walking.
Table 9 lists the computational complexity of the proposed deep neural model with various inputs. The number of weights and the number of computations increased slightly with the number of input features. The time for model training in each leave-one-out cross-validation was 30 min. A total of 16 h was needed to complete the 32 cross-validations, which allowed every participant's data to test the model trained on the other 31 participants' data.

Discussion
Falls and PAs are detected or classified based on different rationales. Falls are usually characterized by impact-on-ground and post-impact lying, such that several parameter-based methods have been developed to capture these characteristics, whereas PAs have a variety of kinematic patterns depending on the included PA types. Deep-learning neural networks provide a way to extract multiple varieties of PA-related features through network training on labeled data. In this study, we detected falls and various types of PAs based on a deep-learning architecture that is constituted by three components. First, several multi-layer 1-D convolutional networks extract the low- to high-level features. Second, a skip connection feeds the raw data or features to the output of multiple convolutional layers to reduce feature degradation. Third, a bidirectional LSTM network clusters the output features at various time points to capture the sequential pattern of a fall or PA.
Inertial sensing on the chest, waist, or wrist is the most commonly used measurement for fall detection in the literature. Measurement based on foot pressures and foot inertial data provides an alternative approach to distinguish falls from PAs. One study built empirical rules based on the foot's acceleration magnitude (AM) and inactivity duration on the ground [64], but the rules were hand-crafted and only validated on six participants. Another study employed a decision tree to determine the threshold of the waist's AM or the CoP of both feet for fall detection, which led to a better performance (area under the receiver operating characteristic curve) using the waist's AM and an improved performance when both features were used [65]. However, the performance using foot inertial data was not reported. In this study, we investigated the feasibility of deep neural networks based on foot-pressure-based and/or foot-inertial-based measures for fall detection. The F1-score was used because it balances the sensitivity of fall detection (recall) against the proportion of detected falls that are correct (precision). Our result showed a recall of 0.971, a precision of 0.943, and an F1-score of 0.980 in fall detection based on foot pressures and CoP, which supported the existence of fall-related information in foot pressures, in accordance with the different temporal foot pressure distributions during falls relative to PAs [65]. In addition, using foot inertial data provided a recall of 0.999, a precision of 0.998, and an F1-score of 0.998 for fall detection. The excellent performance of foot inertial data could be attributed to the significant changes in both accelerations and angular velocities during falls, particularly during ground impact.
In this study, we extended fall detection to various types of PAs and applied a deep-learning model to handle this multi-class classification. Our results showed that the inertial-based measures outperformed the foot-pressure-based measures in the detection of dynamic PAs, i.e., the inertial-based measures presented more distinct patterns among jumping, jogging, level walking, upstairs walking, and downstairs walking than the foot-pressure-based measures. In contrast, inertial-based measures were of limited use in detecting static PAs, whereas the foot-pressure-based measures worked well. Therefore, the combination of foot-pressure-based and inertial-based measures is suggested because it achieved a good performance in detecting both static and dynamic PAs.
It is not easy to compare the efficiency of the detection model among various studies because different datasets contain different PA types. Nevertheless, our model produced a trend in the F1-score between PA classification and fall detection similar to that in the previous works listed in Tables 1 and 2; that is, the accuracy of fall detection was higher than that of PA detection. We speculate that falls create quite different temporal patterns in contrast to PAs, whereas the movements of some PAs are somewhat similar, e.g., among level, upstairs, and downstairs walks. Therefore, misclassifications among these walks were higher than their false negatives as other PAs or false positives from other PAs, as shown in the confusion matrix (Table 8).
A limitation of this study is that all of the data were collected from young, healthy participants. However, for safety reasons, it is neither reasonable nor ethical to ask elderly subjects to perform simulated falls. In practical applications, the detection model can be pre-trained on a large dataset from healthy participants; once a smaller dataset from elderly subjects is obtained, transfer learning can be applied to fine-tune the model.
In this study, the performance of fall and PA detection based on a single foot was similar to, or for the right foot even higher than, that based on both feet. A single-foot measurement is more readily applicable in daily life because a stand-alone detection device can be mounted on one shoe without the need for data transmission between the left and right sides. In cases such as post-stroke hemiplegia or Parkinson's disease, applying the device on the unaffected side would be especially convenient. We speculate that the unaffected side would produce patterns closer to those of younger subjects than the affected side would. Here too, transfer learning can be employed to fine-tune the trained model.
CoP is commonly given by the sum of the products of the individual foot pressures and their x and y positions, divided by the sum of the foot pressures. This calculation yields a high-precision estimate when insoles with high-density foot-pressure sensing are used. It has also been applied to CoP estimation using instrumented insoles with eight foot-pressure sensors on each side [68], 10 sensors on both feet [65], an optimal placement of eight sensors on the left foot [69], or an optimal 13 sensors on both feet [70]. To improve the precision of CoP estimation with a reduced number of foot-pressure sensors, a linear regression calibration [70] or a feed-forward neural network [68] has been proposed. Our work focused on fall detection and PA classification based on a customized insole with 11 FSR sensors and demonstrated that the proposed deep neural network could learn discernible features from the crudely estimated CoP for the detection of falls and various types of PAs. Further study is needed to clarify whether a calibrated CoP can improve detection.
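The pressure-weighted average described above can be sketched in a few lines of Python. This is an illustration only: the function name, the three-sensor layout, and the coordinate values are assumptions, and they do not reproduce the 11-FSR insole geometry used in this study.

```python
def center_of_pressure(pressures, positions):
    """Estimate the CoP as the pressure-weighted mean of the sensor positions.

    pressures: one reading per sensor.
    positions: one (x, y) coordinate per sensor, in the same order.
    Returns (x, y), or None when the total pressure is zero (foot off the ground).
    """
    if len(pressures) != len(positions):
        raise ValueError("pressures and positions must have the same length")
    total = sum(pressures)
    if total <= 0:
        return None
    cop_x = sum(p * x for p, (x, _) in zip(pressures, positions)) / total
    cop_y = sum(p * y for p, (_, y) in zip(pressures, positions)) / total
    return cop_x, cop_y


# Hypothetical layout: heel, lateral forefoot, medial forefoot (coordinates in cm).
positions = [(0.0, 0.0), (3.0, 18.0), (-3.0, 18.0)]
pressures = [2.0, 1.0, 1.0]
print(center_of_pressure(pressures, positions))
```

With few sensors, the estimate can only move within the region spanned by the sensor positions, which is one reason the resulting CoP is described as crude.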
Healthcare monitoring based on wearable devices has received considerable interest for the management of physiological and psychological health. It is worth noting that the measured physiological and behavioral information can be gathered via Internet of Things (IoT) technology, and that the growing volume of gathered information requires further processing, for example through deep-learning techniques, to broaden its application [71,72]. Wearable IoT sensors provide a solution for the objective remote monitoring of real-life ADL and real fall events for activity-level evaluation, fall prevention, and risk assessment in the elderly and in subjects with dementia, Parkinson's disease, cardiovascular disease, and frailty [73].
Detection methods that rely on inertial sensing of the body and limbs are light, easy to wear, low-cost, and practical to implement in the real world, but their application may be limited when subjects are unwilling or forget to wear the devices. The proposed footwear approach is beneficial in that the sensors can be embedded in shoes, which are commonly worn during off-bed activities, particularly outdoor activities. The cost and complexity of the proposed footwear device can be further reduced by using one-foot measurements with a smaller number of foot-pressure sensors. Further study is needed to investigate the effect of such dimensionality reduction on detection performance. In addition to the proposed fall and PA detection, foot-pressure monitoring also provides plantar-pressure information, which is used in the biomechanical assessment of body balance and ergonomic posture during standing or gait [74].
Two potential problems may affect the detection performance when the trained deep-learning model is applied in a real scenario. The first problem is overfitting at the training stage: the trained model cannot predict well on new data that differ from the training data. Several works have addressed this problem and adopted approaches to avoid overfitting. For instance, a penalty term is added to the loss function to optimize the boundary of features [75,76]; data augmentation is used to allow the model to capture different data structures more accurately [77]; dropout is applied to prevent neurons from co-adapting too much [78]; early stopping is adopted to reduce unnecessary computing [79]. These strategies allow the model to reduce its focus on rare, specific features during the training phase, thereby maintaining the balance and flexibility of the model. In our work, we expanded the collected data sixteen-fold to enable the model to capture more detail across various situations. Considering the computational complexity of the proposed CNN + LSTM model (Table 9), we applied 50% dropout, randomly dropping neurons at each training epoch, to avoid overfitting as much as possible and to reduce the computational complexity of model training as well. Additionally, early stopping was applied when the validation loss increased.
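The two regularization mechanisms mentioned last, dropout and early stopping, can be illustrated with a framework-free Python sketch. It is not the training code used in this study: the function names, the rescaling of surviving activations (so-called inverted dropout), and the `patience` parameter are assumptions introduced for illustration.

```python
import random


def dropout(activations, rate=0.5, rng=None, training=True):
    """Zero each activation with probability `rate` during training.

    Surviving activations are rescaled by 1 / (1 - rate) (an assumption here)
    so that the expected output is unchanged. At inference time the input is
    passed through untouched.
    """
    if not training or rate == 0.0:
        return list(activations)
    rng = rng or random.Random()
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


def early_stopping_epoch(val_losses, patience=1):
    """Return the index of the epoch at which training would stop.

    Training stops once the validation loss has failed to improve on its best
    value for `patience` consecutive epochs; otherwise the last epoch is returned.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses) - 1


# Hypothetical validation-loss curve: the loss rises at epoch index 3.
print(early_stopping_epoch([0.90, 0.70, 0.60, 0.65, 0.50]))
print(dropout([1.0, 1.0, 1.0, 1.0], rate=0.5, rng=random.Random(0)))
```

A larger `patience` tolerates brief rises in the validation loss at the cost of a few extra training epochs.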
The second problem is the effect of noise on the detection performance. At present, the data collected by the inertial sensors and FSRs cannot be assumed to be noise-free. Every type of measured signal (accelerations, angular velocities, and foot pressures) in our device may be subject to interference from movement disturbances. One study applied the Dempster-Shafer theory [80] to decision-level image fusion, improving crack-detection accuracy with high robustness to noise [81]. It may therefore be possible to incorporate the Dempster-Shafer algorithm to fuse data from different sensors [82] for the detection of falls and PAs in our deep neural model in future work.
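As a rough indication of what such decision-level fusion could look like, the sketch below applies Dempster's rule of combination to two sources of evidence over the hypotheses "fall" and "pa". It is not part of the proposed model: the mass values, the variable names, and the idea of deriving one mass function per sensor type are hypothetical.

```python
from itertools import product


def combine_dempster(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps a frozenset of hypotheses to a mass; masses sum to 1.
    Products of masses whose focal sets do not intersect count as conflict and
    are removed by renormalization.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict and cannot be combined")
    return {focal: mass / (1.0 - conflict) for focal, mass in combined.items()}


FALL = frozenset({"fall"})
PA = frozenset({"pa"})
EITHER = FALL | PA  # mass assigned here expresses ignorance

# Hypothetical evidence from an inertial-based and a foot-pressure-based detector.
m_inertial = {FALL: 0.6, PA: 0.1, EITHER: 0.3}
m_pressure = {FALL: 0.7, PA: 0.1, EITHER: 0.2}

fused = combine_dempster(m_inertial, m_pressure)
for focal, mass in fused.items():
    print(sorted(focal), round(mass, 3))
```

When both sources lean toward the same hypothesis, the fused belief in it exceeds that of either source alone, which is the property that makes the rule attractive for noisy sensors.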

Conclusions
We focused on the use of foot-pressure-based and foot-inertial-based measures to detect falls and various types of PAs using a deep neural network that used a CNN to extract discernible features and an RNN to aggregate the features across time points. Foot-pressure-based measures, as well as foot-inertial-based measures, performed well in fall detection. Foot-inertial-based measures led to better performance in detecting dynamic PAs (jumping, jogging, walking), while foot-pressure-based measures yielded better performance in detecting static PAs (sitting, standing). The combination of foot-pressure-based and foot-inertial-based measures allowed for the detection of both static and dynamic PAs. Although the capability of a deep neural network based on foot pressures and foot inertial sensing to detect falls, static PAs, and dynamic PAs was demonstrated using data collected from young participants, the model can be fine-tuned by transfer learning for practical application to the elderly. Moreover, one-foot measurement with fewer foot-pressure sensors can be investigated in the future to reduce the complexity of the proposed footwear approach.
Institutional Review Board Statement: The protocol of this study was approved by the Research Ethics Committee of the Chang Gung Medical Foundation (IRB#201802118B0) in accordance with the Helsinki Declaration.
Informed Consent Statement: All of the participants provided written informed consent.

Data Availability Statement: The data presented in this study are available upon request from the corresponding author.

Conflicts of Interest: The authors declare no potential conflict of interest with respect to the research, authorship, and/or publication of this article.