1. Introduction
Physical activity (PA), defined as any bodily movement produced by skeletal muscle that results in energy expenditure [1], generally covers walking, running, cycling, sports, and similar activities. The World Health Organization (WHO) recommends that adults under 65 years of age spend at least 75 min per week in vigorous activity, or at least twice that time in moderate activity. Regular, well-planned PA improves physical fitness and lowers the risk of chronic diseases such as diabetes, dyslipidemia, and hypertension [2]. Human Activity Recognition (HAR) aims to classify the type of bodily movement being performed and to capture physiological data in a timely manner through pervasive computing, which supports both medical diagnosis and human activity research [3,4].
Recently, researchers have analyzed human behavior using portable mobile terminals, such as fitness trackers, smartphones, and smartwatches, which integrate a variety of inertial sensors [5]. Thanks to advances in Micro-Electro-Mechanical System (MEMS) sensors and low-power wireless technologies, PA can be measured objectively by wearable devices, which is both advantageous and feasible. In addition to wearable sensors, activity recognition using visual sensors has also been studied by many scholars. The authors of [6] proposed a hybrid model that combines a Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) to recognize human activity using Microsoft's Kinect motion sensor; the CNN extracts spatial features, and the LSTM learns temporal features. However, vision-based approaches raise privacy concerns. In [7,8], SVM, RF, and bagged DT classifiers were used to recognize activity from data collected by wrist-worn sensors. Freedson et al. [9,10] used regression methods to relate motion intensity to activity type and studied the measurement of human PA with a neural network applied to raw time-series signals from a single accelerometer. In [11], data from the inertial sensors embedded in a smartphone were fed to a deep belief network for activity recognition, and a robust framework was built by further processing the feature set with kernel principal component analysis and linear discriminant analysis. It is difficult to identify the PA patterns of different people, whose PA behaviors vary widely, and a single sensor cannot fully reflect physiological information. For instance, walking freely at a certain speed may produce an acceleration similar to that of walking at the same pace while carrying a load, although the energy expenditure differs considerably. A multi-sensor system, which measures the movements of different body nodes, shows the potential to achieve promising performance in PA pattern identification. To address this drawback, some works combine different kinds of sensors and fuse their data for PA recognition [12,13]. Meanwhile, ensemble learning has been increasingly investigated in the pattern recognition field. By combining the decisions of multiple classifiers or multiple sensors, accuracy can be improved effectively [14]. For example, to capture the learning process of bipedal robot locomotion, a deep learning-based ensemble classifier was introduced for human lower-limb activity recognition [15]. Ref. [16] indicates that an ensemble of classifiers can reach agreement for activity recognition. Liu et al. [17] measured PA precisely using multiple accelerometers and an abdominal breath sensor. Moreover, selecting the most effective component classifiers by pruning criteria was proposed to optimize the multi-sensor ensemble algorithm [18]. These studies have opened up many new areas of intelligent application, such as healthcare monitoring, lifelogging, and fitness tracking, which use the collected data to evaluate people's lifestyle and physical status.
With the rapid development of deep learning and its powerful capabilities, more and more deep learning models have been applied to human motion recognition and have achieved good results. An ensemble learning algorithm (ELA) based on a Convolutional Neural Network (CNN) and a Gated Recurrent Unit (GRU) model was proposed to recognize activity from smartphone sensor data [19]. Ref. [20] reports that, when traditional machine learning techniques are applied, personalization is more effective than deep learning. The main objective of [21] is to use a one-dimensional convolutional neural network (1D CNN) to build a system that recognizes simple everyday actions. A deep learning model for human activity recognition based on residual blocks and Bi-directional Long Short-Term Memory (BiLSTM) was proposed in [22].
In reality, instances that carry more than one label with high probability are ubiquitous (e.g., a movie can be regarded as both an action movie and a romance). Hence, multi-label strategies have attracted a large amount of attention. Multi-label methods fall into several branches, such as binary relevance, label power set, classifier chain, and pruned problem transformation. Binary relevance transforms the multi-label problem into a series of separate binary classification problems while neglecting the correlation among labels, which is intuitive and efficient on datasets with low label density [23]. In addition, the classifier chain, which links base classifiers so that each predicted label becomes an input to the next classifier, was proposed to address the independence assumption among labels [24]. The label power set treats each distinct set of labels in the multi-label training set as a new class of a single-label, multi-class problem [25]. However, as the number of new labels increases, a label-set explosion occurs that greatly enlarges the amount of computation. Ref. [26] established a framework that combines hand gesture labels and postural activities into a multi-label activity representation to predict postural activities. Ref. [27] designed four experiments with different multi-label algorithms on activity recognition databases and points out that random forest with binary relevance achieves significantly better performance. Physical activity can be described not only by its exact type but also by its intensity level; for example, playing basketball and playing tennis are two different activities, yet both belong to the vigorous category.
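The two problem transformations described above can be illustrated with a small sketch. It is only an illustration under stated assumptions: the function names `binary_relevance` and `label_powerset` and the example labels are hypothetical and are not taken from the cited works.

```python
from typing import Dict, FrozenSet, Iterable, List, Sequence, Set, Tuple


def binary_relevance(label_sets: Sequence[Set[str]],
                     all_labels: Iterable[str]) -> Dict[str, List[int]]:
    """Build one independent 0/1 target vector per label.

    Correlations between labels are ignored, as in binary relevance.
    """
    return {label: [int(label in labels) for labels in label_sets]
            for label in all_labels}


def label_powerset(label_sets: Sequence[Set[str]]
                   ) -> Tuple[List[int], Dict[FrozenSet[str], int]]:
    """Map every distinct label combination to one single-label class id.

    The number of classes grows with the number of distinct combinations,
    which is the "label-set explosion" mentioned in the text.
    """
    class_ids: Dict[FrozenSet[str], int] = {}
    encoded: List[int] = []
    for labels in label_sets:
        key = frozenset(labels)
        if key not in class_ids:
            class_ids[key] = len(class_ids)
        encoded.append(class_ids[key])
    return encoded, class_ids


if __name__ == "__main__":
    # Hypothetical instances: (intensity label, activity type label).
    y = [{"vigorous", "basketball"}, {"vigorous", "tennis"}]
    print(binary_relevance(y, ["vigorous", "basketball", "tennis"]))
    print(label_powerset(y)[0])
```

In this toy case the two instances share the intensity label but differ in type, so binary relevance yields three binary problems while the label power set yields two classes.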
In this paper, we propose and evaluate a multi-label-oriented cascade classifier system (CCM) that uses cascaded classifiers to build the recognition framework. The cascaded classifier processes both the activity intensity label and the activity type label of each data instance. First, the first-level base classifier focuses mainly on the respiratory sensor features of the activity instance and predicts the activity intensity. Then, according to the predicted intensity, the second-level classifier for that intensity is selected to identify the activity type. Finally, the final prediction is output, and the performance of the cascade model is evaluated. Based on the assessment results, expert guidance and suggestions can be provided to users to improve their health status and fitness.
To sum up, the novelty and contributions of this work are as follows.
1. We propose a cascade classifier structure for sensor-based physical activity recognition from a multi-dimensional perspective, with two types of labels that work together to represent an exact type of activity.
2. A multi-sensor Inertial Measurement Unit (IMU) was designed and built to collect and store physical data. To ensure the integrity and validity of the collected data, the IMU places three sensor units on the abdomen, upper limb, and lower limb, respectively.
3. The aim is to use the evaluation results of the cascading model to provide expert guidance and suggestions for users to improve their health status and physique.
The remainder of the paper is arranged as follows: Section 2 presents the materials and methods and describes the proposed method in detail. In Section 3, the method is validated, and the results are presented. Section 4 discusses the work. Finally, Section 5 concludes the paper.
2. Materials and Methods
2.1. Framework
Although numerous studies have focused on PA recognition, none has considered the progressive relationship among multiple labels. Generally, an object is first assigned to a broad category by its shape, color, or other attributes, and then identified precisely from more specific information. For instance, given attributes such as having four wheels and running on the ground, one can easily infer that the object is a vehicle; however, more details are needed to determine whether it is a car, truck, bus, or SUV. In this study, such a structure is applied to PA recognition. Rather than using a single classifier, which might make the model overly complex, or assembling multiple classifiers with weighted majority voting, cascade classifiers were designed with the help of multiple labels for activity pattern recognition, which decreases computation time and simplifies the model.
The cascade classifier structure has mainly been employed in image pattern recognition, especially in remote-sensing imagery, pedestrian detection, and face detection. In [28], the temporal correlation among images acquired at successive times is exploited within a cascade classifier architecture to produce land-cover maps. Tian et al. [29] embedded weighted linear regression into a cascade structure with Haar-like and Shapelet features to achieve outstanding pedestrian detection against relatively complex backgrounds. Beyond image processing, cascade classifier structures have also been used for recognition based on feature sets from one-dimensional data streams. A hybrid cascade model [30] was introduced to address fault detection and prediction for Android smartphones. Furthermore, a cascade system [31] also achieved excellent results for the finer classification of radar signals. Motivated by this literature, the cascade classifier structure is applied here to PA recognition. The overall architecture of the PA recognition system is shown in Figure 1.
By attaching multiple sensors to three body nodes, PA signals corresponding to different joints of the human body can be obtained. Feature extraction and selection are then performed. Using reliable machine learning algorithms within the cascade classifier framework, the physical activity patterns are identified, and the physical performance of the individual is assessed.
2.2. Subjects and Materials
2.2.1. Subjects
A dataset from 110 participants (59 females and 51 males) was collected for the PA recognition experiment; the individual characteristics are shown in Table 1. Ten PAs were monitored for each subject, and the sensor data for each PA pattern were collected by the wearable measurement system. Each PA pattern was performed for 5 min and was followed by a 5-min rest period to allow breathing to recover. Before the test began, participants lay down to rest for about 10 min to keep the resting metabolic rate in a relatively low and stable range. To ensure the rigor of the experiment, all tests were conducted during the daytime, and participants were asked to ingest nothing except water before data acquisition. The whole experiment session lasted about 2 h for each individual.
In this exploratory study, a cascade classifier is proposed as a solution to multi-label PA recognition. Every PA instance carries two distinct kinds of labels, one for the activity intensity category and one for the activity type. The ten PA patterns are divided into four categories according to intensity or energy expenditure, as listed in Table 2.
2.2.2. System Design and Realization
To reduce the disturbance that a wearable measurement device causes in daily life, the inertial measurement system was designed to be lightweight and wirelessly connected [32]. Sensor positions were selected based on research on the wearing convenience of wearable sensors: the two accelerometers were placed on the wrist and hip, respectively, while the ventilation sensor was tied around the abdomen [33]. Accordingly, a multi-sensor Inertial Measurement Unit (IMU) was designed and built to collect and store physical data. To ensure the integrity and validity of the collected data, the IMU places three sensor units on the abdomen, upper limb, and lower limb, respectively. More specifically, the IMU comprises the following three sensor-node units:
Hip Unit: a tri-axial accelerometer (ADXL345) placed at the hip joint, which captures the motion of the lower part of the body.
Wrist Unit: a tri-axial accelerometer (ADXL345) placed at the wrist joint using a wristwatch-style strap, which measures the physical activity signal of the upper part of the body.
Abdomen Unit: a ventilation sensor made of piezoelectric crystals, tied around the abdomen with an elastic belt, which measures the expansion and contraction resulting from respiration (breath rate and strength).
Figure 2 shows the architecture of the IMU system, which measures the body motion parameters and respiration intensity of a human subject. The data obtained from the different locations are subsequently fused and processed to predict the PA pattern and quantify energy consumption. All data streams from the three sensor units are stored on a micro Secure Digital (SD) card embedded in the hip unit.
The data streams acquired from the IMU are plotted in Figure 3. The waveforms of different PAs diverge significantly across the three measurement nodes. For example, during sedentary activities the waveform from the hip node stays at a more stable level than that from the wrist: because the torso remains sitting, standing, or otherwise stable, the upper limbs account for most of the movement. As a result, the data stream from the wrist unit reflects more detailed vibration information, which illustrates, to some degree, the feasibility of multi-sensor fusion for PA recognition. Meanwhile, owing to its higher energy expenditure, playing basketball shows a higher-frequency waveform than TM 6.0, as seen in Figure 3b,c. These differences between the PA waveforms provide the basic evidence for distinguishing PA patterns.
2.3. Signal Preprocessing
The task of data processing is divided into two main steps. The first step is time-series segmentation. Segmentation algorithms divide continuous data streams into discrete time intervals of the type expected by the information processing step [34,35,36]. The main purpose of segmentation here is to split the preprocessed data stream into segments that each contain the information of a complete behavior; the segments are then used for feature extraction in the next step. The basic approach is to use a sliding window of fixed length that splits each time series into equal segments. Each segment is delimited by a start marker and an end marker, and the end marker also serves as the start marker of the following segment. However, because the boundaries between physical activities are extremely vague, it is very difficult to split the sensor data stream effectively, which raises the question of how recognition accuracy depends on the window length. Most works pick a window size ranging from 2 s to 6.7 s, while some select longer windows, such as 10 s and 12.8 s [4,34,35,36,37,38]. A multi-dimensional feature vector is extracted from each segment and used for classification [4,39]. In this paper, a simple sliding window with no overlap was chosen for signal segmentation, as shown in Figure 4.
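As a minimal sketch of the non-overlapping sliding window described above, the following function splits one sensor channel into equal-length segments. The sampling rate and window length in the example are placeholder values chosen for illustration, not the settings used in this study, and the function name `segment_fixed_windows` is hypothetical.

```python
from typing import List, Sequence


def segment_fixed_windows(samples: Sequence[float],
                          sample_rate_hz: float,
                          window_s: float) -> List[List[float]]:
    """Split a 1-D signal into consecutive windows with no overlap.

    Trailing samples that do not fill a complete window are dropped.
    """
    window_len = int(round(sample_rate_hz * window_s))
    if window_len <= 0:
        raise ValueError("window must contain at least one sample")
    n_windows = len(samples) // window_len
    return [list(samples[i * window_len:(i + 1) * window_len])
            for i in range(n_windows)]


if __name__ == "__main__":
    # Assumed values: 5 Hz sampling and a 2 s window give 10 samples per window.
    signal = [float(i) for i in range(25)]
    windows = segment_fixed_windows(signal, sample_rate_hz=5.0, window_s=2.0)
    print(len(windows), [len(w) for w in windows])
```

Because consecutive windows share a boundary, the end of one segment is the start of the next, which matches the start/end marker description in the text.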
The second step is feature extraction and selection. Overall, 64 multi-domain features (50 time-domain and 14 frequency-domain features) were extracted for training the classifiers, as shown in Figure 5. The 10th, 25th, 50th (median), 75th, and 90th percentiles provide an estimate of the distribution of each signal. The mean and standard deviation were extracted to give a general description of PA intensity. In addition, the correlation coefficient between the hip unit and the wrist unit was selected, which reflects the coordination or variation between the upper limb and the body during an activity. Frequency-domain features (energy and entropy) were extracted separately for the two accelerometers. For the ventilation sensor on the abdomen, the breathing frequency was taken as the dominant frequency of the respiratory signal obtained from a spectral analysis. Finally, to prevent features with smaller numeric ranges from being overwhelmed by those with larger values, normalization was applied to map the extracted features into the range from 0 to 1 [40].
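The percentile features and the scaling to the range from 0 to 1 can be sketched as follows. This is an illustration only: the linear-interpolation percentile rule, the use of min-max scaling as the normalization, and all function names are assumptions, since the exact definitions are not given in the text.

```python
import statistics
from typing import Dict, List, Sequence


def percentile(values: Sequence[float], q: float) -> float:
    """Percentile with linear interpolation between ranks, q in [0, 100]."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("empty window")
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def window_features(window: Sequence[float]) -> Dict[str, float]:
    """Mean, standard deviation, and the five percentiles named in the text."""
    feats = {"mean": statistics.fmean(window),
             "std": statistics.pstdev(window)}
    for q in (10, 25, 50, 75, 90):
        feats[f"p{q}"] = percentile(window, q)
    return feats


def min_max_normalize(column: Sequence[float]) -> List[float]:
    """Scale one feature column into [0, 1]; a constant column maps to 0."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


if __name__ == "__main__":
    print(window_features([1.0, 2.0, 3.0, 4.0, 5.0]))
    print(min_max_normalize([2.0, 4.0, 6.0]))
```

In practice the minimum and maximum would be estimated on the training set only and reused for the test set, so that no information from the held-out subject leaks into the scaling.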
From the collected dataset, the accelerometer data from the hip and wrist and the data from the abdominal respiration sensor were used to compute the time-domain and frequency-domain features.
Mean is the average level of the signal values in a frame, which can be calculated by the formula:
$$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$$
where $x_i$ indicates the sensor sequence and $N$ indicates the sequence length.
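Variance (VR) describes the degree of dispersion of a signal around its arithmetic mean. The formula is:
$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}\left(x_i - \mu\right)^2$$
where $\sigma$ is the standard deviation, $\mu$ is the mean of the sensor data, $x_i$ is the sensor sequence, and $N$ is its length.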
The Correlation Coefficient (CC) measures the degree of correlation between data at different locations. For two signal sequences $x$ and $y$, the correlation coefficient between them can be expressed as:
$$CC(x, y) = \frac{\operatorname{cov}(x, y)}{\sigma_x \sigma_y}$$
where $\operatorname{cov}(x, y)$ represents the covariance of the two sequences, and $\sigma_x$ and $\sigma_y$ represent their standard deviations, respectively.
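The three time-domain statistics defined above can be computed as in the sketch below. It assumes the population form (division by the sequence length rather than by the length minus one), which may differ from the convention used in the original implementation; the function names are illustrative.

```python
import math
from typing import Sequence


def mean(x: Sequence[float]) -> float:
    """Arithmetic mean of a sensor sequence."""
    return sum(x) / len(x)


def variance(x: Sequence[float]) -> float:
    """Population variance: mean squared deviation from the mean."""
    m = mean(x)
    return sum((v - m) ** 2 for v in x) / len(x)


def correlation_coefficient(x: Sequence[float], y: Sequence[float]) -> float:
    """Covariance of x and y divided by the product of their standard deviations."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    sx, sy = math.sqrt(variance(x)), math.sqrt(variance(y))
    if sx == 0.0 or sy == 0.0:
        return 0.0  # correlation is undefined for a constant signal; 0 is a convention
    return cov / (sx * sy)


if __name__ == "__main__":
    hip = [0.1, 0.2, 0.3, 0.4]    # hypothetical hip-unit samples
    wrist = [0.2, 0.4, 0.6, 0.8]  # hypothetical wrist-unit samples
    print(mean(hip), variance(hip), correlation_coefficient(hip, wrist))
```

Applied to the hip and wrist windows of the same time span, `correlation_coefficient` gives the single inter-sensor correlation feature mentioned in the text.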
Energy is the average power of the signal $x(t)$ over the time interval $(-N/2, N/2)$. The signal spectrum is obtained by the fast Fourier transform, and the power of the spectrum is the sum of the squares of the spectral magnitudes, so the energy can be calculated by the following formula:
$$E = \frac{1}{N}\sum_{i=1}^{N}\left|F_i\right|^2$$
where $F_i$ is the amplitude of the $i$-th fast Fourier transform component.
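A short sketch of the energy feature follows. To stay self-contained it uses a direct discrete Fourier transform instead of a fast Fourier transform (both give the same spectrum), and it assumes that the sum of squared spectral magnitudes is divided by the number of samples; the function names are illustrative.

```python
import cmath
from typing import List, Sequence


def dft(x: Sequence[float]) -> List[complex]:
    """Direct O(N^2) discrete Fourier transform; an FFT returns the same values."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def spectral_energy(x: Sequence[float]) -> float:
    """Sum of squared spectral magnitudes divided by the number of samples."""
    spectrum = dft(x)
    return sum(abs(c) ** 2 for c in spectrum) / len(x)


if __name__ == "__main__":
    window = [1.0, 2.0, 3.0, 4.0]  # hypothetical accelerometer window
    print(spectral_energy(window))
```

With this normalization, Parseval's theorem implies that the result equals the sum of the squared time-domain samples, which gives a simple sanity check.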
Spectral Entropy (SE) is the entropy of the normalized spectral energies of the subframes. To calculate the spectral entropy of frame $i$, each signal frame is first divided into $L$ subframes of fixed size. Then, the spectral energy of each subframe is calculated and divided by the total spectral energy of the signal frame. The spectral entropy formula is:
$$H(i) = -\sum_{j=1}^{L} n_j \log_2 n_j$$
where $n_j$ is the normalized spectral energy of the $j$-th subframe.
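The spectral entropy computation can be sketched as below. Several points are assumptions made for illustration: the power spectrum of the frame is split into equal-width blocks that play the role of the subframes, the logarithm is taken to base 2, and a direct discrete Fourier transform stands in for the fast Fourier transform.

```python
import cmath
import math
from typing import List, Sequence


def power_spectrum(x: Sequence[float]) -> List[float]:
    """Squared magnitudes of the discrete Fourier transform of x."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n))) ** 2
            for k in range(n)]


def spectral_entropy(frame: Sequence[float], n_subframes: int) -> float:
    """Entropy of the spectral energy distribution over equal-width blocks."""
    spectrum = power_spectrum(frame)
    block_len = len(spectrum) // n_subframes
    if block_len == 0:
        raise ValueError("frame is too short for the requested number of subframes")
    energies = [sum(spectrum[j * block_len:(j + 1) * block_len])
                for j in range(n_subframes)]
    total = sum(energies)
    if total == 0.0:
        return 0.0
    shares = [e / total for e in energies]  # normalized energies, summing to 1
    return -sum(p * math.log2(p) for p in shares if p > 0.0)


if __name__ == "__main__":
    impulse = [1.0] + [0.0] * 7  # flat spectrum, so energy is spread evenly
    print(spectral_entropy(impulse, n_subframes=4))
```

A flat spectrum gives the maximum entropy for the chosen number of blocks, while a signal whose energy sits in one block gives an entropy close to zero.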
To map the original activity data to different category spaces, it is necessary to analyze its statistical characteristics, such as its mathematical distribution, and to extract feature vectors that can represent different human activities from different dimensions. However, too many features introduce irrelevant or redundant information, which affects the accuracy of label prediction. In this paper, 49 time-domain features, 1 correlation feature, and 14 frequency-domain features were extracted for the subsequent pattern recognition. To diversify the training of the base classifiers, 70% of the features were randomly selected for training each classifier.
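The random selection of 70% of the 64 features for a base classifier can be sketched as follows. The rounding rule, the use of a fixed seed for reproducibility, and the function name `sample_feature_subset` are assumptions; the text does not specify them.

```python
import random
from typing import List


def sample_feature_subset(n_features: int, fraction: float, seed: int) -> List[int]:
    """Draw a random subset of feature indices without replacement."""
    rng = random.Random(seed)  # local generator, so the global state is untouched
    k = max(1, round(n_features * fraction))
    return sorted(rng.sample(range(n_features), k))


if __name__ == "__main__":
    # One subset per base classifier; different seeds give different subsets.
    for classifier_id in range(3):
        subset = sample_feature_subset(64, 0.7, seed=classifier_id)
        print(classifier_id, len(subset))
```

Training each base classifier on its own subset of columns is what introduces the diversity mentioned in the text.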
2.4. Machine Learning Model
Machine learning uses algorithms to analyze existing data, acquire knowledge, and then apply it to new data. In this paper, three machine learning algorithms are used to train the model: Random Forest (RF), Sequential Minimal Optimization (SMO), and k-Nearest Neighbors (kNN).
Random Forest (RF) was proposed by Breiman in 2001. As a general classification and regression method, it combines several randomized decision trees and aggregates their predictions by averaging, and it performs well even when the number of variables is much larger than the number of observations. Owing to its simple and practical voting mechanism and its high and stable accuracy, random forest has been widely used in medicine, text classification, and facial recognition.
A Support Vector Machine (SVM) classifier is a supervised learning algorithm based on statistical learning theory. It is mainly used in regression analysis and pattern recognition. It minimizes the empirical error on the data while maximizing the geometric margin, which provides excellent generalization performance. Building on the SVM algorithm, Shevade et al. proposed an iterative Sequential Minimal Optimization (SMO) algorithm, which can replace missing values in the data and can effectively solve multi-class classification problems by using kernel functions such as the Gaussian kernel.
Eager learning algorithms, such as random forest, first learn a model from the training sample set according to certain rules and then classify test samples. In contrast, lazy learning algorithms, represented by k-Nearest Neighbor (kNN), build no explicit model in advance; they defer computation until a test sample arrives and classify it directly from the stored training samples.
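The lazy-learning behaviour of kNN can be made concrete with a small sketch: nothing is fitted in advance, and all work happens when a query arrives. The Euclidean distance, the value of k, and the toy labels are assumptions for illustration and are not the settings used in this study.

```python
import math
from collections import Counter
from typing import List, Sequence


def knn_predict(train_x: Sequence[Sequence[float]],
                train_y: Sequence[str],
                query: Sequence[float],
                k: int = 3) -> str:
    """Classify a query by majority vote among its k nearest training samples."""
    # No training step: distances to all stored samples are computed on demand.
    ranked = sorted(zip((math.dist(x, query) for x in train_x), train_y))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical two-feature instances with illustrative labels.
    features: List[Sequence[float]] = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
    labels = ["rest", "rest", "rest", "run", "run", "run"]
    print(knn_predict(features, labels, (0.2, 0.1)))
    print(knn_predict(features, labels, (5.5, 5.5)))
```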
2.5. Recognition Module
As an open-source data mining platform, the Waikato Environment for Knowledge Analysis (WEKA) brings together a large number of machine learning algorithms for data mining tasks and provides an interactive visual interface. The whole PA recognition framework is implemented in Java and relies on the WEKA toolkit.
A two-layer cascade classifier structure is adopted to build the classification system (Algorithm 1). To prevent similar data from the same participant from appearing in both sets, the testing set and the training set are kept disjoint by subject in every iteration. The first layer classifies the intensity category label using selected features, and the corresponding second-layer classifier is then assigned according to the first layer's output. Ultimately, the final prediction is given through the cascade architecture.
Algorithm 1: Pseudo-code of the cascade classifier based on multi-label (CCM) algorithm.
Inputs: Instances, a sequence of n instances {(x1, y1, y1′), …, (xn, yn, yn′)} with two kinds of labels, yi, yi′ ∈ Y = {1, …, k}; SubjectSet, a collection of key values of all the participants.
foreach sub.Id in SubjectSet:
    iterate instance in Instances:
        if instance.subId = sub.Id then:
            put instance into the testing set
        else:
            put instance into the training set
    end iterate
    build the activity intensity model_Layer1 (training set)
    build the activity type models_Layer2 (training set)
    validate model_Layer1 (testing set)
    get the label of layer1, then:
        validate the models_Layer2 (testing set)
        activity label ← get the label of layer2
    return activity label
end for
Output: activity label
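The leave-one-subject-out loop and the two-layer cascade of Algorithm 1 can be sketched in runnable form as follows. This is not the WEKA-based implementation of the study: a nearest-centroid classifier is swapped in for RF, SMO, and kNN to keep the example dependency-free, the first layer uses all features rather than mainly the respiratory ones, and the toy dataset, labels, and function names are hypothetical.

```python
import math
from collections import defaultdict


class NearestCentroid:
    """Stand-in base classifier: predicts the label of the closest class mean."""

    def fit(self, X, y):
        groups = defaultdict(list)
        for x, label in zip(X, y):
            groups[label].append(x)
        self.centroids = {label: [sum(col) / len(rows) for col in zip(*rows)]
                          for label, rows in groups.items()}
        return self

    def predict(self, x):
        return min(self.centroids,
                   key=lambda label: math.dist(self.centroids[label], x))


def train_cascade(instances):
    """instances: list of (features, intensity_label, activity_label)."""
    layer1 = NearestCentroid().fit([f for f, _, _ in instances],
                                   [i for _, i, _ in instances])
    by_intensity = defaultdict(list)
    for f, intensity, activity in instances:
        by_intensity[intensity].append((f, activity))
    # One second-layer model per intensity category.
    layer2 = {intensity: NearestCentroid().fit([f for f, _ in rows],
                                               [a for _, a in rows])
              for intensity, rows in by_intensity.items()}
    return layer1, layer2


def predict_cascade(model, features):
    layer1, layer2 = model
    intensity = layer1.predict(features)  # first layer: intensity category
    return intensity, layer2[intensity].predict(features)  # second layer: type


def leave_one_subject_out(dataset):
    """dataset: list of (subject_id, features, intensity, activity).

    Returns the activity-type accuracy for each held-out subject.
    """
    accuracies = {}
    for held_out in sorted({s for s, _, _, _ in dataset}):
        train = [(f, i, a) for s, f, i, a in dataset if s != held_out]
        test = [(f, i, a) for s, f, i, a in dataset if s == held_out]
        model = train_cascade(train)
        correct = sum(predict_cascade(model, f)[1] == a for f, _, a in test)
        accuracies[held_out] = correct / len(test)
    return accuracies


def make_toy_dataset():
    """Hypothetical, perfectly separable data used only to exercise the code."""
    rows = [((0.0, 0.0), "sedentary", "sitting"),
            ((0.0, 1.0), "sedentary", "standing"),
            ((10.0, 0.0), "vigorous", "basketball"),
            ((10.0, 1.0), "vigorous", "running")]
    return [(s, f, i, a) for s in ("s1", "s2", "s3") for f, i, a in rows]


if __name__ == "__main__":
    print(leave_one_subject_out(make_toy_dataset()))
```

Because the second-layer model is chosen from the first layer's prediction, a wrong intensity label cannot be corrected later; the sketch makes this dependency explicit in `predict_cascade`.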
4. Discussion
In this paper, a novel multi-label cascade classifier system was proposed for daily physical activity pattern recognition and achieved promising performance. A wearable inertial measurement device, consisting of a ventilation sensor around the abdomen and two tri-axial accelerometers placed on the wrist and the hip, was designed to acquire body motion information and respiration rate. Compared with traditional single-accelerometer measurement, multiple inertial sensors provide more detailed information about body movements. Meanwhile, the ventilation sensor adds a measurement of respiration, which supports the estimation of physical activity energy expenditure. Multi-label information is used to supplement activity pattern recognition, since an object that carries more than one label is a common phenomenon. Adding the PA intensity label gives an extra indicator that helps assign instances to the correct category.
Generally, current PA recognition systems achieve quite acceptable classification performance by using ensemble learning (integrating multiple different classifiers based on their classification accuracies for the different PA patterns) and reliable decision fusion strategies, such as instance-specific weighted majority voting. However, the higher classification accuracy comes at the expense of more computational resources than non-ensemble classifiers require. Moreover, the resulting ensemble model is overly complex and redundant: every base classifier must judge all PA patterns, which enlarges the models and increases the difficulty of classification. Unnecessary time consumption is therefore also a factor to consider. For these reasons, in this study a cascade classifier structure based on the multi-label (CCM) approach was designed and evaluated for the physical activity measurement and recognition system; it narrows the range of classes that each classifier must handle. A two-layer cascade structure was established. The first layer focuses on classifying the PA intensity category, and according to its prediction, the corresponding classifier in the second layer is assigned to perform PA recognition. Compared with the non-CCM classifiers, the CCM approach showed better performance, that is, higher mean accuracies and lower standard deviations.
Moreover, the CCM system demonstrated better generalization capability than the non-CCM one. In the leave-one-individual-out cross-validation results, the CCM approach performed better on the PA classification of new test subjects, whereas the non-CCM model obtained lower classification accuracies on the same test subjects. The cascade classifier approach preserves the statistical distribution of each sensor dataset and lets each classifier concentrate on a small subset of PA patterns. The classification performance is therefore reliable and generalizes robustly, because the variability among participants is reduced significantly. In the implementation, the RF-CCM classifier structure is selected first. During training, all training data are used; the optimal parameters of the first layer are obtained and fixed, and then the optimal parameters of the second layer are obtained. The final model is then used for testing.
Despite the promising performance demonstrated in this study, some shortcomings need to be mentioned. For instance, the cascade classifier structure suffers from error propagation and accumulation: if the first layer misclassifies an instance, the instance will inevitably be assigned to a wrong category, no matter how precise the second-layer classifiers are. Consequently, measures need to be taken to optimize the cascade structure and reduce this error propagation. In addition, more subjects will be involved in later research to improve the robustness of generalization. Furthermore, several issues remain to be addressed:
The selection and comparison of sliding window length, features, and base classifiers.
The number and placement of wearable devices needed to achieve better classification performance.
Optimization of the CCM structure to decrease the error accumulation.
We hope that these matters will be addressed in future studies to further improve the performance and generalization capability of the multi-label-based cascade classifier system.