New Schemes for Activity Recognition Systems Using PCA-WSVM, ICA-WSVM, and LDA-WSVM

Feature extraction and classification are two key steps for activity recognition in a smart home environment. In this work, we used three methods for feature extraction: Principal Component Analysis (PCA), Independent Component Analysis (ICA), and Linear Discriminant Analysis (LDA). The new features selected by each method are then used as the inputs of a Weighted Support Vector Machines (WSVM) classifier, which handles the problem of imbalanced activity data from the sensor readings. Experiments on multiple real-world datasets, comparing Conditional Random Fields (CRF), standard Support Vector Machines (SVM), Weighted SVM, and the combined methods PCA+WSVM, ICA+WSVM, and LDA+WSVM, showed that LDA+WSVM achieves a higher recognition rate than the other methods for activity recognition.


Introduction
Activity recognition is one of the most important tasks in pervasive computing applications [1][2][3][4]. Research in human activity recognition aims to determine a human user's activity, such as cooking, brushing teeth, dressing, sleeping, and so on. To this end, different types of sensors have been used to sense users' activities in smart environments.
The collected sensor data need to be analyzed using machine learning and pattern recognition techniques [5,6] to determine which activity is being performed by the dweller. As for any pattern recognition task, the keys to successful activity recognition are: (i) appropriately designed feature extraction from the sensor data; and (ii) the design of suitable classifiers to infer the activity. Such models are usually learned in a supervised manner, which requires a large annotated dataset recorded in different settings [1][2][3].
Existing activity recognition algorithms suffer from two problems: a non-informative feature space and imbalanced data, both of which degrade recognition performance. Feature extraction [7] is a preprocessing step that derives a subset of new features from the original set, providing a better selection of relevant features from high-dimensional data as well as higher discrimination between classes. In this paper, an attempt has been made to study three feature extraction methods, namely Principal Component Analysis (PCA) [8], Independent Component Analysis (ICA) [9], and Linear Discriminant Analysis (LDA) [10], and their relevance to improving the classification accuracy of existing activity recognition systems.
Another problem affecting the performance of activity classification is imbalanced data [11,12]. Activity recognition datasets are generally imbalanced, meaning certain activities occur more frequently than others (e.g., sleeping is generally done once a day, while toileting is done several times a day). This can negatively influence the learning process, since classifiers biased towards the majority classes tend to misclassify the minority ones, which may have disastrous consequences for human activity recognition systems. This has motivated extensive research aiming to improve the effectiveness of SVM on imbalanced classification in the activity recognition field [13,14]. Approaches for addressing the imbalanced training-data problem can be categorized into two main streams: the data processing approach and the algorithmic approach [15][16][17].
The first approach preprocesses the data, either randomly or intelligently, by undersampling the majority instances [16] or oversampling the minority instances [15]. In this paper, we consider the algorithmic approach, because it keeps all the information and does not change the distribution of the training data. Its solutions include cost-sensitive learning [18,19], which treats misclassifications differently by assigning weights to the data in order to pursue a high classification accuracy.
Our paper addresses these issues and makes the following contributions. Firstly, we present new schemes using PCA+WSVM, ICA+WSVM, and LDA+WSVM to recognize activities of daily living from binary sensor data. The Weighted Support Vector Machine (WSVM) [9] is employed to handle the imbalanced classification problem, using three methods independently for feature extraction: PCA, ICA, and LDA. Secondly, the proposed approaches are assessed and compared with Conditional Random Fields (CRF) [20], the standard SVM, and the Weighted SVM. In particular, CRF has recently gained popularity in the activity recognition field [1,3]. The experiments were carried out on multiple annotated real-world datasets of sensor readings from different houses [21,22].

Proposed Strategy-Based Activity Recognition System
Despite its popularity in machine learning, the SVM technique has not been extensively used in activity recognition studies, as pointed out in [23][24][25][26]. However, the high accuracy rates obtained in other contexts suggest possible success in activity recognition. Nevertheless, a standard SVM is overwhelmed by the majority class instances in the case of imbalanced datasets. The Weighted Support Vector Machine (WSVM) technique has been suggested as a candidate solution for this purpose, because it uses an efficient training approach that improves the classifier's ability to learn from a large or imbalanced dataset and, therefore, improves the performance of the multi-class SVM classifier.
In this paper, a new activity recognition scheme is proposed: the WSVM method is applied to imbalanced classification using three methods independently for feature extraction, PCA, ICA, and LDA, as shown in Figure 1. PCA aims to eliminate redundant information. ICA estimates components that are as statistically independent as possible. LDA improves the separability of samples in the subspace and extracts discriminant features. The datasets transformed into a lower-dimensional space by each feature extraction method are then used for training and testing a WSVM classifier. The trained WSVM is then used to process new observations during the testing phase, where the associated class of activities of daily living is predicted.
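As an illustrative sketch of this scheme (not the exact experimental pipeline), scikit-learn can chain LDA with a class-weighted SVM; here `class_weight='balanced'` stands in for the per-class costs of the WSVM, and the data are synthetic stand-ins for binary sensor features:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.svm import SVC

# Toy stand-in for binary sensor data: 60 samples, 14 sensors, 3 activities.
rng = np.random.default_rng(0)
X = (rng.random((60, 14)) > 0.5).astype(float)
y = rng.integers(0, 3, size=60)

# LDA reduces the features to at most N-1 dimensions; the weighted SVM
# (class_weight='balanced' mimics per-class costs) classifies activities.
model = make_pipeline(
    LinearDiscriminantAnalysis(n_components=2),
    SVC(kernel="rbf", C=1.0, class_weight="balanced"),
)
model.fit(X, y)
pred = model.predict(X)
```

The same pipeline structure applies when LDA is swapped for PCA or ICA; only the transformer changes.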

Suppose the data matrix is $X \in \mathbb{R}^{m \times n}$, where m is the total number of samples, n is the feature dimension of a sample, and N is the total number of classes. A projected sample is:

$y = W^{T} x$    (1)

where W is the projection matrix found by each feature extraction method.

Principal Component Analysis (PCA)
Principal component analysis [8] is a projection-based technique that approximates the original data with lower-dimensional feature vectors through the construction of uncorrelated principal components that are linear combinations of the original variables. However, PCA is ignorant of the class labels attached to the data, so good class separation in the direction of the high-variance principal components is not guaranteed [8]. The main process of PCA is as follows.
In PCA, the data matrix $X \in \mathbb{R}^{m \times n}$ is first centered, $\tilde{x} = x - \bar{x}$, where $\bar{x}$ is the mean of the samples. Then PCA diagonalizes the covariance matrix $C = \frac{1}{m} \sum_{j=1}^{m} \tilde{x}_j \tilde{x}_j^{T}$. This leads to solving the eigenvalue equation

$C v_i = \lambda_i v_i, \quad i = 1, \ldots, n$    (2)

where $V = [v_1, v_2, \ldots, v_n]$ is the $n \times n$ matrix containing the n eigenvectors and $\lambda$ is the $n \times n$ diagonal matrix of eigenvalues of the covariance matrix. In Equation (2), each n-dimensional eigenvector $v_i$ corresponds to the ith eigenvalue $\lambda_i$. The variance in any direction $v_i$ can be measured by dividing the associated eigenvalue $\lambda_i$ by the sum of the n eigenvalues. The first p principal components are selected for classification when their accumulative contribution rate reaches a chosen threshold $\theta$:

$\frac{\sum_{i=1}^{p} \lambda_i}{\sum_{i=1}^{n} \lambda_i} \geq \theta$    (3)

Independent Component Analysis (ICA)

The most commonly used method for generating spatially-localized features is independent component analysis (ICA), which produces basis vectors that are statistically independent (not just linearly decorrelated, as with PCA) [9]. The algorithm works on the principle of minimizing mutual information between the variables, which is the correct criterion for judging independence. Additionally, minimizing mutual information is equivalent to maximizing entropy.
The ICA model can also be written as:

$U = W X$    (4)

where W is the demixing matrix to be estimated and the rows of U are the independent components. Based on information theory, the negentropy of U can be used as the criterion to estimate the independence of the vectors; it is approximated using a contrast function [27]:

$J(u_i) \approx \left[ E\{G(u_i)\} - E\{G(v)\} \right]^2$    (5)

where v is a standardized Gaussian random variable (zero mean and unit variance) and G is a non-quadratic function, commonly chosen as:

$G_1(u) = \frac{1}{a_1} \log \cosh(a_1 u), \quad 1 \leq a_1 \leq 2$    (6)

$G_2(u) = -\exp(-u^2/2)$    (7)

Maximizing the contrast function in Equation (5) leads to the following fixed-point update for estimating $w_i$:

$w_i^{*} = E\{x\, g(w_i^{T} x)\} - E\{g'(w_i^{T} x)\}\, w_i$    (8)

$w_i \leftarrow w_i^{*} / \| w_i^{*} \|$    (9)

where $w_i^{*}$ is the new estimate of $w_i$, and g and g' are, respectively, the first and second derivatives of G.
Based on the maximal negentropy principle, the whole matrix W can be computed by maximizing the sum of the one-unit contrast functions while taking into account the decorrelation constraint [27].
In practice, ICA can often uncover disjoint underlying trends in multi-dimensional data.
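As a concrete sketch of the two decompositions above, assuming NumPy and synthetic data: PCA via eigendecomposition of the covariance with selection by accumulative contribution rate, and a one-unit FastICA fixed-point update (with g = tanh) applied to whitened observations:

```python
import numpy as np

def pca_select(X, rate=0.95):
    """PCA: center, diagonalize the covariance, and keep the first p
    components whose accumulative contribution rate reaches `rate`."""
    Xc = X - X.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    order = np.argsort(eigvals)[::-1]              # descending eigenvalues
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    ratio = np.cumsum(eigvals) / eigvals.sum()     # contribution rate
    p = int(np.searchsorted(ratio, rate)) + 1
    return Xc @ eigvecs[:, :p]                     # projected samples

def fastica_one_unit(Z, iters=200, seed=0):
    """One-unit FastICA fixed-point update on whitened data Z (rows =
    samples), with g = tanh, the derivative of G1 = log cosh (a1 = 1)."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(Z.shape[1])
    w /= np.linalg.norm(w)
    for _ in range(iters):
        wz = Z @ w
        w_new = (Z * np.tanh(wz)[:, None]).mean(axis=0) \
                - (1.0 - np.tanh(wz) ** 2).mean() * w
        w_new /= np.linalg.norm(w_new)             # renormalize to unit length
        if abs(abs(w_new @ w) - 1.0) < 1e-10:      # converged up to sign
            return w_new
        w = w_new
    return w

# Toy demo: mix two independent uniform (non-Gaussian) sources.
rng = np.random.default_rng(1)
S = rng.uniform(-1, 1, size=(2000, 2))
X = S @ np.array([[1.0, 0.5], [0.5, 1.0]]).T       # mixed observations

Y = pca_select(X, rate=0.95)                       # PCA features

# Whiten before ICA (zero mean, identity covariance).
Xc = X - X.mean(axis=0)
d, E = np.linalg.eigh(np.cov(Xc, rowvar=False))
Z = Xc @ E @ np.diag(d ** -0.5)
w = fastica_one_unit(Z)                            # one demixing direction
```

In a full FastICA run, further directions would be extracted under a decorrelation constraint, as noted above; the sketch stops at a single unit for brevity.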

Linear Discriminant Analysis (LDA)
The aim of LDA is to find the optimal projection matrix using the Fisher criterion below, i.e., to maximize the ratio of the between-class scatter $S_B$ to the within-class scatter $S_W$ of the projected samples:

$W_{opt} = \arg\max_{W} \frac{|W^{T} S_B W|}{|W^{T} S_W W|}$    (10)

where the between-class and within-class scatter matrices $S_B$ and $S_W$ are defined as:

$S_B = \sum_{i=1}^{N} m_i (\bar{x}_i - \bar{x})(\bar{x}_i - \bar{x})^{T}$    (11)

$S_W = \sum_{i=1}^{N} \sum_{x \in X_i} (x - \bar{x}_i)(x - \bar{x}_i)^{T}$    (12)

where $\bar{x}_i$ is the mean of the ith class and $\bar{x}$ is the overall mean vector.
To maximize (10), the optimal $W_{opt}$ consists of the eigenvectors associated with the largest eigenvalues of the following generalized eigenvalue problem:

$S_B w_i = \lambda_i S_W w_i$    (13)

The solution can be computed by finding the leading eigenvectors of $S_W^{-1} S_B$ corresponding to the eigenvalues $\lambda_i$. The column vectors $w_i$ then form the rows of the transformation matrix W. It should be noted that only those eigenvectors should be selected whose eigenvalues carry most of the energy, i.e., the total dispersion. Another interesting property is that this transform decorrelates both the $S_B$ and $S_W$ matrices. The rank of $S_B$ is at most N − 1; hence, no more than this number of new features can be obtained.
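The scatter construction and generalized eigenproblem might be sketched as follows, assuming NumPy/SciPy and synthetic Gaussian classes; a small ridge on the within-class scatter is an added assumption to keep it invertible:

```python
import numpy as np
from scipy.linalg import eigh

def lda_transform(X, y):
    """Builds S_B and S_W, solves S_B w = lambda * S_W w, and projects
    onto the (at most N-1) leading generalized eigenvectors."""
    classes = np.unique(y)
    n = X.shape[1]
    overall_mean = X.mean(axis=0)
    S_B = np.zeros((n, n))
    S_W = np.zeros((n, n))
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        d = (mc - overall_mean)[:, None]
        S_B += len(Xc) * (d @ d.T)             # between-class scatter
        S_W += (Xc - mc).T @ (Xc - mc)         # within-class scatter
    # Symmetric-definite generalized eigenproblem (ascending eigenvalues);
    # the small ridge keeps S_W positive definite for collinear features.
    vals, vecs = eigh(S_B, S_W + 1e-8 * np.eye(n))
    W = vecs[:, ::-1][:, : len(classes) - 1]   # rank(S_B) <= N-1
    return X @ W

rng = np.random.default_rng(2)
# Three well-separated Gaussian classes in 5 dimensions.
X = np.vstack([rng.normal(loc=mu, size=(40, 5)) for mu in (0.0, 3.0, 6.0)])
y = np.repeat([0, 1, 2], 40)
Z = lda_transform(X, y)
```

With N = 3 classes, the projected data have at most N − 1 = 2 dimensions, matching the rank argument above.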

Weighted Support Vector Machines (WSVM)
A standard SVM classifier is sensitive to the problem of learning from imbalanced data: it assumes a balanced training set and uses the same cost parameter C for all classes, which may generate suboptimal classification models. The SVM optimization primal problem is given as follows:

$\min_{w, b, \xi} \ \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{m} \xi_i \quad \text{s.t.} \quad y_i (w^{T} \phi(x_i) + b) \geq 1 - \xi_i, \ \xi_i \geq 0, \ i = 1, \ldots, m$    (14)

The Weighted Support Vector Machine (WSVM) was introduced to deal with this problem by using two different cost parameters $C^{+}$ and $C^{-}$ in the SVM primal problem [5] for the majority class ($y_i = +1$) and the minority class ($y_i = -1$), as given in Equation (15) below:

$\min_{w, b, \xi} \ \frac{1}{2} \|w\|^2 + C^{+} \sum_{i : y_i = +1} \xi_i + C^{-} \sum_{i : y_i = -1} \xi_i \quad \text{s.t.} \quad y_i (w^{T} \phi(x_i) + b) \geq 1 - \xi_i, \ \xi_i \geq 0$    (15)

The dual optimization problem of WSVM, with different constraints on the $\alpha_i$, can be solved in the same way as the standard SVM optimization problem [5]; it has the following form:

$\max_{\alpha} \ \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j y_i y_j K(x_i, x_j) \quad \text{s.t.} \quad \sum_{i=1}^{m} \alpha_i y_i = 0, \quad 0 \leq \alpha_i \leq C^{+} \ (y_i = +1), \quad 0 \leq \alpha_i \leq C^{-} \ (y_i = -1)$    (16)

where $m^{+}$ and $m^{-}$ are the numbers of samples in the +1 and −1 classes, and $C^{+}$ and $C^{-}$ are the cost parameters for the positive and negative classes, respectively, used to construct a classifier for multiple classes. They control the trade-off between the margin and the training error. Some authors [19,28,29] have proposed adjusting different cost parameters for different classes of data, which effectively improves the low classification accuracy caused by imbalanced samples. Veropoulos et al.
in [19] proposed to increase the trade-off associated with the minority class (i.e., $C^{-} > C^{+}$) to eliminate the effect of class imbalance. However, they did not suggest any guidelines for deciding what the regularization factors should be. The coefficients are typically chosen as [30]:

$C^{+} = w^{+} C$    (17)

$C^{-} = w^{-} C$    (18)

When the two classes, despite their different sample sizes, have similar boundary properties (that is, the ratio between the number of support vectors of each class and its total sample size is equal, or the two classes have similar error rates), Chew et al. [30] analyzed in detail how class size degrades classification accuracy in the SVM algorithm and put forward corresponding solutions, leading to the choice above. In Equations (17) and (18), C is the common cost coefficient for both classes, and $w^{+}$ and $w^{-}$ are the weights for the +1 and −1 classes, respectively. In this paper, the weights are chosen as $w^{+} = 1$ and $w^{-} = m^{+}/m^{-}$ for the two-class WSVM. This criterion reflects the reasoning that the trade-off $C^{-}$ associated with the smallest class should be large, in order to improve the low classification accuracy caused by imbalanced samples. The modified SVM algorithm then no longer tends to skew the separating hyperplane towards the minority class examples to reduce the total misclassifications, since the minority class examples are now assigned a higher misclassification cost.
For multiclass imbalanced data classification, we used a different misclassification penalty per class; typically, the smallest class is weighted the highest. This allows the user to set individual weights for individual training examples, which are then used in WSVM training. We give the ratio cost value $C_i$ for each class i (i = 1, …, N) as a function of the class prior probabilities $P(C^{+})$ and $P(C_i)$ of the majority class $C^{+}$ and the class $C_i$, respectively:

$C_i = C \cdot \frac{P(C^{+})}{P(C_i)}$    (19)

We estimate each class prior probability $P(C_i)$ as the proportion of the number of samples in class i to the total number of training samples:

$P(C_i) = \frac{m_i}{m}$    (20)

Based on the above equations, the corresponding cost criterion in feature space can be given as:

$C_i = \left[ C \cdot \frac{m^{+}}{m_i} \right]$    (21)

where $m^{+}$ is the number of samples in the majority class and $m_i$ is the number of samples in class i. C is the common misclassification cost factor of the WSVM; the optimal value of the regularization parameter C is determined by cross-validation. Here $[\,\cdot\,]$ denotes the integer part of the quantity in brackets. Notice that it always holds that $C_i \geq C$.
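A minimal sketch of this per-class cost rule, assuming NumPy; the resulting dictionary could, for instance, be passed to scikit-learn's `SVC(class_weight=...)` as an approximation of the per-class costs (an illustrative assumption, not the paper's LIBSVM setup):

```python
import numpy as np

def class_costs(y, C=1.0):
    """Cost per class: the integer part of C * m_plus / m_i, where m_plus
    is the size of the largest class and m_i the size of class i. Minority
    classes therefore get the highest misclassification cost (C_i >= C)."""
    classes, counts = np.unique(y, return_counts=True)
    m_plus = counts.max()
    return {int(c): int(C * m_plus / m) for c, m in zip(classes, counts)}

# Imbalanced toy labels: 100 / 20 / 5 samples per class.
y = np.array([0] * 100 + [1] * 20 + [2] * 5)
costs = class_costs(y, C=1.0)   # -> {0: 1, 1: 5, 2: 20}
```

The smallest class (5 samples) receives a cost of 20, twenty times the majority-class cost, mirroring the intuition behind Equation-style weighting above.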
In this study, the LIBSVM software package [31] was used to implement the multiclass classifier. It uses the one-versus-one (OVO) method [5], which consists of constructing N(N−1)/2 classifiers, each trained on data from two activity classes. Once all N(N−1)/2 classifiers are constructed, a voting strategy is used at test time: the point is predicted to be in the class with the largest number of votes (the "Max Wins" strategy). Chen et al. [32] discussed the use of the same or different parameters for the N(N−1)/2 two-class problems; their preliminary results show that both approaches give similar accuracy.
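The "Max Wins" voting step can be illustrated with stubbed pairwise decisions; the dictionary below is hypothetical, standing in for the N(N−1)/2 trained two-class SVMs applied to a single test point:

```python
from collections import Counter
from itertools import combinations

classes = [0, 1, 2, 3]                     # N = 4 -> 4*3/2 = 6 pairwise classifiers
# Hypothetical winner of each pairwise classifier for one test point.
pairwise_pred = {(0, 1): 0, (0, 2): 2, (0, 3): 0,
                 (1, 2): 2, (1, 3): 1, (2, 3): 2}
votes = Counter(pairwise_pred[pair] for pair in combinations(classes, 2))
winner, n_votes = votes.most_common(1)[0]  # class with the most votes wins
```

Here class 2 wins three of its pairwise contests and is therefore predicted.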

Datasets
To evaluate the performance of our experiments, we used different annotated datasets collected with different sensor networks in a pervasive environment [21,22]. The details of all the datasets are shown in Table 1. Each network was installed in a different home setting and was composed of a different number of sensor nodes. These sensors were installed on everyday objects such as doors, cupboards, the refrigerator, and the toilet flush to record activation/deactivation events (opening/closing events) as the subject carried out everyday activities. The sensor data were labeled using different annotation methods. A list of the activities annotated for all datasets, with the number of observations of each activity, can be found in Table 2. Any period of time during which no activity took place was labeled "Idle". This table clearly shows how some activities occur very frequently (e.g., "toileting"), while others occur less frequently but have a longer duration (e.g., "leaving" and "sleeping"). The datasets therefore suffer from a severe class imbalance problem due to the nature of the data.

Setup
The models were validated by splitting the original data into a test set and a training set using a leave-one-day-out cross-validation approach: one full day of sensor readings is retained for testing and the remaining days are used as training data. The process is repeated for each day and the average performance measure is reported.
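Assuming per-sample day labels are available, scikit-learn's `LeaveOneGroupOut` reproduces this protocol; the sketch below runs on synthetic data, not the actual sensor sets:

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.random((120, 8))                  # 120 synthetic observations
y = rng.integers(0, 3, size=120)          # 3 activity classes
days = np.repeat(np.arange(6), 20)        # day label of each observation

scores = []
for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=days):
    clf = SVC(kernel="rbf", class_weight="balanced")
    clf.fit(X[train_idx], y[train_idx])   # train on the remaining days
    scores.append(clf.score(X[test_idx], y[test_idx]))
mean_accuracy = float(np.mean(scores))    # averaged over the 6 held-out days
```

One fold per day guarantees that readings from the test day never leak into training, which is the point of the protocol.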
Sensor outputs are binary and are represented in a feature space used by the model to recognize the performed activities. The feature vector contains one entry per sensor; since two-state sensors are used, the features are the 0/1 states of all sensors. The raw sensor representation uses the sensor data exactly as received from the sensor network: the value is 1 when the sensor is active and 0 otherwise. We do not use the raw sensor representation as observations; instead, we use the combined "Change point" and "Last" representation, which has been shown to give much better results in activity recognition [3].
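A sketch of how such a representation might be computed from raw binary readings (the exact definition in [3] may differ in details; here "Change point" flags a state change at time t, and "Last" keeps the most recently changed sensor active until another sensor changes):

```python
import numpy as np

def changepoint_and_last(raw):
    """raw: (T, S) binary matrix of raw sensor states over T time slices.
    Change point: 1 at time t if the sensor changed state between t-1 and t.
    Last: 1 for the sensor that changed most recently, until another changes."""
    T, S = raw.shape
    change = np.zeros_like(raw)
    change[1:] = (raw[1:] != raw[:-1]).astype(raw.dtype)
    last = np.zeros_like(raw)
    current = -1
    for t in range(T):
        fired = np.flatnonzero(change[t])
        if fired.size:
            current = fired[-1]           # most recently changed sensor
        if current >= 0:
            last[t, current] = 1
    return np.hstack([change, last])      # combined representation

# 4 time slices, 2 sensors: sensor 0 turns on at t=1, sensor 1 at t=3.
raw = np.array([[0, 0], [1, 0], [1, 0], [1, 1]])
feat = changepoint_and_last(raw)
```

At t = 2 no sensor fires, so the change-point half is all zeros while the "Last" half still points at sensor 0.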
When learning from imbalanced data, the overall classification accuracy is not considered an appropriate measure of performance. We therefore evaluate the models using the F-measure, a measure for which the correct classification of each class is equally important; it is calculated from the precision and recall scores. We are dealing with a multi-class classification problem, and therefore define the notions of true positives (TP), false negatives (FN), and false positives (FP) for each class separately. With a highly skewed data distribution, the overall accuracy metric in (22) is no longer sufficient, since it does not take into account differences in the frequency of activities. These measures are calculated as follows:

$\text{Accuracy} = \frac{\sum_{i=1}^{N} TP_i}{\text{Total}}$    (22)

$\text{Precision} = \frac{1}{N} \sum_{i=1}^{N} \frac{TP_i}{TP_i + FP_i}$    (23)

$\text{Recall} = \frac{1}{N} \sum_{i=1}^{N} \frac{TP_i}{TP_i + FN_i}$    (24)

$\text{F-Measure} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$    (25)
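These class-wise measures can be sketched as follows, with per-class TP/FP/FN counts combined into macro-averaged precision and recall and then a single F-measure:

```python
import numpy as np

def macro_f_measure(y_true, y_pred, n_classes):
    """Per-class TP/FP/FN, then macro-averaged precision and recall,
    so each activity class counts equally regardless of its frequency."""
    precisions, recalls = [], []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    p, r = np.mean(precisions), np.mean(recalls)
    return 2 * p * r / (p + r) if p + r else 0.0

# Toy multi-class example with an imbalanced true distribution.
y_true = np.array([0, 0, 0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 0, 0, 1, 1, 1, 2, 0])
f = macro_f_measure(y_true, y_pred, 3)
```

Because the per-class scores are averaged before combining, a model that ignores a minority class is penalized even if its overall accuracy stays high.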

Results
In our experiments, the SVM algorithm was tested with the LibSVM implementation [31], which implements the one-versus-one multiclass classifier [5]. We used the radial basis kernel function:

$K(x_i, x_j) = \exp\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right)$    (26)

Firstly, we optimized the SVM hyper-parameters (σ, C) for all training sets in the ranges (0.1–2) and [0.1, 1, 10, 100], respectively, to maximize the class accuracy under the leave-one-day-out cross-validation technique. The best parameter pairs (σopt, Copt) = (1.7, 1), ( ) were used for the datasets TK26M, TK57M, TAP30F, and TAP80F, respectively. Then, locally, we optimized the cost parameter Ci, adapted for each activity class, by using the WSVM classifier with the common cost parameter fixed at C = 1; see Tables 3-6. Figures 2 and 3 report the features selected using PCA and LDA for all datasets. A summary of the performance measures obtained for all classifiers is presented in Table 7.
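Note that scikit-learn parameterizes the RBF kernel as exp(−γ‖xᵢ−xⱼ‖²), so the paper's σ maps to γ = 1/(2σ²). A sketch of the hyper-parameter search on synthetic data (the σ values below are illustrative samples from the (0.1–2) range, not the actual grids used):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

rng = np.random.default_rng(4)
X = rng.random((80, 6))
y = rng.integers(0, 2, size=80)

# gamma = 1 / (2 * sigma^2) converts the sigma grid to sklearn's convention.
sigmas = np.array([0.5, 1.0, 1.7, 2.0])
grid = {"gamma": (1.0 / (2 * sigmas ** 2)).tolist(),
        "C": [0.1, 1, 10, 100]}
search = GridSearchCV(SVC(kernel="rbf"), grid, cv=4)
search.fit(X, y)
best = search.best_params_        # e.g. {"C": ..., "gamma": ...}
```

In the paper's protocol the inner evaluation would be leave-one-day-out rather than the plain 4-fold split used here for brevity.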
For the CRF results on these datasets, refer to [3,33,34]. ICA differs from PCA in that the low-dimensional signals do not necessarily correspond to the directions of maximum variance. We start with the first independent component and keep increasing the number of components until the cross-validation error no longer decreases.
After selecting the best parameters, we evaluated the performance of the different algorithms using metrics appropriate for imbalanced classification. The classification results for CRF, SVM, WSVM, PCA+WSVM, ICA+WSVM, and LDA+WSVM are summarized in Table 7 below. The table shows that the LDA+WSVM method gives a clearly better F-measure, while the CRF and SVM methods perform better in terms of accuracy for all datasets. As can be noted in this table, LDA outperforms PCA and ICA for recognizing activities with a WSVM classifier on all datasets. The PCA+WSVM method improves the classification results compared to CRF, SVM, WSVM, and ICA+WSVM on the TAP30F and TAP80F datasets.
Figures 4 and 5 give the classification results in terms of the accuracy measure for each activity with the WSVM, PCA+WSVM, ICA+WSVM, and LDA+WSVM methods.
In Figure 4, for the WSVM, PCA+WSVM, and LDA+WSVM models, the minority activities "Toileting" and "Showering" and the kitchen activities "Breakfast" and "Drink" are detected significantly better than with the other methods; LDA+WSVM is an effective method for recognizing activities. The majority activities are recognized well by all methods, while the "Idle" activity is recognized more accurately by the LDA+WSVM method. We can see in Figure 5 that the minority activities ("Toileting", "Washing dishes", "Watching TV", "Listen music", and the kitchen activities "Prep. Lunch" and "Prep. Snack") are better recognized with LDA-WSVM. Additionally, the kitchen activities perform worst across all datasets; they are, in general, hard to recognize, but they are recognized better with LDA-WSVM than with the other methods.

Discussion
Based on the experiments carried out in this work, a number of conclusions can be drawn. Using experiments on large real-world datasets, we showed that the F-measure obtained on the TK26M dataset is better than on the other datasets for all recognition methods, because the TK57M, TAP30F, and TAP80F datasets include more activity classes. We suppose that the use of a handwritten diary in the TK57M house and of a PDA in the TAP30F and TAP80F houses for annotating data is less accurate than using the Bluetooth headset, as in the TK26M house. For the TK26M dataset, a Bluetooth headset was used that communicated with the same server on which the sensor data were logged; this means the timestamps of the annotation were synchronized with the timestamps of the sensors. In TK57M, activity diaries were used, which is more error-prone because times might not always be written down correctly and the diaries have to be typed up afterwards.
In this section, we explain the differences in performance between the different recognition methods on imbalanced datasets. Our experimental results show that the WSVM and LDA+WSVM methods work better for classifying activities; they consistently outperform the other methods in terms of the accuracy of the minority classes. In particular, LDA-WSVM is the best classification method for all datasets, because the LDA method is best adapted for feature reduction in these datasets while taking into account the discrimination between classes.
PCA-WSVM outperforms CRF, SVM, WSVM, and ICA-WSVM on the TAP30F and TAP80F datasets; on the other datasets, ICA-WSVM surpasses PCA-WSVM. We conclude that the PCA method is better adapted for feature extraction in datasets with large feature vectors.
A multiclass SVM classifier does not take into consideration the differences (costs) between the class distributions during the learning process; it optimizes, via cross-validation search, the same cost parameter C for all classes. Not considering the weights in the SVM formulation affects the classifiers' performance and favors the classification of the majority activities ("Idle", "Leaving", and "Sleeping"). Although WSVM, which includes an individual setting of the parameter C for each class, is significantly more effective than the CRF and SVM methods, it is not as efficient as LDA+WSVM. The LDA method significantly improves the performance of the WSVM classifier; it follows that LDA-WSVM can be made more robust for classifying human activities.
The recognition of the minority activities in TK26M, such as "Toileting", "Showering", "Breakfast", "Dinner", and "Drink", is lower compared to the "Leaving" and "Sleeping" activities. This is mainly because the minority activities are less represented in the training dataset. The "Idle" activity and the three kitchen activities gave the worst results compared to the other activities; most confusion occurs between the "Idle" activity and the kitchen activities. In particular, "Idle" is one of the most frequent activities, but it is usually not a very important activity to recognize; it might, therefore, be preferable to lose accuracy on this activity if doing so allows better recognition of the minority classes.

Figure 1 .
Figure 1. Scheme of the proposed strategy-based activity recognition system.

Figure 4 .
Figure 4. Accuracy for each activity on TK26M dataset.

Figure 5 .
Figure 5. Accuracy for each activity on TAP80F dataset.

Table 1 .
House settings description.

Table 2 .
Annotated list of activities for each house and the number of observations of each activity. The bold letters represent each activity.

Table 3 .
Selection of the weights wi using TK26M dataset.

Table 4 .
Selection of the weights wi using TK57M dataset.

Table 5 .
Selection of the weights wi using TAP30F dataset.

Table 6 .
Selection of the weights wi using TAP80F dataset.