Smartphone Mode Recognition During Stairs Motion

Abstract: Smartphone mode classification is essential to many applications, such as daily-life monitoring, healthcare, and indoor positioning. In indoor positioning, it has been shown that knowing where the pedestrian carries the smartphone can improve positioning accuracy. Most research in this field focuses on pedestrian motion in a horizontal plane. In this research, we use supervised machine learning techniques to recognize and classify the smartphone mode (text, talk, pocket, and swing) while accounting for movement up and down stairs. We distinguish between upward and downward motion, each with four smartphone modes, yielding eight states in total. The classification uses an optimal set of sensors, chosen according to battery life and the energy consumption of each sensor. To achieve robustness, the classifier was trained and tested on a dataset built from the measurements of multiple users (94 min in total). It achieved an accuracy of more than 90% under cross-validation, and 91.5% when the texting mode is excluded. When only stairs motion is considered, regardless of direction, the accuracy improves to 97%. These results may help many algorithms, mainly in pedestrian dead reckoning, address challenges such as speed and step-length estimation and cumulative error reduction.


Introduction
The need to identify the "smartphone mode", i.e., the way a person holds the smartphone, is becoming increasingly significant in many applications, such as healthcare services, commercial uses, and emergency and safety applications [1]. One of the main uses of smartphone mode recognition is to improve indoor navigation algorithms, particularly pedestrian dead reckoning (PDR) approaches. Such approaches rely on step-length estimation and heading calculation. Step-length estimation is based on empirical or biomechanical models that are highly affected by the smartphone mode [2]. Hence, a fast and accurate algorithm that can distinguish between multiple smartphone modes is important.
The smartphone mode is characterized by the relative location of the phone, the phone's movement over time, the phone's orientation angles (yaw, pitch, roll), sound levels, luminous intensity, and more. These quantities are measured by the phone's physical sensors, e.g., accelerometer, gyroscope, and magnetometer [3], and a variety of applications can display them to the user and store them on the device. In this paper, we aim to classify four smartphone modes (texting, talking, swing, and pocket) while the pedestrian is going up or down stairs.
Each of these smartphone modes was divided into two groups, walking up the stairs and walking down the stairs, as presented in Figure 1, resulting in eight classes overall. While recognition of the four common modes has been explored in the past, separating ascent from descent is a relatively unexplored field. Our results show an accuracy of 90.25% for the eight states and over 96.75% for the four main modes.
The rest of the paper is organized as follows: Section 2 presents the problem and our approach to solving it. Section 3 describes the data collection process, experimental setup and results, while Section 4 provides the conclusions.

Problem Formulation
Let $x_t \in \mathbb{R}^d$ be a $d$-dimensional vector representing the data measured by the phone sensors at time $t$, and let $y_t$ be the corresponding label, determined by the smartphone mode at that time. We define a time window of size $n$ as the data gathered from the sensors from time $t-n+1$ to time $t$; the window label $y_n$ must be identical to the labels of all samples composing the window. The window is therefore the matrix

$$W_n = \begin{bmatrix} x_{t-n+1} & x_{t-n+2} & \cdots & x_t \end{bmatrix} \in \mathbb{R}^{d \times n}. \tag{1}$$

The number of time windows extracted from the data is a function of the window size $n$, the number of data samples $T$, and the overlap percentage between consecutive windows. Note that windows can overlap only backwards in time: the system is causal, so future samples cannot be used in real-life applications. Our goal is to design an algorithm that classifies a given time window into the correct smartphone mode label.
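A minimal sketch of the causal windowing described above, assuming NumPy arrays and expressing the overlap as a number of shared samples rather than a percentage; the function name `causal_windows` is hypothetical. With the maximum overlap of n − 1 samples, a stream of T samples yields T − n + 1 windows before label filtering.

```python
import numpy as np


def causal_windows(X, y, n, overlap):
    """Split a (T, d) sensor stream X with per-sample labels y into windows.

    Each window ends at some sample t and covers samples t - n + 1 .. t, so it
    uses only current and past data (causal). Consecutive windows share
    `overlap` samples (0 <= overlap <= n - 1). Windows whose samples carry
    more than one label are discarded, so every kept window has one label.
    """
    if not 0 <= overlap < n:
        raise ValueError("overlap must satisfy 0 <= overlap <= n - 1")
    step = n - overlap
    windows, labels = [], []
    for t in range(n - 1, X.shape[0], step):
        seg = y[t - n + 1 : t + 1]
        if np.all(seg == seg[0]):
            # Transpose to d x n: one row (time vector) per sensor channel
            windows.append(X[t - n + 1 : t + 1].T)
            labels.append(seg[0])
    return np.array(windows), np.array(labels)
```

Discarding mixed-label windows is one way (assumed here) to enforce the rule that a window's label must match all of its samples.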

Feature Extraction
The time window $W_n$ is composed of $d$ time vectors $\{u_i\}_{i=1}^{d}$ of length $n$, one for each of the $d$ components of $x_t$. On each of these vectors, we calculate a collection of features to extract more information from the data and improve the classification process. We distinguish between two groups of features [ … ] of three-axis measurements, i.e., acceleration, gyroscope, and magnetic field measurements.
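The specific statistical and time features are not listed here, so the sketch below uses illustrative choices only: σ = 5 statistics (mean, standard deviation, minimum, maximum, median) and τ = 2 time-domain features (mean crossings and lag-1 autocorrelation). All function names are hypothetical; the point is that each of the d time vectors u_i contributes σ + τ values to the window's feature vector.

```python
import numpy as np


def statistical_features(u):
    # Illustrative sigma = 5 statistics of one time vector u
    return np.array([u.mean(), u.std(), u.min(), u.max(), np.median(u)])


def time_features(u):
    # Illustrative tau = 2 time-domain features of one time vector u
    centered = u - u.mean()
    # Sign changes of the mean-removed signal (approximate mean crossings)
    crossings = np.count_nonzero(np.diff(np.sign(centered)) != 0)
    # Lag-1 autocorrelation; defined as 0 for a constant channel
    denom = np.dot(centered, centered)
    acf1 = np.dot(centered[:-1], centered[1:]) / denom if denom > 0 else 0.0
    return np.array([crossings, acf1])


def window_features(W):
    """Map a d x n window to a feature vector of length d * (sigma + tau)."""
    return np.concatenate(
        [np.concatenate([statistical_features(u), time_features(u)]) for u in W]
    )
```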
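Let $\sigma$ denote the number of statistical features and $\tau$ the number of time features. Thus, from each time window $W_{n,i}$ we can produce a feature vector $\chi_i \in \mathbb{R}^{d(\sigma+\tau)}$ with the corresponding window label $y_i$. For $T$ data samples, a window size of $n$, and an overlap of $n-1$ samples between consecutive windows, there are $T-n+1$ windows, and the full feature matrix is

$$\text{featureX} = \begin{bmatrix} \chi_1^{\top} \\ \chi_2^{\top} \\ \vdots \\ \chi_{T-n+1}^{\top} \end{bmatrix} \in \mathbb{R}^{(T-n+1) \times d(\sigma+\tau)}, \qquad y = \begin{bmatrix} y_1 & y_2 & \cdots & y_{T-n+1} \end{bmatrix}^{\top}. \tag{2}$$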
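The assembly of featureX for the maximum overlap of n − 1 samples can be sketched as follows. This assumes a single recording made in one smartphone mode (so every window shares its label) and uses a deliberately simple hypothetical feature map (per-channel mean and standard deviation, i.e., σ = 2 and τ = 0); any per-window feature function could be passed instead.

```python
import numpy as np


def simple_features(W):
    # Hypothetical choice: sigma = 2 statistics per channel, tau = 0
    return np.concatenate([W.mean(axis=1), W.std(axis=1)])


def build_feature_matrix(X, label, n, features=simple_features):
    """Return featureX with one row per window (overlap of n - 1 samples).

    X: (T, d) recording made in a single smartphone mode `label`.
    Output: featureX of shape (T - n + 1, d * (sigma + tau)) and the labels.
    """
    T = X.shape[0]
    # Window ending at sample t covers t - n + 1 .. t, transposed to d x n
    rows = [features(X[t - n + 1 : t + 1].T) for t in range(n - 1, T)]
    feature_x = np.vstack(rows)
    labels = np.full(len(rows), label)
    return feature_x, labels
```

For example, 50 samples of 3 channels with n = 10 give 41 rows of 3 × 2 = 6 features.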

Classification
In our research, we compare several machine learning algorithms to determine the best classifier for this problem. Each classifier was trained and tested on our dataset, which contains sensor outputs from six pedestrians using six different smartphones, and the accuracy was calculated using cross-validation, as described in Section 3. After this initial classification test, we improve the results and the robustness of the algorithm using the following methods.
First, we tune the hyper-parameters of each classifier to best suit the nature of our data. Next, we perform feature selection to extract the main features from the featureX matrix. This step helps avoid over-fitting and spurious correlations, shortens training and testing time, and improves the results. All physical sensors except the thermometer, barometer, and sound level meter generate cross-sensor measurements; for example, combining the accelerometer and the gyroscope yields linear acceleration and gravity. Because of the large number of features, we first applied a feature selection method that selects the best subset of measurements for each physical sensor after testing all combinations. Afterwards, we used the dedicated "tsfresh" module, which offers a p-value based approach that inspects the significance of each feature individually [5].
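The idea behind p-value based selection can be illustrated with a simplified stand-in. This is not tsfresh's implementation: tsfresh's `select_features` chooses hypothesis tests according to the feature and target types and controls the false discovery rate, whereas the sketch below assumes scikit-learn's ANOVA F-test (`f_classif`) for per-feature p-values and applies the Benjamini–Hochberg procedure. The function name and the `fdr` parameter are hypothetical.

```python
import numpy as np
from sklearn.feature_selection import f_classif


def select_by_pvalue(X, y, fdr=0.05):
    """Return a boolean mask of features kept by Benjamini-Hochberg at `fdr`."""
    _, pvals = f_classif(X, y)  # one p-value per feature (column of X)
    pvals = np.nan_to_num(pvals, nan=1.0)  # constant features are never kept
    m = len(pvals)
    order = np.argsort(pvals)
    # Rank-dependent thresholds fdr * k / m for k = 1..m
    thresholds = fdr * np.arange(1, m + 1) / m
    passed = pvals[order] <= thresholds
    keep = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.nonzero(passed)[0].max()  # largest rank meeting its threshold
        keep[order[: k + 1]] = True
    return keep
```

Applied to featureX, the mask selects the columns to retain, e.g. `featureX[:, select_by_pvalue(featureX, labels)]`.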
We also calculate the optimal subset of sensors and measurements to use in the classification process. Filtering the sensor subset is important for several reasons. First, some sensor outputs may not be relevant to this type of classification and might even impair the results through over-fitting. Second, our target application, indoor navigation using a smartphone, requires the classification to be fast and accurate while minimizing energy consumption, i.e., using as few sensors as possible while preserving the best results.
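The energy-aware subset selection can be sketched as an exhaustive search. The caller-supplied `score` function (for example, mean cross-validated accuracy using only the chosen sensors), the per-sensor `energy` cost table, and the `budget` parameter are all hypothetical, since the exact way energy was weighed is not specified. For 10 sensors this means 2^10 − 1 = 1023 candidate subsets, each requiring a full training and evaluation.

```python
from itertools import combinations


def best_sensor_subset(sensors, score, energy, budget):
    """Exhaustive search over non-empty sensor subsets under an energy budget.

    score(subset) -> validation accuracy for a model trained on that subset
    energy[s]     -> (hypothetical) power cost of sensor s
    Returns (subset, accuracy, cost) for the most accurate affordable subset;
    among equally accurate subsets, the cheaper one wins.
    """
    best = None
    for k in range(1, len(sensors) + 1):
        for subset in combinations(sensors, k):
            cost = sum(energy[s] for s in subset)
            if cost > budget:
                continue  # skip subsets that exceed the energy budget
            acc = score(subset)
            # Compare by accuracy first, then prefer the lower cost
            if best is None or (acc, -cost) > (best[1], -best[2]):
                best = (subset, acc, cost)
    return best
```

Lowering the budget trades a small accuracy loss for fewer active sensors, which mirrors the battery trade-off discussed in the results.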

Data Collection and Processing
As mentioned in Section 2, the smartphone sensor data were collected by six users covering all eight modes. Each user used a different Android smartphone, and the same application was used to read the sensor outputs in all measurements. The sensor sampling rate was set to 10 Hz; data from devices with a higher sampling rate were resampled to 10 Hz. Non-overlapping windows of different lengths were applied to the datasets for the feature extraction process. The total amount of data collected and the number of windows extracted for each mode are presented in Table 1.
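Resampling higher-rate recordings to the common 10 Hz rate could look like the sketch below. Linear interpolation onto a uniform time grid is an assumption, since the resampling method is not stated; plain interpolation does not low-pass filter, so aliasing is possible when downsampling, and a filtering resampler such as `scipy.signal.resample_poly` is an alternative.

```python
import numpy as np


def resample_to_rate(t, X, rate_hz=10.0):
    """Linearly interpolate samples X (shape (T, d)) taken at increasing
    times t (seconds) onto a uniform grid at rate_hz."""
    t = np.asarray(t, dtype=float)
    X = np.asarray(X, dtype=float)
    # Small epsilon so the last timestamp is included when it falls on the grid
    grid = np.arange(t[0], t[-1] + 1e-9, 1.0 / rate_hz)
    cols = [np.interp(grid, t, X[:, j]) for j in range(X.shape[1])]
    return grid, np.column_stack(cols)
```

For example, one second of 50 Hz data (51 timestamps from 0 to 1 s) becomes 11 samples at 10 Hz.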

Classification Process
To perform the classification, feature extraction (see Section 2.2) is applied to the time windows (Section 2.1) constructed from the raw sensor measurements. The performance of the proposed algorithm is evaluated through a series of classification tests and comparisons. We compare four machine learning classifiers: K Nearest Neighbours (K-NN) [6], Decision Tree [7], Random Forest [8], and XGBoost [9].
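A cross-validated comparison of the four classifier families can be sketched with scikit-learn. The synthetic data from `make_classification` only stands in for featureX and the eight labels, the hyper-parameters are placeholders rather than tuned values, and scikit-learn's `GradientBoostingClassifier` is swapped in for XGBoost so the sketch has no extra dependency (`xgboost.XGBClassifier` could replace it if installed).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier


def demo_data(seed=0):
    """Synthetic stand-in for featureX with eight labels (not real sensor data)."""
    return make_classification(n_samples=400, n_features=20, n_informative=8,
                               n_classes=8, random_state=seed)


def compare_classifiers(X, y, folds=5):
    """Mean k-fold cross-validated accuracy for each classifier family."""
    models = {
        "K-NN": KNeighborsClassifier(n_neighbors=5),
        "Decision Tree": DecisionTreeClassifier(random_state=0),
        "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
        # Gradient boosting stand-in for XGBoost (no extra dependency)
        "Boosting": GradientBoostingClassifier(n_estimators=50, random_state=0),
    }
    return {
        name: cross_val_score(model, X, y, cv=folds, scoring="accuracy").mean()
        for name, model in models.items()
    }
```

Calling `compare_classifiers(*demo_data())` returns a dictionary mapping each classifier name to its mean fold accuracy; with real data, featureX and the window labels would be passed instead.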
The accuracy of each classifier was defined as the fraction of correctly predicted labels out of all labels in the test set of each cross-validation fold. To validate the proposed algorithm, we also tested the accuracy for the main modes without the stairs partition. Table 2 shows the best result of each classifier under the experimental parameters. The Random Forest classifiers achieved superior results relative to the other classifiers, reaching 90.26%. Furthermore, the accuracy of the four main modes classification exceeded 96%, i.e., most misclassifications occurred between the upstairs and downstairs labels rather than between the main smartphone modes. This is confirmed by Figure 2, which shows the confusion matrices of the Random Forest classifier. Moreover, the texting mode is the least accurate for the ascent and descent division; if we consider only the other three modes, the accuracy is 91.5%. A possible reason is that in texting mode the user keeps the phone in a relatively static position, which makes the classification more challenging than in the other modes.
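The eight-label accuracy, the four-mode accuracy, and the accuracy without texting can all be computed from the same predictions. In this sketch, label strings such as `"pocket_up"` are a hypothetical naming scheme, and excluding texting is interpreted (as an assumption) as dropping windows whose true mode is texting while the classifier still predicts among all eight labels.

```python
import numpy as np


def accuracy(y_true, y_pred):
    # Fraction of correctly predicted labels
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))


def main_mode(label):
    # Hypothetical naming "<mode>_<direction>", e.g. "pocket_up" -> "pocket"
    return label.split("_")[0]


def mode_accuracies(y_true, y_pred):
    """Return (8-label accuracy, 4-mode accuracy, accuracy without texting)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    acc8 = accuracy(y_true, y_pred)
    # Collapsing up/down keeps only the main mode on both sides
    acc4 = accuracy([main_mode(l) for l in y_true], [main_mode(l) for l in y_pred])
    mask = np.array([main_mode(l) != "texting" for l in y_true])
    acc_no_text = accuracy(y_true[mask], y_pred[mask])
    return acc8, acc4, acc_no_text
```

A gap between `acc4` and `acc8` indicates that errors are mostly up/down confusions within the same mode, which is the pattern the confusion matrices reveal.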
The accuracy of the classifiers depends significantly on the features used in the classification process. Figure 3 presents the feature importance attribute of the Random Forest classifier. The most important features in the classification are related to the specific force measurements produced by the accelerometer. Other dominant features were related to the acceleration calculation produced by the accelerometer and to the atmospheric pressure measured by the barometer. The influence of the barometer was expected, since barometers are commonly used to measure elevation in many devices, and elevation changes during stairs motion. Figure 4 shows how the number of sensors used in the classification process affects its performance. The best results were obtained using 8 out of the 10 sensors in the phone. Figure 4 also shows that a different sensor subset can reduce battery consumption at the cost of only a small decrease in accuracy.

Figure 2. Confusion matrices of the Random Forest classifier out of 8 labels (a) and out of 4 labels (b) for only the main smartphone modes (1-swing, 2-pocket, 3-talking, 4-texting).
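A feature-importance ranking of this kind can be read from the impurity-based `feature_importances_` attribute of a fitted scikit-learn `RandomForestClassifier`, as sketched below. The function name is hypothetical, and impurity-based importances can be biased toward high-cardinality features, so permutation importance (`sklearn.inspection.permutation_importance`) is a useful cross-check.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def rank_features(X, y, names, top=10, seed=0):
    """Fit a random forest and return the `top` (name, importance) pairs,
    sorted by impurity-based importance (importances sum to 1)."""
    model = RandomForestClassifier(n_estimators=200, random_state=seed).fit(X, y)
    importances = model.feature_importances_
    # Indices of the largest importances first
    order = np.argsort(importances)[::-1][:top]
    return [(names[i], float(importances[i])) for i in order]
```

With featureX and its column names, the first entries of the returned list correspond to the most influential features.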

Conclusions
In this research, we addressed the problem of classifying four smartphone modes (texting, talking, swing, and pocket) during stairs movement. A classification accuracy of 90.25% was obtained when classifying the four main smartphone modes, each divided into ascending and descending stairs, a relatively unexplored area of smartphone mode recognition. In future research, we aim to use neural-network-based approaches to improve the classification accuracy.