On-Body Smartphone Localization with an Accelerometer

Smartphone users can benefit when the device adapts its behavior to their context or to the context of the device itself. In this article, we treat the position of a smartphone on the body, or in carried items such as bags, as the context of the device. The storing position of a smartphone affects how well notifications reach the user and how the embedded sensors measure, and thus plays an important role in a device’s functionality control, accurate activity recognition and reliable environmental sensing. Nine storing positions, including four types of bags, are recognized using the accelerometer of a smartphone. From 182 systematically-defined features, a set of 63 features is selected that can characterize and discriminate the motion of a smartphone terminal during walking. Leave-one-subject-out cross-validation yields an accuracy of 0.801 for the nine-class classification and 0.859 for a five-class classification in which the subclasses of trouser pockets and bags are merged. We also present a basic performance evaluation used to select the window size and classifier, as well as an analysis of the contributive features.


Introduction
Mobile phones are getting smarter due to the advancement of technologies, such as microelectromechanical systems (MEMS) and high-performance, low-power computation; such phones are called smartphones. Various sensors are embedded in or attached to a device, and a wide variety of contextual information about the user, the device and/or the environment can be extracted. These sensors are (or will be) utilized not only for explicit usage of the terminal's functionalities, like user authentication [1], display orientation change and backlight intensity control [2], but also for activity recognition [3,4], indoor person localization [5,6], pedestrian identification [7], environmental monitoring [8,9], etc. A phone carrying survey revealed that 17% of people determine the position of storing a mobile phone based on contextual restrictions, e.g., no pocket in the T-shirt, too large a phone size for a pants pocket, comfort for an ongoing activity [10]. These factors are variable throughout the day, and thus, users change the storing position during the day. This suggests that the on-body device position, as a context, has great potential for improving the usability of a smartphone and the quality of sensor-dependent services, facilitating human-human communication, reducing unnecessary energy consumption, etc. In this article, we deal with nine popular storing positions for a smartphone, including four types of bag. We attempt to find a set of features that can characterize and discriminate the motion of a smartphone during walking, using an embedded accelerometer. The contributions of the paper are as follows:

• Recognition features are analyzed from a microscopic point of view, in which a systematic feature selection specified 63 classifier-independent features that are more predictive of classes and less correlated with each other. In particular, we found that: (1) features derived from the y-axis are the most contributive; (2) the correlation between the y-axis and the magnitude of three axes, i.e., the force given to the device, might be useful to capture the characteristics of the propagated ground reaction force within nine storing positions; and (3) the selected features were also effective at classifying three additional classes, i.e., wrist, upper arm and belt.

• A "compatibility" matrix is introduced, which shows the possibility of improving the accuracy by removing the "noisy" datasets of particular persons from the training dataset and by training a classifier using datasets with similar characteristics of the acceleration of a device during walking.

• The high precision for "neck" and "trouser pocket" under leave-one-subject-out cross-validation (0.95) allows reliable placement-aware environmental risk alerts.
The rest of the article is organized as follows: Section 2 presents the importance of on-body position recognition with examples in three categories, and a literature survey is shown in Section 3. Section 4 describes our approach, followed by the performance evaluation in Section 5. Discussions based on the evaluation are presented in Section 6. Finally, Section 7 concludes the article.

Importance of On-Body Position Recognition
In this section, the importance of taking into account the on-body position of a device as the context of a system is presented.

Device Functionality Control
In our preliminary study, an audio notification was perceived at a significantly lower sound volume when a smartphone was hanging from the neck than when it was put into a trouser pocket or a jacket pocket. This may also be experienced by a number of people. The chest pocket falls between "neck" and "trouser pockets". One static solution is to set the audio level sufficiently high, so that a user can perceive it at any storing position; however, this annoys people in the vicinity when the smartphone is hanging from the neck, because the user can notice the notification at an even lower audio level at that position. Therefore, the audio volume can be adjusted to the minimum level based on the storing position, so that only the user receives the notification, as Diaconita et al. intended [11,12]. Other functionalities, such as a display component and a keypad, can be controlled to avoid power drain due to an invisible display, as well as accidental inputs when a smartphone is inside a bag or pocket [13].

Accurate Activity Recognition
A context-aware system does not work as designed when the context is not correctly recognized. In the work on activity recognition using body-mounted sensors, including smartphones, the sensing device is often assumed to be at the intended positions [3,4,14]. Pirttikangas et al. showed that an accelerometer hanging from the neck contributed to discriminating certain kinds of movement of the upper body, such as brushing teeth and sitting while reading a newspaper [4]. Atallah et al. showed the variations in activity recognition performance by the position of a body-worn sensor [15], in which sensors placed on the wrist and the chest contributed to discriminating medium-level activities, such as walking in a corridor and vacuuming. These findings imply that particular activities are not recognized accurately when the sensor is moved from the contributive position to another. In such a case, by utilizing the positional information, a system can ask a user to keep the smartphone in a chest pocket or turn the sensing component off to avoid noisy measurement, depending on application requirements.

Reliable Environmental Sensing
Smartphone-based environmental sensing is getting attention due to the popularity of smartphones and the existence of communication infrastructure [8,16], by which dense environmental information is easily collected without deploying a dedicated sensing system from scratch. The storing position of a smartphone is regarded as a key element of reliable measurement in such human-centric sensing, because the measurements are affected by storing positions [16-18]. In particular, being "outside a container" is important in the cases of noise sensing [9] and humidity/temperature sensing [18]. In [18], a difference in the readings from a relative humidity sensor and a thermometer is observed due to the effect of body heat propagation. Furthermore, the positioning information on the Earth, e.g., latitude, longitude and the orientation, is usually captured by a GPS receiver, magnetometer and gyroscope along with the target sensor measurement. Vaitl et al. [19] and Blum et al. [20] report that even these sensors are affected by the storing position. In these cases, storing positional information can be utilized to build and select models to correct the measurement or to notify a user that the device is stored in an unintended position, which is required to offer reliable sensing results.

Related Work
On-body position sensing is getting the attention of researchers in the machine learning and ubiquitous computing communities [21-23], starting from the work of Kunze et al. [24]. Table 1 summarizes the comparison of the major work on on-body device localization with our work regarding the target positions, sensor types, evaluation method, number of subjects and position recognition accuracy.
The research direction depends on the type of device that is actually realized or intended to be utilized in the future: a wearable device [23-25] or a smartphone [11-13,17,22,26-29]. The type of device relates to the selection of target positions. In the wearable device approach, the target positions range from the head to the ankle, including fine-grained discrimination, such as upper arm vs. forearm and shin vs. thigh [23]. A device is usually attached firmly using a belt or a special mounting fixture. This indicates that the orientation of the device is unlikely to change irregularly and frequently within a specific activity, although small displacements might occur during activities [30]. By contrast, a smartphone terminal is usually stored in containers, such as the pockets of a jacket, chest and trouser pockets and a wide variety of bags, as well as in a user's hand, hanging from the neck and on a table, as surveyed in [10,27]. In this case, the degree of freedom of irregular movement in a large container, e.g., jacket pocket or handbag, would increase. In this article, we focus on smartphone localization in nine storing positions on the body and in carrying items, i.e., bags. We equally collect data from four types of bags, which is a unique aspect of our work. In existing work, the type of bag is not clearly defined [27] or limited to a backpack [11] and messenger bag [29]. Therefore, the trained classifier has a bias on the collected types of bags.
Another aspect is the modality of sensing, in which an accelerometer is dominant due to its low power operation and the availability in most commercial smartphones and wearable devices. Shi et al. [22], Alanezi et al. [28] and Incel [29] utilized a gyroscope in combination with an accelerometer, in which the combined approach slightly improved the accuracy [28,29]; however, considering the power-hungry nature of a gyroscope [31], the improvement would not justify utilizing a gyroscope. Harrison and Hudson [13] utilized a multispectral light sensor to discriminate the device position based on light components. Although the recognition system was tested with a wide variety of positions, i.e., 27, from 16 people, the robustness in real-world usage still seems to be an issue. For example, a bag made of cellular fabric might let light inside, which may produce similar light components even with active sensing. Active sensing methods were also utilized in [11,12] to regulate the environment that sensors capture. However, as pointed out by Jung and Choi [32], a vibration motor is a relatively high power component in a smartphone. Frequent activation, as in a sliding-window approach, is not a practical solution; however, activation on receiving a phone call could work well, as intended by Diaconita et al. [11,12]. An advantage of an active sensing approach seems to be that the classification performance is influenced less by differences between individuals than by the surrounding materials. We consider that this helps the data collection tasks that need great human, time and monetary resources, although data collection from many variations of material is still required.
In this article, we extend our previous work [26], while utilizing the same dataset, by: (1) introducing the magnitude of three axes of acceleration as an axis for feature calculation (Section 4.4) that is found to be effective; (2) providing an analysis of contributive features from a microscopic point of view (Section 5.3); and (3) discussing the possibility of classifier-tuning based on the analysis of the compatibility of the dataset among people (Section 5.5). Recent work by Incel [29] shows an extensive study on acceleration-based phone localization, which proposes recognition features that represent the movement, rotation and orientation of devices during diverse activities of a person, e.g., walking, sitting, biking. Furthermore, Wiese et al. [27] and Diaconita et al. [12] trained and tested with a dataset from various users' conditions in addition to walking. By contrast, as outlined in Section 4.3, we primarily recognize the device position when a person is walking, based on the thought that walking is the most frequent and consistent activity throughout the day. We have a mechanism of identifying the period of walking using constancy detection, which is intended to be applied before classification. Leave-one-subject-out (LOSO) cross-validation was carried out against an integrated dataset from 35 persons in total in [29]; however, the number of persons varies between positions (35 persons for trouser pocket, 25 for backpack, 15 for hand and 10 for messenger bag, jacket, belt and wrist), and the average number is 15.6. On the other hand, we tested with LOSO-CV with 20 persons who equally provided data from all target positions. Compared to our previous work [26], the accuracy with the new set of features is much better, by six points, while still lower than the work by Incel [29], although it is hard to compare directly because of the difference in the target positions and evaluation method, as well as the number of subjects.

On-Body Smartphone Localization Method
In this section, the method of localizing a smartphone on the body is described.

Target Positions
Nine popular positions shown in Figure 1 are selected as the targets of recognition: (1) around the neck (hanging); (2) chest pocket; (3) jacket pocket (side); (4) front pocket of trousers; (5) back pocket of trousers; (6) backpack; (7) handbag; (8) messenger bag; and (9) shoulder bag. People often carry smartphones in their hands during texting, calling, etc. We consider that such states could be detected more precisely by the application logging information of the terminal. Therefore, we excluded them in this study.
Including a bag as a storing position is technically challenging due to its diverse shape and the carrying style; however, as the survey [10] shows, a bag is a major location for storing a smartphone, especially for women (about 60%), and about 50% of them do not notice incoming calls/messages in their bags, which motivated us to detect a situation of carrying a smartphone in a bag. The four types of bags were specified as popular ones based on our observations on streets in Tokyo. We decided to recognize these types separately, rather than handle them as one single type of "bag". This is because the movement patterns that we utilize in recognizing a storing position are very different from each other, as shown in Table 2. Therefore, we considered it difficult to find powerful features to describe a general "bag". Instead, the result of fine-grained recognition can later be merged into one class "bag".

[Figure 1. Target storing positions: neck, chest pocket, jacket pocket, trousers front/back pocket, backpack, handbag, messenger bag and shoulder bag.]

Sensor Modality
A three-axis accelerometer is utilized to obtain signals that characterize the movement patterns generated by dedicated storing positions while a person is walking. As surveyed in Section 3, accelerometer-based on-body device localization is popular. By contrast, although Shi et al. showed the effectiveness of a gyroscope in storing position recognition, a gyroscope is a more power-hungry sensor than an accelerometer [31] and not popular for low-end terminals; other multi-sensor approaches, e.g., [27], may also encounter similar issues. A vibration motor-based active sensing approach, such as [11,12], is not suitable for continuous position sensing due to the power consumption of a vibration motor, although a microphone and an accelerometer are available in today's smartphones. Typical raw acceleration signals from the target positions are shown in Figure 2. Note that the x-, y- and z-axes of the accelerometer in the terminal (NexusOne) are set to the direction of width, height and thickness in portrait mode, respectively, as shown in Figure 3.

Flow of Localization
Figure 4 illustrates the data processing flow from sensor readings to an event of placement change. The localization is carried out window-by-window to recognize the class of a position from the nine candidate positions based on the similarity of the patterns of the acceleration signals. Our approach primarily recognizes the storing position of a device while a person is walking. This is in line with the principles of Vahdatpour et al. [23] and Mannini et al. [25], which are based on the thought that walking is the most frequent and consistent activity throughout the day. Nevertheless, non-periodic motions, such as jumping and sitting, can be included in the stream of the acceleration signal. Such states are eliminated based on the constancy of the acceleration signal, as proposed in [33]. For a window that is judged as "not walking", the storing position of the previous recognition result is carried over.
Once a window contains a period of walking, a feature vector is obtained, in which features are calculated against linear acceleration signals. Linear acceleration is obtained by removing gravity components from the measured signals. Sophisticated linear acceleration signal estimation methods have been proposed by combining the gyroscope and magnetometer, e.g., [34]; however, we utilize only the accelerometer for the same reason as the choice of an accelerometer as a modality of storing position recognition. We adopted the method proposed by Cho et al. [35], in which the gravity components are approximately removed from the raw acceleration signals by subtracting the mean of accelerations in a window (Formula (1)), where $a_{linear,\{x|y|z\},i}$ and $a_{raw,\{x|y|z\},i}$ indicate the i-th component of a dedicated axis of the linear acceleration signal and the raw acceleration signal, respectively. Furthermore, $\bar{a}_{raw,\{x|y|z\}}$ denotes the mean raw acceleration signals of the x-, y- and z-axes in a window.

$$a_{linear,\{x|y|z\},i} = a_{raw,\{x|y|z\},i} - \bar{a}_{raw,\{x|y|z\}} \qquad (1)$$

[Figure 4. Data processing flow: windowing, classification into 9 classes, smoothing, output (e.g., "chest pocket").]

A feature vector is then given to a nine-class classifier, which is modeled by a machine-learning technique in advance. Since the output of the classifier is window based, temporal smoothing is carried out to reject isolated outputs that differ from their neighbors. Here, majority voting is applied among successive outputs. In this way, one position recognition is performed. We have already implemented the entire process on an Android platform and confirmed that the walking detection works well; however, in this article, we focus on recognition features from a microscopic point of view, and the classification against single windows is performed, in which a dataset obtained during walking is utilized in an offline manner.
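The gravity removal of Formula (1) and the majority-vote smoothing can be sketched as follows. This is a minimal illustration, not the implementation used on the Android platform: the function names and the voting span `span` are hypothetical, since the text does not state how many successive outputs are voted on.

```python
import numpy as np
from collections import Counter


def to_linear_acceleration(raw_window):
    """Approximate linear acceleration for one window (Formula (1)).

    raw_window: array-like of shape (n_samples, 3) holding x, y, z columns.
    The per-axis mean over the window is treated as the gravity component.
    """
    raw_window = np.asarray(raw_window, dtype=float)
    return raw_window - raw_window.mean(axis=0)


def smooth_by_majority(labels, span=5):
    """Replace each window label by the most frequent label among the
    last `span` classifier outputs (span is an assumed parameter)."""
    smoothed = []
    for i in range(len(labels)):
        recent = labels[max(0, i - span + 1): i + 1]
        smoothed.append(Counter(recent).most_common(1)[0][0])
    return smoothed
```

With a span of three, for example, a single "neck" output surrounded by "chest" outputs is voted away, which is the kind of pulsed output the smoothing step is meant to reject.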

Recognition Features
We take the approach of listing candidates of features from the literature and the observation of waveforms (Figure 2), as well as selecting relevant and non-redundant features based on a machine learning technique. In addition to the three axes, i.e., x, y and z, utilized in our previous work [26], we introduce the magnitude of the acceleration signal (m) as the fourth dimension (Formula (2)).
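As a small sketch of the fourth axis, the magnitude can be computed per sample as the Euclidean norm of the three axes. It is assumed here that the norm is taken over the linear acceleration samples, and the function name is hypothetical.

```python
import numpy as np


def add_magnitude_axis(linear_xyz):
    """Append the per-sample magnitude m as a fourth column.

    linear_xyz: array-like of shape (n_samples, 3) with x, y, z columns.
    Assumption: m is the Euclidean norm of the three axes of each sample.
    """
    linear_xyz = np.asarray(linear_xyz, dtype=float)
    m = np.sqrt((linear_xyz ** 2).sum(axis=1))
    return np.column_stack([linear_xyz, m])
```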
We systematically calculate the candidates of features from a window of a four-dimensional vector of linear acceleration signals by the combination of feature types and the axes. In total, 182 features are obtained (38 types × 4 axes for individual axes and 5 types × 6 pairs for correlation-based features). The feature selection is described in Section 5.3. Table 3 shows the features calculated from the four axes individually. The time domain features, except for the binned distribution, are basic and popular ones in acceleration-based activity recognition. The binned distribution, however, is defined as follows: (1) the range of values for each axis is determined by subtracting the minimum value from the maximum one; (2) the range is equally divided into 10 bins; and (3) the number of values that fall within each of the bins is counted [3].
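The three steps of the binned distribution map directly onto a histogram over the observed value range. The sketch below uses NumPy's `numpy.histogram`; the function name is hypothetical.

```python
import numpy as np


def binned_distribution(values, n_bins=10):
    """Count how many samples of one axis fall into each of n_bins
    equal-width bins spanning [min, max] of the window."""
    values = np.asarray(values, dtype=float)
    counts, _ = np.histogram(
        values, bins=n_bins, range=(values.min(), values.max())
    )
    return counts
```

Each axis of a window thus contributes ten counts, one per bin.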
Regarding the frequency domain features, max_freq, fMax_freq, 3rdQ_freq, IQR_freq, 2ndMax_freq and f2ndMax_freq are specified to represent the shape of the frequency spectrum, as shown in Figure 5a. The feature maxSdev_freq is obtained in a way similar to a "sliding window average"; a subwindow with a 2.9 Hz range is created in the entire frequency spectrum to calculate the standard deviation (sdev); the subwindow is slid by 0.1 Hz throughout the frequency spectrum; and the maximum sdev is found. fMaxSdev_freq is the central frequency of the particular subwindow that gives maxSdev_freq. An example is shown in Figure 5b, where the third subwindow (sw_3) gives the largest standard deviation among the N frequency subwindows as maxSdev_freq, and the central frequency of subwindow sw_3 corresponds to fMaxSdev_freq. The size and sliding width (0.1 Hz) of the subwindow were heuristically determined. A feature calculated as the sum of squared values of frequency components (Formula (3)) is sumPower_freq (also known as "Fast Fourier Transform (FFT) energy" in [26]) [14]. The FFT entropy (entr_freq) is then calculated as the normalized information entropy of FFT component values of acceleration signals (Formula (4)), which represents the distribution of frequency components in the frequency domain [14]. Note that the frequency spectrum is equally divided into three "frequency ranges" and assigned subscripts low, mid and high, which correspond to 0.0-4.2 Hz, 4.2-8.4 Hz and 8.4-12.5 Hz, respectively. In addition, the subscript all indicates the entire frequency range of 0.0-12.5 Hz.
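The following sketch illustrates how sumPower_freq, entr_freq and maxSdev_freq/fMaxSdev_freq could be computed from one axis of a window. Several details are assumptions rather than the authors' definitions: a 25 Hz sampling rate (implied by 256 samples corresponding to 10.24 s), the use of FFT magnitudes as the "component values", normalization of the entropy by the logarithm of the number of components, and rounding of the 2.9 Hz and 0.1 Hz widths to whole FFT bins. All function names are hypothetical.

```python
import numpy as np

FS = 25.0  # Hz; assumed from 256 samples = 10.24 s


def spectrum(signal, fs=FS):
    """Return (frequencies, magnitudes) of the one-sided FFT."""
    signal = np.asarray(signal, dtype=float)
    mags = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    return freqs, mags


def sum_power(freqs, mags, lo, hi):
    """Sum of squared components in [lo, hi) Hz."""
    band = (freqs >= lo) & (freqs < hi)
    return float(np.sum(mags[band] ** 2))


def fft_entropy(freqs, mags, lo, hi):
    """Information entropy of the normalized components in [lo, hi) Hz,
    divided by log(number of components) so that it lies in [0, 1]."""
    band = mags[(freqs >= lo) & (freqs < hi)]
    total = band.sum()
    if total == 0 or len(band) < 2:
        return 0.0
    p = band / total
    p = p[p > 0]
    return float(-(p * np.log(p)).sum() / np.log(len(band)))


def max_sdev(freqs, mags, width_hz=2.9, step_hz=0.1):
    """Slide a subwindow over the spectrum and return the largest
    standard deviation and the central frequency of that subwindow."""
    df = freqs[1] - freqs[0]
    width = max(2, int(round(width_hz / df)))
    step = max(1, int(round(step_hz / df)))
    best_sd, best_f = -1.0, None
    for start in range(0, len(mags) - width + 1, step):
        sd = float(np.std(mags[start:start + width]))
        if sd > best_sd:
            best_sd = sd
            best_f = float(freqs[start:start + width].mean())
    return best_sd, best_f
```

For example, `sum_power(freqs, mags, 4.2, 8.4)` would correspond to the mid-range variant of sumPower_freq for the axis the spectrum was computed from.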
Table 4 shows the features regarding the correlation of two axes, i.e., the correlation coefficient. The correlation coefficient is represented by Formula (5), where s and t represent two axes of time series data in the time domain or those of frequency spectra in the frequency domain, and M indicates the number of samples. We expected that a (positively or negatively) high correlation indicates the characteristics of rotation in a particular storing position.
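A sketch of the pairwise correlation features is given below, assuming that the correlation coefficient is the usual Pearson coefficient (as computed by `numpy.corrcoef`). The dictionary keys are hypothetical labels, and the same function could be applied to time series or to frequency spectra of the four axes.

```python
import numpy as np
from itertools import combinations

AXES = ("x", "y", "z", "m")


def pairwise_correlations(window):
    """Correlation coefficient for every pair of the four axes.

    window: array-like of shape (n_samples, 4) with columns x, y, z, m.
    Returns a dict such as {"xy": r_xy, "xz": r_xz, ...} with 6 entries.
    """
    window = np.asarray(window, dtype=float)
    result = {}
    for i, j in combinations(range(len(AXES)), 2):
        r = np.corrcoef(window[:, i], window[:, j])[0, 1]
        result[AXES[i] + AXES[j]] = float(r)
    return result
```

Four axes give six unordered pairs, which matches the count in the text: 38 × 4 + 5 × 6 = 182 candidate features.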

Dataset
Unlike in well-established areas, such as machine vision [36] and speech recognition [37], a reference dataset, i.e., a corpus, has not yet been established in on-body device localization. Combining datasets from different device localization projects is an option to cover a wide variety of storing positions and the diversity of people; however, this approach makes it difficult to separate the dataset for each person and, thus, to carry out LOSO-CV. Furthermore, we could find only a very limited number and types of datasets publicly available for device localization [38]. Therefore, we utilized the dataset collected in our previous study [26], which was performed as summarized in Table 5. Twenty graduate/undergraduate students (2 females and 18 males) participated in the experiment, in which they were asked to walk about 5 min (30 s/trial × 10 trials) for each storing position. We asked the participants to walk as usual, so that the data could be collected from a naturalistic condition, and no special instruction about the orientation of the device was given. They wore their own clothes; we only lent them clothes in the case that they did not have clothes with pockets. Regarding bags, we utilized one typical bag for each category of bag, and we asked the participants to carry bags as designed; that is, for example, carrying the handbag with one hand, not slinging it over the shoulder like a "shoulder bag". In total, we obtained about 150,000 samples per position. The applicability to other datasets in terms of different activities and other positions will be examined in Sections 5.6 and 5.7 using a dataset [38].

Basic Performance Evaluation
We compared the combinations of the window size and the classifier (classification algorithm), which are important tuning parameters in the recognition task.

Method
Three classes of window size were tested, i.e., 128, 256 and 512, which correspond to 5.12, 10.24 and 20.48 s, respectively. A window is generated by sliding 25 samples (1.00 s) in the data sequences. Regarding the classifier, we utilized five types of classifiers: (1) J48 tree as a decision-tree method; (2) naive Bayes as a Bayesian method; (3) a support vector machine (SVM) classifier; (4) multi-layer perceptron (MLP) as an artificial neural network-based method; and (5) RandomForest as an ensemble learning method. Here, the number of trees in RandomForest was set to 50. Ten-fold cross-validation (10-fold CV) was utilized to understand the basic classification performance, which is often utilized except for the active sensing approach (see Table 1) [13,17,22-24,28]. The Weka machine learning toolkit (version 3.6.9, University of Waikato, Hamilton, New Zealand) [39] was utilized, and the specific parameters for classifiers in Weka are summarized in Table 6.
Note that, prior to the evaluation, feature selection was performed to reduce the number of features for high generalization (avoiding overfitting to the training data) and lightweight computation.The number of selected features is 62, 63 and 61 among 182 features for window sizes of 128, 256 and 512, respectively.This means that the feature dimension was reduced to 1/3 of the original feature set.The details of the feature selection are described in Section 5.3.
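The windowing described in this section can be sketched as a simple sliding window over one recording. The function name is hypothetical, and it is an assumption that windows are cut within a single recording rather than across trial boundaries.

```python
import numpy as np


def sliding_windows(samples, size=256, step=25):
    """Cut a recording into overlapping windows of `size` samples,
    advancing by `step` samples (25 samples = 1.00 s at 25 Hz)."""
    samples = np.asarray(samples)
    return [
        samples[start:start + size]
        for start in range(0, len(samples) - size + 1, step)
    ]
```

Under these assumptions, a 30 s trial sampled at 25 Hz (750 samples) yields 20 windows of 256 samples.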

Results and Analysis
Table 7 summarizes the classification accuracy for each window size and classifier. Here, the classification accuracy is defined by Formula (6). From the table, we can see that the accuracy basically gets higher as the window size grows and that the SVM and RandomForest classifiers performed the best, with an accuracy of 0.999. Taking into account the ease of parameter tuning and the processing speed, we decided to utilize RandomForest in later experiments. Regarding the window size, the accuracy of the RandomForest classifier appears to saturate at 256. The window size has an impact not only on the computational cost of features, but also on the reactivity to signal changes. That is, the classifier may fail to decide the correct class on a window if a position change occurs within the window; the duration of incorrect classification depends on the size of the window, i.e., a smaller window shortens the duration in which mixed patterns appear. Therefore, we take 256 as the window size for the later experiments, which is 10.24 s.

$$\text{accuracy} = \frac{\text{The number of correct classifications}}{\text{The number of total classifications}} \qquad (6)$$

Feature Selection
In this section, we describe the method of feature selection, in which the result is focused on the window size of 256.

Method
We utilized correlation-based feature selection (CFS) [40]. CFS has a heuristic evaluation function, merit, which can specify the subset of features that are highly correlated with classes, i.e., more predictive of classes, but uncorrelated with each other, i.e., more concise. As described in Section 4.4, a large number of features were listed, which may contain redundant features. Therefore, we considered that the capability of CFS was suitable for this problem. The forward selection algorithm was utilized to generate a ranking on feature subsets, which begins with no features and greedily adds features one by one. Note that CFS is a classifier-independent method of feature selection.
In the feature selection process, the window sliding width was set to 64 samples, while the other evaluations (Sections 5.2 and 5.4) were carried out with a sliding width of 25. Because the windows generated with the two sliding widths mostly do not coincide, the feature values used in the evaluations differ from those used for feature selection, which makes the evaluations fairer than an experiment that utilizes the same sliding width as feature selection.
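To make the merit heuristic concrete, the sketch below implements the CFS merit of a subset of k features, k·r_cf / sqrt(k + k(k − 1)·r_ff), where r_cf is the mean feature-class correlation and r_ff is the mean feature-feature correlation, together with greedy forward selection. The correlations are passed in as precomputed inputs; how they are estimated (Weka's implementation is not reproduced here) and the function names are assumptions of this sketch.

```python
import numpy as np


def cfs_merit(subset, r_cf, r_ff):
    """CFS merit of a feature subset.

    subset: list of feature indices.
    r_cf:   feature-class correlations, one value per feature.
    r_ff:   feature-feature correlations, square matrix.
    """
    k = len(subset)
    mean_cf = np.mean([r_cf[i] for i in subset])
    if k == 1:
        return float(mean_cf)
    mean_ff = np.mean(
        [r_ff[i][j] for a, i in enumerate(subset) for j in subset[a + 1:]]
    )
    return float(k * mean_cf / np.sqrt(k + k * (k - 1) * mean_ff))


def forward_selection(r_cf, r_ff):
    """Start with no features and greedily add the feature that gives the
    highest merit; return the (subset, merit) pair after each step."""
    remaining = list(range(len(r_cf)))
    subset, history = [], []
    while remaining:
        best = max(
            remaining, key=lambda f: cfs_merit(subset + [f], r_cf, r_ff)
        )
        subset.append(best)
        remaining.remove(best)
        history.append((list(subset), cfs_merit(subset, r_cf, r_ff)))
    return history
```

The subset with the highest merit in the returned history would then be taken as the selected feature set, which mirrors choosing the peak of a merit-versus-subset-size curve.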

Results and Analysis
Figure 6 shows the relationship between the size of the feature subset and the merit score of the feature subset. From the figure, three phases in the relationship are found: (1) the quick increase with up to 10 features; (2) the slight increase up to 63 features; and (3) gradual degradation to the end. Therefore, we specified a feature subset with 63 features that provides the highest merit score. Table 8 summarizes the list of selected features, in which rank-N indicates the order of participation in the selected feature subset. Furthermore, to summarize the contribution of categories, such as axis and domain from Table 8, the medians of the rank are shown in Table 9. With respect to an individual axis, the y-axis is most contributive to classification. We consider that this is because a ground reaction force mainly influences the vertical direction, which is the y-axis in the usual cases of neck, chest pocket and trouser pockets. The propagated ground reaction force may have different acceleration patterns in such storing positions.
The correlation-based features (corr_time|freq) generally performed well, as shown in Table 9, i.e., the rows of the "median of rank" in the upper part (26.5) and the "proportion of definition" in the lower part (more than 0.4). The effectiveness of the correlation between the magnitude of linear acceleration m and the other axes, e.g., corr_time,my, indicates that the force is dominantly given to the axis. A scatter plot in Figure 7a shows the distribution of the value of corr_time,my, in which (1) "neck" and (2) "chest pocket" have a clear negative and/or positive correlation between the y- and m-axes. We consider that this is because a smartphone stored in these positions basically faces toward the front in the portrait orientation and moves up-and-down due to the strong influence of the ground reaction force. By contrast, the high correlation between the x-, y- and z-axes represents rotational motions. For example, the high correlation between the x- and y-axes indicates a motion around the yaw angle when a smartphone is placed in portrait orientation. We consider that such a yaw angle motion might be well observed when a terminal is put in the trouser pocket, because a terminal in portrait orientation swings with the motion of the legs. We also consider that this is a reason why a weak correlation is observed in positions, except for "neck" and "chest pocket", in Figure 7a. Similarly, there might be particular linear and rotational motion patterns in each storing position. The effectiveness of rotational elements is consistent with the findings in [29], in which rotational information, i.e., pitch and roll, was calculated per sample, and some features, such as "mean" and "root mean square", were calculated in a window of such rotational information. In this case, the degree of rotational change is utilized to characterize the storing positions. We consider that our correlation-based features represent the level of dominance of the rotational axis in a window for specific storing positions, which is regarded as another aspect of the classification feature.
Regarding the comparison between the domains, 44 out of 63 features originated from the frequency domain, which indicates that the features obtained from the frequency domain are contributive.
In particular, eight of the top 10 features originated from the frequency domain, as shown in Table 8; among them, three "sum of power" (sumPower_freq) features and two "frequency entropy" (entr_freq) features were ranked within the top 10.
As described in Section 4.4, sumPower_freq represents the intensity of movement in a certain time window, while entr_freq is a measure of the distribution of the frequency components. The ground reaction force propagated through the body and the container of a smartphone might differ in intensity depending on the storing position. Figure 7b shows the distribution of the value of sumPower_freq,mid,y, where large values can be found in (4) the trouser front pocket and (5) the trouser back pocket. We consider that this is because the ground reaction force is directly transmitted to the trouser pockets. Regarding entropy, Figure 7c is an example (entr_freq,low,z), where the frequency entropy of "neck" is relatively high. This might indicate that the signal obtained from the z-axis at the "neck" contains diverse frequency components with relatively uniform power. As inferred above, the y-axis is the dominant axis at the "neck", and conversely, the z-axis is subject to disordered force.
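The two frequency-domain features above can be sketched as follows. This is a hedged illustration rather than the article's definition: the naive DFT, the band edges passed as bin indices, and the normalization of the Shannon entropy to the range [0, 1] are all assumptions made for the example.

```python
import cmath
import math

def power_spectrum(window):
    """One-sided power spectrum via a naive DFT (DC component excluded)."""
    n = len(window)
    spectrum = []
    for k in range(1, n // 2 + 1):
        coeff = sum(s * cmath.exp(-2j * math.pi * k * t / n)
                    for t, s in enumerate(window))
        spectrum.append(abs(coeff) ** 2)
    return spectrum

def sum_power(spectrum, lo, hi):
    """Sum of power over bins lo..hi-1 (band edges are hypothetical)."""
    return sum(spectrum[lo:hi])

def frequency_entropy(spectrum, lo, hi):
    """Shannon entropy of the normalized power in bins lo..hi-1, scaled to [0, 1]."""
    band = spectrum[lo:hi]
    total = sum(band)
    if total == 0 or len(band) < 2:
        return 0.0
    probs = [p / total for p in band if p > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return entropy / math.log(len(band))

n = 32
# A single sinusoid concentrates its power in one bin: low entropy.
tone = [math.sin(2 * math.pi * 4 * t / n) for t in range(n)]
# An impulse spreads its power uniformly over all bins: high entropy.
impulse = [1.0] + [0.0] * (n - 1)

tone_spec = power_spectrum(tone)
impulse_spec = power_spectrum(impulse)
tone_entropy = frequency_entropy(tone_spec, 0, 16)
impulse_entropy = frequency_entropy(impulse_spec, 0, 16)
tone_power = sum_power(tone_spec, 0, 16)
```

A signal with diverse frequency components of relatively uniform power, as suggested for the z-axis at the "neck", behaves like the impulse case and yields an entropy near the upper end of the range.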

Method
The performance evaluation based on n-fold CV shows an optimistic result because, in theory, (n − 1)/n of the data from each person are included in the training dataset, and hence, the classifier mostly "knows" about the subjects in advance. To examine the robustness of the recognition system across individuals, we carried out LOSO cross-validation with the same dataset as in the 10-fold CV. LOSO-CV is carried out by testing a dataset from a particular person with a classifier that is trained without any data from that person. The result of the LOSO test can represent the performance in a realistic situation, such as when a person purchases an on-body placement-aware functionality from a manufacturer or a third party, because the data from that particular person are not utilized to train the classifier. Therefore, LOSO-CV is regarded as a fairer and more practical test method, which has recently been getting attention [26][27][28][29].
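The LOSO splitting procedure described above can be sketched as follows. The tuple layout `(subject_id, features, label)` and the function name `loso_folds` are assumptions for illustration; the sketch only shows how the folds are formed, not how a classifier is trained.

```python
def loso_folds(samples):
    """Yield (held_out_subject, train, test) for leave-one-subject-out CV.

    `samples` is a list of (subject_id, features, label) tuples; this layout
    is an assumption made for the sketch.
    """
    subjects = sorted({s[0] for s in samples})
    for held_out in subjects:
        # Training data never contain windows from the held-out subject.
        train = [s for s in samples if s[0] != held_out]
        test = [s for s in samples if s[0] == held_out]
        yield held_out, train, test

# Toy data (made-up values): three subjects with one or two windows each.
samples = [
    ("A", [0.1], "neck"), ("A", [0.9], "jacket"),
    ("B", [0.2], "neck"), ("B", [0.8], "jacket"),
    ("C", [0.3], "neck"),
]
folds = list(loso_folds(samples))
```

Each fold keeps every window of one subject out of training, which is the property that distinguishes LOSO-CV from n-fold CV over pooled windows.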

Results and Analysis
Table 10 summarizes the confusion matrix of the average number of classified results per person. Here, recall and precision are defined by Formula (7) and Formula (8), respectively. The average accuracy per person is 0.805 with a maximum of 0.977, a minimum of 0.610 and a median of 0.828. Compared to the work in [26], which does not contain the m-axis and utilized a different classifier, i.e., SVM, the average accuracy was improved by 0.059 (5.9 percentage points). "Neck" was classified very accurately, while "jacket" was the most difficult case. The shape and the size of jacket pockets are relatively diverse and large. Furthermore, the bottom of a jacket sometimes flaps as a person walks, which makes the movement of a smartphone diverse. We consider that this is a reason why the recall of "jacket" is low, i.e., 0.633. Additionally, the positions on the body are similar to each other in the case of "jacket" and "shoulder bag", as shown in Figure 1. Such similarity of position might cause the movement of a smartphone to be misjudged. Moreover, the table shows two groups of frequent misclassification, i.e., (1) trouser front and back pockets and (2) backpack, handbag, messenger bag and shoulder bag. Taking into account the semantic similarity between "trouser front pocket" and "trouser back pocket", these two classes are merged into a higher level of positional context, "trousers (pockets)". Similarly, the four types of bag are integrated into "bags". Table 11 shows the confusion matrix after the merge operation, in which the merged rows and columns were averaged. As a result, the accuracy was improved to 0.859.
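To make the merge operation concrete, the sketch below computes per-class recall and precision from a confusion matrix and merges semantically similar classes. It assumes the standard definitions of recall and precision, since Formulas (7) and (8) are not reproduced here, and it sums the counts of merged classes rather than averaging rows and columns as in Table 11; the toy matrix is made up.

```python
def recall_precision(cm, labels):
    """Per-class (recall, precision); cm[i][j] counts true class i predicted as j."""
    result = {}
    for k, name in enumerate(labels):
        tp = cm[k][k]
        row_total = sum(cm[k])                 # all windows of true class k
        col_total = sum(row[k] for row in cm)  # all windows predicted as k
        result[name] = (tp / row_total if row_total else 0.0,
                        tp / col_total if col_total else 0.0)
    return result

def merge_classes(cm, labels, groups):
    """Merge classes by summing counts; groups maps merged name -> member names."""
    target = {name: name for name in labels}
    for merged_name, members in groups.items():
        for member in members:
            target[member] = merged_name
    new_labels = []
    for name in labels:
        if target[name] not in new_labels:
            new_labels.append(target[name])
    index = {name: i for i, name in enumerate(new_labels)}
    merged = [[0] * len(new_labels) for _ in new_labels]
    for i, true_name in enumerate(labels):
        for j, pred_name in enumerate(labels):
            merged[index[target[true_name]]][index[target[pred_name]]] += cm[i][j]
    return merged, new_labels

def accuracy(cm):
    total = sum(sum(row) for row in cm)
    return sum(cm[k][k] for k in range(len(cm))) / total if total else 0.0

# Toy 3-class matrix (made-up counts): front and back pockets confuse each other.
labels = ["neck", "trouser front pocket", "trouser back pocket"]
cm = [[10, 0, 0],
      [0, 6, 4],
      [0, 3, 7]]
merged_cm, merged_labels = merge_classes(
    cm, labels, {"trousers": ["trouser front pocket", "trouser back pocket"]})
```

In this toy case the confusion is confined to the two pocket classes, so merging them removes every error; with real data the gain is smaller, as the step from the 9-class to the merged 5-class result shows.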
Let us analyze the variation of classification performance among individuals. Figure 8 shows the individual accuracies in sorted order. Based on the fact that the median accuracy of the 9-class classification is larger than the average accuracy, we consider that there are some persons whose accuracies are very low. The figure implies that the classification for 6 persons, i.e., Persons J, M, D, T, G and B, degraded the overall accuracy. The common characteristics of these 6 persons are basically consistent with what was described above, i.e., large confusion within "bags" and "trousers", as well as confusion between "jacket" and "bags".

Compatibility Analysis
The compatibility of the classifier among subjects was analyzed.

Method
A classifier was trained for each individual and tested with the datasets from the remaining persons; this was repeated for all persons.

Results and Analysis
Table 12 shows the "compatibility matrix". The number placed in a cell with row i and column j is the average accuracy against the dataset from person j when tested with a classifier trained with the dataset from person i. For example, the value 0.69 at (A, E) means that the dataset from Person E was classified with an accuracy of 0.69 by a classifier trained with the dataset from Person A. An exception is the values on the diagonal, for which training and testing were carried out against the dataset from the same person by 10-fold CV. Therefore, the diagonal is considered to be an ideal case, in which a classifier is personalized for the person. In the table, the values 0.0 and 1.0 are shown in white and black, respectively, and the values in between are shown in shades of gray.
The rightmost column is the average of the values on each row, which indicates how well a classifier trained with the dataset from a particular person fits other persons. Therefore, the value can be referred to as "average fitness". Here, the classifier trained with Person A's dataset is the best fit, i.e., 0.58 on average, while the classifier trained with Person G's dataset is the least suitable one (0.35). In training a single classifier, reducing the weight on the datasets from persons whose average fitness is low, e.g., Persons C, D and G, would improve the LOSO-CV accuracy.
By contrast, the analysis of the averages per column suggests the possibility of selective classifier tuning. The bottom row is the average of each column, which represents the generality of the dataset from a person. The best-classified dataset on average is the one from Person E (0.57), while Person J's dataset was not classified well on average by the classifiers trained with the datasets from others (0.37). The classifiers trained with the datasets from Persons B, I and L did not perform well against the dataset from Person J, with accuracies of 0.18, 0.21 and 0.22, respectively. By contrast, the classifiers trained with the datasets from Persons A, N and S classified Person J's dataset relatively well (0.54, 0.62 and 0.51, respectively). This suggests that the LOSO-CV accuracy might be improved if a classifier can be tuned for a person using datasets from others who have similar characteristics.
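The row and column averages of such a compatibility matrix can be sketched as below. The matrix values are made up, the function names are hypothetical, and excluding the diagonal from both averages is an assumption, since the text does not state how the personalized 10-fold CV values are treated. Note also that ranking trainers for a target person, as `best_trainers` does, presupposes labeled data from that person, which is not available at first use.

```python
def average_fitness(matrix, persons):
    """Row average: how well a classifier trained on person i fits the others."""
    return {i: sum(matrix[i][j] for j in persons if j != i) / (len(persons) - 1)
            for i in persons}

def generality(matrix, persons):
    """Column average: how well person j's data are classified by the others."""
    return {j: sum(matrix[i][j] for i in persons if i != j) / (len(persons) - 1)
            for j in persons}

def best_trainers(matrix, persons, target, k):
    """The k persons whose classifiers score highest on the target's dataset."""
    others = [p for p in persons if p != target]
    return sorted(others, key=lambda p: matrix[p][target], reverse=True)[:k]

# Toy matrix (made-up values): matrix[i][j] is the accuracy on person j's
# data obtained by a classifier trained on person i's data.
persons = ["P1", "P2", "P3"]
matrix = {
    "P1": {"P1": 0.90, "P2": 0.60, "P3": 0.40},
    "P2": {"P1": 0.50, "P2": 0.95, "P3": 0.30},
    "P3": {"P1": 0.20, "P2": 0.40, "P3": 0.90},
}
fitness = average_fitness(matrix, persons)
general = generality(matrix, persons)
top_for_p3 = best_trainers(matrix, persons, "P3", 1)
```

A low row average flags a dataset that might be down-weighted when training a single classifier, while the per-column ranking corresponds to the idea of tuning a classifier with datasets from similar persons.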

Storing Position Recognition during Periodic Motions other Than Walking
In this article, we focused on the recognition of the storing position of a device during walking, based on the thought that walking is the most frequent and consistent activity throughout a day. As described in Section 4.3, a preprocessing step was employed to pass through only segments of periodic motion, which we regard as walking periods. However, due to the characteristics of the algorithm, other periodic motions, such as "jogging" and "biking", could still be passed through. To understand the robustness of the recognition against such accidental cases, a small-scale experiment was carried out.

Method
A classifier was trained with our original dataset (see Section 5.1), which was obtained from 20 people during walking, using the selected 63 features. As a dataset for testing during other activities, the dataset collected in [38] was again utilized. We removed "standing" and "sitting" because these two non-periodic activities can be easily filtered out at the preprocessing stage based on constancy detection. In addition, although the dataset was collected from four positions, i.e., "trouser front pocket", "upper arm", "wrist" and "belt", we only utilized "trouser front pocket", because it was included in the original dataset. Classification was carried out per person, and the recalls (Formula (7)) were averaged.

Results and Analysis
Table 13 summarizes the recalls for "trouser front pocket" and the names of the most confused classes with their recalls.
A window obtained from "trouser front pocket" during biking was misrecognized as "jacket" with a recall of 0.321. We consider that this is because the hem of a jacket in which a device is stored touches the thigh during biking, which made the movement of the device in the trouser front pocket similar to that of a device in the jacket. Regarding stepping up and down activities, the recall for "downstairs" was quite low at 0.150, while that for "upstairs" was relatively high at 0.815. We consider that the difference comes from the impact with the ground. Compared to stepping up, the motion of stepping down differs more from walking due to the strong downward force, which might have made the recognition difficult.
The compatibility matrix presented in Section 5.5 suggests that the LOSO-CV accuracy might be improved if a classifier can be tuned for a person using datasets from others who have similar characteristics. The selection need not be person-based. Instead, tuning a classifier based on the selection of an appropriate subset from all data might work better. In either case, the dataset for tuning a classifier needs to be identified when a person starts utilizing the system for the first time, which is a challenging issue, because no class label, i.e., storing position, is given at the first use. We will examine the possibility of identifying an appropriate dataset based on position-independent variables, such as walking frequency.

Storing Position Recognition during Various Activities
As described in Section 4.3, when non-periodic motions, such as "standing" and "sitting", are detected, the previous decision made during a periodic motion is carried over. Therefore, an issue to be considered is the occurrence of periodic motions other than the original "walking", although we take the stance that walking is the most frequent and consistent activity throughout the day. In Section 5.7, the data labeled as "trouser front pocket" were utilized to evaluate whether they are correctly recognized during various periodic motions. The result showed that "biking" and "stepping down stairs" were difficult to handle when a device is put into the trouser front pocket.
A functionality for filtering out periods of biking activity needs to be investigated by paying attention to the key difference between biking and the "walking"-related activities. We consider that the key difference is the presence of the influence of the ground reaction force. Once a period of biking activity is identified, it can be handled in the same way as non-periodic motions, i.e., by carrying over the previous decision. Note that we can ignore the case of upper body positions, because a device stored in a chest pocket, for example, moves less periodically than one in a trouser pocket and is easily filtered out by the current preprocessing.
Regarding stepping up and down activities, especially in the case of "downstairs", the difference in acceleration from "walking" might not be as large as in "biking". Therefore, it might be difficult to filter out; however, a workaround is to apply temporal smoothing over a longer time span, because stepping up and down activities last for a couple of minutes at most.
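One possible form of the temporal smoothing and decision carry-over mentioned above is a majority vote over recent window-level decisions. The following is a minimal sketch under assumptions: the history length, the use of `None` to mark a window rejected by preprocessing, and the function name are all hypothetical and not taken from the article.

```python
from collections import Counter, deque

def smooth_decisions(decisions, history=5):
    """Majority vote over the last `history` accepted window-level decisions.

    A decision of None marks a window rejected by preprocessing (assumed
    convention); the previous smoothed output is carried over in that case.
    """
    recent = deque(maxlen=history)
    smoothed = []
    last = None
    for decision in decisions:
        if decision is not None:
            recent.append(decision)
            # On ties, the label seen first within the history wins.
            last = Counter(recent).most_common(1)[0][0]
        smoothed.append(last)
    return smoothed

# Toy sequence (made-up): two spurious "jacket" outputs and one rejected window.
raw = ["trousers", "trousers", "trousers", "jacket",
       "trousers", None, "jacket", "trousers"]
stable = smooth_decisions(raw, history=5)
```

A short history reacts quickly to a real change of storing position, whereas a longer one suppresses brief misrecognitions such as those during stair descent; the appropriate length would have to be determined experimentally.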

Valid Applications with the Current Recognition Performance
The importance of on-body position recognition is described in Section 2. As shown in Table 11, "neck" and "trouser pockets" were classified very well, which is suitable for a class of applications that monitor environmental conditions, such as temperature and humidity. The measurement from the neck often differs from that from the trouser pockets due to the effect of body heat and sweat [18]. An application can take an appropriate action, e.g., correcting the measured value to the value assumed to be measured outside the pocket or alerting the user, when the monitoring device (smartphone) is inside a trouser pocket. Furthermore, a placement-aware audio volume adaptation would work well.
Moreover, the high precision for "neck" allows a sensor placement-aware activity recognition to recognize activities related to the upper part of the body, e.g., brushing teeth [4].A position-specific activity recognizer might be chosen in the case that the position recognition result is reliable, i.e., a position with high precision, such as the "neck"; by contrast, a common recognizer can be utilized against positions with low precision, such as "jacket" and "bag", in order to avoid significant degradation of the recognition due to the wrong choice of a recognizer.
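The recognizer selection policy described above can be sketched as a simple threshold rule. The threshold value, the precision figures and the recognizer names below are hypothetical placeholders, not values reported in the article.

```python
def choose_recognizer(position, precision_by_position, threshold=0.9):
    """Pick a position-specific recognizer only when the position is reliable.

    `precision_by_position` maps a recognized position to the precision of
    the position classifier for that class (placeholder values below).
    """
    if precision_by_position.get(position, 0.0) >= threshold:
        return f"{position}-specific"
    return "common"

# Placeholder precisions, for illustration only.
precisions = {"neck": 0.95, "jacket": 0.60, "bag": 0.70}
neck_choice = choose_recognizer("neck", precisions)
jacket_choice = choose_recognizer("jacket", precisions)
```

Falling back to the common recognizer for low-precision positions limits the damage caused by a wrong position estimate, at the cost of not exploiting position-specific models for those classes.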

Conclusions
In this article, we proposed a method of localizing a smartphone on the body. An accelerometer is utilized to recognize the storing position from nine candidate positions based on the similarity of acceleration patterns during walking. We systematically defined 182 features calculated from the axes of an accelerometer, including the magnitude of the x-, y- and z-axes. As a result of correlation-based feature selection, 63 contributive features were selected, which are more predictive and less redundant than the remaining 119 features. Through the analysis of the contribution of each feature, we found that: (1) the features originating from the y-axis are the most contributive; (2) the features calculated based on the correlation between two axes generally performed well compared to single-axis-originated ones, and the correlation between the magnitude axis and one of the other three axes is especially powerful; and (3) the features in the frequency domain are more powerful than the ones in the time domain, especially the sum of power and the frequency entropy. These findings would contribute to defining further features that improve the position recognition performance. Furthermore, the selected features were proven to be effective against new positions, i.e., "wrist", "upper arm" and "belt", which were not considered during the selection process.
The LOSO-CV evaluation with 20 subjects showed that the accuracy of the nine-class classification was 0.801. Meanwhile, the accuracy of the merged-class classification was 0.859, in which trouser front and back pockets were integrated into one category of "trouser pocket", and the four types of bags were merged into "bag". Although a fair comparison with existing work is not practical due to the diversity of the system and environmental parameters, we consider that the accuracy falls into the category of being good. The "compatibility matrix" showed the possibility of improving the LOSO-CV accuracy by selecting an appropriate dataset prior to training a single classifier or by customizing a classifier for each unknown user on the fly. In addition, we need to make the system robust against various activities that appear in daily life to improve the accuracy of recognition.

Figure 2. Raw acceleration signals from the nine target positions of a person during walking.

Figure 3. The definition of the axes of an accelerometer in an Android smartphone.

Figure 4. Localization process: the components with a dotted-line have been implemented, but are not the focus of this article.

Figure 5. Features obtained in the frequency domain.

Figure 6. The relationship between the size of the feature subset and the merit score (window size: 256).

Figure 7. Scatter plot of the distribution of features by target positions. The thick and dark portion indicates dense plots.

Figure 8. Classification accuracy per person sorted by the value.

Table 1. Related work on on-body localization of a device. The brackets in the accuracy column indicate the condition of the evaluation. LOSO, leave-one-subject-out.

Table 3. Classification features (x-, y- and z-axes and the magnitude (m) of the three axes).

Table 4. Classification features based on correlation coefficients between two axes.

Table 5. Condition of data collection.

Table 7. Basic performance: classification accuracy vs. window size and classifier (10-fold CV).

Table 8. Selected features with ranks.

Table 9. Median of the rank and the ratio of selected features for each category. Note: in the case of an even number of features, the average of the two central successive values was utilized.

Table 10. Confusion matrix of LOSO-CV for the 9-class classification (averaged per person).

Table 11. Confusion matrix of LOSO-CV for the merged 5-class classification (averaged per person).

Table 13. Averaged recalls per person against the dataset obtained during various activities.