Travel Mode Detection with Varying Smartphone Data Collection Frequencies

Smartphones are becoming increasingly popular day-by-day. Modern smartphones are more than just calling devices. They incorporate a number of high-end sensors that provide many new dimensions to smartphone experience. The use of smartphones, however, can be extended from the usual telecommunication field to applications in other specialized fields including transportation. Sensors embedded in the smartphones like GPS, accelerometer and gyroscope can collect data passively, which in turn can be processed to infer the travel mode of the smartphone user. This will solve most of the shortcomings associated with conventional travel survey methods including biased response, no response, erroneous time recording, etc. The current study uses the sensors’ data collected by smartphones to extract nine features for classification. Variables including data frequency, moving window size and proportion of data to be used for training, are dealt with to achieve better results. Random forest is used to classify the smartphone data among six modes. An overall accuracy of 99.96% is achieved, with no mode less than 99.8% for data collected at 10 Hz frequency. The accuracy is observed to decrease with decrease in data frequency, but at the same time the computation time also decreases.


Introduction
Household trip data are of crucial importance for managing present transportation infrastructure as well as to plan and design future facilities. They also provide basis for new policies implemented under Transportation Demand Management (TDM). The methods used for household trip data collection have changed with passage of time, starting with the conventional face-to-face interviews or paper-and-pencil interviews in the 1950s. High cost and safety issues proved to be the major problems in this approach. To overcome such disadvantages, computer assisted surveys were introduced in the 1980s. These surveys included computer-assisted telephone interview (CATI) and computer-assisted self-interview (CASI) [1,2]. The computer assisted surveys proved to be an improvement from the previous face-to-face interviews [3] but the underlying shortcomings in person trip (PT) data collection methods still remained. These included inaccuracies in recording the starting and ending times, underreporting due to missing short trips and non-response [4,5]. The source of all these problems was the enormous burden on the respondents to answer a huge number of questions based on their memories. To address this issue, GPS technology was employed during the late 1990s, providing the starting point for a generation of smart travel survey methods [6].
Initially, GPS surveys were carried out as supplementary surveys to assess the accuracy of traditional methods, but later total replacement was experimented with [7][8][9]. At the beginning, GPS [...]
Sensors 2016, 16, 716
[...] initiated by an alliance between Singapore and Massachusetts Institute of Technology (MIT). The study validated that participants tend to over-estimate the travel time in traditional surveys.
Smartphones are equipped with a range of sensors, many of which are not favored, or are simply overlooked, by the majority of researchers. However, some studies do incorporate these sensors. Frendberg [39] utilized data collected from GPS, accelerometer, orientation sensor and magnetic sensor to detect the travel mode using a smartphone application, similar to Su and Caceres [40].
A number of studies have utilized the accelerometer data alone for classification purposes [41][42][43][44][45][46]. In one study [29], the training and testing datasets were formed by taking 70% of the collected data as training data and rest as test data; a similar study divided the collected data as 90% for training and 10% for testing [47]; and yet another study used almost 50% of the collected data for training and rest for testing the classification algorithms [48]. Some studies (e.g., [49]) also collected GPS data but for data validation only. Mode detection was still managed by accelerometer data.
Various studies have compared random forest with other algorithms for the purpose of mode detection, while reaching the same conclusion that random forest is a superior algorithm for the intended purpose. For instance, one study made a comparison among random forest, naïve Bayes, Bayesian network, decision trees and multilayer perceptron [30]; another incorporated neural network and support vector machines along with random forest [32]; one more studied random forest, k-nearest neighbor, support vector machines, naïve Bayes and decision trees [21]; and further a study reported a comparison among support vector machines, adaptive boosting, decision trees and random forest [50]. These studies demonstrated that random forest yields higher travel mode prediction accuracies.
In our previous studies [50,51], acceleration data were collected by a purpose-built wearable device named BCALs (Behavioral Context Addressable Loggers in the Shell). Mode detection was successfully carried out among four modes: walk, bicycle, car and train. It remained necessary to develop a methodology for data collected by smartphones and to add further modes for classification. Therefore, our current work proposes a methodology for identification among six different travel modes, namely walk, bicycle, car, bus, train and subway, using data from accelerometer and orientation sensors embedded in smartphones. Further, it investigates the effect of various data collection frequencies on the classification accuracy of the algorithm as well as the computational costs incurred.

Data Collection
Fifty participants from Kobe, Japan contributed to the collection of data utilizing Android smartphones, over a month during November 2013. The data collection days varied among the participants, with some providing records for only a single day of travel, while others cooperated for multiple days. Consequently, the amount of collected data is small relative to the one-month collection period. Six modes were observed, i.e., walk, bicycle, car, bus, train and subway. Recording of the ground truth was achieved by a simple application installed in the smartphones. The participants would merely input the travel mode in the application while starting a trip, and then stop the recording once they had reached their destination. At the end of the day, a recall survey would be conducted to check the reliability of the collected data. With the help of route maps generated by the GPS data, the participants could easily reconfirm the starting and ending times of various trips as well as the mode of transportation used. Afterwards, only the sensor data associated with the trips were retained and all other data including any problematic data or unlabeled data were discarded (Table 1). The distribution of participants according to gender as well as age is shown in Table 2. Although the participants' demographics are not used in the analysis, they are worth mentioning because they implicitly affect the collected data. Table 2 shows that almost all age groups capable of driving and using other modes of transportation are incorporated in this study. Demographic data were collected during several meetings, where the participants were enrolled in the program. The general demographics of Kobe, according to the 2010 census, are presented in Figure 1. The participants do not strictly represent the general demographics, i.e., male participants outnumber female participants, as participation was limited by people's willingness to take part in the survey.
The collected data consisted of readings by accelerometer (accelerations along x-, y- and z-axes) and orientation sensor (pitch and roll). GPS data were also collected but were used in this study for data verification only. For mode detection, GPS data were dropped as the aim was to devise a battery-efficient methodology. The sensors recorded data at an average frequency of 14 Hz but, due to the varying frequencies among the users, the data collection frequency was scaled down to a uniform 10 Hz. Further decreased frequencies were also tested to compare them with respect to their computational costs (details in Section 2.6). An additional advantage of decreasing the frequency is that it can make the procedure more battery-efficient, as battery time is one of the main obstacles in data collection using smartphones. This can be seen from the power consumption figures provided in the literature [52]. That study reported that an accelerometer collecting readings at 20 Hz frequency consumes 230 mW. The power consumption reduces to 180 mW for 10 Hz frequency, and further reduces to 164 mW for 2 Hz frequency. Unfortunately, during data collection for the current study, the battery usage was not tracked, making it impossible to carry out energy consumption analysis. Nevertheless, energy consumption is an issue when it comes to employing smartphones; therefore, it was partially dealt with here by reducing the data collection frequency. Table 3 provides the number of trips and the amount of data instances recorded for each mode at 10 Hz frequency. The percentages do not add up to 100 because of rounding.

Pre-Processing
Smartphones are usually carried in different positions by the users, e.g., some place their smartphones in their pockets, some carry them in their purses and some simply keep them in their hands while messaging or calling. These different orientations make it difficult to individually use the accelerations along the coordinate axes because smartphones' accelerometers record accelerations with respect to the force of gravity. Therefore different orientations affect the individual accelerations differently. To solve this problem, like some other studies [47,[53][54][55], instead of using accelerations along the three axes individually, the magnitude of the resultant acceleration was used, calculated as below.

$$a_r = \sqrt{a_x^2 + a_y^2 + a_z^2} \quad (1)$$

where $a_x$, $a_y$ and $a_z$ are the accelerations along the x-, y- and z-axes, respectively.

[...] Figure 3a. This is the reason that individual accelerations were not used; instead, the magnitude of their resultant was utilized in the analysis. Furthermore, it is evident from the figures that the non-motorized modes, i.e., walk and bicycle, register a lot of fluctuations, whereas the behavior of motorized modes is different, with comparatively smooth trends. It can also be noted that resultant acceleration alone is not sufficient to distinguish among the modes; therefore some other features need to be extracted (details in Section 2.3). The magnitude of resultant acceleration might be affected by the activity performed on the smartphone by the owner, like calling or texting or no action at all. This probable effect should be investigated further as mentioned in future work.
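The orientation independence of the resultant acceleration can be illustrated with a minimal sketch. The function name and the sample values are assumptions chosen for illustration; they are not taken from the study's data.

```python
import math

def resultant_acceleration(ax, ay, az):
    """Magnitude of the resultant of the three axis accelerations."""
    return math.sqrt(ax * ax + ay * ay + az * az)

# Hypothetical readings (m/s^2) for a phone at rest in two orientations:
# gravity falls on a different axis, but the magnitude is unchanged.
flat_on_table = resultant_acceleration(0.0, 0.0, 9.81)
upright_in_pocket = resultant_acceleration(0.0, 9.81, 0.0)
```

In both hypothetical orientations the individual axis readings differ while the resultant stays the same, which is the property the text relies on.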
Wolf [56] used a dwell time of 120 s for trip identification. The value was based on the design criteria mentioned in the Highway Capacity Manual, where the traffic signal cycle should be less than 120 s. It was assumed that stoppage at traffic signals should not be considered as trip ends. The 120 s rule lacked empirical results to support it [11]. Shen and Stopher [57] tested different thresholds of dwell time from 15 s to 120 s and concluded that 60 s would be a better criterion for trip segmentation. In the current study, the same dwell time of 60 s was used to identify different trips. In other words, if two consecutive readings were more than 60 s apart then they were considered as the ending point of the previous trip and starting point of the next trip, respectively.
This simple solution was applicable because only the sensors' data associated with the trips was taken; hence, various trips were already segmented as far as the data were concerned. It also resulted in identifying one trip as several independent trips due to short stops on the way, for instance waiting at intersections. This was not a serious issue as the only aim of the current study was to identify the mode of transportation. As long as the mode is detected correctly, it does not matter whether it is one trip or several trips. The process of splitting and joining of trips will be developed in future research. A much better methodology for stop detection is proposed by Xiao and Low [58], but it requires collection of GPS and GSM-based positioning data.
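The 60 s dwell-time rule described above can be sketched as follows. This is only an illustrative sketch: the function name is hypothetical, and it assumes the timestamps are in seconds and already sorted in time order.

```python
def split_trips(timestamps, dwell_time=60.0):
    """Split time-ordered timestamps (seconds) into trips.

    A new trip starts whenever two consecutive readings are more than
    dwell_time seconds apart.
    """
    trips = []
    current = []
    for t in timestamps:
        if current and t - current[-1] > dwell_time:
            trips.append(current)  # previous reading ends the trip
            current = []           # this reading starts the next trip
        current.append(t)
    if current:
        trips.append(current)
    return trips

# Hypothetical timestamps with two gaps longer than 60 s.
example_trips = split_trips([0, 1, 2, 100, 101, 300])
```

As noted in the text, a rule of this kind also splits a single journey into several trips whenever a recording gap exceeds the threshold.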

Feature Extraction
In addition to resultant acceleration, six features were further extracted from resultant acceleration namely standard deviation, skewness, kurtosis, maximum resultant acceleration, average resultant acceleration and maximum average resultant acceleration. Pitch and roll, directly recorded by orientation sensor, were also considered for classification.
Most of the extracted features are quite straightforward. Skewness measures the lack of symmetry of a given dataset. A dataset is symmetric if it looks the same on both sides of the center point. On the other hand, kurtosis measures the flatness of the dataset, determining whether the dataset or distribution is peaked or flat around the mean, relative to the normal distribution. After average resultant accelerations were calculated, they were used to calculate maximum average resultant accelerations in the same way as resultant accelerations were used to calculate maximum resultant accelerations, over each window. All features/variables, except resultant acceleration, were calculated by employing a moving window concept [50]. To smooth the data and reduce the effect of outliers, a certain number of readings, defined by the window size, were used to apply an operation (e.g., average, maximum, etc.) at a certain data entry level, and this window moved downwards as the calculations proceeded along the data column. Suppose five data readings fall in a 1 min window; Figure 8 then shows an example of how the moving window concept is applied.
Although the window size was reported in the form of time, the equation developed for the calculation took into account the number of instances covered in the reported time interval. For example, a 1 min window size for data collected at 10 Hz frequency would cover 10 × 60 × 1 = 600 data instances. Suppose that the collected data contains n total instances and k is the number of instances covered in the defined window size (like 600 in the previous example), then at any instance level i, the equation developed can be expressed as follows.
$$X_i = f(x_j) \quad (2)$$

where
i = instance level at which the moving window concept is applied;
$X_i$ = computed value after the mathematical operation is applied;
j = range of values covered by the window;
$x_j$ = values on which the mathematical operation is applied;
k = number of data instances covered by a defined window size;
n = total number of data instances in the collected dataset;
$f(x_j)$ = mathematical operation.
The mathematical operation can be average, maximum, skewness, etc., depending on the feature to be extracted. The size of moving window used is discussed in Section 2.5.
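A minimal sketch of the moving window calculation is given below. It rests on an assumption that the source does not confirm: the window at instance level i is taken to cover readings i to i + k - 1 and is truncated at the end of the data. All function names are hypothetical, and the skewness and kurtosis use simple population moments, which may differ from the estimators used in the study.

```python
import math

def window_size_in_instances(frequency_hz, window_minutes):
    """Number of readings k covered by a window reported in minutes."""
    return int(round(frequency_hz * 60 * window_minutes))

def moving_window(values, k, operation):
    """Apply `operation` over k readings at every instance level.

    Assumption: the window at level i covers readings i .. i+k-1 and is
    truncated near the end of the data.
    """
    n = len(values)
    return [operation(values[i:min(i + k, n)]) for i in range(n)]

def mean(xs):
    return sum(xs) / len(xs)

def central_moment(xs, order):
    m = mean(xs)
    return sum((x - m) ** order for x in xs) / len(xs)

def std(xs):
    return math.sqrt(central_moment(xs, 2))

def skewness(xs):
    m2 = central_moment(xs, 2)
    return 0.0 if m2 == 0 else central_moment(xs, 3) / m2 ** 1.5

def kurtosis(xs):
    m2 = central_moment(xs, 2)
    return 0.0 if m2 == 0 else central_moment(xs, 4) / m2 ** 2

# Hypothetical resultant accelerations and a 3-reading window.
resultant = [1.0, 2.0, 3.0, 4.0, 5.0]
average = moving_window(resultant, 3, mean)
maximum = moving_window(resultant, 3, max)
# Maximum average is computed from the averages, as the text describes.
maximum_average = moving_window(average, 3, max)
```

This direct form recomputes every window from scratch, so it is slow for large windows; it is meant to show the idea rather than an efficient implementation.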

Classification Algorithm
As mentioned previously in Section 1, random forest has been shown to outperform other algorithms. Consequently, only random forest was used in the present study for the purpose of travel mode classification.
Random forest [59] is an ensemble of decision trees such that each tree is grown independently using a randomly selected dataset while the distribution remains the same for all the trees in the forest. As the number of trees in the forest becomes large, the generalization error converges to a limit. The generalization error is dependent on the strength of individual trees and their correlation. Each node within the trees is split using a randomly selected set of features. This randomness makes the algorithm robust against noise. Internal estimates of the algorithm can monitor error, strength and correlation. Variable importance measurement can also be done. Random forest is equally applicable to both classification and regression problems. A general structure of random forest is shown in Figure 9.
The R package named "randomForest" was used in the current study. The package, developed by Liaw and Wiener, uses the original code of the algorithm written by Breiman and Cutler in Fortran, which is imported into the R environment. It has the ability to combine various ensembles of trees, extract a single tree from a forest, add trees to an ensemble, extract variable importance measures, etc.; and, of course, includes training the algorithm, predicting and plotting the results. Default values were used for the various variables involved in the algorithm, whereas the number of trees to be grown was set to 100. From our previous studies, it has been learned that 100 trees are enough for this kind of classification and that increasing the number will not be helpful. The number was further confirmed for the data used in the current study.
The classification results reported in this study are in the form of producer accuracy. For example, if the accuracy is reported to be 70%, it means that 70% of the data belonging to a certain known travel mode (ground truth) is classified correctly as that particular mode by the algorithm. In other words, for any mode "a"

$$\text{Accuracy} = \frac{\text{number of instances correctly classified as mode "a" by algorithm}}{\text{total number of instances belonging to mode "a"}} \quad (3)$$
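The producer accuracy defined above can be computed per mode as in the following sketch. The function name and the small label lists are hypothetical and serve only to illustrate the calculation.

```python
from collections import Counter

def producer_accuracy(true_modes, predicted_modes):
    """Percentage of each true mode's instances classified as that mode."""
    totals = Counter(true_modes)
    correct = Counter(
        t for t, p in zip(true_modes, predicted_modes) if t == p
    )
    return {mode: 100.0 * correct[mode] / totals[mode] for mode in totals}

# Hypothetical ground truth and predictions.
truth = ["walk", "walk", "car", "car", "car", "bus"]
predicted = ["walk", "car", "car", "car", "bus", "bus"]
accuracy = producer_accuracy(truth, predicted)
```

Here one of two walk instances and two of three car instances are classified correctly, so the producer accuracies are 50% and about 66.7%, while bus reaches 100%.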

Moving Window Size
According to a previous study [60], the average commute travel time for walking is 16.15 min. Walking is generally the travel mode for the shortest trips; the moving window size should therefore be less than the average value of 16.15 min. A 10 min moving window size was selected. Although some trips will be even shorter than 10 min, as is also evident from the distribution of collected data with respect to time intervals shown in Figure 10 (3.37% of the collected data were less than 10 min), the window size cannot be reduced to cover all the trips because the moving window concept would then be useless. The aim of the moving window is to smooth the data and hence decrease the variation range; this goal cannot be achieved if a very small window size is utilized. In Figure 10, the total recorded time for each trip falling into the various time interval slots (x-axis) was added and plotted on the y-axis.

Data Frequency
As already mentioned in Section 2.1, the data collection frequency varied from 12 to 16 Hz; therefore, to attain a uniform frequency, the data were scaled down to 10 Hz. This was achieved by accumulating the time intervals between successive readings. As soon as the sum of time intervals exceeded 0.1 s, the corresponding acceleration reading was selected and the cumulative sum was reset to zero in order to proceed further. In this manner, all data were screened and the readings spaced 0.1 s apart were selected. The process can be further understood by an example provided in Table 4. It can be observed that the time interval is accumulated until it exceeds the required data interval (2 s or 0.5 Hz in this case), after which it is reset to zero and the corresponding acceleration reading is picked up. The process is repeated for the subsequent readings until all recorded data are scanned. The same procedure was repeated to attain datasets for reduced frequencies: 4 Hz (0.25 s), 2 Hz (0.5 s), 1 Hz (1 s), 0.5 Hz (2 s), 0.33 Hz (3 s), 0.25 Hz (4 s) and 0.2 Hz (5 s). The window size taken was 10 min, so depending on the data collection frequency (10 Hz or 1 Hz or any other value), the number of readings in each window will differ, e.g., for 10 Hz, the window will cover 6000 readings, whereas for 1 Hz, the window will cover 600 readings.
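The frequency reduction described above can be sketched as below. Two details are assumptions not stated in the source: a reading is selected when the cumulative interval reaches or exceeds the target interval, and the very first reading is not selected. The function name is hypothetical, and integer milliseconds are used in the example to avoid floating-point rounding.

```python
def downsample(timestamps, readings, interval):
    """Select readings spaced roughly `interval` apart.

    The time differences between successive readings are accumulated;
    once the sum reaches the interval, the current reading is kept and
    the sum is reset to zero.
    """
    selected = []
    elapsed = 0
    for k in range(1, len(timestamps)):
        elapsed += timestamps[k] - timestamps[k - 1]
        if elapsed >= interval:
            selected.append((timestamps[k], readings[k]))
            elapsed = 0
    return selected

# Hypothetical readings recorded every 70 ms (about 14 Hz),
# reduced towards 10 Hz (100 ms interval).
times_ms = [0, 70, 140, 210, 280, 350]
accel = [9.8, 9.9, 10.1, 9.7, 9.6, 10.0]
reduced = downsample(times_ms, accel, 100)
```

Because a reading is only picked once the accumulated interval has passed the target, the resulting spacing is at least the target interval and can be somewhat longer when the raw readings are irregular.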

Amount of Learning Data
To apply the classification algorithm, some portion of the total collected data must be used to train the classifier. Different values of learning data have been used by researchers (Table 5). It is evident from the table that no single value has been agreed upon by researchers. Moreover, the values listed in the table were selected arbitrarily, without any empirical support. For the current study, learning data percentages varying from 5% to 80% were tested on 0.2 Hz data (Figure 11). The lowest frequency value was selected as it was expected that the accuracy would be lower as compared to other frequencies and hence the accuracy variation with respect to the amount of learning data would be more visible. To develop the learning dataset, stratified random sampling was employed, wherein an equal percentage of data from each mode was randomly selected.
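Stratified random sampling of the learning data, as described above, can be sketched as follows. The function name, the fixed seed and the rule of keeping at least one instance per mode are assumptions made for this illustration.

```python
import random
from collections import defaultdict

def stratified_sample(labels, share, seed=0):
    """Randomly pick the same share of instances from every mode.

    Returns (train_indices, test_indices). At least one instance per
    mode is kept for training (an assumption of this sketch).
    """
    rng = random.Random(seed)
    by_mode = defaultdict(list)
    for index, mode in enumerate(labels):
        by_mode[mode].append(index)
    train = []
    for indices in by_mode.values():
        n_train = max(1, round(share * len(indices)))
        train.extend(rng.sample(indices, n_train))
    train_set = set(train)
    test = [i for i in range(len(labels)) if i not in train_set]
    return sorted(train), test

# Hypothetical, unbalanced label list with 10% learning data.
labels = ["walk"] * 50 + ["car"] * 20 + ["bus"] * 10
train_idx, test_idx = stratified_sample(labels, 0.10)
```

With a 10% share, the hypothetical 50 walk, 20 car and 10 bus instances contribute 5, 2 and 1 training instances respectively, so every mode is represented in proportion to its size.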
The results exhibited in Figure 11 are quite logical, with the accuracy increasing with increase in the share of learning data. It is evident that, except walk, all other modes show deteriorated accuracy as the learning data share is reduced. This trend might be specific to the data used in this study because of the huge share of walk instances. The figure suggests that better prediction results can be achieved by simply increasing the amount of learning data compared to the test data. However, this means that deploying the developed methodology on real data would require a huge controlled survey to obtain the learning dataset. This requirement will limit the applicability of the approach; therefore, a methodology should be developed that utilizes comparatively fewer learning data but at the same time provides acceptable prediction accuracy.

Table 5.
Study | Learning data (%)
[29] | 70
Abdulazim, Abdelgawad [32] | 65
Figo, Diniz [48] | 50
From Table 6, which is the quantitative translation of Figure 11, it can be seen that increasing the learning data from 5% to 10% raised the prediction accuracy by about 3.4%, whereas increasing it from 10% to 80% raised the accuracy by only 5%. In other words, as the learning data share is reduced, the prediction accuracy decreases steadily down to 10% learning data, below which a drastic drop is witnessed. It was therefore decided to set the amount of learning data to 10%.
Table 6. Increase in accuracy with amount of training data.

Results and Discussion
Using a 10 min moving window to extract the features and 10% of the data to train the algorithm, classification results were computed for datasets with varying recording frequencies. The computation times were also recorded for each dataset to aid the comparison. Table 7 gives the overall results along with the computation times and Table 8 provides the detailed results in the form of confusion matrices. It is evident from Table 7 that the overall classification accuracy decreases with decrease in data frequency. It is already established from Table 6 that the accuracy increases with increase in the amount of training data. The trend observed in Table 8 might also have the same reason: with increase in frequency, the amount of data also increased, which in turn increased the training data. Moreover, the moving window concept seems to extract better feature values for high frequencies, as the outliers are averaged over a wider range, hence reducing their impact. The other criterion observed is the time spent in computation. The computation time depends on the amount of data; as the amount of data decreases with decreased frequency, even though the total recorded time remains the same, the time required for computing decreases. Thus, if the required classification accuracy is more than 99%, then 1 Hz frequency will meet that condition with a 94% decrease in computation time compared to 10 Hz, while the difference in accuracy would be only 0.8%. Furthermore, as mentioned in Section 3, the power consumption will also be reduced.
Hence, the selection of data collection frequency is crucial, as it controls not only the classification accuracy but also the efficiency of the methodology. There is, however, a tradeoff between the accuracy and efficiency of the methodology. Therefore, researchers should select the frequency according to their specific needs. Table 9 provides an insight into the prediction accuracy for 0.2 Hz frequency data, with respect to entire trips. One thing to note here is the slightly larger number of trips (625) than reported in Table 3 (559). This is due to the breaking up of larger trips into multiple smaller ones when the 60 s dwell time was used for trip segregation.
A valid question arises as to the reason for the remarkably high detection accuracy by this methodology. The secret lies in the moving window concept used to extract the various features. Figure 12 shows the resultant acceleration data collected for a part of a walking trip. The average resultant acceleration calculated by moving window is also shown in the figure. It is evident that the average values approximately remain constant, hence providing a very useful feature for the algorithm. If the algorithm is trained using only a few average values, then the algorithm will very easily identify the remaining values against the values from other modes. Moreover, additional features like maximum resultant acceleration, standard deviation, skewness and kurtosis refine the classification process and decrease the number of misclassifications. Conventionally, researchers use specific time windows, mostly having 50% overlap, to extract various features [21,29,47,54,62]. One of the problems with this kind of approach is the loss of data points. For example, for data collected at one reading per second (1 Hz) and a time window of 10 s with 50% overlap, the extracted features will have a frequency of one reading per 5 s (0.2 Hz).   This explains why moving window was used but does not justify the large window size selected, which might result in excessive overlapping and consequently high prediction accuracies. The reason for using this approach lies in the real world application design of the developed methodology. Generally, people have unique walking and driving patterns, even if they usually stick to a distinctive routine while commuting daily via public transportation. To predict the mode of transportation of a person by studying a completely different person might not yield better results. However, if the prediction is done by studying limited data yielded from the same person, the accuracy will certainly be much better. 
As the algorithm requires training data, the application design is such that the participants will be asked to at least annotate one day's data (encouraged by providing some incentive like free cinema tickets, gift vouchers, etc.), all of which will be regarded as the training data. After that, the participants just need to keep the application running in the background for the intended period of the survey. In such a design, the big window size does not pose a problem; in fact, it helps to achieve higher prediction accuracy by smoothening the data and bringing it near to the training data. To explain this, Figure 13 demonstrates the average resultant acceleration values, calculated by a 10 min moving window, for first day walking trips made by four participants only. It is evident from the figure that large window size brings the average resultant acceleration data for each trip, by a particular participant, closer to an average value. Hence, it allows the correct prediction within each participant's data. Note that the figure shows only one feature. When assisted with a number of other features, the prediction process becomes efficient.   This explains why moving window was used but does not justify the large window size selected, which might result in excessive overlapping and consequently high prediction accuracies. The reason for using this approach lies in the real world application design of the developed methodology. Generally, people have unique walking and driving patterns, even if they usually stick to a distinctive routine while commuting daily via public transportation. To predict the mode of transportation of a person by studying a completely different person might not yield better results. However, if the prediction is done by studying limited data yielded from the same person, the accuracy will certainly be much better. 
This is the probable reason behind the extraordinarily high detection accuracies achieved in this study. To introduce randomness into the present analysis, the training data were randomly selected rather than taking entire trips. The aim is to assist travel data collection surveys; therefore, the predictions need not be made in real time. The methodology can also be used for real-time prediction, but then the window size should be decreased to avoid an unnecessarily long lag. The grouping of data in Figure 13 should not be confused with the window size, which remained constant throughout the entire data for the calculation of all features. This grouping is merely to assist in understanding the advantage of using the large window size of 10 min. Furthermore, each trip demonstrates the spread of average resultant acceleration values calculated using the 10 min window size.
Figure 13. Convergence of data due to big window size.
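The convergence effect attributed to the 10 min window can be illustrated with a small sketch. It uses synthetic data for two hypothetical participants whose walking signals differ only in their typical level, and it assumes 1 Hz sampling so that a 10 min window spans 600 samples. None of the numbers come from the study.

```python
import random


def moving_average(samples, window_len):
    """Mean of every full window of `window_len` samples, advanced one sample at a time."""
    total = sum(samples[:window_len])
    means = [total / window_len]
    for i in range(window_len, len(samples)):
        # Update the running sum instead of re-summing the whole window.
        total += samples[i] - samples[i - window_len]
        means.append(total / window_len)
    return means


def spread(values):
    """Range (max minus min) of a list of values."""
    return max(values) - min(values)


random.seed(1)
# Hypothetical 30 min of 1 Hz walking data per participant (illustrative values only).
participants = {
    "A": [10.0 + random.gauss(0.0, 1.5) for _ in range(1800)],
    "B": [11.0 + random.gauss(0.0, 1.5) for _ in range(1800)],
}

for name, samples in participants.items():
    short = moving_average(samples, 10)    # 10 s window
    long_ = moving_average(samples, 600)   # 10 min window at 1 Hz
    print(name, round(spread(short), 2), round(spread(long_), 2))
```

In this synthetic setting the long window yields a much narrower band of averages for each participant, which is the behaviour that Figure 13 is described as showing. It also shows the cost: a 600-sample window produces no output until 10 min of data have accumulated, which is why a smaller window is suggested for real-time use.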
Variable Importance
The variable importance, calculated by random forest, is shown in Figure 14. It is evident that all features, including orientation readings, are important and add to the predictive power of the algorithm. Resultant acceleration is least important, possibly because all other features are extracted from it and within the extracted features the distinguishable information is magnified. Resultant acceleration can therefore be eliminated from the list of features.

Conclusions and Future Work
Smartphones are opening up a new horizon for introducing technology to solve problems in the transportation sector. Travel data collection methods can be revolutionized by employing smartphones for passive data recording. This vast potential has been recognized by researchers all over the world, and much research is being undertaken. The present study is expected to contribute to this ongoing research. The developed methodology takes data from smartphone sensors as its input. All of this input data can be recorded passively, without any effort on the part of the smartphone carriers.
The current study demonstrated that data recording frequency has a large impact on the accuracy and efficiency of the methodology. The frequency should be selected with care, as accuracy decreases with decreasing frequency but, at the same time, the computation time also drops. As computational cost will play a decisive role once smartphone data collection is applied on a large scale and produces huge amounts of data, selecting a suitable frequency will become all the more important. The researcher has to settle for a compromise between accuracy and computational cost. The results showed that an overall classification accuracy of 99.96% can be achieved, with no mode identified at less than 99.8%, for data collected at 10 Hz. The main sensor value used to extract further features was the magnitude of resultant acceleration. As individual accelerations are affected by activities performed on the smartphone, the calculated magnitude is likely to differ slightly for the same mode between smartphones in use and not in use. This in turn will influence the extracted features. It is therefore necessary to investigate this variability and its effect on mode detection.
Initially, automatic mode detection will complement the traditional travel data collection methods by providing accurate and detailed travel information. The participants will no longer need to keep a mental note of where and when they took a trip. All this information will be provided by their smartphones, with higher accuracy. Ultimately, smart data collection would make the traditional methods redundant. In future, the smartphone will not only be able to determine the mode of transportation used but will also be able to identify the family, thereby allowing family data such as the number of family members, their ages and salaries to be extracted from governmental records. Moreover, by interacting with nearby smartphones, the identity of accompanying persons will also be ascertained. We are moving briskly towards that era, with ever-increasing smartphone penetration as well as a tremendous increase in Internet access.
The sharp decrease in accuracy below 10% learning data, as mentioned in Section 2.7, might also be the result of the small amount of collected data. As the amount of training data is increased, the algorithm becomes better at predicting unknown examples correctly, until a certain amount is reached, after which additional training examples do not add substantial detection power. In other words, the algorithm is fully trained and can predict huge amounts of unknown examples. Future studies should keep this aspect in mind and, when using large datasets, report the training data in terms of data points or number of trips rather than as a percentage of total data. Furthermore, the saturation point should be determined to decide the amount of training data.
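One possible way to operationalize the saturation point is sketched below. It scans a learning curve expressed in absolute numbers of training points and returns the smallest size beyond which accuracy no longer improves by more than a tolerance. The function names, the tolerance and the curve values are all hypothetical; the numbers are made up for illustration and are not results from this study.

```python
def saturation_point(curve, tol=0.001):
    """Smallest training size after which no larger size improves accuracy by more than tol.

    curve: (n_training_points, accuracy) pairs sorted by n_training_points.
    """
    if not curve:
        raise ValueError("curve must contain at least one point")
    for i, (n, acc) in enumerate(curve):
        # Best accuracy reached by any larger training set (or acc itself at the end).
        best_later = max((a for _, a in curve[i + 1:]), default=acc)
        if best_later - acc <= tol:
            return n
    return curve[-1][0]  # not reached for a non-empty curve; kept as a safe fallback


def training_points(total_points, percent):
    """Convert a training percentage into an absolute number of data points."""
    return total_points * percent // 100


# Made-up learning curve for illustration only (not measurements from the study).
curve = [
    (500, 0.900),
    (1000, 0.960),
    (2000, 0.985),
    (4000, 0.9895),
    (8000, 0.9900),
    (16000, 0.9902),
]

print(saturation_point(curve))     # prints: 4000
print(training_points(50000, 10))  # prints: 5000
```

Reporting the result as a number of data points (or trips) keeps it comparable across datasets of different sizes, which a percentage alone does not.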
One of the major limitations of this study is trip segmentation. Trip segmentation is implicitly added to the data by deleting sensor data during stays or periods of non-activity, coupled with a 60 s dwell time to divide the data into trips. In reality, the analyst will be unaware of breaks in the data; therefore, an efficient trip segmentation methodology should be developed. Another major constraint is the unequal representation of the various modes in the collected data. Although the data provide a realistic picture of a typical Japanese lifestyle, where walking has a major share in daily travelling, this may overshadow other modes. This can be seen in the mode-wise classification results, where walk showed outstanding accuracy compared to other modes. Moreover, due to the massive amount of walk data used for training the algorithm, other modes are predominantly misclassified as walk. Applying the developed methodology to data with comparable representation of all modes might yield different results and should therefore be tested. Another limitation is the small amount of data used in the study. More data should be tested so that the developed methodology may gain wider acceptance. Effort should also be made to further decrease the percentage of data used to train the algorithm while attaining similar accuracy levels. This will ensure accurate interpretation of large amounts of collected data, even when only a small percentage is used for training. Moreover, the variation in data and classification accuracy among different users should be explored to understand the role of users. This may provide new ideas to tackle the issue at hand.
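The implicit segmentation described at the start of this limitation can be sketched as a simple gap rule. The sketch assumes that, once stay periods have been deleted, any gap between consecutive readings longer than the 60 s dwell time marks the start of a new trip; whether the study used a strict or non-strict comparison is not stated, and the function name and sample timestamps are hypothetical.

```python
def split_into_trips(timestamps, dwell_s=60.0):
    """Group sorted timestamps (in seconds) into trips separated by gaps longer than dwell_s."""
    trips = []
    current = []
    for t in timestamps:
        # A gap longer than the dwell time closes the current trip (assumed strict rule).
        if current and t - current[-1] > dwell_s:
            trips.append(current)
            current = []
        current.append(t)
    if current:
        trips.append(current)
    return trips


# Hypothetical 1 Hz log: two periods of travel with the stay in between already deleted.
timestamps = list(range(0, 300)) + list(range(900, 1100))
trips = split_into_trips(timestamps)

print([(trip[0], trip[-1]) for trip in trips])  # prints: [(0, 299), (900, 1099)]
```

This rule only works because the breaks were introduced by hand when the stay data were deleted. On continuously recorded data there are no such gaps, which is why a dedicated segmentation method is needed.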