Estimating Spatiotemporal Information from Behavioral Sensing Data of Wheelchair Users by Machine Learning Technologies

The recent spread of intelligent gadgets, such as smartphones and smart watches, has made the sensing of human activities familiar. We have been developing a road accessibility evaluation system inspired by human sensing technologies. This paper introduces our methodology for estimating road accessibility from three-axis acceleration data obtained by a smartphone attached to a wheelchair seat. The estimate covers environmental factors, e.g., curbs and gaps, which directly affect the wheelchair body, and human factors, e.g., wheelchair users' feelings of tiredness and strain. Our goal is to realize a system that provides road accessibility visualization services to users by online/offline pattern matching using impersonal models, while gradually improving service accuracy by learning from new data provided by users. As a first step, this paper evaluates the features acquired by a DCNN (deep convolutional neural network) that learns the state of the road surface from the data through supervised learning. The results show that these features capture differences in road surface condition in more detail than the labels we assigned, and that they are effective as a means of quantitatively expressing the road surface condition. This paper developed and evaluated a prototype system that estimates ground surface types, focusing on knowledge extraction and visualization.


Introduction
Providing road accessibility information to people with difficulties in moving, such as elderly, mobility-impaired, and visually impaired people, is an important social issue. One way to address this issue with information and communication technology is to develop an accessibility map: a large geographic information system that provides accessibility information [1][2][3]. Conventional methods for gathering accessibility information on a large scale include having experts evaluate sidewalks and their images case by case [4] and crowdsourcing information from volunteers [5,6]. All of these methods depend on human labor.
The recent spread of intelligent gadgets, such as smartphones and smart watches, has made the sensing of human activities familiar. Acceleration sensor readings on a wheelchair are influenced by the condition of the road surface. Based on this fact, we have been proposing a system that evaluates road surface condition by applying machine learning to acceleration sensor data. Prior research has applied machine learning to raw behavior data from body-worn accelerometers for human action recognition. Examples include:

- examining various machine learning methods for action recognition on data sets such as everyday life and factory assembly work [7][8][9];
- improving learning efficiency by converting time series of human actions into images [10];
- distinguishing involuntary body vibrations due to illness from voluntary movement [11,12].

This paper estimates the road surface condition using a deep convolutional neural network (hereinafter DCNN). The DCNN is one of the best-known representation learning techniques for developing impersonal models in speech recognition. We also analyze the learned DCNN. Our goal is to realize a system that provides road accessibility visualization services to users by pattern matching using impersonal models, while gradually improving service accuracy by learning from new data provided by users. As a first step, this paper evaluates the features acquired by a DCNN trained on the state of the road surface through supervised learning. No other study has investigated and verified that feature quantities extracted from a trained DCNN capture the road surface condition in more detail than the given road surface condition labels.
The rest of this paper is organized as follows. Section 2 outlines the proposed system. Section 3 presents a preliminary analysis of the wheelchair sensing data, performed before machine learning. It clarifies two relationships: between sidewalk barriers and vibration values, and between the physical burden on wheelchair users and vibration values. Section 4 describes the collection of wheelchair sensing data, the assignment of learning labels, and classification learning by the DCNN. Section 5 analyzes the relationship between the data and the response patterns of the learned DCNN and reports the results. Section 6 discusses future tasks, and Section 7 concludes the paper.

System for Estimation of Road Accessibilities
This section introduces the authors' proposed system for providing road accessibility information that is helpful for all pedestrians, especially wheelchair users. Figure 1 shows an outline of the proposed system. In the proposed system, vibration waveforms during movement are collected by an acceleration sensor installed on a wheelchair. Road surface information is extracted from these waveforms by machine learning, then accumulated and visualized on a map.
Information 2019, 10, x FOR PEER REVIEW 2 of 14
The simplest type of accessibility visualization using human sensing is a record of wheelchair trails [13]. Such trails give wheelchair users practical information about wheelchair-accessible roads and facilities. This information is useful but not sufficient: a trail shows that someone was able to pass a place, but not all wheelchair users can pass it in the same way. The physical abilities of wheelchair users are more diverse than is generally imagined. Some users are as capable as Paralympic athletes, while others may injure their bodies after only a few vibrations. What matters to wheelchair users is the physical state of the road surface, such as the slope angle of the sidewalk, the height of curbs, and the roughness of the road surface. This information helps wheelchair users decide whether to use or avoid a road according to their physical condition and abilities. It helps all people with difficulties in moving in the same way. The physical state of the road is the foundation of road accessibility, and this paper hereinafter calls this information the road accessibility information. The purpose of this paper is to propose a system that provides road accessibility information using human sensing and machine learning techniques.
Vibration waveforms from acceleration sensors are well suited to detecting road accessibility information because they are influenced by the state of the road surface. Vibration waveforms from accelerometers mounted on various wheelchairs are noisy. Extracting useful information from them has nonetheless become feasible, thanks to the recent success of DCNNs in building impersonal models for pattern matching tasks. Our final goal is to realize a system that provides road accessibility visualization services to users by online/offline pattern matching using impersonal models, while gradually improving service accuracy by learning from new data provided by users. After the service is launched, more users will gather wheelchair travel data from more places, and the pattern matching model will be incrementally strengthened. As the model matures, it should also become possible to extract road accessibility information from the travel data of baby strollers and bicycles in the same way as from wheelchairs. This paper aims to establish a fundamental method for extracting information from wheelchair travel data using machine learning.
Several mobility support information systems for people with difficulties in moving have been proposed, for example:

• Creating walking-space network data. This data consists of links carrying information such as the width, steps, and cross slope of the walking route, and nodes that connect the links and carry latitude/longitude information [14][15][16];
• Collecting walking-space network data through on-site surveys by the community [17];
• Providing information on the road surface condition to users' mobile terminals [18,19].
All of these approaches make it difficult to collect wide-area road surface information, because they require a huge amount of human labor. In contrast, the following methods have been tried for evaluating the road surface condition automatically.

• Using high-resolution satellite images to classify land cover, such as agricultural land, grazing land, and barren areas, according to the physical condition of the ground surface [20][21][22];
• Detecting road surface depressions by machine learning on data from acceleration sensors and GPS installed in cars [23,24];
• Detecting recesses and abnormal traffic conditions on the road surface in cities from smartphone acceleration sensor, GPS, and audio data [25,26].
None of these methods can easily obtain the detailed road surface conditions that mobility-impaired wheelchair users need. We have therefore aimed to develop a method that automatically extracts detailed road surface information by applying machine learning to data from an acceleration sensor installed on a wheelchair. Extracting the influence of the road surface condition from raw acceleration sensor data is not easy [27,28]. It is therefore important to convert the observed acceleration data into indexes representing the road surface state using representation learning techniques.

Road Barrier and Vibration
To analyze the relationship between physical vibration and the subjective feeling of road barriers, we employed the vibration acceleration level (VAL) in decibels (dB). It is defined as $\mathrm{VAL} = 20\log_{10}(a/a_0)$, where $a$ is the root-mean-square of the three-axis acceleration values and $a_0$ is a reference acceleration. Japanese Industrial Standard (JIS) C 1510-1995 usually sets $a_0$ to $10^{-5}\,\mathrm{m/s^2}$.
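The VAL definition above can be computed as in the following minimal Python sketch. It assumes that $a$ is the root-mean-square, over a segment, of the magnitude of the three-axis acceleration vector. The text does not say whether gravity was removed beforehand, so the sketch uses the values it is given. The function name `vibration_acceleration_level` is hypothetical.

```python
import math

A0 = 1e-5  # reference acceleration a0 in m/s^2 (JIS C 1510-1995)


def vibration_acceleration_level(samples):
    """Return the vibration acceleration level (VAL) in dB.

    samples: sequence of (ax, ay, az) accelerations in m/s^2.
    Assumption: a is the RMS, over all samples, of the magnitude
    of the three-axis acceleration vector.
    """
    if not samples:
        raise ValueError("at least one sample is required")
    mean_square = sum(ax * ax + ay * ay + az * az for ax, ay, az in samples) / len(samples)
    a_rms = math.sqrt(mean_square)
    return 20.0 * math.log10(a_rms / A0)


# A steady 1 m/s^2 vibration corresponds to roughly 100 dB.
print(vibration_acceleration_level([(1.0, 0.0, 0.0)] * 50))
```

Because of the logarithm, each tenfold increase in $a$ adds 20 dB to the VAL.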
Nine participants with mobility impairments took part in a wheelchair experiment around Akihabara Station in Tokyo. To obtain natural movement data, each participant used his or her own everyday wheelchair. The vibration during movement was measured by a Sun SPOT three-axis accelerometer (Sun Microsystems) attached to the axle of the rear wheels. The accelerometer transmitted the vibration data to a base station. A laptop in a backpack attached to the back of each wheelchair automatically recorded these data together with location data from a GPS receiver. The experiment was video-recorded, and an interviewer asked each participant about his or her personal feelings regarding road barriers.
To identify high-VAL parts of the route intuitively, we divided the route displayed on Google Maps into 0.02 s segments. Each segment was painted with one of 13 colors according to its averaged VAL value; for example, red indicates a VAL over 116 dB. The location data from the GPS receiver were carefully corrected by hand to remove multipath effects.
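The color coding can be sketched as binning the averaged VAL of each 0.02 s segment (one sample at 50 Hz) into 13 classes. Only two details come from the text: the number of colors and the 116 dB red threshold. The following details are hypothetical:

- the 2 dB bin width;
- treating a value of exactly 116 dB as red;
- the helper names `val_color_index` and `color_route`.

```python
from bisect import bisect_right

N_COLORS = 13             # number of colors used on the map (from the text)
RED_THRESHOLD_DB = 116.0  # VAL at or above this value is drawn in red
BIN_WIDTH_DB = 2.0        # hypothetical bin width; the real edges are not given

# 12 ascending edges (94, 96, ..., 116 dB) split the VAL axis into 13 bins.
EDGES = [RED_THRESHOLD_DB - BIN_WIDTH_DB * k for k in range(N_COLORS - 2, -1, -1)]


def val_color_index(val_db):
    """Map an averaged VAL (dB) to a color index from 0 (lowest) to 12 (red)."""
    return bisect_right(EDGES, val_db)


def color_route(segment_vals):
    """Return the color index of each 0.02 s route segment, in travel order."""
    return [val_color_index(v) for v in segment_vals]


print(color_route([90.0, 105.0, 118.0]))  # [0, 6, 12]
```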
Figure 2 shows the result for a 50-year-old male participant with about 40 years of wheelchair experience, who uses his wheelchair both indoors and outdoors at all times. In contrast, Figure 3 shows the overall visualized VAL for a 30-year-old male participant with about 10 years of wheelchair experience. He regularly paddles a kayak, a light, narrow, human-powered boat with one or more covered cockpits. Perhaps for this reason, he enjoyed moving in the wheelchair even when the body vibration turned out to be very high. These results suggest that the road barrier experienced when moving in a wheelchair arises not only from the vibration level, as measured by VAL, but also from changes in the vibration pattern.

Physical Burden and Vibration
Roads with low accessibility are known to increase the physical burden on manual wheelchair users. In this section, we analyze the relationship between physical burden and vibration for manual wheelchair users. The authors have previously proposed a method that uses the acceleration sensor of a smartphone mounted on the wheelchair [29]. It analyzes and quantifies how the acceleration changes caused by hand thrust relate to the physical burden of manual wheelchair travel. Burden evaluation has been studied and applied mainly in sports and healthcare [30][31][32]. NASA-TLX (NASA Task Load Index) [33] and SWAT (Subjective Workload Assessment Technique) [34] are evaluation indexes based on psychological scales. Representative evaluation methods using biometrics include evaluating muscle fatigue with electromyograms [35] and evaluating exercise intensity and psychological stress from heart rate [36,37].
The physical burden of manual wheelchair driving depends on the magnitude of the thrust, which is the load of the rowing action. By Newton's equation of motion (F = ma), force and acceleration are proportional. The acceleration waveform of the wheelchair therefore changes with the degree of burden during travel. We hereinafter refer to the phase in which the hands are in contact with the handrim of the manual wheelchair as "padding". Focusing on the peak-to-peak value (hereinafter P-P value) of the waveform during padding, we confirmed its correlation with heart rate experimentally.
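The following minimal sketch assumes the padding phases have already been located as sample-index intervals; the text does not say how they were detected. Under that assumption, the P-P value of each phase is simply the range of the acceleration signal within it. The helper `peak_to_peak` is hypothetical.

```python
def peak_to_peak(signal, intervals):
    """Return the P-P value (max - min) of `signal` in each padding interval.

    signal: smoothed acceleration along the traveling axis.
    intervals: (start, end) sample indices of padding phases, end exclusive.
    Detecting the padding phases themselves is outside this sketch.
    """
    values = []
    for start, end in intervals:
        segment = signal[start:end]
        if not segment:
            raise ValueError(f"empty interval: {(start, end)}")
        values.append(max(segment) - min(segment))
    return values
```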
Eleven people took part: five manual wheelchair users and six people with no prior wheelchair experience. Each wore an iPhone and a heartbeat sensor and traveled a preselected route. This yielded wheelchair travel data consisting of 14,189 rowing actions, 4.5 h in total. Four routes near Ichigaya Station in Tokyo were used (route A: about 2200 m; routes B, C, and D: about 1350 ± 50 m each). The iPhone was installed under the wheelchair seat, and acceleration along the traveling axis and GPS data were acquired at a 50 Hz sampling rate. Because acceleration is susceptible to noise, the values were smoothed by a simple moving average over 10 points before and after each sample. A Polar H7 heart rate monitor recorded the heartbeat as RRI (R wave-to-R wave interval) data at each beat. The heartbeat data were converted to bpm after artifacts and noise were removed.
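The preprocessing described above can be sketched as follows. It consists of a centered simple moving average over 10 samples before and after each point, and conversion of R-R intervals to beats per minute. Two details are assumptions, not taken from the paper: truncating the averaging window at the signal ends, and RRI values being in milliseconds.

```python
def centered_moving_average(x, half_width=10):
    """Simple moving average over `half_width` points before and after each sample.

    With half_width=10 each output averages up to 21 samples. Near the ends the
    window is truncated to the available samples (an assumption).
    """
    n = len(x)
    smoothed = []
    for i in range(n):
        lo = max(0, i - half_width)
        hi = min(n, i + half_width + 1)
        window = x[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed


def rri_to_bpm(rri_ms):
    """Convert R-R intervals in milliseconds to instantaneous heart rate in bpm."""
    return [60000.0 / r for r in rri_ms]


print(rri_to_bpm([1000, 800, 750]))  # [60.0, 75.0, 80.0]
```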
Two sample extraction methods (per rowing action and per padding) were combined with five features (P-P value, maximum, minimum, average, and time width) to create 10 data sets. Among the correlation coefficients between these data sets and the heart rate data, the highest, 0.78 (P < 0.001), was obtained for the combination of padding and the P-P value. Figure 4 shows the smoothed P-P value of rowing vibration and the heart rate for the participant with the highest individual correlation, 0.92. Both series were normalized to a minimum of 0 and a maximum of 1. These results confirm that the P-P value of the manual wheelchair's acceleration data is highly correlated with the burden of padding, that is, with the road surface barrier.
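The normalization and correlation analysis can be sketched in plain Python. The sketch assumes that the correlation coefficient is Pearson's r and that the P-P values and heart rate are already aligned sample by sample. It does not compute the P value. Min-max normalization leaves r unchanged, so it matters only for plotting the two series together.

```python
import math


def min_max_normalize(x):
    """Scale a series to [0, 1] (minimum -> 0, maximum -> 1)."""
    lo, hi = min(x), max(x)
    if hi == lo:
        return [0.0] * len(x)
    return [(v - lo) / (hi - lo) for v in x]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long, non-constant series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have the same length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```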
The difference between the heart rate and the vibration value, compared as in Figure 4, is hereinafter referred to as the error value. We plotted the error value on the map. Differences of 0.33 or more (heart rate > vibration value) occurred at three kinds of places:

- intersections;
- points with a large inclination perpendicular to the direction of travel;
- places where the road narrowed.

The video confirmed that in all of these cases the user operated the manual wheelchair with one hand. The vibration pattern of one-handed operation differs from that of two-handed operation. Classifying the pattern of the vibration waveform, rather than using its P-P value, could therefore detect that the road surface state differed from the usual state.
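The error-value screening can be sketched as follows. The sketch assumes that the error value is the min-max-normalized heart rate minus the min-max-normalized P-P value at each aligned point. The 0.33 threshold comes from the text; the helper `flag_error_points` is hypothetical.

```python
def min_max_normalize(x):
    """Scale a series to [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(x), max(x)
    if hi == lo:
        return [0.0] * len(x)
    return [(v - lo) / (hi - lo) for v in x]


def flag_error_points(heart_rate, vibration_pp, threshold=0.33):
    """Indices where normalized heart rate exceeds normalized P-P by >= threshold."""
    if len(heart_rate) != len(vibration_pp):
        raise ValueError("series must be aligned")
    hr = min_max_normalize(heart_rate)
    vib = min_max_normalize(vibration_pp)
    return [i for i, (h, v) in enumerate(zip(hr, vib)) if h - v >= threshold]
```

The flagged indices can then be mapped to their GPS positions to reproduce the kind of map plot described above.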

Road Accessibility Estimation
In this section, we describe the method for evaluating the road surface condition with the DCNN through supervised learning, together with its results. We explain, step by step, the data acquisition method, the creation of the labeled data set, the DCNN structure, and the road surface classification accuracy.

Road Sensing and Labeling for Supervised Learning
Nine wheelchair users, six manual and three electric, participated in the experiment. Each traveled a specified route of about 1.4 km around Yotsuya Station in Tokyo. Their movement was measured by the acceleration sensor of an iPod touch installed under the wheelchair seat, and positioning data from the Quasi-Zenith Satellite System (QZSS) were added. Acceleration values on the x, y, and z axes were sampled at 50 Hz, yielding 1,425,798 samples (about 8 h) in total. To confirm the conditions under which the samples were acquired, videos of the participant's wheelchair travel and of the road surface were recorded simultaneously. Overall, the route consists mostly of flat sidewalks, which are not necessarily smooth. For users with no difficulty going up and down ramps, the route was considered unlikely to impose an excessive physical burden or risk of accident.
We created a data set of the acceleration values with four labels (slope, curb, tactile block, and other). Labels were assigned by checking the state of the road surface and of the participants in all the video images captured during the experiments. The first three labels represent important road surface features: a continuous gradient, an abrupt step, and continuous unevenness, respectively. To obtain sufficient classification accuracy, each label needed to account for at least 1% of all data. The labeled data met this requirement in both the number of points and the ratio. The three-axis acceleration data were sliced by a sliding window with a width of 400 samples (about 8 s) and an overlap rate of 0.5, and each of the resulting 7016 pieces was labeled. The window width and overlap rate were chosen by searching for suitable values in previous research using the same data [38].
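The windowing step can be sketched as below. A width of 400 samples with an overlap rate of 0.5 gives a step of 200 samples. The text does not state how a window spanning several labels was labeled, so the majority rule used here is an assumption. The function name `sliding_windows` is also hypothetical, and the function is meant to be applied to each participant's recording separately.

```python
from collections import Counter


def sliding_windows(samples, labels, width=400, overlap=0.5):
    """Slice one participant's recording into labeled fixed-width windows.

    samples: list of (ax, ay, az) tuples sampled at 50 Hz.
    labels: one label per sample ("slope", "curb", "tactile block", "other").
    Returns (window, label) pairs; each window takes its most frequent
    per-sample label (an assumed rule, not taken from the paper).
    """
    if len(samples) != len(labels):
        raise ValueError("samples and labels must be aligned")
    step = int(width * (1 - overlap))  # 200 samples for width 400, overlap 0.5
    windows = []
    for start in range(0, len(samples) - width + 1, step):
        window = samples[start:start + width]
        label = Counter(labels[start:start + width]).most_common(1)[0][0]
        windows.append((window, label))
    return windows
```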

Machine Learning and Estimation Results
Figure 5 shows the structure of the DCNN used in this paper. The network consists of six layers: an input layer, three convolution layers, a fully connected layer, and an output layer. Because this hierarchical network learns the mapping from input to output, the feature extractor h and the classifier f, both effective for data classification, can be learned simultaneously. In the convolution part, following representative research [11], two operations were used: convolution and max pooling. In the figure, w is the weight convolved with the input data; it is a parameter learned so that the input is mapped correctly to the output. Feature Map (N) in the figure means that N different sets of weights were learned to obtain N different outputs. The network was trained with Adam, with the learning rate set to 0.0001. Rectified linear units were used as the activation function, except in the output layer, which used the softmax function.
Table 1 compares the existing methods with the DCNN in Figure 5 in terms of accuracy on the road surface classification task with the four labels. Each value in the table is the average of nine trials obtained by the leave-one-subject-out method [39]. This method measures recognition accuracy for unknown users by evaluating on user data that were not included in the training data. In this paper, we repeatedly trained the model on the data sets of eight wheelchair users and evaluated it on the data set of the remaining user. The mean F-score and the accuracy of each class were used as evaluation indexes. Because this classification problem is class-imbalanced, a high mean F-score indicates that barriers appearing at only a few spots in the data set are also recognized well. The comparison methods are as follows.
• Raw: The data set of nine people was classified by the K-nearest neighbor method (KNN). The input was a one-dimensional vector formed by simply concatenating the three-axis acceleration values of each 400-sample window. K was set to 1, 5, 10, 15, 20, 25, and 30; Table 1 shows the result for K = 1, which was the most accurate.

• MV: As in Raw, KNN was used, and Table 1 shows the result for K = 1. The difference is that the inputs were six meta values: the mean and standard deviation of each axis (x, y, z) of the three-axis acceleration in each 400-sample window.

• SVM: The data set was classified by a support vector machine (SVM) with the same input as MV. A radial basis function was used as the kernel. A grid search was performed over γ ($10^{-12}$ to $10^{2}$) and C ($10^{-7}$ to $10^{7}$), and the values with the highest accuracy on the validation data were adopted as hyperparameters.

• Heuri: The same classifier as in SVM was used. In addition to the six SVM inputs, the following were used as inputs:

- maximum, minimum, zero crossings, and mean crossings;
- mean, standard deviation, maximum, and minimum of differences;
- FFT frequency components and the intensity of the zero-frequency component of the FFT spectrum;
- energy and entropy.

The correlation for each pair of the x, y, and z axes and the maximum cross-correlation (the correlation coefficient at the time shift giving the highest correlation) were also calculated and used.

• ECDF: The SVM was used as the classifier, and the empirical cumulative distribution function (ECDF) was used as the input. The ECDF has shown advantages over other feature extraction methods, such as FFT, PCA, and statistical quantities, in multiple tasks [40]. The number of interpolation points, a hyperparameter, has been reported to have low sensitivity, so it was fixed at 10 in the experiment.
As Table 1 shows, the method using the DCNN achieved higher accuracy than the conventional methods. This high classification accuracy on wheelchair traveling data indicates that the proposed method is reasonable and practical for road condition classification.

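Several of the heuristic features listed for Heuri can likewise be computed per axis and per axis pair. The source does not give exact definitions, so this sketch uses common ones as assumptions: crossings count strict sign changes relative to a level; the spectrum comes from a naive discrete Fourier transform (the same result an FFT computes, only slower); "energy" is the sum of squared non-DC magnitudes divided by the window length; "entropy" is the Shannon entropy of the normalized non-DC magnitudes; the "FFT frequency component" is taken to be the index of the strongest non-DC bin; and the maximum cross-correlation is searched over a hypothetical `max_lag`. All function names are illustrative.

```python
import cmath
import math
from itertools import combinations
from statistics import mean, pstdev

# Hypothetical helpers; feature definitions below are assumptions.

def crossings(xs, level):
    """Count strict crossings of `level` between consecutive samples."""
    return sum(1 for a, b in zip(xs, xs[1:]) if (a - level) * (b - level) < 0)

def dft_magnitudes(xs):
    """|X[k]| for k = 0..n//2 via a naive O(n^2) DFT (stdlib only)."""
    n = len(xs)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(xs)))
            for k in range(n // 2 + 1)]

def pearson(a, b):
    """Pearson correlation; defined as 0.0 if either input is constant."""
    sa, sb = pstdev(a), pstdev(b)
    if sa == 0 or sb == 0:
        return 0.0
    ma, mb = mean(a), mean(b)
    return mean([(x - ma) * (y - mb) for x, y in zip(a, b)]) / (sa * sb)

def align(a, b, lag):
    """Slice a and b so that a[t + lag] is paired with b[t]."""
    return (a[lag:], b[:len(b) - lag]) if lag >= 0 else (a[:lag], b[-lag:])

def axis_features(xs):
    """Twelve single-axis heuristic features (assumed definitions)."""
    d = [b - a for a, b in zip(xs, xs[1:])]   # first differences
    mags = dft_magnitudes(xs)
    ac = mags[1:]                              # drop the 0 (DC) bin
    total = sum(ac)
    probs = [m / total for m in ac if m > 0] if total > 0 else []
    dominant = 1 + max(range(len(ac)), key=ac.__getitem__) if ac else 0
    return [max(xs), min(xs), crossings(xs, 0.0), crossings(xs, mean(xs)),
            mean(d), pstdev(d), max(d), min(d),
            dominant,                          # strongest non-DC bin index
            mags[0],                           # zero-frequency intensity
            sum(m * m for m in ac) / len(xs),  # energy
            -sum(p * math.log2(p) for p in probs)]  # spectral entropy

def heuristic_features(window, max_lag=20):
    """MV's six inputs, 12 features per axis, and, for each axis pair,
    the zero-lag correlation and the maximum cross-correlation.
    The window length must exceed max_lag."""
    axes = [list(col) for col in zip(*window)]
    feats = []
    for xs in axes:
        feats += [mean(xs), pstdev(xs)] + axis_features(xs)
    for a, b in combinations(axes, 2):
        best = max(pearson(*align(a, b, lag))
                   for lag in range(-max_lag, max_lag + 1))
        feats += [pearson(a, b), best]
    return feats
```

Each window thus maps to 3 × 14 + 3 × 2 = 48 values. A production version would use an FFT library instead of the quadratic DFT, which is kept here only so the sketch needs no third-party packages.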
Analysis Overview
In this section, we investigate what types of road surface features the DCNN has learned. The DCNN appears to capture road surface characteristics in more detail than the labels, because the 400-unit output pattern of its fully connected layer was divided into multiple clusters. The analysis procedure is described in Steps 1 to 5 below.
Step 1: Acquire the DCNN output pattern. As described in the previous section, the 400-unit output pattern of the fully connected layer was acquired by training a DCNN model on the data sets of eight participants and inputting the data set of the remaining participant as test data. Hereinafter, "feature amount" refers to this 400-unit output pattern.
Step 2: Clustering of feature amounts. The feature amounts in each of the nine participants' data sets were clustered with k-means.
Step 3: Visualize on the map. The clustering results were analyzed by assigning a color to each cluster and plotting on the map the traveling points at which the individual data were obtained.
Step 4: Determining the optimal number of clusters. For every participant, there was a number of clusters at which the feature amounts of uphill-slope and downhill-slope data were assigned to different clusters. In such a case, we considered it highly likely that the other clusters also accurately captured detailed road surface features. In this paper, the number of clusters at that point was taken as the optimal number of clusters. Table 2 shows the optimal number of clusters for each participant (E1-E3: electric wheelchair users; M1-M6: manual wheelchair users).
Information 2019, 10, 114
Step 5: Investigation of the correspondence between feature amounts and road conditions. Videos recorded while the wheelchairs were traveling were analyzed to examine the relationship between the feature amounts of each cluster and the corresponding road surface conditions.
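Steps 2 and 4 can be sketched as follows, assuming each participant's feature amounts from Step 1 are available as a list of vectors and that each window also carries a hypothetical "uphill"/"downhill"/other annotation derived from the travel direction. The k-means here is plain Lloyd's algorithm with random initialization; the source does not state the k-means settings or the range of cluster numbers searched, and reading Step 4 as "the smallest number of clusters at which no cluster contains both uphill and downhill windows" is our interpretation. All function names are hypothetical.

```python
import math
import random

def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's k-means; returns one cluster index per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        new = [min(range(k), key=lambda c: math.dist(p, centers[c]))
               for p in points]
        if new == labels:        # assignments stable: converged
            break
        labels = new
        for c in range(k):       # move each center to its members' mean
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:          # keep the old center if a cluster empties
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def slopes_separated(labels, directions):
    """Assumed Step 4 test: uphill and downhill windows share no cluster."""
    up = {c for c, d in zip(labels, directions) if d == "uphill"}
    down = {c for c, d in zip(labels, directions) if d == "downhill"}
    return bool(up) and bool(down) and not (up & down)

def choose_n_clusters(features, directions, k_values=range(2, 21), seed=0):
    """Smallest k in a hypothetical search range meeting the criterion;
    every k must not exceed the number of feature vectors."""
    for k in k_values:
        if slopes_separated(kmeans(features, k, seed=seed), directions):
            return k
    return None
```

Because k-means depends on initialization, repeating the search with several seeds (or a smarter initialization such as k-means++) would make the chosen number of clusters more stable; the fixed seed here only keeps the sketch reproducible.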

Results
Figure 6 shows an example clustering result of the feature amounts when E1, the participant with the highest DCNN classification accuracy, was used as test data. For the map visualization, position information corrected on the basis of the recorded video was used. As shown in Figure 6, each participant traveled the route three times; the second lap was traveled in the direction opposite to that of the first and third laps.
(a) Slope
Position (2) in Figure 6 is where the slope label was given. For all participants, uphill and downhill slopes were assigned to different clusters. We consider that the DCNN captured the difference in acceleration between uphill and downhill travel and, for manual wheelchairs, the difference in the number of strokes and the paddling duration. Position (3) in Figure 6 is a point to which no slope label was given beforehand, because the slope there is short and gentle. For seven of the nine participants, data recorded while traveling uphill at this point fell into the same cluster as the labeled uphill slope. For E1, who had the highest classification accuracy, this point was assigned to its own independent cluster. This suggests that the DCNN obtains a more detailed characterization of the road surface than the labels provide.
(b) Curb
The classification of curbstones varied with participant and location. For E1, the curb at position (1) in Figure 6 was split into three clusters. We consider that the curbs were clustered this finely because each curb produces a characteristic vibration when the wheelchair passes over it, and the DCNN can distinguish small differences in these vibrations.

(c) Tactile block
Points where the wheelchair crossed the tactile block were classified into various clusters, and the data labeled as tactile block were highly diverse. Because tactile blocks are installed just before pedestrian crossings, where there is usually a curb or slope nearby, the data are likely to be affected by these features. The contact conditions with the block, such as the wheel direction, whether one or both wheels are on the block, and the contact time, can also affect the data. However, points where the wheelchair crossed the tactile block laterally and points just before and after the tactile block were classified into different clusters. This indicates that the DCNN correctly captured the difference between data that include the tactile block and data that do not.

Future Tasks
In the previous section, we confirmed that the trained DCNN had acquired road surface characteristics in more detail than the labels. In this section, we search for points whose DCNN reaction patterns are close in Euclidean distance and, referring to the corresponding images, discuss the similarity of the road surface conditions at these points.
Hereinafter, the point used as the input of the similarity search is called the "Query". For an arbitrary Query, the five points whose DCNN reaction patterns were closest to it in Euclidean distance were extracted as sites similar to the Query. For comparison, the top five points were extracted in the same way using the Euclidean distance between the raw acceleration data.
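The similarity search described above amounts to a brute-force nearest-neighbor query. In this minimal sketch, `vectors` holds one vector per point (either the 400-unit DCNN feature amounts or the raw acceleration vectors used for comparison), the Query is identified by its index, and the function name is hypothetical.

```python
import heapq
import math

def similar_points(query_index, vectors, top=5):
    """Indices of the `top` vectors nearest (Euclidean) to the Query,
    nearest first; the Query itself is excluded."""
    query = vectors[query_index]
    others = (i for i in range(len(vectors)) if i != query_index)
    return heapq.nsmallest(top, others,
                           key=lambda i: math.dist(query, vectors[i]))
```

Calling the same function once with DCNN feature amounts and once with raw acceleration vectors yields the two top-five lists that are compared. If consecutive windows overlap, one may also want to exclude windows adjacent in time to the Query, since they are trivially similar; whether this was done is not stated.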
Figure 7 shows the similar road surface search results for three Queries: an uphill slope, a curbstone, and a place where a tactile block was crossed. An image frame is black when the image shows the same road feature as the Query and red when it cannot be confirmed from the image that the road feature is the same. In the uphill slope example, only the fourth-ranked of the top five results corresponded to an uphill slope when the raw acceleration data were used, whereas all of the top five corresponded to uphill slopes when the DCNN feature amounts were used. In the curbstone example, the first- and third-ranked results corresponded to curbstones with the raw acceleration data, whereas all of the top five did with the DCNN feature amounts. In the tactile block crossing example, the raw acceleration data judged the Query to be similar to data from points where the wheelchair was stopped. With the DCNN feature amounts, only the first-ranked result corresponded to a place where a tactile block was crossed; the misjudged second- to fourth-ranked results were roads with continuous unevenness composed of small tiles. These results can nevertheless be interpreted as correctly capturing the similarity to the Query's features when traveling over a tactile block.
Together with the clustering analysis of the feature amounts, the similar road surface search showed that the DCNN acquired road surface conditions in more detail than the four label types given in advance for classification learning. These analysis results indicate that the reaction pattern of the fully connected layer of the trained DCNN may be effective as a means of quantifying road surface conditions, as a feature amount, from acceleration sensor data recorded while a wheelchair is traveling.
However, the classification accuracy of road surface conditions varied among the feature amounts obtained for each wheelchair user. In general, variability caused by differences between users is recognized as one of the main problems in developing human behavior recognition models [41], and how to deal with it is an important research topic. The simplest solution is to build a large-scale human behavior database. Kawaguchi et al. proposed the concept of collecting a human behavior corpus for understanding real-world activities [42]. A large-scale behavior corpus is important not only for developing individual estimation models but also for understanding personal characteristics, and it would make it technically possible to develop impersonal models applicable to all users. Likewise, in this research, collecting and analyzing more data sets, including data from new environments, should make it possible to reduce the variation in the feature amounts obtained for each user and the variation in road surface classification accuracy. We will collect and analyze more data sets including new environments, qualitatively evaluate the acquired features, and propose a method for acquiring more accurate feature representations that reduce the influence of individual users.

Conclusions
To provide road accessibility information that is helpful for all pedestrians, especially wheelchair users, this paper proposed a system and a method for extracting detailed road surface information by machine learning from the data of an acceleration sensor installed on a wheelchair. Preliminary analyses of the relationship between sidewalk barriers and vibration values, and of the relationship between the physical burden on wheelchair users and vibration values, identified the characteristics of wheelchair sensing data. We acquired traveling data from nine wheelchair users, labeled the data, classified them with the DCNN, and confirmed the classification accuracy. Focusing on the reaction patterns of the trained DCNN, we performed cluster analysis and a similar road surface search. As a result, it was confirmed that the DCNN learned, as feature amounts, road surface conditions in more detail than the four label types attached beforehand for training. The reaction pattern of the fully connected layer of the trained DCNN was shown to be effective as a means of quantifying road surface conditions, as a feature amount, from acceleration sensor data recorded while a wheelchair is traveling.
The first contribution of this paper is the demonstration, with real data from wheelchair users, of a fundamental method for extracting road accessibility information from wheelchair traveling data using machine learning. To develop impersonal models applicable to all users, we will examine new methods for acquiring more accurate feature representations that reduce the influence of individual users, by collecting and analyzing more data sets including new environments. The second contribution is the confirmation that the labeling cost of supervised learning may be reduced, shown by the fact that the learned feature amounts captured road features in more detail than the training labels. The road surface features extracted from the trained network could be used as a model for pattern matching. As a next step, we will apply weakly supervised learning using simple labels that can be assigned automatically without human labor.

Figure 1. Outline drawing of road surface feature automatic evaluation system by wheelchair sensing and machine learning.

Figure 2. A 50-year-old male participant with rheumatoid arthritis and a paralyzed arm propels his manual wheelchair. Although he tried to control his wheelchair carefully throughout the experiment, higher VAL values were detected on uneven road surfaces.

Figure 3. A 30-year-old male participant moved his wheelchair very fast, and higher VAL values were often detected. Nevertheless, he enjoyed moving in his wheelchair.

Figure 4. Comparison between normalized P-P values and heart rates during paddling for the participant with the highest correlation value (0.92).

Figure 5. The structure of the deep convolutional neural network (DCNN) used in this paper.

Figure 6. The clustering result of the DCNN feature output of E1's acceleration data in 14 clusters.

Figure 7. Similar road surface search results using the DCNN feature output obtained from acceleration data, compared with results using raw acceleration data.

Table 1. Performance comparison between conventional methods and the DCNN in supervised learning. MV: statistical meta value of raw data; SVM: support vector machine; ECDF: empirical cumulative distribution function.