5.1. Main Evaluation Results
We evaluated and compared the classification accuracy of the four classification methods described in Section III using hold-out validation. For each method, we compared three cases: ceiling radar data only, wall radar data only, and fused data from the two radars. In each case, the classification model was trained on 80% of the data and tested on the remaining 20%. We performed 30 test trials, randomly re-drawing the training data in each trial.
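The repeated hold-out protocol described above can be sketched as follows. This is an illustrative reconstruction, not the authors' code; `RandomForestClassifier` stands in for any of the four classifiers, and the function name is hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

def repeated_holdout(X, y, n_trials=30, test_size=0.2, seed=0):
    """Repeat an 80/20 hold-out split, training anew each trial,
    and return the mean and standard deviation of test accuracy."""
    accs = []
    for trial in range(n_trials):
        # Re-draw the training/test split with a different random state each trial
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=test_size, stratify=y, random_state=seed + trial)
        clf = RandomForestClassifier(random_state=seed).fit(X_tr, y_tr)
        accs.append(accuracy_score(y_te, clf.predict(X_te)))
    return float(np.mean(accs)), float(np.std(accs))
```

Reporting the mean and standard deviation over the 30 trials, as in Table 1, reduces the sensitivity of the result to any single random split.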
Table 1 summarizes the mean and standard deviation of the classification accuracies over the 30 test trials for the four classification methods. The CNN method achieved the best accuracy of 95.6%, indicating that spectrogram images were more effective than the spectrogram envelope [16] or the motion-parameter-based approaches for classifying human behaviors and falls in restrooms. However, the other classification methods also achieved moderate accuracy, suggesting that efficient motion information and/or parameters for classification can still be obtained from them; this is discussed in the next subsection.
Furthermore, better accuracy was obtained using the dual radar data than using the ceiling or wall radar data alone. In particular, a significant improvement was obtained for the CNN and RF methods when the dual radar data were used. Therefore, we conclude that the motions in both the vertical and horizontal directions contain information that distinguishes the assumed behaviors and falls.
The convergence curves of the CNN method are shown in Figure 8, and its confusion matrix is discussed below to further validate its performance. No overfitting was observed in either the training or test process, and the test accuracy converged in fewer than 50 epochs.
Table 2 shows the confusion matrices for the data from the ceiling, wall, and dual radars. The classification accuracies of “(f) pulling up the pants” and “(b) pulling down the pants” are worse for the ceiling and wall radar data, respectively. The classification accuracy of (f) improves when the fused data are considered, whereas that of (b) does not. The classification accuracy of “(h) falling” is 100% in all cases; this is the most important capability for the practical use of fall detection.
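The per-class accuracies read off the confusion matrices above can be computed as follows. This is a generic sketch (class indices 0–7 standing in for behaviors (a)–(h)), not the authors' evaluation code.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Build a confusion matrix: row = true class, column = predicted class."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

def per_class_accuracy(cm):
    """Diagonal count divided by row total, i.e., recall per class."""
    totals = cm.sum(axis=1)
    return np.divide(np.diag(cm), totals,
                     out=np.zeros(len(cm)), where=totals > 0)
```

A 100% entry on the diagonal for the "falling" row, as reported above, means every fall sample was predicted as a fall.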
5.2. Discussion on Efficient Features
This section discusses the efficient features measured with each radar when classifying human behaviors in restrooms. First, we discuss the effectiveness of the data from each radar and the fused data.
Table 3, Table 4, and Table 5 show the confusion matrices for the LSTM, RF, and SVM methods, respectively. As the confusion matrices of the CNN (Table 2) and LSTM methods indicate, all behaviors and falls are classified accurately by the deep learning methods. However, the classification accuracies differ for some classes. For example, as indicated in Table 2, behaviors (b) and (g) were classified less accurately by the CNN method with the ceiling radar, whereas they were classified accurately by the LSTM method with the ceiling radar data, as shown in Table 3. These results indicate that the behaviors accurately classified by each method varied because of differences in the features contained in the spectrogram images and envelopes. In addition, the motion-parameter-based methods (RF and SVM) classified behaviors (b) and (g) with better accuracy, as indicated in Table 4 and Table 5, even though their overall accuracies were significantly worse than that of the CNN method. Because the motion parameters were extracted from the same envelopes used in the LSTM method, the efficient features for classification are likely contained in the spectrogram envelopes extracted from the dual radars. In the following, we discuss these efficient features and the factors behind our results.
Next, we discuss the effectiveness of using dual radar data. Similar to the results for the CNN method, better performance was observed with the wall radar data than with the ceiling radar data. These results indicate that the horizontal motion information obtained with the wall radar carries significant information for classifying the assumed human behaviors. Another reason is that the wall radar observed the whole body, whereas the ceiling radar mainly captured the motion of the head. The confusion matrices further confirm the differences between the two radars’ results for the classified behaviors. In particular, the confusion matrices of the RF and SVM methods in Table 4 and Table 5 indicate that combining the data from the two radars significantly improves the classification accuracy because the data from the two radars complement each other. Similar accuracy improvements from fusing the dual radar data can also be confirmed in the confusion matrices of the other methods, further verifying the effectiveness of the dual radar data.
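The fusion mechanism is not detailed at this level in the text; one common scheme, feature-level fusion by concatenating the per-sample feature vectors from the two radars, might look like the sketch below (an assumption for illustration, not the authors' implementation).

```python
import numpy as np

def fuse_features(ceiling_feats, wall_feats):
    """Feature-level fusion: concatenate each sample's feature vectors
    from the ceiling and wall radars along the feature axis."""
    ceiling_feats = np.asarray(ceiling_feats)
    wall_feats = np.asarray(wall_feats)
    if ceiling_feats.shape[0] != wall_feats.shape[0]:
        raise ValueError("sample counts from the two radars must match")
    return np.concatenate([ceiling_feats, wall_feats], axis=1)
```

With this scheme, a downstream classifier sees both the vertical-motion and horizontal-motion features of each sample at once, which is consistent with the complementarity argument above.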
We now discuss the efficient features included in the spectrograms. Because the RF and SVM methods classified the eight behaviors with accuracies above 60%, we examine the feature parameters selected for these methods. Table 6 shows the features selected for the RF and SVM methods using the filter method. The acceleration and jerk parameters were selected in all radar cases. These results indicate that the detailed motion parameters of acceleration and jerk were more effective than the velocity parameters obtained directly from the Doppler radar measurements. Nevertheless, the LSTM and CNN methods outperformed the RF and SVM methods, which used these motion parameters.
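A filter-method selection step of this kind can be sketched with a univariate score such as the ANOVA F-statistic. The exact filter criterion is not specified here, so the score function is an assumption, and the feature names below are hypothetical.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

def filter_select(X, y, feature_names, k=5):
    """Rank features by the ANOVA F-score (a filter method, computed
    independently of any classifier) and keep the top k names."""
    selector = SelectKBest(score_func=f_classif, k=k).fit(X, y)
    mask = selector.get_support()
    return [name for name, keep in zip(feature_names, mask) if keep]
```

Because filter methods score each feature independently of the classifier, the same selected set (e.g., acceleration and jerk parameters, as in Table 6) can be reused for both the RF and SVM methods.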
We conclude that deep learning can capture the detailed information in the spectrograms corresponding to higher-order derivative parameters. In addition, because the CNN method achieved better accuracy than the LSTM method, detailed motion information was obtained not only from the main components extracted as the spectrogram envelopes but also from other components corresponding to the micromotions of various body parts.
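One common way to extract a spectrogram envelope, assumed here for illustration (the paper's exact extraction procedure may differ), is to take, in each time frame, the highest Doppler frequency whose power exceeds a fraction of that frame's peak power:

```python
import numpy as np

def spectrogram_envelope(spec, freqs, rel_threshold=0.1):
    """Per time frame, return the highest frequency bin whose power
    is at least rel_threshold times that frame's peak power.
    spec: (n_freqs, n_frames) power spectrogram; freqs: (n_freqs,)."""
    env = np.zeros(spec.shape[1])
    for t in range(spec.shape[1]):
        col = spec[:, t]
        above = np.nonzero(col >= rel_threshold * col.max())[0]
        env[t] = freqs[above[-1]] if above.size else 0.0
    return env
```

Such an envelope keeps only the main velocity component per frame, which is why micromotion components outside the envelope remain visible to the CNN but not to envelope-based methods.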
The findings regarding the efficient features for the classification of human behaviors and falls in restrooms are summarized as follows:
- The wall radar, which measured motion in the horizontal direction, was more effective than the ceiling radar, which measured motion in the vertical direction.
- The classes accurately classified by the two radars differed; hence, fusing the two radars was effective.
- The proposed method effectively used the detailed higher-order derivative parameters of acceleration and jerk.
- Detailed motion information was spread across the entire spectrogram rather than being limited to the main components, and it was efficiently extracted via the CNN.
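The higher-order derivative parameters named above (acceleration and jerk) can be estimated from a velocity envelope by successive finite differences; a minimal sketch, assuming uniformly sampled data:

```python
import numpy as np

def motion_derivatives(velocity, dt):
    """Estimate acceleration (first derivative of velocity) and
    jerk (second derivative) by successive finite differences."""
    acceleration = np.gradient(velocity, dt)
    jerk = np.gradient(acceleration, dt)
    return acceleration, jerk
```

For a linearly increasing velocity envelope, this yields a constant acceleration and zero jerk, matching the analytic derivatives.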
However, a limitation of this study is that it was difficult to concretely identify the efficient parameters and/or factors for our classification problem. To achieve this, features that clearly separate the assumed restroom behaviors must be identified through various other approaches (e.g., principal component analysis, application of other classification algorithms with a comprehensive comparison against the results of this study, and data acquisition from a larger number of participants).
5.3. Comparison with Conventional Studies
In this section, we compare our method with the conventional remote sensor-based monitoring methods for restrooms.
Table 7 compares experimental studies aimed at detecting abnormal or dangerous behaviors in restrooms. The proposed method achieved the best performance in terms of classification accuracy, number of classified behaviors, number of participants, and fall detection accuracy.
Due to privacy issues, the number of studies on restroom monitoring using cameras is quite limited. Reference [6] is one of the few studies reporting camera-based monitoring of restrooms to detect dangerous situations and protect the elderly. Because sensors without privacy issues are more suitable for restroom monitoring, approaches using infrared thermal sensors and radars have recently been studied. However, most of these studies classify situations only as normal or dangerous behaviors [9,10]. Although thermal sensors achieve sufficiently accurate classification, detailed behaviors were not classified because these sensors cannot directly detect motion information. By contrast, radar techniques can acquire motion velocity information and classify it into multiple behaviors, as carried out in [14,15]. However, the accuracy achieved in [15] was insufficient because only simple feature parameters related to distance and signal information were used with the RF method. Therefore, our previous study [16] proposed an LSTM method that used the rich velocity information obtained via spectrogram envelopes. While both our previous research and the present study classify behaviors into eight categories, the proposed method, which uses a CNN, showed higher classification accuracy, including 100% fall detection. In addition, the present study used a relatively large dataset from a larger number of participants, and the spectrogram images exploited the rich velocity information included in the Doppler radar signals.