4.4. Comparison with State-of-the-Art Methods
To the best of our knowledge, the NCD task in sensor-based HAR is closely related to unsupervised clustering. Therefore, the proposed MRNCL is compared with commonly used clustering algorithms, namely k-means [30] and Agglomerative Clustering (AC) [32] with three different linkage types (i.e., Average, Complete, and Ward). For the approaches built on k-means and AC, a model is first trained on the labeled data through supervised learning. Features are then extracted with this trained model for the unlabeled data, which the model has never seen before. Finally, k-means and AC are applied to these extracted features to obtain the clustering outcomes. Compared with NCL, the proposed MRNCL discards CS and AP and uses our new similarity measure. The modified NCL framework (ModifiedNCL), which removes CS and AP, is therefore chosen for a fair comparison. To show that our framework outperforms the reference framework, we also compare our MRNCL with NCL [26]. All methods are evaluated on the unlabeled test set by clustering accuracy (%), pairwise F-score (%), precision (%), and recall (%). Comparison results are shown in Table 4.
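For concreteness, the baseline pipeline described above can be sketched as follows. This is a minimal illustration, assuming a PyTorch encoder pretrained on the labeled set; the names `encoder`, `unlabeled_x`, and `n_new_classes` are ours, not identifiers from the paper's code.

```python
import torch
from sklearn.cluster import KMeans, AgglomerativeClustering

@torch.no_grad()
def extract_features(encoder, unlabeled_x, device="cpu"):
    """Embed the unseen, unlabeled windows with the supervised-pretrained encoder."""
    encoder.eval()
    x = torch.as_tensor(unlabeled_x, dtype=torch.float32, device=device)
    return encoder(x).cpu().numpy()

def cluster_baselines(features, n_new_classes):
    """Run k-means and AC with the three linkage types on the extracted features."""
    results = {"k-means": KMeans(n_clusters=n_new_classes, n_init=10).fit_predict(features)}
    for linkage in ("average", "complete", "ward"):
        ac = AgglomerativeClustering(n_clusters=n_new_classes, linkage=linkage)
        results[f"AC-{linkage}"] = ac.fit_predict(features)
    return results
```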
The following observations can be drawn. Firstly, on the WISDM and USC-HAD datasets, our method achieves higher clustering accuracy, pairwise F-score, precision, and recall than k-means [30]. On UCI-HAR, our model has a higher recall but a lower precision than k-means [30]. To gain insight into this phenomenon, we visualize the clustering performance of k-means [30] and our method on UCI-HAR. We obtain the feature embeddings of all data from the unlabeled test set of UCI-HAR and reduce their dimensionality with t-SNE [53] for visualization. The feature visualization and the clustering results of k-means and our method are shown in Figure 4.
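A minimal sketch of how such a visualization can be produced, assuming `features` holds the embeddings of the unlabeled test set and `labels` the cluster assignments; both names are illustrative, not from the paper's code.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_tsne(features, labels, title):
    """Project high-dimensional embeddings to 2-D with t-SNE and color by cluster."""
    labels = np.asarray(labels)
    xy = TSNE(n_components=2, init="pca", random_state=0).fit_transform(features)
    for c in np.unique(labels):
        mask = labels == c
        plt.scatter(xy[mask, 0], xy[mask, 1], s=5, label=f"cluster{c}")
    plt.legend()
    plt.title(title)
    plt.show()
```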
As can be observed from Figure 4a, cluster1 and cluster3 each correspond to a specific activity (i.e., sitting and upstairs, respectively), which gives the k-means method a low false positive rate (i.e., few pairs of samples that do not belong to the same activity are incorrectly clustered together). Consequently, the precision achieved is relatively high (87.26%). However, a significant number of sitting and upstairs samples are incorrectly clustered into cluster2, which leads to a high false negative rate (i.e., many pairs of samples that belong to the same activity are not clustered together), so the recall reaches only 34.20%. In contrast, Figure 4b demonstrates that our method yields distinguishable feature representations: different activities are represented in more compact areas. As a result, almost all of the upstairs samples are clustered into cluster1 and a majority of sitting samples into cluster3, which allows our method to improve the recall by +43.88% compared with k-means. Furthermore, our clustering accuracy and F-score are significantly higher than those of k-means [30] on UCI-HAR. Taking into account the performance of our method across all three datasets, MRNCL is clearly preferable to k-means [30] for NCD in HAR.
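The pairwise interpretation of precision and recall used above can be made concrete with a short sketch: a pair of samples counts as a true positive when both share the same ground-truth activity and land in the same cluster. Clustering accuracy is computed here with the usual Hungarian matching between clusters and classes; this is our reading of the standard metrics, not code from the paper, and `y_true`/`y_pred` are assumed to be integer NumPy arrays.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_accuracy(y_true, y_pred):
    """Clustering accuracy under the best one-to-one cluster-to-class mapping."""
    n = max(y_true.max(), y_pred.max()) + 1
    count = np.zeros((n, n), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        count[t, p] += 1
    rows, cols = linear_sum_assignment(count, maximize=True)  # Hungarian matching
    return count[rows, cols].sum() / len(y_true)

def pairwise_prf(y_true, y_pred):
    """Pairwise precision, recall, and F-score over all unordered sample pairs."""
    same_true = y_true[:, None] == y_true[None, :]   # pair shares ground-truth class
    same_pred = y_pred[:, None] == y_pred[None, :]   # pair placed in the same cluster
    iu = np.triu_indices(len(y_true), k=1)           # count each unordered pair once
    tp = np.sum(same_true[iu] & same_pred[iu])
    precision = tp / max(np.sum(same_pred[iu]), 1)
    recall = tp / max(np.sum(same_true[iu]), 1)
    f = 2 * precision * recall / max(precision + recall, 1e-12)
    return precision, recall, f
```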
Secondly, as shown in Table 4, none of the three linkage types of Agglomerative Clustering (AC) is superior to the k-means method on the three datasets. On both the UCI-HAR and USC-HAD datasets, the AC algorithm with all three linkages achieves significantly higher precision but lower recall compared to ours. Similar to the k-means method discussed in the first point, this clustering method is also limited by the feature representations obtained from models trained on labeled data. In other words, the models only capture transferred knowledge of the old classes in the labeled data without performing discriminative feature learning between the new classes in the unlabeled data. As a result, the AC algorithm with three different linkages is significantly lower than our MRNCL in terms of clustering accuracy and F-score on the three datasets. This once again demonstrates that our model's ability to learn inter-class features of new activity classes leads to more accurate and superior performance in classifying new activities.
Thirdly, as shown in Table 4, our method outperforms ModifiedNCL on the WISDM and UCI-HAR datasets, achieving higher clustering accuracy, F-score, precision, and recall. For USC-HAD, we visualize the feature representations and clustering results of the unlabeled samples obtained by the two methods, as shown in Figure 5.
As shown in Figure 5a,b, neither our method nor ModifiedNCL was able to generate distinguishable feature representations for the running and jumping samples. Therefore, the clustering results for these two activities are poor under both methods. Owing to the inter-activity similarity discussed in [22,24], similar running and jumping samples cannot be separated into their corresponding clusters, which decreases the recall of our method. Inter-activity similarity represents a significant challenge for our work, and its impact on the results is discussed in Section 5. In Figure 5a,b, the feature representations of the downstairs, standing, and sleeping samples are all clustered well, as most of their feature representations are placed in compact areas by both ModifiedNCL and our method. However, for the walking-right samples, ModifiedNCL produces feature representations similar to those of the downstairs and running samples, so they are incorrectly clustered together, as shown in the black dashed box in Figure 5a. In contrast, as shown in the black dashed box in
Figure 5b, the feature representations of the walking-right samples are easily distinguishable from those of the other activities. They are therefore grouped into a more compact area and well clustered together, leading to a +2.9% increase in clustering accuracy and a +2.11% increase in F-score for our method compared with ModifiedNCL. The above performance on the three datasets demonstrates that our similarity metric helps the model learn inter-class features between new classes, ultimately improving the clustering results.
Fourthly, in contrast to NCL [26], our lighter MRNCL still boosts the clustering accuracy by +9.52% on WISDM, +1.34% on UCI-HAR, and +1.12% on USC-HAD, as shown in Table 4. Additionally, our method outperforms the NCL framework in terms of F-score, precision, and recall. Therefore, our MRNCL is superior to the NCL framework.
Overall, the performance of our framework on the three datasets demonstrates its strong capability to effectively group unlabeled data into distinct new activity categories in the NCD task in sensor-based HAR.
4.5. More Reliable Neighborhoods
The main advantage of our MRNCL over NCL [26] lies in our similarity measure, which helps to select more reliable neighborhoods as pseudo-positives for the contrastive loss. This enables our model to learn inter-class features for clustering new class samples and thus improves the clustering results. To demonstrate the reliability of our proposed similarity metric in selecting neighborhoods, we calculate the percentage (%) of true positives (i.e., selected neighbors that belong to the same activity class as their query) among the selected k-nearest neighbors (KNNs) at each epoch during training, and then average the calculated values over all epochs on each of the three datasets.
Our approach is compared with the following two approaches: ModifiedNCL and NCL [26].
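The reliability measurement described above can be sketched as follows, assuming `sim` is the pairwise similarity matrix produced by either measure (cosine or ours) and `y_true` holds the ground-truth activity labels as a NumPy array; the names are illustrative.

```python
import numpy as np

def knn_true_positive_rate(sim, y_true, k):
    """Percentage of selected k-nearest neighbors that share their query's class."""
    sim = np.array(sim, dtype=float, copy=True)
    np.fill_diagonal(sim, -np.inf)            # a query must not select itself
    knn = np.argsort(-sim, axis=1)[:, :k]     # indices of the k most similar samples
    hits = y_true[knn] == y_true[:, None]     # does each neighbor match its query?
    return 100.0 * hits.mean()                # averaged over all queries and neighbors
```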
The following conclusions can be drawn from Figure 6. Firstly, when comparing our MRNCL with ModifiedNCL, the similarity metric is the only difference between the two frameworks. Our similarity measure selects significantly more true positives than the cosine similarity as the training epochs increase on all three datasets: our method achieves an average increase of +2.9% on WISDM, +5.27% on UCI-HAR, and +6.99% on USC-HAD. More true positives reinforce our model's ability to learn inter-class features of the new classes, thereby improving the clustering results, as shown in Table 4. Additionally, as shown in Figure 6a,c, the cosine similarity in ModifiedNCL leads to significant instability in the selection of true positives during training. Therefore, our similarity measure is significantly superior to the cosine similarity for finding more reliable neighborhoods. Secondly, NCL [26], which employs the CS and AP forces, outperforms ModifiedNCL, which uses the cosine similarity, in selecting true positives on UCI-HAR and USC-HAD. However, as displayed in Figure 6a, NCL [26] becomes unreliable after 60 epochs of training on the WISDM dataset, reducing the clustering accuracy by 5.9% and the F-score by 6.24%, as shown in Table 4. Finally, compared with NCL [26], our framework still achieves an average true-positive percentage that is +3.17% higher on WISDM and +0.18% higher on UCI-HAR, even without CS and AP. This further demonstrates the superiority of our similarity metric. On USC-HAD, although NCL's average percentage of true positives is higher than ours, its clustering performance is inferior to ours, as shown in Table 4. This indicates the reliability of our lighter framework.
To demonstrate the robustness of the model during training, we also present the training loss as a function of the epoch for the three models on the three datasets in Figure 7.
In Figure 7, we observe a significant increase in the training loss at a certain epoch for all three frameworks on all three datasets. This increase is attributed to the introduction of the NCL loss and the SCL loss at the epoch specified in Section 4.1.3. As can be seen from Figure 7a,c, ModifiedNCL experiences fluctuations in the loss value between epochs 80 and 100 during training on WISDM and USC-HAD. This instability is attributed to the cosine similarity's inconsistent ability to select true positives as neighborhoods (presented in Figure 6a,c), which impacts the model's stability. In contrast, the loss of our MRNCL decreases gently and converges at the end of training, highlighting its exceptional robustness and reiterating the superiority of our similarity metric.
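A minimal sketch of the staged objective that would explain this loss jump, assuming the contrastive terms are simply switched on after a warm-up period; `warmup_epochs` is a placeholder for the epoch given in Section 4.1.3, and this is our illustration rather than the paper's implementation.

```python
def total_loss(base_loss, ncl_loss, scl_loss, epoch, warmup_epochs):
    """Staged objective: contrastive terms are added only after the warm-up epoch."""
    if epoch < warmup_epochs:
        return base_loss                      # warm-up: base objective only
    # Adding the NCL and SCL terms here produces the one-off jump visible in
    # the training-loss curves of Figure 7.
    return base_loss + ncl_loss + scl_loss
```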
Figure 7b demonstrates that both MRNCL and NCL exhibit consistent and gradual loss convergence, with a slight increase in the middle of training, indicating their effectiveness in learning and optimizing the given task. Across all three datasets, our MRNCL and NCL exhibit smooth behavior during training. However, our MRNCL has neither the CS nor the AP component, indicating its lightweight nature compared to NCL.