Traditional supervised time series classification (TSC) assumes that all training data are labeled. In practice, however, manually labeling all unlabeled data can be very time-consuming and often requires skilled domain experts. In this paper, we address the positive unlabeled time series classification (PUTSC) problem, which refers to automatically labeling a large unlabeled set U based on a small positive labeled set PL. Self-training (ST) is the most widely used method for solving the PUTSC problem and has attracted increasing attention due to its simplicity and effectiveness. Existing ST methods simply employ the one-nearest-neighbor (1NN) rule to determine which unlabeled time series should be labeled. However, we note that the 1NN rule may be suboptimal for PUTSC tasks because it is sensitive to initial labeled data located near the boundary between the positive and negative classes. To overcome this issue, we propose an exploratory methodology called ST-average. Unlike conventional ST-based approaches, ST-average labels data using the average sequence computed by the DTW barycenter averaging (DBA) technique. Compared with any individual in the PL set, the average sequence is more representative; consequently, our proposal is insensitive to the initial labeled data and is more reliable than existing ST-based methods. Moreover, we show that ST-average can naturally be combined with many techniques used in the original ST. Experimental results on public datasets show that ST-average outperforms related popular methods.
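The labeling loop sketched in the abstract (repeatedly computing a DBA average of the PL set and labeling the unlabeled series nearest to it under DTW) can be illustrated roughly as follows. This is a minimal sketch, not the authors' implementation: the function names, the squared-difference point cost, the initialization of DBA from the first PL series, the number of DBA refinement iterations, and the fixed number of labeling steps are all illustrative assumptions.

```python
import numpy as np

def dtw_path(a, b):
    """Classic DTW by dynamic programming; returns (distance, warping path)."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            D[i, j] = cost + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    # Backtrack the optimal warping path from (n, m) to (1, 1).
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    path.reverse()
    return np.sqrt(D[n, m]), path

def dba_average(series, n_iter=3):
    """DTW barycenter averaging: refine an initial average (here, the
    first series) by aligning every series to it and averaging the
    points warped onto each position of the average."""
    avg = np.array(series[0], dtype=float)
    for _ in range(n_iter):
        buckets = [[] for _ in range(len(avg))]
        for s in series:
            _, path = dtw_path(avg, s)
            for i, j in path:
                buckets[i].append(s[j])
        avg = np.array([np.mean(b) for b in buckets])
    return avg

def st_average(PL, U, n_steps):
    """Self-training with the DBA average: at each step, label the
    unlabeled series closest (in DTW) to the average of PL, move it
    into PL, and recompute the average."""
    PL = [np.asarray(s, dtype=float) for s in PL]
    U = [np.asarray(s, dtype=float) for s in U]
    for _ in range(n_steps):
        avg = dba_average(PL)
        dists = [dtw_path(avg, u)[0] for u in U]
        PL.append(U.pop(int(np.argmin(dists))))
    return PL, U
```

Because every new label is compared against the average of the whole PL set rather than against a single nearest neighbor, one boundary-adjacent seed series has much less influence on which unlabeled sequence is labeled next, which is the insensitivity the abstract claims.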