Sensors
  • Article
  • Open Access

27 December 2021

Time Series Classification with InceptionFCN

Department of Electronic and Computer Engineering, Inha University, Incheon 22212, Korea
* Author to whom correspondence should be addressed.
This article belongs to the Section Internet of Things

Abstract

Deep neural networks (DNNs) have proven to be efficient in computer vision and data classification, with an increasing number of successful applications. Time series classification (TSC) has been one of the challenging problems in data mining over the last decade, and significant research has produced a variety of solutions, ranging from traditional algorithm-based approaches to machine and deep learning approaches. This paper focuses on combining two well-known deep learning techniques, namely the Inception module and the Fully Convolutional Network (FCN). The proposed method proved to be more efficient than the previous state-of-the-art InceptionTime method. We tested our model on the univariate TSC benchmark (the UCR/UEA archive), which includes 85 time-series datasets, and showed that our network outperforms InceptionTime in terms of training time and overall accuracy on the UCR archive.

1. Introduction

Time series classification started to attract attention in the early 2000s, and, at that time, the proposed methods were based on traditional, algorithm-based approaches. Applying deep learning techniques to time-series data, however, has recently become popular among researchers. As the volume of available data grows, so does the demand to process it; consequently, data mining researchers increasingly need to extract, analyze, and understand time-series data. With the adoption of deep neural networks, the capability of classifying such data has also increased significantly. The idea of deep learning was first introduced by Yann LeCun [1] in 1998, when a multi-layer artificial neural network was developed to classify handwritten digits, but it truly gained popularity after the introduction of AlexNet [2] for image classification. Since then, tremendous research has been conducted, leading to the creation of successful algorithms. Deep neural networks are sophisticated and capable of solving computer vision problems by handling multidimensional data, such as image spatial information, for classification or localization. Most of the success of deep learning in image recognition tasks has been attributed to the “depth” of the architectures. Time-series data, by contrast, are computationally simpler, requiring the processing of only sequential values.
The open-source UCR archive, the largest time-series (TS) data collection, is used as a benchmark by most researchers to test model performance. The UCR archive currently contains 156 datasets that were initially collected and normalized [3]. The datasets differ in series length and number of classes. As a good example of the different methodologies for classifying TS data, we can point to Hassan Ismail Fawaz et al. [4], where the authors implemented various deep learning modules, including the multilayer perceptron (MLP), encoder, residual network (ResNet), and fully convolutional network (FCN), and compared the results on every dataset in the UCR archive for both univariate and multivariate data. An upgraded network was proposed by the same authors [5], who first adapted the Inception module for time series classification (TSC); each module has the same architecture but differently, randomly initialized weight values. The main idea behind the Inception module is to apply several simultaneous filters of varying length to a continuous time-series input. We investigated the diversity of the best related work and found that there is still room for improvement. In this study, we decrease the training time without an accuracy drop by applying a larger number of filters in the fundamental layers inside the Inception block and modifying the hyperparameters, trading a small computational overhead for amplified performance. Additionally, we implemented a deeper FCN with varying convolutional filters and finetuned the network by adding pooling layers and a dropout layer to decrease the likelihood of overfitting and make the model less sensitive to small fluctuations in the time-series data.
The key contributions are summarized as follows:
  • Inception block modification—we modified the existing Inception module by finetuning the parameters of the convolutional and max-pooling layers. We created narrower convolutional layers than in the original Inception block while including more kernels per layer. These changes speed up training by decreasing the number of parameters and FLOPs [6].
  • Aggregation—we combined a deeper FCN block with the Inception module to boost classification performance. We trained the initial time-series features on the Inception and FCN modules, then merged the outputs with an addition layer at the end of the network. Although each contribution is easy to implement, we believe this is the first work to combine both methods.
The rest of the paper is organized as follows. Section 2 briefly describes the UCR archive data collection and discusses various related work in traditional and deep learning methods. We explain the proposed network architecture in Section 3. Section 4 demonstrates the experimental results and illustrates numerical comparisons. Section 5 discusses our findings and the exclusion of some related work. Finally, Section 6 concludes the paper and discusses possible future work.

3. Methodology

3.1. Network Architecture

Our proposed method is similar to the InceptionTime method; the main difference is that we separate the network into two parts: the Inception module and the shortcut module. The Inception module is composed of two blocks. A bottleneck layer with a linear activation function, implemented as a 1D convolutional layer with 64 filters and a 1 × 1 kernel, is used at the top of the network to reduce the dimensionality of the input tensor. The rest of the block is similar to the original Inception block and consists of three convolutional layers with multiple filters of different sizes. Next, we added a max-pooling layer to make our model robust to small feature translations; that is, even if the feature map values shift slightly, the output of the max-pooling layer does not change. Furthermore, another convolutional layer was added to extract hidden features with a sliding 1 × 1 filter. At the end of the block, there is a concatenating layer. We reduced the number of Inception blocks from three to two, and, unlike the InceptionTime network, our InceptionFCN has a much smaller number of trainable parameters because we applied convolutions of length {10, 20} instead of the proposed {40, 20, 10}. We avoided an overparameterized network due to the risk of overfitting. Fawaz et al. [5] conducted architectural hyperparameter studies, creating several models with modified Inception-block parameters; in our work, we chose hyperparameters for the Inception block that keep training time and accuracy in an optimal trade-off.
In the shortcut module, we applied the deeper FCN used in another study [14], with slight architectural changes. The implemented FCN consists of six 1D convolutional layers of the same width but with different kernel lengths. The deeper FCN compensates for the removal of one Inception block in the overall network performance. The input data are passed through the first convolutional layer, which applies 128 filters with a 1 × 1 kernel; further 1D convolutions can be performed to reach the desired number of labels. The loss of the FCN is calculated by averaging the cross-entropy over every timestep and mini-batch.
At the bottom of the shortcut block, the last layer adds the output tensor of the Inception module to the output tensor of the FCN module. Then, we perform global average pooling on the summed tensor and, lastly, apply dropout to exclude the features from half of the nodes. An overview of the Inception module and FCN block is shown in Figure 2. Each convolutional layer is followed by batch normalization and a ReLU activation function.
Figure 2. The overview of our Inception (a) and FCN (b) blocks.
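To make the description concrete, the following is a minimal Keras sketch of the two modules under stated assumptions: the branch filter counts, the FCN kernel lengths after the first 1 × 1 layer, and the 1 × 1 convolution that matches channel widths before the addition are illustrative choices not fixed by the text above.

```python
# A minimal sketch of the described InceptionFCN architecture (not the
# authors' exact implementation). Branch filter counts and the later FCN
# kernel lengths are assumptions for illustration.
import tensorflow as tf
from tensorflow.keras import layers


def conv_bn_relu(x, filters, kernel_size):
    # Each convolutional layer is followed by batch normalization and ReLU.
    x = layers.Conv1D(filters, kernel_size, padding="same", use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    return layers.Activation("relu")(x)


def inception_block(x, bottleneck_filters=64, branch_filters=32):
    # Bottleneck: 1x1 conv with linear activation reduces input dimensionality.
    bottleneck = layers.Conv1D(bottleneck_filters, 1, padding="same",
                               activation="linear", use_bias=False)(x)
    # Parallel convolutions of length {10, 20} (instead of {40, 20, 10}).
    branches = [conv_bn_relu(bottleneck, branch_filters, k) for k in (10, 20)]
    # Max-pooling branch for robustness to small feature translations,
    # followed by a 1x1 convolution to extract hidden features.
    pooled = layers.MaxPooling1D(pool_size=3, strides=1, padding="same")(x)
    branches.append(conv_bn_relu(pooled, branch_filters, 1))
    # Concatenating layer at the end of the block.
    return layers.Concatenate()(branches)


def fcn_block(x, filters=128, kernel_sizes=(1, 8, 5, 5, 3, 3)):
    # Six 1D convolutional layers of the same width with different kernel
    # lengths; the first uses a 1x1 kernel with 128 filters as described.
    for k in kernel_sizes:
        x = conv_bn_relu(x, filters, k)
    return x


def build_inception_fcn(input_length, n_classes):
    inputs = layers.Input(shape=(input_length, 1))
    # Two inception blocks (reduced from three in InceptionTime).
    inc = inception_block(inception_block(inputs))
    fcn = fcn_block(inputs)
    # 1x1 conv to match channel widths before element-wise addition
    # (an assumption of this sketch, not stated in the text).
    inc = conv_bn_relu(inc, 128, 1)
    merged = layers.Add()([inc, fcn])
    pooled = layers.GlobalAveragePooling1D()(merged)
    pooled = layers.Dropout(0.5)(pooled)  # exclude features from half of the nodes
    outputs = layers.Dense(n_classes, activation="softmax")(pooled)
    return tf.keras.Model(inputs, outputs)
```

The dropout rate of 0.5 mirrors the “half of the nodes” description, and, as stated above, every convolution is followed by batch normalization and ReLU.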

3.2. Training the Network with UCR Archive

Before training the network, the data from the UCR archive must be preprocessed because the datasets differ in the number of classes, data distribution, and intensity. We considered only the first type of data described in Section 2, univariate time series; therefore, we used the UCR archive subset consisting of 85 univariate datasets to fairly compare our experimental results with existing methods. To make the data uniformly distributed, we normalized the input data to the range of 0.0 to 0.1. We nullified the unavailable timestep values inside some datasets to maintain the integrity of the incoming data. Since the amount of trainable data is very large, we used multithreading (parallelism) to speed up the preprocessing step. Our InceptionFCN network has a scalable input layer, as the input sizes differ across datasets. InceptionFCN was trained on a computer with a single RTX 2080 GPU; therefore, the training time may differ from that of Hassan Ismail Fawaz et al., who used over 60 GPUs for training/testing. We trained for 1600 epochs on each dataset, and each dataset has a different training time since the input sizes and numbers of classes differ. We included early stopping with a patience of 60 epochs. The median test accuracy was used as the evaluation metric. For comparison, we also trained the InceptionTime network on our machine with the hyperparameters presented by Hassan Ismail Fawaz et al. to check the processing time for both networks. We managed to decrease the training time by reducing the number of trainable parameters. Through empirical trial and error, we selected the hyperparameters for training our network shown in Table 1.
Table 1. Hyperparameters for InceptionFCN.
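A minimal sketch of this preprocessing pipeline, assuming in-memory NumPy arrays; the helper names and early-stopping setup are illustrative, as the loading details are not specified in the text:

```python
# Sketch of the preprocessing described above: zeroing missing timesteps,
# scaling each series to [0.0, 0.1], and multithreaded processing.
from concurrent.futures import ThreadPoolExecutor

import numpy as np
from tensorflow.keras.callbacks import EarlyStopping


def preprocess(series: np.ndarray) -> np.ndarray:
    series = np.nan_to_num(series, nan=0.0)       # nullify unavailable timesteps
    lo, hi = series.min(), series.max()
    if hi > lo:                                   # guard against constant series
        series = (series - lo) / (hi - lo) * 0.1  # normalize to [0.0, 0.1]
    return series


def preprocess_archive(datasets):
    # Multithreading (parallelism) to speed up the preprocessing step.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(preprocess, datasets))


# Early stopping with a patience of 60 epochs, as used during training.
early_stop = EarlyStopping(monitor="val_loss", patience=60)
```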

4. Results

4.1. Evaluation Metric

We evaluated the overall accuracy, inference time, and FLOPs on the UCR-85 archive of the UCR benchmark. The characteristics of each dataset (e.g., the number of classes, the training/test split sizes, and the time-series input length) vary. As a benchmark, the authors of the UCR archive provided the per class error (PCE) for three classifiers: 1-NN Euclidean distance, DTW, and 1-NN DTW with no window warping. We calculated the PCE for each dataset to evaluate the classification metric across multiple datasets. PCE is found with the formula:
$$PCE = \frac{1 - acc}{n},$$
where n is the number of classes and acc refers to the classification accuracy.
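As a worked example, the formula reduces to a one-line helper (illustrative, not from the paper):

```python
def per_class_error(acc: float, n_classes: int) -> float:
    """PCE = (1 - acc) / n; e.g., acc = 0.9 over 4 classes gives 0.025."""
    return (1.0 - acc) / n_classes
```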

4.2. Numerical Results and Comparison

To keep the comparison fair, we evaluated our network against the four best DL methods claiming state-of-the-art (SOTA) results in recent years: InceptionTime, MLP, FCN, and ResNet. We created a level playing field by training every network on our single-GPU machine. Our network performed best in both the accuracy and performance tests, having the lowest PCE rate on the UCR-85 archive. Table 2 shows that our proposed network outperformed the previous well-known methods, achieving higher accuracy on most of the datasets. In total, our model achieved a Win/Tie/Loss record of 52/9/17 over the 85 datasets, which is significant. InceptionTime also showed competitive performance; however, our network trains significantly faster without sacrificing overall performance.
Table 2. Testing accuracy and per class error rate (PCE) for five different networks on the UCR-85 archive. Green indicates the winner of the accuracy test.
Table 3 shows the difference in computational cost (FLOPs), which is proportional to inference and training time. Having fewer total parameters than InceptionTime lowers the risk of overfitting the model, even on larger datasets. Our proposed method trains significantly faster and has a roughly two times smaller architecture than InceptionTime (135 M vs. 309 M FLOPs) due to the smaller number of inception blocks and kernels.
Table 3. Average FLOPs comparison (in millions).

4.3. Wilcoxon Signed-Ranks Test

We applied the Wilcoxon signed-ranks test, a nonparametric statistical hypothesis test. The test ranks the differences in the performances of two classifiers (ours versus InceptionTime) on each dataset and compares the ranks of the positive and negative differences (R+ and R−), which helps evaluate the difference between the two methods. The null hypothesis is that the two methods perform equally well across all 85 datasets; the alternative hypothesis is that our model performs better than InceptionTime.
Let $d_i$ be the difference between the two classifiers’ performances on the $i$-th of the $n$ datasets. The differences are ranked from the lowest absolute value to the highest; in the case of ties, average ranks are assigned. From this, we can calculate R+ and R− by the following equations:
$$R^{+} = \sum_{d_i > 0} \operatorname{rank}(d_i) + \frac{1}{2} \sum_{d_i = 0} \operatorname{rank}(d_i)$$

$$R^{-} = \sum_{d_i < 0} \operatorname{rank}(d_i) + \frac{1}{2} \sum_{d_i = 0} \operatorname{rank}(d_i)$$
Let T be the smaller of the two sums, T = min(R+, R−). General statistics references include exact critical values of T for small numbers of datasets; for a larger number of datasets, the following statistic is distributed approximately normally:
$$z = \frac{T - \frac{1}{4}n(n+1)}{\sqrt{\frac{1}{24}n(n+1)(2n+1)}}$$
Table 4 shows a performance comparison between InceptionFCN and InceptionTime. By calculating the differences, we can compute the ranks used to find the value of T. The sum of positive ranks is R+ = 954.5 (Equation (7)), and the sum of negative ranks is R− = 2710.5 (Equation (8)), where each difference is calculated by subtracting the InceptionTime accuracy from the InceptionFCN accuracy. From R+ and R−, we find that T = 954.5. Note that the number of datasets used in this experiment is n = 85. Using Equation (9), we find that z = −3.83, which is smaller than −1.96 for α = 0.05 according to the z-score table. Therefore, we reject the null hypothesis and conclude that our method achieves better results than InceptionTime, as proposed.
Table 4. Differences and ranks for two classifiers.
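The ranking computation above can be sketched in a few lines of Python; the per-dataset accuracy arrays are assumed inputs, and SciPy’s scipy.stats.wilcoxon implements the same test end to end:

```python
# A sketch of Equations (7)-(9): signed-rank sums and the normal
# approximation. Arrays of per-dataset accuracies are assumed inputs.
import numpy as np
from scipy.stats import rankdata


def signed_rank_z(acc_ours, acc_other):
    d = np.asarray(acc_ours) - np.asarray(acc_other)
    ranks = rankdata(np.abs(d))                     # average ranks for ties
    r_plus = ranks[d > 0].sum() + 0.5 * ranks[d == 0].sum()
    r_minus = ranks[d < 0].sum() + 0.5 * ranks[d == 0].sum()
    t = min(r_plus, r_minus)
    n = d.size
    z = (t - n * (n + 1) / 4.0) / np.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    return r_plus, r_minus, z


# For n = 85 and T = 954.5 this yields z ≈ -3.83, matching the text.
```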
The next step is to assess the statistical significance of this difference between the two methods. One way is to use a sign test (a form of the binomial test [17]) on the wins and losses between the two best methods (since InceptionTime performs better than the other existing methods, we only compare InceptionTime and our InceptionFCN in this experiment). Under the null hypothesis, the two compared methods should perform equally, meaning each should win on n/2 datasets. Since the number of wins follows a binomial distribution [18], we can use the z-test: if one method wins on at least $\omega_{\alpha}$ datasets, as given by Equation (10), it is considered significantly better than the other with p < 0.05.
$$\omega_{\alpha} = \frac{n}{2} + z_{\alpha}\frac{\sqrt{n}}{2}$$
For n = 85 and $z_{\alpha}$ = 1.96, the value of $\omega_{\alpha}$ is 51.54. From Table 4, our method wins on 55 of the 85 datasets in the head-to-head comparison, which not only supports the alternative hypothesis (that our method outperforms InceptionTime) but also shows that the difference between the two methods is statistically significant with p < 0.05.
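A small illustrative helper for this threshold (Equation (10)):

```python
import math


def win_threshold(n: int, z_alpha: float = 1.96) -> float:
    """Minimum wins for significance: n/2 + z_alpha * sqrt(n) / 2."""
    return n / 2.0 + z_alpha * math.sqrt(n) / 2.0


# win_threshold(85) ≈ 51.54; the 55 observed wins exceed it, so p < 0.05.
```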

4.4. Critical Difference Calculation for Multiple Classifiers

Figure 3 shows the critical difference diagram for the compared methods.
Figure 3. Critical difference diagram of the arithmetic means of the ranks for the selected DL methods.
For the multiple-classifier comparison, we used the Wilcoxon–Holm post hoc test to determine which classifiers are significantly different from one another. The average arithmetic ranks shown in Figure 3 indicate that our InceptionFCN surpasses the well-known DL models. The critical difference diagram with Wilcoxon–Holm post hoc analysis provides evidence that adding a deeper FCN to the finetuned inception blocks improves the overall accuracy on the UCR archive. The critical difference is found by the following formula [19]:
$$CD = q_{\alpha}\sqrt{\frac{k(k+1)}{6n}}$$
where k = 5 is the number of selected classifiers, $q_{\alpha}$ is the critical value based on the Studentized range statistic divided by $\sqrt{2}$, and n is the number of datasets.
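As a sketch, using the standard Nemenyi critical value $q_{0.05} \approx 2.728$ for k = 5 classifiers (a table value assumed here, not given in the text):

```python
import math


def critical_difference(q_alpha: float, k: int, n: int) -> float:
    """CD = q_alpha * sqrt(k * (k + 1) / (6 * n))."""
    return q_alpha * math.sqrt(k * (k + 1) / (6.0 * n))


# critical_difference(2.728, 5, 85) ≈ 0.66: classifiers whose average
# ranks differ by less than this are connected in Figure 3.
```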

5. Discussion

There are other DL methods that we did not consider in this research, such as LSTM-FCN by Fazle Karim et al. [20]. Their results outperformed most of the existing methods, and the approach is regarded as one of the first choices for DL-based time-series classification. However, its computational complexity is very high, particularly when many subsequent layers are used, as LSTM networks learn long-term relationships through geometric feature evolutions [21]. Such a large increase in complexity was unacceptable for our research. For similar reasons, we did not consider implementing the ResNet architecture: ResNet would be redundant when used with the Inception block since both blocks perform similarly, and Fawaz et al. had already compared the performance of InceptionTime with the ResNet model. Furthermore, our motivation for this research was to build a faster-inferencing network with a favorable time–accuracy trade-off.
Through various experiments, we showed that our proposed model can achieve competitive performance while maintaining a smaller, more optimized network architecture. We believe this research will pave the way for further work on optimizing network structures so that such methods can be deployed on small embedded devices with limited computational and memory resources.

6. Conclusions

In this paper, we enhanced a deep neural classifier for numerous time series classification tasks. Inspired by Inception-based research, we evolved the inception module to achieve high performance at low computational cost (fewer FLOPs). We finetuned the network parameters and added a deeper shortcut FCN block to improve TSC performance. Our approach proved to be highly scalable, as it can be applied to time series collections of various sizes (i.e., the UCR archive). The proposed method also simplified network training, as we halved the number of parameters and conducted the research on a single-GPU machine. Moreover, using the Wilcoxon signed-ranks test and Wilcoxon–Holm post hoc analysis, we showed that the InceptionFCN model significantly outperforms InceptionTime.
However, all the experiments focused on univariate datasets. For future work, we would like to expand our network to multivariate data archives, such as UCR-128 and UCR-156. Moreover, we look forward to applying our architectural advancements to various computer vision tasks.

Author Contributions

Conceptualization, S.U. and B.I.; methodology, S.U.; software, S.U.; validation, S.B., S.U. and J.K.; formal analysis, B.I.; investigation, S.U.; resources, S.B.; data curation, B.I.; writing—original draft preparation, S.U.; writing—review and editing, J.K.; visualization, S.U.; supervision, J.K.; project administration, J.K.; funding acquisition, J.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Data available in a publicly accessible repository: the UCR time-series classification archive. Publicly available datasets were analyzed in this study. These data can be found at www.cs.ucr.edu/~eamonn/time_series_data/.

Acknowledgments

This work was supported by Inha University Research Grant.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. LeCun, Y.; Bottou, L.; Orr, G.B.; Müller, K.-R. Efficient BackProp. In Neural Networks: Tricks of the Trade; NIPS Workshop; Springer: New York, NY, USA, 1998; pp. 9–50. [Google Scholar]
  2. Krizhevsky, A.; Sutskever, I.; Hinton, G. ImageNet Classification with Deep Convolutional Neural Networks. In Proceedings of the NIPS’12: The 25th International Conference on Neural Information Processing Systems, Lake Tahoe, NV, USA, 3–6 December 2012; Volume 1, pp. 1097–1105. [Google Scholar]
  3. Chen, Y.; Keogh, E.; Hu, B.; Begum, N.; Bagnall, A.; Mueen, A.; Batista, G. The UCR Time Series Classification Archive. Available online: www.cs.ucr.edu/~eamonn/time_series_data (accessed on 11 October 2021).
  4. Ismail Fawaz, H.; Forestier, G.; Weber, J.; Idoumghar, L.; Muller, P.-A. Deep learning for time series classification: A review. Data Min. Knowl. Discov. 2019, 33, 917–963. [Google Scholar] [CrossRef] [Green Version]
  5. Ismail Fawaz, H.; Lucas, B.; Forestier, G.; Pelletier, C.; Schmidt, D.F.; Weber, J.; Webb, G.I.; Idoumghar, L.; Muller, P.-A.; Petitjean, F. InceptionTime: Finding AlexNet for time series classification. Data Min. Knowl. Discov. 2020, 34, 1936–1962. [Google Scholar] [CrossRef]
  6. Zagoruyko, S.; Komodakis, N. Wide residual networks. arXiv 2016, arXiv:1605.07146. [Google Scholar]
  7. Dau, H.A.; Bagnall, A.; Kamgar, K.; Yeh, C.C.M.; Zhu, Y.; Gharghabi, S.; Ratanamahatana, C.A.; Keogh, E. The UCR Time Series Archive. arXiv 2018, arXiv:1810.07758. [Google Scholar]
  8. Hu, B.; Chen, Y.; Keogh, E. Time Series Classification under More Realistic Assumptions. In Proceedings of the SIAM International Conference on Data Mining (SDM), Austin, TX, USA, 2–4 May 2013; pp. 578–586. [Google Scholar]
  9. Minaar, A. Python Time-Series K-NN Classification and K-Means Clustering Using Dynamic Time Warping. Available online: http://alexminnaar.com/2014/04/16/Time-Series-Classification-and-Clustering-with-Python.html (accessed on 16 April 2014).
  10. Rakthanmanon, T.; Campana, B.; Mueen, A.; Batista, G.; Westover, B.; Zhu, Q.; Zakaria, J.; Keogh, E. Searching and Mining Trillions of Time Series Subsequences Under Dynamic Time Warping. In Proceedings of the KDD’12: 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Beijing, China, 12–16 August 2012; pp. 262–270. [Google Scholar] [CrossRef] [Green Version]
  11. Deng, H.; Runger, G.; Tuv, E.; Vladimir, M. A Time Series Forest for Classification and Feature Extraction. arXiv 2013, arXiv:1302.2277v2. [Google Scholar] [CrossRef] [Green Version]
  12. Schäfer, P. The BOSS is concerned with time series classification in the presence of noise. Data Min. Knowl. Discov. 2015, 29, 1505–1530. [Google Scholar] [CrossRef]
  13. Wang, Z.; Yan, W.; Oates, T. Time series classification from scratch with deep neural networks: A strong baseline. arXiv 2016, arXiv:1611.06455v4. [Google Scholar]
  14. Baydadaev, S.; Usmankhujaev, S.; Kwon, J.; Kim, K.-S. Impulse Classification Network for Video Head Impulse Test. In Proceedings of the 42nd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), Montreal, QC, Canada, 20–24 July 2020; pp. 240–243. [Google Scholar] [CrossRef]
  15. Karimi-Bidhendi, S.; Munshi, F.; Munshi, A. Scalable Classification of Univariate and Multivariate Time Series. In Proceedings of the IEEE International Conference on Big Data (Big Data), Seattle, WA, USA, 10–13 December 2018; pp. 1598–1605. [Google Scholar] [CrossRef]
  16. Liu, Y.; Yu, J.; Han, Y. Understanding the effective receptive field in semantic image segmentation. Multimed. Tools Appl. 2018, 77, 22159–22171. [Google Scholar] [CrossRef]
  17. Sheskin, D.J. Handbook of Parametric and Nonparametric Statistical Procedures, 2nd ed.; Taylor & Francis: Boca Raton, FL, USA, 2000. [Google Scholar]
  18. Demsar, J. Statistical Comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. 2006, 7, 1–30. [Google Scholar]
  19. Bifet, A.; Morales, G.; Read, J.; Holmes, G.; Pfahringer, B. Efficient Online Evaluation of Big Data Stream Classifiers. In Proceedings of the KDD’15: The ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Sydney, Australia, 10–13 August 2015; pp. 59–68. [Google Scholar] [CrossRef] [Green Version]
  20. Karim, F.; Majumdar, S.; Darabi, H.; Chen, S. LSTM fully convolutional networks for time series classification. IEEE Access 2018, 6, 1662–1669. [Google Scholar] [CrossRef]
  21. Bonaccorso, G.; Fandango, A.; Shanmugamani, R. Python: Advanced Guide to Artificial Intelligence; Packt Publishing: Birmingham, UK, 2018; ISBN 9781789957211. [Google Scholar]
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
