# FilterNet: A Many-to-Many Deep Learning Architecture for Time Series Classification


## Abstract


## 1. Introduction

#### 1.1. Motivation

#### 1.2. Traditional Approaches

#### 1.3. Deep Learning Approaches

#### 1.4. Our Approach

- **Many-to-many.** Our approach can process entire input signals at once (see Figure 1), and does not require sliding, fixed-length windows.
- **Striding/downsampling.** If appropriate, our approach can use striding to reduce the number of samples it outputs. This can both improve computational efficiency and enable subsequent layers to model dynamics over longer time ranges.
- **Multi-scale.** Like the well-known U-Net image segmentation model [40], FilterNet can downsample beyond its output frequency to model even longer-range temporal dynamics.
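The many-to-one/many-to-many contrast can be sketched as follows. This is our own illustration (the function names, toy models, and window sizes are not from the paper); it shows only the control flow of each inference strategy:

```python
import numpy as np

def many_to_one(signal, window_model, window=64, step=16):
    # Many-to-one: slide fixed-length, overlapping windows over the input,
    # collect one prediction per window, and concatenate the predictions.
    preds = [window_model(signal[i:i + window])
             for i in range(0, len(signal) - window + 1, step)]
    return np.array(preds)

def many_to_many(signal, sequence_model):
    # Many-to-many: a single model evaluation over the entire input signal.
    return sequence_model(signal)
```

Note that the many-to-one path must run the model once per window, while the many-to-many path amortizes all computation over one pass.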

## 2. FilterNet

#### 2.1. FilterNet Layer Modules (FLMs)

#### 2.2. Component Architecture

- **(A) Full-Resolution CNN** (s = 1, t = cnn). High-resolution processing. Convolves CNN filters against the input signal without striding or pooling, in order to extract information at the finest available temporal resolution. This layer is computationally expensive because it is applied to the full-resolution input signal.
- **(B) Pooling Stack 1** (s > 1, t = cnn). Downsamples from the input to the output frequency. This stack of ${n}_{p1}$ CNN modules (each strided by s) downsamples the input signal by a total factor of ${s}^{{n}_{p1}}$. The output length of this stack determines the output stride ratio, ${s}_{\mathrm{out}}={s}^{{n}_{p1}}$, and thus the output length of the network for a given input, ${L}_{\mathrm{out}}={L}_{\mathrm{in}}/{s}^{{n}_{p1}}$.
- **(C) Pooling Stack 2** (s > 1, t = cnn). Downsamples beyond the overall output frequency. This stack of ${n}_{p2}$ modules (again, each strided by s) further downsamples the output of the previous layer in order to capture slower temporal dynamics. To protect against overtraining, the width of each successive module is reduced by a factor of $s$ so that ${w}_{i}={w}_{p}{s}^{1-i}$ for $i=1,\dots ,{n}_{p2}$.
- **(D) Resampling Step.** Matches output lengths. In this step, every output of Pooling Stack 2 (C) is resampled in the temporal dimension via linear interpolation to match the network output length ${L}_{\mathrm{out}}$. These outputs are concatenated with the final module output of Pooling Stack 1 (B). Without this step, the lengths of the outputs of (C) would not match the output length, and so they could not be processed together in the next layer. We have found that exposing each intermediate output of (C) in this manner, as opposed to only exposing the final output of (C), improves the model's training dynamics and accuracy.
- **(E) Bottleneck Layer.** Reduces channel number. This module effectively reduces the width of the concatenated outputs from (D), reducing the number of learned weights needed in the recurrent stack (F). This bottleneck layer allows a large number of channels to be concatenated from (C) and (D) without resulting in overtraining or excessively slowing down the network. As a CNN with kernel length $k=1$, it is similar to a fully connected dense network applied independently at each time step.
- **(F) Recurrent Stack** (s = 1, t = lstm). Temporal modeling. This stack of ${n}_{l}$ recurrent LSTM modules provides additional modeling capacity, enables modeling of long-range temporal dynamics, and improves the output stability of the network.
- **(G) Output Module** (s = 1, k = 1, t = cnn). Provides predictions for each output time step. As in (E), this is implemented as a CNN with $k=1$, but in this case without a final batch normalization layer. The multi-class outputs demonstrated in this work use a softmax activation function.
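Components (A)-(G) can be sketched in PyTorch as follows, using the reference widths from Tables 2 and 3. The FLM internals shown here (convolution, then batch normalization, then ReLU) and the padding choices are our assumptions for a minimal runnable sketch, not the authors' exact module construction:

```python
# Minimal sketch of the FilterNet component architecture; hyperparameters
# follow the ms-C/L reference configuration, internals are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

def flm_cnn(w_in, w_out, s=1, k=5):
    """CNN-type FilterNet layer module: strided 1-D conv, batch norm, ReLU."""
    return nn.Sequential(
        nn.Conv1d(w_in, w_out, k, stride=s, padding=k // 2),
        nn.BatchNorm1d(w_out),
        nn.ReLU(),
    )

class FilterNetSketch(nn.Module):
    def __init__(self, c_in=113, n_classes=18, w=100):
        super().__init__()
        self.a = flm_cnn(c_in, w)                                        # (A) full-resolution CNN
        self.b = nn.Sequential(*(flm_cnn(w, w, s=2) for _ in range(3)))  # (B) pooling stack 1: stride 8 total
        # (C) pooling stack 2: widths shrink 100 -> 50 -> 25 -> 13 -> 7
        self.c = nn.ModuleList([flm_cnn(100, 50, s=2), flm_cnn(50, 25, s=2),
                                flm_cnn(25, 13, s=2), flm_cnn(13, 7, s=1)])
        self.e = flm_cnn(100 + 50 + 25 + 13 + 7, w, k=1)                 # (E) bottleneck, k=1
        self.f = nn.LSTM(w, w, batch_first=True)                         # (F) recurrent stack (one layer here)
        self.g = nn.Conv1d(w, n_classes, 1)                              # (G) output module

    def forward(self, x):                     # x: (batch, channels, time)
        i1 = self.b(self.a(x))                # length L_in / 8 = L_out
        feats, h = [i1], i1
        for mod in self.c:
            h = mod(h)
            # (D) resample each intermediate back to L_out by linear interpolation
            feats.append(F.interpolate(h, size=i1.shape[-1], mode="linear",
                                       align_corners=False))
        z = self.e(torch.cat(feats, dim=1))   # 195 channels -> 100
        z, _ = self.f(z.transpose(1, 2))      # LSTM over the time dimension
        return self.g(z.transpose(1, 2))      # per-time-step class scores
```

For a 128-sample, 113-channel input, this sketch emits 16 time steps of 18 class scores, i.e., an output stride ratio of 8.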

#### 2.3. FilterNet Variants

## 3. Materials and Methods

#### 3.1. Benchmark Dataset

#### 3.2. Reference Architectures

#### 3.3. Software and Hardware Specifications

#### 3.4. Inference Windowing

#### 3.5. Performance Metrics

#### 3.6. Hyperparameter Search

#### 3.7. Model Training

#### 3.8. Ensembling

…the $i$th model uses the $i$th fold for validation and the remaining $n-1$ folds for training; and (d) ensembling the $n$ models together during inference by simply averaging their logit outputs before the softmax function was applied. Performance of the overall ensemble was still measured on the same test set used in other experiments; only the train and validation sets varied. For efficiency, the evaluation and ensembling of the $n$ models was performed using a single computation graph in PyTorch.
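The logit-averaging step in (d) can be sketched as follows. This is a pure-NumPy illustration with our own function names; the paper performs the equivalent operation inside a single PyTorch computation graph:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # stabilize before exponentiating
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def ensemble_probs(logits_per_model):
    """logits_per_model: (n_models, time, classes). Average the n models'
    logits first, then apply softmax once, as described above."""
    return softmax(np.mean(logits_per_model, axis=0))
```

Averaging in logit space (rather than averaging post-softmax probabilities) keeps the ensemble a single differentiable graph and weights each model's raw evidence equally.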

## 4. Results

#### 4.1. Model Performance

#### 4.2. Model Ensembling

#### 4.3. Window Length Effects

#### 4.4. Performance Using Fewer Sensor Channels

#### 4.5. Comparison to Published Results

## 5. Discussion

#### 5.1. Recommendations for Use

#### 5.2. Other Modifications

## 6. Conclusions

## Author Contributions

## Funding

## Acknowledgments

## Conflicts of Interest

## References

1. Yang, Q.; Wu, X. 10 challenging problems in data mining research. Int. J. Inf. Technol. Decis. Mak. 2006, 5, 597–604.
2. Altilio, R.; Andreasi, G.; Panella, M. A Classification Approach to Modeling Financial Time Series. In Neural Advances in Processing Nonlinear Dynamic Signals; Esposito, A., Faundez-Zanuy, M., Morabito, F.C., Pasero, E., Eds.; Springer International Publishing: Cham, Switzerland, 2019; pp. 97–106. ISBN 978-3-319-95098-3.
3. Susto, G.A.; Cenedese, A.; Terzi, M. Chapter 9—Time-Series Classification Methods: Review and Applications to Power Systems Data. In Big Data Application in Power Systems; Arghandeh, R., Zhou, Y., Eds.; Elsevier: Amsterdam, The Netherlands, 2018; pp. 179–220.
4. Rajkomar, A.; Oren, E.; Chen, K.; Dai, A.M.; Hajaj, N.; Hardt, M.; Liu, P.J.; Liu, X.; Marcus, J.; Sun, M.; et al. Scalable and accurate deep learning with electronic health records. Npj Digit. Med. 2018, 1, 1–10.
5. Nwe, T.L.; Dat, T.H.; Ma, B. Convolutional neural network with multi-task learning scheme for acoustic scene classification. In Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Kuala Lumpur, Malaysia, 12–15 December 2017; pp. 1347–1350.
6. Lotte, F.; Bougrain, L.; Cichocki, A.; Clerc, M.; Congedo, M.; Rakotomamonjy, A.; Yger, F. A review of classification algorithms for EEG-based brain–computer interfaces: A 10 year update. J. Neural Eng. 2018, 15, 031005.
7. Wang, J.; Chen, Y.; Hao, S.; Peng, X.; Hu, L. Deep Learning for Sensor-based Activity Recognition: A Survey. Pattern Recognit. Lett. 2019, 119, 3–11.
8. Bagnall, A.; Lines, J.; Bostrom, A.; Large, J.; Keogh, E. The great time series classification bake off: A review and experimental evaluation of recent algorithmic advances. Data Min. Knowl. Discov. 2017, 31, 606–660.
9. Fortino, G.; Galzarano, S.; Gravina, R.; Li, W. A framework for collaborative computing and multi-sensor data fusion in body sensor networks. Inf. Fusion 2015, 22, 50–70.
10. Dau, H.A.; Bagnall, A.; Kamgar, K.; Yeh, C.-C.M.; Zhu, Y.; Gharghabi, S.; Ratanamahatana, C.A.; Keogh, E. The UCR Time Series Archive. IEEE/CAA J. Autom. Sin. 2019, 6, 1293–1305.
11. Dua, D.; Graff, C. UCI Machine Learning Repository. Available online: http://archive.ics.uci.edu/ml (accessed on 25 March 2020).
12. Kumar, S. Ubiquitous Smart Home System Using Android Application. Int. J. Comput. Netw. Commun. 2014, 6, 33–43.
13. Qin, Z.; Hu, L.; Zhang, N.; Chen, D.; Zhang, K.; Qin, Z.; Choo, K.-K.R. Learning-Aided User Identification Using Smartphone Sensors for Smart Homes. IEEE Internet Things J. 2019, 6, 7760–7772.
14. Kim, Y.; Toomajian, B. Hand Gesture Recognition Using Micro-Doppler Signatures with Convolutional Neural Network. IEEE Access 2016, 4, 7125–7130.
15. Rautaray, S.S.; Agrawal, A. Vision based hand gesture recognition for human computer interaction: A survey. Artif. Intell. Rev. 2015, 43, 1–54.
16. Pantelopoulos, A.; Bourbakis, N.G. A Survey on Wearable Sensor-Based Systems for Health Monitoring and Prognosis. IEEE Trans. Syst. Man Cybern. Part C Appl. Rev. 2010, 40, 1–12.
17. Vepakomma, P.; De, D.; Das, S.K.; Bhansali, S. A-Wristocracy: Deep learning on wrist-worn sensing for recognition of user complex activities. In Proceedings of the 2015 IEEE 12th International Conference on Wearable and Implantable Body Sensor Networks (BSN), Cambridge, MA, USA, 9–12 June 2015; pp. 1–6.
18. Pet Insight Project. Available online: https://www.petinsight.com (accessed on 16 December 2019).
19. Whistle. Available online: https://www.whistle.com/ (accessed on 16 December 2019).
20. Lines, J.; Taylor, S.; Bagnall, A. Time Series Classification with HIVE-COTE: The Hierarchical Vote Collective of Transformation-Based Ensembles. ACM Trans. Knowl. Discov. Data 2018, 12, 1–35.
21. Ismail Fawaz, H.; Forestier, G.; Weber, J.; Idoumghar, L.; Muller, P.-A. Deep learning for time series classification: A review. Data Min. Knowl. Discov. 2019, 33, 917–963.
22. Alsheikh, M.A.; Selim, A.; Niyato, D.; Doyle, L.; Lin, S.; Tan, H.-P. Deep activity recognition models with triaxial accelerometers. In Proceedings of the Workshops at the Thirtieth AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA, 12–13 February 2016; pp. 8–13.
23. Plötz, T.; Hammerla, N.Y.; Olivier, P. Feature learning for activity recognition in ubiquitous computing. In Proceedings of the IJCAI 2011—22nd International Joint Conference on Artificial Intelligence, Barcelona, Spain, 16–22 July 2011; pp. 1729–1734.
24. Bengio, Y. Deep Learning of Representations: Looking Forward. In Proceedings of Statistical Language and Speech Processing; Dediu, A.-H., Martín-Vide, C., Mitkov, R., Truthe, B., Eds.; Springer: Berlin/Heidelberg, Germany, 2013; pp. 1–37.
25. Yang, J.B.; Nguyen, M.N.; San, P.P.; Li, X.L.; Krishnaswamy, S. Deep convolutional neural networks on multichannel time series for human activity recognition. In Proceedings of the 24th International Conference on Artificial Intelligence, Buenos Aires, Argentina, 25–31 July 2015; pp. 3995–4001.
26. Yazdanbakhsh, O.; Dick, S. Multivariate Time Series Classification using Dilated Convolutional Neural Network. arXiv 2019, arXiv:1905.01697.
27. Hatami, N.; Gavet, Y.; Debayle, J. Classification of time-series images using deep convolutional neural networks. In Proceedings of the Tenth International Conference on Machine Vision (ICMV 2017), International Society for Optics and Photonics, Vienna, Austria, 13 April 2018; Volume 10696, p. 106960Y.
28. Qin, Z.; Zhang, Y.; Meng, S.; Qin, Z.; Choo, K.-K.R. Imaging and fusing time series for wearable sensor-based human activity recognition. Inf. Fusion 2020, 53, 80–87.
29. Thanaraj, K.P.; Parvathavarthini, B.; Tanik, U.J.; Rajinikanth, V.; Kadry, S.; Kamalanand, K. Implementation of Deep Neural Networks to Classify EEG Signals using Gramian Angular Summation Field for Epilepsy Diagnosis. arXiv 2020, arXiv:2003.04534.
30. Bengio, Y.; Simard, P.; Frasconi, P. Learning long-term dependencies with gradient descent is difficult. IEEE Trans. Neural Netw. 1994, 5, 157–166.
31. Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780.
32. Lipton, Z.C.; Berkowitz, J.; Elkan, C. A Critical Review of Recurrent Neural Networks for Sequence Learning. arXiv 2015, arXiv:1506.00019.
33. Ordóñez, F.J.; Roggen, D. Deep Convolutional and LSTM Recurrent Neural Networks for Multimodal Wearable Activity Recognition. Sensors 2016, 16, 115.
34. Chavarriaga, R.; Sagha, H.; Calatroni, A.; Digumarti, S.T.; Tröster, G.; Millán, J. del R.; Roggen, D. The Opportunity challenge: A benchmark database for on-body sensor-based activity recognition. Pattern Recognit. Lett. 2013, 34, 2033–2042.
35. Zappi, P.; Lombriser, C.; Stiefmeier, T.; Farella, E.; Roggen, D.; Benini, L.; Tröster, G. Activity Recognition from On-Body Sensors: Accuracy-Power Trade-Off by Dynamic Sensor Selection. In Proceedings of Wireless Sensor Networks; Verdone, R., Ed.; Springer: Berlin/Heidelberg, Germany, 2008; pp. 17–33.
36. Hammerla, N.Y.; Halloran, S.; Plötz, T. Deep, convolutional, and recurrent models for human activity recognition using wearables. In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, New York, NY, USA, 9–15 July 2016; pp. 1533–1540.
37. Inoue, M.; Inoue, S.; Nishida, T. Deep Recurrent Neural Network for Mobile Human Activity Recognition with High Throughput. Artif. Life Robot. 2018, 23, 173–185.
38. Edel, M.; Köppe, E. Binarized-BLSTM-RNN based Human Activity Recognition. In Proceedings of the 2016 International Conference on Indoor Positioning and Indoor Navigation (IPIN), Alcala de Henares, Spain, 4–7 October 2016; pp. 1–7.
39. Guan, Y.; Ploetz, T. Ensembles of Deep LSTM Learners for Activity Recognition using Wearables. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 2017, 1, 1–28.
40. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015; Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F., Eds.; Springer International Publishing: Cham, Switzerland, 2015; pp. 234–241.
41. Humayun, A.I.; Ghaffarzadegan, S.; Feng, Z.; Hasan, T. Learning Front-end Filter-bank Parameters using Convolutional Neural Networks for Abnormal Heart Sound Detection. In Proceedings of the 2018 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Honolulu, HI, USA, 17–21 July 2018; pp. 1408–1411.
42. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; The MIT Press: Cambridge, MA, USA, 2016; ISBN 978-0-262-03561-3.
43. Baloch, Z.; Shaikh, F.K.; Unar, M.A. Deep architectures for human activity recognition using sensors. 3C Tecnol. Glosas Innov. Apl. Pyme 2019, 8, 14–35.
44. UCI Machine Learning Repository: OPPORTUNITY Activity Recognition Data Set. Available online: https://archive.ics.uci.edu/ml/datasets/OPPORTUNITY+Activity+Recognition (accessed on 4 December 2019).
45. nhammerla/deepHAR. Available online: https://github.com/nhammerla/deepHAR (accessed on 5 December 2019).
46. sussexwearlab/DeepConvLSTM. Available online: https://github.com/sussexwearlab/DeepConvLSTM (accessed on 9 December 2019).
47. Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In Advances in Neural Information Processing Systems 32; Wallach, H., Larochelle, H., Beygelzimer, A., Alché-Buc, F., Fox, E., Garnett, R., Eds.; Curran Associates Inc.: Red Hook, NY, USA, 2019; pp. 8024–8035.
48. Hevesi, P. phev8/ward-metrics. Available online: https://github.com/phev8/ward-metrics (accessed on 6 December 2019).
49. Ward, J.A.; Lukowicz, P.; Gellersen, H.W. Performance metrics for activity recognition. ACM Trans. Intell. Syst. Technol. 2011, 2, 1–23.
50. Amazon EC2 P2 Instances. Available online: https://aws.amazon.com/ec2/instance-types/p2/ (accessed on 6 December 2019).
51. Zhao, Y.; Yang, R.; Chevalier, G.; Xu, X.; Zhang, Z. Deep Residual Bidir-LSTM for Human Activity Recognition Using Wearable Sensors. Math. Probl. Eng. 2018, 2018.
52. Long, J.; Sun, W.; Yang, Z.; Raymond, O.I. Asymmetric Residual Neural Network for Accurate Human Activity Recognition. Information 2019, 10, 203.
53. Murahari, V.S.; Plötz, T. On Attention Models for Human Activity Recognition. In Proceedings of the 2018 ACM International Symposium on Wearable Computers; ACM: New York, NY, USA, 2018; pp. 100–103.
54. Hatami, N.; Chira, C. Classifiers with a reject option for early time-series classification. In Proceedings of the 2013 IEEE Symposium on Computational Intelligence and Ensemble Learning (CIEL), Singapore, 16–19 April 2013; pp. 9–16.

**Figure 1.** In a typical many-to-one approach (**left**), an input is first divided into fixed-length overlapping windows, then a model processes each window individually, generating a class prediction for each one, and finally the predictions are concatenated into an output time series. In a many-to-many approach (**right**), the entire output time series is generated with a single model evaluation.

**Figure 2.** FilterNet architecture. (**a**) Each FilterNet model is composed primarily of one or more stacks of FilterNet layer modules (FLMs), which are parameterized and constructed as shown (see text for elaboration). (**b**) In the prototypical FilterNet component architecture, FLMs are grouped into components that can be parameterized and combined to implement time series classifiers of varying speed and complexity, tuned to the problem at hand.

**Figure 3.**Representative training history for a ms-C/L model. While the validation loss oscillated and had near-global minima at epochs 27, 35, and 41, the custom stopping metric (see text) adjusted more predictably to a minimum at epoch 36. Training was stopped at epoch 47 and the model from epoch 36 was restored and used for subsequent inference.

**Figure 4.** Heatmap demonstrating differences between model outputs. (**a**) The ground-truth labels for 150 s during the first run in the standard Opportunity test set; (**b**–**e**) alongside the corresponding predictions for the 17 non-null behavior classes for various FilterNet architectures. Panes are annotated with weighted F1 scores (${F}_{1w}$ and the event-based ${F}_{1e}$) calculated over the plotted region.

**Figure 5.** Event-based metrics. Performance metrics for several classifiers, including event-based precision ${P}_{e}$, recall ${R}_{e}$, and F1 score ${F}_{1e}$, alongside event summary diagrams, each for a single representative run.

**Figure 6.** Performance of n-fold ensembled ms-C/L models. Both sample-based (**a**) and event-based (**b**) weighted F1 metrics improve substantially with the number of ensembled models, plateauing between 3–5 folds, while inference rate (pane **c**) decreases. (mean $\pm$ sd, n = 10).

**Figure 7.**Effects of inference window length. For all models, accuracy metrics (top panes) improve as inference window length increases, especially when models incorporate multi-scale or LSTM architectural features. For LSTM-containing models, the inference rate (bottom panes) falls off for long windows because their calculation across the time dimension cannot be fully parallelized. (n = 5 each).

**Figure 8.** Performance of different Opportunity sensor subsets (mean of n = 5 runs) according to (**a**) the sample-by-sample ${F}_{1w,nn}$ and (**b**) the event-based F1 ${F}_{1e}$. Using larger sensor subsets, including gyroscopes (G), accelerometers (A), and the magnetic (Mag) components of the inertial measurement units, as well as all 113 standard sensor channels (All), tended to improve performance metrics. The best models (e.g., 4× ms-C/L, ms-C/L, ms-CNN, and p-C/L) maintain relatively high performance even with fewer sensor channels.

**Table 1.**Several FilterNet variants (and their abbreviated names). The variants range from simpler to more complex, using different combinations of components A-G. CNN: convolutional neural network; LSTM: long short-term memory.

| Variant | A | B | C | D | E | F | G |
|---|---|---|---|---|---|---|---|
| Base LSTM (b-LSTM) ^{1} | - | - | - | - | - | ✓ | ✓ |
| Pooled CNN (p-CNN) | ✓ | ✓ | - | - | - | - | ✓ |
| Pooled CNN/LSTM (p-C/L) | ✓ | ✓ | - | - | - | ✓ | ✓ |
| Multi-Scale CNN (ms-CNN) | ✓ | ✓ | ✓ | ✓ | ✓ | - | ✓ |
| Multi-Scale CNN/LSTM (ms-C/L) | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |

^{1}Base LSTM has a 1:1 output stride ratio, unlike the other variants.

**Table 2.**Configuration of the reference architectures used in this article. Each configuration represents one of the named FilterNet variants.

| Component | b-LSTM | p-CNN | p-C/L | ms-CNN | ms-C/L |
|---|---|---|---|---|---|
| A—Full-Res CNN | - | ${L}_{\mathrm{CNN}}(100)$ | ${L}_{\mathrm{CNN}}(100)$ | ${L}_{\mathrm{CNN}}(100)$ | ${L}_{\mathrm{CNN}}(100)$ |
| B—Pooling Stack 1 | - | ${L}_{\mathrm{CNN}}(100, s=2)$; ${L}_{\mathrm{CNN}}(100, s=2)$; ${L}_{\mathrm{CNN}}(100, s=2) \to {i}_{1}$ | ${L}_{\mathrm{CNN}}(100, s=2)$; ${L}_{\mathrm{CNN}}(100, s=2)$; ${L}_{\mathrm{CNN}}(100, s=2) \to {i}_{1}$ | ${L}_{\mathrm{CNN}}(100, s=2)$; ${L}_{\mathrm{CNN}}(100, s=2)$; ${L}_{\mathrm{CNN}}(100, s=2) \to {i}_{1}$ | ${L}_{\mathrm{CNN}}(100, s=2)$; ${L}_{\mathrm{CNN}}(100, s=2)$; ${L}_{\mathrm{CNN}}(100, s=2) \to {i}_{1}$ |
| C—Pooling Stack 2 | - | - | - | ${L}_{\mathrm{CNN}}(100, s=2) \to {i}_{2}$; ${L}_{\mathrm{CNN}}(50, s=2) \to {i}_{3}$; ${L}_{\mathrm{CNN}}(25, s=2) \to {i}_{4}$; ${L}_{\mathrm{CNN}}(13) \to {i}_{5}$ | ${L}_{\mathrm{CNN}}(100, s=2) \to {i}_{2}$; ${L}_{\mathrm{CNN}}(50, s=2) \to {i}_{3}$; ${L}_{\mathrm{CNN}}(25, s=2) \to {i}_{4}$; ${L}_{\mathrm{CNN}}(13) \to {i}_{5}$ |
| D—Resampling Step | - | - | - | 195 output channels ^{1} | 195 output channels ^{1} |
| E—Bottleneck Layer | - | - | - | ${L}_{\mathrm{CNN}}(100, k=1)$ | ${L}_{\mathrm{CNN}}(100, k=1)$ |
| F—Recurrent Layers | ${L}_{\mathrm{LSTM}}(100)$ | - | ${L}_{\mathrm{LSTM}}(100)$ | - | ${L}_{\mathrm{LSTM}}(100)$ |
| G—Output Module | ${L}_{\mathrm{CNN}}(18, k=1)$ | ${L}_{\mathrm{CNN}}(18, k=1)$ | ${L}_{\mathrm{CNN}}(18, k=1)$ | ${L}_{\mathrm{CNN}}(18, k=1)$ | ${L}_{\mathrm{CNN}}(18, k=1)$ |

^{1}Resamples intermediates ${i}_{2,\dots ,5}$ to each have $\mathrm{len}\left({i}_{1}\right)$. Concatenate with ${i}_{1}$ for 195 output channels with matching lengths.
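The resample-and-concatenate step described in footnote 1 can be sketched in NumPy (our own helper names; the paper's implementation resamples inside PyTorch):

```python
import numpy as np

def resample_and_concat(i1, intermediates):
    """Linearly interpolate each intermediate (channels_j, L_j) along time to
    len(i1), then concatenate with i1 (channels, L_out) on the channel axis."""
    l_out = i1.shape[1]
    grid_out = np.linspace(0.0, 1.0, l_out)
    stacked = [i1]
    for h in intermediates:
        grid_in = np.linspace(0.0, 1.0, h.shape[1])
        # Interpolate each channel of the shorter signal onto the output grid.
        stacked.append(np.stack([np.interp(grid_out, grid_in, ch) for ch in h]))
    return np.concatenate(stacked, axis=0)
```

With the reference widths (${i}_{1}$: 100 channels; ${i}_{2},\dots,{i}_{5}$: 50, 25, 13, and 7), the result has the 195 matching-length channels noted in the footnote.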

**Table 3.**Layer-by-layer configuration and description of the multi-scale CNN/LSTM (ms-C/L), the most complex reference architecture.

| Component | Type | $W_{\mathrm{in}}$ | $W_{\mathrm{out}}$ | $s$ | $k$ | Params ^{1} | Output Stride Ratio ^{2} | ROI ^{3} |
|---|---|---|---|---|---|---|---|---|
| in | input | 113 ^{4} | | | | | 1 | |
| A | $FL{M}_{\mathrm{CNN}}$ | 113 ^{4} | 100 | 1 | 5 | 56,700 ^{4} | 1 | 5 |
| B | $FL{M}_{\mathrm{CNN}}$ | 100 | 100 | 2 | 5 | 50,200 | 2 | 13 |
| B | $FL{M}_{\mathrm{CNN}}$ | 100 | 100 | 2 | 5 | 50,200 | 4 | 29 |
| B | $FL{M}_{\mathrm{CNN}}$ | 100 | 100 | 2 | 5 | 50,200 | 8 | 61 |
| C | $FL{M}_{\mathrm{CNN}}$ | 100 | 50 | 2 | 5 | 25,100 | 16 | 125 |
| C | $FL{M}_{\mathrm{CNN}}$ | 50 | 25 | 2 | 5 | 6300 | 32 | 253 |
| C | $FL{M}_{\mathrm{CNN}}$ | 25 | 13 | 2 | 5 | 1651 | 64 | 509 |
| C | $FL{M}_{\mathrm{CNN}}$ | 13 | 7 | 1 | 5 | 469 | 64 | 765 |
| D | resample | 195 | 195 | | | | | 765 |
| E | $FL{M}_{\mathrm{CNN}}$ | 195 | 100 | 1 | 1 | 19,700 | 8 | 765 |
| F | $FL{M}_{\mathrm{LSTM}}$ | 100 | 100 | 1 | | 80,500 | 8 | all |
| G | $FL{M}_{\mathrm{CNN}}$ | 100 | 18 | 1 | 1 | 1818 | 8 | all |
| out | output | 18 | | | | | | all |

^{1}Number of trainable parameters, including those for bias and batch normalization.

^{2}Ratio of layer output frequency to system input frequency.

^{3}Region of influence; see text for detail.

^{4}Varies when sensor subsets are used.
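The stride-ratio and ROI columns for the convolutional layers can be reproduced with a short recurrence. This is our reconstruction; it matches every CNN row of the table when each layer's stride is applied before widening the region of influence:

```python
def stride_and_roi(layers):
    """layers: (s, k) per CNN layer, in order from the input.
    Returns the cumulative output stride ratio and region of
    influence (ROI) after each layer."""
    jump, roi, out = 1, 1, []
    for s, k in layers:
        jump *= s              # cumulative output stride ratio
        roi += (k - 1) * jump  # each extra tap widens the ROI by jump samples
        out.append((jump, roi))
    return out
```

Applied to layers A through the final C module, with $k=5$ throughout, this yields (1, 5), (2, 13), ..., (64, 765), matching the table; after the resampling step (D), subsequent layers operate at the overall output stride ratio of 8.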

**Table 4.**Training parameters used in this study, along with an approximate recommended range for consideration in similar applications.

| Parameter | Value | Recommended Range |
|---|---|---|
| Max epochs | 100 | 50–150 |
| Initial learning rate | 0.001 | 0.0005–0.005 |
| Samples per batch | 5,000 | 2,500–10,000 |
| Training window step | 16 | 8–64 |
| Optimizer | Adam | Adam, RMSProp |
| Weight decay | 0.0001 | 0–0.001 |
| Patience | 10 | 5–15 |
| Learning rate decay | 0.95 | 0.9–1.0 |
| Window length | 512 | 64–1024 |
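The schedule in Table 4 can be wired together roughly as follows. The loop structure and function names are our assumptions, and the sketch stops on plain validation loss, whereas the paper uses a custom stopping metric (see Figure 3):

```python
def fit(run_epoch, validation_loss, max_epochs=100, lr=0.001,
        lr_decay=0.95, patience=10):
    """run_epoch(lr) trains one epoch; validation_loss() scores the model.
    Returns the epoch whose weights should be restored for inference."""
    best_loss, best_epoch = float("inf"), 0
    for epoch in range(max_epochs):
        run_epoch(lr)
        loss = validation_loss()
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch   # checkpoint here in practice
        elif epoch - best_epoch >= patience:
            break                                 # early stopping after `patience` stale epochs
        lr *= lr_decay                            # per-epoch multiplicative decay
    return best_epoch
```

This mirrors the behavior shown in Figure 3, where training stopped at epoch 47 and the epoch-36 weights were restored.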

**Table 5.** Model and results summary on the Opportunity dataset for the five FilterNet reference architectures and three modifications of the ms-C/L architecture. The 4-fold ms-C/L variant exhibited the highest accuracy, while smaller and simpler variants were generally faster but less accurate. The best results are in bold.

| Model | n ^{1} | Stride Ratio | Params (k) | ${F}_{1w}$ | ${F}_{1m}$ | ${F}_{1e}$ | ${F}_{1w,nn}$ | kSamp/s | Train s/epoch |
|---|---|---|---|---|---|---|---|---|---|
| **FilterNet reference architectures** | | | | | | | | | |
| b-LSTM | 9 | 1 | 87 | 0.895 | 0.595 | 0.566 | 0.787 | 865 | 15 |
| p-CNN | 9 | 8 | 209 | 0.900 | 0.638 | 0.646 | 0.803 | 1340 | 2.0 |
| p-C/L | 9 | 8 | 299 | 0.922 | 0.717 | 0.822 | 0.883 | 1160 | 4.0 |
| ms-CNN | 9 | 8 | 262 | 0.919 | 0.718 | 0.792 | 0.891 | 1140 | 3.5 |
| ms-C/L | 9 | 8 | 342 | 0.928 | 0.743 | 0.842 | 0.903 | 1060 | 5.1 |
| **Other variants** | | | | | | | | | |
| 4-fold ms-C/L | 10 | 8 | 1371 | **0.933** | **0.755** | **0.872** | **0.918** | 303 | 5.1 × 4 |
| ½ scale ms-C/L | 9 | 8 | 100 | 0.921 | 0.699 | 0.815 | 0.880 | 1350 | 5.2 |
| 2× scale ms-C/L | 9 | 8 | 1250 | 0.927 | 0.736 | 0.841 | 0.901 | 682 | 7.0 |
| **Non-FilterNet models** | | | | | | | | | |
| DeepConvLSTM reimplementation | 20 | 12 | 3965 | -- | -- | -- | -- | 9 | 170 |

^{1}Results are mean of n repeats.

**Table 6.** Model mean F1 score without nulls (${F}_{1w,nn}$) for different sensor subsets (n = 5 each). This table reproduces the multimodal fusion analysis of [33]. The ms-C/L and 4-fold ms-C/L models improve markedly upon reported DeepConvLSTM performance, especially with smaller sensor subsets. The best results are in bold.

| Model | Gyros | Accels | Accels + Gyros | Accels + Gyros + Magnetic | Opportunity Sensors Set |
|---|---|---|---|---|---|
| # of sensor channels | 15 | 15 | 30 | 45 | 113 |
| DeepConvLSTM [33] | 0.611 | 0.689 | 0.745 | 0.839 | 0.864 |
| p-CNN | 0.615 | 0.660 | 0.722 | 0.815 | 0.798 |
| ms-C/L | 0.850 | 0.838 | 0.886 | 0.903 | 0.901 |
| 4-fold ms-C/L | **0.857** | **0.862** | **0.903** | **0.923** | **0.916** |

**Table 7.** Performance comparison alongside published models on the Opportunity dataset. FilterNet models set records in each metric. The best results are in bold.

| Method/Model | Ref. | ${F}_{1m}$ | ${F}_{1w}$ | ${F}_{1w,nn}$ | n |
|---|---|---|---|---|---|
| DeepConvLSTM | [33] | 0.672 | 0.915 | 0.866 | ? |
| LSTM-S | [36] | 0.698 | 0.912 | | best of 128 |
| | | 0.619 | | | median of 128 |
| b-LSTM-S | [36] | 0.745 | 0.927 | | best of 128 |
| | | 0.540 | | | median of 128 |
| LSTM Ensembles | [39] | 0.726 | | | mean of 30 |
| Res-Bidir-LSTM | [51] | | 0.905 | | ? |
| Asymmetric Residual Network | [52] | | 0.903 | | ? |
| DeepConvLSTM + Attention | [53] | 0.707 | | | mean |
| FilterNet ms-CNN | | 0.718 | 0.919 | 0.891 | mean of 9 |
| FilterNet ms-C/L | | 0.743 | 0.928 | 0.903 | mean of 9 |
| FilterNet 4-fold ms-C/L | | **0.755** | **0.933** | **0.918** | mean of 10 |

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Chambers, R.D.; Yoder, N.C.
FilterNet: A Many-to-Many Deep Learning Architecture for Time Series Classification. *Sensors* **2020**, *20*, 2498.
https://doi.org/10.3390/s20092498
