Convolutional neural networks (CNNs) gained widespread attention when Krizhevsky et al. [40] won the ImageNet challenge [41] by a large margin in 2012. Since then, new CNN architectures, e.g., ResNet [42], have been introduced and have improved the performance of CNNs significantly. CNNs were mostly used in the image domain. In 2016, Wang et al. [43] introduced a fully convolutional neural network (FCN) architecture to classify time series data and validated their approach on 44 datasets from the UCR/UEA archive. CNNs for time series data have achieved state-of-the-art performance in various domains, e.g., ECG classification, sound classification, and natural language processing. The eponymous convolution can be seen as applying and sliding a one-dimensional filter over the time series. Unlike with images, the filters cover only one dimension (time) instead of two (width and height). A convolutional neural network consists of various filters, ranging from moving-average filters to more complex ones. These filters are learned through the backpropagation algorithm. Passing a univariate time series through a convolutional neural network applies multiple convolutions with different filters. This may be seen as using a filter bank to extract useful features, remove outliers, or perform general filtering. Since the convolution operation is linear, a nonlinear activation function is applied after each convolution to ensure a nonlinear transformation of the time series data. An illustration of a one-dimensional convolutional neural network is shown in Figure 1. Mathematically, a convolutional neural network applies the following equation at each timestamp $t$ of the time series $\mathbf{X}$:

$$C_t = f\left(\mathbf{W} * \mathbf{X}_{t-k/2\,:\,t+k/2} + \mathbf{b}\right) \quad \forall\, t \in [1, l],$$
where $\mathbf{W}$, $k$, $t$, and $\mathbf{b}$ denote the weights of the kernel, the length of the kernel, the timestamp, and the bias, respectively, and $f$ is a nonlinear activation function. After applying
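The per-timestamp operation above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the function name, the choice of "valid" padding (no output for incomplete windows), and the ReLU nonlinearity are assumptions; in a trained network, $\mathbf{W}$ and $\mathbf{b}$ would be learned via backpropagation rather than fixed.

```python
import numpy as np

def conv1d(x, w, b, f=lambda z: np.maximum(z, 0.0)):
    """Slide a length-k kernel w over the univariate series x ("valid"
    padding), add the bias b, and apply the nonlinearity f (ReLU here),
    i.e., C_t = f(W * X_{t-k/2:t+k/2} + b)."""
    k = len(w)
    # One output value per valid window position.
    return np.array([f(np.dot(w, x[t:t + k]) + b)
                     for t in range(len(x) - k + 1)])

# A moving-average filter is the special case w = ones(k)/k, b = 0,
# and f = identity.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
smoothed = conv1d(x, np.ones(3) / 3, 0.0, f=lambda z: z)
print(smoothed)  # [2. 3. 4.]
```

With the default ReLU activation, the same function yields the nonlinear transformation described above.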

$n$ convolutions on the input accelerometer data of length $l$, we obtain $n$ channels, where each channel represents a new filtered time series. These $n$ channels of shape $n \times l$ are then convolved with $m$ different filters of shape $n \times m \times k$, where each of the $m$ filters is slid across all $n$ channels, resulting in $m$ additional time series; the $i$-th output channel $m_i$ is the sum of the convolutions of the $i$-th filter across all $n$ channels.
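The multi-channel step can be sketched as follows. This is an illustrative sketch under the same assumptions as before (valid padding, names of my own choosing), and it omits the nonlinearity for brevity; each output channel sums the per-channel convolutions of one filter, as described above.

```python
import numpy as np

def multi_channel_conv(channels, filters, biases):
    """channels: (n, l) array holding n filtered time series of length l.
    filters:  (m, n, k) array, i.e., m filters, each covering all n channels.
    The i-th output channel is the SUM over the n per-channel convolutions
    of the i-th filter, plus a per-filter bias."""
    m, n, k = filters.shape
    n_ch, l = channels.shape
    out = np.zeros((m, l - k + 1))
    for i in range(m):                # for each of the m filters
        for c in range(n_ch):        # slide across every input channel
            for t in range(l - k + 1):
                out[i, t] += np.dot(filters[i, c], channels[c, t:t + k])
        out[i] += biases[i]
    return out

# 2 input channels of length 5, 3 filters of length 2 -> output shape (3, 4).
rng = np.random.default_rng(0)
y = multi_channel_conv(rng.normal(size=(2, 5)),
                       rng.normal(size=(3, 2, 2)),
                       np.zeros(3))
print(y.shape)  # (3, 4)
```

In practice, frameworks fuse these loops into a single vectorized operation, but the triple loop makes the summation over input channels explicit.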