Fall Detection from Electrocardiogram (ECG) Signals and Classification by Deep Transfer Learning

Abstract: Falls are a prominent issue because of their severe physical and mental consequences. Fall detection and prevention is a critical area of research because it can help elderly people depend less on caregivers and live and move more independently. Using electrocardiogram (ECG) signals on their own for fall detection and activity classification is the novel approach taken in this paper. We propose an algorithm that uses the pre-trained convolutional neural networks AlexNet and GoogLeNet to classify fall and no-fall scenarios from ECG signals. The ECGs for both fall and no-fall cases were recorded as part of the study from eight volunteers. The signals are pre-processed with an elliptic filter to remove noise such as baseline wander and power-line interference. For feature extraction, time-frequency representations (scalograms) were obtained by applying a continuous wavelet transform to the filtered ECG signals. These scalograms were used as inputs to the neural network, and a validation accuracy of 98.08% was achieved with the first model. The trained model is able to distinguish ECGs with a fall activity from ECGs with a no-fall activity with an accuracy of 98.02%. To verify the robustness of the proposed algorithm, our experimental dataset was augmented with two publicly available datasets. The second model can classify fall, daily activities and no activities with an accuracy of 98.44%. These models were developed by transfer learning from the domain of real images to the domain of medical images. In comparison to traditional deep learning approaches, transfer learning not only avoids “reinventing the wheel,” but also presents a lightweight solution to otherwise computationally heavy problems.


Introduction
The World Health Organization (WHO) defines a fall as "unintentionally coming to the ground or some lower level and other than as a consequence of sustaining a violent blow, loss of consciousness, sudden onset of paralysis as in stroke or an epileptic seizure" [1].
According to a study, falls are the dominant cause of unintentional injury-related deaths and non-fatal injuries in people aged 65 and above. They pose a severe challenge for senior adults and people with movement disabilities [2]. The likelihood of falls increases with age and with diminishing health. Falls are more frequent among seniors living in nursing homes than among those living in the community. According to [3], approximately 30-50% of people residing in long-term care institutions fall per annum, and 40% of them experience repetitive falls. About two-thirds of people who suffer a fall are susceptible to recurrent falls. About 50% of the patients who lay on the floor for more than an hour after a fall died within six months of the fall [4].
A critical step in providing timely responses to falls is detecting them as early as possible. Several studies and surveys have been conducted to categorize falls and detect them.
In this study we aimed to detect human activities, in particular fall events, using wearable devices based on electrocardiogram (ECG) sensor data. To this end, we classified the activities with a pre-trained deep neural network; i.e., we applied a transfer learning approach to reduce the training time and the amount of data required.
The paper is structured as follows: A brief description of the technologies used in our study is given in Section 2. Section 3 presents an overview of the different forms of fall detecting techniques and human activity recognition (HAR) in the literature along with an emphasis on our contribution to the field. The experimental setup and data collection process are illustrated in Section 4. Section 5 presents our proposed methodology and explains the algorithms used. It also describes the phases of our work and its implementation. Section 6 reviews the collected results and their implications. This leads to the discussion related to our research question in Section 7. Finally, Section 8 outlines the conclusion and gives some future perspectives that can be used to further probe the field of HAR using ECG signals.

Background
For the reader's convenience, this section briefly explains the technologies that were employed in our study.
Several fall detection systems exist in the literature. A recent review of fall detection systems in [5] categorizes them as follows: (i) wearable devices, (ii) ambient systems, (iii) image processing systems and (iv) combined systems. Wearable devices provide a cheaper and more practical solution in terms of freedom of movement and energy consumption. The aim of most fall detection systems is not only to detect a fall but also to inform the concerned authorities in case of a medical emergency. Most of the latest algorithms for fall detection use machine learning [6] and deep learning algorithms [7].
Deep learning became prominent in computer vision (CV) when a deep convolutional neural network, AlexNet, outperformed its competitors on the image classification task of the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC) in 2012. Before the widespread recognition of deep learning, features had to be hand-crafted from the data and then fed into a machine learning model. Deep learning in general, and convolutional neural networks (CNNs) in particular, have revolutionized the field: they not only detect the features themselves, but the features they select have proven to be more significant than those crafted by hand [8].
According to [9], "transfer learning is a machine learning method where a learning model developed for a first learning task is reused as the starting point for a learning model in a second learning task." It can also be defined as the ability of a system to recognize and apply knowledge and skills learned in previous domains/tasks to novel domains/tasks that share some commonality [10]. Transfer learning has been formally defined and categorized in [10]. The problem statement, notation and definition used here are also taken from [10]. A domain $\mathcal{D}$ comprises two components: a feature space $\mathcal{X}$ and a marginal probability distribution $P(X)$, where $X = \{x_1, x_2, \ldots, x_n\} \in \mathcal{X}$. Similarly, given a specific domain $\mathcal{D} = \{\mathcal{X}, P(X)\}$, a task consists of two components: a label space $\mathcal{Y}$ and an objective predictive function $f(\cdot)$ (denoted by $\mathcal{T} = \{\mathcal{Y}, f(\cdot)\}$), which is not observed but can be learned from the training data, which consist of pairs $\{x_i, y_i\}$, where $x_i \in X$ and $y_i \in \mathcal{Y}$. The function $f(\cdot)$ can be used to predict the corresponding label, $f(x)$, of a new instance $x$. From a probabilistic viewpoint, $f(x)$ can be written as $P(Y \mid X)$.
Formally, transfer learning is defined as follows: "Given a source domain $\mathcal{D}_S$ and learning task $\mathcal{T}_S$, a target domain $\mathcal{D}_T$ and learning task $\mathcal{T}_T$, transfer learning aims to help improve the learning of the target predictive function $f_T(\cdot)$ in $\mathcal{D}_T$ using the knowledge in $\mathcal{D}_S$ and $\mathcal{T}_S$, where $\mathcal{D}_S \neq \mathcal{D}_T$, or $\mathcal{T}_S \neq \mathcal{T}_T$." Transferring knowledge from one domain to another is not only of theoretical interest but also of great practical importance: it can save a lot of training time and resources, and by reducing the amount of data needed in the target domain it can make learning feasible in the first place. In our work we transferred the knowledge gained from training neural networks on natural images to the domain of synthetic medical images, in order to classify images that were created by pre-processing medical data such as ECG data.

Related Work and Our Contribution
This section reviews previous work on detecting falls using different methods and highlights our contribution to the field. Many surveys and reviews, such as [11][12][13], exist in the literature; they describe different fall detection techniques, standard experimental protocols for fall simulations, and their relative advantages and disadvantages. Fall detection comes under the vast umbrella of human activity recognition (HAR). HAR is usually considered a computer vision (CV) problem in which CV algorithms are applied to images or videos to distinguish one activity from another. However, other device-free solutions exist that are based on radio signals, e.g., received signal strength indicator (RSSI) or channel state information (CSI); see, e.g., [14][15][16]. Alternatively, wearable sensors provide a resource-friendly and practical solution for real-time activity recognition; the rising popularity of gadgets such as smartwatches is an indicator of this trend. Many studies, such as [17,18], fuse readings from accelerometers and ECG sensors at the decision level to determine an activity. Though the wavelet transform has previously been used to analyze ECG signals for detecting different cardiac conditions, as reviewed in [19], it has never been used on its own to differentiate a fall from a non-fall. Similarly, [20] analyzes the impact of body movements on the ambulatory ECG frequency spectrum and uses artificial neural networks to classify different body movements.
ECG signals reflect the overall health of the human body. A considerable amount of effort has already gone into fall detection using accelerometers and gyroscopes. Some studies, such as [21], have applied different machine learning techniques to these sensor readings to predict and detect falls. A few studies, such as [22], use ECG signals to predict falls, but our study uses the time-frequency domain of the ECG signals by computing their continuous wavelet transform (CWT) coefficients and converting them into scalograms. The overall idea is mainly inspired by [23], where a convolutional neural network (CNN) was trained on scalograms to differentiate between heart diseases using ECG signals. A similar approach was used in our study to differentiate between fall and no-fall ECG signals.
The CNN we utilize has also been used in [24] to extract features for bio-metric purposes using gait features from sensor data. However, in our work, for the first time we classify the following activities using ECG signals: FALL, RESTING, and general DAILY ACTIVITIES, i.e., we distinguish not only FALL from RESTING but also FALL from DAILY ACTIVITIES, where DAILY ACTIVITIES refer to generic daily activities performed by the subjects. The fall risk using heart rate variability in combination with data mining techniques is assessed in [25].
In this paper, we explore the research question, "Can a fall be detected by using only ECG signals?" by collecting the ECG signals of people falling and then applying our proposed algorithm to the signals. The purpose of this study is to use the electrical signals of the heart, recorded by electrocardiogram, to train a deep neural network that analyzes the patterns of the ECG before and during a fall. Although a lot of work has already been done to detect falls using accelerometers and gyroscopes along with heart rate variability, the focus of this study is to use the ECG as the only factor for determining the presence of a fall. We are not referring here to cases in which the fall is caused by a specific heart condition. This study explores the less studied field of ECG signals for fall detection by applying machine learning techniques to it. It is based partially on [26] but extends the results obtained therein substantially.
In the field of medical imaging, finding an appropriate amount of data has always been a challenge. That is mainly due to two reasons: Firstly, because of strict data privacy issues, and secondly, finding a large group of volunteers for conducting experiments can be challenging. A data augmentation technique for time series called slicing was used to enhance the limited dataset.
For the machine learning domain, we used transfer learning and compared two popular pre-trained networks for image classification: AlexNet and GoogLeNet. We have demonstrated that a pre-trained network can be really beneficial in the medical imaging domain because it does not require a lot of data. However, we would like to emphasize that this is an initial study with a focus on proof of concept in a laboratory setup. It is not conclusive enough to be deployed, but it does provide a strong baseline for further work.

Experimental Setup and Data Collection
There are some online ECG databases, such as ECGVIEW [27] and PhysioNet [28], which provide ECGs for different research purposes. No ECG database of people falling could be found, so the ECG data was collected as part of the experiment. The data acquisition system consisted of two components: a hardware component, which was obtained from [29], and a software component; for the system architecture we refer to Figure 1. The hardware component consisted of a wearable belt with an ECG sensor attached to an Arduino board. The wearable prototype was designed after taking several features into account, as explained in [31]. The device had to be able to record the signals continuously and had to be adaptable to the experiment protocol, which requires a lot of movement, including falls. For this reason, a 3-lead ECG was chosen: it is not only a standard for emergency purposes (e.g., in ambulances) but also requires less wiring and hence improves patient compliance. The readings of a 3-lead ECG are comparable to those of a 12-lead ECG in our experimental setup. The design of the wearable device is discussed in detail in [31].
Out of the many different types of falls, rolling out of bed was chosen for this study as a main and significant one. Rolling out is defined as rolling out of bed from a lying position and going to the floor [21]. The experimental setup consisted of a bed or a table. The sensor and Arduino gadget were worn by the subject. The ECG electrodes were taped to the chest of the subject. Then the Bluetooth low energy (BLE) connection between the peripheral and the central device was established.

Inclusion Criteria
The study was administered on 8 healthy subjects, of whom 3 were female and 5 were male. The mean age of the subjects was 34.5 years. The following inclusion criteria were used to enlist volunteers for the HAR experiment:
• Age between 18 and 55 years.
• No history of heart disease or hypertension.
• No drugs or alcohol consumption before the experiment.
• No physical constraints for falling (arthritis, etc.).
Our experiment was evaluated by the security officer of our university and a risk assessment was established with assistance from a medical doctor classifying our study as a HAR experiment in accordance with the ethical principles of the Declaration of Helsinki [32].
The human heart goes through many physiological and anatomical changes from birth to adolescence, so ECG parameters vary across age groups below 18 and some ECG features differ significantly between adults and children of different ages [33]. To interpret results correctly for the various age groups in children, one needs detailed knowledge of these age-dependent changes, which is outside the scope of our study. For this reason, the age group chosen for the experiment was between 18 and 55.

The HAR Experiment
The ECG was recorded in three conditions: lying down, falling by rolling off the bed, and performing daily activities (such as walking and running). Figure 2 shows the fall in three steps: the resting position on the extreme left, then the fall initiation, and finally the person lying on the ground without moving. Each experiment lasted a total of about 40-45 min and consisted of multiple sub-readings of 30 s each. A total of three falls were recorded in a single reading. For each fall, the subject was in the resting position shown in Figure 2 for the first 10 s. In the next 10 s, the volunteer would fall by rolling off the side of the table/bed and would lie on the ground until the 30 s were completed. Then the subject would get up and repeat. A similar protocol was used in [34], a prominent dataset for falls and activities of daily life. The lying or resting period after the fall may indicate loss of consciousness after the fall and is hence necessary to include when designing a fall protocol that mimics the elderly population. Some elderly people do not move after a fall. According to [35], a study conducted on 110 participants, of the 60% who fell, 80% were unable to get up after at least one fall and 30% had lain on the floor for an hour or more. According to [36], inactivity after an activity is a common basis for determining a fall situation. Each reading lasted 90 s and contained a fall every 30 s. On average, 10 readings were collected from each subject.
For the resting ECG, the subject had to lie down on a flat surface without talking or moving, to avoid muscle noise. The log file, consisting of 90 s for FALL and 30 s for RESTING, was sent to the central device via BLE. For daily activities, subjects performed tasks such as walking at a fast pace, sitting and standing for an average of 10 to 15 s each, following no pre-determined protocol.

The Collected Data
The data were collected from a total of 8 volunteers. Readings from two of the volunteers were too distorted to be used, so they were not included in the next steps. Many HAR experiments in the literature have been conducted with a limited number of volunteers. The benchmark HAR dataset Opportunity [37], used in [38], consists of readings from 4 participants. Similarly, other datasets such as PAMAP2 [39] and Skoda [40], which have been used in [41], involved 9 and 1 participants respectively. It is important to note that this is a preliminary study; more experiments are planned and recommended for future work, including elderly people and people with pre-existing heart issues. An overall summary of the collected data is presented in Table 1. This table presents the data statistics as they were collected and saved in files. Due to the limited time with each volunteer, some of the subjects performed only fall experiments, others only resting readings, and some performed all three, including daily activities. Table 2 presents an overview of the data instances that were obtained from those files.

Our Methodology and Implementation Protocol
The following section outlines our proposed algorithm and its initial implementation.

Proposed Algorithm
The basic Algorithm 1 involves the following steps: pre-processing, filtering, calculating the continuous wavelet transform and creating scalograms, and classification with a fine-tuned pre-trained CNN.

Algorithm 1 Detecting fall with raw ECG signals.
Input: A time series ECG raw data Ts
Output: The classified activity label l
Ts ← ECG_RAW_VALUE_EXTRACTION(Ts)

ECG_RAW_VALUE_EXTRACTION(Ts) pre-processes the raw ECG signals by extracting the voltage from the raw analog-to-digital converter (ADC) values, inspecting the frequency domain and, if required, interpolating the data to upsample it to a uniform sampling frequency. FILTER(Ts) applies elliptic filtering to remove noise artifacts from the raw ECG signals Ts. After the signals have been filtered, the wavelet transform is calculated for each signal. The absolute value of the wavelet transform is used to create the corresponding scalograms, or time-frequency images. The set of images is fed into a fine-tuned pre-trained convolutional neural network. Algorithm 2 describes the use of AlexNet for classification. The trained model is then verified using k-fold cross-validation.
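The k-fold verification step mentioned at the end of Algorithm 1 can be illustrated with a minimal sketch. The paper does not state the value of k or how the folds were drawn, so the function name `k_fold_indices`, the contiguous fold layout and the example values below are assumptions made purely for illustration.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) index lists for k contiguous folds.

    Illustrative only: shuffle the samples beforehand if their order
    carries information (e.g. all FALL images stored first).
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread the remainder so that fold sizes differ by at most one.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size


# Example with hypothetical numbers: 10 scalograms, 5 folds.
folds = list(k_fold_indices(10, 5))
```

Each sample appears in exactly one validation fold, so averaging the k validation accuracies uses every image once for validation.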

ECG Signal Filtering
ECG is a low amplitude bio-signal that can be easily contaminated by several different types of artifacts and other bio-signals. A lot of sensor-related information or overhead has to be removed from the log file at the initial stages of filtration.
The sensor value is the output of a 14-bit analog-to-digital converter (ADC). The corresponding voltage was computed from the ADC value using the formula given in [42].
Noise in ECG signals is composed of high-frequency and/or low-frequency components. High-frequency noise includes power line interference, muscle noise and white Gaussian noise. Baseline wander is a low-frequency noise. Minimizing baseline wander and power line interference is the first step in all electrocardiograph (ECG) signal processing [43]. The frequency regions in which the major noise artifacts lie are given in [44].
The frequency spectrum (the graph of signal amplitude against frequency) of the signals was initially analyzed to detect noise. A harmonic of about 1.5 Hz was observed in the spectrum. These harmonics are most probably due to power line interference (PLI), because harmonics (frequency multiples) are characteristic of PLI. Baseline wander is also seen in some of the readings (Figure 3a,b).
ECG signal filtering is a vital step in ECG signal processing and analysis. The goal of filtering is to remove as much noise as possible while still preserving the characteristics and properties of the signal. Once the type of artifact in the signal has been identified, the next step is to choose an appropriate filter; with the wide range of filtering techniques available, choosing the right one is a challenge in itself.
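As an illustration of how a frequency spectrum can be inspected for a dominant component, the following sketch computes a direct discrete Fourier transform of a synthetic signal and reports the strongest non-DC frequency. It is not the analysis code used in the study: the synthetic signal, the function names and the 10 s duration are assumptions; only the 62 Hz sampling rate and the 1.5 Hz component are taken from the text.

```python
import cmath
import math


def dft_magnitudes(signal):
    """Return |X[k]| for k = 0 .. N//2 using a direct O(N^2) DFT."""
    n = len(signal)
    mags = []
    for k in range(n // 2 + 1):
        acc = 0j
        for i, x in enumerate(signal):
            acc += x * cmath.exp(-2j * math.pi * k * i / n)
        mags.append(abs(acc))
    return mags


def dominant_frequency(signal, fs):
    """Frequency (Hz) of the largest spectral peak, ignoring the DC bin."""
    mags = dft_magnitudes(signal)
    k_peak = max(range(1, len(mags)), key=lambda k: mags[k])
    return k_peak * fs / len(signal)


FS = 62.0   # sampling rate reported for the collected data
N = 620     # 10 s of samples (assumed duration, gives 0.1 Hz resolution)
# Synthetic stand-in for a recording: a 1.5 Hz component plus a weaker 5 Hz one.
synthetic = [
    math.sin(2 * math.pi * 1.5 * i / FS) + 0.3 * math.sin(2 * math.pi * 5.0 * i / FS)
    for i in range(N)
]
peak_hz = dominant_frequency(synthetic, FS)
```

For real recordings an FFT routine would normally replace the quadratic-time loop; the direct form is used here only to keep the sketch dependency-free.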
Several different techniques exist in the literature to remove noise from ECG signals. References [45][46][47] used the wavelet transform to remove baseline drift and reduce overall noise. Similarly, [47] uses an ECG simulation to compare different techniques for removing baseline wander while preserving changes in the ECG wave, and found wavelet-based baseline cancellation to be the best method. However, it also recommended the Butterworth high-pass filter because it is computationally fast.

IIR versus FIR
Classifying transfer functions in the time domain by the length of their impulse response sequence yields two classes: infinite impulse response (IIR) and finite impulse response (FIR). IIR filters have an impulse response that does not become exactly zero after a certain point but continues indefinitely. In contrast, the impulse response of an FIR filter is finite in length [48].
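The difference between the two impulse responses can be seen in a small sketch. The two filters below (a 4-tap moving average and a first-order recursive filter with coefficient 0.5) are arbitrary textbook examples chosen for illustration; they are not the filters used in this study.

```python
def fir_filter(x, taps):
    """FIR filter: y[n] = sum_k taps[k] * x[n - k]."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, b in enumerate(taps):
            if n - k >= 0:
                acc += b * x[n - k]
        y.append(acc)
    return y


def iir_first_order(x, a):
    """First-order IIR filter: y[n] = x[n] + a * y[n - 1]."""
    y = []
    prev = 0.0
    for sample in x:
        prev = sample + a * prev
        y.append(prev)
    return y


# Unit impulse followed by zeros.
impulse = [1.0] + [0.0] * 19
fir_response = fir_filter(impulse, [0.25] * 4)   # exactly zero after 4 samples
iir_response = iir_first_order(impulse, 0.5)     # decays as 0.5**n, never exactly zero here
```

The FIR response is identical to its tap vector and then vanishes, while the recursive filter keeps feeding its own output back, which is why its response only decays.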
The choice between FIR and IIR filters depends mainly on the relative advantages of the two filter types. Since our experimental setup involves a lot of subject movement and ECG de-noising has to run in combination with a smart mobile device (hence a resource-limited environment), a careful balance between efficiency and accuracy has to be found. In [49], a window-based FIR filter was compared with an IIR Butterworth filter, and the IIR filter was shown to outperform the FIR filter by achieving better computational efficiency with minimal signal distortion. Similarly, in [50], an IIR filter (Chebyshev Type II) showed the best tradeoff between spectral density and average power compared with its FIR counterparts. IIR filters have lower computational complexity and hence require less computation power than FIR filters. FIR filters also require more memory; hence IIR filters can be the better choice for removing baseline noise [50]. Wavelet analysis is also used in recent works to denoise ECG data. Reference [45] used the discrete wavelet transform (DWT) to correct baseline wander and reduce noise. They estimated the baseline wander through the coarse approximation in the DWT and proposed recommendations for the selection of wavelets and the maximum decomposition level.

Elliptical Filter
To achieve a given level of performance, an FIR filter requires a much higher filter order than an IIR filter, and it therefore introduces a greater delay than an equally performing IIR filter.
First, the minimum order of an elliptic filter required to meet the set of filter design specifications was determined, and the same order was then used to filter all the signals.
The elliptic filter was the better fit for the data: it not only removed the PLI from the signal but also considerably reduced the baseline wander.

Implementation Protocol: Hardware and Software
The following section gives a detailed account of the implementation process for phase I. Now we have filtered ECG signals from the previous step and we want to use a pre-trained CNN, AlexNet, to differentiate between falling and resting ECG signals.
After filtration, each signal of 90 s duration was individually examined and divided into 3 readings in such a way that exactly one fall, or more than 70% of a fall, lay in each reading. This was done to enlarge the dataset and to homogenize the data to a single duration, i.e., 30 s. At this point the collected signals for DAILY ACTIVITIES were considered part of the class NO-FALL.
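A simplified version of this segmentation step is sketched below. In the study each signal was examined individually so that a fall lay within each segment; the sketch instead cuts at fixed 30 s boundaries, which is an assumption made to keep the example short. The function name is hypothetical, and the 62 Hz sampling rate is the one reported for the collected data.

```python
def split_fixed_windows(signal, fs, window_s):
    """Cut a signal into consecutive, non-overlapping windows of window_s seconds.

    Trailing samples that do not fill a whole window are dropped.
    """
    samples_per_window = int(round(fs * window_s))
    n_windows = len(signal) // samples_per_window
    return [
        signal[i * samples_per_window:(i + 1) * samples_per_window]
        for i in range(n_windows)
    ]


FS = 62                          # sampling rate reported for the collected data
reading = list(range(90 * FS))   # placeholder samples standing in for a 90 s reading
segments = split_fixed_windows(reading, FS, 30)
```

With real recordings the cut points would have to be shifted by hand or by a fall-onset detector whenever a fall straddles a boundary.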
Two models were trained on different hardware resources for phase I: one on a GeForce RTX 2080 Ti GPU with compute capability 7.5, and another on a PC with an Intel(R) Core(TM) i5-7260U 2.20 GHz CPU.

Time-Frequency Representations
The time-frequency representations were created using SCALOGRAM_CREATION(Ts). Each ECG signal had to be of the same length so that a single continuous wavelet transform (CWT) filter bank could be used for all of them. This had to be done carefully so that no valuable signal content, specifically the falls, was lost.
In the next step, the CWT filter bank was computed for one of the signals and then used to create the time-frequency representations, called scalograms, for the rest of the data. The Morse wavelet was used to calculate the wavelet transform. The scalograms were saved for later processing. Each representation was an RGB image of size 227 × 227 × 3, which is the expected input format for AlexNet. Figure 4a-c compares falling and non-falling scalograms. Scalograms containing a fall show areas of high energy concentration, drawn in yellow. In contrast, the energy appears to be evenly distributed in the non-fall ECG scalograms, which represent the cardiac cycle. The images were then divided randomly into training and validation data, with 80% training images and 20% validation images.
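The random 80/20 division can be sketched as follows. The file names, the seed and the function name `split_train_validation` are hypothetical; the paper does not describe how the random split was implemented.

```python
import random


def split_train_validation(items, train_fraction=0.8, seed=0):
    """Shuffle a copy of items and split it into (train, validation) lists."""
    shuffled = list(items)
    # A seeded generator makes the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical scalogram file names, for illustration only.
files = [f"scalogram_{i:03d}.png" for i in range(50)]
train_files, val_files = split_train_validation(files)
```

With small or imbalanced classes, splitting each class separately (a stratified split) keeps the FALL to NO-FALL ratio the same in both subsets.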

Phases of Our Implementation
The whole study was conducted in two major phases (see Table 2), as explained in the following sections. In phase I, the ECG signals obtained with our own experimental setup were used to create wavelet transforms and scalograms. The model trained in this phase had two classes: NO-FALL, which included all readings without a fall, and FALL, which contained the signals with a fall in them. Eventually, we fine-tuned and retrained AlexNet to obtain a trained CNN model that gave 98.02% validation accuracy. (Phase II is described in Section 5.6 below.)

Training The Network: Phase I
In phase I, a very small number of daily activities were included in the no-fall scenarios along with resting. The daily activities included fast walking, sitting up from lying, suddenly standing up from sitting, picking something up from the ground while standing, and stumbling while walking fast. The daily activities were performed in a domestic setting. Each activity lasted between 20 and 30 s.
These readings were processed in the same manner as the readings in the earlier iteration. Two training sessions were done in this iteration, one with an Intel i5, 1.3 GHz, and another with the GPU.

Preparing and Training the Model
In transfer learning, a large dataset is used to train a primary network, and the features learned are then re-purposed, or transferred, to another network that is trained on a specific dataset and task. It is usually very difficult to acquire a dataset large enough to train an entire CNN from scratch, so in most practical scenarios a CNN is pre-trained on a very large dataset and then used either as an initialization or as a feature extractor for the required task. This process is called transfer learning. It works if the features are generic, i.e., suitable for both the primary and the target task rather than specific to the primary task [51].
In our case, since the dataset was small, over-fitting was a concern, which motivated the use of transfer learning. A pre-trained network, AlexNet, was used to train the model, and the deeper layers of the network were fine-tuned. Fine-tuning suits our problem because universal features like edges or curves are captured by a pre-trained network that has been trained on a large and diverse dataset like ImageNet [52]. According to [53], despite the differences between natural images and medical images, CNNs trained on the well-annotated ImageNet can still be transferred to make medical image recognition tasks more effective. It is conjectured that transfer learning with fine-tuning yields the best performance in medical imaging [53], and our experiments support this conjecture.

Tuning The AlexNet
As in any transfer-learning approach, AlexNet had to be fine-tuned to our dataset. The last three layers of AlexNet are configured for 1000 categories; since we have two classification classes, we adapted these layers to our classification problem. Layer 23 was set to be a fully connected layer whose size equals the number of our classes, 2. Layer 24 applies the softmax and does not need to change. Layer 25 holds the name of the loss function used for training the network and the class labels [54]; it was set to be the classification output layer.
The solver used is stochastic gradient descent with momentum (SGDM). The initial learning rate was set to $1 \times 10^{-4}$. The maximum number of epochs was 5, and the mini-batch size used at each iteration was 15. The training was carried out on both a CPU and a GPU.
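The SGDM update rule can be illustrated on a toy problem. The sketch below uses the learning rate stated above; the momentum value of 0.9 is an assumption (the paper does not report it), and the quadratic loss is a stand-in for the network loss, chosen only so that the example runs without a deep learning framework.

```python
def sgdm_step(weights, grads, velocity, lr=1e-4, momentum=0.9):
    """One SGDM update: v <- momentum * v - lr * g, then w <- w + v."""
    new_velocity = [momentum * v - lr * g for v, g in zip(velocity, grads)]
    new_weights = [w + v for w, v in zip(weights, new_velocity)]
    return new_weights, new_velocity


def loss(w):
    """Toy quadratic loss standing in for the network loss."""
    return sum(x * x for x in w)


def grad(w):
    """Gradient of the toy loss."""
    return [2.0 * x for x in w]


w = [1.0, -2.0]
v = [0.0, 0.0]
initial_loss = loss(w)
for _ in range(100):
    w, v = sgdm_step(w, grad(w), v)
final_loss = loss(w)
```

The velocity term accumulates past gradients, so with a small learning rate the effective step grows towards lr / (1 − momentum) along directions in which the gradient keeps its sign.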

Activations of Different Layers in CNN
Each layer of the CNN produces a specific response to the original image called activations which can be viewed to investigate and learn further how the network learns and what features are adapted by the model to recognize falls. The activations of different layers can be viewed and one can discover which features are learned by the network by comparing areas of activation with the original image. This can also help if we want to manually extract the features by looking at different layers and the features they highlighted.
In Figure 5, several tiles can be seen on the grid. Each one is the output of a channel in convolution layer 1 (conv1). Strong positive activations are represented by white pixels and strong negative activations by black pixels. A grey pixel represents moderate activation. The position of a pixel in the activation of a channel corresponds to the same position in the original image [54]. Figure 5 shows that the network has started to learn about specific fall areas in its activations already in the first convolution layer. Though it was never explicitly told to learn this specific fall pattern in the scalogram, it has automatically outlined those features of a fall. The feature highlighted by the strongest activation in convolution layer 5 (conv5) in Figure 6 shows that the network correctly highlights the fall feature in a scalogram. Similar patterns can be observed in other scalograms containing falls. Similarly, Figure 7 shows that the activations of the deeper layers are more specific in their features than the activations of the earlier layers, which are more generic.

Extension of the Algorithm: Phase II
In the next phase, the study was extended by enlarging the dataset using a time series augmentation technique called slicing and by adding two publicly available datasets [28,55]. In this phase another pre-trained CNN, GoogLeNet [56], was trained along with AlexNet [57]. The training for this phase was carried out on a MacBook with a 2.6 GHz 6-core Intel(R) Core(TM) i7. This phase concluded with an accuracy of 98.44%.

Data Augmentation and its Challenges
In order to verify the effectiveness of the proposed algorithm, an additional class called DAILY ACTIVITIES was added to our model. This class encompasses activities performed in daily life, such as moving around, walking, sitting and jumping. Two publicly available datasets, [28] and [55], were added to the newly defined class. The dataset [28] contains ECG readings from 10 subjects (mean age 27, 1 female and 9 males). The signals were recorded while the subjects performed four body movements: left and right arm up/down, sitting down and standing up, and waist twist. Similarly, the other database [55] contains ECG signals recorded from a healthy 25-year-old male performing different physical activities.
One of the main issues in incorporating a new dataset into our existing dataset was the difference in sampling frequency between the collected data and the external data sources. We collected our data at a rate of 62 Hz whereas both external databases have data sampled at a rate of 500 Hz. In order to bring all the data to a common sampling frequency, our collected ECG dataset was interpolated by a factor of 5. Interpolation works better than simple upsampling or resampling. The MATLAB function upsample() inserts zeros between the original samples. Similarly, resample() applies an anti-aliasing filter after interpolation, which causes a drastic decrease in the amplitude of the frequency components. Hence, interpolation was used instead of upsample() and resample().
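The difference between zero insertion and interpolation can be illustrated with a minimal Python sketch. It is not the authors' MATLAB code: the function names and the toy samples are hypothetical, and linear interpolation is assumed here purely for illustration, since the text does not state which interpolation kernel was used.

```python
def upsample_zero_insert(x, factor):
    """Insert factor-1 zeros after every sample (plain zero-insertion upsampling)."""
    out = []
    for v in x:
        out.append(v)
        out.extend([0.0] * (factor - 1))
    return out


def interpolate_linear(x, factor):
    """Fill the gap between neighbouring samples with linearly interpolated values.

    Assumption: linear interpolation stands in for the interpolation method,
    which the text does not specify.
    """
    out = []
    for a, b in zip(x, x[1:]):
        for k in range(factor):
            out.append(a + (b - a) * k / factor)
    out.append(x[-1])  # keep the final original sample
    return out


ecg = [0.0, 1.0, 0.5]  # toy samples, not real ECG data
zeros = upsample_zero_insert(ecg, 5)
smooth = interpolate_linear(ecg, 5)
print(zeros)
print(smooth)
```

Zero insertion leaves gaps of zeros between the original samples, whereas interpolation yields a signal that follows the original waveform at the higher sampling rate.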
In the previous phase, the dataset collected by us was limited, and convolutional networks tend to over-fit on limited datasets. Hence, it was vital to augment the available data. Very few techniques exist in the field of time series augmentation specifically for deep neural networks [58]. We used the method called window slicing, as described in [59], to augment the ECG signals.
For a time series $T = \{T_1, \dots, T_n\}$, window slicing cuts $T$ into small snippets such that each snippet $S_{i:j} = \{T_i, \dots, T_j\}$ with $1 \le i \le j \le n$. The slicing operation is described as follows [59]:

$$\mathrm{Slicing}(T, s) = \{S_{1:s},\ S_{2:s+1},\ \dots,\ S_{n-s+1:n}\}$$

In our case, the length of each slice was 4000 and the time series (TS) was sliced at every 1000th interval. Thus, each time series signal of length $n$ gave us $(n - 4000)/1000$ signals. All the time series generated by $\mathrm{Slicing}(T, s)$ have the same label as $T$.
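The slicing step described above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: the function names are hypothetical, the recording is placeholder data, and a stride parameter is added to the operation from [59] to reflect the 1000-sample interval mentioned in the text.

```python
def window_slices(series, window, stride):
    """Return every full-length window of `series`, taken every `stride` samples."""
    return [series[start:start + window]
            for start in range(0, len(series) - window + 1, stride)]


def augment(series, label, window=4000, stride=1000):
    """Slice one labelled recording into (slice, label) pairs.

    Every slice inherits the label of the recording it was cut from.
    """
    return [(s, label) for s in window_slices(series, window, stride)]


recording = list(range(10000))  # placeholder samples, not real ECG data
pairs = augment(recording, "FALL")
print(len(pairs))  # 7 windows, starting at samples 0, 1000, ..., 6000
```

Note that counting the window that starts at sample 0 gives one slice more than $(n - 4000)/1000$; whether the first or last window was kept in the study is not stated, so this detail is an assumption of the sketch.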

Tuning the GoogLeNet
Along with AlexNet, GoogLeNet was trained in phase II to obtain a comparative analysis of transfer learning. GoogLeNet, like AlexNet, is a pre-trained convolutional neural network. Introduced in [56], GoogLeNet was the winner of the ILSVRC 2014 classification and detection challenges. Since the pre-trained models are already trained on a large image set, fine-tuning them provides a simpler and more efficient solution to the shortage of data for deep learning problems, specifically in the medical domain. Our model now has three classification classes, namely FALL, RESTING and DAILY ACTIVITIES.
GoogLeNet has 144 layers as compared to 25 layers in AlexNet. It expects an input image of size 224 × 224 × 3. GoogLeNet introduced a novel feature called the inception module, which replaces the fully connected architecture with a sparse architecture inside the network. In earlier CNN models such as AlexNet and VGGNet, the size of the convolution filter is fixed. In an inception module, however, multiple convolution filters of varying size and a max-pooling operation are applied in parallel to the output of the previous layer, and the results are stacked together again at the output. This not only leads to the extraction of different features but is also computationally efficient [56].
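How the parallel branches are stacked can be illustrated with a small shape calculation. The sketch below is not a network implementation: the helper name is hypothetical, and it assumes that every branch preserves the spatial size of its input so that only the channel depths add up. The filter counts are those listed for the "inception (3a)" module in [56].

```python
def inception_output_shape(height, width, branch_channels):
    """Shape after concatenating parallel branches along the channel axis.

    Assumption: each branch keeps the spatial size (stride 1, 'same' padding),
    so the output depth is the sum of the branch depths.
    """
    return (height, width, sum(branch_channels.values()))


# Output channels of each branch of the "inception (3a)" module in [56].
inception_3a = {"1x1": 64, "3x3": 128, "5x5": 32, "pool_proj": 32}
shape = inception_output_shape(28, 28, inception_3a)
print(shape)  # (28, 28, 256)
```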
Fine-tuning GoogLeNet requires redefining some layers of the network. First, the final dropout layer is replaced with one that has a probability of 0.6 instead of 0.5. The last two layers, 'loss3-classifier' and 'output', are replaced with layers that match our classification scheme. The layer 'loss3-classifier' is replaced to classify three classes instead of 1000. Finally, the last classification layer is replaced with a new layer without any class labels. The output classes are set accordingly at training time.

Transfer Learning to the Rescue: GoogLeNet and AlexNet
Even after applying data augmentation and adding external databases, our database reached a total of only 1273 instances (see Table 2), which would not suffice to train any CNN from scratch without the risk of over-fitting. This situation is all too common in the medical field due to strict data protection and privacy laws and the unavailability of domain-specific data. Transfer learning can be the answer in such cases. A pre-trained model is already trained on a huge dataset, which in the case of most CNNs is ImageNet [60]. The inner layers of such a model are already generalized enough to extract the relevant features from a new domain. Hence, training on a new domain can be achieved even with a small or mid-sized dataset. Only the outermost layers have to be re-trained to readjust their weights, which also makes the approach resource-friendly.
A number of models, both AlexNet and GoogLeNet, were trained. Different parameters, such as the initial learning rate (ILR), the training algorithm and the percentage of validation-training-testing data, were varied in order to verify the generalization of the model and the proposed algorithm. The training algorithms stochastic gradient descent with momentum (SGDM) and Root Mean Square Propagation (RMSprop) were used with similar configurations to compare the outcomes. SGDM is a stochastic approximation method initially proposed in [61], with a momentum term added to it [62]. RMSprop is an unpublished optimization algorithm that was initially proposed and explained by Geoff Hinton in an online course [63]. The summary of the trained models and the results are shown in Tables 3 and 4. Out of the many trained models, Model 2 from Table 4 (marked in blue) was selected as the most suitable trained model. Models 5 and 6 were not chosen despite a perfect 100% validation accuracy because they might be over-fitted. Similarly, Model 12 was not chosen because the percentage of validation data is greater in Model 2 (0.8-0.1-0.1) than in Model 12 (0.6-0.2-0.2). A confusion matrix for the testing data of Model 2 is shown in Table 5. The last row and column of the confusion matrix indicate the share of correct predictions, expressed as a percentage. For example, in the first column, 55 instances of DAILY ACTIVITIES are correctly predicted as DAILY ACTIVITIES and 5 are incorrectly predicted as FALL by the model. This amounts to a correct prediction percentage of 91.7% for daily activities. Similarly, out of 255 instances, 246 were correctly predicted, which amounts to a total accuracy of 96.5%. A graphical summary of the training result is shown in Figure 8.
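The percentages quoted from the confusion matrix can be reproduced with a short calculation. The following Python sketch uses only the counts stated in the text; the helper name is hypothetical and the remaining cells of Table 5 are deliberately not reconstructed.

```python
def percent_correct(correct, total):
    """Share of correctly predicted instances in percent, rounded to one decimal."""
    return round(100.0 * correct / total, 1)


# First column of the confusion matrix: DAILY ACTIVITIES instances.
daily_correct, daily_as_fall = 55, 5
daily_pct = percent_correct(daily_correct, daily_correct + daily_as_fall)

# Overall testing accuracy: 246 correct predictions out of 255 instances.
overall_pct = percent_correct(246, 255)

print(daily_pct, overall_pct)  # 91.7 96.5
```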

K-Fold Verification
A k-fold verification was applied to Model 2 from Table 4 as a measure to estimate the generalization of the model. Keeping the size of the dataset in mind, k was set to 3. The dataset was divided into three equal parts. At each split, one set was used as the training data and the trained model was tested on both validation and testing data to obtain the validation and testing accuracy, respectively. An average validation accuracy of 97.69% and an average testing accuracy of 97.37% were achieved. An overview of the process is depicted in Table 6, where the blue colour indicates the part of the data used as training data and black indicates the segment used as testing data.
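The rotation of the three parts can be sketched as follows. This Python sketch follows the procedure as worded above (one part for training, the remaining parts held out) and is not the authors' code; the function name is hypothetical, shuffling is omitted for brevity, and assigning the leftover sample to the last part is an assumption, since 1273 is not divisible by 3.

```python
def three_way_rotation(n_samples, k=3):
    """Split sample indices into k contiguous folds and rotate the training fold."""
    fold_size = n_samples // k
    folds = [list(range(i * fold_size, (i + 1) * fold_size)) for i in range(k)]
    folds[-1].extend(range(k * fold_size, n_samples))  # leftovers go to the last fold
    rounds = []
    for i in range(k):
        train = folds[i]
        # All other folds are held out for validation and testing in this round.
        held_out = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        rounds.append((train, held_out))
    return rounds


rounds = three_way_rotation(1273)
for train, held_out in rounds:
    print(len(train), len(held_out))
```

The reported average accuracies would then be the mean of the per-round validation and testing accuracies.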

Analysis of Results
In this section, the different results are analyzed to obtain a better understanding of the process. Figure 9a shows a FALL ECG with three falls in it. It can be observed that the amplitude increases significantly in three areas. This can be attributed either to noise caused by the impact during the fall or to an actual increase in cardiac activity. The increase lasts for about 1.6 s and then immediately goes back to normal. This duration coincides with the time taken for the body to touch the ground. This increase in amplitude is of special interest to us. When zoomed in, as shown in Figure 9b, the increase reaches a maximum amplitude of 1 mV and is followed by normal cardiac activity, as the normal QRS complex can be seen after the fall peaks. The smaller peaks in other areas are due to the movements of the subjects from the ground back to the table. The same pattern is repeated with small differences in all other recorded falls.
Figure 11 gives a side-by-side view of the time domain signals and the corresponding frequency domain of the signals for all three classes of the model. A distinct frequency spectrum can be seen in Figure 11 for FALL, RESTING and DAILY ACTIVITIES. The fall is visible in the time domain and is also characterized by a spike in the frequency domain. The daily activities ECG graph shows considerable noise even after filtering. This is due to the constant movement of the subjects, which results in baseline wander.

Analysis of Scalograms
The scalogram is obtained by plotting the absolute value of the CWT of a signal as a function of time and frequency. Scalograms are helpful in visualizing varying events in a signal. Scaling and shifting are applied to a prototype wavelet to highlight the transient changes in the signal. We used the 'Morse' wavelet as the prototype wavelet because its form is very similar to the ECG shape. The fall scalograms shown in Figure 4 have distinct localized high energy areas, whereas the resting scalograms have high energy distributed all over the diagram. The comparison between the DAILY ACTIVITIES scalogram and the fall scalogram is of particular interest. The scalogram of a daily activity is shown in Figure 4a. The high energy is distributed almost uniformly over the whole representation. Compared with the fall scalogram (Figure 4c), noise or high energy due to movements is present throughout the daily activities scalogram, in contrast to the localized high energy area in the fall scalogram. Additionally, the magnitude of the energy differs between the RESTING and DAILY ACTIVITIES scalograms.
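The construction of a scalogram as the absolute value of the CWT can be sketched directly from its definition. The Python sketch below is a naive, slow illustration and not the authors' pipeline: it substitutes a complex Morlet wavelet for the Morse wavelet used in the paper, and the function names, scales and toy signal are all hypothetical.

```python
import cmath
import math


def morlet(t, w0=6.0):
    """Complex Morlet wavelet (a stand-in for the Morse wavelet used in the paper)."""
    return cmath.exp(1j * w0 * t) * math.exp(-0.5 * t * t)


def scalogram(signal, scales, dt=1.0):
    """Return |CWT| as a list of rows: one row per scale, one column per time shift."""
    n = len(signal)
    rows = []
    for s in scales:
        row = []
        for b in range(n):
            acc = 0j
            for k in range(n):
                # Correlate the signal with the scaled, shifted, conjugated wavelet.
                acc += signal[k] * morlet((k - b) * dt / s).conjugate()
            row.append(abs(acc) * dt / math.sqrt(s))
        rows.append(row)
    return rows


# Toy signal: flat baseline with one short burst, loosely mimicking a transient event.
sig = [0.0] * 64
for k in range(28, 36):
    sig[k] = math.sin(2 * math.pi * (k - 28) / 8)

scal = scalogram(sig, scales=[2.0, 4.0, 8.0])
peak_col = max(range(64), key=lambda b: scal[2][b])
print(peak_col)
```

In this toy case the energy concentrates around the burst, which mirrors the localized high energy area described for the fall scalograms.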

Discussion of the Research Question
It is evident from the extension of phase I that, even after adding an additional class, the model is trained well and generalizes sufficiently when the parameters are varied. With the addition of external databases, the model has learned the classification task well.
This answers our basic research question, which was, "can a fall be detected using an ECG signal?" It is shown that not only can we detect a fall in ECG signals, but we were able to do so with an accuracy of greater than 90%. As for changing the parameters, Tables 3 and 4 show that the initial learning rate plays the most significant role in determining the validation accuracy. Varying parameters such as the training algorithm and the dataset ratio does not make a significant change in the validation or testing accuracy. It can also be seen from the tables that, with all other parameters kept constant, the choice of neural network does not make any significant difference in the results.
Hence, a roll over fall has some characteristics imprinted in electrocardiogram signals, which are detectable in the time-scale domain (scalograms). That is why the models have learned to distinguish a fall activity from a no fall activity. A visible difference with respect to energy distribution can be seen between different activities. These characteristics are distinguishable from the ECG of a resting person or of a person performing daily activities. More investigation of the frequency spectrum of the ECG signal is still needed to learn more about the patterns specific to falls and other activities.
Our algorithm has obtained an accuracy of more than 97%, greater than that achieved in [20]. It should be noted, however, that in [20] different movements were classified using traditional deep learning techniques, and fall was not among those activities.
However, our experimental setup is not free of limitations. The focus of this paper has been on establishing a proof of concept in the laboratory, and this is by no means a deployment-ready prototype. The initial experimental population is considerably small. This was compensated for in the second phase by adding two different external datasets. Both datasets not only increased the number of instances but also brought variation with respect to the number of subjects, age, and type of noise and signals. Both datasets had obtained signals using different devices. All these factors strengthen our proof of concept. We are also bound by the obvious limitations of the controlled experimental setup. For example, the subjects had to wear safety gear (helmet, knee caps, etc.) in order to comply with the safety regulations. This might cause an onset of anxiety and nervousness in volunteers, which in turn may affect the heart rate signals. However, these circumstances are unavoidable and have to be accepted as experimental conditions.

Conclusions and Future Work
It is shown clearly that ECG signals alone can be utilized not only for distinguishing different activities but also for distinguishing fall from no-fall activities by using a combination of wavelet transform and transfer learning. The transfer learning models learned as efficiently and accurately as any other convolutional neural networks trained from scratch. In future, we plan to further analyze the transfer learning approach for time series classification by directly training on the ECG signals using long short-term memory (LSTM) networks. Additionally, a detailed frequency spectrum analysis of the signals can be performed to extract the features specifically present in different activities. Further experiments are planned to include different kinds of falls, activities and circumstances. As discussed in the previous section, because of the limited dataset initially acquired, we plan to carry out our experiments on a larger and more diverse (in terms of age and gender) group of people, together with a substantially larger set of data, to improve this study. Another extension we would like to pursue is to test our proposed algorithm with state-of-the-art pre-trained networks, such as EfficientNet [64].
Institutional Review Board Statement: Our study was conducted with healthy volunteers. Each participant gave written consent to allow measurement data and images to be anonymized and used for publications. A risk assessment was established with assistance from the medical doctor, safety officer and occupational safety specialist of the university, which includes safety precautions to minimize all possible injury risks. These precautions had to be strictly respected, and all participants were instructed and encouraged to follow our instructions before starting the tests to prevent injury. Our study was conducted in accordance with the ethical principles of the Declaration of Helsinki.
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.

Acknowledgments:
We thank the volunteers who participated in the fall detection experiments for their time and effort. Their valuable input has led to beneficial results which were critical to this study. We are thankful to Lorena Gutiérrez-Madroñal from the Department of Computer Science and Engineering, University of Cádiz, for her software and assistance in the initial filtering of the raw ECG sensor files. We would also like to thank Sameen Arshad for her insight during the validation of ECG sensor readings and for her medical advice on ECG analysis.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: