EMG Pattern Recognition in the Era of Big Data and Deep Learning

The increasing amount of data in electromyographic (EMG) signal research has greatly increased the importance of developing advanced data analysis and machine learning techniques that are better able to handle "big data". Consequently, more advanced applications of EMG pattern recognition have been developed. This paper begins with a brief introduction to the main factors that expand EMG data resources into the era of big data, followed by the recent progress of existing shared EMG data sets. Next, we provide a review of recent research and development in EMG pattern recognition methods that can be applied to big data analytics. These modern EMG signal analysis methods can be divided into two main categories: (1) methods based on feature engineering, involving a promising big data exploration tool called topological data analysis; and (2) methods based on feature learning, with a special emphasis on "deep learning". Finally, directions for future research in EMG pattern recognition are outlined and discussed.


Introduction
Recognition of human movements using surface electromyographic (EMG) signals generated during muscular contractions, referred to as "EMG Pattern Recognition", has been employed in a wide array of applications, including but not limited to, powered upper-limb prostheses [1], electric power wheelchairs [2], human-computer interactions [3], and diagnoses in clinical applications [4]. Compared to other well-known bioelectrical signals (e.g., electrocardiogram, ECG; electrooculogram, EOG; and galvanic skin response, GSR), however, the analysis of surface EMG signals is more challenging given their stochastic nature [5]. For upper-limb myoelectric prosthesis control, as an example, many confounding factors have been shown to greatly influence the characteristics of the EMG signal and thus the performance of EMG pattern recognition systems. These challenges include the changing characteristics of the signal itself over time, electrode location shift, muscle fatigue, inter-subject variability, and variations in muscle contraction intensity, as well as changes in limb position and forearm orientation [1,6-10].
To capture and describe the complexity and variability of surface EMG signals for more advanced applications, a massive amount of information is therefore necessary.
Thanks to recent advancements in commercial EMG signal acquisition technologies, data storage and management, and file sharing systems, the field is now moving into the era of "big data". Several factors have contributed to the recent expansion of EMG data resources such that big data approaches are beginning to be viable. First, EMG data sets collected as part of individual research studies are now being made available online instead of residing solely on hard drives within the laboratories of individual researchers (e.g., [10-12]). Secondly, as in other research communities, the availability of benchmark EMG databases has been critical to the growth of the field [13]. Thirdly, the development of high-density surface EMG systems has introduced the concept of a surface EMG image and thus dramatically increased the volume of data [14,15]. Lastly, the increasing availability of multi-modality sensor systems has generated larger amounts of data in which the EMG signal is considered one of the most important sources of information [16,17]. Here, we present the current state of existing shared EMG datasets, highlighting the opportunities and challenges in the development of truly big EMG data.
To translate the vast and complex information in EMG signals into useful control signals for prosthetic devices or a meaningful diagnostic tool for identifying neuromuscular diseases, advanced data analysis and machine learning techniques capable of analyzing big data are needed. Existing EMG pattern recognition approaches can be broadly divided into two categories: (1) methods based on feature engineering and (2) methods based on feature learning. "Feature Engineering" and feature extraction have been key parts of conventional machine learning algorithms. In EMG analysis, short time windows of the raw EMG signal are first segmented and then summarized by time- and frequency-domain features aimed at improving information quality and density. Many studies have shown that the quality and quantity of these hand-crafted features have a great influence on the performance of EMG pattern recognition [18-21].
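To make the feature-engineering pipeline concrete, the windowing step and four of the classical time-domain features mentioned throughout this review (MAV, WL, ZC, and SSC) can be sketched as follows. This is a minimal NumPy sketch for illustration only; the function names, window parameters, and the default threshold value are our own choices, not taken from any specific cited study.

```python
import numpy as np

def hudgins_td_features(window, threshold=0.01):
    """Compute four time-domain features for one EMG analysis window.

    Returns (MAV, WL, ZC, SSC). `threshold` suppresses noise-induced
    zero crossings and slope sign changes; its value is illustrative.
    """
    x = np.asarray(window, dtype=float)
    mav = np.mean(np.abs(x))                 # mean absolute value
    wl = np.sum(np.abs(np.diff(x)))          # waveform length
    # zero crossings: sign change with an amplitude step above threshold
    zc = np.sum((x[:-1] * x[1:] < 0) & (np.abs(x[:-1] - x[1:]) >= threshold))
    # slope sign changes: local extrema whose slopes exceed the threshold
    d1, d2 = x[1:-1] - x[:-2], x[1:-1] - x[2:]
    ssc = np.sum((d1 * d2 > 0) &
                 ((np.abs(d1) >= threshold) | (np.abs(d2) >= threshold)))
    return mav, wl, int(zc), int(ssc)

def sliding_windows(signal, win_len, step):
    """Split a 1-D signal into (possibly overlapping) analysis windows."""
    n = (len(signal) - win_len) // step + 1
    return np.stack([signal[i * step:i * step + win_len] for i in range(n)])
```

In a typical pipeline, each row returned by `sliding_windows` would be passed through `hudgins_td_features` (per channel) and the resulting feature vectors fed to a classifier.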
Here, we review methods that can be applied to big data analytics involving a method rooted in algebraic topology called "topological data analysis". This method has recently been shown to facilitate the design of an effective sparse EMG feature set across multiple EMG databases and scales well with data set size [22].
Conversely, in "feature learning", explicit transformation of the raw EMG signals is not required, as features are automatically created by the machine learning algorithms as they learn. The use of "deep learning" therefore shifts the focus from manual (human-based) feature engineering to automated feature engineering or learning. Although neural networks have been used in EMG research for several decades, deep learning techniques have only recently been applied to EMG pattern recognition. This is, at least in part, due to the lack of sufficient EMG data to train deep neural networks in the earlier years of the field. With the advent of larger shared EMG data sets and recent advances in techniques for addressing overfitting problems, most emerging deep learning architectures and methods have now been employed in EMG pattern recognition systems (e.g., [14,23,24]). In some cases, both feature engineering and learning are combined by inputting pre-processed data or pre-extracted features to a deep learning algorithm, with some benefits having been shown (e.g., [11,23,24]). Here, we provide a comprehensive review of the recent research and development in deep learning for EMG pattern recognition. Directions for future research are also outlined and discussed.

Big EMG Data
In addition to the fact that some research questions cannot be answered using single, small data sets, larger samples are generally preferable to account for the large inter- and intra-subject variability in surface EMG signals. Similarly, differences in instrumentation and data collection protocols can introduce biases in small sample sizes. Over the last decade, a long-standing interest in acquiring large-scale EMG data sets has increasingly been fulfilled. The four main factors that have contributed to expanding EMG data resources into the big data discussion are outlined and presented in this section.

Multiple Datasets
The first step in the successful open sharing of big data resources usually comes from a number of individual researchers and research groups who are motivated to share data collected as part of research studies. Although most EMG studies have collected data from small cohorts of participants (n = 5-40), relatively large EMG data sets of several hundreds to thousands of subjects could be easily gathered if their data were made available online along with their publications. However, researchers have not typically published their data for a number of reasons. For one, the extra work required to prepare data before making them publicly available may not be worth the perceived benefits. Furthermore, the collection of most data sets requires significant investment in time and effort, and thus researchers may prefer to release them only once they have extracted the maximum perceived value and not at the time of the first publication. Keeping the data set private can preserve the right to re-analyze the data in the future, either to apply different analytical techniques or to investigate different research questions. In some cases, fear may also play a factor, as opening data sets facilitates subsequent analyses that might uncover problems with the data or invalidate previous results. Whatever the reason, many EMG data sets have remained hidden, residing solely on hard drives within the laboratories of individual researchers.
With the advent of "data papers", which allow researchers to publish their data sets as citable scientific publications [25], more and more EMG data sets have been made available online. Because the performance of EMG pattern recognition can vary depending on subject, experimental protocol, acquisition setup, and differences in pre-processing, multiple datasets are needed to ensure the robustness and generalization of findings [22,26].
For instance, Kamavuako et al. [27] found that there was no consensus on the optimum value of the threshold parameter of two of the most commonly used EMG features: zero crossings (ZC) and slope sign changes (SSC), leading them to investigate the effect of threshold selection on classification performance and on the ability to generalize across multiple data sets. Their results showed that the optimum threshold is highly subject- and dataset-dependent, i.e., each subject had a unique optimum threshold value, and, even within the same subject, the optimum threshold could change over time.
In practical use, it is desirable to build models that can generalize from one set of subjects to another, one day to another, and from one setting to another. Therefore, they recommend a global optimum threshold value yielding a good trade-off between classification performance and generalization based on the global minimum classification error rate across four different EMG data sets.
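The cross-dataset selection procedure described above can be sketched as a simple grid search. The helper names below are hypothetical, and `error_fn` stands in for a full train/test classification pipeline that returns an error rate for one dataset at one threshold value.

```python
import numpy as np

def global_optimum_threshold(candidate_thresholds, error_fn, datasets):
    """Return the single threshold minimizing the mean classification
    error rate across all datasets, trading per-dataset optimality
    for cross-dataset generalization."""
    mean_errors = [np.mean([error_fn(ds, t) for ds in datasets])
                   for t in candidate_thresholds]
    return candidate_thresholds[int(np.argmin(mean_errors))]
```

As a toy illustration, if two "datasets" have different per-dataset optima (say 0.01 and 0.03 under a quadratic error surface), the global choice lands between them, which is the trade-off the procedure is designed to make.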
The performance of many different EMG pattern recognition methods has been evaluated in a host of studies over the last few decades. However, most previous studies have been limited in terms of the relatively small sample sizes used for classification (small datasets) from one highly specific experiment (constrained datasets), and most of them have studied either no or only one practical robustness issue. A comparison of EMG pattern recognition methods using multiple EMG datasets could thus help identify robust feature extraction and classification methods. Scheme and Englehart [21] re-evaluated the performance of the commonly used Hudgins' time domain features (ZC, SSC, mean absolute value (MAV) and waveform length (WL) [28]) and several additional features (autoregressive coefficients, AR; cepstral coefficients, CC; Willison amplitude, WAMP; and sample entropy, SampEn) using six different EMG data sets containing over 60 subject sessions and 2500 separate contractions. Khushaba et al. [29] proposed a novel set of time domain features that can estimate the EMG signal power spectrum characteristics using five different EMG data sets. Phinyomark et al. [26,30] investigated the effect of sampling rate on EMG pattern recognition and then identified a novel set of features that are more accurate and robust for emerging low-sampling rate EMG systems, using four different EMG data sets containing 40 subject sessions with over 8000 separate contractions.
A summary of the existing shared EMG data sets for the classification of hand and finger movements is presented in Table 1. These fifteen datasets represent over 160 subject sessions with over 16,000 trials and more than 90,000 s of muscle contraction. Three of the datasets used sparse EMG channels (i.e., requiring precise positioning of the electrodes over the corresponding muscle) while the other twelve data sets employed wearable EMG armbands (i.e., multiple EMG sensors positioned radially around the circumference of a flexible band; see reference [31] for a review). The recent availability of consumer-grade wireless EMG armbands (such as the Myo armband by Thalmic Labs) will enable more researchers to collect EMG data and thus represents a real opportunity for big data sharing. These data sets also include many of the dynamic factors that influence the performance of EMG pattern recognition, including changes in limb position, change of forearm orientation, varying contraction intensity, and between-day variability. An enormous variety of subjects, experimental protocols, acquisition setups, and pre-processing pipelines is clearly shown in Table 1, and consequently, this group of currently available datasets can be used for a comprehensive investigation of the generalization and robustness of EMG pattern recognition for myoelectric control. It is important to note that some subjects may have participated in more than one study (different subject sessions) for EMG datasets recorded from the same research group. Also, some datasets (e.g., Khushaba et al. 2 [32], Khushaba et al. 3 [33], and Chan et al. [34,35]) are only partially available online and require contacting the researchers who shared the data to access the full dataset.

Benchmark Datasets
As discussed, multiple datasets can be used to investigate the generalization and robustness of EMG pattern recognition methods, but only to a certain extent. A major limitation of the multiple-dataset investigation approach is the fact that EMG data from different data sets cannot be combined into one larger set due to experimental and equipment differences. Without large EMG datasets being collected using a single standardized protocol, it is difficult to investigate the generality of the findings across gender, age, characteristics of the amputation, etc. Although there are several recommendations for protocols, acquisition setups, and pre-processing pipelines such as the European recommendations, written by the Surface ElectroMyoGraphy for the Non-Invasive Assessment of Muscles (SENIAM) project (www.seniam.org), no solid benchmarking protocol and experimental setup (e.g., the set of movements, electrode locations, sampling rate, and filtering) has been adopted in previous studies. This is in stark contrast to other research communities that have found substantial benefit in the wide acceptance of protocols, leading to publicly available benchmark databases such as the 1000 Functional Connectomes Project and the Human Connectome Project databases for resting state functional magnetic resonance imaging (rfMRI) [36]. The usefulness and importance of benchmark databases have been clearly acknowledged in many research fields, and the lack of such a benchmark in the EMG community is a major obstacle towards open sharing of big EMG data.
In the earlier years, EMG studies were largely limited to large research centers that possessed the highly specific and expensive instrumentation and manpower needed to acquire EMG data acceptable to the research community. This made it difficult for small laboratories, or those in developing countries, to contribute meaningfully to the field. Moreover, because of constraints on funding, time, subjects, etc., the volume of data was typically limited to the minimum required to verify a specific scientific hypothesis. In myoelectric control, this has often consisted of approximately ten able-bodied subjects and/or a few amputees.
The creation of a benchmark protocol and database would not only promote comparison between methods, attracting additional researchers from the signal processing and machine learning communities, but would also foster progress in big EMG data by encouraging the contribution of new datasets from other research groups using the same experimental protocols.
Indeed, the EMG research field has lagged behind other biomedical research fields in the development of big data sharing resources. The Non-Invasive Adaptive Prosthetics (Ninapro) database may currently be the biggest and most widely known publicly available benchmark database to date. The Ninapro project was launched in 2014 [13], and to date, consists of seven data sets [37-41] containing surface EMG signals from the forearm and upper arm using 10-16 EMG channels together with several additional modalities recorded from 117 able-bodied subjects and 13 amputees performing a partial set of 61 pre-defined hand and finger movements (Table 2). In total, there are more than 48,000 trials and 326,000 s of muscle contraction. Additional modalities (depending on dataset) include inertial measurement unit (IMU) or accelerometry data acquired using Delsys Trigno Wireless electrodes or Myo armbands, kinematic hand data acquired using a 22-sensor CyberGlove II data glove, wrist orientation data acquired using a two-axis Kübler IS40 inclinometer, finger force data measured using a Finger-Force Linear Sensor (FFLS) device, and eye movement data using a Tobii Pro Glasses 2 wearable eye tracker. All datasets are fully accessible upon successful registration at http://ninapro.hevs.ch. Data are stored anonymously and subject demographic information is limited to gender, age, height, weight, laterality, and several clinical characteristics of the amputee subjects. Supporting files for the experimental protocol and acquisition setup (e.g., stimulus videos and software) can be obtained on an individual basis by contacting the Ninapro team.
Unfortunately, although a large number of movements and electrode locations have been proposed by the Ninapro project, the maximum number of movements and electrode locations that can be combined across the seven current EMG datasets are seven and eight, respectively. Similarly, there is no consensus on sampling rate, filtering, resolution, gain, etc. due to the use of different EMG acquisition devices, i.e., an Otto Bock MyoBock System for Ninapro 1, a Delsys Trigno Wireless EMG System for Ninapro 2, 3, 6 and 7, a Cometa Wave Plus wireless EMG system for Ninapro 4, and Thalmic Myo armbands for Ninapro 5. Some manipulation and transformation of data are therefore necessary before combining EMG data across the Ninapro data sets.

High-Density Surface EMG
There are two common approaches to measuring EMG signals. One is to place electrodes precisely over specific muscles (known as sparse multi-channel surface EMG), and the other is to use array-like arrangements of electrodes that are placed over a muscle area. The latter is the more common approach in the myoelectric control literature, as shown in Tables 1 and 2, but is often limited to a single row of electrodes (i.e., an EMG armband) [31]. To increase the spatial information of electrical muscle activity, high-density surface EMG (HD-sEMG or HD-EMG) has been proposed, which increases the density and coverage of the electrodes. Typically, HD-sEMG employs a large two-dimensional (2D) array of small, closely spaced electrodes. The total number of electrodes proposed for HD-sEMG ranges from 32 [12] to over 350 [42], while the maximum number of electrodes for typical EMG armbands is 16 (Tables 1 and 2). The existing shared HD-sEMG data sets, which use electrode arrays of 32, 128, and 192, are listed in Table 3. Due to the high sampling frequencies used when measuring surface EMG (typically 1000 Hz or above), large three-dimensional arrays, i.e., thousands of 2D images, can be obtained in just a few seconds of muscle contraction from a single subject. For the csl-hdemg dataset, as an example, 6500 trials of 3-s muscle contraction were recorded using a 192-electrode array sampled at 2048 Hz. As a result, there are over 39 million sEMG images in this dataset alone. Hence, the development of HD-sEMG has dramatically increased the volume of data.
The collected HD-sEMG data allows the analysis of EMG information in both the temporal and spatial domains, leading to new possibilities for analyzing EMG signals using image processing techniques. Two methods of analyzing these kinds of EMG signals include the HD-sEMG map (a topographical image) and the sEMG image (an instantaneous image). Following conventional EMG pattern recognition methods, the HD-sEMG map can be computed using the root mean square (RMS) [42] or other amplitude-based feature extraction methods (e.g., MAV, WL, etc.) [43], of individual channels distributed in 2D space. This map is thus also sometimes referred to as an intensity or heat map. Often, the active region of the HD-sEMG map associated with a certain muscle, the so-called activation map, is identified using an image segmentation method and used as an input for subsequent feature extraction methods. Features extracted from HD-sEMG maps can be based on intensity information (any signal magnitude and power feature [18,22]) and spatial information (e.g., the mean shift [42] or the coordinates of the centre of gravity and maximum values [44]). These maps and additional spatial-based features can be used to reduce the effect of confounding factors that influence the performance of EMG pattern recognition such as the changing characteristics of the signal itself over time and electrode location shift [45] as well as variations in muscle contraction intensity [44]. However, this remains a relatively new sub-field, and novel image segmentation and spatial feature extraction methods are still needed to improve the performance of robust EMG pattern recognition.
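As a concrete illustration, the RMS-based HD-sEMG map and one spatial feature discussed above (the centre of gravity of the map) can be sketched as follows. This is a minimal NumPy sketch under our own assumptions: the function names, the row-major channel layout, and the window shape are illustrative, not taken from the cited studies.

```python
import numpy as np

def rms_map(hd_emg_window, rows, cols):
    """Compute an HD-sEMG intensity map from one analysis window.

    `hd_emg_window` is an (n_samples, rows*cols) array; each channel's
    root mean square (RMS) becomes one pixel of a (rows, cols) map.
    """
    rms = np.sqrt(np.mean(np.asarray(hd_emg_window, float) ** 2, axis=0))
    return rms.reshape(rows, cols)

def centre_of_gravity(intensity_map):
    """Intensity-weighted centroid (row, col) of an HD-sEMG map,
    one example of a spatial feature extracted from the map."""
    total = intensity_map.sum()
    r_idx, c_idx = np.indices(intensity_map.shape)
    return ((r_idx * intensity_map).sum() / total,
            (c_idx * intensity_map).sum() / total)
```

In practice, segmentation of the activation map and further spatial features would follow, but the map itself is the common starting point.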
Instead of forming an image based on the signal magnitude taken from some time window of raw sEMG signals, as is typically done, the instantaneous sEMG image can also be directly formed from the raw sEMG signals [14]. This sEMG image is equivalent to the HD-sEMG map with a window length of one sample. The number of pixels (resolution) in these sEMG images is then defined by the total number of electrodes (e.g., an electrode array with eight rows and 16 columns forms an image with 8 × 16 pixels), while the number of instantaneous sEMG images captured per second is dictated by the sampling rate used (e.g., a sampling frequency of 1000 Hz with 3 s of muscle contraction provides 3000 sEMG images). Without applying feature extraction methods, instantaneous sEMG images have been treated as an image classification problem and thus classified using deep learning approaches. A simple majority vote over several tens to several hundreds of frames can then be used to further improve the recognition performance [14]. More details about deep learning and sEMG image analysis are discussed in Section 3.2.
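The formation of instantaneous sEMG images and the subsequent majority vote can be sketched as follows. The helper names are hypothetical, and the per-frame classifier itself (e.g., a deep network) is omitted; its per-frame predictions are taken as given.

```python
import numpy as np
from collections import Counter

def to_instant_images(hd_emg, rows, cols):
    """Turn each time sample of an (n_samples, rows*cols) HD-sEMG
    recording into one instantaneous image frame, i.e., an HD-sEMG
    map with a window length of one sample."""
    return np.asarray(hd_emg).reshape(-1, rows, cols)

def majority_vote(frame_predictions):
    """Fuse per-frame class predictions into one movement decision."""
    return Counter(frame_predictions).most_common(1)[0][0]
```

For a 2048-Hz recording, voting over a few hundred frames still corresponds to well under a second of signal, which is why this post-processing step is practical for real-time control.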
It is important to note that increasing the number of electrodes is not strictly necessary to increase the recognition performance. In fact, several studies have shown that there is little need to use all EMG channels (over 100 electrodes), and instead, a properly positioned smaller set of electrodes (e.g., 9 [44] and 20-80 [45]) can provide comparable results. There is, however, no consensus on the global optimum number of electrodes yielding maximum recognition performance. Moreover, the optimal EMG electrode sub-set is highly subject-dependent (even within the same experimental protocol [41,46]), and further research is needed in this area. The use of HD-EMG has also thus far been limited to controlled in-laboratory settings, limiting its practical applications.

Multiple Modalities
Because EMG captures the activity of muscles as part of the musculoskeletal system, information about the same contractions or motions can also be measured using different types of measuring techniques, instruments, and acquisition setups. The analysis of solely surface EMG signals could therefore be considered as the analysis of a single modality. Due to the increasing availability of multi-modality sensing systems, multi-modal analysis approaches are becoming a viable option. Multiple modalities can be used to capture complementary information which is not visible using a single modality, or to provide context for others. Even when two or more modalities capture similar information, their combination can still improve the robustness of pattern recognition systems when one of the modalities is missing or noisy.
Thus far, myoelectric control of powered prostheses is the most important commercial application of EMG pattern recognition. In this context, accelerometers have been the main supplementary modality and are the most prevalent in shared surface EMG datasets, such as the Khushaba et al. 5, Ninapro 2, 3, 5, 7, and mmGest datasets (see Tables 1-3). Accelerometry has been shown to provide additional information to EMG, especially to reduce the effects of limb position [47,48].
Outside of prosthesis control, other applications of EMG pattern recognition for which multi-modality data sets exist include, for example, sleep studies, such as the Cyclic Alternating Pattern (CAP) Sleep Database [49] and the Sleep Heart Health Study (SHHS) Polysomnography Database [50]; biomechanics, such as the cutting movement dataset [51] and the horse gait dataset [52]; and brain computer interfaces, such as the Affective Pacman dataset [53] and the emergency braking assistance dataset [54]. Recently, emotion recognition using multiple physiological modalities has gained attention as another important application that has benefited from the incorporation of surface EMG.

Emotion Recognition
Emotion recognition is one of the fastest-growing disciplines of multi-modal research, along with audio-visual speech recognition and multimedia content indexing and retrieval. The objective assessment of human emotion can be performed using the analysis of subjects' emotional expressions and/or physiological signals. Until recently, most studies on emotion recognition and affective computing have focused on the analysis of facial expressions, speech, and multimedia content to identify the emotional state of the subjects. With the growth of wearable technology, however, physiological signals originating from the central and peripheral nervous systems have now gained attention as alternative sources of emotional information.
One of the earliest examples of multi-modal emotion recognition based on physiological signals was the study by Healey and Picard [55]. They recorded EMG from the trapezius muscle (tEMG), several physiological signals including electrocardiogram (ECG), galvanic skin response (GSR), and respiration (Resp), and composite video records of the driver during real-world driving tasks with 24 subjects. These signals were collected over a 50-min duration for each subject and used to determine the driver's level of stress. The data from 17 of the 24 subjects are publicly available via PhysioNet [56]. An alternate common experimental approach is to use multimedia content (e.g., music video clips and/or movie clips) as the stimuli to elicit different emotions in subjects. Table 4 summarizes four such publicly available data sets. While the DEAP (a Database for Emotion Analysis using Physiological signals) [57] and HR-EEG4EMO [17] datasets contain brain signals acquired using electroencephalogram (EEG) sensors, the DECAF (a multimodal dataset for DECoding user physiological responses to AFfective multimedia content) [58] dataset measures brain signals using magnetoencephalogram (MEG) sensors. These datasets, however, are not limited to brain signals; in fact, the BioVid Emo DB dataset [59] includes no brain signals at all. They also include various combinations of the following peripheral nervous system signals: surface EMG from the zygomaticus major muscle (zEMG), corrugator muscle (cEMG), tEMG, blood volume pressure (BVP), ECG, Resp, skin temperature (Temp), peripheral oxygen saturation (SpO2), pulse rate (PR), and electrooculogram (EOG). Facial videos were also recorded for all datasets. Another interesting multi-modal database is the BioVid Heat Pain database [60]. The tEMG, zEMG, cEMG, ECG, GSR, and EEG signals were collected along with facial videos from 86 subjects during exposure to painful heat stimuli.
To gain access to these datasets (other than Healey and Picard's dataset), the EULA (End User License Agreement) must be printed, signed, scanned, and returned via email to the authors of each dataset. Upon approval, they will then provide a username and password that can be used to download the data.
Compared to the previously discussed surface EMG data sets, the volume of these multimodal data sets is huge. For instance, the raw data from the 60-h MEG and peripheral physiological recordings in the DECAF dataset alone make up more than 300 GB. Unfortunately, either due to instrumentation limitations or to limit the volume of data, some of these datasets sampled EMG signals at lower frequencies, such as 15.5 Hz for the Healey and Picard dataset and 512 Hz for the DEAP and BioVid Emo DB datasets (see Table 4). These rates lie well below the typical 1000-Hz sampling frequency for EMG signals, below which the performance of EMG pattern recognition has been shown to suffer from the loss of high-frequency information [26,30].

Discussion
Although the EMG data sets outlined above are not as large as many other forms of big data, these shared datasets are large enough that a single computer cannot process them within a reasonable time (big volume) and they exhibit several big data qualities [61]. It is important to note that size is only one characteristic of big data, with others being equally important in its definition [62]. Specifically, big variety refers to the diversity of information within a single dataset or the diversity of multiple datasets. This is a critical aspect of both big data and EMG research, since sub-populations and different experimental conditions routinely favor different features and algorithms that are not shared by others. Therefore, no single EMG data set, big or not, should be considered to be comprehensive, and cross-validation across multiple datasets is recommended for the development of robust EMG pattern recognition systems [22,26]. Although larger EMG data sets would be preferable, the current publicly available EMG datasets (Tables 1-3) are sufficient to shed some light on the generalizability and robustness of EMG pattern recognition (and, in particular, myoelectric control). Intuitively, big variety also applies when surface EMG is analyzed together with other modalities such as EEG, MEG, and facial video (Table 4).
Big veracity refers to noise, error, incompleteness, or inconsistencies of big data. This can be interpreted in many ways in the context of EMG, and, in particular, myoelectric control, as noisy, incomplete, and inconsistent EMG data often occur in human experimentation. From an application standpoint, the attribution of models built from normally-limbed subjects to amputees or spinal cord injury patients, who may have very different or reduced musculature or muscle tone and higher skin impedances, also introduces veracity challenges. As noted in the data sets of Table 2, amputee subjects may not complete experiments due to fatigue or pain, and the number and placement of electrodes is often reduced or changed due to insufficient space. Surface EMG signals are also often corrupted by noise and interference while traveling through different tissues and equipment, requiring dedicated hardware or compensatory pre-processing steps [63]. The development of EMG feature extraction and classification methods that are robust to noise is also important [64,65], as is the reduction of data (or dimensionality) when dealing with large-scale data sets. Determining relevant and meaningful features from a given larger set of features which may contain irrelevant, redundant, or noisy information is commonly accomplished using either feature selection [66-69] or feature projection methods [70-73]. When properly executed, these methods not only reduce the impact of noise and irrelevant information, but also the amount of computational time required for classification.
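As one example of feature projection, a principal component analysis (PCA) projection can be sketched with NumPy as follows. This is a minimal illustration of the general technique, not the specific projection methods of the cited studies; the function name is our own.

```python
import numpy as np

def pca_project(features, n_components):
    """Project an (n_samples, n_features) matrix onto its top principal
    components, discarding low-variance (often noisy or redundant)
    feature directions."""
    centred = features - features.mean(axis=0)
    # SVD of the centred data; rows of vt are the principal axes,
    # ordered by decreasing singular value (i.e., explained variance)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt[:n_components].T
```

A feature matrix whose columns are linearly dependent (e.g., redundant amplitude features) collapses onto a few components, which is precisely the dimensionality reduction described above.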
Big velocity refers to the rate at which data are generated and the speed at which they should be analyzed. The speed at which decisions are made is integral to EMG applications, either as support for clinical decisions based on EMG, or in real-time human-machine interfaces, such as with myoelectric control. It is important to note that although real-time "user in the loop" experiments for myoelectric control are important for providing a good representation of the usability of a system, these types of studies are limited in their contributions to the growth of big EMG data. Necessarily, they allow only for the direct comparison of selected methods within a single experimental session, do not allow for later offline use (the data are collected during feedback, and not feed forward, control), and are difficult to reproduce given the number of uncontrolled parameters. On the other hand, while benchmark datasets may not incorporate feedback control, they allow other researchers to easily replicate results, perform data analyses and compare different methods. Moreover, many currently shared EMG data sets now include more realistic and dynamic movements which better approximate real-life conditions, e.g., varying limb position, contraction intensity, etc. Nevertheless, real-time testing remains paramount in the assessment of the true dynamic performance of EMG pattern recognition. Moreover, several key metrics for measuring the efficacy of control can only be measured by performing real-time control experiments, such as motion selection time and motion completion time (the time required to select and complete the desired motion) [74] and the Fitts' Law test-based metrics [75].
The advantages of big data sharing (or Big Value) for EMG pattern recognition have been discussed throughout this section. Nevertheless, given the limitations of the current benchmark databases, the development of a new, standardized benchmark database for big EMG data would be highly beneficial. Such a benchmark could help to improve the reliability and reproducibility of research, improve research practices, maximize the contribution of research subjects, help to back up valuable data, reduce the cost of research within the EMG research community, and increase accessibility to the field for new researchers.

Note: N, able-bodied (non-amputee) subject; A, amputee subject; M, male; F, female; LPF, low-pass filter; NF, notch filter; BPF, band-pass filter. a For subjects 1, 3, and 10, the number of movements was respectively reduced to 39, 49, and 43 movements (including rest) due to fatigue or pain; b For subjects 7 and 8, the number of electrodes was reduced to 10 due to insufficient space; c For subject 21, the number of movements was reduced to 38 (including rest).

Table 3. High-density surface EMG (HD-sEMG) datasets: subject, experimental protocol, acquisition setup, and pre-processing pipeline.

Techniques for Big EMG Data
Many methods for processing and analyzing EMG data have been proposed and tested; however, most have been designed for, and restricted to, smaller datasets. Consequently, many of these traditional methods struggle to handle large-scale data effectively and efficiently. Considering that shared EMG data sets have only recently been released and that only a handful of recent methods can handle big EMG data, research based on big EMG data remains relatively new. Novel methods capable of analyzing such data could be developed either by modifying traditional methods to run in parallel computing environments or by proposing new methods that natively leverage parallel computing. These methods will become very important in turning any collected big EMG dataset into a meaningful resource.

Feature Engineering
EMG pattern recognition systems typically consist of several inter-connected components: data pre-processing, feature extraction, dimensionality reduction, and classification [1,2]. The stochastic and non-stationary characteristics of the EMG signal make its instantaneous value unsuitable for conventional machine learning algorithms [86]. Feature extraction, which transforms short time windows of the raw EMG signal to generate additional information and improve information density, is thus required before a classification output can be computed. Over the past several decades, numerous EMG feature extraction methods based on time domain, frequency domain, and time-frequency domain information have been proposed and explored [7,8,18-20,22,26,28-30]. Notable EMG feature extraction methods include the set of zero crossings (ZC), slope sign changes (SSC), mean absolute value (MAV), and waveform length (WL) (the most commonly used features [28]); autoregressive coefficients (AR) and cepstral coefficients (CC) (features robust to EMG electrode location shift, variation in muscle contraction effort, and muscle fatigue [8]); the Willison amplitude (WAMP) (a feature robust to noise [64,65]); sample entropy (SampEn) (a feature robust to between-day variability [7]); and the L-scale (an optimal feature for wearable EMG devices [26]), to name a few. For extended coverage of window-based EMG feature extraction methods, the reader is encouraged to consult the aforementioned studies [7,8,18-20,22,26,28-30].
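As an illustration of window-based feature extraction, the following sketch computes four of the features named above (MAV, WL, ZC, SSC) over sliding windows of a synthetic signal. The window length, overlap, and dead-band threshold are illustrative values only, not parameters from the cited studies.

```python
import numpy as np

def td_features(window, eps=1e-8):
    """Classical time-domain feature set for one EMG window: mean absolute
    value (MAV), waveform length (WL), zero crossings (ZC), and slope sign
    changes (SSC). `eps` is a small dead-band threshold against noise."""
    x = np.asarray(window, dtype=float)
    mav = np.mean(np.abs(x))
    wl = np.sum(np.abs(np.diff(x)))
    zc = np.sum((x[:-1] * x[1:] < 0) & (np.abs(x[:-1] - x[1:]) > eps))
    d = np.diff(x)
    ssc = np.sum((d[:-1] * d[1:] < 0)
                 & ((np.abs(d[:-1]) > eps) | (np.abs(d[1:]) > eps)))
    return np.array([mav, wl, zc, ssc], dtype=float)

# Sliding windows of 200 samples with 50% overlap over a synthetic signal.
rng = np.random.default_rng(1)
emg = rng.standard_normal(1000)
feats = np.array([td_features(emg[i:i + 200]) for i in range(0, 801, 100)])
print(feats.shape)  # (9, 4): nine windows, four features each
```

Each window thus becomes a short feature vector, which is the representation consumed by the classifiers discussed below.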
To find the best combination of all available features, one would have to try every possible combination, which is impractical and, for large data sets, infeasible. Moreover, the best combination for one application or scenario is not necessarily the best for others. Instead of evaluating the performance of every possible combination, dimensionality reduction (feature selection and feature projection) approaches have been employed to eliminate irrelevant, redundant, or highly correlated features. More often than not, however, classical dimensionality reduction techniques cannot be applied to big data, and it is therefore necessary to re-design the way these traditional methods are computed.
One possible approach is to create methods that are capable of analyzing big data by modifying traditional methods to work in parallel computing environments. For feature selection, some potential and well-known population-based metaheuristic methods, such as genetic algorithm (GA), particle swarm optimization (PSO), and ant colony optimization (ACO), have been found to be effective in selecting an optimal EMG feature set (e.g., [67][68][69]). These feature selection methods have been developed to work in parallel computing as well as on graphics processing units (GPU) [87,88]. Similarly, novel approaches have been proposed to run standard feature projection (e.g., principal component analysis (PCA)) in parallel or on GPUs for big data dimensionality reduction [89,90]. The size of the data in most current studies, however, can be effectively processed using standard methods in a single high performance computer, and thus, very few studies have concentrated on using either parallel versions or GPU-based implementations [66].
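For reference, the serial building block that such parallel and GPU-based implementations distribute is quite small; a minimal PCA projection via SVD of the mean-centred feature matrix might look as follows (synthetic data with illustrative dimensions):

```python
import numpy as np

def pca_project(X, n_components):
    """Project the feature matrix X onto its top principal components,
    obtained from an SVD of the mean-centred data."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T  # rows are windows, columns are components

rng = np.random.default_rng(2)
X = rng.standard_normal((500, 40))   # 500 EMG windows x 40 extracted features
Z = pca_project(X, 8)                # reduced to 8 dimensions
print(Z.shape)  # (500, 8)
```

Parallel variants cited above distribute exactly this computation (the centring, covariance/SVD step, and projection) across CPU cores or GPUs when X no longer fits on one machine.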
Another approach is to develop new methods that work natively in a parallel manner. A topological data analysis (TDA) method called "topological simplification" has recently been used to design an effective sparse EMG feature set across multiple EMG databases and has been shown to scale well with dataset size [22]. Specifically, topological simplification, as exemplified by the Mapper algorithm [91], is an unsupervised learning method that extracts a topologically simplified skeleton of complex and unstructured data through a series of local clusterings in overlapping regions of the data space, then links together clusters that share common data points. Thanks to the local nature of the clustering, Mapper naturally separates the complete problem into many smaller problems that are immediately amenable to parallelization and are merged only at the final step. Moreover, the local clusterings depend only on the distances between the points in the overlapping regions; hence, a high-dimensional feature matrix is effectively projected down to a small distance matrix. These properties make Mapper a very good tool for the analysis of big data, as the approach fits naturally within big data analysis frameworks such as Google's MapReduce paradigm.
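The following is a deliberately simplified sketch of the Mapper idea, not the implementation used in [22]: the filter is taken to be the first coordinate of the data, the cover is a set of overlapping intervals of the filter range, and the per-interval clusterer is a single-linkage pass with a fixed distance threshold (all illustrative choices). Note that the loop over intervals is embarrassingly parallel, which is the property discussed above.

```python
import numpy as np

def mapper_graph(X, filter_vals, n_intervals=4, overlap=0.3, link_dist=1.0):
    """Minimal Mapper sketch: cover the filter range with overlapping
    intervals, cluster inside each interval independently, and connect
    clusters that share data points. Each interval could run as its own
    map task; the edge-building step is the final merge."""
    lo, hi = filter_vals.min(), filter_vals.max()
    width = (hi - lo) / n_intervals
    nodes, edges = [], set()
    for i in range(n_intervals):
        a = lo + i * width - overlap * width
        b = lo + (i + 1) * width + overlap * width
        idx = np.flatnonzero((filter_vals >= a) & (filter_vals <= b))
        # Single-linkage clustering: connected components of the
        # "distance < link_dist" graph restricted to this interval.
        unvisited = set(idx.tolist())
        while unvisited:
            seed = unvisited.pop()
            cluster, frontier = {seed}, [seed]
            while frontier:
                p = frontier.pop()
                near = [q for q in unvisited
                        if np.linalg.norm(X[p] - X[q]) < link_dist]
                for q in near:
                    unvisited.discard(q)
                    cluster.add(q)
                    frontier.append(q)
            nodes.append(frozenset(cluster))
    # Merge step: link clusters (from overlapping intervals) sharing points.
    for i in range(len(nodes)):
        for j in range(i + 1, len(nodes)):
            if nodes[i] & nodes[j]:
                edges.add((i, j))
    return nodes, edges

# Two well-separated blobs: the skeleton should keep them disconnected.
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 0.1, (30, 2)), rng.normal(5, 0.1, (30, 2))])
nodes, edges = mapper_graph(X, X[:, 0])
```

On a feature matrix, each node of the resulting graph is a group of similar features, which is how the feature charts described below are obtained.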
The output of this topological simplification approach has often been used to extract non-trivial qualitative information from big data that is hard to discern when studying the dataset globally [92]. For EMG pattern recognition, this approach has been successfully used to create charts of EMG feature spaces that are robust and generalize well across three different EMG data sets containing 58 individual subjects and 27,360 separate contractions [22]. These charts highlight four functional groups of state-of-the-art EMG features that describe meaningful, non-redundant information, allowing for a principled and interpretable choice of EMG features for further classification. To use this output for feature engineering and selection, measures such as class separability, robustness, and complexity can be evaluated within the fundamental and most interesting feature groups to select the best representative features. Experimental results have shown that the Mapper-selected feature set achieves the same (or higher) classification accuracy with a support vector machine (SVM) classifier as the set of features selected using an automatic brute-force feature selection method based on sequential forward selection (SFS) [22]. Additionally, based on a ranking of 81 features across 20 subjects, the computational cost of the Mapper method is approximately 21,600 times less than that of the SFS method. Furthermore, these topological feature maps could be used to inform the design of novel EMG features that fall into sparse feature groups or form entirely new groups of their own. For extended coverage of TDA and the Mapper algorithm for biomedical big data, the reader is encouraged to consult this book chapter [93].

Feature Learning
Although feature engineering has been the dominant focus of EMG pattern recognition so far, feature learning, as exemplified by deep learning, has recently begun to demonstrate better recognition performance than carefully hand-crafted features. Indeed, in the past few years, deep learning has made great progress in feature learning for big EMG data. In contrast to feature engineering and conventional machine learning approaches, deep learning can take advantage of many samples to extract high-level features by learning representations from low-level inputs. Deep learning algorithms, however, require large training datasets to train large deep networks (many hidden layers, each with a large number of neurons) with an associated large number of parameters (millions of free parameters). To train truly deep neural networks, it is therefore necessary to reconsider the way traditional large-scale neural networks are computed, using parallel deep learning models, GPU-based implementations, and optimized deep learning models.
One well-known parallel deep learning approach is the deep stacking network (DSN) [97], which stacks simple processing modules. A parallel deep learning model called the tensor deep stacking network (T-DSN) [98] has been proposed to further improve the training efficiency of DSNs using clusters of central processing units (CPUs). Combinations of model- and data-parallel schemes have also been implemented in a software framework called DistBelief [99] to handle very large models (more than a billion parameters). GPU-based frameworks are another important route to parallel deep learning [100,101]. When high performance computing resources (multiple CPU cores or GPUs) are not available, however, additional methods of improving training efficiency are necessary. Model compression techniques, for example, have been successfully applied to pattern recognition applications that commonly require implementation on embedded systems [102]. For real-time control, incremental learning methods can be employed to update parameters as new samples arrive while still preserving the network structure. Extended coverage of general deep learning techniques for big data can be found in several reviews [100,101,103].
In general, though, deep learning models can be roughly grouped into three main categories: unsupervised pre-trained networks, convolutional neural networks, and recurrent neural networks. Although their application to surface EMG is relatively new, these three categories of models have already been used to analyze the EMG signal. Table 5 details each of the previous EMG research works utilizing deep learning methods.

Unsupervised Pre-Trained Networks (UPNs)
UPNs can be further divided into stacked auto-encoders and deep belief networks. An auto-encoder is an unsupervised neural network trained to copy its input to its output, using a hidden layer as a code that represents the input. A deep auto-encoder (a.k.a. stacked auto-encoder, SAE) is then constructed by stacking several auto-encoders to learn hierarchical features of the given input. In contrast, a deep belief network (DBN) is composed of a stack of restricted Boltzmann machines (RBMs), generative stochastic neural network models that learn a joint probability distribution of unlabeled training data. Both techniques employ two training stages, pre-training and fine-tuning, which can help to avoid local optima and alleviate model overfitting [104].
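To make the mechanics concrete, the following minimal sketch trains a single auto-encoder (not a full SAE or DBN) with plain gradient descent on synthetic data; stacking would feed the learned codes H into a second auto-encoder, followed by supervised fine-tuning. All sizes and learning parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy data: 8-dimensional inputs lying near a 2-D subspace.
basis = rng.standard_normal((2, 8))
X = rng.standard_normal((256, 2)) @ basis

# One auto-encoder: 8 -> 3 (code) -> 8, trained to reconstruct its input.
W1 = 0.1 * rng.standard_normal((8, 3)); b1 = np.zeros(3)
W2 = 0.1 * rng.standard_normal((3, 8)); b2 = np.zeros(8)
lr = 0.01
for _ in range(500):
    H = np.tanh(X @ W1 + b1)             # encoder: code layer
    Y = H @ W2 + b2                      # decoder: linear reconstruction
    err = Y - X                          # reconstruction error
    # Backpropagation of the mean squared reconstruction loss.
    gW2 = H.T @ err / len(X); gb2 = err.mean(axis=0)
    dH = (err @ W2.T) * (1 - H ** 2)     # tanh derivative
    gW1 = X.T @ dH / len(X); gb1 = dH.mean(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

mse = np.mean((np.tanh(X @ W1 + b1) @ W2 + b2 - X) ** 2)
```

After training, the reconstruction error is below that of the trivial all-zero predictor, showing that the 3-unit code captures structure in the 8-dimensional input.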
For EMG pattern recognition, the DBN has been used to replace conventional machine learning approaches in discriminating a five-wrist-motion problem using hand-crafted time domain features [24]. The results showed that the DBN yields better classification accuracy than linear discriminant analysis (LDA), SVM, and the multilayer perceptron (MLP), but that it requires lengthy iterations to attain good performance in recognizing EMG patterns without overfitting. Subsequently, the same group of researchers used split-and-merge algorithms to reduce the overfitting problem and improve accuracy, calling the new approach a split-and-merge deep belief network (SM-DBN) [105]. Wand and colleagues [106,107] also compared deep neural networks to commonly used machine learning approaches for EMG-based speech recognition, such as the Gaussian mixture model (GMM), yielding accuracy improvements in almost all classification cases. The DBN also provides good performance in recognizing human emotional states (valence, arousal, and dominance), even when using the instantaneous value of surface EMG paired with several other physiological signals from the DEAP dataset [108].
UPNs can also be used instead of traditional unsupervised feature projection methods, such as PCA and independent component analysis (ICA). As a data compression approach, for example, SAE has been used to compress EMG and EEG data, and the results show that it significantly reduces signal distortion for high compression levels compared to traditional EMG data compression techniques, such as discrete wavelet transform (DWT), compressive sensing (CS), and ICA [109]. ICA, however, still performed better than SAE for low compression levels. As a regression approach, DBN has been shown to outperform PCA in the estimation of human lower limb flexion and extension joint angles during walking [110].

Convolutional Neural Network (CNN)
The CNN (or ConvNet) may be the most widely used deep learning model for feature learning and is by far the most popular deep learning method for EMG pattern recognition (Table 5). A CNN is quite similar to an ordinary neural network but makes the explicit assumption that the inputs are image-like, thus constraining the model in a tangible way (i.e., neurons are arranged in three dimensions). The hidden layers of a CNN typically consist of convolutional layers, pooling (sub-sampling) layers, and fully connected layers, where the first two types are used for feature learning on large-scale images (i.e., the convolution operation acts as feature extraction and the pooling operation acts as dimensionality reduction).
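The two feature learning operations can be sketched in a few lines; here a random single-channel "sEMG image" (e.g., electrodes x time samples) is passed through one convolution + ReLU + max-pooling stage. The kernel is random rather than learned, purely to show the shapes involved.

```python
import numpy as np

def conv2d_valid(img, kernel):
    """'Valid' 2-D convolution (strictly, cross-correlation, as in most
    deep learning frameworks) of a single-channel image with one kernel."""
    kh, kw = kernel.shape
    oh, ow = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(fmap, size=2):
    """Non-overlapping max pooling: keep the strongest response per patch."""
    h, w = fmap.shape[0] // size, fmap.shape[1] // size
    return fmap[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

# A 16x8 "sEMG image" through one conv + ReLU + pool stage.
rng = np.random.default_rng(5)
img = rng.standard_normal((16, 8))
fmap = np.maximum(conv2d_valid(img, rng.standard_normal((3, 3))), 0)  # ReLU
pooled = max_pool(fmap)
print(pooled.shape)  # (7, 3): 16x8 -> conv (14x6) -> pool (7x3)
```

A real CNN stacks many such stages with learned kernels and ends in fully connected layers producing class scores.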
CNN has been successful in EMG pattern recognition, with better classification accuracies having been found using CNN than with commonly used classification methods including LDA, SVM, k-nearest neighbors (KNN), MLP, and random forest (RF) (Table 5). Specifically, Geng et al. [14] evaluated the performance of CNN in recognizing hand and finger motions based on sEMG from three public databases containing data recorded from either a single row of electrodes or a 2D high-density electrode array. Without using windowed features, the classification accuracy of an eight-motion within-subject problem reached 89.3% on a single frame (1 ms) of an sEMG image and rose to 99.0% and 99.5% using simple majority voting over 40 and 150 frames (40 and 150 ms), respectively. Subsequently, Du et al. [15] employed a similar approach with adaptation to achieve better performance in inter-session and inter-subject scenarios. It should be noted that although CNNs can be quite responsive when used with instantaneous sEMG 'images' obtained from HD-sEMG, they still incur a high computational load in handling both the high-density inputs and the large-scale deep neural networks.
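The majority voting used in these frame-based studies is straightforward; the sketch below aggregates noisy per-frame class predictions into a single decision per window (the frame accuracy and class count are made up for illustration, not values from [14]).

```python
import numpy as np

def majority_vote(frame_preds):
    """Return the most frequent class label over a window of frame-wise
    predictions (ties broken by the smallest label, as np.bincount does)."""
    return int(np.bincount(np.asarray(frame_preds)).argmax())

# Noisy per-frame predictions for a contraction whose true class is 3:
# each frame is correct with probability 0.85, otherwise a random class.
rng = np.random.default_rng(6)
true_class, n_frames = 3, 40
preds = np.where(rng.random(n_frames) < 0.85, true_class,
                 rng.integers(0, 8, n_frames))
print(majority_vote(preds))  # 3 -- voting recovers the true class
```

This is why window-level accuracy in such studies is far higher than single-frame accuracy: independent frame errors are washed out by the vote.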
Although deep neural networks can be used directly with raw data, both data pre-processing and feature engineering can further improve the performance of deep learning. For instance, use of the right color space is important for image recognition using deep learning. One of the most widely used pre-extracted features for deep learning is the spectrogram. Côté-Allard et al. [111,112], for example, used CNNs with spectrograms as the input. Their results showed that CNN is not only accurate enough to recognize complex motions, but is also robust to many confounding factors, such as short term muscle fatigue, small displacement of electrodes, inter-subject variability, and long term use, without the need for recalibration. They also proposed a transfer learning algorithm to reduce the computational load and improve performance of the CNN, and used continuous wavelet transform (CWT) as pre-extracted features [11]. Zhai et al. [113] proposed a self-recalibrating classifier that can be automatically updated to maintain stable performance over time without the need for subject retraining based on CNN using PCA-reduced spectrogram inputs. The results of this study [113] support those of Côté-Allard et al. [11,111,112], and show that CNN models could be useful in compensating continuous drift in surface EMG signals.
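A spectrogram input of the kind used in these studies can be computed with a short-time FFT; the following minimal sketch uses a Hann window, and the window length, hop size, and sampling rate are illustrative rather than the parameters of the cited works.

```python
import numpy as np

def spectrogram(x, win_len=64, hop=32):
    """Magnitude spectrogram via a Hann-windowed short-time FFT;
    rows are frequency bins, columns are time frames."""
    win = np.hanning(win_len)
    frames = [x[i:i + win_len] * win
              for i in range(0, len(x) - win_len + 1, hop)]
    return np.abs(np.fft.rfft(np.array(frames), axis=1)).T

# 1 s of synthetic EMG at 1 kHz -> a 33 (freq bins) x 30 (frames) image.
rng = np.random.default_rng(7)
emg = rng.standard_normal(1000)
S = spectrogram(emg)
print(S.shape)  # (33, 30)
```

The resulting time-frequency image (one per channel) is exactly the kind of 2-D input a CNN expects, which is what makes the spectrogram a natural bridge between feature engineering and feature learning.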
When short time windows of the raw EMG signal (150-200 ms) have been used as inputs to CNNs with very simple architectures, however, the reported accuracies have been below those of classical classification methods (i.e., RF for Ninapro 1 and 2, and SVM for Ninapro 3) [114]. This suggests that deep learning algorithms are strongly influenced by several factors (including network models and architectures, and optimization parameters), and thus, even after a good model and architecture is found, there is still a need to search for potentially better hyper-parameters to improve the performance of the algorithm.

Recurrent Neural Network (RNN)
In contrast to other deep learning models, RNNs take time series information into account: rather than purely feed-forward connections, RNNs have connections that feed back into prior layers. This feedback path allows RNNs to store information from previous inputs and to model problems in time. Long short-term memory (LSTM) units and gated recurrent units (GRUs) are the two prevailing RNN architectures. For EMG pattern recognition, a combination of RNN and CNN (RNN + CNN) has been proposed and showed better performance than support vector regression (SVR) or a CNN alone for estimating human upper limb joint angles [23]. Furthermore, Laezza [115] evaluated the performance of three different network models, RNN, CNN, and RNN + CNN, for myoelectric control. The results showed that the RNN alone provided the best classification performance (91.81%), compared with the CNN (89.01%) and RNN + CNN (90.4%). This may be because RNNs and LSTMs have advantages when processing sequential data such as EMG time series.
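The feedback path can be seen in a minimal Elman-style recurrent layer (forward pass only, with random untrained weights and illustrative sizes): the hidden state at each step depends on all earlier inputs, which is what distinguishes RNNs from the feed-forward models above.

```python
import numpy as np

def rnn_forward(x_seq, Wx, Wh, b):
    """Forward pass of a simple (Elman) recurrent layer: the hidden state
    h_t depends on the current input and, via Wh, on the previous state,
    letting the network accumulate evidence over an EMG window."""
    h = np.zeros(Wh.shape[0])
    states = []
    for x_t in x_seq:
        h = np.tanh(Wx @ x_t + Wh @ h + b)
        states.append(h)
    return np.array(states)

rng = np.random.default_rng(8)
T, n_in, n_hid = 50, 8, 16          # 50 time steps of 8-channel EMG
x_seq = rng.standard_normal((T, n_in))
Wx = 0.5 * rng.standard_normal((n_hid, n_in))
Wh = 0.5 * rng.standard_normal((n_hid, n_hid))
states = rnn_forward(x_seq, Wx, Wh, np.zeros(n_hid))
print(states.shape)  # (50, 16): one hidden state per time step
```

LSTM and GRU cells replace the single tanh update with gated updates so that information can persist over much longer windows without vanishing.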

Discussion
From these more recent works, it is clear that EMG pattern recognition systems based on deep learning can achieve better classification accuracies than their counterparts, e.g., LDA, SVM, KNN, MLP, RF, and GMM. One of the key requirements for deep learning, however, is the availability of large volumes of data. Based on the current size of available EMG data sets, more data recording is necessary. When there is an insufficient amount of training data, models tend to overfit and end up with poor generalization ability. In the case of EMG pattern recognition, moreover, many training iterations may be needed to learn relevant features from raw EMG signals, further increasing the risk of overfitting.
To avoid this overfitting problem, larger EMG training sets are important. In the absence of more training data, techniques such as dropout, batch normalization, and early stopping (e.g., References [102,113]) may be employed.
Another simple strategy for ensuring that deep learning models generalize well is to split the dataset into three sets: training, validation, and test sets. Most previous EMG studies using deep learning, however, have approached model selection and parameter optimization without statistical methods (i.e., a single-run trial instead of cross validation [116]). Caution should therefore be taken when comparing the classification performance of proposed deep learning algorithms with that of conventional shallow learning algorithms (e.g., LDA and SVM), which require smaller training datasets and whose results have more commonly been reported with cross validation.
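A minimal version of such a three-way split is shown below; the 70/15/15 proportions are a common but illustrative choice. The validation set drives model selection and hyper-parameter tuning, while the test set is touched only once, for the final reported accuracy.

```python
import numpy as np

def three_way_split(n, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle indices once and split them into disjoint train/validation/
    test index sets."""
    idx = np.random.default_rng(seed).permutation(n)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    return idx[n_test + n_val:], idx[n_test:n_test + n_val], idx[:n_test]

train, val, test = three_way_split(1000)
print(len(train), len(val), len(test))  # 700 150 150
```

For EMG, the split should additionally respect session or subject boundaries (e.g., held-out sessions in the test set), since windows from the same contraction are highly correlated and a random window-level split inflates accuracy estimates.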
In addition to larger EMG datasets and techniques for addressing overfitting, several studies have combined feature engineering and feature learning by inputting pre-processed EMG data and/or pre-extracted EMG features to deep learning algorithms, and some benefits have been shown. No comparison has yet been made, however, between different types of pre-extracted EMG features, including window-based time domain features, time-frequency representation features, and EMG images. As this area remains relatively new, future research should consider how best to integrate feature engineering and feature learning for maximum benefit.
A key challenge and impediment to the clinical deployment of deep learning methods is their high computational cost (i.e., long training times and high computational complexity). Because of the stringent power and size restrictions of prosthetic components, most devices are built using embedded systems. While inference can be carried out on these systems, training of deep models must likely still be completed in an offline setting. A combined effort from the research community at large is therefore still needed to develop faster algorithms and hardware with even greater processing power to deliver clinically viable deep learning based myoelectric control.
Another key challenge for the clinical use of deep learning methods in myoelectric prosthetic control is the use of unsupervised domain adaptation or transfer learning methods [117] to reduce the effect of confounding factors that alter the characteristics of surface EMG signals. Inter-subject and inter-session variability are the two main factors studied so far (Table 5). These techniques have been used to significantly reduce the amount of training data required for a new subject and to alleviate the need for periodic re-calibration. Nevertheless, other real-life conditions must still be addressed, including but not limited to, electrode location shift, muscle fatigue, variations in muscle contraction intensity, and variations in limb position and forearm orientation. Furthermore, no study has yet demonstrated real-time prosthesis control by amputees using deep learning approaches. Additional efforts are needed on the development and optimization of transfer learning and domain adaptation to leverage suitable information from able-bodied subjects for amputee subjects, in part because amputees exhibit greater variability in musculature than intact-limbed subjects and because collecting a large pre-training dataset from amputees would be impractical.

Conclusions
In recent years, big data and deep learning have become extremely active research topics in many research fields, including EMG pattern recognition. Major advances have been made in the availability of shared surface EMG data, such that there are now at least 33 data sets with surface EMG collected from 662 subject sessions available online. This abundance of EMG data has enabled the resurgence of neural network approaches and the use of deep learning. Even more EMG data is expected to be made available in the near future due to technological advances (e.g., wireless wearable devices, HD-sEMG sensors, and data sharing), and thus big data methods should continue to be investigated and developed. All of the methods discussed in this paper show promise, provide inspiration for future studies, and demonstrate the potential of developing more advanced applications of EMG pattern recognition in the era of big data and deep learning.

Conflicts of Interest:
The authors declare no conflict of interest.