
Deep-Learning-Based Models for Pain Recognition: A Systematic Review

College of Computer and Information Sciences, King Saud University, Riyadh 11451, Saudi Arabia
Author to whom correspondence should be addressed.
Appl. Sci. 2020, 10(17), 5984;
Received: 31 July 2020 / Revised: 25 August 2020 / Accepted: 27 August 2020 / Published: 29 August 2020
(This article belongs to the Special Issue Data Science for Healthcare Intelligence)


Traditional standards employed for pain assessment have many limitations. One such limitation is reliability linked to inter-observer variability. Therefore, there have been many approaches to automate the task of pain recognition. Recently, deep-learning methods have appeared to solve many challenges such as feature selection and cases with a small number of data sets. This study provides a systematic review of pain-recognition systems that are based on deep-learning models for the last two years. Furthermore, it presents the major deep-learning methods used in the review papers. Finally, it provides a discussion of the challenges and open issues.
Keywords: pain assessment; pain recognition; deep learning; neural network; review; dataset

1. Introduction

Deep learning is an important branch of machine learning used to solve many applications. It is a powerful way of achieving supervised learning. The history of deep learning has undergone different naming conventions and developments [1]. It was first referred to as cybernetics in the 1940s–1960s. Then, in the 1980s–1990s, it became known as connectionism. Finally, the phrase deep learning began to be utilized in 2006. There are also some names that are based on human knowledge. For example, an artificial neural network (ANN) refers to the biologic aspect of deep learning. However, some problems cannot be solved by neural inspiration, and many deep-learning perspectives therefore depend on recently developed statistics and applied mathematical principles.
Automated pain recognition is a complex problem that calls for a powerful method such as deep learning. Recently, some studies of pain recognition have employed machine learning for this purpose. This study therefore aims to review deep-learning applications for pain recognition only. It provides a systematic review that incorporates a search strategy and inclusion criteria, which we describe in the next section. To the best of our knowledge, there is only one review paper on pain-recognition models in general [2]. This study focuses on deep models for pain recognition. This review will help researchers in the artificial intelligence and affective computing communities to learn which deep ANN algorithms and datasets have been used to automate the task of pain recognition. They can then identify the open problems and address them with new, effective models that outperform the available algorithms.
The main contributions of this study are summarized as follows:
  • Review of the pain-recognition studies that are based on deep learning;
  • Presentation and discussion of the main deep-learning methods employed in the reviewed papers;
  • Review of the available data sets for pain recognition;
  • Discussion of some challenges and future works.

2. Methodology

2.1. Search Strategy

First, we identified important search terms for pain-recognition systems. These included ‘pain recognition’, ‘pain assessment’ and ‘pain diagnosis’. Second, the period spanned publications in 2017 to 2019 only. These two constraints with respect to terminology and time were used to perform searching in the popular databases IEEE, ACM and Web of Science (WOS).

2.2. Inclusion and Exclusion Criteria

The inclusion criteria required that a study belong to a related field of research and use deep-learning methods only. After the first search iteration, the results listed many papers that required filtering.
For example, the WOS search engine initially returned 634 studies. We filtered these by selecting only the ones related to computer science fields. As a result, the second iteration had 58 papers. We then selected only the studies that are based on deep-learning models. The final number of suitable studies from the WOS search engine was 13.

2.3. Categorization Method

The resulting papers were scanned rapidly to determine a suitable categorization method. We first classified each paper as single-model or multi-model. The single-model papers were then divided into four categories, which are described in Figure 1 below.
  • Pain recognition and deep-learning models
    Single model
    Physiological signals;
    Speech analysis;
    Facial expressions.

3. Review Papers

3.1. Single-Model-Based Pain Recognition

Single-model-based pain-recognition systems can be defined as systems that use a single kind of measure to classify the pain level. These measures are physiological, speech, body movements and facial expressions. Based on our literature review, we grouped the measures into three categories that are based on deep-learning methods. Next, we present details of the previous studies for each single model.

3.1.1. Physiological Signals

Physiological signals are among the most important measures used to describe the pain level from the physiological response of the body. These signals include vital signs (such as blood pressure, respiration rate, heart rate and muscle activity) or brain dynamics.
Noting the lack of work using physiological signals for pain detection, Lopez-Martinez and Picard [3] proposed in 2017 a model using neural network (NN) techniques. Their approach implements a multitask learning method to tackle the problem of personal differences through shared layers in NNs. Their system involves multimodal data that depend on physiological signals from skin conductance (SC) and electrocardiograms (ECGs) only. They used the available BioVid dataset to build a model and conduct their experiments. Then, they compared their model with other machine-learning methods: logistic regression (LR) and support-vector machines with both a linear kernel (SVM–L) and a radial basis function kernel (SVM–RBF). They reported that this NN approach outperforms the others, reaching around 82.75%. However, they reported no experimental testing of the model on new data sets. In the future, this method could be adapted to real clinical settings because it relies on only two features, which can be acquired from wrist-wearable devices. Recent studies have been conducted to validate the accuracy of such wearable devices [4].

3.1.2. Speech Analysis

Another study [5] recognized the pain level based on speech analysis using long short-term memory (LSTM) NNs. First, the authors employed an unsupervised learning NN to extract vocal features using a Chinese corpus. Then, they fine-tuned it on an emergency triage database to output a specific sentence-level acoustic representation for each patient. Finally, they classified the pain level into two or three classes using an SVM. Their method achieved weighted average recall (WAR) values of 72.3% in the binary-class and 54.2% in the three-class pain-intensity classification tasks. To the best of our knowledge, this work is the first study that used speech alone to detect the pain level.

3.1.3. Facial Expressions

Facial expressions are signals that have received attention from researchers for many applications, such as face recognition in the field of biometrics.
Deep learning is used directly to estimate pain from facial expressions. One distinct approach estimates the self-reported visual analog scale (VAS) pain levels in order to account for individual differences [6]. The method includes two learning stages. In the first stage, recurrent NNs (RNNs) are trained to estimate the Prkachin and Solomon pain intensity (PSPI) levels from face images. In the second stage, personalized hidden conditional random fields (HCRFs) use the RNN output to estimate the VAS for each person. Compared with non-personalized approaches, this approach achieved high performance, and its score for a single-sequence test was the best.
Deep learning has been mainly used to extract the important features, as was recently done by [7] for pain detection based on facial expressions. Their approach is based on three steps: First, convolutional neural networks (CNNs) are used to extract the features from VGG_Faces. After that, the result of the feature map is used to train the LSTM. This is a type of RNN used to find the binary pain estimation (pain, no pain). They provided a summary of previous works done on the popular data set of pain detection using faces. This data set is called a UNBC–McMaster database and has 200 video sequences from 25 patients who suffered from shoulder pain. Their experiments on this data set showed that their approach outperforms all previous works with an area under the curve (AUC) performance of 93.3%. Their model can also be generalized for application to other facial emotion recognition. This ability was realized when applying their model to the Cohn Kanade + facial expression database and resulted in a competitive score (AUC = 97.2%).
In the same manner, in 2017, Egede, Valstar and Martinez [8] proposed a pain-estimation model that combined learned features obtained from deep learning with handcrafted features. Their idea stems from the difficulty of obtaining a pain-estimation data set that is large enough to work well with deep learning. Therefore, they extracted handcrafted features directly from the face image and used a CNN to learn further features. Their features include appearance, shape and dynamics information. Finally, they estimated the pain level using a linear regression model on both the combined and the individual features. Their results outperformed the state-of-the-art methods, with a root mean square error (RMSE) of 0.99 and a Pearson correlation (CORR) of 0.67. A limitation of this approach, and of all face-based approaches, is that they consider only the front of the face without capturing combinations of other indicators, such as audio, body movements and physiological signals.
Another solution that aims to deal with small data sets in deep learning was proposed in 2017 by Wang et al. [9]. They fine-tuned a face verification network, trained on the WebFace dataset of 500,000 face images, using a small pain data set. Then, they formulated the task as a regression problem by applying a regression loss regularized with the center loss. Their performance was evaluated using newly proposed metrics to account for imbalanced data. Based on the results, this method achieved a high performance compared with the state-of-the-art methods using both weighted metrics (mean absolute error (MAE): 0.389, mean squared error (MSE): 0.804, Pearson’s correlation coefficient (PCC): 0.651) and the newly proposed metrics (weighted MAE: 0.991, weighted MSE: 1.720). However, pain is temporal and subjective, and this method uses neither temporal information nor knowledge of the stimulus, which requires further investigation.
In contrast to previous solutions, the cumulative attribute (CA) method was used in 2018 [10] to overcome the imbalance of data in pain-estimation datasets. The cumulative attribute is defined as ‘an intermediate representation Ci obtained by transforming the original labels yi into a vector’. In this study, a deep CNN is used with the cumulative attributes in two steps. In the first step, a trained CNN outputs the cumulative attribute vector. In the second step, a regression model is trained to produce the final real-valued output. The approach was tested on both pain estimation and age estimation. For pain estimation, the CA–CNN experiments obtained better results than the non-CA–CNN experiments. In addition, a CA layer trained with a log-loss function significantly outperforms a CA layer trained with the Euclidean loss. The approach has the advantage of using the CNN framework without any additional annotations. However, building annotated datasets for pain estimation remains important to overcome most of the problems in classification tasks.
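To make the idea of a cumulative attribute vector concrete, the following minimal sketch encodes an ordinal pain label as a thresholded binary vector. This thresholded form is a common formulation of cumulative attributes and is an assumption here, not a detail confirmed by the reviewed study. The function names and the counting-based decoder are hypothetical, and the decoder is only a simple stand-in for the regression model used in the second step.

```python
from typing import List


def cumulative_attribute(label: int, num_levels: int) -> List[int]:
    """Encode an ordinal label as a cumulative attribute vector.

    Assumed formulation: element j is 1 when j < label and 0 otherwise,
    so a higher pain level switches on more elements of the vector.
    """
    if not 0 <= label <= num_levels:
        raise ValueError("label must lie in [0, num_levels]")
    return [1 if j < label else 0 for j in range(num_levels)]


def decode(ca_vector: List[float], threshold: float = 0.5) -> int:
    """Recover a label by counting the elements above the threshold.

    Hypothetical stand-in for the second-step regression model.
    """
    return sum(1 for value in ca_vector if value > threshold)
```

Because neighboring labels share most of their vector elements, rare intensity levels can still benefit from the samples of adjacent levels, which is one way to read the claim that the method helps with imbalanced data.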
Therefore, Haque et al. [11] built a new database with RGB, depth and thermal (RGBDT) images of the face for pain-level recognition in sequences. Their elicitation approach differs from that of previous datasets: they stimulated healthy people with electrical pulses. Twenty subjects participated in the data collection, and pain was recorded on five levels (0 for no pain and 4 for severe pain). After collecting the data, they constructed a baseline model for pain recognition based on spatio-temporal features and deep learning. First, they preprocessed the video frames by cropping only the face region from the RGB images, using their previously proposed method [12]. Then, they used a homography matrix to crop the corresponding depth and thermal images. After this, they applied deep learning either to the individual modalities or to a fusion of them. The proposed method has two steps. First, a 2D-CNN is used for frame-level feature extraction and pain recognition. Second, an LSTM is used to capture the temporal relation between frames and perform sequence-level pain recognition. In the results, the fusion approach exhibited the greatest performance compared with the individual modalities. In addition, early fusion, which integrates the input from all modalities before feeding it to the classifier, is better than late fusion, which integrates the output of each model as the input to a second classifier.
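The difference between early and late fusion can be sketched in a few lines. This is only an illustration of the two data flows described above, not the architecture of [11]: the function names are hypothetical, and the models are passed in as plain callables that map a feature vector to a score.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Model = Callable[[Vector], float]


def early_fusion(modalities: Sequence[Vector], classifier: Model) -> float:
    """Concatenate the inputs of all modalities, then classify once."""
    fused = [value for features in modalities for value in features]
    return classifier(fused)


def late_fusion(modalities: Sequence[Vector],
                per_modality_models: Sequence[Model],
                second_classifier: Model) -> float:
    """Run one model per modality, then classify their outputs."""
    if len(modalities) != len(per_modality_models):
        raise ValueError("need exactly one model per modality")
    outputs = [model(features)
               for model, features in zip(per_modality_models, modalities)]
    return second_classifier(outputs)
```

In early fusion the single classifier can learn interactions between raw RGB, depth and thermal features, whereas in late fusion the second classifier sees only one output per modality.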

3.1.4. Other Indicators

In 2017, there was an attempt [13] to use deep-learning methods for low back pain (LBP) recognition. The methods were based on segmentation and classification of LBP X-ray data. The authors obtained tomography images of the lumbar spine in the MetaImage (MHD) format and assigned them five vertebral levels. Then, using a deep-learning framework, they extracted the features and classified the LBP based on five severity levels (normal, mild, crush, wedge, severe and biconcavity). They reached an accuracy of around 65%, which needs further improvement. However, X-ray data alone are not enough to recognize LBP.
Recently, Hu et al. [14] proposed a deep-learning method to recognize low back pain during static standing. Their system depends on kinematic data acquired using three motion sensors attached at different places on the skin. After preprocessing, they used the data as input to an LSTM network. Using 22 healthy people and 22 LBP patients, they obtained 1073 time series for training and 107 time series for testing. They reported a high accuracy of 97.2%. The disadvantage of this study is that it uses only kinematic data and ignores EMG data, which are important for differentiating LBP patients.
A later study to recognize protective behavior was conducted by Wang et al. in 2019 [15]. They focused only on the body movement data from an available data set (the emo-pain data set). In addition, they applied an attention mechanism to the LSTM architecture to focus on the information relevant to recognizing protective behavior. They used a sliding window with zero padding to segment the data. Furthermore, they augmented the data by combining the two methods of random discarding and jittering. Compared with previous models, this model obtained a high performance, with a mean F1 score of 0.844. They also found that bodily attention is more important than temporal attention, while the combination of the two provides the highest performance.

3.2. Multi-Model-Based Pain Recognition

A recent attempt [16] to recognize pain based on a multi-model approach combines face and physiological signals (ECG and SC). The main goal is the personalization of pain estimation, which is achieved by clustering subjects into different profiles rather than modeling each individual, as in previous works. The authors then used the multitask NN (MT–NN) approach, where each task corresponds to one profile. For their experiment, they used the available BioVid Heat Pain database. The results exhibited a better performance for a higher number of clusters (C = 4), which shows the need for further investigation using a larger number of clusters.
In addition, the authors of [17] performed fusion between physiological signals (EMG, ECG and SC) and face videos. Their approach adapts the system to unknown individuals based on unlabeled data. To this end, they used multi-stage ensemble-based classification and based their experiments on the BioVid Heat Pain database. NNs were used at the confidence-estimation stage and were trained using three different inputs, namely the prediction of one-vs-one classifiers, the continuous pain-level estimation of the regressor and the variance of a bagged ensemble of random forests. The NN then makes it possible to determine the confidence level of samples by employing a random regressor as an input. After the evaluation of different combinations of inputs to the NN, the highest values of the correlation coefficient and RMSE were 0.183 and 0.347, respectively. They found that the adaptation process is not an easy task because of individual differences in the response to pain stimuli, and that it therefore requires more investigation.
More recently, Thiam et al. [18] explored several CNN architectures based on the available BioVid Heat Pain database (Part A). They used three signal modalities: EDA, ECG and EMG. They tried two kinds of network input, 1D and 2D. In addition, different deep fusion architectures were presented and tested. Their results reached 84.57% and 84.40% for binary classification. Table 1 presents a summary of all pain-recognition studies based on deep learning. Furthermore, Figure 2 below shows the flow chart of the main phases required in deep-learning-based pain-recognition methods.
In contrast, C. Wang et al. [19] in 2019 used deep learning to recognize a specific behavior associated with chronic pain. Their main objective was to detect the protective behavior of LBP patients while performing five exercises. They used sEMG and body movement data from an available data set (the emo-pain data set). The study proposed two recurrent neural networks, called stacked LSTM and dual-LSTM. For the five activities, they computed the angles and energies from the motion capture (MoCap) data. Regarding muscle activity, they used rectified sEMG data to smooth the raw data and reduce its noise. To update the weights, the Adam optimizer was used with a fixed learning rate of 0.001. A sliding-window technique was used for data segmentation. To select the best window length, they ran separate experiments for each activity using a fixed overlapping ratio of 75%. As a result, they found that a window length of 3 s gave the best detection for most activities. Jittering and random discarding were the augmentation methods used, and they provided a better performance than the original dataset alone. The authors also found that combining the two augmentation methods provided the best performance. Their LSTM networks achieved a best mean F1 score of 0.815, which was better than that of the conventional neural network. However, generalization leads to a decrease in performance, which still needs to be addressed. The main limitation of this study is that it gives equal importance to all body parts, activity types and time steps when detecting protective behavior.
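The segmentation and augmentation steps described above can be sketched as follows. This is an illustrative sketch and not the implementation of [19]: the function names are hypothetical, the zero padding of the last window is borrowed from the description of [15], and the exact definitions of jittering (here, additive Gaussian noise) and random discarding (here, zeroing samples at random) are assumptions.

```python
import random
from typing import List, Optional


def sliding_windows(signal: List[float], window_len: int,
                    overlap: float = 0.75,
                    pad_value: float = 0.0) -> List[List[float]]:
    """Split a 1D signal into fixed-length windows with fractional overlap.

    The last window is padded with pad_value when the signal runs out.
    """
    step = max(1, int(round(window_len * (1.0 - overlap))))
    windows = []
    for start in range(0, len(signal), step):
        window = signal[start:start + window_len]
        if len(window) < window_len:
            window = window + [pad_value] * (window_len - len(window))
        windows.append(window)
        if start + window_len >= len(signal):
            break  # the signal is fully covered
    return windows


def jitter(window: List[float], sigma: float = 0.05,
           rng: Optional[random.Random] = None) -> List[float]:
    """Assumed jittering: add zero-mean Gaussian noise to every sample."""
    rng = rng or random.Random()
    return [x + rng.gauss(0.0, sigma) for x in window]


def random_discard(window: List[float], drop_prob: float = 0.1,
                   rng: Optional[random.Random] = None) -> List[float]:
    """Assumed random discarding: zero each sample with probability drop_prob."""
    rng = rng or random.Random()
    return [0.0 if rng.random() < drop_prob else x for x in window]
```

For a 3 s window, window_len would be three times the sampling rate of the recording; for example, a hypothetical 60 Hz signal gives 180 samples per window and, with 75% overlap, a step of 45 samples.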

4. Primary Deep-Learning Methods Employed for Pain Recognition

In the previous section, we presented the literature review of deep-learning methods employed for pain recognition over the last two years. In this section, we present and discuss the main deep-learning models used in pain-recognition studies in addition to the metrics used to evaluate such models.

4.1. Convolutional Neural Networks (CNNs)

CNNs were derived from the first such NN model, which was invented in 1998 by Yann LeCun et al. [20]. In general, convolutional networks operate by applying a filter to the input. In addition to the filter size (f), the important parameters required to build a deep CNN are the stride (s) and padding (p). CNNs have several types of layers, namely convolutional, pooling and fully connected layers [21]. We can illustrate the CNN architecture as in Figure 3.
The CNN has an advantage in generalization compared with the multilayer perceptron (MLP) [22]. In addition, it has fewer parameters than the fully connected layers in an MLP.
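The roles of the filter size (f), stride (s) and padding (p), and the parameter savings of a convolutional layer, can be checked with the standard formulas in the short sketch below. The function names and the example layer sizes are hypothetical and serve only as an illustration.

```python
def conv_output_size(n: int, f: int, s: int = 1, p: int = 0) -> int:
    """Output width of a convolution over an input of width n.

    Standard formula: floor((n + 2p - f) / s) + 1.
    """
    if s < 1 or n + 2 * p < f:
        raise ValueError("invalid stride, or filter larger than padded input")
    return (n + 2 * p - f) // s + 1


def conv_params(f: int, channels_in: int, channels_out: int) -> int:
    """Weights plus biases of a layer of f x f filters."""
    return f * f * channels_in * channels_out + channels_out


def dense_params(units_in: int, units_out: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return units_in * units_out + units_out
```

For a hypothetical 32 × 32 RGB input, a layer of sixteen 3 × 3 filters has conv_params(3, 3, 16) = 448 parameters, whereas a fully connected layer mapping the flattened input to 16 units has dense_params(32 * 32 * 3, 16) = 49,168.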
As we can see from the literature, CNNs are mostly used for feature extraction. Only two studies have employed them for classification tasks. However, they were used only for pain recognition based on facial expressions, without considering physiological signals and speech analysis. For the evaluation tasks, the commonly used metrics are AUC2, PCC, MAE, RMSE and MSE, whereas the accuracy measure is rarely used.
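For reference, the regression metrics named above follow their standard definitions, as in the minimal sketch below. The function names are our own, and the weighted variants proposed in [9] are not reproduced here.

```python
import math
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error."""
    return math.sqrt(mse(y_true, y_pred))


def pcc(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Pearson's correlation coefficient."""
    mean_t = sum(y_true) / len(y_true)
    mean_p = sum(y_pred) / len(y_pred)
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    if var_t == 0 or var_p == 0:
        raise ValueError("PCC is undefined for a constant sequence")
    return cov / math.sqrt(var_t * var_p)
```

Lower values are better for MAE, MSE and RMSE, whereas a PCC closer to 1 indicates that the predicted intensities follow the true ones more closely.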

4.2. Recurrent Neural Networks (RNNs)

Recurrent neural networks (RNNs) are another form of NN that is suitable for processing sequences of unequal length [20]. The use of unidirectional RNNs solves the problem of different lengths between the input and output. However, they make predictions using only earlier data. Therefore, the bidirectional model is used for prediction from both directions. In general, RNNs can be used in three different settings [20].
  • The first setting is standard, which learns from labeled data and predicts the output;
  • The second setting is called the sequence setting and is able to learn from data with multiple labels. It has sequences with combinations of different kinds of data and cannot break them. Therefore, it takes a full sequence to predict the next state and more;
  • The third setting is called the predict-next setting and can take unlabeled or implicitly labeled data, such as words in a sentence. In this example application, the RNN breaks the words down into subsequences and considers the next subsequence as a target.
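The predict-next setting can be illustrated by the way training pairs are built from an unlabeled sequence. In this minimal sketch, each subsequence serves as the input and the item that follows it serves as the target. The function name and the fixed context length are assumptions made for the illustration.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def next_step_pairs(sequence: Sequence[T],
                    context_len: int) -> List[Tuple[List[T], T]]:
    """Build (context, next item) pairs from an unlabeled sequence."""
    return [(list(sequence[i:i + context_len]), sequence[i + context_len])
            for i in range(len(sequence) - context_len)]
```

For the sentence "the pain level is high" and a context length of 2, the pairs are (the, pain) → level, (pain, level) → is and (level, is) → high, so the labels come from the data itself.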
The first successful RNN model was the Hopfield network. The most widely used and important RNNs are LSTM NNs, which were proposed by Hochreiter and Schmidhuber in 1997. Next, we explain this kind of NN in more detail.

4.3. Long-Short Term Memory Neural Networks (LSTM-NNs)

LSTM is one of the most important NN architectures and was proposed in 1997 by Hochreiter and Schmidhuber [20]. It is usually used for sequential tasks such as time-series analysis and natural language processing [20]. In addition to the main link from one unit to the next in an RNN, the LSTM has another link, i.e., the cell state, that keeps or removes data through different gates [20]. The architecture of LSTM is shown in Figure 4.
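How the gates keep or remove data in the cell state can be seen in a single time step of a standard LSTM cell, sketched below in plain Python. The parameter names (W_f, b_f and so on) and the list-based layout are assumptions made for readability. Practical implementations use a tensor library, and this sketch is not taken from any of the reviewed studies.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def affine(weights: List[Vector], vector: Vector, bias: Vector) -> Vector:
    """Compute weights @ vector + bias for a small dense layer."""
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


def lstm_step(x: Vector, h_prev: Vector, c_prev: Vector,
              params: Dict[str, list]) -> Tuple[Vector, Vector]:
    """One LSTM time step; returns the new hidden state and cell state."""
    z = h_prev + x  # concatenate the previous hidden state and the input
    forget = [sigmoid(a) for a in affine(params["W_f"], z, params["b_f"])]
    write = [sigmoid(a) for a in affine(params["W_i"], z, params["b_i"])]
    output = [sigmoid(a) for a in affine(params["W_o"], z, params["b_o"])]
    candidate = [math.tanh(a) for a in affine(params["W_g"], z, params["b_g"])]
    # The forget gate removes old content; the input gate writes new content.
    c = [f * c_old + i * g
         for f, c_old, i, g in zip(forget, c_prev, write, candidate)]
    # The output gate decides how much of the cell state is exposed.
    h = [o * math.tanh(c_k) for o, c_k in zip(output, c)]
    return h, c
```

Each weight matrix has one row per hidden unit and one column per element of the concatenated vector, so a cell with hidden size 1 and input size 2 uses rows of length 3.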
In pain-recognition studies, researchers have used this model for both feature extraction and classification tasks. The measures used for pain recognition were obtained only from facial expressions and speech analysis. LSTMs achieved a good performance when used for feature extraction [5] and also for classification [3]. However, they gave a better result when used for classification and combined with a CNN for feature extraction, as in [7]. This comparison depends on the metrics used in LSTM pain-recognition studies, namely AUC, AUC2, MAE, ICC and confusion matrices. We can compare the results obtained for CNN and LSTM in terms of MAE only, which was better (lower) for LSTM (0.18) than for CNN (0.389). In contrast, the weighted MAE had a better result for CNN (0.991).

4.4. Multitask Neural Network (MT-NN)

The multitask learning (MTL) approach functions by sharing representation between related tasks in order to have a better generalization model [20]. For NNs, it is based on having a single NN that is able to perform many tasks better than performing individual tasks. Two techniques are mostly used for MTL-NN. The first method is the hard parameter sharing, which is based on sharing the hidden layers for all tasks and then having different outputs. The second method is soft parameter sharing, which is based on modeling each task alone and trying to achieve regularization in order to have similar parameters.
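The two sharing schemes can be sketched as follows. In hard parameter sharing, one hidden layer feeds several task-specific output heads. In soft parameter sharing, a penalty keeps the parameters of separately modeled tasks close to each other. The function names, the single tanh hidden layer and the squared-distance penalty are assumptions chosen for brevity.

```python
import math
from typing import Callable, Dict, List, Optional

Vector = List[float]


def dense(weights: List[Vector], bias: Vector, inputs: Vector,
          activation: Optional[Callable[[float], float]] = None) -> Vector:
    """Fully connected layer: weights @ inputs + bias, then activation."""
    outputs = [sum(w * x for w, x in zip(row, inputs)) + b
               for row, b in zip(weights, bias)]
    return [activation(v) for v in outputs] if activation else outputs


def multitask_forward(x: Vector, shared: Dict[str, list],
                      heads: Dict[str, Dict[str, list]]) -> Dict[str, Vector]:
    """Hard sharing: one shared hidden layer, one output head per task."""
    hidden = dense(shared["W"], shared["b"], x, math.tanh)
    return {task: dense(head["W"], head["b"], hidden)
            for task, head in heads.items()}


def soft_sharing_penalty(params_a: Vector, params_b: Vector) -> float:
    """Soft sharing: squared distance between two tasks' parameters."""
    return sum((a - b) ** 2 for a, b in zip(params_a, params_b))
```

During training, the shared layer would receive gradient updates from every task, whereas each head is updated only by its own task. In the soft scheme, the penalty is added to the sum of the task losses.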
From the previous studies of pain recognition, we observe that this approach was rarely used. The reason lies in the difficulty of this approach, which must perform many tasks using a single network. Therefore, the most applicable and widely used approach is transfer learning, which helps to solve a small-data-set problem by reusing knowledge learned from big data.

5. Datasets for Pain Recognition

In this section, we present the available data set for pain recognition. After reviewing the literature, we found eleven [23,24,25,26,27,28,29,30,31,32] data sets created for pain-recognition systems. The data sets were different in terms of their selected features, number of participants, stimuli method, pain levels and the devices used for sensing the data. The main differences are summarized in Table 2 for comparison and discussion.
We can see that only three datasets were collected from patients with shoulder, back and neck pain. The number of participants for all datasets ranged from 20–129 persons. The main stimuli methods employed for healthy people were heat and cold. As a new method, electrical stimuli have been used recently [11] for healthy people. The facial expression data sets had more than two levels of pain. In contrast, the datasets that included biosignals and body movements had only 2–4 levels. This is because studies of pain recognition from body movements and biosignals are recent, and more studies and experiments are required.

6. Challenges and Future Directions

Deep-learning methods are widely used in many applications. In recent times, pain recognition has been associated with sensory data rather than only facial expressions. Deep learning has shown promising results in many e-health systems with sensory data [2]. This achievement has resulted in greater interest in this approach by the pain-recognition community based on sensing data.
Based on our review of the state of the art in the use of deep learning for pain recognition, we have found few recent studies. Most pain-recognition studies used CNNs and LSTMs owing to their ability to resolve many challenges related to deep-learning methods. However, many challenges and future directions remain to be addressed.
There is a need to better understand the problem of pain recognition. This will help to define the tools and additional methods such as the use of similarity measures and meta information [1].
Furthermore, to build a recognition model, it is important to extract the most suitable features. Pain is a subjective measure, so finding discriminative features from speech, facial or physiological signals is very difficult. Deep learning can therefore help with feature extraction in such intuitive problems that are hard for people to describe.
Moreover, the use of a single model depending on facial expression is limited and does not provide an accurate pain level. Therefore, multimodality is considered both a solution and another challenge. It requires a combination of different kinds of signals, such as physiological signals with behavioral expressions [5].
Moreover, it is important to find a new objective measure of pain based on sensory data and experiments, in contrast to the existing measures, which rely only on assessment instruments [5].
The available datasets ranged from 22–142 participants, which is not enough to generalize the model of deep learning. This is considered an open issue that needs to be addressed, and a dataset should be developed with hundreds of participants.
In addition, a challenge that faces deep-learning methods in pain recognition is strongly related to data acquisition. Data acquisition involves many issues related to (1) the devices and sensors used, (2) the processing of signals, (3) the dimensionality of data [21] and (4) the reliability of data [21]. With respect to the sensors, it is more convenient to deal with wearable and mobile devices, but such tools also pose many challenges, such as short battery life. More information about this kind of wearable device can be found in our previous study [33].
Finally, most of the works were performed in a laboratory setting. However, the main challenge involves investigating the models in real clinical settings.

7. Conclusions

Healthcare systems are complex systems that contain interactions between different entities: people, process and technology [34]. This study is considered as a starting point for researchers to develop a smart healthcare system. It can provide them with available tools and datasets to build such systems.
This study presents a systematic review of pain-recognition systems that are based on deep-learning methods only. Based on the papers reviewed, a new taxonomy of categorization was presented based on the kind of data used. These pain-recognition data were obtained from facial expressions, speech, or physiological signals. Furthermore, this study describes the primary deep-learning methods that were used in the reviewed papers. Finally, the main challenges and future directions were discussed.
Deep-learning algorithms have many advantages in healthcare systems. The biggest advantage is their ability to describe complex problems and non-objective measures such as pain. They can extract features automatically without requiring medical experts to fully understand the health problem. In addition, the difficulty of collecting large amounts of data from patients could be overcome by using suitable augmentation techniques. Therefore, the intelligent interpretation of problems by deep learning, together with automatic augmentation of medical test data, will accelerate the development of smart healthcare systems [34].

Author Contributions

R.E.A.-E. and H.A.-K. proposed the main ideas. R.E.A.-E. carried out the methodology and performed the systematic review and the discussion. R.E.A.-E. wrote the manuscript. A.A.-S. administered the project. All authors provided critical feedback and helped shape the research. All authors have read and agreed to the published version of the manuscript.


Funding

This research received no external funding.


Acknowledgments

This research project was supported by a grant from the Research Center of the Female Scientific and Medical Colleges, Deanship of Scientific Research, King Saud University.

Conflicts of Interest

The authors declare no conflict of interest.


  1. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016.
  2. Werner, P.; Lopez-Martinez, D.; Walter, S.; Al-Hamadi, A.; Gruss, S.; Picard, R. Automatic Recognition Methods Supporting Pain Assessment: A Survey. IEEE Trans. Affect. Comput. 2019, 1.
  3. Lopez-Martinez, D.; Picard, R. Multi-task neural networks for personalized pain recognition from physiological signals. In Proceedings of the 2017 Seventh International Conference on Affective Computing and Intelligent Interaction Workshops and Demos (ACIIW), San Antonio, TX, USA, 23–26 October 2017; pp. 181–184.
  4. Del Toro, S.F.; Wei, Y.; Olmeda, E.; Ren, L.; Guowu, W.; Díaz, V. Validation of a Low-Cost Electromyography (EMG) System via a Commercial and Accurate EMG Device: Pilot Study. Sensors 2019, 19, 5214.
  5. Tsai, F.-S.; Weng, Y.-M.; Ng, C.-J.; Lee, C.-C. Embedding stacked bottleneck vocal features in a LSTM architecture for automatic pain level classification during emergency triage. In Proceedings of the 2017 Seventh International Conference on Affective Computing and Intelligent Interaction (ACII), San Antonio, TX, USA, 23–26 October 2017; pp. 313–318.
  6. Martinez, D.L.; Rudovic, O.; Picard, R. Personalized Automatic Estimation of Self-Reported Pain Intensity from Facial Expressions. arXiv 2017, arXiv:1706.07154. Available online: (accessed on 26 July 2018).
  7. Rodriguez, P.; Cucurull, G.; Gonzalez, J.; Gonfaus, J.M.; Nasrollahi, K.; Moeslund, T.B.; Roca, F.X.; López, P.R. Deep Pain: Exploiting Long Short-Term Memory Networks for Facial Expression Classification. IEEE Trans. Cybern. 2017, 1–11.
  8. Egede, J.; Valstar, M.; Martinez, B. Fusing Deep Learned and Hand-Crafted Features of Appearance, Shape, and Dynamics for Automatic Pain Estimation. In Proceedings of the 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017), Washington, DC, USA, 30 May–3 June 2017; pp. 689–696.
  9. Wang, F.; Xiang, X.; Liu, C.; Tran, T.D.; Reiter, A.; Hager, G.D.; Quon, H.; Cheng, J.; Yuille, A.L. Regularizing face verification nets for pain intensity regression. In Proceedings of the 2017 IEEE International Conference on Image Processing (ICIP), Beijing, China, 17–20 September 2017; pp. 1087–1091.
  10. Jaiswal, S.; Egede, J.; Valstar, M. Deep Learned Cumulative Attribute Regression. In Proceedings of the 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018), Xi’an, China, 15–19 May 2018; pp. 715–722.
  11. Haque, M.A.; Bautista, R.B.; Noroozi, F.; Kulkarni, K.; Laursen, C.B.; Irani, R.; Bellantonio, M.; Escalera, S.; Anbarjafari, G.; Nasrollahi, K.; et al. Deep Multimodal Pain Recognition: A Database and Comparison of Spatio-Temporal Visual Modalities. In Proceedings of the 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018), Xi’an, China, 15–19 May 2018; pp. 250–257.
  12. Bellantonio, M.; Haque, M.A.; Rodriguez, P.; Nasrollahi, K.; Telve, T.; Escalera, S.; Gonzalez, J.; Moeslund, T.B.; Rasti, P.; Anbarjafari, G.; et al. Spatio-temporal Pain Recognition in CNN-Based Super-Resolved Facial Images. In Image Processing, Computer Vision, Pattern Recognition, and Graphics; Springer: Cham, Switzerland, 2016; Volume 10165, pp. 151–162.
  13. Kulkarni, K.R.; Gaonkar, A.; Vijayarajan, V.; Manikandan, K. Analysis of lower back pain disorder using deep learning. IOP Conf. Series Mater. Sci. Eng. 2017, 263, 42086.
  14. Hu, B.; Kim, C.; Ning, X.; Xu, X. Using a deep learning network to recognise low back pain in static standing. Ergonomics 2018, 61, 1374–1381.
  15. Wang, C.; Peng, M.; Olugbade, T.A.; Lane, N.D.; Williams, A.C.D.C.; Bianchi-Berthouze, N. Learning Bodily and Temporal Attention in Protective Movement Behavior Detection. arXiv 2019, arXiv:1904.10824. Available online: (accessed on 5 June 2020).
  16. Lopez-Martinez, D.; Rudovic, O.; Picard, R. Physiological and behavioral profiling for nociceptive pain estimation using personalized multitask learning. arXiv 2017, arXiv:1711.04036. Available online: (accessed on 26 July 2018).
  17. Kächele, M.; Amirian, M.; Thiam, P.; Werner, P.; Walter, S.; Palm, G.; Schwenker, F. Adaptive confidence learning for the personalization of pain intensity estimation systems. Evol. Syst. 2016, 8, 71–83.
  18. Thiam, P.; Bellmann, P.; Kestler, H.; Schwenker, F. Exploring Deep Physiological Models for Nociceptive Pain Recognition. Sensors 2019, 19, 4503.
  19. Wang, C.; Olugbade, T.A.; Mathur, A.; Williams, A.C.D.C.; Lane, N.D.; Bianchi-Berthouze, N. Recurrent network based automatic detection of chronic pain protective behavior using MoCap and sEMG data. In Proceedings of the 23rd International Symposium on Wearable Computers, London, UK, 9–13 September 2019; pp. 225–230.
  20. Skansi, S. Introduction to Deep Learning: From Logical Calculus to Artificial Intelligence; Springer: Berlin, Germany, 2018.
  21. Obinikpo, A.A.; Kantarci, B. Big Sensed Data Meets Deep Learning for Smarter Health Care in Smart Cities. J. Sens. Actuator Netw. 2017, 6, 26.
  22. Sathyanarayana, A.; Joty, S.; Fernandez-Luque, L.; Ofli, F.; Srivastava, J.; Elmagarmid, A.; Arora, T.; Taheri, S.; Ridgers, N.; Bin, Y.S. Sleep Quality Prediction From Wearable Data Using Deep Learning. JMIR mHealth uHealth 2016, 4, e125.
  23. Lucey, P.; Cohn, J.F.; Prkachin, K.M.; Solomon, P.E.; Matthews, I. Painful data: The UNBC-McMaster shoulder pain expression archive database. In Face and Gesture; IEEE: Santa Barbara, CA, USA, 2011; pp. 57–64.
  24. Walter, S.; Gruss, S.; Ehleiter, H.; Tan, J.; Traue, H.C.; Crawcour, S.; Werner, P.; Al-Hamadi, A.; Andrade, A.O. The biovid heat pain database data for the advancement and systematic validation of an automated pain recognition system. In Proceedings of the 2013 IEEE International Conference on Cybernetics (CYBCO), Lausanne, Switzerland, 13–15 June 2013; pp. 128–131.
  25. Zhang, X.; Yin, L.; Cohn, J.F.; Canavan, S.; Reale, M.; Horowitz, A.; Liu, P.; Girard, J.M. BP4D-Spontaneous: A high-resolution spontaneous 3D dynamic facial expression database. Image Vis. Comput. 2014, 32, 692–706.
  26. Zhang, Z.; Girard, J.M.; Wu, Y.; Zhang, X.; Liu, P.; Ciftci, U.; Canavan, S.; Reale, M.; Horowitz, A.; Yang, H.; et al. Multimodal Spontaneous Emotion Corpus for Human Behavior Analysis. pp. 3438–3446. Available online: (accessed on 7 December 2019).
  27. Velana, M.; Gruss, S.; Layher, G.; Thiam, P.; Zhang, Y.; Schork, D.; Kessler, V.; Meudt, S.; Neumann, H.; Kim, J.; et al. The SenseEmotion Database: A Multimodal Database for the Development and Systematic Validation of an Automatic Pain- and Emotion-Recognition System. In Multimodal Pattern Recognition of Social Signals in Human-Computer-Interaction; Springer: Cham, Switzerland, 2016; pp. 127–139.
  28. Aung, M.S.H.; Kaltwang, S.; Romera-Paredes, B.; Martinez, B.; Singh, A.; Cella, M.; Valstar, M.; Meng, H.; Kemp, A.; Shafizadeh, M.; et al. The Automatic Detection of Chronic Pain-Related Expression: Requirements, Challenges and the Multimodal EmoPain Dataset. IEEE Trans. Affect. Comput. 2015, 7, 435–451.
  29. Gruss, S.; Geiger, M.; Werner, P.; Wilhelm, O.; Traue, H.C.; Al-Hamadi, A.; Walter, S. Multi-Modal Signals for Analyzing Pain Responses to Thermal and Electrical Stimuli. J. Vis. Exp. 2019, e59057.
  30. Brahnam, S.; Chuang, C.-F.; Shih, F.; Slack, M.R. SVM Classification of Neonatal Facial Images of Pain. Comput. Vis. 2005, 3849, 121–128.
  31. Harrison, D.; Sampson, M.; Reszel, J.; Abdulla, K.; Barrowman, N.J.; Cumber, J.; Fuller, A.; Li, C.; Nicholls, S.G.; Pound, C. Too many crying babies: A systematic review of pain management practices during immunizations on YouTube. BMC Pediatr. 2014, 14, 134.
  32. Mittal, V.K. Discriminating the Infant Cry Sounds Due to Pain vs. Discomfort Towards Assisted Clinical Diagnosis. 2016. Available online: (accessed on 2 December 2019).
  33. Al-Eidan, R.M.; Al-Khalifa, H.; Al-Salman, A.M.S. A Review of Wrist-Worn Wearable: Sensors, Models, and Challenges. J. Sens. 2018, 2018, 1–20. Available online: (accessed on 7 February 2019).
  34. Spruit, M.; Lytras, M.D. Applied data science in patient-centric healthcare: Adaptive analytic systems for empowering physicians and patients. Telemat. Inform. 2018, 35, 643–653.
Figure 1. Categorization schema.
Figure 2. Flowchart of deep-learning-based pain-recognition methods.
Figure 3. Convolutional neural network (CNN) architecture [21].
Figure 4. Long short-term memory (LSTM) architecture [20].
Table 1. Summary of pain-recognition studies based on deep learning. Abbreviations: WAR—weighted average recall; AUC—area under the curve; CORR—Pearson’s correlation; MSE—mean-square error; RMSE—root mean-square error; ICC—intraclass correlation; CC—correlation coefficient; MAE—mean absolute error.
| Study | Deep-Learning Approaches | Task | Features-Devices | Dataset | Metric-Score |
|---|---|---|---|---|---|
| | Multitask neural network (MT-NN) | Classification | Skin conductance (SC) and heart-rate features (ECG) only | Available: BioVid Heat Pain database | Accuracy |
| | Long short-term memory neural networks (LSTMs) | Feature extraction | Vocal from audio; Face from video; Device: Sony HDR handy cam | Collected: Triage Pain-Level Multimodal database; Available: Speech data: Chinese corpus: The DaAi database; Three-class (severe, moderate and mild) | 72.3%: binary classes; 54.2%: three classes |
| | LSTMs | Classification | Face | Available: UNBC-McMaster Shoulder Pain Expression Archive database | MAE 2.47 (0.18); ICC 0.36 (0.08); Confusion matrices |
| | Convolutional neural networks (CNNs) | Feature extraction | Face | Available: UNBC-McMaster Shoulder Pain Expression Archive database; Cohn Kanade + facial expression database | AUC: 93.3% |
| | CNN | Feature extraction | Face | Available: UNBC-McMaster Shoulder Pain Expression Archive database | CORR 0.67; RMSE 0.99 |
| | CNN-Fine-tuning-regularizing | Classification | Face | Available: UNBC-McMaster Shoulder Pain Expression Archive database; The face verification network [12] is trained on CASIA-WebFace dataset [16], which contains 494,414 training images from 10,575 identities | Unweighted metrics: MAE 0.389; MSE 0.804; PCC 0.651. Weighted metrics: weighted MAE 0.991; weighted MSE 1.720 |
| | Cumulative attributes (CA)-CNN | Classification | Face | Available: UNBC-McMaster Shoulder Pain Expression Archive database | Regression: PCC (0.47, 0.53); RMSE (1.20, 1.23); PCC (0.36, 0.41); RMSE (1.17, 1.19) |
| | CNN and LSTM | Feature extraction and classification | Microsoft Kinect Version 2; Axis Q1922 thermal camera | Collected: Multimodal Intensity Pain (MIntPAIN) database; Healthy subjects; Classes: 5 | 5-fold cross-validation; The confusion matrix |
| | MT-NN | Classification | ECG, SC | Available: BioVid Heat Pain database | MAE, RMSE, ICC |
| | NN | Confidence estimation | ECG, SC, EMG | Available: BioVid Heat Pain database | Cross validation; RMSE: 0.347; CC: 0.183 |
| | LSTM (TensorFlow) | Classification | Tomography lumbar spine pictures–from Meta Picture (MHD) arrange | Classes: 6 | 65% |
| | LSTM | Classification | Kinematic data; Motion sensors | 22 healthy people and 22 LBP patients | Accuracy: 97.2% |
| | CNNs | Feature extraction and classification | EDA, ECG, EMG | Available: BioVid Heat Pain database (part 1) | Accuracy: 84.40% |
| | LSTM | Classification | Kinematic data | Available: EmoPain database | mean F1: 0.815 |
| | LSTM with attention | Classification | Kinematic data | Available: EmoPain database | mean F1: |
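The regression scores reported in Table 1 (MAE, MSE, RMSE, and the Pearson correlation reported as CORR or PCC) can be illustrated with a minimal Python sketch. The pain-intensity values below are made up purely for illustration and do not come from any reviewed study; the function names are likewise hypothetical, and the reviewed papers may compute these scores per subject, per sequence, or with weighting.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    """Mean-square error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean-square error: the square root of the MSE."""
    return math.sqrt(mse(y_true, y_pred))

def pearson(y_true, y_pred):
    """Pearson's correlation coefficient (CORR / PCC)."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)

# Hypothetical ground-truth and predicted pain intensities (illustration only).
y_true = [0, 1, 2, 3, 4]
y_pred = [0, 2, 2, 2, 4]

print(f"MAE  = {mae(y_true, y_pred):.3f}")
print(f"MSE  = {mse(y_true, y_pred):.3f}")
print(f"RMSE = {rmse(y_true, y_pred):.3f}")
print(f"PCC  = {pearson(y_true, y_pred):.3f}")
```

Error measures (MAE, MSE, RMSE) are better when lower, whereas correlation measures are better when closer to 1, so scores in Table 1 are comparable only within the same metric and the same dataset.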
Table 2. Summary of pain-recognition data sets.
| Dataset Name | Title | Features | Devices | Stimuli | Participants | Classes |
|---|---|---|---|---|---|---|
| UNBC 2011 | PAINFUL DATA: The UNBC-McMaster Shoulder Pain Expression Archive Database | Facial expression | Two Sony digital cameras | Natural shoulder pain | 129 shoulder pain patients (63 males, 66 females) | 0–16 (PSPI) and 0–10 (VAS) |
| BioVid 2013 | Data for the Advancement and Systematic Validation of an Automated Pain Recognition System | Video: facial expression; Biopotential signals (SCL, ECG, sEMG, EEG) | Kinect camera; Nexus-32 amplifier | Heat pain at right forearm thermode (PATHWAY, | Healthy | 4 levels of pain |
| BP4D-Spontaneous Database (BP4D) 2014 | BP4D-Spontaneous: a high-resolution spontaneous 3D dynamic facial expression database | Facial expression | Two stereo cameras and one texture video camera | Cold pressor test with left arm | 41 healthy | 8 classes, with pain as one of the emotions (happiness/amusement, sadness, startle, embarrassment, fear, physical pain, anger, disgust) |
| BP4D+ 2016 | Multimodal Spontaneous Emotion Corpus for Human Behavior Analysis | Facial expression; EDA, heart rate, respiration rate, blood pressure | 3D camera Di3D; infrared camera FLIR | Same as before | 141 healthy | Same as before |
| SenseEmotion 2016 | The SenseEmotion Database: A Multimodal Database for the Development and Systematic Validation of an Automatic Pain- and Emotion-Recognition System | Facial expressions | 3 cameras (IDS UI-3060CP-C-HQ); Piezoelectric crystal sensor (chest respiration waveforms); Digital wireless headset microphone (Line6 XD-V75HS) + directional microphone (Rode M3) | Heat pain; Medoc Pathway thermal simulator | 40 healthy (20 male, 20 female) | 5 (no pain, 4 levels of pain) |
| | The Automatic Detection of Chronic Pain-Related Expression: Requirements, Challenges and the Multimodal EmoPain Dataset | Audio, facial expressions; Body movements | 8 cameras; Animazoo IGS-190 | Natural, while doing physical exercises | 22 chronic low back pain (CLBP) (7 male, 15 female) | 2 for face; 6 for body behaviors; combined: binary |
| | Deep Multimodal Pain Recognition: A Database and Comparison of Spatio-Temporal Visual Modalities | Facial expression; RGB, depth | Microsoft Kinect Version 2; Axis Q1922 thermal camera | Electrical pain | 20 healthy | 5 classes (0–4) |
| X-ITE pain | Multi-Modal Signals for Analyzing Pain Responses to Thermal and Electrical Stimuli | Audio, facial expressions; ECG, SCL, sEMG (trapezius, corrugator, zygomaticus) | 4 cameras | Heat and electrical | 134 healthy adults | 3 |
| | SVM Classification of Neonatal Facial Images of Pain | Facial expression | | heel for blood collection | 26 neonates (age 18–36 h) | 5 (pain, rest, cry, air puff or friction) |
| | Too many crying babies: a systematic review of pain management practices during immunizations on YouTube | Video | injection | immunizations (injection) | 142 infants | FLACC observer pain assessment |
| | Discriminating the Infant Cry Sounds Due to Pain vs. Discomfort Towards Assisted Clinical Diagnosis | Audio | injection | immunizations (injection) | 33 infants | 6 (pain, discomfort, hunger/thirst and three others) |