Deep-Learning-Based Models for Pain Recognition: A Systematic Review

Abstract: Traditional standards employed for pain assessment have many limitations, one of which is poor reliability caused by inter-observer variability. There have therefore been many approaches to automating the task of pain recognition. Recently, deep-learning methods have emerged to address challenges such as feature selection and learning from small data sets. This study provides a systematic review of pain-recognition systems based on deep-learning models over the last two years. Furthermore, it presents the major deep-learning methods used in the reviewed papers. Finally, it discusses the remaining challenges and open issues.


Introduction
Deep learning is an important branch of machine learning used in many applications. It is a powerful way of achieving supervised learning. The field has gone through different naming conventions and developments over its history [1]. It was first referred to as cybernetics in the 1940s-1960s. Then, in the 1980s-1990s, it became known as connectionism. Finally, the phrase deep learning began to be used in 2006. Some names are also based on human knowledge; for example, an artificial neural network (ANN) refers to the biological inspiration of deep learning. However, some problems cannot be solved by neural inspiration alone, and many deep-learning perspectives therefore depend on recently developed statistical and applied mathematical principles.
Automated pain recognition is a complex problem that calls for powerful methods based on deep learning. Recently, some studies of pain recognition have employed machine learning for this purpose. This study therefore aims to review deep-learning applications for pain recognition only. It provides a systematic review that incorporates a search strategy and inclusion criteria, which we describe in the next section. To the best of our knowledge, there is only one review paper on pain-recognition models in general [2]. This study focuses on deep models for pain recognition. This review will help researchers in the artificial intelligence and affective computing communities to learn about the deep ANN algorithms and datasets that have been used for automating the task of pain recognition. They can thus identify the open problems and address them with new, more effective models that outperform the available algorithms.
The main contributions of this study are summarized as follows:
• Review of the pain-recognition studies that are based on deep learning;
• Presentation and discussion of the main deep-learning methods employed in the reviewed papers;
• Review of the available data sets for pain recognition;
• Discussion of some challenges and future works.

Search Strategy
First, we identified important search terms for pain-recognition systems, including 'pain recognition', 'pain assessment' and 'pain diagnosis'. Second, the search was restricted to publications from 2017 to 2019. These two constraints on terminology and time were used to search the popular databases IEEE, ACM and Web of Science (WOS).

Inclusion and Exclusion Criteria
These criteria include related fields of research and the use of deep-learning methods only. The first search iteration returned many papers that required filtering.
For example, the first iteration in the WOS search engine returned 634 studies. We filtered these by selecting only the ones related to computer science fields, which left 58 papers in the second iteration. We then selected only the studies based on deep-learning models. The final number of suitable studies from the WOS search engine was 13.

Categorization Method
The resulting papers were scanned rapidly in order to derive a suitable categorization method. We first classified each paper as single-model or multi-model. The single-model papers were then divided into four categories, which are described in Figure 1.

Single-Model-Based Pain Recognition
Single-model-based pain-recognition systems can be defined as systems that use a single kind of measure to classify the pain level. These measures are physiological, speech, body movements and facial expressions. Based on our literature review, we grouped the measures into three categories that are based on deep-learning methods. Next, we present details of the previous studies for each single model.


Physiological Signals
Physiological signals are among the most important measures used to describe the pain level from the physiological response of the body. These signals include vital signs (such as blood pressure, respiration rate, heart rate and muscle activity) or brain dynamics.
Motivated by the lack of work using physiological signals for pain detection, Lopez-Martinez and Picard in 2017 [3] proposed a model using neural network (NN) techniques. Their approach implements a multitask learning method to tackle the problem of inter-person differences through shared layers in NNs. Their system uses physiological signals from skin conductance (SC) and electrocardiograms (ECGs) only. They used the available BioVid dataset to build the model and conduct their experiments. They then compared their model with other machine-learning methods: logistic regression (LR) and support-vector machines with both a linear kernel (SVM-L) and a radial basis function kernel (SVM-RBF). They reported that the NN approach outperforms the others, reaching around 82.75%. However, they reported no experimental testing of the model on new data sets. In the future, this method could be easily adapted to real clinical settings because it uses only two features that can be acquired from wrist-wearable devices. Recent studies have been conducted to validate the accuracy of such wearable devices [4].

Speech Analysis
Another study [5] recognized the pain level based on speech analysis and long short-term memory (LSTM) NNs. First, the authors employed an unsupervised-learning NN to extract vocal features from a Chinese corpus. Then, they fine-tuned it with NNs and an emergency triage database to output a sentence-level acoustic representation for each patient. Finally, they classified pain into two or three classes using an SVM. Their method achieved weighted average recall (WAR) values of 72.3% in the binary-class and 54.2% in the three-class pain-intensity classification tasks. To the best of our knowledge, this is the first study to detect the pain level from speech alone.

Facial Expressions
Facial expressions are signals that have received attention from researchers for many applications, such as face recognition in the field of biometrics.
Deep learning has been used directly to estimate pain from facial expressions. One distinctive approach estimates pain from the self-reported visual analog scale (VAS) pain levels in order to model individual differences [6]. The method includes two learning stages. The first, performed with recurrent NNs (RNNs), estimates the Prkachin and Solomon pain intensity (PSPI) levels from face images. Then, personalized hidden conditional random fields (HCRFs) use this output to estimate the VAS for each person. Compared with non-personalized approaches, this approach achieved high performance, and its score on the single-sequence test was the best.
Deep learning has mainly been used to extract important features, as was recently done by [7] for pain detection based on facial expressions. Their approach has three steps. First, convolutional neural networks (CNNs) are used to extract features with VGG_Faces. The resulting feature maps are then used to train an LSTM, a type of RNN, to produce a binary pain estimate (pain, no pain). They provided a summary of previous work on the popular data set for face-based pain detection, the UNBC-McMaster database, which has 200 video sequences from 25 patients who suffered from shoulder pain. Their experiments on this data set showed that their approach outperforms all previous works, with an area-under-the-curve (AUC) performance of 93.3%. Their model can also be generalized to other facial-emotion-recognition tasks; applying it to the Cohn-Kanade+ facial expression database yielded a competitive score (AUC = 97.2%).
In the same manner, in 2017, Egede, Valstar and Martinez [8] proposed a pain-estimation model that combines features learned by deep networks with handcrafted features. The idea stems from the difficulty of obtaining a pain-estimation data set large enough for deep learning to work well. They therefore extracted handcrafted features directly from the face image and used a CNN to learn additional features; the features cover appearance, shape and dynamics information. Finally, they estimated the pain level using a linear regression model on the combined and individual features. Their results outperformed the state-of-the-art methods, with a root mean square error (RMSE) of 0.99 and a Pearson correlation (CORR) of 0.67. A limitation of this approach, and of all face-based approaches, is that it considers only the front of the face without capturing combinations of other indicators, such as audio, body movements and physiological signals.
Another solution aimed at small data sets in deep learning was proposed in 2017 by Wang et al. [9]. They fine-tuned, on a small pain data set, a face-verification network trained on the WebFace dataset, which has 500,000 face images. They then framed pain estimation as a regression problem by applying a regression loss regularized with the center loss. Performance was evaluated with newly proposed metrics designed for imbalanced data. The method achieved high performance compared with state-of-the-art methods on both the standard metrics (mean absolute error (MAE): 0.389, mean squared error (MSE): 0.804, Pearson's correlation coefficient (PCC): 0.651) and the newly proposed ones (weighted MAE: 0.991, weighted MSE: 1.720). However, pain is temporal and involves subjective information, and no such information or stimulus knowledge is used in this method, which requires further investigation.
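Several reviewed papers report weighted metrics designed for imbalanced pain levels. As one plausible sketch (the exact weighting used in [9] is not specified here, so this scheme is an assumption), a class-balanced MAE can average the error per pain level so that rare levels count as much as frequent ones:

```python
import numpy as np

def weighted_mae(y_true, y_pred):
    """MAE averaged per pain level, so rare levels weigh as much as
    frequent ones (an illustrative weighting, not necessarily the one in [9])."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    per_level = [np.mean(np.abs(y_pred[y_true == lvl] - lvl))
                 for lvl in np.unique(y_true)]
    return float(np.mean(per_level))
```

For example, with three correct "no pain" frames and one missed "pain" frame, the plain MAE is 0.25 but this balanced version returns 0.5, penalizing the error on the rare class more heavily.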
In contrast to previous solutions, the cumulative attribute (CA) method was used in 2018 [10] as a way to overcome the data imbalance in pain-estimation datasets. A cumulative attribute is defined as 'an intermediate representation Ci obtained by transforming the original labels yi into a vector'. In this study, a deep CNN is used with cumulative attributes in two steps. First, the cumulative attribute vector is output by a trained CNN. Second, a regression model is trained to produce the final real-valued output. The approach was tested on a pain-estimation dataset and on age estimation. The pain-estimation results were higher in the CA-CNN experiments than in the non-CA-CNN experiments. In addition, a CA layer trained with a log-loss function significantly outperforms a CA layer trained with the Euclidean loss. The approach has the advantage of using the CNN framework without any additional annotations. However, building annotated datasets for pain estimation remains important for overcoming most of the problems in classification tasks.
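The label transform behind the CA representation can be sketched in one line (this is the standard thresholded-label construction for cumulative attributes; the exact details in [10] may differ):

```python
def cumulative_attribute(y, n_levels):
    # CA vector for label y: element j is 1 if y exceeds level j, else 0,
    # so neighboring labels share most of their representation, which
    # softens the effect of rare labels in an imbalanced dataset
    return [1 if y > j else 0 for j in range(n_levels)]
```

For instance, pain levels 2 and 3 on a 5-level scale map to [1, 1, 0, 0, 0] and [1, 1, 1, 0, 0], differing in only one element.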
Therefore, Haque et al. [11] built a new database with RGB, depth and thermal (RGBDT) face images for pain-level recognition in sequences. Their elicitation approach differs from that of previous datasets: healthy people were stimulated with electrical pulses. Twenty subjects participated in the data collection, with pain annotated on five levels (0 for no pain and 4 for severe pain). After collecting the data, they constructed a baseline model for pain recognition based on spatio-temporal features and deep learning. First, they preprocessed the video frames by cropping only the face region in the RGB images, using their previously proposed method [12]. They then used homography matrices to crop the corresponding depth and thermal images. After this, they applied deep learning either to the individual modalities or to their fusion. The main idea of the proposed method consists of two steps: a 2D-CNN for frame-level feature extraction and pain recognition, followed by an LSTM to capture the temporal relation between frames and perform sequence-level pain recognition. The results show that the fusion approach performed best compared with the individual modalities. In addition, early fusion, which integrates the inputs from all modalities before feeding them to the classifier, outperformed late fusion, which integrates the outputs of the individual models as input to a second classifier.

Other Indicators
In 2017, there was an attempt [13] to use deep-learning methods for low back pain (LBP) recognition. The method was based on segmentation and classification of LBP X-ray data. The authors obtained computed tomography lumbar spine images in a meta-image (MHD) format and annotated them with five vertebral levels. Then, using a deep-learning framework, they extracted features and classified LBP into severity levels (normal, mild, crush, wedge, severe and biconcavity). They reached around 65% accuracy, which needs further improvement. Moreover, X-ray data alone are not enough to recognize LBP.
Recently, Hu et al. [14] proposed a deep-learning method to recognize low back pain during static standing. Their system depends on kinematic data acquired using three motion sensors attached to different places on the skin. After preprocessing, the data were fed to an LSTM network. From 22 healthy people and 22 LBP patients, they obtained 1073 time series for training and 107 for testing. Their reported results showed a high accuracy of 97.2%. A disadvantage of this study is its use of only kinematic data, ignoring EMG data, which is important for differentiating LBP patients.
A later study to recognize protective behavior was conducted by Wang et al. in 2019 [15]. They focused only on the body-movement data from an available data set (the EmoPain data set). In addition, they applied an attention mechanism to the LSTM architecture to keep the information relevant to recognizing protective behavior. They used a sliding window with zero padding to segment the data, and augmented the data by combining the two methods of random discarding and jittering. Compared with previous models, this model obtained a high performance of 0.844 mean F1 score. They also found that bodily attention is more important than temporal attention, while combining the two provides the highest performance.

Multi-Model-Based Pain Recognition
A recent attempt [16] to recognize pain with a multi-model approach combines the face and physiological signals (ECG and SC). Personalization of pain estimation is the main goal and is achieved by clustering subjects into different profiles rather than modeling each individual, as in previous works. The authors then used a multitask NN (MT-NN), where each task corresponds to one profile. For their experiments, they used the available BioVid Heat Pain database. The results exhibited better performance with more clusters (C = 4), which motivates further investigation with a larger number of clusters in the future.
In addition, [17] performed fusion between physiological signals (EMG, ECG and SC) and face videos. Their approach implements the idea of adapting the system to test unknown individuals based on unlabeled data. They therefore used multi-stage ensemble-based classification on the BioVid Heat Pain database. NNs were used at the confidence-estimation stage and trained on three different inputs: the predictions of one-vs-one classifiers, the continuous pain-level estimate of the regressor and the variance of a bagged ensemble of random forests. The NN then determines the confidence level of samples, using the regressor as an input. After evaluating different combinations of inputs to the NN, the highest values of the correlation coefficient and RMSE were 0.183 and 0.347, respectively. They found that the adaptation process is not an easy task because of individual differences in response to pain stimuli, and it therefore requires more investigation.
More recently, Thiam et al. [18] explored several CNN architectures on the available BioVid Heat Pain database (Part A). They used three signal modalities, EDA, ECG and EMG, and tried both 1D and 2D inputs to the network. In addition, different deep fusion architectures were presented and tested. Their results reached 84.57% and 84.40% for binary classification. Table 1 summarizes all the pain-recognition studies based on deep learning, and Figure 2 shows a flow chart of the main phases required in deep-learning-based pain-recognition methods.
On the other hand, C. Wang et al. in 2019 [19] recognized specific chronic-pain behavior with deep learning. Their main objective was to determine the protective behavior of LBP patients while performing five exercises. They used sEMG and body-movement data from an available data set (the EmoPain data set). The study proposed two recurrent neural networks, called stacked LSTM and dual-LSTM. From the five activities, they computed the angles and energies of the motion-capture data. For muscle activity, they used rectified sEMG data to smooth and reduce the noise of the raw data. To update the weights, the Adam optimizer was used with a fixed learning rate of 0.001, and a sliding-window technique was used for data segmentation.
To select the best window length, they performed experiments for each activity using a fixed overlapping ratio of 75%. They found that 3 s is the window length that gives the best detection for most activities. Jittering and random discarding were used as augmentation methods, providing better performance than the original dataset alone, and the combination of the two augmentation methods provided the best performance. The final results reached a best mean F1 score of 0.815 for the LSTM networks, better than a conventional neural network. However, generalization leads to a decrease in performance, which needs to be addressed. Another limitation of this study is that it weighted all body parts, activity types and time steps equally when determining protective behavior.
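The segmentation and augmentation steps described above can be sketched as follows (a minimal illustration; the window length, overlap, noise scale and discard probability are assumptions, not the papers' exact parameters):

```python
import random

def sliding_windows(seq, win, overlap=0.75, pad=0.0):
    # fixed-length windows with 75% overlap; the final window is zero-padded
    step = max(1, int(win * (1 - overlap)))
    out = []
    for start in range(0, len(seq), step):
        w = list(seq[start:start + win])
        if len(w) < win:
            w += [pad] * (win - len(w))   # zero padding
        out.append(w)
        if start + win >= len(seq):
            break
    return out

def jitter(window, sigma=0.05, rng=None):
    # jittering: add small Gaussian noise to each sample
    rng = rng or random.Random(0)
    return [v + rng.gauss(0.0, sigma) for v in window]

def random_discard(window, p=0.1, pad=0.0, rng=None):
    # random discarding: drop samples, then pad back to the window length
    rng = rng or random.Random(0)
    kept = [v for v in window if rng.random() > p]
    return kept + [pad] * (len(window) - len(kept))
```

Applying both augmentations to each window, as the reviewed studies report, multiplies the effective training-set size while keeping window lengths fixed for the LSTM input.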

Primary Deep-Learning Methods Employed for Pain Recognition
In the previous section, we presented the literature review of deep-learning methods employed for pain recognition over the last two years. In this section, we present and discuss the main deep-learning models used in pain-recognition studies in addition to the metrics used to evaluate such models.

Convolutional Neural Networks (CNNs)
CNNs were derived from the first model of this kind of NN, invented in 1998 by Yann LeCun et al. [20]. In general, convolutional networks process the input by applying learned filters. In addition to the filter size (f), the important parameters required to build a deep CNN are the stride (s) and padding (p). CNNs have several types of layers, namely convolutional, pooling and fully connected layers [21]. The CNN architecture is illustrated in Figure 3.
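The filter size, stride and padding parameters determine the spatial size of a convolutional layer's output through the standard formula floor((n + 2p - f) / s) + 1, which can be sketched as:

```python
def conv_output_size(n, f, p=0, s=1):
    # spatial output size of a convolution over an input of size n
    # with filter size f, padding p and stride s
    return (n + 2 * p - f) // s + 1
```

For example, a 32-pixel input with a 5-pixel filter and padding 2 keeps its size ("same" convolution), while stride 2 roughly halves it.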

The CNN has an advantage of generalization compared with the MLP [22]. In addition, it has fewer parameters than the fully connected layers in an MLP.
As we can see from the literature, CNNs are mostly used for feature extraction; only two studies employed them for classification tasks. Moreover, they were used only for pain recognition based on facial expressions, without considering physiological signals or speech analysis. For evaluation, the commonly used metrics are AUC, AUC2, PCC, MAE, RMSE and MSE, while the accuracy measure is rarely used.

Recurrent Neural Networks (RNNs)
Recurrent neural networks (RNNs) are another form of NN suitable for processing sequences of unequal length [20]. Unidirectional RNNs solve the problem of different lengths between input and output; however, they make predictions using only earlier data. The bidirectional model is therefore used to predict from both directions. In general, RNNs can be used in three different settings [20].

• The first setting is standard, which learns from labeled data and predicts the output.
• The second setting is called the sequence setting and learns data with multiple labels. It has sequences combining different kinds of data that cannot be broken apart; it therefore takes a full sequence to predict the next state and more.
• The third setting is called the predict-next setting and can take unlabeled or implicitly labeled data, such as words in a sentence. In this application, the RNN breaks the words down into subsequences and takes the next subsequence as the target.
The first successful RNN model was the Hopfield network. The most widely used and important RNNs are the LSTM NNs, proposed in 1997 by Hochreiter and Schmidhuber. We explain this kind of NN in more detail next.

Long-Short Term Memory Neural Networks (LSTM-NNs)
LSTM is one of the most important NN architectures and was proposed in 1997 by Hochreiter and Schmidhuber [20]. It is usually used for sequential tasks such as time-series analysis and natural language processing [20]. In addition to the main link from one unit to the next in an RNN, the LSTM has another link, the cell state, which keeps or removes data through different gates [20]. The architecture of the LSTM is shown in Figure 4. In pain-recognition studies, researchers used this model for both feature-extraction and classification tasks. The measures used for pain recognition were obtained only from facial expressions and speech analysis. The model achieved a good performance when used for feature extraction [5] and for classification [3], but performed better when used for classification combined with a CNN for feature extraction, as in [7]. This comparison depends on the metrics used in the LSTM pain-recognition studies, namely AUC, AUC2, MAE, ICC and confusion matrices. We can compare the CNN and LSTM results in terms of MAE only, where the LSTM (0.18) achieved a better result than the CNN (0.389). In contrast, the weighted MAE was better for the CNN (0.991).
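The gating mechanism described above can be sketched as a single numpy time step (a generic LSTM cell for illustration, not any specific reviewed model; the weight layout, mapping the concatenated [h; x] to four stacked gate pre-activations, is one common convention):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    # one LSTM time step; W maps [h_prev; x] to the four stacked
    # gate pre-activations (input, forget, output, candidate)
    z = W @ np.concatenate([h_prev, x]) + b
    H = h_prev.size
    i = sigmoid(z[0:H])          # input gate
    f = sigmoid(z[H:2 * H])      # forget gate: keeps or removes cell-state data
    o = sigmoid(z[2 * H:3 * H])  # output gate
    g = np.tanh(z[3 * H:4 * H])  # candidate values
    c = f * c_prev + i * g       # cell state: the extra link alongside h
    h = o * np.tanh(c)           # hidden state passed to the next unit
    return h, c
```

Iterating this step over a sequence of feature vectors (e.g., per-frame CNN features) yields the sequence-level representation used for pain classification in the reviewed studies.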


Multitask Neural Network (MT-NN)
The multitask learning (MTL) approach works by sharing representations between related tasks in order to obtain a better-generalizing model [20]. For NNs, it is based on having a single NN that can perform many tasks better than networks trained on the individual tasks. Two techniques are mostly used for MTL-NNs. The first is hard parameter sharing, which shares the hidden layers across all tasks and then has different outputs. The second is soft parameter sharing, which models each task separately and uses regularization to keep the parameters of the tasks similar.
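Hard parameter sharing can be sketched as a forward pass with one shared trunk and one output head per task (a generic illustration with assumed shapes, not the architecture of any specific reviewed model):

```python
import numpy as np

def mtl_forward(x, W_shared, task_heads):
    # hard parameter sharing: one hidden representation for all tasks,
    # then a separate linear output head per task
    h = np.maximum(0.0, W_shared @ x)       # shared ReLU hidden layer
    return [W_t @ h for W_t in task_heads]  # one prediction per task
```

In the pain-recognition setting of [16], each head would correspond to one subject profile, so all profiles share the feature layers while keeping profile-specific outputs.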
From the previous studies of pain recognition, we observe that this approach was rarely used. The reason is the difficulty of the approach, which must perform many tasks with a single network. The more applicable and widely used approach is transfer learning, which helps address the small-data-set problem by leveraging models trained on large data.

Datasets for Pain Recognition
In this section, we present the available data sets for pain recognition. After reviewing the literature, we found eleven data sets [23][24][25][26][27][28][29][30][31][32] created for pain-recognition systems. The data sets differ in their selected features, number of participants, stimulus method, pain levels and the devices used for sensing the data. The main differences are summarized in Table 2 for comparison and discussion.
We can see that only three datasets involve patients with shoulder, back or neck pain. The number of participants ranged from 20 to 129. The main stimulus methods employed for healthy people were heat and cold; electrical stimuli have recently been used as a new method [11] for healthy people. The facial-expression data sets had more than two levels of pain, whereas the datasets that include biosignals and body movements had only 2-4 levels. This is because pain recognition from body movements and biosignals is more recent and requires more studies and experiments.

Challenges and Future Directions
Deep-learning methods are widely used in many applications. In recent times, pain recognition has been associated with sensory data rather than only facial expressions. Deep learning has shown promising results in many e-health systems with sensory data [2], which has increased the pain-recognition community's interest in sensing-based approaches.
Based on this review of the state of the art in deep learning for pain recognition, we found few recent studies. Most pain-recognition studies used CNNs and LSTMs owing to their ability to address many challenges related to deep-learning methods. However, many challenges and future directions remain.
There is a need to better understand the problem of pain recognition. This will help to define the tools and additional methods such as the use of similarity measures and meta information [1].
Furthermore, to build a recognition model, it is important to extract the most suitable features. Pain is a subjective measure, so finding discriminative features, whether from speech, facial expressions or physiological signals, is very difficult. Deep learning can therefore help with feature extraction for such intuitive problems that people find hard to describe.
Moreover, the use of a single model depending on facial expression is limited and does not provide an accurate pain level. Therefore, multimodality is considered both a solution and a further challenge. It requires combining different kinds of signals, such as physiological signals with behavioral expressions [5].
Moreover, it is important to find a new objective measure of pain based on sensory data and experiments, in contrast to the instrument-based existing measures [5].
The available datasets ranged from 22 to 142 participants, which is not enough to train a generalizable deep-learning model. This is an open issue that needs to be addressed; a dataset with hundreds of participants should be developed.
In addition, the challenge facing deep-learning methods in pain recognition is strongly related to data acquisition. Data acquisition includes many issues related to: (1) the devices and sensors used, (2) processing of signals, (3) dimensionality of data [21] and (4) reliability of data [21]. With respect to sensors, it is more convenient to deal with wearable and mobile devices, but such tools also have many challenges, such as short battery life. More information about this issue can be found in our previous study [33] on such wearable devices.
Finally, most of the works were performed in a laboratory setting. However, the main challenge involves investigating the models in real clinical settings.

Conclusions
Healthcare systems are complex systems that contain interactions between different entities: people, processes and technology [34]. This study is a starting point for researchers developing smart healthcare systems and provides them with the available tools and datasets to build such systems.
This study presents a systematic review of pain-recognition systems that are based on deep-learning methods only. Based on the papers reviewed, a new categorization taxonomy was presented according to the kind of data used. These pain-recognition data were obtained from facial expressions, speech, or physiological signals. Furthermore, this study described the primary deep-learning methods used in the reviewed papers. Finally, the main challenges and future directions were discussed.
Deep-learning algorithms have many advantages in healthcare systems. The biggest is their ability to handle complex problems and non-objective measures such as pain. They can extract features automatically without requiring a full formalization of the health problem by medical experts. In addition, the difficulty of collecting large amounts of patient data can be mitigated with suitable augmentation techniques. Therefore, intelligent interpretation of the problem by deep learning, together with automatic expansion of the training data, will accelerate the development of smart healthcare systems [34].