A Review of Automatic Pain Assessment from Facial Information Using Machine Learning

Abstract: Pain assessment has become an important component of modern healthcare systems. It aids medical professionals in diagnosing patients and providing the appropriate care and therapy. Conventionally, patients are asked to report their pain level verbally. However, this subjective method is generally inaccurate, impossible for non-communicative people, affected by physiological and environmental factors and time-consuming, which renders it inefficient in healthcare settings. There has therefore been a growing need to build objective, reliable and automatic pain assessment alternatives. Indeed, because facial expressions are efficient pain biomarkers that accurately convey pain intensity, and because machine learning methods can effectively learn the subtle nuances of pain expressions and accurately predict pain intensity, automatic pain assessment methods have evolved rapidly. This paper reviews recent pain assessment methods based on spatial facial expressions and machine learning. Moreover, we describe the pain intensity scales, datasets and method performance evaluation criteria. In addition, the contributions, strengths and limitations of these methods are reported and discussed. Finally, the review lays the groundwork for further study toward more accurate automatic pain assessment.


Introduction
Acute and chronic pain pose serious healthcare concerns, affecting millions of people worldwide and diminishing their quality of life. Indeed, accurate pain evaluation is essential for successful pain management and treatment. Conventional techniques depend on self-reporting, which is subjective, can be biased by environmental and psychological factors and is not possible for non-communicative patients, such as infants or people with cognitive disabilities. It is therefore crucial to build automatic pain assessment methods that help healthcare providers precisely measure and monitor several types of pain, including chronic and postoperative pain. This will also aid in providing the correct therapy and care and in monitoring patient response to a medical treatment.
In recent years, numerous automatic pain estimation methods have been proposed. They aim at recognizing the pain level using different modalities such as facial expressions [11][12][13][14][15]19], voice [22], human behavior (e.g., human activity, body movement, coordination and speed) [22,23] and physiological signals (e.g., ECG, brain signals and heart rate) [24][25][26]. Nonetheless, facial expressions are the most frequently used data source for pain assessment. Facial expressions reveal important information about an individual's level of discomfort because they are a natural and frequently unconscious reaction to it. In addition, the only specialized equipment needed for this non-invasive technique is a camera, which is a feature of every smartphone. Thus, many algorithms have been created to precisely extract facial traits linked to pain, such as lip biting, brow furrowing and jaw clenching.
In the past, techniques for estimating pain intensity depended on manually extracting pain-related information with the assistance of medical professionals [27][28][29]. However, with the emergence of machine learning techniques and their success in the computer vision and image processing fields [30][31][32][33][34], many machine learning-based methods have tackled the task of pain assessment from facial expressions [11][12][13][14][15]19,22]. These methods have achieved impressive results.
This review delves into the newly emerging use of machine learning and facial expressions for automatic pain assessment. Facial expressions are a natural and frequently unconscious reaction to pain; they can reveal important information about a person's level of pain. Meanwhile, machine learning, with its capacity to understand intricate patterns, offers a viable path toward automated systems that can precisely identify and gauge the degree of pain from facial features.
In comparison with recent pain assessment review papers [35][36][37][38][39][40][41][42][43], this paper principally focuses on effective machine learning and spatial facial expression-based pain assessment methods, reports more recent works (March 2018-May 2024) and categorizes the pain assessment works from the learning model perspective, which highlights the main contribution of each work. In addition, it offers a meta-analysis comparing the latest methods while evaluating them on the widely used pain assessment datasets employing the common performance evaluation criteria.
By providing a thorough review of the spatial facial expression and machine learning-based pain assessment research landscape, this study aims to achieve the following:

• Highlight the limitations of self-reporting pain levels and emphasize the power of automated pain detection through spatial facial recognition and machine learning in healthcare settings.
• Present a background on the pain intensity scales, pain datasets and method performance evaluation criteria used in automatic pain assessment.
• Analyze the state-of-the-art spatial facial information and machine learning-based pain assessment methods to determine the areas in which their accuracy and resilience can be enhanced.
• Encourage the application of this cutting-edge technology in clinical settings for better pain management.
For the rest of this paper, Section 2 provides the review methodology followed to collect the most relevant papers. Section 3 gives background information on the pain datasets, pain intensity scales and method performance evaluation criteria applied to automatic pain detection. Then, Section 4 presents a systematic overview of spatial facial expression and machine learning-based pain assessment methods while highlighting their contributions, strengths and limitations. Furthermore, the results are analyzed and interpreted, the automatic pain assessment challenges are addressed and the limitations of this review are presented in the discussion in Section 5. Finally, the conclusion, Section 6, wraps up the paper with the pivotal findings along with recommendations for further pain assessment studies.

Search Strategy
To conduct a deep and efficient review of the most recent and relevant research papers investigating the automatic pain assessment task from facial information using machine learning methods, a list of related keywords was built and used to collect high-level journal articles, book chapters and conference papers published in trustworthy databases such as IEEE Xplore, ACM Digital Library, Scopus, PubMed and Web of Science (WoS).
Indeed, pain assessment, facial information and machine learning-related keywords were used, mainly the following:

• For machine learning, we used the "machine learning", "deep learning" and "artificial intelligence" keywords.
After collecting the papers, they were scanned again to guarantee they accurately met this review's requirements.

Inclusion/Exclusion Criteria
This paper aims to investigate automatic pain assessment methods using static facial information and machine learning. So, only recent works using static facial information as the pain source were selected, excluding the physiological, dynamic facial expression, speech, self-reporting and behavioral/movement modalities and other indicators. In addition, the focus was only on machine learning-based methods.
Additionally, to present a recent review, it was decided to retain original research and review articles published within the last six years (March 2018-May 2024). In addition, only papers studying the pain of adults were collected, excluding those working on neonatal or infant pain. Moreover, mainly works evaluated on the most commonly used and publicly available pain assessment datasets (UNBC-McMaster [44], BioVid database [45] and MIntPain database [46]) were retained (see Section 3.2).

Categorization Method
After filtering only the relevant papers presenting effective pain assessment methods, they were scanned to find an adequate categorization that would help the reader obtain a clear understanding of their contributions and differences. Based on the employed model, three main categories were identified: classical machine learning-based methods excluding the deep learning ones (referred to, in this paper, as "machine learning methods"), deep learning-based methods and hybrid model-based methods combining machine and/or deep learning methods (see Figure 1). The main contributions, results, advantages, limitations and suggested future directions were extracted from these research papers and used in developing and enriching our study. Furthermore, other review papers [3,26,[35][36][37]39,41] were leveraged to incorporate their relevant recommended research extensions.

Background
A facial information-based pain assessment system is composed of three main phases, as shown in Figure 2.
The first phase consists of applying several pre-processing techniques to the input image, such as face detection, cropping and alignment, which focuses the analysis on the face while removing the background. Additionally, other image quality enhancement and denoising techniques can be applied, which help in accurately extracting the facial features. In addition, image segmentation can be used to extract facial parts, which can help in extracting local facial features. Moreover, due to the need for large image datasets, especially for deep learning models, data augmentation can be employed where the image data are limited. Furthermore, in cases where the dataset is imbalanced, it is crucial to conduct data balancing across all the classes to effectively train the learning model. Then, after pre-processing the image data, pain assessment models are applied. In this phase, facial features are extracted and used to train the model to effectively characterize and classify the pain level. Indeed, three kinds of machine learning-based methods can be applied. The conventional machine learning-based methods [1][2][3][4] manually extract handcrafted facial features, then classify them using conventional machine learning algorithms such as Support Vector Machines (SVMs) [47] and K-Nearest Neighbors (KNNs) [48]. However, deep learning-based methods [5][6][7][8][9][10][11][12][13] use deep learning models (e.g., Convolutional Neural Networks (CNNs) [49], Residual Neural Networks (ResNet) [50], VGG [51] and Inception networks [52]) for both the feature extraction and model-learning tasks. Recently, hybrid model-based methods [14][15][16][17][18][19][20][21], which combine machine learning and/or deep learning models, have been proposed for the pain assessment task. These methods follow different ensemble learning strategies to leverage the advantages of different learning models.
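The pre-processing phase above can be illustrated with a minimal NumPy-only sketch. The face bounding box is assumed to come from an external face detector; the `preprocess_face` helper and its `bbox` argument are hypothetical names introduced here for illustration, not part of any cited method:

```python
import numpy as np

def preprocess_face(image, bbox):
    """Crop the detected face, normalize pixel values and
    produce a horizontally flipped copy as a simple augmentation.

    image : (H, W, 3) uint8 array (one video frame)
    bbox  : (x, y, w, h) face box from an external detector
    """
    x, y, w, h = bbox
    face = image[y:y + h, x:x + w]           # keep the face, drop the background
    face = face.astype(np.float32) / 255.0   # scale intensities to [0, 1]
    flipped = face[:, ::-1]                  # mirror image for data augmentation
    return face, flipped

# Toy example: a random 100x100 "frame" with a hypothetical face box.
frame = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
face, flipped = preprocess_face(frame, (20, 10, 64, 64))
print(face.shape)  # (64, 64, 3)
```

Real pipelines would add alignment, denoising and resizing here, but the crop-normalize-augment skeleton is the same.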
Finally, once the facial features are extracted and classified into several pain classes, the pain intensity level is determined. Pain can be categorized into several levels based on its severity. That is why different pain scales have been suggested to accurately identify the pain level a person is experiencing.

Pain Intensity Scales
Pain is perceptible and may be precisely measured. Different subjective pain intensity scales have been introduced (see Figure 3) to help communicate the degree of pain between the patient and the healthcare providers, helping them to better understand a patient's pain and to develop an appropriate treatment plan. These subjective pain measurement tools have been a major supporter of the development of automatic pain assessment methods. The scales most commonly used by recent automatic pain assessment methods are the Visual Analog Scale (VAS) [53] and the Prkachin and Solomon Pain Intensity (PSPI) score [54]. The VAS uses a 10-unit scale to rate the degree of pain. To rate it, a handwritten mark is made along a 100 mm line, which represents a continuum going from left to right between "no pain" (0 mm) and "the worst pain imaginable" (100 mm). However, for an appropriate pain intensity measurement, it is advised to standardize the scale into 10 levels (equally spaced by 10 mm) where a patient chooses from level 0 (no pain) to level 10 (unbearable pain). Additionally, this scale can be divided into four sections: no pain (0-4 mm), mild pain (5-44 mm), moderate pain (45-74 mm) and severe pain (75-100 mm).
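The VAS standardization described above can be expressed as two small helper functions; this is an illustrative sketch of the mm-to-level and mm-to-section mappings (the function names are ours, not part of the scale definition):

```python
def vas_level(mark_mm):
    """Standardize a mark on the 100 mm VAS line into the 11 discrete
    levels (0-10), each level spanning 10 mm."""
    if not 0 <= mark_mm <= 100:
        raise ValueError("VAS mark must lie on the 0-100 mm line")
    return min(int(round(mark_mm / 10)), 10)

def vas_category(mark_mm):
    """Map a VAS mark to the four coarse sections described above."""
    if mark_mm <= 4:
        return "no pain"
    if mark_mm <= 44:
        return "mild pain"
    if mark_mm <= 74:
        return "moderate pain"
    return "severe pain"

print(vas_level(37), vas_category(37))  # 4 mild pain
```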
Another common metric for estimating pain intensity at the frame level is the PSPI score. It builds on the Facial Action Coding System (FACS) [55], which consists of 44 facial Action Units (AUs), the smallest observable contractions of facial muscles. However, PSPI focuses on six specific actions associated with pain (see Figure 4): AU4 (Brow Lowerer), AU6 (Cheek Raiser), AU7 (Lid Tightener), AU9 (Nose Wrinkler), AU10 (Upper Lip Raiser) and AU43 (Eyes Closed). After measuring these six AUs on a scale from 0 to 5 (except AU43, which is measured as 0 when the eye is open or 1 when the eye is closed), PSPI performs a linear combination of their intensities following Equation (1):

PSPI = AU4 + max(AU6, AU7) + max(AU9, AU10) + AU43 (1)

Originally, PSPI scores were set on a scale ranging from 0 to 16 [54], although in several research studies they are often standardized to only four levels (0, 1, 2 and ≥3) [56]. Other less-used pain intensity scales have been proposed, such as the Numeric Rating Scale (NRS) [57], a simple 0-to-10 scale where 0 means "no pain" and 10 means "the worst pain imaginable", as well as the Faces Pain Scale-Revised (FPS-R) [58], which helps the patient define their pain by choosing between six cartoon faces ranging from happy (no pain) to crying (worst pain). This scale is often used for children or people having difficulty using a numeric scale.
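The PSPI combination referenced as Equation (1) can be sketched as a short function; it follows the definition in [54] (AU4 plus the maxima of the AU6/AU7 and AU9/AU10 pairs, plus the binary AU43 term), with a second helper for the four-level standardization used in [56]:

```python
def pspi(au4, au6, au7, au9, au10, au43):
    """Prkachin and Solomon Pain Intensity score.

    au4, au6, au7, au9, au10 are FACS intensities on a 0-5 scale;
    au43 (eyes closed) is binary: 0 (open) or 1 (closed).
    The result lies on the original 0-16 scale.
    """
    return au4 + max(au6, au7) + max(au9, au10) + au43

def pspi_4level(score):
    """Standardize a 0-16 PSPI score into the four levels
    (0, 1, 2, >=3) often used in the literature."""
    return min(score, 3)

print(pspi(5, 5, 4, 5, 2, 1))  # 16, the maximum possible score
```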

Publicly Accessible Pain Assessment Datasets
The availability of many facial image and/or video datasets for pain assessment has driven recent advances in the field of automatic pain assessment. The UNBC-McMaster Shoulder Pain Expression Archive Database (UNBC-McMaster) [44] is one of the most widely utilized of these datasets [1][2][3][4][5][6][7][8][9][10][11][12][13][14][15][16][17][18][19][20][21]. This dataset was gathered from 25 adult participants suffering from shoulder pain and comprises 48,398 RGB frames issued from 200 variable-length videos (see details in Table 1). In this database, images are mainly labeled into 17 PSPI levels (0-16) and 11 VAS levels (0-10). However, in some studies, the PSPI level is normalized into four to six levels [13,19] and four VAS levels. Sample images from the UNBC-McMaster database with the corresponding PSPI levels can be seen in Figure 5. Indeed, the UNBC-McMaster dataset, like many other image datasets, suffers from the imbalanced data problem: more than 80% of the dataset has a PSPI score of zero (meaning "no pain") [3]. So, data balancing has been conducted in many works [3,16,19] by applying an under-resampling technique to shrink the no-pain class. As a result, the UNBC-McMaster dataset size was reduced from 48,398 frames to 10,783 frames. Furthermore, another widely used dataset [6,8,20,45,59] is the BioVid Heat Pain dataset [45,59], which was formed by collecting 17,300 RGB videos of 5 s each, at a frame rate of 25 fps, from 87 subjects. The pain represented in these videos was induced by heat stimulation. Additionally, four pain levels are present in this dataset (from 1 to 4).
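The under-resampling of the dominant no-pain class described above can be sketched as follows; this is a generic illustration with a toy `keep` budget and synthetic labels, not the exact protocol of the cited works:

```python
import numpy as np

def undersample_no_pain(labels, keep, seed=0):
    """Return indices that keep every pain frame but only `keep` randomly
    chosen frames of the dominant no-pain class (PSPI == 0)."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    no_pain = np.flatnonzero(labels == 0)
    pain = np.flatnonzero(labels != 0)
    kept_no_pain = rng.choice(no_pain, size=min(keep, no_pain.size),
                              replace=False)
    return np.sort(np.concatenate([pain, kept_no_pain]))

# Toy labels mimicking the imbalance described above: ~90% no-pain frames.
labels = np.array([0] * 900 + [1, 2, 3] * 33 + [0])
idx = undersample_no_pain(labels, keep=100)
print(len(idx))  # 199: all 99 pain frames plus 100 retained no-pain frames
```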
In addition, another less-used pain assessment database is the Multimodal Intensity Pain (MIntPAIN) database [16,46], which contains 9366 variable-length videos comprising 187,939 frames issued from 20 subjects. Each video sequence is recorded in three modalities: RGB, depth and thermal. The pain, elicited using controlled electrical stimulation, is labeled into five pain levels (from 0 to 4).
More pain assessment datasets have been developed, such as X-ITE [60] and EmoPain [61]. Yet, their use in evaluating pain assessment methods is still limited, and many of them lack diversity with regard to subject age and gender, as well as poses, occlusions and lighting conditions. Data balancing is also required for some datasets.

Criteria for Performance Evaluation
To efficiently assess and compare the different automatic pain assessment methods, it is crucial to evaluate them on the same benchmarks while adopting common performance evaluation criteria based on accurate pain intensity levels. In fact, the two main performance evaluation criteria used for pain assessment methods are classification accuracy [11] and Mean Square Error (MSE) [3], which measure how accurate a model's predictions are. A learning model indeed strives for both high accuracy and low MSE.
Following Equation (2), the accuracy metric compares the ground-truth pain intensities (i.e., PSPI, VAS scores, etc.) of the test samples to the pain intensities predicted by the method:

Accuracy = (number of test samples with correctly predicted pain intensity) / (total number of test samples) (2)

However, as can be seen from Equation (3), the MSE measures the average squared difference between the predicted values (ŷ_i) and the actual values (y_i):

MSE = (1/n) Σ_{i=1}^{n} (y_i − ŷ_i)² (3)
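A direct NumPy rendering of the two criteria, with a toy frame-level example:

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Fraction of test samples whose predicted pain intensity
    matches the ground truth (Equation (2))."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.mean(y_true == y_pred)

def mse(y_true, y_pred):
    """Average squared difference between predicted and actual
    pain intensities (Equation (3))."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_pred - y_true) ** 2)

# Toy check on five frames labeled with PSPI levels.
truth = [0, 1, 3, 2, 0]
pred = [0, 1, 2, 2, 1]
print(accuracy(truth, pred))  # 0.6  (3 of 5 correct)
print(mse(truth, pred))       # 0.4  ((1 + 1) / 5)
```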

Facial Information-Based Pain Assessment Methods
One of the most trustworthy data sources for techniques estimating pain severity is facial expressions. In fact, brow furrowing, jaw clenching, lip biting, the degree of eye closure and other facial expressions are often used to convey a patient's level of pain. Thus, it is promising to develop a system that can precisely identify the pain intensity level by extracting the most relevant pain-related facial information. Facial expressions can be either dynamic (derived from the video's temporal dimension) or static (derived from the face image). However, the main focus of this review is on recent studies that estimated patient pain using static facial expressions.
Indeed, with the success of machine learning algorithms in learning and predicting the pain intensity degree, static facial expression-based methods have largely deployed them in recent years. So, as mentioned in the methodology in Section 2 and based on the learning methodology, these methods fall into three categories (see Figure 1): machine learning, deep learning and hybrid model-based methods.
In general, most of the methods have followed the flowchart illustrated in Figure 2 with different inputs, pre-processing techniques and model architectures.

Machine-Learning-Based Methods
The baseline of classical machine learning models (e.g., SVM, KNN, etc.) is to classify pre-extracted hand-crafted features. The main limitation of these models is that their performance is highly dependent on the quality and pertinence of the extracted features, which require domain expertise. Nevertheless, they have allowed the automation of image classification and computer vision tasks [39]. For the spatial facial expression-based pain assessment task, machine learning-based methods [44,56] have been widely used during the last fifteen years due to their encouraging results (see Table 2), whereas, with the emergence of more effective models like deep learning and hybrid models, their use for this task has recently become limited [1][2][3][4].
For instance, in [1], a relatively shallow CNN architecture with three convolutional layers was proposed. This computationally efficient network with few parameters obtained an accuracy of 93.34% when evaluated on the UNBC-McMaster dataset.
Additionally, in [2], a hierarchical network architecture was proposed, where two per-frame feature modules are employed. The first module extracts low-level features from the image patches and assembles them using second-order pooling. The second module extracts deep learning features from the image using a deep CNN. Then, the output face representations of the two modules are weighted and combined to form a holistic representation that boosts the pain estimation process. The resulting feature is afterward classified with a linear L2-regularized L2-loss Support Vector Regressor (SVR) to predict the pain intensity. An MSE of 1.45 was obtained when evaluated on the UNBC-McMaster dataset.
Furthermore, transfer learning is increasingly adopted in image classification works. The main idea is to pre-train a model for a specific task so that it acquires knowledge from it. Then, this model is re-used for another task, where it is fine-tuned on that task's data, improving model performance and decreasing the training time. For instance, in [3], a pre-trained DenseNet-161 model [62] was retrained on the UNBC-McMaster dataset. Then, features were extracted from ten middle layers of the fine-tuned network and used as inputs to a Support Vector Regression (SVR) classifier. The evaluation of this model on the UNBC-McMaster dataset obtained an MSE of 0.34. Moreover, in [4], a KNN-based pain assessment method was proposed. This method extracts facial features from face patches using a pre-trained DarkNet19 [63] model and selects the most informative features with the iterative neighborhood component analysis (INCA) technique. Then, the resulting features are classified with the KNN algorithm to efficiently predict the pain intensities. This method achieved a pain intensity estimation accuracy of 95.57% on the UNBC-McMaster dataset.
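The two-stage pipeline described above (a frozen pre-trained feature extractor followed by a shallow regressor trained on its features) can be sketched as below. For a dependency-free illustration, a fixed random projection stands in for the pre-trained DenseNet/DarkNet backbone and a least-squares fit stands in for the SVR; only the structure of the pipeline, not the actual models, is reproduced:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a frozen pre-trained backbone: a fixed random projection
# mapping flattened 32x32 face images to a compact 64-d feature vector.
W_frozen = rng.normal(size=(32 * 32, 64))

def extract_features(images):
    """'Transfer' step: run images through the frozen extractor only;
    its weights are never updated."""
    flat = images.reshape(len(images), -1)
    return np.maximum(flat @ W_frozen, 0.0)  # ReLU-like nonlinearity

# Toy data: 200 grayscale faces with continuous pain intensities (0-10).
images = rng.normal(size=(200, 32, 32))
pain = rng.uniform(0, 10, size=200)

feats = extract_features(images)

# Stage 2: fit only the small head (least squares here, SVR in [3]).
coef, *_ = np.linalg.lstsq(feats, pain, rcond=None)
pred = feats @ coef
print(pred.shape)  # (200,)
```

Because only the head is trained, the expensive backbone weights are reused unchanged, which is what cuts the training time.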

Deep Learning-Based Methods
According to the collected research papers and as reported in [40], deep learning models have rapidly taken the lead in automatic pain assessment since 2018. This is principally due to the success of deep learning models in data classification and the availability of large pain datasets. Most deep learning-based pain assessment studies [7][8][9][10]12] have utilized a variation of the successful Convolutional Neural Network (CNN) [49] (see Table 3). Many of them have exploited more sophisticated recent deep learning models (e.g., ResNet, DenseNet and InceptionV3) [5,6,11,13].
In [12], an improved version of [1] was proposed, where a compact shallow CNN model for pain severity assessment (SPANET), including a false positive reduction method, was presented. In an unrestricted hospital setting, the proposed SPANET method demonstrated strong performance, with a pain intensity estimation accuracy of 97.48% on the UNBC-McMaster dataset.
In addition, to overcome possible limitations in the original pain ground truth, introduced by the subjects themselves or by the data annotation experts, [5] employed seven experts to re-annotate the UNBC-McMaster dataset. Then, they translated the frames into an illumination-invariant 3D space using the multidimensional scaling method in order to feed a pre-trained AlexNet [49] model. This method obtained an accuracy of 80% on the UNBC-McMaster dataset. Moreover, in [7], to focus on the pain-related face regions, an attention mechanism was incorporated into a nine-layer CNN with a Softmax loss function to assign different weights to each region according to its pain expressiveness. The integration of this attention mechanism improved the prediction accuracy to 51.1%. Afterward, a more powerful multi-task pain assessment architecture was proposed in [8], which consists of a locality and identity-aware network (LIAN). This network first applies a dual-branch locality-aware module to highlight the information about the facial regions connected to pain. Then, an identity-aware module (IAM) is employed to decouple the pain assessment and identity recognition tasks, which can achieve identity-invariant pain assessment. As can be seen in Table 3, the outcomes demonstrate that this approach delivers a good performance on UNBC-McMaster (an accuracy of 89.17%). Furthermore, a similar pain locality-based method was suggested by Cui et al. [10]. They created a multi-scale regional attention network (MSRAN) that uses adaptive learning to identify the degree of pain by capturing information about the facial pain regions. To highlight the facial areas associated with pain and their relationships, self-attention and relation attention modules were incorporated. With this method, a pain estimation accuracy of 91.13% was obtained on the UNBC-McMaster dataset.
Additionally, evaluated on the UNBC-McMaster dataset, Rathee et al. [9] suggested a CNN-based pain assessment approach with an estimation accuracy of 90.3%. This technique uses an improved version of the Model Agnostic Meta-Learning (MAML++) module, which aids in efficiently initializing the CNN model parameters so that it can rapidly converge to optimal performance with the fewest images and the least amount of time. When determining the degree of pain in new subjects, this method has shown good results. In addition, a customized and deeper CNN model based on the VGG16 architecture was proposed in [11]. Using the UNBC-McMaster dataset, this modified VGG16 model yields a 92.5% accuracy in estimating pain intensity. To reach this compelling result, the model was fed with pre-processed face images involving gray-scaling, histogram equalization, face detection, image cropping, mean filtering and normalization.
Recently, on the UNBC-McMaster dataset, the method described in [13] produced the best-reported pain intensity estimation accuracy (99.1%). In this method, two concurrent deep CNNs (InceptionV3 models [64] with an SGD optimizer) are used to extract the relevant features, with all convolutional blocks frozen and the classifier layer replaced by a shallow CNN. The resulting outputs are concatenated and sent to a dense layer, then to a fully connected layer to classify the data.
Compared to the spatial facial expression-based pain assessment works evaluated on the UNBC-McMaster dataset, fewer methods have been examined on the BioVid dataset [6,8] (see Table 3). Apart from the previously reported accuracy result (40.4%) reached in [8], Dragomir et al. [6] presented a ResNet-based pain intensity estimation model, optimizing the model hyper-parameters and trying several strategies for data augmentation. On the BioVid dataset, the model's accuracy in estimating pain intensity was found to be 36.6%.
Based on these reported methods, it can be concluded that deep learning models have performed exceptionally well when fed with static facial expressions for the task of estimating pain severity (see Table 3). However, more recent deep learning models have to be evaluated for pain assessment and more model validation has to be conducted on more challenging pain datasets. In addition, it should be mentioned that several model fusion attempts have been proposed to get the most benefit from the combined models [14][15][16][17][18][19][20][21].

Hybrid Model Methods
Due to the success of machine learning and deep learning models in the automatic pain assessment task, several ensemble learning methods [14][15][16][17][18][19][20][21] have been developed in an attempt to combine the efficiency of multiple models (see Table 4). Semwal and Londhe [14] proposed an Ensemble of Compact Convolutional Neural Networks (ECCNET) model that combines three CNNs (VGG-16 [51], MobileNet [65] and GoogleNet [66]) while aggregating their predictions using the average ensemble rule. The experiments demonstrate that merging the CNNs leads to better classification performance than using them individually. As a result, an accuracy of 93.87% was reached on the UNBC-McMaster dataset. Afterward, the CNN fusion technique was re-used in a follow-up study by the same authors [15] to combine three CNNs into a single model: a cross-dataset Transfer Learning VGG-based model (VGG-TL), an Entropy Texture Network (ETNet) and a Dual Stream CNN (DSCNN). In fact, three distinct image features (RGB features, an entropy-based texture feature and a complementary feature learned from both of them) were combined and fed into the suggested pain assessment model to improve its generalization. Furthermore, a range of data augmentation methodologies were applied to the dataset in an effort to mitigate model overfitting. As a result, using the same UNBC-McMaster dataset, pain level detection accuracy increased by 2.13% compared to [14], reaching 96%.
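The average ensemble rule mentioned above can be sketched in a few lines; the toy probability vectors below are illustrative stand-ins for the softmax outputs of the three CNNs, not values from the paper:

```python
import numpy as np

def average_ensemble(prob_list):
    """Average ensemble rule: mean the per-model class-probability
    vectors, then pick the class with the highest averaged score."""
    avg = np.mean(np.stack(prob_list), axis=0)
    return avg.argmax(axis=-1)

# Toy softmax outputs of three models over 4 pain classes for 2 frames.
p1 = np.array([[0.7, 0.1, 0.1, 0.1], [0.2, 0.5, 0.2, 0.1]])
p2 = np.array([[0.6, 0.2, 0.1, 0.1], [0.1, 0.2, 0.6, 0.1]])
p3 = np.array([[0.5, 0.3, 0.1, 0.1], [0.1, 0.1, 0.7, 0.1]])
print(average_ensemble([p1, p2, p3]))  # [0 2]
```

Note how the second frame is assigned class 2 even though one model voted for class 1: averaging smooths out individual-model errors, which is the intuition behind the reported accuracy gain.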
Furthermore, in [16], an ensemble deep learning model (EDLM) for pain assessment was proposed. This model uses a fine-tuned VGGFace to extract facial features, followed by the Principal Component Analysis (PCA) algorithm to reduce the feature dimension while retaining the most informative features, which helps in decreasing the training time. Then, three independent CNN-RNN deep learners with different weights are used to classify the extracted facial features. This model achieved a satisfactory accuracy of 86% on the UNBC-McMaster dataset and 92.26% on the MIntPain dataset.
Several methods have focused on the face regions that are most affected by pain while discarding image background information, which disturbs pain intensity detection. So, after detecting the face regions (left eye, right eye, nose and mouth), Huang et al. [17] applied a multi-stream CNN of four sub-CNNs to extract features from these four face regions. Then, these features were classified to estimate the pain intensity while assigning a learned weight to each of them, proportional to their contribution to the pain expression. This method was evaluated on the UNBC-McMaster dataset and yielded an accuracy of 88.19%. Afterwards, in a further study [18], a hierarchical deep network (HDN) architecture was described. Within HDN, two scale branches are implemented: a region-wise branch intended to extract characteristics from face image regions related to pain, and a global-wise branch that investigates the inter-dependencies of pain-associated areas. Indeed, in the region-wise branch, a multi-stream CNN is used to extract local features while, in the global-wise branch, a pre-trained CNN is used to extract holistic features from the face. Additionally, a multi-task learning technique is used in the global-wise branch to identify action units and estimate pain intensity. Ultimately, a decision level fuses the pain estimation outputs of the two branches. In fact, it is empirically demonstrated that the proposed HDN performs satisfactorily and provides an MSE of 0.49 when evaluated on the UNBC-McMaster dataset.
Likewise, Ye et al. [19] proposed a parallel CNN framework with regional attention focusing on the most important pain-sensitive face regions. Their method merged a VGGNet [51] and a ResNet [50] model to extract the facial features and a SoftMax classifier to classify them. The method obtained an accuracy of 95.11% on the UNBC-McMaster dataset. In addition, in [20], a model that overcomes the challenges posed by full left and right profiles was proposed. This model utilizes Sparse Autoencoders (SAEs) to reconstruct the pain-affected upper-face part from the input image. Then, two pre-trained, concurrent and coupled CNNs are fed the reconstructed upper-face part as well as the original image. Indeed, this Sparse Autoencoders for Facial Expressions-based Pain Assessment (SAFEPA) approach produces better identification performance by placing greater emphasis on the upper part of the face. Furthermore, SAFEPA's architecture makes use of the advantages of CNNs while also accounting for differences in head positions, which removes the requirement for the pre-processing steps needed for face detection and upper-face extraction in other models. Using the widely established UNBC-McMaster dataset, SAFEPA achieves a good accuracy of 89.93%, while reaching an accuracy of 33.28% on the BioVid dataset (see Table 4).
More recently, Sait and Dutta [21] proposed an ensemble learning model with a ShuffleNet V2 model [67], which is fine-tuned for feature extraction through the application of class activation map and fusion feature approaches. Then, following a stacking ensemble learning strategy, XGBoost and CatBoost are used as base models, followed by an SVM as a meta-learner to predict the pain intensities. An accuracy of 98.7% on the UNBC-McMaster dataset proves the reliability of the proposed method and the possibility that it can be deployed in healthcare centers.
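The stacking idea can be sketched in a few lines of NumPy: base-model outputs become the input features of a meta-learner. For a self-contained illustration, the base models below are simple noisy predictors and the meta-learner is a least-squares fit, stand-ins for the XGBoost/CatBoost base models and the SVM meta-learner of the actual method.

```python
import numpy as np

def fit_stacking(base_preds, y):
    """Fit a least-squares meta-learner on stacked base-model scores.

    base_preds: (n_samples, n_base) matrix of base-model outputs.
    y:          (n_samples,) target pain intensities.
    """
    X = np.column_stack([base_preds, np.ones(len(y))])  # add bias term
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def predict_stacking(base_preds, w):
    X = np.column_stack([base_preds, np.ones(len(base_preds))])
    return X @ w

# Toy data: two noisy base models around a true intensity signal
rng = np.random.default_rng(1)
y = rng.uniform(0, 5, size=200)
base = np.column_stack([y + rng.normal(0, 0.3, 200),
                        y + rng.normal(0, 0.5, 200)])
w = fit_stacking(base, y)
y_hat = predict_stacking(base, w)
mse = float(np.mean((y - y_hat) ** 2))
```

Because the meta-learner weights the base predictions by their reliability, the stacked estimate fits the targets at least as well as either base model alone, which is the motivation for the stacking strategy.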
Table 4 illustrates how, by aggregating model performances, hybrid model-based techniques are effectively challenging deep learning-based techniques. More work is still required to effectively combine the models' advantages in order to gain a performance edge without increasing method complexity or computation time.

Discussion
Based on our search for spatial facial expression-based pain assessment methods and after meticulous paper scanning and filtering, twenty-one papers were selected: three machine learning-based methods, ten deep learning-based methods and eight hybrid model-based methods. As previously mentioned in Section 1, different modalities have been used for pain assessment, with facial expressions being the most effective of them.
That is what led us to focus on facial expression-based methods and, more precisely, on the approaches using spatial face information.

Result Analysis
After studying the three learning approaches, we can conclude that the deep learning models [5][6][7][8][9][10][11][12][13] are more effective (see Table 3) than the classical machine learning models [1][2][3][4] (see Table 2). This can be explained by the power of deep learning models to extract the most relevant features, classify them and coordinate the feature extraction and classification parts through the backpropagation process. In addition, deep learning models have leveraged the availability of large pain datasets and efficient computational capacities. Likewise, the hybrid model performances [14][15][16][17][18][19][20][21] have proven the success of merging several deep and machine learning techniques for pain assessment (see Table 4). The promising results show that the hybrid models seriously compete with the DL-based models.
In addition, it was noticed that several strategies and techniques have significantly improved the methods' efficiency. Various methods [2,7,8,10,[17][18][19][20] have focused on the pain-related face parts to extract the most pertinent pain features. Moreover, many methods [3][4][5]15,16,18,20,21] benefited from the transfer learning strategy and utilized pre-trained models to speed up the training process and obtain better model performance. A local feature relations attention module was also incorporated in some methods [10,18] to take feature relationship information into account. Furthermore, for some methods, data augmentation was employed to enlarge the dataset size [6,15] and the dataset pain ground truth was enhanced by means of experts [5], which increased the accuracy results. Additionally, pre-processed images were supplied to the model in [11] to aid the extraction of the most valuable pain features.
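As a concrete example of the data augmentation mentioned above, the sketch below doubles a batch of face images by adding horizontally flipped copies, one of the simplest label-preserving augmentations for roughly symmetric faces. The batch shape and the choice of flipping are illustrative; the reviewed methods' exact augmentation pipelines are not specified here.

```python
import numpy as np

def augment_flip(images):
    """Double a batch of face images by appending horizontally
    flipped copies; pain labels are unchanged by a flip, so this
    is a cheap way to enlarge small pain datasets."""
    flipped = images[:, :, ::-1]  # reverse the width axis
    return np.concatenate([images, flipped], axis=0)

# Toy batch of 2 grayscale "face" images of 4x4 pixels
batch = np.arange(2 * 4 * 4, dtype=float).reshape(2, 4, 4)
aug = augment_flip(batch)
```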
Additionally, several model-boosting techniques were incorporated to enhance model performance, such as the MAML++ algorithm for efficient model weight initialization [9], the false positive reduction technique [12], as well as the INCA [4] and PCA [16] algorithms, which are used for feature selection.
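To illustrate the feature-reduction step, here is a minimal NumPy sketch of plain PCA, projecting feature vectors onto their top-k principal components. This is standard PCA for illustration only; the INCA variant used in [4] follows a different (iterative neighborhood component analysis) selection scheme.

```python
import numpy as np

def pca_reduce(X, k):
    """Project features onto the top-k principal components
    (directions of largest variance) of the sample covariance."""
    Xc = X - X.mean(axis=0)                  # center the features
    cov = (Xc.T @ Xc) / (len(X) - 1)         # sample covariance matrix
    vals, vecs = np.linalg.eigh(cov)         # ascending eigenvalues
    order = np.argsort(vals)[::-1][:k]       # largest variance first
    return Xc @ vecs[:, order]

# Toy features: 100 samples, 8 dimensions with unequal variances
rng = np.random.default_rng(2)
X = rng.normal(size=(100, 8)) * np.array([5, 3, 1, 1, .5, .5, .1, .1])
Z = pca_reduce(X, 2)  # keep only the 2 most informative directions
```

Discarding the low-variance directions shrinks the classifier's input from 8 to 2 features while retaining most of the signal, which is the intent of feature selection in the reviewed methods.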

Automatic Pain Assessment Challenges
In practice, several non-pain-related factors can affect a patient's facial expressions and lead to a mistaken pain intensity measurement, such as the external environment (pain distraction factors, weather, etc.), psychological factors, ethnicity, gender, region, patient sensitivity to pain (central sensitization), surprise/astonishment, previous experiences, or even painkillers/drugs in medical settings or intensive care. This makes the development of a generalizable pain intensity estimation system a difficult task. So, it is essential to consider the specific context in which an automated pain assessment system will be used to ensure high accuracy and avoid these confounding effects.
In addition, for the same pain intensity and depending on the pain type (postoperative, acute, chronic, etc.), a patient's facial expressions can differ. Indeed, in chronic pain, which is more challenging than acute pain, the patient gets used to the pain, so their facial expressions are less intense. A pain type-customized system may therefore be a promising solution.
Furthermore, further studies are needed to encourage the use of automatic pain assessment technology in clinical settings and to analyze its cost-effectiveness.

Limitations of This Review
Through our paper search methodology, we tried to collect all recent and relevant papers from many trustworthy research databases. Then, we did our best to exhibit the current state of research in automated pain assessment from spatial facial expressions using machine learning methods, identifying trends, capabilities, limitations, potential healthcare applications and knowledge gaps. However, we could not access five relevant papers because of university subscription limitations.
Moreover, it was not easy to conduct a deep comparative study between the reviewed studies, since they were evaluated on different pain datasets (or a subset of a dataset) with varying pain intensity levels and used several performance evaluation criteria and different cross-validation techniques. In addition, most of them did not provide sufficient details, such as the use of data augmentation, validation strategies (cross-validation, K-fold, etc.) and model parameter optimization techniques.

Conclusions
This study was designed to review recent spatial facial expression-based pain assessment methods using machine learning models in the broad sense. It is an attempt to support the ongoing efforts in leveraging machine learning technologies in the healthcare field, mainly for pain assessment and management. Several research works were reviewed and analyzed. Their very promising capabilities have confirmed that the automation of the pain assessment task is essential and that such methods can be employed for broader real-time applications in medical diagnosis and health informatics.
However, despite the power of automatic deep learning and hybrid model-based pain assessment methods, more effort is still needed to improve model efficiency. Indeed, since model fusion has given promising results, it would be beneficial to combine several well-performing models (e.g., the vision transformer) following a hierarchical strategy, while keeping the architecture simple and the training time acceptable, and to validate the result on larger facial image datasets, accurately labeled and collected from more subjects of diverse ethnicities. In addition, it is advisable to effectively leverage the method enhancement techniques already in use (pain-related feature extraction, incorporating feature interdependencies, feature selection, image pre-processing, transfer learning, data augmentation, model parameter optimization, etc.).
Additionally, another future research direction is to exploit multiple pain modalities (facial, voice, physiological, behavioral, etc.), which may enable other pain-related tasks such as pain location and cause recognition. Furthermore, for long-term pain scenarios (e.g., chronic pain), it is promising to include the temporal dimension of the facial expressions to extract the facial dynamics, since spatio-temporal facial features are more expressive for pain. For instance, pain-related face landmarks can be used to form a spatio-temporal graph, or several graphs, one for each pain-related face region. A graph neural network (GNN) model [68] may then be trained on these graphs for accurate pain intensity estimation.
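The spatio-temporal landmark graph suggested above can be sketched as follows: each frame repeats the facial-landmark graph (spatial edges), and each landmark is additionally linked to itself in the next frame (temporal edges). The landmark count, edge list and two-frame setup are toy assumptions; a real GNN pipeline would build such adjacency matrices from detected landmarks over a video clip.

```python
import numpy as np

def build_st_graph(n_landmarks, spatial_edges, n_frames):
    """Adjacency matrix of a spatio-temporal landmark graph.

    Node index = frame * n_landmarks + landmark. Spatial edges are
    repeated in every frame; temporal edges connect each landmark
    to its copy in the next frame.
    """
    n = n_landmarks * n_frames
    A = np.zeros((n, n), dtype=int)
    for t in range(n_frames):
        off = t * n_landmarks
        for i, j in spatial_edges:                 # intra-frame edges
            A[off + i, off + j] = A[off + j, off + i] = 1
        if t + 1 < n_frames:                       # inter-frame edges
            nxt = (t + 1) * n_landmarks
            for i in range(n_landmarks):
                A[off + i, nxt + i] = A[nxt + i, off + i] = 1
    return A

# Toy eyebrow region: 3 chained landmarks tracked over 2 frames
A = build_st_graph(3, [(0, 1), (1, 2)], 2)
```

A GNN would then propagate features along both edge types, so that pain-related motion of a landmark (e.g., brow lowering) is captured jointly with its spatial context.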
Furthermore, it is anticipated that explainable AI systems will be created to aid decision-making, so that medical professionals can better interpret and manage pain.

Table 1 .
Details of the pain assessment datasets.

Table 2 .
Summary of machine learning-based pain assessment methods using spatial facial expressions and their performances on the UNBC-McMaster dataset. "ACC" refers to accuracy and "MSE" to Mean Square Error.

Table 3 .
Summary of deep learning-based pain assessment methods using spatial facial expressions and their performances on the UNBC-McMaster and BioVid pain datasets. "ACC" refers to accuracy and "MSE" to Mean Square Error.

Table 4 .
Summary of the hybrid model-based pain assessment methods using spatial facial expressions and their performances on the UNBC-McMaster, BioVid and MIntPain pain datasets. "ACC" refers to accuracy and "MSE" to Mean Square Error.