A Multimodal Pain Sentiment Analysis System Using Ensembled Deep Learning Approaches for IoT-Enabled Healthcare Framework
Abstract
1. Introduction
- The proposed work develops a pain sentiment analysis system based on non-verbal communication (image or video) by utilizing multiple top-down deep learning models built on convolutional neural network (CNN) architectures. These models extract discriminative features from facial regions, enhancing feature representation for detecting varying levels of pain intensity.
- The proposed work also focuses on verbal communication (audio or speech) for pain sentiment analysis by employing handcrafted audio features as input, followed by deep learning models using CNN architectures to extract meaningful features for identifying different pain intensity levels.
- Performance improvements are achieved through extensive experimentation, including comparisons between batch processing and epoch cycles, data augmentation, progressive image resizing, hyperparameter tuning, and the application of pre-trained models via transfer learning.
- Post-classification fusion techniques are applied to enhance system accuracy by integrating classification scores from multiple deep learning models. This approach addresses challenges arising from variations in age, pose, lighting conditions, and noisy artefacts.
- Finally, a multimodal pain sentiment analysis system is developed by combining the results of image-based and audio-based models. This integration improves the system’s performance, enabling more accurate recognition of pain intensity and supporting real-time decision making in patient care.
2. Related Work
3. Proposed Methodology
3.1. Image-Based PSAS
3.2. Image-Based Feature Representation Using Proposed Deep Learning Architectures
- Convolutional layer: This layer applies convolution to extract features from the input image. Non-linearity is introduced using an activation function, and the generated feature maps depend on learnable weights and biases.
- Max-pooling layer: Max pooling reduces the size of feature maps by selecting the maximum value within each patch, thereby lowering computational complexity, maintaining translation invariance, and improving feature discrimination.
- Batch normalization layer: This layer normalizes intermediate outputs, stabilizing and speeding up training. It also allows the use of higher learning rates effectively.
- Dropout layer: Acting as a regularization technique, dropout prevents overfitting by randomly deactivating neurons during training, encouraging the network to generalize better.
- Flatten layer: This layer converts multi-dimensional feature maps into a one-dimensional vector, bridging the convolutional layers and the fully connected layers.
- Fully connected layers: These layers aggregate the extracted features into a final vector, with the size corresponding to the number of class labels. The output is used for classification.
- Output layer: The final layer employs the Softmax function to perform classification and determine the output probabilities for each class.
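The block structure listed above can be summarized in code. The following is a minimal sketch assuming a Keras/TensorFlow implementation; the filter widths follow the seven-block configuration tabulated later for the 128 × 128 input, while the dropout rates and optimizer are illustrative assumptions rather than the exact settings of the proposed models.

```python
# Minimal sketch of the layer blocks described above, assuming a Keras/TensorFlow
# implementation; filter widths follow the seven-block configuration for the 128 x 128
# input, while dropout rates and the optimizer are illustrative assumptions.
from tensorflow.keras import layers, models

def conv_block(x, filters, dropout_rate=0.25):
    """Convolution -> max pooling -> batch normalization -> ReLU -> dropout."""
    x = layers.Conv2D(filters, (3, 3), padding="same")(x)
    x = layers.MaxPooling2D((2, 2))(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)
    x = layers.Dropout(dropout_rate)(x)
    return x

def build_model(input_shape=(128, 128, 3), num_classes=5):
    inputs = layers.Input(shape=input_shape)
    x = inputs
    for filters in (30, 60, 90, 120, 120, 240, 512):   # seven convolutional blocks
        x = conv_block(x, filters)
    x = layers.Flatten()(x)                            # bridge to the fully connected part
    x = layers.Dense(512)(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)
    x = layers.Dropout(0.5)(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)  # Softmax output layer
    return models.Model(inputs, outputs)

model = build_model()
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()
```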
- $S_{\text{prod}}(c) = \prod_{m=1}^{M} s_m(c)$ (product of scores);
- $S_{\text{wsum}}(c) = \sum_{m=1}^{M} w_m\, s_m(c)$ (weighted sum of scores);
- $S_{\text{sum}}(c) = \sum_{m=1}^{M} s_m(c)$ (sum of scores),

where $s_m(c)$ denotes the classification score assigned to class $c$ by the $m$-th CNN model (with $M$ models in total) and $w_m$ is the weight assigned to that model.
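A minimal sketch of these three score-level fusion rules is given below, assuming NumPy arrays of per-class probability scores produced by two classifiers; the equal weights in the weighted-sum rule are an illustrative assumption.

```python
# Minimal sketch of the product, weighted-sum, and sum fusion rules over the per-class
# scores of two classifiers; the 0.5/0.5 weights are an illustrative assumption.
import numpy as np

def fuse_scores(s1, s2, rule="product", weights=(0.5, 0.5)):
    s1, s2 = np.asarray(s1), np.asarray(s2)            # shape: (num_samples, num_classes)
    if rule == "product":
        fused = s1 * s2                                 # product of scores
    elif rule == "sum":
        fused = s1 + s2                                 # sum of scores
    elif rule == "weighted_sum":
        fused = weights[0] * s1 + weights[1] * s2       # weighted sum of scores
    else:
        raise ValueError(f"unknown fusion rule: {rule}")
    return fused.argmax(axis=1)                         # fused class decision per sample

# Example with one sample and three pain-intensity classes.
scores_a1 = [[0.20, 0.50, 0.30]]
scores_a2 = [[0.10, 0.60, 0.30]]
print(fuse_scores(scores_a1, scores_a2, rule="product"))  # -> [1]
```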
3.3. Audio-Based PSAS
- Statistical audio features: Here, the audio signal first undergoes frequency-domain analysis using the fast Fourier transformation (FFT) [53]. The computed frequency components are then used to calculate descriptive statistics, including the mean, median, standard deviation, quartiles, and kurtosis, while the magnitudes of the frequency components are used to compute the energy and root mean square (RMS) energy. The FFT transforms the audio signal into frequency components that characterize properties such as tone, pitch, and spectral content; the descriptive statistics therefore give a general representation of these characteristics, whereas the energy and RMS energy distinguish low and high loudness intensities through variations in their magnitudes.
- Mel-frequency cepstral coefficients (MFCCs): These represent a well-known audio feature computation technique [54], in which the frequency components derived from the FFT undergo a log-amplitude transformation. The mel scale is then applied to the logarithm of the amplitude spectrum, discrete cosine transforms (DCTs) are applied to the mel-scaled spectrum, and coefficients 2–13 of the DCT are retained while the rest are discarded. The resulting MFCC features are compact (and therefore effective at discriminating audio signals between subjects) and are relevant to a wide range of audio-based recognition applications.
- Spectral features: Various spectral features, such as spectral contrast, spectral centroid, and spectral bandwidth, are computed from the audio files. These features [55] are derived from the spectrogram of each audio file, which represents frequency intensities over time and is obtained from the squared magnitude of the short-time Fourier transform (STFT) [56], computed by applying an FFT over successive signal frames. Such features capture the richness and harmonicity of the audio signals.
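The three feature groups above can be computed with standard signal-processing tooling. The sketch below assumes the librosa library (cited in the references) together with NumPy/SciPy; the file name and the exact set of statistics retained are illustrative assumptions.

```python
# Minimal sketch of the handcrafted audio features described above, assuming librosa and
# NumPy/SciPy; "pain_clip.wav" and the exact statistics kept are illustrative assumptions.
import numpy as np
import librosa
from scipy import stats

y, sr = librosa.load("pain_clip.wav", sr=None)

# (1) Statistical features over the FFT magnitude spectrum.
mag = np.abs(np.fft.rfft(y))
statistical = [mag.mean(), np.median(mag), mag.std(),
               *np.percentile(mag, [25, 50, 75]),       # quartiles
               stats.kurtosis(mag),
               np.sum(mag ** 2),                         # energy
               np.sqrt(np.mean(mag ** 2))]               # RMS energy

# (2) MFCCs: compute 13 coefficients and keep coefficients 2-13 (the first is discarded).
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)[1:]   # shape: (12, num_frames)
mfcc_features = mfcc.mean(axis=1)                         # frame-averaged compact vector

# (3) Spectral features derived from the STFT-based spectrogram.
spectral = [librosa.feature.spectral_centroid(y=y, sr=sr).mean(),
            librosa.feature.spectral_bandwidth(y=y, sr=sr).mean(),
            librosa.feature.spectral_contrast(y=y, sr=sr).mean()]

feature_vector = np.hstack([statistical, mfcc_features, spectral])
print(feature_vector.shape)
```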
4. Experiments
4.1. Used Databases
4.2. Results for Image-Based PSAS
- Batch size and epoch count experiment: This experiment is centered on parameter learning, focusing on the batch size and the epoch count, which are critical factors for the performance of any CNN architecture. Both have a substantial impact on how well the network learns from the provided training samples. One of the primary tasks in deep CNN models is optimizing the learning of the weight parameters; to improve this process, this study identifies a batch size of 16 combined with up to 1500 epochs as an effective trade-off for the proposed system. The results of the experiment comparing the effects of batch size and epoch count on system performance are presented in Figure 9 for the UNBC two-class dataset. As shown in the figure, performance improves with a batch size of 16 when the epoch count lies between 1000 and 1500.
- Multi-resolution image experiment: Here, experiments are conducted to incorporate multi-resolution facial images using progressive image resizing. This approach leverages multi-resolution image analysis, allowing images of two different sizes to be processed by the two proposed CNN architectures. Compared with a standard single-resolution setup, it offers several advantages: (i) networks can be trained with images of diverse dimensions, from low to high resolution; (ii) hierarchical feature representations are obtained for each image, leading to enhanced texture-pattern discrimination; and (iii) overfitting is reduced. The performance of multi-resolution image analysis in the proposed PSAS is illustrated in Figure 10 for the 2DFPE database (2 classes of pain level), the UNBC database (2 and 3 classes of pain level), and the BioVid Heat Pain dataset (5 classes of pain level). The figure demonstrates that the model effectively addresses overfitting and enhances the proposed system's overall performance.
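A minimal sketch of this two-resolution setup is given below, assuming a Keras data pipeline and reusing the build_model() function from the earlier sketch; the directory name, batch size, and epoch count are illustrative assumptions, and the sketch does not reproduce the wider filter configuration used by the second proposed architecture.

```python
# Minimal sketch of training the two proposed architectures on two image resolutions,
# assuming a Keras pipeline; build_model() comes from the earlier sketch, and the
# directory name, batch size, and epoch count are illustrative assumptions.
import tensorflow as tf

def make_dataset(directory, image_size, batch_size=16):
    return tf.keras.utils.image_dataset_from_directory(
        directory, image_size=image_size, batch_size=batch_size, label_mode="categorical")

ds_128 = make_dataset("train_images", (128, 128))   # lower-resolution stream (128 x 128 input)
ds_192 = make_dataset("train_images", (192, 192))   # higher-resolution stream (192 x 192 input)

model_a1 = build_model(input_shape=(128, 128, 3), num_classes=2)
model_a2 = build_model(input_shape=(192, 192, 3), num_classes=2)
for model, dataset in ((model_a1, ds_128), (model_a2, ds_192)):
    model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
    model.fit(dataset, epochs=100)
```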
- Transfer learning experiment: This experiment investigates the influence of transfer learning with fine-tuning on the performance of the two new CNN models trained on the same image data domain. Fine-tuning adjusts the weights of an existing CNN architecture, which reduces training time and improves results; transfer learning is a machine learning technique in which a model trained for one task serves as the basis for building a new model for a similar task. Two strategies are tested: (i) the first approach trains both proposed CNN architectures from scratch using their respective input image sizes; and (ii) the second approach retrains the models starting from pre-trained weights of both architectures. The results indicate that the retrained models outperform those trained from scratch. Table 6 reports the accuracy and F1-score of the CNN architectures under both strategies; the findings show that the second approach, retraining with pre-trained weights, improves the performance of the proposed PSAS (a minimal fine-tuning sketch is given after Table 6).
Dataset | A1 Acc. (%) | A1 F1-Score | A2 Acc. (%) | A2 F1-Score
---|---|---|---|---
Using approach (i): training from scratch | | | |
2DFPE (2 classes of pain level) | 75.27 | 0.7563 | 72.22 | 0.7041
UNBC (2 classes of pain level) | 83.11 | 0.8174 | 83.16 | 0.7918
UNBC (3 classes of pain level) | 82.44 | 0.8437 | 82.36 | 0.8213
BioVid (5 classes of pain level) | 34.11 | 0.3304 | 32.73 | 0.3016
Using approach (ii): retraining with pre-trained weights | | | |
2DFPE (2 classes of pain level) | 75.81 | 0.7533 | 76.14 | 0.7524
UNBC (2 classes of pain level) | 84.12 | 0.7916 | 83.53 | 0.7414
UNBC (3 classes of pain level) | 82.46 | 0.8334 | 82.37 | 0.8103
BioVid (5 classes of pain level) | 37.45 | 0.3571 | 36.89 | 0.3518
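The fine-tuning sketch referenced above is shown here, assuming Keras and reusing build_model() and make_dataset() from the earlier sketches; the checkpoint file name, the number of frozen layers, and the learning rate are illustrative assumptions rather than the exact settings of the proposed system.

```python
# Minimal sketch of retraining from pre-trained weights, assuming Keras; build_model()
# and make_dataset() come from the earlier sketches, and the checkpoint name, number of
# frozen layers, and learning rate are illustrative assumptions.
import tensorflow as tf

model = build_model(input_shape=(128, 128, 3), num_classes=2)
model.load_weights("pretrained_pain_cnn.weights.h5")     # weights saved from prior training

# Freeze the early convolutional blocks so that only the later layers are fine-tuned.
for layer in model.layers[:-8]:
    layer.trainable = False

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss="categorical_crossentropy", metrics=["accuracy"])
model.fit(make_dataset("train_images", (128, 128)), epochs=50)
```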
- Post-classification fusion experiment: In this experiment, various fusion techniques are applied to the classification scores produced by the two trained CNN architectures (A1 and A2). The classification results are obtained from the deep learning features, and a post-classification fusion approach is employed: the classification scores generated by each CNN model are combined using the weighted-sum rule, the product rule, and the sum rule. Table 7 presents the fused performance of the proposed system for the different combinations of these fusion methods. The results show that the system performs best with the product rule applied to the scores of both CNN architectures, surpassing the weighted-sum and sum rules. The proposed system achieves accuracy rates of 85.15% for the two-class UNBC dataset, 83.79% for the three-class UNBC dataset, and 77.41% for the two-class 2DFPE dataset, which are referenced in the subsequent section.
Method | 2DFPE (2-CPL) | UNBC (2-CPL) | UNBC (3-CPL) | BioVid (5-CPL)
---|---|---|---|---
Models trained from scratch | | | |
A1 | 75.27 | 83.11 | 82.44 | 34.11
A2 | 72.22 | 83.16 | 82.36 | 32.73
Sum rule (A1 + A2) | 75.56 | 83.47 | 82.51 | 34.67
Product rule (A1 × A2) | 76.67 | 84.11 | 83.42 | 35.13
Weighted-sum rule (A1, A2) | 76.37 | 84.11 | 83.15 | 35.23
Models retrained with pre-trained weights | | | |
A1 | 75.81 | 84.27 | 82.46 | 37.45
A2 | 76.14 | 83.53 | 82.37 | 36.89
Sum rule (A1 + A2) | 76.81 | 84.69 | 82.74 | 37.89
Product rule (A1 × A2) | 77.41 | 85.15 | 83.38 | 38.04
Weighted-sum rule (A1, A2) | 77.23 | 85.07 | 83.17 | 37.83
- Facial expressions from people of different ethnicities influence the performance of facial expression recognition systems. To address this, the proposed system incorporates feature representation methods specifically designed for non-verbal communication, which aim to enhance performance across diverse ethnic groups and age variations. While facial expressions are largely universal (encompassing seven universally recognized basic expressions [63]), the impact of ethnicity on recognition accuracy remains a significant challenge, primarily because few pain datasets are explicitly designed to cover diverse ethnicities across different age groups. To validate the proposed system's robustness in identifying facial expressions across ethnicities and age variations, an experiment was conducted using the challenging AffectNet [64] facial expression dataset, which comprises 29,042 image samples spanning eight facial expression categories: anger, contempt, disgust, fear, happy, neutral, sad, and surprise. Figure 11 illustrates examples from this dataset. The facial expression recognition system is built using the same deep learning architectures (Figure 3 and Figure 4) and training–testing protocols employed in the proposed pain sentiment analysis system (PSAS). The system's performance is compared with competing methods used for PSA, and the results are summarized in Table 8. The findings show that the proposed feature representation methods achieve superior performance even on ethnically and age-diverse facial expression data, demonstrating the system's robustness and adaptability in accurately recognizing and analyzing facial expressions across varied demographic groups and age ranges, and by extension in the context of pain recognition.
- Comparison with existing systems for the image-based PSAS: The effectiveness of the proposed system has been benchmarked against several state-of-the-art techniques across multiple datasets: the two-class 2DFPE database, the UNBC database under both the two-class and three-class classification problems, and the BioVid database under the five-class classification problem. Handcrafted feature extraction techniques, namely the local binary pattern (LBP) and the Histogram of Oriented Gradients (HoG), as utilized in [65,66], were employed; a minimal sketch of this handcrafted feature extraction is given below. For each image, the feature representation was computed at a fixed resolution, with the image divided into nine blocks for both the LBP and HoG methods. From each block, a 256-dimensional LBP feature vector [65] and an 81-dimensional HoG feature vector [66] were extracted, yielding 648 HoG features and 2304 LBP features per image. For deep feature extraction, ResNet50 [67], Inception-v3 [68], and VGG16 [69] were employed, each using its required input image size. Transfer learning was applied by training each network on samples from either the two-class or the three-class problem; features were then extracted from the trained networks, and neural network-based classifiers were used to classify the test samples. Competing approaches from Lucey et al. [60], Anay et al. [70], and Werner et al. [71] were also implemented using the same image size. The results for these techniques were obtained under the same training and testing protocol as the proposed system and are summarized in Table 9, which shows that the proposed system outperforms the competing techniques.
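The handcrafted-baseline sketch referenced above is shown here, assuming scikit-image; the 96 × 96 working resolution and the 3 × 3 block grid are assumptions chosen so that each block yields a 256-dimensional LBP histogram and an 81-dimensional HoG vector, matching the stated per-block dimensionalities.

```python
# Minimal sketch of the LBP and HoG baselines, assuming scikit-image; the 96 x 96 working
# resolution and the 3 x 3 block grid are assumptions chosen so that each block yields a
# 256-D LBP histogram and an 81-D HoG vector, matching the dimensionalities stated above.
import numpy as np
from skimage.feature import local_binary_pattern, hog
from skimage.transform import resize

def lbp_hog_features(gray_image):
    img = (resize(gray_image, (96, 96)) * 255).astype(np.uint8)   # 3 x 3 grid of 32 x 32 blocks
    lbp_feats, hog_feats = [], []
    for i in range(3):
        for j in range(3):
            block = img[i * 32:(i + 1) * 32, j * 32:(j + 1) * 32]
            # 256-bin histogram of 8-neighbour LBP codes for this block.
            codes = local_binary_pattern(block, P=8, R=1, method="default")
            hist, _ = np.histogram(codes, bins=256, range=(0, 256), density=True)
            lbp_feats.append(hist)
            # 9 orientations over a 3 x 3 cell grid -> 81 HoG values for this block.
            hog_feats.append(hog(block, orientations=9, pixels_per_cell=(10, 10),
                                 cells_per_block=(3, 3), feature_vector=True))
    return np.hstack(lbp_feats), np.hstack(hog_feats)             # 2304-D LBP, 648-D HoG

# Example usage: lbp_vec, hog_vec = lbp_hog_features(face_gray), with face_gray a 2-D array.
```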
Method (8-Class AffectNet) | Acc. (%) | F1-Score
---|---|---
Anay et al. [70] | 56.45 | 0.5352
HoG | 43.67 | 0.4018
Inception-v3 | 51.71 | 0.4731
LBP | 40.47 | 0.3659
Lucey et al. [60] | 55.04 | 0.5345
ResNet50 | 41.19 | 0.3984
VGG16 | 43.26 | 0.4221
Werner et al. [71] | 56.87 | 0.5339
Proposed | 61.67 | 0.6034
Method | 3-Class UNBC-McMaster Acc. (%) | F1-Score | 2-Class UNBC-McMaster Acc. (%) | F1-Score | 2-Class 2DFPE Acc. (%) | F1-Score | 5-Class BioVid Acc. (%) | F1-Score
---|---|---|---|---|---|---|---|---
Anay et al. [70] | 82.54 | 0.7962 | 83.71 | 0.8193 | 75.67 | 0.7391 | 27.33 | 26.89
HoG | 63.19 | 0.6087 | 73.29 | 0.6967 | 68.61 | 0.6364 | 24.34 | 24.17
Inception-v3 | 72.19 | 0.6934 | 76.04 | 0.7421 | 63.89 | 0.6172 | 23.84 | 22.31
LBP | 65.89 | 0.6217 | 75.92 | 0.7156 | 67.34 | 0.6386 | 26.10 | 25.81
Lucey et al. [60] | 80.81 | 0.7646 | 80.73 | 0.7617 | 72.58 | 0.6439 | 29.46 | 28.63
ResNet50 | 74.40 | 0.7145 | 77.79 | 0.7508 | 63.78 | 0.6154 | 28.92 | 28.78
VGG16 | 74.71 | 0.7191 | 76.16 | 0.7380 | 61.98 | 0.5892 | 25.64 | 23.55
Werner et al. [71] | 75.98 | 0.7233 | 76.10 | 0.7531 | 66.17 | 0.6148 | 31.76 | 29.61
Proposed | 83.38 | 0.8174 | 85.07 | 0.8349 | 77.41 | 0.7528 | 37.42 | 35.84
- Table 9 compares the performance of the image-based pain sentiment analysis system with the competing methods across the two-class and three-class pain sentiment analysis tasks, as well as the five-class BioVid task. All experiments were conducted using identical training and testing protocols. The proposed classification model (A) was quantitatively compared to several competing methods: Anay et al. [70] (B), HoG [66] (C), Inception-v3 [68] (D), LBP [65] (E), Lucey et al. [60] (F), ResNet50 [67] (G), VGG16 [69] (H), and Werner et al. [71] (I). A comparative statistical evaluation of the proposed system (A) against the competing approaches ({B, C, D, E, F, G, H, I}) is provided in Table 10. A one-tailed two-sample t-test [72] was performed using the t-statistic to analyze the results. This test involves two hypotheses: (i) H0 (null hypothesis): $\mu_A = \mu_X$, indicating that the proposed system (A) and a competing approach X yield comparable results on average; and (ii) H1 (alternative hypothesis): $\mu_A > \mu_X$, indicating that the proposed system (A) performs better than the competing approach on average. The null hypothesis is rejected if the p-value is less than 0.05, signifying that the proposed system (A) performs better than the competing method. For the quantitative comparison, each test set from the employed databases was divided into two subsets. For the 2DFPE database, there were 298 test images (set 1: 149; set 2: 149). For the UNBC (two-class/three-class) problem, there were 24,199 test samples (set 1: 12,100; set 2: 12,099). The performance of the proposed and competing methods was evaluated on these test sets. The results, shown in Table 10, support the alternative hypothesis in most comparisons (the null hypothesis is retained only for a few of the strongest competitors on individual datasets), indicating that the proposed system (A) consistently outperforms the competing approaches.
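The following is a minimal sketch of this one-tailed test, assuming SciPy's two-sample t-test with equal variances; the two values per method are the Set-1/Set-2 accuracies from the 5-class BioVid block of Table 10 (proposed system A versus Werner et al. [71], I), and the printed result reproduces the t-statistic of about 14.22 reported in that row.

```python
# Minimal sketch of the one-tailed two-sample t-test used for Table 10, assuming SciPy
# with equal variances; the accuracies are the Set-1/Set-2 entries of the 5-class BioVid
# block for the proposed system (A) and Werner et al. [71] (I).
from scipy import stats

acc_proposed = [37.67, 37.81]    # proposed system (A): Set-1, Set-2 accuracies (%)
acc_werner = [31.34, 32.17]      # Werner et al. [71] (I): Set-1, Set-2 accuracies (%)

t_stat, p_value = stats.ttest_ind(acc_proposed, acc_werner, alternative="greater")
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")   # t is approximately 14.22, as in Table 10
# H0 (mu_A = mu_I) is rejected in favour of H1 (mu_A > mu_I) when p < 0.05.
```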
4.3. Results for Audio-Based PSAS
4.4. Results for Multimodal PSAS
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Ande, R.; Adebisi, B.; Hammoudeh, M.; Saleem, J. Internet of Things: Evolution and technologies from a security perspective. Sustain. Cities Soc. 2020, 54, 101728. [Google Scholar] [CrossRef]
- Khang, A. AI and IoT Technology and Applications for Smart Healthcare Systems; CRC Press: Boca Raton, FL, USA, 2024. [Google Scholar]
- Aminizadeh, S.; Heidari, A.; Dehghan, M.; Toumaj, S.; Rezaei, M.; Navimipour, N.J.; Stroppa, F.; Unal, M. Opportunities and challenges of artificial intelligence and distributed systems to improve the quality of healthcare service. Artif. Intell. Med. 2024, 149, 102779. [Google Scholar] [CrossRef] [PubMed]
- Muhammad, G.; Alsulaiman, M.; Amin, S.U.; Ghoneim, A.; Alhamid, M.F. A facial-expression monitoring system for improved healthcare in smart cities. IEEE Access 2017, 5, 10871–10881. [Google Scholar] [CrossRef]
- Williams, A.d. Facial expression of pain: An evolutionary account. Behav. Brain Sci. 2002, 25, 439–455. [Google Scholar] [CrossRef]
- Payen, J.F.; Bru, O.; Bosson, J.L.; Lagrasta, A.; Novel, E.; Deschaux, I.; Lavagne, P.; Jacquot, C. Assessing pain in critically ill sedated patients by using a behavioral pain scale. Crit. Care Med. 2001, 29, 2258–2263. [Google Scholar] [CrossRef] [PubMed]
- McGuire, B.; Daly, P.; Smyth, F. Chronic pain in people with an intellectual disability: Under-recognised and under-treated? J. Intellect. Disabil. Res. 2010, 54, 240–245. [Google Scholar] [CrossRef]
- Puntillo, K.A.; Morris, A.B.; Thompson, C.L.; Stanik-Hutt, J.; White, C.A.; Wild, L.R. Pain behaviors observed during six common procedures: Results from Thunder Project II. Crit. Care Med. 2004, 32, 421–427. [Google Scholar] [CrossRef]
- Herr, K.; Coyne, P.J.; Key, T.; Manworren, R.; McCaffery, M.; Merkel, S.; Pelosi-Kelly, J.; Wild, L. Pain assessment in the nonverbal patient: Position statement with clinical practice recommendations. Pain Manag. Nurs. 2006, 7, 44–52. [Google Scholar] [CrossRef] [PubMed]
- Twycross, A.; Voepel-Lewis, T.; Vincent, C.; Franck, L.S.; von Baeyer, C.L. A debate on the proposition that self-report is the gold standard in assessment of pediatric pain intensity. Clin. J. Pain 2015, 31, 707–712. [Google Scholar] [CrossRef]
- Knox, D.; Beveridge, S.; Mitchell, L.A.; MacDonald, R.A. Acoustic analysis and mood classification of pain-relieving music. J. Acoust. Soc. Am. 2011, 130, 1673–1682. [Google Scholar] [CrossRef]
- Giordano, V.; Luister, A.; Reuter, C.; Czedik-Eysenberg, I.; Singer, D.; Steyrl, D.; Vettorazzi, E.; Deindl, P. Audio Feature Analysis for Acoustic Pain Detection in Term Newborns. Neonatology 2022, 119, 760–768. [Google Scholar] [CrossRef] [PubMed]
- Oshrat, Y.; Bloch, A.; Lerner, A.; Cohen, A.; Avigal, M.; Zeilig, G. Speech prosody as a biosignal for physical pain detection. In Proceedings of the 8th International Conference on Speech Prosody, Boston, MA, USA, 31 May–3 June 2016; pp. 420–424. [Google Scholar]
- Ren, Z.; Cummins, N.; Han, J.; Schnieder, S.; Krajewski, J.; Schuller, B. Evaluation of the pain level from speech: Introducing a novel pain database and benchmarks. In Proceedings of the Speech Communication; 13th ITG-Symposium, Oldenburg, Germany, 10–12 October 2018; pp. 1–5. [Google Scholar]
- Ashraf, A.B.; Lucey, S.; Cohn, J.F.; Chen, T.; Ambadar, Z.; Prkachin, K.M.; Solomon, P.E. The painful face–pain expression recognition using active appearance models. Image Vis. Comput. 2009, 27, 1788–1796. [Google Scholar] [CrossRef] [PubMed]
- Lucey, P.; Cohn, J.; Howlett, J.; Lucey, S.; Sridharan, S. Recognizing emotion with head pose variation: Identifying pain segments in video. IEEE Trans. Syst. Man Cybern.-Part B 2011, 41, 664–674. [Google Scholar] [CrossRef] [PubMed]
- Littlewort-Ford, G.; Bartlett, M.S.; Movellan, J.R. Are your eyes smiling? Detecting genuine smiles with support vector machines and Gabor wavelets. In Proceedings of the 8th Joint Symposium on Neural Computation, La Jolla, CA, USA, 19 May 2001; pp. 1–9. [Google Scholar]
- Umer, S.; Dhara, B.C.; Chanda, B. Face recognition using fusion of feature learning techniques. Measurement 2019, 146, 43–54. [Google Scholar] [CrossRef]
- Bisogni, C.; Castiglione, A.; Hossain, S.; Narducci, F.; Umer, S. Impact of deep learning approaches on facial expression recognition in healthcare industries. IEEE Trans. Ind. Inform. 2022, 18, 5619–5627. [Google Scholar] [CrossRef]
- Cunningham, P.; Delany, S.J. k-Nearest neighbour classifiers-A Tutorial. ACM Comput. Surv. (CSUR) 2021, 54, 1–25. [Google Scholar] [CrossRef]
- Mohankumar, N.; Narani, S.R.; Asha, S.; Arivazhagan, S.; Rajanarayanan, S.; Padmanaban, K.; Murugan, S. Advancing chronic pain relief cloud-based remote management with machine learning in healthcare. Indones. J. Electr. Eng. Comput. Sci. 2025, 37, 1042–1052. [Google Scholar] [CrossRef]
- Anderson, K.; Stein, S.; Suen, H.; Purcell, M.; Belci, M.; McCaughey, E.; McLean, R.; Khine, A.; Vuckovic, A. Generalisation of EEG-Based Pain Biomarker Classification for Predicting Central Neuropathic Pain in Subacute Spinal Cord Injury. Biomedicines 2025, 13, 213. [Google Scholar] [CrossRef]
- Hu, F.; Xia, G.S.; Hu, J.; Zhang, L. Transferring deep convolutional neural networks for the scene classification of high-resolution remote sensing imagery. Remote Sens. 2015, 7, 14680–14707. [Google Scholar] [CrossRef]
- Yadav, A.; Vishwakarma, D.K. A comparative study on bio-inspired algorithms for sentiment analysis. Clust. Comput. 2020, 23, 2969–2989. [Google Scholar] [CrossRef]
- Nugroho, H.; Harmanto, D.; Al-Absi, H.R.H. On the development of smart home care: Application of deep learning for pain detection. In Proceedings of the 2018 IEEE-EMBS Conference on Biomedical Engineering and Sciences (IECBES), Sarawak, Malaysia, 3–6 December 2018; pp. 612–616. [Google Scholar]
- Haque, M.A.; Bautista, R.B.; Noroozi, F.; Kulkarni, K.; Laursen, C.B.; Irani, R.; Bellantonio, M.; Escalera, S.; Anbarjafari, G.; Nasrollahi, K.; et al. Deep multimodal pain recognition: A database and comparison of spatio-temporal visual modalities. In Proceedings of the 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018), Xi’an, China, 15–19 May 2018; pp. 250–257. [Google Scholar]
- Menchetti, G.; Chen, Z.; Wilkie, D.J.; Ansari, R.; Yardimci, Y.; Çetin, A.E. Pain detection from facial videos using two-stage deep learning. In Proceedings of the 2019 IEEE Global Conference on Signal and Information Processing (GlobalSIP), Ottawa, ON, Canada, 11–14 November 2019; pp. 1–5. [Google Scholar]
- Yu, I.C.; Guo, J.M.; Lin, W.C.; Fang, J.T. Development of nonverbal communication behavior model for nursing students based on deep learning facial expression recognition technology. Cogent Educ. 2025, 12, 2448059. [Google Scholar] [CrossRef]
- Kasundra, A.; Chanchlani, R.; Lal, B.; Thanveeru, S.K.; Ratre, G.; Ahmad, R.; Sharma, P.K.; Agrawal, A.; Kasundra Sr, A. Role of Artificial Intelligence in the Assessment of Postoperative Pain in the Pediatric Population: A Systematic Review. Cureus 2025, 17, e12345. [Google Scholar] [CrossRef] [PubMed]
- Al-Eidan, R.M.; Al-Khalifa, H.; Al-Salman, A. Deep-learning-based models for pain recognition: A systematic review. Appl. Sci. 2020, 10, 5984. [Google Scholar] [CrossRef]
- Gouverneur, P.; Li, F.; Adamczyk, W.M.; Szikszay, T.M.; Luedtke, K.; Grzegorzek, M. Comparison of feature extraction methods for physiological signals for heat-based pain recognition. Sensors 2021, 21, 4838. [Google Scholar] [CrossRef] [PubMed]
- Pikulkaew, K.; Boonchieng, E.; Boonchieng, W.; Chouvatut, V. Pain detection using deep learning with evaluation system. In Proceedings of the Fifth International Congress on Information and Communication Technology; Springer: Singapore, 2021; pp. 426–435. [Google Scholar]
- Ismail, L.; Waseem, M.D. Towards a Deep Learning Pain-Level Detection Deployment at UAE for Patient-Centric-Pain Management and Diagnosis Support: Framework and Performance Evaluation. Procedia Comput. Sci. 2023, 220, 339–347. [Google Scholar] [CrossRef] [PubMed]
- Wu, J.; Shi, Y.; Yan, S.; Yan, H.M. Global-Local Combined Features to Detect Pain Intensity from Facial Expression Images with Attention Mechanism. J. Electron. Sci. Technol. 2024, 23, 100260. [Google Scholar] [CrossRef]
- Othman, E.; Werner, P.; Saxen, F.; Al-Hamadi, A.; Gruss, S.; Walter, S. Classification networks for continuous automatic pain intensity monitoring in video using facial expression on the X-ITE Pain Database. J. Vis. Commun. Image Represent. 2023, 91, 103743. [Google Scholar] [CrossRef]
- Thiam, P.; Kessler, V.; Walter, S.; Palm, G.; Schwenker, F. Audio-visual recognition of pain intensity. In Proceedings of the IAPR Workshop on Multimodal Pattern Recognition of Social Signals in Human-Computer Interaction; Springer: Cham, Switzerland, 2017; pp. 110–126. [Google Scholar]
- Hossain, M.S. Patient state recognition system for healthcare using speech and facial expressions. J. Med Syst. 2016, 40, 272. [Google Scholar] [CrossRef] [PubMed]
- Zeng, Z.; Pantic, M.; Roisman, G.I.; Huang, T.S. A survey of affect recognition methods: Audio, visual and spontaneous expressions. In Proceedings of the 9th International Conference on Multimodal Interfaces, Nagoya, Japan, 12–15 November 2007; pp. 126–133. [Google Scholar]
- Thiam, P.; Bellmann, P.; Kestler, H.A.; Schwenker, F. Exploring deep physiological models for nociceptive pain recognition. Sensors 2019, 19, 4503. [Google Scholar] [CrossRef] [PubMed]
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 2012, 25, 1097–1105. [Google Scholar] [CrossRef]
- Dashtipour, K.; Gogate, M.; Cambria, E.; Hussain, A. A novel context-aware multimodal framework for persian sentiment analysis. arXiv 2021, arXiv:2103.02636. [Google Scholar] [CrossRef]
- Sagum, R.A. An Application of Emotion Detection in Sentiment Analysis on Movie Reviews. Turk. J. Comput. Math. Educ. (TURCOMAT) 2021, 12, 5468–5474. [Google Scholar] [CrossRef]
- Rustam, F.; Khalid, M.; Aslam, W.; Rupapara, V.; Mehmood, A.; Choi, G.S. A performance comparison of supervised machine learning models for Covid-19 tweets sentiment analysis. PLoS ONE 2021, 16, e0245909. [Google Scholar] [CrossRef]
- Liu, Z.x.; Zhang, D.g.; Luo, G.z.; Lian, M.; Liu, B. A new method of emotional analysis based on CNN–BiLSTM hybrid neural network. Clust. Comput. 2020, 23, 2901–2913. [Google Scholar] [CrossRef]
- Ridouan, A.; Bohi, A.; Mourchid, Y. Improving Pain Classification using Spatio-Temporal Deep Learning Approaches with Facial Expressions. arXiv 2025, arXiv:2501.06787. [Google Scholar]
- Zhu, X.; Ramanan, D. Face detection, pose estimation, and landmark localization in the wild. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012; pp. 2879–2886. [Google Scholar]
- Hossain, S.; Umer, S.; Rout, R.K.; Tanveer, M. Fine-grained image analysis for facial expression recognition using deep convolutional neural networks with bilinear pooling. Appl. Soft Comput. 2023, 134, 109997. [Google Scholar] [CrossRef]
- Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1–9. [Google Scholar]
- Umer, S.; Rout, R.K.; Pero, C.; Nappi, M. Facial expression recognition with trade-offs between data augmentation and deep learning features. J. Ambient. Intell. Humaniz. Comput. 2021, 13, 721–735. [Google Scholar] [CrossRef]
- Saxena, A. Convolutional neural networks: An illustration in TensorFlow. XRDS: Crossroads, ACM Mag. Stud. 2016, 22, 56–58. [Google Scholar] [CrossRef]
- Lawrence, S.; Giles, C.L.; Tsoi, A.C.; Back, A.D. Face recognition: A convolutional neural-network approach. IEEE Trans. Neural Netw. 1997, 8, 98–113. [Google Scholar] [CrossRef]
- Martin, E.; d’Autume, M.d.M.; Varray, C. Audio Denoising Algorithm with Block Thresholding. Image Process. 2012, 21, 105–1232. [Google Scholar]
- Fu, Z.; Lu, G.; Ting, K.M.; Zhang, D. A survey of audio-based music classification and annotation. IEEE Trans. Multimed. 2010, 13, 303–319. [Google Scholar] [CrossRef]
- Logan, B. Mel Frequency Cepstral Coefficients for Music Modeling. In ISMIR; ISMIR: Plymouth, MA, USA, 2000; Volume 270, p. 11. [Google Scholar]
- Lee, C.H.; Shih, J.L.; Yu, K.M.; Lin, H.S. Automatic music genre classification based on modulation spectral analysis of spectral and cepstral features. IEEE Trans. Multimed. 2009, 11, 670–682. [Google Scholar]
- Nawab, S.; Quatieri, T.; Lim, J. Signal reconstruction from short-time Fourier transform magnitude. IEEE Trans. Acoust. Speech Signal Process. 1983, 31, 986–998. [Google Scholar] [CrossRef]
- Gulli, A.; Pal, S. Deep Learning with Keras; Packt Publishing Ltd.: Birmingham, UK, 2017. [Google Scholar]
- Bergstra, J.; Breuleux, O.; Bastien, F.; Lamblin, P.; Pascanu, R.; Desjardins, G.; Turian, J.; Warde-Farley, D.; Bengio, Y. Theano: A CPU and GPU math expression compiler. In Proceedings of the Python for Scientific Computing Conference (SciPy), Austin, TX, USA, 28 June–3 July 2010; Volume 4, pp. 1–7. [Google Scholar]
- Holz, N.; Larrouy-Maestri, P.; Poeppel, D. The variably intense vocalizations of affect and emotion (VIVAE) corpus prompts new perspective on nonspeech perception. Emotion 2022, 22, 213. [Google Scholar] [CrossRef] [PubMed]
- Lucey, P.; Cohn, J.F.; Prkachin, K.M.; Solomon, P.E.; Matthews, I. Painful data: The UNBC-McMaster shoulder pain expression archive database. In Proceedings of the 2011 IEEE International Conference on Automatic Face & Gesture Recognition (FG), Santa Barbara, CA, USA, 21–25 March 2011; pp. 57–64. [Google Scholar]
- 2D Face Dataset with Pain Expression. Available online: http://pics.psych.stir.ac.uk/2D_face_sets.htm (accessed on 6 February 2025).
- Walter, S.; Gruss, S.; Ehleiter, H.; Tan, J.; Traue, H.C.; Werner, P.; Al-Hamadi, A.; Crawcour, S.; Andrade, A.O.; da Silva, G.M. The biovid heat pain database data for the advancement and systematic validation of an automated pain recognition system. In Proceedings of the 2013 IEEE International Conference on Cybernetics (CYBCO), Lausanne, Switzerland, 13–15 June 2013; pp. 128–131. [Google Scholar]
- Ekman, P. An Argument for Basic Emotions. Cogn. Emot. 1992, 6, 169–200. [Google Scholar] [CrossRef]
- Mollahosseini, A.; Hasani, B.; Mahoor, M.H. Affectnet: A database for facial expression, valence, and arousal computing in the wild. IEEE Trans. Affect. Comput. 2017, 10, 18–31. [Google Scholar] [CrossRef]
- Umer, S.; Dhara, B.C.; Chanda, B. An iris recognition system based on analysis of textural edgeness descriptors. IETE Tech. Rev. 2018, 35, 145–156. [Google Scholar] [CrossRef]
- Umer, S.; Dhara, B.C.; Chanda, B. Biometric recognition system for challenging faces. In Proceedings of the 2015 Fifth National Conference on Computer Vision, Pattern Recognition, Image Processing and Graphics (NCVPRIPG), Patna, India, 16–19 December 2015; pp. 1–4. [Google Scholar]
- Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2818–2826. [Google Scholar]
- McNeely-White, D.; Beveridge, J.R.; Draper, B.A. Inception and ResNet features are (almost) equivalent. Cogn. Syst. Res. 2020, 59, 312–318. [Google Scholar] [CrossRef]
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
- Ghosh, A.; Umer, S.; Khan, M.K.; Rout, R.K.; Dhara, B.C. Smart sentiment analysis system for pain detection using cutting edge techniques in a smart healthcare framework. Clust. Comput. 2023, 26, 119–135. [Google Scholar] [CrossRef] [PubMed]
- Werner, P.; Al-Hamadi, A.; Limbrecht-Ecklundt, K.; Walter, S.; Gruss, S.; Traue, H.C. Automatic pain assessment with facial activity descriptors. IEEE Trans. Affect. Comput. 2016, 8, 286–299. [Google Scholar] [CrossRef]
- Gibbons, J.D. Nonparametric Statistics: An Introduction; Sage: Thousand Oaks, CA, USA, 1993; Volume 9. [Google Scholar]
- McFee, B.; Raffel, C.; Liang, D.; Ellis, D.P.; McVicar, M.; Battenberg, E.; Nieto, O. librosa: Audio and music signal analysis in python. In Proceedings of the 14th Python in Science Conference (SCIPY 2015), Austin, TX, USA, 6–12 July 2015; pp. 18–24. [Google Scholar]
Term | Abbreviation |
---|---|
2D Face Set Database with Pain Expression | 2DFPE |
Convolutional neural network | CNN |
Covertly aggressive | CAG |
Fast Fourier transformation | FFT |
Internet of Things | IoT |
Mel-frequency cepstral coefficients | MFCCs |
Multimodal pain sentiment analysis system | MPSAS |
Non-aggressive | NAG |
Overtly aggressive | OAG |
Pain sentiment analysis system | PSAS |
Tree-structured part model | TSPM |
University of Northern British Columbia-McMaster Shoulder Pain Expression Archive Database | UNBC-McMaster |
Variably Intense Vocalizations of Affect and Emotion | VIVAE |
Layers | Output Shape | Parameters | Layers | Output Shape | Parameters |
---|---|---|---|---|---|
Block1 (Input image: (128, 128, 3)) | Block5 | ||||
Convolution (3 × 3) | (t, t, 30) | 840 | Convolution (3 × 3) | (t4, t4, 120) | 129,720 |
Maxpool (2 × 2) | (t1, t1, 30) | 0 | Maxpool (2 × 2) | (t5, t5, 120) | 0 |
BatchNormalization | (t1, t1, 30) | 120 | BatchNormalization | (t5, t5, 120) | 480 |
Activation (ReLU) | (t1, t1, 30) | 0 | Activation (ReLU) | (t5, t5, 120) | 0 |
Dropout | (t1, t1, 30) | 0 | Dropout | (t5, t5, 120) | 0 |
Block2 | Block6 | ||||
Convolution (3 × 3) | (t1, t1, 60) | 16,260 | Convolution (3 × 3) | (t5, t5, 240) | 259,440 |
Maxpool (2 × 2) | (t2, t2, 60) | 0 | Maxpool (2 × 2) | (t6, t6, 240) | 0 |
BatchNormalization | (t2, t2, 60) | 240 | BatchNormalization | (t6, t6, 240) | 960 |
Activation (ReLU) | (t2, t2, 60) | 0 | Activation (ReLU) | (t6, t6, 240) | 0 |
Dropout | (t2, t2, 60) | 0 | Dropout | (t6, t6, 240) | 0 |
Block3 | Block7 | ||||
Convolution (3 × 3) | (t2, t2, 90) | 48,690 | Convolution (3 × 3) | (t6, t6, 512) | 1,106,432 |
Maxpool (2 × 2) | (t3, t3, 90) | 0 | Maxpool (2 × 2) | (t7, t7, 512) | 0 |
BatchNormalization | (t3, t3, 90) | 360 | BatchNormalization | (t7, t7, 512) | 2048 |
Activation (ReLU) | (t3, t3, 90) | 0 | Activation (ReLU) | (t7, t7, 512) | 0 |
Dropout | (t3, t3, 90) | 0 | Dropout | (t7, t7, 512) | 0 |
Block4 | Flatten | ||||
Convolution (3 × 3) | (t3, t3, 120) | 97,320 | Dense | 1 × (t7 × t7 × 512) | 262,656 |
Maxpool (2 × 2) | (t4, t4, 120) | 0 | BatchNormalization | 1 × (t7 × t7 × 512) | 2048 |
BatchNormalization | (t4, t4, 120) | 480 | Activation (ReLU) | 1 × (t7 × t7 × 512) | 0 |
Activation (ReLU) | (t4, t4, 120) | 0 | Dropout | 1 × (t7 × t7 × 512) | 0 |
Dropout | (t4, t4, 120) | 0 | Dense | 1 × P (=5) | 2565 |
Total parameters for the input image size (128 × 128 × 3) | 1,930,659 | | | | |
Total number of trainable parameters | 1,927,291 | | | | |
Non-trainable parameters | 3368 | | | | |
Layers | Output Shape | Parameters | Layers | Output Shape | Parameters |
---|---|---|---|---|---|
Block1 (Input image: (192, 192, 3)) | Block5 | ||||
Convolution (3 × 3) | (t, t, 30) | 840 | Convolution (3 × 3) | (t4, t4, 240) | 259,440 |
Maxpool (2 × 2) | (t1, t1, 30) | 0 | Maxpool (2 × 2) | (t5, t5, 240) | 0 |
BatchNormalization | (t1, t1, 30) | 120 | BatchNormalization | (t5, t5, 240) | 960 |
Activation (ReLU) | (t1, t1, 30) | 0 | Activation (ReLU) | (t5, t5, 240) | 0 |
Dropout | (t1, t1, 30) | 0 | Dropout | (t5, t5, 240) | 0 |
Block2 | Block6 | ||||
Convolution (3 × 3) | (t1, t1, 60) | 16,260 | Convolution (3 × 3) | (t5, t5, 480) | 1,037,280 |
Maxpool (2 × 2) | (t2, t2, 60) | 0 | Maxpool (2 × 2) | (t6, t6, 480) | 0 |
BatchNormalization | (t2, t2, 60) | 240 | BatchNormalization | (t6, t6, 480) | 1920 |
Activation (ReLU) | (t2, t2, 60) | 0 | Activation (ReLU) | (t6, t6, 480) | 0 |
Dropout | (t2, t2, 60) | 0 | Dropout | (t6, t6, 480) | 0 |
Block3 | Block7 | ||||
Convolution (3 × 3) | (t2, t2, 90) | 48,690 | Convolution (3 × 3) | (t6, t6, 1024) | 4,424,704 |
Maxpool (2 × 2) | (t3, t3, 90) | 0 | Maxpool (2 × 2) | (t7, t7, 1024) | 0 |
BatchNormalization | (t3, t3, 90) | 360 | BatchNormalization | (t7, t7, 1024) | 4096 |
Activation (ReLU) | (t3, t3, 90) | 0 | Activation (ReLU) | (t7, t7, 1024) | 0 |
Dropout | (t3, t3, 90) | 0 | Dropout | (t7, t7, 1024) | 0 |
Block4 | Flatten | ||||
Convolution (3 × 3) | (t3, t3, 120) | 97,320 | Dense | 1 × (t7 × t7 × 1024) | 1,049,600 |
Maxpool (2 × 2) | (t4, t4, 120) | 0 | BatchNormalization | 1 × (t7 × t7 × 1024) | 4096 |
BatchNormalization | (t4, t4, 120) | 480 | Activation (ReLU) | 1 × (t7 × t7 × 1024) | 0 |
Activation (ReLU) | (t4, t4, 120) | 0 | Dropout | 1 × (t7 × t7 × 1024) | 0 |
Dropout | (t4, t4, 120) | 0 | Dense | 1 × P (=5) | 5125 |
Total parameters for the input image size (192 × 192 × 3) | 6,951,531 | | | | |
Total number of trainable parameters | 6,945,395 | | | | |
Non-trainable parameters | 6136 | | | | |
Layer | Output Shape | Feature Size | Parameters |
---|---|---|---|
Flatten | (1, d) | (1,d) | 0 |
Dense | (1, 512) | (1, 512) | (1 + d) × 512 |
BatNorm | (1, 512) | (1, 512) | 2048 |
ActRelu | (1, 512) | (1, 512) | 0 |
Dropout | (1, 512) | (1, 512) | 0 |
Dense | (1, 256) | (1, 256) | (1 + 512) × 256 = 131,328 |
BatNorm | (1, 256) | (1, 256) | 1024 |
ActRelu | (1, 256) | (1, 256) | 0 |
Dropout | (1, 256) | (1, 256) | 0 |
Dense | (1, 128) | (1,128) | (1 + 256) × 128 = 32,896 |
BatNorm | (1, 128) | (1,128) | 512 |
ActRelu | (1, 128) | (1, 128) | 0 |
Dropout | (1, 128) | (1, 128) | 0 |
Dense | (1, 3) | (1, 3) | (128 + 1) × 3 = 387 |
Total No. of Parameters | 168,195 + ((1 + d) × 512) |
Class | Samples | Demography |
---|---|---|
2DFPE Database (2 classes of pain level) | ||
No Pain | 298 | 13 female and 10 male participants (no further details given), from Scotland, UK |
Pain | 298 | |
UNBC Database (2 classes of pain level) | | |
No Pain | 40,029 | Same as for the UNBC Database (3 classes of pain level) |
Pain | 8369 | |
UNBC Database (3 classes of pain level) | | |
No (Zero-Intensity) Pain (NP) | 40,029 | 12 male and 13 female participants, aged 18 to 65 years, from Ontario, Canada |
Low-Intensity Pain (LP) | 2909 | |
High-Intensity Pain (HP) | 5460 | |
BioVid Heat Pain dataset (5 classes of pain level) | | |
No Pain | 6900 | 30 male and 30 female healthy adult participants aged 18 and above, covering a diverse age range, from Western European (German) regions |
Pain Amount 1 | 6900 | |
Pain Amount 2 | 6900 | |
Pain Amount 3 | 6900 | |
Pain Amount 4 | 6900 | |
3-Class UNBC-McMaster
Method | Set-1 Acc. (%) | Set-2 Acc. (%) | Mean | Variance | t-Statistic, p-Value | Remarks
---|---|---|---|---|---|---
Proposed (A) | 85.21 | 81.89 | 79.05 | 3.1123 | H0: Null hypothesis; H1: Alternative hypothesis | |
Anay et al. [70] (B) | 83.12 | 80.45 | 84.51 | 0.18 | 0.1942, | H0 is accepted, |
HoG (C) | 69.74 | 64.56 | 65.21 | 0.25 | 10.31, | H0 is rejected, |
Inception-v3 (D) | 74.31 | 72.61 | 71.39 | 2.42 | 7.54, | H0 is rejected, |
LBP (E) | 71.36 | 62.72 | 67.04 | 37.32 | 3.5674, | H0 is rejected, |
Lucey et al. [60] (F) | 82.78 | 80.45 | 80.21 | 12.33 | 7.54, | H0 is rejected, |
ResNet50 (G) | 76.72 | 75.91 | 76.32 | 0.33 | 4.23, | H0 is rejected, |
VGG16 (H) | 73.33 | 76.23 | 74.78 | 4.21 | 3.97, | H0 is rejected, |
Werner et al. [71] (I) | 75.89 | 74.52 | 75.20 | 0.94 | 4.64, | H0 is rejected, |
2-Class UNBC-McMaster
Method | Set-1 Acc. (%) | Set-2 Acc. (%) | Mean | Variance | t-Statistic, p-Value | Remarks
---|---|---|---|---|---|---
Proposed (A) | 87.43 | 85.81 | 79.05 | 3.1123 | H0: Null hypothesis; H1: Alternative hypothesis | |
Anay et al. [70] (B) | 82.17 | 84.29 | 84.21 | 2.14 | 4.4314, | H0 is rejected, |
HoG (C) | 76.33 | 75.01 | 74.96 | 0.25 | 10.31, | H0 is rejected, |
Inception-v3 (D) | 77.45 | 75.19 | 76.32 | 2.55 | 7.40, | H0 is rejected, |
LBP (E) | 73.23 | 76.93 | 73.81 | 3.70 | 7.39, | H0 is rejected, |
Lucey et al. [60] (F) | 82.19 | 81.89 | 82.04 | 0.04 | 5.55, | H0 is rejected, |
ResNet50 (G) | 79.31 | 78.05 | 78.68 | 0.79 | 7.73, | H0 is rejected, |
VGG16 (H) | 77.45 | 71.38 | 74.41 | 18.42 | 3.88, | H0 is rejected, |
Werner et al. [71] (I) | 74.38 | 73.67 | 74.03 | 0.25 | 14.24, | H0 is rejected, |
2-Class 2DFPE
Method | Set-1 Acc. (%) | Set-2 Acc. (%) | Mean | Variance | t-Statistic, p-Value | Remarks
---|---|---|---|---|---|---
Proposed (A) | 77.41 | 81.91 | 79.66 | 10.12 | H0: Null hypothesis; H1: Alternative hypothesis | |
Anay et al. [70] (B) | 75.67 | 76.06 | 75.87 | 0.08 | 1.68, | H0 is accepted, |
HoG (C) | 68.61 | 72.71 | 70.66 | 8.40 | 2.95, | H0 is rejected, |
Inception-v3 (D) | 63.89 | 65.07 | 64.48 | 0.70 | 6.52, | H0 is rejected, |
LBP (E) | 67.34 | 69.47 | 68.40 | 2.27 | 4.52, | H0 is rejected, |
Lucey et al. [60] (F) | 72.58 | 75.01 | 73.80 | 2.95 | 2.29, | H0 is accepted, |
ResNet50 (G) | 63.78 | 65.39 | 64.59 | 1.30 | 6.30, | H0 is rejected, |
VGG16 (H) | 61.98 | 64.14 | 63.06 | 2.33 | 6.65, | H0 is rejected, |
Werner et al. [71] (I) | 69.05 | 71.37 | 70.21 | 2.69 | 3.73, | H0 is rejected, |
5-Class BioVid
Method | Set-1 Acc. (%) | Set-2 Acc. (%) | Mean | Variance | t-Statistic, p-Value | Remarks
---|---|---|---|---|---|---
Proposed (A) | 37.67 | 37.81 | 37.42 | 0.62 | H0: Null hypothesis; H1: Alternative hypothesis | |
Anay et al. [70] (B) | 28.15 | 26.41 | 26.96 | 2.13 | 15.4437, | H0 is rejected, |
HoG (C) | 23.05 | 25.24 | 24.34 | 2.62 | 17.7968, | H0 is rejected, |
Inception-v3 (D) | 22.78 | 24.89 | 23.84 | 2.23 | 13.15, | H0 is rejected, |
LBP (E) | 25.89 | 26.31 | 26.10 | 0.09 | 52.58, | H0 is rejected, |
Lucey et al. [60] (F) | 29.78 | 29.15 | 29.46 | 0.20 | 25.64, | H0 is rejected, |
ResNet50 (G) | 28.56 | 29.28 | 28.92 | 0.26 | 24.049, | H0 is rejected, |
VGG16 (H) | 25.09 | 26.18 | 25.64 | 0.59 | 22.03, | H0 is rejected, |
Werner et al. [71] (I) | 31.34 | 32.17 | 31.76 | 0.34 | 14.22, | H0 is rejected, |
Classifier | 50–50% Split: Accuracy | Precision | Recall | F1-Score | 75–25% Split: Accuracy | Precision | Recall | F1-Score
---|---|---|---|---|---|---|---|---
Statistical features | | | | | | | |
Logistic Regression | 49.56 | 24.76 | 49.56 | 33.52 | 50.21 | 25.12 | 50.21 | 33.43 |
K-Nearest Neighbors | 75.11 | 58.17 | 75.11 | 64.47 | 52.41 | 50.43 | 50.63 | 49.22 |
Decision Tree | 75.11 | 58.17 | 75.11 | 64.47 | 50.21 | 25.12 | 50.21 | 33.43 |
Support Vector Machine | 49.56 | 24.76 | 49.56 | 33.52 | 50.21 | 25.14 | 50.21 | 33.43 |
MFCC features | ||||||||
Logistic Regression | 52.23 | 35.63 | 52.23 | 39.52 | 75.34 | 58.18 | 75.07 | 64.74 |
K-Nearest Neighbors | 52.23 | 35.63 | 52.23 | 39.52 | 75.66 | 63.29 | 75.22 | 67.39 |
Decision Tree | 95.18 | 93.52 | 92.38 | 92.73 | 97.57 | 96.45 | 95.81 | 95.08 |
Support Vector Machine | 26.27 | 9.63 | 26.41 | 13.48 | 97.43 | 96.21 | 95.72 | 94.69 |
Spectral features | | | | | | | |
Logistic Regression | 49.71 | 33.12 | 49.76 | 39.44 | 75.18 | 58.37 | 75.18 | 64.77 |
K-Nearest Neighbors | 75.13 | 58.17 | 75.21 | 64.53 | 75.18 | 53.37 | 75.18 | 64.77 |
Decision Tree | 96.45 | 95.41 | 95.43 | 94.21 | 98.51 | 97.49 | 97.43 | 97.19 |
Support Vector Machine | 96.31 | 95.23 | 94.24 | 93.19 | 75.18 | 58.37 | 75.18 | 64.77 |
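The classifier comparison in the table above can be reproduced with standard tooling. The sketch below assumes scikit-learn and uses synthetic placeholder data in place of the handcrafted audio feature vectors; the 75–25% split and the default hyperparameters are illustrative assumptions.

```python
# Minimal sketch of the audio-feature classifier comparison above, assuming scikit-learn;
# make_classification() stands in for the handcrafted audio feature vectors and labels,
# and the 75-25% split and default hyperparameters are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score

X, y = make_classification(n_samples=300, n_features=30, n_informative=12, n_classes=3)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, stratify=y)

classifiers = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "K-Nearest Neighbors": KNeighborsClassifier(),
    "Decision Tree": DecisionTreeClassifier(),
    "Support Vector Machine": SVC(),
}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    pred = clf.predict(X_test)
    print(f"{name}: acc={accuracy_score(y_test, pred):.3f}, "
          f"F1={f1_score(y_test, pred, average='weighted'):.3f}")
```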
Data Type | Image Classification | Image Classification | Audio Classification
---|---|---|---
CNN Models | | |
Trainable Parameters | 1,947,431 | 6,654,119 | 281,603 |
Non-Trainable Parameters | 2944 | 4736 | 1792 |
Total Parameters | 1,944,487 | 6,649,383 | 283,395 |
Time | 1.04 | 1.06 | 0.43 |