Convolutional Neural Network-Based Digital Diagnostic Tool for the Identification of Psychosomatic Illnesses
Abstract
1. Introduction
- What are the current limitations in diagnosing psychosomatic illnesses using traditional methods [8]?
- Can a CNN model trained to recognize emotional expressions improve the diagnostic accuracy for psychosomatic illnesses [9]?
- How effectively can such a model be integrated into clinical workflows to support medical professionals without replacing their expert judgment?
2. Related Works
3. Methodology
3.1. CNN Architecture and Development
- The initial layers of the convolutional neural network progressively extract features from the facial images. They consist of two convolutional layers with 64 filters each, followed by a max pooling layer. This configuration captures fundamental facial characteristics, such as edges and simple textures.
- The intermediate layers comprise two sets of convolutional layers with an increasing number of filters, 128 and 256 per set, each set followed by a max pooling layer. These layers detect more complex features, such as the constituent parts of facial expressions [3].
- The final layers of the network are referred to as the “advanced layers.” A final set of deep convolutional layers with 512 filters is employed to extract high-level features that are crucial for the recognition of subtle nuances in expressions.
- The network concludes with fully connected layers that act as classifiers, i.e., classification layers. To prevent overfitting, the model employs dropout regularization, while batch normalization keeps the mean output close to 0 and the output standard deviation close to 1 throughout training [3].
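To make the description concrete, below is a minimal Keras sketch of this first architecture. The 3×3 kernel size, ReLU activations, dropout rate, dense-layer width, and the 48×48 grayscale input shape (standard for FER-2013) are assumptions rather than details taken from the paper. The bullets that follow describe the second model.

```python
# Hypothetical sketch of the first CNN, reconstructed from the layer
# description above. Kernel sizes, activations, and dropout rate are assumed.
from tensorflow.keras import layers, models

def build_model_1(input_shape=(48, 48, 1), num_classes=7):
    # FER-2013 defines 7 classes; the reported results cover 6 (no disgust),
    # so adjust num_classes as needed.
    model = models.Sequential([
        # Initial layers: two 64-filter convolutions, then max pooling.
        layers.Conv2D(64, (3, 3), activation="relu", padding="same",
                      input_shape=input_shape),
        layers.Conv2D(64, (3, 3), activation="relu", padding="same"),
        layers.MaxPooling2D((2, 2)),
        # Intermediate layers: sets with 128 and 256 filters,
        # each set followed by max pooling.
        layers.Conv2D(128, (3, 3), activation="relu", padding="same"),
        layers.Conv2D(128, (3, 3), activation="relu", padding="same"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(256, (3, 3), activation="relu", padding="same"),
        layers.Conv2D(256, (3, 3), activation="relu", padding="same"),
        layers.MaxPooling2D((2, 2)),
        # Advanced layers: deep 512-filter convolutions for high-level features.
        layers.Conv2D(512, (3, 3), activation="relu", padding="same"),
        layers.Conv2D(512, (3, 3), activation="relu", padding="same"),
        layers.MaxPooling2D((2, 2)),
        # Classification layers with batch normalization and dropout.
        layers.Flatten(),
        layers.Dense(512, activation="relu"),
        layers.BatchNormalization(),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```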
- Initial Layers: Similar to the first model, the initial layers include two convolutional layers with 64 filters each, followed by batch normalization and LeakyReLU activation [23], concluding with a max pooling layer and dropout for regularization.
- Intermediate Layers: The intermediate layers consist of two sets of convolutional layers with 128 and 256 filters per set, each followed by batch normalization, LeakyReLU activation, max pooling, and dropout layers.
- Advanced Layers: This model includes an additional set of deep convolutional layers with 256 filters, followed by batch normalization and LeakyReLU activation, before applying max pooling and dropout.
- Final Layers: After flattening the feature maps, the model includes a dense layer with 256 units, batch normalization, LeakyReLU activation, and dropout, ending with a dense softmax layer for classification.
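A matching sketch of the second model, written with the functional API. The block ordering (batch normalization and LeakyReLU after each convolution), kernel sizes, dropout rates, and LeakyReLU slope are assumptions, since the text specifies only the layer types and filter counts.

```python
# Hypothetical sketch of the second CNN. A conv "block" bundles convolution,
# batch normalization, LeakyReLU, max pooling, and dropout as described above.
from tensorflow.keras import layers, models

def conv_block(x, filters, n_convs=2, dropout=0.25):
    for _ in range(n_convs):
        x = layers.Conv2D(filters, (3, 3), padding="same")(x)
        x = layers.BatchNormalization()(x)
        x = layers.LeakyReLU(alpha=0.1)(x)
    x = layers.MaxPooling2D((2, 2))(x)
    x = layers.Dropout(dropout)(x)
    return x

def build_model_2(input_shape=(48, 48, 1), num_classes=7):
    inputs = layers.Input(shape=input_shape)
    x = conv_block(inputs, 64)    # initial layers
    x = conv_block(x, 128)        # intermediate set 1
    x = conv_block(x, 256)        # intermediate set 2
    x = conv_block(x, 256)        # advanced set
    x = layers.Flatten()(x)
    x = layers.Dense(256)(x)      # final dense layer
    x = layers.BatchNormalization()(x)
    x = layers.LeakyReLU(alpha=0.1)(x)
    x = layers.Dropout(0.5)(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    return models.Model(inputs, outputs)
```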
3.2. Methodology for Predicting Psychological Disorders Using Facial Analysis and Pulse Data
3.2.1. Pulse Detection Methodology
3.2.2. Integration with Facial Emotion Recognition
3.2.3. Overall System Architecture
3.2.4. Conclusions
3.2.5. Condition Interpretation
4. Evaluation of CNN Models
4.1. Performance Testing
4.2. Validation Techniques
- Cross-validation: To ensure the model’s robustness and reduce the likelihood of overfitting, k-fold cross-validation (k = 5) was employed during the training phase. The training dataset was divided into k subsets, and the model was trained k times, with a different subset held out for validation in each iteration (see the sketch after this list).
- Real-time testing: To simulate real-world usage, the model was also tested in a simulated clinical environment, where it processed live video feeds to detect and interpret emotional expressions in real time. This testing was essential for assessing the model’s performance in dynamic situations resembling actual patient interactions. The experiments, including both training and testing phases, were conducted on an MSI Summit B15 A11 MT laptop with an Intel Core i7 processor and 64 GB of RAM, running on a Linux operating system. The training duration for each session of 100 epochs was approximately 100 h.
- Enhanced data augmentation: To reflect the variability observed in real-world clinical settings, extensive data augmentation techniques were employed, including rotation, scaling, and lighting changes. This helped the model learn to recognize emotional cues accurately across diverse conditions; representative augmentation settings appear in the sketch after this list.
- Layer Optimization: A variety of architectural configurations were tested, including alterations to the number and size of convolutional layers and filters, with the objective of optimizing the model for the accurate detection of subtle emotional cues that are frequently indicative of psychosomatic illnesses.
- User-centric design: The interface for the emotion recognition tool was designed with input from healthcare professionals to ensure that it is intuitive and enhances rather than disrupts the clinical workflow.
- Feedback loops: Ongoing feedback was sought from end-users to facilitate iterative improvements to the system. This feedback was essential for refining the interface design, improving model accuracy, and ensuring the tool’s relevance in clinical settings [30].
- Ethical and privacy considerations: Given the sensitivity of psychological assessments, the model’s integration was conducted with a strong emphasis on ethical considerations and patient privacy. This included secure data handling and clear patient consent processes [31].
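The following sketch illustrates how the k-fold cross-validation and augmentation settings described above could be wired together in Keras. The specific augmentation ranges, batch size, and the `build_model_2` constructor are illustrative assumptions, not the paper’s exact configuration.

```python
# Illustrative k-fold cross-validation (k = 5) with rotation, scaling, and
# lighting augmentation, as described above. X: image tensors; y: one-hot labels.
import numpy as np
from sklearn.model_selection import KFold
from tensorflow.keras.preprocessing.image import ImageDataGenerator

def cross_validate(X, y, build_fn, k=5, epochs=100, batch_size=64):
    # Augmentation mirroring the rotation, scaling, and lighting changes above.
    augmenter = ImageDataGenerator(rotation_range=15,
                                   zoom_range=0.1,
                                   brightness_range=(0.8, 1.2))
    scores = []
    for train_idx, val_idx in KFold(n_splits=k, shuffle=True,
                                    random_state=42).split(X):
        model = build_fn()  # fresh model per fold to avoid leakage
        flow = augmenter.flow(X[train_idx], y[train_idx], batch_size=batch_size)
        model.fit(flow, validation_data=(X[val_idx], y[val_idx]), epochs=epochs)
        scores.append(model.evaluate(X[val_idx], y[val_idx], verbose=0)[1])
    return np.mean(scores), np.std(scores)
```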
5. Results and Analysis
The analysis uses the chi-squared statistic,

$$\chi^2 = \sum_{i} \frac{(O_i - E_i)^2}{E_i},$$

where:
- $O_i$ is the observed frequency.
- $E_i$ is the expected frequency.
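For illustration, the statistic can be computed with SciPy’s `chisquare`; the observed and expected counts below are made-up values, not the paper’s data.

```python
# Minimal chi-squared example with made-up frequencies (illustrative only).
from scipy.stats import chisquare

observed = [52, 48, 61, 39]   # O_i: observed counts per category
expected = [50, 50, 55, 45]   # E_i: expected counts under the null hypothesis
stat, p_value = chisquare(f_obs=observed, f_exp=expected)
print(f"chi2 = {stat:.3f}, p = {p_value:.3f}")
```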
5.1. Application in Psychosomatic Illness Diagnosis
5.2. Conclusions
6. Discussion
6.1. Comparison with Baseline Models and State-of-the-Art Techniques
- Logistic regression is a straightforward linear model often used for classification tasks.
- Support vector machine (SVM) with a radial basis function (RBF) kernel is a powerful non-linear classifier.
- k-nearest neighbors (k-NN) is a non-parametric method that classifies data based on proximity in the feature space.
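A brief scikit-learn sketch of how these baselines can be fitted on flattened pixel vectors; the hyperparameters shown are illustrative defaults, not the values used in the paper.

```python
# Baseline classifiers on flattened pixel vectors (illustrative settings).
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

baselines = {
    "logistic_regression": make_pipeline(StandardScaler(),
                                         LogisticRegression(max_iter=1000)),
    "svm_rbf": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "knn": KNeighborsClassifier(n_neighbors=5),
}

# X_train/X_test: images flattened to 1-D vectors; y_train/y_test: class labels.
def evaluate_baselines(X_train, y_train, X_test, y_test):
    return {name: clf.fit(X_train, y_train).score(X_test, y_test)
            for name, clf in baselines.items()}
```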
6.2. Limitations and Potential Biases
- Dataset Limitations: Although the FER-2013 dataset is diverse, it does not capture the full range of human emotions and lacks the context present in real clinical settings.
- Environmental Factors: The models were not tested under different lighting or background conditions, which could affect their performance in practical applications.
- Generalization: The ability of the models to generalize to different demographic groups was not assessed, which is of critical importance for a tool that is designed for clinical use.
6.3. Clinical Implications and Ethical Considerations
- Dataset expansion: Increasing the size and variability of the training dataset by collecting a wider range of emotional expressions and situations, and adding more instances with which to fine-tune the model and increase its accuracy.
- Validation under real-world settings: It is important to carry out experiments in actual clinical settings involving different demographics so that learned models can better adapt to real-world nuances.
- Integration with patient data: To improve the quality of the information the models provide, their results should be combined with additional data obtainable from patients, such as medical history and verbal reports.
7. Conclusions
7.1. Summary of Key Findings
- Certain emotional states can be identified consistently by CNN models, which might serve as proxies for underlying psychosomatic illnesses.
- The “happy” and “surprise” emotions were recognized with high accuracy; both play a significant role in understanding patient well-being.
- The confusion matrices and performance metrics identified areas for improvement, such as recognizing emotions that are under-represented in the training dataset.
7.2. Recommendations for Future Research
- Enlarging and diversifying the training dataset to have more coverage over emotional expressions and demographic variability.
- Carrying out clinical validation studies that test the models in a real-world setting to refine their predictive capabilities.
- Combining multimodal data, such as physiological signals and audio cues, to create a more comprehensive emotion recognition system.
- Reinforcement Learning [32]: Reinforcement learning could improve generalization by enabling trial-and-error learning driven by real-time feedback.
- Ensemble Methods [33]: Ensemble methods such as bagging, boosting, and stacking combine multiple models to improve overall performance.
- GANs [34]: Generative adversarial networks can synthesize additional training data, which is especially valuable for augmenting under-represented classes.
- RNNs [35]: Recurrent neural networks can model temporal patterns, aiding the understanding of how emotional states evolve over time.
- Hybrid Models: Combining CNNs with SVMs or decision trees allows different data types and decision boundaries to be handled; a sketch of one such pairing follows this list.
- Neural Architecture Search (NAS) [36]: NAS automates the search for a network structure best suited to the dataset at hand.
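As an example of the hybrid approach mentioned above, the sketch below reuses a trained CNN as a feature extractor and fits an RBF SVM on its penultimate-layer activations. `trained_cnn` and the layer index are hypothetical placeholders, not part of the paper’s method.

```python
# Hedged sketch of a hybrid CNN + SVM: reuse a trained CNN as a feature
# extractor, then fit an SVM on the extracted feature vectors.
from sklearn.svm import SVC
from tensorflow.keras import models

def hybrid_cnn_svm(trained_cnn, X_train, y_train):
    # Cut the network just before the softmax layer to obtain feature vectors.
    extractor = models.Model(inputs=trained_cnn.input,
                             outputs=trained_cnn.layers[-2].output)
    features = extractor.predict(X_train)
    svm = SVC(kernel="rbf").fit(features, y_train)
    return extractor, svm
```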
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Hong, A.J.; DiStefano, D.; Dua, S. Can CNNs Accurately Classify Human Emotions? A Comparative Study. arXiv 2023, arXiv:2310.09473. Available online: https://arxiv.org/pdf/2310.09473 (accessed on 1 June 2024).
- Romanovs, A.; Sultanovs, E.; Buss, E.; Merkuryev, Y.; Majore, G. Challenges and Solutions for Resilient Telemedicine Services. In Proceedings of the 2020 IEEE 8th Workshop on Advances in Information, Electronic and Electrical Engineering (AIEEE), Vilnius, Lithuania, 22–24 April 2021; pp. 1–7.
- Navakauskiene, R.; Treigyte, G.; Borutinskaite, V.-V.; Matuzevicius, D.; Navakauskas, D.; Magnusson, K.-E. Alpha-Dystrobrevin and its Associated Proteins in Human Promyelocytic Leukemia Cells Induced to Apoptosis. J. Proteom. 2012, 75, 3291–3303.
- Narigina, M.; Kempelis, A.; Romanovs, A. Machine Learning-Based Forecasting of Sensor Data for Enhanced Environmental Sensing. WSEAS Trans. Syst. 2023, 22, 543–555.
- Valiuliene, G.; Treigyte, G.; Savickiene, J.; Matuzevicius, D.; Alksne, M.; Jarasiene-Burinskaja, R.; Bukelskiene, V.; Navakauskas, D.; Navakauskiene, R. Histone Modifications Patterns in Tissues and Tumours from Acute Promyelocytic Leukemia Xenograft Model in Response to Combined Epigenetic Therapy. Biomed. Pharmacother. 2016, 79, 62–70.
- Narigina, M.; Osadcijs, E.; Romanovs, A. Analysis of Medical Data Processing Technologies. In Proceedings of the 2022 63rd International Scientific Conference on Information Technology and Management Science of Riga Technical University (ITMS), Riga, Latvia, 6–7 October 2022; pp. 1–6.
- Sultanovs, E.; Strebko, J.; Romanovs, A.; Lektauers, A. The Information Technologies in the Control Mechanism of Medical Processes. In Proceedings of the 2020 61st International Scientific Conference on Information Technology and Management Science of Riga Technical University (ITMS), Riga, Latvia, 15–16 October 2020; pp. 1–5.
- Wei, Z. A Novel Facial Expression Recognition Method for Real-Time Applications. In IOP Conference Series: Materials Science and Engineering; IOP Publishing: Bristol, UK, 2019; Volume 612, p. 062048. Available online: https://iopscience.iop.org/article/10.1088/1757-899X/612/5/052048/pdf (accessed on 1 June 2024).
- Qu, D.; Zheng, Y.; Li, H.; Wang, J.; Chen, X. Facial Emotion Recognition using CNN in PyTorch. arXiv 2023, arXiv:2312.10818. Available online: https://arxiv.org/pdf/2312.10818 (accessed on 22 July 2024).
- Verkruysse, W.; Svaasand, L.O.; Nelson, J.S. Remote photoplethysmographic imaging using ambient light. Opt. Express 2008, 16, 21434–21445.
- Georgescu, M.-I.; Ionescu, R.T.; Popescu, M. Local Learning with Deep and Handcrafted Features for Facial Expression Recognition. IEEE Access 2019, 7, 64827–64836.
- Pecoraro, R.; Basile, V.; Bono, V. Local Multi-Head Channel Self-Attention for Facial Expression Recognition. Information 2022, 13, 419.
- Fard, A.P.; Mahoor, M.H. Ad-Corre: Adaptive Correlation-Based Loss for Facial Expression Recognition in the Wild. IEEE Access 2022, 10, 26756–26768.
- Vignesh, S.; Savithadevi, M.; Sridevi, M.; Sridhar, R. A Novel Facial Emotion Recognition Model Using Segmentation VGG-19 Architecture. Int. J. Inf. Technol. 2023, 15, 1777–1787.
- Mukhopadhyay, M.; Dey, A.; Kahali, S. A Deep-Learning-Based Facial Expression Recognition Method Using Textural Features. Neural Comput. Appl. 2023, 35, 6499–6514.
- Shahzad, T.; Iqbal, K.; Khan, M.A.; Iqbal, N. Role of Zoning in Facial Expression Using Deep Learning. IEEE Access 2023, 11, 16493–16508.
- El Boudouri, Y.; Bohi, A. EmoNeXt: An Adapted ConvNeXt for Facial Emotion Recognition. In Proceedings of the 2023 IEEE 25th International Workshop on Multimedia Signal Processing (MMSP), Poitiers, France, 27–29 September 2023.
- Goodfellow, I.J.; Erhan, D.; Carrier, P.L.; Courville, A.; Mirza, M.; Hamner, B.; Cukierski, W.; Tang, Y.; Thaler, D.; Lee, D.-H. Challenges in Representation Learning: A Report on Three Machine Learning Contests. In Proceedings of the Neural Information Processing: 20th International Conference, ICONIP 2013, Daegu, Republic of Korea, 3–7 November 2013; Proceedings, Part III. Springer: Berlin/Heidelberg, Germany, 2013; pp. 117–124.
- Sambare, M. FER 2013: Facial Expression Recognition Dataset [Data Set]. Kaggle. Available online: https://www.kaggle.com/datasets/msambare/fer2013/data (accessed on 1 June 2024).
- Chand, H.V.; Chrisanthus, A.; Thampi, A.K. A Review on Various CNN-based Approaches for Facial Emotion Recognition. In Proceedings of the International Congress on Information and Communication Technology, Lalitpur, Nepal, 26–28 April 2023; Available online: https://ieeexplore.ieee.org/document/10133947 (accessed on 1 June 2024).
- Şen, S.Y.; Özkurt, N. Convolutional Neural Network Hyperparameter Tuning with Adam Optimizer for ECG Classification. In Proceedings of the 2020 Innovations in Intelligent Systems and Applications Conference (ASYU), Istanbul, Turkey, 15–17 October 2020; pp. 1–6.
- Ozdemir, M.A.; Elagoz, B.; Alkan, A. Real Time Emotion Recognition from Facial Expressions Using CNN. In Proceedings of the 2019 Medical Technologies Congress (TIPTEKNO), Izmir, Turkey, 3–5 October 2019; IEEE: Piscataway, NJ, USA, 2019. Available online: https://ieeexplore.ieee.org/document/8895215 (accessed on 1 June 2024).
- He, K.; Zhang, X.; Ren, S.; Sun, J. Empirical Evaluation of Rectified Activations in Convolution Network. arXiv 2015.
- Chowdhury, S.; Chowdhury, S.; Ifty, J.T.; Khan, R. Vehicle Detection and Classification Using Deep Neural Networks. In Proceedings of the 2022 International Conference on Electrical and Information Technology (IEIT), Malang, Indonesia, 15–16 September 2022; pp. 95–100.
- Wang, Y.; Bian, Z.-P.; Hou, J.; Chau, L.-P. Convolutional Neural Networks With Dynamic Regularization. IEEE Trans. Neural Netw. Learn. Syst. 2021, 32, 2299–2304.
- Li, X.; Chen, J.; Zhao, G.; Pietikäinen, M. Remote Heart Rate Measurement from Face Videos under Realistic Situations. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 4264–4271.
- Pradeep, V.; Madhushree, B.; Sumukha, B.S.; Richards, G.R.; Prashant, S.P. Facial Emotion Detection using CNN and OpenCV. In Proceedings of the 2024 International Conference on Emerging Technologies in Computer Science for Interdisciplinary Applications (ICETCS), Bengaluru, India, 25–26 April 2024; pp. 1–6.
- Yang, Y.; Liu, F.; Li, M.; Jin, J.; Weber, E.; Liu, Q.; Crozier, S. Pseudo-Polar Fourier Transform-Based Compressed Sensing MRI. IEEE Trans. Biomed. Eng. 2017, 64, 816–825.
- Tarassenko, L.; Villarroel, M.; Guazzi, A.; Jorge, J.; Clifton, D.A.; Pugh, C. Non-contact video-based vital sign monitoring using ambient light and auto-regressive models. Physiol. Meas. 2014, 35, 807–831.
- De Haan, G.; Jeanne, V. Robust pulse rate from chrominance-based rPPG. IEEE Trans. Biomed. Eng. 2013, 60, 2878–2886.
- Poh, M.-Z.; McDuff, D.J.; Picard, R.W. Non-contact, automated cardiac pulse measurements using video imaging and blind source separation. Opt. Express 2010, 18, 10762–10774.
- Song, C.; Chen, C.; Li, Y.; Wu, X. Deep Reinforcement Learning Apply in Electromyography Data Classification. In Proceedings of the 2018 IEEE International Conference on Cyborg and Bionic Systems (CBS), Shenzhen, China, 25–27 October 2018; pp. 505–510.
- Gang, Z.; Jia, C.; Guo, C.; Li, P.; Gao, J.; Zhao, L. Predicting Chronic Obstructive Pulmonary Disease Based on Multi-Stage Composite Ensemble Learning Framework. In Proceedings of the 2023 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Istanbul, Turkiye, 5–8 December 2023; pp. 1921–1924.
- Chang, J.; Zhang, Z.; Wang, Z.; Li, J.; Meng, L.; Lin, P. Generative Listener EEG for Speech Emotion Recognition Using Generative Adversarial Networks With Compressed Sensing. IEEE J. Biomed. Health Inform. 2024, 28, 2025–2036.
- Le, M.D.; Singh Rathour, V.; Truong, Q.S.; Mai, Q.; Brijesh, P.; Le, N. Multi-module Recurrent Convolutional Neural Network with Transformer Encoder for ECG Arrhythmia Classification. In Proceedings of the 2021 IEEE EMBS International Conference on Biomedical and Health Informatics (BHI), Athens, Greece, 27–30 July 2021; pp. 1–5.
- Zhao, F.; Nie, J.; Ma, M.; Chen, X.; He, X.; Wang, B.; Hou, Y. Assessing the Role of Different Heterogeneous Regions in DCE-MRI for Predicting Molecular Subtypes of Breast Cancer based on Network Architecture Search and Vision Transformer. In Proceedings of the 2023 45th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), Sydney, Australia, 24–27 July 2023; pp. 1–4.
Table: Number of training and test images per emotion in the FER-2013 dataset.

Emotion | Training Images | Test Images |
---|---|---|
Angry | 3995 | 958 |
Disgust | 436 | 111 |
Fear | 4097 | 1024 |
Happy | 7215 | 1774 |
Neutral | 4965 | 1233 |
Sad | 4830 | 1247 |
Surprise | 3171 | 831 |
Table: Classification report for CNN model 1 on the FER-2013 test set.

Emotions | Precision | Recall | F1-Score | Support |
---|---|---|---|---|
Angry | 0.55 | 0.51 | 0.53 | 958 |
Fear | 0.45 | 0.24 | 0.32 | 1024 |
Happy | 0.75 | 0.88 | 0.81 | 1774 |
Neutral | 0.52 | 0.61 | 0.57 | 1233 |
Sad | 0.47 | 0.49 | 0.48 | 1247 |
Surprise | 0.70 | 0.72 | 0.71 | 831 |
Accuracy | | | 0.62 | 7067 |
Macro Avg | 0.60 | 0.59 | 0.59 | 7067 |
Weighted Avg | 0.61 | 0.62 | 0.61 | 7067 |
Table: Classification report for CNN model 2 on the FER-2013 test set.

Emotions | Precision | Recall | F1-Score | Support |
---|---|---|---|---|
Angry | 0.58 | 0.55 | 0.56 | 958 |
Fear | 0.50 | 0.28 | 0.36 | 1024 |
Happy | 0.85 | 0.88 | 0.87 | 1774 |
Neutral | 0.55 | 0.68 | 0.61 | 1233 |
Sad | 0.49 | 0.53 | 0.51 | 1247 |
Surprise | 0.72 | 0.77 | 0.75 | 831 |
Accuracy | | | 0.64 | 7067 |
Macro Avg | 0.62 | 0.62 | 0.61 | 7067 |
Weighted Avg | 0.63 | 0.64 | 0.63 | 7067 |
Table: Performance of the CNN models compared with baseline classifiers.

Model | Accuracy | Precision | Recall | F1-Score |
---|---|---|---|---|
Logistic regression | 0.40 | 0.39 | 0.38 | 0.38 |
SVM | 0.45 | 0.44 | 0.43 | 0.43 |
k-NN | 0.42 | 0.40 | 0.39 | 0.39 |
CNN model 1 | 0.62 | 0.60 | 0.59 | 0.59 |
CNN model 2 | 0.64 | 0.62 | 0.62 | 0.61 |
Table: Accuracy comparison with state-of-the-art facial emotion recognition models.

Model | Accuracy |
---|---|
Residual Masking Network | 0.74 |
EmoNeXt | 0.71 |
CNN model 1 | 0.62 |
CNN model 2 | 0.64 |