A Method for Automatic Emotion Detection Through Machine Learning
Abstract
1. Introduction
2. Related Work
3. Materials and Methods
- SqueezeNet: a deep model for image recognition with a thin structure, few structural parameters, and a low computational cost. It uses only 1 × 1 and 3 × 3 convolution kernels, with the aim of simplifying the network while maintaining a classification accuracy similar to that of a public network [52].
- Inception v3: a deep Convolutional Neural Network architecture that is particularly effective for image classification and recognition problems such as those of the ImageNet dataset. It is designed to achieve low error rates. It also has the advantage of minimizing concatenation operations to highly associated nodes so that they remain scattered [53].
- Logistic Regression: a classification algorithm with LASSO (L1) or ridge (L2) regularization. It learns a Logistic Regression model from the data and it only works for classification tasks [54].
- Neural Network: a Multi-layer Perceptron, based on sklearn’s implementation, that can learn non-linear as well as linear models [55].
- Gradient Boosting: based on XGBoost, an optimized distributed gradient boosting library designed to be efficient and flexible. It provides parallel tree boosting, implementing machine learning algorithms under the Gradient Boosting framework [56].
- Random Forest: an ensemble learning method used for classification, regression, and other tasks [57].
- Conv2D: A two-dimensional (2D) convolution layer that creates a convolution kernel and convolves it with the layer input to produce an output tensor;
- MaxPooling2D: Downsamples the input along its spatial dimensions (height and width) by taking the maximum value over an input window for each input channel;
- Batch Normalization: Essential for stabilizing the training of neural networks, its goal is to standardize the output, keeping the mean close to 0 and the standard deviation close to 1. During the training phase, it normalizes the data using the mean and standard deviation of the input batch. During inference, by contrast, it uses a moving average of the statistics (mean and standard deviation) collected during training;
- Flatten: Converts multidimensional feature maps into a one-dimensional vector per sample (a two-dimensional matrix once the batch dimension is included), typically in the transition from the convolutional layers to the fully connected layers, without affecting the batch size;
- Dropout: Prevents overfitting by randomly setting inputs to 0 with a frequency of rate at each training step, while inputs not set to 0 are scaled by 1/(1 − rate) so that the expected sum of all inputs does not change;
- Dense: Each neuron takes input from every neuron in the prior layer, executing a matrix–vector multiplication. The values in this matrix, which serve as the layer’s parameters, can be learned and refined using backpropagation.
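
The layer behaviours listed above can be illustrated with a small NumPy sketch. It is not the network used in this work: the tensor shapes, the dropout rate of 0.5, the epsilon value, and every function name are assumptions chosen for illustration, and the convolution step itself is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)


def max_pool_2x2(x):
    """Downsample (batch, H, W, C) by taking the maximum over each 2x2 window."""
    b, h, w, c = x.shape
    return x.reshape(b, h // 2, 2, w // 2, 2, c).max(axis=(2, 4))


def batch_norm_train(x, eps=1e-3):
    """Training mode: normalise with the statistics of the current batch."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return (x - mean) / np.sqrt(var + eps), mean, var


def batch_norm_infer(x, moving_mean, moving_var, eps=1e-3):
    """Inference mode: normalise with statistics collected during training."""
    return (x - moving_mean) / np.sqrt(moving_var + eps)


def flatten(x):
    """Collapse every axis except the batch axis."""
    return x.reshape(x.shape[0], -1)


def dropout_train(x, rate, rng):
    """Zero inputs with probability `rate`; scale survivors by 1 / (1 - rate)."""
    keep = rng.random(x.shape) >= rate
    return np.where(keep, x / (1.0 - rate), 0.0)


def dense(x, weights, bias):
    """Fully connected layer: every output unit sees every input unit."""
    return x @ weights + bias


# Hypothetical batch: 8 feature maps of 4x4 pixels with 3 channels.
feature_maps = rng.normal(size=(8, 4, 4, 3))
pooled = max_pool_2x2(feature_maps)                     # shape (8, 2, 2, 3)
flat = flatten(pooled)                                  # shape (8, 12)
normed, batch_mean, batch_var = batch_norm_train(flat)
dropped = dropout_train(normed, rate=0.5, rng=rng)
# Four outputs, one per emotion class; the weights are random placeholders.
logits = dense(dropped, rng.normal(size=(12, 4)), np.zeros(4))
```

At inference time dropout is simply skipped, and `batch_norm_infer` would be called with moving averages of the batch statistics rather than with the statistics of the batch being classified.
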
4. Results
- Disgust + Fear
- Happiness + Surprise
- Angry + Sadness
- Neutral
4.1. Experiment 1
4.2. Experiment 2
5. Conclusions and Future Work
- Our dataset shows a significant imbalance between the classes “Happiness + Surprise” (13,417 images) and “Angry + Sadness” (12,151 images) on the one hand and the classes “Disgust + Fear” (4339 images) and “Neutral” (5718 images) on the other. The source classes in the Kaggle repository are themselves unevenly populated: “Disgust” contains 547 images and “Fear” 5121 images, compared with 8989 images in the “Happiness” folder, 4002 images in the “Surprise” folder, 4953 images in the “Angry” class, and 6077 images in the “Sadness” class. Therefore, a potential bias in favour of the majority classes cannot be ruled out, which makes it appropriate to conduct future studies on more balanced datasets.
- The quality of some images within the dataset could be another factor influencing the learning capabilities of the model. Moreover, some images are ambiguous or similar to one another, which makes it difficult to decide which emotional class they belong to. In addition, every labelling decision can be influenced, even if only minimally, by the subjectivity of the person who annotates the emotions in the dataset.
- The Orange environment primarily provides a graphical interface that abstracts many implementation details. While this limits fine-grained control over learning parameters, Orange was intentionally chosen for its reproducibility and accessibility, and because it streamlines experimentation and makes model comparisons clear. Despite the limited parameter control, the environment supports standardized preprocessing and model benchmarking, ensuring fair and consistent comparisons across algorithms.
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Campeau, S.; Falls, W.A.; Cullinan, W.; Picard, R.W. Informatica Affettiva; Cambridge University Press: Cambridge, UK, 1997.
- Pel, G.; Li, H.; Lu, Y.; Wang, Y.; Hua, S.; Li, T. Affective Computing: Recent advances, challenges, and future trends. Intell. Comput. 2024, 3, 0076.
- Cesarelli, M.; Martinelli, F.; Mercaldo, F.; Santone, A. Emotion recognition from facial expression using explainable deep learning. In 2022 IEEE International Conference on Dependable, Autonomic and Secure Computing, International Conference on Pervasive Intelligence and Computing, International Conference on Cloud and Big Data Computing, International Conference on Cyber Science and Technology Congress (DASC/PiCom/CBDCom/CyberSciTech), Calabria, Italy, 12–15 September 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 1–6.
- Liu, H.; Zhou, Q.; Zhang, C.; Zhu, J.; Liu, T.; Zhang, Z.; Li, Y.F. MMATrans: Muscle movement aware representation learning for facial expression recognition via transformers. IEEE Trans. Ind. Inform. 2024, 20, 13753–13764.
- Liu, T.; Wang, J.; Yang, B.; Wang, X. NGDNet: Nonuniform Gaussian-label distribution learning for infrared head pose estimation and on-task behavior understanding in the classroom. Neurocomputing 2021, 436, 210–220.
- Liu, H.; Deng, L.; Liu, T.; Meng, R.; Zhang, Z.; Li, Y.F. ACAForms: Learning Adaptive Context-Aware Feature for Facial Expression Recognition in Human-Robot Interaction. In Proceedings of the 2024 9th International Conference on Robotics and Automation Engineering (ICRAE), Singapore, 15–17 November 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 175–180.
- Mattioli, M.; Cabitza, F. Not in my face: Challenges and ethical considerations in automatic face emotion recognition technology. Mach. Learn. Knowl. Extr. 2024, 6, 2201–2231.
- Canal, F.Z.; Müller, T.R.; Matias, J.C.; Scotton, G.G.; de Sa Junior, A.R.; Pozzebon, E.; Sobieranski, A.C. A survey on facial emotion recognition techniques: A state-of-the-art literature review. Inf. Sci. 2022, 582, 593–617.
- Ekman, P.; Sorenson, E.R.; Friesen, W.V. Pan-cultural elements in facial displays of emotion. Science 1969, 164, 86–88.
- Andrejevic, M.; Selwyn, N. Facial recognition technology in schools: Critical questions and concerns. Learn. Media Technol. 2020, 45, 115–128.
- Turk, M.; Pentland, A. Eigenfaces for recognition. J. Cogn. Neurosci. 1991, 3, 71–86.
- Dimino, G. Introduction to Modern Artificial Intelligence Techniques. Elettronica e Telecomunicazioni, 2020. pp. 5–20, RAI-Centre for Research, Technological Innovation and Experimentation. Available online: http://www.crit.rai.it/eletel/2020-1/201-2.pdf (accessed on 18 December 2025).
- Naga, P.; Marri, S.D.; Borreo, R. Facial emotion recognition methods, datasets and technologies: A literature survey. Mater. Today Proc. 2023, 80, 2824–2828.
- Mohanta, S.R.; Veer, K. Trends and challenges of image analysis in facial emotion recognition: A review. Netw. Model. Anal. Health Inform. Bioinform. 2022, 11, 35.
- Jones, M.; Viola, P. Fast Multi-View Face Detection; Mitsubishi Electric Research Lab TR-20003-96; Mitsubishi Electric Research Laboratories, Inc.: Cambridge, MA, USA, 2003; Volume 3, p. 2.
- Soo, S. Object Detection Using Haar-Cascade Classifier; Institute of Computer Science, University of Tartu: Tartu, Estonia, 2014; Volume 2, pp. 1–12.
- Kumar, K.S.; Prasad, S.; Semwal, V.B.; Tripathi, R.C. Real time face recognition using AdaBoost improved fast PCA algorithm. Int. J. Artif. Intell. Appl. 2011, 2, 45–58.
- Rajesh, K.; Naveenkumar, M. A robust method for face recognition and face emotion detection system using support vector machines. In Proceedings of the 2016 International Conference on Electrical, Electronics, Communication, Computer and Optimization Techniques (ICEECCOT), Mysuru, India, 9–10 December 2016; pp. 1–5.
- Wang, Y.; Li, Y.; Song, Y.; Rong, X. Facial expression recognition based on random forest and convolutional Neural Network. Information 2019, 10, 375.
- Li, X.; Ji, Q. Active affective state detection and user assistance with dynamic Bayesian Networks. IEEE Trans. Syst. Man Cybern. Part A Syst. Hum. 2004, 35, 93–105.
- Adyapady, R.R.; Annappa, B. A comprehensive review of facial expression recognition techniques. Multimed. Syst. 2023, 29, 73–103.
- Ghimire, D.; Jeong, S.; Lee, J.; Park, S.H. Facial expression recognition based on local region specific features and support vector machines. Multimed. Tools Appl. 2017, 76, 7803–7821.
- Ahonen, T.; Hadid, A.; Pietikainen, M. Face description with local binary patterns: Application to face recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2006, 28, 2037–2041.
- Zhong, L.; Liu, Q.; Yang, P.; Huang, J.; Metaxas, D.N. Learning multiscale active facial patches for expression analysis. IEEE Transact. Cybern. 2014, 45, 1499–1510.
- Liong, S.T.; See, J.; Wong, K.; Phan, R.C.W. Less is more: Micro-expression recognition from video using apex frame. Signal Process. 2018, 62, 82–92.
- Guo, C.; Liang, J.; Zhan, G.; Liu, Z.; Pietikäinen, M.; Liu, L. Extended local binary patterns for efficient and robust spontaneous facial micro-expression recognition. IEEE Access 2019, 7, 174517–174530.
- Rashmi, R.A.; Annappa, B. Micro expression recognition using delaunay triangulation and voronoi tessellation. IETE J. Res. 2023, 69, 8019–8035.
- Kim, D.H.; Baddar, W.J.; Jang, J.; Ro, Y.M. Multi-objective based spatio-temporal feature representation learning robust to expression intensity variations for facial expression recognition. IEEE Trans. Affect. Comput. 2017, 10, 223–236.
- Zhang, K.; Huang, Y.; Du, Y.; Wang, L. Facial expression recognition based on deep evolutional spatial-temporal networks. IEEE Trans. Image Process. 2017, 26, 4193–4203.
- Sun, N.; Li, Q.; Huan, R.; Liu, J.; Han, G. Deep spatial-temporal feature fusion for facial expression recognition in static images. Pattern Recogn. Lett. 2019, 119, 49–61.
- Georgescu, M.I.; Ionescu, R.T.; Popescu, M. Local learning with deep and handcrafted features for facial expression recognition. IEEE Access 2019, 7, 64827–64836.
- Ruan, D.; Yan, Y.; Lai, S.; Chai, Z.; Shen, C.; Wang, H. Feature decomposition and reconstruction learning for effective facial expression recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 7660–7669.
- Cai, J.; Meng, Z.; Khan, A.S.; O’Reilly, J.; Li, Z.; Han, S.; Tong, Y. Identity-free facial expression recognition using conditional generative adversarial network. In Proceedings of the 2021 IEEE International Conference on Image Processing (ICIP), Anchorage, AK, USA, 19–22 September 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 1344–1348.
- Liu, Y.; Feng, C.; Yuan, X.; Zhou, L.; Wang, W.; Qin, J.; Luo, Z. Clip-aware expressive feature learning for video-based facial expression recognition. Inf. Sci. 2022, 598, 182–195.
- Xie, S.; Hu, H.; Wu, Y. Deep multi-path convolutional neural network joint with salient region attention for facial expression recognition. Pattern Recogn. 2019, 92, 177–191.
- Li, S.; Deng, W. Reliable crowdsourcing and deep locality-preserving learning for unconstrained facial expression recognition. IEEE Trans. Image Process. 2018, 28, 356–370.
- Sun, X.; Xia, P.; Zhang, L.; Shao, L. A roi-guided deep architecture for robust facial expressions recognition. Inf. Sci. 2020, 522, 35–48.
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 2012, 25.
- Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1–9.
- Shao, J.; Qian, Y. Three convolutional neural network models for facial expression recognition in the wild. Neurocomputing 2019, 355, 82–92.
- Safarov, F.; Kutlimuratov, A.; Khojamuratova, U.; Abdusalomov, A.; Cho, Y.I. Enhanced AlexNet with Gabor and Local Binary Pattern Features for Improved Facial Emotion Recognition. Sensors 2025, 25, 3832.
- Li, S.; Wang, J.; Tian, L.; Wang, J.; Huang, Y. A fine-grained human facial key feature extraction and fusion method for emotion recognition. Sci. Rep. 2025, 15, 6153.
- So, J.; Han, Y. Facial Landmark-Driven Keypoint Feature Extraction for Robust Facial Expression Recognition. Sensors 2025, 25, 3762.
- Lee, J.; Choi, Y.; Kim, H.; Kim, I.J.; Nam, G.P. Navigating label ambiguity for facial expression recognition in the wild. In Proceedings of the AAAI Conference on Artificial Intelligence, Philadelphia, PA, USA, 25 February–4 March 2025; AAAI Press: Washington, DC, USA, 2025; Volume 39, pp. 4517–4525.
- Abate, A.F.; Bisogni, C.; Castiglione, A.; Nappi, M. Head pose estimation: An extensive survey on recent techniques and applications. Pattern Recognit. 2022, 127, 108591.
- Hsu, W.Y.; Chung, C.J. A novel eye center localization method for head poses with large rotations. IEEE Trans. Image Process. 2020, 30, 1369–1381.
- Kazemi, V.; Sullivan, J. One millisecond face alignment with an ensemble of regression trees. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; IEEE: Piscataway, NJ, USA, 2014; pp. 1867–1874.
- Hempel, T.; Abdelrahman, A.A.; Al-Hamadi, A. 6d rotation representation for unconstrained head pose estimation. In Proceedings of the 2022 IEEE International Conference on Image Processing (ICIP), Bordeaux, France, 16–19 October 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 2496–2500.
- Liu, H.; Zhang, C.; Deng, Y.; Liu, T.; Zhang, Z.; Li, Y.F. Orientation cues-aware facial relationship representation for head pose estimation via transformer. IEEE Trans. Image Process. 2023, 32, 6289–6302.
- Narayan, K.; VS, V.; Chellappa, R.; Patel, V.M. Facexformer: A unified transformer for facial analysis. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Honolulu, HI, USA, 19–23 October 2025; pp. 11369–11382.
- Wu, X.G.; Xie, H.J.; Niu, X.C.; Wang, C.; Wang, Z.L.; Zhang, S.W.; Shan, Y.Z. Transformer-based weakly supervised 3D human pose estimation. J. Vis. Commun. Image Represent. 2025, 109, 104432.
- Wang, A.; Wang, M.; Jiang, K.; Cao, M.; Iwahori, Y. A dual neural architecture combined SqueezeNet with OctConv for LiDAR data classification. Sensors 2019, 19, 4927.
- Alsubai, S.; Alqahtani, A.; Sha, M. Genetic hyperparameter optimization with Modified Scalable-Neighbourhood Component Analysis for breast cancer prognostication. Neural Netw. 2023, 162, 240–257.
- Available online: https://orangedatamining.com/widget-catalog/model/logisticregression/ (accessed on 1 September 2025).
- Available online: https://orangedatamining.com/widget-catalog/model/neuralnetwork/ (accessed on 1 September 2025).
- Available online: https://xgboost.readthedocs.io/en/latest/index.html (accessed on 1 September 2025).
- Available online: https://orangedatamining.com/widget-catalog/model/randomforest/ (accessed on 1 September 2025).
- Available online: https://www.kaggle.com/datasets/ananthu017/emotion-detection-fer/data (accessed on 1 September 2025).
- Available online: https://orangedatamining.com/download/ (accessed on 1 September 2025).
- Cai, Y.; Li, X.; Li, J. Emotion recognition using different sensors, emotion models, methods and datasets: A comprehensive review. Sensors 2023, 23, 2455.

| Algorithm | Hyperparameters |
|---|---|
| Logistic Regression | Regularization type ridge (L2), strength C = 1 |
| Neural Network | Neurons in hidden layers 100, activation ReLU, solver Adam, regularization alpha = 0.0001, maximal number of iterations 200, replicable training |
| Gradient Boosting (Experiment 1) | Number of trees 100, learning rate 0.300, replicable training, regularization Lambda 1, limit depth of individual trees 6, fraction of training instances 1.00, fraction of features for each tree 1.00, fraction of features for each level 1.00, fraction of features for each split 1.00 |
| Gradient Boosting (Experiment 2) | Number of trees 150, learning rate 0.100, replicable training, regularization Lambda 1, limit depth of individual trees 4, fraction of training instances 0.80, fraction of features for each tree 0.80, fraction of features for each level 0.80, fraction of features for each split 0.80 |
| Random Forest (Experiment 2) | Number of trees 100, number of attributes considered at each split 20, limit depth of individual trees 15, do not split subsets smaller than 10 |

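
The settings in the table above map roughly onto scikit-learn estimators, as sketched below. This is an approximation and does not reproduce the Orange workflow: scikit-learn's `GradientBoostingClassifier` is swapped in for the XGBoost-backed learner, so the Lambda regularization and the per-tree and per-level feature fractions have no direct equivalent here, and the synthetic data from `make_classification` stands in for the image embeddings. The value `max_iter=1000` for Logistic Regression and all `random_state` values are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for image embeddings (assumption): 400 samples, 40 features, 4 classes.
X, y = make_classification(
    n_samples=400, n_features=40, n_informative=10, n_classes=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

models = {
    # Ridge (L2) is scikit-learn's default penalty; C = 1 as in the table.
    "Logistic Regression": LogisticRegression(C=1.0, max_iter=1000),
    "Neural Network": MLPClassifier(
        hidden_layer_sizes=(100,), activation="relu", solver="adam",
        alpha=0.0001, max_iter=200, random_state=0,
    ),
    # Experiment 2 values; subsample and max_features approximate the 0.80 fractions.
    "Gradient Boosting": GradientBoostingClassifier(
        n_estimators=150, learning_rate=0.1, max_depth=4,
        subsample=0.8, max_features=0.8, random_state=0,
    ),
    "Random Forest": RandomForestClassifier(
        n_estimators=100, max_features=20, max_depth=15,
        min_samples_split=10, random_state=0,
    ),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # classification accuracy (CA)

for name, accuracy in scores.items():
    print(f"{name}: CA = {accuracy:.3f}")
```

The accuracies printed here describe the synthetic data only and say nothing about the results reported in this work.
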
| Class (Emotion) | # of Images |
|---|---|
| Disgust + Fear | 4339 |
| Happiness + Surprise | 13,417 |
| Angry + Sadness | 12,151 |
| Neutral | 5718 |
| Total | 35,625 |

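
The imbalance in the class counts above can be quantified with a few lines of Python. The sketch computes each class's share of the dataset and the inverse-frequency ("balanced") class weights that are one common way to compensate for imbalance. Class weighting is shown only as an illustration and is not part of the method described here; the variable names are hypothetical.

```python
# Image counts per merged class, taken from the table above.
counts = {
    "Disgust + Fear": 4339,
    "Happiness + Surprise": 13417,
    "Angry + Sadness": 12151,
    "Neutral": 5718,
}

total = sum(counts.values())
n_classes = len(counts)

# Fraction of the dataset that each class represents.
share = {name: n / total for name, n in counts.items()}

# Inverse-frequency weights: total / (n_classes * count).
# A weight above 1 marks an under-represented class.
weights = {name: total / (n_classes * n) for name, n in counts.items()}

# Ratio between the largest and the smallest class.
imbalance_ratio = max(counts.values()) / min(counts.values())

for name in counts:
    print(f"{name}: share = {share[name]:.3f}, weight = {weights[name]:.3f}")
print(f"largest / smallest class = {imbalance_ratio:.2f}")
```
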
| Embedder: SqueezeNet | Precision | Recall | F1 |
|---|---|---|---|
| Logistic Regression | 0.589 | 0.605 | 0.587 |
| Neural Network | 0.600 | 0.601 | 0.600 |
| Gradient Boosting | 0.617 | 0.620 | 0.618 |

| Embedder: Inception v3 | Precision | Recall | F1 |
|---|---|---|---|
| Logistic Regression | 0.605 | 0.617 | 0.610 |
| Neural Network | 0.614 | 0.617 | 0.616 |
| Gradient Boosting | 0.622 | 0.626 | 0.623 |

| Embedder: SqueezeNet | AUC | CA | F1 | Precision | Recall | MCC |
|---|---|---|---|---|---|---|
| Logistic Regression | −3 | 0.605 | 0.587 | 0.589 | 0.605 | 0.421 |
| Random Forest | −3 | 0.573 | 0.524 | 0.620 | 0.573 | 0.362 |
| Neural Network | −3 | 0.611 | 0.610 | 0.610 | 0.611 | 0.445 |
| Gradient Boosting | −3 | 0.596 | 0.564 | 0.599 | 0.596 | 0.401 |

| Embedder: Inception v3 | AUC | CA | F1 | Precision | Recall | MCC |
|---|---|---|---|---|---|---|
| Logistic Regression | −3 | 0.611 | 0.603 | 0.599 | 0.611 | 0.436 |
| Random Forest | −3 | 0.579 | 0.524 | 0.627 | 0.579 | 0.371 |
| Neural Network | −4 | 0.627 | 0.625 | 0.624 | 0.627 | 0.466 |
| Gradient Boosting | −3 | 0.611 | 0.584 | 0.609 | 0.611 | 0.426 |

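
In the tables above, Recall equals CA in every row, which is what one would expect if the per-class scores are averaged with weights proportional to class size: weighted-average recall is algebraically identical to classification accuracy. The sketch below shows how CA, Precision, Recall, F1, and the multi-class MCC can be derived from a confusion matrix under that assumption. The confusion matrix is hypothetical and does not come from the experiments, and AUC is left out because it requires class probabilities rather than hard predictions.

```python
import numpy as np


def weighted_metrics(cm):
    """Support-weighted metrics from a confusion matrix (rows = true class)."""
    cm = np.asarray(cm, dtype=float)
    support = cm.sum(axis=1)     # true instances per class
    predicted = cm.sum(axis=0)   # predictions per class
    tp = np.diag(cm)             # correctly classified instances per class
    n = cm.sum()

    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, support, out=np.zeros_like(tp), where=support > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)

    w = support / n              # class weights proportional to class size

    # Multi-class Matthews correlation coefficient.
    correct = tp.sum()
    mcc_num = correct * n - support @ predicted
    mcc_den = np.sqrt((n**2 - predicted @ predicted) * (n**2 - support @ support))

    return {
        "CA": float(correct / n),
        "Precision": float(w @ precision),
        "Recall": float(w @ recall),
        "F1": float(w @ f1),
        "MCC": float(mcc_num / mcc_den) if mcc_den > 0 else 0.0,
    }


# Hypothetical 4-class confusion matrix (300 instances in total).
confusion = [
    [20, 5, 10, 5],
    [4, 90, 6, 10],
    [8, 10, 70, 12],
    [3, 9, 8, 30],
]
metrics = weighted_metrics(confusion)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```
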
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Lucarelli, J.; Cesarelli, M.; Santone, A.; Martinelli, F.; Mercaldo, F. A Method for Automatic Emotion Detection Through Machine Learning. Appl. Sci. 2026, 16, 397. https://doi.org/10.3390/app16010397

