A Semi-Supervised Lie Detection Algorithm Based on Integrating Multiple Speech Emotional Features
Abstract
1. Introduction
2. Method
2.1. Feature Processing
2.2. Feature Fusion
2.3. Multi-Loss Optimization
2.3.1. Supervised Loss
2.3.2. Unsupervised Loss
2.3.3. Total Loss
Algorithm 1: Multi-Loss Optimization
2.3.4. Functions of Different Modules
3. Experimental Section
3.1. Datasets
3.2. Experimental Setup and Evaluation Metrics
- (1) MFDF: Directly concatenates and fuses the 312-dimensional acoustic statistical features and the 64-dimensional logarithmic Mel-spectrogram features output by the trained model along a single dimension for lie detection.
- (2) MFDF + ML: Applies the same direct concatenation fusion of the 312-dimensional acoustic statistical features and the 64-dimensional logarithmic Mel-spectrogram features, and additionally combines it with a semi-supervised learning algorithm improved by the LMMD loss for lie detection.
- (3) JAF: Uses a joint attention method to fuse the 312-dimensional acoustic statistical features and the 64-dimensional logarithmic Mel-spectrogram features output by the trained model for lie detection.
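As a rough illustration (not the authors' implementation), the MFDF baseline's direct concatenation of the two feature streams can be sketched as follows; the function name and batch size are illustrative assumptions:

```python
import numpy as np

def direct_concat_fusion(acoustic_feats, logmel_feats):
    """Fuse a 312-dim acoustic statistical vector with a 64-dim
    log-Mel embedding by one-dimensional concatenation, yielding
    a 376-dim joint representation per utterance."""
    assert acoustic_feats.shape[-1] == 312
    assert logmel_feats.shape[-1] == 64
    return np.concatenate([acoustic_feats, logmel_feats], axis=-1)

# One hypothetical batch of 8 utterances.
batch_acoustic = np.random.randn(8, 312)
batch_logmel = np.random.randn(8, 64)
fused = direct_concat_fusion(batch_acoustic, batch_logmel)
print(fused.shape)  # (8, 376)
```

The JAF variant replaces this plain concatenation with learned joint-attention weights over the two streams, which is what the ablation in Table 1 isolates.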
3.3. Comparative Experiments
- (1) SS-AE: Uses a semi-supervised autoencoder network [22] for speech lie detection. The encoder extracts latent representations of unlabeled data, and the decoder reconstructs the input features; the reconstruction loss between the original and reconstructed data serves as an unsupervised loss that improves the model's generalization on unlabeled data.
- (2) Mean-teacher: Based on the Mean-teacher model [12,23]; it combines the reconstruction loss of the autoencoder network with the consistency regularization loss of the Mean-teacher model to optimize the decision boundary, enhancing the classification ability of the semi-supervised model.
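The two ingredients of the Mean-teacher baseline can be sketched minimally as below; this is a generic illustration of the scheme in [12], assuming an MSE consistency loss and an EMA coefficient of 0.99 (all names are illustrative, not from the paper):

```python
import numpy as np

def ema_update(teacher_w, student_w, alpha=0.99):
    """Teacher weights track an exponential moving average of the
    student weights -- the core of the Mean-teacher scheme."""
    return [alpha * t + (1.0 - alpha) * s for t, s in zip(teacher_w, student_w)]

def consistency_loss(student_probs, teacher_probs):
    """MSE between student and teacher predictions on the same
    (unlabeled) input under different perturbations."""
    return float(np.mean((student_probs - teacher_probs) ** 2))

# Toy weights: teacher drifts slowly toward the student.
student = [np.ones((4, 4)), np.ones(4)]
teacher = [np.zeros((4, 4)), np.zeros(4)]
teacher = ema_update(teacher, student, alpha=0.9)
print(teacher[0][0, 0])  # ~0.1, i.e. 10% of the way toward the student
```

The consistency term penalizes disagreement on unlabeled samples, which is what pushes the decision boundary into low-density regions.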
3.4. Confusion Matrix
4. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Viji, D.; Gupta, N.; Parekh, K.H. History of Deception Detection Techniques. In Proceedings of International Conference on Deep Learning, Computing and Intelligence: ICDCI 2021; Springer Nature Singapore: Singapore, 2022; pp. 373–387. [Google Scholar]
- Liu, Z.; Xu, J.; Wu, M.; Cao, W.; Chen, L.F.; Ding, X.W.; Hao, M.; Xie, Q. Review of emotional feature extraction and dimension reduction method for speech emotion recognition. Chin. J. Comput. 2017, 40, 1–23. [Google Scholar]
- Ekman, P.; O’Sullivan, M.; Friesen, W.V.; Scherer, K.R. Invited article: Face, voice, and body in detecting deceit. J. Nonverbal Behav. 1991, 15, 125–135. [Google Scholar]
- Kirchhuebel, C. The Acoustic and Temporal Characteristics of Deceptive Speech. Ph.D. Thesis, University of York, York, UK, 2013. [Google Scholar]
- Hansen, J.H.L.; Womack, B.D. Feature analysis and neural network-based classification of speech under stress. IEEE Trans. Speech Audio Process. 1996, 4, 307–313. [Google Scholar]
- Kirchhübel, C.; Howard, D.M.; Stedmon, A.W. Acoustic correlates of speech when under stress: Research, methods and future directions. Int. J. Speech Lang. Law 2011, 18, 75–98. [Google Scholar]
- Springer Handbook of Speech Processing; Springer: Berlin/Heidelberg, Germany, 2008. [Google Scholar]
- Bareeda, E.P.F.; Mohan, B.S.S.; Muneer, K.V.A. Lie detection using speech processing techniques. J. Phys. Conf. Ser. 2021, 1921, 012028. [Google Scholar]
- Nasri, H.; Ouarda, W.; Alimi, A.M. ReLiDSS: Novel lie detection system from speech signal. In Proceedings of the 2016 IEEE/ACS 13th International Conference of Computer Systems and Applications (AICCSA), Agadir, Morocco, 29 November–2 December 2016; pp. 1–8. [Google Scholar]
- Zhou, Y.; Zhao, H.; Pan, X. Lie detection from speech analysis based on k–svd deep belief network model. In Proceedings of the Intelligent Computing Theories and Methodologies: 11th International Conference, ICIC 2015, Fuzhou, China, 20–23 August 2015; pp. 189–196. [Google Scholar]
- Sanaullah, M.; Gopalan, K. Deception detection in speech using bark band and perceptually significant energy features. In Proceedings of the 2013 IEEE 56th International Midwest Symposium on Circuits and Systems (MWSCAS), Columbus, OH, USA, 4–7 August 2013; IEEE: Piscataway, NJ, USA, 2013; p. 6674872. [Google Scholar]
- Tarvainen, A.; Valpola, H. Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. Adv. Neural Inf. Process. Syst. 2017, 30, 1195–1204. [Google Scholar]
- Sun, B.; Cao, S.; He, J.; Yu, L. Affect recognition from facial movements and body gestures by hierarchical deep spatio-temporal features and fusion strategy. Neural Netw. Off. J. Int. Neural Netw. Soc. 2018, 105, 36–51. [Google Scholar]
- Fang, Y.; Fu, H.; Tao, H.; Liang, R.; Zhao, L. A novel hybrid network model based on attentional multi-feature fusion for deception detection. IEICE Trans. Fundam. Electron. Commun. Comput. Sci. 2020, E104A, 622–626. [Google Scholar]
- Enos, F.; Shriberg, E.; Graciarena, M.; Hirschberg, J.; Stolcke, A. Detecting deception using critical segments. In Proceedings of the INTERSPEECH 2007, 8th Annual Conference of the International Speech Communication Association, Antwerp, Belgium, 27–31 August 2007; pp. 2281–2284. [Google Scholar]
- He, C.; Wei, H. Transformer-Based Deep Hashing Method for Multi-Scale Feature Fusion. In Proceedings of the ICASSP 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, 4–10 June 2023; pp. 1–5. [Google Scholar]
- Praveen, R.G.; Granger, E.; Cardinal, P. Recursive joint attention for audio-visual fusion in regression based emotion recognition. In Proceedings of the ICASSP 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, 4–10 June 2023; pp. 1–5. [Google Scholar]
- Bousquet, P.-M.; Rouvier, M. Jeffreys Divergence-Based Regularization of Neural Network Output Distribution Applied to Speaker Recognition. In Proceedings of the ICASSP 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, 4–10 June 2023; pp. 1–5. [Google Scholar]
- Zhu, Y.; Zhuang, F.; Wang, J.; Ke, G.; Chen, J.; Bian, J.; Xiong, H.; He, Q. Deep Subdomain Adaptation Network for Image Classification. IEEE Trans. Neural Netw. Learn. Syst. 2021, 32, 1713–1722. [Google Scholar]
- Logan, B. Mel frequency cepstral coefficients for music modeling. In Proceedings of the International Symposium on Music Information Retrieval (ISMIR), 2000; Volume 270, p. 11. [Google Scholar]
- Hirschberg, J.; Benus, S.; Brenier, J.M.; Enos, F.; Friedman, S.; Gilman, S.; Girand, C.; Graciarena, M.; Kathol, A.; Michaelis, L.; et al. Distinguishing deceptive from non-deceptive speech. In Proceedings of the INTERSPEECH 2005 - Eurospeech, 9th European Conference on Speech Communication and Technology, Lisbon, Portugal, 4–8 September 2005; pp. 1833–1836. [Google Scholar]
- Deng, J.; Xu, X.; Zhang, Z.; Frühholz, S.; Schuller, B. Semisupervised autoencoders for speech emotion recognition. IEEE ACM Trans. Audio Speech Lang. Process. 2017, 26, 31–43. [Google Scholar]
- Fu, H.; Yu, H.; Wang, X.; Lu, X.; Zhu, C. A Semi-Supervised Speech Deception Detection Algorithm Combining Acoustic Statistical Features and Time-Frequency Two-Dimensional Features. Brain Sci. 2023, 13, 725. [Google Scholar] [CrossRef] [PubMed]
Model | 200 Labeled Samples | 600 Labeled Samples | 1200 Labeled Samples
---|---|---|---
MFDF | 58.89 | 61.85 | 63.89
MFDF + ML | 59.25 | 60.37 | 64.07
JAF | 61.67 | 61.60 | 63.52
Proposed algorithm | 63.70 | 63.30 | 65.74
Model | 200 Labeled Samples | 600 Labeled Samples | 1200 Labeled Samples
---|---|---|---
SS-AE | 58.37 | 56.94 | 59.87
Mean-teacher | 55.45 | 58.39 | 60.84
Proposed algorithm | 63.70 | 63.30 | 65.74
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Xi, J.; Yu, H.; Xu, Z.; Zhao, L.; Tao, H. A Semi-Supervised Lie Detection Algorithm Based on Integrating Multiple Speech Emotional Features. Appl. Sci. 2024, 14, 7391. https://doi.org/10.3390/app14167391