Detecting Deceptive Behaviours through Facial Cues from Videos: A Systematic Review

Abstract: Interest in detecting deceptive behaviours in various application fields, such as security systems, political debates, and advanced intelligent user interfaces, makes automatic deception detection an active research topic. This interest has stimulated the development of many deception-detection methods in the literature in recent years. This work systematically reviews the literature focused on facial cues of deception. The most relevant methods applied in the literature of the last decade have been surveyed and classified according to the main steps of the facial-deception-detection process (video pre-processing, facial feature extraction, and decision making). Moreover, datasets used for the evaluation and future research directions have also been analysed.


Introduction
Lying is a complex social activity in which someone (i.e., the deceiver) aims to cause a specific behaviour in another person (i.e., the deceived) by shaping the deceived's view of the situation so that, according to the deceiver's knowledge, the desired behaviour usually follows [1]. Deception may destroy relationships and hamper communication, leading to negative consequences [2]. Therefore, deception detection has been an investigated research topic for decades, and the development of tools and methods for detecting deceptive behaviours has become an urgent need for society. However, despite a rich corpus of deception research, detecting deceptive behaviours is still a very challenging task, mainly because human beings do not have good lie-detection abilities. A study by Bond and DePaulo [3] quantifies the human accuracy of discriminating between lies and truths at 54% on average, in other words, just slightly above a random guess. This percentage is also confirmed by Hartwig's more recent study [4]. In response to the necessity to improve this accuracy rate, researchers have long been trying to decode human behaviour in an attempt to discover deceptive cues. In this study, what we consider "deceptive" is human behaviour in which the deceiver intentionally acts to make the deceived believe something the deceiver considers false [5]. This is a conscious and deliberate act, as opposed to the unconscious (non-deceptive) behaviour in which a person provides false information believed to be true.
To discover deceptive cues, different methods have been proposed by scientists from different disciplines, ranging from psychologists and physiologists to technologists. For instance, methods based on the psycho-physiological detection of deception, which mainly use psychological tests and physiological monitoring, have been applied [6,7]. A positive contribution is also given by technologists, who have developed methods for deception detection based on technological tools. This study focuses on the technological field, in which a considerable number of methods for detecting verbal (explicit) and non-verbal (implicit) cues of deception have been developed [8]. Explicit cues are categorised according to their nature (i.e., visual, verbal, or paralinguistic) [9], while implicit cues involve facial expressions, body movements/posture, eye contact, and hand movements [10].

The aims of this work are as follows:
1. to collect and systematise the scientific knowledge related to automated face deception detection from videos;
2. to summarise (i) the methods used for video pre-processing, (ii) the extracted facial features, (iii) the decision-making algorithms, and (iv) the datasets used for the evaluation;
3. to point out future research directions that emerged from the analysis of the existing studies.
The paper is structured as follows. Section 2 introduces the related work. Section 3 illustrates the methodology followed for the systematic literature review, while in Section 4, the results are discussed. Section 5 describes how the surveyed studies answered the review questions defined in the paper. Section 6 summarises future research directions that emerged from the literature review. Finally, Section 7 concludes the paper.

Related Work
Automated deception detection from videos is a challenging task that can find applications in many real-world scenarios, including airport security screening, job interviews, court trials, and personal credit risk assessment [14].
Various surveys of automated deception-detection methods from videos have been developed in the literature, classifying the methods according to different dimensions. In particular, recent surveys (see Table 1) have provided a very broad overview of deception-detection methods considering monomodal and multimodal features. The survey developed by Constâncio et al. [15] provides a systematic literature review of deception detection based on machine-learning methods, underlining the techniques and approaches applied, the difficulties, the kind of data, and the performance levels achieved. In this survey, verbal and non-verbal cues, such as facial expressions, gestures, and prosodic and linguistic features, have been considered. The 81 surveyed papers are classified according to the type of extracted features (emotional, psychological, and facial) and the kinds of machine-learning algorithms. Analogously, the survey in [16] gives an overview of automated deception detection through machine-intelligence-based techniques by providing a critical analysis of the existing tools and available datasets. The authors focused on deception detection through text, speech, and video data analysis by classifying the 100 surveyed papers according to both the research domain (i.e., psychological, professional, and computational) and the type of extracted features (verbal, non-verbal, and multimodal). A further survey focused on monomodal features (speech cues) is proposed by Herchonvicz and de Santiago [17], in which deep-learning-based techniques, available datasets, and metrics extracted from the surveyed papers are discussed.

Table 1. Recent surveys of automated deception detection from videos.

Reference | Cues | Techniques | Datasets | Steps of the Deception-Detection Process (Figure 1)
Constâncio et al., 2023 [15] | Verbal and non-verbal cues | Machine-learning methods | Not surveyed | Step 2 and Step 3
Alaskar et al., 2023 [16] | Verbal and non-verbal cues | Machine-intelligence-based techniques | Surveyed | Step 2 and Step 3
Herchonvicz and de Santiago, 2021 [17] | Verbal cues | Deep-learning-based techniques | Surveyed | Step 2 and Step 3
Our survey | Non-verbal cues | Machine-learning methods | Surveyed | Step 1, Step 2, and Step 3

Different from existing surveys, this work focuses on deception-detection methods specifically based on facial cues and proposes a classification of these methods based on the main steps of the facial-deception-detection process (video pre-processing, facial feature extraction, and decision making), as proposed by Thannoon et al. [18]. On the contrary, the existing surveys considered only the second and third steps of this process.
A general workflow of the followed deception detection process that used video datasets is provided in Figure 1.
During the video pre-processing step, the video is analysed by a face detector, which bounds the box containing the face, and by a facial-landmark-tracking tool, which locates and tracks the facial landmarks. In the facial-feature-extraction step, the extraction of a set of facial features for recognizing the facial cues meaningful to deception is carried out. Finally, the classification of the extracted features into truthful or deceptive behaviour is performed in the decision-making step. According to these steps, the studies included in the review are analysed to extract (i) the methods used for video pre-processing, (ii) the extracted facial features, (iii) the decision-making algorithms, and (iv) the datasets used for the evaluation.
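The three-step process above can be sketched as a minimal loop over video frames. The helper functions here are illustrative placeholders (a central crop and summary statistics), not the detectors or classifiers used by the surveyed studies:

```python
import numpy as np

def detect_face(frame):
    # Placeholder face detector: in practice a tool such as Viola-Jones
    # or OpenFace would return the face bounding box and landmarks.
    h, w = frame.shape[:2]
    return frame[h // 4: 3 * h // 4, w // 4: 3 * w // 4]  # central crop as a stand-in

def extract_features(face_region):
    # Placeholder feature extractor: real systems compute action-unit
    # intensities, head pose, or gaze angles per frame.
    return np.array([face_region.mean(), face_region.std()])

def classify(feature_matrix):
    # Placeholder decision step: a trained classifier (e.g., an SVM)
    # would map the aggregated features to deceptive/truthful.
    return "deceptive" if feature_matrix.mean() > 0.5 else "truthful"

def deception_pipeline(video_frames):
    """Step 1: pre-processing; Step 2: feature extraction; Step 3: decision."""
    features = [extract_features(detect_face(f)) for f in video_frames]
    return classify(np.vstack(features))

frames = [np.random.rand(120, 160) for _ in range(30)]  # synthetic 30-frame clip
label = deception_pipeline(frames)
```

In a real system, each placeholder would be replaced by one of the tools or algorithms discussed in the following sections.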

Materials and Methods
This section illustrates the methodology for systematically searching and analysing the existing literature on automated face deception detection from videos. The methodology followed was the systematic literature review (SLR) described in the PRISMA recommendations [19], composed of the following steps: (1) identifying the review focus; (2) specifying the review question(s); (3) identifying studies to include in the review; (4) data extraction and study quality appraisal; (5) synthesizing the findings; and (6) reporting the results.
The review focus consists of the collection and systematization of the scientific knowledge related to automated deception detection from videos focusing on facial signals. Specifically, we aim to study the methods for video pre-processing, the extracted facial features, the decision-making algorithms, and the available datasets.
As required by the second step of the SLR process, the following review questions (RQs) were defined:
RQ1: Which methods for video pre-processing are used?
RQ2: Which facial features for automated deception detection are extracted?
RQ3: Which decision-making algorithms for automated deception detection are used?
RQ4: Which datasets for automated deception detection are used?
In the third step of the SLR process, the identification of the studies to include in the review was carried out following the four phases of the PRISMA recommendations [19], as shown in Figure 2. For the identification of an initial set of scientific works, two indexed scientific databases (Web of Science (WoS) and Scopus) containing formally published literature (e.g., journal articles, books, and conference papers) were used. Specifically, WoS and Scopus were used due to their proven usefulness in conducting bibliometric analyses of the existing literature in various research domains to establish trends [21]. Grey literature has not been included in the systematic review because there is no gold standard method for searching grey literature, nor is there a defined methodology for the same, which makes it all the more difficult [22].
The search string defined to search the scientific papers on the databases is as follows: ("deception detection" OR "lie detection" OR "deceptive behaviour*" OR "lie behaviour*" OR "detect* deception") AND "video*" AND "facial expression*". Note that this study does not disambiguate among the different forms of deception (e.g., bluff, mystification, propaganda, etc.) because it aims to find studies focused on deceptive behaviours in general without restricting the search to specific forms of deception.
During the screening phase, the obtained results were filtered according to the inclusion and exclusion criteria shown in Table 2. The full text of the eligible studies was analysed by two reviewers using a quality evaluation checklist composed of four questions and related scores (see Table 3). The "disagreed" studies were evaluated by a moderator who provided the final scores.

Table 3. Quality evaluation checklist and scores.

QA1
Does the study describe the methods for video pre-processing?
1-yes, the methods for video pre-processing are described in detail.
0.5-partially, the methods for video pre-processing are summarised without providing details on the different steps.
0-no, the methods for video pre-processing are only cited, without providing a description.

QA2
Does the study describe the extracted facial features?
1-yes, the extracted facial features are described in detail.
0.5-partially, the extracted facial features are summarised without providing a detailed description.
0-no, the extracted facial features are not described.

QA3
Does the study describe the decision-making algorithm?
1-yes, the decision-making algorithm is fully described.
0.5-partially, the decision-making algorithm is summarised without providing a detailed description.
0-no, the decision-making algorithm is not described.

QA4
Does the study use some datasets for the evaluation of the method?
1-yes, one or more datasets are used.
0-no, datasets are not used.
Studies that obtained a score < 1.5 were excluded from the qualitative analysis, while studies that scored 1.5 or more were included in the review.
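The inclusion rule amounts to summing the four QA scores and applying the 1.5 threshold. A minimal sketch, with two hypothetical studies:

```python
# Quality-appraisal scoring sketch: each study gets QA1-QA3 scores in
# {0, 0.5, 1} and QA4 in {0, 1}; totals below 1.5 are excluded.
def appraise(scores):
    total = sum(scores.values())
    return total, ("included" if total >= 1.5 else "excluded")

study_a = {"QA1": 1.0, "QA2": 0.5, "QA3": 1.0, "QA4": 1}  # hypothetical study
study_b = {"QA1": 0.0, "QA2": 0.5, "QA3": 0.5, "QA4": 0}  # hypothetical study

print(appraise(study_a))  # (3.5, 'included')
print(appraise(study_b))  # (1.0, 'excluded')
```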
Finally, analysis of the full texts of the included studies was performed, extracting the following information (if any):
• Pre-processing techniques of facial cues;
• Facial features, facial action coding system, and action units;
• Facial deception classification approaches;
• Evaluation datasets.

In the following sections, the findings resulting from the application of the SLR process are described.

Results of the SLR
As depicted in Figure 2, during the identification phase (performed in May 2023), a total of 230 articles were retrieved using the two search engines: 203 from Scopus and 27 from WoS.
As required by the duplication criterion, removing duplicate records resulted in 120 studies. By applying the understandability criterion, 6 studies that were not written in English were excluded, and therefore, a total of 114 articles were screened for the inclusion criteria. Moreover, the studies that did not satisfy the document type criterion (2 articles) and those that did not contain in the titles the terms used for the search (70 studies) were removed, and therefore, a total of 42 studies resulted at the end of the screening phase.
Four studies that were not accessible in full text (availability criterion) and six studies that were considered not relevant for the focus of the review (relevance criterion) were removed. The relevance criterion to the abstract was also applied, and therefore, a total of 32 articles were retained for a full evaluation of eligibility, evaluated by two reviewers according to the quality evaluation checklist shown in Table 3. Seven studies with a score less than 1.5 were excluded, while the remaining twenty-four studies were included in the qualitative synthesis. Table 4 provides an overview of the selected studies. The selected studies were published mainly in conference proceedings (15 studies; 62.5%), followed by journals (9 studies; 37.5%). Figure 3 shows the temporal distribution of the selected studies, underlining the growing interest of the scientific community in the facial-deception-detection topic, which reached a peak in 2020.

Table 4. Overview of the selected studies.

Reference | Authors | Year | Document Type
[23] | Chebbi and Jebara | 2020 | Conference paper
[24] | Abbas and Al-Ani | 2023 | Article
[25] | Shen et al. | … | …

Discussion
In this section, an analysis of the answers to the four review questions (introduced in Section 3) extracted from the 24 surveyed studies is carried out. Specifically, to deal with RQ1, we analyse the methods for video pre-processing. Addressing RQ2, the facial features are classified. Concerning RQ3, the decision-making algorithms used for automated deception detection are analysed. Finally, as part of RQ4, the used datasets are analysed.

RQ1: Which Methods for Video Pre-Processing Are Used?
Pre-processing consists of those operations that prepare data for subsequent analysis in an attempt to compensate for systematic errors. The videos are subjected to several corrections, such as geometric, radiometric, and atmospheric corrections, although not all of these corrections are necessarily applied in all cases. These errors are systematic and can be removed before they reach the user. The most relevant pre-processing techniques are chosen according to the nature of the information to be extracted from the videos.
Facial region detection, landmark localisation, and tracking are necessary pre-processing operations when capturing videos in unconstrained recording conditions. This step has the aim of localizing and extracting the face region from the background and identifying the locations of the key facial landmark points. These points have been defined as "either the dominant points describing the unique location of a facial component (e.g., eye corner) or an interpolated point connecting those dominant points around the facial components and facial contour" [44]. Regardless of the application domain, face detection and landmark tracking from input video streams are preliminary steps necessary to process any kind of information extractable from the face.
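Although the surveyed tools differ, many (e.g., dlib and OpenFace) follow the 68-point iBUG landmark convention. The sketch below, with a synthetic landmark array, shows how the key points of one facial component can be sliced out of such an array:

```python
import numpy as np

# 68-point iBUG landmark convention (used by, e.g., dlib and OpenFace):
# indices 0-16 jaw, 17-26 eyebrows, 27-35 nose, 36-47 eyes, 48-67 mouth.
REGIONS = {
    "jaw": slice(0, 17),
    "eyebrows": slice(17, 27),
    "nose": slice(27, 36),
    "eyes": slice(36, 48),
    "mouth": slice(48, 68),
}

def region_points(landmarks, name):
    """Return the (x, y) points of one facial component from a (68, 2) array."""
    return landmarks[REGIONS[name]]

landmarks = np.random.rand(68, 2)  # synthetic landmark set for one frame
eyes = region_points(landmarks, "eyes")    # 12 points
mouth = region_points(landmarks, "mouth")  # 20 points
```

Per-region point sets like these are the typical input for computing cues such as eye blinking or lip pressing.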
The vast majority of these methods and tools are based on 2D image processing. The main drawback of 2D face-detection and landmark-tracking methods is that their performances depend strictly on the following four factors: the pose of the head that can generate self-occlusions and deformations; occlusions caused by masks, sunglasses, or other faces; illumination variations caused by internal camera control and skin reflectance properties; time delay caused by face changes over time. However, these methods have the advantages of being generally easier and less expensive in terms of computational load compared to 3D face-detection methods. This last class of methods uses a 3D dataset that contains both face and head shapes as range data or polygonal meshes [55]. The 3D methods allow the fundamental limitations of 2D methods to be overcome since they are insensitive to pose and illumination variations.
A widely accepted classification of face-detection techniques [56] distinguishes them into two categories: feature-based and image-based approaches. On the one hand, feature-based techniques make explicit use of face knowledge by extracting local or global features (e.g., eyes, nose, mouth) and use this knowledge to classify between facial and non-facial regions. The active shape model (ASM) [38] and the Viola-Jones method [33] are two of the most applied and successful feature-based methods so far. On the other hand, image-based approaches learn to recognise a face pattern from examples following a training step for classifying examples into face and non-face prototype classes. The only image-based approach applied by the surveyed studies is the Constrained Local Neural Field (CLNF) [46].
A total of 18 papers among those surveyed (24 studies) use the face-detection and landmark-tracking methods represented in Table 5.
From the table, we can observe that during the video pre-processing step, mainly 2D methods are used (17 studies; 94%), while only two works apply 3D methods. Moreover, the majority of the studies (14 studies; 78%) apply academic and commercial software that allows joint face-detection and landmark localization tasks to be performed. OpenFace [45] is the open-source tool most applied by surveyed facial-deception-detection works (7 studies; 39%). The remaining three surveyed works use specific face-detection and face-tracking algorithms, mainly belonging to the feature-based category. The most applied algorithms are the Viola-Jones face detector [33] and the Constrained Local Neural Field (CLNF) [46] used for face detection and face tracking, respectively.

RQ2: Which Facial Features for Automated Deception Detection Are Extracted?
The feature-extraction process consists of transforming raw data into a set of features to be used as reliable indicators for detecting deceptive behaviours.
A multitude of facial cues has been correlated with deception in the literature, including lip pressing, eye blinking, facial rigidity, smiling, and head movement. A total of 22 papers among those surveyed (24 studies) extracted the following 10 facial features, represented in Table 6. More than half of the studies (15 studies; 68%) rely on the Facial Action Coding System (FACS), an anatomically based system for representing all visually discernible facial movements proposed by Ekman et al. [42]. The FACS provides a list of action units (AUs) that are the individual components of muscle movement extracted from facial expressions. Ekman et al. [57] provided a list of AUs that are the most difficult to produce deliberately and the hardest to inhibit. Therefore, the majority of the analysed studies detect the movements of these AUs as potential indicators for distinguishing liars from truth-tellers in high-stakes situations.
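A common way to turn frame-level AU intensities into a fixed-length per-video descriptor is simple statistical aggregation. The sketch below assumes a frames × AUs intensity matrix (such as the per-frame AU outputs a tool like OpenFace can export); the AU subset named in the comment is illustrative, not prescribed by the surveyed studies:

```python
import numpy as np

def video_au_features(au_intensities):
    """Aggregate frame-level AU intensities (frames x AUs) into one
    per-video feature vector: mean and standard deviation per AU."""
    return np.concatenate([au_intensities.mean(axis=0),
                           au_intensities.std(axis=0)])

# Synthetic clip: 100 frames, 5 tracked action units (e.g., AU1, AU2,
# AU4, AU12, AU15 -- an illustrative subset).
clip = np.random.rand(100, 5)
features = video_au_features(clip)  # 5 means followed by 5 std devs
```

Vectors of this form can then feed the decision-making algorithms analysed under RQ3.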
A total of 18 studies (82%) rely on extracting facial expression features for detecting deceptive behaviours. Facial expressions, particularly micro-expressions (i.e., facial expressions of emotion that are expressed for 0.5 s or less), are often considered strong indicators of deception. Six basic facial expressions of emotion are defined in the literature: disgust, anger, enjoyment, fear, sadness (or distress), and surprise. In particular, the surveyed studies have detected deceptive behaviours by analysing the presence of both extreme facial movements, which can indicate high-intensity emotions that are not really being experienced, and facial rigidity (i.e., little or no facial movements) that deceivers could purposefully inhibit to appear truthful. Moreover, half of the studies (11) rely on head pose, followed by gaze (9 studies; 41%).
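Given frame-level labels marking when an expression is active, plus the video frame rate, the 0.5 s micro-expression bound can be checked with a simple run-length pass. This is an illustrative sketch, not a method taken from the surveyed studies:

```python
def micro_expression_runs(active, fps, max_duration=0.5):
    """Return (start, end) frame indices of expression runs lasting
    at most max_duration seconds (end index is exclusive)."""
    runs, start = [], None
    for i, flag in enumerate(active + [False]):  # sentinel closes a final run
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if (i - start) / fps <= max_duration:
                runs.append((start, i))
            start = None
    return runs

# Synthetic 30 fps clip: a 6-frame burst (0.2 s) and a 20-frame one (~0.67 s).
labels = [False] * 5 + [True] * 6 + [False] * 10 + [True] * 20 + [False] * 5
print(micro_expression_runs(labels, fps=30))  # [(5, 11)] -- only the short burst
```

Only the 0.2 s burst qualifies as a micro-expression; the longer run exceeds the bound and is discarded.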

RQ3: Which Decision-Making Algorithms for Automated Deception Detection Are Used?
During the decision-making step, the extracted features are used to train a classification approach to automatically distinguish between deceptive and truthful videos.
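As a hedged illustration of this decision step, the sketch below trains an RBF-kernel support vector machine (discussed in this section as the most applied supervised technique) with scikit-learn; the feature vectors, class separation, and labels are synthetic stand-ins, not data from any surveyed dataset:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
# Synthetic per-video feature vectors (e.g., aggregated AU statistics):
# 60 "truthful" and 60 "deceptive" clips with slightly shifted means.
X = np.vstack([rng.normal(0.0, 1.0, (60, 10)),
               rng.normal(0.8, 1.0, (60, 10))])
y = np.array([0] * 60 + [1] * 60)  # 0 = truthful, 1 = deceptive

clf = SVC(kernel="rbf")  # RBF-kernel support vector machine
scores = cross_val_score(clf, X, y, cv=5)  # 5-fold cross-validated accuracy
print(round(scores.mean(), 2))
```

Cross-validation of this kind is the usual way the surveyed classifiers are compared, although the accuracy printed here reflects only the synthetic separation chosen above.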
Specifically, 20 studies among the surveyed ones apply a deception classification approach, as represented in Table 7. Most of them (19 studies; 95%) use supervised learning techniques that require labelled training data (i.e., a set of feature vectors with their corresponding labels (deceptive vs. truthful)). The most applied supervised technique is the support vector machine (SVM), which is applied in seven studies (35%), followed by six studies (30%) applying random forest, five studies (25%) applying neural networks, and four studies (20%) applying K-nearest neighbours. Various types of regression models have been experimented with by the surveyed papers on facial deception detection, including multinomial logistic and multivariate regression. Only one study relies on unsupervised methods that do not need manually annotated facial attributes to train the model; specifically, it applies hidden Markov models [42].

RQ4: Which Datasets for Automated Deception Detection Are Used?
To evaluate the deception classification approaches shown in Table 7, an important step is the preparation and use of well-defined evaluation datasets [58]. Table 9 summarises the datasets used by the selected studies and their main characteristics, i.e., the size, the source used to acquire the data, and the availability. The most popular dataset (seven studies; 33%) is the real-life trial dataset [59], which is available upon request. This dataset consists of 121 videos (average length of 28.0 s) collected from public court trials, including 61 deceptive and 60 truthful trial clips. The remaining studies (14 studies; 67%) developed their own dataset to evaluate the deception-detection method mainly by recording videos from game/TV shows (5 studies; 24%), followed by the use of spontaneous (4 studies; 19%) or controlled (4 studies; 19%) interviews, and by the use of storytelling (2 studies; 10%) or the download of videos from YouTube (2 studies; 10%).
The largest dataset is Abd et al.'s database [18], containing 888 video clips from 102 participants. Considering the availability, the majority of the studies collected datasets without making them available (11 studies; 52%), while 7 studies (33%) made datasets available on request. Only three studies (14%) published the collected dataset.

Table 9. Datasets used by the selected studies.

Reference | Dataset | Availability | Size | Source

Future Research Directions
The development of facial-deception-detection methods brings many opportunities for future research. Table 10 summarises the future research directions extracted from the surveyed studies and classifies them according to the aim of the future research, such as complexity reduction (e.g., exploring ways to automatically learn the best combination of views for the classification task). Note that only the studies that provided a future-work discussion have been included in the table (14 studies; 58%).
The majority of the surveyed studies underline the need to extend the current datasets for the generalization of the results (five studies; 36%) and to use a multimodal approach (five studies; 36%) that combines different types of features (speech, video, text, hand gestures, head movements, etc.). In more detail, evaluating the accuracy of the detection algorithms over a larger database yields a generalization of the results as well as an increase in the performance of the algorithms, while the integration of several modalities yields higher accuracy.
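One common way to combine modalities is feature-level (early) fusion, which simply concatenates the per-modality vectors before classification. The modalities and dimensions below are illustrative, not taken from any surveyed system:

```python
import numpy as np

def early_fusion(*modality_features):
    """Early (feature-level) fusion: concatenate per-video feature
    vectors from several modalities into a single vector."""
    return np.concatenate(modality_features)

# Hypothetical per-video vectors for three modalities.
facial = np.random.rand(10)  # e.g., aggregated AU statistics
speech = np.random.rand(6)   # e.g., prosodic features
text = np.random.rand(4)     # e.g., linguistic features

fused = early_fusion(facial, speech, text)  # single 20-dimensional vector
```

Late fusion (combining per-modality classifier outputs instead of features) is the main alternative design choice.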
Relevance is also given to the complexity reduction (four studies; 29%) by exploring innovative ways of combining and classifying the features for real-time deception detection.
Furthermore, a look at the development of further methods and the use of larger samples (three studies; 21% each) is also suggested by the surveyed studies. In more detail, the need to develop more accurate methods of action unit detection, more complex classifiers to model conversational context and time-dependent features, and affect-aware systems resulted from the analysis. On considering the use of larger samples, particular attention is given to an equal representation by gender and race and generalization to other scenarios.
Finally, it is also important to mention the future research directions looking for further psychological analysis, transferability, and increase in the extracted features (two studies; 14% each), as well as the explainability of the algorithms, model validation, and use of 3D deformable models (one study; 7% each).

Conclusions
This study provided a systematic literature analysis on facial deception detection in videos. The main contribution was a discussion of the methods applied in the literature of the last decade, classified according to the main steps of the facial-deception-detection process (video pre-processing, facial feature extraction, and decision making). Moreover, datasets used for the evaluation and future research directions have also been analysed.
The initial search on WoS and Scopus databases returned 230 studies, of which 38 studies were retained after the application of the inclusion and exclusion criteria defined in the methodology. Finally, 24 studies were included in the qualitative synthesis according to the scores obtained in the quality evaluation checklist. The analysis revealed that in the last ten years, the interest of the scientific community in the topic of facial deception detection began growing in 2014 and reached a peak in 2020.
Considering the methods for video pre-processing, we observed that the majority of the surveyed studies use academic and commercial software for face detection and landmark tracking. Moreover, the most applied and successful feature-based methods are the active shape model [43] and the Viola-Jones method [33], while the Constrained Local Neural Field (CLNF) [46] is the only used image-based approach.
Concerning the extracted facial features, the Facial Action Coding System (FACS) is the most used system to represent all visually discernible facial movements, applied by 68% of the studies. Moreover, 82% of the studies rely on extracting facial expression features, followed by head pose (50% of the studies) and gaze motion (41%).
Furthermore, the real-life trial dataset [59] is the most popular dataset used for the evaluation of the performance of the decision-making algorithms (33% of the studies), while the remaining studies (67%) developed their own dataset mainly by recording videos from game/TV shows (24%), using spontaneous (19%) or controlled (19%) interviews, and using storytelling (10%).
Finally, the surveyed studies highlight the following future research directions: (i) extension of the current datasets for evaluating the detection algorithms; (ii) use of a multimodal approach that combines different types of features (facial, speech, video, text, hand gestures, etc.); (iii) reduction in the complexity of the detection process for realtime use; (iv) use of larger samples for an equal representation by gender and race; and (v) development of further more accurate and affect-aware methods.
The study has some limitations mainly related to the focus on facial cues from videos. Future studies could also include analysis of explicit cues (verbal communication) from face-to-face interactions. Indeed, multimodal analysis that combines both verbal and non-verbal cues is the most valuable approach for detecting deception [15].

Data Availability Statement:
The data presented in this study are available upon request from the corresponding author.