Analysis of Facial Occlusion Challenge in Thermal Images for Human Affective State Recognition

Several studies have been conducted using both visual and thermal facial images to identify human affective states. Despite the advantages of thermal facial images in recognizing spontaneous human affects, few studies have focused on facial occlusion challenges in thermal images, particularly eyeglass and facial hair occlusion. Therefore, three classification models are proposed in this paper to address the problem of thermal occlusion in facial images, with six basic spontaneous emotions being classified. The first proposed model is based on six main facial regions, including the forehead, tip of the nose, cheeks, mouth, and chin. The second model decomposes the six main facial regions into multiple subregions to investigate the efficacy of subregions in recognizing the human affective state. The third proposed model uses selected facial subregions free of eyeglasses and facial hair (beard, mustaches). Nine statistical features computed between onset and apex thermal images are implemented. Furthermore, four feature selection techniques with two classification algorithms are proposed for further investigation. According to the comparative analysis presented in this paper, the results obtained from the three proposed models were promising and comparable to those of other studies.


Introduction
The human affective state is an important factor that greatly influences our lifestyle, including our thoughts, activities, focus, and our problem-solving and decision-making abilities. Thus, affective state recognition is of considerable interest to researchers. Affect, mood, and emotion are concepts included in the domain of affective computing. Affect is a general term for a range of feelings that a person can experience. It includes emotions, which are intense feelings that can be directed at a source and are usually short-lived, and moods, which are longer-lasting, less intense, and may not require a specific stimulus [1,2].
Researchers have recently been investigating techniques to recognize human emotions and applying them in a variety of fields. These include human-computer interaction, to facilitate communication between humans and computers, and human-robot interaction, where understanding how people behave and feel helps the robot interact with people in an appropriate manner [3]. Emotion recognition techniques have been used in security applications to identify people who are able to mask their emotions, often referred to as having a "poker face" [4], and in deception detection to detect when someone has not been truthful or accurate in their statements [5,6]. They have also been used in medical applications, including for people with autism spectrum disorder, who may not be able to use their body language, facial expressions, or spoken language to show how they feel and therefore require assistance to understand and express their emotions [7]. Other medical applications include sleep apnea, which causes a person's breathing to become shallow or to stop temporarily during sleep [8].
previous studies, which means that abandoning the eye ROIs could overcome the eyeglass occlusion challenge. The second proposed model in this study divided the 6 main ROIs into 27 sub-ROIs; the goal of this model is to explore the efficiency of facial sub-ROIs in recognizing human affects rather than using the main ROIs, which could be partially occluded. The third proposed model in this study conducted affective state recognition based on 11 selective sub-ROIs located on facial patches free of facial hair, such as a beard, mustaches, and hair bangs. Therefore, the goal of this model is to recognize the human affective state even when the subject is wearing eyeglasses or has facial hair occlusion.

Previous Studies
This section presents a brief overview of previous studies related to human affective state recognition based on thermal imaging and discusses the main stages, such as dataset collection, preprocessing, facial regions of interest (ROIs), feature extraction methods, and classification algorithms.

Thermal Dataset
Advances in thermal sensors have encouraged researchers to use thermal images to recognize human affects. According to previous studies, the dataset collection stage is an important process; thus, several studies constructed their own datasets [22,24–26], while others used already-published databases, such as KTFE or USTC-NVIE [11,27,28].
Depending on the emotion type, thermal datasets can be classified as posed or spontaneous [29]. Individuals in a posed dataset are asked to express a variety of emotions, which means that this type of emotion does not accurately represent the true affective state. In contrast, for spontaneous datasets, participants are exposed to stimuli and are unaware that their reactions are being recorded, eliciting their spontaneous affective state. As a result, creating spontaneous databases is a difficult process [30].

Preprocessing
In the preprocessing stage, the literature has proposed various methods to enhance thermal images and localize faces; for instance, HOG and SVM were used by Kopaczka et al. [28] to extract faces from thermal images. To identify the face region in the initial frame and calculate head motion, Liu and Yin [31] proposed a face identification model comprising a mixture of trees with a shared pool of parts, drawn from [32]. Otsu thresholding is an image processing approach that transforms a grayscale image into a black-and-white image. Wang et al. [13] applied the Otsu thresholding method to create a binary image and then analyzed the vertical and horizontal projection curves of that image to determine the gradient with the highest value, which was used to identify the facial boundary. Latif et al. [33] conducted contrast-limited histogram equalization to enhance image contrast. Moreover, to detect the facial region, Mohd et al. [10] used the Viola-Jones boosting algorithm combined with a series of Haar-like features to identify facial regions within a thermal image. Wan et al. [19] proposed a face detection method based on temperature space: by examining how hot or cold different parts of an image are, the algorithm can tell which parts are the face and which parts are the background. For face detection, Goulart et al. [24] designed a process to detect a face in a thermal image using three kinds of filters: median filters, Gaussian filters, and a binary filter.
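The Otsu-plus-projection idea described above can be sketched in a few lines. The following is a minimal illustration, not the implementation of Wang et al. [13]; it assumes a warm (bright) face against a cooler background, and the function names are illustrative:

```python
import numpy as np

def otsu_threshold(img):
    """Return the Otsu threshold for an 8-bit grayscale image by
    maximizing the between-class variance over all candidate cuts."""
    hist = np.bincount(img.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    cum_prob = np.cumsum(prob)
    cum_mean = np.cumsum(prob * np.arange(256))
    total_mean = cum_mean[-1]
    best_t, best_var = 0, -1.0
    for t in range(255):
        w0, w1 = cum_prob[t], 1.0 - cum_prob[t]
        if w0 == 0 or w1 == 0:
            continue
        mu0 = cum_mean[t] / w0
        mu1 = (total_mean - cum_mean[t]) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t

def face_bounding_box(img):
    """Binarize with Otsu, then use row/column projection profiles to
    locate the facial boundary (warm pixels assumed to be the face)."""
    binary = img > otsu_threshold(img)
    rows = np.flatnonzero(binary.sum(axis=1))  # horizontal projection
    cols = np.flatnonzero(binary.sum(axis=0))  # vertical projection
    return rows[0], rows[-1], cols[0], cols[-1]  # top, bottom, left, right
```

On a synthetic image with a warm rectangular "face", `face_bounding_box` recovers the rectangle's extent, which is the essence of the boundary-localization step.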

Region of Interests (ROIs)
Human affects contribute significantly to temperature differences in facial regions. More specifically, the sympathetic nervous system (SNS) responds to human affects by controlling a variety of physiological signals, such as increasing blood flow, which raises body temperature and propagates to the surface of the face. Consequently, thermal imaging can detect minor differences in facial temperature [33]. Moreover, variations in facial temperature are caused by contractions of facial action units during human affects [34]. Numerous facial regions have been examined in previous studies to measure the human affective state, including the forehead, tip of the nose, eyes, mouth, cheeks, and chin [6,10,24,31,33,34]. Differences in temperature values according to emotion type have been reported in numerous studies; for example, Cruz-Albarran et al. [35] reported that the temperature of the right and left cheeks increased when showing sadness or disgust, whereas the temperature of the maxillary area and nose decreased for the same emotions. Ioannou et al. [36] showed that forehead temperature decreased when expressing sadness or fear and increased when showing anger. Moreover, Jian et al. [37] reported a positive correlation between the cheek and eye regions and human emotions. Rooj et al. [6] selected facial ROIs such as the cheeks, forehead, nose, and maxillary area. Kumar et al. [38] proposed facial landmarks with the DenseNet model for facial ROI localization and extraction. Saha et al. [22] proposed the DFTA model to define eight small facial patches that contain important information for differentiating between emotion classes.

Feature Extraction
The type of selected features plays an important role in classification accuracy; thus, previous studies have used various types of features. Examples include statistical features such as the mean, variance, covariance, median, minimum, maximum, and histogram statistics [13,20,23–25,28,34,39]. Other feature types include GLCM features [13,33,40], histogram of oriented gradients (HOG) features [28], and LBP features [22,33,34]. Some studies have also used features from deep learning methods, such as transfer learning from AlexNet [41] and convolutional sparse coding [6].

Proposed Methodology
The proposed method in this study consists of three classification models, each with a different number of ROIs. The goal of implementing three models with different numbers of ROIs is to encompass all situations of occlusion and to explore the efficiency of sub-ROIs in classifying human affects. For example, if a facial image contains only eyeglass occlusion, model one can handle it because it excludes eye patches from the ROIs. Furthermore, model three can be used when facial images include both eyeglasses and facial hair, such as beards or mustaches. The main stages of the proposed models are demonstrated in Figure 1. The first stage is the preprocessing of facial images prior to classification: faces were extracted from their backgrounds, and a frontal-view alignment process ensured that all faces had the same orientation, since spontaneous emotions are usually accompanied by head movements. Then, 6 main ROIs were cropped from facial patches and subdivided into 27 sub-ROIs. The proposed models use the following ROIs: the first model uses the 6 main ROIs, the second model uses the 27 sub-ROIs, and the third model uses 11 selective sub-ROIs. The next stage computed nine statistical features for the three models, and four feature selection algorithms were applied: principal component analysis (PCA), analysis of variance (ANOVA), neighborhood components analysis (NCA), and naive Bayes (NB). The last stage focused on the classification of the affective state based on SVM and MLP for the three proposed models.

Dataset Selection
This study selected a dataset based on several factors. The first is the recognition of a spontaneous affective state. The second is that onset (beginning of emotion intensity) and apex (maximum emotion intensity) frames should be available for each individual in order to compute statistical features between the two images. The third factor considered in the current study is facial occlusion, which includes eyeglasses and facial hair. As a result, the USTC-NVIE [27] database was selected because it satisfies these factors. The database provides six basic spontaneous emotions: happy, disgust, fear, surprise, anger, and sad. Furthermore, the database includes an evaluation process in which five experienced evaluators rated the intensity of each emotion class. Based on their evaluation report, this study applied an outlier process to select subjects with higher emotion intensity; after removing outliers, the number of instances employed for each class was as follows: happy: 99 subjects, disgust: 81 subjects, fear: 55 subjects, surprise: 65 subjects, anger: 56 subjects, and sad: 73 subjects.

Preprocessing Stage
As demonstrated in Figure 1, facial extraction is the first step in the preprocessing stage. Several approaches have been used in previous studies to extract facial regions, for example, HOG with SVM [45], face detection based on eye coordination and template matching [46], and the Viola-Jones algorithm [33]. The current study applied Goulart et al.'s [24] approach to extract thermal faces using median and Gaussian filters with further preprocessing steps. More importantly, both onset and apex images were selected so that statistical features could be extracted between them; therefore, the facial extraction process was applied to both images of the same subject. Spontaneous emotions are accompanied by facial movements [47]. Therefore, to preserve the same alignment and frontal view for the onset and apex images, this study applied image registration by conducting a similarity transformation. The images in Figure 2a,b are the onset and apex images before the similarity transformation, and those in Figure 2c,d are the onset and apex images after conducting the similarity transformation.
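A similarity transformation for onset/apex registration can be estimated from corresponding landmark points. The paper does not specify its estimation procedure, so the sketch below uses the standard Umeyama least-squares method as one plausible way to recover the scale, rotation, and translation:

```python
import numpy as np

def estimate_similarity(src, dst):
    """Estimate a 2-D similarity transform (scale s, rotation R,
    translation t) so that dst ~= s * src @ R.T + t, via the
    Umeyama closed-form least-squares solution."""
    src_mean = src.mean(axis=0)
    dst_mean = dst.mean(axis=0)
    src_c = src - src_mean                 # centered source points
    dst_c = dst - dst_mean                 # centered destination points
    cov = dst_c.T @ src_c / len(src)       # cross-covariance matrix
    U, S, Vt = np.linalg.svd(cov)
    d = np.sign(np.linalg.det(U @ Vt))     # guard against reflections
    D = np.diag([1.0, d])
    R = U @ D @ Vt
    var_src = (src_c ** 2).sum() / len(src)
    s = np.trace(np.diag(S) @ D) / var_src
    t = dst_mean - s * R @ src_mean
    return s, R, t
```

Given landmark pairs from the onset and apex frames, applying the recovered transform to the apex image would bring both frames into the same coordinate frame before ROI cropping.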


Selected ROIs
Although the aforementioned studies have focused on several facial regions, such as the forehead, tip of the nose, eyes, mouth, cheeks, and chin [44,48,49], very few previous studies tackled the challenge of facial occlusion. Therefore, the current study selected facial images with eyeglass occlusion and excluded the otherwise important eye region from the selected ROIs. Moreover, this study utilized three classification models to explore the efficiency of ROIs when the thermal face contains occlusion. The first classification model employed six main facial ROIs, including the forehead, tip of the nose, cheeks, mouth, and chin. The second classification model decomposed these six main ROIs into 27 sub-ROIs to explore the efficiency of sub-ROIs in recognizing human affects. The third classification model selected 11 of the 27 sub-ROIs; the selection was based on the criterion that each sub-ROI must be free of facial hair, since the goal of this study is to tackle the facial occlusion challenge. Figure 3 demonstrates the three types of selective ROIs for each classification model. The selection of facial ROIs for the three proposed classification models is illustrated in the following steps:

• Classification based on six main ROIs: forehead, tip of the nose, left cheek, right cheek, mouth, and chin.
• Classification based on subdividing the 6 main ROIs into 27 sub-ROIs as follows:
1. Subdivide the forehead main ROI into 12 sub-ROIs.
2. Use the tip of the nose ROI without subdividing.
3. Subdivide the left cheek main ROI into three sub-ROIs.
4. Subdivide the right cheek main ROI into three sub-ROIs.
5. Subdivide the mouth main ROI into six sub-ROIs.
6. Subdivide the chin main ROI into two sub-ROIs.
• Classification based on 11 selective sub-ROIs chosen from the 27 sub-ROIs so that each selected patch is free of facial hair.
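As an illustration of the decomposition step, a main ROI can be split into a regular grid of sub-ROIs. The sketch below is a simplified assumption; the paper does not state the exact grid geometry (e.g. whether the forehead's 12 patches form a 3 x 4 grid), and the function name is illustrative:

```python
import numpy as np

def subdivide_roi(roi, n_rows, n_cols):
    """Split an ROI (a 2-D array of temperature values) into an
    n_rows x n_cols grid of sub-ROIs, e.g. a forehead ROI into
    12 patches as a hypothetical 3 x 4 grid."""
    h, w = roi.shape
    r_edges = np.linspace(0, h, n_rows + 1).astype(int)
    c_edges = np.linspace(0, w, n_cols + 1).astype(int)
    return [roi[r_edges[i]:r_edges[i + 1], c_edges[j]:c_edges[j + 1]]
            for i in range(n_rows) for j in range(n_cols)]
```

The grid edges are rounded so the sub-ROIs tile the ROI exactly even when its dimensions are not divisible by the grid shape.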

Feature Extraction
Before the feature extraction process, the facial regions were converted to temperature values using the equation from [50]; therefore, the features were calculated from temperature values instead of gray-level intensity values. This research proposed nine statistical features to explore the variation in emotion intensity between the apex and onset images for each emotion class, as follows:
Equation (1): the mean of the temperature points (X) in the apex image. Variable f1 represents feature one.
Equation (2): the mean of the temperature differences (Xd) between the onset and apex images. Variable f2 represents feature two.
Equation (3): the variance of the temperature points in the apex image. Variable f3 represents feature three.
Equation (4): the variance of the temperature differences (Xd) between the onset and apex images. Variable f4 represents feature four.
Equation (5): the maximum temperature value obtained from the apex image. Variable f5 represents feature five.
Equation (6): the minimum temperature value obtained from the apex image. Variable f6 represents feature six.
Equation (7): the mean of the maximum and minimum temperature values obtained from the apex image. Variable f7 represents feature seven.
Equation (9): the median of the temperature differences (Xd) between the onset and apex images. Variable f9 represents feature nine.
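Since the equations themselves are not reproduced here, the following LaTeX is a plausible reconstruction of the verbal descriptions above, where X denotes the n apex-image temperature points and Xd the onset-to-apex temperature differences; the description of Equation (8) is missing from the text, so f8 is omitted:

```latex
\begin{align}
f_1 &= \frac{1}{n}\sum_{i=1}^{n} X_i
  & f_2 &= \frac{1}{n}\sum_{i=1}^{n} Xd_i \\
f_3 &= \frac{1}{n}\sum_{i=1}^{n} \left(X_i - f_1\right)^2
  & f_4 &= \frac{1}{n}\sum_{i=1}^{n} \left(Xd_i - f_2\right)^2 \\
f_5 &= \max_{i} X_i
  & f_6 &= \min_{i} X_i \\
f_7 &= \frac{f_5 + f_6}{2}
  & f_9 &= \operatorname{median}(Xd)
\end{align}
```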
After computing the statistical features, the numbers of features for the first, second, and third classification models are 54, 243, and 99, respectively. Consequently, feature reduction is required for efficient classification due to the large number of features. Four feature selection methods were applied to identify the most efficient features: principal component analysis (PCA), analysis of variance (ANOVA), neighborhood components analysis (NCA), and feature selection based on the naïve Bayes (NB) algorithm. After the feature selection algorithms ranked the features according to their importance for classification, the first 50 features were selected from the ANOVA and PCA algorithms, while the first 10 features were selected from the NCA and NB algorithms.
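As one way to realize the ANOVA-based ranking described above, the one-way F statistic can be computed per feature and the top-k features retained. This is a generic sketch of the technique, not the study's exact configuration, and the helper names are illustrative:

```python
import numpy as np

def anova_f_scores(X, y):
    """One-way ANOVA F statistic per feature: between-class variance
    divided by within-class variance. Higher F = more discriminative."""
    classes = np.unique(y)
    n, k = len(y), len(classes)
    overall = X.mean(axis=0)
    ss_between = np.zeros(X.shape[1])
    ss_within = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        ss_between += len(Xc) * (Xc.mean(axis=0) - overall) ** 2
        ss_within += ((Xc - Xc.mean(axis=0)) ** 2).sum(axis=0)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def select_k_best(X, y, k):
    """Return the column indices of the k highest-ranked features."""
    scores = anova_f_scores(X, y)
    return np.argsort(scores)[::-1][:k]
```

With the feature matrices of this study, `select_k_best(X, y, 50)` would play the role of the 50-feature ANOVA selection, and `k=10` the NCA/NB-style shorter list.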

Classification
For the current study, the support vector machine (SVM) and backpropagation multilayer perceptron (MLP) algorithms were utilized to classify the six basic emotion classes: happy, disgust, fear, surprise, anger, and sad. For multiclass classification, the one-against-one technique was used, reducing the problem to multiple binary classifications between pairs of classes; therefore, 15 binary classification models were utilized for each multiclass classification. To ensure the data of each class were free of noise, an outlier process was performed before classification by calculating the mean and standard deviation of each class and excluding instances more than three standard deviations from the mean. More importantly, to preserve class balance in the binary classifications, a downsampling technique was applied. Moreover, a 10-fold cross-validation technique was utilized for the validation process. The selected kernel for the SVM classifier was the radial basis function with an epsilon of 0.001, while the MLP configuration was as follows: a learning rate of 0.01, 500 epochs, a backpropagation rate of 0.2, and 6 hidden layers.
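The preparation steps described above, 3-sigma outlier removal, per-pair downsampling, and the 15 one-against-one subproblems for 6 classes, can be sketched as follows. The helper names are illustrative and not taken from the paper:

```python
import numpy as np
from itertools import combinations

def remove_outliers(X, y, n_std=3.0):
    """Per class, drop instances farther than n_std standard deviations
    from the class mean on any feature."""
    keep = np.zeros(len(y), dtype=bool)
    for c in np.unique(y):
        idx = np.flatnonzero(y == c)
        Xc = X[idx]
        mu, sd = Xc.mean(axis=0), Xc.std(axis=0) + 1e-12
        ok = (np.abs(Xc - mu) <= n_std * sd).all(axis=1)
        keep[idx[ok]] = True
    return X[keep], y[keep]

def downsample_pair(X, y, a, b, rng):
    """Balance one binary subproblem by downsampling the larger class."""
    ia, ib = np.flatnonzero(y == a), np.flatnonzero(y == b)
    m = min(len(ia), len(ib))
    ia = rng.choice(ia, m, replace=False)
    ib = rng.choice(ib, m, replace=False)
    idx = np.concatenate([ia, ib])
    return X[idx], y[idx]

# One-against-one: one binary model per class pair (15 pairs for 6 classes).
classes = ["happy", "disgust", "fear", "surprise", "anger", "sad"]
pairs = list(combinations(classes, 2))
```

Each of the 15 pairs would then be balanced with `downsample_pair` and fed to a binary SVM or MLP, with the final label decided by majority voting over the pairwise decisions.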

Experimental Analysis and Discussion
This section demonstrates the experimental results obtained from the three classification models and discusses them in light of the study's objectives. For each classification model, the process flow is demonstrated in Figure 4. To evaluate the performance of the proposed models, three statistical evaluation methods were employed: precision, F1 score, and Kappa analysis.
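The three evaluation measures can be computed directly from the predictions. The following is a minimal binary-case sketch (a hypothetical helper, not the study's code); for the multiclass tables, these quantities would be averaged over the one-against-one subproblems:

```python
import numpy as np

def precision_f1_kappa(y_true, y_pred, positive=1):
    """Binary precision, F1 score, and Cohen's kappa."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_pred == positive) & (y_true == positive))
    fp = np.sum((y_pred == positive) & (y_true != positive))
    fn = np.sum((y_pred != positive) & (y_true == positive))
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    po = np.mean(y_true == y_pred)                        # observed agreement
    pe = sum(np.mean(y_true == c) * np.mean(y_pred == c)  # chance agreement
             for c in np.unique(np.concatenate([y_true, y_pred])))
    kappa = (po - pe) / (1 - pe)
    return precision, f1, kappa
```

Kappa corrects the raw agreement `po` for the agreement `pe` expected by chance, which is why it is a stricter measure than accuracy on imbalanced folds.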

Affective State Recognition Based on Six Main Facial ROIs
The goal of the first proposed classification model was to explore the efficiency of six main facial ROIs in classifying six basic emotions. Table 1 outlines the mean accuracy results reported from the SVM and MLP classification algorithms with the PCA, ANOVA, NCA, and NB feature selectors. As shown in Table 1, the highest mean accuracy, 98.5%, was reported in the sad class for SVM-NCA, and the second highest, 98.4%, in the sad class for SVM-NB. The lowest mean accuracy, 62.9%, was reported in the fear class from MLP-PCA. Furthermore, Table 2 demonstrates that SVM-NCA reported the highest overall mean recognition accuracy, 95.3%, followed by MLP-NB at 92.1%. The minimum overall mean accuracy was 73.8%, from SVM-PCA. For the evaluation process, Table 3 outlines the three statistical evaluation methods (precision, F1 score, and Kappa) for the two highest-performing classification configurations.


Affective State Recognition Based on 27 Sub-Facial ROIs
The second classification model in this study explored the efficiency of decomposing the 6 main facial ROIs into 27 sub-ROIs, with 9 statistical features computed for each sub-ROI. Table 3 outlines the mean accuracy results reported from the proposed method. The highest mean accuracy, 97.6%, was reported in the disgust class from MLP-NB, and the second highest, 97.3%, in the disgust class from MLP-NCA. The lowest mean accuracy, 54.1%, was reported in the anger class from SVM-PCA. The highest overall mean accuracy results were 95.2% from SVM-NB and 95.1% from SVM-NCA, while the lowest overall mean accuracy was 61.6%, from SVM-PCA. Table 4 demonstrates the statistical evaluation methods for the best-performing feature selection methods in this model.

Affective State Recognition Based on Selective 11 Sub-Facial ROIs
The third proposed model relied on the selection of 11 of the 27 sub-ROIs; the goal of this model was to explore the efficiency of sub-ROIs free of facial hair in avoiding thermal occlusion. As demonstrated in Figure 3c, R14 and R17 are the sub-ROIs on the upper left and right cheeks, selected to avoid beard hair. Furthermore, the selected lower sub-ROIs of the forehead, represented by R10 to R12 in Figure 3c, avoid forehead patches that could be covered by hair (bangs). Moreover, only the upper and lower lips, represented by R22 to R25, were selected from the mouth region to avoid subregions that may include facial hair such as mustaches and a beard. Table 5 demonstrates the mean accuracy results reported from the SVM and MLP classification algorithms with the PCA, ANOVA, NCA, and NB feature selectors. The disgust class reported the highest mean recognition accuracy, 96.7% with MLP-NCA and 96.1% with SVM-NCA. The lowest class accuracy, 42.2%, was reported from SVM-ANOVA. Moreover, MLP-NCA reported the highest overall recognition accuracy of 93.4%, followed by SVM-NB at 92.8%, the second highest recognition accuracy. SVM-PCA reported the lowest overall recognition accuracy, at 51.5%. Table 6 demonstrates the statistical evaluation methods for classification of the human affective state based on the selected 11 sub-ROIs with the SVM and MLP classifiers and the NCA and NB feature selection algorithms.

Comparative Study
The highest results from the proposed classification models were obtained with two feature selection algorithms, NCA and NB, while the lowest outcomes came from PCA and ANOVA. This finding points to the significance of feature selection algorithms in the classification process. Moreover, as mentioned above, the number of selected features can play a significant role in the classification results: in the current study, the 50 highest-ranked features were selected from PCA and ANOVA, while the 10 highest-ranked features were selected from NCA and NB. The choice of 50 features for PCA and ANOVA was based on experimental trials to achieve the highest accuracy. More importantly, Figure 5 documents a comparison between the results of the three proposed models. As shown in Figure 5, the highest overall mean accuracy in model one was 95.3%, from SVM-NCA; in model two, 95.2%, from SVM-NB; and in model three, 93.4%, from MLP-NCA. The results from the three proposed models are nearly identical, which means that the proposed method of decomposing main facial patches into subregions could help researchers avoid occluded facial regions in thermal images, such as eyeglasses and facial hair. Furthermore, the results show that, despite using a small number of facial ROIs, the classification performance is still promising. More importantly, the findings show that increasing the number of features does not always result in higher accuracy, while a few robust features can have a more significant impact.
After comparing the three proposed models with each other, it is important to compare them with models from the literature. Therefore, the current study selected models from other studies that drew their data from the USTC-NVIE [27] database. Table 7 demonstrates the comparative results of the current study against other methods based on the USTC-NVIE database. As shown in Table 7, the results of the current study outperform those of other studies, except for the study in [41], which reported higher mean accuracy results; the reason for this could be the type of features used. This study also validated its findings by comparing them to previous studies based on visual facial images. Table 8 outlines this comparison and shows that the mean accuracy results of the current study are competitive with other results.

Conclusions
This paper proposed three models for spontaneous affective state recognition based on facial thermal images with eyeglass and facial hair occlusion. The main objective was to identify the facial subregions that are free of eyeglass and hair occlusion and most efficient for recognizing the human affective state. The three proposed models build on one another. The first model focused on the classification of affective states based on six main facial ROIs, with the eyeglass location excluded. The second model decomposed the 6 main ROIs into 27 sub-ROIs to explore the efficiency of sub-facial regions in the classification of affective states. The third model selected 11 of the 27 sub-ROIs to explore the ability to avoid facial hair regions in thermal images. In comparison to previous studies, the results reported from the three proposed models demonstrate a higher mean accuracy: 95.3%, 95.2%, and 93.4% for models one, two, and three, respectively. Furthermore, the results of this study show the importance of feature selection techniques in improving classification accuracy. In future studies, we will focus on the automatic extraction of ROIs and employ deep learning algorithms to improve the recognition of human affective states.