Optical Flow Filtering-Based Micro-Expression Recognition Method

Abstract: The recognition accuracy of micro-expressions remains understudied in the field of facial expression analysis, as current research methods mainly focus on feature extraction and classification. Based on optical flow and decision thinking theory, we propose a novel micro-expression recognition method that filters out low-quality micro-expression video clips. Using preset thresholds, we develop two optical flow filtering mechanisms: one based on two-branch decisions (OFF2BD) and the other based on three-way decisions (OFF3WD). OFF2BD uses classical binary logic to classify images, dividing them into a positive or a negative domain for further filtering. Unlike OFF2BD, OFF3WD adds a boundary domain to defer judgment on the motion quality of the images. In this way, video clips with a low degree of morphological change can be eliminated, which directly improves the quality of micro-expression features and the recognition rate. The experimental results show recognition accuracies of 61.57% and 65.41% on the CASME II and SMIC datasets, respectively. A comparative analysis shows that the scheme can effectively improve recognition performance.


Introduction
Micro-expressions are subtle, unconscious facial movements that last less than 0.2 s [1]. They can reveal the true emotional state a person is hiding and have important applications in medicine, polygraph testing, legal cases and other fields. Therefore, micro-expression recognition has gradually become a hot research issue. In the early period, Ekman et al. [2] designed the Facial Action Coding System (FACS), based on the correlation between facial morphology changes, to facilitate in-depth research in this field; each facial action code is called an action unit (AU). Since then, with the gradual maturity of artificial intelligence technology, a large body of related work on micro-expressions has appeared. However, micro-expression data are difficult to collect because of the short duration and unconscious nature of micro-expressions. Several micro-expression datasets have therefore been constructed to support further research, such as SMIC [3], CASME II [4], SAMM [5] and CAS(ME)2 [6].
The main idea of traditional micro-expression research is to mine micro-expression information by extracting image features and combining them with a classifier. Pfister et al. [7] proposed an automatic micro-expression recognition system based on a temporal interpolation model and a spontaneous micro-expression corpus. This work is one of the earliest to realize automatic micro-expression recognition. Domestically, as representatives of micro-expression research and development, Fu et al. [8,9] made considerable efforts to facilitate further research on micro-expressions.
With the gradual maturity of deep learning technology, researchers at home and abroad began to use deep learning methods [10-12] for micro-expression image coding and micro-expression recognition. These methods use CNNs and their extensions to replace traditional texture feature extraction for feature extraction and recognition.
A survey of the literature shows that current research on micro-expression recognition, at home and abroad, mainly addresses two questions: how to extract high-quality texture features, and which classifier gives good classification performance. However, this line of research has several problems. 1. Traditional methods mainly focus on the internal relationships between feature values in the images. A micro-expression, however, is essentially the combined behavior of the morphological semantics of the various organs of the face, so the influence of the morphological semantic quality of the micro-expression is ignored. 2. Because micro-expression datasets are small, the potential of deep learning methods in this field cannot be fully exploited, and, as brute-force methods, they also have low interpretability. In response to these problems, this paper introduces the decision-making thinking of rough set theory and proposes a filtering method based on optical flow characteristics. The method eliminates video clips with low morphological change to improve the feature expression ability of the micro-expression to be recognized, thereby improving the performance of micro-expression recognition.
The main contributions are as follows:
1. From the perspective of morphological change, we propose a novel image weighting method based on optical flow to assign semantic quality to the images in video clips;
2. We propose two optical flow filtering algorithms, based on two-branch decisions and three-way decisions, together with a micro-expression recognition framework, which establishes an interpretable and robust micro-expression recognition theory;
3. Experiments comparing the method with state-of-the-art solutions demonstrate its effectiveness and efficiency.
The rest of the paper is organized as follows: Section 2 reviews related work. Section 3 describes the details of our method. Section 4 reports and discusses the experimental results. Section 5 gives the conclusion and future work.

Related Works
Compared with macro-expressions, micro-expressions are characterized by short duration and low intensity of facial movements, which makes extracting their facial features more challenging. Current research on micro-expressions can be summarized as an end-to-end research mode that mainly focuses on pre-processing, diversified feature extraction, and joint classifiers for micro-expression recognition. New research methods continue to be proposed as the technology matures.
I. Face detection and alignment. Face detection is essential for many facial applications (such as facial recognition and facial expression analysis). Accurate face localization through face detection can effectively reduce the impact of noise on micro-expressions in the face. Current face detection methods mainly fall into two groups: 1. Using feature points to align faces and determine key points in the face area; 2. Using deep learning methods for face detection.
Cootes et al. [13] proposed the active shape model (ASM), which uses a 68-point subjective shape model to locate the key points of the face area. This is an early and widely used method for aligning face parts. Building on ASM, Cootes et al. [14] further proposed the active appearance model (AAM), which establishes not only shape models but also texture models and hybrid models. Subsequently, Cootes et al. [15] proposed the constrained local model (CLM), which abandons the global texture method of the AAM algorithm and instead models the local area in the neighborhood of candidate matching feature points while constraining their spatial positions.
With the maturity of deep learning technology, deep learning is widely applied in different fields [16,17]. Yang et al. [18] proposed a cascaded structure to detect tiny faces in complex environments. Zhang et al. [19] exploited the inherent correlation between the tasks of detection and alignment and combined it with CNNs to propose MTCNN. MTCNN uses a cascaded multi-task framework to boost the performance of both face detection and alignment. In addition, Fast R-CNN [20] and Faster R-CNN [21] have also been applied to face detection, with good improvements in detection speed.
II. Feature extraction and derivation. Feature extraction has been a hot issue in the field of micro-expression research. According to current research, the performance of micro-expression recognition depends mainly on the extraction of texture features. Ojala et al. [22] proposed the local binary pattern (LBP), a binary vector encoding of local pixels that describes the texture information of the current image by generating a histogram. LBP is generally used for feature extraction from static images. Extending the LBP operator, Zhao et al. [23] proposed a dynamic spatiotemporal feature operator, LBPTOP. Compared with the LBP operator, LBPTOP takes into account the spatiotemporal information of the video by adding two new feature planes, XT and YT, projected along the time axis T. Based on these two classic operators, a variety of variant feature extraction operators have been derived. Huang et al. [24] proposed the Spatiotemporal Local Binary Pattern with Integral Projection (STLBP-IP), which uses a difference-image-based integral projection method to obtain appearance and motion features in the horizontal and vertical directions. Subsequently, Huang et al. [25] proposed Spatiotemporal Completed Local Quantized Patterns (STCLQP) for facial micro-expression analysis, which improves the efficiency of micro-expression recognition by extracting three kinds of feature vectors: sign, magnitude and orientation. In further research, the team [26] proposed a local binary pattern based on the spatiotemporal Radon transform. Guo et al. [27] proposed Extended Local Binary Patterns on Three Orthogonal Planes (ELBPTOP). To improve the efficiency of dynamic feature extraction, the LBPSIP [28] (Local Binary Patterns with Six Intersection Points) operator was proposed as an optimization of the LBPTOP operator. This method transforms the histogram features of the three planes of LBPTOP into a two-dimensional vector encoding over the 6 points intersecting XY, XT and YT, and uses the generated histogram as the final dynamic texture feature. Guo et al. [29] proposed replacing the classic LBPTOP operator with the CBPTOP operator for image feature extraction, and then using ELM for micro-expression recognition.
III. Research on micro-expression framework. With the gradual maturity of feature operators, researchers began to study the recognition framework of micro-expressions in depth in order to break away from this fixed research model. Jia et al. [30] used the LBP operator to extract macro-information and the LBPTOP operator to extract micro-information, and constructed a conversion model between the macro and micro information of expressions. Yan et al. [31] combined the constrained local model (CLM) algorithm with local binary patterns (LBP) and optical flow (OF) to identify and analyze the correlation between frames in facial regions of interest (ROIs). Wang et al. [32] combined the weights calculated from the cumulative optical flow with the spatiotemporal features extracted by LBPTOP to obtain weighted spatiotemporal features, which are input into a support vector machine for micro-expression recognition. Wang et al. [33] converted a color video clip into a four-dimensional tensor, where the first two dimensions are spatial information, the third dimension is temporal information, and the fourth dimension is color information. Liong et al. [34] extracted the features of the apex frame by selecting the apex frame and the offset frame of each video and applying the bi-directional weighted optical flow (Bi-WOOF).
IV. Application of decision thinking theory. Decision thinking in rough set theory is the core idea behind two-branch decisions and three-way decisions. A two-branch decision emphasizes whether something is right or wrong and is absolute. The three-way decision, by contrast, is a decision-making model based on human cognition: in the actual decision-making process, people immediately judge the things they are fully certain to accept or reject, and tend to postpone judgment on things that cannot be decided immediately. Yang et al. [35] applied three-way decisions to medical diagnosis. Liang et al. [36] introduced two-branch and three-way decisions into the field of clustering, and defined a cluster calculation method based on clustering.
In addition, Table 1 summarizes an analysis of several state-of-the-art methods for micro-expression recognition. Building on the above research results, this paper combines multiple decision-making methods with classic feature extraction methods to construct a new micro-expression research framework.
Table 1. Analysis of the state-of-the-art-review for micro-expression recognition.

| Methods | Description | Advantage |
|---|---|---|
| STCLQP [25] | Extracts and fuses the sign, magnitude and orientation components. | Effective for extracting the appearance and motion features of facial micro-expressions. |
| TAOF [32] | The motion intensity is calculated by accumulating the neighboring optical flow in the time interval, and the features are further weighted locally. | Enhances the discrimination of the weighted features for micro-expression recognition. |
| Bi-WOOF [34] | Creates a new feature extractor that encodes the essential expressiveness of the apex frame. | Represents the apex frame in a discriminative manner that emphasizes facial motion information at both bin and block levels. |
| FHOFO [37] | Constructs suitable angular histograms from optical flow vector orientations and uses histogram fuzzification to encode the temporal pattern. | Insensitive to changes of illumination conditions among video clips; the extracted features are robust to variations in expression intensity. |
| RIMDS [38] | Explores a shallower architecture and lower-resolution input data, shrinking model and input complexities simultaneously for the composite database. | Achieves superior performance compared with other micro-expression recognition approaches. |
| FMBH [39] | Combines the horizontal and vertical components of the differential of optical flow, inspired by motion boundary histograms (MBH). | Unexpected motions caused by residual mis-registration between images cropped from different frames can be removed. |
| FDM [40] | Characterizes the movements of a micro-expression at different granularities. | Provides an effective and intuitive understanding of human facial expressions. |
| STRCN [41] | A deep recurrent convolutional network based approach that captures the spatiotemporal deformations of the micro-expression sequence. | Addresses the "limited and imbalanced training samples" problem and considers the spatiotemporal deformations. |

Proposed Methods
Our proposed method consists of three main phases: Weight generation phase, optical flows filtering phase and micro-expression recognition phase. The main framework of our method is shown in Figure 1.

Weight Generation Phase
An analysis of SMIC [3] and CASME II [4] shows that the video clips contain a large number of images with no expression and low morphological semantic change. The expression is mainly concentrated between the onset frame and the offset frame, and the semantic peak is reached at the apex frame. Therefore, features extracted from low-motion semantic images will produce a large amount of low-quality and imbalanced data. To overcome this problem, this paper proposes a low-motion semantic image filtering method based on optical flow and decision thinking theory, which is used for the experimental preprocessing of video clips.
A pre-processing step is needed to ensure that the inputs share common features, because the resolution and other external properties of the datasets used in the experiments vary. First, the Multi-task Cascaded Convolutional Networks (MTCNN) [19] module is adopted to detect face areas in the input images. Second, only the facial area is cropped from the entire image, so that the crops have the same size and are not affected by unnecessary parts such as hair or accessories. The cropped face images are then resized to 300 × 250. In this manner, the variations of spatial appearance among different video clips are normalized. Subsequently, dense optical flows [42] are extracted from the aligned video clips to represent the motion information of each pixel in the micro-expression video clips. The optical flows $\xi_{o_i}(x, y)$ are computed from each pair of neighboring images. When a $t$-frame video $V = \{v_1, v_2, \ldots, v_t\}$ is input into the method, $t - 1$ optical flow frames $O = \{o_1, o_2, \ldots, o_{t-1}\}$ are obtained, where $o_i$ is the optical flow image between $v_i$ and $v_{i+1}$. The optical flows $\xi_{o_i}(x, y)$ consist of horizontal and vertical displacements and can therefore be expressed as:

$$\xi_{o_i}(x, y) = \left(h_{o_i}(x, y),\ v_{o_i}(x, y)\right) \tag{1}$$

where $h_{o_i}(x, y)$ and $v_{o_i}(x, y)$ indicate the horizontal and vertical displacements respectively. For each of the $t - 1$ optical flow images, the Euclidean norm of the horizontal and vertical displacement of each pixel is taken as the motion intensity of that pixel. Then, the optical flow motion intensity $W_{o_i}$ of the entire optical flow image of $W \times H$ pixels can be calculated as:

$$W_{o_i} = \sum_{x=1}^{W} \sum_{y=1}^{H} \sqrt{h_{o_i}(x, y)^2 + v_{o_i}(x, y)^2} \tag{2}$$

where $o_i$ is the $i$-th optical flow frame and $i = 1, 2, \ldots, t - 1$. The average motion intensity is given by Equation (3):

$$\overline{W}_{o_i} = \frac{W_{o_i}}{W \times H} \tag{3}$$

where $W_{o_i}$ represents the motion intensity of the $i$-th optical flow frame, $W$ is the width of the optical flow $o_i$ and $H$ is its height.
Therefore, the motion intensity of an optical flow frame can be measured by its average motion intensity in Equation (3). The average motion intensity reflects the strength of the change between each pair of neighboring video images: the larger the average motion intensity of the optical flow, the higher the semantic quality between the neighboring video frames, and the smaller the average motion intensity, the lower the semantic quality. The average motion intensity of optical flow can thus be used to measure semantic quality information.
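The summed and averaged motion intensity described above can be sketched in a few lines. This is a minimal illustration, assuming the horizontal and vertical displacement fields are already available as nested lists of equal shape; the function names `motion_intensity` and `average_motion_intensity` are hypothetical and the optical flow extraction itself is not shown.

```python
import math


def motion_intensity(h, v):
    """Sum of per-pixel Euclidean flow magnitudes over the whole flow image."""
    return sum(
        math.hypot(h_val, v_val)
        for h_row, v_row in zip(h, v)
        for h_val, v_val in zip(h_row, v_row)
    )


def average_motion_intensity(h, v):
    """Summed motion intensity divided by the pixel count W x H (Equation (3))."""
    height = len(h)
    width = len(h[0])
    return motion_intensity(h, v) / (width * height)


# Toy 2 x 2 flow field: one pixel moves by (3, 4), the others are static.
h = [[3.0, 0.0], [0.0, 0.0]]
v = [[4.0, 0.0], [0.0, 0.0]]
total = motion_intensity(h, v)            # 5.0
average = average_motion_intensity(h, v)  # 1.25
```

In practice the displacement fields would come from a dense optical flow routine and would typically be held in arrays rather than lists; the arithmetic is the same.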
To better classify the semantic quality, the semantic quality of the optical flow is distributed in [0, 1] by defining a normalization weighting formula. The mapping is monotonic: the larger the average motion intensity of the optical flow, the larger the normalized semantic weight, and the smaller the average motion intensity, the smaller the normalized semantic weight. For each optical flow frame with average motion intensity $\overline{W}_{o_i}$, the weight assigned by the optical flow weighting filtering algorithm is given in Equation (4),
where $\theta$ is a parameter that adjusts the influence of the weighting over the index. In this way, every pair of neighboring video frames obtains a semantic quality weight $\omega_{o_i}$ ($i = 1, 2, \ldots, t - 1$), which represents the strength of the change in the face area between the neighboring frames.

Optical Flows Filtering Algorithm Based on Two-Branch Decisions
Two-branch decisions have important application value when choosing, under uncertainty, which data to remove and which to retain; classic binary logic plays an important role in solving such uncertainty problems. We therefore determine the semantic quality of the video frames by defining a weight threshold $\delta$ ($\delta \in (0, 1)$), which separates high-quality from low-quality semantic images. The weight threshold is used to judge the motion intensity, so that the optical flow frames are divided into two object sets. When $\omega_{o_i} \geq \delta$, the optical flow frame $o_i$ is assigned to the positive domain:

$$POS_{\delta}(O) = \{o_i \in O \mid \omega_{o_i} \geq \delta\}$$

When $\omega_{o_i} < \delta$, the optical flow frame $o_i$ is assigned to the negative domain:

$$NEG_{\delta}(O) = \{o_i \in O \mid \omega_{o_i} < \delta\}$$

Through the above method, the optical flow frames with high semantic change can be distinguished from those with low semantic change, so that the high semantic quality optical flow frames are in the set $POS_{\delta}(O)$ and the low semantic quality optical flow frames are in the set $NEG_{\delta}(O)$.
With the average motion intensity weights $\omega_{o_i}$ ($i = 1, 2, \ldots, t - 1$) and the threshold $\delta$ ($\delta \in (0, 1)$), the optical flow frames are divided into a set $POS_{\delta}(O)$ and a set $NEG_{\delta}(O)$. The essential purpose is to distinguish images with small morphological changes between neighboring video frames from images with large morphological changes. This method uses the first frame as the semantic benchmark to mine the morphological semantic quality of the time series. Correspondingly, the video frames $V$ can be divided according to the optical flow sets $POS_{\delta}(O)$ and $NEG_{\delta}(O)$ as follows: when $o_i \in POS_{\delta}(O)$, the video frame $v_{i+1}$ is divided into the high semantic quality domain, recorded as $v_{i+1} \in POS_{\delta}(V)$; when $o_i \in NEG_{\delta}(O)$, the video frame $v_{i+1}$ is divided into the low semantic quality domain, recorded as $v_{i+1} \in NEG_{\delta}(V)$. At this point, every video frame $v_i \in V$ has been assigned a semantic quality with respect to $V$, and the optical-flow-weighted image semantic quality filtering algorithm is complete, which lays a foundation for the establishment of the morphological semantic database in Section 4. The pipeline of the OFF2BD algorithm is illustrated in Algorithm 1.
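The two-branch partition can be sketched as follows. This is an illustrative sketch rather than the authors' Algorithm 1: the function names `off2bd` and `frames_from_flows` are hypothetical, the weights are assumed to be already normalized to [0, 1], and two details are assumptions made here: the positive domain uses `>=` (mirroring the rule stated for β in the three-way case), and the first frame is always kept as the semantic reference.

```python
def off2bd(weights, delta):
    """Split optical flow indices into POS and NEG with a single threshold.

    weights[j] is the normalized weight of the flow between frames j and j + 1
    (0-based), so the paper's o_i corresponds to index j = i - 1.
    """
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    pos = [j for j, w in enumerate(weights) if w >= delta]  # assumed: >= delta
    neg = [j for j, w in enumerate(weights) if w < delta]
    return pos, neg


def frames_from_flows(flow_indices):
    """Map each flow to its later frame; frame 0 is kept as reference (assumption)."""
    return [0] + [j + 1 for j in flow_indices]


weights = [0.1, 0.7, 0.9, 0.3]
pos, neg = off2bd(weights, delta=0.5)  # pos = [1, 2], neg = [0, 3]
kept_frames = frames_from_flows(pos)   # [0, 2, 3]
```

Every flow lands in exactly one of the two sets, so the decision is final after a single pass; there is no deferred category.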

Optical Flows Filtering Algorithm Based on Three-Way Decisions
As the most important extension of two-branch decisions, three-way decisions [43] are an important granular computing method derived from rough set theory in machine learning. The theory was first proposed by the granular computing expert Yao [44], on the basis of rough set theory and time research, for solving uncertain problems. Compared with traditional two-branch decisions, which require an iterative incremental solution to resolve the uncertainty of a problem, three-way decisions add a delayed-decision option for cases where an accurate positive or negative decision cannot be made. The three-way decision strategy not only reasonably explains the division into the three decision domains (positive domain, negative domain and boundary domain) in rough set theory, but also reduces time consumption by abandoning the incremental decision-making strategy of two-branch decisions (i.e., constantly adding new knowledge until the uncertain part becomes certain). In dealing with uncertain problems, the three-way decision method, developed from the two-branch decision method, is closer to habitual human thinking. This approach is more representative of the problem-handling theory of contradictory relation and contrary relation problems. Therefore, to improve the semantic relationship between video frames and the quality of feature extraction, the quality thresholds $\alpha$ ($\alpha \in (0, 1)$) and $\beta$ ($\beta \in (0, 1) \wedge \beta > \alpha$) of the video frames are defined for optical flow filtering.
The optical flows can then be divided into three object sets. When $\omega_{o_i} \geq \beta$, the optical flow frame $o_i$ is assigned to the positive domain, denoted as

$$POS_{(\alpha,\beta)}(O) = \{o_i \in O \mid \omega_{o_i} \geq \beta\}$$

When $\alpha \leq \omega_{o_i} < \beta$, the optical flow frame $o_i$ is assigned to the boundary domain, denoted as

$$BND_{(\alpha,\beta)}(O) = \{o_i \in O \mid \alpha \leq \omega_{o_i} < \beta\}$$

When $\omega_{o_i} < \alpha$, the optical flow frame $o_i$ is assigned to the negative domain, denoted as

$$NEG_{(\alpha,\beta)}(O) = \{o_i \in O \mid \omega_{o_i} < \alpha\}$$

The three-way decision method for optical flow quality division is defined by the threshold pair $(\alpha, \beta)$, which is usually set so that $0 < \alpha < \beta < 1$. Compared with the two-branch decision filtering method, this method adds the uncertainty domain $BND_{(\alpha,\beta)}(O)$ as a buffer for frames of uncertain semantic quality. The $BND_{(\alpha,\beta)}(O)$ domain and the $POS_{(\alpha,\beta)}(O)$ domain are then jointly used as the update domain, and iterative optical flow filtering is performed until convergence is reached or a fixed number of iterations has been completed.
At the same time, the first frame of the image is used as a semantic reference, combined with the converged high-quality optical flow set, to divide the collection of video frames. Correspondingly, the video frames $V$ can be divided according to the optical flow sets $POS_{(\alpha,\beta)}(O)$, $BND_{(\alpha,\beta)}(O)$ and $NEG_{(\alpha,\beta)}(O)$ as follows: when $o_i \in POS_{(\alpha,\beta)}(O)$, the video frame $v_{i+1}$ is divided into the high semantic quality domain, recorded as $v_{i+1} \in POS_{(\alpha,\beta)}(V)$; when $o_i \in BND_{(\alpha,\beta)}(O)$, the video frame $v_{i+1}$ is divided into the fuzzy semantic quality domain, recorded as $v_{i+1} \in BND_{(\alpha,\beta)}(V)$; when $o_i \in NEG_{(\alpha,\beta)}(O)$, the video frame $v_{i+1}$ is divided into the low semantic quality domain, recorded as $v_{i+1} \in NEG_{(\alpha,\beta)}(V)$. When the specified number of filtering iterations or convergence is reached, the $POS_{(\alpha,\beta)}(V)$ and $BND_{(\alpha,\beta)}(V)$ video frame sets are jointly used to update the target video frame set $V$, and the video frames are reordered to obtain a high-quality semantic video sequence. The pipeline of the OFF3WD algorithm is illustrated in Algorithm 2.
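A single pass of the three-way partition can be sketched as follows. This is a simplified illustration, not the full OFF3WD procedure: the names `off3wd` and `update_domain` are hypothetical, the weights are assumed to be already normalized, and the iterative re-filtering of the update domain is omitted because the text does not specify how the weights are recomputed between iterations.

```python
def off3wd(weights, alpha, beta):
    """Split optical flow indices into POS, BND and NEG with two thresholds."""
    if not 0.0 < alpha < beta < 1.0:
        raise ValueError("thresholds must satisfy 0 < alpha < beta < 1")
    pos, bnd, neg = [], [], []
    for j, w in enumerate(weights):
        if w >= beta:
            pos.append(j)   # accept: high semantic quality
        elif w < alpha:
            neg.append(j)   # reject: low semantic quality
        else:
            bnd.append(j)   # defer: alpha <= w < beta
    return pos, bnd, neg


def update_domain(pos, bnd):
    """POS and BND together form the set carried into the next filtering round."""
    return sorted(pos + bnd)


weights = [0.1, 0.7, 0.9, 0.3, 0.5]
pos, bnd, neg = off3wd(weights, alpha=0.3, beta=0.7)
# pos = [1, 2], bnd = [3, 4], neg = [0]
retained = update_domain(pos, bnd)  # [1, 2, 3, 4]
```

Compared with a single threshold, only the flows below α are discarded at once; the flows between α and β survive into the update domain, which is what makes the decision deferred rather than final.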
Algorithm 2 Optical flows filtering algorithm based on three-way decision (OFF3WD)
26. Obtain the video frame set according to the completed iteration condition or the converged optical flow set;
27. Output the high-quality semantic image set.

Feature Extraction
In the field of micro-expression recognition, Local Binary Pattern (LBP) [22] is a common texture descriptor that encodes local pixels in an image. The original LBP operator is defined within a 3 × 3 window: the center pixel of the window is taken as the threshold, and the gray values of the 8 adjacent pixels are compared with it in a clockwise direction. If a surrounding pixel value is greater than or equal to the center pixel value, its position is marked as 1; otherwise it is marked as 0. In this way, the 8 pixels in the 3 × 3 neighborhood produce an 8-bit binary number (usually converted to a decimal number, the LBP code, with 256 possible values). This is the LBP value of the pixel at the center of the window, and it reflects the texture information of the area. The histogram of these codes over an image yields a 256-dimensional texture feature. The formula is shown in Equation (5).

$$LBP(x_c, y_c) = \sum_{p=0}^{7} 2^{p}\, S(i_p - i_c) \tag{5}$$

where $(x_c, y_c)$ represents the central pixel, $i_c$ is its gray value, $i_p$ is the gray value of the adjacent point, and $S(x)$ is a function that conforms to Equation (6).

$$S(x) = \begin{cases} 1, & x \geq 0 \\ 0, & x < 0 \end{cases} \tag{6}$$

According to the above formulas, the binary extraction process of the face picture is shown in Figure 2.
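The 3 × 3 LBP operator described above can be sketched as follows. The starting neighbor (top-left) and the assignment of bit p to the p-th clockwise neighbor are assumptions of this sketch, since the text only states that the comparison proceeds clockwise; the names `lbp_code` and `lbp_histogram` are hypothetical.

```python
# Clockwise neighbor offsets (row, col), starting at the top-left corner.
# The starting point and bit order are assumptions of this sketch.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image, r, c):
    """8-bit LBP code of the pixel at (r, c): bit p is 1 if neighbor p >= center."""
    center = image[r][c]
    code = 0
    for p, (dr, dc) in enumerate(OFFSETS):
        if image[r + dr][c + dc] >= center:
            code |= 1 << p
    return code


def lbp_histogram(image):
    """256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            hist[lbp_code(image, r, c)] += 1
    return hist


window = [
    [6, 5, 2],
    [7, 6, 1],
    [9, 8, 7],
]
code = lbp_code(window, 1, 1)  # bits 0, 4, 5, 6, 7 are set: 1 + 16 + 32 + 64 + 128 = 241
hist = lbp_histogram(window)
```

A different starting neighbor or bit order permutes the codes but leaves the descriptor equally informative, as long as the same convention is used for every image.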

To handle dynamic features, the LBP method was extended to the dynamic feature extraction method LBPTOP (Local Binary Pattern from Three Orthogonal Planes). LBPTOP overcomes the limitation that LBP can only extract two-dimensional features, and can be applied to feature extraction in three dimensions. It extracts LBP features from the spatial plane XY and the spatial-temporal planes XT and YT respectively, and then concatenates them to form spatial-temporal features.
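The idea of computing LBP on three orthogonal planes and concatenating the histograms can be sketched as follows. This is a deliberately simplified version and not the full LBPTOP operator: it uses a radius-1 square neighborhood with 8 neighbors on every plane, no block division, no uniform-pattern mapping and no normalization. The names `plane_code` and `lbp_top` and the `volume[t][y][x]` layout are assumptions of this sketch.

```python
# Clockwise neighbor offsets on a 2-D plane (assumed order, as in basic LBP).
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def plane_code(get, a, b):
    """LBP code at (a, b) on a plane whose pixels are read through get(a, b)."""
    center = get(a, b)
    code = 0
    for p, (da, db) in enumerate(OFFSETS):
        if get(a + da, b + db) >= center:
            code |= 1 << p
    return code


def lbp_top(volume):
    """Concatenate the XY, XT and YT LBP histograms of volume[t][y][x] (768 bins)."""
    n_t, n_y, n_x = len(volume), len(volume[0]), len(volume[0][0])
    hists = [[0] * 256 for _ in range(3)]
    for t in range(1, n_t - 1):
        for y in range(1, n_y - 1):
            for x in range(1, n_x - 1):
                xy = plane_code(lambda a, b: volume[t][a][b], y, x)  # spatial plane
                xt = plane_code(lambda a, b: volume[a][y][b], t, x)  # time vs. x
                yt = plane_code(lambda a, b: volume[a][b][x], t, y)  # time vs. y
                hists[0][xy] += 1
                hists[1][xt] += 1
                hists[2][yt] += 1
    return hists[0] + hists[1] + hists[2]


# Constant 3 x 3 x 3 volume: the single interior voxel gets code 255 on each plane.
volume = [[[5] * 3 for _ in range(3)] for _ in range(3)]
feature = lbp_top(volume)
```

The XY histogram captures appearance, while the XT and YT histograms capture how intensities change over time along each spatial axis, which is why the three are concatenated rather than summed.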
We use the LBPTOP, LBPSIP, STCLQP and STLBPIP feature descriptors to encode the optical-flow-filtered video and express its temporal and spatial information. These descriptors are all based on, or improve upon, the LBPTOP method, and enhance features on the spatial plane XY and the spatial-temporal planes XT and YT. The feature histograms of the spatial-temporal video $V$ after two-branch-decision optical flow filtering are obtained by applying $H_{\delta}$ to $V_{\delta}$, where $V_{\delta}$ represents the set of high-quality semantic frame spatiotemporal sequences obtained after the convergence of the OFF2BD algorithm, and $H_{\delta}$ denotes the function that extracts the feature histograms from $V_{\delta}$.
To further improve the morphological semantic quality of the spatial-temporal video $V$, the feature histograms based on OFF3WD are obtained by applying $H_{\alpha,\beta}$ to $V_{\alpha,\beta}$, where $V_{\alpha,\beta}$ represents the set of high-quality semantic frame spatiotemporal sequences obtained after the convergence of the OFF3WD algorithm, and $H_{\alpha,\beta}$ denotes the function that extracts the feature histograms from $V_{\alpha,\beta}$.

Micro-Expression Recognition
For classic micro-expression feature recognition, Whitehill et al. [45] compared classifiers and found that SVM achieves relatively good results on micro-expression recognition compared with other traditional machine learning algorithms. FHOFO [37] uses linear SVM, 3-point KNN and LDA to compare performance across different micro-expression datasets; the experimental results show that, compared with the other classifiers, linear SVM has good generalization ability and stability. For evaluation, the leave-one-subject-out (LOSO) method is used to train and test the classifier, and the validation covers all samples in the dataset. Given a dataset with n subjects, LOSO validation runs n iterations; in each iteration the classifier is trained on the samples of n − 1 subjects and tested on the samples of the remaining subject. Because no subject appears in both the training and the test set, this strategy reflects the effect of the experimental method more intuitively and accurately. This evaluation protocol is therefore widely used in machine vision.
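The leave-one-subject-out protocol can be sketched as a split generator in which each iteration holds out every clip of one subject. This is a minimal sketch that only produces index splits; the function name `loso_splits` and the subject labels are hypothetical, and training the classifier on each split is not shown.

```python
from collections import defaultdict


def loso_splits(subjects):
    """Yield (held_out_subject, train_indices, test_indices), one split per subject.

    subjects[k] is the subject identifier of sample k.
    """
    by_subject = defaultdict(list)
    for idx, subject in enumerate(subjects):
        by_subject[subject].append(idx)
    for held_out, test_idx in by_subject.items():
        train_idx = [i for i, s in enumerate(subjects) if s != held_out]
        yield held_out, train_idx, test_idx


# Six clips from three subjects give three train/test splits.
subjects = ["s01", "s01", "s02", "s03", "s03", "s03"]
splits = list(loso_splits(subjects))
first_subject, first_train, first_test = splits[0]
# first_subject = "s01", first_train = [2, 3, 4, 5], first_test = [0, 1]
```

Overall accuracy is usually computed by pooling the predictions of all held-out subjects, so that subjects with more clips contribute proportionally more test samples.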

Experiments and Discussion
In this paper, the SMIC and CASME II datasets are used for the optical flow filtering experiments. All experiments were run on a PC with Windows 10, an Intel i7-8750H CPU and 16 GB of memory, using Python 3.7 (with PyCharm Professional 2019.3 as the IDE) and MATLAB 2016a.

Datasets
Two of the most popular spontaneous micro-expression datasets, CASME II and SMIC, are used to evaluate the performance of the proposed method. The dataset statistics are summarized in Table 2.
SMIC [3]: There are three versions of the SMIC data: a high-speed camera (HS) version at 100 fps, an ordinary camera (VIS) version at 25 fps, and a near-infrared camera (NIR) version at 25 fps. HS cameras were used to record all data, while VIS and NIR cameras were used to record data from the last eight subjects. Emotion classes are based only on participants' self-reports.
CASME II [4]: CASME II contains 247 micro-expression video clips from 26 subjects. All samples were recorded by high-speed cameras at 200 fps with a resolution of 640 × 480 pixels. The samples are divided into five categories: Happiness (32 samples), Surprise (25 samples), Disgust (64 samples), Repression (27 samples) and Others (99 samples). Unlike SMIC, CASME II provides AU labels that follow the Facial Action Coding System (FACS).

Experimental Setting
This section verifies the adaptability and performance of the proposed method through comparison and analysis experiments. MTCNN detects faces more accurately than traditional face alignment methods. Therefore, MTCNN is used to locate faces in the SMIC and CASME II images, and the face region is cropped according to the final bounding box, giving an average face image size of 300 × 250.
Meanwhile, two optical flow filtering algorithms (OFF2BD and OFF3WD) are used to reduce the impact of low-quality semantic images. The methods OFF2BD-LBPTOP, OFF3WD-LBPTOP, OFF2BD-LBPSIP, OFF3WD-LBPSIP, OFF2BD-STCLQP, OFF3WD-STCLQP, OFF2BD-STLBPIP and OFF3WD-STLBPIP are compared with the classic micro-expression recognition methods LBP-TOP, LBP-SIP, STCLQP and STLBP-IP. To verify the effectiveness of the proposed optical flow filtering algorithms for micro-expression recognition, several groups of experiments are designed as follows:
• Sensitivity of the parameter θ to the normalization function ω.
• The confusion matrix of recognition accuracies for each micro-expression on the CASME II and SMIC datasets.
Choices of Parameter θ: The cropped face images are resized to 300 × 250 for SMIC and CASME II. A parameter θ is used in the computation of the normalized weight to adjust the influence of the optical flow magnitudes on the index. Computed over the CASME II and SMIC images, the per-pixel magnitude of each optical flow lies in the interval $[3 \times 10^{-4}, 3.5 \times 10^{-3}]$. The setting range of θ is (0, 1), and its relationship with the normalized weight is shown in Figure 3. When θ is set to small values, e.g., θ < 0.005, the normalized weight increases steadily as the per-pixel optical flow magnitude increases, and the difference between the weights of high-quality and low-quality semantic optical flows is evenly expanded. When θ is set to larger values, the gap between the weights of high-quality and low-quality semantic optical flows narrows, which directly affects the subsequent optical flow weight filtering. Empirically, we suggest setting θ in the interval $[5 \times 10^{-4}, 0.01]$. With θ = 0.001, a weight interval with a more uniform distribution is obtained. Therefore, this paper chooses θ = 0.001 as the weight coefficient parameter of the experiments.
Performance Evaluation Index: Accuracy is a classic measure of model performance in machine learning, and most micro-expression recognition models are currently evaluated by recognition accuracy. It is defined as:
$$\mathrm{Accuracy} = \frac{n^{*}}{n^{*} + m^{*}} \tag{9}$$
where $n^{*}$ is the number of micro-expressions predicted correctly by the model, $m^{*}$ is the number of micro-expressions predicted incorrectly by the model, and $n^{*} + m^{*}$ is the total number of micro-expressions in the dataset.
The F1-score is a statistical indicator used to measure the accuracy of a binary classification model; it takes into account both the precision p and the recall r of the classifier. For the i-th micro-expression class, let $N_{TP}^{i}$ be the number of true positives (samples of the i-th class predicted as the i-th class), $N_{FP}^{i}$ the number of false positives (samples of other classes predicted as the i-th class), and $N_{FN}^{i}$ the number of false negatives (samples of the i-th class predicted as another class). The precision $p_i$ and the recall $r_i$ are calculated as follows:
$$p_i = \frac{N_{TP}^{i}}{N_{TP}^{i} + N_{FP}^{i}} \tag{10}$$
$$r_i = \frac{N_{TP}^{i}}{N_{TP}^{i} + N_{FN}^{i}} \tag{11}$$
where $N_{TP}^{i} + N_{FP}^{i}$ is the number of samples predicted as the i-th micro-expression class, and $N_{TP}^{i} + N_{FN}^{i}$ is the total number of real samples of the i-th micro-expression class.
From the precision and recall of the $N_{ME}$ micro-expression classes, the F1-score is defined as:
$$F1 = \frac{1}{N_{ME}} \sum_{i=1}^{N_{ME}} \frac{2\, p_i\, r_i}{p_i + r_i} \tag{12}$$
where $p_i$ and $r_i$ are the precision and recall of the i-th micro-expression class. The values of Accuracy and F1-score lie in the range [0, 1], and the larger the value, the better the recognition performance of the model for micro-expressions.
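The evaluation indices above can be illustrated with a short Python sketch. It assumes the standard definitions of true positives, false positives and false negatives, and it assumes that the F1-score is the unweighted mean of the per-class F1 values over the classes present in the ground truth; the label lists are hypothetical toy data, not results from the paper.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class equals the true class."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def precision_recall(y_true, y_pred, label):
    """Precision and recall of one class, with 0.0 when a denominator is zero."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    labels = sorted(set(y_true))
    total = 0.0
    for label in labels:
        p, r = precision_recall(y_true, y_pred, label)
        total += 2 * p * r / (p + r) if p + r else 0.0
    return total / len(labels)


# Hypothetical predictions for six clips with SMIC-style class names.
y_true = ["positive", "positive", "negative", "negative", "surprise", "surprise"]
y_pred = ["positive", "negative", "negative", "negative", "surprise", "positive"]

acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
print(f"accuracy = {acc:.4f}, macro F1 = {f1:.4f}")
```

In this toy case four of six clips are classified correctly, and the per-class F1 values are 0.5 ("positive"), 0.8 ("negative") and about 0.667 ("surprise").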

Experiments and Analysis
The LOSO method is used to reproduce the LBP-TOP baseline on the CASME II and SMIC data, and the optimized parameter values are selected as the basis for tuning the OFF2BD and OFF3WD algorithms. The video clips are divided into N × N × 2 blocks. The recognition results are shown in Table 3. The best accuracy and F1-score on CASME II are achieved with 7 × 7 blocks and radii $(R_x, R_y, R_t) = (1, 1, 4)$ for the spatial plane (XY) and the spatial-temporal planes (XT and YT), whereas the best accuracy and F1-score on SMIC are achieved with 5 × 5 blocks and $(R_x, R_y, R_t) = (2, 2, 3)$.
In this part, CASME II and SMIC are each divided into train and test sets multiple times with different specifications. The improvement in micro-expression recognition obtained under various thresholds δ is compared across models and datasets, and the robustness and generalization of the OFF2BD algorithm under different thresholds are verified by comparing the curves of the test and train sets. LBPTOP serves as the benchmark method against which OFF2BD-LBPTOP is compared. The improvement rate of the improved algorithm over the baseline algorithm is used as the evaluation index, and a two-dimensional curve of improvement rate against threshold is fitted from many experiments under different thresholds.
In this experiment, the CASME II and SMIC data are divided into multiple subsets of different sizes, and the fitted results for the training and test data of each subset are shown in Figure 4. Overall, the test data show performance curves similar to those of the training data; in other words, the OFF2BD model generalizes well. In detail, when the threshold is at 0.5 ± 0.1, the optical flow processing of OFF2BD gives a good performance improvement: SMIC peaks around 0.43 and CASME II peaks around 0.5. When the threshold is in (0.2, 0.4), recognition performance gradually increases with the optical flow threshold, and when the threshold exceeds 0.55, recognition performance gradually decreases. When the threshold is too high, a large number of video clips are eliminated and some high-quality clips are classified into the negative domain, which leads to negative improvement rates compared with the baseline. In summary, the OFF2BD results indicate that the two-branch decision method is suitable for handling the uncertainty of image semantic quality.
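The effect of the threshold on how much material survives filtering can be illustrated with a minimal sketch of a two-branch decision. It assumes that each frame already has a normalized optical flow weight in [0, 1] and that frames at or above the threshold δ go to the positive domain; the function name, the boundary convention and the weights are hypothetical and are not taken from the OFF2BD definition.

```python
def two_branch_filter(weights, delta):
    """Split frame indices into a positive (kept) and a negative (discarded) domain."""
    pos = [i for i, w in enumerate(weights) if w >= delta]
    neg = [i for i, w in enumerate(weights) if w < delta]
    return pos, neg


# Hypothetical normalized optical flow weights for six frames.
frame_weights = [0.12, 0.48, 0.55, 0.31, 0.77, 0.05]

results = {}
for delta in (0.2, 0.5, 0.8):
    results[delta] = two_branch_filter(frame_weights, delta)
    pos, neg = results[delta]
    print(f"delta={delta}: kept {pos}, discarded {neg}")
```

With the largest threshold in this toy example every frame falls into the negative domain, which mirrors the observation that an overly high threshold also discards high-quality material.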

Sensitivity of the Threshold (α, β) to OFF3WD
The analysis of the OFF2BD results shows that performance improves significantly for thresholds in (0.2, 0.4) and gradually decreases after 0.55. The uncertainty domain BND is mainly used to handle data whose quality cannot be clearly judged as low or high; it acts as a buffer domain in which the judgment is deferred until further information is combined. Based on this experience, the experiment chooses thresholds α ∈ (0.2, 0.4) and β ∈ (0.5, 0.8) on SMIC. A three-dimensional surface is fitted to the results generated by the OFF3WD algorithm, as shown in Figure 5.
α and β are increased by a specified step length, and the range of the uncertainty domain is adjusted to select optical flow by intensity quality. As shown in Figure 5, the ME recognition rate under the OFF3WD algorithm changes as the threshold range changes, and the best improvement is obtained when α is around 0.35 and β is around 0.6. With these thresholds, the OFF3WD algorithm also improves ME recognition on CASME II compared with the benchmark method.
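A three-way partition with thresholds α and β can be sketched as follows. The sketch assumes that α is the lower and β the upper threshold on a normalized per-frame weight, with inclusive boundaries; how the frames in BND are eventually resolved is not shown, and the function name and weights are hypothetical.

```python
def three_way_partition(weights, alpha, beta):
    """Assign each frame index to the POS, BND or NEG domain."""
    if not alpha < beta:
        raise ValueError("alpha must be smaller than beta")
    regions = {"POS": [], "BND": [], "NEG": []}
    for index, weight in enumerate(weights):
        if weight >= beta:
            regions["POS"].append(index)   # clearly high-quality motion
        elif weight <= alpha:
            regions["NEG"].append(index)   # clearly low-quality motion
        else:
            regions["BND"].append(index)   # decision deferred
    return regions


# Hypothetical normalized optical flow weights for six frames.
frame_weights = [0.12, 0.48, 0.55, 0.31, 0.77, 0.05]
regions = three_way_partition(frame_weights, alpha=0.35, beta=0.6)
print(regions)
```

Widening the gap between α and β enlarges BND, so fewer frames are accepted or rejected outright.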

Comparison with the State-of-the-Art Methods
LBPTOP was reproduced in this experiment and combined with the OFF2BD and OFF3WD algorithms to obtain ME recognition performance scores. The experimental results are shown in Figure 6. On the test data, OFF2BD-LBPTOP and OFF3WD-LBPTOP perform significantly better than the traditional LBPTOP method. Compared with the benchmark method, OFF2BD improves accuracy on CASME II and SMIC by about 2-3%. OFF3WD judges the quality of optical flow semantics more accurately and improves accuracy by a further 0.5-2% over OFF2BD. Overall, the experimental data preliminarily show that the OFF2BD and OFF3WD algorithms can effectively mine the semantic information of micro-expressions and filter out image noise with low morphological semantic quality.
This section compares the derivative algorithms (OFF2BD-LBPSIP, OFF3WD-LBPSIP, OFF2BD-STCLQP, OFF3WD-STCLQP, OFF2BD-STLBPIP, OFF3WD-STLBPIP) with the original algorithms (LBP-SIP, STCLQP, STLBP-IP) to further verify the improvement that the two filtering algorithms bring to ME recognition. Two evaluation criteria (F1-score and Accuracy) are used. The experimental results are shown in Table 4. The proposed method achieves recognition performance comparable with the state-of-the-art methods on SMIC and CASME II. Compared with LBPSIP, OFF2BD and OFF3WD increase the accuracy on CASME II from 49.73% to 52.77% and 53.45%, respectively, a gain of 3-4 percentage points, and improve the F1-score from 0.47 to 0.51. On SMIC, accuracy improves by 2.94% and 4.15%, and the F1-score improves from 0.46 to 0.5. Compared with STCLQP and STLBPIP, the two algorithms improve accuracy on CASME II by 2.48%, 2.89% and 2.06%, 1.02%, respectively, and on SMIC by 0.82%, 1.39% and 1.79%, 3.90%.
Among the features tested, the two algorithms achieve their highest recognition accuracy and F1-score on STCLQP features. In particular, after three-way decision optical flow filtering, the recognition accuracy of STCLQP reaches 65.41% on SMIC and 61.28% on CASME II, with higher accuracy and F1-score than state-of-the-art methods such as BI-WOOF and FHOFO. The micro-expression features obtained by the algorithm also improve the F1-score.
To further evaluate how well the method identifies different micro-expressions, we apply the optical flow filtering algorithms to STCLQP features and use LOSO cross-validation to generate the confusion matrices with the best recognition accuracy. Figures 7 and 8 show the confusion matrices of OFF2BD-STCLQP and OFF3WD-STCLQP on SMIC and CASME II, respectively.
On the task of micro-expression detection, STCLQP can easily separate most micro-expression videos from non-micro-expression videos, and the OFF2BD and OFF3WD algorithms select high-quality semantic images, which further strengthens the feature expression ability of STCLQP. For micro-expression recognition on the SMIC database, OFF2BD improves the recognition accuracy of the "positive" and "negative" classes compared with STCLQP, while the recognition of "surprise" remains stable. OFF3WD further enhances the morphological and semantic features of the "negative" class and raises its recognition accuracy to 0.7. The CASME II database is more complex and diverse, containing five categories. The confusion matrices for CASME II show that the two proposed algorithms generally increase the recognition and expression ability of STCLQP. The best improvements are in identifying "happiness" and "disgust", and there are also minor improvements for "repression", "surprise" and "other". Among the five classes, "repression" and "disgust" are the most difficult to identify, while "surprise", "happiness" and "other" are easier to identify. The algorithms improve the morphological semantic information of the different classes, enhance the expressiveness of their features, and reduce the probability that samples of "happiness", "surprise", "disgust" and "repression" are wrongly classified as "other".
Figure 8. The confusion matrix of OFF3WD on SMIC and CASME II.

Conclusions
We propose optical flow filtering algorithms for micro-expression recognition (OFF2BD and OFF3WD) based on two-branch decisions and three-way decisions. The approach integrates the three-way decision method of rough set theory, adopts its three-way division of uncertain cases, and improves the conditions for judging the POS, BND and NEG domains. It grades the morphological semantic quality of temporal images by redefining the weight of optical flow, yielding a more robust, higher-quality collection of video images. Both algorithms improve the results of the classic feature extraction methods. With STCLQP features, the best recognition rate and F1-score in our tests are achieved on the SMIC dataset. Optical flow filtering strengthens the feature intensity information along the time axis and eliminates images with low morphological semantic quality.
For future work, we will pay more attention to the changes in facial action areas, and combine the idea of the OFF3WD algorithm to conduct semantic mining on the XY plane and the time axis T, reducing the impact of noise in different dimensions. In addition, beyond testing the method with different features, we are designing ways to connect different classifiers with our algorithm.