A Novel Approach to Face Pattern Analysis

Recognizing facial expressions is a major challenge and will be required in emerging fields of research such as the industrial Internet of Things. Currently available methods are useful for detecting single facial images, but the facial features are very hard to extract. The main aim of face detection is to capture an image in real time and search for it in the available dataset. Using this biometric feature, one can recognize and verify a person by their facial features. Many researchers have used Principal Component Analysis (PCA), Support Vector Machine (SVM), a combination of PCA and SVM, PCA with an Artificial Neural Network (ANN), and even the traditional PCA-SVM to improve face recognition. PCA-SVM is better than PCA-ANN, as PCA-ANN is limited to small datasets. As far as classification and generalization are concerned, SVM requires fewer parameters and generates fewer generalization errors than an ANN. In this paper, we propose a new framework, called FRS-DCT-SVM, that uses GA-RBF for face detection and optimization and the discrete cosine transform (DCT) to extract features. FRS-DCT-SVM using GA-RBF gives better results in terms of clustering time. The average accuracy achieved by FRS-DCT-SVM using GA-RBF is 98.346%, which is better than that of PCA-SVM and SVM-DCT (86.668% and 96.098%, respectively). In addition, a comparison is made based on the training, testing, and classification times.


Introduction
A facial recognition system (FRS) is a biometric concept based on the facial features of a human being. A dataset of images is used and, based on the facial features in the images, a person is recognized. An FRS is a type of visual recognition system. Feature extraction and classification form the basis of all facial recognition systems, where statistical and geometrical approaches are used to perform feature extraction. Various methods for facial recognition are discussed in [1,2]. Images containing faces are vital for intelligent vision-based human-computer interaction, and research efforts in face processing include facial recognition, face tracking, pose estimation, and expression recognition. Many techniques have been reported for detecting and localizing the faces in an image or an image sequence.

Facial Recognition System Principle
An FRS starts from an input image (captured by a camera) in 2D or 3D. This input image is then compared with the available images in the database by analyzing the input mathematically [3]. Facial recognition is used in several use cases, such as a second authentication factor, access to a mobile application, access to buildings, access to locked devices, and payment methods. Figure 1 describes the step-by-step procedure of the facial recognition system. In stage I, the image is input, and preprocessing is done. In stage II, face detection and face normalization are done. In stage III a, images are stored in the database after normalization, and in stage III b feature extraction is performed. Then, in stage IV, a comparator is used to obtain features from the database and produce results.

Image Pre-Processing
Image pre-processing is the step taken before the training of the model and is used to speed up the detection process and minimize false positives [4,5]. It reduces the effects of noise, color intensity, and background, and compensates for differences in illumination. Some basic pre-processing procedures include face detection and cropping, image resizing, image normalization, de-noising, and filtering [6].

Face Detection
This step determines whether a captured image or video frame shows a human face.

Face Normalization
Face normalization minimizes redundant information and the effect of irrelevant elements, such as the background, hair, and clothes, to improve the recognition process. In [7], the authors proposed a technique for face normalization in which the geometry and brightness of faces are normalized to improve efficiency. Normalization ensures that the data distribution is similar across all input parameters. It can be done by subtracting the per-channel mean from each pixel.
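The per-channel mean subtraction mentioned above can be sketched as follows. This is a minimal illustration and not the implementation used in [7]; the function name `subtract_channel_mean`, the nested-list image layout, and the toy pixel values are assumptions made for the example.

```python
def subtract_channel_mean(image):
    """Center an image by subtracting the mean of each color channel.

    image: list of rows, each row a list of (r, g, b) tuples (assumed layout).
    Returns the centered image and the list of per-channel means.
    """
    n_pixels = sum(len(row) for row in image)
    n_channels = len(image[0][0])
    means = [
        sum(px[c] for row in image for px in row) / n_pixels
        for c in range(n_channels)
    ]
    centered = [
        [tuple(px[c] - means[c] for c in range(n_channels)) for px in row]
        for row in image
    ]
    return centered, means


# Hypothetical 2 x 2 RGB image used only to exercise the function.
image = [[(10, 20, 30), (30, 40, 50)],
         [(50, 60, 70), (70, 80, 90)]]
centered, means = subtract_channel_mean(image)
print(means)           # per-channel means
print(centered[0][0])  # first pixel after centering
```

After centering, every channel has zero mean, so images taken under different overall brightness become more directly comparable.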

Feature Extraction
Feature extraction is a dimensionality reduction process in which the input data are divided and reduced into manageable groups. Some of the features of an image are edges, corners, interest points, regions, ridges, shapes, and confidence. Some of the traditional feature detection techniques are Harris corner detection [7], the Shi-Tomasi corner detector [8], Scale-Invariant Feature Transform (SIFT) [9], Speeded-Up Robust Features (SURF) [10], Features from Accelerated Segment Test (FAST) [11], Binary Robust Independent Elementary Features (BRIEF) [12], and Oriented FAST and Rotated BRIEF (ORB) [13]. Others are learning-based, such as SuperPoint [14], D2-Net [15], and LF-Net [16], and PCA [17] and LDA [18] are also used. Figure 2 presents a detailed description of issues in face detection and feature extraction.

Recognition Result
Image recognition is used for identity verification purposes and to identify objects, places, people, and actions in images. Trained algorithms are used for the recognition process so that some hidden representations of features can be analyzed and applied for different objectives such as classification.
The basic structure of a facial recognition system and issues related to face detection and feature extraction have been discussed. The steps of pre-processing, face detection, face normalization, feature extraction, and the use of a comparator have been explained in detail.

Literature Review
There is a need to develop linear feature extraction algorithms for face identification and detection under various parameters of face recognition (FR) to improve FR algorithms with respect to space and time complexity and accuracy. Additionally, features need to be extracted from the magnitude and phase components of the image in the frequency domain. The traditional and intelligent techniques for object (face and eye) detection and face identification need to be compared, and a suitable classifier for the extracted features needs to be selected.
For classification, SVM requires fewer parameters than an ANN. SVM minimizes the generalization error and avoids the overfitting problem. To improve on these factors, we propose a new algorithm.
Taleb et al. [1] discuss access control using facial detection with PCA-LDA algorithms. In this study, an access control mechanism for vehicle parking is proposed, which works based on a camera installed at the parking area's entryway. The camera recognizes the driver's face, which is then matched with stored images, and a decision is made as to whether this driver is authorized to park in this parking area. For this, the Viola-Jones method is used for face detection and, with their proposed method, the authors detected variations in the pose, which was a severe issue at the time. In this study, the authors used PCA, which will not provide sensible results if the principal components are not linear combinations. For a labeled facial recognition dataset, we can use linear discriminant analysis (LDA), which is used to handle classification. PCA requires the variance of the data after dimensionality reduction to be as large as possible, so that the data are spread as widely as possible, while LDA requires the variance within the same class after projection to be as small as possible and the variance between classes to be as large as possible. This means that LDA performs supervised dimensionality reduction and should use the label information to separate the classes as much as possible.
Choi et al. [2] proposed a method based on discriminant analysis that provides a composite feature vector to recognize a face. First, holistic and local features are extracted from the image by using discriminative features. A comparison is made between the proposed composite feature method and other methods such as holistic, local, and hybrid methods. The proposed technique displayed better facial recognition than using only the holistic or local features. For many 2D training sets, SVM can find a set of straight lines that classify the training data correctly [2]. Because the amount of data in the training set is limited, samples outside the training set may lie closer to the dividing line than the data in the training set. So, we should pick the line farthest from the nearest data points, namely the support vectors. This is the limitation of using SVM.
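The idea of picking the separating line farthest from the nearest data points can be illustrated with a small sketch. It is not an SVM solver and is not taken from [2]: it only scores a few hand-picked candidate lines by their geometric margin. The function name `margin`, the toy points, and the candidate lines are all hypothetical.

```python
import math


def margin(w, b, points, labels):
    """Smallest signed distance from any labeled point to the line w.x + b = 0.

    A positive value means every point lies on the correct side of the line.
    """
    norm = math.hypot(w[0], w[1])
    return min(
        y * (w[0] * x[0] + w[1] * x[1] + b) / norm
        for x, y in zip(points, labels)
    )


# Toy 2D training set: two samples per class, labels in {-1, +1}.
points = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
labels = [-1, -1, 1, 1]

# Three candidate vertical lines (x = 1.5, x = 2.0, x = 2.5); all of them
# classify the training data correctly, but with different margins.
candidates = [((1.0, 0.0), -1.5), ((1.0, 0.0), -2.0), ((1.0, 0.0), -2.5)]

best = max(candidates, key=lambda wb: margin(wb[0], wb[1], points, labels))
print(best, margin(best[0], best[1], points, labels))
```

In this toy setup the middle line wins because it is equally far from the closest sample of each class; those closest samples play the role of the support vectors.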
Khan et al. [3] proposed an algorithm for face detection based on a Convolutional Neural Network (CNN). For validation purposes, the authors developed a student classroom attendance management system using face recognition. The authors used the LFW dataset for the training of the model. This system was able to detect 35 students and recognize 30 students out of 40 students in the image. An accuracy of 97.9 was obtained by using testing data. In this study, facial recognition was applied over a classroom for marking the students' class attendance, where the features were fixed for the classroom.
Peng et al. [4] reviewed different methods and algorithms developed and used for face detection. The authors discussed the early stages of the development of the PCA and LDA algorithms. SVM, AdaBoost, small samples, and neural networks were discussed for classification. The authors focused on facial recognition under real conditions, for which they used deep learning. Ganidisastra and Bandung [5] developed an online examination portal in which students take examinations online under proctoring. The proposed online examination proctoring model will not work if the lighting in the area is inadequate or if some postures cannot be identified; in that case, the student receives a notification informing them that they have engaged in malpractice and will be blocked.
This system was designed to monitor students during a test. The system should prevent malpractice and be able to verify that the student taking the examination is a verified student. For this, the authors used CNN-FR. The problem that arises during facial recognition in different postures/poses is variation in the lighting. So, other authors have used image equalization and SURF to address this lighting issue. Here, the authors proposed an incremental training process that reduces the computation cost and time. YOLO-Face, MTCNN, LBP, and the Haar-cascade face detector were compared for accuracy, and the FaceNet model was tested. This deep-learning-based face detector overcomes the limitations of other available methods and achieved an accuracy of 98%.
Real-time video-based facial recognition is also available [19], where the attendance of students can be managed by the recognition of faces and, by experimental analysis, researchers obtained an accuracy of 82%. For self-learning models, a new optimized radial basis function (RBF) neural network algorithm based on the Genetic Algorithm (GA-RBF algorithm) [20] is used. The GA-RBF [21] algorithm is used to reduce the inputs to the RBF network [22], and then training and simulation of the model are conducted. Hammouch et al. [23] used four feature extraction approaches based on the Discrete Cosine Transform (DCT) to extract features from a digital handwritten document, and the results were compared with traditional PCA. For the COVID-19 pandemic situation, Pushpalatha et al. [24] proposed a human action recognition system to identify a person. The proposed system can be used for the surveillance of COVID-19 wards for patient identification.
PCA [25] and LDA [26] are two commonly used algorithms for feature fusion, human activity recognition [27,28], and feature extraction [29]. PCA extracts features based on their similarity within a class and the dissimilarity of a particular individual from other classes, using a covariance matrix that is computed from all images of the training set. LDA extracts the features of a particular individual from within the class itself in order to discriminate among individuals. This algorithm maximizes the ratio of between-class variance to within-class variance in any particular object set to maximize the separability. A hybrid approach was also proposed in which a combination of probabilistic neural networks (PNNs) and improved kernel linear discriminant analysis (IKLDA) is used for facial recognition. The proposed hybrid approach achieved an accuracy of 97.22 on the ORL dataset.
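The ratio of between-class variance to within-class variance that LDA maximizes can be made concrete with a two-class, one-dimensional sketch. The function name `fisher_ratio` and the toy data are illustrative assumptions; a full LDA would search over projection directions rather than compare two fixed ones.

```python
def fisher_ratio(class_a, class_b):
    """Two-class Fisher criterion for 1D projected samples.

    Returns (difference of class means)^2 divided by the summed
    within-class scatter. Larger values mean better separability.
    """
    mean_a = sum(class_a) / len(class_a)
    mean_b = sum(class_b) / len(class_b)
    between = (mean_a - mean_b) ** 2
    within = (sum((v - mean_a) ** 2 for v in class_a)
              + sum((v - mean_b) ** 2 for v in class_b))
    return between / within


# Hypothetical 2D samples for two individuals (classes).
class_a = [(1.0, 1.0), (2.0, 3.0)]
class_b = [(4.0, 1.0), (5.0, 3.0)]

# Project onto the x-axis and onto the y-axis and compare the criterion.
ratio_x = fisher_ratio([p[0] for p in class_a], [p[0] for p in class_b])
ratio_y = fisher_ratio([p[1] for p in class_a], [p[1] for p in class_b])
print(ratio_x, ratio_y)
```

Here the x-axis projection separates the two classes while the y-axis projection does not, so a criterion of this kind would prefer the x direction.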
Cook et al. [30] studied how demographic factors, such as gender and age, affect performance and classification. They evaluated 11 commercial biometric systems of the U.S. Department of Homeland Security in 2018. Each of the 11 systems was tested with 363 subjects in a controlled environment. A commercial algorithm was used to calculate the efficiency and accuracy over the dataset. Prior work has shown that different biometric algorithms produce different results across demographic categories and found that skin phenotypes are the most informative factor from this perspective. In the proposed method, all of the work was done automatically, and the relative facial skin reflectance of subjects can be measured easily by linear modeling. It was observed that the overall accuracy of the systems is inversely related to the size of the skin reflectance effect; i.e., when the accuracy is high, the effect size is at a minimum.
Yadav et al. [31] used the color details of images to exploit skin, face, or eye color by applying a color converter algorithm to remove the background and other unnecessary details from images in order to detect/identify objects. Kalbkhani et al. [32] converted an RGB image [33,34] into the YCbCr color space using a nonlinear transformation and used an eye mapping algorithm based on a created face mask to detect the location of eyes on faces or the face itself in an image. YCbCr is a color space in which Y is the luma component and Cb and Cr are the blue-difference and red-difference chroma components, respectively.
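To make the roles of the Y, Cb, and Cr components concrete, the sketch below converts a single RGB pixel using the widely used full-range JFIF/BT.601 coefficients. This is an assumption made for illustration: it is the standard affine conversion, not the nonlinear transformation described in [32], and the function name `rgb_to_ycbcr` is hypothetical.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to full-range YCbCr (JFIF/BT.601 weights).

    Y is the luma; Cb and Cr are the blue-difference and red-difference
    chroma components, offset by 128 so that neutral colors map to 128.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


white = rgb_to_ycbcr(255, 255, 255)
black = rgb_to_ycbcr(0, 0, 0)
print(white, black)
```

White and black differ only in luma and share the neutral chroma value, which is why skin and eye regions are often thresholded in the Cb/Cr plane rather than in RGB.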
The authors of [35] located the eye coordinates by separating skin color and eye color details in the HSL color space. However, color conversion techniques are time consuming and are not suitable for real-time applications. Color images carry more facial detail in the color space, which affects the performance and accuracy. However, color images require more time and memory for image analysis than gray-level images. Object detection algorithms have limitations with respect to distinguishing an iris from closed eyes or eyes behind glasses, so some researchers used these algorithms to detect the whole eye instead of the iris in order to identify and locate closed eyes.
The low-pass filter of horizontal details of the Haar Discrete Wavelet Transform (DWT) is applied on sub-blocks. It provides more information about the eye than high-pass filtering. Then, PCA and LDA are applied for feature extraction on the low-pass filter of horizontal details of the wavelet transform. Testing for eye detection is performed on the ORL and Yale face databases. The authors observed that the error on translation using the wavelet transform (WT) combined with PCA is less than that using WT without PCA. Additionally, the scaling aberration of WT + PCA is less than that of wavelet coefficients. Shape-based eye detection models provide better object detection/tracing in real-time applications. However, they are sensitive to various angle orientations, degradation, and noisy images. The PCA algorithm has been implemented on GMM data to extract significant features, which reduce the time and space dimensionality of images as well as the accuracy of eye, nose, and mouth detection.
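A one-level Haar decomposition along the horizontal direction can be sketched as below. This is a simplified illustration under stated assumptions (even row length, orthonormal Haar filters, a hypothetical function name `haar_rows`); it is not the sub-block pipeline of the cited work.

```python
import math


def haar_rows(image):
    """One-level orthonormal Haar transform applied along each row.

    Returns (low, high): the low-pass (average) and high-pass (difference)
    sub-bands, each with half as many columns as the input.
    Assumes every row has an even number of samples.
    """
    s = math.sqrt(2.0)
    low, high = [], []
    for row in image:
        pairs = list(zip(row[0::2], row[1::2]))
        low.append([(a + b) / s for a, b in pairs])
        high.append([(a - b) / s for a, b in pairs])
    return low, high


# Hypothetical single-row gray-level patch.
low, high = haar_rows([[4.0, 4.0, 2.0, 0.0]])
print(low, high)
```

The low-pass band keeps a smoothed, half-resolution copy of the patch, while the high-pass band is zero wherever neighboring pixels are equal, which is consistent with the low-pass band carrying most of the information about a region such as the eye.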

Proposed Model
The decision boundary is the critical issue in SVM algorithms, and a radial basis function is a function whose value changes with the distance from a given location. The ORL and YALE face dataset was used, which contains 400 images of 40 different subjects. Validation was used in this experimental work: 80% of the dataset was used for training and the remaining 20% was used for validation.
For some of the subjects, factors such as time, lighting, and facial expressions (smile/non-smile, open eyes/closed eyes), and facial details, such as whether the subject wears glasses, differ between images. All images are in a frontal position, and the background is dark and homogeneous. The size of each image is 92 × 112 pixels, and each pixel has 256 grey levels. Each image in the training and testing sets was partitioned into blocks of equal size. Subsequently, the DCT coefficients were computed for each block. The obtained coefficients were converted into feature vectors. Then, the feature vectors of the training set were used to train the radial basis function-based SVM. In this step, the RBF kernel parameter of the SVM was tuned by conventional operations. The Discrete Cosine Transform (DCT) is a strong transform for extracting features in facial recognition. After applying the DCT to all the images, feature vectors were constructed based on zonal masking of the coefficients. Optimization was done through GA-RBF. When there was a requirement to compensate for illumination variations, the available low-frequency coefficients were discarded.
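The block DCT and zonal masking steps can be sketched as follows. The sketch assumes an orthonormal DCT-II (other scalings differ only by constant factors), a triangular zone that keeps coefficients with u + v below a threshold, and an optional flag that drops only the DC term as a simple stand-in for discarding low-frequency coefficients. The names `dct2` and `zonal_features`, the zone size, and the flat test block are hypothetical choices, not the authors' settings.

```python
import math


def dct2(block):
    """Orthonormal 2D DCT-II of a square block given as a list of rows."""
    n = len(block)

    def alpha(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = 0.0
            for i in range(n):
                for j in range(n):
                    total += (block[i][j]
                              * math.cos(math.pi * (2 * i + 1) * u / (2 * n))
                              * math.cos(math.pi * (2 * j + 1) * v / (2 * n)))
            out[u][v] = alpha(u) * alpha(v) * total
    return out


def zonal_features(coeffs, zone, drop_dc=False):
    """Keep coefficients in the triangular zone u + v < zone.

    With drop_dc=True the DC coefficient (u = v = 0) is removed, a simple
    assumed form of illumination compensation.
    """
    n = len(coeffs)
    feats = [coeffs[u][v] for u in range(n) for v in range(n) if u + v < zone]
    return feats[1:] if drop_dc else feats


# A flat 8 x 8 block: all of its energy should land in the DC coefficient.
flat = [[10.0] * 8 for _ in range(8)]
coeffs = dct2(flat)
features = zonal_features(coeffs, zone=3)
print(len(features), round(coeffs[0][0], 6))
```

Concatenating the per-block feature lists over all blocks of an image would give one feature vector per face, which is the kind of input an RBF-based SVM can then be trained on.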
The step-by-step procedure of the proposed model, shown in Figure 3, is as follows:
1. Divide the dataset into training and testing datasets.
2. Perform feature extraction using DCT, for which DCT coefficients need to be calculated for all training set images and normalized.
3. Calculate the RBF-SVM function by $K(x_i, x_j) = \exp\left(-\frac{\lVert x_i - x_j \rVert^2}{2\sigma^2}\right)$, where $K$ is the kernel function, $x_i$ and $x_j$ are training samples, and $\sigma$ is the model parameter that decides the decision boundary. RBF-SVM classifies benign from malignant cases.
4. Define the objective function, whose input is the RBF hyperparameters and whose output is a test score. Then, use the Genetic Algorithm for optimization (GA-RBF). It is an adaptive system: it automatically changes its structure, design, and connection weights without human intervention, which makes it possible to combine a Genetic Algorithm with the RBF kernel parameters.
5. Evaluate a fitness function, where accuracy is considered at the optimized parameter value, taking several generations to find the optimized value.
6. Results.

Figure 4 shows the RBF network topology, in which the hidden layer uses an RBF as its activation function. Figure 5 shows the entire procedure of the Genetic Algorithm, and Figure 6 explains the automatic establishment and adjustment of the network and its connection weights, where human intervention is not required. In this way, the Genetic Algorithm is mapped accurately onto the neural network.

In Figure 4, the network has m inputs and n outputs, the hidden layer contains s neurons, and connection weights link the input layer to the hidden layer and the hidden layer to the output layer. A threshold value is associated with the hidden layer. The input of the hidden layer is calculated by Equation (1), the final output by Equation (3), and the error function by Equation (4) from the final actual output of the network.

The input face image is first transformed from the spatial domain to the frequency domain. Different basic transforms can be used, such as the Discrete Wavelet Transform (DWT) and DCT. DCT [9] is used for feature extraction because of its energy compaction property. The 2D DCT is a separable operator that works on 8 × 8 pixel blocks, as in [1]. The 2D DCT is used with the assumption that the data array has finite rectangular support on $[0, N_1 - 1] \times [0, N_2 - 1]$. The 2D DCT is given as [10]

$$X_C(k_1, k_2) \triangleq \sum_{n_1=0}^{N_1-1} \sum_{n_2=0}^{N_2-1} 4\, x(n_1, n_2) \cos\frac{\pi k_1 (2 n_1 + 1)}{2 N_1} \cos\frac{\pi k_2 (2 n_2 + 1)}{2 N_2},$$

where $(k_1, k_2) \in [0, N_1 - 1] \times [0, N_2 - 1]$; otherwise, $X_C(k_1, k_2) \triangleq 0$. For 8 × 8 blocks, $(N_1 - 1, N_2 - 1) = (7, 7)$.
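The Gaussian RBF kernel and a Genetic Algorithm search over its width can be sketched as below. This is a minimal illustration under stated assumptions and not the authors' GA-RBF implementation: the function names `rbf_kernel` and `ga_search`, the truncation selection, arithmetic crossover, Gaussian mutation, population size, and number of generations are all hypothetical choices. The toy fitness function stands in for the validation score of a trained RBF-SVM, which would be the objective in practice.

```python
import math
import random


def rbf_kernel(x, y, sigma):
    """Gaussian RBF kernel exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def ga_search(fitness, low, high, pop_size=20, generations=30, seed=0):
    """Tiny real-valued Genetic Algorithm maximizing fitness on [low, high]."""
    rng = random.Random(seed)
    pop = [rng.uniform(low, high) for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: keep the fitter half of the population unchanged.
        parents = sorted(pop, key=fitness, reverse=True)[:pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = 0.5 * (a + b)                        # arithmetic crossover
            child += rng.gauss(0.0, 0.05 * (high - low))  # Gaussian mutation
            children.append(min(high, max(low, child)))   # clip to the bounds
        pop = parents + children
    return max(pop, key=fitness)


def toy_fitness(sigma):
    """Stand-in objective with a single peak at sigma = 2 (assumption)."""
    return -(sigma - 2.0) ** 2


best_sigma = ga_search(toy_fitness, low=0.1, high=10.0)
print(best_sigma, rbf_kernel([0.0, 0.0], [1.0, 1.0], best_sigma))
```

Because the fitter half of each generation is carried over unchanged, the best candidate never gets worse from one generation to the next; replacing `toy_fitness` with a function that trains the classifier and returns its validation accuracy would turn this into a hyperparameter search of the kind described in steps 4 and 5.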

Result Analysis
Frontal face images were taken from the ORL and YALE face database for our experimental setup, and the number of images varied from 10 to 40. Ten different poses were selected per subject, with up to 40 unique subjects. The lighting factor was fixed for all upright frontal images. The fixed size taken for all images was 112 × 94 pixels. For testing, the parameters were set with a block size of 8. Table 1 shows the experimental setup. Tables 2 and 3 contain the practical results based on parameters such as the number of faces, accuracy, training time, testing time, and classification time for the PCA-SVM, DCT-SVM using GA-RBF, and SVM-DCT models. Table 4 contains a comparison between the proposed and other deep learning methods.

Table 4. Comparison between the proposed and other deep learning methods.

Methods                                                        Accuracy
FKNN [36]                                                      87%
LBPH, KNN, and BPNN [37]                                       98%
Deep-learning-based face recognition attendance system [38]   95.02%
Deep learning using OpenCV [39]                                91.7%
Proposed                                                       98.17%

Five different experiments were conducted by varying the number of faces from 10 to 40. The proposed model was compared with existing models by using several experimental parameters, including accuracy, training time, testing time, and classification time. The number of samples per face was set to 6 for all five experiments.
The accuracy of the models is presented in Figure 7. An average accuracy of 98.346 was achieved by DCT-SVM using GA-RBF, which is better than the other two models, as 86.668 was achieved by PCA-SVM and 96.098 was achieved by SVM-DCT. A comparison of training time and classification time was also made. The average training time of DCT-SVM using GA-RBF is 2.621, as presented in Figure 8, which is better than that of the DCT-SVM and PCA-SVM models. The average classification time for DCT-SVM using GA-RBF is 2.86, as shown in Figure 9, which is better than the 3.49 of PCA-SVM and the 2.94 of DCT-SVM.
The proposed model is better than PCA-SVM and DCT-SVM in terms of accuracy, training time, and classification time. The accuracy of the proposed model increased as the number of faces increased. The training time for the proposed model was at its minimum, 1.2, when the number of faces was 20. The classification time for the proposed model was also at its minimum, 1.422, when the number of faces was 20. It was observed that the proposed model gave the best result when the number of faces was 30.
In [40], the authors proposed a global expansion ACNN and achieved an accuracy of 91.67% by using the ORL dataset. Chen et al. [41] used a combination of a CNN and SVM and achieved an accuracy of 97.50% by using the ORL dataset.
Our proposed model, FRS-DCT-SVM using GA-RBF, achieved an average accuracy of 98.346%, which is better than those of the ACNN and CNN+SVM.

Conclusions and Future Work
In this paper, we proposed a novel face detection algorithm that provides better results than the DCT-SVM and PCA-SVM models. It uses a combination of DCT and SVM and uses the Genetic Algorithm for optimization. After dividing the dataset into two pools for training and testing, DCT was used for feature extraction so that normalized DCT coefficients of the ORL and YALE face dataset images could be used to train the proposed algorithm. Furthermore, the RBF-SVM function was used to classify benign from malignant cases, and the Genetic Algorithm was used for optimization. As a result, the proposed model provides higher accuracy, takes less time for training, and requires less time for classification than the DCT-SVM and PCA-SVM models. Experimental results are presented and were compared with results from other models. An internal comparison for the proposed model was also made by varying the number of face images and measuring parameters such as accuracy, training time, testing time, and classification time. Future work could include enhancing the proposed model so that the accuracy in the case where the number of faces is 20 can also be increased, as it is lower than in the cases where the number of faces is 10, 25, 30, and 40. The testing time for the proposed model is longer than that for DCT-SVM, so we will try to minimize it and create our own dataset of facial images.
The proposed algorithm was applied to databases that are limited in terms of their size and type. In the future, it may be applied to large databases and noisy images. Moreover, we only considered the factors of lighting, pose, illumination, and expression in the database. We may additionally incorporate age and gender factors. To develop a secure framework, the proposed model could be used together with other biometric modalities, such as iris, fingerprint, and retina.