1. Introduction
Since its beginnings in the 1970s, research on face recognition has grown rapidly, particularly following the advances in ICT and in digital photography and scanning technologies from the 1990s onward [1]. A Face Recognition System (FRS) uses automated technologies to check faces against stored and verified features. At the most fundamental level, an FRS estimates the similarity between two faces (i.e., between two images of faces) to verify a person’s identity: it takes a photo of the person seeking verification and compares it against a pre-verified stored (library) image. In practical applications, this rudimentary overview expands into numerous phases of varying complexity. When an FRS is confronted by a face, it must first detect that the target object is indeed a face. During this basic detection process (e.g., using algorithmic calculations based on major facial characteristics and the physical distances between them within the facial topography), the FRS can make initial attempts to identify the particular face by checking whether it matches any known (stored) faces in accessible databases [2].
FRS has diverse applications in many fields requiring identity verification and person location, such as finding missing children or criminals in crowds, enabling employee access to appropriate areas within a workplace, and calling up customer or patient records from governmental or healthcare databases [3,4]. An FRS usually involves detection and pre-processing of the face image, followed by feature extraction and then face recognition [5].
In the initial phase, the face image is typically enhanced to eliminate noise (i.e., superfluous data), in order to home in on the precise facial features that are useful for identification [5]. Extracted features can be either local (e.g., the mouth, nose, and eyes) or global (such as the facial topography and the relative locations of local features) [6,7]. The extracted features are subsequently categorized using machine learning classifiers, such as Artificial Neural Network (ANN), K-nearest neighbour, and Support Vector Machine (SVM) [8,9].
Facial recognition can struggle to define effective and discriminative facial descriptors due to great variability in illumination, pose, expression, image resolution, and partial occlusion, among other potential issues [10,11,12]. Most modern FRSs deploy single features, whereas complex and multifarious face recognition tasks require more modalities of facial features to comprehensively classify facial image data [10,11,12]. Consequently, improvements in information fusion during feature extraction and processing are needed for industrial FRS solutions.
Face recognition data fusion is executed at the feature or decision level [13,14]. Feature-level methods combine multiple feature sets into a unified, fused set, which is subsequently processed with a standard classifier. Decision-level techniques combine several feature-specific classifiers to generate a stronger aggregate classifier [10,15,16]. Fusion at the feature level has simple training requirements, since a single learning phase suffices for the combined feature vector, and it can detect and exploit correlations among multiple features at an early stage. However, the fused features must be in a uniform format [14,16].
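The difference between the two fusion levels can be illustrated with a minimal sketch. The function names, the min-max normalization (one possible way of bringing features into a uniform format), and the score-averaging rule are illustrative assumptions and are not taken from the cited works.

```python
def min_max_normalize(vector):
    """Rescale a feature vector to [0, 1] so fused features share a uniform format."""
    low, high = min(vector), max(vector)
    if high == low:
        return [0.0 for _ in vector]
    return [(v - low) / (high - low) for v in vector]


def feature_level_fusion(global_features, local_features):
    """Feature-level fusion: normalize each set, then join them into one vector."""
    return min_max_normalize(global_features) + min_max_normalize(local_features)


def decision_level_fusion(score_lists):
    """Decision-level fusion: average the per-class scores of several classifiers."""
    n_classifiers = len(score_lists)
    n_classes = len(score_lists[0])
    return [sum(scores[c] for scores in score_lists) / n_classifiers
            for c in range(n_classes)]


# Hypothetical toy inputs: one global and one local feature set.
fused_vector = feature_level_fusion([2.0, 4.0, 6.0], [10.0, 30.0])

# Hypothetical per-class scores from two separate classifiers.
fused_scores = decision_level_fusion([[0.2, 0.8], [0.4, 0.6]])
predicted_class = max(range(len(fused_scores)), key=fused_scores.__getitem__)
```

In the feature-level case a single classifier is trained on `fused_vector`, whereas in the decision-level case each classifier is trained separately and only the scores are merged.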
Building on such techniques, this research presents an FFLFRM rooted in the Multi-Resolution Discrete Cosine Transform (MDCT) technique for feature fusion. The MDCT technique was selected for feature fusion due to its effectiveness in enhancing the resolution of the fused image [17]. Local and global facial features were extracted with Local Binary Pattern (LBP) and Principal Component Analysis (PCA), respectively, and then fused via MDCT. Subsequently, the fused feature vectors were classified using a Multi-Layer Perceptron (MLP) ANN. The developed FFLFRM was then evaluated with 10,000 greyscale images sourced from the Olivetti Research Laboratory (ORL) facial image database. Model performance was compared with that of three advanced facial recognition models based on the fusion techniques of Covariance Intersection (CI), Frequency Partition (FP), and Laplacian Pyramid (LP). The comparison considered the issues of expression, illumination, low image resolution, occlusion, and pose.
This study demonstrates the coherent integration of local and global methods of feature extraction to generate sound conclusions. The particular contributions of this study are to propose an MDCT fusion-based FFLFRM; evaluate its performance with MLP ANN in terms of expression, illumination, low resolution of face images, occlusion, and pose with 10,000 facial images from the ORL database; and compare the developed model’s performance with that of three advanced fusion models based on CI, FP, and LP [18,19,20]. The proposed MDCT fusion-based FFLFRM can have a significant impact on real-time face recognition applications, as it is able to analyse more sophisticated facial characteristics and categorize images/faces according to higher-order identity markers. In addition, it can be applied in many fields requiring identity verification and person location, such as finding missing children or criminals in crowds and calling up customer or patient records from governmental or healthcare databases.
The following section reviews literature related to techniques and models for facial recognition. Section 3 presents the FFLFRM developed in this study, followed by a presentation and discussion of the experimental results in Section 4. Finally, Section 5 concludes the paper and identifies areas for future research.
2. Literature Review
The integration of data from multiple images into a single image is the crux of image fusion [21]. The resultant image ought to incorporate more detailed information and be more useful to observers than the original images. In facial recognition, fusion can take place at the decision level or the feature level [13]. Decision-level image fusion has been explored by many studies, whereby individual classifiers produce scores for individual local features extracted from images [10,22,23,24,25,26,27,28,29,30], and final decisions are made by integrating these scores [14,16]. This usually entails the combination of the classifiers’ output scores [14,16]. LBP, Gabor, and pixel scores were fused in [10], with post-processing normalization. The same local descriptors were used in [31] to fuse various LDA-based one-shot similarity scores and were deployed with further Gabor features by Wolf et al. [32]. Hellinger, ranking-based, one-shot, and two-shot distances are demonstrably capable of attaining highly efficient classification [14].
Feature-level fusion begins by collating the extracted features into a single feature vector, which is subsequently passed to the classifier [14]. The LBP technique was generalized for texture classification in [33] with variances and pixel intensities derived from local patches. Extensive experimentation with three sophisticated texture datasets (KTHTIPS2b, Outex, and CUReT) revealed that the developed model achieved the best classification for KTHTIPS2b data, with outcomes comparable to advanced existing solutions for CUReT, due to including LBP variants within a joint histogram approach.
In [34], a novel image set-matching method was developed comprising strong facial region descriptors based on local features, numerous exemplar and sub-space metrics for comparing related facial regions, and joint learning of the more discriminative facial regions while ascertaining the optimal weights for combining the metrics. Experiments on the MOBIO, LFW, and PIE face datasets determined that the algorithm significantly outperformed comparator techniques, including the Local Principal Angle and the Kernel Affine Hull techniques.
In [35], a novel descriptor for re-identification purposes was developed, utilizing advanced Fisher vectors with a basic per-pixel attribute vector (including pixel coordinates and intensity), and was evaluated on two person re-identification benchmarks, ETHZ and VIPeR. To obtain a global image representation, the local descriptors were turned into Fisher Vectors before pooling. The resulting local descriptors encoded by the Fisher Vector (LDFV) attained impressive effectiveness on the studied datasets.
Yuan et al. [36] used Local Phase Quantization (LPQ) and LBP to devise an FRS whereby the facial image is segmented into different zones; these are subjected to LBP operator analysis for feature detection, while the LPQ operator determines the related frequency-domain features. This LBP-LPQ hybrid FRS yields an improved feature vector for face description. Experiments with the AR and YALE face databases demonstrated that the hybrid method has better facial recognition accuracy than the individual methods.
In [37], a new method using Local Ternary Patterns (LTP) and LBP descriptors for facial image representation was developed, with a feature similarity selection and classification algorithm to improve recognition. The facial image is initially divided into smaller zones from which LTP and LBP histograms are drawn; these are subsequently collated within a single feature vector. Experimental testing with the ORL database and the Extended Yale Face Database B affirmed the impressive performance of the algorithm.
Gu and Liu [38] developed a new LBP feature extraction method with encoded features and texture data, defined by Gabor wavelet features, edges, and colour features. The process begins by extracting feature pixels from the target image to form a binary image; a distance-vector field is then generated by calculating the distance vector between each pixel and the nearest feature pixels within the binary image. Experiments using eye detection with the FERET and BioID datasets revealed the suitability of the technique (FLBP), which achieved more accurate localization of the eye centre than the other tested models.
Li [39] obtained local SIFT and LBP features from densely sampled, multi-scale image patches. A Gaussian mixture model (GMM) was trained connecting each feature with the corresponding position, to capture the spatial distribution of all the facial images in the training set. To verify facial identity, an SVM was trained on the vector of differences between all feature pairs to determine whether the tracked faces matched. The authors proposed a joint Bayesian adaptation technique to adapt the general (universally trained) GMM and to model pose differences in target faces, which consistently improved face verification accuracy. They demonstrated that their method significantly outperformed alternative models on the most restricted protocol of the YouTube video face dataset and on the Labeled Faces in the Wild (LFW) dataset.
Vu’s [40] novel facial image description technique, Patterns of Oriented Edge Magnitudes (POEM), considers links between the orientations and gradient magnitudes of numerous local image structures. The whitened PCA dimensionality reduction method was applied to POEM- and POD-based images to attain more compact and discriminative face descriptors. An experimental investigation was conducted with numerous common benchmarks, including frontal and non-frontal images from the LFW and FERET datasets. The outcomes indicated that POEM achieved higher efficiency than alternative methods, with greater simplicity and more powerful performance.
Tan and Triggs [10] developed an FRS with feature-level fusion by extracting two feature sets with Gabor wavelet descriptors and LBP local appearance. The joined feature vector was tested with the Kernel Discriminative Common Vector technique to indicate discriminant non-linear recognition features. Performance outcomes were tested with numerous face databases, including FERET, FRGC 1.0.4, and FRGC 2.0.4.
Mirza [41] explored the fusion of global and local features for gender classification. LBP was utilized to extract local features, supported by two-dimensional Discrete Cosine Transform (DCT), while global feature extraction used PCA and DCT. The suggested system was tested extensively using the FERET dataset, for which it achieved a recognition efficiency of 98.16%.
Yan et al. [42,43,44,45,46] suggested a Multi-feature Fusion and Decomposition (MFD) model based on multi-head attention and a backbone network for age-invariant face recognition. The proposed model reduces intra-class variation by learning discriminative, efficient, and robust features. Experiments with the CACD and CACD-VS databases demonstrated that the suggested model has better facial recognition accuracy than state-of-the-art models.
Nusir [18] developed an FRS using the FP method and feature-level fusion to collate local and global features with LBP and PCA. Experimental work on facial images from the ORL database revealed that the developed method achieved improved face recognition efficiency and was more powerful than individual LBP- and PCA-based methods.
Qing Guo et al. [47] integrated the expectation-maximization (EM) algorithm with the covariance intersection (CI) principle in a new image fusion approach that accounts for cross-correlation between data sources, thereby producing accurate and consistent estimates through the use of convex combinations. In practical applications, covariance information is usually unknown; EM helps by providing a maximum likelihood estimate (MLE) of the covariance matrix.
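For orientation, the convex combination underlying the CI principle can be written in its standard textbook form. The symbols below are notational assumptions rather than the notation of [47]: a and b are two estimates with covariances P_aa and P_bb, c is the fused estimate with covariance P_cc, and ω is a mixing weight.

```latex
\begin{aligned}
P_{cc}^{-1} &= \omega\, P_{aa}^{-1} + (1-\omega)\, P_{bb}^{-1}, \\
c &= P_{cc}\left[\omega\, P_{aa}^{-1} a + (1-\omega)\, P_{bb}^{-1} b\right], \qquad \omega \in [0, 1].
\end{aligned}
```

The weight ω is typically chosen to minimize a size measure of P_cc, such as its trace or determinant.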
Al-Shatnawi et al. [19] used the Laplacian Pyramid (LP) for facial recognition with feature-level fusion, combining local and global features extracted with LBP and PCA. Testing on ORL database facial images with an MLP NN revealed that the developed model produced more efficient face recognition results than models based on LBP or PCA alone under challenging facial expression, illumination, and occlusion conditions. El-Bashir et al. [20] also produced a feature-level fusion-based face recognition model, based on the CI technique, which they experimentally evaluated using ORL database facial images with an MLP NN; they likewise reported improved effectiveness for their model.
3. Proposed Model (FFLFRM)
This study proposes an MDCT fusion-based FFLFRM to detect faces and facial features, extract and fuse features, and classify faces with MLP ANN (Figure 1). Initial face detection based on core features (e.g., the eyes) uses the Haar-cascade face detection technique. Local and global features are then extracted using LBP and PCA, respectively, and fused with MDCT. Subsequently, the fused feature vectors are input to the MLP ANN for facial classification. The proposed FFLFRM architecture is displayed in Figure 1, and its steps are described below.
3.1. Face Detection with Haar-Cascade
The Haar-cascade face detection method identifies the core facial features (i.e., mouth, nose, and eyes) based on appearance [2,12,48]. It relies on Haar-like features rather than on direct pixel analysis [49]. Haar-like features are built from adjacent rectangular regions that represent the target features [49]. Extraction typically deploys integral image methods, Adaptive Boosting (AdaBoost), and an attentional cascade [50]. This research deployed Wang’s [50] modified Viola-Jones Haar-cascade face detection system to detect four face feature patches: mouth, nose, eyes, and face (general).
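To illustrate why integral images make Haar-like features cheap to evaluate, the following minimal sketch computes a summed-area table and a horizontal two-rectangle feature. It is a generic illustration and not the modified detector of [50]; the function names and the toy 4 × 4 patch are assumptions.

```python
def integral_image(image):
    """Return a summed-area table padded with one extra row and column of zeros."""
    rows, cols = len(image), len(image[0])
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        row_sum = 0
        for c in range(cols):
            row_sum += image[r][c]
            table[r + 1][c + 1] = table[r][c + 1] + row_sum
    return table


def rect_sum(table, top, left, height, width):
    """Sum of the pixels inside a rectangle, using only four table lookups."""
    bottom, right = top + height, left + width
    return (table[bottom][right] - table[top][right]
            - table[bottom][left] + table[top][left])


def two_rect_haar_feature(table, top, left, height, width):
    """Left-half sum minus right-half sum of a horizontal two-rectangle feature."""
    half = width // 2
    left_sum = rect_sum(table, top, left, height, half)
    right_sum = rect_sum(table, top, left + half, height, half)
    return left_sum - right_sum


# Hypothetical patch with a bright left half and a dark right half (a vertical edge).
patch = [
    [9, 9, 1, 1],
    [9, 9, 1, 1],
    [9, 9, 1, 1],
    [9, 9, 1, 1],
]
table = integral_image(patch)
edge_response = two_rect_haar_feature(table, 0, 0, 4, 4)
```

A large absolute response indicates a strong intensity contrast between the two rectangles; AdaBoost then selects the most discriminative of many such features.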
3.2. Global Facial Feature Extraction with Principal Component Analysis (PCA)
Feature extraction is the most essential stage in an FRS and determines its ultimate effectiveness. It seeks to represent the facial image efficiently through characteristics (features) extracted globally or locally, here using PCA and LBP, prior to MDCT fusion of the features [12,51]. PCA, a traditional statistical linear transform, is widely used for pattern recognition [52] and face recognition [53]. In this model, PCA statistically extracts the global features, which are then fused via MDCT with the locally extracted features [54]. PCA is commonly used for feature extraction to limit image feature dimensionality. It begins by determining the mean of the data matrix, then calculates the covariance, eigenvalues, and eigenvectors [55]. PCA identifies the directions of maximal variance in the studied data, which define the low-dimensional space (PCA space (W)) used to transform the data (X = {x1, x2, …, xN}) from a higher- to a lower-dimensional space, where N is the number of samples and xi denotes the i-th observation, sample, or pattern [56].
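The PCA steps listed above (mean, covariance, eigenvectors, projection) can be sketched in a few lines. This is a minimal illustration under stated assumptions: it recovers only the leading principal component and uses power iteration in place of a full eigendecomposition, and the toy two-dimensional samples are hypothetical.

```python
import math


def mean_center(samples):
    """Subtract the per-dimension mean from every sample."""
    n, dim = len(samples), len(samples[0])
    mean = [sum(s[d] for s in samples) / n for d in range(dim)]
    centered = [[s[d] - mean[d] for d in range(dim)] for s in samples]
    return centered, mean


def covariance_matrix(centered):
    """Sample covariance matrix of mean-centered data."""
    n, dim = len(centered), len(centered[0])
    return [[sum(s[i] * s[j] for s in centered) / (n - 1) for j in range(dim)]
            for i in range(dim)]


def leading_eigenvector(matrix, iterations=200):
    """Power iteration: repeatedly multiply by the matrix and renormalize."""
    dim = len(matrix)
    vector = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iterations):
        product = [sum(matrix[i][j] * vector[j] for j in range(dim))
                   for i in range(dim)]
        norm = math.sqrt(sum(p * p for p in product))
        vector = [p / norm for p in product]
    return vector


def project(samples, mean, component):
    """Project each sample onto one principal component (2D to 1D here)."""
    return [sum((s[d] - mean[d]) * component[d] for d in range(len(mean)))
            for s in samples]


# Hypothetical samples whose variance lies almost entirely along the first axis.
samples = [[2.0, 0.0], [4.0, 0.1], [6.0, -0.1], [8.0, 0.0]]
centered, mean = mean_center(samples)
component = leading_eigenvector(covariance_matrix(centered))
projections = project(samples, mean, component)
```

In a face recognition setting each sample would be a flattened face image, and several leading components would be retained rather than one.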
3.3. Local Facial Feature Extraction with LBP
Texture analysis is the underlying technique of LBP local statistical face feature extraction [57], whose locally extracted features are fused via MDCT with the global features. The central pixel of each 3 × 3 pixel block serves as the threshold against which the neighbouring pixels are compared to determine the feature value. The complete facial image is then rendered as a feature vector of decimal values [18,58].
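The thresholding of a 3 × 3 block can be sketched as follows. The clockwise neighbour ordering and the use of `>=` in the comparison are common conventions assumed here for illustration; the source does not specify them.

```python
def lbp_code(block):
    """Compute the 8-bit LBP code of a 3x3 block, thresholding on the centre pixel."""
    center = block[1][1]
    # Neighbours visited clockwise from the top-left corner (assumed ordering).
    neighbours = [block[0][0], block[0][1], block[0][2], block[1][2],
                  block[2][2], block[2][1], block[2][0], block[1][0]]
    code = 0
    for bit, value in enumerate(neighbours):
        if value >= center:
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """256-bin histogram of the LBP codes of all interior pixels."""
    histogram = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            block = [row[c - 1:c + 2] for row in image[r - 1:r + 2]]
            histogram[lbp_code(block)] += 1
    return histogram


# Hypothetical 3x3 greyscale block with centre value 6.
example_block = [
    [6, 5, 2],
    [7, 6, 1],
    [9, 8, 7],
]
example_code = lbp_code(example_block)
example_histogram = lbp_histogram(example_block)
```

The decimal codes (or their histogram) form the local feature vector that is later passed to the fusion stage.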
3.4. Feature Fusion with Multi-Resolution Discrete Cosine Transform (MDCT)
Naidu’s [17] MDCT-based image fusion was applied to the extracted local and global facial features to produce the fused feature vectors subsequently used by the MLP ANN. As in the 1D DCT method, MDCT-based fusion separates the target image into columns and rows, which are then processed to generate 1D vector data; the data vector subsequently undergoes DCT. MDCT decomposes the vector up to the maximum available decomposition level by applying the DCT, which yields the image enhancement. DCT decomposition of the vector results in High Frequency (HF) and Low Frequency (LF) coefficients; the former carry minimal image data, while the latter encompass the material required to generate the processed, fused image. The LF data is thus sent to the Inverse DCT to obtain new vector data, which enables further decomposition at the next level [17]. Here, k denotes the multi-resolution decomposition level.
Image integration by MDCT is undertaken as described below.
1: Convert the two input images to 1D vectors.
2: Find the DCT coefficients of the two vectors and combine them according to the MDCT fusion rules.
3: Obtain the fused image utilizing Equations (1) through (4).
Figure 2 displays the MDCT diagram [17].
This study used the FFLFRM with the MDCT fusion technique to fuse the features extracted by PCA and LBP. The PCA-derived features undergo maximum decomposition in MDCT. The features extracted by LBP are likewise maximally decomposed by MDCT to attain a higher-resolution image than the original. Following LBP and PCA feature decomposition, the MDCT deploys the Inverse DCT at each decomposition level to generate the fused image.
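A single decomposition level of DCT-based fusion can be sketched as below. This is an illustration under explicit assumptions and not a reproduction of Equations (1) through (4) or of the method in [17]: the orthonormal DCT-II is used, the coefficient vector is split into equal LF and HF halves, and the fusion rule (average the LF parts, keep the larger-magnitude HF value) is a commonly used choice assumed for the example. Input vectors are assumed to have even length.

```python
import math


def dct(signal):
    """Orthonormal 1D DCT-II."""
    n = len(signal)
    out = []
    for k in range(n):
        total = sum(signal[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                    for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * total)
    return out


def idct(coeffs):
    """Inverse of the orthonormal DCT-II above."""
    n = len(coeffs)
    out = []
    for i in range(n):
        total = coeffs[0] * math.sqrt(1.0 / n)
        total += sum(math.sqrt(2.0 / n) * coeffs[k]
                     * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for k in range(1, n))
        out.append(total)
    return out


def decompose(signal):
    """One level: split the DCT coefficients in half and invert each half."""
    coeffs = dct(signal)
    half = len(coeffs) // 2
    return idct(coeffs[:half]), idct(coeffs[half:])  # (LF part, HF part)


def reconstruct(low, high):
    """Undo one decomposition level."""
    return idct(dct(low) + dct(high))


def fuse(vector_a, vector_b):
    """Fuse two equal-length vectors (assumed rule: mean of LF, max-abs of HF)."""
    low_a, high_a = decompose(vector_a)
    low_b, high_b = decompose(vector_b)
    fused_low = [(a + b) / 2.0 for a, b in zip(low_a, low_b)]
    fused_high = [a if abs(a) >= abs(b) else b for a, b in zip(high_a, high_b)]
    return reconstruct(fused_low, fused_high)


# Hypothetical stand-ins for a PCA feature vector and an LBP feature vector.
pca_features = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 8.0, 7.0]
lbp_features = [2.0, 2.0, 4.0, 4.0, 6.0, 6.0, 8.0, 8.0]
fused_features = fuse(pca_features, lbp_features)
```

A multi-level variant would apply `decompose` again to the LF part at each level before fusing, and then reconstruct level by level.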
3.5. Face Recognition with Artificial Neural Network (ANN)
ANN is an effective classification tool widely applied in prediction, pattern classification, and approximation activities [5,59,60,61]. In this paper, a multi-layer perceptron ANN (MLP ANN) was used to classify the fused facial images for recognition.
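The classification step can be pictured as a forward pass through a small MLP. The sketch below is illustrative only: the layer sizes, sigmoid and softmax activations, and all weight values are hypothetical, since the source does not report the network configuration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dense(inputs, weights, biases):
    """Fully connected layer: one weight row and one bias per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def softmax(logits):
    """Numerically stable softmax over class logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mlp_forward(features, hidden_w, hidden_b, out_w, out_b):
    """Forward pass: sigmoid hidden layer followed by a softmax output layer."""
    hidden = [sigmoid(v) for v in dense(features, hidden_w, hidden_b)]
    return softmax(dense(hidden, out_w, out_b))


# Hypothetical fused feature vector and hand-picked weights for two identities.
fused_input = [1.0, 0.0, 1.0]
hidden_w = [[2.0, 0.0, 2.0], [-2.0, 0.0, -2.0]]
hidden_b = [0.0, 0.0]
out_w = [[3.0, -3.0], [-3.0, 3.0]]
out_b = [0.0, 0.0]

probabilities = mlp_forward(fused_input, hidden_w, hidden_b, out_w, out_b)
predicted_identity = max(range(len(probabilities)), key=probabilities.__getitem__)
```

In practice the weights would be learned by backpropagation on the fused training vectors rather than set by hand.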
4. Experimental Results and Discussion
An effective face recognition model should ideally cope with a number of challenges, chiefly expression changes, illumination, low image resolution, occlusion, and pose [2]. To validate the effectiveness of the proposed MDCT-based FFLFRM, it was compared with three state-of-the-art models based on the FP [18], LP [19], and CI [20] feature fusion techniques. The proposed FFLFRM and the three state-of-the-art models were run using MATLAB R2015a on a PC with an Intel Core i7 processor (2.40 GHz, 8 GB RAM).
The proposed FFLFRM and the three state-of-the-art models were tested and evaluated using MLP ANN on 10,000 face images derived from the ORL database, under the challenges of expression change, illumination, low image resolution, occlusion, and pose. The comparative evaluation results of their classification efficiencies are summarized in Table 1 and discussed in more depth below.
4.1. Pose Change
Changing angles of capture (i.e., of cameras) and changing positions of photo subjects (i.e., human faces) during image capture result in changes in pose, which alter facial geometry in the captured images [19,36]. This can result in inaccurate renderings of facial characteristics and thus reduce recognition accuracy. The comparative evaluation results of the proposed FFLFRM and the three state-of-the-art models for pose change are presented in Figure 3. The proposed FFLFRM achieved 96.66% classification efficiency, which is lower than that of one model (the FP-based model, with 97.02%) but slightly higher than that of the other two, which use LP and CI (with 96.14% and 96.23%). These results indicate that the proposed FFLFRM performs acceptably in facial recognition under changing poses.
4.2. Illumination Change
Changing lighting conditions result in illumination changes, which can significantly alter facial appearance, thereby undermining accuracy in FRS [19,36]. The comparative evaluation results for the proposed FFLFRM and the three state-of-the-art models for illumination change are displayed in Figure 4. The proposed FFLFRM achieved a classification efficiency of 97.07%, outperforming the other models (which had 96.47%, 97.03%, and 96.89% efficiency), indicating its effectiveness for face recognition under illumination change.
4.3. Expression Change
Varying facial expressions are fundamental to human communication and entail significant changes in facial features, with obvious implications for FRS [19,36]. Comparative evaluation results for the proposed FFLFRM and the three state-of-the-art models under change in facial expression are shown in Figure 5. The FFLFRM achieved a classification efficiency of 97.70%, while the classification efficiencies of the three other models were 97.73%, 98.02%, and 97.68%. The performance of the proposed FFLFRM is thus better than that of the CI-based model, but lower than that of the LP- and FP-based models. This supports the effectiveness of the proposed FFLFRM in face recognition under expression change.
4.4. Low-Resolution Images
The resolution of an image of a face is affected by various contextual factors, including ambient conditions during image capture (particularly illumination, as discussed above) and the technical specifications and abilities of the camera used to capture the image [19,36]. Low-resolution images generally undermine the accuracy of FRS. Comparative evaluation results for the proposed FFLFRM and the three state-of-the-art models for image resolution are shown in Figure 6. The proposed FFLFRM achieved a classification efficiency of 97.11%, outperforming the other models’ classification efficiency rates (96.99%, 96.5%, and 96.1%). This indicates the effectiveness of the proposed FFLFRM for face recognition with low-resolution images.
4.5. Occlusion Challenge
The complete or partial covering of the face results in occlusion, which hinders feature extraction in FRS [19,36]. Comparative evaluation results for the proposed FFLFRM and the three state-of-the-art models for occlusion are shown in Figure 7. The proposed FFLFRM attained the highest classification efficiency with 96.87%, compared to the other models (with 96.18%, 96.2%, and 96.48%), which affirms the effectiveness of the proposed FFLFRM for face recognition under occlusion.
Thus, across the five challenges discussed above, the proposed MDCT-based FFLFRM achieved the best classification efficiency, outperforming the other three state-of-the-art models, for illumination change, low-resolution images, and occlusion. For pose change, the proposed model achieved promising classification results, higher than those of the two state-of-the-art models based on LP and CI. Furthermore, it achieved better classification efficiency than the CI-based state-of-the-art model under expression change. These results indicate that the proposed FFLFRM performs acceptably in facial recognition in the context of the above-mentioned challenges, and they demonstrate the effectiveness of the proposed MDCT-based FFLFRM compared to the three state-of-the-art models (i.e., Frequency Partition (FP), Laplacian Pyramid (LP), and Covariance Intersection (CI)).
Moreover, to validate the effectiveness of MDCT against other transformation methods, namely Discrete Fourier Transform (DFT), Fast Fourier Transform (FFT), and Discrete Wavelet Transform (DWT), we tested and evaluated the proposed FFLFRM with each of these methods using MLP ANN on 10,000 face images derived from the ORL database, in terms of classification accuracy and execution time during the training and testing phases. The classification efficiencies of the proposed MDCT-based FFLFRM and the three other methods are summarized in Table 2 and shown graphically in Figure 8. In addition, the execution time of the FFLFRM using the four transformation methods (i.e., MDCT, DFT, FFT, and DWT) during the training and testing phases on the ORL dataset is presented in Table 3.
Table 2 shows that the proposed FFLFRM using the MDCT method achieved the best classification accuracy, outperforming the other three transformation methods (i.e., DFT, FFT, and DWT) when tested on the ORL database using MLP ANN. Furthermore, Table 3 shows that the proposed FFLFRM using the MDCT method is faster than the other three transformation methods, which supports the effectiveness of MDCT as a feature fusion technique for real-life facial recognition applications.
5. Conclusions and Future Works
This research has presented an FFLFRM rooted in the MDCT fusion technique, with stages for face detection, feature extraction and fusion (using the MDCT method), and face classification (utilizing the MLP ANN). Facial characteristics such as the eyes and mouth were initially detected by the Haar-cascade technique, followed by local and global feature extraction with LBP and PCA, respectively, for the MDCT fusion technique. The MLP ANN then performed facial classification based on the fused feature vector inputs. The developed model’s facial recognition performance was tested through comparative analysis against the classification performance of three state-of-the-art models for fusion-level face recognition, utilizing the FP, LP, and CI fusion techniques. Testing was conducted with 10,000 images from the ORL database utilizing the MLP ANN. The proposed model achieved the following classification efficiency levels for the studied conditions: expression change (97.70%), illumination change (97.07%), low image resolution (97.11%), occlusion (96.87%), and pose change (96.66%).
The proposed MDCT-based FFLFRM achieved the best classification efficiency, outperforming the other three state-of-the-art models, for illumination change, low-resolution images, and occlusion. For low-resolution images, the proposed MDCT-based FFLFRM produced 97.11% classification accuracy, while the other three state-of-the-art models produced 96.99%, 96.5%, and 96.1%, respectively. For occlusion, the proposed model produced 96.87% classification accuracy, while the other three state-of-the-art models produced 96.18%, 96.2%, and 96.2%, respectively. For illumination change, the proposed model produced 97.07% classification accuracy, while the other three state-of-the-art models produced 96.47%, 97.03%, and 96.89%, respectively. For pose change, the proposed model achieved promising classification results that are higher than those of the two state-of-the-art models based on LP and CI: it produced 96.66% classification accuracy, while the other three state-of-the-art models produced 97.02%, 96.14%, and 96.23%, respectively. Furthermore, it achieved better classification efficiency than the CI-based state-of-the-art model under expression change: it produced 97.7% classification accuracy, while the other three state-of-the-art models produced 97.73%, 98.2%, and 97.68%, respectively. These results indicate that the proposed FFLFRM performs acceptably in facial recognition in the context of the abovementioned challenges, and they demonstrate the effectiveness of the proposed MDCT-based FFLFRM compared to the three state-of-the-art models (i.e., Frequency Partition (FP), Laplacian Pyramid (LP), and Covariance Intersection (CI)).
Furthermore, the effectiveness of the proposed MDCT-based FFLFRM was verified against the Discrete Fourier Transform (DFT), Fast Fourier Transform (FFT), and Discrete Wavelet Transform (DWT) transformation methods, in terms of classification accuracy and execution time during the training and testing phases. The results showed that the proposed FFLFRM using the MDCT method achieved the best classification accuracy compared to the other three transformation methods when tested on the ORL database using MLP ANN: it produced 97.7% classification accuracy, whereas the other three methods produced 96.8%, 96.3%, and 97.5%, respectively. The proposed MDCT-based FFLFRM is also faster than the other three transformation methods in both the training and testing phases. Therefore, this paper concludes that MDCT is simpler, faster, and more accurate than DFT, FFT, and DWT, and that it is an effective method for real-life facial recognition applications.
In future research, it is planned to evaluate the FFLFRM’s performance with alternative classifiers, including SVM and HMM, and to compare decision-level fusion performance. Further research can explore the fusion of various global and local feature extraction techniques with the deployment of MDCT. Furthermore, it will be interesting to test and evaluate the proposed model using other large-scale face recognition datasets.