Article

A Multicomponent Face Verification and Identification System

by Athanasios Douklias *, Ioannis Zorzos, Evangelos Maltezos, Vasilis Nousis, Spyridon Nektarios Bolierakis, Lazaros Karagiannidis, Eleftherios Ouzounoglou and Angelos Amditis
Institute of Communication and Computer Systems (ICCS), 15773 Zografou, Greece
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(15), 8161; https://doi.org/10.3390/app15158161
Submission received: 10 June 2025 / Revised: 18 July 2025 / Accepted: 19 July 2025 / Published: 22 July 2025
(This article belongs to the Special Issue Application of Artificial Intelligence in Image Processing)

Abstract

Face recognition is a biometric technology based on the identification or verification of facial features. Automatic face recognition is an active research field in computer vision and artificial intelligence (AI) that is fundamental for a variety of real-time applications. In this research, the design and implementation of a face verification and identification system with a flexible, modular, secure, and scalable architecture are proposed. The proposed system incorporates several types of system components: (i) portable capabilities (mobile application and mixed reality [MR] glasses), (ii) enhanced monitoring and visualization via a user-friendly Web-based user interface (UI), and (iii) information sharing via middleware to other external systems. The experiments showed that these interconnected and complementary system components produce robust, real-time results for face identification and verification. Furthermore, to identify a model of high accuracy, robustness, and speed for face identification and verification tasks, a comprehensive evaluation of multiple pre-trained face recognition models (FaceNet, ArcFace, Dlib, and MobileNetV2) on a curated version of the ID vs. Spot dataset was performed. Among the models evaluated, FaceNet emerged as the preferable choice for real-time tasks due to its balance between accuracy and inference speed for both face identification and verification, achieving an AUC of 0.99, a Rank-1 rate of 91.8%, a Rank-5 rate of 95.8%, an FNR of 2%, an FAR of 0.1%, an accuracy of 98.6%, and an inference time of 52 ms.

1. Introduction

Face recognition is a biometric technology based on the identification or verification of facial features [1,2,3]. Automatic face recognition in 2D and 3D [4] is an active research field in computer vision and artificial intelligence (AI) that is fundamental for a variety of real-time applications. Face recognition can be performed either by verification or by identification. Verification compares the features of a given face against the feature template of a claimed identity, usually presented in a document or card, and involves only a single comparison. Identification, on the other hand, compares the given face with all the feature templates stored in a database to find, among several possibilities, the identity that matches most closely. Face identification therefore takes time proportional to the size of the database [5].
Regarding face recognition accuracy, there are several challenges due to pose variation, facial expression, scale variation, aging, partial occlusion, resolution, noise or blur, and complex illumination [5,6,7,8]. Deep learning (DL) techniques have received increased attention for achieving satisfying results in many problems related to feature extraction, object detection, classification, semantic segmentation, and other applications by implementing deep neural network (DNN) schemes under a supervised or an unsupervised setting [9,10,11,12,13]. In this context, several DNN algorithms have been developed in recent years to achieve high accuracy and performance in face recognition tasks [14,15]. Despite the complexity of such architectures and their high computational costs, they provide several advantages compared with traditional methods in terms of learning ability, high variability, and generalization [5,16]. Among DNNs, convolutional neural networks (CNNs) are considered the best-fitting option [17,18]. Well-known and efficient DNN and CNN models include (i) DeepFace [19], (ii) DeepID [20], (iii) DeepID2 [21], (iv) Baidu [22], (v) FaceNet [23], (vi) VGGFace [24], (vii) LightCNN [25], (viii) CNN with spatial transformer layers [26], (ix) deep coupled ResNet (DCR) [27], (x) PSI-CNN [28], (xi) ArcFace [29], (xii) MagFace [30], and (xiii) CFR-GAN [31]. The aforementioned models require a large amount of memory, which limits their use on resource-constrained devices and embedded systems. Despite the rise of several lightweight CNN architectures such as PocketNet [32], MobileFaceNet [33], EfficientNet-B0 [34], and GhostNet [35], these still suffer from high floating point operation counts [36] and, like their heavier and more complex counterparts, from challenges such as face rotation and low-quality face inputs [17].
In [5,8,14,17,37,38,39,40,41], a number of face datasets are referenced. In [14,42,43], several innovative approaches are presented, such as explainable techniques [44], the development of semi-supervised methods exploiting little labeled data, and the use of transfer learning and curriculum learning methods to extend existing models to new data instead of starting from scratch every time. Furthermore, the authors highlight the necessity of using properly diverse datasets, such as [39], to reduce gender and racial bias in the networks.
Finally, regarding the legal framework in Europe for the use of AI systems, the AI Act (Regulation (EU) 2024/1689 laying down harmonized rules on artificial intelligence) has been adopted. The aim of the rules is to foster trustworthy AI in Europe. The AI Act sets out a clear set of risk-based rules for AI developers and deployers regarding specific uses of AI, including, among others, biometric systems. More details on the AI Act can be found in [45].

1.1. Related Work

A typical face recognition system broadly follows three steps: (i) detection and extraction/capture of the face, (ii) feature extraction and representation, and (iii) face recognition [5]. Regarding feature extraction, FaceNet, for instance, creates unified embeddings of the faces and then compares the faces in the embedding space [46]. FaceNet is a DNN that extracts facial features and is trained with a triplet-based loss function so that matching faces lie close together and non-matching faces lie far apart in the embedding space. More details on the feature extraction of FaceNet can be found in [8,23,47]. In [48], the authors focused on building a secure authentication system with face, location, and gesture recognition as components. User gestures and location data formed sequences of time series. They utilized unsupervised learning in a long short-term memory recurrent neural network to actively learn to recognize, group, and discriminate user gestures and locations. Moreover, a clustering-based technique was also implemented for recognizing gestures and location. In [49], the authors presented a fast, illumination-invariant AdaBoost framework for face and eye detection. The main components of the proposed architecture were face detection at a distance, face feature extraction, user ID recognition, and a user authenticator. In [50], a real-time face detection and recognition system for blind people using a Raspberry Pi and an Android app was introduced. The main aspects of the study were an object detection function based on a boosted cascade and a face recognition function based on eigenfaces. In [51], the authors present the architecture of a smart imaging sensor system for face recognition, based on a custom-designed smart pixel capable of computing local spatial gradients in the analog domain and a digital coprocessor that performs image classification. In [52], a system was designed for the identification of a person’s face with or without a mask; the input data were obtained from either an image or a video, and the architecture was based on a MobileNet. In [53], the authors proposed a system with several features such as face database construction, a database management system, and a human-computer interaction function; their system was based on a CNN. In [54], an encrypted face recognition payment system was proposed. In [55], two deep learning models for face recognition were specially designed for applications on mobile devices and resource-constrained environments. In [56], the authors introduced a real-time framework that leverages deep learning techniques, particularly CNNs, to accurately detect human faces in complex images. In [57], a novel approach to real-time criminal detection through the use of face recognition technology was proposed; the system uses a multi-task cascade neural network to reliably identify and recognize faces in difficult situations, such as low light or obscured views. The authors of [58] proposed a facial recognition intrusion detection system designed with energy efficiency in mind. As noted in [55,59], future solutions and research include the design and development of mobile, ubiquitous, portable, and low-resource devices for face recognition in various research fields.

1.2. Our Contribution

In this research, we aimed to contribute to the aforementioned growing research with the design and implementation of a face verification and identification system with a flexible, modular, secure, and scalable architecture. The proposed system incorporates several types of system components: (i) a mobile application, (ii) mixed reality (MR) glasses, (iii) a Web-based user interface (UI), and (iv) middleware for information sharing with other external systems. In this context, several interconnected and complementary system components and distributed actors/operators can be involved in several types of real-time applications. Furthermore, to identify a model of high accuracy, robustness, and speed for the face identification and verification tasks, a comprehensive evaluation of multiple state-of-the-art face recognition algorithms on a curated version of the ID vs. Spot dataset [60] was performed. This dataset is particularly challenging as it includes face images captured in both controlled and uncontrolled environments, and to our knowledge, there is a lack of related studies that perform such a comprehensive evaluation on it.
This research is organized as follows. Section 2 presents the proposed architecture scheme and the developed system components, together with the implementation details and the evaluation methodology for the face recognition algorithms. Section 3 presents the results of the evaluation of the face recognition models and the results from the various system components. Section 4 provides a discussion of the extracted results and limitations. Finally, Section 5 concludes the manuscript.

2. Materials and Methods

2.1. System’s Architecture

In Figure 1, the architecture of the proposed face verification and identification system is presented. The Backend service block incorporates several architectural sub-blocks such as the Rest API; the Database; and other sub-blocks associated with the face verification and identification process, namely the Face detection, Face standardization, Feature extraction, Verification, and Identity search sub-blocks. The Face standardization step refers to the horizontal alignment of the face image (making sure that the eyes are horizontal) and picture size normalization. This step enhances the consistency of the face verification functionality by reducing the system’s dependence on the picture provided by the user. For the experiments of this research, dummy data were created to fill the Database sub-block. The Backend service block is connected to several system components such as the Mobile App, Web-based UI, Mixed Reality glasses, and 3rd party systems blocks. The Backend service, Mobile App, Web-based UI, Mixed Reality glasses, and 3rd party systems blocks adopt authentication and encryption mechanisms (VPN and TLS) to ensure a high level of security. The Mobile App, Web-based UI, and Mixed Reality glasses components utilize the same Rest API interface. The Mobile App and the Mixed Reality glasses blocks both send and receive information through the Rest API, while the Web-based UI only receives data. The Metadata & Statistics sub-block incorporates the related metadata and calculated statistics that are visualized in the Web-based UI block. The Mobile App block incorporates the UI, App logic, and Face detection sub-blocks. In the case that more than one face image is detected, only one is selected by the Backend service: the face image with the maximum area is kept, while the rest are rejected. The results from the Backend service block are translated and shared via the Adapter block to the 3rd party systems block. The Adapter module connected to the 3rd party systems block facilitates integration, meaning that it gathers face recognition/verification results and sends them to other systems. The 3rd party systems block can be considered middleware (e.g., Apache Kafka [61]) for information sharing with other external systems. The bidirectional arrows, when present, represent the bidirectional exchange of information.
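The max-area selection rule described above can be illustrated with a short sketch. The following Python snippet is a minimal, hypothetical illustration (the function and field names are assumptions, not part of the actual Backend service implementation): given a list of detected faces with bounding boxes, it keeps only the one with the largest area and rejects the rest.

```python
from typing import Dict, List, Optional

def select_largest_face(detections: List[Dict]) -> Optional[Dict]:
    """Keep only the detected face with the maximum bounding-box area.

    Each detection is assumed to carry a bounding box as (x, y, width, height);
    all other detections are rejected, as described for the Backend service.
    """
    if not detections:
        return None
    return max(detections, key=lambda d: d["box"][2] * d["box"][3])

# Example: two detected faces; only the larger one would be forwarded to the
# Face standardization / Feature extraction sub-blocks.
faces = [
    {"box": (40, 60, 80, 95), "score": 0.97},
    {"box": (200, 50, 140, 160), "score": 0.91},
]
selected = select_largest_face(faces)  # -> the 140 x 160 face
```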
In this context, several workflows are executed involving several distributed actors/operators for various applications and scenarios:
  • A face verification and identification process is achieved via the Mobile App block (see Section 3.2).
  • A face identification process is achieved via the Mixed Reality glasses block (see Section 3.3).
  • The Web-based UI block visualizes the results from the Mobile App block (see Section 3.4).
  • The results from the Mobile App block are shared to the 3rd party systems block (see Section 3.5).
The specifications of the utilized machine for the Backend service are (i) CPU, 8 cores of Intel(R) Xeon(R) CPU E7-8890 v3 @ 2.50GHz; (ii) RAM, 32 GB; and (iii) operating system, Ubuntu 22.04.

2.2. Deep Learning Model for the Face Identification and Verification

2.2.1. Dataset Curation

The ID vs. Spot dataset [60] was used for this research. This dataset comprises a vast collection of face images captured in controlled (ID photos) and uncontrolled (Spot photos) environments. To enhance the robustness of the experiments, a meticulous curation of the dataset was undertaken to eliminate poor-quality photos against exclusion criteria such as blur (variance of Laplacian < 100), severe lighting conditions, over/under-exposure, face coverage < 40% of the frame, etc. The curated dataset ultimately utilized for the experiments encompassed a total of 1232 ID RGB photos and 3967 Spot RGB photos. Prior to inputting the images into the models, several preprocessing steps were executed (a minimal sketch of these steps follows the list below):
  • Resizing: All images were resized to a standard dimension of 160 by 160 pixels.
  • Normalization: The pixel values of the images were normalized to achieve a mean of 0 and a standard deviation of 1.
  • Histogram equalization: Histogram equalization was applied to augment the contrast of the images.
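A minimal sketch of the curation check and the preprocessing chain is given below, assuming OpenCV and NumPy; apart from the 160 × 160 target size and the Laplacian-variance blur criterion (< 100) stated above, the exact parameters are assumptions for illustration. Note that histogram equalization is applied before normalization in this sketch because OpenCV’s equalizeHist expects 8-bit input.

```python
import cv2
import numpy as np

def is_too_blurry(gray: np.ndarray, threshold: float = 100.0) -> bool:
    # Curation criterion: exclude images whose variance of Laplacian is < 100.
    return cv2.Laplacian(gray, cv2.CV_64F).var() < threshold

def preprocess_face(image_bgr: np.ndarray) -> np.ndarray:
    # 1. Resizing: all images are brought to the standard 160 x 160 dimension.
    img = cv2.resize(image_bgr, (160, 160))

    # 2. Histogram equalization on the luminance channel to augment contrast.
    ycrcb = cv2.cvtColor(img, cv2.COLOR_BGR2YCrCb)
    ycrcb[:, :, 0] = cv2.equalizeHist(ycrcb[:, :, 0])
    img = cv2.cvtColor(ycrcb, cv2.COLOR_YCrCb2BGR)

    # 3. Normalization: pixel values scaled to zero mean and unit std deviation.
    img = img.astype(np.float32)
    return (img - img.mean()) / (img.std() + 1e-6)
```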

2.2.2. Implementation Details

The models were implemented using the PyTorch library in Python. Several auxiliary libraries were used, such as NumPy, SciPy, Pandas, and TensorBoard. The experiments were conducted on a machine with the following specifications: (i) CPU, Intel® Core™ i9-10900X CPU @ 3.70 GHz; (ii) GPU, Nvidia® RTX 3090; (iii) RAM, 64 GB; (iv) operating system, Linux Ubuntu 20.04.1; (v) programming language, Python 3.7.10; and (vi) libraries, PyTorch 1.8.1.

2.2.3. Pre-Trained Models

Four cutting-edge pre-trained face recognition algorithms were evaluated on the curated ID vs. Spot dataset:
  • FaceNet: A deep convolutional network devised by Google for face recognition [23].
  • ArcFace: A method employing additive angular margin loss for deep face recognition [29,62].
  • Dlib (baseline): A toolkit encompassing various machine learning algorithms, including face recognition [63,64]. Specifically, the dlib_face_recognition_resnet_model_v1, a 29-layer ResNet trained on a cleaned subset of MS-Celeb-1M and producing 128-D embeddings, was employed [65,66]. Dlib’s ResNet offers a compact (12 MB) C++ inference engine with zero external dependencies, is widely adopted on edge devices, and is often used as an academic baseline. A lightweight CNN such as LeNet-5 does not reach acceptable accuracy on modern benchmarks; hence, Dlib is a more realistic “classical” baseline.
  • MobileNetV2: A method employing large margin cosine loss for face recognition [62,67,68].
Additional details on the parameters and giga floating point operations (GFLOPs) of the aforementioned models are provided below. The Dlib model is a streamlined ResNet-34 with ~6 million parameters, 0.4 GFLOPs, 29 convolutional layers, and half the filters per layer, hence its significantly lower parameter count and computation. MobileNetV2–AM refers to a MobileNetV2 backbone with an additive margin loss (ArcFace/CosFace); the full MobileNetV2 (width = 1) has ~3.4 million parameters and ~0.3 billion multiply-adds at 224 × 224 input, which scales to roughly 0.15 GFLOPs at 160 × 160. FaceNet-IR-v1 uses the Inception-ResNet v1 architecture, has ~7.5 million parameters, and its pretrained model requires on the order of 1.6 GFLOPs for 160 × 160 images. ArcFace-ResNet100 is a 100-layer ResNet model; it has ~65 million parameters and, when evaluated on 112 × 112 input, about 24.2 GFLOPs, which corresponds to roughly 49 GFLOPs at 160 × 160. Further details can be found in [29,63,64,67,69].
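The resolution-scaling figures quoted above follow from the common assumption that the FLOPs of a convolutional backbone scale roughly with the input area; the short check below reproduces the quoted numbers under that assumption (a back-of-the-envelope estimate, not an exact profiling result).

```python
def scale_flops(gflops: float, src_size: int, dst_size: int) -> float:
    # Assumption: compute scales with the ratio of input areas.
    return gflops * (dst_size / src_size) ** 2

# MobileNetV2: ~0.3 GMAdds at 224 x 224 -> ~0.15 at 160 x 160
print(round(scale_flops(0.3, 224, 160), 2))   # 0.15
# ArcFace-ResNet100: ~24.2 GFLOPs at 112 x 112 -> ~49 at 160 x 160
print(round(scale_flops(24.2, 112, 160), 1))  # 49.4
```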

2.2.4. Similarity Measure and Evaluation Methodology

Cosine similarity [70] is a measure used to gauge the similarity between two entities, typically represented as vectors in a multi-dimensional space. In the context of face recognition, both for identification and verification, cosine similarity assesses how closely aligned the feature vectors of different face images are. The closer the cosine similarity is to 1, the more similar the faces are. In our application, we utilize cosine similarity due to its effectiveness in high-dimensional spaces, like those involved in facial recognition. It helps in differentiating good results from poor ones by providing a clear metric for similarity. Additionally, we employed a normalized cosine distance, calculated as 1−cosine similarity, expressed as a percentage. This approach aids users by quantifying the degree of resemblance between faces, offering an intuitive and user-friendly way to interpret the similarity between the faces being compared or verified.
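As a concrete illustration of the similarity measure used here, the snippet below computes the cosine similarity of two embedding vectors and the normalized cosine distance (1 − cosine similarity) expressed as a percentage; it is a minimal sketch with synthetic embeddings, not the production implementation.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Values close to 1 mean the two face embeddings are closely aligned.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def normalized_distance_percent(a: np.ndarray, b: np.ndarray) -> float:
    # Normalized cosine distance = 1 - cosine similarity, as a percentage.
    return 100.0 * (1.0 - cosine_similarity(a, b))

rng = np.random.default_rng(0)
emb_id = rng.normal(size=512)                    # e.g., embedding of an ID photo
emb_spot = emb_id + 0.1 * rng.normal(size=512)   # similar face captured on the spot
print(cosine_similarity(emb_id, emb_spot))             # close to 1 for a match
print(normalized_distance_percent(emb_id, emb_spot))   # small for a match
```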
The performance of the face recognition algorithms was assessed using objective evaluation metrics (a minimal sketch of how these metrics can be computed follows the list below):
  • Face identification: This entails matching a given face image to a known identity in a database. The Rank-1 identification rate [71], defined as the percentage of test images for which the correct identity is ranked first, served as the evaluation metric. The Rank-5 identification rate represents the percentage of test images for which the correct identity is within the top 5 matches (or ranks) from the database.
  • Face verification: This involves determining whether two given face images belong to the same person. The area under the receiver operating characteristic curve (AUC-ROC) [72], the false negative rate (FNR), and the false acceptance rate (FAR) [73] served as the evaluation metrics.
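To make the metric definitions above concrete, the sketch below (a simplified illustration, not the evaluation code used in this research) computes the Rank-k identification rate from a probe-to-gallery similarity matrix and the FNR/FAR of verification decisions at a fixed threshold.

```python
import numpy as np

def rank_k_rate(similarity: np.ndarray, true_ids: np.ndarray, k: int) -> float:
    """Fraction of probes whose correct identity is among the top-k gallery matches.

    similarity: (n_probes, n_gallery) matrix of cosine similarities; the gallery
    index of each probe's true identity is given in true_ids.
    """
    topk = np.argsort(-similarity, axis=1)[:, :k]
    hits = (topk == true_ids[:, None]).any(axis=1)
    return float(hits.mean())

def fnr_far(scores: np.ndarray, labels: np.ndarray, threshold: float):
    """labels: 1 for same-person pairs, 0 for different-person pairs."""
    accept = scores >= threshold
    fnr = float((~accept[labels == 1]).mean())  # genuine pairs wrongly rejected
    far = float(accept[labels == 0].mean())     # impostor pairs wrongly accepted
    return fnr, far
```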

3. Results

3.1. Face Identification and Verification Results

This section presents the results of the experiments conducted to evaluate the performance of the face recognition algorithms on the curated ID vs. Spot dataset. The results are presented in terms of the aforementioned evaluation metrics.
In our research, we adopted a data-driven approach to determine the optimal cosine similarity threshold for face verification. This method involved analyzing the distributions of cosine distances for both positive (same person) and negative (different persons) pairings in the dataset. By employing a decision tree algorithm, we identified the threshold that maximized information gain and thereby optimally distinguished between identities. Specifically, a decision stump, i.e., a depth-1 decision tree, was employed, using cosine similarity as the sole input feature. The objective was to identify a threshold that maximally distinguishes between positive (same identity) and negative (different identity) face pairings by maximizing information gain, measured via entropy. To ensure generalizability and avoid overfitting, a five-fold cross-validation strategy was applied, and the averaged threshold was selected. Of utmost importance was reducing the FAR in order to achieve a highly secure system. This threshold, set at 0.235, was crucial in enhancing the precision of the face verification process, ensuring high accuracy and reliability.
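A minimal sketch of this threshold-selection procedure, assuming scikit-learn, is given below; the helper names are illustrative, and the real thresholds are of course derived from the curated ID vs. Spot pairs rather than from arbitrary inputs.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier

def stump_threshold(similarities: np.ndarray, labels: np.ndarray) -> float:
    """Fit a depth-1 decision tree (stump) on cosine similarity alone and return
    the split point that maximizes information gain (entropy criterion)."""
    stump = DecisionTreeClassifier(max_depth=1, criterion="entropy")
    stump.fit(similarities.reshape(-1, 1), labels)
    return float(stump.tree_.threshold[0])

def cross_validated_threshold(similarities: np.ndarray, labels: np.ndarray,
                              folds: int = 5, seed: int = 0) -> float:
    # Average the stump threshold over five folds to avoid overfitting.
    kf = KFold(n_splits=folds, shuffle=True, random_state=seed)
    per_fold = [stump_threshold(similarities[train], labels[train])
                for train, _ in kf.split(similarities)]
    return float(np.mean(per_fold))
```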
Concerning the face identification task, Table 1 shows the Rank-1 and Rank-5 identification rates attained by each face recognition algorithm and the baseline model on the curated ID vs. Spot dataset, utilizing 3880 pairs of images. Figure 2 exhibits the ROC curves of the face recognition algorithms and the baseline model on these image pairs. As shown in Table 1, ArcFace outperformed the other models in terms of Rank-1 and Rank-5 identification rates. However, it is pertinent to also consider the inference time, which significantly impacts the real-time applicability of these algorithms. The inference time for ArcFace was measured at 97 ms, whereas FaceNet required only 52 ms. This lower inference time is crucial for real-time face recognition tasks where rapid processing is imperative. Therefore, despite the slightly higher identification rates of ArcFace (and its similar AUC of 0.99), FaceNet was chosen due to its superior speed, demonstrating a more balanced trade-off between accuracy and performance speed and making it more suitable for the proposed system. The MobileNetV2 and Dlib models also demonstrated competitive performance, although they lagged behind in accuracy.
For the evaluation of face verification, we meticulously constructed a dataset comprising 3967 pairs of ID and spot RGB photos, each representing either the same individual or different individuals. This balanced dataset allowed for a robust assessment of the algorithms’ performance under varying conditions. The same threshold of the cosine similarity, as in face identification, was used for face verification as well. Thus, we rigorously tested each pair against a cosine similarity measure threshold of 0.235, a critical parameter for distinguishing between matches and non-matches, computed as stated earlier. The evaluation process was designed to measure the algorithms’ accuracy in correctly identifying true positives (accurate matches) and avoiding false positives (erroneous matches). Notably, FaceNet emerged as a highly effective tool in this evaluation, achieving an outstanding accuracy of 98.6%. This high level of accuracy highlights FaceNet’s capability in consistently and reliably verifying facial identities, even in a diverse and challenging dataset. The results from this evaluation serve as a testament to the robustness and reliability of FaceNet.
In addition to the primary metrics evaluated, the implementation of FaceNet was rigorously tested to assess its FNR and FAR, which are crucial for understanding the algorithm’s robustness in correctly identifying matching face pairs. Remarkably, the FaceNet implementation exhibited a nearly negligible FNR of 0.02 (2%) and FAR of 0.001 (0.1%). Such exceedingly low FNR and FAR values underscore the model’s proficiency in accurately recognizing faces, minimizing the likelihood of overlooking true matches, and further bolster FaceNet’s standing as a highly reliable choice for the proposed system, ensuring that false negative cases are significantly mitigated. A more detailed analysis is provided in Figure 3 and Figure 4. Figure 3 illustrates the trade-off between the FNR and the FAR on a logarithmic scale via the detection error tradeoff (DET) curve. Each point on the curve represents a different decision threshold, capturing how tightening the threshold to reduce FNR inevitably increased FAR and vice versa. Notably, at an operating point where FNR was 0.1%, the FAR rose to approximately 19%, underscoring how demanding a near-zero FNR came at the expense of a high FAR. Conversely, at a more stringent threshold aimed at FAR = 0.1%, the FNR reached just over 5%, reflecting a stricter acceptance criterion that lowered the false acceptances but increased the rejection of true matches.
Figure 4 complements this perspective by plotting both FNR and FAR as functions of the similarity threshold on a log scale. As the threshold increases, we see a pronounced decrease in FAR (orange curve) but a corresponding uptick in FNR (blue curve), revealing a typical crossover point where the two error rates are comparable. The vertical dashed lines highlight specific threshold values—one that achieves FNR ≈ 0.1% and one that achieves FAR ≈ 0.1%—visually confirming the steepness of the trade-off region.
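Curves such as those in Figure 3 and Figure 4 can be produced by sweeping the decision threshold over the verification scores and recording both error rates at each value; the sketch below is a minimal illustration of that sweep, not the plotting code used for the figures.

```python
import numpy as np

def error_tradeoff(scores: np.ndarray, labels: np.ndarray, n_steps: int = 200):
    """Sweep the similarity threshold and return (thresholds, FNRs, FARs).

    labels: 1 for same-person pairs, 0 for different-person pairs.
    """
    thresholds = np.linspace(scores.min(), scores.max(), n_steps)
    fnrs, fars = [], []
    for t in thresholds:
        accept = scores >= t
        fnrs.append(float((~accept[labels == 1]).mean()))  # genuine pairs rejected
        fars.append(float(accept[labels == 0].mean()))     # impostors accepted
    return thresholds, np.array(fnrs), np.array(fars)

# Plotting FAR against FNR on log axes yields a DET-style curve (cf. Figure 3);
# plotting both against the threshold shows the crossover point (cf. Figure 4).
```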

3.2. FACE-VI Mobile Application

In Figure 5, the case of the utilization of the FACE-VI app in real-time terms is presented.
In Figure 6, the welcome page and other related pages from the Face Verification-Identification (FACE-VI) app are depicted. The user should log in with the related credentials and then is able to select either the face identification function or the face verification function. Furthermore, the user is able to configure some settings related to the two functions such as the matching score threshold (%), which is related to the similarity measure (where matching score = 100 ∗ (1 − similarity measure)), above which the related matching scores are highlighted (with green in the results), as well as the number of the top matched identities (1 to 5) to be listed in the results (for the face identification function).
Once the user starts either the face identification function or the face verification function, the mobile device’s camera and location are activated. To ensure that a proper face image is captured, a face detection box and a supporting colored box and bar appear to guide the user. Once the supporting box, the related bar, and the middle-bottom circle are green, the user is able to capture the face image. For the face identification task, one face image is required (i.e., one captured on the spot to be compared with the ones in a related database), while for the face verification task, two face images are required (one captured on the spot and one from the face photo of a document or card). The face identification results (Figure 7) are a list of the matched identities with the related matching scores and other additional information and metadata, such as name, surname, etc. The user has two options: (i) to select one matched identity from the listed matched identities and then verify the person or (ii) to reject the listed matched identities. By clicking the related option, i.e., “VERIFY” or “REJECT”, an identification or mis-identification event, respectively, can be sent to another system, e.g., to the Web-based UI (see Section 3.4) or to other external systems via middleware (see Section 3.5).
The face verification result (Figure 8) is the matching score that indicates how similar the two face images are. The user has two options: (i) to verify the person or (ii) to reject the result. Similarly, by clicking the related option, i.e., “VERIFY” or “REJECT”, a verification or mis-verification event can be sent to another system.
It should be noted that for both functions, the related events are not extracted automatically but only after the related action from the user, i.e., only by clicking “VERIFY” or “REJECT”. Thus, the final decision related to the face identification or verification events is made by the user. According to the real-time experiments, the total mean processing time for the execution of a face identification or a face verification process, including the related visualization in the FACE-VI app, was measured as 1.75 s.

3.3. Mixed Reality Glasses

In Figure 9 (left) the case of the utilization of the MR glasses in real-time terms is presented. The associated face identification results are depicted in Figure 9 (right).
The idea of bringing the face image capturing functionality, along with the face identification results coming from the Backend service, into an MR HMD came up for several reasons. First, MR HMDs allow users to operate hands-free by enabling eye-tracking commands, hand recognition and interactions, or even voice commands. Moreover, the operation is similar to that of a mobile device, meaning it uses a camera input for video and image capturing, and it is easy to establish data exchange with services. Finally, the HMD setup enables experimentation with advanced visualizations in space by utilizing spatial information coming from the MR glasses.
The user should first log in with the related credentials. The application was developed in Unity for HoloLens 2 (UWP MR glasses). The application starts using its camera as input. Each camera frame is processed by a face detection model with the sole purpose of detecting faces and their corresponding bounding boxes. Two of the tested ML models were FaceONNX and BlazeFace. Inference of the face detection models was run on the device (HoloLens 2) to achieve real-time results, providing as results the recognition confidence and the area of the image where a face was detected. To forward a single face as input to the Backend service for face identification, only the face detection result with the highest confidence is taken into consideration for each frame. As postprocessing, the image is cropped and shared with the Backend service for the face identification process, which returns the recognized person’s metadata, such as name, surname, etc.
For advanced spatial visualizations, we experimented with the HoloLens 2 spatial mapping feature, which generates meshes for each detected surface, including human bodies. This means that human bodies and surfaces are represented as low-poly meshes with no detail that are interactable using colliders. We leveraged this functionality to create a projection method that casts a ray from the center of the face detection bounding box, at its corresponding position in the camera field of view, toward the dynamically generated mesh, as a simple way to overlay the results of the face identification process on top of the person’s position in real space. According to the real-time experiments, the total mean processing time for the execution of a face identification process and the related visualization via the MR glasses was measured as 0.15 s.

3.4. Web-Based UI

In Figure 10, the related pages from the Web-based UI are visualized. The user of the Web-based UI should first log in with the related credentials. The Web-based UI’s functionalities are the following: (i) map visualization of the extracted events (identification, mis-identification, verification, and mis-verification) from the FACE-VI app, (ii) calculation of several statistics, and (iii) a list of the related transactions (events extracted by the FACE-VI app) with the possibility of filtering (according to the type of event, date, time, etc.) and of depicting related information, e.g., the type of event, the matching score, etc. According to the real-time experiments, the total mean time for the visualization of the FACE-VI app results (either from a face identification or a face verification process) in the Web-based UI was measured as 1 s.

3.5. Middleware

In Figure 11, a sample *.JSON file (formed according to a specific data model) is depicted for a mis-verification case via the FACE-VI app, which can be shared via Apache Kafka [61] (as middleware) with other external systems. The topics are associated with (i) the ID number of the extracted event, (ii) the date and time of the event, (iii) the type of the extracted event, (iv) the location (coordinates) of the used mobile device, (v) other useful information such as the calculated matching score (%), and (vi) the associated captured images in Base64 format.
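A hedged sketch of such an event message and its publication through Kafka is shown below; the field names, topic name, broker address, and example values are assumptions for illustration and do not reproduce the exact data model of Figure 11.

```python
import base64
import json
from datetime import datetime, timezone

from kafka import KafkaProducer  # assumption: the kafka-python client is used

# Hypothetical event following the described data model: event id, date/time,
# event type, device location, matching score, and captured images in Base64.
event = {
    "event_id": "0001",
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "event_type": "mis-verification",
    "location": {"lat": 37.9745, "lon": 23.7865},
    "matching_score_percent": 42.5,
    "images": [base64.b64encode(b"...jpeg bytes...").decode("ascii")],
}

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",          # assumed broker address
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("face-vi-events", value=event)     # hypothetical topic name
producer.flush()
```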

4. Discussion

Among the tested models, FaceNet emerged as the preferable choice due to its balance between accuracy and inference speed for both face identification and verification tasks. The results of this research are consistent with previous studies that have found the FaceNet model to perform well in face recognition tasks. A study by [74] found that the FaceNet model achieved a higher accuracy than other face recognition models on the YALE face dataset. Also, in [8], FaceNet achieved an accuracy of 99.6% on the Labeled Faces in the Wild (LFW) dataset, with the related system achieving a 49 ms runtime. In the same study, several other combinations with FaceNet achieved rates from 79.1% to 95.7% with runtimes from 49 ms to 75 ms. In [46], FaceNet achieved a recognition rate of 97.2% on a masked face dataset [75]. Such results are quite similar to those obtained in this research, indicating a homogeneous and stable performance of FaceNet. However, it is worth noting that the performance of face recognition models can vary depending on the dataset and the application scenario. For example, a study by [29] found that the ArcFace model outperformed the FaceNet model in a face identification task on the YouTube Faces (YTF) and LFW datasets. Therefore, it is important to test the models on the specific dataset and application scenario of interest before deploying them. As previously mentioned, the proposed system employs a flexible, modular, secure, and scalable architecture and thus enables the deployment of any proper model, not only FaceNet. In [3,5,13,36,37,43,51], several comparative analyses were performed, highlighting the performance of several algorithms/models on various datasets for face recognition.
However, there are some limitations that should be considered when interpreting the results. In this research, the performance of the models was evaluated on only a single dataset, the curated ID vs. Spot dataset. While this dataset is quite representative and consists of face images captured in both controlled and uncontrolled environments, the results may not generalize to other datasets or application scenarios. Furthermore, for this research and the implemented model, to achieve satisfactory results, the captured face portrait image must (i) contain the face of only one person, (ii) be taken in proper lighting conditions to ensure that all face features appear properly and are not occluded (i.e., no masks, glasses, etc.), and (iii) contain a face that is looking straight into the camera’s lens, without facial expressions or face rotations.

5. Conclusions

Automatic face recognition is an active research field that is fundamental for a variety of real-time applications. Face recognition can be conducted by either verification or identification. Verification compares the features of a given face against the feature template of a claimed identity, usually presented in a document or card, and involves only a single comparison. Identification, on the other hand, compares the given face with all the feature templates stored in a database to find, among several possibilities, the identity that matches most closely.
This study contributes to this growing research with the design and implementation of a face verification and identification system with a flexible, modular, secure, and scalable architecture and backend service. The proposed system incorporates several types of system components: (i) components with portable capabilities (mobile application and MR glasses), (ii) components for enhanced monitoring and visualization via a user-friendly Web-based UI, and (iii) components for information sharing with other external systems. The experiments show that such interconnected and complementary system components are able to produce robust and real-time results for face identification and verification tasks, achieving total processing and visualization mean times of 1.75 s, 0.15 s, and 1 s via the FACE-VI app, the MR glasses, and the Web-based UI, respectively. In this context, the proposed system enables the involvement of various actors/operators in several types of real-time applications.
Furthermore, to identify a model of high accuracy, robustness, and speed for the face identification and verification tasks, a comprehensive evaluation of multiple pre-trained face recognition models (FaceNet, ArcFace, Dlib, and MobileNetV2) on a curated version of the ID vs. Spot dataset was performed. The performance of the models was assessed in terms of objective evaluation metrics. Among the models used, FaceNet emerged as the preferable choice for real-time tasks due to its balance between accuracy and inference speed for both face identification and verification tasks, achieving an AUC of 0.99, a Rank-1 rate of 91.8%, a Rank-5 rate of 95.8%, an FNR of 2%, an FAR of 0.1%, an accuracy of 98.6%, and an inference time of 52 ms.
Future work could enhance this research by evaluating the performance of the models on multiple datasets and in multiple application scenarios. Additionally, other evaluation metrics and other face recognition models could be considered. Another direction for future work could be to retrain related models and to investigate the impacts of data augmentation techniques on the training data, as well as to investigate the impacts of different model configurations and training parameters on the performance of the models.

Author Contributions

Conceptualization, A.D., I.Z., E.M. and L.K.; methodology, A.D. and I.Z.; software, A.D., I.Z., S.N.B. and V.N.; validation, A.D., I.Z., S.N.B. and E.M.; writing—review and editing, E.M., I.Z., A.D. and S.N.B.; supervision, A.D., E.M., L.K., E.O. and A.A. All authors have read and agreed to the published version of the manuscript.

Funding

This work is a part of the Flexi-cross project. This project has received funding from the European Union’s Horizon Europe research and innovation programme under grant agreement No 101073879. Content reflects only the authors’ view, and the Research Executive Agency (REA)/European Commission is not responsible for any use that may be made of the information it contains.

Institutional Review Board Statement

Institute of Communication and Computer Systems has already submitted an ethical approval (approval date: 29.02.2024) for this research in the context of the European Union’s Horizon Europe research and innovation program under grant agreement No. 101073879.

Informed Consent Statement

Informed consent was obtained from the researchers involved in real-time experiments.

Data Availability Statement

Restrictions apply to the availability of these data. Data were obtained from [60] and are available from the authors of [60] with the permission of [60].

Acknowledgments

This work is a part of the Flexi-cross project. This project has received funding from the European Union’s Horizon Europe research and innovation programme under grant agreement No 101073879. Content reflects only the authors’ view, and the Research Executive Agency (REA)/European Commission is not responsible for any use that may be made of the information it contains.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this paper:
AI: Artificial intelligence
AUC-ROC: Area under the receiver operating characteristic curve
CNN: Convolutional neural network
DET: Detection error tradeoff
DL: Deep learning
DNN: Deep neural network
FACE-VI: Face verification-identification
FAR: False acceptance rate
FNR: False negative rate
GFLOPs: Giga floating point operations
LFW: Labeled Faces in the Wild
MR: Mixed reality
UI: User interface
YTF: YouTube Faces

References

  1. Public-IvS. Available online: http://www.cbsr.ia.ac.cn/users/xiangyuzhu/projects/LBL/main.htm (accessed on 14 February 2025).
  2. Kortli, Y.; Jridi, M.; Al Falou, A.; Atri, M. Face Recognition Systems: A Survey. Sensors 2020, 20, 342. [Google Scholar] [CrossRef] [PubMed]
  3. Li, L.; Mu, X.; Li, S.; Peng, H. A Review of Face Recognition Technology. IEEE Access 2020, 8, 139110–139120. [Google Scholar] [CrossRef]
  4. Samatas, G.G.; Papakostas, G.A. Biometrics: Going 3D. Sensors 2022, 22, 6364. [Google Scholar] [CrossRef]
  5. Jayaraman, U.; Gupta, P.; Gupta, S.; Arora, G.; Tiwari, K. Recent development in face recognition. Neurocomputing 2020, 408, 231–245. [Google Scholar] [CrossRef]
  6. Waller, B.M.; Kavanagh, E.; Micheletta, J.; Clark, P.R.; Whitehouse, J. The face is central to primate multicomponent signals. Int. J. Primatol. 2024, 45, 526–542. [Google Scholar] [CrossRef]
  7. Sharma, R.; Sharma, V.K.; Singh, A. A Review Paper on Facial Recognition Techniques. In Proceedings of the 2021 Fifth International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC), Palladam, India, 11–13 November 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 617–621. [Google Scholar] [CrossRef]
  8. Sanchez-Moreno, A.S.; Olivares-Mercado, J.; Hernandez-Suarez, A.; Toscano-Medina, K.; Sanchez-Perez, G.; Benitez-Garcia, G. Efficient Face Recognition System for Operating in Unconstrained Environments. J. Imaging 2021, 7, 161. [Google Scholar] [CrossRef] [PubMed]
  9. Guérin, J.; Gibaru, O.; Thiery, S.; Nyiri, E. CNN features are also great at unsupervised classification. arXiv 2017, arXiv:1707.01700. [Google Scholar] [CrossRef]
  10. Voulodimos, A.; Doulamis, N.; Doulamis, A.; Protopapadakis, E. Deep Learning for Computer Vision: A Brief Review. Comput. Intell. Neurosci. 2018, 2018, 7068349. [Google Scholar] [CrossRef]
  11. Douklias, A.; Karagiannidis, L.; Misichroni, F.; Amditis, A. Design and Implementation of a UAV-Based Airborne Computing Platform for Computer Vision and Machine Learning Applications. Sensors 2022, 22, 2049. [Google Scholar] [CrossRef]
  12. Maltezos, E.; Doulamis, A.; Ioannidis, C. Improving the visualisation of 3D textured models via shadow detection and removal. In Proceedings of the 2017 9th International Conference on Virtual Worlds and Games for Serious Applications (VS-Games), Athens, Greece, 6–8 September 2017; pp. 161–164. [Google Scholar] [CrossRef]
  13. Park, J.; Yang, H.; Roh, H.-J.; Jung, W.; Jang, G.-J. Encoder-Weighted W-Net for Unsupervised Segmentation of Cervix Region in Colposcopy Images. Cancers 2022, 14, 3400. [Google Scholar] [CrossRef]
  14. Bansal, A.; Ranjan, R.; Castillo, C.D.; Chellappa, R. Deep CNN Face Recognition: Looking at the Past and the Future. In Deep Learning-Based Face Analytics; Ratha, N.K., Patel, V.M., Chellappa, R., Eds.; Advances in Computer Vision and Pattern Recognition; Springer International Publishing: Cham, Switzerland, 2021; pp. 1–20. [Google Scholar] [CrossRef]
  15. Wang, W. The development of face recognition in accuracy and speed: A Review. In Proceedings of the 2021 2nd International Conference on Computing and Data Science (CDS), Stanford, CA, USA, 28–29 January 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 79–89. [Google Scholar] [CrossRef]
  16. Boutros, F.; Damer, N.; Kuijper, A. QuantFace: Towards Lightweight Face Recognition by Synthetic Data Low-bit Quantization. arXiv 2022, arXiv:2206.10526. [Google Scholar] [CrossRef]
  17. Perez-Montes, F.; Olivares-Mercado, J.; Sanchez-Perez, G.; Benitez-Garcia, G.; Prudente-Tixteco, L.; Lopez-Garcia, O. Analysis of Real-Time Face-Verification Methods for Surveillance Applications. J. Imaging 2023, 9, 21. [Google Scholar] [CrossRef] [PubMed]
  18. Zhang, A.; Lipton, Z.C.; Li, M.; Smola, A.J. Dive into Deep Learning. arXiv 2021, arXiv:2106.11342. [Google Scholar] [CrossRef]
  19. Han, F.; Ling, Q.-H.; Huang, D.-S. Modified constrained learning algorithms incorporating additional functional constraints into neural networks. Inf. Sci. 2008, 178, 907–919. [Google Scholar] [CrossRef]
  20. Sun, Y.; Wang, X.; Tang, X. Deep Learning Face Representation from Predicting 10,000 Classes. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 24–27 June 2014; IEEE: Piscataway, NJ, USA, 2014; pp. 1891–1898. [Google Scholar] [CrossRef]
  21. Sun, Y.; Wang, X.; Tang, X. Deep Learning Face Representation by Joint Identification-Verification. arXiv 2014, arXiv:1406.4773. [Google Scholar] [CrossRef]
  22. Liu, J.; Deng, Y.; Bai, T.; Wei, Z.; Huang, C. Targeting Ultimate Accuracy: Face Recognition via Deep Embedding. arXiv 2015, arXiv:1506.07310. [Google Scholar] [CrossRef]
  23. Schroff, F.; Kalenichenko, D.; Philbin, J. FaceNet: A unified embedding for face recognition and clustering. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; IEEE: Piscataway, NJ, USA, 2015; pp. 815–823. [Google Scholar] [CrossRef]
  24. Parkhi, O.M.; Vedaldi, A.; Zisserman, A. Deep Face Recognition. In Proceedings of the British Machine Vision Conference 2015, Swansea, UK, 7–10 September 2015; British Machine Vision Association: Durham, UK, 2015; pp. 41.1–41.12. [Google Scholar] [CrossRef]
  25. Wu, X.; He, R.; Sun, Z.; Tan, T. A Light CNN for Deep Face Representation with Noisy Labels. arXiv 2015, arXiv:1511.02683. [Google Scholar] [CrossRef]
  26. Zhong, Y.; Chen, J.; Huang, B. Towards End-to-End Face Recognition through Alignment Learning. arXiv 2017, arXiv:1701.07174. [Google Scholar] [CrossRef]
  27. Lu, Z.; Jiang, X.; Kot, A. Deep Coupled ResNet for Low-Resolution Face Recognition. IEEE Signal Process. Lett. 2018, 25, 526–530. [Google Scholar] [CrossRef]
  28. Nam, G.; Choi, H.; Cho, J.; Kim, I.-J. PSI-CNN: A Pyramid-Based Scale-Invariant CNN Architecture for Face Recognition Robust to Various Image Resolutions. Appl. Sci. 2018, 8, 1561. [Google Scholar] [CrossRef]
  29. Deng, J.; Guo, J.; Xue, N.; Zafeiriou, S. ArcFace: Additive Angular Margin Loss for Deep Face Recognition. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 4685–4694. [Google Scholar] [CrossRef]
  30. Meng, Q.; Zhao, S.; Huang, Z.; Zhou, F. MagFace: A Universal Representation for Face Recognition and Quality Assessment. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 14220–14229. [Google Scholar] [CrossRef]
  31. Ju, Y.-J.; Lee, G.-H.; Hong, J.-H.; Lee, S.-W. Complete Face Recovery GAN: Unsupervised Joint Face Rotation and De-Occlusion from a Single-View Image. In Proceedings of the 2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 3–8 January 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 1173–1183. [Google Scholar] [CrossRef]
  32. Boutros, F.; Siebke, P.; Klemt, M.; Damer, N.; Kirchbuchner, F.; Kuijper, A. PocketNet: Extreme Lightweight Face Recognition Network Using Neural Architecture Search and Multistep Knowledge Distillation. IEEE Access 2022, 10, 46823–46833. [Google Scholar] [CrossRef]
  33. Chen, S.; Liu, Y.; Gao, X.; Han, Z. MobileFaceNets: Efficient CNNs for Accurate Real-Time Face Verification on Mobile Devices. In Biometric Recognition; Zhou, J., Wang, Y., Sun, Z., Jia, Z., Feng, J., Shan, S., Ubul, K., Guo, Z., Eds.; Lecture Notes in Computer Science; Springer International Publishing: Cham, Switzerland, 2018; Volume 10996, pp. 428–438. [Google Scholar] [CrossRef]
  34. Tan, M.; Le, Q.V. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. arXiv 2019, arXiv:1905.11946. [Google Scholar] [CrossRef]
  35. Han, K.; Wang, Y.; Tian, Q.; Guo, J.; Xu, C.; Xu, C. GhostNet: More Features From Cheap Operations. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 1577–1586. [Google Scholar] [CrossRef]
  36. Hoo, S.C.; Ibrahim, H.; Suandi, S.A. ConvFaceNeXt: Lightweight Networks for Face Recognition. Mathematics 2022, 10, 3592. [Google Scholar] [CrossRef]
  37. Kim, S.; An, B.S.; Lee, E.C. Comparative Analysis of AI-Based Facial Identification and Expression Recognition Using Upper and Lower Facial Regions. Appl. Sci. 2023, 13, 6070. [Google Scholar] [CrossRef]
  38. Adjabi, I.; Ouahabi, A.; Benzaoui, A.; Taleb-Ahmed, A. Past, Present, and Future of Face Recognition: A Review. Electronics 2020, 9, 1188. [Google Scholar] [CrossRef]
  39. Merler, M.; Ratha, N.; Feris, R.S.; Smith, J.R. Diversity in Faces. arXiv 2019, arXiv:1901.10436. [Google Scholar] [CrossRef]
  40. Filntisis, P.P.; Retsinas, G.; Paraperas-Papantoniou, F.; Katsamanis, A.; Roussos, A.; Maragos, P. Visual Speech-Aware Perceptual 3D Facial Expression Reconstruction from Videos. arXiv 2022, arXiv:2207.11094. [Google Scholar] [CrossRef]
  41. Liu, Z.; Luo, P.; Wang, X.; Tang, X. Deep Learning Face Attributes in the Wild. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; IEEE: Piscataway, NJ, USA, 2015; pp. 3730–3738. [Google Scholar] [CrossRef]
  42. Huang, B.; Wang, Z.; Yang, J.; Han, Z.; Liang, C. Unlabeled Data Assistant: Improving Mask Robustness for Face Recognition. IEEE Trans. Inf. Forensics Secur. 2024, 19, 3109–3123. [Google Scholar] [CrossRef]
  43. Gururaj, H.L.; Soundarya, B.C.; Priya, S.; Shreyas, J.; Flammini, F. A Comprehensive Review of Face Recognition Techniques, Trends, and Challenges. IEEE Access 2024, 12, 107903–107926. [Google Scholar] [CrossRef]
  44. Huber, M.; Luu, A.T.; Terhörst, P.; Damer, N. Efficient Explainable Face Verification based on Similarity Score Argument Backpropagation. In Proceedings of the 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 3–8 January 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 4724–4733. [Google Scholar] [CrossRef]
  45. AI Act|Shaping Europe’s Digital Future. Available online: https://digital-strategy.ec.europa.eu/en/policies/regulatory-framework-ai (accessed on 14 February 2025).
  46. Anwar, A.; Raychowdhury, A. Masked Face Recognition for Secure Authentication. arXiv 2020, arXiv:2008.11104. [Google Scholar] [CrossRef]
  47. Mensah, J.A.; Appati, J.K.; Boateng, E.K.A.; Ocran, E.; Asiedu, L. FaceNet recognition algorithm subject to multiple constraints: Assessment of the performance. Sci. Afr. 2024, 23, e02007. [Google Scholar] [CrossRef]
  48. Rexha, B.; Shala, G.; Xhafa, V. Increasing Trustworthiness of Face Authentication in Mobile Devices by Modeling Gesture Behavior and Location Using Neural Networks. Future Internet 2018, 10, 17. [Google Scholar] [CrossRef]
  49. Lee, H.; Park, S.-H.; Yoo, J.-H.; Jung, S.-H.; Huh, J.-H. Face Recognition at a Distance for a Stand-Alone Access Control System. Sensors 2020, 20, 785. [Google Scholar] [CrossRef] [PubMed]
  50. Chillaron, M.; Dunai, L.; Fajarnes, G.P.; Lengua, I.L. Face detection and recognition application for Android. In Proceedings of the IECON 2015—41st Annual Conference of the IEEE Industrial Electronics Society, Yokohama, Japan, 9–12 November 2015; IEEE: Piscataway, NJ, USA, 2015; pp. 003132–003136. [Google Scholar] [CrossRef]
  51. Valenzuela, W.; Soto, J.E.; Zarkesh-Ha, P.; Figueroa, M. Face Recognition on a Smart Image Sensor Using Local Gradients. Sensors 2021, 21, 2901. [Google Scholar] [CrossRef]
  52. Talahua, J.S.; Buele, J.; Calvopiña, P.; Varela-Aldás, J. Facial Recognition System for People with and without Face Mask in Times of the COVID-19 Pandemic. Sustainability 2021, 13, 6900. [Google Scholar] [CrossRef]
  53. Liu, R.; Liu, Y.; Wang, Z.; Tian, H. Research on face recognition technology based on an improved LeNet-5 system. In Proceedings of the 2022 International Seminar on Computer Science and Engineering Technology (SCSET), Indianapolis, IN, USA, 8–9 January 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 121–123. [Google Scholar] [CrossRef]
  54. Zhou, Y.; Wu, N.; Hu, B.; Zhang, Y.; Qiu, J.; Cai, W. Implementation and Performance of Face Recognition Payment System Securely Encrypted by SM4 Algorithm. Information 2022, 13, 316. [Google Scholar] [CrossRef]
  55. Ríos-Sánchez, B.; Costa-da-Silva, D.; Martín-Yuste, N.; Sánchez-Ávila, C. Deep Learning for Facial Recognition on Single Sample per Person Scenarios with Varied Capturing Conditions. Appl. Sci. 2019, 9, 5474. [Google Scholar] [CrossRef]
  56. Reddy, T.J.; Ganesh, M.S.; Kumar Reddy, M.H.; Bhandhavya, C.; Jansi, R. Deep Learning-Powered Face Detection and Recognition for Challenging Environments. In Proceedings of the 2024 2nd International Conference on Intelligent Data Communication Technologies and Internet of Things (IDCIoT), Bengaluru, India, 4–6 January 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 1453–1459. [Google Scholar] [CrossRef]
  57. Durai, S.; Sujithra, T.; Satyam, B.V.; Keshetty, S.N.; Sagar, C.N.S.; Charan, A.S. Real Time Facial Recognition-Based Criminal Identification Using MTCNN. In Proceedings of the 2024 2nd International Conference on Sustainable Computing and Smart Systems (ICSCSS), Coimbatore, India, 10–12 July 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 1261–1265. [Google Scholar] [CrossRef]
  58. Živković, N.; Žarić, N. Development and implementation of a facial recognition intrusion detection system. In Proceedings of the 2024 28th International Conference on Information Technology (IT), Zabljak, Montenegro, 21–24 February 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 1–4. [Google Scholar] [CrossRef]
  59. Medvedev, I.; Shadmand, F.; Cruz, L.; Gonçalves, N. Towards Facial Biometrics for ID Document Validation in Mobile Devices. Appl. Sci. 2021, 11, 6134. [Google Scholar] [CrossRef]
  60. Zhu, X.; Liu, H.; Lei, Z.; Shi, H.; Yang, F.; Yi, D.; Qi, G.; Li, S.Z. Large-Scale Bisample Learning on ID Versus Spot Face Recognition. Int. J. Comput. Vis. 2019, 127, 684–700. [Google Scholar] [CrossRef]
  61. Apache Kafka. Available online: https://kafka.apache.org/ (accessed on 26 January 2022).
  62. MaoCai. Pytorch_Face_Recognition. Python. 31 March 2022. Available online: https://github.com/QuasarLight/Pytorch_Face_Recognition (accessed on 21 November 2023).
  63. King, D.E. Dlib-ml: A Machine Learning Toolkit. J. Mach. Learn. Res. 2009, 10, 1755–1758. [Google Scholar]
  64. OpenCV—Open Computer Vision Library. Available online: https://opencv.org/ (accessed on 14 February 2025).
  65. dlib C++ Library. Available online: https://dlib.net/ (accessed on 7 July 2025).
  66. davisking GitHub. Davisking/Dlib-Models: Trained Model Files for Dlib Example Programs. Available online: https://github.com/davisking/dlib-models (accessed on 7 July 2025).
  67. Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.-C. MobileNetV2: Inverted Residuals and Linear Bottlenecks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 4510–4520. [Google Scholar] [CrossRef]
  68. MobileNet v2|PyTorch. Available online: https://pytorch.org/hub/pytorch_vision_mobilenet_v2/ (accessed on 14 February 2025).
  69. Martindez-Diaz, Y.; Luevano, L.S.; Mendez-Vazquez, H.; Nicolas-Diaz, M.; Chang, L.; Gonzalez-Mendoza, M. ShuffleFaceNet: A Lightweight Face Architecture for Efficient and Highly-Accurate Face Recognition. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), Seoul, Republic of Korea, 27–28 October 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 2721–2728. [Google Scholar] [CrossRef]
  70. Ye, J. Cosine similarity measures for intuitionistic fuzzy sets and their applications. Math. Comput. Model. 2011, 53, 91–97. [Google Scholar] [CrossRef]
  71. Deshpande, M.; Karypis, G. Item-based top-N recommendation algorithms. ACM Trans. Inf. Syst. 2004, 22, 143–177. [Google Scholar] [CrossRef]
  72. Huang, J.; Ling, C.X. Using AUC and accuracy in evaluating learning algorithms. IEEE Trans. Knowl. Data Eng. 2005, 17, 299–310. [Google Scholar] [CrossRef]
  73. Xu, X.; Huang, Y.; Shen, P.; Li, S.; Li, J.; Huang, F.; Li, Y.; Cui, Z. Consistent Instance False Positive Improves Fairness in Face Recognition. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 578–586. [Google Scholar] [CrossRef]
  74. Gopakumar, R.; Kotegar, K.A.; Vishal Anand, M. A Quantitative Study on the FaceNet System. In Advanced Computational and Communication Paradigms; Borah, S., Gandhi, T.K., Piuri, V., Eds.; Lecture Notes in Networks and Systems; Springer Nature: Singapore, 2023; Volume 535, pp. 211–223. [Google Scholar] [CrossRef]
  75. Akingbesote, D.; Zhan, Y.; Maskeliūnas, R.; Damaševičius, R. Improving Accuracy of Face Recognition in the Era of Mask-Wearing: An Evaluation of a Pareto-Optimized FaceNet Model with Data Preprocessing Techniques. Algorithms 2023, 16, 292. [Google Scholar] [CrossRef]
Figure 1. Architecture of the proposed face verification and identification system.
Figure 2. ROC curves and AUC scores attained by each face recognition algorithm and the baseline model on the curated ID vs. Spot dataset.
Figure 3. Trade-off between the FNR and the FAR.
Figure 4. FNR and FAR as functions of the similarity threshold.
Figure 5. Utilization of the FACE-VI mobile application in real-time terms.
Figure 6. Welcome page, as well as other related pages from the FACE-VI app.
Figure 7. Face identification results via the FACE-VI app.
Figure 8. Face verification results via the FACE-VI app.
Figure 9. (Left) Utilization of the MR glasses in real-time terms. (Right) Face identification results via the MR glasses.
Figure 10. Visualization of the events extracted by the FACE-VI app to the Web-based UI.
Figure 11. Sample *.JSON file for information sharing from the FACE-VI app for a mis-verification case via middleware to other external systems.
Table 1. Face identification evaluation via Rank-1 and Rank-5 identification rates attained by each face recognition algorithm and the baseline model on the curated ID vs. Spot dataset.
Model              Rank-1 (%)    Rank-5 (%)    Inference Time (ms)
Dlib (baseline)    79.5          81.3          30
ArcFace            92            96.1          97
MobileNetV2        85.2          89.6          20
FaceNet            91.8          95.8          52
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

