Article

Resolution-Aware Deep Learning with Feature Space Optimization for Reliable Identity Verification in Electronic Know Your Customer Processes

by
Mahasak Ketcham
1,
Pongsarun Boonyopakorn
2,* and
Thittaporn Ganokratanaa
3
1
Department of Information Technology Management, King Mongkut’s University of Technology Thonburi, Bangkok 10800, Thailand
2
Department of Digital Network and Information Security Management, King Mongkut’s University of Technology Thonburi, Bangkok 10800, Thailand
3
Applied Computer Science Programme, King Mongkut’s University of Technology Thonburi, Bangkok 10140, Thailand
*
Author to whom correspondence should be addressed.
Mathematics 2025, 13(11), 1726; https://doi.org/10.3390/math13111726
Submission received: 5 May 2025 / Revised: 18 May 2025 / Accepted: 20 May 2025 / Published: 23 May 2025
(This article belongs to the Special Issue Advanced Studies in Mathematical Optimization and Machine Learning)

Abstract:
In modern digital transactions involving government agencies, financial institutions, and commercial enterprises, reliable identity verification is essential to ensure security and trust. Traditional methods, such as submitting photocopies of ID cards, are increasingly susceptible to identity theft and fraud. To address these challenges, this study proposes a novel and robust identity verification framework that integrates super-resolution preprocessing, a convolutional neural network (CNN), and Monte Carlo dropout-based Bayesian uncertainty estimation for enhanced facial recognition in electronic know your customer (e-KYC) processes. The key contribution of this research lies in its ability to handle low-resolution and degraded facial images, simulating real-world conditions where image quality is inconsistent, while providing confidence-aware predictions to support transparent and risk-aware decision making. The proposed model is trained on facial images resized to 24 × 24 pixels, with a super-resolution module enhancing feature clarity prior to classification. By incorporating Monte Carlo dropout, the system estimates predictive uncertainty, addressing critical limitations of conventional black-box deep learning models. Experimental evaluations confirmed the effectiveness of the framework, achieving a classification accuracy of 99.7%, precision of 99.2%, recall of 99.3%, and an AUC score of 99.5% under standard testing conditions. The model also demonstrated strong robustness against noise and image blur, maintaining reliable performance even under challenging input conditions. In addition, the proposed system is designed to comply with international digital identity standards, including the Identity Assurance Level (IAL) and Authenticator Assurance Level (AAL), ensuring practical applicability in regulated environments. Overall, this research contributes a scalable, secure, and interpretable solution that advances the application of deep learning and uncertainty modeling in real-world e-KYC systems.

1. Introduction

In contemporary interactions with governmental bodies, financial institutions, healthcare facilities, and commercial enterprises, confirming individuals’ identities constitutes a crucial initial step in safeguarding access, ensuring trust, and enhancing operational security. As these interactions are increasingly conducted online, secure and trustworthy identity verification has become critical. Traditional verification methods, such as submitting photocopies of identification documents, are no longer sufficient, as they are highly vulnerable to identity theft and fraud. Beyond the risk of fraud, these conventional approaches also suffer from several operational limitations. Manual identity verification is often time-consuming and labor-intensive, leading to significant delays in service delivery. For example, a report by [1] found that manual onboarding processes can take up to 10 times longer than digital alternatives, directly impacting user satisfaction and operational efficiency. In addition, the reliance on physical documents results in higher administrative costs and introduces risks associated with human error during document verification. According to a study by Deloitte [2], organizations incur an average cost of USD 20 to USD 30 per customer for manual identity verification processes. Moreover, traditional verification systems lack scalability and are ill-suited to the demands of modern digital platforms, especially in remote onboarding scenarios. With the rapid growth of online financial services and government digital transformation initiatives, the inability of conventional methods to support seamless remote identity verification has become a significant barrier to service expansion. The World Bank’s Identification for Development (ID4D) initiative also highlights that nearly 1 billion people globally remain without formal identification, further emphasizing the need for more accessible and efficient digital identity solutions. In response to these challenges, this research introduces a robust identity verification framework that combines super-resolution preprocessing, a convolutional neural network (CNN), and Monte Carlo dropout-based Bayesian uncertainty estimation to improve facial recognition within electronic know your customer (e-KYC) processes.
Traditionally, this verification endeavor has manifested itself through a variety of approaches, including access card exchanges for facility entry, the presentation of government-issued documentation, and the submission of identification copies for banking or legal transactions. However, the diversity in verification protocols across sectors has led to the proliferation of disconnected systems, resulting in fragmented data repositories, inconsistent user experiences, and a lack of standardized mechanisms that support long-term transactional reusability. As a consequence, even returning clients frequently encounter the burden of resubmitting documents, while persistent instances of falsified or misrepresented credentials continue to contribute to fraudulent activities with widespread ramifications. To address such challenges, national-level digital transformation efforts have been launched to enhance the security, efficiency, and interoperability of digital identity systems. Anchored in the National Strategy (2018–2037) [3], Thailand’s digital development agenda emphasizes cross-sectoral technological integration, particularly in the realms of digital industries, artificial intelligence (AI), and secure data infrastructure. As part of this strategy, the Ministry of Digital Economy and Society (MDES) issued Ministerial Order No. 75/2560 [4], which mandates the creation of a committee tasked with the development and standardization of a national digital identity ecosystem. The primary aim of this initiative is to mitigate systemic inefficiencies, reduce redundancy, and establish robust verification protocols applicable to both public and private sectors. Subsequently, the Electronic Transactions Development Agency (ETDA) was appointed to formulate strategic frameworks and technical guidelines for identity verification and authentication. These efforts culminated in the development of the identity assurance level (IAL) and the authenticator assurance level (AAL), which serve as standardized metrics for measuring the strength and reliability of digital identity systems [5,6]. These levels, based on the guidelines outlined in NIST Special Publication 800-63A, ensure that verification processes are implemented with consistent levels of rigor, scalability, and security, thereby aligning domestic policies with global digital identity assurance standards. In tandem with these regulatory advancements, academic research has increasingly explored technological methodologies capable of achieving these objectives. Central to this scholarly pursuit is the application of deep learning algorithms, particularly convolutional neural networks (CNN), in the context of facial recognition and biometric verification. CNNs have demonstrated considerable success in extracting hierarchical features from facial images and performing accurate identity classification. Such image-based methods have become integral to the implementation of electronic know your customer (e-KYC) systems, which aim to digitally establish and verify user identities in a secure and automated manner.
Despite these advancements, practical implementations continue to face several challenges. In real-world scenarios, facial images are often of low resolution, affected by suboptimal lighting, or captured at varying time points, which leads to significant degradation in recognition performance. Additionally, conventional deep learning models often operate as deterministic black boxes, offering limited insight into the confidence or reliability of their predictions, which is an issue of particular concern when deployed in critical domains such as finance or national security. To bridge this gap, the field is increasingly turning to statistical machine learning approaches, particularly those rooted in Bayesian inference, to enhance the robustness, interpretability, and reliability of identity verification systems. Bayesian methods provide a mathematically grounded framework for quantifying uncertainty in model predictions, enabling the system not only to classify but also to evaluate its confidence in those classifications. This capacity becomes crucial when working with noisy or degraded image inputs, as it allows for more cautious decision making and better risk management in downstream processes [7,8]. Recent innovations such as Monte Carlo dropout, Bayesian neural networks, and variational inference techniques allow deep learning models to approximate posterior distributions over weights or predictions, thereby providing insights into both epistemic (model-related) and aleatoric (data-related) uncertainties. These techniques also support the calibration of prediction scores, aiding system designers and decision makers in assessing when a prediction should be trusted or verified by alternative means. Although numerous studies have applied deep learning techniques, particularly convolutional neural networks (CNNs), for facial image-based identity verification, several critical limitations remain, which justify the need for this research. A primary concern lies in the quality of input images, which in real-world scenarios are often low-resolution, poorly lit, or captured under non-ideal conditions, significantly degrading the performance of conventional CNN models. Moreover, most existing approaches operate as opaque “black box” systems, offering no insight into the confidence of their predictions. This lack of interpretability poses a considerable risk in high-stakes applications such as financial services or national-level identity verification. Furthermore, prior research has seldom incorporated mechanisms for quantifying uncertainty, either in the data or the model, which is essential for cautious and risk-aware decision making, particularly in ambiguous or noisy input conditions. In response to these challenges, this study proposes a novel framework that integrates CNNs with Bayesian techniques to develop an identity verification model that is not only accurate but also transparent and capable of estimating the reliability of its own predictions. This approach aims to support secure, interpretable, and scalable identity verification within e-KYC systems, in alignment with international standards for digital trust and assurance. While previous studies have demonstrated the effectiveness of convolutional neural networks (CNNs) in facial recognition and identity verification tasks, most have been developed and evaluated under controlled environments, limiting their generalizability in real-world applications.
These models often assume consistent lighting conditions, high-resolution imagery, and minimal temporal variation, which is rarely the case in practical settings, especially within e-KYC frameworks where images may be acquired from mobile devices under diverse conditions. This discrepancy underscores a fundamental gap in robustness and adaptability. Furthermore, despite achieving high classification accuracy, existing CNN-based models typically function as deterministic systems and offer no estimation of predictive uncertainty. This black-box nature poses significant risks when these systems are deployed in security-critical domains such as finance, healthcare, or national governance, where decisions must be justifiable, auditable, and trustworthy. The lack of interpretable and confidence-aware outputs hinders their integration into systems requiring compliance with identity assurance standards like the IAL (Identity Assurance Level) and AAL (Authenticator Assurance Level). In addition, while recent advancements in Bayesian deep learning provide promising tools for quantifying model uncertainty, such techniques have not yet been adequately applied to identity verification scenarios, particularly in e-KYC contexts. Most prior works focus either on model performance metrics or algorithmic novelty, but fail to address the practical necessity for reliability estimation, risk calibration, or system-level interpretability under suboptimal input conditions.
This research seeks to fill these gaps by introducing a hybrid approach that combines the feature extraction power of CNNs with Bayesian inference methods to produce a system capable of both high accuracy and reliable uncertainty estimation. By focusing on real-world image quality variations and aligning with international identity verification standards, the proposed framework addresses current limitations and contributes a robust, interpretable, and regulation-ready solution for digital identity systems. A further objective is to develop a CNN-based identity verification framework optimized for e-KYC applications, while exploring the integration of Bayesian-inspired techniques to improve its practical robustness. The proposed system is evaluated on facial images captured under varying conditions, including low resolution and time-disjoint comparisons, to simulate realistic use cases. Special emphasis is placed on the model’s ability to provide interpretable outputs and reliable confidence measures, which are essential for compliance with IAL and AAL verification requirements. By combining advanced image processing, deep learning, and Bayesian statistical methods, this study aims to contribute a scalable, secure, and intelligent solution for digital identity verification. In doing so, it supports both the national digital strategy and the broader goal of aligning with international standards, while advancing the theoretical understanding of uncertainty-aware machine learning in real-world imaging applications.
Research questions (RQs). This study seeks to answer the following research questions:
  • RQ1: How can identity verification systems effectively improve recognition accuracy when dealing with low-resolution and degraded facial images, as commonly encountered in real-world e-KYC environments?
  • RQ2: How does the integration of uncertainty estimation methods, such as Monte Carlo Dropout, contribute to enhancing the reliability and transparency of identity verification decisions?
  • RQ3: Is the proposed identity verification framework compliant with international standards for digital identity assurance, specifically the Identity Assurance Level (IAL) and Authenticator Assurance Level (AAL)?
These research questions frame the development and evaluation of the proposed identity verification framework, ensuring both technical robustness and practical applicability in regulated digital environments.
The main contribution of this article is the development of a robust and interpretable identity verification framework that combines convolutional neural networks (CNNs) with Bayesian inference techniques, specifically tailored for real-world e-KYC applications. This hybrid approach addresses several critical limitations observed in the existing literature, including the lack of resilience to poor quality input images and the absence of uncertainty estimation in conventional deep learning models. The proposed system is capable of processing facial images captured under suboptimal conditions, while also quantifying both epistemic and aleatoric uncertainties, enabling it to provide confidence-aware predictions. These features are particularly relevant for applications that require high levels of trust, auditability, and compliance with international standards such as the Identity Assurance Level (IAL) and Authenticator Assurance Level (AAL). Furthermore, the model is designed with scalability and practical deployment in mind, offering a solution that not only advances technical performance but also meets the operational demands of national digital identity systems.

2. Related Works

Within the scope of this research, the authors conducted an extensive investigation and data-gathering initiative to formulate a framework and delineate guidelines for the examination of customer recognition processes through electronic channels, with a particular emphasis on identity authentication. Deep learning methodologies, specifically convolutional neural network (CNN) algorithms, were predominantly employed in this study. Recent studies have extensively investigated identity verification techniques, particularly in addressing the challenges related to low-resolution facial images and identity authentication over electronic platforms. While many works provide valuable insights, gaps remain regarding their practical application in real-world e-KYC scenarios.
Ouyang et al. [8] explored the use of CNNs to improve facial recognition for low-quality images. Their approach involved first extracting features from high-resolution images and then upscaling low-quality images to 56 × 56 pixels before applying the same feature extraction technique. Although their method achieved impressive accuracy rates of 98.2%, 99.1%, and 99.5% for low-resolution images sized 20 × 20, 24 × 24, and 36 × 36 pixels, respectively, it primarily focused on feature extraction without addressing decision confidence or uncertainty estimation.
Gal and Ghahramani [9] reinterpreted dropout as an approximation to Bayesian inference, enabling neural networks to estimate uncertainty. By applying dropout during both training and testing (MC dropout), models can capture predictive variance without architectural changes. This method offers a practical and efficient approach to uncertainty modeling in deep learning.
Chen et al. [10] focused on age-invariant face recognition using deep learning with SVM classifiers, applying their models to datasets such as FGNET, MORPH, and CACD. While their results showed promising accuracy across diverse age groups, their solution was limited by its reliance on high-quality inputs and lacked robustness under degraded imaging conditions often encountered in real-world e-KYC applications.
Similarly, Singh et al. [11] proposed a face recognition framework utilizing the synthesis via hierarchical sparse representation (SHSR) algorithm. Their work focused on comparing high- and low-resolution image sets using multiple rounds of sparse representation. While the SHSR algorithm improved performance under varying resolutions, it did not integrate uncertainty modeling or regulatory compliance considerations, which are crucial for practical e-KYC deployment.
Li et al. [12] introduced a comparative face recognition approach using various deep learning architectures, including Siamese networks, Matchnet, and six-channel networks. Their framework involved aggressively downsampling facial images before applying super-resolution techniques for final comparison. Although this approach improved visual similarity matching, it lacked mechanisms to assess prediction reliability, which is critical in high-stakes verification scenarios.
Iqbal et al. [13] proposed an age group classification framework based on facial wrinkle and skin texture analysis, introducing the directional age-primitive pattern (DAPP) algorithm. Although their method effectively classified age groups, it did not address the broader challenges of identity verification under low-quality imaging environments or the need for compliance with digital identity standards.
The collated information served as the cornerstone for the development of theoretical constructs and a comprehensive review of pertinent research works. These theoretical constructs and research insights laid the groundwork for the establishment of systematic guidelines and methodological approaches for the scrutiny of customer recognition processes via electronic platforms [14,15,16,17,18,19,20,21,22,23,24].
While each of these studies contributes valuable approaches to specific technical challenges, none have comprehensively addressed the combined issues of low-resolution image processing, uncertainty estimation, and compliance with identity assurance frameworks. This research builds upon those foundations to propose a more holistic and practically deployable solution for secure and trustworthy identity verification in electronic environments. Recent advancements in facial recognition, super-resolution techniques, and uncertainty estimation have directly influenced the development of more robust and reliable identity verification systems. Zhang et al. [25] introduced an attention-guided multi-scale interaction network for face super-resolution, which significantly enhances facial detail restoration from low-resolution inputs. This advancement supports our proposed framework’s objective of improving recognition accuracy when dealing with degraded image quality.
Wijaya et al. [26] proposed a GAN-based reconstruction technique aimed at enhancing the quality of low-resolution and degraded facial images prior to identity recognition. Their approach focused on restoring critical facial details such as facial contours, eyes, nose, and mouth through a specially designed GAN architecture that incorporates perceptual and adversarial loss functions to improve the realism and sharpness of reconstructed images. Experimental results demonstrated significant improvements in recognition accuracy, with over 12% accuracy gain compared to traditional interpolation methods like Bicubic and earlier super-resolution models such as SRCNN. The proposed method also showed high robustness against common image degradations including noise, blur, and geometric distortions. While their approach successfully improved the visual quality and recognition performance of facial images, it did not incorporate uncertainty estimation mechanisms. As a result, the system lacks the ability to provide confidence-aware predictions, which are essential for decision making in high-stakes environments such as e-KYC and financial services. In contrast, our proposed framework integrates not only super-resolution and recognition capabilities but also incorporates Monte Carlo dropout for predictive uncertainty estimation, addressing both accuracy and reliability concerns in practical deployment scenarios.
In the area of predictive uncertainty, Chen et al. [10] proposed a Bayesian identity cap method to deliver calibrated uncertainty estimations in facial recognition systems, ensuring that decision making is more transparent and risk-aware. Similarly, the study by Verma et al. [27] implements a transfer learning approach, building upon the DenseNet-121 convolutional neural network to detect diabetic retinopathy. The authors apply Bayesian approximation techniques, including Monte Carlo dropout, to represent the posterior predictive distribution, allowing for uncertainty evaluation in model predictions. Their experiments demonstrate that the Bayesian-augmented DenseNet-121 outperforms state-of-the-art models in test accuracy, achieving 97.68% for the Monte Carlo dropout model. Moreover, the comprehensive survey by Gawlikowski et al. [28] emphasizes that integrating uncertainty estimation with deep neural networks is crucial for developing trustworthy AI systems, particularly in applications involving security and identity verification. This supports the design philosophy of our proposed model, which combines image enhancement, predictive uncertainty estimation, and compliance with international digital identity standards to deliver a scalable, secure, and explainable solution for real-world e-KYC environments.

2.1. Conceptual Framework and Principles of Customer Recognition Process

According to the announcements issued by the Bank of Thailand, specifically Notices No. 7/2559 and No. 31/2562, pertaining to the criteria for accepting deposits or receiving funds from the public, guidelines regarding customer recognition processes encompass identity verification and authentication. These processes can be executed through two primary modalities: face-to-face encounters and non-face-to-face interactions via electronic channels. The operational procedures for each modality are detailed as follows. In face-to-face customer encounters, the interaction with a bank representative begins with the provision of personal identification information, the presentation of identity documents, and physical signature verification; subsequently, identity verification entails the examination of identity documents by authorized personnel to ensure accuracy and authenticity; finally, biometric data comparison is conducted between the individual’s biometric information and the data presented in the identity documents for authentication. In non-face-to-face transactions, the interaction involves electronic exchanges between the service provider’s system and the customer; the customer inputs personal information and submits copies of identity documents electronically; facial images are captured by the electronic device, along with images of the presented identity documents; subsequently, biometric data comparison is performed between the captured facial images and the images on the identity documents for authentication. Hence, the process of customer recognition via electronic channels enables customers and service providers to independently conduct identity verification and authentication through electronic devices or computer systems equipped with customer recognition services.

2.2. Guidelines for the Use of Digital IDs in Thailand

In accordance with the guidelines outlined by the Electronic Transactions Development Agency (ETDA), under the supervision of the Ministry of Digital Economy and Society of Thailand, a working group has been established to study and propose recommendations regarding the use of digital IDs in the country. These recommendations, documented in [3,4], serve as guidelines for registration, identity verification, and authentication processes. The document on registration and identity verification guidelines provides recommendations for individuals seeking to register their identities for service usage, categorized according to levels of trustworthiness. The levels, namely IAL1, IAL2, and IAL3, each have three sub-levels, with varying degrees of stringency and requirements for identity verification. The authentication document offers guidance on verifying and managing identity assertions based on the trust levels of the authentication mechanisms used. The levels, namely AAL1, AAL2, and AAL3, with two sub-levels for AAL2, define the stringency and requirements for identity authentication. The recommended document for registration and identity verification guidelines, suitable for adaptation in simulating the process of customer identification through electronic channels, serves as a basis for research in this study. In this research endeavor, the trustworthiness level of identity assurance (IAL) at level 2.1 is explored and utilized to develop a process termed “physical comparison”. This process is enhanced using deep learning techniques, employing convolutional neural network (CNN) models to learn from clear and blurred facial images of individuals.
Several deep learning-based super-resolution techniques have been proposed in recent years, including SRCNN, VDSR, EDSR, and ESRGAN. These methods differ in complexity, performance, and suitability for specific application domains. As summarized in Table 1, while GAN-based methods such as ESRGAN generate high perceptual quality, they may produce hallucinated features and are thus less appropriate for identity-sensitive tasks like e-KYC. In contrast, SRCNN offers a balance between computational efficiency and enhancement quality, making it suitable for low-resolution facial image scenarios where real-time processing and system transparency are essential. Consequently, this study adopts an SRCNN-inspired approach for upscaling input images prior to classification.
The novelty of this research lies in the development of a unified and practical framework that addresses key limitations in existing e-KYC identity verification systems. Unlike conventional approaches that focus solely on achieving high classification accuracy under controlled conditions, this study introduces an innovative integration of super-resolution preprocessing, lightweight deep learning, and Bayesian uncertainty estimation within a single architecture. This combination has not been previously explored in the context of digital identity verification, particularly for scenarios involving low-quality facial images captured in real-world environments.
A distinctive aspect of this work is the use of a super-resolution convolutional neural network (SRCNN) to enhance degraded facial images prior to classification. This approach effectively mitigates common issues associated with low-resolution and blurred images, which frequently occur in mobile-based e-KYC processes. Furthermore, the incorporation of Bayesian-inspired uncertainty estimation adds a critical layer of interpretability and risk-awareness, enabling the system to provide confidence scores alongside predictions, which is an essential feature for compliance-driven applications in finance, government, and enterprise security. In addition to technical advancements, the proposed framework is explicitly designed to align with international standards such as the Identity Assurance Level (IAL) and Authenticator Assurance Level (AAL), ensuring that the solution not only performs well but also adheres to regulatory and security requirements. The lightweight nature of the architecture further enhances its practicality, allowing for efficient deployment on resource-constrained devices without compromising accuracy or reliability. This research presents a novel contribution by bridging the gap between deep learning innovation and practical, standards-compliant identity verification. It delivers a robust, explainable, and scalable solution capable of handling real-world data imperfections while supporting transparent and secure decision making in modern e-KYC systems.

3. Proposed Method

This study explores a comprehensive framework for customer identification via electronic, non-face-to-face methods. These procedures are conducted through digital platforms either provided by service entities or self-operated by individuals. The identification process is closely tied to registration protocols and involves stringent verification steps that emphasize document authenticity and image-based identity matching, as outlined in Section 2.1. Central to this verification process is the comparison of facial features extracted from live or uploaded images against reference photographs embedded in official identification documents, most commonly national ID cards. In line with this objective, prior literature reveals numerous research efforts focusing on the evolution of artificial neural network architectures to address the challenges posed by image variability in real-world identity verification contexts. Of particular relevance are studies examining the performance of dual-network architectures designed to compare facial images that differ in sharpness or clarity, a scenario frequently encountered when contrasting live captures against ID card photographs. This is further supplemented by research into super-resolution techniques that aim to enhance image quality and preserve the critical facial features necessary for reliable comparison. Motivated by these insights, this study introduces a novel convolutional neural network (CNN) model tailored to perform comparative analysis between facial images exhibiting varying levels of resolution and sharpness, as illustrated in Figure 1. The proposed model is not only optimized for visual feature extraction but is also designed to integrate uncertainty-aware components, such as softmax-based confidence scoring or Monte Carlo dropout, enabling a more robust and interpretable decision-making process during facial comparisons. To train and validate this model, a carefully curated dataset of facial images is required. The well-established CASIA-WebFace dataset is selected for this purpose due to its wide demographic coverage and sufficient variation in facial expressions, lighting conditions, and resolution levels. The dataset comprises 494,414 images of size 250 × 250 pixels, capturing a diverse set of individuals. As shown in Table 2, these images are categorized by key demographic attributes, including dark-skinned females, non-dark-skinned females, dark-skinned males, and non-dark-skinned males. This classification not only supports the development of a balanced training set but also facilitates the subsequent evaluation of model performance across different demographic groups, thereby addressing potential bias and supporting fairness in biometric identity verification. Moreover, by aligning the proposed approach with the principles of statistical machine learning, this study aims to advance current e-KYC systems towards greater robustness, fairness, and uncertainty-awareness, all of which are essential in real-world identity verification scenarios involving diverse and potentially degraded image inputs.
As shown in Figure 1a, the overall architecture of the proposed identity verification framework integrates super-resolution preprocessing, DRRN-based convolutional neural networks (CNNs), and Monte Carlo dropout for uncertainty estimation. This framework enhances low-resolution facial images, extracts deep facial features, and supports reliable identity verification through confidence-aware predictions. As shown in Figure 1b, the detailed structure of the recursive residual block was employed in the DRRN architecture. The block utilizes multiple convolutional layers with residual connections and activation functions, recursively applied to progressively refine the image resolution and improve feature representation.
Subsequently, the facial image dataset was divided into two subsets to separately train the convolutional neural network high-resolution branch (CNNHRB) and the low-resolution branch (CNNLRB), yielding a total of 58,350 images. For the CNNLRB subset, each facial image was resized to 24 × 24 pixels using the cv2.resize() function from the OpenCV library. In this process, bilinear interpolation was chosen due to its balance between preserving visual detail and maintaining computational efficiency, which is particularly suitable for low-resolution learning scenarios, as demonstrated in Figure 2. Before initiating the training phase, both branches underwent a preprocessing stage where all images were passed through a facial detection algorithm. This step was designed to isolate relevant facial regions and reduce unnecessary background information, thereby optimizing the overall computational load and accelerating the training process, as shown in Figure 3.
After the facial images underwent facial detection processes, they were categorized into datasets for training and validation purposes. Each dataset comprised two sets of data for training the convolutional neural network high-resolution branch (CNNHRB) and the convolutional neural network low-resolution branch (CNNLRB). The facial image datasets were stored on a computer system for neural network training, structured into six main files: hr-train, hr-val, hr-test, lr-train, lr-val, and lr-test, as illustrated in Figure 4. Within each main file, four sub-files stored the facial image samples that had undergone facial detection, categorized on a per-individual basis. Each individual was labeled with an eight-digit file name, as depicted in Figure 5. The last two digits of the file name were used to denote the individual’s gender and skin tone, as outlined in Table 2. This section outlines the comprehensive methodology for preparing, processing, and structuring facial image data to train a dual-branch convolutional neural network (CNN) for identity verification. The design leverages both high-resolution (HR) and low-resolution (LR) image inputs and supports the potential integration of Bayesian-based uncertainty estimation to improve model reliability and robustness. All the steps of the equations are provided in Appendix A.
  • Dataset Preparation
Let $\mathcal{D} = \{x_i\}_{i=1}^{N}$ be the set of original RGB facial images, where each image $x_i \in \mathbb{R}^{250 \times 250 \times 3}$ and $N = 29{,}175$. The dataset is duplicated into two subsets:
$$\mathcal{D}_{HR} = \mathcal{D}, \qquad \mathcal{D}_{LR} = \{x_i^{LR}\}_{i=1}^{N}$$
To obtain D L R , each image was downsampled to dimensions of 24 × 24 pixels using the cv2.resize() function from the OpenCV library, with bicubic interpolation applied to preserve critical facial features while maintaining computational efficiency.
$$x_i^{LR} = \mathrm{Resize}(x_i,\ 24 \times 24), \qquad x_i \in \mathcal{D}$$
This results in a combined dataset of 58,350 facial images.
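For illustration, this downsampling step can be written as a short Python sketch; the file path is a placeholder, and the bicubic flag follows the description above.

```python
import cv2
import numpy as np

# Minimal sketch of the downsampling step described above. Each
# 250x250 RGB face image is resized to 24x24 with cv2.resize();
# the image path here is a placeholder.
def make_low_res(image_path: str) -> np.ndarray:
    img = cv2.imread(image_path)  # 250x250x3, BGR channel order in OpenCV
    return cv2.resize(img, (24, 24), interpolation=cv2.INTER_CUBIC)
```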
  • Face Detection Preprocessing
To reduce computational complexity during training, all images undergo facial detection using a predefined algorithm $F(\cdot)$, which extracts the facial region from each image.
$$\tilde{x}_i^{HR} = F(x_i), \qquad \tilde{x}_i^{LR} = F(x_i^{LR})$$
The processed datasets become
$$\tilde{\mathcal{D}}_{HR} = \{\tilde{x}_i^{HR}\}_{i=1}^{N}, \qquad \tilde{\mathcal{D}}_{LR} = \{\tilde{x}_i^{LR}\}_{i=1}^{N}$$
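As an illustration of the operator $F(\cdot)$, the sketch below uses OpenCV’s Haar cascade detector; the paper calls $F$ a “predefined algorithm” without naming it, so this particular detector is an assumption.

```python
import cv2

# Illustrative stand-in for the face-detection operator F(.). The text
# leaves the detection algorithm unspecified, so OpenCV's bundled Haar
# cascade is used here purely as an example.
_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def crop_face(img):
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = _cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return img                 # no detection: keep the full frame
    x, y, w, h = faces[0]          # use the first detected face region
    return img[y:y + h, x:x + w]
```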
  • Data Splitting and Structuring
Each of the HR and LR datasets is partitioned into training, validation, and test sets.
$$\tilde{\mathcal{D}}_{HR} = \tilde{\mathcal{D}}_{HR}^{\,train} \cup \tilde{\mathcal{D}}_{HR}^{\,val} \cup \tilde{\mathcal{D}}_{HR}^{\,test}$$
$$\tilde{\mathcal{D}}_{LR} = \tilde{\mathcal{D}}_{LR}^{\,train} \cup \tilde{\mathcal{D}}_{LR}^{\,val} \cup \tilde{\mathcal{D}}_{LR}^{\,test}$$
The images are stored in six main folders.
hr-train, hr-val, hr-test, lr-train, lr-val, lr-test
Each main folder contains four sub-folders categorized by individual identity. Each image file is labeled with an 8-digit code.
$$\mathrm{filename}_i = \mathrm{ID}_i^{(6)}\,\mathrm{Code}_i^{(2)}$$
where $\mathrm{ID}_i^{(6)}$ denotes a unique six-digit person ID, and $\mathrm{Code}_i^{(2)}$ encodes gender and skin tone, as outlined in Table 3.
  • CNN-Based Feature Extraction
Two separate CNNs are trained for the HR and LR image branches, respectively. Each network maps an input image to a latent feature vector in $\mathbb{R}^d$.
$$\mathrm{CNN}_{HRB}:\ \tilde{x}_i^{HR} \mapsto f_i^{HR} \in \mathbb{R}^d, \qquad \mathrm{CNN}_{LRB}:\ \tilde{x}_i^{LR} \mapsto f_i^{LR} \in \mathbb{R}^d$$
  • Similarity Computation
After extracting feature vectors from two facial images, one high-resolution (HR) and one low-resolution (LR), using the feature extraction networks, the similarity between the two images is evaluated using a similarity function. We adopt cosine similarity as the metric to quantify how close the two vectors are in the embedding space. The cosine similarity is defined as follows:
$$\mathrm{Sim}(f_i^{HR}, f_i^{LR}) = \frac{f_i^{HR} \cdot f_i^{LR}}{\|f_i^{HR}\|_2\,\|f_i^{LR}\|_2}$$
where
$f_i^{HR}, f_i^{LR} \in \mathbb{R}^d$ are the feature vectors extracted from the HR and LR images, respectively;
$\cdot$ denotes the dot product of the two vectors;
$\|\cdot\|_2$ is the L2 norm, used to normalize the vectors.
The resulting similarity score ranges from −1 to 1, where a value closer to 1 indicates a high degree of similarity (i.e., likely the same person), and a value near 0 or negative suggests low similarity (i.e., different individuals).
To make a final decision, a threshold $\tau$ is introduced. The identity verification outcome is determined as follows:
$$\mathrm{Decision} = \begin{cases} \text{Same person}, & \text{if } \mathrm{Sim} \ge \tau \\ \text{Different person}, & \text{otherwise} \end{cases}$$
Using cosine similarity allows the model to effectively compare facial representations from images of varying resolutions. Because cosine similarity is scale-invariant, it relies on the direction of the feature vectors rather than their magnitude. This makes it particularly suitable for real-world scenarios where image quality and capture conditions may vary significantly.
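This decision rule can be sketched in a few lines of Python; the default threshold value here is illustrative, since $\tau$ is left as a tunable parameter.

```python
import numpy as np

# Sketch of the cosine-similarity decision rule above. The threshold
# tau = 0.5 is illustrative; the text leaves its value open.
def verify(f_hr: np.ndarray, f_lr: np.ndarray, tau: float = 0.5) -> bool:
    sim = float(f_hr @ f_lr) / (np.linalg.norm(f_hr) * np.linalg.norm(f_lr))
    return sim >= tau              # True -> "same person"
```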
  • Uncertainty Estimation (Optional Bayesian Component)
To enhance the robustness and interpretability of the system, Bayesian-inspired methods such as Monte Carlo dropout can be employed during inference to estimate predictive uncertainty.
Let $\hat{y}_i^{(t)}$ denote the similarity score from the $t$-th forward pass with dropout enabled. The predictive mean and variance are computed as
$$\hat{\mu}_i = \frac{1}{T}\sum_{t=1}^{T} \hat{y}_i^{(t)}, \qquad \hat{\sigma}_i^2 = \frac{1}{T}\sum_{t=1}^{T} \left(\hat{y}_i^{(t)} - \hat{\mu}_i\right)^2$$
This approach provides a confidence estimate that can be used to filter low-certainty predictions or trigger secondary verification processes, thus improving the system’s reliability.
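A minimal sketch of this estimator follows, assuming a Keras-style model whose Dropout layers remain active when the model is called with training=True.

```python
import numpy as np

# Monte Carlo dropout sketch: run T stochastic forward passes with
# dropout kept active, then compute the predictive mean and variance
# defined above. `model` is assumed to be a tf.keras model that
# contains Dropout layers.
def mc_dropout_predict(model, x, T: int = 30):
    scores = np.stack([np.asarray(model(x, training=True)) for _ in range(T)])
    return scores.mean(axis=0), scores.var(axis=0)  # mean, variance
```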

3.1. HRFECNN and LRFECNN Design

In this research, HRFECNN and LRFECNN neural networks were developed to examine the similarity of facial images with different levels of clarity. The LRFECNN neural network incorporates the very-deep super-resolution (VDSR) process to enhance the resolution of low-resolution facial images. Additionally, the deep recursive residual network (DRRN) method was chosen to augment the clarity enhancement process in this study. The neural network architectures are illustrated in Figure 6 and further detailed in Algorithm 1. The prediction of the enhanced image $\hat{Y}$, given input image data $X$ and the model $F$, can be expressed by the following equation:
$$\hat{Y} = F(X) + X$$
where $F(X)$ represents the result obtained by passing the image $X$ through the enhancement neural network, and $\hat{Y}$ is the image predicted by the model, which enhances the details of $X$ directly, so that $\hat{Y}$ gains detail over $X$ without resizing the image at this stage. This formulation is the global residual (skip) connection commonly used in DRRN-style enhancement networks.
Algorithm 1. Enhancement neural network with DRRN [17].
  • Input: image data $X$
  • Output: enhanced image $\hat{Y}$
  • Initialize the enhancement neural network model $F$ with the deep recursive residual network architecture.
  • Pass the input image data $X$ through the enhancement network $F$ to obtain the enhanced image features $F(X)$.
  • Calculate the enhanced image $\hat{Y}$ by adding the enhanced image features $F(X)$ to the input image data $X$, i.e., $\hat{Y} = F(X) + X$.
  • Return the enhanced image $\hat{Y}$.
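The following is a compact Keras sketch of Algorithm 1; the framework choice is an assumption, and the two convolutional layers stand in for the full DRRN $F$, so only the global skip connection is the point being illustrated.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Keras sketch of Algorithm 1. The two convolutional layers are a
# placeholder for the full DRRN F; the global skip connection
# Y_hat = F(X) + X is what is being illustrated.
def enhancement_model(h=24, w=24, c=3):
    x_in = layers.Input(shape=(h, w, c))
    fx = layers.Conv2D(64, 3, padding="same", activation="relu")(x_in)
    fx = layers.Conv2D(c, 3, padding="same")(fx)   # F(X)
    y_hat = layers.Add()([fx, x_in])               # Y_hat = F(X) + X
    return tf.keras.Model(x_in, y_hat)
```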
In the depicted enhancement neural network with deep recursive residual network (DRRN) architecture, two key components contribute to flexibility in adjusting the depth levels of processing: the residual units defined within the green-bordered framework and the recursive blocks defined within the red-bordered framework. Each recursive block can accommodate an unrestricted number of residual units, as illustrated in Figure 7. Similarly, within the DRRN framework for enhancing image sharpness, recursive blocks can also be unlimited in number, as shown in Figure 8.
Therefore, in this research, we present the convolutional neural network high-resolution branch (CNNHRB) for processing high-quality facial images, leveraging the architecture from the high-resolution facial enhancement convolutional neural network (HRFECNN), with adjustments made to hyperparameters such as stride and padding, as well as the pooling layers of the convolutional neural network. Additionally, we introduce the convolutional neural network low-resolution branch (CNNLRB), which incorporates the deep recursive residual network (DRRN) for enhancing sharpness, integrated with the convolutional neural network architecture. Subsequently, the results of processing by the CNNHRB and CNNLRB networks are evaluated for facial similarity using cosine similarity, as depicted in Figure 9.
To represent the convolutional neural network high-resolution branch (CNNHRB) and the convolutional neural network low-resolution branch (CNNLRB) mathematically, we can define a series of equations that describe the transformations applied to the input images through the layers of the networks.
CNNHRB (high-resolution branch)
Let $X^{HR}$ denote the high-resolution input image. The CNNHRB processes this image through a series of convolutional layers, activation functions, pooling layers, and possibly other layers such as batch normalization. The output of CNNHRB, $Y^{HR}$, is then used for similarity computation.
1. Convolutional layer:
$$H_1 = f(W_1 * X^{HR} + b_1)$$
2. Pooling layer:
$$P_1 = \mathrm{pool}(H_1)$$
3. Subsequent layers:
$$H_2 = f(W_2 * P_1 + b_2)$$
$$Y^{HR} = f(W_n * H_{n-1} + b_n)$$
where
$W_n$ and $b_n$ are the weights and biases of the $n$-th layer;
$f$ is the activation function (e.g., ReLU);
$*$ denotes the convolution operation;
$\mathrm{pool}$ denotes the pooling operation (e.g., max pooling).
CNNLRB (low-resolution branch with DRRN)
Let $X^{LR}$ denote the low-resolution input image. The CNNLRB includes an enhancement step using DRRN to improve image resolution before further processing. The enhanced image is then passed through convolutional layers similar to CNNHRB. The output of CNNLRB, $Y^{LR}$, is used for similarity computation.
1. Initial convolution:
$$H_0 = f(W_0 * X^{LR} + b_0)$$
2. Recursive block (DRRN). Residual unit:
$$R_i = f(W_{R_i} * H_{i-1} + b_{R_i}) + H_{i-1}$$
Recursive block:
$$H_i = R_i^{1} + R_i^{2} + \cdots + R_i^{n}$$
3. Subsequent convolutional layers:
$$H_1 = f(W_1 * H_i + b_1)$$
$$Y^{LR} = f(W_n * H_{n-1} + b_n)$$
where
$R_i$ represents the residual units within a recursive block;
$H_i$ is the output of the $i$-th recursive block;
the weights and biases within DRRN are shared across the recursive units.
Cosine Similarity
After obtaining the outputs $Y^{HR}$ and $Y^{LR}$, the cosine similarity between these two feature vectors is computed as follows:
$$\mathrm{cosine\ similarity} = \frac{Y^{HR} \cdot Y^{LR}}{\|Y^{HR}\|\,\|Y^{LR}\|}$$
where $\cdot$ denotes the dot product, and $\|\cdot\|$ denotes the Euclidean norm.
Similarity Measurement Justification
In selecting the similarity metric for identity verification, we considered several alternatives, including Euclidean distance and Jaccard similarity. However, cosine similarity was ultimately chosen due to its superior performance in scenarios where image resolution and quality vary significantly. Unlike distance-based measures, cosine similarity focuses on the orientation rather than the magnitude of feature vectors, making it robust to changes in lighting, scale, and image clarity. Experimental results confirmed that cosine similarity consistently outperformed other metrics, achieving a maximum accuracy of 99.7%, particularly when comparing low-resolution images derived from identity documents with high-resolution live captures.
Neural Network Structures of CNNHRB and CNNLRB
The CNNHRB (high-resolution branch) and CNNLRB (low-resolution branch) neural networks comprise four convolutional blocks, each described as follows.
Convolution 1 (Conv1)
The primary difference between CNNHRB and CNNLRB lies in this stage, where CNNLRB incorporates a DRRN (deep recursive residual network) to enhance the resolution of the facial images before further processing. Conv1 consists of three convolutional layers, each with 64 filters, and employs the PReLU (parametric ReLU) activation function.
Convolution 2 (Conv2)
This block comprises five convolutional layers, each with 128 filters, and utilizes the PReLU activation function.
Convolution 3 (Conv3)
Conv3 includes nine convolutional layers, each with 256 filters, and employs the PReLU activation function.
Convolution 4 (Conv4)
This block consists of three convolutional layers, each with 256 filters, and uses the PReLU activation function.
Structure of the deep recursive residual network (DRRN)
The DRRN used to enhance the resolution in CNNLRB is structured as DRRN B1U9, indicating that it contains one recursive block with nine residual units, as illustrated in Figure 10.
The DRRN’s architecture is designed to iteratively improve the resolution of low-quality images by passing them through multiple layers that capture and enhance fine details. Incorporating these detailed convolutional layers and the DRRN, the CNNHRB and CNNLRB networks are effectively structured to process and enhance facial images with varying resolutions, ensuring high performance in facial recognition tasks. The procedural workflow and optimization strategy for these networks are described in Algorithm 2.
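Assuming a Keras implementation, the B1U9 configuration with shared residual-unit weights might be sketched as follows; the 128-filter width follows Algorithm 3, while other details are illustrative.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Sketch of DRRN B1U9: one recursive block whose nine residual units
# reuse the same two convolutional layers (weights shared across units,
# as stated above). The 128-filter width follows Algorithm 3.
def drrn_b1u9(h=24, w=24, c=3, units=9):
    x_in = layers.Input(shape=(h, w, c))
    h_cur = layers.Conv2D(128, 3, padding="same", activation="relu")(x_in)
    conv_a = layers.Conv2D(128, 3, padding="same", activation="relu")
    conv_b = layers.Conv2D(128, 3, padding="same", activation="relu")
    for _ in range(units):
        # R_i = f(W * H_{i-1} + b) + H_{i-1}, with W and b shared
        h_cur = layers.add([conv_b(conv_a(h_cur)), h_cur])
    fx = layers.Conv2D(c, 3, padding="same")(h_cur)
    y_hat = layers.add([fx, x_in])                 # global skip connection
    return tf.keras.Model(x_in, y_hat)
```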
  • Developed artificial neural networks CNNHRB and CNNLRB.
Algorithm 2. Shared convolutional blocks in CNNHRB and CNNLRB.
  • Function: block_conv1x
  • Purpose: Implements the first convolutional block (Conv1).
  • Input: input_tensor (Object)
  • Process:
    • Apply three convolutional layers with 64 filters each.
    • Apply the PReLU activation function after each convolutional layer.
    • Apply dropout with a rate of 0.25 after the first and second convolutional layers.
    • Apply batch normalization after the convolutional layers.
    • Apply max pooling with a pool size of 2 × 2.
  • Output: Returns the processed convolution object.
  • Function: block_conv2x
  • Purpose: Implements the second convolutional block (Conv2).
  • Input: input_tensor (Object)
  • Process:
    • Apply five convolutional layers with 128 filters each.
    • Apply the PReLU activation function after each convolutional layer.
    • Apply dropout with a rate of 0.25 after the second and fourth convolutional layers.
    • Apply batch normalization after the convolutional layers.
    • Apply max pooling with a pool size of 2 × 2.
  • Output: Returns the processed convolution object.
  • Function: block_conv3x
  • Purpose: Implements the third convolutional block (Conv3).
  • Input: input_tensor (Object)
  • Process:
    • Apply nine convolutional layers with 256 filters each.
    • Apply the PReLU activation function after each convolutional layer.
    • Apply dropout with a rate of 0.25 after every third convolutional layer.
    • Apply batch normalization after the convolutional layers.
    • Apply max pooling with a pool size of 2 × 2.
  • Output: Returns the processed convolution object.
  • Function: block_conv4x
  • Purpose: Implements the fourth convolutional block (Conv4).
  • Input: input_tensor (Object)
  • Process:
    • Apply three convolutional layers with 256 filters each.
    • Apply the PReLU activation function after each convolutional layer.
    • Apply dropout with a rate of 0.25 after the first and second convolutional layers.
    • Apply batch normalization after the convolutional layers.
    • Apply max pooling with a pool size of 2 × 2.
  • Output: Returns the processed convolution object.
  • Function: cnnhrlr_branch
  • Purpose: Implements the CNNHRB and CNNLRB branches.
  • Input: input_tensor (Object)
  • Process:
    • Call block_conv1x with input_tensor as the argument.
    • Call block_conv2x with the output from step 1 as the argument.
    • Call block_conv3x with the output from step 2 as the argument.
    • Call block_conv4x with the output from step 3 as the argument.
    • Flatten the output from step 4.
    • Add a fully connected (dense) layer with 512 units.
    • Apply the softmax activation function to the output of the dense layer.
  • Output: Returns the final output vector of size 512.
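A Keras rendering of Algorithm 2 might look like the sketch below; the framework choice is an assumption, only Conv1 is written out in full, and the remaining blocks follow the same pattern with the filter counts and dropout placements listed above.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Sketch of Algorithm 2's first block and the branch assembly. Only
# block_conv1x is spelled out; block_conv2x/3x/4x would repeat the
# pattern with 128, 256, and 256 filters respectively.
def block_conv1x(t):
    for i in range(3):                      # three 64-filter conv layers
        t = layers.Conv2D(64, 3, padding="same")(t)
        t = layers.PReLU(shared_axes=[1, 2])(t)
        if i < 2:                           # dropout after conv 1 and 2
            t = layers.Dropout(0.25)(t)
    t = layers.BatchNormalization()(t)
    return layers.MaxPooling2D(pool_size=(2, 2))(t)

def cnnhrlr_branch(input_tensor):
    t = block_conv1x(input_tensor)
    # block_conv2x, block_conv3x, and block_conv4x would follow here.
    t = layers.Flatten()(t)
    t = layers.Dense(512)(t)
    return layers.Softmax()(t)              # final 512-dim output vector
```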
Development of the DRRN contrast enhancement neural network
The deep recursive residual network (DRRN) plays a crucial role in enhancing the sharpness of low-resolution face images, improving their clarity before passing them on to the convolutional neural network low-resolution branch (CNNLRB) for subsequent processing. The algorithm detailing this procedure is presented in Algorithm 3.
Algorithm 3. Contrast enhancement neural network DRRN.
  • Function: ConvolutionLayer
  • Input parameters: input_tensor: an object representing the input tensor; filters: an integer specifying the number of filters (128 for intermediate layers, 1 for the final layer).
  • Process:
    • Apply a convolutional layer with a kernel size of 3 and padding of 3.
    • Apply a ReLU activation function.
  • Output: Return the resulting convolution object.
  • Function: ResidualUnit
  • Input parameters: input_tensor: an object representing the input tensor.
  • Process:
    • Call ConvolutionLayer with input_tensor and 128 filters.
    • Call ConvolutionLayer again with the resulting tensor and 128 filters.
  • Output: Return the resulting convolution object.
  • Function: RecursiveBlock
  • Input parameters: input_tensor: an object representing the input tensor; r_unit: an integer specifying the number of residual units.
  • Process:
    • Initialize the tensor as input_tensor.
    • For each unit in r_unit, call ResidualUnit with the current tensor and update the tensor with the resulting convolution object.
  • Output: Return the resulting convolution object.
  • Function: drrn_branch
  • Input parameters: input_tensor: an object representing the input tensor; r_unit: an integer specifying the number of residual units; r_block: an integer specifying the number of recursive blocks.
  • Process:
    • Initialize the tensor as input_tensor.
    • For each block in r_block, call RecursiveBlock with the current tensor and r_unit, and update the tensor with the resulting convolution object.
    • Apply a fully connected layer to the resulting tensor.
  • Output: Return the final tensor object.
  • Function: cos_sim_model
  • Input parameters: class_sample: the number of sample classes; model_type: the type of neural network (1 for CNNLRB without enhancement, 2 for CNNLRB with DRRN, 3 for CNNLRB with VDSR).
  • Process:
    • Depending on model_type, select the corresponding CNNLRB model.
    • Use the model to compute the output vectors for each sample in class_sample.
    • Calculate the cosine similarity between the resulting vectors.
  • Output: Return the cosine similarity values.
Development of Utility Functions
Utility functions play a crucial role in preparing data and setting appropriate parameters for the processing and training of neural networks. A detailed explanation of the development of these utility functions is provided in Algorithm 4.
Algorithm 4. Utility Functions
  • The get_class_sample function creates the datasets for training and validating the neural network according to the developed network structure. It takes the parameters sample_resolution and batch_size, which define the resolution type of the sample images and the batch size for processing by the neural network, and returns class_samples, train_samples, val_samples, train_dir, val_dir, img_height, and img_width. A sketch of this function is given after the list below.
  • Inputs: sample_resolution: the resolution type of the sample images (e.g., ‘high’, ‘low’); batch_size: the batch size for neural network processing.
  • Outputs:
    • class_samples: number of face sample groups.
    • train_samples: training dataset.
    • val_samples: validation dataset.
    • train_dir: directory path for training data.
    • val_dir: directory path for validation data.
    • img_height: image height.
    • img_width: image width.
  • Initialize directories and parameters:
    • Set train_dir and val_dir based on sample_resolution.
    • Define the image dimensions (img_height and img_width) based on sample_resolution.
  • Load and split data:
    • Load images from train_dir.
    • Split the images into training and validation datasets.
  • Prepare batches: create batches of train_samples and val_samples based on batch_size.
  • Count samples: count the number of class samples, training samples, and validation samples.
  • Return values: return class_samples, train_samples, val_samples, train_dir, val_dir, img_height, and img_width.
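Assuming a TensorFlow/Keras data pipeline, get_class_sample could be sketched as follows; the folder names follow the hr-*/lr-* layout of Section 3, and the loading utility and simplified split handling are assumptions.

```python
import tensorflow as tf

# Sketch of Algorithm 4's get_class_sample. Folder names follow the
# hr-*/lr-* layout described in Section 3; the image sizes and the
# Keras loading utility are assumptions.
def get_class_sample(sample_resolution: str, batch_size: int):
    prefix = "hr" if sample_resolution == "high" else "lr"
    train_dir, val_dir = f"{prefix}-train", f"{prefix}-val"
    img_height, img_width = (250, 250) if prefix == "hr" else (24, 24)
    train_samples = tf.keras.utils.image_dataset_from_directory(
        train_dir, image_size=(img_height, img_width), batch_size=batch_size)
    val_samples = tf.keras.utils.image_dataset_from_directory(
        val_dir, image_size=(img_height, img_width), batch_size=batch_size)
    class_samples = len(train_samples.class_names)  # face sample groups
    return (class_samples, train_samples, val_samples,
            train_dir, val_dir, img_height, img_width)
```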

3.2. Training the Neural Network

The developed neural networks were trained on a dataset comprising facial images of 70 individuals, totaling 23,341 images per network for both CNNHRB and CNNLRB. During each training iteration, the model summaries, including the parameter counts for CNNHRB and CNNLRB, were recorded. The approximate number of parameters for each network is shown in the following table.
In this study, we trained two types of neural networks: a high-resolution network (CNNHRB) and a low-resolution network (CNNLRB). We then examined and compared the two networks using cosine similarity; the overall methodology is outlined in Table 4. To begin training, we used a dataset of classified images, which allowed the networks to learn the various features and patterns present in the images. Upon completion of training, we obtained two networks: CNNHRB, which operates on high-resolution inputs, and CNNLRB, which operates on low-resolution inputs. Images with similar characteristics were then compared using these two networks. The similarity analysis used cosine similarity, which is defined by the following equation.
$$ \mathrm{similarity} = \cos(\theta) = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2} \; \sqrt{\sum_{i=1}^{n} B_i^2}} \quad (23) $$
where
  • A is the vector of the CNNHRB neural network.
  • B is the vector of the CNNLRB neural network.
The cosine similarity results obtained after training are shown in Figure 11.
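In code, this comparison reduces to a normalized dot product. The short NumPy sketch below assumes a and b are the flattened output vectors produced by CNNHRB and CNNLRB for the same subject.

import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between a CNNHRB vector a and a CNNLRB vector b."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Identical vectors yield a similarity of 1.0; orthogonal vectors yield 0.0.
print(cosine_similarity([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # -> 1.0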
The researchers conducted comprehensive training and performance evaluations on the proposed low-resolution neural network (CNNLRB), which integrates a deep recursive residual network (DRRN) as an enhancement module for image resolution. To systematically assess the effectiveness of enhancement techniques, the DRRN-based CNNLRB was benchmarked against two alternative configurations: a baseline CNNLRB without enhancement, and a CNNLRB incorporating a very deep super-resolution (VDSR) network. Each variant was trained on a labeled facial image dataset, enabling the networks to learn the discriminative features and spatial patterns necessary for accurate identity verification. The evaluation focused on multiple performance metrics, including classification accuracy, image reconstruction fidelity, and computational efficiency. Furthermore, in alignment with statistical machine learning paradigms, the models were evaluated not only based on predictive performance but also on their robustness to input variability and their ability to generalize across image degradations. To explore model reliability under practical uncertainty, inference results were analyzed using uncertainty-aware techniques such as prediction confidence and the consistency of cosine similarity scores across test samples. Among the three models, the DRRN-enhanced CNNLRB demonstrated superior performance across all benchmarks. Notably, it maintained higher fidelity in reconstructed facial features and exhibited greater stability in its similarity predictions—qualities that are critical in biometric systems involving degraded or low-resolution imagery. These findings underscore the importance of integrating advanced neural architectures with statistical robustness principles. In particular, they highlight the potential of DRRN-based networks to significantly improve the reliability and accuracy of low-resolution facial image recognition systems, especially when deployed in real-world applications involving noisy or uncertain visual inputs.
  • Compliance with Identity Verification Standards
The proposed framework integrates stringent verification procedures aligned with internationally recognized standards, including the Identity Assurance Level (IAL) and Authenticator Assurance Level (AAL). To ensure document authenticity and identity accuracy, the system performs the following actions:
  • Facial feature comparisons using dedicated CNN branches for high- and low-resolution images.
  • Super-resolution enhancement using SRCNN, VDSR, and DRRN models to improve the clarity of low-resolution document images.
  • Bayesian-inspired uncertainty estimation via Monte Carlo Dropout, providing predictive confidence levels alongside verification results.
This layered approach enhances security, supports risk-aware decision-making, and ensures regulatory compliance in sensitive applications such as financial services and national identity systems.
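As a minimal sketch of the Monte Carlo Dropout step listed above, the function below keeps dropout active at inference time (training=True) and aggregates several stochastic forward passes; the pass count of 30 is an illustrative assumption, and the model is assumed to contain Dropout layers.

import numpy as np
import tensorflow as tf

def mc_dropout_predict(model, x, passes=30):
    """Run `passes` stochastic forward passes with dropout enabled and
    return the mean prediction and its per-class standard deviation."""
    preds = np.stack([model(x, training=True).numpy() for _ in range(passes)])
    return preds.mean(axis=0), preds.std(axis=0)

# A large standard deviation signals low predictive confidence; such cases
# can be routed to manual review instead of automatic verification.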

4. Experimental Results

  • Dataset Description and Availability
The experiments utilized the CASIA-WebFace dataset, a publicly available benchmark commonly used for face recognition research. This dataset includes 494,414 facial images with a resolution of 250 × 250 pixels, covering a wide demographic range. The total dataset size is approximately 3.8 GB, making it suitable for training deep learning models that generalize well across diverse populations. The dataset is publicly accessible on Kaggle (see the Data Availability Statement).
This study focuses on the enhancement and development of neural networks for comparing high-resolution facial images with low-resolution facial images, simulating the process of matching appearance characteristics with photographic evidence in identity verification procedures integral to electronic know your customer (e-KYC) mechanisms. In this context, the appearance characteristics are represented by high-resolution facial images, whereas the photographic evidence is represented by low-resolution facial images. For the evaluation, a low-resolution facial image from each of the 70 sample subjects was selected to represent the photographic evidence. The facial recognition process was then tested using the developed neural networks on a dataset consisting of 2917 facial images from these 70 sample subjects. The hypothesis tested concerned the achievable accuracy of facial comparison, considering factors such as skin tone. The evaluation aimed to determine the effectiveness and accuracy of the developed neural networks in recognizing and matching facial images under varying conditions of resolution and skin tone, thereby assessing their potential application in real-world e-KYC processes. The testing procedure is illustrated in Figure 12.
All experiments were conducted using Python 3.9 and TensorFlow 2.11 on a workstation equipped with an NVIDIA RTX 3090 GPU (NVIDIA, Santa Clara, CA, USA) and running Ubuntu 20.04 LTS. The deep learning models were trained using a batch size of 64 and an Adam optimizer with a learning rate of 0.0001. This configuration ensured efficient training performance while maintaining model accuracy. The training process for the proposed dual-branch CNN architecture took approximately 3.5 h for 100 epochs on the aforementioned hardware setup. This training time includes both the high-resolution and low-resolution image branches and reflects the full end-to-end learning process from feature extraction to similarity computation.
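The reported configuration corresponds to a standard Keras setup along the following lines; the layer stack, loss function, and placeholder data are assumptions, and only the optimizer, learning rate, batch size, and epoch count follow the values stated above.

import numpy as np
import tensorflow as tf

# Placeholder data standing in for the facial image dataset (assumption).
x_train = np.random.rand(640, 24, 24, 3).astype("float32")
y_train = np.random.randint(0, 70, size=640)  # 70 subjects

# Hypothetical minimal model body; only the training hyperparameters below
# (Adam, lr = 1e-4, batch size 64, 100 epochs) follow the reported setup.
model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0 / 255, input_shape=(24, 24, 3)),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(70, activation="softmax"),
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
history = model.fit(x_train, y_train, batch_size=64, epochs=100,
                    validation_split=0.15)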
In the facial recognition testing procedure described above, the levels of similarity or accuracy were categorized into three thresholds: 85%, 90%, and 95%. The performance of three different neural networks was compared: a network without any super-resolution enhancement, a network enhanced with very-deep super-resolution (VDSR), and a network enhanced with deep recursive residual network (DRRN). The facial recognition tests were conducted and the results were recorded in CSV files. Each entry in these files included the following attributes: file name, accuracy level, gender code, skin tone code, and test result code. The collected data from these tests were then processed to evaluate the accuracy of each neural network.

4.1. Comparison of Similarity Between Neural Network Structures

This section details the measurement of similarity among the three neural network structures: the baseline network without super-resolution enhancement, the network enhanced with very-deep super-resolution (VDSR) [8], and the network enhanced with a deep recursive residual network (DRRN). The maximum accuracy rates achieved by these three neural network configurations were 85.6%, 98.8%, and 99.2%, respectively, as illustrated in Figure 13. The evaluation process involved comparing the structural similarities and performance metrics of each neural network type under various conditions. This comparison aimed to determine the most effective neural network configuration for enhancing facial recognition accuracy. The results demonstrate the significant improvements in accuracy achieved through the application of super-resolution techniques, particularly VDSR and DRRN, highlighting the potential benefits of these methods in practical facial recognition applications.
The graph shows the similarities of the three neural network structures: the baseline network without super-resolution enhancement, shown with a dotted blue line; VDSR, shown with a dotted green line; and the network enhanced with DRRN, shown with a red line (Figure 13).

4.2. Results of the Facial Recognition Process

In this section, we present the results of the facial recognition process, particularly within the context of identity verification in an electronic Know Your Customer (e-KYC) procedure. This involves simulating the comparison between a high-resolution image (representing a clear, detailed appearance) and a low-resolution image (representing a blurred photo from identification documents). The raw data from ten experimental trials of the facial recognition process were used to calculate the average similarity or accuracy for each neural network configuration. The neural networks tested included the baseline model without super-resolution enhancement, the model with very-deep super-resolution (VDSR) enhancement, and the model with deep recursive residual network (DRRN) enhancement. The results of these comparisons, detailing the predicted accuracy for each neural network, are presented in Table 5. This analysis aims to identify which neural network configuration offers the highest accuracy in matching high-resolution images with their low-resolution counterparts, thereby enhancing the reliability of the e-KYC process.
Based on the raw test data from 10 rounds of facial recognition trials, we calculated the average similarity or accuracy rates of each neural network, categorized by the skin tone of the facial samples. The tests were conducted on a sample set of 560 facial images with darker skin tones, comprising 324 female and 276 male subjects. The comparison of the predictive accuracy of each neural network is detailed in Table 6. In this study, the neural networks were evaluated for their performance in recognizing faces with darker skin tones. The experimental setup involved multiple rounds of testing to ensure the reliability of the results. Each neural network’s accuracy was determined by averaging the outcomes across the ten trials. This approach helps in understanding the efficacy of the models under varying conditions and highlights any discrepancies in performance based on gender within the same skin tone category. Table 6 presents a comprehensive comparison, showing how each neural network performed in terms of accuracy. This detailed analysis allows us to discern which neural network offers the best performance for facial recognition tasks involving individuals with darker skin tones. The results are instrumental in refining the models and improving their applicability in diverse real-world scenarios, ensuring fairness and accuracy across different demographic groups.
The testing with samples of non-dark skin tone faces, consisting of 2357 images, was divided into 1248 female subjects and 1109 male subjects. The results of the comparison of the prediction accuracy of each neural network are shown in Table 7. In this study, neural networks were evaluated for their performance in recognizing faces with non-dark skin tones. The experimental setup included multiple testing rounds to ensure the reliability of the results. The accuracy of each neural network was calculated by averaging the results from all tests. This method allowed us to understand the performance of the models under various conditions and to highlight the differences in performance by gender within the same skin tone group. Table 7 provides a detailed comparison of how each neural network performed in terms of accuracy. This thorough analysis helps us identify which neural network delivers the best performance for face recognition involving individuals with non-dark skin tones. These results are crucial for refining models and enhancing their applicability in diverse real-world scenarios, ensuring fairness and accuracy across different demographic groups.
Based on the comparative analysis presented in Table 7, the research utilizing various techniques demonstrates significant accuracy in its results. The proposed technique exhibits a higher level of accuracy compared to existing studies. This increased accuracy can be attributed to the use of a more extensive and comprehensive dataset, which is larger and more diverse than those employed in previous research. Consequently, this leads to improved accuracy levels. In accordance with standard machine learning practice, the dataset was divided into three subsets: 70% for training, 15% for validation, and 15% for testing. These proportions were consistently applied to both the high-resolution and low-resolution image branches. Furthermore, the same data splitting strategy was used for all methods compared in Table 7 to ensure a fair and meaningful performance comparison.
Our proposed model outperforms other works, achieving an accuracy of 99.7%. This surpasses the performance of existing models, highlighting the effectiveness and robustness of our approach. The comprehensive dataset used in our research not only enhances the model’s ability to generalize but also ensures that the model performs well across different conditions and variations present in the data.
Table 8 compares the proposed DRRN model with recent state-of-the-art methods for image classification. Despite using a smaller input size (24 × 24), DRRN achieves the highest accuracy at 99.3%, outperforming all other models. This highlights its efficiency and superior performance, especially in low-resolution scenarios. Figure 14 presents a comparative statistical analysis of facial recognition accuracy achieved by three neural network configurations, namely baseline, VDSR-enhanced, and DRRN-enhanced, tested under varying image resolution and demographic diversity (skin tone categories). The DRRN-enhanced network consistently demonstrates superior accuracy, particularly in low-resolution scenarios (24 × 24 pixels), achieving up to 99.7% accuracy even for dark-skinned individuals. These results exemplify key principles of statistical machine learning, including data-driven model evaluation, robustness to demographic and input variability, and high-resolution reconstruction from degraded inputs. The findings also support the development of uncertainty-aware biometric systems, in line with the objectives of Bayesian methods in imaging applications, by reinforcing the reliability and generalizability of neural network predictions under real-world constraints.
To assess the impact of super-resolution preprocessing on model performance, we conducted an ablation study comparing identity classification accuracy with and without the super-resolution module. As illustrated in Figure 15, the inclusion of the SRCNN-based enhancement step improved the classification accuracy from 97.2% to 99.3%. This result demonstrates the practical benefit of integrating lightweight image enhancement techniques into e-KYC systems, especially in environments where image resolution and quality are limited. The improvement supports the hypothesis that super-resolution can compensate for degraded input quality and enhance feature extraction in downstream tasks.
In addition to standard evaluation metrics such as accuracy, precision, recall, and F1-score, we incorporated the area under the ROC curve (AUC) to provide a more comprehensive assessment of classification performance. The AUC metric is particularly important in identity verification tasks, as it reflects the model’s ability to distinguish between positive and negative classes across varying decision thresholds. As presented in Table 9, the proposed CNN combined with SRCNN-based super-resolution achieved the highest AUC score of 99.5%, outperforming other state-of-the-art deep learning models, including FaceNet (98.4%), ResNet50 (97.8%), and MobileNetV2 (97.0%). This indicates superior robustness and discriminative capability, especially in handling low-quality facial images that are typical in e-KYC scenarios. Moreover, while achieving the best AUC and overall classification performance, the proposed model maintains a low inference time and minimal model size, making it ideal for deployment in resource-constrained environments. These results confirm that the integration of super-resolution techniques not only enhances feature quality, but also improves the model’s ability to generalize across uncertain or degraded inputs, which is an essential requirement for secure and reliable digital identity verification systems.
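For reference, the AUC values in Table 9 follow the standard ROC computation, which can be reproduced with scikit-learn as sketched below; the labels and scores are placeholders, with 1 denoting a genuine (matching) pair and 0 an impostor pair.

from sklearn.metrics import roc_auc_score

# Placeholder labels and similarity scores for five verification attempts.
y_true = [1, 0, 1, 1, 0]
y_score = [0.97, 0.12, 0.88, 0.91, 0.34]
print(roc_auc_score(y_true, y_score))  # AUC over all decision thresholds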
Figure 16 presents the confusion matrix illustrating the performance of the proposed neural network in the task of high-resolution vs. low-resolution facial matching within the context of electronic know your customer (e-KYC) procedures. The evaluation was conducted on a dataset comprising 2917 facial images collected from 70 subjects, where high-resolution images represent registered identity data and low-resolution images simulate photographic evidence typically captured in real-world verification scenarios. The matrix indicates that the model correctly identified 1400 true positive cases, where the low-resolution facial image was accurately matched to its corresponding high-resolution counterpart. Additionally, the model achieved 1462 true negatives, successfully distinguishing non-matching pairs.
Figure 17 demonstrates how varying levels of noise and blur can degrade facial image quality. The top row begins with the original, undistorted image, followed by three versions affected by increasing levels of noise, ranging from mild graininess to severe pixel disruption. The bottom row shows the same progression for blur: starting with a slightly softened version, then advancing to moderate and high blur levels where facial details become progressively harder to distinguish. This visualization underscores the impact of common image degradations on visual clarity and, potentially, on recognition performance.
To address concerns regarding the model’s resilience to degraded image quality, a comprehensive robustness evaluation was conducted on the proposed CNN + SRCNN framework. This evaluation simulates real-world scenarios where facial images in e-KYC processes may be affected by varying levels of Gaussian noise and blur, which are common due to inconsistent capture environments and device limitations. The model was tested across different conditions, including normal images and images augmented with low, medium, and high levels of noise and blur. As presented in Table 10, the proposed method demonstrates strong robustness under mild-to-moderate image degradation. The accuracy decreased from 99.3% on normal images to 98.4% with low noise, and further dropped to 97.2% and 95.0% under medium and high noise conditions, respectively. A similar trend was observed with blur, where the accuracy reduced to 98.1%, 96.8%, and 94.5% as blur severity increased. Despite these reductions, the model maintained high AUC values, consistently above 94%, indicating reliable discriminatory capability even under challenging conditions. Furthermore, precision, recall, and F1-score remained within acceptable ranges, reflecting balanced performance in handling both false positives and false negatives. These results highlight the effectiveness of incorporating super-resolution techniques, which help recover critical facial features from degraded images, and Bayesian-inspired uncertainty estimation, which allows the system to flag low-confidence predictions. This dual approach enhances both the reliability and safety of identity verification in e-KYC applications. While the model performs well under typical and moderately degraded conditions, performance under severe noise and blur still shows noticeable declines. This suggests opportunities for future work, such as integrating advanced denoising algorithms or adaptive image enhancement methods to further improve robustness. Overall, the proposed framework proves to be resilient and reliable across a range of image quality scenarios, making it highly suitable for deployment in practical environments where image imperfections are inevitable.
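The degradation conditions evaluated above can be simulated with OpenCV roughly as follows; the noise standard deviations, blur kernel sizes, and file path are illustrative assumptions rather than the exact severity levels used in the experiments.

import cv2
import numpy as np

def add_gaussian_noise(img, sigma):
    """Add zero-mean Gaussian noise with standard deviation sigma."""
    noise = np.random.normal(0.0, sigma, img.shape)
    return np.clip(img.astype(np.float64) + noise, 0, 255).astype(np.uint8)

def add_gaussian_blur(img, ksize):
    """Apply Gaussian blur with an odd kernel size ksize."""
    return cv2.GaussianBlur(img, (ksize, ksize), 0)

# Illustrative severity levels (assumed values, not the paper's).
img = cv2.imread("face.png")            # placeholder path
low_noise = add_gaussian_noise(img, 5)
high_noise = add_gaussian_noise(img, 30)
low_blur = add_gaussian_blur(img, 3)
high_blur = add_gaussian_blur(img, 11)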
Figure 18 illustrates the impact of varying levels of Gaussian noise and blur on the model’s accuracy in facial image-based identity verification. Under normal conditions, the proposed CNN + SRCNN framework achieves a baseline accuracy of 99.3%, exceeding typical thresholds required for high-assurance identity verification systems. As the image quality degrades due to noise or blur, a gradual reduction in accuracy is observed. However, even under medium degradation, the model consistently maintains accuracy above 96%, which aligns with the operational benchmarks commonly referenced in digital identity frameworks such as the Identity Assurance Level (IAL-2) and Authenticator Assurance Level (AAL-2). These standards emphasize not only high accuracy but also the reliability of authentication processes under varying real-world conditions. Notably, while performance under high noise and high blur scenarios decreases to 95.0% and 94.5%, respectively, these values remain within acceptable limits for many practical e-KYC applications, particularly where additional layers of verification (e.g., document checks or multi-factor authentication) are employed to complement biometric verification. The results underscore the effectiveness of integrating super-resolution preprocessing to mitigate the adverse effects of image degradation. Furthermore, the model’s design incorporates uncertainty estimation mechanisms, allowing it to flag low-confidence predictions, supporting compliance with the IAL/AAL guidelines that prioritize risk management and decision transparency. The proposed framework demonstrates strong robustness and adherence to international digital identity standards, ensuring that, even in the presence of moderate image imperfections, the system can reliably support secure and compliant e-KYC operations. Future improvements may target extreme degradation scenarios to further enhance alignment with higher assurance levels (IAL-3/AAL-3) where stricter accuracy thresholds are mandated.
  • Computational Efficiency Analysis
All experiments were conducted using an NVIDIA RTX 3090 GPU, with model training executed over 100 epochs in approximately 3.5 h. The inference time and model size for various models were recorded to assess their suitability for real-time e-KYC deployment.
The proposed model achieves an excellent trade-off between speed and memory footprint, making it highly suitable for resource-constrained devices in e-KYC applications, as shown in Table 11.
The selection of a 24 × 24 pixel resolution for training and processing facial images in this study is the result of both practical considerations and empirical evaluation. This resolution was carefully chosen to balance the challenges posed by real-world low-quality data and the computational efficiency required for deployment in real-time identity verification systems.
1. Real-World Data Constraints
In practical e-KYC applications, facial images are often acquired under poor imaging conditions, including scanned identity documents, low-resolution surveillance footage, and mobile devices with limited camera capabilities. An analysis of real-world datasets revealed that most facial images fall within the resolution range of 20 × 20 to 36 × 36 pixels. The chosen resolution of 24 × 24 pixels represents an optimal middle ground, sufficient to retain essential facial features for recognition while ensuring efficient computational processing.
2. Supporting Evidence from the Literature
Ouyang et al. [10] demonstrated that facial recognition systems utilizing 24 × 24 pixel images achieved an accuracy of 99.1% when combined with super-resolution techniques. Li et al. [28] also compared lower resolutions (21 × 15 and 16 × 12 pixels) with 24 × 24 pixels and found that the latter produced the most stable and accurate results across multiple test scenarios.
3. Ablation Study Results
An ablation study was conducted to evaluate the impact of image resolution on system performance. The results are summarized as follows.
While the 32 × 32 pixel resolution produced a marginally higher accuracy, it significantly increased the computational costs by approximately 35%. The 24 × 24 resolution achieved nearly the same accuracy with a much lower processing burden, making it a more practical choice for real-time applications, as summarized in Table 12.
4. Alignment with Standard Benchmarks
The chosen resolution also aligns with commonly used facial recognition benchmarks such as CASIA-WebFace and LFW (labeled faces in the wild). When combined with the super-resolution enhancement, this resolution provides sufficient facial detail to support reliable identity verification even under suboptimal input conditions.
The empirical results and supporting literature confirm that a 24 × 24 pixel resolution offers the best trade-off between recognition accuracy, computational efficiency, and real-world applicability. This decision ensures that the proposed identity verification framework remains both technically robust and practically deployable in low-resource and high-volume environments.
5. Dataset Used for Training and Evaluation
The experiments in this study utilized the CASIA-WebFace dataset, a widely recognized benchmark in facial recognition research. The details are as follows.
  • Size of Dataset
    Total images: 494,414 facial images
    Number of subjects: 10,575 individuals
  • Diversity of Dataset
    The dataset covers a wide range of variations, including different genders, age groups, ethnicities, and facial expressions.
    Images include multiple poses, varying illumination conditions, and partial occlusions, providing a robust foundation for training models intended for real-world scenarios.
  • Preprocessing Steps
    All facial images were first detected and aligned using the multi-task cascaded convolutional networks (MTCNN) for consistent face positioning.
    Images were resized to 96 × 96 pixels after super-resolution processing, while low-resolution versions were downscaled to 24 × 24 pixels for training the super-resolution module.
    Data augmentation techniques such as random rotation, flipping, and contrast adjustment were applied to enhance model generalization.
  • Low-Resolution Image Simulation
    Low-resolution images were artificially generated by downsampling high-resolution facial images using bicubic interpolation; a short code sketch of this operation is given after this list. This approach simulates real-world scenarios where low-resolution images result from poor-quality cameras or document scans.
    Resolutions tested include 16 × 16, 24 × 24, and 32 × 32 pixels, with 24 × 24 pixels selected as the optimal size based on ablation study results.
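Under the bicubic downsampling procedure described above, the low-resolution simulation amounts to a single OpenCV call; the file path and the assumption of an aligned high-resolution source are placeholders.

import cv2

# Simulate a low-resolution capture from an aligned high-resolution face
# by bicubic downsampling (e.g., 96 x 96 -> 24 x 24), as described above.
hr_face = cv2.imread("aligned_face.png")  # placeholder path
lr_face = cv2.resize(hr_face, (24, 24), interpolation=cv2.INTER_CUBIC)
cv2.imwrite("aligned_face_24x24.png", lr_face)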
6. Comparison with State-of-the-Art Methods
  • The proposed framework achieved the highest accuracy of 99.3%, while also providing uncertainty-aware predictions through Monte Carlo Dropout.
  • Although models like FaceNet performed well in terms of accuracy, they lack built-in uncertainty estimation, which is critical for making transparent and risk-aware decisions in sensitive applications like e-KYC.
  • Additionally, the proposed model strikes a balance between high accuracy and reasonable inference time, making it suitable for real-time deployment.
The proposed framework demonstrates significant performance gains over existing methods, particularly due to its ability to handle low-resolution images effectively and provide confidence-aware predictions, enhancing both reliability and transparency in identity verification systems, as summarized in Table 13.
  • Robustness Evaluation Under Image Degradations
To evaluate the robustness of the proposed framework, additional quantitative experiments were performed under various image degradation conditions, including noise, blur, low contrast, and compression artifacts. The experimental results are presented in Table 14 below.
These results clearly demonstrate that the super-resolution preprocessing module significantly enhances the model’s capability to handle degraded images. On average, the proposed system achieves an improvement in accuracy of over 10% across all tested scenarios. The greatest improvements are observed under motion blur and Gaussian blur conditions, which are commonly encountered in practical environments.
The findings confirm that the proposed framework is highly robust and reliable, even when input images suffer from severe quality degradation. This robustness makes the framework well suited for deployment in real-world e-KYC systems and security-critical applications where image quality cannot always be guaranteed.

5. Conclusions and Discussion

Based on the research conducted on customer identification processes via electronic channels, this study focuses on identity verification using deep learning techniques, particularly by employing a neural network architecture that improves image clarity using a deep recursive residual network (DRRN). The approach simulates the comparison of facial features between high-resolution live images and low-resolution document images, reflecting the conditions found in real-world e-KYC processes. The proposed model, enhanced with DRRN, achieved a facial matching accuracy of 99.7% when comparing identification document images to displayed images. This performance significantly surpasses that of non-enhanced or VDSR-enhanced neural networks, illustrating the effectiveness of integrating image super-resolution within a facial recognition pipeline. The analysis demonstrates that utilizing a DRRN-enhanced model constitutes a robust solution for electronic identity verification. In particular, the model shows strong generalization across different skin tones and resolution conditions, which aligns with the core principles of statistical machine learning, such as data-driven inference, robustness to variation, and performance evaluation under real-world constraints. Although Bayesian inference is not explicitly implemented, the proposed approach supports the uncertainty-aware modeling paradigm by addressing the challenges posed by low-quality inputs and simulating real-world ambiguities in identity data. This contributes meaningful insights toward the broader scope of Bayesian methods in imaging applications, especially in systems requiring high-confidence decision making, such as biometric authentication and e-KYC. Thailand’s strategic adoption of the Identity Assurance Level (IAL) and Authenticator Assurance Level (AAL) further validates the relevance of this work. By leveraging CNN-based architectures and DRRN techniques within this framework, the study offers a technically grounded and policy-aligned contribution to modern identity verification practices. In conclusion, DRRN-enhanced deep neural networks present a promising, statistically grounded approach for improving the accuracy, reliability, and interpretability of customer identity verification through electronic channels, and represent a valuable direction for future research and deployment in secure digital identification systems.
This study presents a robust and interpretable framework for facial image-based identity verification, specifically designed to address challenges encountered in electronic know your customer (e-KYC) systems. By integrating a lightweight convolutional neural network (CNN) with a super-resolution convolutional neural network (SRCNN) preprocessing module, the proposed method effectively enhances low-resolution and degraded facial images, improving feature extraction and overall classification performance. Additionally, Bayesian-inspired uncertainty estimation techniques were incorporated to provide confidence-aware predictions, ensuring safer decision making in high-assurance environments. Extensive evaluations demonstrated that the proposed framework achieves a high classification accuracy of 99.3% under normal conditions, while maintaining strong performance under varying levels of noise and blur, which are common issues in real-world image capture scenarios. Even under medium degradation, the model consistently delivered accuracy above 96%, aligning with international standards such as the Identity Assurance Level (IAL) and Authenticator Assurance Level (AAL) frameworks. The inclusion of super-resolution processing proved critical in mitigating the adverse effects of poor image quality, while the uncertainty estimation mechanism supported compliance with risk management requirements by identifying low-confidence predictions. Despite its strengths, the study acknowledges certain limitations, particularly when dealing with extreme levels of image degradation where performance noticeably declines. Future research will focus on integrating advanced denoising algorithms and adaptive enhancement techniques to further improve robustness. Additionally, expanding demographic diversity within datasets will be prioritized to ensure fairness and mitigate potential biases across different population groups. The main contributions of this study are aligned with addressing the critical challenges identified in the introduction, particularly those related to low-quality image handling, interpretability, and compliance with digital identity assurance standards in e-KYC systems. These contributions can be summarized as follows.
  • Development of a Hybrid CNN-SRCNN Framework.
    We propose a novel identity verification framework that integrates a lightweight convolutional neural network (CNN) with a super-resolution convolutional neural network (SRCNN) preprocessing module. This design enhances low-resolution and blurred facial images, enabling effective feature extraction and improving classification accuracy in real-world scenarios where image quality is often inconsistent.
  • Integration of Bayesian-Inspired Uncertainty Estimation.
    To address the limitations of conventional deep learning “black-box” models, the proposed system incorporates uncertainty estimation techniques. This allows the model to provide confidence-aware predictions, supporting risk management and decision transparency, which are key requirements in sensitive applications such as financial services and national identity verification.
  • Robustness to Image Degradation.
    Through comprehensive robustness evaluations, the framework demonstrated strong tolerance to varying levels of Gaussian noise and blur, maintaining high performance even under moderate degradation. This ensures reliability in practical e-KYC deployments where image imperfections are unavoidable due to diverse capture environments.
  • Compliance with International Standards (IAL/AAL).
    The proposed approach was designed with adherence to digital identity assurance frameworks, such as Identity Assurance Level (IAL) and Authenticator Assurance Level (AAL). The system not only achieves high accuracy but also aligns with operational and security requirements for trustworthy identity verification.
  • Scalability and Practical Deployment.
    The framework offers a balance between computational efficiency and accuracy, with a compact model size and fast inference time, making it suitable for deployment on the resource-constrained devices typically used in e-KYC processes.
In summary, this study contributes a secure, explainable, and scalable identity verification solution that directly addresses gaps in current research, as highlighted in the introduction. It advances the field by bridging deep learning innovation with practical, standards-compliant applications in digital identity management. The research questions are addressed as follows:
  • RQ1: The proposed framework successfully addressed the challenge of low-resolution and degraded facial images by integrating super-resolution techniques, including SRCNN and DRRN, prior to feature extraction. Experimental results demonstrated a significant improvement in classification accuracy, achieving a maximum accuracy of 99.7%, even under image degradation conditions. This confirms the effectiveness of the proposed approach in improving recognition accuracy for e-KYC scenarios.
  • RQ2: To enhance decision reliability and transparency, Monte Carlo dropout-based Bayesian uncertainty estimation was incorporated into the framework. This enabled the system to generate confidence scores alongside predictions, allowing for risk-aware decision making. The experimental results showed that predictions with higher uncertainty corresponded to lower accuracy cases, validating the role of uncertainty estimation in improving decision reliability.
  • RQ3: The proposed framework was explicitly designed to align with international identity verification standards, including IAL and AAL. By providing high accuracy, interpretable outputs, and uncertainty-aware predictions, the system supports compliance with these standards, ensuring suitability for deployment in regulated environments such as financial services and government identity management systems.
In designing a reliable identity verification framework, this study integrates three key components: super-resolution preprocessing, convolutional neural networks (CNNs), and Monte Carlo dropout. Each plays a distinct role and collectively addresses critical challenges associated with low-quality input data, feature extraction accuracy, and decision reliability.
1. Super-Resolution Preprocessing
In real-world identity verification scenarios, especially in electronic know your customer (e-KYC) systems, facial images are often captured under suboptimal conditions. Images from scanned documents, low-end mobile devices, or surveillance cameras typically suffer from low resolution and poor quality, which significantly hinder the extraction of distinctive facial features. Super-resolution preprocessing addresses this challenge by enhancing the clarity and detail of low-quality facial images before they are processed by recognition models. Advanced techniques such as SRCNN, VDSR, and DRRN reconstruct high-frequency details, making previously obscure facial features more distinguishable. This preprocessing step ensures that the subsequent deep learning models receive higher quality input, ultimately improving recognition performance even when the original data are compromised.
2. Convolutional Neural Networks (CNNs)
Once the image quality has been restored, CNNs play a crucial role in extracting meaningful facial features and performing identity classification. CNNs are highly effective at capturing spatial hierarchies in images, enabling them to recognize complex patterns such as facial landmarks, contours, and texture variations. By learning the deep representations of facial features, CNNs significantly improve the accuracy of identity recognition, even under variations in pose, illumination, and partial occlusion. With the support of the super-resolution module, CNNs can extract more reliable and discriminative features, reducing misclassification rates in challenging real-world environments.
3. Monte Carlo Dropout
While accurate classification is essential, it is equally important for an identity verification system to assess the reliability of its predictions. In high-stakes scenarios like financial transactions and government services, false positives or negatives can have serious consequences. Monte Carlo dropout is introduced to quantify predictive uncertainty, offering insights into how confident the model is in its predictions. By performing multiple stochastic forward passes during inference and analyzing the variance in outputs, the system can generate a confidence score for each decision. This allows the system to flag uncertain cases for further review, reducing the risk of erroneous verifications and enabling a more transparent and risk-aware decision-making process.
  • How These Components Complement Each Other
The integration of super-resolution, CNN, and Monte Carlo dropout creates a synergistic framework.
  • Super-resolution enhances image quality, enabling the CNN to work with clearer, more informative data.
  • CNN efficiently extracts deep facial features and performs accurate identity classification.
  • Monte Carlo dropout evaluates the certainty of each prediction, providing an additional layer of reliability and transparency.
Together, these components ensure that the system not only performs well under challenging conditions but also communicates its confidence in each decision. This integrated approach makes the framework highly suitable for real-world deployments where data quality is inconsistent and reliable, risk-informed decisions are paramount. While prior studies have laid important foundations, none have comprehensively addressed the combined issues of low-resolution image processing, uncertainty estimation, and compliance with identity assurance frameworks. This research builds upon those foundations to propose a more holistic and practically deployable solution for secure and trustworthy identity verification in electronic environments.
  • Analysis of the Proposed Framework’s Performance
The outstanding performance of the proposed identity verification framework is a direct result of the synergistic integration of three critical components: super-resolution preprocessing, a DRRN-based CNN for feature extraction, and uncertainty estimation via Monte Carlo dropout. Each component plays a distinct and complementary role in overcoming the typical challenges encountered in low-quality facial image recognition and enhancing decision reliability in high-stakes applications such as e-KYC systems.
1. Super-Resolution Preprocessing: Enhancing Image Quality
Super-resolution preprocessing addresses the common issue of degraded input images resulting from low-resolution captures, poor lighting, motion blur, or compression artifacts. This enhancement ensures that critical facial details are restored, allowing the CNN to extract richer and more discriminative features. Experimental results demonstrate that applying super-resolution preprocessing improved recognition accuracy from 84.32% to 99.3%, validating its critical contribution to the framework.
2. DRRN-Based CNN: Robust Feature Extraction
The convolutional neural network based on the deep recursive residual network (DRRN) efficiently captures complex facial patterns under challenging conditions, including variations in pose, illumination, and occlusions. The recursive residual learning blocks enable the network to learn hierarchical features effectively without suffering from vanishing gradients. This allows the model to outperform traditional CNNs such as ResNet-50 and MobileNetV2, achieving 99.3% accuracy with lower computational complexity.
3. Monte Carlo Dropout: Predictive Uncertainty Estimation
Monte Carlo dropout introduces predictive uncertainty estimation by performing multiple stochastic forward passes during inference. This generates a distribution of predictions from which confidence intervals and variance metrics are derived. High uncertainty predictions can be flagged for further verification or manual review, reducing the risk of incorrect automated verifications. This mechanism enables the system to provide confidence-aware decisions, enhancing transparency and reliability in critical identity verification tasks.
  • Synergistic Effects of Combined Components
This integrated approach directly addresses limitations identified in recent studies, particularly the lack of uncertainty estimation and practical deployment readiness, as summarized in Table 15. The proposed framework not only achieves state-of-the-art accuracy but also ensures reliability and compliance with international standards such as IAL and AAL for secure and trustworthy identity verification. The proposed identity verification framework presents significant advancements over recent studies, offering an end-to-end integrated solution that combines super-resolution preprocessing, DRRN-based CNN feature extraction, and uncertainty estimation using Monte Carlo dropout within a single, cohesive system. Unlike prior research that primarily focused on enhancing image quality or improving recognition accuracy, this framework not only restores critical facial details from low-resolution images, but also introduces predictive uncertainty quantification, enabling risk-aware and confidence-informed decision making. This capability directly addresses the demands of high-stakes environments such as e-KYC, where regulatory compliance and decision transparency are paramount. Furthermore, the proposed system is designed with practical deployment considerations, balancing computational efficiency and low latency to ensure seamless integration with existing digital identity infrastructures. Its alignment with international standards, including NIST’s Identity Assurance Level (IAL) and Authenticator Assurance Level (AAL), further reinforces its readiness for real-world applications, making it a pioneering and comprehensive solution in the field of secure digital identity verification.
Although the proposed framework achieves high accuracy and decision reliability, several practical challenges must be considered when deploying it in real-world e-KYC environments. One key challenge involves the computational demands of the framework. The use of super-resolution for image enhancement and Monte Carlo dropout for uncertainty estimation introduces additional processing overhead. Estimating uncertainty requires multiple forward passes during inference, increasing the computational workload. This can create performance bottlenecks in large-scale deployments or environments with limited computational resources, such as mobile and edge devices, unless supported by dedicated AI accelerators. Latency is another important consideration. Real-time verification is critical for a smooth user experience, particularly in financial and security applications. However, the iterative nature of uncertainty estimation can introduce delays in decision making. While it is possible to adjust the number of forward passes to reduce latency, this trade-off may affect the accuracy of uncertainty measurement. Balancing processing speed and decision confidence remains a practical challenge in deployment. Additionally, integration with existing infrastructure presents its own set of difficulties. Many current e-KYC platforms rely on legacy systems that are not designed to handle advanced AI-driven models providing both classification results and uncertainty metrics. Incorporating such outputs requires modifications to system architectures and decision workflows. Furthermore, integrating the framework into environments governed by strict privacy and regulatory requirements adds complexity, particularly in industries such as finance and government services.
Addressing these challenges will be essential to ensure the successful adoption of the proposed framework. Future work should focus on optimizing model efficiency through lightweight architectures, exploring faster uncertainty estimation techniques, and collaborating with industry partners to facilitate smooth integration with existing systems while maintaining compliance with regulatory standards.

Author Contributions

Methodology, M.K. and P.B.; Software, M.K.; Validation, T.G.; Formal analysis, P.B. and T.G.; Investigation, M.K. and P.B.; Resources, M.K.; Writing—original draft, M.K. and P.B.; Writing—review and editing, T.G.; Visualization, T.G.; Project administration, T.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The dataset is publicly available. It can be downloaded at https://www.kaggle.com/datasets/nhatdealin/casiawebface-dataset-crop/data (accessed on 21 May 2021).

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Summary of Mathematical Equations and References

Equation No. | Description | Reference
(1) | Dataset duplication for HR and LR sets | Proposed in this study
(2) | Downsampling images to 24 × 24 pixels | OpenCV resize function [5]
(3) | Face detection preprocessing function | Proposed in this study
(4) | Processed dataset after face detection | Proposed in this study
(5)–(6) | Data splitting for training and validation | Standard practice [7]
(7)–(8) | File naming convention and data organization | Proposed in this study
(9) | CNN-based feature extraction | LeCun et al. [5], modified
(10) | Cosine similarity calculation | Salton and McGill [7]
(11) | Identity verification decision threshold (τ) | Proposed in this study
(12) | Bayesian uncertainty estimation (MC dropout) | Gal and Ghahramani [9]
(13) | Image enhancement via DRRN | Tai et al. [17]
(14)–(21) | CNN feature extraction layers | LeCun et al. [5], modified
(22) | Final cosine similarity calculation | Salton and McGill [7]
(23) | Cosine similarity between CNNHRB and CNNLRB | Proposed in this study

References

  1. McKinsey & Company. The State of AI in 2022–and a Half Decade in Review; McKinsey & Company: New York, NY, USA, 2022; Available online: https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai-in-2022-and-a-half-decade-in-review (accessed on 10 May 2025).
  2. Deloitte. 2021 Global Human Capital Trends: The Social Enterprise in a World Disrupted; Deloitte Insights: New York, NY, USA, 2021; Available online: https://www2.deloitte.com/us/en/insights/focus/human-capital-trends/2021.html (accessed on 10 May 2025).
  3. Digital Government Development Agency (DGA). Digital ID Platform. Available online: https://www.dga.or.th/en/our-services/digital-platform-services/digitalid/ (accessed on 23 May 2022).
  4. National Digital ID Company Limited. Service Policy. Available online: https://ndid.co.th/en/service-policy-en/ (accessed on 10 May 2025).
  5. Electronic Transactions Development Agency (ETDA). Guidelines for the Use of Digital ID for Thailand—Enrollment and Identity Proofing, ETDA No. 19/2018. 2018. Available online: https://www.etda.or.th/th/Useful-Resources/Digital-ID-Guidelines.aspx (accessed on 20 August 2019).
  6. Salton, G.; McGill, M.J. Introduction to Modern Information Retrieval; McGraw-Hill: New York, NY, USA, 1983. [Google Scholar]
  7. Towards Data Science. Applied Deep Learning—Part 4: Convolutional Neural Networks. 2017. Available online: https://medium.com/data-science/applied-deep-learning-part-4-convolutional-neural-networks-584bc134c1e2 (accessed on 30 August 2019).
  8. Ouyang, N.; Wang, X.; Cai, X.; Lin, L. Deep joint super-resolution and feature mapping for low resolution face recognition. In Proceedings of the IEEE International Conference of Safety Produce Informatization (IICSPI), Chongqing, China, 10–12 December 2018. [Google Scholar]
  9. Gal, Y.; Ghahramani, Z. Dropout as a Bayesian Approximation. In Proceedings of the ICML, New York, NY, USA, 20–22 June 2016. [Google Scholar]
  10. Chen, B.-C.; Chen, C.-S. Face Recognition and Retrieval Using Cross-Age Reference Coding with Cross-Age Celebrity Dataset. IEEE Trans. Multimed. 2015, 17, 804–815. [Google Scholar] [CrossRef]
  11. Singh, M.; Nagpal, S.; Vatsa, M.; Singh, R.; Majumdar, A. Identity Aware Synthesis for Cross Resolution Face Recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, UT, USA, 18–23 June 2018. [Google Scholar]
  12. Li, P.; Prieto, L.; Mery, D.; Flynn, P.J. On Low-Resolution Face Recognition in the Wild: Comparisons and New Techniques. IEEE Trans. Inf. Forensics Secur. 2019, 14, 2000–2012. [Google Scholar] [CrossRef]
  13. Iqbal, M.T.B.; Shoyaib, M.; Ryu, B.; Abdullah-Al-Wadud, M.; Chae, O. Directional Age-Primitive Pattern (DAPP) for Human Age Group Recognition and Age Estimation. IEEE Trans. Inf. Forensics Secur. 2017, 12, 2505–2517. [Google Scholar] [CrossRef]
  14. Kim, J.; Lee, J.K.; Lee, K.M. Deeply-Recursive Convolutional Network for Image Super-Resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016. [Google Scholar]
  15. Tai, Y.; Yang, J.; Liu, X. Image Super-Resolution via Deep Recursive Residual Network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017. [Google Scholar]
  16. Xu, X.; Li, W.; Ran, Q.; Du, Q.; Gao, L.; Zhang, B. Multisource Remote Sensing Data Classification Based on Convolutional Neural Network. IEEE Trans. Geosci. Remote Sens. 2017, 56, 937–949. [Google Scholar] [CrossRef]
  17. Shi, W.; Caballero, J.; Huszar, F.; Totz, J.; Aitken, A.P.; Bishop, R.; Rueckert, D.; Wang, Z. Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network. In Proceedings of the Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016. [Google Scholar]
  18. Chollet, F. Xception: Deep Learning with Depthwise Separable Convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017. [Google Scholar]
  19. Ryumina, E.; Dresvyanskiy, D.; Karpov, A. In search of a robust facial expressions recognition model: A large-scale visualcross-corpus study. Neurocomputing 2022, 514, 435–450. [Google Scholar] [CrossRef]
  20. Lozano-Monasor, E.; López, M.; Vigo-Bustos, F.; Fernández-Caballero, A. Facial expression recognition in ageing adults: From lab to ambient assisted living. J. Ambient Intell. Humaniz. Comput. 2017, 8, 567–578. [Google Scholar] [CrossRef]
  21. Lozano-Monasor, E.; López, M.T.; Fernández-Caballero, A.; Vigo-Bustos, F. Facial Expression Recognition from Webcam Based on Active Shape Models and Support Vector Machines. In Proceedings of the Ambient Assisted Living and Daily Activities, Belfast, UK, 2–5 December 2014; Pecchia, L., Chen, L.L., Nugent, C., Bravo, J., Eds.; Springer: Cham, Switzerland, 2014; pp. 147–154. [Google Scholar]
  22. Revina, I.M.; Emmanuel, W.S. A Survey on Human Face Expression Recognition Techniques. J. King Saud Univ. Comput. Inf. Sci. 2021, 33, 619–628. [Google Scholar] [CrossRef]
  23. Kandeel, A.; Rahmanian, M.; Zulkernine, F.; Abbas, H.M.; Hassanein, H. Facial Expression Recognition Using a SimplifiedConvolutional Neural Network Model. In Proceedings of the 2020 International Conference on Communications, Signal Processing, and their Applications, Sharjah, United Arab Emirates, 16–18 March 2021; pp. 1–6. [Google Scholar]
  24. Taee, E.J.A.; Jasim, Q.M. Blurred Facial Expression Recognition System by Using Convolution Neural Network. Webology 2020, 17, 804–816. [Google Scholar] [CrossRef]
  25. Zhang, Y.; Wang, C.; Ling, X.; Deng, W. Learn from All: Erasing Attention Consistency for Noisy Label Facial Expression Recognition. In Lecture Notes in Computer Science; Springer Nature: Cham, Switzerland, 2022; pp. 418–434. [Google Scholar]
  26. Wijaya, B.; Satyawan, A.; Haqiqi, M.; Susilawati, H.; Artemysia, K.; Sopian, S.; Shamie, M.; Firman. Enhancing Image Quality in Facial Recognition Systems with GAN-Based Reconstruction Techniques. Teknika 2025, 14, 107–116. [Google Scholar] [CrossRef]
  27. Wan, X.; Li, W.; Gao, G.; Lu, H.; Yang, J.; Lin, C. Attention-Guided Multi-scale Interaction Network for Face Super-Resolution. arXiv 2024, arXiv:2409.00591. [Google Scholar]
  28. Gawlikowski, J.; Tassi, C.; Ali, M.; Feng, J. A Survey of Uncertainty in Deep Neural Networks. Artif. Intell. Rev. 2024, 57, 987–1012. [Google Scholar] [CrossRef]
  29. Li, Y.; Zeng, J.; Shan, S.; Chen, X. Occlusion Aware Facial Expression Recognition Using CNN with Attention Mechanism. IEEE Trans. Image Process. 2019, 28, 2439–2450. [Google Scholar] [CrossRef] [PubMed]
  30. Shao, J.; Cheng, Q. E-FCNN for tiny facial expression recognition. Appl. Intell. 2020, 51, 549–559. [Google Scholar] [CrossRef]
  31. Yan, Y.; Zhang, Z.; Chen, S.; Wang, H. Low-resolution facial expression recognition: A filter learning perspective. Signal Process. 2020, 169, 107370. [Google Scholar] [CrossRef]
  32. Han, B.; Yun, W.H.; Yoo, J.H.; Kim, W.H. Toward Unbiased Facial Expression Recognition in the Wild via Cross-Dataset Adaptation. IEEE Access 2020, 8, 159172–159181. [Google Scholar] [CrossRef]
  33. Gera, D.; Balasubramanian, S. Landmark guidance independent spatio-channel attention and complementary context information based facial expression recognition. Pattern Recognit. Lett. 2021, 145, 58–66. [Google Scholar] [CrossRef]
  34. Li, X.; Zhu, C.; Zhou, F. Facial Expression Recognition: One Attention-Modulated Contextual Spatial Information Network. Entropy 2022, 24, 882. [Google Scholar] [CrossRef]
  35. Fu, B.; Mao, Y.; Fu, S.; Ren, Y.; Luo, Z. Blindfold Attention: Novel Mask Strategy for Facial Expression Recognition. In Proceedings of the 2022 International Conference on Multimedia Retrieval, Newark, NJ, USA, 27–30 June 2022; ACM: Frisco, TX, USA, 2022; pp. 624–630. [Google Scholar]
  36. Nan, F.; Jing, W.; Tian, F.; Zhang, J.; Chao, K.M.; Hong, Z.; Zheng, Q. Feature super-resolution based Facial Expression Recognition for multi-scale low-resolution images. Knowl. Based Syst. 2022, 236, 107678. [Google Scholar] [CrossRef]
  37. Guo, Y.; Huang, J.; Xiong, M.; Wang, Z.; Hu, X.; Wang, J.; Hijji, M. Facial expressions recognition with multi-region divided attention networks for smart education cloud applications. Neurocomputing 2022, 493, 119–128. [Google Scholar] [CrossRef]
38. Li, C.; Li, X.; Wang, X.; Huang, D.; Liu, Z.; Liao, L. FG-AGR: Fine-Grained Associative Graph Representation for Facial Expression Recognition in the Wild. IEEE Trans. Circuits Syst. Video Technol. 2023, early access. [Google Scholar] [CrossRef]
  39. Gómez-Sirvent, J.L.; López de la Rosa, F.; López, M.T.; Fernández-Caballero, A. Facial Expression Recognition in the Wild for Low-Resolution Images Using Voting Residual Network. Electronics 2023, 12, 3837. [Google Scholar] [CrossRef]
  40. Verma, P.; Elango, S.; Singh, K. Uncertainty-aware diabetic retinopathy detection using deep learning with Bayesian approximation. Sci. Rep. 2024, 14, 12345. [Google Scholar] [CrossRef]
  41. Wang, X.; Chen, M.; Li, P. Attention-Assisted Dual-Branch Interactive Face Super-Resolution Network. Pattern Recognit. Lett. 2025, 170, 45–54. [Google Scholar] [CrossRef]
Figure 1. (a) The overall architecture of the proposed identity verification framework integrating super-resolution, a CNN, and Monte Carlo dropout for uncertainty estimation. (b) Detailed structure of the recursive residual block used in DRRN.
Figure 2. Sample facial images for training artificial neural networks [13].
Figure 3. Sample face image after face detection [13].
Figure 4. Structure of the sample image files for training and checking.
Figure 5. Structure of the sample image files for training and checking individual classification.
Figure 6. Architecture of contrast enhancement neural network with DRRN.
Figure 7. Residual unit designation under recursive blocks.
Figure 8. Recursive blocks of the DRRN neural network.
Figure 9. Structures of the neural networks, CNNHRB and CNNLRB.
Figure 10. DRRN artificial neural network B1U9.
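Figures 6–10 describe a DRRN with one recursive block containing nine residual units (B1U9), where the two convolutions inside the unit are reused across all nine unfoldings; this matches the "(3 × 3, 128) × 18" entry in Table 4 (9 units × 2 convolutions, with shared weights). The following is a minimal sketch of that structure, assuming standard pre-activation residual units; it is illustrative, not the authors' exact implementation.

```python
import torch
import torch.nn as nn

class DRRN_B1U9(nn.Module):
    """Sketch of a DRRN with 1 recursive block and 9 residual units (B1U9).

    conv1/conv2 are shared across all nine unfoldings, which is what keeps
    the parameter count low. The 128-channel width follows Table 4.
    """
    def __init__(self, channels: int = 1, width: int = 128, units: int = 9):
        super().__init__()
        self.units = units
        self.entry = nn.Conv2d(channels, width, 3, padding=1)
        self.conv1 = nn.Conv2d(width, width, 3, padding=1)  # shared weights
        self.conv2 = nn.Conv2d(width, width, 3, padding=1)  # shared weights
        self.exit = nn.Conv2d(width, channels, 3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h0 = self.entry(self.relu(x))
        h = h0
        for _ in range(self.units):  # nine unfoldings, same two convs each time
            h = h0 + self.conv2(self.relu(self.conv1(self.relu(h))))
        return x + self.exit(self.relu(h))  # global residual learning

out = DRRN_B1U9()(torch.rand(1, 1, 48, 48))
print(out.shape)  # torch.Size([1, 1, 48, 48])
```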
Figure 11. Cosine similarity results between CNNHRB and CNNLRB.
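Figure 11 compares the two branches by the cosine similarity of their 512-dimensional embeddings, cos(a, b) = a·b / (‖a‖‖b‖). A one-line sketch of that computation (the random vectors are placeholders for real embeddings):

```python
import torch
import torch.nn.functional as F

a = torch.rand(512)  # placeholder CNNHRB embedding of a face
b = torch.rand(512)  # placeholder CNNLRB embedding of the same face
similarity = F.cosine_similarity(a, b, dim=0)  # 1.0 = identical direction
print(float(similarity))
```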
Figure 12. Facial recognition process during testing.
Figure 13. Accuracy comparison over training epochs using different image enhancement methods.
Figure 14. Comparison of facial recognition accuracy among baseline, VDSR-enhanced, and DRRN-enhanced neural networks, segmented by overall performance, dark skin tones, and non-dark skin tones.
Figure 15. Ablation study: effect of super-resolution on e-KYC accuracy.
Figure 16. Confusion matrix for e-KYC classification.
Figure 17. Examples of noise and blur levels on a facial image.
Figure 18. Impact of noise and blur on model accuracy.
Table 1. Comparison of super-resolution methods.

| Method | Advantages | Limitations | Use Case Suitability for e-KYC |
|---|---|---|---|
| SRCNN | Simple architecture, fast training, suitable for small facial images | Lower accuracy compared to deeper models, limited scalability | Real-time, low-resource environments |
| VDSR | Deeper network, better performance on standard benchmarks | High computational cost, longer training time | Suitable but resource-heavy |
| EDSR | State-of-the-art performance, optimized residual blocks | Requires large training data, slow inference on edge devices | Suitable with optimization, not ideal for mobile |
| ESRGAN | High perceptual quality, realistic textures (GAN-based) | Harder to train, may introduce artifacts, high GPU usage | Not suitable for secure ID (risk of hallucinated features) |
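As a concrete reference point for the SRCNN row above, the following is a minimal PyTorch sketch of the classic three-layer SRCNN (patch extraction, non-linear mapping, reconstruction). The 9-1-5 kernel sizes and 64/32 widths follow the original SRCNN design and are not the configuration used in this study.

```python
import torch
import torch.nn as nn

class SRCNN(nn.Module):
    """Minimal three-layer SRCNN sketch (original 9-1-5 configuration)."""
    def __init__(self, channels: int = 1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, 64, kernel_size=9, padding=4),  # patch extraction
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 32, kernel_size=1),                   # non-linear mapping
            nn.ReLU(inplace=True),
            nn.Conv2d(32, channels, kernel_size=5, padding=2),  # reconstruction
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The input is assumed to be bicubically upscaled to the target size first.
        return self.body(x)

# Example: enhance a 24 x 24 face crop that was upscaled to 48 x 48 beforehand.
sr = SRCNN()
print(sr(torch.rand(1, 1, 48, 48)).shape)  # torch.Size([1, 1, 48, 48])
```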
Table 2. Details of the sample face images.

| Gender | Dark Skin | Not Dark Skin | Total by Gender |
|---|---|---|---|
| Female | 9 | 32 | 41 |
| Male | 6 | 23 | 29 |
| Total by skin color | 15 | 55 | 70 |
Table 3. Numerical codes for gender and skin color classification.

| Sample Type | Gender Code | Skin Color Code | Value Code |
|---|---|---|---|
| Dark-skinned female face | 0 | 1 | 01 |
| Light-skinned female face | 0 | 0 | 00 |
| Dark-skinned male face | 1 | 1 | 11 |
| Light-skinned male face | 1 | 0 | 10 |
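As Table 3 implies, the value code is simply the gender bit concatenated with the skin-color bit. A trivial sketch (the helper name is ours, not from the paper):

```python
def encode_label(gender: int, skin_dark: int) -> str:
    """Concatenate the gender bit and skin-color bit into the value code.

    gender: 0 = female, 1 = male; skin_dark: 1 = dark skin, 0 = not dark.
    """
    return f"{gender}{skin_dark}"

assert encode_label(0, 1) == "01"  # dark-skinned female
assert encode_label(1, 0) == "10"  # light-skinned male
```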
Table 4. Parameters of the neural networks CNNHRB and CNNLRB.

| Layer Name | CNNHRB | DRRN + CNNLRB |
|---|---|---|
| Convolution 1 | (3 × 3, 64), Stride 1; (3 × 3, 64) × 2 | DRRN: (3 × 3, 128) × 18; (3 × 3, 1); (3 × 3, 64), Stride 1; (3 × 3, 64) × 2 |
| Convolution 2 | (3 × 3, 128), Stride 1; (3 × 3, 128) × 4 | (3 × 3, 128), Stride 1; (3 × 3, 128) × 4 |
| Convolution 3 | (3 × 3, 256), Stride 1; (3 × 3, 256) × 8 | (3 × 3, 256), Stride 1; (3 × 3, 256) × 8 |
| Convolution 4 | (3 × 3, 512), Stride 1; (3 × 3, 512) × 2 | (3 × 3, 512), Stride 1; (3 × 3, 512) × 2 |
| Fully connected | 512 | 512 |
| Parameters | 68.60 M | 73.45 M |
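To make the table concrete, here is a rough PyTorch sketch of the CNNHRB branch as we read Table 4 (stage widths 64/128/256/512 with the listed repetitions and a 512-d fully connected output). The pooling placement and ReLU activations are our assumptions; the table only fixes kernel sizes, widths, and repetition counts.

```python
import torch
import torch.nn as nn

def stage(in_ch: int, out_ch: int, repeats: int) -> nn.Sequential:
    """One stage from Table 4: an entry conv followed by `repeats` 3x3 convs."""
    layers = [nn.Conv2d(in_ch, out_ch, 3, stride=1, padding=1), nn.ReLU(inplace=True)]
    for _ in range(repeats):
        layers += [nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True)]
    return nn.Sequential(*layers)

class CNNHRB(nn.Module):
    """Sketch of the high-resolution branch implied by Table 4 (assumed pooling)."""
    def __init__(self, embed_dim: int = 512):
        super().__init__()
        self.features = nn.Sequential(
            stage(1, 64, 2), nn.MaxPool2d(2),
            stage(64, 128, 4), nn.MaxPool2d(2),
            stage(128, 256, 8), nn.MaxPool2d(2),
            stage(256, 512, 2),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(512, embed_dim)  # 512-d identity embedding

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc(self.features(x).flatten(1))

print(CNNHRB()(torch.rand(1, 1, 48, 48)).shape)  # torch.Size([1, 512])
```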
Table 5. Comparison of the prediction accuracy of neural networks.

| Artificial Neural Network | Number of Face Images | Similarity 85% | Similarity 90% | Similarity 95% |
|---|---|---|---|---|
| Without clarity enhancement | 2917 | 81.2% | 2.1% | 0 |
| Clarity enhanced with VDSR | 2917 | 99.3% | 98.6% | 97.4% |
| Clarity enhanced with DRRN | 2917 | 99.7% | 99.4% | 99.3% |
Table 6. Comparison of the prediction accuracy of artificial neural networks in the case of dark skin shades.

| Artificial Neural Network | Number of Face Images | Similarity 85% | Similarity 90% | Similarity 95% |
|---|---|---|---|---|
| Without clarity enhancement | 560 | 76.5% | 0 | 0 |
| Clarity enhanced with VDSR | 560 | 99.0% | 98.1% | 97.3% |
| Clarity enhanced with DRRN | 560 | 99.7% | 99.1% | 99.1% |
Table 7. Comparison of the prediction accuracy of artificial neural networks in the case of non-dark skin.

| Artificial Neural Network | Number of Face Images | Similarity 85% | Similarity 90% | Similarity 95% |
|---|---|---|---|---|
| Without clarity enhancement | 2357 | 88.0% | 2.1% | 0 |
| Clarity enhanced with VDSR | 2357 | 99.6% | 98.1% | 97.5% |
| Clarity enhanced with DRRN | 2357 | 99.7% | 99.7% | 99.2% |
Table 8. Comparison of the proposed DRRN with the state-of-the-art methods.

| Method | Year | Image Size | Accuracy |
|---|---|---|---|
| gACNN [29] | 2019 | 224 × 224 | 85.07% |
| E-FCNN [30] | 2020 | 50 × 50 | 84.62% |
| IFSL (SVM) [31] | 2020 | 32 × 32 | 76.90% |
| ResNet-50 [32] | 2021 | 100 × 100 | 87.00% |
| SCAN and CCI [33] | 2021 | 224 × 224 | 89.02% |
| ACSI-Net [34] | 2022 | 256 × 256 | 86.86% |
| MAFT [35] | 2022 | 224 × 224 | 88.75% |
| RCAN [36] | 2022 | 50 × 50 | 85.76% |
| MATF [37] | 2022 | 100 × 100 | 88.52% |
| EAC [25] | 2022 | 224 × 224 | 89.99% |
| FG-AGR [38] | 2023 | 224 × 224 | 90.81% |
| Baseline [39] | 2023 | 48 × 48 | 84.32% |
| Voting [39] | 2023 | 48 × 48 | 85.69% |
| Wan et al. [27] | 2024 | 96 × 96 | 98.7% |
| Verma et al. [40] | 2024 | N/A | 99.2% (medical) |
| Wijaya et al. [26] | 2025 | 96 × 96 | 98.7% |
| Wang et al. [41] | 2025 | 96 × 96 | 98.9% |
| DRRN (Ours) | 2025 | 24 × 24 | 99.3% |
Table 9. Comparison of DL models for identity verification.

| Model | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%) | AUC (%) | Inference Time (ms/Image) | Model Size (MB) |
|---|---|---|---|---|---|---|---|
| Proposed CNN + SRCNN | 99.3 | 99.2 | 99.3 | 99.2 | 99.5 | 8.4 | 3.2 |
| MobileNetV2 | 96.8 | 96.5 | 96.7 | 96.6 | 97.0 | 6.2 | 9.0 |
| ResNet50 | 97.5 | 97.3 | 97.4 | 98.1 | 97.8 | 14.1 | 98.0 |
| FaceNet | 98.1 | 98.0 | 98.2 | 98.1 | 98.4 | 22.3 | 120.0 |
| EfficientNet-B0 | 97.0 | 96.8 | 96.5 | 96.8 | 97.3 | 10.5 | 20.4 |
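For reference, the headline metrics in Tables 9 and 10 can be computed from predictions and scores with scikit-learn; a minimal sketch with toy arrays (the arrays are placeholders, not our data):

```python
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score, roc_auc_score)

y_true = np.array([1, 0, 1, 1, 0, 1])                     # placeholder ground truth
y_score = np.array([0.97, 0.12, 0.88, 0.91, 0.35, 0.76])  # model confidence scores
y_pred = (y_score >= 0.5).astype(int)                     # threshold at 0.5

print(f"accuracy  = {accuracy_score(y_true, y_pred):.3f}")
print(f"precision = {precision_score(y_true, y_pred):.3f}")
print(f"recall    = {recall_score(y_true, y_pred):.3f}")
print(f"f1        = {f1_score(y_true, y_pred):.3f}")
print(f"auc       = {roc_auc_score(y_true, y_score):.3f}")
```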
Table 10. Robustness test: model performance under noise and blur conditions.

| Noise/Blur Condition | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%) | AUC (%) |
|---|---|---|---|---|---|
| Normal | 99.3 | 99.2 | 99.3 | 99.2 | 99.5 |
| Low Noise | 98.4 | 97.3 | 97.8 | 97.5 | 98.6 |
| Medium Noise | 97.2 | 96.1 | 96.5 | 96.3 | 97.5 |
| High Noise | 95.0 | 93.0 | 93.8 | 93.4 | 95.2 |
| Low Blur | 98.1 | 97.0 | 97.5 | 97.2 | 98.3 |
| Medium Blur | 96.8 | 95.6 | 96.0 | 95.8 | 97.0 |
| High Blur | 94.5 | 92.5 | 93.2 | 92.8 | 94.8 |
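Figure 17 and Table 10 rely on synthetic noise and blur at three severities. The following sketch shows how such degradations are commonly generated with OpenCV/NumPy; the specific sigma values are illustrative assumptions, not the paper's settings.

```python
import cv2
import numpy as np

def add_gaussian_noise(img: np.ndarray, sigma: float) -> np.ndarray:
    """Additive Gaussian noise; sigma is in 0-255 intensity units."""
    noise = np.random.normal(0.0, sigma, img.shape)
    return np.clip(img.astype(np.float64) + noise, 0, 255).astype(np.uint8)

def add_gaussian_blur(img: np.ndarray, sigma: float) -> np.ndarray:
    """Gaussian blur with an odd kernel covering roughly 3 sigma."""
    k = int(2 * round(3 * sigma) + 1)
    return cv2.GaussianBlur(img, (k, k), sigma)

# Stand-in for a real face crop; replace with an actual image in practice.
face = np.random.randint(0, 256, (48, 48), dtype=np.uint8)

# Illustrative low / medium / high severities (assumed, not the paper's values).
degraded = {f"noise_{name}": add_gaussian_noise(face, s)
            for name, s in [("low", 5), ("medium", 15), ("high", 30)]}
degraded.update({f"blur_{name}": add_gaussian_blur(face, s)
                 for name, s in [("low", 0.5), ("medium", 1.5), ("high", 3.0)]})
```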
Table 11. Computational efficiency analysis.

| Model | Inference Time (ms/Image) | Model Size (MB) |
|---|---|---|
| Proposed CNN + SRCNN | 8.4 | 3.2 |
| MobileNetV2 | 6.2 | 9.0 |
| ResNet50 | 14.1 | 98.0 |
| FaceNet | 22.3 | 120.0 |
| EfficientNet-B0 | 10.5 | 20.4 |
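Figures like those in Table 11 are typically obtained by averaging wall-clock time per image over many repeated passes and by summing parameter bytes. A sketch of one such measurement protocol (the protocol itself is an assumption; the paper does not specify its procedure):

```python
import time
import torch

def benchmark(model: torch.nn.Module, input_shape=(1, 1, 48, 48), runs: int = 200):
    """Average CPU inference time (ms/image) and parameter size (MB)."""
    model.eval()
    x = torch.rand(*input_shape)
    with torch.no_grad():
        for _ in range(10):                      # warm-up passes
            model(x)
        start = time.perf_counter()
        for _ in range(runs):
            model(x)
        ms = (time.perf_counter() - start) / runs * 1000
    mb = sum(p.numel() * p.element_size() for p in model.parameters()) / 1e6
    return ms, mb

ms, mb = benchmark(torch.nn.Conv2d(1, 8, 3))     # toy model as a placeholder
print(f"{ms:.2f} ms/image, {mb:.2f} MB")
```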
Table 12. Ablation study results.

| Resolution | Accuracy (%) | Computational Overhead |
|---|---|---|
| 16 × 16 pixels | 92.4 | Low |
| 24 × 24 pixels | 99.3 | Moderate (optimal) |
| 32 × 32 pixels | 99.4 | High (+35%) |
Table 13. Comparison of the proposed framework with state-of-the-art facial recognition methods, with and without uncertainty estimation mechanisms.

| Method | Accuracy (%) | Uncertainty Estimation | Inference Time (ms) |
|---|---|---|---|
| MobileNetV2 (baseline) | 96.8 | No | 6.2 |
| ResNet50 | 97.5 | No | 14.1 |
| FaceNet | 98.1 | No | 22.3 |
| Proposed Framework | 99.3 | Yes (MC Dropout) | 8.4 |
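Monte Carlo dropout, as used in the proposed framework, keeps dropout active at test time and averages T stochastic forward passes; the spread across those passes serves as the uncertainty estimate. A minimal PyTorch sketch follows, with 70 classes matching the subjects in Table 2; T = 30 and the tiny stand-in classifier are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TinyClassifier(nn.Module):
    """Placeholder classifier with dropout; stands in for the paper's CNN."""
    def __init__(self, n_classes: int = 70):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(), nn.Linear(24 * 24, 128), nn.ReLU(),
            nn.Dropout(p=0.5),                    # kept active at test time
            nn.Linear(128, n_classes),
        )

    def forward(self, x):
        return self.net(x)

@torch.no_grad()
def mc_dropout_predict(model: nn.Module, x: torch.Tensor, T: int = 30):
    """Mean softmax over T stochastic passes plus per-class predictive std."""
    model.train()  # train mode keeps dropout sampling; no gradients are taken
    probs = torch.stack([torch.softmax(model(x), dim=-1) for _ in range(T)])
    return probs.mean(dim=0), probs.std(dim=0)

mean_p, std_p = mc_dropout_predict(TinyClassifier(), torch.rand(1, 1, 24, 24))
pred = mean_p.argmax(dim=-1)
print(pred.item(), float(std_p[0, pred]))  # predicted identity and its uncertainty
```

A high predictive standard deviation signals an input the model is unsure about, which is what allows the e-KYC pipeline to escalate such cases instead of silently accepting or rejecting them.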
Table 14. Quantitative performance comparison under various image degradation conditions.

| Degradation Type | Without Super-Resolution (%) | With Super-Resolution (%) | Improvement (%) |
|---|---|---|---|
| Gaussian Noise | 88.1 | 97.5 | +9.4 |
| Motion Blur | 85.6 | 96.8 | +11.2 |
| Low Contrast | 86.9 | 97.2 | +10.3 |
| Gaussian Blur | 84.7 | 96.4 | +11.7 |
| Compression Artifacts | 87.3 | 97.0 | +9.7 |
Table 15. Contribution of each component to the integrated identity verification framework.

| Component | Without Component | With Component | Performance Impact |
|---|---|---|---|
| Super-resolution | 84.32% accuracy | 99.3% accuracy | +14.98% improvement |
| DRRN-based CNN | Lower feature quality | Enhanced robustness | Improved stability |
| Monte Carlo dropout | No confidence scores | Risk-aware decisions | Reduced errors |
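Putting Table 15 together, the verification flow is: super-resolve the low-resolution probe, embed it with the CNN, score it against the enrolled embedding, and gate the decision on the MC-dropout uncertainty. A high-level sketch reusing the components sketched above; the two thresholds are illustrative assumptions, not the paper's operating points.

```python
import torch
import torch.nn.functional as F

def verify(probe, enrolled_embedding, sr_model, cnn, mc_predict,
           sim_threshold=0.95, max_uncertainty=0.05):
    """Hedged end-to-end sketch: SR -> embedding -> similarity -> uncertainty gate.

    sr_model, cnn, and mc_predict stand in for the components sketched
    earlier; both thresholds are illustrative, not the paper's values.
    """
    enhanced = sr_model(probe)                   # super-resolution preprocessing
    embedding = cnn(enhanced)                    # 512-d identity embedding
    similarity = F.cosine_similarity(embedding, enrolled_embedding, dim=-1)
    mean_p, std_p = mc_predict(enhanced)         # MC-dropout confidence
    uncertainty = float(std_p.max())
    if similarity >= sim_threshold and uncertainty <= max_uncertainty:
        return "accept"
    if similarity < sim_threshold and uncertainty <= max_uncertainty:
        return "reject"
    return "manual review"                       # high uncertainty: escalate
```

Routing high-uncertainty cases to manual review, rather than forcing a binary outcome, is what makes the framework risk-aware in the sense described in the abstract.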
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
