An Iris Image Super-Resolution Model Based on Swin Transformer and Generative Adversarial Network
Abstract
1. Introduction
1.1. Research Questions
1.2. Contributions and Paper Outline
- (1) The paper introduces a novel model that combines residual Swin Transformer blocks with sub-pixel convolutional progressive upsampling. The model effectively captures global dependencies within an image while minimizing information loss during reconstruction (a minimal sketch of the upsampling stage follows this list).
- (2) The method incorporates adversarial learning to impose more effective constraints on texture recovery, enabling the generated images to exhibit higher-frequency details and yielding more realistic, visually convincing results.
- (3) A combination of multiple loss functions is employed: a norm-based pixel loss, a perceptual loss, and an adversarial loss. This integrated design helps balance the fidelity, structure, and perceptual quality of the generated images during training, producing an effective training approach that improves both the accuracy and the realism of the results (a hedged sketch is given under Section 3.3).
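To make contribution (1) concrete, the following is a minimal PyTorch sketch of the sub-pixel convolutional progressive upsampling stage [10,11]: two ×2 PixelShuffle steps take a 56 × 56 feature map to 224 × 224, matching the LR/HR resolutions in Section 4.2.1. The channel width and activation are illustrative assumptions, not values taken from the paper.

```python
import torch
import torch.nn as nn

class ProgressiveUpsampler(nn.Module):
    """Sketch of x4 progressive sub-pixel upsampling (two x2 stages).

    Each stage expands channels by 4 with a 3x3 convolution, then
    nn.PixelShuffle(2) rearranges those channels into a 2x larger
    spatial grid, so the upsampling is learned rather than interpolated.
    """
    def __init__(self, channels: int = 64):  # channel width is an assumption
        super().__init__()
        self.stages = nn.Sequential(
            nn.Conv2d(channels, 4 * channels, kernel_size=3, padding=1),
            nn.PixelShuffle(2),   # (B, 4C, H, W) -> (B, C, 2H, 2W)
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(channels, 4 * channels, kernel_size=3, padding=1),
            nn.PixelShuffle(2),
            nn.LeakyReLU(0.2, inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.stages(x)

# A 56x56 feature map is progressively upsampled to 224x224 (x4),
# matching the LR -> HR resolutions used in Section 4.2.1.
model = ProgressiveUpsampler()
feats = torch.randn(1, 64, 56, 56)
print(model(feats).shape)  # torch.Size([1, 64, 224, 224])
```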
2. Related Work
3. Method
3.1. Model Architecture
3.2. Adversarial Architecture
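The paper's discriminator design is not reproduced in this outline; purely as an illustration of how the adversarial branch plugs into training, the sketch below is a generic SRGAN-style convolutional discriminator [13] that maps an image to a single real/fake logit. All layer counts and widths here are assumptions.

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Illustrative SRGAN-style discriminator [13], not the paper's design.

    Strided convolutions progressively halve the spatial resolution;
    a global pooling head then produces one real/fake logit per image.
    """
    def __init__(self, in_ch: int = 3, base: int = 64):
        super().__init__()
        layers, ch = [], in_ch
        for out_ch in [base, base * 2, base * 4, base * 8]:
            layers += [nn.Conv2d(ch, out_ch, 3, stride=2, padding=1),
                       nn.LeakyReLU(0.2, inplace=True)]
            ch = out_ch
        self.features = nn.Sequential(*layers)
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(ch, 1))  # raw logit, no sigmoid

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.features(x))
```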
3.3. Loss Function
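As described in contribution (3), the training objective combines a norm-based pixel loss, a perceptual loss built on VGG features [15,16], and an adversarial loss [12,13]. The sketch below is a hedged PyTorch illustration of such a combination; the specific norm (L1 here), the VGG layer cut, and the weighting coefficients are assumptions, not the paper's reported settings.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg19

class CombinedLoss(nn.Module):
    """Hedged sketch of a combined SR objective: pixel + perceptual + adversarial.

    The lambda_* weights are illustrative placeholders; input normalization
    to the VGG training statistics is omitted for brevity.
    """
    def __init__(self, lambda_pix=1e-2, lambda_percep=1.0, lambda_adv=5e-3):
        super().__init__()
        # Frozen VGG19 feature extractor for the perceptual term [15,16].
        self.vgg = vgg19(weights="DEFAULT").features[:35].eval()
        for p in self.vgg.parameters():
            p.requires_grad_(False)
        self.l1 = nn.L1Loss()
        self.bce = nn.BCEWithLogitsLoss()
        self.lambda_pix, self.lambda_percep, self.lambda_adv = (
            lambda_pix, lambda_percep, lambda_adv)

    def forward(self, sr, hr, disc_logits_on_sr):
        pixel = self.l1(sr, hr)                       # norm-based pixel loss
        percep = self.l1(self.vgg(sr), self.vgg(hr))  # perceptual loss
        # Generator wants the discriminator to label SR outputs as real (1).
        adv = self.bce(disc_logits_on_sr, torch.ones_like(disc_logits_on_sr))
        return (self.lambda_pix * pixel
                + self.lambda_percep * percep
                + self.lambda_adv * adv)
```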
4. Experiments and Results
4.1. Datasets
4.2. Implementation Details
4.2.1. Data Preparation
- (1) Use the open-source OSIRIS software [20] to obtain the circle parameters of the original iris images.
- (2) Generate high-resolution images (HR images, 224 × 224): use bicubic interpolation to rescale each iris image so that the iris radius becomes 105 pixels, then crop a 224 × 224 region centered on the pupil center (see the sketch after this list).
- (3) Generate low-resolution images (LR images, 56 × 56): downsample the 224 × 224 HR images by a factor of 4 using bicubic interpolation.
- (4) Save all images in 24-bit color BMP format.
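The pipeline above can be illustrated with a short OpenCV sketch. The function below is a hedged stand-in, not the authors' preprocessing code: the function name, argument names, and output paths are hypothetical, the circle parameters are assumed to come from OSIRIS [20], and boundary handling for crops near the image edge is omitted.

```python
import cv2

def prepare_pair(img, pupil_cx, pupil_cy, iris_radius):
    """Hedged sketch of steps (2)-(4); inputs come from OSIRIS segmentation [20]."""
    # Step (2): rescale so the iris radius becomes 105 px, then crop a
    # 224 x 224 window centered on the pupil center.
    scale = 105.0 / iris_radius
    resized = cv2.resize(img, None, fx=scale, fy=scale,
                         interpolation=cv2.INTER_CUBIC)
    cx, cy = int(pupil_cx * scale), int(pupil_cy * scale)
    hr = resized[cy - 112:cy + 112, cx - 112:cx + 112]

    # Step (3): bicubic x4 downsampling gives the 56 x 56 LR counterpart.
    lr = cv2.resize(hr, (56, 56), interpolation=cv2.INTER_CUBIC)

    # Step (4): save both as 24-bit color BMP files.
    cv2.imwrite("hr.bmp", hr)
    cv2.imwrite("lr.bmp", lr)
    return hr, lr
```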
4.2.2. Parameters
4.2.3. Metrics
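The results tables report PSNR, SSIM [23], the perceptual index (PI) from the PIRM challenge [24], and the equal error rate (EER) of iris recognition. As a hedged illustration, the two full-reference metrics can be computed with scikit-image as below; PI and EER require the PIRM evaluation tooling and a recognition pipeline, respectively, and are not sketched here.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def full_reference_metrics(hr: np.ndarray, sr: np.ndarray):
    """PSNR (dB) and SSIM between a ground-truth HR image and an SR result,
    both given as uint8 arrays of shape (H, W, 3)."""
    psnr = peak_signal_noise_ratio(hr, sr, data_range=255)
    ssim = structural_similarity(hr, sr, channel_axis=-1, data_range=255)
    return psnr, ssim
```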
4.3. Results
4.4. Discussion
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Parker, J.A.; Kenyon, R.V.; Troxel, D.E. Comparison of interpolating methods for image resampling. IEEE Trans. Med. Imaging 1983, 2, 31–39. [Google Scholar] [CrossRef] [PubMed]
- Blu, T.; Thévenaz, P.; Unser, M. Linear interpolation revitalized. IEEE Trans. Image Process. 2004, 13, 710–719. [Google Scholar] [CrossRef] [PubMed]
- Keys, R. Cubic convolution interpolation for digital image processing. IEEE Trans. Acoust. Speech Signal Process. 1981, 29, 1153–1160. [Google Scholar] [CrossRef]
- Dong, C.; Loy, C.C.; He, K.; Tang, X. Learning a deep convolutional network for image super-resolution. In Proceedings of the Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, 6–12 September 2014; Springer International Publishing: Berlin/Heidelberg, Germany, 2014; pp. 184–199. [Google Scholar]
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90. [Google Scholar] [CrossRef]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems 30 (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision 2021, Montreal, QC, Canada, 11–17 October 2021; pp. 10012–10022. [Google Scholar]
- Liang, J.; Cao, J.; Sun, G.; Zhang, K.; Van Gool, L.; Timofte, R. SwinIR: Image restoration using Swin Transformer. In Proceedings of the IEEE/CVF International Conference on Computer Vision 2021, Montreal, QC, Canada, 11–17 October 2021; pp. 1833–1844. [Google Scholar]
- Chen, X.; Wang, X.; Zhou, J.; Qiao, Y.; Dong, C. Activating more pixels in image super-resolution transformer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2023, Vancouver, BC, Canada, 17–24 June 2023; pp. 22367–22377. [Google Scholar]
- Shi, W.; Caballero, J.; Huszár, F.; Totz, J.; Aitken, A.P.; Bishop, R.; Rueckert, D.; Wang, Z. Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2016, Las Vegas, NV, USA, 27–30 June 2016; pp. 1874–1883. [Google Scholar]
- Wang, Y.; Perazzi, F.; McWilliams, B.; Sorkine-Hornung, A.; Sorkine-Hornung, O.; Schroers, C. A fully progressive approach to single-image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops 2018, Salt Lake City, UT, USA, 18–23 June 2018; pp. 864–873. [Google Scholar]
- Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial networks. Commun. ACM 2020, 63, 139–144. [Google Scholar] [CrossRef]
- Ledig, C.; Theis, L.; Huszár, F.; Caballero, J.; Cunningham, A.; Acosta, A.; Aitken, A.; Tejani, A.; Totz, J.; Wang, Z.; et al. Photo-realistic single image super-resolution using a generative adversarial network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2017, Honolulu, HI, USA, 21–26 July 2017; pp. 4681–4690. [Google Scholar]
- Wang, X.; Yu, K.; Wu, S.; Gu, J.; Liu, Y.; Dong, C.; Qiao, Y.; Change Loy, C. ESRGAN: Enhanced super-resolution generative adversarial networks. In Proceedings of the European Conference on Computer Vision (ECCV) Workshops 2018, Munich, Germany, 8–14 September 2018; pp. 63–79. [Google Scholar]
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
- Dosovitskiy, A.; Brox, T. Generating images with perceptual similarity metrics based on deep networks. In Proceedings of the Advances in Neural Information Processing Systems 29 (NIPS 2016), Centre Convencions Internacional Barcelona, Barcelona, Spain, 5–10 December 2016; pp. 658–666. [Google Scholar]
- Goodrich, B.; Arel, I. Reinforcement learning based visual attention with application to face detection. In Proceedings of the 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, Providence, RI, USA, 16–21 June 2012; pp. 19–24. [Google Scholar]
- Taud, H.; Mas, J.F. Multilayer perceptron (MLP). In Geomatic Approaches for Modeling Land Change Scenarios; Springer: Cham, Switzerland, 2018; pp. 451–455. [Google Scholar]
- CASIA Iris Image Database. Available online: http://biometrics.idealtest.org/index.jsp#/datasetDetail/4 (accessed on 5 October 2022).
- Othman, N.; Dorizzi, B.; Garcia-Salicetti, S. OSIRIS: An open source iris recognition software. Pattern Recognit. Lett. 2016, 82, 124–131. [Google Scholar] [CrossRef]
- Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
- Sheikh, H.R.; Sabir, M.F.; Bovik, A.C. A statistical evaluation of recent full reference image quality assessment algorithms. IEEE Trans. Image Process. 2006, 15, 3440–3451. [Google Scholar] [CrossRef]
- Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612. [Google Scholar] [CrossRef] [PubMed]
- Blau, Y.; Mechrez, R.; Timofte, R.; Michaeli, T.; Zelnik-Manor, L. The 2018 PIRM challenge on perceptual image super-resolution. In Proceedings of the European Conference on Computer Vision (ECCV) Workshops 2018, Munich, Germany, 8–14 September 2018. [Google Scholar]
- Ma, C.; Yang, C.Y.; Yang, X.; Yang, M.H. Learning a no-reference quality metric for single-image super-resolution. Comput. Vis. Image Underst. 2017, 158, 1–16. [Google Scholar] [CrossRef]
- Mittal, A.; Soundararajan, R.; Bovik, A.C. Making a “completely blind” image quality analyzer. IEEE Signal Process. Lett. 2012, 20, 209–212. [Google Scholar] [CrossRef]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2016, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
- Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2017, Honolulu, HI, USA, 21–26 July 2017; pp. 4700–4708. [Google Scholar]
- Blau, Y.; Michaeli, T. The perception-distortion tradeoff. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2018, Salt Lake City, UT, USA, 18–23 June 2018; pp. 6228–6237. [Google Scholar]
| Metric | Bicubic | SRCNN | ESRGAN | SwinIR | HAT | IrisDnet | SwinIris | SwinGIris |
|---|---|---|---|---|---|---|---|---|
| PSNR (dB) | 31.4261 | 31.6137 | 31.9372 | 33.6537 | 34.7421 | 31.5786 | 34.5798 | 33.5663 |
| SSIM | 0.8102 | 0.8063 | 0.8301 | 0.8703 | 0.8927 | 0.7994 | 0.8857 | 0.8639 |
| PI | 8.1907 | 9.1302 | 6.3256 | 7.9390 | 7.8452 | 6.0320 | 7.7257 | 5.9280 |
| Metric | LR | SRCNN | ESRGAN | SwinIR | HAT | IrisDnet | SwinIris | SwinGIris |
|---|---|---|---|---|---|---|---|---|
| EER (VGG) | 2.9474 | 2.4737 | 1.7895 | 1.3158 | 1.0526 | 1.1053 | 1.1632 | 0.8421 |
| EER (ResNet) | 2.0526 | 0.8947 | 0.5263 | 0.4737 | 0.4211 | 0.4211 | 0.4737 | 0.2632 |
| EER (DenseNet) | 1.2632 | 0.6842 | 0.3684 | 0.4210 | 0.2105 | 0.2105 | 0.4211 | 0.1579 |
| Metric | Bicubic | SRCNN | ESRGAN | SwinIR | HAT | IrisDnet | SwinIris | SwinGIris |
|---|---|---|---|---|---|---|---|---|
| PSNR (dB) | 33.4895 | 34.4581 | 34.8884 | 36.0513 | 36.3218 | 34.1663 | 35.8515 | 34.7214 |
| SSIM | 0.8339 | 0.8563 | 0.8905 | 0.9036 | 0.9152 | 0.8629 | 0.9011 | 0.8883 |
| PI | 8.2088 | 8.1540 | 6.3844 | 6.9119 | 7.6180 | 6.2439 | 6.8485 | 6.1892 |
| Metric | LR | SRCNN | ESRGAN | SwinIR | HAT | IrisDnet | SwinIris | SwinGIris |
|---|---|---|---|---|---|---|---|---|
| EER (VGG) | 4.8600 | 3.4806 | 2.5119 | 2.6761 | 3.4007 | 2.3200 | 2.5143 | 2.0995 |
| EER (ResNet) | 3.3600 | 1.9818 | 1.4101 | 1.3997 | 4.5828 | 1.3088 | 1.3595 | 1.1200 |
| EER (DenseNet) | 2.1200 | 1.2952 | 0.8583 | 0.9402 | 0.9001 | 0.7203 | 0.8199 | 0.6000 |