Biometric-Based Key Generation and User Authentication Using Acoustic Characteristics of the Outer Ear and a Network of Correlation Neurons

Trustworthy AI applications such as biometric authentication must be implemented in a secure manner so that a malefactor is not able to extract the knowledge embedded in them or use it to influence decisions. The goal of the present work is to increase the reliability of biometric-based key generation, which is used for remote authentication with the protection of biometric templates. Ear canal echograms were used as biometric images. Multilayer convolutional neural networks of the autoencoder type were used to extract features from the echograms. A new class of neurons (correlation neurons) that analyzes correlations between features instead of feature values is proposed. A neuro-extractor model was developed to associate a feature vector with a cryptographic key or user password. An open dataset of ear canal echograms was used to test the performance of the proposed model. The following indicators were achieved: EER = 0.0238 (FRR = 0.093, FAR < 0.001), with a key length of 8192 bits. The proposed model is superior to known analogues in terms of key length and the probability of erroneous decisions. The ear canal parameters are hidden from direct observation and photography. This fact creates additional difficulties for the synthesis of adversarial examples.


Introduction
Any unauthorized interference in the operation of artificial intelligence (AI) can lead to the following consequences: property damage, information security breaches, threats to the lives and health of citizens, technological failures or disasters, etc. All of these depend on the purpose of a particular implementation of AI and its capabilities. Therefore, AI algorithms must support a protected execution mode in mission-critical applications. "Protected execution" means the impossibility of analyzing the logic and control of the AI and of extracting knowledge from the AI memory (for example, personal data) by any unauthorized person.
Responsible AI applications include biometric authentication systems based on fingerprints, the iris, ink, handwriting, voice, and other parameters. Biometric images (patterns) are personal data that need reliable protection against compromise. The "protected execution" of the biometric authentication procedure can be implemented based on homomorphic encryption or by using special mathematical models that make it possible to associate a biometric image (pattern) of a person with his password or personal cryptographic key. The key binding model is usually used to generate a strictly specified key in response to a biometric image of a certain person. The combination of the biometric template and key is stored as helper data. The key binding models can be divided into two main categories: fuzzy extractors (fuzzy commitment, fuzzy vault, fuzzy embedder), which are based on the use of error correction codes, and neuro-extractors (neural fuzzy extractors), which are based on artificial neural networks (ANN).
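The fuzzy-commitment idea behind key binding can be sketched with a toy repetition code (a didactic simplification of our own; real schemes use stronger error-correcting codes, longer strings, and additional protections):

```python
import numpy as np

def enroll(key_bits, bio_bits, r=5):
    """Fuzzy commitment sketch: spread each key bit over r positions
    (repetition code) and XOR with the biometric bit string, producing
    helper data that reveals neither the key nor the template alone."""
    codeword = np.repeat(key_bits, r)
    return codeword ^ bio_bits            # helper data (stored)

def reproduce(helper, bio_bits, r=5):
    """XOR a fresh biometric sample with the helper data and
    majority-vote each r-bit group to correct residual bit errors."""
    noisy_codeword = helper ^ bio_bits
    groups = noisy_codeword.reshape(-1, r)
    return (groups.sum(axis=1) > r // 2).astype(np.uint8)

rng = np.random.default_rng(7)
key = rng.integers(0, 2, 16, dtype=np.uint8)        # pre-generated key
bio = rng.integers(0, 2, 16 * 5, dtype=np.uint8)    # enrollment bit string
helper = enroll(key, bio)

restored_exact = reproduce(helper, bio)             # error-free sample recovers the key
# with a few bit errors, majority voting usually still restores the key
noisy = bio ^ (rng.random(bio.size) < 0.05).astype(np.uint8)
restored_noisy = reproduce(helper, noisy)
```

The point of the sketch is only that the stored helper data is the XOR of two secrets, so compromising it alone discloses neither the key nor the biometric template.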
Each group of models has fundamental disadvantages.
In this paper, we propose a neuro-extractor model based on correlation neurons (a c-neuro-extractor). The proposed model does not have the typical limitations of the existing models. This model has several benefits that allow for the use of longer keys (passwords) for authentication and reduce the false rejection rate (FRR) and false acceptance rate (FAR) (the FRR and FAR are expressed as probabilities, as in the present work, or as percentages). Correlation neurons are a new class of neurons that analyze correlations between features instead of analyzing feature values in pattern classification problems. The analysis of the internal correlations of images (patterns) and the classification decisions occur without storing information about the correlations or the feature values typical for the biometric images of the users of computer systems. In other words, the reference information about the class of images is not compromised during its storage. The learning process of correlation neurons is fully automatic and remains robust even on small training sets. The effectiveness of the proposed model is illustrated by the example of the verification of a person by the peculiarities of the ear canal's internal structure using an open dataset of acoustic images (patterns) of ears (AIC-ears-75). The advantage of the ear canal features is that they are not compromised in the natural environment (a photograph of the ear is not informative and is not suitable for making a physical or digital fake of the ear for realizing adversarial attacks) [1]. This problem is difficult to solve because the parameters of the ear canal are significantly less informative than fingerprint and iris parameters.

Related Works: Common Terms
Homomorphic encryption is a promising form of encryption that allows one to perform mathematical operations on ciphertext and obtain an encrypted result that matches the result of the same operations performed on the plaintext. The main problem with these methods is poor performance. For example, a comparison of the presented images with fingerprint templates requires significant computational resources. Furthermore, the recognition of a person takes too much time, even when using parallel computing [2]. As mentioned by the authors in [3], the main disadvantage of the proposed homomorphic protection scheme for multibiometric templates is its low performance. The performance degradation is significant, even for simple decision rules. In addition, when recognizing homomorphically encrypted images, the FRR and FAR values often increase. In particular, the authors in [4] proposed a method for the homomorphic protection of face parameters extracted from images using deep neural networks. The extracted parameters are encrypted using the Paillier probabilistic cryptosystem. The proposed encryption scheme reduces the accuracy of face verification. This can be explained by the fact that homomorphic ciphertexts can no longer be decrypted correctly after a sufficiently large number of addition and multiplication operations have been performed. Of course, these problems could be solved in the future, but the effective protection of biometric templates is possible without using homomorphic encryption.
The operation of the key binding model requires creating a biometric template and entering a pre-generated cryptographic key or password. This process is performed when a new user registers. A biometric template is a set of reference characteristics of the user's biometric data (for example, the values of the weight coefficients of a neural network). A secure template does not compromise the reference biometric characteristics of the user during its storage. Biometric images (patterns) can be represented as raw data, feature vectors, or meta-features. To obtain a feature vector, "raw" biometric data must be processed by a special algorithm. The feature extraction unit converts the raw biometric data into a fixed-length feature vector. The implementation of the feature extraction unit depends on the way in which the biometric images (handwritten signatures, fingerprints, iris images, voice signal spectrograms, EEG, ECG) are presented. A meta-feature is an integral characteristic that is calculated based on the processing of two or more biometric features. The quality of a feature (meta-feature) is determined by its informativeness, that is, the amount of information contained in the feature, which makes it possible to distinguish one user from another. Some of the known schemes are also vulnerable to an attack [13], which is based on an analysis of the weights and the tables of the connections of neurons.
Finally, models of deep neuro-extractors based on multilayer convolutional neural networks (CNNs) have been proposed for applications of facial biometrics. The ANN in [14] included two convolutional layers, a max-pooling layer, two fully connected layers, and two dropout layers. The following values for the error rates were obtained: FAR = 0.01 with FRR = 0.0241 (with a key length of 1024 bits). In [15], an ensemble (stack) of two neural networks was used. The first VGG-Face network (13 convolutional and 2 fully connected layers) was pre-trained on 2.6 million images, received 224 × 224 face images as inputs, and produced a 4096-bit binary code at the output. The second network (six fully connected layers trained with the Adam optimizer) translated the 4096-bit vector into a user key, which was specified during training and had a length of up to 1024 bits. The equal error rate was EER = 0.036 (EER = FAR = FRR). The disadvantage of deep fuzzy neuro-extractor schemes is that gradient descent algorithms tend to overfit.
The scheme can be poorly transferable to other modalities since the image structure and ANN architecture are different for each modality. The proposed approach is difficult to apply in cases where collecting a large amount of training data is impossible. The benefits of applying such security schemes to facial biometric images are not entirely clear since face parameters can be easily compromised by photography.
Classifiers based on CNNs with the softmax function at the output are vulnerable to adversarial attacks [16]. It is known that the imposition of additive Gaussian noise on the image significantly increases the FAR when verifying the biometric images using neural networks with a similar architecture [17]. If an attacker has access to the weight coefficients, it greatly simplifies the process of carrying out these attacks. Therefore, the neuro-extractor architecture should be built so that the synapse weights do not compromise the biometric data of users.

Materials and Methods
In the proposed scheme, a feature extraction unit and a c-neuro-extractor are highlighted (Figure 1). The requirement for the features is that each feature extracted from an image must obey the normal distribution law or, at least, the feature probability density function must be symmetric and unimodal.

In the present study, an autoencoder based on convolutional neurons was used. The autoencoder is an artificial neural network that can compress the dimension of the input data, encoding them with a set of informative features, as well as recover the input data from the feature vector. The autoencoder consists of two subnets. The encoder is used only to extract features. A decoder is used for data recovery (it is impossible to train the encoder without the decoder). The autoencoder can be trained on a large dataset of anonymized biometric images. The encoder can remain in an unprotected form after training since it does not produce classification solutions at its outputs and does not store personal biometric data or user keys. The encoder outputs (features) must be converted to Bayes-Minkowski meta-features using a special mapping (Figure 1). Then the meta-features must be connected to c-neuro-extractors. A separate c-neuro-extractor, which is trained on «Genuine» and «Impostors» images in a trusted environment, is created for each new user. It takes the place of the softmax layer and is capable of generating an almost random output in response to an ambiguous «Impostor» image or the user key in response to a «Genuine» image.
C-neuro-extractors can be placed anywhere after being trained. The encoder can be placed in the cloud so feature extraction functions are available to all users. In this approach, the decoder can be removed.
The proposed architecture of the authentication system combines the advantages of deep neural networks (the ability to extract highly informative features and use transfer learning) with the advantages of c-neuro-extractors (protection of private keys and biometric personal data from compromise, robust automatic learning on a small sample set of user images). A c-neuro-extractor model based on correlation neurons and the Bayes-Minkowski meta-feature space is proposed.

A Curved Feature Space: The Informativeness and Cross-Correlation of Features
The Euclidean proximity measure and the Manhattan distance [18,19] were used as the decision rules, which were protected by homomorphic encryption. These proximity measures were generalized in the form of the Minkowski measure [20] (1):

y = (Σ_{j=1}^{n} |(a_j − m_j)/σ_j|^p)^(1/p)   (1)

where ā is a feature vector representing a biometric image; a_j is the value of the j-th feature from the vector ā; n is the number of features; m_j and σ_j are the mathematical expectation and standard deviation of the values of the j-th feature for the «Genuine» class, which is compared with the image ā (the «Genuine» class represents the biometric images of one of the legitimate users); and p is a power coefficient that determines the level of "curvature" of the space. For p = 1, we obtain the Manhattan measure; for p = 2, the Euclidean measure; and for p → ∞, the Minkowski measure tends to the Chebyshev measure. The Minkowski distance changes depending on the value of the power coefficient p. Figure 2 illustrates what a circle might look like in a two-dimensional Minkowski space. In contrast to the example in Figure 2, the feature space is multidimensional (each feature "creates" one dimension).
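As a sketch, the generalized measure in (1) can be computed directly; the per-feature normalization by m_j and σ_j below is our reading of the formula, not the authors' exact implementation:

```python
import numpy as np

def minkowski_measure(a, m, sigma, p):
    """Normalized Minkowski proximity between a feature vector `a` and the
    «Genuine» class reference (per-feature means `m`, std devs `sigma`).
    p=1 gives the Manhattan measure, p=2 the Euclidean measure, and
    large p approaches the Chebyshev measure."""
    a, m, sigma = map(np.asarray, (a, m, sigma))
    return np.sum(np.abs((a - m) / sigma) ** p) ** (1.0 / p)

# Toy check with two features: as p grows, the measure shrinks toward
# the largest normalized deviation (the Chebyshev limit).
a, m, s = [1.0, 3.0], [0.0, 0.0], [1.0, 1.0]
d1 = minkowski_measure(a, m, s, 1)   # Manhattan: 4.0
d2 = minkowski_measure(a, m, s, 2)   # Euclidean: ~3.162
d8 = minkowski_measure(a, m, s, 8)   # approaches the max deviation 3.0
```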

The curvature of the feature space occurs due to the correlations between its dimensions (Figure 3). Typically, the feature space is neither flat nor equally curved. All classes of images have different matrices of correlation coefficients C_j,t (2) between features (the biometric image of each person has a unique correlation matrix):

C_j,t = (1/K_G) · Σ_{k=1}^{K_G} (a_j^(k) − m_j)(a_t^(k) − m_t) / (σ_j·σ_t)   (2)

Therefore, the feature space is curved differently for the various classes of biometric images.
where K_G is the number of images in the «Genuine» training set (K_I is the number of images in the «Impostors» training set), and k is the index of an image in the «Genuine» training set.
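The matrix C_j,t is an ordinary per-class sample correlation matrix computed over the «Genuine» training set; a minimal sketch (function and variable names are ours):

```python
import numpy as np

def genuine_correlation_matrix(X):
    """C[j, t]: pairwise correlation coefficients of features over the
    «Genuine» training set X of shape (K_G, n) — K_G images, n features."""
    Xc = X - X.mean(axis=0)            # center each feature
    sd = X.std(axis=0)                 # per-feature standard deviation
    C = (Xc.T @ Xc) / len(X)           # covariances averaged over K_G
    return C / np.outer(sd, sd)        # normalize to correlations

rng = np.random.default_rng(0)
K_G = 20
base = rng.normal(size=(K_G, 1))
# first two features share a common factor (strongly correlated),
# last two are independent noise
X = np.hstack([base + 0.05 * rng.normal(size=(K_G, 2)),
               rng.normal(size=(K_G, 2))])
C = genuine_correlation_matrix(X)
```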

Figure 3. The direction of space compression of two features: (a) with a positive significant correlation between features (the distance "a" is greater than the distance "b" since the feature space is not "flat" but curved due to the correlation); (b) with independent features (distance "a" is greater than "b"); (c) with different correlations.
The informativeness level of a feature is an important indicator [21]. The amount of individual information of the j-th feature for a certain class of images is determined using Formula (3):

I_j = −log2(AUC(Φ_G(a_j), Φ_I(a_j)))   (3)

where AUC, the area under the curve, is limited by the probability density functions of the «Genuine» class Φ_G(a_j) and the «Impostors» class Φ_I(a_j), as well as by the x-axis. Φ_G(a_j) characterizes the values of the feature strictly for a certain class of images, and Φ_I(a_j) characterizes the values of the same feature for all classes of images as a whole [21]. The higher I is on average, the further separated the proper class regions are in the feature space. The number of classification errors when using the Minkowski measure can be decreased by changing the parameter p, as was demonstrated in [20]. The optimum value of p depends on the average indicators of the information content and the intraclass correlation between the features (I and C, respectively).
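For normally distributed features, the overlap AUC and the resulting informativeness can be estimated numerically. The sketch below is ours, including the −log2 reading of the AUC (which is consistent with values reported later, e.g., I ≈ 0.15 for an overlap near 0.9):

```python
import numpy as np

def overlap_auc(mu_g, sd_g, mu_i, sd_i, grid=20001, span=10.0):
    """Numerically integrate min(Phi_G, Phi_I): the area shared by the
    «Genuine» and «Impostors» normal densities of a single feature."""
    lo = min(mu_g - span * sd_g, mu_i - span * sd_i)
    hi = max(mu_g + span * sd_g, mu_i + span * sd_i)
    x = np.linspace(lo, hi, grid)
    pdf = lambda m, s: np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))
    vals = np.minimum(pdf(mu_g, sd_g), pdf(mu_i, sd_i))
    # trapezoidal rule over the grid
    return float(np.sum((vals[1:] + vals[:-1]) * 0.5 * np.diff(x)))

def informativeness(mu_g, sd_g, mu_i, sd_i):
    """I_j = -log2(AUC): less overlap gives more bits of information
    (this -log2 reading of Formula (3) is our assumption)."""
    return -np.log2(overlap_auc(mu_g, sd_g, mu_i, sd_i))

i_zero = informativeness(0.0, 1.0, 0.0, 1.0)  # identical densities: ~0 bits
i_sep = informativeness(0.0, 1.0, 3.0, 1.0)   # well-separated densities: ~2.9 bits
```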
However, the correlation not only warps the feature space but also transfers some of the information about the images to "hidden" dimensions. This information can be used as features for image classification, which is shown for the first time in the next paragraph.

The Bayes-Minkowski Meta-Feature Space
To extract information about the levels of curvature of the feature space in the direction of each dimension, we introduce several variations of the Bayes-Minkowski measure (4)–(9), which operate on the differences between the features. These metrics take smaller values the higher C_j,t is (Figure 4). If the t-th and j-th features are linearly dependent (C_j,t = 1) for images of a certain class, then the t-th and j-th dimensions with respect to this class become "singular" (combined into one dimension, Figure 3c) and the corresponding difference under the modulus sign always takes the value of zero (as if there were no j-th dimension). However, if the features have a weak dependence, the difference (in modulus) increases. The higher the correlation between the features, the lower the percentage of wrong decisions obtained. Figure 4 shows that AUC(Φ_G(y), Φ_I(y)) at |C| > 0.95 is less than AUC(Φ_G(y), Φ_I(y)) at |C| < 0.3.
where µ_j and δ_j are normalizing coefficients calculated as the mathematical expectation and standard deviation of the feature values for the «Impostors» class. The purpose of the µ_j and δ_j coefficients is to bring all the features to an approximately common scale; µ_j and δ_j do not compromise the data of any user since they represent the parameters of the distribution of the feature values for a set of depersonalized images. Thus, differential privacy is ensured when using Measures (6)–(9). Measures (8)–(9) provide the highest level of confidentiality since they operate only with the normalizing coefficients δ_j; therefore, their use is preferable for solving the considered problems. To enhance the confidentiality of Measures (6)–(7), noise can be added to µ_j (a random shift of the value).
The meta-features are the differences and have the following forms (10)–(12). These differences are a rough (point) estimate of the correlation dependence between the two initial features with the numbers j and t (a smaller |a′| means a higher intraclass correlation between the relevant features; if C_j,t = 1, then a′_t,j ≈ 0). A point estimate is an estimate made from just one biometric sample, but in the presence of some a priori knowledge (m_j, σ_j, δ_j, µ_j) obtained in the training process. The dimension of the Bayes-Minkowski meta-feature space is (13):

n′ = 0.5(n(n − 1)) = 0.5n² − 0.5n, n > 0   (13)

Measures (4)–(9) are linear classifiers in the Bayes-Minkowski meta-feature space. We can transform the initial feature space into the rectifying Bayes-Minkowski meta-feature space using the mapping a′_t,j = f(a_t, a_j) in order to use any classifier. It is most convenient to depict three-dimensional initial and rectifying spaces (Figure 5) since n′ = n = 3. The space of the meta-features can contain much more information about the classes of images than the initial space. Figure 5 shows that, in the initial space, the classes are linearly inseparable and the informativeness of the features is very low (0.1 < I_j < 0.2), but the meta-features are much more informative (0.35 ≤ I′_j* ≤ 2.95). The initial features are highly correlated (0.94 ≤ C_j,t ≤ 0.96), whereas the correlation between the meta-features is insignificant (0.1 ≤ C′_j*,t* ≤ 0.22).

[Figure: the meta-features (10)–(12) computed (a) for all classes where 1 > C_j,t > 0.95, n′ = 1; (b) for all classes where 1 > C_j,t > 0.95, n′ = 5; (c) for the «Genuine» class where 1 > C_j,t > 0.95 and for the «Impostors» class where |C_j,t| < 0.3, n′ = 5; (d) for the «Genuine» class where 1 > C_j,t > 0.95 and for the «Impostors» class where −1 < C_j,t < −0.95, n′ = 5; (e) for all classes where |C_j,t| < 0.3, n′ = 5; (f) for all classes where −1 < C_j,t < −0.95, n′ = 5.]
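The exact forms (10)–(12) are given in the paper; as an illustration only, a single pairwise-difference mapping (our hypothetical f, using |z_t − z_j|^p on features normalized by the «Impostors» statistics µ_j and δ_j) already shows the dimension growth of Formula (13):

```python
import numpy as np

def meta_features(a, mu, delta, p=0.9):
    """Map an n-feature vector into the 0.5*n*(n-1)-dimensional
    Bayes-Minkowski meta-feature space: one value per pair of features.
    The pairwise form |z_t - z_j|**p is a hypothetical reading of the
    mapping a'_{t,j} = f(a_t, a_j), not the paper's exact Forms (10)-(12)."""
    z = (np.asarray(a) - mu) / delta       # normalize by «Impostors» stats
    j, t = np.triu_indices(len(z), k=1)    # all pairs with j < t
    return np.abs(z[t] - z[j]) ** p        # small value <=> strong correlation

n = 4
a = np.array([1.0, 1.1, -2.0, 0.5])
mf = meta_features(a, mu=np.zeros(n), delta=np.ones(n))
# Formula (13): the meta-feature space has n' = 0.5 * n * (n - 1) dimensions
assert len(mf) == n * (n - 1) // 2
```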
Negatively correlated initial features can form correlated pairs of meta-features. Figure 6 illustrates two positively correlated features (normalized to δ_j) forming a meta-feature a′_2 with chaotic dynamics (which has no significant correlation with the other meta-features). Negatively correlated features form meta-features a′_1 and a′_3, which are positively correlated (relative to each other) for class 1 and negatively correlated for class 3. For class 2, a′_1 and a′_3 have an implicit correlation: at a certain moment, the positive correlation changes to a negative one.


In order to eliminate the negative correlation, one of the mappings a′_t,j = f(a_t, a_j) should be applied repeatedly (first to the pairs of negatively correlated features and then to the pairs of positively correlated meta-features). The meta-feature space of the second order (after the repeated "transition") has an even greater dimension (14):

n″ = 0.5(n′(n′ − 1))   (14)

For this reason, the informativeness of the negative correlations of the initial features can be significantly higher than that of the positive ones. A further "transition" (the construction of meta-feature spaces of the third, fourth, and higher orders) makes sense as long as correlated pairs of meta-features remain. The correlation between all pairs of meta-features usually becomes weak on the Chaddock scale after two or three "transitions". Thus, when constructing classifiers, meta-features obtained from pairs of positively and negatively correlated features can be used. At the same time, the use of meta-features generated by pairs of independent (weakly correlated) features (or meta-features) should be avoided since such generations can be noisy. Independent (weakly correlated) features should be processed separately, without applying Transformations (10)–(12) to them.
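The repeated "transitions" can be sketched as follows; the absolute-difference mapping and the stopping rule |C| < 0.3 (the weak end of the Chaddock scale) are our simplifications:

```python
import numpy as np

def transition(X):
    """One «transition»: map samples (rows of X, one feature per column)
    into the space of pairwise absolute differences of features."""
    j, t = np.triu_indices(X.shape[1], k=1)
    return np.abs(X[:, t] - X[:, j])

def transitions_until_weak(X, threshold=0.3, max_steps=4):
    """Repeat the transition until every pairwise correlation between the
    current (meta-)features is weak on the Chaddock scale (|C| < threshold)
    or max_steps is reached; returns the mapped samples and the step count."""
    for step in range(max_steps):
        C = np.corrcoef(X, rowvar=False)
        off_diagonal = C[~np.eye(len(C), dtype=bool)]
        if np.all(np.abs(off_diagonal) < threshold):
            return X, step
        X = transition(X)
    return X, max_steps

rng = np.random.default_rng(1)
base = rng.normal(size=(200, 1))
X0 = base + 0.1 * rng.normal(size=(200, 3))   # three strongly correlated features
Y, steps = transitions_until_weak(X0)
```

Note that with n = 3 the dimension is preserved (3·2/2 = 3), so repeated transitions do not blow up the space in this toy case.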
The "naive" scheme of the Bayes classification is fully correct when the features are independent, i.e., when the feature space has absolutely no curvature. Minkowski's measure, on the contrary, measures the distance in the curved space. The new metrics convert the space of correlated features into the space of independent meta-features, which is why they are named Bayes-Minkowski metrics.

Assessment of Bayes-Minkowski's Meta-Feature Informativeness Using Synthetic Datasets
A computational experiment was conducted on the recognition of images (patterns) in the space of abstract (imitated) features. All features had a normal distribution of values. A total of 65 classes of images in spaces of independent and dependent features with different I indicators were generated. The generated classes of the images differed by the parameters of the feature distribution. The method of feature and image (patterns) generation was based on the Monte Carlo method and is described in [20].
To identify the generated images on a closed set of 65 classes, a computational experiment was performed using the "naive" Bayesian classifier. For the training of the Bayesian classifier, the parameters of the normal distribution law (the mathematical expectation and standard deviation) were calculated for each feature or meta-feature according to the training set (10 random samples of images per class). Conditional probability densities were calculated in accordance with the normal distribution law. The 100 images from each class that were not included in the training set were used as the testing set. The decision was taken in favor of the hypothesis with the highest a posteriori probability. Rank 1 accuracy was calculated (the number of correct classification decisions was divided by the total number of experiments). The test results are presented in Figure 7. This experiment demonstrates the following:

•	A correlation between features can carry more information than the features themselves. If the initial features are more informative (I ≈ 0.5) and independent (|Cj,t| < 0.3), then in the meta-feature space, the accuracy of the identification of images is lower than in a case where the initial features are less informative (I ≈ 0.15) but strongly correlated (1 > Cj,t > 0.95).
•	If the initial features are independent, the meta-features cause noise (the accuracy of the identification is higher when using only the initial independent features than when combining the independent features with the meta-features).
•	The "transition" to the space of the meta-features does not lead to a manifestation of the "curse of dimensionality" problem if the features are strongly correlated. The curse of dimensionality is a problem associated with an exponential increase in the volume of the training set and related calculations due to the linear growth of the dimension of the features. As we can see in Figure 7, by using a similar training set (10 images), it is possible to achieve higher accuracy when going into a space of greater dimension.

From the simulation results, it can be seen that the optimum according to the accuracy of the recognition of images was achieved at 0.7 ≤ p ≤ 1 (depending on the level of the features' informativeness). When using the mapping in (11), a significant increase in accuracy was not observed.
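The "naive" Bayesian identification procedure used above can be sketched as follows; the synthetic classes here are simplified stand-ins for the generated datasets of [20], not the actual experiment:

```python
import numpy as np

def fit_naive_bayes(train):
    """Per-class mean and standard deviation of each feature, estimated
    from the training samples (10 per class in the paper's experiment)."""
    return {c: (s.mean(axis=0), s.std(axis=0) + 1e-9) for c, s in train.items()}

def rank1_decision(params, x):
    """Decision in favour of the hypothesis with the highest a posteriori
    probability (equal priors, so the largest Gaussian log-likelihood)."""
    def loglik(ms):
        m, s = ms
        return -0.5 * np.sum(((x - m) / s) ** 2 + 2.0 * np.log(s))
    return max(params, key=lambda c: loglik(params[c]))

# Toy stand-in: 5 classes, 16 normally distributed features, 10 training
# and 100 testing images per class, as in the described protocol.
rng = np.random.default_rng(1)
centers = rng.normal(0.0, 1.0, (5, 16))
train = {c: centers[c] + rng.normal(0.0, 0.5, (10, 16)) for c in range(5)}
test = {c: centers[c] + rng.normal(0.0, 0.5, (100, 16)) for c in range(5)}
params = fit_naive_bayes(train)
hits = sum(rank1_decision(params, x) == c for c in test for x in test[c])
rank1_accuracy = hits / 500.0
```

Rank 1 accuracy is then the fraction of test images assigned to their true class, exactly as counted in the experiment above.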
An assessment of the meta-features' informativeness in relation to the informativeness and pair correlation of the initial features (Figure 8) was also carried out. The extremum of the average informativeness of the meta-features was observed at 0.9 ≤ p ≤ 1. Setting the value of p to 0.9 is preferable because the transformation of the features to the meta-features is nonlinear at p = 1. If the initial features are independent, then I' < I. However, if the features are correlated, then I' > I. The correlation relationship between two features may be more informative than the pair of features itself. For example, when 1 > C > 0.95 and I = 0.15, then I' = 0.488 (at p = 0.9). So, the higher the level of correlation, the more informative the Bayes-Minkowski meta-features. The obtained results show that the Bayes-Minkowski meta-feature space is the best approach to use, at least in the task of pattern classification.

The described regularities are valid when the features have a normal distribution; for other distribution laws, evaluations were not provided.

Correlation Neuron Model for Biometric Authentication
Each neuron should separate the input data according to the level of correlation. The neuron is connected to the meta-features that were generated by pairs of features with a similar level of mutual correlation. Let us introduce two levels of correlations of features: C− > Cj,t (C− ∈ [−0.99; −0.3]) and C+ < Cj,t (C+ ∈ [0.3; 0.99]). With |C±| < 0.3, the correlation neuron may work incorrectly and the number of errors will be significant. The condition |C−| = |C+| does not have to be fulfilled; the more negatively and positively correlated pairs of features there are, the higher the absolute values of the threshold coefficients C− and C+ should be set. One meta-feature should be associated with only one correlation neuron in order to avoid the implementation of Marshalko attacks [13]. Thus, correlation neurons are partially connected.
The metric in (15) works well with positively correlated data but cannot determine negatively correlated data (Figure 4). The metric of the standard deviation of the meta-feature values in (16) allows for the separation of both positively and negatively correlated data (Figure 9), which is due to the fact that the values of the deviation modules |a'j* − m'| tend to reduce if the correlation between the initial features is strong (both positively and negatively).

Correlation neurons can be based on the weighted standard deviation metric in (17), where y is the neuron response, η is the number of neuron inputs, wj* is the weight of the synapse with the number j* (wj* ≥ 0; if wj* = 0, then the j*-th meta-feature does not affect the sum, i.e., it is not connected to the neuron), and ι is the number of meta-features, not counting the synapses with zero weight. Metric (17) realizes a "transition" into the space of the Bayes-Minkowski meta-features of the second order (a"ι = (a'ι − m')²) but only for the neuron-related meta-features. The synapse weight is calculated using Formula (18), where m(G),ι", m(I),ι" are the mathematical expectations, and σ(G),ι", σ(I),ι" are the standard deviations of the values of the ι-th meta-feature of the second order for the «Genuine» and «Impostors» images. The parameters m(G),ι", m(I),ι", σ(G),ι", σ(I),ι" must be deleted after training. It is proposed to use the multilevel threshold quantization function in (19) as an activation function, where Φ(y) is the neuron output and Tleft, Tmiddle, and Tright are the left, middle, and right threshold values of the neuron activation (Figure 9). In accordance with the proposed model, a neuron has four activation outcomes {0, 1, 2, 3}, and only one of them corresponds to the «Genuine» hypothesis; the rest correspond to the «Impostors» hypothesis. A potential attacker does not have information about the correct activation state that corresponds to the «Genuine» hypothesis (herein, ΦG) since it is not saved after the neuron is configured. The purpose of neuron training is that a certain state almost always appears at the output of the neuron when the «Genuine» images enter the neuron; in other cases, the states {0, 1, 2, 3} at the output of the neuron should be equally probable: P(0) ≈ P(1) ≈ P(2) ≈ P(3) ≈ 0.25.
Here, P(Φ(y)) is the relative frequency of the occurrence of Φ(y) when an «Impostor» image enters the input of the neuron. It is difficult to achieve such an exact ratio in practice. To provide a high entropy of neuron outputs in response to «Impostors» images, it is sufficient to adhere to the following ratio: 0.1 < P(Φ(y)) < 0.4.
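The quantization in (19) can be sketched as follows (the formula itself is not reproduced in this excerpt, so the interval assignment below is one consistent reading: the response y falls into one of four intervals bounded by the three thresholds):

```python
def activation(y, t_left, t_middle, t_right):
    """Multilevel threshold quantization: the neuron response y is mapped
    to one of the four states {0, 1, 2, 3}. The concrete assignment of
    intervals to states is an assumption consistent with the text."""
    if y < t_left:
        return 0
    if y < t_middle:
        return 1
    if y < t_right:
        return 2
    return 3

# With thresholds placed so that almost all «Genuine» responses fall into
# a single interval, only one of the four states encodes «Genuine», and
# the attacker cannot tell which one, since it is not stored.
states = [activation(v, -1.0, 0.0, 1.0) for v in (-2.0, -0.5, 0.5, 2.0)]
```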
When the thresholds were calculated, the probable boundary values of the responses of the neuron y to the «Genuine» (yGmin, yGmax) and «Impostors» (yImin, yImax) training samples were first calculated using Formula (20). Then, the values of the corresponding distribution functions FG(y) and FI(y) were calculated using Formula (21), and the probability density was calculated using Formula (22). To a first approximation, the distribution law of the random variable y in (17) was close to the normal one in (20), which was confirmed by the Chi-square method on a large set of generated data.
where ξ and ς are, respectively, the mathematical expectation and the standard deviation of the y values that were calculated based on the training set. It was proposed to set the thresholds in accordance with the algorithm illustrated in Figure 10. We also introduced the AUCMAX coefficient, equal to the maximum allowable AUC(ΦG(y), ΦI(y)) for a neuron, in order to exclude "weak" neurons that give close responses to the «Genuine» and «Impostors» images (Figure 10).

Figure 10. Scheme of the algorithm of the synthesis and training of the correlation neuron.

One of the hash transformations should be applied to modify the value of the activation function (Table 1). A hash transformation should be chosen randomly during neuron training but considering which two key bits (hereinafter, b) the neuron should be set to. For example, if ΦG = 1 and b = "10", the hash transformation number is selected from the set {5, 6, 9, 10, 15, 16, 21, 22} (Table 1). So, to train a correlation neuron, it is sufficient to determine the associated meta-features, calculate the weights and thresholds, and set a hash transformation (Figure 10 and Appendix A).
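The choice of a hash transformation constrained by the target key bits can be sketched as follows. Since Table 1 is not reproduced in this excerpt, the candidate set below (all bijections of the four activation states onto the four 2-bit codes) is an illustrative assumption, not the paper's actual table:

```python
import itertools
import random

# Illustrative stand-in for Table 1: the 24 bijections from the
# activation states {0, 1, 2, 3} to the 2-bit codes.
CODES = ("00", "01", "10", "11")
TABLE = [dict(enumerate(perm)) for perm in itertools.permutations(CODES)]

def pick_hash_transformation(phi_g, b, rng=random):
    """Randomly choose a transformation that maps the «Genuine» state
    phi_g to the desired key bits b; the random choice among candidates
    means the stored table does not reveal phi_g."""
    candidates = [h for h in TABLE if h[phi_g] == b]
    return rng.choice(candidates)

h = pick_hash_transformation(phi_g=1, b="10")
```

With this stand-in table, six candidates satisfy any given (ΦG, b) pair; the paper's example cites eight candidate numbers, so the real Table 1 evidently differs from this bijection set.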

Synthesis and Automatic Training of C-Neuro-Extractors
A c-neuro-extractor is a shallow neural network consisting of one hidden layer of correlation neurons. It works with feature vectors, so the "raw" images of the «Impostors» and «Genuine» training sets must first be processed with the encoder (Figure 1).
The mapping in (12) at p = 0.9 was used to transform the features into meta-features. The normalization coefficients δj for switching to the space of the meta-features must be calculated based on the «Impostors» training set before building and training the c-neuro-extractors.
A case is further considered in which the number of inputs η is equal for all correlation neurons. During the synthesis of the c-neuro-extractor, it is necessary to make sure that there is a sufficient number of pairs of features at the levels of mutual correlation Cj,t < C− and Cj,t > C+. To do this, it is necessary to calculate the correlation matrix according to the «Genuine» training set. Any pair of correlated features potentially generates one meta-feature. Let N− and N+ be the numbers of neurons focused on processing the correlated data at the levels Cj,t < C− and Cj,t > C+, respectively. The condition N− ≈ N+ should be observed (a discrepancy of 1-3 neurons is allowed). Each neuron must handle a unique combination of meta-features and generates 2 bits at the output. The required number of neurons is determined based on the required key length L. For practical purposes, a sufficient length is L = 1024 bits; then N− = N+ = L/2/2 = 256. Then, if η = 4, 2048 pairs of features will be required (1024 = 256 · 4 pairs for each level of correlation) for the synthesis of the c-neuro-extractor. When using autoencoders, the number of features can be made arbitrary. For example, 256 features give 32,640 potential pairs, of which 2048 (≈6.27%) are chosen.
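The pair-counting step and the budget arithmetic above can be sketched as follows (the thresholds C− = −0.5 and C+ = 0.5 are the defaults suggested later in the Discussion):

```python
import numpy as np

def correlated_pair_counts(features, c_minus=-0.5, c_plus=0.5):
    """Count feature pairs whose correlation over the «Genuine» training
    set falls below C− or above C+; each such pair can generate one
    meta-feature. `features` is an (images, features) matrix."""
    C = np.corrcoef(features.T)
    upper = C[np.triu_indices_from(C, k=1)]   # each pair counted once
    return int((upper < c_minus).sum()), int((upper > c_plus).sum())

counts = correlated_pair_counts(np.random.default_rng(2).normal(size=(40, 10)))

# Budget arithmetic from the text: L = 1024 bits, 2 bits per neuron,
# the neurons split evenly between the two correlation signs.
L, eta = 1024, 4
n_minus = n_plus = L // 2 // 2               # 256 neurons per sign
pairs_required = (n_minus + n_plus) * eta    # 2048 pairs in total
potential_pairs = 256 * 255 // 2             # 32,640 pairs from 256 features
share = pairs_required / potential_pairs     # about 6.27% of the pairs used
```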
The proposed algorithm of the synthesis and training of the c-neuro-extractor can be summarized as a sequence of steps:

1.	Calculation of the feature correlation matrix.
2.	Counting the pairs of negatively correlated features (Cj,t < C−). If the number of pairs is less than η·N−, then C− is increased by 0.05 and this step is repeated.
3.	Synthesis and training of N− neurons for the analysis of negatively correlated data in accordance with the algorithm in Figure 10. If the number of neurons satisfying the conditions of the algorithm in Figure 10 turns out to be less than N−, then C− is increased by 0.05 and steps 2-3 are repeated.
4.	Counting the pairs of positively correlated features (Cj,t > C+). If the number of pairs is less than η·N+, then C+ is decreased by 0.05 and this step is repeated.
5.	Synthesis and training of N+ neurons for the analysis of positively correlated data in accordance with the algorithm in Figure 10. If the number of neurons satisfying the conditions of the algorithm in Figure 10 is less than N+, then C+ is decreased by 0.05 and steps 4-5 are repeated.
As the interval (C−; C+) narrows, it is permissible not to delete the already created neurons (new neurons can be added to the existing ones). The algorithm is executed until the condition N− = N+ = L/4 is fulfilled or until the condition C− ≤ −0.3 ∨ C+ ≥ 0.3 is violated. The latter means that it is not possible to associate the c-neuro-extractor of this user with a key of length L.
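The synthesis loop above can be sketched as follows; count_pairs and train_neurons are hypothetical callbacks standing in for the real correlation analysis and the neuron-training algorithm of Figure 10, so this is a control-flow sketch rather than the actual implementation:

```python
def synthesize_extractor(count_pairs, train_neurons, L, eta, step=0.05):
    """Sketch of steps 1-5 with the 0.05 threshold-relaxation rule.
    count_pairs(c) returns the number of feature pairs available at
    threshold c; train_neurons(c, need) returns how many neurons passed
    the conditions of Figure 10 (both are hypothetical callbacks)."""
    need = L // 4                        # N− = N+ = L/4 neurons per sign
    c_minus, c_plus = -0.99, 0.99
    while True:
        # Relax each threshold toward zero until enough pairs exist.
        while count_pairs(c_minus) < eta * need and c_minus < -0.3:
            c_minus += step
        while count_pairs(c_plus) < eta * need and c_plus > 0.3:
            c_plus -= step
        if train_neurons(c_minus, need) >= need and \
           train_neurons(c_plus, need) >= need:
            return c_minus, c_plus       # a key of length L can be associated
        if c_minus > -0.3 or c_plus < 0.3:
            return None                  # association with length L failed
        c_minus += step                  # narrow the interval and retry
        c_plus -= step

# A generous stub succeeds immediately; a hopeless one fails cleanly.
ok = synthesize_extractor(lambda c: 10**6, lambda c, n: n, 1024, 4)
fail = synthesize_extractor(lambda c: 0, lambda c, n: 0, 1024, 4)
```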
Weight coefficient tables and the numbers of hash transformations of the trained correlation neurons represent the secure template of the user.
A c-neuro-extractor has the following peculiarities:
•	correlation neurons are not affected by the problem of imbalance in training (the size of the «Impostors» training set is much larger than the size of the «Genuine» training set);
•	the setting up of the correlation network is a robust process (overfitting does not occur);
•	the length of the key associated with the c-neuro-extractor is potentially much larger than those associated with fuzzy extractors and the base model of the neuro-extractor;
•	this model should have a much higher level of resistance to adversarial attacks [16,17] than a classical deep network with the softmax activation function at the output, at least in terms of the effect on the FAR indicator; adding noise and other modifications is unlikely to affect the closeness of the correlations of an «Impostor» image to those of the «Genuine» image.
In this work, two intervals of the correlation of features (1 > Cj,t > C+ and −1 < Cj,t < C−) and four quantization intervals for the activation function in (19) are considered. Increasing the number of intervals should enhance the hashing properties of the c-neuro-extractor.

Data Set
A biometric authentication method based on an analysis of the acoustic properties of the ear canal was proposed in [1] and uses headphones with a built-in microphone. The headphones produce a sound signal that resonates in the ear canal. Since the inner ear structure for each person has individual characteristics, the parameters of the propagating sound signal in each ear change in different ways. The microphone records the reproduced signal whose parameters can be considered biometric features. The structure of the ear canal does not change significantly after eight years of age [22].
Traditional biometric parameters, such as fingerprints, face images, signatures, voices, etc., are compromised in the natural environment. These parameters can be "intercepted", for example, fingerprints can be removed from door handles, mugs, and photographs and voices can be recorded on a dictaphone. The advantages of the method in [1] are that the characteristics of the auditory canal are hidden from direct observation and cannot be copied by being photographed. A "flat" ear image is not informative enough to make adversarial copies.
In the previously mentioned study [1], a set of impersonal (depersonalized) data of ear canal echograms of 75 computer users aged 18-40 years (AIC-ears-75) was collected, which is available for research purposes. Each echogram (or acoustic image of the auditory canal) is presented as a wav format file (mono, 44 kHz, 16 bit). Fifteen measurements of each ear were made for each user. After each measurement, the user removed and put on the headphones again (a device in the form of headphones with a built-in microphone for recording the reflected signal). Each user had to listen to a mono sound signal of increasing and decreasing frequencies (sliding modulated sine), obtained using linear frequency modulations (chirp signals). The signal frequencies varied in the range of 1 kHz to 14 kHz and the signal duration was 10 s (5 s frequency increases, 5 s decreases). The dataset includes 2 folders for the right and left ears with each folder containing 75 subfolders of the ear measurements of the related users.
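The probe signal described above can be sketched as follows; the exact 44.1 kHz sampling rate and the phase convention of the sweep are assumptions (the dataset description only says "44 kHz"):

```python
import numpy as np

def chirp_probe(fs=44100, duration=10.0, f0=1000.0, f1=14000.0):
    """Sketch of the probe: a linearly frequency-modulated sine (chirp)
    sweeping 1 -> 14 kHz over the first 5 s and back down over the last
    5 s, mono, matching the dataset description."""
    half = int(fs * duration / 2)
    t = np.arange(half) / fs
    k = (f1 - f0) / (duration / 2)          # sweep rate in Hz/s
    up = np.sin(2 * np.pi * (f0 * t + 0.5 * k * t ** 2))
    down = np.sin(2 * np.pi * (f1 * t - 0.5 * k * t ** 2))
    return np.concatenate([up, down]).astype(np.float32)

probe = chirp_probe()
```

Writing this array as 16-bit PCM would reproduce the wav format of the dataset files.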

Image Preprocessing
For the preliminary processing of the acoustic images of the auditory canal, the method of calculating the so-called averaged signal spectrum proposed in [1] was used. To obtain the averaged amplitude spectrum using a short-time Fourier transform, we first calculated the spectrogram with a window size Wsize = 65,536 (about one and a half seconds of the signal according to [1]) and a step size Wstep = 16,384 (2 times less than that used in [1]). Further, for all windows, the spectrum of the average values of the amplitudes was calculated depending on the frequency (Figure 11a). The first 1500 and last 3000 samples were removed from the averaged spectrum in order not to take into account frequencies of less than 1 kHz (this is the initial frequency of the chirp signal) and more than 20 kHz (the microphone was not able to register these frequencies). Frequencies of 14-20 kHz were considered since useful information can appear in overtones. Then, the obtained averaged spectra were "compressed" to 2048 samples using the linear interpolation algorithm. Before being fed into the neural network, the images were reduced to the range of values [0; 1].
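The preprocessing pipeline above can be sketched as follows (a minimal sketch assuming a Hamming window and NumPy's real FFT; the paper's exact FFT conventions are not specified in this excerpt):

```python
import numpy as np

def averaged_spectrum(signal, w_size=65536, w_step=16384,
                      window=np.hamming, out_len=2048):
    """Sketch: short-time Fourier transform with the given window and
    step, amplitude spectra averaged over all windows, trimming of the
    first 1500 and last 3000 samples, linear interpolation
    ("compression") to 2048 samples, and reduction to [0, 1]."""
    win = window(w_size)
    frames = [np.abs(np.fft.rfft(signal[s:s + w_size] * win))
              for s in range(0, len(signal) - w_size + 1, w_step)]
    avg = np.mean(frames, axis=0)
    avg = avg[1500:len(avg) - 3000]     # drop bins below ~1 kHz, above ~20 kHz
    grid_old = np.linspace(0.0, 1.0, len(avg))
    grid_new = np.linspace(0.0, 1.0, out_len)
    avg = np.interp(grid_new, grid_old, avg)
    lo, hi = avg.min(), avg.max()
    return (avg - lo) / (hi - lo + 1e-12)

# Random noise stands in for a recorded echogram in this illustration.
spectrum = averaged_spectrum(np.random.default_rng(3).normal(size=3 * 44100))
```

At 44.1 kHz, a 65,536-sample FFT bin is about 0.67 Hz wide, so dropping 1500 leading and 3000 trailing bins corresponds to the cited 1 kHz and 20 kHz cut-offs.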

In this work, the following window functions were used to obtain averaged spectra: Hamming, Blackman, triangular (Bartlett), rectangular, Gaussian (standard and parametric (generalized) with a shape parameter value of 1.5), and Laplace. Various features were extracted depending on the applied window function.

Autoencoder Training for Feature Extraction
To build the feature extraction units, it was decided to use two similar autoencoder architectures, as presented in Tables 2 and 3. The decoders of both autoencoders were identical. Any neural network can extract features that differ from the features extracted using another neural network trained independently of the first, even if the training sets are identical. The correlation of features extracted by different neural networks is a desirable property.

It is known that the training of multilayer neural networks requires a large training set. However, the AIC-ears-75 database consists of only 2250 images, which is not enough for the training of autoencoders and the subsequent testing of c-neuro-extractors.
Since the acoustic image of the ear has significant similarities with the voice signal, as do their averaged spectra (Figure 11b-d), it was decided to use the following transfer learning scheme. From the speech datasets TIMIT (formed in 1993) and VoxCeleb1 [23], a total of 71,264 voice images of speakers were extracted. The sizes of the images ranged from 25 to 250 kbytes. With sound files of this size, their averaged spectra were visually similar to the averaged spectra of the ear canal echograms. However, the sampling frequency of the voice signals differed from the frequency characteristics of the ear canal echograms (16 kHz; the analyzed frequency range was up to 8 kHz, Figure 11b). These images were also converted to averaged spectra of 2048 amplitudes (with the parameters Wsize = 4096 and Wstep = 2048). In this case, only four window functions were used: Hamming, Blackman, triangular, and rectangular. Thus, 285,056 samples of averaged voice spectra were obtained and used to train the two autoencoders (Tables 2 and 3).

Computational Experiment
A computational experiment was performed. The AIC-ears-75 dataset was randomly divided into two parts: 50 users (100 ears) were «Genuine Users» and 25 users (50 ears) were «Unseen Impostors». All images were processed by both encoders. In the first stage of the experiment, the images of the left and right ears were divided into two classes as if they were two different users (Tables 4 and 5). In the second stage, the features extracted from both ears were combined into one (double) image (Table 6).

Table 4. Error probabilities of user personality verification with the image of one ear using features extracted by the encoder based on architectures 1 and 2 (at C+ = 0.5, C− = −0.5, AUCMAX = 0.3, KG = 8).

Table 5. Error probabilities of user personality verification with the image of one ear using combined features extracted by both encoders (at C+ = 0.5, C− = −0.5, AUCMAX = 0.3, KG = 8).

Samples of «Genuine» and «Impostors» were formed to train each c-neuro-extractor. The «Genuine» training set consisted of four to eight images (8 ≥ KG ≥ 4) of a specific user from the set of «Genuine Users», and the rest of the user's images were used to test and calculate the FRR. The «Impostors» training set was formed from the images of other users from the «Genuine Users» set; one image of each ear was considered (KI = 99 in the first stage of the experiment and KI = 49 in the second, where KI is the number of images in the «Impostors» training set). Images from the set of «Unseen Impostors» were used only for testing and determining the FAR. The experimental results are presented in Tables 4-6. A similar computational experiment was performed with a base neuro-extractor model that was trained in accordance with GOST R 52633.5. The best results are as follows:

•	EER = 0.03041 (FRR = 0.2288 at FAR < 0.001) with a key length L = 716; the size of the «Genuine» training set was KG = 8 and the size of the «Impostors» training set was KI = 49.

Discussion
The feature vectors extracted from the averaged spectra of the same signal but based on different windows (Figure 11c,d) are strongly correlated (the correlation coefficient ranged from 0.9 to 0.99). However, as we can see in Table 4, the use of strongly correlated features allows for a twofold reduction in the error probability and also a significant increase in the length of the associated key. We can also see that combining the features obtained using encoders that differ slightly in their architectures (Tables 2 and 3) also has a positive effect (Table 5). For example, by combining the features extracted by both encoders based on the Hamming spectra, it is possible to reduce the error probability twofold (Tables 4 and 5). In addition, when combining the features extracted by the encoders from the spectra based on the Hamming window and a rectangular window, the error probability decreases by more than 10%, whereas the key length increases by a factor of 2 (Tables 4 and 5).

The obtained results suggest that many similar feature extraction units (similar but slightly different nonlinear transformations with respect to the patterns/images) can be used to improve the performance of the c-neuro-extractor. This technique, as a rule, does not have such a tangible effect in combination with other machine learning methods (for most classifiers, these sets of features will be similar in information content). Of course, there is a limit to the reduction of errors, but this issue deserves a separate study. A significant change in the number of neuron inputs is an unproductive approach (Table 5). The best results for the dataset used are achieved with η = 5 (Table 6). We can also conclude that it is not worth changing the boundaries of the intervals (C−; C+) significantly, and the optimal values are C− = −0.5, C+ = 0.5 (Table 6).
This can probably be explained by the fact that on the small «Genuine» training set (8 ≥ KG ≥ 4), the correlation coefficients are calculated with large errors. With an increase in |C−| and |C+|, part of the pairs of correlated features that carry information does not fall into the specified intervals and is not used. When |C−| and |C+| are decreased, many feature pairs that do not carry useful information are taken into account. There is also an optimum at AUCMAX ≈ 0.3 (at least for the AIC-ears-75 dataset). Too large an AUCMAX value negatively affects the results since more unstable neurons are created; too small a value leads to the creation of a small number of neurons.
Reducing the size of the «Genuine» training set does not greatly affect the probability of errors (Table 6). The model learns well with 8, 7, and 6 examples of images. Thus, the use of a large sample set for the c-neuro-extractor training is not required. Figure 12 shows that if 16% of erroneous bits are corrected at the c-neuro-extractor output, then the authentication system can be configured for the following indicators: FAR < 0.001 at FRR = 0.093, which in general can be useful for practical purposes. The correction of the wrong key bits can be performed by classical methods of error-correcting coding (but unlike fuzzy extractors, this will not affect the key length). The obtained indicators are probably not limiting since the results can be improved by adding other window functions or several autoencoders with a different architecture (this is not the aim of the work). In any case, the proposed models are very promising for further development.
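The acceptance rule implied by a 16% correction budget can be illustrated with synthetic per-presentation bit-error fractions; the distributions below are assumptions for illustration only, not the measured ones from Figure 12:

```python
import numpy as np

def far_frr_at_budget(genuine_err, impostor_err, budget=0.16):
    """Sketch of the trade-off above: a presentation is accepted when at
    most a `budget` fraction of the output bits needs correcting."""
    genuine_err = np.asarray(genuine_err)
    impostor_err = np.asarray(impostor_err)
    frr = float(np.mean(genuine_err > budget))    # genuine rejected
    far = float(np.mean(impostor_err <= budget))  # impostor accepted
    return far, frr

# Synthetic stand-ins: genuine codes differ in few bits, impostor codes
# in roughly half of them, so the 16% budget separates the two.
rng = np.random.default_rng(4)
genuine = np.clip(rng.normal(0.08, 0.04, 2000), 0.0, 1.0)
impostor = np.clip(rng.normal(0.50, 0.05, 2000), 0.0, 1.0)
far, frr = far_frr_at_budget(genuine, impostor)
```

In practice the correction itself would be done with an error-correcting code, as noted above, so the accepted presentations reproduce the full-length key exactly.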

If necessary, the bit sequence arising at the output of the c-neuro-extractor can be transformed into a key of the desired length by applying a cryptographic hash function (after the correction of erroneous bits).
The comparison of the obtained results with those previously achieved is presented in Table 7. The c-neuro-extractor, as we can see, showed better results than those previously achieved.

Conclusions
The classical theory of mathematical statistics states that if features are correlated, they duplicate certain information. However, the obtained results indicate the opposite, that is, strongly correlated pairs of features contain additional information. Of course, a theory that has been proven for decades cannot be wrong. The data obtained only clarify it in relation to the problems of image (pattern) classification.
Correlation links between features deform the feature space. The nature of the curvature relative to each class of images is generally different since, for different classes of images, the correlation matrices of features can differ significantly. This curvature makes it difficult to construct separating hyperplanes in the feature space in the process of machine learning. Independent (weakly correlated) informative meta-features can be extracted from correlated pairs of features. One meta-feature can contain 2-3 times more information than is contained in the pair of initial features from which it was generated (I'j,t > Ij + It). The new meta-feature space is called the Bayes-Minkowski meta-feature space.
A model of correlation neurons is proposed, which allows for the use of the Bayes-Minkowski meta-features for image (pattern) classification. The network of correlation neurons is trained automatically with a small training set and can be combined with a pre-trained deep neural network. This provides benefits both in terms of the simplification of the training procedure and information security. Correlation neuron networks are potentially more resistant to destructive influences such as adversarial attacks. When identifying an image that does not belong to any of the known classes, the networks of correlation neurons should generate an almost random binary code that should be ignored when making decisions. This work does not assert the superiority of networks of correlation neurons over multilayer networks, but only indicates the promising nature of correlation neurons in some respects (automatic learning/additional training, ensuring the safety of the decision-making process).
The proposed model of a c-neuro-extractor allows for the association of a cryptographic key or password with a user's biometric image and the storage of both of these components safely without compromise. The proposed model surpasses the previously known models (fuzzy extractors, neuro-extractors) in key length while allowing a low percentage of erroneous decisions. The experimental results showed the high efficiency of the proposed model in the problems of key generation based on acoustic images of the ear: EER = 0.0238 (FRR = 0.093 at FAR < 0.001) with a key length L = 8192 bits; the volume of the «Genuine» training set was KG = 6 and that of the «Impostors» training set was KI = 49 (the following values were obtained for the base model of the neuro-extractor: EER = 0.03041, FRR = 0.2288, FAR < 0.001, L = 716, KG = 8, KI = 49).
There are many potential neuron constructs that allow for the analysis of correlations without compromising biometric templates. We can say that the Bayes-Minkowski measure is an "antagonist" in relation to the Minkowski measure since it has the opposite properties (makes fewer mistakes if the features are correlated and more errors if the features are independent). Therefore, they can be used together, by analyzing the strongly correlated features using the Bayes-Minkowski neurons and the weakly correlated ones with the Minkowski neurons. There are also plans to improve the learning algorithm for correlation networks.
In addition to biometric authentication, networks of correlation neurons can be used in other tasks of image (pattern) classification (also in combination with classical deep networks), especially in cases where the size of the training set is limited. The use of networks of correlation neurons for the synthesis of artificial intelligence resistant to adversarial attacks, as well as attacks aimed at extracting AI knowledge and various manipulations with AI models, seems promising.