1. Introduction
Review [
1,
2] examined the state of research in the field of associative data protection in scene analysis [
3,
4]. Associativity is defined by the use of a masking mechanism applied to the binary matrix etalons of the decimal digits that make up the code representations of object names and their coordinates. Both are encoded by
k-digit numbers. Each digit is entered into a binary matrix of size
,
. The masking algorithm divides the set of etalons for the digits $\{0, 1, \ldots, Q-1\}$, where Q is the base of the number system (in this case, $Q = 10$), and then each successively formed subset, into dichotomic pairs according to the value of one bit. The position of this bit is marked by a one in the inverse (bitwise-complemented) mask matrices of both subsets of the pair.
We used decimal coding with postal (ZIP code-style) symbols (
Figure 1). Individual bits of the etalon set were placed along the outer contour and inner “zigzag” of the binary matrices (
Figure 2).
The mask generation process is random. For each matrix, a separate mask matrix of the same size is created, which stores the bits essential for its identification in the etalon. The set of masks is the key to recognition. The masked bits are subject to randomization. As a result, each numerical code is converted into a
k-section steganographic container, initially filled with a segment of a pseudo-random sequence of length
, into which the stored code bits are interspersed at the random positions given by the ones of the inverse mask matrices. Regardless of
n, the average number of such bits
remains constant. Here
M is the mathematical expectation of the number of stored bits for a single etalon. At the same time, as n increases, the steganographic security of the method also increases due to the increase in
[
5].
The integration of cryptography and steganography is a characteristic feature of associative security.
The previous study [
2] addressed the problem of associative security primarily at the level of engineering optimization: minimizing the average number of bit inclusions M in the GAMMA container while maintaining recognition capability. However, this optimization-oriented perspective left unresolved a more fundamental issue that directly affects the practical viability of the method. Specifically, the non-uniform distribution of unmasked bits along the container length creates two interrelated vulnerabilities. First, the distribution spikes at nodal points provide an adversary with exploitable structural regularities, enabling targeted distortion of the transmitted message through intentional network noise. Second, the requirement to exclude keys producing inclusions at nodal points leads to a drastic reduction in the usable key space (to 27% of the total for
$Q = 10$), thereby degrading cryptographic strength. Thus, the central problem addressed in this work is the elimination of distribution non-uniformity through principled selection of etalon configurations. This problem lies at the intersection of bit distribution uniformity and key space preservation and has direct implications for adversarial robustness. Unlike [
2], which treated etalon reconfiguration as a means of reducing M, the present work elevates the uniformity of bit distribution to a primary design criterion and formulates a heuristic rule grounded in the analysis of dichotomous partition cardinalities, thereby shifting the level of inquiry from parameter optimization to structural design of the encoding system.
Notation and Definitions
To facilitate reading across disciplines, we summarize the core notation and definitions used throughout the paper.
Number system base Q: The radix of the positional number system used to encode object names and coordinates. In this work, $Q = 10$ (decimal) and $Q = 16$ (hexadecimal).
Code length k: The number of digits in the code representation of an object name or coordinate value. For a code of length k, the maximum number of representable values is $Q^k$.
Binary matrix-etalon: A binary matrix of size , where , that encodes the graphical representation of a single digit from the set $\{0, 1, \ldots, Q-1\}$. The positions of ones in this matrix define the shape (configuration) of the digit. The term “etalon” (from French étalon, meaning “reference standard”) denotes a reference pattern against which incoming data is matched during recognition.
Etalon outline: The ordered sequence of matrix cells traversed along the outer contour (clockwise, starting from the lower-left corner) and then along the inner zigzag of the binary matrix (see
Figure 2). This linearization maps the two-dimensional matrix to a one-dimensional bit sequence.
Nodal points: The corner cells of the binary matrix that lie at the junctions of contour segments (highlighted in
Figure 2). These points are shared by two or more segments and play a special role in the dichotomous masking process.
Mask matrix: A binary matrix of the same size as the etalon, generated randomly for each etalon in the set. The positions of ones in the mask indicate the bits of the etalon that are preserved (stored) in the container; the remaining bits are replaced by pseudorandom values. The complete collection of mask matrices for all Q etalons constitutes the secret key.
Dichotomous partitioning: The recursive process by which the masking algorithm divides the etalon set and its subsequent subsets into pairs based on the value of a single bit at a given position of the outline. At each step, the bit position where the partition occurs is recorded in the inverse mask matrices of both resulting subsets.
GAMMA container: A pseudorandom binary sequence of length that serves as the steganographic carrier. The unmasked (preserved) bits of all k etalons are embedded into this sequence at positions determined by the inverse masks, while all other positions retain their pseudorandom values. The container thus integrates cryptographic (key-dependent masking) and steganographic (embedding in a pseudorandom carrier) protection.
Mathematical expectation of stored bits M: The average number of etalon bits that survive the masking process and are embedded into the container for a single etalon, computed based on generated keys.
Distribution function of bit inclusions: The function that maps each position along the linearized etalon outline to the total number of stored (unmasked) bits at that position, aggregated over all keys. A uniform distribution indicates that no position is disproportionately favoured, which is the design objective of this work.
Permutation: An ordered arrangement of Q symbols that defines the initial assignment of digits to etalon shapes in the masking algorithm. The total number of distinct permutations is $Q!$. Different permutations produce different keys.
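The mask and container definitions above can be illustrated with a minimal sketch. It assumes that each etalon is already linearized into a list of bits and that a mask value of 1 marks a preserved bit. The names `mask_section` and `build_container`, the 15-position sample outlines, and the masks are hypothetical and are not taken from the paper. Python's `random.Random` is a plain Mersenne Twister and is not cryptographically secure, so it only stands in for the generator used by the authors.

```python
import random


def mask_section(etalon_bits, keep_mask, rng):
    """Return one container section: kept etalon bits, random bits elsewhere."""
    if len(etalon_bits) != len(keep_mask):
        raise ValueError("etalon and mask must have the same length")
    return [
        bit if keep else rng.getrandbits(1)
        for bit, keep in zip(etalon_bits, keep_mask)
    ]


def build_container(etalons, masks, seed=0):
    """Concatenate the masked sections of all k digits of one code."""
    rng = random.Random(seed)  # stand-in PRNG; NOT cryptographically secure
    container = []
    for etalon_bits, keep_mask in zip(etalons, masks):
        container.extend(mask_section(etalon_bits, keep_mask, rng))
    return container


# Hypothetical 15-position outlines for a two-digit code (k = 2).
etalons = [
    [1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1],
]
# Hypothetical masks: 1 marks a bit that survives masking.
masks = [
    [0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1],
    [1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0],
]
container = build_container(etalons, masks, seed=42)
```

Only the positions flagged in the masks carry information; every other position is indistinguishable from carrier noise unless the masks (the key) are known.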
2. Research Objectives
The perspective for the development of research on associative security outlined in [
2] is linked to the reconfiguration of digital etalons and the transition from a decimal to a hexadecimal system. This transition increases the maximum number of object names and coordinate values from
(decimal) to
(hexadecimal) for
. This should significantly increase cryptographic strength, provided that complete coverage is achieved, meaning that the GAMMA sequence is generated once using the cryptographic version of the Mersenne Twister pseudorandom number generator [
6,
7]. Given an acceptable key search, decryption of the container contents will then yield the complete set of names. The key space also expands significantly due to a much larger number of permutations available with hexadecimal symbols. The use of such permutations is characteristic of the developed masking algorithm [
2].
The criterion for selecting etalon configurations is not trivial. The reconfiguration in [2] emphasised minimising the average number of bit inclusions in the information-carrying GAMMA container. Our findings give
for “postal coding,” rather than
as in [
2]. This is already quite close to the minimum possible value
for decimal encoding. For the set
, this number is reduced to
by sequentially dividing the entire set and the resulting subsets in half [
2].
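One way to read "sequentially dividing the entire set and the resulting subsets in half" is as a balanced dichotomy tree in which every symbol needs one distinguishing bit per level it passes through. Under that assumption (ours, and not stated in this form in the text), the average number of distinguishing bits per symbol is the average leaf depth of the tree. The sketch below computes it; the function names are hypothetical, and whether the resulting figures coincide with the bounds quoted from [2] is not checked here.

```python
def depth_sum(size):
    """Sum of leaf depths when a set of `size` symbols is halved recursively."""
    if size <= 1:
        return 0
    half = size // 2
    # Every symbol in this subset goes one level deeper at this split.
    return size + depth_sum(half) + depth_sum(size - half)


def balanced_average_depth(q):
    """Average number of dichotomies needed to isolate one of q symbols."""
    return depth_sum(q) / q


decimal_depth = balanced_average_depth(10)
hexadecimal_depth = balanced_average_depth(16)
```

For a power of two such as 16 the tree is perfectly balanced, so the average equals the base-2 logarithm of the set size; for 10 symbols some leaves sit one level deeper than others.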
In this article, the focus shifts, although the obtained M values remain of interest. We consider the main issue to be the need to eliminate spikes in the distribution function of unmasked bits along the container length. Such spikes make it easier for an adversary to distort the transmitted message using intentional network noise. To obtain a uniform distribution function of inclusions along the length of the container in [2], it was necessary to prohibit ones at the nodal points highlighted in Figure 2 when forming key sets of inverse mask matrices. This led to a significant reduction in the number of usable (“correct”) keys and, consequently, to a decrease in cryptographic strength.
To ensure programming convenience we abandon the reverse diagonal approach introduced in [
2] and return to the contour shown in
Figure 2. This choice facilitates the transition from matrix to linear representation, which leads to a significant reduction in the amount of data transferred. Moreover, as the research has shown, this makes it possible to better meet the reconfiguration criterion adopted in [
2].
The randomness of key (i.e., mask set) generation is largely determined by the random selection of the initial permutation of characters in the set $\{0, 1, \ldots, Q-1\}$. The number of permutations of r elements is $r!$.
Figure 3 shows the distribution of inverse mask units along the etalon outlines (
Figure 2) for postal symbols (
Figure 1), obtained using the Fisher–Yates shuffling algorithm [
8,
9].
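For readers unfamiliar with the shuffle, the following minimal sketch shows the in-place (Durstenfeld) form of the Fisher–Yates algorithm applied to the ten decimal symbols. The function name and the fixed seed are illustrative assumptions; the paper's implementation targets .NET and is not reproduced here.

```python
import random


def fisher_yates_shuffle(symbols, rng):
    """Return a uniformly random permutation of `symbols`."""
    result = list(symbols)
    # Walk from the last index down, swapping each slot with a random
    # slot at or below it; every permutation is then equally likely.
    for i in range(len(result) - 1, 0, -1):
        j = rng.randint(0, i)  # inclusive bounds: 0 <= j <= i
        result[i], result[j] = result[j], result[i]
    return result


rng = random.Random(2024)  # fixed seed only to make the sketch repeatable
initial_permutation = fisher_yates_shuffle(range(10), rng)
```

Each call consumes r − 1 random draws for r symbols, so sampling initial permutations is cheap compared with the mask generation that follows.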
Most of the one bits of all the keys involved fall on the nodes shown in Figure 2. In Figure 3, the zero position corresponds to the lower left point in Figure 2; the outer contour is traversed clockwise and is then followed continuously by the zigzag (linearisation of the matrix representation of the binary etalon).
It was useful to check whether the spikes in the distribution function were an artefact of an unrepresentative sample. The most reliable check is to go through all possible permutations. For $Q = 10$, the number of possible permutations is 10! = 3,628,800. Current hardware generates this list in a fraction of a second. Complete iteration over the list during mask generation took 30 min and confirmed the validity of the previously obtained result (see
Figure 4;
in this case).
Let us refer to Figure 3. Each key has an average of 8 occurrences of stored bits at the selected points. This casts doubt on the existence of “correct” keys with zeros at all nodal points. Nevertheless, such keys do exist [1]. We analysed all $10!$ obtained keys. It was found that 73% of them contain at least one “critical” mask (on average, 8 such masks for each non-working key). In other words, only 27% of the keys can be used in practice, which reduces cryptographic strength.
This defines our primary research objective: to find a rule for selecting digital etalon shapes that eliminates spikes in the inclusion distribution function.
Spikes arise when multiple keys are combined. In searching for an appropriate etalon selection rule, we should bear in mind that the key generation system [10] is subject to external influences, specifically changes in etalon shapes. Therefore, we can only observe the effects of this influence without explaining the underlying mechanisms. Explanation is, of course, a key function of science [11], but a full explanation here proves difficult.
Studying Figure 4 provides valuable information about the heuristic rule that we seek. By construction, the etalon outline between nodal points is composed of line segments consisting only of ones or only of zeros. The results of the first dichotomous partitions are therefore identical for all points of each segment, except for the nodal points. Examination of Figure 4 reveals that the distribution within each segment is also identical. Therefore, it is logical to assume that:
1. The behaviour of the system at each point of the outline is determined by the result of the first dichotomous division at that point.
2. To determine the cause of the spikes, it is enough to consider the case .
3. The Etalon Configurations Selection Rule and the Prerequisites for Its Verification
Our Algorithm 1 for generating the complete set of permutations of r elements is based on the recurrence $r! = r \cdot (r-1)!$.
| Algorithm 1 Generation of the complete set of permutations of r elements |
Require: Initial permutation P
Ensure: Complete list of permutations
1: procedure GeneratePermutations(P, r)
2:   if r = 2 then
3:     Record P
4:     Perform one cyclic left shift on P; record the result
5:     return
6:   end if
7:   for i = 1 to r do
8:     Record the current state of P
9:     GeneratePermutations(P, r − 1)
10:    Perform a cyclic left shift on P
11:  end for
12: end procedure
|
A complete permutation list is characterized by each element appearing in every position of the sequence exactly $(r-1)!$ times. This fact is illustrated by
Table 1 using the example of
, where
.
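The rotation idea behind Algorithm 1 and the equal-presence property can be checked with a short sketch. It is written in the spirit of Algorithm 1 rather than as a line-by-line transcription: the base case, the choice to rotate the first r elements, and the names `rotation_permutations` and `position_counts` are our assumptions.

```python
from collections import Counter


def rotation_permutations(items):
    """Yield every permutation of `items` using cyclic prefix rotations."""
    p = list(items)

    def generate(r):
        # Enumerate all arrangements of the first r elements of p.
        if r == 1:
            yield tuple(p)
            return
        for _ in range(r):
            yield from generate(r - 1)
            # Cyclic left shift of the first r elements; after r shifts
            # the prefix is back in its starting order.
            p[:r] = p[1:r] + p[:1]

    yield from generate(len(p))


def position_counts(perms):
    """Count how often each element occupies each position."""
    counts = Counter()
    for perm in perms:
        for position, element in enumerate(perm):
            counts[(position, element)] += 1
    return counts


perms = list(rotation_permutations("ABCD"))
counts = position_counts(perms)
```

For four elements the generator produces 24 distinct permutations, and every (position, element) pair occurs 3! = 6 times, which is the property the test list in the next part of the section is designed to imitate.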
Let us consider the case
. It is characteristic that the spikes in
Figure 4 appear only at the nodal points of the outer contour of
Figure 2a. There is no spike at the upper right point, because there is no dichotomous pair for it. Among the dichotomous pairs at the nodal points, two pairs include singleton subsets, two include subsets of two elements, and one includes a subset of three elements. At other points where no spikes are observed, five pairs include subsets of cardinality 4, three pairs include subsets of cardinality 5, and one pair includes a subset of cardinality 3. However, the node with the subset of cardinality 3 produces a spike. Therefore, configurations exhibiting this property should be excluded.
According to the assumption given in Section 2, we come to the following conclusion. A necessary condition for the absence of spikes in the distribution function is that, at each point of the outline in Figure 2a, either no dichotomy occurs or (what is still hypothetical) the first division in the masking process selects subsets of cardinality 4–5 for $Q = 10$ and, by analogy, 7–8 for $Q = 16$. These cardinalities are close to those of the subsets selected in [1] to minimise the number of inclusions.
Based on this analysis, we propose the following heuristic rule:
Rule 1. The configurations of digital etalons should be chosen in such a way that the previously specified conditions are fulfilled in order to obtain a uniform distribution function.
The above rule was derived from the analysis of the minimal case
. We now provide a justification for its extrapolation to arbitrary values of
n. The outer contour of the binary matrix of size
,
, contains exactly four corner (nodal) points at fixed relative positions (
Figure 2), regardless of
n. Increasing
n extends the contour segments between these points but does not alter the number or topology of the nodal points themselves. Since the distribution spikes occur precisely at the nodal points, the structural source of the problem is identical for all
n.
Moreover, the result of the first dichotomous partition at any contour point depends solely on the number of etalons having a 1 versus a 0 at that position, which is determined entirely by the etalon configurations and not by the matrix size. Between any two adjacent nodal points, the contour consists of a segment with identical bit values for each etalon, so all non-nodal points within a segment produce the same first-partition result. This segment homogeneity holds for all n, as increasing n only adds points within each segment without affecting its uniformity. After the first partition, all subsequent recursive partitions operate on subsets whose composition is already fixed by the first step, meaning that deeper recursion levels are modulated by the n-invariant first partition. Collectively, these invariants imply that the conditions producing or eliminating spikes at the nodal points are governed by the etalon configurations alone.
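The dependence of the first partition on the counts of ones and zeros can be made concrete with a small sketch. It assumes that each etalon is given as a list of bits along the linearized outline, and it reads the rule's cardinality band as Q/2 − 1 to Q/2 (4–5 for Q = 10, 7–8 for Q = 16), which is our generalisation. The toy etalons are hypothetical and are not the shapes used in the paper.

```python
def first_partition_cardinalities(etalons):
    """For each outline position, size of the smaller subset in the first split.

    A value of 0 means all etalons agree at that position (no dichotomy).
    """
    length = len(etalons[0])
    if any(len(etalon) != length for etalon in etalons):
        raise ValueError("all etalons must have the same outline length")
    cardinalities = []
    for position in range(length):
        ones = sum(etalon[position] for etalon in etalons)
        cardinalities.append(min(ones, len(etalons) - ones))
    return cardinalities


def satisfies_rule(cardinalities, q):
    """True if every position has no dichotomy or a near-half split."""
    half = q // 2
    return all(c == 0 or c in (half - 1, half) for c in cardinalities)


# Hypothetical toy set: 4 symbols, 3 outline positions.
toy_etalons = [[1, 1, 0], [1, 0, 0], [0, 1, 0], [0, 0, 0]]
toy_cardinalities = first_partition_cardinalities(toy_etalons)
```

Because the function looks only at columns of the etalon set, lengthening a homogeneous segment merely repeats an existing column and cannot introduce a new cardinality, which mirrors the n-invariance argument above.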
To supplement this structural argument with empirical evidence, we conducted additional experiments at intermediate values
and
for both
and
(with configurations presented in
Section 4). In all cases, the distribution of bit inclusions along the container remained uniform, with no spikes at the nodal points, and the values of
M coincided with those obtained for
and
. This is consistent with the analysis above: since
M depends on the partition structure rather than on the matrix dimensions, it remains constant across
n. What does change with
n is the container length
and consequently the steganographic security ratio
, which grows linearly.
We acknowledge that this extrapolation applies specifically to the distribution uniformity criterion and the value of M. Other aspects of system behaviour—such as computational efficiency of key generation and practical resistance to specific statistical attacks at different container lengths—may exhibit n-dependent properties and require separate investigation.
This hypothesis requires experimental verification. For the case $Q = 10$, testing can be performed on the entire set of permutations. However, when $Q = 16$, the total number of permutations is 16! = 20,922,789,888,000, and exhaustively enumerating them is computationally infeasible. It is therefore practical to construct a suitable limited test list for the purposes of research.
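A rough sense of the scale involved can be obtained with a few lines of arithmetic. The sketch assumes that the cost per permutation stays the same when moving from the decimal to the hexadecimal case, which is optimistic because the hexadecimal set contains more etalons. The 51 s figure is the single-threaded time reported in this paper for the full decimal run; the variable names are ours.

```python
import math

full_decimal = math.factorial(10)      # size of the complete decimal set
full_hexadecimal = math.factorial(16)  # size of the complete hexadecimal set
growth = full_hexadecimal // full_decimal

SECONDS_FOR_DECIMAL_RUN = 51  # reported single-threaded time for the decimal case
estimated_seconds = SECONDS_FOR_DECIMAL_RUN * growth
estimated_years = estimated_seconds / (365 * 24 * 3600)
```

Even under this optimistic assumption the estimate is on the order of years of single-threaded computation, which motivates the limited test list.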
We now clarify what “suitable” means for a limited test list. The subset of permutations must be selected so that it reproduces the same distribution function as the complete permutation set. To this end, we strive to preserve a property inherent to the complete permutation set: each element occurs equally often in every position of the r-sequence.
To form such a list for even values of
r, the following Algorithm 2 is proposed:
| Algorithm 2 Generation of the test list of permutations for even r |
Require: Initial permutation P; r is even
Ensure: Test list of permutations
1: Perform cyclic left shifts on P, obtaining r permutations (including the initial one)
2: for each of the r resulting permutations do
3:   Split it into two halves of $r/2$ elements
4:   Generate the complete list of permutations for each half using Algorithm 1
5: end for
|
For this algorithm, the cardinality of the resulting set of permutations is equal to $r\,(2\,(r/2)! - 1)$. Each element is present in any position of the r-sequence $2\,(r/2)! - 1$ times. These positions are illustrated in Table 2 ($r = 4$) and Table 3 ($r = 6$). When $r = 4$, we obtain 12 test permutations, while $r! = 24$. All permutations are different. Each element appears three times in any position. When $r = 6$, we obtain 66 different test permutations, while $r! = 720$. Each element appears 11 times in any position.
For $Q = 10$ ($r = 10$), we obtain $10 \cdot (2 \cdot 5! - 1) = 2390$ different test permutations, i.e., approximately 0.066% of the total set. Each decimal digit appears 239 times in any position of the
r-sequence. The validity of using the test list to estimate the distribution of inclusions along the length of the container is illustrated in
Figure 5 using the example of
(compare with
Figure 4).
To provide a quantitative assessment of the agreement between the distributions obtained from the test list (
Figure 5) and the complete permutation set (
Figure 4), we introduce two statistical metrics [
15]. Let
and
denote the normalized distribution values at position
i of the etalon outline for the full permutation set and the test list, respectively, where
and
is the outline length. Normalization is performed by dividing the raw counts by the total number of keys in each case, so that the distributions are directly comparable regardless of sample size.
The first metric is the maximum absolute deviation , analogous to the Kolmogorov–Smirnov statistic. For , , the computed value is , indicating that the largest pointwise discrepancy between the two normalized distributions does not exceed .
The second metric is the root mean square error . The obtained value confirms that the average deviation across all outline positions is negligible.
These two metrics collectively confirm that the test list of 2390 permutations, comprising only about 0.066% of the full set, reproduces the distribution of bit inclusions with high fidelity. The property of equal element presence at every position of the r-sequence, ensured by Algorithm 2, provides the structural basis for this agreement. We therefore consider the test list to be a reliable proxy for the complete permutation set in subsequent experiments with $Q = 16$, where exhaustive enumeration of permutations is computationally prohibitive.
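The two agreement metrics are simple to compute once both distributions are normalized by their key counts. The sketch below uses the standard definitions of maximum absolute deviation and root mean square error; the function names and the four-position sample counts are hypothetical and serve only to show the calculation.

```python
import math


def normalize(raw_counts, total_keys):
    """Divide raw per-position counts by the number of keys."""
    return [count / total_keys for count in raw_counts]


def max_abs_deviation(p, q):
    """Largest pointwise gap between two normalized distributions."""
    return max(abs(a - b) for a, b in zip(p, q))


def rmse(p, q):
    """Root mean square error over all outline positions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)) / len(p))


# Hypothetical raw counts for a 4-position outline (illustration only).
full_set = normalize([40, 20, 20, 20], total_keys=100)
test_list = normalize([8, 5, 4, 3], total_keys=20)

deviation = max_abs_deviation(full_set, test_list)
error = rmse(full_set, test_list)
```

Note that the maximum deviation here is taken pointwise, whereas the Kolmogorov–Smirnov statistic proper compares cumulative distributions, which is why the text calls the metric analogous rather than identical.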
The requirements of the formulated rule were fulfilled in the configurations of digital etalons shown in Figure 6. Of the 15 points in Figure 2a, at $Q = 10$ (Figure 6a) 4 points have no dichotomous partition, 7 points yield subsets of cardinality 4, and 4 points yield subsets of cardinality 5. At $Q = 16$ (Figure 6b), the same 4 points have no partition, 8 points yield subsets of cardinality 8, and 3 points yield subsets of cardinality 7.
The proposed
rule formulates necessary but not sufficient conditions. Repeated adjustments are possible in some etalons. In the course of further planned research, an increase in the duration of the computational experiment cannot be ruled out. This led to a revision of the software implementation of the masking algorithm presented in [
1].
The new software implementation of the masking algorithm (see
Figure A1 in
Appendix A) employs recursive dichotomous partitioning and utilizes the BitVectorLib library [
16] for efficient bitwise operations. The implementation targets the .NET 10 platform. On the full set of permutations in the case of $Q = 10$, the original implementation [
17,
18] required approximately 38 min, whereas the new implementation completes the same task in 51 s in single-threaded mode—a roughly 45-fold acceleration achieved solely through algorithmic optimizations, including compact bit-vector representations and elimination of redundant memory allocations. Further acceleration is attainable by engaging the Parallel class of the .NET framework [
19]; however, a systematic study of parallel scalability is beyond the scope of this article and is planned as a separate investigation. The testing was carried out on a hardware platform with the following characteristics: Intel Core i5-9300H processor, 16 GB DDR4 RAM, Windows 11 operating system (10.0.22631.4460).
4. Results of the Computational Experiment
This section presents the results of experimental verification of the formulated rule for the proposed etalon configurations on the sets
and
with evaluation of the values of
M and enumeration of the complete and limited test lists of permutations, respectively. The value of
n seriously affects the volume of the transmitted message. Thus, the research is limited to the case
, which retains the advantages of associative protection [
20].
Case $Q = 10$: The experiment with the shapes in Figure 6a produced a spike at the upper left node of Figure 2a, whereas no spike appeared at the lower right node. We attribute this to the fact that the first division yields a subset of cardinality 4 at the former position, while at the latter it divides the complete set $\{0, 1, \ldots, 9\}$ exactly in half. To align the two situations, we had to change the representation of the digit 3 (Figure 7).
As a result, the desired uniform distribution was obtained (
Figure 8) with the value
.
Case $Q = 16$: The picture is similar to that observed in the case of $Q = 10$.
Figure 6b still maintains the spike in the upper left node of
Figure 2a. Note that during the first division, subsets with a cardinality of
,
are selected for this and the lower-right nodes in this case. The applied correction principle is the same as before: we adjust the cardinality of the selected subsets in the upper-left node to the level
assumed for the case
, correcting the representation of the digit 3 (
Figure 9).
This ensured the required uniform distribution (
Figure 10) with a value of
.
Practical significance of the obtained M values. The mathematical expectation M of stored etalon bits plays a dual role in the associative protection system, simultaneously affecting recognition reliability and steganographic concealment. We now discuss the practical implications of the values (for ) and (for ) obtained with the proposed configurations.
From the standpoint of recognition reliability, M represents the average number of bits per etalon that survive the masking process and are available for pattern matching during decoding. A higher M increases the probability of correct identification of each digit, since more reference bits are preserved in the container. However, this advantage is bounded: beyond a certain threshold, additional preserved bits yield diminishing returns in recognition accuracy while increasing the system’s vulnerability to detection. For the decimal case, the obtained value lies close to the theoretical minimum , meaning that each etalon retains on average only bits out of the 15 positions in the outline. This is sufficient for unambiguous identification of each of the $Q = 10$ digits (since $\log_2 10 \approx 3.32$ bits are needed in theory), while keeping the information footprint minimal. For the hexadecimal case, is likewise close to and comfortably exceeds the information-theoretic minimum of $\log_2 16 = 4$ bits required to distinguish 16 symbols.
From the standpoint of steganographic security, the critical parameter is not
M alone but the ratio
, which characterizes the degree to which the preserved bits are “diluted” within the pseudorandom container. For
and
, the container length is
bits per code. The total number of embedded bits per code is
, yielding embedding ratios of
for
and
for
. In both cases, the preserved information constitutes less than
of the container, making statistical detection by an adversary substantially more difficult [
3]. A lower
M improves this ratio, enhancing steganographic concealment; a higher
M degrades it.
The transition from to increases M by approximately (from to ), which moderately reduces the embedding ratio. However, this cost is offset by three significant gains: the key space expands from to (an increase by a factor of approximately ), the addressable name space grows from to , and the distribution uniformity is preserved. From an engineering perspective, the slight increase in M represents an acceptable trade-off.
Regarding acceptable thresholds, the lower bound is , below which the masking process cannot produce a valid key for all etalons. The upper bound is less rigid and depends on the application: for high-security scenarios where steganographic concealment is paramount, one should aim for M values as close to as possible; for scenarios prioritizing recognition robustness (e.g., in noisy channels), a moderately higher M may be preferable. The values obtained in this work—within of for and within for —fall well within the range where both security and reliability requirements are satisfied for typical applications.
Applicability boundaries and failure analysis of the proposed rule: As stated above, the proposed rule formulates necessary but not sufficient conditions for achieving a uniform distribution of bit inclusions. We now discuss in greater detail the practical boundaries of this rule, the conditions under which it may fail, and the validation steps that should accompany its application.
The rule requires that the cardinalities of the subsets formed during the first dichotomous partition at each nodal point be close to $Q/2$. However, this condition alone does not fully determine the distribution behaviour, because the subsequent recursive partitions may introduce secondary non-uniformities that are not captured by the first-level analysis. This is precisely the situation that arose with the initial configurations in Figure 6: the rule’s conditions were satisfied at all nodal points, yet a spike persisted at the upper-left node. The cause was traced to a subtle asymmetry: at the upper-left node the first partition produced a subset of cardinality 4 (for $Q = 10$), while at the lower-right node the partition divided the complete set exactly in half. The correction required modifying the representation of digit 3 (Figure 7 and Figure 9) to align the partition cardinalities at both nodes.
This example serves as a counterexample to the sufficiency of the rule in its original formulation and illustrates a general pattern: when two nodal points undergo first partitions with different subset cardinalities (even if both are individually within the range specified by the rule), the resulting distribution may still exhibit local non-uniformity. Therefore, a stricter interpretation of the rule should require not only that the cardinalities at each nodal point lie within the prescribed range, but also that the cardinalities be consistent across all nodal points of the same type.
More broadly, the rule may fail to guarantee uniformity in the following scenarios: (a) when the etalon configurations produce partition cardinalities that satisfy the range condition but differ significantly between nodal points, as demonstrated above; (b) when the number system base Q is such that $Q/2$ is not an integer (i.e., Q is odd), making exact halving impossible and requiring a relaxed cardinality criterion; or (c) when the etalon shapes introduce correlations between bit values at non-nodal positions that propagate through deeper recursion levels in ways not predicted by the first-partition analysis.
Given these limitations, the following post hoc validation procedure is recommended for any new set of etalon configurations:
1. Compute the distribution function of bit inclusions over the complete permutation set (for $Q = 10$) or the test permutation list (for $Q = 16$ and larger), using the masking algorithm described in Appendix A, and verify the absence of spikes at all nodal points.
2. Compare the obtained value of M with the theoretical minimum for the given Q to ensure that the reconfiguration has not significantly increased the average number of inclusions.
These two checks are computationally inexpensive relative to the design process itself and provide a practical safeguard against over-reliance on the heuristic rule.
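The two checks can be wrapped in a small helper once the distribution and M have been computed. The text does not give a numerical definition of a spike, so the sketch below assumes one: a nodal value exceeding a chosen multiple of the median of the distribution. That threshold, the tolerance on M, the function names, and all sample numbers are hypothetical.

```python
from statistics import median


def find_spikes(distribution, nodal_positions, factor=2.0):
    """Return nodal positions whose value exceeds factor * median."""
    baseline = median(distribution)
    return [
        position
        for position in nodal_positions
        if distribution[position] > factor * baseline
    ]


def validate(distribution, nodal_positions, m_observed, m_minimum,
             tolerance=0.1, factor=2.0):
    """Run both post hoc checks and report the outcome."""
    spikes = find_spikes(distribution, nodal_positions, factor)
    m_ok = m_observed <= m_minimum * (1 + tolerance)
    return {"spikes": spikes, "m_ok": m_ok, "passed": not spikes and m_ok}


# Hypothetical normalized distributions over a 6-position outline.
spiky = [0.9, 0.3, 0.3, 0.3, 0.8, 0.3]
flat = [0.3, 0.3, 0.3, 0.3, 0.3, 0.3]
report_spiky = validate(spiky, [0, 4], m_observed=3.6, m_minimum=3.4)
report_flat = validate(flat, [0, 4], m_observed=3.6, m_minimum=3.4)
```

A median baseline is used because a handful of spikes barely shifts it, whereas a mean would be pulled upward by the very values the check is meant to detect.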