Permutation-Based Block Code for Short Packet Communication Systems

This paper presents an approach to the construction of a block error-correcting code for data transmission systems with short packets. The need for such a code is driven by the information interaction between objects of a machine-type communication network with a dynamically changing structure and a unique system of commands or alerts for each network object. The codewords of the code are permutations with a given minimum pairwise Hamming distance. The purpose of the study is to develop a statistical method for constructing a code, in contrast to known algebraic methods, and to investigate the code size. An algorithm for generating codewords has been developed. It can be implemented both by enumeration of the full set of permutations and by enumeration of a given number of randomly selected permutations. We have experimentally determined the dependencies of the average and the maximum values of the code size on the size of the subset of permutations used for constructing the code. A technique for computing approximating quadratic polynomials for the determined code size dependencies has been developed. These polynomials and their corresponding curves estimate the size of a code generated from a subset of random permutations that is too large for a statistically significant experiment to be performed. The results of implementing the developed technique for constructing a code based on permutations of lengths 7 and 11 are presented. The relative prediction error of the code size did not exceed 0.72% for permutation length 11, code distance 9, random permutation subset size 50,000, and a statistical study range limited to 5040 permutations.


Introduction
The volume of information transfer including confidential information is continuously growing. According to the Statista Research Department report [1], over the next years up to 2025, global data creation is projected to grow to more than 180 zettabytes. A competitive environment has been created for designing and improving both attack systems and information security systems. These circumstances lead to an increase in the mathematical and logical complexity and degree of intellectualization of the used algorithms, processes, and technical means. As a result, the effectiveness and dependability [2] (reliability and security) of telecommunication systems and networks, as well as their components that implement data protection functions need to be improved.
Integrating methods of channel coding and cryptographic protection, or secure channel coding schemes, is one of the ways to increase the efficiency of information-processing
This study develops an approach using permutations.
The methodology of integrated-information security based on non-separable factorial coding [18,19] uses a subset of the set of permutations π of numbers {0; 1; . . . ; M − 1} as codewords. Each number in {0; 1; . . . ; M − 1} is encoded by a binary code with a fixed length of l_r = log_2 M bits. Such information conversion yields a non-standard and redundant frame structure that does not require a separate field for the syncword, maintains frame synchronization on the data signal, and allows the non-separable factorial code to be used as a transport mechanism in short packet communications [20-27]. The cost of including syncwords is not negligible in such systems [28-30]. Using a non-separable factorial code makes it possible to effectively search for frame boundaries even with a bit error rate close to 0.5, which is important for information transmission under conditions of strong noise [31,32]. In addition, non-separable factorial coding may be a suitable tool to implement a cross-layer integrated approach to security and achieve secure short-packet communication from the perspective of both cryptography and physical layer security [26,27].
Previous studies [33,34] investigate the ability of a non-separable factorial code to detect and correct communication channel errors. The efficiency of the code has been proven, which is achieved, among other factors, due to its synchronization properties [31,32]. The studies [33,34] use the binary Hamming distance between codewords.
In this paper, similarly to the error-correcting Reed-Solomon coding [35], we will consider symbols as elements of a codeword. This approach is of interest to ensure reliable transmission of permutations, in particular, for a three-pass cryptographic protocol based on permutations [36].
We introduce the following definition to distinguish between the binary Hamming distance used in previous studies [33,34] and the Hamming distance between permutations of symbols {0; 1; . . . ; M − 1}. Definition 1. The symbol Hamming distance D ij between two permutations π i and π j is the number of symbol positions in which permutations π i and π j are different.
It is obvious that D ij = D ji and 0 ≤ D ij ≤ M. In addition, D ij = 0 if and only if π i = π j . The symbol code distance D min is then the minimum value of D ij over all pairs of distinct codewords. Let N(M, D min ) be the (M, D min )-code size, equal to the number of its codewords. Since the code size N(M, D min ) determines the amount of information transmitted by each codeword, equal to log_2 (N(M, D min )) bits, the use of a (M, D min )-code of the maximum size is the most efficient in terms of channel capacity. The last statement also follows from the central problem of coding theory [37,38].
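Definition 1 and the code distance can be illustrated with a minimal Python sketch. The function names `symbol_hamming_distance` and `code_distance` are hypothetical labels chosen for this illustration, and permutations are assumed to be stored as tuples of the symbols 0, ..., M − 1.

```python
from itertools import combinations


def symbol_hamming_distance(p, q):
    """Number of symbol positions in which permutations p and q differ."""
    if len(p) != len(q):
        raise ValueError("permutations must have the same length")
    return sum(a != b for a, b in zip(p, q))


def code_distance(codewords):
    """Minimum pairwise symbol Hamming distance over distinct codewords."""
    return min(symbol_hamming_distance(p, q)
               for p, q in combinations(codewords, 2))


pi_i = (0, 1, 2, 3, 4)
pi_j = (1, 0, 2, 4, 3)
print(symbol_hamming_distance(pi_i, pi_j))  # 4 (only position 2 coincides)
```

Note that two distinct permutations of the same symbols always differ in at least two positions, so D ij = 1 never occurs.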
In the literature [39], the (M, D min )-codes are called error correcting permutation codes. These codes are used for error correction of powerline communications using M-ary frequency shift keying modulation [40].
In this study, in contrast to known algebraic methods, we present a statistical method for constructing a (M, D min )-code and estimating its size N(M, D min ). We also take into account the fact that the (M, D min )-code must be unique for each object in the dynamic wireless sensor network, and the code agreement between the participants of the information exchange process can take place by applying a cryptographic protocol [36]. In such conditions, increasing the variability and unpredictability of the codeword ensemble is a necessary key condition for ensuring the protocol strength.

Main Contributions
We will generate codewords for a (M, D min )-code by enumerating a set of permutations {π} of length M and selecting those permutations whose symbol Hamming distance to all previously selected permutations is no less than the D min value. Constructing a (M, D min )-code is complicated by the fact that, as M increases, it becomes practically impossible to generate all M! permutations.
The goal of the study is to determine the dependence of the code size N(M, D min ) on the values of M and D min when using the proposed statistical method.
To achieve this goal, the following tasks must be solved.
• A statistical algorithm to generate codewords for a (M, D min )-code must be developed and implemented.
• An analysis of the frequency distribution of the random value N(M, D min ) for a given number of implementations of the codeword generating algorithm must be performed. The distribution law for N(M, D min ) must be determined.
• The dependences of the average and the maximum (M, D min )-code size, and of its standard deviation, on the parameters M and D min must be explored.
• A technique to estimate a (M, D min )-code size depending on parameters M and D min must be developed and applied.

Paper Structure
This paper is organized as follows: Section 2 describes an algorithm to generate a set of codewords, analyses the dependence of the (M, D min )-code size on the values of M and D min , and presents a technique for constructing an approximation polynomial for the code size dependencies; Section 3 shows the results of implementing the developed technique for M = 11 and discusses the results; and Section 4 is the conclusion.

Algorithm to Generate Codewords
Figure 1 shows the algorithm to generate a set of codewords of a (M, D min )-code. Initially the set of codewords does not contain permutations. The initial complete set of M! permutations is generated randomly. The first permutation is selected and placed into the set of codewords being generated. Then the second permutation is selected and the Hamming distance to the first codeword is calculated for it. If the calculated distance is no less than the given D min value, the second permutation is also placed into the set of codewords. Otherwise, the next permutation from the initial set is selected. We continue the process of selecting permutations, calculating the Hamming distances to all selected codewords, and placing the permutation into the set of codewords if all calculated Hamming distances are no less than D min until all permutations of the initial set have been enumerated. After that, the number of permutations in the set of codewords is counted.
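The selection procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `symbol_distance` and `generate_code` are hypothetical, and the randomly ordered complete set is obtained here by shuffling the output of `itertools.permutations`, which is practical only for small M.

```python
import random
from itertools import permutations


def symbol_distance(p, q):
    """Symbol Hamming distance between two permutations."""
    return sum(a != b for a, b in zip(p, q))


def generate_code(m, d_min, rng=None):
    """Greedy selection of codewords from the randomly ordered full set."""
    rng = rng or random.Random()
    candidates = list(permutations(range(m)))  # all M! permutations
    rng.shuffle(candidates)                    # random enumeration order
    codewords = []
    for cand in candidates:
        # Accept the candidate only if it is far enough from every codeword
        if all(symbol_distance(cand, cw) >= d_min for cw in codewords):
            codewords.append(cand)
    return codewords


code = generate_code(5, 4, random.Random(1))
print(len(code))  # code size for this particular random order
```

Because the result depends on the enumeration order, repeated runs with different seeds give different code sizes, which is why the code size is treated as a random value.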
Constructing a complete set of M! permutations can be implemented both by generating them in a certain order, for example lexicographic, with subsequent mixing, and by using random factorial numbers in the range [0; M! − 1] and their bijective transformation into permutations. At the same time, storing the permutation numbers (or the corresponding factorial numbers) instead of the permutations reduces the required amount of memory; however, due to additional transformations, it leads to an increase in the time to generate and output a permutation.
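One possible bijection between the numbers in [0; M! − 1] and permutations is sketched below. It is an assumption made for illustration (a Lehmer-code style decoding of the factorial number system), and the source does not state that it uses exactly this mapping; the function names are hypothetical.

```python
from math import factorial


def int_to_factorial_digits(n, m):
    """Digits of n in the factorial number system, least significant first.

    Digit k (k = 0 .. m-1) lies in the range [0, k].
    """
    if not 0 <= n < factorial(m):
        raise ValueError("n must be in [0, M! - 1]")
    digits = []
    for radix in range(1, m + 1):
        n, d = divmod(n, radix)  # successive division by 1, 2, ..., m
        digits.append(d)
    return digits


def factorial_digits_to_permutation(digits):
    """Each digit, most significant first, picks a symbol from the pool."""
    pool = list(range(len(digits)))
    return tuple(pool.pop(d) for d in reversed(digits))


def number_to_permutation(n, m):
    return factorial_digits_to_permutation(int_to_factorial_digits(n, m))


print(number_to_permutation(0, 4))   # (0, 1, 2, 3)
print(number_to_permutation(23, 4))  # (3, 2, 1, 0)
```

With this mapping, storing a single integer per permutation is enough, at the cost of the conversion time mentioned above.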
To reduce the amount of memory required to store the full set of M! permutations, the initial set of permutations can be generated simultaneously with their analysis. In this case, in the algorithm of Figure 1, there is no block for generating the initial set of permutations, and the block for selecting the next permutation is replaced by a block for generating the next permutation (Figure 2). At the same time, the uniqueness check of the generated permutation is additionally implemented in the new block.
Table 1 shows estimates of the mathematical expectation N(M, D min ) and the standard deviation σ(N(M, D min )) of the code size, as well as its maximum value N max (M, D min ) obtained as a result of implementing the algorithm shown in Figure 1.
Let the null hypothesis state that the distribution of a random value N(7, 4) corresponds to a normal distribution. The use of Pearson's chi-squared test χ 2 [48] indicates that there is no reason to reject the null hypothesis with the achieved p-value (significance level) of 0.2768.
The normality of the distribution of a random value N(M, D min ) is also confirmed for M = 7 and D min = 5 (p-value = 0.6313). However, for M = 7 and D min = 6 the p-value is 0.0000, so the normality hypothesis is rejected in this case.
Note that as the value of M increases, the implementation of the algorithm shown in Figure 1 becomes more difficult, since the generation of a complete set of M! permutations requires a significant amount of memory and processor time (Figure 4). For example, storing M! = 39,916,800 permutations (M = 11) using a fixed-length binary code to encode permutation symbols requires 209.37 MB of memory; for M = 15 this amount is 8.92 TB. These calculations do not take into account the need to store service information.
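The memory figures above can be reproduced with the short calculation below. Two assumptions are made here that the text does not state explicitly: each symbol is encoded with the smallest whole number of bits that can represent M distinct values (4 bits for both M = 11 and M = 15), and MB and TB are read as binary units (2^20 and 2^40 bytes). The function name is hypothetical.

```python
from math import factorial


def full_set_storage_bytes(m):
    """Bytes needed to store all M! permutations with a fixed-length code."""
    bits_per_symbol = (m - 1).bit_length()  # assumed: whole bits per symbol
    total_bits = factorial(m) * m * bits_per_symbol
    return total_bits / 8


print(f"{full_set_storage_bytes(11) / 2**20:.2f} MB")  # 209.37 MB
print(f"{full_set_storage_bytes(15) / 2**40:.2f} TB")  # 8.92 TB
```

Under these assumptions the results agree with the quoted values, which suggests that the figures count only the packed symbol bits.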
If we add service information, then the memory amount required to form a complete set of permutations in the Python programming language [49] is 67 MB for M = 9, 667 MB for M = 10, 7.15 GB for M = 11, and 70 GB for M = 12. It is possible to somewhat reduce the amount of used memory by optimizing the program code. However, it is almost impossible to implement the algorithm shown in Figure 1 on a standard modern workstation when M ≥ 12.
The average time to generate one permutation was determined experimentally by generating 1,000,000 permutations of a given length M.
All experiments in this research were implemented in the Python programming language [49] using the PyCharm Community Edition 2020.3 [50] integrated development environment on a desktop personal computer with the following parameters:
Here, we provide the possibility to construct a (M, D min )-code for the values of M that do not allow generating M! permutations in practice.
The approach proposed in this study is based on the following. The algorithm to generate a set of codewords shown in Figure 1 is preserved. At the same time, the initial set of permutations is a proper subset of the complete set of M! permutations. The size of such a proper subset is denoted by N lim .

Algorithms to Generate the Initial Set of N lim Random Permutations
Permutations of the initial set will also be generated randomly. Here, we consider two algorithms:

1. Generating a random integer decimal number in the range [0; M! − 1], converting the decimal number into a factorial number using division operations [51], and then converting the factorial number into a permutation [51];
2. Randomly generating individual digits of a factorial number and converting the factorial number into a permutation.
Both of the above algorithms to generate the initial set of permutations control the uniqueness of permutations within the set (Figure 5). Note that the above algorithms to generate N lim permutations can also be applied in the block for generating the next permutation of the algorithm in Figure 2. In this case, the algorithms will output the permutation for analysis instead of writing it to the memory.
Comparing the speed of the two algorithms for generating random permutations shown in Figure 5, we evaluated the performance of only the distinctive parts of the presented algorithms, the procedures for generating factorial numbers. The average time to generate one factorial number (Figure 6) was calculated based on the results of the generation of 10,000 numbers.
The achieved graphs indicate that the time to generate a factorial number with the first algorithm (Figure 5) increases with an increase of the M value much faster than with the second algorithm. In addition, unlike the first algorithm, the processes for the second algorithm in Figure 5 are convenient for parallelization. This circumstance makes it possible to further increase the performance of the algorithm.
In this paper, we will use the second proposed algorithm to generate the initial set of N lim random permutations, N lim < M!.
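The second algorithm, together with the uniqueness control, might look like the following sketch. It assumes that digit k of a factorial number is drawn uniformly from [0, k] and that the digits are decoded by picking symbols from a shrinking pool; the function names and the use of a Python `set` for the uniqueness check are illustrative choices, not details taken from Figure 5.

```python
import random
from math import factorial


def random_factorial_digits(m, rng):
    """Draw each digit independently; digit k is uniform on [0, k]."""
    return [rng.randint(0, k) for k in range(m)]


def digits_to_permutation(digits):
    """Decode digits (most significant first) into a permutation."""
    pool = list(range(len(digits)))
    return tuple(pool.pop(d) for d in reversed(digits))


def random_permutation_subset(m, n_lim, rng=None):
    """Generate n_lim distinct random permutations of length m."""
    if n_lim > factorial(m):
        raise ValueError("n_lim cannot exceed M!")
    rng = rng or random.Random()
    seen = set()   # uniqueness control within the subset
    subset = []
    while len(subset) < n_lim:
        perm = digits_to_permutation(random_factorial_digits(m, rng))
        if perm not in seen:
            seen.add(perm)
            subset.append(perm)
    return subset


subset = random_permutation_subset(7, 500, random.Random(3))
print(len(subset), subset[0])
```

Since the digits are independent, each of them could be produced by a separate worker, which is one way to read the remark about parallelization.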

Dependence of the (M, D min )-Code Size on the Values of M, D min , and N lim
We will use (M, D min , N lim ) to denote block factorial code (M, D min ) formed by a subset of N lim random permutations, and will use N(M, D min , N lim ) to denote the size of (M, D min , N lim )-code.
Next, we determine the dependence of the size N(M, D min , N lim ) on the value of N lim . Such dependence can be used both to determine the required value of N lim when designing a data transmission system with a (M, D min )-code, and to evaluate the efficiency of the code constructed from N lim random permutations.
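An experiment of this kind can be sketched as below: for a fixed N lim, the code is constructed repeatedly and the average and maximum sizes are recorded. This is an illustration under assumptions. The candidate permutations are drawn with `random.Random.sample`, a swapped-in generator rather than the factorial-number algorithm, and the function names and the small parameter values in the example are hypothetical.

```python
import random
from statistics import mean


def symbol_distance(p, q):
    return sum(a != b for a, b in zip(p, q))


def random_subset(m, n_lim, rng):
    """n_lim distinct random permutations of length m (requires n_lim <= M!)."""
    seen = set()
    while len(seen) < n_lim:
        seen.add(tuple(rng.sample(range(m), m)))
    subset = list(seen)
    rng.shuffle(subset)  # random enumeration order
    return subset


def greedy_code_size(candidates, d_min):
    """Size of the code selected greedily from the candidate list."""
    codewords = []
    for cand in candidates:
        if all(symbol_distance(cand, cw) >= d_min for cw in codewords):
            codewords.append(cand)
    return len(codewords)


def estimate_code_size(m, d_min, n_lim, runs, seed=0):
    """Average and maximum code size over the given number of runs."""
    rng = random.Random(seed)
    sizes = [greedy_code_size(random_subset(m, n_lim, rng), d_min)
             for _ in range(runs)]
    return mean(sizes), max(sizes)


avg, largest = estimate_code_size(m=7, d_min=6, n_lim=50, runs=20, seed=1)
print(avg, largest)
```

Repeating the call for a series of N lim values gives the points from which the dependence on N lim can be plotted.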
We will determine the dependence N(M, D min , N lim ) experimentally. In this case, the N lim values are formed as follows.
[Table: Coefficients of quadratic approximation polynomials for N(7, D min , N lim ) and N max (7, D min , N lim ).]
Note that x = T − t + 1 according to Equation (2). Then the approximation functions can be easily calculated by setting the values of t for the required N lim .

Technique for Constructing an Approximation Polynomial
To construct approximations for dependencies N(M, D min , N lim ) and N max (M, D min , N lim ) and, if necessary, to perform extrapolation to predict the behaviour of these functions at N lim values exceeding the upper limit of the range of their statistical study, it is necessary to perform the next steps:
2. To generate dependencies N(M, D min , N lim ) and N max (M, D min , N lim ) for the range of N lim values determined in accordance with (2);
3. To determine approximation polynomials for N(M, D min , N lim ) and N max (M, D min , N lim ).
It is also possible to select the values of N lim for constructing dependencies N(M, D min , N lim ) and N max (M, D min , N lim ) in the opposite direction with respect to (1), from the smallest to the largest. In this case, the method to obtain approximations is as follows:
2. Dependences N(M, D min , N lim ) and N max (M, D min , N lim ) are also formed for the range of N lim values determined in accordance with (3);
3. Quadratic approximation polynomials are calculated for N(M, D min , N lim ) and N max (M, D min , N lim ).
To obtain approximation polynomials of the form y = at^2 + bt + c from the obtained expressions of the form y = ax^2 + bx + c, it is necessary to perform the replacement x = t + 1.
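As a worked step, the replacement x = t + 1 can be expanded explicitly. The primed symbols a', b', c' are labels introduced here only to distinguish the coefficients in t from the coefficients a, b, c in x; the text reuses the same letters for both forms.

```latex
\begin{aligned}
y &= a x^{2} + b x + c, \qquad x = t + 1,\\
y &= a (t + 1)^{2} + b (t + 1) + c\\
  &= a t^{2} + (2a + b)\, t + (a + b + c),\\
a' &= a, \qquad b' = 2a + b, \qquad c' = a + b + c .
\end{aligned}
```

So the leading coefficient is unchanged, and only the linear and constant terms need to be recomputed.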

Results
Here, we apply the developed method for M = 11 when it is necessary to predict the average and the maximum number of codewords with D min = 9 formed by N lim = 30,000 and N lim = 50,000 random permutations.
To construct approximations, we use the N lim values given in Table 2 for the step ∆ = 0.04 at t 0 = ln(ln 7!) = 2.1430. Figure 9 shows the graphs of the estimates of the mathematical expectation N (11, 9, N lim ) and the maximum value N max (11, 9, N lim ) of the code size (11, 9, N lim ) against the value N lim = {13; 16; 20; . . . ; 2617; 5040}. Each value of N (11, 9, N lim ) and N max (11, 9, N lim ) was formed as a result of 10,000 experiments.
[Figure 9: estimates of N(11, 9, N lim ) and N max (11, 9, N lim ) when ∆ = 0.04.]
Here, we determine the confidence interval for the obtained sample means [54]:

$$N_{\mathrm{sample}}(M, D_{\min}, N_{\lim}) \pm t_{\alpha/2,\,K-1} \cdot \frac{s}{\sqrt{K}},$$

where $N_{\mathrm{sample}}(M, D_{\min}, N_{\lim})$ is the sample mean; $s = \sqrt{\frac{K}{K-1}} \cdot \sigma_{\mathrm{sample}}$ is the corrected sample standard deviation; $\sigma_{\mathrm{sample}}$ is the sample standard deviation; $K$ is the number of experiments ($K = 30$); and $t_{\alpha/2,\,K-1}$ is the upper $\alpha/2$ quantile of Student's t-distribution with $K - 1$ degrees of freedom.
The predicted values of 73.3836 and 77.1631 fall within the indicated confidence intervals.
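A confidence interval of this kind can be computed as in the sketch below, which assumes the usual Student's t interval, mean ± t · s / √K, built from the corrected sample standard deviation. The critical value is passed in by the caller because the Python standard library has no t-quantile function (for K = 30 and a 95% confidence level it is approximately 2.045). The sample values in the example are hypothetical, not data from the paper.

```python
from math import sqrt
from statistics import mean, pstdev, stdev


def confidence_interval(samples, t_crit):
    """Two-sided interval for the mean: mean +/- t_crit * s / sqrt(K)."""
    k = len(samples)
    centre = mean(samples)
    s = stdev(samples)  # corrected standard deviation (divides by K - 1)
    half_width = t_crit * s / sqrt(k)
    return centre - half_width, centre + half_width


# Hypothetical code sizes from K = 5 runs; t(0.025, 4 d.o.f.) is about 2.776
sizes = [72, 74, 73, 75, 71]
low, high = confidence_interval(sizes, t_crit=2.776)
print(low, high)

# The corrected deviation equals sqrt(K / (K - 1)) times the uncorrected one
k = len(sizes)
print(abs(stdev(sizes) - sqrt(k / (k - 1)) * pstdev(sizes)) < 1e-12)  # True
```

A predicted value is then considered consistent with the experiment when it lies between the two returned bounds.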
Here, we calculate and present in Table 6 the relative prediction error for the values given in Table 5. We assume that the maximum number of reference points (T = 30) forms the most accurate prediction. The results in Table 6 indicate that three points as far as possible from each other (T = 2) are sufficient to obtain an approximation curve (an approximation polynomial of the second degree). At the same time, the authors recommend using four points (T = 3) to construct such a curve.
Here, we discuss the study results.
The proposed algorithm to generate codewords achieves the required technical result of constructing a (M, D min )-code with the required code distance. However, the obtained N(M, D min ) values do not reach the known lower bounds [39,41-47]. For example, N max (11, 9, 5040) = 67, whereas paper [39] gives the lower bound of 154 for the (11, 9)-code size. The corresponding values for M = 7 are N max (7, 6, 5040) = 19 vs. the lower bound of 42 and N max (7, 5, 5040) = 57 vs. the lower bound of 77 in [39]. However, we cannot say that the result is negative. First, in this study, we used not an algebraic but a statistical method for code construction. Second, the proposed statistical method, unlike the algebraic method, allows for the construction of a unique system of commands or alerts for dynamic wireless sensor network objects. Note also that increasing the (M, D min )-code size may lead to a decrease in the number of different possible (M, D min )-codes that can be constructed for the defined values of M, D min , and N lim . In turn, the number of different possible (M, D min )-codes is important for applying the (M, D min )-code both in secure-channel coding schemes and for constructing a unique system of commands or alerts for MTC objects. At the same time, we do not deny the need to continue the search for new effective and fast statistical methods for (M, D min )-code construction or to improve the proposed method. Determining the balance between the (M, D min )-code size and the number of possible different (M, D min )-codes is a relevant open problem that can be the subject of further research.
The study has shown that the relative error in predicting the size of (M, D min )-code increases with increasing the hypothetical number N lim of permutations in the initial set, as well as with increasing the step ∆. However, the nature of this dependence is not obvious and can be further investigated.

Conclusions
In this paper, we have developed and implemented a statistical algorithm to generate codewords of a (M, D min )-code by enumerating a set of permutations {π} of length M and selecting those permutations whose symbol Hamming distance to all previously selected codewords is no less than the D min value.
We applied two algorithms to generate a random factorial number. The first algorithm is based on the conversion from a random decimal number by division, and the second algorithm is based on the random generation of individual digits of a factorial number. We found that the second method is faster.
We have determined experimentally the dependences of the average and the maximum values of the size N(M, D min , N lim ) of a (M, D min )-code constructed from a subset of N lim permutations, on the value of N lim .
A technique to compute approximation quadratic polynomials for the determined dependences of the average and the maximum values of the (M, D min )-code size has been developed. A key feature of this technique is to use the function (1) of a double logarithm ln(ln N lim ) and to use a quadratic polynomial. The approximation polynomials and their corresponding curves can be used to extrapolate the dependencies and predict their behavior at N lim values exceeding the upper limit of their statistical study range.
Finally, we confirmed the effectiveness of the developed technique to estimate the average and the maximum size values N(11, 9, N lim ) for N lim = 30,000 and N lim = 50,000 at the upper limit of the statistical study range N lim = 5040. The relative prediction error of the (M, D min )-code size did not exceed the value of 0.72% obtained for N lim = 50,000 and ∆ = 0.44.
Data Availability Statement: All initial data and program codes will be provided upon request to the corresponding author's e-mail with appropriate justification.