A New Transformation Technique for Reducing Information Entropy: A Case Study on Greyscale Raster Images

This paper proposes a new string transformation technique called Move with Interleaving (MwI). Four possible ways of rearranging 2D raster images into 1D sequences of values are applied: the scan-line, left-right, strip-based, and Hilbert arrangements. Experiments on 32 benchmark greyscale raster images of various resolutions demonstrated that the proposed transformation reduces information entropy to a similar extent as the combinations of the Burrows–Wheeler Transform followed by either Move-To-Front or Inversion Frequencies. The proposed MwI transformation yields the best result among all the considered transformations when the Hilbert arrangement is applied.


Introduction
Information entropy is a measure of the uncertainty in data [1]. Data with lower entropy have reduced diversity and, consequently, are more predictable. The concept was introduced by Shannon [2]. It finds applications in various disciplines, including computer science [3], mathematics [4], chemistry [5], mechanics [6], and statistics [7]. In computer science, we are often interested in determining the minimum number of bits required to encode a message X, where X = x_i is a sequence of symbols from the alphabet Σ_X = {x_i}. Each symbol x_i ∈ Σ_X is assigned a probability p_i, which is calculated as the ratio of the number of occurrences of x_i in X to the number of all symbols in X. Shannon's information entropy is calculated with Equation (1), and provides a lower bound on the average number of bits required to represent the symbols x_i ∈ Σ_X:

H(X) = − Σ_{x_i ∈ Σ_X} p_i log2 p_i. (1)

The entropy is strongly related to the efficiency of various compression algorithms; lower entropy leads to better compression [8,9]. However, there are known techniques that can influence the information entropy of X [10], including predictions and transformations. This paper initially considers three such transformation techniques: Move-To-Front, Inversion Frequencies, and the Burrows-Wheeler Transform. A new transformation technique is proposed later.

This paper is divided into five sections. Section 2 provides a brief explanation of Move-To-Front, Inversion Frequencies, and the Burrows-Wheeler Transform. Section 3 introduces the new transformation method, named Move-with-Interleaving (MwI), and discusses various possibilities for arranging the data from raster images into sequences X, which are then transformed. The results of applying the considered transformations on 32 benchmark greyscale raster images are presented in Section 4. Section 5 concludes the paper.
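As an illustration, the entropy of Equation (1) can be computed directly from symbol frequencies. The following sketch (the helper name `entropy` is ours) reproduces the value H(X) = 1.806 for the sequence X = barbara|barbara used in the examples of Section 2:

```python
import math
from collections import Counter

def entropy(X):
    """Shannon entropy H(X) in bits per symbol, Equation (1)."""
    counts = Counter(X)
    n = len(X)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# The sequence used in the running examples of Section 2:
print(round(entropy("barbara|barbara"), 3))  # → 1.806
```

A sequence consisting of a single repeated symbol has entropy 0, the lower bound.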

Move-To-Front Transform
The Move-To-Front (MTF) transformation was introduced independently by Ryabko [11] and, shortly thereafter, by Bentley et al. [14]. It is based on one of the self-organising data structures [20].
MTF changes the domain from Σ_X to the set of natural numbers including 0. The lengths of the sequences X and Y remain the same, i.e., |X| = |Y|. MTF utilises a list L with random access, and operates through the following steps:
- Find the index l in L where x_i is located;
- Send l to Y;
- Increment the positions of all x_k, where 0 ≤ k < l;
- Move x_i to the front of L.

Let us consider an example in Table 1, where X = barbara|barbara, Σ_X = {a, b, r, |}, and H(X) = 1.806. MTF transforms X into Y = 1, 1, 2, 2, 2, 2, 1, 3, 3, 2, 3, 2, 2, 2, 1, with H(Y) = 1.457. In this example, Y contains fewer distinct symbols than |Σ_X|, although this is not always the case.

MTF reduces the information entropy in data by revealing local correlations. In fact, runs of the same symbol are transformed into runs of 0, alternating pairs of symbols into runs of 1, alternating triplets into runs of 2, and so on. In some cases, repeated MTF transformations further reduce H [21].
The Inverse Move-To-Front (IMTF) transform is straightforward. The input consists of the sequence of indices Y = y_i and the alphabet Σ_X. The list L is initialised in the same manner as in the MTF case (see Table 1). After that, the indices l = y_i are taken one by one from Y. The symbol x_i at index l in L is read and sent to X. L is then rearranged in the same way as during the transformation.
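The steps above can be sketched in a few lines (a minimal, unoptimised sketch; the function names are ours, not from the paper):

```python
def mtf(X, alphabet):
    """Move-To-Front: transform symbols into indices of a self-organising list."""
    L = list(alphabet)
    Y = []
    for x in X:
        l = L.index(x)          # find the index of x in L
        Y.append(l)             # send l to the output
        L.insert(0, L.pop(l))   # move x to the front of L
    return Y

def imtf(Y, alphabet):
    """Inverse MTF: rebuild X from the indices, updating L in the same way."""
    L = list(alphabet)
    X = []
    for l in Y:
        x = L[l]
        X.append(x)
        L.insert(0, L.pop(l))
    return "".join(X)

X = "barbara|barbara"
Y = mtf(X, "abr|")
print(Y)        # → [1, 1, 2, 2, 2, 2, 1, 3, 3, 2, 3, 2, 2, 2, 1]
assert imtf(Y, "abr|") == X
```

The linear `L.index` search makes each step O(|Σ_X|); it matches the worst-case analysis in Section 3.1.1.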

Inversion Frequencies
The Inversion Frequencies (IF) transformation was proposed by Arnavut and Magliveras [12,22]. IF accepts X = x_i as input, where x_i is from the alphabet Σ_X, and transforms it into Y = y_i. Similarly to MTF, IF transforms the input symbols into the domain of natural numbers, but this time the upper limit is |X| instead of |Σ_X|, as in the case of MTF. Of course, not all elements from Σ_Y need to be present in Y.
For each x_i ∈ Σ_X, IF stores the position (i.e., an index) of its first appearance in X, and calculates an offset for all subsequent occurrences of x_i. However, all symbols x_j ∈ Σ_X, 0 ≤ j < i, that have already been processed up to this point, are skipped over. The partial results for each x_i are stored in auxiliary sequences A_{x_i}, which are merged into Y at the end.
Let us transform X = barbara|barbara with IF, where Σ_X = {a, b, r, |}. The first 'a' is located at position 1 in X and, therefore, the first entry in A_a is 1. To reach the next 'a', two symbols ('r' and 'b') have to be skipped, and therefore, the next entry in A_a is 2. The remaining entries in A_a are obtained using the same principle, giving A_a = 1, 2, 1, 2, 2, 1. Next, the first 'b' is located at index 0. Two symbols ('a' and 'r') appear before the next 'b'; however, 'a' was already processed, giving the offset 1 and, eventually, A_b = 0, 1, 2, 1. The first appearance of 'r' in X is at position 2; however, as 'b' and 'a' were already processed, they are skipped, and therefore, the first entry in A_r is 0, with A_r = 0, 0, 1, 0 in total. Finally, A_| = 0. All the auxiliary sequences are then merged into Y = 1, 2, 1, 2, 2, 1, 0, 1, 2, 1, 0, 0, 1, 0, 0, with H(Y) = 1.566. As expected, the values in the auxiliary sequences gradually become smaller, with all entries being zero for the last symbol.
The Inverse Inversion Frequencies (IIF) transformation requires, in addition to Y and Σ_X, information about the lengths of the auxiliary sequences, i.e., the frequencies of the symbols in X. In our example, F = 6, 4, 4, 1. However, F can be avoided by the introduction of a guard, which must not be an element of Σ_X. The guard then separates the elements of the auxiliary sequences. As the last auxiliary sequence contains only zeros, it can be omitted. If the guard is −1, then Y = 1, 2, 1, 2, 2, 1, −1, 0, 1, 2, 1, −1, 0, 0, 1, 0, −1. When the number of occurrences of the last symbol exceeds the number of inserted guards, |Y| < |X|, which could be advantageous, for example, for compression.
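The construction of the auxiliary sequences can be sketched as follows (a direct, unoptimised reading of the description above; the function name is ours):

```python
def inversion_frequencies(X, alphabet):
    """IF transform: per symbol, record offsets that skip already-processed symbols."""
    used, Y = set(), []
    for sym in alphabet:
        skipped = 0
        for c in X:
            if c == sym:
                Y.append(skipped)   # offset accumulated since the previous occurrence
                skipped = 0
            elif c not in used:
                skipped += 1        # count only not-yet-processed symbols
        used.add(sym)               # sym is skipped over from now on
    return Y

Y = inversion_frequencies("barbara|barbara", "abr|")
print(Y)  # → [1, 2, 1, 2, 2, 1, 0, 1, 2, 1, 0, 0, 1, 0, 0]
```

Each symbol of Σ_X triggers a full pass over X, which matches the O(|Σ_X| · |X|) bound in Section 3.1.1.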

Burrows-Wheeler Transform
One idea for transforming X could be to generate all possible permutations of X, and then select the one with the highest local correlations. The ordinal number of this permutation would be stored to reproduce X. Unfortunately, the number of permutations grows exponentially with |X|, and this approach is, therefore, not applicable in practice. However, one of the permutations is obtained by sorting. The sorted sequence offers many good properties; among others, the local correlations are also emphasised. Unfortunately, an inverse transformation, which would convert the sorted sequence back into its unsorted source, is not known. The Burrows-Wheeler Transform (BWT), one of the most surprising algorithms in computer science [23], constructs a permutation of X in which the same symbols tend to be close together. In addition, only O(1) additional information is needed to restore X. The transformation, as suggested by Burrows and Wheeler [18], consists of four steps:
1. Generating |X| permutations of X through rotational shift-right operations;
2. Sorting the generated permutations lexicographically;
3. Reading the BWT(X) from the last column of the sorted permutations;
4. Determining the position of X in the sorted array of permutations. This position is essential for reconstruction, and is referred to as the BWT index, i_BWT.

The construction of BWT for X = barbara|barbara is shown in Table 2. The majority of the same symbols are placed together in the obtained result Y = rbbbbrrr|aaaaaa. The position of X, i_BWT = 9, is stored, as it is needed to reconstruct X from Y. The first column C of the sorted array of permutations is obtained from Y straightforwardly by sorting (see Table 3). The first symbol is easily obtained from C, as it is pointed to by i_BWT = 9: C_9 = 'b' and X = b. This symbol is the fourth 'b' in C and, therefore, it can be found in Y at position 4. C_4 = 'a' is appended, giving X = ba. The found symbol was the fifth 'a' in C, so the fifth 'a' is searched for in Y. It is found at position 13, where the corresponding C_13 = 'r' is appended, giving X = bar. The process continues until the index i_BWT is reached again.
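The four steps and the reconstruction can be sketched naively (this is the rotation-based construction; the linear-time suffix-array constructions mentioned later are omitted, and the function names are ours):

```python
def bwt(X):
    """Burrows-Wheeler Transform via sorted rotations; returns (Y, i_BWT)."""
    rotations = sorted(X[i:] + X[:i] for i in range(len(X)))
    Y = "".join(r[-1] for r in rotations)   # last column of the sorted permutations
    return Y, rotations.index(X)            # i_BWT: position of X itself

def ibwt(Y, i_bwt):
    """Inverse BWT: stably sorting Y links each sorted row to its successor."""
    T = sorted(range(len(Y)), key=lambda i: Y[i])  # stable sort keeps equal-symbol order
    out, j = [], i_bwt
    for _ in range(len(Y)):
        j = T[j]
        out.append(Y[j])
    return "".join(out)

Y, i_bwt = bwt("barbara|barbara")
print(Y, i_bwt)                             # → rbbbbrrr|aaaaaa 9
assert ibwt(Y, i_bwt) == "barbara|barbara"
```

The stable sort in `ibwt` plays the role of the first column C in the walkthrough above: the k-th occurrence of a symbol in C corresponds to its k-th occurrence in Y.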

Materials and Methods
Continuous-tone greyscale raster images (i.e., photographs) are used in our study, and therefore, the new transformation technique, introduced in Section 3.1, is designed accordingly. Various transformations are commonly applied to images, with the Discrete Cosine Transform (DCT) and the Discrete Wavelet Transform (DWT) among the most widely used. These transformations are most frequently used for the spectral analysis of the image, where the higher frequencies are quantised for lossy compression. A rare exception is JPEG 2000 with the LGT 5/3 wavelet, which enables lossless compression. However, previous studies demonstrate higher compression ratios for the more advanced prediction-based encoders for lossless compression, such as JPEG-LS [24,25], FLIF, and JPEG XL lossless [26]. Prediction methods are more commonly used for reducing information entropy for lossless image compression [27]. However, these methods are domain-dependent, whereas the proposed transformation methods are general.
Images are 2D structures that should be rearranged into 1D sequences to apply the aforementioned transformation techniques.However, this rearrangement can be performed in different ways, as discussed in Section 3.2.

Move with Interleaving
Let I be a greyscale raster image consisting of pixels p_{x,y}, 0 ≤ x < N, 0 ≤ y < M, where N × M is the image resolution and p_{x,y} ∈ [0, 255]. The pixels are arranged in a sequence X = x_i, x_i ∈ [0, 255], |X| = N × M, on which the transformation is applied in order to reduce the information entropy.
Neighbouring pixels p_{x,y} ∈ I, and therefore also consecutive symbols x_i ∈ X, often reveal local correlations. However, in the majority of cases in photographs, these correlations do not manifest as sequences of the same values or repeated patterns; instead, the values tend to be just similar within a certain tolerance δ. It can be assumed that the suitable value of δ depends on the specific image, and therefore, it is determined experimentally in Section 4. The values of x_i change significantly only when the scene in I changes drastically (for example, during the transition from a branch of a tree to the image background [28]). In such a case, MTF (see Section 2.1) would issue a long sequence of large indices before enough similar values are brought near to the beginning of the list. Unfortunately, this means that the values in Y would be considerably dispersed, which is undesirable from the information entropy perspective.
The proposed transformation is based on the idea of MTF and, therefore, utilises a list L with random access. Let x_i ∈ X represent the value to be transformed. The position l of x_i in L is found, and l is sent to Y. The updating of L then operates in two modes:

Mode 1: MTF is applied when l ≤ δ.

Mode 2: When l > δ, a temporary array T is filled with 2 · δ interleaved values, starting with x_i. The values from T are then inserted at the front of L, shifting all the remaining values in L accordingly. This is why we named this transformation Move with Interleaving (MwI).
The function InitialiseL in Line 4 of Algorithm 1 obtains the first element from X (x_0 = 7) and δ = 3 as input, and populates the list L. The first element in L becomes x_0, while the 2 · δ elements from the interval [7 − 3, 7 + 3] are interleaved around this value. The remaining elements of L are then filled from the smallest to the largest possible value of the interval [0, 15], according to the alphabetical order in Σ_X. x_0 also becomes the first element in Y (see Line 5) to enable the same initialisation of L during the decoding phase.

The inverse MwI transformation (IMwI) is shown in Algorithm 2. As can be seen, it completely mimics the transformation procedure. The first element in Y represents the absolute value of x_0, and it is obtained in Line 3. x_0 is utilised to populate the list L in Line 4, and is appended to the output sequence X in Line 5. All other elements in Y are processed within the for-loop starting in Line 6. The position l is obtained from Y (Line 7), the value v is retrieved from L at position l (Line 8), and stored in X in Line 9. After that, the algorithm evaluates l with regard to δ, and either applies MTF (Line 11) or resets the content of L in Lines 13 and 14. When all the indices from Y have been processed, the reconstructed values are returned in Line 17.

Algorithm 2 Inverse MwI transformation
 1: function IMwI(Y, δ)              Y: input sequence of indices; δ: tolerance
 2:                                  Returns the restored sequence X
 3:     x_0 = Y_0                    The first entry in X is Y_0
 4:     L = InitialiseL(x_0, δ)      Initialisation of L according to the first element
 5:     AddToX(X, x_0)               x_0 is sent to the reconstructed sequence X
 6:     for i ← 1 to |Y| − 1 do      For all other y_i
 7:         l = Y_i                  Obtain the position l
 8:         v = GetValue(L, l)       Retrieve the value at position l in L
 9:         AddToX(X, v)             Store the value in X
10:         if l ≤ δ then            Mode 1
11:             L = MoveToFront(L, v)
12:         else                     Mode 2
13:             T = FillT(v, δ)      Fill temporary sequence T
14:             L = ModifyL(L, T)    Place the symbols from T in front of L
15:         end if
16:     end for
17:     return X                     Returns the restored sequence
18: end function

Time Complexity Estimation
The worst-case time complexity analysis for the considered transformation techniques is performed in this subsection.

MTF: In the worst-case scenario, the last element of L must always be moved to the front. There are |Σ_X| elements in L and, consequently, T_MTF(X) = O(|X| · |Σ_X|).

IF: For each x_i ∈ Σ_X, all the elements of X are visited, resulting in T_IF(X) = O(|Σ_X| · |X|).

BWT: The algorithm presented in Section 2.3 has, unfortunately, a time complexity of O(|X|^2 log |X|), which limits its practical use for longer sequences. Later, it was shown that BWT can be constructed from the suffix array in linear time [23], and since there are known algorithms for constructing the suffix array in linear time [29,30], BWT itself can be obtained in T_BWT(X) = O(|X|) time.

BWT+MTF: Based on the above analysis, the combination of BWT followed by MTF works in T_BWT+MTF(X) = T_BWT(X) + T_MTF(X) = O(|X|), as |Σ_X| is a constant.

BWT+IF: Similarly, the combination of BWT followed by IF operates in T_BWT+IF(X) = T_BWT(X) + T_IF(X) = O(|X|).

MwI: MwI operates in two modes. In mode 1, MTF is applied and, therefore, at most |Σ_X| elements are moved per symbol. In mode 2, the algorithm performs two tasks: firstly, it fills the auxiliary sequence T with 2 · δ values and, secondly, it inserts T at the front of L, which requires at most |Σ_X| operations. As δ < |Σ_X|, the total worst-case time complexity is T_MwI(X) = O(|X| · |Σ_X|).

Rearranging Raster Data in the Sequence
Images are typically rearranged into sequences using the Scan-line order, which is a heritage of television (see Figure 1a). Three other possibilities are used in our experiments: the Left-right, the Strip, and the Hilbert arrangements.
The Strip arrangement requires a user-defined parameter h for the width of the strip; its value is evaluated in Section 4. A well-established approach for transforming multidimensional data into a one-dimensional form is the use of space-filling curves.
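The first three arrangements can be sketched as pixel-visiting orders over an N × M image. The paper defines them only in Figure 1, so the details here are assumptions: Left-right is assumed to be a serpentine traversal alternating row direction, and Strip to traverse bands of h rows column by column:

```python
def scan_line(N, M):
    """Row by row, always left to right."""
    return [(x, y) for y in range(M) for x in range(N)]

def left_right(N, M):
    """Assumed serpentine order: every other row is traversed right to left."""
    order = []
    for y in range(M):
        xs = range(N) if y % 2 == 0 else range(N - 1, -1, -1)
        order.extend((x, y) for x in xs)
    return order

def strip(N, M, h):
    """Assumed strip order: bands of h rows, each band read column by column."""
    order = []
    for y0 in range(0, M, h):
        for x in range(N):
            for y in range(y0, min(y0 + h, M)):
                order.append((x, y))
    return order

# Every arrangement must visit each pixel exactly once:
for order in (scan_line(6, 4), left_right(6, 4), strip(6, 4, 2)):
    assert sorted(order) == sorted((x, y) for y in range(4) for x in range(6))
```

Whichever order is chosen, the sequence X is obtained by reading the pixel values along the returned coordinate list.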
The Hilbert curve [31] has frequently been applied to images [10,32,33]. An implementation based on the state diagram [34] was used for the mapping between 2D images and 1D sequences, and vice versa. The complete Hilbert curve can only be constructed on images whose resolutions are powers of 2 in both directions. However, images of other resolutions are quite common; therefore, the Hilbert curve is cut off accordingly, as shown in Figure 1d.
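Our experiments used the state-diagram implementation [34]; an equivalent, compact alternative is the well-known bit-manipulation mapping from a curve index d to coordinates (x, y) on a 2^k × 2^k grid (for other resolutions the curve is clipped, as described above):

```python
def d2xy(n, d):
    """Map index d along the Hilbert curve to (x, y); n is the grid side, a power of 2."""
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:                      # rotate/reflect the sub-quadrant
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx                      # add the quadrant offset
        y += s * ry
        t //= 4
        s *= 2
    return x, y

# The mapping is a bijection, and consecutive indices are 4-neighbours:
pts = [d2xy(8, d) for d in range(64)]
assert sorted(pts) == [(x, y) for x in range(8) for y in range(8)]
assert all(abs(a - c) + abs(b - d) == 1 for (a, b), (c, d) in zip(pts, pts[1:]))
```

The neighbourhood property asserted above is exactly what makes the Hilbert arrangement preserve local pixel correlations in the sequence X.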

Experiments
Figure 2 shows the 32 benchmark 8-bit greyscale images used in the experiments. Table 4 gives the resolutions of these images in the second column, and their information entropies in the third.
The information about the proposed MwI transformation is given in the fourth and fifth columns: firstly, the best values of δ, and secondly, the achieved information entropies. On average, the best value of δ is 12; however, δ = 11 was used for the further experiments, since 17 out of the 32 images achieved the best reduction in entropy with δ < 12. The decrease in information entropy is shown in columns 6, 7, and 8 of Table 4 for MTF, IF, and MwI, respectively. MwI considerably outperformed MTF and IF.
BWT was applied before MTF, IF, and MwI in the last three columns of Table 4. BWT had a considerably positive effect on MTF and IF, but not on MwI; MwI, with its transformation mechanism, is capable of entirely replacing BWT. The last row of Table 4 shows the ranks achieved by all the considered transformations: BWT followed by IF was in first place, MwI in second, and BWT followed by MTF in third, while BWT followed by MwI, IF alone, and MTF alone were in the fourth, fifth, and sixth places, respectively.
Figure 2. The benchmark images: (1) Baboon, (2) Balloons, (3) Barb, (4) Barbara, (5) Bark, (6) Board, (7) Boats, (8) Cameraman, (9) Earth, (10) Flower, (11) Flowers, (12) Fruits, (13) Girl, (14) Gold, (15) Lena, (16) Malamute, …

Table 5 presents the average entropy of all 32 benchmark images when different pixel arrangements were used to obtain the sequence X. The results are quite intriguing and deserve further analysis. For example, MTF significantly benefited from the Strip order, but performed poorly with the Scan-line and Left-right orders. The same pattern also applies to IF. On the other hand, the effect of the arrangement type was reduced when BWT was used in front of MTF or IF. Moreover, BWT followed by MTF or IF performed best when the Scan-line order was applied. The pipeline of BWT followed by MwI yielded worse results than using MwI alone; therefore, it can be concluded that MwI efficiently replaces BWT. It can also be observed that MwI was not very sensitive to the data arrangements. However, the Hilbert arrangement was the most suitable, as, in this case, MwI achieved the best result among all the tested transformations and data arrangements.

Besides the formal analysis provided in Section 3.1.1, it is even more important to consider how efficient the algorithm is in practice. Table 6 shows the CPU time spent by three techniques, all achieving a similar reduction in information entropy, for seven images ranging from the smallest to the largest. The Scan-line order was used, and MwI was consistently the fastest in all cases. A personal computer with an AMD Ryzen 5 5500 processor clocked at 3.60 GHz and 32 GB of RAM, running the Windows 11 operating system, was used in the experiments. The algorithms were programmed in C++ using MS Visual Studio, version 17.4.2.

Discussion
This paper introduces a transformation technique named Move with Interleaving (MwI). It operates in two modes. The first mode is the classical Move-To-Front, where the considered symbol x_i from the alphabet Σ_X is moved to the front of the list L. In the second mode, MwI moves 2 · δ symbols, interleaved around x_i, to the front of L. As a result, MwI produces less oscillating transformed values, which exhibit lower information entropy. The approach proves especially beneficial for sequences in which local correlations manifest as similar symbols within a certain tolerance, rather than as completely identical symbols or repeating patterns. Continuous-tone raster images are typical examples of such data, and were used in this paper to illustrate the concept.
Pixels, which define a raster image, can be arranged into a sequence in various ways, with the Scan-line order being used most frequently. Three other possibilities were tried in this study: the Left-right, the Strip, and the Hilbert arrangements. The proposed MwI was compared against the Move-To-Front (MTF) and Inversion Frequencies (IF) transformations, both individually and after applying the Burrows-Wheeler Transform (BWT).
A total of 32 benchmark 8-bit greyscale raster images with different resolutions and contents were used in the experiments. The effect of the aforementioned transformations on the information entropy can be summarised as follows:

• When BWT is not used, MwI is considerably more efficient than MTF and IF.
• MwI is as efficient as BWT followed by MTF or IF.
• BWT followed by MwI yields worse results than MwI alone.
• MwI is less sensitive to the arrangement of the input data than MTF and IF.
• MwI is the most efficient transformation technique when the Hilbert data arrangement is used.
• BWT requires knowledge of the whole sequence in advance, while MwI operates incrementally and can, therefore, also be used in streaming applications.
• Implementing MwI is easier than implementing BWT, as it does not require the implementation of a suffix array for computational efficiency.
At this point, it is worth mentioning that string transformation techniques are less efficient than modern prediction methods in the domain of 2D continuous-tone raster images [27,35]. In future work, it would be interesting to investigate the combination of prediction-based methods and the proposed MwI transformation. A comprehensive comparison with other string transformation techniques and data domains, such as audio, should also be conducted. Finally, open challenges remain: how to set δ for each individual data sequence or, even better, how to modify it dynamically during the processing of the considered data sequence.

Y = 7, 3, 11, 2, 13

Algorithm 1 MwI transformation
 1: function MwI(X, δ)               X: input sequence; δ: tolerance
 2:                                  Returns the transformed sequence Y
 3:
 4:     L = InitialiseL(x_0, δ)      Initialisation of L according to x_0
 5:     Y_0 = x_0                    The first entry in Y is x_0
 6:     for i ← 1 to |X| − 1 do      For all other x_i
 7:         l = GetPosition(L, x_i)  Find the position of x_i in L
 8:         Y = AddToY(Y, l)         Store the position in Y
 9:         if l ≤ δ then            If the position is not larger than δ (mode 1)
10:             L = MoveToFront(L, x_i)
11:         else                     Mode 2
12:             T = FillT(x_i, δ)    Fill temporary sequence T
13:             L = ModifyL(L, T)    Place the symbols from T in front of L
14:         end if
15:     end for
16:     return Y                     Returns the transformed sequence
17: end function

Table 4.
Information about the images' resolutions and their entropies H, and the entropies obtained by the different transformations, all for the Scan-line order.

Table 5.
Average entropies achieved for different arrangements of the pixels in sequences. ¹ Width of the strip h = 4; ² width of the strip h = 12; ³ width of the strip h = 8; ⁴ width of the strip h = 16.

Table 6.
The CPU time spent in seconds for the three transformation techniques, all achieving similar reductions in information entropy.