Reconstructing Binary Signals from Local Histograms

In this paper, we considered the representational power of local overlapping histograms for discrete binary signals. We give an algorithm, linear in the signal size and factorial in the window size, for producing the set of signals that share a sequence of densely overlapping histograms; we state the number of unique signals for a given set of histograms; and we give bounds on the number of metameric classes, where a metameric class is a set of more than one signal sharing the same set of densely overlapping histograms.


Introduction
A histogram is a central tool for analyzing the content of signals while disregarding positional relations. It is useful for tasks such as setting thresholds for detecting extremal events and for designing codes in communication tasks. In [1], the three fundamental scales for histograms for discrete signals (and images) were presented: the intensity resolution or the bin-width, the spatial resolution, and the local extent, for which a histogram is evaluated. Even when fixing these scale parameters, it is still essential to consider the sampling phase, since in general, we do not know the location of the interesting signal parts, and thus, we must consider all phases, or equivalently, all overlapping histograms and all histograms for different positions of the first left bin-edge.
A natural question, when using local histograms for signals and image analysis, is: How many signals share a given set of overlapping, local histograms (illustrated in Figure 1)? Out of pure theoretical interest, in this paper, we took a first step in answering this question by considering densely overlapping histograms of binary signals.
This is a much smaller number than the 2^3 = 8 possible signals, but the relation is not bijective. The representational power of the histogram may be quantified as the conditional entropy of signals given their histogram. For binary signals of length three, the histogram of a signal may be summarized by its count of "1"-values, since the number of "0"-values will be three minus this count. For length three binary signals, there are eight different signals, [0; 0; 0], [0; 0; 1], . . . , [1; 1; 1], which have four different histograms, where the counts of "1"-values are 0, 1, 2, and 3, respectively, and the corresponding numbers of signals counted by their "1"-values are 1, 3, 3, and 1. Given a histogram, the conditional entropy of the corresponding signals is thus 0, log₂ 3, log₂ 3, and 0 bits, respectively, and the expected conditional entropy may thus be found to be approximately 1.2 bits. In this paper, we did not focus on coding schemes for signals, but on the expressive power of local overlapping histograms. Thus, consider again the signal [0; 0; 1], but now with a set of local histograms of extent two, in which case the histograms are calculated for the overlapping sub-signals [0; 0] and [0; 1]. In this case, there is only one signal that has this sequence of overlapping histograms: by the first histogram, we know that the first two values are "0", and in combination with the second histogram, we conclude that the last value must be "1"; thus, the signal must be [0; 0; 1]. In contrast, the signals [0; 1; 0] and [1; 0; 1] have the same local histograms of extent two, in which case the signal↔histograms relation is not bijective. However, the local histograms and the global histogram together uniquely identify these signals. As these examples show, the relation between local histograms and signals is non-trivial, and in this paper, we considered the space of possible overlapping local histograms and the number of signals sharing a given set of local histograms.
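The entropy figure above can be reproduced by brute force; the following is a minimal illustrative Python sketch (all names are our own):

```python
from itertools import product
from math import log2

# Group all 2^3 binary signals of length three by their global histogram,
# summarized by the count of "1"-values.
classes = {}
for s in product([0, 1], repeat=3):
    classes.setdefault(sum(s), []).append(s)

# Class sizes by "1"-count 0..3: [1, 3, 3, 1].
sizes = [len(classes[k]) for k in range(4)]

# Each histogram class is uniform over its signals, so its entropy is
# log2(class size); the conditional entropy weights this by class probability.
h_cond = sum(len(c) / 8 * log2(len(c)) for c in classes.values())
print(sizes, round(h_cond, 2))  # [1, 3, 3, 1] 1.19
```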
In the mid-20th Century, much attention was given to the lossless reconstruction of signals, in particular using error-correction codes, where the original signal is sent together with added information [2]. Such additional information could be related to the histogram of the original signal. Later, the reconstruction of signals and images became a more pressing problem, primarily as a way to compress images without losing essential content, and this resulted in still widely used image representation standards such as MPEG, TIFF, and JPEG.
While signal representation has still been of some concern in the 21st Century, advances in hardware mean that more attention has been given to image representation and, in particular, to qualifying the information content of image features. In [3], the authors introduced the concept of metameric classes for local features. The authors considered scale-space features and investigated the space of images that share these features. They further presented several algorithms for picking a single reconstruction. An extension of this approach was presented in [4], where patches at interest points of an original image were matched with patches from a database by a feature descriptor such as SIFT [5]. The database patches were then warped and stitched to form an approximation of the original image. In [6], the authors presented a reconstruction algorithm based on binarized local Gaussian weighted averages and using convex optimization. The theoretical properties of the reconstruction algorithm are still an open research question. In [7], images were reconstructed from a histogram of a densely sampled dictionary of local image descriptors (bag-of-visual-words) as a jigsaw puzzle with overlaps. They showed that their method results in a quadratic assignment problem and used heuristics to find a good reconstruction. In [8], the authors investigated the reconstruction of images from a simplified SIFT transform. The reconstruction was performed based on the SIFT key points and their discretized local histograms of the gradient orientations, and several models were presented for choosing a single reconstruction from the possible candidates. In [9], a convolutional neural network was presented that reconstructs images from a regularly but sparsely sampled set of image descriptors. The network was able to learn image priors and was able to reconstruct images from both classical features such as SIFT and representations found in AlexNet [10].
This was later extended in [11], where an adversarial network was investigated for reconstruction from local SIFT features.
Our work is closely related to [12], which discussed the relation between FRAME [13] and Julesz's model for human perception of textures [14]. In [13], the authors defined a Julesz ensemble as a set of images that share identical values of basic feature statistics.
Although not considered in their works, histogram bin-values can be considered a feature statistic, and hence, the metameric classes presented in this paper are Julesz ensembles in the sense of [13]. In [12], they considered normalized histograms of images filtered with Gabor kernels [15], and they considered the limit of the spatial sampling domain converging to Z^2. Their perspective may be generalized to local histograms; however, their results only hold in the limit.

This paper is organized as follows: First, we define the problem in Section 2. Section 3 describes an algorithm for finding the signal(s) that has (have) a specific set of local histograms. In Section 4, constraints on possible local histograms and the size of metameric classes are discussed, and finally, in Section 5, we present our conclusions.

Metameric Signal Classes
We were interested in the number of signals that have the same set of local histograms. When there is more than one such signal, we call this a metameric signal class (or just a metameric class), defined by their shared set of local histograms. We define signals and their local histograms as follows: Consider an alphabet A = {0, 1}, l = |A| = 2, and a one-dimensional signal S ∈ A^n, n > 1, which we denote S = [s_0; s_1; . . . ; s_{n−1}], where s_i ∈ A is the value of S at position i. For a given window size 1 < m ≤ n, we considered all local windows S_j = [s_j; s_{j+1}; . . . ; s_{j+m−1}], 0 ≤ j ≤ n − m, and their histograms h_j : A → Z_+, defined as h_j(a) = Σ_{i=j}^{j+m−1} δ(a, s_i), where δ is the Kronecker delta function. All local histograms for the signal S are H_S = [h_0; h_1; . . . ; h_{n−m}].
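The definition can be implemented directly; the following Python sketch (with an example signal of our own choosing) represents each histogram as a (count of "0"s, count of "1"s) pair:

```python
def local_histograms(S, m):
    """All densely overlapping local histograms h_0, ..., h_{n-m} of
    window size m, each as a (count of 0s, count of 1s) pair."""
    return [(m - sum(S[j:j + m]), sum(S[j:j + m]))
            for j in range(len(S) - m + 1)]

S = [0, 1, 0, 1, 1]  # an illustrative signal with n = 5
print(local_histograms(S, 3))  # [(2, 1), (1, 2), (1, 2)]
```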
As an example, consider a signal S with n = 5. For m = 3, the windows are S_0 = [s_0; s_1; s_2], S_1 = [s_1; s_2; s_3], and S_2 = [s_2; s_3; s_4], with corresponding histograms h_0, h_1, and h_2. In some cases, two different signals will have the same set of histograms, and we call these signals metameric, i.e., they appear identical w.r.t. their histograms. We say that they belong to the same metameric class, given by their common histogram sequence. For example, when n = 5 and m = 2, two different signals S and S′ can have the same sequence of n − m + 1 = 4 histograms, in which case S and S′ belong to the same metameric class. We were interested in the ability of local histograms to represent signals. Hence, for given signal and window sizes, we sought to calculate µ, the number of signals S that are uniquely identified by H_S, and κ, the number of metameric classes. The values of µ and κ for small values of n are shown in Table 1. These values were counted by considering all 2^n different possible signals, an approach only possible for small values of n. From the table, we observe that we did not find any combination of signal length and window size without a metameric class; hence, none of the tested combinations yielded a unique relation between the local histograms and the signal. Further, the number of unique signals µ appeared to grow with n/m, and the number of metameric classes κ appeared to be convex in m for large values of n.
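The exhaustive counting behind Table 1 can be sketched as follows; since the signals are binary, each histogram is summarized by its count of "1"-values (function names are our own):

```python
from itertools import product

def histogram_sequence(S, m):
    # Each binary histogram is summarized by its count of "1"-values.
    return tuple(sum(S[j:j + m]) for j in range(len(S) - m + 1))

def mu_kappa(n, m):
    """Brute-force mu (uniquely identified signals) and kappa (metameric
    classes) by grouping all 2^n binary signals by histogram sequence."""
    groups = {}
    for S in product([0, 1], repeat=n):
        groups.setdefault(histogram_sequence(S, m), []).append(S)
    mu = sum(1 for g in groups.values() if len(g) == 1)
    kappa = sum(1 for g in groups.values() if len(g) > 1)
    return mu, kappa

print(mu_kappa(3, 2))  # (6, 1): only [0;1;0] and [1;0;1] are metameric
```

Grouping by the full histogram sequence is only feasible for small n, as noted above, since the outer loop visits all 2^n signals.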

An Algorithm for Reconstructing the Complete Set of Signals from a Sequence of Histograms
We constructed an algorithm for reconstructing the one or more signals that have a given sequence of histograms. It was constructed using the following facts:

Fact 1 There is a non-empty and finite set of signals of size m that share the same histogram h. These can be produced as all the distinct permutations of the signal [1; 1; . . . ; 1; 0; 0; . . . ; 0] consisting of h(1) one-values followed by h(0) zero-values, and the number of distinct signals is given by the binomial coefficient (m choose h(1));

Fact 2 Consider the windows S_{i−1} and S_i and their corresponding histograms h_{i−1} and h_i for i = 1..n − m. If s_{i−1} = s_{i+m−1}, the histograms will be identical; otherwise, the histograms will differ by a count of one at s_{i−1} and at s_{i+m−1}, and s_{i+m−1} = ¬s_{i−1}, where ¬ is the Boolean "not" operator;

Fact 3 From Fact 2, it follows that the histogram of the overlap [s_i; s_{i+1}; . . . ; s_{i+m−2}] is equal to h_i, but where h_i(s_{i+m−1}) has been reduced by one. We call this h̃_i;

Fact 4 The difference between h̃_i and the histogram of the last m − 1 values of a candidate window is zero everywhere exactly when the candidate is consistent with the overlap. As a consequence, for candidate signals S_i that have histogram h_i, but that have the wrong value placed at s_{i+m−1}, the difference will have both negative and positive values.
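The first fact — that the windows sharing a histogram h are exactly the distinct permutations of a sorted window, counted by the binomial coefficient — can be checked directly; a small illustrative sketch:

```python
from itertools import permutations
from math import comb

def windows_with_histogram(m, ones):
    """All distinct binary windows of size m whose histogram has `ones`
    one-values: the distinct permutations of [1]*ones + [0]*(m - ones)."""
    return sorted(set(permutations([1] * ones + [0] * (m - ones))))

ws = windows_with_histogram(4, 2)
print(len(ws), comb(4, 2))  # the count matches the binomial coefficient: 6 6
```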
Thus, we constructed the following algorithm: the set of candidates for the first window is produced using Fact 1, and each candidate is then extended one value at a time, using Facts 2–4 to discard candidates that are inconsistent with the next histogram in the sequence. The signal is represented as a sequence of binary digits, and the histogram sequence is represented as a sequence of maps, where each map-entry is an (intensity, count) pair, i.e., the map [(0, 2); (1, 1)] above is equal to the histogram (2, 1). Finally, the solutions are represented as a sequence of sequences of binary digits. In this case, there is, as we can see, a metameric class of two signals that share a sequence of histograms. We verified that the algorithm is able to correctly reconstruct all the signals considered in Table 1, including all the members of the metameric classes.
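The sifting procedure can be sketched in Python as follows; this is our own illustrative implementation, not the authors' listing, with histograms as (count of "0"s, count of "1"s) pairs:

```python
from itertools import permutations

def histogram(w):
    # Histogram of a binary window as a (count of 0s, count of 1s) pair.
    return (len(w) - sum(w), sum(w))

def reconstruct(H, m):
    """All signals whose densely overlapping window-m histograms equal H.

    Candidates for the first window are the distinct permutations sharing
    h_0; each candidate is then extended one value at a time and kept
    only if the new window's histogram matches (Facts 2-4)."""
    ones = H[0][1]
    candidates = [list(p)
                  for p in set(permutations([1] * ones + [0] * (m - ones)))]
    for h in H[1:]:
        extended = []
        for c in candidates:
            for v in (0, 1):  # at most one extension is consistent with h
                if histogram(c[len(c) - m + 1:] + [v]) == h:
                    extended.append(c + [v])
        candidates = extended
    return sorted(candidates)

print(reconstruct([(2, 0), (1, 1)], 2))  # [[0, 0, 1]]: a unique signal
print(reconstruct([(1, 1), (1, 1)], 2))  # [[0, 1, 0], [1, 0, 1]]: metameric
```

The initial candidate set has at most (m choose h_0(1)) elements, and each extension step is constant per candidate, matching the stated factorial-in-m, linear-in-n complexity.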

Theoretical Considerations on µ and κ
In the following, we consider classes of histogram sequences and relate them to the number of metameric classes for a given family of signals and their sizes.
As a preliminary fact, note that for a window size m, all the histograms must satisfy Σ_{a∈A} h_j(a) = m, since all entries in S_j are counted exactly once.

Constant Sequence of Histograms (h_0 = h_j)
There are two different constant signals of length n: [0; 0; . . . ; 0] and [1; 1; . . . ; 1]. All neighborhoods and histograms of the constant signals are identical, and each histogram will have one non-zero element with value m. These signals cannot belong to a metameric class, since permuting the position of the values in S_0 does not give a new signal, and they are trivially unique.
In general, for signals with constant histogram sequences, h_0 = h_j, the signal must be periodic: the only difference between the counts of h_j and h_{j+1} is that h_{j+1} includes s_{j+m} but not s_j, while h_j includes s_j but not s_{j+m}. Hence, if h_j = h_{j+1}, then s_j = s_{j+m}. For example, [1; 0; 1; 0; 1] is a periodic signal for m = 2 with histograms h_j = (1, 1), j = 0 . . . 3. Any constant sequence of histograms describes a periodic signal, and for non-constant signals (∀a : h(a) < m) and m > 1, these histograms describe a metameric class, since some permutations of S = [S_0|S_0| . . .] will produce new signals without changing the histograms, due to periodicity. For any n > m, there are 2^m − 2 such periodic binary signals.
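The periodicity claim can be verified by brute force for small sizes; an illustrative sketch (names are our own):

```python
from itertools import product

def histogram_sequence(S, m):
    # Binary histograms summarized by their count of "1"-values.
    return [sum(S[j:j + m]) for j in range(len(S) - m + 1)]

def constant_histogram_signals(n, m):
    """All length-n binary signals whose histogram sequence is constant."""
    return [S for S in product([0, 1], repeat=n)
            if len(set(histogram_sequence(S, m))) == 1]

# A constant histogram sequence forces s_j = s_{j+m}, so each such signal
# is determined by its first m values: 2^m signals, 2^m - 2 non-constant.
sigs = constant_histogram_signals(7, 3)
assert all(S[j] == S[j + 3] for S in sigs for j in range(7 - 3))
print(len(sigs))  # 2^3 = 8
```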

Global Histogram (m = n)
For a particular h = h_0, all permutations of the signal belong to the same metameric class. Thus, the number of metameric classes is equal to the number of different histograms with sum m, except those for constant signals. This corresponds to picking m values from A, where repetition is allowed and order does not matter.
Following the standard derivation of unordered sampling with replacement, we visually rewrite the sum Σ_{a∈A} h(a) = m with a list of "·"s, where each "·" represents a count of one for a given bin. For example, for m = 3, we may have the histogram (2, 1), which implies that 3 = 2 + 1 = ·· + ·. A different histogram could be (0, 3), implying that 3 = 0 + 3 = + ···. Hence, any permutation of three "·"s and one "+" will, in this representation, give the sum of three, and the number of permutations is equal to the number of ways we can choose m out of m + 1 positions. Thus, the number of different histograms is given by the binomial coefficient (m + 1 choose m) = m + 1. Out of these, two histograms stem from the constant signals. The remaining histograms have ∀a : h(a) < m, and each of these histograms defines a metameric class, since, by Fact 1, there will always be more than one signal with such a histogram. Hence, the number of metameric classes is κ(n, n) = m + 1 − 2 = n − 1. This confirms the values in Table 1 where n = m.
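The count κ(n, n) = n − 1 can be confirmed by brute force for small n; a short sketch:

```python
from itertools import product

def kappa_global(n):
    """Metameric classes for m = n: group all 2^n binary signals by their
    single global histogram (summarized by the count of "1"-values)."""
    groups = {}
    for S in product([0, 1], repeat=n):
        groups.setdefault(sum(S), []).append(S)
    return sum(1 for g in groups.values() if len(g) > 1)

print([kappa_global(n) for n in range(2, 7)])  # n - 1: [1, 2, 3, 4, 5]
```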

Smallest Histogram (m = 2)
For the case of m = 2, we now characterize the number of metameric classes. Since the sum of a histogram is m and since the histograms for binary signals only have two bins, we can identify each histogram by σ_j = h_j(1), i.e., the number of one-values in S_j. Thus, in the following, we identify h_j by σ_j.

In the following, we consider consecutive pairs of histograms for signals of varying lengths n > m. Firstly, consider the case n = 3 and m = 2 and all possible combinations of histograms h_0 and h_1, i.e., σ_0, σ_1 ∈ {0, 1, 2}. The organization of all the 2^3 signals in terms of σ_0 and σ_1 is shown in Table 2. We call such tables transition tables, and we say that each table cell contains a set of signal pieces. For n = 3 and m = 2, the table illustrates that there is one metameric class, shown in cell σ_0 = σ_1 = 1, since this table cell contains two elements. This case was also discussed in the Introduction.

Table 2. All signals grouped by the sum of their local histograms when n = 3 and m = 2. Arrows show the relations between the (σ_j, σ_{j+1}) and (σ_{j+1}, σ_{j+2}) tables.

The following facts hold for such tables:

Fact 5 (σ_j, σ_{j+1}) is a tridiagonal table: Since σ_j and σ_{j+1} only differ by the values s_j and s_{j+m}, the difference between σ_j and σ_{j+1} can be at most one. Hence, the table will have a tridiagonal structure;

Fact 6 Elements on the main diagonal have s_j = s_{j+m}: On the diagonal, σ_{j+1} = σ_j, hence σ_{j+1} − σ_j = s_{j+m} − s_j = 0. Thus, s_j = s_{j+m};

Fact 7 Elements on the first diagonal above have s_j = 0 ∧ s_{j+m} = 1: On the first diagonal above, σ_{j+1} = σ_j + 1, and thus, s_{j+m} − s_j = 1. Thus, s_j = s_{j+m} − 1 ⇒ s_j = 0 ∧ s_{j+m} = 1;

Fact 8 Elements on the first diagonal below have s_j = 1 ∧ s_{j+m} = 0: On the first diagonal below, σ_{j+1} = σ_j − 1, and thus, s_{j+m} − s_j = −1. Thus, s_j = s_{j+m} + 1 ⇒ s_j = 1 ∧ s_{j+m} = 0.

For counting the number of elements in the table, let γ(σ_i, σ_j) be the number of elements in cell (σ_i, σ_j):

Fact 9 γ(0, 0) = γ(0, 1) = γ(1, 0) = γ(m − 1, m) = γ(m, m − 1) = γ(m, m) = 1: In all six cases, the histograms are from signals where either or both S_j and S_{j+1} are constant, and hence, we can trivially reconstruct the corresponding m + 1 values from the histograms. We call such a histogram pair a two-trivial pair;

Fact 10 On the main diagonal, except for σ_j = σ_{j+1} = 0 and σ_j = σ_{j+1} = m, γ(σ_j, σ_j) = (m−1 choose σ_j) + (m−1 choose σ_j − 1): By Fact 6, s_j = s_{j+m}. For s_j = 0, the possible signals for s_{j+k}, 1 ≤ k ≤ m − 1, are the signals summing to σ_j, i.e., (m−1 choose σ_j) of them, and for s_j = 1, we have (m−1 choose σ_j − 1). Since 0 < σ_j < m, therefore γ(σ_j, σ_j) ≥ 2;

Fact 11 On the first diagonal above, γ(σ_j, σ_j + 1) = (m−1 choose σ_j): By Fact 7, s_j = 0 ∧ s_{j+m} = 1. Hence, the possible signals for s_{j+i}, 1 ≤ i ≤ m − 1, are the signals summing to σ_j. The two-trivial pairs give γ(0, 1) = γ(m − 1, m) = 1, and γ(σ_j, σ_j + 1) ≥ 2 for all other cases, 0 < σ_j < m − 1;

Fact 12 On the first diagonal below, γ(σ_j, σ_j − 1) = (m−1 choose σ_j − 1): By Fact 8, s_j = 1 ∧ s_{j+m} = 0. Hence, the possible signals for s_{j+i}, 1 ≤ i ≤ m − 1, are the signals summing to σ_j − 1 = σ_{j+1}. The two-trivial pairs give γ(1, 0) = γ(m, m − 1) = 1, and γ(σ_j, σ_j − 1) ≥ 2 for all other cases, 0 < σ_{j+1} < m − 1.
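The tridiagonal structure and the cell counts γ can be verified numerically by enumerating all (m + 1)-value pieces of two consecutive windows; an illustrative sketch:

```python
from itertools import product
from math import comb

def gamma_table(m):
    """Cell counts gamma(sigma_j, sigma_{j+1}) obtained by brute force over
    all binary pieces [s_j, ..., s_{j+m}] of two consecutive windows."""
    table = {}
    for piece in product([0, 1], repeat=m + 1):
        cell = (sum(piece[:m]), sum(piece[1:]))
        table[cell] = table.get(cell, 0) + 1
    return table

m = 4
g = gamma_table(m)
assert all(abs(a - b) <= 1 for a, b in g)  # only tridiagonal cells occur
print([g[(s, s)] for s in range(m + 1)])   # main diagonal: [1, 4, 6, 4, 1]
print([g[(s, s + 1)] for s in range(m)])   # diagonal above: [1, 3, 3, 1]
assert all(g[(s, s + 1)] == comb(m - 1, s) for s in range(m))
```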
For transitions, the following facts hold:

Fact 13 Any signal of any length n > 2, m = 2, can be described as a route following the arrows in the table;

Fact 14 An element in column j transitions to an element in row j;

Fact 16 Intracellular paths for 0 < i, j < m + 1, |j − i| ≤ 1, are ambiguous, since these cells contain several indistinguishable elements, and we cannot determine the path's starting point from its histogram sequence;

Fact 17 Cell pairs connected by more than one arrow in the same direction give rise to ambiguous pairs, and paths that only contain such crossings or intracellular paths are ambiguous, since the paths cannot be distinguished by their histograms;

Fact 18 For m = n − 1, the number of metameric classes is equal to the number of non-empty cells in the tridiagonal table minus the six two-trivial cells.
For 2 ≤ m < n − 1, an upper bound on the number of metameric classes is equal to the number of ambiguous paths.
We have yet to find a closed form for κ(n, m) or an efficient algorithm to count all the ambiguous paths in such tables.

Conclusions
Inspired by the concept of locally orderless images [1] in image processing, we were intrigued by the problem of characterizing the metameric classes for a given set of local histograms. In this article, we took the first step by studying binary signals. We gave a sifting algorithm with a computational complexity that is factorial in the window size and linear in the signal size. We further identified all unique signals and an upper bound on the number of metameric classes for all signal and window sizes. While the transition tables illuminated important aspects of identifying metameric classes, we have yet to discover an efficient algorithm for this purpose. Future work includes extending our work to signals of more complex types, sets of histograms with varying window sizes, and signals of higher dimensions, such as images.

Conflicts of Interest:
The authors declare no conflict of interest.