Entropy Inequalities for Lattices

We study entropy inequalities for variables that are related by functional dependencies. Although the powerset on four variables is the smallest Boolean lattice with non-Shannon inequalities, there exist lattices with many more variables where the Shannon inequalities are sufficient. We search for conditions that exclude the existence of non-Shannon inequalities. The existence of non-Shannon inequalities is related to the question of whether a lattice is isomorphic to a lattice of subgroups of a group. In order to formulate and prove the results, one has to bridge lattice theory, group theory, the theory of functional dependencies and the theory of conditional independence. It is demonstrated that the Shannon inequalities are sufficient for planar modular lattices. The proof applies a gluing technique that relies on the fact that if the Shannon inequalities are sufficient for the pieces, then they are also sufficient for the whole lattice. It is conjectured that the Shannon inequalities are sufficient if and only if the lattice does not contain a special lattice as a sub-semilattice.


Introduction
The existence of non-Shannon inequalities has received much attention since the first inequality of this type was discovered by Zhang and Yeung [1]. The basic observation is that any four random variables X, Y, Z and W satisfy the following inequality:

    2I(Z; W) ≤ I(X; Y) + I(X; Z ⊎ W) + 3I(Z; W | X) + I(Z; W | Y).    (1)

Here, C ⊎ D denotes the random variable that takes the value (c, d) when C = c and D = d. As usual, I(⋅; ⋅) and I(⋅; ⋅ | ⋅) denote mutual information and conditional mutual information given by:

    I(X; Y) = H(X) + H(Y) − H(X ⊎ Y),    (2)
    I(X; Y | Z) = H(X ⊎ Z) + H(Y ⊎ Z) − H(X ⊎ Y ⊎ Z) − H(Z),    (3)

where H denotes the Shannon entropy. The inequality (1) is non-Shannon in the sense that it cannot be deduced from the positivity, monotonicity and submodularity of the entropy function on the variables X, Y, Z, W and their joins, i.e., satisfaction of the following inequalities:

    Positivity      H(X) ≥ 0,    (4)
    Monotonicity    H(X) ≤ H(X ⊎ Y),    (5)
    Submodularity   H(X ⊎ Z) + H(Y ⊎ Z) ≥ H(X ⊎ Y ⊎ Z) + H(Z).    (6)

Positivity and monotonicity were recognized by Shannon [2], while submodularity was first observed by McGill [3]. It is easy to show that any inequality involving only three variables rather than four can be deduced from Shannon's inequalities [4]. The powerset of four variables is a Boolean algebra with 16 elements, and any smaller Boolean algebra corresponds to a smaller number of variables, so in a trivial sense, the Boolean algebra with 16 elements is the smallest Boolean algebra with non-Shannon inequalities.
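As a numerical illustration of Inequality (1), the following minimal Python sketch evaluates both sides for a randomly chosen joint distribution of four binary variables. The helper names (`entropy`, `cond_mi`, `zhang_yeung_gap`) and the random test distribution are assumptions introduced here for illustration; a non-negative gap on one distribution is of course only a sanity check, not a proof.

```python
import itertools
import math
import random

def entropy(pmf, idx):
    """Shannon entropy (in nats) of the marginal of `pmf` on coordinates `idx`."""
    marg = {}
    for outcome, p in pmf.items():
        key = tuple(outcome[i] for i in idx)
        marg[key] = marg.get(key, 0.0) + p
    return -sum(p * math.log(p) for p in marg.values() if p > 0)

def cond_mi(pmf, a, b, c=()):
    """I(A; B | C) = H(A,C) + H(B,C) - H(A,B,C) - H(C)."""
    a, b, c = set(a), set(b), set(c)
    H = lambda s: entropy(pmf, sorted(s))
    return H(a | c) + H(b | c) - H(a | b | c) - H(c)

def zhang_yeung_gap(pmf):
    """Right-hand side minus left-hand side of the Zhang-Yeung inequality."""
    X, Y, Z, W = (0,), (1,), (2,), (3,)
    rhs = (cond_mi(pmf, X, Y) + cond_mi(pmf, X, Z + W)
           + 3 * cond_mi(pmf, Z, W, X) + cond_mi(pmf, Z, W, Y))
    return rhs - 2 * cond_mi(pmf, Z, W)

# Hypothetical test distribution: random weights on {0,1}^4, normalized.
random.seed(0)
outcomes = list(itertools.product((0, 1), repeat=4))
weights = [random.random() for _ in outcomes]
total = sum(weights)
pmf = {o: w / total for o, w in zip(outcomes, weights)}
gap = zhang_yeung_gap(pmf)
```

The quantity `gap` should be non-negative up to rounding for any joint distribution, since (1) holds for all entropic functions.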
In the literature on non-Shannon inequalities, all inequalities are expressed in terms of sets of variables and their joins. Another way to formulate this is that the inequalities are stated for the free ∪-semi-lattice generated by a finite number of variables. In this paper, we will also consider intersections of sets of variables. We note that for sets of variables, we have the inequality:

    I(X; Y) ≥ H(X ∩ Y).    (7)

Inequality (7) has even inspired some authors to use I(⋅ ∧ ⋅) as notation for mutual information.
Although non-Shannon inequalities have been known for two decades, they have found remarkably few applications compared with the Shannon inequalities. One of the reasons is that there exist much larger lattices than the Boolean algebra with 16 elements for which the Shannon inequalities are sufficient. The simplest examples are the Markov chains:

    X_1 → X_2 → ⋯ → X_n,    (8)

where any variable is determined by its predecessor, i.e., the conditional entropies H(X_{j+1} | X_j) are zero for j = 1, 2, ..., n − 1. For such a chain, one has:

    H(X_1) ≥ H(X_2) ≥ ⋯ ≥ H(X_n) ≥ 0.    (9)

The inequalities (9) are all instances of the entropy function being monotone, and it is quite clear that these inequalities are sufficient in the sense that for any sequence of values that satisfies these inequalities, there exist random variables related by a deterministic Markov chain with these values as entropies.
In this paper, we look at entropy inequalities for random variables that are related by functional dependencies. Functional dependencies give a partial ordering of sets of variables into a lattice. Such functional dependence lattices have many applications in information theory, but in this paper, we will focus on determining whether a lattice of functionally-related variables can have non-Shannon inequalities. In order to achieve interesting results, we have to restrict our attention to special classes of lattices.
Entropy inequalities have been studied using matroid theory, but finite matroids are given by geometric lattices, i.e., atomistic semi-modular lattices (see the textbook of Stern [5] for definitions). For the study of non-Shannon inequalities, it is more natural to look at general lattices rather than geometric lattices because many important applications involve lattices that are not atomistic or not semi-modular. For instance, a deterministic Markov chain gives a lattice that is not atomistic. It is known that a function is entropic if and only if it is (approximately) equal to the logarithm of the index of a subgroup in a group [6]. Therefore, it is natural to study entropic functions on lattices and their relations to subgroup lattices.
In this paper, we bridge lattice theory, database theory and the theory of conditional independence, but sometimes, the terminology in these fields does not match. In such cases, we give preference to lattice theory over database theory and preference to database theory over the theory of conditional independence. For instance, there is a property for closure operators that is called extensivity in the theory of lattices. We translate extensivity into a property for functional dependence, and it turns out that extensivity can be used instead of the property for functional dependencies that is called augmentation. Extensivity is apparently a weaker condition than augmentation, but together with the properties called monotonicity and transitivity, they are equivalent on finite lattices. Finally, we translate extensivity from functional dependencies to separoid relations that model the concept of conditional independence. In the literature on conditional independence, extensivity has been termed "normality" without any explanation why this term is used. We call it extensivity because it is equivalent to the notion of extensivity in lattice theory, which we consider as a more fundamental theory.
The paper is organized as follows. In Section 2, we describe the link between lattice theory and the theory of functional dependences in detail. We demonstrate how properties of closure operators associated with sub-semilattices correspond to the properties of functional dependence that are normally called Armstrong's axioms. In Section 3, we describe positive monotone submodular functions (polymatroid functions) and how they lead to separoid relations on lattices. These separoid relations generalize the notion of conditional independence known from Bayesian networks and similar graphical models. We demonstrate how properties of separoid relations correspond to properties of functional dependences.
In Section 4, we describe entropy functions on lattices and how they correspond to subgroup lattices of a group. We conjecture that the Shannon inequalities are sufficient for describing entropic polymatroid functions of a lattice if and only if the lattice does not contain a special lattice as a sub-semilattice. In Section 5, we develop some technical results related to "gluing" lattices together. The gluing technique is very useful for planar lattices, and in Section 6, we demonstrate that entropic functions on planar modular lattices can be described by Shannon's inequalities.
We finish with a short discussion, where we outline some future research directions. There is one appendix with some additional comments related to Armstrong's axioms. These are mainly intended for readers that are familiar with the theory of functional dependencies in databases. A second appendix contains a long list of lattices that are used to document that polymatroid functions on lattices with seven or fewer elements can be described by Shannon's inequalities.
Some of the results presented in this paper have been published in preliminary form and without proof [7,8], but most of them have since been strengthened or reformulated. In this paper, all proof details are given.

Lattices of Functional Dependence
In this section, we shall briefly describe functional dependencies and their relation to lattice theory. The relation between functional dependence and lattices has been studied [7,9-13]. The relation between lattices and functional dependencies is closely related to minimal sets of Shannon-type inequalities [14,15]. Relations between functional dependencies and Bayesian networks have also been described [8,16]. Many problems in information theory and cryptography can be formulated in terms of functional dependencies.

Example 1.
Consider a group consisting of n agents. One might be interested in giving each agent in the group part of a password in such a way that no single agent can recover the whole password, but any two agents are able to recover the password. Here, the password should be a function of the variables known by any two agents, but must not be a function of a variable held by any single agent. The functional dependence structure is the lattice illustrated in the Hasse diagram in Figure 1. The node at the top illustrates the password. Each of the intermediate nodes represents the knowledge of an agent. The bottom node represents no knowledge.
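One standard way to meet the requirement of Example 1 is a two-out-of-n threshold scheme. The sketch below uses Shamir-style linear interpolation over a small prime field; the choice of scheme, the prime `P` and the helper names `shares` and `recover` are assumptions made for illustration and are not taken from the text.

```python
P = 7  # small prime (hypothetical field size); requires n < P

def shares(secret, slope, n):
    """Agent i (1..n) holds the value secret + slope * i (mod P)."""
    return {i: (secret + slope * i) % P for i in range(1, n + 1)}

def recover(i, si, j, sj):
    """Recover the secret from the shares of two distinct agents i and j."""
    # Slope of the line through the two points (i, si) and (j, sj).
    slope = ((si - sj) * pow((i - j) % P, -1, P)) % P
    # The secret is the value of the line at 0.
    return (si - slope * i) % P

# Usage: the password 5 is shared among four agents with random slope 3.
s = shares(5, 3, 4)
password = recover(1, s[1], 3, s[3])
```

Any two shares determine the line and hence the password, while a single share is compatible with every possible password (for each candidate secret there is exactly one slope producing the observed share). This mirrors the lattice in Figure 1: the top element is a function of any two intermediate elements, but of no single one.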
A ∧-semilattice is a set equipped with a binary operator ∧ that satisfies the following properties:

    Commutativity   X ∧ Y = Y ∧ X,
    Associativity   (X ∧ Y) ∧ Z = X ∧ (Y ∧ Z),
    Idempotency     X ∧ X = X.

For a ∧-semilattice the relation X ∧ Y = X defines a preordering that we will denote X ≤ Y. If (L, ∧) is a semilattice, then we say that M is a sub-semilattice if M is closed under the ∧ operation. Let (L, ∧) denote a semilattice. Let ↓X = {Y ∈ L | Y ≤ X}. Then, ↓(X ∧ Y) = (↓X) ∩ (↓Y). Therefore, we can identify any finite semilattice with a ∩-semilattice in a powerset. Since we will usually identify semilattice elements with sets of variables, we will often use ⊆ and ∩ to denote the ordering and the meet operation. In this paper, we will assume that all semilattices and all lattices are finite. If a ∩-semilattice (L, ∩) has a maximal element, then a binary operator ∨ can be defined as:

    X ∨ Y = ⋂ {Z ∈ L | Z ⊇ X and Z ⊇ Y},

and then, (L, ∩, ∨) is a lattice. Let (L, ⊆) denote a lattice with M as a sub-semilattice with the same maximal element as L. Then, a unary operator cl ∶ L → L can be defined by:

    cl(X) = ⋂ {Y ∈ M | Y ⊇ X}.

The operator cl is a closure operator [17], i.e., it satisfies:

    Extensivity     X ⊆ cl(X),    (14)
    Monotonicity    X ⊆ Y implies cl(X) ⊆ cl(Y),    (15)
    Idempotency     cl(cl(X)) = cl(X).    (16)

For any closure operator cl, the element X is said to be closed if cl(X) = X. If X and Y are closed, then X ∩ Y is closed ([18], Lemma 28), so the closed elements of a lattice under a closure operator form a ∩-semilattice.
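The closure operator associated with a sub-semilattice can be sketched in a few lines of Python for the case where the lattice is a powerset. The ground set, the family `M` and the function name `closure` are hypothetical choices made for illustration.

```python
from itertools import combinations

def closure(M, X, top):
    """cl(X): intersection of all members of the family M that contain X."""
    result = frozenset(top)
    for Y in M:
        if X <= Y:
            result &= Y
    return result

top = frozenset("abc")
# Hypothetical intersection-closed family that contains the maximal element.
M = {frozenset(), frozenset("a"), frozenset("b"), top}
subsets = [frozenset(c) for r in range(4) for c in combinations("abc", r)]

# Collect the closure of every subset of the ground set.
cl = {X: closure(M, X, top) for X in subsets}
```

For this family, the closure of {c} and of {a, b} is the whole ground set, whereas {a} and {b} are closed. Extensivity, monotonicity and idempotency can be checked by looping over `subsets`.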

Example 2.
If G is a group, then a subgroup is defined as a subset that is closed under the group operations. The closure of a subset of G is the subgroup generated by the subset. The lattice of subgroups forms a ∩-semilattice in the lattice of all subsets of the group. Let G denote a finite group. For any subgroup G̃ ⊆ G, we associate the variable X_G̃ that maps an element g ∈ G into the left coset gG̃. Then, the subgroup lattice of G is mapped into a lattice of variables where the subset ordering of subgroups is equivalent to functional dependencies between the corresponding variables.

Proposition 2.
If cl is a closure operator on a lattice, then the relation cl(X) ⊇ Y and the relation cl(X) ⊇ cl(Y) are equivalent. The relation X → Y given by cl(X) ⊇ Y satisfies the following properties.

    Extensivity     X → Y implies X → X ⊎ Y,    (18)
    Monotonicity    X ⊇ Y implies X → Y,    (19)
    Transitivity    X → Y and Y → Z imply X → Z.    (20)

Remark 1. The monotonicity of → is called reflexivity in the literature on databases. We reserve the notion of reflexivity to the relation X → X, in accordance with the terminology for ordered sets. In database theory, the property X → X is called self determination.
In the literature on databases, extensivity (18) is replaced by an apparently stronger property called augmentation, but in a finite lattice, augmentation can be proven from extensivity, monotonicity and transitivity. See Appendix A for details.
The monotonicity (19) of → follows directly from the monotonicity (15) of cl.
The transitivity (20) of → follows from the transitivity of ⊇.
If L is a lattice with a relation → that satisfies Armstrong's axioms, then we say that a lattice element X is →-closed if X → Y implies that X ⊇ Y.

Theorem 1.
Let L be a finite lattice with a relation → that satisfies Armstrong's axioms. Then, the set of →-closed elements forms a ∩-semilattice with the same maximal element as L. The relation X → Y holds if and only if cl(X) ⊇ Y, where cl denotes the closure operator with respect to the semilattice.
Proof. Assume that X 1 and X 2 are closed and that X 1 ∩ X 2 → Y. The monotonicity (19) implies X i → X 1 ∩ X 2 , and then, the transitivity (20) implies that X i → Y. Since X i is closed, we have X i ⊇ Y. Since this holds for both i = 1 and i = 2, we have X 1 ∩ X 2 ⊇ Y, implying that X 1 ∩ X 2 is closed. The monotonicity (19) also implies that the maximal element of L is closed so that the set of closed elements M forms a ∩-semilattice with a closure operator cl M .
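Let cl denote the closure with respect to M. We will prove that X → cl(X). Let X_1 = X. If X_1 is not closed, then there exists Y such that X_1 → Y but X_1 ⊉ Y, and we let X_2 = X_1 ⊎ Y. Then, X_1 → X_2 by extensivity (18) and X_1 ⊂ X_2. Iterate this construction so that:

    X_1 ⊂ X_2 ⊂ ⋯ ⊂ X_n.

Since the lattice is finite, the construction must terminate, and when it terminates, X_n is closed. Using transitivity, we get X → X_n and X ⊆ X_n. Since cl(X) is the smallest closed element greater than X, we have X_n ⊇ cl(X), so monotonicity (19) and transitivity (20) give X → cl(X).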
We will look at functional dependencies in databases. Assume that a set of records is labeled by elements in a set A. In statistics, records are the individual elements of a sample. For each record a ∈ A, the database contains the values of various attributes given by a number of functions from A to the set of possible attributes. Sets of such functions will be denoted by capital letters, and these will be our variables. We say that X determines Y and write X → Y if there exists some function f such that Y(a) = f(X(a)) for any record a ∈ A. Then, the relation → satisfies Armstrong's axioms. Armstrong proved that these axioms form a complete set of inference rules [19]. That means that if a set of functional dependencies is given and a certain functional dependence X → Y holds in any database where all the given functional dependencies hold, then X → Y can be deduced from the given dependencies using Armstrong's axioms. Therefore, for any functional dependence X → Y that cannot be deduced using Armstrong's axioms, there exists a database where the functional dependence is violated [20,21]. As a consequence, there exists a database where a functional dependence holds if and only if it can be deduced from Armstrong's axioms. Using the result that Armstrong's axioms are equivalent to the closed sets forming a lattice, Armstrong's result is easy to prove.
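The definition of X → Y in a database can be tested directly: X determines Y exactly when no two records agree on X but differ on Y. The following Python sketch illustrates this; the table `records`, its attribute names and the function `determines` are hypothetical.

```python
def determines(records, X, Y):
    """True if the attributes in Y are a function of the attributes in X."""
    seen = {}
    for row in records:
        key = tuple(row[a] for a in X)
        val = tuple(row[a] for a in Y)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, val) != val:
            return False
    return True

# Hypothetical toy database with three records.
records = [
    {"id": 1, "city": "Oslo", "country": "NO"},
    {"id": 2, "city": "Oslo", "country": "NO"},
    {"id": 3, "city": "Lyon", "country": "FR"},
]

city_to_country = determines(records, ["city"], ["country"])
country_to_id = determines(records, ["country"], ["id"])
```

In this toy table, the city determines the country, but the country does not determine the record identifier, since two records share the country NO.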

Theorem 2.
For any finite lattice L, there exists a database with a set of related variables such that the elements of the lattice correspond to closed sets under functional dependence.
Proof. As the set of records, we take the elements of the lattice L.
We have seen that for a subgroup lattice of a group, there exists a lattice of functional dependence. The opposite is also true. To each database with attributes related by functional dependence, there is a group. The construction is as follows. Let A denote a set of records. Let G = Sym(A) be the symmetric group consisting of permutations of the records. If X is a function on A, then we define the stabilizer group G_X as the set of permutations that leave X invariant, i.e., permutations π ∈ Sym(A) such that X(π(a)) = X(a) for all a ∈ A. Then, X → Y if and only if G_X ⊆ G_Y. In this way, the functional dependence lattice of a database can be mapped into a lattice of subgroups of a group.
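The correspondence between functional dependence and inclusion of stabilizer groups can be checked by brute force on a small set of records. In the Python sketch below, the columns `X` and `Y` and the helper names are hypothetical, and the enumeration of all permutations is only feasible for very small record sets.

```python
from itertools import permutations

def stabilizer(values):
    """Permutations pi of the records with values[pi[a]] == values[a] for all a."""
    n = len(values)
    return {pi for pi in permutations(range(n))
            if all(values[pi[a]] == values[a] for a in range(n))}

def determines(x, y):
    """True if column y is a function of column x."""
    seen = {}
    return all(seen.setdefault(xa, ya) == ya for xa, ya in zip(x, y))

# Hypothetical attribute columns on four records.
X = (0, 1, 2, 2)
Y = (0, 0, 1, 1)

G_X, G_Y = stabilizer(X), stabilizer(Y)
```

Here X determines Y, and correspondingly `G_X` (of order 2) is contained in `G_Y` (of order 4), while Y does not determine X and `G_Y` is not contained in `G_X`.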
Combining Theorem 2 with the stabilizer subgroups of the symmetric group of a database, we get the following result that was first proven in 1946 by Whitman [22].

Corollary 1.
Any finite lattice can be represented as a functional dependence lattice generated by subgroups of a group.

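Definition 1. On a lattice, the submodularity of a function h is defined via the inequality:

    h(X) + h(Y) ≥ h(X ⊎ Y) + h(X ∩ Y).

If the submodular inequality holds with equality, we say that the function is modular. A polymatroid function on a lattice is a function that is non-negative, increasing and submodular. For a polymatroid function h on a lattice, one may introduce a function I_h(⋅; ⋅ | ⋅) that corresponds to conditional mutual information by:

    I_h(X; Y | Z) = h(X ⊎ Z) + h(Y ⊎ Z) − h(X ⊎ Y ⊎ Z) − h(Z).

One can rewrite I_h(⋅; ⋅ | ⋅) as:

    I_h(X; Y | Z) = [h(X ⊎ Z) + h(Y ⊎ Z) − h(X ⊎ Y ⊎ Z) − h((X ⊎ Z) ∩ (Y ⊎ Z))] + [h((X ⊎ Z) ∩ (Y ⊎ Z)) − h(Z)].

Since h is monotone and submodular, both brackets are non-negative, and we have:

    I_h(X; Y | Z) ≥ 0.    (26)

It is straightforward to verify that:

    I_h(X; Y | Z) = I_h(Y; X | Z),    (27)
    I_h(X; Y ⊎ Z | W) = I_h(X; Y | W) + I_h(X; Z | Y ⊎ W).    (28)

We will say that a function I(⋅; ⋅ | ⋅) that satisfies positivity (26), symmetry (27) and the chain rule (28) is a separoid function.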
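Positivity, symmetry and the chain rule can be verified exhaustively on a small example. The Python sketch below works on the powerset of a three-element set, where the join is the union; the polymatroid function `h` (the rank function of the uniform matroid of rank 2 on three points) and all names are assumptions chosen for illustration.

```python
from itertools import combinations, product

GROUND = "abc"
ELEMENTS = [frozenset(c) for r in range(len(GROUND) + 1)
            for c in combinations(GROUND, r)]

def h(S):
    """Hypothetical polymatroid function: rank of the uniform matroid U(2,3)."""
    return min(len(S), 2)

def I(X, Y, Z):
    """I_h(X; Y | Z) = h(X + Z) + h(Y + Z) - h(X + Y + Z) - h(Z), with + the union."""
    return h(X | Z) + h(Y | Z) - h(X | Y | Z) - h(Z)

def is_separoid_function():
    """Check positivity, symmetry and the chain rule over all quadruples."""
    for X, Y, Z, W in product(ELEMENTS, repeat=4):
        if I(X, Y, Z) < 0:
            return False
        if I(X, Y, Z) != I(Y, X, Z):
            return False
        if I(X, Y | Z, W) != I(X, Y, W) + I(X, Z, Y | W):
            return False
    return True

ok = is_separoid_function()
```

The chain rule holds identically for any function h, as the terms telescope, whereas positivity is where monotonicity and submodularity of h are used.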

Proposition 3.
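If I(⋅; ⋅ | ⋅) is a separoid function, then the following property is satisfied.

    Monotonicity    If Y ⊆ Z, then I(X; Y | Z) = 0.    (29)

Proof. Assume that Y ⊆ Z. We can use the chain rule (28) to get:

    I(X; Z | Z) = I(X; Y ⊎ Z | Z) = I(X; Y | Z) + I(X; Z | Y ⊎ Z) = I(X; Y | Z) + I(X; Z | Z).

Hence, I(X; Y | Z) = 0, and monotonicity (29) is satisfied.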
The relation I_h(X; X | Z) = 0 is equivalent to h(X ⊎ Z) = h(Z), and this relation will be denoted Z →_h X. The first to observe that →_h defines a lattice was Shannon, who published a very short paper on this topic in 1953 [23]. Shannon did not mention the relation to the theory of functional dependencies because that theory was only developed two decades later. Surprisingly, Shannon's paper was only cited once until 2002!
The relation →_h satisfies Armstrong's axioms, and the most instructive way to see this is via separoid relations. If h is a polymatroid function, then the relation I_h(X; Y | Z) = 0 will be denoted X ⊥_h Y | Z. Following Dawid et al. [24,25], we say that a relation (⋅ ⊥ ⋅ | ⋅) on a lattice (L, ∩, ⊎) is a separoid relation if it has the following properties:

    Monotonicity    If Y ⊆ Z, then X ⊥ Y | Z,    (31)
    Symmetry        X ⊥ Y | Z implies Y ⊥ X | Z,    (32)
    Chain rule      X ⊥ Y | W and X ⊥ Z | Y ⊎ W if and only if X ⊥ Y ⊎ Z | W.    (33)

Remark 2. The term monotonicity was used for a different concept by Paolini [26]. In [24,25], a weaker condition than monotonicity was used, but their condition together with the chain rule implies monotonicity.
With this definition, we see that ⊥_h is a separoid relation. The properties (31)-(33) should hold for all X, Y, Z, W ∈ L. In this paper, we are particularly interested in the case where the subsets are not disjoint. In the literature on Bayesian networks and similar graphical models, the focus has been on disjoint sets where only the last two properties (32) and (33) are used to define a semi-graphoid relation [27]. See also [28], Remark 2.5, where it is noted that semi-graphoid relations can be defined on join semi-lattices.
A long list of properties for the notion of independence was given by Paolini [26], but Studený has proven that one cannot deduce all properties of statistical conditional independence from a finite list of axioms [28,29].
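A separoid relation also satisfies the following two properties.

    Extensivity     If X ⊥ Y | Z, then X ⊥ Y ⊎ Z | Z.    (34)
    Transitivity    If X ⊥ Y | W and X ⊥ Z | Y ⊎ W, then X ⊥ Z | W.    (35)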
Proof. To prove the extensivity (34), assume that X ⊥ Y | Z, which is equivalent to X ⊥ Y | Z ⊎ Z.
The monotonicity (31) gives X ⊥ Z | Z. The conclusion X ⊥ Y ⊎ Z | Z is obtained by the chain rule (33).
To prove the transitivity (35), assume that X ⊥ Y | W and X ⊥ Z | Y ⊎ W. The chain rule (33) applied twice gives X ⊥ Y ⊎ Z | W and X ⊥ Z | W.
In a set of random variables, we note that if Y is independent of Y given X, then Y is a function of X almost surely. If Y ⊥ Y | X, we write X → Y. The monotonicity (19) follows directly from the monotonicity (31).
To prove the transitivity of →, assume that X → Y and Y → Z. The monotonicity (31) implies that Z ⊥ X | Z ⊎ Y, which by the chain rule (33), implies Z ⊥ Z ⊎ X | Y. By the chain rule (33), we have Z ⊥ Z | Y ⊎ X. The monotonicity (31) also gives Z ⊥ Y | Y ⊎ X, which together with X → Y implies that Z ⊥ Y | X by transitivity (35). The transitivity (35) then implies Z ⊥ Z | X.
To prove that the relation (⋅ ⊥ ⋅ | ⋅) restricted to the lattice of closed lattice elements is separoid, one just has to prove that X ⊥ Y | Z if and only if X ⊥ cl(Y) | Z if and only if X ⊥ Y | cl(Z). This follows from Armstrong's results.
The significance of this theorem is that if we start with a separoid relation on a lattice, then this separoid relation is also a separoid when restricted to elements that are closed under the relation →.

Proof. For any finite lattice L, one identifies the elements with subgroups of a group G. If the group G is assigned a uniform distribution, then the variable corresponding to a subgroup will also have a uniform distribution. With this distribution, a variable is independent of itself given another variable if and only if the other variable determines the first variable. Therefore, statistical independence with respect to the uniform distribution on G gives a separoid relation for which the closure is the original lattice.
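Assume that X and Y are →_h-closed. Then:

    h(X) + h(Y) ≥ h(X ⊎ Y) + h(X ∩ Y) = h(cl(X ⊎ Y)) + h(X ∩ Y),

where cl(X ⊎ Y) is the join of X and Y in the lattice of closed elements. Therefore, h restricted to the →_h-closed elements is polymatroid. We may summarize these observations in the following proposition.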

Proposition 5.
If h is a polymatroid function defined on the lattice (L, ⊆), then the relation →_h satisfies Armstrong's axioms. The function h restricted to the lattice of →_h-closed elements is polymatroid.
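The lattice of →_h-closed elements can be computed by brute force on a small example. The Python sketch below again uses the powerset of a three-element set and the hypothetical polymatroid function h(S) = min(|S|, 2); the names `determines` and `closed_elements` are assumptions introduced here.

```python
from itertools import combinations

GROUND = "abc"
ELEMENTS = [frozenset(c) for r in range(len(GROUND) + 1)
            for c in combinations(GROUND, r)]

def h(S):
    """Hypothetical polymatroid function: rank of the uniform matroid U(2,3)."""
    return min(len(S), 2)

def determines(X, Y):
    """X ->_h Y, i.e. adding Y to X does not increase h."""
    return h(X | Y) == h(X)

def closed_elements():
    """Elements X such that X ->_h Y implies that Y is contained in X."""
    return [X for X in ELEMENTS
            if all(Y <= X for Y in ELEMENTS if determines(X, Y))]

closed = closed_elements()
```

For this choice of h, the closed elements are the empty set, the three singletons and the full set, so the eight-element Boolean lattice reduces to a five-element lattice with three incomparable elements between the bottom and the top.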
We recall that a pair of elements (Y, Z) is said to be a modular pair, and we write YMZ, if Y ∩ Z ⊆ X ⊆ Z implies that:

    (X ⊎ Y) ∩ Z = X.

If all pairs are modular, we say that the lattice is modular, and we have:

    (X ⊎ Y) ∩ Z = X ⊎ (Y ∩ Z)

when X ⊆ Z.

Proposition 6. If (⋅ ⊥ ⋅ | ⋅) is a separoid relation on a lattice and:

    Y ⊥ Z | Y ∩ Z,

then YMZ in the lattice of closed elements. In particular, if h is a polymatroid function on a lattice and:

    h(Y) + h(Z) = h(Y ⊎ Z) + h(Y ∩ Z),

then YMZ in the lattice of closed elements.
Proof. If Y ∩ Z ⊆ X ⊆ Z, then we have the following sequence of implications. Hence, If is separoid, then according to the extensivity (34), the relation X Y Z implies: so that Z ⊇ (X ⊎ Z) ∩ (Y ⊎ Z) ⊇ Z. Following Dawid [24], we define the relation X M Y Z by:

Theorem 5. If a polymatroid function h on a lattice is modular, then the lattice of →_h-closed elements is modular. If the lattice is modular, then X ⊥_h Y | Z if and only if X ⊥_M Y | Z in the lattice of closed elements.
Proof. If the function h is modular, then all pairs of elements are modular in the lattice of →_h-closed elements, so the lattice of closed elements is modular. In a modular lattice:

The following result appears in [24] with a longer proof.

Corollary 2. For a lattice, the relation X ⊥_M Y | Z is separoid if and only if the lattice is modular.
Proof. Assume that the lattice is modular. Then, the rank function r is modular, and X →_r Y if and only if X ⊇ Y. Therefore, X ⊥_M Y | Z is equivalent to the separoid relation I_r(X; Y | Z) = 0. Assume that the relation ⊥_M is separoid. Since X ⊥_M Y | X ∩ Y, we have that XMY. Since all pairs are modular, the lattice is modular.

Entropy in Functional Dependence Lattices
Let L denote a lattice with maximal element m. Let Γ(L) denote the set of polymatroid functions on L. The set Γ(L) is polyhedral, and often, we may normalize the polymatroid functions by replacing h(⋅) by h(⋅)/h(m). In this way, we obtain a polytope that we will denote Γ_1(L).

Definition 2. A function h ∈ Γ (L) is said to be entropic if there exists a function f from L into a set of random variables such that h (X) = H ( f (X)) for any element X in the lattice.
Let Γ*_1(L) denote the set of normalized entropic functions on L, and let Γ̄*_1(L) denote the closure of Γ*_1(L).

Definition 3.
A lattice is said to be a Shannon lattice if any polymatroid function can be realized approximately by random variables, i.e., Γ_1(L) = Γ̄*_1(L).
One may then check whether a lattice is a Shannon lattice by checking that the extreme polymatroid functions are entropic or can be approximated by entropic functions.

Example 4.
Let G denote a finite group. For any subgroup G̃ ⊆ G, we associate the variable X_G̃ that maps an element g ∈ G into the left coset gG̃. The number of possible values of X_G̃ is [G ∶ G̃] = |G|/|G̃|. Assume that the subgroups are given a functional dependence structure where a variable X is given by a function A → B. If A has n elements, then the group of permutations G has n! elements. The subgroup G_X that leaves X invariant has:

    ∏_{b ∈ B} |X⁻¹(b)|!

elements. Therefore:

    [G ∶ G_X] = n! / ∏_{b ∈ B} |X⁻¹(b)|!.

If U is the uniform distribution on the finite group G, then the distribution of X_G̃ is uniform, and the entropy is H(X_G̃) = ln|G| − ln|G̃|. It has been proven that the set of entropic functions generated by groups forms a convex cone. Therefore, the normalized polymatroid functions generated by groups have Γ̄*_1(L) as closure [4].
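The entropy formula for coset variables in Example 4 can be checked numerically on a small group. The Python sketch below uses the cyclic group of order 12 written additively; the group, the subgroup generator and the function names are hypothetical choices made for illustration.

```python
import math
from collections import Counter

n = 12          # order of the cyclic group Z_n (hypothetical example)
G = range(n)

def subgroup(d):
    """Subgroup of Z_n generated by d."""
    return frozenset((d * k) % n for k in range(n))

def coset_variable(g, H):
    """Value of the coset variable at g: the coset g + H, encoded as a frozenset."""
    return frozenset((g + x) % n for x in H)

def entropy_of_coset_variable(H):
    """Entropy (in nats) of the coset variable when g is uniform on G."""
    counts = Counter(coset_variable(g, H) for g in G)
    return -sum((c / n) * math.log(c / n) for c in counts.values())

H4 = subgroup(4)                         # the subgroup {0, 4, 8}
value = entropy_of_coset_variable(H4)    # should equal ln 12 - ln 3
```

The subgroup generated by 4 has three elements and four cosets, each hit by exactly three group elements, so the computed entropy agrees with ln|G| − ln|G̃| = ln 4.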
From Definition 3, we immediately get the following result.

Proposition 7.
If L is a Shannon lattice and M is a subset that is a ∩-semi-lattice, then M is a Shannon lattice.
In particular, all sub-lattices of a Shannon lattice are Shannon lattices.
Proof. Assume that L is a Shannon lattice and that M is a sub-lattice. Let h ∶ M → R denote a polymatroid function. For ℓ ∈ L, let ℓ̃ denote the m ∈ M that minimizes h(m) under the constraint that m ⊇ ℓ. Define the function h̃(ℓ) = h(ℓ̃). Now, h̃ is an extension of h, and with this definition, h̃ is non-negative and increasing. For x, y ∈ L, we have:

    h̃(x) + h̃(y) = h(x̃) + h(ỹ) ≥ h(x̃ ⊎ ỹ) + h(x̃ ∩ ỹ) ≥ h̃(x ⊎ y) + h̃(x ∩ y),

where the join and meet of x̃ and ỹ are taken in M. Hence, h̃ is submodular. By the assumption, h̃ is entropic, so the restriction of h̃ to M is also entropic.
With these results in hand, we can start hunting for non-Shannon lattices. We take a lattice that may or may not be a Shannon lattice. We find the extreme normalized polymatroid functions. These extreme polymatroid functions can be found either by hand or by using some suitable software that can find extreme points of a convex polytope specified by a finite set of inequalities. For instance, the R program with package rcdd can find all extreme points of a polytope. For each extreme point, we determine the lattice of closed elements using Proposition 5. These lattices of closed sets will often have a much simpler structure than the original lattice, and the goal is to check if these lattices are Shannon lattices or not. It turns out that there are quite a few of these reduced lattices, and they could be considered as the building blocks for larger lattices.
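The inequalities that cut out the polytope of normalized polymatroid functions can be checked directly for a candidate function. The Python sketch below does only this membership test and does not enumerate extreme points, for which the text uses rcdd. It assumes that the lattice is represented as an intersection-closed family of sets containing a largest set; the example lattice (three incomparable elements between bottom and top) and all names are hypothetical.

```python
def join(L, X, Y):
    """Join in the lattice L: intersection of all members containing X and Y."""
    out = max(L, key=len)  # the largest set is the top element
    for Z in L:
        if X <= Z and Y <= Z:
            out = out & Z
    return out

def is_polymatroid(L, h, tol=1e-12):
    """Check non-negativity, monotonicity and submodularity of h on L."""
    for X in L:
        if h[X] < -tol:
            return False
        for Y in L:
            if X <= Y and h[X] > h[Y] + tol:
                return False
            if h[X] + h[Y] + tol < h[join(L, X, Y)] + h[X & Y]:
                return False
    return True

# Hypothetical example: bottom, three atoms and top.
bottom, top = frozenset(), frozenset("abc")
atoms = [frozenset(x) for x in "abc"]
M3 = [bottom, *atoms, top]
h_half = {bottom: 0.0, top: 1.0, **{a: 0.5 for a in atoms}}
h_small = {bottom: 0.0, top: 1.0, **{a: 0.4 for a in atoms}}

half_ok = is_polymatroid(M3, h_half)
small_ok = is_polymatroid(M3, h_small)
```

The function with value 1/2 on each atom satisfies all inequalities, with submodularity holding with equality for any two atoms, whereas the value 0.4 violates submodularity because two atoms would then carry less than the top element.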
We recall that an element i is ⊎-irreducible if i = X ⊎ Y implies that i = X or i = Y. An ∩-irreducible element is defined similarly. An element is doubly irreducible if it is both ⊎-irreducible and ∩-irreducible. The lattice denoted M_n is a modular lattice with a smallest element, a largest element and n doubly irreducible elements arranged in-between.

Theorem 6.
For any n, the lattice M_n is a Shannon lattice.
Proof. The proof is essentially the same as the solution to the cryptographic problem stated at the beginning of Section 2. The idea is that one should look for groups with a subgroup lattice M_n and then check that the subgroups of such a group have the right cardinality.
Assume n ≥ 3. Then, the values should satisfy the inequalities: Proof. Assume that the polymatroid function h only takes the values 0, 1/2 and 1. Then, h defines a separoid relation, and the closed elements form a lattice isomorphic to M_n for some integer n.
The function h is entropic on M n , so h is also entropic on the original lattice.

Lemma 1.
If h is submodular and increasing on ∩-irreducible elements, then h is increasing.

Proof.
Assume that h is submodular and increasing on ∩-irreducible elements. We have to prove that if X ⊇ Z, then h(X) ≥ h(Z). In order to obtain a contradiction, assume that Z is a maximal element such that there exists an element X such that X ⊇ Z, but h(X) < h(Z). We may assume that X covers Z. Since h is increasing at ∩-irreducible elements, Z cannot be ∩-irreducible. Therefore, there exists

Theorem 7.
Any lattice with seven or fewer elements is a Shannon lattice.
Proof. Up to isomorphism, there only exist finitely many lattices with seven elements or less. These are listed in the Appendix B. Each of these lattices has finitely many extreme polymatroid functions. These extreme polymatroid functions can be found by hand or by using the R program with package rcdd. All the extreme polymatroid functions on these lattices can be represented by a trivial lattice, or by the two-element chain 2, or by M 5 , or by M 6 , or by M 7 . All these lattices are representable, and thereby, they are Shannon lattices.
The number of lattices grows quite fast with the number of elements, and the number of elements is not the best way of comparing lattices.
The Boolean lattice with four atoms is the smallest non-Shannon Boolean algebra. Nevertheless, there are smaller non-Shannon lattices. Figure 2 illustrates the Matúš lattice, which is a lattice with just 11 elements that violates Inequality (1). This corresponds to the fact that the lattice in Figure 2 is not equivalent to a lattice of subgroups of a finite group. The lattices that are equivalent to lattices of subgroups of finite groups have been characterized [30], but the characterization is too complicated to describe here. Using the ideas from [31], one can prove that the Matúš lattice in Figure 2 has infinitely many non-Shannon inequalities. Therefore, any lattice that contains the Matúš lattice as a ∩-semilattice also has infinitely many non-Shannon inequalities. The result of Matúš has recently found a parallel in matroid theory. An infinite set of inequalities is needed in order to characterize representable matroids [32-34].

The Skeleton of a Lattice
In this section, we will develop a cutting-and-gluing technique that can be used to handle many lattices, but it is especially useful for planar lattices. We present the notion of tolerance. Further details about this concept can be found in the literature [5,35].

Definition 4.
A symmetric and reflexive relation Θ on a lattice is called a tolerance relation if X_1 Θ X_2 and Y_1 Θ Y_2 imply:

    (X_1 ∩ Y_1) Θ (X_2 ∩ Y_2)

and

    (X_1 ⊎ Y_1) Θ (X_2 ⊎ Y_2).

If Θ is a tolerance relation, then for any X, the set {Y ∈ L | X Θ Y} is an interval in the lattice. These intervals are called the blocks of Θ, and the blocks will be denoted [X]_Θ. For a tolerance relation, the blocks may be considered as elements of the factor L/Θ, and this factor has a natural structure as a lattice. Congruence relations are special cases of tolerance relations, but in general, the blocks of a tolerance relation may overlap. We note that if the intersection of two blocks is non-empty, then the intersection is a sublattice. If X ∈ L/Θ, then L_X will denote the block in L determined by X. We define a glued tolerance relation as a tolerance relation where X covering Y in L/Θ implies that L_X ∩ L_Y ≠ ∅.
A tolerance relation can be identified with a subset of L × L, so tolerance relations are ordered by subset ordering. The trivial tolerance relation is the one where xΘy holds for all x, y ∈ L, and this tolerance relation is the greatest tolerance relation. A glued tolerance relation contains any covering pair, and glued tolerance relations are characterized by this property. Therefore, the intersection of two glued tolerance relations is a glued tolerance relation, and the set of glued tolerance relations forms a lattice. The smallest glued tolerance relation is denoted Σ(L) and is called the skeleton of the lattice. An example of a planar modular lattice is given in Figure 3, and the skeleton is given in Figure 4.
for all X, Y where X ∩ Y is covered by X and Y, then the function h is submodular on L.
Proof. First, we prove that if the function h satisfies: for all X, Y where X ∩ Y is covered by X, then the function h is submodular on L. Let A and B denote two lattice elements. Define sequences X_1 ⊆ X_2 ⊆ ⋯ ⊆ X_n = A and Y_1 ⊆ Y_2 ⊆ ⋯ ⊆ Y_n = A ⊎ B by first defining X_1 = A ∩ B and Y_1 = B. Assume that X_1 is an element that covers A ∩ B and such that X_1 ≤ A. Let X_{i+1} ⊆ A be a cover of A ∩ Y_i, and let Y_{i+1} = X_{i+1} ⊎ Y_i. Then: Adding all these inequalities leads to: and the inequality is obtained because h is increasing so that h(X_{i+1} ∩ Y_i) − h(X_i) ≥ 0 and because X_{i+1} ∩ Y_i ⊇ X_i by construction of the sequences. To see that, we just need to check submodularity when B covers A ∩ B proven in the same way.
Proposition 8. Let L be a lattice with a tolerance relation Θ, and let h ∶ L → R denote some function. Then, h is polymatroid if and only if the restriction of h to any block L_X is polymatroid.
If $h$ is entropic, then the restriction to each block is entropic. Characterizing the blocks of a lattice has been done for certain classes of lattices, but here, we shall only mention a single result.
Theorem 8 ([36]). The blocks of a modular lattice are the maximal atomistic intervals.
In particular, the skeleton of a modular lattice consists of blocks that are geometric lattices.

Results for Planar Lattices
In this section, we will restrict our attention to planar lattices. There are several reasons for this restriction. First of all, any poset with a planar Hasse diagram is a lattice if and only if it has a least element and a greatest element [37]. As a consequence, any ∩-semilattice of a planar lattice is also a planar lattice. Certain cut-and-glue techniques are also very efficient for planar lattices. Finally, both planar distributive lattices and planar modular lattices have nice representations that will play a central role in our proofs.
Proof. The proof is by induction on the number of elements in the lattice. For a trivial lattice, there is nothing to prove. Assume that the theorem has been proven for all lattices with fewer elements than the number of elements of $L$. Assume that $h$ is a polymatroid. Since the lattice is planar, it has a left boundary chain $\emptyset \subset L_1 \subset L_2 \subset \dots \subset L_m$ and a right boundary chain $\emptyset \subset R_1 \subset R_2 \subset \dots \subset R_n$, where $L_m = R_n$ is the maximal element of $L$. Let $R_k$ be the minimal element of the right boundary chain such that $L_1 \subseteq R_k$. We note that $R_k = L_1 \uplus R_{k-1}$. Let $L_j$ denote the largest element in the left boundary chain such that $L_j \subseteq R_k$. Then, there is a chain from $L_j$ to $R_k$, and we have a glued tolerance relation with two blocks $L_0 = \{X \in L \mid X \subseteq R_k\}$ and $L_1 = \{X \in L \mid X \supseteq L_j\}$ and with the two-element chain lattice 2 as the factor lattice. These two blocks are glued together along a chain $L_j = y_1 \subset y_2 \subset \dots \subset y_t = R_k$ that $L_1$ and $L_0$ share. There are two cases: either $R_k \subset R_n$ or $R_k = R_n$.
Assume that $R_k \subset R_n$. Then, the glued tolerance relation is non-trivial. Since the restrictions of $h$ to $\{X \in L \mid X \supseteq L_j\}$ and $\{X \in L \mid X \subseteq R_k\}$ are probabilistically representable by the induction hypothesis, we may without loss of generality assume that there exist two groups $G_1$ and $G_0$ such that to each $X \in L_i$, there is a subgroup $G_i(X) \subseteq G_i$ such that $h(X) = \ln [G_i : G_i(X)]$. We associate with $X$ the variable $X_{G_i(X)}$ that maps an element $g \in G_i$ into the left coset $g G_i(X)$. The goal is to find a joint distribution for the set of variables associated with the elements $X \in L$. We note that all variables in $L_0$ are functions of $R_k$, so if we map $X_{G_1(R_k)}$ into $X_{G_0(R_k)}$, all other variables in $L_0$ are determined. In particular, the chain $y_1 \subset y_2 \subset \dots \subset y_t$ is determined by $R_k = y_t$. The sequence $X_{G_1(y_i)}$ is mapped into the sequence $X_{G_0(y_i)}$ recursively, starting with mapping $X_{G_1(y_1)}$ into $X_{G_0(y_1)}$. This is possible since $X_{G_1(y_1)}$ and $X_{G_0(y_1)}$ are uniform distributions on sets of the same size. Now, there are equally many values of $X_{G_1(y_2)}$ and $X_{G_0(y_2)}$ that map into the same values of $X_{G_1(y_1)}$ and $X_{G_0(y_1)}$, so the values of $X_{G_1(y_2)}$ and $X_{G_0(y_2)}$ can be mapped into each other. We continue like that until all the random variables along the chain $y_1 \subset y_2 \subset \dots \subset y_t$ have been identified.
If $R_k = R_n$, then we make a similar construction with the roles of the left chain and the right chain reversed. If this leads to a non-trivial glued tolerance relation, we glue representations together as we did above.
If both the left chain and the right chain lead to trivial glued tolerance relations, then $L_1 \uplus R_1$ is the maximal element of $L$, and the whole lattice consists of a single block in $\Sigma(L)$. In this case, the content of the theorem is trivial.
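The group representation used in the proof assigns to a subgroup the variable that maps a uniformly distributed group element to its coset, and the entropy of this variable is the logarithm of the index of the subgroup. The following minimal sketch illustrates this identity only. The group $\mathbb{Z}_6$ and the names in the code are assumptions chosen for illustration, and since the group is Abelian, left and right cosets coincide.

```python
from collections import Counter
from math import log, isclose

N = 6  # hypothetical example group: Z_6 under addition modulo 6

SUBGROUPS = {
    "trivial": frozenset({0}),
    "order 2": frozenset({0, 3}),
    "order 3": frozenset({0, 2, 4}),
    "whole group": frozenset(range(N)),
}


def coset_entropy(n, subgroup):
    """Entropy in nats of the coset g + H when g is uniform on Z_n."""
    # Count how many group elements g fall into each coset g + H.
    cosets = Counter(frozenset((g + s) % n for s in subgroup) for g in range(n))
    return -sum(c / n * log(c / n) for c in cosets.values())


ENTROPIES = {name: coset_entropy(N, H) for name, H in SUBGROUPS.items()}
# Logarithm of the index [G : H] = |G| / |H| for each subgroup.
LOG_INDICES = {name: log(N // len(H)) for name, H in SUBGROUPS.items()}

for name in SUBGROUPS:
    print(name, isclose(ENTROPIES[name], LOG_INDICES[name], abs_tol=1e-12))
```

Each coset variable is uniform on the set of cosets, which is why variables attached to subgroups of equal index can be identified with each other as in the gluing step.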
Theorem 10. All planar modular lattices are Shannon lattices.
Proof. Without loss of generality, we may assume that the lattice consists of just one block for the tolerance relation $\Sigma(L)$. A modular block is atomistic, and an atomistic modular planar lattice is equivalent to the trivial lattice, to the lattice 2, to the lattice 2 × 2, or to one of the lattices $M_n$.
Our construction actually tells us more. If the lattice is distributive, it is glued together from blocks that are equivalent either to the lattice 2 or to the lattice 2 × 2. Therefore, the lattice is a sublattice of a product of two chains, as illustrated in Figure 5. This result was first proven by Dilworth [38]. Other characterizations of planar distributive lattices can be found in the literature [39]. Since the extreme polymatroid functions on the lattice 2 and the lattice 2 × 2 only take the values zero and one, the same is true for any planar distributive lattice.
A modular planar lattice will also contain blocks of the type $M_n$. Therefore, a modular planar lattice can be obtained from a distributive planar lattice by adding double irreducible elements [40], as illustrated in Figure 6. Since $M_n$ has extreme polymatroid functions that take the values 0, 1/2 and 1, the extreme functions are modular. Gluing such modular functions together leads to extreme polymatroid functions that are modular. Therefore, all extreme polymatroid functions on a planar modular lattice can be represented by a planar modular lattice with a modular function, and the independence structure is given by $(X \perp Y \mid Z)$ when $Z = (X \uplus Z) \cap (Y \uplus Z)$.
The extreme polymatroid functions on a planar modular lattice can be represented as follows. Let $X_1, X_2, \dots, X_m, Y_1, Y_2, \dots, Y_n$ denote independent random variables uniformly distributed over $\mathbb{Z}_p$ for some large value of $p$. Let $Z_{ij}$ denote the random variable: and let $Z_{ijk}$ denote the random variable: for $k > 0$. The way to index the variables can be seen in Figure 7. Then, the entropy is proportional to the ranking function.
A polymatroid function $h$ that has a representation given by an Abelian group satisfies the Ingleton inequalities [41], i.e., inequalities of the form:
$$I(X; Y) \le I(X; Y \mid Z) + I(X; Y \mid W) + I(Z; W).$$
Therefore, the Shannon inequalities imply the Ingleton inequalities as long as the polymatroid function is defined on a planar modular lattice. Paajanen [42] has proven that under some conditions, the entropy function of a nilpotent p-group can be represented by an Abelian group. The core of the proof was that the subgroup lattice of a nilpotent p-group is also the subgroup lattice of an Abelian group. Many of these lattices are planar, and in these cases, the results by Paajanen follow from our results on planar lattices.
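For a single block of type $M_n$, a representation of this kind can be checked by direct enumeration. The sketch below is an illustration under stated assumptions and is not the paper's own indexing of the variables: it takes two independent uniform variables $X$ and $Y$ on $\mathbb{Z}_p$ with $p$ prime and uses the $n$ atoms $X$, $Y$ and $X + kY \bmod p$ for $k = 1, \dots, n-2$, which requires $n - 2 < p$. The function names are hypothetical.

```python
from collections import Counter
from itertools import product
from math import log, isclose


def entropy(values):
    """Shannon entropy in nats of the empirical distribution of `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum(c / total * log(c / total) for c in counts.values())


def mn_entropies(p, n):
    """Entropies of the n atoms X, Y, X + k*Y (mod p) and of every pair of atoms."""
    outcomes = list(product(range(p), repeat=2))  # (x, y) uniform on Z_p x Z_p
    atoms = [lambda x, y: x, lambda x, y: y]
    # k is bound as a default argument so each atom keeps its own multiplier.
    atoms += [lambda x, y, k=k: (x + k * y) % p for k in range(1, n - 1)]
    single = [entropy([f(x, y) for x, y in outcomes]) for f in atoms]
    pairs = [entropy([(f(x, y), g(x, y)) for x, y in outcomes])
             for i, f in enumerate(atoms) for g in atoms[i + 1:]]
    return single, pairs


P, N_ATOMS = 5, 4  # assumed example: M_4 represented over Z_5
SINGLE, PAIRS = mn_entropies(P, N_ATOMS)
print(all(isclose(s, log(P)) for s in SINGLE))      # every atom has entropy ln p
print(all(isclose(t, 2 * log(P)) for t in PAIRS))   # every join of two atoms has 2 ln p
```

After dividing by the entropy $2 \ln p$ of the top element, the function takes the values 0, 1/2 and 1 on the bottom, the atoms and the top, so in this example the entropy is proportional to the rank in $M_n$.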

Discussion
In this paper, we have proven that the three basic Shannon inequalities are sufficient for certain lattices. It would be a major step forward if one could make a complete characterization of lattices without non-Shannon inequalities, but this may be too ambitious. In order to obtain results, one may have to restrict to certain classes of lattices like general modular lattices or geometric lattices. For handling such lattices, one would have to develop new techniques that may also be of wider interest.
Lattices seem to fall into two types. For one type, one does not have non-Shannon inequalities. For the other type, there are infinitely many non-Shannon inequalities. We do not know of any lattice with non-Shannon inequalities where the entropic functions are characterized by finitely many inequalities. Apparently, the complexity increases from three basic inequalities to infinitely many inequalities, and this transition seems to happen due to the Matúš lattice. Similarly, matroids in general have no finite characterization, and conditional independence does not have a finite characterization. It appears that the leap from low complexity to infinite complexity happens for the same reason in each case and is related to the structure of the Matúš lattice. In this paper, we have provided some basic results and a common terminology that should be useful for further exploration of this research area.
Bayesian networks and similar graphical models have not been discussed in the present paper. Nevertheless, Bayesian networks are closely related to functional dependencies, so important properties of Bayesian networks can be translated into lattice language. This will be the topic of a separate publication [43], but some preliminary results have already been published [7].
We have seen how a separoid relation generates a notion of functional dependence. For modular lattices, we have also seen that the lattice structure generates a separoid relation. It is an open question to what extent general lattices are born with a canonical notion of conditional independence that can be formalized in terms of separoids. For functional dependencies corresponding to Bayesian networks, this question has been studied in detail [16], but more general results related to these questions would be of great importance to our understanding of concepts related to cause and effect.
The next five lattices have extreme points that can be represented by $M_5$ or the lattice 2. The first two lattices are modular, but not distributive. The next three are not modular.
The extreme points of the last nine lattices are all represented by the lattice 2. The first four are not modular.
The next seven lattices have extreme points that can be represented by $M_4$, $M_3$ or the lattice 2. The first two lattices are modular. The last five lattices are not modular.