1. Introduction
A poset is a fundamental concept in mathematics that formalizes the notion of ordering in a collection of objects. Formally, a poset is a pair $(V, \leq_P)$ consisting of a finite set $V$ and a binary relation $\leq_P \subseteq V \times V$ that is reflexive, antisymmetric and transitive. In this paper, we only consider strict posets, written here as $P = (V, <_P)$, where the binary relation is antisymmetric and transitive but irreflexive. An example of a strict poset is shown in Figure 1a. From here on, all posets being discussed refer to strict posets.
Every poset $P = (V, <_P)$ also corresponds to a directed acyclic graph (DAG) having the vertex set $V$ and the edge set $\{(u, v) : u <_P v\}$. However, a poset is more commonly illustrated using a Hasse diagram that corresponds to the transitive reduction of the DAG. Figure 1b shows the Hasse diagram of the poset $P$ defined in Figure 1a.
The binary relation in a poset is called a partial order since not all pairs need to be related. When all pairs are related, the binary relation is called a total order, and the poset is said to be a totally ordered set or a linear order. Formally, a linear order $L = (V, <_L)$ is a poset where for all distinct elements $u, v \in V$, either $u <_L v$ or $v <_L u$. We can therefore treat a linear order as a permutation of the elements in $V$. Moreover, a linear order $L$ is said to be a linear extension of poset $P$ if and only if $<_P \subseteq <_L$. The set of all linear extensions of $P$ is denoted by $\mathcal{L}(P)$. For example, Figure 1c shows the set of linear extensions of the poset $P$ in Figure 1a.
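As a minimal sketch of these definitions, the following Python function enumerates the linear extensions of a small strict poset by brute force. The representation (a set of pairs `(u, v)` meaning that u precedes v) and the example poset are assumptions made for illustration. The poset below is hypothetical and is not the one in Figure 1, and the permutation scan is far slower than the generation algorithms from the literature.

```python
from itertools import permutations


def linear_extensions(elements, relation):
    """Return all linear extensions of the strict poset (elements, relation).

    `relation` is a set of pairs (u, v) meaning u precedes v. Every
    permutation is tested, so this is only suitable for tiny examples.
    """
    result = set()
    for order in permutations(sorted(elements)):
        position = {x: i for i, x in enumerate(order)}
        # A permutation is a linear extension if it respects every pair.
        if all(position[u] < position[v] for u, v in relation):
            result.add(order)
    return result


# Hypothetical tree-poset: 1 < 2, 1 < 3, 2 < 4 (and 1 < 4 by transitivity).
V = {1, 2, 3, 4}
R = {(1, 2), (1, 3), (2, 4), (1, 4)}
exts = linear_extensions(V, R)
```

Each returned tuple is a permutation of `V` read from the smallest to the largest element of the linear order.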
Generating the linear extensions of a poset, which is equivalent to generating all topological sortings of a directed acyclic graph, is a well-studied problem. Existing algorithms [1,2] generate each successive linear extension or topological sorting in polynomial time. The reverse problem, in which the input is a set of linear extensions (more precisely, a set of linear orders) and the goal is to determine a set of posets that generate the given linear orders, is the Poset Cover Problem. Formally, the Poset Cover Problem is defined as follows [3]:
Poset Cover Problem
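Instance: A set of linear orders $\Upsilon = \{L_1, L_2, \ldots, L_m\}$ over the set $V = \{1, 2, \ldots, n\}$.
Solution: A set of posets $P^* = \{P_1, P_2, \ldots, P_k\}$ where $\bigcup_{P_i \in P^*} \mathcal{L}(P_i) = \Upsilon$ and $k$ is minimum.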
The Poset Cover Problem finds application in data mining, where there is a massive set of sequential data and the goal is to determine directed networks that explain the ordering of objects in the data. A simple example is in marketing, when a business wants to generate a directed process model from the logs of customers' purchases of its products. The logs can be treated as linear orders of the products. Determining directed process models that explain the purchasing behavior of the customers can then be represented as finding a set of posets that generate the linear orders (logs). The process model can then be used to predict other customers' future purchases, and the business could develop marketing campaigns from it. Beyond this kind of data mining, the problem is also relevant in other areas such as neuroscience [4], chemical engineering [5], epidemiology [6], paleontology [7,8], and systems biology [9].
The decision version of the Poset Cover Problem has been shown to be NP-Hard [3]. To deepen the understanding of the problem, past efforts have focused on studying constrained cases and on drawing the boundary between the cases that are in P and the cases that are NP-Hard. There are essentially two ways in which the problem can be restricted. The first is to consider only a specific number of posets, say $k$. The problem when $k = 1$, or the 1-Poset Cover Problem, is in P [10]. The computational complexity of the problem when $k = 2$, or the 2-Poset Cover Problem, is not yet known; what have been devised so far are heuristics for the 2-Poset Cover Problem [11]. The other way of restricting the problem is to consider only a specific class of posets according to their Hasse diagram, such as tree-posets. The poset in Figure 1 is an example of a tree-poset since its Hasse diagram is a tree. The restricted cases of the Poset Cover Problem are also important, since there are instances in which the class of posets to be reconstructed is known. For example, in paleontology, the goal is to construct an evolutionary ordering of fossil sites from sequential data about the taxa that occur in each site [7,8]. Evolutionary orderings are usually expressed using trees because evolution starts with an origin and branches out to descendants. Other classes of posets that have been studied are hammock posets and leveled posets [10]. The 2-Tree-Poset Cover Problem is then a constrained variation where the goal is to determine whether there exist two tree-posets that together cover exactly the given set of linear orders.
In our study, we derived properties on posets, which lead to an exact solution for the 2-Poset Cover Problem. The algorithm runs in exponential time. However, if the posets to be considered are tree-posets, the running time of the algorithm becomes polynomial. This proves that the 2-Tree-Poset Cover Problem is also in P.
2. Definitions
We first define here the terms and notations used in the discussion of results.
Definition 1. ancestors(v,P)
Given a poset $P = (V, <_P)$ and $v \in V$, $ancestors(v, P)$ is the set of elements in poset $P$ that precede $v$, i.e., $ancestors(v, P) = \{u \in V : u <_P v\}$.
Definition 2. descendants(v,P)
Given a poset $P = (V, <_P)$ and $v \in V$, $descendants(v, P)$ is the set of elements in poset $P$ that succeed $v$, i.e., $descendants(v, P) = \{u \in V : v <_P u\}$.
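Definitions 1 and 2 translate directly into set comprehensions. The sketch below assumes, for illustration only, that a poset is stored as a transitively closed set of pairs `(u, v)` meaning that u precedes v; the example relation is hypothetical.

```python
def ancestors(v, relation):
    """Elements that precede v; `relation` must be transitively closed."""
    return {u for (u, w) in relation if w == v}


def descendants(v, relation):
    """Elements that succeed v; `relation` must be transitively closed."""
    return {w for (u, w) in relation if u == v}


# Hypothetical poset on {1, 2, 3, 4}: 1 < 2, 1 < 3, 2 < 4, 1 < 4.
R = {(1, 2), (1, 3), (2, 4), (1, 4)}
anc_4 = ancestors(4, R)
desc_1 = descendants(1, R)
```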
Definition 3. cover relation
Given a poset $P = (V, <_P)$, its cover relation is $\prec_P = \{(u, v) : u <_P v$ and there is no $w \in V$ where $u <_P w <_P v\}$.
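The cover relation can be computed from a transitively closed relation by discarding every pair that has an intermediate element. The following sketch assumes the same illustrative pair-set representation as before; the function name and the example are not from the paper.

```python
def cover_relation(elements, relation):
    """Return the pairs (u, v) of `relation` with no w strictly between them.

    `relation` is assumed to be a transitively closed set of pairs (u, v)
    meaning u precedes v.
    """
    return {
        (u, v)
        for (u, v) in relation
        if not any((u, w) in relation and (w, v) in relation for w in elements)
    }


# Hypothetical poset: 1 < 2, 1 < 3, 2 < 4, and 1 < 4 by transitivity.
V = {1, 2, 3, 4}
R = {(1, 2), (1, 3), (2, 4), (1, 4)}
covers_of_R = cover_relation(V, R)
```

The pair `(1, 4)` is dropped because 2 lies between 1 and 4, which matches the edges that a Hasse diagram of this poset would show.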
Definition 4. cover
The term cover is used in many instances for different objects in the discussion.
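Given two elements $u, v$ of poset $P$, we say that $u$ covers $v$ if and only if $u \prec_P v$. In this instance, $(u, v)$ is also said to be a cover pair in $P$.
Given a set of linear orders $\Upsilon$ and poset $P$, we say that $P$ covers $\Upsilon$ if and only if $\mathcal{L}(P) = \Upsilon$.
Given a set of linear orders $\Upsilon$ and a set of posets $P^*$, we say that $P^*$ covers $\Upsilon$ if and only if $\bigcup_{P \in P^*} \mathcal{L}(P) = \Upsilon$.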
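To make the last two senses of "cover" concrete, the sketch below checks whether a list of posets covers a set of linear orders exactly. The brute-force enumeration, the function names and the example data are assumptions for illustration, not part of the paper.

```python
from itertools import permutations


def linear_extensions(elements, relation):
    """All permutations of `elements` that respect every pair in `relation`."""
    result = set()
    for order in permutations(sorted(elements)):
        position = {x: i for i, x in enumerate(order)}
        if all(position[u] < position[v] for u, v in relation):
            result.add(order)
    return result


def covers(elements, posets, orders):
    """True if the union of the posets' linear extensions equals `orders`."""
    covered = set()
    for relation in posets:
        covered |= linear_extensions(elements, relation)
    return covered == set(orders)


# Hypothetical instance over V = {1, 2, 3}.
V = {1, 2, 3}
P_a = {(1, 2)}            # 1 precedes 2
P_b = {(1, 3), (2, 3)}    # 3 is last
orders = {(1, 2, 3), (1, 3, 2), (3, 1, 2), (2, 1, 3)}
both_cover = covers(V, [P_a, P_b], orders)
single_covers = covers(V, [P_a], orders)
```

Here the two posets together cover the four linear orders, while `P_a` alone misses `(2, 1, 3)`.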
Definition 5. Hasse Diagram of Poset
A poset $P = (V, <_P)$ corresponds to a Hasse diagram, which is a directed acyclic graph $G$ with the elements in $V$ as nodes and the pairs in the cover relation of $P$ as edges, i.e., $G = (V, \prec_P)$.
Definition 6. Tree Poset
A tree-poset is a poset whose Hasse diagram is a rooted directed tree with each non-root node being covered by exactly one node.
Definition 7. parent(v,P)
Given a tree-poset $P = (V, <_P)$ and $v \in V$ such that $v$ is not the root node, $parent(v, P)$ is the element $u$ that covers $v$, i.e., $u = parent(v, P)$ if and only if $u \prec_P v$.
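Definitions 6 and 7 can be checked mechanically on small examples. The sketch below is one possible reading of the tree-poset definition (exactly one element with no covering element, and at most one covering element for every other element); the helper names and the example posets are hypothetical.

```python
def cover_relation(elements, relation):
    """Pairs (u, v) of a transitively closed relation with nothing in between."""
    return {
        (u, v)
        for (u, v) in relation
        if not any((u, w) in relation and (w, v) in relation for w in elements)
    }


def covering_elements(elements, relation):
    """Map each element v to the set of elements that cover it."""
    result = {v: set() for v in elements}
    for u, v in cover_relation(elements, relation):
        result[v].add(u)
    return result


def is_tree_poset(elements, relation):
    """True if there is one root and every other element has one parent."""
    covered_by = covering_elements(elements, relation)
    roots = [v for v in elements if not covered_by[v]]
    return len(roots) == 1 and all(len(p) <= 1 for p in covered_by.values())


def parent(v, elements, relation):
    """The unique element covering v; raises ValueError if it is not unique."""
    (u,) = covering_elements(elements, relation)[v]
    return u


V = {1, 2, 3, 4}
tree = {(1, 2), (1, 3), (2, 4), (1, 4)}             # Hasse diagram is a tree
diamond = {(1, 2), (1, 3), (2, 4), (3, 4), (1, 4)}  # 4 is covered by 2 and 3
```

Because a strict poset is acyclic, a single root together with one parent per remaining element is enough for the Hasse diagram to be a rooted tree.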
Definition 8. 2-Poset Cover Problem
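Instance: A set of linear orders $\Upsilon = \{L_1, L_2, \ldots, L_m\}$ over the set $V = \{1, 2, \ldots, n\}$.
Question: Does there exist a pair of distinct posets $P_1$ and $P_2$ such that $\mathcal{L}(P_1) \cup \mathcal{L}(P_2) = \Upsilon$ and $\mathcal{L}(P_1)$ is neither a subset nor a superset of $\mathcal{L}(P_2)$?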
Definition 9. 2-Tree-Poset Cover Problem
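Instance: A set of linear orders $\Upsilon = \{L_1, L_2, \ldots, L_m\}$ over the set $V = \{1, 2, \ldots, n\}$.
Question: Does there exist a pair of distinct tree-posets $P_1$ and $P_2$ such that $\mathcal{L}(P_1) \cup \mathcal{L}(P_2) = \Upsilon$ and $\mathcal{L}(P_1)$ is neither a subset nor a superset of $\mathcal{L}(P_2)$?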
Definition 10. comparable and incomparable
Given a poset $P = (V, <_P)$ and elements $u, v \in V$, $u$ and $v$ are comparable, denoted by $u \perp_P v$, if $u <_P v$ or $v <_P u$. Otherwise, $u$ and $v$ are incomparable, denoted by $u \parallel_P v$.
3. Theoretical Bases
Before we discuss the algorithms, we present the following lemmas and theorems that serve as bases in devising the algorithms.
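Lemma 1. Given posets $P_1 = (V, <_{P_1})$ and $P_2 = (V, <_{P_2})$, if $<_{P_1} \subseteq <_{P_2}$, then $\mathcal{L}(P_2) \subseteq \mathcal{L}(P_1)$.
Proof. Let the linear order $L$ be a linear extension of $P_2$, i.e., $L \in \mathcal{L}(P_2)$. By definition of linear extension, $<_{P_2} \subseteq <_L$. Since $<_{P_1} \subseteq <_{P_2}$, then by transitivity, $<_{P_1} \subseteq <_L$. This implies that $L \in \mathcal{L}(P_1)$. Hence, $\mathcal{L}(P_2) \subseteq \mathcal{L}(P_1)$. □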
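Given a pair that is incomparable in $P$, say $a \parallel_P b$, we know that there are some linear extensions $L$ of $P$ where $a <_L b$, while $b <_L a$ in the remaining ones. In other words, we can partition $\mathcal{L}(P)$ into $\{L \in \mathcal{L}(P) : a <_L b\}$ and $\{L \in \mathcal{L}(P) : b <_L a\}$. The next theorem shows that there exist posets $P_1$ and $P_2$ that cover these two parts, respectively. Moreover, we can derive the relations of $P_1$ and $P_2$ from that of $P$, which is also given in the following theorem.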
Theorem 1. Consider a poset and distinct elements where . Let and . Moreover, let Then, there exist posets and where and such that and . Moreover, in terms of cover relation,
To illustrate the theorem, consider the posets
P,
and
with Hasse diagrams in
Figure 2. Since
, let us take elements 3 and 4 as
a and
b in the theorem, respectively. If we generate
, we have the following linear extensions.
We can partition
into two -
and
. Clearly,
contains all the linear extensions in the left column while
contains those in the right column. From the theorem, there exist posets
and
that cover
and
, respectively. In our example, they are the posets
and
in
Figure 2. Verify that
and
. Moreover,
In terms of cover relation, which is the transitive reduction of
and
, we have the following:
Proof. We first show that where is a poset. Since P is a poset, then we know that is irreflexive, antisymmetric and transitive. We also know that a and b are distinct elements. An element x cannot be both in and , otherwise, a and b are related in . Hence, we can say that is also irreflexive and antisymmetric. To show that is transitive, suppose , then we have the following cases:
and
By transitive property of poset P, . Hence,
and
By the definition of A, and . It cannot be that or because of the assumption that a and b are distinct. It cannot also be that and , otherwise the assumption that is violated. Hence, we can say that this case is not possible.
and
By the definition of A, and . If , then . Hence and thus, . On the other hand, if , then . By the transitive property of , . In other words, . Hence, and thus, .
and
By the definition of A, and . If , then . Hence and thus, . On the other hand, if , then . By the transitive property of , . In other words, . Hence and thus, .
From these cases, we can say that is transitive. Hence, is a poset. In a similar way, we can also show that is a poset.
Next, we show that by showing that and .
To prove the first direction, suppose . Then . This implies that . Since , then we can also say that . This means, . Hence, .
To prove the other direction, suppose . Then, and . Now, let . It suffices to show that . This clearly holds if , since . Otherwise, , which implies that ( or ) and ( or ), whence ( or ) and ( or ). In any case, since , we get by transitivity.
In a similar way, it can also be shown that .
Now, let us determine the cover relation . We first determine which pairs are cover pairs in . If , we know that by definition of , there exists no such that and , otherwise . Since , then we also know that there exists no such that and . Hence, . On the other hand, any other value of in will lead to (), () or (), which all imply that . Hence, the only pair in that is covered in is . Now, we know that corresponds to . Hence, is the transitive reduction of . To get the transitive reduction, we have to eliminate pairs in that are not in because the addition of no longer makes them covered. These are the following:
where
In this case , hence, .
where
In this case , hence,
Hence, . In a similar way, we can derive the formula for . □
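The statement of Theorem 1 can be checked numerically on a small case. The sketch below splits the linear extensions of a hypothetical tree-poset by the relative order of an incomparable pair and confirms that each part is exactly the set of linear extensions of one poset. As an assumption made for brevity, the two posets are obtained by adding the pair in one direction and taking the transitive closure, a generic construction rather than the explicit formulas of the theorem.

```python
from itertools import permutations


def linear_extensions(elements, relation):
    """All permutations of `elements` that respect every pair in `relation`."""
    result = set()
    for order in permutations(sorted(elements)):
        position = {x: i for i, x in enumerate(order)}
        if all(position[u] < position[v] for u, v in relation):
            result.add(order)
    return result


def transitive_closure(relation):
    """Smallest transitive relation containing `relation`."""
    closure = set(relation)
    while True:
        new_pairs = {
            (u, x) for (u, v) in closure for (w, x) in closure if v == w
        } - closure
        if not new_pairs:
            return closure
        closure |= new_pairs


# Hypothetical tree-poset in which 3 and 4 are incomparable.
V = {1, 2, 3, 4}
P = {(1, 2), (1, 3), (2, 4), (1, 4)}
a, b = 3, 4

P1 = transitive_closure(P | {(a, b)})   # forces a before b
P2 = transitive_closure(P | {(b, a)})   # forces b before a

all_exts = linear_extensions(V, P)
a_first = {L for L in all_exts if L.index(a) < L.index(b)}
b_first = all_exts - a_first
```

In this example `P1` only gains the pair `(3, 4)`, while `P2` gains `(4, 3)` and, by transitivity, `(2, 3)`.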
Corollary 1. If the poset in Theorem 1 is a tree-poset, then Equations (3) and (4) become Equations (5) and (6), respectively.
Proof. From Theorem 1, . First, we show that if P is a tree-poset. Suppose there exists such a v where and . Since every non-root node in a tree-poset is covered by only one element, only a covers v. Then it must be that , so that . This contradicts the assumption that . Hence, .
Next, we examine the set
. If
P is a tree-poset, then there exists only one possible
u that covers
b, which is
. Hence, we have two possibilities. If
then
. Note that
, otherwise
. Hence, the only other condition is when
. With this condition, Equation (
3) then becomes
.
We can prove Equation (6) in a similar way. □
4. Algorithm for the 2-Poset Cover Problem
As mentioned earlier, there is already a polynomial-time solution for the 1-Poset Cover Problem [
10]. We can determine in
time, where
m is the number of linear orders over a base set of
n elements, the poset that covers a given set of linear orders. Let us denote the algorithm for the 1-Poset Cover Problem as GeneratePoset. The input to GeneratePoset is a set of linear orders and it returns a poset, if there exists such, that covers the input; otherwise it returns null. Now, for the 2-Poset Cover Problem with input
, a brute-force algorithm is then to determine
,
⊆
such that
and then use GeneratePoset to determine if there is a poset that covers
and a poset that covers
. Since there are
possibilities for values of
and
, the brute-force algorithm entails a running time of
.
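The brute-force idea can be sketched as follows. Note that `generate_poset` here is a hypothetical stand-in for GeneratePoset, not the polynomial-time algorithm of [10]: it intersects the pairwise orderings of the given linear orders and then verifies the candidate by exhaustive enumeration, which is only practical for tiny inputs. Each linear order is assigned to the first part, the second part, or both, giving $3^m$ assignments.

```python
from itertools import permutations, product


def linear_extensions(elements, relation):
    """All permutations of `elements` that respect every pair in `relation`."""
    result = set()
    for order in permutations(sorted(elements)):
        position = {x: i for i, x in enumerate(order)}
        if all(position[u] < position[v] for u, v in relation):
            result.add(order)
    return result


def generate_poset(elements, orders):
    """Hypothetical stand-in for GeneratePoset: a covering relation or None."""
    if not orders:
        return None
    relation = None
    for order in orders:
        pairs = {(order[i], order[j])
                 for i in range(len(order)) for j in range(i + 1, len(order))}
        relation = pairs if relation is None else relation & pairs
    # Keep only the pairs shared by all orders, then verify exact coverage.
    return relation if linear_extensions(elements, relation) == set(orders) else None


def brute_force_two_poset_cover(elements, orders):
    """Try all 3^m ways of splitting `orders` into two parts with full union."""
    orders = sorted(orders)
    for labels in product((1, 2, 3), repeat=len(orders)):
        part1 = {L for L, t in zip(orders, labels) if t in (1, 3)}
        part2 = {L for L, t in zip(orders, labels) if t in (2, 3)}
        if part1 <= part2 or part2 <= part1:
            continue  # neither part may contain the other
        p1 = generate_poset(elements, part1)
        p2 = generate_poset(elements, part2)
        if p1 is not None and p2 is not None:
            return p1, p2
    return None


V = {1, 2, 3}
orders = {(1, 2, 3), (1, 3, 2), (3, 1, 2), (2, 1, 3)}
single = generate_poset(V, orders)
pair = brute_force_two_poset_cover(V, orders)
```

For this hypothetical input no single poset covers the four orders, but a pair of posets does.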
Theorem 1 suggests a strategy of partitioning . Given a pair , we can partition into and . Suppose there exist two posets, say and that cover . If and , then we have partitioned perfectly because and . Hence, we can determine P and using GeneratePoset with and as inputs, respectively. This is executed in Lines 7 and 8 of Algorithm 1. However, such a pair does not always exist. There can also be instances where and and vice versa: there also exists a different pair, say where and . Without loss of generality on , suppose . Then, if we apply the same partitioning strategy, and . We are not sure if there exists a poset that covers , however, we are sure from Theorem 1 that there exists a poset that covers , say . Moreover, with the use of the equations in the Theorem, we can determine or reconstruct the relation of from . These are executed in Lines 22–32. We can also determine P in a similar way with the pair . This is executed in Lines 10–20 in the Algorithm.
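The simplest case of this strategy, in which a single pair of elements splits the input perfectly, can be sketched as below. This is only an illustration of the partitioning step under stated assumptions, not Algorithm 1 itself: the reconstruction of candidate posets from the equations of Theorem 1 is omitted, and `generate_poset` is again a hypothetical brute-force stand-in for GeneratePoset.

```python
from itertools import combinations, permutations


def linear_extensions(elements, relation):
    """All permutations of `elements` that respect every pair in `relation`."""
    result = set()
    for order in permutations(sorted(elements)):
        position = {x: i for i, x in enumerate(order)}
        if all(position[u] < position[v] for u, v in relation):
            result.add(order)
    return result


def generate_poset(elements, orders):
    """Hypothetical stand-in for GeneratePoset: a covering relation or None."""
    if not orders:
        return None
    relation = None
    for order in orders:
        pairs = {(order[i], order[j])
                 for i in range(len(order)) for j in range(i + 1, len(order))}
        relation = pairs if relation is None else relation & pairs
    return relation if linear_extensions(elements, relation) == set(orders) else None


def split_by_pair(elements, orders):
    """Look for a pair (a, b) whose two sides are each covered by one poset."""
    orders = set(orders)
    for a, b in combinations(sorted(elements), 2):
        a_first = {L for L in orders if L.index(a) < L.index(b)}
        b_first = orders - a_first
        if not a_first or not b_first:
            continue  # the pair is ordered the same way in every input order
        p1 = generate_poset(elements, a_first)
        p2 = generate_poset(elements, b_first)
        if p1 is not None and p2 is not None:
            return (a, b), p1, p2
    return None


V = {1, 2, 3}
orders = {(1, 2, 3), (1, 3, 2), (3, 1, 2), (2, 1, 3)}
result = split_by_pair(V, orders)
```

Only a polynomial number of pairs is examined here, in contrast to the $3^m$ assignments of the brute-force search, but a `None` result does not rule out a two-poset cover, since the remaining cases of the algorithm are not handled by this sketch.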
Theorem 2. Algorithm 1 produces a solution to the 2-Poset Cover Problem in time.
Proof. Let P and be the two different posets that cover , i.e., . In the following, we want to show that P and , if there exist such, can be determined and generated by the 2-Poset Cover Algorithm.
With the assumption that means also that , then there must exist, without loss of generality, at least one such that but . On the other hand, there must also exist such that but , otherwise, and by Lemma 1, can be covered by a single poset .
Let us first take the case where but . With this, we have two further cases.
In this case, and . Hence, posets and can be generated in Lines 5 and 6, respectively and are returned in Line 8.
In this case, . Hence, there exist linear extensions of where and linear extensions where . Let and . This implies that and . Let and be the return of GeneratePoset for and in Lines 4 and 5, respectively.
We know that
is not null because from Theorem 1 there exists a poset that covers
. Moreover, we can also reconstruct
from
using Equation (
4), i.e.,
where
and
.
In reconstructing
from
, let
be our working poset. First, we let
and if
, then we can already disregard the components
of
. This is because
. With Lemma 1, whatever set of linear orders that are covered by the poset with the later cover relation can be already covered by the poset with the former cover relation. These are executed with the “If” condition in Lines 23–24. Otherwise (the else part), we have to determine
C and
D. However, in the Algorithm, what we have only generated and know is the relation of
. We do not know
yet hence we also do not know
C and
D. However, we know that
and
, and with Equation (
2),
and
. Hence,
where
and
where
. Hence, we can try all possible subsets of
A and
B for
. This is what we have done in Lines 26–32. Every possible poset from each combination is a candidate poset and added to
.
Algorithm 1: 2-Poset Cover Algorithm
We can also do the same in determining P when the iteration evaluates the pair where but .
Lastly, to get P and in , we can try all possible pairs of posets that exactly cover in Lines 33–35.
Now, we determine the running time complexity of the algorithm.
The first and outermost for-loop iterates in
. One dominating execution inside it is the call to GeneratePoset, which runs in
[
10]. Another dominating part is the two inner for-loops, in Lines 16–20 or Lines 28–32, that iterates through all the pairwise combinations of subsets A and subsets of B. Note that an element cannot be both an ancestor and descendant of
a or
b. Hence
. If
, then
is at most
. Hence, there are
pairs of their subsets. Inside the two inner for-loops is a statement that gets all linear extensions of poset, i.e.,
. This can be done in
by using the algorithm of Pruesse and Ruskey [
1]. Hence, the total running time of the first outermost for-loop is
.
For the second outermost for-loop,
since the smallest posets (with respect to the number of linear extensions) are the linear orders themselves. Thus, the second for-loop iterates in
. Inside it is a set equality testing that can be performed in constant time [
12]. Hence, the second for-loop runs in
.
Thus, the total running time of the algorithm is . □
The running time of the solution for the 2-Poset Cover Problem is still exponential. However, it improves on the running time of a brute-force solution: the base drops from 3 to 2, and the exponent is in terms of $n$ rather than $m$. Hence, for large $m$, Algorithm 1 is much more efficient than a brute-force solution.
In the following, we show that when the posets are tree-posets, the running time of the algorithm becomes polynomial.
6. Conclusions
In this study, we explored the Poset Cover Problem, a hard problem that is relevant in the field of data mining. It is already known that we can determine a single poset that covers a given set of linear orders in polynomial time. In this paper, we extended the knowledge on the problem by investigating the hardness of the case where we want to determine two posets, if they exist, that cover a set of linear orders, which we called the 2-Poset Cover Problem.
We discovered properties of posets that lead to an exact solution for the 2-Poset Cover Problem. The algorithm runs in where m is the number of linear orders over a base set of n elements. When , the algorithm is significantly faster than a brute-force solution, which runs in . Since the algorithm runs in exponential time, the complexity class that the 2-Poset Cover Problem belongs to remains unknown. However, when the posets to be considered are tree-posets, which can be treated as combinatorial models for evolutionary orderings, the running time of the algorithm becomes polynomial. Hence, this proves that the more restricted case, which we called the 2-Tree-Poset Cover Problem, belongs to the computational complexity class P.