Article

Dictionary Encoding Based on Tagged Sentential Decision Diagrams

1 Department of Computer Science, College of Information Science and Technology, Jinan University, Guangzhou 510632, China
2 Guangdong-Macao Advanced Intelligent Computing Joint Laboratory, Zhuhai 519031, China
3 Key Laboratory of Safety of Intelligent Robots for State Market Regulation, Guangdong Testing Institute of Product Quality Supervision, Guangzhou 510670, China
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Algorithms 2024, 17(1), 42; https://doi.org/10.3390/a17010042
Submission received: 20 December 2023 / Revised: 10 January 2024 / Accepted: 12 January 2024 / Published: 18 January 2024

Abstract
Encoding a dictionary into another representation means storing all the words of the dictionary in a more efficient way. This allows common operations on dictionaries, such as (1) searching for a word, (2) adding words, and (3) removing words, to be completed in a shorter time. Binary decision diagrams (BDDs) are one of the most famous such representations and are widely popular due to their excellent properties. Recently, researchers have proposed encoding dictionaries into BDDs and some BDD variants and have shown that this is feasible. Hence, we further investigate the topic of encoding dictionaries into decision diagrams. Tagged sentential decision diagrams (TSDDs), one of these variants based on structured decomposition, exploit both the standard and zero-suppressed trimming rules. In this paper, we first introduce how to use Boolean functions to represent dictionary files, and we then design an algorithm that encodes dictionaries into TSDDs with the help of tries, together with a decoding algorithm that restores TSDDs to dictionaries. The use of tries greatly accelerates the encoding process. Considering that TSDDs integrate two trimming rules, we believe that TSDDs are more effective for representing dictionaries, and our experiments confirm this.

1. Introduction

A word is a string of symbols or characters, and a dictionary is a set of words. A large dictionary can consist of millions of words. With the evolution of computer technology, dictionaries appear in many practical settings. For example, teachers store the names of all students in the school on their computers; this list of students is actually a dictionary, where each student’s name is a word. To solve practical problems, we often need to perform common operations on dictionaries, for example, (1) searching for a word in a dictionary, (2) adding words to a dictionary, and (3) removing words from a dictionary. In general, the time complexity of these operations is O(n). When the size of a dictionary (that is, the number of words) is very large, the three operations above are time-consuming and ineffective. Therefore, designing an effective method for operations on dictionaries is worth investigating.
An effective method for operations in dictionaries is to encode the dictionary into an efficient and compact representation. Some scholars have proposed a decision-diagram-based method of operation for dictionaries [1]. However, they have not provided any algorithms or implementations of the method. Following the above-mentioned idea, in this paper, we represent large dictionaries in decision diagrams and solve the common operations for dictionaries via operations in decision diagrams.
Binary decision diagrams (BDDs) are unique canonical forms that adhere to specific constraints, namely, ordering and reduction, ensuring that each Boolean function possesses a distinct BDD representation. This characteristic minimizes the storage requirements of BDDs and facilitates O(1)-time equality tests on BDDs. After the emergence of the BDD, a variant known as the zero-suppressed BDD (ZDD) was introduced in [2]. ZDDs share similar characteristics with BDDs, such as canonicity and support for polynomial-time Boolean operations. The primary distinction between BDDs and ZDDs lies in their respective reduction rules. Building on the applications of BDDs and ZDDs, several extensions have been developed, including tagged BDDs (TBDDs) [3], chain-reduced BDDs (CBDDs) [1], chain-reduced ZDDs (CZDDs) [1], and edge-specified reduction BDDs (ESRBDDs) [4]. These extensions, which integrate two reduction rules, offer more compact representations than BDDs and ZDDs.
As decision diagram technology has matured, attention has turned to applying decision diagrams in many fields. In ref. [1], the researchers successfully transformed a dictionary into BDDs and two variants of BDDs, CBDDs and CZDDs. They started from the following three points: (1) A Boolean function can be used to represent a binary number. (2) Decision diagrams can be used to represent Boolean functions. (3) Characters and symbols can be encoded into binary codes. They then considered the possibility of using decision diagrams to represent characters and symbols and implemented it. Finally, they encoded two dictionaries containing hundreds of thousands of words into three kinds of decision diagrams and reported node counts and timing data. This confirms that encoding dictionaries into decision diagrams is a feasible research direction.
Hence, in this paper, we focus on applying another type of decision diagram, tagged sentential decision diagrams (TSDDs), to encoding dictionaries, and we first introduce TSDDs. To do so, we begin with sentential decision diagrams (SDDs), which are decision diagrams based on structured decomposition [5], while BDDs are based on Shannon decomposition [6]. While BDDs are characterized by a total variable order, SDDs are defined by a variable tree (vtree), a full binary tree with variables as its leaves, and apply the standard trimming rules. Furthermore, in ref. [7], the researchers introduced the zero-suppressed variant of the SDD, known as the ZSDD, which also utilizes structured decomposition but applies zero-suppressed trimming rules instead of the standard trimming rules used in SDDs. ZSDDs offer a more compact representation for sparse Boolean functions than SDDs, while SDDs are better suited for homogeneous Boolean functions. To leverage the strengths of both, ref. [8] devised a new decision diagram, the TSDD, which combines the standard and zero-suppressed trimming rules.
The contributions of this paper mainly include the following: (1) We propose an algorithm for encoding dictionaries into decision diagrams. We first transform a dictionary into a well-known data structure, the trie; with the help of tries, our algorithm can encode a dictionary into a decision diagram more efficiently than without them. (2) We encoded 14 dictionaries into seven kinds of decision diagrams, i.e., BDDs, ZDDs, CBDDs, CZDDs, SDDs, ZSDDs, and TSDDs, in four ways, and the experimental results show that TSDDs are the most compact of these representations for dictionaries. (3) We also designed an algorithm that decodes a decision diagram to recover the original dictionary. It recursively restores each word in the dictionary and then saves these words together, and it completes the decoding process quickly in most cases.
The rest of this paper is organized as follows. Section 2 introduces the syntax and semantics of SDDs and ZSDDs. Section 3 introduces the syntax of TSDDs and the binary operations on TSDDs; in particular, it explains how a TSDD denotes a Boolean function and presents the trimming rules of TSDDs. Ref. [8] proposed a related definition of TSDDs but used three different semantics to explain SDDs, ZSDDs, and TSDDs. We believe that the difference between these decision diagrams lies mainly in their trimming rules, so all three can be explained with the same semantics; this gives a more intuitive understanding of the three decision diagrams and a theoretical account of why TSDDs are more effective than SDDs and ZSDDs. Section 4 introduces how to use a Boolean function to represent a dictionary, the process of encoding a dictionary with the help of tries, and the decoding algorithm. An experimental evaluation comparing TSDDs with other decision diagrams appears in Section 5. Finally, Section 6 concludes this paper.

2. Preliminaries

Throughout this paper, we use lowercase letters (e.g., x₁, x₂) for variables and bold uppercase letters (e.g., X, Y) for sets of variables. For a variable x, we use x̄ to denote its negation. A literal is a variable or its negation. A truth assignment over X is a mapping σ : X → {0, 1}. We let Σ_X be the set of truth assignments over X. A Boolean function f over X is a mapping Σ_X → {0, 1}. We use 1 (resp. 0) for the Boolean function that maps all assignments to 1 (resp. 0).
Let X and Y be two disjoint, non-empty sets of variables. We use f to denote a Boolean function and f(X) to denote a Boolean function over the variable set X. The set {(f₁ᵖ(X), f₁ˢ(Y)), …, (fₙᵖ(X), fₙˢ(Y))} is an (X, Y)-decomposition of a Boolean function f(X, Y) iff f = (f₁ᵖ(X) ∧ f₁ˢ(Y)) ∨ ⋯ ∨ (fₙᵖ(X) ∧ fₙˢ(Y)), where every fᵢᵖ(X) (resp. fᵢˢ(Y)) is a Boolean function over X (resp. Y); the fᵢᵖ are called primes and the fᵢˢ subs. A decomposition is compressed iff fᵢˢ ≠ fⱼˢ for i ≠ j. An (X, Y)-decomposition is called an (X, Y)-partition iff (1) fᵢᵖ ≠ 0 for 1 ≤ i ≤ n, (2) fᵢᵖ ∧ fⱼᵖ = 0 for i ≠ j, and (3) f₁ᵖ ∨ ⋯ ∨ fₙᵖ = 1.
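For concreteness, here is a small worked (X, Y)-partition (our example, not from the paper), with X = {x₁} and Y = {x₂}:

```latex
% An (X,Y)-partition of f(x_1,x_2) = x_1 \lor x_2 with X=\{x_1\}, Y=\{x_2\}:
f \;=\; (x_1 \land \mathbf{1}) \;\lor\; (\bar{x}_1 \land x_2).
% The primes x_1 and \bar{x}_1 are non-false, mutually exclusive
% (x_1 \land \bar{x}_1 = \mathbf{0}), and exhaustive
% (x_1 \lor \bar{x}_1 = \mathbf{1}); the subs \mathbf{1} and x_2 are
% distinct, so the partition is also compressed.
```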
A vtree is a full binary tree whose leaves are labeled with variables, and we use T to denote a vtree node. We use Tˡ to denote the left subtree of T, while Tʳ denotes the right subtree. The set of variables appearing in the leaves of T is denoted by v(T). In addition, there is a special leaf node labeled 0, which can be considered a child of any vtree node, with v(0) = ∅. The notation T₁ ⊑ T₂ denotes that T₁ is a subtree of T₂. In order to unify the definitions, we use a tuple to denote a decision diagram in this paper and give the following definition.
Definition 1.
A decision diagram is a tuple (T₁, T₂, α) s.t. T₂ ⊑ T₁, which is recursively defined as follows:
  • α is a terminal node labeled by one of four symbols: 1, 0, ε, or ε̄;
  • α is a decomposition node {(p₁, s₁), …, (pₙ, sₙ)} satisfying the following:
    Each pᵢ is a decision diagram (T₃, T₄, β), where T₄ ⊑ T₃ ⊑ T₂ˡ;
    Each sᵢ is a decision diagram (T₅, T₆, γ), where T₆ ⊑ T₅ ⊑ T₂ʳ.
The size of α is denoted by |α|, with |α| = 0 when α is a terminal node and |α| = n when α = {(p₁, s₁), …, (pₙ, sₙ)}. We use ⟨(T₁, T₂, α)⟩ to denote the Boolean function that the decision diagram represents. We now give the semantics of decision diagrams.
Definition 2.
Let T₁ and T₂ be two vtrees, and let T₂ be a subtree of T₁. The semantics of decision diagrams is inductively defined as follows:
  • ⟨(T₁, T₂, 1)⟩ = ⋀_{x ∈ v(T₁)∖v(T₂)} x̄, and ⟨(T₁, T₂, 0)⟩ = 0.
  • ⟨(T₁, T₂, ε)⟩ = ⋀_{x ∈ v(T₁)} x̄, and ⟨(T₁, T₂, ε̄)⟩ = ⋀_{x ∈ v(T₁)∖v(T₂)} x̄ ∧ ⋀_{x ∈ v(T₂)} x.
  • ⟨(T₁, T₂, {(p₁, s₁), …, (pₙ, sₙ)})⟩ = ⋀_{x ∈ v(T₁)∖v(T₂)} x̄ ∧ ⋁_{i=1}^{n} (⟨pᵢ⟩ ∧ ⟨sᵢ⟩), satisfying the following conditions:
    ⟨pᵢ⟩ ≠ 0 for 1 ≤ i ≤ n.
    ⟨pᵢ⟩ ∧ ⟨pⱼ⟩ = 0 for i ≠ j.
    ⋁_{i=1}^{n} ⟨pᵢ⟩ = 1.
A sentential decision diagram (SDD) (T₁, T₂, α) obeys the following further constraints on top of the above definition of a decision diagram:
  • If α is a terminal node, then it must be one of the following:
    (0, 0, 1), with ⟨(0, 0, 1)⟩ = 1.
    (0, 0, 0), with ⟨(0, 0, 0)⟩ = 0.
    (T₁, 0, 1), where T₁ must be a leaf vtree node.
    (T₁, T₂, ε̄), where T₁ must be a leaf vtree node and T₁ = T₂.
  • If α is a decomposition node, then T₁ = T₂.
Suppose that (T₁, T₂, α) is an SDD; then, T₁ = T₂ if α is a decomposition node. Hence, we write an SDD as (T, T, α) in order to state the compression and trimming rules more intuitively. These rules were proposed in [9], and we restate them according to the above definition.
  • Standard compression rule: if sᵢ = sⱼ, then replace (T, T, {(p₁, s₁), …, (pᵢ, sᵢ), …, (pⱼ, sⱼ), …, (pₙ, sₙ)}) with (T, T, {(p₁, s₁), …, (p, sᵢ), …, (pₙ, sₙ)}), where p = pᵢ ∨ pⱼ.
  • Standard trimming rules:
    Replace (T, T, {(p₁, (0, 0, 1)), (p₂, (0, 0, 0))}) with p₁ (shown in Figure 1a).
    Replace (T, T, {((0, 0, 1), s)}) with s (shown in Figure 1b).
For Figure 1, we clarify the drawing conventions. For a decision diagram (T₁, T₂, α), when α is a terminal node, the three components are drawn as a square, with α shown on the left side of the square, T₁ in the upper-right corner, and T₂ in the lower-right corner. When α is a decomposition node, T₁ and T₂ are displayed as circles with outgoing edges pointing to the elements. Each element (pᵢ, sᵢ) is represented by paired boxes, where the left box represents the prime pᵢ and the right box the sub sᵢ.
Compressed and trimmed SDDs were shown to be a canonical form of Boolean functions in [9]. We now show the syntax and semantics of ZSDDs in our notation. A ZSDD places different constraints on the above definition of a decision diagram. First of all, it should be noted that we use T_root to denote the root node of the whole vtree:
  • If α is a terminal node, then it must be one of the following:
    (T, 0, 1), with ⟨(T, 0, 1)⟩ = ⋀_{x ∈ v(T)} x̄.
    (0, 0, 0), with ⟨(0, 0, 0)⟩ = 0.
    (T_root, T, ε̄), where T must be a leaf vtree node.
    (T_root, T, 1), where T must be a leaf vtree node.
The compression and trimming rules for ZSDDs are as follows:
  • Zero-suppressed compression rule: if sᵢ = sⱼ, then replace (T₁, T₂, {(p₁, s₁), …, (pᵢ, sᵢ), …, (pⱼ, sⱼ), …, (pₙ, sₙ)}) with (T₁, T₂, {(p₁, s₁), …, (p, sᵢ), …, (pₙ, sₙ)}), where p = pᵢ ∨ pⱼ.
  • Zero-suppressed trimming rules:
    Replace (T₁, T₂, {((T₂ˡ, T₃, α), (T₂ʳ, 0, 1)), (p₂, (0, 0, 0))}) with (T₁, T₃, α) (shown in Figure 1c).
    Replace (T₁, T₂, {((T₂ˡ, 0, 1), (T₂ʳ, T₃, α)), (p₂, s₂)}) with (T₁, T₃, α) (shown in Figure 1d).
Similar to SDDs, compressed and trimmed ZSDDs were also proven to be a canonical form of Boolean functions in [7]. SDDs are suited to representing homogeneous Boolean functions, while ZSDDs are suited to Boolean functions with sparse values.

3. Tagged Sentential Decision Diagrams

In this section, we will first introduce the syntax and semantics of TSDDs and the compression and trimming rules of TSDDs. Similar to SDDs and ZSDDs, we give the syntax and semantics based on the syntax and semantics of decision diagrams in Section 2. Then, we briefly introduce the binary operations on TSDDs and how to use these operations to construct a Boolean function.

3.1. Syntax and Semantics

TSDDs have different constraints compared to SDDs and ZSDDs based on the syntax and semantics of decision diagrams. Here, we give the following constraints.
Definition 3.
Let T₁ and T₂ be two vtrees s.t. T₂ ⊑ T₁, and let (T₁, T₂, α) be a TSDD. Then,
  • If α is a terminal node, then it must be one of the following:
    (0, 0, 0), with ⟨(0, 0, 0)⟩ = 0.
    (T₁, 0, 1), with ⟨(T₁, 0, 1)⟩ = ⋀_{x ∈ v(T₁)} x̄ (specifically, if T₁ = 0, then ⟨(0, 0, 1)⟩ = 1).
    (T₁, T₂, ε̄), where T₂ must be a leaf vtree node, with ⟨(T₁, T₂, ε̄)⟩ = (⋀_{x ∈ v(T₁)∖v(T₂)} x̄) ∧ (⋀_{x ∈ v(T₂)} x).
  • If α is a decomposition node, then T₂ must not be a leaf vtree node.
We can see that for a TSDD (T₁, T₂, α), T₂ must be a leaf vtree node or 0 if α is a terminal node. We should note that some TSDDs can be constructed directly: (1) the TSDDs (0, 0, 0) and (0, 0, 1), and (2) the TSDDs (T, 0, 1) and (T, T, ε̄), where T is a leaf vtree node. These special TSDDs are the foundation for constructing a TSDD that represents an arbitrary Boolean function. For the second kind, T is a leaf vtree node, so v(T) contains only one variable. Suppose that v(T) = {x}; then, ⟨(T, 0, 1)⟩ = x̄ and ⟨(T, T, ε̄)⟩ = x.

3.2. Canonicity

TSDDs, as a variant of SDDs, apply trimming rules that integrate the trimming rules of SDDs and ZSDDs. Hence, the trimming rules of TSDDs include the rules of SDDs and ZSDDs shown in Figure 1a–d, and we do not repeat them in the following definition; the rules below therefore start from (e). In addition, there are five new rules. We show the compression and trimming rules for TSDDs as follows:
  • Tagged compression rule: if sᵢ = sⱼ, then replace (T₁, T₂, {(p₁, s₁), …, (pᵢ, sᵢ), …, (pⱼ, sⱼ), …, (pₙ, sₙ)}) with (T₁, T₂, {(p₁, s₁), …, (p, sᵢ), …, (pₙ, sₙ)}), where p = pᵢ ∨ pⱼ.
  • Tagged trimming rules:
    If p = (0, 0, 1) and s = (0, 0, 0), then replace (T₁, T₂, {(p, s)}) with (0, 0, 0) (shown in Figure 1e).
    If p₁ = (T₃, T₄, α), s₁ = (T₂ʳ, 0, 1), s₂ = (0, 0, 0), and T₃ ⊑ T₂ˡ, then replace (T₁, T₂, {(p₁, s₁), (p₂, s₂)}) with (T₁, T₂ˡ, {(p₁, (0, 0, 1)), (p₂, (0, 0, 0))}) (shown in Figure 1f).
    If p₁ = (T₃, T₄, α), s₁ = (T₂ʳ, 0, 1), s₂ = (0, 0, 0), and T₃ ⊑ T₂ʳ, then replace (T₁, T₂, {(p₁, s₁), (p₂, s₂)}) with (T₁, T₂ˡ, {((0, 0, 1), p₁)}) (shown in Figure 1g).
    If p₁ = (T₂ˡ, 0, 1), s₁ = (T₃, T₄, α), s₂ = (0, 0, 0), and T₃ ⊑ T₂ˡ, then replace (T₁, T₂, {(p₁, s₁), (p₂, s₂)}) with (T₁, T₂ʳ, {(p₁, (0, 0, 1)), (p₂, (0, 0, 0))}) (shown in Figure 1h).
    If p₁ = (T₂ˡ, 0, 1), s₁ = (T₃, T₄, α), s₂ = (0, 0, 0), and T₃ ⊑ T₂ʳ, then replace (T₁, T₂, {(p₁, s₁), (p₂, s₂)}) with (T₁, T₂ʳ, {((0, 0, 1), p₁)}) (shown in Figure 1i).
A TSDD is compressed (resp. trimmed) if no tagged compression (resp. trimming) rules can be applied to it. The canonicity of compressed and trimmed TSDDs was proved in [8].

3.3. Operations on TSDDs

The main operations on TSDDs are conjunction (∧), disjunction (∨), and negation (¬). The algorithm for the binary operations conjunction (∧) and disjunction (∨) was shown in [8]. Here, we show the negation algorithm on TSDDs in Algorithm 1.
Algorithm 1:  Negate  ( F )
(The pseudocode of Algorithm 1 appears as an image in the original article.)
For the special terminal nodes, i.e., the two kinds of TSDDs mentioned in Section 3.1, we can directly compute the resulting TSDD (Lines 1–2). The other terminal nodes first need to be transformed into another form, namely, decomposition nodes (Line 3). Then, we apply Negate(sᵢ) to every element and construct γ = {(p₁, Negate(s₁)), …, (pₙ, Negate(sₙ))}; the resulting TSDD is (T₁, T₂, γ) (Lines 4–7). Finally, we apply the compression and trimming rules to H (Line 8).
The binary operations are represented by Apply(F, G, ∘), where ∘ is ∧ or ∨, as shown in [8]. With these three operations, we can construct a TSDD for any Boolean function from existing TSDDs. Here, we give an example. Given the Boolean function f = ¬(x₁ ∧ x₂) ∧ (x₂ ∨ x̄₃), we have the following initial TSDDs: F(x₁), F(x̄₁), F(x₂), F(x̄₂), F(x₃), and F(x̄₃), where F(x₁) represents x₁ and so on. The steps are as follows: (1) Let F = Apply(F(x₁), F(x₂), ∧). (2) Let F = Negate(F). (3) Let G = Apply(F(x₂), F(x̄₃), ∨). (4) Let F = Apply(F, G, ∧). Finally, we obtain the TSDD F with ⟨F⟩ = f.
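The construction above can be checked semantically by representing Boolean functions extensionally as sets of satisfying assignments. Below, `apply_op` and `negate` are set-based stand-ins (our names, not the TSDD implementation) for Apply and Negate, and we read the example as f = ¬(x₁ ∧ x₂) ∧ (x₂ ∨ x̄₃):

```python
from itertools import product

# Boolean functions over x1..x3 represented extensionally as sets of
# satisfying assignments; Apply and Negate then become plain set
# operations. This mirrors the semantics of the TSDD operations, not
# their diagram-based implementation.
VARS = 3
ALL = set(product((0, 1), repeat=VARS))

def var(i):
    """The Boolean function 'x_i is true'."""
    return {a for a in ALL if a[i - 1] == 1}

def negate(f):
    return ALL - f

def apply_op(f, g, op):          # op is "and" or "or"
    return f & g if op == "and" else f | g

# f = not(x1 and x2) and (x2 or not x3), built step by step as in the text
F = apply_op(var(1), var(2), "and")
F = negate(F)
G = apply_op(var(2), negate(var(3)), "or")
F = apply_op(F, G, "and")
```

Each step produces the set of assignments of the corresponding subformula, so the final `F` can be compared directly against a truth-table evaluation of f.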

4. Encoding Dictionaries into TSDDs

A dictionary includes a large number of words, and we intend to transform it into a decision diagram that stores the whole dictionary. In this way, we can complete common operations on the dictionary just by performing binary operations on decision diagrams. To verify whether a word exists in the dictionary, we compute Apply(F, G, ∧), where F is the decision diagram representing the dictionary and G is the decision diagram representing the word. If the result is 0, the word does not exist in the dictionary; otherwise, it does. To remove some words from the dictionary, we construct the decision diagram G representing the set of words and compute Apply(F, Negate(G), ∧), where F represents the dictionary. Replacing the traversal of a dictionary with a few binary operations on decision diagrams removes the traversal process altogether; this is why we encode dictionaries into decision diagrams.
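The dictionary-level effect of these operations can be sketched with Python sets standing in for decision diagrams (illustrative only; the real operations manipulate TSDD nodes, and the function names here are ours):

```python
# The diagram F denotes the set of words it encodes, so Apply(F, G, "and")
# corresponds to set intersection and Apply(F, Negate(G), "and") to set
# difference at the level of the encoded dictionaries.

def contains(F, word):
    # Apply(F, G, and) is non-false iff the word is in the dictionary
    return bool(F & {word})

def remove_words(F, G):
    # Apply(F, Negate(G), and): keep exactly the words of F not in G
    return F - G
```

For example, `remove_words({"cat", "dot", "zoo"}, {"dot"})` leaves `{"cat", "zoo"}`, matching the intended effect of the diagram operations.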
In this section, we first introduce the process of encoding a dictionary into a decision diagram in four different ways. Ref. [1] pointed out the use of tries when encoding dictionaries into decision diagrams but did not provide a detailed description of the process. Hence, we were inspired by them and designed our method of encoding dictionaries with the help of tries, which greatly accelerated the compilation process. In the following, we intend to explain in detail how to use tries to complete the encoding of a dictionary and present the algorithm for decoding TSDDs.

4.1. Four Encoding Methods

The key to our method is to establish a correspondence between letters and numbers so that a string of numbers can represent a word. For example, the ASCII code, a famous encoding system, uses a number ranging from 0 to 127 to represent 128 characters. With the help of the ASCII code, we represent a letter with some variables by transforming the code into its binary representation. As we know, the ASCII code of the letter ‘A’ is 65, whose binary representation is ‘1000001’ with 7 bits. We represent the letter ‘A’ by 7 variables, x₁x₂x₃x₄x₅x₆x₇, taking a variable to be ‘0’ when its value is false and ‘1’ when it is true. Hence, a letter needs 7 variables under the ASCII code, and a word consisting of n letters needs 7n variables in total.
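This letter-to-bits step can be sketched in a few lines (helper names are ours):

```python
def ascii_bits(letter):
    """7-bit binary form of a letter's ASCII code, one bit per Boolean
    variable (most significant bit first)."""
    return format(ord(letter), "07b")

def word_bits(word):
    # a word of n letters is represented by 7*n variables
    return "".join(ascii_bits(c) for c in word)
```

For instance, `ascii_bits("A")` yields `"1000001"`, the 7-bit pattern used in the text.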
However, there are many characters in the ASCII code that are often not used in most dictionaries, which means that it will cause significant redundancy if we use 7 bits to represent every letter. Suppose that a dictionary consists only of lowercase letters; that is, there are only 26 different letters in this dictionary. We number each letter in alphabetical order starting from 1, with the maximum number being 26, representing z. The number 26 just needs 5 bits to be represented, and its binary representation is ‘11010’, which means that a letter needs 5 variables to be represented. In this way, we reduce the number of variables required to represent a letter, and this code is called the compact code.
Compared with the ASCII code, the compact code reduces the number of variables required to represent a letter. However, to apply the compact code, we need a protocol that specifies the correspondence between letters and numbers and records this correspondence in a table. This table is necessary for both encoding and decoding, so it must be known to all those who use the compact code. The ASCII code, by contrast, is an international standard that requires no additional space to store the correspondence between letters and codes. Hence, the universality of the ASCII code is better than that of the compact code.
We use binary numbers to represent both the ASCII code and the compact code of each letter, which means that the number of variables is the number of binary digits; we call this the binary way of encoding dictionaries. If the maximum value of the code is n, we can instead use n variables to represent the code, one variable per value. For example, given the word zoo, there are two different letters, ‘z’ and ‘o’, and the word consists of three letters. Hence, we need six variables, x₁, x₂, x₃, x₄, x₅, and x₆, to represent it: x₁ and x₂ represent the first letter, which is ‘z’ if x₁ = true and x₂ = false, and ‘o’ if x₁ = false and x₂ = true; the other variables have similar meanings. Therefore, the word ‘zoo’ is represented by x₁x₂x₃x₄x₅x₆ = 100101. We call this the one-hot way of encoding the dictionary. A word consisting of m letters needs nm variables in total in the one-hot way, while it needs only ⌈log₂ n⌉ · m variables in the binary way.
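The variable counts for the two ways can be computed with a small illustrative helper (our sketch; `n` is the number of distinct code values, `m` the word length):

```python
import math

def vars_needed(n, m, way):
    """Number of Boolean variables for a word of m letters when codes
    range over n distinct values: n per letter in the one-hot way,
    ceil(log2 n) per letter in the binary way."""
    if way == "one-hot":
        return n * m
    if way == "binary":
        return math.ceil(math.log2(n)) * m
    raise ValueError(way)
```

For the compact code over 26 lowercase letters, a 3-letter word needs 78 variables in the one-hot way but only 15 in the binary way.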

4.2. Encoding with Tries

We can first encode every word in the dictionary into a TSDD one by one and then apply disjunction to these TSDDs to obtain the resulting TSDD that represents the whole dictionary. However, the process of encoding takes too much time, which is unacceptable to us. If the number of words in the dictionary exceeds 100,000, the encoding time will exceed two hours. Therefore, we first transform the dictionary into a trie, which is a multibranch tree, and then transform it into a TSDD. This can greatly accelerate the speed of encoding a dictionary.
A trie is a multibranch structure in which every node saves a letter, except for two special nodes. It is a directed acyclic graph, and every node has in- and out-degrees; in general, the in- and out-degrees of each node in a trie are both greater than 0, and these nodes all save a letter. However, there are two special nodes, one with an in-degree of 0 and the other with an out-degree of 0. We call the node with an in-degree of 0 the head and the node with an out-degree of 0 the tail; neither saves a letter. All paths in the trie start from the head and end at the tail. A path represents a word in the dictionary, so the number of paths equals the number of words in the trie.
We first introduce the algorithm for transforming a dictionary into a trie in Algorithm 2. We initialize the two special nodes, that is, the head node h̃ and the tail node t̃, and an empty node ṽ (Line 1). Then, we traverse each word in the dictionary, and in each iteration, we let ṽ be the node h̃ (Lines 2–3). We process the letters of the word in order. The operation Find(l, ṽ) looks for a node among the successor nodes of ṽ that saves the letter l. If no such node exists, we create a new node ñ with NewNode(l) and make it a successor node of ṽ with Append(ṽ, ñ) (Lines 4–8), and then we let ṽ be the node ñ (Line 9). After all the letters have been accessed, we let t̃ be a successor node of ṽ (Line 10). Finally, the node h̃ is the root of the resulting trie.
Algorithm 2:  ToTrie  ( D )
(The pseudocode of Algorithm 2 appears as an image in the original article.)
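Since the pseudocode is rendered as an image, here is a minimal Python sketch of ToTrie following the description above (the `Node` class and `path_count` helper are our own; `find` plays the role of Find(l, ṽ)):

```python
class Node:
    """Trie node; the head and tail nodes save no letter (letter=None)."""
    def __init__(self, letter=None):
        self.letter = letter
        self.children = []          # successor nodes

def find(letter, node):
    # Find(l, v): look among v's successors for a node saving letter l
    for child in node.children:
        if child.letter == letter:
            return child
    return None

def to_trie(words):
    """Sketch of ToTrie: every word becomes a head-to-tail path."""
    head, tail = Node(), Node()
    for word in words:
        v = head
        for letter in word:
            n = find(letter, v)
            if n is None:
                n = Node(letter)        # NewNode(l)
                v.children.append(n)    # Append(v, n)
            v = n
        if tail not in v.children:
            v.children.append(tail)     # the path ends at the tail
    return head, tail

def path_count(node, tail):
    # number of head-to-tail paths = number of words stored in the trie
    if node is tail:
        return 1
    return sum(path_count(c, tail) for c in node.children)
```

Building the trie for the four words `["aco", "cat", "dot", "zoo"]` gives four head-to-tail paths, one per word.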
After we obtain the trie, we need to compress it further. Firstly, each node in the trie is assigned an important parameter, its depth; we denote a node by P̃(d, l), where d stands for the depth and l for the letter. We first give the following definition.
Definition 4.
A trie node P̃(d, l) is defined as follows:
  • d = 0 if the node P̃(d, l) is the head node.
  • If P̃(d, l) is a node and P̃(d′, l′) is one of its children, then d′ = d + 1.
  • The letters that appear in sequence on a path from the head to the tail form a word.
The step of compressing the trie entails merging two nodes that have the same depth and meet certain conditions into one node. The conditions are as follows: (1) The two nodes save the same letter. (2) The two nodes have the same children nodes. We group all nodes by depth and merge nodes by group, starting from the deepest group.
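The merging step can be sketched as follows (a self-contained illustration; the `TNode` class and the grouping strategy are our own, with nodes merged when they save the same letter and share the same children, deepest group first):

```python
class TNode:
    def __init__(self, letter=None):
        self.letter = letter
        self.children = []

def build(words):
    # plain prefix-sharing trie; the head and tail save no letter
    head, tail = TNode(), TNode()
    for w in words:
        v = head
        for ch in w:
            nxt = next((c for c in v.children if c.letter == ch), None)
            if nxt is None:
                nxt = TNode(ch)
                v.children.append(nxt)
            v = nxt
        if tail not in v.children:
            v.children.append(tail)
    return head, tail

def key(node):
    # two nodes may merge iff same letter and identical children
    return (node.letter, tuple(sorted(id(c) for c in node.children)))

def merge(head, tail):
    """Group nodes by depth and merge within each group, deepest first."""
    levels, frontier = [], [head]
    while frontier:                       # collect the depth groups
        levels.append(frontier)
        nxt = []
        for n in frontier:
            for c in n.children:
                if c is not tail and c not in nxt:
                    nxt.append(c)
        frontier = nxt
    for d in range(len(levels) - 1, 0, -1):
        rep = {}                          # representative per merge class
        for n in levels[d]:
            rep.setdefault(key(n), n)
        for parent in levels[d - 1]:      # redirect parents to reps
            parent.children = [rep[key(c)] if c is not tail else c
                               for c in parent.children]

def words_of(node, tail, prefix=""):
    if node is tail:
        return {prefix}
    out = set()
    for c in node.children:
        out |= words_of(c, tail, prefix + (c.letter or ""))
    return out

def node_count(head, tail):
    seen, stack = set(), [head]
    while stack:
        n = stack.pop()
        if id(n) not in seen:
            seen.add(id(n))
            stack += [c for c in n.children if c is not tail]
    return len(seen)
```

On the six-word example below, merging shrinks the trie while leaving the set of stored words unchanged.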
Here, we give an example using a dictionary that contains six words: aco, cat, dot, purple, ripple, and zoo. The tries before and after merging the nodes are shown in Figure 2. There are two special nodes in the graph, namely, the root node with a depth of 0 and the tail node represented by a solid circle without a depth. Except for these two nodes, each other node stores a letter, and we stipulate that all paths representing words start at the root node and end at the tail node. Except for the tail node, all other nodes have a depth, and the maximum depth on a path is the number of letters in the word represented by that path. After we merge the nodes, we can see that the number of nodes has significantly decreased.
We then transform this trie into a TSDD, and the algorithm is shown in Algorithm 3. We first compute the number of variables in the TSDD; suppose that the maximum length of all words in the dictionary is n and the number of different letters in the dictionary is m. The number of variables is 128n if we use the ASCII code in the one-hot way and 7n if we use the ASCII code in the binary way; it is mn if we use the compact code in the one-hot way and ⌈log₂ m⌉ · n if we use the compact code in the binary way. Here, we explain how a trie node is transformed into a TSDD node using the ASCII code in the binary way, saving the corresponding TSDD in the trie node. There are three steps in total. Let P̃ be a trie node. (1) We access each node ñ(d, l) from bottom to top and compute the disjunction of all the TSDDs saved in the successor nodes of ñ(d, l), recording this result as G; Successor(ñ) represents all the successor nodes of the trie node ñ (Lines 1–4). (2) We compute the TSDD GetTSDD(ñ), which represents the letter saved in the trie node (Line 5). For example, if the letter l is ‘a’, the binary representation of the letter is ‘1100001’, and the related variables are x_{(d−1)·7+1}, …, x_{(d−1)·7+7}. Of these, the variables marked ‘1’ are x_{(d−1)·7+1}, x_{(d−1)·7+2}, and x_{(d−1)·7+7}, so GetTSDD(ñ) represents the Boolean function f = x_{(d−1)·7+1} ∧ x_{(d−1)·7+2} ∧ x̄_{(d−1)·7+3} ∧ x̄_{(d−1)·7+4} ∧ x̄_{(d−1)·7+5} ∧ x̄_{(d−1)·7+6} ∧ x_{(d−1)·7+7}. (3) We compute the TSDD G = Apply(G, GetTSDD(ñ), ∧) (Line 5) and save this TSDD G in the trie node (Line 6). As with merging, we group all trie nodes by depth and transform the nodes into TSDDs group by group in descending order of depth.
We should note that if P̃ is the tail trie node, the TSDD saved in it is not fixed; it is determined by the depth of its parent node. Let the depth of the parent node be d and the TSDD saved in the tail node be G; then, G = ⋀_{i=7d+1}^{7n} x̄ᵢ. In addition, not all words consist of n letters. If a word consists of fewer than n letters, we use null characters to fill the remaining positions. In the above example, if the letter saved in the trie node is a null character, then the Boolean function in step 2 is f = x̄_{(d−1)·7+1} ∧ x̄_{(d−1)·7+2} ∧ x̄_{(d−1)·7+3} ∧ x̄_{(d−1)·7+4} ∧ x̄_{(d−1)·7+5} ∧ x̄_{(d−1)·7+6} ∧ x̄_{(d−1)·7+7}. Finally, the TSDD saved in the head trie node is the TSDD that represents the whole dictionary (Line 7).
Algorithm 3:  ToTSDD  ( P ˜ )
For the other three encoding methods, the process is similar; only the relevant variables differ. Likewise, we take the conjunction of the variables marked ‘1’ and the negations of the variables marked ‘0’. The TSDD saved in the head trie node is the encoding we want. At this point, the encoding of the dictionary is complete.
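The four encoding methods can be sketched as a word-to-assignment translation. This is a minimal sketch: the helper name encode_word and the way the null padding character is coded in the two one-hot schemes are our assumptions, not fixed by the text.

```python
import math

def encode_word(word, n, method, alphabet=None):
    """Translate a word of at most n letters into a 0/1 assignment string
    under one of the four encoding methods. Words shorter than n letters
    are padded with null characters; 'alphabet' lists the distinct
    letters of the dictionary (compact codes only)."""
    word = word.ljust(n, "\0")
    chunks = []
    for ch in word:
        if method == "ascii-binary":          # 7 variables per letter
            chunks.append(format(ord(ch), "07b"))
        elif method == "ascii-onehot":        # 128 variables per letter
            code = ["0"] * 128
            code[ord(ch)] = "1"               # assumption: NUL gets position 0
            chunks.append("".join(code))
        elif method == "compact-binary":      # ceil(log2(m + 1)) variables
            width = math.ceil(math.log2(len(alphabet) + 1))
            idx = 0 if ch == "\0" else alphabet.index(ch) + 1
            chunks.append(format(idx, f"0{width}b"))
        elif method == "compact-onehot":      # m variables; null = all zeros
            code = ["0"] * len(alphabet)
            if ch != "\0":
                code[alphabet.index(ch)] = "1"
            chunks.append("".join(code))
    return "".join(chunks)
```

With alphabet ['z', 'o'], encode_word('zoo', 3, 'compact-binary', ['z', 'o']) reproduces the six-variable assignment 011010 of the zoo example used in Section 4.1.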

4.3. Decoding

We use one code to represent a letter and a sequence of codes to represent an entire word. Conversely, we can restore a word from a code string. Decoding a TSDD means restoring the code strings from the TSDD and then obtaining the corresponding words. After restoring all code strings, we obtain the original dictionary.
Before explaining the decoding algorithm, we first introduce some operations. Given a depth d, we use List(d) to denote the set of TSDDs representing the letters that may appear at depth d. In Section 4.1, we gave an example of representing the word zoo with six variables by using the compact code in a binary way, and we continue with this example to illustrate the decoding process. The variables denote a null character when both x_1 and x_2 are assigned ‘false’. Hence, there are three cases when the depth is 1: List(1) = {F_1, F_2, F_3}, where F_1, F_2, and F_3 are TSDDs with F_1 = x̄_1 ∧ x̄_2, F_2 = x̄_1 ∧ x_2, and F_3 = x_1 ∧ x̄_2. We also use Letter(F) to denote the letter that the TSDD F represents; that is, Letter(F_1) is a null character, Letter(F_2) = ‘z’, and Letter(F_3) = ‘o’. Given the word ‘zo’, we stipulate that Push(‘zo’, ‘o’) = ‘zoo’ and Pop(‘zoo’) = ‘zo’. We use d_max to denote the maximum depth in the dictionary. The algorithm is shown in Algorithm 4.
The initial inputs of the algorithm are the TSDD F to be decoded, the empty word w, the empty dictionary D̄, and the depth d = 1. We then perform the operation Apply(F, G, ∧) for each G in List(d), where G represents a letter that may appear at this position (Lines 1–2). If the result of Apply(F, G, ∧) is not false, we recursively append the next possible letter until we encounter a null character or reach the maximum depth (Lines 3–10). After the algorithm terminates, the dictionary D̄ is the desired result.
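To make the recursion concrete, the following sketch simulates Algorithm 4 on the zoo example. It stands in for the TSDD with the set of satisfying assignments of its Boolean function, so Apply(F, G, ∧) amounts to filtering that set; all names here are ours, and this is an illustration rather than the paper's implementation.

```python
import math

def decode(models, n, alphabet):
    """Recover the dictionary from the satisfying assignments of its
    Boolean function (compact code in a binary way). 'models' is a set
    of bitstrings of length n * width; conjunction with a letter's TSDD
    is simulated by keeping the assignments that match its code."""
    width = math.ceil(math.log2(len(alphabet) + 1))
    codes = {format(i + 1, f"0{width}b"): ch for i, ch in enumerate(alphabet)}
    codes["0" * width] = None                  # null character: end of word
    words = set()

    def rec(ms, word, depth):
        if depth == n:                         # maximum depth reached
            words.add(word)
            return
        for code, letter in codes.items():
            sub = {m for m in ms if m[depth * width:(depth + 1) * width] == code}
            if not sub:                        # Apply(F, G, AND) is false
                continue
            if letter is None:
                words.add(word)                # null character: word complete
            else:                              # Push; Pop is implicit on return
                rec(sub, word + letter, depth + 1)

    rec(models, "", 0)
    return words
```

Decoding {'011010', '011000'} (the six-variable encodings of ‘zoo’ and ‘zo’) with n = 3 and alphabet ['z', 'o'] returns {'zoo', 'zo'}.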
However, this algorithm is only practical for decision diagrams with few variables. We can decode decision diagrams encoded in a binary way but not those encoded in a one-hot way: decoding a decision diagram encoded in a one-hot way takes over one hour when the dictionary contains more than 10,000 words. Hence, a more efficient algorithm is needed for such decision diagrams. In this study, we did not conduct decoding-related experiments.
Algorithm 4:  Decode  ( F , w , D ¯ , d )

5. Experimental Results

In this section, we first compare the speed of encoding a dictionary with and without tries to demonstrate the acceleration effect of tries on encoding a dictionary. We then encode 14 dictionaries into BDDs, ZDDs, CBDDs, CZDDs, SDDs, ZSDDs, and TSDDs with tries and compare the node count of decision diagrams and the time required for encoding the dictionaries into decision diagrams. All experiments were carried out on a machine equipped with an Intel Core i7-8086K 4 GHz CPU and 64 GB RAM. We used four encoding methods to conduct all experiments: compact code in a one-hot way, compact code in a binary way, ASCII code in a one-hot way, and ASCII code in a binary way.
The dictionaries we used are the English word list in the file /usr/share/dict/words on a macOS system, with 235,886 words of length up to 24 over 54 symbols; a password list with 979,247 words of length up to 32 over 79 symbols [10]; and other word lists from the website at [11]. To compare the time for encoding dictionaries into TSDDs with and without tries, we took the first 20,000, 30,000, and 40,000 words of the dictionary words to form three new dictionaries and encoded them into TSDDs. We did so because, without the help of a trie, the encoding time would exceed two hours when the number of words exceeds 40,000. We used the same right-linear vtree throughout the first experiment; hence, the size and node count of the TSDDs are identical for a given encoding method, and we only compared the encoding time. The results are shown in Table 1.
We can see that, with the help of tries, all the experiments on these three dictionaries completed within 10 s. Without tries, even the minimum encoding time reached 251.86 s (the compact code in a one-hot way for the dictionary with 20,000 words). When the number of words reaches 40,000, encoding takes over 1000 s, more than a hundred times longer than with tries. It can be inferred that encoding larger dictionaries without tries would take intolerably long. Therefore, tries have an excellent acceleration effect, and it is necessary to rely on them when encoding dictionaries.
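The saving that tries provide can be quantified: shared prefixes are encoded once rather than once per word. A minimal sketch (the helper name trie_stats is ours) compares the number of trie nodes with the total number of letters, since each trie node triggers roughly one Apply call in Algorithm 3, versus one per letter without the trie:

```python
def trie_stats(words):
    """Build a (non-compressed) trie over the given words and return
    (number of trie nodes, total number of letters across all words)."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})   # shared prefixes reuse nodes

    def count(node):
        return sum(1 + count(child) for child in node.values())

    return count(root), sum(len(w) for w in words)
```

For ['zoo', 'zo', 'za'] this gives 4 trie nodes against 7 letters; on real word lists, with their heavy prefix sharing, the gap grows far larger.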
The second experiment is as follows. Ref. [1] encoded two dictionaries, words and password, into four decision diagrams, i.e., BDDs, ZDDs, CBDDs, and CZDDs, in four ways and reported the node counts and encoding times for some of the data; we reuse their data in Table 2. For the other 12 dictionaries, no such experiments existed, so we extended their experiment by encoding the remaining twelve dictionaries into BDDs, ZDDs, CBDDs, and CZDDs in four ways and recording the node count and encoding time. We also encoded all 14 dictionaries into SDDs, ZSDDs, and TSDDs in four ways and then compared all decision diagrams on two benchmarks: node count (the node count of the decision diagram) and time (the time for encoding the dictionary). The initial vtrees are all right-linear trees. To reduce the node count, we designed minimization algorithms for ZSDDs and TSDDs and applied them once to SDDs, ZSDDs, and TSDDs when over half of the words in the dictionary had been encoded. The results of this experiment are shown in Table 2. Columns 1–2 report the names and word counts of the dictionaries. Column 3 reports the decision diagrams representing the dictionaries. Columns 4–11 report the node counts and encoding times under the four encoding methods. Each dictionary in our experiment contains over 100,000 words. In addition, we use ‘–’ in place of data whose encoding time exceeded one hour.
In general, the number of variables used in one-hot encoding is much larger than that in binary encoding, and the Boolean functions encoded in a one-hot way are sparser than those encoded in a binary way when representing the same dictionary. For each dictionary and encoding method, the minimum node count among all decision diagrams is highlighted in bold so that readers can see at a glance which decision diagram performs best. We first consider the decision diagrams encoded with the compact code in a one-hot way. Taking the dictionary phpbb as an example, the ZDD and CZDD share the minimum node count, while the CBDD has the second-smallest node count: its 344,800 nodes are just 332 more than those of the ZDD and CZDD. The node count of the ZSDD is larger than that of the ZDD, CZDD, and CBDD, giving it the third-smallest count. The node count of the TSDD is larger still, but it does not exceed 1.5 times the minimum (the node count of the ZDD or CZDD): 446,421/344,468 ≈ 1.30 < 1.5. The node counts of the SDD and BDD are much larger than those of the other decision diagrams, with the SDD slightly smaller than the BDD. Hence, for node count, the ZDD and CZDD perform best among all the decision diagrams; the CBDD, ZSDD, and TSDD are slightly inferior; and the SDD and BDD perform worst. For encoding time, the picture is similar: the ZDD and CZDD take the least time to encode the dictionary, and TSDDs are slower than ZDDs and CZDDs but faster than BDDs and SDDs.
Although the performance of TSDDs encoded in a one-hot way is not the best, it is not much worse than that of the best decision diagrams (ZDDs and CZDDs), and this drawback is tolerable.
For the other 13 dictionaries, the relative performance of these decision diagrams in node count and time is similar to that for phpbb. We believe that the Boolean function representing a dictionary in a one-hot way is a sparse Boolean function, which makes ZDDs, CZDDs, CBDDs, ZSDDs, and TSDDs perform better than SDDs and BDDs. The decision diagrams encoded by the ASCII code in a one-hot way involve more variables than those encoded by the compact code in a one-hot way; because both are one-hot encodings, they show similar node counts and encoding times.
For the decision diagrams encoded in a binary way, we focus on the performance of TSDDs. TSDDs have the minimum node count among all decision diagrams when encoding the same dictionary in a binary way, regardless of whether the compact code or the ASCII code is used. Moreover, the node count of TSDDs can be much smaller than that of the other decision diagrams. For example, the node count of the TSDD representing dutch-wordlist with the ASCII code in a binary way is 273,022, while the second-smallest node count is 976,921, more than three times larger. Although the encoding time of TSDDs is not the smallest, we believe it is worth spending more time to obtain the minimum node count. Hence, we conclude that TSDDs are the most suitable decision diagrams for representing dictionaries in a binary way.
Finally, we draw the following conclusions: (1) No single decision diagram has the minimum node count and encoding time for all dictionaries under all encoding methods. (2) TSDDs always have the minimum node count when we encode in a binary way, and their node count can be much smaller than those of the other decision diagrams. (3) When we encode in a one-hot way, the node count of the TSDD is no more than 1.5 times the minimum node count for all dictionaries except walk-the-line. (4) Although the encoding time of TSDDs is not the best, every dictionary can be encoded into a TSDD in all four ways within half an hour, while encoding some dictionaries into BDDs or SDDs in a one-hot way can take more than one hour. Based on point 1, no decision diagram outperforms all others in our experiments, so we must choose the one most suitable for representing dictionaries among the seven. Based on points 2 and 3, we believe that, overall, TSDDs perform best in node count among all decision diagrams. Based on point 4, we consider it worthwhile to trade some encoding time for a more compact representation. In addition, the number of variables of Boolean functions representing dictionaries in a one-hot way is much larger than in a binary way, and it grows quickly with the word length: representing a large alphabet in a one-hot way requires many variables, whereas a binary representation requires only a few. Too many variables not only complicate management but also require considerable storage.
In general, people tend to represent large numbers in a binary way. We can see that TSDDs are the decision diagrams with the minimum node count and a suitable encoding time among the seven decision diagrams. Hence, we believe that TSDDs are more suitable for representing dictionaries.

6. Conclusions

In this paper, we have unified the definitions of the semantics and syntax of SDDs, ZSDDs, and TSDDs based on Boolean functions. Our contributions are as follows: (1) We propose an algorithm that encodes dictionaries into decision diagrams with the help of tries. To transform a dictionary into a decision diagram, we first transform the dictionary into a trie and compress it, which reduces the number of trie nodes; we then transform the trie into a decision diagram. Because the trie is compressed, the number of operations on decision diagrams is effectively reduced, which greatly accelerates encoding. Our experiments demonstrate that tries are of great help to the algorithm. (2) We show that TSDDs are the decision diagrams best suited to representing dictionaries: they had the smallest node count in our experiments when encoding in a binary way, and a node count no more than 1.5 times the minimum among all decision diagrams when encoding in a one-hot way. The encoding time of TSDDs was not the best among all decision diagrams, but we believe it is worthwhile to exchange some encoding time for a smaller number of nodes. Hence, we believe that TSDDs are more suitable for representing dictionaries. (3) We also designed a decoding algorithm that transforms a decision diagram back into the original dictionary; however, it cannot decode decision diagrams encoded in a one-hot way.
By representing a dictionary as a TSDD, common dictionary operations reduce to binary operations on TSDDs, so encoding dictionaries into TSDDs is very meaningful. Our algorithm can still be improved, and we see three directions for future work: (1) We can search for a more compact way to represent symbols and characters by Boolean functions so as to reduce the number of variables and, in turn, the node count of decision diagrams. (2) We can further improve the algorithms to reduce the node count and encoding time. If the node count can be made small enough and the encoding time short enough, decision diagrams will play a vital role in representing dictionaries; for example, data consisting of symbols and characters could be stored in less space by transforming it into decision diagrams. (3) Our decoding algorithm needs to be improved so that it can decode decision diagrams encoded in a one-hot way within a reasonable time. We believe that studying how to encode dictionaries into decision diagrams more efficiently will be very valuable in the future.

Author Contributions

Conceptualization, D.Z. and L.F.; methodology, D.Z. and L.F.; software, D.Z.; validation, L.F. and Q.G.; formal analysis, Q.G.; investigation, D.Z.; resources, Q.G.; data curation, Q.G.; writing—original draft preparation, D.Z. and L.F.; writing—review and editing, D.Z., L.F. and Q.G.; visualization, D.Z.; supervision, L.F.; project administration, D.Z.; funding acquisition, Q.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Science and Technology Planning Project of Guangdong (2021B0101420003, 2020ZDZX3013, 2023A0505030013, 2022A0505030025, 2023ZZ03), the Science and Technology Planning Project of Guangzhou (202206030007), the Guangdong-Macao Advanced Intelligent Computing Joint Laboratory (2020B1212030003), and the Opening Project of Key Laboratory of Safety of Intelligent Robots for State Market Regulation (GQI-KFKT202205).

Data Availability Statement

Due to the involvement of our research data in another study, we will not provide details regarding where data supporting the reported results can be found.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Bryant, R.E. Chain Reduction for Binary and Zero-Suppressed Decision Diagrams. J. Autom. Reason. 2020, 64, 1361–1391. [Google Scholar] [CrossRef]
  2. Minato, S. Zero-Suppressed BDDs for Set Manipulation in Combinatorial Problems. In Proceedings of the 30th International Design Automation Conference (DAC-1993), Dallas, TX, USA, 14–18 June 1993; pp. 272–277. [Google Scholar]
  3. van Dijk, T.; Wille, R.; Meolic, R. Tagged BDDs: Combining Reduction Rules from Different Decision Diagram Types. In Proceedings of the 17th International Conference on Formal Methods in Computer-Aided Design (FMCAD-2017), Vienna, Austria, 2–6 October 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 108–115. [Google Scholar]
  4. Babar, J.; Jiang, C.; Ciardo, G.; Miner, A. CESRBDDs: Binary decision diagrams with complemented edges and edge-specified reductions. Int. J. Softw. Tools Technol. Transf. 2022, 24, 89–109. [Google Scholar]
  5. Pipatsrisawat, K.; Darwiche, A. New Compilation Languages Based on Structured Decomposability. In Proceedings of the 23rd AAAI Conference on Artificial Intelligence (AAAI-2008), Chicago, IL, USA, 13–17 July 2008; pp. 517–522. [Google Scholar]
  6. Shannon, C.E. A Symbolic Analysis of Relay and Switching Circuits. Trans. Am. Inst. Electr. Eng. 1938, 57, 713–723. [Google Scholar] [CrossRef]
  7. Nishino, M.; Yasuda, N.; Minato, S.I.; Nagata, M. Zero-Suppressed Sentential Decision Diagrams. In Proceedings of the 30th AAAI Conference on Artificial Intelligence (AAAI-2016), Phoenix, AZ, USA, 12–17 February 2016; pp. 1058–1066. [Google Scholar]
  8. Fang, L.; Fang, B.; Wan, H.; Zheng, Z.; Chang, L.; Yu, Q. Tagged Sentential Decision Diagrams: Combining Standard and Zero-suppressed Compression and Trimming Rules. In Proceedings of the 38th IEEE/ACM International Conference on Computer-Aided Design (ICCAD-2019), Westminster, CO, USA, 4–7 November 2019; pp. 1–8. [Google Scholar]
  9. Darwiche, A. SDD: A New Canonical Representation of Propositional Knowledge Bases. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI-2011), Catalonia, Spain, 16–22 July 2011; pp. 819–826. [Google Scholar]
  10. Bryant, R.E. Supplementary Material on Chain Reduction for Binary and Zero-Suppressed Decision Diagrams. 2020. Available online: http://www.cs.cmu.edu/~bryant/bdd-chaining.html (accessed on 5 December 2023).
  11. SecLists. Available online: https://github.com/danielmiessler/SecLists/tree/master (accessed on 8 December 2023).
Figure 1. Trimming rules for TSDDs.
Figure 2. Trie.
Table 1. The time (secs.) for encoding dictionaries with and without tries.
Word Count | Method | Compact, One-Hot | Compact, Binary | ASCII, One-Hot | ASCII, Binary
20,000 | with trie | 3.527 | 2.228 | 4.23 | 3.119
20,000 | without trie | 251.86 | 412.156 | 292.52 | 842.622
30,000 | with trie | 5.509 | 3.326 | 5.084 | 3.702
30,000 | without trie | 650.39 | 994.768 | 677.524 | 1511.209
40,000 | with trie | 7.141 | 3.927 | 7.677 | 5.503
40,000 | without trie | 1125.327 | 1682.585 | 1100.168 | 2500.582
Table 2. The comparison of SDDs, ZSDDs, and TSDDs over 2 categories of benchmarks.
Dictionary | Word Count | Decision Diagram | Compact, One-Hot (Node Count, Time/s) | Compact, Binary (Node Count, Time/s) | ASCII, One-Hot (Node Count, Time/s) | ASCII, Binary (Node Count, Time/s)
words235,886BDD9,701,439770.1141,117,454102.7423,161,5012583.6391,464,773116.729
ZDD297,68148.78723,54213.11297,681173.56851,58014.4
CBDD626,07097.6711,007,86869.721626,07196.3651,277,640117.756
CZDD297,68115.04723,5429.7297,68121.84851,58010.2
SDD9,560,113745.614300,235235.77423,106,5452011.952536,433260.563
ZSDD298,995221.506443,883224.212305,323225.827740,462248.558
TSDD358,778233.301215,588109.899360,797228.642191,762209.041
password979,247BDD49,231,0853513.9064,422,2921061.89579,014,9313582.564,943,940963.165
ZDD1,130,729713.152,506,08852.521,130,729658.212,875,61250.77
CBDD2,321,5721734.8693,597,474710.5042,321,7921466.6744,307,6141466.614
CZDD1,130,72946.732,506,08830.621,130,72957.812,875,61230.33
SDD--3,648,9731074.062--3,979,7171,348.212
ZSDD1,146,448432.1121,744,536562.3291,188,576440.7622,866,480919.113
TSDD1,356,8691316.6811,101,669510.2761,366,130927.0792,426,472792.076
Ashley-
Madison
375,853BDD2,111,727364.7462,082,848280.353
ZDD548,365103.033922,478154.342547,753112.3711,326,746220.22
CBDD548,724498.3071,033,895199.4695,493427.791,435,5853301.287
CZDD548,36583.179922,475131.328547,753102.6951,326,745200.374
SDD--1,671,665525.145--1,656,556518.945
ZSDD560,898273.59923,129273.59564,650297.3031,280,338423.104
TSDD596,396351.467420,810228.901595,493369.138965,362333.126
cain-and
-abel
306,706BDD10,056,981887.6811,158,528100.34823,800,8132660.4461,298,518116.788
ZDD317,74535.482623,05368.525317,51943.817873,02093.901
CBDD318,014102.549623,08384.873317,76899.385872,924122.171
CZDD317,67532.672623,03358.034317,39438.553872,95880.506
SDD9,919,0421006.302494,076286.03223,665,1652860.691693,425312.641
ZSDD319,387235.417623,169274.117323,398227.594789,973289.935
TSDD382,102253.088233,681219.508383,978271.741221,983225.274
dutch
_wordlist
679,006BDD1,532,202283.5131,497,917270.681
ZDD434,264111.298730,174159.714416,140119.327977,282221.458
CBDD434,853649.243730,142236.863416,667520.259976,921332.901
CZDD434,264100.395730,103131.916416,140102.627977,208205.869
SDD--1,113,678484.359--1,124,797469.573
ZSDD462,359275.034665,840322.585449,766271.683977,282432.675
TSDD521,416648.479315,129276.333504,568603.326273,022310.324
honeynet2226,928BDD16,684,5681805.8691,153,693123.61820,820,9112822.8581,135,440121.942
ZDD290,59245.767507,57462.566287,03244.778737,354103.399
CBDD290,974181.046507,56181.879287,392136.038737,267121.531
CZDD290,54240.384507,51554.568286,97537.238737,28099.295
SDD16,498,5452618.111365,332541.942--338,896509.53
ZSDD306,700425.001492,636496.043306,572426.833738,166523.851
TSDD375,037563.074224,538284.982376,022557.861193,971407.62
honeynet226,081BDD16,678,5011832.5911,154,555132.44720,820,7872878.6941,135,434124.669
ZDD289,33756.012481,96969.611287,03344.508737,354105.59
CBDD289,719185.336481,94778.524287,392132.029737,267123.408
CZDD289,33747.135481,88860.258287,03340.667737,280100.879
SDD--332,475516.381--336,096521.902
ZSDD302,318424.39481,176475.143304,772424.651672,556492.475
TSDD374,666955.063200,047322.606373,872710.177207,326430.065
honeynet-
withcount
226,928BDD19,411,1881912.061,340,803130.74624,073,9812877.5321,314,007129.876
ZDD323,72550.566584,37763.843320,20654.416847,306124.843
CBDD324,089173.293584,35186.678320,558141.628847,198128.246
CZDD323,72546.186584,31758.419320,20647.772847,220107.445
SDD19,037,0122176.356383,858562.195--438,056555.949
ZSDD338,609436.965585,246475.067340,241432.563822,046530.876
TSDD416,101438.361214,368405.273417,684426.115214,044425.736
mssql-
passwords
172,696BDD6,033,921538.622445,94024.0967,852,025827.749430,15623.884
ZDD96,0326.446217,33814.30495,3797.727266,12518.321
CBDD96,39340.944217,32918.00995,74333.264266,08623.042
CZDD96,0386.468217,33213.27695,3857.668266,10617.458
SDD5,996,363733.86586,651103.1277,843,567917.80186,137118.743
ZSDD110,101392.158133,880325.367118,332436.18673,709876.657
TSDD124,794222.71485,10944.981127,391237.56480,50452.27
phpbb-
cleaned-up
184,364BDD19,685,3912767.2141,473,949176.61,466,049178.394
ZDD344,15458.099646,61784.216344,38250.182905,808125.349
CBDD344,488247.555626,625116.836344,681210.924905,717162.292
CZDD344,15453.096626,58883.761344,38249.099905,750120.685
SDD19,570,7452092.278773,398380.189--790,332374.249
ZSDD352,559233.575626,852281.159358,112232.441905,946320.189
TSDD446,421447.709277,164220.467444,665394.971231,022261.902
phpbb184,388BDD19,915,2252623.8451,475,978186.1361,467,977169.521
ZDD344,46856.859627,42272.915344,68056.34906,562120.612
CBDD344,800226.646627,42995.844344,976190.775906,471138.621
CZDD344,46854.703627,39388.244344,68055.07906,504120.922
SDD19,637,3842196.641775,524376.349--791,935375.543
ZSDD353,976239.4627,657281.859358,506273.315906,699312.853
TSDD445,432369.997278,763255.027448,969418.608236,446241.572
phpbb-
withcount
184,389BDD21,291,7012903.6871,584,038205.2571,573,792202.202
ZDD365,84770.259695,36499.788365,57384.197972,290147.415
CBDD366,120260.719695,356137.425365,835234.62972,220194.979
CZDD365,84772.25695,303106.178365,57379.704972,236133.168
SDD--800,791387.881--774,597355.65
ZSDD374,969240.648692,190277.569379,074239.707972,429318.899
TSDD473,851355.424302,141195.59477,819366.837481,863301.143
walk-
the-line
279,616BDD22,5261.71638820.1990,11114.99447300.223
ZDD8800.04217640.0198800.13624150.081
CBDD8920.10117940.038920.26624390.099
CZDD8920.05917940.0468920.12724390.078
SDD598030.19613840.65971,458195.53719131.066
ZSDD20161.09311980.16552022.09914210.25
TSDD16271.91211750.3341876.28112691.634
xato-net-
10-million
755,995BDD3,055,697760.5343,022,307659.394
ZDD853,731499.1881,372,273270.62850,452198.0321,944,759396.952
CBDD854,2871364.9181,372,115467.862850,9731130.4531,944,325678.649
CZDD853,731492.3491,372,123262.815850,452236.0051,944,387664.089
SDD--2,409,7521056.438--2,313,3201,074.787
ZSDD859,361538.1691,358,287662.663860,536570.8751,877,647793.756
TSDD1,055,7631537.339665,174629.0511,052,7451745.6051,279,015776.298

Zhong, D.; Fang, L.; Guan, Q. Dictionary Encoding Based on Tagged Sentential Decision Diagrams. Algorithms 2024, 17, 42. https://doi.org/10.3390/a17010042
