In this section, we introduce the basics of the CYK algorithm and give an overview of a class of distributed representations called holographic reduced representations.
3.1. CYK Algorithm
The CYK algorithm is a classical algorithm for recognition/parsing based on context-free grammars (CFGs), using dynamic programming. We provide here a brief description of the algorithm in order to introduce the notation used in later sections; we closely follow the presentation in [
4], and we assume the reader is familiar with the formalism of CFG ([
1], Chapter 5). The algorithm requires CFGs in Chomsky normal form, where each rule has the form
$A\to BC$ or
$A\to a$, where
A,
B, and
C are nonterminal symbols and
a is a terminal symbol. We write
R to denote the set of all rules of the grammar and
$NT$ to denote the set of all its nonterminals.
Given an input string
$w={a}_{1}{a}_{2}\cdots {a}_{n}$,
$n\ge 1$, where each
${a}_{i}$ is an alphabet symbol, the algorithm uses a two-dimensional table
P of size
$(n+1)\times (n+1)$, where each entry stores a set of nonterminals representing partial parses of the input string. More precisely, for
$0\le i<j\le n$, a nonterminal
A belongs to set
$P[i,j]$ if and only if there exists a parse tree with root
A generating the substring
${a}_{i+1}\cdots {a}_{j}$ of
w. Thus,
w can be parsed if and only if the initial nonterminal of the grammar
S is added to
$P[0,n]$. Algorithm 1 shows how table
P is populated.
P is first initialized using unary rules, at Line 3. Then, each entry
$P[i,j]$ is filled at Line 11 by looking at pairs
$P[i,k]$ and
$P[k,j]$ and by using binary rules.
Algorithm 1 CYK(string $w={a}_{1}{a}_{2}\cdots {a}_{n}$, rule set R) return table P.
 1: for $i\leftarrow 1$ to $n$ do
 2:   for each $A\to {a}_{i}$ in R do
 3:     add A to $P[i-1,i]$
 4:   end for
 5: end for
 6: for $j\leftarrow 2$ to $n$ do
 7:   for $i\leftarrow j-2$ downto 0 do
 8:     for $k\leftarrow i+1$ to $j-1$ do
 9:       for each $A\to BC$ in R do
10:         if $B\in P[i,k]$ and $C\in P[k,j]$ then
11:           add A to $P[i,j]$
12:         end if
13:       end for
14:     end for
15:   end for
16: end for
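The table-filling procedure above can be sketched in Python. This is a minimal recognizer, not the paper's implementation; the CNF grammar at the bottom (for the language $\{{a}^{n}{b}^{n}:n\ge 1\}$) is a hypothetical example chosen only for illustration.

```python
# Sketch of Algorithm 1 (CYK recognition), assuming a CNF grammar given
# as unary rules (A -> a) and binary rules (A -> B C).

def cyk(w, unary, binary):
    n = len(w)
    # P[i][j] stores the nonterminals deriving the substring a_{i+1}..a_j
    P = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i in range(1, n + 1):              # initialization with unary rules
        for A, a in unary:
            if a == w[i - 1]:
                P[i - 1][i].add(A)
    for j in range(2, n + 1):
        for i in range(j - 2, -1, -1):
            for k in range(i + 1, j):      # split point between the two halves
                for A, B, C in binary:
                    if B in P[i][k] and C in P[k][j]:
                        P[i][j].add(A)
    return P

# Hypothetical CNF grammar for {a^n b^n : n >= 1}:
# S -> A B | A C, C -> S B, A -> a, B -> b
unary = [("A", "a"), ("B", "b")]
binary = [("S", "A", "B"), ("S", "A", "C"), ("C", "S", "B")]

P = cyk("aabb", unary, binary)
print("S" in P[0][4])   # True: the whole string is derivable from S
```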

A running example is presented in
Figure 1, showing a set
R of grammar rules along with the table
P produced by the algorithm when processing the input string
$w=aab$ (the right part of this figure should be ignored for now). For instance,
S is added to
$P[1,3]$, since
$D\in P[1,2]$,
$E\in P[2,3]$, and
$(S\to DE)\in R$. Since
$S\in P[0,3]$, we conclude that
w can be generated by the grammar.
3.2. Distributed Representations with Holographic Reduced Representations
Holographic reduced representations (HRRs), introduced in [
26] and extended in [
27], are distributed representations well suited for our aim of encoding the twodimensional parsing table
P of the CYK algorithm and for implementing the operation of selecting the content of its cells
$P[i,j]$. In the following, we introduce the operations we use, along with a graphical way to represent their properties. The graphical representation is based on Tetris-like pieces.
These HRRs represent sequences of symbols
$s={s}_{1}\cdots {s}_{n}$ in vectors
$\overrightarrow{s}$ by composing vectors
$\overrightarrow{{s}_{i}}$ of symbols
${s}_{i}$ in the sequence. Hence, these representations offer an encoder and a decoder that map sequences into vectors. The encoder and the decoder are not learned but rely on the statistical properties of random vectors and on the properties of a basic operation for composing vectors. The basic operation is circular convolution [
26], extended to shuffled circular convolution in [
27], which is non-commutative and thus alleviates the problem of confusing sequences that contain the same symbols in a different order.
The starting point of a distributed representation and, hence, of HRR is how to encode symbols into vectors: symbol
a is encoded using a random vector
$\overrightarrow{a}\in {\mathbb{R}}^{d}$ drawn from a multivariate normal distribution
$\overrightarrow{a}\sim N(0,\mathbf{I}\frac{1}{\sqrt{d}})$. These are used as basis vectors for the Johnson–Lindenstrauss transform [
28], as well as for random indexing [
29]. The main property of these random vectors is their quasi-orthonormality: ${\overrightarrow{a}}^{\top}\overrightarrow{a}\approx 1$, while ${\overrightarrow{a}}^{\top}\overrightarrow{b}\approx 0$ for any two independently drawn vectors $\overrightarrow{a}\ne \overrightarrow{b}$.
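This property of random vectors can be checked empirically with a short numpy sketch; the dimension $d$ and the seed below are arbitrary choices, and we read the covariance $\mathbf{I}\frac{1}{\sqrt{d}}$ as giving each component a standard deviation of $1/\sqrt{d}$, so that each vector has expected squared norm close to 1.

```python
import numpy as np

# Random symbol vectors with component standard deviation 1/sqrt(d),
# so each vector has squared norm close to 1.
rng = np.random.default_rng(0)
d = 4096
a = rng.normal(0.0, 1.0 / np.sqrt(d), size=d)
b = rng.normal(0.0, 1.0 / np.sqrt(d), size=d)

# a . a is close to 1, while a . b is close to 0
print(float(a @ a), float(a @ b))
```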
In this paper, we use the matrix representation of the shuffled circular convolution introduced in [
27]. In this representation, composition is just matrix multiplication, and the (approximate) inverse operation is matrix transposition. Given the above symbol encoding, we can define a basic operation
${\left[\phantom{\rule{0.222222em}{0ex}}\right]}^{\oplus}$ and its approximate inverse
${\left[\phantom{\rule{0.222222em}{0ex}}\right]}^{\ominus}$. These operations take as input a symbol and provide a matrix in
${\mathbb{R}}^{d\times d}$ and are the basis for our encoding and decoding. The first operation is defined as:

${\left[a\right]}^{\oplus}=\Phi {\mathrm{A}}_{\circ}$

where
$\Phi $ is a permutation matrix to obtain the shuffling [
27] and
${\mathrm{A}}_{\circ}$ is the circulant matrix of the vector
${\overrightarrow{a}}^{\top}=\left(\begin{array}{cccc}{a}_{0}& {a}_{1}& \dots & {a}_{d-1}\end{array}\right)$, that is:

${\mathrm{A}}_{\circ}=\left(\begin{array}{cccc}{s}_{0}\left(\overrightarrow{a}\right)& {s}_{1}\left(\overrightarrow{a}\right)& \dots & {s}_{d-1}\left(\overrightarrow{a}\right)\end{array}\right)$

while
${s}_{i}\left(\overrightarrow{a}\right)$ is the circular shifting of
i positions of the vector
$\overrightarrow{a}$. Circulant matrices are used to describe circular convolution. In fact,
$\overrightarrow{a}\ast \overrightarrow{b}={\mathrm{A}}_{\circ}\overrightarrow{b}={\mathrm{B}}_{\circ}\overrightarrow{a}$ where
$\ast $ is circular convolution. This operation has a nice approximate inverse in:

${\left[a\right]}^{\ominus}={\left({\left[a\right]}^{\oplus}\right)}^{\top}={\mathrm{A}}_{\circ}^{\top}{\Phi}^{\top}$

We then have:

${\left[a\right]}^{\ominus}{\left[b\right]}^{\oplus}={\mathrm{A}}_{\circ}^{\top}{\Phi}^{\top}\Phi {\mathrm{B}}_{\circ}={\mathrm{A}}_{\circ}^{\top}{\mathrm{B}}_{\circ}$
since
$\Phi $ is a permutation matrix and therefore
$\Phi {\Phi}^{\top}=I$ and since:

${\mathrm{A}}_{\circ}^{\top}{\mathrm{B}}_{\circ}\approx \mathbf{I}$ if $\overrightarrow{a}=\overrightarrow{b}$ and ${\mathrm{A}}_{\circ}^{\top}{\mathrm{B}}_{\circ}\approx \mathbf{0}$ otherwise,
due to the fact that
${\mathrm{A}}_{\circ}$ and
${\mathrm{B}}_{\circ}$ are circulant matrices based on random vectors
$\overrightarrow{a},\overrightarrow{b}\sim N(0,\mathbf{I}\frac{1}{\sqrt{d}})$; hence,
${s}_{i}{\left(\overrightarrow{a}\right)}^{\top}{s}_{j}\left(\overrightarrow{b}\right)\approx 1$ if both
$i=j$ and
$\overrightarrow{a}=\overrightarrow{b}$, and
${s}_{i}{\left(\overrightarrow{a}\right)}^{\top}{s}_{j}\left(\overrightarrow{b}\right)\approx 0$ otherwise. Finally, the permutation matrix
$\Phi $ is used to enforce non-commutativity in matrix products such as
${\left[a\right]}^{\oplus}{\left[b\right]}^{\oplus}{\left[c\right]}^{\oplus}$.
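The two operations can be sketched numerically. The circulant-plus-permutation construction below follows the definitions above; the dimension, the seed, and the use of the normalized trace as a closeness-to-identity score are our own illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 1024

def sym():
    # random symbol vector with expected squared norm 1
    return rng.normal(0.0, 1.0 / np.sqrt(d), size=d)

def circulant(v):
    # column j is the circular shift of v by j positions
    return np.stack([np.roll(v, j) for j in range(d)], axis=1)

Phi = np.eye(d)[rng.permutation(d)]   # fixed permutation matrix

def enc(v):
    # [a]+ = Phi A_circ
    return Phi @ circulant(v)

def dec(v):
    # [a]- = ([a]+)^T = A_circ^T Phi^T
    return enc(v).T

a, b = sym(), sym()

# [a]- [a]+ is approximately I, while [b]- [a]+ is approximately 0;
# we measure closeness to the identity by the normalized trace.
print(np.trace(dec(a) @ enc(a)) / d)   # close to 1
print(np.trace(dec(b) @ enc(a)) / d)   # close to 0
```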
With the ${\left[\phantom{\rule{0.222222em}{0ex}}\right]}^{\oplus}$ and ${\left[\phantom{\rule{0.222222em}{0ex}}\right]}^{\ominus}$ operations at hand, we can now encode and decode strings, that is, finite sequences of symbols. As an example, the string $abc$ can be represented as the matrix product ${\left[a\right]}^{\oplus}{\left[b\right]}^{\oplus}{\left[c\right]}^{\oplus}$. In fact, we can check that ${\left[a\right]}^{\oplus}{\left[b\right]}^{\oplus}{\left[c\right]}^{\oplus}$ starts with a, but not with b or with c, since we have ${\left[a\right]}^{\ominus}{\left[a\right]}^{\oplus}{\left[b\right]}^{\oplus}{\left[c\right]}^{\oplus}\approx {\left[b\right]}^{\oplus}{\left[c\right]}^{\oplus}$, which is different from $\mathbf{0}$, while ${\left[b\right]}^{\ominus}{\left[a\right]}^{\oplus}{\left[b\right]}^{\oplus}{\left[c\right]}^{\oplus}\approx \mathbf{0}$ and ${\left[c\right]}^{\ominus}{\left[a\right]}^{\oplus}{\left[b\right]}^{\oplus}{\left[c\right]}^{\oplus}\approx \mathbf{0}$. Knowing that ${\left[a\right]}^{\oplus}{\left[b\right]}^{\oplus}{\left[c\right]}^{\oplus}$ starts with a, we can also check that the second symbol in ${\left[a\right]}^{\oplus}{\left[b\right]}^{\oplus}{\left[c\right]}^{\oplus}$ is b, since ${\left[b\right]}^{\ominus}{\left[a\right]}^{\ominus}{\left[a\right]}^{\oplus}{\left[b\right]}^{\oplus}{\left[c\right]}^{\oplus}$ is different from $\mathbf{0}$. Finally, knowing that ${\left[a\right]}^{\oplus}{\left[b\right]}^{\oplus}{\left[c\right]}^{\oplus}$ starts with $ab$, we can check that the string ends in c, since ${\left[c\right]}^{\ominus}{\left[b\right]}^{\ominus}{\left[a\right]}^{\ominus}{\left[a\right]}^{\oplus}{\left[b\right]}^{\oplus}{\left[c\right]}^{\oplus}\approx \mathbf{I}$.
Using the above operations, we can also encode sets of strings. For instance, the string set $\mathcal{S}=\{abS,DSa\}$ is represented as the sum of matrix products ${\left[a\right]}^{\oplus}{\left[b\right]}^{\oplus}{\left[S\right]}^{\oplus}+{\left[D\right]}^{\oplus}{\left[S\right]}^{\oplus}{\left[a\right]}^{\oplus}$. We can then test whether $abS\in \mathcal{S}$ by computing the matrix product ${\left[S\right]}^{\ominus}{\left[b\right]}^{\ominus}{\left[a\right]}^{\ominus}({\left[a\right]}^{\oplus}{\left[b\right]}^{\oplus}{\left[S\right]}^{\oplus}+{\left[D\right]}^{\oplus}{\left[S\right]}^{\oplus}{\left[a\right]}^{\oplus})\approx \mathbf{I}$, meaning that the answer is positive. Similarly, $aDS\in \mathcal{S}$ is false, since ${\left[S\right]}^{\ominus}{\left[D\right]}^{\ominus}{\left[a\right]}^{\ominus}({\left[a\right]}^{\oplus}{\left[b\right]}^{\oplus}{\left[S\right]}^{\oplus}+{\left[D\right]}^{\oplus}{\left[S\right]}^{\oplus}{\left[a\right]}^{\oplus})\approx \mathbf{0}$. We can also test whether there is any string in $\mathcal{S}$ starting with a, by computing ${\left[a\right]}^{\ominus}({\left[a\right]}^{\oplus}{\left[b\right]}^{\oplus}{\left[S\right]}^{\oplus}+{\left[D\right]}^{\oplus}{\left[S\right]}^{\oplus}{\left[a\right]}^{\oplus})\approx {\left[b\right]}^{\oplus}{\left[S\right]}^{\oplus}$ and providing a positive answer since the result is different from $\mathbf{0}$.
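The set-membership tests above can be reproduced numerically. This sketch reuses the circulant-plus-permutation construction of ${\left[\phantom{\rule{0.222222em}{0ex}}\right]}^{\oplus}$ described earlier; the dimension, the seed, and the normalized-trace score are illustrative choices of ours.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 1024
Phi = np.eye(d)[rng.permutation(d)]   # fixed permutation matrix

def circulant(v):
    # column j is the circular shift of v by j positions
    return np.stack([np.roll(v, j) for j in range(d)], axis=1)

def enc(v):        # [x]+ = Phi X_circ
    return Phi @ circulant(v)

def dec(v):        # [x]- = ([x]+)^T
    return enc(v).T

# one random vector per symbol
vec = {s: rng.normal(0.0, 1.0 / np.sqrt(d), size=d) for s in "abSD"}
E = {s: enc(v) for s, v in vec.items()}
D = {s: dec(v) for s, v in vec.items()}

# S_set encodes the string set {abS, DSa}
S_set = E["a"] @ E["b"] @ E["S"] + E["D"] @ E["S"] @ E["a"]

def member(string, M):
    # build [s_n]- ... [s_1]- M; a normalized trace near 1 means "in the set"
    for s in string:
        M = D[s] @ M
    return np.trace(M) / d

print(member("abS", S_set))   # close to 1: abS is in the set
print(member("aDS", S_set))   # close to 0: aDS is not
```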
Not only can our operations be used to encode sets, as described above; they can also be used to encode multisets, that is, they can keep a count of the number of occurrences of a given symbol/string within a collection. For instance, consider the multiset consisting of two occurrences of symbol a. This can be encoded by means of the sum ${\left[a\right]}^{\oplus}+{\left[a\right]}^{\oplus}$. In fact, we can test the number of occurrences of symbol a in the multiset using the product ${\left[a\right]}^{\ominus}({\left[a\right]}^{\oplus}+{\left[a\right]}^{\oplus})\approx \mathbf{I}+\mathbf{I}=2\mathbf{I}$.
Our operations have the nice property of deleting symbols in a chain of matrix multiplications when opposed symbols are contiguous: two contiguous opposed symbols reduce to the identity matrix, which is neutral with respect to matrix multiplication. This behavior is similar to the game of Tetris, where pieces delete lines when their shapes complement the holes in those lines. Hence, to visualize the encoding and decoding implemented by the above operations, we use a graphical representation based on Tetris, in which each symbol under the above operations is drawn as a Tetris-like piece (the piece figures are not reproduced here). Strings are then sequences of pieces; for example, the sequence of pieces for a, b, and S encodes $abS$ (equivalently, ${\left[a\right]}^{\oplus}{\left[b\right]}^{\oplus}{\left[S\right]}^{\oplus}$). Then, as in Tetris, elements with complementary shapes cancel out and are removed from a sequence; for example, if the piece for ${\left[a\right]}^{\ominus}$ is applied to the left of the sequence encoding $abS$, the two complementary a-pieces disappear and the result is the sequence encoding $bS$. Sets of strings (sums of matrix products) are represented in boxes; for instance, the box containing the sequences for $abS$ and $DSa$ encodes the set $\{abS,DSa\}$ (equivalently, ${\left[a\right]}^{\oplus}{\left[b\right]}^{\oplus}{\left[S\right]}^{\oplus}+{\left[D\right]}^{\oplus}{\left[S\right]}^{\oplus}{\left[a\right]}^{\oplus}$). In addition to the usual Tetris rules, we assume here that an element with a certain shape selects from a box only the elements with the complementary shape facing it. For instance, if the piece for ${\left[a\right]}^{\ominus}$ is applied to the left of the above box, the result is a new box containing only the sequence for $bS$: the piece selects ${\left[a\right]}^{\oplus}{\left[b\right]}^{\oplus}{\left[S\right]}^{\oplus}$ but not ${\left[D\right]}^{\oplus}{\left[S\right]}^{\oplus}{\left[a\right]}^{\oplus}$.
With the encoding introduced above and with the Tetris metaphor, we can describe our model for encoding P tables as matrices of real numbers and for implementing CFG rule applications by means of matrix multiplication, as discussed in the next section.