<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xml:lang="en" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Algorithms</journal-id>
<journal-title>Algorithms</journal-title>
<issn pub-type="epub">1999-4893</issn>
<publisher>
<publisher-name>Molecular Diversity Preservation International (MDPI)</publisher-name></publisher></journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3390/a4040285</article-id>
<article-id pub-id-type="publisher-id">algorithms-04-00285</article-id>
<article-categories>
<subj-group>
<subject>Article</subject></subj-group></article-categories>
<title-group>
<article-title>An Algorithm to Compute the Character Access Count Distribution for Pattern Matching Algorithms</article-title></title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Marschall</surname><given-names>Tobias</given-names></name><xref ref-type="aff" rid="af1-algorithms-04-00285"><sup>1</sup></xref><xref ref-type="corresp" rid="c1-algorithms-04-00285"><sup>*</sup></xref></contrib>
<contrib contrib-type="author">
<name><surname>Rahmann</surname><given-names>Sven</given-names></name><xref ref-type="aff" rid="af2-algorithms-04-00285"><sup>2</sup></xref><xref ref-type="aff" rid="af3-algorithms-04-00285"><sup>3</sup></xref><xref ref-type="corresp" rid="c1-algorithms-04-00285"><sup>*</sup></xref></contrib></contrib-group>
<aff id="af1-algorithms-04-00285">
<label>1</label> Centrum Wiskunde &amp; Informatica (CWI), Science Park 123, 1098 XG Amsterdam, The Netherlands</aff>
<aff id="af2-algorithms-04-00285">
<label>2</label> Genome Informatics, Faculty of Medicine, University of Duisburg-Essen, Hufelandstr. 55, 45122 Essen, Germany</aff>
<aff id="af3-algorithms-04-00285">
<label>3</label> Bioinformatics, Computer Science XI, TU Dortmund, 44221 Dortmund, Germany</aff>
<author-notes>
<corresp id="c1-algorithms-04-00285">
<label>*</label>Authors to whom correspondence should be addressed; E-Mails: <email>T.Marschall@cwi.nl</email> (T.M.); <email>Sven.Rahmann@tu-dortmund.de</email> (S.R.); Tel./Fax: +31(0)20 592 4132 ext. 4199.</corresp></author-notes>
<pub-date pub-type="collection">
<year>2011</year></pub-date>
<pub-date pub-type="epub">
<day>31</day>
<month>10</month>
<year>2011</year></pub-date>
<volume>4</volume>
<issue>4</issue>
<fpage>285</fpage>
<lpage>306</lpage>
<history>
<date date-type="received">
<day>14</day>
<month>10</month>
<year>2011</year></date>
<date date-type="rev-recd">
<day>26</day>
<month>10</month>
<year>2011</year></date>
<date date-type="accepted">
<day>26</day>
<month>10</month>
<year>2011</year></date></history>
<permissions>
<copyright-statement>© 2011 by the authors; licensee MDPI, Basel, Switzerland.</copyright-statement>
<copyright-year>2011</copyright-year>
<license>
<p>This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).</p></license></permissions>
<abstract>
<p>We propose a framework for the exact probabilistic analysis of window-based pattern matching algorithms, such as Boyer–Moore, Horspool, Backward DAWG Matching, Backward Oracle Matching, and more. In particular, we develop an algorithm that efficiently computes the distribution of a pattern matching algorithm's running time cost (such as the number of text character accesses) for any given pattern in a random text model. Text models range from simple uniform models to higher-order Markov models or hidden Markov models (HMMs). Furthermore, we provide an algorithm to compute the exact distribution of <italic>differences</italic> in running time cost of two pattern matching algorithms. Methodologically, we use extensions of finite automata which we call <italic>deterministic arithmetic automata</italic> (DAAs) and <italic>probabilistic arithmetic automata</italic> (PAAs) [<xref ref-type="bibr" rid="b1-algorithms-04-00285">1</xref>]. Given an algorithm, a pattern, and a text model, a PAA is constructed from which the sought distributions can be derived using dynamic programming. To our knowledge, this is the first time that substring- or suffix-based pattern matching algorithms are analyzed exactly by computing the whole distribution of running time cost. Experimentally, we compare Horspool's algorithm, Backward DAWG Matching, and Backward Oracle Matching on prototypical patterns of short length and provide statistics on the size of minimal DAAs for these computations.</p></abstract>
<kwd-group>
<kwd>pattern matching</kwd>
<kwd>analysis of algorithms</kwd>
<kwd>finite automaton</kwd>
<kwd>minimization</kwd>
<kwd>deterministic arithmetic automaton</kwd>
<kwd>probabilistic arithmetic automaton</kwd></kwd-group></article-meta></front>
<body>
<sec sec-type="intro">
<label>1.</label>
<title>Introduction</title>
<p>The basic pattern matching problem is to find all occurrences of a <italic>pattern</italic> string in a (long) <italic>text</italic> string, with few character accesses, where a <italic>character access</italic> is the act of retrieving one character of the input string from memory. For many pattern matching algorithms, this is equivalent to speaking of character <italic>comparisons</italic>, as every accessed character is immediately compared to a character in the pattern. However, for some algorithms (e.g., the Knuth–Morris–Pratt algorithm [<xref ref-type="bibr" rid="b2-algorithms-04-00285">2</xref>]), each character access triggers a table lookup rather than a comparison. Thus, we discuss character accesses rather than character comparisons in the remainder of this article.</p>
<p>Let <italic>n</italic> be the text length and <italic>m</italic> be the pattern length. The well-known Knuth–Morris–Pratt algorithm [<xref ref-type="bibr" rid="b2-algorithms-04-00285">2</xref>] reads each text character exactly once from left to right and hence needs exactly <italic>n</italic> character accesses for any text of length <italic>n</italic>, after preprocessing the pattern in Θ(<italic>m</italic>) time. In contrast, the Boyer–Moore [<xref ref-type="bibr" rid="b3-algorithms-04-00285">3</xref>], Horspool [<xref ref-type="bibr" rid="b4-algorithms-04-00285">4</xref>], Sunday [<xref ref-type="bibr" rid="b5-algorithms-04-00285">5</xref>], Backward DAWG Matching (BDM, [<xref ref-type="bibr" rid="b6-algorithms-04-00285">6</xref>]) and Backward Oracle Matching (BOM, [<xref ref-type="bibr" rid="b7-algorithms-04-00285">7</xref>]) algorithms move a length-<italic>m</italic> search window across the text and first compare its <italic>last</italic> character to the last character of the pattern. This often allows moving the search window by more than one position (at best, by <italic>m</italic> positions if the last window character does not occur in the pattern at all), which yields a best case of <italic>n/m</italic>, but a worst case of <italic>mn</italic> character accesses. The worst case can often be improved to Θ(<italic>m</italic> + <italic>n</italic>), but this makes the code more complicated and seldom provides a speed-up in practice. For practical pattern matching applications, the most important algorithms are Horspool, BDM (often implemented as Backward Nondeterministic DAWG Matching, BNDM, via a non-deterministic automaton that is simulated in a bit-parallel fashion), and BOM, depending on alphabet size, text length and pattern length; see [<xref ref-type="bibr" rid="b8-algorithms-04-00285">8</xref>] for an experimental map.</p>
<p>A question that has apparently not been investigated so far concerns the exact probability distribution of the number of required character accesses 
<inline-formula>
<mml:math id="mm1" display="inline">
<mml:semantics id="sm1">
<mml:mrow>
<mml:msubsup>
<mml:mtext mathvariant="italic">X</mml:mtext>
<mml:mi>n</mml:mi>
<mml:mi>p</mml:mi></mml:msubsup></mml:mrow></mml:semantics></mml:math></inline-formula> when matching a given pattern <italic>p</italic> against a random text of finite length <italic>n</italic> (non-asymptotic case), even though related questions have been answered in the literature. For example, [<xref ref-type="bibr" rid="b9-algorithms-04-00285">9</xref>,<xref ref-type="bibr" rid="b10-algorithms-04-00285">10</xref>] analyze the expected value of 
<inline-formula>
<mml:math id="mm2" display="inline">
<mml:semantics id="sm2">
<mml:mrow>
<mml:msubsup>
<mml:mtext mathvariant="italic">X</mml:mtext>
<mml:mi>n</mml:mi>
<mml:mi>p</mml:mi></mml:msubsup></mml:mrow></mml:semantics></mml:math></inline-formula> for the Horspool algorithm. In [<xref ref-type="bibr" rid="b11-algorithms-04-00285">11</xref>] it is further shown that for the Horspool algorithm, 
<inline-formula>
<mml:math id="mm3" display="inline">
<mml:semantics id="sm3">
<mml:mrow>
<mml:msubsup>
<mml:mtext mathvariant="italic">X</mml:mtext>
<mml:mi>n</mml:mi>
<mml:mi>p</mml:mi></mml:msubsup></mml:mrow></mml:semantics></mml:math></inline-formula> is asymptotically normally distributed for random texts with independent and identically distributed (i.i.d.) characters, and [<xref ref-type="bibr" rid="b12-algorithms-04-00285">12</xref>] extends this result to Markovian text models. In [<xref ref-type="bibr" rid="b13-algorithms-04-00285">13</xref>], a method to compute mean and variance of these distributions is given.</p>
<p>In contrast to these results that are special to the Horspool algorithm, we use a general framework called <italic>probabilistic arithmetic automata</italic> (PAAs), introduced at CPM'08 [<xref ref-type="bibr" rid="b1-algorithms-04-00285">1</xref>], to compute the exact distribution of 
<inline-formula>
<mml:math id="mm4" display="inline">
<mml:semantics id="sm4">
<mml:mrow>
<mml:msubsup>
<mml:mtext mathvariant="italic">X</mml:mtext>
<mml:mi>n</mml:mi>
<mml:mi>p</mml:mi></mml:msubsup></mml:mrow></mml:semantics></mml:math></inline-formula> for any window-based pattern matching algorithm. In [<xref ref-type="bibr" rid="b1-algorithms-04-00285">1</xref>], PAAs were introduced in order to compute the distribution of occurrence counts of patterns, a purpose for which several other researchers have also proposed combining finite automata with probabilistic text models [<xref ref-type="bibr" rid="b14-algorithms-04-00285">14</xref>–<xref ref-type="bibr" rid="b17-algorithms-04-00285">17</xref>]. In particular, the early approach of Nicodème <italic>et al.</italic> [<xref ref-type="bibr" rid="b14-algorithms-04-00285">14</xref>] showed how to derive generating functions and perform asymptotic analysis of occurrence distributions.</p>
<p>Here, we show that a similar idea can be applied to the analysis of pattern matching algorithms by constructing an automaton that encodes the behavior of such an algorithm and then combining it with a text model. The PAA framework allows doing this in a natural way, which further highlights its utility. The random text model can be quite general, from simple i.i.d. uniform models to high-order Markov models or HMMs. The approach is applied to the following pattern matching algorithms in the non-asymptotic regime (short patterns, medium-length texts): Horspool, B(N)DM, and BOM. We do not treat BDM and BNDM separately as, in terms of text character accesses, they are indistinguishable (see Section 2.2).</p>
<p>This paper is organized as follows. In the next section, we give a brief review of the Horspool, B(N)DM and BOM algorithms. In Section 3, we define <italic>deterministic arithmetic automata</italic> (DAAs). In Section 4, we present a simple general DAA construction for the analysis of window-based pattern matching algorithms. We also show that the state space of the DAA can be considerably reduced by adapting DFA minimization to DAAs. In Section 5, we summarize the PAA framework with its generic algorithms, define finite-memory text models and connect DAAs to PAAs. Given a pattern <italic>p</italic>, an algorithm, and a random text model, this framework allows constructing a PAA that mimics the algorithm's behavior. By applying dynamic programming to this PAA, we obtain an algorithm to compute the distribution of 
<inline-formula>
<mml:math id="mm5" display="inline">
<mml:semantics id="sm5">
<mml:mrow>
<mml:msubsup>
<mml:mtext mathvariant="italic">X</mml:mtext>
<mml:mi>n</mml:mi>
<mml:mi>p</mml:mi></mml:msubsup></mml:mrow></mml:semantics></mml:math></inline-formula> for any finite text length <italic>n</italic>. Section 6 introduces <italic>difference DAAs</italic> by a product construction that allows comparing two algorithms on a given pattern. Results on the comparison of several algorithms for example patterns can be found in Section 7. There, we also provide statistics on automata sizes for different algorithms and pattern lengths. Section 8 contains a concluding discussion.</p>
<p>An extended abstract of this work has been presented at LATA'10 [<xref ref-type="bibr" rid="b18-algorithms-04-00285">18</xref>] with two alternative DAA constructions. In contrast to that version, the DAA construction in the present paper can be seen as a combination of both of those, and is much simpler. Additionally, the DAA minimization introduced in the present paper allows the analysis of much longer patterns in practice. While [<xref ref-type="bibr" rid="b18-algorithms-04-00285">18</xref>] was focused on Horspool's and Sunday's algorithms, here, we give a general construction scheme applicable to any window-based pattern matching algorithm and discuss the most relevant algorithms, namely Horspool, BOM, and B(N)DM, as examples.</p>
<sec>
<title>Notation</title>
<p>Throughout this paper, Σ denotes a finite alphabet, <italic>p</italic> ∈ Σ* is an arbitrary but fixed pattern, and <italic>s</italic> ∈ Σ* is the text to be searched for <italic>p</italic>. Furthermore, <italic>m</italic> ≔ |<italic>p</italic>| and <italic>n</italic> ≔ |<italic>s</italic>|. Indexing generally starts at zero, that is, <italic>s</italic> = <italic>s</italic>[0] … <italic>s</italic>[|<italic>s</italic>| − 1] for all <italic>s</italic> ∈ Σ*. The prefix, suffix, and substring of a string <italic>s</italic> are written <italic>s</italic>[‥<italic>i</italic>] ≔ <italic>s</italic>[0] … <italic>s</italic>[<italic>i</italic>], <italic>s</italic>[<italic>i</italic>‥] ≔ <italic>s</italic>[<italic>i</italic>] … <italic>s</italic>[|<italic>s</italic>| − 1], and <italic>s</italic>[<italic>i</italic> … <italic>j</italic>] ≔ <italic>s</italic>[<italic>i</italic>] … <italic>s</italic>[<italic>j</italic>], respectively. By <italic>p⃖</italic>, we denote the reverse pattern <italic>p</italic>[<italic>m</italic> − 1] … <italic>p</italic>[0]. For a random variable <italic>X</italic>, its distribution (law) is denoted by ℒ(<italic>X</italic>). Iverson brackets are written ⟦·⟧, <italic>i.e.</italic>, ⟦<italic>A</italic>⟧ = 1 if the statement <italic>A</italic> is true and ⟦<italic>A</italic>⟧ = 0 otherwise.</p></sec></sec>
<sec>
<label>2.</label>
<title>Algorithms</title>
<p>In the following, we summarize the Horspool, B(N)DM and BOM algorithms; algorithmic details can be found in [<xref ref-type="bibr" rid="b8-algorithms-04-00285">8</xref>].</p>
<p>We do not discuss the Knuth–Morris–Pratt algorithm because its number of text character accesses is constant: Each character of the text is looked at exactly once. Therefore, 
<inline-formula>
<mml:math id="mm6" display="inline">
<mml:semantics id="sm6">
<mml:mrow>
<mml:mi>ℒ</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msubsup>
<mml:mtext mathvariant="italic">X</mml:mtext>
<mml:mi>n</mml:mi>
<mml:mi>p</mml:mi></mml:msubsup></mml:mrow>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:semantics></mml:math></inline-formula> is the Dirac distribution on <italic>n, i.e.</italic>, 
<inline-formula>
<mml:math id="mm7" display="inline">
<mml:semantics id="sm7">
<mml:mrow>
<mml:mi>ℙ</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msubsup>
<mml:mtext mathvariant="italic">X</mml:mtext>
<mml:mi>n</mml:mi>
<mml:mi>p</mml:mi></mml:msubsup>
<mml:mo>=</mml:mo>
<mml:mi>n</mml:mi></mml:mrow>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn></mml:mrow></mml:semantics></mml:math></inline-formula>.</p>
<p>We also do not discuss the Boyer–Moore algorithm, since it is never the best one in practice because of its complicated code to achieve optimal asymptotic running time. In contrast to our earlier paper [<xref ref-type="bibr" rid="b18-algorithms-04-00285">18</xref>], we also skip the Sunday algorithm, as it is almost always inferior to Horspool's. Instead, we focus on those algorithms that are fastest in practice according to [<xref ref-type="bibr" rid="b8-algorithms-04-00285">8</xref>].</p>
<p>The Horspool, B(N)DM and BOM algorithms have the following properties in common: They maintain a search window <italic>w</italic> of length <italic>m</italic> = |<italic>p</italic>| that initially starts at position 0 in the text <italic>s</italic>, such that its rightmost character is at position <italic>t</italic> = <italic>m</italic> − 1. The right window position <italic>t</italic> grows in the course of the algorithm; we always have <italic>w</italic> = <italic>s</italic>[(<italic>t</italic> − <italic>m</italic> + 1) … <italic>t</italic>]. The two properties of an algorithm that influence our analysis are the following: For a pattern <italic>p</italic> ∈ Σ<italic><sup>m</sup></italic>, each window <italic>w</italic> ∈ Σ<italic><sup>m</sup></italic> determines
<list list-type="order">
<list-item>
<p>its cost <italic>ξ<sup>p</sup></italic>(<italic>w</italic>), e.g., the number of text character accesses required to analyze this window,</p></list-item>
<list-item>
<p>its shift <italic>shift<sup>p</sup></italic>(<italic>w</italic>), which is the number of characters the window is advanced after it has been examined.</p></list-item></list></p>
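<p>The interplay of cost and shift can be illustrated by a minimal Python sketch of the window loop shared by these algorithms. The names <monospace>total_cost</monospace>, <monospace>naive_cost</monospace> and <monospace>naive_shift</monospace> are hypothetical and chosen for illustration only; the naive cost and shift functions are a toy stand-in (always read <italic>m</italic> characters, always shift by one position) and not one of the algorithms analyzed in this article.</p>
<preformat preformat-type="code"><![CDATA[
```python
def total_cost(s, m, cost, shift):
    """Sum the window costs over text s for windows of length m.

    cost(w) and shift(w) are functions of the current window w,
    mirroring the two properties listed above (illustrative names).
    """
    total = 0
    t = m - 1                          # right end of the first window
    while t < len(s):
        w = s[t - m + 1 : t + 1]       # current window s[(t-m+1)..t]
        total += cost(w)
        step = shift(w)
        if step < 1:
            raise ValueError("shift must be at least 1")
        t += step
    return total


def naive_cost(w):
    """Toy cost: every window character is read."""
    return len(w)


def naive_shift(w):
    """Toy shift: always advance by one position."""
    return 1


if __name__ == "__main__":
    # 8 windows of length 3 in a text of length 10, 3 accesses each
    print(total_cost("a" * 10, 3, naive_cost, naive_shift))
```
]]></preformat>
<p>With the toy functions, each of the <italic>n</italic> − <italic>m</italic> + 1 windows costs <italic>m</italic> accesses, which is the order-<italic>mn</italic> worst-case behavior mentioned in the introduction. Plugging in the cost and shift functions of a concrete algorithm yields its total number of text character accesses on a given text.</p>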
<sec>
<label>2.1.</label>
<title>Horspool</title>
<p>First, the rightmost characters of window and pattern are compared; that means, <italic>a</italic> ≔ <italic>w</italic>[<italic>m</italic> − 1] = <italic>s</italic>[<italic>t</italic>] is compared with <italic>p</italic>[<italic>m</italic> − 1]. If they match, the remaining <italic>m</italic> − 1 characters are compared until either the first mismatch is found or an entire match has been verified. This comparison can happen right-to-left, left-to-right, or in an arbitrary order that may depend on <italic>p</italic>. In our analysis, we focus on the right-to-left case for concreteness, but the modifications for the other cases are straightforward. Therefore, the cost of window <italic>w</italic> is
<disp-formula id="FD1">
<mml:math id="mm8" display="block">
<mml:semantics id="sm8">
<mml:mrow>
<mml:msup>
<mml:mi>ξ</mml:mi>
<mml:mi>p</mml:mi></mml:msup>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>w</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mtable columnalign="left">
<mml:mtr>
<mml:mtd>
<mml:mi>m</mml:mi></mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:mtext>if</mml:mtext>
<mml:mspace width="0.3em"/>
<mml:mi>p</mml:mi>
<mml:mo>=</mml:mo>
<mml:mi>w</mml:mi>
<mml:mo>,</mml:mo></mml:mrow></mml:mtd></mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mo>min</mml:mo>
<mml:mo>{</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>:</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>≤</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>≤</mml:mo>
<mml:mi>m</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>p</mml:mi>
<mml:mo stretchy="false">[</mml:mo>
<mml:mi>m</mml:mi>
<mml:mo>-</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo stretchy="false">]</mml:mo>
<mml:mo>≠</mml:mo>
<mml:mi>w</mml:mi>
<mml:mo stretchy="false">[</mml:mo>
<mml:mi>m</mml:mi>
<mml:mo>-</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo stretchy="false">]</mml:mo>
<mml:mo>}</mml:mo></mml:mrow></mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:mtext>otherwise</mml:mtext>
<mml:mo>.</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:mrow></mml:mrow></mml:semantics></mml:math></disp-formula></p>
<p>In any case, the rightmost window character <italic>a</italic> is used to determine how far the window can be shifted for the next iteration. The shift function ensures that no match can be missed by moving the window such that <italic>a</italic> becomes aligned to the rightmost <italic>a</italic> in <italic>p</italic> (not considering the last position). If <italic>a</italic> does not occur in <italic>p</italic> (or only at the last position), it is safe to shift by <italic>m</italic> positions. Formally, we define
<disp-formula id="FD2">
<mml:math id="mm9" display="block">
<mml:semantics id="sm9">
<mml:mrow>
<mml:mtable columnalign="left">
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msup>
<mml:mtext mathvariant="italic">right</mml:mtext>
<mml:mi>p</mml:mi></mml:msup>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>a</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>≔</mml:mo>
<mml:mo>max</mml:mo>
<mml:mo stretchy="false">[</mml:mo>
<mml:mo stretchy="false">{</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>∈</mml:mo>
<mml:mo stretchy="false">{</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo>,</mml:mo>
<mml:mo>…</mml:mo>
<mml:mo>,</mml:mo>
<mml:mi>m</mml:mi>
<mml:mo>-</mml:mo>
<mml:mn>2</mml:mn>
<mml:mo stretchy="false">}</mml:mo>
<mml:mo>:</mml:mo>
<mml:mi>p</mml:mi>
<mml:mo stretchy="false">[</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo stretchy="false">]</mml:mo>
<mml:mo>=</mml:mo>
<mml:mi>a</mml:mi>
<mml:mo stretchy="false">}</mml:mo>
<mml:mo>∪</mml:mo>
<mml:mo stretchy="false">{</mml:mo>
<mml:mo>-</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo stretchy="false">}</mml:mo>
<mml:mo stretchy="false">]</mml:mo>
<mml:mo>,</mml:mo></mml:mrow></mml:mtd></mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mtext mathvariant="monospace">shift</mml:mtext>
<mml:mo stretchy="false">[</mml:mo>
<mml:mi>a</mml:mi>
<mml:mo stretchy="false">]</mml:mo>
<mml:mo>≔</mml:mo>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>m</mml:mi>
<mml:mo>-</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>-</mml:mo>
<mml:msup>
<mml:mtext mathvariant="italic">right</mml:mtext>
<mml:mi>p</mml:mi></mml:msup>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>a</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>,</mml:mo>
<mml:mspace width="0.3em"/>
<mml:mtext>assuming</mml:mtext>
<mml:mspace width="0.3em"/>
<mml:mi>p</mml:mi>
<mml:mspace width="0.3em"/>
<mml:mtext>fixed</mml:mtext>
<mml:mo>,</mml:mo></mml:mrow></mml:mtd></mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msup>
<mml:mtext mathvariant="italic">shift</mml:mtext>
<mml:mi>p</mml:mi></mml:msup>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>w</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>≔</mml:mo>
<mml:mtext mathvariant="monospace">shift</mml:mtext>
<mml:mo stretchy="false">[</mml:mo>
<mml:mi>w</mml:mi>
<mml:mo stretchy="false">[</mml:mo>
<mml:mi>m</mml:mi>
<mml:mo>-</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo stretchy="false">]</mml:mo>
<mml:mo stretchy="false">]</mml:mo>
<mml:mo>.</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:semantics></mml:math></disp-formula></p>
<p>For concreteness, we state Horspool's algorithm and how we count text character accesses as pseudocode in Algorithm 1. Note that after a shift, even when we know that <italic>a</italic> now matches its corresponding pattern character, the corresponding position is compared again and counts as a text access. Otherwise the additional bookkeeping would make the algorithm more complicated; this is not worth the effort in practice. The lookup in the 
<monospace>shift</monospace>-table does not count as an additional access, since we can remember 
<monospace>shift</monospace>[<italic>a</italic>] as soon as the last window character has been read.</p>
<p>The main advantage of the Horspool algorithm is its simplicity. In particular, a window's shift value depends only on its last character, and its cost is easily computed from the number of consecutive matching characters at its right end. The Horspool algorithm does not require any advanced data structure and can be implemented in a few lines of code.</p>
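<p>As an illustration of the definitions above, the following Python sketch computes the shift table, the cost of a single window (right-to-left comparison), and the total number of text character accesses. The function names and the explicit <monospace>alphabet</monospace> argument are assumptions made for this example; every text character is assumed to belong to the given alphabet.</p>
<preformat preformat-type="code"><![CDATA[
```python
def horspool_shift_table(p, alphabet):
    """shift[a] = (m - 1) - right(a), where right(a) is the rightmost
    position of a in p[0..m-2], or -1 if a does not occur there."""
    m = len(p)
    table = {}
    for a in alphabet:
        right = p.rfind(a, 0, m - 1)   # search p[0:m-1]; -1 if absent
        table[a] = (m - 1) - right
    return table


def horspool_window_cost(p, w):
    """Number of accesses for window w, comparing right to left."""
    m = len(p)
    for i in range(1, m + 1):
        if p[m - i] != w[m - i]:
            return i                   # position of the first mismatch
    return m                           # full match, all m characters read


def horspool_with_cost(s, p, alphabet):
    """Return (occurrences of p in s, text character accesses)."""
    m = len(p)
    shift = horspool_shift_table(p, alphabet)
    occ = 0
    cost = 0
    t = m - 1                          # right end of the current window
    while t < len(s):
        w = s[t - m + 1 : t + 1]
        cost += horspool_window_cost(p, w)
        if w == p:
            occ += 1
        t += shift[s[t]]               # shift depends on last character only
    return occ, cost


if __name__ == "__main__":
    print(horspool_shift_table("abc", "abcx"))
    print(horspool_with_cost("abcabc", "abc", "abcx"))
    print(horspool_with_cost("xxxxxx", "abc", "abcx"))
```
]]></preformat>
<p>In this sketch, a text consisting only of characters that do not occur in the pattern is processed with one access per window and a shift of <italic>m</italic>, which corresponds to the best case of <italic>n/m</italic> accesses.</p>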
<array>
<tbody>
<tr>
<td colspan="2" valign="bottom">
<hr/></td></tr>
<tr>
<td colspan="2" align="left" valign="top"><bold>Algorithm 1</bold> H<sc>orspool</sc>-<sc>with</sc>-C<sc>ost</sc></td></tr>
<tr>
<td colspan="2" valign="bottom">
<hr/></td></tr>
<tr>
<td colspan="2" align="left" valign="top"><bold>Input:</bold> text <italic>s</italic> ∈ Σ*, pattern <italic>p</italic> ∈ Σ<italic><sup>m</sup></italic></td></tr>
<tr>
<td colspan="2" align="left" valign="top"><bold>Output:</bold> pair (number <italic>occ</italic> of occurrences of <italic>p</italic> in <italic>s</italic>, number <italic>cost</italic> of accesses to <italic>s</italic>)</td></tr>
<tr>
<td align="right" valign="top">1:</td>
<td align="left" valign="top">pre-compute table 
<monospace>shift</monospace>[<italic>a</italic>] for all <italic>a</italic> ∈ Σ</td></tr>
<tr>
<td align="right" valign="top">2:</td>
<td align="left" valign="top">(<italic>occ, cost</italic>) ← (0, 0)</td></tr>
<tr>
<td align="right" valign="top">3:</td>
<td align="left" valign="top"><italic>t</italic> ← <italic>m</italic> − 1</td></tr>
<tr>
<td align="right" valign="top">4:</td>
<td align="left" valign="top"><bold>while</bold> <italic>t</italic> &lt; |<italic>s</italic>| <bold>do</bold></td></tr>
<tr>
<td align="right" valign="top">5:</td>
<td align="left" valign="top"> <italic>i</italic> ← 0</td></tr>
<tr>
<td align="right" valign="top">6:</td>
<td align="left" valign="top"> <bold>while</bold> <italic>i</italic> &lt; <italic>m</italic> <bold>do</bold></td></tr>
<tr>
<td align="right" valign="top">7:</td>
<td align="left" valign="top">  <italic>cost</italic> ← <italic>cost</italic> + 1</td></tr>
<tr>
<td align="right" valign="top">8:</td>
<td align="left" valign="top">  <bold>if</bold> <italic>s</italic>[<italic>t</italic> − <italic>i</italic>] ≠ <italic>p</italic>[(<italic>m</italic> − 1) − <italic>i</italic>] <bold>then break</bold></td></tr>
<tr>
<td align="right" valign="top">9:</td>
<td align="left" valign="top">  <italic>i</italic> ← <italic>i</italic> + 1</td></tr>
<tr>
<td align="right" valign="top">10:</td>
<td align="left" valign="top"> <bold>if</bold> <italic>i</italic> = <italic>m</italic> <bold>then</bold> <italic>occ</italic> ← <italic>occ</italic> + 1</td></tr>
<tr>
<td align="right" valign="top">11:</td>
<td align="left" valign="top"> <italic>t</italic> ← <italic>t</italic> + 
<monospace>shift</monospace>[<italic>s</italic>[<italic>t</italic>]]</td></tr>
<tr>
<td align="right" valign="top">12:</td>
<td align="left" valign="top"><bold>return</bold> (<italic>occ, cost</italic>)</td></tr>
<tr>
<td colspan="2" valign="bottom">
<hr/></td></tr></tbody></array></sec>
<sec>
<label>2.2.</label>
<title>Backward (Nondeterministic) DAWG Matching, B(N)DM</title>
<p>The main idea of the BDM algorithm is to build a deterministic finite automaton (in this case, a suffix automaton, which is a directed acyclic word graph or DAWG) that recognizes all substrings of the reversed pattern, accepts all suffixes of the reversed pattern (including the empty suffix), and enters a FAIL state if a string has been read that is not a substring of the reversed pattern.</p>
<p>The suffix automaton processes the window right-to-left. As long as the FAIL state has not been reached, we have read a substring of the reversed pattern. If we are in an accepting state, we have even found a suffix of the reversed pattern (<italic>i.e.</italic>, a prefix of <italic>p</italic>). Whenever this happens before we have read <italic>m</italic> characters, the last such event marks the next potential window start that may contain a match with <italic>p</italic>, and hence determines the shift. When we are in an accepting state after reading <italic>m</italic> characters, we have found a match, but this does not influence the shift.</p>
<p>So, <italic>ξ<sup>p</sup></italic>(<italic>w</italic>) is the number of characters read when entering FAIL (including the FAIL-inducing character), or <italic>m</italic> if <italic>p</italic> = <italic>w</italic>. Let <italic>I<sup>p</sup></italic>(<italic>w</italic>) ⊆ {0, …, <italic>m</italic> − 1} be the set defined by <italic>i</italic> ∈ <italic>I<sup>p</sup></italic>(<italic>w</italic>) if and only if the suffix automaton of <italic>p⃖</italic> is in an accepting state after reading <italic>i</italic> characters of <italic>w</italic>. Then
<disp-formula id="FD3">
<mml:math id="mm10" display="block">
<mml:semantics id="sm10">
<mml:mrow>
<mml:msup>
<mml:mtext mathvariant="italic">shift</mml:mtext>
<mml:mi>p</mml:mi></mml:msup>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>w</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mo>min</mml:mo>
<mml:mo stretchy="false">{</mml:mo>
<mml:mi>m</mml:mi>
<mml:mo>-</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>∣</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>∈</mml:mo>
<mml:msup>
<mml:mi>I</mml:mi>
<mml:mi>p</mml:mi></mml:msup>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>w</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo stretchy="false">}</mml:mo>
<mml:mo>.</mml:mo></mml:mrow></mml:semantics></mml:math></disp-formula></p>
<p>Note that <italic>I<sup>p</sup></italic>(<italic>w</italic>) is never empty as the suffix automaton accepts the empty string and, thus, 0 ∈ <italic>I<sup>p</sup></italic>(<italic>w</italic>) for all windows <italic>w</italic>.</p>
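<p>The cost and shift definitions for B(N)DM can be checked on small examples with a brute-force Python sketch. It deliberately avoids building the suffix automaton: reading <italic>i</italic> window characters right-to-left without reaching FAIL is equivalent to the <italic>i</italic> rightmost window characters forming a substring of <italic>p</italic>, and being in an accepting state is equivalent to these characters forming a prefix of <italic>p</italic>. The function name is hypothetical, and the sketch is meant as a reference for the definitions, not as an efficient implementation.</p>
<preformat preformat-type="code"><![CDATA[
```python
def bdm_cost_and_shift(p, w):
    """Return (cost, shift) of window w for pattern p under B(N)DM.

    Brute-force reference: substring and prefix tests stand in for the
    suffix automaton of the reversed pattern.
    """
    m = len(p)
    if len(w) != m:
        raise ValueError("window and pattern must have equal length")
    accepting = [0]            # the set I^p(w); it always contains 0
    cost = m                   # value if all m characters are read (w == p)
    for i in range(1, m + 1):
        read = w[m - i:]       # the i rightmost window characters
        if read not in p:      # the i-th character leads to FAIL
            cost = i
            break
        if i < m and p.startswith(read):
            accepting.append(i)    # a prefix of p has been read
    shift = min(m - k for k in accepting)
    return cost, shift


if __name__ == "__main__":
    for window in ("cab", "abc", "xxx", "bbc"):
        print(window, bdm_cost_and_shift("abc", window))
```
]]></preformat>
<p>For the pattern <italic>abc</italic> and the window <italic>cab</italic>, for instance, the prefix <italic>ab</italic> is recognized after two characters and the third character leads to FAIL, so the sketch returns a cost of 3 and a shift of 1.</p>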
<p>The advantage of BDM is that it makes long shifts, but its main disadvantage is the necessary construction of the suffix automaton, which is possible in <italic>O</italic>(<italic>m</italic>) time via the suffix tree of the reversed pattern, but too expensive in practice to compete with other algorithms unless the search text is extremely long.</p>
<p>Constructing a nondeterministic finite automaton (NFA) instead of the deterministic suffix automaton is much simpler, but processing a text character then takes <italic>O</italic>(<italic>m</italic>) rather than constant time. However, the NFA can be efficiently simulated with bit-parallel operations such that processing a text character takes <italic>O</italic>(<italic>m/W</italic>) time, where <italic>W</italic> is the machine word size. For many patterns in practice, this is as good as <italic>O</italic>(1). The resulting algorithm is then called BNDM.</p>
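<p>The bit-parallel simulation can be sketched in Python as follows. This follows the commonly described textbook formulation of BNDM rather than code from this article; the function name is hypothetical, and Python integers are used in place of machine words, so the word-size limit <italic>W</italic> is not modeled. The sketch only reports match positions and does not count character accesses.</p>
<preformat preformat-type="code"><![CDATA[
```python
def bndm_search(text, p):
    """Return the start positions of all occurrences of p in text."""
    m, n = len(p), len(text)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    full = (1 << m) - 1            # bit vector with m one-bits
    high = 1 << (m - 1)            # set when a prefix of p has been read
    masks = {}                     # masks[c]: positions of c in p
    for i, c in enumerate(p):
        masks[c] = masks.get(c, 0) | (1 << (m - 1 - i))
    matches = []
    pos = 0                        # left end of the current window
    while pos <= n - m:
        j, last, d = m, m, full
        while d != 0:              # d == 0 plays the role of FAIL
            d &= masks.get(text[pos + j - 1], 0)
            j -= 1
            if d & high:
                if j > 0:
                    last = j       # remember the next potential window start
                else:
                    matches.append(pos)
            d = (d << 1) & full    # keep only m bits
        pos += last
    return matches


if __name__ == "__main__":
    print(bndm_search("abracadabra", "abra"))
    print(bndm_search("aaaa", "aa"))
```
]]></preformat>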
<p>From the “text character accesses” analysis point of view, BDM and BNDM are equivalent, as they have the same shift and cost functions.</p></sec>
<sec>
<label>2.3.</label>
<title>Backward Oracle Matching, BOM</title>
<p>BOM is similar to B(N)DM, but the suffix automaton of the reversed pattern is replaced by a simpler deterministic automaton, the factor oracle [<xref ref-type="bibr" rid="b8-algorithms-04-00285">8</xref>]. The factor oracle of a string <italic>x</italic> (which corresponds to the reversed pattern <italic>p⃖</italic>) of length <italic>m</italic> has the following properties.</p>
<list list-type="order">
<list-item>
<p>If <italic>y</italic> is a factor (substring) of <italic>x</italic>, then there exists a path spelling <italic>y</italic> from the start state to some state which is not the FAIL state; we say that <italic>y</italic> is <italic>recognized</italic>.</p></list-item>
<list-item>
<p>The only string of length <italic>m</italic> recognized is <italic>x</italic>.</p></list-item>
<list-item>
<p>It has the minimal number of states (<italic>m</italic> + 1) necessary for recognizing <italic>x</italic> (omitting the FAIL state).</p></list-item>
<list-item>
<p>It has between <italic>m</italic> and 2<italic>m</italic> − 1 transitions (omitting those into the FAIL state).</p></list-item></list>
<p>It may recognize more strings than the substrings of <italic>x</italic> (although in practice not many more), but is easier to construct. It still guarantees that, once the FAIL state is reached, the sequence of read characters is not a substring of <italic>x</italic>. We refer to [<xref ref-type="bibr" rid="b8-algorithms-04-00285">8</xref>] for the construction details and further properties of the oracle; an example is shown in <xref ref-type="fig" rid="f1-algorithms-04-00285">Figure 1</xref>.</p>
<p>The cost function <italic>ξ<sup>p</sup></italic>(<italic>w</italic>) is the number of characters read when entering FAIL (including the FAIL-inducing character), or <italic>m</italic> if <italic>p</italic> = <italic>w</italic>. The shift function is based on the principle that the window can be safely shifted beyond the FAILed substring; so <italic>shift<sup>p</sup></italic>(<italic>w</italic>) is defined as <italic>m</italic> minus the number of successfully read characters in <italic>w</italic> if <italic>w</italic> ≠ <italic>p</italic>, and <italic>shift<sup>p</sup></italic>(<italic>p</italic>) ≔ 1 (although this special case for <italic>w</italic> = <italic>p</italic> can be improved by examining the pattern).</p>
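<p>As an illustration, the following sketch builds a factor oracle with the standard on-line construction and derives the cost and shift of a window as described above. The names <monospace>factor_oracle</monospace>, <monospace>recognizes</monospace> and <monospace>bom_cost_and_shift</monospace> are hypothetical, the FAIL state is represented implicitly by a missing transition, and the oracle is rebuilt on every call for brevity.</p>
<preformat><![CDATA[
```python
def factor_oracle(x):
    """Return the transitions of the factor oracle of x as a list of dicts (states 0..len(x))."""
    trans = [dict() for _ in range(len(x) + 1)]
    supply = [-1] * (len(x) + 1)          # supply links; -1 means "undefined"
    for i, c in enumerate(x, start=1):
        trans[i - 1][c] = i               # transition along the spine
        k = supply[i - 1]
        while k != -1 and c not in trans[k]:
            trans[k][c] = i               # additional transition
            k = supply[k]
        supply[i] = 0 if k == -1 else trans[k][c]
    return trans


def recognizes(trans, y):
    """True if y can be read from the start state without entering FAIL."""
    state = 0
    for c in y:
        if c not in trans[state]:
            return False
        state = trans[state][c]
    return True


def bom_cost_and_shift(p, w):
    """Cost and shift of window w for pattern p (both of length m)."""
    m = len(p)
    if w == p:
        return m, 1                       # special case w = p
    trans = factor_oracle(p[::-1])        # oracle of the reversed pattern
    state, read = 0, 0
    for c in reversed(w):                 # the window is read right to left
        if c not in trans[state]:
            break                         # FAIL-inducing character
        state = trans[state][c]
        read += 1
    return read + 1, m - read


oracle = factor_oracle("abbbaab")
print(sum(len(d) for d in oracle))        # number of transitions
print(recognizes(oracle, "abba"))         # recognized, although not a factor
print(bom_cost_and_shift("abc", "xbc"))
```
]]></preformat>
<p>For <italic>x</italic> = abbbaab, this sketch yields 8 states and 11 transitions, and it recognizes abba, which is not a substring of <italic>x</italic>.</p>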
<p>By construction, BOM never gives longer shifts than B(N)DM. The main advantage of BOM over BDM is reduced space usage and preprocessing time; the factor oracle only has <italic>m</italic> + 1 states and can be constructed faster than a suffix automaton.</p></sec></sec>
<sec>
<label>3.</label>
<title>Deterministic Arithmetic Automata</title>
<p>In this section, we introduce deterministic arithmetic automata (DAAs). They extend ordinary deterministic finite automata (DFAs) by performing a computation while one moves from state to state. Even though DAAs can be shown to be formally equivalent to families of DFAs on an appropriately defined larger state space, they are a useful concept before introducing probabilistic arithmetic automata (PAAs) and allow us to construct PAAs for the analysis of pattern matching algorithms in a simpler way. By using the PAA framework, we emphasize the connection between the problems discussed in the present article and those solved before using the same formalism: Other applications in biological sequence analysis include the exact computation of clump size distributions and <italic>p</italic>-values of sequence motifs [<xref ref-type="bibr" rid="b19-algorithms-04-00285">19</xref>], and the determination of seed sensitivity for pairwise sequence alignment algorithms based on filtering [<xref ref-type="bibr" rid="b20-algorithms-04-00285">20</xref>].</p>
<sec>
<title>Definition 1 (Deterministic Arithmetic Automaton, DAA)</title>
<p>A <italic>deterministic arithmetic automaton</italic> is a tuple
<disp-formula id="FD4">
<mml:math id="mm11" display="block">
<mml:semantics id="sm11">
<mml:mrow>
<mml:mi mathvariant="script">D</mml:mi>
<mml:mo>=</mml:mo>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi mathvariant="script">Q</mml:mi>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>q</mml:mi>
<mml:mn>0</mml:mn></mml:msub>
<mml:mo>,</mml:mo>
<mml:mo>∑</mml:mo>
<mml:mo>,</mml:mo>
<mml:mi>δ</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="script">V</mml:mi>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>v</mml:mi>
<mml:mn>0</mml:mn></mml:msub>
<mml:mo>,</mml:mo>
<mml:mi>ℰ</mml:mi>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub>
<mml:mi>η</mml:mi>
<mml:mi>q</mml:mi></mml:msub>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mrow>
<mml:mi>q</mml:mi>
<mml:mo>∈</mml:mo>
<mml:mi mathvariant="script">Q</mml:mi></mml:mrow></mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub>
<mml:mi>θ</mml:mi>
<mml:mi>q</mml:mi></mml:msub>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mrow>
<mml:mi>q</mml:mi>
<mml:mo>∈</mml:mo>
<mml:mi mathvariant="script">Q</mml:mi></mml:mrow></mml:msub>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>,</mml:mo></mml:mrow></mml:semantics></mml:math></disp-formula>where 
<inline-graphic xlink:href="algorithms-04-00285i3.gif"/> is a finite set of states, <italic>q</italic><sub>0</sub> ∈ 
<inline-graphic xlink:href="algorithms-04-00285i3.gif"/> is the start state, Σ is a finite alphabet, <italic>δ</italic> : 
<inline-graphic xlink:href="algorithms-04-00285i3.gif"/> × Σ → 
<inline-graphic xlink:href="algorithms-04-00285i3.gif"/> is a transition function, 
<inline-graphic xlink:href="algorithms-04-00285i4.gif"/> is a finite or countable set of values, <italic>v</italic><sub>0</sub> ∈ 
<inline-graphic xlink:href="algorithms-04-00285i4.gif"/> is called the start value, ℰ is a finite set of emissions, <italic>η<sub>q</sub></italic> ∈ ℰ is the emission associated to state <italic>q</italic>, and <italic>θ<sub>q</sub></italic> : 
<inline-graphic xlink:href="algorithms-04-00285i4.gif"/> × ℰ → 
<inline-graphic xlink:href="algorithms-04-00285i4.gif"/> is a binary operation associated to state <italic>q</italic>.</p>
<p>Informally, a DAA starts with the state-value pair (<italic>q</italic><sub>0</sub>, <italic>v</italic><sub>0</sub>) and reads a sequence of symbols from Σ. Being in state <italic>q</italic> with value <italic>v</italic>, upon reading <italic>σ</italic> ∈ Σ, the DAA performs a state transition to <italic>q</italic>′ ≔ <italic>δ</italic>(<italic>q, σ</italic>) and updates the value to <italic>v</italic>′ ≔ <italic>θ<sub>q′</sub></italic>(<italic>v, η<sub>q′</sub></italic>) using the operation and emission of the new state <italic>q′</italic>.</p>
<p>Further, we define the associated joint transition function
<disp-formula id="FD5">
<mml:math id="mm12" display="block">
<mml:semantics id="sm12">
<mml:mrow>
<mml:mtable columnalign="left">
<mml:mtr columnalign="left">
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mover accent="true">
<mml:mi>δ</mml:mi>
<mml:mo>^</mml:mo></mml:mover>
<mml:mo>:</mml:mo>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi mathvariant="script">Q</mml:mi>
<mml:mo>×</mml:mo>
<mml:mi mathvariant="script">V</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>×</mml:mo>
<mml:mo>∑</mml:mo>
<mml:mo>→</mml:mo>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi mathvariant="script">Q</mml:mi>
<mml:mo>×</mml:mo>
<mml:mi mathvariant="script">V</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>,</mml:mo></mml:mrow></mml:mtd>
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mover accent="true">
<mml:mi>δ</mml:mi>
<mml:mo>^</mml:mo></mml:mover>
<mml:mo stretchy="false">(</mml:mo>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>q</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>v</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>,</mml:mo>
<mml:mi>σ</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>:</mml:mo>
<mml:mo>=</mml:mo>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>δ</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>q</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>σ</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>θ</mml:mi>
<mml:mrow>
<mml:mi>δ</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>q</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>σ</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>v</mml:mi>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>η</mml:mi>
<mml:mrow>
<mml:mi>δ</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>q</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>σ</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:msub>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable>
<mml:mo>.</mml:mo></mml:mrow></mml:semantics></mml:math></disp-formula></p>
<p>As usual, we extend the definition of <italic>δ̂</italic> inductively from Σ to Σ* in its second argument by <italic>δ̂</italic>((<italic>q, v</italic>), <italic>ε</italic>) ≔ (<italic>q, v</italic>) for the empty string <italic>ε</italic> and <italic>δ̂</italic>((<italic>q, v</italic>), <italic>xσ</italic>) ≔ <italic>δ̂</italic>(<italic>δ̂</italic>((<italic>q, v</italic>), <italic>x</italic>), <italic>σ</italic>) for all <italic>x</italic> ∈ Σ* and <italic>σ</italic> ∈ Σ.</p>
<p>When <italic>δ̂</italic>((<italic>q</italic><sub>0</sub>, <italic>v</italic><sub>0</sub>), <italic>s</italic>) = (<italic>q, v</italic>) for some <italic>q</italic> ∈ 
<inline-graphic xlink:href="algorithms-04-00285i3.gif"/> and <italic>s</italic> ∈ Σ*, we say that 
<inline-graphic xlink:href="algorithms-04-00285i2.gif"/> computes value <italic>v</italic> for input <italic>s</italic> and define <italic>value</italic><sub>
<inline-graphic xlink:href="algorithms-04-00285i5.gif"/></sub>(<italic>s</italic>) ≔ <italic>v</italic>.</p>
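<p>The definition translates almost literally into code. The following minimal sketch assumes that the transition function, emissions and operations are given as dictionaries; the class name <monospace>DAA</monospace> and the toy automaton <monospace>count_a</monospace>, whose value counts the occurrences of the symbol a, are hypothetical and serve only as an illustration.</p>
<preformat><![CDATA[
```python
class DAA:
    """Deterministic arithmetic automaton in the sense of Definition 1 (illustrative)."""

    def __init__(self, start_state, start_value, delta, emission, operation):
        self.start_state = start_state
        self.start_value = start_value
        self.delta = delta            # dict: (state, symbol) -> state
        self.emission = emission      # dict: state -> emission
        self.operation = operation    # dict: state -> function (value, emission) -> value

    def step(self, state, value, symbol):
        """Joint transition: move to the new state, then apply its operation and emission."""
        nxt = self.delta[state, symbol]
        return nxt, self.operation[nxt](value, self.emission[nxt])

    def value(self, s):
        """Value computed for the input string s, starting from (q0, v0)."""
        state, value = self.start_state, self.start_value
        for symbol in s:
            state, value = self.step(state, value, symbol)
        return value


def add(v, e):
    return v + e


# Toy automaton over {a, b}: state 1 means "the last symbol read was a".
count_a = DAA(
    start_state=0,
    start_value=0,
    delta={(q, c): 1 if c == "a" else 0 for q in (0, 1) for c in "ab"},
    emission={0: 0, 1: 1},
    operation={0: add, 1: add},
)
print(count_a.value("abaa"))  # 3
```
]]></preformat>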
<p>For each state <italic>q</italic>, the emission <italic>η<sub>q</sub></italic> is fixed and could be dropped from the definition of DAAs. In fact, one could also dispense with values and operations entirely and define a DFA over state space 
<inline-graphic xlink:href="algorithms-04-00285i3.gif"/> × 
<inline-graphic xlink:href="algorithms-04-00285i4.gif"/>, performing the same operations as a DAA. However, we intentionally include values, operations, and emissions to emphasize the connection to PAAs (which are defined in Section 5).</p>
<p>As a simple example for a DAA, take a standard DFA (
<inline-graphic xlink:href="algorithms-04-00285i3.gif"/>, <italic>q</italic><sub>0</sub>, Σ, <italic>δ, F</italic>) with <italic>F</italic> ⊂ 
<inline-graphic xlink:href="algorithms-04-00285i3.gif"/> being a set of final (or accepting) states. To obtain a DAA that counts how many times the DFA visits an accepting state when reading <italic>s</italic> ∈ Σ*, let ℰ ≔ {0, 1} and define <italic>η<sub>q</sub></italic> ≔ 1 if <italic>q</italic> ∈ <italic>F</italic>, and <italic>η<sub>q</sub></italic> ≔ 0 otherwise. Further define 
<inline-graphic xlink:href="algorithms-04-00285i4.gif"/> = ℕ with <italic>v</italic><sub>0</sub> ≔ 0, and let the operation in each state be the usual addition: <italic>θ<sub>q</sub></italic>(<italic>v, e</italic>) ≔ <italic>v</italic> + <italic>e</italic> for all <italic>q</italic>. Then <italic>value</italic><sub>
<inline-graphic xlink:href="algorithms-04-00285i5.gif"/></sub>(<italic>s</italic>) is the desired count.</p></sec></sec>
<sec sec-type="methods">
<label>4.</label>
<title>Constructing DAAs for Pattern Matching Analysis</title>
<p>For a given algorithm and pattern <italic>p</italic> ∈ Σ<italic><sup>m</sup></italic> with known shift and cost functions, <italic>shift<sup>p</sup></italic> : Σ<italic><sup>m</sup></italic> → {1, …, <italic>m</italic>}, <italic>w</italic> ↦ <italic>shift<sup>p</sup></italic>(<italic>w</italic>) and <italic>ξ<sup>p</sup></italic> : Σ<italic><sup>m</sup></italic> → ℕ, <italic>w</italic> ↦ <italic>ξ<sup>p</sup></italic>(<italic>w</italic>), we construct a DAA that upon reading a text <italic>s</italic> ∈ Σ* computes the total cost, defined as the sum of costs over all examined windows. (Which windows are examined depends of course on the shift values of previously examined windows.) Slightly abusing notation, we write <italic>ξ<sup>p</sup></italic>(<italic>s</italic>) for the total cost incurred on <italic>s</italic>.</p>
<p>While different constructions are possible (see also [<xref ref-type="bibr" rid="b18-algorithms-04-00285">18</xref>]), the construction presented here has the advantage that it is simple to describe and implement and processes only one text character at a time. This property allows the construction of a product DAA that directly compares two algorithms as detailed in Section 6.</p>
<sec>
<title>Definition 2 (DAA encoding a pattern matching algorithm)</title>
<p>Given a window-based pattern matching algorithm <italic>A</italic>, a pattern <italic>p</italic> ∈ Σ<sup><italic>m</italic></sup>, and the associated shift and cost functions, <italic>shift<sup>p</sup></italic> : Σ<sup><italic>m</italic></sup> → {1, …, <italic>m</italic>} and <italic>ξ<sup>p</sup></italic> : Σ<sup><italic>m</italic></sup> → ℕ, the <italic>DAA encoding algorithm A</italic> is defined by
<list list-type="bullet">
<list-item>
<p>
<inline-graphic xlink:href="algorithms-04-00285i3.gif"/> ≔ Σ<sup><italic>m</italic></sup> × {0, …,<italic>m</italic>},</p></list-item>
<list-item>
<p><italic>q</italic><sub>0</sub> ≔ (<italic>p, m</italic>),</p></list-item></list>where, informally, a state <italic>q</italic> = (<italic>w, x</italic>) means that the last <italic>m</italic> characters read spell <italic>w</italic> and that <italic>x</italic> more characters need to be read to get to the end of the current window. For the start state <italic>q</italic><sub>0</sub> = (<italic>p, m</italic>), the component <italic>p</italic> is arbitrary, as we need to read <italic>m</italic> characters to reach the end of the first window.</p>
<p>The remaining components are defined as
<list list-type="bullet">
<list-item>
<p>
<inline-graphic xlink:href="algorithms-04-00285i4.gif"/> ≔ ℕ,</p></list-item>
<list-item>
<p><italic>v</italic><sub>0</sub> ≔ 0,</p></list-item>
<list-item>
<p>ℰ ≔ {0, …, <italic>m</italic>},</p></list-item>
<list-item>
<p>
<inline-formula>
<mml:math id="mm13" display="inline">
<mml:semantics id="sm13">
<mml:mrow>
<mml:msub>
<mml:mi>η</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>w</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:msub>
<mml:mo>:</mml:mo>
<mml:mo>=</mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mtable columnalign="left">
<mml:mtr columnalign="left">
<mml:mtd columnalign="left">
<mml:mn>0</mml:mn></mml:mtd>
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mtext mathvariant="italic">if</mml:mtext>
<mml:mspace width="0.3em"/>
<mml:mi>x</mml:mi>
<mml:mo>&gt;</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo>,</mml:mo></mml:mrow></mml:mtd></mml:mtr>
<mml:mtr columnalign="left">
<mml:mtd columnalign="left">
<mml:mrow>
<mml:msup>
<mml:mi>ξ</mml:mi>
<mml:mi>p</mml:mi></mml:msup>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>w</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd>
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mtext mathvariant="italic">if</mml:mtext>
<mml:mspace width="0.3em"/>
<mml:mi>x</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo>,</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:mrow></mml:mrow></mml:semantics></mml:math></inline-formula></p></list-item>
<list-item>
<p><italic>θ<sub>q</sub></italic> : (<italic>v, e</italic>) ↦ <italic>v</italic> + <italic>e</italic> for all <italic>q</italic> ∈ 
<inline-graphic xlink:href="algorithms-04-00285i3.gif"/> (addition),</p></list-item>
<list-item>
<p>
<inline-formula>
<mml:math id="mm14" display="inline">
<mml:semantics id="sm14">
<mml:mrow>
<mml:mi>δ</mml:mi>
<mml:mo>:</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>w</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>,</mml:mo>
<mml:mi>σ</mml:mi></mml:mrow>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mo>↦</mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mtable columnalign="left">
<mml:mtr columnalign="left">
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>w</mml:mi>
<mml:mo>′</mml:mo>
<mml:mi>σ</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>-</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd>
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mtext mathvariant="italic">if</mml:mtext>
<mml:mspace width="0.3em"/>
<mml:mi>x</mml:mi>
<mml:mo>&gt;</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo>,</mml:mo></mml:mrow></mml:mtd></mml:mtr>
<mml:mtr columnalign="left">
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>w</mml:mi>
<mml:mo>′</mml:mo>
<mml:mi>σ</mml:mi>
<mml:mo>,</mml:mo>
<mml:msup>
<mml:mtext mathvariant="italic">shift</mml:mtext>
<mml:mi>p</mml:mi></mml:msup>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>w</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>-</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd>
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mtext mathvariant="italic">if</mml:mtext>
<mml:mspace width="0.3em"/>
<mml:mi>x</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo>,</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:mrow></mml:mrow></mml:semantics></mml:math></inline-formula></p></list-item></list>where <italic>w′</italic> is the length-(<italic>m</italic> − 1) suffix of <italic>w</italic>, <italic>i.e.</italic>, <italic>w′</italic> ≔ <italic>w</italic>[1] … <italic>w</italic>[<italic>m</italic> − 1].</p>
<p><xref ref-type="fig" rid="f2-algorithms-04-00285">Figure 2</xref> shows an example of how a DAA for Horspool's algorithm moves from state to state. The value accumulates the cost of examined windows. Therefore, the operation is a simple addition in each state, and the emission of state (<italic>w, x</italic>) specifies the cost to add. Consequently, the emission is zero if the state does not correspond to an examined window (<italic>x</italic> &gt; 0), and the emission equals the window cost <italic>ξ<sup>p</sup></italic>(<italic>w</italic>) if <italic>x</italic> = 0. The transition function <italic>δ</italic> specifies how to move from one state to the next when reading the next text character <italic>σ</italic> ∈ Σ: in either case, the window content is updated by forgetting the first character and appending the newly read <italic>σ</italic>. If the end of the current window has not been reached (<italic>x</italic> &gt; 0), the counter <italic>x</italic> is decremented. Otherwise, the window's shift value is used to compute the number of characters that remain to be read until the end of the next window is reached.</p></sec>
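<sec>
<title>Example (illustrative sketch of Definition 2)</title>
<p>The construction can be simulated without enumerating the state space by keeping only the current state (<italic>w, x</italic>) and the current value. The sketch below does this and compares the result with a direct window-by-window computation of the total cost. The shift and cost functions for Horspool's algorithm are defined outside this section; the versions used here (the usual Horspool shift on the last window character and a right-to-left comparison cost) are assumptions of this example, as are all function names.</p>
<preformat><![CDATA[
```python
def horspool_shift(p, w):
    """Assumed Horspool shift: distance to the rightmost earlier occurrence of w[-1] in p."""
    m = len(p)
    for i in range(1, m):
        if p[m - 1 - i] == w[-1]:
            return i
    return m


def horspool_cost(p, w):
    """Assumed cost: characters compared right to left up to the first mismatch."""
    m = len(p)
    for i in range(1, m + 1):
        if p[m - i] != w[m - i]:
            return i
    return m


def daa_value(p, s, shift, cost):
    """Value computed by the DAA of Definition 2 on text s."""
    m = len(p)
    w, x = p, m                      # start state q0 = (p, m)
    v = 0                            # start value v0 = 0
    for sigma in s:
        if x > 0:
            w, x = w[1:] + sigma, x - 1
        else:                        # shift is evaluated on the old window w
            w, x = w[1:] + sigma, shift(p, w) - 1
        if x == 0:                   # emission of the new state
            v += cost(p, w)
    return v


def direct_cost(p, s, shift, cost):
    """Total cost obtained by sliding the window over s directly."""
    m, total = len(p), 0
    i = m - 1                        # index of the last character of the window
    while i < len(s):
        w = s[i - m + 1:i + 1]
        total += cost(p, w)
        i += shift(p, w)
    return total


print(daa_value("ab", "abab", horspool_shift, horspool_cost))    # 4
print(direct_cost("ab", "abab", horspool_shift, horspool_cost))  # 4
```
]]></preformat>
<p>Both functions process the same windows, so they return the same total cost for every text.</p>
</sec>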
<sec>
<title>Theorem 1</title>
<p>Let 
<inline-graphic xlink:href="algorithms-04-00285i2.gif"/> be a DAA as given by Definition 2. Then, value<sub>
<inline-graphic xlink:href="algorithms-04-00285i2.gif"/></sub>(<italic>s</italic>) = <italic>ξ<sup>p</sup></italic>(<italic>s</italic>) for all <italic>s</italic> ∈ Σ*.</p></sec>
<sec>
<title>Proof</title>
<p>The total cost <italic>ξ<sup>p</sup></italic>(<italic>s</italic>) can be written as the sum of costs of all processed windows: <italic>ξ<sup>p</sup></italic>(<italic>s</italic>) = Σ<sub><italic>i</italic>∈<italic>I<sub>s</sub></italic></sub> <italic>ξ<sup>p</sup></italic>(<italic>s</italic>[<italic>i</italic> − <italic>m</italic> + 1 … <italic>i</italic>]), where <italic>I<sub>s</sub></italic> is the set of indices giving the processed windows, <italic>i.e., I<sub>s</sub></italic> ⊂ {<italic>m</italic> − 1, …, |<italic>s</italic>| − 1} such that
<disp-formula id="FD6">
<mml:math id="mm15" display="block">
<mml:semantics id="sm15">
<mml:mrow>
<mml:mtable columnalign="left">
<mml:mtr columnalign="left">
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>∈</mml:mo>
<mml:msub>
<mml:mi>I</mml:mi>
<mml:mi>s</mml:mi></mml:msub></mml:mrow></mml:mtd>
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mo>:</mml:mo>
<mml:mo>⇔</mml:mo></mml:mrow></mml:mtd>
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>=</mml:mo>
<mml:mi>m</mml:mi>
<mml:mo>-</mml:mo>
<mml:mn>1</mml:mn></mml:mrow></mml:mtd>
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mtext>or</mml:mtext></mml:mrow></mml:mtd>
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mo>∃</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo>∈</mml:mo>
<mml:msub>
<mml:mi>I</mml:mi>
<mml:mi>s</mml:mi></mml:msub>
<mml:mo>:</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>=</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo>+</mml:mo>
<mml:msup>
<mml:mtext mathvariant="italic">shift</mml:mtext>
<mml:mi>p</mml:mi></mml:msup>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>s</mml:mi>
<mml:mo stretchy="false">[</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo>-</mml:mo>
<mml:mi>m</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>…</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo stretchy="false">]</mml:mo>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>.</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:semantics></mml:math></disp-formula></p>
<p>We have to prove that the DAA computes this value for <italic>s</italic> ∈ Σ*.</p>
<p>Let (<italic>w<sub>i</sub>, x<sub>i</sub></italic>) be the DAA state active after reading <italic>s</italic>[‥<italic>i</italic>]. Observe that the transition function <italic>δ</italic> ensures that the <italic>w<sub>i</sub></italic>-component of (<italic>w<sub>i</sub>, x<sub>i</sub></italic>) reflects the rightmost length-<italic>m</italic> window of <italic>s</italic>[‥<italic>i</italic>], which can immediately be verified inductively. Thus, the emission on reading the last character <italic>s</italic>[<italic>i</italic>] of <italic>s</italic>[‥<italic>i</italic>] with <italic>i</italic> ≥ <italic>m</italic> − 1 is, by definition of <italic>η</italic>(<italic>w<sub>i</sub>, x<sub>i</sub></italic>), either <italic>ξ<sup>p</sup></italic>(<italic>s</italic>[<italic>i</italic> − <italic>m</italic> + 1 … <italic>i</italic>]) or zero, depending on the second component of (<italic>w<sub>i</sub>, x<sub>i</sub></italic>). As the operation is an addition for all states, 
<inline-formula>
<mml:math id="mm16" display="inline">
<mml:semantics id="sm16">
<mml:mrow>
<mml:msub>
<mml:mtext mathvariant="italic">value</mml:mtext>
<mml:mi mathvariant="script">D</mml:mi></mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>s</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:msub>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>∈</mml:mo>
<mml:msubsup>
<mml:mi>I</mml:mi>
<mml:mi>s</mml:mi>
<mml:mo>′</mml:mo></mml:msubsup></mml:mrow></mml:msub>
<mml:mrow>
<mml:msup>
<mml:mi>ξ</mml:mi>
<mml:mi>p</mml:mi></mml:msup>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>s</mml:mi>
<mml:mo stretchy="false">[</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>-</mml:mo>
<mml:mi>m</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>…</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo stretchy="false">]</mml:mo>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:semantics></mml:math></inline-formula> for
<disp-formula id="FD7">
<mml:math id="mm17" display="block">
<mml:semantics id="sm17">
<mml:mrow>
<mml:msubsup>
<mml:mi>I</mml:mi>
<mml:mi>s</mml:mi>
<mml:mo>′</mml:mo></mml:msubsup>
<mml:mo>≔</mml:mo>
<mml:mo stretchy="false">{</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>∈</mml:mo>
<mml:mo stretchy="false">{</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo>,</mml:mo>
<mml:mo>…</mml:mo>
<mml:mo>,</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">|</mml:mo>
<mml:mi>s</mml:mi>
<mml:mo stretchy="false">|</mml:mo></mml:mrow>
<mml:mo>-</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo stretchy="false">}</mml:mo>
<mml:mo>:</mml:mo>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>i</mml:mi></mml:msub>
<mml:mo>=</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo stretchy="false">}</mml:mo>
<mml:mo>.</mml:mo></mml:mrow></mml:semantics></mml:math></disp-formula></p>
<p>It remains to show that 
<inline-formula>
<mml:math id="mm18" display="inline">
<mml:semantics id="sm18">
<mml:mrow>
<mml:msub>
<mml:mi>I</mml:mi>
<mml:mi>s</mml:mi></mml:msub>
<mml:mo>=</mml:mo>
<mml:msubsup>
<mml:mi>I</mml:mi>
<mml:mi>s</mml:mi>
<mml:mo>′</mml:mo></mml:msubsup></mml:mrow></mml:semantics></mml:math></inline-formula>. To this end, note that by <italic>δ</italic>, we have <italic>x<sub>i</sub></italic><sub>+1</sub> = <italic>x<sub>i</sub></italic> − 1 if <italic>x<sub>i</sub></italic> &gt; 0 and <italic>x<sub>i</sub></italic><sub>+1</sub> = <italic>shift<sup>p</sup></italic>(<italic>w<sub>i</sub></italic>) − 1 if <italic>x<sub>i</sub></italic> = 0. As <italic>q</italic><sub>0</sub> = (<italic>p, m</italic>), it follows that 
<inline-formula>
<mml:math id="mm19" display="inline">
<mml:semantics id="sm19">
<mml:mrow>
<mml:mi>m</mml:mi>
<mml:mo>-</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>∈</mml:mo>
<mml:msubsup>
<mml:mi>I</mml:mi>
<mml:mi>s</mml:mi>
<mml:mo>′</mml:mo></mml:msubsup></mml:mrow></mml:semantics></mml:math></inline-formula>. Using <italic>w<sub>i</sub></italic> = <italic>s</italic>[<italic>i</italic> − <italic>m</italic> + 1 … <italic>i</italic>] for <italic>i</italic> ≥ <italic>m</italic> − 1, we conclude that whenever <italic>x<sub>i</sub></italic> = 0, it follows that <italic>x<sub>j</sub></italic> = 0 for <italic>j</italic> = <italic>i</italic>+<italic>shift<sup>p</sup></italic>(<italic>s</italic>[<italic>i</italic> − <italic>m</italic> + 1 … <italic>i</italic>]) and that <italic>x<sub>j</sub></italic><sub>′</sub> &gt; 0 for <italic>i</italic> &lt; <italic>j</italic>′ &lt; <italic>j</italic>. Hence we obtain that 
<inline-formula>
<mml:math id="mm20" display="inline">
<mml:semantics id="sm20">
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>∈</mml:mo>
<mml:msubsup>
<mml:mi>I</mml:mi>
<mml:mi>s</mml:mi>
<mml:mo>′</mml:mo></mml:msubsup></mml:mrow></mml:semantics></mml:math></inline-formula> implies that 
<inline-formula>
<mml:math id="mm21" display="inline">
<mml:semantics id="sm21">
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>+</mml:mo>
<mml:msup>
<mml:mtext>shift</mml:mtext>
<mml:mi>p</mml:mi></mml:msup>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>s</mml:mi>
<mml:mo stretchy="false">[</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>-</mml:mo>
<mml:mi>m</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>…</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo stretchy="false">]</mml:mo>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>∈</mml:mo>
<mml:msubsup>
<mml:mi>I</mml:mi>
<mml:mi>s</mml:mi>
<mml:mo>′</mml:mo></mml:msubsup></mml:mrow></mml:semantics></mml:math></inline-formula> and 
<inline-formula>
<mml:math id="mm22" display="inline">
<mml:semantics id="sm22">
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>+</mml:mo>
<mml:mi>k</mml:mi>
<mml:mo>∉</mml:mo>
<mml:msubsup>
<mml:mi>I</mml:mi>
<mml:mi>s</mml:mi>
<mml:mo>′</mml:mo></mml:msubsup></mml:mrow></mml:semantics></mml:math></inline-formula> for 0 &lt; <italic>k</italic> &lt; <italic>shift<sup>p</sup></italic>(<italic>s</italic>[<italic>i</italic> − <italic>m</italic> + 1 … <italic>i</italic>]), which completes the proof.</p>
<sec>
<title>DAA Minimization</title>
<p>The size of the constructed DAA's state space is (<italic>m</italic> + 1)|Σ|<italic><sup>m</sup></italic> and thus grows exponentially with the pattern length, making its application to long patterns infeasible in practice. However, depending on the algorithm and pattern analyzed, the constructed DAA can often be substantially reduced by state space minimization [<xref ref-type="bibr" rid="b21-algorithms-04-00285">21</xref>]. For example, for B(N)DM, both cost and shift of an examined window depend only on the longest factor of <italic>p</italic> that is a suffix of the window. Since there are only <italic>O</italic>(<italic>m</italic><sup>2</sup>) different factors, it is plausible that |Σ|<italic><sup>m</sup></italic> can be replaced by <italic>O</italic>(<italic>m</italic><sup>2</sup>), for a total state space of size <italic>O</italic>(<italic>m</italic><sup>3</sup>). Therefore, for each algorithm, a specialized construction may exist that directly constructs the minimal state space, whose size may grow only polynomially with <italic>m</italic>. For the Horspool algorithm, it is known that the state space has a size of only <italic>O</italic>(<italic>m</italic><sup>2</sup>), as the construction of Tsai [<xref ref-type="bibr" rid="b13-algorithms-04-00285">13</xref>] can be adapted to construct a DAA according to our definition. However, we have been unable to provide a direct construction of the minimal DAA applicable to all window-based pattern matching algorithms.</p>
<p>Hopcroft's algorithm [<xref ref-type="bibr" rid="b21-algorithms-04-00285">21</xref>] minimizes a DFA in <italic>O</italic>(|
<inline-graphic xlink:href="algorithms-04-00285i3.gif"/>| log |
<inline-graphic xlink:href="algorithms-04-00285i3.gif"/>|) time by iteratively refining a partition of the state set. In the beginning, all states are partitioned into two distinct sets: one containing the accepting states, and the other containing the non-accepting states. This partition is iteratively refined whenever a reason for non-equivalence of two states in the same set is found. Upon termination, the states are partitioned into sets of equivalent states. Refer to [<xref ref-type="bibr" rid="b22-algorithms-04-00285">22</xref>] for an in-depth explanation of Hopcroft's algorithm.</p>
<p>The algorithm can be adapted straightforwardly to minimize DAAs by choosing the initial partition of the state set appropriately. In our case, each DAA state is associated with the same operation. The only differences in the states' behavior thus stem from different emissions. Therefore, Hopcroft's algorithm can be initialized with the partition induced by the emissions and then continued as usual.</p>
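<p>The effect of the emission-induced initial partition can be illustrated with a simple refinement loop. Note that the sketch below uses a Moore-style refinement, which is easier to state but asymptotically slower than Hopcroft's algorithm; it is a stand-in for illustration only. The function name <monospace>minimize</monospace> and the four-state toy automaton are hypothetical.</p>
<preformat><![CDATA[
```python
def minimize(states, alphabet, delta, emission):
    """Map each state to the index of its equivalence class (Moore-style refinement)."""
    # Initial partition: states are grouped by their emission.
    labels = {}
    block = {q: labels.setdefault(emission[q], len(labels)) for q in states}
    while True:
        ids = {}
        refined = {}
        for q in states:
            # Two states stay together only if they are in the same block
            # and all their successors are in the same blocks.
            signature = (block[q],) + tuple(block[delta[q, a]] for a in alphabet)
            refined[q] = ids.setdefault(signature, len(ids))
        if len(ids) == len(set(block.values())):
            return refined           # no block was split: the partition is stable
        block = refined


# Toy automaton: states 0 and 1 emit 0, states 2 and 3 emit 1.
states = [0, 1, 2, 3]
delta = {
    (0, "a"): 2, (0, "b"): 1,
    (1, "a"): 3, (1, "b"): 0,
    (2, "a"): 2, (2, "b"): 0,
    (3, "a"): 3, (3, "b"): 1,
}
emission = {0: 0, 1: 0, 2: 1, 3: 1}
classes = minimize(states, "ab", delta, emission)
print(len(set(classes.values())))  # 2
```
]]></preformat>
<p>Because all states share the same operation (addition), states in the same class produce the same sequence of emissions for every input and can be merged.</p>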
<p>As we exemplify in Section 7, this leads to a considerable reduction of the number of states.</p></sec></sec></sec>
<sec>
<label>5.</label>
<title>Probabilistic Arithmetic Automata</title>
<p>This section introduces finite-memory random text models and explains how to construct a <italic>probabilistic arithmetic automaton</italic> (PAA) from a (minimized) DAA and a random text model. PAAs were introduced in [<xref ref-type="bibr" rid="b1-algorithms-04-00285">1</xref>], where they are used to compute pattern occurrence count distributions. Further examples for the utility of PAAs are discussed in [<xref ref-type="bibr" rid="b19-algorithms-04-00285">19</xref>] and [<xref ref-type="bibr" rid="b20-algorithms-04-00285">20</xref>].</p>
<sec>
<label>5.1.</label>
<title>Random Text Models</title>
<p>Given an alphabet Σ, a random text is a stochastic process (<italic>S<sub>t</sub></italic>)<sub><italic>t</italic>∈ℕ<sub>0</sub></sub>, where each <italic>S<sub>t</sub></italic> takes values in Σ. A text model ℙ is a probability measure assigning probabilities to (sets of) strings. It is given by (consistently) specifying the probabilities ℙ(<italic>S</italic><sub>0</sub> … <italic>S</italic><sub>|<italic>s</italic>|−1</sub> = <italic>s</italic>) for all <italic>s</italic> ∈ Σ*. In this article, we only consider finite-memory models, which are formalized in the following definition.</p>
<sec>
<title>Definition 3 (Finite-memory text model)</title>
<p>A finite-memory text model is a tuple (
<inline-graphic xlink:href="algorithms-04-00285i1.gif"/>, <italic>c</italic><sub>0</sub>, Σ, <italic>φ</italic>), where 
<inline-graphic xlink:href="algorithms-04-00285i1.gif"/> is a finite state space (called context space), <italic>c</italic><sub>0</sub> ∈ 
<inline-graphic xlink:href="algorithms-04-00285i1.gif"/> a start context, Σ an alphabet, and <italic>φ</italic> : 
<inline-graphic xlink:href="algorithms-04-00285i1.gif"/> × Σ × 
<inline-graphic xlink:href="algorithms-04-00285i1.gif"/> → [0, 1] a transition function with Σ<sub><italic>σ</italic>∈Σ,<italic>c′</italic>∈
<inline-graphic xlink:href="algorithms-04-00285i1.gif"/></sub> <italic>φ</italic>(<italic>c, σ, c′</italic>) = 1 for all <italic>c</italic> ∈ 
<inline-graphic xlink:href="algorithms-04-00285i1.gif"/>. The random variable giving the text model state after <italic>t</italic> steps is denoted by <italic>C<sub>t</sub></italic>, with <italic>C</italic><sub>0</sub> :≡ <italic>c</italic><sub>0</sub>. A probability measure is now induced by stipulating
<disp-formula id="FD8">
<mml:math id="mm23" display="block">
<mml:semantics id="sm23">
<mml:mrow>
<mml:mi>ℙ</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mn>0</mml:mn></mml:msub>
<mml:mo>…</mml:mo>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>-</mml:mo>
<mml:mn>1</mml:mn></mml:mrow></mml:msub>
<mml:mo>=</mml:mo>
<mml:mi>s</mml:mi>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>C</mml:mi>
<mml:mn>1</mml:mn></mml:msub>
<mml:mo>=</mml:mo>
<mml:msub>
<mml:mi>c</mml:mi>
<mml:mn>1</mml:mn></mml:msub>
<mml:mo>,</mml:mo>
<mml:mo>…</mml:mo>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>C</mml:mi>
<mml:mi>n</mml:mi></mml:msub>
<mml:mo>=</mml:mo>
<mml:msub>
<mml:mi>c</mml:mi>
<mml:mi>n</mml:mi></mml:msub>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>:</mml:mo>
<mml:mo>=</mml:mo>
<mml:munderover>
<mml:mo>∏</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>0</mml:mn></mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>-</mml:mo>
<mml:mn>1</mml:mn></mml:mrow></mml:munderover>
<mml:mrow>
<mml:mi>φ</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub>
<mml:mi>c</mml:mi>
<mml:mi>i</mml:mi></mml:msub>
<mml:mo>,</mml:mo>
<mml:mi>s</mml:mi>
<mml:mo stretchy="false">[</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo stretchy="false">]</mml:mo>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>c</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn></mml:mrow></mml:msub>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:semantics></mml:math></disp-formula>for all <italic>n</italic> ∈ ℕ<sub>0</sub>, <italic>s</italic> ∈ Σ<italic><sup>n</sup></italic>, and (<italic>c</italic><sub>1</sub>, …, <italic>c<sub>n</sub></italic>) ∈ 
<inline-graphic xlink:href="algorithms-04-00285i1.gif"/><italic><sup>n</sup></italic>.</p>
<p>The idea is that the model given by (
<inline-graphic xlink:href="algorithms-04-00285i1.gif"/>, <italic>c</italic><sub>0</sub>, Σ, <italic>φ</italic>) generates a random text by moving from context to context and emitting a character at each transition, where <italic>φ</italic>(<italic>c, σ, c′</italic>) is the probability of moving from context <italic>c</italic> to context <italic>c′</italic> and thereby generating the letter <italic>σ</italic>.</p>
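<p>The generative view can be illustrated with a minimal Python sketch. The function name <monospace>sample_text</monospace> and the dictionary encoding of <italic>φ</italic> are assumptions made for this illustration only; they are not part of the formal definition. The example model is a Markovian text model of order 1 over {a, b} whose context is the previous character (the empty string at the start of the text).</p>
<preformat>
```python
import random

def sample_text(phi, c0, n, rng):
    """Draw n characters from a text model.

    phi maps a context c to a dict {(sigma, c_next): probability}
    whose values sum to 1 (a hypothetical encoding of the function phi).
    """
    context, chars = c0, []
    for _ in range(n):
        outcomes = list(phi[context].items())
        pairs = [pair for pair, _ in outcomes]
        weights = [prob for _, prob in outcomes]
        # One transition: emit sigma and move to the next context.
        sigma, context = rng.choices(pairs, weights=weights, k=1)[0]
        chars.append(sigma)
    return "".join(chars)

# Order-1 Markov model over {a, b}; the context is the previous character.
phi = {
    "":  {("a", "a"): 0.5, ("b", "b"): 0.5},
    "a": {("a", "a"): 0.9, ("b", "b"): 0.1},
    "b": {("a", "a"): 0.2, ("b", "b"): 0.8},
}
text = sample_text(phi, "", 20, random.Random(0))
```
</preformat>
<p>Passing an explicit random number generator keeps the sketch reproducible; any source of randomness with a weighted-choice operation would serve equally well.</p>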
<p>Note that the probability ℙ(<italic>S</italic><sub>0</sub> … <italic>S</italic><sub>|<italic>s</italic>|−1</sub> = <italic>s</italic>) is obtained by marginalization over all context sequences that generate <italic>s</italic>. This can be done efficiently using the decomposition stated in the following lemma.</p></sec>
<sec>
<title>Lemma 1</title>
<p>Let (
<inline-graphic xlink:href="algorithms-04-00285i1.gif"/>, <italic>c</italic><sub>0</sub>, Σ, <italic>φ</italic>) be a finite-memory text model. Then,
<disp-formula id="FD9">
<mml:math id="mm24" display="block">
<mml:semantics id="sm24">
<mml:mrow>
<mml:mi>ℙ</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mn>0</mml:mn></mml:msub>
<mml:mo>…</mml:mo>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>n</mml:mi></mml:msub>
<mml:mo>=</mml:mo>
<mml:mi>s</mml:mi>
<mml:mi>σ</mml:mi>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>C</mml:mi>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn></mml:mrow></mml:msub>
<mml:mo>=</mml:mo>
<mml:mi>c</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:munder>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mo>′</mml:mo>
<mml:mo>∈</mml:mo>
<mml:mi mathvariant="script">C</mml:mi></mml:mrow></mml:munder>
<mml:mrow>
<mml:mi>ℙ</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mn>0</mml:mn></mml:msub>
<mml:mo>…</mml:mo>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>-</mml:mo>
<mml:mn>1</mml:mn></mml:mrow></mml:msub>
<mml:mo>=</mml:mo>
<mml:mi>s</mml:mi>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>C</mml:mi>
<mml:mi>n</mml:mi></mml:msub>
<mml:mo>=</mml:mo>
<mml:mi>c</mml:mi>
<mml:mo>′</mml:mo>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>⋅</mml:mo>
<mml:mi>φ</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>c</mml:mi>
<mml:mo>′</mml:mo>
<mml:mo>,</mml:mo>
<mml:mi>σ</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>c</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:semantics></mml:math></disp-formula>for all <italic>n</italic> ∈ ℕ<sub>0</sub>, <italic>s</italic> ∈ Σ<italic><sup>n</sup>, σ</italic> ∈ Σ and <italic>c</italic> ∈ 
<inline-graphic xlink:href="algorithms-04-00285i1.gif"/>.</p></sec>
<sec>
<title>Proof</title>
<p>We have
<disp-formula id="FD10">
<mml:math id="mm25" display="block">
<mml:semantics id="sm25">
<mml:mrow>
<mml:mtable columnalign="left">
<mml:mtr columnalign="left">
<mml:mtd columnalign="left"/>
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mi>ℙ</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mn>0</mml:mn></mml:msub>
<mml:mo>…</mml:mo>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>n</mml:mi></mml:msub>
<mml:mo>=</mml:mo>
<mml:mi>s</mml:mi>
<mml:mi>σ</mml:mi>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>C</mml:mi>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn></mml:mrow></mml:msub>
<mml:mo>=</mml:mo>
<mml:mi>c</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr>
<mml:mtr columnalign="left">
<mml:mtd columnalign="left">
<mml:mo>=</mml:mo></mml:mtd>
<mml:mtd columnalign="left">
<mml:mrow>
<mml:munder>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>c</mml:mi>
<mml:mn>1</mml:mn></mml:msub>
<mml:mo>,</mml:mo>
<mml:mo>…</mml:mo>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>c</mml:mi>
<mml:mi>n</mml:mi></mml:msub></mml:mrow></mml:munder>
<mml:mrow>
<mml:mi>ℙ</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mn>0</mml:mn></mml:msub>
<mml:mo>…</mml:mo>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>n</mml:mi></mml:msub>
<mml:mo>=</mml:mo>
<mml:mi>s</mml:mi>
<mml:mi>σ</mml:mi>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>C</mml:mi>
<mml:mn>1</mml:mn></mml:msub>
<mml:mo>=</mml:mo>
<mml:msub>
<mml:mi>c</mml:mi>
<mml:mn>1</mml:mn></mml:msub>
<mml:mo>,</mml:mo>
<mml:mo>…</mml:mo>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>C</mml:mi>
<mml:mi>n</mml:mi></mml:msub>
<mml:mo>=</mml:mo>
<mml:msub>
<mml:mi>c</mml:mi>
<mml:mi>n</mml:mi></mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>C</mml:mi>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn></mml:mrow></mml:msub>
<mml:mo>=</mml:mo>
<mml:mi>c</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mtd></mml:mtr>
<mml:mtr columnalign="left">
<mml:mtd columnalign="left">
<mml:mo>=</mml:mo></mml:mtd>
<mml:mtd columnalign="left">
<mml:mrow>
<mml:munder>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>c</mml:mi>
<mml:mn>1</mml:mn></mml:msub>
<mml:mo>,</mml:mo>
<mml:mo>…</mml:mo>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>c</mml:mi>
<mml:mi>n</mml:mi></mml:msub></mml:mrow></mml:munder>
<mml:mrow>
<mml:munderover>
<mml:mo>∏</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>0</mml:mn></mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>-</mml:mo>
<mml:mn>1</mml:mn></mml:mrow></mml:munderover>
<mml:mrow>
<mml:mi>φ</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub>
<mml:mi>c</mml:mi>
<mml:mi>i</mml:mi></mml:msub>
<mml:mo>,</mml:mo>
<mml:mi>s</mml:mi>
<mml:mo stretchy="false">[</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo stretchy="false">]</mml:mo>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>c</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn></mml:mrow></mml:msub>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>⋅</mml:mo>
<mml:mi>φ</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub>
<mml:mi>c</mml:mi>
<mml:mi>n</mml:mi></mml:msub>
<mml:mo>,</mml:mo>
<mml:mi>σ</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>c</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:mtd></mml:mtr>
<mml:mtr columnalign="left">
<mml:mtd columnalign="left">
<mml:mo>=</mml:mo></mml:mtd>
<mml:mtd columnalign="left">
<mml:mrow>
<mml:munder>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>c</mml:mi>
<mml:mi>n</mml:mi></mml:msub>
<mml:mo>∈</mml:mo>
<mml:mi mathvariant="script">C</mml:mi></mml:mrow></mml:munder>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:munder>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>c</mml:mi>
<mml:mn>1</mml:mn></mml:msub>
<mml:mo>,</mml:mo>
<mml:mo>…</mml:mo>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>c</mml:mi>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>-</mml:mo>
<mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:munder>
<mml:mrow>
<mml:munderover>
<mml:mo>∏</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>0</mml:mn></mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>-</mml:mo>
<mml:mn>1</mml:mn></mml:mrow></mml:munderover>
<mml:mrow>
<mml:mi>φ</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub>
<mml:mi>c</mml:mi>
<mml:mi>i</mml:mi></mml:msub>
<mml:mo>,</mml:mo>
<mml:mi>s</mml:mi>
<mml:mo stretchy="false">[</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo stretchy="false">]</mml:mo>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>c</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn></mml:mrow></mml:msub>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mrow>
<mml:mo>)</mml:mo></mml:mrow>
<mml:mo>⋅</mml:mo>
<mml:mi>φ</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub>
<mml:mi>c</mml:mi>
<mml:mi>n</mml:mi></mml:msub>
<mml:mo>,</mml:mo>
<mml:mi>σ</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>c</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mtd></mml:mtr>
<mml:mtr columnalign="left">
<mml:mtd columnalign="left">
<mml:mo>=</mml:mo></mml:mtd>
<mml:mtd columnalign="left">
<mml:mrow>
<mml:munder>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>c</mml:mi>
<mml:mi>n</mml:mi></mml:msub>
<mml:mo>∈</mml:mo>
<mml:mi mathvariant="script">C</mml:mi></mml:mrow></mml:munder>
<mml:mrow>
<mml:mi>ℙ</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mn>0</mml:mn></mml:msub>
<mml:mo>…</mml:mo>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>-</mml:mo>
<mml:mn>1</mml:mn></mml:mrow></mml:msub>
<mml:mo>=</mml:mo>
<mml:mi>s</mml:mi>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>C</mml:mi>
<mml:mi>n</mml:mi></mml:msub>
<mml:mo>=</mml:mo>
<mml:msub>
<mml:mi>c</mml:mi>
<mml:mi>n</mml:mi></mml:msub>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>⋅</mml:mo>
<mml:mi>φ</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub>
<mml:mi>c</mml:mi>
<mml:mi>n</mml:mi></mml:msub>
<mml:mo>,</mml:mo>
<mml:mi>σ</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>c</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:semantics></mml:math></disp-formula></p>
<p>Renaming <italic>c<sub>n</sub></italic> to <italic>c′</italic> yields the claimed result.</p>
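<p>The recurrence of Lemma 1 translates directly into a forward computation. The following Python sketch is an illustration under assumed conventions: the name <monospace>text_probability</monospace> is hypothetical, <italic>φ</italic> is passed as a callable, and the context set is passed as a list.</p>
<preformat>
```python
def text_probability(phi, contexts, c0, s):
    """Return P(S_0 ... S_{|s|-1} = s) using the recurrence of Lemma 1.

    phi(c, sigma, c_next) is the transition function of the text model.
    """
    # forward[c] = P(prefix read so far, current context = c)
    forward = {c: 0.0 for c in contexts}
    forward[c0] = 1.0
    for sigma in s:
        forward = {
            c: sum(forward[c_prev] * phi(c_prev, sigma, c) for c_prev in contexts)
            for c in contexts
        }
    # Marginalize over the final context.
    return sum(forward.values())

# i.i.d. model: a single (empty) context and fixed letter probabilities.
p = {"a": 0.25, "b": 0.75}
iid_phi = lambda c, sigma, c_next: p[sigma]
print(text_probability(iid_phi, [""], "", "ab"))  # 0.1875
```
</preformat>
<p>Each character is processed with one pass over all pairs of contexts, so the sketch evaluates <italic>φ</italic> a number of times proportional to |<italic>s</italic>| times the squared number of contexts, instead of enumerating all context sequences.</p>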
<p>Similar text models are used in [<xref ref-type="bibr" rid="b23-algorithms-04-00285">23</xref>], where they are called probability transducers. In the following, we refer to a finite-memory text model (
<inline-graphic xlink:href="algorithms-04-00285i1.gif"/>, <italic>c</italic><sub>0</sub>, Σ, <italic>φ</italic>) simply as a text model, since all text models considered in this article are special cases of Definition 3.</p>
<p>For an i.i.d. model, we set 
<inline-graphic xlink:href="algorithms-04-00285i1.gif"/> = {<italic>ε</italic>} and <italic>φ</italic>(<italic>ε, σ, ε</italic>) = <italic>p<sub>σ</sub></italic> for each <italic>σ</italic> ∈ Σ, where <italic>p<sub>σ</sub></italic> is the occurrence probability of letter <italic>σ</italic> (and <italic>ε</italic> may be interpreted as an empty context). For a Markovian text model of order <italic>r</italic>, the distribution of the next character depends on the <italic>r</italic> preceding characters (fewer at the beginning); thus 
<inline-formula>
<mml:math id="mm26" display="inline">
<mml:semantics id="sm26">
<mml:mrow>
<mml:mi mathvariant="script">C</mml:mi>
<mml:mo>:</mml:mo>
<mml:mo>=</mml:mo>
<mml:msubsup>
<mml:mo>∪</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>0</mml:mn></mml:mrow>
<mml:mi>r</mml:mi></mml:msubsup>
<mml:mrow>
<mml:msup>
<mml:mi mathvariant="normal">Σ</mml:mi>
<mml:mi>i</mml:mi></mml:msup></mml:mrow></mml:mrow></mml:semantics></mml:math></inline-formula>. This notion of text models also covers variable-order Markov chains as introduced in [<xref ref-type="bibr" rid="b24-algorithms-04-00285">24</xref>], which can be converted into equivalent models of fixed order. Text models as defined above have the same expressive power as character-emitting HMMs; that is, they can represent the same probability distributions.</p></sec></sec>
<sec>
<label>5.2.</label>
<title>Basic PAA Concepts</title>
<p>Probabilistic arithmetic automata (PAAs), as introduced in [<xref ref-type="bibr" rid="b1-algorithms-04-00285">1</xref>], are a generic concept for modeling probabilistic chains of operations. In this section, we summarize the definition and the basic recurrences needed in this article.</p>
<sec>
<title>Definition 4 (Probabilistic Arithmetic Automaton, PAA)</title>
<p>A <italic>probabilistic arithmetic automaton</italic> is a tuple 
<inline-graphic xlink:href="algorithms-04-00285i7.gif"/> = (
<inline-graphic xlink:href="algorithms-04-00285i3.gif"/>, <italic>q</italic><sub>0</sub>, <italic>T</italic>, 
<inline-graphic xlink:href="algorithms-04-00285i4.gif"/>, <italic>v</italic><sub>0</sub>, ℰ, <italic>μ</italic> = (<italic>μ<sub>q</sub></italic>)<sub><italic>q</italic>∈
<inline-graphic xlink:href="algorithms-04-00285i6.gif"/></sub>, <italic>θ</italic> = (<italic>θ<sub>q</sub></italic>)<sub><italic>q</italic>∈
<inline-graphic xlink:href="algorithms-04-00285i6.gif"/></sub>), where 
<inline-graphic xlink:href="algorithms-04-00285i3.gif"/>, <italic>q</italic><sub>0</sub>, 
<inline-graphic xlink:href="algorithms-04-00285i4.gif"/>, <italic>v</italic><sub>0</sub>, ℰ and <italic>θ</italic> have the same meaning as for a DAA, each <italic>μ<sub>q</sub></italic> is a state-specific probability distribution on the emissions ℰ, and <italic>T</italic> : 
<inline-graphic xlink:href="algorithms-04-00285i3.gif"/> × 
<inline-graphic xlink:href="algorithms-04-00285i3.gif"/> → [0, 1] is a transition function, such that <italic>T</italic>(<italic>q, q′</italic>) gives the probability of a transition from state <italic>q</italic> to state <italic>q′</italic>, i.e., (<italic>T</italic>(<italic>q, q′</italic>))<sub><italic>q, q′</italic>∈
<inline-graphic xlink:href="algorithms-04-00285i6.gif"/></sub> is a stochastic matrix.</p>
<p>A PAA induces three stochastic processes: (1) the state process (<italic>Q<sub>t</sub></italic>)<sub><italic>t</italic>∈ℕ</sub> with values in 
<inline-graphic xlink:href="algorithms-04-00285i3.gif"/>, (2) the emission process (<italic>E<sub>t</sub></italic>)<sub><italic>t</italic>∈ℕ</sub> with values in ℰ, and (3) the value process (<italic>V<sub>t</sub></italic>)<sub><italic>t</italic>∈ℕ</sub> with values in 
<inline-graphic xlink:href="algorithms-04-00285i4.gif"/> such that 
<italic>V</italic><sub>0</sub> :≡ <italic>v</italic><sub>0</sub> and <italic>V<sub>t</sub></italic> ≔ <italic>θ<sub>Q<sub>t</sub></sub></italic>(<italic>V</italic><sub><italic>t</italic>−1</sub>, <italic>E<sub>t</sub></italic>).</p>
<p>We now restate the PAA recurrences from [<xref ref-type="bibr" rid="b1-algorithms-04-00285">1</xref>] to compute the state-value distribution after <italic>t</italic> steps. For brevity, we define <italic>f<sub>t</sub></italic>(<italic>q, v</italic>) ≔ ℙ(<italic>Q<sub>t</sub></italic> = <italic>q, V<sub>t</sub></italic> = <italic>v</italic>). We are generally interested only in the value distribution, which can be obtained by marginalization: ℙ(<italic>V<sub>t</sub></italic> = <italic>v</italic>) = ∑<sub><italic>q</italic>∈
<inline-graphic xlink:href="algorithms-04-00285i6.gif"/></sub> <italic>f<sub>t</sub></italic>(<italic>q, v</italic>).</p></sec>
<sec>
<title>Lemma 2 (State-value recurrence, [<xref ref-type="bibr" rid="b1-algorithms-04-00285">1</xref>])</title>
<p>The state-value distribution is given by <italic>f</italic><sub>0</sub>(<italic>q, v</italic>) = 1 if <italic>q</italic> = <italic>q</italic><sub>0</sub> and <italic>v</italic> = <italic>v</italic><sub>0</sub>, and <italic>f</italic><sub>0</sub>(<italic>q, v</italic>) = 0 otherwise. For <italic>t</italic> ≥ 0,
<disp-formula id="FD11">
<label>(1)</label>
<mml:math id="mm27" display="block">
<mml:semantics id="sm27">
<mml:mrow>
<mml:msub>
<mml:mi>f</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn></mml:mrow></mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>q</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>v</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:munder>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:mi>q</mml:mi>
<mml:mo>′</mml:mo>
<mml:mo>∈</mml:mo>
<mml:mi mathvariant="script">Q</mml:mi></mml:mrow></mml:munder>
<mml:mrow>
<mml:munder>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>v</mml:mi>
<mml:mo>′</mml:mo>
<mml:mo>,</mml:mo>
<mml:mi>e</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>∈</mml:mo>
<mml:msubsup>
<mml:mi>θ</mml:mi>
<mml:mi>q</mml:mi>
<mml:mrow>
<mml:mo>-</mml:mo>
<mml:mn>1</mml:mn></mml:mrow></mml:msubsup>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>v</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:munder>
<mml:mrow>
<mml:msub>
<mml:mi>f</mml:mi>
<mml:mi>t</mml:mi></mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>q</mml:mi>
<mml:mo>′</mml:mo>
<mml:mo>,</mml:mo>
<mml:mi>v</mml:mi>
<mml:mo>′</mml:mo>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>⋅</mml:mo>
<mml:mi>T</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>q</mml:mi>
<mml:mo>′</mml:mo>
<mml:mo>,</mml:mo>
<mml:mi>q</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>⋅</mml:mo>
<mml:msub>
<mml:mi>μ</mml:mi>
<mml:mi>q</mml:mi></mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>e</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>,</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:semantics></mml:math></disp-formula>where 
<inline-formula>
<mml:math id="mm28" display="inline">
<mml:semantics id="sm28">
<mml:mrow>
<mml:msubsup>
<mml:mi>θ</mml:mi>
<mml:mi>q</mml:mi>
<mml:mrow>
<mml:mo>-</mml:mo>
<mml:mn>1</mml:mn></mml:mrow></mml:msubsup>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>v</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:semantics></mml:math></inline-formula> denotes the inverse image set of <italic>v</italic> under <italic>θ<sub>q</sub></italic>.</p>
<p>The recurrence in Lemma 2 resembles the Forward recurrences known from HMMs.</p>
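<p>As an illustration, the recurrence of Lemma 2 can be sketched in Python. The sketch uses the equivalent "push" formulation: rather than summing over the inverse image of <italic>v</italic>, it propagates each reachable pair (<italic>q′, v′</italic>) forward. All function names and the dictionary encodings of <italic>T</italic>, <italic>μ</italic> and <italic>θ</italic> are assumptions made for this example.</p>
<preformat>
```python
from collections import defaultdict

def state_value_distribution(q0, v0, T, mu, theta, n):
    """Return f_n as a dict {(q, v): probability}.

    T[q_prev][q]  transition probability from q_prev to q
    mu[q][e]      probability that state q emits e
    theta[q]      function (v, e) giving the new value in state q
    """
    f = {(q0, v0): 1.0}
    for _ in range(n):
        f_next = defaultdict(float)
        for (q_prev, v_prev), prob in f.items():
            for q, t in T[q_prev].items():
                for e, m in mu[q].items():
                    # Emission and operation belong to the target state q.
                    f_next[(q, theta[q](v_prev, e))] += prob * t * m
        f = dict(f_next)
    return f

def value_distribution(f):
    """Marginalize f_t over the states to obtain P(V_t = v)."""
    dist = defaultdict(float)
    for (_, v), prob in f.items():
        dist[v] += prob
    return dict(dist)

# Toy PAA: count the visits to state "hit" under fair coin transitions.
add = lambda v, e: v + e
T = {"miss": {"miss": 0.5, "hit": 0.5}, "hit": {"miss": 0.5, "hit": 0.5}}
mu = {"miss": {0: 1.0}, "hit": {1: 1.0}}
theta = {"miss": add, "hit": add}
f2 = state_value_distribution("miss", 0, T, mu, theta, 2)
print(value_distribution(f2))  # {0: 0.25, 1: 0.5, 2: 0.25}
```
</preformat>
<p>Only pairs (<italic>q, v</italic>) with positive probability are stored, which keeps the table restricted to the values actually reachable after <italic>t</italic> steps.</p>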
<p>Note that the range of <italic>V<sub>t</sub></italic> is finite for each <italic>t</italic>, even when 
<inline-graphic xlink:href="algorithms-04-00285i4.gif"/> is infinite, as <italic>V<sub>t</sub></italic> is a function of the states and emissions up to time <italic>t</italic>, and state set 
<inline-graphic xlink:href="algorithms-04-00285i3.gif"/> and emission set ℰ are finite. We define 
<inline-graphic xlink:href="algorithms-04-00285i4.gif"/><italic><sub>t</sub></italic> ≔ range <italic>V<sub>t</sub></italic> and <italic>ϑ<sub>n</sub></italic> ≔ max<sub>0≤<italic>t</italic>≤<italic>n</italic></sub> |
<inline-graphic xlink:href="algorithms-04-00285i4.gif"/><italic><sub>t</sub></italic>|. Clearly <italic>ϑ<sub>n</sub></italic> ≤ (|
<inline-graphic xlink:href="algorithms-04-00285i3.gif"/>| · |ℰ|)<italic><sup>n</sup></italic>. Therefore, all actual computations are performed on finite sets. When analyzing the number of character accesses of a pattern matching algorithm, we have 
<inline-graphic xlink:href="algorithms-04-00285i4.gif"/><italic><sub>t</sub></italic> ⊂ {0, …, <italic>m</italic>(<italic>n</italic> − <italic>m</italic> + 1)}, as at most (<italic>n</italic> − <italic>m</italic> + 1) search windows are processed, each causing at most <italic>m</italic> character accesses. Thus, <italic>ϑ<sub>n</sub></italic> ∈ <italic>O</italic>(<italic>n</italic> · <italic>m</italic>).</p></sec></sec>
<sec>
<label>5.3.</label>
<title>Constructing a PAA from a DAA and a Text Model</title>
<p>We now formally state how to combine a DAA and a text model into a PAA that allows us to compute the distribution of values produced by the DAA when processing a random text.</p>
<sec>
<title>Definition 5 (PAA induced by DAA and text model)</title>
<p>Let a text model <italic>M</italic> = (
<inline-graphic xlink:href="algorithms-04-00285i1.gif"/>, <italic>c</italic><sub>0</sub>, Σ, <italic>φ</italic>) and a DAA 
<inline-formula>
<mml:math id="mm29" display="inline">
<mml:semantics id="sm29">
<mml:mrow>
<mml:mi mathvariant="script">D</mml:mi>
<mml:mo>=</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mi mathvariant="script">Q</mml:mi>
<mml:mi mathvariant="script">D</mml:mi></mml:msup>
<mml:mo>,</mml:mo>
<mml:msubsup>
<mml:mi>q</mml:mi>
<mml:mn>0</mml:mn>
<mml:mi mathvariant="script">D</mml:mi></mml:msubsup>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="normal">Σ</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>δ</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="script">V</mml:mi>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>v</mml:mi>
<mml:mn>0</mml:mn></mml:msub>
<mml:mo>,</mml:mo>
<mml:mi>ℰ</mml:mi>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>η</mml:mi>
<mml:mi>q</mml:mi></mml:msub></mml:mrow>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow>
<mml:mrow>
<mml:mi>q</mml:mi>
<mml:mo>∈</mml:mo>
<mml:msup>
<mml:mi mathvariant="script">Q</mml:mi>
<mml:mi mathvariant="script">D</mml:mi></mml:msup></mml:mrow></mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msubsup>
<mml:mi>θ</mml:mi>
<mml:mi>q</mml:mi>
<mml:mi mathvariant="script">D</mml:mi></mml:msubsup></mml:mrow>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow>
<mml:mrow>
<mml:mi>q</mml:mi>
<mml:mo>∈</mml:mo>
<mml:msup>
<mml:mi mathvariant="script">Q</mml:mi>
<mml:mi mathvariant="script">D</mml:mi></mml:msup></mml:mrow></mml:msub></mml:mrow>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:semantics></mml:math></inline-formula> over the same alphabet Σ be given. Then, we define the <italic>PAA induced by</italic> 
<inline-graphic xlink:href="algorithms-04-00285i2.gif"/> and <italic>M</italic> by giving
<list list-type="bullet">
<list-item>
<p>a state space 
<inline-graphic xlink:href="algorithms-04-00285i3.gif"/> ≔ 
<inline-graphic xlink:href="algorithms-04-00285i3.gif"/><sup>
<inline-graphic xlink:href="algorithms-04-00285i5.gif"/></sup> × 
<inline-graphic xlink:href="algorithms-04-00285i1.gif"/>,</p></list-item>
<list-item>
<p>a start state 
<inline-formula>
<mml:math id="mm30" display="inline">
<mml:semantics id="sm30">
<mml:mrow>
<mml:msub>
<mml:mi>q</mml:mi>
<mml:mn>0</mml:mn></mml:msub>
<mml:mo>≔</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msubsup>
<mml:mi>q</mml:mi>
<mml:mn>0</mml:mn>
<mml:mi mathvariant="script">D</mml:mi></mml:msubsup>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>c</mml:mi>
<mml:mn>0</mml:mn></mml:msub></mml:mrow>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:semantics></mml:math></inline-formula>,</p></list-item>
<list-item>
<p>transition probabilities
<disp-formula id="FD12">
<label>(2)</label>
<mml:math id="mm31" display="block">
<mml:semantics id="sm31">
<mml:mrow>
<mml:mi>T</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>q</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>c</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>,</mml:mo>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>q</mml:mi>
<mml:mo>′</mml:mo>
<mml:mo>,</mml:mo>
<mml:mi>c</mml:mi>
<mml:mo>′</mml:mo>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>≔</mml:mo>
<mml:munder>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:mi>σ</mml:mi>
<mml:mo>∈</mml:mo>
<mml:mi mathvariant="normal">Σ</mml:mi>
<mml:mo>:</mml:mo>
<mml:mi>δ</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>q</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>σ</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mi>q</mml:mi>
<mml:mo>′</mml:mo></mml:mrow></mml:munder>
<mml:mrow>
<mml:mi>φ</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>c</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>σ</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>c</mml:mi>
<mml:mo>′</mml:mo>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>,</mml:mo></mml:mrow></mml:mrow></mml:semantics></mml:math></disp-formula></p></list-item>
<list-item>
<p>(deterministic) emission probability vectors 
<inline-formula>
<mml:math id="mm32" display="inline">
<mml:semantics id="sm32">
<mml:mrow>
<mml:msub>
<mml:mi>μ</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>q</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>c</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>e</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>≔</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">⟦</mml:mo>
<mml:mrow>
<mml:mi>e</mml:mi>
<mml:mo>=</mml:mo>
<mml:msub>
<mml:mi>η</mml:mi>
<mml:mi>q</mml:mi></mml:msub></mml:mrow>
<mml:mo stretchy="false">⟧</mml:mo></mml:mrow>
<mml:mspace width="0.3em"/>
<mml:mtext>for all</mml:mtext>
<mml:mspace width="0.3em"/>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>q</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>c</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>∈</mml:mo>
<mml:mi mathvariant="script">Q</mml:mi></mml:mrow></mml:semantics></mml:math></inline-formula>,</p></list-item>
<list-item>
<p>operations 
<inline-formula>
<mml:math id="mm33" display="inline">
<mml:semantics id="sm33">
<mml:mrow>
<mml:msub>
<mml:mi>θ</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>q</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>c</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>v</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>e</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>≔</mml:mo>
<mml:msubsup>
<mml:mi>θ</mml:mi>
<mml:mi>q</mml:mi>
<mml:mi mathvariant="script">D</mml:mi></mml:msubsup>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>v</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>e</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mspace width="0.3em"/>
<mml:mtext>for all</mml:mtext>
<mml:mspace width="0.3em"/>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>q</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>c</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>∈</mml:mo>
<mml:mi mathvariant="script">Q</mml:mi></mml:mrow></mml:semantics></mml:math></inline-formula>.</p></list-item></list></p>
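<p>The transition probabilities of Equation (2) can be assembled as in the following Python sketch. The function name <monospace>induced_transitions</monospace>, the dictionary encoding of <italic>δ</italic> and the callable <italic>φ</italic> are hypothetical choices for this illustration. Emissions and operations are not shown, since they carry over unchanged from the DAA state component.</p>
<preformat>
```python
from collections import defaultdict

def induced_transitions(daa_states, contexts, alphabet, delta, phi):
    """Return T with T[(q, c)][(q_next, c_next)] as in Equation (2).

    delta[(q, sigma)]      successor state of the DAA
    phi(c, sigma, c_next)  transition function of the text model
    """
    T = {}
    for q in daa_states:
        for c in contexts:
            row = defaultdict(float)
            for sigma in alphabet:
                q_next = delta[(q, sigma)]
                for c_next in contexts:
                    p = phi(c, sigma, c_next)
                    if p > 0:
                        # Sum over all characters leading from q to q_next.
                        row[(q_next, c_next)] += p
            T[(q, c)] = dict(row)
    return T

# Toy DAA over {a, b, c}: state 1 iff the last character read was "a".
delta = {(q, s): (1 if s == "a" else 0) for q in (0, 1) for s in "abc"}
p = {"a": 0.5, "b": 0.25, "c": 0.25}
iid_phi = lambda c, sigma, c_next: p[sigma]
T = induced_transitions([0, 1], [""], "abc", delta, iid_phi)
# T[(0, "")] maps (1, "") to 0.5 and (0, "") to 0.25 + 0.25 = 0.5.
```
</preformat>
<p>In the toy example, the characters b and c both lead back to DAA state 0, so their probabilities are added, which is exactly the role of the sum over <italic>σ</italic> in Equation (2).</p>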
<p>Note that states having zero probability of being reached from <italic>q</italic><sub>0</sub> may be omitted from 
<inline-graphic xlink:href="algorithms-04-00285i3.gif"/> and <italic>T</italic> without changing the PAA's state, emission, or value process. The next lemma states that the PAA given by Definition 5 indeed reflects the probabilistic behavior of the input DAA acting on a random text generated by the text model. Furthermore, it gives the runtime required to compute the distribution of DAA values via dynamic programming.</p></sec>
<sec>
<title>Lemma 3 (Properties of PAA induced by DAA and text model)</title>
<p>Let a text model <italic>M</italic> = (
<inline-graphic xlink:href="algorithms-04-00285i1.gif"/>, <italic>c</italic><sub>0</sub>, Σ, <italic>φ</italic>) and a DAA 
<inline-formula>
<mml:math id="mm34" display="inline">
<mml:semantics id="sm34">
<mml:mrow>
<mml:mi mathvariant="script">D</mml:mi>
<mml:mo>=</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mi mathvariant="script">Q</mml:mi>
<mml:mi mathvariant="script">D</mml:mi></mml:msup>
<mml:mo>,</mml:mo>
<mml:msubsup>
<mml:mi>q</mml:mi>
<mml:mn>0</mml:mn>
<mml:mi mathvariant="script">D</mml:mi></mml:msubsup>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="normal">Σ</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>δ</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="script">V</mml:mi>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>v</mml:mi>
<mml:mn>0</mml:mn></mml:msub>
<mml:mo>,</mml:mo>
<mml:mi>ℰ</mml:mi>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>η</mml:mi>
<mml:mi>q</mml:mi></mml:msub></mml:mrow>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow>
<mml:mrow>
<mml:mi>q</mml:mi>
<mml:mo stretchy="false">∈</mml:mo>
<mml:msup>
<mml:mi mathvariant="script">Q</mml:mi>
<mml:mi mathvariant="script">D</mml:mi></mml:msup></mml:mrow></mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msubsup>
<mml:mi>θ</mml:mi>
<mml:mi>q</mml:mi>
<mml:mi mathvariant="script">D</mml:mi></mml:msubsup></mml:mrow>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow>
<mml:mrow>
<mml:mi>q</mml:mi>
<mml:mo>∈</mml:mo>
<mml:msup>
<mml:mi mathvariant="script">Q</mml:mi>
<mml:mi mathvariant="script">D</mml:mi></mml:msup></mml:mrow></mml:msub></mml:mrow>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:semantics></mml:math></inline-formula> be given and let 
<inline-graphic xlink:href="algorithms-04-00285i7.gif"/> = (
<inline-graphic xlink:href="algorithms-04-00285i3.gif"/>, <italic>q</italic><sub>0</sub>, <italic>T</italic>, 
<inline-graphic xlink:href="algorithms-04-00285i4.gif"/>, <italic>v</italic><sub>0</sub>, ℰ, <italic>μ</italic> = (<italic>μ<sub>q</sub></italic>)<sub><italic>q</italic>∈
<inline-graphic xlink:href="algorithms-04-00285i6.gif"/></sub>, <italic>θ</italic> = (<italic>θ<sub>q</sub></italic>)<sub><italic>q</italic>∈
<inline-graphic xlink:href="algorithms-04-00285i6.gif"/></sub>) be the PAA given by Definition 5. Then,
<list list-type="order">
<list-item>
<p>ℒ(<italic>V<sub>t</sub></italic>) = ℒ(<italic>value</italic><sub>
<inline-graphic xlink:href="algorithms-04-00285i5.gif"/></sub>(<italic>S</italic><sub>0</sub> … <italic>S</italic><sub><italic>t</italic>−1</sub>)) for all <italic>t</italic> ∈ ℕ<sub>0</sub>, where <italic>S</italic> is a random text according to the text model <italic>M</italic>,</p></list-item>
<list-item>
<p>the value distribution ℒ(<italic>V<sub>n</sub></italic>) can be computed with <italic>O</italic>(<italic>n</italic> · |
<inline-graphic xlink:href="algorithms-04-00285i3.gif"/><sup>
<inline-graphic xlink:href="algorithms-04-00285i5.gif"/></sup>| · |
<inline-graphic xlink:href="algorithms-04-00285i1.gif"/>|<sup>2</sup> · |Σ| · <italic>ϑ<sub>n</sub></italic>) operations using <italic>O</italic>(|
<inline-graphic xlink:href="algorithms-04-00285i3.gif"/><sup>
<inline-graphic xlink:href="algorithms-04-00285i5.gif"/></sup>| · |
<inline-graphic xlink:href="algorithms-04-00285i1.gif"/>| · <italic>ϑ<sub>n</sub></italic>) space, and</p></list-item>
<list-item>
<p>if for all <italic>c</italic> ∈ 
<inline-graphic xlink:href="algorithms-04-00285i1.gif"/> and <italic>σ</italic> ∈ Σ, there exists at most one <italic>c′</italic> ∈ 
<inline-graphic xlink:href="algorithms-04-00285i1.gif"/> such that <italic>φ</italic>(<italic>c, σ, c′</italic>) &gt; 0, then the runtime is bounded by <italic>O</italic>(<italic>n</italic> · |
<inline-graphic xlink:href="algorithms-04-00285i3.gif"/><sup>
<inline-graphic xlink:href="algorithms-04-00285i5.gif"/></sup>| · |
<inline-graphic xlink:href="algorithms-04-00285i1.gif"/>| · |Σ| · <italic>ϑ<sub>n</sub></italic>).</p></list-item></list></p></sec>
<sec>
<title>Proof</title>
<p>As in Section 5.2, we define <italic>f<sub>t</sub></italic>(<italic>q, v</italic>) ≔ ℙ(<italic>Q<sub>t</sub></italic> = <italic>q, V<sub>t</sub></italic> = <italic>v</italic>). To prove statement 1, we show that
<disp-formula id="FD13">
<label>(3)</label>
<mml:math id="mm35" display="block">
<mml:semantics id="sm35">
<mml:mrow>
<mml:msub>
<mml:mi>f</mml:mi>
<mml:mi>t</mml:mi></mml:msub>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mi>q</mml:mi>
<mml:mi mathvariant="script">D</mml:mi></mml:msup>
<mml:mo>,</mml:mo>
<mml:mi>c</mml:mi></mml:mrow>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mo>,</mml:mo>
<mml:mi>v</mml:mi></mml:mrow>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mo>=</mml:mo>
<mml:munder>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:mi>s</mml:mi>
<mml:mo>∈</mml:mo>
<mml:msup>
<mml:mi mathvariant="normal">Σ</mml:mi>
<mml:mi>t</mml:mi></mml:msup></mml:mrow></mml:munder>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">⟦</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>δ</mml:mi>
<mml:mo>^</mml:mo></mml:mover>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msubsup>
<mml:mi>q</mml:mi>
<mml:mn>0</mml:mn>
<mml:mi mathvariant="script">D</mml:mi></mml:msubsup>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>v</mml:mi>
<mml:mn>0</mml:mn></mml:msub></mml:mrow>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mo>,</mml:mo>
<mml:mi>s</mml:mi></mml:mrow>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mo>=</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mi>q</mml:mi>
<mml:mi mathvariant="script">D</mml:mi></mml:msup>
<mml:mo>,</mml:mo>
<mml:mi>v</mml:mi></mml:mrow>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow>
<mml:mo stretchy="false">⟧</mml:mo></mml:mrow></mml:mrow>
<mml:mo>⋅</mml:mo>
<mml:mi>ℙ</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mn>0</mml:mn></mml:msub>
<mml:mo>…</mml:mo>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>-</mml:mo>
<mml:mn>1</mml:mn></mml:mrow></mml:msub>
<mml:mo>=</mml:mo>
<mml:mi>s</mml:mi>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>C</mml:mi>
<mml:mi>t</mml:mi></mml:msub>
<mml:mo>=</mml:mo>
<mml:mi>c</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:semantics></mml:math></disp-formula>for all <italic>q</italic><sup>
<inline-graphic xlink:href="algorithms-04-00285i5.gif"/></sup> ∈ 
<inline-graphic xlink:href="algorithms-04-00285i3.gif"/><sup>
<inline-graphic xlink:href="algorithms-04-00285i5.gif"/></sup>, <italic>c</italic> ∈ 
<inline-graphic xlink:href="algorithms-04-00285i1.gif"/>, <italic>v</italic> ∈ 
<inline-graphic xlink:href="algorithms-04-00285i4.gif"/>, and <italic>t</italic> ∈ ℕ<sub>0</sub>. For <italic>t</italic> = 0, <xref rid="FD13" ref-type="disp-formula">Equation (3)</xref> holds by the definitions of PAAs, DAAs, and text models. For <italic>t</italic> &gt; 0, we proceed by induction. Assume that <xref rid="FD13" ref-type="disp-formula">Equation (3)</xref> holds for all <italic>t′</italic> with 0 ≤ <italic>t′</italic> &lt; <italic>t</italic>. Then
<disp-formula id="FD14">
<label>(4)</label>
<mml:math id="mm36" display="block">
<mml:semantics id="sm36">
<mml:mrow>
<mml:msub>
<mml:mi>f</mml:mi>
<mml:mi>t</mml:mi></mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:munder>
<mml:munder>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:msup>
<mml:mi>q</mml:mi>
<mml:mi mathvariant="script">D</mml:mi></mml:msup>
<mml:mo>,</mml:mo>
<mml:mi>c</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mo stretchy="true">︸</mml:mo></mml:munder>
<mml:mrow>
<mml:mo>=</mml:mo>
<mml:mo>:</mml:mo>
<mml:mi>q</mml:mi></mml:mrow></mml:munder>
<mml:mo>,</mml:mo>
<mml:mi>v</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:semantics></mml:math></disp-formula>
<disp-formula id="FD15">
<label>(5)</label>
<mml:math id="mm37" display="block">
<mml:semantics id="sm37">
<mml:mrow>
<mml:mo>=</mml:mo>
<mml:munder>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:mi>q</mml:mi>
<mml:mo>′</mml:mo>
<mml:mo>∈</mml:mo>
<mml:mi mathvariant="script">Q</mml:mi></mml:mrow></mml:munder>
<mml:mrow>
<mml:munder>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>v</mml:mi>
<mml:mo>′</mml:mo>
<mml:mo>,</mml:mo>
<mml:mi>e</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>∈</mml:mo>
<mml:msubsup>
<mml:mi>θ</mml:mi>
<mml:mi>q</mml:mi>
<mml:mrow>
<mml:mo>-</mml:mo>
<mml:mn>1</mml:mn></mml:mrow></mml:msubsup>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>v</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:munder>
<mml:mrow>
<mml:msub>
<mml:mi>f</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>-</mml:mo>
<mml:mn>1</mml:mn></mml:mrow></mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>q</mml:mi>
<mml:mo>′</mml:mo>
<mml:mo>,</mml:mo>
<mml:mi>v</mml:mi>
<mml:mo>′</mml:mo>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mo>⋅</mml:mo>
<mml:mi>T</mml:mi></mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>q</mml:mi>
<mml:mo>′</mml:mo>
<mml:mo>,</mml:mo>
<mml:mi>q</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>⋅</mml:mo>
<mml:msub>
<mml:mi>μ</mml:mi>
<mml:mi>q</mml:mi></mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>e</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:semantics></mml:math></disp-formula>
<disp-formula id="FD16">
<label>(6)</label>
<mml:math id="mm38" display="block">
<mml:semantics id="sm38">
<mml:mrow>
<mml:mo>=</mml:mo>
<mml:munder>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:mi>q</mml:mi>
<mml:mo>′</mml:mo>
<mml:mo>∈</mml:mo>
<mml:mi mathvariant="script">Q</mml:mi></mml:mrow></mml:munder>
<mml:mrow>
<mml:munder>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>v</mml:mi>
<mml:mo>′</mml:mo>
<mml:mo>,</mml:mo>
<mml:mi>e</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>∈</mml:mo>
<mml:mi mathvariant="script">V</mml:mi>
<mml:mo>×</mml:mo>
<mml:mi>ℰ</mml:mi></mml:mrow></mml:munder>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">⟦</mml:mo>
<mml:mrow>
<mml:msubsup>
<mml:mi>θ</mml:mi>
<mml:mrow>
<mml:msup>
<mml:mi>q</mml:mi>
<mml:mi mathvariant="script">D</mml:mi></mml:msup></mml:mrow>
<mml:mi mathvariant="script">D</mml:mi></mml:msubsup>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>v</mml:mi>
<mml:mo>′</mml:mo>
<mml:mo>,</mml:mo>
<mml:mi>e</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mi>v</mml:mi></mml:mrow>
<mml:mo stretchy="false">⟧</mml:mo></mml:mrow>
<mml:mo>⋅</mml:mo>
<mml:msub>
<mml:mi>f</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>-</mml:mo>
<mml:mn>1</mml:mn></mml:mrow></mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>q</mml:mi>
<mml:mo>′</mml:mo>
<mml:mo>,</mml:mo>
<mml:mi>v</mml:mi>
<mml:mo>′</mml:mo>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mo>⋅</mml:mo>
<mml:mi>T</mml:mi></mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>q</mml:mi>
<mml:mo>′</mml:mo>
<mml:mo>,</mml:mo>
<mml:mi>q</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>⋅</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">⟦</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>η</mml:mi>
<mml:mrow>
<mml:msup>
<mml:mi>q</mml:mi>
<mml:mrow>
<mml:mi mathvariant="script">D</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:msub>
<mml:mo>=</mml:mo>
<mml:mi>e</mml:mi></mml:mrow>
<mml:mo stretchy="false">⟧</mml:mo></mml:mrow></mml:mrow></mml:semantics></mml:math></disp-formula>
<disp-formula id="FD17">
<label>(7)</label>
<mml:math id="mm39" display="block">
<mml:semantics id="sm39">
<mml:mrow>
<mml:mo>=</mml:mo>
<mml:munder>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mi>q</mml:mi>
<mml:mrow>
<mml:mo>′</mml:mo>
<mml:mi mathvariant="script">D</mml:mi></mml:mrow></mml:msup>
<mml:mo>∈</mml:mo>
<mml:msup>
<mml:mi mathvariant="script">Q</mml:mi>
<mml:mi mathvariant="script">D</mml:mi></mml:msup></mml:mrow></mml:munder>
<mml:mrow>
<mml:munder>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mo>′</mml:mo>
<mml:mo>∈</mml:mo>
<mml:mi mathvariant="script">C</mml:mi></mml:mrow></mml:munder>
<mml:mrow>
<mml:munder>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>v</mml:mi>
<mml:mo>′</mml:mo>
<mml:mo>,</mml:mo>
<mml:mi>e</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>∈</mml:mo>
<mml:mi mathvariant="script">V</mml:mi>
<mml:mo>×</mml:mo>
<mml:mi>ℰ</mml:mi></mml:mrow></mml:munder>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">⟦</mml:mo>
<mml:mrow>
<mml:msubsup>
<mml:mi>θ</mml:mi>
<mml:mrow>
<mml:msup>
<mml:mi>q</mml:mi>
<mml:mi mathvariant="script">D</mml:mi></mml:msup></mml:mrow>
<mml:mi mathvariant="script">D</mml:mi></mml:msubsup>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>v</mml:mi>
<mml:mo>′</mml:mo>
<mml:mo>,</mml:mo>
<mml:mi>e</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mi>v</mml:mi></mml:mrow>
<mml:mo stretchy="false">⟧</mml:mo></mml:mrow></mml:mrow>
<mml:mo>⋅</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">⟦</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>η</mml:mi>
<mml:mrow>
<mml:msup>
<mml:mi>q</mml:mi>
<mml:mrow>
<mml:mi mathvariant="script">D</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:msub>
<mml:mo>=</mml:mo>
<mml:mi>e</mml:mi></mml:mrow>
<mml:mo stretchy="false">⟧</mml:mo></mml:mrow>
<mml:mo>⋅</mml:mo>
<mml:msub>
<mml:mi>f</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>-</mml:mo>
<mml:mn>1</mml:mn></mml:mrow></mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>q</mml:mi>
<mml:mo>′</mml:mo>
<mml:mo>,</mml:mo>
<mml:mi>v</mml:mi>
<mml:mo>′</mml:mo>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow>
<mml:mo>⋅</mml:mo>
<mml:munder>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:mi>σ</mml:mi>
<mml:mo>∈</mml:mo>
<mml:mi mathvariant="normal">Σ</mml:mi></mml:mrow></mml:munder>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">⟦</mml:mo>
<mml:mrow>
<mml:mi>δ</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:msup>
<mml:mi>q</mml:mi>
<mml:mrow>
<mml:mo>′</mml:mo>
<mml:mi mathvariant="script">D</mml:mi></mml:mrow></mml:msup>
<mml:mo>,</mml:mo>
<mml:mi>σ</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:msup>
<mml:mi>q</mml:mi>
<mml:mi mathvariant="script">D</mml:mi></mml:msup></mml:mrow>
<mml:mo stretchy="false">⟧</mml:mo></mml:mrow></mml:mrow>
<mml:mo>⋅</mml:mo>
<mml:mi>φ</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>c</mml:mi>
<mml:mo>′</mml:mo>
<mml:mo>,</mml:mo>
<mml:mi>σ</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>c</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:semantics></mml:math></disp-formula>
<disp-formula id="FD18">
<label>(8)</label>
<mml:math id="mm40" display="block">
<mml:semantics id="sm40">
<mml:mrow>
<mml:mo>=</mml:mo>
<mml:munder>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:mi>s</mml:mi>
<mml:mo>∈</mml:mo>
<mml:msup>
<mml:mi mathvariant="normal">Σ</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>-</mml:mo>
<mml:mn>1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:munder>
<mml:mrow>
<mml:munder>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:mi>σ</mml:mi>
<mml:mo>∈</mml:mo>
<mml:mi mathvariant="normal">Σ</mml:mi></mml:mrow></mml:munder>
<mml:mrow>
<mml:munder>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mi>q</mml:mi>
<mml:mrow>
<mml:mo>′</mml:mo>
<mml:mi mathvariant="script">D</mml:mi></mml:mrow></mml:msup>
<mml:mo>∈</mml:mo>
<mml:msup>
<mml:mi mathvariant="script">Q</mml:mi>
<mml:mi mathvariant="script">D</mml:mi></mml:msup></mml:mrow></mml:munder>
<mml:mrow>
<mml:munder>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mo>′</mml:mo>
<mml:mo>∈</mml:mo>
<mml:mi mathvariant="script">C</mml:mi></mml:mrow></mml:munder>
<mml:mrow>
<mml:munder>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>v</mml:mi>
<mml:mo>′</mml:mo>
<mml:mo>,</mml:mo>
<mml:mi>e</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>∈</mml:mo>
<mml:mi mathvariant="script">V</mml:mi>
<mml:mo>×</mml:mo>
<mml:mi>ℰ</mml:mi></mml:mrow></mml:munder>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">⟦</mml:mo>
<mml:mrow>
<mml:msubsup>
<mml:mi>θ</mml:mi>
<mml:mrow>
<mml:msup>
<mml:mi>q</mml:mi>
<mml:mi mathvariant="script">D</mml:mi></mml:msup></mml:mrow>
<mml:mi mathvariant="script">D</mml:mi></mml:msubsup>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>v</mml:mi>
<mml:mo>′</mml:mo>
<mml:mo>,</mml:mo>
<mml:mi>e</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mi>v</mml:mi></mml:mrow>
<mml:mo stretchy="false">⟧</mml:mo></mml:mrow>
<mml:mo>⋅</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">⟦</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>η</mml:mi>
<mml:mrow>
<mml:msup>
<mml:mi>q</mml:mi>
<mml:mrow>
<mml:mi mathvariant="script">D</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:msub>
<mml:mo>=</mml:mo>
<mml:mi>e</mml:mi></mml:mrow>
<mml:mo stretchy="false">⟧</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:mrow></mml:mrow></mml:mrow>
<mml:mo>⋅</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">⟦</mml:mo>
<mml:mrow>
<mml:mi>δ</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:msup>
<mml:mi>q</mml:mi>
<mml:mrow>
<mml:mo>′</mml:mo>
<mml:mi mathvariant="script">D</mml:mi></mml:mrow></mml:msup>
<mml:mo>,</mml:mo>
<mml:mi>σ</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:msup>
<mml:mi>q</mml:mi>
<mml:mi mathvariant="script">D</mml:mi></mml:msup></mml:mrow>
<mml:mo stretchy="false">⟧</mml:mo></mml:mrow>
<mml:mo>⋅</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">⟦</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>δ</mml:mi>
<mml:mo>^</mml:mo></mml:mover>
<mml:mo stretchy="false">(</mml:mo>
<mml:mo stretchy="false">(</mml:mo>
<mml:msubsup>
<mml:mi>q</mml:mi>
<mml:mn>0</mml:mn>
<mml:mi mathvariant="script">D</mml:mi></mml:msubsup>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>v</mml:mi>
<mml:mn>0</mml:mn></mml:msub>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>,</mml:mo>
<mml:mi>s</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mo stretchy="false">(</mml:mo>
<mml:msup>
<mml:mi>q</mml:mi>
<mml:mrow>
<mml:mo>′</mml:mo>
<mml:mi mathvariant="script">D</mml:mi></mml:mrow></mml:msup>
<mml:mo>,</mml:mo>
<mml:mi>v</mml:mi>
<mml:mo>′</mml:mo>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mo stretchy="false">⟧</mml:mo></mml:mrow>
<mml:mo>⋅</mml:mo>
<mml:mi>ℙ</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mn>0</mml:mn></mml:msub>
<mml:mo>…</mml:mo>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>-</mml:mo>
<mml:mn>2</mml:mn></mml:mrow></mml:msub>
<mml:mo>=</mml:mo>
<mml:mi>s</mml:mi>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>C</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>-</mml:mo>
<mml:mn>1</mml:mn></mml:mrow></mml:msub>
<mml:mo>=</mml:mo>
<mml:mi>c</mml:mi>
<mml:mo>′</mml:mo>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>⋅</mml:mo>
<mml:mi>φ</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>c</mml:mi>
<mml:mo>′</mml:mo>
<mml:mo>,</mml:mo>
<mml:mi>σ</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>c</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:semantics></mml:math></disp-formula>
<disp-formula id="FD19">
<label>(9)</label>
<mml:math id="mm41" display="block">
<mml:semantics id="sm41">
<mml:mrow>
<mml:mo>=</mml:mo>
<mml:munder>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:mi>s</mml:mi>
<mml:mi>σ</mml:mi>
<mml:mo>∈</mml:mo>
<mml:msup>
<mml:mi mathvariant="normal">Σ</mml:mi>
<mml:mi>t</mml:mi></mml:msup></mml:mrow></mml:munder>
<mml:mrow>
<mml:munder>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mi>q</mml:mi>
<mml:mrow>
<mml:mo>′</mml:mo>
<mml:mi mathvariant="script">D</mml:mi></mml:mrow></mml:msup>
<mml:mo>∈</mml:mo>
<mml:msup>
<mml:mi mathvariant="script">Q</mml:mi>
<mml:mi mathvariant="script">D</mml:mi></mml:msup></mml:mrow></mml:munder>
<mml:mrow>
<mml:munder>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>v</mml:mi>
<mml:mo>′</mml:mo>
<mml:mo>,</mml:mo>
<mml:mi>e</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>∈</mml:mo>
<mml:mi mathvariant="script">V</mml:mi>
<mml:mo>×</mml:mo>
<mml:mi>ℰ</mml:mi></mml:mrow></mml:munder>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">⟦</mml:mo>
<mml:mrow>
<mml:msubsup>
<mml:mi>θ</mml:mi>
<mml:mrow>
<mml:msup>
<mml:mi>q</mml:mi>
<mml:mi mathvariant="script">D</mml:mi></mml:msup></mml:mrow>
<mml:mi mathvariant="script">D</mml:mi></mml:msubsup>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>v</mml:mi>
<mml:mo>′</mml:mo>
<mml:mo>,</mml:mo>
<mml:mi>e</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mi>v</mml:mi></mml:mrow>
<mml:mo stretchy="false">⟧</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:mrow>
<mml:mo>⋅</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">⟦</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>η</mml:mi>
<mml:mrow>
<mml:msup>
<mml:mi>q</mml:mi>
<mml:mrow>
<mml:mi mathvariant="script">D</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:msub>
<mml:mo>=</mml:mo>
<mml:mi>e</mml:mi></mml:mrow>
<mml:mo stretchy="false">⟧</mml:mo></mml:mrow>
<mml:mo>⋅</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">⟦</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>δ</mml:mi>
<mml:mo>^</mml:mo></mml:mover>
<mml:mo stretchy="false">(</mml:mo>
<mml:mo stretchy="false">(</mml:mo>
<mml:msubsup>
<mml:mi>q</mml:mi>
<mml:mn>0</mml:mn>
<mml:mi mathvariant="script">D</mml:mi></mml:msubsup>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>v</mml:mi>
<mml:mn>0</mml:mn></mml:msub>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>,</mml:mo>
<mml:mi>s</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mo stretchy="false">(</mml:mo>
<mml:msup>
<mml:mi>q</mml:mi>
<mml:mrow>
<mml:mo>′</mml:mo>
<mml:mi mathvariant="script">D</mml:mi></mml:mrow></mml:msup>
<mml:mo>,</mml:mo>
<mml:mi>v</mml:mi>
<mml:mo>′</mml:mo>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mo stretchy="false">⟧</mml:mo></mml:mrow>
<mml:mo>⋅</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">⟦</mml:mo>
<mml:mrow>
<mml:mi>δ</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:msup>
<mml:mi>q</mml:mi>
<mml:mrow>
<mml:mo>′</mml:mo>
<mml:mi mathvariant="script">D</mml:mi></mml:mrow></mml:msup>
<mml:mo>,</mml:mo>
<mml:mi>σ</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:msup>
<mml:mi>q</mml:mi>
<mml:mi mathvariant="script">D</mml:mi></mml:msup></mml:mrow>
<mml:mo stretchy="false">⟧</mml:mo></mml:mrow>
<mml:mo>⋅</mml:mo>
<mml:mi>ℙ</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mn>0</mml:mn></mml:msub>
<mml:mo>…</mml:mo>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>-</mml:mo>
<mml:mn>1</mml:mn></mml:mrow></mml:msub>
<mml:mo>=</mml:mo>
<mml:mi>s</mml:mi>
<mml:mi>σ</mml:mi>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>C</mml:mi>
<mml:mi>t</mml:mi></mml:msub>
<mml:mo>=</mml:mo>
<mml:mi>c</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:semantics></mml:math></disp-formula>
<disp-formula id="FD20">
<label>(10)</label>
<mml:math id="mm42" display="block">
<mml:semantics id="sm42">
<mml:mrow>
<mml:mo>=</mml:mo>
<mml:munder>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:mi>s</mml:mi>
<mml:mi>σ</mml:mi>
<mml:mo>∈</mml:mo>
<mml:msup>
<mml:mi mathvariant="normal">Σ</mml:mi>
<mml:mi>t</mml:mi></mml:msup></mml:mrow></mml:munder>
<mml:mrow>
<mml:mrow>
<mml:mo>⟦</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>δ</mml:mi>
<mml:mo>^</mml:mo></mml:mover>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msubsup>
<mml:mi>q</mml:mi>
<mml:mn>0</mml:mn>
<mml:mi mathvariant="script">D</mml:mi></mml:msubsup>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>v</mml:mi>
<mml:mn>0</mml:mn></mml:msub></mml:mrow>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mo>,</mml:mo>
<mml:mi>s</mml:mi>
<mml:mi>σ</mml:mi></mml:mrow>
<mml:mo>)</mml:mo></mml:mrow>
<mml:mo>=</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mi>q</mml:mi>
<mml:mi mathvariant="script">D</mml:mi></mml:msup>
<mml:mo>,</mml:mo>
<mml:mi>v</mml:mi></mml:mrow>
<mml:mo>)</mml:mo></mml:mrow></mml:mrow>
<mml:mo>⟧</mml:mo></mml:mrow>
<mml:mo>⋅</mml:mo>
<mml:mi>ℙ</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mn>0</mml:mn></mml:msub>
<mml:mo>…</mml:mo>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>-</mml:mo>
<mml:mn>1</mml:mn></mml:mrow></mml:msub>
<mml:mo>=</mml:mo>
<mml:mi>s</mml:mi>
<mml:mi>σ</mml:mi>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>C</mml:mi>
<mml:mi>t</mml:mi></mml:msub>
<mml:mo>=</mml:mo>
<mml:mi>c</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>.</mml:mo></mml:mrow></mml:mrow></mml:semantics></mml:math></disp-formula></p>
<p>In the above derivation, step (<xref rid="FD14" ref-type="disp-formula">4</xref>)→(<xref rid="FD15" ref-type="disp-formula">5</xref>) follows from (<xref rid="FD11" ref-type="disp-formula">1</xref>). Step (<xref rid="FD15" ref-type="disp-formula">5</xref>)→(<xref rid="FD16" ref-type="disp-formula">6</xref>) follows from the definitions of <italic>θ<sub>q</sub></italic> and <italic>μ<sub>q</sub></italic>. Step (<xref rid="FD16" ref-type="disp-formula">6</xref>)→(<xref rid="FD17" ref-type="disp-formula">7</xref>) uses the definitions of <italic>T</italic> and 
<inline-graphic xlink:href="algorithms-04-00285i3.gif"/> in Lemma 3. Step (<xref rid="FD17" ref-type="disp-formula">7</xref>)→(<xref rid="FD18" ref-type="disp-formula">8</xref>) uses the induction assumption. Step (<xref rid="FD18" ref-type="disp-formula">8</xref>)→(<xref rid="FD19" ref-type="disp-formula">9</xref>) uses Lemma 1. The final step (<xref rid="FD19" ref-type="disp-formula">9</xref>)→(<xref rid="FD20" ref-type="disp-formula">10</xref>) follows by combining the four Iverson brackets summed over <italic>q′</italic><sup>
<inline-graphic xlink:href="algorithms-04-00285i5.gif"/></sup> and (<italic>v′, e</italic>) into a single Iverson bracket.</p>
<p>To compute the table <italic>f<sub>n</sub></italic> containing <italic>f<sub>n</sub></italic>(<italic>q, v</italic>) for all <italic>q</italic> ∈ 
<inline-graphic xlink:href="algorithms-04-00285i3.gif"/> and <italic>v</italic> ∈ 
<inline-graphic xlink:href="algorithms-04-00285i4.gif"/>, we start with <italic>f</italic><sub>0</sub> and perform <italic>n</italic> update steps. The runtime bounds given in 2. and 3. can be verified by considering a “push” algorithm: When computing <italic>f<sub>t</sub></italic><sub>+1</sub>, we initialize the table with zeros and iterate over all <italic>q</italic> ∈ 
<inline-graphic xlink:href="algorithms-04-00285i3.gif"/>, <italic>v</italic> ∈ 
<inline-graphic xlink:href="algorithms-04-00285i4.gif"/> and <italic>q</italic>′ ∈ {<italic>q</italic>″ ∈ 
<inline-graphic xlink:href="algorithms-04-00285i3.gif"/> : <italic>T</italic>(<italic>q, q″</italic>) &gt; 0}; for each such combination of <italic>q, v</italic>, and <italic>q′</italic>, we add <italic>f<sub>t</sub></italic>(<italic>q, v</italic>) · <italic>T</italic>(<italic>q, q′</italic>) to <italic>f</italic><sub><italic>t</italic>+1</sub>(<italic>q′, θ<sub>q′</sub></italic>(<italic>v, η<sub>q′</sub></italic>)).</p>
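<p>The push scheme can be sketched in a few lines of Python. The sketch is an illustration under stated assumptions, not the implementation used for this paper: the table <italic>f<sub>t</sub></italic> is stored as a dictionary holding only its nonzero entries, <italic>T</italic> is a nested dictionary listing only successors with positive transition probability, emissions are deterministic (as in a PAA derived from a DAA), and the PAA is assumed to start in (<italic>q</italic><sub>0</sub>, <italic>v</italic><sub>0</sub>) with probability 1. The names <monospace>push_step</monospace>, <monospace>value_distribution</monospace>, and the two-state toy automaton are hypothetical.</p>
<preformat><![CDATA[
```python
from collections import defaultdict


def push_step(f_t, T, eta, theta):
    """Compute the table f_{t+1} from f_t with the push scheme."""
    f_next = defaultdict(float)  # table initialised with zeros
    for (q, v), prob in f_t.items():
        # T[q] lists only successors q_next with T(q, q_next) > 0.
        for q_next, p_trans in T[q].items():
            v_next = theta(q_next, v, eta[q_next])
            f_next[(q_next, v_next)] += prob * p_trans
    return dict(f_next)


def value_distribution(q0, v0, T, eta, theta, n):
    """Perform n update steps from f_0 and marginalise over the states."""
    # Assumption: the PAA starts in (q0, v0) with probability 1.
    f = {(q0, v0): 1.0}
    for _ in range(n):
        f = push_step(f, T, eta, theta)
    dist = defaultdict(float)
    for (_, v), prob in f.items():
        dist[v] += prob
    return dict(dist)


# Hypothetical two-state toy PAA: state 1 emits 1, state 0 emits 0,
# transitions are uniform, and every operation adds the emission.
T_toy = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.5, 1: 0.5}}
eta_toy = {0: 0, 1: 1}


def add_emission(q, v, e):
    """Operation theta_q: add the current emission to the previous value."""
    return v + e


dist_after_two_steps = value_distribution(0, 0, T_toy, eta_toy, add_emission, 2)
```
]]></preformat>
<p>For the toy automaton, <monospace>dist_after_two_steps</monospace> assigns probabilities 0.25, 0.5, and 0.25 to the values 0, 1, and 2. Each update step touches every nonzero table entry once per outgoing transition, which is the quantity counted in the runtime bounds.</p>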
<p>As a direct consequence of the above lemma and of the DAA construction from Section 4, we arrive at our main theorem.</p></sec>
<sec>
<title>Theorem 2</title>
<p>Let a finite-memory text model (
<inline-graphic xlink:href="algorithms-04-00285i1.gif"/>, <italic>c</italic><sub>0</sub>, Σ, <italic>φ</italic>), a window-based pattern matching algorithm A, a pattern <italic>p</italic> with |<italic>p</italic>| = <italic>m</italic>, and the functions <italic>shift<sup>A,p</sup></italic> and <italic>ξ<sup>A,p</sup></italic> be given. Then, the cost distribution 
<inline-formula>
<mml:math id="mm43" display="inline">
<mml:semantics id="sm43">
<mml:mrow>
<mml:mi>ℒ</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msubsup>
<mml:mi>X</mml:mi>
<mml:mi>n</mml:mi>
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>p</mml:mi></mml:mrow></mml:msubsup></mml:mrow>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:semantics></mml:math></inline-formula> can be computed using <italic>O</italic>(<italic>n</italic><sup>2</sup> · <italic>m</italic> · |
<inline-graphic xlink:href="algorithms-04-00285i3.gif"/><sup>
<inline-graphic xlink:href="algorithms-04-00285i5.gif"/></sup>| · |
<inline-graphic xlink:href="algorithms-04-00285i1.gif"/>|<sup>2</sup> · |Σ|) time and <italic>O</italic>(|
<inline-graphic xlink:href="algorithms-04-00285i3.gif"/><sup>
<inline-graphic xlink:href="algorithms-04-00285i5.gif"/></sup>| · |
<inline-graphic xlink:href="algorithms-04-00285i1.gif"/>| · <italic>n</italic> · <italic>m</italic>) space. If for all <italic>c</italic> ∈ 
<inline-graphic xlink:href="algorithms-04-00285i1.gif"/> and <italic>σ</italic> ∈ Σ, there exists at most one <italic>c′</italic> ∈ 
<inline-graphic xlink:href="algorithms-04-00285i1.gif"/> such that <italic>φ</italic>(<italic>c, σ, c′</italic>) &gt; 0, a factor of |
<inline-graphic xlink:href="algorithms-04-00285i1.gif"/>| can be dropped from the runtime bounds.</p>
<p>Using optimal algorithm-dependent DAA construction schemes (e.g., the <italic>O</italic>(<italic>m</italic><sup>2</sup>) construction for the Horspool algorithm by Tsai [<xref ref-type="bibr" rid="b13-algorithms-04-00285">13</xref>]) allows |
<inline-graphic xlink:href="algorithms-04-00285i3.gif"/><sup>
<inline-graphic xlink:href="algorithms-04-00285i5.gif"/></sup>| to be bounded by a polynomial in <italic>m</italic> rather than by <italic>O</italic>(<italic>m</italic>|Σ|<italic><sup>m</sup></italic>).</p></sec></sec></sec>
<sec>
<label>6.</label>
<title>Comparing Algorithms with Difference DAAs</title>
<p>Computing the cost distribution for two algorithms allows us to compare their performance characteristics. One natural question, however, cannot be answered by comparing these two (one-dimensional) distributions: What is the probability that algorithm <italic>A</italic> needs more text accesses than algorithm <italic>B</italic> to scan the same random text? The answer depends on how the performances of the two algorithms are correlated: Do the same instances lead to long runtimes for both algorithms, or are there instances that are easy for one algorithm but difficult for the other? This section answers these questions by constructing a PAA to compute the distribution of <italic>cost differences</italic> of two algorithms. That is, we calculate the probability that algorithm <italic>A</italic> needs <italic>v</italic> text accesses <italic>more</italic> than algorithm <italic>B</italic> for all <italic>v</italic> ∈ ℤ.</p>
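<p>A toy illustration of why the two marginal cost distributions do not suffice is sketched below in Python. The two joint distributions are made-up data and all names are hypothetical; they are not results of the analysis in this paper. Both joint distributions have identical marginals, yet they differ in the probability that <italic>A</italic> needs more text accesses than <italic>B</italic>.</p>
<preformat><![CDATA[
```python
from fractions import Fraction

HALF = Fraction(1, 2)

# joint[(a, b)]: probability that A needs a and B needs b text accesses.
# Both tables are made-up toy data with identical marginals.
same_instances_hard = {(1, 1): HALF, (2, 2): HALF}
opposite_instances_hard = {(1, 2): HALF, (2, 1): HALF}


def marginals(joint):
    """Return the two one-dimensional cost distributions of a joint table."""
    dist_a, dist_b = {}, {}
    for (a, b), p in joint.items():
        dist_a[a] = dist_a.get(a, 0) + p
        dist_b[b] = dist_b.get(b, 0) + p
    return dist_a, dist_b


def prob_a_exceeds_b(joint):
    """Return the probability that A needs more text accesses than B."""
    return sum(p for (a, b), p in joint.items() if a > b)
```
]]></preformat>
<p>In this toy case, <monospace>marginals</monospace> returns the same pair of distributions for both tables, whereas <monospace>prob_a_exceeds_b</monospace> yields 0 for the first table and 1/2 for the second.</p>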
<p>We start by giving a general construction of a DAA that computes the difference of the sums of emissions of two given DAAs.</p>
<sec>
<title>Definition 6 (Difference DAA)</title>
<p>Let a finite alphabet Σ and two DAAs 
<inline-formula>
<mml:math id="mm44" display="inline">
<mml:semantics id="sm44">
<mml:mrow>
<mml:msup>
<mml:mi mathvariant="script">D</mml:mi>
<mml:mn>1</mml:mn></mml:msup>
<mml:mo>=</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mi mathvariant="script">Q</mml:mi>
<mml:mn>1</mml:mn></mml:msup>
<mml:mo>,</mml:mo>
<mml:msubsup>
<mml:mi>q</mml:mi>
<mml:mn>0</mml:mn>
<mml:mn>1</mml:mn></mml:msubsup>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="normal">Σ</mml:mi>
<mml:mo>,</mml:mo>
<mml:msup>
<mml:mi>δ</mml:mi>
<mml:mn>1</mml:mn></mml:msup>
<mml:mo>,</mml:mo>
<mml:msup>
<mml:mi mathvariant="script">V</mml:mi>
<mml:mn>1</mml:mn></mml:msup>
<mml:mo>,</mml:mo>
<mml:msubsup>
<mml:mi>v</mml:mi>
<mml:mn>0</mml:mn>
<mml:mn>1</mml:mn></mml:msubsup>
<mml:mo>,</mml:mo>
<mml:msup>
<mml:mi>ℰ</mml:mi>
<mml:mn>1</mml:mn></mml:msup>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:msubsup>
<mml:mi>η</mml:mi>
<mml:mi>q</mml:mi>
<mml:mn>1</mml:mn></mml:msubsup>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mrow>
<mml:mi>q</mml:mi>
<mml:mo>∈</mml:mo>
<mml:msup>
<mml:mi mathvariant="script">Q</mml:mi>
<mml:mn>1</mml:mn></mml:msup></mml:mrow></mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msubsup>
<mml:mi>θ</mml:mi>
<mml:mi>q</mml:mi>
<mml:mn>1</mml:mn></mml:msubsup></mml:mrow>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow>
<mml:mrow>
<mml:mi>q</mml:mi>
<mml:mo>∈</mml:mo>
<mml:msup>
<mml:mi mathvariant="script">Q</mml:mi>
<mml:mn>1</mml:mn></mml:msup></mml:mrow></mml:msub></mml:mrow>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:semantics></mml:math></inline-formula> and 
<inline-formula>
<mml:math id="mm45" display="inline">
<mml:semantics id="sm45">
<mml:mrow>
<mml:msup>
<mml:mi mathvariant="script">D</mml:mi>
<mml:mn>2</mml:mn></mml:msup>
<mml:mo>=</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msup>
<mml:mi mathvariant="script">Q</mml:mi>
<mml:mn>2</mml:mn></mml:msup>
<mml:mo>,</mml:mo>
<mml:msubsup>
<mml:mi>q</mml:mi>
<mml:mn>0</mml:mn>
<mml:mn>2</mml:mn></mml:msubsup>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="normal">Σ</mml:mi>
<mml:mo>,</mml:mo>
<mml:msup>
<mml:mi>δ</mml:mi>
<mml:mn>2</mml:mn></mml:msup>
<mml:mo>,</mml:mo>
<mml:msup>
<mml:mi mathvariant="script">V</mml:mi>
<mml:mn>2</mml:mn></mml:msup>
<mml:mo>,</mml:mo>
<mml:msubsup>
<mml:mi>v</mml:mi>
<mml:mn>0</mml:mn>
<mml:mn>2</mml:mn></mml:msubsup>
<mml:mo>,</mml:mo>
<mml:msup>
<mml:mi>ℰ</mml:mi>
<mml:mn>2</mml:mn></mml:msup>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:msubsup>
<mml:mi>η</mml:mi>
<mml:mi>q</mml:mi>
<mml:mn>2</mml:mn></mml:msubsup>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mrow>
<mml:mi>q</mml:mi>
<mml:mo>∈</mml:mo>
<mml:msup>
<mml:mi mathvariant="script">Q</mml:mi>
<mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msubsup>
<mml:mi>θ</mml:mi>
<mml:mi>q</mml:mi>
<mml:mn>2</mml:mn></mml:msubsup></mml:mrow>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow>
<mml:mrow>
<mml:mi>q</mml:mi>
<mml:mo>∈</mml:mo>
<mml:msup>
<mml:mi mathvariant="script">Q</mml:mi>
<mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:msub></mml:mrow>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:semantics></mml:math></inline-formula> be given with 
<inline-formula>
<mml:math id="mm46" display="inline">
<mml:semantics id="sm46">
<mml:mrow>
<mml:msup>
<mml:mi mathvariant="script">V</mml:mi>
<mml:mn>1</mml:mn></mml:msup>
<mml:mo>=</mml:mo>
<mml:msup>
<mml:mi mathvariant="script">V</mml:mi>
<mml:mn>2</mml:mn></mml:msup>
<mml:mo>=</mml:mo>
<mml:mi>ℕ</mml:mi>
<mml:mo>,</mml:mo>
<mml:msubsup>
<mml:mi>v</mml:mi>
<mml:mn>0</mml:mn>
<mml:mn>1</mml:mn></mml:msubsup>
<mml:mo>=</mml:mo>
<mml:msubsup>
<mml:mi>v</mml:mi>
<mml:mn>0</mml:mn>
<mml:mn>2</mml:mn></mml:msubsup>
<mml:mo>=</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo>,</mml:mo>
<mml:msup>
<mml:mi>ℰ</mml:mi>
<mml:mn>1</mml:mn></mml:msup>
<mml:mo>,</mml:mo>
<mml:msup>
<mml:mi>ℰ</mml:mi>
<mml:mn>2</mml:mn></mml:msup>
<mml:mo>⊂</mml:mo>
<mml:mi>ℕ</mml:mi></mml:mrow></mml:semantics></mml:math></inline-formula>, where all operations are additions of the previous value and the current emission. The <italic>difference DAA of</italic> 
<inline-graphic xlink:href="algorithms-04-00285i2.gif"/><sup>1</sup> <italic>and</italic> 
<inline-graphic xlink:href="algorithms-04-00285i2.gif"/><sup>2</sup> is defined as
<disp-formula id="FD21">
<mml:math id="mm47" display="block">
<mml:semantics id="sm47">
<mml:mrow>
<mml:mtext mathvariant="italic">DiffDAA</mml:mtext>
<mml:mo stretchy="false">(</mml:mo>
<mml:msup>
<mml:mi mathvariant="script">D</mml:mi>
<mml:mn>1</mml:mn></mml:msup>
<mml:mo>,</mml:mo>
<mml:msup>
<mml:mi mathvariant="script">D</mml:mi>
<mml:mn>2</mml:mn></mml:msup>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>≔</mml:mo>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi mathvariant="script">Q</mml:mi>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>q</mml:mi>
<mml:mn>0</mml:mn></mml:msub>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="normal">Σ</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>δ</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi mathvariant="script">V</mml:mi>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>v</mml:mi>
<mml:mn>0</mml:mn></mml:msub>
<mml:mo>,</mml:mo>
<mml:mi>ℰ</mml:mi>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub>
<mml:mi>η</mml:mi>
<mml:mi>q</mml:mi></mml:msub>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mrow>
<mml:mi>q</mml:mi>
<mml:mo>∈</mml:mo>
<mml:mi mathvariant="script">Q</mml:mi></mml:mrow></mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub>
<mml:mi>θ</mml:mi>
<mml:mi>q</mml:mi></mml:msub>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mrow>
<mml:mi>q</mml:mi>
<mml:mo>∈</mml:mo>
<mml:mi mathvariant="script">Q</mml:mi></mml:mrow></mml:msub>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>,</mml:mo></mml:mrow></mml:semantics></mml:math></disp-formula>where
<list list-type="bullet">
<list-item>
<p>
<inline-graphic xlink:href="algorithms-04-00285i3.gif"/> ≔ 
<inline-graphic xlink:href="algorithms-04-00285i3.gif"/><sup>1</sup> × 
<inline-graphic xlink:href="algorithms-04-00285i3.gif"/><sup>2</sup> and 
<inline-formula>
<mml:math id="mm48" display="inline">
<mml:semantics id="sm48">
<mml:mrow>
<mml:msub>
<mml:mi>q</mml:mi>
<mml:mn>0</mml:mn></mml:msub>
<mml:mo>≔</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:msubsup>
<mml:mi>q</mml:mi>
<mml:mn>0</mml:mn>
<mml:mn>1</mml:mn></mml:msubsup>
<mml:mo>,</mml:mo>
<mml:msubsup>
<mml:mi>q</mml:mi>
<mml:mn>0</mml:mn>
<mml:mn>2</mml:mn></mml:msubsup></mml:mrow>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:semantics></mml:math></inline-formula>,</p></list-item>
<list-item>
<p>
<inline-graphic xlink:href="algorithms-04-00285i4.gif"/> ≔ ℤ and <italic>v</italic><sub>0</sub> ≔ 0,</p></list-item>
<list-item>
<p>ℰ ≔ ℰ<sup>1</sup> × ℰ<sup>2</sup> and 
<inline-formula>
<mml:math id="mm49" display="inline">
<mml:semantics id="sm49">
<mml:mrow>
<mml:msub>
<mml:mi>η</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:msup>
<mml:mi>q</mml:mi>
<mml:mn>1</mml:mn></mml:msup>
<mml:mo>,</mml:mo>
<mml:msup>
<mml:mi>q</mml:mi>
<mml:mn>2</mml:mn></mml:msup>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:msub>
<mml:mo>≔</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msubsup>
<mml:mi>η</mml:mi>
<mml:mrow>
<mml:msup>
<mml:mi>q</mml:mi>
<mml:mn>1</mml:mn></mml:msup></mml:mrow>
<mml:mn>1</mml:mn></mml:msubsup>
<mml:mo>,</mml:mo>
<mml:msubsup>
<mml:mi>η</mml:mi>
<mml:mrow>
<mml:msup>
<mml:mi>q</mml:mi>
<mml:mn>2</mml:mn></mml:msup></mml:mrow>
<mml:mn>2</mml:mn></mml:msubsup></mml:mrow>
<mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:semantics></mml:math></inline-formula>,</p></list-item>
<list-item>
<p><italic>δ</italic> : ((<italic>q</italic><sup>1</sup>, <italic>q</italic><sup>2</sup>), <italic>σ</italic>) ↦ (<italic>δ</italic><sup>1</sup>(<italic>q</italic><sup>1</sup>, <italic>σ</italic>), <italic>δ</italic><sup>2</sup>(<italic>q</italic><sup>2</sup>, <italic>σ</italic>)),</p></list-item>
<list-item>
<p><italic>θ<sub>q</sub></italic> : (<italic>v</italic>, (<italic>e</italic><sup>1</sup>, <italic>e</italic><sup>2</sup>)) ↦ <italic>v</italic> + <italic>e</italic><sup>1</sup> − <italic>e</italic><sup>2</sup>.</p></list-item></list></p></sec>
<sec>
<title>Lemma 4</title>
<p>Let 
<inline-graphic xlink:href="algorithms-04-00285i2.gif"/><sup>1</sup> and 
<inline-graphic xlink:href="algorithms-04-00285i2.gif"/><sup>2</sup> be DAAs meeting the criteria given in Definition 6 and 
<inline-graphic xlink:href="algorithms-04-00285i2.gif"/> ≔ <italic>DiffDAA</italic>(
<inline-graphic xlink:href="algorithms-04-00285i2.gif"/><sup>1</sup>, 
<inline-graphic xlink:href="algorithms-04-00285i2.gif"/><sup>2</sup>). Then, value<sub>
<inline-graphic xlink:href="algorithms-04-00285i5.gif"/></sub>(<italic>s</italic>) = value<sub>
<inline-graphic xlink:href="algorithms-04-00285i5.gif"/><sup>1</sup></sub>(<italic>s</italic>) − value<sub>
<inline-graphic xlink:href="algorithms-04-00285i5.gif"/><sup>2</sup></sub>(<italic>s</italic>) for all <italic>s</italic> ∈ Σ*.</p></sec>
<sec>
<title>Proof</title>
<p>Follows directly from Definition 6.</p>
<p>Lemma 4 can now be applied to the DAAs constructed for the analysis of two algorithms as described in Section 4. Since the above construction builds the product of both state spaces, it is advisable to minimize both DAAs before generating the product. Furthermore, in an implementation, only reachable states of the product automaton need to be constructed. Before being used to build a PAA (by applying Lemma 3), the product DAA should again be minimized.</p>
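<p>The product construction restricted to reachable states can be sketched as follows. This is a minimal illustration and not the MoSDi code: the class <monospace>SumDaa</monospace>, its fields, and the two toy automata are hypothetical names introduced here. We assume that both input DAAs start with value 0 and add the emission of each newly entered state to the running value, and we fold the emission pair (<italic>e</italic><sup>1</sup>, <italic>e</italic><sup>2</sup>) and the operation <italic>v</italic> + <italic>e</italic><sup>1</sup> − <italic>e</italic><sup>2</sup> into a single emission <italic>e</italic><sup>1</sup> − <italic>e</italic><sup>2</sup> per product state.</p>
<preformat><![CDATA[
```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy DAA whose value is the sum of the emissions of all states entered (an assumption). */
final class SumDaa {
    final int start;       // start state q0
    final int[][] delta;   // delta[q][c] = successor of state q on symbol index c
    final int[] emission;  // emission[q] = integer emitted when state q is entered

    SumDaa(int start, int[][] delta, int[] emission) {
        this.start = start;
        this.delta = delta;
        this.emission = emission;
    }

    /** Reads s (symbol indices) and returns the accumulated value, starting from 0. */
    int value(int[] s) {
        int q = start;
        int v = 0;
        for (int c : s) {
            q = delta[q][c];
            v += emission[q];
        }
        return v;
    }

    /** Builds the difference automaton over the reachable part of the product state space. */
    static SumDaa difference(SumDaa a, SumDaa b) {
        int sigma = a.delta[a.start].length;
        Map<Long, Integer> index = new HashMap<>();  // (q1, q2) -> product state number
        List<int[]> pairs = new ArrayList<>();       // product state number -> (q1, q2)
        List<int[]> rows = new ArrayList<>();        // transition rows of the product
        index.put(key(a.start, b.start), 0);
        pairs.add(new int[] {a.start, b.start});
        for (int i = 0; i < pairs.size(); i++) {     // the list grows while we scan it
            int[] pair = pairs.get(i);
            int[] row = new int[sigma];
            for (int c = 0; c < sigma; c++) {
                int q1 = a.delta[pair[0]][c];
                int q2 = b.delta[pair[1]][c];
                Integer target = index.get(key(q1, q2));
                if (target == null) {                // first visit: allocate a new state
                    target = pairs.size();
                    index.put(key(q1, q2), target);
                    pairs.add(new int[] {q1, q2});
                }
                row[c] = target;
            }
            rows.add(row);
        }
        int[] emission = new int[pairs.size()];
        for (int i = 0; i < emission.length; i++) {
            int[] pair = pairs.get(i);
            emission[i] = a.emission[pair[0]] - b.emission[pair[1]];
        }
        return new SumDaa(0, rows.toArray(new int[0][]), emission);
    }

    private static long key(int q1, int q2) {
        return ((long) q1 << 32) | (q2 & 0xffffffffL);
    }
}

final class DiffDaaDemo {
    /** Hypothetical toy DAA: counts occurrences of symbol 1 over the alphabet {0, 1}. */
    static SumDaa countOnes() {
        return new SumDaa(0, new int[][] {{0, 1}, {0, 1}}, new int[] {0, 1});
    }

    /** Hypothetical toy DAA: counts (overlapping) occurrences of the word 11. */
    static SumDaa countDoubleOnes() {
        return new SumDaa(0, new int[][] {{0, 1}, {0, 2}, {0, 2}}, new int[] {0, 0, 1});
    }

    public static void main(String[] args) {
        SumDaa d = SumDaa.difference(countOnes(), countDoubleOnes());
        System.out.println("reachable product states: " + d.delta.length);
        System.out.println("value(110111) = " + d.value(new int[] {1, 1, 0, 1, 1, 1}));
    }
}
```
]]></preformat>
<p>For the two toy automata, only three of the six state pairs are reachable, and the product evaluates the text 110111 to 5 − 3 = 2, in agreement with Lemma 4.</p>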
<p>As discussed in Section 5.2, at most <italic>m</italic>(<italic>n</italic> − <italic>m</italic> + 1) character accesses can result from scanning a text of length <italic>n</italic> for a pattern of length <italic>m</italic>. Thus, the difference of costs for two algorithms lies in the range {−<italic>m</italic>(<italic>n</italic> − <italic>m</italic> + 1), …, <italic>m</italic>(<italic>n</italic> − <italic>m</italic> + 1)} and, hence, <italic>ϑ<sub>n</sub></italic> ∈ <italic>O</italic>(<italic>n</italic> · <italic>m</italic>).</p></sec></sec>
<sec>
<label>7.</label>
<title>Case Studies</title>
<p>In Section 2, we considered three practically relevant algorithms, namely Horspool's algorithm, backward oracle matching (BOM), and backward (non)-deterministic DAWG matching (B(N)DM). Now, we compare the distributions of running time costs of these algorithms for several patterns over the DNA alphabet {
<monospace>A, C, G, T</monospace>}. <xref ref-type="fig" rid="f3-algorithms-04-00285">Figure 3</xref> shows these distributions for the patterns 
<monospace>ATATAT</monospace> and 
<monospace>ACGTAC</monospace> for text lengths 100 and 500 under a second order Markovian text model estimated from the human genome. For text length 500, the distributions for Horspool and B(N)DM resemble the shape of normal distributions. In fact, for Horspool's algorithm it has been proven that the distribution is asymptotically normal [<xref ref-type="bibr" rid="b12-algorithms-04-00285">12</xref>]. For smaller text lengths (e.g., 100, as shown in the left column of <xref ref-type="fig" rid="f3-algorithms-04-00285">Figure 3</xref>), the distributions are less smooth than for longer texts.</p>
<p>It is remarkable that for BOM, zero probabilities occur with a fixed period. The period equals <italic>m</italic> + 1, which is 7 in the examples shown. This behavior is caused by the factor-based nature of BOM: when a suffix of the search window has been recognized as not being a factor (substring) of the pattern, the window is moved just far enough to exclude this substring, which yields the relation <italic>shift<sup>p</sup></italic>(<italic>w</italic>) = <italic>m</italic> − <italic>ξ<sup>p</sup></italic>(<italic>w</italic>) + 1 between the cost and the shift of a window <italic>w</italic>. As the following lemma shows, this property is a sufficient condition for the observed zero probabilities.</p>
<sec>
<title>Lemma 5</title>
<p>Let a window-based pattern matching algorithm <italic>A</italic>, a pattern <italic>p</italic> with |<italic>p</italic>| = <italic>m</italic>, and the functions <italic>shift<sup>A,p</sup></italic> and <italic>ξ<sup>A,p</sup></italic> be given such that <italic>shift<sup>A,p</sup></italic>(<italic>w</italic>) = <italic>m</italic> − <italic>ξ<sup>A,p</sup></italic>(<italic>w</italic>) + 1 for all <italic>w</italic> ∈ Σ<italic><sup>m</sup></italic>. Then,
<disp-formula id="FD22">
<mml:math id="mm50" display="block">
<mml:semantics id="sm50">
<mml:mrow>
<mml:mtable columnalign="left">
<mml:mtr columnalign="left">
<mml:mtd columnalign="left">
<mml:mrow>
<mml:msup>
<mml:mi>ξ</mml:mi>
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>p</mml:mi></mml:mrow></mml:msup>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>s</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>+</mml:mo>
<mml:mi>n</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>≢</mml:mo>
<mml:mn>0</mml:mn></mml:mrow></mml:mtd>
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mo>mod</mml:mo>
<mml:mspace width="0.3em"/>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>m</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:semantics></mml:math></disp-formula>for all <italic>n</italic> ∈ ℕ and all <italic>s</italic> ∈ Σ<italic><sup>n</sup></italic>.</p>
<sec>
<title>Proof</title>
<p>Let <italic>w</italic><sub>1</sub>, …, <italic>w<sub>k</sub></italic> be the sequence of windows examined by algorithm <italic>A</italic> when processing the text <italic>s</italic> ∈ Σ<italic><sup>n</sup></italic>; their costs sum to <italic>ξ<sup>A,p</sup></italic>(<italic>s</italic>). In the beginning, the rightmost position of the current window is <italic>m</italic> − 1; it is moved by <italic>shift<sup>A,p</sup></italic>(<italic>w<sub>i</sub></italic>) after processing window <italic>w<sub>i</sub></italic>. After processing all windows, it is beyond the end of the text. Formally,
<disp-formula id="FD23">
<mml:math id="mm51" display="block">
<mml:semantics id="sm51">
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>≤</mml:mo>
<mml:mi>m</mml:mi>
<mml:mo>-</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>+</mml:mo>
<mml:munderover>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn></mml:mrow>
<mml:mi>k</mml:mi></mml:munderover>
<mml:mrow>
<mml:msup>
<mml:mtext mathvariant="italic">shift</mml:mtext>
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>p</mml:mi></mml:mrow></mml:msup>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub>
<mml:mi>w</mml:mi>
<mml:mi>i</mml:mi></mml:msub>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>&lt;</mml:mo>
<mml:mi>n</mml:mi>
<mml:mo>+</mml:mo>
<mml:mi>m</mml:mi></mml:mrow></mml:mrow></mml:semantics></mml:math></disp-formula></p>
<p>By using the assumption that <italic>shift<sup>A,p</sup></italic>(<italic>w</italic>) = <italic>m</italic> − <italic>ξ<sup>A,p</sup></italic>(<italic>w</italic>) + 1, we obtain
<disp-formula id="FD24">
<mml:math id="mm52" display="block">
<mml:semantics id="sm52">
<mml:mrow>
<mml:mtable columnalign="left">
<mml:mtr columnalign="left">
<mml:mtd columnalign="left"/>
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>≤</mml:mo>
<mml:mi>m</mml:mi>
<mml:mo>-</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>+</mml:mo>
<mml:munderover>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn></mml:mrow>
<mml:mi>k</mml:mi></mml:munderover>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>m</mml:mi>
<mml:mo>-</mml:mo>
<mml:msup>
<mml:mi>ξ</mml:mi>
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>p</mml:mi></mml:mrow></mml:msup>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub>
<mml:mi>w</mml:mi>
<mml:mi>i</mml:mi></mml:msub>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>&lt;</mml:mo>
<mml:mi>n</mml:mi>
<mml:mo>+</mml:mo>
<mml:mi>m</mml:mi></mml:mrow></mml:mrow></mml:mtd></mml:mtr>
<mml:mtr columnalign="left">
<mml:mtd columnalign="left">
<mml:mo>⇔</mml:mo></mml:mtd>
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo>≤</mml:mo>
<mml:mi>m</mml:mi>
<mml:mo>-</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>+</mml:mo>
<mml:mi>k</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>m</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>-</mml:mo>
<mml:msup>
<mml:mi>ξ</mml:mi>
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>p</mml:mi></mml:mrow></mml:msup>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>s</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>&lt;</mml:mo>
<mml:mi>n</mml:mi>
<mml:mo>+</mml:mo>
<mml:mi>m</mml:mi></mml:mrow></mml:mtd></mml:mtr>
<mml:mtr columnalign="left">
<mml:mtd columnalign="left">
<mml:mo>⇔</mml:mo></mml:mtd>
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mi>k</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>m</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>&lt;</mml:mo>
<mml:msup>
<mml:mi>ξ</mml:mi>
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>p</mml:mi></mml:mrow></mml:msup>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>s</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>+</mml:mo>
<mml:mi>n</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>≤</mml:mo>
<mml:mi>k</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>m</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>+</mml:mo>
<mml:mi>m</mml:mi></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:semantics></mml:math></disp-formula>which means that <italic>ξ<sup>A,p</sup></italic>(<italic>s</italic>) + <italic>n</italic> + 1 lies strictly between <italic>k</italic>(<italic>m</italic> + 1) and (<italic>k</italic> + 1)(<italic>m</italic> + 1) and thus cannot be a multiple of <italic>m</italic> + 1.</p>
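<p>Lemma 5 can also be checked by brute force on short texts. The following sketch does this for a simplified factor-based matcher written for this illustration. It is not BOM itself: instead of a factor oracle it uses an exact substring test, and the class and method names are hypothetical. The only property that matters here is that each window is shifted by <italic>m</italic> minus its cost plus 1, as the lemma assumes.</p>
<preformat><![CDATA[
```java
/** Brute-force check of Lemma 5 for a toy window-based matcher (illustration only). */
final class PeriodicZeros {

    /**
     * Total number of text character accesses of a simplified factor-based matcher.
     * Each window is read right to left until the suffix read so far is no longer a
     * substring of p (or the whole window has been read); the window is then shifted
     * by m - cost + 1, which is exactly the relation assumed in the lemma.
     */
    static int totalCost(String p, String s) {
        int m = p.length();
        int n = s.length();
        int total = 0;
        int end = m - 1;                       // rightmost position of the current window
        while (end < n) {
            int cost = 0;
            while (cost < m) {
                cost++;                        // access one more character, right to left
                if (!p.contains(s.substring(end - cost + 1, end + 1))) {
                    break;                     // suffix is not a factor of the pattern
                }
            }
            total += cost;
            end += m - cost + 1;               // shift = m - cost + 1
        }
        return total;
    }

    /** Checks that (cost + n + 1) mod (m + 1) is non-zero for all texts of length 0..maxN. */
    static boolean lemmaHolds(String p, String alphabet, int maxN) {
        int m = p.length();
        int sigma = alphabet.length();
        for (int n = 0; n <= maxN; n++) {
            long count = 1;
            for (int i = 0; i < n; i++) {
                count *= sigma;
            }
            for (long code = 0; code < count; code++) {
                StringBuilder s = new StringBuilder();
                long rest = code;              // decode the text from its base-sigma number
                for (int i = 0; i < n; i++) {
                    s.append(alphabet.charAt((int) (rest % sigma)));
                    rest /= sigma;
                }
                if ((totalCost(p, s.toString()) + n + 1) % (m + 1) == 0) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(lemmaHolds("ACG", "ACGT", 8));
        System.out.println(lemmaHolds("ATATAT", "ACGT", 7));
    }
}
```
]]></preformat>
<p>Because the congruence holds for every single text, any cost value <italic>c</italic> with <italic>c</italic> + <italic>n</italic> + 1 divisible by <italic>m</italic> + 1 has probability zero under every text model.</p>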
<p>The probability that one pattern matching algorithm is faster than another depends on the pattern. Using the technique introduced in Section 6, we can quantify the strength of this effect. <xref ref-type="fig" rid="f4-algorithms-04-00285">Figure 4</xref> shows distributions of cost <italic>differences</italic> for different patterns and algorithms. Thus, the probability that the first algorithm is faster is the area under the curve to the left of zero. For the pattern 
<monospace>ACCCCC</monospace> and a random text of length 100, for example, there is a 53.9% probability that Horspool's algorithm needs fewer character accesses than B(N)DM (for the same second order Markovian model as used before), while for 
<monospace>ACGTAC</monospace>, the probability is only 0.0016%.</p>
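<p>Since the cost difference is integer-valued, the "area left of zero" is simply a sum of point probabilities. The following sketch shows the two quantities used in this section, the probability of being strictly faster and the probability of being at least as fast. The class name and the toy distribution are hypothetical and are not data from Figure 4.</p>
<preformat><![CDATA[
```java
import java.util.Map;
import java.util.TreeMap;

/** Reads comparison probabilities off a distribution of cost differences (illustration). */
final class CostDifference {

    /** P(X < 0): the first algorithm needs strictly fewer character accesses. */
    static double probStrictlyFaster(Map<Integer, Double> dist) {
        double sum = 0.0;
        for (Map.Entry<Integer, Double> e : dist.entrySet()) {
            if (e.getKey() < 0) {
                sum += e.getValue();
            }
        }
        return sum;
    }

    /** P(X <= 0): the first algorithm is at least as fast as the second. */
    static double probAtLeastAsFast(Map<Integer, Double> dist) {
        double sum = 0.0;
        for (Map.Entry<Integer, Double> e : dist.entrySet()) {
            if (e.getKey() <= 0) {
                sum += e.getValue();
            }
        }
        return sum;
    }

    /** Hypothetical toy distribution of cost differences; the probabilities sum to 1. */
    static Map<Integer, Double> toyDistribution() {
        Map<Integer, Double> dist = new TreeMap<>();
        dist.put(-2, 0.125);
        dist.put(-1, 0.25);
        dist.put(0, 0.25);
        dist.put(1, 0.25);
        dist.put(3, 0.125);
        return dist;
    }

    public static void main(String[] args) {
        Map<Integer, Double> dist = toyDistribution();
        System.out.println("P(X < 0)  = " + probStrictlyFaster(dist));
        System.out.println("P(X <= 0) = " + probAtLeastAsFast(dist));
    }
}
```
]]></preformat>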
<p>It is worth noting, and perhaps surprising, that there is a non-zero probability of BOM being faster than B(N)DM, although <italic>shift</italic><sup>B(N)DM,<italic>p</italic></sup>(<italic>w</italic>) ≥ <italic>shift</italic><sup>BOM,<italic>p</italic></sup>(<italic>w</italic>) for all window contents <italic>w</italic>. The explanation is that a shorter shift for BOM (say, the first one) leads to a different content of the next window than for B(N)DM, and this window may have a larger shift value. This effect depends on the pattern: for the pattern 
<monospace>ACCCCC</monospace>, there is a 52.4% probability that BOM is at least as fast as B(N)DM (in terms of character accesses), while it is 4.9% for 
<monospace>ACGTAC</monospace>, again on texts of length 100. An example text where BOM is faster than B(N)DM while searching for the pattern 
<monospace>ACCCCC</monospace> is shown in <xref ref-type="fig" rid="f5-algorithms-04-00285">Figure 5</xref>. Both algorithms read two characters of the first window but prescribe different shifts. The first window ends on 
<monospace>TC</monospace>. BOM recognizes that 
<monospace>TC</monospace> is not a substring of the pattern and shifts the window by five positions, just far enough to exclude this substring. In contrast, B(N)DM determines that neither 
<monospace>C</monospace> nor 
<monospace>TC</monospace> is a prefix of the pattern and shifts the window by six positions. It turns out, however, that the shorter shift of five positions is beneficial in this case, as BOM can process the second window with one character access, while B(N)DM uses two character accesses.</p>
<p>To assess the effect of DAA minimization before constructing PAAs, we constructed minimized DAAs for all 21840 patterns of lengths 2 to 7 over Σ = {
<monospace>A, C, G, T</monospace>}. The minimum, average, and maximum state counts are shown in <xref ref-type="table" rid="t1-algorithms-04-00285">Table 1</xref>. For length 6, <xref ref-type="fig" rid="f6-algorithms-04-00285">Figure 6</xref> contains a detailed histogram. These statistics show that construction and minimization as given in this article lead to smaller automata (and thus better runtimes) than the constructions given in the conference version of this article [<xref ref-type="bibr" rid="b18-algorithms-04-00285">18</xref>]. It may be conjectured that the worst-case size of the minimal state space grows only polynomially with <italic>m</italic> for all of these algorithms, as has been previously proven for the Horspool algorithm [<xref ref-type="bibr" rid="b13-algorithms-04-00285">13</xref>].</p>
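<p>The pattern count and the unminimized state counts of Table 1 follow from simple arithmetic, which the following sketch reproduces. The class and method names are hypothetical; the formula |Σ|<italic><sup>m</sup></italic> · (<italic>m</italic> + 1) is the one given in the header of Table 1.</p>
<preformat><![CDATA[
```java
/** Arithmetic behind the pattern count and the unminimized DAA sizes (illustration). */
final class StateCounts {

    /** base^exponent for small non-negative exponents. */
    static long power(int base, int exponent) {
        long result = 1;
        for (int i = 0; i < exponent; i++) {
            result *= base;
        }
        return result;
    }

    /** Number of patterns with lengths minLength..maxLength over an alphabet of size sigma. */
    static long patternCount(int sigma, int minLength, int maxLength) {
        long total = 0;
        for (int m = minLength; m <= maxLength; m++) {
            total += power(sigma, m);
        }
        return total;
    }

    /** Size of the unminimized state space: sigma^m * (m + 1). */
    static long unminimizedStates(int sigma, int m) {
        return power(sigma, m) * (m + 1);
    }

    public static void main(String[] args) {
        System.out.println("patterns of lengths 2..7: " + patternCount(4, 2, 7));
        for (int m = 2; m <= 7; m++) {
            System.out.println("m = " + m + ": " + unminimizedStates(4, m) + " states");
        }
    }
}
```
]]></preformat>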
<p>The algorithms were implemented in Java and are part of the MoSDi software package, available at <ext-link xlink:href="http://mosdi.googlecode.com" ext-link-type="uri">http://mosdi.googlecode.com</ext-link>. They were run on an Intel Core 2 Duo CPU at 2.1 GHz. Computing the distributions shown in <xref ref-type="fig" rid="f3-algorithms-04-00285">Figure 3</xref> took 0.5 to 1.3 seconds for each distribution. Distributions of differences as in <xref ref-type="fig" rid="f4-algorithms-04-00285">Figure 4</xref> were computed in 56 to 97 seconds.</p></sec></sec></sec>
<sec sec-type="discussion">
<label>8.</label>
<title>Discussion</title>
<p>Using PAAs, we have shown how the exact distribution of the number of character accesses for window-based pattern matching algorithms can be computed algorithmically. The framework admits general finite-memory text models, including i.i.d. models, Markov models of arbitrary order, and character-emitting hidden Markov models. The given construction results in an asymptotic runtime of <italic>O</italic>(<italic>n</italic><sup>2</sup> · <italic>m</italic> · |
<inline-graphic xlink:href="algorithms-04-00285i3.gif"/><sup>
<inline-graphic xlink:href="algorithms-04-00285i5.gif"/></sup>| · |
<inline-graphic xlink:href="algorithms-04-00285i1.gif"/>|<sup>2</sup> · |Σ|). The number of DAA states |
<inline-graphic xlink:href="algorithms-04-00285i3.gif"/><sup>
<inline-graphic xlink:href="algorithms-04-00285i5.gif"/></sup>| can be as large as <italic>O</italic>(<italic>m</italic> · |Σ|<italic><sup>m</sup></italic>), but we conjecture that for each reasonable algorithm, the necessary minimal state set 
<inline-formula>
<mml:math id="mm53" display="inline">
<mml:semantics id="sm53">
<mml:mrow>
<mml:msubsup>
<mml:mi mathvariant="script">Q</mml:mi>
<mml:mrow>
<mml:mo>min</mml:mo></mml:mrow>
<mml:mi mathvariant="script">D</mml:mi></mml:msubsup></mml:mrow></mml:semantics></mml:math></inline-formula> grows only polynomially with <italic>m</italic>. In particular, we conjecture <italic>O</italic>(<italic>m</italic><sup>3</sup>) sizes for B(N)DM and BOM; this is consistent with the numbers in <xref ref-type="table" rid="t1-algorithms-04-00285">Table 1</xref>. For Horspool, a specialized <italic>O</italic>(<italic>m</italic><sup>2</sup>) construction is known [<xref ref-type="bibr" rid="b13-algorithms-04-00285">13</xref>]. Otherwise, in practice, the DAA size can be reduced by DAA minimization, but it remains open whether there exists a general algorithm that constructs the minimal automaton directly, <italic>i.e.</italic>, using only 
<inline-formula>
<mml:math id="mm54" display="inline">
<mml:semantics id="sm54">
<mml:mrow>
<mml:mi>O</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">|</mml:mo>
<mml:mrow>
<mml:msubsup>
<mml:mi mathvariant="script">Q</mml:mi>
<mml:mrow>
<mml:mo>min</mml:mo></mml:mrow>
<mml:mi mathvariant="script">D</mml:mi></mml:msubsup></mml:mrow>
<mml:mo stretchy="false">|</mml:mo></mml:mrow></mml:mrow>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:semantics></mml:math></inline-formula> time. A proof that this is the case for a broad class of pattern matching algorithms would be an important insight into the nature of these algorithms and therefore certainly warrants further research.</p>
<p>The behavior of BOM deserves further attention: first, periodic zero probabilities are found in its distribution of text character accesses; and second, it may (unexpectedly) need fewer text accesses than B(N)DM on some patterns, although BOM's shift values are never better than B(N)DM's.</p>
<p>We focused on algorithms for single patterns, but the presented techniques also apply to algorithms that search for multiple patterns, such as the Wu-Manber algorithm [<xref ref-type="bibr" rid="b25-algorithms-04-00285">25</xref>] or “set backward oracle matching” and “multiple BNDM”, as described in [<xref ref-type="bibr" rid="b8-algorithms-04-00285">8</xref>]. A comparison of the resulting distributions could yield new insights into these algorithms as well.</p>
<p>Metrics other than text character accesses might be of interest and can easily be substituted; for example, the number of windows can be counted by defining <italic>ξ<sup>p</sup></italic>(<italic>w</italic>) = 1 for all <italic>w</italic> ∈ Σ<italic><sup>m</sup></italic>.</p>
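<p>Substituting the metric amounts to changing the per-window cost. The following sketch is a simplified Horspool scan written for this illustration (not the MoSDi code) that takes the per-window cost as a function of the number of comparisons made in the window. The class and parameter names are hypothetical, and we assume that only comparisons between pattern and text count as character accesses, i.e., the shift-table lookup is not counted separately.</p>
<preformat><![CDATA[
```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntUnaryOperator;

/** Horspool scan with a pluggable per-window cost (illustration only). */
final class HorspoolCost {

    /**
     * Scans s for p and sums windowCost(c) over all windows, where c is the number of
     * right-to-left comparisons made in the window (stopping after the first mismatch).
     */
    static int totalCost(String p, String s, IntUnaryOperator windowCost) {
        int m = p.length();
        int n = s.length();
        if (m == 0) {
            throw new IllegalArgumentException("pattern must not be empty");
        }
        // Horspool shift table: distance from the last occurrence of a character in
        // p[0..m-2] to the end of the pattern; characters not occurring there shift by m.
        Map<Character, Integer> shift = new HashMap<>();
        for (int i = 0; i < m - 1; i++) {
            shift.put(p.charAt(i), m - 1 - i);
        }
        int total = 0;
        int end = m - 1;                          // rightmost position of the window
        while (end < n) {
            int compared = 0;
            while (compared < m) {
                compared++;
                if (s.charAt(end - compared + 1) != p.charAt(m - compared)) {
                    break;                        // mismatch ends the window
                }
            }
            total += windowCost.applyAsInt(compared);
            end += shift.getOrDefault(s.charAt(end), m);
        }
        return total;
    }

    public static void main(String[] args) {
        String p = "ACGA";
        String s = "CGACATACGA";
        System.out.println("character accesses: " + totalCost(p, s, c -> c));
        System.out.println("windows:            " + totalCost(p, s, c -> 1));
    }
}
```
]]></preformat>
<p>For the text CGACATACGA and the pattern ACGA, this sketch examines three windows with 1, 1, and 4 comparisons, giving a cost of 6 under the character-access metric and 3 under the window-count metric.</p>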
<p>The given constructions allow us to analyze an algorithm's performance for each pattern individually. While this is desirable for detailed analysis, the cost distribution resulting from randomly choosing text <italic>and</italic> pattern would also be of interest.</p>
<p>The results of this paper were obtained while Tobias Marschall was a PhD student with Sven Rahmann at TU Dortmund. The thesis is available at <ext-link xlink:href="http://hdl.handle.net/2003/27760" ext-link-type="uri">http://hdl.handle.net/2003/27760</ext-link>.</p></sec></body>
<back>
<sec sec-type="display-objects">
<title>Figures and Table</title>
<fig id="f1-algorithms-04-00285" position="float">
<label>Figure 1.</label>
<caption>
<p>Factor Oracle for <italic>x</italic> = 
<monospace>CACCACCCT</monospace>, corresponding to pattern <italic>p</italic> = 
<monospace>TCCCACCAC</monospace>. Omitted edges lead into the omitted FAIL state. The string 
<monospace>ACCT</monospace> is recognized (states 1,2,3,8), although it is not a substring of <italic>x</italic>.</p></caption>
<graphic xlink:href="algorithms-04-00285f1.gif"/></fig>
<fig id="f2-algorithms-04-00285" position="float">
<label>Figure 2.</label>
<caption>
<p>Illustration of the DAA encoding the behavior of Horspool's algorithm when searching the text <italic>s</italic> = <monospace>CGACATACGA</monospace> for the pattern <italic>p</italic> = 
<monospace>ACGA</monospace>. The top row shows the state the DAA is in after reading the character below it. The leftmost state is the start state. At the bottom, the windows considered by Horspool's algorithm are indicated, illustrating that the second component of the current state encodes the distance to the right end of the next window.</p></caption>
<graphic xlink:href="algorithms-04-00285f2.gif"/></fig>
<fig id="f3-algorithms-04-00285" position="float">
<label>Figure 3.</label>
<caption>
<p>Exact distributions of character access counts for patterns 
<monospace>ATATAT</monospace> (top) and 
<monospace>ACGTAC</monospace> (bottom) for text length 100 (left) and text length 500 (right). A second order Markovian text model estimated from the human genome is used.</p></caption>
<graphic xlink:href="algorithms-04-00285f3.gif"/></fig>
<fig id="f4-algorithms-04-00285" position="float">
<label>Figure 4.</label>
<caption>
<p>Exact distributions of differences in character access counts for different patterns using a second order Markovian text model estimated from the human genome and random texts of length 100.</p></caption>
<graphic xlink:href="algorithms-04-00285f4.gif"/></fig>
<fig id="f5-algorithms-04-00285" position="float">
<label>Figure 5.</label>
<caption>
<p>Example of a string for which BOM executes fewer character accesses than B(N)DM when searching for the pattern <italic>p</italic> = 
<monospace>ACCCCC</monospace>. The searched windows are indicated below the text; the nearby number gives the number of character accesses executed when processing this window.</p></caption>
<graphic xlink:href="algorithms-04-00285f5.gif"/></fig>
<fig id="f6-algorithms-04-00285" position="float">
<label>Figure 6.</label>
<caption>
<p>Histogram of the number of states of minimal DAAs over all patterns of length 6 over Σ = {
<monospace>A, C, G, T</monospace>}.</p></caption>
<graphic xlink:href="algorithms-04-00285f6.gif"/></fig>
<table-wrap id="t1-algorithms-04-00285" position="float">
<label>Table 1.</label>
<caption>
<p>Comparison of DAA sizes for all patterns of length <italic>m</italic> over Σ = {
<monospace>A, C, G, T</monospace>}.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="center" valign="middle" rowspan="3"><bold><italic>m</italic></bold></th>
<th align="center" valign="top"><bold>States unminimized</bold></th>
<th colspan="3" align="center" valign="top"><bold>States minimized (min./avg./max.)</bold></th></tr>
<tr>
<th valign="bottom" colspan="4">
<hr/></th></tr>
<tr>
<th align="center" valign="top"><bold>|Σ|<italic><sup>m</sup></italic> · (<italic>m</italic> + 1)</bold></th>
<th align="center" valign="top"><bold>Horspool</bold></th>
<th align="center" valign="top"><bold>BOM</bold></th>
<th align="center" valign="top"><bold>B(N)DM</bold></th></tr></thead>
<tbody>
<tr>
<td align="center" valign="top">2</td>
<td align="center" valign="top">48</td>
<td align="center" valign="top">4 / 4.8 / 5</td>
<td align="center" valign="top">4 / 4.0 / 4</td>
<td align="center" valign="top">4 / 4.8 / 5</td></tr>
<tr>
<td align="center" valign="top">3</td>
<td align="center" valign="top">256</td>
<td align="center" valign="top">7 / 8.3 / 9</td>
<td align="center" valign="top">7 / 8.3 / 9</td>
<td align="center" valign="top">7 / 9.6 / 10</td></tr>
<tr>
<td align="center" valign="top">4</td>
<td align="center" valign="top">1280</td>
<td align="center" valign="top">11 / 14.3 / 15</td>
<td align="center" valign="top">11 / 15.6 / 18</td>
<td align="center" valign="top">11 / 17.0 / 19</td></tr>
<tr>
<td align="center" valign="top">5</td>
<td align="center" valign="top">6144</td>
<td align="center" valign="top">16 / 23.6 / 25</td>
<td align="center" valign="top">16 / 26.5 / 30</td>
<td align="center" valign="top">16 / 27.9 / 31</td></tr>
<tr>
<td align="center" valign="top">6</td>
<td align="center" valign="top">28672</td>
<td align="center" valign="top">22 / 37.0 / 39</td>
<td align="center" valign="top">22 / 41.8 / 47</td>
<td align="center" valign="top">22 / 42.8 / 48</td></tr>
<tr>
<td align="center" valign="top">7</td>
<td align="center" valign="top">131072</td>
<td align="center" valign="top">29 / 55.2 / 58</td>
<td align="center" valign="top">29 / 62.4 / 70</td>
<td align="center" valign="top">29 / 62.6 / 70</td></tr></tbody></table></table-wrap></sec>
<ack>
<p>Most work was carried out while both authors were affiliated with TU Dortmund.</p></ack>
<ref-list>
<title>References</title>
<ref id="b1-algorithms-04-00285"><label>1.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Marschall</surname><given-names>T.</given-names></name><name><surname>Rahmann</surname><given-names>S.</given-names></name></person-group><article-title>Probabilistic Arithmetic Automata and Their Application to Pattern Matching Statistics</article-title><conf-name>Proceedings of the 19th Annual Symposium on Combinatorial Pattern Matching, CPM '08</conf-name><conf-loc>Pisa, Italy</conf-loc><conf-date>18–20 June 2008</conf-date><person-group person-group-type="editor"><name><surname>Ferragina</surname><given-names>P.</given-names></name><name><surname>Landau</surname><given-names>G.M.</given-names></name></person-group><publisher-name>Springer</publisher-name><publisher-loc>Berlin, Germany</publisher-loc><year>2008</year><comment>Volume 5029</comment><fpage>95</fpage><lpage>106</lpage></citation></ref>
<ref id="b2-algorithms-04-00285"><label>2.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Knuth</surname><given-names>D.E.</given-names></name><name><surname>Morris</surname><given-names>J.</given-names></name><name><surname>Pratt</surname><given-names>V.R.</given-names></name></person-group><article-title>Fast pattern matching in strings</article-title><source>SIAM J. Comput.</source><year>1977</year><volume>6</volume><fpage>323</fpage><lpage>350</lpage><pub-id pub-id-type="doi">10.1137/0206024</pub-id></citation></ref>
<ref id="b3-algorithms-04-00285"><label>3.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Boyer</surname><given-names>R.S.</given-names></name><name><surname>Moore</surname><given-names>J.S.</given-names></name></person-group><article-title>A fast string searching algorithm</article-title><source>Commun. ACM</source><year>1977</year><volume>20</volume><fpage>762</fpage><lpage>772</lpage><pub-id pub-id-type="doi">10.1145/359842.359859</pub-id></citation></ref>
<ref id="b4-algorithms-04-00285"><label>4.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Horspool</surname><given-names>R.N.</given-names></name></person-group><article-title>Practical fast searching in strings</article-title><source>Softw.-Pract. Exp.</source><year>1980</year><volume>10</volume><fpage>501</fpage><lpage>506</lpage><pub-id pub-id-type="doi">10.1002/spe.4380100608</pub-id></citation></ref>
<ref id="b5-algorithms-04-00285"><label>5.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sunday</surname><given-names>D.M.</given-names></name></person-group><article-title>A very fast substring search algorithm</article-title><source>Commun. ACM</source><year>1990</year><volume>33</volume><fpage>132</fpage><lpage>142</lpage><pub-id pub-id-type="doi">10.1145/79173.79184</pub-id></citation></ref>
<ref id="b6-algorithms-04-00285"><label>6.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Crochemore</surname><given-names>M.</given-names></name><name><surname>Czumaj</surname><given-names>A.</given-names></name><name><surname>Gasieniec</surname><given-names>L.</given-names></name><name><surname>Jarominek</surname><given-names>S.</given-names></name><name><surname>Lecroq</surname><given-names>T.</given-names></name><name><surname>Plandowski</surname><given-names>W.</given-names></name><name><surname>Rytter</surname><given-names>W.</given-names></name></person-group><article-title>Speeding up two string-matching algorithms</article-title><source>Algorithmica</source><year>1994</year><volume>12</volume><fpage>247</fpage><lpage>267</lpage><pub-id pub-id-type="doi">10.1007/BF01185427</pub-id></citation></ref>
<ref id="b7-algorithms-04-00285"><label>7.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Allauzen</surname><given-names>C.</given-names></name><name><surname>Crochemore</surname><given-names>M.</given-names></name><name><surname>Raffinot</surname><given-names>M.</given-names></name></person-group><article-title>Efficient Experimental String Matching by Weak Factor Recognition</article-title><conf-name>Proceedings of the 12th Annual Symposium on Combinatorial Pattern Matching, CPM '01</conf-name><conf-loc>Jerusalem, Israel</conf-loc><conf-date>1–4 July 2001</conf-date><person-group person-group-type="editor"><name><surname>Goos</surname><given-names>G.</given-names></name><name><surname>Hartmanis</surname><given-names>J.</given-names></name><name><surname>van Leeuwen</surname><given-names>J.</given-names></name></person-group><comment>Volume 2089</comment><fpage>51</fpage><lpage>72</lpage></citation></ref>
<ref id="b8-algorithms-04-00285"><label>8.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Navarro</surname><given-names>G.</given-names></name><name><surname>Raffinot</surname><given-names>M.</given-names></name></person-group><source>Flexible Pattern Matching in Strings</source><publisher-name>Cambridge University Press</publisher-name><publisher-loc>Cambridge, UK</publisher-loc><year>2002</year></citation></ref>
<ref id="b9-algorithms-04-00285"><label>9.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Baeza-Yates</surname><given-names>R.A.</given-names></name><name><surname>Gonnet</surname><given-names>G.H.</given-names></name><name><surname>Régnier</surname><given-names>M.</given-names></name></person-group><article-title>Analysis of Boyer-Moore-Type String Searching Algorithms</article-title><conf-name>Proceedings of the 1st Annual ACM-SIAM Symposium on Discrete Algorithms, SODA '90</conf-name><conf-loc>San Francisco, CA, USA</conf-loc><conf-date>22–24 January 1990</conf-date><fpage>328</fpage><lpage>343</lpage></citation></ref>
<ref id="b10-algorithms-04-00285"><label>10.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Baeza-Yates</surname><given-names>R.A.</given-names></name><name><surname>Régnier</surname><given-names>M.</given-names></name></person-group><article-title>Average running time of the Boyer-Moore-Horspool algorithm</article-title><source>Theor. Comput. Sci.</source><year>1992</year><volume>92</volume><fpage>19</fpage><lpage>31</lpage><pub-id pub-id-type="doi">10.1016/0304-3975(92)90133-Z</pub-id></citation></ref>
<ref id="b11-algorithms-04-00285"><label>11.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mahmoud</surname><given-names>H.M.</given-names></name><name><surname>Smythe</surname><given-names>R.T.</given-names></name><name><surname>Régnier</surname><given-names>M.</given-names></name></person-group><article-title>Analysis of Boyer-Moore-Horspool string-matching heuristic</article-title><source>Random Struct. Algorithms</source><year>1997</year><volume>10</volume><fpage>169</fpage><lpage>186</lpage></citation></ref>
<ref id="b12-algorithms-04-00285"><label>12.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Smythe</surname><given-names>R.T.</given-names></name></person-group><article-title>The Boyer-Moore-Horspool heuristic with Markovian input</article-title><source>Random Struct. Algorithms</source><year>2001</year><volume>18</volume><fpage>153</fpage><lpage>163</lpage><pub-id pub-id-type="doi">10.1002/1098-2418(200103)18:2&lt;153::AID-RSA1003&gt;3.0.CO;2-O</pub-id></citation></ref>
<ref id="b13-algorithms-04-00285"><label>13.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tsai</surname><given-names>T.</given-names></name></person-group><article-title>Average case analysis of the Boyer-Moore algorithm</article-title><source>Random Struct. Algorithms</source><year>2006</year><volume>28</volume><fpage>481</fpage><lpage>498</lpage><pub-id pub-id-type="doi">10.1002/rsa.20111</pub-id></citation></ref>
<ref id="b14-algorithms-04-00285"><label>14.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nicodème</surname><given-names>P.</given-names></name><name><surname>Salvy</surname><given-names>B.</given-names></name><name><surname>Flajolet</surname><given-names>P.</given-names></name></person-group><article-title>Motif statistics</article-title><source>Theor. Comput. Sci.</source><year>2002</year><volume>287</volume><fpage>593</fpage><lpage>617</lpage><pub-id pub-id-type="doi">10.1016/S0304-3975(01)00264-X</pub-id></citation></ref>
<ref id="b15-algorithms-04-00285"><label>15.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nicodème</surname><given-names>P.</given-names></name></person-group><article-title>Regexpcount, a symbolic package for counting problems on regular expressions and words</article-title><source>Fundam. Inform.</source><year>2002</year><volume>56</volume><fpage>71</fpage><lpage>88</lpage></citation></ref>
<ref id="b16-algorithms-04-00285"><label>16.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nuel</surname><given-names>G.</given-names></name></person-group><article-title>Pattern Markov chains: Optimal Markov chain embedding through deterministic finite automata</article-title><source>J. Appl. Probab.</source><year>2008</year><volume>45</volume><fpage>226</fpage><lpage>243</lpage><pub-id pub-id-type="doi">10.1239/jap/1208358964</pub-id></citation></ref>
<ref id="b17-algorithms-04-00285"><label>17.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lladser</surname><given-names>M.</given-names></name><name><surname>Betterton</surname><given-names>M.D.</given-names></name><name><surname>Knight</surname><given-names>R.</given-names></name></person-group><article-title>Multiple pattern matching: A Markov chain approach</article-title><source>J. Math. Biol.</source><year>2008</year><volume>56</volume><fpage>51</fpage><lpage>92</lpage><pub-id pub-id-type="pmid">17668213</pub-id></citation></ref>
<ref id="b18-algorithms-04-00285"><label>18.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Marschall</surname><given-names>T.</given-names></name><name><surname>Rahmann</surname><given-names>S.</given-names></name></person-group><article-title>Exact Analysis of Horspool's and Sunday's Pattern Matching Algorithms with Probabilistic Arithmetic Automata</article-title><conf-name>Proceedings of the 4th International Conference on Language and Automata Theory and Applications, LATA '10</conf-name><conf-loc>Trier, Germany</conf-loc><conf-date>24–28 May 2010</conf-date><person-group person-group-type="editor"><name><surname>Dediu</surname><given-names>A.H.</given-names></name><name><surname>Fernau</surname><given-names>H.</given-names></name><name><surname>Martín-Vide</surname><given-names>C.</given-names></name></person-group><publisher-name>Springer</publisher-name><publisher-loc>Berlin, Germany</publisher-loc><year>2010</year><comment>Volume 6031</comment><fpage>439</fpage><lpage>450</lpage></citation></ref>
<ref id="b19-algorithms-04-00285"><label>19.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Marschall</surname><given-names>T.</given-names></name><name><surname>Rahmann</surname><given-names>S.</given-names></name></person-group><article-title>Efficient exact motif discovery</article-title><source>Bioinformatics</source><year>2009</year><volume>25</volume><fpage>i356</fpage><lpage>i364</lpage><pub-id pub-id-type="doi">10.1093/bioinformatics/btp188</pub-id><pub-id pub-id-type="pmid">19478010</pub-id></citation></ref>
<ref id="b20-algorithms-04-00285"><label>20.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Herms</surname><given-names>I.</given-names></name><name><surname>Rahmann</surname><given-names>S.</given-names></name></person-group><article-title>Computing Alignment Seed Sensitivity with Probabilistic Arithmetic Automata</article-title><conf-name>Proceedings of the 8th International Workshop Algorithms in Bioinformatics, WABI '08</conf-name><conf-loc>Karlsruhe, Germany</conf-loc><conf-date>15–19 September 2008</conf-date><person-group person-group-type="editor"><name><surname>Crandall</surname><given-names>K.A.</given-names></name><name><surname>Lagergren</surname><given-names>J.</given-names></name></person-group><publisher-name>Springer</publisher-name><publisher-loc>Berlin, Germany</publisher-loc><year>2008</year><comment>Volume 5251</comment><fpage>318</fpage><lpage>329</lpage></citation></ref>
<ref id="b21-algorithms-04-00285"><label>21.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Hopcroft</surname><given-names>J.</given-names></name></person-group><article-title>An <italic>n</italic> log <italic>n</italic> Algorithm for Minimizing the States in a Finite Automaton</article-title><source>The Theory of Machines and Computations</source><person-group person-group-type="editor"><name><surname>Kohavi</surname><given-names>Z.</given-names></name><name><surname>Paz</surname><given-names>A.</given-names></name></person-group><publisher-name>Academic Press</publisher-name><publisher-loc>New York, NY, USA</publisher-loc><year>1971</year><fpage>189</fpage><lpage>196</lpage></citation></ref>
<ref id="b22-algorithms-04-00285"><label>22.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Knuutila</surname><given-names>T.</given-names></name></person-group><article-title>Re-describing an algorithm by Hopcroft</article-title><source>Theor. Comput. Sci.</source><year>2001</year><volume>250</volume><fpage>333</fpage><lpage>363</lpage><pub-id pub-id-type="doi">10.1016/S0304-3975(99)00150-4</pub-id></citation></ref>
<ref id="b23-algorithms-04-00285"><label>23.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kucherov</surname><given-names>G.</given-names></name><name><surname>Noé</surname><given-names>L.</given-names></name><name><surname>Roytberg</surname><given-names>M.</given-names></name></person-group><article-title>A unifying framework for seed sensitivity and its application to subset seeds</article-title><source>J. Bioinform. Comput. Biol.</source><year>2006</year><volume>4</volume><fpage>553</fpage><lpage>569</lpage><pub-id pub-id-type="doi">10.1142/S0219720006001977</pub-id><pub-id pub-id-type="pmid">16819802</pub-id></citation></ref>
<ref id="b24-algorithms-04-00285"><label>24.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Schulz</surname><given-names>M.</given-names></name><name><surname>Weese</surname><given-names>D.</given-names></name><name><surname>Rausch</surname><given-names>T.</given-names></name><name><surname>Döring</surname><given-names>A.</given-names></name><name><surname>Reinert</surname><given-names>K.</given-names></name><name><surname>Vingron</surname><given-names>M.</given-names></name></person-group><article-title>Fast and Adaptive Variable Order Markov Chain Construction</article-title><conf-name>Proceedings of the 8th International Workshop Algorithms in Bioinformatics, WABI '08</conf-name><conf-loc>Karlsruhe, Germany</conf-loc><conf-date>15–19 September 2008</conf-date><person-group person-group-type="editor"><name><surname>Crandall</surname><given-names>K.A.</given-names></name><name><surname>Lagergren</surname><given-names>J.</given-names></name></person-group><publisher-name>Springer</publisher-name><publisher-loc>Berlin, Germany</publisher-loc><year>2008</year><comment>Volume 5251</comment><fpage>306</fpage><lpage>317</lpage></citation></ref>
<ref id="b25-algorithms-04-00285"><label>25.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Wu</surname><given-names>S.</given-names></name><name><surname>Manber</surname><given-names>U.</given-names></name></person-group><source>A Fast Algorithm for Multi-Pattern Searching</source><comment>Technical report</comment><publisher-name>University of Arizona</publisher-name><publisher-loc>Tucson, AZ, USA</publisher-loc><year>1994</year></citation></ref></ref-list></back></article>
