Open Access Article
Algorithms 2011, 4(4), 285-306; doi:10.3390/a4040285

An Algorithm to Compute the Character Access Count Distribution for Pattern Matching Algorithms

1 Centrum Wiskunde & Informatica (CWI), Science Park 123, 1098 XG Amsterdam, The Netherlands
2 Genome Informatics, Faculty of Medicine, University of Duisburg-Essen, Hufelandstr. 55, 45122 Essen, Germany
3 Bioinformatics, Computer Science XI, TU Dortmund, 44221 Dortmund, Germany
* Authors to whom correspondence should be addressed.
Received: 14 October 2011 / Revised: 26 October 2011 / Accepted: 26 October 2011 / Published: 31 October 2011
(This article belongs to the Special Issue Selected Papers from LATA 2010)

Abstract

We propose a framework for the exact probabilistic analysis of window-based pattern matching algorithms, such as Boyer–Moore, Horspool, Backward DAWG Matching, Backward Oracle Matching, and others. In particular, we develop an algorithm that efficiently computes the distribution of a pattern matching algorithm’s running time cost (such as the number of text character accesses) for any given pattern in a random text model. Text models range from simple uniform models to higher-order Markov models and hidden Markov models (HMMs). Furthermore, we provide an algorithm to compute the exact distribution of the difference in running time cost between two pattern matching algorithms. Methodologically, we use extensions of finite automata which we call deterministic arithmetic automata (DAAs) and probabilistic arithmetic automata (PAAs) [1]. Given an algorithm, a pattern, and a text model, we construct a PAA from which the sought distributions can be derived by dynamic programming. To our knowledge, this is the first time that substring- or suffix-based pattern matching algorithms have been analyzed exactly by computing the whole distribution of running time cost. Experimentally, we compare Horspool’s algorithm, Backward DAWG Matching, and Backward Oracle Matching on prototypical short patterns and provide statistics on the size of minimal DAAs for these computations.
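To make the dynamic-programming idea concrete, here is a minimal sketch for the simplest case only: Horspool's algorithm, a text whose characters are drawn independently from a given distribution, and cost = number of text character accesses. A deterministic automaton state stores the last m characters read plus the number of characters still to read before the current window's last character; a dictionary maps each (state, accumulated cost) pair to its exact probability, which is the joint state/value distribution a PAA tracks. Assumptions: the standard bad-character shift computed from `pattern[:-1]`, the last window character counted once even though it is also used for the shift, and a naive, unminimized state space (up to |Σ|^m buffers). The function names (`horspool_shift`, `window_cost`, `horspool_cost`, `access_count_distribution`) are hypothetical. This illustrates the idea and is not the authors' construction, which also covers other algorithms, Markov and HMM text models, and DAA minimization.

```python
from collections import defaultdict
from fractions import Fraction


def horspool_shift(pattern):
    """Bad-character shift (assumed standard Horspool variant): distance from the
    last occurrence of c in pattern[:-1] to the last pattern position, else m."""
    m = len(pattern)
    shift = {}
    for j, c in enumerate(pattern[:-1]):
        shift[c] = m - 1 - j  # later occurrences overwrite earlier ones
    return lambda c: shift.get(c, m)


def window_cost(window, pattern):
    """Text characters read when comparing one window right to left."""
    m = len(pattern)
    for k in range(m):
        if window[m - 1 - k] != pattern[m - 1 - k]:
            return k + 1  # the mismatching character was read as well
    return m  # full match: every window character was read


def horspool_cost(text, pattern):
    """Direct simulation on one text; used only as a brute-force cross-check."""
    m, shift = len(pattern), horspool_shift(pattern)
    cost, end = 0, m - 1
    while end < len(text):
        cost += window_cost(text[end - m + 1:end + 1], pattern)
        end += shift(text[end])
    return cost


def access_count_distribution(pattern, char_probs, n):
    """Exact distribution of Horspool's access count over random texts of length n.

    Assumption: characters are i.i.d. with probabilities char_probs (Fractions).
    State = (last <= m characters read, characters left before next window end).
    """
    m, shift = len(pattern), horspool_shift(pattern)
    dist = {(("", m - 1), 0): Fraction(1)}
    for _ in range(n):
        nxt = defaultdict(Fraction)
        for ((buf, wait), cost), p in dist.items():
            for c, pc in char_probs.items():
                new_buf = (buf + c)[-m:]
                if wait == 0:  # c ends the current window: compare, then shift
                    state = (new_buf, shift(c) - 1)
                    new_cost = cost + window_cost(new_buf, pattern)
                else:
                    state, new_cost = (new_buf, wait - 1), cost
                nxt[(state, new_cost)] += p * pc
        dist = nxt
    # Marginalize out the automaton state, keeping only the accumulated cost.
    result = defaultdict(Fraction)
    for (_, cost), p in dist.items():
        result[cost] += p
    return dict(sorted(result.items()))
```

For example, `access_count_distribution("aba", {"a": Fraction(1, 2), "b": Fraction(1, 2)}, 8)` returns the exact probability of every access count over texts of length 8. For small n, enumerating all |Σ|^n texts and tallying `horspool_cost` should reproduce the same distribution; the DP's work instead grows with the number of reachable (state, cost) pairs, which is what minimizing the automaton would reduce further.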
Keywords: pattern matching; analysis of algorithms; finite automaton; minimization; deterministic arithmetic automaton; probabilistic arithmetic automaton
This is an open access article distributed under the Creative Commons Attribution License (CC BY 3.0).

MDPI and ACS Style

Marschall, T.; Rahmann, S. An Algorithm to Compute the Character Access Count Distribution for Pattern Matching Algorithms. Algorithms 2011, 4, 285-306.

Algorithms EISSN 1999-4893 Published by MDPI AG, Basel, Switzerland