Algorithms 2011, 4(4), 262-284; doi:10.3390/a4040262
Article

The Smallest Grammar Problem as Constituents Choice and Minimal Grammar Parsing

R. Carrascosa 1,*, F. Coste 2, M. Gallé 2,* and G. Infante-Lopez 1
Received: 12 October 2011 / Accepted: 14 October 2011 / Published: 26 October 2011
(This article belongs to the Special Issue Selected Papers from LATA 2010)

Abstract

The smallest grammar problem, namely finding a smallest context-free grammar that generates exactly one sequence, is of practical and theoretical importance in fields such as Kolmogorov complexity, data compression and pattern discovery. We propose a new perspective on this problem by splitting it into two tasks: (1) choosing which words will be the constituents of the grammar and (2) searching for the smallest grammar given this set of constituents. We show how to solve the second task in polynomial time by parsing longer constituents with smaller ones. We propose new algorithms, based on classical practical algorithms, that use this optimization to find small grammars. Our algorithms consistently find smaller grammars on a classical benchmark, reducing the size by 10% in some cases. Moreover, our formulation allows us to define interesting bounds on the number of small grammars and to empirically compare different grammars of small size.
Keywords: smallest grammar problem; hierarchical structure inference; optimal parsing; data discovery
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
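
The second task named in the abstract, parsing longer constituents with smaller ones, can be illustrated with a small sketch. This is not the authors' implementation: the function names (`shortest_parse`, `minimal_grammar`, `grammar_size`), the dynamic-programming formulation, and the size measure (total number of right-hand-side symbols) are assumptions made for illustration. Each word is factored into the fewest symbols, where a symbol is either a single character or a given constituent other than the word itself, and rules never reached from the full sequence are dropped.

```python
from typing import Dict, Iterable, List


def shortest_parse(word: str, constituents: Iterable[str]) -> List[str]:
    """Factor `word` into the fewest symbols: single characters, or
    constituents of length >= 2 that differ from `word` itself."""
    usable = sorted(q for q in set(constituents) if q != word and len(q) >= 2)
    n = len(word)
    cost = [0] + [n + 1] * n      # cost[i]: fewest symbols spelling word[:i]
    back = [""] * (n + 1)         # back[i]: last symbol on a best path to i
    for i in range(n):
        # Outgoing edges from position i: one character, or a matching constituent.
        candidates = [word[i]] + [q for q in usable if word.startswith(q, i)]
        for sym in candidates:
            j = i + len(sym)
            if cost[i] + 1 < cost[j]:
                cost[j] = cost[i] + 1
                back[j] = sym
    symbols: List[str] = []
    i = n
    while i > 0:                  # walk the back pointers from the end
        symbols.append(back[i])
        i -= len(back[i])
    return symbols[::-1]


def minimal_grammar(sequence: str, constituents: Iterable[str]) -> Dict[str, List[str]]:
    """Map each used word to its right-hand side, starting from `sequence`.
    Constituents that no parse reaches get no rule."""
    constituents = list(constituents)
    rules: Dict[str, List[str]] = {}
    pending = [sequence]
    while pending:
        word = pending.pop()
        if word in rules:
            continue
        rules[word] = shortest_parse(word, constituents)
        # Symbols of length >= 2 are constituents and need their own rule.
        pending.extend(sym for sym in rules[word] if len(sym) >= 2)
    return rules


def grammar_size(rules: Dict[str, List[str]]) -> int:
    """Assumed size measure: total number of right-hand-side symbols."""
    return sum(len(rhs) for rhs in rules.values())
```

For example, `minimal_grammar("abcabcabc", ["abc", "bca"])` keeps a rule for the full sequence and one for `abc`, and drops `bca` because no shortest parse uses it; the result has size 6 under the assumed measure. Since each position scans every constituent, the running time is polynomial in the sequence length and the number of constituents.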

MDPI and ACS Style

Carrascosa, R.; Coste, F.; Gallé, M.; Infante-Lopez, G. The Smallest Grammar Problem as Constituents Choice and Minimal Grammar Parsing. Algorithms 2011, 4, 262-284.

Algorithms EISSN 1999-4893 Published by MDPI AG, Basel, Switzerland