Algorithms 2011, 4(4), 262-284; doi:10.3390/a4040262
Article
The Smallest Grammar Problem as Constituents Choice and Minimal Grammar Parsing
1 Grupo de Procesamiento de Lenguaje Natural, Universidad Nacional de Córdoba/Consejo Nacional de Investigaciones Científicas y Técnicas, Argentina
2 Symbiose Project, IRISA/INRIA Rennes-Bretagne Atlantique, France
* Authors to whom correspondence should be addressed.
Received: 12 October 2011 / Accepted: 14 October 2011 / Published: 26 October 2011
(This article belongs to the Special Issue Selected Papers from LATA 2010)
Abstract: The smallest grammar problem, namely finding a smallest context-free grammar that generates exactly one sequence, is of practical and theoretical importance in fields such as Kolmogorov complexity, data compression and pattern discovery. We propose a new perspective on this problem by splitting it into two tasks: (1) choosing which words will be the constituents of the grammar and (2) searching for the smallest grammar given this set of constituents. We show how to solve the second task in polynomial time by parsing longer constituents with smaller ones. Building on classical practical algorithms, we propose new algorithms that use this optimization to find small grammars. On a classical benchmark, our algorithms consistently find smaller grammars, reducing the size by 10% in some cases. Moreover, our formulation allows us to define interesting bounds on the number of small grammars and to empirically compare different grammars of small size.
Keywords: smallest grammar problem; hierarchical structure inference; optimal parsing; data discovery
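As an illustration of the second task described in the abstract, the following minimal Python sketch parses the sequence and each constituent with as few symbols as possible, using single characters and shorter constituents. It is a sketch under stated assumptions, not the authors' implementation: the function names (`shortest_parse`, `minimal_grammar`, `grammar_size`) are hypothetical, constituents are assumed to have length at least 2, grammar size is assumed to be the total number of right-hand-side symbols, and only rules reachable from the start rule are kept.

```python
def shortest_parse(word, constituents):
    """Split `word` into the fewest symbols, where a symbol is either a
    single character or a constituent other than `word` itself."""
    n = len(word)
    cost = [0] * (n + 1)   # cost[i]: fewest symbols needed to cover word[i:]
    step = [None] * n      # step[i]: symbol chosen at position i
    for i in range(n - 1, -1, -1):
        # Default choice: emit the single character word[i].
        cost[i], step[i] = cost[i + 1] + 1, word[i]
        for c in constituents:
            # Assumption: every constituent has length >= 2.
            if c != word and word.startswith(c, i):
                if cost[i + len(c)] + 1 < cost[i]:
                    cost[i], step[i] = cost[i + len(c)] + 1, c
    parse, i = [], 0
    while i < n:
        parse.append(step[i])
        i += len(step[i])
    return parse


def minimal_grammar(sequence, constituents):
    """Map the sequence and every constituent it uses to a right-hand side.

    Rules are keyed by the string they generate; only rules reachable
    from the start rule (the sequence itself) are kept.
    """
    rules, pending = {}, [sequence]
    while pending:
        word = pending.pop()
        if word in rules:
            continue
        rules[word] = shortest_parse(word, constituents)
        # Symbols longer than one character stand for other rules.
        pending.extend(sym for sym in rules[word] if len(sym) > 1)
    return rules


def grammar_size(rules):
    """Assumed size measure: total number of right-hand-side symbols."""
    return sum(len(rhs) for rhs in rules.values())


if __name__ == "__main__":
    grammar = minimal_grammar("abababab", {"ab", "abab"})
    for lhs, rhs in grammar.items():
        print(lhs, "->", rhs)
    print("size:", grammar_size(grammar))
```

Each parse is a shortest-path computation over positions in the word, so the running time is polynomial in the length of the sequence and the number of constituents. Choosing which constituents to offer (the first task) is left outside this sketch.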
Cite This Article
MDPI and ACS Style
Carrascosa, R.; Coste, F.; Gallé, M.; Infante-Lopez, G. The Smallest Grammar Problem as Constituents Choice and Minimal Grammar Parsing. Algorithms 2011, 4, 262-284.
AMA Style
Carrascosa R, Coste F, Gallé M, Infante-Lopez G. The Smallest Grammar Problem as Constituents Choice and Minimal Grammar Parsing. Algorithms. 2011; 4(4):262-284.
Chicago/Turabian Style
Carrascosa, Rafael; Coste, François; Gallé, Matthias; Infante-Lopez, Gabriel. 2011. "The Smallest Grammar Problem as Constituents Choice and Minimal Grammar Parsing." Algorithms 4, no. 4: 262-284.
Algorithms
EISSN 1999-4893
Published by MDPI AG, Basel, Switzerland
