# Lyndon Factorization Algorithms for Small Alphabets and Run-Length Encoded Strings

## Abstract

## 1. Introduction

## 2. Basic Definitions

- $|\beta(w)| = 0$;
- either $|w| = 1$ or $w[0] < w[|w|-1]$.
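The two conditions above can be checked mechanically on small inputs. The sketch below is only an illustration and makes two assumptions that this section does not spell out: that $\beta(w)$ denotes the longest border of $w$ (the longest proper prefix that is also a suffix), and that a Lyndon word is a nonempty word strictly smaller than each of its proper nonempty suffixes. The function names are hypothetical.

```python
def longest_border(w: str) -> str:
    """Longest proper prefix of w that is also a suffix (KMP failure function).

    Assumption: this is what beta(w) denotes in the text.
    """
    if not w:
        return ""
    fail = [0] * len(w)
    k = 0
    for i in range(1, len(w)):
        while k > 0 and w[i] != w[k]:
            k = fail[k - 1]
        if w[i] == w[k]:
            k += 1
        fail[i] = k
    return w[:fail[-1]]


def is_lyndon(w: str) -> bool:
    """True if w is nonempty and strictly smaller than all its proper suffixes."""
    return len(w) > 0 and all(w < w[i:] for i in range(1, len(w)))


def satisfies_listed_properties(w: str) -> bool:
    """Check the two listed conditions: empty border, and |w| = 1 or w[0] < w[-1]."""
    return longest_border(w) == "" and (len(w) == 1 or w[0] < w[-1])
```

Under these assumptions every Lyndon word passes `satisfies_listed_properties`, but the converse does not hold: `"abaabb"` has an empty border and starts with a smaller symbol than it ends with, yet it is not a Lyndon word because its suffix `"aabb"` is smaller.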

**Theorem 1.**

**Property 1.**

## 3. Duval’s Algorithm

**Lemma 1.**

**Lemma 2.**

**Lemma 3.**

1. For $a' \in \Sigma$ and $a > a'$, $wa' \notin P'$;
2. For $a' \in \Sigma$ and $a < a'$, $wa' \in L$;
3. For $a' = a$, $wa' \in P' \setminus L$.
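The three cases above match the three branches of the inner loop in the usual textbook formulation of Duval's algorithm. The following is a minimal sketch of that formulation, not necessarily the paper's LF-Duval implementation. It assumes that `s[i]` plays the role of $a$ (the symbol that would continue the current period) and `s[j]` the role of $a'$ (the next symbol read); the function name `duval` is illustrative.

```python
def duval(s: str) -> list[str]:
    """Lyndon factorization of s in linear time (textbook Duval's algorithm)."""
    factors = []
    n = len(s)
    k = 0  # start of the part of s that is not yet factorized
    while k < n:
        i, j = k, k + 1
        # Extend s[k:j] while it remains a prefix of a power of a Lyndon word.
        while j < n and s[i] <= s[j]:
            if s[i] < s[j]:
                i = k       # larger symbol: s[k:j+1] is a Lyndon word, restart period
            else:
                i += 1      # equal symbol: the current period continues
            j += 1
        # Smaller symbol (or end of input): emit the repeated Lyndon word.
        period = j - i
        while k <= i:
            factors.append(s[k:k + period])
            k += period
    return factors
```

For example, `duval("banana")` returns `["b", "an", "an", "a"]`, a non-increasing sequence of Lyndon words whose concatenation is the input.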

**Lemma 4.**

**Lemma 5.**

## 4. Improved Algorithm for Small Alphabets

**Lemma 6.**

**Proof.**

**Lemma 7.**

**Proof.**

**Lemma 8.**

**Proof.**

## 5. Computing the Lyndon Factorization of a Run-Length Encoded String

**Lemma 9.**

**Proof.**

**Corollary 1.**

**Definition 1.**

**Lemma 10.**

1. If $c_j < c_s$ then $c_k^{l_k} \dots c_j^{l_j} \notin P'$;
2. If $c_j > c_s$ then $c_k^{l_k} \dots c_j^{l_j} \in L'$ and 1 holds for $j+1$, $\ell' = 0$;
3. If $c_j = c_i$ and $l_j \le l_i$, then $c_k^{l_k} \dots c_j^{l_j} \in P'$ and 1 holds for $j+1$, $\ell' = \ell + 1$;
4. If $c_j = c_i$ and $l_j > l_i$, either $c_j < c_{i+1}$ and $c_k^{l_k} \dots c_j^{l_j} \notin P'$ or $c_j > c_{i+1}$, $c_k^{l_k} \dots c_j^{l_j} \in L'$ and 1 holds for $j+1$, $\ell' = 0$.
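The cases above are stated on runs $c_j^{l_j}$, that is, on a symbol $c_j$ together with its repetition count $l_j$. As a reading aid, the following sketch shows one way to represent a string as such a list of (symbol, length) pairs. It illustrates the representation only and does not implement the run-by-run algorithm; the names `rle_encode` and `rle_decode` are hypothetical.

```python
from itertools import groupby


def rle_encode(s: str) -> list[tuple[str, int]]:
    """Run-length encode s as maximal runs (c, l), so adjacent runs differ in c."""
    return [(c, len(list(group))) for c, group in groupby(s)]


def rle_decode(runs: list[tuple[str, int]]) -> str:
    """Inverse of rle_encode: expand each run (c, l) into c repeated l times."""
    return "".join(c * l for c, l in runs)
```

For instance, `rle_encode("aaabccdd")` yields `[("a", 3), ("b", 1), ("c", 2), ("d", 2)]`. Because the runs are maximal, two consecutive runs never share a symbol, which is why comparing run $j$ against run $i$ involves both the symbol and the run length.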

**Proof.**

## 6. Experimental Results

**Testing LF-skip.** We first tested the variations of LF-skip against the variations of LF-Duval. The texts were random sequences of 5 MB. For each alphabet size $\sigma = 2, 4, \dots, 256$ we generated 100 sequences with a uniform distribution, and each run on each sequence was repeated 500 times. The average run times are given in Table 1 and shown graphically in Figure 4.
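The measurement protocol described above (several random sequences per alphabet size, each run repeated, times averaged) can be sketched as follows. This is a hypothetical harness for illustration, not the authors' test code, and Python timings would not be comparable to the figures reported in the tables. The names `random_text` and `average_time_ms` and the seeding scheme are assumptions.

```python
import random
import time


def random_text(n: int, sigma: int, seed: int) -> bytes:
    """n symbols drawn uniformly from an alphabet of size sigma (sigma <= 256)."""
    rng = random.Random(seed)
    return bytes(rng.randrange(sigma) for _ in range(n))


def average_time_ms(fn, texts, repeats: int) -> float:
    """Average wall-clock time of fn(text) in milliseconds over all texts and repeats."""
    start = time.perf_counter()
    for text in texts:
        for _ in range(repeats):
            fn(text)
    elapsed = time.perf_counter() - start
    return 1000.0 * elapsed / (len(texts) * repeats)
```

A call such as `average_time_ms(factorize, [random_text(n, sigma, seed) for seed in range(100)], 500)` would mirror the protocol in the text, where `factorize` stands for whichever factorization routine is being measured.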

**Testing LF-rle.** To assess the performance of the LF-rle algorithm, we tested it together with LF-Duval, LF-Duval2 and LF-skip2 on random binary sequences of 5 MB, varying the probability of a zero so as to vary the number of runs in the sequence. The running time of LF-rle does not include the time needed to compute the RLE of the sequence, i.e., we assumed that the sequence is given in RLE form, since otherwise other algorithms are preferable. For each test we generated 100 sequences, and each run on each sequence was repeated 500 times. The average run times are given in Table 3 and shown graphically in Figure 5.
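To see how the probability of a zero controls the number of runs, note that in a sequence of $n$ independent bits with zero probability $p$, a new run starts at each position whose bit differs from its predecessor. That happens with probability $2p(1-p)$, so the expected number of runs is $1 + (n-1) \cdot 2p(1-p)$. The sketch below computes this value and compares it with a simulation. It assumes independent, identically distributed bits, which the text does not state explicitly, and the function names are illustrative.

```python
import random


def expected_runs(n: int, p_zero: float) -> float:
    """Expected number of runs in n i.i.d. bits with P(bit = 0) = p_zero."""
    return 1 + (n - 1) * 2 * p_zero * (1 - p_zero)


def count_runs(bits) -> int:
    """Number of maximal runs of equal symbols in a sequence."""
    return sum(1 for i in range(len(bits)) if i == 0 or bits[i] != bits[i - 1])


def random_bits(n: int, p_zero: float, seed: int = 0) -> list[int]:
    """n independent bits, each 0 with probability p_zero."""
    rng = random.Random(seed)
    return [0 if rng.random() < p_zero else 1 for _ in range(n)]
```

The expression is symmetric under $p \mapsto 1-p$ and largest at $p = 0.5$. This agrees with the LF-rle column of the results table, where the times for P(zero) $= 0.05$ and $0.95$ coincide, as do those for $0.10$ and $0.90$.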

## 7. Conclusions

## Author Contributions

## Funding

## Conflicts of Interest


**Figure 4.** Comparison of the algorithms on random sequences (5 MB) with a uniform distribution of a varying alphabet size.

**Table 1.** Run times in milliseconds on random sequences (5 MB) with a uniform distribution of a varying alphabet size.

| $\sigma$ | LF-Duval | LF-Duval2 | LF-skip | LF-skip2 |
|---|---|---|---|---|
| 2 | 14.6 | 21.9 | 2.5 | 1.5 |
| 4 | 14.6 | 14.9 | 1.6 | 1.1 |
| 8 | 14.7 | 9.1 | 1.3 | 1.1 |
| 16 | 14.7 | 6.4 | 1.3 | 1.2 |
| 32 | 14.7 | 5.0 | 1.4 | 1.6 |
| 64 | 14.7 | 4.3 | 1.7 | 2.3 |
| 128 | 14.7 | 4.0 | 2.0 | 3.2 |
| 192 | 14.6 | 3.8 | 1.7 | 3.7 |
| 256 | 14.6 | 3.8 | 2.6 | 4.1 |

| | LF-Duval | LF-Duval2 | LF-skip | LF-skip2 |
|---|---|---|---|---|
| DNA (15 MB) | 44.7 | 52.2 | 3.0 | 2.2 |
| Protein (2.9 MB) | 8.5 | 3.4 | 0.50 | 0.39 |

| P(zero) | LF-Duval | LF-Duval2 | LF-skip2 | LF-rle |
|---|---|---|---|---|
| 0.05 | 14.6 | 5.7 | 1.4 | 0.70 |
| 0.10 | 14.7 | 7.8 | 1.1 | 1.3 |
| 0.20 | 14.7 | 12.4 | 1.0 | 2.4 |
| 0.30 | 14.8 | 17.4 | 1.2 | 3.2 |
| ⋯ | | | | |
| 0.70 | 14.7 | 16.9 | 1.7 | 3.2 |
| 0.80 | 14.6 | 12.7 | 2.0 | 2.4 |
| 0.90 | 14.6 | 8.4 | 2.8 | 1.3 |
| 0.95 | 14.7 | 6.3 | 4.7 | 0.70 |

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Ghuman, S.S.; Giaquinta, E.; Tarhio, J. Lyndon Factorization Algorithms for Small Alphabets and Run-Length Encoded Strings. *Algorithms* **2019**, *12*, 124.
https://doi.org/10.3390/a12060124
