## Abstract

## 1. Introduction

## 2. Basic Definitions

- $\left|\beta(w)\right|=0$;
- either $\left|w\right|=1$ or $w[0]<w[\left|w\right|-1]$.
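
The two conditions above can be checked on small examples. The following is a minimal sketch, assuming that the conditions describe a Lyndon word $w$ and that $\beta(w)$ denotes the longest border of $w$ (the longest proper prefix that is also a suffix); neither assumption is stated in this excerpt. The helper names `is_lyndon` and `longest_border_length` are hypothetical and chosen for illustration only.

```python
def is_lyndon(w):
    """True if w is non-empty and strictly smaller than each proper suffix.

    This is a standard characterization of Lyndon words, used here as an
    assumption about what the surrounding definitions refer to.
    """
    return len(w) > 0 and all(w < w[i:] for i in range(1, len(w)))


def longest_border_length(w):
    """Length of the longest proper prefix of w that is also a suffix of w.

    Assumed to correspond to |beta(w)| in the text.
    """
    return max(
        (k for k in range(len(w)) if w[:k] == w[len(w) - k:]),
        default=0,
    )


def satisfies_listed_properties(w):
    """Check the two listed conditions: border-free, and |w| = 1 or w[0] < w[|w|-1]."""
    return longest_border_length(w) == 0 and (len(w) == 1 or w[0] < w[-1])


if __name__ == "__main__":
    for word in ["a", "ab", "aab", "abab", "ba"]:
        print(word, is_lyndon(word), satisfies_listed_properties(word))
```

Both conditions are necessary but not sufficient for a word to be Lyndon: for instance, `"acab"` is border-free and starts with a smaller symbol than it ends with, yet it is not a Lyndon word.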

## 3. Duval’s Algorithm

- For ${a}^{\prime}\in \mathsf{\Sigma}$ and $a>{a}^{\prime}$, $w{a}^{\prime}\notin {P}^{\prime}$;
- For ${a}^{\prime}\in \mathsf{\Sigma}$ and $a<{a}^{\prime}$, $w{a}^{\prime}\in L$;
- For ${a}^{\prime}=a$, $w{a}^{\prime}\in {P}^{\prime}\setminus L$.
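
The three cases above map directly onto the branches of the inner loop in the textbook formulation of Duval's algorithm. The sketch below is that textbook formulation, not necessarily the exact variant analyzed in this paper; the function name `duval` and the index names `k`, `i`, `j` are illustrative choices.

```python
def duval(s):
    """Return the Lyndon factorization of s as a list of factors.

    k marks the start of the part not yet factorized, j is the position of
    the next symbol a' to read, and i is the position of the symbol a that
    a' is compared against.
    """
    n = len(s)
    factors = []
    k = 0
    while k < n:
        i, j = k, k + 1
        while j < n and s[i] <= s[j]:
            if s[i] < s[j]:
                # a < a': the extended word is a Lyndon word; restart comparison.
                i = k
            else:
                # a' = a: still a prefix of a Lyndon word, but not Lyndon itself.
                i += 1
            j += 1
        # a > a' (or end of input): the word can no longer be extended.
        # Emit the repetitions of the Lyndon word of length j - i.
        period = j - i
        while k <= i:
            factors.append(s[k:k + period])
            k += period
    return factors


if __name__ == "__main__":
    print(duval("banana"))  # ['b', 'an', 'an', 'a']
```

The factors are non-increasing in lexicographic order and concatenate back to the input, which is a convenient sanity check when experimenting with other inputs.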

## 4. Improved Algorithm for Small Alphabets

## 5. Computing the Lyndon Factorization of a Run-Length Encoded String

- If ${c}_{j}<{c}_{s}$ then ${c}_{k}^{{l}_{k}}\dots {c}_{j}^{{l}_{j}}\notin {P}^{\prime}$;
- If ${c}_{j}>{c}_{s}$ then ${c}_{k}^{{l}_{k}}\dots {c}_{j}^{{l}_{j}}\in {L}^{\prime}$ and 1 holds for $j+1$, ${\ell}^{\prime}=0$;

- If ${c}_{j}={c}_{i}$ and ${l}_{j}\le {l}_{i}$, then ${c}_{k}^{{l}_{k}}\dots {c}_{j}^{{l}_{j}}\in {P}^{\prime}$ and 1 holds for $j+1$, ${\ell}^{\prime}=\ell +1$;
- If ${c}_{j}={c}_{i}$ and ${l}_{j}>{l}_{i}$, either ${c}_{j}<{c}_{i+1}$ and ${c}_{k}^{{l}_{k}}\dots {c}_{j}^{{l}_{j}}\notin {P}^{\prime}$ or ${c}_{j}>{c}_{i+1}$, ${c}_{k}^{{l}_{k}}\dots {c}_{j}^{{l}_{j}}\in {L}^{\prime}$ and 1 holds for $j+1$, ${\ell}^{\prime}=0$.
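
The cases above operate on runs $c_j^{l_j}$ rather than on single symbols. As a small illustration of that representation only (this is not the LF-rle algorithm itself), the sketch below converts a string to a list of `(symbol, length)` pairs and back. The names `rle_encode` and `rle_decode` are hypothetical helpers introduced for this example.

```python
from itertools import groupby


def rle_encode(s):
    """Return the run-length encoding of s as a list of (symbol, run length) pairs."""
    return [(c, len(list(group))) for c, group in groupby(s)]


def rle_decode(runs):
    """Expand a list of (symbol, run length) pairs back into a string."""
    return "".join(c * length for c, length in runs)


if __name__ == "__main__":
    runs = rle_encode("aaabccdd")
    print(runs)              # [('a', 3), ('b', 1), ('c', 2), ('d', 2)]
    print(rle_decode(runs))  # aaabccdd
```

In this representation, comparing run $j$ with run $i$ means first comparing the symbols $c_j$ and $c_i$ and, only when they are equal, comparing the lengths $l_j$ and $l_i$, which is the structure of the case analysis above.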

## 6. Experimental Results

**Testing LF-skip.** We first tested the variants of LF-skip against the variants of LF-Duval. The texts were random sequences of 5 MB. For each alphabet size $\sigma =2,4,\dots ,256$ we generated 100 sequences with a uniform distribution, and each run on each sequence was repeated 500 times. The average run times are given in Table 1 and shown graphically in Figure 4.

**Testing LF-rle.** To assess the performance of the LF-rle algorithm, we tested it together with LF-Duval, LF-Duval2 and LF-skip2 on random binary sequences of 5 MB with different probability distributions, so as to vary the number of runs in the sequence. The running time of LF-rle does not include the time needed to compute the RLE of the sequence; that is, we assumed that the sequence is given in RLE form, since otherwise the other algorithms are preferable. For each test we generated 100 sequences, and each run on each sequence was repeated 500 times. The average run times are given in Table 3 and shown graphically in Figure 5.
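
To make the link between the symbol probability and the number of runs concrete, the following sketch generates random binary sequences and counts their runs. It is an illustration under stated assumptions, not the generator used in the experiments: the function names, the fixed seed, and the sequence length of 100,000 symbols are all choices made for this example. For independent symbols with probability $p$ of a zero, the expected number of runs is $1+(n-1)\cdot 2p(1-p)$, so skewed distributions yield far fewer runs.

```python
import random


def random_binary(n, p_zero, seed=0):
    """Return a random string over {'0', '1'} of length n with P('0') = p_zero."""
    rng = random.Random(seed)  # fixed seed so that the example is reproducible
    return "".join("0" if rng.random() < p_zero else "1" for _ in range(n))


def count_runs(s):
    """Number of maximal blocks of equal symbols in s."""
    if not s:
        return 0
    return 1 + sum(1 for a, b in zip(s, s[1:]) if a != b)


if __name__ == "__main__":
    n = 100_000
    for p in (0.05, 0.10, 0.20, 0.30, 0.50):
        expected = 1 + (n - 1) * 2 * p * (1 - p)
        print(p, count_runs(random_binary(n, p)), round(expected))
```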

## 7. Conclusions

**Figure 4.** Comparison of the algorithms on random sequences (5 MB) with a uniform distribution and varying alphabet size.

**Table 1.** Run times in milliseconds on random sequences (5 MB) with a uniform distribution and varying alphabet size.

$\sigma$ | LF-Duval | LF-Duval2 | LF-skip | LF-skip2 |
---|---|---|---|---|
2 | 14.6 | 21.9 | 2.5 | 1.5 |
4 | 14.6 | 14.9 | 1.6 | 1.1 |
8 | 14.7 | 9.1 | 1.3 | 1.1 |
16 | 14.7 | 6.4 | 1.3 | 1.2 |
32 | 14.7 | 5.0 | 1.4 | 1.6 |
64 | 14.7 | 4.3 | 1.7 | 2.3 |
128 | 14.7 | 4.0 | 2.0 | 3.2 |
192 | 14.6 | 3.8 | 1.7 | 3.7 |
256 | 14.6 | 3.8 | 2.6 | 4.1 |

Text | LF-Duval | LF-Duval2 | LF-skip | LF-skip2 |
---|---|---|---|---|
DNA (15 MB) | 44.7 | 52.2 | 3.0 | 2.2 |
Protein (2.9 MB) | 8.5 | 3.4 | 0.50 | 0.39 |

P(zero) | LF-Duval | LF-Duval2 | LF-skip2 | LF-rle |
---|---|---|---|---|
0.05 | 14.6 | 5.7 | 1.4 | 0.70 |
0.10 | 14.7 | 7.8 | 1.1 | 1.3 |
0.20 | 14.7 | 12.4 | 1.0 | 2.4 |
0.30 | 14.8 | 17.4 | 1.2 | 3.2 |
⋯ | | | | |
0.70 | 14.7 | 16.9 | 1.7 | 3.2 |
0.80 | 14.6 | 12.7 | 2.0 | 2.4 |
0.90 | 14.6 | 8.4 | 2.8 | 1.3 |
0.95 | 14.7 | 6.3 | 4.7 | 0.70 |

