Computing Maximal Lyndon Substrings of a String

There are two reasons to have an efficient algorithm for identifying all right-maximal Lyndon substrings of a string: firstly, Bannai et al. introduced in 2015 a linear algorithm to compute all runs of a string that relies on knowing all right-maximal Lyndon substrings of the input string, and secondly, Franek et al. showed in 2017 a linear equivalence of sorting suffixes and sorting right-maximal Lyndon substrings of a string, inspired by a novel suffix sorting algorithm of Baier. In 2016, Franek et al. presented a brief overview of algorithms for computing the Lyndon array that encodes the knowledge of right-maximal Lyndon substrings of the input string. Among those presented were two well-known algorithms for computing the Lyndon array: a quadratic in-place algorithm based on the iterated Duval algorithm for Lyndon factorization and a linear algorithmic scheme based on linear suffix sorting, computing the inverse suffix array, and applying to it the next smaller value algorithm. Duval's algorithm works for strings over any ordered alphabet, while for linear suffix sorting, a constant or an integer alphabet is required. The authors at that time were not aware of Baier's algorithm. In 2017, our research group proposed a novel algorithm for the Lyndon array. Though the proposed algorithm is linear in the average case and has O(n log(n)) worst-case complexity, it is interesting as it emulates the fast Fourier algorithm's recursive approach and introduces τ-reduction, which might be of independent interest. In 2018, we presented a linear algorithm to compute the Lyndon array of a string inspired by Phase I of Baier's algorithm for suffix sorting. This paper presents the theoretical analysis of these two algorithms and provides empirical comparisons of both of their C++ implementations with respect to the iterated Duval algorithm.


Introduction
In combinatorics on words, Lyndon words play a very important role. Lyndon words, a special case of Hall words, were named after Roger Lyndon, who was looking for a suitable description of the generators of free Lie algebras [1]. Despite their humble beginnings, to date, Lyndon words have facilitated many applications in mathematics and computer science, some of which are: constructing de Bruijn sequences, constructing bases in free Lie algebras, finding the lexicographically smallest or largest substring in a string, and succinct prefix matching of highly periodic strings; see Marcus et al. [2], and the informative paper by Berstel and Perrin [3] and the references therein.
The pioneering work on Lyndon decomposition was already introduced by Chen, Fox, and Lyndon in [4]. The Lyndon decomposition theorem is not explicitly stated there; nevertheless, it follows from the work presented there.
The iterated Duval algorithm (IDLA) is elementary: only pairwise comparisons of the alphabet symbols are needed. Its weakness is its quadratic worst-case complexity, which becomes a problem for longer strings with long right-maximal Lyndon substrings, as one of our experiments showed (see Figure 11 in Section 7).
In our empirical work, we used IDLA as a control for comparison and as a verifier of the results. Note that the reason the procedure MaxLyn of Figure 1 really computes the longest Lyndon prefix is not obvious and is based on the properties of periods of prefixes; see [8] or Observation 6 and Lemma 11 in [19].
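The core of IDLA can be sketched as follows: a Duval-style scan computes the length of the longest Lyndon prefix starting at a given position, and iterating it over all positions yields the Lyndon array. This is a minimal reconstruction in C++ (0-based indexing; the names `maxLyn` and `idla` are ours and do not reflect the interface of the published lynarr.hpp):

```cpp
#include <string>
#include <vector>

// Length of the longest Lyndon prefix of x[i..n-1] (Duval-style scan).
// Invariant: x[i..j-1] is a prefix of a power of a Lyndon string of length p.
int maxLyn(const std::string& x, int i) {
    int n = static_cast<int>(x.size());
    int p = 1;  // length of the longest Lyndon prefix found so far
    for (int j = i + 1; j < n; ++j) {
        char a = x[j - p], b = x[j];
        if (b < a) break;          // the prefix cannot extend past j
        if (a < b) p = j - i + 1;  // strictly larger symbol: x[i..j] is Lyndon
        // a == b: the periodicity continues; p is unchanged
    }
    return p;
}

// IDLA: iterate maxLyn over all positions; quadratic in the worst case.
std::vector<int> idla(const std::string& x) {
    std::vector<int> lyn(x.size());
    for (int i = 0; i < static_cast<int>(x.size()); ++i)
        lyn[i] = maxLyn(x, i);
    return lyn;
}
```

For example, `idla("abab")` yields {2, 1, 2, 1}: the right-maximal Lyndon substrings are ab, b, ab, b.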
Lemma 1, below, characterizes right-maximal Lyndon substrings in terms of the relationships of the suffixes and follows from the work of Hohlweg and Reutenauer [20]. Though the definition of a proto-Lyndon substring is formally given in Section 2 below, it suffices to say that it is a prefix of a Lyndon substring of the string. The definition of the lexicographic ordering ≺ is also given in Section 2. The proof of this lemma is delayed to the end of Section 2, where all the technical terms needed are defined.
Thus, the Lyndon array is an NSV (Next Smaller Value) array of the inverse suffix array. Consequently, the Lyndon array can be computed by sorting the suffixes, i.e., computing the suffix array, then computing the inverse suffix array, and then applying NSV to it; see [19]. Computing the inverse suffix array and applying NSV are "naturally" linear, and computing the suffix array can be implemented to be linear; see [19,21] and the references therein. The execution and space characteristics are dominated by those of the first step, i.e., the computation of the suffix array. We refer to this scheme as SSLA.
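Past the suffix sorting step, the SSLA scheme takes only a few lines. The sketch below assumes the suffix array sa (0-based) has already been computed by some linear method; the names are illustrative, not those of a published implementation:

```cpp
#include <vector>

// Given the suffix array sa (0-based) of a string of length n, compute the
// Lyndon array: lyn[i] = length of the right-maximal Lyndon substring at i.
// Steps: invert sa, then apply NSV (next smaller value) to the inverse.
std::vector<int> lyndonFromSA(const std::vector<int>& sa) {
    int n = static_cast<int>(sa.size());
    std::vector<int> isa(n);
    for (int r = 0; r < n; ++r) isa[sa[r]] = r;  // inverse suffix array
    std::vector<int> lyn(n), st;  // st: stack of positions, isa increasing
    for (int i = n - 1; i >= 0; --i) {
        while (!st.empty() && isa[st.back()] > isa[i]) st.pop_back();
        int nsv = st.empty() ? n : st.back();  // nearest j > i with isa[j] < isa[i]
        lyn[i] = nsv - i;
        st.push_back(i);
    }
    return lyn;
}
```

For x = abab, sa = {2, 0, 3, 1}, and the function returns {2, 1, 2, 1}.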
In 2018, a linear algorithm to compute the Lyndon array from a given Burrows-Wheeler transform was presented [22]. Since the Burrows-Wheeler transform is computed in linear time from the suffix array, it is yet another scheme of how to obtain the Lyndon array via suffix sorting: compute the suffix array; from the suffix array, compute the Burrows-Wheeler transform, then compute the Lyndon array during the inversion of the Burrows-Wheeler transform. We refer to this scheme as BWLA.
The introduction of Baier's suffix sort in 2015 and the consequent realization of the connection to right-maximal Lyndon substrings brought the realization that there is an elementary algorithm to compute the Lyndon array, one not relying on a pre-processed global data structure such as a suffix array or a Burrows-Wheeler transform, and that, despite its original clumsiness, it could eventually be refined to outperform any SSLA or BWLA implementation: any implementation of a suffix-sorting-based scheme requires a full suffix sort followed by additional processing, while Baier's approach is "just" a partial suffix sort; see [23].
In this work, we present two additional algorithms for the Lyndon array not discussed in [19]. The C++ source code of the three implementations IDLA, TRLA, and BSLA is available; see [24]. Note that the procedure IDLA is in the lynarr.hpp file.
The first algorithm presented here is TRLA. TRLA is a τ-reduction based Lyndon array algorithm that follows Farach's approach used in his remarkable linear algorithm for suffix tree construction [25] and reproduced very successfully in all linear algorithms for suffix sorting (e.g., see [21,26] and the references therein). Farach's approach follows the Cooley-Tukey algorithm for the fast Fourier transform, relying on recursion to lower the quadratic complexity to O(n log(n)); see [27]. TRLA was first introduced by the authors in 2019 (see [28,29]) and presented as a part of Liut's Ph.D. thesis [30].
The second algorithm, BSLA, is a Baier's sort-based Lyndon array algorithm. BSLA is based on the idea of Phase I of Baier's suffix sort, though our implementation necessarily differs from Baier's. BSLA was first introduced at the Prague Stringology Conference 2018 [23] and also presented as a part of Liut's Ph.D. thesis [30] in 2019; here, we present a complete and refined theoretical analysis of the algorithm and a more efficient implementation than that initially introduced.
The paper is structured as follows: In Section 2, the basic notions and terminology are presented. In Section 3, the TRLA algorithm is presented and analysed. In Section 4, the BSLA algorithm is presented and analysed. In Section 5, the datasets with random strings of various lengths and over various alphabets and other datasets used in the empirical tests are described. In Section 6, the conclusion of the research and the future work are presented. The results of the empirical measurements of the performance of IDLA, TRLA, and BSLA on those datasets are presented in Section 7 in both tabular and graphical forms.

Basic Notation and Terminology
Most of the fundamental notions, definitions, facts, and string algorithms can be found in [31][32][33][34]. For the ease of access, this section includes those that are directly related to the work herein.
The set of integers is denoted by Z. For two integers i ≤ j, the range i..j = {k ∈ Z | i ≤ k ≤ j}. An alphabet is a finite or infinite set of symbols, equivalently called letters. We assume that the sentinel symbol $ is not in the alphabet and is lexicographically the smallest. A string over an alphabet A is a finite sequence of symbols from A. A $-terminated string over A is a string over A terminated by $. We use the array notation indexing from 1 for strings; thus, x[1..n] indicates a string of length n; the first symbol is the symbol with index 1, i.e., x[1]. We use the term strings over a constant alphabet if the alphabet is a fixed finite alphabet. The integer alphabet is the infinite alphabet A = {0, 1, 2, ...}. We use the term strings over the integer alphabet for the strings over the alphabet {0, 1, 2, ...} with the additional constraint that all letters occurring in the string are smaller than the length of the string, i.e., in this paper, x[1..n] is a string over the integer alphabet if it is a string over the alphabet {0, 1, ..., n−1}. Many authors use a more general definition; for instance, Burkhardt and Kärkkäinen [35] defined it as any set of integers of size n^{O(1)}; however, our results can easily be adapted to such more general definitions without changing their essence.
We use a bold font to denote strings; thus, x denotes a string, while x denotes some other mathematical entity such as an integer. The empty string is denoted by ε and has length zero. The length or size of a string x = x[1..n] is n; the length of a string x is denoted by |x|. For two strings x = x[1..n] and y = y[1..m], the concatenation xy is the string u = u[1..n+m] so that u[i] = x[i] for i ∈ 1..n and u[n+i] = y[i] for i ∈ 1..m. If x = uvw, then u is a prefix, v a substring, and w a suffix of x. If u (respectively, v, w) is empty, then it is called an empty prefix (respectively, empty substring, empty suffix); if |u| < |x| (respectively, |v| < |x|, |w| < |x|), then it is called a proper prefix (respectively, proper substring, proper suffix). If x = uv, then vu is called a rotation or a conjugate of x; if either u = ε or v = ε, then the rotation is called trivial. A non-empty string x is primitive if there is no string y and no integer k ≥ 2 so that x = y^k = yy···y (k times). A non-empty string x has a non-trivial border u if u is both a non-empty proper prefix and a non-empty proper suffix of x. Thus, both ε and x are trivial borders of x. A string without a non-trivial border is called unbordered.
Let ≺ be a total order of an alphabet A. The order is extended to all finite strings over the alphabet A: for x = x[1..n] and y = y[1..m], x ≺ y if either x is a proper prefix of y or there is a j ≤ min{n, m} so that x[i] = y[i] for 1 ≤ i < j and x[j] ≺ y[j]. This total order induced by the order of the alphabet is called a lexicographic order of all non-empty strings over A. We write x ⪯ y if either x ≺ y or x = y. A string x over A is Lyndon for a given order ≺ of A if x is strictly lexicographically smaller than any non-trivial rotation of x. In particular, every Lyndon string is unbordered, and every unbordered string is primitive. Note that the reverse implications do not hold: aba is primitive but neither unbordered nor Lyndon, while acaab is unbordered, but not Lyndon. The Lyndon array of x = x[1..n] is the integer array L[1..n] so that L[i] is the length of the right-maximal Lyndon substring starting at the position i; its alternative form is the integer array L′[1..n] so that L′[i] = j where j is the last position of the right-maximal Lyndon substring starting at the position i. The relationship between those two definitions is straightforward: L′[i] = i + L[i] − 1. We conclude this section with the proof of Lemma 1. If j = n, then (c) holds. Therefore, let us assume that j < n. If c1 ≺ c2, then uc1 ≺ uc2, and so, uv ≺ uc2. For any u = u1u2, uv ≺ u2v as uv is Lyndon, so uv ≺ u2c2, giving uvc2 to be Lyndon, a contradiction with the right-maximality of uv. Therefore, c2 ⪯ c1; thus, uw ≺ uvuw, and (c) holds. Now, we go in the opposite direction. Assuming (b′) and (c), we need to show that x[i..j] is right-maximal Lyndon.
Consider the right-maximal Lyndon substring of x starting at the position i; it is x[i..ℓ] for some i ≤ ℓ ≤ n.
With (a) proven, we can now replace (b′) in the claim with (b), completing the proof.
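The claims above about aba and acaab can be checked mechanically against the definitions; the brute-force checkers below (illustrative C++, 0-based, not efficient) implement the three notions directly:

```cpp
#include <string>

// x is Lyndon iff it is strictly smaller than every non-trivial rotation.
bool isLyndon(const std::string& x) {
    for (size_t k = 1; k < x.size(); ++k)
        if (x.substr(k) + x.substr(0, k) <= x) return false;
    return !x.empty();
}

// x is unbordered iff it has no non-trivial border.
bool isUnbordered(const std::string& x) {
    for (size_t k = 1; k < x.size(); ++k)
        if (x.substr(0, k) == x.substr(x.size() - k)) return false;
    return true;
}

// x is primitive iff it is not a proper power y^r, r >= 2.
bool isPrimitive(const std::string& x) {
    size_t n = x.size();
    for (size_t p = 1; p <= n / 2; ++p)
        if (n % p == 0) {
            std::string y;
            for (size_t r = 0; r < n / p; ++r) y += x.substr(0, p);
            if (y == x) return false;
        }
    return n > 0;
}
```

These confirm, for instance, that aba is primitive but neither unbordered nor Lyndon, and that acaab is unbordered but not Lyndon.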

τ-Reduction Algorithm (TRLA)
The purpose of this section is to introduce a recursive algorithm TRLA for computing the Lyndon array of a string. As will be shown below, the most significant aspect is the so-called τ-reduction of a string and how the Lyndon array of the τ-reduced string can be expanded to a partially filled Lyndon array for the whole string, as well as how to compute the missing values. This section thus provides the mathematical justification for the algorithm and, in so doing, proves the correctness of the algorithm. The mathematical understanding of the algorithm provides the basis for bounding its worst-case complexity by O(n log(n)) and for establishing the linearity of the average-case complexity.
The first idea of the algorithm was proposed in Paracha's 2017 Ph.D. thesis [36]. It follows Farach's approach [25]: (1) reduce the input string x to y; (2) by recursion, compute the Lyndon array of y; and (3) from the Lyndon array of y, compute the Lyndon array of x.
The input strings for the algorithm are $-terminated strings over an integer alphabet. The reduction computed in (1) must allow Step (3) to be carried out efficiently. For suffix sorting, a proximity property makes this possible: in [37], obtaining the sorted suffixes for positions i ≡ 0 (mod 3) and i ≡ 1 (mod 3) via the recursive call is sufficient to determine the order of the suffixes for the i ≡ 2 (mod 3) positions, after which both lists are merged together. However, there is no such proximity property for right-maximal Lyndon substrings, so the reduction itself must have a property that helps determine some of the values of the Lyndon array of x from the Lyndon array of y, while the rest are computed.
In our algorithm, we use a special reduction, which we call τ-reduction, defined in Section 3.2, that reduces the original string to at least 1/2 and at most 2/3 of its length. The algorithm computes y as a τ-reduction of the input string x in Step (1) in linear time. In Step (3), it expands the Lyndon array of the reduced string computed by Step (2) to an incomplete Lyndon array of the original string, also in linear time. The incomplete Lyndon array computed in (3) is about 1/2 to 2/3 full, and for every position i with an unknown value, the values at positions i−1 and i+1 are known. In particular, the values at the positions 1 and n are both known. Therefore, much information is provided by the recursive Step (2). For instance, for 00011001, via the recursive call, we would identify the right-maximal Lyndon substrings starting at the black positions, and we would still need to compute the remaining right-maximal Lyndon substrings.
However, computing the missing values of the incomplete Lyndon array takes at most O(n log(n)) steps, as we will show, resulting in the overall worst-case complexity of O(n log(n)). When the input string is such that the missing values of the incomplete Lyndon array of the input string can be computed in linear time, the overall execution of the algorithm is linear as well, and thus, the average case complexity will be shown to be linear in the length of the input string.
In the following subsections, we describe the τ-reduction in several steps: first, the τ-pairing, then choosing the τ-alphabet, and finally, the computation of τ(x). The τ-reduction may be of some general interest as it preserves (see Lemma 6) some right-maximal Lyndon substrings of the original string.

τ-Pairing
Consider a $-terminated string x = x[1..n] whose alphabet A_x is ordered by ≺, where x[n+1] = $ and $ ≺ a for any a ∈ A_x. A τ-pair consists of a pair of adjacent positions from the range 1..n+1. The τ-pairs are computed by induction:
1. the initial τ-pair is (1, 2);
2. if (i−1, i) is the last τ-pair computed, then: if i = n−1, the next τ-pair is set to (n, n+1).
Every position of the input string that occurs in some τ-pair as the first element is labelled black; all others are labelled white. Note that the position 1 is always black, while the last position n can be either black or white; however, the positions n−1 and n cannot be simultaneously both black. Note also that most of the τ-pairs do not overlap; if two τ-pairs overlap, they overlap in a position i such that 1 < i < n.
Proof. This is by induction; it is trivially true for |x| = 1 as (1, 2) is the only τ-pair. Assume it is true for |x| ≤ n−1.

τ-Reduction
For each τ-pair (i, i+1), we consider the pair of alphabet symbols (x[i], x[i+1]). We call them symbol τ-pairs. They are in the total order induced by ≺: (a, b) precedes (c, d) if a ≺ c, or a = c and b ≺ d. They are sorted using radix sort with a key of size two and assigned letters from a chosen τ-alphabet, a subset of {0, 1, ..., |τ(x)|}, so that the assignment preserves the order. Since the input string is over an integer alphabet, the radix sort is linear.
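The sorting and letter assignment can be sketched as follows. The block below is a generic sketch (our names, not the Tau.hpp code): it radix-sorts arbitrary symbol pairs over an integer alphabet of size sigma with a two-pass counting sort and assigns order-preserving integer letters:

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Assign order-preserving integer letters to symbol pairs: equal pairs get
// equal letters, and the letter order follows the lexicographic order of
// the pairs. Two stable counting-sort passes (second key, then first key)
// keep this linear for pairs over an integer alphabet of size sigma.
std::vector<int> rankPairs(const std::vector<std::pair<int, int>>& pairs, int sigma) {
    int m = static_cast<int>(pairs.size());
    std::vector<int> order(m), tmp(m), cnt(sigma + 1);
    for (int i = 0; i < m; ++i) order[i] = i;
    for (int pass = 0; pass < 2; ++pass) {  // pass 0: second key; pass 1: first key
        std::fill(cnt.begin(), cnt.end(), 0);
        auto key = [&](int i) { return pass == 0 ? pairs[i].second : pairs[i].first; };
        for (int i = 0; i < m; ++i) ++cnt[key(i)];
        for (int s = 1; s <= sigma; ++s) cnt[s] += cnt[s - 1];
        for (int i = m - 1; i >= 0; --i) tmp[--cnt[key(order[i])]] = order[i];
        order.swap(tmp);
    }
    std::vector<int> letter(m);
    int next = 0;  // next fresh letter
    for (int r = 0; r < m; ++r) {
        if (r > 0 && pairs[order[r]] != pairs[order[r - 1]]) ++next;
        letter[order[r]] = next;
    }
    return letter;
}
```

For instance, the pairs (0,1), (0,0), (1,0), (0,1) over sigma = 2 receive the letters 1, 0, 2, 1, respectively.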
The τ-letters are substituted for the symbol τ-pairs, and the resulting string is terminated with $. This string is called the τ-reduction of x and denoted τ(x), and it is a $-terminated string over an integer alphabet. For our running example from Figure 2, τ(x) = 021534. The next lemma justifies calling the above transformation a reduction.
Proof. There are two extreme cases: the first is when no τ-pairs overlap, in which case |τ(x)| = (1/2)|x|; the second is when all the τ-pairs overlap, in which case |τ(x)| = (2/3)|x|. Any other case must be in between.
Let B(x) denote the set of all black positions of x. For any i ∈ 1..|τ(x)|, b(i) = j where j is the black position in x of the τ-pair corresponding to the new symbol in τ(x) at the position i, while t(j) assigns to each black position of x the position in τ(x) where the corresponding new symbol is, i.e., b(t(j)) = j and t(b(i)) = i; thus, b and t are mutually inverse. In addition, we define p as the mapping of the τ-pairs to the τ-alphabet.
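In an implementation, b and t can be materialized as two arrays filled in one linear scan over the black labels. The sketch below assumes the blacks are marked in a boolean array (1-based to match the text; names are ours):

```cpp
#include <vector>

// Build the mutually inverse mappings between positions of tau(x) (1..m)
// and black positions of x (1..n): b[i] = black position of the i-th
// tau-pair, t[j] = position in tau(x) of the pair whose first element is
// the black position j (0 where j is not black). Index 0 is unused.
void buildMaps(const std::vector<bool>& black,  // black[j], j = 1..n
               std::vector<int>& b, std::vector<int>& t) {
    int n = static_cast<int>(black.size()) - 1;
    b.assign(1, 0);
    t.assign(n + 1, 0);
    for (int j = 1; j <= n; ++j)
        if (black[j]) {
            b.push_back(j);  // the next tau-position is b.size()-1
            t[j] = static_cast<int>(b.size()) - 1;
        }
}
```

By construction, b(t(j)) = j for every black j and t(b(i)) = i for every i ∈ 1..m, so both mappings are evaluated in constant time.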

Properties Preserved by τ-Reduction
The most important property of τ-reduction is the preservation of right-maximal Lyndon substrings of x that start at black positions. This means there is a closed formula that gives, for every right-maximal Lyndon substring of τ(x), a corresponding right-maximal Lyndon substring of x. Moreover, the formula for any black position can be computed in constant time. It is simpler to present the following results using L′, the alternative form of the Lyndon array, the one where the end positions of right-maximal Lyndon substrings are stored rather than their lengths. More formally: let L′_τ(x)[1..m] be the Lyndon array of τ(x), and let L′_x[1..n] be the Lyndon array of x. Then, for any black i ∈ 1..n, L′_x[i] is determined in constant time from L′_τ(x)[t(i)]. The proof of the theorem requires a series of lemmas that are presented below. First, we show that τ-reduction preserves the relationships of certain suffixes of x.
Proof. Since i and j are both black positions, both t(i) and t(j) are defined, and t(i) ≠ t(j).
(1a) Case: j+n−i is black.
Since n may be either black or white, we need to discuss two cases.
(1aα) Case: n is white. Since n is white, the last τ-pair of x must be (n−1, n). The τ-pairs of x[j..j+n−i] must be the same as the τ-pairs of x[i..n]; the last τ-pair of x[j..j+n−i] must be (j+n−i−1, j+n−i). Since j+n−i is black by our assumption (1a), the next τ-pair of x must be (j+n−i, j+n−i+1).
(1aβ) Case: n is black. Then, the last τ-pair of x must be (n, n+1), and hence, the last τ-pair of x[j..j+n−i] is such that the next τ-pair is (j+n−i, j+n−i+1); since n−1 cannot be black when n is, the claim follows.
(1b) Case: j+n−i is white. Then, j+n−i−1 is black; hence, n−1 is black; so, n must also be white.
(2) Case: First note that i+ℓ−2 and j+ℓ−2 are either both black or both white: x[i+ℓ−2], x[i+ℓ−1], x[j+ℓ−2], and x[j+ℓ−1] have the same relationship, and thus, the τ-pair following (j+ℓ−3, j+ℓ−2) will be the "same" as the τ-pair following (i+ℓ−3, i+ℓ−2); the τ-pairs of x[i..n] correspond one-to-one to the τ-pairs (j, j+1), ..., (j+ℓ−2, j+ℓ−1) of x[j..n]. It follows that j+ℓ−2 is black as well.
We proceed by discussing these two cases for the colours of i+ℓ−2 and j+ℓ−2.
(3a) Case when i+ℓ−2 and j+ℓ−2 are both white.
(3b) Case when i+ℓ−2 and j+ℓ−2 are both black. We need to discuss the four cases based on the colours of i+ℓ−1 and j+ℓ−1.
(3bα) Both i+ℓ−1 and j+ℓ−1 are black. It follows that the next τ-pair for x[i..n] is (i+ℓ, i+ℓ+1), and the next τ-pair for x[j..n] is (j+ℓ, j+ℓ+1). Both i+ℓ−1 and j+ℓ−1 are white. Then, the next τ-pair for x[i..n] is (i+ℓ, i+ℓ+1), and the next τ-pair for x[j..n] is (j+ℓ, j+ℓ+1).
Lemma 5 shows that τ-reduction preserves the proto-Lyndon property of certain proto-Lyndon substrings of x.
Let x[i..j] be a proto-Lyndon substring of x, and let i be a black position.
Proof. Let us first assume that j is black.
Since both i and j are black, t(i) and t(j) are defined. Let i1 = t(i) and j1 = t(j), and consider the corresponding k1.
Now, we can show that τ-reduction preserves some right-maximal Lyndon substrings.
If j = n, then there are two simple cases: if n is white, then n−1 is black; on the other hand, if n is black, then m = t(n). Thus, in the following, we can assume that j < n. We will proceed by discussing two possible cases, one where j is black and the other where j is white.
(1) Case: j is black. If j = n, then t(j) = m, and we are done. Thus, we can assume that j < n. It follows that the τ-pair (j, j+1) is followed by the τ-pair (j+2, j+3), and thus, t(j)+1 = t(j+2).
(2) Case: j is white.
Since j is white, necessarily both j−1 and j+1 are black and t(j−1)+1 = t(j+1). We proceed by the analysis of the two possible cases of the label for the position j. Let (∗) denote the condition from the theorem. We have to also prove that the condition (∗) holds. If j = n, then the condition (∗) holds. Therefore, assume that j < n. Since x[i..j] is right-maximal, by Lemma 1, x[j+1..n] ≺ x[i..n].
Case: j is white.
We want to show that the condition ( * ) does not hold.

Computing
Theorem 2 indicates how to compute the partial L_x from L_τ(x). The procedure is given in Figure 3. To compute the missing values, the partial array is processed from right to left. When a missing value at the position i is encountered (note that it is recognized by L_x[i] = nil), the Lyndon array L_x[i+1..n] is completely filled, and also, L_x[i−1] is known. Recall that L_x[i+1] is the ending position of the right-maximal Lyndon substring starting at the position i+1. In several cases, we can determine the value of L_x[i] in constant time; we call such points easy. All others will be referred to as hard. For a hard point i, it means that x[i] is followed by at least two consecutive right-maximal Lyndon substrings before reaching either L_x[i−1] or n, and we might need to traverse them all.
The while loop, seen in Figure 4's procedure, is the likely cause of the O(n log(n)) complexity. At first glance, it may seem that the complexity might be O(n²); however, the doubling of the length of the string when a hard point is introduced actually trims it down to an O(n log(n)) worst-case complexity. See Section 3.5 for more details and Section 7 for the measurements and graphs. Computing L_x[5] is more complicated and an example of a hard point: we can extend the right-maximal Lyndon substring from L_x[6] to the left to 23, but no more, so L_x[5] = 6. Computing L_x[2] is again easy as x[2] = x[3], and so, L_x[2] = 2. Thus, L_x[1..9] = 9, 2, 3, 9, 6, 6, 9, 8, 9.

The Complexity of TRLA
To determine the complexity of the algorithm, we attach to each position i a counter red[i] initialized to zero. Consider a hard point j. To determine the right-maximal Lyndon substring starting at the hard position j, we first need to check if A1 can be left-extended by x[j] to make jA1 Lyndon; we are using the abbreviated notation jA1 for the substring x[j..k] where A1 = x[j+1..k]; in simple words, jA1 represents the left-extension of A1 by one position. If jA1 is proto-Lyndon, we have to check whether A2 can be left-extended by jA1 to a Lyndon substring. If jA1A2 is Lyndon, we must continue until we check whether jA1A2...Ar−1 is Lyndon. If so, we must check whether jA1...Ar is Lyndon. We need not go beyond stop.
How do we check if jA1...Ak can left-extend Ak+1 to a Lyndon substring? If jA1...Ak ⪰ Ak+1, we can stop, and jA1...Ak is the right-maximal Lyndon substring starting at the position j. If jA1...Ak ≺ Ak+1, we need to continue. Since stop is the last position of the right-maximal Lyndon substring at the position j−1 or n, we are assured to stop there. When comparing the substring jA1...Ak with Ak+1, we increment the counter red[i] at every position i of Ak+1 used in the comparison. When done with the whole array, the value of red[i] represents how many times i was used in various comparisons, for any position i.
Consider a position i that was used k times for k ≥ 4, i.e., red[i] = k. In what follows, the upper indices of A and C do not represent powers; they are just indices. The counter red[i] was incremented for the first time in the comparison of j1A^1_1...A^1_{r1−1} and A^1_{r1} during the computation of the missing value L_x[j1]; for the second time in the comparison of j2A^2_1...A^2_{r2−1} and A^2_{r2} during the computation of the missing value L_x[j2]; for the third time in the comparison of j3A^3_1...A^3_{r3−1} and A^3_{r3} during the computation of the missing value L_x[j3]; and so forth until the k-th increment of red[i]. Thus, if red[i] = k, then n ≥ 2^{k−1}(n1+1) ≥ 2^k as n1+1 ≥ 2. Thus, n ≥ 2^k, and so, k ≤ log(n). Thus, either k < 4 or k ≤ log(n). Therefore, the overall complexity is O(n log(n)).
To show that the average-case complexity is linear, we first recall that the overall complexity of TRLA is determined by the procedure filling in the missing values. We showed above that there are at most log(n) missing values (hard positions) that cannot be determined in constant time. We overestimate the number of strings of length n over an alphabet of size Σ, 2 ≤ Σ ≤ n, that force a non-linear computation by assuming that every possible subset of log(n) indices with any possible letter assignment forces the worst performance. Thus, there are Σ^n − (n choose log(n))Σ^{log(n)} strings that are processed in linear time, say with a constant K1, and there are (n choose log(n))Σ^{log(n)} strings that are processed in the worst time, with a constant K2. Let K = max(K1, K2). Then, the average time is bounded by a linear function of n for n ≥ 2^7. The last step follows from the fact that n^{2 log(n)} ≤ 2^n for any n ≥ 2^7.
The combinatorics of the processing is too complicated to ascertain whether the worst-case complexity is linear or not. We tried to generate strings that might give the worst performance. We used three different formulas to generate the strings, nesting the white indices that might require non-constant computation: the dataset extreme_trla of binary strings is created using the recursive formula u_{k+1} = 00u_k0u_k, using the first 100 shortest binary Lyndon strings as the starts u_0. The moment the size of u_k exceeds the required length of the string, the recursion stops, and the string is trimmed to the required length. For the extreme_trla1 dataset, we used the same approach with the formula u_{k+1} = 000u_k00u_k, and for the extreme_trla2 dataset, we used the formula u_{k+1} = 0000u_k00u_k.
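The average-case estimate above can be carried out as follows (a sketch: K absorbs K1 and K2, logarithms are base 2, and the elementary bounds (n choose log n) ≤ n^{log n} and Σ^{log n − n} ≤ 2^{log n − n} for Σ ≥ 2 are used):

```latex
\begin{align*}
\operatorname{avg}(n)
 &\le \frac{K\bigl(\Sigma^{n}-\binom{n}{\log n}\Sigma^{\log n}\bigr)\,n
      +K\binom{n}{\log n}\Sigma^{\log n}\,n\log n}{\Sigma^{n}}
  \le Kn+Kn\log n\binom{n}{\log n}\Sigma^{\log n-n}\\
 &\le Kn+Kn\log n\;n^{\log n}\,2^{\log n-n}
  = Kn+K\log n\,\frac{n^{\log n+2}}{2^{n}}
  \le Kn+K\log n\,\frac{n^{2\log n}}{2^{n}}
  \le Kn+K\log n
\end{align*}
```

for n ≥ 2^7, using n^{2 log(n)} ≤ 2^n in the last step (and log(n) + 2 ≤ 2 log(n) for n ≥ 4 in the one before); hence, the average time is O(n).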
The space complexity of our C++ implementation is bounded by 9n integers. This upper bound is derived from the fact that a Tau object (see Tau.hpp [24]) requires 3n integers of space for a string of length n. Therefore, the first call to TRLA requires 3n, the next recursive call at most 3(2/3)n, the next recursive call at most 3(2/3)²n, and so on; thus, the total is bounded by 3n + 3(2/3)n + 3(2/3)²n + ... = 3n/(1 − 2/3) = 9n. However, it should be possible to bring it down to 6n integers.

The Algorithm BSLA
The purpose of this section is to present a linear algorithm BSLA for computing the Lyndon array of a string over an integer alphabet. The algorithm is based on a series of refinements of a list of groups of indices of the input string. The refinement is driven by a group that is already complete, and the refinement process makes the immediately preceding group also complete. In turn, this newly completed group is used as the driver of the next round of the refinement. In this fashion, the refinement proceeds from right to left until all the groups in the list are complete. The initial list of groups consists of the groups of indices with the same alphabet symbol. The section contains proper definitions of all these terms: group, complete group, and refinement. In the process of refinement, each newly created group is assigned a specific substring of the input string referred to as the context of the group. Throughout the process, the list of the groups is maintained in an increasing lexicographic order by their contexts. Moreover, at every stage, the contexts of all the groups are Lyndon substrings of x with an additional property that the contexts of the complete groups are right-maximal Lyndon substrings. Hence, when the refinement is completed, the contexts of all the groups in the list represent all the right-maximal Lyndon substrings of x. The mathematics of the process of refinement is necessary in order to ascertain its correctness and completeness and to determine the worst-case complexity of the algorithm.

Notation and Basic Notions of BSLA
For the sake of simplicity, we fix a string x = x[1..n] for the whole Section 4.1; all the definitions and the observations apply and refer to this x.
A group G is a non-empty set of indices of x. The group G is assigned a context, i.e., a substring con(G) of x with the property that for any i ∈ G, x[i..i+|con(G)|−1] = con(G). If i ∈ G, then C(i) denotes the occurrence of the context of G at the position i, i.e., the substring C(i) = x[i..i+|con(G)|−1]. We say that a group G′ is smaller than, or precedes, a group G″ if con(G′) ≺ con(G″). Definition 1. An ordered list of groups Gk, Gk−1, ..., G2, G1 is a group configuration if: Note that (C1) and (C2) guarantee that Gk, Gk−1, ..., G2, G1 is a disjoint partitioning of 1..n. For i ∈ {1, ..., n}, gr(i) denotes the unique group to which i belongs, i.e., if i ∈ Gt, then gr(i) = Gt.
For a group G from a group configuration, we define an equivalence ∼ on G as follows: i ∼ j iff gr(prev(i)) = gr(prev(j)) or prev(i) = prev(j) = nil. The symbol [i]∼ denotes the class of the equivalence ∼ that contains i, i.e., [i]∼ = {j ∈ G | j ∼ i}. If prev(i) = nil, then the class [i]∼ is called trivial. An interesting observation states that if G is viewed as an ordered set of indices, then a non-trivial [i]∼ is an interval: Observation 7. Let G be a group from a group configuration for x. Consider an i ∈ G such that prev(i) ≠ nil.
On each non-trivial class of ∼, we define a relation ≈ as follows: i ≈ j iff |j−i| = |con(G)|; in simple terms, it means that the occurrence C(i) of con(G) is immediately followed by the occurrence C(j) of con(G). The transitive closure of ≈ is a relation of equivalence, which we also denote by ≈.
Proof. We argue by contradiction. Assume that there is a j ∈ [i]∼ so that j1 < j < j2 and j ∉ [i]≈. Take the minimal such j. Consider j′ = j − |con(G)|. Then, j′ ∈ [i]∼, and since j′ < j, j′ ∈ [i]≈ due to the minimality of j. Therefore, i ≈ j′ and j′ ≈ j, and so, j ≈ i, a contradiction.
The condition (C 6 ) is all-important for carrying out the refinement process (see (R 3 ) below). The conditions (C 7 ) and (C 8 ) are necessary for asserting that the condition (C 6 ) is preserved during the refinement process.

The Refinement
For the sake of simplicity, we fix a string x = x[1..n] for the whole Section 4.2; all the definitions, lemmas, and theorems apply and refer to this x.
Proof. (C1), (C2), (C3), and (C4) are straightforward to verify. To verify (C5), we need to show that G1 is complete. Any occurrence of a_k in x is a right-maximal Lyndon substring, so G1 is complete.
To verify (C6), consider j = prev(i) and val(i) = v for i ∈ G_1. Consider any r such that j < r < i. If x[r] ≠ a_k, then r belongs to a smaller group and prev(i) < r, which contradicts the definition of prev. Hence, x[r] = a_k, and so, x[j+1] = ··· = x[j+v] = a_k, while x[j] = a_q for some q < k. It follows that x[j..n] has a_q(a_k)^v as a prefix.
To verify (C8), consider C(i) ∩ C(j) ≠ ∅. Then, since every context in the initial configuration is a single letter, C(i) = C(j) and i = j, so (C8) holds trivially.

Let G_k, ..., G_t, ..., G_1 be a t-complete group configuration. The refinement is driven by the group G_t, and it might only partition the groups that precede it, i.e., the groups G_k, ..., G_{t+1}, while the groups G_t, ..., G_1 remain unchanged.

(R3) For each class [j_ℓ]≈, move all the elements {prev(i) | i ∈ [j_ℓ]≈} from the group gr(prev(j_ℓ)) into a new group H, place H in the list of groups right after the group gr(prev(j_ℓ)), and set its context to con(gr(prev(j_ℓ)))con(gr(j_ℓ))^val(j_ℓ). Note that this "doubling of the contexts" is possible due to (C6). Then, update prev: all the values of prev are correct except possibly the values of prev for the indices from H. It may be the case that for i ∈ H, there is an i′ ∈ gr(prev(j_ℓ)) such that prev(i) < i′ < i, so prev(i) must be reset to the maximal such i′. (Note that before the removal of H from gr(prev(j_ℓ)), the index i′ was not eligible to be considered for prev(i), as i and i′ were both from the same group.)

Theorem 3 shows that if we have a t-complete group configuration G_k, ..., G_{t+1}, G_t, ..., G_1 and refine it by G_t, then the resulting system of groups is a (t+1)-complete group configuration. This allows carrying out the refinement in an iterative fashion.

Theorem 3. Let Conf = G_k, ..., G_{t+1}, G_t, ..., G_1 be a t-complete group configuration, 1 ≤ t. After performing the refinement of Conf by the group G_t, the resulting system of groups, denoted as Conf′, is a (t+1)-complete group configuration.
Proof. We carry out the proof in a series of claims. The symbols gr(), con(), C(), prev(), and val() denote the functions for Conf, while gr′(), con′(), C′(), prev′(), and val′() denote the functions for Conf′. When a group G_{t+1} is partitioned, a part of it is moved out as the next group in the list, and we call it H_{t+1}; thus, G_{t+1} ≺ H_{t+1} ≺ G_t. For details, please see (R3) above.
Proof of Claim 1. (C1) and (C2) follow from the fact that the process is a refinement, i.e., a group is either preserved as is or partitioned into two or more groups. The doubling of the contexts in Step (R3) guarantees that the increasing order of the contexts is preserved, i.e., (C3) holds. For any j ∈ G_t such that prev(j) ≠ nil, con(gr(prev(j))) is Lyndon and con(gr(j)) is Lyndon, while con(gr(prev(j))) ≺ con(gr(j)), so con(gr(prev(j)))con(gr(j))^val(j) is Lyndon as well; thus, (C4) holds.
To illustrate the concatenation: call con(gr(prev(j))) A and con(gr(j)) B, and let val(j) = m; then A and B are Lyndon and A ≺ B, so AB^m is clearly Lyndon, as if A and B were letters.
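This closure property is easy to check by brute force. The sketch below is our illustration only (`is_lyndon` is a hypothetical helper, not a function from the paper's implementation); it tests the defining property that a non-empty word is Lyndon iff it is strictly smaller than every one of its proper suffixes:

```cpp
#include <cassert>
#include <string>

// A word is Lyndon iff it is non-empty and strictly smaller than each of its
// proper suffixes (brute-force check, for illustration only).
bool is_lyndon(const std::string& s) {
    if (s.empty()) return false;
    for (std::size_t i = 1; i < s.size(); ++i)
        if (s.substr(i) <= s) return false;   // some proper suffix is not larger
    return true;
}
```

For instance, with A = "01" and B = "2" (both Lyndon, and A ≺ B), the concatenations "012", "0122", and "01222" are all Lyndon, whereas "201", where the larger word comes first, is not.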
Consider C′(i) and C′(j) for some 1 ≤ i < j ≤ n.
(a) Show that (C7(a)) holds. If C′(j) is a prefix of C′(i), then C(j)C(j_1)···C(j_w) is a prefix of C(i); hence, C(j) is a prefix of C(i), and so, by (C7(a)) for Conf, con(G_t) ≺ con(gr(j)). By the t-completeness of Conf, C(j) is a right-maximal Lyndon substring, a contradiction with C(j)C(j_1)···C(j_w) being Lyndon. This is an impossible case.
Assume that D ≠ ∅: let i_ℓ be the smallest element of D. Since C(i_ℓ) cannot be the prefix of C(j), it means that i_ℓ = i_1. Since C(i_1) cannot be a prefix of C(j), it means that C(i) ∩ C(j) ≠ ∅, and so, C(j) ⊆ C(i), which contradicts the fact that C(j) ⊆ ∪_{i′ ∈ D} C(i′) ⊆ C′(i).
Show that (C8) holds. Let C′(i) ∩ C′(j) ≠ ∅. Let us first assume that C(i) ∩ C(j) ≠ ∅. Then, C(j) ⊆ C(i). Since C(j) cannot be a suffix of C(i), it follows that C(i) ∩ C(j_1) ≠ ∅. Therefore, C(j)C(j_1) ⊆ C(i). Repeating this argument leads to C(j)C(j_1)···C(j_w) ⊆ C(i), and we are done. Therefore, assume that C(i) ∩ C(j) = ∅. Let 1 ≤ ℓ ≤ v be the smallest value such that C(i_ℓ) ∩ C(j) ≠ ∅. Such an ℓ must exist. Then, i_ℓ ≤ j. If i_ℓ = j, then either C(i_ℓ) is a prefix of C(j) or vice versa, both impossibilities; hence, i_ℓ < j. Repeating the same arguments as for i, we get that C(j)C(j_1)···C(j_w) ⊆ C(i_ℓ), and so, we are done.
It remains to show that (C7(b)) for Conf′ holds. Consider C′(i) immediately followed by C′(j) with C′(i) ≺ C′(j).

2. Assume that gr′(j) = G_t. Then, the group gr(i) is partitioned when refining by G_t, and so, C′(i) = con′(gr′(i)) = con(gr(i))C(j)^v for v = val(j). Since C′(i) is immediately followed by C′(j) = con(G_t), we have again a contradiction, as it implies that val(j) = v+1.
This concludes the proof of Claim 2.
This concludes the proof of Claim 4. The four claims show that all the conditions (C1), ..., (C8) are satisfied for Conf′, and that proves Theorem 3.
As the last step, we show that when the process of refinement is completed, all right-maximal Lyndon substrings of x are identified and sorted via the contexts of the groups of the final configuration.
Let Conf_3 with gr_3(), con_3(), C_3(), prev_3(), and val_3() be the 3-complete group configuration obtained from Conf_2 through the refinement by the group G^2_2. ... Let Conf_r = G^r_{k_r}, G^r_{k_r−1}, ..., G^r_2, G^r_1 with gr_r(), con_r(), C_r(), prev_r(), and val_r() be the r-complete group configuration obtained from Conf_{r−1} through the refinement by the group G^{r−1}_{r−1}, and let Conf_r be the final configuration, obtained when the refinement can no longer be applied.
Then, x[i..k] is a right-maximal Lyndon substring of x if and only if x[i..k] = C_r(i) = con_r(gr_r(i)).
Proof. That all the groups of Conf_r are complete follows from Theorem 3, and hence, every C_r(i) is a right-maximal Lyndon substring. Let x[i..k] be a right-maximal Lyndon substring of x. Consider C_r(i); since both are right-maximal Lyndon substrings starting at the same position i, C_r(i) must be equal to x[i..k].

Motivation for the Refinement
The process of refinement is in fact a process of gradually revealing the Lyndon substrings, which we call the water draining method: (a) lower the water level by one; (b) extend the existing Lyndon substrings: the newly revealed letters are used to extend the existing Lyndon substrings where possible, or become Lyndon substrings of length one otherwise; (c) consolidate the new Lyndon substrings: processing from the right, if several Lyndon substrings are adjacent and can be joined into a longer Lyndon substring, they are joined.
The diagram in Figure 5 and the description that follows it illustrate the method for a string 011023122. The input string is visualized as a curve, and the height at each point is the value of the letter at that position.
In Figure 5, we illustrate the process: (1) We start with the string 011023122 and a full tank of water.
(2) We drain one level; only 3 is revealed; there is nothing to extend and nothing to consolidate.
(3) We drain one more level, and three 2's are revealed; the first 2 extends 3 to 23, and the remaining two 2's form Lyndon substrings 2 of length one; there is nothing to consolidate.
(4) We drain one more level, and three 1's are revealed; the first two 1's form Lyndon substrings 1 of length one; the third 1 extends 22 to 122; there is nothing to consolidate.

(5) We drain one more level, and two 0's are revealed; the first 0 extends 11 to 011; the second 0 extends 23 to 023; in the consolidation phase, 023 is joined with 122 to form a Lyndon substring 023122, and then, 011 is joined with 023122 to form a Lyndon substring 011023122.
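The water draining method can be simulated directly. The sketch below is our toy rendering of the intuition, not the actual BSLA implementation (BSLA is linear; this simulation is not): letters are revealed level by level, each revealed letter starts a Lyndon segment of length one, and, processing from the right, adjacent segments u, w with u ≺ w are repeatedly joined (u, w Lyndon and u ≺ w imply uw Lyndon); when a segment is absorbed as the right part of a join, the right-maximal Lyndon substring at its starting position is fixed at that length.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <iterator>
#include <map>
#include <string>
#include <vector>

// Toy simulation of the water draining method (illustration only).
// Returns, for every position, the length of the right-maximal Lyndon
// substring starting there (0-based indexing).
std::vector<std::size_t> water_drain(const std::string& x) {
    const std::size_t n = x.size();
    std::vector<std::size_t> lyn(n, 0);
    if (n == 0) return lyn;
    std::map<std::size_t, std::size_t> seg;   // segment start -> inclusive end
    std::string levels = x;                   // distinct letters, descending
    std::sort(levels.begin(), levels.end(), std::greater<char>());
    levels.erase(std::unique(levels.begin(), levels.end()), levels.end());
    for (char v : levels) {
        for (std::size_t i = 0; i < n; ++i)   // (a)+(b): drain one level, reveal letters
            if (x[i] == v) seg[i] = i;
        for (bool merged = true; merged; ) {  // (c): consolidate, from the right
            merged = false;
            for (auto cur = std::prev(seg.end()); cur != seg.begin(); --cur) {
                auto lt = std::prev(cur);     // left neighbour segment
                if (lt->second + 1 != cur->first) continue;       // not adjacent
                std::string u = x.substr(lt->first, lt->second - lt->first + 1);
                std::string w = x.substr(cur->first, cur->second - cur->first + 1);
                if (u < w) {                  // u, w Lyndon and u < w: join them
                    lyn[cur->first] = w.size();   // w's start position is fixed now
                    lt->second = cur->second;
                    seg.erase(cur);
                    merged = true;
                    break;
                }
            }
        }
    }
    for (const auto& s : seg)                 // segments that were never absorbed
        lyn[s.first] = s.second - s.first + 1;
    return lyn;
}
```

On the example above, water_drain("011023122") yields the lengths {9, 1, 1, 6, 2, 1, 3, 1, 1} (positions 0-based in the code, 1-based in the text): position 1 starts 011023122, position 4 starts 023122, position 5 starts 23, and position 7 starts 122, exactly as in Figure 5.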
In Figure 6, we present an illustrative example for the string 011023122, where the arrows represent the prev mapping, shown only for the group used for the refinement. The groups used for the refinement are indicated in bold font.

The Complexity of BSLA
The computation of the initial configuration can be done in linear time. To compute the initial values of prev in linear time, a stack-based approach similar to the NSV algorithm is used. Since all groups are non-empty, there can never be more than n groups. Theorem 3 is at the heart of the algorithm. The refinement by the last completed group is linear in the size of the group, including the update of prev. Since the groups driving the refinement are mutually disjoint, the total size of all the driving groups is at most n. Therefore, the overall worst-case complexity of BSLA is linear in the length of the input string.
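The stack-based computation of the initial prev values can be sketched as follows. This is our illustration, assuming that in the initial configuration prev(i) is the nearest position to the left of i holding a strictly smaller letter (i.e., a position from a strictly smaller group); it follows the classic NSV/PSV pattern and is not the paper's exact code:

```cpp
#include <cassert>
#include <string>
#include <vector>

// For each position i (0-based), compute the nearest position j < i with
// x[j] < x[i], or -1 if no such position exists.  Each index is pushed once
// and popped at most once, so the computation is linear.
std::vector<long> initial_prev(const std::string& x) {
    std::vector<long> prev(x.size(), -1);
    std::vector<std::size_t> stack;           // positions with increasing letters
    for (std::size_t i = 0; i < x.size(); ++i) {
        while (!stack.empty() && x[stack.back()] >= x[i])
            stack.pop_back();                 // letters >= x[i] cannot be prev(i)
        if (!stack.empty()) prev[i] = static_cast<long>(stack.back());
        stack.push_back(i);
    }
    return prev;
}
```

For the example string "011023122" (0-based indices), the computed values are {−1, 0, 0, −1, 3, 4, 3, 6, 6}.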

Data and Measurements
Initially, computations were performed on the Department of Computing and Software's moore server; memory: 32 GB (DDR4 @ 2400 MHz), CPU: 8 × Intel Xeon E5-2687W v4 @ 3.00 GHz, OS: Linux Version 2.6.18-419.el5 (gcc Version 4.1.2 and Red Hat Version 4.1.2-55). To verify correctness, new randomized data were produced and computed independently on the University of Toronto Mississauga's octolab cluster; memory: 8 × 32 GB (DDR4 @ 3200 MHz), CPU: 8 × AMD Ryzen Threadripper 1920X (12-Core) @ 4.00 GHz, OS: Ubuntu 16.04.6 LTS (gcc Version 5.4.0). The results of both were extremely similar, and those reported herein were generated using the moore server.

All the programs were compiled without any additional level of optimization (i.e., no -O1, -O2, or -O3 flags were specified for the compilation). The CPU time was measured in clock ticks, with 1,000,000 clock ticks per second. Since the execution time was negligible for short strings, the processing of the same string was repeated several times (the repeat factor varied from 10⁶, for strings of length 10, to one, for strings of length 5 × 10⁶), resulting in a higher precision. For graphing, the logarithmic scale was used for both axes, the x-axis representing the length of the strings and the y-axis representing the time.

We used four categories of randomly generated datasets: (1) bin, random strings over an integer alphabet with exactly two distinct letters (binary-like strings); (2) dna, random strings over an integer alphabet with exactly four distinct letters (DNA-like strings). Each dataset contains 100 randomly generated strings of the same length. For each category, there were datasets for the lengths 10, 50, 10², 5 × 10², ..., 10⁵, 5 × 10⁵, 10⁶, and 5 × 10⁶. The minimum, average, and maximum times for each dataset were computed.
Since the variance for each dataset was minimal, the results for minimum times and the results for maximum times completely mimicked the results for the average times, so we only present the averages here.
Tables 1-4 and the graphs in Figures 7-10 from Section 7 clearly indicate that the performance of the three algorithms is linear and virtually indistinguishable. We expected IDLA and TRLA to exhibit linear behaviour on random strings, as such strings tend to have many, but short, right-maximal Lyndon substrings. However, we did not expect the results to be so close.
Despite the fact that IDLA performed in linear time on the random strings, it is relatively easy to force it into its worst quadratic performance. The dataset extreme_idla contains individual strings 0123...n−1 of the required lengths. Table 5 and the graph in Figure 11 from Section 7 show this clearly.
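For reference, the iterated Duval approach that IDLA is built on can be sketched in a few lines. This is our minimal rendering of Duval's first-factor computation, not the exact C++ code used in the experiments:

```cpp
#include <cassert>
#include <string>
#include <vector>

// For every position i (0-based), the length of the longest Lyndon prefix of
// x[i..n-1] is obtained by Duval's first-factor computation, restarted at
// every position; this makes the scheme quadratic in the worst case.
std::vector<std::size_t> idla(const std::string& x) {
    const std::size_t n = x.size();
    std::vector<std::size_t> lyn(n, 0);
    for (std::size_t i = 0; i < n; ++i) {
        std::size_t j = i + 1, k = i;
        while (j < n && x[k] <= x[j]) {       // scan while a Lyndon extension is possible
            k = (x[k] < x[j]) ? i : k + 1;    // restart or advance the period pointer
            ++j;
        }
        lyn[i] = j - k;                       // length of the first Lyndon factor of x[i..]
    }
    return lyn;
}
```

On a strictly increasing string such as 0123...n−1, every inner scan runs to the end of the string, which is what forces the quadratic behaviour observed on the extreme_idla dataset; on random strings, right-maximal Lyndon substrings tend to be short, and the scans terminate quickly.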
In Section 3.5, we describe how the three datasets, extreme_trla, extreme_trla1, and extreme_trla2, are generated and why. The results of experimenting with these datasets do not suggest that the worst-case complexity for TRLA is O(n log(n)). Yet again, the performances of the three algorithms are linear and virtually indistinguishable; see Tables 6, 7, and 8 and the graphs in Figures 12, 13 and 14 in Section 7.

Conclusions and Future Work
We present two novel algorithms for computing right-maximal Lyndon substrings. The first one, TRLA, has a simple implementation with a complicated theory behind it. Its average time complexity is linear in the length of the input string, and its worst-case complexity is no worse than O(n log(n)).
The τ-reduction used in the algorithm is an interesting reduction preserving right-maximal Lyndon substrings, a fact used significantly in the design of the algorithm. Interestingly, TRLA seems to slightly outperform BSLA, at least on the datasets used in our experiments. BSLA, the second algorithm, is linear and elementary in the sense that it does not require a pre-processed global data structure. Being linear and elementary, BSLA is more interesting, and it is possible that its implementation could be further streamlined. However, both the theory and the implementation of BSLA are rather complex.
On random strings, neither of the two algorithms was significantly better than the simple IDLA, whose implementation is just a few lines of code. However, IDLA's quadratic worst-case complexity is an obstacle, as our experiments indicated.
Additional effort needs to go into settling TRLA's worst-case complexity: the experiments performed did not indicate that it is worse than linear, even in the worst case. Both algorithms also need to be compared with efficient implementations of SSLA and BWLA.

Results
This section contains the measurements of the average times for the datasets discussed in the previous section. For a better understanding of the data, we present them in Tables 1-8 and Figures 7-14. All the graphs include the line y = x for reference.