Asymptotic Analysis of the kth Subword Complexity

Patterns within strings enable us to extract vital information regarding a string's randomness. Whether a string is random (showing little to no repetition of patterns) or periodic (showing repetitions of patterns) is captured by a value called the kth Subword Complexity of the character string. By definition, the kth Subword Complexity is the number of distinct substrings of length k that appear in a given string. In this paper, we evaluate the expected value and the second factorial moment (followed by a corollary on the second moment) of the kth Subword Complexity for binary strings over memoryless sources. We first take a combinatorial approach to derive a probability generating function for the number of occurrences of patterns in strings of finite length. This enables us to obtain exact expressions for the two moments in terms of the patterns' autocorrelation and correlation polynomials. We then investigate the asymptotic behavior for k = Θ(log n). In the proof, we compare the distribution of the kth Subword Complexity of binary strings to the distribution of distinct prefixes of independent strings stored in a trie. The methodology involves complex analysis, analytic poissonization and depoissonization, the Mellin transform, and saddle point analysis.


Introduction
Analyzing and understanding occurrences of patterns in a character string is helpful for extracting useful information regarding the nature of the string. We classify strings into low-complexity and high-complexity, according to their level of randomness. For instance, consider the binary string X = 10101010..., constructed by repetitions of the pattern w = 10. This string is periodic and therefore has low randomness. Such periodic strings are classified as low-complexity strings, whereas strings that do not show periodicity are considered to have high complexity. An effective way of measuring a string's randomness is to count all distinct patterns that appear as contiguous subwords in the string. This value is called the Subword Complexity. The name was given by Ehrenfeucht, Lee, and Rozenberg [1]; the notion was initially introduced by Morse and Hedlund in 1938 [2]. The higher the Subword Complexity, the more complex the string is considered to be.
Assessing information about the distribution of the Subword Complexity enables us to better characterize strings, and to detect atypically random or periodic strings whose complexities are far from the average complexity [3]. This type of string classification has applications in fields such as data compression [4], genome analysis (see [5][6][7][8][9]), and plagiarism detection [10]. For example, in data compression, a data set is considered compressible if it has low complexity, as it consists of repeated subwords. In computational genomics, the Subword Complexity (whose subwords are known as k-mers) is used in the detection of repeated sequences and in DNA barcoding [11,12]. k-mers are composed of the A, T, G, and C nucleotides.
For instance, the 7th Subword Complexity of the DNA sequence GTAGAGCTGT is four, meaning that there are 4 distinct substrings of length 7 in the given DNA sequence. Counting k-mers becomes challenging for longer DNA sequences. Our results can easily be extended to the alphabet {A, T, G, C} and directly applied to the theoretical analysis of genomic k-mer distributions under the Bernoulli probabilistic model, particularly when the length n of the sequence approaches infinity.
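The count in this example can be reproduced with a few lines of Python (an illustration only; the function name is ours):

```python
def kth_subword_complexity(s: str, k: int) -> int:
    """Count the distinct substrings (k-mers) of length k appearing in s."""
    return len({s[i:i + k] for i in range(len(s) - k + 1)})

# The example above: the DNA sequence GTAGAGCTGT contains 4 distinct 7-mers.
print(kth_subword_complexity("GTAGAGCTGT", 7))  # prints 4
```

The same function applies verbatim to binary strings, e.g. the periodic string 10101010 has 2nd Subword Complexity 2.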
There are two variations for the definition of the Subword Complexity: the one that counts all distinct subwords of a given string (also known as Complexity Index and Sequence Complexity [13]), and the one that only counts the subwords of the same length, say k, that appear in the string. In our work, we analyze the latter, and we call it the kth Subword Complexity to avoid any confusion.
Throughout this work, we consider the kth Subword Complexity of a random binary string of length n over a memoryless source, and we denote it by X n,k. We analyze the first and second factorial moments of X n,k for the range k = Θ(log n), as n → ∞. More precisely, we will divide the analysis into three ranges as follows.
Our approach involves two major steps. First, we choose a suitable model for the asymptotic analysis, and afterwards we provide proofs for the derivation of the asymptotic expansion of the first two factorial moments.

Part I
This part of the analysis is inspired by the earlier work of Jacquet and Szpankowski [14] on the analysis of suffix trees by comparing them to independent tries. A trie, first introduced by René de la Briandais in 1959 (see [15]), is a search tree that stores n strings according to their prefixes. A suffix tree, introduced by Weiner in 1973 (see [16]), is a trie where the strings are suffixes of a given string. An example of these data structures is given in Figure 1. A direct asymptotic analysis of the moments is a difficult task, as patterns in a string are not independent of each other. However, we note that each pattern in a string can be regarded as a prefix of a suffix of the string. Therefore, the number of distinct patterns of length k in a string is exactly the number of nodes of the suffix tree at level k and lower. It was shown by I. Gheorghiciuc and M. D. Ward [17] that the expected value of the kth Subword Complexity of a Bernoulli string of length n is asymptotically comparable to the expected number of nodes at level k of a trie built over n independent strings generated by a memoryless source.
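The trie model described here is easy to simulate; the sketch below (helper names and parameters are our choices) builds a trie over a set of strings and counts its nodes at level k, which is exactly the number of distinct length-k prefixes:

```python
import random

def trie_level_count(strings, k):
    """Build a trie over the strings and count its nodes at level k.
    A node at level k corresponds to one distinct length-k prefix."""
    root = {}
    for s in strings:
        node = root
        for ch in s[:k]:
            node = node.setdefault(ch, {})
    # walk the trie down k levels and count the surviving nodes
    level = [root]
    for _ in range(k):
        level = [child for node in level for child in node.values()]
    return len(level)

# hypothetical parameters for a quick demonstration
random.seed(42)
p, n, k = 0.6, 1000, 8
strings = ["".join("1" if random.random() < p else "0" for _ in range(20))
           for _ in range(n)]
print(trie_level_count(strings, k))  # the kth Prefix Complexity of this trie
```

The level-k count agrees with simply collecting the distinct length-k prefixes in a set; the trie version mirrors the data structure used in the analysis.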
We extend this analysis to the desired range for k, and we prove that the result holds when k grows logarithmically with n. Additionally, we show that, asymptotically, the second factorial moment of the kth Subword Complexity can also be estimated by the same independent model generated by a memoryless source. The proof of this theorem relies heavily on the characterization of the overlaps of the patterns with themselves and with one another. Autocorrelation and correlation polynomials explicitly describe these overlaps. The analytic properties of these polynomials are key to understanding repetitions of patterns in large Bernoulli strings. This, in conjunction with Cauchy's integral formula (used to compare the generating functions in the two models) and the residue theorem, provides solid verification that the second factorial moment of the Subword Complexity behaves the same as in the independent model.
To make this comparison, we derive the generating functions of the first two factorial moments in both settings. In a paper published by F. Bassino, J. Clément, and P. Nicodème in 2012 [18], the authors provide a multivariate probability generating function f (z, x) for the number of occurrences of patterns in a finite Bernoulli string. That is, given a pattern w, the coefficient of the term z n x m in f (z, x) is the probability in the Bernoulli model that a random string of size n has exactly m occurrences of the pattern w. Following their technique, we derive the exact expression for the generating functions of the first two factorial moments of the kth Subword Complexity. In the independent model, the generating functions are obtained by basic probability concepts.

Part II
This part of the proof is analogous to the analysis of the profile of tries [19]. To capture the asymptotic behavior, the expressions for the first two factorial moments in the independent trie are further refined by means of a Poisson process. The poissonized version yields generating functions in the form of harmonic sums for each of the moments. The Mellin transform and the inverse Mellin transform of these harmonic sums establish a connection between the asymptotic expansion and the singularities of the transformed function. This methodology is sufficient when the length k of the patterns is fixed. However, allowing k to grow with n makes the analysis more challenging. This is because for large k, the dominant term of the poissonized generating function may come from the term involving k, and singularities may not be significant compared to the growth of k. This issue is treated by combining the singularity analysis with a saddle point method [20]. The outcome of the analysis is a precise first-order asymptotics of the moments in the poissonized model. Depoissonization theorems are then applied to obtain the desired result in the Bernoulli model.

Results
For a binary string X = X 1 X 2 ...X n, where the X i's (i = 1, ..., n) are independent and identically distributed random variables, we assume that P(X i = 1) = p, P(X i = 0) = q = 1 − p, and p > q. We define the kth Subword Complexity, X n,k, to be the number of distinct substrings of length k that appear in a random string X with the above assumptions. In this work, we obtain the first-order asymptotics of the average and the second factorial moment of X n,k. The analysis is done in the range k = Θ(log n). We rewrite this range as k = a log n, and by performing a saddle point analysis, we will show that the growth rate depends on the location of a. In the first step, we compare the kth Subword Complexity to an independent model constructed in the following way: we store a set of n strings, independently generated by a memoryless source, in a trie. This means that each string is a sequence of independent and identically distributed Bernoulli random variables over the binary alphabet A = {0, 1}, with P(1) = p and P(0) = q = 1 − p. We denote the number of distinct prefixes of length k in the trie by X̃ n,k, and we call it the kth Prefix Complexity. Before proceeding any further, we recall that the factorial moments of a random variable are defined as follows.
Definition 1. The jth factorial moment of a random variable X is defined as E[(X) j ] = E[X(X − 1) · · · (X − j + 1)], where j = 1, 2, .... We will show that the first and second factorial moments of X n,k are asymptotically comparable to those of X̃ n,k when k = Θ(log n). We have the following theorems.
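As a quick illustration of the comparison behind these theorems, a small Monte Carlo experiment (function names, sample sizes, and the value of p are our choices) estimates both means; for k ≈ a log n the two estimates track each other closely:

```python
import random

def subword_complexity(s, k):
    """Number of distinct length-k substrings of s."""
    return len({s[i:i + k] for i in range(len(s) - k + 1)})

def rand_string(length, p, rng):
    """One Bernoulli(p) binary string."""
    return "".join("1" if rng.random() < p else "0" for _ in range(length))

def estimate_means(n, k, p, trials=100, seed=7):
    """Monte Carlo estimates of E[X_{n,k}] (subword model: one length-n string)
    and E[X~_{n,k}] (prefix model: n independent length-k strings in a trie)."""
    rng = random.Random(seed)
    sub = sum(subword_complexity(rand_string(n, p, rng), k)
              for _ in range(trials)) / trials
    pre = sum(len({rand_string(k, p, rng) for _ in range(n)})
              for _ in range(trials)) / trials
    return sub, pre

sub, pre = estimate_means(n=500, k=9, p=0.6)
print(sub, pre)  # the two estimates are close when k = Theta(log n)
```

This is only a numerical sanity check; the theorems below quantify the difference precisely.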

Theorem 1.
For large values of n, and for k = Θ(log n), there exists M > 0 such that E[X n,k ] − E[X̃ n,k ] = O(n^{−M}).
Theorem 2. We also prove a similar result for the second factorial moments of the kth Subword Complexity and the kth Prefix Complexity: for large values of n, and for k = Θ(log n), there exists ε > 0 such that the difference of the second factorial moments is O(n^{−ε}).
In the second part of our analysis, we derive the first-order asymptotics of the kth Prefix Complexity. The methodology used here is analogous to the analysis of the profile of tries [19]. The rate of asymptotic growth depends on the location of the value a, as seen in (1). For instance, for the average kth Subword Complexity E[X n,k ], we have the following observations.
i. For the range I 1 : 1/log(q^{−1}) < a < 2/(log(q^{−1}) + log(p^{−1})), the growth rate is of order O(2^k);
ii. in the range I 2 : 2/(log(q^{−1}) + log(p^{−1})) < a < 1/(q log(q^{−1}) + p log(p^{−1})), we observe some oscillations with n; and
iii. in the range I 3 : 1/(q log(q^{−1}) + p log(p^{−1})) < a < 1/log(p^{−1}), the average has linear growth O(n).
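The three interval boundaries depend only on p; a small helper (ours, assuming natural logarithms) computes them numerically:

```python
import math

def range_boundaries(p):
    """Endpoints of the ranges I1, I2, I3 for k = a log n (natural log),
    as stated in the observations above."""
    q = 1 - p
    lp, lq = math.log(1 / p), math.log(1 / q)
    return (1 / lq,                 # left end of I1
            2 / (lq + lp),          # boundary I1 / I2
            1 / (q * lq + p * lp),  # boundary I2 / I3 (reciprocal of the entropy)
            1 / lp)                 # right end of I3

print(range_boundaries(0.6))
```

For any p > q the four values come out in increasing order, so the three intervals are well defined.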
The above observations will be discussed in depth in the proofs of the following theorems. Theorem 3 gives the expansion of the average in each range; it involves the exponent ν = −r 0 + a log(p^{−r 0} + q^{−r 0}), where r 0 is the saddle point defined below, and a bounded periodic function Φ 1.

Theorem 4.
The second factorial moment of the kth Prefix Complexity has the following asymptotic expansion.
i. For a ∈ I 1 ; ii. for a ∈ I 2 ; iii. for a ∈ I 3 (the corresponding expansions are derived in the proof). The periodic function Φ 1 (x) in Theorems 3 and 4 is shown in Figure 2. The results of Theorem 4 carry over to the second moment of the kth Subword Complexity, as the analysis extends easily from the second factorial moment to the second moment. The variance of the kth Prefix Complexity, however, as seen in Figure 3, does not show the same asymptotic behavior as the variance of the kth Subword Complexity.

Groundwork
We first introduce some terminology and lemmas regarding overlaps of patterns and their numbers of occurrences in texts. Some of the notation we use in this work is borrowed from [18] and [21].

Definition 2.
For a binary word w = w 1 ...w k of length k, the autocorrelation index set is P(w) = { i : w 1 ...w i = w k−i+1 ...w k }, the autocorrelation set S w is the corresponding set of suffixes { w i+1 ...w k : i ∈ P(w) }, and the autocorrelation polynomial is S w (z) = ∑ i∈P(w) P(w i+1 ...w k ) z^{k−i}.
Definition 3. For distinct binary words w = w 1 ...w k and w′ = w′ 1 ...w′ k , the correlation index set is P(w, w′) = { i : w k−i+1 ...w k = w′ 1 ...w′ i }, the correlation set S w,w′ is the corresponding set of suffixes { w′ i+1 ...w′ k : i ∈ P(w, w′) }, and the correlation polynomial is S w,w′ (z) = ∑ i∈P(w,w′) P(w′ i+1 ...w′ k ) z^{k−i}.
The following two lemmas present the probability generating functions for the number of occurrences of a single pattern and of a pair of distinct patterns, respectively, in a random text of length n. For a detailed discussion of how such generating functions are obtained, refer to [18].
Lemma 1. The occurrence probability generating function F w (z, x) for a single pattern w in a binary text over a memoryless source is such that the coefficient [z^n x^m ] F w (z, x − 1) is the probability that a random binary string of length n has m occurrences of the pattern w.
Lemma 2. The occurrence PGF for two distinct patterns of length k in a Bernoulli random text is such that the coefficient of z^n x 1^{m 1} x 2^{m 2} is the probability that there are m 1 occurrences of w and m 2 occurrences of w′ in a random string of length n.
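The definitions above translate directly into code; the sketch below (helper names are ours) evaluates the autocorrelation and correlation polynomials of short binary words under the memoryless source:

```python
def word_prob(v, p):
    """P(v) = p^{#1s} q^{#0s} for a binary word v under the memoryless source."""
    q = 1 - p
    return p ** v.count("1") * q ** v.count("0")

def autocorrelation_poly(w, z, p):
    """S_w(z) = sum over i with w_1..w_i = w_{k-i+1}..w_k
    of P(w_{i+1}..w_k) z^{k-i} (i = k gives the constant term 1)."""
    k = len(w)
    return sum(word_prob(w[i:], p) * z ** (k - i)
               for i in range(1, k + 1) if w[:i] == w[k - i:])

def correlation_poly(w, wp, z, p):
    """S_{w,w'}(z): sum over proper overlaps where a length-i suffix of w
    equals a prefix of w', of P(w'_{i+1}..w'_k) z^{k-i}."""
    k = len(w)
    return sum(word_prob(wp[i:], p) * z ** (k - i)
               for i in range(1, k) if w[k - i:] == wp[:i])

# e.g. w = 101 overlaps itself at shift 2, so S_w(z) = 1 + pq z^2
print(autocorrelation_poly("101", 1.0, 0.5))  # prints 1.25
```

For p = q = 1/2 this matches the classical example S 101 (z) = 1 + z²/4 evaluated at z = 1.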
The above results will be used to find the generating functions for the first two factorial moments of the kth Subword Complexity in the following section.

Derivation of Generating Functions
i. We express E[X n,k ] as a sum of indicator variables over all words w of length k. By defining f w (z) = F w (z, 0) and using (10), we obtain the generating function for E[X n,k ]. Having this function, we derive the following result.
ii. For this part, we first note that, due to the properties of indicator random variables, the second factorial moment reduces to a single sum over pairs of distinct words. We proceed by defining a second indicator variable as follows.
Finally, we are able to express E[(X n,k ) 2 ] in the above form. In the following lemma, we present the generating functions for the first two factorial moments of the kth Prefix Complexity in the independent model.
Proof. i. We define the indicator variable for the kth Prefix Complexity X̃ n,k as follows.
Summing over all words w of length k determines the generating function Ĥ(z). ii. Similarly to (18) and (20), we obtain the generating function below.
Our first goal is to compare the coefficients of the generating functions in the two models. The coefficients are expected to be asymptotically equivalent in the desired range for k. To compare the coefficients, we need more information on the analytic properties of these generating functions. This will be discussed in Section 3.3.

Analytic Properties of the Generating Functions
Here, we turn our attention to the smallest singularities of the two generating functions given in Lemma 3. It has been shown by Jacquet and Szpankowski [21] that D w (z) = (1 − z)S w (z) + z^k P(w) has exactly one root in the disk |z| ≤ ρ. Following the notation of [21], we denote this root of D w (z) by A w ; bootstrapping yields A w ≈ 1 + P(w)/S w (1). We also denote the derivative of D w (z) at the root A w by B w . In this paper, we prove a similar result for the polynomial D w,w′ (z) through the following work.
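For a concrete pattern, the root A w can be located numerically; the sketch below assumes the form D w (z) = (1 − z)S w (z) + z^k P(w) from [21] (helper names are ours) and finds the unique real root in (1, ρ) by bisection:

```python
def word_prob(v, p):
    """P(v) = p^{#1s} q^{#0s} under the memoryless source."""
    q = 1 - p
    return p ** v.count("1") * q ** v.count("0")

def S_w(w, z, p):
    """Autocorrelation polynomial of w (see Definition 2)."""
    k = len(w)
    return sum(word_prob(w[i:], p) * z ** (k - i)
               for i in range(1, k + 1) if w[:i] == w[k - i:])

def root_A_w(w, p, rho=1.3, tol=1e-12):
    """Locate the real root A_w of D_w(z) = (1-z) S_w(z) + z^k P(w) in (1, rho)
    by bisection, assuming D_w(1) > 0 and D_w(rho) < 0."""
    k, Pw = len(w), word_prob(w, p)
    D = lambda z: (1 - z) * S_w(w, z, p) + z ** k * Pw
    lo, hi = 1.0, rho
    while hi - lo > tol:
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if D(mid) > 0 else (lo, mid)
    return (lo + hi) / 2

A = root_A_w("101", 0.5)
print(A)  # close to the bootstrap estimate 1 + P(w)/S_w(1) = 1.1
```

For short patterns the bootstrap estimate is only a first-order guide; the numerical root for w = 101 lies a bit above 1.1, reflecting the O(P(w)²)-type corrections.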
Lemma 5. If w and w′ are two distinct binary words of length k and δ = √p, there exists ρ > 1 such that the stated bound on the correlation polynomials holds. This leads to the following lemma. Lemma 6. There exist K > 0 and ρ > 1 such that pρ < 1 and such that, for every pair of distinct words w and w′ of length k ≥ K, and for |z| ≤ ρ, we have |S w (z)S w′ (z) − S w,w′ (z)S w′,w (z)| > 0. In other words, S w (z)S w′ (z) − S w,w′ (z)S w′,w (z) does not have any roots in |z| ≤ ρ.
Proof. There are three cases to consider. Case i. When either S w (z) = 1 or S w′ (z) = 1, every term of S w,w′ (z)S w′,w (z) has degree k or larger, and there exists K 1 > 0 for which the claim holds. Case ii. If the minimal degree of S w (z) − 1 or of S w′ (z) − 1 is greater than k/2, then every term of S w,w′ (z)S w′,w (z) has degree at least k/2. We also note that, by Lemma 9, |S w (z)S w′ (z)| > 0. Therefore, there exists K 2 > 0 for which the claim holds. Case iii. The only remaining case is where the minimal degrees of S w (z) − 1 and S w′ (z) − 1 are both at most k/2.
There exists K 3 > 0 such that the required bound holds. Similarly, we can show that there exists K′ 3 such that |S w,w′ (z)| < |S w (z)|. Therefore, for k > K 3 we have the desired inequality. We complete the proof by setting K = max{K 1 , K 2 , K 3 , K′ 3 }.
Lemma 7. There exist K w,w′ > 0 and ρ > 1 such that pρ < 1 and, for every pair of words w and w′ of length k ≥ K w,w′ , the polynomial D w,w′ (z) has exactly one root in the disk |z| ≤ ρ.
Proof. There exist K′ and K″ large enough such that, for k > max{K′, K″}, the leading factor has only one root in |z| ≤ ρ, and consequently D w,w′ (z) also has exactly one root in |z| ≤ ρ.
We denote the root of D w,w′ (z) within the disk |z| ≤ ρ by α w,w′ , which we estimate by bootstrapping, and we denote the derivative of D w,w′ (z) at the root α w,w′ by β w,w′ . We will refer to these expressions in the residue analysis presented in the next section.

Asymptotic Difference
We begin this section with the following lemmas on the autocorrelation polynomials.
Lemma 8 (Jacquet and Szpankowski, 1994). For most words w, the autocorrelation polynomial S w (z) is very close to 1, with high probability. More precisely, if w is a binary word of length k and δ = √p, there exists ρ > 1 such that the tail bound of [21] holds, where θ = (1 − p)^{−1} and we use Iverson bracket notation. Lemma 9 (Jacquet and Szpankowski, 1994). There exist K > 0 and ρ > 1 such that pρ < 1 and, for every binary word w of length k ≥ K and for |z| ≤ ρ, |S w (z)| is bounded away from zero. In other words, S w (z) does not have any roots in |z| ≤ ρ.

Lemma 10.
With high probability, for most distinct pairs {w, w′}, the correlation polynomial S w,w′ (z) is very close to 0. More precisely, if w and w′ are two distinct binary words of length k and δ = √p, there exists ρ > 1 such that S w,w′ (1) = O(δ^k ) with high probability. We will use the above results to prove that the expected values in the Bernoulli model and in the model built over a trie are asymptotically equivalent. We now prove Theorem 1.

Proof of Theorem 1. From Lemmas 3 and 4, we have
Subtracting the two generating functions, we obtain the difference Ĥ w (z). Therefore, by Cauchy's integral formula (see [20]), we have a contour integral representation, where the path of integration is a circle about zero with counterclockwise orientation. We note that the integrand has poles at z = 0, z = 1/(1 − P(w)), and z = A w (refer to expression (29)). Therefore, we define I w n (ρ), where the circle of radius ρ contains all of the above poles. By the residue theorem, we compute the residues, and we observe that, with B w as in (30), Res z=1/(1−P(w)) Ĥ w (z)/z^{n+1} = −(1 − P(w))^{n+1}.
Collecting these residues and extracting the coefficient of z^n, we first show that, for sufficiently large n, the sum approaches zero.
Lemma 11. For large enough n, and for k = Θ(log n), there exists M > 0 such that the sum is of order O(n^{−M}). Proof. We let r w (x) be the function above; its Mellin transform is r* w (s). We define the constant C w , which is negative and uniformly bounded for all w. Also, for a fixed s, we have the stated bound, and therefore we obtain the expression for r* w (s). From this expression, and noticing that the function has a removable singularity at s = 0, we see that the Mellin transform r* w (s) exists on the strip ℜ(s) > −1. We still need to investigate the Mellin strip for the sum ∑ w∈A k r* w (s); in other words, we need to examine whether summing r* w (s) over all words of length k (where k grows with n) has any effect on the analyticity of the function. We observe that Lemma 8 allows us to split the sum between the words for which S w (1) ≤ 1 + O(δ^k ) and the words with S w (1) > 1 + O(δ^k ).
This shows that ∑ w∈A k r* w (s) is bounded above for ℜ(s) > −1 and is therefore analytic there. This argument holds for k = Θ(log n) as well, as (q^k )^{−ℜ(s)−1} is still bounded above by a constant M s,k that depends on s and k.
We would like to approximate ∑ w∈A k r* w (s) as the argument tends to infinity. By the inverse Mellin transform, we choose c ∈ (−1, M) for a fixed M > 0; then, by the direct mapping theorem [22], we obtain the asymptotic bound and, subsequently, the claim. We next prove the asymptotic smallness of I w n (ρ) in (54).

Lemma 12.
Let I w n (ρ) be as defined above. For large n and k = Θ(log n), we show that the sum of these terms is asymptotically negligible. Proof. We first show that, for |z| = ρ, the denominator in (71) is bounded away from zero.
To find a lower bound for |1 − (1 − P(w))z|, we choose K w large enough that the bound holds. We now move on to finding an upper bound for the numerator in (71), for |z| = ρ.
Therefore, there exists a constant µ > 0 such that the integrand is bounded. Summing over all patterns w and applying Lemma 8, we obtain a bound that approaches zero as n → ∞ with k = Θ(log n). This completes the proof of Theorem 1.
Similarly to Theorem 1, we prove that the second factorial moments of the kth Subword Complexity and the kth Prefix Complexity have the same first-order asymptotic behavior. We are now ready to state the proof of Theorem 2.

Proof of Theorem 2. As discussed in Lemmas 3 and 4, we have the generating functions representing E[(X n,k ) 2 ] and E[(X̃ n,k ) 2 ], respectively. Note that in Theorem 1 we proved that for every M > 0 (which does not depend on n or k), the difference of the expected values is O(n^{−M}). Therefore, both (77) and (78) are of order (2^k − 1)O(n^{−M}) = O(n^{−M+a log 2}) for k = a log n. Thus, to show the asymptotic smallness, it is enough to choose M = a log 2 + ε, where ε is a small positive value. It only remains to show that (79) is asymptotically negligible as well. We define the corresponding difference and extract the coefficient of z^n, where the path of integration is a circle about the origin with counterclockwise orientation. The integrand has poles at z = 0, z = α w,w′ (as in (46)), and z = 1/(1 − P(w) − P(w′)). We have chosen ρ such that all these poles are inside the circle |z| = ρ. It follows that the residue theorem applies, and the residues give us the following.
The residue at z = α w,w′ is expressed in terms of β w,w′ as in (47). Therefore, we obtain an expression with two remaining terms, and we now show that both are asymptotically small.

Lemma 13.
There exists ε > 0 such that the sum is of order O(n^{−ε}).
Proof. We define r w,w′ (x) as above; its Mellin transform is r* w,w′ (s), involving the constant C w,w′ . We note that C w,w′ is negative and uniformly bounded from above for all w, w′ ∈ A^k . For a fixed s, we also have the analogous bounds, and therefore we obtain the transform. To find the Mellin strip for the sum ∑ w∈A k r* w,w′ (s), we first note that (x + y)^a ≤ x^a + y^a for any real x, y > 0 and a ≤ 1.
Since −ℜ(s) < 1, we have the corresponding bounds. By Lemma 10, with high probability a randomly selected pair satisfies S w,w′ (1) = O(δ^k ), and by Lemma 8, for most words w, S w (1) is close to 1. Therefore, both sums (91) and (93) are of the form (2^k − 1)O(δ^k ); the sums (92) and (94) are also of order (2^k − 1)O(δ^k ) by Lemma 10. Combining all these terms, and applying the inverse Mellin transform with k = a log n, M = a log 2 + ε, and c ∈ (−1, M), we obtain the claim. In the following lemma, we show that the first term in (85) is asymptotically small.

Lemma 14.
Recall the definition of the integral above. Proof. First note that, as we saw in (73), |1 − (1 − P(w′))z| ≥ c 2 . For |z| = ρ, |D w,w′ (z)| is also bounded below, and bounded away from zero by the assumption of Lemma 7. Additionally, the numerator in (98) is bounded above. By (75), the first term above is of order (2^k − 1)O(ρ^{−n+k}), and by Lemma 10 and an analysis similar to (75), the second term yields (2^k − 1)O(ρ^{−n+k}) as well. Finally, we obtain a bound which goes to zero asymptotically for k = Θ(log n).
This lemma completes our proof of Theorem 2.

Asymptotic Analysis of the kth Prefix Complexity
We finally proceed to analyzing the asymptotic moments of the kth Prefix Complexity. The results obtained hold for the moments of the kth Subword Complexity as well. Our methodology involves poissonization, saddle point analysis (the complex version of Laplace's method [23]), and depoissonization. We rely on the following depoissonization result (Jacquet and Szpankowski, 1998): let G̃(z) be the Poisson transform of a sequence g n . If G̃(z) is analytic in a linear cone S θ with θ < π/2, if |G̃(z)| = O(|z|^ν) for z ∈ S θ and real values B, ν > 0, and if a matching growth condition holds outside the cone, then g n = G̃(n)(1 + o(1)).
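Before the analysis, the effect of poissonization on the mean can be illustrated numerically; the sketch below (function names are ours) compares the exact mean ∑ w (1 − (1 − P(w))^n) with its Poisson transform ∑ w (1 − e^{−zP(w)}) evaluated at z = n, grouping words by their number of 1s:

```python
import math

def exact_mean(n, k, p):
    """E[X~_{n,k}] = sum_w (1 - (1 - P(w))^n), grouping the 2^k words
    by their number of 1s, since P(w) depends only on that count."""
    q = 1 - p
    return sum(math.comb(k, j) * (1 - (1 - p ** j * q ** (k - j)) ** n)
               for j in range(k + 1))

def poissonized_mean(z, k, p):
    """Poisson transform E~_k(z) = sum_w (1 - exp(-z P(w)))."""
    q = 1 - p
    return sum(math.comb(k, j) * (1 - math.exp(-z * p ** j * q ** (k - j)))
               for j in range(k + 1))

n, k, p = 10000, 12, 0.6
print(exact_mean(n, k, p), poissonized_mean(n, k, p))  # nearly equal
```

The grouping trick reduces the sum over 2^k words to k + 1 terms, which keeps the computation feasible even for moderately large k.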

On the Expected Value:
To transform the sequence of interest, (E[X̃ n,k ]) n≥0 , into a Poisson model, we recall that in (25) we found E[X̃ n,k ] = ∑ w∈A k (1 − (1 − P(w))^n ). Thus, the Poisson transform is Ẽ k (z) = ∑ w∈A k (1 − e^{−zP(w)} ). To asymptotically evaluate this harmonic sum, we turn our attention to the Mellin transform once more. The Mellin transform of Ẽ k (z) is Ẽ* k (s) = −Γ(s)(p^{−s} + q^{−s})^k , which has the fundamental strip s ∈ (−1, 0). For c ∈ (−1, 0), the inverse Mellin integral follows, where we define h(s) = s/a − log(p^{−s} + q^{−s}) for k = a log z. We emphasize that the above integral involves k, and k grows with n. We evaluate the integral through saddle point analysis; therefore, we choose the line of integration to cross the saddle point r 0 . To find the saddle point r 0 , we set h′(r 0 ) = 0, which gives 1/a = (p^{−r 0} log(p^{−1}) + q^{−r 0} log(q^{−1}))/(p^{−r 0} + q^{−r 0}). By (108) and the fact that (p/q)^{it j} = 1 for t j = 2πj/log(p/q) and j ∈ Z, we can see that there are actually infinitely many saddle points z j of the form r 0 + it j on the line of integration.
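The saddle point equation h′(r 0 ) = 0 can be solved numerically; the sketch below (ours) uses bisection on h′, which is monotone in s, and as a sanity check recovers r 0 = 0 and r 0 = −1 at the boundaries between the ranges for a described above:

```python
import math

def saddle_point(a, p, lo=-40.0, hi=40.0, tol=1e-12):
    """Solve h'(r0) = 0 for h(s) = s/a - log(p^{-s} + q^{-s}) by bisection.
    h'(s) = 1/a - (p^{-s} log(1/p) + q^{-s} log(1/q)) / (p^{-s} + q^{-s})
    is monotone decreasing, so the sign of h' brackets the root."""
    q = 1 - p
    def hp(s):
        pw, qw = p ** (-s), q ** (-s)
        return 1 / a - (pw * math.log(1 / p) + qw * math.log(1 / q)) / (pw + qw)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if hp(mid) > 0 else (lo, mid)
    return (lo + hi) / 2

p, q = 0.6, 0.4
lp, lq = math.log(1 / p), math.log(1 / q)
print(saddle_point(2 / (lp + lq), p))        # ~ 0  (boundary I1 / I2)
print(saddle_point(1 / (p * lp + q * lq), p))  # ~ -1 (boundary I2 / I3)
```

A root exists for any a strictly between 1/log(q^{−1}) and 1/log(p^{−1}), i.e. throughout the range k = Θ(log n) considered here.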
In the first range, which corresponds to r 0 > 0, we perform a residue analysis, taking into account the dominant pole at s = 0. In the second range, r 0 ∈ (−1, 0), and we get the asymptotic result through the saddle point method. The last range corresponds to r 0 < −1, and we approach it with a combination of residue analysis at s = −1 and the saddle point method. We now proceed by stating the proof of Theorem 3.

Proof of Theorem 3.
We begin by proving part ii, which requires a saddle point analysis. We rewrite the inverse Mellin transform with the line of integration at ℜ(s) = r 0 . Step one: the saddle points' contribution to the integral estimate. First, we show that the saddle points with |t j | > √log n do not have a significant asymptotic contribution to the integral, as the corresponding tail of the integral is very small for large n. Note that for t ∈ (√log n, ∞), t^{r 0 /2−1/2} is decreasing and bounded above by (log n)^{r 0 /4−1/4}.
Step two: partitioning the integral. There are now only finitely many saddle points to work with. We split the integration range into sub-intervals, each of which contains exactly one saddle point. This way, each integral has a contour traversing a single saddle point, and we will be able to estimate the dominant contribution in each integral from a small neighborhood around the saddle point. Assuming that j* is the largest j for which 2πj/log(p/q) ≤ √log n, we split the integral Ẽ k (z) accordingly. By the same argument as in (115), the second term in (116) is also asymptotically negligible, and we are left with the sum of the integrals S j . Step three: splitting the saddle contour. For each integral S j , we write the expansion of h(t) about t j . The main contribution to the integral estimate should come from a small integration path on which kh(t) reduces to its quadratic expansion about t j ; in other words, we want the path to be wide enough that k|t − t j |^2 is large at its endpoints, yet narrow enough that k|t − t j |^3 remains small. These conditions hold when k^{−1/2} ≪ |t − t j | ≪ k^{−1/3}; thus, we choose the integration path to be |t − t j | ≤ k^{−2/5}. We then prune the saddle tails.
Over the main path, the integrals take the form of Gaussian integrals with lower-order corrections. Therefore, by Laplace's theorem (refer to [22]), we obtain the central approximation for each S j . We finally sum over all j with |j| < j*, and we can rewrite Ẽ k (z) in terms of ν = −r 0 + a log(p^{−r 0} + q^{−r 0}) and a periodic function. For part i, we move the line of integration to r 0 ∈ (0, ∞). Note that in this range we must consider the contribution of the pole at s = 0. Computing the residue at s = 0, and following the same analysis as above for the remaining integral, we arrive at the expansion for Ẽ k (z). For part iii of Theorem 3, we shift the line of integration to c 0 ∈ (−2, −1); then we have a residue contribution together with a remainder of order z^{ν 0}, where ν 0 = −c 0 + a log(p^{−c 0} + q^{−c 0}) < 1.
This completes the proof of Theorem 3.

On the Second Factorial Moment:
We poissonize the sequence (E[(X̃ n,k ) 2 ]) n≥0 as well. By the analysis in (27), we obtain the poissonized form G̃ k (z). We show that, in all ranges of a, the leftover sum in (138) has a lower-order contribution to G̃ k (z) compared to (Ẽ k (z))^2 . We define L̃ k (z) to be this leftover sum. In the first range for k, we take the Mellin transform of L̃ k (z), and we note that its fundamental strip is (−2, 0) as well. The inverse Mellin transform is taken for c ∈ (−2, 0). The integrand in (141) is quite similar to the one seen in (107); the only difference is the extra factor 2^{−s−1} − 1. However, 2^{−s−1} − 1 is analytic and bounded. Thus, we obtain the same saddle points, with the real part as in (109) and imaginary parts of the form 2πij/log(p/q), j ∈ Z, and the same saddle point analysis as for the integral in (107) applies to L̃ k (z). We avoid repeating the similar steps and skip to the central approximation, where, by Laplace's theorem (ref. [22]), we obtain an expansion showing that L̃ k (z) = O(z^ν log n) in this range. Subsequently, for 1/log(q^{−1}) < a < 2/(log(q^{−1}) + log(p^{−1})), and for (p^2 + q^2)/(q^2 log(q^{−1}) + p^2 log(p^{−1})) < a < 1/log(p^{−1}), we obtain the corresponding bounds. It is not difficult to see that, in each range of a as stated above, L̃ k (z) has a lower-order contribution to the asymptotic expansion of G̃ k (z) compared to (Ẽ k (z))^2 . This leads us to Theorem 4, which is proved below.
Therefore both depoissonization conditions are satisfied and the desired result follows.

Corollary. A Remark on the Second Moment and the Variance
For the second moment, we have E[X̃ n,k ^2 ] = E[(X̃ n,k ) 2 ] + E[X̃ n,k ], where (X̃ n,k ) 2 denotes the second factorial power. Therefore, by (105) and (138), the Poisson transform of the second moment, which we denote by G̃ k,2 (z), results in the same first-order asymptotics as the second factorial moment. It is also not difficult to extend the proof of Theorem 2 to show that the second moments of the two models are asymptotically the same. For the variance, we have Var[X̃ n,k ] = E[X̃ n,k ^2 ] − (E[X̃ n,k ])^2 , which involves the sum ∑ w∈A k ((1 − P(w))^n − (1 − P(w))^{2n} ).
This is quite similar to what we saw in (106), which indicates that the variance has the same asymptotic growth as the expected value. However, the variances of the two models do not behave in the same way (cf. Figure 3).

Summary and Conclusions
We studied the first-order asymptotic growth of the first two (factorial) moments of the kth Subword Complexity. We recall that the kth Subword Complexity of a string of length n is denoted by X n,k and is defined as the number of distinct subwords of length k that appear in the string. We are interested in the asymptotic analysis when k grows as a function of the string's length; more specifically, we conduct the analysis for k = Θ(log n) as n → ∞.
The analysis is inspired by the earlier work of Jacquet and Szpankowski on the analysis of suffix trees, where they are compared to independent tries (cf. [14]). In our work, we compare the first two moments of the kth Subword Complexity to the kth Prefix Complexity over a random trie built over n independently generated binary strings. We recall that we define the kth Prefix Complexity as the number of distinct prefixes that appear in the trie at level k and lower.
We obtain the generating functions whose coefficients represent the expected value and the second factorial moment in both settings. We prove that the first two moments have the same asymptotic growth in both models. To derive the asymptotic behavior, we split the range for k into three intervals and analyze each using the saddle point method in combination with residue analysis. We close our work with some remarks comparing the second moment and the variance to those of the kth Prefix Complexity.

Future Challenges
The intervals' endpoints for a in Theorems 3 and 4 are not investigated in this work. The asymptotic analysis at the endpoints can be carried out using the van der Waerden saddle point method [24].
The analogous results are not (yet) known in the case where the underlying probability source has Markovian dependence or in the case of dynamical sources.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript:

PGF: Probability Generating Function
P: Probability
E: Expected value
Var: Variance
E[(X n,k ) 2 ]: the second factorial moment of X n,k