Article

Spheres of Strings Under the Levenshtein Distance

by
Said Algarni
and
Othman Echi
*,†
Department of Mathematics, College of Computing and Mathematics, King Fahd University of Petroleum and Minerals, Dhahran 31261, Saudi Arabia
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Axioms 2025, 14(8), 550; https://doi.org/10.3390/axioms14080550
Submission received: 3 May 2025 / Revised: 14 July 2025 / Accepted: 20 July 2025 / Published: 22 July 2025

Abstract

Let $\Sigma$ be a nonempty set of characters, called an alphabet. The run-length encoding (RLE) algorithm processes any nonempty string $u$ over $\Sigma$ and produces two outputs: a $k$-tuple $(b_1, b_2, \ldots, b_k)$, where each $b_i$ is a character and $b_{i+1} \neq b_i$; and a corresponding $k$-tuple $(q_1, q_2, \ldots, q_k)$ of positive integers, so that the original string can be reconstructed as $u = b_1^{q_1} b_2^{q_2} \cdots b_k^{q_k}$. The integer $k$ is termed the run-length of $u$ and is denoted by $\rho(u)$. By convention, we let $\rho(\varepsilon) = 0$. In the Euclidean space $(\mathbb{R}^n, \|\cdot\|_2)$, the volume of a sphere is determined solely by the dimension $n$ and the radius, following well-established formulas. However, for spheres of strings under the edit metric, the situation is more complex, and no general formulas have been identified. This work shows that the volume of the sphere $S_L(u, 1)$, composed of all strings at Levenshtein distance 1 from $u$, depends on the specific structure of the RLE-decomposition of $u$. Notably, this volume equals $(2l(u) + 1)s - 2l(u) + \rho(u)$, where $s$ is the size of the alphabet, $\rho(u)$ represents the run-length of $u$, and $l(u)$ denotes its length (i.e., the number of characters in $u$). Given an integer $p \geq 2$, we present a partial result concerning the computation of the volume $|S_L(u, p)|$ in the specific case where the run-length $\rho(u) = 1$. More precisely, for a fixed integer $n \geq 1$ and a character $a \in \Sigma$, we explicitly compute the volume of the Levenshtein sphere of radius $p$ centered at the string $u = a^n$. This case corresponds to the simplest run structure and serves as a foundational step toward understanding the general behavior of Levenshtein spheres.

1. Introduction

The Levenshtein metric [1], denoted $L$, is widely used for identifying the closest valid strings to a misspelled term by evaluating edit distances. This metric is also pivotal in genetic sequence comparison, where it quantifies the number of mutations required to transform one sequence into another, thereby elucidating evolutionary relationships. Furthermore, it plays a significant role in plagiarism detection, text similarity analysis, and the comparison of documents or code to identify discrepancies (see, for instance, [2]).
In natural language processing (NLP), the edit metric is crucial for tasks like machine translation (see [3,4]), where it helps measure the similarity between machine-generated translations and human-created reference texts. Additionally, it is employed to detect duplicate records in databases by assessing the similarity between entries.
In voice recognition, the edit metric serves to accurately compare phonetic transcriptions, aiding in the correct identification of spoken strings [5]. Moreover, it helps correct OCR errors by suggesting the most likely corrections based on the recognized text and a reference dictionary.
The Hamming metric, extensively used in coding theory, particularly in error-correcting codes like Hamming codes, is essential for detecting and correcting errors in data transmission or storage (see [6,7,8]). It is also used in computer science for comparing binary data, such as hashes or binary fingerprints [9].
In bioinformatics, the Hamming metric is utilized to compare genetic sequences or protein structures of equal length, accounting for substitutions without considering insertions or deletions. Both the Hamming and edit metrics are effective tools in bioinformatics and computational biology (see [10,11]).
The Hamming metric is also applied in machine learning for clustering and classification tasks, particularly with binary or categorical data.
Run-length encoding (RLE) is a fundamental data compression technique in which sequences of the same data value, called runs, are replaced by a single value followed by a count. This is especially effective for compressing data with many runs, such as simple graphics and animations (see [12]).
Throughout this paper, $\Sigma$ represents a (finite) alphabet of size $s \in \mathbb{N}$; $\Sigma^*$ denotes the set of all finite strings, including the empty string $\varepsilon$; and $\Sigma^+$ is the set of all nonempty strings over $\Sigma$.
Given a nonempty string $u$ of length $n$ over the alphabet $\Sigma$, RLE produces two $k$-tuples:
(i) $(a_1, a_2, \ldots, a_k)$, where each $a_i$ is a character and $a_i \neq a_{i+1}$;
(ii) $(r_1, r_2, \ldots, r_k)$, where each $r_i \in \mathbb{N}$, with $u = a_1^{r_1} a_2^{r_2} \cdots a_k^{r_k}$.
The integer $k$, known as the run-length of $u$, is denoted $\rho(u)$. By convention, we let $\rho(\varepsilon) = 0$.
For $r \in \mathbb{R}$ with $r > 0$ and a vector $x \in \mathbb{R}^n$, the sphere and the ball are defined by
$$S_n(x, r) := \{ y \in \mathbb{R}^n : \|x - y\|_2 = r \}, \qquad B_n(x, r) := \{ y \in \mathbb{R}^n : \|x - y\|_2 \leq r \},$$
where $\|\cdot\|_2$ represents the Euclidean norm. The volumes of $S_n(x, r)$ and of $B_n(x, r)$ depend only on the radius $r$ (and the dimension $n$), not on the center $x$.
If $u$ is a string, we denote its length by $l(u)$, that is, the number of characters in $u$. The set of all strings of length $n$ formed from the alphabet $\Sigma$ is denoted by $\Sigma^{(n)}$.
For the Hamming and Levenshtein metrics $H$ and $L$ on the monoid $\Sigma^*$ of strings over the alphabet $\Sigma$ of size $s$, consider $p \in \mathbb{N}$ and a string $u$ of length $n$. We define
$$S_H(u, p) := \{ v \in \Sigma^* : l(u) = l(v) = n \text{ and } H(u, v) = p \}, \qquad S_L(u, p) := \{ v \in \Sigma^* : L(u, v) = p \}.$$
For further information regarding volume formulas for balls of strings, see [13].
Given a string $u$ of length $n$ over an alphabet $\Sigma$, the volumes of $S_H(u, p)$ and $B_H(u, p)$ are precisely $\binom{n}{p}(s-1)^p$ and $\sum_{i=0}^{p} \binom{n}{i}(s-1)^i$, respectively (see Section 2). These volumes are solely determined by $n = l(u)$, the alphabet size $s$, and the radius $p$. However, for the edit metric, this is not the case. We show that the volume of $S_L(u, 1)$ (consisting of the Levenshtein neighbors of $u$) is influenced by the structure of the RLE-decomposition of $u$. Specifically, this volume equals $(2l(u) + 1)s - 2l(u) + \rho(u)$.
Given integers $p \geq 2$ and $n \geq 1$ and a character $a \in \Sigma$, we compute the volume of the Levenshtein sphere $S_L(a^n, p)$.

2. Spheres of Strings Under the Hamming Metric

Let $\Sigma$ be an alphabet of size $s \geq 1$, and let $\Sigma^*$ represent the set of all possible strings over $\Sigma$, including the empty string $\varepsilon$. The set $\Sigma^+$ is defined as $\Sigma^* \setminus \{\varepsilon\}$, which includes all nonempty strings over $\Sigma$.
Consider two strings $u$ and $v$ over $\Sigma$ of the same length, $l(u) = l(v)$. The Hamming metric between $u$ and $v$, denoted by $H(u, v)$, is the number of positions where the corresponding characters in $u$ and $v$ differ. Equivalently, it is the smallest number of substitutions needed to transform one string into the other.
Named after Richard Hamming, the Hamming metric was initially introduced to aid in error detection and correction in data transmission [6]. Since then, it has been applied in various fields, particularly in coding theory and information theory.
It is worth noting that the Hamming distance on strings is closely related to hypercubes and $k$-ary $n$-cube structures; see, for instance, [14,15].
Formally, if $u = u_1 u_2 \cdots u_n$ and $v = v_1 v_2 \cdots v_n$ are two strings of length $n$ over the alphabet $\Sigma$, the Hamming metric $H(u, v)$ is defined by $H(u, v) = \sum_{i=1}^{n} \delta(u_i, v_i)$, where $\delta(u_i, v_i) = 0$ if $u_i = v_i$ and $1$ otherwise. This function counts the positions $i$ where $u_i$ and $v_i$ differ.
This section reviews how to count the strings $v$ at Hamming distance $p$ from a string $u$.
We now discuss a well-known formula for the volume of a $p$-Hamming sphere, demonstrating that $|S_H(u, p)|$ depends solely on $p$, the length $l(u)$, and the size $s$ of the alphabet $\Sigma$.
To construct all the strings in $S_H(u, p)$, we select a subset $I$ of $[n] = \{1, \ldots, n\}$ of size $p$, which can be done in $\binom{n}{p}$ ways. For each chosen subset $I$, there are $(s-1)^p$ possible $p$-tuples of characters $(v_i)_{i \in I}$ with $v_i \in \Sigma \setminus \{u_i\}$. This results in the expression $|S_H(u, p)| = (s-1)^p \binom{n}{p}$.
Another approach to derive this formula uses the "combinatorial additive rule". Let $P(n, p) = \{ I \subseteq \{1, \ldots, n\} : |I| = p \}$. Consider the mapping
$$\varphi : S_H(u, p) \longrightarrow P(n, p), \qquad v = v_1 \cdots v_n \longmapsto \{ i \in [n] : v_i \neq u_i \}.$$
Given that $|S_H(u, p)| = \sum_{I \in P(n, p)} |\varphi^{-1}(I)|$, and noting that
$$\gamma : \varphi^{-1}(I) \longrightarrow \prod_{i \in I} \bigl( \Sigma \setminus \{u_i\} \bigr), \qquad v = v_1 \cdots v_n \longmapsto (v_i)_{i \in I}$$
is bijective, we can infer that
$$|S_H(u, p)| = \sum_{I \in P(n, p)} (s-1)^p = |P(n, p)| \cdot (s-1)^p = (s-1)^p \binom{n}{p}.$$
This confirms the following desired result.
Proposition 1.
The volume of the $p$-Hamming sphere is expressed by the formula $|S_H(u, p)| = (s-1)^p \binom{n}{p}$, where $n = l(u)$.
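The counting argument behind Proposition 1 can be checked numerically on small inputs. The following is a minimal sketch, not part of the original paper; the function names `hamming`, `hamming_sphere`, and `hamming_sphere_volume` are illustrative choices, and the brute-force enumeration is only practical for short strings and small alphabets.

```python
from itertools import product
from math import comb


def hamming(u, v):
    """Number of positions at which two equal-length strings differ."""
    assert len(u) == len(v)
    return sum(x != y for x, y in zip(u, v))


def hamming_sphere(u, p, alphabet):
    """Brute force: all strings of length len(u) at Hamming distance exactly p from u."""
    candidates = ("".join(t) for t in product(alphabet, repeat=len(u)))
    return [v for v in candidates if hamming(u, v) == p]


def hamming_sphere_volume(n, p, s):
    """Closed form of Proposition 1: C(n, p) * (s - 1)**p."""
    return comb(n, p) * (s - 1) ** p


if __name__ == "__main__":
    u, alphabet = "abca", "abc"
    # Both counts agree: C(4, 2) * 2**2 = 24.
    print(len(hamming_sphere(u, 2, alphabet)), hamming_sphere_volume(len(u), 2, len(alphabet)))
```

The enumeration depends on $u$ only through its length, which is exactly the independence from the center that the proposition asserts.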

3. Scattered Strings and Run-Length Encoding

Run-length encoding (RLE) is a simple yet effective data compression method in which sequences of repeated elements are replaced by a single value together with a count of its occurrences. The concept of RLE has its roots in early developments in data compression and digital image processing.
In 1961, Freeman introduced a method for encoding lines and curves in digital images that utilized run-length encoding [12]. This work laid the foundation for the widespread use of RLE in various fields, including image compression, data transmission, and digital storage.
RLE operates by replacing each maximal block of consecutive identical characters in a string with the count of its repetitions followed by the character itself. For instance, the string “bbbabb” would be encoded as “3b1a2b”. This method is particularly effective for compressing data that contains many consecutive repeating characters, such as simple visual graphics, text with repeated letters, or other forms of repetitive data.
This section presents a fundamental theorem concerning the decomposition of a nonempty string using run-length encoding (RLE).
To begin, we introduce the following concept.
Definition 1.
A string $u \in \Sigma^*$ is termed scattered if either $u = \varepsilon$ or $u = u_1 \cdots u_n$, where each $u_i$ is a character and $u_i \neq u_{i+1}$ for every $i \in \{1, \ldots, n-1\}$. The set of all scattered strings of length $n$ is denoted $\mathrm{SS}_n(\Sigma)$.
Partitioning an integer $n \in \mathbb{N}$ into $k$ positive parts involves expressing $n$ as the sum $x_1 + x_2 + \cdots + x_k$, where each $x_i \in \mathbb{N}$. This concept is central to number theory and combinatorics, as it explores the different ways in which an integer can be decomposed into a specific number of summands. The ordered $k$-tuple $(x_1, x_2, \ldots, x_k)$ is known as a $k$-partition of $n$, and the set of all such $k$-partitions is denoted $\mathrm{Pr}(n, k)$.
Theorem 1
(RLE-decomposition). Let $u$ be a nonempty string over $\Sigma$ with length $n$. There exist a unique positive integer $k$, a unique scattered string $v = a_1 \cdots a_k$, and a unique $k$-partition $(r_1, r_2, \ldots, r_k)$ of $n$ such that
$$u = a_1^{r_1} \cdots a_k^{r_k},$$
where this expression is termed the RLE-decomposition of $u$.
Proof. 
Existence. We establish existence by induction on $n$. For the initial step, where $n = 1$, let $u = u_1$. We can simply set $k = 1$, $v = u$, and $r_1 = 1$.
Assume that the decomposition exists for all nonempty strings of length $n$. Let $u = u_1 u_2 \cdots u_n u_{n+1}$ be a string of length $n + 1$. By the inductive assumption, there exist $s \in \mathbb{N}$, a scattered string $w = a_1 \cdots a_s$, and an $s$-partition $(r_1, \ldots, r_s)$ of $n$ such that
$$u_1 u_2 \cdots u_n = a_1^{r_1} \cdots a_s^{r_s}.$$
Two cases may be considered.
If $u_{n+1} = a_s$, then the required decomposition of $u$ is $u = a_1^{r_1} \cdots a_s^{r_s + 1}$, associated with $w = a_1 \cdots a_s$ and the $s$-partition $(r_1, r_2, \ldots, r_s + 1)$ of $n + 1$.
If $u_{n+1} \neq a_s$, then the decomposition of $u$ is $u = a_1^{r_1} \cdots a_s^{r_s} a_{s+1}^{1}$, where $a_{s+1} = u_{n+1}$, and the $(s+1)$-partition of $n + 1$ is $(r_1, r_2, \ldots, r_s, 1)$.
Uniqueness of the decomposition. Again, we proceed by induction on $n$. For the initial step, where $n = 1$, suppose
$$u = u_1 = a_1^{\alpha_1} \cdots a_k^{\alpha_k} = b_1^{\beta_1} \cdots b_s^{\beta_s};$$
then $k = 1 = s$, $a_1 = b_1 = u_1$, and $\alpha_1 = \beta_1 = 1$.
Now, for a given positive integer $n$, assume that every nonempty string $u$ with $l(u) \leq n$ possesses a unique decomposition. Let $u = u_1 \cdots u_{n+1}$ be a nonempty string of length $n + 1$, and assume that
$$u = a_1^{\alpha_1} \cdots a_k^{\alpha_k} = b_1^{\beta_1} \cdots b_s^{\beta_s}$$
are two decompositions of $u$. Thus, $u_1 = b_1 = a_1$.
If $k = 1$, then the equality $u = a_1^{\alpha_1} = b_1^{\beta_1} \cdots b_s^{\beta_s}$ implies $s = 1$, since $b_1 b_2 \cdots b_s$ is scattered. Consequently, $\alpha_1 = \beta_1$, establishing the decomposition as unique.
If $k \geq 2$, then necessarily $s \geq 2$ (since $a_1 a_2 \cdots a_k$ is scattered). We assert that $\alpha_1 = \beta_1$. Indeed, if (for instance) $\alpha_1 > \beta_1$, then we would have $a_1^{\alpha_1 - \beta_1} a_2^{\alpha_2} \cdots a_k^{\alpha_k} = b_2^{\beta_2} \cdots b_s^{\beta_s}$. Therefore, $b_1 = a_1 = b_2$, contradicting the condition that $b_1 b_2 \cdots b_s$ is scattered. As a consequence, $\alpha_1 = \beta_1$. Thus, we obtain $a_2^{\alpha_2} \cdots a_k^{\alpha_k} = b_2^{\beta_2} \cdots b_s^{\beta_s}$. By the inductive assumption, we derive $k - 1 = s - 1$, $\alpha_i = \beta_i$, and $a_i = b_i$ for $i = 2, \ldots, k$. This confirms that the decomposition is unique.  □
The above result motivates the following definitions.
Definition 2.
The integer $k$ in the previous theorem is termed the run-length of $u$ and is denoted by $\rho(u)$. The scattered string $a_1 a_2 \cdots a_k$ is termed the run-root of $u$, denoted by $\mathrm{rr}(u)$. The $k$-tuple $(r_1, r_2, \ldots, r_k)$ is called the run-partition of $u$ and is denoted by $\mathrm{rp}(u)$.
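As an illustration of Theorem 1 and Definition 2, the run-root, run-partition, and run-length of a string can be computed in a single pass over its maximal runs. The sketch below is illustrative only; the function names `rle_decomposition`, `run_length`, and `rebuild` are assumptions made for this example and do not come from the paper.

```python
from itertools import groupby


def rle_decomposition(u):
    """Return (run_root, run_partition) of u, i.e. (rr(u), rp(u)).

    groupby yields one group per maximal run of equal characters,
    which is exactly the RLE-decomposition u = a_1^{r_1} ... a_k^{r_k}.
    """
    runs = [(ch, len(list(group))) for ch, group in groupby(u)]
    run_root = "".join(ch for ch, _ in runs)
    run_partition = tuple(r for _, r in runs)
    return run_root, run_partition


def run_length(u):
    """rho(u): the number of runs of u (0 for the empty string)."""
    return len(rle_decomposition(u)[0])


def rebuild(run_root, run_partition):
    """Inverse map: reconstruct u from its run-root and run-partition."""
    return "".join(ch * r for ch, r in zip(run_root, run_partition))


if __name__ == "__main__":
    root, parts = rle_decomposition("bbbabb")
    print(root, parts, run_length("bbbabb"))  # bab (3, 1, 2) 3
    print(rebuild(root, parts))               # bbbabb
```

The round trip through `rebuild` reflects the uniqueness part of the theorem: a string is determined by its run-root together with its run-partition.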
Remark 1.
Theorem 1 can be reformulated using functions. For each positive integer $n$, the map
$$\Phi : \Sigma^{(n)} \longrightarrow \bigcup_{k=1}^{n} \mathrm{SS}_k(\Sigma) \times \mathrm{Pr}(n, k), \qquad u \longmapsto (\mathrm{rr}(u), \mathrm{rp}(u))$$
defines a bijection.

4. Unit Spheres of Strings Under the Levenshtein Metric

Unlike the Hamming metric, where the volume of a unit sphere depends only on the length of the string and the size of the alphabet, the volume of $S_L(u, 1)$ under the Levenshtein metric exhibits a more intricate dependence. Specifically, it is entirely determined by three parameters: the run-length $\rho(u)$ of the string $u$, the size $s$ of the alphabet $\Sigma$, and the length $l(u)$ of $u$. This dependence highlights the sensitivity of the Levenshtein metric to the internal structure and redundancy within strings, especially with respect to repeated characters and their arrangements.
For further detailed comparisons between the Hamming distance and the Levenshtein distance, we refer the reader to [16].
To proceed with the calculation of $|S_L(u, 1)|$, the number of strings at Levenshtein distance one from $u$, we begin by introducing some notation and conventions that are used throughout the computation.
Notation 1.
Let $u$ be a string of length $n$ over the alphabet $\Sigma$, and let $a \in \Sigma$.
1. For real numbers $x \leq y$, the notation $[\![x, y]\!]$ represents the intersection $[x, y] \cap \mathbb{Z}$.
2. If $u = vw$, where $v, w \in \Sigma^*$, $l(v) = i$, and $l(w) = j$, then $v$ is referred to as $\mathrm{pref}_i(u)$ (the prefix of $u$ of length $i$), and $w$ is referred to as $\mathrm{suf}_j(u)$ (the suffix of $u$ of length $j$). By convention, $\mathrm{pref}_0(u) = \mathrm{suf}_0(u) = \varepsilon$.
3. $\mathrm{DEL}(u)$ denotes the set of all strings obtained from $u$ by deleting one character. The cardinality of $\mathrm{DEL}(u)$ is referred to as the deletion degree of $u$ and denoted by $\mathrm{del}(u)$. By convention, $\mathrm{DEL}(\varepsilon) = \emptyset$. Using prefix–suffix notation,
$$\mathrm{DEL}(u) = \{ \mathrm{pref}_{i-1}(u) \cdot \mathrm{suf}_{n-i}(u) : i \in [\![1, n]\!] \}.$$
4. $\mathrm{INS}(u)$ denotes the set of all strings obtained from $u$ by inserting one character. The cardinality of $\mathrm{INS}(u)$ is referred to as the insertion degree of $u$ and denoted by $\mathrm{ins}(u)$. Using prefix–suffix notation, we have
$$\mathrm{INS}(u) = \bigcup_{i=0}^{n} \{ \mathrm{pref}_i(u) \cdot x \cdot \mathrm{suf}_{n-i}(u) : x \in \Sigma \}.$$
5. $\mathrm{SUB}(u)$ represents the set of all strings obtained from $u$ by replacing one character $u_i$ with a character from $\Sigma \setminus \{u_i\}$. The cardinality of $\mathrm{SUB}(u)$ is referred to as the substitution degree of $u$ and denoted by $\mathrm{sub}(u)$. By convention, $\mathrm{SUB}(\varepsilon) = \emptyset$. Using prefix–suffix notation, we have
$$\mathrm{SUB}(u) = \bigcup_{i=1}^{n} \{ \mathrm{pref}_{i-1}(u) \cdot x \cdot \mathrm{suf}_{n-i}(u) : x \in \Sigma \setminus \{u_i\} \}.$$
6. For $i \in [\![1, n]\!]$, we denote by $D_i(u)$ the string obtained from $u$ by deleting the character $u_i$, i.e., $D_i(u) = \mathrm{pref}_{i-1}(u) \cdot \mathrm{suf}_{n-i}(u)$.
7. For $i \in [\![0, n]\!]$ and $a \in \Sigma$, we denote by $I_{(i,a)}(u)$ the string obtained by inserting the character $a$ at position $i + 1$ in $u$, i.e., $I_{(i,a)}(u) = \mathrm{pref}_i(u) \cdot a \cdot \mathrm{suf}_{n-i}(u)$.
8. For $i \in [\![1, n]\!]$ and $a \in \Sigma \setminus \{u_i\}$, we denote by $S_{(i,a)}(u)$ the string obtained by substituting the character $u_i$ with $a$, i.e., $S_{(i,a)}(u) = \mathrm{pref}_{i-1}(u) \cdot a \cdot \mathrm{suf}_{n-i}(u)$.
Remark 2.
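The sets $\mathrm{SUB}(u)$, $\mathrm{DEL}(u)$, and $\mathrm{INS}(u)$ consist of strings of lengths $l(u)$, $l(u) - 1$, and $l(u) + 1$, respectively, so they are pairwise disjoint. Applying the additive rule, for any string $u$ over $\Sigma$ we have $|S_L(u, 1)| = \mathrm{sub}(u) + \mathrm{del}(u) + \mathrm{ins}(u)$.
In what follows, let $u = u_1 \cdots u_n$ be a nonempty string over $\Sigma$ of length $n$. The computation of $|S_L(u, 1)|$ requires a sequence of lemmas.
From Proposition 1, we derive the following lemma.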
Lemma 1.
The substitution degree of the string $u$ equals $\mathrm{sub}(u) = |S_H(u, 1)| = (s - 1) n$.
For $i < j$, the strings $D_i(u)$ and $D_j(u)$ share a common prefix of length $i - 1$ and a common suffix of length $n - j$. Consequently, the equality $D_i(u) = D_j(u)$ is equivalent to the condition $u_{i+1} \cdots u_j = u_i \cdots u_{j-1}$. This insight results in the following lemma.
Lemma 2.
Let $i < j$ be indices in $[\![1, n]\!]$. Then, $D_i(u) = D_j(u)$ is equivalent to $u_t = u_i$ for every $t \in [\![i, j]\!]$.
In other words, two deletions produce the same string exactly when the deleted positions lie in the same run of $u$, so each run contributes exactly one element to $\mathrm{DEL}(u)$. As a consequence, we have the following corollary.
Corollary 1.
The deletion degree of a string $u$ equals $\mathrm{del}(u) = \rho(u)$, where $\rho(u)$ denotes the run-length of $u$.
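Corollary 1 is easy to confirm experimentally. The sketch below, written for illustration under the assumption of a small test alphabet, enumerates $\mathrm{DEL}(u)$ by brute force and compares its size with the number of runs; the helper names `deletions` and `run_length` are hypothetical.

```python
from itertools import product


def deletions(u):
    """DEL(u): the set of strings obtained from u by deleting one character."""
    return {u[:i] + u[i + 1:] for i in range(len(u))}


def run_length(u):
    """rho(u): count the positions where a new run starts."""
    return sum(1 for i in range(len(u)) if i == 0 or u[i] != u[i - 1])


if __name__ == "__main__":
    u = "bbbabb"
    # Deleting inside the same run always gives the same string,
    # so only one deletion per run is distinct.
    print(sorted(deletions(u)), run_length(u))

    # Exhaustive check of del(u) = rho(u) for all strings up to length 6 over {a, b}.
    for n in range(7):
        for t in product("ab", repeat=n):
            w = "".join(t)
            assert len(deletions(w)) == run_length(w)
```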
For the insertion degree, if $i < j$, the two strings $I_{(i,a)}(u)$ and $I_{(j,b)}(u)$ share a common prefix of length $i$ and a common suffix of length $n - j$. Therefore, the equality $I_{(i,a)}(u) = I_{(j,b)}(u)$ is equivalent to $a\, u_{i+1} \cdots u_j = u_{i+1} \cdots u_j\, b$. This insight results in the following lemma.
Lemma 3.
Let $i < j$ be indices in $[\![0, n]\!]$, and let $a, b \in \Sigma$. Then, $I_{(i,a)}(u) = I_{(j,b)}(u)$ if and only if $a = b = u_t$ for every $t \in [\![i + 1, j]\!]$.
Now, we compute $\mathrm{ins}(u)$ for a string of run-length 1.
Lemma 4.
Let $a$ be a character and $r \in \mathbb{N}$. Then the insertion degree of the string $a^r$ is
$$\mathrm{ins}(a^r) = (s - 1) r + s.$$
Proof. 
It is clear that $\mathrm{INS}(a^r) = \{a^{r+1}\} \cup \bigcup_{i=0}^{r} \{ I_{(i,x)}(a^r) : x \in \Sigma \setminus \{a\} \}$, and by Lemma 3 the union is disjoint. Consequently,
$$\mathrm{ins}(a^r) = 1 + \sum_{i=0}^{r} \bigl| \{ I_{(i,x)}(a^r) : x \in \Sigma \setminus \{a\} \} \bigr| = 1 + (s - 1)(r + 1) = (s - 1) r + s.$$
Lemma 5.
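Let $u = a_1^{r_1} \cdots a_k^{r_k}$ be a string over $\Sigma$, with run-length $k \geq 2$, and let
$$\begin{aligned} A_1 &= \{ v\, a_2^{r_2} \cdots a_k^{r_k} : v \in \mathrm{INS}(a_1^{r_1}) \}, \\ A_2 &= \{ a_1^{r_1}\, v\, a_3^{r_3} \cdots a_k^{r_k} : v \in \mathrm{INS}(a_2^{r_2}) \}, \\ &\;\;\vdots \\ A_k &= \{ a_1^{r_1} a_2^{r_2} \cdots a_{k-1}^{r_{k-1}}\, v : v \in \mathrm{INS}(a_k^{r_k}) \}; \end{aligned}$$
the following properties are satisfied:
0. $\mathrm{INS}(u) = A_1 \cup A_2 \cup \cdots \cup A_k$.
1. If $i \in [\![1, k - 1]\!]$, then
$$A_i \cap A_{i+1} = \{ a_1^{r_1} \cdots a_i^{r_i}\, x\, a_{i+1}^{r_{i+1}} \cdots a_k^{r_k} : x \in \Sigma \}.$$
2. If $k \geq 3$ and $i \in [\![1, k - 2]\!]$, then
$$A_i \cap A_{i+2} = \{ a_1^{r_1} \cdots a_i^{r_i}\, a_{i+1}^{r_{i+1} + 1}\, a_{i+2}^{r_{i+2}} \cdots a_k^{r_k} \} = A_i \cap A_{i+1} \cap A_{i+2}.$$
3. If $j - i \geq 3$, then $A_i \cap A_j = \emptyset$.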
Proof. 
1. Let $w \in A_i \cap A_{i+1}$, so that
$$w = a_1^{r_1} \cdots a_{i-1}^{r_{i-1}}\, v_1\, a_{i+1}^{r_{i+1}} \cdots a_k^{r_k} = a_1^{r_1} \cdots a_i^{r_i}\, v_2\, a_{i+2}^{r_{i+2}} \cdots a_k^{r_k},$$
for some $v_1 \in \mathrm{INS}(a_i^{r_i})$ and $v_2 \in \mathrm{INS}(a_{i+1}^{r_{i+1}})$. Hence, there exist $p \in [\![0, r_i]\!]$, $q \in [\![0, r_{i+1}]\!]$, and $x, y \in \Sigma$ such that $v_1 = I_{(p,x)}(a_i^{r_i})$ and $v_2 = I_{(q,y)}(a_{i+1}^{r_{i+1}})$. We consider four cases.
Case 1: $p < r_i$ and $q > 0$. By Lemma 3, the two equal insertions into $u$ lead to $x = a_i = a_{i+1}$, contradicting the RLE-decomposition of $u$. So this case cannot happen.
Case 2: $p < r_i$ and $q = 0$. By Lemma 3, we obtain $x = y = a_i$, and
$$w = a_1^{r_1} \cdots a_i^{r_i + 1}\, a_{i+1}^{r_{i+1}} \cdots a_k^{r_k}.$$
Case 3: $p = r_i$ and $q > 0$. Lemma 3 guarantees the equality $x = a_{i+1} = y$. This gives
$$v_1 = a_i^{r_i} a_{i+1}, \quad v_2 = a_{i+1}^{r_{i+1} + 1}, \quad \text{and} \quad w = a_1^{r_1} \cdots a_i^{r_i}\, a_{i+1}^{r_{i+1} + 1} \cdots a_k^{r_k}.$$
Case 4: $p = r_i$ and $q = 0$. In this scenario, $v_1 = a_i^{r_i} x$ and $v_2 = y\, a_{i+1}^{r_{i+1}}$, and as a consequence, $x = y$ can be any character of $\Sigma$.
In conclusion, we obtain
$$A_i \cap A_{i+1} = \{ a_1^{r_1} \cdots a_i^{r_i}\, x\, a_{i+1}^{r_{i+1}} \cdots a_k^{r_k} : x \in \Sigma \}.$$
2. Let $w \in A_i \cap A_{i+2}$, so that
$$w = a_1^{r_1} \cdots a_{i-1}^{r_{i-1}}\, v_1\, a_{i+1}^{r_{i+1}} \cdots a_k^{r_k} = a_1^{r_1} \cdots a_{i+1}^{r_{i+1}}\, v_2\, a_{i+3}^{r_{i+3}} \cdots a_k^{r_k},$$
for some $v_1 \in \mathrm{INS}(a_i^{r_i})$ and $v_2 \in \mathrm{INS}(a_{i+2}^{r_{i+2}})$. So there exist $p \in [\![0, r_i]\!]$, $q \in [\![0, r_{i+2}]\!]$, and $x, y \in \Sigma$ such that $v_1 = I_{(p,x)}(a_i^{r_i})$ and $v_2 = I_{(q,y)}(a_{i+2}^{r_{i+2}})$.
By Lemma 3, this results in $x = a_{i+1} = y$.
We assert that $p = r_i$. Otherwise, by Lemma 3, the equality of the two insertions into $u$ implies that $x = a_i = a_{i+1} = y$, which contradicts the RLE-decomposition of $u$.
We also claim that $q = 0$. Otherwise, by Lemma 3, $x = a_{i+1} = a_{i+2} = y$, again a contradiction.
As a consequence,
$$v_1 = a_i^{r_i} a_{i+1}, \quad v_2 = a_{i+1}\, a_{i+2}^{r_{i+2}}, \quad \text{and} \quad w = a_1^{r_1} \cdots a_i^{r_i}\, a_{i+1}^{r_{i+1} + 1}\, a_{i+2}^{r_{i+2}} \cdots a_k^{r_k}.$$
Additionally, it is clear that $w \in A_{i+1}$. Therefore,
$$A_i \cap A_{i+2} = \{ a_1^{r_1} \cdots a_i^{r_i}\, a_{i+1}^{r_{i+1} + 1}\, a_{i+2}^{r_{i+2}} \cdots a_k^{r_k} \} = A_i \cap A_{i+1} \cap A_{i+2}.$$
3. If $j \geq i + 3$ and $A_i \cap A_j \neq \emptyset$, then there would be a string
$$w = a_1^{r_1} \cdots a_{i-1}^{r_{i-1}}\, v_1\, a_{i+1}^{r_{i+1}} \cdots a_k^{r_k} = a_1^{r_1} \cdots a_{j-1}^{r_{j-1}}\, v_2\, a_{j+1}^{r_{j+1}} \cdots a_k^{r_k},$$
with $v_1 = I_{(p,x)}(a_i^{r_i})$ and $v_2 = I_{(q,y)}(a_j^{r_j})$. Again, using Lemma 3, $x = a_{i+1} = a_{i+2} = y$, which is impossible.
The preceding lemma highlights the existence of bijections between $A_i$ and $\mathrm{INS}(a_i^{r_i})$, and between $A_i \cap A_{i+1}$ and $\Sigma$. Additionally, the intersections $A_i \cap A_{i+2}$ and $A_i \cap A_{i+1} \cap A_{i+2}$ are singletons. Using Lemma 4, the following cardinalities are derived.
Corollary 2.
1. $|A_i| = r_i (s - 1) + s$;
2. $|A_i \cap A_{i+1}| = s$;
3. $|A_i \cap A_{i+2}| = |A_i \cap A_{i+1} \cap A_{i+2}| = 1$.
To evaluate ins ( u ) , we use the inclusion–exclusion principle (or the sieve formula), first introduced by Abraham de Moivre as part of his efforts to understand and quantify probabilities [17].
Recall that if $S_1, S_2, \ldots, S_k$ are finite sets, then the cardinality of their union equals
$$\Bigl| \bigcup_{i=1}^{k} S_i \Bigr| = \sum_{i=1}^{k} (-1)^{i-1} \sum_{J \subseteq [k],\ |J| = i} \Bigl| \bigcap_{j \in J} S_j \Bigr|.$$
The following lemma indicates that the insertion degree of a string $u$ is solely determined by $l(u)$ and $s = |\Sigma|$.
Lemma 6.
The insertion degree of $u$ equals $\mathrm{ins}(u) = (s - 1)\, l(u) + s$.
Proof. 
Lemma 4 confirms the validity of the given equality for strings with run-length $k = \rho(u) = 1$.
Now, we extend this result to strings with run-length $k = \rho(u) \geq 2$.
We apply the inclusion–exclusion principle in conjunction with Lemma 5. By item 3 of Lemma 5, every intersection involving two sets $A_i$, $A_j$ with $j - i \geq 3$ is empty, so only the intersections listed in Corollary 2 contribute, and we obtain
$$\begin{aligned} \mathrm{ins}(u) &= \sum_{i=1}^{k} |A_i| - \left( \sum_{i=1}^{k-1} |A_i \cap A_{i+1}| + \sum_{i=1}^{k-2} |A_i \cap A_{i+2}| \right) + \sum_{i=1}^{k-2} |A_i \cap A_{i+1} \cap A_{i+2}| \\ &= \sum_{i=1}^{k} \bigl( r_i (s - 1) + s \bigr) - \left( \sum_{i=1}^{k-1} s + \sum_{i=1}^{k-2} 1 \right) + \sum_{i=1}^{k-2} 1 \\ &= (s - 1)\, l(u) + k s - \bigl( (k - 1) s + k - 2 \bigr) + k - 2 \\ &= (s - 1)\, l(u) + s. \end{aligned}$$
By combining Lemma 1, Corollary 1, and Lemma 6, we can now present the main result of this paper, which gives the volume of $S_L(u, 1)$.
Theorem 2.
The volume of $S_L(u, 1)$ is given by
$$|S_L(u, 1)| = (2 l(u) + 1)\, s - 2 l(u) + \rho(u).$$
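To make Theorem 2 concrete, the unit sphere can be enumerated directly from the definitions of $\mathrm{DEL}(u)$, $\mathrm{INS}(u)$, and $\mathrm{SUB}(u)$ and compared with the closed form. The following is a small sketch for illustration; the names `unit_sphere` and `unit_sphere_volume` are assumptions of this example, and the string $u$ is expected to use only characters from the supplied alphabet.

```python
from itertools import groupby, product


def run_length(u):
    """rho(u): number of maximal runs of equal characters (0 for the empty string)."""
    return sum(1 for _ in groupby(u))


def unit_sphere(u, alphabet):
    """Brute-force S_L(u, 1): every string reachable from u by exactly one edit."""
    n = len(u)
    deletions = {u[:i] + u[i + 1:] for i in range(n)}
    insertions = {u[:i] + x + u[i:] for i in range(n + 1) for x in alphabet}
    substitutions = {u[:i] + x + u[i + 1:]
                     for i in range(n) for x in alphabet if x != u[i]}
    # The three sets contain strings of different lengths (or differ from u
    # in one position), so none of them contains u itself.
    return deletions | insertions | substitutions


def unit_sphere_volume(u, s):
    """Closed form of Theorem 2: (2 l(u) + 1) s - 2 l(u) + rho(u)."""
    n = len(u)
    return (2 * n + 1) * s - 2 * n + run_length(u)


if __name__ == "__main__":
    u, alphabet = "bbbabb", "ab"
    # 3 deletions + 8 insertions + 6 substitutions = 17 = 13 * 2 - 12 + 3
    print(len(unit_sphere(u, alphabet)), unit_sphere_volume(u, len(alphabet)))  # 17 17
```

For a fixed length and alphabet, only `run_length(u)` varies between strings, which is the dependence on the RLE-decomposition described in the theorem.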
Corollary 3
(Minimum and maximum values of volumes). Let $n \in \mathbb{N}$ and $\Sigma$ be an alphabet of size $s \geq 2$. The following properties are satisfied.
1. The minimum value of $\{ |S_L(u, 1)| : u \in \Sigma^{(n)} \}$ is $(2n + 1)s - (2n - 1)$. This value is attained by strings of the form $u = a^n$, where $a$ is a character from the alphabet.
2. The maximum value of $\{ |S_L(u, 1)| : u \in \Sigma^{(n)} \}$ is $(2n + 1)s - n$. This value is attained by the scattered strings $u$ of length $n$.
3. Intermediate value result: Every integer in $[(2n + 1)s - (2n - 1), (2n + 1)s - n]$ is the volume of $S_L(u, 1)$ for some string $u$ of length $n$.
Proof. 
Clearly, minimizing the value of $(2n + 1)s - 2n + \rho(u)$ is equivalent to minimizing $\rho(u)$. This occurs when $\rho(u) = 1$, which corresponds to $u = a^n$ for some character $a$. Therefore, the minimum value is $(2n + 1)s - 2n + 1$.
Similarly, the maximum value of $(2n + 1)s - 2n + \rho(u)$ is attained when $\rho(u)$ is maximized. This occurs when $\rho(u) = n$, corresponding to a scattered string $u$. The maximum value is $(2n + 1)s - 2n + n = (2n + 1)s - n$.
Now, let $x \in [\![(2n + 1)s - 2n + 1, (2n + 1)s - n]\!]$. Then $x = (2n + 1)s - i$ for some integer $n \leq i \leq 2n - 1$. Let $v$ be a scattered string of length $2n - i$ whose last character is $a \in \Sigma$ (such a string exists because $s \geq 2$), and let $u = v\, a^{i - n}$. Then $l(u) = n$ and $\rho(u) = 2n - i$. By Theorem 2,
$$|S_L(u, 1)| = (2n + 1)s - 2n + \rho(u) = (2n + 1)s - i = x.$$
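The construction used in the proof of the intermediate value result can be carried out explicitly. The sketch below is illustrative and assumes an alphabet with at least two characters; the function `witness` is a hypothetical helper that builds $u = v\, a^{i-n}$ with $v$ an alternating (hence scattered) string ending in $a$, and the unit sphere is enumerated by brute force to confirm the volume.

```python
def unit_sphere(u, alphabet):
    """Brute-force S_L(u, 1): all strings at exactly one edit from u."""
    n = len(u)
    deletions = {u[:i] + u[i + 1:] for i in range(n)}
    insertions = {u[:i] + x + u[i:] for i in range(n + 1) for x in alphabet}
    substitutions = {u[:i] + x + u[i + 1:]
                     for i in range(n) for x in alphabet if x != u[i]}
    return deletions | insertions | substitutions


def witness(n, i, alphabet):
    """String u of length n with run-length 2n - i, for n <= i <= 2n - 1.

    v alternates between the first two characters of the alphabet and
    ends with a = alphabet[0]; then u = v + a^(i - n).
    """
    assert n <= i <= 2 * n - 1 and len(alphabet) >= 2
    a, b = alphabet[0], alphabet[1]
    k = 2 * n - i  # length (and run-length) of the scattered prefix v
    v = "".join(a if (k - 1 - t) % 2 == 0 else b for t in range(k))
    return v + a * (i - n)


if __name__ == "__main__":
    n, alphabet = 4, "ab"
    s = len(alphabet)
    for i in range(n, 2 * n):
        u = witness(n, i, alphabet)
        # Each line shows u, the brute-force volume, and the target (2n + 1)s - i.
        print(u, len(unit_sphere(u, alphabet)), (2 * n + 1) * s - i)
```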

5. Spheres of Strings with Centers of Run-Length 1

This section is devoted to computing the cardinality of the Levenshtein sphere $S_L(u, p)$, where $p \geq 2$ is a fixed integer and $u = a^n$ is a string consisting of $n$ repetitions of a single character $a$ from the alphabet $\Sigma$. In other words, $u$ has run-length one.
This special case is of particular interest due to the structural simplicity of u, which allows for an explicit enumeration of all strings at Levenshtein distance p. The analysis will rely on combinatorial principles and recursive formulations of the Levenshtein distance.
We begin by recalling a classical result often referred to as the recurrence relation for the Levenshtein distance, which forms the foundation for our computations; the details may be found, for example, in [18].
Remark 3.
Let $\Sigma$ be a nontrivial alphabet (i.e., of cardinality at least 2). Let $s, t$ be strings over $\Sigma$; $\varepsilon$ be the empty string; and $a, b \in \Sigma$ be single characters. Then, the following properties hold.
1. $L(s, \varepsilon) = |s|$.
2. $L(\varepsilon, t) = |t|$.
3. $L(a, b) = 1$ if $a \neq b$, and $L(a, b) = 0$ if $a = b$.
4. $L(sa, ta) = L(s, t)$.
5. If $a \neq b$, then $L(sa, tb) = \min \{ L(s, t) + 1,\ L(s, tb) + 1,\ L(sa, t) + 1 \}$.
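The recurrence of Remark 3 translates directly into the classical dynamic-programming computation of the Levenshtein distance. The following is a minimal sketch of that standard technique (row-by-row tabulation over prefixes); the function name `levenshtein` is an illustrative choice, and here the arguments `s` and `t` are strings, as in the remark, not the alphabet size.

```python
def levenshtein(s, t):
    """Levenshtein distance via the recurrence of Remark 3.

    prev[j] holds L(s[:i-1], t[:j]) and cur[j] holds L(s[:i], t[:j]).
    """
    prev = list(range(len(t) + 1))  # L(epsilon, t[:j]) = j
    for i, a in enumerate(s, start=1):
        cur = [i]  # L(s[:i], epsilon) = i
        for j, b in enumerate(t, start=1):
            if a == b:
                # L(sa, ta) = L(s, t)
                cur.append(prev[j - 1])
            else:
                # L(sa, tb) = 1 + min(L(s, t), L(s, tb), L(sa, t))
                cur.append(1 + min(prev[j - 1], prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))  # 3
    print(levenshtein("aaa", "ab"))          # 2
```

Only two rows of the table are kept at any time, so the memory use is proportional to the length of `t`.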
The following result provides a formula for computing the Levenshtein distance between a string $u = a^n$, which has run-length 1, and an arbitrary string $v$ of the same or a different length.
Remark 4.
Before stating the result, we first show how to transform the string $v$ into the string $u = a^n$ by means of edit operations. Three situations arise:
1. Case 1: $l(v) \leq n$.
Replace every character of $v$ that is not equal to $a$ with $a$, obtaining the intermediate string $a^{l(v)}$. The number of such substitutions is $l(v) - c_a(v)$, where $c_a(v)$ denotes the number of occurrences of $a$ in $v$.
Then, insert the character $a$ exactly $n - l(v)$ times to produce the string $a^n$.
The total number of edit operations is therefore
$$(l(v) - c_a(v)) + (n - l(v)) = n - c_a(v),$$
which shows that $L(a^n, v) \leq n - c_a(v)$.
2. Case 2: $l(v) > n$ and $n \leq c_a(v)$.
Delete every character of $v$ that is not equal to $a$, obtaining the string $a^{c_a(v)}$.
Then, delete $c_a(v) - n$ occurrences of $a$ to obtain the string $a^n$.
The number of edit operations performed is
$$(l(v) - c_a(v)) + (c_a(v) - n) = l(v) - n,$$
which implies $L(a^n, v) \leq l(v) - n$.
3. Case 3: $c_a(v) < n < l(v)$.
Delete $l(v) - n$ characters of $v$ that are not equal to $a$, producing a string $w$ of length $n$.
Then, substitute each character of $w$ that is not equal to $a$ with $a$, resulting in the string $a^n$.
The total number of edit operations is
$$(l(v) - n) + (n - c_a(v)) = l(v) - c_a(v),$$
hence $L(a^n, v) \leq l(v) - c_a(v)$.
The following result shows that the inequalities obtained in cases 1, 2, and 3 are in fact equalities.
Theorem 3.
Let $v$ be a string, $n$ a non-negative integer, and $a$ a character. Then, the Levenshtein distance between the constant string $a^n$ and $v$ is given by
$$L(a^n, v) = \begin{cases} n - c_a(v), & \text{if } l(v) \leq n, \\ l(v) - n, & \text{if } l(v) > n \text{ and } n \leq c_a(v), \\ l(v) - c_a(v), & \text{if } c_a(v) < n < l(v), \end{cases}$$
where $l(v)$ denotes the length of $v$, and $c_a(v)$ is the number of occurrences of the character $a$ in $v$.
Proof. 
The formula given in the theorem can be equivalently written in a more compact form as
$$L(a^n, v) = \max(l(v), n) - \min(n, c_a(v)).$$
We proceed by induction on $l(v) + n$.
Base case: Suppose $l(v) + n = 0$. Then $l(v) = n = 0$, and we have
$$L(a^n, v) = L(\varepsilon, \varepsilon) = 0 = n - c_a(v).$$
Hence, the formula holds in the base case.
Inductive step: Let $k$ be a non-negative integer. Assume that for every non-negative integer $n$ and every string $w$ such that $l(w) + n \leq k$, the formula given in the theorem for $L(a^n, w)$ holds.
Now, let $v$ be a string such that $l(v) + n = k + 1$. Write $v = wb$, where $w$ is a string and $b$ is a character. We consider two cases.
Case 1: If $a = b$, then since $l(w) + (n - 1) = l(v) - 1 + (n - 1) = k - 1$, the induction hypothesis gives
$$\begin{aligned} L(a^n, v) = L(a^{n-1}, w) &= \max(l(w), n - 1) - \min(n - 1, c_a(w)) \\ &= \max(l(v) - 1, n - 1) - \min(n - 1, c_a(v) - 1) \\ &= \bigl( \max(l(v), n) - 1 \bigr) - \bigl( \min(n, c_a(v)) - 1 \bigr) \\ &= \max(l(v), n) - \min(n, c_a(v)). \end{aligned}$$
Case 2: If $a \neq b$, then by Remark 3, we have
$$L(a^n, v) = L(a^{n-1} a, w b) = \min \{ L(a^{n-1}, w) + 1,\ L(a^{n-1}, v) + 1,\ L(a^n, w) + 1 \}. \tag{1}$$
By the induction hypothesis, since $l(w) + (n - 1) = k - 1$, $l(v) + (n - 1) = k$, and $l(w) + n = k$, and since $c_a(w) = c_a(v)$ because $b \neq a$, we deduce the following:
$$\begin{aligned} L(a^{n-1}, w) + 1 &= \max(l(w), n - 1) - \min(n - 1, c_a(w)) + 1 \\ &= \max(l(v) - 1, n - 1) - \min(n - 1, c_a(w)) + 1 \\ &= \max(l(v), n) - \min(n - 1, c_a(v)), \end{aligned}$$
$$\begin{aligned} L(a^{n-1}, v) + 1 &= \max(l(v), n - 1) - \min(n - 1, c_a(v)) + 1 \\ &= \max(l(v) + 1, n) - \min(n - 1, c_a(v)) \geq L(a^{n-1}, w) + 1, \end{aligned}$$
$$L(a^n, w) + 1 = \max(l(w), n) - \min(n, c_a(w)) + 1 = \max(l(v), n + 1) - \min(n, c_a(v)).$$
From Equation (1), it follows that
$$L(a^n, v) = \min \{ \max(l(v), n) - \min(n - 1, c_a(v)),\ \max(l(v), n + 1) - \min(n, c_a(v)) \}. \tag{2}$$
We now consider several sub-cases:
(i) Suppose $l(v) \leq n$ and $c_a(v) \leq n - 1$. Then, $\max(l(v), n) - \min(n - 1, c_a(v)) = n - c_a(v)$, and $\max(l(v), n + 1) - \min(n, c_a(v)) = n + 1 - c_a(v)$.
Thus, by Equation (2), $L(a^n, v) = n - c_a(v) = \max(l(v), n) - \min(n, c_a(v))$.
(ii) Suppose $l(v) \leq n$ and $c_a(v) = n$. Then, $v = a^n$, so $L(a^n, v) = 0 = \max(l(v), n) - \min(n, c_a(v))$.
(iii) Suppose $l(v) \geq n + 1$ and $c_a(v) \leq n - 1$. Then,
$$\max(l(v), n) = \max(l(v), n + 1) = l(v), \quad \text{and} \quad \min(n - 1, c_a(v)) = \min(n, c_a(v)) = c_a(v).$$
Therefore, by Equation (2), $L(a^n, v) = l(v) - c_a(v) = \max(l(v), n) - \min(n, c_a(v))$.
(iv) Suppose $l(v) \geq n + 1$ and $c_a(v) > n - 1$. Then,
$$\max(l(v), n) = \max(l(v), n + 1) = l(v), \quad \min(n - 1, c_a(v)) = n - 1, \quad \text{and} \quad \min(n, c_a(v)) = n.$$
Thus, $\max(l(v), n) - \min(n - 1, c_a(v)) = l(v) - n + 1$, and $\max(l(v), n + 1) - \min(n, c_a(v)) = l(v) - n$. Hence, by Equation (2),
$$L(a^n, v) = l(v) - n = \max(l(v), n) - \min(n, c_a(v)).$$
This completes the induction. □
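The compact form $L(a^n, v) = \max(l(v), n) - \min(n, c_a(v))$ can be compared against a direct dynamic-programming computation of the Levenshtein distance. The sketch below is only a numerical sanity check on short strings, not a substitute for the proof; the names `dist_to_constant` and `levenshtein` are illustrative.

```python
from itertools import product


def levenshtein(s, t):
    """Standard dynamic-programming Levenshtein distance (two-row version)."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, start=1):
        cur = [i]
        for j, b in enumerate(t, start=1):
            cost = 0 if a == b else 1
            cur.append(min(prev[j - 1] + cost, prev[j] + 1, cur[j - 1] + 1))
        prev = cur
    return prev[-1]


def dist_to_constant(a, n, v):
    """Closed form of Theorem 3: max(l(v), n) - min(n, c_a(v))."""
    return max(len(v), n) - min(n, v.count(a))


if __name__ == "__main__":
    # Compare both computations for every v over {a, b, c} with l(v) <= 5
    # and every center a^n with 0 <= n <= 5.
    for n in range(6):
        for length in range(6):
            for t in product("abc", repeat=length):
                v = "".join(t)
                assert dist_to_constant("a", n, v) == levenshtein("a" * n, v)
    print("closed form agrees with the DP on all tested inputs")
```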
The above theorem allows us to compute the volume of the Levenshtein sphere $S_L(a^n, p)$. In line with the three cases considered in the computation of $L(a^n, v)$ in Theorem 3, we begin by establishing the following lemma.
Lemma 7.
Let $a$ be a character in an alphabet $\Sigma$ of size $s \geq 2$, and let $n \geq 1$ and $p \geq 2$ be integers. Set
$$\begin{aligned} S_1 &= \{ v \in S_L(a^n, p) : l(v) \leq n \}, \\ S_2 &= \{ v \in S_L(a^n, p) : l(v) > n \text{ and } n \leq c_a(v) \}, \\ S_3 &= \{ v \in S_L(a^n, p) : c_a(v) < n < l(v) \}. \end{aligned}$$
The following statements hold:
1. $$|S_1| = \begin{cases} 0, & \text{if } p > n, \\ \displaystyle \sum_{j=n-p}^{n} \binom{j}{n-p} (s - 1)^{j - n + p}, & \text{if } p \leq n. \end{cases}$$
2. $$|S_2| = \sum_{j=n}^{p+n} \binom{p+n}{j} (s - 1)^{p + n - j}.$$
3. $$|S_3| = \begin{cases} \displaystyle \sum_{j=0}^{n-1} \binom{p+j}{j} (s - 1)^{p}, & \text{if } n < p, \\ \displaystyle \sum_{j=n-p+1}^{n-1} \binom{p+j}{j} (s - 1)^{p}, & \text{if } n \geq p. \end{cases}$$
Proof.
1. By Theorem 3, a string $v$ with $l(v) \leq n$ lies in $S_1$ if and only if $p = n - c_a(v)$; equivalently, $c_a(v) = n - p \geq 0$. Consequently, $S_1 = \emptyset$ whenever $p > n$.
Assume now that $p \leq n$. Any string $v$ of length $j \leq n$ in $S_1$ contains exactly $c_a(v) = n - p$ occurrences of $a$; all remaining characters belong to $\Sigma \setminus \{a\}$. The number of such strings of length $j$ is
$$\binom{j}{n-p} (s-1)^{j-(n-p)}.$$
Hence,
$$|S_1| = \sum_{j=n-p}^{n} \binom{j}{n-p} (s-1)^{j-n+p}.$$
2. A string $v$ with length $l(v) > n$ and $n \leq c_a(v)$ lies in $S_2$ precisely when $p = l(v) - n$. Set $j = c_a(v)$; then $n \leq j = c_a(v) \leq l(v) = p + n$.
The number of strings $v$ of length $p + n$ for which $c_a(v) = j$ equals $\binom{p+n}{j} (s-1)^{p+n-j}$.
Consequently, $|S_2| = \sum_{j=n}^{p+n} \binom{p+n}{j} (s-1)^{p+n-j}$.
3. Strings $v \in S_3$ satisfy $l(v) > n$, $n > c_a(v)$, and $l(v) = c_a(v) + p$. Set $j = c_a(v)$; then $n - p < j < n$. For such an integer $j$, the number of strings $v$ of length $p + j$ with $c_a(v) = j$ equals $\binom{p+j}{j} (s-1)^{p+j-j} = \binom{p+j}{j} (s-1)^{p}$.
- If $n < p$, then the admissible values of $j$ are $0, \ldots, n-1$. Hence, $|S_3| = \sum_{j=0}^{n-1} \binom{p+j}{j} (s-1)^{p}$.
- If $n \geq p$, then the admissible values of $j$ are $n-p+1, \ldots, n-1$. Thus, $|S_3| = \sum_{j=n-p+1}^{n-1} \binom{p+j}{j} (s-1)^{p}$.
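The three counts of Lemma 7 can be checked against an exhaustive partition of the sphere. The sketch below is illustrative only: `brute_force_sizes` and `lemma7_sizes` are hypothetical helper names, the center is taken to be $a^n$ with $a$ the first character of the alphabet string, and the enumeration is feasible only for small $n$, $p$, and $s$.

```python
from itertools import product
from math import comb


def levenshtein(x: str, y: str) -> int:
    """Unit-cost edit distance, computed row by row."""
    prev = list(range(len(y) + 1))
    for i, cx in enumerate(x, start=1):
        curr = [i]
        for j, cy in enumerate(y, start=1):
            cost = 0 if cx == cy else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def brute_force_sizes(n: int, p: int, alphabet: str) -> tuple:
    """Enumerate S_L(a^n, p) and split it into S1, S2, S3 as in Lemma 7."""
    a = alphabet[0]
    u = a * n
    s1 = s2 = s3 = 0
    for length in range(max(0, n - p), n + p + 1):
        for chars in product(alphabet, repeat=length):
            v = "".join(chars)
            if levenshtein(u, v) != p:
                continue
            if length <= n:
                s1 += 1          # l(v) <= n
            elif v.count(a) >= n:
                s2 += 1          # l(v) > n and c_a(v) >= n
            else:
                s3 += 1          # c_a(v) < n < l(v)
    return s1, s2, s3


def lemma7_sizes(n: int, p: int, s: int) -> tuple:
    """The three sums stated in Lemma 7."""
    s1 = 0
    if p <= n:
        s1 = sum(comb(j, n - p) * (s - 1) ** (j - n + p) for j in range(n - p, n + 1))
    s2 = sum(comb(p + n, j) * (s - 1) ** (p + n - j) for j in range(n, p + n + 1))
    # j ranges over n - p < j < n, clipped below at 0 when n < p
    s3 = sum(comb(p + j, j) * (s - 1) ** p for j in range(max(0, n - p + 1), n))
    return s1, s2, s3


if __name__ == "__main__":
    print(brute_force_sizes(2, 2, "01"), lemma7_sizes(2, 2, 2))
```

For $n = p = 2$ over a binary alphabet both functions return $(3, 11, 3)$, which sums to the value 17 listed for $u = 00$ in Table 3.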
Using Lemma 7, we can now give a formula for the volume of the Levenshtein sphere $S_L(a^n, p)$.
Theorem 4.
Let $a$ be a character of an alphabet $\Sigma$ with $|\Sigma| = s \geq 2$. For integers $n \geq 1$ and $p \geq 2$, the volume of the Levenshtein sphere centered at $a^n$ with radius $p$ is given by
$$|S_L(a^n, p)| = \begin{cases} \displaystyle\sum_{j=n-p}^{n} \binom{j}{n-p} (s-1)^{j-n+p} + \sum_{j=n}^{p+n} \binom{p+n}{j} (s-1)^{p+n-j} + \sum_{j=n-p+1}^{n-1} \binom{p+j}{j} (s-1)^{p}, & \text{if } p \leq n, \\ \displaystyle\sum_{j=n}^{p+n} \binom{p+n}{j} (s-1)^{p+n-j} + \sum_{j=0}^{n-1} \binom{p+j}{j} (s-1)^{p}, & \text{if } p > n. \end{cases}$$
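The formula of Theorem 4 is straightforward to evaluate with exact integer arithmetic. The following sketch (the name `run_sphere_volume` is an assumption made for illustration) evaluates it and prints the values for the binary alphabet, which can be compared with the counts listed in Tables 3 and 4.

```python
from math import comb


def run_sphere_volume(n: int, p: int, s: int) -> int:
    """|S_L(a^n, p)| by Theorem 4, for n >= 1, p >= 2, s >= 2."""
    # Strings of length at most n (empty contribution when p > n).
    short = 0
    if p <= n:
        short = sum(comb(j, n - p) * (s - 1) ** (j - n + p) for j in range(n - p, n + 1))
    # Strings of length n + p containing at least n copies of a.
    long_many_a = sum(comb(p + n, j) * (s - 1) ** (p + n - j) for j in range(n, p + n + 1))
    # Strings longer than n with fewer than n copies of a.
    long_few_a = sum(comb(p + j, j) * (s - 1) ** p for j in range(max(0, n - p + 1), n))
    return short + long_many_a + long_few_a


if __name__ == "__main__":
    # Table 3 lists 8, 17, 28, 42, 59 for u = 0, 00, ..., 00000 and p = 2.
    print([run_sphere_volume(n, 2, 2) for n in range(1, 6)])
    # Table 4 lists 16, 31, 60, 104, 168 for the same centers and p = 3.
    print([run_sphere_volume(n, 3, 2) for n in range(1, 6)])
```

Because the lower limit of the third sum is clipped at 0, the same expression covers both branches of the theorem.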
In [13], the authors provided a formula for the volume of the sphere $S_L(a, p)$, where $a \in \Sigma$ and $p \geq 2$ is an integer. This formula can be recovered from the above theorem by taking $n = 1$.
Corollary 4 ([13]).
$$|S_L(a, p)| = (s-1)^{p} + s^{p+1} - (s-1)^{p+1}.$$
Proof.
Since $p \geq 2 > n = 1$, Theorem 4 gives
$$|S_L(a, p)| = \sum_{j=1}^{p+1} \binom{p+1}{j} (s-1)^{p+1-j} + \sum_{j=0}^{0} \binom{p+j}{j} (s-1)^{p}.$$
Since, by the binomial theorem,
$$\sum_{j=1}^{p+1} \binom{p+1}{j} (s-1)^{p+1-j} = (s-1+1)^{p+1} - (s-1)^{p+1} = s^{p+1} - (s-1)^{p+1},$$
it follows that
$$|S_L(a, p)| = (s-1)^{p} + s^{p+1} - (s-1)^{p+1}.$$
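As a quick numerical illustration of Corollary 4 (a worked instance added here, not taken from [13]), consider the binary alphabet, $s = 2$, with radii $p = 2$ and $p = 3$:

```latex
\begin{align*}
|S_L(a, 2)| &= (2-1)^{2} + 2^{3} - (2-1)^{3} = 1 + 8 - 1 = 8, \\
|S_L(a, 3)| &= (2-1)^{3} + 2^{4} - (2-1)^{4} = 1 + 16 - 1 = 16.
\end{align*}
```

These values agree with the rows for $u = 0$ in Tables 3 and 4.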
In [13], the authors also suggested the following conjecture.
Problem 1 ([13]). There is a function $f : \mathbb{Z}^{+} \times \Sigma^{*} \times \mathbb{N} \to [0, \infty)$ that satisfies the following conditions:
(i) $|f(z, u, p) - z|$ is monotonically increasing as $l(u) \to p$;
(ii) $f(z, u, p) \to z$ as $p \to \infty$,
so that if $l(u) \leq p$, then
$$|S_L(u, p)| = f(s, u, p)^{l(u)} s^{p},$$
where $s$ is the size of $\Sigma$.
We conclude this section by formulating the following open problems, which naturally arise from the preceding analysis. These problems highlight directions for future investigation and aim to deepen our understanding of the combinatorial structure of Levenshtein spheres.
Problem 2.
Given an integer $p \geq 2$ and a string $u$ over an alphabet $\Sigma$ of size $s$, a considerably more difficult task is to determine the volume of the Levenshtein sphere $S_L(u, p)$ when the run-length of $u$ satisfies $\rho(u) \geq 2$.
Problem 3.
Given an alphabet $\Sigma$, an integer $p \geq 2$, and two strings $u_1, u_2$ over $\Sigma$ of the same length, under what conditions on $u_1$ and $u_2$ does the equality $S_L(u_1, p) = S_L(u_2, p)$ hold?

6. Pseudocode and Illustrative Examples

In this section, we present concise pseudocode (Algorithm 1) for our two principal computational tasks:
1. Evaluating the Levenshtein distance between arbitrary strings;
2. Enumerating the Levenshtein sphere $S_L(u, p)$ of radius $p$ centered at a fixed string $u$, and computing its volume.
To demonstrate the effectiveness of this pseudocode and to corroborate the theoretical results established earlier, we also include a series of tables (Tables 1 to 6), obtained by implementing the pseudocode in Python 3.12. Each table lists chosen input strings and their parameters (length and run-length), together with the corresponding output values. The examples serve as a practical guide for implementation and show how the run-length $\rho(u)$, the word length $l(u)$, the radius $p$, and the alphabet size $s$ influence the cardinality of $S_L(u, p)$.
Algorithm 1 Enumerate all strings at Levenshtein distance p from a given string u
Require: Integer n ≥ 1, radius p ≥ 1, alphabet size s ≥ 2
Require: Alphabet Σ with |Σ| = s
Require: String u over Σ with length n
Ensure: All strings v such that L(u, v) = p, and the count |S_L(u, p)|
function LevenshteinDistance(x, y)            ▹ internal helper
    m ← length(x),  k ← length(y)            ▹ k, not n, to avoid clashing with n = length(u)
    Create array D[0..m][0..k]
    for i ← 0 to m do                         ▹ initialise first column
        D[i][0] ← i
    for j ← 0 to k do                         ▹ initialise first row
        D[0][j] ← j
    for i ← 1 to m do
        for j ← 1 to k do
            cost ← 0 if x[i−1] = y[j−1], 1 otherwise
            D[i][j] ← min(D[i−1][j] + 1, D[i][j−1] + 1, D[i−1][j−1] + cost)
    return D[m][k]
/* enumeration phase */
Solutions ← ∅
minLen ← max(0, n − p),  maxLen ← n + p
for l ← minLen to maxLen do
    for all v ∈ Σ^l do
        if LevenshteinDistance(u, v) = p then
            Solutions ← Solutions ∪ {v}
Output every string in Solutions
Output |S_L(u, p)| ← |Solutions|
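A direct Python transcription of Algorithm 1 might look as follows. This is a sketch under stated assumptions rather than the authors' actual script: the function names `levenshtein_distance` and `levenshtein_sphere` are chosen here for illustration, and the alphabet is passed as a string of distinct characters.

```python
from itertools import product


def levenshtein_distance(x: str, y: str) -> int:
    """Full-table dynamic programme, as in the helper of Algorithm 1."""
    m, k = len(x), len(y)
    d = [[0] * (k + 1) for _ in range(m + 1)]
    for i in range(m + 1):          # first column: delete i characters
        d[i][0] = i
    for j in range(k + 1):          # first row: insert j characters
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, k + 1):
            cost = 0 if x[i - 1] == y[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    return d[m][k]


def levenshtein_sphere(u: str, p: int, alphabet: str) -> list:
    """All strings v over `alphabet` with L(u, v) == p, shortest first."""
    n = len(u)
    solutions = []
    # Only lengths between n - p and n + p can be at distance exactly p.
    for length in range(max(0, n - p), n + p + 1):
        for chars in product(alphabet, repeat=length):
            v = "".join(chars)
            if levenshtein_distance(u, v) == p:
                solutions.append(v)
    return solutions


if __name__ == "__main__":
    sphere = levenshtein_sphere("01", 1, "012")
    print(sphere)
    print(len(sphere))
```

The enumeration visits on the order of $s^{n+p}$ candidate strings, so this approach is practical only for short centers and small radii, which is the regime covered by the tables.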
Observe from Table 5 that for radii $p \geq 2$, the equality $|S_L(u_1, p)| = |S_L(u_2, p)|$ does not necessarily hold, even when the two strings share the same run-length $\rho(u_1) = \rho(u_2)$ and the same length $l(u_1) = l(u_2)$. By contrast, when $p = 1$, these two conditions are sufficient to guarantee $|S_L(u_1, 1)| = |S_L(u_2, 1)|$ (see Theorem 2).

7. Discussion

In this paper, we established a closed-form expression for the volume of the unit Levenshtein sphere, that is, the set $S_L(u, 1) = \{v \in \Sigma^{*} : L(u, v) = 1\}$, expressed as a function of the string length $l(u)$, the alphabet size $s$, and the run-length $\rho(u)$.
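The dependence of the unit-sphere volume on $l(u)$, $s$, and $\rho(u)$ can be illustrated numerically. The sketch below assumes that the formula of Theorem 2 reads $(2l(u)+1)s - 2l(u) + \rho(u)$; this reading is an assumption made here, chosen because it reproduces the counts 13, 18, and 23 reported in Table 1. The helper names are hypothetical.

```python
from itertools import groupby


def run_length(u: str) -> int:
    """Number of maximal runs of equal characters in u."""
    return sum(1 for _ in groupby(u))


def unit_sphere(u: str, alphabet: str) -> set:
    """All strings at Levenshtein distance exactly 1 from u."""
    out = set()
    for i in range(len(u) + 1):
        for c in alphabet:
            out.add(u[:i] + c + u[i:])              # one insertion
    for i in range(len(u)):
        out.add(u[:i] + u[i + 1:])                  # one deletion
        for c in alphabet:
            if c != u[i]:
                out.add(u[:i] + c + u[i + 1:])      # one genuine substitution
    return out


def unit_volume(u: str, s: int) -> int:
    """Assumed reading of Theorem 2: (2 l(u) + 1) s - 2 l(u) + rho(u)."""
    return (2 * len(u) + 1) * s - 2 * len(u) + run_length(u)


if __name__ == "__main__":
    for u in ("01", "010", "0101"):
        print(u, len(unit_sphere(u, "012")), unit_volume(u, 3))
```

In this count, insertions contribute $l(u)(s-1) + s$ distinct strings, substitutions contribute $l(u)(s-1)$, and deletions contribute $\rho(u)$, since deleting any character of a run yields the same string.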
To provide a comparative perspective, we also revisited the classical Hamming metric, for which we derived exact counts of Hamming sphere volumes. This comparison highlighted the structural differences between the Hamming distance, which only accounts for substitutions, and the more flexible Levenshtein distance, which allows insertions and deletions as well as substitutions.
The general problem of determining the volume $|S_L(u, p)|$ for an arbitrary radius $p \geq 2$ remains a challenging and largely unresolved combinatorial question. Nonetheless, for strings of the form $u = a^n$ (that is, strings with run-length $\rho(u) = 1$), we obtained a closed-form expression for $|S_L(a^n, p)|$ valid for all $p \geq 2$.
The difficulty in deriving a closed-form expression for $|S_L(u, p)|$ when $\rho(u) \geq 2$ stems from the absence of an explicit formula for the Levenshtein distance $L(u, v)$, even in seemingly simple cases. For instance, in the case where $u = a^n b^m$ (so that $\rho(u) = 2$), no general closed-form expression for $L(u, v)$ is known. This lack of an analytic formula complicates the enumeration of all strings $v$ such that $L(u, v) = p$, and thereby hinders the computation of $|S_L(u, p)|$.
When the run-length satisfies $\rho(u) \geq 2$, the allowable transformations are strongly influenced by the run-partition $\mathrm{rp}(u)$, which records the lengths of the successive character runs in $u$. We generated experimental values for the volumes $|S_L(u, p)|$ for selected strings with $\rho(u) \geq 2$, providing benchmark data that future theoretical developments must aim to explain.
A key open direction is to identify and formalize the structural patterns that emerge in the $\rho(u) \geq 2$ case and translate them into rigorous closed-form expressions.
Beyond their intrinsic combinatorial appeal, explicit formulas for Levenshtein sphere volumes have concrete applications in areas such as error-correcting codes (particularly in sequence alignment) and compressed-text indexing. Advancing our understanding of the multi-run case thus offers both theoretical insight and practical impact.

Author Contributions

Methodology, S.A. and O.E.; Investigation, S.A. and O.E.; Writing—review & editing, S.A. and O.E. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Acknowledgments

The authors gratefully acknowledge the valuable comments and suggestions of the three anonymous referees, which significantly improved both the mathematical content and the clarity of the exposition. The authors also acknowledge the support provided by the Deanship of Research at King Fahd University of Petroleum and Minerals, Saudi Arabia.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Levenshtein, V.I. Binary codes capable of correcting deletions, insertions, and reversals. Sov. Phys. Dokl. 1966, 10, 707–710.
  2. Barrón-Cedeno, A.; Stein, B.; Rosso, P. Cross-language plagiarism detection. Lang Resour. Eval. 2011, 45, 45–62.
  3. Brill, E.; Moore, R.C. An improved error model for noisy channel spelling correction. In Proceedings of the 38th Annual Meeting on Association for Computational Linguistics, Hong Kong, China, 3–6 October 2000; pp. 286–293.
  4. Koehn, P. Europarl: A Parallel Corpus for Statistical Machine Translation. In Proceedings of the 10th Machine Translation Summit, Phuket, Thailand, 12–16 September 2005; pp. 79–86.
  5. Jurafsky, D.; Martin, J.H. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition; Prentice Hall: Upper Saddle River, NJ, USA, 2008.
  6. Hamming, R.W. Error detecting and error correcting codes. Bell Syst. Tech. J. 1950, 29, 147–160.
  7. Katz, J.; Lindell, Y. Introduction to Modern Cryptography, 3rd ed.; Chapman & Hall/CRC Cryptography and Network Security; CRC Press: Boca Raton, FL, USA, 2021.
  8. Lin, S.; Costello, D.J. Error Control Coding: Fundamentals and Applications, 2nd ed.; Pearson/Prentice Hall: Upper Saddle River, NJ, USA, 2004.
  9. Andoni, A.; Indyk, P. Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. Commun. ACM 2008, 51, 117–122.
  10. Amir, A.; Amit, M.; Landau, G.M.; Sokol, D. Period recovery of strings over the Hamming and edit distances. Theor. Comput. Sci. 2018, 710, 2–18.
  11. Marçais, G.; DeBlasio, D.; Pandey, P.; Kingsford, C. Locality-sensitive hashing for the edit distance. Bioinformatics 2019, 35, i127–i135.
  12. Malon, S.; Freeman, H. On the encoding of arbitrary geometric configurations. IRE Trans. EC 1961, 10, 260–268.
  13. Koyano, H.; Hayashida, M. Volume formula and growth rates of the balls of strings under the edit distances. Appl. Math. Comput. 2023, 458, 128202.
  14. Wang, M.; Wang, S. Connectivity and diagnosability of center k-ary n-cubes. Discrete Appl. Math. 2021, 294, 98–107.
  15. Wang, M.; Lin, Y.; Wang, S. The connectivity and nature diagnosability of expanded k-ary n-cubes. RAIRO Theor. Inform. Appl. 2017, 51, 71–89.
  16. Bakhtary, P.; Echi, O. On minimal Hamming compatible distances. RAIRO Theor. Inform. Appl. 2014, 48, 495–503.
  17. de Moivre, A. The Doctrine of Chances: Or, a Method of Calculating the Probabilities of Events in Play; Chelsea Publishing Company: New York, NY, USA, 1967.
  18. Navarro, G. A guided tour to approximate string matching. ACM Comput. Surv. 2001, 33, 31–88.
Table 1. Levenshtein spheres $S_L(u, 1)$ of center $u$ and radius $p = 1$ over the alphabet $\Sigma = \{0, 1, 2\}$, illustrating the cardinality $|S_L(u, 1)|$ stated in Theorem 2.

| $u$ | $S_L(u, 1)$ | $\lvert S_L(u, 1) \rvert$ |
|---|---|---|
| 01 | 0, 1, 00, 02, 11, 21, 001, 010, 011, 012, 021, 101, 201 | 13 |
| 010 | 00, 01, 10, 000, 011, 012, 020, 110, 210, 0010, 0100, 0101, 0102, 0110, 0120, 0210, 1010, 2010 | 18 |
| 0101 | 001, 010, 011, 101, 0001, 0100, 0102, 0111, 0121, 0201, 1101, 2101, 00101, 01001, 01010, 01011, 01012, 01021, 01101, 01201, 02101, 10101, 20101 | 23 |
Table 2. Levenshtein spheres $S_L(u, 2)$ over the alphabet $\Sigma = \{0, 1, 2\}$.

| $u$ | $S_L(u, 2)$ | $\lvert S_L(u, 2) \rvert$ |
|---|---|---|
| 01 | $\varepsilon$, 2, 10, 12, 20, 22, 000, 002, 020, 022, 100, 102, 110, 111, 112, 121, 200, 202, 210, 211, 212, 221, 0001, 0010, 0011, 0012, 0021, 0100, 0101, 0102, 0110, 0111, 0112, 0120, 0121, 0122, 0201, 0210, 0211, 0212, 0221, 1001, 1010, 1011, 1012, 1021, 1101, 1201, 2001, 2010, 2011, 2012, 2021, 2101, 2201 | 55 |
| 010 | 0, 1, 02, 11, 12, 20, 21, 001, 002, 021, 022, 100, 101, 102, 111, 112, 120, 200, 201, 211, 212, 220, 0000, 0001, 0002, 0011, 0012, 0020, 0111, 0112, 0121, 0122, 0200, 0201, 0202, 0211, 0212, 0220, 1000, 1011, 1012, 1020, 1100, 1101, 1102, 1110, 1120, 1210, 2000, 2011, 2012, 2020, 2100, 2101, 2102, 2110, 2120, 2210, 00010, 00100, 00101, 00102, 00110, 00120, 00210, 01000, 01001, 01002, 01010, 01011, 01012, 01020, 01021, 01022, 01100, 01101, 01102, 01110, 01120, 01200, 01201, 01202, 01210, 01220, 02010, 02100, 02101, 02102, 02110, 02120, 02210, 10010, 10100, 10101, 10102, 10110, 10120, 10210, 11010, 12010, 20010, 20100, 20101, 20102, 20110, 20120, 20210, 21010, 22010 | 109 |
| 0101 | 00, 01, 10, 11, 000, 002, 012, 020, 021, 100, 102, 110, 111, 121, 201, 210, 211, 0000, 0002, 0010, 0011, 0012, 0021, 0110, 0112, 0120, 0122, 0200, 0202, 0210, 0211, 0221, 1001, 1010, 1011, 1012, 1021, 1100, 1102, 1111, 1121, 1201, 2001, 2010, 2011, 2100, 2102, 2111, 2121, 2201, 00001, 00010, 00011, 00012, 00021, 00100, 00102, 00111, 00121, 00201, 01000, 01002, 01020, 01022, 01100, 01102, 01110, 01111, 01112, 01121, 01200, 01202, 01210, 01211, 01212, 01221, 02001, 02010, 02011, 02012, 02021, 02100, 02102, 02111, 02121, 02201, 10001, 10100, 10102, 10111, 10121, 10201, 11001, 11010, 11011, 11012, 11021, 11101, 11201, 12101, 20001, 20100, 20102, 20111, 20121, 20201, 21001, 21010, 21011, 21012, 21021, 21101, 21201, 22101, 000101, 001001, 001010, 001011, 001012, 001021, 001101, 001201, 002101, 010001, 010010, 010011, 010012, 010021, 010100, 010101, 010102, 010110, 010111, 010112, 010120, 010121, 010122, 010201, 010210, 010211, 010212, 010221, 011001, 011010, 011011, 011012, 011021, 011101, 011201, 012001, 012010, 012011, 012012, 012021, 012101, 012201, 020101, 021001, 021010, 021011, 021012, 021021, 021101, 021201, 022101, 100101, 101001, 101010, 101011, 101012, 101021, 101101, 101201, 102101, 110101, 120101, 200101, 201001, 201010, 201011, 201012, 201021, 201101, 201201, 202101, 210101, 220101 | 187 |
Table 3. Levenshtein spheres $S_L(a^n, 2)$ over $\Sigma = \{0, 1\}$, illustrating Theorem 4.

| $u$ | $S_L(u, 2)$ | $\lvert S_L(u, 2) \rvert$ |
|---|---|---|
| 0 | 11, 000, 001, 010, 011, 100, 101, 110 | 8 |
| 00 | $\varepsilon$, 1, 11, 011, 101, 110, 0000, 0001, 0010, 0011, 0100, 0101, 0110, 1000, 1001, 1010, 1100 | 17 |
| 000 | 0, 01, 10, 011, 101, 110, 0011, 0101, 0110, 1001, 1010, 1100, 00000, 00001, 00010, 00011, 00100, 00101, 00110, 01000, 01001, 01010, 01100, 10000, 10001, 10010, 10100, 11000 | 28 |
| 0000 | 00, 001, 010, 100, 0011, 0101, 0110, 1001, 1010, 1100, 00011, 00101, 00110, 01001, 01010, 01100, 10001, 10010, 10100, 11000, 000000, 000001, 000010, 000011, 000100, 000101, 000110, 001000, 001001, 001010, 001100, 010000, 010001, 010010, 010100, 011000, 100000, 100001, 100010, 100100, 101000, 110000 | 42 |
| 00000 | 000, 0001, 0010, 0100, 1000, 00011, 00101, 00110, 01001, 01010, 01100, 10001, 10010, 10100, 11000, 000011, 000101, 000110, 001001, 001010, 001100, 010001, 010010, 010100, 011000, 100001, 100010, 100100, 101000, 110000, 0000000, 0000001, 0000010, 0000011, 0000100, 0000101, 0000110, 0001000, 0001001, 0001010, 0001100, 0010000, 0010001, 0010010, 0010100, 0011000, 0100000, 0100001, 0100010, 0100100, 0101000, 0110000, 1000000, 1000001, 1000010, 1000100, 1001000, 1010000, 1100000 | 59 |
Table 4. Levenshtein spheres $S_L(u, 3)$ over the alphabet $\Sigma = \{0, 1\}$, illustrating the volume computation described in Theorem 4.

| $u$ | $S_L(u, 3)$ | $\lvert S_L(u, 3) \rvert$ |
|---|---|---|
| 0 | 111, 0000, 0001, 0010, 0011, 0100, 0101, 0110, 0111, 1000, 1001, 1010, 1011, 1100, 1101, 1110 | 16 |
| 00 | 111, 0111, 1011, 1101, 1110, 00000, 00001, 00010, 00011, 00100, 00101, 00110, 00111, 01000, 01001, 01010, 01011, 01100, 01101, 01110, 10000, 10001, 10010, 10011, 10100, 10101, 10110, 11000, 11001, 11010, 11100 | 31 |
| 000 | $\varepsilon$, 1, 11, 111, 0111, 1011, 1101, 1110, 00111, 01011, 01101, 01110, 10011, 10101, 10110, 11001, 11010, 11100, 000000, 000001, 000010, 000011, 000100, 000101, 000110, 000111, 001000, 001001, 001010, 001011, 001100, 001101, 001110, 010000, 010001, 010010, 010011, 010100, 010101, 010110, 011000, 011001, 011010, 011100, 100000, 100001, 100010, 100011, 100100, 100101, 100110, 101000, 101001, 101010, 101100, 110000, 110001, 110010, 110100, 111000 | 60 |
| 0000 | 0, 01, 10, 011, 101, 110, 0111, 1011, 1101, 1110, 00111, 01011, 01101, 01110, 10011, 10101, 10110, 11001, 11010, 11100, 000111, 001011, 001101, 001110, 010011, 010101, 010110, 011001, 011010, 011100, 100011, 100101, 100110, 101001, 101010, 101100, 110001, 110010, 110100, 111000, 0000000, 0000001, 0000010, 0000011, 0000100, 0000101, 0000110, 0000111, 0001000, 0001001, 0001010, 0001011, 0001100, 0001101, 0001110, 0010000, 0010001, 0010010, 0010011, 0010100, 0010101, 0010110, 0011000, 0011001, 0011010, 0011100, 0100000, 0100001, 0100010, 0100011, 0100100, 0100101, 0100110, 0101000, 0101001, 0101010, 0101100, 0110000, 0110001, 0110010, 0110100, 0111000, 1000000, 1000001, 1000010, 1000011, 1000100, 1000101, 1000110, 1001000, 1001001, 1001010, 1001100, 1010000, 1010001, 1010010, 1010100, 1011000, 1100000, 1100001, 1100010, 1100100, 1101000, 1110000 | 104 |
| 00000 | 00, 001, 010, 100, 0011, 0101, 0110, 1001, 1010, 1100, 00111, 01011, 01101, 01110, 10011, 10101, 10110, 11001, 11010, 11100, 000111, 001011, 001101, 001110, 010011, 010101, 010110, 011001, 011010, 011100, 100011, 100101, 100110, 101001, 101010, 101100, 110001, 110010, 110100, 111000, 0000111, 0001011, 0001101, 0001110, 0010011, 0010101, 0010110, 0011001, 0011010, 0011100, 0100011, 0100101, 0100110, 0101001, 0101010, 0101100, 0110001, 0110010, 0110100, 0111000, 1000011, 1000101, 1000110, 1001001, 1001010, 1001100, 1010001, 1010010, 1010100, 1011000, 1100001, 1100010, 1100100, 1101000, 1110000, 00000000, 00000001, 00000010, 00000011, 00000100, 00000101, 00000110, 00000111, 00001000, 00001001, 00001010, 00001011, 00001100, 00001101, 00001110, 00010000, 00010001, 00010010, 00010011, 00010100, 00010101, 00010110, 00011000, 00011001, 00011010, 00011100, 00100000, 00100001, 00100010, 00100011, 00100100, 00100101, 00100110, 00101000, 00101001, 00101010, 00101100, 00110000, 00110001, 00110010, 00110100, 00111000, 01000000, 01000001, 01000010, 01000011, 01000100, 01000101, 01000110, 01001000, 01001001, 01001010, 01001100, 01010000, 01010001, 01010010, 01010100, 01011000, 01100000, 01100001, 01100010, 01100100, 01101000, 01110000, 10000000, 10000001, 10000010, 10000011, 10000100, 10000101, 10000110, 10001000, 10001001, 10001010, 10001100, 10010000, 10010001, 10010010, 10010100, 10011000, 10100000, 10100001, 10100010, 10100100, 10101000, 10110000, 11000000, 11000001, 11000010, 11000100, 11001000, 11010000, 11100000 | 168 |
Table 5. Illustration of $|S_L(u, p)|$ for $p = 2, 3, 5, 8$ over the alphabet $\Sigma = \{0, 1\}$ with fixed length $n = l(u)$ and run-length $\rho(u)$. See Problem 3 for further discussion.

| $n$ | $\rho(u)$ | $u$ | $\lvert S_L(u, 2) \rvert$ | $\lvert S_L(u, 3) \rvert$ | $\lvert S_L(u, 5) \rvert$ | $\lvert S_L(u, 8) \rvert$ |
|---|---|---|---|---|---|---|
| 4 | 2 | 0001 | 47 | 110 | 472 | 4026 |
| 4 | 2 | 1110 | 47 | 110 | 472 | 4026 |
| 4 | 2 | 0011 | 49 | 111 | 474 | 4027 |
| 4 | 2 | 1100 | 49 | 111 | 474 | 4027 |
| 4 | 3 | 0010 | 52 | 112 | 474 | 4028 |
| 4 | 3 | 1101 | 52 | 112 | 474 | 4028 |
| 4 | 3 | 0110 | 53 | 112 | 475 | 4029 |
| 4 | 3 | 1001 | 53 | 112 | 475 | 4029 |
| 4 | 4 | 0101 | 55 | 112 | 474 | 4028 |
| 4 | 4 | 1010 | 55 | 112 | 474 | 4028 |
Table 6. Distinct volumes for equal run-partitions (the alphabet is $\Sigma = \{0, 1, 2\}$).

| String $u$ | Run-length $\rho(u)$ | Run-partition sequence $\mathrm{rp}(u)$ | Volume of the sphere $S_L(u, 2)$ |
|---|---|---|---|
| 0100000 | 3 | (1, 1, 5) | 446 |
| 0122222 | 3 | (1, 1, 5) | 448 |
| 0100001 | 4 | (1, 1, 4, 1) | 479 |
| 0122220 | 4 | (1, 1, 4, 1) | 481 |
| 0111010 | 5 | (1, 3, 1, 1, 1) | 514 |
| 0111012 | 5 | (1, 3, 1, 1, 1) | 517 |
| 0010101 | 6 | (2, 1, 1, 1, 1, 1) | 542 |
| 1102020 | 6 | (2, 1, 1, 1, 1, 1) | 547 |
| 0101010 | 7 | (1, 1, 1, 1, 1, 1, 1) | 565 |
| 0101012 | 7 | (1, 1, 1, 1, 1, 1, 1) | 571 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Algarni, S.; Echi, O. Spheres of Strings Under the Levenshtein Distance. Axioms 2025, 14, 550. https://doi.org/10.3390/axioms14080550
