Spheres of Strings Under the Levenshtein Distance
Abstract
1. Introduction
- (i)
- , where each is a character and ;
- (ii)
- , where each , with .
2. Spheres of Strings Under the Hamming Metric
3. Scattered Strings and Run-Length Encoding
4. Unit Spheres of Strings Under the Levenshtein Metric
1. For real numbers , the notation represents the intersection .
2. If , where , , and , then v is referred to as (the prefix of u of length i), and w is referred to as (the suffix of u of length j). By convention, .
3. denotes the set of all strings obtained from u by deleting one character. The cardinality of is referred to as the deletion degree of u and denoted by . By convention, . Using prefix–suffix notation,
4. denotes the set of all strings obtained from u by inserting one character. The cardinality of is referred to as the insertion degree of u and denoted by . Using prefix–suffix notation, we have
5. represents the set of all strings obtained from u by replacing one character with a character from . The cardinality of is referred to as the substitution degree of u and denoted by . By convention, . Using prefix–suffix notation, we have
6. For , we denote by the string obtained from u by deleting the character , i.e., .
7. For and , we denote by the string obtained by inserting the character a at position in u, i.e., .
8. For and , we denote by the string obtained by substituting the character with a, i.e., .
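The three one-edit sets described in items 3 to 5 can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function names and the choice to pass the alphabet as a Python string are assumptions made here.

```python
def deletions(u):
    """All strings obtained from u by deleting exactly one character."""
    return {u[:i] + u[i + 1:] for i in range(len(u))}


def insertions(u, alphabet):
    """All strings obtained from u by inserting one character of the alphabet."""
    return {u[:i] + a + u[i:] for i in range(len(u) + 1) for a in alphabet}


def substitutions(u, alphabet):
    """All strings obtained from u by replacing one character with a different one."""
    return {
        u[:i] + a + u[i + 1:]
        for i in range(len(u))
        for a in alphabet
        if a != u[i]
    }


if __name__ == "__main__":
    u, alphabet = "010", "012"
    # The sizes of the three sets are the deletion, insertion and
    # substitution degrees of u.
    print(len(deletions(u)), len(insertions(u, alphabet)), len(substitutions(u, alphabet)))
```

Because sets are used, coinciding results (for example, deleting either character of a run of equal characters) are counted once, which is why the deletion degree can be smaller than the length of u.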
- 0.
- 1.
- If , then
- 2.
- If and , then
- 3.
- If , then .
1. Let

   Case 1: and . Based on Lemma 3, the two equal insertions of u lead to , contradicting the -decomposition of u. So this case cannot happen.

   Case 2: and . Following Lemma 3, we obtain , and

   Case 3: and . Lemma 3 guarantees the equality . This gives

   Case 4: and . In this scenario, and , and as a consequence, can be any character of .

   As a conclusion, we obtain
2. Let

   Based on Lemma 3, this results in .

   We assert that . Otherwise, by Lemma 3, the equality of two insertions of u implies that , which contradicts the -decomposition of u.

   We also claim that . Otherwise, by Lemma 3, , again obtaining a contradiction.

   As a consequence,

   Additionally, it is clear that . Therefore,
3. If , and , then there would be a string
- 1.
- ;
- 2.
- ;
- 3.
1. The minimum value of is . This value is attained by strings structured as , where a is a character from the alphabet.
2. The maximum value of is . This value is attained by strings u where the characters are maximally scattered.
3. Intermediate value result: Every integer in is the volume of for some string u of length n.
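The three statements above can be checked by brute force for small parameters. The sketch below is an illustration under assumptions: the helper names are hypothetical, the unit sphere is taken to be the set of strings at distance exactly 1 from u, and the closed-form bounds of the source are not reproduced here; the code only enumerates.

```python
from itertools import product


def unit_sphere(u, alphabet):
    """Strings at Levenshtein distance exactly 1 from u."""
    n = len(u)
    dels = {u[:i] + u[i + 1:] for i in range(n)}
    subs = {u[:i] + a + u[i + 1:] for i in range(n) for a in alphabet if a != u[i]}
    ins = {u[:i] + a + u[i:] for i in range(n + 1) for a in alphabet}
    # The three sets contain strings of lengths n-1, n and n+1, so they are disjoint.
    return dels | subs | ins


def volumes_by_length(n, alphabet):
    """Map each attained unit-sphere volume to the centers of length n attaining it."""
    volumes = {}
    for chars in product(alphabet, repeat=n):
        u = "".join(chars)
        volumes.setdefault(len(unit_sphere(u, alphabet)), []).append(u)
    return volumes


if __name__ == "__main__":
    vols = volumes_by_length(4, "01")
    for volume in sorted(vols):
        print(volume, vols[volume])
```

For binary strings of length 4 the attained volumes form a block of consecutive integers, with the constant strings at the low end and the alternating strings at the high end, which is the pattern the three items describe.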
5. Spheres of Strings with Centers of Run-Length 1
- 1.
- .
- 2.
- .
- 3.
- 4.
- .
- 5.
- If , then
1. Case 1: . Replace every character of v that is not equal to a with a, obtaining the intermediate string . The number of such substitutions is , where denotes the number of occurrences of a in v. Then, insert the character a exactly times to produce the string . The total number of edit operations is therefore

   which shows that .
2. Case 2: and . Delete every character of v that is not equal to a, obtaining the string . Then, delete occurrences of a to obtain the string . The number of edit operations performed is

   which implies .
3. Case 3: . Delete characters of v that are not equal to a, producing a string w of length n. Then, substitute each character of w that is not equal to a with a, resulting in the string . The total number of edit operations is

   hence .
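The edit sequences in the three cases can be turned into a small function and compared against a standard dynamic-programming Levenshtein distance. This is a sketch under stated assumptions: the case boundaries (length of v at most n; length of v above n with at least n occurrences of a; length of v above n with fewer than n occurrences of a) are read off the prose above, and the function names are hypothetical.

```python
def levenshtein(s, t):
    """Classical row-by-row dynamic program for the Levenshtein distance."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            curr.append(min(prev[j] + 1,                 # delete cs
                            curr[j - 1] + 1,             # insert ct
                            prev[j - 1] + (cs != ct)))   # substitute or match
        prev = curr
    return prev[-1]


def distance_to_constant(v, a, n):
    """Number of edits used by the three cases to turn v into a repeated n times."""
    m, k = len(v), v.count(a)
    if m <= n:          # Case 1: substitute the m - k non-a characters, insert n - m copies of a
        return n - k
    if k >= n:          # Case 2: delete the m - k non-a characters, then k - n copies of a
        return m - n
    return m - k        # Case 3: delete m - n non-a characters, substitute the n - k that remain


if __name__ == "__main__":
    for v in ["", "1", "011", "0000", "0111", "10101"]:
        print(repr(v), distance_to_constant(v, "0", 2), levenshtein(v, "00"))
```

In every case the count equals max(|v|, n) minus min(k, n), where k is the number of occurrences of a in v, and on the sample strings it agrees with the dynamic program, so the upper bounds obtained from the three cases are attained there.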
- 1.
- 2.
- 3.
1. A string v lies in if and only if ; equivalently, . Consequently, whenever .

   Assume now that . Any string v of length in contains exactly occurrences of a; all remaining characters belong to . The number of such strings of length j is

   Hence,
2. A string v with length lies in precisely when and . Set ; then

   The number of strings v of length for which equals

   Consequently,
3. Strings satisfy , , and . Set ; then . For such an integer j, the number of strings v of length with equals
   - If , then the admissible values of j are . Hence,
   - If , then the admissible values of j are . Thus,
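The counting argument above groups strings by their length and by their number of occurrences of a. A minimal sketch of that count follows; it is not the source's closed formula. It assumes that the distance from v to the constant string of length n equals max(|v|, n) minus min(k, n), with k the number of occurrences of a in v, and the function name and parameter names (n for the center length, p for the radius, q for the alphabet size) are chosen here for illustration.

```python
from math import comb


def sphere_volume_constant_center(n, p, q):
    """Count strings over a q-letter alphabet at distance exactly p from a^n."""
    total = 0
    # The distance is at least the length difference, so lengths above n + p cannot occur.
    for m in range(n + p + 1):
        for k in range(m + 1):
            if max(m, n) - min(k, n) == p:
                # Choose the k positions holding a; the other m - k positions
                # take any of the q - 1 remaining characters.
                total += comb(m, k) * (q - 1) ** (m - k)
    return total


if __name__ == "__main__":
    print([sphere_volume_constant_center(n, 2, 2) for n in range(1, 6)])
    print([sphere_volume_constant_center(n, 3, 2) for n in range(1, 6)])
```

For the binary alphabet the two printed lists can be compared with the cardinality columns of the tables whose centers are 0, 00, 000, 0000 and 00000.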
- is monotonically increasing as ;
- as ,
6. Pseudocode and Illustrative Examples
1. Evaluating the Levenshtein distance between arbitrary strings;
2. Enumerating the Levenshtein sphere of radius p centered at a fixed string u and computing its volume.
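For the first task, a standard choice is the Wagner–Fischer dynamic program. The sketch below is offered as an illustration and is not claimed to be the pseudocode of the article; the function name is an assumption.

```python
def levenshtein(s, t):
    """Levenshtein distance between s and t, keeping two rows of the DP table."""
    prev = list(range(len(t) + 1))      # distances from the empty prefix of s
    for i, cs in enumerate(s, start=1):
        curr = [i]                      # distance from s[:i] to the empty string
        for j, ct in enumerate(t, start=1):
            curr.append(min(
                prev[j] + 1,                # delete cs
                curr[j - 1] + 1,            # insert ct
                prev[j - 1] + (cs != ct),   # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
    print(levenshtein("01", "10"))
```

The running time is proportional to the product of the two lengths, and the memory use is proportional to the length of t.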
Algorithm 1. Enumerate all strings at Levenshtein distance p from a given string u
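One way to carry out such an enumeration is a breadth-first search in which two strings are adjacent when they differ by a single edit; the strings first reached at step p are those at distance exactly p. The sketch below follows that idea and is an assumption about the approach, not a transcription of Algorithm 1; the names `neighbors` and `sphere` are hypothetical.

```python
def neighbors(u, alphabet):
    """All strings reachable from u by one deletion, substitution or insertion."""
    out = set()
    for i in range(len(u)):
        out.add(u[:i] + u[i + 1:])                  # deletion
        for a in alphabet:
            if a != u[i]:
                out.add(u[:i] + a + u[i + 1:])      # substitution
    for i in range(len(u) + 1):
        for a in alphabet:
            out.add(u[:i] + a + u[i:])              # insertion
    return out


def sphere(u, alphabet, p):
    """Strings at Levenshtein distance exactly p from u, found layer by layer."""
    seen = {u}
    layer = {u}
    for _ in range(p):
        reached = set()
        for w in layer:
            reached |= neighbors(w, alphabet)
        layer = reached - seen      # keep only strings not met at a smaller distance
        seen |= layer
    return layer


if __name__ == "__main__":
    s = sphere("01", "012", 1)
    print(len(s), sorted(s, key=lambda w: (len(w), w)))
```

The layers grow quickly with p and with the alphabet size, so this direct enumeration is only practical for short strings and small radii.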
7. Discussion
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Levenshtein, V.I. Binary codes capable of correcting deletions, insertions, and reversals. Sov. Phys. Dokl. 1966, 10, 707–710.
- Barrón-Cedeño, A.; Stein, B.; Rosso, P. Cross-language plagiarism detection. Lang. Resour. Eval. 2011, 45, 45–62.
- Brill, E.; Moore, R.C. An improved error model for noisy channel spelling correction. In Proceedings of the 38th Annual Meeting on Association for Computational Linguistics, Hong Kong, China, 3–6 October 2000; pp. 286–293.
- Koehn, P. Europarl: A Parallel Corpus for Statistical Machine Translation. In Proceedings of the 10th Machine Translation Summit, Phuket, Thailand, 12–16 September 2005; pp. 79–86.
- Jurafsky, D.; Martin, J.H. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition; Prentice Hall: Upper Saddle River, NJ, USA, 2008.
- Hamming, R.W. Error detecting and error correcting codes. Bell Syst. Tech. J. 1950, 29, 147–160.
- Katz, J.; Lindell, Y. Introduction to Modern Cryptography, 3rd ed.; Chapman & Hall/CRC Cryptography and Network Security; CRC Press: Boca Raton, FL, USA, 2021.
- Lin, S.; Costello, D.J. Error Control Coding: Fundamentals and Applications, 2nd ed.; Pearson/Prentice Hall: Upper Saddle River, NJ, USA, 2004.
- Andoni, A.; Indyk, P. Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. Commun. ACM 2008, 51, 117–122.
- Amir, A.; Amit, M.; Landau, G.M.; Sokol, D. Period recovery of strings over the Hamming and edit distances. Theor. Comput. Sci. 2018, 710, 2–18.
- Marçais, G.; DeBlasio, D.; Pandey, P.; Kingsford, C. Locality-sensitive hashing for the edit distance. Bioinformatics 2019, 35, i127–i135.
- Malon, S.; Freeman, H. On the encoding of arbitrary geometric configurations. IRE Trans. EC 1961, 10, 260–268.
- Koyano, H.; Hayashida, M. Volume formula and growth rates of the balls of strings under the edit distances. Appl. Math. Comput. 2023, 458, 128202.
- Wang, M.; Wang, S. Connectivity and diagnosability of center k-ary n-cubes. Discrete Appl. Math. 2021, 294, 98–107.
- Wang, M.; Lin, Y.; Wang, S. The connectivity and nature diagnosability of expanded k-ary n-cubes. RAIRO Theor. Inform. Appl. 2017, 51, 71–89.
- Bakhtary, P.; Echi, O. On minimal Hamming compatible distances. RAIRO Theor. Inform. Appl. 2014, 48, 495–503.
- de Moivre, A. The Doctrine of Chances: Or, a Method of Calculating the Probabilities of Events in Play; Chelsea Publishing Company: New York, NY, USA, 1967.
- Navarro, G. A guided tour to approximate string matching. ACM Comput. Surv. 2001, 33, 31–88.
u | Strings in the sphere centered at u | Cardinality |
---|---|---|
01 | 0, 1, 00, 02, 11, 21, 001, 010, 011, 012, 021, 101, 201 | 13 |
010 | 00, 01, 10, 000, 011, 012, 020, 110, 210, 0010, 0100, 0101, 0102, 0110, 0120, 0210, 1010, 2010 | 18 |
0101 | 001, 010, 011, 101, 0001, 0100, 0102, 0111, 0121, 0201, 1101, 2101, 00101, 01001, 01010, 01011, 01012, 01021, 01101, 01201, 02101, 10101, 20101 | 23 |
u | Strings in the sphere centered at u | Cardinality |
---|---|---|
01 | ε (the empty string), 2, 10, 12, 20, 22, 000, 002, 020, 022, 100, 102, 110, 111, 112, 121, 200, 202, 210, 211, 212, 221, 0001, 0010, 0011, 0012, 0021, 0100, 0101, 0102, 0110, 0111, 0112, 0120, 0121, 0122, 0201, 0210, 0211, 0212, 0221, 1001, 1010, 1011, 1012, 1021, 1101, 1201, 2001, 2010, 2011, 2012, 2021, 2101, 2201 | 55 |
010 | 0, 1, 02, 11, 12, 20, 21, 001, 002, 021, 022, 100, 101, 102, 111, 112, 120, 200, 201, 211, 212, 220, 0000, 0001, 0002, 0011, 0012, 0020, 0111, 0112, 0121, 0122, 0200, 0201, 0202, 0211, 0212, 0220, 1000, 1011, 1012, 1020, 1100, 1101, 1102, 1110, 1120, 1210, 2000, 2011, 2012, 2020, 2100, 2101, 2102, 2110, 2120, 2210, 00010, 00100, 00101, 00102, 00110, 00120, 00210, 01000, 01001, 01002, 01010, 01011, 01012, 01020, 01021, 01022, 01100, 01101, 01102, 01110, 01120, 01200, 01201, 01202, 01210, 01220, 02010, 02100, 02101, 02102, 02110, 02120, 02210, 10010, 10100, 10101, 10102, 10110, 10120, 10210, 11010, 12010, 20010, 20100, 20101, 20102, 20110, 20120, 20210, 21010, 22010 | 109 |
0101 | 00, 01, 10, 11, 000, 002, 012, 020, 021, 100, 102, 110, 111, 121, 201, 210, 211, 0000, 0002, 0010, 0011, 0012, 0021, 0110, 0112, 0120, 0122, 0200, 0202, 0210, 0211, 0221, 1001, 1010, 1011, 1012, 1021, 1100, 1102, 1111, 1121, 1201, 2001, 2010, 2011, 2100, 2102, 2111, 2121, 2201, 00001, 00010, 00011, 00012, 00021, 00100, 00102, 00111, 00121, 00201, 01000, 01002, 01020, 01022, 01100, 01102, 01110, 01111, 01112, 01121, 01200, 01202, 01210, 01211, 01212, 01221, 02001, 02010, 02011, 02012, 02021, 02100, 02102, 02111, 02121, 02201, 10001, 10100, 10102, 10111, 10121, 10201, 11001, 11010, 11011, 11012, 11021, 11101, 11201, 12101, 20001, 20100, 20102, 20111, 20121, 20201, 21001, 21010, 21011, 21012, 21021, 21101, 21201, 22101, 000101, 001001, 001010, 001011, 001012, 001021, 001101, 001201, 002101, 010001, 010010, 010011, 010012, 010021, 010100, 010101, 010102, 010110, 010111, 010112, 010120, 010121, 010122, 010201, 010210, 010211, 010212, 010221, 011001, 011010, 011011, 011012, 011021, 011101, 011201, 012001, 012010, 012011, 012012, 012021, 012101, 012201, 020101, 021001, 021010, 021011, 021012, 021021, 021101, 021201, 022101, 100101, 101001, 101010, 101011, 101012, 101021, 101101, 101201, 102101, 110101, 120101, 200101, 201001, 201010, 201011, 201012, 201021, 201101, 201201, 202101, 210101, 220101 | 187 |
u | Strings in the sphere centered at u | Cardinality |
---|---|---|
0 | 11, 000, 001, 010, 011, 100, 101, 110 | 8 |
00 | ε (the empty string), 1, 11, 011, 101, 110, 0000, 0001, 0010, 0011, 0100, 0101, 0110, 1000, 1001, 1010, 1100 | 17 |
000 | 0, 01, 10, 011, 101, 110, 0011, 0101, 0110, 1001, 1010, 1100, 00000, 00001, 00010, 00011, 00100, 00101, 00110, 01000, 01001, 01010, 01100, 10000, 10001, 10010, 10100, 11000 | 28 |
0000 | 00, 001, 010, 100, 0011, 0101, 0110, 1001, 1010, 1100, 00011, 00101, 00110, 01001, 01010, 01100, 10001, 10010, 10100, 11000, 000000, 000001, 000010, 000011, 000100, 000101, 000110, 001000, 001001, 001010, 001100, 010000, 010001, 010010, 010100, 011000, 100000, 100001, 100010, 100100, 101000, 110000 | 42 |
00000 | 000, 0001, 0010, 0100, 1000, 00011, 00101, 00110, 01001, 01010, 01100, 10001, 10010, 10100, 11000, 000011, 000101, 000110, 001001, 001010, 001100, 010001, 010010, 010100, 011000, 100001, 100010, 100100, 101000, 110000, 0000000, 0000001, 0000010, 0000011, 0000100, 0000101, 0000110, 0001000, 0001001, 0001010, 0001100, 0010000, 0010001, 0010010, 0010100, 0011000, 0100000, 0100001, 0100010, 0100100, 0101000, 0110000, 1000000, 1000001, 1000010, 1000100, 1001000, 1010000, 1100000 | 59 |
u | Strings in the sphere centered at u | Cardinality |
---|---|---|
0 | 111, 0000, 0001, 0010, 0011, 0100, 0101, 0110, 0111, 1000, 1001, 1010, 1011, 1100, 1101, 1110 | 16 |
00 | 111, 0111, 1011, 1101, 1110, 00000, 00001, 00010, 00011, 00100, 00101, 00110, 00111, 01000, 01001, 01010, 01011, 01100, 01101, 01110, 10000, 10001, 10010, 10011, 10100, 10101, 10110, 11000, 11001, 11010, 11100 | 31 |
000 | ε (the empty string), 1, 11, 111, 0111, 1011, 1101, 1110, 00111, 01011, 01101, 01110, 10011, 10101, 10110, 11001, 11010, 11100, 000000, 000001, 000010, 000011, 000100, 000101, 000110, 000111, 001000, 001001, 001010, 001011, 001100, 001101, 001110, 010000, 010001, 010010, 010011, 010100, 010101, 010110, 011000, 011001, 011010, 011100, 100000, 100001, 100010, 100011, 100100, 100101, 100110, 101000, 101001, 101010, 101100, 110000, 110001, 110010, 110100, 111000 | 60 |
0000 | 0, 01, 10, 011, 101, 110, 0111, 1011, 1101, 1110, 00111, 01011, 01101, 01110, 10011, 10101, 10110, 11001, 11010, 11100, 000111, 001011, 001101, 001110, 010011, 010101, 010110, 011001, 011010, 011100, 100011, 100101, 100110, 101001, 101010, 101100, 110001, 110010, 110100, 111000, 0000000, 0000001, 0000010, 0000011, 0000100, 0000101, 0000110, 0000111, 0001000, 0001001, 0001010, 0001011, 0001100, 0001101, 0001110, 0010000, 0010001, 0010010, 0010011, 0010100, 0010101, 0010110, 0011000, 0011001, 0011010, 0011100, 0100000, 0100001, 0100010, 0100011, 0100100, 0100101, 0100110, 0101000, 0101001, 0101010, 0101100, 0110000, 0110001, 0110010, 0110100, 0111000, 1000000, 1000001, 1000010, 1000011, 1000100, 1000101, 1000110, 1001000, 1001001, 1001010, 1001100, 1010000, 1010001, 1010010, 1010100, 1011000, 1100000, 1100001, 1100010, 1100100, 1101000, 1110000 | 104 |
00000 | 00, 001, 010, 100, 0011, 0101, 0110, 1001, 1010, 1100, 00111, 01011, 01101, 01110, 10011, 10101, 10110, 11001, 11010, 11100, 000111, 001011, 001101, 001110, 010011, 010101, 010110, 011001, 011010, 011100, 100011, 100101, 100110, 101001, 101010, 101100, 110001, 110010, 110100, 111000, 0000111, 0001011, 0001101, 0001110, 0010011, 0010101, 0010110, 0011001, 0011010, 0011100, 0100011, 0100101, 0100110, 0101001, 0101010, 0101100, 0110001, 0110010, 0110100, 0111000, 1000011, 1000101, 1000110, 1001001, 1001010, 1001100, 1010001, 1010010, 1010100, 1011000, 1100001, 1100010, 1100100, 1101000, 1110000, 00000000, 00000001, 00000010, 00000011, 00000100, 00000101, 00000110, 00000111, 00001000, 00001001, 00001010, 00001011, 00001100, 00001101, 00001110, 00010000, 00010001, 00010010, 00010011, 00010100, 00010101, 00010110, 00011000, 00011001, 00011010, 00011100, 00100000, 00100001, 00100010, 00100011, 00100100, 00100101, 00100110, 00101000, 00101001, 00101010, 00101100, 00110000, 00110001, 00110010, 00110100, 00111000, 01000000, 01000001, 01000010, 01000011, 01000100, 01000101, 01000110, 01001000, 01001001, 01001010, 01001100, 01010000, 01010001, 01010010, 01010100, 01011000, 01100000, 01100001, 01100010, 01100100, 01101000, 01110000, 10000000, 10000001, 10000010, 10000011, 10000100, 10000101, 10000110, 10001000, 10001001, 10001010, 10001100, 10010000, 10010001, 10010010, 10010100, 10011000, 10100000, 10100001, 10100010, 10100100, 10101000, 10110000, 11000000, 11000001, 11000010, 11000100, 11001000, 11010000, 11100000 | 168 |
n | Runs | u | | | | |
---|---|---|---|---|---|---|
4 | 2 | 0001 | 47 | 110 | 472 | 4026 |
4 | 2 | 1110 | 47 | 110 | 472 | 4026 |
4 | 2 | 0011 | 49 | 111 | 474 | 4027 |
4 | 2 | 1100 | 49 | 111 | 474 | 4027 |
4 | 3 | 0010 | 52 | 112 | 474 | 4028 |
4 | 3 | 1101 | 52 | 112 | 474 | 4028 |
4 | 3 | 0110 | 53 | 112 | 475 | 4029 |
4 | 3 | 1001 | 53 | 112 | 475 | 4029 |
4 | 4 | 0101 | 55 | 112 | 474 | 4028 |
4 | 4 | 1010 | 55 | 112 | 474 | 4028 |
String u | Number of Runs | Run-Partition Sequence | Volume of the Sphere |
---|---|---|---|
0100000 | 3 | (1, 1, 5) | 446 |
0122222 | 3 | (1, 1, 5) | 448 |
0100001 | 4 | (1, 1, 4, 1) | 479 |
0122220 | 4 | (1, 1, 4, 1) | 481 |
0111010 | 5 | (1, 3, 1, 1, 1) | 514 |
0111012 | 5 | (1, 3, 1, 1, 1) | 517 |
0010101 | 6 | (2, 1, 1, 1, 1, 1) | 542 |
1102020 | 6 | (2, 1, 1, 1, 1, 1) | 547 |
0101010 | 7 | (1, 1, 1, 1, 1, 1, 1) | 565 |
0101012 | 7 | (1, 1, 1, 1, 1, 1, 1) | 571 |
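The run count and the run-partition sequence listed for each string in the table above can be recomputed with a few lines of Python. This is a small illustrative sketch; the function name `run_partition` is an assumption, and the volume column is not recomputed here.

```python
from itertools import groupby


def run_partition(u):
    """Lengths of the maximal blocks of equal consecutive characters of u."""
    return tuple(len(list(block)) for _, block in groupby(u))


if __name__ == "__main__":
    for u in ["0100000", "0100001", "0111010", "0010101", "0101010"]:
        partition = run_partition(u)
        print(u, len(partition), partition)
```

The number of runs is the length of the returned tuple, and its entries always sum to the length of u.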
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Algarni, S.; Echi, O. Spheres of Strings Under the Levenshtein Distance. Axioms 2025, 14, 550. https://doi.org/10.3390/axioms14080550