Special Issue "String Matching and Its Applications"

A special issue of Algorithms (ISSN 1999-4893).

Deadline for manuscript submissions: 31 October 2019.

Special Issue Editors

Guest Editor
Prof. Dr. Thierry Lecroq

LITIS EA 4108, Normastic FR3638, IRIB, University of Rouen Normandie, Normandie University, Rouen, France
Website | E-Mail
Phone: 33(0)235146581
Interests: stringology; bioinformatics; combinatorics on words
Guest Editor
Prof. Dr. Simone Faro

Department of Mathematics and Computer Science, University of Catania, I-95125 Catania, Italy
Website | E-Mail
Interests: algorithms on strings; algorithms on graphs; bioinformatics

Special Issue Information

Dear Colleagues,

With the rapid growth of available data in almost all fields, there is large demand for efficient pattern-matching algorithms. We invite you to submit your latest research in the area of string matching (single or multiple, on-line or off-line, exact or approximate, in uncompressed or compressed form) describing new data structures and/or new algorithms. High-quality papers are solicited to address both theoretical and practical issues of string matching including, but not restricted to, natural language processing, text mining, bioinformatics (DNA, RNA, protein sequences), chemoinformatics, intrusion detection, security, plagiarism detection, digital forensics, video retrieval, and music analysis.

Prof. Dr. Thierry Lecroq
Prof. Dr. Simone Faro
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All papers will be peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Algorithms is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1000 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • exact string matching
  • approximate string matching
  • multiple string matching
  • development and analysis of algorithms
  • text indexing
  • data structures for string matching
  • information retrieval
  • searching for regularities
  • string matching in natural language processing
  • string matching in bioinformatics

Published Papers (4 papers)

View options order results:
result details:
Displaying articles 1-4
Export citation of selected articles as:

Research

Open AccessArticle
Lyndon Factorization Algorithms for Small Alphabets and Run-Length Encoded Strings
Algorithms 2019, 12(6), 124; https://doi.org/10.3390/a12060124
Received: 24 May 2019 / Accepted: 17 June 2019 / Published: 21 June 2019
PDF Full-text (829 KB) | HTML Full-text | XML Full-text
Abstract
We present two modifications of Duval’s algorithm for computing the Lyndon factorization of a string. One of the algorithms has been designed for strings containing runs of the smallest character. It works best for small alphabets and it is able to skip a [...] Read more.
We present two modifications of Duval’s algorithm for computing the Lyndon factorization of a string. One of the algorithms has been designed for strings containing runs of the smallest character. It works best for small alphabets and it is able to skip a significant number of characters of the string. Moreover, it can be engineered to have linear time complexity in the worst case. When there is a run-length encoded string R of length ρ , the other algorithm computes the Lyndon factorization of R in O ( ρ ) time and in constant space. It is shown by experimental results that the new variations are faster than Duval’s original algorithm in many scenarios. Full article
(This article belongs to the Special Issue String Matching and Its Applications)
Figures

Figure 1

Open AccessArticle
Applications of Non-Uniquely Decodable Codes to Privacy-Preserving High-Entropy Data Representation
Algorithms 2019, 12(4), 78; https://doi.org/10.3390/a12040078
Received: 28 February 2019 / Revised: 4 April 2019 / Accepted: 11 April 2019 / Published: 17 April 2019
PDF Full-text (344 KB) | HTML Full-text | XML Full-text
Abstract
Non-uniquely-decodable (non-UD) codes can be defined as the codes that cannot be uniquely decoded without additional disambiguation information. These are mainly the class of non–prefix–free codes, where a code-word can be a prefix of other(s), and thus, the code-word boundary information is essential [...] Read more.
Non-uniquely-decodable (non-UD) codes can be defined as the codes that cannot be uniquely decoded without additional disambiguation information. These are mainly the class of non–prefix–free codes, where a code-word can be a prefix of other(s), and thus, the code-word boundary information is essential for correct decoding. Due to their inherent unique decodability problem, such non-UD codes have not received much attention except a few studies, in which using compressed data structures to represent the disambiguation information efficiently had been previously proposed. It had been shown before that the compression ratio can get quite close to Huffman/Arithmetic codes with an additional capability of providing direct access in compressed data, which is a missing feature in the regular Huffman codes. In this study we investigate non-UD codes in another dimension addressing the privacy of the high-entropy data. We particularly focus on such massive volumes, where typical examples are encoded video or similar multimedia files. Representation of such a volume with non–UD coding creates two elements as the disambiguation information and the payload, where decoding the original data from these elements becomes hard when one of them is missing. We make use of this observation for privacy concerns. and study the space consumption as well as the hardness of that decoding. We conclude that non-uniquely-decodable codes can be an alternative to selective encryption schemes that aim to secure only part of the data when data is huge. We provide a freely available software implementation of the proposed scheme as well. Full article
(This article belongs to the Special Issue String Matching and Its Applications)
Figures

Figure 1

Open AccessArticle
Permuted Pattern Matching Algorithms on Multi-Track Strings
Algorithms 2019, 12(4), 73; https://doi.org/10.3390/a12040073
Received: 28 February 2019 / Revised: 2 April 2019 / Accepted: 3 April 2019 / Published: 8 April 2019
PDF Full-text (1984 KB) | HTML Full-text | XML Full-text
Abstract
A multi-track string is a tuple of strings of the same length. Given the pattern and text of two multi-track strings, the permuted pattern matching problem is to find the occurrence positions of all permutations of the pattern in the text. In this [...] Read more.
A multi-track string is a tuple of strings of the same length. Given the pattern and text of two multi-track strings, the permuted pattern matching problem is to find the occurrence positions of all permutations of the pattern in the text. In this paper, we propose several algorithms for permuted pattern matching. Our first algorithm, which is based on the Knuth–Morris–Pratt (KMP) algorithm, has a fast theoretical computing time with O ( m k ) as the preprocessing time and O ( n k log σ ) as the matching time, where n, m, k, σ , and occ denote the length of the text, the length of the pattern, the number of strings in the multi-track, the alphabet size, and the number of occurrences of the pattern, respectively. We then improve the KMP-based algorithm by using an automaton, which has a better experimental running time. The next proposed algorithms are based on the Boyer–Moore algorithm and the Horspool algorithm that try to perform pattern matching. These algorithms are the fastest experimental algorithms. Furthermore, we propose an extension of the AC-automaton algorithm that can solve dictionary matching on multi-tracks, which is a task to find multiple multi-track patterns in a multi-track text. Finally, we propose filtering algorithms that can perform permuted pattern matching quickly in practice. Full article
(This article belongs to the Special Issue String Matching and Its Applications)
Figures

Figure 1

Open AccessArticle
Efficient Algorithms for Computing the Inner Edit Distance of a Regular Language via Transducers
Algorithms 2018, 11(11), 165; https://doi.org/10.3390/a11110165
Received: 6 October 2018 / Revised: 16 October 2018 / Accepted: 18 October 2018 / Published: 23 October 2018
PDF Full-text (353 KB) | HTML Full-text | XML Full-text
Abstract
The concept of edit distance and its variants has applications in many areas such as computational linguistics, bioinformatics, and synchronization error detection in data communications. Here, we revisit the problem of computing the inner edit distance of a regular language given via a [...] Read more.
The concept of edit distance and its variants has applications in many areas such as computational linguistics, bioinformatics, and synchronization error detection in data communications. Here, we revisit the problem of computing the inner edit distance of a regular language given via a Nondeterministic Finite Automaton (NFA). This problem relates to the inherent maximal error-detecting capability of the language in question. We present two efficient algorithms for solving this problem, both of which execute in time O ( r 2 n 2 d ) , where r is the cardinality of the alphabet involved, n is the number of transitions in the given NFA, and d is the computed edit distance. We have implemented one of the two algorithms and present here a set of performance tests. The correctness of the algorithms is based on the connection between word distances and error detection and the fact that nondeterministic transducers can be used to represent the errors (resp., edit operations) involved in error-detection (resp., in word distances). Full article
(This article belongs to the Special Issue String Matching and Its Applications)
Figures

Figure 1

Planned Papers

The below list represents only planned manuscripts. Some of these manuscripts have not been received by the Editorial Office yet. Papers submitted to MDPI journals are subject to peer-review.

1. Title: Ambiguous Codes with applications

Author: Muhammed Oguzhan Kulekci

Abstract: Ambiguous codes can be defined as the codes that cannot be uniquely decoded without additional disambiguation information. These are mainly the class of non-prex-free codes, where a codeword can be a prex of other(s), and thus, the codeword boundary information is essential for correct decoding. Although that ambiguity can be seen as a disadvantage in terms of compression, actually there are cases where that intrinsic property can make sense. We present first an efficient ambiguous coding/decoding system, which is publicly available at GitHub. Based on that implementation we provide several applications where such a representation helps us to provide privacy, particularly on massive high entropy volumes.

2. Title: Tandem Repeat Discovery Using A Heuristically Derived Threshold for A String Correspondence Metric

Authors: Bruce Watson and Derrick Courie

Abstract:  It is challenging to select appropriate parameter settings that drive approximate tandem repeat detecting algorithms. This study shows how that challenge can be mitigated by deriving a threshold value of a string correspondence metric based on Levenshtein distances. It is shown how an algorithm can use such a threshold value to effectively identify repeat elements. The threshold value is empirically derived from a recall-precision analysis carried out on synthetic data. It is shown that the performance of an algorithm relying on this threshold value compares very favourably to other published tandem repeat discovery algorithms and sometimes outperforms them.

Algorithms EISSN 1999-4893 Published by MDPI AG, Basel, Switzerland RSS E-Mail Table of Contents Alert
Back to Top