Article

Solving the Longest Common Subsequence Problem Concerning Non-Uniform Distributions of Letters in Input Strings

by Bojan Nikolic 1,†, Aleksandar Kartelj 2,†, Marko Djukanovic 1,3,*,†, Milana Grbic 1,†, Christian Blum 4,† and Günther Raidl 3,†
1 Faculty of Natural Science and Mathematics, University of Banja Luka, 78 000 Banja Luka, Bosnia and Herzegovina
2 Faculty of Mathematics, University of Belgrade, 105104 Belgrade, Serbia
3 Institute of Logic and Computation, Faculty of Informatics, TU Wien, 1040 Vienna, Austria
4 Artificial Intelligence Research Institute (IIIA-CSIC), Campus UAB, 08193 Bellaterra, Spain
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Academic Editor: Alfredo Milani
Mathematics 2021, 9(13), 1515; https://doi.org/10.3390/math9131515
Received: 21 May 2021 / Revised: 23 June 2021 / Accepted: 25 June 2021 / Published: 29 June 2021
(This article belongs to the Section Mathematics and Computer Science)
Abstract: The longest common subsequence (LCS) problem is a prominent NP-hard optimization problem where, given an arbitrary set of input strings, the aim is to find a longest subsequence that is common to all input strings. This problem has a variety of applications in bioinformatics, molecular biology, and file plagiarism checking, among others. All previous approaches from the literature are dedicated to solving LCS instances sampled from uniform or near-to-uniform probability distributions of letters in the input strings. In this paper, we introduce an approach that is able to deal effectively with more general cases, where the occurrence of letters in the input strings follows a non-uniform distribution such as a multinomial distribution. The proposed approach makes use of a time-restricted beam search, guided by a novel heuristic named Gmpsum. This heuristic combines two complementary scoring functions in the form of a convex combination. Furthermore, apart from the close-to-uniform benchmark sets from the related literature, we introduce three new benchmark sets that differ in terms of their statistical properties. One of these sets concerns a case study in the context of text analysis. We provide a comprehensive empirical evaluation in two distinct settings: (1) short-time execution with a fixed beam size in order to evaluate the guidance abilities of the compared search heuristics; and (2) long-time execution with fixed target duration times in order to obtain high-quality solutions. In both settings, the newly proposed approach performs comparably to state-of-the-art techniques in the context of close-to-uniform instances and outperforms state-of-the-art approaches for non-uniform instances.
Keywords: longest common subsequence problem; multinomial distribution; probability-based search guidance
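
The general idea of a beam search for the LCS problem, guided by a convex combination of two scoring functions, can be illustrated with a minimal Python sketch. This is not the Gmpsum heuristic from the paper: the two scores used here (`ub_min_len` and `ub_letter_counts`) are simple stand-in upper bounds chosen for illustration, and all function names, the weight `lam`, and the parameter `beam_width` are assumptions. The sketch also omits the time restriction and assumes a non-empty list of input strings.

```python
from collections import Counter


def is_subsequence(w, s):
    """Return True if w can be obtained from s by deleting letters."""
    it = iter(s)
    return all(ch in it for ch in w)


def successors(strings, pos, alphabet):
    """Yield (letter, new_pos) for each letter found in every remaining suffix."""
    for a in alphabet:
        nxt = []
        for s, p in zip(strings, pos):
            i = s.find(a, p)  # leftmost occurrence of a at or after position p
            if i < 0:
                break
            nxt.append(i + 1)
        else:
            yield a, tuple(nxt)


def ub_min_len(strings, pos):
    """Stand-in score 1: length of the shortest remaining suffix."""
    return min(len(s) - p for s, p in zip(strings, pos))


def ub_letter_counts(strings, pos):
    """Stand-in score 2: sum over letters of the minimum remaining count."""
    counts = [Counter(s[p:]) for s, p in zip(strings, pos)]
    return sum(min(c[a] for c in counts) for a in counts[0])


def score(strings, pos, lam):
    """Convex combination of the two stand-in scores, with 0 <= lam <= 1."""
    return lam * ub_min_len(strings, pos) + (1 - lam) * ub_letter_counts(strings, pos)


def beam_search_lcs(strings, beam_width=10, lam=0.5):
    """Return a common subsequence found by level-wise beam search."""
    alphabet = sorted(set.intersection(*(set(s) for s in strings)))
    beam = [("", tuple(0 for _ in strings))]
    best = ""
    while beam:
        # Expand every node; keep one partial solution per position vector.
        children = {}
        for prefix, pos in beam:
            for a, new_pos in successors(strings, pos, alphabet):
                children.setdefault(new_pos, prefix + a)
        if not children:
            break
        # Keep the beam_width children with the highest heuristic score.
        ranked = sorted(children.items(),
                        key=lambda kv: score(strings, kv[0], lam),
                        reverse=True)
        beam = [(prefix, pos) for pos, prefix in ranked[:beam_width]]
        best = beam[0][0]  # all nodes on one level have equal length
    return best


if __name__ == "__main__":
    inputs = ["abcbdab", "bdcaba", "badacb"]
    result = beam_search_lcs(inputs, beam_width=5, lam=0.5)
    print(result, all(is_subsequence(result, s) for s in inputs))
```

In this sketch a small `beam_width` trades solution quality for speed, while `lam` shifts the guidance between the two scores; with a beam wide enough to hold every node of a level, the search becomes exhaustive.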

MDPI and ACS Style

Nikolic, B.; Kartelj, A.; Djukanovic, M.; Grbic, M.; Blum, C.; Raidl, G. Solving the Longest Common Subsequence Problem Concerning Non-Uniform Distributions of Letters in Input Strings. Mathematics 2021, 9, 1515. https://doi.org/10.3390/math9131515
