Sign in to use this feature.

Years

Between: -

Subjects

remove_circle_outline
remove_circle_outline
remove_circle_outline
remove_circle_outline
remove_circle_outline
remove_circle_outline
remove_circle_outline

Journals

Article Types

Countries / Regions

Search Results (19)

Search Parameters:
Keywords = suffix trees

Order results
Result details
Results per page
Select all
Export citation of selected articles as:
21 pages, 1288 KB  
Article
Intrusion Alert Analysis Method for Power Information Communication Networks Based on Data Processing Units
by Rui Zhang, Mingxuan Zhang, Yan Liu, Zhiyi Li, Weiwei Miao and Sujie Shao
Information 2025, 16(7), 547; https://doi.org/10.3390/info16070547 - 27 Jun 2025
Viewed by 624
Abstract
Leveraging Data Processing Units (DPUs) deployed at network interfaces, the DPU-accelerated Intrusion Detection System (IDS) enables microsecond-latency initial traffic inspection through hardware offloading. However, while generating high-throughput alerts, this mechanism amplifies the inherent redundancy and noise issues of traditional IDS systems. This paper [...] Read more.
Leveraging Data Processing Units (DPUs) deployed at network interfaces, the DPU-accelerated Intrusion Detection System (IDS) enables microsecond-latency initial traffic inspection through hardware offloading. However, while generating high-throughput alerts, this mechanism amplifies the inherent redundancy and noise issues of traditional IDS systems. This paper proposes an alert correlation method using multi-similarity factor aggregation and a suffix tree model. First, alerts are preprocessed using LFDIA, employing multiple similarity factors and dynamic thresholding to cluster correlated alerts and reduce redundancy. Next, an attack intensity time series is generated and smoothed with a Kalman filter to eliminate noise and reveal attack trends. Finally, the suffix tree models attack activities, capturing key behavioral paths of high-severity alerts and identifying attacker patterns. Experimental evaluations on the CPTC-2017 and CPTC-2018 datasets validate the proposed method’s effectiveness in reducing alert redundancy, extracting critical attack behaviors, and constructing attack activity sequences. The results demonstrate that the method not only significantly reduces the number of alerts but also accurately reveals core attack characteristics, enhancing the effectiveness of network security defense strategies. Full article
Show Figures

Figure 1

29 pages, 3843 KB  
Review
Whole-Genome Alignment: Methods, Challenges, and Future Directions
by Bacem Saada, Tianchi Zhang, Estevao Siga, Jing Zhang and Maria Malane Magalhães Muniz
Appl. Sci. 2024, 14(11), 4837; https://doi.org/10.3390/app14114837 - 3 Jun 2024
Cited by 7 | Viewed by 11696
Abstract
Whole-genome alignment (WGA) is a critical process in comparative genomics, facilitating the detection of genetic variants and aiding our understanding of evolution. This paper offers a detailed overview and categorization of WGA techniques, encompassing suffix tree-based, hash-based, anchor-based, and graph-based methods. It elaborates [...] Read more.
Whole-genome alignment (WGA) is a critical process in comparative genomics, facilitating the detection of genetic variants and aiding our understanding of evolution. This paper offers a detailed overview and categorization of WGA techniques, encompassing suffix tree-based, hash-based, anchor-based, and graph-based methods. It elaborates on the algorithmic properties of these tools, focusing on performance and methodological aspects. This paper underscores the latest progress in WGA, emphasizing the increasing capacity to manage the growing intricacy and volume of genomic data. However, the field still grapples with computational and biological hurdles affecting the precision and speed of WGA. We explore these challenges and potential future solutions. This paper aims to provide a comprehensive resource for researchers, deepening our understanding of WGA tools and their applications, constraints, and prospects. Full article
(This article belongs to the Section Biomedical Engineering)
Show Figures

Figure 1

20 pages, 5061 KB  
Article
Machine Learning Classification of Time since BNT162b2 COVID-19 Vaccination Based on Array-Measured Antibody Activity
by Qing-Lan Ma, Fei-Ming Huang, Wei Guo, Kai-Yan Feng, Tao Huang and Yu-Dong Cai
Life 2023, 13(6), 1304; https://doi.org/10.3390/life13061304 - 31 May 2023
Cited by 2 | Viewed by 3466
Abstract
Vaccines trigger an immunological response that includes B and T cells, with B cells producing antibodies. SARS-CoV-2 immunity weakens over time after vaccination. Discovering key changes in antigen-reactive antibodies over time after vaccination could help improve vaccine efficiency. In this study, we collected [...] Read more.
Vaccines trigger an immunological response that includes B and T cells, with B cells producing antibodies. SARS-CoV-2 immunity weakens over time after vaccination. Discovering key changes in antigen-reactive antibodies over time after vaccination could help improve vaccine efficiency. In this study, we collected data on blood antibody levels in a cohort of healthcare workers vaccinated for COVID-19 and obtained 73 antigens in samples from four groups according to the duration after vaccination, including 104 unvaccinated healthcare workers, 534 healthcare workers within 60 days after vaccination, 594 healthcare workers between 60 and 180 days after vaccination, and 141 healthcare workers over 180 days after vaccination. Our work was a reanalysis of the data originally collected at Irvine University. This data was obtained in Orange County, California, USA, with the collection process commencing in December 2020. British variant (B.1.1.7), South African variant (B.1.351), and Brazilian/Japanese variant (P.1) were the most prevalent strains during the sampling period. An efficient machine learning based framework containing four feature selection methods (least absolute shrinkage and selection operator, light gradient boosting machine, Monte Carlo feature selection, and maximum relevance minimum redundancy) and four classification algorithms (decision tree, k-nearest neighbor, random forest, and support vector machine) was designed to select essential antibodies against specific antigens. Several efficient classifiers with a weighted F1 value around 0.75 were constructed. The antigen microarray used for identifying antibody levels in the coronavirus features ten distinct SARS-CoV-2 antigens, comprising various segments of both nucleocapsid protein (NP) and spike protein (S). This study revealed that S1 + S2, S1.mFcTag, S1.HisTag, S1, S2, Spike.RBD.His.Bac, Spike.RBD.rFc, and S1.RBD.mFc were most highly ranked among all features, where S1 and S2 are the subunits of Spike, and the suffixes represent the tagging information of different recombinant proteins. Meanwhile, the classification rules were obtained from the optimal decision tree to explain quantitatively the roles of antigens in the classification. This study identified antibodies associated with decreased clinical immunity based on populations with different time spans after vaccination. These antibodies have important implications for maintaining long-term immunity to SARS-CoV-2. Full article
(This article belongs to the Special Issue COVID-19 Prevention and Treatment: 2nd Edition)
Show Figures

Figure 1

18 pages, 612 KB  
Article
Structure and Sequence Aligned Code Summarization with Prefix and Suffix Balanced Strategy
by Jianhui Zeng, Zhiheng Qu and Bo Cai
Entropy 2023, 25(4), 570; https://doi.org/10.3390/e25040570 - 26 Mar 2023
Cited by 1 | Viewed by 2829
Abstract
Source code summarization focuses on generating qualified natural language descriptions of a code snippet (e.g., functionality, usage and version). In an actual development environment, descriptions of the code are missing or not consistent with the code due to human factors, which makes it [...] Read more.
Source code summarization focuses on generating qualified natural language descriptions of a code snippet (e.g., functionality, usage and version). In an actual development environment, descriptions of the code are missing or not consistent with the code due to human factors, which makes it difficult for developers to comprehend and conduct subsequent maintenance. Some existing methods generate summaries from the sequence information of code without considering the structural information. Recently, researchers have adopted the Graph Neural Networks (GNNs) to capture the structural information with modified Abstract Syntax Trees (ASTs) to comprehensively represent a source code, but the alignment method of the two information encoder is hard to decide. In this paper, we propose a source code summarization model named SSCS, a unified transformer-based encoder–decoder architecture, for capturing structural and sequence information. SSCS is designed upon a structure-induced transformer with three main novel improvements. SSCS captures the structural information in a multi-scale aspect with an adapted fusion strategy and adopts a hierarchical encoding strategy to capture the textual information from the perspective of the document. Moreover, SSCS utilizes a bidirectional decoder which generates a summary from opposite direction to balance the generation performance between prefix and suffix. We conduct experiments on two public Java and Python datasets to evaluate our method and the result show that SSCS outperforms the state-of-art code summarization methods. Full article
(This article belongs to the Special Issue Forward Error Correction for Optical CDMA Networks)
Show Figures

Figure 1

27 pages, 496 KB  
Review
A Review on Planted (l, d) Motif Discovery Algorithms for Medical Diagnose
by Satarupa Mohanty, Prasant Kumar Pattnaik, Ahmed Abdulhakim Al-Absi and Dae-Ki Kang
Sensors 2022, 22(3), 1204; https://doi.org/10.3390/s22031204 - 5 Feb 2022
Cited by 3 | Viewed by 3474
Abstract
Personalized diagnosis of chronic disease requires capturing the continual pattern across the biological sequence. This repeating pattern in medical science is called “Motif”. Motifs are the short, recurring patterns of biological sequences that are supposed signify some health disorder. They identify the binding [...] Read more.
Personalized diagnosis of chronic disease requires capturing the continual pattern across the biological sequence. This repeating pattern in medical science is called “Motif”. Motifs are the short, recurring patterns of biological sequences that are supposed signify some health disorder. They identify the binding sites for transcription factors that modulate and synchronize the gene expression. These motifs are important for the analysis and interpretation of various health issues like human disease, gene function, drug design, patient’s conditions, etc. Searching for these patterns is an important step in unraveling the mechanisms of gene expression properly diagnose and treat chronic disease. Thus, motif identification has a vital role in healthcare studies and attracts many researchers. Numerous approaches have been characterized for the motif discovery process. This article attempts to review and analyze fifty-four of the most frequently found motif discovery processes/algorithms from different approaches and summarizes the discussion with their strengths and weaknesses. Full article
Show Figures

Figure 1

26 pages, 1892 KB  
Article
Reversed Lempel–Ziv Factorization with Suffix Trees
by Dominik Köppl
Algorithms 2021, 14(6), 161; https://doi.org/10.3390/a14060161 - 21 May 2021
Cited by 1 | Viewed by 4271
Abstract
We present linear-time algorithms computing the reversed Lempel–Ziv factorization [Kolpakov and Kucherov, TCS’09] within the space bounds of two different suffix tree representations. We can adapt these algorithms to compute the longest previous non-overlapping reverse factor table [Crochemore et al., JDA’12] within the [...] Read more.
We present linear-time algorithms computing the reversed Lempel–Ziv factorization [Kolpakov and Kucherov, TCS’09] within the space bounds of two different suffix tree representations. We can adapt these algorithms to compute the longest previous non-overlapping reverse factor table [Crochemore et al., JDA’12] within the same space but pay a multiplicative logarithmic time penalty. Full article
(This article belongs to the Special Issue Combinatorial Methods for String Processing)
Show Figures

Figure 1

21 pages, 844 KB  
Article
Non-Overlapping LZ77 Factorization and LZ78 Substring Compression Queries with Suffix Trees
by Dominik Köppl
Algorithms 2021, 14(2), 44; https://doi.org/10.3390/a14020044 - 29 Jan 2021
Cited by 9 | Viewed by 5006
Abstract
We present algorithms computing the non-overlapping Lempel–Ziv-77 factorization and the longest previous non-overlapping factor table within small space in linear or near-linear time with the help of modern suffix tree representations fitting into limited space. With similar techniques, we show how to answer [...] Read more.
We present algorithms computing the non-overlapping Lempel–Ziv-77 factorization and the longest previous non-overlapping factor table within small space in linear or near-linear time with the help of modern suffix tree representations fitting into limited space. With similar techniques, we show how to answer substring compression queries for the Lempel–Ziv-78 factorization with a possible logarithmic multiplicative slowdown depending on the used suffix tree representation. Full article
(This article belongs to the Special Issue Algorithms and Data-Structures for Compressed Computation)
Show Figures

Figure 1

25 pages, 442 KB  
Article
Subpath Queries on Compressed Graphs: A Survey
by Nicola Prezza
Algorithms 2021, 14(1), 14; https://doi.org/10.3390/a14010014 - 5 Jan 2021
Cited by 4 | Viewed by 4470
Abstract
Text indexing is a classical algorithmic problem that has been studied for over four decades: given a text T, pre-process it off-line so that, later, we can quickly count and locate the occurrences of any string (the query pattern) in T in [...] Read more.
Text indexing is a classical algorithmic problem that has been studied for over four decades: given a text T, pre-process it off-line so that, later, we can quickly count and locate the occurrences of any string (the query pattern) in T in time proportional to the query’s length. The earliest optimal-time solution to the problem, the suffix tree, dates back to 1973 and requires up to two orders of magnitude more space than the plain text just to be stored. In the year 2000, two breakthrough works showed that efficient queries can be achieved without this space overhead: a fast index be stored in a space proportional to the text’s entropy. These contributions had an enormous impact in bioinformatics: today, virtually any DNA aligner employs compressed indexes. Recent trends considered more powerful compression schemes (dictionary compressors) and generalizations of the problem to labeled graphs: after all, texts can be viewed as labeled directed paths. In turn, since finite state automata can be considered as a particular case of labeled graphs, these findings created a bridge between the fields of compressed indexing and regular language theory, ultimately allowing to index regular languages and promising to shed new light on problems, such as regular expression matching. This survey is a gentle introduction to the main landmarks of the fascinating journey that took us from suffix trees to today’s compressed indexes for labeled graphs and regular languages. Full article
(This article belongs to the Special Issue Combinatorial Methods for String Processing)
Show Figures

Figure 1

9 pages, 920 KB  
Article
Efficient Data Structures for Range Shortest Unique Substring Queries
by Paniz Abedin, Arnab Ganguly, Solon P. Pissis and Sharma V. Thankachan
Algorithms 2020, 13(11), 276; https://doi.org/10.3390/a13110276 - 30 Oct 2020
Cited by 9 | Viewed by 3319
Abstract
Let T[1,n] be a string of length n and T[i,j] be the substring of T starting at position i and ending at position j. A substring T[i,j] [...] Read more.
Let T[1,n] be a string of length n and T[i,j] be the substring of T starting at position i and ending at position j. A substring T[i,j] of T is a repeat if it occurs more than once in T; otherwise, it is a unique substring of T. Repeats and unique substrings are of great interest in computational biology and information retrieval. Given string T as input, the Shortest Unique Substring problem is to find a shortest substring of T that does not occur elsewhere in T. In this paper, we introduce the range variant of this problem, which we call the Range Shortest Unique Substring problem. The task is to construct a data structure over T answering the following type of online queries efficiently. Given a range [α,β], return a shortest substring T[i,j] of T with exactly one occurrence in [α,β]. We present an O(nlogn)-word data structure with O(logwn) query time, where w=Ω(logn) is the word size. Our construction is based on a non-trivial reduction allowing for us to apply a recently introduced optimal geometric data structure [Chan et al., ICALP 2018]. Additionally, we present an O(n)-word data structure with O(nlogϵn) query time, where ϵ>0 is an arbitrarily small constant. The latter data structure relies heavily on another geometric data structure [Nekrich and Navarro, SWAT 2012]. Full article
(This article belongs to the Special Issue Combinatorial Methods for String Processing)
Show Figures

Figure 1

9 pages, 252 KB  
Article
More Time-Space Tradeoffs for Finding a Shortest Unique Substring
by Hideo Bannai, Travis Gagie, Gary Hoppenworth, Simon J. Puglisi and Luís M. S. Russo
Algorithms 2020, 13(9), 234; https://doi.org/10.3390/a13090234 - 18 Sep 2020
Cited by 2 | Viewed by 3252
Abstract
We extend recent results regarding finding shortest unique substrings (SUSs) to obtain new time-space tradeoffs for this problem and the generalization of finding k-mismatch SUSs. Our new results include the first algorithm for finding a k-mismatch SUS in sublinear space, which [...] Read more.
We extend recent results regarding finding shortest unique substrings (SUSs) to obtain new time-space tradeoffs for this problem and the generalization of finding k-mismatch SUSs. Our new results include the first algorithm for finding a k-mismatch SUS in sublinear space, which we obtain by extending an algorithm by Senanayaka (2019) and combining it with a result on sketching by Gawrychowski and Starikovskaya (2019). We first describe how, given a text T of length n and m words of workspace, with high probability we can find an SUS of length L in O(n(L/m)logL) time using random access to T, or in O(n(L/m)log2(L)loglogσ) time using O((L/m)log2L) sequential passes over T. We then describe how, for constant k, with high probability, we can find a k-mismatch SUS in O(n1+ϵL/m) time using O(nϵL/m) sequential passes over T, again using only m words of workspace. Finally, we also describe a deterministic algorithm that takes O(nτlogσlogn) time to find an SUS using O(n/τ) words of workspace, where τ is a parameter. Full article
(This article belongs to the Special Issue Algorithms in Bioinformatics)
22 pages, 4908 KB  
Article
Detecting Pattern Anomalies in Hydrological Time Series with Weighted Probabilistic Suffix Trees
by Yufeng Yu, Dingsheng Wan, Qun Zhao and Huan Liu
Water 2020, 12(5), 1464; https://doi.org/10.3390/w12051464 - 21 May 2020
Cited by 8 | Viewed by 4350
Abstract
Anomalous patterns are common phenomena in time series datasets. The presence of anomalous patterns in hydrological data may represent some anomalous hydrometeorological events that are significantly different from others and induce a bias in the decision-making process related to design, operation and management [...] Read more.
Anomalous patterns are common phenomena in time series datasets. The presence of anomalous patterns in hydrological data may represent some anomalous hydrometeorological events that are significantly different from others and induce a bias in the decision-making process related to design, operation and management of water resources. Hence, it is necessary to extract those “anomalous” knowledge that can provide valuable and useful information for future hydrological analysis and forecasting from hydrological data. This paper focuses on the problem of detecting anomalous patterns from hydrological time series data, and proposes an effective and accurate anomalous pattern detection approach, TFSAX_wPST, which combines the advantages of the Trend Feature Symbolic Aggregate approximation (TFSAX) and weighted Probabilistic Suffix Tree (wPST). Experiments with different hydrological real-world time series are reported, and the results indicate that the proposed methods are fast and can correctly detect anomalous patterns for hydrological time series analysis, and thus promote the deep analysis and continuous utilization of hydrological time series data. Full article
(This article belongs to the Special Issue Computational Ecohydrology)
Show Figures

Figure 1

2 pages, 154 KB  
Editorial
Editorial: Special Issue on Efficient Data Structures
by Jesper Jansson
Algorithms 2019, 12(7), 136; https://doi.org/10.3390/a12070136 - 5 Jul 2019
Viewed by 3566
Abstract
This Special Issue of Algorithms is focused on the design, formal analysis, implementation, and experimental evaluation of efficient data structures for various computational problems. Full article
(This article belongs to the Special Issue Efficient Data Structures)
11 pages, 446 KB  
Article
Sliding Suffix Tree
by Andrej Brodnik and Matevž Jekovec
Algorithms 2018, 11(8), 118; https://doi.org/10.3390/a11080118 - 3 Aug 2018
Cited by 6 | Viewed by 7410
Abstract
We consider a sliding window W over a stream of characters from some alphabet of constant size. We want to look up a pattern in the current sliding window content and obtain all positions of the matches. We present an indexed version of [...] Read more.
We consider a sliding window W over a stream of characters from some alphabet of constant size. We want to look up a pattern in the current sliding window content and obtain all positions of the matches. We present an indexed version of the sliding window, based on a suffix tree. The data structure of size Θ(|W|) has optimal time queries Θ(m+occ) and amortized constant time updates, where m is the length of the query string and occ is its number of occurrences. Full article
(This article belongs to the Special Issue Efficient Data Structures)
Show Figures

Figure 1

17 pages, 2590 KB  
Article
Feature-Based and String-Based Models for Predicting RNA-Protein Interaction
by Donald Adjeroh, Maen Allaga, Jun Tan, Jie Lin, Yue Jiang, Ahmed Abbasi and Xiaobo Zhou
Molecules 2018, 23(3), 697; https://doi.org/10.3390/molecules23030697 - 19 Mar 2018
Cited by 17 | Viewed by 5648
Abstract
In this work, we study two approaches for the problem of RNA-Protein Interaction (RPI). In the first approach, we use a feature-based technique by combining extracted features from both sequences and secondary structures. The feature-based approach enhanced the prediction accuracy as it included [...] Read more.
In this work, we study two approaches for the problem of RNA-Protein Interaction (RPI). In the first approach, we use a feature-based technique by combining extracted features from both sequences and secondary structures. The feature-based approach enhanced the prediction accuracy as it included much more available information about the RNA-protein pairs. In the second approach, we apply search algorithms and data structures to extract effective string patterns for prediction of RPI, using both sequence information (protein and RNA sequences), and structure information (protein and RNA secondary structures). This led to different string-based models for predicting interacting RNA-protein pairs. We show results that demonstrate the effectiveness of the proposed approaches, including comparative results against leading state-of-the-art methods. Full article
Show Figures

Figure 1

23 pages, 664 KB  
Article
Local Similarity Search to Find Gene Indicators in Mitochondrial Genomes
by Ruby L. V. Moritz, Matthias Bernt and Martin Middendorf
Biology 2014, 3(1), 220-242; https://doi.org/10.3390/biology3010220 - 11 Mar 2014
Cited by 1 | Viewed by 7704
Abstract
Given a set of nucleotide sequences we consider the problem of identifying conserved substrings occurring in homologous genes in a large number of sequences. The problem is solved by identifying certain nodes in a suffix tree containing all substrings occurring in the given [...] Read more.
Given a set of nucleotide sequences we consider the problem of identifying conserved substrings occurring in homologous genes in a large number of sequences. The problem is solved by identifying certain nodes in a suffix tree containing all substrings occurring in the given nucleotide sequences. Due to the large size of the targeted data set, our approach employs a truncated version of suffix trees. Two methods for this task are introduced: (1) The annotation guided marker detection method uses gene annotations which might contain a moderate number of errors; (2) The probability based marker detection method determines sequences that appear significantly more often than expected. The approach is successfully applied to the mitochondrial nucleotide sequences, and the corresponding annotations that are available in RefSeq for 2989 metazoan species. We demonstrate that the approach finds appropriate substrings. Full article
(This article belongs to the Special Issue Developments in Bioinformatic Algorithms)
Show Figures

Back to TopTop