Special Issue "Big Data Algorithmics"

A special issue of Algorithms (ISSN 1999-4893). This special issue belongs to the section "Algorithms for Databases, and Data Structures".

Deadline for manuscript submissions: 31 August 2020.

Special Issue Editors

Prof. Andrea Pietracaprina
Guest Editor
Department of Information Engineering, University of Padova, I-35131 Padova, Italy
Interests: big data algorithmics and data mining; algorithms and data structures for parallel and hierarchical platforms; models of computation; wireless interconnection networks and routing
Prof. Geppino Pucci
Guest Editor
Department of Information Engineering, University of Padova, I-35131 Padova, Italy
Interests: algorithms and data structures for big data; models of computation; wireless interconnection networks and routing; graph analytics; high-performance numerical kernels; data mining
Prof. Francesco Silvestri
Guest Editor
Department of Information Engineering, University of Padova, I-35131 Padova, Italy
Interests: algorithms and data structures for big data; parallel and memory-efficient algorithms; algorithms for similarity search; models of computation

Special Issue Information

Dear Colleagues,

The last two decades have witnessed an impressive growth in the amount of digital data produced across a wide spectrum of application domains. The effective exploitation of these data is of great value to science, business, government, and society, and it has been regarded as one of the most important scientific challenges of the 21st century. Due to their volume, velocity, heterogeneity, and noise, big data require a sharp paradigm shift with respect to traditional computing. Novel computational frameworks such as MapReduce and Streaming have recently established themselves as standards for big data processing, both in academia and in industry. The effective use of these frameworks requires new approaches to algorithm design. For instance, algorithmic solutions must take into account that the data are typically distributed or streamed and, in any case, cannot fit in a single machine's local memory. Moreover, since exact solutions are often computationally infeasible for very large and possibly noisy inputs, suitable tradeoffs must be sought between the accuracy of the returned solution and computational efficiency.
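To give a flavor of the streaming constraint mentioned above, the following Python sketch maintains a uniform random sample of an arbitrarily long stream in constant memory (Vitter's classic Algorithm R). This is an illustrative example using only the standard library; it is not tied to any of the works in this issue.

```python
import random

def reservoir_sample(stream, k, seed=0):
    """Keep a uniform random sample of k items from a stream of unknown
    length, using O(k) memory (Vitter's Algorithm R)."""
    rng = random.Random(seed)
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)
        else:
            # Item i replaces a reservoir slot with probability k / (i + 1),
            # which keeps every prefix item equally likely to survive.
            j = rng.randrange(i + 1)
            if j < k:
                sample[j] = item
    return sample

# A 10-element uniform sample of a million-item stream, in constant memory.
sample = reservoir_sample(range(1_000_000), 10)
```

The key point is that the algorithm never stores more than k items, no matter how long the stream is, which is exactly the regime in which the input "cannot fit in a single machine's local memory".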

Despite the considerable effort that the research community has been devoting to big data algorithmics, there is a dire need for new contributions, as well as for efforts to clean up and systematize the most recent advances in the field. This Special Issue aims to take a step in this direction by soliciting original and unpublished contributions in the area. Topics of interest include, but are not limited to, those listed below.

TOPICS OF INTEREST

  • Computational Models for Big Data Processing
  • MapReduce Algorithms
  • Streaming Algorithms
  • Parallel and Distributed Algorithms for Data Intensive Computing
  • Approximation Algorithms for Massive Instances of Optimization Problems
  • Algorithms for Large Graphs
  • Models and Algorithms for Large Evolving Data and Sequences
  • Models and Algorithms for Large Uncertain Data
  • Dimensionality Reduction Techniques
  • Compact Data Structures and Sketches
  • Sampling Methodologies
  • Impossibility Results
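As a small illustration of the "Compact Data Structures and Sketches" topic, the following Python example implements a basic Count-Min sketch: approximate frequency counting in space sublinear in the number of distinct items. The parameters and the class itself are illustrative choices, not taken from any paper in this issue.

```python
import random

class CountMinSketch:
    """Approximate frequency counts in small space. Estimates never
    undercount; overcounts are bounded with high probability for
    width ~ e/eps and depth ~ ln(1/delta)."""

    def __init__(self, width=2000, depth=5, seed=0):
        rng = random.Random(seed)
        self.width, self.depth = width, depth
        # One independent salt per row stands in for a hash function.
        self.salts = [rng.getrandbits(64) for _ in range(depth)]
        self.table = [[0] * width for _ in range(depth)]

    def _cols(self, item):
        for salt in self.salts:
            yield hash((salt, item)) % self.width

    def add(self, item, count=1):
        for row, col in enumerate(self._cols(item)):
            self.table[row][col] += count

    def estimate(self, item):
        # Every row overestimates due to collisions; the minimum over
        # rows is the tightest available bound.
        return min(self.table[row][col]
                   for row, col in enumerate(self._cols(item)))

cms = CountMinSketch()
for word in ["big"] * 100 + ["data"] * 40 + ["algo"] * 7:
    cms.add(word)
# cms.estimate("big") is at least 100 and at most the total count, 147.
```

The sketch uses O(width × depth) counters regardless of how many distinct items the stream contains, which is what makes it attractive for the massive-data settings targeted by this Special Issue.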

Prof. Andrea Pietracaprina
Prof. Geppino Pucci
Prof. Francesco Silvestri
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once registered, proceed to the submission form. Manuscripts can be submitted until the deadline. All papers will be peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Algorithms is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1000 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Published Papers (2 papers)


Research

Open Access Article
Mining Sequential Patterns with VC-Dimension and Rademacher Complexity
Algorithms 2020, 13(5), 123; https://doi.org/10.3390/a13050123 - 18 May 2020
Abstract
Sequential pattern mining is a fundamental data mining task with applications in several domains. We study two variants of this task: the first is the extraction of frequent sequential patterns, whose frequency in a dataset of sequential transactions is higher than a user-provided threshold; the second is the mining of true frequent sequential patterns, which appear with probability above a user-defined threshold in transactions drawn from the generative process underlying the data. We present the first sampling-based algorithm to mine, with high confidence, a rigorous approximation of the frequent sequential patterns from massive datasets. We also present the first algorithms to mine approximations of the true frequent sequential patterns with rigorous guarantees on the quality of the output. Our algorithms are based on novel applications of Vapnik-Chervonenkis dimension and Rademacher complexity, advanced tools from statistical learning theory, to sequential pattern mining. Our extensive experimental evaluation shows that our algorithms provide high-quality approximations for both problems we consider.
(This article belongs to the Special Issue Big Data Algorithmics)
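The sampling idea behind this line of work can be caricatured in a few lines of Python. The toy sketch below estimates pattern frequencies from a uniform sample of transactions rather than a full scan; it is emphatically not the authors' algorithm, which derives its sample-size guarantees from VC-dimension and Rademacher complexity rather than from naive uniform sampling, and the dataset and patterns are invented for illustration.

```python
import random

def occurs(pattern, txn):
    """True if `pattern` appears as a subsequence of transaction `txn`.
    (`item in it` advances the iterator, so order is respected.)"""
    it = iter(txn)
    return all(item in it for item in pattern)

def sample_frequencies(dataset, patterns, sample_size, seed=0):
    """Estimate each pattern's frequency from a uniform sample of
    transactions instead of scanning the whole dataset."""
    rng = random.Random(seed)
    sample = rng.sample(dataset, sample_size)
    return {pat: sum(occurs(pat, t) for t in sample) / sample_size
            for pat in patterns}

# A toy dataset of 1000 sequential transactions; both patterns below
# occur in exactly half of them.
dataset = [("a", "b"), ("a", "b", "c"), ("a",), ("b", "c")] * 250
freqs = sample_frequencies(dataset, [("a", "b"), ("c",)], sample_size=200)
# freqs[("a", "b")] and freqs[("c",)] are both close to 0.5, up to sampling error.
```

The point of the rigorous machinery in the paper is precisely to bound, in advance, how large the sample must be for all such estimates to be simultaneously accurate.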

Open Access Article
Deterministic Coresets for k-Means of Big Sparse Data
Algorithms 2020, 13(4), 92; https://doi.org/10.3390/a13040092 - 14 Apr 2020
Abstract
Let P be a set of n points in R^d, let k ≥ 1 be an integer, and let ε ∈ (0, 1) be a constant. An ε-coreset is a subset C ⊆ P with appropriate non-negative weights (scalars) that approximates any given set Q ⊂ R^d of k centers. That is, the sum of squared distances from every point in P to its closest point in Q equals, up to a factor of 1 ± ε, the weighted sum of squared distances from C to the same k centers. If the coreset is small, we can solve problems such as k-means clustering or its variants (e.g., discrete k-means, where the centers are restricted to lie in P or in other restricted zones) on the small coreset to obtain faster provable approximations. Moreover, it is known that such coresets support streaming, dynamic, and distributed data using the classic merge-reduce trees. The fact that the coreset is a subset implies that it preserves the sparsity of the data. However, existing coresets of this kind are randomized, and their size has at least a linear dependency on the dimension d. We suggest the first such coreset of size independent of d. This is also the first deterministic coreset construction whose resulting size is not exponential in d. Extensive experimental results and benchmarks are provided on public datasets, including the first coreset of the English Wikipedia, using Amazon's cloud.
(This article belongs to the Special Issue Big Data Algorithmics)
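To make the coreset definition concrete, the short Python sketch below queries a weighted subset in place of the full point set. It uses naive uniform sampling with uniform weights, which is much weaker than the deterministic construction of the paper; it only illustrates how a weighted coreset is evaluated against a candidate set of centers, and all data and centers here are invented for the example.

```python
import random

def kmeans_cost(points, centers, weights=None):
    """Sum of (weighted) squared distances from each point to its
    nearest center, i.e., the k-means objective."""
    weights = weights if weights is not None else [1.0] * len(points)
    return sum(w * min(sum((p - c) ** 2 for p, c in zip(pt, ctr))
                       for ctr in centers)
               for pt, w in zip(points, weights))

rng = random.Random(0)
points = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(5000)]

# Naive uniform-sampling "coreset": each of the m sampled points stands
# in for n/m original points via its weight.
m = 200
coreset = rng.sample(points, m)
weights = [len(points) / m] * m

centers = [(0.0, 0.0), (2.0, 2.0)]
full = kmeans_cost(points, centers)
approx = kmeans_cost(coreset, centers, weights)
# `approx` estimates `full` up to sampling error.
```

An ε-coreset in the sense of the abstract makes this approximation guarantee hold simultaneously for every choice of k centers, which is what allows clustering to be solved on the subset alone.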
