Article

Performance Evaluation Analysis of Spark Streaming Backpressure for Data-Intensive Pipelines

1 Institute of Informatics, Federal University of Rio Grande do Sul, UFRGS/PPGC, Porto Alegre 91501-970, RS, Brazil
2 LIG-ERODS, Université Grenoble Alpes, 38058 Grenoble, France
3 Graduate Program in Teleinformatics Engineering, Federal University of Ceará, PPGETI/UFC, Center of Technology, Campus of Pici, Fortaleza 60455-970, CE, Brazil
4 COPELABS, Universidade Lusófona de Humanidades e Tecnologias, 1749-024 Lisboa, Portugal
5 VALORIZA, Research Center for Endogenous Resource Valorization, Polytechnic Institute of Portalegre, 7300-555 Portalegre, Portugal
* Author to whom correspondence should be addressed.
Academic Editors: Aris Leivadeas, Vasileios Karyotis and Dimitrios Dechouniotis
Sensors 2022, 22(13), 4756; https://doi.org/10.3390/s22134756 (registering DOI)
Received: 20 May 2022 / Revised: 17 June 2022 / Accepted: 20 June 2022 / Published: 23 June 2022
Over the last decade, a significant rise in the adoption of streaming applications has changed decision-making processes. This shift has led to the emergence of several Big Data technologies for in-memory processing, such as Apache Storm, Spark, Heron, Samza, and Flink. Spark Streaming, a widespread open-source implementation, processes data-intensive applications that often require large amounts of memory. However, the Spark Unified Memory Manager cannot properly manage sudden or intensive data surges and their related in-memory caching needs, which results in degraded performance and throughput, high latency, a large number of garbage collection operations, out-of-memory issues, and data loss. This work presents a comprehensive performance evaluation of Spark Streaming backpressure to investigate the hypothesis that it could support data-intensive pipelines under specific pressure requirements. The results reveal that backpressure is suitable only for small and medium pipelines, in both stateless and stateful applications. Furthermore, the work points out the Spark Streaming limitations that lead to in-memory-based issues for data-intensive pipelines and stateful applications. In addition, it indicates potential solutions.
Keywords: backpressure; big data; spark streaming; stream processing
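
For readers unfamiliar with the mechanism under evaluation, the following is a minimal, self-contained sketch of a PID-style ingestion-rate estimator, the kind of feedback control on which Spark Streaming backpressure (enabled through `spark.streaming.backpressure.enabled`) is based. It is an illustration only, not the authors' experimental setup and not Spark's actual code: the class name, field names, and default gains are assumptions chosen for the example, and it runs without any Spark dependency.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PidRateEstimator:
    """Hypothetical PID-style estimator of a sustainable ingestion rate.

    All names and default gains are illustrative assumptions.
    Rates are expressed in records per second.
    """
    batch_interval_ms: float
    proportional: float = 1.0   # weight of the current rate error
    integral: float = 0.2       # weight of the accumulated (scheduling) backlog
    derivative: float = 0.0     # weight of the error trend
    min_rate: float = 100.0     # lower bound so ingestion never stalls
    latest_time_ms: float = -1.0
    latest_rate: float = -1.0
    latest_error: float = -1.0
    first_run: bool = True

    def compute(self, time_ms: float, num_elements: int,
                processing_delay_ms: float,
                scheduling_delay_ms: float) -> Optional[float]:
        """Return a new rate bound, or None if no update can be made."""
        # Ignore stale or empty batch reports.
        if (time_ms <= self.latest_time_ms or num_elements <= 0
                or processing_delay_ms <= 0):
            return None

        delay_since_update_s = (time_ms - self.latest_time_ms) / 1000.0
        # Rate at which the last batch was actually processed.
        processing_rate = num_elements / processing_delay_ms * 1000.0
        # Positive error: we ingested faster than we could process.
        error = self.latest_rate - processing_rate
        # Backlog implied by the time the batch waited before starting.
        historical_error = (scheduling_delay_ms * processing_rate
                            / self.batch_interval_ms)
        d_error = (error - self.latest_error) / delay_since_update_s

        new_rate = max(
            self.latest_rate
            - self.proportional * error
            - self.integral * historical_error
            - self.derivative * d_error,
            self.min_rate,
        )
        self.latest_time_ms = time_ms

        if self.first_run:
            # The first report only seeds the state; no bound is emitted.
            self.latest_rate = processing_rate
            self.latest_error = 0.0
            self.first_run = False
            return None

        self.latest_rate = new_rate
        self.latest_error = error
        return new_rate


if __name__ == "__main__":
    estimator = PidRateEstimator(batch_interval_ms=1000.0)
    # First batch: 1000 records processed in 500 ms (seeds the estimator).
    estimator.compute(1000.0, 1000, 500.0, 0.0)
    # Second batch: processing slowed down and a scheduling delay built up,
    # so the estimator lowers the ingestion bound.
    print(estimator.compute(2000.0, 2000, 2000.0, 1000.0))
```

In this sketch, a batch that takes longer than the batch interval and accumulates scheduling delay pushes the rate bound down. It does not model memory: a rate bound of this kind limits how fast records are admitted, but it does not by itself account for cached or stateful data already held in memory.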

MDPI and ACS Style

Matteussi, K.J.; dos Anjos, J.C.S.; Leithardt, V.R.Q.; Geyer, C.F.R. Performance Evaluation Analysis of Spark Streaming Backpressure for Data-Intensive Pipelines. Sensors 2022, 22, 4756. https://doi.org/10.3390/s22134756

AMA Style

Matteussi KJ, dos Anjos JCS, Leithardt VRQ, Geyer CFR. Performance Evaluation Analysis of Spark Streaming Backpressure for Data-Intensive Pipelines. Sensors. 2022; 22(13):4756. https://doi.org/10.3390/s22134756

Chicago/Turabian Style

Matteussi, Kassiano J., Julio C.S. dos Anjos, Valderi R.Q. Leithardt, and Claudio F.R. Geyer. 2022. "Performance Evaluation Analysis of Spark Streaming Backpressure for Data-Intensive Pipelines" Sensors 22, no. 13: 4756. https://doi.org/10.3390/s22134756
