Engineering Proceedings
  • Proceeding Paper
  • Open Access

18 July 2025

Event-Driven Data Orchestration: A Modular Approach for High-Volume Real-Time Processing †

1 Faculty of Mathematics and Informatics, University of Plovdiv “Paisii Hilendarski”, 4000 Plovdiv, Bulgaria
2 Faculty of Physics and Technology, University of Plovdiv “Paisii Hilendarski”, 4000 Plovdiv, Bulgaria
* Author to whom correspondence should be addressed.
Presented at the 14th International Scientific Conference TechSys 2025—Engineering, Technology and Systems, Plovdiv, Bulgaria, 15–17 May 2025.
This article belongs to the Proceedings The 14th International Scientific Conference TechSys 2025—Engineering, Technologies and Systems

Abstract

This article presents a model for orchestrating data extraction, processing, and storage, addressing the challenges posed by diverse data sources and increasing data volumes. The proposed model includes three primary components: data production, data transfer, and data consumption and storage. Key architectures for data production are explored, such as modular designs and distributed processes, each with advantages and limitations regarding scalability, fault tolerance, and resource efficiency. A buffering module is introduced to enable temporary data storage, ensuring resilience and asynchronous processing. The data consumption module focuses on transforming and storing data in data warehouses while providing options for parallel and unified processing architectures to enhance efficiency. Additionally, a notification module demonstrates real-time alerts based on specific data events, integrating seamlessly with messaging platforms like Telegram. The model is designed to ensure adaptability, scalability, and robustness for modern data-driven applications, making it a versatile solution for effective data flow management.

1. Introduction

Data orchestration is the coordinated movement of data from different sources to a single center where it can be stored and processed. It encompasses the collection, pre-processing, quality assurance, and storage of information, supported by various approaches, models, and tools that ensure data security and authenticity. Effective coordination and handling of information enables organizations to draw sound conclusions and make informed decisions, which in turn supports successful development. Coordinated management of data extraction, processing, and storage has therefore become paramount for organizations that want to use their information assets successfully. With the proliferation of devices and data sources, a robust model that seamlessly organizes these processes is needed. In this article, we propose a model that we have implemented to meet this challenge; it allows data extraction, processing, and storage to be controlled securely, easily, and conveniently, and it focuses on the roles of data production modules, buffers, and data stream consumers.
Effective orchestration of data extraction, processing, and storage has become a cornerstone of modern digital systems seeking to unlock the full potential of an organization’s information assets. With data volumes, velocities, and types continuing to increase, underpinned by the exponential adoption of IoT devices, cloud services, and enterprise systems, the demand for strong data management frameworks rises correspondingly [1,2,3]. Historically, pioneering works highlighted the need for well-organized data as a basis for decision-making and strategic planning; the early works of [4,5] laid the groundwork for contemporary architectures that prioritize modularity, fault tolerance, and scalability [6,7,8].
This work revisits these principles by proposing an overall model that smoothly integrates data production, buffering, and consumption to address contemporary challenges. Unlike earlier centralized approaches that suffered from bottlenecks and single points of failure, the model emphasizes modularity, distributed processing, and real-time analytics, inspired by recent work in cloud computing and asynchronous processing [6]. It integrates components such as modular adapters, message brokers, and notification systems to avoid shortcomings such as system downtime, wasted resources, and slow processing.
In addition, notification mechanisms for data changes are integrated into the model, showing its commitment to proactive communication and decision-making. It reflects a shift in focus from static data storage to dynamic data flow, thus fitting well into the real-world demands of e-commerce, healthcare, and finance, among others [9]. Accordingly, this work advances the state of the art in data management from its traditional roots to modern needs for flexibility, efficiency, and fault tolerance in data-driven ecosystems.
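To illustrate how such a notification mechanism can be wired to a messaging platform, the following is a minimal sketch that pushes an alert through the Telegram Bot API sendMessage method; the bot token, chat identifier, severity field, and the alert_on_change helper are placeholder assumptions for this example rather than the exact implementation described in this work.

```python
import requests

TELEGRAM_API = "https://api.telegram.org/bot{token}/sendMessage"


def send_alert(token: str, chat_id: str, text: str) -> None:
    """Push a notification message to a Telegram chat via the Bot API."""
    response = requests.post(
        TELEGRAM_API.format(token=token),
        json={"chat_id": chat_id, "text": text},
        timeout=10,
    )
    response.raise_for_status()


def alert_on_change(event: dict, token: str, chat_id: str) -> None:
    # Hypothetical rule: notify only when an incoming data event is flagged as critical.
    if event.get("severity") == "critical":
        send_alert(token, chat_id, f"Data event detected: {event.get('name', 'unknown')}")
```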
In the domain of large-scale data collection and processing, Lambda Architecture and Kappa Architecture [10] have become foundational paradigms. Lambda Architecture combines a batch layer for accurate and complete historical processing with a speed layer for real-time, low-latency updates. While effective for ensuring accuracy and fault tolerance, this dual-layer approach introduces considerable complexity, as it requires maintaining and synchronizing two distinct code paths for the same business logic. To address this, Kappa Architecture was introduced as a simplification, advocating for a unified stream processing model where all data, past and present, is treated as a continuous stream, processed and reprocessed in a single layer. Despite reducing complexity, Kappa Architecture assumes a more uniform data model and may face challenges when accommodating highly heterogeneous data sources or when needing modularity and flexibility in processing logic. By contrast, the proposed event-driven orchestration model advances these paradigms by emphasizing modular adapters for diverse producers, configurable buffering mechanisms, and unified or parallel consumer architectures. This design offers greater adaptability for real-time heterogeneous environments such as e-commerce and IoT systems, where data formats and processing needs can be highly variable. As part of our future work, we plan to conduct a detailed comparative analysis between the proposed model and existing models like Lambda and Kappa Architectures to further validate performance, flexibility, and operational efficiency across diverse application scenarios.

3. Resource Optimization and Performance Insights

To evaluate the efficiency of different consumer architectures within the proposed data orchestration model, we conducted a comparative benchmark focusing on CPU utilization, memory consumption, latency, and throughput.
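As an illustration of how such metrics can be sampled, the sketch below wraps a consumer callable and reports throughput, average latency, CPU, and memory usage for the current process using the psutil library; the measure_consumer name and the batch-based measurement loop are assumptions made for this example, not the exact benchmark harness used in the study.

```python
import time

import psutil


def measure_consumer(consumer, messages):
    """Run a consumer callable over a batch of messages and report
    throughput, average latency, CPU, and memory usage for this process."""
    process = psutil.Process()
    psutil.cpu_percent(interval=None)  # prime the CPU counter
    start = time.perf_counter()
    for msg in messages:
        consumer(msg)
    elapsed = time.perf_counter() - start
    return {
        "throughput_msg_per_s": len(messages) / elapsed,
        "avg_latency_ms": 1000 * elapsed / len(messages),
        "cpu_percent": psutil.cpu_percent(interval=None),
        "memory_mb": process.memory_info().rss / (1024 * 1024),
    }
```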

3.1. Experimental Data and Scenario

The experimental scenario was designed to evaluate the performance and flexibility of the proposed orchestration model in handling real-world data collection and processing. Specifically, the scenario focused on collecting and processing data from four distinct simulated data sources, each representing a different origin type and emulating common data generation patterns observed in real-world information systems. The objective of the scenario was to validate how the system performs under diverse and asynchronous data loads, and to compare the efficiency of the two proposed architectures (parallel and unified) under identical conditions.
For data generation, we developed a dedicated external application responsible for producing randomized objects. These objects contained varied attributes, including identifiers, names, preferences, and behavioral indicators, designed to reflect diverse and semi-structured data formats commonly encountered in dynamic user-centric environments. The data retrieval service, implemented with built-in adapters as part of the producer module, established connections to these external applications to ingest the simulated data in real time.
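As a minimal sketch of such a data generator, the following produces randomized, semi-structured user-like objects; the field names, value pools, and the generate_user function are illustrative assumptions rather than the exact schema used in the experiment.

```python
import random
import uuid

NAMES = ["Ana", "Boris", "Carla", "Dimo"]
PREFERENCES = ["sports", "music", "travel", "technology"]


def generate_user() -> dict:
    """Produce one randomized, semi-structured user-like object."""
    user = {
        "id": str(uuid.uuid4()),
        "name": random.choice(NAMES),
        "preferences": random.sample(PREFERENCES, k=random.randint(1, 3)),
        "activity_score": round(random.random(), 3),
    }
    # Simulate heterogeneity: some objects carry extra, source-specific fields.
    if random.random() < 0.5:
        user["last_seen_source"] = random.choice(["web", "mobile", "iot", "crm"])
    return user
```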
The simulated environment enabled us to
  • Emulate real-world data heterogeneity, as objects featured variations in attributes;
  • Test the adaptability of the ingestion layer to dynamic data formats through modular adapters (a minimal adapter sketch follows this list);
  • Validate processing and storage efficiency under realistic streaming workloads.
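To make the adapter idea concrete, the following is a minimal sketch of a modular adapter interface that normalizes heterogeneous payloads before they enter the buffer; the SourceAdapter base class, the JsonApiAdapter name, and the chosen record layout are hypothetical simplifications for illustration only.

```python
from abc import ABC, abstractmethod


class SourceAdapter(ABC):
    """Base interface for producer-side adapters: each adapter knows how to
    turn one source's raw payload into a normalized record."""

    @abstractmethod
    def normalize(self, raw: dict) -> dict:
        ...


class JsonApiAdapter(SourceAdapter):
    def normalize(self, raw: dict) -> dict:
        # Map source-specific keys onto a common record layout;
        # unknown fields are preserved under "extras" to keep schema flexibility.
        known = {"id", "name", "preferences"}
        return {
            "id": raw.get("id"),
            "name": raw.get("name"),
            "preferences": raw.get("preferences", []),
            "extras": {k: v for k, v in raw.items() if k not in known},
        }
```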
For data persistence, we used a NoSQL database, which was selected based on its suitability for storing semi-structured and schema-flexible data types. NoSQL storage ensured that the variability in incoming objects could be accommodated without the need for rigid schema definitions, aligning well with the principles of scalability and adaptability.
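For illustration, a minimal sketch of this persistence step is shown below, assuming a MongoDB instance accessed through the pymongo driver; the connection string, database, and collection names are placeholders, not the deployment used in the experiment.

```python
from pymongo import MongoClient

# Placeholder connection string, database, and collection names.
client = MongoClient("mongodb://localhost:27017")
collection = client["orchestration_demo"]["users"]


def store_record(record: dict) -> None:
    """Insert a normalized, schema-flexible record; documents with different
    optional fields can coexist in the same collection without a fixed schema."""
    collection.insert_one(record)
```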

3.2. Performance Benchmarking and Results

The results, summarized in Table 1 and Figure 7, reveal distinct trade-offs between parallel and unified consumption strategies. The Parallel Consumers architecture, while achieving a slightly higher throughput of 5000 messages/s, incurred a significantly higher CPU usage (75%) and memory consumption (1200 MB). In contrast, the Unified Consumer design maintained a competitive throughput of 4500 messages/s, while demonstrating substantially lower CPU usage (40%) and memory usage (300 MB).
Table 1. Parallel Consumers and Unified Consumer results.
Figure 7. Parallel Consumers and Unified Consumer results.
These results suggest that although the Parallel Consumers model offers marginally better throughput, it does so at the expense of resource efficiency. The unified model emerges as a more sustainable solution, particularly suitable for resource-constrained environments or cloud-native deployments with cost and performance trade-offs. This evaluation validates the architectural decision to support both configurations within the model, enabling system designers to balance performance and efficiency based on domain-specific requirements.
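To illustrate the two configurations compared above, the following is a simplified sketch contrasting a pool of parallel consumer processes with a single unified consumer loop draining a shared buffer; the queue-based buffer, the handle function, and the worker count are stand-ins for the actual broker and processing logic.

```python
import multiprocessing as mp


def handle(message):
    # Stand-in for the real transformation and storage logic.
    pass


def unified_consumer(buffer: mp.Queue) -> None:
    """A single worker drains the shared buffer sequentially."""
    while True:
        message = buffer.get()
        if message is None:  # sentinel: shut down this worker
            break
        handle(message)


def run_parallel_consumers(buffer: mp.Queue, workers: int = 4) -> None:
    """Several identical workers compete for messages from the same buffer.
    One None sentinel per worker must be enqueued to stop all processes."""
    processes = [mp.Process(target=unified_consumer, args=(buffer,)) for _ in range(workers)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
```

In the parallel configuration each worker runs as a separate process with its own interpreter state and connections, which is consistent with the higher CPU and memory usage reported for the Parallel Consumers setup, while the unified configuration trades a small amount of throughput for a much smaller resource footprint.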

3.3. Results of System Objectives

The implications of these findings are directly related to the model’s overarching objectives:
Adaptability: The ability to seamlessly ingest variable User objects using modular adapters and store them in a schema-flexible NoSQL database demonstrates the model’s adaptability to diverse data types and application domains.
Scalability: Reduced CPU and memory consumption in the unified model enhances scalability, as additional consumers can be deployed cost-effectively to meet increasing demand without performance degradation.
Robustness: Despite lower resource consumption, the unified model sustained near-equivalent throughput and ensured smooth data flow, thus maintaining robustness under real-time processing conditions.
These results validate the proposed model’s capacity to meet the goals of adaptability, scalability, and robustness, making it suitable for modern data-driven applications.

4. Conclusions

In this study, we proposed a robust and adaptable model for orchestrating data extraction, processing, and storage, addressing the growing need for efficient and scalable data management solutions. The model incorporates modular and distributed architectures for data production, ensuring compatibility with heterogeneous data sources while balancing scalability and fault tolerance. A buffering module was introduced as a crucial intermediary to decouple data ingestion and processing, enhancing system resilience and load balancing. Furthermore, the data consumption and storage module demonstrated multiple approaches to optimize performance and scalability, particularly through parallel processing and unified worker processes.
We also showcased the implementation of an integrated notification module, highlighting its ability to provide real-time alerts tailored to specific business needs. This flexible architecture allows seamless integration of additional functionalities without disrupting existing workflows. Despite some challenges, such as managing distributed processes and ensuring consistency, the proposed model effectively addresses the critical demands of modern data-driven systems. It lays a strong foundation for future advancements in data orchestration, offering organizations a scalable, fault-tolerant, and efficient framework for leveraging their data assets.
In the present era, defined by complex and pervasive information, adopting such a data orchestration model can help drive innovation, gain competitive advantage, and deliver a superior user experience.

Author Contributions

Introduction, S.D. and M.D.; conceptualization, S.D. and M.D.; visualization, S.D. and M.D.; related work and background, S.D.; module platform, S.D. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Research Fund, MUPD2-2-FTF-023, National Program “Young Scientists and Postdoctoral Fellows—2” at the University of Plovdiv “Paisii Hilendarski”.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Acknowledgments

This article is part of the work on the project NP “Young scientists, doctoral students and postdoctoral fellows”—2, at the University of Plovdiv “Paisii Hilendarski”.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Murad, S. Big data marketing. In The Essentials of Today’s Marketing; Tarakçı, İ.E., Aslan, R., Eds.; Efe Academy Publishing: Muğla, Turkey, 2023; pp. 265–284. [Google Scholar]
  2. Divyeshkumar, V. Big Data in Marketing Strategy. Available online: https://www.igi-global.com/gateway/chapter/359370 (accessed on 1 July 2025).
  3. K, S.K. Exploring Real-Time Data Processing Using Big Data Frameworks. Commun. Appl. Nonlinear Anal. 2024, 31, 620–634. [Google Scholar] [CrossRef]
  4. Inmon, W.H. Building the Data Warehouse; Wiley: Hoboken, NJ, USA, 1995. [Google Scholar]
  5. Kimball, R. The Data Warehouse Toolkit: Practical Techniques for Building Dimensional Data Warehouses; Wiley: Hoboken, NJ, USA, 1996. [Google Scholar]
  6. Bystrov, S.; Kushnerov, A. Asynchronous Data Processing. Behavior Analysis. 2022. Available online: https://www.researchgate.net/profile/Alexander-Kushnerov/publication/362910934_Asynchronous_Data_Processing_Behavior_Analysis/links/6307535a61e4553b9538f614/Asynchronous-Data-Processing-Behavior-Analysis.pdf (accessed on 15 July 2025).
  7. Alfaia, E.C.; Dusi, M.; Fiori, L.; Gringoli, F.; Niccolini, S. Fault-Tolerant Streaming Computation with BlockMon. In Proceedings of the 2015 IEEE Global Communications Conference (GLOBECOM), San Diego, CA, USA, 6–10 December 2015. [Google Scholar]
  8. Pogiatzis, A.; Samakovitis, G. An Event-Driven Serverless ETL Pipeline on AWS. Appl. Sci. 2021, 11, 191. [Google Scholar] [CrossRef]
  9. Khriji, S.; Benbelgacem, Y.; Chéour, R.; Houssaini, D.E.; Kanoun, O. Design and implementation of a cloud-based event-driven architecture for real-time data processing in wireless sensor networks. J. Supercomput. 2022, 78, 3374–3401. [Google Scholar] [CrossRef]
  10. Tatipamula, S. Real-Time vs. Batch Data Processing: When speed matters. World J. Adv. Res. Rev. 2025, 26, 1612–1631. [Google Scholar] [CrossRef]
  11. Speckhard, D.; Bechtel, T.; Ghiringhelli, L.M.; Kuban, M.; Rigamonti, S.; Draxl, C. How big is Big Data? Faraday Discuss. 2024, 256, 483–502. [Google Scholar] [CrossRef] [PubMed]
  12. Holmstedt, J. Development of a modular open systems approach to achieve power distribution component commonality. In Proceedings of the Ground Vehicle Systems Engineering and Technology Symposium (GVSETS), Novi, MI, USA, 15–17 August 2023. [Google Scholar] [CrossRef]
  13. Wang, M. Design and implementation of asynchronous FIFO. Appl. Comput. Eng. 2024, 70, 220–226. [Google Scholar] [CrossRef]
  14. Meixia, M.; Siqi, Z.; Jiawei, L.; Jianghong, W. Aggregatably Verifiable Data Streaming. IEEE Internet Things J. 2024, 11, 24109–24122. [Google Scholar] [CrossRef]
  15. Arora, D.; Sonwane, A.; Wadhwa, N.; Mehrotra, A.; Utpala, S.; Bairi, R.; Kanade, A.; Natarajan, N. MASAI: Modular Architecture for Software-engineering AI Agents. arXiv 2024, arXiv:2406.11638. [Google Scholar] [CrossRef]
  16. Mainali, D.; Nagarkoti, M.; Dangol, J.; Pandit, D.; Adhikari, O.; Sharma, O.P. Cloud Computing Fault Tolerance. Int. J. Innov. Sci. Res. Technol. (IJISRT) 2024, 9, 220–225. [Google Scholar] [CrossRef]
  17. Bouvier, T.; Nicolae, B.; Chaugier, H.; Costan, A.; Foster, I.; Antoniu, G. Efficient Data-Parallel Continual Learning with Asynchronous Distributed Rehearsal Buffers. In Proceedings of the 2024 IEEE 24th International Symposium on Cluster, Cloud and Internet Computing (CCGrid), Philadelphia, PA, USA, 6–9 May 2024. [Google Scholar]
  18. Kim, M.; Jang, J.; Choi, Y.; Yang, H.J. Distributed Task Offloading and Resource Allocation for Latency Minimization in Mobile Edge Computing Networks. IEEE Trans. Mob. Comput. 2024, 23, 15149–15166. [Google Scholar] [CrossRef]
  19. Dyjach, S.; Plechawska-Wójcik, M. Efficiency comparison of message brokers. J. Comput. Sci. Inst. 2024, 31, 116–123. [Google Scholar] [CrossRef]
  20. Mager, T. Big Data Forensics on Apache Kafka. In Proceedings of the International Conference on Information Systems Security, Raipur, India, 16–20 December 2023; pp. 42–56. [Google Scholar]
  21. Ayanoglu, E.; Aytas, Y.; Nahum, D. Mastering RabbitMQ; Packt Publishing Ltd.: Birmingham, UK, 2016. [Google Scholar]
  22. Banowosari, L.Y.; Purnamasari, D. Approach for Unwrapping the Unstructured to Structured Data the Case of Classified Ads in HTML Format. Adv. Sci. Lett. 2016, 22, 1909–1913. [Google Scholar] [CrossRef]
  23. Cheruku, S.R.; Jain, S.; Aggarwal, A. Building Scalable Data Warehouses: Best Practices and Case Studies. Mod. Dyn. Math. Prog. 2016, 1, 116–130. [Google Scholar] [CrossRef]
  24. Qi, Z.; Wang, H.; Dong, Z. Feature Selection on Inconsistent Data. In Dirty Data Processing for Machine Learning; Springer: Singapore, 2023. [Google Scholar]
  25. Singh, A.; Prasad, R. Java Classes. 2024. Available online: https://www.researchgate.net/publication/380464284_Java_Classes?channel=doi&linkId=663da12008aa54017af11b2b&showFulltext=true (accessed on 15 July 2025).
  26. Setiawan, I.P.E.; Desnanjaya, I.G.M.N.; Supartha, K.D.G.; Ariana, A.G.B.; Putra, I.D.P.G.W. Implementation of Telegram Notification System for Motorbike Accidents Based on Internet of Things. J. Galaksi 2024, 1, 1–11. [Google Scholar] [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
