Serendipitous, Open Big Data Management and Analytics: The SeDaSOMA Framework †
Abstract
1. Introduction
2. Related Work
Innovations of SeDaSOMA over Conventional State-of-the-Art Approaches
- Support for Big Data Crowdsourcing: One of the most prominent characteristics of SeDaSOMA is that its big data layer is fed via crowdsourcing methodologies, as in some recent initiatives (e.g., [37]). In conventional approaches, by contrast, big data repositories are fed via methods that are essentially inspired by traditional ETL (Extraction, Transformation and Loading) procedures (e.g., [38]). The use of crowdsourcing methodologies ensures more dynamic data sources, greater scalability, and greater heterogeneity;
- Support for Big Data Marketplacing: SeDaSOMA provides support for big data marketplacing (e.g., [39]), meaning that, once big data within the SeDaSOMA data layer are consolidated, they are not exposed to the big data analytics layer via conventional methods (e.g., publish-subscribe, push-down, ETL, etc.) but, rather, via the innovative data marketplace model. Under this model, client users/applications can access a marketplace of (consolidated) big data sources and select the ones that are most appropriate and “convenient” for them based on their big data analytics goals, preferences, workload issues, and so forth;
- Support for Context-Aware Big Data Processing: In addition to the above-listed innovations, the SeDaSOMA framework makes use of another highly relevant feature, i.e., processing big data depending on the context in which they are produced, located, elaborated, and consumed. This topic is attracting a great deal of attention (e.g., [40]), as it enables more intelligent big data processing, which in turn plays a critical role with respect to specific and advanced challenges of big data, such as big data understanding and big data fruition;
- Support for Big Data Preparation for Analytics: Finally, SeDaSOMA applies data preparation techniques (e.g., [41]) to support big data analytics tasks effectively and efficiently. Indeed, due to the well-known 3V properties of big data, big data cannot be processed by analytical tasks as they are; they need specific data preparation solutions (e.g., normalization, scaling, and so forth).
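As an illustration of the preparation solutions mentioned in the last item (normalization and scaling), the following is a minimal Python sketch. The function names `min_max_normalize` and `z_score_scale` and the sample `readings` are hypothetical and are not part of SeDaSOMA; the sketch only shows what such primitives might compute on a single numeric attribute.

```python
from statistics import mean, pstdev


def min_max_normalize(values):
    """Rescale values linearly into [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_score_scale(values):
    """Centre values on their mean and divide by the population std dev."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical sensor readings used only for illustration.
readings = [10.0, 20.0, 30.0, 40.0]
normalized = min_max_normalize(readings)
scaled = z_score_scale(readings)
```

In a real deployment these primitives would presumably run in a distributed fashion over partitions of the data; the single-list version above is meant only to make the two transformations concrete.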
3. SeDaSOMA’s Anatomy
- Big Data Source Layer, which locates and manages different sources of big data, which are heterogeneous in nature, distributed, and streamed inherently.
- Big Data Repository and Provisioning Layer, which stores and processes big data, then transmits it to the higher layers.
- Big Data Analytics Layer, which extracts useful knowledge from big data by running complex and scalable analytics over these huge amounts of data.
- Big Data Application Layer, which includes Cloud-aware big data vertical applications that completely rely on the underlying layers. These applications focus on the main domains of modern information community challenges specified by European Union data management guidelines.
3.1. SeDaSOMA Components
- Physical Resource Management, which handles the physical aspects of resources that are capable of generating data streams (e.g., sensors and datacenters) by exploring innovative aspects such as noise detection and cleaning, stream pipelining, resource/stream synchronization, etc. This can be further extended to other advanced topics, such as designing high-performance infrastructures that solve the traditional issues regarding big data streams, including the common 3Vs of big data, i.e., Volume, Variety, Velocity.
- Heterogeneous Distributed Big Data Stream Management, which is responsible for managing big data (stream) at the framework’s input interface, where big data is considered as data objects on top of the underlying physical resources (e.g., sensors and datacenters that are capable of generating big data streams). The goal here is to overcome the following major challenges: (i) having to deal with big-data’s streaming nature; (ii) having to deal with multi-rate arrivals and inter-arrivals; (iii) having to deal with the massive amounts of (streaming) big data, etc.
- Big Data Crowdsourcing, which tackles the problem of big data collection using the innovative crowdsourcing paradigm. This aims to develop models, techniques and algorithms for applying this novel paradigm to big data collection/acquisition problems.
- Big Data Storage, which is a major component of the SeDaSOMA framework. It tackles big data representation and storage problems, for which scalability is one of the key requirements. In this context, several challenges exist, which range from advanced data structures for big data storage to big data indexing, from big data partitioning techniques to innovative data management policies and regulations focusing on elastic big data storage solutions, etc.
- Big Data Warehousing, which ensures the reliability of data management within the SeDaSOMA framework. Given the complex nature of big data, such reliability is a key feature of warehousing methodologies and solutions. In our case, these methodologies are captured and implemented by the Big Data Warehousing component in combination with the innovative Cloud-based paradigm. The component is built on top of the Big Data Storage component, and modern NoSQL architectures (e.g., in distributed settings) are used in its implementation. This component addresses the following main issues: (i) multidimensional big data models, which are critical for supporting big data analytics; (ii) compressed big multidimensional data representations; (iii) MapReduce-based big data warehousing, etc.
- Secure Big Data Processing, which represents a major concern of the SeDaSOMA framework, as it is critical to guarantee the security of big data stored in the Big Data Warehousing component; this is because these (big) data are accessed and processed within an open environment, i.e., the Big Data Marketplace component. As a result, the SeDaSOMA framework is required to incorporate ad-hoc solutions in order to secure big data processing. The Secure Big Data Processing component directly interfaces with the Big Data Warehousing component and achieves this goal. In this context, some of the relevant existing challenges are as follows: (i) ensuring scalability and security when accessing big data; (ii) implementing innovative big data encryption methodologies; (iii) defining novel techniques for big data provenance, etc.
- Big Data Marketplace, which gives the SeDaSOMA framework one of its innovative characteristics: its core data layer provides big data to higher layers and consumer (Cloud-aware) applications by employing a novel data marketplace (DaaS) paradigm instead of a traditional big data fruition scheme. This paradigm targets modern Cloud computing environments and anticipates an applicative setting that uses service-oriented primitives in order to make big data available to consumer applications. The main goal of this component is to make the big data stored in the Big Data Warehousing component available to consumer applications following the DaaS paradigm, while ensuring security (which is guaranteed by the Secure Big Data Processing component). Several open problems emerge in this context: from innovative service-oriented big data provisioning to designing models for the big data marketplace, from challenges related to scalability to Cloud-compliant heterogeneity problems, etc.
- Big Data Query Answering: this component addresses challenges related to implementing effective and efficient algorithms that support query answering over big data, along with scalability issues. The traditional query answering algorithms that have been proposed and developed in the context of traditional data-intensive scenarios (e.g., very large databases) are not capable of processing large amounts of big data; as a result, new techniques need to be introduced. The major relevant challenges regarding this concept are the following: (i) the use of data compression paradigms to enhance query answering performance while ensuring the accuracy of answers; (ii) evaluating preference-based query processing techniques and how they handle big data; (iii) scalability problems related to querying big data, etc.
- Context-Aware Big Data Processing: As highlighted before, the SeDaSOMA framework aims primarily at providing powerful data-intensive analytics in next-generation big data applications, where these analytics can be exploited based on the successful Cloud computing paradigm. Following this concept, it is critical to incorporate context-aware methods in order to provide “the right (big) data to the right application”, given the large amounts and the strong heterogeneity of big data. This issue has become very relevant lately, especially due to the strong relationship among topics such as advertisements and recommendations in big data. In the case of our SeDaSOMA framework, we guarantee this requirement via the Context-Aware Big Data Processing component. The major relevant challenges regarding this context are as follows: (i) context-awareness in the proposed algorithms and techniques for big data; (ii) preference-based big data processing; (iii) scalability in context-aware big data processing, etc.
- Big Data Preparation for Analytics, which addresses a major challenge within the process of designing and executing analytics over big data, i.e., how to perform pre-processing over big data in order to make analytics more effective and more efficient. In fact, it is a critical issue due to the characteristics of big data, such as the extremely massive size and strong heterogeneity. In addition, the analytics become irrelevant (e.g., in terms of “responsiveness”) due to the lack of a “unifying” schema. The Big Data Preparation for Analytics component is responsible for providing solutions that fulfil some technical needs in this context, such as refining existing data, reducing unnecessary data, normalizing data, and successfully extracting attributes and features from big data that guide the analytics process. Moreover, since these requirements can only be answered during the analytics phase, the Big Data Preparation for Analytics component needs to provide on-demand preparation primitives in order to be able to respond to requests from the analytics component. Finally, it is mandatory that all preparation primitives be used in a privacy-preserving manner, especially when managing user data. SOLID is an ecosystem that we plan to refer to in order to handle and overcome the challenges of privacy (e.g., [43,44]).
- Scalable Big Data Analytics, which is the component that ensures the main goal of the whole SeDaSOMA proposal, i.e., supporting scalable big data analytics. To this end, the Scalable Big Data Analytics component addresses the problem of defining novel paradigms for next-generation analytics that are characterized by high responsiveness and high scalability. This component plays a critical role as Cloud-aware big data vertical applications rely on the SeDaSOMA framework (being devoted to assessing and possibly showing the effectiveness and the reliability of the framework), and on successfully exploiting the results and the interactions of analytics in order to attain their respective applicative goals. Several relevant research problems can be faced regarding this aspect, such as the following: (i) data-intensive analytics, i.e., analytics interacting with large-scale repositories such as big data; (ii) declarative against procedural analytics; (iii) quality-aware analytics and measures of the “quality” of analytics; (iv) responsiveness issues related to analytics; (v) scalability issues related to analytics, etc.
- Social Big Data Management for Workplace Safety and Health. This vertical application focuses on the “serendipity” issue when exploring and integrating (big) data collected from different media sources such as blogs, social networks, emails, forums, communities, etc., and when mining such data in order to share them among users and exploit them for workplace safety and health. In this case, people will need to play an “active” role, meaning that they are able to share their opinions and criticisms, whether positive or negative, and, therefore, create a “real” (big-data-based) community with powerful analytics capabilities.
- Integration of (Big) Bank Data and (Big) Customer Data for Supporting Big Data Intelligence. Vertical applications in this context aim to improve the “on-line” experience of their customers through typical banking services delivered via different types of devices. Both types of such data (i.e., bank data and customer data) are naturally large and also exhibit the common characteristics of big data (i.e., 3Vs of big data). These applications will also have to deal with heterogeneous types of data, such as location data related to customers, and with services, such as health services, that may take advantage of integration with the bank data associated with customers, who are, in this setting, considered as citizens by health services.
- Big Data Advertisement on the Web for Opportunity Funding. In this context, vertical applications consider the issues related to using big data analytics (also combined with artificial intelligence algorithms) to support advanced Web marketing in various domains (work, investments, etc.). This represents a very exciting challenge for the next-generation community, as it is encouraged by already widely-adopted technologies and platforms for the Web (e.g., Google, Amazon, Alibaba, etc.) and for social networks (e.g., Facebook, Instagram, LinkedIn, etc.), which give providers and vendors the opportunity to collect and store immense big data repositories representing user profiles, user preferences, user goals, etc. On top of these big data repositories, artificial intelligence algorithms are run in order to discover the best sub-optimal solution in a very large set of Web application cases, such as buying, investing, dating, etc.
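To make the Big Data Marketplace and Context-Aware Big Data Processing components described above more concrete, the following is a minimal Python sketch of how a consumer application might select consolidated sources from a marketplace catalogue according to its goals and preferences. Everything in it is an assumption made for illustration: the `SourceOffer` record, its fields (`topics`, `cost_per_gb`, `freshness_minutes`), the `select_sources` function, and the ranking rule are hypothetical and do not come from the SeDaSOMA specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceOffer:
    """Hypothetical catalogue entry for one consolidated big data source."""
    name: str
    topics: frozenset
    cost_per_gb: float
    freshness_minutes: int


def select_sources(catalogue, wanted_topics, max_cost_per_gb, max_staleness_minutes):
    """Keep offers that match the client's context, then rank them."""
    wanted = set(wanted_topics)
    eligible = [
        offer for offer in catalogue
        if offer.topics & wanted                      # at least one topic of interest
        and offer.cost_per_gb <= max_cost_per_gb      # within the client's budget
        and offer.freshness_minutes <= max_staleness_minutes
    ]
    # Rank by topic overlap (largest first), then by cost (cheapest first).
    return sorted(eligible, key=lambda o: (-len(o.topics & wanted), o.cost_per_gb))


# Illustrative catalogue; names and numbers are made up.
catalogue = [
    SourceOffer("social_feed", frozenset({"safety", "health"}), 0.40, 5),
    SourceOffer("bank_events", frozenset({"payments"}), 0.90, 1),
    SourceOffer("sensor_archive", frozenset({"safety"}), 0.10, 720),
    SourceOffer("forum_dump", frozenset({"health"}), 0.05, 30),
]
chosen = select_sources(
    catalogue, {"safety", "health"}, max_cost_per_gb=0.50, max_staleness_minutes=60
)
```

The design choice illustrated here is that selection is driven by the client (its topics, budget, and freshness needs) rather than by a provider-side push, which is the distinction the marketplace model draws against publish-subscribe or ETL-style delivery.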
3.2. SeDaSOMA Implementation
- Big Data Sources. In contemporary data environments, enterprises utilize a combination of cloud and on-premise resources to access and manage diverse big data sources. Cloud platforms offer scalability and accessibility, allowing enterprises to store and process large volumes of data efficiently. Within these cloud environments, enterprises integrate various data sources, including social media platforms, which provide valuable insights into consumer behavior and market trends through user-generated content and engagement metrics. Additionally, enterprises leverage activity generated from their own digital channels, such as websites and mobile applications, to gain real-time insights into user interactions and preferences. On-premise infrastructure complements cloud resources by hosting proprietary archives of historical data and legacy systems, ensuring data continuity and regulatory compliance. Furthermore, enterprises tap into public data repositories and external sources to enrich their datasets with contextual information from governmental databases, open data initiatives, and third-party APIs. Regarding the handling of the different types of big data at this level of the architecture, support can be provided by the following: (i) Oracle NoSQL Database for key–value data; (ii) MongoDB for document-shaped big data (e.g., JSON); (iii) Neo4J for big graph data; (iv) Apache Streaming for big stream data. This hybrid approach to big data management enables enterprises to leverage a diverse array of sources to support decision-making processes and drive innovation.
- Big Data Archives. Within the architecture of big data management and analytics systems, the big data archive component plays a crucial role in facilitating the storage, preservation, and accessibility of vast volumes of data. Serving as a repository for historical and infrequently accessed data, the big data archive component ensures the long-term retention of valuable information while optimizing storage resources and cost-effectiveness. Leveraging scalable storage technologies and efficient data compression techniques, this component accommodates the ever-growing influx of data generated by diverse sources. Moreover, the big data archive component incorporates robust data governance [46] and security mechanisms to safeguard sensitive information and adhere to regulatory compliance requirements. By seamlessly integrating with other components of the big data ecosystem, such as data processing and analytics modules, the archive component enables the efficient retrieval and analysis of archived data, thereby facilitating informed decision-making and fostering innovation in data-driven enterprises.
- Transactional Systems. This component assumes a pivotal role in enabling real-time processing and management of high-volume transactional data streams. This component encompasses sophisticated data processing frameworks and distributed transaction processing engines designed to handle massive concurrent transactions with low latency and high throughput. Leveraging scalable and fault-tolerant architectures, the big data transactional system ensures the reliability and integrity of transactional data in dynamic and distributed environments. Furthermore, it integrates with various data sources and downstream analytics modules to enable continuous data ingestion, processing, and analysis, thus supporting timely decision-making and operational intelligence in enterprise settings. Additionally, the big data transactional system component incorporates advanced transaction management functionalities, including distributed locking mechanisms, transaction isolation levels, and conflict resolution strategies, to maintain data consistency and concurrency control across distributed computing nodes.
- Big Data Engine. This comprises the Hadoop Distributed File System (HDFS) and MapReduce, therefore constituting a fundamental infrastructure for scalable and parallel processing of large datasets. Hadoop HDFS serves as a distributed file system designed to store and replicate data across a cluster of commodity hardware, ensuring fault tolerance and high availability. Concurrently, MapReduce provides a programming model and runtime environment for distributed data processing, enabling efficient parallel execution of computational tasks across distributed computing nodes. Together, Hadoop HDFS and MapReduce form the backbone of big data processing frameworks, facilitating the distributed storage and processing of massive datasets in a fault-tolerant and cost-effective manner. This component plays a pivotal role in supporting diverse analytics workflows, ranging from batch processing and data warehousing to real-time stream processing and machine learning, thereby enabling enterprises to derive actionable insights and drive innovation through data-driven decision-making.
- Operational Data Store (ODS). This component assumes a critical role as an intermediary storage layer facilitating real-time access and integration of heterogeneous data sources. The ODS acts as a centralized repository for ingesting, cleansing, and harmonizing streaming and batch data from diverse operational systems, sensors, and external sources. By providing a unified view of operational data in near real-time, the ODS enables organizations to make informed decisions, monitor performance, and respond promptly to evolving business needs. Leveraging scalable distributed architectures and advanced data processing techniques, the ODS ensures data consistency, reliability, and timeliness, thus serving as a foundational component for downstream analytics, reporting, and decision support applications. Moreover, the ODS fosters interoperability and data sharing across disparate systems and departments, driving collaboration and innovation in data-driven enterprises.
- Big Data Analytics (SeDaSOMA). Our proposed framework assumes a central role, employing a suite of advanced tools and frameworks to extract actionable insights from vast and diverse datasets. Utilizing Spark SQL for SQL-based querying and processing of structured data, this component enables efficient and scalable data manipulation and analysis, facilitating exploratory data analysis, data visualization, and ad-hoc querying tasks. Complementing Spark SQL, MLlib provides a comprehensive library of machine learning algorithms and utilities, empowering users to build and deploy scalable machine learning models for classification, regression, clustering, and collaborative filtering tasks. Furthermore, Spark Streaming facilitates real-time stream processing and analysis of continuous data streams, enabling organizations to derive timely insights and respond promptly to evolving trends and events. Additionally, GraphX offers a powerful framework for graph analytics, supporting graph processing algorithms and graph-based analytics applications such as social network analysis, recommendation systems, and fraud detection. Together, these components form a robust and versatile analytics ecosystem, empowering organizations to unlock the full potential of big data and drive innovation through data-driven decision-making processes.
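The MapReduce programming model mentioned for the Big Data Engine can be illustrated with a small, single-process Python sketch. It does not use Hadoop or Spark; it only mimics the three logical phases (map, shuffle, reduce) on an in-memory list, and the function names `map_phase`, `shuffle`, `reduce_phase`, and `run_job` are hypothetical names chosen for this example.

```python
from collections import defaultdict


def map_phase(record):
    """Emit one (word, 1) pair per word in a single input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values seen for one key into a single result."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle, and reduce sequentially over in-memory records."""
    emitted = [pair for record in records for pair in map_phase(record)]
    grouped = shuffle(emitted)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = run_job(["big data analytics", "big data storage", "data marketplace"])
```

In an actual cluster, the map and reduce functions would run in parallel on different nodes over blocks stored in HDFS, and the shuffle would move data across the network; the sketch keeps only the data flow.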
4. Discussion
- distributed big data management and analytics: complex methodologies;
- privacy-preservation methodologies for distributed big data management and analytics;
- distributed uncertain and imprecise big data management and analytics.
4.1. Distributed Big Data Management and Analytics: Complex Methodologies
- analysis of authoritative proposals in the investigated scientific area;
- selection of reference complex big data types;
- integration of complex big data repositories in specific Cloud data stores, such as NoSQL data layers;
- definition of novel big data management tasks on top of these data stores;
- design and implementation of big data management tasks;
- definition of novel big data analytics tools on top of these data stores;
- design and implementation of big data analytics tools.
4.2. Privacy-Preservation Methodologies for Distributed Big Data Management and Analytics
- sophisticated recommendation analytics for large data management and analytics that guarantee privacy in a distributed environment;
- identify case studies for privacy-preserving big data management and analysis in distributed environments (e.g., IoT, social networks, Cloud storage, etc.);
- specification of big data analytics and management tools/processes to be handled in distributed environments (e.g., OLAP, data publication, tensor-based analytics, long-term big data analytics, etc.);
- develop novel tools for privacy-preserving big data management and analysis in distributed environments, based on differential privacy theory, for example;
- design, implement, and test privacy-preserving big data management and analysis algorithms in a distributed environment;
- create and deploy benchmark case studies to comprehensively assess privacy-cyber-secure big data management and analysis in distributed environments;
- create and implement tuning solutions for predefined case studies.
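Since the list above names differential privacy theory as one possible basis for such tools, the following is a minimal Python sketch of the standard Laplace mechanism applied to a counting query. It is not the authors' method; the function names `laplace_noise` and `private_count`, the sample `ages`, and the chosen `epsilon` are assumptions made for illustration. The sketch relies on the fact that a count has sensitivity 1 and that the difference of two independent exponential variates with mean `scale` follows a Laplace distribution with that scale.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as a difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng):
    """Return an epsilon-differentially-private count of matching records."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for record in records if predicate(record))
    sensitivity = 1.0  # adding or removing one record changes a count by at most 1
    return true_count + laplace_noise(sensitivity / epsilon, rng)


# Hypothetical data; the seed is fixed only to make the run repeatable.
rng = random.Random(7)
ages = [23, 35, 41, 52, 29, 64, 38]
noisy = private_count(ages, lambda age: age >= 40, epsilon=0.5, rng=rng)
```

Smaller values of `epsilon` give stronger privacy but noisier answers; in a distributed setting, each node could in principle perturb its local count in this way before aggregation, although budget accounting across nodes is outside the scope of this sketch.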
4.3. Distributed Uncertain and Imprecise Big Data Management and Analytics
- evaluation of state-of-the-art proposals in the context of approximate paradigms for supporting big data management and analytics in distributed environments;
- development of target uncertain and imprecise big data management and analytics scenarios in distributed environments for usage as case studies (e.g., Internet of Things, social networks, Cloud storage, and so forth);
- definition of target big data management and analytics tools and processes in distributed environments to be addressed, which are characterized by uncertainty and imprecision (e.g., OLAP over streaming big data, sensor networks, social networks, etc.);
- establishment of novel approximate big data analytics and management tools over imprecise and uncertain big data repositories in distributed environments, such as those based on probability theory;
- creation, execution, and evaluation of approximation-based big data management and analytics algorithms for unpredictable and inaccurate big data repositories in distributed environments;
- establishment and execution of benchmark case studies aimed at comprehensively evaluating approximate big data management and analytics techniques over ambiguous and imprecise big data repositories in distributed environments.
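As a small illustration of probability-based analytics over uncertain data, mentioned in the list above, the following Python sketch computes expected aggregates over readings that each carry an existence probability. The `UncertainReading` record, the function names, and the sample values are hypothetical, and the sketch assumes that readings are independent of one another, which is a simplifying assumption and not something stated in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UncertainReading:
    """A value together with the probability that the reading is genuine."""
    value: float
    probability: float


def expected_count(readings):
    """Expected number of genuine readings (sum of probabilities)."""
    return sum(r.probability for r in readings)


def expected_sum(readings):
    """Expected SUM aggregate: each value weighted by its probability."""
    return sum(r.value * r.probability for r in readings)


def count_variance(readings):
    """Variance of the count, assuming readings are independent."""
    return sum(r.probability * (1.0 - r.probability) for r in readings)


# Illustrative readings; values and probabilities are made up.
readings = [
    UncertainReading(10.0, 0.5),
    UncertainReading(20.0, 1.0),
    UncertainReading(30.0, 0.25),
]
exp_count = expected_count(readings)
exp_sum = expected_sum(readings)
var_count = count_variance(readings)
```

Reporting the variance alongside the expected value gives the consumer of an approximate answer a rough indication of how far the true aggregate may deviate from it.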
5. A Practical Implementation: The CORE-BCD-mAI Framework
Methodologies, Methods and Main Functionalities of CORE-BCD-mAI
6. Conclusions and Future Work
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Chen, J.; Chen, Y.; Du, X.; Li, C.; Lu, J.; Zhao, S.; Zhou, X. Big Data Challenge: A Data Management Perspective. Front. Comput. Sci. 2013, 7, 157–164.
- Russom, P. Big Data Analytics. TDWI Best Pract. Rep. 2011, 19, 1–34.
- Hashem, I.A.T.; Chang, V.; Anuar, N.B.; Adewole, K.S.; Yaqoob, I.; Gani, A.; Ahmed, E.; Chiroma, H. The Role of Big Data in Smart City. Int. J. Inf. Manag. 2016, 36, 748–758.
- Tan, W.; Blake, M.B.; Saleh, I.; Dustdar, S. Social-Network-Sourced Big Data Analytics. IEEE Internet Comput. 2013, 17, 62–69.
- Bonifati, A.; Cuzzocrea, A. Storing and Retrieving XPath Fragments in Structured P2P Networks. Data Knowl. Eng. 2006, 59, 247–269.
- Zhu, L.; Yu, F.R.; Wang, Y.; Ning, B.; Tang, T. Big Data Analytics in Intelligent Transportation Systems: A Survey. IEEE Trans. Intell. Transp. Syst. 2019, 20, 383–398.
- Baqleh, L.A.; Alateeq, M.M. The Impact of Supply Chain Management Practices on Competitive Advantage: The Moderating Role of Big Data Analytics. Int. J. Prof. Bus. Rev. 2023, 8, 3.
- Zhou, Y. Integrated Development of Industrial and Regional Economy using Big Data Technology. Comput. Electr. Eng. 2023, 109, 108764.
- Cuzzocrea, A. Approximate OLAP Query Processing over Uncertain and Imprecise Multidimensional Data Streams. In Proceedings of the 24th International Conference on Database and Expert Systems Applications, DEXA 2013, Prague, Czech Republic, 26–29 August 2013.
- Cuzzocrea, A.; Serafino, P. LCS-Hist: Taming Massive High-dimensional Data Cube Compression. In Proceedings of the 12th International Conference on Extending Database Technology, EDBT 2009, Saint Petersburg, Russia, 24–26 March 2009.
- Ceci, M.; Cuzzocrea, A.; Malerba, D. Effectively and Efficiently Supporting Roll-up and Drill-down OLAP Operations over Continuous Dimensions via Hierarchical Clustering. J. Intell. Inf. Syst. 2015, 44, 309–333.
- Cuzzocrea, A. OLAP Intelligence: Meaningfully Coupling OLAP and Data Mining Tools and Algorithms. Int. J. Bus. Intell. Data Min. 2009, 4, 213–218.
- Cuzzocrea, A. Scalable OLAP-based Big Data Analytics over Cloud Infrastructures: Models, Issues, Algorithms. In Proceedings of the 2017 International Conference on Cloud and Big Data Computing, ICCBDC 2017, London, UK, 17–19 September 2017.
- Han, J.; Sethu, H. OLAP Mining: Integration of OLAP with Data Mining. In Proceedings of the 7th Conference on Database Semantics, DS-7, Leysin, Switzerland, 7–10 October 1997.
- Adadi, A. A Survey on Data-Efficient Algorithms in Big Data Era. J. Big Data 2021, 8, 24.
- Chaudhuri, S.; Dayal, U. An Overview of Data Warehousing and OLAP Technology. SIGMOD Rec. 1997, 26, 65–74.
- Aidala, C.A.; Burr, C.; Cattaneo, M.; Fitzgerald, D.S.; Morris, A.; Neubert, S.; Tropmann, D. Ntuple Wizard: An Application to Access Large-Scale Open Data from LHCb. Comput. Softw. Big Sci. 2023, 7, 6.
- Coronato, A.; Cuzzocrea, A. An Innovative Risk Assessment Methodology for Medical Information Systems. IEEE Trans. Knowl. Data Eng. 2022, 34, 3095–3110.
- Khalil, M.; Esseghir, M.; Merghem-Boulahia, L. Privacy-Preserving Federated Learning: An Application for Big Data Load Forecast in Buildings. Comput. Secur. 2023, 131, 103211.
- Zheng, Z.; Zhu, J.; Lyu, M.R. Service-Generated Big Data and Big Data-as-a-Service: An Overview. In Proceedings of the IEEE International Congress on Big Data, BigData Congress 2013, Santa Clara, CA, USA, 27 June–2 July 2013.
- Fahmideh, M.; Beydoun, G. Big Data Analytics Architecture Design—An Application in Manufacturing Systems. Comput. Ind. Eng. 2019, 128, 948–963.
- European Commission. Horizon Europe–The EU Framework Programme for Research and Innovation; European Commission: Brussels, Belgium, 2022; Available online: https://research-and-innovation.ec.europa.eu/funding/funding-opportunities/funding-programmes-and-open-calls/horizon-europe_en (accessed on 1 April 2023).
- Cuzzocrea, A.; Ciancarini, P. SeDaSOMA: A Framework for Supporting Serendipitous, Data-As-A-Service-Oriented, Open Big Data Management and Analytics. In Proceedings of the 5th International Conference on Cloud and Big Data Computing, ICCBDC 2021, Liverpool, UK, 13–15 August 2021.
- Cuzzocrea, A. Advanced, Privacy-Preserving and Approximate Big Data Management and Analytics in Distributed Environments: What is Now and What is Next. In Proceedings of the 44th IEEE Annual Computers, Software, and Applications Conference, COMPSAC 2020, Madrid, Spain, 13–17 July 2020.
- Cuzzocrea, A.; Bringas, P.G. CORE-BCD-mAI: A Composite Framework for Representing, Querying, and Analyzing Big Clinical Data by Means of Multidimensional AI Tools. In Proceedings of the 17th International Conference on Hybrid Artificial Intelligent Systems, HAIS 2022, Salamanca, Spain, 5–7 September 2022.
- Pavlopoulou, C.; Carey, M.J.; Tsotras, V.J. Revisiting Runtime Dynamic Optimization for Join Queries in Big Data Management Systems. SIGMOD Rec. 2023, 52, 104–113.
- Siddiqa, A.; Hashem, I.A.T.; Yaqoob, I.; Marjani, M.; Shamshirband, S.; Gani, A.; Nasaruddin, F. A Survey of Big Data Management: Taxonomy and State-of-the-art. J. Netw. Comput. Appl. 2016, 71, 151–166.
- Mikalef, P.; Boura, M.; Lekakos, G.; Krogstie, J. Big Data Analytics and Firm Performance: Findings from a Mixed-Method Approach. J. Bus. Res. 2019, 98, 261–276.
- Woodside, A.G. Embrace• Perform• Model: Complexity Theory, Contrarian Case Analysis, and Multiple Realities. J. Bus. Res. 2014, 67, 2495–2503.
- Ranjan, J.; Foropon, C. Big Data Analytics in Building the Competitive Intelligence of Organizations. Int. J. Inf. Manag. 2021, 56, 102231.
- Wang, Y.; Wei, J.; Srivatsa, M.; Duan, Y.; Du, W. IntegrityMR: Integrity Assurance Framework for Big Data Analytics and Management Applications. In Proceedings of the 2013 IEEE International Conference on Big Data, BigData 2013, Santa Clara, CA, USA, 6–9 October 2013.
- Fiore, S.; Palazzo, C.; D’Anca, A.; Foster, I.T.; Williams, D.N.; Aloisio, G. A Big Data Analytics Framework for Scientific Data Management. In Proceedings of the 2013 IEEE International Conference on Big Data, BigData 2013, Santa Clara, CA, USA, 6–9 October 2013.
- Puthal, D.; Nepal, S.; Ranjan, R.; Chen, J. A Secure Big Data Stream Analytics Framework for Disaster Management on the Cloud. In Proceedings of the 18th IEEE International Conference on High Performance Computing and Communications; 14th IEEE International Conference on Smart City; 2nd IEEE International Conference on Data Science and Systems, HPCC/SmartCity/DSS 2016, Sydney, Australia, 12–14 December 2016.
- Abdullah, M.F.; Ibrahim, M.; Zulkifli, H. Big Data Analytics Framework for Natural Disaster Management in Malaysia. In Proceedings of the 2nd International Conference on Internet of Things, Big Data and Security, IoTBDS 2017, Porto, Portugal, 24–26 April 2017.
- Terrazas, G.; Ferry, N.; Ratchev, S.M. A Cloud-based Framework for Shop Floor Big Data Management and Elastic Computing Analytics. Comput. Ind. 2019, 109, 204–214.
- Jindal, A.; Kumar, N.; Singh, M. A Unified Framework for Big Data Acquisition, Storage, and Analytics for Demand Response Management in Smart Cities. Future Gener. Comput. Syst. 2020, 108, 921–934.
- Almagrabi, A.O.; Ali, R.; Alghazzawi, D.M.; Albarakati, A.; Khurshaid, T. A Reinforcement Learning-Based Framework for Crowdsourcing in Massive Health Care Internet of Things. Big Data 2022, 10, 161–170.
- Mehmood, E.; Anees, T. Distributed Real-Time ETL Architecture for Unstructured Big Data. Knowl. Inf. Syst. 2022, 64, 3419–3445.
- Miltiadou, D.; Pitsios, S.; Spyropoulos, D.; Alexandrou, D.; Lampathaki, F.; Messina, D.; Perakis, K. A Big Data Intelligence Marketplace and Secure Analytics Experimentation Platform for the Aviation Industry. In Proceedings of the 10th EAI International Conference and 13th EAI International Conference on Wireless Internet, BDTA/WiCON 2020, Virtual Event, 11 December 2020.
- Dinh, L.T.N.; Karmakar, G.C.; Kamruzzaman, J. A Survey on Context Awareness in Big Data Analytics for Business Applications. Knowl. Inf. Syst. 2020, 62, 3387–3415. [Google Scholar] [CrossRef]
- Doherty, A.J.; Murphy, R.; Schieweck, A.; Clancy, S.; Breathnach, C.; Margaria, T. CensusIRL: Historical Census Data Preparation with MDD Support. In Proceedings of the 2022 IEEE International Conference on Big Data, BigData 2022, Osaka, Japan, 17–20 December 2022. [Google Scholar]
- Zhang, H.; Chen, G.; Ooi, B.C.; Tan, K.-L.; Zhang, M. In-Memory Big Data Management and Processing: A Survey. IEEE Trans. Knowl. Data Eng. 2015, 27, 1920–1948. [Google Scholar] [CrossRef]
- Buyle, R.; Taelman, R.; Mostaert, K.; Joris, G.; Mannens, E.; Verborgh, R.; Berners-Lee, T. Streamlining Governmental Processes by Putting Citizens in Control of their Personal Data. In Proceedings of the 6th International Conference on Electronic Governance and Open Society: Challenges in Eurasia, EGOSE 2019, St. Petersburg, Russia, 13–14 November 2019. [Google Scholar]
- Cuzzocrea, A.; Damiani, E. Making the Pedigree to Your Big Data Repository: Innovative Methods, Solutions, and Algorithms for Supporting Big Data Privacy in Distributed Settings via Data-Driven Paradigms. In Proceedings of the 43rd IEEE Annual Computer Software and Applications Conference, COMPSAC 2019, Milwaukee, WI, USA, 15–19 July 2019. [Google Scholar]
- Elmeiligy, M.A.; El-Desouky, A.I.; El-Ghamrawy, S.M. A Multi-Dimensional Big Data Storing System for Generated COVID-19 Large-Scale Data using Apache Spark. arXiv 2020, arXiv:2005.05036. [Google Scholar] [CrossRef]
- Alaoui, S.S.; Farhaoui, Y.; Aksasse, B. Data Openness for Efficient E-Governance in the Age of Big Data. Int. J. Cloud Comput. 2021, 10, 522–532. [Google Scholar] [CrossRef]
- Xiao, F.; Xie, J.; Chen, Z.; Li, F.; Chen, Z.; Liu, J.; Liu, Y. Ganos Aero: A Cloud-Native System for Big Raster Data Management and Processing. Proc. VLDB Endow. 2023, 16, 3966–3969. [Google Scholar] [CrossRef]
- Mehta, N.; Pandit, A.; Shukla, S. Transforming Healthcare with Big Data Analytics and Artificial Intelligence: A Systematic Mapping Study. J. Biomed. Inform. 2019, 100, 103311. [Google Scholar] [CrossRef]
- Galakatos, A.; Markovitch, M.; Binnig, C.; Fonseca, R.; Kraska, T. FITing-Tree: A Data-aware Index Structure. In Proceedings of the 2019 International Conference on Management of Data, SIGMOD/PODS 2019, Amsterdam, The Netherlands, 30 June–5 July 2019. [Google Scholar]
- Gu, J.; Watanabe, Y.H.; Mazza, W.A.; Shkapsky, A.; Yang, M.; Ding, L.; Zaniolo, C. RaSQL: Greater Power and Performance for Big Data Analytics with Recursive-Aggregate-SQL on Spark. In Proceedings of the 2019 International Conference on Management of Data, SIGMOD/PODS 2019, Amsterdam, The Netherlands, 30 June–5 July 2019. [Google Scholar]
- Xie, T.; Chandola, V.; Kennedy, O. Query Log Compression for Workload Analytics. Proc. VLDB Endow. 2018, 12, 183–196. [Google Scholar] [CrossRef]
- Chatzimilioudis, G.; Cuzzocrea, A.; Gunopulos, D.; Mamoulis, N. A Novel Distributed Framework for Optimizing Query Routing Trees in Wireless Sensor Networks via Optimal Operator Placement. J. Comput. Syst. Sci. 2013, 79, 349–368. [Google Scholar] [CrossRef]
- Nguyen, D.T.; Jung, J.E. Real-Time Event Detection for Online Behavioral Analysis of Big Social Data. Future Gener. Comput. Syst. 2017, 66, 137–145. [Google Scholar] [CrossRef]
- Cuzzocrea, A.; Song, I.Y.; Davis, K.C. Analytics over Large-Scale Multidimensional Data: The Big Data Revolution! In Proceedings of the ACM 14th International Workshop on Data Warehousing and OLAP, DOLAP 2011, Glasgow, UK, 28 October 2011. [Google Scholar]
- Han, G.; Sethu, H. Closed Walk Sampler: An Efficient Method for Estimating Eigenvalues of Large Graphs. IEEE Trans. Big Data 2020, 6, 29–42. [Google Scholar] [CrossRef]
- Islam, M.M.; Razzaque, M.A.; Hassan, M.M.; Ismail, W.N.; Song, B. Mobile Cloud-Based Big Healthcare Data Processing in Smart Cities. IEEE Access 2017, 5, 11887–11899. [Google Scholar] [CrossRef]
- Zhang, J.; Wu, S.; Tan, Z.; Chen, G.; Cheng, Z.; Cao, W.; Gao, Y.; Feng, X. S3: A Scalable In-memory Skip-List Index for Key-Value Store. Proc. VLDB Endow. 2019, 12, 2183–2194. [Google Scholar] [CrossRef]
- Cuzzocrea, A. Aggregation and Multidimensional Analysis of Big Data for Large-Scale Scientific Applications: Models, Issues, Analytics, and Beyond. In Proceedings of the 27th International Conference on Scientific and Statistical Database Management, SSDBM 2015, La Jolla, CA, USA, 29 June–1 July 2015. [Google Scholar]
- Zhang, J.; Liu, Y.; Zhou, K.; Li, G.; Xiao, Z.; Cheng, B.; Xing, J.; Wang, Y.; Cheng, T.; Liu, L.; et al. An End-to-End Automatic Cloud Database Tuning System Using Deep Reinforcement Learning. In Proceedings of the 2019 International Conference on Management of Data, SIGMOD 2019, Amsterdam, The Netherlands, 30 June–5 July 2019. [Google Scholar]
- Lu, R.; Zhu, H.; Liu, X.; Liu, J.K.; Shao, J. Toward Efficient and Privacy-Preserving Computing in Big Data Era. IEEE Netw. 2014, 28, 46–50. [Google Scholar] [CrossRef]
- Tran, H.-Y.; Hu, J. Privacy-Preserving Big Data Analytics A Comprehensive Survey. J. Parallel Distrib. Comput. 2019, 134, 207–218. [Google Scholar] [CrossRef]
- Au, M.H.; Liang, K.; Liu, J.K.; Lu, R.; Ning, J. Privacy-Preserving Personal Data Operation on Mobile Cloud-Chances and Challenges over Advanced Persistent Threat. Future Gener. Comput. Syst. 2018, 79, 337–349. [Google Scholar] [CrossRef]
- Komishani, E.G.; Abadi, M.; Deldar, F. PPTD: Preserving Personalized Privacy in Trajectory Data Publishing by Sensitive Attribute Generalization and Trajectory Local Suppression. Knowl. Based Syst. 2016, 94, 43–59. [Google Scholar] [CrossRef]
- Liang, P.; Zhang, L.; Kang, L.; Ren, J. Privacy-Preserving Decentralized ABE for Secure Sharing of Personal Health Records in Cloud Storage. J. Inf. Secur. Appl. 2019, 47, 258–266. [Google Scholar] [CrossRef]
- Boubiche, S.; Boubiche, D.E.; Bilami, A.; Toral-Cruz, H. Big Data Challenges and Data Aggregation Strategies in Wireless Sensor Networks. IEEE Access 2018, 6, 20558–20571. [Google Scholar] [CrossRef]
- Cuzzocrea, A. Privacy-Preserving Big Data Management: The Case of OLAP. In Big Data-Algorithms, Analytics, and Applications; Chapman and Hall/CRC: Boca Raton, FL, USA, 2015; pp. 301–326. [Google Scholar]
- Cuzzocrea, A.; Saccà, D. A Constraint-Based Framework for Computing Privacy Preserving OLAP Aggregations on Data Cubes. In Proceedings of the 15th East-European Conference on Advances in Databases and Information Systems, ADBIS 2011, Vienna, Austria, 20–23 September 2011. [Google Scholar]
- Chen, Y.; Guo, J.; Li, C.; Ren, W. FaDe: A Blockchain-Based Fair Data Exchange Scheme for Big Data Sharing. Future Internet 2019, 11, 225. [Google Scholar] [CrossRef]
- Zheng, Z.; Xie, S.; Dai, H.; Chen, X.; Wang, H. An Overview of Blockchain Technology: Architecture, Consensus, and Future Trends. In Proceedings of the 2017 IEEE International Congress on Big Data, BigData Congress 2017, Honolulu, HI, USA, 25–30 June 2017. [Google Scholar]
- Tankard, C. Big Data Security. Netw. Secur. 2012, 2012, 5–8. [Google Scholar] [CrossRef]
- Zakerzadeh, H.; Aggarwal, C.C.; Barker, K. Privacy-Preserving Big Data Publishing. In Proceedings of the 27th International Conference on Scientific and Statistical Database Management, SSDBM 2015, La Jolla, CA, USA, 29 June–1 July 2015. [Google Scholar]
- Cuzzocrea, A.; Bertino, E.; Saccà, D. Towards a Theory for Privacy Preserving Distributed OLAP. In Proceedings of the 2012 Joint EDBT/ICDT Workshops, EDBT/ICDT 2012, Berlin, Germany, 30 March 2012. [Google Scholar]
- Dwork, C. Differential Privacy: A Survey of Results. In Proceedings of the 5th International Conference on Theory and Applications of Models of Computation, TAMC 2008, Xi’an, China, 25–29 April 2008. [Google Scholar]
- Song, Q.; Ge, H.; Caverlee, J.; Hu, X. Tensor Completion Algorithms in Big Data Analytics. ACM Trans. Knowl. Discov. Data 2019, 13, 1–48. [Google Scholar] [CrossRef]
- Qaosar, M.; Alam, K.M.R.; Li, C.; Morimoto, Y. Privacy-Preserving Top-K Dominating Queries in Distributed Multi-Party Databases. In Proceedings of the 2019 IEEE International Conference on Big Data, BigData 2019, Los Angeles, CA, USA, 9–12 December 2019. [Google Scholar]
- Grolinger, K.; Higashino, W.A.; Tiwari, A.; Capretz, M.A.M. Data Management in Cloud Environments: NoSQL and NewSQL Data Stores. J. Cloud Comput. 2013, 2, 22. [Google Scholar] [CrossRef]
- Wang, T.; Ding, B.; Zhou, J.; Hong, C.; Huang, Z.; Li, N.; Jha, S. Answering Multi-Dimensional Analytical Queries under Local Differential Privacy. In Proceedings of the 2019 International Conference on Management of Data, SIGMOD/PODS 2019, Amsterdam, The Netherlands, 30 June–5 July 2019. [Google Scholar]
- Braun, P.; Cuzzocrea, A.; Jiang, F.; Leung, C.K.-S.; Pazdor, A.G.M. MapReduce-Based Complex Big Data Analytics over Uncertain and Imprecise Social Networks. In Proceedings of the 19th International Conference on Big Data Analytics and Knowledge Discovery, DaWaK 2017, Lyon, France, 28–31 August 2017. [Google Scholar]
- Hariri, R.H.; Fredericks, E.M.; Bowers, K.M. Uncertainty in Big Data Analytics: Survey, Opportunities, and Challenges. J. Big Data 2019, 6, 44. [Google Scholar] [CrossRef]
- Mouratidis, K.; Tang, B. Exact Processing of Uncertain Top-K Queries in Multi-Criteria Settings. Proc. VLDB Endow. 2018, 11, 866–879. [Google Scholar] [CrossRef]
- Muzammal, M.; Gohar, M.; Rahman, A.U.; Qu, Q.; Ahmad, A.; Jeon, G. Trajectory Mining Using Uncertain Sensor Data. IEEE Access 2018, 6, 4895–4903. [Google Scholar] [CrossRef]
- Cuzzocrea, A. CAMS: OLAPing Multidimensional Data Streams Efficiently. In Proceedings of the 11th International Conference on Big Data Analytics and Knowledge Discovery, DaWaK 2009, Linz, Austria, 31 August–2 September 2009. [Google Scholar]
- Hershberger, J.; Shrivastava, N.; Suri, S.; Tóth, C.D. Adaptive Spatial Partitioning for Multidimensional Data Streams. Algorithmica 2006, 46, 97–117. [Google Scholar] [CrossRef]
- Feng, Y.; Zhou, Y.; Tarokh, V. Recurrent Neural Network-Assisted Adaptive Sampling for Approximate Computing. In Proceedings of the 2019 IEEE International Conference on Big Data, BigData 2019, Los Angeles, CA, USA, 9–12 December 2019. [Google Scholar]
- Ma, S.; Huai, J. Approximate Computation for Big Data Analytics. ACM SIGWEB Newsl. 2021, 2021, 1–8. [Google Scholar] [CrossRef]
- Pei, J. Some New Progress in Analyzing and Mining Uncertain and Probabilistic Data for Big Data Analytics. In Proceedings of the 14th International Conference on Rough Sets, Fuzzy Sets, Data Mining, and Granular Computing, RSFDGrC 2013, Halifax, NS, Canada, 11–14 October 2013. [Google Scholar]
- Kantere, V. Approximate Queries on Big Heterogeneous Data. In Proceedings of the 2015 IEEE International Congress on Big Data, BigData Congress 2015, New York City, NY, USA, 27 June–2 July 2015. [Google Scholar]
- Zhou, Z.; Zhang, H.; Li, S.; Du, X. Hermes: A Privacy-Preserving Approximate Search Framework for Big Data. IEEE Access 2018, 6, 20009–20020. [Google Scholar] [CrossRef]
- Cech, P.; Lokoc, J.; Silva, Y.N. Pivot-Based Approximate k-NN Similarity Joins for Big High-Dimensional Data. Inf. Syst. 2020, 87, 101410. [Google Scholar] [CrossRef]
- Salloum, S.; Wu, Y.; Huang, J.Z. A Sampling-Based System for Approximate Big Data Analysis on Computing Clusters. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, CIKM 2019, Beijing, China, 3–7 November 2019. [Google Scholar]
- Paredes, P.; Ribeiro, P.M.P. Rand-FaSE: Fast Approximate Subgraph Census. Soc. Netw. Anal. Min. 2015, 5, 17:1–17:18. [Google Scholar] [CrossRef]
- Perozzi, B.; McCubbin, C.; Halbert, J.T. Scalable Graph Clustering with Parallel Approximate PageRank. Soc. Netw. Anal. Min. 2014, 4, 179. [Google Scholar] [CrossRef]
- Park, Y.; Mozafari, B.; Sorenson, J.; Wang, J. VerdictDB: Universalizing Approximate Query Processing. In Proceedings of the 2018 International Conference on Management of Data, SIGMOD 2018, Houston, TX, USA, 10–15 June 2018. [Google Scholar]
- Peng, J.; Zhang, D.; Wang, J.; Pei, J. AQP++: Connecting Approximate Query Processing with Aggregate Precomputation for Interactive Analytics. In Proceedings of the 2018 International Conference on Management of Data, SIGMOD 2018, Houston, TX, USA, 10–15 June 2018. [Google Scholar]
- Zeng, K.; Agarwal, S.; Stoica, I. IOLAP: Managing Uncertainty for Efficient Incremental OLAP. In Proceedings of the 2016 International Conference on Management of Data, SIGMOD 2016, San Francisco, CA, USA, 26 June–1 July 2016. [Google Scholar]
- Yu, F.; Hou, W.-C. CS*: Approximate Query Processing on Big Data using Scalable Join Correlated Sample Synopsis. In Proceedings of the 2019 IEEE International Conference on Big Data, BigData 2019, Los Angeles, CA, USA, 9–12 December 2019. [Google Scholar]
- Hasani, S.; Thirumuruganathan, S.; Asudeh, A.; Koudas, N.; Das, G. Efficient Construction of Approximate Ad-Hoc ML Models Through Materialization and Reuse. Proc. VLDB Endow. 2018, 11, 1468–1481. [Google Scholar] [CrossRef]
- Xiao, G.; Li, K.; Zhou, X.; Li, K. Efficient Monochromatic and Bichromatic Probabilistic Reverse Top-K Query Processing for Uncertain Big Data. J. Comput. Syst. Sci. 2017, 89, 92–113. [Google Scholar] [CrossRef]
- Benbernou, S.; Ouziri, M. Query Answering on Uncertain Big RDF Data Using Apache Spark Framework. In Proceedings of the 2018 IEEE International Conference on Big Data, BigData 2018, Seattle, WA, USA, 10–13 December 2018. [Google Scholar]
- Yuan, Y.; Wang, G.; Chen, L.; Ning, B. Efficient Pattern Matching on Big Uncertain Graphs. Inf. Sci. 2016, 339, 369–394. [Google Scholar] [CrossRef]
- Perez-Arriaga, M.O.; Poddar, K.A. Clinical Trials Data Management in the Big Data Era. In Proceedings of the 2020 IEEE International Congress on Big Data, BigData Congress 2020, Honolulu, HI, USA, 18–20 September 2020. [Google Scholar]
- Shae, Z.-Y.; Tsai, J.J.P. A Clinical Kidney Intelligence Platform Based on Big Data, Artificial Intelligence, and Blockchain Technology. Int. J. Artif. Intell. Tools 2022, 31, 2241007. [Google Scholar] [CrossRef]
- Gray, J.; Chaudhuri, S.; Bosworth, A.; Layman, A.; Reichart, D.; Venkatrao, M.; Pellow, F.; Pirahesh, H. Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals. Data Min. Knowl. Discov. 1997, 1, 29–53. [Google Scholar] [CrossRef]
- Shahbaz, M.; Gao, C.-Y.; Zhai, L.; Shahzad, F.; Hu, Y. Investigating the Adoption of Big Data Analytics in Healthcare: The Moderating Role of Resistance to Change. J. Big Data 2019, 6, 6. [Google Scholar] [CrossRef]
- Chrimes, D.; Zamani, H. Using Distributed Data over HBase in Big Data Analytics Platform for Clinical Services. Comput. Math. Methods Med. 2017, 2017, 6120820. [Google Scholar] [CrossRef]
- Groves, P.; Kayyali, B.; Knott, D.; Kuiken, S.V. The ‘Big Data’ Revolution in Healthcare: Accelerating Value and Innovation; McKinsey Tech Rep: New York, NY, USA, 2016. [Google Scholar]
- Habl, C.; Renner, A.T.; Bobek, J.; Laschkolnig, A. Study on Big Data in Public Health, Telemedicine and Healthcare; European Commission Tech Rep: Brussels, Belgium, 2016. [Google Scholar]
- Nam, J.; Kwon, H.W.; Lee, H.; Ahn, E.K. National Healthcare Service and Its Big Data Analytics. Healthc. Inform. Res. 2018, 24, 247–249. [Google Scholar] [CrossRef]
- Yang, E.; Scheff, J.D.; Shen, S.C.; Farnum, M.; Sefton, J.; Lobanov, V.S.; Agrafiotis, D.K. A Late-Binding, Distributed, NoSQL Warehouse for Integrating Patient Data from Clinical Trials. Database J. Biol. Databases Curation 2019, 2019, baz032. [Google Scholar] [CrossRef] [PubMed]
- Laney, D. 3D Data Management: Controlling Data Volume, Velocity, and Variety; Technical Report; META Group Inc.: Stamford, CT, USA, 2001. [Google Scholar]
- Barkwell, K.E.; Cuzzocrea, A.; Leung, C.K.; Ocran, A.A.; Sanderson, J.M. Big Data Visualization and Visual Analytics for Music Data Mining. In Proceedings of the 22nd International Conference Information Visualisation, IV 2018, Fisciano, Italy, 10–13 July 2018. [Google Scholar]
- Keim, D.A.; Qu, H.; Ma, K.-L. Big-Data Visualization. IEEE Comput. Graph. Appl. 2013, 33, 20–21. [Google Scholar] [CrossRef]
- Armbrust, M.; Fox, A.; Griffith, R.; Joseph, A.D.; Katz, R.H.; Konwinski, A.; Lee, G.; Patterson, D.A.; Rabkin, A.; Stoica, I.; et al. A View of Cloud Computing. Commun. ACM 2010, 53, 50–58. [Google Scholar] [CrossRef]
- Buyya, R.; Yeo, C.S.; Venugopal, S.; Broberg, J.; Brandic, I. Cloud Computing and Emerging IT Platforms: Vision, Hype, and Reality for Delivering Computing as the 5th Utility. Future Gener. Comput. Syst. 2009, 25, 599–616. [Google Scholar] [CrossRef]
- White, T. Hadoop: The Definitive Guide; O’Reilly Media Inc.: Sebastopol, CA, USA, 2009. [Google Scholar]
- Dean, J.; Ghemawat, S. MapReduce: Simplified Data Processing on Large Clusters. Commun. ACM 2008, 51, 107–113. [Google Scholar] [CrossRef]
- Gale, C.; Statnikov, Y.; Jawad, S.; Uthaya, S.N.; Modi, N. Neonatal Brain Injuries in England: Population-Based Incidence Derived from Routinely Recorded Clinical Data Held in the National Neonatal Research Database. ADC Fetal Neonatal Ed. 2017, 103, 301–306. [Google Scholar] [CrossRef]
- Wu, X.; Duan, J.; Pan, Y.; Li, M. Medical Knowledge Graph: Data Sources, Construction, Reasoning, and Applications. Big Data Min. Anal. 2023, 6, 201–217. [Google Scholar] [CrossRef]
- Minatogawa, V.L.F.; Franco, M.M.V.; Rampasso, I.S.; Anholon, R.; Quadros, R.; Durán, O.; Batocchio, A. Operationalizing Business Model Innovation through Big Data Analytics for Sustainable Organizations. Sustainability 2020, 12, 277. [Google Scholar] [CrossRef]
- Sun, Y.; Xiong, H.; Yiu, S.M.; Lam, K.-Y. BitAnalysis: A Visualization System for Bitcoin Wallet Investigation. IEEE Trans. Big Data 2023, 9, 621–636. [Google Scholar] [CrossRef]
- Íñiguez, L.; Galar, M. A Scalable and Flexible Open Source Big Data Architecture for Small and Medium-Sized Enterprises. In Proceedings of the 16th International Conference on Soft Computing Models in Industrial and Environmental Applications, SOCO 2021, Bilbao, Spain, 22–24 September 2021. [Google Scholar]
- Stergiou, C.L.; Psannis, K.E.; Gupta, B.B. InFeMo: Flexible Big Data Management Through a Federated Cloud System. ACM Trans. Internet Techn. 2022, 22, 1–22. [Google Scholar] [CrossRef]
- Teng, D.; Kong, J.; Wang, F. Scalable and Flexible Management of Medical Image Big Data. Distrib. Parallel Databases 2019, 37, 235–250. [Google Scholar] [CrossRef] [PubMed]
- Haseeb, K.; Saba, T.; Rehman, A.; Ahmed, I.; Lloret, J. Efficient Data Uncertainty Management for Health Industrial Internet of Things Using Machine Learning. Int. J. Commun. Syst. 2021, 34, 4948. [Google Scholar] [CrossRef]
- Shukla, A.K.; Muhuri, P.K. Big-data Clustering with Interval Type-2 Fuzzy Uncertainty Modeling in Gene Expression Datasets. Eng. Appl. Artif. Intell. 2019, 77, 268–282. [Google Scholar] [CrossRef]
- Koshizuka, N.; Mano, H. DATA-EX: Infrastructure for Cross-Domain Data Exchange Based on Federated Architecture. In Proceedings of the IEEE International Conference on Big Data, Big Data 2022, Osaka, Japan, 17–20 December 2022. [Google Scholar]
- Li, T.; Ren, W.; Xiang, Y.; Zheng, X.; Zhu, T.; Choo, K.-K.R.; Srivastava, G. FAPS: A Fair, Autonomous and Privacy-Preserving Scheme for Big Data Exchange Based on Oblivious Transfer, Ether Cheque and Smart Contracts. Inf. Sci. 2021, 544, 469–484. [Google Scholar] [CrossRef]
- Kang, Q.; Liu, J.; Yang, S.; Xiong, H.; An, H.; Li, X.; Feng, Z.; Wang, L.; Dou, D. Quasi-Optimal Data Placement for Secure Multi-tenant Data Federation on the Cloud. In Proceedings of the 2020 IEEE International Conference on Big Data, BigData 2020, Atlanta, GA, USA, 10–13 December 2020. [Google Scholar]
- Liu, J.; Zhou, X.; Mo, L.; Ji, S.; Liao, Y.; Li, Z.; Gu, Q.; Dou, D. Distributed and Deep Vertical Federated Learning with Big Data. Concurr. Comput. Pract. Exp. 2023, 35, e7697. [Google Scholar] [CrossRef]
- Nair, A.K.; Sahoo, J.; Raj, E.D. Privacy Preserving Federated Learning Framework for IoMT Based Big Data Analysis using Edge Computing. Comput. Stand. Interfaces 2023, 86, 103720. [Google Scholar] [CrossRef]
| Layer | Technology |
|---|---|
| Big Data Source Layer | Oracle NoSQL Database, MongoDB, Neo4J, Apache Streaming |
| Big Data Repository and Provisioning Layer | Apache Hadoop |
| Big Data Analytics Layer | Apache Spark |
| Big Data Application Layer | Java, C#, J#, Python |
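
As an illustration only, the layer-to-technology mapping listed in the table above could be captured as a simple lookup structure. The following is a minimal sketch in Python; the names `SEDASOMA_STACK` and `layers_using` are hypothetical and are not part of the SeDaSOMA framework itself, and the technology names are copied from the table as given.

```python
# Illustrative sketch only: the table's layer-to-technology mapping as a dictionary.
# SEDASOMA_STACK and layers_using are hypothetical names, not part of SeDaSOMA.
SEDASOMA_STACK = {
    "Big Data Source Layer": [
        "Oracle NoSQL Database", "MongoDB", "Neo4J", "Apache Streaming",
    ],
    "Big Data Repository and Provisioning Layer": ["Apache Hadoop"],
    "Big Data Analytics Layer": ["Apache Spark"],
    "Big Data Application Layer": ["Java", "C#", "J#", "Python"],
}


def layers_using(technology: str) -> list[str]:
    """Return the layers whose technology list contains the given name.

    The comparison is case-insensitive and requires an exact name match.
    """
    wanted = technology.casefold()
    return [
        layer
        for layer, technologies in SEDASOMA_STACK.items()
        if any(name.casefold() == wanted for name in technologies)
    ]


if __name__ == "__main__":
    # Print the stack in the same order as the table rows.
    for layer, technologies in SEDASOMA_STACK.items():
        print(f"{layer}: {', '.join(technologies)}")
```

A call such as `layers_using("Apache Spark")` returns the single layer that lists that technology, which may be convenient when checking which part of the stack a given dependency belongs to.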
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Cuzzocrea, A.; Ciancarini, P. Serendipitous, Open Big Data Management and Analytics: The SeDaSOMA Framework. Modelling 2024, 5, 1173-1196. https://doi.org/10.3390/modelling5030061