Big Data Sharing: A Comprehensive Survey
Abstract
1. Introduction
- To the best of our knowledge, this paper is the first comprehensive survey that formally defines and delves into the technical details of big data sharing.
- We present the state of the art in big data-sharing research by articulating its definition, general workflow, and requirements, and by summarizing popular existing platforms, open challenges, and their solutions.
- We identify promising future directions, namely blockchain-based big data sharing and the edge as big data-sharing infrastructure, which may stimulate further research.
2. Preliminaries of Big Data Sharing
2.1. Basics of Big Data
2.2. Definition of Big Data Sharing
2.3. General Procedures of Big Data Sharing
2.4. Benefits of Big Data Sharing
- For researchers, sharing scholarly data increases the visibility of their work and can strengthen their academic reputations. Shared materials typically comprise full texts, source code, experimental tools, and evaluation datasets. Openness also encourages replication and comparative studies: in a randomized controlled trial, open-access articles recorded 89% more full-text downloads and 42% more PDF downloads than subscription-access articles [43], while cancer microarray clinical trial publications whose data were publicly available received 69% more citations, independent of journal impact factor, publication date, and author country of origin [44].
- For enterprises and the public sector, big data sharing can enhance public recognition and foster ongoing collaboration. Over the past decade, national open-government portals, exemplified by data.gov.uk, data.gov, and data.gov.sg, have proliferated, furnishing citizens, firms, and researchers with standardized access to public-sector datasets. Studies indicate that such transparency measures can enhance institutional trust and stimulate civic engagement [45]. Parallel developments are evident in the private sector, where enterprises leverage big data sharing as a strategic marketing and innovation instrument. A prominent modality is the datathon: firms release curated datasets to the public and sponsor predictive-modelling contests. Kaggle, one of the largest online communities of data scientists and machine-learning practitioners, hosts a vast collection of public datasets together with reproducible code notebooks, thereby lowering transaction costs and fostering collaborative analytics between industry and academia.
- Monetary rewards are a clear incentive for sharers engaging in big data sharing, particularly from a commercial perspective. The daily deluge of high-value data produced by billions of low-cost devices and users constitutes a major commercial asset. Facebook, for instance, with over two billion monthly active accounts, has been reported to generate approximately four petabytes of new data each day. Legal prohibitions on direct sale, imposed by privacy statutes and platform terms of service, do not diminish this asset’s worth; instead, the expected future monetization of these data underpins the market capitalization of Meta Platforms, Facebook’s parent company, which surpassed USD 1.45 trillion in October 2024. The immense volume and value of data create unprecedented opportunities for monetization and new business models. In particular, dedicated trading venues such as Japan Data Exchange Inc. and Shanghai Data Exchange Corp. have emerged to facilitate the compliant, market-mediated exchange of data rights.
- Promotion of academic integrity: Big data sharing fosters academic integrity, the ethical standard that mandates the avoidance of plagiarism and cheating in academic endeavors. Making scholarly data accessible renders research findings more reproducible, because others can replicate specific experiments. This transparency encourages researchers to exercise greater caution when publishing their findings, creating a virtuous cycle that strengthens academic integrity. More broadly, sharing big data preserves the evidence underpinning scientific results, which is crucial for the advancement of science.
- Incentivization for data quality management: Sharing high-velocity data (e.g., from IoT networks) facilitates real-time decision making in domains such as smart cities and supply chain management. However, high-velocity data are often of low quality, and making these data publicly available creates reputational incentives for sharers to implement rigorous data management workflows and to enforce stringent quality-control procedures. Large-scale repositories frequently contain redundant records that inflate storage costs and degrade query performance. Sharers can exploit big-data reduction techniques, e.g., deduplication, stratified sampling, or lossless compression, to eliminate superfluous information while preserving analytical utility. The resulting high-quality datasets not only attract a broader user base but also lower the indirect costs (bandwidth, replication, and curation) imposed on the hosting infrastructure without eroding the intrinsic scientific value of the resource.
- Facilitation of collaboration and innovation: Big data sharing encourages collaboration and connectivity among researchers, potentially leading to significant new discoveries within a field. Data serve as the bedrock of scientific progress and are typically acquired through substantial effort and publicly funded projects. However, their utility is often confined to generating scientific publications, leaving much data underutilized. Sharing allows other researchers to reuse these resources instead of collecting them anew. Furthermore, big data sharing enables societal-level insights that are impossible with smaller datasets. Examples include the Google Flu study [35], whose later overestimates also illustrate the pitfalls of big data analysis, and climate change research based on Earth observation big data [46].
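The deduplication technique mentioned under data quality management can be sketched as follows. This is a minimal illustration, assuming records arrive as Python dictionaries and that only exact duplicates (identical field values, regardless of key order) should be removed; the function names and the sensor-style field names used below are hypothetical. Each record is serialized canonically, hashed with SHA-256, and kept only on its first occurrence.

```python
import hashlib
import json
from typing import Any, Iterable, Iterator

# Hypothetical helper names; an illustration of exact-duplicate removal only.


def canonical_key(record: dict[str, Any]) -> str:
    """SHA-256 of a canonical JSON encoding, so key order does not matter."""
    encoded = json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def deduplicate(records: Iterable[dict[str, Any]]) -> Iterator[dict[str, Any]]:
    """Yield each distinct record once, keeping its first occurrence."""
    seen: set[str] = set()  # grows with the number of distinct records
    for record in records:
        key = canonical_key(record)
        if key not in seen:
            seen.add(key)
            yield record
```

For example, three readings in which two differ only in key order reduce to two records. At repository scale, the in-memory digest set would likely need a disk-backed or probabilistic replacement, and near-duplicate detection (e.g., MinHash-based similarity) is a different technique not covered here.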
2.5. Requirements of Big Data-Sharing Solutions
- Data security. Big data sharing is inherently dyadic: a trustworthy solution must guarantee that no entity other than the two designated parties can read or modify the dataset. Access and alteration rights must be strictly predicated on explicit, fine-grained authorizations issued by the data sharer. Moreover, the architecture has to provide verifiable recovery mechanisms that can reconstruct both the data and the immutable sharing log in the event of corruption or malicious destruction.
- User privacy. Within big data-sharing ecosystems, the identities of both the data sharer and sharee must be shielded from external observers; ideally, they should also remain mutually concealed. The transaction should foreground the dataset itself while rendering the participating parties provably anonymous.
- Data privacy. Datasets such as electronic health records combine high analytic value with extreme sensitivity. To preserve privacy while enabling big data sharing, custodians must apply protective transformations, e.g., masking, generalization, and cryptographic obfuscation, before any external release.
- Big data preview. Under a preview regime, the sharee receives only a down-sampled or fragmentary surrogate, e.g., a textual excerpt, a low-resolution video frame, or an audio snippet, rendered through a functionality-restricted viewer. This partial disclosure permits value assessment while withholding the native dataset.
- Search over big data. The sharer exposes a controlled query interface that accepts only pre-approved query types, e.g., keyword, range, ranked, or similarity search, while restricting the sharee to search operations. The sharer retains full authority over permissible query grammars and returned data formats, thereby prohibiting bulk retrieval or direct inspection of the underlying corpus.
- Nearline computation. The sharee can perform operations through a combination of predefined interfaces, extending beyond search to actions such as addition, deletion, and updates. “Nearline” indicates that computation is nearly online: results are available quickly and without human intervention.
- Big data transfer. The sharer directly transfers the data to the sharee, allowing for a wide range of operations. Post-transfer operations depend on the contractual agreement between the parties. For instance, if data ownership is not transferred, the contract may prohibit the sharee from further disseminating the data.
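The masking and generalization transformations listed under data privacy can be illustrated with a small sketch. It assumes a hypothetical health record with `name`, `age`, `zip_code`, and `diagnosis` fields; the age-band width and the number of retained postal-code digits are illustrative parameters, not recommended values. Such transformations alone do not guarantee anonymity against linkage attacks: formal models such as k-anonymity or differential privacy require additional checks, and cryptographic obfuscation is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PatientRecord:
    """Hypothetical electronic health record used only for illustration."""
    name: str
    age: int
    zip_code: str
    diagnosis: str


def mask_name(name: str) -> str:
    """Masking: keep the first character and replace the rest with '*'."""
    return name[:1] + "*" * max(len(name) - 1, 0)


def generalize_age(age: int, width: int = 10) -> str:
    """Generalization: replace an exact age with a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def generalize_zip(zip_code: str, keep: int = 3) -> str:
    """Generalization: keep a coarse postal prefix and suppress the remaining digits."""
    return zip_code[:keep] + "*" * max(len(zip_code) - keep, 0)


def sanitize(record: PatientRecord) -> dict[str, str]:
    """Apply the protective transformations before any external release."""
    return {
        "name": mask_name(record.name),
        "age": generalize_age(record.age),
        "zip_code": generalize_zip(record.zip_code),
        "diagnosis": record.diagnosis,  # kept so the released data remain useful
    }
```

Here `sanitize(PatientRecord("Alice", 34, "94305", "influenza"))` yields the masked name `"A****"`, the age band `"30-39"`, and the coarsened code `"943**"`, while the diagnosis remains available for analysis.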
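One way to produce the down-sampled surrogate described under big data preview is reservoir sampling (Algorithm R), which draws a uniform random sample of k items from a stream of unknown length in a single pass. This is a swapped-in illustration rather than a prescribed preview mechanism; the function name and the `seed` parameter are assumptions, and a real preview would still route the sample through a functionality-restricted viewer.

```python
import random
from typing import Iterable, Optional, TypeVar

T = TypeVar("T")


def preview_sample(stream: Iterable[T], k: int, seed: Optional[int] = None) -> list[T]:
    """Hypothetical preview helper: uniform sample of k items in one pass (Algorithm R)."""
    if k <= 0:
        raise ValueError("k must be positive")
    rng = random.Random(seed)  # seeded generator makes a preview reproducible
    reservoir: list[T] = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            # Item i (0-based) enters the reservoir with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir
```

For instance, `preview_sample(records, k=100, seed=42)` touches each record once and holds only 100 of them in memory, which suits datasets too large to load in full; if the stream has fewer than k items, all of them are returned.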
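The controlled query interface described under search over big data can be sketched as a small facade. In this sketch, which assumes an in-memory list of dictionary records and uses hypothetical names throughout, only keyword and range queries are whitelisted, queries may target only fields the sharer has approved, results are capped, and each hit is projected onto the approved fields, so the sharee never receives the raw corpus.

```python
from typing import Any, Callable

Record = dict[str, Any]


class ControlledSearch:
    """Hypothetical facade that exposes only sharer-approved query types."""

    def __init__(self, corpus: list[Record], allowed_fields: set[str], max_results: int = 10) -> None:
        self._corpus = corpus                  # never handed to the sharee directly
        self._allowed_fields = allowed_fields  # fields that may be searched and returned
        self._max_results = max_results        # cap that blocks bulk retrieval
        self._handlers: dict[str, Callable[..., list[Record]]] = {
            "keyword": self._keyword,
            "range": self._range,
        }

    def query(self, kind: str, **params: Any) -> list[Record]:
        handler = self._handlers.get(kind)
        if handler is None:
            raise PermissionError(f"query type {kind!r} is not permitted")
        if params.get("field") not in self._allowed_fields:
            raise PermissionError("field is not searchable")
        hits = handler(**params)[: self._max_results]
        # Return only the attributes the sharer chose to reveal.
        return [{k: v for k, v in r.items() if k in self._allowed_fields} for r in hits]

    def _keyword(self, field: str, term: str) -> list[Record]:
        needle = term.lower()
        return [r for r in self._corpus if needle in str(r.get(field, "")).lower()]

    def _range(self, field: str, low: float, high: float) -> list[Record]:
        return [r for r in self._corpus
                if isinstance(r.get(field), (int, float)) and low <= r[field] <= high]
```

Any other query type, such as a hypothetical `"dump"` request, raises `PermissionError`. A production interface would additionally need authentication, rate limiting, and protection against inference attacks that combine many permitted queries.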
2.6. Big Data-Sharing Applications
3. Existing Platforms and Categorization
3.1. Existing Platforms
3.1.1. Epimorphics Linked Data Platform
3.1.2. HKSTP Data Studio
3.1.3. SEEK
3.1.4. InterPlanetary File System
3.1.5. Amazon Web Services Data Exchange
3.2. Categorization of Existing Platforms
3.2.1. Data-Hosting Center
3.2.2. Data Aggregation Center
3.2.3. Decentralized Big Data Sharing Solutions
- DHCs prioritize efficiency and data authenticity at the cost of data owner control and privacy. They are suitable for scenarios where performance and reliability are paramount, and data owners are willing to entrust their data to a central authority.
- DACs offer a middle ground, preserving data owner control while providing a centralized mechanism for data discovery. However, this model introduces concerns about the center’s potential to overstep its role and the difficulty in verifying data authenticity.
- Decentralized solutions champion resilience, censorship resistance, and the elimination of a central authority, which removes single points of failure and control. This comes at the price of performance consistency, a higher maintenance burden on participants, and less well-defined incentive structures. This model is ideal for environments where trust in a central entity is low or non-existent.
4. Challenges and Existing Solutions
4.1. Standardization of Heterogeneous Data
4.2. Value Assessment and Pricing Model
- Intrinsic quality: the conformance of a dataset to elementary syntactic and semantic criteria, namely volume, accuracy, completeness, timeliness, uniqueness, internal consistency, security posture, and provenance reliability [75].
- Presentation quality: the clarity of structure and semantics conveyed to the consumer, encompassing conciseness, interpretability, syntactic uniformity, and cognitive ease of comprehension.
- Contextual quality: the degree to which data content aligns with the specific decision-making context and is fit for the intended analytical or operational purpose.
- Accessibility: the ease and economy with which the buyer can locate, negotiate, and physically retrieve the dataset, including communication latency and any associated transactional overheads.
- Reliability: the cumulative reputation and verifiable trustworthiness of both the data originator and the vendor, evaluated through historical performance and third-party attestations.
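Several of the intrinsic-quality dimensions above can be operationalized as simple ratios. The sketch below assumes tabular records stored as dictionaries with a timestamp field; the metric definitions (non-null cell ratio for completeness, distinct-key ratio for uniqueness, share of records updated within a freshness window for timeliness) and the equal-weight aggregate are illustrative choices, not a standardized scoring model.

```python
from datetime import datetime, timedelta
from typing import Any, Optional

Row = dict[str, Any]  # hypothetical record layout, e.g. {"id": 1, "v": 3.2, "updated": datetime(...)}


def _require_rows(rows: list[Row]) -> None:
    if not rows:
        raise ValueError("quality metrics are undefined for an empty dataset")


def completeness(rows: list[Row], fields: list[str]) -> float:
    """Share of expected cells (rows x fields) that are present and non-null."""
    _require_rows(rows)
    if not fields:
        raise ValueError("at least one field is required")
    filled = sum(1 for row in rows for f in fields if row.get(f) is not None)
    return filled / (len(rows) * len(fields))


def uniqueness(rows: list[Row], key: str) -> float:
    """Share of rows carrying a distinct key value (1.0 means no duplicates)."""
    _require_rows(rows)
    return len({row.get(key) for row in rows}) / len(rows)


def timeliness(rows: list[Row], ts_field: str, now: datetime, max_age: timedelta) -> float:
    """Share of rows whose timestamp lies within max_age of now."""
    _require_rows(rows)
    fresh = sum(1 for row in rows if now - row[ts_field] <= max_age)
    return fresh / len(rows)


def intrinsic_score(scores: dict[str, float], weights: Optional[dict[str, float]] = None) -> float:
    """Weighted mean of per-dimension scores; equal default weights are an assumption."""
    weights = weights or {name: 1.0 for name in scores}
    total = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / total
```

For instance, four records over two fields with one null value give a completeness of 7/8 = 0.875. Dimensions such as accuracy or provenance reliability need ground truth or external attestations and do not reduce to ratios this easily.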
- Cost approach: Value is anchored to the historic expenditure incurred during collection, cleansing, storage, and maintenance. Owing to joint-production effects and indivisible overheads, marginal cost is rarely observable, so the method often understates the option value and fails to capture future rent-generating potential.
- Market approach: Value is inferred from recent transaction prices of ostensibly comparable datasets. The paucity of transparent exchanges and the heterogeneity of data attributes (schema granularity, provenance, timeliness) render the identification of true comparables problematic, producing wide confidence intervals.
- Income (revenue) approach: Value equates to the discounted stream of incremental cash flows attributable to the dataset across its economic life. Because forecast benefits are application-specific and buyer-specific, the approach is inherently subjective; valuations can diverge by orders of magnitude across prospective licensees.
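The income approach corresponds to the discounted-cash-flow formula $V = \sum_{t=1}^{T} \mathit{CF}_t / (1 + r)^t$, where $\mathit{CF}_t$ is the incremental cash flow attributed to the dataset in period $t$, $r$ is the discount rate, and $T$ is the dataset's economic life. The sketch below merely evaluates this formula; the function name, the cash-flow forecasts, and the 10% rate are hypothetical inputs, and the genuinely hard step, forecasting buyer-specific cash flows, lies outside the code.

```python
from typing import Sequence


def income_approach_value(cash_flows: Sequence[float], discount_rate: float) -> float:
    """Hypothetical helper: present value of forecast cash flows CF_1..CF_T for a dataset."""
    if discount_rate <= -1.0:
        raise ValueError("discount_rate must be greater than -1")
    # Period t (starting at 1) is discounted by (1 + r) ** t.
    return sum(cf / (1.0 + discount_rate) ** t
               for t, cf in enumerate(cash_flows, start=1))


if __name__ == "__main__":
    # Two prospective licensees with different (hypothetical) forecasts for the same dataset.
    optimistic = income_approach_value([100.0, 100.0, 100.0], 0.10)
    conservative = income_approach_value([10.0, 10.0, 10.0], 0.10)
    print(f"{optimistic:.2f} vs {conservative:.2f}")
```

With these inputs the two valuations come to roughly 248.69 and 24.87, a tenfold gap that mirrors the buyer-specific divergence of the income approach.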
- Multi-dimensional quantitative evaluation of quality. Although the literature proposes extensive taxonomies of data-quality dimensions, most remain conceptual schemata supported by qualitative heuristics; operational, quantitative models are conspicuously absent. This deficit is exacerbated when repositories contain massive unstructured corpora, such as text, imagery, and sensor streams, whose semantic content resists automatic, scalable, and reproducible metrology.
- Data collection quality assessment. Most current approaches assess data quality at the level of individual data units (e.g., a single text or image). However, data sharing and trading platforms typically involve large datasets (e.g., 10,000 texts or 100,000 images). Evaluating the overall quality of these datasets by aggregating the quality statistics of individual data units ignores the relationships between data units and their impact on the overall quality of the dataset.
- Dynamic evaluation of data value. Quantifying the value of a dataset is an inherently complex task, requiring an assessment of factors such as its rarity, acquisition difficulty, and intrinsic quality. A significant limitation of existing evaluation frameworks, however, is their tendency to focus on static measures of quality while neglecting the dynamic nature of data’s true value. The value of data is not fixed; it evolves in response to technological advancements in collection and storage, the optimization of data-mining models, and shifts in application scenarios and consumer needs. This temporal dynamism introduces a profound layer of complexity, rendering simplistic, static assessments inadequate and making robust value estimation a persistent challenge.
- Developing quantitative models and methods for evaluating data quality. Researchers should focus on creating specific quantitative models and methods for evaluating data quality, particularly for unstructured data.
- Assessing data collection quality. New approaches should be developed to assess the overall quality of large datasets, taking into account the relationships between data units and their impact on the overall quality of the dataset.
- Evaluating data value dynamically. Researchers should develop methods that can reasonably evaluate the dynamic characteristics of data value, including rarity, acquisition difficulty, and changes in data collection, storage, and application scenarios.
4.3. Sharing Security
- Time-limited authorization: determining whether a user’s authorization expires after a specified period.
- Authority division: distinguishing between data ownership and usage rights.
- Re-sharing permissions: deciding whether users can re-share data.
- Flexible revocation: allowing for the complete revocation of user permissions.
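The four requirements above can be combined in a small authorization model. The sketch below is a hypothetical, in-memory illustration: each grant records the owner separately from the grantee (authority division), carries an optional expiry (time-limited authorization) and a re-sharing flag (re-sharing permissions), and revocation cascades to every grant derived from the revoked one (flexible revocation). The class and field names are assumptions, and a real deployment would enforce these checks on a trusted server or cryptographically rather than in client code.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Grant:
    """Hypothetical usage grant; ownership and usage rights are kept separate."""
    dataset_id: str
    owner: str                             # ownership stays with the original sharer
    grantee: str                           # the grantee receives usage rights only
    granted_by: str                        # owner or the user who re-shared
    expires_at: Optional[datetime] = None  # None means no time limit
    can_reshare: bool = False
    revoked: bool = False


class GrantRegistry:
    """Hypothetical in-memory registry of grants keyed by (dataset, user)."""

    def __init__(self) -> None:
        self._grants: dict[tuple[str, str], Grant] = {}

    def issue(self, grant: Grant) -> None:
        self._grants[(grant.dataset_id, grant.grantee)] = grant

    def is_authorized(self, dataset_id: str, user: str, now: datetime) -> bool:
        grant = self._grants.get((dataset_id, user))
        if grant is None or grant.revoked:
            return False
        return grant.expires_at is None or now < grant.expires_at

    def reshare(self, dataset_id: str, sharer: str, new_grantee: str, now: datetime) -> Grant:
        parent = self._grants.get((dataset_id, sharer))
        if parent is None or not parent.can_reshare or not self.is_authorized(dataset_id, sharer, now):
            raise PermissionError(f"{sharer} may not re-share {dataset_id}")
        # The child inherits owner and expiry and cannot re-share further.
        child = Grant(dataset_id, parent.owner, new_grantee, granted_by=sharer,
                      expires_at=parent.expires_at, can_reshare=False)
        self.issue(child)
        return child

    def revoke(self, dataset_id: str, user: str) -> None:
        """Revoke a user's grant and, transitively, every grant derived from it."""
        pending = [user]
        while pending:
            current = pending.pop()
            grant = self._grants.get((dataset_id, current))
            if grant is None or grant.revoked:
                continue
            grant.revoked = True
            pending.extend(g.grantee for g in self._grants.values()
                           if g.dataset_id == dataset_id and g.granted_by == current)
```

Because re-shared grants inherit the parent's expiry and cannot themselves be re-shared, a sharee cannot extend access beyond what the owner originally granted.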
- Consolidation and integration of access control strategies. Data users often require access to multiple heterogeneous data sources. Integrating the access control policies of these sources is essential, and automated or semi-automated policy-integration systems are needed to resolve conflicts among them [80]. Allowing each data provider to define its own access policies complicates data sharing, and the automatic integration and merging of these policies remains challenging.
- Authorization management. Fine-grained access control requires efficient authorization management, which can be resource-intensive for large datasets. Authorization therefore needs to be automated on the basis of the user’s digital identity, profile, and context, as well as the data content and metadata. While initial steps have been taken toward machine learning-based permission assignment [81], more advanced methods are needed to handle dynamically changing contexts and situations.
- Implementation of access control on big data platforms. The rise of big data platforms has introduced new challenges in implementing fine-grained access control for diverse users. Although initial work has focused on injecting access control policies into submitted jobs, further research is needed on implementing such policies effectively in big data storage, particularly through fine-grained encryption.
4.4. Sharing Privacy
4.5. Data Traceability and Accountability
- High-speed real-time online access: Unlike streaming media, data access in big data environments can be sporadic and arbitrary. Therefore, improvements in the efficiency and speed of online access are essential.
- Improvement of dedicated software: Existing dedicated software for copyright management typically restricts users to browsing activities, such as watching videos or listening to music. In the context of data transactions, such software must also support data computation and visualization for buyers.
- Function restriction mechanism: Dedicated software should disable screen-capture functions and prevent buyers from recording images or videos of the screen, thereby preventing indirect infringement.
- Infringement detection: Beyond using product keys and continuous online identity verification to prevent infringement, it is crucial to detect instances of infringement. This can be achieved by recording the devices involved in data transmission and integrating copyright restrictions within the data transaction contract. The software should automatically assess whether infringement has occurred and, if so, penalize the buyer by disrupting data access or completely revoking usage rights.
4.6. High Quality of Service
- Storage subsystem. Conventional electromechanical hard-disk drives exhibit random-access latency and throughput that are orders of magnitude below the ingestion and query rates required for real-time big-data workloads. Although solid-state drives (SSDs) and phase-change memory (PCM) offer substantially higher IOPS, their unit cost and limited write endurance have slowed enterprise-wide deployment.
- Index-management algorithms. Existing data structure and indexing techniques are not co-designed with modern storage hierarchies; as a result, point and range queries remain CPU-bound despite abundant secondary-storage bandwidth. Cache-conscious, compression-aware index layouts must therefore be re-engineered to exploit both byte-addressable NVM and parallel flash arrays.
- Secure high-bandwidth transport. Because data acquisition and service delivery are predominantly cloud-resident, multi-gigabit, wide-area transfers are routine. Packet loss, jitter, and man-in-the-middle attacks can silently corrupt or exfiltrate in-flight segments; hence, low-overhead, line-rate encryption and loss-resilient integrity checks are mandatory.
- Compute-power scalability. Aggregate data volume now grows faster than single-processor performance, and single-core clock frequencies have largely plateaued since around 2005. Sustained quality of service therefore hinges on heterogeneous parallelism (GPUs, FPGAs, domain-specific accelerators) and energy-efficient cluster fabrics rather than on frequency scaling alone.
- Timeliness guarantees. Meeting subsecond latency service-level objectives for complex analytics (streaming joins, iterative graph algorithms, deep-learning inference) demands a synergistic redesign of compute architectures, scheduling policies, and approximation algorithms; failure at any stratum propagates delay and violates business-critical deadlines.
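The low-overhead integrity checks called for under secure high-bandwidth transport can be sketched with an HMAC per segment. This minimal illustration assumes a 32-byte key already shared out of band and a hypothetical framing of an 8-byte sequence number, the payload, and a SHA-256 tag; the receiver rejects tampered segments, and a sequence mismatch reveals a lost segment (to be retransmitted) or a replayed one. It provides integrity and authenticity only, not confidentiality; in practice, TLS or an authenticated-encryption mode such as AES-GCM would supply both encryption and integrity.

```python
import hashlib
import hmac

# Hypothetical framing: sequence number || payload || HMAC-SHA256 tag.
SEQ_LEN = 8   # bytes used for the big-endian sequence number
TAG_LEN = 32  # HMAC-SHA256 tag size in bytes


def seal_segment(key: bytes, seq: int, payload: bytes) -> bytes:
    """Frame a segment and authenticate the header together with the payload."""
    header = seq.to_bytes(SEQ_LEN, "big")
    tag = hmac.new(key, header + payload, hashlib.sha256).digest()
    return header + payload + tag


def open_segment(key: bytes, expected_seq: int, segment: bytes) -> bytes:
    """Verify a framed segment and return its payload, or raise ValueError."""
    if len(segment) < SEQ_LEN + TAG_LEN:
        raise ValueError("segment too short")
    header = segment[:SEQ_LEN]
    payload = segment[SEQ_LEN:-TAG_LEN]
    tag = segment[-TAG_LEN:]
    expected_tag = hmac.new(key, header + payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected_tag):  # constant-time comparison
        raise ValueError("integrity check failed")
    seq = int.from_bytes(header, "big")
    if seq != expected_seq:
        # A larger number signals a lost segment; a smaller one signals a replay.
        raise ValueError(f"expected segment {expected_seq}, got {seq}")
    return payload
```

Authenticating the sequence number together with the payload prevents an attacker from reordering valid segments without detection.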
5. Future Directions
5.1. Blockchain-Based Big Data Sharing
- Decentralization: The blockchain is maintained by a P2P network in which all nodes are peers of equal standing and there is no central authority.
- Transparency: Blocks and transactions are visible to all nodes in the blockchain network and, in public blockchains, to everyone.
- Immutability: Once stored on the blockchain, data practically cannot be altered without detection, because each block contains the cryptographic hash of its predecessor; modifying any block breaks every subsequent link.
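The cryptographic linking behind immutability can be illustrated with a minimal hash chain. This sketch omits the P2P network and consensus entirely and uses hypothetical names and sharing-log payloads: each block stores the SHA-256 digest of its predecessor, and verification recomputes every link and compares the final digest with a head hash held by other nodes, so modifying any block, including the latest one, is detected.

```python
import hashlib
import json
from dataclasses import dataclass

GENESIS_PREV = "0" * 64  # placeholder predecessor hash for the first block


@dataclass(frozen=True)
class Block:
    """Hypothetical block holding one sharing-log entry."""
    index: int
    prev_hash: str
    payload: str

    def digest(self) -> str:
        """SHA-256 over the block's index, predecessor hash, and payload."""
        body = json.dumps([self.index, self.prev_hash, self.payload]).encode("utf-8")
        return hashlib.sha256(body).hexdigest()


def append_block(chain: list[Block], payload: str) -> None:
    prev = chain[-1].digest() if chain else GENESIS_PREV
    chain.append(Block(len(chain), prev, payload))


def verify(chain: list[Block], trusted_head: str) -> bool:
    """Recompute every link, then compare the last digest with a head hash held by peers."""
    expected_prev = GENESIS_PREV
    for i, block in enumerate(chain):
        if block.index != i or block.prev_hash != expected_prev:
            return False
        expected_prev = block.digest()
    return expected_prev == trusted_head
```

The code shows tamper-evidence rather than tamper-proofness: an attacker who could rewrite the whole chain and every peer's head hash would escape detection, which is precisely what consensus among the nodes is meant to prevent.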
5.2. Edge as Big Data-Sharing Infrastructure
6. Conclusions
Funding
Data Availability Statement
Conflicts of Interest
References
- Ajagbe, S.A.; Mudali, P.; Adigun, M.O. Internet of things with deep learning techniques for pandemic detection: A comprehensive review of current trends and open issues. Electronics 2024, 13, 2630.
- Zhang, Z.; Liu, M.; Sun, M.; Deng, R.; Cheng, P.; Niyato, D.; Chow, M.Y.; Chen, J. Vulnerability of machine learning approaches applied in IoT-based smart grid: A review. IEEE Internet Things J. 2024, 11, 18951–18975.
- Salaris, S.; Ocagli, H.; Casamento, A.; Lanera, C.; Gregori, D. Foodborne event detection based on social media mining: A systematic review. Foods 2025, 14, 239.
- Chen, M.; Mao, S.; Liu, Y. Big data: A survey. Mob. Netw. Appl. 2014, 19, 171–209.
- Liu, X.; Cao, J.; Yang, Y.; Jiang, S. CPS-based smart warehouse for industry 4.0: A survey of the underlying technologies. Computers 2018, 7, 13.
- Jamarani, A.; Haddadi, S.; Sarvizadeh, R.; Haghi Kashani, M.; Akbari, M.; Moradi, S. Big data and predictive analytics: A systematic review of applications. Artif. Intell. Rev. 2024, 57, 176.
- Latupeirissa, J.J.P.; Dewi, N.L.Y.; Prayana, I.K.R.; Srikandi, M.B.; Ramadiansyah, S.A.; Pramana, I.B.G.A.Y. Transforming public service delivery: A comprehensive review of digitization initiatives. Sustainability 2024, 16, 2818.
- Wang, R.; Xu, C.; Dong, R.; Luo, Z.; Zheng, R.; Zhang, X. A secured big-data sharing platform for materials genome engineering: State-of-the-art, challenges and architecture. Future Gener. Comput. Syst. 2023, 142, 59–74.
- Ye, M.; Shen, W.; Du, B.; Snezhko, E.; Kovalev, V.; Yuen, P.C. Vertical federated learning for effectiveness, security, applicability: A survey. ACM Comput. Surv. 2025, 57, 1–32.
- Mello, M.M.; Lieou, V.; Goodman, S.N. Clinical trial participants’ views of the risks and benefits of data sharing. N. Engl. J. Med. 2018, 378, 2202–2211.
- Figueiredo, A.S. Data sharing: Convert challenges into opportunities. Front. Public Health 2017, 5, 327.
- Agapito, G.; Cannataro, M. An overview on the challenges and limitations using cloud computing in healthcare corporations. Big Data Cogn. Comput. 2023, 7, 68.
- Hajian, A.; Prybutok, V.R.; Chang, H.C. An empirical study for blockchain-based information sharing systems in electronic health records: A mediation perspective. Comput. Hum. Behav. 2023, 138, 107471.
- Rhahla, M.; Allegue, S.; Abdellatif, T. Guidelines for GDPR compliance in Big Data systems. J. Inf. Secur. Appl. 2021, 61, 102896.
- Wang, J.; Gao, F.; Zhou, Y.; Guo, Q.; Tan, C.W.; Song, J.; Wang, Y. Data sharing in energy systems. Adv. Appl. Energy 2023, 10, 100132.
- Liu, Z.; Huang, B.; Li, Y.; Sun, Q.; Pedersen, T.B.; Gao, D.W. Pricing game and blockchain for electricity data trading in low-carbon smart energy systems. IEEE Trans. Ind. Inform. 2024, 20, 6446–6456.
- Deepa, N.; Pham, Q.V.; Nguyen, D.C.; Bhattacharya, S.; Prabadevi, B.; Gadekallu, T.R.; Maddikunta, P.K.R.; Fang, F.; Pathirana, P.N. A survey on blockchain for big data: Approaches, opportunities, and future directions. Future Gener. Comput. Syst. 2022, 131, 209–226.
- Khan, N.; Yaqoob, I.; Hashem, I.A.T.; Inayat, Z.; Mahmoud Ali, W.K.; Alam, M.; Shiraz, M.; Gani, A. Big data: Survey, technologies, opportunities, and challenges. Sci. World J. 2014, 2014, 712826.
- Arora, S.; Kumar, M.; Johri, P.; Das, S. Big heterogeneous data and its security: A survey. In Proceedings of the 2016 International Conference on Computing, Communication and Automation (ICCCA), Greater Noida, India, 29–30 April 2016; pp. 37–40.
- Yang, P.; Xiong, N.; Ren, J. Data security and privacy protection for cloud storage: A survey. IEEE Access 2020, 8, 131723–131740.
- Ferradi, H.; Cao, J.; Jiang, S.; Cao, Y.; Saxena, D. Security and privacy in big data sharing: State-of-the-art and research directions. arXiv 2022, arXiv:2210.09230.
- Liu, L.; Han, M. Data sharing and exchanging with incentive and optimization: A survey. Discov. Data 2024, 2, 2.
- Liang, H.; Zhang, Z.; Hu, C.; Gong, Y.; Cheng, D. A Survey on Spatio-temporal Big Data Analytics Ecosystem: Resource Management, Processing Platform, and Applications. IEEE Trans. Big Data 2024, 10, 174–193.
- Almeida, A.; Brás, S.; Sargento, S.; Pinto, F.C. Time series big data: A survey on data stream frameworks, analysis and algorithms. J. Big Data 2023, 10, 83.
- Selmy, H.A.; Mohamed, H.K.; Medhat, W. Big data analytics deep learning techniques and applications: A survey. Inf. Syst. 2023, 120, 102318.
- Lv, Z.; Qiao, L. Analysis of healthcare big data. Future Gener. Comput. Syst. 2020, 109, 103–110.
- Talebkhah, M.; Sali, A.; Marjani, M.; Gordan, M.; Hashim, S.J.; Rokhani, F.Z. IoT and big data applications in smart cities: Recent advances, challenges, and critical issues. IEEE Access 2021, 9, 55465–55484.
- Kumar, A.; Sangwan, S.R.; Nayyar, A. Multimedia social big data: Mining. In Multimedia Big Data Computing for IoT Applications: Concepts, Paradigms and Solutions; Springer: Berlin/Heidelberg, Germany, 2020; pp. 289–321.
- Nelufule, N.; Senamela, P.; Moloi, P. Digital Forensics Investigations on Evolving Digital Ecosystems and Big Data Sharing: A Survey of Challenges and Potential Opportunities. In Proceedings of the 2025 IST-Africa Conference, Nairobi, Kenya, 28–30 May 2025; pp. 1–12.
- Hemmati, A.; Arzanagh, H.M.; Rahmani, A.M. A taxonomy and survey of big data in social media. Concurr. Comput. Pract. Exp. 2024, 36, e7875.
- Khan, S.; Liu, X.; Shakil, K.A.; Alam, M. A survey on scholarly data: From big data perspective. Inf. Process. Manag. 2017, 53, 923–944.
- Adler-Milstein, J.; Garg, A.; Zhao, W.; Patel, V. A survey of health information exchange organizations in advance of a nationwide connectivity framework. Health Aff. 2021, 40, 736–744.
- Manzoor, A.; Braeken, A.; Kanhere, S.S.; Ylianttila, M.; Liyanage, M. Proxy re-encryption enabled secure and anonymous IoT data sharing platform based on blockchain. J. Netw. Comput. Appl. 2021, 176, 102917.
- Jacobs, A. The pathologies of big data. Commun. ACM 2009, 52, 36–44.
- Lazer, D.; Kennedy, R.; King, G.; Vespignani, A. The parable of Google Flu: Traps in big data analysis. Science 2014, 343, 1203–1205.
- Ginsberg, J.; Mohebbi, M.H.; Patel, R.S.; Brammer, L.; Smolinski, M.S.; Brilliant, L. Detecting influenza epidemics using search engine query data. Nature 2009, 457, 1012–1014.
- Hossain, M.A.; Dwivedi, Y.K.; Rana, N.P. State-of-the-art in open data research: Insights from existing literature and a research agenda. J. Organ. Comput. Electron. Commer. 2016, 26, 14–40.
- Wu, H.; Cao, J.; Jiang, S.; Yang, R.; Yang, Y.; Hey, J. TSAR: A fully-distributed trustless data sharing platform. In Proceedings of the 2018 IEEE International Conference on Smart Computing (SMARTCOMP), Taormina, Italy, 18–20 June 2018; pp. 350–355.
- Cuzzocrea, A.; Damiani, E. Privacy-preserving big data exchange: Models, issues, future research directions. In Proceedings of the 2021 IEEE International Conference on Big Data (Big Data), Orlando, FL, USA, 15–18 December 2021; pp. 5081–5084.
- Zhang, M.; Beltrán, F.; Liu, J. A survey of data pricing for data marketplaces. IEEE Trans. Big Data 2023, 9, 1038–1056.
- Azcoitia, S.A.; Laoutaris, N. A survey of data marketplaces and their business models. ACM SIGMOD Rec. 2022, 51, 18–29.
- Jiang, S.; Cao, J.; McCann, J.A.; Yang, Y.; Liu, Y.; Wang, X.; Deng, Y. Privacy-preserving and efficient multi-keyword search over encrypted data on blockchain. In Proceedings of the 2019 IEEE International Conference on Blockchain (Blockchain), Atlanta, GA, USA, 14–17 July 2019; pp. 405–410.
- Davis, P.M.; Lewenstein, B.V.; Simon, D.H.; Booth, J.G.; Connolly, M.J. Open access publishing, article downloads, and citations: Randomised controlled trial. BMJ 2008, 337, 343–345.
- Piwowar, H.A.; Day, R.S.; Fridsma, D.B. Sharing detailed research data is associated with increased citation rate. PLoS ONE 2007, 2, e308.
- Janssen, M.; Charalabidis, Y.; Zuiderwijk, A. Benefits, adoption barriers and myths of open data and open government. Inf. Syst. Manag. 2012, 29, 258–268.
- Guo, H.D.; Zhang, L.; Zhu, L.W. Earth observation big data for climate change research. Adv. Clim. Change Res. 2015, 6, 108–117.
- Van Panhuis, W.G.; Paul, P.; Emerson, C.; Grefenstette, J.; Wilder, R.; Herbst, A.J.; Heymann, D.; Burke, D.S. A systematic review of barriers to data sharing in public health. BMC Public Health 2014, 14, 1144.
- Houtkoop, B.L.; Chambers, C.; Macleod, M.; Bishop, D.V.; Nichols, T.E.; Wagenmakers, E.J. Data sharing in psychology: A survey on barriers and preconditions. Adv. Methods Pract. Psychol. Sci. 2018, 1, 70–85.
- Jiang, S.; Cao, J.; Wu, H.; Yang, Y.; Ma, M.; He, J. Blochie: A blockchain-based platform for healthcare information exchange. In Proceedings of the 2018 IEEE International Conference on Smart Computing (SMARTCOMP), Taormina, Italy, 18–20 June 2018; pp. 49–56.
- Wu, H.; Jiang, S.; Cao, J. High-efficiency blockchain-based supply chain traceability. IEEE Trans. Intell. Transp. Syst. 2023, 24, 3748–3758.
- Jiang, S.; Chai, W.; Zhang, M.; Cao, J.; Xuan, S.; Shen, J. Verifying Energy Generation via Edge LLM for Web3-based Decentralized Clean Energy Networks. Inf. Fusion 2026, 127, 103752.
- Sim, I.; Stebbins, M.; Bierer, B.E.; Butte, A.J.; Drazen, J.; Dzau, V.; Hernandez, A.F.; Krumholz, H.M.; Lo, B.; Munos, B.; et al. Time for NIH to lead on data sharing. Science 2020, 367, 1308–1309.
- Sun, J.; Fang, Y. Cross-domain data sharing in distributed electronic health record systems. IEEE Trans. Parallel Distrib. Syst. 2009, 21, 754–764.
- Hossain, M.E.; Khan, A.; Moni, M.A.; Uddin, S. Use of electronic health data for disease prediction: A comprehensive literature review. IEEE/ACM Trans. Comput. Biol. Bioinform. 2019, 18, 745–758.
- Oh, H.; Park, S.; Lee, G.M.; Choi, J.K.; Noh, S. Competitive Data Trading Model with Privacy Valuation for Multiple Stakeholders in IoT Data Markets. IEEE Internet Things J. 2020, 7, 3623–3639.
- Zheng, Z.; Peng, Y.; Wu, F.; Tang, S.; Chen, G. Arete: On designing joint online pricing and reward sharing mechanisms for mobile data markets. IEEE Trans. Mob. Comput. 2019, 19, 769–787.
- Zhao, Y.; Wang, H.; Su, H.; Zhang, L.; Zhang, R.; Wang, D.; Xu, K. Understand love of variety in wireless data market under sponsored data plans. IEEE J. Sel. Areas Commun. 2020, 38, 766–781.
- Wolstencroft, K.; Owen, S.; Krebs, O.; Nguyen, Q.; Stanford, N.J.; Golebiewski, M.; Weidemann, A.; Bittkowski, M.; An, L.; Shockley, D.; et al. SEEK: A systems biology data and model management platform. BMC Syst. Biol. 2015, 9, 33.
- Rocca-Serra, P.; Brandizi, M.; Maguire, E.; Sklyar, N.; Taylor, C.; Begley, K.; Field, D.; Harris, S.; Hide, W.; Hofmann, O.; et al. ISA software suite: Supporting standards-compliant experimental annotation and enabling curation at the community level. Bioinformatics 2010, 26, 2354–2356.
- Psaras, Y.; Dias, D. The interplanetary file system and the filecoin network. In Proceedings of the 2020 50th Annual IEEE-IFIP International Conference on Dependable Systems and Networks-Supplemental Volume (DSN-S), Valencia, Spain, 28 June–2 July 2020; p. 80.
- Shen, B.; Guo, J.; Yang, Y. MedChain: Efficient healthcare data sharing via blockchain. Appl. Sci. 2019, 9, 1207.
- Zuech, R.; Khoshgoftaar, T.M.; Wald, R. Intrusion detection and big heterogeneous data: A survey. J. Big Data 2015, 2, 3.
- Chen, D.; Yuan, H.; Hu, S.; Wang, Q.; Wang, C. BOSSA: A decentralized system for proofs of data retrievability and replication. IEEE Trans. Parallel Distrib. Syst. 2020, 32, 786–798.
- Yang, A.; Xu, J.; Weng, J.; Zhou, J.; Wong, D.S. Lightweight and privacy-preserving delegatable proofs of storage with data dynamics in cloud storage. IEEE Trans. Cloud Comput. 2018, 9, 212–225.
- He, K.; Chen, J.; Du, R.; Wu, Q.; Xue, G.; Zhang, X. Deypos: Deduplicatable dynamic proof of storage for multi-user environments. IEEE Trans. Comput. 2016, 65, 3631–3645.
- Yu, J.; Ren, K.; Wang, C.; Varadharajan, V. Enabling cloud storage auditing with key-exposure resistance. IEEE Trans. Inf. Forensics Secur. 2015, 10, 1167–1179.
- Wang, H.; Li, M.; Bu, Y.; Li, J.; Gao, H.; Zhang, J. Cleanix: A parallel big data cleaning system. ACM SIGMOD Rec. 2016, 44, 35–40.
- Gudivada, V.; Apon, A.; Ding, J. Data quality considerations for big data and machine learning: Going beyond data cleaning and transformations. Int. J. Adv. Softw. 2017, 10, 1–20.
- Ahmed, M. Data summarization: A survey. Knowl. Inf. Syst. 2019, 58, 249–273.
- Hesabi, Z.R.; Tari, Z.; Goscinski, A.; Fahad, A.; Khalil, I.; Queiroz, C. Data summarization techniques for big data—A survey. In Handbook on Data Centers; Springer: Berlin/Heidelberg, Germany, 2015; pp. 1109–1152.
- Xiao, D.; Bashllari, A.; Menard, T.; Eltabakh, M. Even metadata is getting big: Annotation summarization using insightnotes. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, Melbourne, VIC, Australia, 31 May–4 June 2015; pp. 1409–1414. [Google Scholar]
- Laban, P.; Kryściński, W.; Agarwal, D.; Fabbri, A.R.; Xiong, C.; Joty, S.; Wu, C.S. SUMMEDITS: Measuring LLM ability at factual reasoning through the lens of summarization. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Singapore, 6–10 December 2023; pp. 9662–9676. [Google Scholar]
- Ardagna, D.; Cappiello, C.; Samá, W.; Vitali, M. Context-aware data quality assessment for big data. Future Gener. Comput. Syst. 2018, 89, 548–562. [Google Scholar] [CrossRef]
- Li, J.; Li, J.; Wang, X.; Qin, R.; Yuan, Y.; Wang, F.Y. Multi-blockchain based data trading markets with novel pricing mechanisms. IEEE/CAA J. Autom. Sin. 2023, 10, 2222–2232. [Google Scholar] [CrossRef]
- Hazen, B.T.; Boone, C.A.; Ezell, J.D.; Jones-Farmer, L.A. Data quality for data science, predictive analytics, and big data in supply chain management: An introduction to the problem and suggestions for research and applications. Int. J. Prod. Econ. 2014, 154, 72–80. [Google Scholar] [CrossRef]
- Caruccio, L.; Desiato, D.; Polese, G.; Tortora, G. GDPR compliant information confidentiality preservation in big data processing. IEEE Access 2020, 8, 205034–205050. [Google Scholar] [CrossRef]
- Colombo, P.; Ferrari, E. Privacy aware access control for big data: A research roadmap. Big Data Res. 2015, 2, 145–154. [Google Scholar] [CrossRef]
- Ding, Y.; Sato, H. Bloccess: Enabling fine-grained access control based on blockchain. J. Netw. Syst. Manag. 2023, 31, 6. [Google Scholar] [CrossRef]
- Ding, W.; Yan, Z.; Deng, R.H. Privacy-preserving data processing with flexible access control. IEEE Trans. Dependable Secur. Comput. 2017, 17, 363–376. [Google Scholar] [CrossRef]
- Zhang, L.; Wang, J.; Mu, Y. Privacy-preserving flexible access control for encrypted data in Internet of Things. IEEE Internet Things J. 2021, 8, 14731–14745. [Google Scholar] [CrossRef]
- Nobi, M.N.; Krishnan, R.; Huang, Y.; Shakarami, M.; Sandhu, R. Toward deep learning based access control. In Proceedings of the Twelfth ACM Conference on Data and Application Security and Privacy, Washington, DC, USA, 25–27 April 2022; pp. 143–154. [Google Scholar]
- Naveed, M.; Agrawal, S.; Prabhakaran, M.; Wang, X.; Ayday, E.; Hubaux, J.P.; Gunter, C. Controlled functional encryption. In Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, Scottsdale, AZ, USA, 3–7 November 2014; pp. 1280–1291. [Google Scholar]
- Yan, H.; Wang, Y.; Jia, C.; Li, J.; Xiang, Y.; Pedrycz, W. IoT-FBAC: Function-based access control scheme using identity-based encryption in IoT. Future Gener. Comput. Syst. 2019, 95, 344–353. [Google Scholar] [CrossRef]
- Zhang, Y.; Deng, R.H.; Xu, S.; Sun, J.; Li, Q.; Zheng, D. Attribute-based encryption for cloud computing access control: A survey. ACM Comput. Surv. 2020, 53, 1–41. [Google Scholar] [CrossRef]
- Luo, F.; Wang, H.; Yan, X.; Wu, J. Key-Policy Attribute-Based Encryption with Switchable Attributes for Fine-Grained Access Control of Encrypted Data. IEEE Trans. Inf. Forensics Secur. 2024, 19, 7245–7258. [Google Scholar] [CrossRef]
- Bethencourt, J.; Sahai, A.; Waters, B. Ciphertext-policy attribute-based encryption. In Proceedings of the 2007 IEEE Symposium on Security and Privacy, Berkeley, CA, USA, 20–23 May 2007; pp. 321–334. [Google Scholar]
- Acar, A.; Aksu, H.; Uluagac, A.S.; Conti, M. A survey on homomorphic encryption schemes: Theory and implementation. ACM Comput. Surv. 2018, 51, 1–35. [Google Scholar] [CrossRef]
- Zhang, Z.; Cheng, P.; Wu, J.; Chen, J. Secure state estimation using hybrid homomorphic encryption scheme. IEEE Trans. Control Syst. Technol. 2020, 29, 1704–1720. [Google Scholar] [CrossRef]
- Gentry, C.; Boneh, D. A Fully Homomorphic Encryption Scheme; Stanford University: Stanford, CA, USA, 2009; Volume 20.
- Marcolla, C.; Sucasas, V.; Manzano, M.; Bassoli, R.; Fitzek, F.H.; Aaraj, N. Survey on fully homomorphic encryption, theory, and applications. Proc. IEEE 2022, 110, 1572–1609.
- Zhou, L.; Fu, A.; Yu, S.; Su, M.; Kuang, B. Data integrity verification of the outsourced big data in the cloud environment: A survey. J. Netw. Comput. Appl. 2018, 122, 1–15.
- Li, B.; He, Q.; Chen, F.; Jin, H.; Xiang, Y.; Yang, Y. Inspecting edge data integrity with aggregate signature in distributed edge computing environment. IEEE Trans. Cloud Comput. 2021, 10, 2691–2703.
- Yang, Y.; Zheng, X.; Guo, W.; Liu, X.; Chang, V. Privacy-preserving smart IoT-based healthcare big data storage and self-adaptive access control system. Inf. Sci. 2019, 479, 567–592.
- Yu, H.; Hu, Q.; Yang, Z.; Liu, H. Efficient continuous big data integrity checking for decentralized storage. IEEE Trans. Netw. Sci. Eng. 2021, 8, 1658–1673.
- Ganeriwal, S.; Balzano, L.K.; Srivastava, M.B. Reputation-based framework for high integrity sensor networks. ACM Trans. Sens. Netw. 2008, 4, 1–37.
- Westin, A.F. Social and political dimensions of privacy. J. Soc. Issues 2003, 59, 431–453.
- Jiang, S.; Cao, J.; Wu, H.; Chen, K.; Liu, X. Privacy-preserving and efficient data sharing for blockchain-based intelligent transportation systems. Inf. Sci. 2023, 635, 72–85.
- Hu, H.; Xu, J.; Xu, X.; Pei, K.; Choi, B.; Zhou, S. Private search on key-value stores with hierarchical indexes. In Proceedings of the IEEE 30th International Conference on Data Engineering, Chicago, IL, USA, 31 March–4 April 2014; pp. 628–639.
- Vasa, J.; Thakkar, A. Deep learning: Differential privacy preservation in the era of big data. J. Comput. Inf. Syst. 2023, 63, 608–631.
- Kamara, S.; Papamanthou, C.; Roeder, T. Dynamic searchable symmetric encryption. In Proceedings of the 2012 ACM Conference on Computer and Communications Security, Raleigh, NC, USA, 16–18 October 2012; pp. 965–976.
- Abdalla, M.; Bellare, M.; Catalano, D.; Kiltz, E.; Kohno, T.; Lange, T.; Malone-Lee, J.; Neven, G.; Paillier, P.; Shi, H. Searchable encryption revisited: Consistency properties, relation to anonymous IBE, and extensions. In Proceedings of the Annual International Cryptology Conference, Santa Barbara, CA, USA, 14–18 August 2005; pp. 205–222.
- Colombo, S.; Nikitin, K.; Corrigan-Gibbs, H.; Wu, D.J.; Ford, B. Authenticated private information retrieval. In Proceedings of the 32nd USENIX Security Symposium (USENIX Security 23), Anaheim, CA, USA, 9–11 August 2023; pp. 3835–3851.
- Xu, H.; Xiao, B.; Liu, X.; Wang, L.; Jiang, S.; Xue, W.; Wang, J.; Li, K. Empowering authenticated and efficient queries for STK transaction-based blockchains. IEEE Trans. Comput. 2023, 72, 2209–2223.
- Sangaiah, A.K.; Javadpour, A.; Ja’fari, F.; Pinto, P.; Chuang, H.M. Privacy-aware and ai techniques for healthcare based on k-anonymity model in internet of things. IEEE Trans. Eng. Manag. 2024, 239, 122343.
- Ashkouti, F.; Khamforoosh, K.; Sheikhahmadi, A. DI-Mondrian: Distributed improved Mondrian for satisfaction of the L-diversity privacy model using Apache Spark. Inf. Sci. 2021, 546, 1–24.
- Ren, W.; Ghazinour, K.; Lian, X. kt-Safety: Graph Release via k-Anonymity and t-Closeness. IEEE Trans. Knowl. Data Eng. 2022, 35, 9102–9113.
- Sun, Y.; Yin, L.; Liu, L.; Xin, S. Toward inference attacks for k-anonymity. Pers. Ubiquitous Comput. 2014, 18, 1871–1880.
- Zhang, H.; Jiang, S.; Xuan, S. Decentralized federated learning based on blockchain: Concepts, framework, and challenges. Comput. Commun. 2024, 216, 140–150.
- Zhang, M.; Cao, J.; Sahni, Y.; Chen, X.; Jiang, S. Resource-efficient Parallel Split Learning in Heterogeneous Edge Computing. In Proceedings of the International Conference on Computing, Networking and Communications, Big Island, HI, USA, 19–22 February 2024; pp. 794–798.
- Odoom, J.; Huang, X.; Zhou, Z.; Danso, S.; Zheng, J.; Xiang, Y. Linked or unlinked: A systematic review of linkable ring signature schemes. J. Syst. Archit. 2023, 134, 102786.
- Zhou, L.; Diro, A.; Saini, A.; Kaisar, S.; Hiep, P.C. Leveraging zero knowledge proofs for blockchain-based identity sharing: A survey of advancements, challenges and opportunities. J. Inf. Secur. Appl. 2024, 80, 103678.
- Ma, R.; Zhang, L.; Wu, Q.; Mu, Y.; Rezaeibagha, F. Be-trdss: Blockchain-enabled secure and efficient traceable-revocable data-sharing scheme in industrial internet of things. IEEE Trans. Ind. Inform. 2023, 19, 10821–10830.
- Jung, T.; Li, X.Y.; Huang, W.; Qian, J.; Chen, L.; Han, J.; Hou, J.; Su, C. Accounttrade: Accountable protocols for big data trading against dishonest consumers. In Proceedings of the IEEE INFOCOM 2017-IEEE Conference on Computer Communications, Atlanta, GA, USA, 1–4 May 2017; pp. 1–9.
- Wu, H.; Li, H.; Luo, X.; Jiang, S. Blockchain-Based Onsite Activity Management for Smart Construction Process Quality Traceability. IEEE Internet Things J. 2023, 10, 21554–21565.
- Jiang, S.; Cao, J.; Tung, C.L.; Wang, Y.; Wang, S. Sharon: Secure and Efficient Cross-shard Transaction Processing via Shard Rotation. In Proceedings of the IEEE INFOCOM 2024-IEEE Conference on Computer Communications, Vancouver, BC, Canada, 20–23 May 2024; pp. 2418–2427.
- Chen, H.; Pendleton, M.; Njilla, L.; Xu, S. A survey on ethereum systems security: Vulnerabilities, attacks, and defenses. ACM Comput. Surv. 2020, 53, 1–43.
- Wu, H.; Cao, J.; Yang, Y.; Tung, C.L.; Jiang, S.; Tang, B.; Liu, Y.; Wang, X.; Deng, Y. Data management in supply chain using blockchain: Challenges and a case study. In Proceedings of the 2019 28th International Conference on Computer Communication and Networks (ICCCN), Valencia, Spain, 29 July–1 August 2019; pp. 1–8.
- Zheng, Z.; Zhu, J.; Lyu, M.R. Service-generated big data and big data-as-a-service: An overview. In Proceedings of the 2013 IEEE International Congress on Big Data, Silicon Valley, CA, USA, 6–9 October 2013; pp. 403–410.
- Byabazaire, J.; O’Hare, G.; Delaney, D. Data quality and trust: Review of challenges and opportunities for data sharing in iot. Electronics 2020, 9, 2083.
- Jiang, S.; Cao, J.; Zhu, J.; Cao, Y. Polychain: A generic blockchain as a service platform. In Proceedings of the Third International Conference Blockchain and Trustworthy Systems (BlockSys), Guangzhou, China, 5–6 August 2021; pp. 459–472.
- Zhang, M.; Cao, J.; Yang, L.; Zhang, L.; Sahni, Y.; Jiang, S. Ents: An edge-native task scheduling system for collaborative edge computing. In Proceedings of the 2022 IEEE/ACM 7th Symposium on Edge Computing (SEC), Seattle, WA, USA, 5–8 December 2022; pp. 149–161.
- Zhang, Z.; Yang, K.; Tian, Y.; Ma, J. An anti-disguise authentication system using the first impression of avatar in metaverse. IEEE Trans. Inf. Forensics Secur. 2024, 19, 6393–6408.
- Wang, S.; Yang, M.; Jiang, S.; Chen, F.; Zhang, Y.; Fu, X. BBS: A secure and autonomous blockchain-based big-data sharing system. J. Syst. Archit. 2024, 150, 103133.
- Chen, X.; Cao, J.; Sahni, Y.; Jiang, S.; Liang, Z. Dynamic task offloading in edge computing based on dependency-aware reinforcement learning. IEEE Trans. Cloud Comput. 2024, 12, 594–608.
- Zhang, M.; Shen, X.; Cao, J.; Cui, Z.; Jiang, S. Edgeshard: Efficient llm inference via collaborative edge computing. IEEE Internet Things J. 2025, 12, 13119–13131.
- Zhang, M.; Cao, J.; Sahni, Y.; Chen, Q.; Jiang, S.; Yang, L. Blockchain-based collaborative edge intelligence for trustworthy and real-time video surveillance. IEEE Trans. Ind. Inform. 2022, 19, 1623–1633.

| Term | Data Type | Incentives | Commerciality |
|---|---|---|---|
| Big data sharing | Unrestricted | Unrestricted | Unrestricted |
| Open data | Governmental and scholarly data | Public good | Non-commercial |
| Data exchange | Unrestricted | Right to use data | Unrestricted |
| Big data trading | Unrestricted | Monetary reward and right to use data | Commercial |

| Dataset | Description | Update Frequency | Data Source |
|---|---|---|---|
| Coronavirus Disease (COVID-19) Testing Data | The dataset includes positive and negative results, pending tests, and the total number of people tested for each U.S. state or district. | Every two hours | COVID Tracking Project, Washington, DC, USA |
| COVID-19 Apple Mobility Trends Reports | The dataset contains COVID-19 mobility trends in countries/regions and cities. | Daily | Apple Inc., Cupertino, CA, USA |
| USA Hospital Beds—COVID-19 | This dataset includes data on the numbers of licensed beds, staffed beds, ICU beds, and the bed utilization rate for the hospitals in the U.S. | Daily | Definitive Healthcare, Framingham, MA, USA |
| Google COVID-19 Community Mobility Reports | The dataset includes the movement trends by geography across different categories of places over time. | Daily | Google LLC, Mountain View, CA, USA |
| COVID-19—World Confirmed Cases, Deaths, and Testing | The dataset includes COVID-19 data on confirmed cases, deaths, and testing worldwide. | Daily | Our World in Data, Oxford, UK |

| Feature/Criterion | DHC | DAC | Decentralized Solutions |
|---|---|---|---|
| Central Authority | Yes, to manage data storage, access, and transfer | Yes, for data discovery and brokering connections but not for data storage | No, the network itself governs the interactions |
| Pros | High efficiency, data authenticity | Data integrity, efficient data discovery | Resilience, high efficiency for “hot” data |
| Cons | Data privacy risk | Potential for misuse, weak authenticity | Maintenance burden, unclear incentives, inconsistent performance |
| Examples | SEEK, AWS Data Exchange | Epimorphics LDP, HKSTP Data Studio | IPFS |
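
The "Decentralized Solutions" column gives IPFS as its example. With no central authority to vouch for the data, integrity in such systems rests on content addressing: a requester asks for data by a hash of its bytes, so it can check whatever it receives from any peer. The sketch below illustrates the idea under stated assumptions. A plain SHA-256 hex digest stands in for an IPFS content identifier; real CIDs add multihash and multibase encoding, and large files are chunked into a Merkle DAG. The names `content_address`, `verify`, and `ContentStore` are hypothetical and are not part of any IPFS library.

```python
import hashlib


def content_address(data: bytes) -> str:
    # Simplification (assumption): a raw SHA-256 hex digest stands in for an
    # IPFS CID, which would also wrap the digest in multihash/multibase encoding.
    return hashlib.sha256(data).hexdigest()


def verify(key: str, data: bytes) -> bool:
    # A requester that fetched `data` from an untrusted peer re-hashes it.
    # The key itself commits to the content, so no central authority is
    # needed to vouch for integrity.
    return content_address(data) == key


class ContentStore:
    """Hypothetical in-memory content-addressed block store (not the IPFS API)."""

    def __init__(self) -> None:
        self._blocks: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        # Identical bytes always map to the same key, so duplicates collapse.
        key = content_address(data)
        self._blocks[key] = data
        return key

    def get(self, key: str) -> bytes:
        data = self._blocks[key]  # raises KeyError if this node lacks the block
        if not verify(key, data):
            raise ValueError("block does not match its content address")
        return data
```

In a peer-to-peer deployment, `put` would run on whichever node hosts the data and `verify` on the node that fetches it. The content-routing layer that locates peers holding a given key (a DHT in IPFS) is omitted here.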

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Jiang, S. Big Data Sharing: A Comprehensive Survey. Data 2025, 10, 182. https://doi.org/10.3390/data10110182
